Research
Preliminary Research Agenda

The CCB focuses specifically on predictive cell biology, a research area that uses computational methods to understand, explain, and predict complex function from basic molecular building blocks. This work draws on several concurrent streams of research. The project’s bedrock is the genome sequence, interpreted in the light of evolutionary understanding and inference. At the same time, we aim to compile the ‘parts list’ of a cell: starting from the genome, we seek to identify and understand the functions of the genomic regions, proteins, and other biomolecules that act in cellular processes. Even while this task is in progress, work is under way to describe quantitatively the dynamic interactions of these cellular components. All of these enterprises rely on the computational and statistical analysis of vast quantities of molecular data.

Predictive cell biology will offer challenges for many decades, and our program will remain responsive to new directions. The field will lay the groundwork for a new understanding of the differences between cells and will yield insight into how differences in genome sequence, composition, and organization across the kingdoms of life manifest themselves as differences in cellular behavior. The resulting models will thus touch upon evolutionary, developmental, and comparative biology. Ultimately, predictive cell biology will inform the development of medical treatments and our understanding of how we interact with the environment.

The following examples, while by no means exhaustive, illustrate the scope of our agenda.

Phylogeny and systematics
  1. Comparative systematics: Develop biologically sophisticated and computationally tractable methods to reconstruct evolutionary relationships.
  2. Genome evolution: Develop new algorithms to align, compare and model the evolution of whole genomes, incorporating complex genomic rearrangements.

Identification and functional analysis of cellular molecules
  1. RNA genomics: Employ stochastic and combinatorial methods to identify RNA-encoding genes, predict their function, and interpret three-dimensional structure.
  2. Knowledge-based prediction of protein structure: Use knowledge of secondary structure, recurrent sequence motifs, and evolutionary relationships to predict and design protein structures, protein folding mechanisms, and protein-protein interactions.
  3. Protein function determination: Assign function to proteins based on structural analysis, homology to known proteins, and phylogeny.
  4. Analysis of transcriptional regulation: Determine the interactions between transcription factors and genomic DNA that govern the transcription of genes.

Molecular process modeling
  1. Metabolic pathway modeling: Construct mathematical models of coupled systems of chemical reactions catalyzed by proteins, using stochastic differential equations and simulation techniques.
  2. Cellular dynamics: Construct mathematical models of cell motility, protein transport, dynamics of DNA, molecular machines, and cytoskeletal structure.
  3. Relation of cellular processes and genetic variation to the organism: Relate variations in transcriptional regulatory networks and genomic sequence to organismal form and function.
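
As a rough illustration of the simulation techniques mentioned under metabolic pathway modeling, the sketch below simulates a single enzyme-catalyzed reaction, E + S <-> ES -> E + P, as a random sequence of discrete reaction events. It uses Gillespie's direct method, which is a stochastic simulation algorithm rather than a stochastic differential equation solver. The function name, the rate constants, and the initial molecule counts are all hypothetical choices made for this example and do not describe any particular CCB model.

```python
import random


def gillespie_michaelis_menten(e0, s0, k_bind, k_unbind, k_cat, t_end, seed=0):
    """Simulate E + S <-> ES -> E + P with Gillespie's direct method.

    All arguments are illustrative assumptions: e0 and s0 are initial
    molecule counts, k_bind, k_unbind and k_cat are stochastic rate
    constants, and t_end is the simulated time horizon.
    Returns a list of (time, E, S, ES, P) tuples, one per reaction event.
    """
    rng = random.Random(seed)
    e, s, es, p = e0, s0, 0, 0
    t = 0.0
    trajectory = [(t, e, s, es, p)]

    while True:
        # Propensity of each reaction given the current molecule counts.
        a_bind = k_bind * e * s      # E + S -> ES
        a_unbind = k_unbind * es     # ES -> E + S
        a_cat = k_cat * es           # ES -> E + P
        a_total = a_bind + a_unbind + a_cat
        if a_total == 0.0:
            break  # no reaction can fire any more

        # The waiting time to the next event is exponentially distributed.
        t += rng.expovariate(a_total)
        if t > t_end:
            break

        # Choose which reaction fires, with probability proportional
        # to its propensity.
        r = rng.random() * a_total
        if r < a_bind:
            e, s, es = e - 1, s - 1, es + 1
        elif r < a_bind + a_unbind:
            e, s, es = e + 1, s + 1, es - 1
        else:
            e, es, p = e + 1, es - 1, p + 1

        trajectory.append((t, e, s, es, p))

    return trajectory


if __name__ == "__main__":
    # Hypothetical parameters: 10 enzyme and 100 substrate molecules.
    traj = gillespie_michaelis_menten(10, 100, 0.01, 0.1, 0.5, t_end=50.0, seed=1)
    print("events simulated:", len(traj) - 1)
    print("final state (t, E, S, ES, P):", traj[-1])
```

Each run with a different seed produces a different trajectory. Averaging many runs approximates the behavior of the corresponding deterministic rate equations when molecule counts are large, while individual runs expose the fluctuations that matter when counts are small.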

Database design and management
Devise methods for specifying, constructing and accessing distributed databases for heterogeneous biological data such as images, protein and DNA sequences, gene expression data, and published articles – as well as the relations amongst these data.


Applied statistics and statistical computing
Devise multivariate statistical learning methods and software for cluster analysis, prediction, computational inference, causal inference, multiple testing, and model/feature selection. Although biological questions can be highly specific, statistical and computational methodologies are general and can be applied to an extraordinary variety of biological questions, in areas such as phylogeny, expression analysis, transcription regulation, molecular process modeling, and protein structure and function prediction.