Entry Date:
December 15, 2006

Computational Biology Group (CBG)


Our work focuses on the computational foundations of genomics. We develop algorithmic, statistical, and machine learning methods to interpret the functional elements encoded in the human genome, reconstruct the regulatory circuits they define, and understand their evolutionary mechanisms.

We work in a highly interdisciplinary environment at the interface of Computer Science and Biology. Since its inception, our lab has actively pursued research partnerships with biological and experimental groups. These partnerships are facilitated by our affiliation with the Broad Institute and the Computational and Systems Biology initiative (CSBi) at MIT, by our participation in the ENCODE and modENCODE consortia, and by several other ongoing collaborations at MIT, Harvard, and the hospitals affiliated with Harvard Medical School.

Our research focuses on the following major questions, which are central to our understanding of biological systems:

(*) Genome interpretation: We have developed comparative genomics methods that directly discover diverse functional genomic elements from their characteristic patterns of evolutionary change across related species. These "evolutionary signatures" are dictated by precise functional constraints unique to each class of functional element, which enables the genome-wide discovery of each class. We have used such signatures in the human, fly, and yeast genomes to recognize protein-coding genes and exons, RNA genes and structures, microRNAs (miRNAs) and their targets, and diverse classes of regulatory elements. This work has produced many surprising findings and new insights, including extensive stop-codon read-through in adult brain proteins, novel types of RNA structures involved in post-transcriptional and translational regulation, miRNA targeting in protein-coding regions, functionality of both arms of a miRNA hairpin and of both sense and anti-sense miRNAs, and the discovery of a new class of long intergenic non-coding RNAs.

(*) Gene regulation: We have also developed computational methods to study the cellular circuitry of genomes, which directs gene expression levels in response to environmental and developmental stimuli. Our work has produced global maps of regulatory elements in yeast and animal genomes, and of their role in specifying pre- and post-transcriptional gene regulatory networks. By combining comparative genomics with experimental datasets, we have studied condition-specific and tissue-specific activation networks, and gained new insights into the activation and silencing of developmental genes and into post-transcriptional targeting by miRNA genes.

(*) Epigenomics: With the recent availability of genome-wide maps of histone modifications, we have developed new methods for the systematic discovery of recurrent combinations of chromatin marks, or "chromatin signatures." We found these signatures to be associated with very specific types of functional elements, including diverse classes of enhancers, promoters, and insulators. We have used them to discover new elements, including novel non-coding RNA genes, to systematically study the dynamics of chromatin state across tissues and during development, and to discover the sequence elements and grammars governing those changes. We are currently also exploring the role of small non-coding RNAs in the establishment, maintenance, and targeting of chromatin state.

(*) Genome evolution: We have also developed methods to study systematic differences between the species compared, and have uncovered important evolutionary mechanisms for the emergence of new functions. Our work provided definitive proof of an ancestral whole-genome duplication in yeast, which led to a complete doubling of the gene count and was rapidly followed by massive gene loss, asymmetric divergence, and new gene functions. To further understand the evolutionary processes leading to new functions, we developed a phylogenomic framework for studying gene family evolution in the context of complete genomes. This framework revealed two largely independent evolutionary forces, which dictate gene-specific and species-specific mutation rates. Decoupling these two rates also allowed us to develop the first machine-learning approach to phylogeny, resulting in drastically higher accuracy than any existing phylogenetic method.