The CompBio Student Seminar Series features short seminars in which graduate students from departments across the University present their current research. Seminars are held every other Tuesday evening in the Designated Emphasis interaction space, the Campbell Annex.
2009-2010 Seminar Schedule
Title: Stability in Times of Change: Biophysics of Robust Gene Control
Title: Computational Data Driven Algorithms for MicroRNA prediction in Ciona
Title: The Concomitant Genomic Analysis of an Obligate Pathogen-Host Interaction
Abstract: The development of genomic tools for concomitantly studying obligate eukaryotic biotrophic pathogens and their hosts has advanced rapidly through the use of high-throughput, next-generation deep-sequencing technologies. These technologies now allow the simultaneous isolation of different RNA species (mRNAs or small RNAs) from both the pathogen and the infected host tissue, deep sequencing of all classes of RNA, and computational assignment of the results to either the pathogen or the host genome. In this study, we aim to analyze the well-characterized pathosystem formed by the obligate eukaryotic oomycete pathogen Hyaloperonospora arabidopsidis and the model plant host Arabidopsis thaliana. Using next-generation sequencing, we are monitoring the networks of genes that are up- or down-regulated in both the host and the pathogen during different stages of pathogenesis. Additionally, we are interested in the computational identification of a specific class of H. arabidopsidis virulence factors called “effectors”. Oomycete effectors share a common domain architecture and are key determinants of the outcome of infection. This presentation will summarize our lab's current progress in the genomic analysis of the H. arabidopsidis / A. thaliana pathosystem.
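To make the idea of computationally assigning reads to the pathogen or host genome concrete, here is a toy sketch. It is not the lab's pipeline: it assumes short hypothetical reference strings and assigns each read to whichever genome shares more length-k substrings (k-mers) with it, whereas real analyses typically align reads against both reference genomes with a dedicated aligner.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_read(read, host_kmers, pathogen_kmers, k):
    """Assign a read to 'host', 'pathogen', or 'ambiguous' by counting shared k-mers."""
    read_kmers = kmers(read, k)
    host_hits = len(read_kmers & host_kmers)
    pathogen_hits = len(read_kmers & pathogen_kmers)
    if host_hits > pathogen_hits:
        return "host"
    if pathogen_hits > host_hits:
        return "pathogen"
    return "ambiguous"


# Hypothetical toy reference sequences; real inputs would be the two genome assemblies.
K = 4
host_ref = "ACGTACGTTTGCA"
pathogen_ref = "GGCCAATTGGCCAA"
host_kmers = kmers(host_ref, K)
pathogen_kmers = kmers(pathogen_ref, K)

for read in ["ACGTACG", "CCAATTGG", "NNNNNNN"]:
    print(read, assign_read(read, host_kmers, pathogen_kmers, K))
```

Reads that match both genomes equally well (or neither) come back as "ambiguous"; conserved or low-complexity sequence would need separate handling in a real analysis.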
Title: Theory and Application of Random Forests to Genome-Wide Association Studies
Abstract: Random Forests is one of the few machine learning algorithms capable of handling large genome-wide association (GWA) SNP data sets. I will review the theory behind the RF algorithm and discuss some issues that arise when applying it to this type of data. I will also discuss some new methods under development for selecting important variables.
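One common Random Forests variable-importance measure sums, for each variable, the decrease in node impurity produced by the splits that use it. As a minimal sketch of that building block only (not of the new selection methods mentioned above), the code below scores each SNP, coded as 0/1/2 copies of an allele, by the Gini impurity decrease of its best single split of case/control labels (0 = control, 1 = case). The function names and genotype coding are assumptions for illustration; a full forest repeats this inside many trees grown on bootstrap samples with random subsets of SNPs and accumulates the decreases per SNP.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_gain(genotypes, labels, threshold):
    """Impurity decrease from splitting samples into genotype <= threshold vs > threshold."""
    left = [y for g, y in zip(genotypes, labels) if g <= threshold]
    right = [y for g, y in zip(genotypes, labels) if g > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


def best_snp_split(snp_matrix, labels):
    """Return (snp_index, threshold, gain) for the best split over all SNPs."""
    best = (None, None, 0.0)
    for j, genotypes in enumerate(snp_matrix):
        for t in (0, 1):  # the two possible cut points for 0/1/2 genotype codes
            gain = split_gain(genotypes, labels, t)
            if gain > best[2]:
                best = (j, t, gain)
    return best


# Hypothetical toy data: one SNP perfectly separates cases from controls, one does not.
labels = [0, 0, 1, 1]
snps = [[0, 0, 2, 2], [0, 1, 0, 1]]
print(best_snp_split(snps, labels))
```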
Title: Berkeley PHOG: PhyloFacts Orthology Group
Abstract: We present PHOG, a new algorithm for phylogenomic orthology prediction, which is available as a web service through the PhyloFacts Phylogenomic Encyclopedias at http://phylofacts.berkeley.edu/orthologs.
Title: Inferring Adaptive Landscapes from Phylogenetic Trees
Abstract: How many adaptive peaks underlie the evolution of a particular set of phenotypic traits? At what rate do transitions occur between these peaks? What factors are responsible for the rapid diversification of character traits in some clades while others remain virtually unchanged? Readily available genomic data and increasingly powerful computational resources have opened the door to a new approach to understanding phenotypic evolution. I will review some of the existing comparative methods for inferring evolutionary patterns from phylogenetic trees. I will show examples where this inference can be misleading and highlight the need for nonlinear models of phenotypic evolution that can capture phenomena such as traits clustering into multiple niches or adaptive peaks, as well as the rates of transition between them. I will then outline our approach to addressing these challenges.
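Many existing comparative methods model a trait evolving toward a single optimum, or adaptive peak, with an Ornstein-Uhlenbeck (OU) process, dx = alpha (theta - x) dt + sigma dW, where theta is the optimum, alpha the strength of the pull toward it, and sigma the intensity of random fluctuation. The sketch below simulates one lineage under this linear model with the Euler-Maruyama scheme; it illustrates the kind of single-peak model that multi-peak, nonlinear alternatives are contrasted with, and the parameter values are arbitrary assumptions.

```python
import math
import random


def simulate_ou(x0, theta, alpha, sigma, dt, n_steps, seed=0):
    """Euler-Maruyama simulation of dx = alpha*(theta - x)*dt + sigma*dW."""
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(n_steps):
        # Deterministic pull toward the optimum plus a Gaussian increment
        # with variance sigma^2 * dt.
        x += alpha * (theta - x) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


path = simulate_ou(x0=0.0, theta=5.0, alpha=1.0, sigma=0.5, dt=0.01, n_steps=2000)
print(round(path[-1], 2))
```

With sigma = 0 the trajectory decays deterministically toward theta. The stationary distribution of an OU process is a single Gaussian centered on theta, so on its own the model describes one adaptive peak.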
Title: Genomic Redundancy Confers Robust, Adaptable Gene Expression
Abstract: The traditional view of gene regulation postulates a separate enhancer sequence for each spatiotemporal pattern associated with a given gene. In 2008, Hong et al. challenged this view by reporting that several important dorsal-ventral patterning genes in Drosophila are driven by multiple, separable enhancers that are active in the same tissues at the same time, and that many additional developmental control genes are predicted to have such “shadow enhancers”. It was proposed that such enhancers provide an opportunity for the evolution of novel patterns of gene expression without compromising the original genetic functions.
We present evidence from genomic analysis and transgenic expression assays that many of these shadow enhancers are in fact deeply conserved among Drosophilids, suggesting a functional rather than an evolutionarily transient role. Using a mathematical model of enhancer activity to explore other possible selective advantages of dual enhancers, we identify conditions under which dual enhancers acting on a single promoter may have synergistic (super-additive) effects on the reliability of gene induction in the face of internal stochastic noise and/or external variation (e.g., temperature fluctuations). Using a transgenic reporter system, quantitative high-resolution FISH, and custom software, we have assayed the reliability of expression (i) when activator concentrations are changed and (ii) when the temperature is changed. Genes that have both a shadow enhancer and a primary enhancer show less variability under these stressed conditions than genes without a shadow enhancer. Moreover, each enhancer of the shadow-primary pair, taken alone, confers substantially less robustness than the two functioning together. We postulate that dual enhancers have evolved to ensure robust expression and adaptability to environmental variation.
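Super-additivity of this kind can be illustrated with a deliberately simple threshold model. This is an assumption for illustration, not the authors' mathematical model: each enhancer contributes an independent, normally distributed amount of activation, the contributions add at the promoter, and the gene is induced only when the total exceeds a threshold.

```python
from math import erf, sqrt


def prob_above(threshold, mean, sd):
    """P(X >= threshold) for X ~ Normal(mean, sd^2)."""
    z = (threshold - mean) / sd
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))


def induction_probability(n_enhancers, mu, sd, threshold):
    """Toy model: n independent enhancers, each N(mu, sd^2), summed at one promoter.

    The sum has mean n*mu and standard deviation sd*sqrt(n).
    """
    return prob_above(threshold, n_enhancers * mu, sd * sqrt(n_enhancers))


# Hypothetical parameters chosen only to illustrate the effect.
single = induction_probability(1, mu=1.0, sd=0.5, threshold=1.5)
dual = induction_probability(2, mu=1.0, sd=0.5, threshold=1.5)
print(single, dual, dual > 2 * single)
```

With these parameters a single enhancer induces the gene with probability of about 0.16 and the pair with about 0.76, more than twice the single-enhancer value. Because the summed mean grows with n while the summed noise grows only with sqrt(n), the second enhancer moves the threshold from the tail of the distribution toward its center.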
Abstract: In this presentation I will discuss the analysis of HIV quasispecies populations using the Illumina platform. We have developed a novel method that amplifies full-length HIV genomes without sequence-specific primers for high-throughput DNA sequencing, specifically with Illumina technology. Illumina was chosen because it provides greater coverage of the HIV genome than 454, allowing more comprehensive characterization of the heterogeneity present in the HIV samples analyzed. Our amplification method, combined with Illumina sequencing, was used to analyze two HIV populations: a homogeneous HIV population based on the canonical NL4-3 strain and a heterogeneous viral population obtained from an HIV-positive patient’s infected T cells. This study demonstrates that our amplification method, combined with Illumina sequencing, provides in-depth, complete coverage of the HIV genome, allowing us to better characterize the quasispecies present in a clinically relevant HIV population and offering a potential method for studying how HIV mutates in response to selective pressure.
Title: Mutual Information Analysis Reveals Functionally Important Co-Evolving Residues in HIV-1 Tat with Implications for Viral Latency
Abstract: After HIV-1 integrates into the host genome, the provirus exhibits two distinct gene expression states – viral transactivation resulting in replication vs. a rare inactive state that leads to viral latency. Shortly after integration, the few Tat molecules synthesized via weak viral gene expression bind TAR and recruit other cellular factors leading to transcriptional activation, whereas the failure of these few Tat molecules to initiate the positive-feedback loop contributes to transcriptional silence and latency. The stochastic nature of this positive feedback loop can thus give rise to these two distinct gene expression modes. We have studied how sequence variation in the critical viral protein Tat among natural HIV-1 subtypes drives different gene expression dynamics.
A standard approach for identifying structurally and functionally important residues in proteins is to build a multiple sequence alignment and look for conserved positions. However, this analysis misses residues that are not individually conserved but co-evolve with other positions in the protein. We used a statistical measure, mutual information, to identify apparently co-evolving pairs of amino acid residues in Tat. These in silico results were verified experimentally by introducing mutations into a lentiviral system, LTR-GFP-IRES-Tat, which revealed that the residue pairs with strong statistical correlation play an important role in viral transactivation. These co-evolving residues contribute to distinct gene expression phenotypes, yielding insights into the mechanisms of viral evolution and latency.
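Mutual information between two alignment columns i and j is MI(i, j) = sum over residue pairs (a, b) of p(a, b) log2[p(a, b) / (p(a) p(b))], with the probabilities estimated from frequencies observed in the alignment. The sketch below computes it for every pair of columns in a made-up toy alignment; published analyses usually add corrections for small sample size and shared phylogenetic history, which are omitted here.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """MI (in bits) between two alignment columns given as equal-length sequences."""
    n = len(col_a)
    p_a = Counter(col_a)
    p_b = Counter(col_b)
    p_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in p_ab.items():
        p_joint = count / n
        mi += p_joint * log2(p_joint / ((p_a[a] / n) * (p_b[b] / n)))
    return mi


def column_pair_mi(alignment):
    """Return {(i, j): MI} for every column pair i < j of an aligned set of sequences."""
    columns = list(zip(*alignment))
    scores = {}
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            scores[(i, j)] = mutual_information(columns[i], columns[j])
    return scores


# Hypothetical toy alignment: columns 0 and 1 vary together (K pairs with A,
# R pairs with G), while column 2 is fully conserved.
alignment = ["KAR", "KAR", "RGR", "RGR"]
for pair, score in column_pair_mi(alignment).items():
    print(pair, score)
```

The conserved column scores zero against everything, while the two variable columns that always change together score the maximum of 1 bit: exactly the kind of signal a conservation-only analysis would miss.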
Title: Modeling CRISPR: The Evolutionary Dynamics of Hereditable, Adaptive Immunity in Prokaryotes
Abstract: The genomic integrity of all organisms is under constant threat from mobile genetic elements such as viruses and plasmids. Just as eukaryotes have RNAi, a genome-encoded, sequence-specific immune mechanism, an analogous though evolutionarily distinct pathway exists in prokaryotes. This prokaryotic genetic interference system is called CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats). The CRISPR locus is found in 40% of Bacteria and 90% of Archaea.
Within a CRISPR locus, sequences derived from extrachromosomal elements are stored in segments termed spacers. These spacers, 30 to 80 base pairs in length, give the host prokaryote specific immunity against any invading pathogen or genetic element that carries the same sequence in its genome. This selects for evolution of the viral “proto-spacer” (the viral sequence matching a spacer), enabling the virus to infect new hosts. At the same time, prokaryotes continually acquire new spacers from new infecting agents, leading to a coevolutionary arms race with surprising outcomes.
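The immunity, escape, and re-acquisition cycle described above can be sketched as exact string matching. This is a toy model with a hypothetical viral sequence and 8-nucleotide spacers for readability (real spacers are 30 to 80 bp); it ignores protospacer-adjacent motifs and the partial mismatch tolerance of real CRISPR systems.

```python
def is_immune(spacers, viral_genome):
    """Toy rule: the host is protected if any stored spacer occurs exactly in the viral genome."""
    return any(spacer in viral_genome for spacer in spacers)


def acquire_spacer(spacers, viral_genome, start, length):
    """Copy a protospacer from the virus into the array.

    New spacers are prepended, mirroring the leader-end insertion seen in CRISPR arrays.
    """
    return [viral_genome[start:start + length]] + spacers


def point_mutate(genome, pos, base):
    """Return a copy of genome with the nucleotide at pos replaced by base."""
    return genome[:pos] + base + genome[pos + 1:]


virus = "ATGGCACTTAGCGATCCA"                      # hypothetical viral sequence
spacers = acquire_spacer([], virus, start=4, length=8)
print(spacers, is_immune(spacers, virus))        # host now resists this virus

escape = point_mutate(virus, 6, "G")             # mutation inside the protospacer
print(is_immune(spacers, escape))                # the old spacer no longer matches

spacers = acquire_spacer(spacers, escape, start=10, length=8)
print(spacers, is_immune(spacers, escape))       # a new spacer restores immunity
```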
Title: Using data-driven modeling to reveal structure in a metazoan transcriptional network
Abstract: Gene expression in the early embryo of the fruit fly Drosophila melanogaster is governed primarily by transcriptional regulation. Though hundreds of embryonically transcribed genes are known to be regulated in complex spatiotemporal patterns, only a handful of examples exist to illustrate the regulatory structure underlying these patterns. My research explores a large spatiotemporal atlas of gene expression with a combination of models to learn gene expression relationships in the fly embryo. I will summarize my work to date and discuss some predictions currently being tested.
Title: Association studies in admixed populations – Understanding the Biases and Introducing Normalization Methods
Abstract: Association studies play a major role in human genetics. If the cases and controls of an association study do not share the same population structure, standard association tests become significantly biased toward non-causal alleles. In admixed populations, the population structure is not uniform but varies along the genome, calling for different normalization methods to account for population stratification.
In this talk I will present ongoing work with Sriram Sankararaman, Gadi Kimmel and Eran Halperin, on the characterization of different biases present in association studies of admixed populations, and the methods we are developing to correct for these biases.
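The bias from mismatched population structure can be seen in a small worked example with hypothetical numbers. Suppose an allele has frequency 0.8 in subpopulation A and 0.2 in subpopulation B and has no effect on disease in either, so within each subpopulation cases and controls share the same frequency. If cases are drawn mostly from A (150 from A, 50 from B) and controls mostly from B (50 from A, 150 from B), the pooled case frequency is (150 x 0.8 + 50 x 0.2) / 200 = 0.65 and the pooled control frequency is (50 x 0.8 + 150 x 0.2) / 200 = 0.35, an apparent allelic odds ratio of about 3.45 at a non-causal locus. In an admixed population the ancestry proportions entering this calculation can differ from locus to locus, so a single genome-wide correction may not suffice. The sketch reproduces the arithmetic:

```python
def pooled_allele_freq(groups):
    """groups: (number of individuals, allele frequency) for each sampled subpopulation."""
    total = sum(n for n, _ in groups)
    return sum(n * freq for n, freq in groups) / total


def allelic_odds_ratio(freq_cases, freq_controls):
    """Allele odds freq / (1 - freq) in cases divided by the same odds in controls."""
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


# Hypothetical sampling: cases over-represent subpopulation A, controls subpopulation B.
cases = [(150, 0.8), (50, 0.2)]
controls = [(50, 0.8), (150, 0.2)]

f_cases = pooled_allele_freq(cases)
f_controls = pooled_allele_freq(controls)
print(f_cases, f_controls, allelic_odds_ratio(f_cases, f_controls))

# Within each subpopulation the frequencies match, so there is no true association.
print(allelic_odds_ratio(0.8, 0.8), allelic_odds_ratio(0.2, 0.2))
```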