The Center for Computational Biology Seminar Series features research areas that are currently being investigated by professors from national and international research institutions.
2015-2016 Seminar Schedule
Title: Differential Analysis of Bifurcating Single-cell Gene Expression Trajectories
Abstract: Single-cell trajectory analysis is a powerful approach for studying gene regulatory changes during cell differentiation and other dynamic processes. Recently, we showed that individual cells can be ordered according to progress through differentiation by analyzing their transcriptomes with unsupervised algorithms. Previous studies by our group and others have been limited to linear trajectories tracking unipotent progenitor cells. Such cellular trajectories have only one outcome. However, during development, cells make fate decisions that lead to one of several mutually exclusive states in the adult. How to reconstruct and analyze single-cell trajectories that include and span fate decisions is an open problem.
We will describe an approach for reconstructing single-cell trajectories that include bifurcations corresponding to cell fate decisions. We then describe statistical methods for identifying genes that are differentially expressed between trajectory outcomes. We illustrate the power of this technique by analyzing differentiating bronchoalveolar progenitor cells undergoing specification into type I and type II pneumocytes. This analysis reveals hundreds of genes with lineage-dependent expression. Our approach, which encodes a topological description of the trajectory as continuous predictors in a generalized linear model, can distinguish, for example, genes that become lineage-dependent proximal to the fate specification from those that are restricted to a lineage later in differentiation. We conclude with an analysis of bifurcations in settings other than development to argue that single-cell trajectory analysis can help distinguish the genes that drive a process from those further downstream.
Title: New Methods for Studying Polygenic Traits and Polygenic Adaptation in Humans
Abstract: Most common phenotypic variation in humans is highly polygenic. Although there are examples of strong selective sweeps at individual loci, we and others have hypothesized that the bulk of human adaptation occurs through small shifts in allele frequencies at hundreds or thousands of relevant loci. In this talk I will describe our recent work on methods for studying the genetic basis of a variety of complex traits, with an emphasis on gene regulation; and extensions of these approaches to detect signals of polygenic adaptation during the past 2000 years in the ancestors of the British.
Title: A fast and powerful approach for highly contiguous de novo genome assembly
Abstract: Long-range and highly accurate de novo assembly from short-read data is one of the most pressing challenges in genomics. Recently, it has been shown that read pairs generated by proximity ligation of DNA in chromatin of living tissue can address this problem, dramatically increasing the scaffold contiguity of assemblies. Here, we describe a simpler approach (“Chicago”) based on in vitro reconstituted chromatin. We generated two Chicago datasets with human DNA and developed a statistical model and a new software pipeline (“HiRise”) that can identify poor quality joins and produce accurate, long-range sequence scaffolds. We used these to construct a highly accurate de novo assembly and scaffolding of a human genome with scaffold N50 of 20 Mbp. We also demonstrated the utility of Chicago for improving existing assemblies by re-assembling and scaffolding the genome of the American alligator. With a single library and one lane of Illumina HiSeq sequencing, we increased the scaffold N50 of the American alligator from 508 kbp to 10 Mbp.
Title: Decoding Epigenetic and Transcriptional Programs in Cellular Differentiation
Abstract: In order to differentiate into distinct lineages, multipotent cells must undergo large-scale remodeling of chromatin and orchestrate dramatic gene expression changes. How do multipotent cells encode the potential for multiple cell fates, and how can we decipher the transcriptional programs that carry out cell state transitions in commitment to specific fates? To address these questions, we carried out an integrative computational analysis of enhancer landscape and gene expression dynamics in hematopoietic differentiation using DNase-seq, histone mark ChIP-seq, and RNA-seq. We examined how early establishment of enhancers and complex regulatory locus control together govern gene expression changes in cell state transitions. We found that high-complexity genes, i.e., those with a large total number of DNase-mapped enhancers across the lineage, differ architecturally and functionally from low-complexity genes, achieve larger expression changes, and are enriched for both cell-type-specific and “transition” enhancers, which are established in hematopoietic stem and progenitor cells (HSPCs) and maintained in one differentiated cell fate but lost in others. We then developed a quantitative model to predict gene expression changes from the DNA sequence content and lineage history of active enhancers. Our method accurately predicts expression changes for high-complexity genes during differentiation, suggests a novel mechanistic role for PU.1 at transition peaks in B cell specification, and can be used to improve assignment of enhancers to genes. We are currently using these methods to decode cell state transitions in T lymphocyte differentiation in inflammation and T cell “exhaustion” in tumors.
Title: Dimensionality in Biological Data – The Power of Single Cells
Title: What can tens of thousands of human RNA-seq samples tell us about how much of the genome is transcribed?
Abstract: Over the last several years, there has been a lot of debate over how much of the genome is functional. This debate has generated a lot of heat, particularly in internet discussions, mainly due to differing definitions of “functional”. In this talk I will ask a simpler and more clearly defined question: “How much of the genome is transcribed and in what scenarios?” I will explain our cloud-based and annotation-free approach to the analysis of data from RNA-sequencing experiments and make an effort to estimate the extent of and variation in transcription in the human genome using tens of thousands of human RNA-sequencing samples.