Talks


Speakers listed in order of presentation

Saturday, November 5

1. Aidan McLoughlin – (Huang Lab), Biostatistics
Shared Differential Expression-Based Distance Reflects Global Cell Type Relationships in Single-Cell RNA Sequencing Data
Abstract: Unsupervised cell clustering on the basis of meaningful biological variation in single-cell RNA sequencing (scRNA seq) data has received significant attention, as it assists with ontological subpopulation identification among the data. A key step in the clustering process is to compute distances between the cells under a specified distance measure. Although particular distance measures may successfully separate cells into biologically relevant clusters, they may fail to retain global structure of the data, such as relative similarity between the cell clusters. In this article, we modify a biologically motivated distance measure, SIDEseq, for use in aggregate comparisons of cell types in large single-cell assays, and demonstrate that, across simulated and real scRNA seq data, the distance matrix more consistently retains global cell type relationships than commonly used distance measures for scRNA seq clustering. We call the modified distance measure “SIDEREF.” SIDEREF visualizations more consistently reflect global structures in the data than other commonly considered distance measures. We utilize relative cell type distances and the SIDEREF distance measure to uncover compositional differences between annotated leukocyte cell groups in a compendium of Mus musculus scRNA seq assays comprising 12 tissues. Utilizing distances to a small, globally representative group of cells can be leveraged as a bi-clustering mechanism in unsupervised deep learning analysis of other single cell genomics data, such as multiomic scRNA seq/scATAC seq data.

2. Hunter Nisonoff – (Listgarten Lab), Computational Biology 
Augmenting Neural Networks with Priors on Function Values
Abstract: The need for function estimation in label-limited settings is common in the natural sciences. At the same time, prior knowledge of function values is often available in these domains. For example, data-free biophysics-based models can be informative on protein properties, while quantum-based computations can be informative on small molecule properties. How can we coherently leverage such prior knowledge to help improve a neural network model that is quite accurate in some regions of input space — typically near the training data — but wildly wrong in other regions? Bayesian neural networks (BNN) enable the user to specify prior information only on the neural network weights, not directly on the function values. Moreover, there is in general no clear mapping between these. Herein, we tackle this problem by developing an approach to augment BNNs with prior information on the function values themselves. Our probabilistic approach yields predictions that rely more heavily on the prior information when the epistemic uncertainty is large, and more heavily on the neural network when the epistemic uncertainty is small. 
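The behavior described in the last sentence of this abstract can be illustrated with a toy calculation. The sketch below is not the speakers' method: it assumes that both the network prediction and the prior on the function value are Gaussian at a single input, and combines them by precision weighting. The function name `combine_gaussian` and all numbers are hypothetical.

```python
def combine_gaussian(nn_mean, nn_var, prior_mean, prior_var):
    """Precision-weighted combination of two Gaussian estimates of one value.

    Illustrative assumption only: nn_var stands in for the network's
    epistemic variance, prior_var for the uncertainty of the prior.
    """
    nn_precision = 1.0 / nn_var
    prior_precision = 1.0 / prior_var
    total_precision = nn_precision + prior_precision
    mean = (nn_precision * nn_mean + prior_precision * prior_mean) / total_precision
    var = 1.0 / total_precision
    return mean, var


# Network is uncertain (large variance): the result sits near the prior value 3.0.
far_from_data, _ = combine_gaussian(nn_mean=1.0, nn_var=100.0, prior_mean=3.0, prior_var=1.0)

# Network is confident (small variance): the result sits near the network value 1.0.
near_data, _ = combine_gaussian(nn_mean=1.0, nn_var=0.01, prior_mean=3.0, prior_var=1.0)
```

Under these assumptions the weight on each source is proportional to its precision, so the prior dominates wherever the network's variance is large.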

3. Connor Bybee – (Sommer Lab), Computational Biology
Coupled Oscillatory Neural Networks
Abstract: Understanding the computational dynamics of analog systems may provide insight into biological computation. Furthermore, analog computers may have advantages compared to current digital computers for certain tasks. Therefore, we investigate the ability of coupled oscillator neural networks (NN) to perform important functional tasks, e.g., inference in feedforward NNs, attractor dynamics in recurrent NNs, and optimization of hard combinatorial optimization problems. We demonstrate several advantages of computing with coupled oscillator NNs. Additionally, our model makes predictions of the functional role of cross-frequency coupling observed in biological neural networks.

4. Lightning talk – Yu-Jen (Jennifer) Lin (Brenner Lab), Molecular and Cell Biology 
Is expression or splicing more important in an RNA-seq study?
Abstract: 

5. Lightning talk – Carlos Albors (Song Lab), Electrical Engineering and Computer Sciences
SLC Transporter Variant Effect Prediction
Abstract: SLC transporters are the second largest superfamily of membrane proteins. These are clinically important: more than 120 Mendelian diseases are caused by pathogenic variants in an SLC transporter gene, and over 30 SLC transporters are drug targets. Interpretation of missense variants is an important problem. I’ll discuss how language models trained on homologous protein sequences can make progress on this problem.

6. Lightning talk – Juan Vazquez (Sudmant Lab), Integrative Biology
Building a Model System for Studying the Evolution of Extraordinary Longevity in Bats Using Functional Pangenomics
Abstract: Lifespan is one of the most variable traits across the entire tree of life, and especially in mammals. Differences in lifespans between closely related species provide a promising avenue for discovering novel pro-longevity pathways using evolutionary techniques. Previous studies looking at the genetics underpinning aging in long-lived mammals have suffered from a combination of low-quality genomes, low phylogenetic coverage, or long evolutionary times, all of which can negatively affect their power to detect genes associated with longevity. In order to comprehensively study the evolution of aging and aging-associated traits in bats, we are creating a panel of chromosome-scale reference genomes, primary cell lines, and phenotypic responses for 10 species of California Myotis spanning 14.2 million years and a 5-fold difference in lifespans. Using a combination of functional genomics and pan-genome graph methods, we will examine the genetic basis underlying changes in lifespan in these species, and establish a system for future work on the evolution of longevity and longevity-associated traits in aging-relevant tissues using induced pluripotent stem cells and other ex vivo systems.

7. Laurits Skov – (Moorjani Lab), Molecular and Cell Biology
Reconstructing the history of archaic introgression in modern humans: Insights from whole genome sequences of worldwide populations
Abstract: Analysis of archaic and modern human genomes has revealed evidence of gene flow from archaic hominins (Neanderthals and Denisovans) into modern humans and highlighted its critical role in shaping the genetic and phenotypic variation in modern humans. However, most studies have focused on Europeans and East Asians, with very few genomes from other parts of the world, leaving our understanding of this important chapter of human evolutionary history incomplete. In this study, we integrate data from four different studies including ~27,000 European genomes from deCODE Genetics, ~2500 worldwide genomes from the 1000 Genomes Project, ~929 worldwide genomes from the Human Genome Diversity Panel and ~2700 newly sequenced genomes from South Asia.

8. Sebastian Prillo – (Song Lab), Electrical Engineering and Computer Sciences
CherryML: Scalable Maximum Likelihood Estimation of Phylogenetic Models
Abstract: Phylogenetic models of molecular evolution are central to diverse problems in biology, but maximum likelihood estimation of model parameters is a computationally expensive task, in some cases prohibitively so. To address this challenge, we introduce CherryML, a broadly applicable method that achieves several orders of magnitude speedup. We demonstrate its utility by applying it to estimate a general 400 x 400 rate matrix for amino acid co-evolution at protein contact sites.

9. Joana Rocha – (Sudmant Lab), Integrative Biology
A Pan-pangenome captures the full spectrum of genetic variation and ancient trans-species structural polymorphism in humans, chimpanzees and bonobos
Abstract: Evolutionary biology is undergoing a paradigm shift with third-generation sequencing technologies enabling the sequencing and assembly of complete, reference-quality genomes and haplotypes. These technologies specifically allow interrogation of complex genome structures and highly divergent haplotypes which have previously been intractable with short-read mapping-based approaches. Here we outline our efforts to resolve the diversity of genome structural variation in the Pan genus from comprehensive sampling of a diverse set of chimpanzees and bonobos. This pan-pangenome effort complements and extends the recent efforts of the Human Pangenome Reference Consortium in providing evolutionary context for the extensive structural diversity of human haplotypes. Using this resource, we are characterizing the structure, composition, function, and evolutionary trajectories of shared genomic variation across humans, chimpanzees and bonobos, to shed light on the origins of previously identified and hitherto unknown ancestral trans-species polymorphisms.

Sunday, November 6

1. Lightning talk – Junhao (Bear) Xiong (Listgarten/Song Lab), Electrical Engineering and Computer Sciences
Modeling antibody-antigen interactions: a structural modeling perspective
Abstract: Antibodies are one of the most important classes of bio-therapeutics. Understanding how antibodies interact with their corresponding antigens is of both scientific and therapeutic interest. Due to the difficulty of experimentally characterizing the structures of antibody-antigen complexes, it is desirable to be able to computationally predict the paratope/epitope given structures of the antibody and antigen. However, the lack of experimental structures and the complexity of the interaction have rendered such a prediction task difficult. We hypothesize that the modeling of antibody-antigen interactions can benefit from the advances in ML-based structural modeling techniques, and would like to share some very early ideas and analyses on this problem.

2. Lightning talk – Maya Lemmon-Kishi (Nielsen Lab), Computational Biology 
A Penalized Likelihood Approach for Estimating Haplotypes from Environmental DNA
Abstract: Environmental DNA (eDNA) is a rich source of genetic data that, compared to conventional sampling methods, is non-invasive, captures a broader perspective of biodiversity, and enables long-term ecological studies. The use of eDNA is not limited to modern environmental samples. Ancient eDNA from lake sediments and permafrost can also be recovered and employed to explore patterns of biodiversity across the globe over thousands of years. Currently, eDNA analysis is limited to occupancy of species at sampling sites and is unable to explore the genetic structure of the populations detected. Exploring eDNA data through population genetics theory will increase our understanding of ecological and evolutionary processes across spatiotemporal scales. Despite the insights this combination can provide, population genetic analyses are stymied by the fact that eDNA is composed of short reads from unknown numbers of individuals. This causes the haplotypes required for population genetic analyses to be fragmented, leading to a loss of information. We describe a method to jointly infer phylogenetically compatible haplotypes and sample frequencies using a penalized likelihood. To maximize this penalized likelihood, we have developed an optimization algorithm that is able to efficiently and consistently optimize over a complex state space. By providing biologically relevant haplotypes and frequencies from eDNA, we will be able to explore shifts in biodiversity through the lens of population genetic theory.

3. Lightning talk – Stacy Li (Sudmant Lab), Integrative Biology
High-throughput haplotype and de novo mutation inference using linked-read sequencing
Abstract: Most population studies rely heavily on short-read sequencing and statistical imputation to infer individuals’ haplotypes. These methods limit evaluation of genomic context and variant effects to haplotypes available in reference panels. To address this challenge, we have developed a new protocol for haplotagging, a short-read sequencing technology that obtains true physical haplotype information at population scale. Briefly, haplotagging leverages the physical properties of DNA in solution to assign unique molecular identifiers (UMI) to individual DNA molecules. Libraries derived from the same strand of DNA will share the same UMI: these molecule-linked reads can be used to reconstruct megabase-sized blocks of physically phased DNA sequences, enabling de novo haplotype characterization and inference of long-range genomic architectures at population scale.

Our protocol extends this technique by increasing UMI complexity to the order of 10^9, enabling ultra-sensitive detection of de novo mutations and joint characterization of haplotype context. Presently, our lab has demonstrated applications of haplotagging in de novo haplotype reconstruction from chimpanzee cell lines. We will deploy haplotagging in service of two major projects: (1) characterizing the influence of haplotype in germline mutations during aging and (2) evaluating how haplotype context enhances targeting in CRISPR-based therapeutic strategies. Data from these two projects will provide greater breadth in understanding how genomic context influences evolutionary trajectories, as well as guide the development of functional genomic tools for addressing genetic disease.

4. Philippe Boileau – (Dudoit Lab), Biostatistics
uniCATE: Flexible Predictive Biomarker Discovery
Abstract: An endeavor central to precision medicine is predictive biomarker discovery: predictive biomarkers define patient subpopulations which stand to benefit most, or least, from a given treatment. However, the identification of such biomarkers is often the byproduct of the related, but fundamentally different, task of treatment rule estimation. Applying treatment rule estimation methods to identify predictive biomarkers often results in high false discovery rates. The higher than expected number of false positives may translate to a waste of resources when conducting follow-up experiments for drug target identification and diagnostic assay development. Patient outcomes are in turn negatively affected. We propose a variable importance measure for directly assessing the importance of potentially predictive biomarkers, and develop a flexible semiparametric estimation procedure, uniCATE, for this parameter. We prove that our estimator is double-robust and asymptotically linear under loose conditions on the data-generating process, permitting valid inference about the metric. The statistical guarantees of the method are verified in a thorough simulation study representative of randomized controlled trials with moderate and high-dimensional covariate vectors. Our procedure is then used to discover predictive biomarkers from among the tumor gene expression data of metastatic renal cell carcinoma patients enrolled in recently completed clinical trials. We find that our approach more readily discerns predictive from non-predictive biomarkers than procedures whose primary purpose is treatment rule estimation, and that these biomarkers delineate more clinically relevant patient subpopulations. An open-source software package for the R programming language of the same name, uniCATE, will be made available for general use.

5. Carmelle Catamura – (Lareau Lab), Computational Biology
Poison Exon Engineering Strategies for Haplotype-specific Depletion
Abstract: We are developing a novel genome engineering strategy that targets disease genes most resistant to CRISPR editing: dominant negative diseases caused by repeat expansions, such as Huntington’s disease. In eukaryotes, a surveillance pathway, nonsense-mediated mRNA decay (NMD), degrades mRNA transcripts containing exons with premature stop codons, referred to as “poison exons.” We take advantage of the NMD pathway for allele-specific depletion by engineering poison exons into the disease allele. To discriminate between the wild-type and disease allele, we leverage benign single nucleotide polymorphisms (SNPs) that commonly occur with the repeat expansion as biomolecular anchors. These SNP anchors are targets for CRISPR editing to engineer a poison exon. The mRNA from the edited allele, now containing the poison exon, is degraded by NMD. Although aimed at targeting pathogenic alleles, this strategy can be used to target any gene with a benign SNP anchor, allowing us to control allele-specific gene expression.

6. Boyan Xu – (Krasileva Lab), Mathematics
Structure-aware repeat annotation of protein solenoid domains
Abstract: The leucine-rich repeat (LRR) domain is responsible for effector binding in intracellular plant immune receptors. LRRs are known to undergo tandem duplications, deletions, and insertions, but existing work has not been able to analyze these mutations in detail. We develop a method to detect tandem repeat mutations. Applying the method to the model plant organism Arabidopsis thaliana, we uncovered higher rates of tandem mutations in receptors responsible for direct binding to pathogen effectors.

7. Isabel Serrano – (Sudmant Lab), Computational Biology 
Mitochondrial haplotype and mito-nuclear matching drive somatic mutation and selection through aging
Abstract: As cells age, both the nuclear and mitochondrial genomes (mt-genome) accumulate somatic mutations due to environmental and cellular processes. Relative to the nuclear genome, the mt-genome has a 10-fold higher somatic mutation rate. Whereas one nuclear genome exists in a given cell, 100 – 10,000 mt-genomes can coexist in a cell. Thus, mt-genomes present a subcellular population riddled with mutations. While somatic mutations have been associated with aging and age-related diseases, the evolutionary processes shaping the somatic mutational landscape remain largely unknown. In this study, we explore how mitochondrial haplotype and mito-nuclear ancestral mismatching influence the aggregation of somatic mutations through aging. To address these questions, we employ a panel of conplastic mice which are isogenic in their nuclear genome but differ by 1-3 variants in their mitochondrial haplotypes. Using ultra-sensitive Duplex Sequencing, we profiled approximately 4.6 million mitochondrial genomes, with an average of 141,000 mitochondrial genomes sequenced per condition, allowing us to ascertain mutations down to a frequency of 7e-6. We demonstrate tissue- and haplotype-specific age-associated somatic mutations. We identify regions in the mt-genome that are prone to the introduction of mutations and locate areas with a high mutation frequency driving an mt-haplotype-specific mutational profile. We observe age-associated signatures of selection in coding regions and explore how this mutational landscape is affected by mito-nuclear ancestral mismatching. We identify reversion mutations through aging, suggesting somatic selection for haplotype matching of nuclear and mitochondrial genomes. Together, our findings explore somatic evolution in the context of an important cellular organelle and begin to discern how evolutionary processes act to shape the population of mt-genomes.
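The quoted frequency floor of 7e-6 is consistent with a simple back-of-the-envelope check. The sketch below assumes, as an illustration rather than as the speakers' stated definition, that the lowest ascertainable frequency is one mutant observation among the average number of genomes sequenced per condition. The function name is hypothetical.

```python
def min_detectable_frequency(genomes_sequenced: int) -> float:
    """Lowest observable frequency if a single mutant genome is seen.

    Assumption for illustration: one supporting observation is enough
    to call a mutation at the given sequencing depth.
    """
    return 1.0 / genomes_sequenced


# Average depth per condition quoted in the abstract.
floor = min_detectable_frequency(141_000)
print(f"{floor:.1e}")  # prints 7.1e-06
```

Under that assumption, 1 / 141,000 is about 7.1e-6, which agrees with the figure in the abstract.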

8. Ryan Chung – (Ioannidis Lab), Computational Biology 
Tissue-specific impacts of aging and genetics on gene expression patterns in humans
Abstract: Age is the primary risk factor for many common human diseases including heart disease, Alzheimer’s dementias, cancers, and diabetes. Determining how and why tissues age differently is key to understanding the onset and progression of such pathologies. Here, we set out to quantify the relative contributions of genetics and aging to gene expression patterns from data collected across 27 tissues from 948 humans. Jointly modeling the contributions of age and genetics to transcript-level variation, we find that the heritability of gene expression is largely consistent among tissues. In contrast, the average contribution of aging to gene expression variance varied by more than 20-fold among tissues. We find that the coordinated decline of mitochondrial and translation factors is a widespread signature of aging across tissues. Finally, we show that while in general the force of purifying selection is stronger on genes expressed early in life compared to late in life, as predicted by Medawar’s hypothesis, a handful of highly proliferative tissues exhibit the opposite pattern. These non-Medawarian tissues exhibit high rates of cancer and age-of-expression associated somatic mutations in cancer. In contrast, gene expression variation that is under genetic control is strongly enriched for genes under relaxed constraint. Together, we present a novel framework for predicting gene expression phenotypes from genetics and age and provide insights into the tissue-specific relative contributions of genes and the environment to phenotypes of aging.

9. Diana Aguilar – (Nielsen Lab), Computational Biology 
Understanding strawberry poison frog color polymorphism in Bocas del Toro
Abstract: Oophaga pumilio is an iconic model system for studying the evolution and maintenance of color polymorphism. Bocas del Toro Province, Panama, hosts striking variation in color and pattern. We generated genetic resources for 347 individuals from ten different populations to test different hypotheses: recent selective sweeps, balancing selection, and random genetic drift. Using a demographic model, we show that the divergence times between populations predate the split of the islands. We found that kit is the gene responsible for the blue-red polymorphism in Dolphin Bay. Structural blue in this population is produced by having more melanosomes. We also found ttc39b as our top gene causing the yellow-red polymorphism in the Cemetery Bastimentos area. This gene has been recently identified as an enhancer of yellow to red carotenoid conversion in birds. Both of these genes show signatures of old balancing selection, which suggests the polymorphism is stable. We also studied the pattern of these frogs and found non-random associations between color and pattern.