Speakers listed in order of presentation
Saturday, October 25
Session 1: 9:15 – 9:30 AM
Sarah Johnson (Moorjani Lab): Reconstructing Denisovan Ancestry Abstract: Gene flow from the extinct hominin group, Denisovans, has shaped the landscape of the modern human genome, and in some cases provided adaptive advantages to modern humans. Most non-Africans harbor ~0.1% Denisovan ancestry, while Oceanians carry ~4% Denisovan ancestry, a striking finding given that the only sequenced Denisovan genome was recovered in Siberia. Recent studies have suggested that up to 4 independent pulses of Denisovan gene flow contributed to modern humans, revealing a complex history of interaction between modern humans and Denisovans. Moreover, the sequenced Denisovan genome contains evidence of deeply diverged ancestry, potentially originating from Homo erectus. In this study we investigate Denisovan gene flow by analyzing patterns of Denisovan ancestry in >5,000 published present-day genomes collected across Asia and Oceania. We infer archaic ancestry within each individual and aggregate fragments across individuals to reconstruct a “synthetic” Denisovan genome for each population. These synthetic genomes are up to 1 gigabase long, approaching the 1.6-gigabase length of the high-coverage sequenced Denisovan genome, which demonstrates the power of this approach for recovering Denisovan genetic material from modern populations. We find the signal of deeply diverged ancestry in all synthetic Denisovan genomes, suggesting that this ancestry was present prior to Denisovan gene flow into modern humans. Finally, we develop a method that extends LD-decay-based dating approaches to directly use the covariance of local ancestry fragments to infer the time of Denisovan admixture. Our simulations suggest this method is accurate for gene flow that occurred 30,000–60,000 years ago, the expected range of Denisovan gene flow into modern humans. We apply our method to present-day populations to infer the dates of Denisovan gene flow into modern humans.
Our results demonstrate the strength of using synthetic Denisovan genomes and provide an exciting new approach to dating Denisovan gene flow in modern populations.
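To make the dating idea concrete, here is a minimal sketch of classical LD-decay-style dating, not the speaker's covariance-based method. It assumes a single admixture pulse t generations ago, so that ancestry covariance between two sites d Morgans apart decays roughly as A·exp(−t·d); taking logs gives a straight line with slope −t. The function names, the synthetic curve, and the 29-year generation time are illustrative assumptions.

```python
import math
from typing import Sequence

def estimate_admixture_time(distances_morgans: Sequence[float],
                            covariances: Sequence[float]) -> float:
    """Fit cov(d) = A * exp(-t * d) by least squares on log(cov); return t in generations.

    Assumes a single admixture pulse and strictly positive covariances.
    """
    if len(distances_morgans) != len(covariances) or len(covariances) < 2:
        raise ValueError("need at least two matched (distance, covariance) points")
    xs = list(distances_morgans)
    ys = [math.log(c) for c in covariances]  # log-linearize the exponential decay
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -(sxy / sxx)  # the negated slope is the decay rate, in generations

def generations_to_years(t_generations: float, years_per_generation: float = 29.0) -> float:
    """Convert generations to years; 29 years per generation is an assumed value."""
    return t_generations * years_per_generation

# Synthetic, noise-free decay curve for a hypothetical pulse 1,500 generations ago.
true_t = 1500.0
ds = [0.0001 * i for i in range(1, 21)]  # 0.01 cM to 0.2 cM, expressed in Morgans
covs = [0.02 * math.exp(-true_t * d) for d in ds]
t_hat = estimate_admixture_time(ds, covs)
print(round(t_hat), round(generations_to_years(t_hat)))  # prints: 1500 43500
```

Real data would add noise, background LD, and a fitted affine term; the log-linear fit is used here only because it makes the time-from-decay-rate logic visible.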
9:30 – 9:45 AM
Maya Lemmon-Kishi (Nielsen Lab): Decoding Time: A Phylogenetic Framework for Molecular Dating of Sedimentary Ancient DNA Abstract: Sedimentary ancient DNA (sedaDNA) from permafrost, lake, and marine sediments provides a rich source of genetic data that captures broad perspectives on past biodiversity. Discovering ecologically relevant patterns from sedaDNA requires accurate dating, yet samples are increasingly too old for C-14 dating. While molecular dating allows sample ages to be estimated from the recovered genetic material itself, the fragmented and damaged nature of short-read ancient DNA poses significant challenges. We have developed ratePlacer, a phylogeny-based method for analyzing sedaDNA that combines information from many short reads in a sample while accounting for DNA damage to provide maximum likelihood estimates of sample ages. By applying ratePlacer to a diverse set of sedaDNA samples from various time points, we establish a timeline that allows us to contextualize exceptionally old samples from sites such as Kap København and the Fyles Leaf Beds against younger, C-14-dated samples. This comprehensive dating approach enhances our understanding of the age distribution of recoverable genetic material and expands our ability to study ancient ecosystems.
9:45 – 10:00 AM
Yulin Zhang (Moorjani Lab): Identifying footprints of archaic admixture in modern humans Abstract: The sequencing of the Neanderthal and Denisovan genomes has revolutionized our understanding of archaic contributions to modern humans. Yet a significant portion of human evolutionary history remains elusive, in particular the impact of introgression from archaic lineages for which no genomic reference sequences exist. Detecting such “ghost” admixture is inherently challenging, as most available methods rely on either a sequenced archaic reference genome or unadmixed outgroup populations. We introduce ARG-HMM, a new approach that identifies archaic ancestry segments in modern human genomes by leveraging features of the ancestral recombination graph (ARG) constructed from contemporary genomes alone, without requiring an archaic genome or an unadmixed outgroup. Our method is based on a hidden Markov model (HMM) that uses two key hallmarks of introgression: “long branches” in the coalescent tree, which reflect deep lineages, and “long haplotypes,” contiguous ancestry segments that persist due to limited recombination following introgression. We evaluated the performance of ARG-HMM under a range of demographic scenarios and compared it with existing archaic ancestry inference methods, such as hmmix and Sprime, that use unadmixed outgroup genomes. Using simulated ARGs, we show that ARG-HMM achieves high sensitivity and precision. With inferred ARGs (e.g., from SINGER), it achieves comparable performance and outperforms existing methods. Applying ARG-HMM to 1000 Genomes Project data, we recover introgressed regions enriched for known Neanderthal and Denisovan ancestry in non-Africans, validating our method against established results. We also identify previously unreported archaic introgression signals in both African and non-African populations, including segments with site frequency spectra indicative of gene flow from an uncharacterized hominin lineage.
Our results demonstrate the power of ARG-based approaches to recover hidden episodes of gene flow in human history and offer a scalable path toward mapping archaic ancestry in modern humans, even in the absence of archaic DNA samples.
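As a rough illustration of how an HMM turns per-window evidence into contiguous segments, the sketch below runs a two-state (modern/archaic) Viterbi decoder over a binary per-window signal, such as whether a window's coalescent tree contains a long branch. This is a generic HMM example, not ARG-HMM itself; the binary emissions, all probabilities, and the function name are hypothetical. Sticky transitions stand in for the "long haplotype" hallmark, because they favor long runs of the same state.

```python
import math
from typing import List, Sequence

STATES = ("modern", "archaic")

def viterbi(obs: Sequence[int],
            start: Sequence[float],
            trans: Sequence[Sequence[float]],
            emit: Sequence[Sequence[float]]) -> List[str]:
    """Most likely hidden-state path for binary observations, computed in log space."""
    n_states = len(start)
    # score[s]: best log-probability of any path ending in state s at the current window
    score = [math.log(start[s]) + math.log(emit[s][obs[0]]) for s in range(n_states)]
    backptr: List[List[int]] = []
    for o in obs[1:]:
        new_score, ptrs = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda p: score[p] + math.log(trans[p][s]))
            new_score.append(score[best_prev] + math.log(trans[best_prev][s]) + math.log(emit[s][o]))
            ptrs.append(best_prev)
        score = new_score
        backptr.append(ptrs)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptrs in reversed(backptr):
        state = ptrs[state]
        path.append(state)
    return [STATES[s] for s in reversed(path)]

# Hypothetical parameters: archaic segments are rare but persistent (sticky transitions),
# and long branches (obs = 1) are much more likely inside archaic segments.
start = [0.98, 0.02]
trans = [[0.98, 0.02],   # from modern
         [0.05, 0.95]]   # from archaic
emit = [[0.9, 0.1],      # modern: P(no long branch), P(long branch)
        [0.2, 0.8]]      # archaic
obs = [0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0]
print(viterbi(obs, start, trans, emit))
```

Note that window 5 has no long branch yet is still labeled archaic: switching states twice costs more than one unlikely emission, which is how the transition model encodes haplotype length.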
10:00 – 10:15 AM
Kaiyuan Li (Nielsen Lab): Morphology Evolution on Phylogenetic Trees with Local MCMC Algorithm Abstract: Modeling morphological evolution along phylogenies requires capturing both stochastic evolutionary processes and the geometric structure of biological shapes. We present a new generative framework that combines Brownian motion with Large Deformation Diffeomorphic Metric Mapping (LDDMM)–based covariance structures to model the evolution of high-dimensional landmark data. To perform inference under this model, we develop a GPU-accelerated Markov chain Monte Carlo (MCMC) algorithm that exploits the conditional independence structure of phylogenies. Specifically, we introduce local MCMC updates, where shapes at conditionally independent nodes are updated in parallel, enabling efficient exploration of high-dimensional posteriors. Our method supports hierarchical priors on evolutionary parameters, such as rate scaling ($k_\alpha$) and spatial scale ($k_\sigma$), and can flexibly incorporate extensions including branch-type partitioning, within-species variation, and axis-dependent correlations. By leveraging JAX and just-in-time compilation, the algorithm achieves substantial speedups compared to standard implementations, making inference feasible on phylogenetic trees with hundreds of nodes and landmarks. We validate our approach using simulation studies and demonstrate accurate recovery of both per-branch evolutionary rates and latent ancestral shapes. Application to empirical datasets, such as butterfly wings and bird beaks, highlights the method’s ability to infer not only species-specific morphology but also evolutionary heterogeneity across lineages. This framework provides a principled Bayesian approach to integrating geometric morphometrics and phylogenetic inference, offering new insights into the tempo and mode of morphological evolution. 
Beyond morphology, the local MCMC strategy illustrates a general principle for scaling Bayesian inference on tree-structured models, with potential applications to comparative genomics and population genetics.
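The local MCMC idea rests on conditional independence in a tree: each node's state depends only on its parent and its children, which always sit at the opposite depth parity. Given all odd-depth nodes, the even-depth nodes are therefore conditionally independent of one another, and vice versa. The minimal sketch below, which assumes a hypothetical parent-array tree encoding, splits nodes into these two blocks, each of which could be updated in parallel. It is illustrative only and does not reproduce the speaker's JAX implementation or LDDMM model.

```python
from typing import Dict, List, Optional, Sequence

def depth_parity_blocks(parent: Sequence[Optional[int]]) -> Dict[int, List[int]]:
    """Split tree nodes into two blocks by depth parity.

    parent[i] is the index of node i's parent, or None for the root.
    In a tree-structured model, a node's Markov blanket is its parent plus its
    children, which always have the opposite depth parity, so nodes within one
    block are conditionally independent given the other block.
    """
    depth: Dict[int, int] = {}

    def get_depth(i: int) -> int:
        # Walk up toward the root, memoizing depths along the way.
        if i not in depth:
            p = parent[i]
            depth[i] = 0 if p is None else get_depth(p) + 1
        return depth[i]

    blocks: Dict[int, List[int]] = {0: [], 1: []}
    for node in range(len(parent)):
        blocks[get_depth(node) % 2].append(node)
    return blocks

# Hypothetical 7-node binary tree: 0 is the root, 1 and 2 its children, 3-6 leaves.
parent = [None, 0, 0, 1, 1, 2, 2]
blocks = depth_parity_blocks(parent)
print(blocks)  # {0: [0, 3, 4, 5, 6], 1: [1, 2]}

# One sweep of a blocked sampler would update every node in blocks[0] in parallel
# (conditioning on blocks[1]), then every node in blocks[1] (conditioning on blocks[0]).
```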
Session 2: 10:30 – 10:45 AM
Yu-Jen (Jennifer) Lin (Brenner Lab): Diagnostic analysis of rare genomes (DART) Abstract: For many patients with rare genetic disease, whole-genome sequencing enables diagnosis, yet ~50% of cases remain unsolved. RNA-seq is a complementary readout, but most pipelines hunt for “outliers” relative to appropriate controls, such as unusually high or low expression. These approaches have yielded successes, but their performance has been disappointingly limited. We suspect this is because the largest transcript changes often reflect downstream consequences of the true cause: a variant in a gene whose own expression may be unchanged while it induces broad shifts elsewhere. We present DART, which reframes analysis from isolated outliers to whole-transcriptome state matching. The motivating intuition for DART is that the transcriptome as a whole reflects a disease state originating with specific variants, even though there may be no way to reconstruct the series of mechanisms by which the genetic variation produced the observed transcriptome. DART therefore relies on a library of gene-targeted reference profiles assembled from large-scale Perturb-seq screens, each profile obtained by perturbing one gene and recording the resulting expression pattern. For a given patient, DART then compares the patient’s cell-state pattern to this library and returns a shortlist of the genes most consistent with the observed state. We trained on a large single-cell compendium that targeted ~2,000 genes and measured expression across ~8,000 genes, and we evaluated on additional single-cell, pseudobulk (aggregated), and bulk datasets. Because geneticists typically review only a brief shortlist, we assess performance using top-k accuracy. Across held-out single-cell datasets, DART consistently outperforms outlier-based methods on top-k accuracy.
DART treats a patient's whole transcriptome as a cell-state signature, comparing it to a library of gene-targeted reference profiles to shortlist genes with likely causal variants, even when the causal gene's own expression is unaltered but its variant precipitates a transcriptome-wide cascade.
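Top-k accuracy counts a case as solved if the true causal gene appears among the k highest-ranked candidates. The sketch below ranks a library of per-gene reference profiles by Pearson correlation with a patient profile and computes top-k accuracy over a set of cases. Correlation as the matching score, the toy profiles, the gene labels, and all function names are assumptions for illustration; DART's actual matching procedure may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rank_genes(patient: Sequence[float], library: Dict[str, Sequence[float]]) -> List[str]:
    """Return perturbed genes ordered from most to least similar to the patient profile."""
    return sorted(library, key=lambda g: pearson(patient, library[g]), reverse=True)

def top_k_accuracy(cases: Sequence[Tuple[Sequence[float], str]],
                   library: Dict[str, Sequence[float]], k: int) -> float:
    """Fraction of cases whose true causal gene is in the top-k shortlist."""
    hits = sum(true_gene in rank_genes(profile, library)[:k] for profile, true_gene in cases)
    return hits / len(cases)

# Hypothetical library: expression changes across 4 readout genes for 3 perturbations.
library = {
    "GENE_A": [2.0, -1.0, 0.5, 0.0],
    "GENE_B": [-1.0, 2.0, 0.0, 0.5],
    "GENE_C": [0.0, 0.5, -2.0, 1.0],
}
# Hypothetical patients, each paired with their (known) causal gene.
cases = [
    ([1.8, -0.7, 0.4, 0.1], "GENE_A"),
    ([0.2, 0.4, -1.5, 1.2], "GENE_C"),
    ([-0.5, 1.0, -0.8, 0.9], "GENE_B"),
]
print(top_k_accuracy(cases, library, k=1), top_k_accuracy(cases, library, k=2))
```

In this toy example the third patient's profile correlates slightly better with GENE_C than with its true gene, so it is missed at k = 1 (top-1 accuracy 2/3) but recovered at k = 2 (top-2 accuracy 1.0), which is why a shortlist metric suits manual review.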
|
10:45 – 11:00 AM
Kristina Garske (Ayroles Lab): Rural-to-urban transitions and the genetics of non-communicable disease risk Abstract: Humans adapted to subsistence lifestyles that differ markedly from those of today, and the global rise in non-communicable diseases (NCDs) may partly reflect an “evolutionary mismatch” between past adaptations and modern environments. We are investigating kidney and cardiometabolic health in the Turkana of northwest Kenya, a historically pastoralist population now undergoing a rapid rural-to-urban transition. Our prior genomic analyses of >300 Turkana individuals revealed signatures of adaptation to heat stress, dehydration, and a high-protein diet. Building on this, we are testing whether genetic variants that were once beneficial or neutral become maladaptive in urban settings. To do so, we integrate whole-genome sequencing, gene expression, chromatin accessibility, and metabolic phenotyping across individuals living along the rural–urban gradient. This approach allows us to map genotype-by-environment interactions and pinpoint molecular mechanisms that contribute to elevated NCD risk under modern lifestyles.
11:00 – 11:15 AM
Shuyi Yang (Marshall Lab): Spatial Close-Kin Mark-Recapture Methods to Estimate Dispersal Parameters and Barrier Strength for Mosquitoes Abstract: Close-kin mark-recapture (CKMR) methods have recently been used to infer demographic parameters for several aquatic and terrestrial species. For mosquitoes, the spatial distribution of close-kin pairs has been used to estimate mean dispersal distance, which is relevant to vector-borne disease transmission and genetic biocontrol strategies. Close-kin methods have advantages over traditional mark-release-recapture (MRR) methods because the mark is genetic, removing the need for physical marking and recapture, which may interfere with movement behavior. Here, we extend CKMR methods to accommodate spatial structure alongside life history for mosquitoes and comparable insects. We derive kinship probabilities for parent-offspring and full-sibling pairs in a spatial context, where either individual in a pair may be a larva or an adult. Using the dengue vector Aedes aegypti as a case study, we use an individual-based model of mosquito life history to test how effectively this approach estimates parameters such as mean dispersal distance, daily staying probability, and the strength of a barrier to movement. For a simulated population of 9,025 adult mosquitoes arranged on a 19-by-19 grid, we find that the CKMR approach provides unbiased and precise estimates of mean dispersal distance given a total of 2,500 adult females sampled over a three-month period using 25 traps evenly spread throughout the landscape. The CKMR approach can also estimate parameters of more complex dispersal kernels, such as the daily staying probability of a zero-inflated exponential kernel or the strength of a barrier to movement, provided these parameters are greater than 0.5. These results suggest that CKMR provides an insightful characterization of mosquito dispersal that complements conventional MRR methods.
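The zero-inflated exponential kernel mentioned above can be stated compactly: each day a mosquito stays put with probability p (the daily staying probability) and otherwise moves a distance drawn from an exponential distribution with mean μ, giving a mean daily displacement of (1 − p)·μ. The minimal sketch below, with hypothetical parameter values and function names, samples daily displacements and compares them with that analytic mean; it says nothing about how the speaker's CKMR likelihood is built.

```python
import random
from typing import List

def sample_daily_displacement(p_stay: float, mean_move: float, n: int,
                              rng: random.Random) -> List[float]:
    """Draw n daily displacement distances from a zero-inflated exponential kernel.

    With probability p_stay the distance is exactly 0; otherwise it is
    exponential with mean mean_move (same units as mean_move, e.g. meters).
    """
    out = []
    for _ in range(n):
        if rng.random() < p_stay:
            out.append(0.0)
        else:
            out.append(rng.expovariate(1.0 / mean_move))  # expovariate takes the rate, 1/mean
    return out

def expected_daily_displacement(p_stay: float, mean_move: float) -> float:
    """Analytic mean of the zero-inflated exponential kernel."""
    return (1.0 - p_stay) * mean_move

rng = random.Random(42)  # fixed seed so the run is reproducible
draws = sample_daily_displacement(p_stay=0.75, mean_move=50.0, n=100_000, rng=rng)
empirical = sum(draws) / len(draws)
print(round(empirical, 1), expected_daily_displacement(0.75, 50.0))
```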
|
Session 3: 4:00 – 4:15 PM
Chengzhong Ye (Song Lab): Predicting functional constraints across evolutionary timescales with phylogeny-informed genomic language models Abstract: Genomic language models (gLMs) have emerged as a powerful approach for learning genome-wide functional constraints directly from DNA sequences. However, standard gLMs adapted from natural language processing often require extremely large model sizes and computational resources, yet still fall short of classical evolutionary models in predictive tasks. Here, we introduce GPN-Star (Genomic Pretrained Network with Species Tree and Alignment Representation), a biologically grounded gLM featuring a phylogeny-aware architecture that leverages whole-genome alignments and species trees to model evolutionary relationships explicitly. Trained on alignments spanning vertebrate, mammalian, and primate evolutionary timescales, GPN-Star achieves state-of-the-art performance across a wide range of variant effect prediction tasks in both coding and non-coding regions of the human genome. Analyses across timescales reveal task-dependent advantages of modeling more recent versus deeper evolution. To demonstrate its potential to advance human genetics, we show that GPN-Star substantially outperforms prior methods in prioritizing pathogenic and fine-mapped GWAS variants; yields unprecedented enrichments of complex trait heritability; and improves power in rare variant association testing. Extending beyond humans, we train GPN-Star for five model organisms – Mus musculus, Gallus gallus, Drosophila melanogaster, Caenorhabditis elegans, and Arabidopsis thaliana – demonstrating the robustness and generalizability of the framework. Taken together, these results position GPN-Star as a scalable, powerful, and flexible new tool for genome interpretation, well suited to leverage the growing abundance of comparative genomics data. |
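A common way to turn a genomic language model into a variant effect score is the log-likelihood ratio between the alternate and reference alleles at a site; strongly negative values suggest the alternate allele is disfavored, i.e., the site is under constraint. The sketch assumes a hypothetical model output, a probability for each nucleotide at the variant site, and does not call GPN-Star itself; whether GPN-Star scores variants exactly this way is an assumption here.

```python
import math
from typing import Dict

def llr_score(nucleotide_probs: Dict[str, float], ref: str, alt: str) -> float:
    """log P(alt) - log P(ref) at one site; more negative means more constrained."""
    total = sum(nucleotide_probs.values())
    if not math.isclose(total, 1.0, rel_tol=1e-6):
        raise ValueError(f"probabilities must sum to 1, got {total}")
    return math.log(nucleotide_probs[alt]) - math.log(nucleotide_probs[ref])

# Hypothetical model outputs at two sites.
conserved_site = {"A": 0.01, "C": 0.01, "G": 0.97, "T": 0.01}
neutral_site = {"A": 0.30, "C": 0.20, "G": 0.30, "T": 0.20}

print(llr_score(conserved_site, ref="G", alt="A"))  # about -4.57: alt strongly disfavored
print(llr_score(neutral_site, ref="G", alt="A"))    # 0.0: no preference between alleles
```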
|
4:15 – 4:30 PM |
Junhao (Bear) Xiong (Listgarten, Song Labs): ProteinGuide: On-the-fly property guidance for protein sequence generative models Abstract: Generative machine learning models of sequences are transforming protein engineering. However, no principled framework exists for conditioning these models on auxiliary information, such as experimental data, in an on-the-fly manner. Here we present ProteinGuide, a principled and general method for conditioning that unifies a broad class of protein generative models under a single framework. We demonstrate the applicability of ProteinGuide by guiding two protein generative models, ProteinMPNN and ESM3, to generate amino acid and structure token sequences conditioned on several user-specified properties, such as enhanced stability, enzyme classes, and CATH-labeled folds. We also used ProteinGuide with inverse folding models and our own experimental assay to design adenine base editor sequences with high activity.
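The basic principle behind property guidance is Bayes' rule: to sample from p(x | y) for a desired property y, reweight the generative model's prior p(x) by a property predictor's likelihood p(y | x). The minimal sketch below applies this at a single sequence position, using hypothetical token probabilities from a generative model and hypothetical predictor outputs p(y | token). It is a generic classifier-guidance illustration, not ProteinGuide's implementation, which the abstract does not detail; the guidance-strength exponent is also an assumption.

```python
import math
from typing import Dict

def guided_distribution(prior: Dict[str, float], likelihood: Dict[str, float],
                        guidance_strength: float = 1.0) -> Dict[str, float]:
    """Posterior over tokens: p(a | y) proportional to p(a) * p(y | a) ** guidance_strength.

    prior: generative model's token probabilities at one position.
    likelihood: predictor's probability that the property holds for each token.
    guidance_strength = 1.0 recovers exact Bayes' rule; 0.0 returns the prior.
    """
    log_unnorm = {a: math.log(prior[a]) + guidance_strength * math.log(likelihood[a])
                  for a in prior}
    m = max(log_unnorm.values())  # subtract the max before exponentiating, for stability
    weights = {a: math.exp(v - m) for a, v in log_unnorm.items()}
    z = sum(weights.values())
    return {a: w / z for a, w in weights.items()}

# Hypothetical: the model prefers L, but a stability predictor favors I and V.
prior = {"L": 0.6, "I": 0.2, "V": 0.2}
likelihood = {"L": 0.1, "I": 0.8, "V": 0.5}
posterior = guided_distribution(prior, likelihood)
print({a: round(p, 3) for a, p in posterior.items()})
```

Working in log space keeps the reweighting stable when many small probabilities are multiplied, which matters once the same update is applied across long sequences.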
|
4:30 – 4:45 PM |
Claire LeBlanc (Staller Lab): Interpretable biophysical neural networks of transcriptional activation domains separate roles of protein abundance and coactivator binding Abstract: Deep neural networks have improved the accuracy of many difficult prediction tasks in biology, but it remains challenging to interpret these networks and learn molecular mechanisms from them. Here, we address the interpretability challenges associated with predicting transcriptional activation domains from protein sequence. Activation domains, the regions within transcription factors that drive gene expression, were traditionally difficult to predict because of their sequence diversity and poor conservation. Multiple deep neural networks can now accurately predict activation domains, but these predictors are difficult to interpret. With the goal of interpretability, we designed simple neural networks that incorporate biophysical models of activation domains. The simplicity of these neural networks allowed us to visualize their parameters and directly interpret what the networks learned. The biophysical neural networks revealed two new ways in which the arrangement (i.e., the sequence grammar) of activation domains controls function: 1) hydrophobic residues both increase activation domain strength and decrease protein abundance, and 2) acidic residues control both activation domain strength and protein abundance. Notably, the biophysical neural networks helped us recognize the same signatures in complex interpretations of the deeper neural networks. We demonstrate how combining biophysical and deep neural networks maximizes both prediction accuracy and interpretability to yield insights into biological mechanisms.
4:45 – 5:00 PM
Xinyi Yang (Titov Lab): Data Driven Simulations and Experiments to Study Glucose Metabolism Abstract: Glycolysis is the central pathway of energy metabolism that converts glucose to lactate and produces ATP. Extensive biochemical studies have established that four glycolytic enzymes, hexokinase (HK), phosphofructokinase (PFK), glyceraldehyde-3-phosphate dehydrogenase (GAPDH), and pyruvate kinase (PK), are allosterically regulated by a dozen effectors. However, it is largely unknown what physiological properties allosteric regulation enables. Our lab has developed and validated a mathematical model of glycolysis predicting that the role of allosteric regulation of HK and PFK is to maintain high ATP levels and to prevent accumulation of intermediates by inhibiting the reaction of Harden and Young. The overarching goal of this project is to experimentally validate these predictions in human cells and in a reconstituted glycolysis system using HK and PFK enzyme variants that lack allosteric regulation.
|
Session 4: 5:15 – 5:30 PM
Samvardhini Sridharan (Sudmant Lab): Rapid turnover and recurrent structural variation at the 17q21.31 locus in humans and non-human primates Abstract: The 17q21.31 locus in humans harbors several complex structural haplotypes, including a ∼970 kb inversion. Different inversion haplotypes have been associated with susceptibility to microdeletions causing Koolen-de Vries syndrome and with variation in fecundity and recombination rates. Here, using 210 haplotype-resolved human genome assemblies and pangenome graph-based approaches, we characterize 11 distinct structural haplotypes, several of which have not been previously described. Extending our analyses to a set of haplotype-resolved great ape genomes, we characterize the structure of an independent inversion in chimpanzees that extends an additional 650 kb, encompasses 5 additional genes, and is ∼2 million years younger than the human inversion. We further determine that gorillas carry an independent duplication of the KANSL1 gene, which may predispose them to the microdeletions that cause Koolen-de Vries syndrome. Using short-read sequencing data, we characterize 17q21.31 haplotype diversity worldwide in ∼5,174 individuals from 107 populations, finding increased frequencies of KANSL1 duplication-containing haplotypes in both European and South Asian populations, as well as 8 double recombination events between inverted and non-inverted haplotypes ranging in size from 20 to 180 kb. Finally, using 626 ancient Eurasian human genomes, we show that the frequency of haplotypes containing KANSL1 duplications has increased ∼6-fold over the past 12 thousand years in Europe. Together, our results highlight the dynamics, complexity, and recurrent, independent evolution of a medically relevant locus across humans and great apes.
5:30 – 5:45 PM
Scott Ferguson (Sudmant Lab): T2T primate genomes reveal 60 million years of structural variation and karyotype evolution Abstract: High-quality reference genomes are essential for identifying the genetic mechanisms underlying phenotypic differences between species. The sequencing of the human genome, followed by several primate genomes, significantly advanced our understanding of primate genetics and diversity, providing valuable insights into the evolution of our species and its close relatives. However, of the approximately 500 extant primate species, only about 15 have high-quality reference genomes. We are constructing complete, near telomere-to-telomere (T2T), reference-quality genomes for 50 diverse primate species, which will greatly expand the number of available high-quality genomes. For each species, high-molecular-weight DNA is obtained from a curated set of Coriell cell lines. From this DNA, we generate PacBio HiFi reads, ultra-long Oxford Nanopore Technologies (ONT) reads, and Hi-C reads. Additionally, to enable annotation, we are generating long-read RNA sequencing data (PacBio Kinnex). To date, we have produced 10 near-T2T primate genomes, which we are using to investigate the evolution of primate karyotypes over 60 million years. A key focus is identifying structural variants (SVs) within and between species. The high-quality, haplotype-resolved genomes allow comprehensive detection of SVs, including insertions, deletions, inversions, duplications, and translocations. We are examining the association of these SVs with genes, exploring their potential impact on gene expression and function. Furthermore, we are analyzing the role of SVs in primate adaptation and divergence by examining their distribution across species and searching for signatures of conservation and selection. This work enables the analysis of evolutionary relationships, including incomplete lineage sorting and large-scale karyotype changes.
Additionally, the resources created by this project support comparative genomics, evolutionary studies, and biomedical research.
5:45 – 6:00 PM
Johnathan Lo (Sudmant, Boots Labs): Catostomid genomics Abstract: Whole genome duplications (WGDs) result in both evolutionary opportunities and increased selective pressure. Evolutionary opportunities arise from relaxed constraints, allowing subfunctionalization, neofunctionalization, regulatory rewiring, and escape from adaptive conflict. On the other hand, WGDs can also produce added selective pressure, especially in allopolyploids, when regulatory mechanisms come into conflict between the two parental lineages. Here I present preliminary findings on subgenome dominance and homeolog partitioning in an allotetraploid fish complex. |