Monthly Industry Seminar Series
Speaker Profile

Francisco De La Vega, Ph.D.
Distinguished Scientific Fellow and VP of SOLiD™ Applications and Bioinformatics
Applied Biosystems

"Next-Generation whole genome sequencing of a single human recapitulates population signatures of natural selection."

WHEN: TUESDAY, September 23, 2008, 4:00 PM
WHERE: 177 Stanley Hall
Pre-seminar coffee at 3:45 PM in B1 atrium.
Post-seminar reception at 5:00 PM in B1 atrium.



Abstract:
We obtained the genome sequence of an African Yoruba individual studied in the HapMap by sequencing fragment and mate-pair libraries of various sizes with the Applied Biosystems SOLiD™ System. We detected over 3.4 million SNPs; 81.4% are present in dbSNP and the remaining 18.6% can be considered personal or novel rare SNPs. SNPs are underrepresented in exons (1.98%) compared to introns/intergenic regions. Of the coding SNPs, 54.2% are silent, 45.2% are missense, and 0.6% are nonsense. We categorize the functions of genes using the Panther ontology, and we annotate the damaging potential of non-synonymous SNPs (nsSNPs) using predictions from PolyPhen. Of the nsSNPs in this sample, 20.5% are predicted to be damaging (compared to 33.2% in the current PolyPhen database). There are fewer homozygous damaging SNPs than heterozygous damaging SNPs, as one would expect given that many damaging SNPs will be recessive and will be expressed and selected against only in the homozygous state. We identified 49 SNP alleles previously associated with human disease (OMIM database), very few in the homozygous state and none highly penetrant. Approximately 137,000 small indels of ≤10 bp were identified and found to be significantly underrepresented in exons. Detailed exon analysis revealed that indels in exons are greatly overrepresented in the first and last exons, which contain the 5’ UTR and the 3’ UTR, respectively. The analysis of the distance and orientation of the paired-end reads allowed the identification of over 27,000 putative insertions and deletions ranging from 50 bp to several hundred kb. Variation in the latter size range has not been effectively detected in previously published personal genome sequences. About 254 genes are likely disrupted by the presence of such structural variants, 28 of which have been previously implicated in disease.
We also predict 50 inversions and 9 putative gene fusions resulting from deletions, involving 7 disease genes. Our results suggest that much more genetic variation, structural variation in particular, remains to be uncovered in human populations and must be considered to obtain a complete picture of functional variation in a personal genome sequence. As the throughput of sequencing increases and the cost of sequencing complete human genomes breaks the $10,000 barrier, studies to sequence thousands of human and cancer genomes are becoming feasible.


About the speaker:
Francisco De La Vega is Distinguished Scientific Fellow in Genetics and Vice President, Applications and Bioinformatics, at Applied Biosystems in Foster City, California. He earned his Doctor of Science degree in Genetics and Molecular Biology at CINVESTAV (Mexico City), studying the genetic regulation of protein biosynthesis in the bacterium Escherichia coli, initially at the lab bench and later shifting to computational analysis. In 1990 he was appointed assistant professor at CINVESTAV, where he was the head of the Bioinformatics research and service unit. He joined Applied Biosystems in 1997 to lead the bioinformatics efforts of the probe design pipelines for the 1700 Human array, the TaqMan® genomic assays, and the SNPlex™ Genotyping System, and later created the SNPbrowser™ Software, a tool to select SNPs for genetic studies. He also led the design and analysis of a pioneering project that genotyped over 200,000 SNPs in human populations to develop validated genotyping assays and survey the patterns of genetic variation. He currently works on the analysis of genetic variation with the SOLiD™ System. Francisco leads the collaboration of Applied Biosystems with the 1000 Genomes Project Consortium, and recently received the 2008 Bio-IT World Best Practices Award in Basic Research.