Francisco De La Vega, Ph.D.
Distinguished Scientific Fellow and VP of SOLiD Applications and Bioinformatics
Applied Biosystems
"Next-Generation whole genome sequencing of a single human recapitulates population signatures of natural selection."
WHEN: TUESDAY, September 23, 2008, 4:00PM
Abstract:
We obtained the genome sequence of an African
Yoruba individual studied in the HapMap by sequencing fragment and
mate-pair libraries of various sizes with the Applied Biosystems SOLiD™
System. We detected over 3.4 million SNPs; 81.4% are present in dbSNP
and the remaining 18.6% can be considered personal or novel rare SNPs.
SNPs are under-represented in exons (1.98%) compared to
introns/intergenic regions. Of the coding SNPs, 54.2% are silent, 45.2%
are missense, and 0.6% are nonsense. We categorize the functions of
genes using the Panther ontology, and we annotate the damaging potential
of non-synonymous SNPs (nsSNPs) using predictions from PolyPhen. 20.5%
of nsSNPs in this sample are predicted to be damaging (compared to 33.2%
in the current PolyPhen database). There are fewer homozygous damaging
SNPs than heterozygous damaging SNPs, as one would expect given that
many damaging SNPs will be recessive and will be expressed and selected
against only in the homozygous state. We identified 49 SNP alleles
previously associated with human disease (OMIM database), very few in the
homozygous state and none of a highly penetrant nature. Approximately
137,000 small indels in the size range of ≤10bp were identified and
found to be significantly underrepresented in exons. Detailed exon analysis
revealed that indels in exons are greatly overrepresented in the first
and last exons, which contain the 5’ UTR and the 3’ UTR,
respectively. The analysis of the distance and orientation of the paired
end reads allowed the identification of over 27,000 putative insertions
and deletions ranging from 50bp to several hundred Kb. The type of
variation in the latter size range has not been effectively detected in
previously published personal genome sequences. About 254 genes are
likely disrupted by the presence of such structural variants, 28 of
which have been previously implicated in disease. We also predict 50
inversions and 9 putative gene fusions resulting from deletions,
involving 7 disease genes. Our results suggest that much more genetic
variation remains to be uncovered in human populations, in particular
structural variation, which must be considered to obtain a complete picture of
functional variation in a personal genome sequence. As the throughput of
sequencing increases, and the cost of sequencing complete human genomes
breaks the $10,000 barrier, studies to sequence thousands of human and
cancer genomes are becoming feasible.
About the speaker:
Francisco De La Vega is Distinguished Scientific Fellow in Genetics and Vice President, Applications and Bioinformatics, at Applied Biosystems in Foster City, California. He earned his Doctor of Science degree in Genetics and Molecular Biology at CINVESTAV (Mexico City), studying the genetic regulation of protein biosynthesis in the bacterium Escherichia coli, initially at the lab bench and later shifting to computational analysis. In 1990 he was appointed assistant professor at CINVESTAV, where he was the head of the Bioinformatics research and service unit. He then joined Applied Biosystems in 1997 to lead the bioinformatics efforts of the probe design pipelines for the 1700 Human array, the TaqMan® genomic assays, and the SNPlex™ Genotyping System, and later created the SNPbrowser™ Software, a tool to select SNPs for genetic studies. He also led the design and analysis of a pioneering project that genotyped over 200,000 SNPs in human populations to develop validated genotyping assays and survey the patterns of genetic variation. Currently he is working on the analysis of genetic variation with the SOLiD™ System. Francisco leads the collaboration of Applied Biosystems with the 1000 Genomes Project Consortium, and recently received the 2008 Bio-IT World Best Practices Award in Basic Research.