Novel approaches for detecting and analyzing structural variants in personal genomes
Final Report Abstract
Genomic structural variants (SVs), such as copy-number variants or large balanced inversions, represent a major form of genetic variation with an impact on the human genome similar to that of single nucleotide polymorphisms (SNPs). However, compared to SNPs, our understanding of SVs has been more limited. Before we began the research funded through this Emmy Noether project, the resolution of published surveys was insufficient for mapping the start- and end-points (i.e., breakpoints) of SVs, which hampered detailed analyses of SVs. We proposed to study SVs in unprecedented detail by developing computational approaches based on next-generation DNA sequencing data to identify SVs and analyze their extent in the genome; mining published datasets to unravel SV de novo formation processes; and applying approaches to measure SV de novo formation rates and to infer the influence of natural selection. We accomplished all three aspects. First, we systematically collected information on SVs with known breakpoints and standardized their description in a breakpoint library, which we initially published in 2011 and have continued to update in the literature since. Scanning short DNA sequencing reads against the library enables accurate SV detection and thus adds significant value to “personal genomes” that presently lack detailed SV analyses. We have also demonstrated that read depth analysis detects SVs reliably, and we have used these principles to obtain new insights into the effects of SVs on gene expression. Second, we developed a framework for untangling SV formation by devising computational approaches for inferring the likely ancestral state (by comparison with primate genomes) and the likely causal mutational mechanism (by breakpoint analysis) at each SV locus, a principle we have employed to study SV formation in humans, chimpanzees, orangutans and macaques.
Third, we have been estimating SV formation rates experimentally, and SV occurrence frequencies computationally, to assess the factors contributing to the abundance and distribution of SVs in the genome, which may involve both formation mechanism biases (such as fragile sites or “SV hotspots” in the genome) and evolutionary selection. Since our proposed sperm-based analyses of SV formation proved more challenging than anticipated at the application stage, we followed a contingency plan using the yeast Saccharomyces cerevisiae to study SV formation, which unexpectedly uncovered an exciting novel link between meiotic genes and DNA repair processes relevant to de novo SV formation. The research we have carried out so far has given us novel insights into common structural rearrangements in the genome, which have been published in “Nature”. Research from the Korbel group funded by this Emmy Noether project has received coverage in print and online media (such as “Hamburger Abendblatt” on February 3rd 2011, as well as online at several sites including “Science Daily”; see http://www.sciencedaily.com/releases/2011/02/110202132326.htm).
Publications
-
A map of human genome variation from population-scale sequencing. Nature 2010; 467(7319): 1061-73
1000 Genomes Project Consortium, Abecasis GR, Altshuler D, et al.
-
Systematic inference of copy-number genotypes from personal genome sequencing data reveals extensive olfactory receptor gene content diversity. PLoS Comput Biol 2010; 6(11): e1000988
Waszak SM, Hasin Y, Zichner T, et al.
-
Challenges in studying genomic structural variant formation mechanisms: The short-read dilemma and beyond. BioEssays 2011; 33:840-50
Onishi-Seebacher M & Korbel JO
-
Mapping copy number variation by population-scale genome sequencing. Nature 2011; 470(7332): 59-65
Mills RE, Walter K, Stewart C, et al.
-
Relating CNVs to transcriptome data at fine resolution: assessment of the effect of variant size, type, and overlap with functional regions. Genome Res 2011; 21(12): 2004-13
Schlattl A, Anders S, Waszak SM, Huber W, Korbel JO
-
An integrated map of genetic variation from 1,092 human genomes. Nature 2012; 491(7422): 56-65
1000 Genomes Project Consortium, Abecasis GR, Auton A, et al.
-
Genome Sequencing of Pediatric Medulloblastoma Links Catastrophic DNA Rearrangements with TP53 Mutations. Cell 2012; 148(1-2): 59-71
Rausch T, Jones DTW, Zapatka M, et al.
-
Impact of genomic structural variation in Drosophila melanogaster based on population-scale sequencing. Genome Res 2013; 23:568-79
Zichner T, Garfield DA, Rausch T, et al.
-
Phenotypic impact of genomic structural variation: insights from and for human disease. Nat Rev Genet 2013; 14:125-38
Weischenfeldt J, Symmons O, Spitz F, et al.
-
Primate genome architecture influences structural variation mechanisms and functional consequences. Proc Natl Acad Sci USA 2013; 110(39): 15764-9
Gokcumen O, Tischler V, Tica J, et al.
-
Natural variation in genome architecture among 205 Drosophila melanogaster Genetic Reference Panel lines. Genome Res 2014; 24:1193-208
Huang W, Massouras A, Inoue Y, et al.
-
An integrated map of structural variation in 2,504 human genomes. Nature 2015; 526: 75-81
Sudmant P, Rausch T, Gardner EJ, et al.
-
Analysis of deletion breakpoints from 1,092 humans reveals details of mutation mechanisms. Nat Commun 2015; 6:7256
Abyzov A, Li S, Kim DR, et al.