Project Details

Enabling haplotype-level genomics: Whole-chromosome integrative read-based phasing

Subject Area: Bioinformatics and Theoretical Biology
Term: from 2018 to 2022
Project identifier: Deutsche Forschungsgemeinschaft (DFG), project number 395192176
 
We have entered an era in which genomics significantly affects individuals and society. Recent advances in sequencing technology are transforming medical and fundamental research: large genotype-phenotype studies are now carried out routinely and yield new insights into the genetic basis of disease and drug response. These advances in medical genomics enable precision-medicine approaches to treating patients, which are becoming increasingly widespread and successful. Other fields, such as population genomics, benefit from the ability to study millions of loci in large populations.

However, individual genomes are currently studied predominantly at the level of genotypes. Genotyping refers to determining the two alleles (one inherited from each parent) present at a particular genetic locus and can be achieved using various technologies, including microarrays and short-read sequencing. Genotype-level data do not reveal whether a heterozygous variant resides on the paternal or the maternal chromosomal copy, so the information passed on to downstream analyses is incomplete. The full sequences of the two chromosomal copies are known as haplotypes, and moving from (sequences of) genotypes to haplotypes is known as phasing. Haplotype-level genomics will enable researchers to examine genomic sequences at full resolution. Besides allowing researchers to address important questions in population genetics, for instance about demographic history and selection, haplotype-level genomics is particularly relevant for medical genomics.

In this project, we provide the algorithmic basis for entering the era of haplotype-level genomics. This work will pave the way to a better understanding of the regulatory mechanisms underlying disease and non-disease phenotypes and to explaining missing heritability, that is, the fact that only a small fraction of heritable disease risk has so far been successfully linked to genetic variants.
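To make the gap between genotypes and haplotypes concrete: at n heterozygous biallelic sites, 2^(n-1) distinct haplotype pairs are consistent with the same unphased genotypes, and genotype-level data cannot tell them apart. Sequencing reads that span several heterozygous sites link their alleles and help decide among these candidates. The minimal Python sketch below simply enumerates the candidates; the function name `candidate_haplotype_pairs` and the 0/1 allele encoding are illustrative assumptions and not part of WhatsHap.

```python
from itertools import product


def candidate_haplotype_pairs(genotypes):
    """Return all unordered haplotype pairs consistent with unphased genotypes.

    Hypothetical helper for illustration. Each genotype is a pair of alleles
    at one biallelic locus, encoded as 0 (reference) or 1 (alternative); the
    order inside a pair carries no phase information.
    """
    het_sites = [i for i, (a, b) in enumerate(genotypes) if a != b]
    pairs = set()
    # At every heterozygous site the alternative allele may sit on either copy.
    for orientation in product((0, 1), repeat=len(het_sites)):
        h1 = [a for a, _ in genotypes]  # homozygous sites: both copies agree
        h2 = list(h1)
        for site, allele in zip(het_sites, orientation):
            h1[site], h2[site] = allele, 1 - allele
        # Swapping the two copies describes the same diploid genome,
        # so each pair is stored in a canonical (sorted) order.
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


if __name__ == "__main__":
    # Three heterozygous sites and one homozygous-alternative site.
    genotypes = [(0, 1), (1, 1), (0, 1), (0, 1)]
    for h1, h2 in candidate_haplotype_pairs(genotypes):
        print(h1, h2)
```

With three heterozygous sites the sketch yields 2^(3-1) = 4 candidate pairs, all of which produce identical genotypes. Because the count doubles with every additional heterozygous site, resolving phase across an entire chromosome requires linking information, such as reads or pedigree data.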
We will design, implement, and benchmark read-based phasing algorithms to achieve three main goals. First, we will develop novel algorithms to solve problem instances that resist current approaches. This applies in particular to instances that can deliver chromosome-length haplotypes by integrating different technologies and/or by combining sequencing reads with pedigree information. Second, we will deliver an experimental map that precisely delineates the strengths and weaknesses of different technologies and combinations thereof, and hence guides future study design. This is made possible through a tight collaboration with the Human Genome Structural Variation Consortium. Third, all algorithmic advances will be integrated into our open-source WhatsHap software suite for direct inclusion in production pipelines.
DFG Programme: Research Grants
 
 
