Project Details

Ultra-fast haplotype phasing and genotype imputation service using a hybrid FPGA-GPU system

Subject Area: Data Management, Data-Intensive Systems, Computer Science Methods in Business Informatics
  Bioinformatics and Theoretical Biology
  Epidemiology and Medical Biometry/Statistics
  Computer Architecture, Embedded and Massively Parallel Systems
Term: 2017 to 2022
Project identifier: Deutsche Forschungsgemeinschaft (DFG), project number 351403079

Final Report Year: 2022

Final Report Abstract

In this project, we developed EagleImp, a software tool that combines haplotype phasing and genotype imputation in a single convenient program. EagleImp is a fast and accurate stand-alone tool built on the concepts of the established tools Eagle2 (for phasing) and PBWT (for imputation). Thanks to algorithmic and technical improvements, including changes to the data structure, EagleImp is 2 to 10 times faster than the combination of Eagle2 and PBWT and provides the same or better phasing and imputation quality in all tested scenarios. For common variants investigated in typical GWAS, EagleImp also yielded equal or higher imputation accuracy (r²) than the Sanger Imputation Service (SIS), the Michigan Imputation Server (MIS) and the TOPMed Imputation Server, which use larger (not freely available) reference panels. Owing to technical optimizations and improved stability, EagleImp can perform phasing and imputation with upcoming very large reference panels of more than 1 million genomes. EagleImp is freely available on GitHub. With EagleImp-Web, we further provide the community with a free and easy-to-use web service that runs an FPGA-accelerated version of EagleImp in the background. The FPGA hardware design yields a further speed increase of up to 66% compared to CPU-only processing. Our web service offers fast, secure and high-quality genome-wide genotype phasing and imputation, with many security and convenience features that other services lack. For example, users can select algorithmic parameters (such as the K parameter for phasing) and tailor input and output data to their needs by choosing the tolerance for ref/alt swaps and strand flips, as well as the required output information (such as allele dosages, genotype dosages and genotype probabilities).
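To make the output quantities and the accuracy measure mentioned above more concrete, here is a minimal Python sketch. It assumes the conventional definitions: a genotype dosage is the expected number of ALT alleles given the three genotype probabilities, and imputation r² is the squared Pearson correlation between true genotypes and imputed dosages. The function names `genotype_dosage` and `imputation_r2` are hypothetical, and the report does not state how EagleImp itself defines or computes these values.

```python
import numpy as np


def genotype_dosage(genotype_probs):
    """Expected ALT allele count per sample (assumed conventional definition).

    genotype_probs: array of shape (n_samples, 3) holding
    P(0/0), P(0/1), P(1/1) for one biallelic variant.
    """
    gp = np.asarray(genotype_probs, dtype=float)
    return gp[:, 1] + 2.0 * gp[:, 2]


def imputation_r2(true_genotypes, dosages):
    """Squared Pearson correlation between true ALT counts and imputed dosages.

    Returns NaN when either vector has no variance, because the
    correlation is undefined in that case.
    """
    t = np.asarray(true_genotypes, dtype=float)
    d = np.asarray(dosages, dtype=float)
    if t.std() == 0 or d.std() == 0:
        return float("nan")
    r = np.corrcoef(t, d)[0, 1]
    return float(r * r)


# Illustrative values only, not data from the project.
probs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.2, 0.5, 0.3]]
dosages = genotype_dosage(probs)
print(dosages)
print(imputation_r2([0, 1, 2, 1], dosages))
```

In practice such an r² would be computed per variant over all samples and then summarized by allele-frequency bin, which is one common way to compare imputation services.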
EagleImp-Web also provides transparent monitoring of the user's jobs and makes all result files (including log files) available for download. User accounts protect all files belonging to a user from unauthorized access, and security can optionally be enhanced by two-factor authentication. EagleImp-Web complies with the General Data Protection Regulation (GDPR) of the European Union and is available on our website.
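The abstract mentions a selectable tolerance for ref/alt swaps and strand flips between the input data and the reference panel. The following Python sketch illustrates what such a check involves for a single biallelic SNP. It is not EagleImp's implementation: the function `classify_alleles` and its category labels are hypothetical, and the sketch only handles single-nucleotide alleles written in upper case.

```python
# Watson-Crick complements, used to model a strand flip.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip(allele):
    """Complement of a single-nucleotide allele (SNPs only in this sketch)."""
    return COMPLEMENT[allele]


def classify_alleles(target_ref, target_alt, panel_ref, panel_alt):
    """Compare a target variant's alleles with the reference panel's alleles.

    Note: for palindromic SNPs (A/T or C/G) a strand flip and a ref/alt
    swap look identical; this sketch reports the first category that fits.
    """
    target = (target_ref, target_alt)
    if target == (panel_ref, panel_alt):
        return "match"
    if target == (panel_alt, panel_ref):
        return "refalt_swap"
    flipped = (flip(target_ref), flip(target_alt))
    if flipped == (panel_ref, panel_alt):
        return "strand_flip"
    if flipped == (panel_alt, panel_ref):
        return "strand_flip_and_refalt_swap"
    return "mismatch"


# The panel lists the variant as A/G in every call below.
for ref, alt in [("A", "G"), ("G", "A"), ("T", "C"), ("C", "T"), ("A", "C")]:
    print(ref, alt, classify_alleles(ref, alt, "A", "G"))
```

A tolerance setting could then decide which of these categories are corrected automatically and which variants are dropped; the exact policy offered by the service is not described in the abstract.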

Publications

  • (2020) Reference-Based Haplotype Phasing with FPGAs. In Krzhizhanovskaya, V. V. et al. (eds.), Computational Science – ICCS 2020, Springer International Publishing, pp. 481–495
    Wienbrandt, L., Kässens, J. C., and Ellinghaus, D.
    (See online at https://doi.org/10.1007/978-3-030-50420-5_36)
  • (2022) EagleImp: Fast and Accurate Genome-wide Phasing and Imputation in a Single Tool
    Wienbrandt, L. and Ellinghaus, D.
    (See online at https://doi.org/10.1101/2022.01.11.475810)
 
 
