Ultrafast Haplotype and Genotype Estimation of Genome-Wide Data on an FPGA-GPU Hybrid System
Bioinformatics and Theoretical Biology
Epidemiology and Medical Biometry/Statistics
Computer Architecture, Embedded and Massively Parallel Systems
Summary of Project Results
In this project, we developed EagleImp, a stand-alone software tool that combines haplotype phasing and genotype imputation. EagleImp is fast and accurate and builds on the concepts of two established tools: Eagle2 for phasing and PBWT for imputation. Owing to algorithmic and technical improvements, including changes to the data structure, EagleImp is 2 to 10 times faster than the combination of Eagle2 and PBWT while providing the same or better phasing and imputation quality in all tested scenarios. For common variants of the kind investigated in typical GWAS, EagleImp also yielded equal or higher imputation accuracy (r²) than the Sanger Imputation Service (SIS), the Michigan Imputation Server (MIS) and the TOPMed Imputation Server, which use larger reference panels that are not freely available. Thanks to technical optimizations and improved software stability, EagleImp can perform phasing and imputation with upcoming very large reference panels of more than 1 million genomes. EagleImp is freely available on GitHub.
With EagleImp-Web, we further provide the community with a free and easy-to-use web service that runs an FPGA-accelerated version of EagleImp in the background. The FPGA hardware design yields a further speed increase of up to 66% compared to CPU-only processing. The web service offers fast, secure and high-quality genome-wide genotype phasing and imputation, with many security and convenience features that other services lack. For example, users can select algorithmic parameters (such as the K parameter for phasing) and tailor input and output data to their needs by setting the tolerance for ref/alt swaps and strand flips and by choosing the required output information (such as allele dosages, genotype dosages and genotype probabilities).
Furthermore, EagleImp-Web provides transparent monitoring of each user's jobs and makes all result files (including log files) available for download. User accounts protect all files belonging to a user from unauthorized access, and security can optionally be enhanced with two-factor authentication. EagleImp-Web complies with the General Data Protection Regulation (GDPR) of the European Union and is available at our website.
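The summary refers to imputation accuracy (r2) and to outputs such as allele dosages and genotype probabilities without defining them. The following minimal Python sketch illustrates one common reading of these quantities. It is not taken from EagleImp. It assumes that the allele dosage is the expected number of alternate alleles under the three genotype probabilities, and that r2 is the squared Pearson correlation between true genotypes (coded 0/1/2) and imputed dosages. The function names `allele_dosage` and `imputation_r2` are hypothetical.

```python
import math


def allele_dosage(p_ref_ref, p_ref_alt, p_alt_alt):
    """Expected alternate-allele count from genotype probabilities.

    Assumption: the three probabilities sum to (approximately) 1.
    """
    return p_ref_alt + 2.0 * p_alt_alt


def imputation_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes and dosages.

    Returns NaN if either input has zero variance (r2 is undefined then).
    """
    n = len(true_genotypes)
    if n == 0 or n != len(imputed_dosages):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n

    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)

    if var_t == 0.0 or var_d == 0.0:
        return math.nan
    return (cov * cov) / (var_t * var_d)


if __name__ == "__main__":
    # Made-up genotype probabilities for one variant in three samples.
    probs = [(0.9, 0.1, 0.0), (0.1, 0.8, 0.1), (0.0, 0.2, 0.8)]
    truth = [0, 1, 2]  # made-up true genotypes for the same samples
    dosages = [allele_dosage(*p) for p in probs]
    print("dosages:", dosages)
    print("r2:", imputation_r2(truth, dosages))
```

In practice, r2 is usually computed per variant across many samples and then summarized by allele-frequency bin. The three-sample input above only demonstrates the arithmetic.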
Project-Related Publications (Selection)
- (2020) Reference-Based Haplotype Phasing with FPGAs. In Krzhizhanovskaya, V. V. et al. (eds.), Computational Science – ICCS 2020, Springer International Publishing, pp. 481–495
Wienbrandt, L., Kässens, J. C., and Ellinghaus, D.
(See online at https://doi.org/10.1007/978-3-030-50420-5_36)
- (2022) EagleImp: Fast and Accurate Genome-wide Phasing and Imputation in a Single Tool
Wienbrandt, L. and Ellinghaus, D.
(See online at https://doi.org/10.1101/2022.01.11.475810)