Project Details

Lifespan AI - Project D1: Deep Integration for Long Delta Health Data

Subject Area: Epidemiology and Medical Biometry/Statistics; Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term: since 2022
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 459360854
 
Long Delta Health Data encompass the entire life of individuals and cohorts. They stem from different and changing sources and are typically sparse and variable. The different sources, called modalities, and their varying temporal resolution substantially hamper computational integration even within a single study. Between different studies with only partially shared modalities and partial temporal overlap, the obstacle is greater still. To integrate such data and make them amenable to computational analysis, the project has to overcome these challenges.

To devise integration methods with a high level of generalisation and applicability, we propose to employ deep-learning-based models. We will employ learnable and transferable data embedding models for time-resolved units comprising one or more modalities. Conceptually, this approach builds on the attention modules known from language transformer models. The resulting embeddings will need to be general, robust, and amenable to “transfer learning”-like approaches, and we will validate their success with the unique datasets available to the project. In our project, we use datasets from two large German population studies that span a wide range of temporal resolutions and data types, such as questionnaires, voice recordings, and medical images. We aim to build a library of trained embedding models that can be used for “new” data beyond the three development datasets.

Further, on selected use cases, we will demonstrate how to use embedded data for predictive tasks. Predictions may encompass the imputation of missing modalities, the interpolation of data between measured time points, or the extrapolation of one or more modalities into the future. Reliable predictions will depend on a certain structure of the embedding space. Consequently, an important scientific goal of the project is to develop meaningful metrics that characterise embedding spaces quantitatively with regard to data quality issues, outliers, bias, and inconsistencies that limit predictive capacity.

The results obtained in this project have the potential to profoundly impact data science, particularly in the health domain, where time-resolved data with a huge variability of data types and quality levels are predominant. As our aim is to devise concepts to unify vastly different data types and information sources with long delta characteristics, we believe the results will be applicable to areas even beyond the health sciences.
DFG Programme: Research Units
 
 
