Project Details
Deep conditional independence tests with application to imaging genetics
Subject Area
Medical Informatics and Medical Bioinformatics
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Epidemiology and Medical Biometry/Statistics
Statistics and Econometrics
Term
since 2022
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 459422098
Deep learning is a workhorse for biomedical data analysis because it can exploit highly nonlinear associations to train accurate prediction models, in particular for structured data such as images or sequences. However, prediction quality alone, which may be influenced by confounding, is insufficient for moving from data towards an understanding of the underlying biology. Statistical tests for independence that adjust for potential confounding influences allow sound statements about dependencies between biological variables at the population level. Current statistical tests, however, are not tailored to structured data. In this project, we develop conditional independence tests for statistical association between a continuous scalar response Y and an input/covariate X, while conditioning on covariates Z that may confound the association due to dependencies with both X and Y. Specifically, we consider the cases where one or both of X and Z are structured (in particular images). Our approach uses deep learning to map structured data X and/or Z onto continuous embeddings, which may for example come from transfer learning. We then use tests in linearized mixed effects models on the embedded variables, where random effects allow for parsimonious modeling in high dimensions. We will investigate theoretical properties of all developed tests and provide efficient algorithms and implementations. In particular, we focus on good power properties in finite samples and derive sample size and power calculations.
The developed methods are motivated by applications in imaging genetics, where conditional independence testing is used to map heritable phenotypes in images to genetic loci, correcting for confounding by population structure and relatedness via conditioning. Population-based imaging makes it possible to quantify phenotypes efficiently, including disease biomarkers. While current genome-wide association studies analyze a priori known scalar biomarkers such as organ sizes, the goal of this project is to enable unbiased testing for the presence of any heritable phenotypic variation in images, with a view to discovering novel biomarkers. In particular, we will use our methods for gene-based association studies of 2D retinal fundus images and 3D brain magnetic resonance images in the UK Biobank.
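The general idea described above (embed X and Z, then test whether Y is associated with the embedding of X given the embedding of Z) can be illustrated with a rough sketch. This is not the project's method: it is a simplified stand-in that assumes the embeddings have already been computed, adjusts for Z by ordinary least squares, uses a linear-kernel score-type statistic on the residuals, and calibrates it by permuting residuals, which is only approximately valid. The function names `residualize` and `ci_test`, the simulated data, and all parameter values are hypothetical.

```python
import numpy as np


def residualize(A, Z):
    """Residuals of A after least-squares regression on Z plus an intercept."""
    Z1 = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(Z1, A, rcond=None)
    return A - Z1 @ coef


def ci_test(y, X_emb, Z_emb, n_perm=999, seed=0):
    """Approximate test of Y independent of X given Z on embedded variables.

    Assumption: linear adjustment for Z_emb is adequate. The statistic
    r' X X' r is a linear-kernel score-type statistic, as used for a
    random effect on X_emb; the null is approximated by permuting residuals.
    """
    rng = np.random.default_rng(seed)
    ry = residualize(y, Z_emb)       # response with Z removed
    rX = residualize(X_emb, Z_emb)   # embedding of X with Z removed

    def stat(r):
        return float(np.sum((rX.T @ r) ** 2))

    observed = stat(ry)
    null = np.array([stat(rng.permutation(ry)) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, float(p_value)


# Simulated stand-ins for embeddings (hypothetical; real ones would come
# from a pretrained network applied to images).
rng = np.random.default_rng(1)
n = 300
Z_emb = rng.normal(size=(n, 3))
X_emb = Z_emb @ rng.normal(size=(3, 5)) + rng.normal(size=(n, 5))
confounding = Z_emb @ np.array([1.0, -1.0, 0.5])

# Alternative: Y depends on X_emb beyond what Z_emb explains.
y_alt = confounding + X_emb[:, 0] + rng.normal(size=n)
# Null: Y depends on Z_emb only, so any X-Y association is confounded.
y_null = confounding + rng.normal(size=n)

stat_alt, p_alt = ci_test(y_alt, X_emb, Z_emb)
stat_null, p_null = ci_test(y_null, X_emb, Z_emb)
print(f"alternative: p = {p_alt:.3f}")
print(f"null:        p = {p_null:.3f}")
```

In this sketch the confounder is handled as a fixed effect and the embedding of X enters only through the quadratic statistic, which keeps the number of tested parameters at one regardless of the embedding dimension. A mixed-model test as described in the abstract would instead derive the null distribution analytically and could also treat a high-dimensional Z through random effects.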
DFG Programme
Research Units