Project Details

Integrating clinical and molecular patient data into subgroup risk prediction models for enabling individualized therapy

Subject Area: Epidemiology and Medical Biometry/Statistics
Term: from 2013 to 2017
Project identifier: Deutsche Forschungsgemeinschaft (DFG), project number 243584364

Final Report Year: 2019

Final Report Abstract

When linking high-dimensional molecular measurements (e.g., genomic information, gene expression, or methylation) to clinical endpoints such as time to relapse in cancer patients, different strategies can be employed. If insight into the molecular mechanisms is the primary interest, it is sufficient to identify the components that contribute significantly to these endpoints. When the aim is to guide the clinician treating a patient, however, a risk prediction model is needed. Ideally, such a model assigns each patient a predicted probability of future events based on molecular measurements. The clinician can then, for example, tailor therapy appropriately for high-risk and low-risk patients.

Different diseases require different risk prediction models, but it will often be difficult to decide what constitutes a well-defined disease that is adequately served by a single risk prediction signature, i.e., a list of clinical and molecular characteristics that are important for prognosis. For example, while a single prognostic model for breast cancer patients might seem plausible, several molecular breast cancer subtypes have been identified, and each subgroup of breast cancer patients might require a separate risk prediction model. Risk prediction models within these subgroups can then provide a basis for individualized therapy, because the prognosis becomes more specific to the individual patient. As the number of individuals in each subgroup will often be small, a subgroup modeling approach that can borrow information from other subgroups is desirable. In addition, statistical modeling approaches will often need to integrate different types of molecular measurements as well as clinical data. To address these challenges, we first developed a weighted regression approach that provides stable subgroup signatures in settings with known subgroups, e.g., when connecting gene expression data to a time-to-event endpoint in a high-dimensional setting.
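The idea of a weighted regression for known subgroups can be illustrated with observation weights in a Cox partial log-likelihood for a time-to-event endpoint. The following is a minimal sketch under stated assumptions, not the project's implementation: patients in the subgroup under investigation receive weight 1, patients in the complementary subgroups receive a common weight w between 0 and 1, and each weight is assumed to scale both the patient's own event contribution and its contribution to the risk-set sums. The function names `subgroup_weights` and `weighted_cox_loglik` and the simulated data are hypothetical.

```python
import numpy as np

def subgroup_weights(groups, target, w):
    """Weight 1 for the subgroup under investigation, w (0 <= w <= 1) otherwise."""
    return np.where(np.asarray(groups) == target, 1.0, w)

def weighted_cox_loglik(beta, X, time, event, weights):
    """Weighted Cox partial log-likelihood (Breslow-type handling of ties).

    Hypothetical sketch: each weight is assumed to scale both the patient's
    event contribution and its contribution to every risk-set sum.
    """
    eta = X @ beta
    # at_risk[i, k] = 1 if patient k is still at risk at the time of patient i
    at_risk = (time[None, :] >= time[:, None]).astype(float)
    risk_sums = at_risk @ (weights * np.exp(eta))
    # only events with positive weight contribute (avoids 0 * log(0))
    use = (np.asarray(event) == 1) & (weights > 0)
    return float(np.sum(weights[use] * (eta[use] - np.log(risk_sums[use]))))

# Hypothetical example data with two known subgroups "A" and "B"
rng = np.random.default_rng(0)
n, p = 40, 3
X = rng.normal(size=(n, p))
time = rng.exponential(size=n)
event = rng.integers(0, 2, size=n)
groups = np.repeat(["A", "B"], n // 2)
beta = np.array([0.5, -0.2, 0.0])

for w in (1.0, 0.5, 0.0):
    ll = weighted_cox_loglik(beta, X, time, event, subgroup_weights(groups, "A", w))
    print(f"w = {w}: weighted partial log-likelihood = {ll:.3f}")
```

Under this assumption, w = 1 reproduces the usual pooled Cox partial likelihood and w = 0 coincides with the likelihood of subgroup A analysed on its own; intermediate weights interpolate between these extremes, which is how information is borrowed from the complementary subgroup.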
Specifically, we developed a new weighted regression approach based on component-wise likelihood-based boosting. The idea is to borrow information across subgroups by incorporating all observations while down-weighting the complementary subgroups that are not under investigation. We paid particular attention to evaluation criteria such as prediction performance and the stability of variable selection when the component-wise boosting approach identifies important predictors. For this purpose, we developed two visualization tools, the stability trajectories and the weight-frequency map. These tools highlight which molecular markers are important for specific subgroups and which are important overall. To evaluate and illustrate these new techniques, we considered different applications based on The Cancer Genome Atlas.

We subsequently focused on settings in which the subgroups still need to be determined, e.g., by a clustering approach, and also considered the integration of different types of molecular measurements. Specifically, we combined weighted regression with hierarchical clustering: first, a cluster analysis is performed on one type of molecular data; then, for one specific cluster of patients, a subgroup signature is developed by weighted regression on the other type of molecular measurements. For example, in a kidney cancer application, we integrated DNA methylation and gene expression data by combining a non-sparse hierarchical cluster analysis of the methylation data with the new weighted regression, obtaining a sparse subgroup signature based on the gene expression data for one specific cluster. We then investigated the extent to which the clustering component of this strategy affects model stability and prediction performance. The prediction model proved rather robust with respect to the clustering component, indicating that the proposed strategy can be useful for integrating different kinds of molecular data.

Thus, we were able to develop reliable modeling strategies for obtaining risk prediction models in settings with either known or unknown subgroups, and for integrating different kinds of molecular measurements.
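The two-stage strategy (clustering on one type of molecular data, then a weighted subgroup fit on another) can be sketched as follows. This is a simplified illustration under assumptions, not the project's implementation. `weighted_cox_boost` is a hypothetical, minimal component-wise likelihood-based boosting routine for a weighted Cox model: in each step it computes a penalized one-step Newton update for every covariate and applies only the update of the covariate with the largest penalized score statistic. Its default penalty (nine times the weighted number of events) is a heuristic chosen here as an assumption. Ward linkage stands in for the hierarchical clustering variant, and the "methylation" and "expression" data are simulated. The hypothetical `selection_frequencies` helper tabulates, for a grid of weights, how often each covariate is selected across random subsamples (subsample fraction 0.632, also an assumption). This is one way to compute the kind of information that a weight-frequency map displays.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def weighted_cox_boost(X, time, event, weights, steps=100, penalty=None):
    """Component-wise likelihood-based boosting for a weighted Cox model (sketch)."""
    n, p = X.shape
    if penalty is None:
        # hypothetical heuristic: nine times the weighted number of events
        penalty = 9.0 * np.sum(weights * event)
    beta = np.zeros(p)
    at_risk = (time[None, :] >= time[:, None]).astype(float)
    ev = (event == 1) & (weights > 0)
    w_ev = weights[ev][:, None]
    for _ in range(steps):
        r = weights * np.exp(X @ beta)             # weighted relative risks
        s0 = (at_risk @ r)[ev][:, None]            # risk-set sums at event times
        s1 = (at_risk @ (r[:, None] * X))[ev]      # first moments per covariate
        s2 = (at_risk @ (r[:, None] * X**2))[ev]   # second moments per covariate
        xbar = s1 / s0
        score = np.sum(w_ev * (X[ev] - xbar), axis=0)
        info = np.sum(w_ev * (s2 / s0 - xbar**2), axis=0)
        # update only the covariate with the largest penalized score statistic
        j = np.argmax(score**2 / (info + penalty))
        beta[j] += score[j] / (info[j] + penalty)
    return beta

def selection_frequencies(X, time, event, in_group, weight_grid,
                          n_sub=10, frac=0.632, seed=0):
    """Share of subsamples in which each covariate is selected (rows: weights)."""
    sub_rng = np.random.default_rng(seed)
    n, p = X.shape
    freq = np.zeros((len(weight_grid), p))
    for a, w in enumerate(weight_grid):
        for _ in range(n_sub):
            idx = sub_rng.choice(n, size=int(frac * n), replace=False)
            wts = np.where(in_group[idx], 1.0, w)
            b = weighted_cox_boost(X[idx], time[idx], event[idx], wts, steps=50)
            freq[a] += (b != 0)
    return freq / n_sub

# Hypothetical simulated data: two patient clusters visible in "methylation" M
rng = np.random.default_rng(1)
n, p = 200, 20
cluster_true = np.repeat([0, 1], n // 2)
M = rng.normal(size=(n, 50)) + 3.0 * cluster_true[:, None]
X = rng.normal(size=(n, p))  # hypothetical "expression" data
# covariate 0 acts only in cluster 0, covariate 1 only in cluster 1
eta = np.where(cluster_true == 0, 1.5 * X[:, 0], 1.5 * X[:, 1])
t_event = rng.exponential(scale=np.exp(-eta))
t_cens = rng.exponential(scale=2.0, size=n)
time = np.minimum(t_event, t_cens)
event = (t_event <= t_cens).astype(int)

# Step 1: hierarchical clustering (Ward linkage) of the methylation data
labels = fcluster(linkage(M, method="ward"), t=2, criterion="maxclust")
in_group = labels == labels[0]  # analyse the cluster containing patient 0

# Step 2: weighted boosting on the expression data for that cluster
beta = weighted_cox_boost(X, time, event, np.where(in_group, 1.0, 0.1))
print("covariates with largest |beta|:", np.argsort(-np.abs(beta))[:3])

freq_map = selection_frequencies(X, time, event, in_group, [0.0, 0.1, 0.5, 1.0])
print(np.round(freq_map[:, :3], 2))
```

Small weights concentrate the fit on the cluster under investigation, while a weight of 1 pools all patients. Comparing the rows of `freq_map` indicates which covariates are selected mainly when the complementary cluster is down-weighted and which are selected regardless of the weight.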

Publications

  • A weighting approach for judging the effect of patient strata on high-dimensional risk prediction signatures. BMC Bioinformatics. 2015; 16(1), 1-12
    Weyer V and Binder H
    (See online at https://doi.org/10.1186/s12859-015-0716-8)
  • A weighting approach for a better identification of subgroup-specific effects. PhD thesis, Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Germany, 2016
    Weyer-Elberich V