
Integration of prior biological knowledge from different types of omics data into survival time models

Applicant Dr. Kai Kammers
Subject area Epidemiology and Medical Biometry/Statistics
Funding period 2013 to 2015
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 240819500
 
Year of creation 2015

Summary of project results

Genes, lifestyle, and environment are three well-recognized factors influencing human health. The underlying biological pathways that explain the variability from the genome to phenotypes of health and disease are still not well understood. Large initiatives have helped to gain insight into the biological foundations of human health. The Human Genome Project, which had the goal of sequencing the entire human genome, was launched in 1990 and completed in 2003. Since 2000, numerous methods have been developed to predict outcome for several kinds of cancer on the basis of gene expression experiments. Several of these studies reported considerable success for classification and prediction. They allow the discovery of new markers, opening the way to more subject-specific treatments with greater efficacy and safety.
Clinical covariates such as age, gender, blood pressure, tumor size, and grade, as well as smoking and drinking history, have been extensively studied and shown to have satisfactory predictive power. They are usually easy to measure and of low dimensionality. By uncovering the relationship between different phenotypes and omics measurements (e.g., genomics or proteomics data), the hope is to achieve more accurate prognoses and improved treatment strategies. Predicting the prognosis and metastatic potential of cancer at the time of discovery is a major challenge in current clinical research. Numerous recent studies have searched for genomic signatures in order to discriminate, e.g., cancer cells from healthy cells.
A substantial challenge in this context is that the number of omics variables is usually much larger than the number of samples. Integrating prior biological knowledge or aggregating genomic information from different data sources into statistical models is even more challenging. In addition, ongoing developments in technology call for a critical review of existing statistical analysis pipelines.
In one of my research projects, we investigated how to perform statistical inference on quantitative proteomics data derived from mass spectrometry experiments. In a first step, we aggregated intensity measurements for all peptides belonging to the same protein to allow for protein-based statistical inference. The relative abundance estimate for a protein is calculated as the median of the peptide-level intensities for this protein. Most commonly, statistical inference in case-control studies is based on standard two-sample t-tests, comparing the measured abundances for each protein across the conditions of interest. However, sample sizes are commonly small in such proteomic experiments, sometimes fewer than 10 samples in total, which results in great uncertainty in the estimates of sample variability. We demonstrated how an empirical Bayes method, which shrinks a protein's sample variance towards a pooled estimate, leads to far more powerful and stable inference for detecting significant changes in protein abundance than ordinary t-tests. We also showed how to analyze data from multiple experiments simultaneously, discussed the effects of missing data on the inference, and presented easy-to-use open-source software for normalization of mass spectrometry data and inference based on moderated test statistics.
In another project, we investigated in a first step how induced pluripotent stem cells (iPSCs) from people with informative genotypes differentiate into megakaryocytes (MKs), large bone marrow cells that are the precursor cells for platelets, in order to determine patterns of gene transcript expression in the MKs related to specific genetic variants. We focused on an integrative approach, analyzing genomic data derived from different platforms. We performed expression quantitative trait loci (eQTL) analyses in megakaryocytes to identify associations between gene expression features (exons, genes, transcripts) and genetic variation (single nucleotide polymorphisms, SNPs). We detected a large number of highly significant eQTLs that are unique to MKs and were not detected in other cell types.
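The protein-level inference described above (median aggregation of peptide intensities, followed by a t-statistic whose variance is shrunk towards a pooled estimate) can be illustrated with a minimal Python sketch. This is not the project's software. The function names, the example data, the fixed prior degrees of freedom, and the use of the average per-protein variance as the pooled estimate are all assumptions made for illustration; a full empirical Bayes analysis would estimate the prior variance and prior degrees of freedom from the data.

```python
from math import sqrt
from statistics import mean, median, variance


def protein_abundance(peptide_intensities):
    """Summarize one protein in one sample as the median peptide intensity."""
    return median(peptide_intensities)


def pooled_variance(cases, controls):
    """Return the ordinary two-sample pooled variance and its degrees of freedom."""
    df = len(cases) + len(controls) - 2
    ss = (len(cases) - 1) * variance(cases) + (len(controls) - 1) * variance(controls)
    return ss / df, df


def moderated_t(cases, controls, prior_var, prior_df):
    """Two-sample t-statistic with the variance shrunk towards prior_var.

    prior_var and prior_df are ASSUMED inputs here; setting prior_df to 0
    gives back the ordinary pooled-variance t-statistic.
    """
    sample_var, df = pooled_variance(cases, controls)
    shrunk_var = (prior_df * prior_var + df * sample_var) / (prior_df + df)
    std_err = sqrt(shrunk_var * (1 / len(cases) + 1 / len(controls)))
    return (mean(cases) - mean(controls)) / std_err, prior_df + df


# Hypothetical log2 abundances per protein: (cases, controls).
proteins = {
    "protein_A": ([10.1, 10.4, 10.2], [9.1, 9.3, 9.2]),
    "protein_B": ([8.0, 9.5, 7.2], [8.1, 8.4, 7.9]),
}

# Crude pooled estimate: average of the per-protein variances (an assumption).
prior_var = mean(pooled_variance(a, b)[0] for a, b in proteins.values())
PRIOR_DF = 4.0  # hypothetical weight given to the pooled estimate

for name, (cases, controls) in proteins.items():
    t_stat, df = moderated_t(cases, controls, prior_var, PRIOR_DF)
    print(f"{name}: moderated t = {t_stat:.2f} on {df:.0f} degrees of freedom")
```

With very few samples per group, the shrunken variance depends less on a single protein's noisy variance estimate, which is the stabilizing effect the summary refers to. The moderated statistic is compared against a t-distribution with the increased degrees of freedom returned by `moderated_t`.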

Project-related publications (selection)

  • (2015). Detecting significant changes in protein abundance. EuPA Open Proteomics 7:11-19
    Kammers K, Cole RN, Tiengwe C, Ruczinski I
    (See online at https://doi.org/10.1016/j.euprot.2015.02.002)
  • (2015). RNA Helicase DDX3 - A novel therapeutic target in Ewing sarcoma. Oncogene [Epub ahead of print]
    Wilky BA, Kim C, McCarty G, Montgomery EA, Kammers K, Cole RN, Raman V, Loeb D
    (See online at https://doi.org/10.1038/onc.2015.336)
 
 
