Project Details
Estimation and resampling methods for judging the quality of multiple tests for high dimensional data
Applicant
Professor Dr. Arnold Janssen
Subject Area
Mathematics
Term
from 2011 to 2018
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 209176168
Contents
Modern technologies in the life sciences generate high-dimensional data in which the dimension can be much larger than the sample size; genome studies in biology and medicine are a typical example. In this kind of application, hidden effects (referred to as signals) are typically rare, a situation known in statistics as sparsity. In genome studies, for instance, often only a few genomic risk positions occur. To find this type of effect, statisticians use multiple tests, whose quality is judged by the false discovery rate (FDR). The FDR is the expected proportion of falsely rejected hypotheses among all rejected hypotheses. The best-known procedure is the Benjamini-Hochberg multiple test, published in 1995.
In our previous DFG project we studied multiple testing problems and proposed extended adaptive solutions. The topic of the present follow-up proposal is a study of the quality of multiple tests, based on estimation and resampling procedures for the unknown effective FDR. We take into account that the results of multiple tests and of risk estimation often have a large variance, since in practice sparse effects are hidden within high-dimensional noise. We first study various estimators of the effective FDR that are slight modifications of estimators given in the literature. Their variability will be controlled by nonparametric confidence intervals, also under sparsity. For this purpose, we will estimate the distribution of the estimators by resampling methods, and we propose a modified bootstrap procedure related to the so-called low resampling bootstrap. Earlier publications of our research group can be used to address this question. However, because of the sparsity of the signals, new methods must be developed, since Efron's ordinary bootstrap may fail. We would like to study the consistency of our estimators for different multiple tests.
The quality of our resampling procedures will then be checked. The project will be accompanied by large Monte Carlo simulations. Moreover, we would like to continue our cooperation with colleagues from molecular biology and the biometrical sciences, who gave important hints for applications.
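To make the two central notions concrete, the following is a minimal sketch of the Benjamini-Hochberg step-up procedure and of the realized proportion of false rejections, whose expectation is the FDR. It is an illustration only and not the project's own implementation: the function names `benjamini_hochberg` and `false_discovery_proportion`, the level `alpha=0.05`, and the example p-values are assumptions chosen for this sketch. The true null status of each hypothesis is known only in simulations, such as the Monte Carlo studies mentioned above.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up test (illustrative sketch).

    Returns a list of booleans, True where the hypothesis is rejected.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k whose ordered p-value satisfies p_(k) <= alpha * k / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k = rank
    # Step-up rule: reject the k hypotheses with the smallest p-values.
    reject = [False] * m
    for idx in order[:k]:
        reject[idx] = True
    return reject


def false_discovery_proportion(reject, is_true_null):
    """Number of false rejections divided by the number of all rejections.

    By convention the proportion is 0 when nothing is rejected.
    `is_true_null` is only available in simulations (assumption of this sketch).
    """
    rejections = sum(reject)
    false_rejections = sum(r and h0 for r, h0 in zip(reject, is_true_null))
    return false_rejections / max(rejections, 1)


if __name__ == "__main__":
    # Hypothetical p-values; the second hypothesis is assumed to be a true null.
    p = [0.02, 0.03, 0.035, 0.9]
    truth = [False, True, False, True]
    decisions = benjamini_hochberg(p, alpha=0.05)
    print(decisions)
    print(false_discovery_proportion(decisions, truth))
```

In this example the two smallest p-values exceed their own thresholds, yet they are still rejected because the third ordered p-value meets its threshold. This is the step-up character of the procedure. Averaging the proportion over many simulated data sets approximates the FDR.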
DFG Programme
Research Grants