Project Details
Learning from high-dimensional, heterogeneous data: Machine learning methods in econometrics
Subject Area
Statistics and Econometrics
Term
from 2020 to 2023
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 431701914
Owing to advancing digitalization, large data sets with many potential predictor variables for different areas of human behaviour are now available. Data from microeconomic applications often exhibit heterogeneity across data sources, endogeneity of relevant variables and increasing dimensionality. Analysing large amounts of high-dimensional data from this area therefore requires tailor-made machine learning methods. In this project, we intend to develop and extend machine learning methods for dealing with heterogeneous treatment effects in randomized experiments, as well as with heterogeneity in consumer demand analysis using random coefficient regression models. In this framework, we will investigate the least absolute shrinkage and selection operator (LASSO), the adaptive version of the LASSO, the causal version of the random forest methodology, and boosting. A focus will be on analysing the variable selection properties of these methods, partly in combination with the recently introduced knockoff methodology to achieve control of the false discovery rate. On the methodological side, we will develop theoretical guarantees for the proposed methods and investigate efficient ways of generating knockoff variables. On the applied side, we will implement the methods in freely available software packages and apply them to consumer demand data sets and to data from randomized experiments.
DFG Programme
Research Grants
International Connection
China, United Kingdom, USA
Co-Investigator
Professor Dr. Alexander Meister