Project Details
Projective item response theory models for count data and their application as interpretable approximations to black box models of machine learning
Applicant
Professor Dr. Philipp Doebler
Subject Area
Personality Psychology, Clinical and Medical Psychology, Methodology
General, Cognitive and Mathematical Psychology
Term
since 2021
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 463078117
Item Response Theory (IRT) provides measurement models for latent variables. Person-specific latent variable estimates and their measurement errors can be calculated in a variety of situations. IRT is thus the most comprehensive statistical basis for the operational use and evaluation of diagnostic tests in psychology and empirical educational research. Compared to IRT methods for binary data, count data IRT models are underdeveloped. Many count data IRT models use the Poisson distribution, but the resulting assumption of conditional equidispersion (conditional variance equal to conditional mean) is rarely empirically defensible. Current approaches based on the Conway-Maxwell-Poisson distribution solve this problem, but so far cannot take factor loadings into account. In applications, especially when unstructured indicators are to be used, multidimensional latent variable constellations are plausible. With projective IRT methods, it is possible to derive from a multidimensional model an empirically indistinguishable one-dimensional IRT model with local item dependence, which is favorable for interpretation and further use. This project therefore generalizes projective IRT models to the count data case. In particular, multidimensional IRT models can be projected onto their main dimension. Machine learning methods are seen as powerful data-analytic tools in many areas of psychology as well as in educational data mining. However, the results of many machine learning methods are difficult to interpret. The projective IRT models developed in this research project are applied as easily interpretable surrogate models in situations where a black box machine learning model is used for its predictive performance or its classification accuracy. This yields an interpretable approximation to the black box model, which helps to understand it better. Since multidimensional and even high-dimensional latent variable constellations are numerically demanding, an EM algorithm for a general count data IRT model with factor loadings is developed. All common count data distributions with over- and underdispersion will be considered, as well as covariates at the person and item level and their interactions.
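The equidispersion point in the description can be illustrated numerically. The following is a minimal sketch, not the project's implementation: it evaluates the Conway-Maxwell-Poisson probability mass function, P(Y = y) proportional to lam^y / (y!)^nu, and reports the variance-to-mean ratio. The function names (cmp_pmf, dispersion_index), the rate/dispersion parameterization, and the truncation of the normalizing sum at 200 are assumptions made for illustration only.

```python
import math


def cmp_pmf(lam, nu, max_count=200):
    """Conway-Maxwell-Poisson pmf on 0..max_count (truncated normalizing sum).

    lam > 0 is the rate-like parameter, nu >= 0 the dispersion parameter.
    The truncation point max_count is an illustrative assumption; it must be
    large enough that the omitted tail mass is negligible.
    """
    # Work in log space: log weight = y * log(lam) - nu * log(y!)
    log_w = [y * math.log(lam) - nu * math.lgamma(y + 1)
             for y in range(max_count + 1)]
    shift = max(log_w)  # subtract the maximum to avoid overflow in exp()
    w = [math.exp(v - shift) for v in log_w]
    z = sum(w)
    return [v / z for v in w]


def dispersion_index(lam, nu):
    """Variance-to-mean ratio of the (truncated) CMP distribution."""
    pmf = cmp_pmf(lam, nu)
    mean = sum(y * p for y, p in enumerate(pmf))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
    return var / mean


if __name__ == "__main__":
    for nu in (0.5, 1.0, 2.0):
        print(f"nu = {nu}: variance / mean = {dispersion_index(3.0, nu):.3f}")
```

With nu = 1 the distribution reduces to the Poisson case and the ratio is 1 (equidispersion); nu < 1 gives overdispersion and nu > 1 gives underdispersion, which is why a single additional parameter can cover both departures from the Poisson assumption.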
DFG Programme
Research Grants
International Connection
Australia
Cooperation Partner
Alan Huang, Ph.D.
Co-Investigator
Dr. Boris Forthmann