Project Details

Statistical and practical significance of item misfit in educational testing

Applicant Dr. Carmen Köhler
Subject Area Personality Psychology, Clinical and Medical Psychology, Methodology
Education Systems and Educational Institutions
Term from 2017 to 2021
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 388489847
 
Testing model fit is considered an important step in item response theory (IRT) modeling, since model fit is a necessary prerequisite for drawing valid inferences from estimated parameters (Wainer & Thissen, 1987). Hambleton and Hahn (2005) suggest several steps to evaluate model fit, including (a) the calculation of item fit statistics and (b) investigating the consequences of misfit with regard to test outcomes. In educational large-scale assessments, various item fit indices are employed. Many of these statistics have severe limitations, such as the lack of a theoretical proof of the distribution of the test statistic (Liang, Wells, & Hambleton, 2014). This missing methodological basis for determining accurate cut-off values makes decisions regarding item fit rather arbitrary. Unsurprisingly, comparative studies have demonstrated that the statistics lead to contradictory conclusions about which items show statistical misfit and should hence be excluded from the test (see, e.g., Chon, Lee, & Ansley, 2013). A second issue concerns the fact that the practical significance of item misfit (i.e., its effect on relevant test outcomes) is not taken into account when determining item exclusions, mostly because, thus far, no readily usable method for evaluating practical significance exists.

The proposed project aims to (1) establish guidelines for practitioners about the performance of relatively common as well as more recent item fit statistics in educational large-scale assessments and (2) develop criteria for evaluating the practical significance of item misfit. To pursue the first research objective, simulation studies will investigate factors that may influence the fit statistics' performance (i.e., their Type I error rates and power). These factors include (i.) sample size, (ii.) the interaction between misfit and item parameters, (iii.) the type of model violation, (iv.) the size of misfit, (v.) the number of misfitting items in the data, and (vi.) the amount and type of missing values in the data. Regarding the size of misfit, we plan to propose different effect size measures that allow distinguishing between small, medium, and large item misfit. The second major objective is to develop methods for determining the practical significance of misfit for outcomes that are relevant in low-stakes educational testing, including (i.) analyses of relationships between competence and covariates, and (ii.) competence comparisons over time. We will use empirical data (e.g., from PISA or NEPS) to validate our findings and to illustrate our methods.
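To make the notion of a fit statistic's empirical Type I error rate concrete, the following is a minimal, self-contained sketch of the kind of simulation study described above: data are generated under a fitting Rasch model, an outfit mean-square is computed for each item, and the rate at which items are (wrongly) flagged by a rule-of-thumb cutoff is recorded. This is an illustration only, not the project's actual design; the cutoff of 1.2 and the use of the true (rather than estimated) person and item parameters are simplifying assumptions.

```python
import numpy as np

def simulate_rasch(thetas, betas, rng):
    """Simulate dichotomous responses under the Rasch model.

    P(X=1) follows the logistic function of (ability - difficulty).
    Returns the 0/1 response matrix and the model probabilities.
    """
    p = 1.0 / (1.0 + np.exp(-(thetas[:, None] - betas[None, :])))
    responses = (rng.random(p.shape) < p).astype(int)
    return responses, p

def outfit_msq(responses, p):
    """Outfit mean-square per item: mean squared standardized residual.

    Under a fitting model its expectation is 1, since
    E[(X - p)^2] = p(1 - p) for a Bernoulli(p) response.
    """
    z2 = (responses - p) ** 2 / (p * (1.0 - p))
    return z2.mean(axis=0)

rng = np.random.default_rng(0)
n_persons, n_items, n_reps = 500, 20, 200
betas = np.linspace(-2.0, 2.0, n_items)  # item difficulties
cutoff = 1.2  # hypothetical rule-of-thumb upper bound for "misfit"

flags = np.zeros(n_items)
for _ in range(n_reps):
    thetas = rng.normal(size=n_persons)          # abilities ~ N(0, 1)
    x, p = simulate_rasch(thetas, betas, rng)    # data fit the model
    flags += outfit_msq(x, p) > cutoff           # false alarms only

type1 = flags / n_reps  # empirical per-item flagging rate under the true model
print(type1.round(3))
```

Because every replication is generated under the fitting model, any flagged item is a false positive, so `type1` estimates the per-item Type I error rate of this cutoff rule; replacing the data-generating step with a violated model (e.g., a 2PL item among Rasch items) would instead yield power estimates.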
DFG Programme Research Grants
Cooperation Partner Dr. Alexander Robitzsch
 
 
