Abstract:
Today’s educational field has a tremendous hunger for valid and psychometrically sound items to reliably track and model students’ learning processes. Educational large-scale assessments, formative classroom assessment, and, lately, digital learning platforms require a constant stream of high-quality, unbiased items. However, traditional development of test items ties up a significant amount of time from subject matter experts, pedagogues, and psychometricians and might no longer be suited to today’s demands. Salvation is sought in automatic item generation (AIG), which makes it possible to generate multiple items within a short period of time by applying algorithms to cognitively sound item templates (Gierl, Lai & Tanygin, 2021).
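As a rough illustration of the template idea behind AIG, the following minimal Python sketch enumerates item instances from a single item model. The template text, the context list, the `Item` class, and the `generate_items` function are all hypothetical and chosen for illustration only; they are not the item models or the generator used in this study.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Item:
    stem: str    # the text shown to the student
    answer: int  # the keyed correct answer


# Hypothetical item model: a stem template with slots for context and operands.
TEMPLATE = (
    "{name} has {a} {objects} and gets {b} more. "
    "How many {objects} does {name} have now?"
)

# Illustrative semantic embeddings (contexts); not taken from the study.
CONTEXTS = [("Mia", "apples"), ("Tom", "marbles")]


def generate_items(a_values, b_values, max_sum):
    """Enumerate every slot combination that satisfies the sum constraint."""
    items = []
    for (name, objects), a, b in product(CONTEXTS, a_values, b_values):
        if a + b <= max_sum:  # constraint on the number range of the result
            stem = TEMPLATE.format(name=name, objects=objects, a=a, b=b)
            items.append(Item(stem=stem, answer=a + b))
    return items


items = generate_items(range(2, 5), range(2, 5), max_sum=6)
```

Widening the operand ranges or adding contexts multiplies the number of instances, which is how a small set of item models can yield a very large item pool.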
Using images or other pictorial elements is a prominent way to present mathematical tasks in math assessment, e.g. in the Trends in International Mathematics and Science Study (TIMSS; Mullis et al., 2009) and the Programme for International Student Assessment (PISA; OECD, 2013). Research on using images in text items shows ambiguous results depending on their function and perception (Hoogland et al., 2018; Lindner et al., 2018; Lindner, 2020). Thus, despite their high importance, the effects of image-based semantic embeddings and their potential interplay with cognitive characteristics of items have hardly been studied. The use of image-based semantic embeddings instead of mainly text-based items will nevertheless increase, especially in contexts with highly heterogeneous student language backgrounds.
The present study psychometrically analyses cognitive item models that were developed by a team of national subject matter experts and psychometricians and then used to algorithmically produce items for the mathematical domain of numbers & operations for Grades 1, 3, and 5 of the Luxembourgish school system. Each item model was administered in 6 experimentally varied versions to investigate the impact of a) the context in which the mathematical problem was presented, and b) problem characteristics that cognitive psychology has identified as influencing the problem-solving process. Based on samples from Grade 1 (n = 5963), Grade 3 (n = 5527), and Grade 5 (n = 5291) collected within the annual Épreuves standardisées (ÉpStan), this design allows for evaluating whether the psychometric characteristics of the items produced per model a) are stable, b) can be predicted by problem characteristics, and c) are unbiased with respect to subgroups of students known to be disadvantaged in the Luxembourgish school system. The developed cognitive models worked flawlessly as a base for generating item instances. Out of 348 generated items, all passed the ÉpStan quality criteria, which correspond to standard IRT quality criteria (rit > .25; outfit < 1.2). All 24 cognitive models could be fully identified either by cognitive aspects alone or by a mixture of cognitive aspects and semantic embeddings. One model could be fully described by the different embeddings used. Approximately half of the cognitive models could fully explain all items generated and administered from these models, i.e. no outliers were identified. This remained constant over all grades. With the exception of one cognitive model, we could identify the cognitive factors that determined item difficulty. These factors included well-known aspects such as inverse ordering, tie or order effects in additions, number range, odd or even numbers, borrowing/carry-over effects, and the number of elements to be added.
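To make the notion of coding items by problem characteristics more concrete, the sketch below tags a two-operand addition problem with a few of the features named above (tie, operand order, carry-over, number range, parity). The function name `code_addition_features` and the exact operationalisation of each feature are assumptions made for illustration; the study's own coding scheme is not given here.

```python
def code_addition_features(a: int, b: int) -> dict:
    """Code the addition problem a + b on illustrative cognitive features.

    The feature definitions are assumptions for this sketch, not the
    coding scheme used in the study.
    """
    # Carry-over: some column sum (including an incoming carry) reaches 10.
    carry = False
    x, y, c = a, b, 0
    while x > 0 or y > 0:
        column_sum = x % 10 + y % 10 + c
        c = 1 if column_sum >= 10 else 0
        carry = carry or c == 1
        x //= 10
        y //= 10

    return {
        "tie": a == b,                            # both operands identical
        "smaller_first": a < b,                   # one possible order effect
        "carry": carry,                           # carry-over needed
        "max_operand": max(a, b),                 # proxy for number range
        "both_even": a % 2 == 0 and b % 2 == 0,   # parity of the operands
    }


features = code_addition_features(3, 18)
```

Feature vectors of this kind could then serve as predictors of item difficulty, for example in a regression of estimated difficulties on the coded characteristics.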
Especially in Grade 1, the semantic embedding in which the problem was presented impacted item difficulty in most models (80%). This effect clearly decreased in Grades 3 and 5, pointing to older students’ higher ability to focus on the content of mathematical problems. Each identified factor was analyzed in terms of subgroup differences, and about half of the models were affected by such effects. Gender had the most impact, followed by self-concept and socioeconomic status. Interestingly, those differences were mostly found for cognitive factors (23) and less often for factors related to the embedding (6).
In sum, the results are promising and show that item development based on cognitive models not only provides the opportunity to apply automatic item generation but also to create item pools with at least approximately known item difficulty. Thus, the majority of the cognitive models developed in this study could be used to generate a huge number of items (> 10,000,000) for the domain of numbers & operations without the need for expensive field trials. A necessary precondition for this is the consideration of the semantic embedding in which the problems are presented, especially in lower grades. It also has to be stated that modeling in Grade 1 was more challenging due to unforeseen interactions and transfer effects between items. We will end our presentation by discussing lessons learned from models where prediction was less successful and by highlighting differences between the grades.