Michels, Michael Andreas
Scientific Conference (2022, November)

Today’s educational field has a tremendous hunger for valid and psychometrically sound items to reliably track and model students’ learning processes. Educational large-scale assessments, formative classroom assessment and, lately, digital learning platforms require a constant stream of high-quality, unbiased items. However, traditional development of test items ties up a significant amount of time from subject matter experts, pedagogues, and psychometricians, and may no longer be suited to today’s demands. A promising remedy is automatic item generation (AIG), which makes it possible to generate many items within a short period of time by applying algorithms to cognitively sound item templates (Gierl & Haladyna, 2013; Gierl et al., 2015). The present study psychometrically analyses 35 cognitive item models that were developed by a team of national subject matter experts and psychometricians and then used to algorithmically produce items for the mathematical domain of numbers & operations for Grades 1, 3, 5, and 7 of the Luxembourgish school system. Each item model was administered in six experimentally varied versions to investigate the impact of (a) the context in which the mathematical problem was presented, and (b) problem characteristics that cognitive psychology has identified as influencing the problem-solving process. Based on samples from Grade 1 (n = 5963), Grade 3 (n = 5527), Grade 5 (n = 5291), and Grade 7 (n = 3018), collected within the annual Épreuves standardisées, this design allows evaluating whether the psychometric characteristics of the items produced per model (a) are stable, (b) can be predicted from problem characteristics, and (c) are unbiased towards subgroups of students known to be disadvantaged in the Luxembourgish school system. After item calibration using the 1-PL model, each cognitive model was analyzed in depth through descriptive comparisons of the resulting IRT parameters and through estimation of the impact of the manipulated problem characteristics on item difficulty using the linear logistic test model (LLTM; Fischer, 1973). Results are promising and show negligible effects of different problem contexts on item difficulty and reasonably stable effects of altered problem characteristics. Thus, the majority of the developed cognitive models could be used to generate a huge number of items (> 10,000,000) for the domain of numbers & operations with known psychometric properties, without the need for expensive field trials. We conclude by discussing lessons learned from the per-model prediction of item difficulty and by highlighting differences between the Grades.

References:
Fischer, G. H. (1973). The linear logistic test model as an instrument in educational research. Acta Psychologica, 37, 359–374.
Gierl, M. J., & Haladyna, T. M. (Eds.). (2013). Automatic item generation: Theory and practice. New York, NY: Routledge.
Gierl, M. J., Lai, H., Hogan, J., & Matovinovic, D. (2015). A method for generating educational test items that are aligned to the Common Core State Standards. Journal of Applied Testing Technology, 16(1), 1–18.
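For readers less familiar with the two models named above, a minimal sketch in standard IRT notation follows; the abstract gives no formulas, so the symbols below are conventional choices, not taken from the study. The 1-PL (Rasch) model makes the solution probability depend only on the difference between person ability and item difficulty, and the LLTM additionally constrains each item difficulty to a weighted sum of the difficulty contributions of its problem characteristics.

```latex
% 1-PL (Rasch) model: person ability \theta_j, item difficulty b_i
\[
P(X_{ij} = 1 \mid \theta_j) = \frac{\exp(\theta_j - b_i)}{1 + \exp(\theta_j - b_i)}
\]
% LLTM constraint (Fischer, 1973): q_{ik} encodes the presence (or weight)
% of problem characteristic k in item i (the design matrix of the
% manipulated characteristics), \eta_k is its estimated difficulty
% contribution, and c is a normalization constant
\[
b_i = \sum_{k=1}^{K} q_{ik}\,\eta_k + c
\]
```

If the LLTM reproduces the calibrated difficulties of a given cognitive model well, new items generated from that model inherit predictable difficulties, which is what makes skipping further field trials defensible.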
Inostroza Fernandez, Pamela Isabel
Scientific Conference (2022, November)

Educational large-scale assessments aim to evaluate school systems’ effectiveness, typically by looking at aggregated levels of students’ performance. The assessment tools or tests developed for this purpose are not intended, or optimized, to be used for diagnostic purposes on an individual level. In most cases, the underlying theoretical framework is based on national curricula and is therefore too blurry for diagnostic test construction, and test length is too short to draw reliable inferences at the individual level. This lack of individual information is often unsatisfying, especially for the participating students and teachers, who invest a considerable amount of time and effort, not to speak of the tremendous organizational work needed to realize such assessments. The question remains whether such evaluations could be used in an optimized way to offer more differentiated information on students’ specific skills. The present study explores the potential of Diagnostic Classification Models (DCM) in this regard, since they offer crucial information for policy makers, educators, and students themselves. Instead of a ranking on, e.g., an overall mathematics ability scale, DCM identify student mastery profiles of subskills, providing a rich base for further targeted interventions and instruction (Rupp, Templin & Henson, 2010; von Davier & Lee, 2019). A prerequisite for applying such models is well-developed, cognitively described items that map the assessed ability on a fine-grained level. In the present study, we drew on 104 items that were developed on the basis of detailed cognitive item models for basic Grade 1 competencies, such as counting, as well as decomposition and addition with low and high numbers (Fuson, 1988; Fritz & Ricken, 2008; Krajewski & Schneider, 2009). These items were spread over a main test plus six different test booklets and administered to a total of 5963 first graders within the Luxembourgish national school monitoring Épreuves standardisées. Results of this pilot study are highly promising, giving information about different patterns of student behavior: the final DCM was able to distinguish between different developmental stages in the domain of numbers & operations, on the group as well as on the individual level. Whereas roughly 14% of students did not master any of the assessed competencies, 34% of students mastered all of them, including addition with high numbers. The remaining 52% reached different stages of competency development: 8% of students were classified as mastering only counting, 15% of students additionally mastered addition with low numbers, and 20% of students additionally mastered decomposition. These patterns reflect developmental models of children’s counting and concept of number (Fritz & Ricken, 2008; see also Braeuning et al., 2021). This information could potentially be used to substantially enhance large-scale assessment feedback and to offer further guidance for teachers on what to focus on when teaching.

To conclude, the present results make a convincing case that using fine-grained cognitive models for item development, and applying DCM that are able to statistically capture these nuances in student response behavior, might be worth the (substantially) increased effort.

References:
Braeuning, D., et al. (2021). Long-term relevance and interrelation of symbolic and non-symbolic abilities in mathematical-numerical development: Evidence from large-scale assessment data. Cognitive Development, 58. https://doi.org/10.1016/j.cogdev.2021.101008
Fritz, A., & Ricken, G. (2008). Rechenschwäche. utb GmbH.
Fuson, K. C. (1988). Children’s counting and concepts of number. Springer-Verlag Publishing.
Rupp, A. A., Templin, J. L., & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. New York, NY: Guilford Press.
von Davier, M., & Lee, Y. S. (2019). Handbook of diagnostic classification models. Cham: Springer International Publishing.
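The abstract does not state which specific DCM was fitted; purely as an illustration of how such models yield mastery profiles, here is a sketch of the DINA model, one of the simplest DCM treated in Rupp, Templin & Henson (2010). Each student j carries a binary attribute profile \alpha_j over the subskills (e.g., counting, decomposition, addition with low/high numbers), and a Q-matrix records which attributes each item requires.

```latex
% Ideal response: 1 iff student j masters every attribute k
% required by item i (i.e., every k with q_{ik} = 1)
\[
\xi_{ij} = \prod_{k=1}^{K} \alpha_{jk}^{\,q_{ik}}
\]
% Response probability with per-item slip s_i and guessing g_i
\[
P(X_{ij} = 1 \mid \boldsymbol{\alpha}_j) = (1 - s_i)^{\xi_{ij}}\, g_i^{\,1 - \xi_{ij}}
\]
```

The estimated profiles \alpha_j are the “student mastery profiles of subskills” referred to above; the reported 14%/34%/52% breakdown describes how such profiles are distributed across the sample.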
Michels, Michael Andreas
Scientific Conference (2022, March 09)

Michels, Michael Andreas
Scientific Conference (2021, November 11)

Assessing mathematical skills in national school monitoring programs such as the Luxembourgish Épreuves Standardisées (ÉpStan) creates a constant demand for high-quality items whose development is both expensive and time-consuming. One approach to providing high-quality items more efficiently is Automatic Item Generation (AIG; Gierl, 2013). Instead of creating single items, cognitive item models form the base for the algorithmic generation of a large number of new items with supposedly identical item characteristics. The stability of these item characteristics is questionable, however, when different semantic embeddings are used to present the mathematical problems (Dewolf, Van Dooren, & Verschaffel, 2017; Hoogland et al., 2018). Given culture-specific knowledge differences between students, it is not guaranteed that illustrations showing everyday activities do not differentially impact item difficulty (Martin et al., 2012). Moreover, the prediction of empirical item difficulties based on theoretical rationales has proved to be difficult (Leighton & Gierl, 2011). This paper presents a first attempt to better understand the impact of (a) different semantic embeddings and (b) problem-related variations on mathematics items in Grades 1 (n = 2338), 3 (n = 3835), and 5 (n = 3377) within the context of ÉpStan. In total, 30 mathematical problems were presented in up to four different versions, either using different but equally plausible semantic contexts or altering the problems’ content characteristics. Preliminary results of IRT scaling and DIF analyses reveal substantial effects of both the embedding and the problem characteristics on item difficulty, at the overall as well as the subgroup level. Further results and implications for developing mathematics items, and specifically for using AIG in the course of ÉpStan, will be discussed.
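As a sketch of what the DIF analyses mentioned above test (the abstract does not specify the exact procedure, so both the notation and the restriction to uniform DIF are assumptions), uniform DIF in a Rasch framework can be written as a group-specific shift in item difficulty:

```latex
% G_j: subgroup indicator (e.g., language background); \gamma_i:
% uniform DIF of item i. \gamma_i close to 0 for all versions of a
% problem would indicate that the semantic embedding does not bias
% the item against the subgroup.
\[
\operatorname{logit} P(X_{ij} = 1 \mid \theta_j) = \theta_j - (b_i + \gamma_i G_j)
\]
```

A substantial \gamma_i appearing under one semantic embedding but not under another would be precisely the kind of culture-specific effect that the use of everyday illustrations raises.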