Value-added modeling; school effectiveness; longitudinal data; primary school
Abstract :
Value-added (VA) models are used for accountability purposes and quantify the value a teacher or a school adds to their students’ achievement. If VA scores lack stability over time and vary across outcome domains (e.g., mathematics and language learning), their use for high-stakes decision making is in question and could have detrimental real-life implications: teachers could lose their jobs, or a school might receive less funding. However, school-level stability over time and variation across domains have rarely been studied together. In the present study, we examined the stability of VA scores over time for mathematics and language learning, drawing on representative, large-scale, and longitudinal data from two cohorts of standardized achievement tests in Luxembourg (N = 7,016 students in 151 schools). We found that only 34–38% of the schools showed stable VA scores over time with moderate rank correlations of VA scores from 2017 to 2019 of r = .34 for mathematics and r = .37 for language learning. Although they showed insufficient stability over time for high-stakes decision making, school VA scores could be employed to identify teaching or school practices that are genuinely effective, especially in heterogeneous student populations.
Research center :
- Faculty of Language and Literature, Humanities, Arts and Education (FLSHASE) > Luxembourg Centre for Educational Testing (LUCET)
Disciplines :
Education & instruction
Author, co-author :
EMSLANDER, Valentin ; University of Luxembourg > Faculty of Humanities, Education and Social Sciences (FHSE) > LUCET
Levy, Jessica; University of Luxembourg > Faculty of Language and Literature, Humanities, Arts and Education (FLSHASE) > Luxembourg Centre for Educational Testing (LUCET)
Scherer, Ronny; University of Oslo, Norway > Faculty of Educational Sciences > Centre for Educational Measurement at the University of Oslo (CEMO)
FISCHBACH, Antoine ; University of Luxembourg > Faculty of Humanities, Education and Social Sciences (FHSE) > Department of Education and Social Work (DESW)
External co-authors :
yes
Language :
English
Title :
Value-Added Scores Show Limited Stability over Time in Primary School
Publication date :
28 December 2022
Journal title :
PLoS ONE
eISSN :
1932-6203
Publisher :
Public Library of Science, San Francisco, United States - California