Document type: Unpublished conference contribution
Discipline(s): Social & behavioral sciences, psychology > Education & instruction; Educational Sciences
Are Value-Added Scores Stable Enough for High-Stakes Decisions?
Emslander, Valentin [University of Luxembourg, Faculty of Humanities, Education and Social Sciences (FHSE), Luxembourg Centre for Educational Testing (LUCET)]
Levy, Jessica [University of Luxembourg, Faculty of Humanities, Education and Social Sciences (FHSE), Department of Education and Social Work (DESW)]
Scherer, Ronny [University of Oslo (UiO), Centre for Educational Measurement (CEMO), Faculty of Educational Sciences]
Brunner, Martin [University of Potsdam, Department of Education]
Fischbach, Antoine [University of Luxembourg, Faculty of Humanities, Education and Social Sciences (FHSE), Department of Education and Social Work (DESW)]
9th Conference of the Society for Empirical Educational Research (GEBf)
9–11 March 2022
Keywords: value-added modeling; school effectiveness
Theoretical Background: Can we quantify the effectiveness of a teacher or a school with a single number? Researchers in the field of value-added (VA) models may argue just that (e.g., Chetty et al., 2014; Kane et al., 2013). VA models are widely used for accountability purposes in education and quantify the value a teacher or a school adds to their students' achievement. For this purpose, these models predict achievement over time and attempt to control for factors that schools and teachers cannot influence (i.e., sociodemographic and sociocultural background). Following this logic, whatever remains must be attributable to differences between teachers or schools (see, e.g., Braun, 2005).
To utilize VA models for high-stakes decision-making (e.g., teachers' tenure, the allocation of funding), these models would need to be highly stable over time. School-level stability over time, however, has hardly been researched, and the existing findings are mixed: some studies indicate high stability of school VA scores over time (Ferrão, 2012; Thomas et al., 2007), whereas others report a lack of stability (e.g., Gorard et al., 2013; Perry, 2016). Furthermore, as there is no consensus on which variables to use as independent or dependent variables in VA models (Everson, 2017; Levy et al., 2019), the stability of VA scores could vary between outcome measures (e.g., language vs. mathematics). If VA models lack stability over time and across outcome measures, their use as the primary basis for high-stakes decision-making is in question, and the inferences drawn from them could be compromised.
Questions: With these uncertainties in mind, we examine the stability of school VA scores over time and investigate differences between language and mathematics achievement as outcome variables. Additionally, we demonstrate the real-life implications of (in)stable VA scores for individual schools and point out an alternative, more constructive use of school VA models in educational research.
Method: To study the stability of VA scores at the school level over time and across outcomes, we drew on a sample of 146 primary schools, using representative longitudinal data from the standardized achievement tests of the Luxembourg School Monitoring Programme (LUCET, 2021). These schools comprised a heterogeneous and multilingual sample of 7,016 students. To determine the stability of VA scores in mathematics and language over time, we based our analysis on two longitudinal datasets (2015 to 2017 and 2017 to 2019, respectively) and generated two VA scores per dataset, one for language and one for mathematics achievement. We then analyzed how many schools displayed stable VA scores in the respective outcomes over two years and compared the rank correlations of VA scores over time for language and for mathematics achievement as outcome variables.
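The VA logic described above (predict achievement from prior achievement and background, then treat each school's mean residual as its VA score) and the stability check via rank correlations can be sketched in a few lines. This is an illustrative simulation under assumed parameters, not the study's actual analysis; all variable names, effect sizes, and the simulated data are hypothetical.

```python
# Illustrative sketch (not the study's actual pipeline): school VA scores as
# school-mean residuals from a covariate-adjusted regression, then a Spearman
# rank correlation of the VA scores across two longitudinal waves.
# All data are simulated; every name and parameter here is hypothetical.
import numpy as np

rng = np.random.default_rng(42)
n_students, n_schools = 2000, 146
school = rng.integers(0, n_schools, size=n_students)  # school membership

def simulate_wave(school_effect):
    """Simulate one wave: prior achievement, background, later outcome."""
    prior = rng.normal(size=n_students)           # e.g., grade-1 achievement
    background = rng.normal(size=n_students)      # sociodemographic proxy
    outcome = (0.7 * prior + 0.3 * background
               + school_effect[school]
               + rng.normal(scale=0.5, size=n_students))
    return prior, background, outcome

def school_va(prior, background, outcome):
    """VA score per school = mean OLS residual of that school's students."""
    X = np.column_stack([np.ones(n_students), prior, background])
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    residuals = outcome - X @ beta                # achievement not explained
    return np.array([residuals[school == s].mean() for s in range(n_schools)])

def spearman(a, b):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    rank = lambda x: x.argsort().argsort()
    return np.corrcoef(rank(a), rank(b))[0, 1]

# "True" school effects that only partially carry over to the second wave.
effect_w1 = rng.normal(scale=0.3, size=n_schools)
effect_w2 = 0.6 * effect_w1 + rng.normal(scale=0.24, size=n_schools)

va_w1 = school_va(*simulate_wave(effect_w1))
va_w2 = school_va(*simulate_wave(effect_w2))
rho = spearman(va_w1, va_w2)
print(f"Rank correlation of school VA scores across waves: {rho:.2f}")
```

Because the simulated school effects are only partly shared between waves, the sketch reproduces the qualitative pattern reported below: a moderate, far-from-perfect rank correlation even when the model itself is correctly specified.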
Results and Their Significance: Only 34–38% of the schools showed stable VA scores from grade 1 to grade 3, with moderate rank correlations of r = .37 for language and r = .34 for mathematics achievement. We therefore discourage using VA scores as the sole basis for high-stakes educational decisions. Nonetheless, we argue that VA models could be employed to identify genuinely effective teaching or school practices, especially in heterogeneous student populations such as Luxembourg's, in which educational disparities are an important topic already in primary school (Hoffmann et al., 2018). Building on this, we examine school climate and instructional quality as potential drivers of the differences between schools with stably high vs. stably low VA scores.


Braun, H. (2005). Using student progress to evaluate teachers: A primer on value-added models. Educational Testing Service.
Chetty, R., Friedman, J. N., & Rockoff, J. E. (2014). Measuring the impacts of teachers I: Evaluating bias in teacher value-added estimates. American Economic Review, 104(9), 2593–2632.
Everson, K. C. (2017). Value-added modeling and educational accountability: Are we answering the real questions? Review of Educational Research, 87(1), 35–70.
Ferrão, M. E. (2012). On the stability of value added indicators. Quality & Quantity, 46(2), 627–637.
Gorard, S., Hordosy, R., & Siddiqui, N. (2013). How unstable are “school effects” assessed by a value-added technique? International Education Studies, 6(1), 1–9.
Kane, T. J., McCaffrey, D. F., Miller, T., & Staiger, D. O. (2013). Have we identified effective teachers? Validating measures of effective teaching using random assignment (MET Project research paper). Bill & Melinda Gates Foundation.
Levy, J., Brunner, M., Keller, U., & Fischbach, A. (2019). Methodological issues in value-added modeling: An international review from 26 countries. Educational Assessment, Evaluation and Accountability, 31(3), 257–287.
LUCET. (2021). Épreuves Standardisées (ÉpStan).
Perry, T. (2016). English value-added measures: Examining the limitations of school performance measurement. British Educational Research Journal, 42(6), 1056–1080.
Thomas, S., Peng, W. J., & Gray, J. (2007). Modelling patterns of improvement over time: Value added trends in English secondary school performance across ten cohorts. Oxford Review of Education, 33(3), 261–295.
Project: Systematic Identification of High "Value-Added" in Educational Contexts (SIVA)


