MDE / OEAA
Growing Pains: The State of the Art in Value-Added Modeling
Presentation on March 2, 2005 to the Michigan School Testing Conference
By Joseph A. Martineau, Psychometrician
Office of Educational Assessment & Accountability, Michigan Department of Education
Transcript
Slide 1
Growing Pains: The State of the Art in Value-Added Modeling
Slide 2
Why Value Added?
- Value-added measures of achievement are being discussed as a possible addition to the regulations of No Child Left Behind (NCLB).
- Various ways of implementing value added in NCLB are possible.
- One likely implementation of value added is as another way to make safe harbor if the percent-proficiency targets are not met.
Slide 3
What Is Value Added?
- In accountability, "value added" is a term that describes the part of achievement (or change in achievement) that is attributable to the effectiveness of a unit (teacher or school).
- Positive estimates indicate units that are above average; negative estimates indicate units that are below average.
- Defining what is attributable to the effectiveness of a unit is a matter of philosophical debate.
Slide 4
The Logic of Value Added
- Holding educators accountable for student performance has many pitfalls:
  - Educators cannot control their students' incoming achievement.
  - Educators cannot control the effectiveness of their students' previous teachers/schools.
  - Educators cannot control the effects of non-instructional student characteristics such as:
    - Poverty
    - Parental education
    - Mobility
    - Home environment
    - Etcetera
Slide 5
The Logic of Value Added, Continued
- Value-Added Models (VAM) attempt to obtain pure estimates of the contribution of educators to student achievement and/or growth in achievement.
- The promise of VAM is that educators are held accountable only for their impact on student learning.
- The idea is not rocket science (Sanders), but the implementation is (Reckase).
Slide 6
The Idea Is Not Rocket Science
For each school:
1. Estimate the expected average achievement or gain score.
2. Calculate the observed average achievement or gain score.
3. Subtract the expected from the observed average score.
4. Define the resulting difference between expected and observed scores as the value added by the school.
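The four steps above can be sketched in a few lines. This is only an illustration of the arithmetic; the school's gain scores and the expected mean below are made-up numbers, not real MDE data, and in practice the expected score comes from a statistical model rather than a fixed constant.

```python
def value_added(observed_scores, expected_mean):
    """Return the observed mean minus the expected mean for one school."""
    observed_mean = sum(observed_scores) / len(observed_scores)
    return observed_mean - expected_mean

# Hypothetical school: five students' gain scores, expected average gain of 10.
gains = [12.0, 9.5, 11.0, 13.5, 10.0]
print(round(value_added(gains, expected_mean=10.0), 2))  # 1.2
```

A positive result (here, 1.2 points above expectation) would be read as positive value added; a negative result, as below-average effectiveness.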
Slide 7
The Idea Is Not Rocket Science
Adjusting Achievement Targets to Be More Fair to Educators
Slide 8
The Idea Is Not Rocket Science
Adjusting Gain Targets to Be More Fair to Educators (Tennessee Model)
Slide 9
The Idea Is Not Rocket Science
Adjusting Gain Targets to Be More Fair to Educators (Dallas Model)
Slide 10
The Idea Is Getting Closer to Rocket Science
Adjusting Yearly Gain Targets to Meet a Final Achievement Goal (Thum Model)
Slide 11
The Implementation IS Rocket Science
In a growth-based VAM, for each school you must:
1. Specify a mixed model (a sophisticated statistical procedure that accounts for the structure of data coming from multiple occasions for each student, and multiple students per unit).
2. Estimate an overall average gain for each school year, and for the entire set of students and schools.
3. Estimate a unique expected average gain for each school year and school.
4. Estimate the difference between the school's actual average trajectory and the expected average trajectory for each school year and school.
5. Keep track of previous schools' effects so that they don't get counted toward later schools.
6. Estimate a unique expected gain for each school year, student, and school.
7. Estimate the difference between the expected gain and the actual gain for each school year, student, and school.
8. Keep track of all differences across years so that a student's high growth in one year is not counted toward all subsequent years.
9. Estimate all of these expected and actual gains together so that they are unbiased and reliable.
10. Do all of this using a sparse data matrix, which causes ordinary software to choke.
11. So, you write your own software, and develop new applications of statistical theory to make your idea work.
12. Communicate the results in an understandable fashion to stakeholders.
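A core idea behind step 1's mixed model can be illustrated with a toy empirical-Bayes sketch: each school's effect is its raw mean deviation from the grand mean, shrunken toward zero according to how reliably it is estimated (fewer students means more shrinkage). The schools, gains, and variance components below are assumptions for illustration, not estimates from real data, and a production VAM estimates the variance components jointly rather than taking them as given.

```python
def shrunken_school_effects(gains_by_school, var_school, var_student):
    """Shrink each school's raw mean deviation toward zero, in proportion
    to how precisely that school's mean is estimated (BLUP-style)."""
    all_gains = [g for gains in gains_by_school.values() for g in gains]
    grand_mean = sum(all_gains) / len(all_gains)
    effects = {}
    for school, gains in gains_by_school.items():
        n = len(gains)
        raw_deviation = sum(gains) / n - grand_mean
        # Shrinkage factor: approaches 1 as n grows, 0 as data get noisy.
        shrink = var_school / (var_school + var_student / n)
        effects[school] = shrink * raw_deviation
    return effects

# Hypothetical gain scores for three small schools.
data = {"A": [11, 13, 12, 14], "B": [9, 8, 10], "C": [10, 11]}
print(shrunken_school_effects(data, var_school=4.0, var_student=9.0))
```

The shrinkage is what lets mixed models produce stable effect estimates even for small schools, at the cost of pulling extreme raw means toward the average.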
Slide 12
The Problem with Rocket Science
As with rocket science, many things can cause large distortions in the results of VAM, including:
- Small problems with the scales of measurement
- Small programming errors
- Small errors in assumptions needed for the statistical models to work appropriately
Slide 13
Statistical Issues in VAM
- 50 years ago, researchers despaired of ever being able to measure growth validly, because the statistical issues seemed insurmountable.
- Most of the statistical issues have been solved by the introduction of statistical mixed models.
Slide 14
Statistical Issues in VAM, Continued
- For VAM, one very significant statistical issue remains.
- The parts of the statistical models that produce estimates of value added were originally included in statistical models for the purpose of accounting for sources of error so that other effects were easier to identify. Therefore:
  - Estimates of value added can also be classified as error terms.
  - Estimates of value added are technically the portion of achievement or gains that cannot be explained by anything else included in the model.
- In effect, the implementation of a value-added model says: whatever portion of achievement and/or growth we do not know how to explain is to be attributed to schools.
Slide 15
Statistical Issues in VAM, Continued
- There are philosophical, ethical, and political considerations in attributing to schools all achievement/gains that cannot be explained any other way:
  - Do we have to remove differences explained by ethnicity before we can attribute the rest to schools?
  - Do we have to remove differences explained by poverty before we can attribute the rest to schools?
  - Etcetera
- Is it possible to ever satisfy the majority of stakeholders that what's left over is pure enough to hold schools accountable for?
- No matter how we answer these questions, each answer raises additional philosophical, ethical, and political concerns.
Slide 16
Ethical Issues in VAM, Continued
VAMs as Currently Implemented
- Focus lies squarely on being fair to educators.
- In TN and OH:
  - All educators are expected to produce the same average gains in their students.
  - The achievement gap is expected to remain as it was, because educators of lower-achieving groups of students are not expected to help their students catch up.
- In Dallas:
  - All educators are expected to produce gains in their students that are equivalent to the average gains achieved by similar groups of students.
  - The achievement gap may be expected to widen, because lower-performing groups of students may achieve lower average gains than other groups of students.
Slide 17
Ethical Issues in VAM, Continued
- Where does VAM take into account fairness for low-performing students?
- Currently implemented VAMs say, basically, "I need to see one year's growth for one year of instruction," where (as in the Dallas model) one year's worth of growth can be less for some groups of students than for others.
- Because of concerns about being fair to educators, groups of students that start out behind are left behind by the same amount (or even more).
- The Thum model is a compromise that expects a modest amount more of educators serving low-achieving students, with the gap being closed over many grades.
  - Not really a VAM
  - A mixture of status and growth
Slide 18
Political Issues in VAM
- Complexity:
  - Rocket science is a political liability.
  - As more of the statistical and ethical issues of VAM are addressed, VAMs are likely to become even more inaccessible to the lay audience.
  - VAM requires an extraordinary amount of trust in those who implement the system.
- Ethical issues will be decided by a political process that does not necessarily account for the best interest of students and educators, e.g.:
  - Dallas: focus on the best interests of educators, at the possible price of increasing achievement gaps.
  - TN, OH: focus on the best interests of educators, at the possible price of leaving achievement gaps as they are.
  - Thum: focus on the best interests of low-performing groups, at the possible expense of (1) high-performing groups of students, and (2) making low-achieving schools less attractive to qualified teachers.
- The state of the art in VAM is incapable of providing for both high achievement for all students and fairness in evaluating educators of lower-performing students.
Slide 19
Measurement Issues in VAM
Now that most of the statistical issues in VAM have been solved, the measurement issues have been forgotten in the excitement.
Slide 20
Measurement Issues in VAM, Continued
- VAM assumes that the same thing is being measured at every grade level of the test.
- This presents a dilemma:
  - In order to measure validly, we have to measure what is being taught, which changes over grade levels.
  - In order to calculate growth, gains, and value added, we have to measure the same thing every time we measure.
- Value-added models are being applied to construct-shifting scales as if the scales were interval-level measures of student achievement on unchanging content.
Slide 21
Cautions in Using Vertical Scales
- Scholars have been warning against the use of construct-shifting scales to measure growth for 50+ years.
- However, the use of vertical scales in growth models has become increasingly prevalent in the scholarly literature with the advent of recent statistical developments (HLM and SEM).
- So am I just straining at gnats? Can't I just use vertical scales to measure growth? What harm can it do?
- How big is the effect of changing content on growth models and growth-based value-added models?
Slide 22
Hypothetical Example
- A vertically scaled mathematics test:
  - Grades 3-8
  - Composed of only two constructs:
    - Basic Computation (BC)
    - Problem Solving (PS)
  - BC is heavily represented in early grades; PS is heavily represented in later grades.
  - Only the single, combined math score is available (BC and PS are just in the background).
Slide 23
Hypothetical Example
Slide 24
Hypothetical Example
Slide 25
Hypothetical Example
Slide 26
The Effects of Construct Shift
- Construct shift affects the estimation of educational effectiveness (the results of value-added models):
  - It does not accurately identify effectiveness if student achievement is outside the range measured well by the grade-level test.
  - It attributes effectiveness of prior teachers/schools to current teachers/schools (violating the promise of value-added models).
Slide 27
Slide 28
Reliability
- Reliability: the ratio of construct-related variance to total variance (construct-related plus non-construct-related variance).
- Extended to value-added models: the ratio of variance in true value added to total variance (true value-added variance plus variance of distortions).
- How important is this distortion, especially when the constructs are correlated?
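The ratio described above can be written down directly. The variance numbers in the example are purely illustrative; in practice both components would have to be estimated, and the slide's point is that construct-shift distortion inflates the denominator.

```python
def vam_reliability(var_true_value_added, var_distortion):
    """Reliability as the ratio of true value-added variance to total
    variance (true value-added variance plus distortion variance)."""
    return var_true_value_added / (var_true_value_added + var_distortion)

# If distortions are as large as the true signal, reliability drops to 0.5;
# high-stakes uses typically expect reliabilities around 0.9 or higher.
print(vam_reliability(1.0, 1.0))  # 0.5
print(vam_reliability(9.0, 1.0))  # 0.9
```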
Slide 29
Reliability
Martineau (in press) derived an upper bound on the reliability of VAM, which is:
- Affected by content balance (more balanced means lower reliability)
- Affected by correlation in value added (higher correlation means higher reliability)
- Affected by grade level (later grades have lower reliability)
- Affected by the magnitude of changes in content across grades (larger changes mean lower reliability)
Slide 30
Reliability of VAM Results
Slide 31
Reliability
- Only in extraordinary circumstances are the results reliable enough for high-stakes use.
- For research use, the results may be reliable enough in some limited circumstances.
Slide 32
Alleviating Low Reliability of Value-Added Analyses
- Twice-a-year testing:
  - Not politically viable
  - Completely eliminates low reliability
- Once-yearly testing with a new equating design:
  - Embed the entire set of below-grade items on the current-grade test by including a small portion of the set on each of multiple test forms.
  - Calibrate a separate vertical scale for each adjacent pair of grades (e.g., 3/4, 4/5, 5/6): concurrent calibration of grade 3 and 4 items together, grade 4 and 5 items together, and grade 5 and 6 items together.
  - This should markedly reduce the amount of construct shift, and increase the reliability to an acceptable degree.
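The pairing in the proposed equating design is simple to state programmatically: one separate vertical scale per adjacent pair of grades. The grade range below is the slides' hypothetical 3-8 span; this sketch only enumerates the pairs and says nothing about the calibration itself.

```python
def adjacent_grade_pairs(grades):
    """Return the adjacent grade pairs, each to be calibrated concurrently
    on its own vertical scale."""
    return [(g, g + 1) for g in grades[:-1]]

print(adjacent_grade_pairs([3, 4, 5, 6, 7, 8]))
# [(3, 4), (4, 5), (5, 6), (6, 7), (7, 8)]
```

Because each interior grade appears in two overlapping pairs, adjacent scales share a grade's items, which is what links the separate scales together.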
Slide 33
Contact Information
Joseph Martineau
Office of Educational Assessment & Accountability
Michigan Department of Education
P.O. Box 30008
Lansing, MI 48909
(517) 241-4710
[email protected]