Post on 28-Dec-2015
transcript
CENTER FOR EDUCATION POLICY ANALYSIS at STANFORD UNIVERSITY cepa.stanford.edu
Measuring and Enhancing Teacher Effectiveness:
Data, Methods, and Policies
Susanna Loeb*
Higher School of Economics National Research University, Moscow
September 2014
*Content joint with Jim Wyckoff & Allison Atteberry, Ben Master, Matt Ronfeldt or Luke Miller
Why Measure Teacher Effectiveness?
• Better decisions
– Direct
• e.g. whom to promote
– Indirect
• Improved understanding
– e.g. what experiences improve teacher effectiveness?
Today
• A bit of history on teacher effectiveness measures in the US
• Considerations of Measurement
• Four examples of potential uses
– focus on the last one
Large-Scale Test Data Availability
• Test-Based Accountability
– State Level First
• TX, NC, SC, FL and others introduced yearly tests to track school performance.
– Federal Level - No Child Left Behind Act
• Required ELA and math tests in 3rd-8th grade plus one in high school
• State and district data allowed researchers to assess policy effects and the effects of teachers
– Teachers vary widely in their ability to improve student achievement (Gordon, Kane, & Staiger 2006; Rivkin, Hanushek, & Kain 2005; Sanders & Rivers 1996)
– Teachers improve with experience, particularly during their first two years (e.g. Rockoff, 2004)
The Widget Effect
• 2009 Study in 12 large school districts
• Schools and districts
– Not measuring teacher effectiveness
• In districts that use binary evaluation ratings (generally “satisfactory” or “unsatisfactory”), more than 99 percent of teachers receive the satisfactory rating.
• Districts that use a broader range of rating options do little better; in these districts, 94 percent of teachers receive one of the top two ratings and less than 1 percent are rated unsatisfactory.
– Not considering teacher effectiveness in decisions
Push for Evaluation
• Combination of
– Recognition of Teacher Importance
– Recognition of the Widget Effect
• Led to a strong push for new evaluation systems
– Not based solely on subjective assessments, given the forces leading to little variation
• Speed of change probably due to Obama administration policies
– close ties to entrepreneurial educators: TNTP, TFA…
Race to the Top
• $4.35 Billion Competition as part of the American Recovery and Reinvestment Act of 2009
• Most points for “Great Teachers and Leaders” (138/500)
– Improving teacher and principal effectiveness based on performance (58 points)
– Ensuring equitable distribution of effective teachers and principals (25 points)
– Providing high-quality pathways for aspiring teachers and principals (21 points)
– Providing effective support to teachers and principals (20 points)
– Improving the effectiveness of teacher and principal preparation programs (14 points)
Improving teacher effectiveness using performance measures
• Raises Questions
– How to measure effectiveness?
– How to use measures of effectiveness once you have them?
• What are different kinds?
– Output based (e.g., based on student test performance)
– Process based (e.g., based on structured observational protocol)
– Holistic / Subjective (e.g., principal evaluations)
• What features do we want?
– Validity (measurement property)
– Reliability (measurement property)
– Stability (effectiveness property)
• Focus today on measures based on student test scores
– Similar analyses could be done with other measures
Value-Added
• Measure teacher effectiveness by how much students’ test performance improves from the spring of the prior year to the spring of the current year
• Idea is to isolate the teacher’s effect from other effects on learning – “value-added”
• Can only be calculated for teachers in grades and subject areas for which there are tests in the prior year as well as the current year
• Clearly better than using test performance levels
• Far from perfect
– e.g., based on imperfect tests, subject to random fluctuations and potential gaming
VAM - How are they calculated
• Student test score gains relative to what we think they would be
• Most are a basic regression
– Predict what a student would score in the spring based on a linear function of prior score, demographic characteristics, program participation (maybe), class characteristics, school characteristics
– Value added is the average difference between predicted and actual scores
• “Colorado Growth Model”
– For each student, how much do they learn relative to other students with the same prior test score (percentiles)?
– Median percentile of growth for the class
• Do Different Value-Added Models Tell Us the Same Things?
– Models vary in how they account for student backgrounds, school, and classroom resources and whether they compare teachers across a district (or state) or just within schools.
– Correlations between models are often high, but even so different models will categorize many teachers differently. (Goldhaber & Theobald, 2013)
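The "basic regression" version can be sketched in a few lines. This is only an illustration on synthetic data, not the model used in any district: the function name `value_added`, the single `poverty` control, and all coefficients are assumptions made for the example.

```python
import numpy as np

def value_added(prior, controls, current, teacher_ids):
    """Average residual per teacher from a linear prediction of the current score."""
    # Design matrix: intercept, prior score, and any other controls.
    X = np.column_stack([np.ones(len(prior)), prior, controls])
    # Ordinary least squares fit of the spring (current) score.
    beta, *_ = np.linalg.lstsq(X, current, rcond=None)
    residual = current - X @ beta  # actual minus predicted
    # Value-added estimate = mean residual among each teacher's students.
    return {t: residual[teacher_ids == t].mean() for t in np.unique(teacher_ids)}

# Synthetic illustration (all numbers hypothetical).
rng = np.random.default_rng(0)
n_teachers, class_size = 40, 25
teacher_ids = np.repeat(np.arange(n_teachers), class_size)
true_effect = rng.normal(0, 0.2, n_teachers)
prior = rng.normal(0, 1, n_teachers * class_size)
poverty = rng.integers(0, 2, n_teachers * class_size).astype(float)
current = (0.7 * prior - 0.1 * poverty + true_effect[teacher_ids]
           + rng.normal(0, 0.5, prior.size))

va = value_added(prior, poverty, current, teacher_ids)
```

Real models add more controls (class and school characteristics) and differ in whether teachers are compared within or across schools, which is why different specifications can rank the same teachers differently.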
A detailed example
• Test score
• Predicted by prior score, background, and classroom
• Use residual (plus classroom), predicted by classmate & school characteristics
• Average residual for each teacher
NYC Standard Deviations: ELA: 0.24 (0.19 shrunk); Math: 0.28 (0.21 shrunk)
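The "shrunk" standard deviations are smaller because noisy teacher averages are pulled toward the mean. The slides do not give the shrinkage formula, so the sketch below assumes a standard empirical-Bayes style adjustment toward zero; `shrink`, `noise_var`, and the input numbers are hypothetical.

```python
import numpy as np

def shrink(raw_va, noise_var, n_students):
    """Empirical-Bayes style shrinkage of raw teacher estimates toward zero."""
    raw_va = np.asarray(raw_va, dtype=float)
    # Sampling variance of a class mean falls with the number of students.
    sampling_var = noise_var / np.asarray(n_students, dtype=float)
    # Signal variance: variance of raw estimates minus average sampling variance.
    signal_var = max(raw_va.var() - sampling_var.mean(), 0.0)
    # Reliability is the share of each estimate's variance that is signal.
    reliability = signal_var / (signal_var + sampling_var)
    return reliability * raw_va

raw = np.array([0.30, -0.25, 0.10, -0.05, 0.40, -0.35])
shrunk = shrink(raw, noise_var=0.25, n_students=[25, 25, 10, 10, 50, 50])
```

Teachers with fewer students are shrunk more, so the spread of shrunk estimates is narrower than the spread of raw averages.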
Is VA a “Good” Measure?
• Carnegie Knowledge Network
– http://www.carnegieknowledgenetwork.org/
– Test scores are an imperfect measure of all we care about for students
– No obvious bias (especially within schools)
– Substantial measurement error
– Less error when considering groups of teachers
– Benefits of use depend on alternatives
POTENTIAL USES: 2 DIRECT AND 2 INDIRECT
Understanding and Decision Making
Example 1: simulated use, the case of Layoffs
• Several school districts confronted teacher layoffs in spring 2010 and 2011
– Some avoided layoffs, e.g., New York City
– Others did not, e.g., LA and DC
• Layoffs nearly always determined by a measure of seniority
• Many superintendents raised concerns that seniority layoffs compromise teacher quality
What might we expect if VA were substituted for Seniority?
• Seniority layoffs typically affect teachers with two or fewer years of experience
– On average teachers improve markedly during their first 3-4 years
• Large variance in teacher effectiveness within and across experience
• Many districts have recently focused on recruiting more able teachers
Simulate: Who is laid off by 5% Salary Savings under Seniority vs. VA?
Simply simulated what would happen if 5% of the workforce had been laid off two years earlier by seniority or value-added
• Fewer teachers laid off with VA layoffs:
– Seniority-based layoff system would lay off 7% of teachers
– VA system would terminate 5% of teachers
• Little overlap
– Only 13% of seniority layoffs would also be laid off by VA
– VA estimates that control for experience reduce overlap to 5%
• VA layoffs are, on average, 7 years more experienced than seniority layoffs
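One way to see why a fixed salary saving requires more layoffs under seniority is a small simulation. This is a hedged sketch, assuming the target is the 5% salary saving named in the slide title; the salary schedule, the value-added distribution, and the helper `layoffs_for_savings` are all hypothetical, not the study's data or code.

```python
import numpy as np

def layoffs_for_savings(order, salary, target_share=0.05):
    """Lay off teachers in the given order until the salary savings target is met."""
    target = target_share * salary.sum()
    cumulative = np.cumsum(salary[order])
    # Smallest number of teachers whose combined salary reaches the target.
    n = int(np.searchsorted(cumulative, target) + 1)
    return order[:n]

rng = np.random.default_rng(1)
n = 2000
experience = rng.integers(0, 30, n)
salary = 45_000 + 2_000 * experience                              # hypothetical schedule
va = rng.normal(0, 0.2, n) + 0.02 * np.minimum(experience, 4)     # hypothetical VA

# Seniority rule: least experienced first. VA rule: lowest value-added first.
by_seniority = layoffs_for_savings(np.argsort(experience, kind="stable"), salary)
by_va = layoffs_for_savings(np.argsort(va, kind="stable"), salary)

share_seniority = len(by_seniority) / n
share_va = len(by_va) / n
overlap = len(np.intersect1d(by_seniority, by_va)) / len(by_seniority)
```

Because the least experienced teachers earn the least, the seniority rule has to remove more of them to reach the same savings, and the two rules select largely different teachers.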
Value-Added of Layoffs by Seniority and VA
4th and 5th grade
How would principals have rated laid off teachers?
• 2.5% of our sample received an “Unsatisfactory” rating by their principal from 2006-09
– Of these, 16% would have been VA layoffs, but only 8% of VA layoffs would have received a “U” rating
– None would have been seniority layoffs
Effects on Student Learning
Difference                              2007    2009
Std deviations of student achievement   0.36    0.12
Std deviations of teacher VA            1.9     0.70
Small effect overall since only 5% laid off, but large effects on students with the affected teachers.
Layoff Example
• Dismissal based on teacher performance measures likely to have less negative effects on students than dismissal based on experience
• In reality, given coverage and reliability concerns, value-added measures would likely be used in combination with other performance measures
• Availability of performance measures allowed for simulation of policy effects that could be helpful for policy decisions
Example 2: actual use, the case of Promotion
Teacher Tenure: job protection most often received after 3 years
Tenure history
▫ NJ first tenure law 1909; NY 1917; CA 1921; MI, PA, WI 1937
▫ 48 states
▫ Contentious then, contentious now
Policy on two tracks
▫ Eliminate tenure
• GA: eliminated 2001, reinstated 2003
• ID: passed 2011, voters repealed 2012
• SD: passed 2012, voters upheld, will eliminate by 2016
• FL: eliminated in 2011; NC: will eliminate by 2018
▫ Make more rigorous
• More than half the states require meaningful evaluation
• 20 states require student test performance
• 25 states have multiple categories for evaluation
New York City tenure policy
Principal recommends, superintendent decides
Tenure decisions: approve, extend or deny
Prior to 2009-10 tenure largely automatic
Reform encouraged careful review
2009-10
▫ Classroom obs, evals of teacher work products, annual S/D/U ratings
▫ Teacher data reports (value-added measures for some teachers); in-class assessments aligned with NY standards
▫ District guidance: “tenure in doubt”, “tenure likely”; rationale for cases that countered district guidance
2010-11
▫ All teachers rated as highly effective, effective, developing, ineffective
▫ District performance flags, but no guidance
2011-12
▫ Same as before except value-added measures not available in time
2012-13
▫ Same as before with State provided growth scores and growth ratings replacing local value-added measures
How did tenure rates change following reform?
[Figure: percent of tenure decisions (Approve, Deny, Extend) by school year, 2007-08 to 2012-13, with the new tenure policy marked.]
Which teachers were affected by the policy?
Attributes of teachers by tenure decision, 2010-11 to 2012-13
Tenure Decision   VAM ELA*   VAM Math*   SAT Math   SAT Verb   LAST Exam   U Rated   D Rated   Low Attd
Approve            0.081      0.248      505        505        257          5.7      22.2      37.1
Extend            -0.138     -0.129      490        494        254         52.1      66.7      56.2
Deny              -0.115     -0.740      469        490        248         42.2      11.1       6.7
* Value added results for only 2010-11.
38% of a SD in teacher effectiveness
Extend v. Approve: p<0.05   Extend v. Deny: p<0.05
How did the composition of continuing teachers change following reform?
Attributes of extended teachers by attrition behavior, 2010-11 & 2011-12
Attrition Status   VAM ELA   VAM Math   SAT Math   SAT Verbal   LAST Cert Exam
Same School        -0.091~   -0.090     491        495          253**
Transfer           -0.355    -0.421     482        486          253
Exit               -0.332    -0.145     530        539          267
Notes: ** p<0.01, * p<0.05, ~ p<0.1; compares same school to transfer/exit
Tenure Example
• Effectiveness measures used directly in practice
– Reform of practice, not policy, that worked within the current contract
• Imprecision is part of all evaluation measures
– Here structure of reform allows for corrections
Example 3: to understand schooling, the case of Turnover
• Nationally, about 1/3 of teachers leave the profession in their first 5 years
– Higher in high-poverty, urban, & low-performing schools (Hanushek, Kain & Rivkin, 1999)
• In NYC, about 14% of 4th & 5th grade teachers leave their school each year
– 4% migrate schools, 10% leave district
• Is this problematic?
Background
• Teacher turnover often assumed to harm student achievement… but does it?
– Little empirical evidence for direct effect (Guin, 2004)
• Turnover rates are higher in lower-performing schools (Guin, 2004; Hanushek et al. 1999)
– Causal? A third factor explaining both (principal leaving)?
– Direction?
• Some turnover can be beneficial – new ideas, person-job match (Organizational management lit, e.g. Abelson & Baysinger, 1984)
Consider 2 Theories of Action
• Compositional – turnover changes composition of teachers (esp. quality) which, in turn, impacts achievement
• Disruption – disruptive effect beyond changes in composition of teachers
– Organizational -- ALL teachers – NOT just leavers & their replacements
Methods
• Unique identification strategy – school-by-grade-by-year level turnover (2 measures)
• Two classes of fixed-effects regression models
– Grade-by-School: Look within same school and grade across time
• Lower achievement in years with more turnover?
– School-by-Year: Within same school and year across grades
• Lower achievement in grades with more turnover?
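The grade-by-school fixed-effects idea can be illustrated with a within-group (demeaning) estimator. This is a minimal sketch on synthetic data, assuming one turnover measure and no other controls; `within_estimate` and every parameter value are hypothetical and stand in for the fuller regression models used in the study.

```python
import numpy as np

def within_estimate(y, x, group):
    """Slope of y on x using only variation within groups (group fixed effects)."""
    y = np.asarray(y, dtype=float).copy()
    x = np.asarray(x, dtype=float).copy()
    for g in np.unique(group):
        mask = group == g
        # Subtract each group's mean so only within-group deviations remain.
        y[mask] -= y[mask].mean()
        x[mask] -= x[mask].mean()
    return float(x @ y / (x @ x))

# Synthetic panel: 300 school-by-grade cells observed for 6 years.
rng = np.random.default_rng(2)
n_cells, n_years = 300, 6
cell = np.repeat(np.arange(n_cells), n_years)   # school-by-grade identifier
cell_quality = rng.normal(0, 0.5, n_cells)
# Turnover is higher in weaker cells, which biases a naive comparison.
turnover = np.clip(0.3 - 0.2 * cell_quality[cell]
                   + rng.normal(0, 0.15, cell.size), 0, 1)
achievement = cell_quality[cell] - 0.08 * turnover + rng.normal(0, 0.05, cell.size)

fe_slope = within_estimate(achievement, turnover, cell)
naive_slope = float(np.polyfit(turnover, achievement, 1)[0])
```

Comparing a cell with itself across years removes stable differences between schools and grades, so the within estimate stays close to the simulated effect while the naive slope is far more negative.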
Findings
• Student achievement is lower in years/grades when turnover rates were higher
• Math scores are 8-10 percent of a standard deviation lower in years when there is 100 percent turnover (vs. no turnover). ELA smaller effect: 5-6 percent
• In a grade level that has 5 teachers, reducing turnover from 2 teachers leaving to none increases math achievement by 3% of SD
– Small but meaningful, and applies to all students in grade level
– Roughly the same magnitude as the coefficient on free lunch eligibility
• Probably underestimating effect by exploiting “idiosyncratic” turnover (ignores systemic effects)
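The 5-teacher example follows from the 100-percent-turnover estimate if the effect is assumed to scale linearly with the turnover rate (an assumption made here for illustration, not stated on the slide):

```python
teachers_in_grade = 5
leavers = 2
turnover_rate = leavers / teachers_in_grade      # 2 of 5 teachers leave

# Math effect at 100 percent turnover, in student SD units (range from the slide).
effect_at_full_turnover = (0.08, 0.10)

# Linear scaling: effect at 40 percent turnover.
implied = tuple(turnover_rate * e for e in effect_at_full_turnover)
```

Under that assumption the implied effect is roughly 3 to 4 percent of a standard deviation, in line with the figure quoted above.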
Is the effect compositional?
• Control for teaching experience, new to the school, and value-added
• Evidence for compositional theory of action
– Significant effect remains unexplained by compositional changes (30-70%)
• Also, evidence for disruptive effect beyond changes in teacher composition
– Students of stayers do worse in years with more turnover
Turnover Example
• Student test score measures used to better understand the implications of teacher turnover for students
• Value-added measures allowed for distinguishing compositional effects of turnover from disruptive effects
Example 4: to understand Teaching & Learning, the case of Persistent Learning
• Final example – explores what students learn in school and how that impacts their later achievements
Getting on the same page
Knowledge & Skill Type
• Content: Subject Specific; Overlapping / General
• Term: Long that builds; Short or peripheral
• Learning Source: Teacher; Other
Getting on the same page
[Diagram: Prior and Current knowledge, each split into Short-Term, Long-Term Subject, and Long-Term General; sources labeled Prior Teacher and Current Teacher.]
Cross-subject effects
[Diagram: Prior and Current (Other Subject) knowledge, each split into Short-Term, Long-Term Subject, and Long-Term General; source labeled Prior Teacher.]
Why Might Teachers Vary In Persistence?
Different Students
• different forgetting of “long-run” knowledge
Different Teachers
• different abilities
Different Schools
• different incentives (e.g. teaching to the test) or supports
Relevant Extant Research
Student test score gains depend on their teacher
Some but not all teacher-driven gains persist into future years (about 20%-35%)
Persistence is higher for test-score gains on low-stakes tests
Knowledge gains from teachers result in long-run gains in earnings
Long-term earning gains are greater for ELA knowledge gained from teachers (though teachers affect ELA less)
Long-term earnings effects lower for low-income students, even though teachers’ effects on test-scores are similar
What’s missing (and interesting)?
• Few persistence studies
– Replication
• No cross-subject persistence studies for test performance
– Distinguishing general and specific knowledge gains
• Few studies of variance in persistence
Research Questions
1. What is the persistence of teachers’ value-added within and across subject areas?
2. Does value-added persistence vary by teachers’ ability?
3. Does value-added persistence vary by students’ background or prior achievement?
– Does variation in persistence stem from students’ differential rates of forgetting previously acquired long-term knowledge?
4. Do school-level characteristics predict variation in teachers’ persistence?
1. What is the persistence of teachers’ value-added within and across subject areas?
• Use method from Jacob, Lefgren and Sims (2010)
• Predict current test score with students’ prior test score
– Same subject: Gives observed relationship between prior and current score.
– Other subject: Gives observed relationship between prior and current score in other subject.
• Instrument prior score with twice lagged score (only using variation in score that was there the prior year)
– Same subject: How much of long-term knowledge is retained
– Other subject: How much long-term knowledge is general (applies to both subjects)
• Instrument prior knowledge with prior teacher value-added (only using variation in score that came from teacher)
– Same subject: How much of learning from teacher is persistent
– Other subject: How much learning from teacher is general
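The three estimates above can be illustrated with a simulated same-subject example. This is a simplified sketch, not the estimation used in the study: it has no controls or classroom fixed effects, it uses the simulated true teacher effect as the instrument (in practice value-added must itself be estimated), and the persistence rates of 0.9 and 0.2 are hypothetical inputs chosen for the simulation.

```python
import numpy as np

def iv_slope(y, x, z):
    """Just-identified IV slope of y on x, using z as the instrument for x."""
    zc = z - z.mean()
    return float(zc @ (y - y.mean()) / (zc @ (x - x.mean())))

rng = np.random.default_rng(3)
n_teachers, class_size = 2000, 25
n = n_teachers * class_size
teacher = np.repeat(np.arange(n_teachers), class_size)

long_term = rng.normal(0, 1, n)                        # long-term knowledge stock
teacher_va = rng.normal(0, 0.2, n_teachers)[teacher]   # prior-year teacher effect
twice_lagged = long_term + rng.normal(0, 0.5, n)       # score two years ago
prior = long_term + teacher_va + rng.normal(0, 0.5, n) # last year's score
# Hypothetical truth: 90% of long-term knowledge and 20% of
# teacher-induced learning carry over; short-term noise does not.
current = 0.9 * long_term + 0.2 * teacher_va + rng.normal(0, 0.5, n)

ols = iv_slope(current, prior, prior)                  # OLS = IV with x as its own instrument
long_term_persistence = iv_slope(current, prior, twice_lagged)
teacher_persistence = iv_slope(current, prior, teacher_va)
```

Each instrument isolates a different source of variation in the prior score, so the same regression yields the observed relationship, the retention of long-term knowledge, or the persistence of teacher-induced learning. Swapping the outcome for the other subject's score gives the cross-subject versions.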
Cross subject
• Replace the outcome measure with the other subject score (and classroom fixed effects with other subject classroom fixed effects)
• Long-run knowledge
– Same approach captures percent of long-term knowledge that is general knowledge
• Persistence
– Same approach captures percent of teacher effect that is persistent through only general knowledge
Context: Correlations ELA teachers’ value added
Not Much
Research Question 1 What is the persistence of teachers’ value-added within and across subject areas?
Persistence of Observed Knowledge, Long Term Knowledge, and Teacher Value Added
Retain most long-term knowledge
Retain about 20% of learned knowledge
Cross-subject
Learning from ELA teachers affects future math 3+ times as much as Math teachers affect ELA (almost as much as math learning affects math)
About 60% of long-term knowledge carries across subjects
Research Question 2 Does value-added persistence vary by teachers’ ability?
Table 4: Heterogeneity of ELA Teachers’ Persistence
Table 5: Heterogeneity of Math Teachers’ Persistence
Research Question 3 Does value-added persistence vary by students’ background or prior scores?
Heterogeneity of ELA Teachers’ Persistence
Poor, Black, Hispanic and Low-Performing Students Retain Less of What They Learn from Teachers
Heterogeneity of Math Teachers’ Persistence
Not the same for math except:
Math learning has even less of an effect on ELA for Black, Hispanic and Low-Scoring Students
Does variation in persistence stem from students’ differential rates of forgetting previously acquired long-term knowledge?
Table 6: Heterogeneity in Long-Term Knowledge Persistence
Research Question 4 Do school-level characteristics predict variation in teachers’ persistence?
ELA Teacher persistence estimates across multiple school-level characteristics
Summary
1. About 20 percent of what students learn from a teacher is long-term knowledge
– Similar for math teachers and ELA teachers
2. More of ELA teachers’ effect works through general knowledge that affects Math as well as ELA
– about 15% of learning vs 4% for math
3. ELA teacher persistence is higher for high ability teachers
4. ELA teacher persistence is lower for low-performing and low-income students
– Higher rate of forgetting explains a small part
– Schools explain far more – persistence lower in schools serving low performing students with few high-ability teachers
Implications
• ELA teaching affects both ELA and Math learning
• Teachers vary in their persistence in ways not captured by value-added
• Likely causes (worth considering when assessing teachers)
– Ability
– Incentives
Examples: VA for Direct and Indirect Use
1. Layoffs – Simulating potential policy effects when used for layoffs
2. Tenure – Tracing policy effects when used in practice
3. Turnover – Understanding the implications of school processes for student learning
4. Persistence - Understanding teaching and learning
Measures of Effectiveness
• Inherently flawed
– Do not capture the full range of effectiveness
– Measurement error (affected by unobserved shocks and differences)
– May have bias
• Yet, may be useful in practice
– Real-time decision making
– Broader understanding
• Whether value-added is useful depends on
– Availability of tests that measure valued outcomes
– Availability of alternative measures of teacher effectiveness