CENTER FOR EDUCATION POLICY ANALYSIS at STANFORD UNIVERSITY cepa.stanford.edu

Measuring and Enhancing Teacher Effectiveness:

Data, Methods, and Policies

Susanna Loeb*

Higher School of Economics National Research University, Moscow

September 2014

*Content joint with Jim Wyckoff & Allison Atteberry, Ben Master, Matt Ronfeldt or Luke Miller

Why Measure Teacher Effectiveness?

• Better decisions
  – Direct
    • e.g., whom to promote
  – Indirect
    • Improved understanding
      – e.g., what experiences improve teacher effectiveness?

• A bit of history on teacher effectiveness measures in the US

• Considerations of Measurement

• Four examples of potential uses
  – Focus on the last one

Large-Scale Test Data Availability

• Test-Based Accountability
  – State level first
    • TX, NC, SC, FL and others introduced yearly tests to track school performance.
  – Federal level: No Child Left Behind Act
    • Required ELA and math tests in 3rd-8th grade plus one in high school
• State and district data allowed researchers to assess policy effects and the effects of teachers
  – Teachers vary widely in their ability to improve student achievement (Gordon, Kane, & Staiger 2006; Rivkin, Hanushek, & Kain 2005; Sanders & Rivers 1996)
  – Teachers improve with experience, particularly during their first two years (e.g. Rockoff, 2004)

The Widget Effect

• 2009 Study in 12 large school districts

• Schools and districts
  – Not measuring teacher effectiveness
    • In districts that use binary evaluation ratings (generally “satisfactory” or “unsatisfactory”), more than 99 percent of teachers receive the satisfactory rating.
    • Districts that use a broader range of rating options do little better; in these districts, 94 percent of teachers receive one of the top two ratings and less than 1 percent are rated unsatisfactory.
  – Not considering teacher effectiveness in decisions

Push for Evaluation

• Combination of
  – Recognition of teacher importance
  – Recognition of the Widget Effect
• Led to a strong push for new evaluation systems
  – Not based solely on subjective assessments, given the forces leading to little variation
• Speed of change probably due to Obama administration policies
  – Close ties to entrepreneurial educators: TNTP, TFA…

Race to the Top

• $4.35 Billion Competition as part of the American Recovery and Reinvestment Act of 2009
• Most points for “Great Teachers and Leaders” (138/500)
  – Improving teacher and principal effectiveness based on performance (58 points)
  – Ensuring equitable distribution of effective teachers and principals (25 points)
  – Providing high-quality pathways for aspiring teachers and principals (21 points)
  – Providing effective support to teachers and principals (20 points)
  – Improving the effectiveness of teacher and principal preparation programs (14 points)

Improving teacher effectiveness using performance measures

• Raises questions
  – How to measure effectiveness?
  – How to use measures of effectiveness once you have them?
• What are different kinds?
  – Output based (e.g., based on student test performance)
  – Process based (e.g., based on structured observational protocol)
  – Holistic / Subjective (e.g., principal evaluations)
• What features do we want?
  – Validity (measurement property)
  – Reliability (measurement property)
  – Stability (effectiveness property)
• Focus today on measures based on student test scores
  – Similar analyses could be done with other measures

Value-Added

• Measure teacher effectiveness by how much students’ test performance improves from the spring of the prior year to the spring of the current year
• Idea is to isolate the teacher’s effect from other effects on learning – “value-added”
• Can only be calculated for teachers in grades and subject areas for which there are tests in the prior year as well as the current year
• Clearly better than using test performance levels
• Far from perfect
  – e.g., based on imperfect tests, subject to random fluctuations and potential gaming

VAM: How are they calculated?

• Student test score gains relative to what we think they would be
• Most are a basic regression
  – Predict what a student would score in the spring based on a linear function of prior score, demographic characteristics, program participation (maybe), class characteristics, school characteristics
  – Value added is the average difference between predicted and actual scores
• “Colorado Growth Model”
  – For each student, how much do they learn relative to other students with the same prior test score (percentiles)?
  – Median percentile of growth for the class
• Do different value-added models tell us the same things?
  – Models vary in how they account for student backgrounds, school, and classroom resources and whether they compare teachers across a district (or state) or just within schools.
  – Correlations between models are often high, but even so different models will categorize many teachers differently. (Goldhaber & Theobald, 2013)
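
The basic regression approach can be sketched in a few lines. This is a minimal illustration, not the model used in the studies cited: it assumes a single predictor (prior score), where real models add demographics, program participation, and class and school characteristics. The function names and the six student records are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x with one predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def value_added(records):
    """records: list of (teacher_id, prior_score, current_score).
    Returns {teacher_id: average of (actual - predicted) current score}."""
    prior = [r[1] for r in records]
    current = [r[2] for r in records]
    a, b = fit_line(prior, current)
    residuals = defaultdict(list)
    for teacher, p, c in records:
        residuals[teacher].append(c - (a + b * p))
    return {t: mean(v) for t, v in residuals.items()}

# Hypothetical data: (teacher, prior-year score, current-year score)
students = [
    ("A", -1.0, -0.7), ("A", 0.0, 0.3), ("A", 1.0, 1.3),
    ("B", -1.0, -1.3), ("B", 0.0, -0.3), ("B", 1.0, 0.7),
]
va = value_added(students)
print(va)
```

In this toy data every student follows the same prior-to-current slope, so what remains after the regression is each teacher's average residual: positive for teacher A, negative for teacher B.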

A detailed example

• Test score
• Predicted by prior score, background, and classroom
• Use residual (plus classroom), predicted by classmate & school characteristics
• Average residual for each teacher

NYC standard deviations: ELA: 0.24 (0.19 shrunk); Math: 0.28 (0.21 shrunk)
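
The “shrunk” standard deviations reflect pulling noisy teacher estimates toward the mean. The slide does not say which shrinkage procedure was used; the sketch below assumes a standard empirical Bayes adjustment, in which each raw estimate is multiplied by its reliability. The function name and all numbers are hypothetical.

```python
def shrink(estimate, signal_var, noise_var, n_students):
    """Shrink a raw value-added estimate toward zero (the assumed mean).

    reliability = signal variance / (signal variance + noise variance / n)
    """
    reliability = signal_var / (signal_var + noise_var / n_students)
    return reliability * estimate

# Hypothetical inputs: raw estimate of 0.30 SD, true-effect variance 0.04,
# student-level noise variance 0.64.
small_class = shrink(0.30, 0.04, 0.64, 16)   # fewer students, more shrinkage
large_class = shrink(0.30, 0.04, 0.64, 64)   # more students, less shrinkage
print(small_class, large_class)
```

Estimates based on fewer students are pulled further toward the mean, which is why the spread of shrunk estimates is narrower than the spread of raw ones.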

Is VA a “Good” Measure?

• Carnegie Knowledge Network
  – http://www.carnegieknowledgenetwork.org/
  – Test score measures are an imperfect measure of all we care about for students
  – Bias is not obvious (especially within schools)
  – Substantial measurement error
    • Less when considering groups of teachers
  – Benefits of use depend on alternatives

POTENTIAL USES: 2 DIRECT AND 2 INDIRECT

Understanding and Decision Making

Example 1: simulated use, the case of Layoffs

• Several school districts confronted teacher layoffs in spring 2010 and 2011
  – Some avoided layoffs, e.g., New York City
  – Others did not, e.g., LA and DC

• Layoffs nearly always determined by a measure of seniority

• Many superintendents raised concerns that seniority layoffs compromise teacher quality

What might we expect if we substituted VA for seniority?

• Seniority layoffs typically affect teachers with two or fewer years of experience
  – On average teachers improve markedly during their first 3-4 years

• Large variance in teacher effectiveness within and across experience

• Many districts have recently focused on recruiting more able teachers

Simulate: Who is laid off by 5% Salary Savings under Seniority vs. VA?

Simply simulated what would happen if 5% of the workforce had been laid off two years earlier by seniority or value-added

• Fewer teachers laid off with VA layoffs:
  – Seniority-based layoff system would lay off 7% of teachers
  – VA system would terminate 5% of teachers
• Little overlap
  – Only 13% of seniority layoffs would also be laid off by VA
  – VA estimates that control for experience reduce overlap to 5%
• VA layoffs are, on average, 7 years more experienced than seniority layoffs
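
A minimal sketch of this kind of simulation, assuming layoffs proceed in rank order until a target share of payroll is saved. The function name, the 20-teacher workforce, the salary schedule, and the value-added scores are all hypothetical and are not the NYC data.

```python
def layoffs_for_savings(teachers, order_key, target_share):
    """Lay off teachers in ascending order of `order_key` until the
    laid-off salaries reach `target_share` of total payroll."""
    payroll = sum(t["salary"] for t in teachers)
    saved, laid_off = 0, []
    for t in sorted(teachers, key=order_key):
        if saved >= target_share * payroll:
            break
        laid_off.append(t["id"])
        saved += t["salary"]
    return laid_off

# Hypothetical workforce: salary rises with experience, and value-added
# is unrelated to experience by construction.
teachers = [
    {"id": i, "experience": i, "salary": 40 + 3 * i,
     "va": ((7 * i) % 20 - 10) / 20}
    for i in range(1, 21)
]

by_seniority = layoffs_for_savings(teachers, lambda t: t["experience"], 0.05)
by_va = layoffs_for_savings(teachers, lambda t: t["va"], 0.05)
print(len(by_seniority), len(by_va))
```

Because the least experienced teachers earn the least in this toy schedule, reaching the same savings by seniority takes more layoffs than ranking by value-added, which is consistent with the 7% versus 5% comparison above.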

[Figure: Value-added of layoffs by seniority and VA, 4th and 5th grade]

How would principals have rated laid off teachers?

• 2.5% of our sample received an “Unsatisfactory” rating by their principal from 2006-09

  – Of these, 16% would have been VA layoffs, but only 8% of VA layoffs would have received a “U” rating

  – None would have been seniority layoffs

Effects on Student Learning

Difference                               2007    2009
Std deviations of student achievement    0.36    0.12
Std deviations of teacher VA             1.9     0.70

Small effect overall since only 5% laid off, but large effects on students with the affected teachers.

Layoff Example

• Dismissal based on teacher performance measures likely to have less negative effects on students than dismissal based on experience

• In reality, given coverage and reliability concerns, value-added measures would likely be used in combination with other performance measures

• Availability of performance measures allowed for simulation of policy effects that could be helpful for policy decisions

Example 2: actual use, the case of Promotion

Teacher Tenure: job protection most often received after 3 years

Tenure history
▫ NJ first tenure law 1909; NY 1917; CA 1921; MI, PA, WI 1937
▫ 48 states
▫ Contentious then, contentious now

Policy on two tracks
▫ Eliminate tenure
  • GA: eliminated 2001, reinstated 2003
  • ID: passed 2011, voters repealed 2012
  • SD: passed 2012, voters upheld, will eliminate by 2016
  • FL: eliminated in 2011; NC: will eliminate by 2018
▫ Make more rigorous
  • More than half the states require meaningful evaluation
  • 20 states require student test performance
  • 25 states have multiple categories for evaluation

New York City tenure policy

Principal recommends, superintendent decides
Tenure decisions: approve, extend or deny

Prior to 2009-10 tenure largely automatic

Reform encouraged careful review 2009-10
▫ Classroom obs, evals of teacher work products, annual S/D/U ratings
▫ Teacher data reports (value-added measures for some teachers); in-class assessments aligned with NY standards
▫ District guidance: “tenure in doubt”, “tenure likely”; rationale for cases that countered district guidance

2010-11
▫ All teachers rated as highly effective, effective, developing, ineffective
▫ District performance flags, but no guidance

2011-12
▫ Same as before except value-added measures not available in time

2012-13
▫ Same as before with State-provided growth scores and growth ratings replacing local value-added measures

How did tenure rates change following reform?

[Figure: tenure decisions (Approve, Deny, Extend) by year, 2007-08 to 2012-13; annotation: New tenure policy]

Which teachers were affected by the policy?

Attributes of teachers by tenure decision, 2010-11 to 2012-13

Tenure Decision   VAM ELA*   VAM Math*   SAT Math   SAT Verb   LAST Exam   U Rated   D Rated   Low Attd
Approve            0.081      0.248      505        505        257          5.7      22.2      37.1
Extend            -0.138     -0.129      490        494        254         52.1      66.7      56.2
Deny              -0.115     -0.740      469        490        248         42.2      11.1       6.7

* Value-added results for 2010-11 only.

Annotations: 38% of a SD in teacher effectiveness; Extend v. Approve: p<0.05; Extend v. Deny: p<0.05

How did the composition of continuing teachers change following reform?

Attributes of extended teachers by attrition behavior, 2010-11 & 2011-12

Attrition Status   VAM ELA   VAM Math   SAT Math   SAT Verbal   LAST Cert Exam
Same School        -0.091~   -0.090     491        495          253**
Transfer           -0.355    -0.421     482        486          253
Exit               -0.332    -0.145     530        539          267

Notes: ** p<0.01, * p<0.05, ~ p<0.1; compares same school to transfer/exit

Tenure Example

• Effectiveness measures used directly in practice

– Reform of practice, not policy, that worked within the current contract

• Imprecision is part of all evaluation measures

– Here structure of reform allows for corrections

Example 3: to understand schooling, the case of Turnover

• Nationally, about 1/3 of teachers leave the profession in their first 5 years
  – Higher in high-poverty, urban, & low-performing schools (Hanushek, Kain & Rivkin, 1999)
• In NYC, about 14% of 4th & 5th grade teachers leave their school each year
  – 4% migrate schools, 10% leave district

• Is this problematic?

Background

• Teacher turnover often assumed to harm student achievement… but does it?
  – Little empirical evidence for direct effect (Guin, 2004)
• Turnover rates are higher in lower-performing schools (Guin, 2004; Hanushek et al. 1999)
  – Causal? A third factor explaining both (principal leaving)?
  – Direction?
• Some turnover can be beneficial: new ideas, person-job match (organizational management lit, e.g. Abelson & Baysinger, 1984)

Consider 2 Theories of Action

• Compositional – turnover changes composition of teachers (esp. quality) which, in turn, impacts achievement

• Disruption – disruptive effect beyond changes in composition of teachers
  – Organizational: ALL teachers, NOT just leavers & their replacements

Methods

• Unique identification strategy: school-by-grade-by-year level turnover (2 measures)
• Two classes of fixed-effects regression models
  – Grade-by-School: look within same school and grade across time
    • Lower achievement in years with more turnover?
  – School-by-Year: look within same school and year across grades
    • Lower achievement in grades with more turnover?
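
Looking within the same school and grade across time amounts to a fixed-effects regression. A minimal sketch, assuming a single regressor (turnover rate) and hypothetical data: demeaning both variables within each school-by-grade cell and fitting a slope gives the same coefficient as including a fixed effect for every cell. The models in the study include further controls, and the function name here is illustrative only.

```python
from collections import defaultdict
from statistics import mean

def within_slope(rows):
    """rows: list of (group, turnover_rate, achievement).
    Demeans both variables within each group, then fits one pooled slope;
    this is the within (fixed-effects) estimator for a single regressor."""
    by_group = defaultdict(list)
    for group, x, y in rows:
        by_group[group].append((x, y))
    num = den = 0.0
    for obs in by_group.values():
        mx = mean([x for x, _ in obs])
        my = mean([y for _, y in obs])
        for x, y in obs:
            num += (x - mx) * (y - my)
            den += (x - mx) ** 2
    return num / den

# Hypothetical school-by-grade cells observed over three years. The cells
# differ in baseline achievement, but within each cell achievement falls
# by 0.1 SD per unit of turnover.
rows = [
    ("S1-G4", 0.0, 0.50), ("S1-G4", 0.2, 0.48), ("S1-G4", 0.4, 0.46),
    ("S2-G4", 0.2, -0.32), ("S2-G4", 0.4, -0.34), ("S2-G4", 0.6, -0.36),
]
slope = within_slope(rows)
print(round(slope, 3))
```

The within estimator ignores the large baseline gap between the two cells and uses only year-to-year variation inside each one. The School-by-Year variant would group rows by school and year instead and compare across grades.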

Findings

• Student achievement is lower in years/grades when turnover rates were higher
• Math scores are 8-10 percent of a standard deviation lower in years when there is 100 percent turnover (vs. no turnover). ELA shows a smaller effect: 5-6 percent
• In a grade level that has 5 teachers, reducing turnover from 2 teachers leaving to none increases math achievement by 3% of SD
  – Small but meaningful, and applies to all students in grade level
  – Roughly same magnitude as the coefficient on free lunch eligibility
• Probably underestimating the effect by exploiting “idiosyncratic” turnover (ignores systemic effects)
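
The figure for the five-teacher grade can be roughly recovered from the 100-percent-turnover estimate, assuming (a simplification not stated on the slide) that the effect is linear in the share of teachers leaving:

```latex
\Delta_{\text{math}} \approx \frac{2}{5} \times (0.08 \text{ to } 0.10)\ \text{SD}
                     = 0.032 \text{ to } 0.040\ \text{SD}
```

The lower end of this range is in line with the roughly 3% of a standard deviation quoted above.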

Is the effect compositional?

• Control for teaching experience, new to the school, and value-added

• Evidence for compositional theory of action
  – Significant effect remains unexplained by compositional changes (30-70%)
• Also, evidence for disruptive effect beyond changes in teacher composition
  – Students of stayers do worse in years with more turnover

Turnover Example

• Student test score measures used to better understand the implications of turnover for students

• Value-added measures allowed for distinguishing compositional effects of turnover from disruptive effects

Example 4: to understand Teaching & Learning, the case of Persistent Learning

• Final example: explores what students learn in school and how that impacts their later achievements

Getting on the same page

Knowledge & Skill Type
• Content: subject specific; overlapping / general
• Term: long that builds; short or peripheral
• Learning source: teacher; other

Getting on the same page

[Diagram: Prior and Current knowledge, each divided into Short-Term, Long-Term Subject, and Long-Term General; sources labeled Prior Teacher and Current Teacher]

Cross-subject effects

[Diagram: Prior knowledge and Current Other Subject knowledge, each divided into Short-Term, Long-Term Subject, and Long-Term General; source labeled Prior Teacher]

Why Might Teachers Vary In Persistence?

Different Students
• different forgetting of “long-run” knowledge

Different Teachers
• different abilities

Different Schools
• different incentives (e.g. teaching to the test) or supports

Relevant Extant Research

Student test score gains depend on their teacher

Some but not all teacher-driven gains persist into future years (about 20%-35%)

Persistence is higher for test-score gains on low-stakes tests

Knowledge gains from teachers result in long-run gains in earnings

Long-term earning gains are greater for ELA knowledge gained from teachers (though teachers affect ELA less)

Long-term earnings effects lower for low-income students, even though teachers’ effects on test-scores are similar

What’s missing (and interesting)?

• Few persistence studies
  – Replication
• No cross-subject persistence studies for test performance
  – Distinguishing general and specific knowledge gains

• Few studies of variance in persistence

Research Questions

1. What is the persistence of teachers’ value-added within and across subject areas?

2. Does value-added persistence vary by teachers’ ability?

3. Does value-added persistence vary by students’ background or prior achievement?

– Does variation in persistence stem from students’ differential rates of forgetting previously acquired long-term knowledge?

4. Do school-level characteristics predict variation in teachers’ persistence?

1. What is the persistence of teachers’ value-added within and across subject areas?

• Use method from Jacob, Lefgren and Sims (2010)
• Predict current test score with students’ prior test score
  – Same subject: Gives observed relationship between prior and current score.
  – Other subject: Gives observed relationship between prior and current score in other subject.
• Instrument prior score with twice-lagged score (only using variation in score that was there the prior year)
  – Same subject: How much of long-term knowledge is retained
  – Other subject: How much long-term knowledge is general (applies to both subjects)
• Instrument prior knowledge with prior teacher value-added (only using variation in score that came from teacher)
  – Same subject: How much of learning from teacher is persistent
  – Other subject: How much learning from teacher is general
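
The instrumenting step can be illustrated with a single-instrument IV estimator. This is a sketch under strong simplifying assumptions (one instrument, no covariates, no classroom fixed effects) and not the full Jacob, Lefgren and Sims specification. The function names are illustrative, and the data are constructed so that 20% of teacher-driven learning and 90% of other prior knowledge carries over.

```python
from statistics import mean

def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def ols_slope(x, y):
    """OLS slope of y on x: the observed prior-to-current relationship."""
    return cov(x, y) / cov(x, x)

def iv_slope(z, x, y):
    """Single-instrument IV estimate: uses only the variation in x
    that is driven by the instrument z."""
    return cov(z, y) / cov(z, x)

# Hypothetical data (not from the study).
prior_teacher_va = [-1, -1, 1, 1]
other_knowledge = [-1, 1, -1, 1]
prior_score = [t + o for t, o in zip(prior_teacher_va, other_knowledge)]
current_score = [0.2 * t + 0.9 * o
                 for t, o in zip(prior_teacher_va, other_knowledge)]

observed = ols_slope(prior_score, current_score)
teacher_persistence = iv_slope(prior_teacher_va, prior_score, current_score)
print(round(observed, 3), round(teacher_persistence, 3))
```

The OLS slope blends both sources of prior knowledge, while instrumenting with prior teacher value-added isolates how much of the teacher-driven part persists. Swapping in the twice-lagged score as the instrument, or the other subject's score as the outcome, follows the same pattern.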

Cross subject

• Replace the outcome measure with the other subject score (and classroom fixed effects with other subject classroom fixed effects)

• Long-run knowledge
  – Same approach captures percent of long-term knowledge that is general knowledge
• Persistence
  – Same approach captures percent of teacher effect that is persistent through only general knowledge

Context: Correlations ELA teachers’ value added

Not Much

Research Question 1: What is the persistence of teachers’ value-added within and across subject areas?

Persistence of Observed Knowledge, Long Term Knowledge, and Teacher Value Added

Retain most long-term knowledge

Retain about 20% of learned knowledge

Cross-subject

Learning from ELA teachers affects future math 3+ times as much as Math teachers affect ELA (almost as much as math learning affects math)

About 60% of long-term goes across subjects

Research Question 2: Does value-added persistence vary by teachers’ ability?

Table 4: Heterogeneity of ELA Teachers’ Persistence

Table 5: Heterogeneity of Math Teachers’ Persistence

Research Question 3 Does value-added persistence vary by students’ background or prior scores?

Heterogeneity of ELA Teachers’ Persistence

Poor, Black, Hispanic and Low-Performing Students Retain Less of What They Learn from Teachers

Heterogeneity of Math Teachers’ Persistence

Not the same for math, except: Math learning has even less of an effect on ELA for Black, Hispanic and Low-Scoring Students

Does variation in persistence stem from students’ differential rates of forgetting previously acquired long-term knowledge?

Table 6: Heterogeneity in Long-Term Knowledge Persistence

Research Question 4: Do school-level characteristics predict variation in teachers’ persistence?

ELA Teacher persistence estimates across multiple school-level characteristics

Summary

1. About 20 percent of what students learn from a teacher is long-term knowledge
  – Similar for math teachers and ELA teachers

2. More of ELA teachers’ effect works through general knowledge that affects Math as well as ELA: about 15% of learning vs 4% for math

3. ELA teacher persistence is higher for high-ability teachers

4. ELA teacher persistence is lower for low-performing and low-income students
  – Higher rate of forgetting explains a small part
  – Schools explain far more: persistence is lower in schools serving low-performing students with few high-ability teachers

Implications

• ELA teaching affects both ELA and Math learning

• Teachers vary in their persistence in ways not captured by value-added

• Likely causes (worth considering when assessing teachers)
  – Ability
  – Incentives

Examples: VA for Direct and Indirect Use

1. Layoffs – Simulating potential policy effects when used for layoffs

2. Tenure – Tracing policy effects when used in practice

3. Turnover – Understanding the implications of school processes for student learning

4. Persistence - Understanding teaching and learning

Measures of Effectiveness

• Inherently flawed
  – Do not capture the full range of effectiveness
  – Measurement error (affected by unobserved shocks and differences)
  – May have bias
• Yet, may be useful in practice
  – Real-time decision making
  – Broader understanding
• Whether value-added is useful depends on
  – Availability of tests that measure valued outcomes
  – Availability of alternative measures of teacher effectiveness
