www.engageNY.org
Teacher Effectiveness Measurement: Some Whys and Hows
Amy McIntosh
Senior Fellow, Regents Research Fund
November 1, 2012
All materials from research studies described here are reprinted with permission of authors
Agenda
Discussion of new research studies that confirm:
• Teacher effectiveness matters
• To improve it, NY needs to measure it, using multiple measures
How we are measuring teacher effectiveness in NYS
• And how you can help
Study Number One: Long-term Impacts of Teachers
The Long-Term Impacts of Teachers: Teacher Value-added and Student Outcomes in Adulthood (Chetty, Friedman & Rockoff). http://obs.rc.fas.harvard.edu/chetty/value_added.html
Study Data:
• 2.5 MM children from childhood to early adulthood in 1 large district
• Teacher/course linkages and test scores in grades 3-8 from 1991-2009
• US government tax data from W-2s: on parents AND students
• About parents: household income, retirement savings, home ownership, marriage, age when student born
• About students up to age 28: teen birth, college attendance, earnings, neighborhood “quality”
What is “teacher value added”?
A statistical measure of the growth of a teacher’s students that takes into account the differences in students across classrooms that school systems can measure but teachers can’t control.
Researchers using “value-added” are measuring:
Growth compared to the average growth of similar students
• “similar” includes student, classroom and school characteristics
Key Finding: Teacher effectiveness matters
Having a higher value-added teacher for even one year in grades 4-8 has substantial positive long-term impacts on a student’s life outcomes, including:
– Likelihood of attending college (UP 1.25%)
– Likelihood of teen pregnancy (DOWN 1.25%)
– Salary earned in lifetime (UP $25K per avg. student)
– Neighborhood (More college grads live there)
– Retirement savings (UP)
Study Number Two: Measures of Effective Teaching
http://www.metproject.org
Unique project in many ways:
• in the variety of indicators tested,
– 5 instruments for classroom observations
– Student surveys (Tripod Survey)
– Value-added on state tests
• in its scale,
– 3,000 teachers
– 22,500 observation scores (7,500 lesson videos x 3 scores)
– 900+ trained observers
– 44,500 students completing surveys and supplemental assessments
• and in the variety of student outcomes studied.
– Gains on state math and ELA tests
– Gains on supplemental tests (BAM & SAT9 OE)
– Student-reported outcomes (effort and enjoyment in class)
Measures have different strengths … and weaknesses
Dynamic Trio
Measure   Predictive power   Reliability   Potential for Diagnostic Insight
Value-added
Student survey
Observation
H
M
L
M
H
M/H
L
M
H
Framework for Teaching (Danielson)
Four Steps
Unsatisfactory: Yes/no questions, posed in rapid succession, teacher asks all questions, same few students participate.
Basic: Some questions ask for student explanations, uneven attempts to engage all students.
Proficient: Most questions ask for explanation, discussion develops/teacher steps aside, all students participate.
Advanced: All questions high quality, students initiate some questions, students engage other students.
Student Feedback: related to student learning gains
Student survey items with strongest relationship to middle school math gains:
Rank   Survey Statement
1      Students in this class treat the teacher with respect
2      My classmates behave the way my teacher wants them to
3      Our class stays busy and doesn’t waste time
4      In this class, we learn a lot every day
5      In this class, we learn to correct our mistakes
Student survey items with the weakest relationship to middle school math gains:
Rank   Survey Statement
38     I have learned a lot this year about [the state test]
39     Getting ready for [the state test] takes a lot of time in our class
Note: Sorted by absolute value of correlation with student achievement gains. Drawn from “Learning about Teaching: Initial Findings from the Measures of Effective Teaching Project”. For a list of Tripod survey questions, see Appendix Table 1 in the Research Report.
Combining Observations with other measures improved predictive power
Dynamic Trio
Key Finding: Use multiple measures
• All the observation rubrics are positively associated with student achievement gains
• Using multiple observations per teacher is VERY important (and ideally multiple observers)
• The student feedback survey tested is ALSO positively associated with student achievement gains
• Combining observation measures, student feedback and value-added growth results on State tests was more reliable and a better predictor of a teacher’s value-added on State tests with a different cohort of students than:
» Any measure alone
» Graduate degrees
» Years of teaching experience
• Combining “measures” is also a strong predictor of student performance on other kinds of student tests.
Evaluating Educator Effectiveness
Growth (20%)
• Student growth on state assessments (state-provided)
• Student learning objectives
Locally Selected Measures (20%)
• Student growth or achievement
• Options selected through collective bargaining
Other Measures (60%)
• Rubrics
• Sources of evidence: observations, visits, surveys, etc.
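The 20/20/60 split can be read as point caps that add up to 100. A minimal sketch, assuming (the slides do not spell this out) that the composite is the plain sum of the three subcomponent point totals; the dictionary keys and the example point values are hypothetical:

```python
# Point caps taken from the 20% / 20% / 60% split on this slide.
SUBCOMPONENT_CAPS = {
    "growth": 20,
    "locally_selected": 20,
    "other_measures": 60,
}


def composite_score(points):
    """Sum subcomponent points after checking each against its cap.

    Assumption: the composite is a simple sum out of 100 points.
    """
    total = 0
    for name, cap in SUBCOMPONENT_CAPS.items():
        value = points[name]
        if not 0 <= value <= cap:
            raise ValueError(f"{name} must be between 0 and {cap}, got {value}")
        total += value
    return total


# Hypothetical educator: 14 growth points, 15 local points, 50 other points.
example = {"growth": 14, "locally_selected": 15, "other_measures": 50}
print(composite_score(example))
```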
Key Points about NYS Growth Measures
• We are measuring student growth and not achievement
• Allow teachers to achieve high ratings regardless of incoming levels of achievement of their students
• We are measuring growth compared to similar students
• Similar students: Up to three years of the same prior achievement, three student-level characteristics (economic disadvantage, SWD, and ELL status)
• In 12-13, NY’s “value-added model”, which needs Board of Regents approval, will consider additional student and classroom characteristics
Every educator has a fair chance to demonstrate effectiveness on these measures regardless of the composition of his/her class or school.
Prior Year Performance for Students in Two Teachers’ Classrooms
[Figure: two panels (Ms. Smith, Ms. Jones), each plotting Prior Performance for Students A-E on a 0-800 scale; a line marks Proficiency.]
Current Year Performance of Same Students
[Figure: two panels (Ms. Smith, Ms. Jones), each plotting Prior Performance and Current Performance for Students A-E on a 0-800 scale; a line marks Proficiency.]
Prior and Current Year Performance for Ms. Smith’s Students
Ms. Smith’s Class
Prior Score Current Score
Student A 450 510
Student B 470 500
Student C 480 525
Student D 500 550
Student E 600 650
Student A’s Current Year Performance Compared to “Similar” Students
[Figure: ELA Scale Score, 2011 to 2012. Student A’s prior score is 450; the range of current-year outcomes is labeled from Low SGPs to High SGPs.]
If we compare Student A’s current score to other students who had the same prior score (450), we can measure her growth relative to other students. We describe her growth as a “student growth percentile” (SGP). Student A’s SGP is 45, meaning she performed better in the current year than 45 percent of similar students.
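The idea of ranking a student against peers with the same prior score can be sketched in a few lines. This is a simplified illustration only: the State's SGPs come from a statistical model, not from a raw count like this one. The function name and the 20 peer scores are hypothetical; they are chosen so that a current score of 510 lands at the 45th percentile, as for Student A.

```python
from bisect import bisect_left


def simple_growth_percentile(current_score, peer_current_scores):
    """Percent of similar students who scored strictly below current_score.

    Simplified stand-in for an SGP; the result is clamped to 1..99.
    """
    ranked = sorted(peer_current_scores)
    below = bisect_left(ranked, current_score)  # peers strictly below
    percentile = round(100 * below / len(ranked))
    return max(1, min(99, percentile))


# Hypothetical current-year scores of 20 students who all scored 450 last year.
peers = [470, 475, 480, 485, 490, 495, 500, 505, 508, 512,
         515, 520, 525, 530, 535, 540, 545, 550, 555, 560]

print(simple_growth_percentile(510, peers))
```

Nine of the 20 hypothetical peers scored below 510, which is 45 percent.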
Comparing Performance of “Similar” Students
[Figure: Current Year Score (vertical axis) plotted against Prior Year Score (horizontal axis).]
Given any prior score, we see a range of current year scores, which give us SGPs of 1 to 99.
From Student Growth to Teachers and Principals (continued)
Ms. Smith’s Class
            SGP
Student A   45
Student B   40
Student C   70
Student D   60
Student E   40
To measure teacher performance, we find the mean growth percentile (MGP) for her students. To find an educator’s mean growth percentile, take the average of SGPs in the classroom. In this case:
Step 1: 45+40+70+60+40=255
Step 2: 255/5=51
Ms. Smith’s mean growth percentile (MGP) is 51, meaning on average her students performed better than 51 percent of similar students.
A principal’s performance is measured by finding the mean growth percentile for all students in the school.
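The two steps for Ms. Smith's class can be checked with a short script. The SGP values are the ones on this slide; the function and variable names are illustrative.

```python
from statistics import mean

# SGPs for Ms. Smith's class, as listed on the slide.
smith_sgps = {
    "Student A": 45,
    "Student B": 40,
    "Student C": 70,
    "Student D": 60,
    "Student E": 40,
}


def mean_growth_percentile(sgps):
    """Average the student growth percentiles attributed to one educator."""
    return mean(sgps)


total = sum(smith_sgps.values())                         # Step 1: 255
mgp = mean_growth_percentile(list(smith_sgps.values()))  # Step 2: 255 / 5
print(total, mgp)
```

A school-level MGP for a principal would use the same average, taken over the SGPs of all students in the school.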
Expanding the Definition of “Similar” Students
• So far we have been talking about “similar” students as those with the same prior year assessment score
• We will now add two additional features to the conversation:
• Two additional years of prior assessment scores
– Remember: a student MUST have current year and prior year assessment scores to be included
• Student-level factors
– Economic disadvantage
– Students with disabilities (SWDs)
– English language learners (ELLs)
Scatter Plot of Teacher MGPs and Percent Poverty Students in Class – Adjusted Model
Another very small downward slope suggesting very small ED relationship
11-12 Technical Report, NYS Growth Measures
MGPs and Statistical Confidence
[Figure: an MGP with its Confidence Range, bounded by a Lower Limit and an Upper Limit. The figure is labeled with the value 87.]
• NYSED will report a 95 percent confidence range, meaning we can be 95 percent confident that an educator’s “true” MGP lies within that range. Upper and lower limits of MGPs will also be reported.
• An educator’s confidence range depends on a number of factors, including: number of student scores included in their MGP and the variability of student performance in their classroom.
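To see why the number of scores and their variability both matter, here is a rough sketch using a textbook normal approximation (mean plus or minus 1.96 standard errors). This is an assumption made for illustration; the slides do not describe NYSED's actual calculation. The SGPs are Ms. Smith's five values from the earlier slide.

```python
from math import sqrt
from statistics import mean, stdev


def approx_confidence_range(sgps, z=1.96):
    """Return (lower, mgp, upper) using a normal approximation.

    Assumption: illustrative only, not the State's method.
    """
    mgp = mean(sgps)
    standard_error = stdev(sgps) / sqrt(len(sgps))
    return mgp - z * standard_error, mgp, mgp + z * standard_error


smith = [45, 40, 70, 60, 40]
lower, mgp, upper = approx_confidence_range(smith)
print(round(lower, 2), mgp, round(upper, 2))

# Same spread of SGPs but four times as many scores: the range narrows.
lower20, _, upper20 = approx_confidence_range(smith * 4)
print(round(upper20 - lower20, 2), "<", round(upper - lower, 2))
```

With only five scores the approximate range is wide (roughly 39 to 63 around an MGP of 51); more scores, or less variable scores, shrink it.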
From MGPs to Growth Ratings: Teachers
Rules on last slide result in these HEDI criteria for 2011-12
Mean Growth Percentile / Confidence Range / HEDI Rating
• Is your MGP ≥ 69? If yes: Is your Lower Limit > Mean of 52? If yes: Highly Effective: Results are well above state average for similar students
• Is your MGP 42-68? If yes: Any Confidence Range. Effective: Results equal state average for similar students
• Is your MGP 36-41? If yes: Is your Upper Limit < Mean of 52? If yes: Developing: Results are below state average for similar students
• Is your MGP ≤ 35? If yes: Is your Upper Limit < 44? If yes: Ineffective: Results are well below state average for similar students
[Flowchart: each question also has an “If no” branch.]
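The 2011-12 criteria on this slide can be written as a small decision function. The "If yes" paths follow the slide. The destinations of the "If no" branches are an assumption here (each falls to the adjacent, less extreme rating), as is the use of whole-number MGPs; the function name is illustrative.

```python
STATE_MEAN_MGP = 52  # "Mean of 52" on the slide


def growth_rating(mgp, lower_limit, upper_limit):
    """Map an MGP and its confidence limits to a 2011-12 HEDI rating.

    "If no" fallbacks are assumptions, marked below.
    """
    if mgp >= 69:
        if lower_limit > STATE_MEAN_MGP:
            return "Highly Effective"
        return "Effective"      # assumed "If no" destination
    if mgp >= 42:               # MGP 42-68, any confidence range
        return "Effective"
    if mgp >= 36:               # MGP 36-41
        if upper_limit < STATE_MEAN_MGP:
            return "Developing"
        return "Effective"      # assumed "If no" destination
    if upper_limit < 44:        # MGP <= 35
        return "Ineffective"
    return "Developing"         # assumed "If no" destination


# MGP 47 with limits 39 and 55 falls in the 42-68 band.
print(growth_rating(47, lower_limit=39, upper_limit=55))
```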
Results Distribution for Growth Subcomponent 2011-12
HEDI Rating        2011-12 Points   2011-12 Percent of Teacher MGPs   2011-12 Percent of Principal MGPs
Highly Effective   18-20            7%                                6%
Effective          9-17             77%                               79%
Developing         3-8              10%                               8%
Ineffective        0-2              6%                                7%
(points assigned within category based on MGP)
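The point bands in this table can be turned into a lookup from growth score to rating. A minimal sketch; the bands come from the slide, while the function name is illustrative. How points are assigned within a band from the MGP is not shown on the slide, so it is not modeled here.

```python
# (low, high, rating) point bands for the 2011-12 growth subcomponent.
POINT_BANDS = [
    (18, 20, "Highly Effective"),
    (9, 17, "Effective"),
    (3, 8, "Developing"),
    (0, 2, "Ineffective"),
]


def rating_for_growth_points(points):
    """Return the HEDI rating whose point band contains the growth score."""
    for low, high, rating in POINT_BANDS:
        if low <= points <= high:
            return rating
    raise ValueError(f"growth score must be between 0 and 20, got {points}")


for score in (20, 14, 6, 1):
    print(score, rating_for_growth_points(score))
```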
First let’s look at a growth report about a teacher…
Jane Eyre
Jane’s MGP = 47 (this is what is used to determine the growth score and growth rating)
Jane’s Upper Limit = 55 and Lower Limit = 39
Teacher-level Report
District X
School #1
Jane Eyre
Teacher 1D’s Growth Score and Growth Rating are listed here
Teacher 1D has a higher adjusted MGP in Math than ELA
Teacher 1D does not have any growth data reported for any of the subgroups because 16 student scores are required to report any data
School-level Report
District X
School #1
An adjusted MGP and associated confidence range will be reported for each subject and grade level within the school.
49% of students at School #1 scored above the State median.
The Growth Score and Growth Rating for the Principal of School #1 are listed here
School #1 has scores broken out by subject for grades 4-6.
36% of the student scores are from economically disadvantaged students, and no scores are from English language learners.
Summary of Revised APPR Provisions Memo:
http://engageny.org/wp-content/uploads/2012/03/nys-evaluation-plans-guidance-memo.pdf
School-level Report—Detailed View
District X
School #1
Teacher 1E
Teacher 1D
Teacher 1C
Teacher 1B
Teacher 1A
Teacher 1F
Teacher 1I
Teacher 1J
Teacher 1K
Teacher 1L
Teacher 1G
Teacher 1H
School #1 has 12 teachers who teach grades 4-8 ELA and Math
Teacher 1B has the most student scores linked to him (43 scores)
43 student scores could not be linked to any of the teachers
Each teacher receives an adjusted MGP and associated confidence range that are used to determine the growth rating and growth score
Teachers 1E and 1G did not receive any growth data because they are linked to fewer than 16 student scores
District-level View—Page 1 NY State Summary
NYS Summary Data: included on ALL District reports
Number of student scores included in calculation of State MGP
NY Statewide Adjusted MGP = 52
State Median = 50
District X
Statewide about 50% of ELL, SWD, and economically disadvantaged students scored above the State median.
District X Summary Data
District-level View—Page 1-2 District Summary
District X Summary Data: continued on next page of report
Number of student scores included in calculation of district-wide MGP
District-wide Adjusted MGP
District X
District-level View—Page 3 List of Schools
District X has two schools that have grades 4-8 ELA and Math scores
School #1
School #2
District X
Principal of School #1
Growth Score = 14
Growth Rating = Effective
Principal of School #2
Growth Score = 6
Growth Rating = Developing
Using Growth Score results
• Beyond evaluation, growth score information can provide additional information to help with instructional improvement.
• Of course, these measures are only one of multiple sources of evidence to use for this purpose
• The best insight comes from considering the results in the context of other information about a teacher, group of teachers, principal or group of schools.
Districts may want to:
Analyze district-level information using these reflective questions:
How much did our students grow, on average, compared to similar students? Is this higher, lower, or about what we would have expected? Why?
How do our MGPs for each reported subgroup (ELL, SWD, economically disadvantaged students, high- and low-achieving students) compare to each other and to our overall MGP? Are there any patterns? Are the MGPs higher, lower, or about what we would have expected? Why?
How do the MGPs compare by subject and across grade levels? Why might they be similar or different?
What should we do to understand any surprises using other information and evidence?
Do we have the right plans in place to aid in professional learning?
Districts may want to:
Convene principals to reflect upon their school growth results in context of other information about student learning and teacher effectiveness in their schools:
• Use BOCES trainers and/or SED online resources to ensure basic understanding of the measures and what information is found on reports
• Engage principals individually or in a group to reflect on questions about their school information in the context of other evidence of teacher effectiveness:
• How much did the students of my teachers grow, on average, compared to similar students and how does this differ across teachers? Are there differences across grades or subjects?
• How do my teachers’ MGPs differ across each reported subgroup? Do I see any patterns?
Principals may want to:
• Consider the reflective questions in their school-level reports:
• See the Principal’s Guide to Interpreting Growth Scores: http://engageny.org/wp-content/uploads/2012/06/Principals_Guide_to_Interpreting_Your_Growth_Score.pdf
• See the Sample Principal Report—Annotated: http://engageny.org/wp-content/uploads/2012/06/Principal_Sample_Growth_Report.pdf
• Plan how teachers will get the information they need to understand their own growth reports
Teachers may want to:
• Review materials from SED about growth measures
• View the “Growth Model for Educator Evaluation 2011-12” Webinar: http://engageny.org/resource/growth-model-for-educator-evaluation-in-2011-2012/
• View the “Using Growth Measures for Educator Evaluation in 2011-12” Webinar: http://engageny.org/resource/using-growth-measures-for-educator-evaluation-in-2011-2012/
• See the Teacher’s Guide to Interpreting Growth Scores: http://engageny.org/wp-content/uploads/2012/06/Teachers_Guide_to_Interpreting_Your_Growth_Score.pdf
• See the Sample Teacher Report—Annotated: http://engageny.org/wp-content/uploads/2012/06/Teacher_Sample_Growth_Report.pdf
• Consider the following reflective questions:
• How much did my students grow, on average, compared to similar students? Is this higher, lower, or about what I would have expected? Why?
• How does this information about student growth align with information about my instructional practice received through observations or other measures? Why might this be?
You can help by supporting Districts to:
Understand the basics of the growth measures
Analyze district-level information using these and other reflective questions using growth and other measures:
• How much did our students grow, on average, compared to similar students? Is this higher, lower, or about what we would have expected? Why? What do we learn from subgroup, grade and subject level information?
• What should we do to understand any surprises using other information and evidence?
Put the right plans in place to aid in understanding evaluation measures and using them to support professional growth and learning for our educators.
Ensure accurate reporting of student/teacher linkage information for 12-13 school year.
How would you answer these common misconceptions?
• New York’s evaluation system is based mostly on State test scores and that’s not good.
• A principal knows a good teacher when s/he sees one; we don’t need to include value-added results too.
• I’ve been doing teacher observations for years. I don’t need to go to your training.
• Teacher Value-added information is unreliable and shouldn’t be a part of teacher evaluation.
• I am a teacher with lots of students in poverty. How can measuring my student test score results be fair?
• I have a lot of high achieving students in my classes/school. They have nowhere to go but down so we won’t do well on “growth” measures.
How would you answer these common misconceptions?
• New York’s evaluation system is based mostly on State test scores and that’s not good.
• NY uses multiple measures as research advises. 60% involves measures of educator practice. 20-25% involves GROWTH on state assessments or comparable measures. And the remaining points will be a locally-selected measure of student growth or achievement.
• A principal knows a good teacher when s/he sees one; we don’t need to include value-added results too.
• Recent MET study shows that combining observation results and teacher value-added is more predictive and reliable than either measure alone.
• I’ve been doing teacher observations for years. I don’t need to go to your training.
• The MET study shows that regularly recalibrating observers against benchmarks of accurate observation ratings is critical to ensuring a valid and reliable evaluation system. Even the best observers can “drift” over time. And the best can help others stay in sync. In addition, NYS training will help everyone identify evidence that the new Common Core standards are being implemented well in classrooms.
How would you answer these common misconceptions?
• I am a teacher with lots of students in poverty. How can measuring my student test score results be fair?
• NY’s growth measures compare the performance of students to that of similar students including similar prior test score history, poverty and other student characteristics. There is little relationship between the percent of students in poverty and a teacher’s mean growth percentile.
• I have a lot of high achieving students in my classes/school. They have nowhere to go but down so we won’t do well on “growth” measures.
• NY’s growth measures compare performance of students to that of similar students using prior test score history and other student characteristics. Teachers whose high achieving students outperform other high achieving students will do well on these growth measures whether or not the students’ scale scores go up year over year.
• Teacher Value-added information is unreliable and shouldn’t be a part of teacher evaluation.
• Many researchers have shown that teacher value-added is the best predictor we have of the future learning growth of a teacher’s students. Two new research studies, Chetty/Friedman/Rockoff and the Measures of Effective Teaching Study, add new evidence in support of this argument.
For More Information…
Please review resources about the State-provided growth measures here:
http://engageny.org/resource/resources-about-state-growth-measures/
And the guidance on NYS’s APPR Law and Regulations:
http://engageny.org/resource/guidance-on-new-yorks-annual-professional-performance-review-law-and-regulations/
Thank You.
Review of Terms
• SGP (student growth percentile):
• the result of a statistical model that calculates each student’s change in achievement between two or more points in time on a State assessment or other comparable measure and compares each student’s performance to that of similarly achieving students
• Similar students:
• students with similar prior test scores (up to three years), and ELL, SWD, and economic disadvantage status
• Also include test measurement error correction
• Unadjusted and adjusted MGP (mean growth percentile):
• the average of the student growth percentiles attributed to a given educator
• For evaluation purposes, the overall adjusted MGP is used. This is the MGP that includes all of a teacher’s or principal’s students and takes into account student demographics.