Growth Scales and Pathways
William D. Schafer, University of Maryland
and Jon S. Twing, Pearson Educational Measurement
NCLB leaves some unmet policy needs
Assessment of student-level growth
Sensitivity to change within achievement levels
Assessment and accountability at all grades
Broad representation of schooling outcomes
Descriptions of what students are able to do in terms of next steps
Cost-benefit accountability
How can we meet these needs?
Our approach starts with measurement of growth through cross-grade scaling of achievement
Current work is being done around:
Vertical scales, in which common items for adjacent grades are used to generate a common scale across grades
Another approach is grade-equivalents.
Both are continuous cross-grade scales.
We only have three problems with continuous cross-grade scales:
1. The Past
2. The Present
3. The Future
Why the Past?
Ignores instructional history
The same student score should be interpreted differently depending on the grade level of the student
Why the Present?
Relationships among items may (probably do) differ depending on grade level of the student. (e.g., easy fifth grade items may be difficult for fourth graders)
Lack of true equating. It is better for fourth graders to take fourth grade tests and for fifth graders to take fifth grade tests.
Why the Future?
Instructional expectations differ. A score of GE = 5.0 (or VS = 500) carries different growth expectations from a fifth-grade experience next year for a current fifth grader than for a current fourth grader.
We do need to take seriously the interests of policymakers in continuous scaling.
But the problems with grade-equivalents and vertical scaling may be too severe to recommend them.
Here are seven criteria that an alternate system should satisfy.
1. Implement the Fundamental Accountability Mission
Test all students on what they are supposed to be learning.
2. Assess all contents at all grades.
Educators should be accountable for all public expenditures.
Apply this principle at least to all non-affective outcomes of schooling.
3. Define tested domains explicitly.
Teachers need to understand their learning targets in terms of
Knowledge (what students know)
Factual
Conceptual
Procedural
Cognition (what they do with it)
4. Base test interpretations on the future.
We can’t change the past, but we can design the future.
It can be more meaningful to think about what students are prepared for than about what they have learned.
5. Inform decision making about students, teachers, and programs.
Within the limits of privacy, gathering data for accountability judgments about everyone and everything (within reason) will help decision makers reach the most informed decisions.
This also means that we will associate assessments with those who are responsible for improving them.
6. Emphasize predictive evidence of validity.
Basing assessment interpretations on the future (see point 4) suggests that our best evidence to validate our interpretations is how well they predicted in the past.
7. Capitalize on both criterion and norm referencing.
Score reports need to satisfy the needs of the recipients. Both criterion-referencing (what students are prepared to do) and norm-referencing (how many are as, more, and less prepared) convey information that is useful.
Other things equal, more information is better than less.
Our Approach to the Criteria
Many of the criteria are self-satisfying.
Some recent and new concepts are needed.
Four recent or new concepts:
– Socially moderated standard setting
– Operationally defined exit competencies
– Growth scaling
– Growth pathways
Socially Moderated Standard Setting
Ferrara, Johnson, & Chen (2005)
Judges set achievement level cut points where students have prerequisites for the same achievement level next year.
Note the future orientation of the achievement levels. This concept also underlies Lissitz & Huynh’s (2003) concept of vertically moderated standards.
Operationally Defined Exit Competencies
If we implement socially moderated standards, where do the cut points for the 12th grade come from?
Our suggestion is to base them on what the students are prepared for, such as (1) college credit, (2) ready for college, (3) needs college remediation, (4) satisfies federal ability-to-benefit rules, (5) capable of independent living, (6) below.
Modify as needed for lower grades (e.g., fewer levels) and certain contents (e.g., athletics, music)
Growth Scaling
Some elements of this have been used in Texas and Washington State.
Test at each grade level separately for any content (i.e., only grade-level items).
Report using a three-digit scale.
– The first digit is the grade level.
– The second two digits are a linear transform that places the lower "proficient" cut point at, e.g., 40 and the "advanced" cut point at, e.g., 60.
– Could transform non-linearly through all cut points with more than three levels.
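The three-digit growth scale can be sketched as a small function. The cut-point values below are hypothetical; only the 40/60 anchors come from the slide.

```python
def growth_scale_score(score, grade, proficient_cut, advanced_cut):
    """Three-digit growth scale: the first digit is the grade level;
    the last two digits linearly map the 'proficient' cut to 40 and
    the 'advanced' cut to 60 (cut points here are hypothetical)."""
    slope = (60 - 40) / (advanced_cut - proficient_cut)
    last_two = 40 + slope * (score - proficient_cut)
    return grade * 100 + round(last_two)

# A hypothetical grade-5 test with proficient cut 30 and advanced cut 45:
growth_scale_score(30, 5, 30, 45)  # a student exactly at proficient -> 540
growth_scale_score(45, 5, 30, 45)  # a student exactly at advanced -> 560
```

The grade prefix keeps scores from different grades from being naively subtracted as if they sat on one continuous scale.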
Growth Pathways
Given that content is backmapped (Wiggins & McTighe, 1998) and achievement levels are socially moderated, achievement results can be expressed in terms of readiness for growth (next year, at 12th grade, or both).
Can generate transition matrices to express likelihoods of various futures for students.
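A transition matrix of this kind can be estimated from matched student records. The cohort and level labels below are invented for illustration; they are not actual Texas results.

```python
from collections import Counter

def transition_matrix(pairs, levels):
    """Estimate P(next-year level | this-year level) from matched
    records. `pairs` is a list of (level_year1, level_year2) tuples;
    `levels` is the ordered list of achievement levels."""
    counts = Counter(pairs)
    matrix = {}
    for frm in levels:
        row_total = sum(counts[(frm, to)] for to in levels)
        matrix[frm] = {to: counts[(frm, to)] / row_total if row_total else 0.0
                       for to in levels}
    return matrix

# Hypothetical matched cohort of six students:
pairs = [("Basic", "Basic"), ("Basic", "Proficient"),
         ("Proficient", "Proficient"), ("Proficient", "Advanced"),
         ("Proficient", "Proficient"), ("Advanced", "Advanced")]
m = transition_matrix(pairs, ["Basic", "Proficient", "Advanced"])
m["Proficient"]["Proficient"]  # 2 of 3 proficient students stayed proficient
```

Each row of the matrix is the likelihood vector of "various futures" for students starting at that level.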
Adequate Yearly Progress
Capitalizing on Hill et al. (2005), growth pathways can serve as the basis for expectations, with point awards for students who meet, exceed, or fall below the expectations implied by their year-ago achievement levels.
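One minimal sketch of such a point-award scheme, with achievement levels coded as integers. The point values and the zero-gain default expectation are assumptions for illustration, not the scheme in Hill et al. (2005).

```python
def ayp_points(prior_level, current_level, expected_gain=0):
    """Award points relative to expectation: students meeting the
    level expected from their year-ago achievement earn 1 point,
    those above earn more, those below earn less (values assumed)."""
    expected = prior_level + expected_gain
    if current_level > expected:
        return 1.5   # exceeded expectation
    if current_level == expected:
        return 1.0   # met expectation
    return 0.5       # fell below expectation

# Levels coded 1 (lowest) to 4 (highest); a school's total:
sum(ayp_points(p, c) for p, c in [(2, 2), (2, 3), (3, 2)])
```

Averaging such points across a school gives a growth-sensitive AYP index rather than a pure status measure.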
Existing Empirical State Data
Using existing data, we explored some of these concepts.
Two data sets were used from Texas.
– All data are in the public domain and can be obtained from the Texas website.
– Current Texas data: TAKS
– Previous Texas data: TAAS
TAAS Data (2000-2002)
Mathematics
                 2000                    2001                    2002
Grade      RS Cut  Ave. TLI  % Pass  RS Cut  Ave. TLI  % Pass  RS Cut  Ave. TLI  % Pass
10 (Exit)  39/60   80.4      86      29/60   81.4      89      29/60   82.6      92
8          36/60   81.5      90      30/60   82.7      92      29/60   83.6      92
7          33/58   81.5      87      30/58   82.4      89      29/58   83.9      92
6          34/56   81.9      88      28/56   83.2      91      28/56   84.4      93
5          33/52   83.9      92      26/52   84.6      94      27/52   85.8      96
4          34/50   80.9      87      27/50   82.0      91      26/50   83.4      94
3          31/44   78.3      80      24/44   79.8      82      24/44   81.4      87

Reading
                 2000                    2001                    2002
Grade      RS Cut  Ave. TLI  % Pass  RS Cut  Ave. TLI  % Pass  RS Cut  Ave. TLI  % Pass
10 (Exit)  31/48   84.7      90      30/48   85.5      90      29/48   87.6      94
8          31/48   85.7      89      30/48   87.2      91      29/48   89.5      94
7          31/45   82.1      83      27/45   86.4      89      29/45   87.2      91
6          27/40   84.6      86      27/40   84.5      85      26/40   86.8      88
5          26/40   85.9      87      26/40   86.9      90      26/40   88.8      92
4          26/40   86.1      89      26/40   86.4      90      25/40   87.3      92
3          24/36   82.7      87      24/36   82.6      86      24/36   83.1      87

*Note: The Passing Standard is a TLI value of 70.
Immediate Observations - TAAS Data
Passing standards appear to be relatively lenient.
– Actual standards were set in fall of 1989.
– Curriculum change occurred in 2000.
Texas Learning Index (TLI)
– Is a variation of the "Growth Scaling" model previously discussed.
– Will be discussed in more detail shortly.
Despite the leniency of the standard, average cross-sectional gain is shown with the TLI.
– About a 2.5 TLI value gain on average (across grades).
TAKS Data (2003-2005)
Mathematics
             2003                   2004                   2005
Grade  RS Cut  Ave. SS  % Pass  RS Cut  Ave. SS  % Pass  RS Cut  Ave. SS  % Pass
11     33/60   2101     44      32/60   2186     67      33/60   2201     72
10     33/56   2115     48      34/56   2122     52      33/56   2139     58
9      31/52   2096     44      31/52   2121     50      31/52   2140     56
8      30/50   2116     51      31/50   2147     57      30/50   2156     61
7      28/48   2121     51      28/48   2139     60      28/48   2167     64
6      29/46   2167     60      28/46   2197     67      29/46   2234     72
5      30/44   2183     65      30/44   2229     73      30/44   2266     79
4      28/42   2194     70      28/42   2228     78      28/42   2256     81
3      27/40   2212     74      27/40   2247     83      27/40   2245     82

ELA*
             2003                   2004                   2005
Grade  RS Cut  Ave. SS  % Pass  RS Cut  Ave. SS  % Pass  RS Cut  Ave. SS  % Pass
11     43/73   2148     61      42/73   2214     83      39/73   2272     87
10     47/73   2158     66      38/73   2179     72      43/73   2187     67
9      29/42   2140     66      23/42   2187     76      27/42   2218     82
8      34/48   2244     77      34/48   2247     83      34/48   2288     83
7      33/48   2192     72      34/48   2210     75      33/48   2224     81
6      27/42   2227     71      27/42   2260     79      26/42   2296     85
5      29/42   2188     67      30/42   2211     73      30/42   2217     75
4      27/40   2213     76      26/40   2234     81      28/40   2235     79
3      24/36   2255     81      25/36   2279     88      23/36   2306     89

Note: Passing = "Met Standard"; data reported reflect the "Panel Recommended" standard.
*ELA for Grades 9, 10, and 11 includes open-ended and/or essay responses. A score of "2" on the essay is an additional requirement to "pass".
Immediate Observations - TAKS Data
Passing standards appear to be more severe than TAAS, but the majority of students still pass for the most part.
– Standards were set using Item Mapping and field-test data in 2003.
– Standards were "phased in" by the SBOE.
– "Passing" is labeled as "Met the Standard".
Scale scores are transformed within grade and subject calibrations using Rasch.
– Scales were set such that 2100 is always "passing".
– "Socially moderated" expectation that a 2100 this year is equal to a 2100 next year.
– We will look at this in another slide shortly.
Immediate Observations - TAKS Data
Some issues/problems seem obvious:
– Use of field-test data and lack of student motivation the first year.
– Phase-in of the standards makes the meaning of "passing" difficult to understand.
– Construct changes between grades 8 and 9.
– Math increases in difficulty across the grades.
– Cross-sectional gain scores show some progress, with between 20 and 35 point gains in average scaled score across grades and subjects.
– Finally, the percentage of classifications (impact) resulting from the Item Mapping standard setting is quite varied.
A Pre-Organizer
• Socially Moderated Standard Setting
– Really sets the expectation of student performance in the next grade.
• Growth Scaling
– A different definition of growth.
– Growth by fiat.
• Operationally Defined Exit Competencies
– How does a student exit the program?
– How to migrate this definition down to other grades.
• Growth Pathways
– Cumulative probability of success.
– Not addressed in this paper with Texas data.
Socially Moderated Standard Setting
Consider the TAKS data in light of Socially Moderated Standard Setting.
– The cut scores were determined separately by grade and subject using an Item Mapping procedure.
– 2100 was selected as the transformation of the Rasch theta scale associated with passing.
– 2100 became the passing standard for all grades and subjects.
– Similar to the "quasi-vertical scale scores" procedure described by Ferrara et al. (2005).
Socially Moderated Standard Setting
Despite implementation procedures, the standard setting yielded a somewhat inconsistent set of cut scores.
– Panels consisted of on-grade and adjacent-grade educators.
– Performance level descriptors were discussed both for the current grade and the next.
– A review panel was convened to ensure continuity between grades within subjects.
– This review panel was composed of educators from all grades participating in the standard setting and used impact data for all grades as well as traditionally estimated vertical scaling information.
Socially Moderated Standard Setting
Yet, some inconsistencies are hard to explain.
– For example, the standards yielded the following passing rates for Reading:
  Grade 3: 81
  Grade 4: 76
  Grade 5: 67
  Grade 6: 71
– Clearly, "social moderation" did not occur:
  • Differences in content standards from grade to grade.
  • Lack of a clearly defined procedure setting up expectations at the next grade.
  • Mitigating factors (i.e., "kids cry"; raw score percent correct, etc.).
Socially Moderated Standard Setting
What about unanticipated consequences?
– Are teachers, parents, and the public calculating "gain score" differences between the grades based on these horizontal scale scores?
– Will the expectation not be "2100 this year = 2100 next year"? This is similar to one of the concerns in Ferrara et al. (2005) that prohibited that research from being conducted.
– In fact, based on simple regression using matched cohorts, the expectation is that a student with a scaled score of 2100 in grade 3 reading will earn a 2072 in grade 4 reading on average.
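The regression behind that expectation is ordinary least squares on matched cohorts. The cohort below is invented for illustration; the 2100-to-2072 figure in the slide came from actual Texas matched-cohort data.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for predicting
    next-year scale scores from this-year scores in a matched cohort."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical matched grade-3 and grade-4 reading scale scores:
g3 = [2050, 2100, 2150, 2200, 2250]
g4 = [2030, 2075, 2130, 2170, 2230]
slope, intercept = fit_line(g3, g4)
predicted = slope * 2100 + intercept  # expected grade-4 score for a grade-3 2100
```

Whenever the fitted slope is below 1 or the intercept is negative, a 2100 this year predicts less than 2100 next year, which is exactly the interpretive hazard the slide raises.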
Growth Scaling
The TAAS TLI is an example of this type of "growth scale".
– A standard setting was performed for the "Exit Level" TAAS test.
– This cut score was expressed in standard deviation units above or below the mean (i.e., a standard score).
– This same distance was then articulated down to other grades.
– The logic was one of defining growth in terms of maintaining relative status as students move across the grades.
– For example, if the passing standard was 1.0 standard deviation above the mean at Exit Level, then students who are 1.0 standard deviation above the mean in the lower-grade distributions are "on track" to pass the Exit Level test, provided they maintain their current standing/progress.
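A minimal sketch of this articulated standard score, assuming a 10-points-per-SD spread; the actual TLI transformation differed in detail, and the distribution parameters below are invented.

```python
def tli_like(score, grade_mean, grade_sd, cut_z, points_per_sd=10):
    """Growth-scale sketch in the spirit of the TAAS TLI: express a
    student's standing in within-grade standard-deviation units, then
    anchor the Exit Level passing cut (cut_z SDs above the mean,
    articulated down to every grade) at a reported value of 70."""
    z = (score - grade_mean) / grade_sd
    return 70 + points_per_sd * (z - cut_z)

# Hypothetical grade-5 distribution (mean 38, SD 6) with the exit-level
# cut articulated at 1.0 SD above the mean:
tli_like(44, 38, 6, 1.0)  # a student exactly 1 SD above the mean -> 70.0
```

A student who keeps the same TLI-like value from year to year is, by construction, "holding their own" relative to the exit standard.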
Growth Scaling
• For convenience, the scales were transformed such that the passing standards were at 70.
• Grade level designations were then added to further enhance the meaning of the score.
• This score had some appealing reporting properties:
– Passing was 70 at each grade.
– Since the TLI is a standard score, gain measures could be calculated for "value added" statements.
Growth Scaling
Some concerns were also noted:
– Outside of the first cut score, the TLI was essentially "content standard" free.
– Because it was based on distribution statistics, the distributions (like norms) would become dated.
– Differences in the shapes of the distributions (e.g., test difficulty) would have an unknown impact on students' actually being able to "hold their own".
– Differences in the content being measured across the grades are treated as essentially irrelevant.
Operationally Defined Exit Competencies
The TAKS actually has such a component at the Exit Level.
– This is called the "Higher Education Readiness Component (HERC)" Standard.
– Students must reach this standard to earn "dual college credit" and to be allowed credit for college-level work.
– Two types of research were conducted to provide information for "traditional" standard setting:
  • Correlations with existing measures (ACT & SAT).
  • An empirical study examining how well second-semester freshmen performed on the Exit Level TAKS test.
Operationally Defined Exit Competencies
This research yielded the following (Grade 11, Spring 2003):

                  TAKS Math   TAKS ELA
College Sample    2138        2172

TAKS Math   Predicted ACT Math      Predicted SAT Math
2100        19.5                    472
2200        21.9                    521

TAKS ELA    Predicted ACT English   Predicted SAT Verbal
2100        17.7                    461
2200        20.1                    502
Operationally Defined Exit Competencies
Some interesting observations:
– The HERC standard was taken to be 2200, different from that needed to graduate.
– Second-semester college freshmen did "marginally better" than the required passing standard for TAKS to graduate.
– Predicted ACT and SAT scores support the notion that the TAKS passing standards are "moderately" difficult.
– Given the content of the TAKS assessments, how could this standard be articulated down to lower grades?
Concluding Remarks
Three possible enhancements that may or may not be intriguing for policymakers:
– Grades as Achievement Levels
– Information Rich Classrooms
– Monetary Metric
Grades as Achievement Levels
Associating letter grades with achievement levels would:
– Provide meaningful interpretations for grades
– Provide consistent meanings for grades
– Force use as experts recommend
– Enable concurrent evaluations of grades
– Enable predictive evaluations of grades
– Require help for teachers to implement
Information Rich Classrooms
Concept is from Schafer & Moody (2004).
Achievement goals would be clarified through test maps.
Progress would be tracked at the content strand level throughout the year using combinations of formative and summative assessments (heavy role for computers).
Achievement level assignments would occur incrementally throughout the year.
Monetary Metric for Value Added
Economists would establish value of each exit achievement level through estimating lifetime earned income.
The earnings would be amortized across grade levels and contents.
The "value added" for each student each year is the sum across contents of the dot product of the vector of probabilities of exit achievement levels (given the student's current achievement level) with the vector of amortized monetary values.
Enables cost-benefit analysis of education in a consistent metric for inputs and outputs.
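The value-added computation described above can be sketched as a sum of dot products. All dollar figures and probabilities below are invented for illustration; in practice the probabilities would come from growth-pathway transition matrices and the values from economists' amortized lifetime-earnings estimates.

```python
def value_added(exit_probs_by_content, amortized_values):
    """Monetary value-added sketch: for each content area, dot the
    student's vector of exit-achievement-level probabilities with the
    amortized monetary value of each exit level, then sum across
    contents. All inputs here are hypothetical."""
    total = 0.0
    for content, probs in exit_probs_by_content.items():
        values = amortized_values[content]
        total += sum(p * v for p, v in zip(probs, values))
    return total

# Hypothetical per-grade amortized values for three exit levels:
values = {"math": [1000, 3000, 6000], "reading": [1200, 2800, 5500]}
# Hypothetical exit-level probabilities given current achievement:
probs = {"math": [0.2, 0.5, 0.3], "reading": [0.1, 0.6, 0.3]}
value_added(probs, values)  # -> 6950.0
```

Because both inputs and outputs are in dollars, the same metric can feed the cost-benefit analysis the slide proposes.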