Growth Scales and Pathways
William D. Schafer, University of Maryland
and Jon S. Twing, Pearson Educational Measurement
NCLB leaves some unmet policy needs
Assessment of student-level growth
Sensitivity to change within achievement levels
Assessment and accountability at all grades
Broad representation of schooling outcomes
Descriptions of what students are able to do in terms of next steps
Cost-benefit accountability
How can we meet these needs?
Our approach starts with measurement of growth through cross-grade scaling of achievement
Current work is being done around:
Vertical scales, in which common items for adjacent grades are used to generate a common scale across grades
Another approach is grade-equivalents.
Both are continuous cross-grade scales.
We only have three problems with continuous cross-grade scales:
1. The Past
2. The Present
3. The Future
Why the Past?
Ignores instructional history
The same student score should be interpreted differently depending on the grade level of the student
Why the Present?
Relationships among items may (probably do) differ depending on grade level of the student. (e.g., easy fifth grade items may be difficult for fourth graders)
Lack of true equating. It is better for fourth graders to take fourth grade tests and for fifth graders to take fifth grade tests.
Why the Future?
Instructional expectations differ. A score of GE = 5.0 (or VS = 500) carries different growth expectations from a fifth-grade experience next year for a current fifth grader than for a current fourth grader.
We do need to take seriously the interests of policymakers in continuous scaling.
But the problems with grade-equivalents and vertical scaling may be too severe to recommend them.
Here are seven criteria that an alternate system should satisfy.
1. Implement the Fundamental Accountability Mission
Test all students on what they are supposed to be learning.
2. Assess all contents at all grades.
Educators should be accountable for all public expenditures.
Apply this principle at least to all non-affective outcomes of schooling.
3. Define tested domains explicitly.
Teachers need to understand their learning targets in terms of
Knowledge (what students know)
Factual
Conceptual
Procedural
Cognition (what they do with it)
4. Base test interpretations on the future.
We can’t change the past, but we can design the future.
It can be more meaningful to think about what students are prepared for than about what they have learned.
5. Inform decision making about students, teachers, and programs.
Within the limits of privacy, gathering data for accountability judgments about everyone and everything (within reason) will help decision makers reach the most informed decisions.
This also means that we will associate assessments with those who are responsible for improving them.
6. Emphasize predictive evidence of validity.
Basing assessment interpretations on the future (see point 4) suggests that our best evidence to validate our interpretations is how well they predicted in the past.
7. Capitalize on both criterion and norm referencing.
Score reports need to satisfy the needs of the recipients. Both criterion-referencing (what students are prepared to do) and norm-referencing (how many are as, more, and less prepared) convey information that is useful.
Other things equal, more information is better than less.
Our Approach to the Criteria
Many of the criteria are self-satisfying.
Some recent and new concepts are needed.
Four recent or new concepts:
– Socially moderated standard setting
– Operationally defined exit competencies
– Growth scaling
– Growth pathways
Socially Moderated Standard Setting
Ferrara, Johnson, & Chen (2005)
Judges set achievement level cut points where students have prerequisites for the same achievement level next year.
Note the future orientation of the achievement levels. This concept also underlies Lissitz & Huynh’s (2003) concept of vertically moderated standards.
Operationally Defined Exit Competencies
If we implement socially moderated standards, where do the cut points for the 12th grade come from?
Our suggestion is to base them on what the students are prepared for, such as (1) college credit, (2) ready for college, (3) needs college remediation, (4) satisfies federal ability-to-benefit rules, (5) capable of independent living, (6) below.
Modify as needed for lower grades (e.g., fewer levels) and certain contents (e.g., athletics, music)
Growth Scaling
Some elements of this have been used in Texas and Washington State.
Test at each grade level separately for any content (i.e., only grade-level items).
Report using a three-digit scale.
– The first digit is the grade level.
– The second two digits are a linear transform that places the lower "proficient" cut point at, e.g., 40 and the "advanced" cut point at, e.g., 60.
– Could transform non-linearly through all cut points with more than three levels.
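The three-digit growth scale can be sketched as a small function. The cut-point values below are hypothetical; only the 40/60 anchors come from the slide.

```python
def growth_scale_score(score, grade, proficient_cut, advanced_cut):
    """Three-digit growth scale: the first digit is the grade level;
    the last two digits linearly map the 'proficient' cut to 40 and
    the 'advanced' cut to 60 (cut points here are hypothetical)."""
    slope = (60 - 40) / (advanced_cut - proficient_cut)
    last_two = 40 + slope * (score - proficient_cut)
    return grade * 100 + round(last_two)

# A hypothetical grade-5 test with proficient cut 30 and advanced cut 45:
growth_scale_score(30, 5, 30, 45)  # a student exactly at proficient -> 540
growth_scale_score(45, 5, 30, 45)  # a student exactly at advanced -> 560
```

The grade prefix keeps scores from different grades from being naively subtracted as if they sat on one continuous scale.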
Growth Pathways
Given that content is backmapped (Wiggins & McTighe, 1998) and achievement levels are socially moderated, achievement results can be expressed in terms of readiness for growth (next year, at 12th grade, or both).
Can generate transition matrices to express likelihoods of various futures for students.
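A transition matrix of this kind can be estimated from matched student records. The cohort and level labels below are invented for illustration; they are not actual Texas results.

```python
from collections import Counter

def transition_matrix(pairs, levels):
    """Estimate P(next-year level | this-year level) from matched
    records. `pairs` is a list of (level_year1, level_year2) tuples;
    `levels` is the ordered list of achievement levels."""
    counts = Counter(pairs)
    matrix = {}
    for frm in levels:
        row_total = sum(counts[(frm, to)] for to in levels)
        matrix[frm] = {to: counts[(frm, to)] / row_total if row_total else 0.0
                       for to in levels}
    return matrix

# Hypothetical matched cohort of six students:
pairs = [("Basic", "Basic"), ("Basic", "Proficient"),
         ("Proficient", "Proficient"), ("Proficient", "Advanced"),
         ("Proficient", "Proficient"), ("Advanced", "Advanced")]
m = transition_matrix(pairs, ["Basic", "Proficient", "Advanced"])
m["Proficient"]["Proficient"]  # 2 of 3 proficient students stayed proficient
```

Each row of the matrix is the likelihood vector of "various futures" for students starting at that level.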
Adequate Yearly Progress
Capitalizing on Hill et al. (2005), growth pathways can serve as the basis for expectations, with point awards for students who meet, exceed, or fall below the expectations implied by their year-ago achievement levels.
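One minimal sketch of such a point-award scheme, with achievement levels coded as integers. The point values and the zero-gain default expectation are assumptions for illustration, not the scheme in Hill et al. (2005).

```python
def ayp_points(prior_level, current_level, expected_gain=0):
    """Award points relative to expectation: students meeting the
    level expected from their year-ago achievement earn 1 point,
    those above earn more, those below earn less (values assumed)."""
    expected = prior_level + expected_gain
    if current_level > expected:
        return 1.5   # exceeded expectation
    if current_level == expected:
        return 1.0   # met expectation
    return 0.5       # fell below expectation

# Levels coded 1 (lowest) to 4 (highest); a school's total:
sum(ayp_points(p, c) for p, c in [(2, 2), (2, 3), (3, 2)])
```

Averaging such points across a school gives a growth-sensitive AYP index rather than a pure status measure.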
Existing Empirical State Data
Using existing data, we explored some of these concepts.
Two data sets were used from Texas.
– All data are in the public domain and can be obtained from the Texas website.
– Current Texas data: TAKS
– Previous Texas data: TAAS
TAAS Data (2000-2002)
Mathematics
                 2000                    2001                    2002
Grade      RS Cut  Ave. TLI  % Pass  RS Cut  Ave. TLI  % Pass  RS Cut  Ave. TLI  % Pass
10 (Exit)  39/60   80.4      86      29/60   81.4      89      29/60   82.6      92
8          36/60   81.5      90      30/60   82.7      92      29/60   83.6      92
7          33/58   81.5      87      30/58   82.4      89      29/58   83.9      92
6          34/56   81.9      88      28/56   83.2      91      28/56   84.4      93
5          33/52   83.9      92      26/52   84.6      94      27/52   85.8      96
4          34/50   80.9      87      27/50   82.0      91      26/50   83.4      94
3          31/44   78.3      80      24/44   79.8      82      24/44   81.4      87

Reading
                 2000                    2001                    2002
Grade      RS Cut  Ave. TLI  % Pass  RS Cut  Ave. TLI  % Pass  RS Cut  Ave. TLI  % Pass
10 (Exit)  31/48   84.7      90      30/48   85.5      90      29/48   87.6      94
8          31/48   85.7      89      30/48   87.2      91      29/48   89.5      94
7          31/45   82.1      83      27/45   86.4      89      29/45   87.2      91
6          27/40   84.6      86      27/40   84.5      85      26/40   86.8      88
5          26/40   85.9      87      26/40   86.9      90      26/40   88.8      92
4          26/40   86.1      89      26/40   86.4      90      25/40   87.3      92
3          24/36   82.7      87      24/36   82.6      86      24/36   83.1      87

*Note: The Passing Standard is a TLI value of 70.
Immediate Observations - TAAS Data
Passing standards appear to be relatively lenient.
– Actual standards were set in fall of 1989.
– Curriculum change occurred in 2000.
Texas Learning Index (TLI)
– Is a variation of the "Growth Scaling" model previously discussed.
– Will be discussed in more detail shortly.
Despite the leniency of the standard, average cross-sectional gain is shown with the TLI.
– About a 2.5 TLI value gain on average (across grades).
TAKS Data (2003-2005)
Mathematics
             2003                   2004                   2005
Grade  RS Cut  Ave. SS  % Pass  RS Cut  Ave. SS  % Pass  RS Cut  Ave. SS  % Pass
11     33/60   2101     44      32/60   2186     67      33/60   2201     72
10     33/56   2115     48      34/56   2122     52      33/56   2139     58
9      31/52   2096     44      31/52   2121     50      31/52   2140     56
8      30/50   2116     51      31/50   2147     57      30/50   2156     61
7      28/48   2121     51      28/48   2139     60      28/48   2167     64
6      29/46   2167     60      28/46   2197     67      29/46   2234     72
5      30/44   2183     65      30/44   2229     73      30/44   2266     79
4      28/42   2194     70      28/42   2228     78      28/42   2256     81
3      27/40   2212     74      27/40   2247     83      27/40   2245     82

ELA*
             2003                   2004                   2005
Grade  RS Cut  Ave. SS  % Pass  RS Cut  Ave. SS  % Pass  RS Cut  Ave. SS  % Pass
11     43/73   2148     61      42/73   2214     83      39/73   2272     87
10     47/73   2158     66      38/73   2179     72      43/73   2187     67
9      29/42   2140     66      23/42   2187     76      27/42   2218     82
8      34/48   2244     77      34/48   2247     83      34/48   2288     83
7      33/48   2192     72      34/48   2210     75      33/48   2224     81
6      27/42   2227     71      27/42   2260     79      26/42   2296     85
5      29/42   2188     67      30/42   2211     73      30/42   2217     75
4      27/40   2213     76      26/40   2234     81      28/40   2235     79
3      24/36   2255     81      25/36   2279     88      23/36   2306     89

Note: Passing = "Met Standard"; data reported reflect the "Panel Recommended" standard.
*ELA for Grades 9, 10, and 11 includes open-ended and/or essay responses. A score of "2" on the essay is an additional requirement to "pass".
Immediate Observations - TAKS Data
Passing standards appear to be more severe than TAAS, but the majority of students still pass for the most part.
– Standards were set using Item Mapping and field-test data in 2003.
– Standards were "phased in" by the SBOE.
– "Passing" is labeled as "Met the Standard".
Scale scores are transformed within grade and subject calibrations using Rasch.
– Scales were set such that 2100 is always "passing".
– "Socially moderated" expectation that a 2100 this year is equal to a 2100 next year.
– We will look at this in another slide shortly.
Immediate Observations - TAKS Data
Some issues/problems seem obvious:
– Use of field-test data and lack of student motivation the first year.
– Phase-in of the standards makes the meaning of "passing" difficult to understand.
– Construct changes between grades 8 and 9.
– Math increases in difficulty across the grades.
– Cross-sectional gain scores show some progress, with between 20 and 35 point gains in average scaled score across grades and subjects.
– Finally, the percentage of classifications (impact) resulting from the Item Mapping standard setting is quite varied.
A Pre-Organizer
• Socially Moderated Standard Setting
– Really sets the expectation of student performance in the next grade.
• Growth Scaling
– A different definition of growth.
– Growth by fiat.
• Operationally Defined Exit Competencies
– How does a student exit the program?
– How to migrate this definition down to other grades.
• Growth Pathways
– Cumulative probability of success.
– Not addressed in this paper with Texas data.
Socially Moderated Standard Setting
Consider the TAKS data in light of Socially Moderated Standard Setting.
– The cut scores were determined separately by grade and subject using an Item Mapping procedure.
– 2100 was selected as the transformation of the Rasch theta scale associated with passing.
– 2100 became the passing standard for all grades and subjects.
– Similar to the "quasi-vertical scale scores" procedure described by Ferrara et al. (2005).
Socially Moderated Standard Setting
Despite implementation procedures, the standard setting yielded a somewhat inconsistent set of cut scores.
– Panels consisted of on-grade and adjacent-grade educators.
– Performance level descriptors were discussed both for the current grade and the next.
– A review panel was convened to ensure continuity between grades within subjects.
– This review panel was composed of educators from all grades participating in the standard setting and used impact data for all grades as well as traditionally estimated vertical scaling information.
Socially Moderated Standard Setting
Yet, some inconsistencies are hard to explain.
– For example, the standards yielded the following passing rates for Reading:
  Grade 3: 81
  Grade 4: 76
  Grade 5: 67
  Grade 6: 71
– Clearly, "social moderation" did not occur:
  • Differences in content standards from grade to grade.
  • Lack of a clearly defined procedure setting up expectations at the next grade.
  • Mitigating factors (i.e., "kids cry"; raw score percent correct, etc.).
Socially Moderated Standard Setting
What about unanticipated consequences?
– Are teachers, parents, and the public calculating "gain score" differences between the grades based on these horizontal scale scores?
– Will the expectation not be "2100 this year = 2100 next year"? This is similar to one of the concerns in Ferrara et al. (2005) that prohibited that research from being conducted.
– In fact, based on simple regression using matched cohorts, the expectation is that a student with a scaled score of 2100 in grade 3 reading will earn a 2072 in grade 4 reading on average.
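The regression behind that expectation is ordinary least squares on matched cohorts. The cohort below is invented for illustration; the 2100-to-2072 figure in the slide came from actual Texas matched-cohort data.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for predicting
    next-year scale scores from this-year scores in a matched cohort."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical matched grade-3 and grade-4 reading scale scores:
g3 = [2050, 2100, 2150, 2200, 2250]
g4 = [2030, 2075, 2130, 2170, 2230]
slope, intercept = fit_line(g3, g4)
predicted = slope * 2100 + intercept  # expected grade-4 score for a grade-3 2100
```

Whenever the fitted slope is below 1 or the intercept is negative, a 2100 this year predicts less than 2100 next year, which is exactly the interpretive hazard the slide raises.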
Growth Scaling
The TAAS TLI is an example of this type of "growth scale".
– A standard setting was performed for the "Exit Level" TAAS test.
– This cut score was expressed in standard deviation units above or below the mean (i.e., a standard score).
– This same distance was then articulated down to other grades.
– The logic was one of defining growth in terms of maintaining relative status as students move across the grades.
– For example, if the passing standard was 1.0 standard deviation above the mean at Exit Level, then students who are 1.0 standard deviation above the mean in the lower-grade distributions are "on track" to pass the Exit Level test, provided they maintain their current standing/progress.
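A minimal sketch of this articulated standard score, assuming a 10-points-per-SD spread; the actual TLI transformation differed in detail, and the distribution parameters below are invented.

```python
def tli_like(score, grade_mean, grade_sd, cut_z, points_per_sd=10):
    """Growth-scale sketch in the spirit of the TAAS TLI: express a
    student's standing in within-grade standard-deviation units, then
    anchor the Exit Level passing cut (cut_z SDs above the mean,
    articulated down to every grade) at a reported value of 70."""
    z = (score - grade_mean) / grade_sd
    return 70 + points_per_sd * (z - cut_z)

# Hypothetical grade-5 distribution (mean 38, SD 6) with the exit-level
# cut articulated at 1.0 SD above the mean:
tli_like(44, 38, 6, 1.0)  # a student exactly 1 SD above the mean -> 70.0
```

A student who keeps the same TLI-like value from year to year is, by construction, "holding their own" relative to the exit standard.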
Growth Scaling
• For convenience, the scales were transformed such that the passing standards were at 70.
• Grade level designations were then added to further enhance the meaning of the score.
• This score had some appealing reporting properties:
– Passing was 70 at each grade.
– Since the TLI is a standard score, gain measures could be calculated for "value added" statements.
Growth Scaling
Some concerns were also noted:
– Outside of the first cut score, the TLI was essentially "content standard" free.
– Because it was based on distribution statistics, the distributions (like norms) would become dated.
– Differences in the shapes of the distributions (e.g., test difficulty) would have an unknown impact on students' actually being able to "hold their own".
– Differences in the content being measured across the grades are treated as essentially irrelevant.
Operationally Defined Exit Competencies
The TAKS actually has such a component at the Exit Level.
– This is called the "Higher Education Readiness Component (HERC)" Standard.
– Students must reach this standard to earn "dual college credit" and to be allowed credit for college-level work.
– Two types of research were conducted to provide information for "traditional" standard setting:
  • Correlations with existing measures (ACT & SAT).
  • An empirical study examining how well second-semester freshmen performed on the Exit Level TAKS test.
Operationally Defined Exit Competencies
This research yielded the following (Grade 11, Spring 2003):

                  TAKS Math   TAKS ELA
College Sample    2138        2172

TAKS Math   Predicted ACT Math      Predicted SAT Math
2100        19.5                    472
2200        21.9                    521

TAKS ELA    Predicted ACT English   Predicted SAT Verbal
2100        17.7                    461
2200        20.1                    502
Operationally Defined Exit Competencies
Some interesting observations:
– The HERC standard was taken to be 2200, different from that needed to graduate.
– Second-semester college freshmen did "marginally better" than the required passing standard for TAKS to graduate.
– Predicted ACT and SAT scores support the notion that the TAKS passing standards are "moderately" difficult.
– Given the content of the TAKS assessments, how could this standard be articulated down to lower grades?
Concluding Remarks
Three possible enhancements that may or may not be intriguing for policymakers:
– Grades as Achievement Levels
– Information Rich Classrooms
– Monetary Metric
Grades as Achievement Levels
Associating letter grades with achievement levels would:
– Provide meaningful interpretations for grades
– Provide consistent meanings for grades
– Force use as experts recommend
– Enable concurrent evaluations of grades
– Enable predictive evaluations of grades
– Require help for teachers to implement
Information Rich Classrooms
Concept is from Schafer & Moody (2004).
Achievement goals would be clarified through test maps.
Progress would be tracked at the content strand level throughout the year using combinations of formative and summative assessments (heavy role for computers).
Achievement level assignments would occur incrementally throughout the year.
Monetary Metric for Value Added
Economists would establish value of each exit achievement level through estimating lifetime earned income.
The earnings would be amortized across grade levels and contents.
The "value added" for each student each year is the sum across contents of the dot product of the vector of probabilities of exit achievement levels (given the student's current achievement level) with the vector of amortized monetary values.
Enables cost-benefit analysis of education in a consistent metric for inputs and outputs.
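The value-added computation described above can be sketched as a sum of dot products. All dollar figures and probabilities below are invented for illustration; in practice the probabilities would come from growth-pathway transition matrices and the values from economists' amortized lifetime-earnings estimates.

```python
def value_added(exit_probs_by_content, amortized_values):
    """Monetary value-added sketch: for each content area, dot the
    student's vector of exit-achievement-level probabilities with the
    amortized monetary value of each exit level, then sum across
    contents. All inputs here are hypothetical."""
    total = 0.0
    for content, probs in exit_probs_by_content.items():
        values = amortized_values[content]
        total += sum(p * v for p, v in zip(probs, values))
    return total

# Hypothetical per-grade amortized values for three exit levels:
values = {"math": [1000, 3000, 6000], "reading": [1200, 2800, 5500]}
# Hypothetical exit-level probabilities given current achievement:
probs = {"math": [0.2, 0.5, 0.3], "reading": [0.1, 0.6, 0.3]}
value_added(probs, values)  # -> 6950.0
```

Because both inputs and outputs are in dollars, the same metric can feed the cost-benefit analysis the slide proposes.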