Educational Evaluation and Policy Analysis
http://eepa.aera.net

The online version of this article can be found at:
http://epa.sagepub.com/content/30/2/75
DOI: 10.3102/0162373708317689

Educational Evaluation and Policy Analysis 2008 30: 75
Julian Vasquez Heilig and Linda Darling-Hammond
Accountability Texas-Style: The Progress and Learning of Urban Minority Students in a High-Stakes Testing Context

Published on behalf of the American Educational Research Association and http://www.sagepublications.com

Additional services and information for Educational Evaluation and Policy Analysis can be found at:
Email Alerts: http://eepa.aera.net/alerts
Subscriptions: http://eepa.aera.net/subscriptions
Reprints: http://www.aera.net/reprints
Permissions: http://www.aera.net/permissions

Version of Record: Jun 11, 2008


Accountability Texas-Style: The Progress and Learning of Urban Minority Students in a High-Stakes Testing Context

Julian Vasquez Heilig
University of Texas at Austin

Linda Darling-Hammond
Stanford University

This study examines longitudinal student progress and achievement on the elementary, middle, and high school levels in relation to accountability policy incentives in a large urban district in Texas. Using quantitative analyses supplemented by qualitative interviews, the authors found that high-stakes testing policies that rewarded and punished schools based on average student scores created incentives for schools to “game the system” by excluding students from testing and, ultimately, school. In the elementary grades, low-achieving students were disproportionately excluded from taking the high-stakes Texas Assessment of Academic Skills tests, demonstrating gains not reflected on the low-stakes Stanford Achievement Test–Ninth Edition. Student exclusion at the elementary level occurred through special education and language exemptions and missing scores. Furthermore, gaming strategies reduced educational opportunity for African American and Latino high school students. Sharp increases in 9th-grade student retention and disappearance were associated with increases in 10th-grade test scores and related accountability ratings.

Keywords: accountability, testing, dropout, pushout, minorities, urban education

HIGH-STAKES testing and accountability policies are expanding their reach in states and districts nationwide, stimulated in part by the 2002 passage of the No Child Left Behind Act. The prevailing theory of action behind accountability ratings and testing is that schools and students who are held accountable to these measures will automatically increase educational output: Educators will try harder; schools will adopt more effective methods; and students will learn more. Pressure to improve test scores will produce genuine gains in student achievement.

However, the effects of high-stakes testing policies have been debated. Do policies that reward and sanction schools and students based on test scores improve achievement and the quality of education for all or most students? Do they create incentives to “game” the system by teaching to the test or by removing low-achieving students from the testing pool through placement decisions and enrollment actions? Or do they have differential effects on different students in different school contexts? To answer questions like these, it is critical to look not only at aggregate trends in average test scores at the school, district, and state levels but at individual student progress through school over time.

We wish to gratefully acknowledge the support of the Rockefeller Foundation in conducting this research. The findings and views expressed are ours alone.

Educational Evaluation and Policy Analysis, June 2008, Vol. 30, No. 2, pp. 75–110
DOI: 10.3102/0162373708317689 © 2008 AERA. http://eepa.aera.net

In this study, we examine longitudinal student progress and achievement using data on more than 250,000 students over 7 years in a large urban district in Texas that we call Brazos City (pseudonym), where the accountability system adopted in the early 1990s provided the model for the No Child Left Behind Act a decade later. We examine district, school, and individual student trends in test scores on multiple tests, as well as student progress through school and graduation. Our goal is to empirically evaluate whether accountability policies and incentives are associated with changes in student achievement and whether they increase retention, dropout, and disappearance of students from school, with emphasis on outcomes for low-income students and students of color. We evaluate the relationships between these trends and school accountability ratings over time using multivariate statistical methods and analysis of interview data from more than 160 students and staff across seven high schools in the district. These analyses allow us to examine the possibility of gaming actions aimed at boosting school-level accountability ratings that may result in unintended consequences for some subpopulations of students, including efforts to exclude students from testing or from school.

Accountability Texas-Style

Texas was one of the earlier states to develop statewide testing systems during the 1980s, and the state adopted minimum competency tests for school graduation in 1987. In 1993, the Texas legislature mandated the creation of the Texas public school accountability system to rate school districts and evaluate campuses. The Texas accountability system was supported by the Public Education Information Management System (PEIMS) data collection system, a state-mandated curriculum, and the Texas Assessment of Academic Skills (TAAS). Between 1994, the baseline year, and 2002, when the TAAS was replaced by another test, the primary base indicators for determining school accountability ratings involved, first, the proportion of students passing all TAAS subject area tests (with the passing score represented by a 70 on the Texas Learning Index [TLI], a scaled score on the TAAS) and, second, the annual drop-out rate, each disaggregated by student groups: African American, Latino, White, and economically disadvantaged.

A key element of this system involved the use of the 10th-grade TAAS tests in reading, writing, and mathematics as a requirement for graduation from high school as well as the central indicator for establishing high school accountability rankings. Schools were categorized as exemplary, recognized, acceptable, and low performing. For a school to be designated exemplary, 90% of its students (and 90% of each student subgroup, defined by race/ethnicity and income) had to pass the tests, and the school drop-out rate could not exceed 1%. To be recognized, a school needed a 65% pass rate for all subgroups on the test and a drop-out rate of no more than 3.5%. Between 1995 and 1998, the required pass rate increased to 80%. To be acceptable, a school needed an initial pass rate of 25% on the test (increasing to 50% for each subgroup by 2000) and a drop-out rate of no more than 6%. In 2001, required drop-out rates were lowered to 3% for a recognized school and 5.5% for an acceptable school. In 2002, these rates were lowered further, to 2.5% and 5.0%, respectively.

High schools’ reputations, funding, and their continued existence depended on students’ performance on the exit TAAS. Graduation for students also depended on passing the 10th-grade TAAS in reading, writing, and mathematics. Thus, the TAAS was high stakes for students, educators, and schools.

In addition to the increasing expectations of schools, as reflected in these requirements, there was a set of changing rules for students’ inclusion in the testing program. Initial exemption for special education students shifted over time, as did that for limited-English-proficient (LEP) students, who could be exempt for a period of time and tested on the Spanish TAAS. Scores for those special education students tested on the TAAS were considered in the accountability system beginning in 1999, and scores for elementary students tested on the Spanish-language TAAS were phased into the accountability system in 1999 and 2000. In 2000, LEP exemptions were limited, and student underreporting and special education compliance could affect a district’s rating. By 2001, LEP exemptions were restricted further, and special education exemptions were evaluated. By 2002, a State-Developed Alternative Assessment for special education students was developed to include them in the accountability system; a social studies test was added to the list; and pass rates were raised once again. After 2002, the tests changed to become more rigorous, and the system evolved. By all accounts, managing this system has been a major preoccupation for Texas schools and districts.

Many perceive Texas-style high-stakes testing and accountability as having become the driving education policy for the nation with its incorporation into the reauthorization of the Elementary and Secondary Education Act in 2002 as the No Child Left Behind Act (McNeil, 2005). The latter requires states and localities to build accountability systems based on assessments that become high stakes, because schools must meet annual test score targets for subgroups of students, thereby making adequate yearly progress, or face federal sanctions and penalties. Thus, studying Brazos City and Texas’s first-generation accountability system provides an opportunity to evaluate one aspect of the theory of action underlying the No Child Left Behind Act.

Prior Research on High-Stakes Testing

Evidence on the effects of high-stakes testing is mixed. Some studies suggest that students and schools make achievement gains in contexts where tests are used for decision making, whereas other research has found no improvement or even negative consequences. Among these unintended outcomes are school strategies to game the system by adjusting the testing pool through student placements, admissions, and policies. Also of concern are the ways in which stakes attached to tests can corrupt what the tests measure, making outcomes nongeneralizable to other achievement measures and kinds of learning (for a summary of issues associated with test-based accountability systems, see, e.g., Hamilton, Stecher, & Klein, 2002).

Accountability Systems and Student Achievement

Several studies have used aggregated state-level data to examine whether state high-stakes testing policies appear to increase average student achievement levels. Carnoy and Loeb (2002) used a 5-point index of the strength of accountability systems in all 50 states, with higher ratings assigned to systems using high-stakes testing to reward or sanction schools, to examine whether accountability “strength” was related to student gains on the National Assessment of Educational Progress (NAEP) mathematics test in the 1996–2000 period. They found that students in states with stronger high-stakes accountability systems made significantly higher gains on the eighth-grade national mathematics assessment and that these gains were greater for African American and Latino students, thus narrowing the achievement gap. The study did not find evidence of a relationship between accountability systems and either higher rates of student retention or changes in high school completion rates.

Hanushek and Raymond (2003) also reported positive achievement effects in their analysis of aggregate state-level NAEP mathematics data. They examined the relationship between state-level accountability policies and achievement for cohorts at Grades 4 and 8 and found that accountability schemes appeared to increase state achievement gains. However, they also found that accountability policies did not close the gap in student learning but actually increased it, given that African Americans and Latinos showed lower gains on each test when compared to Whites. In contrast, Lee and Wong (2004) found no evidence that accountability policies resulted in test score gains or changes in the achievement gap, positive or negative.

Another way of examining the question of achievement effects is to evaluate whether gains on high-stakes tests appear to be related to gains in other measures of learning. Amrein and Berliner (2002) examined 18 states with severe consequences attached to their testing programs, to see if high-stakes testing affected student learning on measures other than the high-stakes tests. The authors posited that if student learning is actually increasing under high-stakes testing programs, then transfer learning would be evident on other standardized tests, such as the NAEP, advanced placement tests, and college admissions tests (e.g., ACT, SAT). They found that student learning effects were indeterminate: In most cases, when measured by tests other than the state-mandated high-stakes instruments, student achievement appeared to remain at the same level it was before the policy was implemented, or it went down when high-stakes testing policies were instituted.

Rosenshine (2003) reanalyzed Amrein and Berliner’s (2002) NAEP results, arguing that their findings were incomplete because they did not include a comparison group of states without high-stakes testing programs. His study showed that states that attached consequences to testing outperformed a comparison group of states without high-stakes tests on three NAEP tests for the last 4-year period.

Amrein-Beardsley and Berliner (2003) responded to Rosenshine’s (2003) critique with their own reanalysis. Using 4 years of NAEP reading and math data, they found that states with high-stakes tests appeared to outperform other states in fourth-grade mathematics but not in fourth-grade reading or eighth-grade mathematics. They also found that states with high-stakes tests exempted more students from participating in the NAEP than did the comparison states without high-stakes tests and that the apparent positive association between high-stakes testing and achievement in fourth-grade math disappeared when test exclusion rates were taken into account. The authors argued that high-stakes testing may provide greater incentives to exclude low-performing students from testing than to increase learning. Increases over time in test exclusion rates for states with strong accountability pressure were confirmed in a study by Nichols, Glass, and Berliner (2006), leading the authors to suggest that “it may be that increasing pressure leads to greater numbers of students dropping out or being held back in school” (p. 50).

Although state-level studies have produced mixed findings regarding the relationship between high-stakes exams and student progress, studies using less aggregated data have found higher rates of retention and dropping out in states and cities that have instituted tougher graduation requirements (Clarke, Haney, & Madaus, 2000; Lilliard & DeCicca, 2001; Orfield & Ashkinaze, 1991; Roderick, Bryk, Jacob, Easton, & Allensworth, 1999; Wheelock, 2003). Using individual-level data from the National Educational Longitudinal Survey, for example, Jacobs (2001) found that graduation tests increased the probability of dropping out among the lowest-ability students. With a similar longitudinal data set, the Chicago Consortium for School Research found that although some students’ scores improved in response to a high-stakes testing policy tied to grade promotion, the scores of low-scoring students who were retained declined, relative to those of similar-achieving students who had been promoted, and their drop-out rates substantially increased (Roderick et al., 1999).

Most studies have found that retention in grade negatively affects student achievement and graduation. Summarizing several decades of research, the National Research Council concluded that low-performing students who are held back do less well academically and are far likelier to drop out than are comparable students who are promoted (Heubert & Hauser, 1999). One study, for example, found that retention can increase the odds of dropping out by as much as 250% above those of similar students who were not retained (Rumberger & Larson, 1998).

These findings raise a number of issues for further study. First, studies using the state as the unit of analysis can mask variations across schools and districts in organizational responses and student outcomes. In particular, state-level data sets do not allow examination of differences in school and district capacity that may differentially affect student success and school responses. This is a salient issue, given that a growing body of literature shows that school conditions and teacher quality affect student learning. Studies of teachers’ effects at the classroom, school, and district levels have found that teacher effectiveness is a strong determinant of differences in student learning (Darling-Hammond, 2000; Jordan, Mendro, & Weerasinghe, 1997; Wright, Horn, & Sanders, 1997).

Second, the issue of students’ exclusion from testing and even from school requires further investigation with student- and school-level data sets. These inquiries should evaluate whether exclusion occurs in systematic ways and whether exclusion of different groups of students influences school, district, and state test score trends.

Prior Research About Accountability Effects in Texas

The creation of the Texas accountability system set the stage for later proclamations of a “Texas miracle,” featuring sharp increases in TAAS scores, apparent decreases in the achievement gap, and decreases in recorded rates of dropping out (Klein, Hamilton, McCaffrey, & Stecher, 2000). However, research on these claims has produced divergent findings. Haney’s (2000) review of Texas data identified grade retention, testing exclusion for special education, English-proficiency exemptions, student dropout, and other “illusions” as being the underlying reasons for the apparent increases in test scores. Haney found that retention rates in ninth grade and school-leaving rates for high school students had increased substantially since the late 1980s, with fewer than 50% of African American and Latino ninth graders and only about 70% of White ninth graders progressing to graduation 4 years after entering high school. He argued that part of the increase in pass rates on the 10th-grade exit TAAS was attributable to the increases in the rates at which low-achieving students were missing from the testing pool and, hence, the school accountability ratings.

RAND Corporation researchers Klein and colleagues (2000) reported that although TAAS mathematics scores were soaring, Texas students did not improve significantly more on the NAEP math test than did their counterparts nationally, and the TAAS gains were not reflected in scores on three other tests that the team administered to Texas students. Their analysis also found “stark differences” between the pictures painted by the NAEP and the TAAS tests regarding the narrowing of the gap in scores between White and minority students, arguing that according to the NAEP results, the achievement gap in Texas was not only large but had even increased slightly. The researchers pointed to test-based instruction focused on the high-stakes test, as well as testing exclusions, as possible reasons for these disparities.

By contrast, Grissmer, Flanagan, Kawata, and Williamson (2000) found that when children from similar families were compared across states, Texas ranked high in achievement on the NAEP. They suggested that the Texas accountability regime might be one among many plausible explanations for the state’s NAEP gains but added that the research design could not establish a causal linkage. This study did not examine test exclusion rates.

In a broader study of state achievement in literacy, another RAND team found that the small apparent gap between the scores of White and Latino students on Texas’s state reading tests was not replicated on the NAEP, where the score gap between the two groups was substantially larger (McCombs, Kirby, Barney, Darilek, & Magee, 2005). A study by Linton and Kestor (2003) found evidence of ceiling effects on the TAAS test for White students, given that more than 60% of these students scored in the top 10% of possible test scores. This, they argued, may have created the appearance of narrowing the achievement gap without actually doing so.

Finally, in a study using Texas Education Agency (TEA) data, Carnoy, Loeb, and Smith (2001) did not find increased drop-out rates associated with the TAAS 10th-grade exit examination. They argued that statewide TEA data show that grade retention rose when the first minimum competency tests were introduced in the late 1980s, before the start of TAAS testing in 1990–1991. Their data also suggest that downward trends in 9th- to 12th-grade student progression ratios ended shortly after the 10th-grade TAAS was implemented in the early 1990s, leveling off thereafter, so that the TAAS may not have caused ongoing declines.

Much of the dispute has focused on the relationship between test score trends and changes in the testing pool, including dropouts at the secondary school level. Although many cities in Texas report low drop-out rates, student data show sharply dwindling cohort sizes between 9th and 12th grades. For example, although the U.S. Department of Education lists Brazos City as having one of the lowest graduation rates in the United States (National Center for Education Statistics, 2003), the city has reported annual drop-out rates to the TEA below 2% (TEA, 2003). TEA auditors who checked the dropout coding at Brazos City high schools uncovered school use of PEIMS leaver codes that artificially reduced reported drop-out rates at most of them. When the auditors reviewed the records of nearly 5,500 students who left those schools, they found that almost 3,000 students should have been coded as dropouts but were not. This is one example of the broader phenomenon of gaming that has been documented in some studies of districts and states with high-stakes testing policies.
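A back-of-the-envelope calculation (ours, not the auditors') shows why such low reported annual rates are hard to square with the observed cohort shrinkage: even taken at face value, a 2% annual drop-out rate compounds to less than an 8% loss over a 4-year high school career, nowhere near a 50%-plus decline between 9th and 12th grades.

```python
# Compounding a reported 2% annual drop-out rate over 4 years of high school.
annual_dropout = 0.02  # rate reported to the TEA

remaining = (1 - annual_dropout) ** 4  # share of the 9th-grade cohort left
cumulative_loss = 1 - remaining

print(f"{cumulative_loss:.1%}")  # 7.8%
```

The gap between that figure and the actual attrition is what the leaver-code audits help explain.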

Accountability Systems and Gaming

Evidence of the effects of high-stakes testing and accountability policies on school responses suggests that high-stakes testing systems that reward or sanction schools on the basis of average student scores may create incentives for schools to boost scores by manipulating the population of students taking the test. In addition to retaining students in grade so that their relative standing will look better on grade-equivalent scores, schools have been found to label large numbers of low-scoring students for special education placements so that their scores are not factored into school accountability ratings (Allington & McGill-Franzen, 1992; Figlio & Getzer, 2002); they have excluded low-scoring students from admission to open-enrollment schools; and they have encouraged poorly performing students to leave school, transfer to GED programs, or drop out (Darling-Hammond, 1991; Haney, 2000; Smith, 1986).

Some studies have found evidence of gaming actions in the grade level just before the one for which school-level scores produce school accountability rankings. For example, Allington and McGill-Franzen (1992) examined trends in the incidence of retention, remediation, and identification of students as handicapped in New York State elementary schools during a period when a high-stakes testing and accountability plan was implemented (1978–1989). They theorized that schools, in response to the fact that special education students’ achievement is not included in the school accountability system, might respond by increasing special education assignments as well as by retaining students in grade. They reported a statistically significant increase in the proportion of children retained in grade or identified as handicapped in third grade, just before the fourth-grade high-stakes tests. In considering this relationship, they argued that the removal of low-achieving students from the stream and the delay of their entry inflated the reported assessment results, thereby demonstrating no real increase in school effectiveness. Figlio and Getzer (2002) also found that Florida schools tended to reclassify low-income and low-performing students as disabled at significantly higher rates following the introduction of a new test-based accountability policy and that these behaviors were concentrated among the low-income schools most likely to be on the margin of failing the state’s accountability system.

Similarly, when Massachusetts began requiring a 10th-grade high school exit exam for graduation in 2002, with scores tied to school accountability rankings, graduation rates decreased sharply for African American and Latino students, whereas grade retention and dropout/disappearance rates escalated, especially in the 9th grade. Schools with the highest grade retention and drop-out rates experienced some of the steepest increases in test scores. For example, high schools receiving state awards for gains in 10th-grade pass rates on the Massachusetts test showed substantial increases in prior-year 9th-grade retention rates and in the percentage of students who disappeared between 9th and 10th grades (Wheelock, 2003).

At the high school level, gaming may include not only student placements in program categories such as special education but the denial of admissions and the encouragement to leave. Smith (1986) explained the widespread engineering of student populations, which he found in his study of New York City’s implementation of test-based accountability as a basis for school-level sanctions:

Student selection provides the greatest leverage in the short-term accountability game. . . . The easiest way to improve one’s chances of winning is (1) to add some highly likely students and (2) to drop some unlikely students, while simply hanging on to those in the middle. (pp. 30–31)

More recent evidence regarding New York City’s new exit exam requirements, imposed in 1999, suggests that many of the city’s high schools are trying to improve their test scores by pushing out students who are unlikely to pass the tests. By 2000–2001, more than 55,000 high school students were discharged without graduating, a number far larger than the 34,000 seniors who actually graduated from high school (Advocates for Children, 2002), and the number of school-age students in GED programs run by the city schools increased by more than 50%, from 25,500 to more than 37,000 (The New York Times, May 15, 2001, p. A1). A study of England’s high-stakes accountability system, which tied school rankings to student scores, also found that it led to a large increase in student exclusion rates (Rustique-Forrester, 2005).

A study by Schiller and Muller (2000) suggests that the nature of the incentive structure may affect school responses. The authors found that more frequent testing increased the odds of graduating when tests carried consequences for students and that teachers used scores to identify at-risk students, presumably for greater attention. They also found, however, that test-based consequences for schools increased the odds of students’ dropping out: When schools stood to be sanctioned for low scores, teachers’ identifications of at-risk students were associated with more of those students leaving school. This finding is consistent with studies that suggest that when schools are rewarded or punished for students’ average scores, there are substantial incentives for low-scoring students to be pushed out of the testing pool in one way or another.

The extent to which high-stakes tests may lead to gaming actions and student exclusion rather than efforts to improve teaching may also depend on school capacity, including whether a school has a stable cadre of skilled teachers who can develop strategies that will meet the needs of struggling students. In many states, schools serving the highest-need students are those with the highest turnover, the greatest numbers of untrained and inexperienced teachers, the fewest monetary and curricular resources, and the least knowledgeable administrators and senior staff (Darling-Hammond & Sykes, 2003). In these contexts, designations that a school is failing may be less likely to result in improvements than in actions to improve average scores by removing the lowest-performing students. Indeed, Rustique-Forrester (2005) found that British schools with lower rates of exclusion had stronger, more expert staffs with more engagement in decision making and greater investments in professional development, whereas those with high rates of exclusion had large numbers of inexperienced, untrained, and substitute teachers and few resources devoted to improving staff skills to better meet students’ needs.

Similarly, Diamond and Spillane (2004) found that high-performing schools, when under high-stakes accountability policies focused on school scores, increased academic press (i.e., the normative emphasis placed on efforts to improve academic achievement), worked to discover and adopt more effective instructional strategies, and created interventions for students who were lower performing. This stood in contrast to low-performing schools that were on probation—schools with more needy students and less school capacity—which drilled students on test format and narrowed, rather than expanded, their instructional strategies. These latter schools also focused on their higher-performing students, in hopes of getting them to raise their scores, and gave up on their lower-performing students.

DeBray, Parson, and Woodworth (2001) also documented the compliance-without-capacity responses of low-performing schools to accountability pressures in Vermont and New York. Whereas higher-performing schools used the policies to create greater internal accountability around the construction of shared goals, curriculum changes, professional development, and teacher evaluation, the low-performing schools lacked the capacity to mobilize themselves for productive change. In these schools, superficial compliance that focused on the tests was not accompanied by schoolwide initiatives to improve curriculum and instruction.

Mintrop’s (2003) study of 11 low-performing schools that were placed on probation in Maryland and Kentucky revealed that most of these schools—with high levels of teacher turnover and little teacher expertise—did not know how to improve. In these schools, teachers did not know how to better teach the students, and they often blamed them for the low performance; furthermore, rather than institute teacher learning processes, administrators responded with control strategies that rigidified teaching. The few schools that were able to improve were those that had more skilled teachers and a principal who created a collegial learning process that could tap their expertise.

In sum, although testing policies may mobilize schools to improve teaching for some students, it appears that these policies—especially where schools and teachers have little knowledge and capacity to improve instruction—can cause the most difficult-to-educate students to be held back, placed in special education, and encouraged to leave. Such actions may make schools look as though they are succeeding on aggregated measures without actually improving their quality.

Method

We use a mixed-methods approach to understand students’ school experiences and progression through school, combining descriptive and multivariate analyses of a longitudinal student-level administrative data set secured from Brazos City Independent School District with interviews with students and staff to learn about their direct experience of the policy.

As one of the large urban school districts in Texas, the district of Brazos City is fairly typical. As in other Texas cities, it mostly serves low-income students who are Latino and African American, and in 2001–2002, just over one quarter of its students were identified as LEP. In sum, these cities have a much greater share of students of color, LEP students, and low-income students when compared to the state as a whole (see Table 1).

Overview of Quantitative Data Set

The student-level data set includes 2,500 variables for 270,000 students over a 7-year period (1995–2002), providing information about student background characteristics (race/ethnicity, income, language proficiency), school placements (grade level), and achievement scores for each year, as linked to teacher and school characteristics: percentages of students by race/ethnicity, income, language status, and special education status; the percentage “at risk,” as defined by a multifaceted state index; the school accountability rating; and the percentages of certified teachers, new teachers, and rates of teacher turnover.

Unique student identifiers provide the ability to follow students throughout their tenure in the district. The data allowed for the examination of elementary and middle school data from 1995 to 2002. Three cohorts could be followed through high school: those that began high school between 1996 and 1998, representing the graduating classes of 2000, 2001, and 2002. The data set included PEIMS codes designating students as dropouts, withdrawn from school, and graduated. We constructed additional variables for students who were retained in grade and for those who “disappeared” from the data set but who were not coded as withdrawn or dropout.
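The construction of the “disappeared” indicator can be sketched as follows; the status labels and records here are hypothetical stand-ins for illustration, not the district’s actual PEIMS codes or data.

```python
# Sketch: flag students who leave the data set without an official exit
# code. Status labels and records are hypothetical, not PEIMS values.
OFFICIAL_EXITS = {"dropout", "withdrawn", "graduated"}

def flag_disappeared(last_status, enrolled_next_year):
    """A student 'disappears' if not enrolled the following year and
    not officially coded as a dropout, withdrawal, or graduate."""
    return (not enrolled_next_year) and (last_status not in OFFICIAL_EXITS)

# One record per student: (id, last recorded status, enrolled next year?)
records = [
    ("A", "enrolled", True),    # still in the district
    ("B", "withdrawn", False),  # official withdrawal
    ("C", "enrolled", False),   # gone with no exit code -> disappeared
]
disappeared = [sid for sid, status, nxt in records if flag_disappeared(status, nxt)]
```

The point of the derived variable is that officially reported dropout counts miss students like “C,” who simply vanish from the rolls.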

Qualitative Data Collection

To examine how school staff and students experienced the accountability policies, a research team undertook companion qualitative investigations in a set of Brazos City high schools. The research was initiated with informal focus group discussions with school staff from the Brazos City area about the history of the accountability system and their experiences of the policies. These meetings were conducted to inform the research team on key issues and questions to include in the instrumentation. These meetings were followed by interviews with staff and students in seven traditional district high schools that agreed to participate in the study. Because most of Brazos City’s high schools are “majority minority,” we selected three high schools with a majority of Latino students and four with a majority of African American students, all of which had high rates of ninth-grade retention and dropout/disappearance. Nontraditional schools (e.g., charters, schools for previous dropouts) were excluded from the sample. Our sample of schools represents about a third of the traditional high schools in Brazos City School District (BCSD).

TABLE 1
Student Demographics for Texas, Brazos City, and Large Urban School Districts in Texas (2001–2002; in percentages)

                              Texas   Brazos City   Large urban district average
African American                 14            31                             28
Latino                           42            56                             53
White                            41            10                             18
Asian / Pacific Islander          3             3                              2
Native American                   0             0                              0
Economically disadvantaged       51            79                             67
Limited-English proficient       15            28                             27

We used a key informant strategy and sought out school staff with significant levels of institutional memory in each high school. Schools were asked to randomly choose administrators, staff, and teachers who had more than 5 years of experience in BCSD. A total of 24 math and English teachers were interviewed across the seven high schools. Fourteen BCSD high school administrators and staff (principals, counselors, testing coordinators, etc.) were included in the sample. Additional interviews were conducted with 122 current and former BCSD high school students. Schools were asked to randomly choose 18-year-old students from senior English classes. English classes were chosen because fewer seniors take mathematics; as such, sampling 12th-grade math classes would have biased the sample toward higher-achieving students.

The mixed-methods approach provides an opportunity to triangulate focus group interviews and individual field interviews from almost 200 individuals, representing administrators, students, and teachers, with the trends examined in the quantitative research. In combination, these sources of data allow us to gain a macro-level perspective regarding trends in student achievement, progression, and graduation in Brazos City, along with a micro-level view of the dynamics of student, teacher, and school responses to the evolving incentives offered by the accountability system.

Analyses

Our analyses were designed to address many of the questions raised in the literature about the effects of accountability systems on student performance and school continuation, as well as the effects on school ratings of strategies for serving or excluding students. We sought to understand whether test scores improved, for whom they improved, and whether scores were affected by excluding students from the testing pool or from school altogether, as some previous studies have hypothesized. We also sought to understand whether schools improved their ratings in the accountability system by undertaking gaming strategies affecting student participation.

Individual-level student achievement trends. Our first set of analyses uses the longitudinal student-level data set to track student test score trends on the TAAS, the Texas high-stakes test, and the Stanford Achievement Test–Ninth Edition (SAT-9), a low-stakes test offered in Brazos City. We examine the relationships between students’ scores on the two tests, as well as the numbers and characteristics of students who were excluded from each test. If achievement gains are real and generalizable, one should expect scores on the high-stakes TAAS to strongly predict performance on the low-stakes SAT-9. To take account of student characteristics and the test exclusions that we found on the TAAS, we use an ordinary least squares regression model to examine the predictors of students’ scores on the SAT-9 test as a function of their TAAS scores (or their exclusion from TAAS), their personal background characteristics, and their teachers’ qualifications. For this case of k independent variables, the ordinary least squares multiple regression equation model is

Yi = α + β1Xi1 + β2Xi2 + . . . + βkXik + εi.

The βs equal the regression coefficients for the independent variables: student demographic characteristics, teacher quality indicators, the proportion of at-risk students at the school level, and whether the student had a valid TAAS score. The dependent variable Y represents SAT-9 math and reading scores for student i. The constant α is where the regression line intercepts the y-axis, and ε is the error term reflected in the residuals.
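As an illustration of the model above, the sketch below fits an OLS regression by solving the normal equations (X′X)b = X′y on a tiny invented data set; it is a minimal demonstration of the estimator, not the authors’ analysis code.

```python
# Minimal OLS sketch via the normal equations (X'X)b = X'y, solved by
# Gauss-Jordan elimination. Data are invented for illustration only.

def solve(A, b):
    """Solve the square linear system A x = b with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    """X: rows of predictors (intercept added here); returns coefficients
    [alpha, beta_1, ..., beta_k]."""
    X = [[1.0] + list(row) for row in X]
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

# y = 2 + 3*x1 + 0*x2 exactly, so OLS should recover [2, 3, 0].
X = [[0, 0], [1, 0], [2, 1], [3, 1]]
y = [2, 5, 8, 11]
coefs = ols(X, y)
```

In the paper’s application, the predictor columns would be the demographic dummies, teacher quality indicators, school at-risk proportion, and the TAAS score or exclusion dummy.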

Test score trends and student exclusions. Our second set of analyses focuses on the Grade 10 TAAS exit exam, which provides the basis for high school rankings and which some previous analyses have suggested may be associated with high rates of grade retention and eventual dropout. Using the individual-level data, we examine trends in test participation and pass rates on each of the 10th-grade tests (reading, writing, and mathematics) over time, and we examine students’ grade-to-grade progression through school, tracking students’ trajectories over time in relation to race/ethnicity, language status, and income, as well as test-taking history. This allows us to empirically examine whether test exclusions, grade retentions, and dropout or disappearance were widespread, which students were affected, and how they interacted with test score trends. We use individual-level data to track the trajectory of one cohort of entering ninth graders through high school, following them for 6 years (from 1996–1997 to 2001–2002) to track their graduation status, including their test-taking and test-passage experience on the TAAS exit exam. We record not only the district’s codes for dropouts, transfers, and withdrawals but also the disappearance rates of students of different types from the database and the district.

School-level achievement and gaming behaviors. Our third set of analyses seeks to ascertain whether high schools may deliberately engage in gaming the system by retaining low-scoring students in grade or by keeping or pushing them out of school to raise scores on the tests used for school rankings. As noted earlier, we conducted in-depth interviews with 160 students, teachers, and administrators in a cross-section of Brazos City high schools to understand how they perceive the influences of the testing and accountability system and what strategies they use to boost scores. We then test whether these strategies have the effects on school rankings that practitioners believe they have.

One of the unique aspects of this data set is that it allows us to use the district’s individual-level data to calculate student progress at the school level more accurately than school self-reported data may reflect. This is important given the growing discussion in the literature about whether the data that are publicly reported by schools, districts, and TEA adequately represent student progress through Texas high schools (Greene, 2002; Haney, 2000; Orfield, Losen, Wald, & Swanson, 2004). We use individual-level data to calculate school-level student progress variables from grade to grade and through graduation, in a set of regressions that allow us to investigate the impact of student 9th-grade retention, dropout, and disappearance on 10th-grade high-stakes test scores and school accountability ratings tied to these scores.

Our school-level variables are constructed from BCSD-provided data on students who were officially coded as dropping out of or withdrawing from high school—plus a disappearance variable that we created to reflect the proportion of students who were not officially coded by schools or the district as withdrawing or dropping out but who did not continue in a BCSD high school. A variable was also created to measure the proportion of students who were retained in the ninth grade in each school. The school-level variables are expressed as proportions of students in BCSD high schools experiencing each event.
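The aggregation from individual-level records to school-level proportions can be sketched as follows; the status labels and records are hypothetical, not the BCSD coding scheme.

```python
# Sketch: aggregate individual-level status records into school-level
# proportions (e.g., retained, disappeared). Records are hypothetical.
from collections import Counter, defaultdict

def school_proportions(records):
    """records: (school_id, status) pairs, one per student.
    Returns {school: {status: proportion of that school's students}}."""
    counts = defaultdict(Counter)
    for school, status in records:
        counts[school][status] += 1
    return {
        school: {st: n / sum(c.values()) for st, n in c.items()}
        for school, c in counts.items()
    }

records = [
    ("HS1", "retained_9th"), ("HS1", "progressed"),
    ("HS1", "progressed"), ("HS1", "disappeared"),
]
props = school_proportions(records)  # HS1: 25% retained, 50% progressed, 25% disappeared
```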

We use a set of regressions to consider the statistical relationships between year-to-year changes in student progress (grade retention, dropout/disappearance, withdrawal) and changes in school test scores and accountability ratings, controlling for changes in the school’s teaching capacity and changes in the school’s student demographics. First, we use fixed-effects generalized least squares regression models to test the relationship between school-level changes in average exit exam scores and changes in student progression trends, demographics, and teacher capacity. We analyze achievement trends for the population of 24 traditional high schools, arranged in a panel format with schools and years as the units of analysis. The model is

Yit = β0 + ΣβkXkit + εit,

where

εit = ui + vt + wit.

As such, β denotes generalized least squares regression coefficients; k indexes the independent variables; i indexes high schools; t indexes school years; ε is the error term; u is the school component of error; v is the error across years; w is the random component of error; and β0 is the intercept. The dependent variable, Y, is measured as year-to-year changes in average 10th-grade exit TAAS mathematics and reading TLI scores for each school from 1997 to 2002.

The TLI is a scaled score derived for the TAAS that describes how far a student’s performance is above or below the passing standard on each test. The TEA used the TLI as a metric to permit comparisons between TAAS administrations and across grades for use in the accountability system (TEA, 2000). However, the TLI does not represent a vertical scale that extends across the grades; instead, scores are scaled for each grade level each year, using a scale that has a maximum value of 100. If a student receives the same numerical score at consecutive grade levels, he or she is said to have achieved a year of growth.
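The year-of-growth convention described above can be expressed directly; the scores are illustrative, and treating an equal-or-higher TLI at the next grade as a year of growth is our reading of the TEA convention.

```python
# Sketch of the TLI convention: because the TLI is rescaled per grade
# rather than vertically scaled, an equal score at consecutive grades is
# read as one year of growth. Scores below are illustrative only.
def made_year_of_growth(tli_prev_grade, tli_this_grade):
    """Equal (or higher) TLI at consecutive grades counts as a year of growth."""
    return tli_this_grade >= tli_prev_grade

growth = made_year_of_growth(78, 78)  # same numerical score across grades
```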

To predict changes in school-level exit TAAS–TLI scores, we estimate both a random-effects and a fixed-effects model. (A school fixed-effects model is often used to remove bias created by the inability to include controls for unmeasured school characteristics—for example, unchanging aspects of school culture, school staff capacity, parental involvement, and other characteristics that have additive effects.) In this case, effects are fixed for schools and years. We compare the results of the two models and conduct a Hausman test to consider whether the coefficients estimated by the efficient random-effects estimator are the same as those estimated by the consistent fixed-effects estimator (Stock & Watson, 2003).
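One common way to implement school fixed effects is the “within” transformation, which demeans each school’s observations so that any unchanging school-specific component drops out; this is a minimal sketch with invented numbers, not the estimation routine the authors used.

```python
# Sketch of the "within" (demeaning) transformation behind a school
# fixed-effects estimator: subtracting each school's own mean removes any
# time-invariant school component. Data are hypothetical.
def demean_by_school(panel):
    """panel: {school: [values across years]} -> same shape, school means removed."""
    out = {}
    for school, vals in panel.items():
        m = sum(vals) / len(vals)
        out[school] = [v - m for v in vals]
    return out

# HS1 sits a constant 10 points above HS2; after demeaning, that fixed
# difference disappears and only within-school variation remains.
panel = {"HS1": [70.0, 74.0], "HS2": [60.0, 64.0]}
demeaned = demean_by_school(panel)
```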

The equations use controls for changes in school-level demographic variables and measures of teaching capacity, including year-to-year changes in student characteristics (percentage of White students, LEP students, special education students, and at-risk students) and teacher characteristics (percentage of teachers certified, teachers with less than 3 years of experience, and annual teacher turnover). The dependent variable in the fixed- and random-effects regressions considers the change in school-level average exit TAAS–TLI for each high school (see appendix for descriptive statistics for variables used in the analysis).

Each year-to-year change represents a separate observation in the random, fixed, and multinomial regression models. Year-to-year change variables for student progress, school capacity, and student demographics, as well as exit TAAS–TLI scores, were calculated as

ΔVt = Vt – Vt–1.
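The change-variable computation can be sketched as follows, assuming a school-by-year series sorted by year; the values are hypothetical.

```python
# Sketch: build year-to-year change variables (delta_V_t = V_t - V_{t-1})
# from a school's yearly series. Values are hypothetical.
def year_to_year_changes(series):
    """series: list of (year, value) sorted by year -> list of (year, change),
    starting with the second year (the first has no prior value)."""
    return [
        (yr, val - prev_val)
        for (prev_yr, prev_val), (yr, val) in zip(series, series[1:])
    ]

tli_by_year = [(1997, 70.0), (1998, 72.5), (1999, 71.0)]
changes = year_to_year_changes(tli_by_year)
```

Each (year, change) pair then enters the panel regressions as one observation.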

After considering the relationship between changes in student progress, demographics, and teacher quality and high-stakes exit test scores in the general linear model regressions, we then examine how these sets of independent variables are related to changes in TEA accountability rankings, by conducting a comparable yet independent analysis for the same years (1997–2002). This regression analysis uses multinomial logistic regression, which estimates the probability of a specific event’s occurring and so allows consideration of more than two categorical outcomes of the dependent variable. The dependent variable in the multinomial regression is the year-to-year change in a high school’s accountability rating. Using blocks of predictor variables (changes in student progress, demographics, and teacher quality), regression coefficients were obtained for three contrasting situations: a decrease in TEA school rating (used as the reference group), no change in rating, or an increase in the school rating. TEA ratings were determined by the state as a function of increases in TLI scores, coupled with officially reported dropout rates below threshold levels for each rating. The model is

log(πj / πJ) = αj + Σ(k=1 to K) βjkXk.

Independent variables are denoted by Xk. These influence the probability πj that category j of the response variable will be chosen. In this analysis, category 1 is used as the reference category. This analysis tests the relationship between year-to-year school-level changes in TEA accountability ratings and changes in student progression, controlling for changes in student demographics and teacher capacity.
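The log-odds model above implies category probabilities obtained by exponentiating each category’s linear predictor and normalizing, with the reference category’s predictor fixed at zero; the sketch below assumes hypothetical linear predictors rather than estimated coefficients.

```python
# Sketch: recover category probabilities pi_j from multinomial-logit
# log-odds log(pi_j / pi_J) = eta_j, where the reference category's
# linear predictor is fixed at 0. Predictors here are hypothetical.
import math

def mnl_probs(linear_predictors):
    """linear_predictors: eta_j for each non-reference category.
    Returns probabilities [reference, category_1, ...], summing to 1."""
    exps = [1.0] + [math.exp(e) for e in linear_predictors]  # exp(0) = 1 for reference
    total = sum(exps)
    return [e / total for e in exps]

# Outcomes: rating decreased (reference), unchanged, increased.
probs = mnl_probs([0.0, 0.0])  # equal predictors -> equal probabilities
```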

Together, these analyses help us to understand student achievement trends in Brazos City in conjunction with evidence about student exclusion from the testing pool through testing exemptions, grade retention, and exclusion from school. The analyses explore the influences of accountability testing on students’ classifications and progress through school for different subgroups of students, as well as the influences on school accountability ratings of engaging in these classification and retention practices.

Student Achievement and Attainment in Brazos City

Achievement and Exclusions on the TAAS

Improvements in TAAS achievement scores for White students and students of color have been widely cited as evidence of major improvements in education in Texas. The TAAS was administered statewide in Grades 3–8 and Grade 10 from 1994 to 2002. Cut scores were set for the 10th-grade TAAS, which served as an exit exam from high school, and TEA accountability ratings for schools were based on the percentage of students passing the TAAS at that grade level.

For cross-year comparisons, we used the TAAS–TLI. We found sharp increases in student performance on this indicator in reading and mathematics, with the exception of a drop in 1998–1999, when there was a large decrease in TAAS testing exemptions for LEP students and a small decrease in special education exemptions, occasioned by public criticism about the numbers of students not tested (see Figure 1). After the drop in scores that occurred when exemptions decreased, math and reading TLIs stabilized for all grades in 1999–2000 and improved steadily until 2001–2002. Grade 3 scores dropped most in 1998–1999, when exemptions were reduced, and then remained lower than those of other grade levels. This coincides with a doubling of retention rates at Grade 3 between 1999 and 2000 (from 5% to 10%) and a continuation of higher-than-average retention rates at this grade in the subsequent years. Holding back low-scoring third-grade students may have depressed scores at that grade level.

FIGURE 1. Mean Texas Assessment of Academic Skills–Texas Learning Index scores in mathematics, by grade level.
Note. Grade 10 was phased into the data set in 1997.

Trends in TAAS scores appear to suggest a substantial closing of the achievement gap between racial and ethnic groups, as shown in Figure 2, and between English-proficient and LEP students. African Americans, who represent a disproportionate share of special education students in Brazos City, and LEP students showed the largest drops in scores in 1998–1999, when exemptions from testing were reduced; they also showed the greatest apparent recovery thereafter. However, as we discuss below, the second wave of score increases occurred alongside a corresponding increase in the number of elementary students who were missing test scores altogether.

Two important factors influence the interpretation of these trends. First, the TEA chose not to include achievement on the Spanish TAAS as part of the reported TLI for elementary students (Grades 3–6), which excluded approximately 7,000 Latino students per year from calculations of TLI trends. The proportion of students taking the Spanish TAAS increased over the period that we examined, from 7% in 1995–1996 to 10% in 2000–2001. In addition, the number of students with missing scores increased as exemptions decreased, growing from 2% of scores in 1995–1996 to 10% of all scores in 2000–2001 and 9% in 2001–2002 (see Table 2). Thus, whereas LEP and admission, review, and dismissal exemptions from TAAS testing decreased from 16.5% of all students in 1995–1996 to 2.5% by 2001–2002, most of the students no longer exempted appear to have shifted to the Spanish TAAS test or to missing scores.

FIGURE 2. Mean Texas Assessment of Academic Skills–Texas Learning Index scores for Grades 3–8 and Grade 10 in mathematics, by demographic group.

Despite an increase in the share of students tested over time, the proportion of Brazos City students included in English TLI scores reached only 78% in 2002, up from 72% in 1996. Thus, more than 20% of the district’s students were not included in the state-reported TLI scores in each year. This figure suggests that although some of the gains in TAAS scores may have reflected improvements in learning, much of the increase was likely associated with changes in the testing pool. This possibility was reinforced by a conversation with a BCSD board member, who noted,

The LEP students were not tested because it would affect the scores. So the outcry was “Your scores are inflated because you don’t test everybody.” You can’t have it both ways. You have to understand what the results are and what went into it, if you test everyone, like we did . . . like [the superintendent] recommended. Okay, you think our scores are inflated, next year we’re going to test everybody. And so the scores went down, of course, because you throw everyone into the mix. . . . That’s one of the negative consequences: When you test everyone, the scores are going to plummet.

We found much lower rates of student exclusion on the low-stakes SAT-9 tests, instituted by BCSD in 1996 but not considered in the state or district accountability system. In 1997–1998, approximately 93% of BCSD elementary and middle school students were tested on the SAT-9 and Aprenda 2 (Spanish version) mathematics tests, compared to 81% on the English and Spanish TAAS. By 2001–2002, 96% of elementary and middle school students took the SAT-9 and Aprenda 2, compared to 88% who took the TAAS in English or Spanish. In contrast to the TAAS, there was little difference in test-taking patterns by race/ethnicity throughout the period. Given our analysis of the individual-level data, we found that throughout the period of 1997 to 2002, 96% of Latino and Asian students had scores reported on the SAT-9 tests, as did 95% of African American and White students (see Table 3).

On average, about 12% of African American and Latino elementary and middle school students were excluded from the TAAS but not from Harcourt testing (SAT-9 or Aprenda) from 1997 to 2002, which was about double the proportion of Whites excluded. Furthermore, Latino students taking the TAAS in Spanish—accounting for 10% of all test takers by 2002—were excluded from the TLI. As a result, much greater proportions of Latino and African American students’ test scores were excluded from TEA accountability ratings than was true for White and Asian American students.

TABLE 2
English and Spanish Texas Assessment of Academic Skills (TAAS) Mathematics Score Codes: Grades 3–8 (1995–2002; in percentages)

Students       1995–1996  1996–1997  1997–1998  1998–1999  1999–2000  2000–2001  2001–2002
English TAAS        71.9       71.6       72.4       77.4       77.1       76.2       77.5
Spanish TAAS         7.3        6.3        8.6        9.8       10.3       10.4       10.1
Missing              2.0        4.4        1.7        1.0        1.5       10.0        8.9
ARD exempt           8.5        9.1        9.1        8.4        7.8        0.0        0.0
LEP exempt           8.0        6.6        6.3        1.9        2.0        2.3        2.4
Absent               1.9        1.5        1.5        1.2        0.9        0.8        0.7
Other                0.5        0.5        0.4        0.2        0.3        0.3        0.3

Note. ARD = admission, review, and dismissal; LEP = limited-English proficient.

TABLE 3
Texas Assessment of Academic Skills (TAAS) and Harcourt Mathematics Testing by Race/Ethnicity: Grades 3–8 (1997–2002; in percentages)

Students                              Not Harcourt math tested   Harcourt math tested
Asian American     No math TAAS                              3                      9
                   TAAS math tested                          1                     87
African American   No math TAAS                              3                     12
                   TAAS math tested                          2                     83
Latino             No math TAAS                              2                     12
                   TAAS math tested                          1                     85
White              No math TAAS                              3                      6
                   TAAS math tested                          2                     89

Note. Harcourt = Stanford Achievement Test–Ninth Edition or Aprenda.

TABLE 4
Stanford Achievement Test–Ninth Edition Mean Scores, by English Texas Assessment of Academic Skills (TAAS) Participation Status: Grades 3–8 (1997–2002)

                          Reading score (n)                          Math score (n)
Students            TAAS                No TAAS              TAAS                No TAAS
Overall             459.70 (341,169)    210.54 (65,307)      504.90 (346,663)    273.75 (59,813)
African American    439.95 (125,008)    196.78 (25,737)      467.12 (127,584)    233.91 (23,161)
Latino              415.29 (163,522)    201.21 (33,097)      479.22 (166,004)    282.49 (30,615)
Native American     556.33 (253)        354.90 (28)          588.76 (258)        427.38 (23)
Asian American      598.29 (11,144)     272.27 (1,614)       705.97 (11,221)     455.31 (1,537)
White               654.97 (41,242)     327.76 (4,831)       664.86 (41,596)     361.09 (4,477)

Further examination demonstrated that those excluded from the English TAAS—the basis of the state and district accountability rankings—scored significantly lower on the SAT-9 than did those who took the TAAS, across all racial/ethnic groups (p < .001; see Table 4).

These differences in exclusions are likely a major reason why the increases in scores seen on the TAAS were not found on the SAT-9. Although some increase was seen in SAT-9 reading and mathematics scores between 1997 and 1998, average scores were relatively flat from 1998 to 2001 and then decreased in 2002. There was little reduction of the achievement gap by race/ethnicity (Figure 3), and the gap between English speakers and LEP students increased noticeably in mathematics (Figure 4) and to an even greater extent in reading (Figure 5).

Predictors of Student Achievement

The substantial exclusion of low-achieving students on the TAAS probably explains why the correlations between student SAT-9 normal curve equivalent scores and the TAAS–TLI scores (in Grades 3–8 plus Grade 10), though substantial, are more modest than might be expected (r = .64 in reading and .59 in math, p < .001). To evaluate the extent to which test exclusion and other policy variables (such as the provision of qualified and experienced teachers) might be associated with student performance independent of student and school demographic characteristics, we estimated the determinants of SAT-9 scores in Grades 3–5, given that these were the grade levels at which students had single teachers who could be attached to their test score records.

A generalized least squares regression analysis uses individual-level data to examine students’ test scores on the SAT-9 in relation to their scores (and exclusions) on the TAAS, as well as their demographic characteristics. As shown in Model A (see Table 5) and as expected, students’ SAT-9 scores are strongly predicted not only by their TAAS–TLI scores but also by their race/ethnicity, language status, income status, and school-level proportions of at-risk students. Through Model B, we examined the effects of a dummy variable representing inclusion or exclusion in the English TAAS, which proved to exert a strong influence on SAT-9 achievement.

Finally, although exerting a smaller influence, achievement was significantly higher for students with teachers who were certified and had more than 3 years of experience—an additional important policy variable in a district with large proportions of uncertified teachers, who are disproportionately allocated to poor and minority students. Together, these findings suggest that low scorers were significantly less


FIGURE 3. Mean Stanford Achievement Test–Ninth Edition math normal curve equivalent scores for Grades 3–10, by race/ethnicity.

FIGURE 4. Mean Stanford Achievement Test–Ninth Edition math normal curve equivalent scores for Grades 3–10, by limited-English-proficient status. [Line chart, not reproduced: mean scores across school years 1997–1998 through 2001–2002 for EP and LEP students.]


FIGURE 5. Mean Stanford Achievement Test–Ninth Edition reading normal curve equivalent scores for Grades 3–10, by limited-English-proficient status. [Line chart, not reproduced: mean scores across school years 1997–1998 through 2001–2002 for EP and LEP students.]

TABLE 5
Predictors of Stanford Achievement Test–Ninth Edition Scores, Grades 3–5: Random-Effects Generalized Least Squares Regression

                                           Reading                                    Math
Predictor                        Model A              Model B              Model A              Model B
Constant                       97.833*** (2.751)    487.377*** (1.982)   –10.546*** (2.843)   498.527*** (2.112)
Limited-English proficient    –41.990*** (1.151)    –56.635*** (1.215)   –12.634*** (1.170)   –19.084*** (1.296)
Economically disadvantaged    –47.051*** (0.963)    –62.252*** (1.082)   –34.813*** (0.978)   –52.603*** (1.153)
Native American               –47.755*** (13.748)   –38.186* (16.665)    –28.777** (13.647)   –19.589 (17.482)
Asian American                 –8.065*** (2.490)    –17.520*** (3.049)    52.439*** (2.457)    62.756*** (3.176)
African American             –109.519*** (1.422)   –150.155*** (1.724)   –83.471*** (1.408)   137.602*** (1.800)
Latino                        –91.527*** (1.509)   –127.648*** (1.807)   –55.757*** (1.492)   –87.796*** (1.889)
Teacher fully certified         4.399*** (0.690)      8.275*** (0.705)     6.786*** (0.725)    10.843*** (0.764)
Teacher experienced            13.814*** (0.726)     12.471*** (0.747)    11.042*** (0.762)     9.877*** (0.809)
School at-risk index           –0.741*** (0.022)     –0.474*** (0.024)    –0.676*** (0.022)    –0.160*** (0.025)
Texas Learning Index score      6.361*** (0.026)            —              8.227*** (0.029)           —
Student has valid TAAS score          —             159.102*** (1.062)          —             159.032*** (1.170)
R²                                  .506                  .342                 .495                 .272
Number of observations           163,709               193,339              166,950              193,339

Note. TAAS = Texas Assessment of Academic Skills.
*p < .05. **p < .01. ***p < .001.


likely to be included in high-stakes accountability formulas and more likely to have underprepared and inexperienced teachers and to be in schools with a greater proportion of at-risk students.

High School Exit TAAS Testing Trends

We found that the number of students untested by the TAAS high-stakes tests grew even larger in the high school years but that the reasons changed. In 2002, 79% of sophomores in the BCSD were reported to be passing the TAAS exit exams, up from 72% in 2001, a gain widely celebrated in the local press. We found, however, that these 10th-grade pass rates did not signify that most high school students in BCSD took and passed the exit exam or successfully graduated from high school. Indeed, only a minority did so.

Figure 6 shows the cumulative testing and passing rates by subject for the group of students who began high school as ninth graders in Brazos City in 1997 and would have graduated on time in 2001. This cohort contains more than 13,000 students who were in the eighth grade in BCSD in 1996–1997 and then the ninth grade in 1997–1998. This method leaves out retained eighth graders and previously retained ninth graders from the cohort.

Only 40% of the BCSD 1997–1998 ninth-grade cohort ever passed the writing section of the exit TAAS. The reading and math sections show slightly less student success, at 39% and 38%, respectively. Approximately 20% of students took and failed the math, reading, and writing sections of the test. Most surprising, of the original ninth graders in BCSD in 1997–1998, about 40% did not take each section of the exit exam in the district during any academic year between 1997 and 2001. Although there were differentials by race/ethnicity, the pass rates were not high for any ethnic group. For example, only 62% of Asian American students and 54% of White students in the 1997 9th-grade cohort ever passed the 10th-grade reading test in BCSD. Only 38% of African American students and 36% of Latino students passed this portion of the test.
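The gap between high reported 10th-grade pass rates and these low cohort pass rates is a denominator effect. A toy calculation (the counts below are invented for illustration, not taken from BCSD data) makes it concrete:

```python
# Reported pass rates divide by students who actually sat the exam;
# cohort pass rates divide by everyone who entered ninth grade.
cohort = 13_000   # hypothetical entering ninth graders
tested = 7_800    # hypothetical number who took the exit reading test
passed = 5_070    # hypothetical number who passed it

reported_rate = passed / tested   # what the accountability system sees
cohort_rate = passed / cohort     # what the cohort analysis sees

print(f"reported pass rate: {reported_rate:.0%}")  # 65%
print(f"cohort pass rate:   {cohort_rate:.0%}")    # 39%
```

With these invented numbers, nearly two thirds of tested students pass, yet well under half of the entering cohort ever records a passing score.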

One reason for the low proportions of students taking the exit TAAS is that many did not reach the 10th grade when it was first offered. Among 9th graders who entered high school in 1997, 26% were retained in 9th grade. Ninth-grade retention increased to 31% of all students by 2001. Most of these students never took the exit test. For example, of ninth-grade-retained students in the 1997–1998 cohort, 64% never took the reading portion of the exit test, and only 12% ever passed it. During their high school careers, only 209 of 3,489 retained students (about 6% of the total) ever became eligible to graduate by passing all three subjects on the spring exit TAAS. What happened to these students and others who did not complete high school in BCSD?

Where Did Students Go? Analyses of Student Enrollment Trends

Carnoy et al. (2001) propose two scenarios about the possible relationship between the TAAS and student enrollment outcomes:

In the first, an emphasis on increasing TAAS scores increases the overall quality of schooling, leading to gains in student learning on multiple levels and decreases in the dropout rate. In an alternative scenario, however, increased emphasis on TAAS comes at the expense of other learning or leads to efforts to screen students before they take the TAAS. This may lead to increases in the dropout rate, either as low performing students are forced out of schools in order to increase school average TAAS scores or as students choose to leave. (p. 18)

To examine whether the alternative scenario occurred in BCSD during the formative years of the Texas test-based accountability system, we examined several measures of student progress in school: grade retention, dropout, withdrawal, disappearance, and graduation. First, however, we examined student mobility because some of the students who never took the tests undoubtedly transferred to other schools outside the district. To estimate the potential upper bound on mobility owing to transfers, we use BCSD’s individual-level data set to analyze potential mobility for students who attended middle school (Grades 6–8) in BCSD but were no longer in the district for ninth grade. We expected that the greatest mobility would be found in the transition from middle school to high school, given that this is the point at which many students opt for private schools and suburban public high schools.


Among eighth graders who attended BCSD for middle school in 1998–1999, 97% remained in BCSD the following year. About 1% were officially coded as having dropped out. Given the nature of the accountability system, there was no incentive for schools to overreport dropout data. If there is a bias, it would be to underreport dropouts. This suggests a potential upper-bound mobility rate of a little more than 2% for students advancing to high school from middle school. Assuming an average rate of 2% mobility for each of the 4 years of high school, no more than 8% of BCSD high school students might be considered to have transferred to other schools.
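The transfer upper bound described above is simple arithmetic; a brief sketch of the additive version used in the text, alongside a compounded variant, shows the two agree closely:

```python
# Upper-bound transfer estimate from the text's assumptions:
# roughly 2% of students leave BCSD for other schools in each
# of the 4 high school years.
annual_leave = 0.02
years = 4

simple_bound = annual_leave * years                 # additive, as in the text
compound_bound = 1 - (1 - annual_leave) ** years    # slightly tighter variant

print(f"additive upper bound:   {simple_bound:.1%}")   # 8.0%
print(f"compounded upper bound: {compound_bound:.1%}")  # 7.8%
```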

High School Grade Progression

We looked at how students progressed through high school and how many made it through to graduation by following three entering ninth-grade cohorts over a 4-year period. As Figure 7 shows, about half of each class did not progress from 9th to 10th grade in their 2nd year of high school, and an additional 10% did not progress from 10th to 11th grade in their 3rd year.

For each cohort, African Americans and Latinos showed the steepest loss between the 9th and 10th grades, given that 50% to 55% of the freshman class did not progress on time, as compared to 30% to 35% of Whites and Asian Americans. All student groups and cohorts showed a smaller (10%) failure to progress from the 10th grade to the 11th grade on time, and showed fairly level progression trends from the 11th to 12th grades. What is interesting is that the 1997–1998 African American and Latino 9th-grade cohort gained students between the 11th and 12th grades. This resulted from another gaming practice: that of “skipping” students past the 10th grade to avoid the TAAS test at the time when it would be factored into the school accountability ratings. Four years after entering 9th grade, about 40% of African American and Latino students in each cohort were enrolled in the 12th grade, as compared to just more than 50% for Whites and Asians.

This analysis highlights the volatile size of the 9th-grade class from year to year: The number of 9th graders increased by nearly 30% between 1996–1997 and 1997–1998 and dropped again by about 10% in the subsequent

FIGURE 6. Ninth-grade cohort (1997–1998): Cumulative testing results on the exit Texas Assessment of Academic Skills, by subject. [Bar chart, not reproduced: percentage of students in the No Pass, Pass, and No Test categories for the writing, reading, and math sections.]


year. These changes were not a function of district enrollment changes. Instead, they reflect fluctuations in ninth-grade retention rates, given that the BCSD district accountability system—which was piggybacked on the state system—was introduced in 1995. Some schools began to use substantial grade retention in the 9th grade shortly thereafter, and as we learned in our interviews, other schools followed suit when they learned about the effects of the practice on 10th-grade scores, a point that we explore below.

To understand in detail how students progressed through their high school careers, we followed a first-time 9th-grade cohort (1996–1997) for 7 years. Before entering the 9th grade, about 97% of this entering cohort was in the 8th grade in Brazos City (see Table 6). After the 9th-grade year, 53% of the cohort advanced to the 10th grade, whereas 30% remained in the 9th grade—including one third of African Americans and Latinos and one eighth of Whites and Asians. What is interesting to note is that about 4% of the students in the 9th grade skipped to the 12th grade in the 2nd year, almost all of them African American and Latino. The remaining 13% of students disappeared. By the 3rd year, 44% had progressed to the 11th grade; 32% had disappeared; 20% were still in 9th or 10th grade; and 6% skipped to the 12th grade. In the 4th year, 45% of students had made it to the 12th grade; 40% had left; and 15% still remained in school in grades below the 12th. By the 5th and 6th years, only a few remained in school. Thus, there was not a large increase in graduation rates after the traditional 4 years of high school.

Dropout and Graduation Rates

Clearly, not all the students who failed to graduate dropped out, but as we have suggested, figuring out how many students actually graduated from BCSD and how many transferred to other districts or dropped out is not an easy matter. Earlier, we reviewed differing reports of dropout rates at the state level in Texas and noted TEA audits revealing that schools severely underreported dropout data in response to incentives in the accountability system. In the TEA accountability baseline year of 1994, dropout rate standards were first established for schools rated exemplary (no more than 1.0%)

FIGURE 7. Brazos City School District high school cohort progression (entering ninth graders, 1996–1998). [Line chart, not reproduced: number of students enrolled in each school year, 1995–1996 through 2001–2002, for the 1996–1997, 1997–1998, and 1998–1999 entering ninth-grade cohorts.]


and recognized (no more than 3.5%). In 1995–1996, dropout rate standards were established for schools rated acceptable (no more than 6% for each demographic group).

In our data set, school-reported dropout rates by grade level hovered below 2%, except those for the ninth grade, which peaked around 3.5% in 1999–2000. Rates for all demographic groups dipped sharply in 1995–1996 and remained just below accountability system threshold levels, positioning schools to meet expectations for a key indicator in the TEA accountability system.

BCSD reported in 2002 that the district graduation rate had soared 21 percentage points over 5 years, from 54% in 1997 to 75.3% in 2002. However, our analyses show that most students failed to pass the exit exam and were not graduation eligible. Arriving at an estimate close to ours, a management team member at King High School estimated from her experience, “I would think that the graduation rate is closer to 40%–45%, not 85%.” She continued, “Ultimately what’s happening is that we’re letting kids down. We’re using some kind of system to disguise where they are. If you’ve got 600 [freshmen] in your school and 300 graduate, they’re somewhere . . . ”

In a study of dropout data, Orfield and colleagues (2004) noted that the overstatement of graduation rates in Texas may occur partly because Texas’s individualized tracking system (PEIMS) includes many ways to exclude students from enrollment data used to calculate graduation rates. For example, a missing student may be taken off the books by a school if he or she is presumed to be in school elsewhere or to have graduated, when that student may in fact have dropped out. In practice, what this means is that if a student does not have an official PEIMS code for the TEA cohort graduation calculation, he or she is dropped from the denominator (TEA, 2002).

Using a cohort method to track graduation rates for students entering ninth grade in 1997, we calculated a graduation eligibility variable to identify students who were eligible for graduation based on having reached senior year and having passed all sections of the TAAS exit exam. As Table 7 shows, of the 13,651 students in the 1997–1998 ninth-grade cohort, 4,111 (30%) appear to be graduation eligible within 5 years. Using the district-provided graduation status code, 4,458 (33%) students are coded as having graduated by 2002 (5-year span).
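The eligibility rule can be sketched in a few lines (the field names and sample records below are ours for illustration, not the district data set’s): a student counts as graduation eligible only if they reached senior year and passed all three exit-TAAS sections.

```python
def graduation_eligible(student: dict) -> bool:
    """Eligible only if the student reached Grade 12 AND passed
    every exit-TAAS section (missing sections count as not passed)."""
    reached_senior_year = student["highest_grade"] >= 12
    passed_all_sections = all(
        student["exit_taas_passed"].get(section, False)
        for section in ("reading", "math", "writing")
    )
    return reached_senior_year and passed_all_sections

# Hypothetical records: a passer, a senior who failed math, a retained ninth grader.
cohort = [
    {"highest_grade": 12, "exit_taas_passed": {"reading": True, "math": True, "writing": True}},
    {"highest_grade": 12, "exit_taas_passed": {"reading": True, "math": False, "writing": True}},
    {"highest_grade": 9,  "exit_taas_passed": {}},
]
eligible = sum(graduation_eligible(s) for s in cohort)
print(f"{eligible} of {len(cohort)} eligible")  # 1 of 3 eligible
```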

Asian Americans had the highest completion rate, with almost 50% coded as having graduated. Whites and African Americans had the next-highest rates, at 43% and 39%, respectively. Less than a quarter of Latino students in the cohort were coded as having graduated from BCSD by their 5th year. Economically disadvantaged students showed a graduation rate of


TABLE 6
Progression of Ninth-Grade Cohort (1996–1997) Through High School (in percentages)

Grade  Students                  1995–1996  1996–1997  1997–1998  1998–1999  1999–2000  2000–2001  2001–2002
8      White/Asian American          98
       African American/Latino       97
       All students                  97
9      White/Asian American                    100         13          2          2
       African American/Latino                 100         33          6          4          1
       All students                            100         30          6          4
10     White/Asian American                                66          7          2
       African American/Latino                             50         15          4          1
       All students                                        53         14          4          1        0.2
11     White/Asian American                                           55          3          1        0.0
       African American/Latino                                        42          8          3        1.0
       All students                                                   44          7          1        0.5
12     White/Asian American                                 1          2         55          2
       African American/Latino                              4          6         43          6          1
       All students                                         4          6         45          2          1


approximately 28%, whereas only 20% of LEP students were coded as graduated.

Of note, African American students appear to graduate at rates 10 percentage points higher than their rates of graduation eligibility. There are two possible explanations for this difference. One possibility is that large numbers of African American students took and passed the make-up summer exit TAAS to gain eligibility for graduation. However, a check of the district’s 2001 summer school report shows that even if African Americans composed half of all students passing the make-up exit TAAS, it would increase the overall African American eligibility for graduation by only 2 percentage points, from 29% to 31%.

A more plausible explanation is that African American students received special education (admission, review, and dismissal) exemptions from the exit TAAS testing and could therefore graduate even though they were not identified as being eligible by our measure, which included successful completion of the exam. Unfortunately, the data set does not include a special education identifier for individual students. However, there is a significant and positive correlation between the proportions of African American and special education students (r = .45) in traditional BCSD high schools. In national data sets, African Americans are much more likely than other students to be identified for special education. Use of admission, review, and dismissal exemptions for some or all sections of the exit TAAS could have allowed a significant subset of African Americans to graduate without passing all sections of the exit TAAS.

In addition to the 33% of students who were coded as graduates by the 5th year after entering 9th grade, 15% were still seniors and 6% were enrolled in Grades 9–11. About 25% officially withdrew from school; 6% were coded as dropouts; and 18% disappeared from the data set. An evaluation of the status of these students with respect to the exit exam reveals that aside from seniors, of whom two thirds had passed the exam, most students in other categories were not in position to graduate and would have had a strong probability of having dropped out of school (see Table 8). Among those who disappeared from the data set, only 132 of 2,419 had passed all three exit tests. Similarly, among those who withdrew from school, only 362 out of 3,348 had passed the exit tests. Among those still enrolled in Grades 9–11, only a minority (121 out of 809) had passed the tests.

If we assume, on the basis of our earlier analysis, that up to 8% of BCSD high school students may have transferred to other schools and if we paint the cohort’s progress in the rosiest possible hues, about 33% graduated within 5 years; 19% were still in school (about a third of them in a position to graduate); an estimated 8% had transferred to public and private schools out of district; and about 40% dropped out, “withdrew” without having passed the exit exam, or disappeared without having passed the test.
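This rosiest-case accounting can be checked by summing the reported shares; because the transfer figure is the upper-bound estimate, the categories total 100%:

```python
# Rough accounting of the 1997-1998 cohort five years on, using the
# percentages reported in the text (transfer share is the upper bound).
outcomes = {
    "graduated": 33,
    "still enrolled": 19,
    "transferred out of district (upper bound)": 8,
    "dropped out / withdrew / disappeared": 40,
}
assert sum(outcomes.values()) == 100  # shares exhaust the cohort

for status, pct in outcomes.items():
    print(f"{status}: {pct}%")
```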


TABLE 7
Graduation Rate of Ninth-Grade Cohort Entering High School in 1997 (in percentages)

Student characteristics                     Graduation eligible within 5 years   Graduated within 5 years^a
Overall                                                   30.1                          32.7
White                                                     44.9                          43.3
Latino                                                    26.1                          24.8
African American                                          29.3                          39.4
Asian American                                            53.1                          49.4
Economically disadvantaged                                26.3                          28.3
Limited-English proficient                                14.1                          20.0
Did not pass eighth-grade TAAS in reading                  7.3                          19.3
Did not pass eighth-grade TAAS in math                     9.7                          22.3

Note. TAAS = Texas Assessment of Academic Skills.
a. As coded by district.


School Gaming and Accountability

It would appear that the large numbers of Brazos City students who failed to take the high school tests and who left school without being coded as dropouts could have enabled schools to appear to meet accountability standards without actually doing so. In this section, we evaluate whether this occurred and, if so, how. We do this by examining what educators and students told us about their experiences and by modeling changes in school accountability ratings as a function of specific practices, such as grade retention and student pushout.

Accountability Pressure in BCSD High Schools

If institutional theorists are correct, schools may react to accountability pressures with ceremonial conformity and various types of gaming responses, some of which may be educationally destructive (Meyer & Rowan, 1977; Oliver, 1991). In hierarchical systems, policy pressure in the form of rewards and sanctions is first applied to district superintendents, then to their subordinates (including principals), then to teachers, and on to students. In Brazos City, incentives included bonuses for principals who raised scores and the probability of termination otherwise, given that contracts were at will. A former BCSD board member described how these rewards and sanctions affected school administrators.

In the BCSD evaluation system, you [feel pressure] because you have to measure progress. . . . The people who are manipulating it . . . feel pressure because there’s a stipend involved. . . . If it’s the goal [of accountability] that’s set to raise the scores, then you’re going to do whatever it takes to raise the scores.

The flip side of monetary incentives involved the threat of firing. An administrator at Edgeview High School explained, “All of us are on at-will contracts. So . . . we can be let go at the end of the year. . . . It’s a lot of pressure . . . not even subtle pressure . . . just hard pressure put on you to get those scores up.”

We found substantial consensus among the BCSD staff whom we interviewed about the influences of these pressures and the accountability system itself. Several principals identified a widespread culture of gaming the system in BCSD. One said,

When you talk the company talk, you forget what honesty is. And my fear is that . . . we’ve forgotten what honesty is. I think that what has happened is that we’ve gotten all caught up in [accountability] that we don’t know what honesty is anymore. I was down at the administration building just today, this morning. I was passing by some secretaries who were there, and they said . . . “Why are they making such a big to-do over these schools that cheat? Everybody cheats.” They think everybody is doing it, and it makes it right. . . . That’s the feeling that permeates everything that I hear and see.

Gaming Strategies

Administrators and teachers described a culture of gaming and a set of strategies that had been devised to boost ratings. A former Crockett High School staff member observed,

[You] have to understand the culture of Texas. Texas is a standardized testing machine. We believe in


TABLE 8
Ninth-Grade Cohort (1997–1998) by Student Progress and Exit Test Status (percentages in parentheses)

                                     Exit test status
Students                       Not passed        Passed        Total
Enrolled in Grades 9–11          688 (5)        121 (1)          809
Coded as dropout                 782 (6)         65 (< 1)        847
Disappeared from data set      2,287 (17)       132 (1)        2,419
Senior (includes graduates)    2,115 (15)     4,113 (30)       6,228
Coded as withdrawn             2,986 (22)       362 (3)        3,348
Total                          8,858 (65)     4,793 (35)      13,651


evaluation and assessments, quantitative numbers. That’s the way the state is run, and that’s the culture. So if a principal knows that his ninth graders are not prepared for that examination . . . he is going to put certain mechanisms in place that are going to involve students [being] held back, suspended, so on and so forth.

A management team member at Edgeview described how schools have approached gaming high-stakes testing and accountability year to year.

I think each year we get a new set of regs, and we try and figure out how is the best way to use it to our advantage. . . . The game changes . . . it’s . . . like any—like a game that has a set of instructions. And everybody gets the same set of instructions, and everybody follows the same set of instructions. . . . If you’re really savvy and if you’re really into everything as a principal, you may see a problem . . . you may give your campus an advantage that another campus doesn’t have.

Retention and the waiver policy. One strategy described by administrators and students was the waiver promotion policy, which required freshman and sophomore students to pass all their core courses to move on to the next grade. A management team member at King explained how the waiver allowed schools to avoid testing students on the 10th-grade exam:

If you’re not in a 10th-grade homeroom, then you don’t get officially listed as a test taker. . . . So what you ended up with in 10th grade then were all the people who could pass all their core classes. The scores jumped because you put up a barrier, and everybody else was still in ninth grade. . . . With that wall that you could create with the waiver, those kids never entered the accountability picture.

A member of Edgeview’s management team detailed how this policy functioned both to increase scores on the 10th-grade test and to secure rewards:

The waiver was set up so that if you did not pass all four core subjects at the ninth-grade level, no matter how many credits you had, you could not push forward. . . . Well, I am taking all these 10th-grade classes, except I have to wait a whole semester to take the [failed section of] Algebra I B. When I finally get that credit, I have enough credits to be a junior. I now have 12 credits rather than my 7 credits. So I skip over taking that 10th-grade test. It’s the 10th-grade test that had been used to judge the school. So I have a large group of people who are skipping over the accountability grade. . . . Here in our school, that waiver was used basically to boost the test scores. It had a lot to do with who got the [incentive] money. They wrote a waiver, so they’ll circumvent the rule, so they’ll know they’ll have a higher percentage of students passing the test. . . . There was like a pool of money each school would get.

Focus groups at all the high schools we visited included seniors who discussed peers who had been retained in the ninth grade one or more times. Most focus groups included students who were retained in the ninth grade for 3 or 4 years. Students consistently reported that many of their peers who were retained multiple times eventually gave up and dropped out of school. This comment by a Latino student was typical:

I have a friend that was in ninth grade for 2 years, and she was 19 or 20 years old. She did not pass algebra, and the school told her that if she didn’t improve her grades, they were going to drop her since she was older. So she . . . dropped out of school.

Skipping over 10th grade. Our analysis surfaced the interesting practice of grade skipping, in which students stayed in 9th grade for 2 years or more and then suddenly reappeared in the 12th grade. This practice could have two benefits for schools: First, by skipping 10th grade, students would not take the TAAS exit test in the year that it counted for school accountability ratings. Second, by showing up in 12th grade, they would contribute to a more favorable statistic where school progression is examined as the proportion of 9th graders who appear in 12th grade 4 years later. The practice was described to us as being widespread but mysterious to the students. A Clearbend student talked about her brother’s experience:

This year he was supposed to graduate, but last year he was in the ninth about this time. But he’s in the 11th right now, so if he did community service, he’ll probably be in 12th. . . . [He was in ninth grade] like about 3 years . . . [then] they made him 12th.

Another Clearbend student spoke about grade skipping and how it had affected a friend:

This person I was telling you about . . . she was in the same year when we got here but all of a sudden she was in the 12th grade, and then she didn’t know


why. . . . But even though she was a senior, she wouldn’t be able to graduate because of [not having test results for the] TAAS.

A disturbing aspect of this grade-skipping practice is that many students never tested on the exit TAAS whether they were held back in 9th grade, progressed to 10th grade, or jumped over their sophomore year. This allowed the schools that they attended to effectively sidestep the high school accountability system. And if students did not test on the exit exam, then they were not eligible to graduate unless they received an exemption, available to special education students.

Keeping or pushing out low-scoring students. We encountered a number of administrators who described schools’ refusing to enroll low-scoring students, and we spoke to students who reported trying to enroll at schools but getting turned away. Staff were clear that the ground rules provided no incentive for high schools to keep students who, they believed, would negatively affect their accountability ratings. A member of Edgeview’s management team described how students are affected by these pressures:

I encountered a student just a week ago, and he is 16 years old; this is his first year in the ninth grade. His chances of graduating are slim. . . . Most of the ninth-grade kids are like this—he is going to give up. . . . If he does not make it to the 10th grade, he is going to be 17 years old, and he is going to be a dropout. . . . No school is going to want to take him. They are not going to want him. He is going to screw up their test scores. There are no incentives [to keep him in school] unless you have a principal that is willing to work with these kids. These kids move from school to school and then drop out.

An administrator at Clearbend—a high-minority, low-income school—described the disincentives for high-ranking schools to take or keep students who might lower their accountability ratings:

I don’t think that schools that are blue-ribbon schools in the state of Texas or that are exemplary schools will take students like we have at Clearbend. . . . I’ve heard stories of schools in our district that turn our kids away. They find a way, and that’s wrong. That’s morally wrong, but they get away with it . . . and it starts at the top.

The administrator described high schools’ financial incentives to keep students in school through October and then push them out: “Many schools unload their troublemakers right after the [enrollment] snapshot. They keep them so they can get the dollars.” Students at several high schools explained that their schools were so full at the start of the year that there were no empty desks in classrooms. However, by the time that testing occurred in the spring, there were many desks available because many students were no longer attending the school. The means for pushing students out ranged from enforcing zero-tolerance discipline policies, especially on low-achieving students, to expelling students for attendance problems, and to counseling them out by encouraging them to enroll in GED programs or by transferring them to other nontraditional settings. Students were often explicitly discouraged from taking the exit exam. One administrator explained:

I think that the kids are being forced out of school. I had a kid who came here from Fine Oaks High School and said, “Miss, if I come here, could I ever take the [exit exam]?” And I said, “What do you mean? If you come here, you must take the [exam].” And he said, “Well, every time I think I’m going to take the test, they either say, ‘You don’t have to come to school tomorrow,’ or, ‘You don’t have to [take the test].’ . . . We’re told different things.” That’s when kids drop out . . . when you never give them a chance. . . . I think that what has happened at Fine Oaks is what happens at many schools. I think we’ve done a lot to force kids out of school.

Most administrators whom we interviewed believed that practices that manipulated the student population to game the accountability system were commonplace. As one administrator put it, BCSD was billed as a Texas miracle:

And I think it’s a nonmiracle. It’s not a miracle to manipulate things. A miracle is saving kids actually in reality; that’s what miracles are. To go out and get these kids who were dropped out or to get kids who are not achieving and find ways—that’s a miracle. . . . It’s not to manipulate things so that it appears [to be something it’s not]. It’s a façade.

The Relationship Among Student Retention, Dropout, and School Ratings

The qualitative interviews describe how high schools had responded to the press of


Accountability Texas-Style

Downloaded from http://eepa.aera.net at UNIV OF TEXAS AUSTIN on March 12, 2013

Page 27: Educational Evaluation and Policy Analysis · Linda Darling-Hammond Stanford University This study examines longitudinal student progress and achievement on the elementary, middle,

accountability by manipulating student populations to raise test scores. Brazos City practitioners perceived that a variety of gaming activities (especially purposeful school-level escalation of grade retention and student exclusion) boosted school test scores and accountability ratings. To triangulate these findings, quantitative analyses were conducted to consider whether BCSD high schools were actually able to raise their test scores and accountability ratings by increasing retention and dropout or disappearance of low-scoring students.

During the formative years of Texas accountability, the primary base indicators for determining a high school’s accountability rating were the exit TAAS scores and the annual dropout rate. Based on these measures, the four typical TEA accountability rating levels were exemplary, recognized, acceptable, and low performing. As Figure 8 suggests, between 1996 and 2001, schools figured out how to achieve high ratings. Although 26% were rated low performing in 1996–1997, by the next year, no schools were rated as such. From 1997–1998 to 2000–2001, the proportion of schools falling in the top two categories (recognized and

exemplary) increased from 8% to 43%, even while test score standards were also increasing.

In the last year of the period, a large shift occurred as TEA again increased the TAAS passing-rate thresholds required for each category and phased in lower dropout targets. Following these readjustments and with a decrease in ninth-grade retention rates (from 31% to 27% between 2001 and 2002), most BCSD high schools were rated low performing once again.

We wanted to evaluate whether BCSD high schools that responded to the press of accountability by escalating dropout, disappearance, and 9th-grade retention were indeed able to increase 10th-grade test scores and, ultimately, TEA accountability ratings. We conducted generalized least squares regression analyses to evaluate whether exiting students raised test scores independent of the change in accountability rating thresholds.

Tables 9 and 10 show the results of analyses examining predictors of changes in reading and mathematics scores, using random effects and fixed effects for year and school. (Fixed effects are often used to account for unmeasured


FIGURE 8. Texas Education Agency accountability ratings for Brazos City School District High Schools (1996–2002).


variables—for example, in the school environment—that may influence the outcomes in question, in this case, test scores.) We use the Hausman test to check the differences in outcomes of the two models, and we find no significant difference, suggesting that the use of fixed effects is not necessary in this case. In each case, we add the student progress measures (changes in school average retention, dropout, disappearance, and withdrawal rates) as a separate block, having controlled for changes in student characteristics and school capacity (teacher certification, experience, and turnover).
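The Hausman comparison works by contrasting the two coefficient vectors, weighted by the difference of their covariance matrices; a minimal sketch follows. The coefficient values loosely echo Table 9, but the covariance matrices (and hence the resulting statistic) are invented for illustration.

```python
import numpy as np

def hausman(b_fe, b_re, V_fe, V_re):
    """Hausman statistic for comparing fixed- and random-effects estimates.

    Under the null hypothesis that the random-effects estimator is
    consistent, H is distributed chi-square with k degrees of freedom,
    where k is the number of coefficients compared.
    """
    diff = np.asarray(b_fe) - np.asarray(b_re)
    V = np.asarray(V_fe) - np.asarray(V_re)
    H = float(diff @ np.linalg.inv(V) @ diff)
    return H, len(diff)

# Coefficients loosely echo Table 9's Model B estimates for retention and
# disappearance; the covariance matrices are invented for illustration.
b_fe = [0.133, 0.082]
b_re = [0.119, 0.064]
V_fe = [[0.0026, 0.0], [0.0, 0.0021]]
V_re = [[0.0019, 0.0], [0.0, 0.0015]]
H, k = hausman(b_fe, b_re, V_fe, V_re)
# An H well below the chi-square critical value (5.99 at p = .05 for k = 2)
# corresponds to the "no significant difference" result reported here.
```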

We find that adding the vector of variables that represent student progress through school to the equations sharply increases the proportion of explained variance in reading scores, from 19% to 29%, and in math scores, from 5% to 15%. Some changes in school demographic variables influence changes in 10th-grade scores. For example, an increase in the proportion of students identified as “at risk” significantly depresses reading achievement on the TAAS at the 10th-grade level. Furthermore, increases in the proportion of students identified for special education marginally increase reading scores in the fixed-effects model (perhaps because special education identification also increases exemptions of students from the test).
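The block-entry logic can be illustrated with a toy nested regression: fit a controls-only model, add the student-progress block, and compare R². The data below are synthetic stand-ins; only the comparison of nested models mirrors the analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 94  # school-year observations, matching the n in Tables 9 and 10

# Synthetic stand-ins: one control (for the demographic/capacity block) and
# one student-progress measure with a modest positive effect on scores.
at_risk = rng.normal(size=n)
retained = rng.normal(size=n)
scores = -0.10 * at_risk + 0.12 * retained + rng.normal(size=n)

def r_squared(X, y):
    """R-squared from an OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_base = r_squared(at_risk[:, None], scores)                      # controls only
r2_full = r_squared(np.column_stack([at_risk, retained]), scores)  # + progress block
# r2_full >= r2_base: entering the progress block adds explained variance.
```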

After controlling for these changes, we find that the most powerful predictor of changes in 10th-grade scores in reading and math in all models is an increase in 9th-grade retention. Disappearance from school is marginally significant in the reading models employing random and fixed effects. No other variables are significant for either test.

In terms of effect size, a 7-percentage-point increase in 9th-grade retention predicts a 1-point increase on the math TAAS–TLI score. An 8-percentage-point increase in 9th-grade retention predicts a 1-point increase in a high school’s average reading TAAS–TLI index score. A 1-point increase on the reading TLI could also be achieved by an increase of 11 percentage points in student disappearance between 9th grade and 10th grade. By increasing their average school TLI, schools could ascend to a higher TEA accountability rating category.
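These conversions follow directly from the retention coefficients in Tables 9 and 10 (random effects, Model B), since each coefficient is the predicted TLI change per percentage point of the predictor:

```python
# Coefficients for 9th-grade retention from Tables 9 and 10 (random
# effects, Model B): predicted change in school-average TAAS-TLI per
# 1-percentage-point change in retention.
coef_retained_math = 0.146
coef_retained_read = 0.119

# Inverting gives the retention increase associated with a 1-point TLI gain.
pp_per_tli_point_math = 1 / coef_retained_math  # ~6.8, i.e., about 7 points
pp_per_tli_point_read = 1 / coef_retained_read  # ~8.4, i.e., about 8 points
```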

We examine how school strategies translate into changes in TEA accountability rankings by

conducting a separate analysis for the same years (1997–2002) using multinomial logistic regression, which estimates the probability of an event’s occurring and allows a categorical dependent variable with more than two levels. Using the same predictor variables, we obtained regression coefficients for three contrasting situations: a decrease in TEA school rating (used as the reference group), no change in rating, or an increase in the school rating. TEA ratings were a function of increases in TLI scores coupled with officially reported drop-out rates below threshold levels for each rating.

The results are compatible with the accountability system incentives. Table 11 shows that, once again, ninth-grade retention rates strongly predict better TEA ratings. A 1-percentage-point increase in ninth-grade retention was associated with about 24% greater odds that a school’s TEA rating rose rather than fell, both before and after changes in student characteristics and school capacity are controlled. Additionally, as called for in the accountability system, officially reported ninth-grade drop-out rates show a negative coefficient in schools whose TEA ratings rose, as compared to those where ratings declined. As we have seen, these dropout codes bore little relationship to actual school leaving for students.
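The 24% figure is the odds ratio implied by the Table 11 retention coefficient: a multinomial-logit coefficient b converts to an odds ratio of exp(b). A quick check:

```python
import math

# Table 11 reports a coefficient of 0.215 for 9th-grade retention in the
# rating-rise contrast; the implied odds ratio is exp(0.215).
b_retained = 0.215
odds_ratio = math.exp(b_retained)           # ~1.24
pct_greater_odds = (odds_ratio - 1) * 100   # ~24% greater odds of a rating rise
```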

Not all the responses of schools suggest gaming. It appears that schools could also improve their ratings by increasing their teaching capacity. An increase in school ratings was associated with a positive and significant increase in the percentage of fully certified teachers. Such improvements in teacher qualifications were 20% more likely in high schools that had an increase in TEA ratings than in schools where ratings decreased. Also, schools that had stable ratings were more likely than others to have experienced an increase in students’ being classified as “at risk” in models that already controlled for race, language, and special education status, whereas schools that increased their ratings were more likely to have lost special education students than were those whose ratings decreased.

The multinomial regression findings hold despite the fact that TEA test score thresholds were increased and official dropout levels lowered in the final year. Essentially, high schools


TABLE 9
Changes in 10th-Grade Scores—Exit Texas Assessment of Academic Skills, Reading: Generalized Least Squares Regression With Random Effects and School/Year Fixed Effects

Predictor                           RE: Model A      RE: Model B      FE: Model A      FE: Model B
Constant                            1.180** (.452)   .939 (.445)      1.202** (.533)   .816 (.529)
Δ School capacity (%)
  Fully certified                   –.071 (.045)     –.035 (.045)     –.073 (.050)     –.036 (.050)
  Novice teacher                    –.040 (.065)     –.033 (.063)     –.039 (.076)     –.032 (.073)
  Teacher turnover                  –.050 (.046)     –.027 (.045)     –.059 (.051)     –.038 (.049)
Δ School demographic (%)
  White                             .127 (.166)      .205 (.166)      .230 (.219)      .301 (.219)
  Limited-English proficient        .025 (.078)      .052 (.078)      .081 (.088)      .121 (.088)
  Special education                 .213 (.155)      .262† (.155)     .239 (.196)      .398** (.203)
  At risk                           –.083** (.030)   –.099** (.031)   –.087** (.034)   –.101** (.035)
Δ Ninth-grade student progress (%)
  Disappearance                     —                .064† (.039)     —                .082† (.046)
  Retained                          —                .119** (.044)    —                .133** (.051)
  Withdrawal                        —                .059 (.050)      —                .078 (.059)
  Dropout                           —                –.214 (.181)     —                –.170 (.201)
R²                                  .189             .288             .184             .278
n                                   94               94               94               94

Note. Numbers in parentheses are standard errors. RE = random effects; FE = fixed effects.
†p < .10. *p < .05. **p < .01. ***p < .001.


TABLE 10
Changes in 10th-Grade Scores—Exit Texas Assessment of Academic Skills, Math: Generalized Least Squares Regression With Random Effects and School/Year Fixed Effects

Predictor                           RE: Model A      RE: Model B      FE: Model A      FE: Model B
Constant                            .923 (.504)      .764 (.499)      .983 (.608)      .743 (.622)
Δ School capacity (%)
  Fully certified                   –.035 (.049)     .001 (.050)      –.027 (.057)     .003 (.059)
  Novice teacher                    –.068 (.073)     –.060 (.070)     –.071 (.087)     –.063 (.086)
  Teacher turnover                  .014 (.051)      .037 (.050)      .014 (.058)      .032 (.058)
Δ School demographic (%)
  White                             .256 (.185)      .281 (.186)      .279 (.250)      .258 (.257)
  Limited-English proficient        .105 (.087)      .107 (.088)      .130 (.100)      .142 (.103)
  Special education                 .042 (.173)      .140 (.174)      –.049 (.223)     .142 (.238)
  At risk                           .015 (.034)      –.012 (.036)     .007 (.039)      –.013 (.041)
Δ Ninth-grade student progress (%)
  Disappearance                     —                .024 (.044)      —                .018 (.053)
  Retained                          —                .146*** (.050)   —                .141** (.060)
  Withdrawal                        —                .041 (.057)      —                .034 (.069)
  Dropout                           —                –.136 (.501)     —                –.109 (.236)
R²                                  .046             .152             .042             .148
n                                   94               94               94               94

Note. Numbers in parentheses are standard errors. RE = random effects; FE = fixed effects.
*p < .05. **p < .01. ***p < .001.


TABLE 11
Multinomial Logistic Regression of Texas Education Agency (TEA) Accountability Changes (1997–2002): Coefficients and Odds Ratios

Cell entries: coefficient (standard error), with the odds ratio in brackets; the reference category is a decrease in TEA rating.

                                    Random effects                       Fixed effects
TEA rating                          Same              Rise               Same              Rise
Δ School capacity (%)
  Fully certified                   0.083 (0.054)     0.045 (0.075)      0.086 (0.073)     0.244* (0.124)
                                    [1.087]           [1.046]            [1.090]           [1.277]
  Novice teacher                    –0.022 (0.076)    –0.043 (0.106)     –0.161 (0.110)    –0.084 (0.173)
                                    [0.978]           [0.958]            [0.851]           [0.919]
  Teacher turnover                  0.047 (0.055)     0.017 (0.077)      0.101 (0.097)     0.073 (0.131)
                                    [1.048]           [1.018]            [1.107]           [1.076]
Δ Ninth-grade student progress (%)
  Disappearance                     –0.112 (0.060)    –0.003 (0.072)     –0.061 (0.079)    0.133 (0.111)
                                    [0.894]           [0.997]            [0.941]           [1.143]
  Retained                          0.215** (0.071)   0.294** (0.094)    0.221** (0.109)   0.437** (0.156)
                                    [1.240]           [1.342]            [1.248]           [1.548]
  Withdrawal                        –0.149 (0.077)    –0.066 (0.097)     –0.147 (0.099)    0.044 (0.147)
                                    [0.862]           [0.936]            [0.864]           [1.045]
  Dropout                           –0.150 (0.240)    –0.661* (0.316)    –0.439 (0.296)    –1.845** (0.627)
                                    [0.861]           [0.516]            [0.645]           [0.158]

(continued)


TABLE 11 (continued)

                                    Random effects                       Fixed effects
TEA rating                          Same              Rise               Same              Rise
Δ School demographic (%)
  White                             0.198 (0.215)     –0.005 (0.303)     0.315 (0.298)     0.680 (0.523)
                                    [1.219]           [0.995]            [1.371]           [1.973]
  Limited-English proficient        –0.138 (0.126)    –0.261 (0.179)     –0.185 (0.208)    –0.164 (0.310)
                                    [0.871]           [0.770]            [0.831]           [0.849]
  Special education                 –0.161 (0.215)    –0.638 (0.337)     –0.030 (0.289)    –1.203 (0.549)
                                    [0.851]           [0.528]            [0.970]           [0.300]
  At risk                           0.195** (0.068)   0.040 (0.085)      0.189* (0.093)    –0.040 (0.138)
                                    [1.215]           [1.041]            [1.209]           [0.960]

Note. Numbers in parentheses denote standard errors; numbers in brackets are odds ratios relative to the reference group (a decrease in TEA rating).
*p < .05. **p < .01.


that escalated ninth-grade retention maintained or increased their ratings in the years that they did so. In relation to the school’s reference group, the reverse is also true: A decrease in retention portends a decrease in rating. Further study will shed light on whether high schools that responded to the press of accountability by excluding students from 10th-grade testing (and even from school) were able to continue to increase test scores over the long term as they adjusted to the changing formulas and requirements of the state’s second-generation accountability policies.

These robust findings regarding the effects of ninth-grade retention on school test scores and accountability ratings support the contention made by Holmes (2006) that when large numbers of students are retained in a grade, the next grade’s scores are higher because the low scorers are removed from that year’s pool. Although school and district scores may go up, the long-term consequences for individual students are invisible in the yearly snapshot of high-stakes test-score reporting. As a result of retention practices and strategies resulting in the loss of low-achieving students, large gains in school and districtwide test scores can be obtained without improving educational opportunities for those students (Owens & Ranick, 1977).
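The Holmes (2006) mechanism is plain arithmetic, as a toy example with hypothetical scores shows: hold the lowest scorers back in 9th grade, and the 10th-grade average rises even though no individual student’s score changes.

```python
# Hypothetical TLI-style scores for a 10-student cohort.
cohort = [55, 60, 62, 68, 70, 75, 80, 85, 90, 95]

# Retain everyone scoring below 65 in 9th grade; only the rest reach the
# 10th-grade testing pool.
tested_pool = [s for s in cohort if s >= 65]

mean_before = sum(cohort) / len(cohort)           # 74.0
mean_after = sum(tested_pool) / len(tested_pool)  # ~80.4
# The school's 10th-grade average rises although no student scored higher.
```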

Discussion

This study has sought to better understand student achievement and progress in Brazos City, a large urban district in the midst of first-generation Texas-style high-stakes testing and accountability. We examined trends in student performance and progression through school while investigating sources of potential gaming that have been identified in other studies: grade retention, student exclusion from testing and from school, and misreporting of indicators that are valued in the accountability system (e.g., drop-out rates).

A major strategy for avoiding the TAAS tests at the high school level was 9th-grade retention. At its peak, more than 30% of 9th-grade students were retained for 1 or more years. Of those who were retained, only 12% ever took the TAAS, and only 8% passed it. A majority of

retained students left school as dropouts or disappearances. Although official drop-out rates were kept low—under the annual 3.5% threshold required for a recognized school accountability rating—the proportion of students withdrawing or disappearing reached more than 40% of the cohort. More than 90% of students who were coded as withdrawn or who disappeared from the data set had failed to pass the exit exam. We also found that some students were kept in the 9th grade for more than 1 year and then skipped to the 11th or 12th grade, thereby never taking the 10th-grade exit TAAS that was used in the accountability ratings.

Although BCSD reported soaring graduation rates and high pass rates on the exit TAAS in 10th grade, our high school cohort analysis for those entering 9th grade in 1997 documented that only 33% of the cohort had graduated within 5 years; that 49% had dropped out, withdrawn, or disappeared from the data set (among these, about 8% likely transferred to schools outside the district); and that the remainder were still enrolled in school, trying to make up credits or pass the exit exam. African Americans, Latinos, and LEP students had the lowest graduation rates.

The large discrepancy between publicly reported graduation rates and micro-level student data can be reconciled only by considering the many ways that students are excluded from the enrollment data for calculating drop-out rates. In addition to the large number of disappearances of students from the data set with no codes, most withdrawals appear to be dropouts. This finding aligns with qualitative data that describe how low-achieving students were retained and how they were discouraged from entering and staying in school.

In addition to finding increases in 9th-grade retention rates over the period studied, especially for African American and Latino students, we found that high schools that retained greater numbers of students in the 9th grade—and those with more student disappearances—were able to boost their 10th-grade exit TAAS scores and state accountability ratings. The significant relationships between (a) increases in 9th-grade retention and student disappearances and (b) gains in test scores and ratings hold up before and after considering changing student populations,


Vasquez Heilig and Darling-Hammond


teacher capacity, and other measures of student progress. The widespread use of these strategies is confirmed by interviews with students, teachers, and administrators.

In the Texas high-stakes accountability system that we studied, Brazos City schools were forced to organize their responses around snapshot accountability measures based on test scores and reported drop-out rates instead of a long-term measure of student learning and success in completing school. From an institutional theory perspective, this macro-level policy sought to build public confidence in education based on student achievement on standardized tests. As test scores improved, the state and district gained confidence from the media and political system. However, when students did not show test score improvement, the onus of accountability fell on them and their schools, instead of the state. Although many schools and students were handicapped by capacity and resource constraints, the state was able to transfer the consequences for failure to them. Improvements in the quality of education for the least advantaged students did not materialize.

Thus, governmental agents, instead of students, became the primary beneficiary of accountability policy, whereas some students were clearly the losers.

It may be possible to construct a high-stakes accountability system without engendering some version of gaming. However, it is apparent that first-generation Texas-style accountability clearly created incentives for pushing out students from high schools and that schools responded to each shift in the incentives by finding new accountability loopholes to manipulate student placements and how data were reported about students. Students also detailed structural obstacles designed to encourage them to leave, such as excessive enforcement of attendance policies, repetitive class and grade-level assignment, and a generally nonsupportive environment for low-achieving students. An important question for the field is whether there is any way to protect low-income, low-achieving students—often students of color and recent immigrants—from bearing the brunt of accountability strategies that impose test-based sanctions on the schools they attend.


Appendix: Summary of Variables Used in School-Level Regression Analyses (1997–2002)

                                 n     Minimum   Maximum   M       SD
Δ Achievement scores (a)
  Reading                        96    –5.93     6.85      0.92    2.30
  Math                           96    –4.92     10.02     1.16    2.37
Δ School capacity (%)
  Fully certified                94    –11.66    14.70     3.15    5.34
  Novice teacher                 94    –13.05    4.97      –4.94   3.76
  Teacher turnover               95    –22.37    9.71      –1.47   5.02
Δ Student progress (%)
  9th grade
    Disappearance                96    –24.26    19.48     2.00    8.86
    Retained                     96    –14.12    15.53     0.86    5.45
    Withdrawal                   96    –12.88    23.02     0.73    6.81
    Dropout                      96    –6.14     4.22      0.05    1.27
  10th grade
    Disappearance                72    –16.45    19.50     1.82    7.31
    Retained                     96    –15.27    15.96     2.33    5.31
    Withdrawal                   96    –12.07    24.43     1.90    6.14
    Dropout                      72    –4.82     2.60      0.02    1.15

(continued)


References

Advocates for Children. (2002). Pushing out at-risk students: An analysis of high school discharge figures—A joint report by AFC and the Public Advocate. Retrieved November 30, 2005, from http://www.advocatesforchildren.org/pubs/pushout11-20-02.html

Allington, R. L., & McGill-Franzen, A. (1992). Unintended effects of educational reform in New York. Educational Policy, 6(4), 397–414.

Amrein, A. L., & Berliner, D. C. (2002). High-stakes testing, uncertainty, and student learning. Education Policy Analysis Archives, 10(18). Retrieved November 16, 2003, from http://epaa.asu.edu/epaa/v10n18

Amrein-Beardsley, A., & Berliner, D. C. (2003). Re-analysis of NAEP math and reading scores in states with and without high-stakes tests: Response to Rosenshine. Education Policy Analysis Archives, 11(25). Retrieved November 16, 2003, from http://epaa.asu.edu/epaa/v11n25/

Carnoy, M., & Loeb, S. (2002). Does external accountability affect student outcomes? A cross-state analysis. Educational Evaluation and Policy Analysis, 24(4), 305–332.

Carnoy, M., Loeb, S., & Smith, T. (2001). Do higher state test scores in Texas make for better high school outcomes? (CPRE Research Report No. RR-047). Philadelphia: Consortium for Policy Research in Education.

Clarke, M., Haney, W., & Madaus, G. (2000, January). High stakes testing and high school completion. Retrieved March 16, 2008, from the Web site of the National Board on Educational Testing and Public Policy: http://www.bc.edu/research/nbetpp/publications/v1n3.html

Darling-Hammond, L. (1991). The implications of testing policy for quality and equality. Phi Delta Kappan, 73(3), 220–225.

Darling-Hammond, L. (2000). Teacher quality and student achievement: A review of state policy evidence. Education Policy Analysis Archives, 8(1). Retrieved July 30, 2004, from http://epaa.asu.edu/epaa/v8n1/

Darling-Hammond, L., & Sykes, G. (2003). Wanted: A national teacher supply policy for education: The right way to meet the “Highly Qualified Teacher” challenge. Education Policy Analysis Archives, 11(33). Retrieved April 15, 2007, from http://epaa.asu.edu/epaa/v11n33/

DeBray, E., Parson, G., & Woodworth, K. (2001). Patterns of response in four high schools under state accountability policies in Vermont and New York. In S. H. Fuhrman (Ed.), Annual yearbook of the National Society for the Study of Education: Vol. 2. From capitol to the classroom: Standards-based reform in the states (pp. 170–192). Chicago: University of Chicago Press.

Diamond, J. B., & Spillane, J. P. (2004). High-stakes accountability in urban elementary schools: Challenging or reproducing inequality? Teachers College Record, 106, 1145–1176.

Figlio, D. N., & Getzer, L. S. (2002, April). Accountability, ability, and disability: Gaming the system? Cambridge, MA: National Bureau of Economic Research.


Appendix (continued)

                                 n     Minimum   Maximum   M       SD
Δ School demographic (%)
  White                          96    –10.27    3.04      –0.31   1.48
  Limited-English proficient     96    –10.36    12.57     0.11    3.20
  Special education              96    –2.76     6.78      0.60    1.48
  At risk                        96    –12.86    28.05     4.90    7.98
School demographics (%)
  White                          96    0.11      60.97     11.70   15.97
  Limited-English proficient     96    0.00      42.38     10.59   9.01
  Special education              96    0.28      27.77     12.33   5.45
  At risk                        96    2.13      88.63     65.51   20.59
  Free lunch                     96    5.22      91.11     39.22   17.23
School capacity (%)
  Certified teachers             95    47.27     86.36     69.23   7.88
  Novice teachers                95    0.00      28.95     10.18   7.62
  Teacher turnover               96    0.00      100.00    8.59    13.65

a. Tenth-grade exit Texas Assessment of Academic Skills–Texas Learning Index.


Greene, J. P. (2002). High school graduation rates inthe United States: Revised. New York: ManhattanInstitute for Policy Research. Retrieved December 1,2007, from http://www.manhattan-institute.org/pdf/cr_baeo.pdf

Grissmer, D. W., Flanagan, A., Kawata, J., &Williamson, S. (2000). Improving student achieve-ment: What do state NAEP test scores tell us?Santa Monica, CA: RAND.

Hamilton, L. M., Stecher, B. M., & Klein, S. P.(2002). Making sense of test-based accountabilityin education. Santa Monica, CA: RAND.

Haney, W. (2000). The myth of the Texas miracle ineducation. Education Policy Analysis Archives,8(41). Retrieved April 15, 2004, from http://epaa.asu.edu/epaa/v8n41/

Hanushek, E., & Raymond, M. (2003). Improvingeducational quality: How best to evaluate ourschools? In Y. Kodrzycki (Ed.), Education in the21st century: Meeting the challenges of a changingworld (pp. 193–224). Boston: Federal ReserveBank of Boston.

Heubert, J., & Hauser, R. (Eds.). (1999). High stakes:Testing for tracking, promotion, and graduation.Washington, DC: National Academy Press.

Holmes, C. T. (2006). Low test scores + high reten-tion rates = more dropouts. Kappa Delta PiRecord, 42(2), 56–58.

Jacobs, B. A. (2001). Getting tough? The impactof high school graduation exams. EducationalEvaluation and Policy Analysis, 23(2), 99–122.

Jordan, H. R., Mendro, R. L., & Weerasinghe, D.(1997, June). Teacher effects on longitudinal stu-dent achievement: A preliminary report on researchon teacher effectiveness. Paper presented at theNational Evaluation Institute, Indianapolis, IN.

Klein, S. P., Hamilton, L. S., McCaffrey, D. F., &Stecher, B. M. (2000). “What do test scores inTexas tell us?” Education Policy Analysis Archives,8(49). Retrieved February 27, 2006 from http://epaa.asu.edu/epaa/v8n49/

Lee, J., & Wong, K. (2004). The impact of accountabil-ity on racial and socioeconomic equity: Consideringboth school resources and achievement outcomes.American Educational Research Journal, 41(4),797–832.

Lilliard, D., & DeCicca, P. (2001). Higher standards,more dropouts? Evidence within and across time.Economics of Education Review, 20(5), 459–473.

Linton, T. H., & Kester, D. (2003). Exploring theachievement gap between White and minority stu-dents in Texas: A comparison of the 1996 and 2000NAEP and TAAS eighth grade mathematics testresults. Education Policy Analysis Archives,11(10). Retrieved July 20, 2007, from http://epaa.asu.edu/epaa/v11n10/

McCombs, J. S., Kirby, S. N., Barney, H., Darilek, H.,& Magee, S. (2005). Achieving state and nationalliteracy goals: A long uphill road. Santa Monica,CA: RAND.

McNeil, L. (2005). Faking equity: High-stakes testingand the education of Latino youth. In A. Valenzuela(Ed.), Leaving children behind: How “Texas-style”accountability fails Latino youth (pp. 57–112).Albany: State University of New York Press.

Meyer, J., & Rowan, B. (1977). Institutionalizedorganizations: Formal structure as myth and cere-mony. American Journal of Sociology, 83, 340–363.

Mintrop, H. (2003). The limits of sanctions in low-performing schools: A study of Maryland andKentucky schools on probation. Education PolicyAnalysis Archives, 11(3). Retrieved November 8,2004, from http://epaa.asua.edu/epaa/v11n3.htm

National Center for Education Statistics. (2003).Characteristics of the 100 largest public elemen-tary and secondary school districts in the UnitedStates: 2001–02. Washington, DC: U.S. Departmentof Education.

Nichols, S. L., Glass, G. V., & Berliner, D. C. (2006).High-stakes testing and student achievement: Doesaccountability pressure increase student learning?Educational Policy Analysis Archives, 14 (1).Retrieved on August 5, 2007 from http://epaa.asu.edu/epaa/v14n1/

Oliver, C. (1991). Strategic responses to institutionalprocesses. Academy of Management Review, 16(1),145–179.

Orfield, G., & Ashkinaze, C. (1991). The closingdoor: Conservative policy and Black opportunity.Chicago: University of Chicago Press.

Orfield, G., Losen, D., Wald, J., & Swanson, C. B.(2004). Losing our future: How minority youth arebeing left behind by the graduation rate crisis.Cambridge, MA: Civil Rights Project at HarvardUniversity.

Owens, S. A., & Ranick, D. L. (1977). TheGreensville program: A commonsense approach tobasics. Phi Delta Kappan, 58(7), 531–533.

Roderick, M., Bryk, A., Jacob, B., Easton, J., &Allensworth, E. (1999). Ending social promotion:Results from the first two years. Chicago: Consortiumon Chicago School Research.

Rosenshine, B. (2003). High-stakes testing: Another analysis. Education Policy Analysis Archives, 11(24). Retrieved March 16, 2008, from http://epaa.asu.edu/epaa/v11n24/

Rumberger, R. W., & Larson, K. A. (1998). Student mobility and the increased risk of high school dropout. American Journal of Education, 107(1), 1–35.

Rustique-Forrester, E. (2005). Accountability and the pressures to exclude: A cautionary tale from England. Education Policy Analysis Archives, 13(26). Retrieved March 16, 2008, from http://epaa.asu.edu/epaa/v13n26/

Schiller, K., & Muller, C. (2000). External examinations and accountability, educational expectations, and high school graduation. American Journal of Education, 108(2), 73–102.

Smith, F. (1986). High school admission and the improvement of schooling. New York: New York City Board of Education.

Stock, J. H., & Watson, M. W. (2003). Introduction to econometrics. Boston: Addison-Wesley Higher Education.

Texas Education Agency. (2000). TAAS technical digest. Retrieved July 20, 2007, from http://www.tea.state.tx.us/student.assessment/researchers.html

Texas Education Agency. (2002). Three-year follow-up of a Texas public high school cohort (Working Paper No. 6). Austin, TX: Author.

Texas Education Agency. (2003). Secondary school completion and dropouts in Texas public schools, 2001–02. Austin, TX: Author.

Wheelock, A. (2003). School awards programs and accountability in Massachusetts: Misusing MCAS scores to assess school quality. Cambridge, MA: FairTest.

Wright, S. P., Horn, S. P., & Sanders, W. L. (1997). Teacher and classroom context effects on student achievement: Implications for teacher evaluation. Journal of Personnel Evaluation in Education, 11(1), 57–67.

Authors

JULIAN VASQUEZ HEILIG is an assistant professor of educational policy and planning in the Department of Educational Administration at the University of Texas at Austin. His current research includes quantitatively and qualitatively examining how high-stakes testing and other accountability-based reforms and incentive systems impact minority students. Other research interests include issues of access, diversity, and equity in higher education.

LINDA DARLING-HAMMOND is Charles E. Ducommun Professor of Education at Stanford University. Her research, teaching, and policy interests focus on school reform, teaching quality, and educational equity.

Manuscript received February 8, 2007
Final revision received January 2, 2008

Accepted January 23, 2008
