DOCUMENT RESUME

ED 377 073    SE 055 578

AUTHOR Romberg, Thomas A., Ed.
TITLE Mathematics Assessment and Evaluation: Imperatives for Mathematics Educators.
SPONS AGENCY Office of Educational Research and Improvement (ED), Washington, DC.; Wisconsin Center for Education Research, Madison.
REPORT NO ISBN-0-7914-0900-7
PUB DATE 92
CONTRACT R117G10002
NOTE 378p.
AVAILABLE FROM State University of New York Press, State University Plaza, Albany, NY 12246 (paperback: ISBN-0-7914-0900-7; clothbound: ISBN-0-7914-0899-X).
PUB TYPE Books (010) Viewpoints (Opinion/Position Papers, Essays, etc.) (120)
EDRS PRICE MF01/PC16 Plus Postage.
DESCRIPTORS *Calculators; Criticism; Educational Assessment; Elementary Secondary Education; *Evaluation Methods; Mathematics Achievement; Mathematics Education; *Mathematics Tests; *Sex Differences; State Programs; *Student Evaluation; Testing
IDENTIFIERS *Alternative Assessment; NCTM Curriculum and Evaluation Standards; Reform Efforts; *State Mathematics Assessments

ABSTRACT This book contains papers written on issues related to externally mandated mathematics tests and their influence on school mathematics. Chapter 1 presents an overview of the book, including brief abstracts of each chapter. Chapter 2 presents a summary of the overall problems associated with the need for valid information. Remaining chapters include: (3) Implications of the National Council of Teachers of Mathematics (NCTM) Standards for Mathematics Assessment (Norman Webb & Thomas A. Romberg); (4) Curriculum and Test Alignment (Thomas A. Romberg, and others); (5) State Assessment Test Development Procedures (James Braswell); (6) Test Development Profile of a State-Mandated Large-Scale Assessment Instrument in Mathematics (Tej Pandey); (7) Assessing Students' Learning in Courses Using Graphics Tools: A Preliminary Research Agenda (Sharon L. Senk); (8) Mathematics Testing with Calculators: Ransoming the Hostages (John G. Harvey); (9) Gender Differences in Test Taking: A Review (Margaret R. Meyer); (10) Communication and the Learning of Mathematics (David Clarke, and others); (11) Measuring Levels of Mathematical Understanding (Mark Wilson); (12) A Framework for the California Assessment Program to Report Students' Achievement in Mathematics (E. Anne Zarinnia & Thomas A. Romberg); (13) Evaluation--Some Other Perspectives (Philip C. Clarkson). A reference list organized by chapter contains 300 citations. Appendices include the NCTM Evaluation Standards, a classification matrix, illustrative questions, history and rationale for student mathematics journals, SMP Project student log sample pages, and the report of Vermont Mathematics Portfolio Assessment Program. (MKR)

MATHEMATICS ASSESSMENT AND EVALUATION

SUNY Series, Reform in Mathematics Education
Judith Sowder, editor

The preparation of this book was supported by the Office of Educational Research and Improvement, United States Department of Education (Grant Number R117G10002) and by the Wisconsin Center for Education Research, School of Education, University of Wisconsin-Madison. The opinions expressed in this publication do not necessarily reflect the views of the Office of Educational Research and Improvement or the Wisconsin Center for Education Research.

MATHEMATICS ASSESSMENT AND EVALUATION

Imperatives for Mathematics Educators

Edited by

Thomas A. Romberg

STATE UNIVERSITY OF NEW YORK PRESS

Published by
State University of New York Press, Albany

© 1992 State University of New York

All rights reserved

Printed in the United States of America

No part of this book may be used or reproduced in any manner whatsoever without written permission except in the case of brief quotations embodied in critical articles and reviews.

For information, address the State University of New York Press, State University Plaza, Albany, NY 12246

Production by Christine M. Lynch
Marketing by Fran Keneston

Library of Congress Cataloging-in-Publication Data

Mathematics assessment and evaluation : imperatives for mathematics educators / edited by Thomas A. Romberg
p. cm. (SUNY series, reform in mathematics education)
Includes bibliographical references and index.
ISBN 0-7914-0899-X (CH : acid-free). ISBN 0-7914-0900-7 (PB : acid-free)
1. Mathematical ability--Testing. 2. Mathematics--Study and teaching--United States. I. Romberg, Thomas A. II. Series.
510'.71--dc20 91-11157
CIP

10 9 8 7 6 5 4 3 2 1

CONTENTS

1. Overview of the Book
Thomas A. Romberg

2. Evaluation: A Coat of Many Colors
Thomas A. Romberg

3. Implications of the NCTM Standards for Mathematics Assessment
Norman Webb and Thomas A. Romberg

4. Curriculum and Test Alignment
Thomas A. Romberg, Linda Wilson, 'Mamphono Khaketla, and Silvia Chavarria

5. State Assessment Test Development Procedures
James Braswell

6. Test Development Profile of a State-Mandated Large-Scale Assessment Instrument in Mathematics
Tej Pandey

7. Assessing Students' Learning in Courses Using Graphics Tools: A Preliminary Research Agenda
Sharon L. Senk

8. Mathematics Testing with Calculators: Ransoming the Hostages
John G. Harvey

9. Gender Differences in Test Taking: A Review
Margaret R. Meyer

10. Communication and the Learning of Mathematics
David Clarke, Max Stephens, and Andrew Waywood

11. Measuring Levels of Mathematical Understanding
Mark Wilson

12. A Framework for the California Assessment Program to Report Students' Achievement in Mathematics
E. Anne Zarinnia and Thomas A. Romberg

13. Evaluation--Some Other Perspectives
Philip C. Clarkson

Appendices

References

Contributors

Index

1

Overview of the Book

Thomas A. Romberg

The purpose of this book is to share with mathematics educators a set of recent papers written on issues surrounding mathematics tests and their influence on school mathematics. The impetus for the contributions grew from a conference, "The Influence of Testing on Mathematics Education," sponsored by the Mathematical Sciences Education Board (MSEB) at UCLA in June 1986. The purpose of the conference was to gather informed input and advice on current testing practices. The fact is that students in American schools are subjected to a variety of tests, often standardized tests, from kindergarten to graduate school. Such tests are, according to widely held perception, inhibitors to change and improvement in education and especially in mathematics education. Since MSEB was organized to coordinate the current reform movement in school mathematics, the topic of the conference was deemed critical. Two things became clear at the UCLA meeting: First, there was agreement that tests need to change to reflect curriculum changes, and second, many participants articulated their beliefs about the inadequacy of current tests and provided relevant anecdotes on problems to others at the conference. However, no one was sure how such changes could be accomplished, nor did participants even have substantive, reliable information about the actual impact of testing on classroom practices.

Following the conference, a three-person Testing Design Task Force, consisting of Jeremy Kilpatrick, Tej Pandey, and Thomas Romberg, was organized by MSEB. In September 1986, this task force produced "A Proposal for Studies of Mathematics Tests and Testing" based on the following four assumptions:

Valid information about student mathematics is needed by a variety of people (students, teachers, parents, administrators, policy makers) for a variety of purposes (monitoring progress, selection for and placement in courses, program evaluation, accountability).

Both the curriculum and teaching practice in mathematics need to be directed toward strategies which students could use to solve problems, the application of mathematics to practical situations, and the development of thinking skills. Consequently, testing should reflect students' achievement in these directions.

Serious questions have arisen about the validity of existing tests for the uses to which they are being put. Standardized tests and state-mandated tests may yield information that is invalid for certain purposes and provide little or no information on several important dimensions of achievement.

The continued use of existing tests appears likely to impede the much-needed reform in curriculum and instruction to which the mathematics education community is committed.

On the basis of these assumptions, a set of questions and research studies was proposed. In particular, several literature reviews were planned, each of which would explore one facet of the validity of mathematics tests for various purposes. Topics were to include surveys of testing practice, the alignment of tests with curricula, test-preparation practices and effects, test-taking skills, the student use of calculators during test taking, teacher and student attitudes toward tests, time spent in testing, alternatives to testing, and minority group and gender group differences in risk taking and test performance.

In 1987, when the Wisconsin Center for Education Research was awarded the grant to form the National Center for Research in Mathematical Sciences Education (NCRMSE), NCRMSE assumed responsibility for carrying out several aspects of the proposed scope of work outlined by the MSEB task force. The papers in this volume represent a number of the literature reviews proposed. They were written by the Center staff or by invited scholars. The contributions cover many of the issues identified by the MSEB task force and are an important contribution to our growing knowledge about the impact of tests and testing on school mathematics.

The papers are only a part of the work now being conducted by the Center on this important topic. Since 1987, the Center has conducted two major surveys. The first, a national survey of a sample of Grade 8 mathematics teachers (Romberg, Zarinnia, & Williams, 1989), provides information about teachers' perceptions of the impact of mandated testing on their instruction. Findings reveal that teachers are familiar with mandated tests, make efforts to ensure that students perform well on the tests, and adjust their curriculum and modes of instruction to focus on the knowledge and skills being tested.

The second is a survey of state mathematics coordinators on the current types of mandated testing in the fifty states (Romberg, Zarinnia, & Williams, 1990). This study examines the actual mandated testing practices in each state, including the kinds of tests given, the uses to which they are put, and the kinds of test-score information subsequently available to the teachers.

In addition to these surveys and this collection of papers, four related activities are now in progress:

1) During the past year, two in-depth case studies on the impact of mandated testing in classrooms have been conducted at four sites. Information from these studies is now being analyzed.

2) Three extensive reviews of literature and of practice are now underway on classroom testing for instructional decision making, testing for placement and grouping, and test validity.

3) Some sample test items have been written and are being tried out; they have been designed to assess level of reasoning in some of the particular domains outlined in the Curriculum and Evaluation Standards for School Mathematics (National Council of Teachers of Mathematics, 1989).

4) Two projects on curriculum design, development, and assessment are being conducted jointly with the Research Group in Mathematics Education at the University of Utrecht.

In all, the work of the staff and consultants of NCRMSE on the influence of testing on school mathematics makes it clear that valid information about student performance is sorely needed if the reform movement in school mathematics is to succeed.

OVERVIEW OF THE CHAPTERS

The next twelve chapters were prepared during 1988 and 1989. Chapter 2 represents a summary of the overall problems associated with the need for valid information. Chapters 3 and 4 examine the use of tests in the context of the current reform movement in school mathematics. Chapters 5 and 6 describe the current procedures used to develop state tests. Chapter 7 summarizes current efforts to incorporate the use of calculators in mathematics tests. This is followed by chapter 8, a review of research on testing with calculators. Chapter 9 is a review of gender differences and testing. Chapter 10 is an examination of an Australian project addressing teachers' assessment practices. The next two chapters, 11 and 12, deal with alternative strategies for gathering, analyzing, and reporting student performance information. The final chapter is an invited review and critique of chapters 2 through 12.

Chapter 2: Evaluation: A Coat of Many Colors
by Thomas A. Romberg

An earlier draft of this chapter was prepared as an invited address for Theme Group T4, Evaluation and Assessment, at the Sixth International Congress on Mathematical Education in Hungary. This paper examines both the methods of gathering information from students and the use of that information to make a variety of judgments. It considers the history of evaluation and how evaluation relates to the gathering of assessment data and to educational decision making. To examine the strengths and weaknesses of the evaluation of the impact of new mathematics programs and of large-scale profile evaluations, it describes trends in evaluation and assessment that show the disparity between what is possible and what is, in fact, achieved.

Chapter 3: Implications of the NCTM Standards for Mathematics Assessment
by Norman Webb and Thomas A. Romberg

In 1987 work began on the NCTM Standards. Thomas Romberg was the chair of the commission that produced this document and Norman Webb chaired the working group that prepared the evaluation standards. This chapter includes criteria for assessment which would be compatible with and supportive of the curriculum standards. Three examples of alternative assessment techniques are presented that correspond to the intent of the evaluation standards and provide illustrations of forms of assessment that are applicable in evaluating the curriculum standards.

Chapter 4: Curriculum and Test Alignment
by Thomas A. Romberg, Linda Wilson, 'Mamphono Khaketla, and Silvia Chavarria

In this chapter, a variety of tests and test items are examined to determine whether they reflect the recommendations made in the Standards. In the initial sections, six commonly used standardized tests are examined. It is clear from this examination that those tests fail to assess the higher-order skills such as problem solving, reasoning, and connections that are stressed in the Standards. Then items are identified from other tests which could be used to assess such aspects of mathematics.

Chapter 5: State Assessment Test Development Procedures
by James Braswell

The primary purpose of this paper is to describe how tests are developed for state assessment programs. The methods described are based in part on discussions with state department of education assessment staff members in Florida, Louisiana, Massachusetts, Michigan, and New Jersey--states in which testing practice was judged to be representative of a range of approaches to test development. Occasional references that reflect previous experience with other state testing programs and current work with the National Assessment of Educational Progress test development team for the 1990 Mathematics Assessment are also taken into consideration.

Chapter 6: Test Development Profile of a State-Mandated Large-Scale Assessment Instrument in Mathematics
by Tej Pandey

Two main types of large-scale assessments are examined in this paper. The first focus of interest is oriented primarily to those individuals who typically use test information to rank a student on an established norm, find a student's strengths and weaknesses, and determine whether that student has mastered specific course content. The second focus of interest lies primarily in the administrative use of information to determine the achievement level of students in a school, district, or regional system for purposes of assessing program effectiveness. This paper examines the nature and design of test instruments in a large-scale assessment program (California Assessment Program) providing reliable group-level information. The paper also describes the test development process as it has evolved over a period of fifteen years to meet the curriculum demands of the time.

Chapter 7: Assessing Students' Learning in Courses Using Graphics Tools: A Preliminary Research Agenda
by Sharon L. Senk

Recently mathematics educators have called for the use of calculator and computer-graphing technology in mathematics classes, and several software and curriculum development projects have been initiated to transform these recommendations into reality. However, until now, there has been little systematic study of how teaching, learning, and assessment in courses using such graphics tools are affected by the technology. This paper describes a preliminary agenda developed by researchers in the field for assessing students' learning in courses using graphing tools. Included are suggested investigations of student and teacher outcomes and a discussion of methodological issues.

Chapter 8: Mathematics Testing with Calculators: Ransoming the Hostages
by John G. Harvey

This paper argues that present testing practices hold today's students hostage to yesterday's mistakes. The author predicts that because mathematics tests fail to incorporate the use of calculators in the testing process, mathematics instruction will fail to incorporate the use of calculators effectively, continuing to hold today's students prisoners to a mathematics curriculum that is failing to prepare them for society's immediate needs as well as those of the twenty-first century. The paper suggests that the use of calculators on mathematics tests will not remedy the failures of present tests, but that their use is necessary if we want students to investigate, to explore, and to discover mathematics effectively.

Chapter 9: Gender Differences in Test Taking: A Review
by Margaret R. Meyer

Ideally, when students take a mathematics examination, the only thing that should influence their score is their mastery of the material being tested. This paper reviews evidence concerning the existence of gender differences in mathematics test taking. It examines several factors that have surfaced relating to differences in performances for males and females. These factors are power vs. speed test conditions, item-difficulty sequencing, examination format, test-wiseness, risk-taking behavior, and test-preparation behaviors. One conclusion reached is that the use of the multiple-choice format could result in a male advantage. A recommendation is therefore made that assessment instruments not rely as heavily on the multiple-choice format.

Chapter 10: Communication and the Learning of Mathematics
by David Clarke, Max Stephens, and Andrew Waywood

The learning of mathematics is fundamentally a matter of constructing mathematical meaning. The environment of the mathematics classroom provides experiences which stimulate this process of construction. This chapter presents the findings of three studies based in Australian schools: the IMPACT Project, Assessment Alternatives in Mathematics, and the Vaucluse College Study. The purpose of the research synthesis considered in this chapter is to discuss (a) the extent to which the strategies reported encourage children to broaden their mathematical thinking and facilitate meta-learning and (b) the impact of these strategies on the nature of mathematical activity in classrooms, in particular with reference to redefining the roles of teacher and student in creating and giving personal meaning to mathematics.

Chapter 11: Measuring Levels of Mathematical Understanding
by Mark Wilson

This chapter describes recent psychometric advances in the creation of models that measure developmental change in understanding. Standardized, norm-referenced tests are based on an accumulation of bits of knowledge rather than on understanding, which is a constructivist, developmental process. As the latter conception gains more acceptance, there is a need for new assessment models. Empirical examples of response maps are used to illustrate the potential of the new models.

Chapter 12: A Framework for the California Assessment Program to Report Students' Achievement in Mathematics
by E. Anne Zarinnia and Thomas A. Romberg

The purpose of this paper is to propose categories for the California Assessment Program that report student achievement in mathematics. Initially, the purpose of reporting achievement was accountability. This paper examines explicit and tacit messages imposed in the analyzing, gathering, and aggregating of this information that expose subtle effects on teaching and student achievement. The paper determines that units of analysis and reporting categories are needed that will both deliberately support the purposes of gathering adequate information for monitoring and--by focusing attention on critical considerations--promote reform in mathematics education. This chapter outlines seven bases for forming reporting categories.

Chapter 13: Evaluation--Some Other Perspectives
by Philip C. Clarkson

A common response to the challenge of the Standards is, "Yes, but who will change the tests?" (National Council of Teachers of Mathematics, 1989, p. 189).

It is apparent that "the tests" referred to are not the tests teachers give in their classrooms on a day-to-day or weekly basis. They have control over those already. "The tests" are the standardized assessment instruments which are used throughout the United States, often authorized by legislation, devised by commercial organizations, and seen by many teachers in the country as being a forceful factor in structuring their mathematics curriculum.

The preceding chapters have provided background information on these tests and have made some suggestions on how they could be altered. None reflect on the question as to whether they are indeed necessary. This chapter sketches developments over the last twenty-five years in the State of Victoria, Australia, where there is now only one external test given at the end of the school system, in Year 12. This contrasting situation may contribute constructively to the ongoing debate in both Australia and in the United States as to how to monitor the work of schools.

In summary, as the title to this book suggests, the authors of these chapters address an important set of issues about mathematics assessment and evaluation. It is clear that it is important to gather information on student performance in mathematics for a variety of reasons. However, while the mathematics curriculum and the way mathematics is taught are changing, the definition of assessment and how performance is assessed also need to change. It is imperative, if the school mathematics reform efforts are to be successful, that mathematics educators become aware of the issues addressed in these chapters.

2

Evaluation: A Coat of Many Colors

Thomas A. Romberg

"EVALUATE: to judge or determine the worth or quality of."
Webster's New World Dictionary, 1985.

This paper examines both the methods of gathering information from students and the use of that information to make a variety of judgments. It considers the history of evaluation, and how evaluation relates to the gathering of assessment data and to educational decision making. To examine the strengths and weaknesses of evaluations of the impact of new mathematics programs and large-scale profile evaluations, it describes trends in evaluation and assessment, showing the disparity between what is possible and what is already being done.

Evaluation in education has evolved from an initial and single concentration on the measurement of achievement in order to make judgments about students to the current and growing interest in providing information to support policy and program decision making. To make these latter judgments, information from students about their mathematical achievement is usually used. Thus, in this paper both the methods of gathering information from students and the uses of that information to make a variety of judgments are examined.

The assessment of student performance in schools has a long history. However, contemporary models for the gathering of performance data and the use of the information for policy and program decision making have only evolved during the past quarter-century. The purposes of this survey paper are

1. To relate the gathering of assessment data to educational decision making;

2. To trace the history of this evolution. The assessment history begins in the nineteenth century and the evaluation history in the 1930s. However, against the background of each case, developments in the past decade are stressed;

3. To illustrate the strengths and weaknesses of two contemporary social policy evaluation models. These are: evaluations of the impact of new mathematics programs, and large-scale profile evaluations; and

4. To describe four recent trends in evaluation and assessment.

Although the history of and trends in assessment and evaluation are not unique to school mathematics, the emphasis and examples in this paper focus on assessing mathematical performance and on the use of that information in instructional and policy contexts. Also, the examples have been selected to reflect the variety in models, methods, and procedures used throughout the world.

The principal point which should be understood is that at present there is considerable disparity between theory and practice. Academic considerations about goals, decisions, methods of gathering information, and the validity of that information are in sharp contrast to the political and practical expectations of many governments and administrators. What is possible differs from what is done.

EDUCATIONAL DECISION MAKING

The following examples are provided to illustrate the relationships between measures of achievement and the variety of situations in which this information is used to make a judgment (hence, the title of this chapter):

1) A student has decided to study biology and would like to know whether she has the prerequisite knowledge to enroll in a biometrics course.

2) The admissions committee of a tertiary institution must select one hundred students from some eight hundred who have applied for an engineering program.

3) A teacher would like to grade students on how well they understand the chapter on simultaneous linear equations just completed.

4) An official in a department of education has been asked to provide a legislative committee with information about pupil performance in mathematics.

5) A publishing company is interested in developing a text to teach specific concepts of statistics to students in middle school. It needs feedback from teachers about the adequacy of the materials (i.e., what things were successful and what things were not) so that improvements can be made.

6) A researcher interested in early cognitive development with respect to mathematics would like to assess the ability of preschool children to handle certain mathematical relationships, such as the comparison of two sets with respect to numerosity.

7) An employer is interested in the mathematical capability of job applicants.

8) An official must decide which students are to be admitted to academic high schools and which to technical schools.

These examples are only a few of the typical situations in which information from students about their mathematical performance is frequently used. In addition, they reflect the diversity of judgment (qualification, selection, placement, diagnosis, grading, profiling, researching, and so forth) involved in those decisions as well as the variety of personnel (students, administrators, teachers, developers, employers, and researchers).

From these examples, I have assumed that information from students about their mathematical achievement is important and that such information should influence educational decisions. The scenarios cited here are but a few examples of the many decisions facing educators throughout the world. Whether achievement data as a source of information actually influence schooling decisions is a separate and distinct empirical question. Nevertheless, valid data about student achievement should be available and used when making many such decisions.

Also, we must ask: How should such information be elicited? The answer to this question will be based on a second assumption: The methods of gathering information (how data are collected, from whom, and how they are aggregated, organized, and reported) depend on the decisions to be made.

From these assumptions and the examples given above, I believe three elements of the decision-making process should be considered.

1) The decisions must be specifically identified. Gathering information without an explicit purpose in mind wastes time and resources. Although it is now fashionable to create data bases under the assumption that having such data will be useful, it has been shown that such data bases are rarely used or of value unless the purposes for which the data are to be used were considered when designing the data base.

2) The implications of the judgments to be made, or the questions to be answered, must be examined. This involves considering error in measurement, the errors in judgment that one is willing to tolerate, and whether the decisions are irrevocable. Teachers may be willing to accept considerable measurement error when administering chapter tests because they can rely on other information to judge a student's progress; a developer may be willing to live with high judgment errors in the development of a new instructional unit, while an admissions committee should seek minimal measurement error in choosing which applicants to accept into a program.

3) The "unit" about which the decisions are to be made must be specified (individuals, groups, classes, schools, materials, research questions). It has long been common practice to test all students on every item in every test; data from individuals can then be aggregated at any group level for any purpose. This practice is extremely wasteful, in terms of both cost and time. For example, the administration of a standardized test merely to publish the results in the local press (as is common in the United States) is wasteful both of student time and district resources, that is, the cost of administration and scoring. Profiling school performance can be accomplished more efficiently by other means.

In summary, to assess student performance in mathematics, one should consider the kinds of judgments or evaluations that need to be made and tailor the assessment procedures to the decisions that will be made on the basis of those judgments. This is particularly important when the information is being used by policy makers to make programmatic decisions.

HISTORY OF ASSESSMENT AND EVALUATION

The history of the measurement of human behavior, with primary reference to the capacities and educational attainments of school students, may be divided roughly into four periods. During the first period, from the beginning of the historical record to the nineteenth century, measurement in education was quite crude. During the second period, embracing approximately the whole of the nineteenth century, educational measurement began to assimilate, from various sources, the ideas and the scientific and statistical techniques which were later to result in the psychometric testing movement. The third period, dating from about 1900 to the 1960s, can be characterized as the psychometric period. The final period, dating from the 1960s to the present, is the policy-program evaluation period.

Early Examination

The initiation ceremonies by which primitive tribes tested the knowledge of tribal customs, endurance, and the readiness of the young for admission to the ranks of adulthood may be among the earliest examinations employed by human beings. Use of a crude oral test was reported in the Old Testament, and Socrates is known to have employed searching types of oral quizzing. Elaborate and exhaustive written examinations were used by the Chinese as early as 2200 B.C. in the selection of their public officials. These illustrations may be classified as historical antecedents of performance tests, oral examinations, and essay tests. However, there is no evidence that different individuals ever took the same tests, and all judgments were made by officials in a manner similar to that used in examining doctoral students today.

Educational Testing in the Nineteenth Century

Three persons made outstanding contributions to educational testing in the nineteenth century. The ideas of these men (Horace Mann, George Fisher, and J. M. Rice) set the precedent for developments during the present century.

The first school examinations of note appear to have been instituted in the United States, in the Boston schools in 1845, as substitutes for oral tests when enrollments became so large that the school committee could no longer examine all pupils orally. These written examinations, in arithmetic, astronomy, geography, grammar, history, and natural philosophy, impressed Horace Mann, then secretary of the Massachusetts Board of Education. As editor of the Common School Journal, he published extracts from them and concluded that the new written examination was superior to the old oral test in these respects:

1. It is impartial.

2. It is just to the pupils.

3. It is more thorough than older forms of examination.

4. It prevents the "officious interference" of the teacher.

5. It "determines, beyond appeal or gainsaying, whether the pupils have been faithfully and competently taught."

6. It removes "all possibility of favoritism."

7. It makes the information obtained available to all.

8. It enables public appraisal of the ease or difficulty of the questions. (Greene, Jorgenson, & Gerberich, 1953)

Although these ideas are similar to the objectives represented by modern tests, the instruments themselves were inadequate. However, in successive issues of the Common School Journal, Mann was to suggest most of the principles of examination that are found in contemporary measurement, for example, timed responses by students to identical series of questions.

To the Reverend George Fisher, an English schoolmaster, goes the credit for devising and using what were probably the first objective measures of achievement. His "scale books," used in the Greenwich Hospital School as early as 1864, provided the means for evaluating accomplishments in handwriting, spelling, mathematics, grammar and composition, and in several other school subjects. Specimens of pupil work were compared with "standard specimens" to determine numerical ratings that, at least for spelling and a few other subjects, depended on errors in performance (Greene, Jorgenson, & Gerberich, 1953). Scoring procedures for many examinations still follow this procedure (e.g., the English O level examinations).

The use of test information for program evaluation was first developed by J. M. Rice, an American dentist. In 1894, he developed a battery spelling test. Having administered a list of spelling words to pupils in many school systems and analyzed the results, Rice found that pupils who had studied spelling thirty minutes a day for eight years were no better spellers than children who had studied the subject fifteen minutes a day for eight years. Rice was attacked and reviled for this "heresy," and some educators even attacked the use of a measure of how well pupils could spell as a means of evaluating the efficiency of spelling instruction. They intended that spelling be taught to develop the pupils' minds and not to teach them to spell. It was more than a decade later that Rice's pioneering effort resulted in significant attention to objective models in educational testing (Ayres, 1918).

The Psychometric Period

This era began shortly after the turn of the century. Although the historical antecedents sketched in the preceding paragraphs were essential prerequisites, developments first in mental testing and shortly after in achievement testing lay at the roots of testing progress in this era.

General Intelligence Tests. Attempts to measure general intelligence, ability to learn, and ability to adapt oneself to new situations had been made both in the United States and in France. The first individual test was developed in France, and the first group test was developed some years later in the United States.

Individual intelligence scales were originated in 1905 by Binet and Simon in France. Their first scale was devised primarily for the purpose of selecting mentally retarded pupils who required special instruction. This pioneer individual-intelligence scale was based on interpreting the relative intelligence of different children at any given chronological age by the number of questions they could answer of varied types and increasing levels of difficulty. These characteristics were all reembodied in the 1908 and 1911 revisions of the Binet-Simon Scale and remain basic to most individual intelligence scales today. The 1908 revision introduced the fundamentally important concept of mental age (MA) and provided a means for determining it (Freeman, 1930).

The first group intelligence test was Army Alpha, used for the measurement and placement of American army recruits and draftees during World War I. It was the product of the collaboration of various psychologists working on group intelligence tests when the United States entered the war.

Aptitude Tests. The measurement of aptitudes, or those potentialities for success in an area of performance that exist prior to direct acquaintance with that area, was closely related to intelligence testing. Early attempts to measure general intelligence tested many specific traits and aptitudes, but this approach was abandoned after Binet showed that tests of more complex forms of behavior were superior. It was soon apparent, however, that general intelligence tests were not highly predictive of certain types of performance, especially in the trades and industry. Munsterberg's aptitude tests for telephone girls and streetcar motormen were followed by tests of mechanical aptitude, musical aptitude, art aptitude, clerical aptitude, and aptitude for various subjects of the high school and college curriculum (Watson, 1938). Spearman's (1904) splitting of total mental ability into a general factor and many specific factors had a decided influence on this movement.

Achievement Tests. Modern achievement testing was stimulated by Thorndike's (1904) book on mental, social, and educational measurements. Through his book and his influence on his students, Thorndike was predominantly responsible for the early development of standardized tests. Stone, a student of Thorndike's, published the first arithmetic reasoning test in 1908. Between 1909 and 1915, a series of arithmetic tests and scales for measuring abilities in English composition, spelling, drawing, and handwriting were published (Odell, 1930). During the more than half century since these early testing efforts, literally thousands of standardized achievement tests have been published.

The reasons for presenting this brief history of testing are threefold. First, what is referred to as the "modern testing movement" began with a selection problem (Binet & Simon) and a placement problem (Army Alpha). It was assumed that a single measure (e.g., MA) or index (e.g., IQ) could be developed to compare individuals on what was assumed to be a general, fixed, unidimensional trait. In turn, the procedures that evolved in developing and administering these tests were used in aptitude and achievement tests. Second, the testing procedures now considered typical in many countries were developed for group administration of early intelligence tests. Such tests comprise a set of questions (items), each having one unambiguous answer. In this sense, the tests are "objective," since no allowance is made for subjective inferences. Third, subjects are administered the same items under standard (nearly identical) conditions with the same instructions, time, and constraints. Furthermore, subjects' answers can be easily scored as correct or not, the total number of correct answers tallied, tallies transformed, and transformed scores compared. Psychometrics, involving the application of statistical procedures to such tests, developed as a field of study in the 1920s.

Most importantly, it should be understood that the testing movement was a product of a historical era. It grew out of the machine-age thinking of the industrial revolution of the last century. Business, industry, and, in particular, schools have been conceived, modified, and operated based on this mechanical view of the world since before the turn of the century.

The Policy-Program Evaluation Period

Information about student achievement has long been used by teachers and educators to make decisions about students. However, the use of that information for wide-scale policy or program judgments is recent. It began with the burst of reform policies associated with the mid-sixties Great Society initiatives in the United States. Federal-level insistence on evaluation of these initiatives was thrust upon a largely unprepared field. In areas as diverse as bilingual education, career education, compensatory programs, reading, or mathematics, little expertise in evaluation existed in the very agencies responsible for carrying out program evaluation. In fact, in the United States the initial training institute on program evaluation was held at the University of Illinois in 1963 (directed by Lee Cronbach).

That early work followed the notions of Ralph Tyler (1931), the "father of educational evaluation." His conception of evaluation involved comparison between intended and observed program objectives. Tyler's model of evaluation in education prevailed until the 1970s, when his approach, like traditional social science models, was found inadequate as a guide for policy and practice. The Tyler evaluation model was based on the hypothetico-deductive traditions of "hard science." It focused on outcomes and sought significant differences between intended and observed outcomes. Initial evaluations of federal education programs used experimental methodologies to assess student achievement and program effectiveness. As applied, this approach paid little attention to the context of program activities or the processes by which program plans were translated into practice (Sash, 1985; O'Keefe, 1984). The discourse about evaluation included fairly rigid rules for "good" design and "scientific" evaluation. In particular, evaluators gathered data on student performance using standard achievement tests.

In summary, evaluation for policy and program purposes began in the 1960s by attempting to apply "scientific" principles that used concepts from the experimental sciences. The information on students was from tests based on the psychometric assessment technique outlined above. Again, this approach to evaluation is a product of "industrial age" thinking.

TWO SOCIAL POLICY EVALUATION MODELS

Policy makers (legislators, government officials, school administrators, and other educators) must make many decisions related to the teaching and learning of mathematics. In this section, two evaluation models often used by policy makers are examined in detail so that their strengths and weaknesses become apparent.

Program Evaluation

Attempts to evaluate the impact of a new curriculum program involved comparison of the performance of a group of students who had studied mathematics from that curriculum with an alternate group, most often a nonequivalent group. Performance was measured in both groups based on scores derived from the same instrument. Initially, in the United States standardized tests were used; later it became common to use objective-referenced tests, that is, tests that produced scores related to specific instructional objectives.

Norm-referenced standardized tests have become an annual ritual in most American schools. Such tests are designed to indicate a respondent's position in a population. Each test comprises a set of independent, multiple-choice questions. The items have necessarily been subjected to a preliminary trial with a representative pupil group so that it is possible to arrange them in the desired manner with respect to difficulty and the degree to which they discriminate among students. Also, the test is accompanied by a chart or table to be used to transform test results into meaningful characterizations of pupil mental ability or achievement (grade-equivalent scores, percentiles, stanines).
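The transformation from a raw score to a position in a norm population can be illustrated with a minimal sketch. It assumes an invented norm group, hypothetical function names, and the conventional stanine bands of 4, 7, 12, 17, 20, 17, 12, 7, and 4 percent; it is not drawn from any particular published test.

```python
from bisect import bisect_left, bisect_right

# Cumulative percent boundaries of the conventional stanine bands
# (4, 7, 12, 17, 20, 17, 12, 7, 4 percent). Assumed here for illustration.
STANINE_CUTS = [4, 11, 23, 40, 60, 77, 89, 96]


def percentile_rank(raw_score, norm_scores):
    """Percent of the norm group below raw_score, counting ties as half."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, raw_score)
    ties = bisect_right(ordered, raw_score) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


def stanine(pr):
    """Map a percentile rank (0 to 100) onto the nine-point stanine scale."""
    return 1 + bisect_right(STANINE_CUTS, pr)


# Hypothetical raw scores from a norm group of ten pupils.
norm_group = [10, 12, 12, 15, 18, 20, 20, 20, 25, 30]
pr = percentile_rank(20, norm_group)
print(pr, stanine(pr))
```

The derived values describe only where a pupil stands relative to the norm group; the same raw score would receive a different percentile rank against a different norm group.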

Three features of such tests merit comment. First, although each test is designed to order individuals on a single (unidimensional) trait, such as quantitative aptitude, the derived score is not a direct measure of that trait. Second, because individual scores are compared with those of a norm population, there will always be some high and some low scores. This is true even if the range of scores is small. Thus, high and low scores cannot fairly or accurately be judged as "good" or "bad" with respect to the underlying trait. Third, test items are assumed to be equivalent to one another. They are selected on the basis of general level of difficulty (p-value) and some index of discrimination (e.g., nonspurious biserial correlation). Furthermore, the test items are not representative of any well-defined domain.
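The two item-selection statistics named above can be sketched as follows. The response data and function names are hypothetical. The difficulty index is the proportion of respondents answering correctly. For discrimination, the sketch substitutes a point-biserial (Pearson) correlation between the item and the total on the remaining items, which is "nonspurious" in the sense that the item is left out of its own criterion; a true biserial coefficient would require an additional normality assumption and is not computed here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns NaN if either variable has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def item_statistics(responses):
    """responses: one list of 0/1 item scores per student.

    Returns a list of (p_value, discrimination) pairs, one per item.
    """
    n_items = len(responses[0])
    stats = []
    for j in range(n_items):
        item = [r[j] for r in responses]
        # Rest score: total on all other items, so the item is not
        # correlated with itself.
        rest = [sum(r) - r[j] for r in responses]
        p_value = sum(item) / len(item)
        stats.append((p_value, pearson(item, rest)))
    return stats


# Hypothetical scored responses: five students, four items.
scored = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
for p_value, disc in item_statistics(scored):
    print(round(p_value, 2), round(disc, 2))
```

Neither statistic refers to the mathematical content of an item, which is consistent with the observation that items chosen this way need not represent any well-defined domain.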

The primary strengths of standardized tests are that they are relatively easy to develop, inexpensive, and convenient to administer. Furthermore, the results are readily comprehensible since standard procedures are followed.

The primary weakness is that they are often used as a basis for decisions they were not designed to address. For example, aggregating standardized scores for students in a class (or in a school, or district) to produce a class profile of achievement (class mean) is very inefficient. The tests provide too little information in light of the high cost involved. In fact, it has become clear that such tests are of little value for most evaluations, since the items are not intended to be representative of the mathematical domains in the curriculum.

Unfortunately, in the United States their use appears to be driven more by political than by educational purposes. For example, it is claimed that elected officials and educational administrators increasingly use the scores from such tests in comparative ways, to indicate which schools, school districts, and even individual teachers give the appearance of achieving better results (National Coalition of Advocates for Students, 1985). Such comparisons are simply misleading. One can only conclude that standardized tests are unwisely overused.

Objective-referenced tests are a product of the behavioral objectives movement in the 1960s. They were developed to provide teachers with an objective set of procedures with which to make instructional decisions. Item development was based on the identification of a set of behavioral objectives such as, "The subject, when exposed to the conditions described in the antecedent, displays the action specified in the verb in the situation specified by the consequent to some specified criterion" (Romberg, 1976, p. 23). Items randomly selected from a pool designed to represent the antecedent conditions and the same action verb are given to students. From their responses, diagnosis of problems or judgments of mastery of objectives can be made.

Three features of these tests should be mentioned. First, they usually are designed as part of a curriculum and meant to be administered to individuals at the end of a specific instructional topic. Often, they are given individually, and teachers' judgments are made quickly. Second, they have occasionally been used in group settings. For example, the comprehensive achievement monitoring scheme (Gorth, Schriber, & O'Reilly, 1974) periodically assesses student performance on a set of objectives. Third, decisions about performance are made with respect to certain a priori criteria.

The strengths of objective-referenced tests lie in their usefulness in instruction. As long as instruction on a topic focuses on the acquisition of some specific concept or skill, such tests can be used to indicate whether or not the concept has been learned or the skill mastered. Furthermore, such tests are scored easily and are readily interpretable.

Objective-referenced tests have four weaknesses. First, specifying a set of behavioral objectives fractionates mathematical knowledge. In no way is it possible to reflect in these tests the interrelatedness of concepts and procedures in any domain. Second, objective-referenced tests are costly to construct, because hundreds of objectives are included in any instructional program. Third, simple aggregation across objectives is not reasonable, since objectives are interdependent. Fourth, and most importantly, items for higher-level or complex problem-solving processes are very difficult to construct and are usually omitted. In fact, as used, these tests reinforce the factory metaphor of schooling. They clearly do not reflect how students reason about problem situations, interpret results, or build arguments.

The problem faced by most program evaluators in the 1960s was a direct result of the "scientific" tradition. The only evidence deemed of value was student performance at the end of treatment when compared with that of an alternate treatment group, and the evidence was gathered from either a standardized test or, later, a criterion-referenced test. The results of examinations (such as the National Longitudinal Study of Mathematical Abilities, Begle & Wilson, 1970) did not show that the new program was uniformly superior to the old program, but rather that different curricula were associated with different patterns of achievement.

Policy Profiles

Profile tests are intended to provide information on a variety of mathematical topics so that policy makers can compare individuals or groups in terms of those topics. Profile tests have become very popular. They have been developed for several major studies of mathematical performance, including the National Assessment of Educational Progress (NAEP) in the United States, the First International Mathematics Study (FIMS), the Second International Mathematics Study (SIMS), and the Assessment Performance Unit (APU) in England.

Five features of profile assessments distinguish them from previous tests. First, they make no assumption of an underlying single trait; rather, the tests are designed to reflect the multidimensional nature of mathematical content. Second, items similar to those used in standardized or objective-referenced tests are used. However, it must be acknowledged that the mathematics profiles developed by the APU in England (Foxman et al., 1980, 1981) differ from most other profile assessments in the choice and form of items or exercises administered. Their exercises include a variety of open-ended questions, performance tasks, and other items. Third, the unit of investigation is a group rather than an individual. Matrix sampling is usually used so that a wider variety of items can be included. Fourth, comparisons between groups are shown graphically on actual scores so that no transformations are needed (see, for example, Figures 2-1 and 2-2). Finally, validity is determined in terms of content and/or curriculum. Mathematicians and teachers are asked to judge whether individual items reflect a content-by-behavior cell in a matrix. In fact, the usual approach in profile testing is to specify a content-by-behavior matrix. For example, to establish a framework for an item domain, a content-by-behavior grid was developed for each target population in SIMS (Weinzweig & Wilson, 1977). The content dimensions for both Grade 8 and Grade 12 populations were intended to cover all topics likely to be taught in any country. For Grade 8, the content outline contained 133 categories under five broad classifications: arithmetic, algebra, geometry, statistics, and measurement. For Grade 12, the content description was broader, containing 150 categories under seven headings: sets and relations, number systems, algebra, geometry, elementary functions and calculus, probability and statistics, and finite mathematics.

Figure 2-1. Algebra: equations and inequalities. Range of correct responses to the six instruments, by grade (from McLean, 1982, p. 207). Reprinted with permission from the Queen's Printer for Ontario.

(Figure not reproduced. It plots percent correct, from 0 to 100, for each grade level, with columns for mean percent correct and mean percent omits. Notes on the figure: this topic is not part of the Grade 7 or Grade 8 program; a surprisingly large number of Grade 10 Advanced students omitted these items; results indicate that where this skill is needed in Grades 11 and 12 it should be reviewed and practiced then.)

Figure 2-2. Range of correct responses to topic group by grade given in percentages (McLean, 1982, p. 138). Reprinted with permission from the Queen's Printer for Ontario.

(Figure not reproduced. It plots percent correct for Grades 7 through 10 and includes the following statistical summary.)

Statistical Summary

Grade Level    No. of Classes    Grade Mean    Grade St. Dev.
7              97                18.6          11.8
8              98                26.8          12.9
9              122               25.6          15.4
10             103               30.4          13.8
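Matrix sampling, noted above as the usual design when the unit of investigation is a group, can be sketched in a few lines. The sketch assumes a hypothetical pool of twelve items dealt into three disjoint forms and thirty students assigned to forms at random; the function names and sizes are illustrative and are not taken from NAEP, SIMS, or the APU.

```python
import random
from collections import Counter


def build_forms(item_ids, n_forms):
    """Deal the item pool into n_forms disjoint forms, round-robin."""
    return [item_ids[k::n_forms] for k in range(n_forms)]


def assign_forms(student_ids, n_forms, seed=0):
    """Shuffle students, then rotate through the forms so that each form
    is taken by a random subgroup of nearly equal size."""
    rng = random.Random(seed)
    order = list(student_ids)
    rng.shuffle(order)
    return {student: i % n_forms for i, student in enumerate(order)}


items = list(range(12))                  # hypothetical item pool
forms = build_forms(items, 3)            # each student sees 4 of 12 items
assignment = assign_forms(range(30), 3)  # 30 hypothetical students

print(forms)
print(Counter(assignment.values()))
```

Every item is answered by a random third of the group, so group-level percent correct can be estimated for all twelve items while each student answers only four. No individual has a score on the whole pool, which is why such designs do not support ranking or diagnosing individuals.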

For each population in SIMS, the behavior dimension referred to four levels of cognitive complexity expected of students: computation, comprehension, application, and analysis. This classification is adapted from Bloom's Taxonomy of Educational Objectives (1956). The adaptation involved replacing "knowledge" with "computation" and eliminating the higher levels of synthesis and evaluation. Data from such tests can then be reported in several ways. First, they can be reported in terms of items or cell means. For example, in Figure 2-1, the means are presented for six items on a topic (each given a different instrument) for different students at different grades in the province of Ontario, Canada (McLean, 1982). Second, item scores can be aggregated by columns to yield cognitive level scores or by rows to yield topic scores (see Figure 2-2).
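The reporting options just described (cell means, row aggregates, and column aggregates) can be illustrated with a small sketch. The item classifications and percent-correct values below are invented for illustration, and the function name is hypothetical; only the idea of a content-by-behavior grid comes from the text.

```python
from collections import defaultdict


def profile(items):
    """items: iterable of (content, behavior, percent_correct) triples.

    Returns three dicts of means: by cell, by content row (topic scores),
    and by behavior column (cognitive level scores).
    """
    cells = defaultdict(list)
    rows = defaultdict(list)
    cols = defaultdict(list)
    for content, behavior, pct in items:
        cells[(content, behavior)].append(pct)
        rows[content].append(pct)
        cols[behavior].append(pct)

    def averages(groups):
        return {key: sum(vals) / len(vals) for key, vals in groups.items()}

    return averages(cells), averages(rows), averages(cols)


# Hypothetical group-level results: percent of the group answering
# each item correctly, tagged with its cell in the grid.
results = [
    ("algebra", "computation", 80.0),
    ("algebra", "computation", 60.0),
    ("algebra", "application", 40.0),
    ("geometry", "computation", 70.0),
    ("geometry", "analysis", 30.0),
]
cell_means, topic_scores, level_scores = profile(results)
print(cell_means)
print(topic_scores)
print(level_scores)
```

Averaging in this way treats the items within a row or column as interchangeable, an assumption that the discussion of weaknesses below calls into question.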

One strength of profile achievement tests is that they can provide useful information about groups; thus they are particularly useful for evaluating educational policy changes that directly affect classroom instruction. However, profile achievement tests are weak in four specific areas. First, because they are designed to reflect group performance, they are not useful for ranking or diagnosing individuals; an individual student takes only a sample of items. Second, they are somewhat more costly to develop and harder to administer and score than standardized norm-referenced tests. Third, because they yield a profile of scores, they are often difficult to interpret. Finally, and most important, the primary weakness of most profile achievement tests centers on the outdated assumptions underlying the two dimensions of content-by-behavior matrices. The content dimension involves a classification of mathematical topics into "informational" categories. As I have argued:

"Informational knowledge" is material that can be fallen back upon as given, settled, established, assured in a doubtful situation. Clearly, the concepts and processes from some branches of mathematics should be known by all students. The emphasis of instruction, however, should be "knowing how" rather than "knowing what." (Romberg, 1983, p. 122)

Furthermore, items in any content category are tested as if they were independent of one another, a practice that ignores the interconnections between ideas within a well-defined mathematical domain. Schoenfeld and Herrmann (1982) cautioned about the problems inherent in testing students on isolated tasks:

If they succeed on those problems, we and they congratulate each other on the fact that they have learned some powerful mathematical techniques. In fact, they may be able to use such techniques mechanically while lacking some rudimentary thinking skills. To allow them, and ourselves, to believe that they understand the mathematics is deceptive and fraudulent. (p. 29)

Thus, the items should reflect the interdependence, rather than independence, of ideas in a content domain.

The behavior dimension of matrices has always posed problems. All agree that Bloom's Taxonomy (1956) has proven useful for low-level behaviors (knowledge, comprehension, and application), but difficult for higher levels (analysis, synthesis, and evaluation). Single-answer, multiple-choice items are not reasonable at higher levels. One problem is that the Taxonomy suggests that "lower" skills should be taught before "higher" skills. The fundamental problem is the Taxonomy's failure to reflect current psychological thinking on cognition, and the fact that it is based on "the naive psychological principle that individual simple behaviors become integrated to form a more complex behavior" (Collis, 1987, p. 3). In the past thirty years, our knowledge about learning and about how information is processed has changed and expanded.

Bloom's taxonomy of educational objectives epitomizes the domination of American education by scientific management, for it completed the process by which not only the content of learning, but the proxies for its intelligent application, were classified, organized in a linear sequence and, by definition, broken into a hierarchy of mutually exclusive cells. The consequences in the classroom were far reaching. Scope and sequence charts prescribed which parts of a subject were to be covered in what order: each cellular part of each subject was put into a matrix (e.g., Romberg & Kilpatrick, 1969, p. 285); behaviors suggesting desirable intellectual activity were also sequenced. However, given the multiplicity of subject cells to be covered, the easiest way to finish the prescribed course of study was to simply cover content without worrying too much about thought. Furthermore, matrices are difficult to construct effectively on paper in more than two dimensions. Consequently, few scope and sequence charts addressed, in a coherent manner, both levels of thinking and specific aspects of content within an overall discipline. Thus, one focus of concern in documents addressing the quality of education has been the failure of students to attain "higher order intellectual skills" (National Commission on Excellence in Education, 1983, p. 9).

Attacking behaviorism as the bane of school mathematics, Eisenberg (1975) criticized the dubious merit of a task-analysis, engineering approach to curricula, because it essentially equates training with education, missing the heart and essence of mathematics. Expressing concern over the validity of learning hierarchies, he argued for a reevaluation of the objectives of school mathematics. The goal of school mathematics is to teach students to think, to make them comfortable with problem solving, to help them question and formulate hypotheses, investigate, and simply tinker with mathematics. In other words, the focus is turned inward to cognitive mechanisms.

I believe that instruments for assessment should embody the following commonalities:

1. All knowledge is rooted in experience.
2. Knowledge entails the structural modeling of perceived regularities and the reconciling of irregularities.
3. Cohesion of structure is integral and derived from purpose.
4. Quality is determined by predictive power.
5. Disequilibrium is essential to the process.
6. Knowledge is both individual and communal.

Simply stated, there is a need for tools that document the production of knowledge and not merely the proxies that contribute to the process, such as time spent learning or the quality of the teaching staff. A sufficiently detailed view of the process is essential in order to have some idea of how to construct policies for intervention. However, if there is any lesson to be learned from the old paradigm, it is that parts of the process cannot be analyzed in isolation, and then aggregated, with the result regarded as an adequate indicator.


In summary, profiling is important, but current profile tests fail to reflect the way mathematical knowledge is structured or how information is processed within mathematical domains.

TRENDS

In this section four trends are described. The first three are academic or theoretical trends apparent in the literature on assessment and evaluation. The last is a conservative political and practical trend which, in some respects, runs counter to the other trends.

The Trend in Program Evaluation

Far from the limited alternatives of "treatment/control" or randomized designs (see Campbell & Stanley, 1966), contemporary evaluators have developed a diverse assortment of evaluation approaches from which to choose, given purpose, context, and program stage. In contrast to the "one right way" approach of the 1960s, today evaluators have multiple (and not always compatible) approaches. This trend began in the 1970s when scholars trained in disciplines other than experimental psychology were asked to assist in educational evaluations. Scholars like Michael Young (1975), Michael Apple (1979), and a little later Thomas Popkewitz (1984), who were trained in anthropology, sociology, and political science respectively, brought the methods of information gathering and analysis of those disciplines to evaluation. In fact, the list of names or designations for the new methods and models can be confusing to someone unfamiliar with the field of evaluation and the controversies that underlie the various empirical procedures. For example, the catalogue of choices now available to evaluators includes goal-free evaluation (Scriven, 1974); advocate evaluation (Stake & Gjerde, 1974; Reinhard, 1972); connoisseurship (Eisner, 1976); user-driven evaluation (Patton, 1980); ethnographic evaluation (Fetterman, 1984); responsive evaluation (Stake, 1974); and naturalistic inquiry (Guba & Lincoln, 1981).

These diverse approaches to evaluation differ in many respects. Chief among them are the role of the evaluator (from educator to management consultant to assessor to advocate), the role of the client (from active stakeholder and collaborator to passive recipient of the evaluation product), the overall design (from experimental or quasi-experimental to exploratory), and focus (on process, that is, formative evaluation, or on outcome, that is, summative evaluation). Each of these dimensions corresponds to the contingencies upon which evaluation choices are based: purpose, decision context, stage of program development, and status of theory or knowledge base. One consequence for product development was the specification of four stages of evaluation: (1) product design stage: developing a needs assessment; (2) product creation stage: gathering formative data to improve the product; (3) product implementation stage: demonstrating differences between products and making sure appropriate support services are available; and (4) product illuminative stage: an in-depth examination of how the product is actually used (Romberg, 1975).

Another consequence has been the use of a convergent strategy, that is, using several different evaluation models with the same program. For example, in the IGE (Individually Guided Education) Evaluation Study which I directed (Romberg, 1985), we gathered data about reading and mathematics in our school sites in four phases. Phase 1 involved large-scale survey procedures (including the use of a standardized test). Phase 2 was a follow-up study examining the validity of the Phase 1 data. Phase 3 was an ethnographic study of six exemplary IGE schools. Finally, Phase 4 was a detailed examination in Grades 2 and 5 using time-on-task observations and the repeated administration of criterion-referenced tests.

Note also that evaluation experts began calling for better and different instrumentation to gather information about student performance. Overall, while program evaluation models have proliferated and the questions which they must address have become clear, the information used to answer questions too often still comes from inappropriate tests.

It is only recently that it has become apparent that the kind of evidence one needs to gather to judge many programs is, of necessity, different from that obtained from conventional assessment procedures. Tests given in a restricted format (e.g., multiple-choice items) and in a restricted time fail to capture the most important aspects of doing mathematics. During the past decade researchers have developed a plethora of procedures for gathering information from students: think-aloud interview procedures, performance tasks, projects (both individual and group), hierarchical reasoning tasks, and others. Unfortunately, with one notable exception, these procedures have not been used in program evaluations because of the cost of administration.

The exception is the evaluation of the Hewet Mathematics A Project in The Netherlands (de Lange, 1987). In that evaluation five different tasks were used to gather information: timed written tests, two-stage tasks, a take-home task, an essay task, and an oral task. The overall picture of how well students learned, developed from information from the five tasks, is much richer than would have been the case if the researchers had used any one task.

Trends in External Assessment

While past assessment procedures are useful for some purposes and undoubtedly will continue to be used, they are products of an earlier era in educational thought. Like the Model T Ford assembly line, objective tests were considered in the 1920s as an example of the application of modern scientific techniques. Today, we are both technologically and intellectually equipped to improve on outdated methods and instruments. The real problem is that all three forms of tests (profile, standardized, and criterion-referenced) are based on the same set of assumptions: an essentialist view of knowledge, a behavioral theory of learning, and a dispensary approach to teaching. It should be obvious that new assessment techniques need to be developed which are consistent with a different view of knowledge, learning, and teaching.

New evaluation models are being developed which demand new assessment procedures. One approach is based on the specification of mathematical domains and the development of items that reflect that domain (Romberg, 1987). In turn, this assessment approach grows out of the extensive research on such domains. A good example is the work of Gerard Vergnaud with respect to "conceptual fields" (cf. 1982). The principles that are followed in this approach include:

Principle 1. A set of specific and important mathematical domains needs to be identified, and the structure and interconnectedness of the procedures, concepts, and problem situations in each of the domains needs to be specified.


Note that this approach is different from the current approach to specifying the mathematical content of a test in that networks are being defined rather than categories. This means that the interconnections of concepts and procedures with problem situations are as important as mastery of any node (e.g., a specific procedure). For example, consider the following exercise in second grade addition and subtraction:

Sue received a box of candy for her birthday. She shared twenty-seven pieces with her friends and now has thirty-seven pieces left. How many pieces of candy were originally in the box?

To solve this exercise, a child would be expected first to represent the quantitative information with the subtraction sentence □ − 27 = 37. Second, the sentence should be transformed to the addition sentence 27 + 37 = □; then the addition should be performed to yield an answer. What is important is that the child must know that separating situations can be represented by subtraction sentences, that subtraction sentences can be transformed into equivalent addition sentences, and that there are procedures for performing additions. Each piece of knowledge, while important, contributes to a solution process or way of reasoning about a situation that is more important than any single concept or process.
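The three linked steps (represent, transform, compute) can be made explicit in a short Python sketch. The function name and its parameters are hypothetical labels chosen for this illustration; they do not come from the assessment literature.

```python
def solve_separating_problem(removed, remaining):
    """Find the starting amount when `removed` pieces were taken away
    and `remaining` pieces are left."""
    # Step 1: represent the situation as a subtraction sentence:
    #         start - removed = remaining
    # Step 2: transform it into the equivalent addition sentence:
    #         removed + remaining = start
    # Step 3: perform the addition.
    start = removed + remaining
    # Check the answer against the original subtraction sentence.
    assert start - removed == remaining
    return start


print(solve_separating_problem(27, 37))  # 64 pieces were originally in the box
```

A task that scored only the final sum would credit Step 3 alone; the point of the network view is that Steps 1 and 2 are equally part of what is being assessed.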

Principle 2. A variety of tasks should be constructed that reflect the typical procedures, concepts, and problem situations of the chosen mathematical domain.

This is a key principle in that the envisioned tasks are not just a more clever set of paper-and-pencil, multiple-choice test items. Although some typical test items may be appropriate for determining mastery of a specific concept or process, many of the tasks must be different. For example, some should be exercises that require the student to relate several concepts and procedures, such as those in the addition and subtraction example given above. (Note: See also the discussion of the Journey problem and Figures 3-1 and 3-2 in the next chapter.)

Other tasks may emphasize the level of reasoning associated with a set of questions about the same situation, such as the superitem (in Figure 2-3). Still other tasks may ask students to carry out a physical process, such as to gather data, measure an object, construct a figure, work in a group, or organize a simulation. And still others may be open ended, like the Roller Coaster problem (in Figure 2-4).

This is a machine that changes numbers. It adds the number you put in three times and then adds 2 more. So, if you put in 4, it puts out 14.

U. If 14 is put out, what number was put in?
Answer: 4
Comment: Students have to understand the problem well enough to be able to close on the correct response, which is displayed in the stem.

M. If we put in a 5, what number will the machine put out?
Answer: 17
Comment: Students need to comprehend the set problem sufficiently to be able to use the given statements as a recipe and thus perform a sequence of closures which they do not necessarily relate to one another.

R. If we got out a 41, what number was put in?
Answer: 13
Comment: An integrated understanding of the statements in the problem is necessary to carry out a successful solution strategy in this case. Correct solutions may involve working backwards or carrying out a series of approximation trials. It should be noted that the solution requires only data-constrained reasoning in that no abstract principles need to be invoked.

E. If "X" is the number that comes out of the machine when the number "Y" is put in, write down a formula which will give us the value of "Y" whatever the value of "X."
Answer: Y = (X − 2)/3
Comment: A correct response involves extracting the relationships from the problem and setting them down in an abstract formula. It involves using the information given in a way quite different from that of the lower levels.

Figure 2-3. An example of a superitem (Collis, Romberg, & Jurdak, 1986, p. 12). Reprinted with permission.
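The four levels of the number machine item in Figure 2-3 can be checked with a small Python sketch. The function names are hypothetical, chosen only for this illustration; the two inverse functions correspond to the "approximation trials" and "abstract formula" strategies mentioned in the comments on parts R and E.

```python
from fractions import Fraction


def machine(number_in):
    """Add the input three times, then add 2 more (the rule in the stem)."""
    return number_in + number_in + number_in + 2


def input_by_trials(number_out, limit=100):
    """Part R strategy: try whole-number inputs until one gives the output."""
    for candidate in range(limit + 1):
        if machine(candidate) == number_out:
            return candidate
    return None  # no whole-number input up to `limit` produces this output


def input_by_formula(number_out):
    """Part E strategy: the abstract formula Y = (X - 2) / 3."""
    return Fraction(number_out - 2, 3)


print(machine(5))            # part M: 17
print(input_by_trials(41))   # part R: 13
print(input_by_formula(41))  # part E formula gives the same value: 13
```

Both strategies give the same answer for part R, but only the formula generalizes to outputs that no whole-number input produces, which is the distinction the superitem levels are meant to expose.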

The picture above shows the track of a free-wheeling roller coaster, which is travelling at a walking pace between A and B.

1. Write a paragraph describing how you think the speed of the roller coaster varies as it travels along the track. (Use the letters A to G to help you in your description.)

2. Now sketch a graph which shows how the speed varies as it travels along the track. (Don't expect to get it right the first time!)

Figure 2-4. Interpreting a roller-coaster (Swan, 1986, p. 36). Reprinted with permission.

These illustrations demonstrate that there are several different aspects of doing mathematics within any mathematical domain. To be able to assess the level of maturity an individual or group has achieved in a domain requires that a rich set of tasks be constructed.

Principle 3. Some tasks in a particular domain would be administered to students via tailored testing, and for groups via matrix sampling as well.

Not all tasks for a domain need to be given to a student or group to determine the level of maturity. The technology is available to systematically vary several aspects of any exercise or problem situation. For example, for the subtraction exercise under Principle 1, one could vary the situations (join-separate, part-part-whole, comparison, etc.), the size of the numbers, the transformations, and the computational strategies (counting, algorithms, and others).

Principle 4. Based on the tasks administered to a student in a domain, their complexity, and the student's responses to those tasks, the information should be logically combined to yield a score for that domain.

Note that this score is not just the number of correct answers a student has found. Instead, it would involve Boolean combinations of information (such as following inferential rules like "if ___ and ___, then ___"). The intent of the score is that it reflect the degree of maturity the student has achieved with respect to that domain. Note that this assumes all students are capable of some knowledge in several domains.

Principle 5. A score vector over the appropriate mathematical domains would be constructed for each individual or group. Thus, for any individual one would have several scores (x1, x2, ..., xn), where xi is the score for a particular domain.

Note that this simply reinforces the notion that mathematics is a plural noun.
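Principles 4 and 5 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for the example: the three kinds of evidence, the "if ... and ..., then ..." rules, the 0 to 3 maturity scale, and the domain names are hypothetical and are not a scoring scheme proposed in this chapter.

```python
def domain_maturity(evidence):
    """Combine task evidence with if-and-then rules into a level from 0 to 3.

    `evidence` maps hypothetical observation names to True/False.
    """
    represents = evidence.get("represents_situation", False)
    transforms = evidence.get("transforms_sentence", False)
    computes = evidence.get("computes_correctly", False)

    # Rule: if the student represents AND transforms AND computes, then level 3.
    if represents and transforms and computes:
        return 3
    # Rule: if the student represents AND (transforms OR computes), then level 2.
    if represents and (transforms or computes):
        return 2
    # Rule: if the student shows any single isolated skill, then level 1.
    if represents or transforms or computes:
        return 1
    return 0


def score_vector(evidence_by_domain, domains):
    """Build the vector (x1, x2, ..., xn): one maturity score per domain."""
    return tuple(domain_maturity(evidence_by_domain.get(d, {})) for d in domains)


student = {
    "addition_subtraction": {
        "represents_situation": True,
        "transforms_sentence": True,
        "computes_correctly": True,
    },
    "measurement": {"computes_correctly": True},
}
domains = ["addition_subtraction", "measurement", "geometry"]
print(score_vector(student, domains))  # (3, 1, 0)
```

The result is deliberately a vector rather than a single total, so that maturity in one domain is not averaged away by another.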

In summary, awareness of a problem, such as the need for alternative testing procedures for school mathematics, does not mean solutions are easy. It may take years to replace current testing procedures in schools. Nevertheless, this should not deter us from exploring plausible alternatives. What is needed are tasks that provide students an opportunity to reflect, organize, model, represent, and argue within specific domains. Constructing, scoring, scaling, and interpreting responses to such tasks for mathematical domains will not be easy, but will, in the long run, be well worth the effort.

Trends in Assessment by Teachers

One striking consequence of scientific, psychometric assessment procedures has been to deskill teachers. External objective assessment was deemed better than professional judgment. Today, too many teachers are no longer trained in evaluation and lack confidence in their ability to judge student performance (Apple, 1979). In reaction to this awareness, a trend to empower teachers is emerging. For example, the Graded Assessment Project in England (Close & Brown, 1988) provides teachers with procedures to assess performance. This theme is central to the North American National Council of Teachers of Mathematics' Curriculum and Evaluation Standards (1989). It is also a major component in the Australian Mathematics Curriculum Teaching Project (MCTP) (Clarke, 1987) and a focal part of the Cognitively Guided Instruction research project at the University of Wisconsin (Peterson, Fennema, Carpenter, & Loef, 1987).

Political-Practical Trend

In most of the world, it is generally agreed that the educational system, as a whole, and the teaching and learning of mathematics, in particular, need to change. Demands are being made of governments, politicians, and administrators for funds to bring about reform. In turn, of course, these groups have a right to demand that evidence be gathered to prove that their monies are well spent, that changes are in fact made, and that the changes make a difference. Valid pupil performance data are the kinds of information demanded.

However, governmental expectations about such data in the United States and Great Britain revert back to the scientific-experimental notions of the past: behavioral objectives, norm-referenced scores, Bloom's Taxonomy. For example, "attainment targets" in the new national curriculum in Great Britain is merely a new label for behavioral objectives. The use of SIMS items for policy profiles (e.g., in Italy and in some parts of the United States) continues the practice of not assessing problem-solving strategies, communication skills, level of reasoning, and other vital areas. These, along with other examples, make it clear that there is considerable disparity between current theory and these practical demands. The demands for information are legitimate. The validity of procedures is suspect.

CONCLUSIONS

The field of assessment and evaluation has come a long way during the last quarter century. However, a lot of work still needs to be done. Assessment of growth in specific domains has replaced general assessment of status performance over several domains.

Unless we make changes in the way in which we gather information from students, we will only contribute to the ongoing difficulties of sterile lessons and the further deskilling of teachers, and we will lose a major opportunity to change the way mathematics is done. Instead, we need to conceive of curricular evaluations and of assessments of individual progress in light of mathematical maturity in specific domains.

1. Current testing procedures are unlikely to provide valid information for decisions about the current reform movement.

Current tests reflect the ideas and technology of a different era and world view. They cannot assess how students think or reflect on tasks, nor can they measure interrelationships of ideas.

2. Work should be initiated, or extended, to develop new assessment procedures.

Only by having new assessment tools that reflect authentic achievement in specific mathematical domains can we provide educators with appropriate information about how students are performing. Of necessity, this implies that considerable funds be allocated for research and development. Only when new instruments are developed will we no longer be bound by old assessment procedures rooted in the traditions of the industrial age.

3. The emerging variety of evaluation models needs to utilize assessment procedures that reflect the changes in school mathematics.

Today, school mathematics is changing its emphasis from drill on basic mathematical concepts and skills to explorations that teach students to think critically, to reason, and to solve problems. The criteria for judging level of performance by a student or group of students should be based on these notions. This will involve the student's capability, when presented with a problem situation in a specific mathematical domain, of communicating, reasoning, modeling, solving, and verifying propositions. Also, the index or scale developed to measure performance should reflect the student's level of maturity in that domain.


3

Implications of the NCTM Standards for Mathematics Assessment

Norman Webb and Thomas A. Romberg

The primary purpose of this paper is to provide an overview of NCTM's 1989 Curriculum and Evaluation Standards and their implications for mathematics reform. This is followed by a discussion of (1) some underlying assumptions about what it means to know mathematics and (2) ways of organizing mathematical knowledge into conceptual fields. Then, criteria for assessment, compatible with and supportive of the curriculum standards, are presented. Finally, three examples of alternative assessment techniques are given that correspond to the intent of the evaluation standards and illustrate forms of assessment that are applicable in evaluating the curriculum standards.

The National Council of Teachers of Mathematics (NCTM) Commission on Standards for School Mathematics was created in 1986 and charged with the development of a set of curriculum standards concerning (1) the mathematics that ought to be incorporated into quality school mathematics programs and (2) the instructional conditions necessary for students to learn mathematics. In addition, the Commission was asked to develop standards for both the evaluation of a school program based on the new curriculum standards and on student performance in light of those curriculum standards. During the summer of 1987, four working groups met for a month and drafted four sets of standards, one each for grade levels K-4, 5-8, and 9-12, and one on evaluation. Drafts of the standards were distributed to NCTM members during the 1987-88 academic year for their review and commentary. During the summer of 1988, the working groups met again to finalize the standards based on the feedback received and to produce a final document that was officially presented for implementation at the 1989 NCTM Annual Meeting in Orlando, Florida.

The Curriculum and Evaluation Standards for School Mathematics (NCTM, 1989) provides a new vision for the K-12 curriculum. That vision prescribes that mathematical knowledge, because of its dynamic and multiplex nature, be acquired through investigating, exploring, reasoning, making connections, and communicating. The curriculum goal is for students to know mathematics as an integrated whole, including a range of topics many of which are interrelated by common symbols, concepts, rules, and procedures.

In support of this vision, assessment as a means of observing what students know of mathematics needs to be seen differently from traditional forms of testing used in measuring outcomes of present school curricula. Most multiple-choice or fixed-choice tests, in which total scores are based on aggregating results from a set of items scored as correct or incorrect, are designed to measure independent partitioning of mathematics rather than knowledge and the interrelationships among mathematical ideas. The organization of these tests is based on instructional objectives or competencies that reflect a view of mathematics as a large collection of separate skills and concepts. In the new evaluation standards, assessment is viewed as integral to instruction, with the primary role of improving student learning. In this way assessment becomes a process of understanding the meaning students give to mathematics: its concepts, its procedures, the ways problems are solved, the reasonings used, and the means of communication, as well as how one comes to appreciate the mathematical enterprise.

Like mathematical knowledge, assessment also is dynamic and involves a variety of approaches. Assessment is a means for determining students' understanding of mathematical processes and the interrelationships of mathematical topics; it also can be used to determine their ability to apply mathematics in different situations. For the vision of the NCTM Standards to be realized, the new vision for assessment is necessary.

The NCTM Curriculum Standards

Mathematics is changing, and what people need to know about mathematics to be productive citizens is changing. Important factors implicated in these changes are the advances in technology, such as the prevalence of computers and calculators, and the expanding use of quantitative methods in almost all intellectual disciplines. In defining what mathematics is needed, five goals are identified by the Standards for the K-12 curriculum. Students are to develop their mathematical power and become mathematically literate by:

1) learning to value mathematics;
2) becoming confident in one's own ability;
3) becoming a mathematical problem solver;
4) learning to communicate mathematically; and
5) learning to reason mathematically.

There are four common standards, based on these goals, that are part of each set of Standards for each grade grouping: mathematics as problem solving, mathematics as communication, mathematics as reasoning, and mathematical connections. Positioning these standards as the first four of each set attests to their importance and their relevance to all instruction. Although not stated directly as standards, the valuing of mathematics and confidence in doing mathematics are emphasized throughout the descriptions of the standards and the suggested approaches to teaching. Focusing on problem solving, communicating, reasoning, and connections as standards for all three grade groupings recognizes that these will be attained over a period of years as a result of their reinforcement both within and across grade levels.

Solving problems, communicating, and reasoning via mathematics are not independent of each other but develop concurrently through the interaction of each with the other. The development of these mathematical abilities should be viewed as degrees of maturation within each process. Students come to kindergarten already possessing problem-solving strategies for finding answers about situations, words for describing situations, and forms of thinking about situations. Over the school years, through additional experiences, these strategies can be further developed, new strategies learned, and more sophisticated problems solved. The intent of the NCTM Standards is for the mathematics curriculum to become the means for expanding students' existing knowledge; for introducing students to additional forms of mathematical thought; and for developing their power to use mathematics as a means of abstracting the world, interpreting the world, working within the world, and increasing their knowledge of the world.

The approach taken and the topics covered within each grade category of the Standards vary and are affected by the developmental level of students and the inherent structure of mathematics. In Grades K-4, the authors of the Standards argue that the empirical language of the mathematics of whole numbers, common fractions, and descriptive geometry, derived from the child's environment, should be emphasized. In these grades, a foundation for all further study of mathematics is firmly established. Mastery of computational algorithms has generally been considered a primary objective in the current curriculum for the lower grades. Skill and proficiency in calculating by using paper-and-pencil algorithms are important indicators of success in the curriculum. The Standards, on the other hand, maintain that the use of paper-and-pencil algorithms is only one among several forms of computing. In fact, depending upon the problem situation in which a computational answer is sought, one may need to estimate an answer or find an exact answer. If the latter, then one again has choices, depending on the context. One choice is to calculate mentally, a second is to use a paper-and-pencil algorithm, and another is to use a calculator. Thus, students need to learn all computational procedures: estimation, mental arithmetic, paper-and-pencil algorithms, and calculator uses. It is as important to be able to choose among different means of computation as it is to achieve appropriate answers.

Along with using number to describe the world empirically, it is necessary in the lower grades to develop a sense of space and knowledge of the basic concepts and rules for building geometries. Also, the underpinnings of the descriptive and inferential sciences of statistics and function that will be developed in later grades need to be introduced and experienced in Grades K-4. Throughout the process, learning mathematics should involve exploring, validating, representing, solving, constructing, discussing, using, investigating, describing, developing, and predicting.

In Grades 5-8, according to the Standards, the empirical study of mathematics should be extended to include other numbers beyond whole numbers, and emphasis should gradually shift to developing the abstract language of mathematics needed for algebra and more formal mathematics. The middle grades are not viewed as the culmination of the arithmetic curriculum, but are seen as a transition leading to more advanced mathematics. In this sense, the number of topics covered by all students should be increased to include significant work in geometry, statistics, probability, and algebra. The study of arithmetic skills should not be carried out in isolation but should be driven by subject matter provided in these other areas. Work in the middle grades should lead to students thinking quantitatively as well as spatially. There should be an increasing understanding of mathematical structure so that students become more aware of the relationships within and among operations, numbers, spatial figures, and other forms of representation.

In high school, Grades 9-12, students are assumed to have had the mathematical experiences of a broad, rich curriculum and to have reached some degree of computational proficiency. The emphasis of the curriculum for these later grades should be shifted from paper-and-pencil procedural skills to conceptual understanding, multiple representations and connections, mathematical modeling, and mathematical problem solving. In pursuing these, lessons should be designed around problem situations posed in an environment that encourages students to explore, to formulate and test conjectures, to prove generalizations, and to communicate and apply the results of their investigations. An important goal for the high school grades is for students to become increasingly self-directed learners, through experience in instructional programs designed to foster intellectual curiosity and independence. Although there should be variation in the depth and breadth of coverage, all students taking at least three years of high school mathematics should be exposed to algebra, functions, geometry, trigonometry, statistics, probability, discrete mathematics, the conceptual underpinnings of calculus, and mathematical structure. Throughout the curriculum, as these topics are integrated across courses, students should become aware of the structure of mathematics and be able to recognize and make the connections among topics. These connections include forming mathematical representations of problem situations and the ability to distinguish among equivalent forms of representations.

Evaluation Standards and Assessing Mathematics

The fourteen NCTM Evaluation Standards (see Appendix A) are divided into three groups. In one group are the seven student assessment standards that describe what is to be observed and measured in the process of understanding what mathematics students know. These state that in order to adequately test mathematical knowledge, assessment needs to measure knowledge of mathematics as an integrated whole, conceptual understanding, procedural knowledge, problem solving, reasoning, communication, and mathematical disposition. A second group comprises three general assessment standards that present principles for judging assessment instruments. Inherent in the general assessment standards is an assumption that all evaluation processes should use multiple assessment techniques that are aligned with the curriculum and consider the purpose for assessment. A third group comprises the four standards that identify what should be included in evaluating a mathematics program. One purpose of program evaluation is to obtain relevant and useful information for making decisions about curriculum and instruction. These four evaluation standards provide indicators of a mathematics program consistent with the Curriculum Standards, the focus for examining the instructional resources of a mathematics curriculum, the focus for examining instruction and its environment to determine a mathematics program's consistency with the Curriculum Standards, and provisions for program evaluation, to be planned and conducted by an evaluation team.

For the purpose of reflecting on the implications of the NCTM Evaluation Standards for assessing mathematics, this paper will focus on the second group, the three standards that address general assessment criteria: alignment, multiple sources of information, and appropriateness. It is these three standards that can be used to justify the consideration of new or alternative forms for assessing mathematics other than just changing the content of tests.


Alignment of Assessment

In order for methods of assessment to be aligned with the NCTM Curriculum Standards, their spirit and their goals, the assessment methods need to conform to the Standards at the instructional level, program level, and mathematical domain level.

Instructional Alignment. The assessment method will be aligned with instruction to the extent that it covers the breadth of topics taught and provides information about the full range of student outcomes expected. The concept of alignment is used instead of that of validity to stress the dynamic nature of assessment and the need to use multiple sources for information. Where traditionally the validity of one instrument or test is analyzed, the NCTM Evaluation Standards indicate that any one instrument will be insufficient to measure the full intent of instruction based on the NCTM Standards. Thus, in bringing the assessment methods into agreement or into alignment with instruction, it is necessary to consider the variety and range of assessment methods being used.

It is also important to consider the learning environment and its expectations for the use of technology. Alignment means that if certain materials or equipment are being used in instruction and are a part of the mathematical experiences of the learners, then these materials and equipment should be used in assessment. For example, the Curriculum Standards note that calculators are one of several means for computing. Calculators also are a means of exploring and investigating mathematical ideas. Thus, for assessment to be aligned with instruction represented in the Standards, learners should at some time during assessment have the option of using calculators to do computations and to investigate other mathematical ideas.

Program Alignment. In addition to instructional alignment, assessment methods should be aligned with the total K-12 mathematics program and conform to the expectations and goals that students are to have attained at the completion of eleven or twelve years of mathematics. This is referred to as program alignment. The NCTM Curriculum Standards describe what it is that students should know about mathematics, about mathematical concepts and procedures, and about applying reasoning and communication for the purpose of understanding and solving problem situations. The goals for a K-12 program, as described in the Standards, cannot be achieved by aggregating learning that has been partitioned into eleven or twelve segments, but will be achieved only through having a common thrust across grades with each building upon the knowledge developed in previous grades and with each leading toward constructing a complete knowledge of mathematics. There must be an articulated program so that students progress through the grades with their knowledge of mathematics maturing in a logical and deliberate sequence.

When an assessment method is in alignment with the teaching program, the method will measure what students know about mathematics that assures the desired level of knowledge in order for them to be productive citizens throughout their lifetimes. For example, administering timed tests of basic addition and subtraction facts would be part of a program only if this method is supported by collecting evidence on how well students can provide an explanation of their efficient thinking for determining answers in a number of ways. If timed tests are used only to assess the students' memorization of facts without understanding, then timed tests, as a measure of learning, are not in alignment with the program goals that project for students the development of a number sense and the foundation for developing knowledge of the real number system.

The set of assessment methods used at any specific grade level or within any specific course may lack program alignment simply due to the omission of methods of gathering evidence on an aspect rather than because a method is being used that does not coincide with program goals. A major goal of program compliance with the NCTM Curriculum Standards is for students to achieve the ability to communicate mathematically. This means that students need to be able to use the language of mathematics to talk, to write, to listen, and to read mathematics. Students are to be engaged in the communication of mathematics at all grade levels. Student assessment within each grade should include some procedures for observing and measuring the development of student ability to communicate. The assessment situation should match as closely as possible the desired outcomes and normal progression toward the program goal. The assessment of students' communication skills can be made via interviews, or by some other means of having students explain their thinking.

Mathematical Domain Alignment. The third type of alignment for assessment methods is alignment with the field of mathematics, its structure, and organization. The goal is for students to construct a body of knowledge that will result in their having the power of mathematics. If students are to acquire the knowledge of mathematics described in the NCTM Standards, they need to know the concepts, symbols, procedures, reasoning, and processes of mathematics as well as their structure and their interrelationship. Students also need to be able to apply mathematics to situations that add meaning to these symbols and concepts.

For an assessment method to be aligned with the field of mathematics means that students are tested on mathematics in a way that is compatible with the structure of mathematics and how mathematics exists within the minds of students. Consideration of the structure of mathematics in constructing assessment methods affects how tasks are designed and chosen, how tasks are administered, the desired form of response, what rules are followed to make judgments about responses, and how information is aggregated and reported.

One example of a strategy for constructing assessment instruments that is aligned with a conception of the field of mathematics is the domain knowledge strategy (Romberg, 1987). The domain knowledge strategy is based on Gerard Vergnaud's (1982) notions about conceptual fields. His theories are based on the philosophic premise that the power of mathematics lies in the fact that a small number of symbols and symbolic statements can be used to represent a vast array of different problem situations. If a set of symbols represents a closely related set of concepts, referred to as a "conceptual field," then this monitoring framework should allow one to determine the degree of knowledge a student or group has acquired with respect to that domain.

The properties of a conceptual field are:

a set of situations that makes the concept meaningful;
a set of invariants that constitutes the concept; and
a set of symbolic situations used to represent the concept, its properties, and the situations it refers to.


For example, the related mathematical concepts of addition and subtraction of whole numbers have been defined by Vergnaud as the conceptual field additive structure. Such fields are derived in four steps.

1. The symbolic statements (e.g., a + b = c and a - b = c, where a, b, and c are natural numbers) which characterize the domain are identified.

2. The implied task (or tasks) to be carried out are specified. For addition and subtraction, this involves describing the situations where two of the three numbers, a, b, and c, in the statements are known and the other is unknown.

3. Rules (invariants) are identified that can be followed to represent, transform, and carry out procedures to complete the task (e.g., find the unknown number using one or more of such procedures as: counting strategies, basic facts, symbolic transformations such as a + □ = c which implies c - a = □, computational algorithms for larger numbers, and others).

(Note that in the first three steps only formal aspects of a mathematical system are considered.)

4. A set of situations is identified that have been used to make the concepts, the relationships between concepts, and the rules meaningful (e.g., join-separate, part-part-whole, compare, equalize, fair trading).

The result of these four steps is a map (a tightly connected network) of the domain knowledge. It is this map that is used as a framework for assessment, instead of other possible frameworks, such as a content-by-behavior framework. In addition to the additive structure, other conceptual fields would include multiplicative structure, proportional reasoning structure, probability structure, spatial structure, logical structure, relational structure, iterative (discrete) structure, measurement structure, algebraic structure, integral structure, and the differential structure. These structures overlap in that some will use common symbols, concepts, and rules. Problem situations that will be used also apply across the different fields.
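The formal part of the four steps can be illustrated with a short sketch. The Python below is an illustration only and is not taken from Romberg (1987) or Vergnaud (1982); the names `AdditiveTask`, `tasks_for`, and `solve` are hypothetical. It assumes the single statement a + b = c over natural numbers, generates the three tasks in which two numbers are known and one is unknown (step 2), and finds the unknown by the symbolic transformations mentioned in step 3.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AdditiveTask:
    """One task from the statement a + b = c; the unknown is stored as None."""
    a: Optional[int]
    b: Optional[int]
    c: Optional[int]


def tasks_for(a: int, b: int) -> List[AdditiveTask]:
    """Step 2: the three tasks in which two numbers are known and one is not."""
    c = a + b
    return [
        AdditiveTask(a, b, None),   # a + b = [ ]
        AdditiveTask(a, None, c),   # a + [ ] = c
        AdditiveTask(None, b, c),   # [ ] + b = c
    ]


def solve(task: AdditiveTask) -> int:
    """Step 3: find the unknown with a symbolic transformation of a + b = c."""
    unknowns = [n for n in ("a", "b", "c") if getattr(task, n) is None]
    if len(unknowns) != 1:
        raise ValueError("exactly one of a, b, c must be unknown")
    if task.c is None:
        return task.a + task.b   # direct addition
    if task.b is None:
        return task.c - task.a   # a + [ ] = c implies c - a = [ ]
    return task.c - task.b       # [ ] + b = c implies c - b = [ ]


if __name__ == "__main__":
    for task in tasks_for(6, 2):
        print(task, "->", solve(task))
```

The sketch covers only the formal aspects (steps 1 to 3). The situations of step 4, such as join-separate or compare problems, are what give each of the three tasks its meaning and are not modeled here.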


One of the implications of using a domain knowledge strategy is that the set of situations that gives meaning to the concepts and rules is as important as learning to follow the procedural rules. Knowledge of a domain is viewed as forming a network of multiple possible paths and not partitioned into discrete segments as implied by such models as the content-by-behavior model. Over time, the maturation of a person's functioning within a conceptual field should be observed by noting the formation of new linkages, the variation in the situations the person is able to work with, the degree of abstraction that is applied, and the level of reasoning applied. What is important for alignment is that the assessment methods conform to some conception of the field being assessed.

Mathematics is plural and represents a field of study composed of several domains. To increase the knowledge of mathematics can be interpreted as maturing in the knowledge of each domain. However, there are common ideas, symbols, and procedures that interconnect the different domains. One challenge for better understanding the interrelationship of domains is to construct a map of how specific domains are related. Such a map would be very valuable for guiding both interaction and assessment.

Multiple Sources of Information

In the use of multiple forms of assessment, inferences about what a student knows must be based on confidence in how the evidence from different sources converges to support a single conclusion. Traditional notions of reliability are not as meaningful or as applicable when trying to determine what someone knows and when making instructional decisions. Any one source of measurement, such as a test, will naturally have built-in error as a measure of what mathematics a person knows, simply because mathematical knowledge is multifaceted. It is also not feasible to expect teachers and others who develop assessment instruments to have the time and resources to develop highly reliable tests in the classical sense. For instructional purposes, assessment should be viewed as an ongoing process of gathering information for making instructional decisions and for reporting the outcomes of instruction in relation to the domain of mathematical knowledge. Confidence in one source of evidence can only be achieved by supporting evidence from other sources. To assess the intent of the Curriculum Standards, many different forms of assessment will have to be used.

Appropriate Assessment Methods and Uses

A general assessment criterion is that the assessment method be appropriate to the use that will be made of the results. As with any assessment, the method needs to coincide with the purpose for doing the assessment and with the developmental and mathematical maturity of the students. There are many different purposes for assessment such as grade-to-grade promotion, graduation requirement, diagnosis, instructional grouping, and program evaluation. Assessment for the purpose of judging the strengths and weaknesses of a school district mathematics program will need to use methods different from those used for assessment for the purpose of assigning grades to individual students. In deciding on any method of assessment, the purpose of the assessment needs to be an important consideration.

The more closely the form of assessment matches the level of mathematical maturation of the learner, the more useful will be the information obtained. A key principle of assessment is to determine what mathematics the learner knows. This is done by locating where the student is on the map of mathematical knowledge by noting what meaning the learner gives to concepts and symbols, what procedures the learner knows and can use, and how the learner is able to reason, solve problems, and communicate. In order to locate the individual precisely, the assessment instrument must be sensitive to the distinctions that the learner makes. This requires refined assessment instruments. It implies that manipulatives should be a part of the assessment environment when assessing the mathematical knowledge of primary age learners. Instruments that distinguish among forms of abstract knowledge should be useful in assessing the knowledge of eleventh and twelfth graders who have experienced a curriculum corresponding to the Standards.

Criteria for Judging Assessment Instruments

In judging assessment instruments for meeting the main criteria listed in the Evaluation Standards, four points need to be considered.


1. The assessment instrument should provide information that will contribute to decisions for the improvement of instruction.

2. The assessment instrument should be aligned with the instructional goals, the goals for the overall program, and a holistic conceptualization of mathematical knowledge.

3. The assessment instrument should provide information on what a student knows.

4. The results from one assessment instrument should be such that when combined with results from other forms of assessment, a global description is obtained of what mathematics a person or group knows.

Three Examples of Alternative Assessment Tasks

The components most commonly encountered in large scale assessments are fixed-choice items, multiple-choice items, and items requiring short answers. Classroom assessment is thought of as giving students a set of problems in symbolic form, such as an equation, sequence of numbers, or words, with each problem requiring a number as an answer. Other than these, there are few alternative forms of assessment of mathematics currently in use. There are few that provide information on the communication of mathematics, on the understanding of mathematics as an interrelated set of ideas, and on how well the learner is able to gain meaning from the situation.

Recently certain interesting methods of assessment have been explored that show some promise and conform to the vision of the Standards. Some of these newer methods have been developed in other countries as problems of reform in those countries are being addressed. Examples from three different sources are given below. These examples have been selected to illustrate different means of assessment that seem to adhere to the spirit and recommendations in the NCTM Standards. The three examples were also selected to reflect what can be done at different grade levels. However, each assessment method discussed should be applicable to a broader grade range than the one presented.


Grades K-4 Assessment

The work of Carpenter, Fennema, Peterson, and Carey (1987) on the Cognitively Guided Instruction Project illustrates making assessment a part of instruction. A guiding principle for Cognitively Guided Instruction is that instructional decisions should be based on careful analyses of students' knowledge and the goals of instruction (Carpenter & Fennema, 1988). Instruction should be appropriate for each child's level of knowledge and facilitate growth on successive levels. This requires that individual students be assessed at regular intervals and that instruction planning be based on the results of assessment. In Cognitively Guided Instruction, it is important to assess not only whether a learner can solve a particular problem, but also how the learner solves the problem. This is very compatible with the emphasis of the NCTM Standards on problem solving and reasoning.

Three methods of assessment are being explored by the Cognitively Guided Instruction Project (Carpenter, Fennema, & Peterson, unpublished): assessing children in individual interviews, assessing children during group instruction, and spot-checking assessment during seatwork. Clearly, interviewing or observing each student in a class requires some form of organization for assessment so that all students are observed or interviewed during the assessment period. There also needs to be some means for systematically recording learners' responses. It is important when using these methods for the teacher to have knowledge of the classification of problems and of children's strategies. In this way, the teacher is able to ask questions or to structure the situation to more appropriately conform to the learner's development and mathematical maturity. In viewing assessment in this way, it is not necessary to use large batteries of problems in order to make important and relevant decisions for instruction. What is important is to give the learner a problem that matches or nearly matches his/her level of knowledge. In interviews, there is more flexibility to make adjustments, such as leading the student in a given direction.

The following protocol was presented as a way of indicating whether a child is ready to proceed from Counting All (given two sets of objects, the student counts all objects, beginning with one, to determine the sum) to Counting On (given two sets of objects, the student counts beginning with the number in one set).

TEACHER: "There are 6 pennies in the bank." (The teacher places the pennies in the bank without counting each one.) "How many pennies will be in the bank if we put in 2 more?"

Paul begins to count on his fingers, "1, 2, 3, 4, 5, 6" to establish the number 6, then hesitates and counts 2 more fingers. He looks at the 8 fingers and says "Eight."

TEACHER: "Do you think you could solve the problem without counting on all 8 fingers?"

PAUL: No response.

TEACHER: "When you count, what number comes after 6?"

PAUL: "Seven comes after 6."

TEACHER: "Right. Suppose we had 7 pennies in the bank and we add 1 more penny. How many pennies would we have? Can you think of the number that is 1 more than 7 when you count?"

PAUL: "Well.... 7, 8. Eight comes after 7."

TEACHER: "Good. Let's put 7 pennies in the bank." (Teacher places chips in groups.) "If we put 2 more pennies in the bank, can we figure out how many pennies there will be altogether?"

PAUL: "Seven, (pause) 8, 9. There are 9 pennies."

To assess whether Paul could independently Count On, a similar problem using the number fact 5 + 2 was given. (Carpenter, Fennema, & Peterson, unpublished, pp. 7-8)

The process described in this protocol is an example of a teacher leading a student through a situation. Through the process of trying to determine whether Paul could approach the problem by some means other than concretely using fingers, the teacher, by asking questions, is able to determine if the student has some of the knowledge necessary to proceed. In this situation, a student could make a simple counting error and produce a wrong answer. Without interviewing, a teacher may reach the wrong conclusion about the student's readiness to use the Count On strategy.
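The difference between the two strategies in the protocol can be made concrete with a small sketch. The Python below is only an illustration of the definitions of Counting All and Counting On given above; the function names `count_all` and `count_on` are hypothetical and do not come from the Cognitively Guided Instruction materials. Each function returns the sequence of number words a child would say, so the last element is the sum.

```python
from typing import List


def count_all(first: int, second: int) -> List[int]:
    """Counting All: count every object in both sets, beginning with one."""
    return list(range(1, first + second + 1))


def count_on(first: int, second: int) -> List[int]:
    """Counting On: begin from the number in the first set, count only the second."""
    return list(range(first + 1, first + second + 1))


if __name__ == "__main__":
    # Paul's first attempt: 6 pennies and 2 more, counted on all 8 fingers.
    print(count_all(6, 2))
    # Paul's later attempt: "Seven, (pause) 8, 9."
    print(count_on(7, 2))
```

Both strategies end on the same number, which is why a correct final answer alone does not show which strategy a child used; the teacher needs the interview to see the sequence.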

Grades 5-8 Assessment

An example of using a situation to partially assess middle school students' knowledge of mathematics as communication and reasoning and knowledge of number comes from the Shell Centre for Mathematical Education (Swan, 1985) in England. Figure 3-1 illustrates a specimen examination question from a module aimed at developing the performance of children in interpreting and using information presented in a variety of familiar mathematical and nonmathematical forms. Figure 3-2 gives the marking scheme for the question.

This particular example requires the learner to describe in words the relationship of one form of representation (a map) to another form of representation (a graph). Then in sketching a graph, the learner has the opportunity of modeling a situation graphically. In constructing the graph, some proportional reasoning is required. Of note is that more than one question is derived from a situation. This provides an opportunity to observe the interrelatedness of different forms of representation. The learner will need to have a knowledge of concepts in order to describe the car journey. Procedural knowledge is required in reading the map and graph in order to get information from each. In order to sketch the graph, reasoning is required to determine the speed of the car in relation to the time. The marking scheme then gives some indication of what a learner knows by indicating a global score based on how well that student is able to bring everything together.
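The proportional reasoning involved is the conversion of a distance covered in a given time into a speed. As a minimal sketch, assuming only the values quoted in the marking scheme (60 miles in one hour, 50 miles in 50 minutes, and a tolerance of ± 5 mph), the Python below computes the average speed for a stage of the journey. The function names `average_speed_mph` and `within_tolerance` are hypothetical and are not part of the Shell Centre materials.

```python
def average_speed_mph(distance_miles: float, time_minutes: float) -> float:
    """Average speed over a stage: distance divided by time, in miles per hour."""
    if time_minutes <= 0:
        raise ValueError("time must be positive")
    return distance_miles * 60 / time_minutes


def within_tolerance(speed: float, target: float, tolerance: float) -> bool:
    """True if a stated speed lies within target ± tolerance."""
    return abs(speed - target) <= tolerance


if __name__ == "__main__":
    stage_a_b = average_speed_mph(60, 60)   # 60 miles in one hour
    stage_c_d = average_speed_mph(50, 50)   # 50 miles in 50 minutes
    stopped = average_speed_mph(0, 30)      # no distance covered: a stop
    print(stage_a_b, stage_c_d, stopped)
    print(within_tolerance(stage_c_d, 60, 5))
```

The two motorway stages give the same speed even though the distances and times differ, which is the equivalence a learner has to recognize to answer "travelling at the same speed as before."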

This example demonstrates a form of assessment aligned with a conception of mathematics that involves different forms of representations within the conceptual field of proportional reasoning. A learner who is able to score high on this situation shows a good understanding of speed as a form of proportion by being able to derive meaning from the map and graph. By providing a score for the different parts of the situation, it is


THE JOURNEY

The map and the graph below describe a car journey from Nottingham to Crawley using the M1 and M23 motorways.

[Map of the route and graph of distance (miles) against time (hours), with the stages of the journey marked A to F.]

(i) Describe each stage of the journey, making use of the graph and the map. In particular describe and explain what is happening from A to B; B to C; C to D; D to E and E to F.

(ii) Using the information given above, sketch a graph to show how the speed of the car varies during the journey.

[Blank axes for the answer: speed (mph) against time (hours).]

Figure 3-1. One examination question from the Shell Centre for Mathematical Education (Swan, 1985, p. 13). Reprinted with permission.


THE JOURNEY ... MARKING SCHEME

(i) Interpreting mathematical representations using words and combining information to draw inferences.

Journey from A to B
    'Travelling on the M1': 1 mark
    'Travelling at 60 mph' (± 5 mph) or 'travels 60 miles in one hour': 1 mark

Journey from B to C
    'Stops' or 'At a service station' or 'In a traffic jam' or equivalent: 1 mark

Journey from C to D
    'Travelling on the motorway': 1 mark
    'Travelling at the same speed as before' or 'Travelling at 60 mph (± 5 mph)' or 'Travels 50 miles in 50 minutes' (± 5 mins.): 1 mark

Journey from D to E
    'Travelling through London': 1 mark
    'Speed fluctuates', or equivalent, e.g. 'there are lots of traffic lights'. Do not accept 'car slows down'.: 1 mark

Journey from E to F
    'Travelling on the motorway' or 'Travelling from London to Crawley': 1 mark

(ii) Translating into and between mathematical representations.

For the general shape of the graph, award:

1 mark if the first section of the graph shows a speed of 60 mph (± 10 mph) reducing to 0 mph.

1 mark if the final section of the graph shows that the speed increases to 60 mph (± 10 mph), then decreases to 20 mph (± 10 mph), and then increases again.

For more detailed aspects, award:

1 mark if the speed for section AB is shown as 60 mph and the speed for section CD is shown as 60 mph (± 5 mph).

1 mark if the changes in speed at 1 hour and 1½ hours are represented by (near) vertical lines.

1 mark if the stop is correctly represented from 1 hour to 1½ hours.

1 mark if the speed through London is shown as anything from 20 mph to 26 mph or is shown as fluctuating.

1 mark if the graph is correct in all other respects.

A total of 15 marks are available for this question.

Figure 3-2. Scoring scheme for examination question given in Figure 3-1 (Swan, 1985, p. 13). Reprinted with permission.
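The proportional reasoning that part (ii) asks for can be sketched in a few lines. The following is a minimal illustration, not part of the Shell Centre materials: it takes the distance and time values quoted in the marking scheme (60 miles in one hour, a stop until 1½ hours, then 50 miles in 50 minutes) and derives the speed for each stage as distance divided by time. The point list and the function name `segment_speeds` are our own.

```python
from fractions import Fraction

# (label, time in hours, distance in miles), using values quoted in the
# marking scheme: AB is 60 miles in 1 hour, BC is a stop until 1.5 hours,
# CD is 50 miles in 50 minutes.
points = [
    ("A", Fraction(0), 0),
    ("B", Fraction(1), 60),
    ("C", Fraction(3, 2), 60),
    ("D", Fraction(3, 2) + Fraction(50, 60), 110),
]


def segment_speeds(pts):
    """Return (stage, average speed in mph) for each consecutive pair of points."""
    result = []
    for (name0, t0, d0), (name1, t1, d1) in zip(pts, pts[1:]):
        result.append((name0 + name1, (d1 - d0) / (t1 - t0)))
    return result


speeds = segment_speeds(points)
for stage, mph in speeds:
    print(stage, float(mph))
```

Each stage of constant speed on the distance-time graph becomes a horizontal segment on the speed-time graph, which is why the marking scheme rewards (near) vertical lines at 1 hour and 1½ hours.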


easier to determine what a student knows and what links have formed. Having students write a response to the problem-solving process provides an occasion to observe the use of language to describe a situation given in pictorial form using a map and a graph.

Grades 7-12 Assessment

In The Netherlands, mathematics education is in a state of transition just as it is in this country. Out of the reform effort has come a course referred to as Mathematics A (de Lange, 1987) that is aimed at students who are expected to pursue studies at the university in disciplines where mathematics is needed only as a tool. Many of the students who take Mathematics A will specialize in economics, social sciences, and medicine. One form of assessment used is written timed tests. These are classified as follows:

1. Class 1. Exercises without context, or with hardly any context

2. Class 2. Exercises with a substantial use of context
   a. Exercises strongly resembling the exercises from the booklet¹
   b. Exercises resembling those of the booklet somewhat, but not strikingly
   c. Exercises not resembling those of the booklet.

One sample form of assessment included in the Mathematics A materials is a "Two-Stage" task. In this, learners are given a situation and asked to respond to as many of the questions as possible in a traditional timed written test. The first half of the test consists mainly of open-ended questions. The second half of the test may include essay questions. The results are scored and then returned to the student.

For the second stage, students are provided information on their scores and on the gross mistakes they made in the first stage. Then the students are asked to repeat their work on the problem situation at home where they have no restrictions and are completely free to answer the questions as they choose. Students may be given as much as three weeks to do this. Then students are scored on both stages.

¹A booklet is a separately bound unit based on a realistic set of problems. The course is organized around several booklets, each of which takes two to four weeks to complete (de Lange, 1987). Reprinted with permission.

An example of a problem given is the Forester Problem. This is to test learners' knowledge of matrices (de Lange, 1987, pp. 187-89).

FORESTER PROBLEM

A forester has a piece of land with 3,000 Christmas trees. Just before Christmas he cuts a number of trees to sell them. The forester distinguishes three classes of length: S, M, L trees. The small trees have just been planted and have no economic value, the medium trees are sold fl. 10,- a piece, and the large ones for fl. 25,-. He has, just after Christmas, 1,000 small, 1,000 medium, and 1,000 large trees. All these grow uneventfully until just before next Christmas. From experiences of colleagues, he knows approximately about the growth per year:

40% of the small trees become medium
20% of the medium trees become large

Or, in a GRAPH:

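[Graph: nodes S, M, and L, carrying the labels 0.6, 0.8, and 1, respectively.]

This graph may be represented by a GROWTH-MATRIX G:

[Matrix G, with columns labeled "from" S, M, L and rows labeled "to" S, M, L; entries to be completed.]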

1. Complete the matrix G.

2. Calculate the composition of the forest just before the next Christmas (using G).

3. After cutting medium and large trees and planting small trees, the forester wants his starting population B (1,000 of each) back. How many of each kind should be cut and planted?

4. Cutting one tree costs fl. 1,-; planting one tree costs fl. 2,-; what will be the forester's profit this Christmas?

The forester wonders whether the above strategy is the most profitable one. He considers two other strategies, so he has the choice of the following three strategies:

I. Cut after one year and plant so as to get your starting population back.

II. Cut after two years and plant back so as to get the starting population.

III. Cut after one year the large trees only (leaving 1,000) and replant the same number of small trees; repeat this the second year.

5. Which of the above three strategies is the most profitable per year?

The forester considers the use of fertilizer to make the trees grow faster. There exists a fertilizer that, according to the producer, might lead to the following growth-matrix:

$$G = \begin{pmatrix} 0.6 & 0 & 0 \\ 0.4 & 0.5 & 0 \\ 0 & 0.5 & 1 \end{pmatrix}$$

6. Explain why the trees grow faster with this model.

The forester likes to use the fertilizer but has some doubts whether it would not be possible to get the start population B back after each Christmas, because getting back the B-population is essential to him.

7. Will it be possible to get back the start population B when the matrix G is of issue (after one year)?

8. The forester decides to thin the fertilizer such that to get his B-population back every year he only has to cut large trees and to replant the same number of small trees. Can you suggest any matrix G that will have the desired effect?

9. Using all information available, what would you advise the forester?

Another forester prefers five length-classes:

Class 1, 2, 3, 4, 5, with $p_1 = 0.4$, $p_2 = 0.3$, $p_3 = 0.4$, $p_4 = 0.2$

$$B = \begin{pmatrix} 1{,}000 \\ 1{,}000 \\ 1{,}000 \\ 1{,}000 \\ 1{,}000 \end{pmatrix}$$

10. Write an essay including:
    the growth-matrix in this situation
    the effect on the total population after one year
    the possibility to get B back
    if this is impossible, change one of the entries of the matrix in order to make it possible
    the effect if a tree manages to grow in one year from class 3 to class 5.

11. Find the matrix for the general case:

How can you conclude from the matrix whether getting B is possible or not?

12. What are the limitations of the model? What refinements would you like to suggest?

13. (See question 5) The third strategy was: Cut after one year the large trees only (leaving) and replant the same number of small trees; the same is done the next year(s). What will be the effect of this strategy in the long run?
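To make the arithmetic behind questions 1 to 4 concrete, here is a minimal sketch under our own reading of the problem statement. The completed growth matrix is an assumption inferred from the stated rates (40 percent of small trees become medium, 20 percent of medium trees become large, the rest stay in their class); the names `G_PERCENT` and `grow` are hypothetical, and percentages are kept as integers to avoid rounding.

```python
# Rows are "to" classes, columns are "from" classes, in the order S, M, L.
# Entries are percentages inferred from the stated growth rates.
G_PERCENT = [
    [60, 0, 0],    # to S: 60% of small trees stay small
    [40, 80, 0],   # to M: 40% of small become medium, 80% of medium stay
    [0, 20, 100],  # to L: 20% of medium become large, all large stay
]
B = [1000, 1000, 1000]  # starting population (S, M, L)


def grow(matrix_percent, population):
    """Apply one year of growth: a matrix-vector product with percent entries."""
    return [
        sum(p * n for p, n in zip(row, population)) // 100
        for row in matrix_percent
    ]


after_growth = grow(G_PERCENT, B)

# Restore B: plant the missing small trees, cut the surplus medium and large.
planted_small = B[0] - after_growth[0]
cut_medium = after_growth[1] - B[1]
cut_large = after_growth[2] - B[2]

# Prices and costs in guilders (fl.) from the problem statement.
revenue = cut_medium * 10 + cut_large * 25
costs = (cut_medium + cut_large) * 1 + planted_small * 2
profit = revenue - costs

print(after_growth, planted_small, cut_medium, cut_large, profit)
```

Under these assumptions the forest grows to 600 small, 1,200 medium, and 1,200 large trees, so the forester cuts 200 medium and 200 large trees, plants 400 small ones, and clears fl. 5,800.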

The Forester Problem illustrates the development of several questions from one situation. The questions are sequenced by difficulty and complexity. The final questions require the student to make generalizations and draw conclusions from the previous work. One difficulty in giving such an assessment task is the scoring. De Lange noted that teachers approach the scoring in different ways. One approach was to read the complete work, mark positive and negative points, and then give a grade. It is possible that such a situation could provide some rich data on motivated students who complete the task.

The three examples given above were presented to generate thought and dialogue regarding assessment. Each demonstrates a means for assessing what students know about an area of mathematics. The Carpenter, Fennema, and Peterson protocol for a teacher-student interview illustrates a teacher using information from the student and leading the student to a more advanced counting strategy. This requires a deep understanding by the teacher of the procedures and of the student's thinking process. But in the interview situation it becomes apparent that the student begins to make links between existing knowledge and computation of the sum by using Counting On. The use of this type of approach by the teacher supports the K-4 Curriculum Standards by teaching students different strategies through using known procedures. The teacher interview is a viable assessment approach for the Evaluation Standards, because the teacher obtains information on the students' procedural knowledge that can lead to further development of the new strategy.

The Swan examination question illustrates the assessment of students' knowledge of different forms of representation. This approach allows the teacher to understand better what the student is able to do with proportional reasoning. Having the student respond verbally and graphically provides an opportunity to observe the link between the concepts of speed and the different forms of the language of mathematics. Awarding scores on parts of the task indicates knowledge of the same concept in different contexts created at different stages of the journey. The de Lange Forester Problem illustrates an assessment situation that extends to more than one approach: a timed test and an extended test. For one situation, multiple questions are asked. These questions cover a span of knowledge on the use of matrices, from applying basic operations to being able to generalize to n dimensions. Having students respond to a number of questions regarding a given situation in different ways under different conditions provides the opportunity to observe the use of multiple sources of information and to ensure a clearer understanding of the depth of knowledge the student has.
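The step from the three-class forest to n classes can be sketched as follows. This is an illustration under stated assumptions, not de Lange's solution: we assume that each year a fraction p[j] of class j moves up exactly one class and the rest stays, and that the start population can be recovered only by cutting trees and planting small ones. The five-class fractions 0.4, 0.3, 0.4, 0.2 are our reading of the second forester's data; the function names are hypothetical.

```python
from fractions import Fraction


def growth_matrix(p):
    """Build the growth matrix for move-up fractions p (one per class but the last).

    Column j is the "from" class and row i the "to" class: a fraction p[j]
    of class j moves to class j + 1 and the rest stays; the top class keeps all.
    """
    n = len(p) + 1
    G = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n):
        move = p[j] if j < n - 1 else Fraction(0)
        G[j][j] = 1 - move
        if j < n - 1:
            G[j + 1][j] = move
    return G


def apply(G, population):
    """Matrix-vector product: the population after one year of growth."""
    return [sum(g * x for g, x in zip(row, population)) for row in G]


def can_restore(after, start):
    """True if cutting trees and planting small ones can bring back `start`.

    Trees can only be added to the smallest class, so every other class
    must hold at least its starting number after growth.
    """
    return all(a >= s for a, s in zip(after[1:], start[1:]))


p5 = [Fraction(4, 10), Fraction(3, 10), Fraction(4, 10), Fraction(2, 10)]
B5 = [1000] * 5
after5 = apply(growth_matrix(p5), B5)
print([int(x) for x in after5], can_restore(after5, B5))

p3 = [Fraction(4, 10), Fraction(2, 10)]
B3 = [1000] * 3
after3 = apply(growth_matrix(p3), B3)
print([int(x) for x in after3], can_restore(after3, B3))
```

With equal starting numbers, class i ends the year at or above its starting size exactly when the fraction flowing in is at least the fraction flowing out, so under these assumptions the condition can be read directly off the matrix entries.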

The three examples discussed in this paper generate more questions than answers. The examples correspond to the spirit of the NCTM Standards and provide evidence of the existence of some alternative forms of assessment. But the challenge exists for those in testing and evaluation to make use of such examples and develop others that are both aligned with curriculum and provide useful and accurate information. In doing this, the forms of assessment must be compatible with the structure and nature of mathematics and aligned with instruction and the curriculum program. For teachers, the challenge is (1) to become comfortable with the different means of assessment and (2) to apply these tools for building instructional strategies and an understanding of what students know.


4

Curriculum and Test Alignment

Thomas A. Romberg, Linda Wilson, Mamphono Khaketla, and Silvia Chavarria

The purpose of this chapter is to report on information gathered from two studies related to the reality of Evaluation Standard 1: Alignment of the NCTM Standards (see Appendix A). In the first study, the six standardized tests most widely used at state and district levels in schools in the United States are examined to determine whether or not they are appropriate instruments for assessing the content, process, and levels of thinking called for in the Standards. The results show that the tests are not appropriate. They are found to be generally weak in five of the six content areas and in five of the six process areas. Furthermore, the tests place too much emphasis on procedures and not enough on concepts. In the second study, we conduct an examination of items and tests from newly developed state tests and foreign tests. It is clear that there are test items currently in use and some being developed that provide the kind of breadth of content and depth of knowledge cited in the Standards.

PART I: THE STUDY OF SIX STANDARD TESTS

Background

Romberg, Wilson, and Khaketla's 1989 study "An Examination of Six Standard Mathematics Tests for Grade 8" followed an earlier large-scale questionnaire survey conducted by Romberg, Zarinnia, and Williams (1989). The survey was conducted to find out from Grade 8 teachers how mandated testing influenced their teaching of mathematics. It sought information from teachers about district and state tests that their students took, the amount of time they spent on testing and preparing students to take tests, how they used test results, what their views were about the effects of the tests, how the tests influenced their teaching, and what they perceived as the influence testing had on the mathematics curriculum.

The results of the study indicate that nearly 70 percent of the teachers report that their students take a mandated test at either the district or state level or both. Secondly, because teachers know the form and character of the tests their students take, most teachers make changes in their teaching to reflect this knowledge. Thirdly, the changes teachers make in their classroom practice tend to contrast with the recommendations made by NCTM's Curriculum and Evaluation Standards for School Mathematics (1989). For example, the Standards recommended more activities involving the use of calculators in the classroom (p. 8; p. 75). However, about 25 percent of the teachers report that they decreased emphasis on calculator activities because students cannot use calculators on standardized tests; less than 10 percent reported an increased use of calculators in their classrooms.

One of the survey questions asked teachers to list the tests that their schools used. Six commercially developed tests were listed as the most widely used in Grade 8, both at the district and state level: The California Achievement Test (CAT) (1985), The Metropolitan Achievement Test (MAT) (1986), The Stanford Achievement Test (SAT) (1982), The Science Research Associates Survey of Basic Skills (SRA) (1985), The Comprehensive Test of Basic Skills (CTBS) (1989), and The Iowa Test of Basic Skills (ITBS) (1986). The analysis of these six tests serves as the basis for the findings in the report of the first study.

Purpose

The six tests were analyzed to (1) determine whether they reflect the recommendations made in the Standards, and, if not, to (2) make recommendations to the test developers for revising their tests. The rationale for this study is based on the fact that since a number of teachers reported changing their teaching to reflect their knowledge of what is tested, the best way to ensure a shift in emphasis in teaching is to change the test. It is a rationale based directly on the statement, "To change the curriculum, change the test," for which considerable evidence exists (e.g., Edelman, 1980; Millman, Bishop, & Ebel, 1965; Romberg, Zarinnia, & Williams, 1989; Stanley & Hopkins, 1981).

A major argument against standardized tests has been their failure to assess higher-order skills; rather, such tests emphasize computations, recognition, and other lower-order thinking skills (Meier, 1989; Putnam, Lampert, & Peterson, 1989). This latter type of assessment is in contrast to the major theme of the Standards, Problem Solving, which is an all-encompassing higher-order skill. However, before urging test developers to change their tests, it is necessary to determine whether current tests evaluate the achievement of the objectives set out in the Standards. The intent of this study of six standard tests, therefore, is to make that determination.

Design

The mandated tests under study were Grade 8 tests. Therefore, the Standards for Grades 5-8 formed the basis for the classification of items.

Each item on each test was classified within three areas: (1) the content it tests; (2) the process required to respond to the item; and (3) the level of the response required.

The item was first categorized into one of the following seven content areas described in the Standards (the numbers in parentheses refer to their position in the Grade 5-8 Standards):

Number and Number Relations (5)

Number Systems and Number Theory (6)

Algebra (9)

Statistics (10)

Probability (11)

Geometry (12)

Measurement (13)

Each item was categorized into one of six process areas described in the Standards:


Communication (2)

Computation and Estimation (7)

Connections (4)

Reasoning (3)

Problem Solving (1)

Patterns and Functions (8)

Finally, each item was classified into one of two levels, procedure or concept, according to whether the response to the question required procedural or conceptual knowledge.

A matrix (Appendix B) was developed and used to classify each test item according to the three classifications. For example, for the CTBS test, Item 1, "8.685 - 2.150," is classified as a computation problem that tests number and number relations; to work it requires procedural knowledge. Totals were generated for each test, and the results are discussed in the results section.
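The three-way classification and the per-test totals can be pictured with a small sketch. It is only an illustration of the bookkeeping, not the Appendix B matrix itself: the record type `ItemClassification` and the helper `percent_by` are our own names, and only the first entry (CTBS Item 1) comes from the text; the other two entries are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemClassification:
    test: str
    item: int
    content: str   # one of the seven content areas
    process: str   # one of the six process areas
    level: str     # "procedure" or "concept"


items = [
    # Classified as described in the text: "8.685 - 2.150".
    ItemClassification("CTBS", 1, "Number and Number Relations",
                       "Computation and Estimation", "procedure"),
    # Hypothetical entries, for illustration only.
    ItemClassification("CTBS", 2, "Geometry", "Communication", "concept"),
    ItemClassification("CTBS", 3, "Number and Number Relations",
                       "Computation and Estimation", "procedure"),
]


def percent_by(field, classified):
    """Percent of items in each category of `field`, rounded to whole percents."""
    counts = Counter(getattr(it, field) for it in classified)
    total = len(classified)
    return {category: round(100 * n / total) for category, n in counts.items()}


print(percent_by("content", items))
print(percent_by("process", items))
print(percent_by("level", items))
```

Tallying each of the three fields separately yields the kind of percentages reported for each test in the results section.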

Two raters did the classification. A test was chosen at random and analyzed by both raters working together. Then a different test was picked and individually analyzed by each rater. Their results were compared for interrater reliability. No significant differences in the individual ratings were found, and the remaining tests were divided among the raters for individual analysis.
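The chapter does not say which statistic was used to compare the two raters. As one possible sketch, the following computes simple percent agreement and Cohen's kappa, a common chance-corrected measure for categorical ratings; the choice of kappa is our assumption, and the ratings shown are hypothetical.

```python
from collections import Counter


def percent_agreement(a, b):
    """Share of items that both raters placed in the same category."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(a)
    observed = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical process-area ratings of six items by two raters.
rater_a = ["Computation", "Computation", "Communication",
           "Computation", "Reasoning", "Communication"]
rater_b = ["Computation", "Computation", "Communication",
           "Communication", "Reasoning", "Communication"]

agreement = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
print(round(agreement, 3), round(kappa, 3))
```

A kappa near 1 indicates agreement well beyond chance, while a value near 0 indicates agreement no better than chance.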

RESULTS OF THE ANALYSIS

Problems

Two problems were encountered in categorizing the test items. First, in many cases, an item was put into a certain category even though in fact it did not reflect the true spirit of the Standards. For example, an item from the ITBS has the following stem: "How would you write 6 thousandths as a decimal?" This was categorized under the process area, Communication, though it requires a much "lower" level of communication than that described in the Standards:

In Grades 5-8, the study of mathematics should include opportunities to communicate so that students can:

    model situations using oral, written, concrete, pictorial, graphical, and algebraic methods;
    reflect on and clarify their own thinking about mathematical ideas and situations;
    develop common understandings of mathematical ideas, including the role of definitions;
    use the skills of reading, listening, and viewing to interpret and evaluate mathematical ideas;
    discuss mathematical ideas and make conjectures and convincing arguments;
    appreciate the value of mathematical notation and its role in the development of mathematical ideas. (NCTM, 1989, p. 78)

Second, many of the tests label a section "Problem Solving," yet the problems do not resemble the types of problems described in the Standards as problem solving. Nearly all of the problems so labeled in the tests were routine word problems, such as: "Brett correctly answered 24 out of 25 questions on a science test. What percent of the questions did Brett answer correctly?" (SRA, Level 36, Form P). The Standards, on the other hand, call for nonroutine problem situations which "are much broader in scope and substance than isolated puzzle problems" and "very different from traditional word problems, which provide contexts for using particular formulas or algorithms but do not offer opportunities for true problem solving" (NCTM, 1989, p. 76).

Individual Test Results

The percent of items classified in each category for each test is given in Appendix C.

1. SRA Survey of Basic Skills
   Level 36, Form P
   Science Research Associates, 1985
   Chicago, Illinois

   There were 90 items in the mathematics portion of the test. Of those, the majority (82 percent) were classified in the content area of Number and Number Relations, with 7 percent each in Number Systems and Number Theory, 4 percent in Geometry, and none in Probability, Statistics, or Measurement. In areas of process, most (91 percent) of the items were classified as Computation/Estimation, with 5 percent in Communication, 3 percent in Problem Solving, 1 percent in Reasoning, and none in Connections or Patterns and Functions. Eighty-four percent of the items were viewed as procedural and 16 percent conceptual.

2. The California Achievement Test
   Level 18, Form E, Tests 6 & 7
   CTB/McGraw Hill, 1985
   Monterey, California

   There were 105 items on the mathematics portion of the test. This test had a somewhat broader distribution of content areas, with 73 percent of the items in Number and Number Relations, 6 percent each in Measurement, Algebra, Probability or Statistics, 5 percent in Number Systems and Number Theory, and 4 percent in Geometry. In the process areas, most (83 percent) of the items were Computation/Estimation, 11 percent were Communication, 6 percent were Reasoning, and none were in Problem Solving, Connections, or Patterns and Functions. Most (90 percent) of the items were procedural, with 10 percent conceptual.

3. The Stanford Achievement Test
   Advanced Level (7th ed.)
   The Psychological Corporation, 1982
   San Antonio, Texas

   While most (64 percent) of the 118 items on the mathematics portion of the test were classified in the content area of Number and Number Relations, a broader spread existed among the other content areas. Fifteen percent of the items were classified as Measurement, 10 percent as Algebra, and 9 percent as Probability or Statistics. However, only 2 percent of the items were in Geometry, and none were in Number Systems and Number Theory. In the process categories, 38 percent of the items were in Communication, 62 percent were in Computation/Estimation, and none were in the other categories. Nearly all (92 percent) of the items were considered procedural, with only 8 percent conceptual.


4. The Iowa Test of Basic Skills
   Level 14, Form 7
   Riverside Publishing Co., 1986
   Chicago, Illinois

   The mathematics portion of this test, with 193 items, was nearly twice as large as the other five tests under study. However, like the others, the majority (62 percent) of the items were classified in the content area of Number and Number Relations. Measurement had 13 percent of the items, and 11 percent of the items were in Number Systems and Number Theory. Seven percent of the items were in Algebra, 4 percent were in Geometry, and 3 percent were in Probability or Statistics. Nearly all (89 percent) of the items were in the process area of Computation/Estimation, with 9 percent in Communication, 1 percent in Patterns and Functions, 1 percent in Reasoning, and none in the other process categories. Only 4 percent of the items were considered conceptual, with 96 percent procedural.

5. The Metropolitan Achievement Test
   Advanced 1, Forms L & M (6th ed.)
   The Psychological Corporation, 1986
   San Antonio, Texas

   This test, with 95 mathematics items, was quite similar to the others in content. Sixty-six percent of the items were classified as Number and Number Relations, 15 percent as Measurement, 8 percent as Geometry, 6 percent as Number Systems and Number Theory, 5 percent as Probability or Statistics, and none as Algebra. Like the others, most (79 percent) of the items were classified in the process area of Computation/Estimation, with 21 percent in Communication, and none in the others. Eighty-eight percent of the items were procedural, and 12 percent conceptual.

6. The Comprehensive Test of Basic Skills
   Level 17/18, Form A
   CTB/McGraw Hill, 1989
   Monterey, California

   Seventy-six percent of the 94 items on the mathematics portion of the test were classified as Number and Number Relations. Eleven percent were classified in the content area of Probability or Statistics, 8 percent in Geometry, 5 percent in Measurement, and none in the other areas. Most (71 percent) of the items were Computation/Estimation, with 25 percent in Communication, 2 percent each in Reasoning and Patterns and Functions, and none in the other areas. Eighty-five percent of the items were procedural, and 15 percent were conceptual.

General Results

1. Most items (between 62 percent and 82 percent, with an average of 71 percent) were found to be in the content area of Number and Number Relations, with the rest fairly evenly distributed among the other five categories.

2. Most items (between 71 percent and 91 percent, with an average of 79 percent) were found to be in the process area of Computation/Estimation, with 20 percent in Communication, and very few in the other four categories.

3. An average of 89 percent (with a range of from 84 percent to 96 percent) of the items were classified as procedural, rather than conceptual.

4. Little variation was found among the tests in terms of categorizing the items. The greatest range was found in the category of Communication, with the SRA having 5 percent of its items in Communication and the SAT having 38 percent.

5. In the category of Computation/Estimation, the majority of items were Computation, with no more than 10 percent, and in some cases 0 percent, being Estimation.
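The averages in the general results can be recomputed from the per-test percentages reported above. The sketch below is only a check of that arithmetic: the lists transcribe the figures given for each test in the order SRA, CAT, SAT, ITBS, MAT, CTBS, and the variable names are our own.

```python
from statistics import mean

# Percent of items per test, in the order SRA, CAT, SAT, ITBS, MAT, CTBS,
# as reported in the individual test results.
number_relations = [82, 73, 64, 62, 66, 76]        # content area
computation_estimation = [91, 83, 62, 89, 79, 71]  # process area
procedural = [84, 90, 92, 96, 88, 85]              # level

averages = {
    "Number and Number Relations": mean(number_relations),
    "Computation/Estimation": mean(computation_estimation),
    "Procedural": mean(procedural),
}

for name, value in averages.items():
    print(f"{name}: {value:.1f} percent")
```

The unweighted means come out at about 70.5, 79.2, and 89.2 percent, close to the 71, 79, and 89 percent quoted in the general results.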

Conclusions

Romberg, Wilson, and Khaketla's 1989 study of the six standardized tests used most widely at state and district levels in schools in the United States made the following findings:

1. The items in the tests examined do not adequately cover the range of content described in the Standards. The great majority of items were found to be computations on numbers, or were based on algorithmic procedures. The Standards, on the other hand, call for a decreased emphasis on performing routine computations by hand, and increased emphasis in other areas such as problem solving, reasoning, connections, and communication.

2. The tests do not address one of the primary content areas of the Standards, that is, problem solving. The Standards "strongly endorse" the first recommendation of An Agenda for Action, which states that problem solving must be the focus of school mathematics (NCTM, 1980, p. 6). Further, the Standards consider problem solving to involve much more than routine word problems. According to that definition, an average of only 1 percent of the items on the tests were categorized as problem solving, with a range of from 0 percent to 3 percent.

3. The tests do not adequately cover the following content areas: Number Systems and Number Theory, Algebra, Statistics, Probability, Geometry, and Measurement.

4. The following process areas are not covered adequately by the tests: Communication, Connections, Reasoning, Problem Solving, and Patterns and Functions.

5. The tests place too much emphasis on procedures and not enough emphasis on concepts.

PART 2: THE FOLLOW-UP STUDY

The aim of the follow-up study (Romberg, Wilson, & Chavarria, 1990) was to demonstrate the existence of test items that are more closely aligned with the Standards than are the items found in the six tests of the first study. The investigation drew upon items and tests from two sources: newly developed state tests and foreign tests. The study looked at materials from California, Connecticut, South Carolina, Massachusetts, and Vermont; then it considered materials from several foreign countries, primarily Britain, but also Australia, France, Korea, The Netherlands, and Norway.

The conclusion of the investigation was that there are test items which are currently in use that are more closely aligned with the Standards than the six standardized tests that are most widely used now at the eighth-grade level in the United States. Several states are implementing reforms in their assessment practices and have developed tests to reflect the objectives described in the Standards. In addition, many tests and test items that are currently used in foreign countries, most notably Britain, surpass American standardized tests in their alignment with the Standards. The feature shared by all of these tests and test items is that they are open response, not multiple choice, in format. The content and processes measured in these items are rich and varied. Many of the items are able to assess higher-order thinking with greater ease than do typical multiple-choice questions. Process areas such as Problem Solving, Reasoning, and Communication lend themselves to an open-response format, and this has been borne out in our investigations. In United States tests, only 1 percent of the items could be classified as Problem Solving, 20 percent as Communication (and that at the lowest levels of communication), and 1 percent as Reasoning. In contrast, the open-response tests being developed in several states and in Britain contained excellent examples of items in those three process areas.

The following are examples of test items found on either state or foreign tests:

1. Connecticut Common Core of Learning Assessment Project
Students are given an article from the newspaper entitled, "Survey Finds Many Below Town's Mean Income." They are then given the following questions: Use the article and your understanding of statistics to complete the following tasks:

   1. Write an expository paragraph that begins with either:
      The headline is fine because ... OR
      The headline is absurd because ...

   2. Write an expository paragraph that begins with either:
      The article makes sense and has no statistical errors because ... OR
      The article is absurd and makes statistical errors because ...

   3. How can more than half the people be below the mean income?

   4. Create a data set to show how more than half the numbers are below the mean. Describe your reasoning.

SURVEY FINDS MANY BELOW TOWN'S MEAN INCOME*

OLD SAYBROOK - A recent survey shows more than half of the respondents earn well below the town's mean annual income of $37,500. Vicki McCourt, a member of the Old Saybrook Affordable Housing Task Force, said 65% of the 200 respondents reported earning less than $30,000 a year. "There are no houses in Old Saybrook that anyone can afford within the mean income; they just cannot do it," McCourt said.
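One way to see how more than half of a group can fall below the mean is to build a small skewed data set. The sketch below uses made-up incomes (they are illustrative assumptions, not the survey's data) in which two large values pull the mean above most of the observations.

```python
from statistics import mean, median

# Hypothetical annual incomes for ten respondents (illustrative only,
# not taken from the Old Saybrook survey).
incomes = [18_000, 20_000, 22_000, 24_000, 25_000,
           26_000, 28_000, 30_000, 60_000, 122_000]

avg = mean(incomes)                      # pulled upward by the two largest incomes
mid = median(incomes)                    # the middle of the ordered data
below = [x for x in incomes if x < avg]  # respondents earning less than the mean
share_below = len(below) / len(incomes)

print(f"mean = {avg:,.0f}")
print(f"median = {mid:,.0f}")
print(f"share below the mean = {share_below:.0%}")
```

In this invented data set eight of the ten incomes lie below the mean, because the mean is sensitive to a few large values while the median is not.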

This item is a good example of a problem in the content area of Statistics, the process area of Communication, and a conceptual level of knowledge.

2. California Assessment Program
James knows that half of the students from his school are accepted at the public university nearby. Also, half are accepted at the local private college. James thinks that this adds up to 100 percent, so he will surely be accepted at one or the other institution. Explain why James may be wrong. If possible, use a diagram in your explanation.

This item taps into the critical process areas of reasoning and communication.

The following three items are taken from British tests:

3. London & East Anglian Group for GCSE Examinations
An air-mail letter to India costs 34p. How can you pay correct postage using only 4p stamps and 11p stamps?

This item, in the content area of Number Systems and Number Theory, is a computation problem, but at a conceptual rather than procedural level.

*Source: Connecticut State Department of Education, Connecticut Common Core of Learning Performance Assessment Project. Used with permission. This item has been replaced by a revision pilot tested in Spring, 1991. The project was funded by a grant from the National Science Foundation.
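The air-mail postage item (34p using only 4p and 11p stamps) can be explored by listing every combination of stamp counts that gives the exact total. The following is a minimal sketch; the function name `stamp_combinations` and its parameters are invented here for illustration.

```python
def stamp_combinations(total, small=4, large=11):
    """Return every (n_small, n_large) pair of non-negative stamp counts
    whose combined value equals `total` (all values in pence)."""
    combos = []
    for n_large in range(total // large + 1):
        remainder = total - n_large * large
        # The rest must be made up entirely of small stamps.
        if remainder % small == 0:
            combos.append((remainder // small, n_large))
    return combos

print(stamp_combinations(34))  # [(3, 2)]: three 4p stamps and two 11p stamps
```

The search shows that the combination is unique, which is part of what makes the item conceptual rather than a routine computation.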

4. "Beefo Cubes" are 2 cm x 2 cm x 2 cm. They are sold in a thin cardboard sleeve 4 cm x 4 cm x 8 cm.

[Figure: a Beefo Cube and its cardboard sleeve, with dimensions labeled]

How many cubes are in one full sleeve?

This item could be classified into content areas of either Geometry or Measurement, and a process level of Connections or Problem Solving; it requires a conceptual level of knowledge.

5. Northern Examining Association
The picture shows a woman of average height standing next to a lamp post.
a) Estimate the height of the lamp post.
b) Explain how you got your answer.

[Figure: a woman standing next to a lamp post]

This item is a refreshingly different estimation problem, one that does not involve simply rounding numbers. The "b" part of the problem makes it a good communication problem also. It could be classified as either Measurement or Number Relations and again is at a conceptual level.

6. The Netherlands
The results of two classes of a math-test are presented in a stem-leaf-display:

[Figure: stem-and-leaf display of the test results for CLASS A and CLASS B]

Does this table suffice to judge which class performed best?

Source: de Lange, van Reeuwijk, Burrill, & Romberg (in press). Used with permission.

This open-ended item requires that the student have a conceptual understanding of the data represented in the display and be able to communicate those concepts by means of a valid mathematical argument.

These six items are a sample of the kinds of problems that are possible when one is not bound by the multiple-choice format. Each problem is rich, engaging, and interesting. Tests composed of items such as these can provide a more valid means of assessing the content areas, processes, and levels of knowledge described in the Standards. Perhaps most important, a student who encounters test items such as these will come away from the test having learned some mathematics through the experience.

SUMMARY

This study has attempted to provide evidence that the six standardized tests most often used by states in their mandated testing programs are not appropriate instruments for assessing the content areas, processes, and levels of thinking called for in the NCTM Standards. These tests are "based on different views of what knowing and learning mathematics means" (NCTM, 1989, p. 191). As the Standards become more widely implemented in the schools, standardized tests used in the schools will have to change to more accurately reflect the new vision of the mathematics curriculum which the Standards outline. Romberg, Wilson, and Chavarria's 1990 study of state and foreign tests found that there are examples of tests and test items, all of which are open-response in format, that can provide more valid means of assessing the mathematics of the Standards.


5

State Assessment Test Development Procedures

James Braswell

The primary purpose of this chapter is to describe how tests are developed for state assessment programs. The methods described are based in part on recent discussions with state department of education staff in states judged to be representative of a range of approaches to test development. I spoke with assessment representatives in Florida, Louisiana, Massachusetts, Michigan, and New Jersey. Occasionally, there will be observations in the chapter that reflect previous experience with other state testing programs and current work with the National Assessment of Educational Progress (NAEP) test development team for the 1990 Mathematics Assessment. What is reported in this chapter is primarily descriptive rather than evaluative.

The primary purpose of state assessment programs is to monitor trends in achievement. In some states, such as California (see Chapter 6), the emphasis is on group assessment as opposed to individual student assessment. In other states, such as Michigan, the emphasis is on individual student assessment. However, Michigan also provides school and district level reports that can be used for curriculum evaluation and accountability purposes.

When group assessment is the goal, the larger test instrument can be viewed as a compilation of several shorter test instruments in which no student takes more than one of the shorter instruments. For example, the test specifications and evaluation design may be developed into a pool of 400 questions, which are placed on ten 40-item tests. When administered, group data are available on the entire pool of 400 questions, even though any given student only responds to a single 40-item test. When individual student assessment is a major goal, students at a given grade level generally take the same test.
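The group-assessment design just described (a 400-question pool spread over ten 40-item tests, with each student taking only one) can be sketched as follows. This is an illustrative sketch only: the function names, the random assignment of students, and the student count of 1,000 are assumptions made here, not a description of any state's actual procedure.

```python
import random

def build_forms(item_ids, n_forms):
    """Split the item pool into n_forms equal-length, non-overlapping forms."""
    if len(item_ids) % n_forms != 0:
        raise ValueError("pool size must be divisible by the number of forms")
    size = len(item_ids) // n_forms
    return [item_ids[i * size:(i + 1) * size] for i in range(n_forms)]

def assign_forms(student_ids, n_forms, seed=0):
    """Give each student exactly one form index, spreading forms evenly."""
    rng = random.Random(seed)
    shuffled = list(student_ids)
    rng.shuffle(shuffled)  # randomize which students receive which form
    return {student: i % n_forms for i, student in enumerate(shuffled)}

pool = list(range(1, 401))                  # 400 hypothetical item numbers
forms = build_forms(pool, 10)               # ten 40-item forms
assignment = assign_forms(range(1000), 10)  # 1,000 hypothetical students

print(len(forms), "forms of", len(forms[0]), "items each")
```

Every item in the pool is answered by some students, so group-level results cover all 400 questions even though no individual student sees more than 40.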

If the primary goal of assessment is improvement in education, test instruments should be designed to provide an appropriate target for instruction. Shavelson (1990) reminds us that what is monitored is what gets taught and that what gets taught has a better chance of being learned than what is not taught. He goes on to point out that:

By creating achievement tests that more closely measure valued education outcomes, achievement indicators will achieve a reporting function and yet benefit, not create dysfunctions in, education systems. Put another way, achievement indicators should be created in such a way that if education institutions teach to the test, they will be teaching what is important for students to know. (Shavelson, 1990, p. 7)

My overall impression is that the states engaged in assessment activities do a thorough job of developing specifications. Advice and review are sought from a wide spectrum of the educational community. Increasingly, states seem aware of the fact that their assessment programs should not be narrow in scope, but rather reflect the range of desired instructional outcomes. Massachusetts, for example, has introduced open-ended sections in its state assessment program and has published a series of booklets to describe the implications for instruction. The recently published NCTM Curriculum and Evaluation Standards for School Mathematics (1989) is a useful document for states to consider as a meter stick against which the current range of their assessment objectives can be measured. The implications of these standards are discussed in Chapter 3.

Although some states use commercially developed, norm-referenced tests as a component of state assessment, other states have legislative mandates to develop instruments that are tailored to local objectives and that involve teachers and other educators in the state. States that opt for using an existing norm-referenced instrument generally select an instrument that represents the best fit with the state's objectives. Although such tests are likely to contain a number of items that state department staff and local educators view as inappropriate, budget and other considerations can make the use of commercially available assessment instruments appealing. Rather than contracting with others to develop instruments tailored to the state's objectives, items in "best fit" commercially available tests can be matched with local objectives. For those areas deemed important, but not included on the test of best fit, the state can develop, or contract to have developed, a tailored instrument to fill in gaps. This possibility will not be explored further, but it does represent an alternative to a completely tailored program. Chapter 4 of this book examines a number of standardized tests. Based on the findings reported, one might expect to find a rather large gap between the current content of these tests and what is called for by the NCTM Standards.

State mandated testing has increased substantially in recent years. As Coley and Goertz (1990) report:

In 1989-90, 47 states required that local school districts test students at some point(s) between grades 1 and 12, an increase of five states since 1984-85. Thirty-nine of those states test students using state-developed, state-selected, or state-approved tests and assess student performance against state-established performance standards. (p. 3)

Statewide assessment is commonly conducted at three grade levels, such as Grades 4, 7, and 11. Increasingly, there is a high school graduation component administered at one or more levels in Grades 9-12. In 1990, twenty states required that students pass a basic skills or other competency assessment before receiving a high school diploma.

The major steps in developing any test are the following:

setting specifications
writing items
reviewing items
field testing items
test assembly
test review


The pages that follow will describe in a collective way how certain of the states accomplish the above steps. Then, three case studies will be presented to give the reader a better picture of how particular states approach test development. Finally, there is a section on test development activities related to the 1990 National Assessment in Mathematics.

Test Specifications

Most states have been conducting state assessments for several years and use specifications that have evolved over time. Typically, states actively involve classroom teachers, curriculum coordinators, and others in setting and reviewing specifications. It is becoming increasingly common for states to seek the advice of those outside the immediate educational community (e.g., representatives of business and industry) in the formulation of broad educational goals. Such goals serve to focus the assessment activity and provide a support base.

In some states, minimum performance standards have been set for the schools by the state, and test specifications are developed by local test-writing committees to reflect these standards. State board of education staff may review and approve the specifications. These specifications are sometimes too general to guide item writers, and states may contract with an independent contractor to develop item-level specifications and one or more sample items. The independent contractors frequently involve teachers and other educators in setting the item-level specifications, which normally undergo extensive review. When consensus has been reached, specifications and sample items may be printed in a form that can be shared with local school districts. Some states use an eclectic approach in their assessment activities. Objectives developed by others are reviewed and adapted to local goals.

In order to determine which topics are important, some states have prepared a long list of topics for the grade levels being assessed. This list is mailed to teachers throughout the state and they are asked to rate topics. For example, a topic rated 4 might be viewed as extremely important while one rated 1 would be viewed as unimportant. Average ratings are compiled for each topic. These ratings guide those who set specifications for the various tests. Few states, however, follow such an extended consensus-building process. In fact, one could argue that a few well-informed educators are in a better position to determine an appropriate list of skills and topics than might be obtained by means of a statewide survey of teachers in targeted grades.
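The compilation of average topic ratings described above can be sketched in a few lines. The topic names and ratings below are invented for illustration, and the function `average_ratings` is a hypothetical helper, not part of any state's procedure; the only detail taken from the text is the 1-to-4 rating scale.

```python
from collections import defaultdict

def average_ratings(responses):
    """Compile the mean rating per topic from (topic, rating) pairs,
    where each rating is on the 1 (unimportant) to 4 (extremely important) scale.
    Returns (topic, mean) pairs sorted from highest to lowest mean."""
    totals = defaultdict(lambda: [0, 0])  # topic -> [sum of ratings, count]
    for topic, rating in responses:
        if not 1 <= rating <= 4:
            raise ValueError(f"rating out of range for {topic!r}: {rating}")
        totals[topic][0] += rating
        totals[topic][1] += 1
    averages = {topic: s / n for topic, (s, n) in totals.items()}
    return sorted(averages.items(), key=lambda pair: pair[1], reverse=True)

# Hypothetical teacher responses (illustrative only).
responses = [
    ("Estimation", 4), ("Estimation", 3),
    ("Long division", 2), ("Long division", 2),
    ("Probability", 3),
]
ranked = average_ratings(responses)
for topic, avg in ranked:
    print(f"{topic}: {avg:.1f}")
```

A ranked list of this kind is only an input to the people who set specifications; as the text notes, few states rely on such a survey.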

In addition to content specifications, increased attention is being given to the process dimension. Some states (e.g., see Massachusetts Department of Education, 1987, p. 29) use category specifications recently proposed by the National Council of Teachers of Mathematics. These categories are Procedural Knowledge, Conceptual Understanding, Problem Solving, and Reasoning and Analysis. Other states (e.g., see Michigan case study later in this chapter) use categories that have greater specificity, such as Mental Arithmetic, Estimation, Computation, Conceptualization, Applications, Calculators, and Computers. Whatever process categories are used, it is important that they have two characteristics: (1) They should span the range of what it is important to measure, and (2) What is included under each process dimension needs to be well defined. For example, it is not sufficient to simply list "conceptual understanding" as a category. The category should be described and illustrated so that item writers, reviewers, and users will understand what the categories mean. Note that the NCTM categories are generally disjoint process categories, whereas in the second listing Mental Arithmetic, Estimation, and Computation tend to be procedurally oriented. By specifically including calculators and computers, these tools are singled out as important elements in the assessment framework.

In summary, content specifications for state assessment instruments are generally developed under the direction of state department of education staff with the help of teachers, curriculum coordinators, other educators, and occasionally representatives of business and industry. State department staff almost universally depend on the talents and perspectives of those outside the department. This is especially true of state departments with limited subject-matter expertise on staff, but it is also true of those state departments that have a fairly large staff of subject-matter specialists.

While teachers and curriculum specialists play a major role in setting content specifications, the statistical specifications are generally determined by others. The primary statistical considerations concern the difficulty level of the test and the range of difficulty of the various items that will be used on the test. Setting statistical specifications is usually done jointly by the appropriate state department staff and the independent contractor. Those states that develop their own instruments without the assistance of an independent contractor generally have staff with psychometric training within the state department.

The statistical specifications depend on the purpose of the assessment. In the mastery-level or basic skills programs, which were popular in the 1970s, the statistical specifications generally called for items that the majority of students could pass. In the more comprehensive assessment programs characteristic of the 1980s, in which the goal is to assess a wide range of student achievement, statistical specifications call for a range of difficulty to differentiate among levels of achievement. For the most part, statistical specifications tend to be a function of the content and ability specifications. For example, a procedural knowledge question in a certain content category is usually easier than a conceptual understanding or problem-solving question in the same content domain. Rigid statistical specifications do not appear to drive state assessment programs.

Item-Level Specifications

Some state assessment programs provide few details about the specific characteristics of items that are to be written to meet the content and ability specifications. For a given content specification, a wide range of items would be viewed as acceptable even though some would be considerably more difficult than others. Other states provide extremely detailed item-level specifications. These specifications may describe the characteristics of the numbers to be used and the method of creating distractors. In such cases, item writers have less freedom in posing questions, and items written for a given specification usually tend to be similar in content and difficulty. It is generally easier to pinpoint instructional deficiencies when performance on such items is weak. A drawback of such specificity is that the items so written tend to test lower-level cognitive skills at the expense of the higher-order problem-solving skills that are viewed as increasingly important by mathematics educators. While highly detailed specifications are an advantage in identifying deficiencies in skill areas, highly specified item parameters may impact instruction negatively if there is pressure on schools to do well on these items at the expense of other important instructional outcomes.

Some state assessment programs are beginning to include open-ended items which require students to explain, construct, measure, graph, analyze, and compute. Such exercises are more expensive to administer and score, but they provide more definitive and interesting information about what students can and cannot do. Those states that administered NAEP blocks in Grades 4, 8, and 12 of the 1990 assessment found traditional multiple-choice questions as well as a wide range of open-ended and calculator-active items. During testing, four-function calculators were furnished in Grade 4 and scientific calculators were furnished in Grades 8 and 12.

Item Writing and Review

Once the content and ability-level specifications have been determined, the next step is to write items which meet these specifications. States use different approaches in item development. In some states items are written by an independent contractor who sometimes involves teachers in the target state. Other states appoint subject area committees to write the questions. As a rule, these committees consist of teachers, curriculum specialists, and other educators with knowledge of the curriculum at the relevant grade level(s). Still other states contract with state universities to develop items to fit the specifications. Several states use NAEP items or blocks of items that have been written to be broadly applicable to curricula across the nation and to be consistent with current thinking among mathematics educators. (See section on NAEP item-development procedures for more information on this activity.)

Item review procedures also differ from state to state. Usually, items written by a contracting agency are reviewed by staff at the agency and edited for style, clarity, and grammar. They are then reviewed by state department staff. It is fairly common for the state to appoint a committee consisting of teachers, curriculum coordinators, and/or independent consultants to review items. These reviews attempt to answer several important questions:

1. Is the item technically correct (i.e., does it have one and only one correct answer, is it unambiguous, and is it grammatically correct)?
2. Does the item meet the specification?
3. Is the item written at a level appropriate for the target grade?
4. Are the distractors reasonably chosen?

At the review stage, items may be rejected, accepted, or revised. The end product of the item-writing and review phase is a collection of questions ready for field testing. As part of the review process, items are usually classified according to a content-ability/process matrix, such as the one below. If gaps in coverage are observed (i.e., there are too few items in certain specification categories), additional writing and review may be required.

Table 5-1
Content-Ability Matrix

                                                              Content Area
Mathematical Ability     | Numbers & Operations | Measurement | Geometry | Data Analysis, Statistics, & Probability | Algebra & Functions
Conceptual Understanding |                      |             |          |                                          |
Procedural Knowledge     |                      |             |          |                                          |
Problem Solving          |                      |             |          |                                          |
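Classifying reviewed items into the cells of the content-ability matrix and flagging cells with too few items can be sketched as below. The category names come from Table 5-1; the item records, the `coverage_gaps` helper, and the minimum of one item per cell are assumptions made here for illustration.

```python
from collections import Counter

CONTENT_AREAS = [
    "Numbers & Operations", "Measurement", "Geometry",
    "Data Analysis, Statistics, & Probability", "Algebra & Functions",
]
ABILITIES = ["Conceptual Understanding", "Procedural Knowledge", "Problem Solving"]

def coverage_gaps(items, minimum=1):
    """Return (ability, content, count) for every matrix cell that holds
    fewer than `minimum` classified items."""
    counts = Counter((item["ability"], item["content"]) for item in items)
    return [
        (ability, content, counts[(ability, content)])
        for ability in ABILITIES
        for content in CONTENT_AREAS
        if counts[(ability, content)] < minimum
    ]

# Hypothetical classified items (identifiers are invented).
reviewed_items = [
    {"id": "G-01", "ability": "Conceptual Understanding", "content": "Geometry"},
    {"id": "N-07", "ability": "Procedural Knowledge", "content": "Numbers & Operations"},
    {"id": "M-03", "ability": "Problem Solving", "content": "Measurement"},
]
gaps = coverage_gaps(reviewed_items)
print(len(gaps), "of", len(ABILITIES) * len(CONTENT_AREAS), "cells need more items")
```

Cells returned by the check are the ones for which additional writing and review would be needed.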

An example of a question that could go in the cell corresponding to "Conceptual Understanding, Geometry," is the following:

Points P and Q are on opposite sides of a rectangle that has length 10 and perimeter 32. If x represents the distance between P and Q, what is the LEAST possible value of x?

(A) 4 (B) 6 (C) 8 (D) 10 (E) 16

Correct answer (B).
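The keyed answer can be checked with a short worked derivation. The symbols $\ell$ and $w$ for the rectangle's side lengths are introduced here for convenience and do not appear in the original item.

```latex
\begin{align*}
2(\ell + w) &= 32, \qquad \ell = 10 \\
w &= 16 - 10 = 6 \\[4pt]
\text{P, Q on the two sides of length } 10 &: \quad x \ge w = 6 \\
\text{P, Q on the two sides of length } 6  &: \quad x \ge \ell = 10 \\[4pt]
\min x &= \min(6,\, 10) = 6
\end{align*}
```

The distance between points on opposite sides can never be less than the gap between those sides, and the smaller gap is 6, which corresponds to choice (B).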

Item Tryout

States usually field test items prior to their inclusion on actual tests. In most state assessment programs, the independent contractor assembles the tryout blocks. Tryout procedures vary, but generally local school districts serve as pilot sites. Some states pilot items in experimental forms along with the operational assessment. Items may be in separate blocks or embedded in operational test forms. Other states pilot items independent of operational administration and may provide feedback to participating districts. Sample sizes are usually fairly large. For example, Florida field tests items on approximately fifteen hundred students. When items are tried out on sufficiently large samples that are representative of the intended population, one can be reasonably sure that tryout data will hold up on operational administrations.

Test Assembly

All items involved in the tryout phase are candidates for operational forms. As a result of the field testing, some items are judged as unsatisfactory and are not placed in the item pool from which operational forms are assembled. States view field testing as an opportunity to replenish and expand their item pools and to fill gaps that have been identified by those involved in the assessment effort. Normally, item pools contain items tried out at different times, some of which may have been previously used in operational assessments.

In some states the independent contractor assembles and produces operational forms of the test according to the specifications established before item development began. In other states, state department staff assemble the operational forms.

Usually, new operational forms of the test are reviewed by a variety of people outside the state department. Such reviews are in addition to the review steps taken by the independent contractor. In Massachusetts, for example, a committee consisting of teachers and curriculum coordinators, as well as an equity review committee, reviews operational tests. In Michigan, tests assembled by an independent contractor are reviewed by a committee consisting of teachers, curriculum coordinators, and college faculty. Michigan Department of Education staff and two independent consultants also conduct reviews.

Not all states equate new operational forms to those built in previous years. States that do not are likely to have substantial item overlap from one form to the next. For example, in Massachusetts only about 15 to 20 percent of the questions are replaced from one year to the next. Performance on the remaining items serves as a basis for making comparisons. In Florida, test assemblers strive to build comparable forms based on statistical data available from previous administrations and item tryouts. In Louisiana, the blueprints for test and item specifications at Grades 3, 5, and 7 are very detailed. New forms are assembled using a Rasch model and equated to previously administered forms. A test built to such a model uses item difficulty parameters that will provide the most accurate score information at those score levels which are viewed as the most important. Technically, the standard error of measurement is controlled at various points on the reporting scale.
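To illustrate how item difficulties affect measurement precision under a Rasch model, the sketch below uses the standard Rasch formulas: the probability of a correct response depends on the difference between ability (theta) and item difficulty (b), each item contributes information p(1 - p), and the standard error of measurement at a given ability is the reciprocal square root of the total information. The difficulty values and the two forms are hypothetical and are not drawn from any state's test.

```python
import math

def rasch_probability(theta, b):
    """Probability of a correct response for ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def standard_error(theta, difficulties):
    """Standard error of measurement at theta for a form with the given
    item difficulties: 1 / sqrt(sum of item information p * (1 - p))."""
    information = 0.0
    for b in difficulties:
        p = rasch_probability(theta, b)
        information += p * (1.0 - p)
    return 1.0 / math.sqrt(information)

# Two hypothetical 20-item forms (difficulties in logits).
targeted = [-0.5, -0.25, 0.0, 0.25, 0.5] * 4   # clustered near theta = 0
spread = [-3.0, -1.5, 0.0, 1.5, 3.0] * 4       # spread across the scale

for label, form in (("targeted", targeted), ("spread", spread)):
    print(label, round(standard_error(0.0, form), 3))
```

In this sketch the form whose difficulties cluster near theta = 0 yields the smaller standard error at that point, which is the sense in which difficulty parameters can be chosen to give the most accurate scores at the score levels viewed as most important.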

Graduation Tests

Some states now have a test requirement for high school graduation; that is, a test that must be passed in order to graduate from high school. In Florida, for example, the graduation test is first administered in Grade 10 and is based on skills approved by the state board of education. The test contains two sections, communications and mathematics. Each section contains seventy-five items measuring fifteen skills. Items are selected within each skill area on the basis of Rasch difficulty values. Test scores are equated to the base-year scale on the basis of common item sets and a linking design.

In New Jersey, the state department of education has discontinued the Minimum Basic Skills Tests that were administered during the 1980s in Grades 4, 7, 9, and 11. In 1985, the state implemented the High School Proficiency Test (HSPT), which students may take beginning in Grade 9. The HSPT applies to students who enter high school in September 1985 or thereafter. In order to graduate from high school, students must achieve a passing score on the HSPT sometime between Grades 9 and 12.

Specifications for the HSPT are developed by state committees of educators and business people. Based on these specifications, item development committees, consisting of teachers, curriculum supervisors, and others, write sample items to fit the specifications. The specifications, together with sample items, go to an independent contractor who writes additional items for each specification. These items are returned to the item development committees for review and revision. The committees decide whether the items meet specifications and make appropriate revisions. Items judged appropriate are returned to the independent contractor for field testing. Items are placed in tryout tests and field tested in ninth-grade classes throughout the state. Following the field testing, item data are available and the committees rate items as either acceptable or not acceptable. The state department of education chooses from among the acceptable items those which satisfy the test specifications and returns them to the contractor. The contractor is responsible for producing test copy which is reviewed by state department staff. A minority review group reviews the initial items that are written for field testing as well as the final test.

Within a given skill specification, the HSPT content can vary from year to year. The concept of average, for example, might be tested in a straightforward way one year and in a somewhat more advanced way in a subsequent year. A straightforward approach might require the student to simply find the average of a given set of numbers. A more demanding exercise might require the student to solve an exercise such as the following:

The average of five numbers is 18. If one of the numbers is 10, what is the average of the other four numbers?

(A) 2 (B) 14 (C) 20 (D) 24 (E) 26
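A worked solution shows why this exercise demands more than applying the averaging procedure. The symbol $S$ for the sum of the five numbers is introduced here and is not part of the original item.

```latex
\begin{align*}
S &= 5 \times 18 = 90 && \text{(sum of all five numbers)} \\
S - 10 &= 80 && \text{(sum of the other four numbers)} \\
\frac{80}{4} &= 20 && \text{(average of the other four)}
\end{align*}
```

The result corresponds to choice (C). The student must work backward from an average to a total before averaging again, rather than simply computing a mean from listed values.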

The New Jersey state department is currently planning to develop forms of the examination that are not immediately released so that equating can be done more directly. In 1993, the state plans to shift to an eleventh-grade test that will be more demanding than the present ninth-grade test, which contains material through pre-algebra and some geometry. An early warning test may be offered in Grade 8.

CASE STUDIES

In order to provide a clearer idea of how particular states de-velop their assessment instruments, the approaches taken byFlorida. Massachusetts, and Michigan are described below.'

' The author wishes to express his thanks to Mark Heidorn (Florida).Elizabeth Badger (Massachusetts). and Sue RignLy (Michigan) forproviding information about their state assessment activities and forreviewing a draft of the descriptions provided for their respectivestates. Rebecca Christian (Louisiana) and Stan Rabinowitz (New Jer-sey) also provided helpful information about current assessment ac-tivities in their states.


Case Study: Florida

1. Who develops and reviews specifications?

Minimum Student Performance Standards are developed under the direction of the state department of education by local writing committees representing Florida schools. The standards are defined by component skills, which are the instructional objectives assessed by the program. These standards and skills are reviewed and approved by the state board of education. The next step is to develop item-level specifications. These are generally written under contract by state university centers or staff together with personnel from local school districts. The contracting agency and school personnel are both involved in writing and reviewing these specifications. Typically, a team writes both a specification and a single sample item. After this phase, the state conducts a department-level external review with a group of teachers and curriculum leaders who are experienced with the content the specifications are designed to measure and who have been selected to represent major ethnic groups and geographical regions of the state. After the item-level specifications and sample items have been reviewed, they are sent to the sixty-seven school districts for approval and validation. The goal is to have each district conduct a thorough review. Most districts take this task seriously. If some specifications need to be reworked as a result of these reviews, they may be resubmitted to the districts for approval. Finally, item specifications and sample items are prepared in a formal document, which is shared with the districts.

2. Who writes items?

The process for writing items is similar to the item specifications and sample items process. The state contracts again with state university centers or staff or with local school districts, who obtain teachers at the district level to write questions. All test items, after having been internally reviewed by the contracting team, are pilot tested by the contractor in Florida classrooms. As a part of pilot testing, students are interviewed about their solutions to questions.

3. Who reviews items?

The items are reviewed by the contracting agency at the time they are written. This review involves teachers employed by the contracting agency.

4. Who edits items?

Editing is conducted by the contractor in several phases. Items are also edited by department staff as part of the review process.

5. Are items tried out?

Items are initially tried out on groups of twenty to twenty-five students. Following this phase, the contractor revises the items and prepares them for state department review. After this review, the items are field tested statewide, using samples of approximately fifteen hundred students. The field testing is done at the time of the regular assessment. Usually, the experimental forms are given immediately after the regular assessment is completed. In some instances, items are embedded in operational test forms.

6. What is the item selection/rejection process?

Questions may be revised or rejected at any stage of development. Questions that prove to be satisfactory as judged by the field test results are placed in the state department's item bank.

7. Who assembles the final test?

Members of the state department staff assemble new forms of the test using questions in the item bank. These banks are used for Grades 3, 5, 8, and 10. Skills assessed in a given year may vary from the previous year, but a substantial core of common items is generally used. Although the item specifications may be the same, the items used to measure those specifications might change. State department staff generally attempt to select questions that have approximately equal percent-correct values to measure the same specification. This tends to assure that tests are roughly comparable in difficulty from year to year. In a given year, approximately 60 percent of the items may be identical to those used in a previous year. The remainder test the same specification, but with a different question. A small number of items pertaining to certain skills may be removed from or added to the test each year.
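The matching of replacement questions by percent-correct value can be pictured with a short sketch. This is only an illustration under stated assumptions: the function name, the record layout, and the bank entries are hypothetical and are not taken from Florida's procedures or data.

```python
def pick_replacement(retired_item, bank, used_ids):
    """Pick the unused bank item for the same specification whose
    percent-correct value is closest to that of the retired item."""
    candidates = [
        item for item in bank
        if item["spec"] == retired_item["spec"]
        and item["id"] != retired_item["id"]
        and item["id"] not in used_ids
    ]
    if not candidates:
        return None  # nothing suitable in the bank; a new item would be needed
    return min(
        candidates,
        key=lambda item: abs(item["pct_correct"] - retired_item["pct_correct"]),
    )


# Invented bank entries for illustration only.
bank = [
    {"id": "A1", "spec": "average", "pct_correct": 72},
    {"id": "A2", "spec": "average", "pct_correct": 55},
    {"id": "A3", "spec": "average", "pct_correct": 70},
    {"id": "B1", "spec": "percent", "pct_correct": 71},
]
replacement = pick_replacement(bank[0], bank, used_ids=set())
print(replacement["id"])  # A3
```

Choosing the nearest percent-correct value within a specification is one simple way to keep successive forms roughly comparable in difficulty; an operational program would presumably weigh other item statistics as well.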

8. What are the review steps?

The test is reviewed internally by state department staff. One staff member has primary responsibility for this process and several others are involved in assembly and review.

9. Are forms equated from year to year?

In Grades 3, 5, and 8 the test assemblers try to build comparable forms based on available statistical data. Staff replace approximately 30 to 40 percent of the items each year. The graduation test, which is first administered in Grade 10, is based on skills approved by the state board of education. This consists of two sections, communications and mathematics. Each section contains seventy-five items measuring fifteen skills. Items are selected within each skill on the basis of Rasch difficulty values. Test scores are equated to the base year scale on the basis of common item sets and a linking design.
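The text does not say how the linking is computed. As a rough sketch, assuming a Rasch model and a simple mean-mean linking of common items (one textbook approach, not necessarily Florida's), the idea might look like the following. All function names and difficulty values are invented for illustration.

```python
import math


def rasch_probability(theta, b):
    """Rasch model: probability that a student of ability theta
    answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def linking_shift(base_difficulties, new_difficulties, common_ids):
    """Mean-mean linking constant: the average difference between the
    base-year and new-year difficulty estimates of the common items."""
    diffs = [base_difficulties[i] - new_difficulties[i] for i in common_ids]
    return sum(diffs) / len(diffs)


# Invented difficulty estimates (in logits) for illustration only.
base_year = {"c1": -0.50, "c2": 0.20, "c3": 1.10}
new_year = {"c1": -0.80, "c2": -0.10, "c3": 0.80, "n1": 0.40}

shift = linking_shift(base_year, new_year, common_ids=["c1", "c2", "c3"])
# Place every new-year item on the base-year scale.
new_on_base_scale = {item: b + shift for item, b in new_year.items()}
print(round(shift, 2), round(new_on_base_scale["n1"], 2))  # 0.3 0.7

# Chance that a student at the scale origin answers the new item correctly.
p_new_item = rasch_probability(theta=0.0, b=new_on_base_scale["n1"])
```

Because the common items appear on both forms, any systematic difference in their estimated difficulties is attributed to the scale rather than to the items, and the same shift is applied to the items that are new.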

10. What use is made of the tests?

Individual school districts receive considerable feedback at each grade level. The following materials are made available: individual student reports, class summaries, school summaries, and district and regional summaries. These reports enable one to identify individual students and the percentage of students at each level who fail to master specific skills and standards. If the average mastery rate for a school falls below a certain level, the school is considered to be deficient. Students who do not pass must be given remediation, but they are not held back from the next grade on the basis of the test score. Decisions to retain students could be made on the basis of the results of remediation.

11. Miscellaneous comments.

Part I of the assessment program at Grade 10 consists of basic skills. Part II is an assessment of the application of the basic skills to everyday situations. Part II was moved to the tenth grade so that students who fail can have additional opportunities to take the test during their junior and senior years. Florida is also developing a series of secondary school subject area tests: English I Skills, English I, English Honors, Algebra I, Algebra I Honors, Introduction to American History, American History, and Advanced American History.

Case Study: Massachusetts

1. Who develops and reviews specifications?

The Massachusetts statewide assessment program uses the NAEP objectives, but adapts them to fit local goals. The assessment takes a forward-looking approach as opposed to evaluating the status quo. The assessment covers Grades 4, 8, and 12. The focus is not just on what is taught at these specific grade levels, but on what it is important for students to know at each of these grade levels.

In order to determine what topics are important, all public schools in Massachusetts are surveyed periodically. Teachers are provided with a list of content topics and are asked to rate the topics according to their importance at the three grade levels. The state appoints a committee in each subject area consisting of Massachusetts public school teachers and curriculum coordinators. This committee interprets the ratings of the various topics and sets the specifications for assessment at each grade level. A booklet providing a framework for the assessment is prepared. The booklet contains a description of the content to be assessed in Grades 4, 8, and 12. Sample questions are also provided. A two-dimensional assessment matrix provides guidelines for writing items and assembling the final test. In addition to content areas, such as Numbers and Numeration, Measurement and Geometry, Problem Solving, and Probability and Statistics, there are four process categories:

• Procedural knowledge
• Conceptual understanding
• Problem solving
• Reasoning and analysis

The process categories, the content categories, and many content subcategories become the reporting categories for school results.

Approximately two to three hundred questions are used at each grade level. There are at least twelve questions in each reporting category; for example, in Operations: Whole Numbers, there would be at least twelve questions scattered across the four process categories.
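The twelve-question minimum lends itself to a simple coverage check on a draft item pool. The sketch below is hypothetical: it assumes each item is tagged with one content category and one process category and counts toward both, which the text implies but does not spell out.

```python
from collections import Counter

MIN_PER_CATEGORY = 12  # "at least twelve questions in each reporting category"


def underfilled_categories(items, categories, minimum=MIN_PER_CATEGORY):
    """Return the reporting categories that have fewer than `minimum` items.

    Each item is a (content_category, process_category) pair and is
    assumed to count toward both of its categories."""
    counts = Counter()
    for content, process in items:
        counts[content] += 1
        counts[process] += 1
    return {c: counts[c] for c in categories if counts[c] < minimum}


processes = [
    "Procedural knowledge",
    "Conceptual understanding",
    "Problem solving",
    "Reasoning and analysis",
]
# Twelve whole-number items, three in each process category.
items = [("Operations: Whole Numbers", p) for p in processes for _ in range(3)]
short = underfilled_categories(items, ["Operations: Whole Numbers"] + processes)
print(short)
```

With only these twelve items, the content category meets the minimum while each process category still holds three items, so items from other content areas would be needed to fill the process rows of the matrix.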

2. Who writes items?

Questions are written by an independent contractor. Massachusetts also uses questions from NAEP and other sources.

3. Who reviews items?

Items are reviewed by the independent contractor, committees consisting of teachers and curriculum coordinators, and the Massachusetts state department staff. An equity review committee addresses minority and gender issues. In the future, equity issues will be addressed by a curriculum advisory committee.

4. Who edits items?

Items are edited by the independent contractor and by various reviewers along the way.

5. Are items tried out?

All new items are tried out in Massachusetts public schools. The state department identifies the schools and assists in piloting. The independent contractor prepares the tryout tests. All questions that appear in tryout tests are reviewed and approved by the committee of teachers and curriculum coordinators. Questions used from NAEP are not tried out since these have already been through a similar process.

6. What is the item selection/rejection process?

Newly written questions as well as questions from other sources are considered for the final assessment. Individual items are rated by the committee and placed in one of three categories:

   1. The committee likes the question, and it is appropriate for assessment.

   2. The committee likes the question, but some modification is needed.

   3. The question is not appropriate and is rejected.

The committee reviews, culls, and classifies questions, and matches them with the content-process matrix. If there are gaps, these are filled in by the independent contractor.

7. Who assembles the final test?

The committee approves all items that go into the final test. The pool of approved items then goes to the independent contractor, who assembles booklets according to a matrix-sampling plan that will meet the content and process goals of the assessment. The purpose of the assessment is not to provide a report of individual student performance, but rather to provide data at the school, district, and state levels about the performance of Massachusetts students on important content and process dimensions. Some open-ended questions are included in the assessment.

8. What are the review steps?

The test is reviewed by a committee of Massachusetts teachers and curriculum coordinators, Massachusetts state department staff, and the independent contractor.

9. Are forms equated from year to year?

Although test forms are not equated from year to year, results can be compared, since only about 15 to 20 percent of the questions are usually replaced between consecutive years.

10. What use is made of the tests?

The test results are used in three basic ways:

• At the school level, reports are prepared if more than twenty students in a school are tested at a given grade level.
• Detailed reports are prepared for use by the state department of education.
• Questionnaires administered to teachers, students, and principals are used to help explain the results of the assessment. For example, actual test questions are shown on the teacher questionnaire and teachers are asked about the students' opportunity to learn what is tested by the various questions. This type of information helps interpret results.

11. Miscellaneous comments.

A primary goal of the Massachusetts assessment program is to help schools evaluate and improve their performance. Several activities currently underway are directed toward this goal. For example, responses to the open-ended sections of the test have been analyzed by curriculum committees, and the items, results, and implications for instruction will be published in a series of booklets. Teachers will be encouraged to administer the questions to their own students for comparison and diagnosis.

Massachusetts is also promoting performance testing in mathematics and science. Approximately seventy teachers have been trained to test a random sample of one thousand students in Grades 4 and 8. Videos of the testing, together with written reports of the results, will be used to help teachers improve classroom assessment.

Case Study: Michigan

1. Who develops and reviews specifications?

The Michigan Educational Assessment Program (MEAP) tests all students in mathematics and reading at Grades 4, 7, and 10. The basis for these tests is the "Essential Goals and Objectives in Mathematics" (developed by groups of Michigan content specialists and approved by the state board of education) together with item specifications. State department staff are currently revising the specifications for the mathematics tests as part of a comprehensive test revision process resulting from the recent approval of new "Essential Goals and Objectives in Mathematics." The first every-pupil MEAP administration of the revised mathematics tests is scheduled for 1991.

The "Essential Goals and Objectives" may be viewed as a table of specifications for the test, whereas item specifications detail the standards and conventions to be employed within individual items. Item specifications for the current mathematics tests were developed by the Michigan Council of Teachers of Mathematics several years ago. Some gaps in those item specifications have been identified and will be corrected as part of the test-revision process.

The test-revision process includes consultation with content area specialists (teachers, curriculum coordinators, and others knowledgeable about the mathematics curriculum) throughout the state. A twenty-member advisory group, the Mathematics Coordinating Committee, has been formed to advise the MEAP staff on test-related issues, including item writing. The committee represents different geographical regions of the state as well as different vertical perspectives, that is, college faculty, teachers at appropriate grade levels, and curriculum specialists. Such diversity tends to assure a good balance of viewpoints. Also, this group tends to be a good resource for disseminating accurate information about the assessment program across the state.

2. Who writes items?

Items are generated by mathematics educators in Michigan based on objectives approved by the state board of education. Meetings are held at which math educators establish a conceptual base for formulating questions. Important instructional outcomes are identified and the questions developed. The items are written by various subgroups for each of several content strands. For example, the content strands include areas such as fractions, decimals, and geometry. The writing groups also focus on the process dimension. The following process categories are used: conceptualization, mental arithmetic, estimation, computation, applications, calculators, and computers.

3. Who reviews items?

Items are reviewed and edited by an independent contractor. The contractor comments on (a) the match between item and specification, (b) item classification, and (c) duplication of items and psychometric considerations. The contractor types items, supplies relevant art work, provides comments on individual items, and then returns them to the original writing group for a strand review. Classroom teachers and content specialists who wrote the items review them along with the contractor's comments. Substantial modification may occur at this stage. Items are also subjected to in-house review by one member of the MEAP staff and by two consultants. The following questions are asked as each item is reviewed:

   1. Is the item in its current form appropriate to the objective?
   2. Is the item clearly worded?
   3. Is the item grade-level appropriate?
   4. Are the distractors reasonable?
   5. Is the artwork correct?

4. Who edits items?

Items are edited by the independent contractor (see step 3 above).

5. Are items tried out?

The most recent item tryout was informal, consisting of questions that were administered to a sample of students from representative districts. Following this phase, MEAP staff asked that schools volunteer for a formal item tryout. Districts agreed to provide students, and test results were shared with the pilot schools. Michigan also pilots the actual test to see how the questions fit together as a group and to provide a trial run of administration procedures and support materials. The next test pilot is scheduled for the fall of 1990.

6. What is the item selection/rejection process?

The twenty-member advisory group and various other review groups have input throughout the test development process. The final selection of items is made by MEAP staff.

7. Who assembles the final test?

The final test is assembled by the independent contractor with a detailed recipe furnished by MEAP. Attention is given to appropriate content areas as well as to the problem-solving process dimension.


8. What are the review steps?

The test prepared by the independent contractor is reviewed by the twenty-member committee, MEAP staff, and two consultants. There is a pilot trial of the test as described under Step 5 above. Final test copy is proofed by MEAP staff, the advisory committee members, and content specialists.

9. Are forms equated from year to year?

In the past, the test changed very little from year to year and there was no reason to equate. New forms are being developed, and they will be equated and year-to-year trends charted.

10. What use is made of the tests?

Test results are used to generate student reports, school and district level reports, and detailed state summaries. At the school level, results provide useful information for:

• individual student remediation,
• curriculum review and improvement, and
• students, parents, school boards, and the public.

At the state level, results are also used as a basis for funding allocation and research.

11. Miscellaneous comments.

Michigan is considering the development of an Employability Skills assessment. The impetus for this activity is the perceived lack of certain basic skills on the part of those who enter the job market.

National Assessment of Educational Progress (NAEP) Procedures

Because so many states are involved in various aspects of national assessment, including the use of NAEP objectives, items, and comparative data, it may be useful to outline briefly NAEP development procedures. In many ways, the NAEP procedures have parallels in state assessment efforts. However, the consensus-building activity, specifications-setting process, and test-development procedures are generally more complex. An outline of the key NAEP activities leading to the 1990 assessment is provided below.


Planning Phase

In 1988 Congress authorized NAEP to provide for voluntary state-by-state assessment in addition to the traditional assessment activities carried out by NAEP over the last twenty years. Government funds were made available to the Council of Chief State School Officers (CCSSO) to lay the groundwork for state comparisons. The charge to the CCSSO was to recommend objectives for the state-level assessment and to suggest how state results should be reported. A planning group under the direction of the CCSSO was appointed to develop objectives for Grades 4, 8, and 12. Legislation was subsequently passed to specify that Grade 8 would be the target grade for state-by-state assessment activity. Because the 1990 assessment is designed to provide state-level performance reports as well as national reports, in its planning phase careful attention was given to the objectives of the various states. Also, advice was sought from various other groups.

The objectives developed under the direction of the CCSSO were then formulated and refined by a Mathematics Objectives Committee composed of teachers, administrators, mathematics educators from various states, mathematicians, parents, and citizens. Sample questions were also developed. These materials underwent extensive review by the states, by NAEP policy groups, and by others. The objectives underwent further review by NAEP's Item Development Panel and a framework for the assessment was published in November 1988.² The framework called for assessment in five content areas and three ability levels. The content areas are Numbers and Operations; Measurement; Geometry; Data Analysis, Statistics, and Probability; and Algebra and Functions. The ability levels are Conceptual Understanding (CU), Procedural Knowledge (PK), and Problem Solving (PS). Percentages were set at each of Grades 4, 8, and 12 for both content areas and ability levels. For example, at Grade 8 the percentages for CU, PK, and PS were 40 percent, 30 percent, and 30 percent, respectively. The Mathematics Objectives booklet provides a detailed description of what the ability levels encompass and the range of subtopics included under the various content areas. Sample items are also included.

² This publication, Mathematics Objectives 1990 Assessment, can be ordered from the National Assessment of Educational Progress at Educational Testing Service, Rosedale Road, Princeton, NJ 08541-0001. Refer to publication No. 21-M-10.
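To make the role of the target percentages concrete, the following sketch turns the Grade 8 ability-level percentages quoted in the text (40, 30, and 30) into whole-number item counts. The function name, the rounding rule, and the block size of 25 items are assumptions made for illustration; the framework does not prescribe them.

```python
def allocate_items(total_items, percentages):
    """Turn target percentages into whole-number item counts, using
    largest-remainder rounding so the counts sum to total_items."""
    exact = {k: total_items * p / 100 for k, p in percentages.items()}
    counts = {k: int(v) for k, v in exact.items()}
    leftover = total_items - sum(counts.values())
    # Hand the remaining items to the categories with the largest fractions.
    by_fraction = sorted(exact, key=lambda k: exact[k] - counts[k], reverse=True)
    for k in by_fraction[:leftover]:
        counts[k] += 1
    return counts


grade8_ability = {"CU": 40, "PK": 30, "PS": 30}  # percentages from the text
counts = allocate_items(25, grade8_ability)      # 25 is a hypothetical pool size
print(counts)
```

The same function could be applied to the content-area percentages, so that a draft pool can be compared against both dimensions of the framework.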

The ten-member Mathematics Item Development Committee then began work on the item-writing phase of the project. In addition to this group, about twenty-five other mathematics teachers and educators across the country were asked to develop new items. Test specialists at the Educational Testing Service (ETS) conducted an on-site training session for thirteen of the item writers and made specific item-writing assignments based on the objectives.

The following steps were then taken by the NAEP/ETS development team:

1. Internal review of newly developed items. Items were reviewed for clarity, appropriateness to the specification, technical accuracy, age-level appropriateness, distractors, and for possible offensiveness to population subgroups. Gaps in specifications were filled by test development staff.

2. Items that were judged acceptable were classified and filed. Rejected items were filed separately.

3. Seven mathematics test specialists assembled draft field test blocks, keeping in mind the overall target specifications for the 1990 assessment, as well as making judgments about the overall content and difficulty of individual blocks.

4. Individual blocks were reviewed internally by other test specialists, the first reviewer working with the assembler to achieve mutually acceptable revisions or replacements.

5. A second reviewer then reviewed the revised version to yield a final draft of the block.

6. All draft blocks were typed and submitted to several program staff for their review.

7. Final revisions were made to blocks, which were then proofed and keyed by yet another test specialist.

8. The eighth-grade blocks, which will be used for the state trial assessment, were reviewed by about sixty state representatives from forty states. This group consisted primarily of mathematics specialists and testing directors. The blocks were reviewed in groups consisting of ten state representatives, one NAEP/ETS staff member, one ETS test development specialist, and one staff member from the National Center for Education Statistics (NCES). This process assured that each block was reviewed independently by two groups. Suggestions from the various groups regarding the appropriateness of questions, wording, classification, and other issues were collated and shared with the Item Development Committee, which met shortly thereafter.

9. After all blocks were prepared, they were mailed to members of the Item Development Committee for review. A committee meeting was subsequently held to discuss and revise the questions so that final blocks could be prepared for field trials. At this meeting, suggestions made by the sixty state representatives were evaluated and incorporated into the blocks, or rejected for use.

10. After the Item Development Committee meeting, revisions were made to blocks by the various test assemblers, and each block was edited for clarity, usage, and format. All blocks received a sensitivity review as prescribed by ETS guidelines.

11. Final draft blocks were submitted to NCES for review and clearance by the Office of Management and Budget.

12. Camera-ready copy of each block was then prepared and reviewed by the original test assembler and another mathematics test specialist.

13. Galleys were produced and, following additional quality control checks by test specialists and NAEP program staff, the materials were printed and shipped to field test sites.

14. In February 1989, field trials took place in the nation's schools.

15. Assembly of the final 1990 assessment blocks followed much the same development and review procedures that are outlined above. However, item statistics on multiple-choice and open-ended items, as well as the readability characteristics of student solutions to the open-ended items, were available to assist in final assembly. Also, items from previous NAEP assessments together with the newly field-tested items were available for final assembly. All items were classified, and assemblers of final blocks maintained counts of items selected so that overall content, ability, and statistical specifications were met.

SUMMARY

My overall impression is that the states engaged in assessment activities do a thorough job of developing specifications. Advice and review are sought from a wide spectrum of the educational community. Also, representatives of business and industry are sometimes involved.

Items and tests appear to receive thorough reviews. The one area that seems to need more attention is item writing. No matter how carefully specifications are set, if the questions written to fit these specifications are not crafted with skill and care, the impact of the assessment will not be as significant as it might otherwise be. The new NCTM Standards are beginning to have an impact on state assessment activities. In the future, the Standards should play an even greater role in providing proper focus for test content and the process dimension along which content resides. At the present time, states are making strides in extending the range of processes measured to better cover conceptual understanding and higher-order thinking. However, in most available assessment instruments there are many more examples involving standard procedures and simple understandings than exercises that call for deeper understanding and significant problem solving.

It might be helpful to view each question written and selected for inclusion on the assessment instrument as a candidate for the front page of the New York Times, displayed there so all can see what it is important to test. Viewed in this context, certain questions might be withdrawn from consideration.


6

Test Development Profile of a State-Mandated Large-Scale Assessment Instrument in Mathematics

Tej Pandey

From the methodological point of view, the character and design of test instruments that are optimum for individual-level and for group-level assessments are quite different. This paper examines the nature and design of test instruments of a large-scale assessment program, the California Assessment Program (CAP), which is designed to provide reliable group-level information. The paper also describes the test development process as it has evolved over a period of fifteen years to meet the curriculum demands of the time.

Large-scale assessments can be classified into two main types: in one the interest lies primarily at the level of individuals and in the other the interest lies primarily at the group level. Individual-level assessments typically use test information to rank a student on an established norm, find an individual's strengths and weaknesses, and determine whether a student has mastered specific course content. Group-level assessments typically use the information to measure the achievement level of students in a school, district, or regional system for purposes of determining program effectiveness. Group-level assessments generally are concerned with trends in achievement from one cycle of assessment to the next and may even incorporate provisions in assessment designs to relate trends with other factors.


PURPOSES OF LARGE-SCALE ASSESSMENTS

Large-scale assessments generally have their genesis either in federal programs, such as Titles III and V of the National Defense Education Act (NDEA), or in state-level accountability programs. Both federal- and state-level accountability programs require the centralized collection of test scores, primarily from commercially published standardized tests, by state departments of education. In the last two decades, although the character of assessments for accountability has changed and the purposes of assessment have varied from program to program, they are usually designed to evaluate curricular programs, gather curricular and related information for policy development, and stimulate curricular practices. As Cronbach (1980) states:

Outcomes of instruction are multidimensional, and asatisfactory investigation will map out the effects of thecourse along these dimensions separately. ... To agglom-erate many types of post-course performance into a singlescore is a mistake, since failure to achieve one objectiveis masked by success in another direction.... Moreover,since a composite score embodies (and usually conceals)judgments about the importance of the various outcomes,only a report that treats the outcomes separately can beuseful to educators who have a different value hierar-chy. (p. 236)

In other words, assessment for broader educational andsocietal uses calls for tests that are comprehensive in breadthand depth. Both breadth and depth can be covered by includ-ing a large number of questions for assessment using a varietyof assessment modes, such as direct -ssessment of perfor-mance, open-ended questions, and portfolios, in addition to themultiple-choice format.

MULTIPLE -MATRIX SAMPLING

Since the interest in program assessment is not to obtain scores for individual students but to see how well a body of subject matter has been learned by a cohort of students, multiple-matrix sampling or item sampling can be used effectively. Under matrix sampling or an item-sampling plan, a universe of test items is subdivided into multiple test forms with each form administered to a certain number of examinees selected randomly from the population of examinees. Although each examinee is administered only a portion of the test items in the total pool, the results from each subtest may be used to estimate the parameters of the universe scores, such as the mean, variance, and associated standard errors.

Item-sampling procedures have several advantages over the conventional testing procedure. First, since in item sampling no student takes more than a small portion of the total item pool, the test takes less classroom instructional time, is less fatiguing to students, and results in greater cooperation from students and school authorities. Second, since it allows for testing a large number of questions, it results in a comprehensive assessment leading to more content-related information. Third, it produces more reliable group scores. Lord (1962) showed that for a fixed number of student-item confrontations, the group mean of the item domain is estimated most reliably when the size of the item subset is one, that is, when each item is taken by a different sample of students. Greater reliability is achieved because item responses tend to be positively correlated over the population: two items presented to one student will not generally supply as much information about the mean of the item domain as two items presented to different students.
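The reliability argument can be illustrated with a small simulation. The sketch below is not Lord's derivation; it assumes a deliberately simplified model in which each simulated student has a single propensity to answer any item correctly (drawn uniformly between 0.2 and 0.8, a hypothetical choice), so that responses from the same student are positively correlated. The function name, the 200 confrontations, and the number of replications are likewise illustrative assumptions.

```python
import random
import statistics


def variance_of_group_mean(items_per_student, confrontations=200, reps=500, seed=1):
    """Empirical variance of the estimated group mean over repeated samples.

    The total number of student-item confrontations is held fixed, so giving
    each student more items means testing fewer students.
    """
    rng = random.Random(seed)
    n_students = confrontations // items_per_student
    estimates = []
    for _ in range(reps):
        correct = 0
        for _ in range(n_students):
            # Hypothetical propensity shared by all of one student's responses;
            # this is what makes responses within a student correlated.
            p = rng.uniform(0.2, 0.8)
            correct += sum(rng.random() < p for _ in range(items_per_student))
        estimates.append(correct / (n_students * items_per_student))
    return statistics.variance(estimates)


var_one_item = variance_of_group_mean(items_per_student=1)
var_ten_items = variance_of_group_mean(items_per_student=10)
print("one item per student: ", var_one_item)
print("ten items per student:", var_ten_items)
```

Under these assumptions the estimate based on one item per student should show the smaller sampling variance, which is the pattern the paragraph above attributes to positively correlated item responses.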

THE CAP'S ITEM-SAMPLING DESIGN

As of spring 1990, the California Assessment Program administers tests in reading, written expression, and mathematics annually in Grades 3, 6, 8, and 12. Also, science and history-social science are assessed at Grade 8, and direct writing is assessed at Grades 8 and 12. The assessment program uses a nonoverlapping item-sampling design (see Pandey, 1974; Pandey and Carlson, 1975, 1983) for student assessment. Since the assessment program provides information to all schools "small and large," all students are tested rather than a sample of students. Table 6-1 shows the total number of questions and the number of forms in each of the content areas for each grade tested.

Table 6-1
Number of Questions and Test Forms of CAP Tests at Grades 3, 6, 8, and 12

                               Grade 3   Grade 6   Grade 8       Grade 12      Total

First year administered        1980      1982      1984          1987

Content areas tested
  English-language arts                            x (a)         x (a)
  Reading                      x         x
  Written expression           x         x
  Direct writing assessment                        1987          1988
  Mathematics                  x         x         x             x
  History-social science                           1985
  Science                                          1986

Number of forms                30        40        60            24            154
Items per form                 34        31        36            26 (b)
Total items                    1,020     1,240     2,160         608           5,026

Items per form by content
  Reading                      8         10        6             10
  Written expression           14        9         4 (editing)   4 (editing)
  Mathematics                  12        12        7             10
  History-social science                           10
  Science                                          9

Number of skill scores
  Reading                      27        54        13            13            107
  Written expression           34        39                                    73
  Direct writing assessment                        33            17            50
  Mathematics                  29        50        40            9             128
  History-social science                           41                          41
  Science                                          40                          40

Total skill scores             90        143       167           39            439

Supplementary information collected at all four grades: sex, mobility, and English-language fluency.

Supplementary information collected at one or more grades: other language spoken; SES (parent occupation); SES (parent education); special program participation; time reading; time watching TV; time on homework; writing assignments; attitude toward subjects; ethnic background; courses completed; extracurricular activities; grades repeated; special math questions; post high school activities; school climate; open-ended mathematics.

NOTES
(a) At grades 8 and 12, reading and editing are combined into English-language arts.
(b) 24 items per form plus two math computation items per each of 15 supplements (30 computation items total).

From Table 6-1 it is apparent that the Survey of Basic Skills: Grade 3 (1980 version) consists of a total of 1,020 items (240 in reading, 420 in written expression, and 360 in mathematics) divided into thirty unique test forms. Under the item-sampling procedure, each test form consists of a total of 34 questions made up of 8 questions in reading, 14 questions in written expression, and 12 questions in mathematics. Each form of the test is constructed to have an equal number of easy and difficult questions and consists of items from all major skill areas, stratified by difficulty and content. For test administration, the forms are stacked sequentially and are distributed to students in a manner similar to conventional tests. Since each student takes only one of the thirty forms containing 34 questions, testing time is limited to only one class period.
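One way to picture the Grade 3 form layout is the following sketch. It is an illustration only, not the CAP's actual assembly procedure: the item difficulties are randomly generated placeholders, the banding rule (sort by difficulty, then give every form one item from each band of thirty) is an assumed way of stratifying by difficulty and content, and `form_for_student` assumes that sequentially stacked forms reach consecutive students in rotation.

```python
import random

# Grade 3 pool sizes and number of forms, as reported for the 1980 version.
AREA_SIZES = {"reading": 240, "written_expression": 420, "mathematics": 360}
N_FORMS = 30


def build_forms(area_sizes, n_forms, seed=0):
    """Spread each content area's items over the forms, one difficulty band at a time."""
    rng = random.Random(seed)
    forms = [[] for _ in range(n_forms)]
    for area, size in area_sizes.items():
        # Hypothetical item records: (content area, item number, difficulty).
        items = [(area, number, rng.random()) for number in range(size)]
        items.sort(key=lambda item: item[2])
        # Each run of n_forms neighbouring items is one difficulty band;
        # every form receives exactly one item from every band.
        for start in range(0, size, n_forms):
            band = items[start:start + n_forms]
            rng.shuffle(band)
            for form, item in zip(forms, band):
                form.append(item)
    return forms


def form_for_student(seat_index, n_forms=N_FORMS):
    """Assumed rotation: consecutive students receive consecutive forms."""
    return seat_index % n_forms


forms = build_forms(AREA_SIZES, N_FORMS)
print(len(forms), "forms of", len(forms[0]), "items each")
```

Because every content-area pool is a multiple of thirty, each form ends up with 8 reading, 14 written expression, and 12 mathematics items, and no item appears on more than one form, which is what a nonoverlapping design requires.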

Since CAP administers tests to each student in each school, it allows for aggregating data to produce reports at the school and district levels. Although CAP's procedure could allow reports at the classroom level, no classroom reports are produced.

[Figure 6-1. CAP Survey of Basic Skills: Program Diagnostic Display for Mathematics, Grade 3, 1988.]

The report shown in Figure 6-1 is the skill area report for a typical school, showing the total score along with the subscores useful for program diagnostic purposes. The subscores are shown with a band of 0.67 standard error of measurement around the point estimate to discourage overinterpretation of skill area scores. In general, if the skill area band is below the total score line, it reflects an area of relative weakness; similarly, if the band is clearly above the total score line, it reflects an area of relative strength. If the band overlaps the total score, it is neither an area of relative weakness nor relative strength. The interpretation and meaning of these data must be judged professionally by curriculum experts before any curricular changes are made. The experts will take into consideration the importance of certain skill areas, their interrelationship with other skill areas, and the nature and relevance of the questions on the test.
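The reading rule for the bands can be written out as a short function. This is a sketch of the rule as stated in the text, not the CAP's reporting software; the function name and the example scores are hypothetical, and the band is assumed to extend 0.67 standard errors on each side of the skill area score.

```python
def classify_skill_area(skill_score, standard_error, total_score, band_width=0.67):
    """Compare a skill area band with the total score line."""
    lower = skill_score - band_width * standard_error
    upper = skill_score + band_width * standard_error
    if upper < total_score:
        return "relative weakness"   # whole band lies below the total score line
    if lower > total_score:
        return "relative strength"   # whole band lies above the total score line
    return "neither"                 # band overlaps the total score line


# Hypothetical school with a total score of 270 and a standard error of 10.
for name, score in [("Skill area A", 250), ("Skill area B", 272), ("Skill area C", 290)]:
    print(name, classify_skill_area(score, standard_error=10, total_score=270))
```

The classification is only a first screen; as the text notes, the meaning of any flagged area still has to be judged by curriculum experts.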

A BRIEF HISTORY OF THE CAP

As discussed earlier, the purpose of the CAP is to provide programmatic information to schools, districts, and the state as a whole. The changes in the assessment program, therefore, should be seen in light of this purpose.

Until 1972, achievement testing in California consisted of testing students in a variety of grades with one or more commercially published standardized tests. For example, in 1972, the CAP administered the Comprehensive Tests of Basic Skills (CTBS), Form S at Grade 3; the CTBS, Form Q at Grade 6; and Iowa Tests of Educational Development (ITED), Form X at Grade 12. However, it was soon realized that the standardized tests did not match California's curriculum. A statewide task force was established to examine the content of these tests and to make recommendations to the legislature. On the basis of the recommendations of this task force, the California legislature mandated that the CAP develop tests that would be appropriate to assess the variety of curricular programs in California.

Beginning in 1972, the California Assessment Program initiated the development of new tests at Grades 3, 6, and 12. With the help of statewide committees, CAP developed test content specifications at each grade level. For the sake of efficiency, CAP chose to lease questions from test publishers' item pools that matched the specifications, rather than writing its own questions. Since only reading was assessed at Grade 3, and reading, written expression, and mathematics were assessed at Grades 6 and 12, the use of leased publishers' items was reflected in the 1973 version of Grade 12 and the 1975 version of Grade 6 tests. Because only items from the publisher's item bank were used, the committees soon realized that the tests were limiting in scope, as reflected in the test content specifications, because items reflecting the quality of California's curriculum were not available for many of the strands of mathematics.


In 1975, the CAP started developing its own test items with the help of statewide content area advisory committees consisting of educators throughout California. The CAP constructed its first instrument in mathematics at the third-grade level in 1975, followed by the revision of the sixth-grade test in 1982, a new test at Grade 8 in 1984, and a revision of the twelfth-grade test in 1987. The following sections describe how the specifications of test content and, therefore, the nature of questions on the test, have been changed from 1975 to 1987.

THE CAP'S TEST CONSTRUCTION PROCEDURES

The California Assessment Program began constructing its own test questions in 1975. Although from a mechanical aspect test construction procedures from 1975 to date have remained the same, significant substantive changes have taken place in the nature of questions since that time. The paragraphs below first describe the mechanics of test construction followed by the changes in the specification of test questions and the writing of those questions.

Mechanics of Test Construction

Following are the main steps in the CAP's test development process for mathematics:

1. Establishing an assessment advisory committee. An assessment advisory committee is established consisting of curriculum specialists from the following groups: school districts, offices of county superintendents of schools, professional associations, the California State University, the University of California, and the state department of education.

2. Reviewing existing curricular and instructional materials. The CAP staff reviews the California Mathematics Framework, the state-adopted textbooks, county courses of study, and other curriculum materials, such as the Model Curriculum Guide, to prepare preliminary test content specifications. The members of the advisory committee review the specifications and help CAP staff write illustrative test questions.


3. Establishing a test development team. In addition to the assessment advisory committee, an ad hoc item-writing team, consisting primarily of classroom teachers from the appropriate grade levels, is established. The teachers are selected from a pool established from the recommendations of the advisory committee members, directors of the California Mathematics Projects, officers of the California Mathematics Council, the pool of applicants for the president's award in mathematics, and other mathematics educators having a stake in assessment. Approximately ten to fifteen teachers per grade level serve on the item-writing team.

4. Writing test questions. The item-writing team members are given the task of writing items. Some members of the advisory committee who have a special interest in a specific grade level also participate in the item-writing process.

Questions are written by the team members individually or jointly in small groups. For certain hard-to-measure concepts or problem-solving tasks, three or four members of the team may discuss the concept while perhaps two other members listen, write an item, and verify with the discussants that the item captures the concept under discussion.

For example, in 1980 when the sixth-grade test was being revised, the discussion group felt that students performed quite well on questions related to mathematical operations, but they did not understand what the different steps in the operation meant. The group wanted to provide a question in which the student did no computation but could interpret the results of a correctly performed calculation. After several trials, the item writer wrote the following question:

130 students from Marie Curie School want to go to a school picnic. A school bus can carry 50 students. John did the following calculation to find the number of buses needed for the picnic.

          2
    50 ) 130
         100
          30

John's arithmetic is correct. How many buses will be needed to carry all the students?

(A) 30    (B) 3    (C) 2    (D) 2R30

Of course, the above question is the final product after several reviews and edits. The point is that an item like this requires collective thinking, checking, and validation before it takes final shape.

5. Reviewing and editing. The item-writing team members usually meet six to eight times for two or three days each over a period of nine to twelve months. After the team members have completed the writing process, the advisory committee and the item-writing team jointly review the items. The questions are edited for clarity, appropriateness of response choices, and mathematics assessed by the items. After the committee review, the CAP staff reviews each item for consistency in format, correctness of artwork, and precision of technical writing.

6. Field testing items. Usually the number of questions for field testing is quite large. For example, during the test construction phase of the eighth-grade test between 1982 and 1984, approximately fifteen hundred questions were field tested in mathematics. For field-testing purposes, the questions are distributed into short forms, each form consisting of approximately thirty-five items so as to be easily administered in one class period. Each form is balanced for content and difficulty so that the student sees each form as a complete test in mathematics. All California school districts are sent an invitation to participate in field testing. School districts are also asked if their teachers would be willing to participate in item review. In this process, teachers review the questions from two test forms for clarity and indicate the degree of instructional emphasis and appropriateness of these items as a measure of the effectiveness of their district's mathematics program. Of the approximately eight hundred school districts having an eighth grade, five hundred volunteered to participate in field testing. Approximately six hundred teachers reviewed the questions and approximately twenty thousand students participated in the field testing process.

7. Calculating item statistics and compiling field review data. Numerous item statistics, such as item difficulty for each item in the group as a whole, are calculated. Item statistics are also arranged by subgroup of students, such as by sex, ethnic group, language fluency group, and socioeconomic category. Item correlation with the total test is also calculated for each group and for each response choice. Several bias indices, indicating the discrepancy between the performance of a particular group and the total test population, are also calculated.

8. Reviewing field-tested items. Advisory committee members review the difficulty of each item and look for problems such as bias, unclear wording, inappropriate response choices, or inconsistent formats among the items to assure that only the best items survive the analysis of the field test. The items are modified or deleted based on an indication of bias and inappropriate or misleading wordings. The committee uses field test data to improve the overall quality of items. The modified items are field tested again to check whether the modifications have introduced additional unforeseen defects.

9. Selecting the final set of items. The advisory committee members, working with the CAP staff, select the final set of test questions. The selected questions reflect the proportions of items according to an agreed-upon distribution of items as specified in the test content specifications. For example, the distribution of items according to various reporting categories of mathematics for the sixth, eighth, and twelfth grades (1984 version) is shown in Tables 6-2, 6-3, and 6-4 respectively.

Table 6-2
Skill Areas Assessed in Mathematics: Grade 6

I. Counting, Numeration, and Place Value
   A. Skills
      1. Counting and numeration
      2. Place value
   B. Applications

II. Nature of Numbers and Properties
   A. Skills
      1. Ordering and properties
      2. Classification of numbers
   B. Applications

III. Operations
   A. Skills
      1. Addition/subtraction of whole numbers
      2. Multiplication of whole numbers
      3. Division of whole numbers
      4. Addition/subtraction of decimals
      5. Multiplication/division of decimals
      6. Operations on fractions
      7. Percents and equivalent fractions and decimals
   B. Applications
      1. One-step involving whole numbers
      2. One-step involving rational numbers
      3. Two (or more) steps

IV. Expressions, Equations, and Coordinate Graphs
   A. Skills
      1. Expressions and equations
      2. Graphs and function tables
   B. Applications

V. Geometry
   A. Skills
      1. Shapes and terminology
      2. Relationships
   B. Applications

VI. Measurement
   A. Skills
      1. Metric units
      2. U.S. Customary units
      3. Perimeter, area, and volume
   B. Applications

VII. Probability and Statistics
   A. Probability
   B. Statistics

VIII. Tables, Graphs, and Integrated Applications
   A. Tables and graphs
   B. Integrated applications

IX. Problem Solving
   A. Formulation
   B. Analysis and strategy
   C. Interpretation
   D. Solution of problems

Table 6-3
Skill Areas Assessed in Mathematics: Grade 8
(Total number of questions: 468)

                                                            Percent
I. Numbers                                                     15
   A. Skills/concepts                                          10
      1. Order relations and classification                     3
      2. Number theory                                          4
      3. Properties                                             3
   B. Applications                                              5

II. Operations                                                 15
   A. Skills/concepts                                           7
      1. Whole and rational numbers                             4
      2. Percents, proportions, and conversions                 3
   B. Applications                                              8
      1. One-step                                               4
      2. Two or more steps                                      4

III. Algebra                                                   15
   A. Skills/concepts                                          10
      1. Expressions and equations                              5
      2. Graphs and functions                                   5
   B. Applications                                              5

IV. Geometry                                                   15
   A. Skills/concepts                                          10
      1. Geometric terms and figures                            4
      2. Geometric relationships and postulates                 6
   B. Applications                                              5

V. Measurement                                                  9
   A. Skills/concepts                                           6
      1. Units and estimations                                  3
      2. Measurement of perimeter, area, and volume             3
   B. Applications                                              3

VI. Probability and Statistics                                  8
   A. Probability                                               4
   B. Statistics                                                4

VII. Tables, Graphs and Integrated Applications                 7
   A. Tables and graphs                                         4
   B. Integrated applications                                   3

VIII. Problem Solving                                          16
   A. Formulation of a problem                                  4
   B. Analysis of a problem                                     4
   C. Strategies                                                5
   D. Interpretation                                            3

Table 6-4
Reporting Categories
Survey of Academic Skills: Grade 12 Mathematics

I. Problem Solving/Reasoning [25%]
   A. Problem formulation
   B. Analysis and strategies
   C. Interpretation of solutions
   D. Nonroutine problems/synthesis of routine applications

II. Understandings and Applications [75%]
   A. Numbers and Operations [14%]
      1. Nature of real numbers
      2. Selection and use of operations on real numbers
   B. Patterns, Functions, and Algebra [17%]
      1. Patterns
      2. Relations, functions, and graphs
      3. Algebra
   C. Data Organization and Interpretation [18%]
      1. Organizing data as graphs and charts
      2. Statistics
      3. Probability and systematic counting
   D. Measurement, Geometry, and Spatial Relationships [18%]
      1. Mensuration
      2. Geometric and spatial relationships
   E. Logical Reasoning [8%]
      1. Quantifiers, connectives, and relationships
      2. Using deductive and inductive reasoning

10. Reviewing the selected questions. The final set of questions is then subjected to another review by CAP staff and testing professionals. In addition, a variety of item statistics are examined in the search for otherwise undetected defects and sources of bias. The questions are also reviewed by experts for linguistic, ethnic, and gender bias.
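The item statistics named in step 7 can be sketched for a small scored data set. The example below is illustrative only: the response matrix and group labels are invented, item difficulty is taken to be the proportion answering correctly, the item-total correlation is an ordinary Pearson correlation with the total score (the item itself included), and the "bias index" is assumed to be the simple difference between a subgroup's proportion correct and the overall proportion. The chapter does not specify which bias indices the CAP used.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    covariance = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return covariance / (sx * sy)


def item_statistics(responses, groups, item):
    """Return (difficulty, item-total correlation, per-group discrepancy) for one item."""
    scores = [row[item] for row in responses]
    totals = [sum(row) for row in responses]
    difficulty = mean(scores)  # proportion of students answering correctly
    item_total = pearson(scores, totals)
    discrepancy = {}
    for group in sorted(set(groups)):
        group_scores = [s for s, g in zip(scores, groups) if g == group]
        # Assumed bias index: subgroup proportion correct minus overall proportion.
        discrepancy[group] = mean(group_scores) - difficulty
    return difficulty, item_total, discrepancy


# Hypothetical field-test data: four students, three items scored 0/1.
responses = [[1, 1, 0], [1, 0, 0], [0, 0, 0], [1, 1, 1]]
groups = ["a", "a", "b", "b"]
difficulty, item_total, discrepancy = item_statistics(responses, groups, item=0)
print(difficulty, round(item_total, 3), discrepancy)
```

In practice such figures would be computed per field-test form and reviewed alongside the teachers' ratings, as steps 7 and 8 describe.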

Test Content Specifications

Test content specifications are the blueprint for test item construction. The test content specifications denote the depth and breadth of what is considered important for assessment. They are also the bridge between curriculum/instruction on the one hand and assessment on the other. In other words, test content specifications serve as the main evidence to establish content validity of a test instrument.

The reader will discern that the procedures for delineating test content specifications in the CAP have gone through changes over time. These changes reflect the prevailing tension between the concerns of policy makers and the concerns of mathematics educators.

Specifications in 1975: Grade 3 Test Development. During the period in which the third-grade test was developed, the prevailing philosophy of test development was that in order for the tests to be accepted by a vast majority of districts for their program evaluation, the test must match what was actually being taught in their classrooms. Furthermore, since what was being taught in the classrooms was based on state-adopted textbooks, the content of the test had to be limited to what appeared in most textbooks. Therefore, the test content specifications were written based upon the content appearing in state-adopted textbooks at the time.

Figures 6-2a, 6-2b, and 6-3a, 6-3b show the pages appearing in the draft Test Content Specifications: Operations for the Third-Grade Test (1980 version). In Figures 6-2a and 6-3a, one column provides the page numbers of the textbook containing a particular topic. Figures 6-2a, 6-2b and 6-3a, 6-3b show, in particular, that certain mathematical content, such as basic arithmetic, appeared in all textbooks; however, topics such as problem solving and modeling did not appear in any of the books, or appeared in only one or two books. Before developing the final test content specifications, a random sample of teachers from throughout the state was surveyed to determine the degree of emphasis they placed on each skill and whether they would like that skill to be measured as part of the statewide assessment. The resulting specifications were quite narrow in the sense that important mathematical topics, such as problem solving and modeling, were not taught in most classrooms.

Specifications in 1980: Grade 6 Test Development. The test content specifications for the sixth-grade test, developed between 1978 and 1980, were derived from the Mathematics Framework for California Public Schools rather than exclusively from the content analysis of commonly used state-adopted textbooks. The Agenda for Action, published by the National Council of Teachers of Mathematics, was also influential in developing the test content specifications. As a result, it was determined appropriate to include a problem-solving subtest, such as problem formulation, problem analysis, and problem interpretation, in addition to an emphasis on routine and nonroutine problem-solving skills. The specifications also included skills in geometry, algebra, measurement, and probability and statistics. Table 6-2 above shows the content outline of the sixth-grade test.

Specifications in 1984: Grade 8 Test Development. The test content specifications for the eighth-grade test, developed between 1981 and 1984, were based on the Mathematics Framework for California Public Schools and the Model Curriculum Guide. The rationale for the test content specifications was based on three major concerns: (1) a general concern for "excellence" in that all children deserve a decent education involving


Fig

ure

6-2a

.C

AP

Tes

t Con

tent

Spe

cific

atio

nsO

pera

tions

for

the

Thi

rd-G

rade

Tes

t (19

80 v

ersi

on).

Ski

llP

erfo

rman

ce O

bjec

tive

Item

Ste

mC

hara

cter

istic

sR

espo

nse

Cha

ract

eris

tics

Tex

t Boo

kP

age

Num

bers

Rec

all b

asic

addi

tion

lads

Giv

en tw

o nu

mbe

rs, t

he s

tude

ntw

ill a

dd th

em a

nd s

elec

t the

The

item

ste

m w

ill b

e tw

o si

ngle

digi

t num

bers

alig

ned

eith

erT

he r

espo

nse

choi

ces

will

be

who

lenu

mbe

rs w

ith o

ne a

nd tw

o di

gits

.S

F14

-23,

24-

31,

32-3

3co

rrec

t ans

wer

from

four

opt

ions

.ve

rtic

ally

or

horiz

onta

lly w

ith a

The

inco

rrec

t opt

ions

wiN

be

the

Ho

46-4

9, 6

2-69

,pr

oper

add

ition

sig

n.re

sult

of:

74, 7

9. 1

01.

110

Nol

o: A

list

of b

asic

add

ition

fact

s,a.

mak

ing

an e

rror

in th

e ba

sic

HM

2-4,

36-

46ca

tego

rized

by

leve

l of d

iffic

ulty

isfa

cts

(one

mor

e or

one

less

than

He

1, 7

, 8, 1

1-12

give

n in

the

appe

ndix

.th

e co

rrec

t ans

wer

)25

-26,

35-

36,

b. p

erce

ptua

l err

or o

f rev

ersi

ng46

, 55.

70-

71,

the

sum

dig

its (

e.g.

, 8 +

9 =

71)

73-

78.8

3 -8

4c.

faili

ng to

und

erst

and

the

mea

ning

AW

28-

35, 4

0-41

.of

the

oper

atio

n si

gn (

e.g.

,50

-53,

59-

61,

mer

ging

the

adde

nds;

7 +

3 =

73)

64, 8

7d.

faili

ng to

inte

rpre

t the

ope

ratio

nS

RA

6-9

, 14,

60,

sign

cor

rect

ly (

e.g.

, mul

tiply

ing

152-

154,

287

inst

ead

of a

ddin

g)S

B28

-30,

36,

39.

e. fa

iling

to u

nder

stan

d th

e43

-50,

52

iden

tity

elem

ent z

ero

(whe

n it

occu

rsin

the

stem

), e

.g.,

csN

asin

g ze

ro a

sth

e co

rrec

i ans

wer

58-6

0, 3

28

Page 124: DOCUMENT RESUME ED 377 073 SE 055 578 AUTHOR Romberg, … · 2014-05-07 · Thomas A. Romberg. 10. 3. Implications of the NCTM Standards for. Mathematics Assessment. Norman Webb and

Figure 6-2a Continued

Skill: Recall basic subtraction facts

Performance Objective: Given two numbers, the student will find the difference and select the correct answer from four options.

Item Stem Characteristics: The item stem will be a one- or two-digit minuend and a single-digit subtrahend aligned vertically or horizontally with a properly positioned subtraction sign. Note: An extensive list of subtraction facts is given in the appendix.

Response Characteristics: The response choices will be whole numbers with one or two digits. The incorrect options will be the result of:
a. making an error in the basic facts (one more or one less than the correct answer)
b. failing to understand the meaning of the operation sign (e.g., moving the minuend and subtrahend together)
c. failing to interpret the operation sign correctly
d. failing to understand the identity element zero (when it occurs in the stem); e.g., choosing zero as the correct answer
e. reversal of direction of subtraction (subtracting top number from bottom number)

Text Book Page Numbers:
SF: 14-23, 24-31, 32-33
Ho: 50-52, 55, 64, 67-69, 74, 101, 107, 110
HM: 5-8, 10, 29-30, 37-38
He: 3-4, 6, 13-17, 25-26, 35-36, 46, 55, 70-71, 73-78, 83-84
AW: 28-35, 40-41, 50-53, 59-61, 64, 123
SRA: 13-16, 152-154, 289
SB: 33-39, 43-50, 52, 58-60, 329

SF: Scott, Foresman and Company, Mathematics Around Us: Skills and Applications, by Bolster, Level 3, Pupil's Book, 1975
Ho: Holt Rinehart and Winston, Holt School Mathematics, Nichols, et al., Student Text, 1974
HM: Houghton Mifflin Company, Mathematics for Individual Achievement, by Denholm, Level 3 text, 1974
He: D.C. Heath and Company, Heath Mathematics, by Dilley, et al., Level 3, Student Edition, 1975
AW: Addison-Wesley Publishing Company, Inc., Investigating School Mathematics, by Fleenor, R., Eicholz, P., O'Daffer, Book 3, 1976
SRA: Science Research Associates, Mathematics Learning System, Level 3, 1974
SB: Silver Burdett Company, Silver Burdett Mathematics System, by LeBlanc, Level 3, Student Edition, 1973


Figure 6-2b. Illustrative Examples of Operations Items from the CAP Third-Grade Test Content Specifications (1980 version).

[Table of example items: Example 1 through Example 5 for basic addition facts and for basic subtraction facts, each with answer choices A through D.]


Figure 6-3a. CAP Test Content Specifications: Problem Solving and Logical Thinking for the Third Grade Test (1980 version).

Skill: Identify an appropriate question or problem related to a number sentence

Performance Objective: Given a number sentence or geometric model, the student will determine the appropriate question or story problem that is related to it and select the correct answer from four options.

Item Stem Characteristics: The item stem will be a simple whole number sentence or simple geometric shape, with directions to choose a matching story problem or an appropriate question.

Response Characteristics: The response choices will be simple story problems or simple questions. Incorrect options will be the result of:
a. failing to choose correct operation
b. failing to choose correct numbers
c. choosing incomplete information
d. choosing inappropriate question
e. failing to choose correct shape
f. choosing measurement question

Text Book Page Numbers: SF; Ho 56-57; He; AW; SRA; SB

Skill: Identify an appropriate analysis for a story problem

Performance Objective: A. Given a story problem and question, the student will determine the correct procedure to solve the problem and select the correct answer from three or four options.

Item Stem Characteristics: The item stem will be a simple story problem with directions to choose the appropriate operation, table, graph, diagram, or estimated answer.

Response Characteristics: The response choices will be simple graphs, diagrams, tables, or the written words add, subtract, multiply, divide. The incorrect options will be the result of:
a. failing to choose correct operation
b. failing to associate the correct graph, table, or diagram with the given information

Text Book Page Numbers: SF; Ho 35, 145, 293; HM 157; He 828; AW; SRA; SB

SF: Scott, Foresman and Company, Mathematics Around Us: Skills and Applications, by Bolster, Level 3, Pupil's Book, 1975
Ho: Holt Rinehart and Winston, Holt School Mathematics, Nichols, et al., Student Text, 1974
HM: Houghton Mifflin Company, Mathematics for Individual Achievement, by Denholm, Level 3 text, 1974
He: D.C. Heath and Company, Heath Mathematics, by Dilley, et al., Level 3, Student Edition, 1975
AW: Addison-Wesley Publishing Company, Inc., Investigating School Mathematics, by Fleenor, R., Eicholz, P., O'Daffer, Book 3, 1976
SRA: Science Research Associates, Mathematics Learning System, Level 3, 1974
SB: Silver Burdett Company, Silver Burdett Mathematics System, by LeBlanc, Level 3, Student Edition, 1973


Figure 6-3b. Illustrative Examples of Problem Solving and Logical Thinking Items from the CAP Third-Grade Test Content Specifications (1980 version).

Example 1
5 - 3 = 2
Which matches the problem?
A. 3 apples. 5 apples. How many in all? (a)
B. 3 apples, 5 seeds. How many seeds? (a)
C. Jan had 5¢. She spent 3¢. How much was left? (*)
D. Jan had 3¢. She spent 2¢. How much was left? (b)

Example 2
10 + 6 = 16
Which matches the problem?
A. Rico had 10 marbles. He gave 6 away. How many were left? (a)
B. Rico had 10 marbles. He got 6 more. How many in all? (*)
C. Sam had 6 marbles. He lost 4. How many were left? (a)
D. Sam had 6 marbles. Each had 4 dots. How many dots in all? (b)

Example 3
6 x 2 = 12
Which matches the problem?
A. 3 cars, 2 people in each car. How many people?
B. 6 cars, 2 more cars. How many cars in all? (a)
C. 6 cars, 2 drove away. How many were left? (a)
D. 6 cars, 2 people in each. How many people in all? (*)

Example 4
[geometric shape]
Which question would you ask?
A. How big is the triangle? (e)
B. How many were left? (d)
C. How many circles? (e)
D. How many sides are there? (*)

Example 5
[geometric shape]
Which question would you ask?
A. How many triangles do you see? (*)
B. What is the sum? (d)
C. How much left? (d)
D. How far around the square? (e)


Figure 6-3b Continued

Example 1
Ann had 4 apples. Joe gave her 2 more. How many did she have all together?
How would you find the answer?
A. Subtract (a)
B. Add (*)
C. Multiply (a)
D. Divide (a)

Example 2
Alice spent 15¢ for pencils. She bought 5 pencils. How much for each pencil?
How would you find the answer?
A. Divide (*)
B. Subtract (a)
C. Multiply or Add (a)

Example 3
30 people. 10 like vanilla ice cream. 5 like strawberry ice cream. 15 like chocolate ice cream.
Which shows this?
[answer choices A, B, and C are graphs]

Example 4
There are 12 people. 6 small, 4 middle, 2 big.
Which shows this?
[answer choices are pictures]

Example 5
There were 2 boys. Each boy had 3 cookies. How many cookies in all?
Which shows this?
[answer choices are pictures]


higher-level thinking, problem solving, and understanding; (2) a commitment to research that expands knowledge and understanding of how students develop thinking skills and learn to solve problems; and (3) a concern that the quality of test questions was less than desirable on standardized tests.

The theme of higher expectations and improved achievement is addressed in the Mathematics Framework for California Public Schools (1985), which states:

The mathematics program recommended in this framework reflects raised expectations for student achievement. The goal for all students is to be able to use mathematics with confidence; therefore, every student must be instructed in the fundamental concepts of each strand of mathematics and no student limited to the computational aspects of the number strand.... Most students will go beyond the fundamental concepts to achieve deeper and broader capability in mathematics, but even the less capable students, by learning these concepts, will have appropriate experiences in all of the strands. They must not, for example, be deprived of work in geometry or probability in order to have more practice with narrow computational skills. Rather, they will continue to learn the new concepts of all of the strands and to integrate those concepts into their understanding throughout their school careers.... This expectation applies to all students, including students with special needs and those who come from groups who have historically been underrepresented in upper level mathematics courses. (p. 3)

In addition to the concern for excellence, a major emphasis of the Mathematics Framework for California Public Schools: Kindergarten Through Grade Twelve (1985) and the Mathematics Model Curriculum Guide (1987) is teaching for understanding. The theme of teaching for understanding is stated in the Mathematics Framework for California Public Schools (1985):

Teaching for understanding emphasizes the relationships among mathematical skills and concepts and leads students to approach mathematics with a common-sense attitude, understanding not only how but also why skills are applied. Mathematical rules, formulas, and procedures are not powerful tools in isolation, and students who are taught them out of any context are burdened by a growing list of separate items that have narrow application. Students who are taught to understand the structure and logic of mathematics have more flexibility and are able to recall, adapt, or even recreate rules because they see the larger pattern. Finally, these students can apply rules, formulas, and procedures to solve problems, a major goal of this framework. (p. 12)

The concern for excellence and an emphasis on understanding in the Framework resulted in a consensus that the eighth-grade test must have the following characteristics.

• In computational questions, emphasis was placed on the understanding of an arithmetic operation rather than on performing an algorithmic manipulation. Most of the questions can be answered mentally if the student has a clear understanding of arithmetic operations and symbols.
• Test questions reflected a level of achievement and sophistication consistent with a mathematics program that eliminates the repetition of content from one grade level to the next unless there is an increase in depth or breadth.
• Test questions were designed to assess not only the arithmetic computational skills, but also the skills involved in pre-algebra, geometry, measurement, logic, and probability and statistics.
• Special emphasis was placed on the assessment of problem-solving processes, such as problem formulation, problem analysis, interpretation of results, and on problem-solving questions, including routine and nonroutine problems.

The art of writing questions in problem solving and other hard-to-measure areas was influenced by the work of Lester (1978, 1982), Lesh (1983), Mayer (1983), Newell and Simon (1972), Polya (1957, 1965), Resnick (1983), Schoenfeld (1982), Sternberg (1981, 1983), and Silver (1982). Pandey (1983) described the implications of research in problem solving for assessment. The California Assessment Program's efforts to improve the quality of test questions were aided by the work of the MAAC members and through the review of test items by Alan Hoffer, University of Oregon; Thomas Romberg, University of Wisconsin; James Wilson, University of Georgia; and Mary Kay Corbitt, East Tennessee State University.

Specifications in 1987: Grade 12 Test Development. The test content specifications for the twelfth-grade test, developed between 1983 and 1985, were based on the desire to raise expectations for all students and to develop mathematical power in students before they graduate from high school. The Framework (1985) defines mathematical power as follows:

To enable all graduates to meet current and future demands, mathematics education must focus on students' capacity to make use of what they have learned in all settings. Mathematical power, which involves the ability to discern mathematical relationships, reason logically, and use mathematical techniques effectively, must be the central concern of mathematics education and must be the context in which skills are developed. (p. 1)

The major difference between the revised twelfth-grade CAP test, now called Survey of Academic Skills: Grade 12, and the older version, called the Survey of Basic Skills: Grade 12, is that the new test emphasizes understanding of mathematical concepts and problem solving, a shift in emphasis similar to that of the eighth-grade test. The specifications were written and test questions were designed to measure what students understand about the mathematical concepts and skills they have learned from kindergarten through Grade 12 and how well they can use this learned mathematics in familiar and unfamiliar problem situations. The test was designed to assess students' abilities to estimate, to discern relationships, and to use number sense in the evaluation and interpretation of intermediate and final results of a problem-solving process. It requires students to use higher-level thinking skills and therefore provides a measure of their ability to do so in a mathematical setting, as opposed to providing a measure only of their ability to perform rote mathematical algorithms which they may do correctly but do not understand. Table 4 shows the skill areas reported for the twelfth-grade test. In comparing the skill areas reported for Grade 12 with the skill areas reported for the sixth- or eighth-grade test (Tables 6-2 and 6-3), it is interesting to note that problem solving was the skill area reported first, followed by another major skill area, understanding.

The Art of Questioning

To appreciate how the CAP has evolved over a period of years, we will examine relationships among test specifications, the nature of the questions, and the reporting categories over a period of fifteen years. In the CAP's evolution, it should be emphasized that CAP test designers have consistently sought to blend into their assessment instruments the most current knowledge about our understanding of the nature of mathematics, theories of learning, art of test construction, and program improvement strategies.

Early Developments

Third-grade test development in 1975 was heavily influenced by the writings of Popham (1973) and Millman (1974), who recommended rigorous specifications to ensure that each item be a true reflection of the intended skill to be measured. The structure of the test content specifications was derived using the traditional content-by-process matrix. For example, the content categories for the third-grade specifications were the strands of mathematics specified in the California Mathematics Framework for Kindergarten through Grade Twelve (1985): Number, Algebra, Geometry, Measurement, Probability and Statistics, and Logic. The process categories were computation/knowledge of facts, comprehension, and applications. Detailed specifications were generated for each cell of the content-by-process matrix. As shown in Figures 6-2a and 6-2b, the item specifications were very structured in the specificity of the performance mode and characteristics of distractors. The performance mode described the limits of the item stem of a multiple-choice question, and distractor characteristics described the various ways of constructing the incorrect choices for the item.
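To make the idea of specified distractor characteristics concrete, the following is a minimal sketch, not the CAP's actual procedure, of how the error types listed in Figure 6-2a for basic addition facts could be turned into candidate incorrect options. The function name `addition_distractors` and the single-letter keys are hypothetical labels chosen here for illustration; the error types themselves come from the figure.

```python
def addition_distractors(a: int, b: int) -> dict[str, list[int]]:
    """Candidate wrong answers for a + b, keyed by the error type in Figure 6-2a.

    Hypothetical helper for illustration only; the specification describes the
    error types but not any procedure for generating the options.
    """
    correct = a + b
    candidates = {
        # a. basic-fact error: one more or one less than the correct answer
        "a": [correct - 1, correct + 1],
        # b. perceptual error of reversing the sum digits (8 + 9 = 71)
        "b": [int(str(correct)[::-1])],
        # c. merging the addends (7 + 3 = 73)
        "c": [int(f"{a}{b}")],
        # d. misreading the operation sign (multiplying instead of adding)
        "d": [a * b],
        # e. choosing zero when zero occurs in the stem
        "e": [0] if 0 in (a, b) else [],
    }
    # Discard candidates that equal the correct answer or are negative.
    return {
        error: [v for v in values if v != correct and v >= 0]
        for error, values in candidates.items()
    }
```

Calling `addition_distractors(8, 9)` yields 71 under error type b, and `addition_distractors(7, 3)` yields 73 under error type c, matching the examples given in the figure. An item writer would still have to choose three options from these candidates by judgment.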

This method of test construction resulted in a large number of items, each item designed to measure a discrete skill described in the specification. The collection of items in a sub-domain, such as functions, contributed to the sub-domain scores.
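The content-by-process matrix and the scores built on it can be sketched as a small data structure. The sketch below is illustrative only: it assumes, for simplicity, that each cell of the matrix is one reporting unit and that a score is the proportion of items answered correctly. The names `STRANDS`, `PROCESSES`, `CELLS`, and `cell_scores` are hypothetical; the strand and process labels are those named in the text.

```python
from collections import defaultdict
from itertools import product

# Content strands and process categories named in the third-grade specifications.
STRANDS = ["Number", "Algebra", "Geometry", "Measurement",
           "Probability and Statistics", "Logic"]
PROCESSES = ["computation/knowledge of facts", "comprehension", "applications"]

# Every cell of the content-by-process matrix: 6 strands x 3 processes.
CELLS = list(product(STRANDS, PROCESSES))


def cell_scores(responses):
    """Proportion correct per (strand, process) cell.

    `responses` is an iterable of (strand, process, is_correct) tuples, one per
    item answered. Treating each cell as a reporting unit is an assumption made
    here; the text does not give the CAP's scoring rules.
    """
    totals = defaultdict(lambda: [0, 0])  # cell -> [number correct, number answered]
    for strand, process, is_correct in responses:
        totals[(strand, process)][0] += int(is_correct)
        totals[(strand, process)][1] += 1
    return {cell: right / answered for cell, (right, answered) in totals.items()}
```

With made-up responses such as two Number computation items, one right and one wrong, `cell_scores` reports 0.5 for that cell. The narrowness of each cell is exactly what the following paragraphs identify as a risk in high-stakes settings.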


This analytical approach toward test construction is based on the assumption that a single content area, such as mathematics, can be divided into measurable minuscule bits and pieces, that a sampling of behaviors from the content-by-process matrix could be generalized to the population of behaviors, and that the diagnostic information generated from sub-domain scores will be useful for improving a school's program.

This method of constructing achievement tests has long proved useful. It is still useful in situations where instruction is not "corrupted" by how tests are constructed, or where test validity is not tampered with by revealing the exact content of the test. However, in situations of high-stakes testing, as in situations where teachers or administrators are held accountable for student achievement, the analytical approach is undesirable. When the test questions can be replicated through precise test specifications and the results reported for a large number of narrowly defined skill areas, the message is conveyed that learning can be improved by teaching bits and pieces of information. In many situations, rather than teaching to desirable curriculum practices, teachers resort to "multiple-choice" instruction because of the form of the assessment instrument.

Recent Developments

In recent years, as in the development of the twelfth-grade test implemented in 1985, criteria for item specifications take into account the emerging role of high-stakes tests. The criteria can be traced to three main concerns:

1. Test questions must reflect the current view of the nature of mathematics. This view emphasizes understanding, thinking, and problem solving that require students to see mathematical connections in a situation-based problem and to be able to monitor their own thinking processes to accomplish the task efficiently. This requires that test questions have the following characteristics:

• They assess thinking, understanding, and problem solving in a situational setting as opposed to algorithmic manipulation and recall of facts.
• They assess the interconnection among mathematical concepts and the outside world.


2. Test questions must reflect the current understanding of how children learn. The current view of instruction and learning assumes that children are active learners and engage in creating their own meaning during the instructional process. This requires that test questions have the following characteristics:

• They must be engaging.
• They must be situational and based upon real-life applications.
• They must have multiple-entry points in the sense that students at various levels in their mathematical sophistication should be able to answer the question.
• They allow students to explore difficult problems and students' explorations are rewarded.
• They allow students to answer correctly in diverse ways according to their experiences, rather than requiring a single answer.

3. Test questions must support good classroom instruction and not lend themselves to distortion of curriculum. Good curriculum practices require that test questions have the following characteristics:

• They must be exemplars of good instructional practices.
• They should be able to reveal what students know and how they can be helped to learn more mathematics.

Questions having these characteristics have been christened by Honig (1985) as "power items." Such questions cannot be measured by the typical test comprising thirty- to sixty-second multiple-choice questions. However, multiple-choice questions requiring two to four minutes can be developed that have most of the characteristics described above. Examples 1 and 2 (see Appendix D) are questions of the type appearing in the twelfth-grade test, the Survey of Academic Skills: Grade 12. In the twelfth-grade test, CAP also uses open-ended questions that require 12 to 15 minutes for students to answer. Example 3 has been taken from the 1987-88 version of the test, which has been discussed in detail in A Question of Thinking (California State Department of Education, 1989). Example 4, shown in Appendix D, also shows the response of one student on this question.

CAP Instruments in the Future. This paper describes CAP test development procedures prior to 1989. The CAP is currently revising tests at Grades 3, 6, 8, and 12 and introducing a new test at Grade 10. Besides the new type of multiple-choice questions, the revised tests will have open-ended and performance-type questions. In addition, portfolio assessment in mathematics is being explored as an alternative. Sample performance tasks and guidelines for the portfolio can be obtained by writing to the author.


7

Assessing Students' Learning in Courses Using Graphics Tools: A Preliminary Research Agenda

Sharon L. Senk

Recently mathematics educators have called for the use of calculator and computer graphing technology in mathematics classes, and several software and curriculum development projects have attempted to transform these recommendations into reality. However, until now there has been little systematic study of how teaching, learning, and assessment in courses using such graphics tools are affected by the technology. This paper describes a preliminary agenda developed by researchers in the field for assessing students' learning in courses using graphics tools. Included are suggested investigations of student and teacher outcomes and a discussion of methodological issues.

In recent years there have been many calls for the reform of mathematics education in the United States. Among the most consistent recommendations is that mathematics programs take advantage of the power of calculators and computers (College Board, 1985; Fey, 1984; NCTM, 1980, 1989). Specifically, function graphing tools available on calculators and computers are suggested as a means to produce both a richer mathematics curriculum and a deeper understanding of mathematics without having to use valuable time in mathematics classes for a study of computer programming (Demana & Waits, 1990; Fey, 1989; Kaput, 1989; Waits & Demana, 1989).

These calls for using graphing technology have been accompanied by several research and development projects. The Educational Technology Center has developed software called Visualizing Algebra: The Function Analyzer (Harvey, Schwartz, & Yerushalmy, 1988) and used it to study students' conceptions and misconceptions of scale in graphs (Goldenberg, 1988). Fey and Heid (1987) are developing booklets for students, guides for teachers, and correlated computer software for elementary algebra. The University of Chicago School Mathematics Project (UCSMP) has developed a course called Functions, Statistics, and Trigonometry with Computers (Rubenstein et al., 1988) in which students use as tools any standard graphing software, a statistics package, and BASIC programs. The Ohio State University Calculator and Computer Precalculus (C2PC) Project has developed software (Waits & Demana, 1988) and designed a precalculus text (Demana & Waits, 1989) that can be used with its software or a graphics calculator at both the high school and college levels. During the 1988-89 academic year, each of the latter three groups studied the effects of their materials on learning and teaching in regular classroom settings (Demana et al., in preparation; Lynch, Fischer, & Green, 1989; Sarther, Hedges, & Stodolsky, in preparation).

Materials based on graphing tools such as those above are reshaping the profession's conceptions of what we ought to teach, what we can teach, and how we can teach it (Fey, 1989). They allow students to explore "advanced" mathematical functions without having to master much prerequisite algebraic manipulation. Making graphs becomes a tool for solving other problems, rather than an end in itself. From exploring multiple instances, generalizations can be formed; and, conversely, instances of proposed generalizations can be tested quickly using graphing tools.

Surprisingly, there has been little discussion among mathematics educators of the methods and materials used to assess learning and teaching in such environments. Goldenberg (1988) reports that he found little in the research literature on learning or teaching about graphing functions in any environment, with or without computers.


At present there are no nationally available mathematics tests that require calculator or computer use. The Mathematical Association of America's Calculator-Based Placement Test Program Project and the College Board's Mathematics Achievement Test Committee are presently developing tests that include calculator-active items. Such tests demand changes from typical achievement tests in the types of problems that can and cannot be included (Harvey, 1989). In each case, however, the tests being developed consist only of multiple-choice items and assume that students have only a non-graphics calculator available. In addition, these tests are designed to assess students' knowledge of the present mathematics curriculum where calculator and computer use may not have been an integral part of the course. Furthermore, no guidelines exist for assessment of student learning in calculator- or computer-based courses. Recently, the NCTM (1989) has called for broadening our view of appropriate assessment techniques in all areas of mathematics. Senk (1989) and Wiske et al. (1988) have called for further research on techniques and instruments for assessing students' learning in advanced technological environments.

Call for a Meeting

Given the needs outlined above, funding was secured from the National Center for Research in Mathematical Sciences Education for a meeting to discuss ways of assessing the impact of function graphing tools on students' learning. The meeting took place on December 15-16, 1988, on the campus of the University of Chicago, with the following people participating:

Dora Aksoy, Department of Education, The University of Chicago;

James Flanders, Department of Mathematics and Statistics, Western Michigan University;

E. Paul Goldenberg, Education Development Center, Newton, MA;

John G. Harvey, Department of Mathematics, University of Wisconsin-Madison;

M. Kathleen Heid, Department of Curriculum and Instruction, Pennsylvania State University;

Catherine Sarther, Departments of Mathematics and Education, Mount Mary College;

Sharon L. Senk, Department of Education, The University of Chicago;

Bert K. Waits, Department of Mathematics, The Ohio State University; and

Orit Zaslavsky, Department of Education in Technology and Science, Technion, Israel.

The contributions of Goldenberg, Harvey, Heid, Senk, and Waits to issues related to graphing technology are noted above. At the time of the meeting Aksoy, Flanders, and Sarther were all doctoral students at the University of Chicago working with the University of Chicago School Mathematics Project. Aksoy and Flanders were editors, and Sarther was coordinator of the 1988-89 field study of Functions, Statistics, and Trigonometry with Computers (Rubenstein et al., 1988). Zaslavsky was a postdoctoral researcher at the Learning Research and Development Center, University of Pittsburgh, working on a review of the literature on functions and graphs in mathematics (Leinhardt, Zaslavsky, & Stein, in press).

The two main questions this meeting addressed with respect to algebra and precalculus courses based on function graphing tools were:

1. What are the fundamental goals to be assessed? (What are the core content, processes, and beliefs?)

2. How should we go about assessing them? (What kinds of problems or situations appropriately measure these goals? What techniques enable the researcher or teacher to uncover likely causes of students' difficulties? To what extent should assessment instruments use graphing technology? To what extent should assessment be done without access to graphing tools?)

Both the funding agency and the participants hoped that themeeting would encourage the formation of an "invisible college"of researchers interested in this topic who would continue tocollaborate on common interests even after the meeting wasover.


Summary of Discussion

Between them the participants had used graphing technology with students at each level from Grade 9 through the college sophomore level. During the first part of the meeting each participant described briefly his or her experiences using this technology and some issues his or her own project faced related to the use of graphing tools.

Virtually everyone agreed that research or curriculum development projects could not (or at least, should not) infuse graphing technology into a secondary or college course without changing some of the original curricular goals. In particular, participants agreed that courses which use graphing technology in significant ways should, in comparison to standard courses, increase emphasis on realistic applications of mathematics; they should also focus on problems that encourage exploration and conjecturing, and decrease emphasis on many traditional manipulative skills.

The following were identified as issues faced by students and teachers in all the projects represented at the meeting:

1. mastering the technology itself (ease of use is critical for implementation)

2. balancing exact and approximate answers, coping with multiple answers

3. putting the control of instruction and learning more with students than ever before

4. worrying about long-term effects of less manipulative skill and more graphical representation on students' performance in subsequent courses.

Based on the research of Fey and Heid (1987), Goldenberg (1988), and Leinhardt, Zaslavsky, and Stein (in press), scaling seems to be a large issue early in the study of functions and graphs. Beginning algebra students seem to need instruction on how changes in scale do not change the values on the graph, but only their perception of its shape or the amount of the graph they can see on the screen. Beginning algebra students in courses that use graphics tools also seem to need more explicit instruction than they do in traditional algebra courses on deciding what scale to use on graphs. However, scaling seems to be much less an issue by the time a student reaches a precalculus course in high school or college. Harvey and Waits reported that older students seem to have few difficulties changing the scale on a viewing window so a complete graph can be seen.

A Preliminary Research Agenda

The discussion then turned to the need for research on issues related to assessment in courses that use graphing technology. Participants agreed that a useful point of departure for this discussion was Standard 6 on Functions (NCTM, 1989). Our recommendations for research are grouped into two areas. Under student and teacher outcomes, we include what we believe to be the most important goals with respect to both content and process for courses which emphasize graphs and functions. Under methodological issues, we identify how we believe we should go about investigating student and teacher outcomes.

Student and Teacher Outcomes. We recommend that studies be developed to investigate the effects of graphing technology on the ability of students to:

1. interpret information from graphs alone, that is, without algebraic formulation

2. translate across representations, that is, from one tabular, graphical, function rule, or physical context to another

3. generate examples of particular types of functions, for example, linear or exponential functions

4. discuss the effects of changes in the viewing rectangle on their perception of the shape of a graph or the nature of its properties

5. solve equations or inequalities, or systems of equations or inequalities, by both standard paper-and-pencil algorithms and graphical means

6. use graphs to hypothesize whether two algebraic expressions are identically equal

7. for a given function, describe its properties (behavior), for example, intercepts, maxima/minima, end behavior, points of discontinuity

8. describe the effects of parameter changes on a function within whatever representation system is used.


We also recommend that research address the impact of graphing technology on students':

9. frequency and proficiency of use of such technology

10. ability to generate higher-order questions about functions

11. ability to justify conclusions both visually, using a graphing tool, and deductively, based on properties of functions

12. beliefs about mathematics, for example, the extent to which it is fun, dynamic, or evolving

13. attitudes toward learning mathematics, such as confidence or persistence.

We further recommend that research investigate the impact of graphing technology on teachers':

14. frequency and proficiency of use of such technology

15. structure of class time (we hypothesize less one-way lecture, and more discussion and attention to students' questions)

16. beliefs about mathematics, for example, the extent to which it is fun, dynamic, or evolving

17. beliefs about learning and teaching, such as the willingness to give messier examples, or the willingness to say, "I don't know"

18. ability to assess what students are learning, what misconceptions they have, and how students acquire their knowledge.

Methodological Issues. We believe that assessment of teaching and learning in courses using graphing tools should address the following issues.

1. Development of instruments. Three types of instruments are suggested for assessing students' knowledge of the content above: (a) items presented electronically, say on a computer, on which the student also has access to graphing tools; (b) items presented on paper that a student may respond to with access to graphing tools; and (c) items done completely by paper and pencil without access to graphing tools. Comparing results of studies using all three types of items will reveal how much more information about a student's abilities a graphing utility makes available and will help determine the financial and time costs of each type of assessment instrument. Concomitant with the development of instruments that encourage use of graphing technology, we must also develop new instruments that assess the knowledge we want students to be able to apply without access to sophisticated technology. For both "button pushing" and "lead pushing" knowledge we encourage the development of both short-answer and longer, more elaborate open-ended assessment items.

2. Variations in the technology itself. Studies should be conducted to compare and contrast ease of use of graphing calculators and different configurations of graphing software and hardware and their effects on learning and teaching. In particular, the effects of software that allows several views of a graph simultaneously, or simultaneous views of graph and table of values, should be studied.

3. Duration of study. Both short-term and longitudinal studies with multiple-time-point data are suggested. The former allow researchers to get quick feedback and make revisions in curriculum or instruction based on unsatisfactory results. The latter are necessary to study cumulative effects.

4. Nature of the research. Both basic research (e.g., laboratory or case studies) and classroom research (e.g., curriculum evaluation) are necessary. Laboratory research using state-of-the-art hardware, software, and delivery systems, and a small number of students, allows investigators to probe more deeply into what is ultimately possible for teaching, learning, and assessment. Classroom research using the best commercially available products and normal classroom conditions allows policy makers to think about what is realistic in the immediate future.

5. Cooperative efforts. Research on the effects of technology on learning or on methods of assessment using calculators and computers should be shared with and, on some occasions, conducted with the cooperation of professional organizations, such as the National Council of Teachers of Mathematics or the Mathematical Association of America, and testing agencies, such as the Educational Testing Service.

Outreach and Communication

The participants suggested the following activities as appropriate next steps toward implementing the above research and development agenda:

1. Propose a symposium sharing the above ideas and some of the results of our own investigations at the 1989 Psychology of Mathematics Education/North American Chapter (PME/NA) meeting.

2. Encourage dialogue with others interested in research and development activities related to graphing technology and assessment.

3. Pick some abilities on the preceding list of student outcomes, develop items measuring those abilities, and share items and results with each other.

4. Lobby for the use of calculators and computers and for technology-based tests on national, state, and local assessments, and college entrance and placement exams.

5. Define fundamental "lead pushing" skills related to graphing and functions and work to have items measuring them incorporated into standard assessment instruments.

6. Write a position paper on issues related to assessment for publication in professional journals, for instance, in the "Soundoff" section of the Mathematics Teacher.

As of this time (June 1990), we have accomplished Item 1 and have made some progress on Items 2 to 5. Waits and Senk shared with the other participants copies of their texts (Demana & Waits, 1989; Rubenstein et al., 1988) and selected tests used in program evaluation (see Figures 7-1 and 7-2). Harvey and Senk organized a symposium on Changes in Student Assessment Occasioned by Function Graphing Tools at the PME/NA meeting held in September 1989, in which they, Heid, and Waits participated. At the meeting the four decided to share items and results from their work with the hope of collecting an item bank that might eventually be used as a source for research or classroom or program evaluation. Harvey continues to work through the Mathematical Association of America on lobbying the College Board to incorporate use of technology on instruments developed by the Educational Testing Service. Waits has organized two conferences on Technology in Collegiate Mathematics, and plans to host a third in November 1990. Finally, Harvey and Senk are preparing an analysis of assessment issues related to functions and graphs for another publication.

1. Use the graph to solve f(x) > g(x).

[Graph of f and g not reproduced.]

A. x > 0

B. -2 < x < 7

C. x < -2 or x > 7

D. 3 < x < 30

E. x < 3 or x > 30

2. Which one of the following could represent a complete graph of f(x) = x^3 + ax, where a is a real number?

[Answer choices A through E are graphs, not reproduced.]

Figure 7-1. Sample multiple-choice items testing graphical knowledge of functions (Demana & Waits, 1989). Used with permission.


1. (a) Determine to the nearest tenth the zeroes of the function defined by f(x) = x'10x + 5.

(b) Explain your method.

2. The polynomial function A defined by A(x) = .00153 + .1058x gives the approximate alcohol concentration (in percent) in an average person's bloodstream x hours after drinking about 250 ml of 100-proof whiskey. The function is approximately valid for values of x between 0 and 8. How many hours after the consumption of this much alcohol would the percentage of alcohol in a person's blood be the greatest? Express the answer correct to the nearest tenth, and explain how you got your answer.

Figure 7-2. Sample open-ended items testing ability to use technology to solve problems about functions (Sarther, Hedges, & Stodolsky, in preparation). Used with permission.


8

Mathematics Testing with Calculators: Ransoming the Hostages

John G. Harvey

We must ensure that tests measure what is of value, not just what is easy to test. If we want students to investigate, explore, and discover, assessment must not measure just mimicry mathematics. By confusing means and ends, by making testing more important than learning, present practice holds today's students hostage to yesterday's mistakes.

(MSEB) Everybody Counts

This paper analyzes research on the use of calculators in mathematics testing. Three kinds of tests are considered: (a) calculator-passive tests (i.e., tests on which calculator use is not intended), (b) calculator-neutral tests (i.e., tests that have no "calculator-sensitive" items and on which calculator use is not required), and (c) calculator-based tests that were developed so that most students will need calculators while responding to some of the items. The effects of calculator use on the characteristics of all three kinds of mathematics tests are reported. Included in the paper are examples of items from calculator-neutral tests and of calculator-active items from calculator-based tests.

The hand-held calculator was invented by Texas Instruments Incorporated (TI) in 1967. In 1972 Texas Instruments introduced the TI Data Math calculator, a four-function calculator that retailed at the time for $150. Also in 1972, the Hewlett-Packard Company began marketing the HP-35, a scientific calculator that retailed for $395. In that era, it seemed unlikely that hand-held calculators would ever be consistently and widely used in mathematics instruction for two reasons: they were expensive, and it was unclear in what ways they could be used effectively to improve mathematics learning and teaching. The HP-35 had clearly been designed for use by engineers and engineering students; the initial market for the TI Data Math was business and industry.

In the interim, ownership of hand-held calculators has become widespread. Hand-held calculators so dominate the calculator market that most people assume hand-held calculators are being discussed whenever the word calculator is used. Along the way calculators have become increasingly versatile: presently, the kinds of calculators range from four-function calculators with arithmetic operating systems (e.g., the TI-108) to graphics and symbolic mathematics calculators like the HP-28S. In between are nonprogrammable and programmable scientific calculators, calculators designed especially for business applications, and scientific graphing calculators that may have matrix functionality. Almost all of the calculators that have scientific functionality also have one- or two-variable statistics functionality.

Calculators like those first produced by Hewlett-Packard and Texas Instruments presently sell for about one-tenth of the original prices of the HP-35 and TI Data Math. The present price of a four-function calculator is in the range of $4 to $7 and that of a simple (nonprogrammable) scientific calculator is in the range of $10 to $15. As a result, it can no longer be argued that calculators are too expensive for school students: even in school districts with large numbers of students from low-income households, it is possible for all students to have their own calculators, as has been demonstrated by the Chicago Public Schools. This school district provides four-function calculators for students in Grades 4-6 and scientific calculators for students in Grades 7-9 (Dorothy Strong, personal communication).

It is still argued that calculators are not widely or effectively used in mathematics instruction. Overall this seems to be true (Kouba & Swafford, 1989, p. 102; Mathematical Sciences Education Board, 1989, p. 62), but there are beginning to be some good examples of ways in which mathematics can be taught and learned using calculators. Among these examples are:

1. The seventh- and eighth-grade supplementary materials Getting Ready for Algebra (Demana, Leitzel, & Osborne, 1988) developed by the Ohio State University Approaching Algebra Numerically project. Students studying from Getting Ready for Algebra use scientific calculators to learn about variables and their applications by employing strategies such as making tables and "guess-and-test."

2. The twelfth-grade textbook, Transition to College Mathematics (Demana & Leitzel, 1984), developed at Ohio State University for students who are classified as "remedial" by their scores on the placement test they were given as high school juniors by the Ohio Early Mathematics Placement Testing Program (Leitzel & Osborne, 1985). This text requires students to use scientific calculators.

3. The precalculus course developed by the Ohio State University Calculator and Computer Precalculus Project (Demana & Waits, 1989). Throughout this course students are expected to use a graphing tool; appropriate graphing calculators are made by Casio and by Sharp.

4. Two inservice teacher education modules developed by the Texas Education Agency; each of these modules is structured around the use of the TI Math Explorer, a fractions calculator, to teach fraction concepts and operations.

5. The materials being developed for Grades 7-12 by the University of Chicago School Mathematics Project (UCSMP). In each of the six courses under development calculators and computers are needed; in particular, in the fifth UCSMP course, Functions, Statistics and Trigonometry with Computers, a graphing unit (e.g., the Casio fx-7000G) is essential (Sharon Senk, personal communication).

Thus, it is presently possible, especially at the middle, junior high, and high school levels, to find teaching materials that require the use of calculators. These materials can either be used directly in classroom instruction or they can provide enough guidance so that teachers can create calculator-based student materials. Two significant problems seem to remain that prevent the widespread, effective use of calculators in mathematics instruction. One problem is that of training both preservice and inservice mathematics teachers so that they and their students can learn to use calculators effectively; this includes showing teachers effective ways of using calculators and persuading them that the use of calculators will improve, not diminish, students' abilities to learn mathematics and to solve problems. This is a significant problem in that large numbers of elementary and secondary school teachers teach mathematics and need to be trained: the magnitude of the problem is comparable to that faced when the New Math curricula were introduced in the 1950s and 1960s, since teachers need to learn both how to use a variety of calculators and how to explore the ways these calculators can be used to teach a mathematics curriculum restructured around the use of calculator and computer technologies.

The second, equally significant problem is the development of valid, reliable mathematics tests that require calculator use. At present, there are few established guidelines for the development of mathematics tests that require calculator use and only a scattered sample of nationally published mathematics tests for students who have and who know how to use calculators. The need to improve mathematics assessment in general and mathematics tests in particular to encourage the infusion of calculators in instruction (or in the classroom) is crucial. As Everybody Counts so succinctly states: "What is tested is what gets taught. Tests must measure what is most important" (Mathematical Sciences Education Board, 1989, p. 69). Thus, if we want to encourage teachers to use calculators in mathematics instruction, we must develop tests that require students to use calculators. This paper explores the guidelines by which such tests might be developed, and it cites a limited number of cases in which these kinds of tests have been developed and used.

CALCULATORS AND MATHEMATICS TESTING

In 1975, soon after the introduction of the TI Data Math and the HP-35, the National Advisory Committee on Mathematical Education (NACOME) urged that calculators be used in mathematics instruction (NACOME, 1975, pp. 40-43). In their conclusion on the advantages of using calculators in school mathematics instruction, they stated that "present standards of mathematical achievement will most certainly be invalidated by 'calculator classes'." A recommendation that calculators be used during mathematics instruction was made by the National Council of Teachers of Mathematics (NCTM) in its An Agenda for Action, which urges that "mathematics programs [should] take full advantage of calculators ... at all grade levels" (NCTM, 1980, p. 1). In 1986, the NCTM again addressed the use of calculators in mathematics classrooms and specifically stated that

The evaluation of student understanding of mathematical concepts and their application, including standardized tests, should be designed to allow the use of the calculator.... The National Council of Teachers of Mathematics recommends that publishers, authors, and test writers integrate the use of the calculator into their mathematics materials at all grade levels. (NCTM, April 1986)

In the interim, the College Board (1983; Kilpatrick, 1985), the Conference Board of the Mathematical Sciences (1983), the "School Mathematics: Options for the 1990s" conference (Romberg, 1984), and a joint symposium sponsored by the College Board and the Mathematical Association of America (MAA) (Kenelly, 1989) have all recommended that calculators be used during both mathematics instruction and mathematics testing.

Most recently, the NCTM Commission on Standards for School Mathematics (1989) based its recommendations on the assumption that all students will have a calculator available to them while studying mathematics. While this commission made no recommendation about the kinds of calculators that should be used in Grades K-4, it did recommend scientific calculators for middle school students (i.e., students in Grades 5-8) and graphing calculators for students in Grades 9-12. Further, the Commission's first evaluation standard states that "Methods and tasks for assessing students' learning should be aligned with the curriculum's instructional approaches and activities, including the use of calculators" [emphasis added] (NCTM Commission on Standards for School Mathematics, 1989, p. 193).

These recommendations have informed a broad audience, including mathematicians, mathematics educators and teachers, school administrators, and parents, that school and college mathematics curricula and tests of the future will require students to use calculators. However, neither singly nor in concert have the groups making the recommendations described the changes needed in present test and assessment procedures to assure that the achievement and aptitude of calculator-using students will be accurately measured. Because there have been no recommendations about the ways in which tests should be changed, three approaches have been used that permit students to use calculators while taking tests. These approaches:

1. permit students to use calculators, but give them tests that make no provision for calculator use. I will call this approach calculator-passive testing.

2. permit students to use calculators, but give them tests developed so that none of their items require calculator use. This approach will be called calculator-neutral testing.

3. presuppose that students will need calculators while taking the test. The test is developed so that, for a majority of students, some portion of the items require calculator use in order to be solved successfully. An appropriate term for this approach is calculator-based testing.

In the next three sections, research and scholarship associated with each of these approaches will be considered.

Calculator-Passive Testing

This approach to the use of calculators during testing would require no changes in the mathematics tests that are presently administered to measure student achievement or aptitude. Letting students use calculators on tests that do not take into account or plan for that use is potentially hazardous since these tests may have on them items that are "calculator-unacceptable" (also called "calculator-sensitive"). In a previous paper, I have defined calculator-acceptable and calculator-unacceptable items in this way:

1. An item is acceptable if (a) the objective(s) tested by it are the same whether or not a calculator is used and (b) the difficulty [level of the item] seems to be approximately the same when a calculator is used.

2. An item is marginally acceptable if only 1a or 1b holds.

3. An item is unacceptable if neither 1a nor 1b holds. (Harvey, 1989a, p. 28)
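The three definitions above amount to a decision rule on two yes/no judgments. A minimal sketch in Python follows; the function name `classify_item` and its two boolean parameters are hypothetical, introduced here only for illustration and not taken from Harvey (1989a).

```python
def classify_item(same_objective: bool, similar_difficulty: bool) -> str:
    """Classify a test item by criteria 1a and 1b as quoted above.

    same_objective     -- criterion 1a: the objective tested is unchanged
                          whether or not a calculator is used
    similar_difficulty -- criterion 1b: the difficulty level is roughly
                          unchanged when a calculator is used
    """
    if same_objective and similar_difficulty:
        return "acceptable"
    if same_objective or similar_difficulty:
        return "marginally acceptable"
    return "unacceptable"


# Hypothetical example: the objective survives calculator use,
# but the item becomes easier with a calculator.
label = classify_item(same_objective=True, similar_difficulty=False)
print(label)
```

The judgments themselves (whether the objective or the thinking level changes) remain matters for a human rater; the sketch only records how the two judgments combine.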

An item was judged to change in difficulty if the thinking level changes when a calculator is used while responding to the item; the thinking levels are those described by Epstein (1968):

1. recall factual knowledge,

2. perform mathematical manipulations,

3. solve routine problems,

4. demonstrate comprehension of mathematical ideas and concepts,

5. solve nonroutine problems requiring insight or ingenuity, and

6. apply "higher" mental processes to mathematics. (pp. 315-316)

Using these definitions, two placement tests that were part of the MAA Placement Test Program test package (i.e., Mathematics Test A/4A and Mathematics Test CR/1B) were studied. Mathematics Test A/4A examines knowledge of the content typically taught in basic, intermediate, and advanced algebra courses; Mathematics Test CR/1B tests skills and understandings prerequisite for calculus. On the thirty-two-item Mathematics Test A/4A, five of the items were judged to be marginally acceptable and six of the items to be unacceptable. On the twenty-five-item Mathematics Test CR/1B, there were three marginally acceptable and three unacceptable items. In Figure 8-1, three items from these two tests are shown; the first two were originally judged as marginally acceptable and the third as unacceptable (Harvey, 1989a, p. 29).

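1. [Fraction expression and answer choices (a) through (e) not recoverable from the source.]

2. [Product of a power of 8 and a power of 9; the exponents and answer choices (a) through (c) are not recoverable from the source.] (d) 2/3 (e) 3/2

3. For which values of x is tan x not defined?

(a) -π (b) π/2 (c) 0 (d) π/4 (e) π/3

Figure 8-1. Examples of marginally acceptable and unacceptable test items.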


Using the above definitions of acceptability and unacceptability, I find all three items in Figure 8-1 unacceptable. When only paper and pencil are used, the first item tests students' knowledge of fractions concepts and operations and of the order in which operations are performed; when a calculator is used the only mathematical knowledge tested is order of operations. Item 1 is also less difficult if calculator use is permitted because students need only enter the numbers into the calculator in a correct order to obtain a decimal approximation of the correct answer; changing the distractors so that they are less familiar fractions closer to the correct answer might alleviate this problem.

The second item fails to test the objectives for which it was originally written (i.e., understanding of fractional and negative exponents) since, when a scientific calculator is used, the item tests the ability to enter the item stem into the calculator exactly as it appears except that parentheses are needed to enclose the two fractional exponents. My present judgment is that this also reduces the difficulty level of this item.

Both the objective tested and the difficulty of the third item change when a calculator is used because entering the numbers one by one and pressing the tan key will reveal the correct answer. Of the three items in Figure 8-1, this item would appear to be the most calculator sensitive.
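The point about the third item can be illustrated with a short sketch. It assumes the answer choices in Figure 8-1 read -π, π/2, 0, π/4, and π/3 (the scan is unclear on this), and it stands in for a calculator's tan key by checking where cos x vanishes, because floating-point `math.tan` returns a very large number at π/2 rather than signalling an error. The names `choices` and `tan_is_undefined` and the tolerance value are illustrative assumptions.

```python
import math

# Answer choices for item 3, as assumed from the partly legible figure.
choices = {
    "(a)": -math.pi,
    "(b)": math.pi / 2,
    "(c)": 0.0,
    "(d)": math.pi / 4,
    "(e)": math.pi / 3,
}


def tan_is_undefined(x: float, tol: float = 1e-12) -> bool:
    """tan x = sin x / cos x is undefined where cos x = 0 (within tol)."""
    return abs(math.cos(x)) < tol


# Try each choice in turn, as a student with a calculator might.
undefined = [label for label, x in choices.items() if tan_is_undefined(x)]
print(undefined)
```

Trying the choices one at a time requires no knowledge of where the tangent function is undefined, which is why both the objective and the difficulty of the item change.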

The Effects of Calculator-Passive Testing. Several instances of calculator-passive testing have been reported. In six instances (Colefield, 1985; Connor, 1981; Elliott, 1980; Golden, 1982; Hopkins, 1978; Lewis & Hoover, 1981), standardized mathematics achievement tests were used. In three of these studies (Colefield, 1985; Hopkins, 1978; Lewis & Hoover, 1981), the scores of students who were permitted to use calculators were significantly higher than were the scores of those who were not permitted to use calculators. A similar result was reported by Murphy (1981), who used the Problem Solving Achievement Test; the authorship of this test was not indicated in the dissertation abstract. Murphy reported that "students with unrestricted use of calculators achieved higher scores than students in the other three treatment groups in total problem-solving achievement." In this study there were two binary blocking variables (i.e., calculator use during instruction and calculator use during testing); crossing the values of these variables produced the four treatment groups.
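The crossing of two binary variables described above can be enumerated directly. The sketch below uses hypothetical variable names; it only lists the four cells of such a design and says nothing about Murphy's data.

```python
from itertools import product

# Two binary blocking variables (names are illustrative).
factors = {
    "calculator_during_instruction": (False, True),
    "calculator_during_testing": (False, True),
}

# Crossing the levels of the two variables yields the treatment groups.
groups = [dict(zip(factors, levels)) for levels in product(*factors.values())]

for group in groups:
    print(group)
```

The cell in which both values are true corresponds to the "unrestricted use" group in the quotation.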

Connor's study (1981) investigated the use of calculatorsduring instruction on the concepts and techniques of trigonom-etry. One treatment group (N = 48) was permitted to use calcu-lators during both instruction and testing; another treatmentgroup (N = 50) was not permitted to use calculators duringeither instruction or testing. The test administered as the pre-and post-test in this study was the 1963 version of the Trigo-nometry Test in the Cooperative Mathematics Series. Analysisof the test data revealed no significant achievement differencesbetween the two treatment groups.

Verbal problem solving was the focus of the study conducted by Elliott (1980). There were two treatment groups in this study: one group (N = 70) practiced verbal problem solving using only paper-and-pencil materials, while the second group (N = 67) was permitted to use calculators. Students in both treatment groups were given two post-tests; on one of these, they were permitted to use calculators and on the other, they were not. Elliott reported no significant differences between the treatment groups.

Golden (1982) studied the effects of calculator use on the achievement of EMR students in Grades 7-9. There were two treatment groups; one group (N = 23) used calculators while studying the four fundamental algorithms, while the other group (N = 27) did not. On the post-test there were no significant differences between the two groups on addition and subtraction items. The calculator-using students did perform significantly better (p < 0.05) than the other group on multiplication and division items.

There is only one calculator-passive study that attempted to discover the effects of using calculators on an existing test. Gimmestad (1982) randomly chose a group of nineteen students from among those taking a Calculus II course at Michigan Technological University; all of the students in this group had been permitted to use calculators in the course. Nine of the students composed the calculator group; the remaining ten students, the non-calculator group. Each student was asked to "think aloud" while solving twenty-four sample problems from the College Board's Advanced Placement Calculus Examination. Each student interview was videotaped, coded, and analyzed for the reasoning processes used and the results produced by the student. These outcomes are important:

1. Calculator use seemed to change the strategies used to solve only "a couple of problems." Gimmestad does not identify which problems were solved with changed strategies, but she does comment that multiple-choice problems in which the checking method strategy (Harvey, 1989a) can be employed seemed to be calculator-sensitive. The use of this process was negatively correlated with the product scores (r = -0.25).¹

2. Exploratory manipulations were more effective when calculators were used. The problem cited by Gimmestad was one that involved finding a limit. The process variable manipulation was also negatively correlated with product score (r = -0.13).

3. The frequency of checking by retracing steps for the calculator-using students was twice that of students not using calculators. The correlation reported for this process variable with the product score is 0.47. Based on this result, Gimmestad concluded "this may be an important difference between testing calculus with and without the calculator" (p. 3).

4. There was no significant difference between the mean product scores of the test group that used calculators and the group that did not.

With the exception of Gimmestad's study, none of the calculator-passive studies reported here attempted to discover how the use of calculators during testing changed the processes used by students or the objectives that were tested. In the other studies, there seems to have been an implicit assumption that the objectives tested by an item remained unchanged when calculator use was permitted. This assumption permitted Lewis and Hoover (1981) to argue, based on their results, that since there was a nearly perfect correlation between the ranks of the students on the regular and calculator administrations of the test they used, the pupil percentile ranks would be the same whether or not calculators were used. Thus, they concluded that the only change that would be needed to permit the use of calculators on this standardized test would be to re-norm the test using data from calculator administrations of it.

¹ The correlations reported are between the use of processes and the product score for all nineteen of the students and not for the nine calculator-using students. I speculate that these correlations might have been different if the process use and product scores of only the calculator-using students had been used.

However, as I have already argued, item objectives can change when calculators are used, especially on computational items, since these items can be answered by simply keying into the calculator an appropriate sequence of numbers and operations. As a result, at least the "strictly" computational items on standardized tests are no longer testing mathematics achievement but instead are testing students' calculator facility; the result is a changed, and possibly distorted, picture of a student's mathematics achievement or aptitude.

Calculator-Neutral Testing

Calculator-neutral tests are tests that permit but do not require those taking them to use calculators. To achieve this goal, the tests must not include any items on which calculator use will benefit test takers. One way in which calculator-neutral tests have been developed is to begin with an existing test and to determine, in some way, which of its items are calculator-sensitive. Once this determination has been made, the calculator-sensitive items are replaced with new items that are not calculator-sensitive (Leitzel & Waits, 1989). A similar strategy may have been used by persons developing new calculator-neutral tests, but in the studies I have examined (Abo-Elkhair, 1980; Casterlow, 1980; Long, Reys, & Osterlind, 1989; Mellon, 1985; Rule, 1980) the ways in which the items were generated in order to make them calculator-neutral were not discussed.

The National Assessment of Educational Progress (NAEP) (1988) has defined a calculator-neutral item as one whose solution does not require the use of a calculator (p. 33). Ideally, a calculator-neutral item should be one to which calculator-using and non-calculator-using students respond equally well. So, when applying the NAEP definition to a potential test item, both the objectives being tested and any calculator-related skills needed have to be considered. Figure 8-2 shows what I consider to be a stereotypical calculator-neutral item adapted from an item on the calculator-neutral test given by Abo-Elkhair (1980).

The number of students in five classes is 25, 21, 27, 29, and 28. What is the average number of students in each class?

Figure 8-2. Stereotypical calculator-neutral test item.

The study in which this item was used (Abo-Elkhair, 1980) was one in which students were taught about averages and averaging. This item is calculator-neutral for students with a good understanding of the paper-and-pencil algorithms for addition and division since they will not need a calculator to perform the necessary computations; the responses of these students should accurately reflect their understandings of averaging. However, when students not having the needed computational proficiency are tested, the item may not help to accurately measure student understanding. Within this group, the responses of the students who have facility with and are permitted to use calculators will depend upon their understandings of averages and averaging. But if students in this group do not know how to use or are not permitted to use calculators, then their responses will not reflect their knowledge of the objective being tested. This effect could be eliminated by providing all students with calculators and helping them to acquire the needed calculator-related skills. However, this has not typically been the practice; in all of the studies I examined, both calculator-using and non-calculator-using students participated. Perhaps the most extensive examination of calculator-neutral testing has been reported by Leitzel and Waits (1989).

The test used by the Ohio Early Mathematics Placement Testing Program for High School Juniors (EMPT) is calculator-neutral (Leitzel & Waits, 1989). Until 1983-84, students taking the thirty-two-item EMPT test were not allowed to use calculators; since then students have been permitted to use any calculator they bring with them to the testing. Essentially the same thirty-two items were used on EMPT tests EB1 and EB2 during the school years 1979-80 through 1982-83. In 1983-84, a test consisting of six new items and twenty-six items from the EB2 test was given. During the next two years, three items were replaced to produce the test, EB4/5, used since 1984-85. The EMPT test EB4/5 has twenty-three items in common with EB1 and EB2. In modifying EB1 and EB2 to obtain EB3 and its successor EB4/5, only one item was eliminated because it was "obviously calculator-dependent" (p. 18).

Leitzel and Waits reported that the means on EMPT tests were fairly stable from 1977-78 to 1985-86; during this period the means varied from a low of 14.2 to a high of 15.2. Data reported for the school years 1983-84, 1984-85, and 1985-86, when students were permitted to use calculators, showed that during each of these years, calculator-using students had higher test means than did non-calculator-using students. Unfortunately, Leitzel and Waits neither reported nor statistically compared the means of the two groups of students. They have, however, studied some of the characteristics of calculator-using versus non-calculator-using students. The calculator-using students were more likely than their non-calculator-using peers to: (a) be planning to attend a four-year college, (b) be taking Algebra II as a junior, (c) be taking advanced mathematics courses as a junior, and (d) have made a grade of A or B in the last mathematics course they took. Based on these data, Leitzel and Waits concluded that "it is not surprising that the group using calculators performed at a higher level."

These investigators have also examined the difficulty levels and calculator sensitivity of their test items. For most of the ten easiest and ten most difficult items on the EB4/5 test, both calculator-using and non-calculator-using students found the items to be almost equally difficult. Only one of the items seemed to be much less difficult for the calculator-using students; that item, Item 8, is shown in Figure 8-3. It also proved to be the most calculator-sensitive.

The decimal fraction 0.222 most nearly equals:

(A) 2/10 (B) 2/11 (C) 2/9 (D) 2/7 (E) 2/8

Figure 8-3. EMPT test item that was less difficult for calculator-using students.


To examine the calculator sensitivity of their items, Leitzel and Waits (p. 22) developed a calculator sensitivity (CS) index that is defined as follows:

$$\text{CS} = \frac{\%\ \text{correct by calculator-using group}}{\%\ \text{correct by non-calculator-using group}}$$

The thirty-two items on the EMPT test EB4/5 have CS values ranging between 1.1 and 2. The higher the CS index, the more calculator-sensitive the item would seem to be. Three EMPT items had a CS index greater than 1.6; six had CS indices between 1.5 and 1.6; and the remaining twenty-three items had CS indices not greater than 1.5. Leitzel and Waits "believe the expected CS index range for items [on EB4/5] that are not calculator-sensitive is between 1.5 and 1.2."²
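As a minimal sketch of how the CS index could be computed and used to flag items, the Python fragment below divides the percent correct in the calculator-using group by the percent correct in the non-calculator-using group. The item names and percentages are hypothetical illustrations, not EMPT data, and the 1.5 cutoff is simply the upper end of the range that Leitzel and Waits give for items that are not calculator-sensitive.

```python
def cs_index(pct_correct_calc: float, pct_correct_noncalc: float) -> float:
    """Ratio of percent correct for calculator users to percent correct for non-users."""
    if pct_correct_noncalc <= 0:
        raise ValueError("non-calculator percent correct must be positive")
    return pct_correct_calc / pct_correct_noncalc


# Hypothetical (calculator %, non-calculator %) pairs, for illustration only.
items = {"item_a": (60.0, 50.0), "item_b": (72.0, 40.0)}

cs_values = {name: cs_index(calc, noncalc) for name, (calc, noncalc) in items.items()}

# Flag items whose CS index exceeds the assumed cutoff of 1.5.
flagged = [name for name, cs in cs_values.items() if cs > 1.5]
```

With these made-up figures, only the second item would be flagged for review; a CS index of exactly 1.00 would mean both groups answered the item correctly at the same rate.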

The three items with the lowest CS indices require that students (a) find a simultaneous solution for a pair of linear equations, (b) simplify the sum of two rational functions, and (c) multiply a quadratic expression by a linear one. The three items with high CS indices are Item 8 (see Figure 8-3) and items that ask students to (a) compute the value (1/2)-s, and (b) simplify 32 - v2. When judged by the criteria for calculator-acceptable and calculator-unacceptable items given earlier in this chapter, I judged the three items with low CS indices to be acceptable and the three items with high CS indices to be unacceptable.

² I hypothesize that the lower bound on CS indices should be 1.00 instead of 1.2. If CS is an accurate measure of calculator sensitivity, then the best calculator-neutral item would be one with a CS index of 1.00.

A less extensive study than that by Leitzel and Waits (1989) was the study by Long, Reys, and Osterlind (1989), who investigated the differences in the scores of calculator-using and non-calculator-using students in Grades 8 and 10 on the Missouri Mastery and Achievement Tests (MMAT). The investigators reported that the tests for Grades 7-10 were not designed to test calculator use (i.e., were intended to be calculator-neutral). Even so, of the nine released items from the eighth-grade and tenth-grade MMAT tests, seven would seem to be calculator-sensitive even for students using a four-function calculator. Forty-five percent of the eighth-graders and 56 percent of the tenth-graders who took MMAT tests reported that they used a calculator. At the eighth-grade level, calculator-using students significantly outperformed non-calculator-using students on the test and on three of the four MMAT subtests: (a) understanding numbers, (b) computation, and (c) interpretation and application (p < 0.001). At the tenth-grade level, the calculator-using students also significantly outperformed non-calculator-using students on the total test and on two of its three subtests: (a) computation and (b) interpretation and application (p < 0.001).

The outcomes described by Long, Reys, and Osterlind (1989) are like those reported by Abo-Elkhair (1980), Casterlow (1980), and Mellon (1985). Rule (1980) does not report significant differences between his two treatment groups (calculator versus non-calculator), who studied functions, graphing, function composition, and inverse functions. On Rule's calculator-neutral tests, calculator use would be of little or no assistance; the items require students to manipulate symbolic expressions, interpret graphs, or generate graphs. In one sense, on a calculator-neutral test like Rule's, I would not expect to find differences between calculator-using and non-calculator-using students since the calculator-using students really have no opportunity to use calculators while taking the test. On the other hand, it is disappointing that Rule did not find differences between his two treatment groups, since the commonly expressed hope is that calculator-using students will develop better conceptual understanding of the mathematics they study. However, the instructional part of Rule's study lasted for only eleven consecutive instructional days, and the lessons presented do not include explorations of the kind that may be needed to produce deeper conceptual understanding. So, it is possible that Rule's calculator-neutral test would have shown the same kinds of differences described by the other studies discussed had the calculator-using students had more opportunities to explore functions using their calculators as tools.

Overall, the uses of the calculator-neutral tests described here showed that calculator-using students more often than not outperformed their non-calculator-using counterparts. These studies also show that great care must be used in developing calculator-neutral items that might permit calculator use; in some instances, lack of rigor in developing these items can result in an inaccurate test of the objectives stated for the item or in an item that is calculator-sensitive instead of being calculator-neutral. It seems to me that valid calculator-neutral tests can be developed that can be used with both calculator-using and non-calculator-using students; the data presented here show, however, that it may be necessary to norm the scores from these two groups separately.

Calculator-Based Testing

Until all mathematics students have and know how to employ calculators, it will probably be necessary to develop and use calculator-neutral tests. However, when all students have available and can use calculators effectively as tools, it will be possible to administer calculator-based tests routinely.

In the course of my work, I define calculator-based mathematics tests and calculator-active test items as follows:

A calculator-based mathematics test is one that (a) tests mathematics achievement, (b) has some calculator-active test items on it and (c) has no items on it that could be, but are not, calculator-active except for items that are better solved using non-calculator based techniques. A calculator-active test item is an item that (a) contains data that can be usefully explored and manipulated using a calculator and (b) has been designed to require active calculator use. (Harvey, 1989b, p. 78)

These definitions must be interpreted by those using them. The first two criteria for a calculator-based test can be strictly applied. The first is intended to affirm that the objectives of calculator-based mathematics tests should be mathematics objectives and that it is not the intent of these tests or their items to test calculator facility solely. Criterion (a) proceeds from the assumption that test takers will already have adequate facility with the calculator to take the test; this criterion agrees with a recommendation made by a joint symposium on the use of calculators in standardized testing convened by the College Board and the Mathematical Association of America (Kenelly, 1989, p. 47). Criterion (b) is intended to ensure that a calculator-based test will contain items that require students to use their calculators while taking the test; this criterion distinguishes calculator-based tests from calculator-neutral tests like the one given by Rule (1980). The third, criterion (c), cannot be strictly applied since a judgment is required about the best way of solving a problem. As an example, consider the items shown in Figure 8-4. These items appear on college-level placement tests being developed by the MAA Calculator-Based Placement Test Program Project.

[Item 1 stem illegible in source; legible fragments: x2-9, 62x, 3x+9, x-3]

a) -1 b) -3 c) x-3 d) x+3 e) [illegible]

If $2^{3000} = 10^x$, then x is

a) 600 b) 903.090 c) 2079.442 d) 9965.784 e) undefined

Copyright 1989 by the Mathematical Association of America. Used by permission.

Figure 8-4. Test items satisfying criterion (c) of a calculator-based test.

There are at least two ways to solve the first problem; one of these ways would seem to require calculator use while the other does not. On an untimed test, calculator use would permit use of the specific-instance strategy (Harvey, 1989a); to employ this strategy a student would substitute numbers for the variable x in each of the expressions in the item stem and in the foils and would select an answer by comparing the numeric evaluations of the item stem to those of the foils. A second, and better, way of solving this problem is simply to factor each of the expressions in the item stem, cancel like terms wherever possible, and combine the remaining terms so as to reach one of the multiple-choice answers.

A mathematically correct way to solve the second problem would be to find $2^{3000}$ and then to take the base-10 logarithm of that result. Present calculators give an error message when $2^{3000}$ is entered. A way to solve this problem is first to take the base-10 logarithm of both sides of the equation and then to multiply log (2) by 3000. Another way to solve the problem would be to estimate the size of $2^{3000}$ as $10^{1000}$ and so to determine that x is about 1000. Using this approximation, the correct answer to the problem becomes apparent.
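The logarithm approach to the second item can be sketched in a few lines of Python. This is only an illustration of the arithmetic: Python's floating-point numbers stand in for a calculator display here, which is an assumption of the sketch and not a claim about any particular calculator.

```python
import math

# Python integers are exact, so 2**3000 can be formed, but converting it to a
# floating-point number overflows, loosely analogous to a calculator's error message.
big = 2 ** 3000
try:
    float(big)
    overflowed = False
except OverflowError:
    overflowed = True

# Take the base-10 logarithm of both sides of 2^3000 = 10^x:
# x = 3000 * log10(2).
x = 3000 * math.log10(2)

# Independent check: the number of decimal digits of 2^3000 is floor(x) + 1.
digit_count = len(str(big))

# Values obtained by using the wrong logarithm, for comparison with the foils.
wrong_natural_log = 3000 * math.log(2)   # natural logarithm of 2
wrong_log_base_2 = 3000 * math.log2(10)  # base-2 logarithm of 10
```

Rounded to three decimal places, `x` agrees with choice b), while the two wrong-logarithm values agree with choices c) and d). That is an observation about the numbers and not something the text states about how the foils were written.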

Students taking calculator-based tests will be "calculator-dependent"; that is, they will actively use calculators as tools and many of the techniques and algorithms they use for solving problems will be calculator-based. Students who study from curricula that meet the NCTM Standards (NCTM Commission on Standards for School Mathematics, 1989) will be calculator-dependent. Criterion (c) was included in the definition of calculator-based mathematics tests to require test developers to investigate the calculator solutions for each item and to judge whether or not a calculator-based solution can and should be expected. This examination should produce valid tests for calculator-dependent test takers. This criterion is also intended, for the present, as a reminder to experienced mathematics test developers that the non-calculator techniques and algorithms for solving problems that they know and successfully apply are not the ones that will be used by the intended test audience.

My definition of a calculator-active test item is like that given by the National Assessment of Educational Progress (1988, p. 33). However, the definition given here is more specific in that it insists that the item contain data that can be usefully explored using a calculator. This criterion is intended to prevent labeling an item as calculator-active when, for example, the only calculator activity required is to change an answer computed as a common fraction into a decimal approximation of that answer. The second criterion for a calculator-active item should probably be modified to read, "designed so as most likely to require calculator use," since there are presently few instances in mathematics where paper-and-pencil procedures cannot be used. For example, it is not easy without a calculator to approximate the powers of e or continuously compounded interest or to compute the combinations of n objects taken r at a time when n is large, but there are paper-and-pencil techniques that apply in each situation. However, it is not likely that calculator-dependent test takers would think to use these techniques, even if they know about them, because the calculator is more facile and faster in these situations. Even with the suggested modification, the second criterion signals test developers that they must plan for active calculator use and, if the item is a multiple-choice one, develop the foils based both on the mathematical errors students make while solving such problems and on the exact form those incorrect answers take when calculators are used.
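To make the remark about paper-and-pencil techniques concrete, the sketch below shows two procedures of the kind that could in principle be carried out by hand: a power-series sum for the powers of e and a step-by-step product for combinations. The chapter does not say which techniques it has in mind, so these are assumed examples, and the function names and the interest figures are hypothetical.

```python
import math


def exp_series(x: float, terms: int = 20) -> float:
    """Approximate e**x by summing the first `terms` terms of its power series."""
    total, term = 0.0, 1.0            # term starts as x**0 / 0!
    for k in range(terms):
        total += term
        term *= x / (k + 1)           # next term: x**(k+1) / (k+1)!
    return total


def combinations(n: int, r: int) -> int:
    """Count combinations of n objects taken r at a time without large factorials."""
    if r < 0 or r > n:
        return 0
    r = min(r, n - r)                 # C(n, r) equals C(n, n - r)
    result = 1
    for k in range(1, r + 1):
        # After this step result equals C(n - r + k, k), so the division is exact.
        result = result * (n - r + k) // k
    return result


# Continuously compounded interest with hypothetical principal, rate, and time.
principal, rate, years = 1000.0, 0.05, 10
balance = principal * exp_series(rate * years)
```

Each loop step is a single multiplication or division, which is what makes the procedures feasible by hand yet far slower than a calculator keystroke.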


Recommendations for Test Construction. The joint symposium sponsored by the College Board and the Mathematical Association of America considered and made recommendations about a number of issues related to the development of calculator-based mathematics tests. One of these recommendations, that the tests be curriculum based and should not measure only calculator skills or techniques, has already been mentioned. The symposium participants made eight additional recommendations; three of these recommendations pertain to the development of calculator-based mathematics tests. The pertinent recommendations are:

1. Studies are needed that will identify the content areas of mathematics that have gained in importance because of the emergence and use of technology. In addition, the ways in which achievement and ability are measured in these areas should be studied, as should new ways of testing achievement and ability.

2. Choosing whether or not to use a calculator when addressing a particular test question is an important skill. Thus, not all questions on calculator-based mathematics achievement tests should require the use of a calculator.

3. Nationally developed calculator-based mathematics achievement tests should provide descriptive materials and sample questions that clearly indicate the level of calculator skills needed. (Kenelly, 1989, pp. 47-48)

The last cited recommendation goes on to say that students should be permitted the use of any calculator as long as it has the functionality required to solve the problems on the test. When these two parts are taken together, the result is a recommendation that test developers specify the least capable calculator needed to respond to the calculator-active items successfully but should not bar the use of more capable calculators. At the time of the symposium, the first graphing calculator, the Casio fx-7000G, had just been introduced and the Hewlett-Packard HP28C would not be introduced for another three months. These calculators and calculators like them can give students who have them an advantage. Graphing calculators add the possibility of geometric problem solving; calculators that graph and symbolically manipulate might change some test items into tests of the student's calculator facility.

For example, if asked to find the real zeros of a polynomial function, a student having only a scientific calculator could use that calculator to check for the rational zeros of the function, to develop a table of values so as to sketch a graph of the function, and to apply numeric techniques for approximating the real zeros. A student solving that problem with a graphing calculator could develop a complete graph of the function, observe the places where that graph crosses the x-axis, and use the calculator's "zoom-in" program or [SOLVE] key to approximate the real solutions. Even if these two students were asked to find only one real zero of a given polynomial function, the student having the graphing calculator would still have an advantage. Thus, in developing calculator-based tests it will be necessary to specify both the least capable and the most capable calculator that can be used while taking the test.
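The scientific-calculator route described above (check rational candidates, tabulate values, then approximate numerically) might look like the following sketch. The cubic is a hypothetical example chosen for illustration, and bisection is assumed here as the numeric technique; the text does not name a specific method.

```python
def p(x):
    # Hypothetical polynomial used only for illustration.
    return x**3 - 2*x - 5


# Step 1: check the rational-zero candidates. With leading coefficient 1 they
# are the positive and negative divisors of the constant term 5.
candidates = [1, -1, 5, -5]
rational_zeros = [c for c in candidates if p(c) == 0]

# Step 2: build a table of values at integer points and look for sign changes,
# which bracket the real zeros that a sketch of the graph would reveal.
table = [(x, p(x)) for x in range(-3, 4)]
brackets = [(a, b) for (a, fa), (b, fb) in zip(table, table[1:]) if fa * fb < 0]


# Step 3: approximate each bracketed zero by repeated halving (bisection).
def bisect(f, lo, hi, tol=1e-6):
    """Shrink [lo, hi] until it is shorter than tol; f(lo) and f(hi) must differ in sign."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


real_zeros = [bisect(p, a, b) for a, b in brackets]
```

For this cubic the candidate check finds no rational zero and the table shows a single sign change between 2 and 3, so every remaining step is repeated evaluation, which is the work a graphing calculator's zoom or solve feature would shorten.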

Research on Calculator-Based Mathematics Tests. It seems quite likely that many calculator-based tests have been developed; however, not many of these tests have been widely circulated or discussed. As a result, this section discusses a single dissertation study, the tests being developed by the MAA Calculator-Based Placement Test Program Project, and the chapter tests developed by the Ohio State Calculator and Computer Precalculus Curriculum (C²PC) Project.

The three calculator-based unit tests were developed by Bone (1983) as part of her study of the effectiveness of an introductory unit on circular functions; all of the items on these tests were free-response items. The total number of questions and calculator-active questions on each test is given in Table 8-1; Bone did not report test and item statistics for any of the tests. The four calculator-active items on which Bone's ten subjects scored most poorly were two word problems, an item that asked students to make a table of values in order to sketch the graph of a function, and an item that asked students to find the value of an angle in the third quadrant given its cosine. The calculator-active items that almost all students answered correctly are on the Unit I test and are items requiring simple computations. Examples of the test questions used by Bone are in Figure 8-5.

Table 8-1. Types of Questions on the Tests Developed by Bone

Unit Test    Number of Questions    Number of Calculator-Active Questions
I            24                     14
II           46                     10
III          26                     1

Note. Source: Bone, 1983. Used with permission.

Unit I Test Items

1. Use a calculator to find the value of sin 27° to 4 decimal places. (2.5 points)

2. Change 4.75 from radians to degrees (to the nearest 0.001). (2.5 points)

3. Assuming the earth is a sphere with radius of 4000 miles, how far is Tokyo from Adelaide (to the nearest mile)? Tokyo, Japan is located 35° 30' North latitude and Adelaide, Australia, is located at about 35° South latitude, just about due south of Tokyo. (10 points)

Unit II Test Items

4. Use a calculator to find the value of cos (-11/6) to 4 decimal places (to the nearest 0.001). (2 points)

5. Given sec θ = 2.13 find the smallest positive measure for θ to the nearest degree. (2 points)

Unit III Test Item

6. Sketch the graph of y = 3 sin (4πt) for 0 ≤ t ≤ 1. Locate points at intervals of 0.25. (6 points)

Figure 8-5. Examples of calculator-active items included on Bone's unit tests (Bone, 1983). Used by permission.
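For readers who want to see what the calculator work on Items 1, 2, 3, and 5 of Figure 8-5 involves, here is a brief Python sketch. It is not Bone's answer key. For Item 3 it assumes, as the item's wording suggests, that the two cities lie on the same meridian, so the distance is the arc length (radius times central angle in radians).

```python
import math

# Item 1: sin 27 degrees to 4 decimal places.
sin_27 = round(math.sin(math.radians(27)), 4)

# Item 2: 4.75 radians in degrees, to the nearest 0.001.
degrees_4_75 = round(math.degrees(4.75), 3)

# Item 3: arc length along a meridian of a sphere of radius 4000 miles.
tokyo_lat_north = 35 + 30 / 60          # 35 degrees 30 minutes north
adelaide_lat_south = 35.0               # about 35 degrees south
central_angle = math.radians(tokyo_lat_north + adelaide_lat_south)
distance_miles = round(4000 * central_angle)

# Item 5: sec(theta) = 2.13 means cos(theta) = 1 / 2.13; the smallest positive
# solution is the principal value of the inverse cosine.
theta_degrees = round(math.degrees(math.acos(1 / 2.13)))
```

Items 1, 2, and 5 each reduce to one or two keystrokes, while Item 3 requires setting up the central angle before any computation, which may help explain why word problems were among the items on which students scored most poorly.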

The MAA Calculator-Based Placement Test Program (CBPTP) Project is developing six college-level calculator-based placement tests and two high school prognostic calculator-based tests intended to be used in testing high school juniors in order to forecast the mathematics courses these students would be placed in upon entering college if they take no additional high school mathematics courses. The CBPTP college-level placement tests will all require students to use scientific (non-programmable) calculators; the high school prognostic tests will require students to use graphing calculators. On each of these six tests, about 25 percent of the test items will be calculator-active.

Development of two of these tests, the Calculator-Based Arithmetic and Skills Test (CB-A-S) (Boyd et al., 1989) and the Calculator-Based Calculus Readiness Test (CB-CR) (Kenelly et al., 1990), has been completed. Development of two additional tests, the Calculator-Based Basic Algebra Test (CB-BA) and the Calculator-Based Algebra Test (CB-A), is nearly completed. The data reported here are from the tryouts of these tests with high school or college students.

The CB-A-S test consists of thirty-two items that test students' knowledge of arithmetic and pre-algebra; seven of the items on this test are calculator-active. When administered to 191 students enrolled in remedial college mathematics courses, the mean score on this test was 16.01 (s.d. 6.53), the reliability (coefficient α) was 0.86, and the mean item difficulty was 0.50. The r-biserials and item difficulties for the seven calculator-active items are shown in Table 8-2.

Table 8-2. r-biserials and Difficulties for CB-A-S Calculator-Active Items

Item    r-biserial    Item difficulty
1       0.44          0.30
2       0.44          0.18
3       0.54          0.60
4       0.45          0.55
5       0.57          0.41
6       0.46          0.34
7       0.56          0.30

Note. Source: Mathematical Association of America, 1989, 1990. Used by permission.

All of the calculator-active items on the CB-A-S test correlate well with the other items on the test; most of the items are, as intended, of medium difficulty though, in general, they are harder than is the average item on this test. The hardest item, Item 2, is one that asked for an approximation of the value of N2 N when N = -1.1. The next two most difficult items asked students to approximate the value of (1 + 1/6)^4 and of 1/(3 +5), respectively. The easiest calculator-active item, Item 3, asked for an approximation of n when n × n × n = 63.
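The claim that the calculator-active items are harder than the average item can be checked from the reported figures. The sketch below assumes the conventional definition of item difficulty as the proportion of examinees answering correctly, so that the mean item difficulty of a number-right test equals the mean score divided by the number of items. The function and variable names are illustrative only and do not come from the MAA materials.

```python
# Figures reported for the CB-A-S tryout (N = 191).
N_ITEMS = 32
MEAN_SCORE = 16.01

# Item difficulties of the seven calculator-active items (Table 8-2).
CALC_ACTIVE_DIFFICULTY = [0.30, 0.18, 0.60, 0.55, 0.41, 0.34, 0.30]


def mean_difficulty_from_score(mean_score, n_items):
    """Mean proportion correct implied by a mean number-right score."""
    return mean_score / n_items


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


whole_test = mean_difficulty_from_score(MEAN_SCORE, N_ITEMS)
calc_active = mean(CALC_ACTIVE_DIFFICULTY)

print(f"whole test:        {whole_test:.2f}")   # 0.50
print(f"calculator-active: {calc_active:.2f}")  # 0.38
```

Under this assumption the whole-test value reproduces the reported 0.50, and the calculator-active items average about 0.38, which is consistent with the statement that they are somewhat harder than the average item.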

The CB-CR test has two parts. The first part is composed of twenty items intended to test precalculus knowledge; the second, five-item part specifically tests knowledge of trigonometry and elementary functions. There are nine calculator-active items on CB-CR. When given to a group of thirty-six high school seniors finishing their precalculus course, the mean score on this test was 16.81 (s.d. 3.49), the reliability (coefficient α) was 0.65, and the mean item difficulty was 0.67. When the calculator-active items are considered alone, the mean score was 4.67 (s.d. 2.11), the reliability (coefficient α) was 0.61, and the mean item difficulty was 0.52. The r-biserials and item difficulties for the nine calculator-active items are shown in Table 8-3.

Table 8-3
r-biserials and Difficulties for CB-CR Calculator-Active Items

Item    r-biserial    Item difficulty
1       0.50          0.56
2       0.44          0.50
3       0.18          0.97
4       0.44          0.53
5       0.46          0.36
6       0.55          0.39
7       0.45          0.39
8       0.28          0.50
9       0.48          0.47

Note. Source: Mathematical Association of America, 1989, 1990. Used by permission.

The item that is easiest is also the item with the lowest r-biserial. This item defines two functions and asks that a value of the function that is the composition of the given functions be computed at x = 1.7. The other item whose r-biserial value is an outlier is Item 8; this item requests that the value of tan(2π/5) be computed. The remaining calculator-active items are of medium difficulty and correlate well with the other, non-calculator-active items on the test that were, for the most part, taken from the existing MAA calculus readiness placement test.
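The source does not say how the r-biserials were computed. As a minimal sketch, assuming the textbook definition, the biserial correlation can be obtained from the point-biserial correlation by rescaling with the ordinate of the standard normal curve at the point that divides the examinees in the proportions p (correct) and q (incorrect). The data below are hypothetical and are not taken from the CB-CR tryout; the function names are illustrative.

```python
from statistics import NormalDist, mean, pstdev


def point_biserial(item, total):
    """Correlation between a 0/1 item score and the total test score."""
    right = [t for i, t in zip(item, total) if i == 1]
    wrong = [t for i, t in zip(item, total) if i == 0]
    p = len(right) / len(item)   # item difficulty: proportion answering correctly
    q = 1.0 - p
    return (mean(right) - mean(wrong)) / pstdev(total) * (p * q) ** 0.5


def biserial(item, total):
    """Biserial correlation; assumes a normally distributed trait underlies the item."""
    p = sum(item) / len(item)
    q = 1.0 - p
    z = NormalDist().inv_cdf(p)  # cut point; requires 0 < p < 1
    y = NormalDist().pdf(z)      # normal ordinate at the cut point
    return point_biserial(item, total) * (p * q) ** 0.5 / y


# Hypothetical data for ten examinees (not from the tryout).
item_scores = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
total_scores = [20, 18, 17, 9, 15, 11, 8, 19, 14, 12]

r_pb = point_biserial(item_scores, total_scores)
r_b = biserial(item_scores, total_scores)
print(f"point-biserial = {r_pb:.2f}, biserial = {r_b:.2f}")
```

Both functions need at least one examinee in each of the "right" and "wrong" groups. For an item with difficulty 0.97 in a sample of thirty-six, only about one examinee falls in the "wrong" group, so any such coefficient rests on very little information; this may help explain why the easiest item also shows the lowest r-biserial.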

The Calculator-Based Basic Algebra Test (CB-BA) (Curtis et al., in press) consists of twenty-five items from basic and intermediate algebra. On the CB-BA test, eight of the items are calculator-active. When this test was given to 256 students completing their algebra and precalculus courses, the mean score was 16.44 (s.d. 3.90), the reliability (coefficient α) was 0.75, and the mean item difficulty was 0.66. On the subtest consisting of the calculator-active items, the mean score was 4.08 (s.d. 1.64), the reliability (coefficient α) was 0.38, and the mean item difficulty was 0.51. For those students who are planning to take courses beyond basic and intermediate algebra, the test proved to be slightly more difficult than is usually intended for placement tests; however, the more difficult items would appear to be among the calculator-inactive and calculator-neutral items. These items were largely drawn from among those on the existing MAA Basic Algebra Test. The r-biserials and item difficulties for the calculator-active items are shown in Table 8-4.

Table 8-4
r-biserials and Difficulties for CB-BA Calculator-Active Items

Item    r-biserial    Item difficulty
1       0.35          0.63
2       0.38          0.33
3       0.42          0.71
4       0.23          0.39
5       0.24          0.34
6       0.37          0.45
7       0.48          0.43
8       0.29          0.79

Note. Source: Mathematical Association of America, 1989, 1990. Used by permission.
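The reliabilities quoted for these tests are values of coefficient alpha. A minimal sketch of that computation follows, assuming the usual formula alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores), which for items scored 0 or 1 coincides with KR-20. The response matrix is hypothetical and the function name is illustrative.

```python
from statistics import pvariance


def coefficient_alpha(responses):
    """Coefficient alpha for a list of per-examinee lists of item scores."""
    n_items = len(responses[0])
    # Variance of each item across examinees.
    item_vars = [pvariance([row[j] for row in responses]) for j in range(n_items)]
    # Variance of the examinees' total scores.
    total_var = pvariance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 0/1 responses: six examinees (rows) by four items (columns).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

alpha = coefficient_alpha(responses)
print(f"coefficient alpha = {alpha:.2f}")  # 0.83
```

Other things being equal, alpha tends to fall as the number of items falls, so some drop is to be expected when an eight-item calculator-active subtest (0.38) is compared with the full twenty-five-item test (0.75).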

Items 4 and 5, the two items on CB-BA with the lowest r-biserials, also are two of the harder calculator-active items on this test. In each case, the two lowest quintiles of students taking the test responded correctly to these items less than 30 percent of the time, and the highest quintile responded correctly to the item about 60 percent of the time. One of these items asked students to determine the interval in which the graph of 2.5x - y + 8.2 = 0 crosses the x-axis, while the other asked students to approximate the larger root of a quadratic equation. The two easiest calculator-active items, Items 3 and 8, asked respectively for an approximation of x when (x 5)5 = 10 and for the missing test score given three of the scores and the mean score.

The remaining test that has been developed, so far, by the Calculator-Based Placement Test Project is the Calculator-Based Algebra Test (CB-A) (Cederberg et al., in press). The CB-A test includes items from basic, intermediate, and college algebra. This test consists of thirty-two items; eight items are calculator-active. At present, data are available for only six of the eight calculator-active items, because after the last tryout of the test (N = 210), two of the items were discarded and replaced with new ones. When these two items were deleted and the data from the last tryout were reanalyzed on the resulting thirty-item subtest, the mean score was 11.56 (s.d. 3.49), the reliability (coefficient α) was 0.51, and the mean item difficulty was 0.39. The mean score on the six calculator-active items was 1.73 (s.d. 1.15); this subtest had a reliability (coefficient α) of 0.11 and a mean item difficulty of 0.29. In contrast, the twenty-four-item subtest consisting of the calculator-inactive and calculator-neutral items produced a mean score of 9.82 (s.d. 3.10), a reliability (coefficient α) of 0.50, and a mean item difficulty of 0.41. Overall, this was a difficult test for the sample of students who took it, and for those students the calculator-active items were, overall, more difficult than were the non-calculator-active items. The r-biserials and item difficulties for the calculator-active items are shown in Table 8-5.

Table 8-5
r-biserials and Difficulties for CB-A Calculator-Active Items

Item    r-biserial    Item difficulty
1       0.29          0.41
2       0.12          0.30
3       0.36          0.27
4       0.12          0.23
5       0.13          0.19
6       0.19          0.32

Note. Source: Mathematical Association of America, 1989, 1990. Used by permission.

Items 2, 4, and 5 have low r-biserials; two of these items are the most difficult of the calculator-active items on the test. Item 2 is a similar triangle problem in which all of the lengths given are decimal fractions. Item 4 seeks an approximation to the root of a quadratic equation, and Item 5 seeks a similar approximation, except that the equation in the item stem is 3 qx2 + 1 = 2.98. It is my observation that these are concepts and skills that are usually difficult for students. Thus, the use of a calculator to solve these problems may contribute to their difficulty, but calculator use is not a major factor.

Since the content of the four calculator-based placement tests ranges from tests of arithmetic and skills to knowledge of precalculus, the calculator-active items suggest that calculator use should be expected during their solution. Figure 8-6 shows representative calculator-active items that appear on CB-A-S, CB-BA, CB-A, or CB-CR. The distractors for each of these items are based upon mathematical errors that students make and are not intended to measure students' calculator facility. It is expected that students who take these placement tests will already have and know how to use calculators of the kind needed.

1. Which of the following best approximates 6(1.4 1.2)6?

   (A) 0.0003   (B) 0.0019   (C) 0.7805   (D) 0.7850   (E) 17.3395

2. The approximation of (1 + 1/6)^4 correct to 4 decimal places is

   (A) 1.0008   (B) 1.1667   (C) 1.8526   (D) 2.1614   (E) 4.6667

3. If x^3 + 2.75 = 5.12, then which of the following best approximates x?

   (A) 1.33   (B) 13.31   (C) 113.42   (D) 131.47   (E) 487.44

4. The radius of the larger of two concentric circles is 6.9; the radius of the smaller circle is 4.7. Which of the following numbers best approximates the area of the region between the two circles?

   (A) 13.82   (B) 15.21   (C) 25.52   (D) 80.17   (E) 149.57

5. A pole 7.8 feet high casts a shadow 12.8 feet long. If the length of a shadow cast by a tree is 83.9 feet, which of the following best approximates the height of the tree?

   (A) 51.1   (B) 78.9   (C) 88.9   (D) 99.8   (E) 137.7

6. Which of the following best approximates a solution of x^2 - 4x = 3?

   (A) -2.65   (B) -0.65   (C) 0.65   (D) 1.73   (E) 3

7. Which of the following best approximates the number approached by the sequence (3/2)^4, (4/3)^6, (5/4)^8, ..., ((n + 1)/n)^(2n), ...?

   (A) 1   (B) 2.718   (C) 6.192   (D) 7.389   (E) No finite number

8. Which of the following best approximates the sum of the areas of the rectangles shaded in the figure below?

   [Figure not reproduced]

   (A) 0.131   (B) 0.163   (C) 0.194   (D) 0.538   (E) 1.944

Copyright 1989, 1990 by the Mathematical Association of America. Used by permission.

Figure 8-6. Sample calculator-active items.
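To illustrate the kind of computation these items call for, the sketch below works Items 2 through 7 of Figure 8-6 numerically. No answer key is given in the text, so the values are simply computed and can be compared with the listed options. Item 1 is omitted because its expression is not fully legible in this copy, and Item 8 because it depends on a figure. The variable names are illustrative.

```python
import math

# Item 2: (1 + 1/6)^4
item2 = (1 + 1 / 6) ** 4

# Item 3: x^3 + 2.75 = 5.12, so x is the cube root of (5.12 - 2.75)
item3 = (5.12 - 2.75) ** (1 / 3)

# Item 4: area between concentric circles of radii 6.9 and 4.7
item4 = math.pi * (6.9 ** 2 - 4.7 ** 2)

# Item 5: similar triangles, so height / shadow length is constant
item5 = 7.8 / 12.8 * 83.9

# Item 6: x^2 - 4x - 3 = 0, solved with the quadratic formula
disc = math.sqrt((-4) ** 2 - 4 * 1 * (-3))
item6_roots = ((4 - disc) / 2, (4 + disc) / 2)

# Item 7: ((n + 1)/n)^(2n) for a large n; the limit is e^2
n = 10 ** 6
item7 = ((n + 1) / n) ** (2 * n)

print(f"Item 2: {item2:.4f}")
print(f"Item 3: {item3:.2f}")
print(f"Item 4: {item4:.2f}")
print(f"Item 5: {item5:.1f}")
print(f"Item 6: {item6_roots[0]:.2f} or {item6_roots[1]:.2f}")
print(f"Item 7: {item7:.3f}")
```

Each computed value agrees with exactly one of the listed options for its item; for Item 6, only the smaller root appears among the choices.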

While the four tests developed by the MAA Calculator-Based Placement Test Program Project give examples of calculator-active items that can be developed when scientific calculators are required, these tests provide no examples of the kinds of items that result when graphing calculators are required. The Ohio State University Calculator and Computer Precalculus Curriculum (C2PC) Project has developed precalculus text materials that require the use of calculator or computer graphing tools (Demana & Waits, 1989) and tests, some of whose items also require the use of these same tools (Demana et al., 1990). Items from these tests were used on the two midterm tests and the final examination in a single section of Algebra and Trigonometry at the University of Wisconsin-Madison during the Fall Semester, 1989-90. Table 8-6 describes some characteristics of these tests.

Table 8-6
Characteristics of Algebra and Trigonometry Tests

                                                Multiple-choice items
Test           Number of   Free-response   N     Mean* (s.d.)     Reliability   Difficulty
               students    items
Mid-term I     75          6               26    64.16 (7.94)     0.52          0.82
Mid-term II    72          6               26    58.67 (11.02)    0.73          0.75
Final exam     72          11              19    36.00 (9.05)     0.63          0.63

* Each item had a value of 3 points.
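The difficulty column of Table 8-6 can be related to the mean scores through the 3-point item value. The sketch below assumes that the mean, reliability, and difficulty columns all refer to the multiple-choice portion of each test, and that difficulty is the mean score expressed as a proportion of the maximum possible points; both are readings of the table rather than statements made in the text.

```python
POINTS_PER_ITEM = 3

# (test, multiple-choice items, mean score, reported difficulty) from Table 8-6
rows = [
    ("Mid-term I", 26, 64.16, 0.82),
    ("Mid-term II", 26, 58.67, 0.75),
    ("Final exam", 19, 36.00, 0.63),
]


def proportion_of_points(mean_score, n_items, points_per_item=POINTS_PER_ITEM):
    """Mean score as a proportion of the maximum possible score."""
    return mean_score / (n_items * points_per_item)


for name, n_items, mean_score, reported in rows:
    computed = proportion_of_points(mean_score, n_items)
    print(f"{name:12s} computed {computed:.2f}   reported {reported:.2f}")
```

Under these assumptions the computed proportions round to the reported difficulties for all three tests.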

On the first midterm test, five free-response and three multiple-choice items clearly would require students to use the graphics capabilities of their calculators to respond to them. On the second midterm test, these numbers were seven and four, respectively. On the final examination, only one multiple-choice item required graphics calculator capability; a majority of the free-response portion of this test required students to use graphics capabilities. Table 8-7 gives the r-biserials and item difficulties for the eight multiple-choice items on the tests that were graphics calculator-active.

Table 8-7
r-biserials and Difficulties: Graphics Calculator-Active Items

Item    r-biserial    Item difficulty
1       0.26          0.87
2       0.46          0.87
3       0.35          0.85
4       0.48          0.37
5       0.43          0.60
6       0.30          0.25
7       0.41          0.88
8       0.36          0.67

Most of the graphing calculator-active multiple-choice items contained specific instructions to use a graph or implicitly suggested that use. An example of an item that specifically told students to use a graph is the one on the first mid-term test that stated: "Use a graphing utility to determine the number of real solutions to the equation 4x3 10x + 17 = 0." On that same test, an item stem implicitly called for the use of a graphing utility when it asked: "Which one of the following viewing rectangles³ gives the best complete graph of y = 10x3 6x2 + 20r?" Overall, these items correlated satisfactorily with all of the multiple-choice items on the test; they ranged from very easy to moderately difficult.

On each of the three tests administered to the algebra and trigonometry class there were a number of items that could be solved algebraically or graphically. It is not known how students actually solved these problems; an example was the one that asked students to determine the period of the function f(x) = 3 sin(πx).

³ The viewing rectangle is a description of the minimum and maximum x- and y-coordinates that are shown on the graphics screen.

The free-response items on these tests varied in that some of them requested symbolic manipulations and solutions, some expected students to use algebraic techniques and algorithms to produce exact solutions, and some required the use of graphing calculators for their solution. The scores on the free-response and the multiple-choice portions of the two midterm tests were moderately well correlated. The correlation coefficient of these two parts on the first midterm test was r = 0.63; it was r = 0.61 on the second midterm test. On the first test, students correctly responded, on average, to 82.18 percent of the multiple-choice items and to 65.48 percent of the free-response items. On the second midterm test, the corresponding data were 75.21 percent and 69.3[?] percent, respectively. The corresponding data for the final examination have not yet been computed.

The tests and test items that have been produced by the MAA Calculator-Based Placement Testing Project and the Ohio State C2PC Project demonstrate that valid, reliable calculator-based tests and calculator-active items can be generated that satisfy the definitions of these terms that were given earlier in this paper. At present there is a paucity of published calculator-based tests and calculator-active items. In order to study the items that have been developed and, at the same time, to keep them secure, faculty from the University of Chicago, the Ohio State University, the Pennsylvania State University, and the University of Wisconsin-Madison are establishing a pool of calculator-active items in content areas including algebra, precalculus, and functions.

CONCLUSION

This paper begins with a quote that avers that present testing practices hold today's students hostages to yesterday's mistakes. One reason for this is that "What is tested is what gets taught" (Mathematical Sciences Education Board, 1989, p. 69). Thus, as long as mathematics tests fail to incorporate the use of calculators, I am certain that mathematics instruction will fail to incorporate the use of calculators effectively, and so today's students will be prisoners to a mathematics curriculum that is failing to prepare them for the society in which they will live both now and in the twenty-first century.

Just permitting students to use calculators while taking mathematics tests will not be enough. Students will need to be taught how and when to use calculators while solving all kinds of mathematics problems. Equally important, tests will have to actively account for the changes in the ways that mathematics problems are solved and the kinds of mathematics problems that can be solved when calculators are used. I conclude that calculator-passive and calculator-neutral tests do not satisfactorily account for these changes and that only calculator-based tests can. In addition, it seems clear that each time calculators become more capable and more responsive to mathematics instruction (and each is occurring), mathematics tests will have to be changed.

While the use of calculators on mathematics tests and, by implication, in mathematics instruction will not remedy all of the failings of present tests and instruction, their use is necessary if we want students to investigate, to explore, and to discover mathematics.

9

Gender Differences in Test Taking: A Review

Margaret R. Meyer

Ideally, when students take a mathematics exam, the only thing that should influence their score is their mastery of the material being tested. This paper reviews evidence concerning gender differences in mathematics test taking. It examines several factors which have surfaced relating to differences in performances for males and females. One conclusion reached is that the use of the multiple-choice format may result in a male advantage. A recommendation is therefore made that assessment instruments be developed that do not rely as heavily on the multiple-choice format.

Do males and females differ in their mathematics test-taking performance independent of their understanding of the mathematics being tested? This review attempts to answer this question. Although the focus will be on mathematics tests, very little research has looked specifically at mathematics test taking. Therefore, evidence from more general test taking will be presented.

Several factors have been investigated that relate to differences in test performance for males and females. These factors are power vs. speed test conditions, item difficulty sequencing, exam format, test-wiseness, risk-taking behavior, and test preparation behaviors. The first three of these factors have received the most attention in the literature. The other factors are usually included in studies as covariates. Gender of examinee is not always included as a factor of interest. Those studies that did not include gender will be reviewed only when they illuminate those that did include it.

POWER VS. SPEED TEST CONDITIONS

One characteristic of a test is the amount of time available to complete it. This time factor defines an examination given under power or speed conditions. In a speeded test, score differences are determined by differences in the rate of response to the test items; that is, the amount of time available is usually limited, and those who respond at a slower rate may not finish all of the items. The degree of speededness varies across tests. The difference between a highly speeded test and a moderately speeded test is the number of test takers expected to finish in the time allowed. In contrast, in a power test the score differences are independent of the rate of response. That is, everyone has enough time to respond to the items, and relative scores would not change if more time were available.

It is obvious that, from the test-taker's point of view, power tests would be preferred. However, from the test-giver's point of view, this is not always feasible or practical. It is also clear that response rate is not always strongly related to accuracy of response. Speed of response might be related to personality characteristics like risk taking rather than to differences in cognitive factors.

The results from the limited research available on the interaction of sex and speededness are mixed. Kappy (1980) looked at the effects of speededness on the Graduate Record Exam (GRE) for males and females. For both the quantitative and verbal portions of the GRE, little evidence was found of differential speededness patterns for the sexes. Another study (Wild, Durso, & Rubin, 1982) involving the GRE investigated whether increasing the amount of time per question for the verbal and quantitative sections of the exam would have a differential effect on examinee groups defined by sex, race, and number of years since completing an undergraduate degree. The results showed that although a larger portion of examinees were able to finish the test when given additional time, this extra time did not differentially improve the performance of any of the groups studied. In particular, females did not significantly increase their scores relative to the males.

Much of the research on test conditions investigates the effects of speeded tests on motivation and anxiety. Hill (1984), for example, examined the interaction of anxiety and time pressure and concluded that the low performance of high-anxious students is not due to lack of mastery of the material, but rather to motivational and test-taking problems that can be corrected. In a similar study, Plass and Hill (1986) looked at the interrelation of time pressure, test anxiety, and sex on third- and fourth-grade students taking a test composed of age-appropriate arithmetic problems. Using scores from a measure of test anxiety, the 173 students were divided into three groups based on low, medium, and high test anxiety. Approximately equal numbers of students from these groups were assigned at random by grade and sex into each of the two experimental testing conditions: one under time pressure and one in the absence of time pressure. Analyses of the data showed significant effects for the time pressure condition, level of anxiety, and sex. The children showed better performance without time pressure; low-anxious children scored higher than both middle- and high-anxious children, and females scored better than males. In addition, there was a significant three-way interaction of time pressure, test anxiety, and sex. The authors report:

In the condition removing time pressure, there are strong optimizing effects for boys but not for girls. Both high- and middle-anxious boys catch up completely with their low-anxious counterparts ... In contrast, girls showed weaker interfering effects of anxiety, and there are no optimizing trends for high-anxious girls, who actually perform best under standard testing conditions. (p. 33)

The study also investigated the amount of time that students in the various anxiety groups took per problem. Using performance rate as the dependent variable in an analysis of covariance (ANCOVA), significant main effects for anxiety level and sex were found. High-anxious and low-anxious students took less time per problem than middle-anxious students, and boys worked faster than girls. None of the interaction effects was significant. An examination of accuracy and performance rate revealed that high-anxious girls showed a slow rate with middle performance accuracy whereas high-anxious boys showed a fast rate with low performance. The authors summarized the importance of the rate-accuracy trade-offs in understanding performance and anxiety effects in testing as follows:

The data indicate that there is an optimal, intermediate rate for high test performance, shown by low-anxious boys and girls in the present study. Middle-anxious children, especially girls, showed an accurate but too slow rate, whereas high-anxious children, especially boys, showed a too fast, inaccurate strategy. (p. 35)

The authors conclude that current testing programs could be improved if students were tested twice, once under standard conditions and once under optimizing (without time pressure) conditions.

Graf and Riddell (1972) evaluated the factor of time differently by measuring the effect of context on problem-solving performance and the amount of time used to solve the problems. Context was manipulated by presenting the subjects with two mathematically identical problems, one considered to have a context more familiar to females and one a context more familiar to males. Results showed that although males and females did not differ in the amount of time they took to solve the problem with the female context, females took significantly more time to solve the problem having a male context. The students' perception of the difficulty of the two problems was also measured. Males perceived that the two problems were of equal difficulty. The females perceived that the problem set in a male context was the more difficult. It is not clear whether this perception was a result of their experience with the problem (i.e., it took them longer to solve it and therefore they thought it more difficult), or whether they found it more difficult at the onset and therefore took more time in solving it. There was no difference in their accuracy in solving the two problems. The authors concluded that between-sex differences in problem solving could be significantly decreased by giving power tests rather than speed tests.

In summary, these studies do not strongly support the notion that time pressure differentially affects the performance of females and males on tests of mathematics. However, time pressure could interact with other individual variables, such as test anxiety, to result in differences for males and females (Plass & Hill, 1986). A conservative approach would suggest:

that unspeeded tests of cognitive abilities should be used whenever sex-related differences are being investigated. It also suggests that using a speeded aptitude test as a criterion variable when examining sex-related differences in a typically male domain (e.g., mathematics) may be inappropriate or produce misleading data. In such male-typed areas, the true scores of high ability females may be underestimated. (Dwyer, 1979, p. 341)

ITEM DIFFICULTY SEQUENCING EFFECTS

Research on item sequencing effects investigates the differences in performance on achievement examinations as a result of changing the sequence of the test items. The most frequent arrangements are easy-to-difficult, difficult-to-easy, spiral cyclical, and random. Arguments can be made for the potential merit of each of these arrangements. In the easy-to-hard arrangement, for example, beginning a test with easy questions could provide early success and therefore encourage continued effort. On the other hand, beginning a test with hard problems could challenge the student. In addition, the difficult items might be answered more easily when the examinee is less fatigued. Possible negative effects of this arrangement are also obvious. Examinees could become discouraged by encountering difficult items at the beginning of the test, especially if they thought the items would become increasingly difficult as they progressed. Spending time on the difficult items might result in not allowing enough time to answer the easy questions.

The results of studies on item arrangement that have included sex as a variable have been mixed. Plake et al. (1982) investigated the interactive effects on performance of the sex of the subject, test anxiety, item arrangement, and knowledge of arrangement on a mathematics test. The forty-eight-item multiple-choice mathematics test was composed of items from the ACT College Mathematics Placement program. It was considered slightly speeded. Three forms of the test were constructed using the item difficulty indices: easy-to-hard, uniform or spiral cyclical, and random. For each form, half of the test booklets informed the examinee of the item arrangement and half did not. Anxiety was a covariate. Results of the three-factor fixed-effects ANCOVA (item order, knowledge of ordering, and sex) showed a significant main effect for sex and a significant sex-by-order interaction. Overall, males performed better than females, and significantly better than females on the easy-to-hard ordering of the items. Males also performed better than females on the random item ordering. Knowledge of the item arrangement did not appear to significantly influence test performance.

Plake, Patience, and Whitney (1988) investigated the effects of item context on differential item performance between males and females. The speeded test consisted of twenty mathematics items selected by content from a pool of items. Three forms of the test were assembled based on the difficulty indices: easy-to-hard, easy-to-hard within content, and spiral cyclical. No significant main effect was found for form or form-by-sex interaction. A significant main effect was found for sex, with females outperforming the males. Plake concluded "that item arrangement is not a potent variable in producing differential item performance between males and females" (p. 892).

Similar results were found in a study (Klimko, 1984) involving college students in an introductory educational psychology course; it examined the effects on test performance of item arrangement, cognitive entry characteristics, test anxiety, and sex. Three forms of the fifty-item multiple-choice midterm examination were used: easy-to-hard, hard-to-easy, and random. Of the four independent variables, the only significant predictor of achievement differences was student cognitive entry characteristics. Item arrangements based on item difficulties and sex did not influence performance. The author cautioned against drawing conclusions based upon gender due to the small number of males in the study.

Item order and sex were among the variables considered in a study of fourth graders by Kleinke (1980). Two forms of the speeded social studies test were used: easy-to-hard and uniform. Although the boys outperformed the girls, there were no significant interaction effects for sex and item ordering. There was a significant effect for item ordering, with those examinees taking the easy-to-hard form scoring higher. The author concluded that item order should be considered on speeded tests.

One variation on item arrangement studies is student awareness of the item arrangement. In a two-experiment study, Lane et al. (1987) developed five forms of a forty-item multiple-choice exam. The items tested course content from an undergraduate education course. Item order was determined as a result of manipulation of the statistical and cognitive item-difficulty level (based on Bloom's taxonomy). In the first experiment, students were unaware of the ordering patterns. The results showed no significant differences for the test of item order by gender. In the second experiment, six forms of the test were developed by manipulating the statistical and cognitive item-difficulty level. Knowledge of the ordering was provided by labels that indicated the cognitive level of the item. The results showed no significant differences for item order by gender by knowledge. However, a significant difference was found for the interaction of knowledge of level with gender. Males without labels scored lowest, followed by females without labels, then by males with labels. Finally, females with labels scored the highest. The authors concluded that the lack of an item ordering by gender interaction in the second experiment suggests that the long accepted view that easy items should come first is oversimplified. They offered no conclusions based upon gender except to note the increase in the males' scores when labels were provided.

Hambleton and Traub (1974) also investigated the effects of item arrangement for males and females, using a mathematics achievement test of multiple-choice items arranged easy-to-difficult and difficult-to-easy. Since the amount of time was restricted to forty minutes, the test was considered slightly speeded, although the difference in the number of students completing each form was not significant. Neither the main effect due to sex nor the interaction between item order and sex was significant. They did find, however, a significant main effect due to item ordering, with higher mean scores on the easy-to-difficult arrangement. The authors concluded that reordering the items on a test produces a test with properties different from the original.

In a review of these and other studies that considered item arrangement but not gender, Leary and Dorans (1985) concluded that hard-to-easy arrangements of items should be avoided for all students, especially under highly speeded conditions. No additional conclusions regarding the differential effects of item arrangement due to the gender of the examinee seem warranted, based on the studies reviewed here.

EXAMINATION FORMAT

Examination format is another factor in studies of differential performance for males and females. The two formats usually considered are essay and multiple choice. The arguments for the importance of format are: superior verbal ability in females will enhance their scores on essay exams, and differences in test-taking strategies will favor males on multiple-choice exams. However, format as a factor is perhaps not as relevant for mathematics tests, where alternatives to the multiple-choice format would not place much emphasis on writing ability. As with other test factors reviewed above, not all of the studies reviewed used mathematics as the content of the test.

Murphy (1982) studied the performance of male and female candidates on sixteen General Certificate of Education (GCE) examinations. The examinations included both multiple choice and other forms, and three of the sixteen tested mathematics achievement. At least one thousand male and one thousand female students took these tests for each of four consecutive years. The overall performance of females relative to males was not of interest in this study, but rather the relative performance of the groups on one type of exam compared to the other types. A series of t-tests was carried out to determine any significant difference in the performance of male and female examinees on the multiple-choice tests, as compared to the other formats. Results showed that in the majority of cases the males performed better than the females on the multiple-choice exams, as compared to their relative performance on the other formats. An important exception occurred on two of the three mathematics tests. On these two tests, there was no consistent male advantage relative to the performance of the females on the multiple-choice tests. The fact that a male advantage did exist on the one exam is an unexplained inconsistency.

Using a sample of fifteen- and sixteen-year-old Irish students, Bolger (1984) examined gender differences in achievement for three school subjects (Irish, English, and mathematics). Multiple-choice and written formats were used. Males were found to perform relatively better than females on the multiple-choice forms and relatively poorer on the written examinations; the opposite was true for females. The effect was constant across the three subjects measured. An additional hypothesis that gender difference would be largest for the languages and smallest for mathematics was not supported. The author cited this as evidence that method-based gender difference cannot be attributed to the differential verbal skills required by the two methods. Alternative explanations offered by the author include neatness of presentation contributing to the performance of females on essay exams and the possibility that males are more likely to guess the answer on multiple-choice exams and, therefore, be more likely to obtain the right answer.

Studies testing for gender differences on multiple-choice exams do not always reveal a male advantage. In a test of English language comprehension and composition, Bell and Hay (1987) used both multiple-choice and extended-response formats. Males were not found to score higher on multiple-choice questions.

These three studies examined the relative difference in scores between multiple-choice and essay format exams for males and females. Gender differences did not always occur, but when differences were found, they favored males on multiple-choice exams. It is reasonable to conclude that exam format does contribute to gender differences and that the use of the multiple-choice format can result in a male advantage that is independent of ability.

Student behaviors associated with multiple-choice exams (e.g., guessing and answer changing) have also been studied. Differences in these behaviors for males and females might explain the differences found overall on multiple-choice exams. For example, if females are more likely to answer only those questions on which they are sure of the answer and leave the rest blank, they will score lower than equal ability males who respond to the questions they know and guess on the questions for which they are unsure.
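The argument in the preceding paragraph can be illustrated with a small calculation. The sketch below is not drawn from any of the studies reviewed: it assumes number-right scoring (a blank earns nothing and a wrong answer costs nothing), purely blind guessing, and hypothetical values for test length, number of options, and number of items known. The function name `expected_number_right` is likewise an illustrative choice.

```python
from fractions import Fraction


def expected_number_right(known: int, unsure: int, options: int, guess: bool) -> Fraction:
    """Expected number-right score for one examinee.

    known   -- items answered correctly with certainty
    unsure  -- items the examinee cannot answer with certainty
    options -- answer choices per item (hypothetical value)
    guess   -- True: guess blindly on unsure items; False: leave them blank
    """
    if options < 2:
        raise ValueError("a multiple-choice item needs at least two options")
    # Under number-right scoring a blank earns 0, while a blind guess
    # is correct with probability 1/options.
    gain = Fraction(unsure, options) if guess else Fraction(0)
    return known + gain


# Two hypothetical examinees of equal ability on a 50-item, 4-option test:
# each knows 35 items and is unsure of the remaining 15.
omitter = expected_number_right(known=35, unsure=15, options=4, guess=False)
guesser = expected_number_right(known=35, unsure=15, options=4, guess=True)
print(omitter, guesser, guesser - omitter)
```

Under these assumptions the examinee who guesses gains an expected 15/4 points over the examinee who omits, although both know the same material. The size of the gap depends entirely on the hypothetical values chosen.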

Choppin (1975) looked for gender differences in the tendency to guess on multiple-choice exams. The sample consisted of fourteen-year-old students from 160 secondary schools in England and Wales who had participated in the cross-cultural International Association for the Evaluation of Educational Achievement (IEA) study of academic standards. The study analyzed responses to six separate multiple-choice instruments that tested aspects of science and the English language. The results showed a significant difference in favor of males in the tendency to guess. However, the size of the gender difference was small compared to the size of the difference found when the data were analyzed by school type. A similar battery of tests was administered to ten-year-old students, and again the data were analyzed for gender differences. For this sample no clear differences emerged, although, for both males and females, the tendency to guess remained high.

More recently, Khampalikit (1982) investigated guessing as a test-taking strategy of elementary school students. Random samples of students in Grades 2, 5, and 8 from a nationwide norming group were used to compute four guessing-related indices using item responses for the 3R's Reading Test and Mathematics Test. Results showed that the overall amount of guessing was low and there was little evidence of differences between the sexes on the test-taking behaviors assessed.

Answer-changing behavior was the subject of a study by Skinner (1983). Males and females from an introductory psychology course served as subjects. The test was a speeded 100-question multiple-choice exam given as a midterm examination. Erasures on the answer sheet were examined to determine answer changes and whether the answer was changed from right to wrong or wrong to right. Overall, the number of answer changes was small, only about 4 percent. An analysis for gender differences revealed that females made significantly more changes than did males. Their rate of 4.83 percent was more than double that of the males (2.36 percent). Regarding this gender difference, the author concludes:

Clearly, deliberating about answer-changing leaves less time available for other activities, such as answering multiple choice questions not yet attempted, or doing other types of questions (e.g., essays). Thus, regardless of whether or not there is a functional relationship between the number of answer changes and time taken to consider and implement such changes, on a speeded test the tendency for females to make more than twice as many answer changes as males may well be counterproductive, particularly in light of two further findings: first, the success rate for answer changes for women was not better than that for men (indeed, males made 54% successful changes, females 50%); and second, female subjects achieved a mean grade of 65.9% on the examination, compared to 70% for the males. (p. 221)
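The figures Skinner reports can be put side by side with a little arithmetic. The sketch below uses only the percentages quoted above. It assumes that the change rates are percentages of items on the 100-question exam and that a "successful" change is one counted in the reported 54% and 50% rates. It does not estimate the net effect on scores, since the split between right-to-wrong and other unsuccessful changes is not given here.

```python
# Figures quoted from Skinner (1983); the per-item reading is an assumption.
ITEMS = 100  # length of the midterm exam

CHANGE_RATE = {"female": 0.0483, "male": 0.0236}  # answer changes per item
SUCCESS_RATE = {"female": 0.50, "male": 0.54}     # share of changes that succeeded


def changes_per_examinee(group: str) -> float:
    """Average number of answer changes per examinee on the whole exam."""
    return ITEMS * CHANGE_RATE[group]


def successful_changes_per_examinee(group: str) -> float:
    """Average number of successful answer changes per examinee."""
    return changes_per_examinee(group) * SUCCESS_RATE[group]


ratio = CHANGE_RATE["female"] / CHANGE_RATE["male"]
print(f"female-to-male ratio of change rates: {ratio:.2f}")
for group in ("female", "male"):
    print(
        group,
        round(changes_per_examinee(group), 2),
        round(successful_changes_per_examinee(group), 2),
    )
```

On this reading the ratio of change rates is a little over 2, consistent with the statement that females changed answers more than twice as often as males.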

Answer-changing behavior has been studied in relation to other test-taking factors, such as test anxiety. Payne (1984) hypothesized that high test anxiety would be associated with a high degree of answer changing. Data for a sample of 296 eighth-grade students consisted of scores on an anxiety measure and the number of item revisions on an aggregate of four multiple-choice science achievement tests given over the period of one year. Item changes were coded wrong-to-wrong, wrong-to-right, and right-to-wrong. Race (black and white) and sex were also used as factors in the study. No significant sex differences in answer changing were found. Significant correlations between answer changing and test anxiety were found for white males, for the total white student group, for the total male group, and for the total student group, suggesting a tendency in each for higher test anxiety to be associated with more answer-changing behavior. None of the correlations for black students was significantly different from zero.

Answer changing and guessing might be associated with gender differences found on the 1988 University of Minnesota Talented Youth in Mathematics Program (UMTYMP) testing for the Twin Cities sample (Terwilliger, 1988). Response patterns for a 50-item multiple-choice test were analyzed on a gender basis. It was found that females were less likely than males to finish the test, and the drop in their success rate toward the end of the test was more pronounced than that of the males. The responses were not analyzed for answer changes.

Although the evidence is not conclusive, these studies suggest that guessing and answer changing might be associated with gender differences on multiple-choice exams. A greater tendency by males to guess answers can result in higher test scores independent of ability. Likewise, a greater tendency by females to change answers can result in lower test scores because of the time this behavior takes in a speeded test.

TEST-WISENESS AND RISK TAKING

Two additional factors that are sometimes associated with differences in test taking are test-wiseness and risk taking. Test-wiseness (TW) has been defined as "a subject's capacity to utilize the characteristics and formats of the test and/or the test taking situation to receive a high score" (Millman, Bishop, & Ebel, 1965, p. 707). Risk taking on objective examinations (RTOOE) is defined as "guessing when the examinee is aware that there is a penalty for incorrect responses" (Slakter, 1967, p. 33). Both TW and RTOOE are associated with the multiple-choice format and, therefore, gender differences in them might help explain gender differences due to test format.
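The definition of RTOOE refers to a penalty for incorrect responses without specifying it. One common form is the classical correction for guessing, in which the score is the number right minus the number wrong divided by one less than the number of options. The sketch below assumes that rule, which is not attributed to Slakter or to any study reviewed here, and uses hypothetical values for the test and the examinee. It shows why guessing under a penalty is a risk: blind guessing has no expected payoff, whereas guessing after eliminating options does.

```python
from fractions import Fraction


def expected_formula_score(known, guessed, options, p_correct=None):
    """Expected score under the classical correction for guessing.

    Assumed scoring rule: score = right - wrong / (options - 1).

    known     -- items answered correctly with certainty
    guessed   -- items on which the examinee guesses
    options   -- answer choices per item
    p_correct -- chance that a guess is right; defaults to a blind
                 guess, 1/options
    """
    if options < 2:
        raise ValueError("a multiple-choice item needs at least two options")
    p = Fraction(1, options) if p_correct is None else Fraction(p_correct)
    expected_right = known + guessed * p
    expected_wrong = guessed * (1 - p)
    return expected_right - expected_wrong / (options - 1)


# Hypothetical examinee: 35 items known, 15 guessed, 4 options per item.
blind = expected_formula_score(35, 15, 4)
# Same examinee after eliminating two options on every guessed item.
informed = expected_formula_score(35, 15, 4, p_correct=Fraction(1, 2))
print(blind, informed)
```

With these values the blind guesser's expected score equals the 35 items known, while the examinee who first eliminates two options expects 40. Any single administration can of course fall above or below the expectation, which is what makes the decision to guess a matter of risk taking.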

Slakter, Koehler, and Hampton (1970) developed a measure of test-wiseness for use with students in Grades 5 through 11. The instrument measured four aspects of TW corresponding to these behaviors: (1) select the option which resembles an aspect of the stem; (2) eliminate options which are known to be incorrect and choose from among the remaining options; (3) eliminate similar options, that is, options that imply the correctness of each other; (4) eliminate those options which include specific determiners (p. 119). The instrument was administered to 1,070 students in Grades 5 through 11 and replicated with a group of 1,291 students. A sex-by-grade multivariate analysis of variance was performed on the four subscale scores. The only significant effect was that of grade, with TW increasing over grade level. Neither the sex effect nor the sex-by-grade interaction was significant.

A related study (Crehan et al., 1978) was conducted to determine the relationship between TW and grade level, the relationship between TW and sex, and the stability of TW. This was a longitudinal study that tested students three times at two-year intervals in Grades 5, 7, and 9 (n = 75); 6, 8, and 10 (n = 76); 7, 9, and 11 (n = 73); and 8, 10, and 12 (n = 64). The same four aspects of TW were measured. As with the previous study, there was no evidence of sex differences or sex-by-grade interaction. TW was found to be somewhat stable and to increase over grades.

RTOOE has been measured using a variety of formulas. Slakter (1967) compared results obtained using five different measures of RTOOE and an additional one he proposed in this study. Results did not show any sex differences in RTOOE for any of the measures.

In summary, there is no evidence that males and females differ in terms of test-wiseness and risk taking on objective examinations.

TEST PREPARATION STRATEGIES

The final area to be examined is that of study behaviors and test preparation strategies for males and females. This research has focused on characterizing the study behaviors of successful students. One general conclusion is that no single approach is associated specifically with success. Biggs (1976) looked for differences in study behaviors for males and females. Using the Study Behavior Questionnaire developed for his study, he found evidence to suggest that a single task can be successfully approached in different ways by males and females. He characterized the male approach as "seeing 'truth' emerging from external sources and authorities, and not worrying too much about interrelating past knowledge with what one is in the process of acquiring" (p. 77). On the other hand, he described the approach taken by females to be that of "making up one's own mind about 'truth' by avoiding rote learning of detail and by actively using transformational strategies" (p. 77). It is interesting to note that these characterizations of study behavior are exactly opposite those that are usually put forth to explain superior male performance on standardized mathematics achievement tests.

Watkins and Hattie (1981), also using the Biggs Study Behavior Questionnaire, investigated the study methods of students at an Australian university. Other factors that they considered in addition to gender were age, academic year, and field of study. They found that regardless of these other factors,

females were more likely than the males to show interest in their courses and to adopt a deep-level approach to their work. At the same time the females also generally seemed to possess more organized study methods than the males. The males were more likely to have a pragmatic approach to tertiary study, to be more worried about their work, and to adopt reproducing strategies which would allow them to scrape through their examinations. (p. 392)

They further noted that, based upon these findings, females would be expected to achieve better academic results than males; indeed, this proved true for the females they investigated.

These results are interesting when applied to gender differences in mathematics achievement. In a review of the literature, Kimball (1989) pointed out that in contrast to standardized measures of mathematics achievement, females receive better grades in school mathematics than do boys. If these same differences in academic behaviors, as observed by Watkins and Hattie (1981), are present at the elementary and secondary school levels, they might help explain the higher grades earned by girls. They do not, however, explain why boys score better on standardized tests.

A study by Speth (1987) investigated the interaction of learning style, gender, and type of examination on anticipated test preparation strategies. The two examination conditions used were multiple choice and essay exams. On the basis of two different learning style instruments, the students from educational psychology classes were grouped into four clusters. A survey of test preparation activities developed by the investigator was used as the dependent measure. A factor analysis of the test preparation survey yielded six subscales. A 4 x 2 x 2 MANCOVA tested hypotheses of no difference among clusters, between males and females, or between test conditions on the six test preparation strategies, while controlling for a self-rating of academic ability. The results showed no significant main effect for sex or a sex-by-test type interaction. There was, however, a significant three-way interaction of cluster, gender, and type of examination. This suggests that gender by itself is not the critical variable, and that males and females relative to each other do not prepare differently based on the method of testing. Instead, as Biggs found, different test preparation strategies correlate with gender and learning styles (Biggs, 1976). What was not included in this study was any measure of the success of these different test preparation strategies.

The evidence from these three studies suggests that while males and females do have different approaches to academic work (Watkins & Hattie, 1981), the within-sex differences might be as important as the between-sex differences (Speth, 1987). In addition, differences in approach can be equally effective (Biggs, 1976) and do not necessarily result in different outcomes.

SUMMARY AND RECOMMENDATIONS

This review has considered the research evidence related to gender differences in test taking in general and mathematics test taking in particular. The major factors reviewed were power vs. speed test conditions, item difficulty sequencing, and exam format. In addition, behaviors associated with the multiple-choice format were also reviewed. They included guessing, answer changing, test-wiseness, risk taking, and study behaviors. Evidence of gender differences in these factors was for the most part inconsistent. The one conclusion that does seem justified is that the use of the multiple-choice format could result in a male advantage. Reasons for this advantage are not completely clear, but they might include differences in answer changing and the willingness to guess. A conservative recommendation based upon this conclusion would be to develop assessment instruments that do not rely on the multiple-choice format. This change should have no detrimental effect on the performance of males, and it might result in a decrease in any advantage males have enjoyed because of the testing format currently used.

10

Communication and the Learning of Mathematics

David Clarke, Max Stephens, and Andrew Waywood

The learning of mathematics is fundamentally a matter of constructing mathematical meaning. The environment of the mathematics classroom provides experiences which stimulate this process of construction. This chapter presents the findings of three studies based in Australian schools that exemplify the successful introduction of innovation into mathematics curriculums. The purpose of this research synthesis is to report on (a) the extent to which the strategies used encourage children to broaden their mathematical thinking and facilitate metalearning and (b) the impact of these strategies on the nature of mathematical activity in classrooms, with particular reference to redefining the roles of teacher and student in creating and giving personal meaning to mathematics.

The NCTM Curriculum and Evaluation Standards (1989) have attached great importance to communication in mathematics:

Listening and reading with comprehension, developing an attitude of questioning, and describing mathematical thought processes all contribute not only to learning mathematics with understanding, but also to the ability to apply learned skills in new contexts, to solve problems, and to extend learning beyond the task at hand. (p. 98)

This report presents an overview and conclusions from three studies, the IMPACT Project (Clarke, 1985), Assessment Alternatives in Mathematics (Clarke, 1989), and the Vaucluse College Study, which suggest how the above standard can be realized at classroom and school level. The central purpose of this chapter is to discuss:

the extent to which the strategies reported encourage children to broaden their mathematical thinking and facilitate metalearning;

the impact of these strategies on the nature of mathematical activity in classrooms and, in particular, on redefining the roles of teacher and student in creating and giving personal meaning to mathematics.

The nature of these strategies was essentially metacognitive. The meaning of metacognition in the mathematics context has been most usefully articulated by Garofalo and Lester (1984) as the knowledge and regulation of cognition. An essential aspect of this metacognitive activity is reflection on learning (Kilpatrick, 1985), and White (1986) identified a need for training in just this aspect of metacognition: "Much learning is superficial, being done without deep reflection. Appreciation of this point leads to the recognition of the need for training in metacognition" (p. 5).

Biggs (1988) put forward the image of "learning through guided student self-questioning" and suggested "self-management of learning" as an essential goal of education. The reflective review of learning and student self-management of learning were central concerns of all three studies reported here.

The learning of mathematics is fundamentally a matter of constructing mathematical meaning. The environment of the mathematics classroom provides experiences which stimulate this process of construction. While the mathematical knowledge of school children will incorporate visual imagery, both at the level of ikonic thought and at a level involving more elaborated visual representations (geometrical, graphical), mathematical meaning requires a language for its internalization within the learner's cognitive framework and for its articulation in the learner's interactions with others. Communication is at the heart of classroom experiences which stimulate learning. Classroom environments that place particular communication demands on students can facilitate the construction and sharing of mathematical meaning and promote student reflection on the nature of the mathematical meanings they are required to communicate.

FACILITATING COMMUNICATION IN THE MATHEMATICS CLASSROOM: THREE STUDIES

We would suggest the existence of three distinct types of communication in the mathematics classroom:

communication about mathematics;

communicating mathematics;

using mathematics to communicate.

The IMPACT Project is primarily concerned with the first of these. The use of student mathematics journals at Vaucluse College provided the opportunity for the development of communication in all three modes. The IMPACT program required that students reflect on their mathematical activity and their learning and, through student-teacher dialogue, sought to facilitate self-management of learning. The national development and testing of the assessment strategies which became Assessment Alternatives in Mathematics (Clarke, 1989) gave consideration to communication in two senses: communication as the means by which assessment information is obtained and communication skills as one focus of assessment. The use of mathematics journals in the Vaucluse College study demonstrated the potential to develop in learners the reviewing and reflective skills required by the IMPACT program and also to develop in students the ability to think mathematically.

THE IMPACT PROJECT

The IMPACT Project benefited from the support of the Faculty of Education, Monash University. The publication of the evaluation report (Clarke, 1985) was funded by the Monash Mathematics Education Centre. The objectives of the IMPACT project were:

to provide a mechanism whereby the student can regularly inform the teacher of difficulties experienced, help needed, and anxiety felt;

to encourage and facilitate meaningful pupil-teacher dialogue, student reflection on learning, and negotiated instruction.

During 1984, about seven hundred Year 7 children in thirty-six mathematics classes in fifteen Victorian secondary schools were regularly given the opportunity, about once every two weeks, to give confidential, written answers to questions like:

What was the best thing to happen in Maths?

What is the biggest worry affecting your work in Maths?

What is the most important thing you have learned in Maths?

How do you feel in Maths classes?

How could we improve Maths classes?

The regular, written reflections of seven hundred children concerning their learning of mathematics provided a pool of data related to the achievement of the above objectives. A graphic portrayal of the children's conception of mathematics emerged over the year (Clarke, 1987). The immediacy of this portrayal was heightened by the children's spontaneous (and highly idiosyncratic) use of technical mathematical terms. Many of the current preoccupations of the mathematics education community (language, differential treatment and behavior of boys and girls, mode of instruction, student-generated algorithms, the social context of instruction and learning, and so on) emerged as concerns for the students in this study. Participant teachers expressed surprise at the significance of these issues for their classrooms.

Examples of questions used and some answers obtained follow:

What would you most like more help with?

Nothing much, but I'm not sure how to do division your way. [Student's sketch: the division 4)80 set out in two layouts, labeled "Your way" and "My way"]

Write down one particular problem which you found difficult.

Algebra a bit, because I don't understand why we don't just use numbers. It would be simpler.

Write down one new problem which you can now do.

1/3 ÷ 4 = 1/3 x 4/1 = 4/3 = 1/12

How could we improve maths classes?

Have less work and more learning.

The nature of communication in the mathematics classroom became the central consideration in the teachers' evaluations of both instruction and learning and in their planning for further instruction. The IMPACT study provided an opportunity with some classes for a redefinition of the function of the mathematics classroom. The findings show clear instances where teacher action in response to student requests or suggestions significantly altered the form of instruction. Students in those classes were confronted with the need for a reinterpretation of their role in determining the nature of classroom activity and the possible nature of student-teacher communication.

Specific findings from this study included:

The student attitude to the administration of the IMPACT procedure was predominantly one of acceptance and passive compliance.

The quality of student responses varied. Teachers reported that many students experienced difficulty in articulating their feelings or their mathematics difficulties.

A majority of students reported finding the procedure "useful."

More boys than girls reported finding the procedure personally useful. (An interesting result, since several teachers commented that the girls made better use of the procedure, offering more informative and insightful responses.)

Students who found the procedure useful offered three categories of benefit:

Reflection on Learning: "It makes you realize more about the subject" and "It makes you think how you're going."

Reporting Feelings: "It gives you a chance to express your feelings/tell about your problems/say what you like and don't like."

Information for the Teacher: "It helps the teacher know ..."

Teacher action took the following forms: organizational action, instructional action, individual assistance, individual counseling and, in two cases, no action.

Most teachers reported improved student-teacher relationships.

Where a student expressed dissatisfaction with the procedure, the reason most commonly given was lack of consequent teacher action.

Several instances were documented in which teacher action arising from information provided through the IMPACT procedure led to positive changes in student attitudes and achievement.

Over 80 percent of participating teachers consistently reported finding the IMPACT procedure to be of value.

Teachers identified a lack of time in the past to engage in private conversation with every pupil as a major concern. As a result, they greeted the IMPACT procedure with initial enthusiasm, since it provided the opportunity for all students to communicate confidentially with their mathematics teachers with minimal reduction in instruction time. Other benefits were identified by the teachers:

Students talked to me through the sheets, very frankly, and I gained tremendous insights into anxieties they had, and frustrations. Students I felt were coping quite happily mentioned anxiety about tests. Some students felt I did not explain things thoroughly enough and went too fast, and these were students who did quite well in tests, so I had assumed they were happy. Other students mentioned boredom and felt the work was too easy and was repeating Primary School. Without the sheets, I would not have gained this information as they would never have been so frank in conversation. I obviously changed my teaching methods to comply with the information and this helped my relationship with the class and with individual students. In general, students really appreciated the fact that I was taking the trouble to find out what they think and they used the system very responsibly.

(Year 7 Maths teacher, female, September 1985).

Use of the IMPACT program facilitated communication between teachers and students about the mathematics being studied and about the students' feelings concerning their learning, the content, and the instruction. In several instances, this communication led to fundamental changes in instructional practice, learning behavior, and classroom environment.

The extent to which participation in the IMPACT program actually facilitated metalearning and the development of student mathematical thinking remains uncertain. The IMPACT program certainly provided a stimulus for reflection on learning, but no training was provided in review techniques or metacognitive strategies (cf. the PEEL Project, Baird & Mitchell, 1986). Nor was any feedback provided to students concerning the quality of their IMPACT responses. By its nature, the IMPACT program provided documentation of student communication of the first type, communicating about mathematics, and, to a lesser extent, of the second type, communicating mathematics.

ASSESSMENT ALTERNATIVES IN MATHEMATICS

In 1986, the Mathematics Curriculum and Teaching Program (MCTP), a national initiative concerned with the professional development of mathematics teachers, commissioned a study of effective assessment practices in use in Australian mathematics classrooms. The outcome of this project was to be a teacher resource guide, subsequently published as Assessment Alternatives in Mathematics (Clarke, 1989). It was aimed at assisting mathematics teachers to expand their repertoire of assessment strategies in order that their assessment might become optimally effective, giving appropriate recognition to all the goals of the contemporary mathematics classroom.

Communication in the mathematics classroom became a central concern in the compilation of assessment strategies. This was particularly the case with regard to the assessment of student problem solving in mathematics, where it became evident that a teacher's capacity to evaluate a student's problem-solving performance was critically dependent on the student's ability to articulate, in either spoken or written form, the problem-solving process, the nature of the solutions, and the evaluation of the appropriateness and the quality of their solutions.

Among the various assessment strategies collected, studied, tested, and refined during the course of this project, the role of communication varied with the particular strategy under consideration. The following discussion examines the nature of the communication component for a sample of the assessment strategies.

ASSESSMENT THROUGH CLASSROOM OBSERVATION

Teachers made succinct annotations to class lists during the course of a lesson. These brief records were restricted to "aberrations and insights," that is, observations of student behaviors or utterances which challenged or extended the teacher's existing conception of a student's competence or understanding. The effectiveness of such informal assessment is critically dependent on the nature of classroom activity. This was addressed directly by drawing teachers' attention to factors which facilitate or inhibit student communication in the classroom. A major assertion of Assessment Alternatives in Mathematics was that the most effective instructional activities are typically those which also provide the best assessment opportunities. Strategies for maximizing assessment opportunities included consideration of "wait time" (Rowe, 1978), the characteristics of "good questions," and the establishment of "student work folios."

Test Alternatives

Teachers were encouraged to explore different approaches to formal testing. These included:

Practical tests, in which student competence was demonstrated through the completion of tasks with a practical emphasis, typically involving the manipulation of concrete materials. Computing skills were also assessed in this way. It was a common requirement for students to provide an account of their methods, but the essence of this approach was communication by demonstration.

Group tests, in which tasks were solved through student collaboration. Successful performance was associated with effective student-student communication and an ability to translate into personal terms the ideas and insights of others.

Student-constructed tests, in which groups of students would contribute test items covering a topic just completed. Trial teachers were unanimous that the demands of articulating the essence of a topic through a representative set of problems made this strategy an immensely powerful review technique, leading to significant advances in student understanding. The resulting tests were consistently more difficult than those the teacher would have set, were typically completed with higher levels of student enthusiasm and success, and provided a context particularly conducive to subsequent discussion.

Problem Solving and Investigations

A four-dimensional structure for the assessment of problem-solving behavior emerged in the course of the testing (see, for comparison, Schoenfeld, 1985), and teacher attention was drawn to the need to identify which aspect of problem-solving behavior was of interest. Assessment information was typically collected through informal observations and from student reports. This information could then be located within the categorization scheme below.

Dimension 1 relates to the spontaneous use of mathematical procedures, principles, and facts, that is, the mathematics that our students choose to use, without the explicit cueing of a test question.

Dimension 2 is concerned with problem-solving strategies. There are many lists of such strategies. Practicing teachers seemed quite confident in their ability to distinguish strategies such as "restated the problem," or "organized information systematically," or "found a related but similar problem" (and so on) from the mathematical tool skills which provide the focus of Dimension 1.

Dimension 3 is the structural dimension, particularly concerned with planning, decision making, verifying, and evaluating. One secondary mathematics teacher provided a succinct summation of the focus of Dimension 3 in observing, "Students should show a systematic approach of reviewing what they know, planning their actions, testing their ideas, and evaluating their work."

Dimension 4 is the personal dimension, concerned with student participation, motivation, work habits, the skills associated with cooperative group work, and beliefs about the nature and purpose of mathematical activity.

Teachers reported that students experienced significant difficulties in recording and reporting their problem-solving attempts and required substantial guidance and detailed feedback.

Communicating Assessment Information

Issues related to the grading of student work and the effective reporting of assessment information were explored. The need for clarity of communication and the establishment of an ongoing dialogue between student and teacher concerning the student's growth towards competence was stressed.

Expanding the Assessment Network

Teachers were encouraged to consider other purposes to which assessment information might be put (program evaluation and instructional review, for instance) and other groups or individuals who might contribute assessment information. Parental involvement, peer tutoring, and peer assessment were investigated and various strategies offered to facilitate student self-assessment. These latter strategies included the IMPACT procedure, already reported, and the use of student mathematics journals. It was the evaluation of the use of student journals which subsequently became the focus of the Vaucluse Study reported below.

Communication plays a central role in each of these approaches to assessment, and much of the effort expended during the testing of the assessment strategies related to fostering clear, purposeful, meaningful, informative communication of mathematics and about mathematics.

THE VAUCLUSE COLLEGE STUDY

This study explored the implications of the regular completion of student journals in mathematics. Vaucluse College is a Catholic secondary girls school. There are approximately five hundred girls from Year 7 to Year 12 at Vaucluse. It serves a multicultural population: 20 percent Asian, 30 percent Italian and Greek, with the remaining 50 percent being predominantly Anglo-Saxon. For all students at this secondary school from Year 7 onwards, a central component of mathematical activity is the daily completion at home of a student journal. Through their journal-keeping activities students are introduced to describing what they have learned, summarizing key topics, and identifying appropriate examples and questions. Regular monitoring of the journals informs teaching practice and provides the basis for individual teacher-student discussion.

In 1986, mathematics journals were first introduced experimentally in one class each at Year 7, 9, and 10 levels. Results were encouraging enough to warrant the expansion of their use. By the start of 1989, the keeping of mathematics journals was seen as an essential element in the teaching of mathematics from Years 7 to 10. Appendix E presents the history and rationale for student mathematics journals at Vaucluse College from the perspective of the school and the mathematics staff. The school statement includes the following aims:

By keeping a mathematics journal we intend that students:

1. Formulate, clarify, and relate concepts.
2. Appreciate how mathematics speaks about the world.
3. Think mathematically:
a. Practice the processes (problem solving) that underlie the doing of mathematics.
b. Formulate physical relations mathematically.


As an introduction to journal writing, Year 7 students are supplied with a book in which each page is divided into three sections: What we did, What I learned, Examples and questions. Students are required to write in their journal after every mathematics lesson. This is seen as ongoing homework. Journals contribute 30 percent to the assessment in mathematics. When writing student reports, mathematics teachers were given the following guidelines for the assessment of students' journals:

A. Quantity of work.
1. Frequency: that is, is it done after every lesson?
2. Volume: the amount of work done can be taken as a measure of both ability and enthusiasm.
3. Presentation.

B. How well is it used?
1. Is the work summarized, and do the summaries indicate developing note-taking skills?
2. Is the journal used to collect important examples of procedures and/or applications?
3. Are errors identified and discussed?
4. Are there signs of involvement with the work, original or probing questions, a willingness to explore?
5. Is the student learning to "dialogue," that is, ask her own questions and then set about methodically seeking an answer and presenting her investigations logically?

As a minimum, a satisfactory journal entry should reflect the intellectual involvement of the student in the day's lesson. What form a particular entry will take is determined by the form of the day's lesson and the level of sophistication at which the student can interpret the journal tasks. Appendix E sets out the school's expectations with regard to theory, practice, and activity-oriented lessons.

Journal writing was intended to assist students to see themselves as active agents in the construction of mathematical knowledge (see Stephens, 1982). The school hoped that journal writing would assist students progressively to engage in an internal dialogue through which they reflected on and explored the mathematics they met. In this respect, there is a link to the IMPACT Study through a similar focus on the development of metacognitive learning. It was also hoped that students, through their journal writing, would begin to see mathematical activity not simply in terms of applying prescribed rules and procedures, but more as engaging in activities such as searching for patterns, making and testing conjectures, generalizing, asking "Why?", trying to be systematic, classifying, transforming, searching for methods, deciding on rules, defining, agreeing on equivalences, reasoning, demonstrating, expressing doubt, and proving (cf. Mason, 1984).

If such aims were to be realized through the use of journals, it would be necessary to focus on the linguistic forms by which students communicated what they had learned and how they had gone about it.

Methodology of the Evaluation Study

During 1988 and 1989, an evaluation was conducted of student journal use and its effects on the learning and teaching of mathematics. Consultation with school staff and perusal of a sample of student journals led to the construction of a questionnaire which, after testing, was administered to all students in Years 7 to 12. The questionnaire examined student use of journals and their perceptions of the purpose of journal communication and its contribution to their learning of mathematics. Students' conceptions of the nature of mathematics and of mathematical activity in schools were also addressed. A similar survey was conducted of school mathematics staff, with specific focus on the extent to which they valued and fostered students' journal communications and made use of student journal communications in their classroom teaching and in their work with individual students.

At the time the evaluation began there was a perception in the school mathematics department that a progression existed in student journal writing from a narrative mode to a summary mode to dialogue. Conversation with teachers and the perusal of student journals suggested that student journal writing could be usefully divided into the three categories: Narrative (or Episodic), Summary, and Dialogue. This categorization assumed the status of a hypothesis, and provided much of the structure for the initial data analysis. School sources asserted that a major aim of journal writing was to facilitate student development in question asking and that questioning reflects the dialectic of Narrative, Summary, and Dialogue (Waywood, 1988).


Narrative, Summary, and Dialogue

The categorization of student journal use into Narrative, Summary, and Dialogue warrants more detailed explanation. The examples which follow were offered as both illustrative instances of each response category (Examples 1, 4, and 6), and also as examples of "transition" responses from students whose journal entries suggest that they are in transition between categories. Seen in this light, Example 2 shows a student moving from simple narrative of classroom experiences to the restructuring of content and experience required for effective summary. Examples 3 and 5 show two students' initial experiments with a new form of journal entry. In each of these two cases, the excerpts represent embryonic instances of the Summary and Dialogue categories, respectively.

Narrative:

Example 1. "Today was the day that Mr. Waywood was absent and set us work to do that gave me a lot of thinking to do. I don't think that it was very hard but you had to think about what to write for the answer to the questions."

Example 2. "I think today I began to understand that maths is a way of describing things in reality. A great example is that a ball flying through the air travels the path of a parabola. Because there is an infinite number of ways for the ball to travel there is an infinite number of possible parabolas. Because parabolas can be written mathematically there would be a mathematical function to describe every arc in the world."

Summary:

Example 3. "Logarithms are an index which are used to simplify calculations. The whole number part of a logarithm is called the characteristic. The decimal part of a logarithm is called the mantissa."

Example 4. "Equations ... the main word here is to solve. Equations have an unknown; there is an answer to the problem. Linear techniques revolve around inverse operations, and quadratic equations, different from the above, require different techniques to solve them, such as factorization.... You can't solve all the equations the same way, because they are all different, and that is why we have to learn different techniques."

Dialogue:

Example 5. "The sin of 60 = 0.866025403... firstly is the sin of 60 infinite I wonder. I think it is because you said the points on a circle are infinite. Then how could the square of 0.866025403... be exactly 0.75? If it is just an approximation, then how could it equal exactly 1? Can you please explain?"

Example 6. "Another thing, transposition and substitution, really show you the quality of operations. Like division, is sort of a secondary operation, with multiplication being the real basis behind it. This ties in with my learning about reading division properly (in previous pages), that is, fractions are different forms of multiplication. So I guess that's like rational numbers (Q) are like a front for multiplication, an extension of multiplication. Which came first, multiplication or division? It would have to be multiplication. They are so similar, no that's not what I mean. I mean they are so strongly connected. But its like division does not really exist, multiplication is more real. The same with subtraction. Addition and Multiplication are the only real operations."

The study design provided a diversity of data sources by which the validity of the categorization could be assessed. Student interviews, student and teacher questionnaires, teacher interviews, and the study of journal entries represented a substantial body of data by which both the individual validity of each category could be judged and any patterns of individual development identified.

Observations and Findings

An initial analysis of the student survey data has been completed. Findings suggest that journal writing leads to a progressive refinement of purpose from an initial narrative stage of simply listing events in the mathematics classroom to summarizing work done and topics covered. Within this stage, we note a move away from a simple summary of items of mathematical work covered to a more personal summary of mathematical activity in terms of developing understanding and addressing problems. Finally, some students move beyond this to an internal dialogue, where they begin to pose questions and hypotheses concerning the mathematics in which they are engaged (e.g., "I wonder whether this works for other graphs as well," and "So, why is it that ...").

More specifically, the narrative descriptions of what was done on a particular day, so prevalent in Years 7 and 8, appear to be progressively enriched by the inclusion of reflective writing in which the students discuss how they went about an investigation and how the work in hand related to work they had previously covered. This review process, together with requests for teacher help and indications of things they would like to find out, is similar to the responses solicited through the IMPACT program. Journal entries of some students occasionally took on the aspect of dialogue. Our research suggests that through the process of their journal writing students increasingly interpret mathematics in personal terms, constructing meanings and connections.

Student Survey Findings

While questionnaires were administered to every student, a sample of 150 students, 25 at each year level, was chosen for statistical analysis. Three questionnaires were administered ("Mathematics," "Journals: Part A," and "Journals: Part B," in that order), and the sample selection procedure ensured that all students at a particular year level, who had completed all three questionnaires, had the same chance of appearing in the sample.
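The selection procedure described above amounts to an equal-allocation stratified random sample. The chapter does not say how the draw was actually carried out, so the following Python sketch is only an illustration under stated assumptions: the function name `stratified_sample`, the record layout (student id, year level, number of questionnaires completed), and the fixed seed are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(students, per_level=25, seed=None):
    """Draw an equal-size simple random sample within each year level.

    `students` is an iterable of (student_id, year_level, completed) tuples,
    where `completed` is the number of questionnaires (0 to 3) returned.
    This record layout is an assumption made for illustration only.
    """
    rng = random.Random(seed)

    # Only students who completed all three questionnaires are eligible.
    eligible = defaultdict(list)
    for student_id, year_level, completed in students:
        if completed == 3:
            eligible[year_level].append(student_id)

    # Within a year level every eligible student has the same chance of
    # selection, because rng.sample draws without replacement, uniformly.
    sample = {}
    for level in sorted(eligible):
        pool = eligible[level]
        sample[level] = rng.sample(pool, min(per_level, len(pool)))
    return sample


if __name__ == "__main__":
    # Hypothetical roster: 40 students in each of Years 7 to 12,
    # one in four of whom missed a questionnaire.
    roster = [
        (f"Y{level}-{i:02d}", level, 3 if i % 4 else 2)
        for level in range(7, 13)
        for i in range(40)
    ]
    chosen = stratified_sample(roster, per_level=25, seed=1)
    print({level: len(ids) for level, ids in chosen.items()})
```

With six year levels and 25 students drawn from each, the sketch returns 150 students in total, matching the sample size reported for the study.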

A full statistical report was prepared for the use of the school (Clarke, Stephens, & Waywood, 1989), but the purposes of this report are best served by a summary of significant findings. These are set out below, with related conclusions appropriately clustered. It must be borne in mind that these findings are the results of students' reports of their behavior, their teachers' behavior, their perceptions, and their beliefs.

Frequency of Journal Use.

The majority of students (54 percent) reported that they write in their maths journal "after every lesson."

A similar majority (53 percent) estimated the time spent on journal writing in one week as less than one hour.


Ninety percent of students reported reading their journals either occasionally or often.

Nature of Journal Use. By clustering student responses to particular items it was possible to construct indices associated with the hypothesized taxonomy of writing modes: Narrative, Summary, and Dialogue. Of the sample of 150 students, 65 could be identified as predominantly employing one of the three modes of journal use. This enabled statistical analyses to be carried out for this subset of students incorporating a measure of Mode of Use. (A "Modal Rating" on a seven-point scale was subsequently generated for 123 of the 150 students, and the conclusions which follow held true for both measures.)

Year Level was more decisive in determining the frequency of journal use than was a student's experience with journal use. However, experience with journal use was more significant in accounting for Mode of Use. This justifies the conclusion that it is the experience of using journals that promotes more sophisticated modes of use rather than simply student maturation.

Analysis of variance revealed that Mode of Use made the most significant contribution to accounting for the variation evident in the three key indices: User Index, Difficulty Index, and Positive-Effect Index.

A clear and statistically significant trend emerged in the consideration of Mode of Use in relation to each of the other critical indices. The more sophisticated the mode of journal use, the more likely a student was to:

make more use of journals

find journal completion less difficult

express greater appreciation of journal completion

report positive, rather than negative, outcomes of journal use.

These results may not be surprising, but the consistency in the direction of the trend and in the statistical significance strongly supports the interpretation of Mode of Use as a meaningful structure for the analysis of student journal writing.


Incentives and Obstacles to Journal Use

Sixty percent of students gave as the main reason for writing in their journal, "because it helps me." In another item, the most popular justification for journal use was "to help me learn."

Most students (75 percent) found the act of journal writing "mostly" or "always" easy. However, students were evenly divided over whether or not they found it difficult to put their mathematical thinking into words. In this regard, it is worth noting that half of the student sample reported that the most important thing learned from journal completion was "To be able to explain what I think."

Purpose.

Asked to identify "the most important thing for me to do in my journal," students indicated: "to summarize what we did in class," "to write down what I understand," and "to write down examples of how things are done," in that order. These responses are consistent with the finding that the majority of students appear to be operating in the Summary mode and to perceive journal use in either Summary or Narrative terms.

In response to the item "I think of my mathematics journal as ... ," the most frequent student responses were "as a summary for me to study from later" and "as a record of the things I have learnt in maths."

Teacher Action.

The most common student estimate of the frequency of teachers reading journals was "once a month."

A Teacher Action measure was constructed from a cluster of related questionnaire items. The reported variation in Teacher Action with Teacher Identity was statistically significant, that is, the differences which students saw in the action which particular teachers took in relation to journal use were consistent and significant.


Mathematics and the School Mathematics Program

Since data were collected regarding student conceptions of the nature of mathematics and mathematical activity, the possibility exists for the later collection of parallel data in schools where mathematics journals are not in use. A comparative analysis of student responses may shed some light on the role of mathematics journals in developing particular student conceptions of the nature of mathematical activity.

Students reported that their most common experience of mathematics at school was "listening to the teacher," closely followed by "writing numbers," "listening to other students," and "working with a friend."

Students rated aspects of their mathematics course in order of importance. By far the most important was "the teacher's explanations." Other important aspects, in order, were: "the help my teacher gives me," "working with others," "my maths journal" and "the textbook."

The role of communication in the learning of mathematics and in the performance of mathematical activities was given considerable prominence by a significant majority of students. Pending further analyses, some sample student responses serve to illustrate the variety of student views about the value of mathematical journal writing. Responses include: "I find doing the journal ...

... useful, because it helps me explain to myself what I am doing wrong" (Year 8).

... hard, because sometimes you forget and other times you don't remember what you understood in class" (Year 10).

... a waste of time, because my teacher never collects my journal to help me" (Year 10).

... useful, because it helps me keep up with what's happening in class" (Year 12).


Teachers' Perceptions and Reported Practices

All eight teachers of mathematics responded to a questionnaire about their expectations of mathematics journal writing and the use that they made of journals. As a follow-up to the questionnaire, three teachers were chosen for interviews according to their experience in using journals and the year levels at which they taught.

There was a high degree of consistency among teachers' responses to the questionnaire. All teachers expected students to write in their journals for at least one hour each week, and to read over what they had written at regular intervals. Most teachers aimed to read all journals at least twice a term, with some expecting to do so more frequently, even though this was acknowledged to be a substantial time commitment. They also expected students to show journals to their parents.

For the majority of teachers, the most important thing for students to do in their journals was to write down what they understand. Likewise, a majority agreed that mathematics journals are most effective in showing how students think about mathematics. This was considered far more important than students' ability to summarize what they had learned.

A student in Year 7, for example, commenting on her review of place value and addition, said she was no longer learning how to do a "long sum," but learning "why I'm carrying." In her journal, she further noted:

As many of us have worked on place value before, the object of this work is not to teach us how to do a long sum, but to do it so we understand. I must think about why I'm carrying one ... is it a ten, a one (unit), or something else? ... I must think about why I'm doing things with all sorts of maths, and not just do things automatically. That is how I was taught to do it.

Teachers tended to agree that students found journal writing difficult, and added that most students found it hard to explain what they thought. Journal writing was seen as helping students to write summaries, to be able to explain what they think, and, more importantly, to not be put off by mathematical words and symbols. One teacher commented that journal writing allowed students to investigate ideas independently.


When asked to consider the greatest benefit for students in reading over their journals, teachers very strongly believed that review of the journals was most valuable when students were trying to grasp a new idea. This outcome was rated more highly than using journals to go over material that has been dealt with before.

Teachers were more diverse in articulating the benefits they derived from reading students' journals. These included getting feedback on teaching, identifying difficulties experienced by specific students, and seeing how students learn as well as seeing what students think they have learned. A very common response of the teachers was to view journals as a way for students to communicate to teachers their feelings about mathematics. In general, teachers consistently noted that reading journals had confirmed for them the importance of two-way communication as a part of mathematics learning.

Nearly all teachers saw themselves interacting regularly and often with students through their journals. These interactions most commonly took the form of writing comments in journals, talking to students about what they had written, and helping students to overcome difficulties they had mentioned in their writing, as well as suggesting ways in which students could improve the quality of their mathematical writing. A majority of teachers said that they often raised issues in class based on what they had read in individual journals. Several teachers said that they needed more time to read journals, to make comments, and to provide individual feedback.

When asked to be more specific about ways in which students could improve their journal writing, teachers consistently commented in favor of students writing more about their own thinking and asking more questions in their journals. These two responses had stronger support than "writing better summaries" or "collecting more examples."

When given the opportunity to say how they regarded the mathematics journals, the three universally endorsed responses were: as a way for students to communicate their mathematical thinking; as a record of students' difficulties in mathematics; and as a way for students to think through the mathematics they had done. All teachers agreed that reading students' journals had contributed significantly to what they knew about their students. Some specific responses were:


[The journals provide] a more precise indication of how much they understand.

[Through reading journals] I am able to identify students who have "no idea," or have difficulty expressing themselves.

[The journals] help students to clarify difficulties, and verbalize attitudes toward mathematics.

All teachers agreed that journals had helped them to understand their own teaching. Some specific responses were:

I used to dominate discussion. I now guide discussion and encourage my students to explore....

They often say if I have explained something well or not.

Easier to assess how well (or badly) you have covered a particular idea.

I now write notes on the blackboard in every lesson.

Despite their references to students' finding journal writing a challenging and, at times, demanding task, all teachers affirmed that they saw improvement in students' journal writing during the year and, when viewed across several years, cumulative improvement. In the subsequent interviews, teachers were asked to explain what they looked for to indicate improvement in journal writing. The teachers' response to this and other similar requests was to offer illustrative examples from particular students' journals:

A student, described by her teacher as quite capable, at the start of Year 10, was using her journal to summarize, basically in her own words, what the teacher had written on the blackboard. Later in the first half of that year, she wrote: "I ran into a problem. When I do sums like this I need to ..."

Towards the end of the year, having studied the effects of transformations on linear and quadratic functions, she began to investigate on her own the effects of the same transformations on sine functions: "I know what 'sine' looks like .... I'm surprised to find that the rules are similar to those for a quadratic function .... As I was unsure whether these rules apply to all or some functions, I went on to find evidence to support this claim."


In their questionnaire responses, teachers commented that practice in journal writing had enabled students to express ideas more clearly and to relate ideas; they also noted that students' summaries had become more detailed and accurate.

Another Year 10 student was described by her teacher as "just taking notes at the start of the year." At this stage, her journal was used mainly as a device to summarize work done in class. Later in the year, she began to make comments on her own work, such as, "I'm still getting confused on what numbers to use in the domain and co-domain."

Towards the end of the year, the same student wrote: "Before today, I didn't realize what f(x) meant. Today I learned that f(x) means function of x."

Her teacher annotated this entry, asking her to explain this comment, and suggesting that she should try to analyze her own thoughts further.

Finally, teachers were asked whether their view of mathematics journals had changed over the period they had been using them. Three of the eight felt that there had been no change, commenting that they had always supported the use of journals in mathematics. From other teachers, there was a developing sense of greater appreciation of the value of mathematics journals. Some typical responses were:

Journals are a more powerful tool than I once thought.

My appreciation of their benefits has increased, as has my ability to assist students to use them.

I am a lot more aware of their usefulness.

Teachers brought to the interviews several journals by students, representing a range of ability. From the interviews, it was clear that these teachers used consistent criteria to track improvement in the quality of students' journal writing. Improved journal writing was noted from individual students within a single year level and by a comparison among journals from students in the same class. The criteria used by teachers supported the classification of developmental stages in mathematics journal writing which has been employed in this study.

Progression in Journal Writing

Teacher interviews, together with an examination of students' journals, served to confirm the categories that had been used to classify the major developmental stages in students' mathematical journal writing. It appears that the three categories, Narrative, Summary, and Dialogue, as employed to categorize questionnaire responses, showed a marked consistency with the linguistic forms by which students communicated what they had learned and how they had gone about it.

In the Narrative stage, students' frames of reference for their journals are defined, in the main, by tasks which make up the mathematics lesson and by the chronology of the mathematics classroom. In some instances, the description may be as bald as, "Today we did the pink sheet." Students seem satisfied, at this stage, to describe themselves as "doing fractions," or "in the middle of chapter 3," and to comment on their learning in general terms, such as "It was easy," or "I finished all the work and got most of it right." Examples seem chosen to do no more than illustrate the work done. Many students at Year 7, as they begin to use journals, may be expected to be at this Narrative stage. A teacher of middle secondary classes described many students at this level as still coming to terms with journal writing. They are either still at a Narrative stage or just beginning to move into a Summary stage. To use this teacher's own words:

It is a case of knowing that they have to write something, but many have difficulty knowing what to do. At the beginning of the year, these students are saying what they did in the mathematics class.... They are able to describe what they did, and the types of things they did.

Unlike students' writing in the Narrative stage, writing in the Summary stage shows a deliberate effort to delineate key features of the territory. The mathematics may still be "out there," but students give greater attention to describing key steps in their work. It is no longer sufficient for students to describe in very general terms what they are doing; journal writing provides an opportunity for them to "map" the territory in some detail and to record their progress. However, their descriptions are almost devoid of personal commentary or reflection. At this stage, their frame of reference is restricted to recording, "in very basic terms," the mathematics that has been covered in class.


With further refinement, students begin to include themselves in their summary of the mathematics covered. Not only is there more detail about what has been covered in class, but they are beginning to locate themselves in relation to the mathematics being taught. They begin to identify "problems" in their own learning and to describe how they achieved a solution. It is common, at this stage, for students to illustrate their work by reference to several examples and by comments on them. Yet there is little discussion of why these problems arose and little analysis by the students of their own thinking.

At a more developed stage, students begin to focus on the ideas being presented. This term marks a significant transition in the frame of reference for students' journal writing. A threshold is crossed when students begin to relate the mathematics being taught to what they are learning and begin to demonstrate their ability to connect new ideas with what they already know. One does not simply record or summarize ideas. One has to try to make sense of, or come to terms with, them. They can be illustrated by, but are not identical with, examples. Ideas make up the territory, but no longer is the territory seen as fixed and unchanging. The student is part of the territory and can change the way it looks. Communicating ideas and connecting them to what is already known now become key features of students' writing.

At this stage, students are able to identify and analyze their difficulties, suggesting reasons why they are thinking in a certain way. According to teachers, students begin to question what they are doing and show increasing confidence in using their own words to link ideas. They are able to make suggestions about possible ways to solve problems, even if these approaches may not prove to be successful. They are able to talk more confidently about questions they have "in mind." Through their writing, they show that they are actively teaching themselves mathematics.

Teachers can play a critical role in helping students to assume this degree of control over their learning. Getting students to articulate their own thinking at the point where they are coming to terms with a new idea, or meeting difficulty, is essential to helping many to move into the more reflective mode of writing, characterized as Dialogue. The key is to encourage students to question themselves when they do not understand, rather than rely on the teacher to tell them whether they understand. As one teacher wrote in a student's journal: "Unless you can explain it to me, you don't really understand."

Articulating their own thinking in their own terms is challenging and empowering to students. As they move into this mode of journal writing, many students frequently comment on realizing "just how valuable the journal has become." Helping students to achieve this level of development in their journal writing was a goal which teachers saw as achievable by many students and to which all teachers expressed genuine commitment.

Students and Teachers: A Brief Comparison of Views

A comparison of student and teacher data is informative in at least three ways. Such comparison can reveal (a) the perceptions of the purpose and value of journal writing held by the two groups, (b) the extent to which teacher expectations are realized in student practice, and (c) the way in which student perceptions of teachers' actions and beliefs match the professed beliefs and actions of the teachers. From the emergent commonalities and differences of view, it was evident that, while the classroom implementation of journal use may not universally match the stated policy and goals of the school, both teachers and students saw real value in journal writing. The statements which follow summarize points of contrast and consistency between the two groups' accounts of journal writing:

While most students reported that they were writing in their journals after every mathematics lesson, as required, the amount of time spent in this writing was typically less than teachers' expectations.

Three-quarters of the student sample reported that their teachers read the journals at least as frequently as the "twice a term" which most teachers reported. Student data revealed that the frequency with which teachers read their students' journals was predominantly a characteristic of the individual teacher.

Contrary to teachers' expectations, very few parents ever read their children's mathematics journals.


Teachers recognized the difficulty many children experienced in trying to explain their thinking.

Responses from teachers and students stressed the importance of communication as a part of mathematics learning.

The regular interaction, which teachers saw as arising from journal use, varied substantially with student perceptions of the actions of individual teachers.

Those aspects of journal writing which teachers most frequently saw as needing improvement focused on characteristics of the Dialogue mode. Student responses were more varied. The forms of improvement which received significant support from students were as diverse as their modes of use.

Senior students reported an improvement in their journal writing. Teachers felt that students progressed in their writing in the course of a year. The proposed taxonomy of journal writing (Narrative, Summary, Dialogue) emerged as a robust, powerful, and informative model of this progression.

The use of student mathematics journals at Vaucluse College offered the possibility of communication in all three modes: communicating about mathematics, communicating mathematics, and using mathematics to communicate. In particular, the integrated development of communication skills and mathematical thinking was central to the aims of the Vaucluse program. For some teachers, the ultimate goal of journal writing was to equip students to use mathematical forms and structures to describe their everyday world. However, the nature of journal writing derived from classroom purposes, and this close connection with schoolwork may not have offered students the opportunity to extend their growing confidence in mathematical language by applying it to situations outside schoolwork.

SOME BROADER ISSUES

Communication in mathematics is not a simple and unambiguous activity. The significance of this study is that it points to modes of communication as indicative of stances towards learning mathematics and ultimately of students' perceptions of mathematical knowledge. The categories which we have employed serve a dual purpose: as descriptive of students' perceptions of their learning of mathematics and, in the second instance, as a progression in student mathematical activity.

When students write in the Narrative mode, they see mathematical knowledge as something to be described. In the Summary mode, students are engaged in integrating mathematical knowledge, now conceived of as a collection of discrete items of knowledge to be collected and connected. When writing in the Dialogue mode, students are involved in creating and shaping mathematical knowledge.

IMPLICATIONS AND DIRECTIONS FOR FURTHER RESEARCH

The IMPACT procedure is now in wide use nationally, having been applied in the teaching of students from Year 4 of primary school to third year tertiary, and several local (that is, school-based) evaluations of its effectiveness are being conducted.

The publication of Assessment Alternatives in Mathematics (Clarke, 1989) has received an enthusiastic response, and the implementation of its contents would significantly alter the quality and the diversity of the modes of communication typically employed in mathematics classrooms.

With regard to the use of mathematics journals, a critical consideration for other teachers of mathematics is the nature of the interaction and the communication opportunities which student journals offer. We hope to continue the Vaucluse Study and to report in greater detail on the teacher's role in nurturing the emerging dialogue and responding to signs of increasing student reflection and changes in the quality and sophistication of their communications. Comparison of the Vaucluse data with responses from students and teachers in other schools would shed further light on the possible effects of journal writing on student conceptions of mathematics, mathematical activity, and school mathematics practices and on the significance of communication in the learning of mathematics.

The student journals themselves constitute a unique data source on the way in which students construct mathematical meaning and on the developmental stages in students' ability to make such constructions. Our understanding of communication and the relationship between language and mathematics learning may also be informed by a more detailed study of the nature and process of journal writing.


Measuring Levels of Mathematical Understanding

Mark Wilson

When we think of the learner as an active participant in constructing his or her own conceptualization of mathematics, we are forced to reassess the nature of mathematics tests. Traditional tests were based on an atomistic model of knowledge. Newer tests, based on a model of developmental change in understanding, are needed. This article describes recent advances in developing such an approach.

One way to measure student achievement is to give a test and to record the questions answered correctly or incorrectly. In modern test theory (such as Item Response Theory [Hambleton & Swaminathan, 1985] or Rasch Model analyses [Wright & Stone, 1979]), a student's standing on an achievement variable is estimated from the resulting vector of right and wrong answers. This variable is calibrated and criterion-referenced by the test items that students attempt and so provides a framework for mapping student progress. If the aim of an instructional program is to provide students with an unstructured body of facts, skills, and algorithms, then this methodology can be particularly appropriate. Items can be constructed to indicate the presence or absence of specific pieces of mathematics on any given occasion, and students' performances on those items can be scored either right or wrong. However, not all curricula are based on the premise that learning is a matter of absorbing and reproducing provided information. Another way to build a curriculum is to concentrate on the conditions under which students change the way they conceptualize a subject. Progress occurs when a student discards a less sophisticated model or representation of a phenomenon in favor of a more expert conception. Traditional mathematics achievement tests are not well suited to the identification of the conceptions that students bring to problems. A new testing approach is required to map progress in conceptual understanding. This article describes recent advances in developing such an approach.
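As a rough illustration of how a student's standing can be estimated from a vector of right and wrong answers, the sketch below uses the Rasch model, in which the probability of a correct answer depends only on the difference between the student's ability and the item's difficulty. It assumes that the item difficulties have already been calibrated. The function names, the difficulty values, and the Newton-Raphson settings are assumptions made for illustration and are not taken from the works cited above.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct answer under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def estimate_ability(responses, difficulties, max_iterations=100, tolerance=1e-8):
    """Maximum-likelihood ability estimate from right (1) / wrong (0) answers.

    Item difficulties are assumed to be known (already calibrated).
    """
    if len(responses) != len(difficulties):
        raise ValueError("one response is needed per item")
    raw_score = sum(responses)
    if raw_score == 0 or raw_score == len(responses):
        # All-wrong or all-right vectors have no finite maximum-likelihood estimate.
        raise ValueError("ability cannot be estimated from a zero or perfect score")

    ability = 0.0
    for _ in range(max_iterations):
        probabilities = [rasch_probability(ability, b) for b in difficulties]
        # Gradient of the log-likelihood: observed score minus expected score.
        gradient = raw_score - sum(probabilities)
        # Test information: sum of the item variances p * (1 - p).
        information = sum(p * (1.0 - p) for p in probabilities)
        # Newton-Raphson step, capped at one logit to keep the iteration stable.
        step = max(-1.0, min(1.0, gradient / information))
        ability += step
        if abs(step) < tolerance:
            break
    return ability


# Hypothetical calibrated difficulties (in logits) for five items, easiest first.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(estimate_ability([1, 1, 1, 0, 0], difficulties))  # three of five correct
print(estimate_ability([1, 1, 1, 1, 0], difficulties))  # four of five correct
```

In this sketch only the number of correct answers enters the calculation, so which particular items were answered correctly does not change the estimate.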

UNDERSTANDING AS A CONSTRUCTIVE PROCESS

A view of learners as passive absorbers of facts, skills, and algorithms provided by the teacher is the basis of most current measurement theory and practice. Standard achievement tests measure students' abilities to recall and apply facts and routines presented during instruction. Some items require only the memorization of detail; other items, although supposedly designed to assess higher-level learning outcomes like "synthesis" and "evaluation," often require little more than the ability to recall a formula and to make appropriate substitutions to arrive at a correct answer. Test items of this type are consistent with a view of learning as a passive, receptive process through which new facts and skills are added to a learner's repertoire in much the same way as bricks might progressively be added to a wall. The process is additive and incremental: students with the highest levels of achievement in an area are those who have absorbed and can reproduce the greatest numbers of facts, formulae, and algorithmic productions. The practice of scoring answers to items of this type either "right" or "wrong" is consistent with the view that individual units of knowledge or skill are either present or absent in a learner at the time of testing. Under this approach, diagnosis is a simple matter of identifying unexpected holes or gaps in a student's store of knowledge. This creates a perceived need for remedial teaching that fills a deficit in those subareas of learning in which knowledge is "missing."

For some topics in the school curriculum, this approach to measurement may be appropriate. But recent research on student learning has led to a new view of the student as a constructive participant in building his or her own understanding of subject matter. Learners do not just absorb new information, but rather they construct their own interpretations and relate new information to their existing knowledge and understandings. Thus, experts and novices are seen to differ not merely in amount of their knowledge but also in the types of conceptions and understandings that they bring to a problem and in the strategies and approaches that they use. In cognitive science, comparisons of novices and experts in various fields of learning (Chi, Feltovich, & Glaser, 1981; Larkin, 1983; McCloskey, Caramazza, & Green, 1980) show that expertise typically involves much more than mastery of a body of facts: experts and novices usually have very different ways of viewing phenomena and of representing and approaching problems in a field. Expert-novice studies suggest that the performances of beginning learners often can be understood in terms of the inappropriate or inefficient models that these learners have constructed for themselves. Similar observations have been made in the field of mathematics education (Nesher, 1986; Resnick, 1982, 1984).

Expert-novice research does not in itself offer a panacea for the problems that arise from traditional views of learning, emphasizing as it does the differences between two (relatively) static states rather than the process of change, which should be the focus of assessment; but it does at least point out two end points of the process of learning. The importance of process in mathematics education has been emphasized in a number of surveys (D'Ambrosio, 1979; Freudenthal, 1983; Romberg, 1983), as have the active, constructive processes of conjecture (Schwartz, 1985) and problem solving (NCTM, 1980). A constructivist vision of what constitutes mathematics, the creation of (new) order, lies behind the epistemology of von Glasersfeld (1983) and Davis and Hersh (1981). The "conceptual field" approach of Vergnaud (1983) has also as one of its most important elements a constructivist perspective on how children's conceptions are built from problems they have solved and situations that they have met.

The "phenomenographers" in Sweden and other parts of Europe (Marton, 1981; Dahlgren, 1984; Saljo, 1984) have adopted a similar perspective, using clinical interviews to explore the different understandings that students have of key principles and phenomena in a number of fields of learning.


These interviews have revealed a range of student conceptions of each of the phenomena that these studies have explored and have illustrated the importance of forms of learning which produce "a qualitative change in a person's conception of a phenomenon" from a lower-level, less sophisticated conception to a more expert understanding of that phenomenon (Johansson, Marton, & Svensson, 1985, p. 235). Similar investigations on problem solving in both mathematical and science contexts have been carried out by Laurillard (1984). This interviewing technique has resulted in a conception of learning in which a student is considered to almost always have some understanding and some strategy when addressing a new problem. All learners are considered to be engaged in an active search for meaning, constructing and using representations or models of subject matter. Rather than being "wrong," beginning learners have naive representations and frequently display partial understanding which they apply rationally and consistently. In arithmetic, for example, "it has been demonstrated repeatedly that novices who make mistakes do not make them at random, but rather operate in terms of meaning systems that they hold at a given time" (Nesher, 1986, p. 1117).

For the assessment and monitoring of student learning, an implication of this view of learning is that we must start measuring the understandings and models that individual students construct for themselves during the learning process. In many areas of learning, and in mathematics in particular, levels of achievement might be better defined and measured not in terms of the number of facts and procedures that a student can reproduce (i.e., test score as counts of correct items) but in terms of best estimates of his or her levels of understanding of key concepts and principles underlying a learning area.

CONSEQUENCES FOR MATHEMATICS ACHIEVEMENT TESTING

Traditional achievement tests begin with a statement of the instructional objectives to be assessed, which should be stated as directly observable student behaviors that can be reliably recorded as either present or absent (Bloom, Hastings, & Madaus, 1971). This advice tends to result in items that are discrete in their relationship to the objectives and involve relatively unambiguous performances. The epitome of this is the multiple-choice item, which, due also to its ease of use with machine-scored answer sheets, has become the automatic choice for test developers. Hence, the advantages of traditional achievement testing include (a) its provision for a close link between curriculum objectives that can be expressed in behavioral terms and the resulting measures of student achievement and (b) the specification of standard testing conditions and scoring rules, which reduce subjectivity in assessment and provide results that are comparable over time and across students.

However, a disadvantage of traditional achievement tests is that, because of the emphasis these tests place on precisely defined student behaviors, they can encourage students to focus their efforts on relatively superficial forms of learning. As Bloom himself wrote, such tests "might lead to fragmentation and atomization of educational purposes such that the parts and pieces finally placed into the classification might be very different from the more complete objective with which one started" (Bloom, 1956, pp. 5-6). Alternatively, one might base achievement testing not on the detailed specification of many observable student behaviors, each of which can be recorded as either present or absent, but on a consideration of the key concepts, principles, and phenomena that underlie a course of instruction and around which factual learning can be organized. This alternative approach recognizes that learners have a variety of understandings of phenomena and that some of these understandings are less complete than others.

The challenge, then, is to find out enough about student understanding in mathematics to design performances that will reflect these different understandings and to then design assessment techniques that can accurately reflect these different understandings. This is a much more theory-intensive test generation model than that used for traditional tests. Even in domains where much research has been done, it may be the case that important subgroups of students give responses that do not match our expectations well. Hence, the test development and implementation model that we need must allow for greater flexibility in item scoring and in interpretation of the test results.

The primary focus of a mathematics testing methodology based on an active, constructive view of learning is on revealing how individual students view and think about key concepts in a subject. Rather than comparing students' responses with a "correct" answer to a question so that each response can be scored right or wrong, the emphasis is on understanding the variety of responses that students make to a question and inferring from those responses students' levels of conceptual understanding.

One area of learning in which work has been done to understand how students think about and approach phenomena is the area of so-called "open sentences." Take as an example the work of Sandberg and Barnard (1986). Elementary school students were asked to solve open sentences like 6 - □ = 2 and were then asked to explain their solutions. Sandberg and Barnard analyzed the protocols from these explanations to classify their solution strategies into one of the six types given in Table 11-1.

Table 11-1
Strategies for Solving Open Sentences

Strategy   Answer

1   Add all. When the form does not conform to the canonical structure, add the two given numbers.

2   Interpret the operation sign as a direct instruction to perform the stated operation on the two givens.

3   Read and solve the problem from right to left when the equalizing sign is placed on the left.

4   Read and solve the problem from right to left when the problem first states the unknown.

5   Bridge the gap between the two given numbers. When the structure is not canonical, then the difference between the largest and smallest number is determined.

6   Expert

The observations made in studies such as this one suggest that students do not simply make random "errors" but operate in terms of naive theories about mathematical phenomena. In the area of open sentences, Sandberg and Barnard (1986) found that "their answer pattern could be interpreted in terms of very systematic behavior.... Each child was found to use one overall strategy" (p. 5). Similarly, through their interviews with Swedish students about aspects of science learning, Johansson, Marton, and Svensson (1985) arrive at a similar conclusion: "In our case, a discovery of decisive importance was that for each phenomenon, principle, or aspect of reality, the understanding of which we studied, there seemed to exist a limited number of qualitatively different conceptions of that phenomenon, principle, or aspect of reality" (pp. 235-36). Researchers have observed that the same naive conceptions can be found among students from different countries and with different educational backgrounds. Studies in four countries, for example, have shown that there is a systematic and understandable set of rules used by students who do not compare decimals in the standard way (Leonard & Sackur-Grisvald, 1981; Nesher & Peled, 1984; Swan, 1983).

Research findings such as these invite a reconsideration of the way in which we think about and attempt to measure student learning. Many students are succeeding on precise, operationally defined objectives without developing an understanding of the material that they are learning. Partial if not direct blame for this, at least for the ease with which this has become the norm, must surely be directed to the standards and practices that we have allowed to flourish in the testing community. For many mathematics educators, the answer is to place greater emphasis not on the learning of mathematical formulas and algorithms but on changing students' ways of thinking about mathematics. As one of the phenomenographers put it: "In our view, learning (or the kind of learning we are primarily interested in) is a qualitative change in a person's conception of a certain phenomenon or of a certain aspect of reality" (Johansson, Marton, & Svensson, 1985, p. 235). The assessment of such qualitative changes must equally become the goal of those who construct mathematics achievement tests.

LEVELS OF MATHEMATICAL UNDERSTANDING

A methodology for mapping student progress in conceptual understanding would first identify a variety of important concepts in an area of mathematics learning and then develop questions or tasks that can be used to explore the different understandings that students have of those concepts. A set of ordered categories would be defined corresponding to different levels of conceptual understanding within each task. The conception of ordered levels is basic to a view of learning as a "shift" or a "change" in a student's understanding. Such a change constitutes learning only if it involves a change from a lower-level, less sophisticated understanding to a higher-level, more sophisticated conception. Of course, there may be interesting conceptual changes that are not fundamentally ordered, and these need explication also, but such changes take on educational significance only in relation to students' progress toward more expert states (i.e., progression through the levels).

The set of (un)ordered categories for a question is constructed by first exploring the variety of responses that students give when they are confronted with that question and asked to explain their thinking about it. To start with, the data from which ordered categories are constructed for a question are usually collected through student interviews. Qualitative analysis of the interview protocols results in ordered categories that provide a framework for recording future responses to that question and introduce the possibility of basing measures of achievement on students' levels of understanding. This is essentially the method used by Marton (1981) and his phenomenography group at the University of Gothenburg. There, researchers interview students to explore their understandings of particular concepts and principles, transcribe tape recordings of these interviews, and then carry out detailed analyses of transcripts. "The aim of the analysis is to yield descriptive categories representing qualitatively distinct conceptions of a phenomenon" (Dahlgren, 1984, p. 24). These categories form an "outcome space" that provides "a kind of analytic map" (p. 26) of students' understandings of each phenomenon. Learning is thought of as "a shift from one conception to another" (p. 31) on this map.

Returning to the example of open sentences depicted in Table 11-1, the strategy categories can be quite straightforwardly interpreted as ordered levels: level 0 is "no strategy"; level 1 is the use of strategies that are only sometimes successful, that is, strategies 1 through 5; and level 2 is the use of the expert strategy 6. In this interpretation, the structure of the levels would be identical for each item. Sandberg and Barnard point out that, in fact, the success of solution strategies 1 through 5 is dependent on which types of open sentence problem are being solved. For instance, strategy 1 will correctly solve an item like □ - 2 = 7, but strategy 5 will not. Under these circumstances it may be preferable to use a more complicated set of levels: let level 0 be "no strategy" as before; the incorrect strategies can be mapped onto level 1; the strategies that are correct in this case but not generally (i.e., not strategy 6) can be mapped onto level 2; and the expert strategy can be mapped onto level 3. An example of how this would work for three types of open sentence items is given in Table 11-2. This time the interpretation will be complicated by the fact that the strategies do not have consistent efficacies across problem types.

Table 11-2. Partial Credit Levels for Three Types of Open Sentence Items

                 Item Type*
Strategy         a    b    c
"No strategy"    0    0    0
1                2    2    1
2                1    2    2
3                2    1    2
4                1    1    2
5                1    1    2
6                3    3    3

*Exemplars of the item types are
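The mapping in Table 11-2 can be read as a lookup from a student's strategy and an item's type to a partial credit level. The following is a minimal sketch of that lookup; the column labels "a", "b", "c", the use of 0 for "no strategy", and the function names are assumptions made for illustration.

```python
from typing import Dict, Iterable

# Partial credit levels from Table 11-2, keyed by strategy and item type.
# Strategy 0 stands for "no strategy"; the labels "a", "b", "c" are assumed
# names for the three item-type columns.
LEVELS: Dict[int, Dict[str, int]] = {
    0: {"a": 0, "b": 0, "c": 0},
    1: {"a": 2, "b": 2, "c": 1},
    2: {"a": 1, "b": 2, "c": 2},
    3: {"a": 2, "b": 1, "c": 2},
    4: {"a": 1, "b": 1, "c": 2},
    5: {"a": 1, "b": 1, "c": 2},
    6: {"a": 3, "b": 3, "c": 3},
}


def partial_credit(strategy: int, item_type: str) -> int:
    """Level assigned to a strategy when it is used on a given item type."""
    return LEVELS[strategy][item_type]


def total_score(strategy: int, item_types: Iterable[str]) -> int:
    """Sum of levels for a student who uses one strategy on every item."""
    return sum(partial_credit(strategy, t) for t in item_types)
```

Under this reading, a student who consistently uses strategy 3 earns different levels on different item types, which is the complication in interpretation noted in the text.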

These interviews with students are essential for identifying the variety of understandings that learners have of phenomena and for constructing ordered categories for individual questions. But in many practical settings, interviews are not practicable as a basis for achievement testing. Alternative observation formats must be used for the purpose of assigning students to the categories that have been defined for test questions. This requires new kinds of imaginative tests that are capable of providing information about the conceptions that students bring to questions and that are also sensitive to the performance changes that can result from conceptual change.

One approach to exploring students' levels of understanding is through computer-administered tasks. When students enter their responses to questions into a computer, these can be matched to libraries of common responses that are keyed to strategy use. In this way, particular kinds of errors and misunderstandings can be identified and inferences made about students' levels of understanding. Clearly, in the open sentences example, it would be possible to assemble sets of open sentences that together would allow a decision to be made concerning which strategy was being used. Additionally, if a decision was not clear within a reasonable number of problems, this could indicate a student whose strategy use was either inconsistent or of a nature different from the Sandberg-Barnard theory. Sufficient evidence of this type would lead to modifications in the theory itself.
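The idea of matching responses to a library keyed to strategy use can be sketched as follows. This is an illustration only: the three items, the library contents, and the function name are hypothetical, and only strategies 1 (add the two givens), 5 (difference between the largest and smallest number), and 6 (expert) are keyed, since their predicted answers follow directly from the descriptions in Table 11-1.

```python
from typing import Dict, Set

# Hypothetical response library: for each open sentence, the answers that
# each keyed strategy would produce. "_" marks the unknown.
LIBRARY: Dict[str, Dict[int, Set[int]]] = {
    "_ - 2 = 7": {9: {1, 6}, 5: {5}},
    "_ + 2 = 7": {9: {1}, 5: {5, 6}},
    "6 - _ = 2": {8: {1}, 4: {5, 6}},
}


def consistent_strategies(responses: Dict[str, int]) -> Set[int]:
    """Strategies that predict every one of the student's answers.

    An empty result suggests inconsistent strategy use or a strategy
    that is not in the library.
    """
    candidates = None
    for item, answer in responses.items():
        matched = LIBRARY[item].get(answer, set())
        candidates = matched if candidates is None else candidates & matched
    return candidates or set()
```

A single item such as □ - 2 = 7 cannot separate strategy 1 from the expert strategy, because both yield 9; the decision becomes clear only once the other items are added, which is why sets of items are needed.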

A MODEL FOR MEASURING LEVELS OF UNDERSTANDING

The methods that have been developed for the analysis of right and wrong answers to test questions must be extended to support the construction of achievement measures from observations recorded in sets of ordered outcome categories. One such method, the Partial Credit Model (PCM), is described by Masters (1982) and Wright and Masters (1982); another, the Graded Response Model, has been described by Samejima (1969). Although these two models have certain important differences in terms of philosophical foundations and psychometric parameterization, they yield quite similar results in practical applications. The PCM proposes that the probability of a person scoring in ordered level x rather than level x - 1 on a particular item i will increase steadily with ability in an area of learning such that the conditional probability of being in the higher category is:

$$\frac{\pi_{xi}}{\pi_{x-1,i} + \pi_{xi}} = \frac{\exp(\beta - \delta_{xi})}{1 + \exp(\beta - \delta_{xi})}$$

where $\pi_{xi}$ is the probability of a person responding in category x (x = 1, 2, ..., $m_i$) of item i, $\beta$ is a person's level of ability in this area of learning (to be measured by this set of items), and $\delta_{xi}$ is a parameter that governs the probability of a response being made in category x rather than in category x - 1 of item i. By applying this simple logistic expression to the transition between each pair of adjacent outcome categories for each item, we form a connection between the ordered categories for that item and the underlying variable that the set of items is used to measure. This model can provide measures of achievement based on inferences of students' levels of understanding of each of a number of concepts or phenomena in an area of learning.
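Applying the adjacent-category expression to every step of an item determines the probability of each category. The sketch below computes those probabilities for one item; the function name and the example step difficulties are assumptions for illustration, not values from the study.

```python
import math
from typing import List


def pcm_probabilities(beta: float, deltas: List[float]) -> List[float]:
    """Return P(X = 0), ..., P(X = m) for one item under the PCM.

    beta   -- person ability in logits
    deltas -- step difficulties for steps 1..m of the item
    """
    # The weight of category x is exp(sum over k <= x of (beta - delta_k));
    # the empty sum for category 0 is 0.
    exponents = [0.0]
    for delta in deltas:
        exponents.append(exponents[-1] + (beta - delta))
    # Subtract the largest exponent before exponentiating to avoid overflow.
    peak = max(exponents)
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative (hypothetical) step difficulties for a four-category item.
example = pcm_probabilities(0.0, [-1.0, 0.0, 1.5])
```

For any pair of adjacent categories in the result, the ratio of the higher category's probability to the sum of the pair reproduces the logistic expression given above.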

The PCM provides a framework for assessing the validity of attempting to summarize performances on the basis of different aspects of achievement in a single global measure. The PCM is used to construct a "map" to show how students' understandings of a phenomenon change with developing competence. In addition, the PCM provides a framework for identifying aspects of achievement in which a student is experiencing difficulty or making unexpectedly slow progress. The PCM takes as its basic observation the number of steps that a person has made beyond the lowest performance level. Consequently, the parameter to be estimated is the step difficulty ($\delta$) within each item. These step difficulties are substituted into the above model equation for the PCM to give a set of model probabilities for any given value of person ability. Figure 11-1 shows a plot of these model probabilities in a diagram called an "item response map."

Figure 11-1. Item response map (Item 1). (Kulm, 1990, p. 190). Reprinted with permission. [Horizontal axis: Ability, -4.0 to 4.0 logits.]

Responses to this item have been scored in four ordered categories labelled 0 to 3. In this picture, ability increases to the right on the page from -4.0 to +4.0 logits. The logit scale is a log odds scale. Thus, for a dichotomous item, the odds of success are calculated by taking the antilog (to base e) of the logit difference, and the probability of success is found by solving the equation L = log(P/(1 - P)), where P is the probability, L is the logit, and log is the natural logarithm. For a person who is 1.0 logits above an item, the odds of that person succeeding on the item are exp(1) = 2.72, and the probability of success is exp(1)/(1 + exp(1)) = 0.73. This calculation is useful for gaining a "feel" for the interpretation of distance in the logit metric, but it must be emphasized that the interpretation for polytomous items is somewhat more complex. The best strategy is to read the probability directly from the figure, as is done in the next paragraph.
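The arithmetic in this paragraph can be checked with a few lines of code. This is a minimal sketch for the dichotomous case only; the function names are chosen here for illustration.

```python
import math


def odds_from_logit(logit: float) -> float:
    """Odds of success for a given logit difference (person minus item)."""
    return math.exp(logit)


def probability_from_logit(logit: float) -> float:
    """Solve L = log(P / (1 - P)) for P."""
    return 1.0 / (1.0 + math.exp(-logit))


def logit_from_probability(p: float) -> float:
    """Log odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


# A person 1.0 logits above a dichotomous item.
odds = round(odds_from_logit(1.0), 2)                # 2.72
probability = round(probability_from_logit(1.0), 2)  # 0.73
```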

From Figure 11-1 it can be seen that a person with an estimated ability of 0.0 logits (middle of the picture) has an estimated model probability of about 0.05 of scoring 0 on this item; 0.18 of scoring 1; 0.42 of scoring 2; and 0.35 of scoring 3. The relative values of these model probabilities change with increasing ability so that, over the portion of the ability variable shown here, low scores of 0 and 1 become decreasingly likely, and a score of 2 on this item becomes increasingly likely up to an ability level of about 0 logits. As ability increases above this level, a score of 2 becomes less likely as the highest possible score of 3 on this item becomes an increasingly probable result.

The item response map in Figure 11-1 can be used to illustrate several important features of the PCM. Consider the horizontal line through the middle of the picture at probability P = 0.5. The intersection points of this straight line with the curves, labelled here $\tau_1$, $\tau_2$, and $\tau_3$, are known in the psychometric literature as "thresholds." In dichotomously scored items there is only one threshold (or difficulty) for each item, defined as the position on the continuum at which the single ogive for that item intersects P = 0.5. One practical problem that arises in examining item response maps is that it is difficult to arrange more than two of them side-by-side in a reasonably sized figure. This is often required as the items are most often interpreted in relation to one another. The thresholds provide a way to summarize information about several partial credit items: simply place the Thurstone thresholds next to one another on a "summary response map," an example of which is shown in Figure 11-2. A certain amount of detail is lost (in fact information is provided only about the points at which successive cumulative probabilities reach .5), but this is always the case with a summary and should not be a problem if the item response maps are provided as well. The Thurstone thresholds can be interpreted as the crest of a wave of predominance of successive dichotomous segments of the set of levels. For example, $\tau_1$ is the estimated point at which levels 1, 2, and 3 become more likely than level 0, $\tau_2$ is the estimated point at which levels 2 and 3 become more likely than levels 0 and 1, and $\tau_3$ is the estimated point at which level 3 becomes more likely than levels 0, 1, and 2.
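Because each threshold is the ability at which a cumulative probability reaches .5, it can be located numerically once the step difficulties are known. The sketch below uses bisection on PCM category probabilities; the function names, the search interval, and the example step difficulties are assumptions for illustration and are not the procedure used by any particular program.

```python
import math
from typing import List


def pcm_probabilities(beta: float, deltas: List[float]) -> List[float]:
    """Category probabilities P(X = 0..m) for one item under the PCM."""
    exponents = [0.0]
    for delta in deltas:
        exponents.append(exponents[-1] + (beta - delta))
    peak = max(exponents)
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def thurstone_thresholds(deltas: List[float], lo: float = -20.0,
                         hi: float = 20.0, tol: float = 1e-9) -> List[float]:
    """Ability values at which P(X >= k) = 0.5, for k = 1..m.

    Relies on P(X >= k) increasing with ability, so bisection applies.
    """
    thresholds = []
    for k in range(1, len(deltas) + 1):
        a, b = lo, hi
        while b - a > tol:
            mid = (a + b) / 2.0
            if sum(pcm_probabilities(mid, deltas)[k:]) < 0.5:
                a = mid   # threshold lies at a higher ability
            else:
                b = mid   # threshold lies at or below mid
        thresholds.append((a + b) / 2.0)
    return thresholds


# Hypothetical step difficulties for a four-category item.
taus = thurstone_thresholds([-1.0, 0.0, 1.5])
```

For a dichotomous item the single threshold coincides with the item difficulty, which gives a simple check on the routine.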

Figure 11-2. A summary item response map. (Kulm, 1990, p. 192). Reprinted with permission. [Horizontal axis: Ability, -4.0 to 4.0 logits.]

The PCM makes no assumptions about the unconditional distributions of the persons along the latent trait but does assume that the model adequately fits the data. Tests of item fit identify individual items which function differently from other items and may lead to the conclusion that it is inappropriate to attempt to summarize all aspects of competence in a single measure. Persons who may be functioning differently from the majority are also indicated by fit indicators. Model fit can be assessed using a measure of fit called the "Item Fit t" for items and the "Person Fit t" for persons (Wright & Masters, 1982), which is a transformed mean square statistic. The distribution of these statistics is not precisely standard normal, so they will be used to focus attention on the more serious problems rather than to make a strict decision about whether persons or items fit or not. For items, it is possible to find the empirical item response map, which allows visual inspection of items that have been selected on the basis of the Item Fit t as questionable. Another way to assess fit is to divide the sample of persons into groups with interesting and interpretable differences, reestimate the parameters in each case, and examine the differences. Only if the model fits in the different groups can meaningful comparisons be made. These comparisons can be organized by using an indicator called the "standardized difference" between the estimates (Wright & Masters, 1982, p. 115).
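A comparison of two group estimates might be sketched as below. The form used here, the difference between two estimates divided by the square root of the sum of their squared standard errors, is an assumption about how the standardized difference is defined; the cited source should be consulted for the exact expression. The function name and numbers are illustrative.

```python
import math


def standardized_difference(est_1: float, se_1: float,
                            est_2: float, se_2: float) -> float:
    """Difference between two estimates in standard error units.

    Assumed form: (est_1 - est_2) / sqrt(se_1**2 + se_2**2).
    """
    return (est_1 - est_2) / math.sqrt(se_1 ** 2 + se_2 ** 2)


# Hypothetical step estimates (logits) for the same item in two groups.
z = standardized_difference(1.0, 0.3, 0.5, 0.4)
```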

EXAMPLE: USE OF THE SOLO TAXONOMY

The example discussed below is based on the Structure of the Observed Learning Outcome (SOLO) Taxonomy (Biggs & Collis, 1982), which is a method of classifying learner responses according to the structure of the response elements. The taxonomy consists of five levels of response structure:

1. a prestructural response is one that consists only of irrelevant information

2. a unistructural response is one that includes only one relevant piece of information from the stimulus

3. a multistructural response is one that includes several relevant pieces of information from the stimulus

4. a relational response is one that integrates all relevant pieces of information from the stimulus

5. an extended abstract response is one that not only includes all relevant pieces of information, but extends the response to integrate relevant pieces of information not in the stimulus.

It is expected that in a given topic area learners will move through each level from the prestructural to the extended abstract as their comprehension and maturity improve. Furthermore, the majority of responses should be classifiable into one of the levels in the SOLO taxonomy indicating the learner's location on a latent dimension: "The structure of the SOLO taxonomy assumes a latent hierarchical and cumulative cognitive dimension" (Collis, 1982, p. 7).

In the particular items under study (Romberg, Collis, Donovan, Buchanan, & Romberg, 1982; Romberg, Jurdak, Collis, & Buchanan, 1982) a short piece of stimulus material, which might consist of text, tables, or figures, is supplied, and students are asked to answer open-ended questions concerning the material. Together, the stimulus material and the questions are referred to as a "superitem" (Cureton, 1965), and an example of one is given in Figure 11-3. Each question is linked to one of the four higher levels of the taxonomy. The responses are judged as acceptable or otherwise according to an agreed set of criteria, and the sum of the question scores in a superitem is used as the indicator of SOLO level. In discussing the results, individual items within a superitem will be referred to as "questions" to help keep clear the distinction between levels. The following example uses data from a study of a new statistics curriculum for high schools (Webb, Day, & Romberg, 1988). In all, 1,238 responses without any missing data on the seven problem-solving items are available for the analysis. Because of the age of the students, only the first four levels (i.e., excluding extended abstract) are assessed.
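The scoring rule described here, summing the acceptable answers within a superitem, can be sketched as follows. The function names and the assumption of exactly three questions per superitem (unistructural, multistructural, relational) are illustrative.

```python
from typing import Sequence

# The four levels assessed in this study, indexed by superitem score 0..3.
SOLO_LEVELS = ["prestructural", "unistructural", "multistructural", "relational"]


def superitem_score(acceptable: Sequence[bool]) -> int:
    """Number of questions in a superitem judged acceptable."""
    if len(acceptable) != len(SOLO_LEVELS) - 1:
        raise ValueError("expected one judgment per question (three questions)")
    return sum(1 for ok in acceptable if ok)


def solo_level(acceptable: Sequence[bool]) -> str:
    """SOLO level indicated by the superitem score."""
    return SOLO_LEVELS[superitem_score(acceptable)]
```

Note that the sum does not record which questions were answered acceptably: a student who succeeds only on the second question receives the same score as one who succeeds only on the first.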

In the case of SOLO superitems, the thresholds in Figures 11-1 and 11-2 can be interpreted in the following way. The first threshold, $\tau_1$, is where it becomes more probable that a response will be unistructural or above; $\tau_2$ is where it becomes more probable that a response will be multistructural or above; and $\tau_3$ is where it becomes more probable that a response will be relational rather than multistructural or below. In Figure 11-2, the unistructural threshold is marked by a "+", the multistructural threshold is marked by an "X", and the relational threshold is marked by an "*".


1. The lines on the graph are city streets. One-way streets for vehicles are indicated by arrows.

A. How many blocks would Alice (A) have to walk to visit her friend, Gayle, who lives at G, if she walks by the shortest way?

Answer

B. Alice (A) and Bill (B) have a friend Clara who lives at C. The three of them are walking from their homes to meet at a restaurant (R). Who has the furthest to walk?

Answer

C. If Bill (B) moves 2 blocks east and 5 blocks south, Gayle (G) moves 4 blocks south and 2 blocks west, and Alice (A) moves 6 blocks east and 2 blocks south, which person now has the farthest to go to the restaurant by car if the car takes the shortest possible route from each home?

Answer

Figure 11-3. Item 1. (Kulm, 1990, p. 188). Reprinted with permission.

Results

The distribution of the students along the latent variable defined by the seven problem-solving items is shown in Figure 11-4, where ability has been estimated using the PCM (and is expressed in logit units). The great bulk of the students (90 percent) were estimated to be between -.63 logits and 2.38 logits (scores 9 to 18). Thus, in interpreting the item response maps, attention will be focused on this portion of the ability scale. Note also the nonlinear relationship between the logit scale and the scores; this is indicated by the selection of score locations given on the left-hand side of the figure. This will also need to be borne in mind when interpreting the item response maps. The analysis was performed using the PC-CREDIT program (Masters & Wilson, 1988).

Figure 11-4. Distribution of students along the problem-solving variable. (Kulm, 1990, p. 188). Reprinted with permission. [Axis: Ability, -4.0 to 4.0 logits, with selected score locations.]

Figures 11-2 and 11-4 give an overall picture of the progress toward a relational level of understanding that has been achieved by these students. All students are beyond the unistructural threshold for items 1 to 5, so, for these items, all are more likely to give a higher response than prestructural. Most students are displaying a level of understanding where prestructural and unistructural are less likely than multistructural and relational, but few give evidence of a mainly relational response. This summary response map leads to an interpretation of "typical performance" for students at a given location. For example, a student at 2.38 logits might be expected to respond thus to the seven items: (3, 3, 2, 3, 3, 2 or 3*, 3) (note that the estimated expected responses are real numbers, not integers; these "expected responses" have been rounded to the nearest whole number and are hence somewhat inaccurate). Overall, we would want to typify this student response as indicating a grasp of the relational level that is nevertheless challenged by the items with more difficult relational steps. Real student responses that correspond to a logit of 2.38 vary considerably from this expected pattern. For example, two student responses that actually were recorded were, for student A, (3, 3, 2, 3, 2, 2, 3) and, for student B, (1, 3, 3, 3, 3, 2, 3). Student A differs from the expected pattern in only one place, student B in two places. Is either of them seriously divergent from the expected pattern? The person misfit indicator gives us some hint of this. It turns out that while student A's pattern is quite innocuous as far as fit is concerned, the fit value for student B is the highest recorded for any student. Thus, absent further information, one would be justified in making the above interpretation of the response vector for student A, but one would need further information before one could make any such interpretation for student B.
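The comparison of the two recorded response vectors with the expected pattern can be made explicit. The sketch below uses the vectors quoted in the text and simply lists the items on which a student departs from the expected pattern; the names are illustrative, and this count is not the person fit statistic, which also weighs how improbable each departure is.

```python
from typing import List, Sequence, Set

# Expected responses at 2.38 logits; item 6 allows either 2 or 3.
EXPECTED: List[Set[int]] = [{3}, {3}, {2}, {3}, {3}, {2, 3}, {3}]
STUDENT_A = [3, 3, 2, 3, 2, 2, 3]
STUDENT_B = [1, 3, 3, 3, 3, 2, 3]


def places_differing(observed: Sequence[int],
                     expected: Sequence[Set[int]] = EXPECTED) -> List[int]:
    """Item numbers (1-based) where the observed score is not expected."""
    return [i + 1
            for i, (obs, exp) in enumerate(zip(observed, expected))
            if obs not in exp]


differences_a = places_differing(STUDENT_A)  # [5]
differences_b = places_differing(STUDENT_B)  # [1, 3]
```

Student B's departures include a score of 1 on item 1, the item with the easiest relational step, which is consistent with the large misfit value reported for that student.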

Item 1 was shown in Figure 11-3, and the estimated item response map for this item was shown in Figure 11-1. The conditional probabilities of response indicate that most students are performing above the prestructural level, ranging from approximately 15 percent of students with an item score of 0 (prestructural) at 9 points total (-.63 logits), to approximately 85 percent of students with an item score of 3 (relational) at 18 points total (2.38 logits). Thus, the predicted responses to this item range over the full SOLO spectrum within the range of ability of the majority of students in the sample. Moreover, the progress within the SOLO levels is quite regular from prestructural to relational for this item.

*In this case, the expected value was approximately 2.5; hence 2 and 3 have about the same probability.

The relationship of the steps of item 1 to the steps of the other items is also displayed in Figure 11-2. Item 2, which concerns train timetable reading (Figure 11-5), has a pattern of thresholds similar to item 1 for the first two steps, but clearly has a more difficult transition to the relational level. The effect of this on the item response map for item 2 can be seen in the upper panel of Figure 11-6, where the band for score 2 (multistructural) is about twice as wide as that for item 1. Moreover, the relational threshold for item 1 is the easiest of all the items. It is interesting to consider the differences between the two "relational" tasks to try to understand this discrepancy. For item 1, the multistructural question requires the student to compare the distances from three separate places on a street grid to a fourth location; the relational question adds the complication that the three original places are moved. For item 2, the multistructural question requires the student to find the latest train that can reach a destination by a certain time; the relational question adds the complication that there is a certain time required at each end of the train journey for walking to and from the station. The item 2 question clearly demands that the student go beyond the immediate information provided by the timetable and use the timetable information in the context of a more complicated problem. The item 1 question uses different information from that provided by the original street grid, but the new information is of the same kind as the original: the student is asked to construct a revised grid. This is certainly more difficult than the multistructural question, but it does not clearly involve the understanding of a relationship among the pieces of information in the stimulus. What might a "taxi-cab geometry" item that was relational look like? Perhaps if the students were asked to use some standard geometrical concepts in the taxi-cab geometry world, such as, "What does a circle look like in taxi-cab geometry?", we might see more consistency between item 1 (Figure 11-3) and the rest.

A train leaves Alma and arrives in Balma at these times in the summer:

Leave Alma    Arrive Balma    Leave Alma    Arrive Balma
6:05 a.m.     6:50 a.m.       11:35         12:20 p.m.
6:55          7:40            2:08 p.m.     2:53
7:23          8:12            3:35          4:20
7:42          8:17            4:50          5:30
8:03          8:43            5:12          5:47
9:20          10:05           5:34          6:14
10:35         11:20           7:35          8:20

A. What is the latest train from Alma you can get if you want to reach Balma by 4:30 p.m.?

Answer

B. If you are busy working all morning and cannot travel before 10:00 a.m., what is the latest train you can get so as to reach Balma by 3:00 p.m.?

Answer

C. A person lives 30 minutes from Alma and has an appointment in Balma at 1:30 p.m. The appointment is 20 minutes from the Balma station. What is the latest time this person could leave home for this appointment?

Answer

Figure 11-5. Item 2. (Romberg, Collis, Donovan, Buchanan, & Romberg, 1982). Reprinted with permission.

Item 3 (Figure 11-7) displays a divergent pattern also, but this time the relational question is more difficult than that of the remainder of items. The lower panel of Figure 11-6 shows a very wide band for score 2 (multistructural), which makes a response on the relational level quite unlikely for this item. This item concerns the approximation of lengths of line segments to the nearest inch and half inch, using a ruler. The unistructural question requires the student to estimate the length of a line segment to the nearest inch; the multistructural question asks the same question, but specifies half inches; and the relational question makes this harder by misaligning the line interval with the end of the ruler and failing to specify the standard (i.e., inch or half inch). Given this description, the distinction between the uni- and multistructural questions does not appear to fit so well into the SOLO framework. The relational question is obviously going to be harder for students, but this time it seems that the inconsistencies between this item and the others may be confusing students. This may be causing the relational question to appear very difficult.

Items 4 and 5 display a pattern similar to item 2. Item 4concerns a survey of people attending a football game, and item5 concerns the proportional mixing of liquids. As they, alongwith item 2, constitute the most generally consistent block ofitems, they will not be discussed at this point. Items 6 (itemresponse map in top panel of Figure 11-8) and 7 (Item responsemap in lower panel of Figure 11-8) exhibit a quite differentpattern of thresholds from that of items 2. 4, and 5. For bothpatterns, the unistructural threshold is much more difficult

Ir 0

Page 241: DOCUMENT RESUME ED 377 073 SE 055 578 AUTHOR Romberg, … · 2014-05-07 · Thomas A. Romberg. 10. 3. Implications of the NCTM Standards for. Mathematics Assessment. Norman Webb and

03

OD

0.7

0.6

0.5

0.4

03

Measuring Levels of Mathematical Understanding 233

to

to

Ability

03 -

03 -

0.1

Oh -

E 05 -

0 0.4 -0-

0.3 -

02 -

01

OD

43 -3.0 -20 -tO 0.0 W 10 3D 4D

Ability

Figure 11-6.Item response maps for items 2 (top) and 3 (bottom). (Kulm, 1990. p. 193).

Reprinted with permission.

C 1

Page 242: DOCUMENT RESUME ED 377 073 SE 055 578 AUTHOR Romberg, … · 2014-05-07 · Thomas A. Romberg. 10. 3. Implications of the NCTM Standards for. Mathematics Assessment. Norman Webb and

234 Wilson

3. When we use a ruler our measuring is not exact. To the nearest inch, the lines below are each 3 inches long. The lengths are somewhere in the range of 2 1/2 inches to 3 1/2 inches.

[Drawing not reproduced: line segments with endpoints labeled A through F above a ruler marked 0 1 2 3]

A. What is the length, to the nearest inch, of the line EF? Answer

B. What is the length of GH? Answer

[Drawing not reproduced: a ruler scale (0 2 4 6 8 1011 in the scan) and a segment labeled J K]

C. What are the smallest and largest possible lengths of JK? Answer

Figure 11-7. Item 3. (Romberg, Collis, Donovan, Buchanan, & Romberg, 1982).

Reprinted with permission.

than that of the other items, and the multistructural threshold is correspondingly harder also. This has resulted in item response maps that are "pushed" to the right compared with those for the other items.

Item 6 (Figure 11-9) has been criticized elsewhere (Romberg, Jurdak, Collis, & Buchanan, 1982; Wilson & Iventosch, 1988) on the basis of an ambiguous and relatively very complicated multistructural question, so this issue will not be pursued here. One would not expect this problem to make the unistructural question unusually difficult; it is a seemingly straightforward graph-reading question, although it does use the word "average," which might mislead some students into trying to calculate a mean. Item 7 (Figure 11-10) is a probability question about guessing the month and season in which people's birthdates fall. Given some familiarity with probability, the unistructural question seems a straightforward estimation of an expected value. Perhaps the explanation of the discrepancy here lies not in possible misapplication of the SOLO taxonomy, but in the lack of familiarity of students in the sample with statistics and probability. This would explain the translation of the Thurstone thresholds for the uni- and multistructural questions towards the difficult end of the scale. The relational questions in both cases do not experience so great a shift. This might indicate that the lack of familiarity of the more able students with statistics and probability was less marked than that of the less able. This might be due to such topics being customarily included in enrichment portions of curricula, or to the possibility that students who are more able in general have sufficient mathematical intuition and attention to detail to succeed on these items, where less able students need instructional exposure.
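The chance-level expectations that item 7 (Figure 11-10) asks about can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis: the function and variable names are illustrative, and it assumes only the guessing rates stated in the item (1 correct in 4 for seasons, 1 correct in 12 for months).

```python
from fractions import Fraction

# Guessing rates stated in item 7 (names are illustrative, not from the source).
P_SEASON = Fraction(1, 4)   # 1 correct per 4 season guesses
P_MONTH = Fraction(1, 12)   # 1 correct per 12 month guesses


def expected_correct(n_season_guesses, n_month_guesses):
    """Expected number of correct guesses under pure chance."""
    return n_season_guesses * P_SEASON + n_month_guesses * P_MONTH


# Question A: four season guesses.
answer_a = expected_correct(4, 0)

# Question B: 16 boys guessed by season, 12 girls guessed by month.
answer_b = expected_correct(16, 12)

# Question C: observed 7 + 6 correct, compared with the chance expectation.
answer_c = (7 + 6) - expected_correct(16, 12)

print(answer_a, answer_b, answer_c)  # prints: 1 5 8
```

Exact fractions are used so that the comparison in question C does not depend on floating-point rounding.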

The fit of the items, as indexed by the Item Fit t, indicates that the worst case, by a considerable degree, is that of item 1 (t = 5.78). The origin of this lack of fit can be examined by considering the empirical item response map (solid lines in Figure 11-11), constructed by calculating the proportions of students at each total score that make up each item score and then plotting them on an ability metric as was done for the theoretical item response maps. Looking at this figure alone reveals two "blips" in the empirical map: one between -2.0 and -1.0 logits, and a smaller one at about 1.0 logits. Some perspective on the meaning of "deviation" in this case can be gained by superimposing the estimated item response map on the empirical one. The dashed lines in Figure 11-11 show that the theoretical response curves are very discrepant at the lower end but tend to fit somewhat better at the top end, apart from the second "blip". Notice how the theoretical curves tend to


Figure 11-8. Item response maps for items 6 (top) and 7 (bottom); horizontal axis: Ability. (Kulm, 1990, p. 193).

Reprinted with permission.


6. The figure below shows the average birth rates, marriage rates, and divorce rates in Mapland for each 10-year period beginning in 1925 up to 1974.

[Graph not reproduced; legend: Marriages, Births, Divorces]

A. What was the average marriage rate in the years from 1925 to 1934?

Answer

B. Between which two periods did the average marriage rate decrease while the average birth rate increased?

Answer

C. What relationship seems to exist in general between birth rate and marriage rate?

Answer

Figure 11-9. Item 6. (Romberg, Collis, Donovan, Buchanan, & Romberg, 1982).

Reprinted with permission.


7. A teacher tries to guess the season and month when any child in her class was born. If the teacher was to guess the season, she would most likely get 1 correct for every 4 guesses. If the teacher was to guess which month any child was born, she would be likely to get 1 correct for every 12 guesses.

A. If the teacher used the seasons to make her guesses, how many times do you think she would have been correct with four children's birthdays?

Answer

B. The teacher has 12 girls and 16 boys in her class. She guessed the month in which each girl was born and the season in which each boy was born. In how many of her 28 guesses was she likely to have been correct?

Answer

C. If the teacher guessed 7 right out of 16 for the seasons and 6 right out of 12 for the months, how many more correct guesses altogether has she made than you would expect by chance?

Answer

Figure 11-10. Item 7. (Romberg, Collis, Donovan, Buchanan, & Romberg, 1982).

Reprinted with permission.

balance between over- and underestimating the empirical curves above -1.0 logits. In calculating this statistic, greater weight is given to parts of the scale where greater information is available, so it is not necessarily the case that the greatest contributors to the statistic are the discrepancies that look greatest on Figure 11-11.
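The construction of an empirical item response map described above (proportions of students at each total score who obtain each item score) can be sketched as follows. This is an illustrative sketch only: the data, the function name, and the mapping from total scores to ability estimates are hypothetical, and in practice the ability values would come from the fitted measurement model rather than from this code.

```python
from collections import Counter, defaultdict


def empirical_item_map(records, max_item_score):
    """Proportion of students at each total score who got each item score.

    records: iterable of (total_score, item_score) pairs, one per student.
    Returns {total_score: [proportion scoring 0, 1, ..., max_item_score]}.
    """
    counts = defaultdict(Counter)
    for total, item in records:
        counts[total][item] += 1

    proportions = {}
    for total in sorted(counts):
        n = sum(counts[total].values())
        proportions[total] = [counts[total][k] / n
                              for k in range(max_item_score + 1)]
    return proportions


# Hypothetical data: (total test score, score on one item scored 0-3).
students = [(2, 0), (2, 0), (2, 1), (2, 1), (5, 1), (5, 2), (5, 2), (5, 3)]

# Hypothetical ability estimates (logits) for each total score.
ability_for_total = {2: -1.5, 5: 0.4}

item_map = empirical_item_map(students, max_item_score=3)
for total, props in item_map.items():
    # Each row gives one vertical slice of the empirical map.
    print(ability_for_total[total], props)
```

Plotting each row against its ability value, and overlaying the model-based curves, would give a comparison of the kind shown in Figure 11-11.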

What has this told us about item 1? It looks uncomfortably like this item is susceptible to some sort of misinterpretation by students of lower abilities. The estimated step difficulties and thresholds are mainly being determined by the behavior of students of ability greater than about -1.0 logits. For students of lower abilities, their scores are being somewhat overestimated by these values. It looks as if some confusion occurs in students at about -1.0 logits that makes the questions relatively harder. Perhaps students who recognize the grid as being a Cartesian coordinate system (not the least able, obviously) make the problem harder for themselves by trying to solve for Euclidean distances. This is the sort of issue that can only be unravelled by gathering more information from students about their problem-solving tactics.


Figure 11-11. Empirical and theoretical item response maps for item 1; horizontal axis: Ability. (Kulm, 1990, p. 194).

Reprinted with permission.

As a comparison, Figure 11-12 shows the theoretical and empirical item response maps for item 3, which had a much better Item Fit (t = .50). For this item the considerable discrepancy at the lower end has not had so great an effect on the discrepancies at the upper end where most of the weight of the student information lies. This situation raises the possibility of an alternative interpretation of the large fit statistic for item 1. Perhaps the students at the lower end are simply less consistent about their problem solving than those over -1.0 logits, and the sum of these inconsistent responses for item 1 was one that, by chance, affected the estimation procedure. Unfortunately, there is no way to determine the most likely of these possibilities given the data. The empirical results and the analysis of them using an IRT model can show inconsistencies, but interpretation of such inconsistencies must be accomplished by probing more deeply into the students' cognitions than is revealed by scores on the items.


Figure 11-12. Empirical and theoretical item response maps for item 3; horizontal axis: Ability. (Kulm, 1990, p. 195).

Reprinted with permission.

Discussion of Example

The results for this example have included a "map" of the variable representing progress through the SOLO levels that allows one to give a criterion-referenced interpretation of a given student's mathematical understanding with respect to the SOLO levels and the items that were used to elicit performance. It also provided a framework for picking out performance patterns that were especially inconsistent and that deserve further examination.

The results have also pointed to some specific problems with particular items. Such results are not very useful if our attempt to measure levels of mathematical understanding is seen as a hit-or-miss, once-only task. If, however, measurement is seen as an incremental process, involving the gathering of information at various times in a variety of contexts, then the lessons learned from this analysis may be put to some good use. The empirical results can be used to sharpen the technique of translating the SOLO scheme into the reality of mathematical problem-solving items. The need to sharpen the distinction between multistructural and relational for one of the items was noted. Another item needs closer examination to clarify why its relational level is so difficult. One item displayed some inconsistency that may be due to confusion it caused for certain students. The best way to explore such empirical results is to collect samples of qualitative data at the same time as the item scores are recorded. This could be as straightforward as collecting a sample of the students' answer sheets (especially if they were encouraged to "show their work"). A more formal strategy would be to interview a sample of the students taking the test.

Looking more closely at some of the relational questions within the items (e.g., items 1, 2, 3 and 4) leads one to speculate whether the relational level has been well realized by these items. Certainly the relational question within each of these items would be expected to be more difficult, but that is not sufficient for it to be considered as indicating a higher level within the SOLO taxonomy. For example, in item 2, the relational question asks the student to place the use of a railway timetable into the broader context of a real-life problem where one has to consider time taken to get to and from the railway station. This is adding an extra variable to the problem, but is it addressing the mathematical relations among the components of the timetable? What is needed is a strongly mathematical idea of how to apply SOLO. One potential source for this is the van Hiele (1986) mathematics learning sequence. If one compares the SOLO idea, which is a general approach, to the van Hiele approach, one realizes that the van Hiele levels constitute successive relational levels that could be used in a SOLO framework. The interesting complication is that SOLO provides a framework for assessing within the van Hiele levels, and van Hiele levels provide a framework for linking between SOLO items at different levels.


12

A Framework for the California Assessment Program to Report Students' Achievement in Mathematics

E. Anne Zarinnia and Thomas A. Romberg

This chapter proposes categories for the California Assessment Program to use in reporting student achievement in mathematics. Initially, state assessment reported achievement to the legislature for the purpose of accountability. However, assessment does more than simply register students' achievement; it affects it in ways both intended and unintended. In these pages, the authors examine the explicit and tacit messages, imposed in the analyzing, gathering, and aggregating of test data, that have subtle effects on teaching and student achievement. It is determined that units of analysis and reporting categories are needed which will deliberately support the purposes of adequate information for monitoring and, by focusing attention on critical considerations, promote reform in mathematics education.

With recognition of the impact of assessment and strong, ongoing demand for educational reform, the goal of state assessment in mathematics is now to go beyond recording for accountability purposes and to become an intentional catalyst for educational change. Thus, units of analysis and reporting categories are needed that will both deliberately support the purposes of gathering adequate information for monitoring and, by focusing attention on critical considerations, promote reform in mathematics education.


This document outlines seven bases for forming reporting categories. Each set of categories is associated with major issues, prevailing educational practices, and demands for reform. Also, each is discussed as it arises in the logic of the argument. The argument proceeds first by examining the present assessment situation in California as well as in other parts of the United States. In the course of that examination, four reporting categories are described. This is followed by a consideration of the primary objective of the reform movement: the development of "mathematical power" for all students. From that analysis, three additional reporting categories are presented, the last being recommended for use in California. Suggestions are made for gathering and reporting appropriate evidence. Finally, a recommendation is made based on a consideration of these alternatives.

THE PRESENT ASSESSMENT SITUATION IN CALIFORNIA

The gathering, reporting, use of, and reactions to assessment information, as these activities now occur in California, shed light on the problem of arriving at reform-oriented categories. There are a few key features to consider: first is the curriculum, intended, actual, measured, and achieved; second is the set of assessment and testing programs that are in place to measure and report the achieved curriculum; and third is the reaction of different groups of people to both.

For the last five years, the Mathematics Framework for California Public Schools: Kindergarten Through Grade 12 (California State Department of Education, 1985) has been the state's outline of an intended curriculum. Each district has its own intended curriculum spelled out explicitly in its curriculum guide and tacitly in its textbook adoptions. In progressive districts, the guide has been revised to support the Framework. In other districts, the Framework may be mentioned but not really followed, or it may never be mentioned. The actual curriculum addressed by the teachers is undoubtedly a pragmatic mix determined under day-to-day circumstances.

Assessment is a means of reporting students' achieved curriculum. Whatever we know about the curriculum that students actually achieve depends on the way the assessment program measures and reports. The resulting information about the achieved curriculum is only good for accountability if what is measured is a valid proxy for the intended curriculum outlined in the Framework.

To grasp the dimensions of the reporting problem, it is critical to look at the broad picture of testing and assessment and at the ways in which program data from mandated testing are used. The National Center for Research in Mathematical Sciences Education has gathered information about the impact of mandated testing in the United States in a series of studies. The first surveyed the perceptions of eighth-grade mathematics teachers nationally (Romberg, Zarinnia, & Williams, 1989). The second surveyed state supervisors (Romberg, Zarinnia, & Williams, 1990). In the third (Romberg & Zarinnia, in press a), the issue was pursued through in-depth case studies in four states by interviewing teachers, testing directors, and administrators in selected districts. The fourth study (Romberg & Zarinnia, in press b) extended the pursuit of the issue to follow-up questions with teachers.

This series of studies has provided a base of information about the impact of mandated testing in California in two ways. First, a set of data for California was extracted from the national survey (Romberg, Zarinnia, & Williams, 1989). Second, California was one of the four states chosen to conduct the case studies and follow-up questioning for the third and fourth studies (Romberg & Zarinnia, in press a; in press b). California was selected because it has been actively pursuing educational reform by developing a state curriculum framework and by modifying its state assessment to support attainment of the standards in the framework.

The National Survey: California

Data (Romberg, Zarinnia, & Williams, 1989) suggested that California mathematics teachers are well informed about their state assessment program and perceive it as emphasizing mathematical understanding. In this respect, the perception of California teachers differs from the perception of teachers nationally that state assessment stresses essential competencies. The California teachers also distinguished quite clearly between the emphasis on understanding in the California Assessment Program and the basic skills nature of a typical district test.


The California Case Studies

The teachers in the national survey were selected randomly. That is not true of the districts in the case studies, which were chosen to reflect a range of curriculum and testing environments. Nevertheless, information from the case studies in several California districts can be used to develop a composite picture that illustrates ways in which existing state and district assessment programs exert an effect. The statements from teachers and administrators are used to write a coherent story and to point to issues that need to be considered in establishing reform-oriented reporting categories for mathematics.

Three assessment and testing programs have been mandated to measure the achievement of California's students. First, the federal government has required pre- and post-testing to select and account for Chapter 1 students. Second, the state requires that districts have a proficiency test for graduation. To satisfy this state proficiency requirement, school boards usually require students to take a standardized test. Third, California requires that every student participate in the California Assessment Program (CAP).

District Testing. In one of the districts in the California case study, to ensure adequate performance on the Comprehensive Test of Basic Skills (CTBS) and achievement of minimum standards, all students are tested at least weekly in a Computer Managed Instruction (CMI) program and are required to master a series of objectives specifically correlated with the CTBS. Principals in the district are evaluated on students' performance on CMI and the CTBS. Their teachers refer to the mastery program as "computer-managed testing" and are skeptical about the validity of the CTBS in relation to their curricular objectives. To minimize the number of CMI tests to be taken by a student, the district administers the CTBS in the fall as well as spring, recording, thereby, mastery of as many of the CMI objectives as possible at the beginning of the year and reducing the time students spend in testing. The CTBS data are used to communicate with parents about their child's performance. They are also used to group and place students: one of the criteria for placement in high school algebra, for example, is a score of 80 percent or above on the CTBS. Although other districts are less intense in their approach, the uses of the district test are consistent with the data from the national survey.

The California Assessment Program (CAP). In addition to district testing with the CTBS/CMI system (or other tests in other districts), students are tested with the CAP in the spring using a matrix-sampling approach. Although each student takes only a subset of test items, within each school the total matrix addresses each of the subject strands and subtopics of the Framework. Results are reported by Framework strands. CAP returns a performance report to each school with school and state scores and change scores, as well as percentile ranking within a statistically similar comparison group of schools. In this way, schools can compare their performance with schools statewide, with schools in their statistical comparison group, and with their own performance in previous years.
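The idea behind matrix sampling can be illustrated with a small sketch. It is not CAP's actual procedure, which the text does not specify: the function name, the strand labels other than Measurement, and the numbers of items and students are assumptions made for illustration. The item pool is split into short forms, and the forms are dealt to students in rotation, so that the school as a whole covers every item even though each student answers only a few.

```python
import random


def assign_forms(item_ids, student_ids, items_per_form, seed=0):
    """Split the item pool into forms and deal forms to students in rotation.

    Every item appears on exactly one form, so the school-level matrix covers
    the whole pool as long as there are at least as many students as forms.
    """
    rng = random.Random(seed)
    shuffled = list(item_ids)
    rng.shuffle(shuffled)
    forms = [shuffled[i:i + items_per_form]
             for i in range(0, len(shuffled), items_per_form)]
    return {student: forms[i % len(forms)]
            for i, student in enumerate(student_ids)}


# Hypothetical pool: 12 items in three strands, 6 students, 4 items per form.
items = [f"{strand}-{n}"
         for strand in ("Number", "Measurement", "Geometry")
         for n in range(1, 5)]
students = [f"s{k}" for k in range(1, 7)]

assignment = assign_forms(items, students, items_per_form=4)
covered = {item for form in assignment.values() for item in form}
print(len(covered) == len(items))  # True: the school as a whole sees every item
```

Because no single student answers every item, a design of this kind supports school-level reporting by strand but not individual student scores.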

Both teachers and school administrators in the case studies stated privately that CAP scores are important and are a basis for formal evaluation of administrators and informal evaluation of teachers. The pressure to use the scores in this way is considerable. CAP scores are printed in the newspapers in such a way that every school is compared with every other school.

The case study on the CTBS/CMI system revealed that the district administration requires principals to analyze and respond to the data in a written report that identifies perceived problems and outlines plans for dealing with them. Thus, the principals examine performance on the strands reported, pinpoint two to three weak categories or topics, and request the mathematics department to propose strategies for dealing with, for example, low scores in Measurement. An indication of the importance of the scores to districts is that although teachers in the district with the CMI system have no prep periods, they are released for a whole day to go to the district offices to discuss the CAP profile and review the mathematics program. These teachers regard CAP as helping them to get away from the overemphasis on the CTBS objectives, which they describe as almost entirely computational, and to focus on the strands of the Framework. They said that CAP supports the Framework and validates their emphasis on problem solving. However, they also claimed a mismatch between what they do in problem solving and the efforts of CAP to assess it with multiple-choice items:


It has something that it calls problem solving, but I don't think the way that I do problem solving, you could put on a matrix type test, where you have just a few questions and it's multiple-choice. (Becker & Pence, in press)

The same teachers from the CTBS/CMI district were investigating alternative assessment strategies, such as portfolios, and proposed alternatives:

If we are going to have a test, I would like to see it be a collection of the different works the students have done, take a sample of their efforts to do a nice rich problem in the Fall, and then use that as a testing situation.... I don't know how they are going to go through life without being able to put all of these types of ideas together. (Becker & Pence, in press)

In another district, a teacher described a close match between the test and what he wants the remedial students to learn. He distinguished clearly between the kinds of problems he considered suitable for his remedial class and those suited to the accelerated class. He also had a different attitude about calculators for the remedial group, feeling that those students needed to know how to enter an operation into the calculator before they should be allowed to use one freely. Interestingly, the proficiency test (a dominant part of the mathematical experience for the remedial group, but a negligible concern for the accelerated students) does not allow calculators. The same teacher's accelerated students "behave as though their calculator is an extension of their hand." The irony is that many would consider this entrenchment of low-level approaches appropriate differentiation of the curriculum.

The pressures resulting from CAP's percentile rankings are a problem. A teacher in a district of lower socioeconomic status expressed frustration with the publication of rankings that minimize the impact of any improvement in scores. In a high-scoring school, the percentile rankings are also bad for teacher morale and act as disincentives.

We are expected to be in the high 90s or they wonder why. Last year we scored 95 percent in math, and this year we scored 94 percent; we know our administrator is going to wonder why ... and they'll be asking us to look over our program and see what area we are going to bolster up to get that score back. If we happen to come in the 93rd instead of the 96th, we did the best we could, the kids did the best they could. But at the same time, we get a feeling we are being criticized for it.... Well, if you got a hundred percent, you would be dead because the next thing you would be expected to do is improve beyond that ... and you just can't. (Becker & Pence, in press)

In spite of this pressure, one teacher described the relationship between CAP, the 1985 Framework, and her teaching as follows:

Well, I have never seen the CAP test; I think here and there I have seen samples ... to me it's just this big mysterious thing that I am really curious about. I am really trying to line up with that Framework because I think it's a very sound framework. I think it's very well balanced. I can't see too many things at this point that they would change. I think the trend now is to get more writing into the math curriculum, which still you can very easily slide that in. The Framework is making math fun to teach. I have taken lots of courses on using manipulatives, so I'm just very excited about it.... [CAP] is just as accurate, I guess, as anything else could be. But I still want more of an individual type score. I think that would be more helpful to us teachers. (Becker & Pence, in press)

Teachers interviewed for the case studies repeatedly said that the data from CAP are not very useful to them because of the time lag and because they do not get data for individual students. CAP is intended for program, not student, assessment. In fact, there are strong feelings in some quarters that CAP should not be extended to individual student assessment because of the embedded implications of state political control versus local authority. The fact remains that in lieu of measures of individual student achievement that represent a serious attempt to reflect the Framework, poor measures generated by the district tests are used:


My feeling is that the CTBS test is not updated. So, it's kind of out of sync with what is really happening. They are testing things we taught eight years ago. (Becker & Pence, in press)

Teachers in the CTBS/CMI district, which gives the CTBS twice and requires passing of the CMI objectives for graduation, commented:

The CTBS does affect what I do in the classroom, because I do have to spend time on those narrowly focused ideas. The test itself is almost defeating the ideas I have tried to instill in my classroom.

The CAP doesn't affect what I do; it does explain why I do it. (Becker & Pence, in press)

They observed that although the CTBS/CMI program is supposed to be diagnostic, it is not effective for that purpose, and they argued that their students "flunk test-taking before they even have the opportunity to flunk content." One teacher complained that:

The kids who are poorest on the CMIs are the ones that can problem solve the best in class, especially if it is not a math-related problem. (Becker & Pence, in press)

Summary. In the case studies, both teachers and administrators subscribed to California's 1985 Framework. The districts met the obligations imposed by the state for proficiency testing, but teachers described the resulting emphases as computational. Teachers appreciate the efforts to correlate CAP with the Framework because it validates their efforts at problem solving. However, because districts did address the categories reported by CAP, it validated equally the problem solving conducted by teachers of accelerated classes and the computational emphasis of remedial teachers. Teachers decried the competitive pressures resulting from percentile rankings and were skeptical about CAP's ability to measure problem solving with multiple-choice approaches. They expressed a strong need for individual student data, and some wanted alternative assessment measures.

Both districts and teachers need individual data. If CAP does not provide it, they will continue to use the most cost-effective data available. This is typically from low-level district tests.

Conclusion

It is essential to recognize that information from testing and assessment programs is used for multiple purposes and that CAP is only one part of a coherent system. The need for data to support internal instructional decisions is paramount in the schools. Perhaps this is because instruction is the central mission of the system, whereas accountability is inherently an external issue. CAP does not provide individual student information, so schools derive that from other sources. The result is that although CAP is making strenuous efforts to adjust its program and move beyond basic skills and multiple-choice testing, it does so in the context of district tests that are substantially at odds with reform goals but that, nevertheless, are relied on for individual student data.

Every assessment, including those intended for program assessment, should provide timely information and consist of valid instructional tasks that are a conscious and integral part of the intended curriculum of each child. If this were in fact the case, there would be no need to distinguish between individual and program assessment with respect to appropriate tasks. Only the distinction between such things as sampling strategies and units of analysis would be significant. If one acknowledges student learning as the central mission of schooling, it further suggests that not only the tasks, but also the system and structures for gathering accountability information and reporting the data, should be designed with instructional needs in mind.

ALTERNATIVE REPORTING CATEGORIES WITHIN THE EXISTING SYSTEM

Based on the description of testing in California and other current practices in the United States, four alternative approaches to reporting categories are apparent. These vary in the way items are categorized in terms of mathematical content and/or assumed abilities (or intellectual processes).


Alternative Number 1: Consolidate Testing and Update Content

Reporting Categories: Key Mathematical Content (Subcategories for process)

One potential response already under investigation in a number of school districts in California is a combination of CAP and updated district testing in an experimental program called Curriculum Alignment System/Comprehensive Assessment System (CAS). To replace low-level district testing in mathematics, a consortium of districts has attempted to alleviate:

the massive amounts of time spent testing

the lack of alignment between the curriculum and the tests

the absence of individual student data in CAP.

A report from one of the districts is indicative of the process. The district formerly gave the California Achievement Test (CAT), as well as a criterion-referenced district test, and CAP; these three tests took about thirteen hours to administer. With the 1985 Framework in mind, the district now prioritizes its objectives, which are then mapped onto an item bank to generate the tests. The publisher sends a practice test, which the district administration says is an indication to the teachers of what they need to teach. On the actual test, the questions that compose the CAP matrix appear on the first few pages of a longer district test that is norm referenced. CAP items are returned to CAP, which generates the usual reports. The test publisher returns detailed analyses to the district for individual students and for classes.

The advantage of Alternative Number 1, in which the content of the tests is updated and merged for efficiency, is that it is nondisruptive. There are few major changes in strategies for gathering, analyzing, and reporting information. Through the district committee, the teachers can emphasize the aspects of curriculum that they value most and, thereby, ensure a closer match between the district test and the curriculum.

The disadvantage of this approach is that it is essentially an updating and consolidation of existing strategies. It is, therefore, unlikely to promote substantive change. It does not answer teachers' concerns about the use of multiple-choice items to test problem solving. Furthermore, CAP's traditional reporting categories do nothing either to change perceptions of mathematics or to focus on the measurement of mathematical power.

Alternative Number 2: Emphasize Mathematical Abilities in Each Content Area

Reporting Categories: Content (Reported only in subcategories of mathematical ability)

This second alternative expands the reporting of performance in each content area by sorting items into ability subcategories. For example, the National Assessment of Educational Progress has identified concepts, procedures, and problem solving as critical mathematical abilities (NAEP, 1988). Assessment for 1990 will report scores for abilities within each of the categories of content to be assessed. By reporting conceptual, procedural, and problem-solving scores and appropriately weighting items within each content category, NAEP hopes to emphasize the process of doing mathematics (see Figure 12-1). NAEP intends to use its assignment of content categories to reduce the emphasis on Number and Operations and to increase attention to Geometry, Algebra, and Functions. These tables illustrate the advantage of this alternative in the use of content categories to promote instruction in geometry (which has languished) and reliance on subcategories to clarify NAEP's vision of mathematics.

Table 1 Percentage Distribution of Questions by Grade and Mathematical Ability

Mathematical Ability          Grade 4   Grade 8   Grade 12
Conceptual Understanding         40        40        40
Procedural Knowledge             30        30        30
Problem Solving                  30        30        30

Table 2 Percentage Distribution of Questions by Grade and Content Area

Content Area                                   Grade 4   Grade 8   Grade 12
Numbers and Operations                            45        30        25
Measurement                                       20        15        15
Geometry                                          15        20        15
Data Analysis, Statistics, and Probability        10        15        15
Algebra and Functions                             10        20        25

Figure 12-1. Tables 1 and 2 from Mathematics Objectives: 1990 Assessment (NAEP, 1988, p. 14).
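As a rough illustration of how such a percentage specification might be checked and applied, the sketch below transcribes the two tables as printed here and sums each grade's column. The helper names (`column_totals`, `expected_items`) and the 60-item test length are assumptions made for illustration only; they do not come from NAEP. Computed this way, the Grade 12 column of Table 2 as transcribed totals 95 rather than 100, which may reflect a transcription error somewhere in the printed figures.

```python
# Percentages transcribed from Tables 1 and 2 as printed above (NAEP, 1988, p. 14).
GRADES = (4, 8, 12)

ABILITY = {
    "Conceptual Understanding": (40, 40, 40),
    "Procedural Knowledge": (30, 30, 30),
    "Problem Solving": (30, 30, 30),
}

CONTENT = {
    "Numbers and Operations": (45, 30, 25),
    "Measurement": (20, 15, 15),
    "Geometry": (15, 20, 15),
    "Data Analysis, Statistics, and Probability": (10, 15, 15),
    "Algebra and Functions": (10, 20, 25),
}


def column_totals(table):
    """Sum each grade's column so the distribution can be checked."""
    return {g: sum(row[i] for row in table.values()) for i, g in enumerate(GRADES)}


def expected_items(table, grade, n_items):
    """Hypothetical helper: split n_items across categories by percentage."""
    i = GRADES.index(grade)
    return {name: n_items * row[i] / 100 for name, row in table.items()}


ability_totals = column_totals(ABILITY)
content_totals = column_totals(CONTENT)

# Assumed 60-item Grade 8 test, purely for illustration.
grade8_items = expected_items(CONTENT, 8, 60)

print(ability_totals)
print(content_totals)
print(grade8_items)
```

Under the assumed 60-item test, the Grade 8 specification would allot as many items to Geometry as to Algebra and Functions, which is the kind of shift in emphasis the content categories are meant to signal.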


However, the tables also illustrate the disadvantage of this alternative. It is likely that schools will use the categories for a multigrade summary analysis in which two or three weak points will be selected for attention. Despite the subcategories, the likelihood is that the primary emphasis of such analyses will continue to be on categories of content, doing little to change the vision in schools of what it means to engage in mathematics.

Alternative Number 3: Upgrade Process to the Same Status as Content

Reporting Categories: Content and Process

The problem of providing adequate information about performance is more deep seated than simply updating the content categories.

It has been popular to use content-by-behavior matrices. Such matrices have proven to be a powerful organizing structure. Despite modification of the specifics on each axis, the matrix approach has been used in many programs during the past quarter century. For example, it was integral to the model of mathematics achievement in the National Longitudinal Study of Mathematical Abilities (NLSMA) (Romberg & Wilson, 1969, pp. 29-44), and to all administrations of the National Assessment of Educational Progress. Persistence of the matrix as a tool for organizing activity is important and reflects:

its power as an organizing tool;

its visual facility; and

the strong continuity between assessment projects created by relying on those with the most relevant experience in the field and those planning the next project.

Today, however, the inadequacies of this structure have begun to outweigh its advantages. Evidence for this lies in two basic areas. First, the content dimension remains unchanged. The result is implicit statements about curricula that focus on knowledge segmented into subjects for study, such as mathematics into arithmetic, algebra, and geometry. These have the immediate impact of implying that:


knowledge can be broken down into clearly defined, independent, self-sustaining parts;

such an approach is important, more important than any other approaches which might follow;

there is a logical sequence of development in which each part builds on a preceding foundation;

it is important to know about the divisions of knowledge enumerated;

if knowledge were acquired in this manner, students would be able to use and apply their mathematical knowledge as needed.

Such implicit assumptions may be unwarranted if, for example, knowledge is regarded as unitary and emphasis is on knowing rather than knowing about. The approach is also unsuitable if there is genuine concern with application and problem solving. Stated simply, purpose should suggest form, and form implies purpose; incoherence may be inferred from anything less.

Disagreement over the precise structure and arrangement of content in a grid is only part of the problem. Westbury (1980) pinpointed a more fundamental concern: the difference between the intellectual structure of a discipline and its institutional structure in schools, where it is an administrative framework for tasks. The consequence is that administrative stability impedes intellectual change. For similar reasons, Romberg (1985) described mathematics in schools as a stereotyped, static discipline in which the pieces have become ends in themselves. A similar response to the impact of scientific management and behaviorism on mathematics as a school subject is Scheffler's (1975) denunciation of the traditional, mechanistic approach to basic skills and concepts:

The oversimplified educational concept of a "subject" merges with the false public image of mathematics to form quite a misleading conception for the purposes of education: Since it is a subject, runs the myth, it must be homogeneous, and in what way homogeneous? Exact, mechanical, numerical, and precise, yielding for every question a decisive and unique answer in accordance with an effective routine. It is no wonder that this conception isolates mathematics from other subjects, since what is here described is not so much a form of thinking as a substitute for thinking. The process of calculation or computation only involves the deployment of a set routine with no room for ingenuity or flair, no place for guesswork or surprise, no chance for discovery, no need for the human being, in fact. (p. 184)

A second concern is that the process dimension has been based on behaviorism. This reflects an application to the problems of education of the engineering approach to scientific management, focusing on managing environmental factors to achieve a defined outcome and ignoring the internal cognitive mechanisms. Scientific management rests on three basic principles: specialization of work through the simplification of individual tasks, predetermined rules for coordinating the tasks, and detailed monitoring of performance (Reich, 1983). These microprinciples pervaded American education with the same thoroughness with which they were applied in the economy. They dominated the breakdown of knowledge, the roles of teachers and students, instructional and administrative processes, the building-block approach of Carnegie units, the content and structure of textbooks, belief in the textbook as an effective tool for transmitting content, the structure of university education, and monitoring and evaluation. Hence, the notion of progress emerged through the mastery of simple steps, the development of learning hierarchies, explicit directions, daily lesson plans, frequent quizzes, objective testing of the smallest steps, and scope and sequence curricula.

Unfortunately, these are only the more obvious aspects. One consequence of such meticulous planning is that it renders the unplanned unlikely. A second is that a system designed to eliminate human error and the element of risk also eliminates innovation. A third is that, like factory work, it is dull, uninspiring, and unmemorable except for its boredom, for personal involvement and the mnemonics of the unexpected are nonexistent.

Bloom's Taxonomy of Educational Objectives (1956) epitomizes the domination of American education by scientific management, for it completed the process by which not only the content of learning but the proxies for its intelligent application were classified, organized in a linear sequence, and by definition, broken into a hierarchy of mutually exclusive cells. The consequences in the classroom were far reaching. Scope and sequence charts prescribed which parts of a subject were to be covered in what order; each cellular part of each subject was put into a matrix (e.g., Romberg & Kilpatrick, 1969, p. 285); behaviors suggesting desirable intellectual activity were also sequenced. However, given the multiplicity of subject cells to be covered, the easiest way to finish the prescribed course of study was to simply cover content without worrying too much about thought. Furthermore, matrices are difficult to construct effectively on paper in more than two dimensions. Consequently, few scope and sequence charts addressed both levels of thinking and specific aspects of content in a very coherent manner.

The dilemma such matrices pose for both assessment and instruction is whether to cover some content areas at all levels of behavior or to place emphasis on the lower levels of behavior for all content areas. This dilemma is partially recognized by CAP in its elevation of two process categories to the same vector as content: Problem Solving and Tables, Graphs, and Integrated Applications. Otherwise, CAP's reporting categories continue the content-by-behavior approach by reporting in major content strands and specifying sublevels for Skills and Applications.

CAP's strategy in mixing content and process categories in the same primary vector of the standard, two-dimensional framework recognizes that it is essential to focus on the process of doing mathematics. Therefore, processes to be valued as highly as content, such as problem solving, need to be elevated to the same category status in the reporting framework if they are to receive proportional attention in the curriculum.

The disadvantage is that elevating selected processes, such as problem solving, to the content vector effectively categorizes them as content. It implies that they are distinct from other categories of content, just as algebra is separated from geometry. Thus, although Alternative Number 3 supports California's 1985 Framework by emphasizing problem solving, representation, and integrated application, it only adds to the content of school mathematics and does little to support the significantly different view: that, in every area of content, it is the doing of mathematics that is mathematics.
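The content-by-behavior matrix discussed in this section can be sketched as a small data structure in which every item carries both a content label and a process label, and a report is simply an aggregation along one axis. The item scores, the two-by-two layout, and the function name `report` below are hypothetical and chosen only to show that the same results yield different messages depending on which axis supplies the primary reporting categories; this is not CAP's actual procedure.

```python
from collections import defaultdict

# Hypothetical item-level results: (content strand, process, percent correct).
ITEMS = [
    ("Number", "Skills", 80),
    ("Number", "Applications", 60),
    ("Geometry", "Skills", 70),
    ("Geometry", "Applications", 50),
]


def report(items, primary):
    """Average score per category along one axis of the matrix.

    primary=0 groups by content strand; primary=1 groups by process.
    """
    buckets = defaultdict(list)
    for item in items:
        buckets[item[primary]].append(item[2])
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


by_content = report(ITEMS, primary=0)  # content as the reporting vector
by_process = report(ITEMS, primary=1)  # process as the reporting vector

print(by_content)
print(by_process)
```

With these made-up figures, a content report points to Geometry as the weak category, whereas a process report points to Applications in every strand, which is the difference in emphasis that the choice of reporting vector carries.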

Alternative Number 4: Focus Primarily on Process

Reporting Categories: Process (Subcategories for content)

If one wished assessment to convey the message that it is the process of doing mathematics, rather than simple coverage of content, that is closer to the nature of mathematics (e.g., Gale & Shapley, 1967) and the intent of the 1985 Framework, then the logical strategy would be to reserve the major reporting categories for mathematical processes. Thus, if one considered mathematics as problem solving, this would generate a set of major reporting categories originating in problem solving, such as the following:

              Designing/     Working Through/   Explaining/
Inquiring     Modeling       Solving            Interpreting
probing       constructing   deducing           connecting
conjecturing  formulating    analyzing          concluding
                                                communicating
                                                generalizing

CAP (1987a) adopted a similar strategy in the preliminary edition of its revised Survey of Academic Skills: Grade 12, which has two major reporting categories: Problem Solving/Reasoning and Understanding and Applications. The former is subdivided according to four major components of the problem-solving process: problem formulation; analysis and strategies; interpretation of solutions; and nonroutine applications/synthesis of routine applications. The latter is divided into the subcategories: number and operations; patterns, functions, and algebra; data organization and interpretation/probability; measurement and geometry; and logical reasoning.

Such categories have the advantage of representing mathematics as a purposeful and active occupation, especially if each of the processes is fleshed out in subcategories of process. They also address indirectly three of the five major goals of the NCTM (1989) Standards: reasoning, communicating, and problem solving. In addition, the notion of process categories is likely to seem very reasonable to those accustomed to the content-by-process framework because there is only a minor modification in thinking from the content-by-process approach. However, without content subcategories, there is little about process categories to distinguish them as unmistakably mathematical.

Thus, process categories have several disadvantages. First, if one were to treat processes as categories for separate courses or topics of study, it could lead to discrete textbook chapters and lessons on particular processes, such as working through or solving. There is no guarantee that processes would be assessed as part of a holistic task; in fact, a probable consequence would be to disintegrate problem solving in the same way that the meticulous definition of precise subcategories atomized content under the existing system.

A second disadvantage is that particular philosophies of mathematics generate different metaphors of mathematics as: problem solving; modeling; a cultural system; the science of patterns; a language. In fact, the power of mathematics lies in the fact that it is all of these (e.g., NCTM, 1989). Each metaphor contributes, and leads to, considerable insight into the nature of the mathematical endeavor. Mathematics as a science of patterns focuses on the discipline's search to identify and describe invariance and, consequently, on the big ideas of mathematics: quantity, space, dimension, chance, and change (MSEB, 1990).

Mathematics as a language emphasizes the discipline's universality, pithy symbolism, semantics and grammar, and generative nature. It also brings insight into the problems of those who are linguistically restricted or from minority cultures (e.g., Cocking & Mestre, 1988). If the analogy of study is pursued, tone, voice, clarity, and precision are all essential, implying the ability to represent one's beliefs about the beauty of fractals, for example, in a way and a medium that are appropriate to the argument and the audience. This philosophy leads to the belief that students should be able to convey their mathematical arguments in various representations, formally and informally, eloquently and appropriately (NCTM, 1989).

The crux of the problem is that if process categories are restricted to the integrity of a particular metaphor, they will probably fossilize and impoverish the vision of the mathematical endeavor in schools. If spread across multiple metaphors, they are likely to disintegrate the mathematical experiences of children as inevitably as minutely specified content has.


Summary

Each of the four alternatives proposed for reporting categories is viable, moderate, and attainable. However, each anticipates relatively minor changes in the existing system and portrays a restricted view of mathematics. If one takes seriously the task of measuring and reporting achievement in a way that encourages desirable change, the issue demands more than rearranging within extant structures. As they stand, the first four alternative structures for reporting categories are unlikely to bring about substantive change.

MATHEMATICAL POWER

The basic issues and goals of assessment need to be reconsidered and alternatives proposed that are powerful and practical and that make sense. The essential issue is that there is a considerable array of desirable information, but one can only have a limited number of reporting categories if their message is to be readily intelligible. This has led to the selection of a limited number of critical features for gathering and reporting information and resulted in a grossly simplified version of mathematics. The real problem is how to gather complex information and report simply and effectively, but not simple-mindedly.

Epistemology and Authority

The single greatest issue in improving school mathematics is to change the epistemology of mathematics in schools, the sense by teachers and students of what the mathematical enterprise is all about. The magnitude of the current misunderstanding of mathematics is well illustrated by the fact that over 80 percent of the teachers who responded to NCRMSE's survey on mandated testing believe problem solving is included in their standardized district test of basic skills (Romberg, Zarinnia, & Williams, 1989).

The epistemology of school mathematics will be turned around only by its complete democratization and a change in the authority structure of the subject (Mellin-Olsen, 1987). The notion that mathematics is a set of rules and formalisms invented by experts that everyone else is to memorize and use to obtain unique, correct answers must be changed. In this context, there is an obvious potential for developing a unity involving the cultural genesis of mathematical ideas (Bishop, 1988), research in situated cognition (Brown, Collins, & Duguid, 1988), and a multicultural population of inadequately served minorities. The answers and the problems beg to be connected. Teachers, deskilled for decades, are pivotal. So, too, are students.

To lead school mathematics epistemologically, reporting categories must not only convey information about achievement, but carry clear messages about the nature of the mathematical enterprise, and every individual's role, rights, and responsibilities in the undertaking. The single most urgent message is political: mathematics is a universal, democratic, and collaborative endeavor in which all students are entitled to participate as citizens.

The question is, "What set of reporting categories might support this?" California's 1985 Framework introduced the idea of mathematical power as the goal of instruction:

Mathematical power, which involves the ability to discern mathematical relationships, reason logically, and use mathematical techniques effectively, must be the central concern of mathematics education and must be the context in which skills are developed. (California State Department of Education, 1985, p. 1)

Despite introduction of the idea of mathematical power in the 1985 Framework, it is essential to recognize that the document focused heavily on outlining desirable mathematical content. Consequently, it is to its seven strands of content (Number, Measurement, Geometry, Patterns and Functions, Statistics, Probability, and Logic) that administrators in California make repeated reference. Their comments on the impact of testing suggest that they are also quite familiar with the Framework's recommendations on instruction, especially with the emphasis on problem solving. This may be because problem solving is reported by CAP as one of the categories of content. In fact, when districts review their programs, it is the CAP categories that are addressed (Romberg & Zarinnia, in press a). One cannot assume that a focus on these categories will translate automatically into the mathematical power sought by the Framework.


The NCTM (1989) Standards also adopted mathematical power as the phrase most evocative of the quality of mathematical literacy sought for the entire population. Now, California is developing a new mathematics framework. Although still in its preliminary stages, it is clear the 1990 California "Mathematics Framework" will build both on increasing knowledge and conviction nationally about what it means to have mathematical power and on how students become mathematically powerful. The 1990 "Framework" will seek a new organization and structure of mathematics programs at all grade levels, one that emphasizes students' independent judgment and active engagement in mathematical investigation. It will also focus specific attention on assessment, both as an integral part of the classroom program and as externally imposed.

There are, therefore, five critical questions involved in the examination of mathematical power:

1. What is meant by mathematical power?

2. What can be regarded as convincing evidence of mathematical power?

3. How should that evidence be gathered and analyzed?

4. How should the evidence be summarized, and how can achievement of mathematical power be described in a report?

5. How corruptible is the resulting structure? If schools examine the report, focus on weak categories, encourage students to create evidence of improvement, and evaluate themselves by the instruments recommended, will students, in fact, become mathematically powerful?

Mathematical Power Interpreted

There is a strong distinction between definition of mathematical power as the intrinsic power of mathematics and the perception of mathematical power as individuals and societies empowered by mathematics. Therefore, one needs to think about what it means to be mathematically powerful both as individuals and as a society and to consider ways of identifying mathematical power.


All societies use and create mathematics. Mathematical power for the individual means that each person has the experience and understanding to participate constructively in society. Over the ages, people have invented and used mathematics to count, measure, locate, design, play, conjecture, and explain. They have also examined its generalized abstractions and developed out of them further mathematics, whether explanations, designs, proofs, or new theorems, which may or may not have been put to practical application (Bishop, 1988). They continue to do all of these, but in a rapidly increasing variety of contexts, in increasingly complex situations, and with shorter and shorter time spans for development. Assessment should seek and report evidence of these kinds of mathematical activity.

Mathematics is essential to value-guided optimization and choice between alternatives, for example. Consequently, understanding and experience of such uses of mathematics as precise and imprecise measures of vast and difficult-to-measure quantities is now critical to policy formulation and public decision making. In a society in which mathematics and information technology are pervasive, all members need to understand how mathematics is used; they need to know how to use it and to have a sense of how the discipline functions. The bottom line is whether we have a society whose members have a broad, reflective understanding and experience of mathematics in use, or whether we do not.

Mathematics is a profound and powerful part of human culture (MSEB, 1989, p. 33). It provides practical knowledge for everyday quantifying, locating, and designing. As such, it is the basis for science and technology and is deeply ingrained in aesthetics. Furthermore, in a culture that is heavily mathematical and technical, mathematical inference is at the root of rational argument and behind many debates on public policy. In the sense that citizens need a solid understanding of, for example, very large numbers, it is also a civic issue. Finally, it is a major part of the western intellectual tradition, and there is a deep vein of amateur mathematics in many leisure activities.

Everybody uses and relies on mathematics and, to some degree, everyone is a mathematician. However, not only do most students leave school with inadequate preparation, but "mathematics is the worst curricular villain in driving students to failure in school" (MSEB, 1989, p. 7). To counteract this situation, assessment programs should seek evidence of students using, reflecting on, and inventing mathematics in the context of value and policy judgments, designing and inventing, playing around with objects and ideas, describing and explaining their ideas and positions.

For a society to be mathematically powerful, its citizens must have the mathematical understanding and experience to jointly undertake the routine tasks of everyday life, operate as a society, and progress as a civilization. This means that a society needs both a critical mass of understanding and experience and, in addition, a substantial range of special expertise. Individuals have the potential for several kinds of power: to stand on their own with independent power; to contribute to the power of a group; to enhance and extend their own power by drawing on the group context. A recent trend by employers is the search for individuals who can work as effective members of a team. Hence, intertwined with individual power is the ability of a society to produce mathematically powerful groups. The full range of a society's power depends on the degree to which each of these facets exists in conjunction with the other. For accountability and encouragement, evidence of each facet should be reported.

The NCTM Standards argue that to be mathematically powerful in a mathematical and technical culture, students should develop the power to explore, conjecture, reason logically, and integrate a variety of mathematical methods effectively to solve problems. In becoming mathematical problem solvers, they need to value mathematics, to reason and communicate mathematically, and to become confident in their power to use mathematics coherently to make sense of problematic situations in the world around them. Hence, the document advocates four standards that should be used to critique all of the other standards: mathematics as problem solving, mathematics as reasoning, mathematics as communication, and mathematics as connections between topics and with other disciplines. Any assessment should provide evidence of each.

Students retain best the mathematics that they learn through construction and experience. Hence, the argument in the Standards is that students are more likely to become mathematically powerful if they learn mathematics in the context of problematic mathematical situations. As students use this approach to mathematical content, they learn to formulate problems and develop and apply strategies to their solution both within and outside mathematics. In a range of contexts, they verify and interpret results and generalize solutions to new problem situations. In so doing, they apply mathematical modeling and become confident in their ability to address real-world problem situations. As they reason through their problem situations, students develop the habit of making and evaluating conjectures, and of constructing, following, and judging valid arguments. In the process, they deduce and induce, apply spatial, proportional, and graphic reasoning, construct proofs, and formulate counter-examples. Assessment of students' problem solving must reflect these considerations.

The problem situations referred to in both the Standards and the draft of the 1990 "Framework" intend purposeful investigation of situations that are open to multiple approaches. Students need experience in a range of prototypic situations so that they can analyze their structure, finding essential features and ways in which aspects are related. Prototypic is meant in two ways: prototypic in the sense that the situation should be representative of the kind of cultural context that has traditionally given rise to mathematics (Freudenthal, 1983) and prototypic in the sense of the familiarity of the particular context to the student. In the latter context, students need to be able to pose a question, see the next question, evaluate a strategy, and construct and discuss alternative methods. Having done so, they need to examine assumptions and arguments and make efficient choices.

To produce a worthwhile result, students may need to judge what data are required, and then gather, process, and evaluate them. They may also need to develop examples by which to test conjectures. If the evaluation is unsatisfactory, they may need to regroup for reconsideration. This suggests the need for fluency with notational systems and the ability to develop abstractions and explain clearly, to appreciate another's point of view, and to arrive at a shared understanding.

Communication is essential to mathematically powerful individuals. In communicating with others about the problems that they are engaged in, students develop the power to reflect on, evaluate, and clarify their own thinking, to model situations, to formulate definitions, and to express ideas. In the process, they discuss their conjectures with others and develop the power to make convincing arguments, to read, listen, and view with understanding, and to ask extending questions. Ultimately, their power to communicate will be judged by their versatility, fluency, and elegance in choosing, using, and switching between representations that both symbolize best the mathematical ideas under discussion and are most appropriate to their audience.

This is a picture of individuals who tackle problems with a confidence based on a combination of the coherent mathematical knowledge that has emerged from working experience and the collaborative support that comes from membership in a broader community. It suggests that both the power of mathematics and the mathematical power of individuals are multifaceted. All facets are required among members of a society, but different groups and group members may reasonably and productively be stronger in some facets than in others. The issue is how to gather evidence and report it in such a way as to set standards and describe range without imposing expectations on individuals that generate a sense of failure and lack of power.

This empowerment view of mathematical literacy differs from traditional conceptions in two major and inherently related ways. First, it goes beyond the typical stipulation of knowledge, skills, and application. Power carries connotations of control and authority as well as of driving force. To advocate that all students become mathematically powerful is to demand that they have the experience, confidence, desire, and independence to wield their knowledge actively and productively. It carries with it concepts of choice, judgment, initiative, self-evaluation, responsibility, collaboration, and mutual respect, almost all of which are missing in the present formulation of school mathematics.

Second, the change in language from purely symbolic problems to problems situated in a realistic context reflects the need for students to become immersed in significantly more complex, messy, and culturally based problems that are open to a variety of strategies and multiple solutions. The magnitude, or the unfamiliarity, of ensuing investigations may require extended effort by one student or the joint efforts of more than one student. The demand for problem situations is intrinsically related to the need for mathematical power. The demand for power, the recommended context of problem situations for developing expertise, extended projects, and the need for collaborative effort go hand in hand with the recognition that mathematics is value driven and value laden.

This kind of reflective, experience-based, collaborative knowing and doing is substantially different from the traditional pursuit of a sequence of independently acquired symbols, rules, procedures, methods, and skills whose sequential acquisition was presumed to aggregate coherently as effective mathematical knowledge. Standard algorithms may save time, but a student's standard algorithm does not have to be the standard algorithm. Note that coherence stems from purpose. A situation may cohere in several ways, depending on perspective. It will have coherence for the student only if the perspective and purpose are the student's own.

Definitions of mathematics as the science of patterns, as a language, as modeling, as a powerful abstraction, or as a tool for solving problems, will all continue to fall short unless students learn mathematics as something created by a community in which they are independent, collaborative, and contributing members. Their contribution may emerge either in response to a specific practical need, or tangentially from reflection, conjecture, argumentation, and validation. The root problem is to change the epistemology and politics of mathematics in schools.

Thus, the challenge is to cause, and gather evidence of, a radical rethinking in the classroom of what it means to learn mathematics. We are looking for students interested in thinking mathematically, purposefully, and productively rather than in accumulating an aggregate of classes that in combination purport to represent coverage of mathematics. The task of mathematics education is to enculture students into a democratic, entrepreneurial, mathematical, and technical society and to help them develop a sense of the culture of mathematics that is requisite. They must be empowered not only by a knowledge about mathematics but also by the confidence that, to some degree, they are mathematicians and members of the mathematics community. The immediate goal is to develop reporting categories to help communicate that challenge.

STANDARDS OF EVIDENCE IN THE IDENTIFICATION OF MATHEMATICAL POWER

If we expect students to use mathematics confidently and effectively to make sense of their world, we should gather and report evidence that they are using mathematics: Direct evidence is essential. If we expect them to use mathematics for design purposes, they should have design experiences and be assessed in that context. Similarly, if we expect them to play around with simulations or with abstract ideas in order to develop effective algorithms, hypotheses, explanations, or new mathematics, then it is in the context of these experiences that evidence should be gathered. This argument suggests that categories that would report the doing of mathematics, whether using or developing it, would elicit the most direct evidence of involvement in the mathematics process.

Alternative Number 5: Societal Uses of Mathematics

Reporting Categories: Societal Use

In reporting the condition of mathematics education in the United States, the Mathematical Sciences Education Board (1989, p. 2) outlined a series of societal uses of mathematics, offering a potential set of categories for the doing of mathematics.

Practical: knowledge that can be put to immediate use in improving basic living standards

Civic: knowledge to enhance understanding of public policy issues (A public afraid or unable to reason with figures is unable to distinguish between rational and reckless claims in public policy.)

Professional: knowledge as an occupational tool

Leisure: the knowledge and disposition to enjoy mathematical and logical challenges

Cultural: knowledge as a major part of our intellectual tradition

These categories have strong intuitive appeal for a number of reasons. First, they place immediate emphasis on mathematics use at all levels of society across major societal functions. They also make clear its pervasiveness. Each category is accessible and purposeful for all students. The set emerges from a study of the dimensions of the problem of improving mathematics education in the United States and leads readily to the use of problematic situations as the context for learning mathematics.


If, in conjunction with these reporting categories, an investigational and problem-solving approach were adopted, a student could engage in, for example, designing and furnishing of his or her bedroom within a given budget; analyzing and judging the arguments regarding the deforestation of Amazonia; drawing and analyzing the geometry and trigonometry of local Indian burial mounds; investigating the precise role of proportion in visual illusion; developing a paper on Mandelbrot; or speculating about numeric relationships. Such an investigational and problem-solving approach conveys a clear message about mathematics as something that everyone engages in purposefully and productively in the context of real situations.

These examples make a second feature explicit. The categories not only assume the integration of mathematics, they involve almost automatic connection with other content areas in the curriculum, with the student's personal life, and with the significant issues of everyday life. Selection of work in each of the categories can be tied readily to the individual and group interests of culturally diverse student populations. Each category is also amenable to efforts of different magnitude, whether group or individual.

The most obvious disadvantage is that, even more than with the process categories in Alternative Number 4, these categories have no overt relationship to mathematics and could be applied to any area of the curriculum. In addition, categories that focus exclusively on the uses of mathematics and ignore its invention implicitly leave the development to experts. This omission from reports on mathematical achievement for the entire population of students would impede democratization and fundamental change in epistemology.

Alternative Number 6: Cross-Cultural Genesis of Mathematical Ideas

Reporting Categories: Universal Human Activities That Have Prompted the Creation of Mathematical Ideas

Counting, Measuring, Designing, Locating, Playing, Explaining

A cross-cultural study of the genesis of mathematical ideas concluded that mathematics is a cultural technology that is invented by all societies. Every society develops the means to locate that is at the heart of geometrical and topographical reasoning. In each society, the most obviously mathematical activities are efforts to develop systems for counting and measuring. Design, both the abstract form and the abstracting process (the science of patterns), is at the heart of mathematical activity. It is essential to the development of counting, locating, and measuring systems, to the abstraction and invention of mathematical objects, and to the identification of more tangible objects and processes. Controlled, rule driven, speculative, and voluntary distancing from reality (briefly, play) is behind design and is essential to hypothesis development and modeling. Explaining focuses attention on the essence of the mathematical culture, the search for patterns that establish connections, and efforts to communicate their description effectively and elegantly. Communication is the representation of explanations. This distinction between explanation and communication becomes especially significant in the context of collaborative activity among a group of students (Bishop, 1988).

A set of categories based on this philosophy combines a focus on mathematics in use with emphasis on the invention and generation of mathematical ideas as a sense-making activity. The categories support reflective activity in the practical as well as the esoteric, fantastic, and theoretical sense. Furthermore, they lend themselves to the integration, rather than separation, of mathematical topics. This set of categories has the advantage of emphasizing mathematics in use and simultaneously being obviously mathematical. It focuses on mathematics as a discipline for making sense of the world and reasoning about itself without restricting the nature of mathematics to a particular philosophical metaphor.

In addition, the categories have the advantage over traditional designations (like abstractions, invention, proof, and application) of being more obviously based in cultural contexts while at the same time sounding both reasonably familiar and mathematical. The most important quality of this approach is that it emphasizes mathematics as a human activity undertaken by the great and small of all societies, individually and in cooperation, in accomplishments of tiny increments and huge leaps.

The set of categories selected for reporting should provide the information that the members of the educational system seek, whether parents, teachers, administrators, community members, or policy makers. It should support the traditional uses for accountability, placement, evaluation, guidance, and instruction. It should, above all else, support the two main concerns of providing information for monitoring and establishing a set of values for guiding change. In particular, it should answer such questions as:

1. Do students think mathematically and use mathematics reflectively for practical and theoretical investigation in the course of authentic mathematical activity?

2. Do students communicate their ideas with fluency, versatility, and appropriate technology in writing, speech, graphical representation, and appropriate symbols? Can they think on their feet, mathematically? Do they do these things adequately, effectively, accurately, and elegantly?

3. Do students have a sense of mathematical community? Can they work with others? How do they function as part of an investigational team?

4. Do they have a sense of the mathematical enterprise? Do they find mathematics valuable and fun? Do they have a strong sense of mathematical inquiry? Has the student's engagement in authentic mathematical activity engendered an enculturation in mathematics, a set of beliefs and understandings about the nature of mathematical knowing?

The need for direct evidence to answer these questions would suggest that the set of categories selected should address the doing of mathematics, for which Bishop's categories for the cross-cultural genesis of mathematical ideas are most appropriate. It also suggests that there should be categories to encourage collaborating, communicating, and developing a mathematical disposition. Thus, the following set of major categories is recommended to answer the educational system's most pertinent questions about mathematics education and to spur reform:


Alternative Number 7: Recommended Reporting Categories

Doing Mathematics: Locating, Counting, Measuring, Designing, Playing, Explaining

Representing and Communicating Mathematics: Mental and Represented Facility in Communicating Verbally, Visually, Graphically, Symbolically

Mathematical Community: Individual Activity, Collaborative Activity

Mathematical Disposition: Valuing Math, Confidence, Beliefs about the Mathematical Enterprise, Willingness to Engage and Persist

The first set of categories, Doing Mathematics, has subcategories that are not phrased in traditional mathematical terms. In other contexts, such mathematical terms as space, number, logic (Rucker, 1987), or logic, number, measurement, space, statistics (GAIM, 1988) have been used as a framework to describe mathematical activity. Unfortunately, the traditional terms that describe mathematical activity have become associated with a kind of school mathematics that is so sterile and divorced from the reality of mathematics, whether as a culture or in culture, that Bishop's (1988) alternatives are powerfully evocative. They are clearly close enough to traditional vocabulary for those who think in that language to make the connection, yet they emphasize mathematics in terms of active engagement, creative reflection, and productive effort. It takes little effort to see that number is included in the set of categories, but that counting emphasizes mathematical activity. Play is less obviously mathematical until one considers that intellectual "what-if-ing" in such domains as number, space, and logic is the essence of mathematical creativity. In fact, the subcategories of doing mathematics are essentially one classification of the mathematical problems that societies have addressed.

The categories are not, cannot be, and should not be, mutually exclusive. For example, explaining may be seen as communicating; in fact, one represents and communicates an explanation. Similarly, the categories for individual and collaborative activity are meaningless unless they apply to one or more categories of mathematical activity. Basically, the three categories for Mathematical Community, Communication, and Disposition would, of necessity, cut across all categories of mathematical Doing. In addition, one could not, for example, represent or communicate mathematics if there were no authentic mathematical activity to represent and communicate.

The intermeshing of the categories should contribute to an integrated and collaborative approach to mathematics education. First, with the exception of counting and measuring, the categories do not resemble existing expression (and, therefore, existing breakdown) of aspects of mathematics. Second, the categories are more open than traditional vocabulary to interdisciplinary effort and a more realistic mathematics education. Finally, when examined in conjunction with subcategories for individual and collaborative work, such distinctions as explanation, communication, and representation present clear opportunity for diverse individuals to make different kinds of contributions.

THE CONTEXT FOR GATHERING ASSESSMENT INFORMATION

However appropriate the categories for reporting evidence of mathematical power may be, they are corruptible. Changing one part of the assessment system does not guarantee that another facet of the system will not corrupt it. Suppose, for example, that the assessment of student performance in each reporting category were undertaken as a discrete act with discrete items designed especially and exclusively for a particular category, whether multiple-choice in format or not. The result would be to maintain the perception of mathematics as a collection of unrelated pieces. Similarly, if tasks are prescribed so that there is no student choice or initiative in the process and no opportunity for student self-evaluation, mathematics will continue to be seen as a compendium of expert knowledge to be covered. If mathematical discourse is assessed independently of authentic mathematical activity, it will be learned as a skill with slight chance for real application and transfer.

The Nature of Tasks

The potential corruptibility of any set of categories makes it clear that the attributes of the assessment task affect the validity of subsequent inference to such a degree that they must be considered an integral part of the evidence. This is already the case for very minor changes in wording between almost identical versions of multiple-choice questions and is also the basis for arguments of cultural bias in tasks. If one translates the traditional handshake problem from handshaking to kissing upon greeting, this is more evident: boys do not usually kiss boys; men in some societies kiss, but do it on each cheek. It is even more obvious when one is comparing students' performance on mental tasks, written tasks, discussion, or in alternate forms of representation. One cannot, with reliability, generalize from one task type to another.

In other words, task type and context are important qualifiers of evidence. They are the frameworks within which students can experience and demonstrate mathematical power. A rich task should be a microcosm of mathematical activity, open to student engagement in more than one, or either of several, categories of mathematical doing. Whether the task is generated by the student or the teacher, the quality and range of tasks and the nature of the mathematics addressed are strong indicators of the quality of the curriculum within which the student has experienced mathematics.

This is important because our efforts, to date, have been to administer a large number of small and relatively uniform tasks (which in no way could be regarded as microcosms of mathematical activity) in an attempt to describe performance in a specific cell of a matrix of competencies. This practice suggests that criterion-referenced tasks that are designed meticulously to elicit performance in one narrow domain provide evidence that can only be construed very indirectly as representing mathematical power. Time available for the task is part of the task context. Aggregation of performance on narrowly construed tasks undertaken under time constraints cannot be regarded as evidence of student power in more complex or time-elastic situations.

If you want to know what an individual can do in a difficult situation, under a short time frame, using a computer, either alone or as part of a team, you put the individual in a simulation of that situation and observe closely. If you want to know whether a student can talk, write about, graph, or present a logical argument in conversation about the mathematics of the task under consideration, you assess it directly by specifying context. The same is true regardless of whether the purpose is to find out if the student can produce a reasonable estimate under time pressure or if, after extended reflection, he or she can put forward a conjecture with an elegant explanation or use technology to create and investigate models.

An implication is that assessment tasks should vary with respect to time frame, familiarity, available technology, and social context. This inevitably means that some tasks might call for a simple and rapid decision, others for more contemplative judgment, yet others for extended and collaborative effort. In existing assessments, task type has been rigidly controlled and restricted. Measurement issues required narrow domain specification and economics dictated machine scoring; thus, the exclusion of open-ended answers and variable forms of representation. However, assessment of mathematical power requires a range of task categories and contexts in which the student is culturally comfortable. This is essential to the reporting and fostering of reform in school mathematics.

Few countries are so constrained in the tasks they set for students as ours is. Some of the following task types are being used in England, The Netherlands, and Australia (see de Lange, 1987; Department of Education and Science, 1985; Collis, Romberg, & Jurdak, 1986; School Mathematics Project, 1988):

Extended Project Work (Individual and Group) (lasts about two weeks/five times per year) (School Mathematics Project [SMP], 1988)

Open-Ended Tasks (Individual and Group) (GAIM, 1988) (last from 20 minutes to 90 minutes)

Mental Facility Tasks (last about 15 minutes; include judgments on spatial tasks, such as 3D rotations, as well as rough or limited computation)

Two-Stage Tasks (initially undertaken as test items and then taken as homework for further exploration) (de Lange, 1987)

SOLO Items (multiple test items linked to a single, more complex stem) (Collis, Romberg, & Jurdak, 1986)


As we reconsider the parameters of tasks in the context of efforts to change the epistemology of school mathematics, it is essential to acknowledge that the prescription of tasks implicit in outcome-based measurement (Sirotnik, 1984) detracts from that goal and also imposes culture. The School Mathematics Project recognized the importance of school-based and student-initiated tasks (and, thus, local and individual freedom to control culture as well as other aspects of context) in its criteria for designing and evaluating extended project work (SMP, 1988). It opted deliberately for templates to analyze extended tasks which recognize the right of students not only to choose, but to conceive the focus of their own tasks. A similar move in California would restore local control while maintaining national standards and at the same time contribute to the democratization of the curriculum.

The Evaluation of Work

If one really subscribes to the idea that a change in the authority structure of school mathematics is essential to real change in its epistemology, that authority must be seen to transfer from external experts to the school, the teacher, and the student. The assessment process, including its tasks, is a key part of that process. As long as assessment is entirely an external dictate rather than a collaborative effort, the final reality for students is that they must learn somebody else's mathematics as opposed to holding their own mathematical ideas up for cooperative assessment by the total mathematical community, which includes their peers.

One effective strategy for democratizing school mathematics would be for tasks to include a strong element of choice, self-evaluation, and peer review. Student self-evaluation and peer review could be moderated either by the teacher, or on a sampling basis, by the assessment program. Self-evaluation and choice would serve the dual purpose of changing the authority structure and fostering the habit of self-analysis.

There is an additional advantage: choice allows identification of extraordinary achievement, whether in depth or range. Without choice, there is a serious problem of identifying appropriate range and level of specificity in the reporting categories. With choice, the range of mathematics undertaken by students can flex, and thus can be described more precisely.


The Strategy for Collecting Evidence

Each assessment task should be a sound instructional activity. Given that direct evidence is essential, alternatives to existing assessment tasks are required. There are several reasons why this should be the case:

a. It is quite obvious that the alternative assessment tasks described or envisioned (Stenmark, 1989) take considerably longer than the typical timed test. If an adequate number and range of tasks is to be engaged in, it makes sense for them to be an integral part of the curriculum.

b. One of the expressed intents is for the assessment to affect teaching. Hence, developing tasks that look as much as possible like the kind of teaching they intend to encourage would be the fastest way of producing an impact.

c. If tasks are sound instructional activities and take considerably longer than standardized test items, they should be incorporated into instruction. This implies continuous assessment, which serves to have a greater impact and simultaneously provides teachers with more usable information about students than is derived from standardized tests. It also provides more authoritative information for regular communication with parents.

d. If there is a major element of choice, self-evaluation, and school-based administration and analysis, the strategy will effectively and rapidly reskill the teachers, one of the biggest single challenges of reform.

There Should Be Agreed Aspects

Categories of tasks, common categories for analyzing evidence, common standards for judging evidence, a standard language of description, and formal organizational strategies for arriving at interjudge agreement need to be decided upon. Direct evidence is essential, the context is an essential part of the evidence, and common sense suggests that the assessment tasks be incorporated into regular instruction. However, additional problems need to be resolved if the reporting categories are to be effective. The first problem is that of finding an alternative to standardized tests for the purpose of accountability, a matter of valid assessment. The second is one of describing the quality of performance.

It is common knowledge that the high school of origin, as opposed to grades or test scores, is the most significant predictor of students' success in college. Standardized testing proliferated because transcripts from a particular school could be relied on to reflect little more than credit accrual and seat time (Wiggins, 1989). Thus, regardless of whether assessment tasks are a one-time affair or become part of instruction, it is essential that there be universally respected interjudge agreement if categories that rely on school-based assessment of such things as student portfolios of extended project work are to be usable for monitoring and reporting students' mathematical achievement.

Interjudge agreement, whether on a large or small scale, requires common rubrics for defining, undertaking, and analyzing tasks. The SMP (1988) assessment sheets for open-ended tasks (Figures 12-2 and 12-3) illustrate the use of rubrics to guide analysis, set standards, and enable the process of interjudge agreement. The investigational sheet in Figure 12-2 guides teachers through the process of assessing a student's mathematical investigation of a pool table. It makes clear what the agreed-upon criteria are for arriving at a grade, it identifies the teachers' task-related modification of the criteria, and it measures the student's assessed performance. Figure 12-3 demonstrates a similar set of rubrics and place for teacher comments for a student who conducted a practical project investigating the design of packaging for Smarties (similar to M&M candies). In addition, during the course of the project, the teacher can make notations and comments on the student's diary of the project. To arrive at reliable coding, teachers receive advanced training and also meet collegially with other teachers in their school and region to discuss grading. The strategy both enables teachers to arrive at interjudge agreement and also communicates the kind of mathematics sought.

The SMP (1988) strategy for logging, commenting, judging, coding, summarizing, and arriving at agreement about student work makes possible the incorporation of assessment as a routine and productive part of instruction. It also provides a basis for the development of a common language for describing the


[Figure 12-2. SMP Assessment Sheet for Investigational Project (SMP, 1988, p. 5). Note: This grid is backed up by substantial written material in an accompanying handbook. The assessment grid itself is not legible in this copy.]


[Figure 12-3. SMP Assessment Sheet for Practical Project (SMP, 1988, p. 6). Note: This grid is backed up by substantial written material in an accompanying handbook. The assessment grid itself is not legible in this copy.]


tasks and the quality of achievement, which is essential if instructional assessment is to serve accountability purposes. Thus, for example, conclusions about the quality of verbal communication would result from the use of rubrics ranging from sloppy to precise and rigorous, and from inarticulate to articulate and elegant, to analyze the quality of mathematical communication about a design project. Definition for, and by, teachers of the quality of communication that would satisfy each aspect would do much to set and implement standards among the participating schools.

In summary, rubrics for analysis and scoring of alternative assessment tasks would help ensure uniform judgment of quality, raise standards, and promote interjudge reliability. Furthermore, teachers cooperating to create and use agreed rubrics for commenting on student work would, in essence, integrate assessment with instruction. In a context in which they were reasonably sure of the reliability of their judgments against those of other teachers, they would also have their own grading for valid and timely information for instructional decision making. At the same time, the formal rubrics and strategies for accomplishing interjudge agreement would support development of alternative assessment tasks, prompt efforts to improve the mathematics curriculum, and help instill public confidence in the use of school-based information for accountability.

Multiple measures (NCTM, 1989) and an agreed-upon language permit significantly more informative statements about mathematical power, such as:

The student invents elegant algorithms to design three-dimensional movement of robot arms, often in familiar contexts.

In group endeavors, the student suggests and engages in significant algebraic generalization and conceives extensions, usually in the familiar context of dairy farming.

Sixty percent of eighth-graders in Lodestone, California, clarify assumptions about their projects; a few of them do so elegantly and creatively.

If there is an agreed language tied to common constructs for what may be regarded as elegant and creative in Fish Creek,


Wisconsin, then Wisconsin's mathematical power will be of a similar quality to that described in the same terms at Pyramid Lake, Nevada, and Lodestone, California.

Therefore, in addition to the intellectual underpinnings, practical and logistical strategies are needed for arriving at interjudge agreement on standards of work between teachers within a school and between schools in different districts. Acceptable interjudge agreement will be achieved only through teachers' membership in a statewide community, collegial meetings within a school, and regional meetings between schools and between districts to arrive at and calibrate judgments. These structures exist in most states, but in such fields as athletics and music rather than in mathematics and language.
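The chapter does not prescribe any particular statistic for checking interjudge agreement. As one possible illustration only, the sketch below computes simple percent agreement and Cohen's kappa (a standard chance-corrected agreement measure, chosen here as an assumption) for two teachers grading the same set of student projects. The teacher names, grade labels, and scores are hypothetical and are not taken from the SMP or Vermont materials.

```python
from collections import Counter


def percent_agreement(grades_a, grades_b):
    """Fraction of pieces of work on which the two judges gave the same grade."""
    if len(grades_a) != len(grades_b) or not grades_a:
        raise ValueError("both judges must grade the same non-empty set of work")
    matches = sum(a == b for a, b in zip(grades_a, grades_b))
    return matches / len(grades_a)


def cohens_kappa(grades_a, grades_b):
    """Agreement between two judges, corrected for agreement expected by chance."""
    n = len(grades_a)
    observed = percent_agreement(grades_a, grades_b)
    counts_a, counts_b = Counter(grades_a), Counter(grades_b)
    # Chance agreement: probability both judges pick the same grade independently.
    expected = sum(
        counts_a[g] * counts_b[g] for g in counts_a.keys() | counts_b.keys()
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both judges used one identical grade throughout
    return (observed - expected) / (1 - expected)


# Hypothetical rubric grades from two teachers for the same six projects.
teacher_a = ["A", "B", "B", "C", "A", "B"]
teacher_b = ["A", "B", "C", "C", "A", "A"]

print(round(percent_agreement(teacher_a, teacher_b), 3))
print(round(cohens_kappa(teacher_a, teacher_b), 3))
```

A figure of this kind could be one input to the collegial and regional meetings described above; it would not replace the discussion of what the rubric categories mean.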

Vermont, for example, has not previously had state assessment. It seeks now to design a state mathematics assessment based on a combination of uniform tests, student portfolios, and student surveys. Its uniform assessment will enable NAEP comparison. Vermont's portfolio assessment in mathematics is intended to result in the assessment of more authentic mathematical activity. However, accomplishing portfolio assessment in mathematics requires substantial intellectual and practical preparation.

A committee of Vermont teachers has proposed that, for portfolio assessment in mathematics, teachers collect student work in folders. They decided to design portfolio assessment on the basis of assessing individual work and aggregating from that basis. For program assessment, entire portfolios of individual students would be assessed by an external team on a small sampling basis. For student assessment, the best three work examples of each student, selected by teacher and student, would be submitted for external review.

The committee solicited examples of student work and met to consider how each might be evaluated and how the resulting data might be used for state assessment. Student work that could be assessed in the context of a uniform test was set aside, and the efforts that drew the favorable attention of committee members were considered more closely. In the process, several things became apparent:

there was a generally agreed upon, but poorly articulated, perception of the quality of mathematical activity among the group;


as group members worked to spell out to each other what they were seeking as evidence of mathematical power, a series of rubrics and common standards of judgment emerged;

even the best student work, including that of their own students, did not meet all the standards of excellence that members of the group were applying;

typical worksheets were judged of little value in the context of portfolio assessment and were to be excluded;

much student work had been brought to an end just as it had the potential to become mathematically interesting;

the most interesting student work emerged in the context of such activities as design and explanation;

articulate language and variety in representation by the students was, for those judging, a critical entry into the quality of their mathematical activity;

introductory paragraphs from the teacher and the student would add significantly to the meaning of the work and the ability of an external assessor to evaluate it;

some aspects of mathematical power, such as confidence, could be assessed only by the classroom teacher; and

individual teachers planned modifications of their own work with students to bring it more in line with the standards for judging mathematical power that they had been helping to articulate.

The committee's initial draft of a coding scheme was based on an implicit emphasis on problem solving and an explicit concern with the five goals of the NCTM (1989) Standards: becoming a mathematical problem solver; learning to reason mathematically; learning to communicate mathematically; learning to value mathematics; and developing confidence in one's own ability to do mathematics. Vermont has made a powerful start in the direction of collecting direct evidence of authentic


mathematical activity. The quality of evidence sought and the epistemological stance implicit in the proposed assessment structure are likely to spur real change. (See excerpts from the Report of Vermont's Portfolio Assessment Program in Appendix G.)

SUMMARY AND CONCLUSION: A RECOMMENDATION FOR CAP

The objective of this paper has been to propose reporting categories for the CAP that would simultaneously monitor and promote reform in mathematics education. The object of reform has been defined as the attainment of mathematical power. This is a complex and multifaceted view of mathematical literacy that requires a different epistemology for school mathematics.

A number of bases for forming reporting categories were outlined. Each reflects significant aspects of information about students' mathematical achievement and has advantages and disadvantages. Direct evidence of mathematical power requires categories that focus on the active use, generation, and communication of mathematical ideas in problematic situations and in collaborative contexts that are based in a familiar culture.

However, the sine qua non for reform is change in school beliefs about the nature of authentic mathematical activity and the character of the mathematical enterprise. Unless students have experience in generating mathematical ideas (seeing mathematics as part of their culture and becoming encultured into the mathematical enterprise; Bishop, 1988), little of substance will have changed. The set of categories recommended has these two changes as its cohering purpose.

Nevertheless, the most appropriate set of reporting categories is still corruptible by other facets of the assessment system. It is clear that to gather the kinds of direct evidence regarded as essential will require data collection strategies different from those now in place for CAP or for any other general assessment program. A representative range of potential strategies has been succinctly and effectively summarized in Assessment Alternatives in Mathematics: An Overview of Assessment Techniques for the Future (Stenmark, 1989).

It is obviously impossible for CAP to use such measures for external assessment without creating a massive and expensive assessment bureaucracy. Such a strategy would be undesirable


because it would be antithetical to the kind of epistemological change that has been advocated. The logical strategy is for CAP itself to re-empower teachers by involving schools and teachers in a collaborative assessment process that includes an element of student self-assessment. This would be a major step toward initiating epistemological change. It would require training and inservice for teachers, as well as common templates for task analysis, a common language of description, and formal organizational structures for developing and maintaining interjudge agreement.

The implication is that the California Assessment Program needs to develop a new character and a new program. To guide schools and teachers in the assessment process, it needs to:

set standards for school-based assessment;

train teachers to gather evidence;

provide a structure for developing interjudge agreement;

provide quality control over the process;

act as a mentoring and moderating authority to initiate, sponsor, and adjudicate; and

collect, analyze, and disseminate a much more complex range of information.

Assessments that incorporate student self-assessment, teacher involvement in the assessment process, alternative strategies for gathering information, and other such efforts are possible and being used in some places. Furthermore, the kinds of categories, the concerns discussed, and the solutions suggested have been independently arrived at in disciplines other than school mathematics. There is a converging view that the kinds of categories proposed (authentic activity in the domain, collaborative activity, facility in communication, and enculturation into the domain) have broad significance for authentic assessment. One cannot have both total freedom and total control, and democratic strategies are as essential to change in the epistemology of school mathematics as they are to the national economy.


13

Evaluation: Some Other Perspectives

Philip C. Clarkson

"Yes, but who will change the tests?" (National Council of Teachers of Mathematics, 1989, p. 189)

It is apparent that "the tests" referred to are not the tests teachers give in their classrooms on a day-by-day or weekly basis. They already have control over these. "The tests" are the standardized assessment instruments which are used throughout the United States, often authorized by legislation, devised by commercial organizations, and seen by many teachers throughout the country as being a forceful factor in structuring the mathematics curriculum. The preceding papers in this volume introduce the issue of current testing practice into the ongoing debate and ferment that surrounds mathematics today. This chapter sketches developments in the State of Victoria, Australia, over the last 25 years, where there is only one external test given at the end of the school system, in Year 12. This contrasting situation may contribute constructively to the ongoing debate in both Australia and the United States as to how to monitor the work of schools.

EVALUATION AND ASSESSMENT: GENERAL THEMES

The introductory paper and the two following papers tend to "set the scene" for the rest of the volume. The opening paper provides an overview, and the second paper a historical perspective in which to place the looked-for changes in assessment. It is well to remember that what is being undertaken today is not new in a fundamental way. Today's changes are the latest in an ongoing process. The third of these papers focuses discussion on the 1989 NCTM (National Council of Teachers of Mathematics) Standards. Within these three papers are found most of the themes on which other contributors to the volume elaborate.

One of these themes has to do with the fact that standardized tests have gained an undue influence over the curriculum. It has become evident that no matter what is in the syllabus, teachers will teach to what is examined. There are two points to be made about this. First, since by and large these tests examine skills and knowledge, school mathematics has reflected this fact in its emphasis on teaching skills and knowledge. It is noted that such an emphasis is quite at variance with the proposed changes in the curriculum. The second point stresses the fact that since teachers are very conscious of "what is examined," a change in the assessment procedures to emphasize the new goals will encourage teachers to change as well. This circularity has been summed up in the now familiar phrase, "What is tested is what gets taught" (Mathematical Sciences Education Board, 1989, p. 69). Whether standardized tests can be changed in such a way as to accurately reflect the changes proposed for the curriculum is examined; on balance, it seems doubtful whether they can be. Hence, the specific role that systems-wide testing may have in the changed educational environment will need to be examined explicitly.

Another theme that emerges in this volume follows from the last. The fundamental view of knowledge embodied in the present standardized testing procedure is the notion that knowledge, in this case mathematical knowledge, is out there waiting for students to consume it. The role of the teacher is to serve up this knowledge of mathematics in such a manner that the students will actually ingest it and assimilate it appropriately. It follows, in this analogy, that the role of assessment is to cue students to regurgitate their knowledge, often in forms which are only slightly digested. It is recognized that few students fully digest such knowledge, at least according to what the tests tell us. And it is very difficult in any case to unscramble digested knowledge reliably when you only use multiple-choice items. So "unscrambling" is not often attempted, and because of the status that standardized testing has gained, any attempt at doing so is devalued.

But this volume argues that a far more rewarding way of thinking about knowledge in the educational context is to think of it as a process. It is a process by which students construct their own meanings for this area in their lives called mathematics. If this is so, then assessment needs to be thought of also as a process that provides some indication of the meaning students have accorded mathematics in their lives. It is like taking snapshots of a moving target. Perhaps different types of cameras positioned in strategic locations will be necessary to create an overall picture of what students know. Maybe it also calls for full investment on the part of the person most closely involved with them in the process, the teachers, and even other persons close to them, such as parents and peers. This approach is not compatible with that implied by standardized tests, which are supposed to provide an objective, clinical, scientific assessment of what students know, but more often tend to indicate what they do not know.

Following naturally from the above, the third theme emerges: this is the declaration that standardized tests cannot serve as appropriate assessment instruments for the collection of information of interest to diverse parties. And yet it is these scores that are used to tell teachers how their classes are going, to tell bureaucrats how specific schools are doing, to tell politicians how districts are doing, and to tell the nation how its education system is doing. In some areas, the results from these tests are also applied to individual students; hence, they and others have an interest in finding out how they are doing as well. The indefinite terms "doing" and "going" have been used advisedly. Each of these different groups responds to essentially the same set of data, massaged in slightly different ways, to be sure. However, the meaning for each group is quite different. Bearing in mind that a specific objective should be articulated for any assessment instrument, it is hard to believe that one standardized test can be used with confidence to respond to the wide range of interest represented by the constituencies named above.

The last theme to deserve comment involves specifically one of the interested groups named in the last paragraph. Increasingly, the government and, particularly, politicians are demanding accountability for the many dollars invested in education. There seems to have grown up in both the United States and Australia a management/economics style of government modeled in some ways on big business. Hence, virtually everything has to be accounted for in monetary terms. It is construed as a mark of responsible government for bureaucrats to demand that systems that receive federal funding, such as education systems, act for the public good. However, governments have often found that obtaining accountability reports that are meaningful is not always easy. As a case in point, the teaching/learning process in many ways is "messy," and in certain respects not easily reduced to simple terms. Like any process in action, it is often hard to capture its diversity and essential qualities at the same time in a simple account. But governments more often than not prefer simple accounts, and if the targeted data can be reduced to numbers, all the better. Numbers can be manipulated in many ways, and their use conveys the impression of objective, even scientific reporting. The results of extensive, mandatory tests comprise one such set of figures. But these, with other sets of figures, do not capture the real story of what is happening in schools and, therefore, in the system as a whole. The politicians are selling themselves short. There are good stories to tell and some that are not so good, but the telling is more ambiguous and complex than any set of figures can convey. This theme needs to be addressed more directly in any ongoing discussion of the means by which individual groups both within the system and outside it can communicate effectively with each other.

SPECIFIC ISSUES ASSOCIATED WITH MATHEMATICS ASSESSMENT

In the above section, I have attempted to cite some of the general themes in this volume that are of greatest importance. However, there are also a number of specific issues that add other perspectives to the debate on evaluation. A few of these are summarized below.

The Use of Calculators and Microcomputers in the Classroom

One of these issues is the diverse response within the mathematics education community to the role of calculators and their place in assessment procedures. Electronic calculators have been with us for more than twenty years. Their incorporation into the curriculum, although called for some time ago, is only really happening now. But the use of calculators in assessment procedures is still not a fact of life. Without question, there is still a residual resistance to the use of this technology in the classroom. Perhaps the reorientation to doing mathematics argued for in the Standards has not yet been accepted by many teachers.

Perhaps these factors, among others, play a role in the reluctance to incorporate the use of calculators into assessment procedures. At least they raise interesting questions: If there had been a strong push for calculator use in assessment procedures early on, would this have led to a greater acceptance of them in the classroom, remembering that "What is tested is what gets taught"? In turn, would this have led to different types of standardized tests being produced? Or has it partly been the ingrained place of standardized tests in the system and the nonuse of calculators in the tests that meant that this technology was regarded in a neutral or negative light by teachers? Furthermore, the corporations that produce such tests may even have regarded calculators as a threat. Given the extensive descriptions of test-item production provided in this volume, how much control do teachers and others at the district level have over standardized tests? These questions are worth considering.

Interestingly, in this volume there is no treatment of the role of the microcomputer in mathematics assessment. Perhaps with so much specific attention directed to standardized testing, this issue was less compelling. One hopes that if there is no role currently accorded micros in system-wide testing, this will not in turn devalue the microcomputer. That clearly would be at odds with the general sentiment of contributors to this volume regarding the use of technology in the teaching of mathematics. Rather than take that road, which may have been the one pursued in relation to calculators for too long, it may be better to question the place of standardized testing and its justification. But more of that later.

Problem Solving and Assessment

While it seems that calculators have not been accepted by the developers of standardized tests, the term problem solving has been. But an examination both of the papers in this volume that describe the production of these tests and those papers that question their past and present use clearly shows that the term is being used in a way different from that directed to the teachers by the NCTM. In the Standards, problem solving is seen as an all-pervading approach to mathematics. It is regarded as one way of doing mathematics and not simply as one cognitive level compared to, say, analysis. Yet the test developers have seen the term problem solving as another such category (leaving aside the question of the very use of categories canvassed extensively in this volume). Is this use of the term problem solving a cynical effort to dress the old tests in new terminology to make them acceptable in the new environment? Or are we witnessing a real shift, an initial attempt by the testing industry to respond to new directions? It is easy to believe the former in light of the analyses presented in these chapters, but perhaps the question needs closer examination. The one exception, the one state program approach which has responded to the call and uses problem solving in a manner compatible with that of the Standards, is the development of assessment in California.

Gender Bias in the Mathematics Classroom and in Assessment

Another specific area examined in this volume is that of gender differences in performance on tests composed of multiple-choice items. Since these types of tests predominate in standardized testing, the conclusion that such items may in certain cases favor males gives us pause with respect to the use of such instruments. This issue, too, encourages the use of other forms of assessment. It is suggested in a number of papers that a variety of assessment methods be employed. Indeed, using the analogy of a series of cameras positioned in different locations, the same point was made in the previous section of this paper. However, if there is concern about gender effects in multiple-choice items, there is clearly a need to investigate whether other forms of assessment are prone to the same type of problem.

One aspect of mathematics which is promoted in the Standards is that of communication. Certainly within the verbal discourse which goes on in the classroom at present there are gender differences. It has been known for a long time that teachers react to males and females in different ways. In a recent paper dealing with this issue, Leder (1990) noted that in Grades 3, 6, 7, and 10, Australian males interacted more frequently with teachers than females and tended to dominate the attention of the teacher. However, she noted that the differences in interactions were subtle. For example, although there were the same number of questions asked of males and females by the teachers, the type of response made by the teachers differed relative to the gender of the student in Grades 6, 7, and 10. In these grades, there was an indication that the teacher waited longer for answers from females on low cognitive questions, but for males they waited longer when the question was classified as high cognitive. Leder went on to suggest that these and other differences she found may well be signaling the students that there are differences in the mathematics they are supposed to construct that are dictated by their gender. Since both questioning and the model of questioning that the teacher employs are essential aspects of mathematics, as well as a fundamental aspect of the teacher's assessment strategy, any gender bias needs to be recognized for what it is.

Another aspect of communication promoted in the Standards is for students to write about their mathematics. Again, there may well be gender differences pervading this activity which the mathematics teacher needs to be aware of. Perhaps an examination of writing in language classes would be a useful place to start in investigating this, bearing in mind that there is no guarantee that results in such contexts will transfer exactly to a mathematical context.

Other Forms of Bias in Mathematics Assessment

Some further potentials for the examination of bias in assessment procedures may be those of language and culture. These have not been examined in this volume but need also to be addressed. There is the whole aspect of dealing with Mathematical English in monolingual classrooms and how that impinges on assessment procedures (see, for example, Newman, 1983, and Watson, 1980). This will clearly overlap with aspects of the gender bias issue noted above. However, it is also quite evident that in a significant number of classrooms in the United States (see Secada, 1990) and, for that matter, in other pseudo-monolingual countries such as Australia and England, there are many students whose mother tongue is not English and who are members of a minority culture. There has been some research on assessment and bilingual students (Cuevas, 1984), but the tests were of traditional types and were not conclusive. There has also been recent comment on students' cultural backgrounds and on the different styles of learning environments for mathematics (see Clarkson, 1991). It may be that for some cultures, using small groups might prove to be a decided advantage, but for others a disadvantage. If, however, assessment procedures for problem-solving situations envisage the use of cooperative small group work, then it may be important to look at the implications of bilingualism and the multicultural environment in which such groups may operate.

Other Issues

If large-scale testing is still to have a role to play in American education systems, then Mark Wilson's paper in this volume may prove valuable. It will certainly enable tests to be developed that are alternatives to the instruments of today. It is also of interest to note that work on linking the SOLO model with the work of van Hiele is already underway (Pegg & Davey, 1989). The new types of descriptors examined in Wilson's paper would especially be of use. However, these descriptors have the added advantage of being useful to teachers as well, and hence empowering the process of teaching. They clearly imply that more than one type of assessment procedure must be used for a clear picture to emerge of what happens during the teaching and learning of mathematics in the classroom. The brief report of the Australian research projects extends this idea.

The summary of results from journal writing developed in Victoria is perhaps more than just one other example of how teachers can gain insight into the way their students develop mathematical ideas. It is more too than a student's own record of self-dialogue. In some of these results, there is an indication that the change called for in the Standards may be attainable. The results suggest that teachers may well be more interested in the strategies that students use in problem solving than in whether they have acquired the set knowledge. The reports that students can devise their own maps of knowledge serve as an interesting parallel to the call for teachers to do just that. The data that suggest students can distinguish between the difficulty of a problem and the type of thinking required to solve it are also of interest. These last observations in combination with other data lead to the conclusion that if students are given appropriate techniques and an environment conducive to learning, they can indeed assume control of their own learning process. This set of data also clearly shows that feelings on the part of students and teachers are part of the whole learning/teaching/assessment process. To ignore them is to ignore an important element in that process. How do we recognize feelings and their role in this new approach?

SOME PERSONAL NOTES ON ASSESSMENT

The papers in this volume discuss current assessment procedures in mathematics, including the use of standardized testing in the United States, and suggest a number of options for consideration as the impact of change in mathematics curriculums is felt in the classroom. However, there has been little attempt to open up directly the question of whether systems-wide testing procedures in mathematics should continue to be employed in the U.S. now that the new curriculum changes are taking hold. There have certainly been some implicit suggestions that such testing will need to change radically; the feeling is that such testing mechanisms will not be useful in the future. Change has also been occurring elsewhere. It would be illustrative to sketch briefly an example of change in schools where there has been little use of systems-wide mandatory testing for many years. The contrasting situation may provide another perspective to the ongoing debate.

Change in Mathematics Assessment in Victoria's Schools

Toward the close of the 1950s in Victoria, there was mandatory assessment at the end of Years 10, 11, and 12. Prior to this time, there had been examinations in earlier grades as well, but they had been dispensed with. By the early 1960s, most schools were even afforded the privilege of assessing Year 10 students internally with no recourse to external, education-department-approved tests. Indeed, some larger schools were accorded the privilege of assessing Year 11 students internally. Schools were accredited on the strength of how well qualified their staffs were and who had experience teaching the subjects in question. Part of this accreditation was based on the strength of reports made by departmental inspectors who visited the schools and classrooms on a regular basis.

The curriculum that was used in schools at that time reflected the expectations of the system. The system was like a tunnel. Students entered at one end and it was presumed that they would progress through the school experience until at the other end of the tunnel they made the transition to the university level. Melbourne University, the only university in Victoria at that time, had a great deal of influence on the school curriculum, viewing it primarily as a private preserve. The reality, of course, was very different. Most children were pushed out of the system, or left of their own accord, before Year 11. However, for many years the curriculum, based on the traditional schooling system inherited from the English, simply did not reflect this situation.

There were three mathematics subjects each in Years 10, 11, and 12. In Year 12, the subjects were designated Pure, Applied, and General Mathematics. Students wishing to take tertiary courses which required mathematics took the Pure and Applied level courses. General Mathematics was considered an easier option and was not recognized as a prerequisite for future study. It had been introduced as a way for returned servicemen from the war to meet the university's entry requirement for first year students, which mandated that they pass a modern language or a mathematics test at Year 12 level. However, in succeeding years, some faculties at Melbourne University did recognize General Mathematics in fulfillment of their prerequisite requirements. Each mathematics examination was composed of about ten extended items. Students were advised that complete answers to about seven would be worth full marks.

The year 1966 was important in Australian education. At that time new courses of study for Years 10, 11, and 12 were issued by a new board of the department of education. Suggested courses of study for Years 7 through 9 were also included, and these, for all intents and purposes, became the official syllabi. This board was composed of representatives of the education department, teachers, Melbourne University faculty, and, importantly, faculty of the new Monash University, among others. The singular influence of Melbourne University was now challenged. The new board proposed two syllabi for each mathematics subject: a school could choose to teach a variation of the traditional syllabus, or a new one based on the New Mathematics. By 1972, it was clear that the new syllabus would become the standard for all schools. One of the interesting points to note is the freedom given to schools to make the choice individually as to which syllabus to follow in the intervening period. The examination associated with this new syllabus was different as well. All components were of an extended item type. These were divided into two sections, the first containing items worth up to three marks each and the second section containing items worth up to ten marks or so. In 1966, the education department also issued its last syllabus for primary schools.

In 1972, the new mathematics syllabus became the sole standard. That year also saw the phasing out of Year 11 assessment requirements. This meant that Year 12 was the only year in which mandated assessment was required of students completing their secondary schooling. This examination was still heavily influenced by the tertiary education sector, other colleges also being represented on the examining board by then. However, while most students still left school before reaching Year 11, the school curriculum was just beginning to reflect this fact. Teachers were starting to take greater control of their teaching; and they were starting to teach their students, rather than being cowed into teaching a curriculum solely to prepare students for university courses, even though many would not, and never intended to, go to the university. Interestingly, it was in 1972 that the education department issued for primary schools a suggested course of study in mathematics rather than an official syllabus.

There were at this time in Victoria other education activities that also impacted on the schools. Throughout the 1960s, for example, the teachers' unions were asking the increasingly strident question: If teachers are professionals, why are they scrutinized at regular intervals? During the 1970s, first in the secondary schools (Years 7 through 12) and later in the primary schools (Years Kindergarten through 6), the education department withdrew its inspectors as observers of classroom teachers and, finally, withdrew them from the schools altogether. The curriculum continued to change and teachers took advantage of their newfound freedom. In mathematics, teacher groups like the Study Group for Mathematics Learning set out to encourage secondary teachers to explore new ways of approaching the teaching of mathematics, such as using supplementary materials in their teaching. Other groups, like the Rusden Activity Mathematics Project, set out to develop written materials for teachers to use based upon activities in the classroom. They eventually published twelve booklets for use in Years 6 through 8. Another group, the Mathematics Association of Victoria, became the nucleus of mass-based teacher organizations that developed and conducted many forward-looking workshops and conferences by and for teachers. Other educational developments of broad significance included the regionalization of the once extremely centralized department of education.

By the mid-1970s, the education department had re-formed its examination board and included representatives of employer groups as well as a wide array of education groups. The new board was not just to take charge of the Year 12 syllabus, but it was also to take an interest in the whole of the secondary sector of education. By the end of the decade, this board had revamped the Year 12 examinations. In mathematics, there were still the three subjects that had been offered since the 1950s; however, within each subject there was a designated core of study comprising about two-thirds of the content and then a number of options from which a school could choose to teach. One of these options dealt with computers in mathematics. The examination given at the end of Year 12 was no longer the only assessment tool used. The teacher was authorized to allocate a score for the study of the optional section. There were various coordinating devices used to help ensure compatible marks for students from different schools. Teachers attended meetings throughout the year during which the course was discussed. There was also a process of statistical comparison during which the internally allocated scores were adjusted to the mean and the standard deviation of the external scores obtained by a particular school.
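The statistical comparison described above can be illustrated with a short sketch. The source does not give the formula the board actually used, so this assumes the simplest reading: a linear rescaling of one school's internal scores so that their mean and standard deviation match those of the same school's external scores. The function name, the choice of population standard deviation, and the sample scores are all hypothetical.

```python
from statistics import mean, pstdev


def moderate_internal_scores(internal, external):
    """Rescale one school's internal scores to the mean and SD of its external scores.

    Assumption (not from the source): a linear z-score transformation using
    the population standard deviation.
    """
    mu_int, sd_int = mean(internal), pstdev(internal)
    mu_ext, sd_ext = mean(external), pstdev(external)
    if sd_int == 0:
        # All internal scores are identical, so there is no spread to rescale.
        return [mu_ext for _ in internal]
    return [mu_ext + (score - mu_int) * (sd_ext / sd_int) for score in internal]


# Hypothetical scores for one school (illustrative values only).
internal_scores = [55, 65, 70, 80, 90]
external_scores = [40, 52, 58, 61, 74]
adjusted = moderate_internal_scores(internal_scores, external_scores)
```

Under this assumption the rank order of students within the school is preserved; only the location and spread of the teacher-allocated scores change, which is what makes marks from different schools comparable.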

The re-formed board also recognized other subjects. The three traditional subjects were designated Group 1 subjects. There were also Group 2 subjects, which included no element of external assessment in their curriculums. All assessment was carried out within a school with various comparison strategies employed between groups of schools. An accreditation body was formed to oversee the operation of these subjects. In the mathematics area, Group 2 subjects included Mathematics for Work and Business Mathematics, among others. Not all universities recognized these subjects, but some other tertiary colleges and many employers did. A number of other certificates were also recognized along with the traditional end-of-schooling certificate. One certificate was based on the philosophy that the curriculum should be negotiated between teacher and student. An accrediting body was constituted by the education department which established broad guidelines on content and on procedures for comparative evaluation. Although not many schools incorporated this option into their curriculums, those that did reported great success with it. It will be appreciated that these alternative curriculum styles fostered experimentation with a number of different assessment styles.

In the mid-1980s, a revamping of the mathematics subjects in Year 12 finally resulted in the end of General Mathematics. This subject had been seen as the soft option by students for many years. Since the mid-1970s, mathematics teachers had been trying to have it deleted against the opposition of a number of university faculties. Another change was that calculators were expected to be used when completing the Year 12 examinations. These and other revisions of the curriculum were the beginning of a wider move by the department of education to update the curriculum in Years 11 and 12. Also discarded were the alternative Group 2 subjects, a move many teachers regarded as unfortunate. Among other non-curriculum changes instituted in the decade of the 1980s was the introduction of self-management in individual schools.

The New Mathematics Curriculum

A set of "new" mathematics units will be taught in all Victorian schools for the first time in 1991 in Year 11 and in 1992 in Year 12. The department has again stipulated a course for Year 11, the first time this has been done for twenty years. This has resulted in a certain amount of opposition from teachers who believe they are losing some control. However, the semester unit structure that replaces the year-long courses brings with it a lot more flexibility. The assessment procedures are of particular interest. In brief, there will be two categories of assessment: the first deals with "completion of the unit," while the second focuses on the "level of achievement." For each semester unit there are specific tasks which a student needs to finish in order for the subject teacher to report on the certificate that the unit was "successfully completed." This procedure is followed for both years. The teacher and school are also responsible for judging the "level of achievement" if a unit is taken at the old Year 11 level. The assessment techniques employed are at the discretion of the teacher and the school. However, for equivalent Year 12 units, there are four common assessment tasks which must be used. At the time of writing, it is believed that one of these tasks will be a 1½-hour examination that is external to the schools and composed of fifty multiple-choice items. This examination will be mainly aimed at skills.

The second task will be another externally set examination but one composed of extended-answer items designed to examine higher-level skills. Both of these examinations will be marked by external examiners. The use of calculators will continue to be expected when completing papers. The third task will be a project which will involve an extended writing assignment. A problem-solving task will be the fourth. These last two tasks will be evaluated by the subject teacher and then submitted to a process of comparison with other teachers and moderators before final marks are arrived at. These tasks could be completed individually, but there is scope in the procedures for group work as well. Indeed, it is hoped that small group work will be a common approach. Separate lists of problems for each of these tasks will be circulated, and two- to four-week time slots will be designated during which the tasks will have to be completed. Students will select the problems they wish to work on. The use of microcomputers is to be encouraged. Finally, in the reporting process there will be no attempt at combining the four resulting scores into a global score for a mathematics unit. The four letter grades per unit attempted are to be reported separately on the certificate that the student receives (Victoria Curriculum and Assessment Board, 1989).

This offers a general outline of how senior school mathematics has changed in thirty years in Victoria. From being dominated by the one university in the state via external examinations, the students now receive a certificate that indicates their achievements at the end of secondary school. The curriculum is not completely dominated by what is expected in the first year of university study. There is now recognition that only some students will choose to go to university as the next step after school. Teachers, reinforced by ex-teachers, have exerted great influence over the curriculum they teach, and indeed over the schools, not to the exclusion of outside interests, but in a balanced way. They have been expected to act as professionals. And they have done so.

During this process, it was recognized that the only acceptable point for control of an examination by those outside a given school was in Year 12. That calculators as well as microcomputers can be used when completing assessable tasks has been accepted. A way has been found to include a range of different types of tasks on which to judge a student's mathematical knowledge, and there is some room for the student's own choice of problem.

Victoria's system is not perfect. There are certain participants in the process who are not satisfied. Among these are the tertiary institutions. Since they have used a combination of students' Year 12 final marks as an entry score because of the ease with which such numbers can be computed (even though it has been acknowledged as an illogical computation), they are not happy with up to four letter grades on different tasks for each of up to twenty-four different units (both mathematical and nonmathematical units).

It is acknowledged that this process of change will not stop here. Change will continue, and the subsequent changes will undoubtedly be built on present experience. There have been a variety of programs offering alternative assessments for a number of years in Victoria. The present full scale implementation has drawn from many of them. The use of calculators was a gradual process in assessment procedures until fully implemented in the early 1980s. The present situation represents a point reached after many years of change.

Nor is the situation in Victoria a blueprint for any other state, region, or province. The quite different pressures and circumstances in each locality prevent this. However, this summary is offered as an example of what can be done: it is not perfect by any means, but a stimulating example perhaps for others.

SUMMARY NOTES

Perhaps the major thrust of the changes in mathematics looked for and epitomized by the Standards document is the need to empower the teacher. After all is said about systems-wide testing programs, the overriding feeling is they are predominately used to check on teaching quality. And in one sense, such a conclusion is correct. Perhaps the crucial factor in schooling is teacher quality. But to use student tests to judge teaching quality is to employ a rather indirect method. The teachers are certainly aware of the reason these tests are given and respond accordingly. However, their response is not positive, but rather one which prevents them from reacting in creative ways to the situations that arise in their own classrooms. A quote from the penultimate paper in this volume seems to sum up the point:

If one really subscribes to the idea that a change in the authority structure of school mathematics is essential to real change in its epistemology, that authority must be seen to transfer from external experts to the school, the teacher, and the student. The assessment process, including its tasks, is a key part of that process. As long as assessment is entirely an external dictate rather than a collaborative effort, the final answer for students is that they must learn somebody else's mathematics as opposed to holding their own mathematical ideas up for cooperative assessment by an entire mathematical community, which includes their peers (Zarinnia & Romberg, this volume, p. 275).

APPENDIX A

NCTM EVALUATION STANDARDS

General Assessment

Standard 1. Alignment

In assessing students' learning, assessment methods and tasks should be aligned with the curriculum in terms of:

its goals, objectives, and mathematical content;

the relative emphases it gives to various topics and processes and their relationships;

its instructional approaches and activities, including the use of calculators, computers, and manipulatives.

Standard 2. Multiple Sources of Information

Decisions concerning students' learning should be based on the convergence of information obtained from a variety of sources. These sources should embody tasks that:

demand different kinds of mathematical thinking;

present the same mathematical concept or procedure in different contexts, formats, and problem situations.

Standard 3. Appropriate Assessment Methods and Uses

Assessment methods and instruments should be selected on the basis of:

the type of information sought;

the use to which the information will be put;

the developmental level and maturity of the student.

Use of assessment data for purposes other than those intended is inappropriate.

Note: From the "Overview of the Curriculum and Evaluation Standards for School Mathematics" (An Abridgement and Excerpts of NCTM's Curriculum and Evaluation Standards for School Mathematics). Prepared by the Working Groups of the Commission on Standards for School Mathematics, NCTM, October 1988, pp. 16-18.

Student Assessment

Standard 4. Mathematical Power

The assessment of students' mathematical knowledge should seek information about their:

ability to apply their knowledge to solve problems within mathematics and in other disciplines;

ability to use mathematical language to communicate ideas;

ability to reason and analyze;

knowledge and understanding of concepts and procedures;

disposition towards mathematics;

understanding of the nature of mathematics; and integration of these aspects of mathematical knowledge.

Standard 5. Problem Solving

The assessment of students' ability to solve problems should provide evidence that they can:

formulate problems;

apply a variety of strategies to solve problems;

solve problems;

verify and interpret results;

generalize solutions.

Standard 6. Communication

Assessment of students' ability to communicate mathematics should provide evidence that they can:

express mathematical ideas by speaking, writing, demonstrating, and depicting them visually;

understand, interpret, and evaluate mathematical ideas that are presented in written, oral, or visual forms;

use mathematical vocabulary, notation, and structure to represent ideas, describe relationships, and model situations.

Standard 7. Reasoning

The assessment of students' ability to reason mathematically should provide evidence that they can:

use inductive reasoning to recognize patterns and form conjectures;

use reasoning to develop plausible arguments for mathematical statements;

use proportional and spatial reasoning to solve problems;

use deductive reasoning to verify conclusions, judge the validity of arguments, and construct valid arguments;

analyze situations to determine common properties and structures;

appreciate the axiomatic nature of mathematics.

Standard 8. Mathematical Concepts

Assessment of students' knowledge and understanding of mathematical concepts should provide evidence that they can:

label, verbalize, and define concepts;

identify and generate examples and nonexamples;

use models, diagrams, and symbols to represent concepts;

translate from one mode of representation to another;

recognize the various meanings and interpretations of concepts;

identify properties of a given concept and recognize conditions that determine a particular concept;

compare and contrast concepts with other related concepts.

In addition, assessment should provide evidence of the extent to which students have integrated their knowledge of various concepts.

Standard 9. Mathematical Procedures

The assessment of students' knowledge of procedures should provide evidence that they can:

recognize when it is appropriate to use a procedure;

give reasons for the steps in a procedure;

reliably and efficiently execute procedures;

verify results of procedures empirically (e.g., using models) or analytically;

recognize correct and incorrect procedures;

generate new procedures and extend or modify familiar ones;

appreciate the nature and role of procedures in mathematics.

Standard 10. Mathematical Disposition

The assessment of students' mathematical disposition should seek information about their:

confidence in using mathematics to solve problems, to communicate ideas, and to reason;

flexibility in exploring mathematical ideas and trying alternative methods in solving problems;

willingness to persevere at mathematical tasks;

interest, curiosity, and inventiveness in doing mathematics;

inclination to monitor and reflect upon their own thinking and performance;

valuing of the application of mathematics to situations arising in other disciplines and everyday experiences;

appreciation of the role of mathematics in our culture and its value as a tool and as a language.

Program Evaluation

Standard 11. Indicators for Program Evaluation

When evaluating a mathematics program's consistency with the NCTM Standards, indicators of the program's match to the Standards should be collected on:

student outcomes;

program expectations and support;

equity for all students;

curriculum review and change.

In addition, indicators of the program's match to the Standards should be collected on curriculum and instructional resources and instruction. These are discussed explicitly in Evaluation Standards 12 and 13.

Standard 12. Curriculum and Instructional Resources

When evaluating a mathematics program's consistency with the NCTM Curriculum Standards, examination of curricular and instructional resources should focus on:

goals, objectives, and mathematical content;

relative emphases on various topics and processes and their relationships;

instructional approaches and activities;

articulation across grades;

assessment methods and instruments;

availability of technological tools and support materials.

Standard 13. Instruction

When evaluating a mathematics program's consistency with the NCTM Curriculum Standards, instruction and the environment in which it takes place should be examined, with special attention to:

mathematical content and its treatment;

relative emphases assigned to various topics and processes and the relationships among them;

opportunity to learn;

instructional resources and classroom climate;

assessment methods and instruments used;

the articulation of instruction across grades.

Standard 14. Evaluation Team

Program evaluation should be planned and conducted with the involvement of:

individuals with expertise and training in mathematics education;

individuals with expertise and training in program evaluation;

decision makers for the mathematics program;

users of the information from the evaluation.

Note: From the "Overview of the Curriculum and Evaluation Standards for School Mathematics" (An Abridgement and Excerpts of NCTM's Curriculum and Evaluation Standards for School Mathematics). Prepared by the Working Groups of the Commission on Standards for School Mathematics. NCTM, October 1988, pp. 16-18.

APPENDIX B

CLASSIFICATION MATRIX

TEST NAME:

NUMBER OF ITEMS FOR EACH CATEGORY

QUES   CONTENT                          PROCESS                          LEVEL
       nr   ns   alg  p/s  geo  mea     com  c/e  con  rea  ps   p&f     conc  proc
1
2
3
4
5
6
7
8
9
TOT.

Key:
nr    Number and Number Relations
ns    Number Systems and Number Theory
alg   Algebra
p/s   Probability or Statistics
geo   Geometry
mea   Measurement
com   Communication
c/e   Computation or Estimation
con   Connections
rea   Reasoning
ps    Problem Solving
p&f   Patterns and Functions
conc  Concepts
proc  Procedures
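A minimal sketch of how tallies from a completed matrix could be turned into the percentages reported in Appendix C. It assumes each test item receives exactly one code in each of the three dimensions, which the blank form suggests but does not state. The function name `percent_by_category` and the four sample items are hypothetical and are not data from the study.

```python
from collections import Counter

# Category codes taken from the key above.
CONTENT = ["nr", "ns", "alg", "p/s", "geo", "mea"]
PROCESS = ["com", "c/e", "con", "rea", "ps", "p&f"]
LEVEL = ["conc", "proc"]


def percent_by_category(items, dimension, codes):
    """Return {code: whole-number percent of items} for one dimension."""
    counts = Counter(item[dimension] for item in items)
    total = len(items)
    # Counter returns 0 for codes that never occur, so every code gets an entry.
    return {code: round(100 * counts[code] / total) for code in codes}


# Hypothetical classifications for a four-item test (illustration only).
items = [
    {"content": "nr", "process": "c/e", "level": "proc"},
    {"content": "nr", "process": "c/e", "level": "proc"},
    {"content": "alg", "process": "ps", "level": "conc"},
    {"content": "nr", "process": "c/e", "level": "proc"},
]

content_pct = percent_by_category(items, "content", CONTENT)
process_pct = percent_by_category(items, "process", PROCESS)
level_pct = percent_by_category(items, "level", LEVEL)
print(content_pct)
print(process_pct)
print(level_pct)
```

Because every item carries one code per dimension under this assumption, the percentages within a dimension should total roughly 100, apart from rounding.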

APPENDIX C

TEST RESULTS: PERCENT OF ITEMS FOR EACH CATEGORY

TEST   CONTENT                          PROCESS                          LEVEL
       nr   ns   alg  p/s  geo  mea     ps   com  rea  con  c/e  p&f     conc  proc
SRA    82    7    7    0    4    0       3    5    1    0   91    0       16    84
CAT    73    5    6    6    4    6       0   11    6    0   83    0       10    90
SAT    64    0   10    9    2   15       0   36    0    0   62    0        8    92
ITBS   62   11    7    3    4   13       0    9    1    0   89    1        4    96
MAT    66    6    0    5    8   15       0   21    0    0   79    0       12    88

CTBS 76 0 0 11 8 25 0 71 2 15 85

AVG. 71 3 5 6 5 Be 20 0 79 11 89

RNG. 20 7 10 11 6 15 1 18 2 0 20 11 12

Key:
nr    Number and Number Relations
ns    Number Systems and Number Theory
alg   Algebra
p/s   Probability or Statistics
geo   Geometry
mea   Measurement
ps    Problem Solving
com   Communication
rea   Reasoning
con   Connections
c/e   Computation or Estimation
p&f   Patterns and Functions
conc  Concepts
proc  Procedures
AVG   Average
RNG   Range

APPENDIX D

ILLUSTRATIVE QUESTIONS

Example 1 (multiple choice): Typically students are asked to solve for x when given an equation. In the following question, students are required to see an algebraic representation as a mathematization of a real-world problem.

Which one of the following problems can be solved by using the equation x + 3 = 25?

O A math class started with 28 students. The next day 3 more students enrolled in the class. How many students does this class have now?

O Erin added 2 more books to her collection. If she now has 25 books, how many books did Erin have originally?

O Tim had $25 in his account. A week later he deposited $2 more. How much money does he have in his account now?

O Ann biked 28 km at 2 km per hour. How long did Ann bike?

Example 2 (multiple choice): This question can be done in several ways dependingupon the mathematical sophistication of the student. It can be approached purely by trialand error or by trial and error in a systematic way using knowledge of place value.

[ ][ ][ ] x [ ][ ]

The five digits 1, 2, 3, 4, and 5 are placed in the boxes above to form a multiplication problem. If the digits are placed to give the maximum product, that product will fall between:

O 10,000 and 22,000

O 22,001 and 22,300

O 22,301 and 22,400

O 22,401 and 22,500
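The trial-and-error approach mentioned for Example 2 can be sketched as an exhaustive search. This is only an illustration, under the assumption that the boxes form a three-digit number times a two-digit number and that the question asks for the largest product. The function name `best_product` is hypothetical.

```python
from itertools import permutations


def best_product(digits=(1, 2, 3, 4, 5)):
    """Try every placement of the digits into a 3-digit and a 2-digit factor."""
    best = (0, 0, 0)  # (product, three_digit, two_digit)
    for a, b, c, d, e in permutations(digits):
        three_digit = 100 * a + 10 * b + c
        two_digit = 10 * d + e
        # Tuples compare by product first, so max keeps the largest product.
        best = max(best, (three_digit * two_digit, three_digit, two_digit))
    return best


product, three_digit, two_digit = best_product()
print(three_digit, "x", two_digit, "=", product)  # 431 x 52 = 22412
```

A student using place value would narrow the search by hand: the two largest digits, 5 and 4, must lead the two factors, which leaves only a handful of arrangements of 1, 2, and 3 to compare.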

APPENDIX E

HISTORY AND RATIONALE FOR STUDENT MATHEMATICS JOURNALS: A SCHOOL PERSPECTIVE

A. Waywood, Mathematics Co-ordinator
Vaucluse College
Richmond, Victoria, Australia

Vaucluse College is a Catholic secondary girls school. There are approximately five hundred girls from Year 7 to Year 12 at Vaucluse. It serves a multicultural population: 20 percent Asian, 30 percent Italian and Greek, with the remaining 50 percent being predominantly Anglo-Saxon. Prior to the introduction of the mathematics journals, the mathematics program had been fairly "text book traditional." Mathematics was a compulsory part of the curriculum until Year 10. In Year 11 students could drop maths completely, do a Mathematics and Work unit, or continue with the core mathematics. Of students remaining at Vaucluse to complete their final year, about 30 percent would continue with mathematics.

History

In 1986, mathematics journals were introduced experimentally in one class each at Year 7, 9, and 10 levels. Compared to the present understanding of the functioning of a mathematics journal, these initial experiments were very crude in terms of the perceived relationship between students keeping a journal and a pedagogy of mathematics. Even so, results were encouraging enough to warrant the expansion of their use. By early 1989, the keeping of mathematics journals was seen as an essential element in the teaching of mathematics from Years 7 to 10. Years 11 and 12 need to be bracketed out of this discussion because, even though students are required to keep summary books, they are geared to an examination system where the skills developed through journal keeping are not prized. A student's completion of a mathematics journal should result in more than just a summary of mathematical procedures; rather it is the developing of an attitude to the doing of mathematics and coming to an understanding of what mathematics can be for students. In other words, in introducing journal keeping to the mathematics classroom, something has to happen for the student and the teacher.

Educational Rationale

Through 1987 and 1988, we worked hard trying to see how journals were functioning in terms of student learning and using these insights in a formulation of the purpose for having students keep a mathematics journal. This purpose is premised on the beliefs that language and thought are intimately connected and that mastering forms of communication goes hand in hand with mastering thinking.

By keeping a mathematics journal we intend that students:

1. Formulate, clarify, and relate concepts,
2. Appreciate how mathematics speaks about the world,
3. Think mathematically,
   a. Practice the processes (problem solving) that underlie the doing of mathematics,
   b. Formulate physical relations mathematically.

This purpose is translated into a number of tasks for students to do when they write a mathematics journal. Basically, any journal entry should be structured around three activities: summarizing, discussing, exemplifying. As a first introduction to journal writing, we supply our Year 7 students with an actual book, in which each page is divided into the sections: What we did, What I learned, Examples, and Questions. As an activity, each of these tasks has an internal structure that has the potential to draw the students into the experience of thinking systematically and of taking control of their learning. The actual structure of the tasks requires learning activity rather than learning passivity.

As I pointed out earlier, the journal activity has two roots: one in the dynamic of student learning, the other in the dynamic of instruction. Even though each of the tasks is essential for learning, none of them are explicitly taught in a mathematics classroom. We found that journals fulfilled our purpose to the degree that we accommodated our teaching to their use. Put most globally, using journals as an instructional tool required a shift from teaching techniques to helping students construct meaning. At the level of classroom implementation, this required teachers to:

appropriate new models of mathematics instruction, such as group work, library research, historical investigations, and class discussion;

experiment with non-traditional instructional devices, such as semantic maps, language of argument, modelling precise expression (formulating definitions), and redrafting.

To sum up, then, the educational rationale behind journal writing is to have students experience sustained and precise thinking. We suggest that language activity and mathematical activity come together and are focused in the act of precise articulation, which is the underlying demand of journal writing. Further, these two worlds of activity come together uniquely, because of the content of mathematics, which has to do with the relation between ideas and not about the relationship between things. Much of this rationale can be exemplified in a discussion of what journal completion entails.

What Journal Completion Entails

Students are required to write in their journal after every mathematics lesson. This is seen as ongoing homework. It is a requirement that is taken seriously because journals contribute 30 percent to the assessment in mathematics. As a minimum, a satisfactory journal entry should reflect the intellectual involvement of the student in the day's lesson. What form a particular entry will take is determined by the form of the day's lesson and the level of sophistication at which the student can interpret the journal tasks. To simplify this discussion, I will characterize lessons as falling into one of three types, Theory, Practice, Activity, and discuss under each the appropriate journal activity.

Journal Entry Appropriate to a Theory Lesson

The students will have taken notes in class and then that night will reconstruct the lesson and present a clear summary of the lesson. While doing this, they will note connections with previous ideas and concepts or applications that weren't clear to them. They will discuss what wasn't clear with the aim of phrasing a precise question that will get to the bottom of what they have not understood. The discussion will also aim to extend the ideas through the use of "What if ... ?" questions. Where appropriate, they will give examples that illustrate the ideas or applications being discussed.

Journal Entry Appropriate to a Practice Lesson

After a practice lesson, students will spend time annotating a worked example. They will demonstrate an understanding of the connection between techniques and applications with the theory. They will isolate areas of background knowledge that prevent mastery of new techniques and test their understanding by doing a hard example. They will comment on their particular pattern of mistakes.

Journal Entry Appropriate to an Activity Lesson

In the first place, students will unearth the relationship of the activity to the unit of study, they will record what was done and discuss what it means, and they will reflect on how it illustrates ideas (for example, principles). They will describe and justify the method they have followed and state the conclusions they have reached.

It should be clear that the journal calls on many high-order processes. Being able to write a journal entry is not automatic for any student. Learning to use a journal has to be taught, and our experience is that if it is taught and applied during instruction in mathematics, then students find mathematics more meaningful, useful, and enduring.

What Constitutes a Successful Journal

A corollary to the issue of what journal teaching entails is the issue of how teachers recognize a successful journal. As we gained experience with reading journals, it became clear that something more was happening in journals that were seen as successful rather than just being complete. Successful journals were written differently. In the first instance, students who seemed to be getting the most from their journal work were more often trying to explain rather than just describe. From this insight, we formulated a taxonomy of the function of language in journals, which spread entries along a continuum. Students used language in a Narrative, Summary, or Dialogue mode. These terms are necessarily technical and are defined, at present, by examples of student work. What was most useful in these categories was that they gave teachers a means to discriminate between journals and to model proper use. Our contention was that as students learned to explain rather than describe, where summarizing was seen as a precursor to explaining, they were more likely to be thinking mathematically. This taxonomy of text has been very useful in judging successful journal completion and, further, seems to point towards differing dispositions of students towards mathematics.

APPENDIX F

SMP PROJECT: STUDENT LOG SAMPLE PAGES

SMP 11-16 Coursework

Candidate's Name:

School:

Centre No.:    Candidate No.:

London and East Anglian Group
Midland Examining Group
On behalf of Groups nationally

Brief title of task:

PLANNING SHEET FOR OPEN-ENDED TASKS

[Handwritten student planning notes; not legible in the scan.]

SMP 11-16 Coursework

Candidate's Name:

School:

Centre No.:    Candidate No.:

LOG SHEET FOR OPEN-ENDED TASKS

London and East Anglian Group
Midland Examining Group
On behalf of Groups nationally

Brief title of task:

Date    Work Done    Queries    Teacher's Notes

[Handwritten log entries; not legible in the scan.]

[Two further pages of the student's handwritten working (tube and wastage diagrams, cost calculations) are not legible in the scan.]

Looking Beyond "The Answer"

The Report of Vermont's Mathematics Portfolio Assessment Program

Pilot Year 1990-91

State of Vermont
Governor
Richard A. Snelling

State Board of Education
Douglas I. Tudhope, Chair, North Hero
Sally Sugarman, Vice Chair, Shaftsbury
Barbara M. Forest, [town illegible]
Kathryn A. Piper, Lower Waterford
Patrick S. Robins, Burlington
Frances Elwell, Brattleboro
Ross Andersen, [town illegible]

The Vermont Department of Education
Commissioner of Education
Richard P. Mills

Director of Planning and Policy Development
W. Ross Brewer

Mathematics Consultant
Robin Kenney

TBA Consulting Group
William J. Thompson

II. Problem Solving and Communication: The Criteria

The best pieces of student work provide the basis for assessing the problem-solving and mathematical communication skills of Vermont students. This section provides a description of these two elements and the seven criteria by which they are assessed.

Problem-Solving Skills
PS1. Understanding of the Task
PS2. Selection of Approaches/Procedures/Strategies
PS3. Use of Reflection, Justification, Analysis, Verification in Problem Solving
PS4. Findings, Conclusions, Observations, Connections, Generalizations

Mathematical Communication Skills
MC1. Language of Mathematics
MC2. Mathematical Representations
MC3. Clarity of Presentation

A. Problem Solving: The Essential Skill

The National Council of Teachers of Mathematics' Agenda for Action (1980) recommended that problem solving be the focus of school mathematics. Revisiting that recommendation, the Curriculum and Evaluation Standards for School Mathematics place becoming a mathematical problem solver at the top of the list of goals for students:

The development of each student's ability to solve problems is essential if he or she is to be a productive citizen. To develop such abilities, students need to work on problems that may take hours, days, and even weeks to solve. Although some may be relatively simple exercises to be accomplished independently, others should involve small groups or an entire class working cooperatively. Some problems also should be open-ended with no right answer, and others need to be formulated.

This goal is the foundation upon which Vermont has built its assessment program. Mathematics programs should be [illegible] on the traditional one- or two-step problems that are categorized into traditional types, and using instead a broader definition of problem solving. Problems should have a variety of structures. They should include the types of problems that students encounter every day. Application problems should play a major role in the curriculum. Some problems should be open-ended. Problem solving should include investigations and long-term projects. Teaching problem-solving strategies should be an integral part of instruction and must be reflected in the assessment.

The NCTM Standards encourage a variety of problem-solving opportunities. The problem-solving assessment standard states:

If problem solving is to be the focus of school mathematics, it must also be the focus of assessment. Students' ability to solve problems develops over time as a result of extended instruction, opportunities to solve many kinds of problems, and encounters with real-world situations.

Assessments should determine students' ability to perform all aspects of problem solving. Evidence about their ability to ask questions, use given information, and make conjectures is essential to determine if they can formulate problems.

Assessments also should yield evidence on students' use of strategies and problem-solving techniques and on their ability to verify and interpret results. Finally, because the power of mathematics is derived, in part, from its generalizability (e.g., a two-person solution can be generalized to a three-person solution), this aspect of problem solving should be assessed as well.

Vermont's commitment to providing meaningful problem-solving activities for its students was the basis for development of the problem-solving criteria.

Vermont's Problem-Solving Criteria

Too often problem solving has been taught as a linear process with [illegible] steps: begin by restating the problem, [illegible]. [Illegible] not necessarily the best way to teach students how to solve word problems. Vermont's concept of problem solving extends far beyond the simplistic approach to solving word problems, and is geared to assist students in developing multiple approaches to the range of types of problems they may encounter.

Vermont educators also recognize that problem solving is not necessarily a linear process. Problems can have multiple solutions or multiple means of reaching a solution. Recognizing that students have different knowledge bases and varied learning styles suggests that it is inappropriate to adopt a singular approach to problem solving and endorse that as the only approach valued by the state. The development of a range of problem-solving strategies (e.g., trial and error, [illegible], application of algorithms, use of representations) and a repertoire of problem-solving skills (e.g., reflection, verification) are the goals of mathematics education in Vermont. The problem-solving criteria adopted by Vermont reflect these goals.

Elements of problem solving are heavily integrated, and it is difficult to separate out distinct aspects. Nevertheless, Vermont's assessment must provide relevant feedback to programs. To meet that goal, Vermont isolated the following four key areas for the problem-solving criteria listed below.

Problem-Solving Criteria

Understanding of the task.
How the student approached the task: the approach(es), procedure(s), and/or strategies adopted to attack the task.
Why the student made the choices along the way: the reflection, justification, analysis, rationale, verification that influenced decisions.
What findings, conclusions, observations, connections, generalizations the student reached.

B. Communication: A Critical Element

The 1989 Standards of the National Council of Teachers of Mathematics included "Learning to Communicate Mathematically" as one of its new goals for students.

NCTM suggested that each student should learn "the signs, symbols, and terms of mathematics. This is best accomplished in problem situations in which students have an opportunity to read, write, and discuss ideas in which the use of the language of mathematics becomes natural. As students communicate their ideas, they learn to clarify, refine, and consolidate their thinking."

Vermont teachers support NCTM's stand on the importance of communication. [Illegible] it is only natural to include communication as a critical element in the assessment of student work and its connection to the solution of problems. Recognizing that students will [illegible] in school and [illegible], it is essential that they learn more than independent problem-solving skills. They must be able to communicate their ideas to others and understand the ideas of others if they are to fully benefit from the input of group analysis and reflection, which can in turn strengthen their individual problem-solving skills.

Students need to learn to communicate mathematical ideas. The emphasis on getting students to articulate how they solved a problem and why they made the choices they made throughout the problem calls for sophisticated communication skills. Being an effective problem solver without being able to communicate ideas to others limits mathematical power in a way that is inconsistent with Vermont's goals. For these reasons communication is the second key element that will be studied through the Vermont portfolios, and is one for which the written forms of portfolios provide opportunities for [illegible] communication.

The NCTM assessment standard for communication is a bit more prescriptive than the problem-solving standard. It states:

Integral to [illegible] process is communication. Ideas are discussed, discoveries shared, conjectures confirmed, and knowledge acquired through talking, writing, speaking, listening, and reading. The very act of communicating clarifies thinking and forces students to engage in doing mathematics. As such, communication is essential to learning and knowing mathematics. But communicating mathematically presents unique difficulties for students. Mathematics is heavily based on the use of symbols and [illegible] and contains different, [illegible] meanings for common words.

An assessment of students' ability to communicate mathematics should be directed at both the meanings they attach to the concepts and procedures of mathematics and their fluency in talking about, understanding, and evaluating ideas expressed in mathematics. As in any language, communication in mathematics means that one is able to use its vocabulary, notation, and structure to express and understand ideas and relationships. In this sense, communicating mathematics is integral to knowing and doing mathematics.

Appendices


Appendix B

Mathematics Rating Form

[Scanned rating form, first page; most of the text is illegible. Legible headings: A1 What - Understanding of Task; A2 How - Quality of Approaches/Procedures; A3 Why - Decisions Along the Way, each with a "Sources of Evidence" list. The form provides a title line and a rating row for each of Entries 1 through 7, followed by Overall Ratings (a final rating for each of the three criteria) and a Comments field.]


[Scanned rating form, second page; most of the text is illegible. Legible headings: A4 Outcomes of Activities; B1 Language of Mathematics; B2 Mathematical Representations; B3 Clarity of Presentation, each with a "Sources of Evidence" list, followed by Content Tallies and a final rating for each of the four criteria.]


References

Chapter 1

National Council of Teachers of Mathematics. (1989). Curriculum and evaluation standards for school mathematics. Reston, VA: Author.

Romberg, T. A., Zarinnia, E. A., & Williams, S. R. (1989). The influence of mandated testing on mathematics instruction: Grade 8 teachers' perceptions. Madison: University of Wisconsin, National Center for Research in Mathematical Sciences Education.

Romberg, T. A., Zarinnia, E. A., & Williams, S. R. (1990). Mandated school mathematics testing in the United States: A survey of state mathematics supervisors. Madison: University of Wisconsin, National Center for Research in Mathematical Sciences Education.

Chapter 2

Apple, M. W. (1979). Ideology and curriculum. London: Routledge & Kegan Paul.

Ayres, L. P. (1918). History and present status of educational measurements. In S. C. Parker (Ed.), The measurement of educational products: Seventeenth yearbook of the National Society for the Study of Education (Part II). Bloomington, IL: Public School Publishing Co.

Begle, E. G., & Wilson, J. W. (1970). Evaluation of mathematics programs. In E. G. Begle (Ed.), Mathematics education. 69th Yearbook of the NSSE (Part 1). Chicago: University of Chicago Press.

Bloom, B. S. (Ed.). (1956). Taxonomy of educational objectives: The classification of educational goals. Handbook I: Cognitive domain. New York: McKay.

Campbell, D. T., & Stanley, J. T. (1966). Experimental and quasi-experimental designs for research. Chicago: Rand McNally.

Clarke, D. J. (1987). The interactive monitoring of children's learning of mathematics. For the Learning of Mathematics, 7(1), 2-6.

Close, C., & Brown, M. (1988). Graduated assessment in mathematics: Report of the SSCC study. London: Department of Education and Science.


Collis, K. F. (1987). Levels of reasoning and the assessment of mathematical performance. In T. A. Romberg & D. M. Stewart (Eds.), The monitoring of school mathematics: Background papers. Madison: Wisconsin Center for Education Research.

Collis, K. F., Romberg, T. A., & Jurdak, M. E. (1986). A technique for assessing mathematical problem-solving ability. Journal for Research in Mathematics Education, 17(3), 206-221.

de Lange, J. (1987). Mathematics, insight, and meaning: Teaching, learning and testing of mathematics for the life and social sciences. Utrecht, The Netherlands: Rijksuniversiteit Utrecht.

Eash, M. J. (1985). Evaluation research and program evaluation: Retrospect and prospect: A reformulation of the role of the evaluator. Educational Evaluation and Policy Analysis, 7(3), 249-52.

Eisenberg, T. (1975). Behaviorism: The bane of school mathematics. Journal of Mathematical Education, Science, and Technology, 6(2), 163-71.

Eisner, E. (1976). Educational connoisseurship and criticism: Their form and function in educational evaluation. Journal of Aesthetic Education, 10, 173-79.

Fetterman, D. (1984). Ethnography in educational evaluation. Beverly Hills: Sage.

Foxman, D. D., Badger, M. E., Martini, R. M., & Mitchell, P. (1981). Mathematical development secondary survey report no. 2. London: Her Majesty's Stationery Office.

Foxman, D. D., Cresswell, M. J., Ward, M., Badger, M. E., Tucson, J. A., & Bloomfield, B. A. (1980). Mathematical development primary survey report no. 1. London: Her Majesty's Stationery Office.

Freeman, F. N. (1930). Mental tests: Their history, principles and applications (rev. ed.). Boston: Houghton Mifflin.

Gorth, W. P., Schriber, P. E., & O'Reilly, R. P. (1974). Comprehensive achievement monitoring: A criterion-referenced evaluation system. New York: Educational Technology Publishers.

Greene, H. A., Jorgensen, A. N., & Gerberich, J. R. (1953). Measurement and evaluation in the elementary school (2nd ed.). New York: Longmans.

Guba, E., & Lincoln, Y. (1981). Effective evaluation. San Francisco: Jossey-Bass.

Guralnik, P. B. (Ed.). (1985). Webster's New World Dictionary. New York: Prentice-Hall.

McLean, L. D. (1982). Report of the 1981 field trials in English and mathematics: Intermediate division. Toronto, Ontario: The Minister of Education.

National Coalition of Advocates for Students. (1985). Barriers to excellence: Our children at risk. Washington, DC: Author.


National Commission on Excellence in Education. (1983). A nation at risk: The imperative for educational reform. Washington, DC: US Government Printing Office.

National Council of Teachers of Mathematics. (1989). Curriculum and evaluation standards for school mathematics. Reston, VA: Author.

Odell, C. W. (1930). Educational measurements in high school. New York: Century.

O'Keefe, J. (1984). The impact of evaluation on federal education program policies. Studies in Educational Evaluation, 10, 612-74.

Patton, M. Q. (1980). Qualitative evaluation methods. Beverly Hills: Sage.

Peterson, P. L., Fennema, E., Carpenter, T. P., & Loef, M. (1989). Teachers' pedagogical content beliefs in mathematics. Cognition and Instruction, 6(1), 1-40.

Popkewitz, T. S. (1984). Paradigm & ideology in educational research. London: The Falmer Press.

Reinhard, D. (1972). Methodology for input evaluation utilizing advocate and design teams. Unpublished doctoral dissertation, The Ohio State University.

Romberg, T. A. (1987). The domain knowledge strategy for mathematical assessment. Project Paper #1. Madison, WI: National Center for Research in Mathematical Sciences Education.

Romberg, T. A. (1975). Answering the question, is it any good? The role of evaluation in multi-cultural education through competency-based teacher education. In C. A. Grant (Ed.), Sifting and winnowing: An exploration of the relationship between multi-cultural education and CBTE. Madison, WI: Teacher Corps Associates.

Romberg, T. A. (1976). Individually guided mathematics. Reading, MA: Addison-Wesley.

Romberg, T. A. (1983). A common curriculum for mathematics. In G. D. Fenstermacher & J. I. Goodlad (Eds.), Individual differences and the common curriculum. Chicago: The University of Chicago Press.

Romberg, T. A. (Ed.). (1985). Toward effective schooling. New York: University Press of America.

Romberg, T. A., & Kilpatrick, J. (1969). Appendix D. Preliminary study on evaluation in mathematics education. In T. A. Romberg & J. W. Wilson (Eds.), The development of tests. NLSMA report no. 7 (pp. 281-98). Stanford, CA: School Mathematics Study Group.

Schoenfeld, A. H., & Herrmann, D. J. (1982). Problem perception and knowledge structure in expert and novice mathematical problem solvers. Journal of Experimental Psychology: Learning, Memory, and Cognition, 8, 484-94.

Scriven, M. (1974). Evaluation perspectives and procedures. In W. J. Popham (Ed.), Evaluation in education. Berkeley: McCutchan.


Spearman, C. (1904). General intelligence objectively determined and measured. American Journal of Psychology, 15, 201-93.

Stake, R. E. (1974). Program evaluation, particularly responsive evaluation. Occasional Paper No. 5. Kalamazoo: Western Michigan University Evaluation Center.

Stake, R. E., & Gjerde, C. (1974). An evaluation of the T-CITY. In Kraft et al. (Eds.), Four evaluation examples: Anthropological, economic, narrative and portrayal. AERA Monograph Series on Curriculum Evaluation, No. 7. Chicago: Rand McNally.

Swan, M. (1986). The language of graphs: A collection of teaching materials. Nottingham, UK: University of Nottingham, The Shell Centre for Mathematical Education.

Thorndike, E. L. (1904). An introduction to the theory of mental and social measurements. New York: Teachers College, Columbia University.

Tyler, R. W. (1931). A generalized technique for constructing achievement tests. Educational Research Bulletin, 8, 199-208.

Vergnaud, G. (1982). Cognitive and developmental psychology and research in mathematics education: Some theoretical and methodological issues. For the Learning of Mathematics, 3(2), 31-41.

Watson, G. (1938). The specific techniques of investigation: Testing intelligence, aptitudes, and personality. In G. M. Whipple (Ed.), The scientific movement in education: Thirty-seventh yearbook of the National Society for the Study of Education (Part II, pp. 365-66). Bloomington, IL: Public School Publishing.

Weinzweig, A. L., & Wilson, J. W. (1977). Second IEA mathematics study: Suggested tables of specifications for the IEA mathematics tests. Working Paper I. Wellington, New Zealand: IEA International Mathematics Committee.

Young, M. F. D. (1975). An approach to the study of curricula as socially organized knowledge. In M. Golby, J. Greenwald, & R. West (Eds.), Curriculum design (pp. 101-27). London: The Open University Press.

Chapter 3

Carpenter, T. P., & Fennema, E. (1988). Research and cognitively guided instruction. Madison, WI: National Center for Research in Mathematical Sciences Education.

Carpenter, T. P., Fennema, E., Peterson, P. L., & Carey, D. A. (1987). Teachers' pedagogical content knowledge in mathematics. Paper presented at the American Educational Research Association, Washington, DC.

Carpenter, T. P., Fennema, E., & Peterson, P. L. Assessing children's thinking. Unpublished working paper for the Cognitively Guided Instruction Project, Wisconsin Center for Education Research.


de Lange, J. (1987). Mathematics, insight, and meaning: Teaching, learning and testing of mathematics for the life and social sciences. Unpublished doctoral dissertation, Rijksuniversiteit Utrecht, The Netherlands.

Guilford, J. P. (1965). Fundamental statistics in psychology and education. New York: McGraw-Hill Book Company.

National Council of Teachers of Mathematics. (1989). Curriculum and evaluation standards for school mathematics. Reston, VA: Author.

Romberg, T. A. (1987). The domain knowledge assessment strategy. Working Papers 87-1, School Mathematics Monitoring Center. Madison: Wisconsin Center for Education Research.

Swan, M. (1985). The language of functions and graphs. UK: University of Nottingham, Shell Centre for Mathematical Education.

Vergnaud, G. (1982). Cognitive and developmental psychology and research in mathematics education: Some theoretical and methodological issues. For the Learning of Mathematics, 3(2), 31-41.

Chapter 4

Baron, J., Forgione, P., Rindone, D., Kruglenski, H., & Davey, B. (1989, March). Toward a new generation of student outcome measures: Connecticut's Common Core of Learning assessment. Paper presented at the annual meeting of the American Educational Research Association, San Francisco, CA.

California State Department of Education. (1989). A question of thinking: A first look at students' performance on open-ended questions in mathematics. Sacramento, CA: Author.

The California Achievement Test. (1985). Monterey: CTB/McGraw Hill.

The Comprehensive Test of Basic Skills. (1989). (4th ed.). Monterey: CTB/McGraw Hill.

de Lange, J., van Reeuwijk, M., Burrill, G., & Romberg, T. A. (in press). Learning and testing mathematics in context: The case: Data visualization. National Council of Teachers of Mathematics: Reston, VA.

Edelman, J. F. (1980). The impact of the mandated testing program on classroom practices: Teacher perspectives. (Doctoral dissertation, University of California, Los Angeles). Dissertation Abstracts International, 41, 04.

The Iowa Test of Basic Skills. (1986). Chicago, IL: Riverside Publishing Co.

The Metropolitan Achievement Test. (1986). (6th ed.). San Antonio, TX: The Psychological Corporation.

Meier, T. (1989, Jan/Feb). The case against standardized achievement tests. In Rethinking Schools, Vol. 3(2), 9-12.

Millman, J., Bishop, C. H., & Ebel, R. (1965). An analysis of test-wiseness. Educational and Psychological Measurement, 25, 707-26.


National Council of Teachers of Mathematics. (1980). An agenda for action: Recommendations for school mathematics of the 1980's. Reston, VA: Author.

National Council of Teachers of Mathematics. (1989). Curriculum and evaluation standards for school mathematics. Reston, VA: Author.

Putnam, R. T., Lampert, M., & Peterson, P. L. (1989). Alternative perspectives on knowing mathematics in elementary schools. Center for the Learning and Teaching of Elementary School Subjects, Michigan State University, Lansing.

Romberg, T. A., Wilson, L., & Chavarria, S. (1990). An examination of state and foreign tests. Madison: Wisconsin Center for Education Research.

Romberg, T. A., Wilson, L., & Khaketla, M. (1989). An examination of six standard mathematics tests for grade eight. Madison: Wisconsin Center for Education Research.

Romberg, T. A., Zarinnia, A., & Williams, S. (1989). The influence of mandated testing on mathematics instruction: Grade 8 teachers' perceptions. Madison: University of Wisconsin, National Center for Research in Mathematical Sciences Education.

Science Research Associates Survey of Basic Skills. (1985). Chicago, IL: Author.

The Stanford Achievement Test. (1982). (7th ed.). San Antonio, TX: The Psychological Corporation.

Stanley, J. C., & Hopkins, K. D. (1981). Educational and psychological measurement and evaluation. Englewood Cliffs, NJ: Prentice-Hall.

Chapter 5

Coley, R. J., & Goertz, M. E. (1990). Educational standards in the 50 states. Research Report 90-15. Princeton, NJ: Educational Testing Service.

Massachusetts Department of Education. (1987). The 1987 Massachusetts Educational Assessment Program. Quincy: Author.

National Council of Teachers of Mathematics. (1989). Curriculum and evaluation standards for school mathematics. Reston, VA: Author.

Shavelson, R. J. (1990). Can indicator systems improve the effectiveness of mathematics and science education? The case of the U.S. Santa Barbara: University of California.

National Assessment of Educational Progress. (1988). Mathematics objectives. 1990 assessment: The nation's report card. Princeton, NJ: Educational Testing Service.

Chapter 6

California State Department of Education. (1985). Mathematics framework for California public schools: Kindergarten through grade 12. Sacramento: Author.


California State Department of Education. (1987). Mathematics model curriculum guide, K-8. Sacramento: Author.

California State Department of Education. (1989). A question of thinking: A first look at student performance on open-ended questions in mathematics. Sacramento: Author.

California State Department of Education. (1989). Survey of academic skills: Mathematics, grade 12. Sacramento, CA: Author.

Cronbach, L. J. (1980). Toward reform of program evaluation. San Francisco: Jossey-Bass Publishers.

Honig, W. (1985). Last chance for our children. New York: Addison-Wesley.

Lester, F. K., Jr. (1978). Mathematical problem solving in the elementary school: Some educational and psychological considerations. In L. L. Hatfield & D. A. Bradbard (Eds.), Mathematical problem solving: Papers from a research workshop. Columbus, OH: ERIC Clearinghouse for Science, Mathematics, and Environmental Education.

Lester, F. K., Jr. (1982). Reflections about teaching mathematical problem solving in the elementary grades. In R. I. Charles & E. A. Silver (Eds.), The teaching and assessing of mathematical problem-solving (pp. 115-124). Hillsdale, NJ: Erlbaum.

Lesh, R. (1983, June). Conceptual analyses of problem solving performance. Paper presented at the Conference on Teaching Mathematical Problem Solving, San Diego State University, San Diego.

Lord, F. M. (1962). Estimating norms by item sampling. Educational and Psychological Measurement, 22, 259-67.

Mayer, E. (1983, June). Implications of cognitive psychology for instruction in mathematical problem solving. Paper presented at the Conference on Teaching Mathematical Problem Solving, San Diego State University, San Diego.

Millman, J. (1974). Criterion-referenced measurement. In W. J. Popham (Ed.), Evaluation in education (309-397). Berkeley: McCutchan Publishing Company.

National Council of Teachers of Mathematics. An agenda for action: Recommendations for school mathematics of the 1980s. Reston, VA: NCTM, 1980.

Newell, A., & Simon, H. A. (1972). Human problem solving. Englewood Cliffs, NJ: Prentice-Hall.

Pandey, T. N. (1974). Estimating the standard error of the mean in multiple matrix sampling when items are sampled with and without replacement. Paper presented at the annual meeting of the American Educational Research Association, Chicago.

Pandey, T. N. (1983). Structure for the assessment of problem solving. Paper presented at the annual meeting of the American Educational Research Association, New Orleans.


Pandey, T. N., & Carlson, D. (1975). Assessing payoffs in the estimation of the mean using multiple matrix sampling designs. In D. N. M. De Gruijter & L. T. Th. van der Kamp (Eds.), Advances in psychological and educational measurement (265-275). New York: Wiley.

Pandey, T. N., & Carlson, D. (1983). Application of item response models to reporting assessment data. In R. K. Hambleton (Ed.), Applications of item response theory (212-229). Vancouver, BC: Educational Research Institute of British Columbia.

Polya, G. (1957). How to solve it (3rd ed.). Garden City, NJ: Doubleday.

Polya, G. (1965). Mathematical discovery: On understanding, learning, and teaching problem solving (Vol. 2). New York: Wiley.

Popham, W. J. (1973). Evaluating instruction. New Jersey: Prentice-Hall.

Resnick, L. B. (1983). Mathematics and science learning: A new conception. Science, 220, 477-78.

Schoenfeld, A. H. (1982). Some thoughts on problem solving research and mathematics education. In F. K. Lester & J. Garofalo (Eds.), Mathematical problem solving: Issues in research (22-37). Philadelphia: The Franklin Institute Press.

Silver, E. A. (1982, January). Thinking about problem solving: Toward an understanding of meta-cognitive aspects of mathematical problem solving. Prepared for the Conference on Thinking, University of the South Pacific, Suva, Fiji.

Sternberg, R. J. (1981). Intelligence as thinking and learning skills. Educational Leadership, October, 1981.

Sternberg, R. J. (1983, February). Criteria for intellectual skills training. Educational Researcher, 6-12.

Chapter 7

College Board. (1985). Academic preparation in mathematics: Teaching for transition from high school to college. New York: Author.

Demana, F., & Waits, B. K. (1989). Precalculus mathematics: A graphing approach. Reading, MA: Addison-Wesley.

Demana, F., & Waits, B. (1990). Enhancing mathematics teaching and learning through technology. In T. Cooney (Ed.), Teaching and learning mathematics in the 1990s (pp. 212-222). Reston, VA: National Council of Teachers of Mathematics.

Demana, F. D., Foley, G., Harvey, J. G., Osborne, A., & Waits, B. K. Results of the 1988-89 field test of precalculus: A graphing approach. Unpublished manuscript.

Fey, J. T. (Ed.). (1984). Computing and mathematics: The impact on secondary school curriculum. Reston, VA: National Council of Teachers of Mathematics.


Fey, J. T. (1989). School algebra for the year 2000. In C. Kieran & S. Wagner (Eds.), Research issues in learning and teaching algebra (pp. 199-213). Reston, VA: National Council of Teachers of Mathematics.

Fey, J., & Heid, M. K. (1987, June). Effects of computer-based curricula in algebra. Paper presented at a meeting of National Science Foundation Project Directors, College Park, MD.

Goldenberg, E. P. (1988). Mathematics, metaphors, and human factors: Mathematical, technical, and pedagogical challenges in the educational use of graphical representation of functions. Journal of Mathematical Behavior, 7, 135-73.

Harvey, J. G. (1989). Placement test issues in calculator-based mathematics examinations. In J. W. Kenelly (Ed.), The use of calculators in the standardized testing of mathematics (pp. 25-46). New York: College Board & Mathematical Association of America.

Harvey, W., Schwartz, J., & Yerushalmy, M. (1988). Visualizing algebra: The function analyzer (Computer software). Pleasantville, NY: Sunburst.

Kaput, J. (1989). Linking representations in the symbol systems of algebra. In C. Kieran & S. Wagner (Eds.), Research issues in learning and teaching algebra (pp. 167-94). Reston, VA: National Council of Teachers of Mathematics.

Leinhardt, G., Zaslavsky, O., & Stein, M. K. (1990). Functions, graphs, and graphing: Tasks, learning and teaching. Review of Educational Research, 60(1), 1-64.

Lynch, J. K., Fischer, P., & Green, S. F. (1989). Teaching in a computer-intensive algebra curriculum. Mathematics Teacher, 82, 688-94.

National Council of Teachers of Mathematics. (1980). An agenda for action: Recommendations for school mathematics of the 1980s. Reston, VA: Author.

National Council of Teachers of Mathematics, Commission on Standards for School Mathematics. (1989). Curriculum and evaluation standards for school mathematics. Reston, VA: Author.

Rubenstein, R., Schultz, J., Hackworth, M., Flanders, J., Kissane, B., Aksoy, D., Brahos, D., Senk, S., & Usiskin, Z. (1988). Functions, statistics and trigonometry with computers. Chicago: University of Chicago School Mathematics Project.

Sarther, C., Hedges, L., & Stodolsky, S. Formative evaluation of functions, statistics and trigonometry with computers. Unpublished manuscript, University of Chicago School Mathematics Project, Chicago.

Senk, S. L. (1989). Toward algebra in the year 2000. In C. Kieran & S. Wagner (Eds.), Research issues in learning and teaching algebra (pp. 214-19). Reston, VA: National Council of Teachers of Mathematics.

Waits, B. K., & Demana, F. (1989). Computers and the rational root theorem: Another view. Mathematics Teacher, 82, 124-25.


Waits, B. K., & Demana, F. (1988). Master Grapher [Computer software]. Reading, MA: Addison-Wesley.

Wiske, M. S., Zodhiates, P., Wilson, B., Gordon, M., Harvey, W., Krensky, L., Lord, B., Watt, M., & Williams, K. (1988, March). How technology affects teaching. Cambridge: Harvard Graduate School of Education, Educational Technology Center.

Chapter 8

Abo-Elkhair, M. E. (1980). An investigation of the effectiveness of using minicalculators to teach the basic concepts of average in the upper elementary grades. Dissertation Abstracts International, 41, 2980A. (University Microfilms No. 81-01, 953).

Bone, D. D. (1983). The development and evaluation of an introductory unit on circular functions and applications based on use of scientific calculators. Dissertation Abstracts International, 44, 1363A. (University Microfilms No. 83-22, 178).

Boyd, L. H., Lindquist, M. M., Harvey, J. G., & Waits, B. K. (1989). Calculator-based arithmetic and skills test. Washington, DC: Mathematical Association of America.

Casterlow, G., Jr. (1980). The effects of calculator instruction on the knowledge, skills, and attitudes of prospective elementary mathematics teachers. Dissertation Abstracts International, 41, 4319A. (University Microfilms No. 81-07, 547).

Cederberg, J., Demana, F. D., Harvey, J. G., & Northcutt, R. A. (in press). Calculator-based algebra test. Washington, DC: Mathematical Association of America.

Colefield, R. P. (1985). The effect of the use of electronic calculators versus hand computation on achievement in computational skills and achievement in the problem-solving abilities of remedial middle school students in selected business mathematics topics. Dissertation Abstracts International, 46, 2168A. (University Microfilms No. 85-21, 950).

College Board. (1983). Academic preparation for college: What students need to know and be able to do. New York: Author.

Conference Board of the Mathematical Sciences. (1983). New goals for mathematical sciences education. Washington, DC: Author.

Connor, P. J. (1981). A calculator dependent trigonometry program and its effect on achievement in and attitude toward mathematics of eleventh and twelfth grade college bound students. Dissertation Abstracts International, 42, 2545A. (University Microfilms No. 81-24, 741).

Curtis, P. C., Jr., Harvey, J. G., Madison, B. L., & McCammon, M. (in press). Calculator-based basic algebra test. Washington, DC: Mathematical Association of America.


Demana, F. D., & Leitzel, J. R. (1984). Transition to college mathematics. Reading, MA: Addison-Wesley.

Demana, F. D., Leitzel, J. R., & Osborne, A. (1988). Getting ready for algebra: Level 1 & Level 2. Lexington, MA: D. C. Heath.

Demana, F. D., & Waits, B. K. (1990). College algebra and trigonometry: A graphing approach. Reading, MA: Addison-Wesley.

Demana, F. D., Waits, B. K., Foley, G. D., & Osborne, A. (1990). College algebra and trigonometry: A graphing approach (Instructor's resource guide). Reading, MA: Addison-Wesley.

Elliott, J. W. (1980). The effect of using hand-held calculators on verbal problem solving ability of sixth-grade students. Dissertation Abstracts International, 41, 3464A. (University Microfilms No. 81-01, 829).

Epstein, M. G. (1968). Testing in mathematics: Why? what? how? Arithmetic Teacher, 15(4), 311-19.

Gimmestad, B. J. (1982). The impact of the calculator on the content validity of Advanced Placement calculus problems. Houghton, MI: Michigan Technological University, Department of Mathematical and Computer Sciences. (ERIC Document Reproduction Service No. ED 218 074).

Golden, C. K. (1982). The effect of the hand-held calculator on mathematics speed, accuracy, and motivation on secondary educable mentally retarded students (Grades 7-9). Dissertation Abstracts International, 43, 2311A. (University Microfilms No. 82-26, 927).

Harvey, J. G. (1989a). Placement test issues in calculator-based mathematics examinations. In J. W. Kenelly (Ed.), The use of calculators in the standardized testing of mathematics. New York: College Board & Mathematical Association of America, 25-33.

Harvey, J. G. (1989b). What about calculator-based placement tests? The AMATYC Review, 11(1, Part 2), 77-81.

Hopkins, B. L. (1978). The effect of a hand-held calculator curriculum in selected fundamentals of mathematics classes. (Doctoral dissertation, University of Texas at Austin, 1978). Dissertation Abstracts International, 39, 2801A.

Hopkins, B. L. The effect of a hand-held calculator curriculum in selected fundamentals of mathematics classes. In J. Lewis & H. D. Hoover, The effect of pupil performance on using hand-held calculators on standardized mathematics achievement tests. Paper presented at the Annual Meeting of the National Council on Measurement in Education, April 1981, Los Angeles.

Kenelly, J. W. (1989). The use of calculators in the standardized testing of mathematics. New York: College Board & Mathematical Association of America.

Kenelly, J. W., Harvey, J. G., Tucker, T. W., & Zorn, P. (1990). Calculator-based calculus readiness test. Washington, DC: Mathematical Association of America.


Kilpatrick, J. (1985). Academic preparation in mathematics: Teaching for transition from high school to college. New York: College Board.

Kouba, V. L., & Swafford, J. O. (1989). Calculators. In M. M. Lindquist (Ed.), Results from the fourth mathematics assessment of the National Assessment of Educational Progress (94-105). Reston, VA: National Council of Teachers of Mathematics.

Leitzel, J. R., & Osborne, A. (1985). Mathematical alternatives for college preparatory students. In C. R. Hirsch & M. J. Zweng (Eds.), The secondary school mathematics curriculum (105-165). 1985 Yearbook of the National Council of Teachers of Mathematics. Reston, VA: National Council of Teachers of Mathematics.

Leitzel, J. R., & Waits, B. K. (1989). The effects of calculator use on course tests and on statewide mathematics placement tests. In J. W. Kenelly (Ed.), The use of calculators in the standardized testing of mathematics (17-24). New York: College Board & Mathematical Association of America.

Lewis, J., & Hoover, H. D. (1981, April). The effect on pupil performance of using hand-held calculators on standardized mathematics achievement tests. Paper presented at the Annual Meeting of the National Council on Measurement in Education, Los Angeles.

Lindquist, M. M. (Ed.). (1989). Results from the fourth mathematics assessment of the National Assessment of Educational Progress. Reston, VA: National Council of Teachers of Mathematics.

Long, V. M., Reys, B., & Osterlind, S. J. (1989). Using calculators on achievement tests. Mathematics Teacher, 82(5), 318-25.

Mathematical Sciences Education Board. (1989). Everybody counts: A report to the nation on the future of mathematics education. Washington, DC: National Academy Press.

Mellon, J. A. (1985). Calculator based units in decimals and percents for seventh grade students. Dissertation Abstracts International, 46, 640A. (University Microfilms No. 85-10,155).

Murphy, N. K. (1981). The effects of a calculator treatment on achievement and attitude toward problem solving in seventh grade mathematics. Dissertation Abstracts International, 42, 2008A. (University Microfilms No. 81-21,439).

National Advisory Committee on Mathematical Education. (1975). Overview and analysis of school mathematics grades K-12. Washington, DC: Conference Board of the Mathematical Sciences.

National Assessment of Educational Progress. (1988). Mathematics objectives: 1990 assessment. Princeton: Educational Testing Service.

National Council of Teachers of Mathematics. (1980). An agenda for action: Recommendations for school mathematics of the 1980s. Reston, VA: Author.

National Council of Teachers of Mathematics. (1986, April). Position statement: Calculators in the mathematics classroom. Reston, VA: Author.

National Council of Teachers of Mathematics, Commission on Standards for School Mathematics. (1989). Curriculum and evaluation standards for school mathematics. Reston, VA: Author.

Romberg, T. A. (1984). School mathematics: Options for the 1990s (Chairman's Report of a Conference). Washington, DC: Department of Education, Office of the Assistant Secretary for Educational Research and Improvement.

Rule, R. L. (1980). The effect of hand held calculators on learning about: Functions, functional notation, graphing, function composition, and inverse functions. Dissertation Abstracts International, 41, 3866A. (University Microfilms No. 81-06,048).

Chapter 9

Bell, R. C., & Hay, J. A. (1987). Differences and biases in English language examination formats. British Journal of Educational Psychology, 57, 212-20.

Biggs, J. B. (1976). Dimensions of study behaviour: Another look at ATI. British Journal of Educational Psychology, 46, 68-80.

Bolger, N. (1984, August). Gender difference in academic achievement according to method of measurement. Paper presented at 92nd annual convention of the American Psychological Association, Toronto, Ontario.

Choppin, B. (1975). Guessing the answer on objective tests. British Journal of Educational Psychology, 45, 206-13.

Crehan, K. D., Gross, L. J., Koehler, R. A., & Slakter, M. J. (1978). Developmental aspects of test-wiseness. Educational Research Quarterly, 3(1), 40-44.

Dwyer, C. A. (1979). The role of tests and their construction in producing apparent sex-related differences. In M. A. Wittig & A. C. Petersen (Eds.), Sex-related differences in cognitive functioning: Developmental issues (pp. 335-53). New York: Academic Press.

Graf, R. G., & Riddell, J. C. (1972). Sex differences in problem solving as a function of problem context. The Journal of Educational Research, 65(10), 451-52.

Hambleton, R. K., & Traub, R. E. (1974). The effects of item order on test performance and stress. The Journal of Experimental Education, 43(1), 40-46.

Hill, K. T. (1984). Debilitating motivation and testing: A major educational problem - possible solutions and policy applications. Research on Motivation in Education: Student Motivation, 1, 245-74.

Kappy, K. A. (1980). Differential effects of decreased testing time on the verbal and quantitative aptitude scores of males and females. Dissertation Abstracts International, 40(12-A), 1980, Fordham University, Microfilm #DDJ82-18173.

Khampalikit, C. (1982). Race and sex differences in guessing behavior on a standardized achievement test in the elementary grades. Dissertation Abstracts International, 43(03-A), 1382, University of Pittsburgh, Microfilm #DDJ80-12788.

Kimball, M. M. (1989). A new perspective on women's math achievement. Psychological Bulletin, 105(2), 198-214.

Kleinke, D. J. (1980). Item order, response location, and examinee sex and handedness and performance on a multiple-choice test. Journal of Educational Research, 73, 225-29.

Klimko, I. P. (1984). Item arrangement, cognitive entry characteristics, sex, and test anxiety as predictors of achievement examination performance. Journal of Experimental Education, 52(4), 214-19.

Lane, D. S., Jr., Bull, K. S., Kundert, D. K., & Newman, D. L. (1987). The effects of knowledge of item arrangement, gender, and statistical and cognitive item difficulty on test performance. Educational and Psychological Measurement, 47, 865-79.

Leary, L. F., & Dorans, N. J. (1985b). Implications for altering the context in which test items appear: A historical perspective on an immediate concern. Review of Educational Research, 55(3), 387-413.

Millman, J., Bishop, H., & Ebel, R. (1965). An analysis of test-wiseness. Educational and Psychological Measurement, 25, 707-26.

Murphy, R. J. L. (1982). Sex differences in objective test performance. British Journal of Educational Psychology, 52, 213-19.

Payne, B. D. (1984). The relationship of test anxiety and answer-changing behavior: An analysis by race and sex. Measurement and Evaluation in Guidance, 16(4), 205-10.

Plake, B. S., Ansorge, C. J., Parker, C. S., & Lowry, S. R. (1982). Effects of item arrangement, knowledge of arrangement, test anxiety and sex on test performance. Journal of Educational Measurement, 19(1), 49-57.

Plake, B. S., Patience, W. M., & Whitney, D. R. (1988). Differential item performance in mathematics achievement test items: Effects of item arrangement. Educational and Psychological Measurement, 48, 885-94.

Plass, J. A., & Hill, K. T. (1986). Children's achievement strategies and test performance: The role of time pressure, evaluation anxiety, and sex. Developmental Psychology, 22(1), 31-36.

Skinner, N. F. (1983). Switching answers on multiple-choice questions: Shrewdness or shibboleth? Teaching of Psychology, 10(4), 220-22.

Slakter, M. J. (1967). Risk taking on objective examinations. American Educational Research Journal, 4(1), 31-43.

Slakter, M. J., Koehler, R. A., & Hampton, S. H. (1970). Grade level, sex, and selected aspects of test-wiseness. Journal of Educational Measurement, 7(2), 119-22.

Speth, C. A. (1987). The effects of learning style, gender, and type of examination on expected test preparation strategies. (Doctoral dissertation, The University of Nebraska-Lincoln, 1987). Dissertation Abstracts International, 49(02-A).

Terwilliger, J. S. (1988). Item analysis of 1988 Twin Cities UMTYMP testing. Unpublished manuscript.

Watkins, D., & Hattie, J. (1981). The learning processes of Australian university students: Investigations of contextual and personological factors. British Journal of Educational Psychology, 51, 384-93.

Wild, C. L., Durso, R., & Rubin, D. B. (1982). Effect of increased test-taking time on test scores by ethnic group, years out of school, and sex. Journal of Educational Measurement, 19(1), 19-28.

Chapter 10

Baird, J. R., & Mitchell, I. J. (Eds.). (1986). Improving the quality of teaching and learning: An Australian Case Study - the PEEL Project. Melbourne, Victoria, Australia: Monash University Printery.

Biggs, J. (1988). The role of metacognition in enhancing learning. Australian Journal of Education, 32(2), 127-38.

Clarke, D. J. (1985). The IMPACT project: Project report. Clayton, Victoria, Australia: Monash Centre for Mathematics Education.

Clarke, D. J. (1987). The interactive monitoring of children's learning of mathematics. For the Learning of Mathematics, 7(1), 2-6.

Clarke, D. J. (1989). Assessment alternatives in mathematics. A publication of the Mathematics Curriculum and Teaching Program (MCTP) Professional Development Project. Canberra, A.C.T., Australia: Curriculum Development Centre.

Clarke, D. J., Stephens, W. M., & Waywood, A. (1989). Communication and the learning of mathematics: The Vaucluse Study - Supplement A. Oakleigh, N.S.W., Australia: Australian Catholic University - Christ Campus.

Garofalo, J., & Lester, F. K. (1984). Metacognition, cognitive monitoring and mathematical performance. Journal for Research in Mathematics Education, 16(3), 163-76.

Kilpatrick, J. (1985). Reflection and recursion. Educational Studies in Mathematics, 16, 1-26.

Mason, J. (1984). Learning and doing mathematics. Mathematics foundation course. The Open University Press, UK.

National Council of Teachers of Mathematics. (1989). Curriculum and evaluation standards for school mathematics. Reston, VA: Author.

Rowe, M. B. (1978). Wait, wait, wait ... School Science and Mathematics, 78(3), 207-16.

Schoenfeld, A. (1985). Mathematical problem solving. London: Academic Press.

Stephens, W. M. (1982). School work and mathematical knowledge. Madison: Wisconsin Center for Educational Research.

Waywood, A. (1988). Mathematics and language: Reflections on students using mathematics journals. In R. Hunting (Ed.), Language issues in learning and teaching mathematics. Bundoora, N.S.W., Australia: Latrobe University.

White, R. T. (1986). Origins of PEEL. In J. R. Baird & I. J. Mitchell (Eds.), Improving the quality of teaching and learning: An Australian Case Study - the PEEL Project. Melbourne, Victoria, Australia: Monash University Printery.

Chapter 11

Biggs, J. B., & Collis, K. F. (1982). Evaluating the quality of learning: The SOLO Taxonomy. New York: Academic Press.

Bloom, B. S. (Ed.). (1956). Taxonomy of educational objectives: The classification of educational goals. Handbook 1: Cognitive domain. New York: Longman.

Bloom, B. S., Hastings, J. T., & Madaus, G. F. (1971). Handbook on formative and summative evaluation of student learning. New York: McGraw-Hill.

Chi, M. T. H., Feltovich, P. J., & Glaser, R. (1981). Categorization and representation of physics problems by experts and novices. Cognitive Science, 5(2), 121-52.

Collis, K. (1983). Development of a group test of mathematical understanding using superitem SOLO technique. Journal of Science and Mathematics Education in South East Asia, 6(1), 5-1...

Cureton, E. E. (1965). Reliability and validity: Basic assumptions and experimental designs. Educational and Psychological Measurement, 25(2), 326-46.

Dahlgren, L.-O. (1984). Outcomes of learning. In F. Marton, D. Hounsell, & N. Entwistle (Eds.), The experience of learning. Edinburgh: Scottish Academic Press.

D'Ambrosio, U. (1979). Overall goals and objectives for mathematical education. In UNESCO International Commission on Mathematical Instruction, New trends in mathematics teaching. Paris: UNESCO.

Davis, P. J., & Hersh, R. (1981). The mathematical experience. Boston: Houghton Mifflin.

Freudenthal, H. (1983). Major problems in mathematics education. In M. Zweng, T. Green, J. Kilpatrick, H. Pollack, & M. Suydam (Eds.), Proceedings of the Fourth International Congress on Mathematical Education. Boston: Birkhauser.

Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Boston: Kluwer-Nijhoff.

Johansson, B., Marton, F., & Svensson, L. (1985). An approach to describing learning as change between qualitatively different conceptions. In L. H. West & L. A. Pines (Eds.), Cognitive structure and conceptual change. Orlando: Academic Press.

Kulm, G. (Ed.). (1990). Assessing higher order thinking in mathematics. Washington, DC: American Association for the Advancement of Science.

Larkin, J. H. (1983). The role of problem representation in physics. In D. Gentner & A. Stevens (Eds.), Mental models. Hillsdale, NJ: Erlbaum.

Laurillard, D. (1984). Learning from problem solving. In F. Marton, D. Hounsell, & N. Entwistle (Eds.), The experience of learning. Edinburgh: Scottish Academic Press.

Leonard, F., & Sackur-Grisvald, C. (1981). Sur deux regles implicites utilisees dans la comparaison des nombres decimaux positifs [On two implicit rules used in comparison of positive decimal numbers]. Bulletin de l'APMEP, 327, 47-60.

Marton, F. (1981). Phenomenography - describing conceptions of the world around us. Instructional Science, 10(2), 177-200.

Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149-74.

Masters, G. N., & Wilson, M. (1988). PC-CREDIT [Computer program]. Melbourne: University of Melbourne, Centre for the Study of Higher Education.

McCloskey, M., Caramazza, A., & Green, B. (1980). Curvilinear motion in the absence of external forces: Naive beliefs about motion of objects. Science, 210, 1139-41.

National Council of Teachers of Mathematics. (1980). An agenda for action: Recommendations for school mathematics of the 1980's. Reston, VA: Author.

Nesher, P. (1986). Learning mathematics: A cognitive perspective. American Psychologist, 41(10), 1114-22.

Nesher, P., & Peled, I. (1984). The derivation of mal-rules in the process of learning. Haifa, Israel: University of Haifa.

Resnick, L. B. (1982). Syntax and semantics in learning to subtract. In T. Carpenter, J. Moser, & T. A. Romberg (Eds.), Addition and subtraction: A cognitive perspective. Hillsdale, NJ: Erlbaum.

Resnick, L. B. (1984). Beyond error analysis: The role of understanding in elementary school arithmetic. In H. N. Cheek (Ed.), Diagnostic and prescriptive mathematics: Issues, ideas and insights. Kent, OH: Research Council for Diagnosis and Prescriptive Mathematics Research.

Romberg, T. (1983). A common curriculum for mathematics. In G. D. Fenstermacher & J. I. Goodlad (Eds.), Individual differences and the common curriculum: Eighty-second yearbook of the National Society for the Study of Education. Chicago: University of Chicago Press.

Romberg, T. A., Collis, K. F., Donovan, B. F., Buchanan, A. E., & Romberg, M. N. (1982). The development of mathematical problem solving superitems. (Report of NIE/EC Item Development Project.) Madison: Wisconsin Center for Educational Research.

Romberg, T. A., Jurdak, M. E., Collis, K. F., & Buchanan, A. E. (1982). Construct validity of a set of mathematical superitems. (Report of NIE/EC Item Development Project.) Madison: Wisconsin Center for Educational Research.

Saljo, R. (1984). Learning from reading. In F. Marton, D. Hounsell, & N. Entwistle (Eds.), The experience of learning. Edinburgh: Scottish Academic Press.

Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika, Monograph Supplement No. 17.

Sandberg, J. A., & Barnard, Y. F. (1986). Story problems are difficult, but why? Paper presented at the annual meeting of the American Educational Research Association, San Francisco.

Schwartz, J. (1985). The geometric supposer [Computer program]. Pleasantville, NY: Sunburst Communications.

Swan, M. (1983). The meaning and use of decimals. Nottingham, University of Nottingham: Shell Centre for Mathematical Education.

Van Hiele, P. M. (1986). Structure and insight: A theory of mathematics education. Orlando: Academic Press.

Vergnaud, G. (1983). Multiplicative structure. In R. Lesh & M. Landau (Eds.), Acquisition of mathematics concepts and processes. New York: Academic Press.

Von Glasersfeld, E. (1983). Learning as a constructive activity. In J. C. Bergeron & N. Herscovics (Eds.), Proceedings of the fifth annual meeting of the PME-NA. Montreal: Universite de Montreal, Faculte des Sciences de l'Education.

Webb, N. L., Day, R., & Romberg, T. A. (1988). Evaluation of the use of "Exploring Data" and "Exploring Probability." Madison: Wisconsin Center for Education Research.

Wright, B. D., & Masters, G. N. (1982). Rating scale analysis. Chicago: MESA Press.

Wright, B. D., & Stone, M. (1979). Best test design. Chicago: MESA Press.

Wilson, M., & Iventosch, L. (1988). Using the Partial Credit model to investigate responses to structured subtests. Applied Measurement in Education, 1(4), 319-34.

Chapter 12

Becker, J. R., & Pence, B. J. (in press). The California case: Curriculum is what counts. In T. A. Romberg & E. A. Zarinnia (Eds.), Four case studies on the impact of mandated testing. Madison: Wisconsin Center for Education Research.

Bishop, A. J. (1988). Mathematical enculturation. Boston, MA: Kluwer Academic.

Bloom, B. S. (Ed.). (1956). Taxonomy of educational objectives: The classification of educational goals. Handbook 1: Cognitive domain. New York: Longman.

Brown, J. S., Collins, A., & Duguid, P. (1988). Situated cognition and the culture of learning. Educational Researcher, 18(1), 32-42.

California Assessment Program. (1987). Survey of academic skills: Grade 12. Sacramento: California State Department of Education.

California State Department of Education. (1985). Mathematics framework for California public schools: Kindergarten through grade 12. Sacramento: Author.

Cocking, R. R., & Mestre, J. P. (1988). Linguistic and cultural influences on learning mathematics. Hillsdale, NJ: Erlbaum.

Collis, K. F., Romberg, T. A., & Jurdak, M. E. (1986). A technique for assessing mathematical problem-solving ability. Journal for Research in Mathematics Education, 17(3), 206-21.

de Lange, J. (1987). Mathematics, insight and meaning: Teaching, learning and testing of mathematics for the life and social sciences. Utrecht, The Netherlands: Rijksuniversiteit Utrecht, Vakgroep Onderzoek Wiskundeonderwijs en Onderwijscomputercentrum.

Department of Education and Science. (1985). Mathematics from 5 to 16. London, UK: Her Majesty's Stationery Office.

Freudenthal, H. (1983). Didactical phenomenology of mathematical structures. Dordrecht, The Netherlands: D. Reidel.

Gale, D., & Shapley, L. S. (1967). College admissions and the stability of marriage. In M. S. Bell (Ed.), Some uses of mathematics: A source book for teachers. (Studies in Mathematics, Vol. XVI) Stanford, CA: School Mathematics Study Group.

GAIM Team. (1988). Graded assessment in mathematics. London, UK: Macmillan Education.

Mathematical Sciences Education Board. (1989). Everybody counts: A report to the nation on the future of mathematics education. Washington, DC: National Academy Press.

Mathematical Sciences Education Board. (1990). On the shoulders of giants. Washington, DC: National Academy Press.

Mellin-Olsen, S. (1987). The politics of mathematics education. Boston: D. Reidel.

National Assessment of Educational Progress. (1988). Mathematics objectives: 1990 assessment. Princeton: Educational Testing Service.

National Council of Teachers of Mathematics. (1989). Curriculum and evaluation standards for school mathematics. Reston, VA: Author.

Reich, R. B. (1983). The next American frontier. Harmondsworth, Middlesex, UK: Penguin Books.

Romberg, T. A. (1985, December). The content validity for school mathematics in the US of the mathematics subscores and items for the Second International Mathematics Study. Paper presented for the Committee on National Statistics, National Research Council of the National Academy of Sciences. Madison: Wisconsin Center for Education Research.

Romberg, T. A., & Kilpatrick, J. (1969). Appendix D. Preliminary study on evaluation in mathematics education. In T. A. Romberg & J. W. Wilson (Eds.), The development of tests. NLSMA report no. 7 (pp. 281-98). Stanford, CA: School Mathematics Study Group.

Romberg, T. A., & Wilson, J. W. (1969). The development of tests. NLSMA report no. 7. Stanford, CA: School Mathematics Study Group.

Romberg, T. A., & Zarinnia, E. A. (Eds.). (in press a). Four case studies on the impact of mandated testing. Madison: Wisconsin Center for Education Research.

Romberg, T. A., & Zarinnia, E. A. (Eds.). (in press b). A follow-up of the four case studies on the impact of mandated testing. Madison: Wisconsin Center for Education Research.

Romberg, T. A., Zarinnia, E. A., & Williams, S. R. (1989). The influence of mandated testing on mathematics instruction: Grade 8 teachers' perceptions. Madison: University of Wisconsin, National Center for Research in Mathematical Sciences Education.

Romberg, T. A., Zarinnia, E. A., & Williams, S. R. (1990). Mandated school mathematics testing in the United States: A survey of state mathematics supervisors. Madison: University of Wisconsin, National Center for Research in Mathematical Sciences Education.

Rucker, R. (1987). Mind tools: The five levels of mathematical reality. Boston: Houghton Mifflin.

Scheffler, I. (1975, October). Basic mathematical skills: Some philosophical and practical remarks. In The NIE Conference on basic mathematical skills and learning (Euclid, OH). Volume I: Contributed position papers (pp. 182-89). Los Alamitos, CA: SWRL Educational Research and Development.

School Mathematics Project. (1988). SMP 11-16 0E7' handbook. (Trial ed. (rev.)). Southampton, UK: Author.

Sirotnik, K. A. (1984). An outcome-free conception of schooling: Implications for school-based inquiry and information systems. Los Angeles: University of California, Center for the Study of Evaluation.

Stenmark, J. K. (1989). Assessment alternatives in mathematics: An overview of assessment techniques for the future. Prepared by EQUALS and the Assessment Committee of the California Mathematics Council, Campaign for Mathematics. Berkeley: University of California, Lawrence Hall of Science.

Westbury, I. (1980, January). Change and stability in the curriculum: An overview of the questions. In Comparative studies of mathematics curricula: Changes and stability, 1960-1980 (pp. 12-36). Proceedings of a conference jointly organized by the Institute for the Didactics of Mathematics (IDM) and the International Mathematics Committee of the Second International Mathematics Study of the International Association for the Evaluation of Educational Achievement (IEA). Bielefeld, FRG: Institut für Didaktik der Mathematik der Universität Bielefeld.

Wiggins, G. (1989, August). Teaching to the (authentic) test. Educational Leadership, 46, 41-47.

Chapter 13

Clarkson, P. C. (1991). Bilingualism and mathematics learning. Geelong, Victoria, Australia: Deakin University Press.

Cuevas, G. J. (1984). Mathematics learning in English as a second language. Journal for Research in Mathematics Education, 15(2), 134-44.

Leder, G. (1990). Teacher/student interactions in the mathematics classroom: A different perspective. In E. Fennema & G. Leder (Eds.), Mathematics and gender. New York: Teachers College Press.

Mathematical Sciences Education Board. (1989). Everybody counts: A report to the nation on the future of mathematics education. Washington, DC: National Academy Press.

Mathematical Sciences Education Board. (1990). Reshaping school mathematics. Washington, DC: National Academy Press.

National Council of Teachers of Mathematics. (1989). Curriculum and evaluation standards for school mathematics. Reston, VA: Author.

Newman, M. A. (1983). The Newman language of mathematics kit. Sydney, N.S.W., Australia: Harcourt Brace Jovanovich.

Pegg, J. E., & Davey, G. (1989). Clarifying level descriptors of children's understanding of some basic 2-D geometric shapes. Mathematics Education Research Journal, 1(1), 16-27.

Secada, W. G. (1990). The challenge of a changing world for mathematics education. In T. Cooney & C. R. Hirsch (Eds.), Teaching and learning mathematics in the 1990s. Reston, VA: National Council of Teachers of Mathematics.

Victoria Curriculum and Assessment Board. (1989). Mathematics study design. Melbourne, Victoria, Australia: Author.

Watson, I. (1980). Investigating errors of beginning mathematicians. Educational Studies in Mathematics, 11, 319-30.

CONTRIBUTORS

James Braswell, Educational Testing Service, Princeton, New Jersey

Silvia Chavarria, Universidad de Costa Rica, San Jose

David Clarke, Australian Catholic University (Christ Campus), Oakleigh, Victoria, Australia

Philip C. Clarkson, Australian Catholic University, Ascot Vale, Victoria

John G. Harvey, University of Wisconsin-Madison

'Mamphono Khaketla, Ministry of Education, Lesotho

Margaret R. Meyer, University of Wisconsin-Madison

Tej Pandey, California Department of Education, Sacramento

Sharon Senk, Michigan State University, East Lansing

Max Stephens, Ministry of Education, Victoria, Australia

Thomas A. Romberg, University of Wisconsin-Madison

Andrew Waywood, Vaucluse College, Richmond, Victoria, Australia

Norman Webb, University of Wisconsin-Madison

Linda Wilson, University of Wisconsin-Madison

Mark Wilson, University of California-Berkeley

E. Anne Zarinnia, University of Wisconsin-Whitewater

AUTHOR INDEX

A
Abo-Elkhair, M. E., 150, 153
Aksoy, D., 130-131
Apple, M. W., 28

B
Barnard, Y. F., 218, 222
Bell, R. C., 177
Biggs, J. B., 181, 182, 185
Binet & Simon, 17-18
Bishop, A. J., 270-271
Bloom, B. S., 25, 175, 217, 255
Bolger, N., 176-177
Bone, D. D., 158-159
Braswell, J., 5, 75-99

C
Carey, D. A., 50
Carpenter, T. P., 50, 59
Casterlow, G., Jr., 153
Chavarria, S., 5, 61-74
Choppin, B., 177
Clarke, D., 7, 184-212
Clarkson, P., 8, 285-300
Coley, R. J., 77
Connor, P. J., 147
Corbitt, M. K., 123
Cronbach, L. J., 101

D
Davis, P. J., 215
De Lange, J., 55-60
Dorans, N. J., 175-176

E
Elliott, J. W., 147
Eisenberg, T., 27

F
Fennema, E., 50, 59
Fey, J. T., 132
Fisher, G., 15
Flanders, J., 130-131

G
Garofalo, J., 185
Gimmestad, B. J., 147-148
Goetz, M. E., 77
Golden, C. K., 147
Goldenberg, E. P., 129, 130-131, 132
Graf, R. G., 172

H
Hambleton, R. K., 175
Hampton, S. H., 180
Harvey, J. G., 7, 130-131, 136-137, 139-168
Hattie, J., 181-182
Hay, J. A., 177
Heid, M. K., 130-131, 132, 136
Herrmann, D. J., 26
Hersh, R., 215
Hill, K. T., 171
Hoffer, A., 123
Honig, W., 126
Hoover, H. D., 148

J
Johansson, B., 218-219

K
Kappy, K. A., 170
Khaketla, M., 5, 61-74
Khampalikit, C., 178
Kilpatrick, J., 2
Kimball, M. M., 182
Kleinke, D. J., 174
Koehler, R. A., 180

L
Lane, D. S., Jr., 175
Laurillard, D., 216
Leder, G., 291
Leary, L. F., 175-176
Leinhardt, G., 132
Leitzel, J. R., 151-152
Lester, F. K., 185
Lewis, J., 148
Lesh, R., 122
Lester, F. K., Jr., 122
Long, V. M., 152-153

M
Mann, Horace, 15
Marton, F., 219, 220
Masters, G. N., 222
Mayer, E., 122
Mellon, J. A., 153
Meyer, M., 7, 124, 169-183
Millman, J., 124
Munsterberg, H., 17
Murphy, N. K., 146
Murphy, R. J. L., 176

N
Newell, A., 122

O
Osterlind, S. J., 152-153

P
Pandey, Tej, 2, 6, 100-127
Patience, W. M., 174
Payne, B. D., 179
Peterson, P. L., 50, 59
Plake, B. S., 173, 174
Plass, J. A., 171
Polya, G., 122
Popkewitz, T., 28
Popham, W. J., 124

R
Resnick, L. B., 122
Reys, B., 152-153
Rice, J. M., 15-16
Riddell, J. C., 172
Romberg, T. A., 1-9, 10-36, 37-60, 61-74, 123, 242-284
Rule, R. L., 153-154

S
Samejima, F., 222
Sandberg, J. A., 218, 222
Sarther, C., 131
Scheffler, I., 254-255
Schoenfeld, A., 26, 122
Senk, S. L., 6, 128-137
Shavelson, R. J., 76
Silver, E. A., 122
Simon, H. A., 122
Skinner, N. F., 178-179
Slakter, M. J., 180, 181
Spearman, C., 17
Speth, C. A., 182
Stein, M. K., 132
Stephens, M., 7, 184-212
Sternberg, R. J., 122
Svensson, L., 219
Swan, M., 53, 59

T
Thorndike, E. L., 17-18
Traub, R. E., 175
Tyler, Ralph, 19

V
Vergnaud, G., 30, 45, 215
Von Glasersfeld, E., 215

W
Waits, B. K., 131, 136-137, 151-152
Watkins, D., 181-182
Waywood, A., 7, 184-212
Webb, N., 5, 37-60
Westbury, I., 254
White, R. T., 185
Whitney, D. R., 174
Wilson, J., 123
Wilson, L., 5, 61-74
Wilson, M., 8, 213-241, 292
Wright, B. D., 222

Y
Young, M., 28

Z
Zarinnia, A., 8, 61, 242-284
Zaslavsky, O., 131-132

INDEX

A
Achievement measurement, 248
  for Chapter I students, 245
Army Alpha Tests, 17-18
Assessment (see also Evaluation and Tests & Testing)
  accountability, 8, 101, 242, 250, 287, 288
  achievement, 298
  administrative use of, 6
  aligned with curriculum, 257
  alignment with NCTM Standards, 43-47, 61, 78, 257
  alternative forms of, 42, 48-49, 190-191, 247, 249, 251-272, 276, 280
  basic skills/competency, 77
  bias in, 291
  calculator or computer use in, 81, 130, 288, 297, 298-299
  collaborative, 284
  computational skill, 122
  computer-administered tasks, 221
  conceptual understanding, 79
  congruence between methods and use, 48
  continuous, 276
  criteria for, 37, 49, 42-48
  current trends in, 28-35
  curriculum evaluation, 75, 100-101

  educational change, 242, 250
  effecting change in, 293-228, 299
  employability skills, 75
  for educational decision making, 11-13
  gender issues in, 4, 7, 169, 170-183
  goals of, 38, 42
  Grades K-4, 50-52
  Grades 5-8, 52-55, 64
  Grades 7-12, 55-59
  group, 75-76, 100
  high school graduation requirement, 77, 84
  history of, 10-11, 14-27
  in advanced technological environments, 130
  in California, 243, 245
  in courses using graphics tools, 134
  individual, 75-76, 100, 248, 250
  instructional decisions, 250, 252
  instruments for, 26, 27, 48-49
  integrated with instruction, 247, 261, 276-277
  item-sampling (multiple-matrix sampling), 101-102, 103
  language and culture bias in, 291-292
  large-scale, 6, 100


  methods of, 52
  multiple-choice items, 101, 124, 125-127, 217
  new principles of, 27
  new vision of, 39, 40
  of student-constructed learning, 216
  open-ended items, 76, 81, 101, 127
  open-sentences, 218, 220
  performance, 101
  policy profile, 22-25, 35
  portfolio use in, 101
  problem solving, 79, 99, 101, 114, 121-122, 124, 191, 192-193, 215-216, 227, 228, 239, 289-290, 292
  procedural knowledge, 79
  programs, 101, 244, 250
  purpose and use, 48, 75
  school-based, 275, 276, 280, 284
  state assessment, 75-99, 100-127
  state-mandated, 5
  student journals, 187-190, 193-196
    effect on learning and teaching, 196
    evaluation of, 196-209
    incentives and obstacles to use, 201
    progression in student journal writing, 198, 207-209
    purpose, 196-201, 209
    results, 198-209
    teacher response to, 203
    validity, 198
  student self-assessment, 193, 265, 275-276, 284
  via classroom observation, 191-192
  traditional, 270
  uses of, 11-12, 75, 100, 245, 250, 269, 270, 285-288
  writing component, 298
Assessment Alternatives in Mathematics, 283
Assessment Performance Unit (England), 22-23
Australia, 8, 69, 274
  Assessment Alternatives in Mathematics, 8, 185, 186, 191
  IMPACT Project, 3, 7, 185, 186-190, 194, 195, 211
  Mathematics Curriculum and Teaching Program, 190
  Vaucluse College Study, 8, 185, 193-212
  Victoria, 285, 288, 292-299
    Mathematics Association of, 296
    Melbourne University, 293-294
    Monash University, 186, 294
    new math syllabus, 295
    test program, 9

B
Behaviorism, 27, 254, 255
  behavioral objectives movement, 21
Binet-Simon Scale, 17, 18
Bloom's Taxonomy of Educational Objectives, 25-26, 35, 255

C
Calculators and computers in school mathematics (see also Graphing technology), 79, 247
  calculator-active test items, 81
  change in item objectives with use of, 149
  classroom use of, 7, 62, 140, 288, 289, 296, 299


  in mathematics testing, 4, 7, 62, 139, 143, 157
  research and development, 132, 133
  research on use in testing, 158
Calculator-Based Placement Test Program Project, 130
California, 69, 290
  A Question of Thinking, 126
  Curriculum Alignment System/Comprehensive Assessment System (CAS), 251
  Mathematics Framework for California Public Schools, 107, 114, 121-123, 124
  Mathematics Framework for California Public Schools: Kindergarten Through Grade 12, 243-244, 246-249, 251, 257, 260
  Mathematics Framework for California Public Schools (1990 draft), 261, 264
  Mathematics Model Curriculum Guide, 107, 114, 121
  Mathematics Projects, 108, 114
  proficiency requirement, 245, 249
California Achievement Test (CAT), 62, 66, 251
California Assessment Program (CAP), 6, 72, 242, 244, 245, 246-251, 260, 283-284
  content-specifications, 110, 113-121
  emphasis on understanding, 244
  field testing items, 109
  for program diagnosis, 104
  framework for testing, 1, 8
  history of, 106-107
  matrix sampling, 246, 247
  percentile ranking, 246-249
  program assessment, 248, 250, 281
  purpose, 101-104
  reporting performance, 246
  scores, importance of, 246, 248
  strands, 246
  Survey of Academic Skills, 123, 126
  Survey of Academic Skills: Grade 12, 257
    correlation with Framework
  Survey of Basic Skills, 102, 123
  teacher perceptions, 244, 248
  test development, 107-113
  use of data, 245, 246, 248, 260
Changes in Student Assessment Occasioned by Function Graphing Tools Symposium, 136
Chapter 1: ESEA, 245
Classroom testing
  literature review, 3
Cognition, situated, 260
Cognitively Guided Instruction (CGI) Project, 35, 50
College Board, 143, 157
  Mathematics Achievement Test, 130
Common School Journal, 15
Communication in the mathematics classroom, 69, 185, 186-190, 202, 210
  modes of expression, 196
    dialogue, 198, 207, 211
    narrative, 197, 207, 211
    summary, 197, 207, 211
  written, 187
  (see also Journal writing and Portfolio use)
Comprehensive Test of Basic Skills (CTBS), 62, 64, 67, 246
  uses of, 245, 249
Computational algorithms, 40


Computer Managed Instruction (CMI) program, 245, 246, 249
  testing, 245
Computational emphasis, 246, 249
Conceptual fields (Vergnaud), 30, 45, 215
  additive structure, 46
Conference Board of the Mathematical Sciences, 143
Connecticut, 69
  Common Core of Learning Assessment Project, 70
Constructing knowledge, 287
Content-by-behavior framework, 46, 47
Content-by-behavior matrix, 24, 253-254
Content-by-process framework, 255, 257
Council of Chief State School Officers (CCSSO), 96
Curriculum
  achieved, 243, 244
  actual, 243
  change, 295, 299
  differential, 243, 247
  intended, 243, 244, 250
  introducing innovation into, 184, 185
  matrix use in, 256
  measured, 243
  test alignment with, 61-74
Curriculum Standards (NCTM), 39-44

D
Deskilling teachers, 34, 36
Domain knowledge strategy, 45-47

E
Educational Technology Center, Visualizing Algebra, 129
Educational Testing Service (ETS), 97, 136
England, 69, 70, 274
English "O" level examinations, 16
Epistemological change, 242, 251, 256, 259, 260, 266, 272, 275, 283-284
  monitoring, 242
Evaluation, 10
  application of "scientific principles," 19
  convergent strategy, 29
  defined, 29
  formative, 29
  history of, 14-27
  impact of new mathematics programs, 10
  large-scale profiles, 10
  of administrators, 246
  of higher-order skills, 298
  of principals, 245
  of teachers, 244, 246
  policy and program, 10-11, 18, 20-22
  program evaluation, 16, 19, 20-22, 29
  stages of, 29
  summative, 29
  trends in, 28-30
Evaluation Standards (NCTM), 40, 49-60, 63-68
  purpose of program evaluation, 42
Everybody Counts (MSEB), 142

F
First International Mathematics Study (FIMS), 22
Florida, 75, 83, 84, 85
  case study, 86-89
  use of tests, 88
France, 69
Functions, Statistics, and Trigonometry with Computers, 131


G
Gender bias, 4, 7
  in assessment, 290
  in the classroom, 290-291
General intelligence tests, 16
Graded Assessment in Mathematics (GAIM), 271
Graded Assessment Project (England), 35
Graduation
  proficiency tests for, 245, 248-249
Graphing technology
  impact on learning, 130, 133-134
  in assessment, 6, 128
  in teaching, 128
  methodological issues, 134
  re. curriculum goals, 132
  research recommendations, 133-134, 136-137
Great Society initiatives, 18-19
Greenwich Hospital School, 16

H
Hewet Mathematics A Project (The Netherlands), 30
Hewlett-Packard, 140
High School Proficiency Test (HSPT), 84
Higher-level learning, 214
Higher-order thinking skills, 63, 80, 121

I
Individually Guided Education (IGE) Evaluation Study, 29
Influence of Testing on Mathematics Education Conference, The, 1
Information technology, 262
Interdisciplinary effort, 272
Interjudge agreement, 276-277, 280-281, 282, 284
Iowa Test of Basic Skills (ITBS), 62, 64, 67
Iowa Test of Educational Development (ITED), 106
Item-response map(s), 223-226, 229-240

J
Journal writing, 199, 292

K
Knowledge as process, 287
Korea, 69

L
London & East Anglian Group for GCSE Examinations, 72
Louisiana, 75, 84, 85
Lower-order thinking skills, 63, 80

M
Mandated testing, 62, 245, 288, 295
  federal, 245
  impact on instruction, 3
  impact of in California, 244
  prevalence of, 62
  school board, 245
  use of data, 244
Massachusetts, 69, 75, 76, 79, 83, 85
  case study, 89-92
  goal of assessment, 92
  use of tests, 91
Mastery
  of objectives, 245
  testing for, 245


Mathematical
  communication, 263, 264, 271-272
  community, 266, 272
  connections, 263
  disposition, 270, 272
  literacy, 265
  reasoning, 263
Mathematical Association of America (MAA), 136, 143, 157
  Calculator-Based Placement Test Program Project (CBPTP), 158, 159, 167
    Calculator-Based Arithmetic and Skills Test (CB-A-S), 160-161, 164
    Calculator-Based Calculus Readiness Test (CB-CR), 160, 161, 164
    Calculator-Based Basic Algebra Test (CB-BA), 160, 162-163, 164
    Calculator-Based Algebra Test (CB-A), 160, 163-164
  Placement Test Program, 145
Mathematical domains
  new assessment procedures, 30
  specification of, 30
Mathematical power, 39, 123, 243, 252, 259, 260-267, 272-273, 282-283
  defined, 258, 262-266
  evidence of, 266-270, 273, 283
    strategies for collecting, 275, 276
  in California Framework, 260
  in NCTM Standards, 261, 263-264, 282
  individual, 261-265
  societal, 261-263
  understanding, 257
Mathematical Sciences Education Board (MSEB), 1, 2, 142
Mathematical understanding
  "Conceptual field" approach of Vergnaud, 215
  expert-novice research, 215
  measurement of, 8
  student construction of, 215
Mathematics A Course (The Netherlands), 55
Mathematics
  and failure in school, 262
  cultural genesis of, 260, 268-270
  democratization of, 259, 268, 275
  education, politics of, 260, 266
  metaphors for, 258, 269
  understanding, 244
Mathematics as:
  cultural system, 258
  human activity, 269
  intellectual structure vs. institutional structure, 254, 255, 271
  language, 258
  problem-solving, 257, 258
  process, 257
  science of patterns, 258
Mathematics Curriculum Teaching Project (MCTP) (Australia), 35
Mathematics education
  change in, 299
  government/political expectations, 287-288
Mathematics tasks
  context, 273-274
  criteria-referenced, 273
  problem situations, 264, 265-266, 267
  templates for analysis of performance, 275, 277, 280
  two-stage, 274
Matrix sampling, 246


Metacognitive learning (metalearning) in mathematics, 184, 185, 190, 196
Metropolitan Achievement Test (MAT), 62, 67
Michigan, 75, 79, 83, 85
  case study, 92-95
  "Essential Goals and Objectives in Mathematics," 92
  Michigan Educational Assessment Program (MEAP), 92-95
  use of tests, 95
Minimum Basic Skills tests, 84
Missouri Mastery and Achievement Tests (MMAT), 152
Modern test theory
  Graded Response Model, 222
  Item Response Theory, 213, 239
  Partial-Credit Model (PCM), 222-226, 228
  Rasch Model, 213
Monash University Mathematics Education Centre, 186
Multicultural populations, 260

N
National Advisory Committee on Mathematical Education (NACOME), 142
National Assessment of Educational Progress (NAEP), 6, 22, 75, 79, 89, 95, 149, 156, 252, 253, 286, 290, 291, 300
  Mathematics Item Development Committee, 97
  1990 National Assessment in Mathematics, 75, 78, 81
  NAEP/ETS development team, 97
  state-by-state assessment, 96
National Center for Educational Statistics (NCES), 98
National Center for Research in Mathematical Sciences Education (NCRMSE), 2-3, 130, 244, 259
National Coalition of Advocates for Students, 21
National Council of Teachers of Mathematics (NCTM), 37, 135
  Commission on Standards for School Mathematics, 143
National Defense Education Act (NDEA), 101
National Longitudinal Study of Mathematical Abilities (NLSMA), 22, 253
NCTM An Agenda for Action, 69, 143
NCTM Standards, 3, 5, 9, 32, 35, 37-60, 61-65, 68-69, 74, 76, 77, 79, 99, 114, 130, 133, 156, 286, 290, 291, 300
  communication, 290-291
Netherlands, The, 69
New vision for school mathematics, 30, 39
New Jersey, 75, 84, 85
New Math, 295
Northern Examining Association (England), 73
Norway, 69

O
Ohio Early Mathematics Placement Testing (EMPT) Program for High School Juniors, 150
Ohio State University, 141, 167
  Approaching Algebra Numerically Project, 141


  Calculator and Computer Precalculus Project (C2PC), 129, 141, 158, 165, 167

P
Pennsylvania State University, 167
Percentile rankings
  CAP, 246
  pressures from, 247-249
Performance testing, 78
Phenomenographic learning (see University of Gothenburg)
Portfolio use, 101, 247, 250, 281
Primary computational skills, 40
Problem-solving, 63, 65, 69, 215-216, 246, 247, 249, 252, 257-259, 260, 263, 292
  assessment of, 246
Procedural knowledge, 64, 68, 69
Proportional reasoning, 42, 52, 59

R
Reasoning, 69
Reporting categories, 242, 251, 256, 259, 271-283
  alternatives, 250
  bases for, 243
  content, 252-253, 257, 260
  corruptibility, 272-273, 283
  doing mathematics, 271-272
  Framework strands, 246
  process, 252-253, 257, 258
  reform-oriented, 243, 245
  recommended for CAP, 271
  societal uses of mathematics, 267-268
Research Group in Mathematics Education (OW & OC, The Netherlands), 4
Reskilling teachers, 276
Role of audience (constituencies), 287
Rusden Activity Mathematics Project (Australia), 296

S
School-based decisions, 296-297
School mathematics, goal of, 26
School Mathematics Project (SMP), 275, 277
Science Research Associates Survey of Basic Skills (SRA), 62, 65
Scientific management, 254-256
  impact on mathematics, 254
Scientific model, 26, 35
  for learning, 25
  for testing, 19, 22, 29
  impact on teachers, 34
Scope and Sequence charts, 256
Second International Mathematics Study (SIMS), 22, 24-25, 35
Self-directed learning, 41
Shell Centre for Mathematical Education (England), 52-53
Sixth International Conference on Mathematical Education (Hungary), 4
South Carolina, 69
SOLO Taxonomy (see Structure of the Observed Learning Outcome)
Stanford Achievement Test (SAT), 62, 66
Structure of the Observed Learning Outcome (SOLO) Taxonomy, 226-241, 292
Students
  accelerated, 247
  placement of, 245
  remedial, 247
Study Group for Mathematics Learning (Australia), 296


T
Teacher control, 295
  empowerment, 295, 299
  professionalism, 295, 299
Teaching to the test, 62, 76, 286
Testing Design Task Force, 2
Tests & Testing
  achievement, 17-18
  alignment with curriculum, 248, 251
  alignment with the NCTM Standards, 5
  alternatives, 4, 191-192
  alternative test procedures, 34
  aptitude, 17
  basic skills, 244, 250, 254
  calculator-active items, 130, 167
  calculator-passive testing, 139, 144-149
  calculator-neutral tests, 139, 144, 149-154
  calculator-based tests, 139, 144, 154-156
  calculator use in (see Assessment)
  computational emphasis, 246
  conditions:
    power, 174-177
    speeded, 174-177
  construction procedures, 100, 107-113
  content, 61, 63, 69
  content-by-process matrix, 124-125
  context-effect on problem-solving, 172
  criterion-referenced, 22, 30, 251
  current practices, 1, 4
  development procedures for state assessment, 4, 5
  development steps, 77-84
  district level, 244, 245, 248
  essay, 180, 186
    item context, 176
  essential competencies, 244
  features:
    item-difficulty sequencing, 177-180
    multiple choice, 169, 173, 175-177, 180
  first arithmetic reasoning test, 18
  gender differences in, 169, 170-183
  general intelligence tests, 16-17
  group, 192
  high-stake tests, 125
  higher-order questions, 134
  impact of, 260
  impact of mechanistic world view on, 18
  item classification, 63
  item development, 81-82, 84
  item pools, 83
  large-scale, 292
  level of response, 63
  mandated, 288, 295
  mathematics content, 260
  measuring conceptual understanding, 219
  microcomputer use in (see Assessment)
  multiple-choice, 7, 38, 70, 246, 249, 250, 252, 273
  multiple-choice items, 81, 130, 286, 290, 298
  19th Century, 15
  norm-referenced, 76, 77
  objective models in, 16
  objective-referenced, 21-22
  open-ended items, 81, 99
  open-response, 70
  preparation strategies, 181-183
  process, 61, 63-67, 69
  process categories, 258


  proficiency testing, 245, 247, 249
  profile achievement, 22-28
  psychometric, 18
  reports to parents, 245
  specifications, 78
    ability, 80, 81
    content, 79, 80, 81
    minimum performance standards, 78
    process, 79
    statistical, 79-80
  standardized, 1, 9, 71-77, 101, 121, 277, 285, 286-287, 289, 290, 293
    calculator use on, 62, 63, 68
    evaluation of, 61
    Grade 8, 70
  state-mandated, 77
  student behaviors in test taking:
    answer-changing, 177-180
    guessing, 177-180, 181
    risk taking, 180-181
    test anxiety, 179
    test-wiseness, 180, 181
  student-constructed, 192
  systems-wide procedures, 293, 300
  teacher perceptions, 244-246, 248-249, 259
  test assembly, 83-84
  theory-intensive model, 217
  traditional achievement tests, 217
  tryout blocks, 82-83
  units of analysis, 250
  use of data, 245, 246-248, 249, 250
  validity, 245
Texas Education Agency, 141
Texas Instruments (TI), 139
Thurstone thresholds, 225
Tyler evaluation model, 19

U
United Kingdom (see England)
Units of analysis, 250
University of Chicago School Mathematics Project (UCSMP), 129, 131, 141, 167
University of Gothenburg phenomenography group, 215, 218-220
University of Illinois Institute on Program Evaluation, 19
University of Pittsburgh Learning Research and Development Center, 131

V
van Hiele levels, 241, 292
Vermont, 69, 282-283
Visualizing Algebra (see Educational Technology Center), 129

W
"What is tested is what gets taught" (MSEB), 167, 286, 289
Wisconsin Center for Education Research (WCER), 2


MATHEMATICS ASSESSMENT

AND EVALUATION

Imperatives for Mathematics Educators

Thomas A. Romberg, Editor

Are current testing practices consistent with the goals of the reform movement in school mathematics? If not, what are the alternatives? How can authentic performance in mathematics be assessed?

These and similar questions about tests and their uses have forced those advocating change to examine the way in which mathematical performance data is gathered and used in American schools. This book provides recent views on the issues surrounding mathematics tests, such as the need for valid performance data, the implications of the Curriculum and Evaluation Standards for School Mathematics for test development, the identification of valid items and tests in terms of the Standards, the procedures now being used to construct a sample of state assessment tests, gender differences in test taking, and methods of reporting student achievement.

"The changing of assessment methods in mathematics is perhaps the largest single obstacle in the path of education reform in mathematics education. This book addresses the hurdles remaining in a clear and consistent fashion. It provides achievable alternatives and plots the paths that will have to be taken to achieve the goals of the NCTM STANDARDS."

John A. Dossey, Illinois State University, Normal, Illinois

Thomas A. Romberg is Sears Roebuck Foundation-Bascom Professor in Education at the National Center for Research in Mathematical Sciences Education at the University of Wisconsin-Madison.

STATE UNIVERSITY OF NEW YORK PRESS

A Volume in the SUNY Series,
Reform in Mathematics Education
Judith Sowder, Editor

ISBN 0-7914-0900-7
