
Improving the Ways We Report Test Scores

Ronald Hambleton, April Zenisky, University of Massachusetts Amherst, USA. CERA Annual Meeting, June 1, 2010.
Page 1: Improving the Ways We Report Test Scores

Improving the Ways We Report Test Scores

Ronald Hambleton, April Zenisky
University of Massachusetts Amherst, USA

CERA Annual Meeting, June 1, 2010.

Page 2: Improving the Ways We Report Test Scores

Important Time in the Testing Field

New provincial tests in Canada and state tests in the USA being introduced as part of educational reform (e.g., MA went from 7 to more than 24 in 10 years).

Users need to understand and use the scores and score reports correctly (or substantial funding is wasted).

Page 3: Improving the Ways We Report Test Scores

1. Considerable investment of time and money has been made to address technical problems:

• IRT modeling of data, test scoring of performance data, test score equating, reliability estimation, computer technology, DIF analyses, standard-setting, and validity studies.

Page 4: Improving the Ways We Report Test Scores

2. Surprisingly, test score reporting attracts very little attention! Name a research study? Without clear and meaningful reporting of information, the other steps are of less value!

Also, on this topic, more than on other technical topics, many persons think they are experts: everyone has an idea here about what to do!

Page 5: Improving the Ways We Report Test Scores

AERA, APA, NCME Test Standards: What do they say about score scales and reporting?

5.10. When test score information is released….those responsible should provide appropriate interpretations.

--information is needed about content coverage, meaning of scores, precision of scores, common misinterpretations, and proper use.

Page 6: Improving the Ways We Report Test Scores

13.14 …Score reports should be accompanied by a clear statement of the degree of measurement error associated with each score or classification level and information on how to interpret the scores.
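One common way to state the degree of measurement error is an error band of plus or minus one standard error of measurement (SEM) around the reported score. The sketch below uses the classical test theory formula SEM = SD × sqrt(1 − reliability). The function names and the example numbers (scale SD of 100, reliability of 0.91) are illustrative assumptions, not values from the presentation.

```python
import math


def sem(sd: float, reliability: float) -> float:
    """Classical standard error of measurement: SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def error_band(score: float, sd: float, reliability: float, z: float = 1.0):
    """Return (low, high) for score +/- z standard errors of measurement."""
    half_width = z * sem(sd, reliability)
    return score - half_width, score + half_width


# Hypothetical numbers for illustration only.
low, high = error_band(score=500, sd=100, reliability=0.91)
print(f"Reported score 500, error band {low:.0f} to {high:.0f}")
```

A band of one SEM covers roughly two thirds of the scores an examinee would be expected to obtain on repeated testing; a wider band (for example z = 2) trades precision for coverage.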

Page 7: Improving the Ways We Report Test Scores

Major Problems in Score Reporting!

Reporting scales and data displays (the reports) are confusing to many persons: percents vs. percentiles; IQ scores; new scales developed by states and provinces; T scores, stanine scores.

Page 8: Improving the Ways We Report Test Scores

Major Problems in Score Reporting!

Quantitative literacy is not high (three kinds of persons!). Half of the population in the US can’t read bus schedules. What’s 20 million dollars for testing? (1/3 of 1% of education budget)

NRT vs. CRT scores.

Page 9: Improving the Ways We Report Test Scores

Major Problems in Score Reporting!

Body of evidence highlighting score reporting problems (e.g., Jaeger)

Reporting scores without error bands

Too much meaningless score information on some reports (called “chart clutter” by Tufte)

Not providing meaningful diagnostic information

Page 10: Improving the Ways We Report Test Scores

Goals of the Presentation

1. Consider student reports: improving the meaning of score scales and diagnostic reports.

2. Mention several emerging methodologies for researching score reports and their utility.

3. Identify a seven-step model for improving score report design and evaluation.

Page 11: Improving the Ways We Report Test Scores

Individual Test Score Reports

In the USA, over 30,000,000 individual reports, alone, to parents of school children.

Over 1000 credentialing exams, and some of the exams exceed 100,000 candidates (e.g., securities, accountants, nurses)

Page 12: Improving the Ways We Report Test Scores

Shortcomings in the Student Reports (Goodman & Hambleton, AME, 2004)

No stated purpose, no advance organizer, no clues about where to start reading.

Performance categories (typically) are not defined, even briefly.

No error bands on any of the reported scores, or even a hint that errors of measurement (i.e., imprecision) are present!

Page 13: Improving the Ways We Report Test Scores

Shortcomings in the Student Reports

Font is often too small to read easily.

Instructional needs information is not always user-friendly, e.g. (to a parent), “You need help in extending meaning by drawing conclusions and using critical thinking to connect and synthesize information within and across text, ideas, and concepts.”

Page 14: Improving the Ways We Report Test Scores

Shortcomings in the Student Reports

Several undefined terms on the displays: percentile, prompt, z score, performance category, achievement level, and more.

Basically, the reports are crowded!

Page 15: Improving the Ways We Report Test Scores

Two Ideas for Score Reports

Bench-marking is one of our favorites and most promising:

Capitalizes on item response theory (IRT): strong modeling of data, and items and candidates being reported on the same scale.

Researchers have been slow to take advantage of this.

Page 16: Improving the Ways We Report Test Scores

Bench-Marking Solution: Makes Scale Scores More Meaningful

Place boundary points on the reporting scale

Choose a probability associated with “knowing/can do”, say, 65%.

Use the ICCs from IRT to develop descriptions of what examinees can and cannot do between boundary points.
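The three steps above can be sketched in code. This is a minimal illustration, assuming a three-parameter logistic ICC with the conventional scaling constant D = 1.7; the item names, item parameters, and boundary points are invented for the example and do not come from the presentation.

```python
import math


def icc_3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic ICC: probability of a correct response at ability theta."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def items_mastered(theta, items, rp=0.65):
    """Names of items whose success probability at theta meets the chosen criterion."""
    return [name for name, (a, b, c) in items.items()
            if icc_3pl(theta, a, b, c) >= rp]


# Hypothetical items: name -> (discrimination a, difficulty b, guessing c).
items = {
    "add fractions": (1.0, -1.5, 0.20),
    "read bar chart": (0.8, -0.5, 0.25),
    "multi-step percent": (1.2, 0.6, 0.20),
    "conditional probability": (1.1, 1.8, 0.15),
}

# Hypothetical boundary points on a -3 to 3 ability scale.
boundaries = [-1.0, 0.0, 1.0]

previous = set()
for theta in boundaries:
    can_do = set(items_mastered(theta, items))
    # Items newly reaching the 65% criterion at this boundary point.
    print(theta, sorted(can_do - previous))
    previous = can_do
```

The items that newly reach the criterion at each boundary point are the raw material for writing the descriptions of what examinees can do between boundary points; writing the descriptions themselves remains a judgment task.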

Page 17: Improving the Ways We Report Test Scores

(3P) Item Characteristic Curve (ICC)

[Figure: item characteristic curve. Horizontal axis: Ability, from -3 to 3. Vertical axis: Probability of Correct Response, from 0.0 to 1.0; a second axis label reads Frequency. Labels A and B appear on the plot.]
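For reference, the three-parameter logistic model that "3P" usually denotes can be written as below, together with the ability at which an item reaches a chosen probability p (such as the 65% mentioned earlier). The symbols are the conventional ones and are an assumption here, since the slide shows only the plot: a_i is discrimination, b_i is difficulty, c_i is the lower asymptote, and D is a scaling constant, often 1.7.

```latex
P_i(\theta) = c_i + \frac{1 - c_i}{1 + e^{-D a_i (\theta - b_i)}}

% Solving P_i(\theta_p) = p for \theta_p, valid when c_i < p < 1:
\frac{p - c_i}{1 - c_i} = \frac{1}{1 + e^{-D a_i (\theta_p - b_i)}}
\quad\Longrightarrow\quad
e^{-D a_i (\theta_p - b_i)} = \frac{1 - p}{p - c_i}
\quad\Longrightarrow\quad
\theta_p = b_i + \frac{1}{D a_i} \ln\!\left(\frac{p - c_i}{1 - p}\right)
```

With p = 0.65, this gives the point on the ability scale at which each item can be placed when building benchmark descriptions.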

Page 18: Improving the Ways We Report Test Scores

Item Characteristic Curves for 60 Items

[Figure: item characteristic curves for 60 items. Horizontal axis: Proficiency Scale, from -3 to 3. Vertical axis: Expected Score (on the 0-1 metric), from 0.00 to 1.00. A reference line is marked P=0.65; labels W, N, and P appear on the plot.]

Reporting Category    Items    Points
Topic 1               13       16
Topic 2               18       21
Topic 3               9        12
Topic 4               8        11
Topic 5               12       15

Page 19: Improving the Ways We Report Test Scores

Making Score Scales More Meaningful

[Figure: Probability, from 0.00 to 1.00, plotted against the Mathematics scale, from 200 to 800.]

Page 20: Improving the Ways We Report Test Scores

Making Score Scales More Meaningful

[Figure: Probability, from 0.00 to 1.00, plotted against the Mathematics scale, from 200 to 800. A reference line is marked at 0.65; scale points 400, 500, and 600 are marked; labels B, P, and A appear on the plot.]

Page 21: Improving the Ways We Report Test Scores

Making Score Scales More Meaningful

[Figure: Probability, from 0.00 to 1.00, plotted against the Mathematics scale, from 200 to 800. A reference line is marked at 0.65; scale point 400 is marked.]

Page 22: Improving the Ways We Report Test Scores

Making Score Scales More Meaningful

[Figure: Probability, from 0.00 to 1.00, plotted against the Mathematics scale, from 200 to 800. A reference line is marked at 0.65; scale point 500 is marked.]

Page 23: Improving the Ways We Report Test Scores

Making Score Scales More Meaningful

[Figure: Probability, from 0.00 to 1.00, plotted against the Mathematics scale, from 200 to 800. A reference line is marked at 0.65; scale point 600 is marked.]

Page 24: Improving the Ways We Report Test Scores

Making Score Scales More Meaningful

[Figure: Probability, from 0.00 to 1.00, plotted against the Mathematics scale, from 200 to 800. A reference line is marked at 0.65; scale points 400, 500, and 600 are marked.]

Page 25: Improving the Ways We Report Test Scores

Meaning of the Mathematics Scale (200 to 800)

Level 200-290: Students at this level can sometimes solve very basic problems in each of the content areas. For example, they can solve simple arithmetic problems and read simple data displays.

Level 300-390: Students at this level show a beginning ability to recall and use mathematical facts and terminology to solve basic problems. For example, they can identify the rule for a simple pattern and solve very routine geometry problems.

Level 400-490: Students at this level display the ability to solve a greater variety of basic problems in each of the content areas. For example, they can recognize relationships and solve routine problems presented in verbal, mathematical, or graphical forms.

Level 500-590: Students at this level are able to solve multi-step problems in different content areas and can make connections between content areas. For example, they can solve multi-step percent problems and can use algebraic skills to solve geometry problems.

Level 600-690: Students at this level show a clear increase in ability to solve more demanding problems, to generalize, to understand mathematical terminology, and to make connections. For example, they can solve complex counting problems involving permutations/combinations, generalize complex patterns, and solve multi-step problems involving geometric/algebraic relationships.

Level 700-790: Students at this level have the ability to apply insight, reasoning, and problem solving strategies to solve a wide range of problems both within and across the content areas. For example, they can solve problems involving newly-defined functions in more than two variables and can solve conditional probability problems by constructing and analyzing a table of possible outcomes.
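A score report could attach the matching description to each reported score with a simple lookup. The sketch below assumes integer scale scores reported in multiples of 10, so every score from 200 to 790 falls inside one of the bands above; the `describe` function and the shortened descriptions are illustrative, and the slide defines no level for a score of 800.

```python
# Descriptions abbreviated from the slide; keys are the lower bound of each band.
LEVELS = {
    200: "can sometimes solve very basic problems in each content area",
    300: "beginning ability to recall and use mathematical facts and terminology",
    400: "can solve a greater variety of basic problems in each content area",
    500: "can solve multi-step problems and connect content areas",
    600: "can solve more demanding problems, generalize, and make connections",
    700: "applies insight, reasoning, and problem-solving strategies across content areas",
}


def describe(score: int) -> str:
    """Return the level label and description for a scale score."""
    band = (score // 100) * 100
    if band not in LEVELS:
        # Covers scores below 200 and the undefined top of the scale (800).
        raise ValueError(f"no level description for score {score}")
    return f"Level {band}-{band + 90}: {LEVELS[band]}"


print(describe(450))
```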

Page 26: Improving the Ways We Report Test Scores

Common Diagnostic Report

Candidate results by subdomain categories (e.g. math):

Content Domain                        Score Points    Percent Correct
1. Data Analysis, Stats (20%)         1 of 10         10%
2. Geometry (10%)                     6 of 8          75%
3. Measurement (20%)                  9 of 12         75%
4. Number Sense/Operations (15%)      4 of 9          44%
5. Patterns (35%)                     4 of 22         18%

[Percent correct is displayed on a bar scale from 0% to 100%.]

Page 27: Improving the Ways We Report Test Scores

Highly Problematic Report!!

No sense of measurement error

No guarantee that the items are representative

No basis for score interpretation

Page 28: Improving the Ways We Report Test Scores

Mathematics: Your Performance Compared to Passing Students

Content Domain                        Your Performance    Passing Student Performance    Weaker / Comparable / Stronger
1. Data Analysis, Stats (20%)         10%                 20%                            X
2. Geometry (10%)                     75%                 60%                            X
3. Measurement (20%)                  75%                 90%                            X
4. Number Sense/Operations (15%)      44%                 60%                            X
5. Patterns (35%)                     18%                 65%                            X

Overall Performance                   Weaker / Comparable / Stronger
Multiple Choice (70%)                 X
Constructed Response (30%)            X

[Each X marks one of the Weaker, Comparable, or Stronger columns; the column positions did not survive in this transcript.]

Page 29: Improving the Ways We Report Test Scores

A Better Report!!

Confidence bands

A frame of reference: performance of borderline candidates, or passing candidates, for example.
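To make the two ingredients concrete, the sketch below puts a rough band around each subdomain percent correct and compares it with the passing-student figure. It uses the score points and passing-student percentages shown on the earlier slides. The band is a simple binomial approximation, sqrt(p(1 − p)/n), which treats every score point as an independent right/wrong item, and the Weaker/Comparable/Stronger rule is a hypothetical one chosen for illustration; neither is presented here as the authors' method.

```python
import math


def band(points: int, possible: int, z: float = 1.0):
    """Percent correct with a rough binomial band, clamped to the 0-1 range."""
    p = points / possible
    se = math.sqrt(p * (1.0 - p) / possible)
    return p, max(0.0, p - z * se), min(1.0, p + z * se)


def compare(points: int, possible: int, reference: float, z: float = 1.0) -> str:
    """Hypothetical rule: the label depends on whether the band excludes the reference."""
    _, low, high = band(points, possible, z)
    if high < reference:
        return "Weaker"
    if low > reference:
        return "Stronger"
    return "Comparable"


# (content domain, points earned, points possible, passing-student proportion)
report = [
    ("Data Analysis, Stats", 1, 10, 0.20),
    ("Geometry", 6, 8, 0.60),
    ("Measurement", 9, 12, 0.90),
    ("Number Sense/Operations", 4, 9, 0.60),
    ("Patterns", 4, 22, 0.65),
]

for name, pts, poss, ref in report:
    p, low, high = band(pts, poss)
    label = compare(pts, poss, ref)
    print(f"{name:<26}{p:>5.0%}  [{low:.0%}, {high:.0%}]  vs {ref:.0%}: {label}")
```

With so few score points per domain the bands are wide, which is the point of showing them: a 75% on 8 points is not clearly different from a passing-student figure of 60%.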

Page 30: Improving the Ways We Report Test Scores

Score Report Design & Evaluation

Experiments

Focus Groups

Think-alouds

Qualitative Reviews from the Field Tryouts

Page 31: Improving the Ways We Report Test Scores

7 Steps in Report Development

Define purpose of score report

Identify intended audience(s)

Review report examples/literature

Develop report(s)

Data collection/field test

Revise and redesign

Ongoing maintenance

Page 32: Improving the Ways We Report Test Scores

Necessary Research

Reducing the size of error bands for knowledge/skill areas:

improving the quality of test items

improving the targeting of the test

capitalizing on correlational information among the skills or other priors

Page 33: Improving the Ways We Report Test Scores

Necessary Research (cont.)

Learning to move from the ICCs, to choosing the number of performance categories, to preparing the descriptive statements that can enhance the meaning of a score scale, and validation.

Page 34: Improving the Ways We Report Test Scores

Final Remarks

Important advances have been made in score reporting.

More research needed on matching score reports to intended audiences, and evaluating score reports prior to use.

Diagnostic reports are important to users but need more research.

Page 35: Improving the Ways We Report Test Scores

Final Remarks

Seven-step model should be used, and exemplar reports compiled.

We are pleased to see the developments taking place.

--States, provinces and countries are beginning to use the tools and progress can be seen.

See the NCME bibliography by Deng and Yoo with 70+ pages of references!

