
Scoring Technology Enhanced Items

Sue Lottridge, Director of Machine Scoring

Amy Burkhardt, Senior Research Associate of Machine Scoring

Technology Enhanced Items

• Seeing more TEIs in assessments
  – Consortia
  – Formative assessments

• Decisions around TEIs
  – Count-based (e.g., 25 MCs, 2 CRs, 3 TEIs)
  – Content-based

Drag and Drop TEIs

• Select
  – Drag N objects to a single drop target
  – Similar to 'Check all that apply' Selected Response Items

• Categorize
  – Drag N objects to M drop targets
  – Limits: an object can be dragged to multiple targets, or to none

• Order
  – Drag N objects to M drop targets in proper order

• Composites (multi-part)
  – Dependencies

TEI Considerations

• Claims
  – Choice of TEI
  – Justification

• Creation
  – Environment
  – Format
  – Complexity
  – Constraints

• Interoperability
  – Rendering
  – Data storage
  – Porting

• Performance
  – Response time
  – Latency
  – Efficiency

• Cost
  – Time to develop
  – Permissions
  – Storage
  – QA

• Scoring
  – Combinatorics
  – Who sets rules

TEIs Live in the “Grey Area” between MC and CRs

[Diagram: TEIs occupy the space between Multiple Choice Items and Constructed Response Items]

Evaluating TE Item Scoring

• Classical test theory methods (p-value, score distribution, point-biserial)

• Analyze trends in responses (see the sketch after this list)
  – Frequency of response patterns
  – Counts of object choices
  – Proportion of 'blank' responses
  – Frequent, incorrect responses

• Analysis may
  – Suggest where examinees may not understand the item
  – Highlight alternative correct answers
  – Suggest need for partial credit or collapsing categories
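To make these checks concrete, here is a minimal Python sketch of the classical statistics and response-pattern tallies; the response strings, score values, and function names are hypothetical placeholders, not from the presentation.

from collections import Counter

def p_value(scores, max_score):
    # Classical difficulty: mean item score as a proportion of the maximum.
    return sum(scores) / (len(scores) * max_score)

def point_biserial(item_scores, total_scores):
    # Correlation between item score and total test score (pbis).
    n = len(item_scores)
    mi, mt = sum(item_scores) / n, sum(total_scores) / n
    cov = sum((i - mi) * (t - mt) for i, t in zip(item_scores, total_scores)) / n
    sd_i = (sum((i - mi) ** 2 for i in item_scores) / n) ** 0.5
    sd_t = (sum((t - mt) ** 2 for t in total_scores) / n) ** 0.5
    return cov / (sd_i * sd_t)

# Hypothetical drag-and-drop response patterns; "" marks a blank response.
responses = ["2/3|1/3", "2/3|1/3", "", "1/3|2/3"]
scores = [2, 2, 0, 1]
print(p_value(scores, max_score=2))                      # 0.625
print(Counter(responses).most_common())                  # pattern frequencies
print(sum(r == "" for r in responses) / len(responses))  # proportion blank
print(point_biserial(scores, [48, 52, 20, 35]))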

TEI Scoring and Performance Factors

• Item Design
  – Structure
  – Clarity
  – Constraints

• Examinee
  – "Gets" the item
  – Facility with Tools
  – Experience with Item Type

• Scoring
  – Rubric Alignment
  – Rubric Clarity
  – Scoring Quality

Item 1

Key:
  – 2 points if response matches key.
  – 1 point if top or bottom row matches key.
  – 0 otherwise.
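As a sketch, the rule can be expressed in Python; representing a response as a (top_row, bottom_row) pair is our assumption about the data format, not the item's actual storage.

def score_item1(response, key):
    # response and key: (top_row, bottom_row) tuples of dragged objects.
    if response == key:
        return 2                       # full match
    if response[0] == key[0] or response[1] == key[1]:
        return 1                       # top or bottom row matches the key
    return 0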

There are 19,531 ways to answer a single part, and so 381,459,961 ways to answer both parts.
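These counts are reproduced under one plausible reading of the item's structure: each part has 6 ordered slots, each fillable from 5 object types in infinite wells, with trailing slots left empty. The 6-slot/5-object structure is inferred from the numbers, not stated on the slide.

def count_single_part(num_slots=6, num_objects=5):
    # Slots fill left to right; each filled slot can hold any of the
    # object types (infinite wells), so count the sum of num_objects**k.
    return sum(num_objects ** k for k in range(num_slots + 1))

single = count_single_part()   # 19,531 ways to answer one part
print(single, single ** 2)     # 19531 381459961 (both parts)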

What do the data tell us? Response pattern frequencies

More students dragged 2/3 and then 1/3 into boxes than answered the item correctly.

Part 1 and Part 2 Frequencies

Summation versus expression representation?

Score      Original Rubric        New Rubric
           Count    Percent       Count    Percent
0          2432     81%           2257     75%
1          212      7%            335      11%
2          375      12%           427      14%
p-value             .16                    .20

• 190 examinees would have received a higher score
  – 138 moved from 0 to 1
  – 37 moved from 0 to 2
  – 15 moved from 1 to 2

Item 1 Summary

• Item Design
  – Clarify question
  – Clarify directions
  – Review drag target size
  – Revisit number of drag objects

• Examinee
  – Enable practice with infinite wells
  – Observe examinees answering the item

• Scoring
  – Summation versus expression?
  – 14% of responses are blank; why?

Item 2

Score    Number of Correct     Number of Incorrect
         Objects Present       Objects Present
2        4                     0
1        4                     1 or 2
1        3                     0, 1, or 2
0        Otherwise

Ignoring order, there are 2^10 (1024) possible answers. Preserving order, there are about 10,000,000 possible answers.
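Both counts follow from treating a response as a subset of the 10 objects (each draggable at most once), optionally tracking placement order; a quick check in Python:

from math import perm  # Python 3.8+

num_objects = 10
unordered = 2 ** num_objects  # every subset of the 10 objects: 1024
ordered = sum(perm(num_objects, k) for k in range(num_objects + 1))
print(unordered, ordered)     # 1024 9864101 (about 10 million)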

Ignoring order, there were 573 unique answers. Preserving order, there were 2961 unique answers.

Response pattern frequencies

What objects are chosen by examinees?

Object     Mean    Correlation with Item Score
3(x)       87%     .13
x+x+x      69%     .26
x^3        65%     -.52
5x-2x      46%     .35
x+3        43%     -.37
3x+3       37%     -.36
3(2x-x)    33%     .17
x/3        55%     -.49
5(x-2)     26%     -.18
x-x-x      23%     -.25

Object selection by score

Object     0 (N=5814)   1 (N=1212)   2 (N=312)
3(x)       85%          94%          100%
x+x+x      62%          92%          100%
x^3        78%          20%          0%
5x-2x      37%          73%          100%
x+3        53%          7%           0%
3x+3       46%          2%           0%
3(2x-x)    31%          24%          100%
x/3        68%          6%           0%
5(x-2)     30%          13%          0%
x-x-x      28%          1%           0%

New Scoring Rules

• Student needs to drag more correct objects than incorrect objects to earn a score of 1
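A sketch of the revised rule in Python, keeping the original 2-point condition (all four correct objects, no incorrect ones); the object lists come from the tables above, and the function name is ours.

# Correct and incorrect objects, as listed in the selection tables above.
CORRECT = {"3(x)", "x+x+x", "5x-2x", "3(2x-x)"}
INCORRECT = {"x^3", "x+3", "3x+3", "x/3", "5(x-2)", "x-x-x"}

def score_item2_revised(dragged):
    # dragged: the set of objects the student placed in the drop target.
    n_correct = len(dragged & CORRECT)
    n_incorrect = len(dragged & INCORRECT)
    if n_correct == 4 and n_incorrect == 0:
        return 2
    if n_correct > n_incorrect:   # revised partial-credit rule
        return 1
    return 0

print(score_item2_revised({"3(x)", "x+x+x", "x^3"}))  # 1 under the new rule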

Scores     Original Rubric   New Rubric
0          79%               63%
1          17%               33%
2          4%                4%
p-value    .12               .21

Relationship of parts to item score

Object     Percent   Original Correlation   New Correlation
3(x)       87%       .13                    .12
x+x+x      69%       .26                    .30
x^3        65%       -.52                   -.53
5x-2x      46%       .35                    .29
x+3        43%       -.37                   -.52
3x+3       37%       -.36                   -.50
3(2x-x)    33%       .17                    .04
x/3        55%       -.49                   -.62
5(x-2)     26%       -.18                   -.24
x-x-x      23%       -.25                   -.36

Object Selections by Score Point

           Original Rubric                      Revised Rubric
Object     0          1          2              0          1          2
           (N=5814)   (N=1212)   (N=312)        (N=4624)   (N=2402)   (N=312)
3(x)       85%        94%        100%           85%        91%        100%
x+x+x      62%        92%        100%           58%        85%        100%
x^3        78%        20%        0%             84%        38%        0%
5x-2x      37%        73%        100%           36%        57%        100%
x+3        53%        7%         0%             64%        10%        0%
3x+3       46%        2%         0%             57%        4%         0%
3(2x-x)    31%        24%        100%           35%        19%        100%
x/3        68%        6%         0%             79%        16%        0%
5(x-2)     30%        13%        0%             34%        15%        0%
x-x-x      28%        1%         0%             35%        2%         0%

Item 2 Summary

• Item Design
  – Review drag target size
  – Revisit number of drag objects

• Examinee
  – Examinees appeared to understand the task

• Scoring
  – Are more generous rules aligned with the standard/claim?
  – Other rules?

Item 3

A student earns a 2 if she drags 4 or 5 correct steps in order and the last step is x-3.
A student earns a 1 if she drags 3 correct steps in order and the last step is x-3.
A student earns a 0 otherwise.
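A sketch of this rubric, assuming the response arrives as an ordered list of dragged steps and reading 'correct steps in order' as a matching prefix of the keyed path; both the list representation and the helper name are assumptions.

def score_item3(steps, keyed_path):
    # steps: student's dragged steps in order; keyed_path ends in "x-3".
    in_order = 0
    for got, want in zip(steps, keyed_path):
        if got != want:
            break
        in_order += 1               # length of the matching prefix
    ends_in_x_minus_3 = bool(steps) and steps[-1] == "x-3"
    if in_order >= 4 and ends_in_x_minus_3:
        return 2
    if in_order == 3 and ends_in_x_minus_3:
        return 1
    return 0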

There are 19,081 ways to answer this item.

20 ways to earn a 2; 16 ways to earn a 1.

Response Frequencies (1108 unique responses)

Score distributions

Score      Original Rubric      Revised Rubric
           N        %           N        %
0          3891     75%         3758     73%
1          40       1%          173      3%
2          1227     24%         1227     24%
p-value    .24                  .25

Revised rubric: allows partial-credit scoring when the student's response contains the correct path but the student drags 'extra' objects to fill the remaining spaces.

775 responses (13%) were blank.

Item 3 Summary

• Item
  – Remove infinite wells
  – Add 'distractors'?
  – Remove borders around drop targets or make them dynamic

• Examinee
  – Students seem compelled to drag objects to fill all spaces
  – Students do not reduce to a final answer

• Scoring
  – Combinatorics: complicated scoring rules
  – Reversals?
  – Same-level transformations?

Conclusions

• A review of responses and frequencies can reveal areas of misunderstanding, potential for item revision, or uncaptured correct responses

• Complexity of item leads to complexity in scoring
  – More 'objects' = more possible correct responses!
  – Object content influences scoring

• Placing constraints on item can help
  – Infinite wells
  – Size and number of objects

• Changes to scoring don’t always add value

Thank you!

slottridge@pacificmetrics.com