Statistical and operational complexities of the studies II

Assessment design: Use of IRT and Plausible Values

Andrés Sandoval-Hernández – IEA DPC

Workshop on using PISA, TIMSS & PIRLS, TALIS datasets

Ispra, Italy, June 24-27, 2014

Note: These slides were prepared as part of the IEA training portfolio with the collaboration of IEA staff and resource persons.

Table of Contents

• Introduction

• Producing comparable scores

• Scaling procedures

• Calculating standard errors

Introduction

• Complex Sample Design

• Probabilistic, stratified, multistage sample designs

• Need to take sample design into account when computing estimates

• Complex Assessment Design

• Multiple-matrix sampling designs in which no student takes all items and no item is given to all students

• Need to produce comparable scores and to take measurement uncertainty into account when computing estimates

Introduction – What do we really want to know?

• How would the students have performed on the test had we been able to administer ALL the items to ALL of them?

• Since we did not test everyone on everything, we need to make our best guess (scientific estimation)

• Remember our goal…

• Administer in a sensible design

• Obtain comparable scores

• Correct for unreliability

Producing Comparable Scores

• Raw scores do not take into account the difficulty of the items

• Different students took different items

• Raw scores therefore cannot be compared across students who took different tests or different subsets of a test

• Instead, student achievement is estimated using scale scores computed based on Item Response Theory (IRT)

Why IRT?

• Many items are needed to assess a domain as broadly defined as, for example, mathematics

• At the same time, it is unreasonable to administer the whole item battery to each sampled student because:

• Students' results will be affected by fatigue

• Principals and teachers would be hesitant to free students for very long testing periods, which would reduce participation

• As a result, students are assigned subsets of the item pool

IRT: Item Response Theory

• Response to an item depends on the interaction between the “ability” of the respondent, and characteristics of the item

• Persons of high ability should answer easy items correctly

• Persons of low ability should not answer difficult items correctly

• IRT does not assume that ability is normally distributed, but it does assume unidimensionality of measurement

Item Characteristic Curve
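
The figure for this slide did not survive in the transcript. As a rough sketch only, an item characteristic curve can be computed from a logistic function of the distance between a person's ability and the item's difficulty. The two-parameter logistic (2PL) form, the function name `icc`, and the parameter values below are assumptions for illustration, not the model settings of any particular study.

```python
import math

def icc(theta, difficulty, discrimination=1.0):
    """Probability of a correct response under a 2PL logistic model (assumed form)."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

# Hypothetical item of average difficulty (b = 0) on a logit scale
for theta in (-2.0, 0.0, 2.0):
    print(f"ability {theta:+.1f}: P(correct) = {icc(theta, difficulty=0.0):.2f}")
```

The probability of a correct response rises with ability and equals 0.5 where ability equals item difficulty, which is what places persons and items on the same continuum.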

Advantages of IRT Models

• IRT models allow us to create a continuum on which both student performance and item difficulty will be located, linked by a probabilistic function

• IRT allows for performance in a subject to be summarized on a common scale even when different students are administered different items

• Facilitates linking when dealing with rotated test forms

Advantages of IRT

• It allows us to:

• Evaluate the effectiveness of a test at different levels of ability

• Design tests to best measure at a specific ability level

• Develop new tests and investigate them without administering them

• Develop item statistics that do not change when the group of examinees changes

Scaling Procedures

• Achievement is initially estimated as scale scores computed using IRT

• IRT allows for performance in a subject to be summarized on a common scale even when different students are administered different items

• In addition to IRT, these studies make use of multiple imputations or “plausible values” methodology

Plausible Values

• Random draws from the estimated ability distribution of students with similar item response patterns and background characteristics

• The variance of these draws reflects the uncertainty of measurement

• Think of a regression where the predictors are item responses and background data

[Figure: ability distribution with five randomly drawn plausible values, PV1 to PV5, marked along it]
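
To make the idea of "random draws from the estimated ability distribution" concrete, here is a minimal sketch for a single student. It assumes a Rasch model for the items and a normal prior whose mean stands in for the prediction from background data; the function name, the grid approximation, and all numbers are hypothetical and far simpler than operational scaling.

```python
import math
import random

def draw_plausible_values(responses, difficulties, prior_mean, prior_sd,
                          n_draws=5, seed=0):
    """Draw plausible values from a grid approximation of one student's posterior.

    Assumes a Rasch model and a normal prior; prior_mean plays the role of the
    prediction from background variables (an assumption of this sketch).
    """
    rng = random.Random(seed)
    grid = [-4.0 + 0.05 * k for k in range(161)]  # ability grid, -4 to 4 logits
    weights = []
    for theta in grid:
        # Log prior, up to an additive constant
        log_post = -0.5 * ((theta - prior_mean) / prior_sd) ** 2
        # Add the log likelihood of the observed item responses
        for x, b in zip(responses, difficulties):
            p = 1.0 / (1.0 + math.exp(-(theta - b)))
            log_post += math.log(p if x == 1 else 1.0 - p)
        weights.append(math.exp(log_post))
    # Sample grid points in proportion to their posterior weight
    return rng.choices(grid, weights=weights, k=n_draws)

pvs = draw_plausible_values(
    responses=[1, 1, 0, 1, 0],
    difficulties=[-1.0, -0.5, 0.0, 0.5, 1.0],
    prior_mean=0.2,
    prior_sd=1.0,
)
print(pvs)
```

The spread of the five draws is wider when the student answered few items, which is how the draws carry measurement uncertainty.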

Using Plausible Values

• Plausible values are optimal for obtaining population estimates

• Plausible values should not be used for individual reporting

• Compute statistics with each plausible value and average results
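
The last point can be sketched as follows: the statistic (here a weighted mean) is computed once per plausible value, and only the resulting estimates are averaged. The data, field names, and helper `weighted_mean` are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted mean of values using sampling weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical data: one row per student, five plausible values and a sampling weight
students = [
    {"pv": [512.0, 498.0, 505.0, 520.0, 501.0], "weight": 12.5},
    {"pv": [455.0, 470.0, 462.0, 449.0, 466.0], "weight": 8.0},
    {"pv": [601.0, 590.0, 612.0, 598.0, 605.0], "weight": 15.0},
]
weights = [s["weight"] for s in students]

# Step 1: compute the statistic once per plausible value
estimates = [
    weighted_mean([s["pv"][m] for s in students], weights) for m in range(5)
]

# Step 2: report the average of the five estimates
final_estimate = sum(estimates) / len(estimates)
print(estimates, final_estimate)
```

Note that the plausible values are never averaged within a student; the averaging happens only across the five population-level estimates.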

Calculating standard errors

• The standard error for any statistic estimated from a large-scale assessment (LSA) combines sampling and assessment variances

• The two components are added as variances, not as standard errors:

Variance(t) = Var(sampling) + Var(assessment)

• The standard error is the square root of the total variance:

Standard Error(t) = sqrt(Var(sampling) + Var(assessment))

Computing Standard Error (TIMSS & PIRLS)

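Var(t) = Sampling Variance + Assessment Variance

Sampling Variance = \sum_{i=1}^{75} (t_i - t_0)^2

Assessment Variance = \left(1 + \frac{1}{5}\right) \frac{1}{5 - 1} \sum_{i=1}^{5} (t_i - \bar{t})^2

In the sampling variance, t_0 is the full-sample estimate and t_i is the estimate from the i-th of the 75 jackknife replicates. In the assessment variance, t_i is the estimate from the i-th of the 5 plausible values and \bar{t} is their mean.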
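
As a minimal sketch of the computation on this slide, the function below adds a jackknife sampling variance to a plausible-value (assessment) variance and takes the square root. It assumes the jackknife repeated replication form commonly documented for TIMSS and PIRLS and the usual multiple-imputation factor (1 + 1/M); the function name and the toy numbers are hypothetical, and only four replicates are used instead of 75 to keep the example short.

```python
import math

def total_standard_error(full_sample_estimate, replicate_estimates, pv_estimates):
    """Combine jackknife sampling variance with plausible-value variance."""
    # Sampling variance: squared deviations of each replicate estimate
    # from the full-sample estimate, summed over replicates
    sampling_var = sum(
        (t_i - full_sample_estimate) ** 2 for t_i in replicate_estimates
    )
    # Assessment variance: (1 + 1/M) times the variance of the M
    # plausible-value estimates around their mean
    m = len(pv_estimates)
    pv_mean = sum(pv_estimates) / m
    between_pv = sum((t - pv_mean) ** 2 for t in pv_estimates) / (m - 1)
    assessment_var = (1 + 1 / m) * between_pv
    # Variances add; the standard error is the square root of the total
    return math.sqrt(sampling_var + assessment_var)

se = total_standard_error(
    full_sample_estimate=500.0,
    replicate_estimates=[500.5, 499.5, 501.0, 499.0],   # toy values
    pv_estimates=[499.0, 500.0, 501.0, 500.0, 500.0],   # toy values
)
print(round(se, 3))
```

With these toy numbers the sampling variance is 2.5 and the assessment variance is 0.6, so the total variance is 3.1.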

Computing Standard Error (PISA)

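Var(t) = Sampling Variance + Imputation Variance

Sampling Variance = \frac{1}{80 (1 - 0.5)^2} \sum_{i=1}^{80} (t_i - t_0)^2

Imputation Variance = \left(1 + \frac{1}{5}\right) \frac{1}{5 - 1} \sum_{i=1}^{5} (t_i - \bar{t})^2

In the sampling variance, t_0 is the full-sample estimate and t_i is the estimate from the i-th of the 80 replicate weights. In the imputation variance, t_i is the estimate from the i-th of the 5 plausible values and \bar{t} is their mean.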
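
The sampling-variance term with the factor 80 and (1 - 0.5) corresponds to balanced repeated replication with a Fay adjustment of 0.5. A minimal sketch under that assumption follows; the function name and the toy replicate estimates are hypothetical, and four replicates stand in for the 80 used in practice.

```python
def fay_brr_sampling_variance(full_sample_estimate, replicate_estimates,
                              fay_factor=0.5):
    """Sampling variance from BRR replicates with Fay's adjustment."""
    g = len(replicate_estimates)
    squared_deviations = sum(
        (t_i - full_sample_estimate) ** 2 for t_i in replicate_estimates
    )
    # Divide by G * (1 - k)^2, where G is the number of replicates
    # and k is the Fay factor
    return squared_deviations / (g * (1 - fay_factor) ** 2)

sampling_var = fay_brr_sampling_variance(
    full_sample_estimate=500.0,
    replicate_estimates=[500.5, 499.5, 501.0, 499.0],  # toy values
)
print(sampling_var)
```

With 80 replicates and a Fay factor of 0.5 the divisor is 80 × 0.25 = 20, so the sampling variance is one twentieth of the summed squared deviations.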

Computing Standard Error (TALIS)

Sampling Variance = \frac{1}{80 (1 - 0.5)^2} \sum_{i=1}^{80} (t_i - t_0)^2

Computing Standard Error (PIAAC)

Var(t) = Sampling Variance + Imputation Variance

NOTE: In PIAAC the replication method changes from country to country, so the formula differs accordingly.

Example: Comparing Standard Errors

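[Figure: standard errors from SPSS and from the IDB Analyzer compared side by side. The SPSS standard error is calculated assuming a simple random sample; the IDB Analyzer standard error takes sampling and assessment error into account.]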

The IEA/ETS Research Institute

(www.IERInstitute.org)

[Figure callouts: "57 points!" and "Only 1 point!"]

Summarizing… (about the PVs)

• NEVER treat them as an individual score

• NEVER use the average of the plausible values as a score

• ALWAYS repeat the analysis separately with each plausible value

• When selecting variables

• When selecting people

• Report the average of the statistics computed

• When conducting significance testing, combine the assessment variance with the sampling variance

Summarizing… (in general)

• If you do not take the sample and test design into account in your analysis, you simply end up with the wrong answer. That means using:

• Sampling weights

• Replicate weights

• Plausible values

• If we did not have to do this, we wouldn’t!!

• Programs like the IDB Analyzer, WesVar and AM take the sample and test design into account.

Thank you for your attention!

Any questions?
