Creating an in-house computerized adaptive testing (CAT) program with Concerto

Atsushi Mizumoto (Kansai University)

2013/09/20 JLTA at Waseda University

Computerized Adaptive Testing

CAT needs Item Response Theory

CTT vs. IRT

Aspect                           CTT                IRT
Test score                       Ordinal scale      Interval scale
Ability estimate                 Test-dependent     Test-independent
Test result                      Person-dependent   Person-independent
Measurement target (precision)   All test-takers    Individuals
Equating/CAT                     Difficult          Easy

Ohtomo (2009)

History of CAT Research

40 years (Thompson & Weiss, 2011)

30 years in language testing (LT) (Koyama, 2010)

Example of CAT

CBT ≠ CAT

How CAT Works

http://www.j-cat.org/page/interpret

Advantages of CAT

• Tailored for individual test-takers

• Shorter test time

• More precision (= smaller SE)

• No need for random sampling

www.geocities.jp/kosugitti/labo/irtnote.pdf
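
The link between tailoring and precision can be illustrated with a small sketch. Under a two-parameter logistic IRT model, the information of an item is a²P(1 − P), and the standard error of the ability estimate is approximately 1/√(sum of item information). The item parameters and the ability value below are invented for illustration and are not taken from the item bank described in this talk.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def standard_error(theta, items):
    """Approximate SE of theta: 1 / sqrt(test information)."""
    info = sum(item_information(theta, a, b) for a, b in items)
    return 1.0 / math.sqrt(info)

theta = 1.0  # hypothetical test-taker ability

# Ten items targeted at the test-taker (difficulty close to theta).
tailored = [(1.2, theta + d) for d in (-0.2, -0.1, 0.0, 0.1, 0.2) * 2]
# Ten items spread over the whole scale, as in a fixed form.
fixed = [(1.2, b) for b in (-3.0, -2.5, -2.0, -1.5, -1.0, 0.0, 1.0, 2.0, 2.5, 3.0)]

se_tailored = standard_error(theta, tailored)
se_fixed = standard_error(theta, fixed)
print(f"SE with tailored items: {se_tailored:.2f}")
print(f"SE with fixed items:    {se_fixed:.2f}")
```

With the same number of items, the targeted set yields the smaller SE, because items far from the test-taker's ability contribute little information.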

Purposes

•Creating a CAT program

•Evaluation

Creating a CAT Program

•Choosing the CAT System

•Constructing an Item Bank (Pretest)

•Calibrating the Item Bank

•Determining Specifications & Feedback

•Administering the CAT







Moodle Plugin

http://moodle2x.info

1. Free account (150 test takers/month)

2. Amazon Machine Images (free for a year)

3. Installing it on your own server

• Open-source

• Running R on a server (catR, RMySQL)

• HTML-based

Installation on a server

https://code.google.com/p/concerto-platform/wiki/installation4

Wiki (Resources)

https://code.google.com/p/concerto-platform/wiki/Resources?tm=6













Constructing an Item Bank (Pretest)

•Vocabulary Test (Mizumoto, 2006) http://www.mizumot.com/files/VocSizeMeasure.pdf

•Based on SVL 12,000 (up to the 8,000-word level; 30 items per level)

•716 university EFL learners

Sample Question

(1) 心の, 精神の (Japanese prompt meaning "of the mind, mental")

A. essential

B. creative

C. loose

D. mental













Calibrating the Item Bank

•240 items analyzed (Rasch model)

•150 items retained for the item bank

•Calibrated with the two-parameter logistic (2PL) model (item difficulty & discrimination)

•Upload the CSV file to Concerto
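
For reference, the two models named on this slide can be written as follows. The notation is the conventional one and is an assumption here, since the slides give no formulas: θ is the test-taker's ability, b_i the difficulty of item i, and a_i its discrimination.

```latex
% Rasch model: item difficulty only
P_i(\theta) = \frac{\exp(\theta - b_i)}{1 + \exp(\theta - b_i)}

% Two-parameter logistic (2PL) model: item difficulty and discrimination
P_i(\theta) = \frac{\exp\{a_i(\theta - b_i)\}}{1 + \exp\{a_i(\theta - b_i)\}}
```

The Rasch model is the special case of the 2PL model in which every item has a_i = 1.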













Specifications of CAT

•Starting point (parameters, initial ability, randomized/fixed)

•Ability estimation method (empirical Bayes and others)

•Stopping rule (Number of items/Standard error)

•Final ability estimation

Magis and Raîche (2012, p. 7)
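
Two of these specifications can be sketched in a few lines: a Bayesian ability estimate and a stopping rule that combines a maximum test length with a target standard error. The sketch uses expected a posteriori (EAP) estimation with a standard normal prior as one example of a Bayesian estimator. The function names, the prior, the grid, and the thresholds are illustrative assumptions, not Concerto or catR settings.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def eap_estimate(responses, items, grid_size=81):
    """EAP ability estimate and posterior SD under a N(0, 1) prior.

    responses: list of 0/1 scores; items: list of (a, b) pairs.
    """
    grid = [-4.0 + 8.0 * k / (grid_size - 1) for k in range(grid_size)]
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)  # unnormalised N(0, 1) prior
        for x, (a, b) in zip(responses, items):
            p = p_correct(theta, a, b)
            w *= p if x == 1 else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)

def should_stop(n_administered, se, max_items=30, target_se=0.3):
    """Stop when either the item limit or the target SE is reached."""
    return n_administered >= max_items or se <= target_se

# Hypothetical (a, b) parameters and responses for four administered items.
items = [(1.0, -1.0), (1.2, 0.0), (0.9, 0.5), (1.1, 1.0)]
responses = [1, 1, 1, 0]

theta_hat, se = eap_estimate(responses, items)
print(f"theta = {theta_hat:.2f}, SE = {se:.2f}, stop = {should_stop(len(responses), se)}")
```

The posterior SD returned by `eap_estimate` plays the role of the SE in the stopping rule; the final ability estimate is simply the estimate at the moment the rule fires.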

How many items for what SE?

• Simulation with catR package

Magis, D., & Raîche, G. (2012). http://www.jstatsoft.org/v48/i08

True Theta = 1, SE = 0.3

Stopping rule = 30 items
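
The simulation on this slide was run in R with catR. As a rough illustration of the same idea, the Python sketch below simulates one adaptive test for a test-taker with true theta = 1 and records the SE after each item. The item bank is randomly generated (it is not the 150-item bank from this study), item selection uses maximum Fisher information, and ability is estimated by EAP with a standard normal prior. These choices are assumptions made for the example, so the output will not reproduce the catR results.

```python
import math
import random

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

GRID = [-4.0 + 0.1 * k for k in range(81)]  # quadrature points for EAP

def eap(responses, items):
    """EAP estimate and posterior SD under a N(0, 1) prior."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)
        for x, (a, b) in zip(responses, items):
            p = p_correct(t, a, b)
            w *= p if x else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def simulate_cat(true_theta, bank, max_items=30, target_se=0.3, seed=1):
    """Run one simulated CAT; return (n_items, theta_hat, se) after each item."""
    rng = random.Random(seed)
    remaining = list(bank)
    used, responses, history = [], [], []
    theta_hat, se = 0.0, 1.0  # start at the prior mean
    while remaining and len(used) < max_items and se > target_se:
        # Pick the unused item that is most informative at the current estimate.
        item = max(remaining, key=lambda ab: information(theta_hat, *ab))
        remaining.remove(item)
        # Simulate the response from the true ability.
        x = 1 if rng.random() < p_correct(true_theta, *item) else 0
        used.append(item)
        responses.append(x)
        theta_hat, se = eap(responses, used)
        history.append((len(used), theta_hat, se))
    return history

# Hypothetical bank of 150 items with random (a, b) parameters.
bank_rng = random.Random(0)
bank = [(bank_rng.uniform(0.8, 2.0), bank_rng.uniform(-3.0, 3.0)) for _ in range(150)]

history = simulate_cat(true_theta=1.0, bank=bank)
for n, theta_hat, se in history:
    print(f"{n:2d} items: theta = {theta_hat:5.2f}, SE = {se:.2f}")
```

Reading down the printed SE column shows how many items a given precision target costs for this simulated test-taker, which is the question the slide poses.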

Concerto

http://langtest.jp/concerto/?tid=20

Feedback Page













268 test takers (first-year university students)

(1) CAT

(2) Paper-pencil version (68 items); common person linking

(3) Questionnaire: “What did you think of the CAT result?”
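
Common person linking places two tests on one scale by using the people who took both. The slides do not say which linking procedure was used, so the sketch below shows only one simple variant, the mean-sigma method, applied to invented ability estimates.

```python
from statistics import mean, stdev

def mean_sigma_link(theta_target, theta_source):
    """Return slope A and intercept B so that A * source + B is on the target scale.

    Both lists hold ability estimates for the same persons, in the same order.
    """
    slope = stdev(theta_target) / stdev(theta_source)
    intercept = mean(theta_target) - slope * mean(theta_source)
    return slope, intercept

# Invented ability estimates for five persons who took both tests.
cat_theta = [0.4, 1.1, 1.8, 2.5, 3.0]
paper_theta = [-0.2, 0.3, 0.9, 1.4, 1.9]

A, B = mean_sigma_link(cat_theta, paper_theta)
linked = [A * t + B for t in paper_theta]
print(f"A = {A:.3f}, B = {B:.3f}")
```

After the transformation, the linked paper-pencil estimates have the same mean and standard deviation as the CAT estimates for the common persons, so the two sets of scores can be compared directly.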

Evaluation

CAT vs. Paper-pencil

[Figure: scatterplot matrix of CAT Theta (random, 30 Qs) against Paper-pencil Theta (fixed, 68 Qs); value shown in the plot: 0.92; n = 268]

CAT (30 Qs): M = 1.71, SD = 1.13

P-P (68 Qs): M = 1.72, SD = 0.95

Mean diff. = -0.02, 95% CI [-0.07, 0.04]

d = 0.01

Power = .06

CAT SE (30 Qs): M = 0.39, SD = 0.11

P-P SE (68 Qs): M = 1.71, SD = 1.13

Mean diff. of SE = -1.32, 95% CI [-1.44, -1.19]

d = 1.65

Power = 0.99

Evaluation: CAT vs. Paper-pencil

Means: CAT = Paper-pencil

SEs: CAT < Paper-pencil

CAT measures the same ability with much more precision (with fewer items).
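
The comparison statistics reported above (mean difference, 95% confidence interval, effect size d) can be computed along the following lines. This is a sketch with invented data. The slides do not state the exact formulas behind the reported values, so the code assumes a paired-samples interval, a d based on the pooled standard deviation, and a normal critical value of 1.96 in place of the t quantile.

```python
import math
from statistics import mean, stdev

def paired_comparison(x, y, z=1.96):
    """Mean difference (x - y), approximate 95% CI, and Cohen's d.

    Assumptions: paired data, normal critical value z, pooled-SD d.
    """
    diffs = [a - b for a, b in zip(x, y)]
    mean_diff = mean(diffs)
    se_diff = stdev(diffs) / math.sqrt(len(diffs))
    ci = (mean_diff - z * se_diff, mean_diff + z * se_diff)
    pooled_sd = math.sqrt((stdev(x) ** 2 + stdev(y) ** 2) / 2.0)
    d = abs(mean(x) - mean(y)) / pooled_sd
    return mean_diff, ci, d

# Invented theta estimates for six test-takers (not the study data).
cat_theta = [0.5, 1.2, 1.9, 2.4, 0.8, 2.9]
paper_theta = [0.7, 1.0, 2.0, 2.2, 1.1, 2.6]

mean_diff, ci, d = paired_comparison(cat_theta, paper_theta)
print(f"Mean diff. = {mean_diff:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}], d = {d:.2f}")
```

A confidence interval that includes zero together with a d near zero corresponds to the "Means: CAT = Paper-pencil" reading; an interval that excludes zero with a large d corresponds to the SE comparison.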

Evaluation

Questionnaire

Result of the Questionnaire

[Figure: bar chart of questionnaire responses; axis: Frequency (150 to 0 to 150); response categories: Very inaccurate, Inaccurate, Rather inaccurate, Rather accurate, Accurate, Very accurate]

Feedback Page

Future Research

•More items in the item bank

•Better formula for predicting other test scores

• Improved feedback

•Collaboration

Summary

•Created a CAT program

•Evaluation: (1) CAT was more precise than the paper-pencil test (smaller SE with fewer items); (2) feedback needs improvement.
