Date post: | 29-Jun-2015 |
Creating an in-house computerized adaptive testing (CAT) program with Concerto
Atsushi MIZUMOTO (Kansai University)
2013/09/20, JLTA at Waseda University
Computerized Adaptive Testing
CAT needs Item Response Theory
CTT vs. IRT
Aspect | CTT | IRT
Test score | Ordinal scale | Interval scale
Ability estimate | Test-dependent | Test-independent
Test result | Person-dependent | Person-independent
Measurement target (precision) | All test-takers | Individuals
Equating/CAT | Difficult | Easy
Source: Ohtomo (2009)
History of CAT Research
40 years (Thompson & Weiss, 2011)
30 years in language testing (Koyama, 2010)
Example of CAT
CBT ≠ CAT
How CAT Works
http://www.j-cat.org/page/interpret
Advantages of CAT
• Tailored for individual test-takers
• Shorter test time
• Higher precision (= smaller SE)
• No need for random sampling
www.geocities.jp/kosugitti/labo/irtnote.pdf
Purposes
•Creating a CAT program
•Evaluation
Creating a CAT Program
• Choosing the CAT System
• Constructing an Item Bank (Pretest)
• Calibrating the Item Bank
• Determining Specifications & Feedback
• Administering the CAT
Choosing the CAT System
Moodle Plugin
http://moodle2x.info
1. Free account (150 test takers/month)
2. Amazon Machine Images (free for a year)
3. Installing it on your own server
• Open-source
• Running R on a server (catR, RMySQL)
• HTML-based
Installation on a server
https://code.google.com/p/concerto-platform/wiki/installation4
Wiki (Resources)
https://code.google.com/p/concerto-platform/wiki/Resources?tm=6
Constructing an Item Bank (Pretest)
•Vocabulary Test (Mizumoto, 2006) http://www.mizumot.com/files/VocSizeMeasure.pdf
•Based on SVL 12,000 (Up to 8,000 level; 30 items for each level)
•716 university EFL learners
Sample Question
(1) 心の, 精神の (Japanese prompt: "of the mind; of the spirit")
A. essential
B. creative
C. loose
D. mental
Calibrating the Item Bank
• 240 items analyzed (Rasch model)
• 150 items retained for the item bank
• Calibrated with the two-parameter logistic model (item difficulty & discrimination)
• Upload the CSV file to Concerto
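The calibration step can be illustrated with a minimal Python sketch of the two-parameter logistic (2PL) model. This is not the Concerto or catR code: the CSV column names (`item_id`, `a`, `b`) and the three parameter rows are made up for illustration and are not values from the study, and the scaling constant D is assumed to be 1.

```python
import csv
import io
import math

# Hypothetical item-bank layout. The column names and the parameter values
# are illustrative assumptions, not the format Concerto requires and not
# the calibrated values from the study.
ITEM_BANK_CSV = """item_id,a,b
v001,1.20,-0.50
v002,0.90,0.30
v003,1.60,1.10
"""

def load_item_bank(text):
    """Parse CSV text into a list of dicts with float 'a' and 'b' parameters."""
    rows = csv.DictReader(io.StringIO(text))
    return [{"item_id": r["item_id"], "a": float(r["a"]), "b": float(r["b"])}
            for r in rows]

def p_correct(theta, a, b):
    """2PL probability of a correct answer (a = discrimination, b = difficulty)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

bank = load_item_bank(ITEM_BANK_CSV)
# The most informative item for a test-taker whose ability is near 1.0.
best = max(bank, key=lambda it: item_information(1.0, it["a"], it["b"]))
print(best["item_id"])
```

Items with high discrimination and a difficulty close to the test-taker's ability carry the most information, which is why a well-calibrated bank matters for a CAT.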
Specifications of CAT
•Starting point (parameters, initial ability, randomized/fixed)
•Ability estimation method (empirical Bayes and others)
•Stopping rule (Number of items/Standard error)
•Final ability estimation
Magis and Raîche (2012, p. 7)
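The four specifications above can be sketched as one loop. The Python below is an illustrative stand-in, not the catR or Concerto implementation: it assumes a fixed starting ability of 0, maximum-information item selection, EAP estimation with a standard normal prior on a grid, and a stopping rule of 30 items or SE of 0.3. The item bank and the simulated test-taker are randomly generated, hypothetical data.

```python
import math
import random

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

GRID = [-4.0 + 0.1 * k for k in range(81)]  # ability grid from -4 to 4

def eap_estimate(responses):
    """EAP ability estimate and posterior SD under a N(0, 1) prior.

    responses is a list of (a, b, score) tuples with score 0 or 1.
    """
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalized normal prior
        for a, b, u in responses:
            p = p_correct(t, a, b)
            w *= p if u == 1 else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_cat(bank, answer, start_theta=0.0, max_items=30, se_target=0.3):
    """Administer items until max_items is reached or SE drops to se_target."""
    theta, se = start_theta, float("inf")
    remaining = list(bank)
    responses = []
    while remaining and len(responses) < max_items and se > se_target:
        # Pick the unused item that is most informative at the current estimate.
        item = max(remaining, key=lambda it: item_information(theta, it[0], it[1]))
        remaining.remove(item)
        responses.append((item[0], item[1], answer(item)))
        theta, se = eap_estimate(responses)
    return theta, se, len(responses)

# Hypothetical 150-item bank of (a, b) pairs and a simulated test-taker.
rng = random.Random(1)
bank = [(rng.uniform(0.8, 2.0), rng.uniform(-3.0, 3.0)) for _ in range(150)]
true_theta = 1.0

def simulated_answer(item):
    """Draw a 0/1 response from the 2PL model at the true ability."""
    return 1 if rng.random() < p_correct(true_theta, item[0], item[1]) else 0

theta_hat, se, n_used = run_cat(bank, simulated_answer)
print(round(theta_hat, 2), round(se, 2), n_used)
```

Changing `start_theta`, `max_items`, or `se_target` corresponds to changing the starting point and the stopping rule; swapping `eap_estimate` for another estimator changes the ability estimation method.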
How many items for what SE?
• Simulation with catR package
Magis, D., & Raîche, G. (2012). http://www.jstatsoft.org/v48/i08
True Theta = 1, SE = 0.3
Stopping rule = 30 items
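As a rough cross-check on a simulation like this, the link between test length and SE can be approximated by hand, since SE is about 1 / sqrt(test information). The Python sketch below is a back-of-the-envelope bound, not the catR simulation: it assumes every administered 2PL item has the same discrimination `a` and a difficulty exactly equal to the test-taker's ability (so each item contributes a^2 / 4), and it ignores any prior used by Bayesian estimators.

```python
import math

def se_from_information(total_info):
    """Approximate standard error of the ability estimate from test information."""
    return 1.0 / math.sqrt(total_info)

def items_needed(se_target, a):
    """Best-case number of perfectly targeted 2PL items needed to reach se_target.

    Each item is assumed to have discrimination a and difficulty equal to
    theta, so it contributes a^2 / 4 units of information.
    """
    info_per_item = a * a / 4.0
    return math.ceil(1.0 / (se_target ** 2 * info_per_item))

for a in (1.0, 1.5, 2.0):
    n = items_needed(0.3, a)
    print(a, n, round(se_from_information(n * a * a / 4.0), 3))
```

Under these idealized assumptions, more discriminating items reach a given SE with far fewer questions; a real item bank will need more items than this bound suggests because items are never perfectly targeted.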
Concerto
http://langtest.jp/concerto/?tid=20
Feedback Page
Administering the CAT
268 test takers (first-year university students)
(1) CAT
(2) Paper-pencil version (68 items), common-person linking
(3) Questionnaire: “What did you think of the CAT result?”
Evaluation
CAT vs. Paper-pencil
[Figure: scatterplot of CAT theta against paper-pencil theta; r = 0.92, n = 268]
CAT: 30 Qs (random); Paper-pencil: 68 Qs (fixed)
Theta estimates
CAT (30 Qs): M = 1.71, SD = 1.13
P-P (68 Qs): M = 1.72, SD = 0.95
Mean diff. = -0.02, 95% CI [-0.07, 0.04], d = 0.01, Power = .06
Standard errors
CAT SE (30 Qs): M = 0.39, SD = 0.11
P-P SE (68 Qs): M = 1.71, SD = 1.13
Mean diff. of SE = -1.32, 95% CI [-1.44, -1.19], d = 1.65, Power = 0.99
Evaluation
CAT vs. Paper-pencil
Means: CAT = Paper-pencil
SEs: CAT < Paper-pencil
CAT measures the same ability with much more precision (with fewer items).
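The effect sizes reported for this comparison can be approximately reproduced from the summary statistics alone. The slides do not state which formula was used, so the pooled-SD version of Cohen's d below is an assumption; the means and SDs are the ones reported on the slides.

```python
import math

def cohens_d(m1, sd1, m2, sd2):
    """Cohen's d from two means and SDs, using the equal-n pooled SD.

    The pooled-SD formula is an assumption; the slides do not say how
    d was computed.
    """
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2.0)
    return (m1 - m2) / pooled_sd

# Theta estimates: CAT (M = 1.71, SD = 1.13) vs. paper-pencil (M = 1.72, SD = 0.95)
d_theta = cohens_d(1.71, 1.13, 1.72, 0.95)

# Standard errors: CAT (M = 0.39, SD = 0.11) vs. paper-pencil (M = 1.71, SD = 1.13)
d_se = cohens_d(0.39, 0.11, 1.71, 1.13)

print(round(abs(d_theta), 2), round(abs(d_se), 2))
```

The magnitudes come out close to the reported d = 0.01 and d = 1.65; small discrepancies are expected because the inputs are rounded to two decimals.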
Evaluation
Questionnaire
Result of the Questionnaire
[Figure: bar chart of response frequency (axis from 150 through 0 to 150) by response category: Very inaccurate, Inaccurate, Rather inaccurate, Rather accurate, Accurate, Very accurate]
Feedback Page
Future Research
•More items in the item bank
•Better formula for predicting other test scores
• Improved feedback
•Collaboration
Summary
•Created a CAT program
•Evaluation: (1) CAT better than paper-pencil; (2) feedback needs improvement.