Page 1:

Brian Junker, Carnegie Mellon

2007 NCME Symposium on Learning-Embedded Assessment

All Papers for this Session are available at

http://www.stat.cmu.edu/~brian/NCME07

Page 2:

Uncertainty, Prediction and Teacher Feedback using an Online System that Teaches as it Assesses

Brian W. Junker

Thanks to Neil Heffernan, Ken Koedinger, Mingyu Feng, Beth Ayers, Nathaniel Anozie, Zach Pardos, and many others. http://www.assistment.org

Funding from the US Department of Education, National Science Foundation (NSF), Office of Naval Research, Spencer Foundation, and the US Army.

Page 3:

The ASSISTments Project

• Web-based 8th grade mathematics tutoring system
• ASSIST with, and ASSESS, progress toward the Massachusetts Comprehensive Assessment System exam (MCAS)
  – Guide students through problem solving with MCAS released items
  – Predict students' MCAS scores at end of year
  – Provide feedback to teachers (what to teach next?)
• (Generalize to other states…)
• Over 50 workers at Carnegie Mellon, Worcester Polytechnic Institute, Carnegie Learning, and the Worcester Public Schools

Page 4:

The ASSISTment Tutor

• Main items: released MCAS items or "morphs"
• An incorrect answer on a main item leads to "scaffold" items
  – "One-step" breakdowns of the main task
  – Buggy feedback, hints on request, etc.
• All items coded by transfer model (Q-matrix) for knowledge components (KC's)
• Student records contain responses, timing data, bugs/hints, etc.
• System tracks students through time, provides teacher reports per student & per class
  – Predict MCAS scores
  – KC feedback: learned/not-learned, etc.
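To make the Q-matrix (transfer model) coding concrete, here is a minimal illustrative sketch; the item names and the item-to-KC mapping are hypothetical, not taken from the ASSISTment content (the KC labels echo the example skills used later in this talk).

```python
import numpy as np

# Hypothetical Q-matrix: rows = items, columns = knowledge components (KCs).
# Q[j, k] = 1 means item j requires KC k; 0 means it does not.
kcs = ["congruence", "equation-solving", "perimeter"]
items = ["main-item", "scaffold-a", "scaffold-b"]

Q = np.array([
    [1, 1, 1],   # the main item draws on all three KCs
    [1, 0, 0],   # a "one-step" scaffold isolating congruence
    [0, 1, 0],   # a scaffold isolating equation-solving
])

# Which KCs does each item require?
for item, row in zip(items, Q):
    required = [kc for kc, need in zip(kcs, row) if need]
    print(item, "->", required)
```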

Page 5:

This talk draws on two recent reports

Predicting MCAS Scores – Summary/Review:

Junker, B. W. (2006). "Using on-line tutoring records to predict end-of-year exam scores: experience with the ASSISTments project and MCAS 8th grade mathematics". To appear in Lissitz, R. W. (Ed.), Assessing and modeling cognitive development in school: intellectual growth and standard settings. Maple Grove, MN: JAM Press.

KC Feedback – Some Current Progress:

Anozie, N. & Junker, B. W. (2007). "Investigating the utility of a conjunctive model in Q matrix assessment using monthly student records in an online tutoring system". Paper to be presented at the Annual Meeting of the National Council on Measurement in Education, April 12, Chicago IL (K4; Thursday 8:15-10:15 Intercontinental Seville East).

(These and all papers for this session are available at http://www.stat.cmu.edu/~brian/NCME07)

Page 6:

Challenges: Predicting MCAS

• The exact content of the MCAS exam is not known until months after it is given

• The ASSISTments themselves are ongoing throughout the school year as students learn (from teachers, from ASSISTment interactions, etc.).

Page 7:

Methods: Predicting MCAS

• Regression approaches [Feng et al., 2006; Anozie & Junker, 2006; Ayers & Junker, 2006/2007]:
  – Percent correct on main questions
  – Percent correct on scaffold questions
  – Rasch proficiency on main questions
  – Online metrics (efficiency and help-seeking; e.g. Campione et al., 1985; Grigorenko & Sternberg, 1998)
  – Both end-of-year and "month-by-month" models
• Bayes Net (DINA model) approaches:
  – Predicting KC-coded MCAS questions from Bayes Nets (DINA model) applied to ASSISTments [Pardos et al., 2006]
  – Regression on number of KC's mastered in the DINA model [Anozie, 2006]
• HLM-style growth curve models
  – At the KC level [Feng, Heffernan & Koedinger, 2006]
  – At the total score level [Feng, Heffernan, Mani & Heffernan, 2006]

Page 8:

Results: Predicting MCAS

Predictors                   | df | CV-MAD | CV-RMSE | Remarks
PctCorrMain                  |  1 | 7.18   | 8.65    | 7 months, main questions only
#Skills of 77 learned (DINA) |  1 | 6.63   | 8.62    | 3 months, mains and scaffolds
Rasch Proficiency            |  1 | 5.90   | 7.18    | 7 months, main questions only
PctCorrMain + 4 metrics      | 35 | 5.46   | 6.56    | 7 months; 5 summaries each month
Rasch Profic + 5 metrics     |  6 | 5.24   | 6.46    | 7 months, main questions only

• Feng et al. (in press) estimate best-possible MAD ≈ 6 points (11% of the 54-point raw score) from split-half experiments with MCAS
• Ayers & Junker (2007) reliability calculation suggests approximate bounds 1.05 ≤ MAD ≤ 6.53
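As a point of reference for the CV-MAD and CV-RMSE columns above, here is a minimal sketch of how cross-validated mean absolute deviation and root mean squared error can be computed for a linear predictor of raw MCAS scores; the fold count, variable names, and the simulated data standing in for the online-metric summaries are illustrative assumptions, not the project's code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

def cv_mad_rmse(X, y, n_splits=10):
    """Cross-validated MAD and RMSE for a linear predictor of raw MCAS scores."""
    abs_errs, sq_errs = [], []
    for train, test in KFold(n_splits=n_splits, shuffle=True, random_state=0).split(X):
        model = LinearRegression().fit(X[train], y[train])
        err = y[test] - model.predict(X[test])
        abs_errs.append(np.abs(err))
        sq_errs.append(err ** 2)
    mad = np.mean(np.concatenate(abs_errs))
    rmse = np.sqrt(np.mean(np.concatenate(sq_errs)))
    return mad, rmse

# Simulated stand-in: 912 students, 6 predictors (e.g. Rasch proficiency + 5 metrics),
# raw MCAS scores on the 0-54 point scale.
rng = np.random.default_rng(0)
X = rng.normal(size=(912, 6))
y = np.clip(30 + 8 * X[:, 0] + rng.normal(scale=6, size=912), 0, 54)
print(cv_mad_rmse(X, y))
```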

Page 9:

Conclusions: Predicting MCAS

• Tradeoff:
  – Greater model complexity (DINA) can help [Pardos et al., 2006; Anozie, 2006]
  – Accounting for question difficulty (Rasch), plus online metrics, does as well [Ayers & Junker, 2007]
• Limits of what we can accomplish for prediction
  – MCAS reliability ≈ 0.91
  – Typical ASSISTments reliability ≈ 0.81
  – If ASSISTments were perfectly reliable, the approximate bound on MAD would be cut in half (3.40)

Page 10:

Goal: KC Feedback

• Providing feedback on
  – individual students
  – groups of students

• Current teacher report:

For each skill, report percent correct on all items for which that skill is hardest.

• Can we do better?

Page 11:

Challenges: KC Feedback

• Different transfer models are used and expected by different stakeholders:
  – The MCAS itself is scaled using unidimensional IRT / percent correct
  – Description and design of the MCAS is based on
    • the five-strand model of mathematics (Number & Operations, Algebra, Geometry, Measurement, Data Analysis & Probability)
    • 39 "learning standards" nested within the five strands
  – ASSISTment researchers have developed a transfer model involving up to 106 KC's (WPI-106, Pardos et al., 2006) nested within the 39 learning standards
• Scaffolding can be designed as optimal measures of single KC's, or as optimal tutoring aids
  – When more than one transfer model is involved, scaffolds fail to line up with at least one of them!
• Different students work through ASSISTments at different rates

Page 12:

Methods: KC Feedback

Conjunctive binary-skills Bayes Net (Macready & Dayton, 1977; Haertel, 1989; Maris, 1999; Junker & Sijtsma DINA, 2001; etc.)

[Diagram: a conjunctive binary-skills Bayes net with skill nodes P(Congruence), P(Equation-Solving), and P(Perimeter); each question node sits behind a conjunctive gate over its required skills, with P(correct) = 1 - s_i when the gate is True and g_i when it is False.]

Pardos et al. (2006) tend to prefer more KC's for prediction; Anozie & Junker (2007) focus on inference about the KC's themselves (106 KC's).
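For reference, the conjunctive (DINA) item response model named on this slide can be written as follows; the notation follows Junker & Sijtsma (2001).

```latex
% DINA / conjunctive binary-skills model
% \alpha_{ik} = 1 if student i has mastered KC k; q_{jk} = 1 if item j requires KC k.
\eta_{ij} \;=\; \prod_{k} \alpha_{ik}^{\,q_{jk}}
\qquad\text{(conjunctive gate: 1 only if all required KCs are mastered)}

P(X_{ij} = 1 \mid \boldsymbol{\alpha}_i)
 \;=\; (1 - s_j)^{\eta_{ij}}\, g_j^{\,1 - \eta_{ij}},
\qquad s_j = \text{slip},\; g_j = \text{guess}.
```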

Page 13:

Results: KC Feedback

• Average percent of KC's mastered: 30-40%
• A February dip reflects a recording error for main questions
• Can also break this down by individual KC (next slide)

Page 14:

Results: KC Feedback

Page 15:

Results: KC Feedback

• Prediction based on 'ideal response' (P[guess] = P[slip] = 0)
• Split-half cross-validation accuracy 68-73%
• Enough to help teachers decide what to teach next

Page 16:

Digression: Question & Transfer Model Characteristics

Main item: Which graph contains the points in the table?

   X    Y
  -2   -3
  -1   -1
   1    3

Scaffolds:
1. Quadrant of (-2,-3)?
2. Quadrant of (-1,-1)?
3. Quadrant of (1,3)?
4. [Repeat main]

[Figures: posterior boxplots of the guess and slip parameters for the main item and scaffolds.]

Page 17:

Conclusions

• Different transfer models for different purposes seem necessary.
• For unidimensional prediction, unidimensional IRT, augmented with "assistance metrics", works well
  – Accounts for question difficulty and help-seeking behavior
  – We are close to best-possible prediction error
• A finer-grained model like the DINA model is needed for individual and group diagnostics
  – Individual diagnosis uncertainty can be large
  – Group diagnosis seems good enough to help teachers decide what to teach next
  – Scaffold questions: teaching or assessment?

Page 18:

Future Work

• Transfer model / KC's
  – Discovering and improving the transfer model?
  – Different transfer models for different purposes – do they "play together"?
• Experimental design to improve KC inferences
• Account for learning over time
  – Prior distributions for skills based on past performance?
  – Markov learning model for each skill?
• Compare with crediting/blaming the hardest KC
  – Accuracy of inference?
  – Speed of computation?

Page 19:

Page 20:

RMSE and MAD bounds (Ayers & Junker, 2007)

• Let

• Then

• And

Page 21:

Dynamic Models: Anozie and Junker (2006)

• More months helps more than more metrics
• First 5 online metrics retained for final model(s)

Page 22:

Full Set of Online Metrics

Page 23:

Dynamic Models: Anozie and Junker (2006)

• Look at the changing influence of online metrics on MCAS prediction over time
  – Compute monthly summaries of all online metrics (not just %-correct)
  – Build a linear prediction model for each month, using all current and previous months' summaries
• To enhance interpretation, variable selection is done
  – by metric, not by monthly summary
  – including/excluding metrics simultaneously in all monthly models

Page 24:

KC’s in DINA analysis

Page 25:

Results: KC Feedback

• Top shows posterior CI’s for one skill; middle and bottom are ‘sample sizes’

• More data, or consistent evidence → smaller CI

• Less data, or inconsistent evidence → larger CI

• Experimental Design? How many questions? Which skills? Etc.

Page 26:

Page 27:

Methods: KC Feedback

• Pardos et al. (2006) first tried DINA for MCAS prediction
  – Compared the 1-KC, 5-KC, 39-KC and 106-KC models
  – Found 39 KC's did best; 106 KC's second best
• Anozie & Junker (2007) apply DINA with an eye toward feedback to teachers, etc.

Page 28:

Static Prediction Models

• Feng et al. (2006 & to appear):
  – Online testing metrics (see the sketch after this list)
    • Percent correct on main/scaffold/both items
    • "Assistance score" = (errors + hints) / (number of scaffolds)
    • Time spent on (in-)correct answers
    • etc.
  – Compare paper & pencil pre/post benchmark tests
• Ayers and Junker (2006):
  – Rasch & LLTM (linear decompositions of item difficulty)
  – Augmented with online testing metrics
• Pardos et al. (2006); Anozie (2006):
  – Binary-skills conjunctive Bayes nets
  – DINA models (Junker & Sijtsma, 2001; Maris, 1999; etc.)
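A minimal sketch of how the assistance score defined above could be computed from a student's scaffold log; the record structure (a list of per-scaffold error and hint counts) is an illustrative assumption, not the ASSISTment logging format.

```python
def assistance_score(scaffold_log):
    """Assistance score = (errors + hints) / (number of scaffolds).

    scaffold_log: list of (errors, hints) pairs, one per scaffold question
    seen by the student. Returns 0.0 if no scaffolds were seen.
    """
    if not scaffold_log:
        return 0.0
    errors = sum(e for e, _ in scaffold_log)
    hints = sum(h for _, h in scaffold_log)
    return (errors + hints) / len(scaffold_log)

# Example: three scaffolds, with (errors, hints) on each.
print(assistance_score([(1, 2), (0, 0), (2, 1)]))  # (3 + 3) / 3 = 2.0
```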

Page 29:

The ASSISTment Architectures

• Extensible Tutor Architecture
  – Scalable from simple pseudo-tutors with few users to model-tracing tutors and 1000's of users
  – Curriculum Unit
    • Items organized into multiple curricula
    • Sections within a curriculum: Linear, Random, Experimental, etc.
  – Problem & Tutoring Strategy Units
    • Task organization & user interaction (e.g. main item & scaffolds, interface widgets, …)
    • Task components mapped to multiple transfer models
  – Logging Unit
    • Fine-grained human-computer interaction trace
    • Abstracting/coarsening mechanisms
• Web-based Item Builder
  – Used by classroom teachers to develop content
  – Support for building curricula, mapping tasks to transfer models, etc.
• Relational Database and Network Architecture supports
  – User reports (e.g., for students, teachers, coaches, administrators)
  – Research data analysis
• Razzaq et al. (to appear) give an overview

Page 30:

Two Assessment Goals

• To predict end-of-year MCAS scores

• To provide feedback to teachers (what to teach next?)

• But there are some complications…

Page 31:

2004-2005 Data

• Tutoring tasks
  – 493 main items
  – 1216 scaffold items
• Students
  – 912 eighth-graders in two middle schools
• Skills models (transfer models / Q-matrices)
  – 1 "Proficiency": unidimensional IRT
  – 5 MCAS "strands": Number/Operations, Algebra, Geometry, Measurement, Data/Probability
  – 39 MCAS learning standards: nested in the strands
  – 77 active skills: "WPI April 2005" (106 potential)

Page 32:

Static Models: Feng et al. (2006 & to appear)

• What is related to raw MCAS (0-54 pts)?

• P&P pre/post benchmark tests
• Online metrics:
  – Pct correct on mains
  – Pct correct on scaffolds
  – Seconds spent on incorrect scaffolds
  – Avg number of scaffolds per minute
  – Number of hints plus incorrect main items
  – etc.
• All annual summaries

Predictor                 | Corr
P & P tests
  SEP-TEST                |  0.75
  MARCH-TEST              |  0.41
ASSISTment online metrics
  MAIN_PERCENT_CORRECT    |  0.75
  MAIN_COUNT              |  0.47
  TOTAL_MINUTES           |  0.26
  PERCENT_CORRECT         |  0.76
  QUESTION_COUNT          |  0.20
  HINT_REQUEST_COUNT      | -0.41
  AVG_HINT_REQUEST        | -0.63
  HINT_COUNT              | -0.39
  AVG_HINT_COUNT          | -0.63
  BOTTOM_OUT_HINT_COUNT   | -0.38
  AVG_BOTTOM_HINT         | -0.55
  ATTEMPT_COUNT           |  0.08
  AVG_ATTEMPT             | -0.41
  AVG_QUESTION_TIME       | -0.12
  AVG_ITEM_TIME           | -0.39

Page 33:

Static Models: Feng et al. (2006 & to appear)

• Stepwise linear regression
• Mean absolute deviation (MAD): within-sample MAD = 5.533
• Raw MCAS = 0-54, so within-sample Pct Err = MAD/54 = 10.25% (uses Sept P&P test)

Predictor       | Coefficient
(Const)         |  26.04
Sept_Test       |   0.64
Pct_Correct_All |  24.21
Avg_Attempts    | -10.56
Avg_Hint_Reqs   |  -2.28
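To show how the fitted coefficients above translate into a predicted raw MCAS score, here is a minimal sketch; the example input values, and the assumed scaling of the inputs (percent correct as a proportion, attempts and hint requests as per-question averages), are illustrative assumptions rather than details from the paper.

```python
def predict_mcas(sept_test, pct_correct_all, avg_attempts, avg_hint_reqs):
    """Apply the stepwise-regression coefficients from the slide to one student.

    Returns a predicted raw MCAS score, clipped to the 0-54 point range.
    """
    score = (26.04
             + 0.64 * sept_test
             + 24.21 * pct_correct_all
             - 10.56 * avg_attempts
             - 2.28 * avg_hint_reqs)
    return max(0.0, min(54.0, score))

# Hypothetical student: Sept benchmark score of 20 points, 60% correct overall,
# 1.2 attempts and 0.5 hint requests per question on average.
print(predict_mcas(sept_test=20, pct_correct_all=0.60,
                   avg_attempts=1.2, avg_hint_reqs=0.5))  # about 39.6 points
```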

Page 34:

Static Models: Ayers & Junker (2006)

• Compared two IRT models on ASSISTment main questions:
  – Rasch model for 354 main questions
  – LLTM: constrained Rasch model that decomposes main-question difficulty by skills in the WPI April transfer model (77 skills)
• Replace "Percent Correct" with the IRT proficiency score in linear predictions of MCAS
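For reference, the two IRT models being compared can be written as follows; the notation (θ for proficiency, β for item difficulty, q and η for the skill decomposition) is the standard one rather than anything taken from the slides.

```latex
% Rasch model: one difficulty parameter per main question
P(X_{ij} = 1 \mid \theta_i) \;=\; \frac{\exp(\theta_i - \beta_j)}{1 + \exp(\theta_i - \beta_j)}

% LLTM: item difficulty constrained to a linear decomposition over skills,
% using the Q-matrix entries q_{jk} of the transfer model
\beta_j \;=\; \sum_{k} q_{jk}\,\eta_k
```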

Page 35:

Static Models: Ayers & Junker (2006)

• Rasch fits much better than LLTM
  – BIC = -3,300
  – df = +277
• Attributable to
  – the transfer model?
  – the linear decomposition of item difficulties?
• Residual and difficulty plots suggest transfer-model fixes.

Page 36:

Static Models: Ayers & Junker (2006)

• Focus on Rasch: predict MCAS with a linear model in the Rasch proficiency θ and online metrics Y (where θ = proficiency, Y = online metric)
• 10-fold cross-validation vs. 54-pt raw MCAS:

Predictors                           | Variables | CV-MAD | CV % Error
% Corr Main                          | 1         | 7.18   | 13.30
Rasch proficiency (θ)                | 1         | 5.90   | 10.93
Rasch proficiency + 5 online metrics | 6         | 5.24   | 9.70

Page 37:

Static Models: Pardos et al. (2006)

• Compared nested versions of binary skills models (coded both ASSISTments and MCAS)
• Fixed gi = 0.10 and si = 0.05 for all items; prior mastery probability 0.5 for all skills
• Inferred skills from ASSISTments; computed the expected score for a 30-item MCAS subset

MODEL                | Mean Absolute Deviance (MAD) | % ERROR (30 items)
39 MCAS standards    | 4.500                        | 15.00
106 skills (WPI Apr) | 4.970                        | 16.57
5 MCAS strands       | 5.295                        | 17.65
1 Binary Skill       | 7.700                        | 25.67
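Under the fixed guess and slip values on this slide, the expected score on the 30-item MCAS subset follows directly from the conjunctive model; a sketch of that arithmetic, using the same notation as the DINA equations earlier:

```latex
% \eta_{ij} = 1 if student i has all KCs required by MCAS item j, else 0.
E[\text{Score}_i] \;=\; \sum_{j=1}^{30} \Big[ \eta_{ij}\,(1 - s_j) + (1 - \eta_{ij})\, g_j \Big]
 \;=\; \sum_{j=1}^{30} \Big[ 0.95\,\eta_{ij} + 0.10\,(1 - \eta_{ij}) \Big]
```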

Page 38:

Static Models: Anozie (2006)

• Focused on 77 active skills in WPI April Model

• Estimated k’s, gi’s and si’s using flexible priors

• Predicted full raw 54-pt MCAS score as linear function of (expected) number of skills learned

Months of Data | CV MAD | CV % Err
1              | 8.11   | 15.02
2              | 7.38   | 13.68
3              | 6.79   | 12.58

Page 39:

Dynamic Prediction Models

• Razzaq et al. (to appear): evidence of learning over time

• Feng et al. (to appear): student or item covariates plus linear growth curves (a la Singer & Willett, 2003)

• Anozie and Junker (2006): changing influence of online metrics over time

Page 40:

Dynamic Models: Razzaq et al. (to appear)

• The ASSISTment system is sensitive to learning
• Not clear what the source of learning is here…

[Figure: average "% correct on system" per student plotted by month, September through March, on a 0-40% scale.]

Page 41:

Dynamic Models: Feng et al. (to appear)

• Growth-Curve Model I: Overall Learning

• Growth-Curve Model II: Learning in Strands

School was a better predictor (BIC) than Class or Teacher, possibly because School demographics dominate the intercept.

Sept_Test is a good predictor of baseline proficiency. Baseline and learning rates varied by Strand.
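A sketch of the kind of two-level linear growth-curve model being described here (in the style of Singer & Willett, 2003); the exact covariates and random-effects structure in Feng et al.'s fitted models may differ.

```latex
% Level 1: monthly ASSISTment performance for student i at time t
Y_{ti} \;=\; \pi_{0i} + \pi_{1i}\,\text{Time}_{ti} + \varepsilon_{ti}

% Level 2: baseline and learning rate vary across students,
% with Sept_Test and School entering as predictors of the baseline
\pi_{0i} \;=\; \gamma_{00} + \gamma_{01}\,\text{SeptTest}_i + \gamma_{02}\,\text{School}_i + u_{0i},
\qquad
\pi_{1i} \;=\; \gamma_{10} + u_{1i}
```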

Page 42:

Dynamic Models: Anozie and Junker (2006)

Page 43:

Dynamic Models: Anozie and Junker (2006)

• Recent main question performance dominates – proficiency?

Page 44:

Dynamic Models: Anozie and Junker (2006)

• Older performance on scaffolds similar to recent – learning?

Page 45:

Summary of Prediction Models

Model                    | Variables    | CV-MAD | CV % Error | CV-RMSE
PctCorrMain              | 1            | 7.18   | 13.30      | 8.65
#Skills of 77 learned    | 1?           | 6.63   | 12.58      | 8.62
Rasch Proficiency        | 1?           | 5.90   | 10.93      | 7.18
PctCorrMain + 4 metrics  | 35 (= 5 x 7) | 5.46   | 10.10      | 6.56
Rasch Profic + 5 metrics | 6?           | 5.24   | 9.70       | 6.46

• Feng et al. (in press) compute the split-half MAD of the MCAS and estimate ideal % Error ~ 11%, or MAD ~ 6 points.

• Ayers & Junker (2006) compute reliabilities of the ASSISTment sets seen by all students and estimate upper and lower bounds for optimal MAD: 0.67 ≤ MAD ≤ 5.21.

Page 46:

New Directions

• We have some real evidence of learning
  – We are not yet modeling individual student learning
• Current teacher report: for each skill, report percent correct on all items for which that skill is hardest.
  – Can we do better?
• Approaches now getting underway:
  – Learning curve models
  – Knowledge-tracing models

Page 47:

New Directions: Cen, Koedinger & Junker (2005)

• Inspired by Draney, Pirolli & Wilson (1995)
  – Logistic regression for successful skill uses
  – Random intercept (baseline proficiency)
  – Fixed effects for skill and skill*opportunity
    • Difficulty factor: skill but not skill*opportunity
    • Learning factor: skill and skill*opportunity
  – Part of Data Shop at http://www.learnlab.org
• Feng et al. (to appear) fit similar logistic growth curve models to ASSISTment items
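A sketch of the kind of logistic regression described on this slide, in the spirit of Draney, Pirolli & Wilson (1995) and the Additive/Learning Factors family; the exact specification used by Cen, Koedinger & Junker may differ.

```latex
% Probability that student i succeeds on an opportunity to use the skills of item j:
% \theta_i = random student intercept (baseline proficiency),
% \beta_k  = fixed effect of skill k (difficulty factor),
% \gamma_k = fixed effect of skill k * opportunity (learning factor),
% T_{ik}   = number of prior opportunities student i has had to practice skill k.
\mathrm{logit}\, P(X_{ij} = 1)
 \;=\; \theta_i \;+\; \sum_{k} q_{jk}\left(\beta_k + \gamma_k\, T_{ik}\right)
```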

Page 48:

New Directions: Knowledge Tracing

• Combine knowledge tracing approach of Corbett, Anderson and O’Brien (1995) with DINA model of Junker and Sijtsma (2001)

• Each skill represented by a two state (unlearned/learned) Markov process with absorbing state at “learned”.

• Can locate time during school year when each skill is learned.

• Work just getting underway (Jiang & Junker).
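A minimal sketch of the two-state Markov ("knowledge tracing") update for a single skill, combining DINA-style slip and guess parameters with an absorbing "learned" state; the parameter values below are illustrative, not estimates from the ASSISTment data.

```python
def knowledge_tracing_update(p_known, correct, slip, guess, p_learn):
    """One knowledge-tracing step for a single skill.

    p_known: prior probability the skill is in the 'learned' state.
    correct: whether the observed response was correct.
    slip, guess: DINA-style error probabilities.
    p_learn: probability of moving unlearned -> learned after the opportunity
             ('learned' is absorbing, so there is no forgetting).
    Returns the updated probability that the skill is learned.
    """
    if correct:
        likelihood_known = 1.0 - slip
        likelihood_unknown = guess
    else:
        likelihood_known = slip
        likelihood_unknown = 1.0 - guess
    # Bayes rule: posterior probability of 'learned' given the response.
    posterior = (p_known * likelihood_known) / (
        p_known * likelihood_known + (1.0 - p_known) * likelihood_unknown)
    # Markov transition: some unlearned students learn from the opportunity.
    return posterior + (1.0 - posterior) * p_learn

# Track one skill across a short, hypothetical response sequence.
p = 0.3
for obs in [True, False, True, True]:
    p = knowledge_tracing_update(p, obs, slip=0.05, guess=0.10, p_learn=0.15)
    print(round(p, 3))
```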

Page 49:

Discussion

• ASSISTment system
  – Great testbed for online cognitive modeling and prediction technologies
  – Didn't mention reporting and "gaming detection" technologies
  – Teachers positive, students impressed
• Ready-Fire-Aim
  – Important! Got the system up and running, with lots of user feedback & buy-in
  – But… e.g. lack of control over content and content rollout (content balance vs. MCAS?)
  – Given this, perhaps only crude methods are needed/possible for MCAS prediction?

Page 50:

Discussion

• Multiple skill codings for different purposes
  – Exam prediction vs. teacher feedback; state to state
• Scaffolds
  – Dependence between scaffolds and main items
  – Forced scaffolding: main right → scaffolds right
  – Content sometimes skills-based, sometimes tutorial
• We are now building some true one-skill decompositions to investigate the stability of skills across items
• Student learning over time
  – Clearly evidence of that!
  – Some experiments not shown here suggest modest but significant value-added for ASSISTments
  – Starting to model learning, time-to-mastery, etc.

Page 51:

References

Anozie, N. (2006). Investigating the utility of a conjunctive model in Q-matrix assessment using monthly student records in an online tutoring system. Proposal submitted to the National Council on Measurement in Education 2007 Annual Meeting.

Anozie, N. O. & Junker, B. W. (2006). Predicting end-of-year accountability assessment scores from monthly student records in an online tutoring system. American Association for Artificial Intelligence Workshop on Educational Data Mining (AAAI-06), July 17, 2006, Boston, MA.

Ayers, E. & Junker, B. W. (2006). Do skills combine additively to predict task difficulty in eighth-grade mathematics? American Association for Artificial Intelligence Workshop on Educational Data Mining (AAAI-06), July 17, 2006, Boston, MA.

Ayers, E. & Junker, B. W. (2006). IRT modeling of tutor performance to predict end of year exam scores. Working paper.

Corbett, A. T., Anderson, J. R., & O'Brien, A. T. (1995). Student modeling in the ACT programming tutor. Chapter 2 in P. Nichols, S. Chipman, & R. Brennan (Eds.), Cognitively Diagnostic Assessment. Hillsdale, NJ: Erlbaum.

Draney, K. L., Pirolli, P., & Wilson, M. (1995). A measurement model for a complex cognitive skill. In P. Nichols, S. Chipman, & R. Brennan (Eds.), Cognitively Diagnostic Assessment. Hillsdale, NJ: Erlbaum.

Feng, M., Heffernan, N. T., & Koedinger, K. R. (2006). Predicting state test scores better with intelligent tutoring systems: developing metrics to measure assistance required. In Ikeda, Ashley & Chan (Eds.), Proceedings of the Eighth International Conference on Intelligent Tutoring Systems. Berlin: Springer-Verlag. pp. 31-40.

Feng, M., Heffernan, N., Mani, M., & Heffernan, C. (2006). Using mixed effects modeling to compare different grain-sized skill models. AAAI-06 Workshop on Educational Data Mining, Boston, MA.

Feng, M., Heffernan, N. T., & Koedinger, K. R. (in press). Addressing the testing challenge with a web-based E-assessment system that tutors as it assesses. Proceedings of the 15th Annual World Wide Web Conference. New York: ACM Press (anticipated).

Hao, C., Koedinger, K., & Junker, B. (2005). Automating cognitive model improvement by A* search and logistic regression. In Technical Report (WS-05-02) of the AAAI-05 Workshop on Educational Data Mining, Pittsburgh, 2005.

Junker, B. W. & Sijtsma, K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement 25: 258-272.

Maris, E. (1999). Estimating multiple classification latent class models. Psychometrika 64, 187-212.

Pardos, Z. A., Heffernan, N. T., Anderson, B., & Heffernan, C. L. (2006). Using fine grained skill models to fit student performance with Bayesian networks. Workshop in Educational Data Mining held at the Eighth International Conference on Intelligent Tutoring Systems. Taiwan, 2006.

Razzaq, L., Feng, M., Nuzzo-Jones, G., Heffernan, N. T., Koedinger, K. R., Junker, B., Ritter, S., Knight, A., Aniszczyk, C., Choksey, S., Livak, T., Mercado, E., Turner, T. E., Upalekar, R., Walonoski, J. A., Macasek, M. A., & Rasmussen, K. P. (2005). The Assistment Project: blending assessment and assisting. In C. K. Looi, G. McCalla, B. Bredeweg, & J. Breuker (Eds.), Proceedings of the 12th International Conference on Artificial Intelligence in Education. Amsterdam: IOS Press. pp. 555-562.

Razzaq, L., Feng, M., Heffernan, N. T., Koedinger, K. R., Junker, B., Nuzzo-Jones, G., Macasek, N., Rasmussen, K. P., Turner, T. E., & Walonoski, J. (to appear). A web-based authoring tool for intelligent tutors: blending assessment and instructional assistance. In Nedjah, N., et al. (Eds.), Intelligent Educational Machines, Intelligent Systems Engineering Book Series (see http://isebis.eng.uerj.br).

Singer, J. D. & Willett, J. B. (2003). Applied Longitudinal Data Analysis: Modeling Change and Occurrence. New York: Oxford University Press.

Websites:
http://www.assistment.org
http://www.learnlab.org
http://www.educationaldatamining.org

