Post on 09-Aug-2015
Investigating Automated Student Modeling in a Java MOOC
Michael Yudelson1, Roya Hosseini2, Arto Vihavainen3, & Peter Brusilovsky2
1Carnegie Learning, 2University of Pittsburgh, 3University of Helsinki
Michael V. Yudelson (C) 2014 2
Everybody’s Coding
• Programming is no longer the trade of the few
  – Wide penetration of computer science
  – Challenge for educators
    • Talent pool is different
• Abundance of learning materials doesn't help
  – Even if digital, there's no persistent student model
  – New languages appear and need to be taught (e.g., R, Swift)
Problem
• Programming course (MOOC or otherwise) at the University of Helsinki
  – 100 closed-form/open-ended assignments over 6 weeks (10^1–10^3 lines of code each)
  – NetBeans plugin for testing/submitting/feedback
  – Code snapshots are meticulously archived
  – No provisions to account for student learning (no student model)
• On top of black-box-style pass/fail code grading:
  – Build a longitudinal student model automatically
  – Handle non-trivial programming assignments
Data
• Every snapshot was compiled and run against tests
• JavaParser* extracted concepts/skills (programming constructs)
• Incremental snapshots that did not result in changes to concepts were removed
Course                            Students All (Male)   Age Min/Median/Max   Code snapshots All / Median
Intro to Programming, Fall 2012   185 (121)             18 / 22 / 65         204,460 / 1,131
Intro to Programming, Fall 2013   207 (147)             18 / 22 / 57         263,574 / 1,126
Programming MOOC, Spring 2013     683 (492)             13 / 23 / 75         842,356 / 876
* Hosseini, R., & Brusilovsky, P. (2013). JavaParser: A Fine-Grain Concept Indexing Tool for Java Problems. In The First Workshop on AI-supported Education for Computer Science (AIEDCS 2013) (pp. 60–63).
Code for an assignment: automatically saved, run against tests, submitted
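The snapshot-filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: JavaParser's concept extraction is stubbed out as a precomputed concept set per snapshot, and the function name is invented.

```python
def filter_snapshots(snapshots):
    """Keep only snapshots whose concept set differs from the previously kept one.

    `snapshots` is a list of (snapshot_id, concepts) pairs, where `concepts`
    stands in for the set of programming constructs JavaParser would extract.
    """
    kept = []
    prev = None
    for snap_id, concepts in snapshots:
        if concepts != prev:  # concept-level change: keep this snapshot
            kept.append((snap_id, concepts))
            prev = concepts
    return kept

# Tiny demo: the middle snapshot changes no concepts (e.g., a whitespace-only
# edit) and is dropped.
demo = [
    (1, {"for-loop", "assignment"}),
    (2, {"for-loop", "assignment"}),
    (3, {"for-loop", "assignment", "if-statement"}),
]
filtered = filter_snapshots(demo)
```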
Questions
• Given that the approach is fully automated:
  – Can we build accurate models of learning?
  – Can we do that while using only a fraction of the data?
    • Only a fraction of the concepts are relevant in each successive code snapshot
  – Can the models be used beyond detecting student progress?
    • E.g., for building an intelligent [fully automated] hinting component for struggling students
Methodology (1)
• Modeling student learning
  – Additive Factors Model (AFM)
    • response_ilj = student_i + problem_j + Σ_k (skill_k + skill_slope_k * attempts_ik)
    • response_ilj – student i's code passing test l for problem j
• Selecting concepts (AFM A, AFM B, AFM C)
  – A. all concepts available
  – B. changes from the previous snapshot
  – C. changes, distinguishing additions/deletions
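The AFM prediction can be written out as a short sketch. The parameter values below are made up for illustration, and the fitting step (which the paper performs over the full snapshot data) is omitted; `afm_prob` applies a sigmoid to the log-odds sum given above.

```python
import numpy as np

# Hypothetical tiny parameterization: 2 students, 1 problem, 2 skills.
theta = np.array([0.2, -0.1])   # student proficiencies
beta  = np.array([0.5])         # problem easiness
gamma = np.array([-0.3, 0.1])   # skill easiness
rho   = np.array([0.15, 0.05])  # skill learning slopes (gain per attempt)

def afm_prob(student, problem, skills, attempts):
    """P(pass) under AFM: sigmoid of
    student_i + problem_j + sum_k (skill_k + skill_slope_k * attempts_ik)."""
    logit = theta[student] + beta[problem] + sum(
        gamma[k] + rho[k] * attempts[k] for k in skills)
    return 1.0 / (1.0 + np.exp(-logit))

# With positive slopes, predicted success grows with practice on the skills.
p_first = afm_prob(0, 0, skills=[0, 1], attempts={0: 0, 1: 0})
p_later = afm_prob(0, 0, skills=[0, 1], attempts={0: 5, 1: 5})
```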
Methodology (2)
• Selecting relevant concepts (+PC)
  – PC – a conditional independence search algorithm from the Tetrad tool*
  – Which concepts are associated with [not] passing the test?
  – A PC data-mining task was set up for each problem
• Handling different snapshot submission speeds (+Ln)
  – Smoothing attempt counts by taking a logarithm
*Spirtes, P., Glymour, C., and Scheines, R. (2000) Causation, Prediction, and Search, 2nd Ed. MIT Press, Cambridge MA.
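The +PC and +Ln ideas can be illustrated roughly. Note the hedges: `relevant_concepts` is only a marginal-association stand-in for Tetrad's PC conditional-independence search (which conditions on other variables rather than looking at each concept alone), and the `min_gap` threshold is invented for the demo.

```python
import math

def relevant_concepts(rows, min_gap=0.2):
    """Rough stand-in for the +PC step: keep concepts whose presence shifts
    the empirical pass rate by at least `min_gap` from the overall rate.
    `rows` is a list of (concept_set, passed) pairs for one problem.
    (The paper instead runs Tetrad's PC search per problem.)"""
    overall = sum(p for _, p in rows) / len(rows)
    concepts = set().union(*(c for c, _ in rows))
    keep = set()
    for c in concepts:
        with_c = [p for cs, p in rows if c in cs]
        if with_c and abs(sum(with_c) / len(with_c) - overall) >= min_gap:
            keep.add(c)
    return keep

def smooth_attempts(n):
    """+Ln variant: compress raw attempt counts with a logarithm."""
    return math.log1p(n)

# Demo: "print" appears in passing and failing code alike, so it is filtered out.
rows = [
    ({"loop", "print"}, 1),
    ({"loop"}, 1),
    ({"recursion", "print"}, 0),
    ({"recursion"}, 0),
]
selected = relevant_concepts(rows)
```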
Methodology (3)
• Validating models
  – Consecutive code snapshots and changes in passing/failing the tests (YY, YN, NY, NN)
  – Model support scores for adding/deleting concepts: positive, negative, neutral (P, N, 0)
    • Support – the sum of fitted slopes for the changed concepts
  – E.g., NYP0 – from fail to pass, positive support for additions, neutral for deletions
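The labeling scheme can be sketched as follows; the function names, slope values, and concept names here are illustrative, not from the paper.

```python
def support(slopes, concepts):
    """Support for a set of concept changes: the sum of their fitted slopes."""
    return sum(slopes.get(c, 0.0) for c in concepts)

def sign_label(x):
    """Map a support score to P (positive), N (negative), or 0 (neutral)."""
    return "P" if x > 0 else "N" if x < 0 else "0"

def label_transition(passed_before, passed_after, slopes, added, deleted):
    """Build labels like 'NYP0': pass/fail transition, then support signs
    for the added and the deleted concepts."""
    trans = ("Y" if passed_before else "N") + ("Y" if passed_after else "N")
    return trans + sign_label(support(slopes, added)) + sign_label(support(slopes, deleted))

# Demo: a fail-to-pass move where a positive-slope concept was added and
# nothing was deleted yields NYP0.
slopes = {"if-statement": 0.4, "while-loop": -0.2}
label = label_transition(False, True, slopes, added={"if-statement"}, deleted=set())
```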
Methodology (4)
• Conditional probabilities – relative frequencies of:
  – A: pass-to-pass – no negative support for any change
  – B: pass-to-fail – negative support for a change
  – C: fail-to-fail – no positive support for changes
  – D: fail-to-pass – positive support for changes
• Grouped conditional probabilities
  – Average of all of A, B, C, D
  – Average of B and D (arguably of primary interest)
• Last but not least – the size of the data required to fit the models
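Under one reading of the A–D definitions (treating "no negative support" as both support sums being ≥ 0, and so on), these relative frequencies could be computed as below; the function name and demo data are invented.

```python
def validation_scores(records):
    """records: list of (transition, add_support, del_support) tuples,
    e.g. ('NY', 0.4, 0.0). Returns relative frequencies A-D:
      A: of YY moves, fraction with no negative support
      B: of YN moves, fraction with a negative support
      C: of NN moves, fraction with no positive support
      D: of NY moves, fraction with a positive support"""
    def frac(trans, cond):
        rows = [(a, d) for t, a, d in records if t == trans]
        return sum(cond(a, d) for a, d in rows) / len(rows) if rows else float("nan")
    return {
        "A": frac("YY", lambda a, d: a >= 0 and d >= 0),
        "B": frac("YN", lambda a, d: a < 0 or d < 0),
        "C": frac("NN", lambda a, d: a <= 0 and d <= 0),
        "D": frac("NY", lambda a, d: a > 0 or d > 0),
    }

demo = [
    ("YY", 0.2, 0.1),   # pass-to-pass, no negative support
    ("YN", -0.3, 0.0),  # pass-to-fail, negative support
    ("NN", -0.1, 0.0),  # fail-to-fail, no positive support
    ("NY", 0.4, 0.0),   # fail-to-pass, positive support
    ("NY", -0.1, 0.0),  # fail-to-pass, but no positive support
]
scores = validation_scores(demo)
bd = (scores["B"] + scores["D"]) / 2  # grouped score of primary interest
```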
Results (1)
• Accuracy, Data size, Validation values
Results (2)

Model          Acc.   Acc. rnk   File Sz rnk   Val. A–D rnk   Val. B,D rnk   Overall rnk
Rasch          .71    –          –             –              –              –
AFM A          .81
AFM B          .73
AFM C          .78
AFM A+PC       .84    1
AFM B+PC       .77
AFM C+PC       .83    2
AFM A+Ln*      .75                             2 (.62)        3 (.45)
AFM B+Ln       .71               1 (123 Mb)    1 (.63)                       2 (4.75)
AFM C+Ln       .77               2 (139 Mb)
AFM A+PC+Ln    .82    3          6 (284 Mb)    8 (.59)        2 (.47)        3 (4.75)
AFM B+PC+Ln    .75               3 (141 Mb)    3 (.62)        1 (.49)        1 (4.00)
AFM C+PC+Ln    .78
* Logarithm of opportunity counts slightly inflates log file size due to text format
See full table in the paper
Discussion
• It is possible to fully automate student modeling (in the programming domain) using only a fraction of the rich data
• The models we built have the potential to be used for providing in-problem learning support
• The choice of the best model involves tradeoffs
  – Accuracy vs. data requirements vs. validation
Future Work
• Address concept counts in snapshots
• Make use of code structure (parse trees)
• Make use of student behaviors
  – Builder, Massager, Reducer, Struggler
• Account for within-IDE actions (save, run, ask for hint)
• Tie to students' browsing of the support material
Thank You!