SRI Technology Evaluation Workshop, RJM 2/23/00
Leverage Points for Improving Educational Assessment
Robert J. Mislevy, Linda S. Steinberg,
and Russell G. Almond
Educational Testing Service
February 25, 2000
Presented at the Technology Design Workshop sponsored by the U.S. Department of Education, held at Stanford Research Institute, Menlo Park, CA, February 25-26, 2000.
The work of the first author was supported in part by the Educational Research and Development Centers Program, PR/Award Number R305B60002, as administered by the Office of Educational Research and Improvement, U.S. Department of Education. The findings and opinions expressed in this report do not reflect the positions or policies of the National Institute on Student Achievement, Curriculum, and Assessment, the Office of Educational Research and Improvement, or the U.S. Department of Education.
Some opportunities...
Cognitive/educational psychology:
» how people learn,
» organize knowledge,
» put knowledge to use.
Technology to...
» create, present, and vivify “tasks”;
» evoke, capture, parse, and store data;
» evaluate, report, and use results.
A Challenge
How the heck do you make sense of rich, complex data, for more ambitious inferences about students?
A Response
Design assessment from generative principles:
1. Psychology
2. Purpose
3. Evidentiary reasoning
Conceptual design LEADS; tasks, statistics & technology FOLLOW.
Principled Assessment Design
[Diagram: the three basic models. The Student Model; the Evidence Model(s), comprising a statistical model and evidence rules; and the Task Model(s).]
Evidence-centered assessment design
What complex of knowledge, skills, or other attributes should be assessed, presumably because they are tied to explicit or implicit objectives of instruction or are otherwise valued by society?
(Messick, 1992)
What behaviors or performances should reveal those constructs?
The Evidence Model(s)
Evidence rules extract features from a work product and evaluate values of observable variables.
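As a concrete illustration, an evidence rule can be sketched as a function from a captured work product to values of observable variables. Everything below is a hypothetical sketch: the field names, the troubleshooting setting, and the scoring logic are invented for illustration, not taken from the talk.

```python
# Hypothetical sketch of an "evidence rule": inspect a captured work
# product and set the values of observable variables. All names and
# logic here are illustrative assumptions.

def evaluate_work_product(work_product):
    """Map a work product (a logged action sequence plus final state)
    to values of observable variables."""
    actions = work_product["actions"]
    return {
        # Did the examinee reach the correct final state?
        "solved": work_product.get("final_state") == "fault_isolated",
        # Efficiency: a count-based summary of the action sequence.
        "n_redundant_actions": sum(1 for a in actions if a["redundant"]),
        # Strategy: did any action split the problem space?
        "used_space_splitting": any(a["type"] == "split" for a in actions),
    }

wp = {
    "final_state": "fault_isolated",
    "actions": [
        {"type": "test", "redundant": False},
        {"type": "split", "redundant": False},
        {"type": "test", "redundant": True},
    ],
}
print(evaluate_work_product(wp))
# {'solved': True, 'n_redundant_actions': 1, 'used_space_splitting': True}
```

The point of the sketch is the shape of the mapping: rich raw behavior in, a small set of evidentially meaningful summaries out.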
The statistical component expresses how the observable variables depend, in probability, on the student-model variables.
What tasks or situations should elicit those behaviors?
The Task Model(s)
Task-model variables describe features of tasks.
A task model provides a framework for describing and constructing the situations in which examinees act.
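The idea of task-model variables can be sketched as a data structure: variables that can be varied to author tasks, plus specifications for stimulus material, environment, and work product. The field names and example values below are assumptions for illustration, not part of any formal specification.

```python
# Illustrative sketch of a task model: task-model variables describe
# features of the situation; the model also specifies the stimulus,
# the environment, and the form of the work product. Names invented.

from dataclasses import dataclass

@dataclass
class TaskModel:
    variables: dict            # authorable task features, with defaults
    stimulus_spec: str         # what material is presented
    environment_spec: str      # conditions and affordances for acting
    work_product_spec: str     # the form in which behavior is captured

def instantiate(model: TaskModel, **settings) -> dict:
    """Fill in task-model variables to construct one concrete task."""
    unknown = set(settings) - set(model.variables)
    if unknown:
        raise ValueError(f"not task-model variables: {unknown}")
    return {**model.variables, **settings,
            "stimulus": model.stimulus_spec,
            "environment": model.environment_spec,
            "work_product": model.work_product_spec}

troubleshooting = TaskModel(
    variables={"subsystem": "canopy", "n_faults": 1},
    stimulus_spec="reported aircraft malfunction",
    environment_spec="simulated test station with gauges",
    work_product_spec="time-stamped sequence of troubleshooting actions",
)
task = instantiate(troubleshooting, subsystem="landing_gear")
print(task["subsystem"])  # landing_gear
```

One task model thus generates a family of tasks; varying the task-model variables systematically varies the evidence the tasks can provide.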
Includes specifications for the stimulus material, conditions, and affordances: the environment in which the student will say, do, or produce something.
Includes specifications for the “work product”: the form in which what the student says, does, or produces will be captured.
Leverage Points...
» For cognitive/educational psychology
» For statistics
» For technology
Leverage Points for Cog Psych
The character and substance of the student model.
Example a: GRE Verbal Reasoning
The student model is just the IRT ability parameter: the tendency to make correct responses in the mix of items presented in a GRE-V.
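A minimal sketch of that statistical model, assuming the familiar two-parameter-logistic (2PL) IRT form: the single student-model variable theta drives the probability of a correct response to each item. The item parameters below are invented for illustration.

```python
# Sketch of a 2PL IRT response model: one ability variable theta,
# per-item discrimination a and difficulty b. Parameters invented.

import math

def p_correct(theta, a, b):
    """Probability of a correct response given ability theta,
    item discrimination a, and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# An examinee of average ability on an item of average difficulty:
print(round(p_correct(theta=0.0, a=1.0, b=0.0), 2))  # 0.5
# The same examinee on a harder item:
print(round(p_correct(theta=0.0, a=1.0, b=1.0), 2))  # 0.27
```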
Example b: HYDRIVE
Student-model variables in HYDRIVE: a Bayes net fragment.
[Diagram: nodes include Overall Proficiency; Procedural Knowledge; System Knowledge; Strategic Knowledge; Use of Gauges; Space Splitting; Electrical Tests; Serial Elimination; Power System; Landing Gear Knowledge; Canopy Knowledge; Electronics Knowledge; Hydraulics Knowledge; and Mechanical Knowledge.]
What we can observe to give us evidence: the work product.
How to recognize and summarize the key features of that evidence.
Modeling which aspects of performance depend on which aspects of knowledge, in what ways.
Effective ways to elicit the kinds of behavior we need to see.
Leverage Points for Statistics
Managing uncertainty with respect to the student model:
» Bayes nets (generalizing beyond familiar test-theory models; e.g., VanLehn)
» Modular construction of models
» Monte Carlo estimation
» Knowledge-based model construction with respect to the student model
Managing the stochastic relationship between observations in particular tasks and the persistent, unobservable student-model variables:
» Bayes nets
» Modular construction of models (including psychometric building blocks)
» Monte Carlo approximation
» Knowledge-based model construction: docking with the student model
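The Monte Carlo idea mentioned above can be sketched in a few lines: draw candidate values of the student-model variable from its prior, weight each draw by the likelihood of the observed responses, and form weighted summaries. The 2PL response model, item parameters, and response pattern below are illustrative assumptions.

```python
# Sketch of Monte Carlo (importance sampling) approximation of the
# posterior over an IRT ability variable. Model details invented.

import math
import random

def likelihood(theta, responses, items):
    """Probability of the observed 0/1 responses under a 2PL model."""
    lik = 1.0
    for x, (a, b) in zip(responses, items):
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        lik *= p if x == 1 else (1.0 - p)
    return lik

random.seed(0)
items = [(1.0, -0.5), (1.2, 0.0), (0.8, 0.7)]  # (discrimination, difficulty)
responses = [1, 1, 0]

# Use the N(0, 1) prior as the proposal distribution:
draws = [random.gauss(0.0, 1.0) for _ in range(20000)]
weights = [likelihood(t, responses, items) for t in draws]
posterior_mean = sum(w * t for w, t in zip(weights, draws)) / sum(weights)
print(round(posterior_mean, 2))  # Monte Carlo estimate of posterior mean ability
```

The same weighting scheme scales, at least conceptually, to the richer student models discussed here, where exact computation is impractical.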
Example a, continued: GRE-V
[Diagram: a sample Bayes net. A student-model fragment is docked with an evidence-model fragment (the IRT model and parameters for this item), yielding the observable Xj. Evidence-model Bayes net fragments for X1, X2, ..., Xn are drawn from a library.]
Example b, continued: HYDRIVE
[Diagram: a sample Bayes net fragment drawn from a library of fragments. The “Canopy Situation, no split possible” observables are linked to Use of Gauges, Serial Elimination, Canopy Knowledge, Hydraulics Knowledge, and Mechanical Knowledge.]
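A toy sketch of how docking an evidence-model fragment with the student model plays out computationally: the fragment supplies the conditional probability of an observable given a student-model variable, and conditioning on the observed outcome updates the marginal over that variable. The states and probabilities below are invented for illustration.

```python
# Toy "docking" of an evidence-model fragment with the student model:
# a Bayes-rule update of one student-model variable from one observable.
# All probabilities are illustrative assumptions.

def update(prior, cpt, observed):
    """Posterior over a student-model variable after one observation.
    prior: {state: P(state)};  cpt: {state: P(success | state)}."""
    joint = {s: prior[s] * (cpt[s] if observed else 1.0 - cpt[s])
             for s in prior}
    total = sum(joint.values())
    return {s: p / total for s, p in joint.items()}

# Current marginal over a knowledge variable before this task:
prior = {"expert": 0.3, "novice": 0.7}
# Evidence-model fragment for one observable in this situation:
cpt = {"expert": 0.9, "novice": 0.4}   # P(successful action | state)

posterior = update(prior, cpt, observed=True)
print({s: round(p, 2) for s, p in posterior.items()})
# {'expert': 0.49, 'novice': 0.51}
```

In a full system this update runs over network fragments rather than single variables, but the bookkeeping is the same: dock the fragment, condition on the observables, and absorb the result back into the student model.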
Leverage Points for Statistics
Extracting features and determining the values of observable variables:
» Bayes nets (also neural networks, rule-based logic)
» Modeling human raters for training, quality control, and efficiency
Leverage Points for Technology
Dynamic assembly of the student model.
Complex and realistic tasks that can produce direct evidence about knowledge used for production and interaction (stimulus material; work environment).
Automated extraction and evaluation of key features of complex work.
Construction and calculation to guide the acquisition of, and manage uncertainty about, our knowledge of the student.
Automated/assisted task construction, presentation, and management.
The Cloud behind the Silver Lining
These developments will have the most impact when assessments are built for well-defined purposes, and connected with a conception of knowledge in the targeted domain.
They will have much less impact for ‘drop-in-from-the-sky’ large-scale assessments like NAEP.