DRK-12 Diagnostic Assessment Panel, Prof. André A. Rupp, Dec 3, 2010 1
Opportunities and Challenges for Developing and Evaluating
Diagnostic Assessments in STEM Education:
- A Modern Psychometric Perspective -
André A. Rupp, EDMS Department, University of Maryland
Proposed Panel Definition
The term “diagnostic” comes from a combination of dia, to split apart, and gnosi, to
learn, or knowledge. We use “diagnostic assessment (system)” to refer to
assessment processes based on an explicit cognitive model, itself supported by
empirical study, of proficient reasoning in a particular domain.
The cognitive model must support delineation of students’ and / or teachers’
strengths and weaknesses that can be traced as they move from less to more
proficient reasoning in the domain. The principled assessment design process
should specify how observed behaviors are used to make inferences about what
students or teachers know as they progress. We believe that diagnostic assessment
has the potential to inform and assess the outcomes of instruction.
Conceptualization of Problem Space
from Stevens, Beal, & Sprang (2009)
The Evidence-centered Design Framework
adapted from Mislevy, Steinberg, Almond, & Lukas (2006)
Frameworks vs. Models
A “principled assessment design framework” for diagnostic assessment such as
evidence-centered design is NOT a “model”. It does NOT prescribe a particular
statistical modeling approach.
A “statistical / psychometric model” is a mathematical tool that plays a supporting
role for generating evidence-based narratives about students’ and / or teachers’
strengths and weaknesses. Its parameters do NOT have inherent meanings.
A “cognitive model” for diagnostic assessment is a theory and data-driven
description of how emergent understandings and misconceptions in a domain
develop and how these can be traced back to unobservable cognitive
underpinnings. It does NOT prescribe a singular assessment approach.
Traditional Construct Operationalization
[Figure: three Construct boxes (Theoretical Realm) paired with three Test Score boxes (Empirical Realm); each test score is shown with items I1, I2, ..., Ik]
Feedback Utility (Part I – Scoring Card)
Feedback Utility (Part II – Simple Progress Mapping)
[Figure: progress map; visible labels: Level 3, Level 4]
Complex Assessment Tasks for Diagnosis (Part I)
from Seeratan & Mislevy (2008)
Complex Assessment Tasks for Diagnosis (Example II)
from Behrens et al. (2009)
Evidence Identification, Aggregation, & Synthesis
from Stevens, Beal, & Sprang (2009)
Proficiency Pathways
from Stevens, Beal, & Sprang (2009)
Interventional Pathways
from Stevens, Beal, & Sprang (2009)
Selected Modeling Approaches for Diagnostic Assessments
Approaches Resulting in Continuous Proficiency Scales
1. Unidimensional explanatory IRT or FA models (e.g., de Boeck & Wilson, 2004)
2. Multidimensional CTT sumscores (e.g., Henson, Templin, & Douglas, 2007)
3. Multidimensional explanatory IRT or FA models (e.g., Reckase, 2009)
4. Structural equation models (e.g., Kline, 2010)
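As a rough illustration of the simplest entry in this list, multidimensional CTT sumscores, the sketch below sums scored item responses separately for each dimension. The function name, the two dimension labels, and the six-item response vector are hypothetical and not taken from any of the cited sources.

```python
from typing import Dict, List


def subscale_sumscores(responses: List[int], item_dims: List[str]) -> Dict[str, int]:
    """Sum scored item responses (0/1) separately for each dimension."""
    if len(responses) != len(item_dims):
        raise ValueError("responses and item_dims must have the same length")
    scores: Dict[str, int] = {}
    for score, dim in zip(responses, item_dims):
        scores[dim] = scores.get(dim, 0) + score
    return scores


# Hypothetical six-item test: three items per dimension.
item_dims = ["algebra", "algebra", "algebra", "geometry", "geometry", "geometry"]
responses = [1, 1, 0, 0, 1, 0]

profile = subscale_sumscores(responses, item_dims)
print(profile)
```

The result is one score per dimension rather than a single total, which is what makes the scale "multidimensional"; the reliability of such short subscores would still need to be checked separately.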
Approaches Resulting in Classifications of Respondents based on Discrete Scales
1. Bayesian inference networks (e.g., Almond, Williamson, Mislevy, & Yan, in press)
2. Parametric diagnostic classification models (e.g., Rupp, Templin, & Henson, 2010)
3. Non- / Semi-parametric classification approaches (e.g., Tatsuoka, 2009)
4. Adapted clustering algorithms (e.g., Nugent, Dean, & Ayers, 2010)
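To make the idea of classifying respondents on discrete scales concrete, here is a minimal sketch of one well-known parametric diagnostic classification model, the DINA model, with a uniform prior over attribute profiles. The choice of DINA, the Q-matrix, the slip and guess values, and the response pattern are all illustrative assumptions; the slides do not single out this model.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Profile = Tuple[int, ...]


def dina_posterior(
    responses: Sequence[int],
    q_matrix: Sequence[Sequence[int]],
    slip: Sequence[float],
    guess: Sequence[float],
) -> Dict[Profile, float]:
    """Posterior probability of each attribute profile under a uniform prior."""
    n_attrs = len(q_matrix[0])
    likelihoods: Dict[Profile, float] = {}
    for alpha in product((0, 1), repeat=n_attrs):
        like = 1.0
        for x, q_row, s, g in zip(responses, q_matrix, slip, guess):
            # The respondent "should" answer correctly only if every
            # attribute the item requires has been mastered.
            has_all = all(a >= q for a, q in zip(alpha, q_row))
            p_correct = (1.0 - s) if has_all else g
            like *= p_correct if x == 1 else (1.0 - p_correct)
        likelihoods[alpha] = like
    total = sum(likelihoods.values())
    return {alpha: like / total for alpha, like in likelihoods.items()}


# Hypothetical 3-item, 2-attribute test (values chosen for illustration only).
q_matrix: List[List[int]] = [[1, 0], [0, 1], [1, 1]]
slip = [0.1, 0.1, 0.1]
guess = [0.2, 0.2, 0.2]
responses = [1, 0, 0]

posterior = dina_posterior(responses, q_matrix, slip, guess)
best_profile = max(posterior, key=posterior.get)
print(best_profile, round(posterior[best_profile], 3))
```

The output is a probability for each mastery pattern rather than a point on a continuous scale, which is the distinction this slide draws between the two families of approaches.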
Psychometric Tools for Diagnostic Assessments
New frontiers of educational measurement
1. Educational data mining for simulation- / games-based assessment (e.g., Rupp et al., 2010; Soller & Stevens, 2007; West et al., 2009)
2. Diagnostic multiple-choice items / selected-response items (e.g., Briggs et al., 2006; de la Torre, 2009)
3. Computerized diagnostic adaptive assessment (e.g., Cheng, 2009; McGlohen & Chang, 2008)
Useful ideas from large-scale assessment
1. Modeling dependencies in nested response data (e.g., Jiao, von Davier, & Wang, 2010; Wainer, Bradlow, & Wang, 2007)
2. Item families / task variants & automatic test / form assembly (e.g., Embretson & Daniel, 2008; Geerlings, Glas, & van der Linden, in press)
3. Survey designs using multiple test forms / booklets (e.g., Frey, Hartig, & Rupp, 2009; Rutkowski, Gonzalez, Joncas, & von Davier, 2010)
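One way to picture a multiple-booklet survey design is a balanced incomplete block layout, in which every pair of item blocks appears together in exactly one booklet. The sketch below builds such a layout cyclically; the seven-block, three-blocks-per-booklet configuration and the function name are illustrative assumptions, not a design taken from the cited references.

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple


def cyclic_booklets(n_blocks: int = 7, base: Tuple[int, ...] = (0, 1, 3)) -> List[Tuple[int, ...]]:
    """Build booklets by shifting a base set of block indices modulo n_blocks."""
    return [tuple((b + shift) % n_blocks for b in base) for shift in range(n_blocks)]


booklets = cyclic_booklets()

# How often does each pair of blocks share a booklet?
pair_counts = Counter(
    pair for booklet in booklets for pair in combinations(sorted(booklet), 2)
)
# How often does each block appear overall?
block_counts = Counter(block for booklet in booklets for block in booklet)

for number, booklet in enumerate(booklets, start=1):
    print(f"Booklet {number}: blocks {booklet}")
```

With the base set (0, 1, 3) and seven blocks, each block appears in three booklets and each pair of blocks co-occurs once, so no respondent sees every block while the data still link all blocks to one another.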
Opportunities and Challenges for Developing and
Evaluating Diagnostic Assessments in STEM Education:
- A Modern Psychometric Perspective -
André A. Rupp
EDMS Department, University of Maryland
1230-A Benjamin Building
College Park, MD 20742
Phone: (301) 405 – 3623
E-mail: [email protected]
References (Part I)
Almond, R. G., Williamson, D. M., Mislevy, R. J., & Yan, D. (in press). Bayes nets in educational assessment. New York: Springer.
Beaton, A. E., & Allen, N. L. (1992). Interpreting scales through scale anchoring. Journal of Educational Statistics, 17, 191-204.
Borsboom, D., & Mellenbergh, G. J. (2007). Test validity in cognitive assessment. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 85–118). Cambridge, UK: Cambridge University Press.
Briggs, D. C., Alonzo, A. C., Schwab, C., & Wilson, M. (2006). Diagnostic assessment with ordered multiple-choice items. Educational Assessment, 11, 33-63.
Cheng, Y. (2009). When cognitive diagnosis meets computerized adaptive testing: CD-CAT. Psychometrika, 74, 619-632.
de Boeck, P., & Wilson, M. (2004). Explanatory item response theory models: A generalized linear and nonlinear approach. New York: Springer.
de la Torre, J. (2009). A cognitive diagnosis model for cognitively based multiple-choice options. Applied Psychological Measurement, 33, 163-183.
Embretson, S. E., & Daniel, R. C. (2008). Understanding and quantifying cognitive complexity level in mathematical problem-solving items. Psychology Science Quarterly, 50, 328-344.
Frey, A., Hartig, J., & Rupp, A. A. (2009). An NCME instructional module on booklet designs in large-scale assessments of student achievement. Educational Measurement: Issues and Practice, 28(3), 39-53.
Geerlings, H., Glas, C. A. W., & van der Linden, W. (in press). Modeling rule-based item generation. Psychometrika.
References (Part II)
Gomez, P. G., Noah, A., Schedl, M., Wright, C., & Yolkut, A. (2007). Proficiency descriptors based on a scale-anchoring study of the new TOEFL iBT reading test. Language Testing, 24, 417-444.
Haberman, S., & Sinharay, S. (2010). Reporting of subscores using multidimensional item response theory. Psychometrika, 75, 209-227.
Haberman, S., Sinharay, S., & Puhan, G. (2009). Reporting subscores for institutions. British Journal of Mathematical and Statistical Psychology, 62, 79-95.
Jiao, H., von Davier, M., & Wang, S. (2010, April). Polytomous mixture Rasch testlet model. Presented at the annual meeting of the National Council on Measurement in Education, Denver, CO.
Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). Portsmouth, NH: Greenwood.
Kline, R. (2010). Principles and practice of structural equation modeling (2nd ed.). New York: Guilford Press.
Leighton, J., & Gierl, M. (2007). Cognitive diagnostic assessment for education: Theory and applications. Cambridge, UK: Cambridge University Press.
McGlohen, M., & Chang, H.-H. (2008). Combining computer adaptive testing technology with cognitively diagnostic assessment. Behavior Research Methods, 40, 808-821.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741–749.
Mislevy, R. J., Steinberg, L. S., Almond, R. G., & Lukas, J. F. (2006). Concepts, terminology, and basic models of evidence-centered design. In D. M. Williamson, I. I. Bejar, & R. J. Mislevy (Eds.), Automated scoring of complex tasks in computer-based testing (pp. 15–48). Mahwah, NJ: Erlbaum.
References (Part III)
Nugent, R., Dean, N., & Ayers, B. (2010, July). Skill set profile clustering: The empty K-means algorithm with automatic specification of starting cluster centers. Presented at the International Educational Data Mining Conference, Pittsburgh, PA.
Reckase, M. (2009). Multidimensional item response theory. New York: Springer.
Rupp, A. A., Templin, J., & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. New York: Guilford Press.
Rupp, A. A., Gushta, M., Mislevy, R. J., & Shaffer, D. W. (2010). Evidence-centered design of epistemic games: Measurement principles for complex learning environments. Journal of Technology, Learning, & Assessment, 8(4). Available online at http://escholarship.bc.edu/jtla/vol8/4/
Rutkowski, L., Gonzalez, E., Joncas, M., & von Davier, M. (2010). International large-scale assessment data: Issues in secondary analysis and reporting. Educational Researcher, 39, 142-151.
Stevens, R., Beal, C., & Sprang, M. (2009, August). Developing versatile automated assessments of scientific problem-solving. Presented at the NSF conference on games- and simulation-based assessment, Washington, DC.
Tatsuoka, K. K. (2009). Cognitive assessment: An introduction to the rule-space method. Florence, KY: Routledge.
Templin, J., & Henson, R. (2009, April). Practical issues in using diagnostic estimates: Measuring the reliability and validity of diagnostic estimates. Presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA.
Wainer, H., Bradlow, E. T., & Wang, X. (2007). Testlet response theory and its applications. New York: Cambridge University Press.
West, P., Rutstein, D. W., Mislevy, R. J., Liu, J., Levy, R., DiCerbo, K. E., et al. (2009, June). A Bayes net approach to modeling learning progressions and task performances. Paper presented at the Learning Progressions in Science conference, Iowa City, IA.