DRK-12 Diagnostic Assessment Panel, Prof. André A. Rupp, Dec 3, 2010

Opportunities and Challenges for Developing and Evaluating

Diagnostic Assessments in STEM Education:

A Modern Psychometric Perspective

André A. Rupp, EDMS Department, University of Maryland

Toward a Definition of “Diagnostic Assessment Systems”


Proposed Panel Definition

The term “diagnostic” comes from a combination of dia, to split apart, and gnosis, to learn, or knowledge. We use “diagnostic assessment (system)” to refer to assessment processes based on an explicit cognitive model, itself supported by empirical study, of proficient reasoning in a particular domain.

The cognitive model must support delineation of students’ and/or teachers’ strengths and weaknesses that can be traced as they move from less to more proficient reasoning in the domain. The principled assessment design process should specify how observed behaviors are used to make inferences about what students or teachers know as they progress. We believe that diagnostic assessment has the potential both to inform instruction and to assess its outcomes.


Conceptualization of Problem Space

from Stevens, Beal, & Sprang (2009)

Toward an Understanding of Frameworks & Models


The Evidence-centered Design Framework

adapted from Mislevy, Steinberg, Almond, & Lukas (2006)


Frameworks vs. Models

A “principled assessment design framework” for diagnostic assessment, such as evidence-centered design, is NOT a “model”. It does NOT prescribe a particular statistical modeling approach.

A “statistical / psychometric model” is a mathematical tool that plays a supporting role in generating evidence-based narratives about students’ and/or teachers’ strengths and weaknesses. Its parameters do NOT have inherent meanings.

A “cognitive model” for diagnostic assessment is a theory- and data-driven description of how emergent understandings and misconceptions in a domain develop and how these can be traced back to unobservable cognitive underpinnings. It does NOT prescribe a singular assessment approach.

Evidence-based Reasoning for “Traditional” Assessments


[Figure: Traditional Construct Operationalization. Three panels, each linking a Construct (theoretical realm) to a Test Score based on items I1, I2, …, Ik (empirical realm).]
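In the traditional operationalization, a single test score, typically the total over items I1 through Ik, stands in for the construct. A minimal sketch, assuming hypothetical dichotomous item scores and student labels, shows why this limits diagnostic feedback: three different response patterns produce the same total.

```python
from typing import Dict, List

# Hypothetical dichotomous item scores (1 = correct, 0 = incorrect) on items I1..I6.
responses: Dict[str, List[int]] = {
    "student_A": [1, 1, 1, 0, 0, 0],
    "student_B": [0, 0, 0, 1, 1, 1],
    "student_C": [1, 0, 1, 0, 1, 0],
}


def sum_score(item_scores: List[int]) -> int:
    """Traditional operationalization: the test score is the unweighted item total."""
    return sum(item_scores)


scores = {name: sum_score(pattern) for name, pattern in responses.items()}
print(scores)  # {'student_A': 3, 'student_B': 3, 'student_C': 3}

# Identical totals, yet the response patterns differ: the total alone
# does not reveal which items (and hence which skills) caused difficulty.
distinct_patterns = {tuple(pattern) for pattern in responses.values()}
print(len(distinct_patterns))  # 3
```

The equal totals are the point of the example: any diagnostic narrative has to come from modeling the pattern, not the sum.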


Feedback Utility (Part I – Scoring Card)


Feedback Utility (Part II – Simple Progress Mapping)

[Figure: simple progress map; recoverable labels include Level 3 and Level 4.]
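A simple progress map reports which of several ordered levels a student's score falls into. The sketch below assumes hypothetical cut scores on a hypothetical 0-40 scale; the level descriptions that give such a map its feedback value could come from procedures such as scale anchoring (Beaton & Allen, 1992; Gomez et al., 2007).

```python
from bisect import bisect_right

# Hypothetical cut scores on a hypothetical 0-40 scale: a score at or above a cut
# reaches that level. In practice cuts would typically come from standard setting.
CUT_SCORES = [10, 20, 30]  # lower bounds of Levels 2, 3, and 4
LEVELS = ["Level 1", "Level 2", "Level 3", "Level 4"]


def progress_level(score: float) -> str:
    """Map a score to an ordered proficiency level on a simple progress map."""
    # bisect_right counts how many cut scores are <= score, which is the level index.
    return LEVELS[bisect_right(CUT_SCORES, score)]


print([progress_level(s) for s in (8, 10, 25, 34)])
# ['Level 1', 'Level 2', 'Level 3', 'Level 4']
```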

Evidence-based Reasoning for “Modern” Assessments


Complex Assessment Tasks for Diagnosis (Example I)

from Seeratan & Mislevy (2008)


Complex Assessment Tasks for Diagnosis (Example II)

from Behrens et al. (2009)


Evidence Identification, Aggregation, & Synthesis
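In evidence-centered design, evidence identification turns a student's work product into scored observable variables, and evidence aggregation combines those observables across tasks into statements about student-model variables. The following is a toy sketch under loud assumptions: the action names, evidence rules, observable-to-skill links, and the use of a simple proportion as the aggregation step are all hypothetical. An operational system would typically replace the proportion with a measurement model such as a Bayes net or a diagnostic classification model.

```python
from typing import Callable, Dict, List

# Hypothetical work product: the ordered actions a student took in a simulation task.
WorkProduct = List[str]

# Evidence identification: hypothetical rules that score a work product into
# observable variables (1 = behavior present, 0 = absent).
EVIDENCE_RULES: Dict[str, Callable[[WorkProduct], int]] = {
    "formed_hypothesis": lambda wp: int("state_hypothesis" in wp),
    "controlled_variables": lambda wp: int("hold_variable_constant" in wp),
    "tested_before_concluding": lambda wp: int(
        "run_trial" in wp
        and "draw_conclusion" in wp
        and wp.index("run_trial") < wp.index("draw_conclusion")
    ),
}

# Hypothetical links from each observable to the skill it provides evidence about.
OBSERVABLE_TO_SKILL: Dict[str, str] = {
    "formed_hypothesis": "inquiry_planning",
    "controlled_variables": "experimental_design",
    "tested_before_concluding": "experimental_design",
}


def identify_evidence(work_product: WorkProduct) -> Dict[str, int]:
    """Apply every evidence rule to one work product."""
    return {name: rule(work_product) for name, rule in EVIDENCE_RULES.items()}


def aggregate_evidence(observables_per_task: List[Dict[str, int]]) -> Dict[str, float]:
    """Aggregate observables across tasks; a plain proportion stands in for a measurement model."""
    by_skill: Dict[str, List[int]] = {}
    for observables in observables_per_task:
        for name, value in observables.items():
            by_skill.setdefault(OBSERVABLE_TO_SKILL[name], []).append(value)
    return {skill: sum(values) / len(values) for skill, values in by_skill.items()}


task_1 = ["state_hypothesis", "run_trial", "draw_conclusion"]
task_2 = ["hold_variable_constant", "run_trial", "draw_conclusion"]
observables = [identify_evidence(t) for t in (task_1, task_2)]
print(aggregate_evidence(observables))
# {'inquiry_planning': 0.5, 'experimental_design': 0.75}
```

Keeping the rules separate from the aggregation step mirrors the framework/model distinction: the same observables could feed a different statistical model without changing the rules.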



Proficiency Pathways



Interventional Pathways


Selected Statistical Tools for Evidence-based Reasoning


Selected Modeling Approaches for Diagnostic Assessments

Approaches Resulting in Continuous Proficiency Scales

1. Unidimensional explanatory IRT or FA models (e.g., de Boeck & Wilson, 2004)

2. Multidimensional CTT sumscores (e.g., Henson, Templin, & Douglas, 2007)

3. Multidimensional explanatory IRT or FA models (e.g., Reckase, 2009)

4. Structural equation models (e.g., Kline, 2010)
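As one illustration of the explanatory IRT idea in the list above, the linear logistic test model (LLTM) treats each Rasch item difficulty as a weighted sum of coded item features, so that the features, rather than individual items, carry the explanation. The feature codes, weights, and proficiency value below are hypothetical; in practice the weights would be estimated from response data.

```python
import math
from typing import List, Sequence


def lltm_difficulty(features: Sequence[float], weights: Sequence[float]) -> float:
    """LLTM item difficulty: weighted sum of the item's feature codes."""
    return sum(q * eta for q, eta in zip(features, weights))


def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability of a correct response under the Rasch model (logit scale)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


# Hypothetical item-feature codes; columns might code, e.g., number of solution
# steps, presence of a fraction, and presence of a supporting diagram.
FEATURES: List[List[float]] = [
    [1, 0, 0],
    [2, 1, 0],
    [3, 1, 1],
]
WEIGHTS = [0.5, 0.8, -0.3]  # hypothetical feature effects on item difficulty

difficulties = [lltm_difficulty(row, WEIGHTS) for row in FEATURES]
theta = 1.0  # a hypothetical respondent's proficiency
for b in difficulties:
    print(f"difficulty={b:.2f}  P(correct)={rasch_probability(theta, b):.3f}")
```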

Approaches Resulting in Classifications of Respondents based on Discrete Scales

1. Bayesian inference networks (e.g., Almond, Williamson, Mislevy, & Yan, in press)

2. Parametric diagnostic classification models (e.g., Rupp, Templin, & Henson, 2010)

3. Non-/semi-parametric classification approaches (e.g., Tatsuoka, 2009)

4. Adapted clustering algorithms (e.g., Nugent, Dean, & Ayers, 2010)
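To show how a parametric diagnostic classification model yields classifications rather than a scale score, here is a sketch using the DINA (deterministic inputs, noisy “and” gate) model, one of the models treated by Rupp, Templin, and Henson (2010). It assumes a known Q-matrix, known slip and guessing parameters, and a uniform prior over attribute profiles, all hypothetical; operationally these would be estimated, for example with an EM or MCMC algorithm.

```python
import itertools
from typing import Dict, List, Sequence, Tuple

Profile = Tuple[int, ...]


def dina_p_correct(profile: Profile, q_row: Sequence[int], slip: float, guess: float) -> float:
    """DINA: 1 - slip if every attribute the item requires is mastered, otherwise guess."""
    has_all_required = all(a >= q for a, q in zip(profile, q_row))
    return 1.0 - slip if has_all_required else guess


def profile_posterior(
    responses: Sequence[int],
    q_matrix: Sequence[Sequence[int]],
    slips: Sequence[float],
    guesses: Sequence[float],
) -> Dict[Profile, float]:
    """Posterior over all 2^K attribute profiles under a uniform prior."""
    n_attributes = len(q_matrix[0])
    weights: Dict[Profile, float] = {}
    for profile in itertools.product((0, 1), repeat=n_attributes):
        likelihood = 1.0
        for x, q_row, s, g in zip(responses, q_matrix, slips, guesses):
            p = dina_p_correct(profile, q_row, s, g)
            likelihood *= p if x == 1 else 1.0 - p
        weights[profile] = likelihood  # uniform prior: posterior is proportional to likelihood
    total = sum(weights.values())
    return {profile: w / total for profile, w in weights.items()}


def attribute_mastery(posterior: Dict[Profile, float]) -> List[float]:
    """Marginal posterior probability of mastery for each attribute."""
    n_attributes = len(next(iter(posterior)))
    return [
        sum(p for profile, p in posterior.items() if profile[k] == 1)
        for k in range(n_attributes)
    ]


# Hypothetical three-item, two-attribute example.
Q = [[1, 0], [0, 1], [1, 1]]
SLIPS = [0.1, 0.1, 0.1]
GUESSES = [0.2, 0.2, 0.2]
posterior = profile_posterior([1, 0, 0], Q, SLIPS, GUESSES)
print(max(posterior, key=posterior.get))  # (1, 0)
print([round(m, 3) for m in attribute_mastery(posterior)])  # [0.802, 0.034]
```

The output is a classification (most likely profile) plus attribute-level mastery probabilities, which is the kind of result the discrete-scale approaches in the list produce.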


Psychometric Tools for Diagnostic Assessments

New frontiers of educational measurement

1. Educational data mining for simulation- / games-based assessment (e.g., Rupp et al., 2010; Soller & Stevens, 2007; West et al., 2009)

2. Diagnostic multiple-choice items / selected-response items (e.g., Briggs et al., 2006; de la Torre, 2009)

3. Computerized diagnostic adaptive assessment (e.g., Cheng, 2009; McGlohen & Chang, 2008)
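Computerized diagnostic adaptive assessment chooses each next item according to how much it is expected to sharpen the current classification. Cheng (2009) studies Kullback-Leibler-based selection indices; the sketch below instead uses a simpler expected Shannon entropy criterion, which is a different rule. It assumes a DINA measurement model and a hypothetical three-item bank with known parameters.

```python
import itertools
import math
from typing import Dict, List, Sequence, Set, Tuple

Profile = Tuple[int, ...]
Item = Tuple[Sequence[int], float, float]  # (Q-matrix row, slip, guess)


def p_correct(profile: Profile, item: Item) -> float:
    """DINA response probability (assumed measurement model)."""
    q_row, slip, guess = item
    return 1.0 - slip if all(a >= q for a, q in zip(profile, q_row)) else guess


def entropy(dist: Dict[Profile, float]) -> float:
    """Shannon entropy (in nats) of a distribution over attribute profiles."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def update(posterior: Dict[Profile, float], item: Item, x: int) -> Dict[Profile, float]:
    """Bayes update of the profile posterior after observing response x to one item."""
    new = {}
    for profile, prob in posterior.items():
        p = p_correct(profile, item)
        new[profile] = prob * (p if x == 1 else 1.0 - p)
    total = sum(new.values())
    return {profile: w / total for profile, w in new.items()}


def expected_entropy(posterior: Dict[Profile, float], item: Item) -> float:
    """Posterior entropy expected after administering the item, averaged over both responses."""
    p1 = sum(prob * p_correct(profile, item) for profile, prob in posterior.items())
    return p1 * entropy(update(posterior, item, 1)) + (1.0 - p1) * entropy(update(posterior, item, 0))


def select_next_item(posterior: Dict[Profile, float], bank: List[Item], administered: Set[int]) -> int:
    """Pick the not-yet-administered item with the smallest expected posterior entropy."""
    candidates = [j for j in range(len(bank)) if j not in administered]
    return min(candidates, key=lambda j: expected_entropy(posterior, bank[j]))


# Hypothetical item bank for two attributes; item 1 is deliberately noisy.
BANK: List[Item] = [([1, 0], 0.1, 0.2), ([0, 1], 0.3, 0.3), ([1, 1], 0.1, 0.2)]
uniform = {profile: 0.25 for profile in itertools.product((0, 1), repeat=2)}

first = select_next_item(uniform, BANK, administered=set())
print(first)  # 0
posterior = update(uniform, BANK[first], x=1)  # suppose the response is correct
print(select_next_item(posterior, BANK, administered={first}))  # 2
```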

Useful ideas from large-scale assessment

1. Modeling dependencies in nested response data (e.g., Jiao, von Davier, & Wang, 2010; Wainer, Bradlow, & Wang, 2007)

2. Item families / task variants & automatic test / form assembly (e.g., Embretson & Daniel, 2008; Geerlings, Glas, & van der Linden, in press)

3. Survey designs using multiple test forms / booklets (e.g., Frey, Hartig, & Rupp, 2009; Rutkowski, Gonzalez, Joncas, & von Davier, 2010)
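Booklet designs such as those discussed by Frey, Hartig, and Rupp (2009) spread a large pool of item blocks over many shorter booklets. One classical choice is a balanced incomplete block (BIB) design, in which every pair of blocks appears together in the same number of booklets. The sketch assumes a hypothetical pool of 7 blocks with 3 blocks per booklet and builds the design by cyclically shifting the base set {0, 1, 3} modulo 7. Because the pairwise differences of this base set cover every nonzero residue exactly once, each pair of blocks shares exactly one booklet, and each block also appears once in every booklet position.

```python
from collections import Counter
from itertools import combinations
from typing import List


def cyclic_booklet_design(n_blocks: int, base: List[int]) -> List[List[int]]:
    """Booklet s contains the base blocks shifted by s (mod n_blocks), in base order."""
    return [[(b + shift) % n_blocks for b in base] for shift in range(n_blocks)]


def pair_counts(booklets: List[List[int]]) -> Counter:
    """Count how many booklets each unordered pair of blocks shares."""
    return Counter(pair for booklet in booklets for pair in combinations(sorted(booklet), 2))


booklets = cyclic_booklet_design(7, [0, 1, 3])
for number, booklet in enumerate(booklets, start=1):
    print(f"Booklet {number}: blocks {booklet}")

counts = pair_counts(booklets)
print(len(counts), set(counts.values()))  # 21 {1}
```

Leaving the blocks in base order rather than sorting them keeps the position balance, which helps separate block effects from position effects.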



André A. Rupp

EDMS Department, University of Maryland

1230-A Benjamin Building

College Park, MD 20742

Phone: (301) 405-3623

E-mail: [email protected]


References (Part I)

Almond, R. G., Williamson, D. M., Mislevy, R. J., & Yan, D. (in press). Bayes nets in educational assessment. New York: Springer.

Beaton, A. E., & Allen, N. L. (1992). Interpreting scales through scale anchoring. Journal of Educational Statistics, 17, 191-204.

Borsboom, D., & Mellenbergh, G. J. (2007). Test validity in cognitive assessment. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 85–118). Cambridge, UK: Cambridge University Press.

Briggs, D. C., Alonzo, A. C., Schwab, C., & Wilson, M. (2006). Diagnostic assessment with ordered multiple-choice items. Educational Assessment, 11, 33-63.

Cheng, Y. (2009). When cognitive diagnosis meets computerized adaptive testing: CD-CAT. Psychometrika, 74, 619-632.

de Boeck, P., & Wilson, M. (2004). Explanatory item response theory models: A generalized linear and nonlinear approach. New York: Springer.

de la Torre, J. (2009). A cognitive diagnosis model for cognitively based multiple-choice options. Applied Psychological Measurement, 33, 163-183.

Embretson, S. E., & Daniel, R. C. (2008). Understanding and quantifying cognitive complexity level in mathematical problem-solving items. Psychology Science Quarterly, 50, 328-344.

Frey, A., Hartig, J., & Rupp, A. A. (2009). An NCME instructional module on booklet designs in large-scale assessments of student achievement. Educational Measurement: Issues and Practice, 28(3), 39-53.

Geerlings, H., Glas, C. A. W., & van der Linden, W. J. (in press). Modeling rule-based item generation. Psychometrika.


References (Part II)

Gomez, P. G., Noah, A., Schedl, M., Wright, C., & Yolkut, A. (2007). Proficiency descriptors based on a scale-anchoring study of the new TOEFL iBT reading test. Language Testing, 24, 417-444.

Haberman, S., & Sinharay, S. (2010). Reporting of subscores using multidimensional item response theory. Psychometrika, 75, 209-227.

Haberman, S., Sinharay, S., & Puhan, G. (2009). Reporting subscores for institutions. British Journal of Mathematical and Statistical Psychology, 62, 79-95.

Jiao, H., von Davier, M., & Wang, S. (2010, April). Polytomous mixture Rasch testlet model. Presented at the annual meeting of the National Council on Measurement in Education, Denver, CO.

Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). Portsmouth, NH: Greenwood.

Kline, R. (2010). Principles and practice of structural equation modeling (2nd ed.). New York: Guilford Press.

Leighton, J., & Gierl, M. (2007). Cognitive diagnostic assessment for education: Theory and applications. Cambridge, UK: Cambridge University Press.

McGlohen, M., & Chang, H.-H. (2008). Combining computer adaptive testing technology with cognitively diagnostic assessment. Behavior Research Methods, 40, 808-821.

Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741–749.

Mislevy, R. J., Steinberg, L. S., Almond, R. G., & Lukas, J. F. (2006). Concepts, terminology, and basic models of evidence-centered design. In D. M. Williamson, I. I. Bejar, & R. J. Mislevy (Eds.), Automated scoring of complex tasks in computer-based testing (pp. 15–48). Mahwah, NJ: Erlbaum.


References (Part III)

Nugent, R., Dean, N., & Ayers, B. (2010, July). Skill set profile clustering: The empty K-means algorithm with automatic specification of starting cluster centers. Presented at the International Educational Data Mining Conference, Pittsburgh, PA.

Reckase, M. (2009). Multidimensional item response theory. New York: Springer.

Rupp, A. A., Templin, J., & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. New York: Guilford Press.

Rupp, A. A., Gushta, M., Mislevy, R. J., & Shaffer, D. W. (2010). Evidence-centered design of epistemic games: Measurement principles for complex learning environments. Journal of Technology, Learning, & Assessment, 8(4). Available online at http://escholarship.bc.edu/jtla/vol8/4/

Rutkowski, L., Gonzalez, E., Joncas, M., & von Davier, M. (2010). International large-scale assessment data: Issues in secondary analysis and reporting. Educational Researcher, 39, 142-151.

Stevens, R., Beal, C., & Sprang, M. (2009, August). Developing versatile automated assessments of scientific problem-solving. Presented at the NSF conference on games- and simulation-based assessment, Washington, DC.

Tatsuoka, K. K. (2009). Cognitive assessment: An introduction to the rule-space method. Florence, KY: Routledge.

Templin, J., & Henson, R. (2009, April). Practical issues in using diagnostic estimates: Measuring the reliability and validity of diagnostic estimates. Presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA.

Wainer, H., Bradlow, E. T., & Wang, X. (2007). Testlet response theory and its applications. New York: Cambridge University Press.

West, P., Rutstein, D. W., Mislevy, R. J., Liu, J., Levy, R., DiCerbo, K. E., et al. (2009, June). A Bayes net approach to modeling learning progressions and task performances. Paper presented at the Learning Progressions in Science conference, Iowa City, IA.
