Overview
• The need for cognitively-based assessment.
• Defining cognitive models and their properties.
• Tools for cognitively-based assessment design and analysis.
• Empirical research on cognitive approaches to test design and validation.
The Purpose of Educational Assessment
• New purposes for testing have raised questions about the appropriateness of current standardized tests.

"Such assessments shall produce individual student interpretive, descriptive, and diagnostic reports…that allow parents, teachers, and principals to understand and address the specific academic needs of students, and include information regarding achievement on academic assessments aligned with State academic achievement standards, and that are provided to parents, teachers, and principals as soon as is practicably possible after the assessment is given, in an understandable and uniform format, and to the extent practicable, in a language that parents can understand."
(NCLB, Title I, Part A, Subpart 1, Sec. 1111(b)(3)(C)(xii), 2001)
Implications of New Assessment Needs
• "[A]ll assessments will be more fruitful when based on an understanding of cognition in the domain and on the precept of reasoning with evidence" (NRC, 2001, p. 178).
• Increase our understanding of the claims that we want to make about students, instruction, programs, and policy.
  – Detailed description of the skills, abilities, and constructs we are measuring.
• Increase our understanding of the evidence provided by student responses to test questions.
  – Detailed description of item difficulty, discrimination, and other statistical and psychological properties.
Cognitively Based Assessment Design
• If we begin with a complete understanding of the skill, domain, competency, or ability we want to measure, we can be more principled in designing the tools we use to measure it.
• Further, if we understand our measurement tools and the scores they generate more fully, we can evaluate their quality relative to our measurement goals.
Cognitive Models and Grain Size
• What is a complete understanding of the construct? Of the test?
• Types of models:
  – Model of test specifications (content standards).
  – Model of task performance (cognitive processes).
• Alignment of the test with standards or test specifications may not be sufficient to yield the desired information.
• The model and the alignment must be made at the appropriate level of inference.
General Model of Cognitively-Based Assessment Design
[Diagram: a construct / latent trait decomposes into processes A–D, which map onto features A–D of an item / test question.]
Potential Cognitive Models
• Information processing models.
  – Construction-integration theory of text processing: propositional representation of text; cyclic construction of a representation and integration of new information.
  – Activation theory: activation of information is influenced by various factors; the most highly activated information will be selected.
• Evaluate the "completeness" of these models in terms of the target inferences to be made about students.
Steps in Assessment Design (Construct-Based Model)
1. Generate a cognitive model of the construct.
2. Generate items linked to the cognitive model.
3. Derive person-level information from item responses.
Tools for Cognitively-Based Assessment Development
Design Frameworks
• Mislevy's Evidence Centered Design: a multi-level model of assessment.
• Embretson's Cognitive Design System: a process-based approach.
• Bennett and Bejar's Generative Approach: a structural approach.
Cognitively-Based Assessment Design (Task-Based Model)
1. Generate a cognitive model of the items.
2. Link item features to the cognitive model.
3. Derive person-level information from item responses.
Purpose of Item Difficulty Modeling (IDM)
• Extend available student and item information beyond a single statistical parameter (see the sketch after this list).
• Substantive information can be useful for:
  – Verifying the construct definition (construct validity).
  – Creating new items (automatic/algorithmic item generation).
  – Providing diagnostic information (score reporting).
  – Understanding group differences (DIF).
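In its simplest form, IDM regresses calibrated item difficulties on coded item features. Below is a minimal sketch of that idea; the feature names, feature values, and b-parameters are invented for illustration, and statsmodels is assumed for the fit.

```python
# A minimal IDM sketch: regress calibrated item difficulties on hand-coded
# cognitive features. All data below are hypothetical placeholders.
import numpy as np
import statsmodels.api as sm

# Hypothetical coded features for 6 items:
# [vocabulary level, propositional density, falsification demand]
X = np.array([
    [3.2, 0.41, 1],
    [5.1, 0.62, 0],
    [4.0, 0.55, 1],
    [6.3, 0.71, 1],
    [2.8, 0.38, 0],
    [5.6, 0.66, 0],
])
# Calibrated item difficulties (e.g., Rasch b-parameters) for the same items.
b = np.array([-0.8, 0.3, -0.1, 1.2, -1.1, 0.6])

model = sm.OLS(b, sm.add_constant(X)).fit()
print(model.params)    # feature weights: each feature's contribution to difficulty
print(model.rsquared)  # how much difficulty variance the cognitive model explains
```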
Tools for Cognitively-Based Assessment Development
• Design frameworks for test construction: Evidence Centered Design, Cognitive Design System, Generative Approach.
• Psychometric models for IDM: Tree-Based Regression Approach, Rule Space Methodology, Attribute Hierarchy Method, Fusion Model, Linear Logistic Latent Trait Model (LLTM).
Traditional Item-Construct Definition
• Reading comprehension questions measure the ability to read with understanding, insight, and discrimination.
• This type of question explores the ability to analyze a written passage from several perspectives, including the ability to recognize explicitly stated elements in the passage, assumptions underlying statements or arguments in the passage, and the implications of those statements or arguments.
Model of Test Specifications (Standards and Objectives)
Strand 2: Comprehending Literary Text identifies the comprehension strategies that are specific to the study of a variety of literature.
Concept 1: Elements of Literature
Identify, analyze, and apply knowledge of the structures and elements of literature.
PO 1. Identify the plot of a literary selection, heard or read.
PO 2. Describe characters (e.g., traits, roles, similarities) within a literary selection, heard or read.
PO 3. Sequence a series of events in a literary selection, heard or read.
PO 4. Determine whether a literary selection, heard or read, is realistic or fantasy.
PO 5. Participate (e.g., clapping, chanting, choral reading) in the reading of poetry by responding to the rhyme and rhythm.
Concept 2: Historical and Cultural Aspects of Literature
Recognize and apply knowledge of the historical and cultural aspects of American, British, and world literature.
PO 1. Compare events, characters and conflicts in literary selections from a variety of cultures to their experiences.
Cognitive Model Development
• Cognitive theory: generate a list of relevant processing components from theory.
• Correlational studies: establish a statistical relationship between the features and the item properties.
• Experimental manipulations: context/format; item design.
• Process tracing: use process-tracing methods, such as verbal protocols ("think-alouds") and eye-tracking data, to identify additional processing influences.
Attribute/Skill List (operationalized as a Q-matrix in the sketch after this list)
Encoding Process Skills:
• EP1: Encoding propositionally dense text.
• EP2: Encoding propositionally sparse text.
• EP3: Encoding high-level vocabulary.
• EP4: Encoding low-level vocabulary.
Decision Process Skills:
• DP1: Synthesizing large sections of text into a single answer.
• DP2: Confirming the correct answer from direct information in the text.
• DP3: Falsifying incorrect answers from direct information in the text.
• DP4: Confirming the correct answer by inference from the text.
• DP5: Falsifying incorrect answers by inference from the text.
• DP6: Encoding correct answers with high vocabulary.
• DP7: Encoding incorrect answers with high vocabulary.
• DP8: Mapping correct answers to verbatim text.
• DP9: Mapping correct answers to paraphrased text.
• DP10: Mapping correct answers to reordered verbatim text.
• DP11: Mapping correct answers to reordered paraphrased text.
• DP12: Mapping incorrect answers to verbatim text.
• DP13: Mapping incorrect answers to paraphrased text.
• DP14: Mapping incorrect answers to reordered verbatim text.
• DP15: Mapping incorrect answers to reordered paraphrased text.
• DP16: Locating relevant information early in the text.
• DP17: Locating relevant information in the middle of the text.
• DP18: Locating information at the end of the text.
• DP19: Using additional falsification skills for specially formatted items.
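Attribute lists like this are typically recorded as a Q-matrix (items by attributes), which is the input to methods such as the Rule Space Methodology and the Attribute Hierarchy Method. A minimal sketch follows; the item codings are invented for illustration, not an actual coding of GRE items.

```python
# A minimal Q-matrix sketch: rows are items, columns are the attributes above
# (EP1..EP4, DP1..DP19). Entries mark which skills an item demands.
import numpy as np

attributes = [f"EP{i}" for i in range(1, 5)] + [f"DP{i}" for i in range(1, 20)]
n_items = 3
Q = np.zeros((n_items, len(attributes)), dtype=int)

# Hypothetical codings: item 1 requires dense-text encoding (EP1) and
# verbatim mapping of the correct answer (DP8), and so on.
Q[0, attributes.index("EP1")] = 1
Q[0, attributes.index("DP8")] = 1
Q[1, attributes.index("EP3")] = 1
Q[1, attributes.index("DP4")] = 1
Q[2, attributes.index("EP2")] = 1
Q[2, attributes.index("DP19")] = 1

# Read each item's required skills off the Q-matrix.
for i in range(n_items):
    required = [a for a, q in zip(attributes, Q[i]) if q]
    print(f"Item {i + 1}: {required}")
```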
Cognitive Model (Skill/Subskill Model)
(finer grain size)
Cognitive Model (Information Processing Model)
[Diagram: encoding processes (construction) and coherence processes (integration) build the text representation; text mapping and truth-status evaluation then feed the response decision.]
Cognitive Variables
• Modifier propositional density
• Predicate propositional density
• Text content vocabulary level
• Percent content words
• Percent of relevant text
• Falsification
• Confirmation
• Vocabulary level of the distractors
• Vocabulary level of the correct response
• Reasoning of the distractors
• Reasoning of the correct response
• Location of relevant information in text
• Length of passage
• Special item format
Full Item Difficulty Model
[Diagram: the information-processing stages annotated with the item features that drive difficulty at each stage.]
• Encoding (construction) and coherence processes (integration): vocabulary level, sentence length, propositional density, argument structure, text length.
• Text representation and text mapping: vocabulary level, sentence length, semantic overlap, level of question.
• Evaluate truth status and response decision: vocabulary level of key and distractors, falsifiability of distractors, confirmation of the key.
Activation Model of Item Difficulty

Key (correct option):
                       Item 1      Item 2       Item 3
  Location             Early       Delayed      Delayed
  Correspondence       Verbatim    Paraphrase   Paraphrase
  Elaboration          Strong      Strong       None
  Resulting activation High        Moderate     Low

Distractor:
                       Item 1      Item 2       Item 3
  Location             Delayed     Delayed      Early
  Correspondence       Paraphrased Paraphrased  Verbatim
  Elaboration          None        None         Strong
  Resulting activation Low         Low          High

Expected difficulty:   Easy        Medium       Hard
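The table's logic can be made concrete with a toy scoring rule: count the activation-raising features (early location, verbatim correspondence, strong elaboration) for the key and for the strongest distractor, and predict difficulty from their contrast. This is an illustrative simplification of activation theory, not a calibrated model.

```python
# Toy scoring of the activation model: map an option's feature profile to a
# crude activation level. The rule is an illustrative simplification.
def activation(location: str, correspondence: str, elaboration: str) -> int:
    score = 0
    score += 1 if location == "early" else 0
    score += 1 if correspondence == "verbatim" else 0
    score += 1 if elaboration == "strong" else 0
    return score  # 0..3; higher = more highly activated

# Item 3 from the table: key is delayed/paraphrase/no elaboration,
# distractor is early/verbatim/strong elaboration.
key = activation("delayed", "paraphrase", "none")        # low activation
distractor = activation("early", "verbatim", "strong")   # high activation
print("expect a hard item" if distractor > key else "expect an easier item")
```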
Regression Model of GRE Items

Variable                                    B        SE(B)    β       t        p
Text Encoding (TE)
  Modifier Propositional Density            6.121    3.698    .298    1.655    .100
  Predicate Propositional Density           4.728    2.656    .190    1.780    .077
  Text Content Vocabulary Level              .643     .511    .092    1.257    .210
  Percent Content Words                    -1.955    1.938   -.217   -1.009    .314
Decision Processing (DP)
  Percent Relevant Text                      -.14      .25    -.04    -.57     .57
  Confirmation                                .08      .08     .07     .96     .34
  Falsification                               .37      .25     .10    1.48     .14
  Vocabulary Level – Correct                  .08      .02     .25    3.38     <.01
  Vocabulary Level – Distractors             -.05      .03    -.11   -1.53     .13
  Reasoning – Correct                         .62      .11     .39    5.77     <.01
  Reasoning – Distractors                     .14      .10     .09    1.33     .18
  Location of Relevant Information           -.05      .08    -.05    -.64     .52
GRE-V Specific Variables
  Special Item Format – Line Citation        -.08      .10    -.05    -.85     .40
  Special Item Format – Roman Numeral Item   -.38      .10    -.27   -4.00     <.01
  Length of Passage                          -.18      .08    -.16   -2.24     .03
Contribution of Processing Factors to Item Difficulty

                                                    Change Statistics
Model                             R      Adj. R²    ΔR²     F        df1    df2    Sig.
TE                                .15    .00        .02     1.18     4      195    .32
TE + DP                           .57    .28        .30     10.20    8      187    <.01
TE + DP + GRE-Specific Factors    .62    .34        .07     6.77     3      184    <.01
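The table reflects a standard hierarchical regression: enter the TE block, add the DP block, and test the R-squared change with an F test. Below is a minimal sketch of that computation on simulated placeholder data (n = 200 items, 4 TE and 8 DP features, matching the degrees of freedom above); it is not the original analysis.

```python
# Hierarchical regression sketch: TE block first, then TE + DP, with an
# F test on the R-squared change. Data are simulated placeholders.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
X_te = rng.normal(size=(n, 4))   # 4 text-encoding features
X_dp = rng.normal(size=(n, 8))   # 8 decision-processing features
b = X_dp @ rng.normal(size=8) * 0.3 + rng.normal(size=n)  # item difficulties

m1 = sm.OLS(b, sm.add_constant(X_te)).fit()
m2 = sm.OLS(b, sm.add_constant(np.hstack([X_te, X_dp]))).fit()

# F test for the change in R-squared when the DP block is added.
r2_change = m2.rsquared - m1.rsquared
df1, df2 = 8, n - m2.df_model - 1
f_change = (r2_change / df1) / ((1 - m2.rsquared) / df2)
print(f"R2 change = {r2_change:.2f}, F({df1}, {df2:.0f}) = {f_change:.2f}")
```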
Implications for Score Meaning
• Variables related to both text encoding and decision processes were significant predictors in models of GRE-V item difficulty, suggesting that GRE-V reading comprehension items measure both processes.
• This provides new evidence on the construct validity of test scores for these items.
Experimental Manipulations
• Experimental conditions corresponded to variations in item features based on a hypothesized cognitive model:
  – Passage propositional density and syntax modification.
  – Passage passive voice and negative wording modification.
  – Passage order-of-information change.
  – Response alternative–passage overlap change.
• Experimental effects were tested with the LLTM, a Rasch-family model that decomposes item difficulty into feature-based sources.
Contrast Coding for LLTM Analysis

Item       D1   D2   D3   …   D28   C1   C2   C3   C4
Item 1     1    0    0    …   0     0    0    0    0
Item 2     0    1    0    …   0     0    0    0    0
Item 3     0    0    1    …   0     0    0    0    0
…
Item 29    0    0    0    …   1     0    0    0    0
Item 30    1    0    0    …   0     1    0    0    0
Item 31    0    1    0    …   0     1    0    0    0
Item 32    0    0    1    …   0     1    0    0    0
…
Item 37    0    0    0    …   0     0    1    0    0
Item 38    0    0    0    …   0     0    1    0    0
…
Item 51    0    0    0    …   1     0    0    0    0
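In the LLTM, each item's Rasch difficulty is constrained to a weighted sum of its design-matrix entries, b_i = Σ_k q_ik·η_k, so the condition contrasts C1–C4 become estimable difficulty effects η. The sketch below simulates data and recovers four feature weights by joint maximum likelihood, implemented as a logistic regression with person dummies. The estimation shortcut, the random feature matrix, and the generating weights (loosely echoing the C1–C4 means reported later) are illustrative assumptions, not the analysis actually used.

```python
# Minimal LLTM sketch: Rasch model P(X=1) = sigmoid(theta_p - b_i) with
# b_i = Q @ eta. Joint ML via unpenalized logistic regression with person
# dummies -- a simplification of the conditional/marginal ML used in practice.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_persons, n_items, n_feat = 300, 20, 4
Q = rng.integers(0, 2, size=(n_items, n_feat))  # item-by-feature design matrix
eta = np.array([0.22, 0.35, -0.08, 0.03])       # assumed feature weights
b = Q @ eta                                     # implied item difficulties
theta = rng.normal(size=n_persons)              # person abilities

# Simulate responses, then build the long-format design: person dummies
# (positive sign, for theta) and negated item features (for -b_i).
p = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))
y = (rng.random((n_persons, n_items)) < p).astype(int).ravel()
person_dummies = np.kron(np.eye(n_persons), np.ones((n_items, 1)))
features = -np.tile(Q, (n_persons, 1))
X = np.hstack([person_dummies, features])

fit = LogisticRegression(penalty=None, fit_intercept=False,
                         max_iter=2000).fit(X, y)
print("estimated feature weights:", fit.coef_[0][-n_feat:].round(2))
```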
Scatterplot of Known and Estimated Item Difficulty Parameters for the 29 Original Items
[Figure: ETS known 3-PL item difficulty parameters (x-axis, -3.00 to 3.00) plotted against estimated 1-PL difficulties (y-axis, -4.00 to 10.00).]
Effects of Manipulations on Item Difficulty

Parameter estimates:
Parameter   No. of Reps   Min Value   Max Value   New Mean   Std. Error of Estimate   Sig.
C1          5             .17         .24         .22        0.1424                   ns
C2          5             .32         .38         .35        0.1398                   <.05
C3          5             -.13        -.05        -.08       0.1457                   ns
C4          5             -.02        .06         .03        0.1415                   ns
Table of Regression Coefficients and Significance Tests for the Experimental Model

              Unstandardized            Standardized
Parameter     B        SE               Beta      t        Sig.
Condition 1   .080     .044             .067      1.793    .081
Condition 2   .005     .044             .004      .115     .909
Condition 3   -.091    .044             -.077     -2.040   .048
Condition 4   -.104    .044             -.088     -2.345   .024
Implications for Test Design
• A significant effect of the passive voice and negative wording manipulation on item difficulty was found.
• Test writers often avoid negative wording on the grounds that it is complicated and can confuse readers.
• Although the remaining effects on item difficulty were not significant, two significant effects on response time were found.
Self-Report Measures
• Verbal protocols: concurrent or retrospective.
• Structured questionnaires: strategy use, background information, interest, confidence.
Digital Eye Tracking
Digital eye-tracking data have been used to examine cognition and individual differences in:
• language processing
• facial processing
• learned attention
• electrical circuit troubleshooting
• problem-solving strategies (spatial reasoning, abstract reasoning)
Gaze Trail for a Reading Comprehension Item
Question (0.797) → RO (0.297) → Passage (4.03) → Question (0.31) → Passage (41.35) → Question (2.59) → Passage (2.65) → RO (6.00) → Passage (3.09) → RO (0.76) → Question (0.37) → RO (0.25)
(RO = response options; values are dwell times in each zone.)
Summary Data for Reading Comprehension Item
Gaze distribution (percent of total time by zone): Passage .81, Question .07, Options .12.
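The summary percentages follow directly from the gaze trail: total the dwell time per zone and divide by the overall time. A minimal sketch, using the trail values from the slide above:

```python
# Compute per-zone gaze distribution from the gaze trail shown above
# (RO = response options). Each entry is (zone, dwell time).
trail = [
    ("Question", 0.797), ("RO", 0.297), ("Passage", 4.03), ("Question", 0.31),
    ("Passage", 41.35), ("Question", 2.59), ("Passage", 2.65), ("RO", 6.00),
    ("Passage", 3.09), ("RO", 0.76), ("Question", 0.37), ("RO", 0.25),
]

totals: dict[str, float] = {}
for zone, dwell in trail:
    totals[zone] = totals.get(zone, 0.0) + dwell

grand_total = sum(totals.values())
for zone, t in totals.items():
    print(f"{zone}: {t / grand_total:.2f} of total time")
# -> Passage ~0.82, Question ~0.07, RO ~0.12, approximately matching the
#    chart values above (small rounding differences aside).
```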
Implications and Future Use
• Verify some of our current models.
• Identify new variables related to processing.
• Describe qualitative differences characterized by strategy differences.
• Examine specific aspects of test items that are problematic for individuals or for subgroups.
• Observe the effects of controlled item manipulations on item processing, not just on item responses alone.
Summary of the Benefits of Cognitively-Based Assessment Design
• Construct validity is more completely understood: the processes, strategies, and knowledge structures are explicitly elaborated.
• Enhanced score interpretations: persons, as well as items, can be described by processes, strategies, and knowledge structures.
• Generation of items with specified sources and levels of item difficulty: item parameters may be predicted for newly developed items, and items can be generated for specific populations by controlling the cognitive processing requirements.
Greatest Challenges
• Our limited understanding of the cognitive models and of the test items.
• Current item response data carry limited information.
• A "one size fits all" approach to cognitive modeling will not work: changing the goals of the assessment necessitates a change in the items, and changing the items (or their features) changes what can be concluded from the test.