Creating Tests that Measure Well and that Model Good Instructional
and Learning Practice
National Conference on Student Assessment, Minneapolis, MN
June 2012
Session Outline
• Presenters
  – Randy Bennett (ETS)
  – Edys Quellmalz (WestEd)
• Discussant – Brian Gong (NCIEA)
• Q&A
Copyright © 2011 by Educational Testing Service.
CBAL: Modeling Good Instructional and Learning Practice through Assessment
Randy Bennett, ETS
Presentation at the National Conference on Student Assessment, Minneapolis, MN, June 2012
Overview
• Brief description of CBAL’s goal and design characteristics
• Brief outline of pilot results
• Examples of how we try to model good teaching and learning practice
• List of outstanding issues
• Summary
Cognitively Based Assessment of, for, and as Learning
• Began in 2007
• Goal: Create knowledge and capability, grounded in the learning sciences, that can be configured in different ways to address the assessment innovation needs of the field
• CBAL assessment prototypes attempt to:
  – Document what students have achieved (“of learning”)
  – Help identify how to plan instruction (“for learning”)
  – Offer worthwhile educational experiences (“as learning”)
• R&D covers reading, writing, mathematics, and science from elementary school through adult education
Key Design Characteristics
• Summative and formative assessment built as part of a coherent system
• System model was created from a detailed theory of action
• Assessment designs are grounded in principles and domain conceptions from learning-sciences research
• Assessment prototypes are computer-delivered and make heavy use of structured, scenario-based task sets
• Summative assessments use a distributed design
• Assessment prototypes are built to measure well and to model good instructional and learning practice
Summary of Results from 16 Online Summative Assessment Pilots
Content Area (& # of Form Administrations) | # of Tests | Median (M) of the M p+ Values | M of the M Percent Omitted/Missing | M Coeff. Alpha | Most Frequent Factor Analytic Result | M r with Other Tests of the Same Skill | Diff. Btwn Auto-Human (H) & H-H Agreement
Reading (6) | 3,062 | .51 | 0% | .88 | 1 F within & across forms | .74 | M=6 k pts
Writing (9) | 5,410 | .57 | 1% | .82 | 1 F within & across forms | --- | 3 r pts
Math (12) | 1,347 | .45 | 6% | .92 | 1 F within forms | .76 | M=15 k pts
Note. M=median; F = factor; k = kappa; r = correlation coefficient.
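For readers less familiar with two of the statistics in the table, the following is a minimal sketch of how an item p+ value (proportion correct) and coefficient alpha are conventionally computed from a scored response matrix. It is not the CBAL analysis code: the function names and the tiny `responses` matrix are hypothetical, and the sketch assumes dichotomously scored items, whereas the pilots may have included polytomous items and other scoring rules.

```python
from statistics import median, pvariance


def p_plus(responses):
    """Proportion of examinees answering each item correctly (one value per column)."""
    n_examinees = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n_examinees for j in range(n_items)]


def coefficient_alpha(responses):
    """Coefficient (Cronbach's) alpha: (k / (k - 1)) * (1 - sum(item variances) / total-score variance)."""
    k = len(responses[0])
    item_variances = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 5 examinees (rows) by 4 items (columns), scored 0/1.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

item_p_plus = p_plus(responses)
print("p+ per item:", item_p_plus)
print("median p+:", median(item_p_plus))
print("coefficient alpha:", round(coefficient_alpha(responses), 2))
```

The table reports medians of such values taken across test forms, so a fuller analysis would apply these functions form by form and then summarize the results.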
Modeling Good Teaching and Learning Practice
• CBAL summative (and formative) assessments try to:
  – Give students something substantive and reasonably realistic with which to reason, read, write, or do mathematics or science
  – Include tools and representations similar to ones proficient performers tend to use
  – Connect qualitative (conceptual) understanding with formalism
  – Use “lead-in” and “culminating tasks” to suggest how the skills required for more complex performance might be decomposed for instruction
  – Use (CCSS-aligned) learning progressions to denote and measure levels of qualitative change in student understanding
Will the lake become so shallow that water can no longer flow through the dam?
CBAL Definition of “Learning Progression”
• A description of qualitative change in a student’s level of sophistication for a key concept, process, strategy, practice, or habit of mind. Change in student standing on such a progression may be due to a variety of factors, including maturation and instruction. Each progression is presumed to be modal, i.e., to hold for most, but not all, students. Finally, it is provisional, subject to empirical verification and theoretical challenge.
Provisional Learning Progression for Argument-Building (Deliberation)
• PRELIMINARY: Can distinguish reasons from non-reasons and infer whether reasons would be used to support or oppose a position
• FOUNDATIONAL: Can self-generate multiple reasons to support an opinion
• BASIC: Can rank and select reasons by how convincing they seem; can distinguish facts and details that strengthen a point from those that weaken it; can distinguish between reasoning that seems convincing because one agrees with it and reasoning that seems convincing because of the content of the argument.
• INTERMEDIATE: Can recognize counterexamples. Can distinguish valid from invalid arguments and recognize unsupported claims and obvious fallacies.
• ADVANCED: Can identify and question the warrants of arguments, distinguish necessary and sufficient evidence, and synthesize a position from many sources of evidence, using that to identify key evidence and propose new lines of argument.
Outstanding Issues
• Do our modeling strategies affect classroom teaching and learning practice?
  – Do teachers change their instructional practice in the intended ways?
  – Do students change their learning practice in the intended ways?
  – Does achievement improve?
• Are our learning progressions useful for measurement and for instruction?
• Do the modeling strategies and learning progressions appear to be of benefit for students from special populations, as well as for those from the general population?
Summary
• In CBAL we are:
  – Designing assessment prototypes to measure well and have positive impact
  – Attempting to have positive impact by modeling good teaching and learning practice
    • Give students something substantive and reasonably realistic with which to work
    • Include tools and representations similar to ones used by proficient performers
    • Connect qualitative understanding with formalism
    • Use “lead-in” and “culminating tasks” to suggest how complex performance might be decomposed
    • Use provisional learning progressions to denote and measure levels of qualitative change
• Data on “measuring well” appear promising
• Much more work needs to be done to verify the effectiveness of our “practice-modeling” attempts
For More About CBAL
• Overview Papers
  – Bennett, R. E., & Gitomer, D. H. (2009). Transforming K-12 assessment: Integrating accountability testing, formative assessment, and professional support. In C. Wyatt-Smith & J. Cumming (Eds.), Educational assessment in the 21st century (pp. 43-61). New York: Springer.
  – Bennett, R. E. (2010). Cognitively Based Assessment of, for, and as Learning: A preliminary theory of action for summative and formative assessment. Measurement: Interdisciplinary Research and Perspectives, 8, 70-91.
  – Bennett, R. E. (2011). CBAL: Results from piloting innovative K-12 assessments (RR-11-23). Princeton, NJ: Educational Testing Service.
• Commentaries
  – Embretson, S. (2010). Cognitively based assessment and the integration of summative and formative assessments. Measurement: Interdisciplinary Research and Perspectives, 8, 180-184.
  – Linn, R. L. (2010). Commentary: A new era of test-based educational accountability. Measurement: Interdisciplinary Research and Perspectives, 8, 145-149.
• www.ets.org/research/topics/cbal/initiative