GETTING MORE FROM YOUR TECHNICAL ADVISORY COMMITTEE: DESIGNING AND IMPLEMENTING A VALIDITY RESEARCH AGENDA
CCSSO National Conference on Student Assessment
June 2015
Chad Buckendahl
Session overview
Purpose of Technical Advisory Committee
– Independence, quality control, research
Recruiting a TAC
– Aligned with purpose
– Complementary expertise and experiences
Considering the forest, not just the trees
Getting More from Your Technical Advisory Committee: Designing and Implementing a Validity Research Agenda
CCSSO National Conference on Student Assessment
June 2015
Chris Domaleski
Center for Assessment
What is the role of TAC?
• Evaluate: review the work of the SEA’s assessment contractors to affirm (or not) that the work meets acceptable standards.
– Provides an independent check
– Bolsters credibility
• Collaborate: work with the SEA to help develop and implement effective solutions.
• Both are important, but often programs focus on the former and not the latter.
Validity Framework
• We argue that TAC work can be most effective when organized around a coherent and comprehensive validity framework
• The validity framework guides the planning, implementation, and follow-up
What is a validity framework?
• Refers to the practices and sources of evidence that bolster claims that assessments can support the intended purposes and uses
• For example, the Standards for Educational and Psychological Testing call for evidence related to:
– Test Content
– Response Processes
– Internal Structure
– Relationship to Other Variables
– Consequences
Implementing the Framework
• Consider developing guidance to inform elements of the framework in lieu of organizing TAC work exclusively by contractor, event, or deliverable
• For example:
– Process for content validation
– Table of contents for technical manuals
– Research agenda to validate achievement standards
– QA procedures
– Plan for collecting evidence to monitor accountability system
Illustration
• Internal Structure
– There are often common sources of evidence that address the internal structure of assessments, such as:
• Dimensionality analyses
• Model fit
• Differential test and item functioning
– There are many approaches to elicit this evidence (a hedged sketch of one such procedure follows this list):
• Sampling
• Analytic procedures
• Flagging criteria
– We argue that as much emphasis, or more, should be placed on establishing procedures and rationale in advance as on reviewing outcomes post hoc.
– By so doing, the likelihood of producing a complete and comprehensive set of evidence is improved.
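To make the "analytic procedures" and "flagging criteria" point concrete, here is a minimal, hypothetical sketch of one widely used DIF procedure, the Mantel-Haenszel statistic with a simplified ETS-style A/B/C flag. The function names, the thresholds, and the omission of the accompanying significance test are illustrative assumptions, not the operational rules of any program discussed here.

```python
# Hypothetical sketch: Mantel-Haenszel DIF for one dichotomous item, with a
# simplified ETS-style flag. Names and thresholds are illustrative only.
import numpy as np

def mantel_haenszel_delta(item_scores, total_scores, is_focal):
    """Return the MH D-DIF (delta) statistic for a single 0/1-scored item.

    item_scores  : array of 0/1 responses to the studied item
    total_scores : matching criterion (e.g., total test score) used to stratify
    is_focal     : boolean array, True for focal-group examinees
    """
    item_scores = np.asarray(item_scores)
    total_scores = np.asarray(total_scores)
    is_focal = np.asarray(is_focal, dtype=bool)

    num = 0.0  # sum over strata of A_k * D_k / N_k (reference right, focal wrong)
    den = 0.0  # sum over strata of B_k * C_k / N_k (reference wrong, focal right)
    for k in np.unique(total_scores):
        in_stratum = total_scores == k
        ref = in_stratum & ~is_focal
        foc = in_stratum & is_focal
        a = np.sum(item_scores[ref] == 1)  # reference correct
        b = np.sum(item_scores[ref] == 0)  # reference incorrect
        c = np.sum(item_scores[foc] == 1)  # focal correct
        d = np.sum(item_scores[foc] == 0)  # focal incorrect
        n = a + b + c + d
        if n == 0:
            continue  # empty stratum carries no information
        num += a * d / n
        den += b * c / n

    if num == 0.0 or den == 0.0:
        return float("nan")  # degenerate data; no stable MH estimate
    alpha_mh = num / den               # MH common odds ratio
    return -2.35 * np.log(alpha_mh)    # ETS delta metric (D-DIF)

def flag_dif(delta):
    """Simplified A/B/C flag; the real ETS rules also require a significance test."""
    if np.isnan(delta):
        return "A"      # no stable estimate; no evidence of DIF in this sketch
    magnitude = abs(delta)
    if magnitude < 1.0:
        return "A"      # negligible DIF
    if magnitude > 1.5:
        return "C"      # large DIF; route to content review
    return "B"          # moderate DIF
```

The point of the slide is that decisions like the matching criterion, the handling of sparse strata, and the 1.0/1.5 cutoffs shown here should be agreed with the TAC before results exist, so that post hoc review is limited to confirming the pre-specified procedure was followed.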
General Suggestions for Implementation
• Have TAC member(s) collaborate in developing the agenda and facilitating the meeting.
• Every topic should have clear focus questions that target intended outcomes. Be clear about constraints in order to focus the conversation on productive areas.
• Plan well in advance in order to maximize opportunity for TAC influence.
– In general, post-hoc reviews are of limited utility
• Identify someone with appropriate technical expertise to take notes that emphasize action items. Have a process for notes to be reviewed.
• Consider having TAC members lead discussions, which may involve preparing materials in advance
• Engage your TAC between meetings – even for FYI items
Contact Information
Chris Domaleski
Senior Associate, Center for Assessment
cdomaleski@nciea.org
Getting More from Your Technical Advisory Committee: Designing and Implementing a Validity Research Agenda
2015 National Conference on Student Assessment
Enis Dogan
PARCC Overview
• PARCC development and implementation
• TAC overview
• Determining the agenda
– Research and Psychometrics Committee
– Other Working Groups and PARCC State Leads
– Guiding documents
– Validity framework
– Psychometric Roadmap
PARCC Overview
• Development initiated in 2011
• First research conducted in 2012
• Field trial in 2013
• Field tests in 2014
• First operational assessments in 2014-15 school year in:
• Arkansas
• Colorado
• District of Columbia
• Illinois
• Louisiana
• Maryland
• Massachusetts
• Mississippi
• New Jersey
• New Mexico
• Ohio
• Rhode Island
PARCC TAC
• Henry Braun (Boston College)
• Bob Brennan (University of Iowa)
• Derek Briggs (University of Colorado at Boulder)
• Linda Cook (Retired, ETS)
• Ronald Hambleton (University of Massachusetts, Amherst)
• Gerunda Hughes (Howard University)
• Huynh Huynh (University of South Carolina)
• Michael Kolen (University of Iowa)
• Suzanne Lane (University of Pittsburgh)
• Richard Luecht (University of North Carolina at Greensboro)
• Jim Pellegrino (University of Illinois at Chicago)
• Barbara Plake (University of Nebraska- Lincoln)
• Rachel Quenemoen (National Center on Educational Outcomes)
• Laurie Wise (Human Resources Research Organization, HumRRO)
Provides guidance on assessment design and development, and the research agenda of the consortium
Determining the agenda
• Research and Psychometrics Committee
• Other Working Groups and PARCC State Leads
• Guiding documents
– Psychometric Roadmap
– Validity framework
Psychometric Roadmap
• Lists psychometric assumptions and decisions and provides a road map for making decisions on pending issues.
• The psychometric issues are categorized as follows:
– PARCC Scaling Approach and Reporting Scale Characteristics
– Claims and Subclaims Reporting
– Scale Construction and Properties
– Item Response Theory (IRT) Modeling
– Mode and Device Comparability
– Data Forensics
– Linking Considerations
Psychometric Workplan Issues
• Determine properties of the primary (summative) reporting scale
• Determine number of digits for reported scale scores
• Establish rules defining the lowest and highest reported scale scores
• Determine how cut scores will be reported across performance levels, grades and subjects (i.e., determine scale anchors)
• Determine how transformations from raw scores to scale scores will be carried out (a sketch follows below)
Assumptions and Decisions
• CCR (college and career readiness) cut score will be fixed so that the same value indicates CCR performance across all grades and content areas
References for Assumptions/Decisions
• PARCC Scale Score Brief (Sept 2014): “PARCC Score Scale Brief_090314.docx”
Outstanding Questions
• What is the range of the summative scale scores? (Evaluating the CSEMs may help inform this decision)
• What are the lowest and highest obtainable scale scores (LOSS and HOSS) for the summative scores?
• What are the LOSS and HOSS for the sub-scores in reading and writing?
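As a companion to the transformation and CCR-anchoring items above, here is a minimal sketch, under stated assumptions, of a linear theta-to-scale transformation that fixes a single reported value at the CCR cut for every grade and content area, then applies LOSS/HOSS truncation and whole-number rounding. The specific numbers (a common cut of 750, LOSS/HOSS of 650/850, slopes of 25 or 30) are illustrative placeholders, not PARCC decisions; several of those decisions are listed above as still outstanding.

```python
# Hypothetical sketch: anchor each grade/subject's linear theta-to-scale
# transformation so the CCR cut reports as the same value everywhere.
# All numeric values are illustrative placeholders, not PARCC's decisions.

def make_scale_transform(theta_ccr_cut, scale_ccr_cut=750.0, slope=25.0,
                         loss=650.0, hoss=850.0):
    """Build f(theta) -> reported scale score with f(theta_ccr_cut) == scale_ccr_cut."""
    intercept = scale_ccr_cut - slope * theta_ccr_cut

    def to_scale(theta):
        score = slope * theta + intercept               # linear transformation
        return int(min(hoss, max(loss, round(score))))  # LOSS/HOSS truncation, whole digits

    return to_scale

# Each grade/subject keeps its own slope and cut location on the theta metric,
# but the reported CCR cut score is identical across all of them.
grade5_math = make_scale_transform(theta_ccr_cut=0.60)
grade8_ela = make_scale_transform(theta_ccr_cut=0.45, slope=30.0)
assert grade5_math(0.60) == 750 and grade8_ela(0.45) == 750
```

In practice the slope and the LOSS/HOSS values would be chosen with the CSEMs and the desired number of reported digits in mind, which is why those items appear as outstanding questions above.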
Psychometric Roadmap
Task and deadline:
• Policy decision about number of digits for PARCC reporting scales: 3/31/2015
• PARCC scaling approach presentation and discussion with RAP: 4/2/2015
• Scaling approach discussion at research planning meeting: 4/29/2015
• Scaling consideration presentation to TAC: 6/17/2015
• Policy decision about the properties of the reading and writing sub-score scales: 6/30/2015
• Simulations with spring 2015 operational data: 7/27/2015
• Performance level setting meeting for high school: 7/27/2015
• Governing Board approves standards and summative scales for high school: 8/14/2015
• Performance level setting meeting for grades 3-8: 8/24/2015
• Governing Board approves standards and summative scales for grades 3-8: 9/11/2015
• In terms of planning and executing research studies to collect empirical validity evidence, the first step is to build and follow a framework around which the studies can be organized. Lack of connectedness among validity studies is a challenge in many assessment programs (Haladyna, 2006).
• We built our framework by first dividing the assessment development and implementation period into four phases:
• Phase I: Defining measurement targets, item and test development
• Phase II: Test delivery and administration
• Phase III: Scoring, scaling, standard setting
• Phase IV: Reporting, interpretation and use of results
Validity Framework
• Phase I: Defining measurement targets, item and test development
– 1-A: The purposes of the assessments are clear to all stakeholders.
Relevant standards: 1.1
– 1-B: Test specifications and design documents are clear about what knowledge and skills are able to be assessed, the scope of the domain, the definition of competence, and the claims the assessments will be used to support.
Relevant standards: 1.2, 3.1, 3.3
– 1-C: Items are free of bias and accessible.
Relevant standards: 7.4, 7.7, 9.1, 9.2, 10.1
– 1-D: Items measure the intended constructs and elicit behavior that can be used as evidence in supporting the intended claims.
Relevant standards: 1.1, 1.8, 13.3
Validity Framework
• Phase I: Defining measurement targets, item and test development
Sources/Evidence of Procedural Validity for Phase I
• Performance-Level Descriptors (PLDs)
– Supported conditions/outcome: 1-B (scope of domain)
Sources/Evidence of Empirical Validity for Phase I
• Study 4: Use of Evidence-Based Selected Response Items in Measuring Reading Comprehension
– Supported conditions/outcome: 1-D (intended constructs)
– Source of validity evidence: Response processes
Validity Framework
PARCC Validity Framework is described in more detail in
Dogan, E., & Hauger, J. (in press). Empirical and Procedural Validity Evidence in Development and Implementation of PARCC Assessments. In R. W. Lissitz (Ed.), The Next Generation of Testing: Common Core Standards, Smarter-Balanced, PARCC, and the Nationwide Testing Movement. Charlotte, NC: Information Age Publishing Inc.
Validity Evidence
• Evidence and Design Implications Required to Support Comparability Claims by Richard M. Luecht (The University of North Carolina at Greensboro) and Wayne J. Camara (The College Board)
• Combining Multiple Indicators by Lauress L. Wise (HumRRO)
• Issues Associated with Vertical Scales for PARCC Assessments by Michael J. Kolen (The University of Iowa)
• Making Inferences about Growth and Value-Added: Design Issues for the PARCC Consortium by Derek Briggs (University of Colorado at Boulder)
• Defining and Measuring College and Career Readiness and Informing the Development of Performance Level Descriptors (PLDs) by Wayne Camara (College Board) and Rachel Quenemoen (National Center on Educational Outcomes)
• Scores and Scales: Considerations for PARCC Assessments by Michael J. Kolen (University of Iowa)
• Scaling PARCC Assessments: Some Considerations and a Synthetic Data Example by Robert L. Brennan (University of Iowa)
PARCC TAC
TAC Webinars in 2015
February 2015
• Field Test Analyses
March 2015
• Device Comparability Study
April 2015
• IRT Analyses
• Mode Comparability Study
May 2015
• Scale Properties
• International Benchmarking Study (Content Alignment)
• Data Forensics
• End-of-Course Comparability Study
• PARCC Test Design Change
June 2015
• PARCC CCR Policy
TAC Presentation
Julian Montoya
Nevada Department of Education
APAC: Assessment, Program Accountability & Curriculum
Used TAC for: Assessment Development and Standard Setting
(Criterion-Referenced Tests – Grades 3-8, High School Proficiency Exams, & Nevada Alternate Assessments)
Reviewing vendors’ deliverables, with Nevada professionals intimately involved throughout NDE’s assessment and curriculum programs, which were attached at the hip
Organizational Structure - Old
Prior to NDE Reorganization
Office of Standards & Instructional Support – Curriculum side is now its own office
Enhanced program management – ensure Nevada is being proactive with alignment of our assessment and accountability systems
Organizational Structure - New
ADAM: Assessment, Data & Accountability Management
After the NDE Reorganization
NDE has always tried to get a diverse group that can accommodate the majority of our needs:
1. Standards adoption and implementation
2. Communication concerning standards, assessments, and accountability
3. Awareness of what is happening across the states and nationally, and potential application in Nevada
4. Support for NDE’s ability to answer questions from the field and the Feds
TAC helps us determine what should be asked and documented
How do we determine how we should use our TAC?
Work in progress – we always like to ask first and include many members of NDE to make sure we try to utilize our TAC to the fullest (State Superintendent and his deputies in planning; directors of Special Education, School Improvement, IT, Standards, and CTE)
Some topics fall beyond the scope of their expertise, for example, the technology of our test delivery system
Understanding when we cannot use TAC
Test delivery challenges – from the example above, Nevada faced many challenges this year, and we were unable to utilize the TAC because that is not their charter, even though they understand the issues very well
New RFP – could not ask for specific guidance due to the nature of the procurement process, even though our state could have utilized unbiased professionals who know our system (the TAC)
Emerging topics that extend beyond their expertise (guest TAC members)
When do we need to go beyond or outside our TAC?
Prior to setting agendas, we really work out the questions we want answered
These questions fall into 3 organizing buckets:
1. Operational – development of assessments, monitoring assessments, the assessment itself
2. Policy – questions that pop up unexpectedly, from leadership and the legislature
3. Innovation/Change Management – paper-pencil to online, network connections, future planning
The 3 buckets help ensure that we will have an aligned assessment system from standards adoption to accountability, and they help to prioritize discussions
TAC Buckets
Agenda setting is collaborative among ADAM, SIS, and the front office to ensure all issues are addressed and input is gathered.
Concerted effort to shift away from vendor-driven agenda setting toward state-driven agenda setting
A vendor might have its own agenda that does not fit into the state’s plan
Once the agenda is set, materials are created and sent to TAC members prior to the meeting to make meetings more productive
We also create questions that align to the topics – much more effective for discussion and decision making by NDE staff, with the help of our TAC Coordinator (the moderator)
Agenda Setting