GETTING MORE FROM YOUR TECHNICAL ADVISORY COMMITTEE: DESIGNING AND IMPLEMENTING A VALIDITY RESEARCH AGENDA
CCSSO National Conference on Student Assessment
June 2015
Chad Buckendahl
Session overview
Purpose of Technical Advisory Committee
– Independence, quality control, research
Recruiting a TAC
– Aligned with purpose
– Complementary expertise and experiences
Considering the forest, not just the trees
Getting More from Your Technical Advisory Committee: Designing and Implementing a Validity Research Agenda
CCSSO National Conference on Student Assessment
June 2015
Chris Domaleski
Center for Assessment
What is the role of TAC?
• Evaluate: review the work of the SEA’s assessment contractors to affirm (or not) that the work meets acceptable standards.
– Provides an independent check
– Bolsters credibility
• Collaborate: work with the SEA to help develop and implement effective solutions.
• Both are important, but often programs focus on the former and not the latter.
Validity Framework
• We argue that TAC work can be most effective when organized around a coherent and comprehensive validity framework
• The validity framework guides the planning, implementation, and follow-up
What is a validity framework?
• Refers to the practices and sources of evidence that bolster claims that assessments can support the intended purposes and uses
• For example, the Standards for Educational and Psychological Testing call for evidence related to:
– Test Content
– Response Processes
– Internal Structure
– Relationship to Other Variables
– Consequences
Implementing the Framework
• Consider developing guidance to inform elements of the framework in lieu of organizing TAC work exclusively by contractor, event, or deliverable
• For example:
– Process for content validation
– Table of contents for technical manuals
– Research agenda to validate achievement standards
– QA procedures
– Plan for collecting evidence to monitor accountability system
Illustration
• Internal Structure
– There are often common sources of evidence that address the internal structure of assessments, such as:
• Dimensionality analyses
• Model fit
• Differential test and item functioning
– There are many approaches to elicit this evidence (a hedged sketch of one such procedure follows this list):
• Sampling
• Analytic procedures
• Flagging criteria
– We argue that as much emphasis, or more, should be placed on establishing procedures and rationale in advance as on reviewing outcomes post hoc.
– By so doing, the likelihood of producing a complete and comprehensive set of evidence is improved.
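To make the "analytic procedures" and "flagging criteria" point concrete, here is a minimal, hypothetical sketch of one widely used DIF procedure, the Mantel-Haenszel statistic with a simplified ETS-style A/B/C flag. The function names, the thresholds, and the omission of the accompanying significance test are illustrative assumptions, not the operational rules of any program discussed here.

```python
# Hypothetical sketch: Mantel-Haenszel DIF for one dichotomous item, with a
# simplified ETS-style flag. Names and thresholds are illustrative only.
import numpy as np

def mantel_haenszel_delta(item_scores, total_scores, is_focal):
    """Return the MH D-DIF (delta) statistic for a single 0/1-scored item.

    item_scores  : array of 0/1 responses to the studied item
    total_scores : matching criterion (e.g., total test score) used to stratify
    is_focal     : boolean array, True for focal-group examinees
    """
    item_scores = np.asarray(item_scores)
    total_scores = np.asarray(total_scores)
    is_focal = np.asarray(is_focal, dtype=bool)

    num = 0.0  # sum over strata of A_k * D_k / N_k (reference right, focal wrong)
    den = 0.0  # sum over strata of B_k * C_k / N_k (reference wrong, focal right)
    for k in np.unique(total_scores):
        in_stratum = total_scores == k
        ref = in_stratum & ~is_focal
        foc = in_stratum & is_focal
        a = np.sum(item_scores[ref] == 1)  # reference correct
        b = np.sum(item_scores[ref] == 0)  # reference incorrect
        c = np.sum(item_scores[foc] == 1)  # focal correct
        d = np.sum(item_scores[foc] == 0)  # focal incorrect
        n = a + b + c + d
        if n == 0:
            continue  # empty stratum carries no information
        num += a * d / n
        den += b * c / n

    if num == 0.0 or den == 0.0:
        return float("nan")  # degenerate data; no stable MH estimate
    alpha_mh = num / den               # MH common odds ratio
    return -2.35 * np.log(alpha_mh)    # ETS delta metric (D-DIF)

def flag_dif(delta):
    """Simplified A/B/C flag; the real ETS rules also require a significance test."""
    if np.isnan(delta):
        return "A"      # no stable estimate; no evidence of DIF in this sketch
    magnitude = abs(delta)
    if magnitude < 1.0:
        return "A"      # negligible DIF
    if magnitude > 1.5:
        return "C"      # large DIF; route to content review
    return "B"          # moderate DIF
```

The point of the slide is that decisions like the matching criterion, the handling of sparse strata, and the 1.0/1.5 cutoffs shown here should be agreed with the TAC before results exist, so that post hoc review is limited to confirming the pre-specified procedure was followed.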
General Suggestions for Implementation
• Have TAC member(s) collaborate in developing the agenda and facilitating the meeting.
• Every topic should have clear focus questions that target intended outcomes. Be clear about constraints in order to focus the conversation on productive areas.
• Plan well in advance in order to maximize opportunity for TAC influence.
– In general, post-hoc reviews are of limited utility
• Identify someone with appropriate technical expertise to take notes that emphasize action items. Have a process for notes to be reviewed.
• Consider having TAC members lead discussions, which may involve preparing materials in advance
• Engage your TAC between meetings – even for FYI items
Contact Information
Chris Domaleski
Senior Associate, Center for Assessment
cdomaleski@nciea.org
Getting More from Your Technical Advisory Committee: Designing and Implementing a Validity Research Agenda
2015 National Conference on Student Assessment
Enis Dogan
PARCC Overview
• PARCC development and implementation
• TAC overview
• Determining the agenda
– Research and Psychometrics Committee
– Other Working Groups and PARCC State Leads
– Guiding documents
– Validity framework
– Psychometric Roadmap
PARCC Overview
• Development initiated in 2011
• First research conducted in 2012
• Field trial in 2013
• Field tests in 2014
• First operational assessments in 2014-15 school year in:
• Arkansas
• Colorado
• District of Columbia
• Illinois
• Louisiana
• Maryland
• Massachusetts
• Mississippi
• New Jersey
• New Mexico
• Ohio
• Rhode Island
PARCC TAC
• Henry Braun (Boston College)
• Bob Brennan (University of Iowa)
• Derek Briggs (University of Colorado at Boulder)
• Linda Cook (Retired, ETS)
• Ronald Hambleton (University of Massachusetts, Amherst)
• Gerunda Hughes (Howard University)
• Huynh Huynh (University of South Carolina)
• Michael Kolen (University of Iowa)
• Suzanne Lane (University of Pittsburgh)
• Richard Luecht (University of North Carolina at Greensboro)
• Jim Pellegrino (University of Illinois at Chicago)
• Barbara Plake (University of Nebraska- Lincoln)
• Rachel Quenemoen (National Center on Educational Outcomes)
• Laurie Wise (Human Resources Research Organization, HumRRO)
Provides guidance on assessment design and development, and the research agenda of the consortium
Determining the agenda
• Research and Psychometrics Committee
• Other Working Groups and PARCC State Leads
• Guiding documents
– Psychometric Roadmap
– Validity framework
Psychometric Roadmap
• Lists psychometric assumptions and decisions and provides a road map for making decisions on pending issues.
• The psychometric issues are categorized as follows:
– PARCC Scaling Approach and Reporting Scale Characteristics
– Claims and Subclaims Reporting
– Scale Construction and Properties
– Item Response Theory (IRT) Modeling
– Mode and Device Comparability
– Data Forensics
– Linking Considerations
Psychometric Workplan Issues
• Determine properties of the primary (summative) reporting scale
• Determine number of digits for reported scale scores
• Establish rules defining the lowest and highest reported scale scores
• Determine how cut scores will be reported across performance levels, grades and subjects (i.e., determine scale anchors)
• Determine how transformations from raw scores to scale scores will be carried out (a sketch follows below)
Assumptions and Decisions
• CCR (college and career readiness) cut score will be fixed so that the same value indicates CCR performance across all grades and content areas
References for Assumptions/Decisions
• PARCC Scale Score Brief (Sept 2014): “PARCC Score Scale Brief_090314.docx”
Outstanding Questions
• What is the range of the summative scale scores? (Evaluating the CSEMs may help inform this decision)
• What are the lowest and highest obtainable scale scores (LOSS and HOSS) for the summative scores?
• What are the LOSS and HOSS for the sub-scores in reading and writing?
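As a companion to the transformation and CCR-anchoring items above, here is a minimal sketch, under stated assumptions, of a linear theta-to-scale transformation that fixes a single reported value at the CCR cut for every grade and content area, then applies LOSS/HOSS truncation and whole-number rounding. The specific numbers (a common cut of 750, LOSS/HOSS of 650/850, slopes of 25 or 30) are illustrative placeholders, not PARCC decisions; several of those decisions are listed above as still outstanding.

```python
# Hypothetical sketch: anchor each grade/subject's linear theta-to-scale
# transformation so the CCR cut reports as the same value everywhere.
# All numeric values are illustrative placeholders, not PARCC's decisions.

def make_scale_transform(theta_ccr_cut, scale_ccr_cut=750.0, slope=25.0,
                         loss=650.0, hoss=850.0):
    """Build f(theta) -> reported scale score with f(theta_ccr_cut) == scale_ccr_cut."""
    intercept = scale_ccr_cut - slope * theta_ccr_cut

    def to_scale(theta):
        score = slope * theta + intercept               # linear transformation
        return int(min(hoss, max(loss, round(score))))  # LOSS/HOSS truncation, whole digits

    return to_scale

# Each grade/subject keeps its own slope and cut location on the theta metric,
# but the reported CCR cut score is identical across all of them.
grade5_math = make_scale_transform(theta_ccr_cut=0.60)
grade8_ela = make_scale_transform(theta_ccr_cut=0.45, slope=30.0)
assert grade5_math(0.60) == 750 and grade8_ela(0.45) == 750
```

In practice the slope and the LOSS/HOSS values would be chosen with the CSEMs and the desired number of reported digits in mind, which is why those items appear as outstanding questions above.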
Psychometric Roadmap
Task and deadline:
• Policy decision about number of digits for PARCC reporting scales: 3/31/2015
• PARCC scaling approach presentation and discussion with RAP: 4/2/2015
• Scaling approach discussion at research planning meeting: 4/29/2015
• Scaling consideration presentation to TAC: 6/17/2015
• Policy decision about the properties of the reading and writing sub-score scales: 6/30/2015
• Simulations with spring 2015 operational data: 7/27/2015
• Performance level setting meeting for high school: 7/27/2015
• Governing Board approves standards and summative scales for high school: 8/14/2015
• Performance level setting meeting for grades 3-8: 8/24/2015
• Governing Board approves standards and summative scales for grades 3-8: 9/11/2015
• In terms of planning and executing research studies to collect empirical validity evidence, the first step is to build and follow a framework around which the studies can be organized. Lack of connectedness among validity studies is a challenge in many assessment programs (Haladyna, 2006).
• We built our framework by first dividing the assessment development and implementation period into four phases:
• Phase I: Defining measurement targets, item and test development
• Phase II: Test delivery and administration
• Phase III: Scoring, scaling, standard setting
• Phase IV: Reporting, interpretation and use of results
Validity Framework
• Phase I: Defining measurement targets, item and test development
– 1-A: The purposes of the assessments are clear to all stakeholders.
Relevant standards: 1.1
– 1-B: Test specifications and design documents are clear about what knowledge and skills are able to be assessed, the scope of the domain, the definition of competence, and the claims the assessments will be used to support.
Relevant standards: 1.2, 3.1, 3.3
– 1-C: Items are free of bias and accessible.
Relevant standards: 7.4, 7.7, 9.1, 9.2, 10.1
– 1-D: Items measure the intended constructs and elicit behavior that can be used as evidence in supporting the intended claims.
Relevant standards: 1.1, 1.8, 13.3
Validity Framework
• Phase I: Defining measurement targets, item and test development
Sources/Evidence of Procedural Validity for Phase I
• Performance-Level Descriptors (PLDs)
– Supported conditions/outcome: 1-B (scope of domain)
Sources/Evidence of Empirical Validity for Phase I
• Study 4: Use of Evidence-Based Selected Response Items in Measuring Reading Comprehension
– Supported conditions/outcome: 1-D (intended constructs)
– Source of validity evidence: Response processes
Validity Framework
PARCC Validity Framework is described in more detail in
Dogan, E., & Hauger, J. (in press). Empirical and Procedural Validity Evidence in Development and Implementation of PARCC Assessments. In R. W. Lissitz (Ed.), The Next Generation of Testing: Common Core Standards, Smarter-Balanced, PARCC, and the Nationwide Testing Movement. Charlotte, NC: Information Age Publishing Inc.
Validity Evidence
• Evidence and Design Implications Required to Support Comparability Claims by Richard M. Luecht (The University of North Carolina at Greensboro) and Wayne J. Camara (The College Board)
• Combining Multiple Indicators by Lauress L. Wise (HumRRO)
• Issues Associated with Vertical Scales for PARCC Assessments by Michael J. Kolen (The University of Iowa)
• Making Inferences about Growth and Value-Added: Design Issues for the PARCC Consortium by Derek Briggs (University of Colorado at Boulder)
• Defining and Measuring College and Career Readiness and Informing the Development of Performance Level Descriptors (PLDs) by Wayne Camara (College Board) and Rachel Quenemoen (National Center on Educational Outcomes)
• Scores and Scales: Considerations for PARCC Assessments by Michael J. Kolen (University of Iowa)
• Scaling PARCC Assessments: Some Considerations and a Synthetic Data Example by Robert L. Brennan (University of Iowa)
PARCC TAC
TAC Webinars in 2015
February 2015
• Field Test Analyses
March 2015
• Device Comparability Study
April 2015
• IRT Analyses
• Mode Comparability Study
May 2015
• Scale Properties
• International Benchmarking Study (Content Alignment)
• Data Forensics
• End-of-Course Comparability Study
• PARCC Test Design Change
June 2015
• PARCC CCR Policy
TAC Presentation
Julian Montoya
Nevada Department of Education
APAC: Assessment, Program Accountability & Curriculum
Used TAC for: Assessment Development and Standard Setting
(Criterion-Referenced Tests – Grades 3-8, High School Proficiency Exams, & Nevada Alternate Assessments)
Reviewing vendors’ deliverables, with Nevada professionals intimately involved throughout NDE’s assessment and curriculum programs, which were attached at the hip
Organizational Structure - Old
Prior to NDE Reorganization
Office of Standards & Instructional Support – Curriculum side is now its own office
Enhanced program management – ensure Nevada is being proactive with alignment of our assessment and accountability systems
Organizational Structure - New
ADAM: Assessment, Data & Accountability Management
After the NDE Reorganization
NDE has always tried to get a diverse group that can accommodate the majority of our needs:
1. Standards adoption and implementation
2. Communication concerning standards, assessments, and accountability
3. Awareness of what is happening across the states and nationally, and potential application in Nevada
4. Support for NDE’s ability to answer questions from the field and the Feds
TAC helps us determine what should be asked and documented
How do we determine how we should use our TAC?
Work in progress – we always like to ask first and include many members of NDE to make sure we try to utilize our TAC to the fullest (State Superintendent and his deputies in planning; directors of Special Education, School Improvement, IT, Standards, and CTE)
Some topics fall beyond the scope of their expertise, for example, the technology of our test delivery system
Understanding when we cannot use TAC
Test delivery challenges – from the example above, Nevada faced many challenges this year, and we were unable to utilize the TAC because that is not their charter, even though they understand the issues very well
New RFP – could not ask for specific guidance due to the nature of the procurement process, even though our state could have utilized unbiased professionals who know our system (the TAC)
Emerging topics that extend beyond their expertise (guest TAC members)
When do we need to go beyond or outside our TAC?
Prior to setting agendas, we really work out the questions we want answered
These questions fall into 3 organizing buckets:
1. Operational – development of assessments, monitoring assessments, the assessment itself
2. Policy – questions that pop up unexpectedly, from leadership and the legislature
3. Innovation/Change Management – paper-pencil to online, network connections, future planning
The 3 buckets help ensure that we will have an aligned assessment system from standards adoption to accountability, and they help to prioritize discussions
TAC Buckets
Agenda setting is collaborative among ADAM, SIS, and the front office to ensure all issues are addressed and input is gathered.
Concerted effort to shift away from vendor-driven agenda setting toward state-driven agenda setting
A vendor might have its own agenda that does not fit into the state’s plan
Once the agenda is set, materials are created and sent to TAC members prior to the meeting to make meetings more productive
We also create questions that align to the topics – much more effective for discussion and decision making by NDE staff, with the help of our TAC Coordinator (the moderator)
Agenda Setting