Testing a Strategic Evaluation Framework for Incrementally Building Evaluation Capacity in a Federal R&D Program
27th Annual Conference of the American Evaluation Association
Washington, DC, October 17, 2013

John Tunna
Director, Office of Research and Development
Office of Railroad Policy and Development, Federal Railroad Administration

Federal Railroad Administration (FRA) Evaluation Implementation Plan
• Introduction
  – R&D Evaluation Mandate
  – R&D Evaluation Goals
  – R&D Evaluation Standards
• Uses of Evaluation
  – Formative
  – Summative
• Types of Evaluation (CIPP Evaluation Model)
  – Context
  – Input
  – Implementation
  – Impact
• Evaluation Framework & Key Evaluation Questions
• Start-up Pilot Evaluations
• Institutionalizing and Mainstreaming Evaluation
  – Metaevaluation
  – The Evaluation Manual
    • Evaluation templates
    • Attestation of standards
R&D Evaluation Mandate
• Congressional Mandates
  – Government Performance and Results Act (GPRA, 1993)
  – Program Assessment Rating Tool (PART, 2002)
  – GPRA Modernization Act of 2010
• OMB Memos
  – M-13-17, July 26, 2013: Next Steps in the Evidence and Innovation Agenda
  – M-13-16, July 26, 2013: Science and Technology Priorities for the FY 2015 Budget
  – M-10-32, July 29, 2010: Evaluating Programs for Efficacy and Cost-Efficiency
  – M-10-01, October 7, 2009: Increased Emphasis on Program Evaluations
  – M-09-27, August 8, 2009: Science and Technology Priorities for the FY 2011 Budget
• Federal Evaluation Working Group
  – Reconvened in 2012 to help build evaluation capacity across the federal government
  – “[We] need to use evidence and rigorous evaluation in budget, management, and policy decisions to make government work effectively.”
• GAO Reports
  – Program Evaluation: Strategies to Facilitate Agencies’ Use of Evaluation in Program Management and Policy Making (June 2013)
  – Program Evaluation: A Variety of Rigorous Methods Can Help Identify Effective Interventions (GAO-10-30, November 2009)
  – Program Evaluation: Experienced Agencies Follow a Similar Model for Prioritizing Research (GAO-11-176, January 2011)
R&D Evaluation Mandate
OMB Memo M-13-16 (July 26, 2013)
Subject: Science and Technology Priorities for the FY 2015 Budget

“Agencies . . . should give priority to R&D that strengthens the scientific basis for decision-making in their mission areas, including but not limited to health, safety, and environmental impacts. This includes efforts to enhance the accessibility and usefulness of data and tools for decision support, as well as research in the social and behavioral sciences to support evidence-based policy and effective policy implementation.”

“Agencies should work with their OMB contacts to agree on a format within their 2015 Budget submissions to: (1) explain agency progress in using evidence and (2) present their plans to build new knowledge of what works and is cost-effective.”
R&D Evaluation Goals
• Meet R&D accountability requirements
• Guide and strengthen Division R&D program effectiveness and impact
• Facilitate knowledge diffusion and technology transfer
• Build R&D evaluation capacity
• Improve railroad safety
[Logic model diagram] Why Evaluation in R&D? Assessing the Logic of R&D Programs
• ACTIVITIES (funded activity “family”): scientific research; technology development
• OUTPUTS (deliverables/products): technical report(s); forecasting model(s)
• OUTCOMES (application of research): data use; adoption of guidelines, standards, or regulations; changing practices
• IMPACTS: reduced accidents and injuries
• Emergent outcomes and side effects: positive knowledge gains; negative environmental effects
The Research-Evaluation Paradigm
• Primary purpose
  – Research: contribute to knowledge; improve understanding
  – Evaluation: program improvement; decision-making
• Primary audience
  – Research: scholars; researchers; academicians
  – Evaluation: program funders; administrators; decision makers
• Types of questions
  – Research: hypotheses; theory-driven; preordinate
  – Evaluation: practical; applied; open-ended, flexible
• Sources of data
  – Research: surveys; tests; experiments; preordinate
  – Evaluation: interviews; field observations; documents; mixed sources; open-ended, flexible
• Criteria
  – Research: validity; reliability; generalizability
  – Evaluation: utility; feasibility; propriety; accuracy; accountability
Program Evaluation Standards: Guiding Principles for Conducting Evaluations
• Utility (useful): to ensure evaluations serve the information needs of the intended users.
• Feasibility (practical): to ensure evaluations are realistic, prudent, diplomatic, and frugal.
• Propriety (ethical): to ensure evaluations will be conducted legally, ethically, and with due regard for the welfare of those involved in the evaluation, as well as those affected by its results.
• Accuracy (valid): to ensure that an evaluation will reveal and convey valid and reliable information about all important features of the subject program.
• Accountability (professional): to ensure that those responsible for conducting the evaluation document and make available for inspection all aspects of the evaluation that are needed for independent assessments of its utility, feasibility, propriety, accuracy, and accountability.

Note: The Program Evaluation Standards were developed by the Joint Committee on Standards for Educational Evaluation and have been accredited by the American National Standards Institute (ANSI).
CIPP Evaluation Model (Context, Input, Process, Product)
• Context
• Input
• Implementation
• Impact

Daniel L. Stufflebeam’s adaptation of his CIPP Evaluation Model framework for use in guiding program evaluations of the Federal Railroad Administration’s Office of Research and Development. For additional information, see Stufflebeam, D.L. (2000). The CIPP model for evaluation. In D.L. Stufflebeam, G.F. Madaus, & T. Kellaghan (Eds.), Evaluation models (2nd ed., Chapter 16). Boston: Kluwer Academic Publishers.
Stakeholder engagement is key
Evaluation Framework: Roles and Types of Evaluation

Formative Evaluation (proactive)
• Context: identifies needs, problems, and assets; helps set goals and priorities.
• Inputs: assesses alternative approaches; develops program plans, designs, and budgets.
• Implementation: monitors, documents, and guides execution.
• Impact: assesses positive and negative outcomes; reassesses project and program plans; informs policy development and strategic planning.

Summative Evaluation (retroactive)
• Context: assesses original program goals and priorities.
• Inputs: assesses original procedural plans and budget.
• Implementation: assesses execution.
• Impact: assesses outcomes, impacts, side effects, and cost-effectiveness.
Evaluation Framework: Key Evaluation Questions – Safety Culture

Formative Evaluation
• Context: What are the highest priority needs to improve safety culture in the U.S. rail industry?
• Inputs: What are the most promising alternatives for safety culture interventions (BBS, ISROP, Rules Revision, Close Calls, etc.)? How do they compare (potential success, costs, etc.)? How can these interventions be most effectively implemented? What are some potential barriers to implementation?
• Implementation: To what extent do safety culture interventions proceed on time, within budget, and effectively? If needed, how can the intervention design be improved?
• Impact: How can safety culture interventions be implemented to maximize effectiveness? What indicators of impact or use, if any, have emerged to show that these interventions are being adopted more broadly? What are some emerging outcomes (positive or negative)? How can the implementations be modified to minimize costs and maximize effectiveness?

Summative Evaluation
• Context: To what extent did this intervention address the high-priority safety need?
• Inputs: What intervention strategy was chosen, and why was it chosen over other viable strategies (prospects for success, feasibility, costs)?
• Implementation: To what extent was the intervention carried out as planned, or modified with an improved plan?
• Impact: To what extent did these interventions improve safety and safety culture? Were there any unanticipated negative or positive side effects? What conclusions and lessons learned can be drawn (e.g., cost-effectiveness, stakeholder engagement, program effectiveness)?
Evaluation as a Key Strategy Tool
• Ask questions that matter: about processes, products, programs, policies, and impacts. Then develop appropriate and rigorous methods to answer them.
• Measure the extent to which, and the ways in which, program goals are being met. What’s working, and why, or why not?
• Use evaluation to refine program strategy, design, and implementation. Inform others about lessons learned, progress, and program impacts.
• Improve the likelihood of success with:
  – Intended users
  – Intended uses
  – Outcomes and impacts
  – Unanticipated (positive) outcomes
• Use evaluation to develop appropriate and useful performance measures for reporting R&D outcomes, and monitor those outcomes for continuous improvement.
Michael Coplen
Senior Evaluator
Office of Research & Development
Federal Railroad Administration
202-493-6346
Michael.Coplen@dot.gov
QUESTIONS?
Supplemental Information
Evaluation Framework: Illustrative Questions – Fatigue Website

Formative Evaluation
• Context: What are the highest priority needs for sleep health and safety in the railroad industry?
• Inputs: Given the need for sleep health education and training, what are the most promising alternatives (fatigue website, regulations, etc.)? How do they compare (potential success, costs, etc.)? How can this strategy be most effectively implemented? What are some potential barriers to implementation?
• Implementation: To what extent is the website project proceeding on time, within budget, and effectively? If needed, how can the design be improved?
• Impact: To what extent are people using the website? What other indicators of use, if any, have emerged that show the website is being accessed and the information acted upon? What are some emerging outcomes (positive or negative)? How can the implementation be modified to maintain and measure success?

Summative Evaluation
• Context: To what extent did the fatigue website address this high-priority need?
• Inputs: What strategy was chosen, and why, compared to other viable strategies (prospects for success, feasibility, costs)?
• Implementation: To what extent was the website project carried out as planned, or modified with an improved plan?
• Impact: To what extent did this project effectively address the need to educate railroad employees on sleep health and safety? Were there any unanticipated negative or positive side effects? What conclusions and lessons learned can be drawn (e.g., cost-effectiveness, stakeholder engagement, program effectiveness)?
[Diagram] Clear Signal for Action (CSA) Theory of Change
Input Evaluation: Program Design and Partnership Commitment to Change
The diagram links safety culture (values, attitudes, competencies, and patterns of behavior) to at-risk conditions, at-risk behaviors, incidents, and safety outcomes through a joint management-labor intervention cycle: establish steering committee (management); develop checklist (steering committee); observer training (steering committee, observers); data gathering & feedback (observers); data analysis & corrective action planning (steering committee, CA team); and corrective actions, assigned to the CA team where workers don’t have control and to the steering committee where workers have control.
[Diagram] Impact Evaluation: Expected Changes and Possible Metrics (Union Pacific example)
Implementation Evaluation elements: Continuous Improvement (CI); Safety Leadership Development (SLD); Peer-to-Peer Feedback; S.T.E.E.L. activities. Implementation activities include steering committee training, checklist development, sampler training, coaching, communications, feedback, data analysis, sampling, barrier identification, barrier removal, and leadership training.
The model traces implementation through first-, second-, and third-order impacts across general and S.T.E.E.L.-targeted employee practices, culture, reactions to problems, corporate results, employee well-being, and incidents. Possible metrics include: attitude toward safety; safe behaviors; safety culture; labor-management relations; personal sense of control/responsibility; equipment control; close calls; personal injuries; derailments; collisions; rule compliance; job satisfaction; safety hotline; health; stress; liability; incident costs; productivity; public image; discipline; FTX results; investigations; decertifications; management practices; communication quality, amount, and consistency; safety-enabling leadership behaviors; awareness; and employee involvement in S.T.E.E.L.
Other influences include corporate policy changes and FRA practices.