Posted on 25-Jul-2020

Transitioning from Traditional ACs to Automated Simulations: Insights for Practice and Science

ACSG 2014 Stellenbosch, South Africa

Presenter: Dan J. Putka, Alexandria, VA, U.S.A.

March 14, 2014

Balancing promise and peril to create cutting-edge, yet scientifically sound

● Advances in technology are creating new and exciting opportunities for automating traditional AC assessments, but…

● Assessment practice is having trouble keeping pace with technological advancement
  – Constantly evolving technology + demand for latest/greatest + limited budgets/time

● Psychometrics is having trouble keeping pace with technological advancement and assessment practice
  – Is the “technology train” taking us to a station without a psychometric platform?

● Practice is getting ahead of the AC research literature
  – Research continues to be thin (Gibbons, 2013)
  – Long publication cycles + data access issues

In sum…

[Figure: Technology, Assessment, and AC Research & Psychometrics]

Today’s presentation

● Helping a large U.S. federal government agency transition a long-standing, nationwide AC programme to an automated, simulation-driven assessment programme
  – History of the AC programme
  – Why change?
  – Technology, meet tradition
  – Making the transition
  – Challenges and opportunities
  – The road ahead


History of the AC Programme

Organizational context

● U.S. Bureau of Alcohol, Tobacco, Firearms, and Explosives (ATF)
  – Federal law enforcement agency within the U.S. Department of Justice
  – Dual responsibilities for enforcing Federal criminal laws and regulating the firearms and explosives industries
  – Offices located in cities throughout the U.S.


Context for the original AC programme

● 1996 settlement agreement required ATF to change its human resource practices for special agents

● Develop AC programmes for promotion decisions at two levels of leadership within ATF
  – 1st and 2nd level supervisors

● Court-appointed oversight committee overseeing custom development of AC programmes


Programme scope

● 1st level supervisor AC
  – 2002: N = 170
  – 2003: N = 100
  – 2006: N = 275
  – 2008: N = 347

● 2nd level supervisor AC
  – 2003: N = 71
  – 2004: N = 41
  – 2005: N = 13
  – 2007: N = 85
  – 2009: N = 73

Note. N = Number of candidates evaluated at the AC in the given year.


Dimension × exercise map

Exercises: Analysis Exercise, In Basket, Role Play, Past Behavior Interview

Technical/Procedural Knowledge
  – Knowledge of Relevant Laws, Regs, Policies: X X
  – General Investigative Knowledge: X X X
  – Knowledge of Administrative Procedures: X

Management/Administrative
  – Resources Management: X X
  – Judgment & Problem Solving: X X X X
  – Decisiveness: X X
  – Plan, Organize, Prioritize: X X X X

Influence/Interpersonal
  – Communicate Orally: X X X X
  – Relate to Others: X X
  – Lead Others: X X


Comparison to other ACs

2008 ATF AC vs. studies reviewed by Woehr & Arthur (2003)

● Number of assessments: ATF AC = 4; Woehr & Arthur studies M = 4.78, SD = 1.47
● Number of dimensions: ATF AC = 10; Woehr & Arthur studies M = 10, SD = 5.11
● Candidate-to-assessor ratio: ATF AC = 1-to-2 (.50); Woehr & Arthur studies M = 1.71
● Rating approach: ATF AC = within-exercise; 63% of those reporting used a within-exercise approach
● Assessor occupation: ATF AC = managers/supervisors; 83% of those reporting used managers/supervisors
● Length of assessor training: ATF AC = 4 days (32 hours); Woehr & Arthur studies M = 3.35 days, SD = 3.06
● AC purpose: ATF AC = selection/promotion; 81% of those reporting were for selection/promotion

Summary of original AC

● Highly rigorous, custom development process grounded in the SIOP Principles for developing content-valid assessments and the Guidelines and Ethical Considerations for AC Operations

● Legal challenges drastically reduced

● Overall, viewed as a clear improvement

● Leveled the playing field for all candidates


Why Change?


● ATF was looking for alternatives that would retain as many benefits of a traditional AC as possible, but that would not involve human assessors

● The cost of “manager” assessors
  – Flights: 30+ managers flying in for the AC from all over the U.S.
  – Hotels, per diem, labor costs
    • Two-week AC + 4 days of pre-AC training
  – Productivity losses: managers away from their jobs for two+ weeks

● Flights, hotels, per diem, and labor costs for 300+ internal candidates!


Technology, Meet Tradition

Pros & Cons: Financial/business perspective
Technology-enhanced sims vs. traditional AC exercises

● Lower long-term costs (no assessors) vs. higher long-term costs (assessors)

● Higher short-term costs (tech development/coding/testing) vs. lower short-term costs (little or no technology)

● Ease of administration (lower logistics burden) vs. difficulty of administration (greater logistics burden)

● Reduced testing time (e.g., 4.5 hrs) vs. longer testing time (e.g., 9 hrs)

● Lower long-term costs + lower admin burden allow the benefits of ACs to be pushed to more levels of leadership vs. higher long-term costs + higher admin burden tend to limit ACs to the highest levels of leadership


Pros & Cons: Psychometric perspective
Technology-enhanced sims vs. traditional AC exercises

● Lower fidelity with on-the-job behavior (closed-ended response formats) vs. higher fidelity with on-the-job behavior (free-response formats)

● Difficulty measuring constructs best judged by actual behavioral observation (e.g., oral communication)* vs. well suited for measuring constructs best judged by actual behavioral observation

● More difficult validation strategy (criterion- and content-focused) vs. relatively easy validation strategy (content-focused)

● Potential for fully standardized assessment vs. potential lack of standardization across assessors/role players

● More objective scoring vs. more subjective scoring (perceived)

● Complex scoring and measurement issues to confront vs. relatively simple scoring and measurement issues

● Thin research literature, with far less precedent for best practice, vs. deep research literature, with much precedent for best practice


Making the Transition

Overview of the EPAS

● Electronic Promotion Assessment System (EPAS)
  – Suite of three assessments for promoting ATF Special Agents to 1st-line supervisor positions throughout the U.S.
    • Situational Judgment Test
    • Office Simulation
    • Virtual Role Play
  – Delivered online to 623 candidates at eight proctored test sites throughout the U.S. in the fall of 2012
  – The EPAS was custom developed by HumRRO (prime contractor) and ClicFlic (subcontractor) in partnership with ATF


EPAS development

● 7 months start to finish!
  – Initial assessment and scoring development
    • Working from previously developed “paper” in-basket and role play exercises and detailed job analysis data
    • Multiple SME workshops
  – Audio-animation production
  – Coding
  – Beta testing and quality control
  – Pilot testing and refinement
  – Criterion development
  – Concurrent, criterion-related validation study
  – Implementation
  – Sleep, lots of sleep


EPAS assessments

● Situational Judgment Test (SJT)
  – Simple progression through a series of animated scenarios
  – Closed-ended response format: “rate the effectiveness of each response”

● Office Simulation (OS)
  – Variation on a traditional AC in-basket
  – Simple progression through a series of animated “in-basket” items
    • E-mails with attachments
    • Phone calls
    • Office visits from “virtual” employees
  – Limited branching within items
  – Wide variety of response types designed to mimic job behavior

● Virtual Role Play (VRP)
  – Highly interactive, with a lot of conditional branching
  – Relatively fewer response types than the OS, but still varied


A video is worth 10,000 words


Dimension × exercise map

Exercises: SJT, Office Sim, Virtual Role Play

Technical/Procedural Knowledge
  – Knowledge of Relevant Laws, Regs, Policies: X
  – General Investigative Knowledge: X X

Management/Administrative
  – Judgment & Problem Solving: X X X
  – Plan, Organize, Prioritize: X X X

Influence/Interpersonal
  – Relate to Others: X X
  – Lead Others: X X


EPAS development

● Followed a strict content-oriented development process (SIOP, 2003)
  – Reviewed task/KSAO linkages from the Special Agent job analysis (and existing in-basket and role play assessments)
  – Identified critical job tasks that could be simulated and provide sufficient stimuli for eliciting dimension-relevant behavior
  – Developed assessments based on a subset of those tasks
  – Worked closely with Special Agent SMEs on content and scoring
  – Ensured ample opportunities for candidates to demonstrate behaviors relevant to the critical KSAOs required by the job/tasks
  – Evaluated strength of linkages between KSAOs/dimensions and each assessment (post-development content validity ratings)


EPAS criterion-related validity

● Concurrent, criterion-related validation study prior to operational use
  – Job incumbents and their supervisors (raters)
  – Supervisor ratings of incumbent job performance
    • Multi-dimension, behavioral summary scales
  – Sample size = 134

● Raw, uncorrected correlations between job performance and EPAS scores:
  – Overall EPAS score: correlation in the mid .40s
  – Dimension- and exercise-level scores: correlations averaged in the mid- to upper .20s


EPAS cost savings

[Figure: Full development and administration cost per AC candidate in U.S. dollars for the Original AC, the EPAS 1st Administration, and EPAS Subsequent Administration Years*]

Note. *Projected cost for future administration years.

The bottom line

● Custom, content-oriented development process that follows SIOP Principles

● Evidence of content and criterion-related validity prior to implementation

● Substantial cost savings per candidate, even in the development year

● ATF received a 2013 SIOP-SHRM HRM Impact Award for the EPAS work


Challenges & Opportunities

Challenges and opportunities

● In practice, many issues arise for which the research literature is unclear or non-existent (i.e., fun!)
  – In terms of implementing sound practice, these represent challenges
  – In terms of scientific advancement, these represent opportunities

● Development and implementation of the EPAS presented several challenging opportunities
  – Standardization in the face of branching
  – Fidelity of closed-ended response formats
  – Multiple item response formats + branching = scoring fun!


Standardization in the face of branching

Recommendations and lessons learned

● Don’t go overboard with branching
  – Enough for realism, but not so much that it becomes difficult to ensure a sufficient number of “common” assessment points per competency

● Map and “play out” different potential branches with multiple SMEs

● Look for creative ways to “build in” commonality and strategically “redirect” the candidate as needed
  – “Dead ends” that feed into a new common path or assessment point that requires action on the part of the candidate
    • “Visit” from another member of the office
    • Arrival of a new e-mail
    • Incoming phone call
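The “map and play out” step can be partly automated before SMEs review the branches. A minimal sketch, assuming a hypothetical scenario graph (every node name, competency link, and edge below is invented for illustration and is not taken from the EPAS): it enumerates every path a candidate could take and counts, per competency, the scored assessment points that lie on every path, i.e., the “common” points the first bullet asks for.

```python
from collections import Counter

# Hypothetical branching scenario: each node lists the nodes it can lead to.
# "dead_end" redirects the candidate back onto the common path via "phone_call".
GRAPH = {
    "start": ["email_a", "visit_b"],
    "email_a": ["meeting"],
    "visit_b": ["dead_end", "meeting"],
    "dead_end": ["phone_call"],
    "meeting": ["phone_call"],
    "phone_call": ["end"],
    "end": [],
}

# Hypothetical scored assessment points: node -> competency it measures.
SCORED = {
    "start": "Judgment & Problem Solving",
    "email_a": "Plan, Organize, Prioritize",
    "visit_b": "Relate to Others",
    "meeting": "Lead Others",
    "phone_call": "Judgment & Problem Solving",
}


def all_paths(graph, node="start", path=None):
    """Enumerate every start-to-leaf path (assumes the graph has no cycles)."""
    path = (path or []) + [node]
    if not graph[node]:
        return [path]
    paths = []
    for nxt in graph[node]:
        paths.extend(all_paths(graph, nxt, path))
    return paths


def common_points_per_competency(graph, scored):
    """Count scored nodes that every possible path passes through, by competency."""
    paths = all_paths(graph)
    common = set(paths[0]).intersection(*map(set, paths[1:]))
    return Counter(scored[n] for n in common if n in scored)


if __name__ == "__main__":
    for p in all_paths(GRAPH):
        print(" -> ".join(p))
    print(common_points_per_competency(GRAPH, SCORED))
```

In this toy graph every path passes through the redirecting phone call, but the only common scored points both measure Judgment & Problem Solving; Lead Others is skipped on the dead-end path, which is the kind of gap a further “redirect” would be designed to close.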


Fidelity of closed-ended responses

Recommendations and lessons learned

● Use response formats that “naturally” follow from the stimulus used to elicit the competency you are trying to measure

● To the extent possible, let the response format mirror the types of judgments and decisions candidates make on the job, e.g.:
  – Evaluating the effectiveness of potential courses of action
  – Prioritizing potential courses of action
  – Identifying errors/deficiencies in work products
  – Evaluating the seriousness of errors/deficiencies
  – Evaluating the criticality of information for decision making
  – Determining when sufficient information has been obtained
  – This goes well beyond simply “pick the best answer”.


Multiple response formats + branching = psychometric fun


● Use of items with different response formats can lead to unintended weighting of items when forming assessment scores
  – Large differences in item variance can greatly impact weighting
  – Contrast a 0-1 scale with a 1-100 scale

● Individuals’ completion of different items (as a result of branching) can decrease the reliability of the measure
  – When individuals complete the same items, differences in item difficulty do not affect the rank ordering of candidates
  – When individuals complete different sets of items, differences in item difficulty can affect the rank ordering of candidates
  – Potentially exacerbated by use of items with different response formats
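The weighting problem in the first bullet can be made concrete. When raw item scores are simply summed, each item’s share of the composite’s variance equals its covariance with the total divided by the total’s variance, so an item scored 1-100 can swamp one scored 0-1. A minimal sketch using invented scores for six hypothetical candidates (item names and data are illustrative assumptions, not EPAS data):

```python
def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Population (divide-by-n) covariance; the shares below are identical with n-1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def variance_shares(items):
    """Share of the unit-weighted composite's variance attributable to each item.

    Each share is cov(item, total) / var(total); the shares sum to 1.
    """
    total = [sum(scores) for scores in zip(*items.values())]
    var_total = cov(total, total)
    return {name: cov(scores, total) / var_total for name, scores in items.items()}


# Hypothetical raw scores for six candidates on two items.
items = {
    "pick_best_0_1": [1, 0, 1, 0, 1, 0],      # dichotomous item, 0-1 scale
    "rate_1_100": [80, 30, 65, 55, 90, 20],   # rating item, 1-100 scale
}

for name, share in variance_shares(items).items():
    print(f"{name}: {share:.3f}")
```

With these made-up scores the 1-100 item accounts for roughly 98% of the composite’s variance, even though both items nominally carry a weight of one.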

Recommendations and lessons learned

● Don’t go overboard with branching or multiple response formats!
  – Need to balance realism/fidelity with psychometric reality

● Map different item response scales to a common scale
  – Simple (e.g., linear) transformations: cost effective and perhaps sufficient if the meaning of the resulting scale is not critical (e.g., simple rank ordering of candidates)
  – SME-based mapping: more expensive and may not completely solve the problem, but important if the meaning of the overall scale matters (e.g., comparison of scores to a proficiency benchmark or cut-off)

● When designing simulations with branching, be very cognizant of the amount of overlap among the items that are scored, and strive to maximize that overlap
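The “simple (e.g., linear) transformations” recommendation can be sketched in two common variants: rescaling each item by its nominal scale endpoints, or standardizing each item to z-scores on the observed sample. The sketch below assumes hypothetical scores for four candidates on a 0-1 item and a 1-100 item (names, data, and the choice of these two variants are illustrative assumptions). It shows that the raw sum is driven by the 1-100 item, and that the two linear mappings, while each putting the items on a common footing, need not agree on the candidate ordering.

```python
from statistics import mean, pstdev

CANDIDATES = ["A", "B", "C", "D"]   # hypothetical candidates
pick_best = [1, 0, 1, 0]            # hypothetical 0-1 item
rating = [60, 65, 40, 90]           # hypothetical 1-100 item


def rescale_to_unit(scores, scale_min, scale_max):
    """Linearly map scores from the nominal range [scale_min, scale_max] onto [0, 1]."""
    return [(s - scale_min) / (scale_max - scale_min) for s in scores]


def standardize(scores):
    """Linearly map raw scores to z-scores (sample mean 0, population SD 1)."""
    m, sd = mean(scores), pstdev(scores)
    return [(s - m) / sd for s in scores]


def composite(*items):
    """Unit-weighted sum of equal-length lists of item scores."""
    return [sum(vals) for vals in zip(*items)]


def rank_order(scores):
    """Candidate labels from highest to lowest composite (assumes no ties)."""
    return [c for _, c in sorted(zip(scores, CANDIDATES), reverse=True)]


raw = composite(pick_best, rating)
unit = composite(rescale_to_unit(pick_best, 0, 1), rescale_to_unit(rating, 1, 100))
z = composite(standardize(pick_best), standardize(rating))

print("raw sum:          ", rank_order(raw))
print("endpoint rescaled:", rank_order(unit))
print("z-scored:         ", rank_order(z))
```

Endpoint rescaling is cheap but treats the nominal ends of each scale as equivalent; z-scoring equalizes observed variance but depends on the sample at hand. Neither gives a score substantive meaning, which is where SME-based mapping becomes important.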



The Road Ahead

Advancing science and practice

● As technology evolves, there will ALWAYS be issues in need of research attention, and unknown questions
  – These are sources of risk, potential value… and fun!

● How do we ensure sound practice given the differences in rates of knowledge advancement highlighted earlier?
  – This issue is bigger than AC research/practice; it spans domains
  – Assessment technology consortia spanning academia and practice?
    • What would be the stimulus for practice to participate?
    • What would be the stimulus for academe to participate?
  – Balancing competitive advantage with scientific advancement


In sum…


Technology, assessment practice, AC research, and psychometrics

The more we can bring knowledge in these areas into alignment, the more likely it is that our assessments will reflect truly innovative, best-in-class solutions for the individuals we assess, while at the same time providing the basis for making impactful, practical contributions to the scientific knowledge base.

● Gibbons, A. M. (2013). Research evidence and AC 2.0: What we know and what we don’t. Presentation at the 33rd Annual Assessment Centre Study Group Conference. Stellenbosch, South Africa.

● Society for Industrial and Organizational Psychology. (2003). Principles for the validation and use of personnel selection procedures (4th edition). Bowling Green, OH: Author.

● Woehr, D. J., & Arthur, W., Jr. (2003). The construct-related validity of assessment center ratings: A review and meta-analysis of the role of methodological factors. Journal of Management, 29, 231–258.



Dr. Dan Putka, Principal Staff Scientist, Human Resources Research Organization (HumRRO), Email:

