Richard Fuller
Leeds Institute of Medical Education
AIMING FOR A NEW HORIZON?
‘Assessment personalization in an era of
massification…’
Massification (n)
Globalisation and
internationalisation of
education
Educational development
applied to mass audience
Application of new
‘technologies’ and uniformity of
practice / student experience
(e.g. the OSCE)
The growth of organised,
standardised testing
IMPACT
• Growth of more organised models of testing – e.g. schedules of assessment
• Constructive alignment and blueprinting
• Holistic assessment – knowledge, clinical skills and (eventually) professionalism
• A focus on quality….
1. JUST FOCUS ON THE
LATEST IN FASHION TOOL
2. PUT PROFESSIONALISM IN A
DIFFERENT TALK/WORKSHOP….
Unprofessionalism/lack of
professionalism
• Critical incidents?
• Markers/traits & Consequences (Papadakis)?
• Lapses (Ginsburg)?
• Identity (Cruess)?
• Faculty (Steinert)?
Or, Meritorious behaviours – care, compassion, team working?
3. MAKE IT EASY TO MARK
• What is easy to assess (and mark) does not equate with importance and value
• SBAs too often ‘knowledge’ only - bias is away from challenging, realistic tasks
• Can we (re)conceptualise the challenges and benefits of computer based assessment in delivering better tests?
4. MAKING ASSESSMENT DECISIONS
IS EASY…
• Assessors
– What am I testing?
– ‘Double Duty’: Teacher and Professional / Assessor
– Patients / Safety
• Learners
– Identity?
– Resilience?
– Capacity for improvement
– Readiness to join profession?
– Motivation
• Institutions
– Culture
– Reputation
– Standards
• Remediation
– Tools
– Systems
– Effective?
– Alternatives / Exits?
Boud, Studies in Continuing Education 2000
Yepes-Rios et al, BEME Guide 42, Medical Teacher 2016
5. EMBRACE THAT
RELIABILITY
‘TYRANNY’
6. THEN CONTINUE THAT GRADE /
TARGET DRIVEN CULTURE….
• ‘Grey and fuzzy’ (Yorke)
• ‘Transactional Currency’ (Sadler)
• ‘Value-laden, inherently unstable process reactive to complex and changing interactions within an assessment’ (Hodges)
• Conclusion:
• Within any individual ‘test’, difficulty at the pass/fail boundary remains (Sadler)
• ‘Let’s stop the pretence of consistent marking’ (Bloxham)
Yorke 2011
Sadler 2010
Bloxham et al 2016
GLOBAL?
REALITY?
30 end of block MCQ tests
5 projects
8 skills portfolios
12 end of block OSCEs
15 professionalism surveys
20 end of placement
evaluations
Endless WBA and compulsory
reflective exercises
Limited (efficacy) feedback….
FEEDBACK…….
• ‘Grade Justification’
(Sadler)
• ‘Hopefully useful
information’ (Boud)
• Need to overcome
cultural ‘history’
– Feedback to pass the test
– Feedback to get a better
grade
IMPACT ON LEARNERS AND
TEACHERS?
• Grades are arbitrary and can be subject to inflation
• (Bad) Assessment does not drive meaningful learning
• Over tested students are:
– Disjointed - ‘learn & forget’
– Disengaged - shallow, ‘strategic’ learners that focus
only on ‘what they think can be tested’
– Dissatisfied - ‘that’s not why I came to medical
school’
CHANGING ASSESSMENT’S HORIZONS?
Using assessment data ‘intelligently’ (Programmatic Assessment)
- e.g. McMaster Master WBA programme for postgraduate training
Milestones and ‘Growth’
- e.g. US Milestones programme, GMC Generic Professional Capabilities
‘Sustainable Assessment revisited’ (Boud & Soler 2016)
- Assessment to focus on learning beyond the timescale of the (given) course..
CAN WE RECONCEPTUALISE
ASSESSMENT IN MEDICAL EDUCATION?
• Can we rebalance our focus between high stakes decision making and psychometric analysis on the one hand, and the consequences for learning and instruction on the other?
• Move to a personalised model of assessment that is more….
• Compassionate?
CASE 1: EASING TRANSITION SHOCK
• Entry to Medical School – grappling with being a new starter (and existing ‘habits’) yet ‘hit’ with the summative test
– Transitions and Critically Intense Learning Period
• Cumulative/continuous testing across much of Year 1 and Year 2
– Multi-stage class tests with debrief and feedback + alternate testing for those who struggle
– Weekly cases and continuous assessment opportunities
– Non Graded Pass
• Outcomes
– Happier students and staff
– Better test success in high stakes end of year exam
– Developing trust and ‘learning to learn’ effectively
Baker 2017
Kilminster et al Med Educ 2011
Velji. BSc Med Ed Leeds 2015
CASE 2: RITUALS AND RELAXING
• Should we sequester / quarantine students for large scale tests to reduce impact of cheating?
– Large scale psychometric evidence says ‘no’
– Continued student anxiety….
• Student research project looking at what students do in the run up to and time in between/after OSCEs
– Draws on sports and performance art
– ‘rituals and rites’ identified that help students manage high stress situations proximal to performance – many solo/in solitude
– Justification for not sequestering, but scope for further inter-disciplinary work on keeping assessment ‘compassionate’
McCourt et al Med Educ 2012
Lever. BSc Med Ed Leeds 2014
CAN WE RECONCEPTUALISE
ASSESSMENT IN MEDICAL EDUCATION?
• Can we rebalance our focus between high stakes decision making and psychometric analysis on the one hand, and the consequences for learning and instruction on the other?
• Move to a personalised model of assessment that is more….
• Customised?
CASE STUDY 3: IMPROVE LARGE
SCALE TEST DESIGN & DELIVERY
• Challenges in assessment
– Considerations of reliability >> Sampling >> item
quality
– Large, expensive assessments
– Challenges of setting standards and accurate decision
making.
• Hypothesis
– Not enough focus on blueprinting and high quality
items
– Overtesting of those who we know will pass
– Under-assessment of those we are most concerned about
ADAPTIVE TESTING: THE
SEQUENTIAL TEST (SQT)
• Shorter tests with ‘an adaptive stopping rule’
– Stronger (obviously competent) candidates receive a shorter ‘screening test’
– Candidates of concern receive longer or multiple sequence tests
• Designed primarily around effective blueprinting and effective sampling/testing of those of most concern
– Fairer to all stakeholders
– Better reliability (driven through more testing of the critical pass-fail/borderline group)
– Improved diagnostic accuracy over ‘traditional tests’
– More effective and cost efficient use of resources
Wainer & Feinberg, Significance 2015
Pell, Fuller, Homer (multiple)
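The ‘adaptive stopping rule’ described above can be sketched as a two-stage decision. This is only an illustration under stated assumptions, not the published SQT method: the function name, the screening margin (cut score plus two standard errors of measurement) and the use of a combined S1 + S2 score are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    outcome: str        # "pass" or "fail"
    sequences_sat: int  # 1 = screening test only, 2 = screening test plus extension


def sequential_decision(
    s1_score: float,
    s1_cut: float,
    s1_sem: float,
    s2_score: Optional[float] = None,
    combined_cut: Optional[float] = None,
    margin_in_sems: float = 2.0,  # hypothetical safety margin above the cut score
) -> Decision:
    """Hypothetical two-stage stopping rule for a sequential test."""
    # Stage 1: candidates clearly above the cut score stop after the screening test.
    screen_threshold = s1_cut + margin_in_sems * s1_sem
    if s1_score >= screen_threshold:
        return Decision("pass", 1)

    # Stage 2: everyone else sits the second sequence; the decision then
    # rests on the longer (combined) test.
    if s2_score is None or combined_cut is None:
        raise ValueError("candidate must sit sequence 2 before a decision is made")
    combined = s1_score + s2_score
    return Decision("pass" if combined >= combined_cut else "fail", 2)


# Illustrative (invented) scores: a strong candidate and a borderline candidate.
strong = sequential_decision(s1_score=75, s1_cut=60, s1_sem=3)
borderline = sequential_decision(s1_score=62, s1_cut=60, s1_sem=3,
                                 s2_score=58, combined_cut=120)
```

In this sketch extra testing time is spent only on candidates near the pass/fail boundary, which is where the slide argues the additional sampling matters most.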
IMPACT OF REMEDIATION OF
FAILING STUDENT
Change in percentile ranks following Y5 resit year
(n=19)
• Median OSCE rankings
improve by 30 percentile
points
• Knowledge test: 20
percentile points
• Longitudinal ‘proportion
of change’ has reduced
in Year 5 over cohorts
• Engagement with
remediation and ITA
‘better’
• Better acceptance of
failure with S1/S2 vs
single test and resit
IMPACT ON THE ‘JUST PASSING /
BORDERLINE STUDENT’
How do those who get brought back for S2 and progress (Y4 →
Y5) do compared with those who only do S1?
• Emerging results – OSCE
• Those who got brought back for S2
OSCE (only) improved their relative
ranking a little over Y4 to Y5 (n=38;
total cohort = 442).
• Those brought back for both OSCE
and knowledge make a smaller
improvement (n=20).
• Evidence from WBA monitoring that
Seq 2 seems to ‘switch on’ better
engagement (epiphany moments).
• Sustained result across 3 cohorts
[Figure: Median percentile change - OSCE. S2 OSCE: 3.3; S2 Both (OSCE): 1.3; S1 OSCE: -1.6]
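A median percentile change of this kind could be computed as follows. This is a minimal sketch, not the analysis behind the slide: it assumes a mid-rank definition of percentile rank (percentage of the cohort scoring below, plus half of those tied), and the student identifiers and scores are invented.

```python
from statistics import median


def percentile_rank(score, cohort_scores):
    """Mid-rank percentile: % of the cohort scoring below, plus half of any ties."""
    below = sum(s < score for s in cohort_scores)
    equal = sum(s == score for s in cohort_scores)
    return 100.0 * (below + 0.5 * equal) / len(cohort_scores)


def median_percentile_change(group_ids, before, after):
    """Median change in percentile rank for a subgroup between two sittings.

    `before` and `after` map student id -> score for the whole cohort, so each
    student is ranked against everyone who sat that exam.
    """
    before_all = list(before.values())
    after_all = list(after.values())
    changes = [
        percentile_rank(after[i], after_all) - percentile_rank(before[i], before_all)
        for i in group_ids
    ]
    return median(changes)


# Invented four-student cohort: student "a" improves, the others score the same.
y4 = {"a": 50, "b": 60, "c": 70, "d": 80}
y5 = {"a": 75, "b": 60, "c": 70, "d": 80}
change_for_a = median_percentile_change(["a"], y4, y5)
change_for_rest = median_percentile_change(["b", "c", "d"], y4, y5)
```

Because ranks are relative, one student's rise pushes others down even when their raw scores are unchanged, which is worth keeping in mind when reading small percentile shifts.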
CAN WE RECONCEPTUALISE
ASSESSMENT IN MEDICAL EDUCATION?
• Can we rebalance our focus between high stakes decision making and psychometric analysis on the one hand, and the consequences for learning and instruction on the other?
• Move to a personalised model of assessment that is more….
• Consequential?
CASE STUDY 4: (WORKPLACE)
ASSESSMENT AS DIAGNOSIS?
• Blended learning curriculum
• Learner held assessment and content on smartphones and tablets across programme
• Assessment for Learning
• Feedback, reflection & student evaluation to learner and assessor
• Big data set and longitudinal impact: ~3000 Year 4 and 5 students (2011+), >50 000 individual workplace assessments
LEARNER ENGAGEMENT AND
PROGRESSION
• 2011
– Emergent correlation between early, sustained WBA engagement and OSCE success (r = 0.327)
– Late onset, bare minimum approach (r = -0.25)
• 2015-17
– Sustained and growing correlation between engagement and success (r = 0.59, p < 0.001). Strong link with SQT outcomes
• ‘Early Alert’ with differential ITA and support for those not engaging
– Worrying correlation between persistent disengagement and SQT outcomes (r = -0.6, p < 0.001)
• Introduction of customised nudges for ‘at risk’ students in 2016-17
• Early results show good outcomes – nudge responders not in S2, better than ‘predicted’ performance
• Correlation now -0.32. What to do with the resisters?
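The engagement figures above are correlation coefficients. As a reminder of what such a figure summarises, here is a minimal sketch of Pearson's r. The engagement measure (number of WBAs completed early in the year) and all of the data are invented for illustration, and the slides do not say which coefficient was actually used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a sample has no variation")
    return sxy / sqrt(sxx * syy)


# Invented data: early WBAs completed per student and their later OSCE score (%).
early_wbas = [2, 5, 8, 11, 14]
osce_scores = [55, 61, 60, 70, 74]
r = pearson_r(early_wbas, osce_scores)
```

A positive r here would mean that students who engage earlier tend to score higher; it does not show that engagement causes the higher score.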
IMPACT ON FEEDBACK
• Assessor comments
– ‘Efficient Rectal
Exam’
• Student response
– ‘Thanks’
• It was very useful to have discussed patient management and discussion prior to seeing this patient.
• I was then able to use the skills taught to present and create a management plan.
• I need to practice patient management discussion but today's clinic has given me the foundation to do so
• You handled this well and were confident enough to take a patient centred and problem list approach including seeking his views about management planning.
• Your presentation and summary were much more fluid and confident and you made good use of practical tips to help manage info from the patient
• Good evidence of reasoning and discrimination and great feedback from the patient who felt confident to seek reassurance from you about his meds
• Next steps - routine use of these approaches in all your notes and build confidence in note taking whilst consulting rather than at the end.
Trends in who students are asking to
assess…
Growth in feedback quantity, quality
and student activity….
Strong correlation with self regulation
and student success…
Conceptual framework of self-
determination, learner identity and
autonomous motivation
A RECONCEPTUALISED FUTURE?
PERSONALISED ‘3C’ ASSESSMENT
• COMPASSIONATE
– Learner initiated assessment - A partnership between teachers and learners
– Well designed tests and programmes of assessment that are sensitive to learner and teacher ‘load’ and the needs of patients and wider society
• CUSTOMISED
– Wider scale adaptive test models that invest in supporting learners of different ability
– Intelligent use of existing structured assessment – and research driven innovation
• CONSEQUENTIAL
– Sensitive use of ‘big data’ (numbers and words) to generate individualised, meaningful feedback, actions and growth
– Focus on personalization of achievement & impact for all stakeholders
THANKS FOR YOUR ATTENTION
References, copies of
slides and comments:
Professor Richard
Fuller
@LeedsARG