European College of Veterinary Internal Medicine Congress 2012
Debbie Jaarsma ([email protected]), Academic Medical Centre, University of Amsterdam
The Netherlands
With many thanks to prof. dr. Cees van der Vleuten, Maastricht University
General principles in assessment of professional competence
SHORT BIOGRAPHY
• VETERINARIAN BY TRAINING
• SPECIALIST TRAINING IN PATHOLOGY AT THE FACULTY OF VETERINARY MEDICINE, UTRECHT UNIVERSITY (FVMU)
• TEACHER AT THE UNIVERSITY OF APPLIED ANIMAL SCIENCES, DEN BOSCH
• SALES MANAGER IN THE PHARMACEUTICAL INDUSTRY (JOHNSON & JOHNSON)
• PhD IN VETERINARY EDUCATION AT FVMU*
• ASSISTANT PROFESSOR ‘QUALITY IMPROVEMENT VETERINARY EDUCATION’ AT FVMU
• FULL PROFESSOR ‘EVIDENCE-BASED EDUCATION’ AT THE ACADEMIC MEDICAL CENTRE, UNIVERSITY OF AMSTERDAM
* Dissertation title: Developments in Veterinary Medical Education – Intentions, Perceptions, Learning Processes and Outcomes
(http://igitur-archive.library.uu.nl/dissertations/2008-1014-200404/UUindex.html)
Welcome everyone!
• How familiar are you with the assessment literature?
A. Not at all
B. To some extent
C. Pretty familiar
D. Like the back of my hand
Overview of presentation
• General principles of assessment
• Implications for practice
• Criteria for good assessment
• Final note
Assessment:
“involves testing, measuring, collecting and combining information, and providing feedback”
• Drives and stimulates learning
• Provides information on the educational efficacy of institutions and teachers
• Protects patients and society
Simple competence model
Miller GE. The assessment of clinical skills/competence/performance. Academic Medicine (Supplement) 1990; 65: S63-S67.
[Figure: Miller's pyramid with levels Knows, Knows how, Shows how and Does. Professional authenticity increases up the pyramid, from cognition (knows) to behaviour (does).]
Simple competence model
Miller GE. The assessment of clinical skills/competence/performance. Academic Medicine (Supplement) 1990; 65: S63-S67.
[Figure: the same pyramid, annotated: assessment at the Knows, Knows how and Shows how levels is established technology; assessment at the Does level is still under construction.]
Assessment formats used
• Knows. Stimulus format: fact oriented. Response format: written, open, computer-based, oral.
• Knows how. Stimulus format: (patient) scenario, simulation. Response format: written, open, oral, computer-based.
• Shows how. Stimulus format: hands-on (patient) standardized scenario or simulation. Response format: direct observation, checklists, rating scales.
• Does. Stimulus format: habitual practice performance. Response format: direct observation, checklists, rating scales, narratives.
What do you think?
• Clinical reasoning, particularly among experts, is:
A. More generic than context specific
B. More context specific than generic
C. Equally specific and generic
Assessment principle 1
• Competence is specific, not generic
Reliability of different assessment formats as a function of testing time (in hours):

Method                         1      2      4      8
MCQ¹                           0.62   0.76   0.93   0.93
Case-based short essay²        0.68   0.73   0.84   0.82
PMP¹                           0.36   0.53   0.69   0.82
Oral exam³                     0.50   0.69   0.82   0.90
Long case⁴                     0.60   0.75   0.86   0.90
OSCE⁵                          0.47   0.64   0.78   0.88
Mini-CEX⁶                      0.73   0.84   0.92   0.96
Practice video assessment⁷     0.62   0.76   0.93   0.93

¹Norcini et al., 1985; ²Stalenhoef-Halling et al., 1990; ³Swanson, 1987; ⁴Wass et al., 2001; ⁵Petrusa, 2002; ⁶Norcini et al., 1999; ⁷Ram et al., 1999; ⁸Gorter, 2002
Competence is not generic
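The pattern in the table above (reliability climbing as testing time grows) is what classical test theory predicts when the sample of items or cases is widened. As a minimal sketch, assuming the tabled coefficients behave like Spearman-Brown projections (the formula is standard psychometrics; applying it to this table is my own illustration), the PMP row can be reproduced from its one-hour value:

```python
# Spearman-Brown prophecy formula: projected reliability when a test
# is lengthened k-fold, given reliability r1 for one unit of testing time.
def spearman_brown(r1: float, k: float) -> float:
    return k * r1 / (1 + (k - 1) * r1)

# PMP reliability is 0.36 for 1 hour of testing (Norcini et al., 1985).
for hours in (1, 2, 4, 8):
    print(f"{hours} h: {spearman_brown(0.36, hours):.2f}")
# Output: 0.36, 0.53, 0.69, 0.82 -- matching the PMP row of the table.
```

The same projection shows why no short test, however well constructed, escapes the sampling problem: reliability is bought with broader sampling, not with a particular format.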
Practical implications
• Competence is specific, not generic
• One measure is no measure
• Increase sampling (across content, examiners, patients, …) within measures
• Combine information across measures and across time
What do you think?
• Multiple-choice questions are objective and therefore more reliable.
A. True
B. False
Assessment principle 2
• Objectivity is not the same as reliability
Reliability of different assessment formats as a function of testing time (in hours):

Method                         1      2      4      8
MCQ¹                           0.62   0.76   0.93   0.93
Case-based short essay²        0.68   0.73   0.84   0.82
PMP¹                           0.36   0.53   0.69   0.82
Oral exam³                     0.50   0.69   0.82   0.90
Long case⁴                     0.60   0.75   0.86   0.90
OSCE⁵                          0.47   0.64   0.78   0.88
Mini-CEX⁶                      0.73   0.84   0.92   0.96
Practice video assessment⁷     0.62   0.76   0.93   0.93

¹Norcini et al., 1985; ²Stalenhoef-Halling et al., 1990; ³Swanson, 1987; ⁴Wass et al., 2001; ⁵Petrusa, 2002; ⁶Norcini et al., 1999; ⁷Ram et al., 1999; ⁸Gorter, 2002
Objectivity is not the same as reliability
Reliability of the oral examination
Testing time (hrs):             1      2      4      8
Number of cases:                2      4      8      12
Two new examiners per case:     0.61   0.76   0.86   0.93
New examiner per case:          0.50   0.69   0.82   0.90
Same examiner for all cases:    0.31   0.47   0.47   0.48
Swanson, 1987
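A small simulation can make the examiner columns concrete. This is an illustrative sketch under assumptions of my own (normally distributed examiner stringency and case effects with arbitrary variances; it is not Swanson's analysis): with a new examiner for every case, both case and examiner noise average out as cases are added, while a single fixed examiner leaves a stringency bias that no amount of extra cases removes.

```python
import random
import statistics  # statistics.correlation requires Python 3.10+

random.seed(1)

def observed_score(ability: float, n_cases: int, new_examiner_each_case: bool) -> float:
    """Mean judgement over n_cases; examiner stringency is redrawn per case
    (wide sampling) or fixed for the whole exam (single examiner)."""
    fixed_stringency = random.gauss(0, 0.6)
    total = 0.0
    for _ in range(n_cases):
        stringency = random.gauss(0, 0.6) if new_examiner_each_case else fixed_stringency
        total += ability + stringency + random.gauss(0, 0.6)  # case specificity
    return total / n_cases

for n_cases in (2, 4, 8, 12):
    for new_each in (True, False):
        abilities = [random.gauss(0, 1) for _ in range(2000)]
        scores = [observed_score(a, n_cases, new_each) for a in abilities]
        r = statistics.correlation(abilities, scores)  # its square ~ reliability
        design = "new examiner/case" if new_each else "same examiner"
        print(f"{n_cases:2d} cases, {design:17s}: r = {r:.2f}")
# With a fixed examiner the correlation plateaus rather than approaching 1
# as cases are added: the same pattern as the flat 0.47/0.48 column above.
```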
Practical implications
• Objectivity is not the same as reliability
• Don't trivialize assessment (and compromise validity) with unnecessary objectification and standardization
• Don't be afraid of holistic judgement
• Sample widely across sources of subjective influence (raters, examiners, patients)
What do you think?
• Which format measures ‘understanding’ best?
A. MCQs
B. Essay questions
C. Orals
D. All of the above
Assessment principle 3
• What is being measured is determined more by the stimulus format than by the response format.
Assessment formats used
• Knows. Stimulus format: fact oriented. Response format: written, open, computer-based, oral.
• Knows how. Stimulus format: (patient) scenario, simulation. Response format: written, open, oral, computer-based.
• Shows how. Stimulus format: hands-on (patient) standardized scenario or simulation. Response format: direct observation, checklists, rating scales.
• Does. Stimulus format: habitual practice performance. Response format: direct observation, checklists, rating scales, narratives.
Empirical findings
• Once measures are reliable (i.e. sampling is sufficient), correlations across formats are high
• Cognitive activities follow the task you pose in the stimulus format
Moving from assessing ‘knows’¹
Knows: What is arterial blood gas analysis most likely to show in dogs with cardiogenic shock?
A. Hypoxemia with normal pH
B. Metabolic acidosis
C. Metabolic alkalosis
D. Respiratory acidosis
E. Respiratory alkalosis
¹ Case, S. M., & Swanson, D. B. (2002). Constructing written test questions for the basic and clinical sciences. Philadelphia: National Board of Medical Examiners.
To assessing ‘knowing how’¹
Knowing how: A 7-year-old bitch is brought to the emergency department. She is restless and panting. On admission, her temperature is 37.7 °C, pulse is 120/min, and respirations are 40/min. During the next hour she becomes increasingly stuporous, her pulse increases to 140/min, and her respirations increase to 60/min. Blood gas analysis is most likely to show:
A. Hypoxemia with normal pH
B. Metabolic acidosis
C. Metabolic alkalosis
D. Respiratory acidosis
E. Respiratory alkalosis
¹ Case, S. M., & Swanson, D. B. (2002). Constructing written test questions for the basic and clinical sciences. Philadelphia: National Board of Medical Examiners.
Practical implications
• What is being measured is determined more by the stimulus format than by the response format
• Don't be married to a format (e.g. essays)
• Worry about improving the stimulus format
• Make the stimulus as (clinically) authentic as possible (e.g. in MCQs, OSCEs)
What do you think?
• The best strategy for constructing good test material is:
A. Training staff to write test material
B. Peer review of test material
Assessment principle 4
• Validity can be ‘built-in’
Empirical findings
• Validity is a matter of good quality assurance around item construction (Verhoeven et al., 1999)
• Generally, medical (and veterinary) schools can do a much better job (Jozefowicz et al., 2002)
Item review process
[Figure: quality-assurance cycle for test items. Discipline representatives (anatomy, physiology, internal medicine, surgery, psychology) submit items to an item pool; a review committee screens them before test administration (pre-test review). After administration, item analyses and student comments inform a post-test review; results are reported to users and accepted items enter the item bank.]
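To make the 'item analyses' step of the post-test review concrete, here is a minimal sketch (an assumed implementation, not taken from the slides) of two classical statistics a review committee typically inspects: item difficulty (proportion correct) and item-rest discrimination.

```python
from statistics import correlation, pstdev  # correlation requires Python 3.10+

def item_stats(responses: list[list[int]]) -> list[tuple[float, float]]:
    """responses: one list of 0/1 item scores per student.
    Returns (difficulty, discrimination) per item."""
    n_items = len(responses[0])
    totals = [sum(student) for student in responses]
    stats = []
    for i in range(n_items):
        item = [student[i] for student in responses]
        p = sum(item) / len(item)                     # difficulty (p-value)
        rest = [t - s for t, s in zip(totals, item)]  # total minus this item
        r_ir = correlation(item, rest) if pstdev(item) > 0 and pstdev(rest) > 0 else 0.0
        stats.append((p, r_ir))
    return stats

# Hypothetical response matrix: 5 students x 4 items.
resp = [[1, 1, 0, 1], [1, 0, 0, 1], [0, 1, 0, 1], [1, 1, 1, 1], [0, 0, 0, 1]]
for i, (p, r) in enumerate(item_stats(resp), start=1):
    print(f"item {i}: difficulty = {p:.2f}, discrimination = {r:.2f}")
# Items with extreme difficulty or near-zero discrimination would be
# flagged for the review committee rather than banked as-is.
```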
Practical implications
• Validity can be ‘built-in’
• Outcomes and content need to be clear and known to the item constructors AND the learners
• Assessment is only as good as the effort you are prepared to put into it
• Develop quality-assurance cycles around test development
• Share (good) test material across institutions
What do you think?
• What drives students’ learning most?
A. The teacher
B. The curriculum
C. The assessment
Assessment principle 5
• Assessment drives learning

[Figure: the conventional view: curriculum and teacher drive the student, with assessment at the end of the chain.]

An alternative view
[Figure: assessment itself drives the learner.]

Assessment may drive learning through:
• Content
• Format
• Programming/scheduling
• Regulations
• …
Empirical findings
• The relationship between assessment and learning is complex
• Learning strategy is mediated by students’ perceptions of the assessment
• Summative assessment systems often drive learning in a negative way
• Formative feedback has a dramatic impact on learning
• Learners want feedback (more than grades)
Practical implications
• Assessment drives learning
• For every evaluative action there is an educational reaction
• Verify and monitor the impact of assessment: many intended effects are not realized in practice
• No assessment without feedback!
• Embed the assessment within the learning programme
• Use assessment strategically to reinforce desirable learning behaviours
What do you think?
• The best method of assessment is:
A. Vignette based MCQs
B. Orals
C. Portfolios
D. None of these
Assessment principle 6
• No single method can do it all
Empirical findings
• One measure is no measure
• All methods have limitations (no single method is superior)
• Different methods may serve different functions
• In combination, information from various methods provides a richer picture and combines formative and summative functions
Practical implications
• No single method can do it all
• Use a cocktail of methods across the competency pyramid (see the sketch after this list)
• Arrange methods in a programme of assessment
• Any method may have utility
• Approach assessment design as you would curriculum design
• Assign responsible people/committees
• Use an overarching structure
• Involve your stakeholders
• Implement, monitor and change (assessment programmes ‘wear out’)
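As a minimal numeric sketch of combining information across measures (the instrument names, scores and weights below are hypothetical, and in a real assessment programme such numeric aggregation would sit alongside narrative information):

```python
from statistics import mean, pstdev

def composite(scores_by_method: dict[str, list[float]], weights: dict[str, float]) -> list[float]:
    """Standardize each method's scores to z-scores, then form a
    weighted composite per candidate."""
    z = {m: [(s - mean(v)) / pstdev(v) for s in v] for m, v in scores_by_method.items()}
    n_candidates = len(next(iter(z.values())))
    total_w = sum(weights.values())
    return [sum(weights[m] * z[m][i] for m in z) / total_w for i in range(n_candidates)]

# Hypothetical scores for four candidates on three instruments.
scores = {"mcq": [62, 74, 58, 81], "osce": [70, 65, 60, 78], "mini_cex": [3.1, 3.8, 2.9, 4.2]}
weights = {"mcq": 2, "osce": 3, "mini_cex": 3}  # e.g. weighted by sampling breadth
print([round(c, 2) for c in composite(scores, weights)])
```

Standardizing before combining keeps one instrument's scale from dominating the composite; how to weight the instruments is a design decision for the assessment programme, not a statistical given.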
Assessment principles
1. Competence is specific, not generic
2. Objectivity is not the same as reliability
3. What is being measured is determined more by the stimulus format than by the response format
4. Validity can be ‘built-in’
5. Assessment drives learning
6. No single method can do it all
Criteria for good assessment
• Validity or Coherence
• Reproducibility or Consistency
• Feasibility
• Educational effect
• Catalytic effect
• Acceptability
Theme group ‘Criteria for Good Assessment’, Norcini et al., Medical Teacher 2011
Finally
• Assessment in medical education has a rich history of research and development with clear practical implications
• Veterinary education is catching up …
• Assessment is much more than psychometrics; it involves educational design
• Lots of exciting developments still lie ahead of us!
Literature
• Cilliers, F. (In preparation). Assessment impacts on learning, you say? Please explain how. The impact of summative assessment on how medical students learn.
• Driessen, E., Van Tartwijk, J., Van der Vleuten, C., & Wass, V. (2007). Portfolios in medical education: why do they meet with mixed success? A systematic review. Medical Education, 41(12), 1224-1233.
• Driessen, E. W., Van der Vleuten, C. P. M., Schuwirth, L. W. T., Van Tartwijk, J., & Vermunt, J. D. (2005). The use of qualitative research criteria for portfolio assessment as an alternative to reliability evaluation: a case study. Medical Education, 39(2), 214-220.
• Dijkstra, J., Schuwirth, L., & Van der Vleuten, C. (In preparation). A model for designing assessment programmes.
• Eva, K. W., & Regehr, G. (2005). Self-assessment in the health professions: a reformulation and research agenda. Academic Medicine, 80(10 Suppl), S46-S54.
• Gorter, S., Rethans, J. J., Van der Heijde, D., Scherpbier, A., Houben, H., Van der Vleuten, C., et al. (2002). Reproducibility of clinical performance assessment in practice using incognito standardized patients. Medical Education, 36(9), 827-832.
• Govaerts, M. J., Van der Vleuten, C. P., Schuwirth, L. W., & Muijtjens, A. M. (2007). Broadening perspectives on clinical performance assessment: rethinking the nature of in-training assessment. Advances in Health Sciences Education, 12, 239-260.
• Hodges, B. (2006). Medical education and the maintenance of incompetence. Medical Teacher, 28(8), 690-696.
• Jozefowicz, R. F., Koeppen, B. M., Case, S. M., Galbraith, R., Swanson, D. B., & Glew, R. H. (2002). The quality of in-house medical school examinations. Academic Medicine, 77(2), 156-161.
• Meng, C. (2006). Discipline-specific or academic? Acquisition, role and value of higher education competencies. PhD dissertation, Universiteit Maastricht, Maastricht.
• Norcini, J. J., Swanson, D. B., Grosso, L. J., & Webster, G. D. (1985). Reliability, validity and efficiency of multiple choice question and patient management problem item formats in assessment of clinical competence. Medical Education, 19(3), 238-247.
• Papadakis, M. A., Hodgson, C. S., Teherani, A., & Kohatsu, N. D. (2004). Unprofessional behavior in medical school is associated with subsequent disciplinary action by a state medical board. Academic Medicine, 79(3), 244-249.
• Papadakis, M. A., Teherani, A., et al. (2005). Disciplinary action by medical boards and prior behavior in medical school. New England Journal of Medicine, 353(25), 2673-2682.
• Papadakis, M. A., Arnold, G. K., et al. (2008). Performance during internal medicine residency training and subsequent disciplinary action by state licensing boards. Annals of Internal Medicine, 148, 869-876.
• Petrusa, E. R. (2002). Clinical performance assessments. In G. R. Norman, C. P. M. Van der Vleuten & D. I. Newble (Eds.), International Handbook of Research in Medical Education (pp. 673-709). Dordrecht: Kluwer Academic Publishers.
• Ram, P., Grol, R., Rethans, J. J., Schouten, B., Van der Vleuten, C. P. M., & Kester, A. (1999). Assessment of general practitioners by video observation of communicative and medical performance in daily practice: issues of validity, reliability and feasibility. Medical Education, 33(6), 447-454.
• Stalenhoef-Halling, B. F., Van der Vleuten, C. P. M., Jaspers, T. A. M., & Fiolet, J. B. F. M. (1990). A new approach to assessing clinical problem-solving skills by written examination: conceptual basis and initial pilot test results. Paper presented at the Teaching and Assessing Clinical Competence conference, Groningen.
• Swanson, D. B. (1987). A measurement framework for performance-based tests. In I. Hart & R. Harden (Eds.), Further Developments in Assessing Clinical Competence (pp. 13-45). Montreal: Can-Heal Publications.
• Van der Vleuten, C. P., Schuwirth, L. W., Muijtjens, A. M., Thoben, A. J., Cohen-Schotanus, J., & Van Boven, C. P. (2004). Cross institutional collaboration in assessment: a case on progress testing. Medical Teacher, 26(8), 719-725.
• Van der Vleuten, C. P. M., & Swanson, D. B. (1990). Assessment of clinical skills with standardized patients: state of the art. Teaching and Learning in Medicine, 2(2), 58-76.
• Van der Vleuten, C. P. M., & Newble, D. I. (1995). How can we test clinical reasoning? The Lancet, 345, 1032-1034.
• Van der Vleuten, C. P. M., Norman, G. R., & De Graaff, E. (1991). Pitfalls in the pursuit of objectivity: issues of reliability. Medical Education, 25, 110-118.
• Van der Vleuten, C. P. M., & Schuwirth, L. W. T. (2005). Assessment of professional competence: from methods to programmes. Medical Education, 39, 309-317.
• Van der Vleuten, C. P. M., & Schuwirth, L. W. T. (Under editorial review). On the value of (aggregate) human judgment. Medical Education.
• Van Luijk, S. J., Van der Vleuten, C. P. M., & Schelven, R. M. (1990). The relation between content and psychometric characteristics in performance-based testing. In W. Bender, R. J. Hiemstra, A. J. J. A. Scherpbier & R. P. Zwierstra (Eds.), Teaching and Assessing Clinical Competence (pp. 202-207). Groningen: Boekwerk Publications.
• Wass, V., Jones, R., & Van der Vleuten, C. (2001). Standardized or real patients to test clinical competence? The long case revisited. Medical Education, 35, 321-325.
• Williams, R. G., Klamen, D. A., & McGaghie, W. C. (2003). Cognitive, social and environmental sources of bias in clinical performance ratings. Teaching and Learning in Medicine, 15(4), 270-292.