2008; 30: e66–e76
WEB PAPER
The use of a virtual patient case in an OSCE-based exam – A pilot study
O. COURTEILLE, R. BERGIN†, D. STOCKELD, S. PONZER & U. FORS
Karolinska Institutet, Stockholm, Sweden
Abstract
Background: This study focuses on a skills-test-based clinical assessment in which 118 fourth-year medical students at the four
teaching hospitals of Karolinska Institutet participated in the same 12-module OSCE. The goal of one of the twelve examination
modules was to assess the students’ skills and ability to solve a virtual patient (VP) case (the ISP system), which included medical
history taking, lab tests, physical examinations and suggestion of a preliminary diagnosis.
Aims: The primary aim of this study was to evaluate the potential of a VP as a possible tool for assessment of clinical reasoning and
problem solving ability among medical students. The feeling of realism of the VP and its possible affective impact on the student’s
confidence were also investigated.
Method: We observed and analysed students' reactions, engagement and performance (activity log files) during their interactive sessions with the simulation. An individual human assistant was provided along with the computer simulation, and the videotaped student–assistant interaction was then analysed in detail and related to the students' outcomes.
Results: The results indicate possible advantages of using ISP-like systems for assessment. The VP was, for instance, able to differentiate reliably between students' performances, but some weaknesses were also identified, such as a confounding influence of the assistants on students' outcomes. Significant differences, affecting the results, were found between the students in their degree of affective response towards the system as well as in the perceived usefulness of assistance.
Conclusion: Students need to be trained beforehand in mastering the assessment tool. Rating compliance needs to be targeted before VP-based systems like ISP can be used in exams, and if such systems are to be used in high-stakes exams, the use of human assistants should be limited and scoring rubrics validated (and preferably automated).
Introduction
Over recent years, we have seen an increasing level of use of
simulated and virtual patients (computer-based simulations
of patients) for both training and assessment in medical
education (Cantillon et al. 2004; Guagnano et al. 2002).
This has been extensively reported by Issenberg et al. (2005)
in a BEME systematic review (guide no 4) where both
problems and opportunities were thoroughly investigated.
Studies investigating the usefulness of simulations have shown, for instance, that students might react similarly to real and simulated patients (Sanson-Fisher & Poole 1980).
In the study by Edelstein et al. (2000), students thought that
computer-based case simulations were better tests of clinical
decision making than written shelf examinations. Schuwirth
& Van der Vleuten (2003) investigated the problem of
construct and face validity and described how to meet the
need for assessment procedures that are both authentic and
well-structured.
In their educational assessment guidelines, Appel et al. (2002) even recommend that Clerkship Directors 'use computer-based case simulations to augment traditional internal medicine evaluation methods' and that these 'would be used as a supplement to, and not a replacement for, other assessment tools' (http://dx.doi.org/10.1016/S0002-9343(02)01211-1). In a special themed article, Holmboe (2004)
states that simulated patients and other simulation technologies
are considered as ‘being important and reliable tools for
teaching clinical skills and evaluating competence’ but also
emphasises that ‘they cannot substitute to the direct observation
by faculty of trainees’ clinical skills with actual patients’. Hence,
because of deficiencies in faculty direct observation evaluation
skills, automated scoring of patient interactions has been
proposed as a way to limit the effort required for mentor
evaluation (Nielson et al. 2003). However, we need to be aware
Practice points
. Virtual Patient cases with automated scoring might be
used as a complementary method for summative
assessment.
. Students need to be trained in mastering the assessment
system prior to exams.
. Scoring rubrics should be developed and validated
before implementing computer-based assessment.
. Human assistance should be limited because of possible
confounding influence on students’ outcomes.
Correspondence: Olivier Courteille, Karolinska Institutet, Berzelius väg 3, S-171 77 Stockholm, Sweden. Tel: 46-8-524 8 7289; fax: 46-8-34 51 28;
email: olivier.courteille@ki.se
†Deceased.
ISSN 0142–159X print/ISSN 1466–187X online/08/030066–11 © 2008 Informa UK Ltd.
DOI: 10.1080/01421590801910216
of the necessity first to ensure the construct validity of automated
scores (Collins & Harden 1998).
Simulated patients have therefore been suggested to be
useful as assessment tools in Objective Structured Clinical
Examinations (OSCEs) or in other assessment methods in
evaluating students’ interactions with patient related medical
issues, such as clinical reasoning and/or medical problem
solving abilities (Collins 1992; Schuwirth & Van der Vleuten
2003). More recently, virtual patients (VP) have been gradually
introduced as a complementary method to simulated patients
because they support active and reflective learning (Clyman &
Orr 1990; McGaghie 1999).
Interactive simulation of patients (ISP)
ISP is a comprehensive, high-fidelity virtual patient-based learning tool designed for medical and healthcare students (clinical level) to explore and solve clinical cases with respect to diagnosis (Bergin & Fors 2003). The system aims to help students practise clinical reasoning skills and has been designed for that purpose to resemble a realistic patient encounter, covering extensive functions for medical history taking, physical examination procedures and laboratory/imaging tests.
To achieve a sense of authenticity in a virtual healthcare situation, the ISP enables students to navigate freely, without any predefined path, and to choose, for example, to
. take the patient's illness history and ask any medically related question (via an interactive dialogue with free-text entry, natural language processing and video-clip-based answers; see Figure 1);
. 'perform' any physical examination (most medical examination procedures are available; see Figure 2);
. request one or several laboratory/imaging tests;
. suggest a preliminary diagnosis;
. go back to the illness history section, order more lab tests and/or perform further physical examinations;
. ask for feedback.
The ISP is also designed to act as naturally as possible, meaning that the virtual patient can be programmed to 'react' emotionally to, for example, repeated questions, unnecessary questions, or unwanted questions related to sexual behaviour (if not medically indicated) (Bergin & Fors 2003; Bergin et al. 2003).
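To make the dialogue function described above concrete, the sketch below shows one minimal way in which a free-text history question could be mapped to a pre-recorded video-clip answer. It is only an illustration: the ISP system's actual natural language processing is not described here, and the question bank, keywords and matching rule are assumptions made for the example.

# Hypothetical sketch of keyword-based matching of a free-text history
# question to a pre-recorded answer clip. Question ids, keywords and file
# names are illustrative, not part of the ISP system.
QUESTION_BANK = {
    "bowel_habits": {"keywords": {"bowel", "stool", "constipation", "diarrhoea"},
                     "answer_clip": "clips/bowel_habits.mp4"},
    "rectal_bleeding": {"keywords": {"blood", "bleeding", "rectal"},
                        "answer_clip": "clips/rectal_bleeding.mp4"},
    "weight_loss": {"keywords": {"weight", "appetite"},
                    "answer_clip": "clips/weight_loss.mp4"},
}

def match_question(free_text):
    """Return the best-matching question id and answer clip, or None."""
    words = set(free_text.lower().split())
    best_id, best_overlap = None, 0
    for qid, entry in QUESTION_BANK.items():
        overlap = len(words & entry["keywords"])
        if overlap > best_overlap:
            best_id, best_overlap = qid, overlap
    if best_id is None:
        return None  # would trigger a generic "please rephrase" response
    return best_id, QUESTION_BANK[best_id]["answer_clip"]

print(match_question("Have you noticed any rectal bleeding recently?"))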
Figure 1. ISP screenshot from the history taking. Note the free text input in natural language.
Although the case scenarios are pre-defined, the interaction itself is unscripted. The scenario establishes initial conditions, but the student's responses to the virtual patient, as well as
the inherent flexibility in how the ISP patient is allowed to react, are intended to result in an authentic case interaction. ISP is also
capable of tracking the individual learner’s decision paths and
interactions during the case encounter in detail.
Previous studies have shown that ISP is an engaging and trustworthy learning tool (Bergin et al. 2003) with the potential to offer realistic patient case scenarios. It has therefore been suggested that this case simulation method be used as one of several possible assessment methods in a clinical exam.
Objective structured clinical examination
OSCE is a well-known assessment method that has been
developed for use in both formative and summative
assessment of students’ clinically related knowledge and
performance (Harden 1990; Clyman & Orr 1990). The basic ideas behind the OSCE are that this type of assessment is structured – aiming to be more objective than merely passive observation during clinical rotations – and that it is intended to assess clinically relevant procedures such as patient interviews, medical decision making and practical tasks (Harden et al. 1992; Newble 1992).
However, OSCE exams also have some potential problems: for instance, they can be very resource intensive in terms of the
number of teachers/observers needed as well as the logistics
involved in terms of time allocation and facilities (Van der
Vleuten et al. 1989). Even though they are intended to be
structured and objective, the judgments are still made by
individual teachers/evaluators whose assessment criteria
might jeopardize both reliability and validity (Weatherall
1991; Wilson et al. 1969).
Aim
The primary aim of this pilot study was to assess the potential
of a VP case as part of an examination of clinical reasoning and
problem solving ability among medical students. Our research
questions were the following:
. Can a VP learning tool like ISP be used for assessment and
thus be able to differentiate reliably between students’
performances, including gender-related differences?
. What are the necessary modifications and do the students
need to be trained beforehand?
Secondary aims were to try to measure the feeling of
realism of the ISP system and perform a preliminary
observation of the affective impact of the VP on the student’s
confidence and ability to solve clinical problems. This led us to
formulate an additional research question as follows:
. Can we observe an emotional influence on the social interaction despite time and situational constraints?
Figure 2. ISP screenshot from the physical examination.
Methods
The OSCE exam
The study was conducted in May 2004 during the OSCE
procedure at the surgery course of the five and a half-year long
medical programme at Karolinska Institutet (KI), Stockholm,
Sweden. The general examination goal of the OSCE in the
surgery course was to assess the students’ skills and ability to
perform basic physical examination (e.g. examination of the
knee joint) or procedures (e.g. local anaesthesia) and also to
assess the students’ ability to carry out a systematic patient
interview, in order to figure out the diagnosis and suggest
adequate examination and treatment as defined in the
curriculum for the course. During the examination, the
students rotated through a series of 12 different skills stations
during 90 minutes. They had to perform a specific task at each
station and their performance was scored by an observer using
a predetermined checklist or rating scale. Different session
lengths were allocated to the stations depending on the task:
5 minutes for each of the eight ‘short stations’ and 10 minutes
for each of the four 'long stations'. At each hospital, the four long stations consisted of two ISP stations and two Standardized Patient (SP) stations (human actors); see Figure 3.
The common goal for the participants at both the SP and ISP
stations was to take a short history and then either to inform
the patient about the findings (SP) or to make a diagnosis (ISP)
within 10 minutes.
All of the fourth-year students (n = 118) enrolled in the 20-week clinical course in surgery participated in the OSCE exam. They were distributed across four university hospitals (later on referred to as H1, H2, H3 and H4) and divided into two subgroups of up to 16 students each.
The ISP station and the case
Since ISP was not originally designed for assessment purposes,
special customizations and set-ups had to be considered.
Two surgeons were involved in the redesign process of an
existing colorectal case (requiring about 16 person-hours
in total). The customized technical design and content
management was performed by two developers and required
about 48 person-hours.
Furthermore, none of the students had tried the ISP system
before, so in order to facilitate the use of ISP in this pilot study,
every ISP station was assigned an assistant who knew how to
run the system. Moreover, since this was a pilot test, it was decided that performance on the ISP station could only affect the students' results positively, meaning that their overall test score might increase if the ISP case was handled very well, but that no student could be down-graded due to a bad result on ISP. Thus no student would fail the OSCE exam because of a bad performance on ISP.
The basic requirements for solving the ISP case were
considerably simplified as compared with the cases used for
learning, in order to make it possible to complete the case
within 10 minutes. For instance, fewer lab tests were available in the simulation than usual (but still enough not to give away too many clues or reveal the nature of the case) and the illness history was made unusually straightforward, with a very 'cooperative' patient. The case used was based on a 68-year-old female with rectal cancer and with a relatively clear medical history and symptoms.
Figure 3. Set-up for short (5 min each) and long (10 min each) stations.
To facilitate grading, the case creator (an experienced
clinical teacher with expertise in the actual medical domain)
was asked to specify the most important illness history
questions as well as the most relevant physical examination
procedures and laboratory tests. Two other senior clinicians
also validated the case. In total, 27 illness history questions (out
of which 11 were judged as required), 12 physical examinations
(out of which 10 required) and 11 lab tests (out of which
7 required) were identified as important for this ISP case.
When running the ISP case, none of the required questions/procedures, nor any feedback on these, were revealed to the students until a correct diagnosis had been submitted. The system also automatically recorded all interactions made with it. The results were stored in a database to enable further analyses, for instance computing how many of the most relevant questions had been asked and procedures undertaken, in what order they appeared, and the time elapsed for each task.
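As an illustration of the kind of log-based analysis just described, the sketch below computes how large a share of the case author's required items a student covered. The log format, field names and example ids are assumptions; only the required-item counts (11 history questions, 10 physical examinations, 7 lab tests) come from the case description above.

# Illustrative sketch (not the actual ISP database schema): computing the
# coverage of required items from a student's interaction log.
REQUIRED_COUNTS = {"history": 11, "physical": 10, "lab": 7}  # from the case description

def coverage(log_entries, required_ids):
    """Fraction of required item ids that occur in the student's log entries."""
    performed = {entry["item_id"] for entry in log_entries}
    return len(performed & required_ids) / len(required_ids)

# Hypothetical required history-question ids and a hypothetical student log
# (item id plus elapsed time in seconds).
required_history = {f"hq{i:02d}" for i in range(1, REQUIRED_COUNTS["history"] + 1)}
student_log = [{"item_id": f"hq{i:02d}", "elapsed_s": 30 * i} for i in (1, 2, 3, 5, 8)]

print(f"Required history questions covered: {coverage(student_log, required_history):.0%}")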
Assistants
An assistant, whose task was to introduce the system briefly to the student and help him or her navigate and interact smoothly with the programme, was available at each ISP station (n = 8). The assistant was also instructed, if needed, to help the student formulate system-compliant history questions by offering to handle the keyboard, and/or to get the student back on track if he or she was deviating too much from the objectives (due to lack of familiarity).
In order to avoid interference with the results, the assistants were instructed not to reveal the correct diagnosis to the students or to supply too much 'medical help' in solving the case. When the
ISP session was over, the assistants handed out a paper-based
summary of the case, including formative feedback to the
examinees.
The enrolled assistants were physicians, clinical teachers
or other persons who were very familiar with the ISP system.
Data collection
The ISP system automatically tracked each student’s interac-
tion. This information, gathered in log files, contained
complete and detailed chronological data on medical history
questions asked, physical examinations performed and lab
tests ordered, interaction time and navigation paths, and finally
diagnoses suggested/submitted, confidence scale and justifica-
tion of the suggested diagnoses.
Two questionnaires were also administered during the OSCE procedure: a main questionnaire (online-based and delivered as one of the four long stations) evaluating the surgical course and the OSCE as a whole, and a second questionnaire specific to the ISP (paper-based and handed out to the participants immediately after the ISP sessions). The students were
asked to return the ISP-specific questionnaire by post.
This questionnaire was anonymous and collected information
on demographic data as well as on usability, attitudes and
expectations towards the ISP as a new examination tool.
Video observations
Additionally, in order to measure the possible affective impact
of ISP on examinees, video observations were performed on
four of the eight ISP stations. A DV video camera with a wide-angle lens was placed behind the computer screen to capture both the interaction between the student and the assistant and the student's own interaction with the ISP system. A coding process for further analysis was
developed and based on the following variables: interaction
assistant/student (weak, medium, intensive); expression of
uncertainty (doubtful, neutral, certain); external signs of stress
(low, medium, high); flow (frustrating, normal, playful); mouse
handling (student, assistant) and finally keyboard handling
(student, assistant).
An exact binary logistic regression analysis (LogXact 7.0,
Cytel Software Inc.) was carried out to investigate the
association between indicators like the behavioural variables
described above (exposure variables) and the student’s
outcomes (outcome variable).
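The exact logistic regression itself was run in LogXact; as a rough, simplified stand-in, the sketch below computes an unadjusted odds ratio for a single binary exposure (for example, intensive vs. weak assistance) against a binary outcome (correct diagnosis) using Fisher's exact test. The counts are made up for illustration, and the method is deliberately simpler than the exact regression used in the study.

# Simplified stand-in for the association analysis described above: an
# unadjusted odds ratio and Fisher's exact test for one 2x2 comparison.
# The counts below are hypothetical, not the study data.
from scipy.stats import fisher_exact

#                 correct   not correct
table = [[18, 4],   # intensive assistance
         [ 2, 6]]   # weak assistance

odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"unadjusted odds ratio = {odds_ratio:.2f}, p = {p_value:.4f}")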
Grading/assessment potential
To study the potential of ISP as an assessment tool to
differentiate between students’ individual performances, an
initial hypothesis was set up: there should be ‘enough’
variability among the students’ individual results in solving
the case (Friedman Ben-David 2000). Typical ordinary exams
at KI usually have a level for passing set at about 70–80%
correct answers, and the results of most exams show that most
students pass this level, but that some are well below that level
on the first trial. Therefore, we set a ‘pass rate’ of about 70–80%
on the ISP station as a goal.
To study the potential of ISP to present cases of appropriate complexity, the variability in suggesting a correct diagnosis and the percentages of correct illness history questions, physical examinations and lab tests proposed were also measured.
Furthermore, to judge this pilot test as positive, it was stated
that most students should be able to come to a preliminary
diagnosis within the time limit allocated to the long
OSCE stations (10 min). The completion time was
therefore computed.
Ethical considerations
This study was approved by the ethical committee of
Karolinska Institutet. In order to comply with these ethical
considerations, a consent form was handed beforehand to
every examinee informing him/her about the specific aspect of
the ISP station, including possible video observation, and
that the ISP station could not influence their grading in a
negative way.
Results
Overall results
Altogether, 110 students out of the 118 in the course
participated in the OSCE examination. The eight missing
were either sick or had other allowed excuses. All 110 students
volunteered to use the ISP station. The overall performance
can be observed in Table 1 below.
As can be seen, the 110 students worked with the ISP case
for about 7 minutes 45 seconds on average. Most of the
students arrived at a correct diagnosis after 1.12 tries on
average. However, as indicated in Table 1, the assistants at
hospital 1 seemed to have helped the students to arrive at a
correct diagnosis, since a larger than anticipated majority of the
students at hospital 1 found the correct diagnosis. These two
assistants later on confirmed this assumption in part.
Furthermore, the students at hospital 3 had only 40%
(group 1) and 42% (group 2) correct diagnoses on average,
which might indicate that their assistants were less helpful than
the other assistants. This was also indicated by the fact that
the same students ran out of time, limiting the possibility of
submitting a second diagnosis. However, it is interesting
to note that all students who had time to submit a diagnosis
already succeeded on the first trial. Figure 4 shows the
direct effect of the degree of assistance on the mean
session time.
In Table 2, the detailed results from the medical history
and lab sections of ISP are shown. Unfortunately due to
an unanticipated error in the logging system, the
physical exam procedures were not logged. As can be seen,
on average 8.8 questions were asked to the patient and
63% of these were highly relevant. Those who asked the
most history questions (e.g. hospital groups 2.1, 3.1 and 3.2)
formulated them with a higher number of words
per sentence. As a result, their completion time (above
8 min) was also longer than for other groups. On average,
Table 1. Average session durations, number of correct diagnoses and the average of students' own estimation of their confidence in suggesting a correct diagnosis for the ISP-OSCE pilot.

Hospital group | No. of examinees | Mean session time | Correct diagnosis | Correct on first trial | Only 1 submitted diagnosis | More than 1 submitted diagnosis | Confidence level
H1, group 1.1 | 17 | 6 min 36 sec | 94% | 76% | 82% | 18% | 88%
  Males | 2 | 7 min 26 sec | 100% | 50% | 50% | 50% | 75%
  Females | 15 | 6 min 29 sec | 93% | 86% | 87% | 13% | 90%
H1, group 1.2 | 12 | 7 min 06 sec | 100% | 100% | 100% | 0% | 95%
  Males | 4 | 7 min 05 sec | 100% | 100% | 100% | 0% | 93%
  Females | 8 | 7 min 07 sec | 100% | 100% | 100% | 0% | 96%
H2, group 2.1 | 11 | 8 min 56 sec | 91% | 54% | 64% | 36% | n/a
  Males | 6 | 9 min 39 sec | 83% | 50% | 66% | 34% | n/a
  Females | 5 | 8 min 04 sec | 100% | 60% | 60% | 40% | n/a
H2, group 2.2 | 15 | 8 min 22 sec | 87% | 67% | 80% | 20% | 93%
  Males | n/a | n/a | n/a | n/a | n/a | n/a | n/a
  Females | n/a | n/a | n/a | n/a | n/a | n/a | n/a
H3, group 3.1 | 15 | 8 min 12 sec | 40% | 40% | 100% | 0% | 76%
  Males | 3 | 8 min 07 sec | 33% | 33% | 100% | 0% | 85%
  Females | 12 | 8 min 13 sec | 42% | 42% | 100% | 0% | 73%
H3, group 3.2 | 12 | 7 min 43 sec | 42% | 42% | 100% | 0% | 90%
  Males | 7 | 7 min 48 sec | 43% | 43% | 100% | 0% | 93%
  Females | 5 | 7 min 37 sec | 40% | 40% | 100% | 0% | 86%
H4, group 4.1 | 12 | 7 min 21 sec | 75% | 67% | 92% | 8% | 90%
  Males | 8 | 7 min 08 sec | 75% | 75% | 100% | 0% | 87%
  Females | 4 | 7 min 49 sec | 75% | 50% | 75% | 25% | 96%
H4, group 4.2 | 16 | 7 min 47 sec | 75% | 56% | 81% | 19% | 85%
  Males | 6 | 7 min 44 sec | 83% | 83% | 100% | 0% | n/a
  Females | 10 | 7 min 49 sec | 70% | 40% | 70% | 30% | 85%
Totals/averages | 110 | 7 min 45 sec | 74% | 63% | 87% | 13% | 88%
  Males | n/a | 7 min 52 sec | 72% | 62% | 88% | 12% | 86%
  Females | n/a | 7 min 28 sec | 76% | 60% | 84% | 16% | 84%
Figure 4. Distribution of session duration per assistant: box plots of session length (seconds) for each hospital group, showing median, 25–75% range, non-outlier range, outliers and extremes (hospitals H1, H2, H3 and H4 are sub-divided into hospital groups 1.1, 1.2, 2.1, etc.).
5.6 lab tests were ordered, out of which 51% had been
recommended by the case author.
The gender-related performance can be observed in Table 1 and Table 2. We observed that female students generally had a higher mean proportion of required history questions asked (69% compared with 57% for males, p = 0.006) and of required lab tests ordered. Their overall performance was also slightly better than that of the male students.
Assistants
Most of the students reported that the assistants helped them to feel calmer and more comfortable. In fact, their presence was experienced as a relief by these 'first-time users', in particular the more stressed students. It turned out that
this somewhat special (and resource-consuming) pilot test
environment provided a convenient method to monitor the
students’ clinical reasoning processes in a natural way. From
analyses of the videotapes and from discussions with the
assistants, it was found that the presence of the assistant seems
to have made the students think aloud spontaneously and to
verbalize what they were doing and why they were doing it
while working with the ISP system.
Nevertheless, as reported later on, the assistants clearly
influenced the performance of the students.
Questionnaires
The ISP-related questionnaire was answered by 68
students out of 110 (62%). Their median age was 26 years.
The results of the questionnaire are shown in Table 3. As one
could expect, due to the short time allocated to the ISP station
and the fact that the students had never used the ISP system
before, not all students’ answers were positive. The major
reported complaint (question 2b) was about the
limitations of the interactive dialogue with the patient.
This might be explained by the fact that none of the students
had used the system before, and to a certain degree also
by the fact that the case’s dialogue interaction had not been
fine-tuned.
However, the majority of students expressed agreement
with the potential of ISP-like systems in future exams. Most of
the students reported that they experienced the VP case as
engaging (63%) and realistic (78%).
In Table 4, the ISP-related questions of the general on-line
delivered questionnaire are shown. All students answered this
questionnaire. However, due to the rotation scheme of the stations, 16 students (i.e. four students at each hospital) filled it out before they had used ISP; their blank ISP-related answers were not considered. As can be seen, the opinions
of the students differed to some extent from hospital to
hospital. The overall opinions about the surgery course and
the OSCE exam were rated higher than the ISP programme.
This can be observed for students from hospitals H3 & H4,
whose relatively bad performance and/or quality of
assistance can be associated with rather low ratings for ISP.
The very short and first-time experience with the ISP
might also have conveyed a negative or diffuse general
impression.
Video observations
Due to limited human resources we could only video-monitor
half of the ISP stations. As a result, 47 students out of 110 were
videotaped during their ISP session. No apparent effect from
the presence of the video camera could be noticed on
students’ performance. In fact, one female student reported
that she ‘didn’t feel as nervous as being filmed during an
encounter with a ‘real’ patient’.
As hypothesised, the quality and intensity of assistance
provided had a strong positive impact on students’ outcomes
(Table 5). The logistic regression analysis showed that a high degree of interaction between assistant and student (i.e. intensive assistance) gave an estimated odds ratio of 17.21 (95% C.I. [1.30; 1032], p = 0.025) when compared to a weak interaction (baseline group). This indicates a significant association between the student's outcome and the interaction between the student and the assistant.
The statistical analysis also showed that the assistant cluster
3 (senior physicians) and assistant cluster 2 (researchers
experienced in medical simulations) had odds ratios of 17.37
(95% C.I. [2.48; 763], p < 0.001) and 2.77 (95% C.I. [0.84; 10.82],
p = 0.105) respectively, when compared to the (less
experienced) assistant cluster 1 (undergraduate students of
Table 2. Mean values (amounts and percentages) for history questions asked and lab tests ordered.

Hospital group | n | Mean no. of history questions asked | Mean % of history questions required | Mean no. of words per question | Mean no. of lab tests ordered | Mean % of lab tests required
1.1 | 17 | 6.9 | 78% | 1.6 | 3.2 | 73%
1.2 | 12 | 7.8 | 77% | 3.4 | 5.8 | 56%
2.1 | 11 | 10.2 | 61% | 4.6 | 5.4 | 48%
2.2 | 15 | 9.1 | 62% | 2.8 | 6.9 | 44%
3.1 | 15 | 10.0 | 60% | 4.7 | 6.7 | 53%
3.2 | 12 | 9.9 | 56% | 3.8 | 5.9 | 32%
4.1 | 12 | 7.7 | 50% | 3.6 | 5.9 | 38%
4.2 | 16 | 8.7 | 62% | 3.5 | 5.1 | 50%
Males | 36 | 8.4 | 57% | 3.7 | 5.8 | 46%
Females | 59 | 8.5 | 69% | 3.3 | 5.1 | 55%
Totals/averages | | 8.8 | 63% | 3.5 | 5.6 | 51%
P-value | | 0.606 | 0.006 | 0.166 | 0.297 | 0.203
Table 3. Median and average values for the ISP-related questionnaire (68 responses: 45 females, 21 males; 2 did not report gender).

1. Was the ISP designed in such a way that you could apply your knowledge? (52 responses)
Median: 4 (scale ranging from 1 "Highly disagree" to 6 "Highly agree").
2a. What do you think is the best thing about the ISP examination? (64 responses)
Ranking of the most frequently cited factors (no. of respondents, %):
1. Realism/authenticity/trustworthiness of the case: 10 (15%)
2. Fun/enjoyable/engaging: 9 (13%)
3. Easy access to lab tests and physical exam: 8 (12%)
4. New learning mode/instructive/educational: 7 (10%)
2b. What do you think is the worst thing about the ISP examination?
Ranking of the most frequently cited factors (no. of respondents, %):
1. Limitation of interactive dialogue with patient: 38 (56%)
2. Lack of time: 14 (21%)
3. Doesn't feel authentic: 10 (15%)
4. Not familiar with the system/this way of solving problems: 5 (7%)
3. Should stations of ISP type be used in the practical assessment test (OSCE)?
Yes: 24 (35%); No: 21 (31%); Not sure: 23 (34%)
4. What made the ISP case engaging? (open free-text question)
Positive: 43 (63%); Negative: 11 (16%); n/a: 14 (20%)
5. What contributed to some sense of realism in the ISP case? (open free-text question)
Positive: 53 (78%); Negative: 9 (13%); n/a: 6 (9%)

Examples of positive answers to Q4:
. Good structure. With history taking, status etc.
. Good to see the patient and hear her voice.
. To be able to think and reason with the help of the computer. Good complement to other teaching.
. The many possibilities to examine a patient. Great variation. Interesting to be able to try many tools as in real life and see what they would give you.
. The feeling of getting it right – the detective work – and having access to all options in an immediate way.
. The motivation to 'solve' the case.
. To be able to have examination done immediately and also to get results right away!
. Exciting, fun to get results from lab directly, good exercise. That the patient was able to answer my questions.
. Direct feedback, often missing in real life.
. To be able to freely choose examinations/tests.
. That the patient describes her symptoms in her own way – not just in textbook fashion.

Examples of negative answers to Q4:
. I became mostly irritated on the case.
. Nice picture, and the way examinations were done, but I don't think it was efficient. Good pictures and examinations though.
. It was fun to try to reach the right diagnosis until I realized that she never would answer fundamental, important questions and that she began saying stuff that I already had asked without properly answering the question. That only made me irritated and I went on to do a physical examination instead.
. That I wanted to find out the problem of the patient. But it didn't go smoothly all the time.
. Too stressful for an examination situation to become engaged. Under other circumstances can I imagine that one would become engaged because it was rather realistic.

Examples of positive answers to Q5:
. That the lady talks to me.
. To see the patient and that she talked to you.
. The voice and the pictures.
. The 'living' patient.
. It felt interactive, it affected the outcome.
. Real patient on the screen, with a voice. The variation of diagnostic possibilities.
. That the patient talked and moved. Not just replied with text.

Examples of negative answers to Q5:
. I think the interaction with patient didn't work, otherwise interesting.
. A real person speaking, unfortunately she always said the same things.
. It didn't feel realistic, unfortunately.
. The difficult history taking.
Medical Informatics), meaning that differences existed among
the students in the perceived usefulness of assistance.
The student's outcome also appeared to be strongly associated with the flow experienced with the virtual patient (i.e. the degree of affective response).
Grading possibilities
As mentioned above, no students could be ‘down-graded’ due
to their performance on the ISP-station in this pilot study, but
they could benefit from a good outcome on the ISP station as
a way to pass the whole OSCE-exam (if they were short of only
one or two points in the other stations). Therefore, the
assistants were asked to fill in a special form for each of
the students, indicating their overall performance on the
ISP station.
In analysing these forms, it was rather clear that most
assistants indicated a potential of using the ISP station for
grading. Items like students’ individual behaviour, ability to
formulate adequate history questions, flexibility in re-formulat-
ing initial diagnostic strategies, combined with analyses of the
individual log files, were mentioned as positive opportunities.
As observed above, the inter-hospital differences were higher than anticipated, with considerable variability in the proportion of important illness history questions asked (average 63%, S.D. = 18.99), of required labs ordered (average 51%, S.D. = 26.60) and of correct diagnoses (average 74%). This indicates, as hypothesised, that there was a rather large variability in the students' ability to solve the case, thus indicating that ISP-like systems might be used as one part of an 'assessment toolkit' for assessing students' abilities to solve clinical cases. Even though the range of correctly performed physical examination procedures was not recorded, the assistants indicated that there was a satisfactory variability also in this respect. The proportion of students supplying the correct diagnosis on the first trial was 62%.
Discussion
This pilot study investigated the potential of a VP-based system
tested during an OSCE-exam. Even though there are a number
of methodological shortcomings and limitations, interesting
results were found.
The case used and the assessment results
Although the case presented at the ISP station was re-designed
to be easy to solve, it turned out that, given the time
constraints, it was difficult for some examinees to solve the
case without any external help. However there were clear
indications that, if students can be trained and run a mock
exam first, and also be given some more time to solve the
Table 4. Examinees' answers to the general on-line delivered questionnaire (n = 110).
Question Median hosp. 1 Median hosp. 2 Median hosp. 3 Median hosp. 4
Do you think that ISP gave you a possibility to apply your knowledge? 6 6 3 5
Do you think that ISP-like program should be used in examination? 5 5 3 4
Rate your opinion of the surgery course in general. 9 8 7 7
Was the OSCE relevant? 8 8 8 8
The ISP-program stimulated problem solving. 6 6 3 5
The scale of the answers ranged from 1 to 9, where 9 is best.
Table 5. Association between video-observational variables and the outcome results (percentages and odds ratios).
Observational variable n Comparison between baseline groups (n, %)a OR 95% C.I. lower 95% C.I. upper P-value
Interaction assistant/student 47 Intensive (24, 83%) vs. Weak (5, 20%) 17.21 1.30 >500 0.026
Intensive (24, 83%) vs. Medium (18, 67%) 2.44 0.47 14.38 0.374
Medium (18, 67%) vs. Weak (5, 20%) 7.27 0.56 426 0.177
Assistant clusterb 111 3 (29, 97%) vs. 1 (55, 61%) 17.37 2.48 763 <0.001
3 (29, 97%) vs. 2 (27, 81%) 6.18 0.63 311 0.162
2 (27, 81%) vs. 1 (55, 61%) 2.77 0.84 10.82 0.105
Flow with ISP 47 Playful (6, 100%) vs. Frustrating (7, 29%) 12.19 1.21 >500 0.033
Playful (6, 100%) vs. Normal (34, 74%) 6.57 0.89 80.65 0.070
Normal (34, 74%) vs. Frustrating (7, 29%) 2.72 0.34 >500 0.384
Expression of uncertainty 47 Certain (22, 86%) vs. Doubtful (7, 14%) 30.98 2.60 1834 0.002
Certain (22, 86%) vs. Neutral (18, 72%) 13.74 1.22 771 0.028
Neutral (18, 72%) vs. Doubtful (7, 14%) 2.38 0.055 2.6 0.474
External signs of stress 47 High/Medium (23, 65%) vs. Low (24, 75%) 1.58 0.38 6.93 0.679
Mouse handling 48 Assistant (7, 86%) vs. Student (41, 68%) 2.74 0.28 138 0.658
Keyboard handling 48 Assistant (12, 83%) vs. Student (36, 67%) 2.46 0.42 27.6 0.470
a (n, %): number of observed students, percentage with correct diagnosis.
b Assistant cluster consists of 3 groups based on the following profiles: 1. Undergraduate students of Medical Informatics (4 persons); 2. Experienced researchers in Medical Simulation (2 persons); 3. Senior physicians (2 persons).
cases, ISP-like systems might have a potential as a summative
assessment tool (i.e. without human assistants). Earlier studies
of ISP indicated that the system is fairly easy to learn, requiring
only 20 minutes or so to learn how to interact with the system
(Bergin & Fors 2003).
It can be noted that ISP was originally designed for collaborative learning, and previous studies have shown that the ideal situation for solving simulated clinical problems is peer-to-peer collaboration. However, during the
OSCE exam test here, the students could only rely on the
assistant’s ability and willingness to provide medical advice,
even if the latter were told to act as neutrally as possible.
This unique situation fostered the thinking aloud process
on the examinee's part. Consequently, it allowed the assistant to follow the student's ongoing clinical reasoning process and thereby to channel his/her preliminary thoughts if he/she was deviating from the main track. The observational
data collection from the 47 video recordings on the ISP stations
showed clearly that many of the examinees were uncon-
sciously thinking aloud while an observer/rater was sitting
beside them. The quality of the interaction and the degree of
engagement from the assistant also appeared to affect
the overall performance. This is something that needs to be
further studied.
Variation in assistant behaviour and intervention
In their recent book, Developing Organizational Simulations:
A Guide for Practitioners and Students, Thornton & Mueller-
Hanson (2004) emphasize the importance of using ‘. . . trained
assessors to observe behaviour, classify behaviour into the
dimensions being assessed, and make judgments about
participants’ level of proficiency on each dimension being
assessed’ (Thornton & Mueller-Hanson 2004, p. 5).
The assistants at the ISP stations could act both as
instructors and as raters, even if they did not grade the
students in detail in this pilot study. But we do not know how
many students actually worked on the case independently
(i.e. with almost no help from the assistant). We identified
problems with scoring validity due to the fact that the
assistants/examiners’ level of help was not standardized, thus
compromising the objectivity of their intervention and the
accuracy and fairness of the rating system.
Besides, the use of human assistance is a rather resource-
consuming task and we noticed that the instructors seemed to
influence the outcome to a high degree. It eventually became a
psychometric concern in our case.
Validity and usability in exams
One key issue, related to face validity, is that the performance
of standardized patients (human actors) might not be regular,
but varies due to human factors (Adamo 2003). Actors or external examiners are trained to react and behave in the same manner at each session, but the same actors/evaluators cannot perform identically to assess a full course of hundreds of students, resulting in the need for several individual assessors/actors. This opens up the potential for non-standardized assessment, which could interfere with the results more than lack of familiarity with technology.
In contrast, a virtual patient offers a measurement tool that
guarantees the regularity and reproducibility of patient
behaviour as well as the judgement of the student’s interaction
(including provoked reactions and conveyed emotions) with
the case over time (given that no assistants are used).
Therefore, we suggest that students should be judged on the basis of predefined scoring rubrics with well-defined cut-off points, for ease of administration and grading. For example, a possible grading scheme could be defined in terms of a weighted score based on the percentage of relevant questions asked in the history taking, physical examinations performed and lab tests ordered, as well as on a correct diagnosis in relation to the number of submitted diagnoses.
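One possible reading of such a weighted scoring rubric is sketched below. The weights, the 0.70 cut-off and the example values are illustrative assumptions; the text above only proposes that a weighted score with predefined cut-off points could be used.

# Illustrative weighted scoring rubric (weights and cut-off are assumptions).
def weighted_score(history_pct, exam_pct, lab_pct, correct_diagnosis, n_submitted):
    """Combine coverage fractions (0-1) and diagnostic accuracy into one score."""
    diagnosis_component = (1.0 / n_submitted) if correct_diagnosis else 0.0
    return (0.3 * history_pct +      # relevant history questions asked
            0.2 * exam_pct +         # relevant physical examinations performed
            0.2 * lab_pct +          # relevant lab tests ordered
            0.3 * diagnosis_component)

PASS_CUTOFF = 0.70  # in line with the ~70-80% pass level mentioned for exams at KI

score = weighted_score(history_pct=0.73, exam_pct=0.80, lab_pct=0.57,
                       correct_diagnosis=True, n_submitted=1)
print(f"score = {score:.2f}, pass = {score >= PASS_CUTOFF}")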
Conclusion
The very fact that all 110 students volunteered to use ISP and completed the whole session indicates that, in general, they were positive about the use of an ISP-like simulation
system for assessment. Findings from the statistical analysis
showed that significant differences existed among the
students in the perceived usefulness of human assistance
and their degree of affective response towards the system.
One randomized study (Smith et al. 1995) and one
descriptive study (Holm 1996) have shown that females
score better than males after a training course in
communication skills. Interestingly, these effects could be measured significantly with ISP (mean proportions of relevant history questions asked), thereby confirming previous observations by Van den Brink-Muinen et al. (1998).
The present study indicates that computer-based simula-
tions like ISP are able to present and simulate realistic patient
encounters to an acceptable level of complexity and allow
differentiation of one student’s performance from another,
including gender-related differences. A strength of ISP-like
systems is that these virtual patient cases can also be
programmed to score automatically and immediately present
results of the examination, thus saving expensive labour and
facility resources.
Therefore, if students are trained beforehand, limiting
the need for assistants, we believe that VP cases can be a
useful complementary tool for assessing some of the many
components of clinical competence, including clinical
reasoning skills, but they should always be combined with
other methods (Pugh & Youngblood 2002).
Notes on contributors
OLIVIER COURTEILLE PhD candidate at Karolinska Institutet, senior
developer and project manager in Educational Technologies at the
VP-Lab, Karolinska Institutet.
ROLF BERGIN (deceased 2006) Formerly Senior Researcher in Medical
Education and Simulation at Karolinska Institutet.
DAG STOCKELD Teacher at Karolinska Institutet since 1986. Senior
Consultant in Surgery at Danderyd University Hospital in Stockholm.
SARI PONZER Professor in Orthopaedics at Karolinska Institutet.
Responsible for organizing the OSCE together with other senior lecturers
at Karolinska Institutet. Member of the research team, participated in
designing the study and commented on the draft paper.
UNO FORS Researcher and teacher at Karolinska Institutet since 1980.
Professor in Medical Educational Simulation and Chairman of LIME.
Acknowledgements
The authors wish to thank Dr Staffan Sahlin for adapting and
evaluating the remodelled virtual patient case and allowing us
to test the ISP system in the OSCE exam. The authors are also
grateful to the 110 students who volunteered to participate in
this study. Thanks also to Jacob Bergstrom for statistical
assistance and analysis.
The study was supported by grants from the Wallenberg
Global Learning Network (WGLN) and from Karolinska
Institutet.
References
Adamo G. 2003. Simulated and standardized patients in OSCEs: achievements and challenges 1992–2003. Med Teach 25:262–270.
Appel J, Friedman E, Fazio S, Kimmel J, Whelan A. 2002. Educational
assessment guidelines: a Clerkship Directors in Internal Medicine
commentary. Am J Med 113:172–179.
Bergin R, Fors U. 2003. Interactive Simulation of Patients – an advanced
tool for student-activated learning in medicine & healthcare. Computers
Educ 40:361–376.
Bergin R, Youngblood P, Ayers M, Boberg J, Bolander K, Courteille O, Dev
P, Hindbeck H, Stringer J, Thalme A, Fors U.G.H. 2003. Interactive
simulated patient: experiences with collaborative e-Learning in
medicine. J Educ Compu Res 29:387–400.
Cantillon P, Irish B, Sales D. 2004. Using computers for assessment in medicine. BMJ 329:606–609. doi:10.1136/bmj.329.7466.606.
Clyman SH, Orr N.A. 1990. Status report on the NBME’s computer-based
testing. Acad Med 65:235–41.
Collins JP. 1992. Real versus standardised patients in the OSCE,
In: RM Harden, IR Hart & H Mulholland (Eds), Approaches to the
Assessment of Clinical Competence, pp. 24–26 (Dundee, UK, Centre for
Medical Education).
Collins JP, Harden RM. 1998. AMEE Medical Education Guide No. 13: real
patients, simulated patients and simulators in clinical examinations.
Med Teach 20:508–521.
Edelstein RA, Reid HM, Usatine R, Wilkes MS. 2000. A comparative study of
measures to evaluate medical students’ performances. Acad Med
75:825–833.
Friedman Ben-David M. 2000. AMEE Guide No. 18: Standard setting in
student assessment. Med Teach 22:120–130.
Guagnano MT, Merlitti D, Manigrasso MR, Pace-Palitti V, Sensi S. 2002. New
medical licensing examination using computer-based case simulations
and standardized patients. Acad Med 77:87–90.
Harden RM. 1990. The OSCE – a 15-year perspective. In: IR Hart, RM Harden
& J Des Marchais (Eds), Current Developments in Assessing Clinical
Competence (Can-Heal Publications Inc., Montreal, Quebec).
Harden RM, Hart IR. & Mulholland, H. (eds). 1992. Approaches to the
Assessment of Clinical Competence, (Centre for Medical Education,
Dundee, UK).
Holm U. 1996. The Affect Reading Scale: a method of measuring the
prerequisites for empathy. Scand J Educ Res 40:239–253.
Holmboe E. 2004. Faculty and the observation of trainees’ clinical skills:
problems and opportunities. Acad Med. Special Theme: Teaching
Clinical Skills 79:16–22.
Issenberg SB, McGaghie WC, Petrusa ER, Gordon DL, Scalese RJ. 2005.
BEME guide no 4: Features and uses of high-fidelity medical simulations
that lead to effective learning: a BEME systematic review. Med Teach
27:10–28.
McGaghie WC. 1999. Simulation in professional competence assessment:
basic considerations. In: A Tekian, CH McGuire & WC McGaghie (Eds),
Innovative Simulations for Assessing Professional Competence: From
Paper-and-Pencil to Virtual Reality, (Department of Medical Education,
University of Illinois at Chicago, Chicago).
Newble DI. 1992. ASME Medical Education Booklet No 25.
Assessing clinical competence at the undergraduate level. Med Educ
26:504–511.
Nielson JA, Maloney C, Robison R. 2003. Internet-based standardized patient simulation with automated feedback. AMIA Annual Symposium Proceedings 2003, p. 952. Available online at www.ncbi.nlm.nih.gov (PMID: 14728457, last accessed 2007/06/14).
Pugh CM, Youngblood P. 2002. Development and validation of assessment
measures for a newly developed physical examination simulator. J Am
Med Informatics Assoc 9:448–460.
Sanson-Fisher RW, Poole AD. 1980. Simulated patients and the assessment
of medical students’ interpersonal skills. Med Educ 14:249–253.
Smith RC, Lyles JS, Mettler JA, Marshall AA. 1995. A strategy for improving
patient satisfaction by the intensive training of residents in psychosocial
medicine: a controlled randomized study. Acad Med 70:729–732.
Schuwirth LWT, Van der Vleuten CPM. 2003. The use of clinical simulations
in assessment. Med Educ 37:65–71.
Thornton GC, Mueller-Hanson RA. 2004. Developing Organizational
Simulations: A Guide for Practitioners and Students (Lawrence
Erlbaum Associates, Mahwah, NJ).
Van den Brink-Muinen A, Bensing JM, Kerssens JJ. 1998. Gender and communication style in general practice: differences between women's health care and regular health care. Med Care
36:100–106.
Van der Vleuten CPM, Van Luyk SJ, Van Ballegooijen AMJ, Swanson
D.B. 1989. Training and experience of examiners. Med Educ
23:290–296.
Weatherall DJ. 1991. Examining undergraduate examiners. Lancet
338:37–39.
Wilson GM, Lever R, Harden RM, Robertson JIS, MacRitchie J. 1969.
Examination of clinical examiners. Lancet 1:37–40.
More information on ISP: http://ispvl.learninglab.ki.se