Psychometrics and Assessment Services
July 2015
Technical Report on the Standard Setting Exercise for the Medical Council of Canada Qualifying Examination Part I
MCCQE Part I Standard Setting Report
Contents
INTRODUCTION
Pre-Session Activities
  Selecting a standard setting method
  Selecting participants and assigning into panels
  Selecting test questions for the standard setting session
  Pre-session materials
Activities During the Two-Day Session
  Orientation
  Defining the borderline candidate
  The practice test
  The practice bookmark method
  Two rounds of bookmarking
Recommendation from the Panelists
Evaluation of the Standard Setting Judgments
Providing Feedback through an Online Survey
Concluding Remarks
REFERENCES
Table 1: Canadian and International Medical Graduate Pass/Fail Rates for the years 2012-2014
Table 2: Standard Setting Results For Panels 1 and 2 for Rounds 1 and 2
Figure 1: Failure Rates for First-Time Takers (Panel 1)
Figure 2: Failure Rates for First-Time Takers (Panel 2)
Figure 3: Failure Rates for First-Time Takers (Combined Panels)
Figure 4: Failure Rates for all First-Time Takers (Round 2)
Figure 5: Failure Rates for all First-Time Takers and Hofstee Boundaries
Appendix A: Demographic Information Sheet
Appendix B: Demographic Summary of the Two Panels
Appendix C: Standard Setting Agenda
Appendix D: Defining Borderline Performance and the Minimally Competent Candidate
Appendix E: Form to Document a Bookmark for Each Round
Appendix F: Form to Document Hofstee Boundaries
Appendix G: Part I Standard Setting Fall 2014 – Post-Session Survey Summary
INTRODUCTION
In the context of licensing and certification, standard setting has become a critical and
essential component of assessment programs. Standard setting is a process by which
an acceptable level of performance is defined (Kane, 1994, 1998). For the medical
profession, standard setting is the establishment of a qualitative statement of what
minimum level of performance should be attained to practice medicine safely and
effectively. An integral part of the standard setting process is also the establishment of a
cut score on an assessment of interest that is congruent with the definition of a
minimum performance level. At the Medical Council of Canada (MCC), standard setting
is an essential part of every examination program, including the Medical Council of
Canada Qualifying Examination (MCCQE) Part I.
The MCCQE Part I is a computer-delivered examination which assesses basic medical
knowledge and skills expected to be mastered at the end of medical school. It is
composed of a three-and-a-half-hour multiple-choice question (MCQ) component and a four-hour
clinical decision-making (CDM) component. Its MCQ component consists of seven
sections of 28 questions in which testlets of four questions for each of the six disciplines
(Internal Medicine; Obstetrics/Gynecology; Pediatrics; Population Health, Ethical, Legal,
and Organizational aspects of Medicine; Psychiatry; and Surgery) are presented to
candidates. The CDM component is composed of 36 clinical cases, each including one
to four questions. CDM questions can be either selected-response or constructed-response
(CR) items.
The purpose of the standard setting session for the MCCQE Part I that took place
October 23-24, 2014, was to arrive at a recommended cut score for subsequent review
and approval by the Central Examination Committee (CEC). The most important aspect
of standard setting is the validity of the process and activities. In the sections that
follow, we describe in detail the pre-session activities, as well as the
activities that took place during the standard setting session for the MCCQE Part I.
Pre-Session Activities
SELECTING A STANDARD SETTING METHOD
Standard setting methodologies abound but not all are well suited for the types of items
that are used in the MCCQE Part I. Several methodologies were considered but the
Bookmark method was chosen because of its simplicity and the ease with which both
MCQs and CDM items can be integrated in the cut score (Cizek, 2007). The Bookmark
method is an item mapping procedure where items are ordered from easiest to most
difficult based on operational data and panelists are asked to place a bookmark at the
point at which they believe a minimally proficient candidate would no longer correctly
answer subsequent items presented in the ordered exam form. De facto, this
corresponds to the cut score for each panelist. A detailed description of participants’
task is outlined in a later section of this report.
SELECTING PARTICIPANTS AND ASSIGNING INTO PANELS
Since the panelists selected for a standard setting exercise represent a microcosm of all
MCCQE Part I examination stakeholders, it is critical to select participants that are
representative with respect to a number of key variables, including the region of
Canada, ethnicity, medical specialty and years of experience. Furthermore, to assess
the reproducibility of the cut score across two groups of physicians, we decided to split our
panelists into two matched subgroups. This split allows us to collect critical validity
evidence in support of the recommended cut score.
The process of selecting participants started with an invitation which was forwarded to
physicians from across Canada, targeting Family Physicians as well as a broad range of
other specialists. A total of 22 physicians were retained based on several key criteria
(see Appendix A for the demographic information survey that was filled out by all
potential participants). As previously mentioned, we attempted to select panelists in
both subgroups who were reflective of various regions across the country (i.e., Western,
Central, and Eastern Canada); medical specialty (family medicine, internal medicine,
surgery, obstetrics and gynecology, pediatrics, and psychiatry); ethnicity (i.e., Asian,
Black, Caucasian, First Nation, or Hispanic); sex; and years of experience supervising
residents. In Appendix B, we present a summary of the demographics of the two panels.
Some minor imbalance ensued when five participants bowed out a few days before the
session. Two of these people decided not to participate on account of the tragic incident
that occurred in Ottawa at the War Memorial and the Centre Block of Parliament the
day before this session.
SELECTING TEST QUESTIONS FOR THE STANDARD SETTING SESSION
All questions used for the standard setting session were taken from the most recent
MCCQE Part I, namely the spring 2014 administration. Dichotomously scored MCQs
were calibrated using the Rasch model (Rasch, 1960/1980) and were then used as
anchors to calibrate the CDM questions (the Rasch model for dichotomous CDMs and the
partial credit model (Masters, 1982) for polytomous CDMs). With the Bookmark
method, the basic question that panelists must answer is the following: “Is it likely that
the borderline candidate will be able to answer this question correctly?” A typical
probability level used with the Bookmark method is the 67% response probability, or a 2/3
chance of answering correctly. Therefore, the ability level needed to meet
a 2/3 probability criterion was calculated for each dichotomously scored MCQ and CDM and for each
step value for polytomously scored CDMs.
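For a dichotomous item under the Rasch model, the 2/3 criterion has a simple closed form: the ability at which the probability of success equals 2/3 is the item difficulty plus ln(2). The sketch below illustrates this for a single item. The function names and the difficulty value are hypothetical, and the step-value calculation for polytomous items under the partial credit model is not covered.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability of a correct answer under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def rp_location(difficulty: float, rp: float = 2.0 / 3.0) -> float:
    """Ability at which the probability of a correct answer equals `rp`."""
    # Solve rp = 1 / (1 + exp(-(theta - b))) for theta.
    return difficulty + math.log(rp / (1.0 - rp))


# Hypothetical item difficulty, for illustration only.
b = -0.50
theta_rp67 = rp_location(b)

# The RP67 location sits ln(2), about 0.69 logits, above the difficulty.
print(round(theta_rp67 - b, 2))
print(round(rasch_probability(theta_rp67, b), 3))
```

Ordering items by this value gives the same order as ordering by Rasch difficulty for dichotomous items, since the offset is constant.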
PRE-SESSION MATERIALS
To help panelists prepare for the standard setting session, we asked them to read
an article (De Champlain, 2004) and a book chapter (De Champlain, 2014) on the topic
of standard setting that we sent out prior to the exercise in October 2014. Additionally,
the agenda for the two-day session was mailed out to participants a few weeks before
the session (see Appendix C).
Activities During the Two-Day Session
ORIENTATION
The success of any standard setting session relies heavily on the extensive training of
participating panelists. This helps to ensure that panelists have the same objective in
mind and the same basic premises and understanding of the standard setting process.
To this end, we spent half of the first day of the exercise training our panelists on a
number of issues, including the structure and content of the MCCQE Part I. Examples
of questions for both components of the examination were shown with the type of
scoring rubrics that would be seen in the exercises included in the session. This was
followed by a tutorial on standard setting, including issues to consider, methods and
sources of evidence to support the reliability and validity of any cut score. Particular
attention was given to the method that was selected to arrive at a recommended cut
score for the MCCQE Part I exam, namely, the Bookmark method. In addition, a
second, ancillary standard setting method was introduced, the Hofstee method, which
was used as a complement to the item-centered Bookmark approach. The Hofstee
method is described in the literature as a compromise method (Hofstee, 1983) in that it
integrates both norm-referenced (relative interpretations) and criterion-referenced
(absolute interpretations) considerations in a “gut estimate” that is used to further
validate the cut-score obtained following the Bookmark exercise.
DEFINING THE BORDERLINE CANDIDATE
Commonly, standard setting methodologies, including the Bookmark method, assume
that a cut-score is set for the minimally proficient or borderline candidate. This
hypothetical candidate is critical in setting the cut-score, i.e., a point on the continuum of
professional competence that separates those deemed as competent candidates from
those deemed as incompetent. The Bookmark method requires that panelists clearly
define what constitutes a minimally proficient (or borderline) candidate, with respect to
what they may know and not know in the domains targeted by the MCCQE Part I exam.
To assist panelists in this task, a basic definition was developed by the Vice-chair of the
CEC and offered to the panelists as a starting point. After much discussion, the
participants agreed on some modifications and enhancements by listing several
attributes that they felt were reflective of borderline candidate behaviours and attitudes.
The definition that was agreed upon by all our panelists is shown in Appendix D.
THE PRACTICE TEST
To better understand the type of questions that Part I candidates must answer during an
examination, a practice test was administered to the panelists prior to collecting their
judgments. It contained a representative sample of 50 multiple-choice questions and 26
clinical decision-making questions selected from the spring 2014 MCCQE Part I
examination. Panelists were given 90 minutes to complete the practice test after which
they were instructed to self-score their test using an item map which provided correct
answers for each question. The purpose of the practice test was also to give
participants a sense of the level of difficulty of the MCCQE Part I. Participants were not
asked to share their resulting score with other panelists. However, this exercise did
provide the basis for a discussion of their perceived level of difficulty of the questions
and the appropriateness of the content in relation to the purpose of the Part I
examination and its target population (i.e. candidates entering supervised training or
residency).
THE PRACTICE BOOKMARK METHOD
A practice bookmark exercise was planned to train the panelists in this procedure
before they engaged in the actual full-scale activity. The same questions used in the
practice test were used for this exercise as well. However, the questions were now
ordered by difficulty level, from “easiest” to “most difficult”, based on actual spring 2014
MCCQE Part I candidate performances. The goal of this standard setting method was
to allow panelists, in a practice round, to identify a point on the scale that they believed
reflected minimal competency in the domains measured by the MCCQE Part I
examination.
Each participant was presented with a booklet that contained examination questions
(one per page) that were ordered by difficulty from easiest to most difficult. Each
participant was asked to place their bookmark at the point at which they felt a minimally
proficient (or borderline) candidate would correctly answer all items up to that point and
incorrectly answer all items beyond that point. The basic question that panelists must
answer in the Bookmark procedure is the following: “Is it likely that a minimally proficient
candidate will be able to correctly answer this test question?” Of course, the “likeliness”
must be defined more specifically. In the Bookmark method, it is defined as having a
2/3 chance of answering correctly (or, for polytomous items, a 2/3 chance of reaching a
given CR score or higher). The expression “RP67” is often used to denote a
.67 response probability, which is simply another way of expressing the 2/3 chance of answering
correctly.
Panelists were instructed to read questions starting with the first question in their
booklet and proceed one item at a time sequentially until they arrived at a point where
they felt that the minimally proficient candidate would no longer have a 2/3 chance of
correctly answering the item. Panelists were not provided with the correct answers for
this initial practice round. Following this initial bookmark placement, panelists were then
provided with an item map that contained information on each question in the booklet
including the correct answer as well as the associated RP67 value. Following this
practice round, panelists were invited to begin the actual two rounds of the Bookmark
standard setting exercise.
TWO ROUNDS OF BOOKMARKING
Round 1 (Preliminary round). Following the practice bookmark round, panelists were
reminded of some key points about the Bookmark method and were assigned to their
respective panels. They were then each provided with a booklet that contained 236
items (one form’s worth of items) which were ordered by difficulty level (based on RP67
value) from easiest to most difficult. They were then instructed to independently place a
bookmark at the point at which they felt a minimally proficient (or borderline) candidate
would correctly answer all items up to that point and incorrectly answer all items beyond
that point. Forms were distributed for documenting each panelist’s bookmark (see
Appendix E). The panelists were given 3.5 hours to complete their round 1 bookmark
placement. Note that the judgments provided in round 1 were solely based on the item
text that was provided, i.e., no performance data were given.
Following round 1, panelists were asked to provide answers to the following four
Hofstee method questions: (1) What is the minimum acceptable cut-score (Cmin), even if
all candidates attained this score level? (2) What is the maximum acceptable cut-score
(Cmax), even if no candidate attained this score level? (3) What is the minimum tolerable failure rate
(Fmin)? (4) What is the maximum tolerable failure rate (Fmax)? Again, this information
is used to gauge the appropriateness of the Bookmark method cut-score as per the
panelists’ holistic views. Forms were distributed (see Appendix F) to allow panelists to
record the data for the Hofstee method. Forms were collected and provided to
Statistical Analysts who in turn entered the data in an application which allowed us to
view each panel’s bookmark overlaid with the Hofstee boundaries. Figures 1 and 2
illustrate Bookmark and Hofstee data for round 1 for Panels 1 and 2, respectively.
Figure 3 combines the data for both panels. Panel 1 panelists are represented as blue
letters on each graph. Panel 1 had 9 panelists: A, C, D, E, G, H, I, J, and K. Panel 2
panelists are represented as red letters. Panel 2 had 8 panelists: A, B, E, F, G, H, I,
and J. The placement of letters on the graphs has significance only on the x-axis,
namely the cut scores on the theta scale. The stacking of some of the letters was done
simply to distinguish panelists whose cut score was the same instead of superimposing
them. The placement of the letters has no significance on these graphs in terms of
failure rates for individual panelists.
Panelists from both panels were gathered in one room to provide them with impact data
which consisted of failure rates given their respective cut scores and combined cut
score values. Pass and failure rates for Canadian and International Medical Graduates
for the years 2012-2014 were presented to all panelists (see Table 1). Also, a
cumulative distribution of examination results was prepared from all first-time
candidates who completed the spring 2014 MCCQE Part I. For each score, a
distribution of cumulative percentage of failures was established and a look-up table
was created to obtain a percentage failure for any given cut score obtained from each
panelist.
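The look-up from a cut score to a failure rate can be sketched as follows. This is a minimal illustration rather than the application used in the session: the candidate scores are invented, and the rule that a candidate fails when scoring strictly below the cut score is an assumption.

```python
from bisect import bisect_left


def failure_rate(sorted_scores: list[float], cut_score: float) -> float:
    """Share of candidates scoring strictly below the cut score."""
    below = bisect_left(sorted_scores, cut_score)
    return below / len(sorted_scores)


# Hypothetical theta scores for ten first-time candidates (not real data).
scores = sorted([-1.2, -0.6, -0.3, -0.1, 0.0, 0.2, 0.4, 0.7, 1.1, 1.5])

# Hypothetical cut scores; a higher cut score can only raise the failure rate.
for cut in (-0.50, -0.20):
    print(f"cut {cut:+.2f}: {failure_rate(scores, cut):.0%} would fail")
```

Sorting the scores once and using a binary search keeps each look-up cheap, which matters when the same table is queried for every panelist and every round.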
To translate bookmark placement into cut scores on the item response theory (IRT)
ability (theta) scale, an additional look-up table was created that listed: (1) item
identification number for each item used in the bookmarking exercise; (2) the
corresponding booklet page number; (3) the Rasch item difficulty measure; and (4) the
RP67 value, that is, the IRT ability value needed to have a 2/3 chance of correctly answering any
given item in the sample MCCQE Part I exam form that was used in our standard setting
exercise. Once we obtained all bookmark placement page numbers, those were
entered and a corresponding cut score was identified using the look-up table for each
panelist, panel and overall.
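A minimal sketch of this look-up is shown below. The item identifiers, page numbers, and values are hypothetical, and the convention that the cut score is the RP67 value of the item on the bookmarked page is an assumption; some Bookmark implementations use the preceding item instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BookletItem:
    item_id: str       # item identification number
    page: int          # page number in the ordered booklet
    difficulty: float  # Rasch item difficulty
    rp67: float        # ability needed for a 2/3 chance of success


# Hypothetical four-item booklet, ordered from easiest to most difficult.
LOOKUP = [
    BookletItem("ITEM-A", 1, -1.90, -1.21),
    BookletItem("ITEM-B", 2, -1.10, -0.41),
    BookletItem("ITEM-C", 3, -0.85, -0.16),
    BookletItem("ITEM-D", 4, -0.40, 0.29),
]
BY_PAGE = {item.page: item for item in LOOKUP}


def cut_score_for_bookmark(page: int) -> float:
    """Return the RP67 value of the item on the bookmarked page (assumed rule)."""
    return BY_PAGE[page].rp67


# Hypothetical bookmark placements (panelist letter -> page number).
bookmarks = {"A": 2, "C": 3, "D": 3}
cuts = {name: cut_score_for_bookmark(page) for name, page in bookmarks.items()}
print(cuts)
```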
To obtain a panel-level cut score, the median cut score was calculated from the
distribution of cut scores by panel. The median was chosen instead of the mean since it
mitigates the influence of extreme values when they occur. This median
corresponded to the preliminary or round 1 cut score.
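The effect of choosing the median can be checked against the round 1 cut scores listed in Table 2. The short sketch below uses Python's standard `statistics` module; only the variable names are assumptions.

```python
from statistics import mean, median

# Round 1 cut scores (theta scale) as listed in Table 2 of this report.
round1 = {
    "Panel 1": [-0.07, -0.89, -0.46, -0.37, -0.07, -0.95, -0.99, 0.29, -0.39],
    "Panel 2": [-0.98, -1.74, -1.73, 0.74, -1.08, -0.27, -0.58, 0.53],
}

for panel, cuts in round1.items():
    # Medians: -0.39 for Panel 1 and -0.78 for Panel 2, as in Table 2.
    print(panel, "median:", round(median(cuts), 2), "mean:", round(mean(cuts), 2))
```

For Panel 2, where the individual cut scores range from -1.74 to 0.74, the median and the mean differ by 0.14 logits, which illustrates why the choice of summary statistic matters when judgments are widely spread.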
In Figure 1, we can observe that failure rates increase as cut scores increase. The
Hofstee cut score is established by drawing a line down to the cut score axis from the
point where the line joining the (Cmin, Fmax) and (Cmax, Fmin) points traverses the cumulative
failure rates curve. For Panel 1, the cut score falls between the lower and higher boundaries identified
by the Hofstee method. This is a desirable outcome because it indicates
that the Bookmark cut score (-0.39 on the theta scale) identified by Panel 1 falls within what they
expected in terms of maximum and minimum failure rates and maximum and minimum
cut scores.
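The Hofstee construction can be sketched numerically. The example below assumes the usual form of the method, in which a straight line is drawn from the point (Cmin, Fmax) to the point (Cmax, Fmin) and the cut score is read where that line meets the cumulative failure curve. The candidate scores, boundary values, and Bookmark cut score are all hypothetical, and the function name is illustrative only.

```python
from bisect import bisect_left


def hofstee_cut(sorted_scores, c_min, c_max, f_min, f_max):
    """Cut score where the Hofstee line meets the cumulative failure curve.

    The line runs from (c_min, f_max) down to (c_max, f_min). Returns None
    if the curve does not cross the line between c_min and c_max.
    """
    def gap(cut):
        line = f_max + (f_min - f_max) * (cut - c_min) / (c_max - c_min)
        failing = bisect_left(sorted_scores, cut) / len(sorted_scores)
        return failing - line  # rises as the cut score rises

    if gap(c_min) > 0 or gap(c_max) < 0:
        return None
    lo, hi = c_min, c_max
    for _ in range(50):  # bisection on the sign change of gap
        mid = (lo + hi) / 2
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical data: 40 evenly spaced theta scores and one set of boundaries.
scores = [i / 10 for i in range(-20, 20)]
c_min, c_max, f_min, f_max = -1.0, 0.0, 0.05, 0.40

cut = hofstee_cut(scores, c_min, c_max, f_min, f_max)
bookmark_cut = -0.50  # hypothetical Bookmark cut score
print(round(cut, 2), c_min <= bookmark_cut <= c_max)
```

The final comparison mirrors the check described above: a Bookmark cut score lying between Cmin and Cmax is consistent with the panel's holistic expectations.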
In Figure 2, Panel 2 results for round 1 are presented. The results indicate that this
panel had incongruent outcomes between what they established as acceptable Hofstee
boundaries and the bookmark cut score (-0.78 on the theta scale). It would seem that 2
panelists (B and E) are mostly responsible for this outcome. Figure 3 illustrates the
results of the combined data for both panels taken together resulting in a combined cut
score of -0.58 which falls within the Hofstee higher and lower boundaries. Panelists
were provided with an opportunity to discuss the results presented to them after this
preliminary round. Much discussion ensued in terms of the impact on medical
graduates who would potentially fail given the cut score produced by round 1
bookmarking. Some panelists said that, given the impact data, they felt
that they had been too lenient in terms of what they expected the borderline candidate would
be able to master, while others felt they had been too harsh.
Round 2 (Final round). Panelists were then directed to their respective subgroup to
engage in the second and final round of bookmarking. Results from this second round
constitute the recommended cut score which was subsequently brought forward to the
CEC for consideration and adoption. Panelists were given two hours to complete this
final standard setting round. As was the case in the preliminary round (round 1), forms
were gathered from panelists who indicated their second bookmark placement as well
as their responses to the four Hofstee questions (post round 2). Graphical
representations for round 2 bookmarking results are presented in Figures 4 and 5. In
Figure 4, round 2 individual and panel bookmark cut scores and corresponding failure
rates are presented. In Figure 5, the same data are provided with an additional overlay
of the Hofstee boundaries from round 2. The combined (i.e., both panels taken together)
cut score of -0.22 on the IRT ability scale (theta) would fail 14% of all first-time
candidates using the spring 2014 examination results. This cut score would fail 5.1% of
first-time Canadian medical graduates from the spring 2014 MCCQE Part I
administration.
Recommendation from the Panelists
The abovementioned figures were presented to all panelists concurrently and they were
provided with an opportunity to discuss the impact of using the resulting cut score.
Several panelists expressed their satisfaction with the method that they used to arrive at
the final cut score. They felt comfortable with the results of the exercises. Consistent
with their mandate as set at the beginning of the meeting, they recommended that the
cut score of -0.22 on the IRT ability scale be brought forward to the CEC for approval at
the spring 2015 meeting.
Evaluation of the Standard Setting Judgments
Details of each panel’s recommended cut scores following Round 2 (final round) are
presented in Table 2. This table presents a summary of the 2 panels’ cut scores and
their associated descriptive statistics, namely the means, medians and standard
deviations. The standard error of judgment (SEJ) is also presented. This statistic
captures the amount of variability associated with each panel’s cut score. It provides a
rough indication of the extent to which the same or a similar cut score would be
obtained if we were to gather physicians with the same demographics as the ones that
were chosen for this session, who would have gone through the same type of training
and using the same examination items. By building a confidence interval around each
panel’s cut score using the SEJ, we can evaluate the extent to which the 2 panels arrived at comparable cut
scores. Panel 1’s interval extends from -0.37 to -0.18 and Panel 2’s interval extends
from -0.38 to 0.03. Because these intervals overlap, we conclude that their cut scores were
very comparable.
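The comparison can be sketched as an interval overlap check. The example assumes a 95% normal-theory interval (multiplier 1.96) centred on each panel's round 2 median cut score; the report does not state the multiplier or the centre used, so the endpoints computed from the rounded Table 2 values differ slightly from those quoted above.

```python
def interval(center: float, sej: float, z: float = 1.96) -> tuple[float, float]:
    """Symmetric interval of half-width z * SEJ around a cut score."""
    half_width = z * sej
    return (center - half_width, center + half_width)


def overlaps(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True when two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Round 2 medians and SEJs as reported in Table 2.
panel1 = interval(-0.27, 0.05)
panel2 = interval(-0.17, 0.10)

print([round(x, 2) for x in panel1], [round(x, 2) for x in panel2])
print(overlaps(panel1, panel2))
```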
Providing Feedback through an Online Survey
At the conclusion of the meeting, panelists were given an opportunity to provide
feedback on the activities in which they participated. An online survey tool was
developed for this specific purpose. Panelists were informed that the feedback provided
would be treated anonymously. All but one panelist completed the survey before they
left the meeting. One of the panelists completed the survey one day later.
Results of the survey are presented in Appendix G. All 17 participants thought that the
information regarding the overview of the MCCQE Part I was either good (18%), very
good (18%), or excellent (65%). They thought that the overview of standard setting was
either good (6%), very good (29%), or excellent (65%). Central to the exercises during
this standard setting session was the notion of the minimally competent (i.e., borderline)
candidate. Participants were asked to assess the clarity of the definition of that target
population that they developed. All 17 participants thought that the definition was clear
(76%) or very clear (24%).
A significant amount of time was devoted to training panelists for the task, which staff
considered extremely important to ensure a common understanding of what we
expected of them before they engaged in the actual bookmarking exercise. Ninety-four
percent of panelists thought that exercise was appropriate, 6% thought that it was
somewhat appropriate, and none thought it was not appropriate. All participants
thought that the training provided for the bookmark method was either good (12%), very
good (18%) or excellent (71%).
Among the factors that influenced participants the most when they engaged in the
Bookmark method were their perception of the level of difficulty of the items (94%), the
description of the minimally competent candidate (88%), the item statistics provided in
round 2 (76%), and the knowledge and skills measured by the items (76%). Among the
factors that had the least influence on their bookmarking exercise were the quality of the
item distractors (12%) and the number of answer choices per item (18%).
Participants were asked about their level of understanding of how to apply the
bookmark and Hofstee methods during round 1. For the bookmark method, 16 out of
17 participants said that they either understood (29%) or understood very well (65%)
this process while one participant reported that they understood “somewhat”. For the
Hofstee method, 1 participant (6%) said that they understood somewhat, 5 participants
said that they understood (29%), 11 participants (65%) said that they understood very
well, while none of the participants reported not understanding the method “at all”.
Participants were also asked about their level of confidence regarding the
consequential/feedback data and the final discussion. Two participants (12%) felt
somewhat confident, 6 participants (35%) felt confident, 9 participants (53%) felt very
confident, whereas none of the participants felt that they were not at all confident.
One of the significant outcomes desired following a standard setting exercise is a
standard that participants would recommend with a very high level of confidence. As
part of the survey, participants were asked about the level of confidence in the final
recommended passing score. One participant felt somewhat confident while the large
majority reported being confident (18%) or very confident (76%) about the
recommended cut score value.
Finally, participants were surveyed on potential improvements to consider for further
standard setting exercises. Among the suggestions for improvement were comments
about providing impact data after the practice bookmark method as well as each
panelist’s bookmark placement. Also, one participant suggested providing failure rates
for each panelist’s bookmark following the practice bookmark method. A few
participants felt that there were no improvements to be made.
Concluding Remarks
The main goal of this report was to outline the activities that constituted the
standard setting exercise for the MCCQE Part I. In summary, two panels were gathered
for the purpose of establishing and recommending a cut score by participating in a 2-
day session during which they were trained in the Bookmark and Hofstee standard
setting methods. A significant amount of time was spent defining the target population
and training of panelists on various critical aspects of the exercise. Two panels
established highly comparable cut scores as demonstrated by the overlap of their
respective confidence intervals based on the standard error of judgment. A high level of
confidence in the recommended cut score was expressed by a majority of participants.
Several staff from Psychometrics and Assessment Services and the Evaluation Bureau
participated in making this a successful session. Finally, a comprehensive description
of all the activities and the resulting cut score as well as impact data for both the spring
2014 and 2015 cohorts were presented to the CEC on June 8, 2015 for their discussion
and consideration. The CEC unanimously accepted the recommended cut score of
-0.22 (427 on the 3-digit MCCQE Part I reporting scale) at this meeting.
REFERENCES
Cizek, G. J. and Bunch, M. B. (2007). Standard Setting: A Guide to Establishing and
Evaluating Performance Standards on Tests (55-189). Thousand Oaks, CA: Sage.
De Champlain, A. F. (2014). Standard setting methods in medical education. In T.
Swanwick (Ed.). Understanding Medical Education: Evidence, Theory and Practice.
(305-316). Chichester, West Sussex: John Wiley & Sons, Ltd.
De Champlain, A. F. (2004). Ensuring that the competent are truly competent: An
overview of common methods and procedures used to set standards on high-stakes
examinations. Journal of Veterinary Medical Education, 31, 61-5.
Hofstee, W. K. B. (1983). The case for compromise in educational selection and
grading. In S. B. Anderson and J. S. Helmick (Eds.). On educational testing (109-127).
San Francisco: Jossey-Bass.
Kane, M. (1994). Validating the performance standards associated with passing
scores. Review of Educational Research, 64 (3), 425-461.
Kane, M. (1998). Choosing between examinee-centered and test-centered standard-
setting methods. Educational Assessment, 5 (3), 129-145.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-
174.
Rasch, G. (1960/1980). Probabilistic models for some intelligence and attainment tests.
(Copenhagen, Danish Institute for Educational Research), expanded edition (1980) with
foreword and afterword by B. D. Wright. Chicago: The University of Chicago Press.
Wright, B. D. and Stone, M. H. (1979). Best Test Design: Rasch Measurement
Table 1: Canadian and International Medical Graduate Pass/Fail Rates for the
Years 2012-2014

                                                   2012     2013     2014
Canadian Medical Graduates First-Time Takers
  FAIL                                             1.5%     1.3%     2.3%
  PASS                                            98.5%    98.7%    97.7%
Canadian and International Medical Graduates First-Time Takers
  FAIL                                             9.0%     8.6%    10.6%
  PASS                                            91.0%    91.4%    89.4%
Table 2: Standard Setting Results for Panels 1 and 2 for Rounds 1 and 2
Summary of Cut Scores by Panel for Rounds 1 and 2

                                        Round 1              Round 2
                                    Panel 1   Panel 2    Panel 1   Panel 2
Panelist 1                           -0.07     -0.98      -0.26     -0.04
Panelist 2                           -0.89     -1.74      -0.31     -0.02
Panelist 3                           -0.46     -1.73      -0.46     -0.44
Panelist 4                           -0.37      0.74      -0.19     -0.17
Panelist 5                           -0.07     -1.08      -0.26     -0.37
Panelist 6                           -0.95     -0.27      -0.59     -0.18
Panelist 7                           -0.99     -0.58      -0.20     -0.53
Panelist 8                            0.29      0.53      -0.27      0.38
Panelist 9                           -0.39                -0.43
Mean                                 -0.43     -0.64      -0.33     -0.17
Median                               -0.39     -0.78      -0.27     -0.17
Standard Deviation                    0.44      0.93       0.13      0.29
Standard Error of Judgment (SEJ)      0.16      0.35       0.05      0.10
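The summary rows of Table 2 can be recomputed from the individual panelist cut scores. The following is a minimal sketch using the Round 2 values transcribed from the table. It assumes a sample standard deviation (n - 1 denominator) and the textbook definition of the SEJ as SD divided by the square root of the number of panelists. The report does not state its SEJ formula here, so the SEJ values from this sketch may differ from the tabled ones in the second decimal place.

```python
import math
import statistics

# Round 2 bookmark cut scores (logit scale) transcribed from Table 2.
round2 = {
    "Panel 1": [-0.26, -0.31, -0.46, -0.19, -0.26, -0.59, -0.20, -0.27, -0.43],
    "Panel 2": [-0.04, -0.02, -0.44, -0.17, -0.37, -0.18, -0.53, 0.38],
}


def summarize(cuts):
    """Return n, mean, median, sample SD and SD / sqrt(n) for one panel."""
    n = len(cuts)
    sd = statistics.stdev(cuts)  # sample SD (n - 1 denominator)
    return {
        "n": n,
        "mean": statistics.mean(cuts),
        "median": statistics.median(cuts),
        "sd": sd,
        # ASSUMPTION: textbook SEJ definition; the report's formula is not given.
        "sej": sd / math.sqrt(n),
    }


summary = {panel: summarize(cuts) for panel, cuts in round2.items()}
for panel, stats in summary.items():
    print(panel, {name: round(value, 2) for name, value in stats.items()})
```

The drop in standard deviation between rounds (for example, from 0.93 to 0.29 for Panel 2) shows how far individual judgments converged after the Round 1 feedback.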
Figure 1: Failure Rates for First-Time Takers (Panel 1)
[Chart: Round 1 Failure Rates for First-Time Takers: Panel 1. Horizontal axis: Cut Score (-2.00 to 1.00). Vertical axis: failure rate (0% to 50%). Marked on the chart: Panel 1 Hofstee Cut Score, Panel 1 Bookmark Cut Score, Hofstee Lower Boundary, Hofstee Higher Boundary.]
Figure 2: Failure Rates for First-Time Takers (Panel 2)
[Chart: Round 1 Failure Rates for First-Time Takers: Panel 2. Horizontal axis: Cut Score (-2.00 to 1.00). Vertical axis: failure rate (0% to 50%). Marked on the chart: Panel 2 Hofstee Cut Score, Panel 2 Bookmark Cut Score, Hofstee Lower Boundary, Hofstee Higher Boundary.]
Note: For round 1, Panel 2's cut-score falls outside the lowest boundary as established by their Hofstee maximum failure rate and lowest cut-score.
Figure 3: Failure Rates for First-Time Takers (Combined Panels)
[Chart: Round 1 Failure Rates for First-Time Takers, Combined Panels. Horizontal axis: Cut Score (-2.00 to 1.00). Vertical axis: failure rate (0% to 30%). Marked on the chart: Panel 1, Panel 2 and Combined Bookmark Cut Scores; Panel 1, Panel 2 and Combined Hofstee Cut Scores; Hofstee Lower Boundary; Hofstee Higher Boundary.]
Note: Combined results produce a cut-score that falls within the Hofstee boundaries.
Figure 4: Failure Rates for all First-Time Takers (Round 2)
[Chart: Failure Rates for All First-Time Takers: Round 2. Horizontal axis: Cut Score (-0.70 to 0.50). Vertical axis: Percentage Failing (0% to 20%). Marked on the chart: Panel 1 Bookmark Cut Score, Panel 2 Bookmark Cut Score, Combined Bookmark Cut Score.]
Figure 5: Failure Rates for all First-Time Takers and Hofstee Boundaries
[Chart: Failure Rates for All First-Time Takers and Hofstee Boundaries, Round 2. Horizontal axis: Cut Score (-1.00 to 0.50). Vertical axis: Percentage Failing (0% to 20%). Marked on the chart: % of Failure, Panel 1 Bookmark Cut Score, Panel 2 Bookmark Cut Score, Combined Bookmark Cut Score, Hofstee Lower Boundary, Hofstee Higher Boundary.]
Appendix A: Demographic Information Sheet
The information requested below is being collected to help the MCC obtain a pan-
Canadian representative panel to recommend a passing score on the MCC Part I
Examination. This information will only be used to select the panel members so that we
can represent the diversity of physicians across the country. The information will not be
linked in any way to the collection of data for setting the passing score. Please note that the
meeting will take place on 22, 23, and 24 October 2014; we therefore ask panelists
to be available on those 3 days.
Please provide your name and contact information, and check a box next to each of the
questions. The form can be sent by mail or electronically by 30 April 2014.
Medical Council of Canada
100-2283 St-Laurent Blvd.
Ottawa, ON K1G 5A2
Name (please print):_____________________________________________
Preferred contact information (mailing address, email address & phone
number):______________________________________________________
_____________________________________________________________
1. Number of years in practice post residency:
1-5 years ☐
6-10 years ☐
11-20 years ☐
21-30 years ☐
More than 30 years ☐
2. Number of years’ experience supervising residents:
1-5 years ☐
6-10 years ☐
11-20 years ☐
21-30 years ☐
More than 30 years ☐
3. Do you have experience supervising Canadian Medical Graduates?
Yes ☐
No ☐
4. Have you ever been a member of a Medical Council test committee?
Yes ☐
No ☐
5. Country of medical training (post graduate training):
Canada ☐
Other ____________ ☐
6. Region of the country in which you live:
Alberta ☐
British Columbia ☐
Manitoba ☐
Maritimes ☐
Ontario ☐
Quebec ☐
Saskatchewan ☐
Territories ☐
7. First Language:
English ☐
French ☐
Other (______________) ☐
8. Sex:
Male ☐
Female ☐
9. Ethnicity:
Asian ☐
Black ☐
Caucasian ☐
First Nations ☐
Hispanic ☐
10. Medical Specialty:
Pediatrics ☐
Internal Medicine ☐
Psychiatry ☐
Obstetrics and Gynecology ☐
Surgery ☐
Family Medicine ☐
Other ______________ ☐
11. Type of community in which you work:
Urban ☐
Rural ☐
12. Type of care setting:
Hospital-based ☐
Community-based ☐
Appendix B: Demographic Summary of the Two Panels
Variable of Interest                     Group                    Panel A   Panel B   Total
Gender                                   Female                     56%       50%      53%
                                         Male                       44%       50%      47%
Geographic Region                        West                       22%       38%      29%
                                         Central                    56%       38%      47%
                                         East                       22%       25%      24%
Medical Specialty                        Internal Medicine          33%       38%      35%
                                         Surgery                    22%       13%      18%
                                         Obstetrics/Gynecology      11%       13%      12%
                                         Pediatrics                 22%       13%      18%
                                         Psychiatry                  0%       13%       6%
                                         Family Medicine            11%       13%      12%
Number of Years Supervising Residents    1-5 years                  11%       38%      24%
                                         6-10 years                 44%       13%      29%
                                         11-20 years                11%       25%      18%
                                         21-30 years                33%       25%      29%
Country of Medical Training              Canada                     89%       88%      88%
                                         Other                      11%       12%      12%
Appendix C: Standard Setting Agenda
STANDARD SETTING FOR QUALIFYING EXAMINATION PART I
MCC Office – University Boardroom (3rd Floor)
OCTOBER 23RD-24TH, 2014 | 8:00 a.m. – 4:00 p.m.

AGENDA

DAY 1: Thursday, Oct. 23rd, 2014
CONTINENTAL BREAKFAST 08:00
1. Breakfast and Registration 08:00
   1.1 Complete confidentiality and biographical information forms
   1.2 Let panellists know to what table/room they belong
2. Welcome and Introduction by MCC
   2.1 Introduction of Panellists
   2.2 Overview of Agenda
   2.3 Overview of Part I Examination
   2.4 Overview of Standard Setting
   2.5 Overview of Bookmark Method
3. Review Practice Test and Self-Score 09:30
   3.1 Break as needed
   3.2 Take Practice Test: 50 MCQs + 25 CDM questions
   3.3 Self-score using Practice Test Item Map
   3.4 Discuss knowledge and skills on test
LUNCH 11:45
4. Develop Target Student Description and Reach Consensus 12:30
   4.1 Clear definition of minimally competent candidate starting residency
5. Training of Bookmark Method & Practice 13:15
   5.1 Practice bookmark method 50 MCQs and 39 CDMs
P.M. BREAK 14:45
6. Practice Ordered Item Booklet (OIB) 15:00
   6.1 Provide item map for Practice OIB
   6.2 Discussion of ordered difficulty and placement of bookmark
   6.3 Survey post-bookmark training
7. Additional Discussion/Clarification 16:30
END OF DAY 1 17:00

DAY 2: Friday, Oct. 24th, 2014
CONTINENTAL BREAKFAST 08:00
8. Independently Mark Round 1 Bookmark Judgements/Hofstee by Panel 08:30
9. End of Round 1 11:30
   9.1 Data entry
LUNCH/Data Entry 11:30
10. Round 1 – Data Feedback Whole Group 12:15
   10.1 Provide panel- and room-level data and impact data
   10.2 Round 1 panel discussions with large group
11. Independently Make Round 2 Bookmark Judgements/Hofstee 13:00
P.M. BREAK 15:15
12. End of Round 2 15:15
   12.1 Data entry
13. Round 2 Data Feedback 15:45
   13.1 Provide panel- and room-level data and impact data
   13.2 Presentation of Bookmark Recommendation
14. Complete Final Evaluation and Collection of Materials 16:15
END OF DAY 2 16:30
Appendix D: Defining Borderline Performance and the Minimally
Competent Candidate
The “minimally competent” candidate entering residency is a candidate who possesses the
minimum level of knowledge, skills and attitudes required to safely practice medicine under
supervision. A “minimally competent” candidate’s performance is acceptable, despite gaps
in their knowledge and clinical decision-making skills.
The minimally competent candidate will:
Have the right attributes
Be able to reflect on their own limits
Be able to recognize that a patient is sick, but doesn’t necessarily know why
May not have the ability to adequately recognize life threatening situations
Be able to gather information but not necessarily be able to integrate it
Be reliable in identifying red flags (and sense of urgency) for patient safety
Ask for help
Improve over time
Recognize his/her own weakness
Have a willingness to learn and reflect on feedback
Incorporate professionalism
Be clinically, logically and culturally competent
Build a rapport with the patient
Synthesize information
Appendix E: Form to Document a Bookmark for Each Round
Panel: _______
Panelist: _______
Standard Setting for the Qualifying Examination Part I
The Bookmark Method
Please indicate the page number of the item on which you placed your bookmark. It
is the item for which, in your judgment, a minimally proficient candidate’s chance of
answering correctly falls below a 2/3 probability.
Please initial after each round:
Round Bookmark Page Initials
1
2
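The 2/3 response probability criterion on this form can be tied to a point on the ability scale. The sketch below assumes dichotomous items calibrated with the Rasch model (Rasch, 1960/1980), where an item of difficulty b is answered correctly with probability 2/3 at ability b + ln 2. It also assumes that the cut score is taken at the last item before the bookmarked page, which is one common convention and not necessarily the one the MCC applied. The item difficulties and the function names are hypothetical.

```python
import math


def rasch_p(theta, b):
    """Probability of a correct answer under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def rp_location(b, rp=2 / 3):
    """Ability at which an item of difficulty b is answered correctly with probability rp."""
    return b + math.log(rp / (1.0 - rp))


def bookmark_cut(difficulties, bookmark_page, rp=2 / 3):
    """Cut score implied by a bookmark on a 1-indexed page of an ordered item booklet.

    ASSUMPTION: the cut is the RP location of the last item before the bookmark,
    i.e. the hardest item the borderline candidate is still judged to master.
    """
    ordered = sorted(difficulties)  # booklet order: easiest item first
    if not 2 <= bookmark_page <= len(ordered):
        raise ValueError("bookmark must fall on page 2 or later of the booklet")
    return rp_location(ordered[bookmark_page - 2], rp)


# Hypothetical item difficulties in logits, not MCCQE Part I data.
difficulties = [-1.8, -1.2, -0.9, -0.4, 0.1, 0.7]
cut = bookmark_cut(difficulties, bookmark_page=4)
print(round(cut, 3), round(rasch_p(cut, -0.9), 3))
```

A candidate located exactly at the resulting cut score has a 2/3 chance on the item just before the bookmark and a lower chance on every later item in the booklet.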
Appendix F: Form to Document Hofstee Boundaries
Panel: _____
Panelist: _____
Standard Setting for the Qualifying Examination Part I
The Hofstee Method
Please answer the following 4 questions, once after each round:
1. What is the highest percent correct cut score that would be acceptable, even
if every candidate attains that score? This value represents your estimate of
the maximum level of knowledge that should be required of candidates.
Round 1: ______ Round 2: ______
2. What is the lowest percent correct cut score that would be acceptable, even if
no candidate attains that score? This value represents your judgment of the
minimum acceptable percentage of knowledge that should be tolerated.
Round 1: ______ Round 2: ______
3. What is the maximum acceptable failure rate? This value represents your
judgment of the highest percentage of failing candidates that could be
tolerated.
Round 1: ______ Round 2: ______
4. What is the minimum acceptable failure rate? This value represents your
judgment of the lowest percentage of failing candidates that could be
tolerated.
Round 1: ______ Round 2: ______
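The four answers on this form define a box of acceptable cut scores and failure rates. In the usual formulation of the Hofstee (1983) compromise method, a straight line is drawn from the point (lowest acceptable cut score, maximum failure rate) to the point (highest acceptable cut score, minimum failure rate), and the cut score is read off where that line meets the observed failure-rate curve. The following minimal sketch illustrates that idea. The function name, the grid search in steps of one percentage point and the candidate scores are assumptions for illustration and do not reproduce the MCC's computation.

```python
from bisect import bisect_left


def hofstee_cut(scores, k_min, k_max, f_min, f_max, step=1.0):
    """Return (cut, failure_rate) where the Hofstee line meets the failure curve.

    scores: candidates' percent-correct scores
    k_min, k_max: lowest / highest acceptable cut scores (questions 2 and 1)
    f_min, f_max: minimum / maximum acceptable failure rates as fractions
                  (questions 4 and 3)
    Returns None when the failure curve does not pass through the box.
    """
    ordered = sorted(scores)
    n = len(ordered)

    def fail_rate(k):
        # Fraction of candidates scoring strictly below the cut k
        return bisect_left(ordered, k) / n

    def line(k):
        # Straight line from (k_min, f_max) down to (k_max, f_min)
        return f_max + (f_min - f_max) * (k - k_min) / (k_max - k_min)

    if fail_rate(k_min) > f_max or fail_rate(k_max) < f_min:
        return None

    steps = int(round((k_max - k_min) / step))
    for i in range(steps + 1):
        k = k_min + i * step
        if fail_rate(k) >= line(k):  # first grid point at or past the crossing
            return k, fail_rate(k)
    return None


# Hypothetical percent-correct scores for 20 candidates.
scores = [45, 50, 52, 55, 58, 60, 61, 63, 65, 66,
          68, 70, 72, 74, 75, 77, 80, 82, 85, 90]
result = hofstee_cut(scores, k_min=55.0, k_max=70.0, f_min=0.05, f_max=0.30)
print(result)
```

When the function returns None, the failure curve misses the box entirely, which is the situation in which a method-based cut score falls outside the panel's Hofstee boundaries.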
Appendix G: Part I Standard Setting Fall 2014 – Post-Session Survey
Summary
1. Which panel did you participate in? (Select ONE)
Response Chart Percentage Count
Panel 1 (University room) 53% 9
Panel 2 (Barr/Bérard room) 47% 8
Total Responses 17
2. What was your impression of the clarity of the information regarding the overview
of the MCCQE Part I exam that was provided on the morning of Day 1? (Select ONE)
Response Chart Percentage Count
Excellent 65% 11
Very good 18% 3
Good 18% 3
Fair 0% 0
Poor 0% 0
Total Responses 17
3. What was your impression of the clarity of the information regarding the overview
of standard setting that was provided on the morning of Day 1? (Select ONE)
Response Chart Percentage Count
Excellent 65% 11
Very good 29% 5
Good 6% 1
Fair 0% 0
Poor 0% 0
Total Responses 17
4. What was your impression of the clarity of the information regarding the overview
of the Bookmark Method that was provided on the morning of Day 1? (Select ONE)
Response Chart Percentage Count
Excellent 53% 9
Very good 41% 7
Good 6% 1
Fair 0% 0
Poor 0% 0
Total Responses 17
5. How clear were you about the description of the “Minimally Competent” (or
sometimes called “Borderline”) candidate on the MCCQE Part I exam as you began
the task of setting a passing score following the training on the afternoon of Day 1?
(Select ONE)
Response Chart Percentage Count
Very clear 24% 4
Clear 76% 13
Somewhat clear 0% 0
Not clear 0% 0
Total Responses 17
6. Did you feel the discussion of the “Minimally Competent” (or sometimes called
“Borderline”) candidate on the MCCQE Part I exam was helpful during the training
on Thursday afternoon? (Select ONE)
Response Chart Percentage Count
Yes, very helpful 47% 8
Yes, helpful 47% 8
Yes, somewhat helpful 6% 1
Not helpful at all 0% 0
Total Responses 17
7. How would you judge the length of time spent (about 45 minutes on the agenda)
on the afternoon of Day 1 introducing, discussing and editing the definition of the
“Minimally Competent” or “Borderline” candidate? (Select ONE)
Response Chart Percentage Count
About right 82% 14
Too little time 6% 1
Too much time 12% 2
Total Responses 17
8. What is your impression of the practice session for applying the Bookmark
Method to a set of MCQs and CDM questions on the afternoon of Day 1? (Select
ONE)
Response Chart Percentage Count
Appropriate 94% 16
Somewhat appropriate 6% 1
Not appropriate 0% 0
Total Responses 17
9. What is your overall evaluation of the training that was provided for setting a
passing score using the Bookmark Method? (Select ONE)
Response Chart Percentage Count
Excellent 71% 12
Very good 18% 3
Good 12% 2
Fair 0% 0
Poor 0% 0
Total Responses 17
10. What factors influenced your placement of your Bookmark on day 2? (Select
ALL choices that apply)
Response Chart Percentage Count
The description of the Minimally Competent or Borderline candidate 88% 15
My perception of the difficulty of the test items 94% 16
The test item statistics 76% 13
Other panelists during the discussion 53% 9
My experience in the field 41% 7
Knowledge and skills measured by the test items 76% 13
The quality of the distractors to the test items 12% 2
The number of answer choices to the test items 18% 3
Other (please specify) 0% 0
Total Responses 17
11. How did you feel about participating in the group discussions regarding the
ordered item booklet? (Select ONE)
Response Chart Percentage Count
Very comfortable 82% 14
Somewhat comfortable 18% 3
Unsure 0% 0
Somewhat uncomfortable 0% 0
Very uncomfortable 0% 0
Total Responses 17
12. How would you rate your understanding of how to apply the Bookmark Method
during the marking round 1 on Day 2? (Select ONE)
Response Chart Percentage Count
I understood very well 65% 11
I understood 29% 5
I understood somewhat 6% 1
I did not understand at all 0% 0
Total Responses 17
13. How comfortable were you in applying the Bookmark Method during marking
round 1 on Day 2? (Select ONE)
Response Chart Percentage Count
Very comfortable 53% 9
Somewhat comfortable 35% 6
Unsure 12% 2
Somewhat uncomfortable 0% 0
Very uncomfortable 0% 0
Total Responses 17
14. How would you rate your understanding of the Hofstee task of providing
boundary values for the passing score?
Response Chart Percentage Count
I understood very well 53% 9
I understood 35% 6
I understood somewhat 12% 2
I did not understand at all 0% 0
Total Responses 17
15. How comfortable were you in applying the Hofstee during marking round 1 on
Day 2? (Select ONE)
Response Chart Percentage Count
Very comfortable 47% 8
Somewhat comfortable 41% 7
Unsure 6% 1
Somewhat uncomfortable 6% 1
Very uncomfortable 0% 0
Total Responses 17
16. What level of confidence do you have that the consequential/feedback data and
final discussion this afternoon helped the panel arrive at a defensible passing
score? (Select one)
Response Chart Percentage Count
Very confident 53% 9
Confident 35% 6
Somewhat confident 12% 2
Not at all confident 0% 0
Total Responses 17
17. What level of confidence do you have in the final recommended passing
score? (Select one)
Response Chart Percentage Count
Very confident 76% 13
Confident 18% 3
Somewhat confident 6% 1
Not at all confident 0% 0
Total Responses 17
18. How could the method used for setting a passing score on the MCCQE Part I
exam have been improved?
1. The process as executed was excellent.
2. no
3. I think it took a little while to grasp the concept of minimally competent & hence
the book mark but became very clear after the initial exercise
4. I think that people are pushed to change their scores after the first session on
day 2. The bias was to increase the passing score on the second round
because of the large disparity in panels.
5. This is my first time doing this exercise, so I do not have previous experience for
comparison. Having said that, I don't feel there was nothing to improve.
6. it would have been valuable after the practice bookmark to provide the data
including the impact information and graphical spread, as we had done after
round 1 on day 2.
7. I think the discussions were excellent!
8. no improvement needed - there was lots of time for discussion which I think was
important
9. Not sure; I thought the process went well as it is.
10. Develop the list of competencies from the onset of the exercise.
11. the teaching, preparation, and handling of questions were all excellent. there
was some confusion among participants as to whether they should discuss with
others or not, especially during round I. Given the discussion that ensued
after the impact statistics were shown, I wonder about including that on the
practice day so desensitize people to this aspect.
12. As suggested at the time, letting us know immediately what failure rate would
result from with our individual bookmarks would be helpful.