ANNALS OF SURGERY Vol. 222, No. 6, 735-742 © 1995 Lippincott-Raven Publishers

The Objective Structured Clinical Examination
The New Gold Standard for Evaluating Postgraduate Clinical Performance

David A. Sloan, M.D., Michael B. Donnelly, Ph.D., Richard W. Schwartz, M.D., and William E. Strodel, M.D.

From the Department of Surgery, University of Kentucky Chandler Medical Center, Lexington, Kentucky

Objective
The authors determine the reliability, validity, and usefulness of the Objective Structured Clinical Examination (OSCE) in the evaluation of surgical residents.

Summary Background Data
Interest is increasing in using the OSCE as a measurement of clinical competence and as a certification tool. However, concerns exist about the reliability, feasibility, and cost of the OSCE. Experience with the OSCE in postgraduate training programs is limited.

Methods
A comprehensive 38-station OSCE was administered to 56 surgical residents. Residents were grouped into three levels of training: interns, junior residents, and senior residents. The reliability of the examination was assessed by coefficient alpha; its validity, by the construct of experience. Differences between training levels and in performance on the various OSCE problems were determined by a three-way analysis of variance with two repeated measures and the Student-Newman-Keuls post hoc test. Pearson correlations were used to determine the relationship between OSCE and American Board of Surgery In-Training Examination (ABSITE) scores.

Results
The reliability of the OSCE was very high (0.91). Performance varied significantly according to level of training (postgraduate year; p < 0.0001). Senior residents performed best, and interns performed worst. The OSCE problems differed significantly in difficulty (p < 0.0001). Overall scores were poor. Important and specific performance deficits were identified at all levels of training. The ABSITE clinical scores, unlike the basic science scores, correlated modestly with the OSCE scores when level of training was held constant.

Conclusion
The OSCE is a highly reliable and valid clinical examination that provides unique information about the performance of individual residents and the quality of postgraduate training programs.


The primary goal of training programs is to produce competent practitioners. In most training programs, the performance of residents is judged by subjective faculty evaluations and standardized multiple-choice tests, such as the In-Training Examination. This type of assessment program is problematic for two reasons: (1) subjective faculty evaluations are unreliable [1-3] and tend to inflate resident performance [3-5]; and (2) multiple-choice examinations, such as the In-Training Examination or the appropriate American Board Certifying Examination, although very reliable, assess only a single dimension of clinical competence, that is, knowledge base [6]. Other important aspects of clinical expertise, such as physical examination skills, interpersonal skills, technical skills, problem-solving abilities, decision-making abilities, and patient treatment skills, are not assessed objectively.

Recently, clinicians have focused on the Objective Structured Clinical Examination (OSCE), a multidimensional practical examination of clinical skills, as a tool for assessing clinical competence [7-12]. Although most of the information on the OSCE has been gained from experience with medical students, a handful of studies have shown the value of this tool in assessing the performance of residents [9-12]. Experience with the OSCE is somewhat limited in the United States, but this evaluative method has emerged elsewhere as the premier method for assessing clinical competence. In Canada, for example, all physicians must now pass the OSCE if they are to be licensed by the Medical Council of Canada [13,14]. Legitimate concerns linger, however, about the wide range of reliabilities (0.19-0.89) reported for the OSCE [9,10,14-18]; there is no universal agreement that conventional assessment methods must be abandoned and replaced by such performance-based tests.

A pilot study at our institution demonstrated that the OSCE reliably measured the clinical performance of interns [11]. We have also shown that the OSCE furnishes information on clinical competence distinct from that provided by faculty ward evaluations and the American Board of Surgery In-Training Examination (ABSITE) [19]. Numerous competency deficits are uncovered by the OSCE, and we and others believe that medical graduates enter postgraduate training programs with weak, and even declining, clinical skills [11,20-23].

In the current study we had three goals: (1) to determine the reliability of the OSCE in testing an entire residency population; (2) to determine the validity of the OSCE in measuring the performance of residents at multiple training levels; and (3) to determine the usefulness of the information gained about residents' clinical skills. To our knowledge, such an assessment of an entire cohort of residents has never been performed.

Supported in part by a grant from the Association for Surgical Education.

Address reprint requests to David A. Sloan, M.D., Department of Surgery, University of Kentucky Chandler Medical Center, 800 Rose Street, Lexington, KY 40536-0084.

Accepted for publication November 23, 1994.

METHODS

A comprehensive 38-station OSCE was administered to 56 surgical residents whose postgraduate year (PGY) levels ranged from PGY1 to PGY6. The examination was administered just before the beginning of the academic year, so that the PGY1 residents (interns) were entering the training program. All residents were from the same surgical residency program at the University of Kentucky. For the purposes of data analysis, the residents were grouped into three levels of training: (1) incoming interns (n = 18); (2) junior residents (PGY2 and PGY3; n = 25); and (3) senior residents (PGY4, PGY5, and PGY6; n = 13). The examination was conducted on two consecutive Saturday mornings; half of the residents were evaluated on each day.

The OSCE consisted of 19 clinical problems (Table 1); each clinical problem was divided into two parts (A and B). Part A consisted of a 5-minute interaction between patient and resident, in which the resident was usually asked either to obtain a focused history or to perform a physical examination. Occasionally, the resident was asked to perform a technical exercise (i.e., suturing a laceration) or to give a second opinion (i.e., explain the options to a patient with newly diagnosed breast cancer). Some of the patients used were actual patients, whereas others were playing the role of a patient. A faculty member graded each resident according to a given set of predetermined criteria presented in the form of a checklist. The items on the checklist were the key history or physical examination items that had been deemed by expert faculty to be critical to a competent performance. For example, a breast surgeon listed all of the key physical examination maneuvers and the pertinent clinical findings that he considered essential to making a diagnosis in the case of a patient with fibrocystic breast disease. At each examination station, the faculty members acted as passive evaluators and were instructed not to guide or prompt the residents. Although surgical residents were being evaluated, several faculty members were recruited from other departments, including internal medicine, to proctor those stations that dealt with clinical problems of a broader nature. A total of 34 faculty members participated in the examination.

Part B of each clinical problem consisted of a series of questions dealing directly with the patient interaction completed in part A. For example, having just examined a patient with a thyroid nodule, the residents were asked to respond to questions concerning the diagnostic workup and treatment of the patient. The part B answers were graded according to a checklist of objective criteria, again preset by expert faculty members. The part B stations were also 5 minutes in duration.


Table 1. CONTENT OF THE RESIDENT OBJECTIVE STRUCTURED CLINICAL EXAMINATION (OSCE)

OSCE Problem: Description

Arterial ischemia: Examination of an actual patient with signs of advanced peripheral vascular disease
Knee trauma: Examination of a simulated patient with a history of knee trauma
Biliary colic: Evaluation of a simulated patient with symptomatic cholelithiasis; explanation of options
Suturing: Suturing a standardized laceration on a pig's hock
Sciatica: Examination of a simulated patient with back pain
Postoperative abdominal pain: Evaluation of a simulated postoperative patient with new onset of abdominal pain and fever
Breast cancer options: Explanation of treatment options to a patient with newly diagnosed breast cancer
Multitrauma assessment: Resuscitation of a simulated patient with multiple injuries
Tongue cancer: Obtaining a history from a simulated patient with a new sore on his tongue
Hypercalcemia: Assessment of an actual patient with symptoms of primary hyperparathyroidism
Thyroid nodule: Examination of an actual patient with a solitary thyroid nodule
Venous leg ulcer: Examination of an actual patient with a venous stasis ulcer
Surgical anatomy: Identification of multiple surface anatomy landmarks of surgical importance
Hematuria: Evaluation of a simulated patient with gross hematuria
Breast examination: History and physical examination of an actual patient with fibrocystic breast disease
ICU: Paper case dealing with a complex ICU patient management problem
Abdominal pain: Obtaining a history from a patient with left lower quadrant pain and rectal bleeding
Mole evaluation: Focused history and physical examination of an actual patient with a mole
Lung cancer: Obtaining a history from a simulated patient with symptomatology suggestive of lung cancer

ICU = intensive care unit.

A computerized buzzer system signaled the residents to rotate from station to station until each candidate had visited every station. The total examination time for each resident was 190 minutes.

The reliability of the examination was assessed by coefficient alpha. Reliability coefficients were calculated for part A, part B, and combined (averaged) part A/part B. The unit of analysis in calculating the reliability coefficients was the total score. The number of stations needed for the benchmark reliability (0.80) was estimated by the Spearman-Brown formula. Validity was assessed with use of the construct of experience (e.g., senior residents should perform better than junior residents). A three-way analysis of variance with two repeated measures and the Student-Newman-Keuls post hoc test were used to determine whether there were significant differences among PGY levels, whether there was a significant difference in performance on part A and part B, and whether there were differences in performance among the various clinical problems. Pearson's correlations were used to determine the magnitude of the relationship between level of performance and level of training and to determine the relationship between the OSCE and the ABSITE. Competent performance on the OSCE was operationally defined as 60%. This standard applied to individual problems as well as to the overall examination. Considering that this was the first OSCE that the entire group of residents had taken, the decision to set 60% as a passing score for each station was arbitrary. The 0.05 level of confidence was used to define a significant difference.
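For reference, the two reliability statistics named above take their standard psychometric forms (supplied here for clarity; the formulas are not reproduced in the original article): coefficient alpha for a k-station examination, and the Spearman-Brown prophecy formula projecting the reliability of an examination whose length is changed by a factor n,

\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_X^2}\right), \qquad \rho_n = \frac{n\,\rho}{1 + (n-1)\,\rho},

where \sigma_i^2 is the score variance at station i, \sigma_X^2 is the variance of the total score, \rho is the observed reliability, and \rho_n is the projected reliability of the lengthened or shortened examination.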

RESULTS

The reliability of the part A component of the examination was 0.87. The reliability of part B was 0.83. When parts A and B were combined, the reliability of the entire examination increased to 0.91. The Spearman-Brown formula indicated that an 8-problem, 16-station OSCE was required to reach the benchmark reliability of 0.80.
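As a rough check (our arithmetic, not reported by the authors), solving the Spearman-Brown formula for the length factor n that would reduce the observed reliability of 0.91 to the 0.80 benchmark gives

n = \frac{0.80\,(1 - 0.91)}{0.91\,(1 - 0.80)} = \frac{0.072}{0.182} \approx 0.40,

and 0.40 of the 38 stations is roughly 15 stations, which rounds up to 8 two-part problems (16 stations), consistent with the figure quoted above.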

Performance varied significantly by level of training (F = 53.87; df = 2, 53; p < 0.0001). The Student-Newman-Keuls post hoc test indicated that the three resident groups were significantly different from each other: the senior residents performed best; the interns, worst. Pearson's correlation between level of training and average OSCE percentage score was 0.80. All of these data strongly support the construct validity of the OSCE.

The residents performed better on part A than on part B (F = 220.18; df = 1, 53; p < 0.0001). Figure 1 shows the mean OSCE part A and part B percentage scores for each resident group. This figure shows that there is an almost constant difference between part A and part B for each of the three groups. There was also a significant difference in the difficulty of the various OSCE problems (F = 39.89; df = 18, 954; p < 0.0001). Figure 2 shows the mean percentage score for each OSCE problem. As can be seen from this figure, the means varied from 37% for the arterial ischemia problem to 73% for the lung cancer problem.


Figure 1. Mean percentage OSCE score (parts A and B) and 95% confidence interval for each resident group: interns (PGY-1; n = 18), junior residents (PGY-2 and 3; n = 25), and senior residents (PGY-4, 5, and 6; n = 13).

Forty-three of the 56 residents also took the ABSITE examination the following January. Performance on the OSCE was correlated with performance on the ABSITE. The results of these analyses are shown in Table 2. The second column of the table presents the simple correlations between the OSCE and ABSITE scores. The ABSITE Clinical score correlated very highly with the three OSCE scores. In contrast, the ABSITE Basic Science score correlated much more modestly with the OSCE scores. We hypothesized that these correlations might be artificially high because scores on both the OSCE and the ABSITE systematically vary by level of training. To control for this factor, we computed partial correlations between the OSCE and ABSITE scores, holding level of training constant. These correlations are presented in the third column of Table 2. As can be seen from the table, the partial correlations are dramatically lower than the simple correlations. The ABSITE Clinical score correlates most highly with the OSCE scores, although these correlations are not very high.
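The partial correlations in the third column can be read as first-order partial correlations with PGY level held constant; the standard formula (supplied here for clarity, not given in the original article) is

r_{\mathrm{OSCE,ABSITE\cdot PGY}} = \frac{r_{\mathrm{OSCE,ABSITE}} - r_{\mathrm{OSCE,PGY}}\, r_{\mathrm{ABSITE,PGY}}}{\sqrt{\left(1 - r_{\mathrm{OSCE,PGY}}^2\right)\left(1 - r_{\mathrm{ABSITE,PGY}}^2\right)}},

which makes explicit how a large simple correlation can shrink once the shared dependence of both scores on level of training is removed.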

Figure 3 shows the distribution of the average OSCE problem scores. The scores ranged from the 30s to the 60s. Seventy-five percent of the scores were below the established acceptable performance level of 60%. As noted earlier, the 60% passing standard was arbitrary and perhaps not reflective of competence. Items on the checklist varied in importance. In conducting future OSCEs, we plan to prospectively determine key competence items and then base a passing score on these key items. The percentage of residents whose performance was deficient on each OSCE problem is shown in Figure 4. As can be seen from this figure, significant group deficits existed for most of the OSCE problems.

The direct cost for the one-time examination was calculated to be $74.00 per resident. The facility and faculty time were donated. Patients were paid $20.00 per hour for their training and participation in the OSCE.

DISCUSSION

Physicians recognize that clinical competence is determined by more than knowledge. Although a sound knowledge base is vital, clinical competence encompasses numerous other domains, including interviewing and interpersonal skills, physical diagnosis skills, problem-solving abilities, and technical skills. Unfortunately, many of the skills crucial to the competent performance of a physician or surgeon are poorly evaluated by faculty members [3,24]. Stillman et al. noted that in many cases, internal medicine residents taking a history or performing a physical examination were never observed by faculty members [24]. Researchers have established that the reliability and validity of faculty rating forms are generally poor and that these ratings do not correlate well with more objective measures of clinical competence [1-3,16,19].



Figure 2. Mean percentage score and 95% confidence interval for each OSCE problem.

Table 2. CORRELATION OF OBJECTIVE STRUCTURED CLINICAL EXAMINATION SCORES WITH AMERICAN BOARD OF SURGERY IN-TRAINING EXAMINATION PERFORMANCE

                         Correlation    Partial Correlation
ABSITE total
  OSCE total             0.66*          0.25
  OSCE part A            0.58*          0.09
  OSCE part B            0.71*          0.37†
ABSITE clinical
  OSCE total             0.81*          0.42‡
  OSCE part A            0.78*          0.37†
  OSCE part B            0.80*          0.43‡
ABSITE basic science
  OSCE total             0.39†          0.10
  OSCE part A            0.28           -0.08
  OSCE part B            0.48‡          0.27

ABSITE = American Board of Surgery In-Training Examination; OSCE = Objective Structured Clinical Examination.
* p < 0.0001. † p < 0.05. ‡ p < 0.01.

Other studies have noted that faculty members typically inflate resident performance and are generally reluctant to underscore deficits in clinical performance [4,5,19]. Multiple-choice tests, such as the ABSITE and the tests developed by the National Board of Medical Examiners, although very reliable for measuring knowledge, do not measure clinical ability, nor do they correlate well with other evaluative tools that do measure such ability [6,11,12,19].

Figure 3. Distribution of average OSCE scores by score range.


Figure 4. Percentage of residents scoring less than 60% (percent unsatisfactory) on each OSCE problem.

The OSCE is a relatively new tool for evaluating physicians in training. The examination was introduced by Harden et al. at the University of Dundee (Dundee, Scotland) in an effort to improve the evaluation of medical students' clinical performance [7]. As described by the authors, the OSCE consisted of a number of stations through which medical students rotated, spending predetermined amounts of time at each station. Different clinical skills were assessed at the individual stations, and item checklists were used to objectively grade students' skills in performing clinical tasks. As elaborated by Harden et al., the OSCE offered the advantages of controlled grading criteria and easy repeatability of the examination [7]. Subsequent authors have shown the OSCE to be a reliable and valid examination not only for medical students but also for residents in a variety of disciplines, including internal medicine and surgery [8-12]. In Canada, confidence in the OSCE as a superior method of performance evaluation has led the Medical Council of Canada and several specialty boards to include the OSCE in the licensing and certifying examination process [13-15]. Although such steps have not been taken in the United States, there is considerable interest in expanding the use of performance-based testing to evaluate the performance of medical students and residents. We believe that it is crucial that the OSCE be completely evaluated for reliability and validity before the numerous resources (e.g., money, faculty time, patient recruitment and training, administrative costs, and quality control costs) necessary to develop OSCE examinations for certification and licensure are expended. One of the primary problems with the OSCE is the wide range of reliabilities, from 0.19 to 0.89, reported for this examination [9,10,14-18]. This wide range has actually led some authors to suggest that the test is only reliable if it lasts for 6 to 10 hours, which would clearly produce an unmanageable and impractical examination [17]. It is absolutely necessary to show that the OSCE has a reliability that can justify its use as a licensing examination. In addition, to our knowledge, an OSCE has never before been given to all residents in an entire residency program. Administering the OSCE in this manner is mandatory if we are to determine the validity of such an examination. It is crucial to demonstrate that the OSCE can detect significant differences in clinical performance between residents at different levels of training.

The reliability of this OSCE examination was excellent (0.91), exceeding both the accepted benchmark reliability standard of 0.80 and all other reported reliabilities. This level of reliability is comparable to the reliability of the multiple-choice examinations that are used for the purposes of the In-Training Examination and specialty board certification [25,26]. A testing time of 3 hours was sufficient to achieve reliability. In fact, the Spearman-Brown equation indicated that an even shorter examination containing only 16 stations or 8 problems would have been associated with an acceptable level of reliability. We find it interesting that in the literature, reliabilities for the OSCE appear to be better for residents than for medical students. We attribute the high reliability of the OSCE in the current study partly to the population tested, which consisted of residents at multiple levels of training. The reliability was also improved by the large number of clinical problems.


Another advantage of testing residents at multiple training levels is that validity can be established. Previous reports on the reliability of the OSCE have not been based on the examination of an entire group of residents. A valid examination should show that more experienced residents perform at a higher level. In the current study, the performance of residents could be grouped into three categories, the highest level of performance coming from the residents at the most senior levels. Joorabchi used an OSCE to evaluate a group of pediatric residents and found it to be more effective at identifying the level of residency training than either the Pediatric Board In-Training Examination or resident performance ratings [12]. Cohen et al., at the University of Toronto, used the OSCE to evaluate the clinical skills of PGY2 surgical residents [9]. Significant differences in scores were found between these residents and foreign medical graduates [9].

The question of whether the OSCE provides information not provided by other testing methods, such as the ABSITE, cannot be answered, because scores on both testing methods tend to increase with level of experience. Thus, the two methods will tend to rank individuals in the same way even though they may be measuring very different skills. The simple correlations between ABSITE clinical scores and OSCE scores demonstrated just such an effect. However, when level of training was controlled for, there was a dramatic decrease in the magnitude of the correlations between ABSITE and OSCE. Nevertheless, the ABSITE clinical scores did correlate significantly with the OSCE scores, and to some limited degree they measure similar skills.

The OSCE has not been used as a screening test for surgical residents by our institution or by any other, to our knowledge. The weak performance of incoming interns in general is disturbing and underscores significant deficiencies in undergraduate medical education. With the growing number of institutions using the OSCE method to test medical students, perhaps program directors should look specifically at prospective residents' OSCE scores in surgery and in other disciplines.

The advantages of the OSCE are numerous. By definition, residents are placed in well-defined clinical scenarios in which the variables can be controlled. There is no limit to the variety of clinical situations that can be constructed. Although most authors have relied almost exclusively on persons playing the role of patients (i.e., "simulated" patients), we have attempted to use as many actual patients as possible. Although it has been shown that simulated patients reliably mimic actual patients [27], actual patients are clearly superior in demonstrating physical findings, such as a thyroid nodule, a venous stasis ulcer, or an ischemic foot.

The OSCE allows residents to be placed in clinical situations that cannot be duplicated by a case presented on paper. Further, the OSCE format permits evaluation of more obscure areas, such as the residents' abilities to interact effectively with a patient [28]. We have previously demonstrated a correlation between resident interpersonal skills and clinical performance on the OSCE [28]. Without question, a resident with poor interpersonal skills stands out in the OSCE, and we have shown that faculty members can reliably measure this subjective aspect of a resident's performance with this test [28].

A real advantage of the OSCE is that it demonstrates to residents that basic clinical skills are valued by faculty members. Having a faculty member directly observe residents either taking a history or performing a physical examination underlines the importance of clinical examination and sends a clear message to the residents. Not only are faculty rating forms vague, but also, faculty members are reluctant to criticize trainees on the basis of this subjective information [29]. The OSCE results are more objective and allow for much more accurate feedback to residents. Such feedback is particularly valuable when dealing with the problem resident. Typically, the only information available for judging a resident who is performing poorly is gained from faculty ratings, which, as previously stated, tend to underestimate the problem [3,19,29]. The OSCE provides the program director with much more objective information than would otherwise be available.

Perhaps the greatest benefit of the OSCE is that it allows identification of problem areas [11,20,21]. The OSCE can be used to identify weaknesses within the resident curriculum. In the current study we showed numerous deficits in clinical performance on basic clinical problems, even among the more senior residents. We have begun to use this information to modify our curriculum, as have other investigators [30].

We conclude that the OSCE is a very reliable and valid method for evaluating residents. Because the OSCE provides a unique insight into the progression of residents' clinical competence, we believe that it should become a standard part of resident evaluation. We believe that the data support our contention that the OSCE is the new gold standard for evaluating the clinical performance of residents.


References

1. Maxim BR, Dielman TE. Dimensionality, internal consistency, and interrater reliability of clinical performance ratings. Med Educ 1987; 21:130-137.
2. Ansell JS, Boughton R, Cullen T, et al. Lack of agreement between subjective ratings of instructors and objective testing of knowledge acquisition in a urological continuing medical education course. J Urol 1979; 122:721-723.
3. Schwartz R, Donnelly M, Drake D, Sloan D. Faculty sensitivity in detecting medical students' clinical competence. Clin Invest Med 1993; 16(suppl):B87.
4. Goetzl EJ, Cohen P, Downing E, et al. Quality of diagnostic examinations in a university hospital outpatient clinic. JAMA 1983; 249:1035-1037.
5. Wray NP, Friedland JA. Detection and correction of house staff error in physical diagnosis. Ann Intern Med 1973; 78:481-489.
6. Levine HG, McGuire CH. The validity of multiple choice tests as measures of competence in medicine. Am Educ Res J 1970; 7:69-83.
7. Harden RM, Stevenson M, Downie WW, Wilson GM. Assessment of clinical competence using objective structured examination. Br Med J 1975; 1:447-451.
8. Petrusa ER, Blackwell TA, Carline J, et al. A multi-institutional trial of an objective structured clinical examination. Teach Learn Med 1991; 3:86-94.
9. Cohen R, Reznick RK, Taylor BR, et al. Reliability and validity of the objective structured clinical examination in assessing surgical residents. Am J Surg 1990; 160:302-305.
10. Petrusa ER, Blackwell TA, Ainsworth MA. Reliability and validity of an objective structured clinical examination for assessing the clinical performance of residents. Arch Intern Med 1990; 150:573-577.
11. Sloan DA, Donnelly MB, Johnson SB, et al. Use of an objective structured clinical examination (OSCE) to measure improvement in clinical competence during the surgical internship. Surgery 1993; 114:343-351.
12. Joorabchi B. Objective structured clinical examination in a pediatric residency program. Am J Dis Child 1991; 145:757-762.
13. Reznick RK, Blackmore D, Cohen R, et al. An objective structured clinical examination for the licentiate of the Medical Council of Canada: from research to reality. Acad Med 1993; 68(10):S4-S6.
14. Reznick R, Smee S, Rothman A, et al. An objective structured clinical examination for the licentiate: report of the Pilot Project of the Medical Council of Canada. Acad Med 1992; 67:487-494.
15. Grand'Maison P, Lescop J, Rainsberry P, Brailovsky CA. Large-scale use of an objective structured clinical examination for licensing family physicians. Can Med Assoc J 1992; 146:1735-1740.
16. Matsell DG, Wolfish NM, Hsu E. Reliability and validity of the objective structured clinical examination in paediatrics. Med Educ 1991; 25:293-299.
17. Newble DI, Swanson DB. Psychometric characteristics of the objective structured clinical examination. Med Educ 1988; 22:325-334.
18. Roberts J, Norman G. Reliability and learning from the objective structured clinical examination. Med Educ 1990; 24:219-223.
19. Schwartz R, Donnelly M, Sloan D, et al. The relationship between faculty ward evaluations, OSCE, and ABSITE as measures of surgical intern performance. Am J Surg (in press).
20. Sloan DA, Donnelly MB, Schwartz RW, et al. Assessing the clinical competence of medical students and surgery residents in surgical oncology problem solving. Ann Surg Oncol 1994; 1:204-212.
21. Endean E, Sloan D, Veldenz H, et al. Performance of the vascular physical examination by residents and medical students. J Vasc Surg 1994; 19:149-156.
22. Gleeson F. Defects in postgraduate clinical skills as revealed by the objective structured long examination record (OSLER). Ir Med J 1992; 85:11-14.
23. Reisner E, Dunnington G, Beard J, et al. A model for the assessment of students' physician-patient interaction skills on the surgical clerkship. Am J Surg 1991; 162:271-273.
24. Stillman PL, Swanson DB, Smee S, et al. Assessing clinical skills of residents with standardized patients. Ann Intern Med 1986; 105:762-771.
25. Grosse ME, Craft GE, Blaisdell W. The American Board of Surgery In-Training Examination. Arch Surg 1980; 115:654-657.
26. Langdon LO, Grosso LJ, Day SC, et al. A core component of the certification examination in internal medicine. J Gen Int Med 1993; 8:497-501.
27. Norman GR, Tugwell P, Feightner JW. A comparison of resident performance on real and simulated patients. J Med Educ 1982; 57:708-715.
28. Sloan D, Donnelly M, Johnson S, et al. Assessing the interpersonal skills of surgical residents and medical students. J Surg Res 1994; 57:613-618.
29. Irby DM, Fantel JI, Milam SD, Schwarz MR. Special report: legal guidelines for evaluating and dismissing medical students. N Engl J Med 1981; 3:180.
30. Lunenfeld E, Weinreb B, Lavi Y, et al. Assessment of emergency medicine: a comparison of an experimental objective structured clinical examination with a practical examination. Med Educ 1991; 25:38-44.

