Post on 27-Jun-2020

Clerkship Assessments: Standardization, Subjectivity, and “Science”

Matt Fitz, MD

November 7, 2019

• I have no conflicts of interest

Today’s Objectives

1. Differentiate the role that NBME exams may have in predicting USMLE exams with restructuring of the preclinical & clinical years

2. Identify limitations of NBME subject exam assessments

3. Discuss additional and different options for assessment in the clinical setting

Where does this fit?

• Assessment
– Multiple-Choice Exams (& the NBME)
– Clinical Performance Evaluation / EPAs
– OSCE
– Professionalism
– Curriculum
– Simulation / Clinical Skills

NBME exams… not this again

• Potential applications to your curricular change: Timing
– Impact of shorter preclinical time
– Impact of Step 1
– How many NBME exams are necessary?

30 – 50% “uncovered” material?

Our study

1. We wanted to look at how different IM Clerkship characteristics would affect the NBME IM Subject Exam

2. And what about the impact of earlier clinical starts?

3. We also wanted to look at how different IM Clerkship characteristics and NBME usage would affect Step 2

4. Controlled for Step 1

The Schools

• 60 schools (IRB exemption / approval)
• >20,000 examinees from 2011-2014
• Examinee data was cross-referenced by NBME exam date and school-reported testing times
• IM Clerkships were highly variable
• 8 week (40%) v 12 week (40%) clerkships served as the framework for analysis

What were these characteristics?

• Length
• Combined
• Ambulatory
• P/F
• Honors
• Study Day(s)
• Didactic Hours
• # of NBME subject exams
• Preclinical curriculum
– Hybrid
– Traditional
– Organ-based

Two additional notes…

• Longitudinal students (3 schools)

• Non-traditional academic year start date schools*

NBME exam performance: July start dates

Note: N = 14,667 examinees from K = 37 Medical Schools. These results also control for examinees’ Step 1 Score

[Table, July start dates: Construct | Comparison | Mean Difference | 95% CI (Lower, Upper) | Adj. p]

NBME exam performance: May start dates

Note: N = 4,211 examinees from K = 10 Medical Schools. These results also control for examinees’ Step 1 Score

[Table, May start dates: Construct | Comparison | Mean Difference | 95% CI (Lower, Upper) | Adj. p]

Impact on USMLE Step 2 July academic start

Note: N = 13,486 non-longitudinal examinees from K = 36 Medical Schools. These results also control for examinees’ Step 1 Score.

Impact on USMLE Step 2 May academic start

Correlation Coefficients and RSQ

Clerkship        Step 1        RSQ    Step 2        RSQ    Exam Assessment
Neurology        0.62 - 0.72   ~.45   0.59 - 0.74   ~.5    Department
Medicine         0.64 - 0.72          0.65 - 0.69          Department
Psychiatry       0.57 - 0.64          0.59 - 0.68          Department
Surgery          0.59 - 0.68   ~.45   0.64 - 0.70   ~.5    NBME
Ob/Gyn           0.56 - 0.62          0.60 - 0.65          NBME
Pediatrics       0.57 - 0.62   ~.4    0.61 - 0.71   ~.45   Aquifer
Family Medicine  0.51 - 0.61          0.54 - 0.59          Aquifer

Collective R² ~.75

At Brown (2011-2014), the collective R² was .8
At USUHS (2009-2010), the collective R² was .77
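The RSQ figures can be related to the correlation ranges with a quick calculation. The sketch below assumes that each RSQ value is simply the square of a single-predictor correlation coefficient (R² = r²); the slides do not state how the RSQ column was derived, so treat this as an illustration only. The function name `r_squared_range` is hypothetical.

```python
# Sketch: convert a range of correlation coefficients into the share of
# variance explained, assuming a single-predictor relationship (R^2 = r^2).

def r_squared_range(r_low, r_high):
    """Return (low, high) R^2 values, rounded to two decimals."""
    return round(r_low ** 2, 2), round(r_high ** 2, 2)

# Step 1 correlation ranges copied from the table above.
step1_ranges = {
    "Neurology": (0.62, 0.72),
    "Medicine": (0.64, 0.72),
    "Pediatrics": (0.57, 0.62),
}

for clerkship, (r_low, r_high) in step1_ranges.items():
    print(clerkship, r_squared_range(r_low, r_high))
```

Under that assumption, a correlation in the 0.6 to 0.7 range corresponds to roughly 36% to 49% of score variance explained by a single exam.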

Discoveries

1. Clerkship length has a significant impact on NBME and Step 2 scores for schools with non-traditional clerkship dates, but not for the traditional schools

2. Other clerkship variables have very little significant impact

3. Longitudinal students score similarly, if not better, when compared to their peers

4. Increasing the number of NBME exams does not have a significant impact on Step 2 scores

A Few Ongoing Questions?

• What is the real impact of Step 1?
• What does this mean for preclinical and clinical integration? … if anything
• Why do the non-traditional schools demonstrate differences?
– And what may be the impact for us
• Will there be impact on Step 1 with NBME & clerkship variability as schools become more “non-traditional?”

Questions / Discussion?

Measure                                 Valid N   Odds Ratio   95% CI Lower   95% CI Upper   p
First Clinical Performance (CP1)        797       1.088        1.038          1.141          .001
Weighted CP1                            796       1.275        1.073          1.515          .01
Second Clinical Performance (CP2)       797       1.064        1.015          1.115          .01
Weighted CP2                            797       1.238        1.039          1.476          .02
Outpatient Clinical Performance (OCP)   797       1.089        1.038          1.143          .001
Weighted OCP                            797       0.968        0.603          1.554          .55
NBME score                              546       1.010        0.975          1.047          .58
Weighted NBME Score                     797       1.100        0.889          1.360          .38
OSCE score                              795       1.027        0.974          1.082          .33
Weighted OSCE score                     692       1.392        0.619          3.128          .42
MCQ Exam Score                          146       0.994        0.931          1.060          .85
Exam Free Text Score                    143       1.027        0.943          1.119          .54
Exam Score                              651       1.024        0.980          1.069          .29
Weighted Exam Score                     797       1.113        0.828          1.496          .48
Overall Clerkship Score                 797       1.088        1.021          1.159          .01
Final Grade (overall significance)      778                                                  .052
  High Pass versus Pass                           2.451        1.165          5.156          .02
  Honors versus Pass                              2.263        0.909          5.635          .08
  Honors versus High Pass                         0.923        0.409          2.086          .85

Table 1A: Odds of chief resident selection
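One way to read the odds ratio rows is to check whether the 95% confidence interval excludes 1 and to back out an approximate test statistic from the interval. The sketch below assumes the intervals are Wald intervals on the log-odds scale, which the slides do not confirm; the helper names `wald_from_ci` and `ci_excludes_one` are hypothetical, and the recovered p values are approximations, not the study's own calculations.

```python
import math

def wald_from_ci(odds_ratio, lower, upper, z_crit=1.96):
    """Back out the log-odds standard error, z statistic, and two-sided p
    from an odds ratio and its 95% CI (assumes a Wald interval)."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p

def ci_excludes_one(lower, upper):
    """True when the interval lies entirely above or entirely below 1."""
    return lower > 1.0 or upper < 1.0

# (odds ratio, lower, upper) copied from Table 1A.
rows = {
    "CP1": (1.088, 1.038, 1.141),
    "NBME score": (1.010, 0.975, 1.047),
}

for name, (odds_ratio, lower, upper) in rows.items():
    se, z, p = wald_from_ci(odds_ratio, lower, upper)
    print(f"{name}: z={z:.2f}, p={p:.3f}, CI excludes 1: {ci_excludes_one(lower, upper)}")
```

An interval that sits entirely above 1, as for CP1, indicates that higher scores on that measure are associated with higher odds of selection; an interval that straddles 1, as for the NBME score, does not support an association in either direction.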

External Validity – High Performers

Internal & External Validity – Low Performers

Medicine vs Surgery   Valid N   Odds Ratio   95% CI Lower   95% CI Upper   p
2012-2013             144       6.771        2.822          16.248         <.001
2013-2014             136       2.716        1.313          5.617          .01
2014-2015             155       5.073        2.062          12.481         <.001
2015-2016             154       3.380        1.495          7.645          .004
All Years             586       4.117        2.758          6.145          <.001

Table 2: Odds of recording lower level CPE responses between medicine versus surgery

Overall numbers are quite low; for Medicine, identified low performers accounted for 2.5% of the total evaluation marks.

Surrogate Markers – Word Count

             Medicine                         Surgery
             Median   IQR Lower   IQR Upper   Median   IQR Lower   IQR Upper
2012-2013    54       39          73          16       3           30
2013-2014    61       45          77          18       3           36
2014-2015    70       52          87          25       10          44
2015-2016    69       50          82          25       13          41
All Years    64       46          80          22       7           38

Summary statistics for word counts

Surrogate Markers – Unique Boxes

Medicine vs Surgery   Valid N   Odds Ratio   95% CI Lower   95% CI Upper   p
2012-2013             144       19.963       12.451         32.006         <.001
2013-2014             136       1.067        0.753          1.511          .71
2014-2015             155       0.932        0.687          1.264          .65
2015-2016             154       1.623        1.186          2.221          .003
All Years             586       1.917        1.633          2.249          <.001

Table 5A: Odds of more unique boxes checked for medicine v surgery

Surrogate Markers – Unobserved Behaviors:

Generally, evaluations of students in the Medicine clerkship were more likely to record unobserved behaviors than evaluations of students in the Surgery clerkship

Pro: Faculty and staff are encouraged NOT to blindly assess every competency if they did not have enough observations of a particular competency (Procedures)

Con: Or does this actually reflect that the teams were not observing behaviors, which raises the question about the overall validity of all other competencies evaluated?