Curriculum Evaluation
Eva Aagaard, MD, University of Colorado Denver School of Medicine
Stephen D. Sisson, MD, The Johns Hopkins University School of Medicine
Abby L. Spencer, MD, MS, Allegheny General Hospital
Donna Windish, MD, MPH, Yale Primary Care Residency Program
Workshop Overview
Review vocabulary, hypothesis testing, and study designs
Small group exercises
Review evaluation objects and evaluation methods
Small group exercises
Review statistical methods
Small group exercises
Wrap-up/evaluations
Vocabulary
Evaluation
Formative evaluation: evaluation with intent to improve performance, usually provided during the evaluated experience
Summative evaluation: evaluation with intent to judge performance, usually provided at the end of the evaluated experience
Quantitative Data
Quantifiable, numerically expressed data
Examples:
Number of students taking a course
Average post-test score of all PGY-3 residents
Source: oerl.sri.com
Quantitative Analysis
Use of computational procedures and statistical tests to evaluate quantitative data
Examples: Means, standard deviations, tests of statistical significance
Source: oerl.sri.com
Qualitative Data
Non-quantified narrative information
Example: Words used by physicians talking to patients to admit errors
Source: oerl.sri.com
Qualitative Analysis
Use of systematic procedures (inductive, iterative) for deriving meaning from qualitative data
Example: A physician panel reviews doctor/patient discussions of medical errors to reach consensus on which words physicians use to admit errors
Source: oerl.sri.com
Reliability
Consistency or reproducibility of measurements
Intrarater/interrater reliability: measurements are the same when repeated by the same/a different person
Test/retest reliability: measurements are the same when repeated at different times
Equivalence: alternate forms of the test produce the same results (e.g., paper forms vs. online forms)
Source: Kern et al.
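Interrater reliability is often reported as Cohen's kappa, which corrects raw agreement for agreement expected by chance. A minimal sketch in Python; the pass/fail ratings below are invented for illustration:

```python
def cohens_kappa(rater1, rater2):
    """Chance-corrected agreement between two raters scoring the same items."""
    n = len(rater1)
    # Observed agreement: fraction of items both raters scored identically
    p_obs = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Expected chance agreement, from each rater's marginal frequencies
    categories = set(rater1) | set(rater2)
    p_exp = sum((rater1.count(c) / n) * (rater2.count(c) / n) for c in categories)
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical pass/fail ratings of six learners by two faculty raters
r1 = ["pass", "pass", "fail", "pass", "fail", "pass"]
r2 = ["pass", "fail", "fail", "pass", "fail", "pass"]
kappa = cohens_kappa(r1, r2)  # raw agreement is 5/6; kappa is lower (chance-corrected)
```

Kappa of 1 means perfect agreement; values near 0 mean agreement no better than chance.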
Validity
Do the results represent what they claim to?
Validity is a property of the results, not just the instrument
Based on several criteria, including:
Face validity
Criterion validity
Construct validity
Face Validity
Degree to which an instrument seems to measure what it is supposed to (aka surface/content validity)
Criterion Validity
Concurrent validity: results from the new instrument match those of an established instrument
Example: Pass rate of students on a new curriculum MCQ test is the same as on the shelf exam
Predictive validity: instrument predicts an individual's performance on specific abilities
Example: Students who pass the cardiology curriculum post-test are more likely to prescribe beta blockers when treating post-MI patients
Source: Kern et al.
Construct Validity
Instrument performs as expected when used in groups with or without the attribute being measured
Example: Nutritionists score highly on a test of nutrition knowledge while first-year medical students' scores are very low
Congruence
Some methods of measurement are more appropriate for measuring a specific attribute (e.g., knowledge, skill, behavior, attitude) than others
Examples of Congruence
Knowledge: MCQ test, oral exam
Skill: OSCE, standardized patient, observation checklist
Attitude: self-assessment questionnaire, ratings form, interviews
Source: Kern et al.
Hypothesis testing
Hypothesis Testing vs. Study Question
Hypothesis testing: a statement; used to determine statistical significance
Study question: a question; used to determine practical significance
Hypothesis Testing
An approach that helps you make decisions about your results.
1. Requires a statement of the null hypothesis.
2. Requires a threshold for declaring a p-value significant.
3. Requires deciding whether the p-value obtained is statistically significant.
Null Hypothesis
A statement of no effect or no association.
"Participants and controls do not differ in interpersonal scores at the end of the curriculum."
Reject or accept the null hypothesis based on the p-value obtained and the level of p-value you consider statistically significant.
P-Value
Probability of obtaining an outcome as extreme as, or more extreme than, the observed result, assuming the null hypothesis is true.
p = 0.05 means there is a 1 in 20 chance that a difference this large could occur by chance alone.
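One transparent way to see what a p-value is: a permutation test literally counts how often a chance relabeling of the two groups produces a difference as extreme as the observed one. A sketch using the interpersonal-scores example; the scores are invented for illustration:

```python
import random

def permutation_p_value(group_a, group_b, n_perm=10_000, seed=42):
    """Two-sided permutation test for a difference in means: p is the
    fraction of random label shuffles whose absolute mean difference is
    at least as extreme as the observed one."""
    mean = lambda xs: sum(xs) / len(xs)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    rng = random.Random(seed)  # seeded for reproducibility
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            extreme += 1
    return extreme / n_perm

# Hypothetical end-of-curriculum interpersonal scores (invented numbers)
participants = [4.1, 4.3, 3.9, 4.5, 4.2, 4.4, 4.0, 4.6]
controls     = [3.8, 3.9, 3.7, 4.0, 3.6, 3.9, 3.8, 4.1]
p = permutation_p_value(participants, controls)
# Reject the null hypothesis if p falls below the pre-chosen threshold (e.g., 0.05)
```

With identical groups the observed difference is zero, so every shuffle is "as extreme" and p = 1.0, i.e., no evidence against the null.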
Study Question
The question you wish to answer regarding a comparison.
"Do participants have higher interpersonal scores than controls at the end of the curriculum?"
Study Question
Do the study results have practical significance?
"Is a Likert scale score of 1.99 practically different from 2.10, even if they are statistically different?"
Good Study Questions
State what/who is being compared.
State when the comparison is being made.
State the outcome of interest.
State the direction of change.
Good or Bad Study Question?
Do students score higher on multiple choice tests?
Good or Bad Study Question?
Do students differ before or after the curriculum?
Good or Bad Study Question?
Do students who received the curriculum have improved interpersonal scores at the end of the intervention compared to control students?
Study designs
Study design: Posttest only
X--------O
Used to test proficiency
Threats: selection bias, history, maturation
Example: After completion of the OB/GYN rotation, how many students pass the shelf exam?
Source: Kern et al.
Study design: Pretest-Posttest
O1---X---O2
Used to quantify the impact of an intervention
Threats: selection bias, history, maturation, testing, instrumentation
May include a control group or be randomized; this reduces bias and controls for the instrument
Source: Kern et al.
Study design: Pretest-Posttest
Example: Does student knowledge improve from baseline after completion of a module on hypertension?
Does student knowledge improve more from baseline after completion of the module compared to students who don't complete it?
Study design: Randomized controlled trial
    E: O1---X---O2
R
    C: O1--------O2
Addition of a control group controls for instrument, maturation, and other factors
Addition of randomization reduces bias
Considered a true experimental design
Disadvantages:
Resource intensive
Intervention denied to control group
RCT: Option 2
    E: O1---X---O2
R
    C: O1--------O2---X
Allows both groups to receive the intervention
Particularly useful when the desire is to offer the intervention to both groups
Alternate control groups
Concurrent controls:
Like standard controls, do not receive the intervention
Unlike standard controls, are not randomly selected
Examples: convenience samples; learners from a different year of training; learners from a different institution
Historical controls: similar to concurrent controls, but subjects are studied from a prior time frame
Study design: Randomized controlled trial
Example: How are second-year medical students’ history-taking skills affected by taking a course on cultural sensitivity as compared to second-year medical students who do not take this course?
True experimental design: students in the same class randomized to intervention or control group
Concurrent controls: e.g., students from the second-year class at another medical school used for comparison with the intervention group
Historical controls: e.g., results of a previous second-year class' assessment of history-taking skills used for comparison with the intervention group
Breakout #1
Evaluation Objects
Evaluation objects
Learner outcomes: e.g., changes in knowledge, skills, behaviors, attitudes
Structural outcomes: e.g., attendance, ACGME compliance
Patient outcomes: e.g., patient satisfaction, ER visits, LDL-C levels
Levels of Evaluation: Miller's Pyramid of Clinical Competence
Does = Action
Shows how = Performance
Knows how = Competence
Knows = Knowledge
Miller. Acad Med 1990
Resources
Kern DE, Thomas PA, Howard DM, Bass EB. Curriculum development for medical education: A six-step approach. Johns Hopkins University Press.
National Science Foundation Online Education Resources Library. http://oerl.sri.com
Evaluation Methods
Questionnaires/Surveys
Description
Merriam-Webster: "A set of questions for obtaining statistically useful or personal information from individuals"
Proper instrument construction is essential to obtaining reliable/valid results
Step 1
Define the goals and objectives of the questionnaire
Key questions:
What do you want to learn from the results?
Who will be surveyed?
What will be done with the results?
Step 2
Write draft questions to match the goals and answer the questions that are the purpose of the survey

Area of inquiry: Attendance
Purpose of question: Does attendance impact ratings?
Indicators: % of lectures attended
First draft: What number of lectures did you attend?

Area of inquiry: Impact of syllabus
Purpose of question: Did people use the syllabus?
Indicators: read syllabus vs. not
First draft: Did you read any of the syllabus?

Area of inquiry: Demographics
Purpose of question: Does PGY year impact ratings?
Indicators: PGY1, PGY2, PGY3
First draft: What is your year of training?
Step 3
Determine the format of the final drafts of questions
Closed-ended vs. open-ended?
Use closed-ended questions for quantitative results
The range of responses must be anticipated; many add "Other ______" to allow for unanticipated responses
Use open-ended questions for qualitative results; these may establish unanticipated themes
Questions must be clear, use unambiguous terms, and avoid bias
Closed-ended questions
Dichotomous: yes/no, did/did not, etc.
Scaled:
Likert scale: traditionally 5-point, though some advocate 7 or 9; label all options; the central option is typically a neutral choice; an even-numbered scale can be used to push fence-sitters
Rank order scale: objects are ranked based on a particular attribute
E.g., "Rank the following clinical rotations according to how useful you view them in preparing you for your post-graduate career (1=most useful; 4=least useful)"
__ MICU  __ CCU  __ General Medical Wards  __ Ambulatory Clinic
Open-ended questions
Can be completely unstructured: e.g., "Tell us how to improve this curriculum"
Can be sentence or paragraph completion: e.g., "The best way to improve the syllabus would be to ___________"
Step 4
Determine the order of questions
Questions should flow:
Logically
From general to specific
From least sensitive to most sensitive
From factual/behavioral to attitudinal/opinion
Best to establish rapport at the beginning, especially if sensitive questions are included
Step 5
Pilot the survey
Step 6
Distribute and collect the survey
Online resources are available for electronic survey distribution/collection
E.g., www.surveymonkey.com
Use
Useful for collecting a wide range of information from a large number of individuals
Confidentiality may be compromised in small groups
Common uses:
Needs assessment during curriculum development
Curriculum evaluation
Psychometric qualities
Can evaluate: attitudes, satisfaction, self-reported behaviors, beliefs
Not used to evaluate individuals or predict clinical performance
Bias is hard to avoid
Central tendency bias: extreme response categories are avoided
Acquiescence bias: respondents agree with statements as presented
Order of questions can bias results
Consider different forms of the same questionnaire
Feasibility/practicality
Most resources are used for questionnaire construction
Inappropriate questions, ordering, scaling, or format can compromise results
Also challenging: distribution, collection, statistical analysis
Suggested references
Online Evaluation Resource Library: http://oerl.sri.com
Woodward CA. Questionnaire construction and question writing for research in medical education. Med Educ 1988; 22: 345-63.
Self-assessment exercises
Description
Definition: "The involvement of learners in judging whether or not learner-identified standards have been met"
Eva KW, Regehr G. Acad Med 2005
Three types of self assessment: Predictive: Physician predicts his/her performance on a task to be completed
Concurrent: Physician assesses his/her performance while performing a task
Summative: Physician compares performance on a completed task to some standard of reference
Most commonly used methods for obtaining self-assessment:
Questionnaires/surveys
Checklists
Competence: The ability to perform a task properly, when compared to a standardized reference
May include knowledge, skills, and behavior
Confidence: A person’s sense of being capable
Confidence does not equal competence
Motivational Discomfort
When self-assessment is compared to external assessment, a performance gap creates "motivational discomfort", which leads to improvement
Commonly used external assessment methods:
OSCE
Standardized patients
Simulations
In-training/other exams
Chart audit
Oral exam
Use
Often used to:
Establish learning needs
Assess confidence
Assess general clinical skills
Assess medical knowledge
Other (teaching skills; cultural competence)
Davis DA et al. JAMA 2006
Psychometric qualities
Little evidence that self-assessment predicts clinical performance
Little correlation between self-assessment and external assessment
20 studies reviewed comparing self-assessment to external assessment
Majority (13/20) showed little, no, or inverse relationship between self-assessment and external assessment
Inability to self-assess was independent of level of training, specialty, or manner of comparison
Those who performed least well by external assessment were also the worst at self-assessment
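The "little or inverse relationship" finding is typically quantified with a correlation coefficient between self-ratings and external scores. A minimal Pearson-correlation sketch; the paired scores below are invented to illustrate a weak correlation, not taken from the review:

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two paired score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Invented data: self-rated confidence (1-5) vs. external exam score (%)
self_assessment = [4, 5, 3, 4, 5, 4]
external_exam   = [60, 55, 70, 50, 65, 58]
r = pearson_r(self_assessment, external_exam)
# |r| well below 1 here, and in fact slightly negative: confidence
# does not track externally measured performance in this toy sample
```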
Feasibility/practicality
Questionnaires/checklists are relatively easy to design and administer
Lack of psychometric validation seriously limits this method
Best use may be as a tool to create motivational discomfort to stimulate improvement
ResourcesDavis DA, Mazmanian PE, Fordis M et al. Accuracy of physician self-assessment compared with observed measures of competence: A systematic review. JAMA 2006; 296: 1094-1102. Epstein RM. Assessment in medical education. New Engl J Med 2007; 356: 387-96.
Multiple Choice Questions (MCQs)
Description
Most common type of written test in all of medical education
Often written according to a test blueprint, which itself is based on learning objectives
Terminology
"Item": an entire test question (stem + options)
"Stem": the question-asking section
"Options": the answer choices
"Keyed response": the correct answer choice
"Distractors": incorrect options
Option formats
Conventional multiple choice
Alternate choice
True/False
Matching
Complex multiple choice ("K type")
Context-dependent item (item set)
The Stem
Should express a complete thought
Best items are answerable by reading the stem alone
Best written in the positive, not the negative
Should avoid "window dressing"
Avoid:
Absolute terms ("always", "never")
Imprecise terms ("seldom", "occasionally", "rarely")
Opinion terms ("may", "could", "can")
Bad stem (1)
Among the following antibiotics, which one could be used for endocarditis prophylaxis during dental procedures?
Better stem
Of the antibiotics listed, which one is acceptable for endocarditis prophylaxis during dental procedures?
Bad stem (2)
Your favorite patient, a 57-year-old woman, returns to follow up on her diabetes. It is the end of clinic, so you are running late. You note her hemoglobin A1c was 8.9% from blood work done last week. She was previously on glyburide, but 1 year ago you added metformin, which caused diarrhea for the first month, since resolved. She has been on full doses of glyburide and metformin for 6 months. She has seen the nutritionist. Which ONE of the following statements is correct?
Better stem
A 57-year-old woman with type 2 diabetes, hepatitis C and congestive heart failure has a hemoglobin A1c of 8.9% despite being on maximal doses of glyburide/metformin. Which combination of medications should be used in this patient to improve diabetes control?
Bad stem (3)
When obtaining informed consent, you should never do any of the following except…?
Better stem
Which one of the following is a core principle of obtaining informed consent?
The Options
3-5 are commonly provided
Distractors are the most important discriminators of knowledge
Should be accurate, plausible, but clearly incorrect
May address common misconceptions
Keyed response and distractors should be similar in grammar, format, etc.
Avoid:
"All of the above"
"None of the above"
Bad items (1)
Of the antibiotics listed, which one is acceptable for endocarditis prophylaxis during dental procedures?
A. Doxycycline
B. Ciprofloxacin
C. A semi-synthetic penicillin
D. Metronidazole
E. None of the above
Bad items (2)
A 57-year-old woman with type 2 diabetes, hepatitis C and congestive heart failure has a hemoglobin A1c of 8.9% despite being on maximal doses of glyburide/metformin. Which combination of medications should be used in this patient to improve diabetes control?
A. Double glyburide/metformin
B. Metformin/glargine
C. Add pioglitazone
D. Obtain a nutrition consult and focus on lifestyle modification
Use
MCQs can be used to assess: knowledge, comprehension, application, analysis
Good for large-scale assessments of groups
Psychometric Qualities
Tests cognitive processes
Context-rich questions may assess more complex cognitive processes (i.e., "knows how" rather than just "knows")
Cueing
Respondent is able to answer from the options, but couldn't if the options were not provided
May mimic premature closure in clinical decision-making
Minimized by using extended match lists or open-ended short answer questions
Remains a limitation of MCQ tests
Item discrimination
Good items are:
Answered correctly by those who do well on the test
Answered incorrectly by those who do poorly on the test
Multiple equations are used to determine an item discrimination score
Scores range from -1 to +1
Negative scores (and those near zero) identify items that should be revised or discarded
Cronbach's alpha
Measures how well a set of items measures a construct (e.g., knowledge)
Tests the reliability of an entire set of items (i.e., a test)
Improves with an increasing number of items and increasing inter-item correlations
A score of 0.70 or higher is considered acceptable
A score of 0.85 or higher is used for pass/fail decisions
Lower scores are acceptable for low-stakes testing
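Cronbach's alpha can be computed directly from the item-by-respondent score matrix. A sketch with a tiny invented 3-item, 5-respondent test whose items rise and fall together (hence a high alpha):

```python
def cronbach_alpha(item_responses):
    """item_responses: one list per item, aligned by respondent.
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))"""
    k = len(item_responses)
    n = len(item_responses[0])

    def var(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    total_scores = [sum(item[j] for item in item_responses) for j in range(n)]
    item_var_sum = sum(var(item) for item in item_responses)
    return k / (k - 1) * (1 - item_var_sum / var(total_scores))

# Invented responses: each inner list is one item's scores across 5 respondents
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
alpha = cronbach_alpha(items)  # high here, above the 0.85 pass/fail threshold
```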
Feasibility/practicality
Most resources are used in item writing
Professionals expect ~1 hour to write 1 item
Many programs use pre-written "shelf" exams (e.g., the ITE, which has a reliability of 0.90)
Administering the test is the easy part
Higher-stakes tests require more items and should be piloted and have reliability testing
Pass/fail tests should have a reliability of 0.85 or greater
ResourcesCase SM, Swanson DB. Constructing written test questions for the basic and clinical sciences (3rd edition, revised). Philadelphia; National Board of Medical Examiners. 2002.Haladyna TM, Downing SM, Rodriguez MC. A review of multiple-choice item-writing guidelines for classroom assessment. Appl Meas Ed. 2001; 15: 309-34
Patient Surveys
Miller's Pyramid of Clinical Competence
Patient surveys assess the "Does" (Action) level of the pyramid
Miller. Acad Med 1990
Description
Evaluation completed by patients
Generally assesses patient satisfaction
Psychometric Qualities
Need 20-80 patient ratings for sufficient reliability
Patients are unable to discriminate different dimensions of competence
Patient ratings are often quite high
Correlate poorly with physician ratings (poor concurrent validity)
Difficult to assess the trainee/curriculum separately from the rest of the health care team and environment
Professional resistance to use
Evans RG, et al. Family Practice 2007; 24(2): 117-27. Matthews DA and Feinstein AR. JGIM 1989; 4: 14-21. ABIM PSQ Project Executive Summary. Philadelphia, PA; 1989. Weaver MJ, et al. JGIM 1993; 8: 135-9.
Uses
Evaluation/feedback to providers on: communication, humanism, professionalism, overall satisfaction with care
Impact of curriculum on patient perception
Recertification/promotion
Pay for performance
Feasibility
Logistically challenging: large numbers needed; data collection and recording
Primarily used for resident formative assessment; part of 360° evaluation
Increasingly used by health care/insurance companies to determine salaries/bonuses
Example
Effects of structured encounter forms on pediatric housestaff knowledge, parent satisfaction, and quality of care: A randomized, controlled trial
Purpose: To evaluate the effects of structured encounter forms on pediatric housestaff knowledge, parent satisfaction, and quality of care
Intervention: Housestaff randomized to use structured encounter forms focused on developmental milestones during health supervision visits
Zenni A. Arch Pediatr Adolesc Med. 1996 Sep;150(9):975-80
Example cont.
Outcome measurements:
Changes in housestaff knowledge: pretest and posttest MCQs
Parent satisfaction: parent surveys
Quality of care (defined as compliance with recommended guidelines for age-specific health supervision): audiotaped visit review
Zenni A. Arch Pediatr Adolesc Med. 1996 Sep;150(9):975-80
Example cont.
Results:
Intervention group showed greater knowledge of developmental milestones (not statistically significant)
Parent satisfaction with developmental screening was greater in the intervention group (P < .001)
Compliance with recommended standards of developmental screening was greater in the intervention group (P = .001)
Zenni A. Arch Pediatr Adolesc Med. 1996 Sep;150(9):975-80
References
Evans RG, et al. Family Practice 2007; 24(2): 117-27. ABIM PSQ Project. ABIM. Philadelphia, PA; 1989. Chang JT, et al. Ann Int Med 2006; 144: 665-72. Thomas PA, et al. Acad Med 1999; 74: 90-91. Matthews DA, et al. Am J Med 1987; 83: 938-44. Weaver MJ. JGIM 1993; 8: 135-9. Calhoun JG, et al. Proc Annu Conf Res Med Educ 1984; 23: 205-10.
Oral Examinations & Chart Stimulated Recall
Miller's Pyramid of Clinical Competence
Chart stimulated recall assesses the "Does" (Action) level; oral examinations assess the "Shows how" (Performance) level
Miller. Acad Med 1990
Description
Examiner presents a patient case scenario
Examinee describes patient management
Questions probe: clinical reasoning, interpretation of findings, treatment plans
Description: Oral Boards
A committee of experts crafts clinical scenarios from patient cases
Focus on key features of the case; representative cases are chosen
1-2 physician examiners
18-60 clinical cases
Each scenario: 3-5 min
Exam duration: 90 min to 2½ hours
Psychometric Qualities: Board Oral Exams
Scoring: pre-defined scoring rules; scores from each scenario are combined and analyzed using item response theory or generalizability theory
Reliability: fair to good (0.45-0.88)
Validity: concurrent 0.75; predictive 0.45
Maatsch JL. Emergency Med Annual 1982.Soloman DG, et al. Acad Med 1990; 65:S34-44.
Kearney RA, et al. Can J Anesth 2002 Mar;49(3):232-6.
Chart Stimulated Recall
Trainee's own patient chart is used as the basis for examination/evaluation
Predesigned questions are used as a framework for discussion:
Identify "disconnects" evident from the chart
Define questions to probe the disconnects
Define the desired response
15-20 min long
Often video- or audiotaped
Jennett P and Affleck L. J of Cont Educ in Health Care Prof. 1998; 18: 163-171
Psychometric Qualities: CSR (Best Circumstances)
Requires 3-6 cases to assess competency
Concurrent validity with ABEM written exam is good: 0.7
Reliability fair to good: 0.54-0.64
Munger, Oral Examinations. Jennett P and Affleck L. J of Cont Educ in Health Care Prof. 1998; 18: 163-171
Uses
Assesses:
Knowledge
Application of knowledge
Underlying reasoning
Areas for remediation/curriculum enhancement
Impact of other variables (patient, provider, system, environment, etc.) on understanding/decisions
Can be used as a teaching tool
Feasibility
Extensive expertise required for scenario development in non-CSR oral exams
Examiners must be trained and inter-rater reliability assessed prior to implementation
Difficult to standardize
Time- and faculty-intensive
Expensive
Example: Oral Exam
Poor inter-rater reliability on mock anesthesia oral examinations
Purpose: assess the impact of a curriculum on oral examination communication and presentation techniques on resident performance on oral examinations
Can J Anaesth. 2006 Jul;53(7):659-68
Oral Exam cont.
Methods:
Randomized, pretest-posttest trial of 25 residents taking a mock anesthesia board oral examination
Exam 1 (25 residents) --- Curriculum --- Exam 2
Videotaped oral exams graded by 6 experienced graders
Results:
The curriculum did not improve scores on the oral exams, but the study was limited by poor inter-rater reliability
Can J Anaesth. 2006 Jul;53(7):659-68
Example: CSR
Are physicians discussing prostate cancer screening with their patients, and why or why not? A pilot study
Purpose:
Assess whether primary care physicians routinely discuss prostate cancer screening (PCS)
Explore the barriers to and facilitators of these discussions
Methods: 18 academic and community-based primary care physicians
Semi-structured interviews
CSR
J Gen Intern Med. 2007 Jul;22(7):901-7
CSR cont.
Results:
All physicians reported discussing PCS with patients
6 reported ordering PSA tests without discussions
PCS occurred in 36% of 44 encounters qualifying for CSR
Important barriers to discussion:
Inadequate time for health maintenance
Physician forgetfulness
Patient characteristics
J Gen Intern Med. 2007 Jul;22(7):901-7
References
Mancall EL, Bashook PG (eds.) Assessing clinical reasoning: the oral examination and alternative methods. Evanston, IL: American Board of Medical Specialties, 1995. Jacobsen E, et al. Can J Anesthesia 2006; 53(7): 659-668. Jennett P and Affleck L. Chart audit and chart stimulated recall as methods of needs assessment in continuing professional health education. J Cont Educ 1998; 18: 163-71.
Performance Audit
Miller's Pyramid of Clinical Competence
Performance audit assesses the "Does" (Action) level of the pyramid
Miller. Acad Med 1990
Performance Audit: Description
Patient information is abstracted from medical records; results are compared to accepted standards
Agency for Healthcare Research and Quality / USPSTF: http://www.ahrq.gov
HEDIS (Health Plan Employer Data & Information Set): http://web.ncqa.org
Most commonly used (and studied) to assess quality of care
Performance Audit: Description
Patient data may include:
Tests/studies ordered (lipids, mammogram)
Laboratory/study results (hemoglobin A1c)
Immunizations
Diabetic foot examinations
Counseling for smoking cessation
Documentation of DNR or end-of-life discussions
Data usually collected by trained chart reviewers or member(s) of the research team
Performance Audit: Uses
Provides evidence about:
Clinical decision-making
Follow-through on tests ordered
Provision of preventive services
Appropriate consultation
Allows evaluation:
Before and after an educational intervention
Exposed versus not exposed to an educational intervention
E.g., a curriculum on screening guidelines, evidence-based practice, or lipid/A1c targets
Performance Audit: Psychometric Qualities
Reliability: a sample size of 10 patient records is sufficient
Accuracy threats:
Recording bias
Missing or incomplete data is interpreted as not meeting the accepted standard
Variability in skills of chart reviewers
Charting skills may differ from clinical skills
Performance Audit: Psychometric Qualities
Chart abstraction vs. standardized patients:
20 GIM residents and faculty blindly evaluated and treated standardized patients (SPs)
Each SP had one of four diagnoses
Each resident was evaluated for 2 of 4 cases
160 resident/SP encounters
Luck et al. Am J Med 2000.
Performance Audit: Psychometric Qualities
Compared chart abstraction by trained nurses to SP reports (gold standard) for 4 aspects of the encounter:
Taking the history
Performing the proper physical exam
Making the correct diagnosis
Prescribing the appropriate treatment (pre-determined by national guidelines)
Sensitivity of chart abstraction = 70%; specificity = 81%
Luck et al. Am J Med 2000.
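Sensitivity and specificity fall directly out of a 2×2 table against the gold standard. A minimal sketch; the counts below are invented to reproduce the reported 70%/81% rates and are not the study's actual data:

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP).
    'Positive' here means the gold standard (SP report) says the
    action occurred; TP means the chart also recorded it."""
    return tp / (tp + fn), tn / (tn + fp)

# Invented counts chosen to match the reported 70% / 81%
sens, spec = sensitivity_specificity(tp=70, fn=30, tn=81, fp=19)
```

Low sensitivity here means the chart misses actions the SP confirms actually happened, which is exactly why chart abstraction can underestimate quality of care.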
Performance Audit: Psychometric Qualities
Medical chart abstraction:
Provides modest sensitivity and specificity
May underestimate quality of care for common outpatient medical conditions
From a medical education standpoint, chart abstraction can:
Underestimate curricular success if actual practice outperforms documentation
Overestimate success if the chart reflects actions or decisions by someone other than the learner
Performance Audit: Feasibility/Practicality
Training reviewers to decode clinical data is time-consuming
Review by trained reviewers averages 30 minutes
Review by study authors may take even longer
Records may be inaccurate or incomplete
Documented care may represent decisions by other members of the health care team rather than the resident
Must agree on the standard to be compared against
Performance Audit: Example in the literature
Implementing achievable benchmarks in preventive health: A controlled trial in residency education
Purpose: to evaluate the success of a preventive health curriculum
Methods: practice-based evaluation of 208 residents' delivery of preventive care
Compared baseline and follow-up data from 2001-04
Outcome: difference in receipt of preventive care for patients seen by intervention vs. control residents
Intervention: preventive health curriculum
Houston et al. Ac Med 2000
Performance Audit: Example in the literature
Results: charts reviewed for ~4000 resident patients
Receipt of preventive care increased for patients of the intervention group, but not for patients of controls
In the intervention group, significant increases occurred for:
Screening for smoking, colon cancer, and lipids
Advice to quit smoking; provision of pneumococcal vaccine
Conclusions: Residents exposed to the curriculum outperformed controls on a practice-based evaluation of provision of preventive services.
Houston et al. Ac Med 2000
Performance Audit: Example in the literature 2
Evaluation of an educational intervention to encourage advance directive discussions between medicine residents and patients
Purpose: evaluate an educational intervention to teach residents to discuss advance directives
Methods:
Didactic and role-play curriculum
Chart audit 10 days prior to and 5 days post-intervention
DNR discussion rates were noted
Furman et al. J Palliat Med. 2006
Performance Audit: Suggested References
Luck J, Peabody JW, Dresselhaus TR, Lee M, Glassman P. How well does chart abstraction measure quality? A prospective comparison of standardized patients with the medical record. American Journal of Medicine. 2000 Jun 1;108(8):642-9.
Standardized Patients and Objective Structured Clinical Examination
SPs and OSCEs
Miller's Pyramid of Clinical Competence
Standardized patients and OSCEs assess the "Shows how" (Performance) level
Miller. Acad Med 1990
Standardized Patients: Description
Person trained to simulate a specific patient with a medical condition in a standardized fashion
Proposed in 1964 to overcome threats to the validity of written-simulation tests
(Students would say they would ask more questions or perform more physical exam maneuvers than they actually would)
Standardized Patients: Vocabulary
Simulated patient: a medical encounter conducted for an educational purpose; may or may not use the simulator's personal medical history
Standardized patient: consistent content of verbal and behavioral responses by the SP to stimuli provided by the trainee
A standardized patient is a simulated patient, but a simulated patient may not be standardized
Standardized Patients: Uses
Practice skills and formative feedback (will not be covered today)
Evaluation of skills:
Interview skills (taking a good history)
Physical examination skills*
Communication skills
Differential diagnosis skills
Management/treatment skills
Professionalism skills
*Barrows. Acad Med 1993
Standardized Patients: Uses
SPs can be trained to provide:
Written, objective reports via checklists
Patient-centered subjective ratings and descriptive evaluations of trainees' behavior
Constructive verbal or written feedback to the student
Additional raters/observers may also be present to assess competence
Standardized Patients: Uses
Can be used in an actual clinical setting as a registered patient with false records, to assess actual physician behaviors
More commonly used as a summative exam to evaluate clinical skills, as an individual station or a collection of stations
Objective Structured Clinical Examination (OSCE)
AKA Clinical Skills Assessment/Exam (CSA/CSE)
AKA Clinical Practice Examination (CPX)
OSCE: Description
Multiple-station SP exercise
Uses multiple focused clinical encounters
Each encounter assesses different skills/competencies
Often incorporates non-patient stations for additional evaluations: interpret EKG, CXR, labs; mannequins for technical skills
OSCE: DescriptionStudents read patients’ charts while waiting for the signal to enter the “exam rooms”
The chart contains pertinent information about the "patient" and the background of the medical situation the student is about to enter
http://medicine.iu.edu/body.cfm?id=836&oTopID=223
OSCE: DescriptionStudents alert the SPs by knocking and immediately begin to act out the patient-doctor relationship
Behind the doors the students are presented with a variety of clinical situations
Here, the student tells an older man that his wife had a heart attack and explains the EKG report to him
http://medicine.iu.edu/body.cfm?id=836&oTopID=223
OSCE: Description
SP presents case history in response to the trainee's questions
Trainee examines the SP as appropriate
SP then completes a checklist to document actions (history, PE, behaviors, communication...)
Score is usually determined by the percentage of actions recorded on the SP checklist
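The scoring rule just described, the percentage of checklist actions the SP recorded as performed, can be sketched in a few lines of Python; the checklist items below are invented for illustration:

```python
# Minimal sketch of checklist-percentage OSCE scoring.
# Checklist items are hypothetical examples, not from a real exam.
checklist = {
    "asked about onset of chest pain": True,
    "asked about radiation of pain": True,
    "auscultated heart in 4 positions": False,
    "explained working diagnosis": True,
}

def station_score(results):
    """Percentage of expected checklist actions performed."""
    done = sum(1 for performed in results.values() if performed)
    return 100.0 * done / len(results)

print(round(station_score(checklist), 1))  # 3 of 4 actions -> 75.0
```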
OSCE: Uses
Similar to standardized patients
Medical school/residency assessments
USMLE Step 2 Clinical Skills
Qualifying Examination for Licensure (Canada)
OSCE/SP: Psychometrics
Reliability averages ~0.7 for scored tests; recommended value for educational tests = 0.8
Can improve reliability by:
Proper training of evaluators/raters (MDs or SPs)
Increasing the number of cases/stations on the exam (especially important for complex skills such as clinical reasoning and for high-stakes exams; need 3.5 hrs depending on the number and complexity of cases)
Using pass/fail rather than scored tests (reliability up to 0.96)
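The point that adding cases/stations improves reliability is usually quantified with the Spearman-Brown prophecy formula (not named on the slide, but the standard tool for this question). A minimal sketch, starting from the ~0.70 average reliability quoted above:

```python
# Spearman-Brown prophecy formula: predicted reliability when a test is
# lengthened by a factor n (e.g., tripling the number of OSCE stations).
def spearman_brown(r, n):
    """r: current reliability; n: factor by which test length changes."""
    return n * r / (1 + (n - 1) * r)

# A 0.70-reliable exam, tripled in length, is predicted to reach 0.875:
print(round(spearman_brown(0.70, 3), 3))  # -> 0.875
```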
OSCE/SP: Psychometrics
Validity
Construct validity:
Senior residents perform better than junior residents
Residents perform better than medical students
Exam scores improve with more training (time in direct patient care)
OSCE/SP: Psychometrics
Face validity: excellent (similar to clinical tasks; can assess multiple aspects of competencies)
Concurrent validity: modest correlations between SP/OSCE scores and clinical ratings or written exams. Are they measuring different competencies?
OSCE/SP: Psychometrics
Predictive validity:
Better predictors of resident performance than MCQ exams
Poor correlation between SP performance in testing vs. the real environment
Correlation improves when efficiency and consultation time are factored in
Rethans J-J, et al. BMJ 1991; 303: 1377-80
OSCE/SP: Feasibility/Practicality
Creating an OSCE/SP exam:
Determine specific competencies to be tested
Train SPs (case presentation, rating, feedback)
Develop checklists or rating forms (e.g., "Listened to heart in 4 places"; "Did the examinee make you feel comfortable?")
Set criteria for passing
OSCE/SP: Feasibility/Practicality
Creating an OSCE/SP exam is time-intensive:
A new SP can learn to simulate a new case in 8-10 hrs
An experienced SP can learn a new case in 6-8 hrs
Learning to use checklists to evaluate resident performance takes much longer
Cost-intensive: ~$300 per student tested
Space-intensive (need rooms)
Time/cost can be reduced by sharing SPs
OSCE/SP: Feasibility/Practicality
Challenges to using SPs:
Are SPs accurate and believable in portraying their roles?
Are they consistent and accurate in completing checklists?
What does the data show? When sent unannounced to MDs' offices, experienced MDs cannot differentiate SPs from real patients
Detection rate
OSCE/SP: Suggested References
Colliver JA, Swartz MH. Assessing clinical performance with standardized patients. JAMA. 1997;278(9):790-1.
Van der Vleuten CPM, Swanson D. Assessment of clinical skills with standardized patients: state of the art. Teach Learn Med. 1990;2:58-76.
Barrows HS. An overview of the uses of standardized patients for teaching and evaluating clinical skills. Acad Med. 1993;68(6):443-51.
Adamo G. Simulated and standardized patients in OSCEs: achievements and challenges 1992-2003. Med Teach. 2003;25(3):262-70.
Breakout #2
Statistical Methods
How Do You Choose A Statistical Test?
Goals
Use a case of an educational intervention to help demonstrate:
Study designs
Variable types
Exploratory data analysis
Confirmatory (inferential) data analysis
Basic interpretation of results
Step 1. Study Question
Step 2. Study Design
Step 3. Type of Outcome Variable
Step 4. Distribution of the Outcome Variable
The Four Step Approach to Choosing a Statistical Test
Case Presentation: Educational Intervention
New curriculum for 2nd-year medical students aimed at improving:
1. Physical examination skills
2. Confidence in performing physical examination maneuvers
3. Interpersonal skills
Randomized Controlled Trial
120 students randomized: 60 to the standard curriculum, 60 to the new curriculum
Standardized patient exam used to evaluate outcomes
Step 1. Study Question
Step 2. Study Design
Step 3. Type of Outcome Variable
Step 4. Distribution of the Outcome Variable
The Four Step Approach to Choosing a Statistical Test
The Case: Study Question 1
Do participants and controls differ in the mean number of relevant physical examination maneuvers performed correctly at the end of the curriculum?
Hypothesis Testing
Null hypothesis: Participants and controls do not differ in the mean number of relevant physical examination maneuvers performed correctly at the end of the curriculum.
Step 1. Study Question
Step 2. Study Design
Step 3. Type of Outcome Variable
Step 4. Distribution of the Outcome Variable
The Four Step Approach to Choosing a Statistical Test
Types of Study Designs: Observational vs. Experimental
Observational Study Design
Studies that observe groups at one or more points in time without imposing an intervention:
Cross-sectional studies
Case-control studies
Cohort, longitudinal, prospective studies
Types of Study Designs: Observational vs. Experimental
Experimental Study Design
Studies that allocate interventions to one or more groups and make comparisons:
Pre-post tests
Controlled clinical trials
Randomized controlled trials
Designing a Study: Are the Data Paired or Unpaired?
Importance: Measurements of paired subjects are more likely to be highly correlated (highly related) than measurements of two randomly selected subjects.
Designing a Study: Are the Data Paired or Unpaired?
Paired measurements come from common origins:
Same subject before and after
Twins (genetic)
Husbands and wives (environmental)
Matched cases and controls (e.g., age)
Designing a Study: Are the Data Paired or Unpaired?
Unpaired measurements come from 2 independent (or unrelated) groups:
e.g., Cholesterol levels from different study groups
Back to the Case Step 2: Study Design
Observational or experimental? Randomized controlled trial: experimental
Paired or unpaired data?
Within our RCT, we could have paired and unpaired data depending on the question we wish to answer.
Do participants and controls differ in the mean number of physical exam maneuvers at the end of the curriculum?
Unpaired groups:
Intervention students
Control students
Step 1. Study Question
Step 2. Study Design
Step 3. Type of Outcome Variable
Step 4. Distribution of the Outcome Variable
The Four Step Approach to Choosing a Statistical Test
Types of Research Variables
Continuous
Dichotomous
Ordinal
Nominal
Types of Research Variables
Continuous Variable: a variable with no gaps in values
Example: Age (from birth to death)
Types of Research Variables
Dichotomous Variable: a discrete categorical variable with two possible values
Example: Gender (female/male)
Types of Research Variables
Ordinal Variable: a ranked or ordered variable
Example: Likert scale (1-5)
Types of Research Variables
Nominal Variable: classifies data into categories
Example: Marital status (single, married, divorced, widowed)
Back to the Case Step 3: Type of Outcome Variable
The mean number of relevant physical exam maneuvers performed correctly.
Continuous variable
Step 1. Study Question
Step 2. Study Design
Step 3. Type of Outcome Variable
Step 4. Distribution of the Outcome Variable
The Four Step Approach to Choosing a Statistical Test
Exploratory Data Analysis
Why Explore Your Data?
Look for mistakes in data entryChoose summary measuresChoose a parametric or nonparametric statistical test
Look at your data!
Summary Measures
Measures of Central Tendency:
Mean = the average (continuous)
Median = the midpoint (continuous, ordinal)
Mode = the most frequent value (any variable)
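A quick illustration of the three summary measures using Python's standard library; the exam scores are made up:

```python
# Mean, median, and mode on a small hypothetical sample of exam scores.
from statistics import mean, median, mode

scores = [70, 80, 80, 90, 100]
print(mean(scores))    # the average
print(median(scores))  # the midpoint
print(mode(scores))    # the most frequent value
```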
Types of Statistical Tests
Parametric versus Nonparametric Statistical Tests
Parametric TestsKey point: Use when evaluating continuous or ordinal variables with a normal distribution.
[Histogram: frequency by age (0-100), illustrating a normal distribution]
Parametric Tests
Examples:
Student t-test: comparison of means (unpaired)
Paired t-test: comparison of means (paired)
Linear regression: analysis when the outcome is continuous and normally distributed
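As a rough sketch of what the unpaired Student t-test computes, here is the pooled-variance t statistic built from the standard library; the maneuver counts are hypothetical, and in practice you would use a statistics package (e.g., scipy.stats.ttest_ind) to also get the p-value:

```python
# Hand-rolled unpaired Student t statistic with pooled (equal-variance) SD.
from math import sqrt
from statistics import mean, variance

def student_t(a, b):
    """t statistic for comparing the means of two unpaired samples."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))

group_a = [14, 15, 13, 16, 14]   # hypothetical maneuver counts, intervention
group_b = [12, 11, 13, 12, 12]   # hypothetical maneuver counts, control
t = student_t(group_a, group_b)
# Compare |t| to the critical value (about 2.31 for df = 8 at alpha = 0.05,
# two-tailed) to decide whether to reject the null hypothesis.
print(round(t, 2))  # -> 4.0
```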
Nonparametric Tests
Key point: Use when the sample size is small or if data are NOT normally distributed.
More conservative than parametric tests.
[Boxplots: baseline seizure rate (number of seizures) by treatment group (placebo vs. drug)]
Nonparametric Tests
Examples:
Wilcoxon rank-sum test: unpaired
Wilcoxon signed-rank test: paired
Nonparametric regression
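For intuition, the Wilcoxon rank-sum statistic can be sketched with its large-sample normal approximation. This is an illustrative stdlib-only version that assumes no tied values; real analyses should use a statistics package (e.g., scipy.stats.ranksums), which handles ties and exact p-values:

```python
# Wilcoxon rank-sum z statistic via normal approximation (no ties assumed).
from math import sqrt

def rank_sum_z(a, b):
    combined = sorted(a + b)
    rank = {v: i + 1 for i, v in enumerate(combined)}  # 1-based ranks
    w = sum(rank[v] for v in a)                        # rank sum of group a
    na, nb = len(a), len(b)
    mu = na * (na + nb + 1) / 2                        # E[W] under H0
    sigma = sqrt(na * nb * (na + nb + 1) / 12)         # SD[W] under H0
    return (w - mu) / sigma

low = [1, 2, 3]     # fabricated data: every value in `low`
high = [10, 11, 12] # ranks below every value in `high`
print(round(rank_sum_z(low, high), 2))  # -> -1.96
```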
Confirmatory Data Analysis (Inferential Statistics)
Inferential Statistics
Uses Hypothesis Testing to:
Assess the strength of the evidenceMake predictions Draw conclusions about a population
Based on sample data
Inferential Statistics
1. Comparisons between two groups: bivariate analyses
2. Assessment of one outcome with more than one predictor variable: multivariable regression analyses
Regression
Statistical method used to describe the association between one dependent (outcome) variable and one or more independent (predictor) variables.
One reason to use regression: to adjust for confounding factors.
Confounding Factor
A variable related to ≥1 of the variables in a study. It may mask an actual association or falsely demonstrate an association where no real association exists.
Examples: age, gender, comorbidities
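A toy simulation (all numbers fabricated) of the adjustment idea: we generate a hypothetical confounder, "prior GPA", that is associated with group assignment and with the outcome, then fit an ordinary least-squares model so the group coefficient is estimated with GPA held fixed:

```python
# Simulated example: regression adjusting for a confounder.
import numpy as np

rng = np.random.default_rng(0)
n = 120
group = rng.integers(0, 2, n)                 # 0 = control, 1 = intervention
gpa = rng.normal(3.0, 0.3, n) + 0.2 * group   # confounder tied to group
score = 60 + 5 * group + 10 * gpa + rng.normal(0, 2, n)  # true group effect = 5

# OLS fit of: score = b0 + b1*group + b2*gpa
X = np.column_stack([np.ones(n), group, gpa])
coef, *_ = np.linalg.lstsq(X, score, rcond=None)
print(np.round(coef, 1))  # coef[1] is the group effect adjusted for gpa
```

Because GPA is in the model, coef[1] recovers a value near the true group effect of 5 rather than an estimate inflated by the group-GPA association.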
Back to the Case
Step 1 (Study Question): Is there a difference in the mean number of relevant physical examination maneuvers performed correctly between groups?
Step 2 (Study Design): Randomized controlled trial using unpaired data.
Figure B: Two Unpaired (Independent) Samples
Continuous outcome → Normally distributed? Yes: Student t-test (parametric). No: Wilcoxon rank-sum (nonparametric).
Dichotomous outcome → Small sample size? Yes: Fisher's exact test. No: Chi-square test.
Ordinal outcome → Wilcoxon rank-sum (nonparametric)
Nominal outcome → Fisher's exact test (nonparametric)
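Figure B's decision logic can be written down as a small helper function. This is an illustrative sketch of the figure only, not a complete guide to test selection:

```python
# Encode Figure B: choosing a test for two unpaired (independent) samples.
def choose_unpaired_test(outcome, normal=None, small_sample=None):
    """Return the test Figure B suggests for the given outcome type."""
    if outcome == "continuous":
        return "Student t-test" if normal else "Wilcoxon rank-sum"
    if outcome == "dichotomous":
        return "Fisher's exact test" if small_sample else "Chi-square test"
    if outcome == "ordinal":
        return "Wilcoxon rank-sum"
    if outcome == "nominal":
        return "Fisher's exact test"
    raise ValueError(f"unknown outcome type: {outcome!r}")

print(choose_unpaired_test("continuous", normal=True))        # Student t-test
print(choose_unpaired_test("dichotomous", small_sample=True)) # Fisher's exact test
```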
Back to the Case
Step 1 (Study Question): Is there a difference in the mean number of relevant physical examination maneuvers performed correctly between groups?
Step 2 (Study Design): Randomized controlled trial using unpaired data.
Step 3 (Type of Outcome Variable): Continuous
Figure B: Two Unpaired (Independent) Samples
Continuous outcome → Normally distributed? Yes: Student t-test (parametric). No: Wilcoxon rank-sum (nonparametric).
Dichotomous outcome → Small sample size? Yes: Fisher's exact test. No: Chi-square test.
Ordinal outcome → Wilcoxon rank-sum (nonparametric)
Nominal outcome → Fisher's exact test (nonparametric)
Back to the Case Step 4: Distribution of the Outcome Variable
[Histograms: frequency of physical exam items obtained, intervention (1 = case) vs. control (2 = control)]
The distribution of the number of physical exam maneuvers for each group plotted on a histogram appears normally distributed.
Back to the Case
Step 1 (Study Question): Is there a difference in the mean number of relevant physical examination maneuvers performed correctly between groups?
Step 2 (Study Design): Randomized controlled trial using unpaired data.
Step 3 (Type of Outcome Variable): Continuous
Step 4 (Distribution of Outcome Variable): Normally distributed
Figure B: Two Unpaired (Independent) Samples
Continuous outcome → Normally distributed? Yes: Student t-test (parametric). No: Wilcoxon rank-sum (nonparametric).
Dichotomous outcome → Small sample size? Yes: Fisher's exact test. No: Chi-square test.
Ordinal outcome → Wilcoxon rank-sum (nonparametric)
Nominal outcome → Fisher's exact test (nonparametric)
Understanding and Interpreting Our Statistical Results
Results of the Student t-test
Mean number (standard deviation) of relevant physical examination maneuvers performed correctly:
Intervention: 14.4 (1.1)
Control: 12.1 (1.0)
p
Case Interpretation
Reject the null hypothesis and conclude:
The intervention students scored statistically significantly higher than the controls.
Curriculum for second-year medical students aimed at improving:
1. Physical examination skills
2. Confidence in performing physical examination maneuvers
3. Interpersonal skills
Study Question 2
Step 1: Study Question 2
Is there a difference in the intervention students' confidence level in performing physical examination maneuvers before and after the curriculum?
Step 2: Study Design
Intervention students before and after the curriculum
Pre-post design comparing paired groups
Figure C: Two Paired (Dependent) Samples
Continuous outcome → Normally distributed? Yes: Paired t-test (parametric). No: Wilcoxon signed-rank test (nonparametric).
Dichotomous outcome → McNemar's test
Ordinal outcome → Wilcoxon signed-rank test (nonparametric)
Nominal outcome → McNemar's test
Step 3: Type of Outcome Variable
The confidence level is measured on a 4-point Likert scale (1 = not very confident, 4 = very confident) and is a(an) _______ variable.
Step 3: Type of Outcome Variable
The confidence level is measured on a 4-point Likert scale (1 = not very confident, 4 = very confident) and is an ordinal variable.
Figure C: Two Paired (Dependent) Samples
Continuous outcome → Normally distributed? Yes: Paired t-test (parametric). No: Wilcoxon signed-rank test (nonparametric).
Dichotomous outcome → McNemar's test
Ordinal outcome → Wilcoxon signed-rank test (nonparametric)
Nominal outcome → McNemar's test
Step 4: Distribution of the Outcome Variable
[Histograms: frequency of confidence scores (1-4), before and after the curriculum]
Statistical Test
Compare differences in confidence scores with a parametric or nonparametric test?
Nonparametric test
Figure C: Two Paired (Dependent) Samples
Continuous outcome → Normally distributed? Yes: Paired t-test (parametric). No: Wilcoxon signed-rank test (nonparametric).
Dichotomous outcome → McNemar's test
Ordinal outcome → Wilcoxon signed-rank test (nonparametric)
Nominal outcome → McNemar's test
Results
The median (interquartile range) of the confidence scores:
Pre: 2 (IQR 2-3)
Post: 3.5 (IQR 3-4)
p
Interpretation
Reject the null hypothesis and conclude:
The intervention was successful at improving students' confidence.
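The medians and IQRs reported in these results can be computed with the standard library alone. The scores below are invented for illustration, not the study's data:

```python
# Median and interquartile range (IQR) for ordinal Likert-style scores.
from statistics import median, quantiles

pre = [1, 2, 2, 2, 2, 2, 2, 3, 3, 3]   # hypothetical pre-curriculum scores
q1, q2, q3 = quantiles(pre, n=4, method="inclusive")
print(median(pre))   # the midpoint of the scores
print((q1, q3))      # bounds of the interquartile range
```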
Curriculum for second-year medical students aimed at improving:
1. Physical examination skills
2. Confidence in performing physical examination maneuvers
3. Interpersonal skills
Study Question 3
Step 1: Study Question 3
Do participants and controls differ in their overall interpersonal scores at the end of the curriculum?
Step 2: Study Design
Randomized controlled trial comparing two paired or unpaired groups?
Two unpaired groups:
Intervention students
Control students
Figure B: Two Unpaired (Independent) Samples
Continuous outcome → Normally distributed? Yes: Student t-test (parametric). No: Wilcoxon rank-sum (nonparametric).
Dichotomous outcome → Small sample size? Yes: Fisher's exact test. No: Chi-square test.
Ordinal outcome → Wilcoxon rank-sum (nonparametric)
Nominal outcome → Fisher's exact test (nonparametric)
Step 3: Type of Outcome Variable
The overall interpersonal score is the sum of 20 item scores, each rated on a 5-point Likert scale (1 = poor, 5 = excellent).
This score is continuous, ranging from 20 to 100.
Figure B: Two Unpaired (Independent) Samples
Continuous outcome → Normally distributed? Yes: Student t-test (parametric). No: Wilcoxon rank-sum (nonparametric).
Dichotomous outcome → Small sample size? Yes: Fisher's exact test. No: Chi-square test.
Ordinal outcome → Wilcoxon rank-sum (nonparametric)
Nominal outcome → Fisher's exact test (nonparametric)
Step 4: Distribution of the Outcome Variable
[Histograms: frequency of total interpersonal scores, intervention (1 = case) vs. control (2 = control)]
Step 4: Distribution of the Outcome Variable
Although the outcome is continuous, the distribution of the scores plotted on a histogram appeared negatively skewed.
Statistical Test
Compare differences in overall interpersonal scores with a parametric or nonparametric test?
Nonparametric test
Figure B: Two Unpaired (Independent) Samples
Continuous outcome → Normally distributed? Yes: Student t-test (parametric). No: Wilcoxon rank-sum (nonparametric).
Dichotomous outcome → Small sample size? Yes: Fisher's exact test. No: Chi-square test.
Ordinal outcome → Wilcoxon rank-sum (nonparametric)
Nominal outcome → Fisher's exact test (nonparametric)
Results
The median (interquartile range) of the interpersonal scores:
Intervention: 79 (IQR 72-89)
Control: 74 (IQR 65-86)
p = 0.06
Interpretation
Cannot reject the null hypothesis and conclude:
Our curriculum did not significantly improve interpersonal skills.
Any Questions?
Your Turn!
Final Comments
Consult a statistician or someone with statistical knowledge early in your research for guidance.
Helpful books:
Intuitive Biostatistics
Studying a Study and Testing a Test
Basic and Clinical Biostatistics
Other Resources
1. Free Statistical Calculatorshttp://graphpad.com/quickcalcs/index.cfm
2. Free Statistical & Power Analysis Softwarehttp://www.ncss.com/download.html
3. Online Statistics Texthttp://www.statsoft.com/textbook/stathome.html
References
Luck J, Peabody JW, Dresselhaus TR, Lee M, Glassman P. How well does chart abstraction measure quality? A prospective comparison of standardized patients with the medical record. Am J Med. 2000;108(8):642-9.
Houston TK, Wall T, Allison JJ, Palonen K, Willett LL, Kiefe CI, Massie FS, Benton EC, Heudebert GR. Implementing achievable benchmarks in preventive health: a controlled trial in residency education. Acad Med. 2006;81(7):608-16.
Furman CD, Head B, Lazor B, Casper B, Ritchie CS. Evaluation of an educational intervention to encourage advance directive discussions between medicine residents and patients. J Palliat Med. 2006;9(4):964-7.
Miller GE. The assessment of clinical skills/competence/performance. Acad Med. 1990;65(9 Suppl):S63-7.
Colliver JA, Swartz MH. Assessing clinical performance with standardized patients. JAMA. 1997;278(9):790-1.
Van der Vleuten CPM, Swanson D. Assessment of clinical skills with standardized patients: state of the art. Teach Learn Med. 1990;2:58-76.
Barrows HS. An overview of the uses of standardized patients for teaching and evaluating clinical skills. Acad Med. 1993;68(6):443-51.
Epstein RM, Hundert EM. Defining and assessing professional competence. JAMA. 2002;287(2):226-35.
Adamo G. Simulated and standardized patients in OSCEs: achievements and challenges 1992-2003. Med Teach. 2003;25(3):262-70.
Norman GR, et al. J Med Educ. 1982;57:708-15.
Petrusa ER, et al. Arch Intern Med. 1990;150:573-7.
Rethans J-J, et al. BMJ. 1991;303:1377-80.
Norman GR, et al. J Med Educ. 1985;60:925-34.
King AM, et al. Teach Learn Med. 1994;6:6-14.
Williams R. Teach Learn Med. 2004;16(2):215-22.
http://mededonline.usc.edu/spcalconsortium.html