Matthew J. Thompson, GP & Senior Clinical Scientist
Department of Primary Health Care, University of Oxford
Appraising diagnostic studies
Overview of talk
• Diagnostic reasoning
• Appraising diagnostic studies
What is diagnosis?
• Increase certainty about the presence/absence of disease
• Assess disease severity
• Monitor clinical course
• Assess prognosis (risk/stage within a diagnosis)
• Plan treatment (e.g., location)
• Stalling for time!
Knottnerus, BMJ 2002
Diagnostic errors
Most diagnostic errors are due to cognitive errors: conditions of uncertainty, pressured thinking, and the use of shortcuts. (Croskerry. Ann Emerg Med 2003)
Human toll of diagnostic errors (Diagnostic errors - The next frontier for Patient Safety. Newman-Toker, JAMA 2009)
40,000-80,000 US hospital deaths from misdiagnosis per year
Adverse events, negligence cases, serious disability more likely to be related to misdiagnosis than drug errors
Diagnostic reasoning
Diagnostic strategies particularly important where patients present with variety of conditions and possible diagnoses.
Diagnostic reasoning
For example, what causes cough? Comprehensive history → examination → differential diagnosis → final diagnosis
Cardiac failure, left sided , Chronic obstructive pulmonary disease , Lung abscess Pulmonary alveolar proteinosis, Wegener's granulomatosis, Bronchiectasis Pneumonia, Atypical pneumonia, Pulmonary hypertension Measles, Oropharyngeal cancer, Goodpasture's syndrome Pulmonary oedema, Pulmonary embolism, Mycobacterium tuberculosis Foreign body in respiratory tract, Diffuse panbronchiolitis, Bronchogenic carcinoma Broncholithiasis, Pulmonary fibrosis, Pneumocystis carinii Captopril, Whooping cough, Fasciola hepatica Gastroesophageal reflux, Schistosoma haematobium, Visceral leishmaniasis Enalapril, Pharyngeal pouch, Suppurative otitis media Upper respiratory tract infection, Arnold's nerve cough syndrome, Allergic bronchopulmonary aspergillosis Chlorine gas, Amyloidosis, Cyclophosphamide Tropical pulmonary eosinophilia, Simple pulmonary eosinophilia, Sulphur dioxide Tracheolaryngobronchitis, Extrinsic allergic alveolitis, Laryngitis Fibrosing alveolitis, cryptogenic, Toluene di-isocyanate, Coal worker's pneumoconiosis Lisinopril, Functional disorders, Nitrogen dioxide, Fentanyl Asthma, Omapatrilat, Sinusitis Gabapentin, Cilazapril
…diagnostic reasoning over 53 possible causes!
Diagnostic reasoning strategies
Aim: identify the types and frequency of diagnostic strategies used in primary care. Six GPs collected and recorded the strategies used on 300 patients.
(Diagnostic strategies used in primary care. Heneghan, Glasziou, Thompson et al. BMJ in press)
Diagnostic stages & strategies

Stage | Strategy
Initiation of the diagnosis | Spot diagnoses; Self-labelling; Presenting complaint; Pattern recognition
Refinement of the diagnostic causes | Restricted rule-outs; Stepwise refinement; Probabilistic reasoning; Pattern recognition fit; Clinical prediction rule
Defining the final diagnosis | Known diagnosis; Further tests ordered; Test of treatment; Test of time; No label
Initiation: Spot diagnosis

Unconscious recognition of a non-verbal pattern, e.g.:
• visual (skin condition)
• auditory (barking cough with croup)
Fairly instantaneous; no further history needed. Used in 20% of consultations.
*Brooks LR. Role of specific similarity in a medical diagnostic task. J Exp Psychol Gen 1991;120:278-87
Initiation: Self-labelling
“It’s tonsillitis, doc – I’ve had it before”
“I have a chest infection doctor”
Used in 20% of consultations.
Accuracy of self-diagnosis in recurrent UTI
88 women with 172 self-diagnosed UTIs:
• uropathogen in 144 (84%)
• sterile pyuria in 19 cases (11%)
• no pyuria or bacteriuria in 9 cases (5%)
(Gupta et al. Ann Intern Med 2001)
Refining: Restricted rule-out (or Murtagh’s) process
A learned diagnostic strategy for each presentation: think of the most common/likely condition AND what also needs to be ruled out.
Example: for a patient with headache, learn to check for migraine and tension-type headache, but also to rule out temporal arteritis, subarachnoid haemorrhage, etc.
Used in 30% of consultations.
Murtagh. Australian Fam Phys 1990; Croskerry. Ann Emerg Med 2003
Refining: Probabilistic reasoning
The use of a specific but probably imperfect symptom, sign or diagnostic test to rule a diagnosis in or out.
E.g. urine dipstick for UTI, arterial tenderness in temporal arteritis.
Used in 10% of cases.
Refining: Pattern recognition
Symptoms and signs volunteered or elicited from the patient are compared to previous patterns or cases and a disease is recognized when the actual pattern fits.
Relies on memory of known patterns, but no specific rule is employed.
Used in 40% of cases.
Refining: Clinical prediction rules
A formal version of pattern recognition based on a well-defined and validated series of similar cases.
Examples: Ottawa ankle rule, streptococcal sore throat.
Rarely used (<10% of cases).
Final diagnostic stage
• Known diagnosis
• Order further tests
• Test of treatment
• Test of time
• Can’t label

[Bar chart: “Defining the final diagnoses” – frequency of each option (known diagnosis, further tests ordered, test of treatment, test of time, no label) on a 0–100% axis]
Appraising diagnostic tests
1. Are the results valid?
2. What are the results?
3. Will they help me look after my patients?
Basic design of a diagnostic accuracy study

Series of patients → Index test → Reference (“gold”) standard → Blinded cross-classification
Validity of diagnostic studies
1. Was an appropriate spectrum of patients included?
2. Were all patients subjected to the gold standard?
3. Was there an independent, blind or objective comparison with the gold standard?
1. Was an appropriate spectrum of patients included? Spectrum bias

Selected patients → Index test → Reference standard → Blinded cross-classification
You want to find out how good chest X-rays are for diagnosing pneumonia in the Emergency Department.
Best = all patients presenting with difficulty breathing get a chest X-ray.
Spectrum bias = only those patients in whom you really suspect pneumonia get a chest X-ray.
2. Were all patients subjected to the gold standard? Verification (work-up) bias

Series of patients → Index test → Reference standard → Blinded cross-classification

You want to find out how good the exercise ECG (“treadmill test”) is for identifying patients with angina.
The gold standard is angiography.
Best = all patients get angiography.
Verification (work-up) bias = only patients who have a positive exercise ECG get angiography.
3. Was there an independent, blind or objective comparison with the gold standard? Observer bias

Series of patients → Index test → Reference standard → Unblinded cross-classification

You want to find out how good the exercise ECG (“treadmill test”) is for identifying patients with angina.
All patients get the gold standard (angiography).
Observer bias = the cardiologist who does the angiography knows what the exercise ECG showed (not blinded).
Incorporation bias

Series of patients → Index test → Reference standard (includes parts of the index test) → Unblinded cross-classification
Differential reference bias

Series of patients → Index test → Reference standard A or Reference standard B → Blinded cross-classification
Validity of diagnostic studies
1. Was an appropriate spectrum of patients included?
2. Were all patients subjected to the gold standard?
3. Was there an independent, blind or objective comparison with the gold standard?
Appraising diagnostic tests
1. Are the results valid?
2. What are the results?
3. Will they help me look after my patients?
Sensitivity, specificity, positive & negative
predictive values, likelihood ratios
…aaarrrggh!!
2 by 2 table

       | Disease +               | Disease -
Test + | a = true positives      | b = false positives
Test - | c = false negatives     | d = true negatives
2 by 2 table: sensitivity

Sensitivity = a / (a + c)

Proportion of people with the disease who have a positive test result.
…a highly sensitive test will not miss many people.
2 by 2 table: sensitivity (worked example)

Of 100 people with the disease, 99 test positive (a = 99) and 1 tests negative (c = 1).

Sensitivity = a / (a + c) = 99/100 = 99%
2 by 2 table: specificity

Specificity = d / (b + d)

Proportion of people without the disease who have a negative test result.
…a highly specific test will not falsely identify people as having the disease.
Tip…
Sensitivity is useful to me.
Specificity isn’t… I want to know about the false positives.
…so use 1 - specificity, which is the false positive rate.
2 by 2 table: sensitivity and false positive rate

Sensitivity = a / (a + c)
False positive rate = b / (b + d) (same as 1 - specificity)
2 by 2 table: worked example

Of 100 people with the disease, 99 test positive; of 100 people without it, 10 test positive.

Sensitivity = 99%
False positive rate = 10% (same as 1 - specificity)
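These figures can be checked with a short calculation; a minimal sketch in Python, using the hypothetical 2 by 2 counts from the worked example above:

```python
# Counts from the slide's worked 2 by 2 table:
# 99 true positives, 10 false positives, 1 false negative, 90 true negatives.
a, b, c, d = 99, 10, 1, 90  # TP, FP, FN, TN

sensitivity = a / (a + c)          # proportion of diseased who test positive
specificity = d / (b + d)          # proportion of non-diseased who test negative
false_positive_rate = b / (b + d)  # same as 1 - specificity

print(sensitivity)           # 0.99
print(false_positive_rate)   # 0.1
```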
Example
Your father went to his doctor and was told that his test for a disease was positive. He is really worried, and comes to ask you for help!
After doing some reading, you find that for men of his age:
The prevalence of the disease is 30%. The test has a sensitivity of 50% and a specificity of 90%.
“Son, tell me what’s the chance I have this disease?”
Is it 100% (always), 50% (maybe), or 0% (never)?
Prevalence of 30%, sensitivity of 50%, specificity of 90%

Of 100 men: 30 have the disease, 70 do not.
Of the 30 with the disease, 15 test positive (sensitivity = 50%).
Of the 70 without the disease, 7 test positive (false positive rate = 10%).
22 people test positive… of whom 15 have the disease.
So the chance of disease is 15/22, about 70%.
A disease with a prevalence of 4% must be diagnosed.
It has a sensitivity of 50% and a specificity of 90%.
If the patient tests positive, what is the chance they have the disease?
Try it again: prevalence of 4%, sensitivity of 50%, specificity of 90%

Of 100 patients: 4 have the disease, 96 do not.
Of the 4 with the disease, 2 test positive (sensitivity = 50%).
Of the 96 without the disease, 9.6 test positive (false positive rate = 10%).
11.6 people test positive… of whom 2 have the disease.
So the chance of disease is 2/11.6, about 17%.
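The natural-frequency arithmetic used in both examples generalises to any prevalence. A minimal Python sketch (the function name is mine, not from the talk):

```python
def post_test_probability(prevalence, sensitivity, specificity):
    """Chance of disease given a positive test, via natural frequencies."""
    diseased_pos = prevalence * sensitivity             # true positives per person tested
    healthy_pos = (1 - prevalence) * (1 - specificity)  # false positives per person tested
    return diseased_pos / (diseased_pos + healthy_pos)

# The talk's two scenarios (sensitivity 50%, specificity 90%):
print(round(post_test_probability(0.30, 0.5, 0.9), 2))  # 0.68  (= 15/22)
print(round(post_test_probability(0.04, 0.5, 0.9), 2))  # 0.17  (= 2/11.6)
```

Note how the same test drops from a ~70% to a ~17% post-test probability purely because the prevalence fell.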
Doctors with an average of 14 years’ experience gave answers ranging from 1% to 99%, half of them estimating the probability as 50%.
Gigerenzer G BMJ 2003;327:741-744
Sensitivity and specificity don’t vary with prevalence. Test performance can nonetheless vary across settings, patient groups, etc. This is occasionally attributed to differences in disease prevalence, but is more likely due to differences in the spectrums of diseased and non-diseased patients.
2 x 2 table: positive predictive value

PPV = a / (a + b)

Proportion of people with a positive test who have the disease.
2 x 2 table: negative predictive value

NPV = d / (c + d)

Proportion of people with a negative test who do not have the disease.
What’s wrong with PPV and NPV?
They depend on both the accuracy of the test and the prevalence of the disease.
Likelihood ratios
Can use in situations with more than 2 test outcomes
Direct link from pre-test probabilities to post-test probabilities
2 x 2 table: positive likelihood ratio

LR+ = (a / (a + c)) / (b / (b + d))
or
LR+ = sensitivity / (1 - specificity)

How much more often a positive test result occurs in people with the disease compared to those without.
2 x 2 table: negative likelihood ratio

LR- = (c / (a + c)) / (d / (b + d))
or
LR- = (1 - sensitivity) / specificity

How much less likely a negative test result is in people with the disease compared to those without.
LR > 10 = strong positive test result
LR < 0.1 = strong negative test result
LR = 1 = no diagnostic value
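Both likelihood ratios follow directly from sensitivity and specificity; a small sketch using the running example (sensitivity 50%, specificity 90%):

```python
def likelihood_ratios(sensitivity, specificity):
    """LR+ and LR- from a test's sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)   # how much a positive result raises the odds
    lr_neg = (1 - sensitivity) / specificity   # how much a negative result lowers the odds
    return lr_pos, lr_neg

lr_pos, lr_neg = likelihood_ratios(0.5, 0.9)
print(round(lr_pos, 2))  # 5.0
print(round(lr_neg, 2))  # 0.56
```

By the rules of thumb above, an LR+ of 5 is useful but not strong, and an LR- of 0.56 is close to diagnostically useless for ruling out.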
Bayesian reasoning: the Fagan nomogram

Example: ?appendicitis. McBurney tenderness, LR+ = 3.4.
Pre-test probability 5% → post-test probability ≈ 20%

McGee: Evidence-Based Physical Diagnosis (Saunders Elsevier)
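The nomogram is a graphical shortcut for Bayes’ rule in odds form: post-test odds = pre-test odds × LR. The arithmetic can be sketched directly (the function name is mine):

```python
def post_test_prob(pre_test_prob, lr):
    """Bayes' rule in odds form: post-test odds = pre-test odds * LR."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)  # probability -> odds
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)              # odds -> probability

# Appendicitis example: McBurney tenderness, LR+ = 3.4, pre-test 5%
print(round(post_test_prob(0.05, 3.4), 2))  # 0.15
```

The exact figure (~15%) is what the graphical read-off from the nomogram approximates.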
Do doctors use quantitative methods of test accuracy?

In a survey of 300 US physicians, 8 used Bayesian methods, 3 used ROC curves, and 2 used LRs. Why?
…indices unavailable…
…lack of training…
…not relevant to setting/population…
…other factors more important…
(Reid et al. Academic calculations versus clinical judgements: practicing physicians’ use of quantitative measures of test accuracy. Am J Med 1998)
Appraising diagnostic tests
1. Are the results valid?
2. What are the results?
3. Will they help me look after my patients?
Will the test apply in my setting?
• Reproducibility of the test and its interpretation in my setting
• Do the results apply to the mix of patients I see?
• Will the results change my management?
• Impact on outcomes that are important to patients?
• Where does the test fit into the diagnostic strategy?
• Costs to patient/health service?
Reliability – how reproducible is the test?
Kappa = measure of intra-observer reliability

Test | Kappa value
Tachypnoea | 0.25
Crackles on auscultation | 0.41
Pleural rub | 0.52
CXR for cardiomegaly | 0.48
MRI spine for disc | 0.59
Value of kappa | Strength of agreement
<0.20 | Poor
0.21–0.40 | Fair
0.41–0.60 | Moderate
0.61–0.80 | Good
0.81–1.00 | Very good
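For illustration, a minimal sketch of how Cohen’s kappa is computed from a two-rater agreement table; the counts here are made up, not taken from the studies cited above:

```python
def cohens_kappa(a, b, c, d):
    """Cohen's kappa for two raters' yes/no calls.
    a = both say yes, d = both say no, b and c = disagreements."""
    n = a + b + c + d
    observed = (a + d) / n
    # Chance agreement expected from each rater's marginal yes/no rates:
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical example: two doctors assess crackles in 100 patients.
print(round(cohens_kappa(20, 15, 10, 55), 2))  # 0.43 -> "moderate" agreement
```

Kappa corrects raw agreement (here 75%) for the agreement expected by chance alone, which is why it is preferred to simple percent agreement.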
Will the result change management?

Probability of disease runs from 0% to 100%, crossed by two thresholds. Below the testing threshold: no action. Between the testing threshold and the action threshold: test. Above the action threshold: action (e.g. treat).
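The threshold model can be sketched as a simple decision rule; the threshold values below are illustrative assumptions, not figures from the talk:

```python
def management(prob, testing_threshold=0.10, action_threshold=0.60):
    """Threshold model of testing (illustrative thresholds, not from the talk)."""
    if prob < testing_threshold:
        return "no action"
    if prob < action_threshold:
        return "test"
    return "action (e.g. treat)"

print(management(0.05))  # no action
print(management(0.30))  # test
print(management(0.80))  # action (e.g. treat)
```

A test is only worth ordering when its result could move the probability across one of these thresholds and so change management.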
Any questions?