Common Errors by Teachers and Proponents of EBM
Thomas B. Newman, MD, MPH
with thanks to Michael Kohn, MD, MPP, and Andi Marmor, MD
Evidence-Based Pediatrics SIG, 2012
Outline/Menu
Interval likelihood ratios: Septic arthritis
When not to use likelihood ratios: UTI in young febrile children
Critical appraisal of studies of diagnostic tests: Beyond the checklist: Signs and symptoms of appendicitis
Getting the most out of ROC curves (LAST YEAR): Meningitis in young infants; ROC curve demonstration
Septic Arthritis
Bacterial infection in a joint.
Does this Adult Patient Have Septic Arthritis? JAMA. 2007;297:1478-1488.
“A 48-year-old woman…presents to the emergency department with a 2-day history of a red, swollen right knee that is painful to touch….
On examination, she is afebrile and has a right knee effusion…An arthrocentesis is performed and initial laboratory results show a negative Gram stain...”
Pre-Test Probability of Septic Arthritis = 38%
Synovial Fluid WBC Count = 48,000/µL
Post-Test Probability of Septic Arthritis = ?
Copyright restrictions may apply.
Margaretten, M. E. et al. JAMA 2007;297:1478-1488.
Test Characteristics of Synovial Fluid Studies
WBC (/µL)   Sensitivity   Specificity   LR+    LR-
>100,000    29%           99%           29.0   0.7
>50,000     62%           92%           7.8    0.4
>25,000     77%           73%           2.9    0.3
Synovial WBC Count = 48,000/µL
Which LR should we use?
Sensitivity, Specificity, LR(+), and LR(-) of the Synovial Fluid WBC Count for Septic Arthritis at 3 Different Cutoffs
WBC (/µL)   Sensitivity   Specificity   LR+    LR-
>100,000    29%           99%           29.0   0.7
>50,000     62%           92%           7.8    0.4
>25,000     77%           73%           2.9    0.3
Synovial WBC Count = 48,000/µL
JAMA authors used this one: >25,000 (LR+ = 2.9)
Clinical Scenario
Synovial WBC = 48,000/µL
Pre-test prob: 0.38
Pre-test odds: 0.38/0.62 = 0.61
LR(+) = 2.9 (according to JAMA authors)
Post-test odds = Pre-test odds x LR(+) = 0.61 x 2.9 = 1.77
Post-test prob = 1.77/(1.77+1) = 0.64
Synovial WBC Count = 48,000/µL
Which LR should we use? NONE of THESE!
Likelihood Ratios
LR(result) = P(result|D+)/P(result|D-)
= P(result) in patients WITH disease / P(result) in patients WITHOUT disease
Likelihood Ratio
WBC (/µL) Interval   % of D+   % of D-   Interval LR
>100,000             29%       1%        29.0
>50,000-100,000      33%       7%        4.7
>25,000-50,000       15%       19%       0.8
0-25,000             23%       73%       0.3
More appropriate LR? For a synovial WBC count of 48,000/µL, the >25,000-50,000 interval applies: interval LR = 0.8
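The interval LRs in the table can be recovered from the cumulative sensitivity and specificity at each cutoff: the share of D+ (or D-) patients falling in an interval is the difference between adjacent cutoffs. The sketch below is one way to do that arithmetic. The function name `interval_lrs` and the data layout are illustrative assumptions, not something taken from the JAMA paper.

```python
# Cumulative test characteristics for "WBC > cutoff", taken from the slide's table.
# Each tuple: (cutoff in cells/uL, sensitivity, specificity).
cutoffs = [
    (100_000, 0.29, 0.99),
    (50_000, 0.62, 0.92),
    (25_000, 0.77, 0.73),
]


def interval_lrs(cutoffs):
    """Return [(interval label, P(interval|D+), P(interval|D-), LR)], highest interval first.

    `cutoffs` must be sorted from the highest cutoff to the lowest.
    """
    rows = []
    prev_tpr, prev_fpr = 0.0, 0.0  # cumulative fractions above the previous (higher) cutoff
    upper = None                   # upper bound of the current interval
    for cutoff, sens, spec in cutoffs:
        tpr, fpr = sens, 1.0 - spec
        p_dpos = tpr - prev_tpr    # share of D+ patients whose result is in this interval
        p_dneg = fpr - prev_fpr    # share of D- patients whose result is in this interval
        label = f">{cutoff:,}" if upper is None else f">{cutoff:,}-{upper:,}"
        rows.append((label, p_dpos, p_dneg, p_dpos / p_dneg))
        prev_tpr, prev_fpr, upper = tpr, fpr, cutoff
    # Everything at or below the lowest cutoff forms the final interval.
    p_dpos, p_dneg = 1.0 - prev_tpr, 1.0 - prev_fpr
    rows.append((f"0-{upper:,}", p_dpos, p_dneg, p_dpos / p_dneg))
    return rows


for label, p_dpos, p_dneg, lr in interval_lrs(cutoffs):
    print(f"{label:>16}  D+ {p_dpos:4.0%}  D- {p_dneg:4.0%}  LR {lr:4.1f}")
```

Each interval LR is also the slope of the ROC segment between the two adjacent cutoffs, since the numerator is the change in sensitivity and the denominator is the change in 1 - specificity.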
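[Figure: ROC curve of synovial WBC count for septic arthritis. x-axis: 1 - Specificity; y-axis: Sensitivity; points marked at the > 50k and > 25k cutoffs.]
Between the > 50k and > 25k cutoffs, sensitivity rises by 15% and 1 - specificity rises by 19%.
Slope = 15%/19% = 0.8
LR = Slope of ROC Curve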
Clinical Scenario
Synovial WBC = 48,000/µL
Pre-test prob: 0.38
Pre-test odds: 0.38/0.62 = 0.61
LR(WBC between 25,000 and 50,000) = 0.8
Post-test odds = Pre-test odds x LR = 0.61 x 0.8 = 0.49
Post-test prob = 0.49/(0.49+1) = 0.33
Doing it right makes a difference
From JAMA paper: “Her synovial WBC count of 48,000/µL increases the probability from 38% to 64%.” (Used LR = 2.9)
Alternative calculation: “Her synovial WBC count of 48,000/µL decreases the probability from 38% to 33%.” (Used LR = 0.8)
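Both calculations follow the same three steps: convert the pre-test probability to odds, multiply by the LR, and convert back to a probability. A minimal sketch of that conversion follows. The function name `post_test_probability` is an illustrative choice; the inputs are the numbers from the clinical scenario.

```python
def post_test_probability(pre_test_prob, lr):
    """Revise a probability with a likelihood ratio via the odds form of Bayes' rule."""
    pre_test_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_test_odds = pre_test_odds * lr
    return post_test_odds / (1.0 + post_test_odds)


pre_test = 0.38
with_cutoff_lr = post_test_probability(pre_test, 2.9)    # LR+ for the ">25,000" cutoff
with_interval_lr = post_test_probability(pre_test, 0.8)  # LR for the 25,000-50,000 interval

print(f"Cutoff LR 2.9:   {with_cutoff_lr:.2f}")
print(f"Interval LR 0.8: {with_interval_lr:.2f}")
```

The same result of 48,000/µL moves the estimate in opposite directions depending on which LR is used.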
Does This Dyspneic Patient in the Emergency Department Have Congestive Heart Failure?
JAMA. 2005;294:1944-1956.
“In this case, a BNP level could be very helpful. If it were less than 100 pg/mL, heart failure would be extremely unlikely (LR 0.09). If it were elevated, the probability of heart failure is higher but not diagnostic.”
How to interpret serum BNP (B-type Natriuretic Peptide) results?
Wang, C. S. et al. JAMA 2005;294:1944-1956.
Summary of Operating Characteristics of Serum BNP in Emergency Department Patients
When NOT to use LR
Background
Black children (at least girls) appear to be at lower risk of UTI (RR ~0.3)
Circumcised boys are at much lower risk than uncircumcised boys (RR ~0.1)
In diagnosing UTI, it makes sense to combine history findings like these with physical examination findings (height of fever, etc.) and laboratory results (urine white cells)
But there is a very important difference!
Does This Child Have a UTI?
JAMA. 2007;298(24):2895-2904
What is wrong with using LRs for these risk factors?
LR will vary tremendously with the prevalence of the risk factor in each study!
Definitions
Risk factor or test result   Disease: Yes   Disease: No   Total
Present (+)                  a              b             a+b
Absent (-)                   c              d             c+d
Total                        a+c            b+d           N
LR+ = [a/(a+c)] / [b/(b+d)]
LR- = [c/(a+c)] / [d/(b+d)]
OR = ad/bc = LR+/LR-
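To see why LRs for a risk factor shift with how common the risk factor is, while the OR does not, one can build the 2x2 table for the same underlying association at two different risk-factor prevalences. The sketch below does this with made-up numbers: the baseline risk of 2%, the OR of 10, and the prevalences of 5% and 80% are all hypothetical and chosen only for illustration.

```python
def two_by_two(rf_prevalence, risk_without_rf, true_or):
    """Cell proportions (a, b, c, d) for a risk factor with a fixed odds ratio.

    a = RF present & diseased, b = RF present & not diseased,
    c = RF absent & diseased,  d = RF absent & not diseased.
    """
    odds_with_rf = true_or * risk_without_rf / (1 - risk_without_rf)
    risk_with_rf = odds_with_rf / (1 + odds_with_rf)
    a = rf_prevalence * risk_with_rf
    b = rf_prevalence * (1 - risk_with_rf)
    c = (1 - rf_prevalence) * risk_without_rf
    d = (1 - rf_prevalence) * (1 - risk_without_rf)
    return a, b, c, d


def lr_plus(a, b, c, d):
    return (a / (a + c)) / (b / (b + d))


def lr_minus(a, b, c, d):
    return (c / (a + c)) / (d / (b + d))


def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)


# Hypothetical inputs: 2% risk without the risk factor, OR = 10.
for prevalence in (0.05, 0.80):
    cells = two_by_two(prevalence, risk_without_rf=0.02, true_or=10)
    print(f"RF prevalence {prevalence:.0%}: "
          f"LR+ = {lr_plus(*cells):.2f}, LR- = {lr_minus(*cells):.2f}, "
          f"OR = {odds_ratio(*cells):.1f}")
```

With these inputs the OR stays at 10 in both tables, but LR+ is large when the risk factor is rare and close to 1 when it is common, and LR- moves the opposite way.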
Figure 8.9 Relationship between prior odds, LR+ and LR−, posterior odds and the OR.
Panel A: Low prevalence of strong risk factor.
Panel B: High prevalence of strong risk factor.
OR vs LR
Best for
  LR: Single or independent diagnostic test results
  OR: Risk factors and treatments
Main use
  LR: Revising probability estimates
  OR: Identifying and quantifying causal associations
Causality direction
  LR: Best if disease causes test result
  OR: Best if risk factor causes disease
Typical values
  LR: Varies, but for dichotomous tests LR often >10 or <0.1
  OR: Many OR between 0.3 and 3
Depends on prevalence of test result or risk factor
  LR: Yes
  OR: No
Additional problem: failing to quantify risks and benefits of tests and treatments, leading to overly aggressive testing recommendations
Except in blacks, urinalysis and urine culture recommended for:
  Girls and uncircumcised boys 3-24 months with any fever of any duration, even if they look well and have an apparent source
  Circumcised boys with any fever > 24 hours, even if they look well and have an apparent source
*Shaikh N et al. JAMA 2007;298:2895-2904, figures 2 & 3
Critical Appraisal of Studies of Diagnostic Test Accuracy
Index Test = Test Being Evaluated
Gold Standard = Test Used to Determine True Disease Status
Chapter 5 – Studies of Diagnostic Tests
Incorporation Bias – index test part of gold standard (Sensitivity Up, Specificity Up)
Verification/Referral Bias – positive index test increases referral to gold standard (Sensitivity Up, Specificity Down)
Double Gold Standard – positive index test causes application of definitive gold standard, negative index test results in clinical follow-up (Sensitivity Up, Specificity Up)*
Spectrum Bias – D+ sickest of the sick (Sensitivity Up); D- wellest of the well (Specificity Up)
*If cases resolve spontaneously.
Bias #2 Example: Visual assessment of jaundice in newborns
Study patients who are getting a bilirubin measurement
Ask clinicians to estimate extent of jaundice at time of blood draw
Compare with blood test
Visual Assessment of jaundice*: Results
*Moyer et al., APAM 2000; 154:391
Sensitivity of jaundice below the nipple line for bilirubin ≥ 12 mg/dL = 97%
Specificity = 19%
What is the problem?
Editor’s Note: The take-home message for me is that no jaundice below the nipple line equals no bilirubin test, unless there’s some other indication.
--Catherine D. DeAngelis, MD
Bias #2: Verification Bias* -1
Inclusion criterion for study: gold standard test was done (in this case, a blood test for bilirubin)
Subjects with positive index tests are more likely to get the gold standard and to be included in the study
  Clinicians usually don’t order a blood test for bilirubin if there is little or no jaundice
How does this affect sensitivity and specificity?
*AKA Work-up, Referral Bias, or Ascertainment Bias
Verification Bias
                           TSB >12   TSB <12
Jaundice below nipple      a         b
No jaundice below nipple   c         d
Sensitivity, a/(a+c), is biased ___.
Specificity, d/(b+d), is biased ___.
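One way to work out the direction of each bias is to start from a full 2x2 table and keep only the patients who would have been verified, with index-positive patients verified far more often than index-negative ones. The sketch below uses hypothetical counts and hypothetical verification fractions (90% and 10%); none of these numbers come from the jaundice study.

```python
def sens_spec(a, b, c, d):
    """Sensitivity a/(a+c) and specificity d/(b+d) from 2x2 cell counts."""
    return a / (a + c), d / (b + d)


def verified_counts(a, b, c, d, verify_if_pos, verify_if_neg):
    """Expected cell counts among patients who actually get the gold standard.

    Index-positive patients (cells a, b) are verified with probability verify_if_pos,
    index-negative patients (cells c, d) with probability verify_if_neg.
    """
    return (a * verify_if_pos, b * verify_if_pos,
            c * verify_if_neg, d * verify_if_neg)


# Hypothetical full population: a, b, c, d as in the table above.
full = (80, 200, 20, 700)
observed = verified_counts(*full, verify_if_pos=0.9, verify_if_neg=0.1)

true_sens, true_spec = sens_spec(*full)
obs_sens, obs_spec = sens_spec(*observed)
print(f"All patients:       sensitivity {true_sens:.2f}, specificity {true_spec:.2f}")
print(f"Verified patients:  sensitivity {obs_sens:.2f}, specificity {obs_spec:.2f}")
```

Selective verification shrinks the bottom row (c and d) relative to the top row (a and b), which pushes a/(a+c) up and d/(b+d) down.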
Double Gold Standard Bias
Two different “gold standards”
  One gold standard (usually an immediate, more invasive test, e.g., angiogram, surgery) is more likely to be applied in patients with a positive index test
  Second gold standard (e.g., clinical follow-up) is more likely to be applied in patients with a negative index test
Double Gold Standard Bias
There are some patients in whom the two “gold standards” do not give the same answer
  Spontaneously resolving disease (positive with immediate invasive test, but not with follow-up)
  Newly occurring or newly detectable disease (positive with follow-up but not with immediate invasive test)
Effect of Double Gold Standard Bias: Spontaneously resolving disease
Test result will always agree with gold standard
Both sensitivity and specificity increase
Example: Joey has an intussusception that will resolve spontaneously.
  If his ultrasound scan is positive, he will get a contrast enema that will show (and cure) the intussusception (true positive)
  If his ultrasound scan is negative, his intussusception will resolve and we will think he never had one (true negative)
Ultrasound scan can’t be wrong!
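The effect can be made concrete with expected counts. In the sketch below, index-positive patients get the invasive test, which finds all disease present at that time, while index-negative patients get follow-up, which misses disease that resolves on its own. All inputs are hypothetical, and the sketch assumes the index test is equally sensitive for resolving and non-resolving disease.

```python
def double_gold_standard(n_dpos, n_dneg, sens, spec, frac_resolving):
    """Apparent sensitivity and specificity under a double gold standard.

    n_dpos / n_dneg: patients with / without disease at the time of the index test.
    sens / spec: true accuracy of the index test against disease at that time.
    frac_resolving: fraction of diseased patients whose disease resolves spontaneously.
    """
    tp = n_dpos * sens            # index +, invasive test confirms disease
    fp = n_dneg * (1 - spec)      # index +, invasive test finds nothing
    missed = n_dpos * (1 - sens)  # index -, disease actually present
    tn = n_dneg * spec            # index -, truly no disease

    # Follow-up never detects missed disease that resolves, so those
    # patients are recorded as true negatives instead of false negatives.
    resolved = missed * frac_resolving
    a, b = tp, fp
    c = missed - resolved
    d = tn + resolved
    return a / (a + c), d / (b + d)


# Hypothetical cohort: 100 diseased, 900 non-diseased, 30% of disease resolves.
apparent_sens, apparent_spec = double_gold_standard(
    n_dpos=100, n_dneg=900, sens=0.80, spec=0.90, frac_resolving=0.30
)
print(f"True sensitivity 0.80 -> apparent {apparent_sens:.3f}")
print(f"True specificity 0.90 -> apparent {apparent_spec:.3f}")
```

If every case resolved spontaneously (`frac_resolving=1.0`), the false-negative cell would be empty and the apparent sensitivity would be 100%, which is the "can't be wrong" situation in Joey's example.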
Does This Child Have Appendicitis?
JAMA. 2007;298:438-451.
RLQ Pain: Sensitivity = 96%, Specificity = 5% (1 – Specificity = 95%)
Likelihood Ratio = 1.0
RLQ pain was present in 96% of those with appendicitis and 95% of those without appendicitis.
Verification (Referral) Bias
Biases the accuracy of a finding when the presence of the finding makes the patient more likely to be studied.
Specificity biased down (5%). Sensitivity biased up (96%).
Bundy, D. G. et al. JAMA 2007;298:438-451.
Study Population: Children who underwent appendectomy
Does the LR of 1 mean that, in children, RLQ pain is not indicative of appendicitis?
No; it means only kids with RLQ pain get appendectomies.
Studies of Diagnostic Test Accuracy: Checklist
Was there an independent, blind comparison with a reference (“gold”) standard of diagnosis?
Was the diagnostic test evaluated in an appropriate spectrum of patients (like those in whom we would use it in practice)?
Was the reference standard applied regardless of the diagnostic test result?
Was the test (or cluster of tests) validated in a second, independent group of patients?
From Sackett et al., Evidence-based Medicine, 2nd ed. (NY: Churchill Livingstone), 2000. p 68
A clinical decision rule to identify children at low risk for appendicitis* (Problem 5.6 in EBD)
Study design: prospective cohort study
Subjects
  4140 patients 3-18 years presenting to Boston Children’s Hospital ED with abdominal pain
  767 (19%) received surgical consultation for possible appendicitis
  113 excluded (chronic diseases, recent imaging)
  53 missed
  601 included in the study (425 in derivation set)
*Kharbanda et al. Pediatrics 2005; 116(3): 709-16
A clinical decision rule to identify children at low risk for appendicitis
Predictor variables
  Standardized assessment by pediatric ED attending
  Focus on “Pain with percussion, hopping or cough” (complete data in N=381)
Outcome variable
  Pathologic diagnosis of appendicitis (or not) for those who received surgery (37%)
  Follow-up telephone call to family or pediatrician 2-4 weeks after the ED visit for those who did not receive surgery (63%)
A clinical decision rule to identify children at low risk for appendicitis
Results: Pain with percussion, hopping or cough
78% sensitivity and 83% NPV seem low to me. Are they valid for me in deciding whom to image?
In what direction would these biases affect results?
Sample not representative (population referred to pedi surgery)?
Verification bias?
Double-gold standard bias?
Spectrum bias?
For children presenting with abdominal pain to SFGH 6-M:
  Sensitivity probably valid (not falsely low)
    But whether all of the kids in the study tried to hop is not clear
  Specificity probably low
  PPV is too high
  NPV is too low
  Does not address surgical consultation decision
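The direction of the PPV and NPV distortions follows from prevalence alone: a sample referred for surgical consultation has more appendicitis than all children presenting with abdominal pain. The sketch below holds sensitivity and specificity fixed and varies only prevalence. The 78% sensitivity is the figure quoted for the study; the 60% specificity and both prevalences (35% and 5%) are hypothetical values chosen for illustration.

```python
def ppv_npv(sens, spec, prevalence):
    """Positive and negative predictive value from sensitivity, specificity, prevalence."""
    tp = sens * prevalence
    fn = (1 - sens) * prevalence
    fp = (1 - spec) * (1 - prevalence)
    tn = spec * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


sens = 0.78   # sensitivity quoted for "pain with percussion, hopping or cough"
spec = 0.60   # hypothetical specificity, not taken from the study

# Hypothetical prevalences: referred (surgical consult) sample vs. all abdominal pain.
ppv_referred, npv_referred = ppv_npv(sens, spec, prevalence=0.35)
ppv_all, npv_all = ppv_npv(sens, spec, prevalence=0.05)

print(f"Referred sample:    PPV {ppv_referred:.2f}, NPV {npv_referred:.2f}")
print(f"All abdominal pain: PPV {ppv_all:.2f}, NPV {npv_all:.2f}")
```

Under these assumptions the referred sample overstates PPV and understates NPV relative to the lower-prevalence population in which the rule would be applied.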