
Common Errors by Teachers and Proponents of EBM

Thomas B. Newman, MD, MPH, with thanks to Michael Kohn, MD, MPP, and Andi Marmor, MD

Evidence-Based Pediatrics SIG, 2012

Outline/Menu

Interval likelihood ratios
    Septic arthritis

When not to use likelihood ratios
    UTI in young febrile children

Critical appraisal of studies of diagnostic tests: Beyond the checklist
    Signs and symptoms of appendicitis

Getting the most out of ROC curves (LAST YEAR)
    Meningitis in young infants
    ROC curve demonstration

Septic Arthritis: bacterial infection in a joint.

Does this Adult Patient Have Septic Arthritis? JAMA. 2007;297:1478-1488.

“A 48-year-old woman…presents to the emergency department with a 2-day history of a red, swollen right knee that is painful to touch….

On examination, she is afebrile and has a right knee effusion…An arthrocentesis is performed and initial laboratory results show a negative Gram stain...”

Pre-Test Probability of Septic Arthritis = 38%
Synovial Fluid WBC Count = 48,000/µL
Post-Test Probability of Septic Arthritis = ?


Margaretten, M. E. et al. JAMA 2007;297:1478-1488.

Test Characteristics of Synovial Fluid Studies

WBC (/µL)    Sensitivity   Specificity   LR+    LR-
>100,000     29%           99%           29.0   0.7
>50,000      62%           92%           7.8    0.4
>25,000      77%           73%           2.9    0.3

Synovial WBC Count = 48,000/uL

Which LR should we use?

Sensitivity, Specificity, LR(+), and LR(-) of the Synovial Fluid WBC Count for Septic Arthritis at Different Cutoffs


Synovial WBC count = 48,000/µL. The JAMA authors used the >25,000/µL row (LR+ = 2.9).

Clinical Scenario: Synovial WBC = 48,000/µL

Pre-test prob: 0.38
Pre-test odds: 0.38/0.62 = 0.61
LR(+) = 2.9 (according to JAMA authors)

Post-Test Odds = Pre-Test Odds x LR(+) = 0.61 x 2.9 = 1.77
Post-Test prob = 1.77/(1.77+1) = 0.64
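The probability-to-odds-to-probability steps can be sketched in a few lines of Python. This is only an illustration of the arithmetic on the slide; the function names are made up for this sketch.

```python
def prob_to_odds(p: float) -> float:
    """Convert a probability (0 <= p < 1) to odds."""
    return p / (1.0 - p)


def odds_to_prob(odds: float) -> float:
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


def post_test_probability(pre_test_prob: float, lr: float) -> float:
    """Post-test odds = pre-test odds x LR, then convert back to a probability."""
    return odds_to_prob(prob_to_odds(pre_test_prob) * lr)


# Numbers from the clinical scenario: pre-test probability 0.38, LR(+) = 2.9.
print(round(post_test_probability(0.38, 2.9), 2))  # 0.64
```

The same function works for any LR, including an interval LR, because the only thing that changes is the multiplier applied to the pre-test odds.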



Which LR should we use? NONE of THESE!

Likelihood Ratios

LR(result) = P(result|D+) / P(result|D-)

             P(result) in patients WITH disease
           = -------------------------------------
             P(result) in patients WITHOUT disease

Interval Likelihood Ratios

WBC (/µL) Interval    % of D+    % of D-    Interval LR
>100,000              29%        1%         29.0
>50,000-100,000       33%        7%         4.7
>25,000-50,000        15%        19%        0.8
0-25,000              23%        73%        0.3
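The interval percentages follow from the cutoff table by subtraction: the share of D+ patients in an interval is the difference in sensitivity between adjacent cutoffs, and the share of D- patients is the difference in (1 - specificity). A minimal Python sketch of that bookkeeping, using the published cutoff values (the function and variable names are illustrative only):

```python
# (cutoff, sensitivity %, specificity %) from the cutoff table,
# ordered from the highest cutoff to the lowest.
cutoffs = [
    (100_000, 29, 99),
    (50_000, 62, 92),
    (25_000, 77, 73),
]


def interval_lrs(rows):
    """Return (label, % of D+, % of D-, interval LR) for each WBC interval."""
    out = []
    prev_cut, prev_sens, prev_fpr = None, 0, 0
    for cut, sens, spec in rows:
        fpr = 100 - spec  # 1 - specificity, in percent
        label = f">{cut:,}" if prev_cut is None else f">{cut:,}-{prev_cut:,}"
        d_pos = sens - prev_sens  # % of D+ whose result falls in this interval
        d_neg = fpr - prev_fpr    # % of D- whose result falls in this interval
        out.append((label, d_pos, d_neg, d_pos / d_neg))
        prev_cut, prev_sens, prev_fpr = cut, sens, fpr
    # Everything at or below the lowest cutoff.
    d_pos, d_neg = 100 - prev_sens, 100 - prev_fpr
    out.append((f"0-{prev_cut:,}", d_pos, d_neg, d_pos / d_neg))
    return out


for label, d_pos, d_neg, lr in interval_lrs(cutoffs):
    print(f"{label:>18}  {d_pos:3d}%  {d_neg:3d}%  LR = {lr:.1f}")
```

Run on the three published cutoffs, this reproduces the four interval LRs in the table (29.0, 4.7, 0.8, 0.3).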


More appropriate LR? For a WBC count of 48,000/µL, the >25,000-50,000 interval applies: LR = 0.8.

[Figure: ROC curve for synovial fluid WBC count. x-axis: 1 - Specificity (0% to 100%); y-axis: Sensitivity (0% to 100%). Between the > 50k and > 25k points, sensitivity rises by 15% and 1 - specificity rises by 19%, so Slope = 15%/19% = 0.8.]

LR = Slope of ROC Curve
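A short derivation, as a sketch, of why the slope of an ROC segment equals the interval LR. The symbols X (the test value) and a < b (two adjacent cutoffs) are notation introduced here for illustration and do not appear on the slides.

```latex
\begin{align*}
\text{slope}
  &= \frac{\Delta\,\text{Sensitivity}}{\Delta\,(1-\text{Specificity})}
   = \frac{P(X > a \mid D+) - P(X > b \mid D+)}{P(X > a \mid D-) - P(X > b \mid D-)} \\
  &= \frac{P(a < X \le b \mid D+)}{P(a < X \le b \mid D-)}
   = LR(a < X \le b)
\end{align*}
```

With a = 25,000 and b = 50,000, this gives (77% - 62%)/(27% - 8%) = 15%/19%, or about 0.8.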

Clinical Scenario: Synovial WBC = 48,000/µL

Pre-test prob: 0.38
Pre-test odds: 0.38/0.62 = 0.61
LR(WBC between 25,000 and 50,000) = 0.8

Post-Test Odds = Pre-Test Odds x LR(WBC between 25,000 and 50,000) = 0.61 x 0.8 = 0.49
Post-Test prob = 0.49/(0.49+1) = 0.33

Doing it right makes a difference

From JAMA paper: “Her synovial WBC count of 48,000/µL increases the probability from 38% to 64%.” (Used LR = 2.9)

Alternative calculation: “Her synovial WBC count of 48,000/µL decreases the probability from 38% to 33%.” (Used LR = 0.8)

Does This Dyspneic Patient in the Emergency Department Have Congestive Heart Failure?

JAMA. 2005;294:1944-1956.

“In this case, a BNP level could be very helpful. If it were less than 100 pg/mL, heart failure would be extremely unlikely (LR 0.09). If it were elevated, the probability of heart failure is higher but not diagnostic.”

How to interpret serum BNP (B-type Natriuretic Peptide) results?


Wang, C. S. et al. JAMA 2005;294:1944-1956.

Summary of Operating Characteristics of Serum BNP in Emergency Department Patients

When NOT to use LR

Background

Black children (at least girls) appear to be at lower risk of UTI (RR ~0.3)

Circumcised boys are at much lower risk than uncircumcised boys (RR ~0.1)

In diagnosing UTI, it makes sense to combine history findings like these with physical examination findings (height of fever, etc.) and laboratory results (urine white cells)

But there is a very important difference!

Does This Child Have a UTI?

JAMA. 2007;298(24):2895-2904


What is wrong with using LRs for these risk factors?

LR will vary tremendously with the prevalence of the risk factor in each study!

Definitions

LR+ = [a/(a+c)] / [b/(b+d)]

LR- = [c/(a+c)] / [d/(b+d)]

OR = ad/bc = LR+/LR-

                                     Disease
Risk factor or test result    Yes    No     Total
Present (+)                   a      b      a+b
Absent (-)                    c      d      c+d
Total                         a+c    b+d    N
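To see how LR+ and LR- move with the prevalence of the risk factor while the OR stays put, here is a small Python sketch using the definitions above. The two sets of counts are HYPOTHETICAL, chosen only so that both tables have OR = 10; they are not from any of the cited studies.

```python
def two_by_two_measures(a: int, b: int, c: int, d: int):
    """Return (LR+, LR-, OR) for a 2x2 table laid out as in the Definitions.

    a, b = risk factor present in D+, D-; c, d = risk factor absent in D+, D-.
    """
    lr_pos = (a / (a + c)) / (b / (b + d))
    lr_neg = (c / (a + c)) / (d / (b + d))
    odds_ratio = (a * d) / (b * c)
    return lr_pos, lr_neg, odds_ratio


# HYPOTHETICAL counts: same OR, different prevalence of the risk factor.
rare = two_by_two_measures(a=10, b=10, c=90, d=900)     # risk factor uncommon
common = two_by_two_measures(a=90, b=450, c=10, d=500)  # risk factor common

for name, (lr_pos, lr_neg, odds_ratio) in [("rare", rare), ("common", common)]:
    print(f"{name:>6}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, OR = {odds_ratio:.1f}")
```

In this made-up example the OR is 10 in both tables, yet LR+ drops from about 9.1 to about 1.9 when the risk factor becomes common.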

Figure 8.9 Relationship between prior odds, LR+ and LR−, posterior odds and the OR.
Panel A: Low prevalence of strong risk factor.
Panel B: High prevalence of strong risk factor.

OR vs LR

Characteristic: Best for
    LR: Single or independent diagnostic test results
    OR: Risk factors and treatments

Characteristic: Main use
    LR: Revising probability estimates
    OR: Identifying and quantifying causal associations

Characteristic: Causality direction
    LR: Best if disease causes test result
    OR: Best if risk factor causes disease

Characteristic: Typical values
    LR: Varies, but for dichotomous tests LR often >10 or <0.1
    OR: Many OR between 0.3 and 3

Characteristic: Depend on prevalence of test result or risk factor
    LR: Yes
    OR: No

Additional problem: failing to quantify risks and benefits of tests and treatments, leading to overly aggressive testing recommendations

Except in Black children, urinalysis and urine culture recommended for:
    Girls and uncircumcised boys 3-24 months with any fever of any duration, even if they look well and have an apparent source
    Circumcised boys with any fever > 24 hours, even if they look well and have an apparent source

*Shaikh N et al. JAMA 2007;298:2895-2904, figures 2 & 3

Critical Appraisal of Studies of Diagnostic Test Accuracy

Index Test = Test Being Evaluated

Gold Standard = Test Used to Determine True Disease Status

Chapter 5 – Studies of Diagnostic Tests

Incorporation Bias – index test part of gold standard (Sensitivity Up, Specificity Up)

Verification/Referral Bias – positive index test increases referral to gold standard (Sensitivity Up, Specificity Down)

Double Gold Standard – positive index test causes application of definitive gold standard, negative index test results in clinical follow-up (Sensitivity Up, Specificity Up)*

Spectrum Bias – D+ sickest of the sick (Sensitivity Up); D- wellest of the well (Specificity Up)

*If cases resolve spontaneously.

Bias #2 Example: Visual assessment of jaundice in newborns
    Study patients who are getting a bilirubin measurement
    Ask clinicians to estimate extent of jaundice at time of blood draw
    Compare with blood test

Visual Assessment of jaundice*: Results

*Moyer et al., APAM 2000; 154:391

Sensitivity of jaundice below the nipple line for bilirubin ≥ 12 mg/dL = 97%

Specificity = 19%

What is the problem?

Editor’s Note: The take-home message for me is that no jaundice below the nipple line equals no bilirubin test, unless there’s some other indication.

--Catherine D. DeAngelis, MD

Bias #2: Verification Bias* -1

Inclusion criterion for study: gold standard test was done (in this case, blood test for bilirubin)

Subjects with positive index tests are more likely to get the gold standard and to be included in the study: clinicians usually don’t order a blood test for bilirubin if there is little or no jaundice

How does this affect sensitivity and specificity?

*AKA Work-up, Referral Bias, or Ascertainment Bias

Verification Bias

                            TSB ≥ 12    TSB < 12
Jaundice below nipple       a           b
No jaundice below nipple    c           d

Sensitivity, a/(a+c), is biased ___.

Specificity, d/(b+d), is biased ___.
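A small numeric sketch of the mechanism, in Python. All counts and verification fractions below are HYPOTHETICAL, chosen only to illustrate the direction of the bias listed earlier for verification bias (Sensitivity Up, Specificity Down); they are not the data from the jaundice study.

```python
def sens_spec(a, b, c, d):
    """Sensitivity a/(a+c) and specificity d/(b+d) from 2x2 cell counts."""
    return a / (a + c), d / (b + d)


# HYPOTHETICAL full population, as if everyone got the gold standard:
#               D+    D-
# test +        80   200
# test -        20   700
full = dict(a=80, b=200, c=20, d=700)

# HYPOTHETICAL verification fractions: every test-positive but only 10% of
# test-negatives get the gold standard and so enter the study.
verify_pos, verify_neg = 1.0, 0.1

observed = dict(
    a=full["a"] * verify_pos,
    b=full["b"] * verify_pos,
    c=full["c"] * verify_neg,
    d=full["d"] * verify_neg,
)

true_sens, true_spec = sens_spec(**full)
obs_sens, obs_spec = sens_spec(**observed)
print(f"true:     sensitivity {true_sens:.2f}, specificity {true_spec:.2f}")
print(f"observed: sensitivity {obs_sens:.2f}, specificity {obs_spec:.2f}")
```

Dropping most test-negatives shrinks cells c and d, which inflates a/(a+c) and deflates d/(b+d). In this made-up example sensitivity goes from 0.80 to about 0.98 and specificity from about 0.78 to about 0.26.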


Double Gold Standard Bias

Two different “gold standards”:
    One gold standard (usually an immediate, more invasive test, e.g., angiogram, surgery) is more likely to be applied in patients with a positive index test.
    Second gold standard (e.g., clinical follow-up) is more likely to be applied in patients with a negative index test.

Double Gold Standard Bias

There are some patients in whom the two “gold standards” do not give the same answer:
    Spontaneously resolving disease (positive with immediate invasive test, but not with follow-up)
    Newly occurring or newly detectable disease (positive with follow-up but not with immediate invasive test)

Effect of Double Gold Standard Bias: Spontaneously resolving disease

Test result will always agree with gold standard

Both sensitivity and specificity increase

Example: Joey has an intussusception that will resolve spontaneously.
    If his ultrasound scan is positive, he will get a contrast enema that will show (and cure) the intussusception (true positive)
    If his ultrasound scan is negative, his intussusception will resolve and we will think he never had one (true negative)

Ultrasound scan can’t be wrong!
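The Joey example can be turned into a rough numeric sketch in Python. The counts are HYPOTHETICAL and the model is deliberately simple: it assumes the immediate invasive test detects all disease present at the time of the index test, and that follow-up misses every spontaneously resolving case.

```python
def sens_spec(tp, fn, fp, tn):
    """Sensitivity and specificity from true/false positive/negative counts."""
    return tp / (tp + fn), tn / (tn + fp)


# HYPOTHETICAL index test results (positive, negative) in three groups:
persistent_pos, persistent_neg = 30, 20   # disease that persists
resolving_pos, resolving_neg = 30, 20     # disease that resolves spontaneously
nodisease_pos, nodisease_neg = 40, 160    # never had the disease

# Single gold standard: everyone gets the immediate invasive test.
single = sens_spec(
    tp=persistent_pos + resolving_pos,
    fn=persistent_neg + resolving_neg,
    fp=nodisease_pos,
    tn=nodisease_neg,
)

# Double gold standard: test-negatives only get clinical follow-up, so
# resolving cases with a negative index test are recorded as disease-free.
double = sens_spec(
    tp=persistent_pos + resolving_pos,
    fn=persistent_neg,
    fp=nodisease_pos,
    tn=nodisease_neg + resolving_neg,
)

print(f"single gold standard: sensitivity {single[0]:.2f}, specificity {single[1]:.2f}")
print(f"double gold standard: sensitivity {double[0]:.2f}, specificity {double[1]:.2f}")
```

In this made-up example, moving the resolving test-negatives from the false-negative cell to the true-negative cell raises sensitivity from 0.60 to 0.75 and specificity from 0.80 to about 0.82.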


Does This Child Have Appendicitis?
JAMA. 2007;298:438-451.

RLQ Pain: Sensitivity = 96%, Specificity = 5% (1 – Specificity = 95%)

Likelihood Ratio = 1.0
RLQ pain was present in 96% of those with appendicitis and 95% of those without appendicitis.

Verification (Referral) Bias

Biases the accuracy of a finding when the presence of the finding makes the patient more likely to be studied.

Specificity biased down (5%). Sensitivity biased up (96%).


Bundy, D. G. et al. JAMA 2007;298:438-451.

Study Population: Children who underwent appendectomy

Does the LR of 1 mean that, in children, RLQ pain is not indicative of appendicitis?

No; it means only kids with RLQ pain get appendectomies.

Studies of Diagnostic Test Accuracy: Checklist

Was there an independent, blind comparison with a reference (“gold”) standard of diagnosis?

Was the diagnostic test evaluated in an appropriate spectrum of patients (like those in whom we would use it in practice)?

Was the reference standard applied regardless of the diagnostic test result?

Was the test (or cluster of tests) validated in a second, independent group of patients?

From Sackett et al., Evidence-based Medicine, 2nd ed. (NY: Churchill Livingstone), 2000. p 68

A clinical decision rule to identify children at low risk for appendicitis* (Problem 5.6 in EBD)

Study design: prospective cohort study

Subjects:
    4140 patients 3-18 years presenting to Boston Children’s Hospital ED with abdominal pain
    767 (19%) received surgical consultation for possible appendicitis
    113 excluded (chronic diseases, recent imaging)
    53 missed
    601 included in the study (425 in derivation set)

*Kharbanda et al. Pediatrics 2005; 116(3): 709-16

A clinical decision rule to identify children at low risk for appendicitis

Predictor variables:
    Standardized assessment by pediatric ED attending
    Focus on “Pain with percussion, hopping or cough” (complete data in N=381)

Outcome variable:
    Pathologic diagnosis of appendicitis (or not) for those who received surgery (37%)
    Follow-up telephone call to family or pediatrician 2-4 weeks after the ED visit for those who did not receive surgery (63%)


A clinical decision rule to identify children at low risk for appendicitis

Results: Pain with percussion, hopping or cough

78% sensitivity and 83% NPV seem low to me. Are they valid for me in deciding whom to image?



In what direction would these biases affect results?

Sample not representative (population referred to pedi surgery)?
Verification bias?
Double-gold standard bias?
Spectrum bias?

For children presenting with abdominal pain to SFGH 6-M:
    Sensitivity probably valid (not falsely low)
        But whether all of the kids in the study tried to hop is not clear
    Specificity probably low
    PPV is too high
    NPV is too low
    Does not address surgical consultation decision
