
Department of Primary Health Care, University of Oxford

Matthew J. Thompson, GP & Senior Clinical Scientist

Appraising diagnostic studies

Overview of talk

• Diagnostic reasoning
• Appraising diagnostic studies

What is diagnosis?

• Increase certainty about presence/absence of disease
• Disease severity
• Monitor clinical course
• Assess prognosis (risk/stage within diagnosis)
• Plan treatment, e.g. location
• Stalling for time!

Knottnerus, BMJ 2002

Diagnostic errors

Most diagnostic errors are due to cognitive errors:
• Conditions of uncertainty
• Thinking is pressured
• Shortcuts are used

(Croskerry. Ann Emerg Med 2003)

Human toll of diagnostic errors (Diagnostic errors - The next frontier for Patient Safety. Newman-Toker, JAMA 2009)

40,000-80,000 US hospital deaths from misdiagnosis per year

Adverse events, negligence cases, serious disability more likely to be related to misdiagnosis than drug errors

Diagnostic reasoning

Diagnostic strategies are particularly important where patients present with a wide variety of conditions and possible diagnoses.

Diagnostic reasoning

For example, what causes cough? Comprehensive history… examination… differential diagnosis… final diagnosis

Cardiac failure, left sided , Chronic obstructive pulmonary disease , Lung abscess Pulmonary alveolar proteinosis, Wegener's granulomatosis, Bronchiectasis Pneumonia, Atypical pneumonia, Pulmonary hypertension Measles, Oropharyngeal cancer, Goodpasture's syndrome Pulmonary oedema, Pulmonary embolism, Mycobacterium tuberculosis Foreign body in respiratory tract, Diffuse panbronchiolitis, Bronchogenic carcinoma Broncholithiasis, Pulmonary fibrosis, Pneumocystis carinii Captopril, Whooping cough, Fasciola hepatica Gastroesophageal reflux, Schistosoma haematobium, Visceral leishmaniasis Enalapril, Pharyngeal pouch, Suppurative otitis media Upper respiratory tract infection, Arnold's nerve cough syndrome, Allergic bronchopulmonary aspergillosis Chlorine gas, Amyloidosis, Cyclophosphamide Tropical pulmonary eosinophilia, Simple pulmonary eosinophilia, Sulphur dioxide Tracheolaryngobronchitis, Extrinsic allergic alveolitis, Laryngitis Fibrosing alveolitis, cryptogenic, Toluene di-isocyanate, Coal worker's pneumoconiosis Lisinopril, Functional disorders, Nitrogen dioxide, Fentanyl Asthma, Omapatrilat, Sinusitis Gabapentin, Cilazapril

…diagnostic reasoning

53 possible causes!

Diagnostic reasoning strategies

Aim: to identify the types and frequency of diagnostic strategies used in primary care. Six GPs collected and recorded the strategies used on 300 patients.

(Diagnostic strategies used in primary care. Heneghan, Glasziou, Thompson et al. BMJ, in press)

Diagnostic stages & strategies

Stage: Initiation of the diagnosis
• Spot diagnoses
• Self-labelling
• Presenting complaint
• Pattern recognition

Stage: Refinement of the diagnostic causes
• Restricted rule-outs
• Stepwise refinement
• Probabilistic reasoning
• Pattern recognition fit
• Clinical prediction rule

Stage: Defining the final diagnosis
• Known diagnosis
• Further tests ordered
• Test of treatment
• Test of time
• No label


Initiation: Spot diagnosis

Unconscious recognition of a non-verbal pattern, e.g.:
• visual (a skin condition)
• auditory (barking cough with croup)

Fairly instantaneous; no further history needed. Used in 20% of consultations.

*Brooks LR. Role of specific similarity in a medical diagnostic task. J Exp Psychol Gen 1991;120:278-87

Initiation: Self-labelling

“It’s tonsillitis, doc – I’ve had it before”

“I have a chest infection, doctor”

Used in 20% of consultations.

Accuracy of self-diagnosis in recurrent UTI

• 88 women with 172 self-diagnosed UTIs
• Uropathogen in 144 (84%)
• Sterile pyuria in 19 cases (11%)
• No pyuria or bacteriuria in 9 cases (5%)

(Gupta et al. Ann Intern Med 2001)


Refining: Restricted rule-out (or Murtagh’s) process

A learned diagnostic strategy for each presentation: think of the most common/likely condition AND what also needs to be ruled out.

Example: a patient with headache… learn to check for migraine and tension-type headache, but to rule out temporal arteritis, subarachnoid haemorrhage, etc.

Used in 30% of consultations.

(Murtagh. Australian Fam Physician 1990; Croskerry. Ann Emerg Med 2003)

Refining: Probabilistic reasoning

The use of a specific but probably imperfect symptom, sign or diagnostic test to rule in or out a diagnosis.

E.g. urine dipstick for UTI, arterial tenderness in temporal arteritis.

Used in 10% of cases.

Refining: Pattern recognition

Symptoms and signs volunteered or elicited from the patient are compared to previous patterns or cases and a disease is recognized when the actual pattern fits.

Relies on memory of known patterns, but no specific rule is employed.

Used in 40% of cases.

Refining: Clinical prediction rules

A formal version of pattern recognition, based on a well-defined and validated series of similar cases.

Examples: the Ottawa ankle rule, clinical scores for streptococcal sore throat.

Rarely used: <10% of cases. A prediction rule is explicit enough to write down as code, as in the sketch below.
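As a minimal, hypothetical sketch in Python: the criteria here are a simplified illustration in the style of the Ottawa ankle rule, not a clinical reference.

```python
# Minimal, hypothetical sketch of a clinical prediction rule as explicit logic.
# Simplified Ottawa-ankle-style criteria; illustrative only, not a clinical reference.

def ankle_xray_indicated(malleolar_zone_pain: bool,
                         malleolar_bone_tenderness: bool,
                         can_bear_weight_4_steps: bool) -> bool:
    """Return True if the (simplified) rule suggests an ankle X-ray."""
    if not malleolar_zone_pain:
        return False
    return malleolar_bone_tenderness or not can_bear_weight_4_steps

# Pain in the malleolar zone, no bony tenderness, unable to take four steps:
print(ankle_xray_indicated(True, False, False))  # True -> X-ray suggested
```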


Final diagnostic stage

• Known diagnosis
• Order further tests
• Test of treatment
• Test of time
• Can’t label

[Bar chart: “Defining the final diagnoses” – proportion of consultations (axis 0%–100%) ending in each strategy: known diagnosis, further tests ordered, test of treatment, test of time, no label]

Appraising diagnostic tests

1. Are the results valid?

2. What are the results?

3. Will they help me look after my patients?


Basic design of diagnostic accuracy study

Series of patients
Index test
Reference (“gold”) standard
Blinded cross-classification

Validity of diagnostic studies

1. Was an appropriate spectrum of patients included?

2. Were all patients subjected to the gold standard?

3. Was there an independent, blind or objective comparison with the gold standard?

Selected patients
Index test
Reference standard
Blinded cross-classification

1. Was an appropriate spectrum of patients included? Spectrum bias

You want to find out how good chest X-rays are for diagnosing pneumonia in the Emergency Department.

Best = all patients presenting with difficulty breathing get a chest X-ray.

Spectrum bias = only those patients in whom you really suspect pneumonia get a chest X-ray.

Series of patients
Index test
Reference standard
Blinded cross-classification

2. Were all patients subjected to the gold standard? Verification (work-up) bias

You want to find out how good exercise ECG (“treadmill test”) is for identifying patients with angina.

The gold standard is angiography.
Best = all patients get angiography.
Verification (work-up) bias = only patients who have a positive exercise ECG get angiography.

Series of patients
Index test
Reference standard
Unblinded cross-classification

3. Was there an independent, blind or objective comparison with the gold standard? Observer bias

You want to find out how good exercise ECG (“treadmill test”) is for identifying patients with angina.

All patients get the gold standard (angiography).

Observer bias = the cardiologist who does the angiography knows what the exercise ECG showed (not blinded).

Incorporation bias

Series of patients
Index test
Reference standard… includes parts of the index test
Unblinded cross-classification

Differential reference bias

Series of patients
Index test
Blinded cross-classification
Reference standard A / Reference standard B

Validity of diagnostic studies

1. Was an appropriate spectrum of patients included?

2. Were all patients subjected to the gold standard?

3. Was there an independent, blind or objective comparison with the gold standard?

Appraising diagnostic tests

1. Are the results valid?

2. What are the results?

3. Will they help me look after my patients?

Sensitivity, specificity, positive & negative predictive values, likelihood ratios

…aaarrrggh!!

2 by 2 table

             Disease
             +    -
Test   +     a    b
       -     c    d

2 by 2 table

             Disease
             +                      -
Test   +     a (true positives)     b (false positives)
       -     c (false negatives)    d (true negatives)
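For the calculations on the following slides it helps to fix the four cells in code. A minimal sketch (the class name and comments are mine, not from the talk):

```python
from dataclasses import dataclass

@dataclass
class TwoByTwo:
    a: int  # true positives:  test +, disease +
    b: int  # false positives: test +, disease -
    c: int  # false negatives: test -, disease +
    d: int  # true negatives:  test -, disease -

# The worked example used later in the talk:
table = TwoByTwo(a=99, b=10, c=1, d=90)
print(table)
```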

2 by 2 table: sensitivity

             Disease
             +    -
Test   +     a
       -     c

Sensitivity = a / (a + c)

Proportion of people with the disease who have a positive test result.

…a highly sensitive test will not miss many people.

2 by 2 table: sensitivity

             Disease
             +    -
Test   +    99
       -     1

Sensitivity = a / (a + c) = 99/100 = 99%
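In code the same calculation is one line; a minimal sketch checking the 99/100 example above:

```python
def sensitivity(a: int, c: int) -> float:
    """Proportion of people with the disease who test positive: a / (a + c)."""
    return a / (a + c)

print(sensitivity(a=99, c=1))  # 0.99, i.e. 99%
```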

2 by 2 table: specificity

             Disease
             +    -
Test   +          b
       -          d

Specificity = d / (b + d)

Proportion of people without the disease who have a negative test result.

…a highly specific test will not falsely identify people as having the disease.

Tip…..

Sensitivity is useful to me.

Specificity isn’t… I want to know about the false positives.

…so use 1 - specificity, which is the false positive rate.

2 by 2 table

             Disease
             +    -
Test   +     a    b
       -     c    d

Sensitivity = a / (a + c)
False positive rate = b / (b + d) (same as 1 - specificity)

2 by 2 table

             Disease
             +    -
Test   +    99   10
       -     1   90

Sensitivity = 99%
False positive rate = 10% (same as 1 - specificity)
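A sketch of the same arithmetic on the full table (a=99, b=10, c=1, d=90), showing that the false positive rate equals 1 - specificity:

```python
def specificity(b: int, d: int) -> float:
    """Proportion of people without the disease who test negative: d / (b + d)."""
    return d / (b + d)

def false_positive_rate(b: int, d: int) -> float:
    """Proportion of people without the disease who test positive: b / (b + d)."""
    return b / (b + d)

spec = specificity(b=10, d=90)         # 0.9 -> 90%
fpr = false_positive_rate(b=10, d=90)  # 0.1 -> 10%
print(spec, fpr)
print(abs((1 - spec) - fpr) < 1e-12)   # True: FPR is 1 - specificity
```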

Example

Your father went to his doctor and was told that his test for a disease was positive. He is really worried, and comes to ask you for help!

After doing some reading, you find that for men of his age:

The prevalence of the disease is 30%. The test has sensitivity of 50% and specificity of 90%.

“Son, tell me what’s the chance I have this disease?”

100% = always … 50% = maybe … 0% = never

A disease with a prevalence of 30%. The test has sensitivity of 50% and specificity of 90%.

Prevalence of 30%, sensitivity of 50%, specificity of 90%

Of 100 people, 30 have the disease and 70 do not.
• Disease +ve: sensitivity 50% → 15 test positive
• Disease -ve: false positive rate 10% → 7 test positive

22 people test positive… of whom 15 have the disease.
So the chance of disease is 15/22, about 70%.

A disease with a prevalence of 4% must be diagnosed.

It has a sensitivity of 50% and a specificity of 90%.

If the patient tests positive, what is the chance they have the disease?

Try it again

Prevalence of 4%, sensitivity of 50%, specificity of 90%

Of 100 people, 4 have the disease and 96 do not.
• Disease +ve: sensitivity 50% → 2 test positive
• Disease -ve: false positive rate 10% → 9.6 test positive

11.6 people test positive… of whom 2 have the disease.
So the chance of disease is 2/11.6, about 17%.
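Both worked examples follow the same natural-frequency arithmetic, so it can be written once. A minimal sketch (the function name is mine) reproducing the 30% and 4% prevalence answers:

```python
def prob_disease_given_positive(prevalence: float,
                                sens: float,
                                spec: float) -> float:
    """Chance of disease after a positive test, via natural frequencies."""
    n = 100.0
    diseased = n * prevalence
    healthy = n - diseased
    true_pos = diseased * sens        # diseased people who test positive
    false_pos = healthy * (1 - spec)  # healthy people who test positive
    return true_pos / (true_pos + false_pos)

print(prob_disease_given_positive(0.30, 0.50, 0.90))  # ~0.68 -> about 70%
print(prob_disease_given_positive(0.04, 0.50, 0.90))  # ~0.17 -> about 17%
```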

Doctors with an average of 14 years’ experience gave answers ranging from 1% to 99%, with half of them estimating the probability as 50%.

(Gigerenzer G. BMJ 2003;327:741-744)

Sensitivity and specificity don’t vary with prevalence Test performance can vary in different settings/

patient groups, etc. Occasionally attributed to differences in disease

prevalence, but more likely is due to differences in diseased and non-diseased spectrums

2 x 2 table: positive predictive value

             Disease
             +    -
Test   +     a    b
       -     c    d

PPV = a / (a + b)

Proportion of people with a positive test who have the disease.

2 x 2 table: negative predictive value

             Disease
             +    -
Test   +     a    b
       -     c    d

NPV = d / (c + d)

Proportion of people with a negative test who do not have the disease.

What’s wrong with PPV and NPV?

They depend on both the accuracy of the test and the prevalence of the disease.
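The prevalence dependence is easy to demonstrate: the same test (sensitivity 50%, specificity 90%) applied per 1,000 people at the two prevalences used earlier gives very different predictive values. A minimal sketch:

```python
def ppv(a: int, b: int) -> float:
    """Proportion of test-positive people who have the disease: a / (a + b)."""
    return a / (a + b)

def npv(c: int, d: int) -> float:
    """Proportion of test-negative people who are disease-free: d / (c + d)."""
    return d / (c + d)

# Prevalence 30% (per 1000): a=150, b=70, c=150, d=630
print(ppv(150, 70), npv(150, 630))  # ~0.68, ~0.81
# Prevalence 4% (per 1000):  a=20, b=96, c=20, d=864
print(ppv(20, 96), npv(20, 864))    # ~0.17, ~0.98
```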

Likelihood ratios

Can be used in situations with more than two test outcomes

Direct link from pre-test probabilities to post-test probabilities

2 x 2 table: positive likelihood ratio

             Disease
             +    -
Test   +     a    b
       -     c    d

LR+ = (a / (a + c)) / (b / (b + d))

or

LR+ = sensitivity / (1 - specificity)

How much more often a positive test result occurs in people with the disease compared to those without it.

2 x 2 table: negative likelihood ratio

             Disease
             +    -
Test   +     a    b
       -     c    d

LR- = (c / (a + c)) / (d / (b + d))

or

LR- = (1 - sensitivity) / specificity

How much less likely a negative test result is in people with the disease compared to those without it.

LR > 10 = strong positive test result
LR < 0.1 = strong negative test result
LR = 1 = no diagnostic value

McGee: Evidence based Physical Diagnosis (Saunders Elsevier)
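The two ratios in code, as a minimal sketch applied to the earlier test (sensitivity 99%, specificity 90%):

```python
def lr_positive(sens: float, spec: float) -> float:
    """How much more often a positive result occurs in diseased vs non-diseased."""
    return sens / (1 - spec)

def lr_negative(sens: float, spec: float) -> float:
    """How much less likely a negative result is in diseased vs non-diseased."""
    return (1 - sens) / spec

print(lr_positive(0.99, 0.90))  # ~9.9   -> just short of the "strong" cut-off of 10
print(lr_negative(0.99, 0.90))  # ~0.011 -> well below 0.1: strong negative result
```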

Bayesian reasoning: the Fagan nomogram

?Appendicitis: McBurney tenderness, LR+ = 3.4
Pre-test probability 5% → post-test probability ≈ 20%

[Fagan nomogram: pre-test probability and likelihood ratio scales joined by a straight line to read off the post-test probability]
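What the nomogram does graphically is multiply pre-test odds by the likelihood ratio. A minimal sketch; exact arithmetic gives about 15%, in the same range as the roughly 20% read off the nomogram on the slide:

```python
def post_test_probability(pre_test_prob: float, lr: float) -> float:
    """Convert to odds, multiply by the LR, convert back to a probability."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

# ?Appendicitis: pre-test probability 5%, McBurney tenderness LR+ = 3.4
print(post_test_probability(0.05, 3.4))  # ~0.15
```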

Do doctors use quantitative methods of test accuracy?

Survey of 300 US physicians: 8 used Bayesian methods, 3 used ROC curves, 2 used LRs.

Why?
…indices unavailable
…lack of training
…not relevant to setting/population
…other factors more important

(Reid et al. Academic calculations versus clinical judgements: practicing physicians’ use of quantitative measures of test accuracy. Am J Med 1998)

Appraising diagnostic tests

1. Are the results valid?

2. What are the results?

3. Will they help me look after my patients?

• Will the test apply in my setting?
• Reproducibility of the test and its interpretation in my setting
• Do results apply to the mix of patients I see?
• Will the results change my management?
• Impact on outcomes that are important to patients?
• Where does the test fit into the diagnostic strategy?
• Costs to patient/health service?

Reliability – how reproducible is the test?

Kappa = measure of inter-observer reliability (agreement beyond chance)

Test                        Kappa value
Tachypnoea                  0.25
Crackles on auscultation    0.41
Pleural rub                 0.52
CXR for cardiomegaly        0.48
MRI spine for disc          0.59

Value of kappa    Strength of agreement
<0.20             Poor
0.21-0.40         Fair
0.41-0.60         Moderate
0.61-0.80         Good
0.81-1.00         Very good
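Kappa compares observed agreement with the agreement expected by chance. A minimal sketch for two raters; the data are made up for illustration:

```python
def cohen_kappa(rater1: list, rater2: list) -> float:
    """(observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(rater1)
    observed = sum(x == y for x, y in zip(rater1, rater2)) / n
    chance = sum(
        (rater1.count(k) / n) * (rater2.count(k) / n)
        for k in set(rater1) | set(rater2)
    )
    return (observed - chance) / (1 - chance)

# Hypothetical: two clinicians rating 10 chest X-rays for cardiomegaly
r1 = ["yes", "yes", "no", "no", "yes", "no", "no",  "yes", "no", "no"]
r2 = ["yes", "no",  "no", "no", "yes", "no", "yes", "yes", "no", "no"]
print(round(cohen_kappa(r1, r2), 2))  # 0.58 -> "moderate" on the scale above
```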

Will the result change management?

Probability of disease runs from 0% to 100%, crossed by two thresholds:
• below the testing threshold → no action
• between the testing threshold and the action threshold → test
• above the action threshold → action (e.g. treat)
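The two thresholds partition the probability scale into three decisions. A minimal sketch; the threshold values are arbitrary placeholders, since in practice they depend on the disease and the costs of testing and treatment:

```python
def next_step(p_disease: float,
              testing_threshold: float = 0.05,      # placeholder value
              action_threshold: float = 0.80) -> str:  # placeholder value
    """Map a probability of disease onto the three zones of the slide."""
    if p_disease < testing_threshold:
        return "no action"
    if p_disease < action_threshold:
        return "test"
    return "action (e.g. treat)"

print(next_step(0.02), next_step(0.30), next_step(0.90))
# no action test action (e.g. treat)
```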

Any questions?