
Evaluation of Diagnostic Tests July 18, 2011

Introduction to Clinical Research: A Two-week Intensive Course

Milo A. Puhan, MD, PhD, Department of Epidemiology

Johns Hopkins Bloomberg School of Public Health

Today’s learning objectives

To describe the use of diagnostic tests in practice

To know about the phases of diagnostic test evaluation

To know an approach for designing randomized trials for diagnostic test evaluation


To know about biases that affect diagnostic test accuracy studies

Today’s key messages

Diagnostic tests are used for screening, as add-on tests, for triage, as replacement tests or for monitoring.

Diagnostic test evaluation includes (at least) test accuracy studies, health outcomes studies and cost effectiveness studies.

Biases related to the spectrum of patients and to the reference standard affect estimates of diagnostic test accuracy most.

To ensure informative RCTs, identify the critical comparisons between the old and new test-treatment strategies.

Recommended books

Evidence Base of Clinical Diagnosis: Theory and Methods of Diagnostic Research, by Andre Knottnerus and Frank Buntinx (Editors). John Wiley & Sons, November 2008. ISBN-13: 9781405157872

Evidence-Based Diagnosis, by Thomas B. Newman and Michael A. Kohn. Cambridge University Press, 2009. ISBN: 978-0-521-71402-0

http://www.amazon.com/gp/reader/0521714028/ref=sib_dp_pt

Part I: Stages of diagnostic test evaluation

Brain natriuretic peptide (BNP) for diagnosing heart failure

Toma et al. Cardiovascular Medicine 2007;10:27–33

Cut-offs between 15 and 100 pg/ml have been used

A diagnostic test should reduce uncertainty

Example 1: BNP for diagnosing heart failure in patients with dyspnea in the ER

Pre-test probability 20% → negative BNP → post-test probability 2%

Pre-test probability: probability of the target condition before testing

Post-test probability: probability of the target condition after testing
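The step from pre-test to post-test probability follows Bayes' theorem in odds form: post-test odds = pre-test odds × likelihood ratio. The minimal Python sketch below assumes a negative likelihood ratio of 0.08 for BNP. That value is not given on the slide and is chosen only because it reproduces the change from 20% to about 2%.

```python
def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability into a post-test probability.

    Bayes' theorem in odds form: post-test odds = pre-test odds * LR.
    """
    if not 0.0 < pre_test_prob < 1.0:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Hypothetical negative likelihood ratio (-LR) for BNP: an assumption, not slide data.
ASSUMED_NEGATIVE_LR = 0.08

p = post_test_probability(0.20, ASSUMED_NEGATIVE_LR)
print(f"Post-test probability after a negative BNP: {p:.1%}")
```

A likelihood ratio of 1 leaves the probability unchanged; the further the LR is from 1, the more the test reduces uncertainty.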

Diagnostic tests may have indirect and direct effects on health outcomes

Use of a test (BNP) in practice

Stage of clinical management | Setting | Available information | Purpose of test
Early diagnostic work-up | Primary care, ER | Patient history, physical exam | Screening, triage
Diagnosis not yet established | ER | + Chest X-ray, ECG | Add-on, replacement
Diagnosis established | ER, specialized care | + Echocardiography | Prognostic information
Under treatment | Primary or specialized care | Diagnostic work-up + treatments | Monitor disease process

BNP used for triage to avoid unnecessary work-up

Starting point: heart failure among differential diagnoses after patient history and physical exam

Positive test result: heart failure work-up ± treatment

Negative test result: consider other diagnoses

Both pathways lead to health outcomes

BNP used as add-on test

Starting point: heart failure suggested after patient history, physical exam, ECG and chest X-ray

Positive test result: heart failure work-up ± treatment

Negative test result: consider other diagnoses

Both pathways lead to health outcomes

BNP used as replacement test for echocardiography

Starting point: heart failure suggested after patient history, physical exam, ECG and chest X-ray

Positive test result: treatment

Negative test result: consider other diagnoses

Both pathways lead to health outcomes

BNP used as a prognostic marker

Predictors: age, gender, exacerbations of heart failure, dyspnea (+ BNP)

Outcome: risk of 5-year mortality, in categories 0-10%, >10-20%, >20-30%, >30%

Improved prediction by adding BNP?
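Whether adding BNP improves prediction is often examined by cross-tabulating each patient's risk category from the model without BNP against the model with BNP. A minimal Python sketch of such a reclassification table using the slide's four categories; the predicted risks are hypothetical numbers, not study data.

```python
from collections import Counter

# 5-year mortality risk categories, as listed on the slide.
CATEGORIES = ["0-10%", ">10-20%", ">20-30%", ">30%"]


def risk_category(risk: float) -> str:
    """Map a predicted 5-year mortality risk (0 to 1) to a slide category."""
    if risk <= 0.10:
        return CATEGORIES[0]
    if risk <= 0.20:
        return CATEGORIES[1]
    if risk <= 0.30:
        return CATEGORIES[2]
    return CATEGORIES[3]


def reclassification(risks_without, risks_with):
    """Count (category without BNP, category with BNP) pairs across patients."""
    return Counter(
        (risk_category(a), risk_category(b))
        for a, b in zip(risks_without, risks_with)
    )


# Hypothetical predicted risks for five patients: base model vs. base model + BNP.
base = [0.05, 0.12, 0.18, 0.25, 0.40]
with_bnp = [0.04, 0.22, 0.15, 0.35, 0.45]

table = reclassification(base, with_bnp)
moved = sum(n for (before, after), n in table.items() if before != after)
print(f"{moved} of {len(base)} patients change risk category")
```

Reclassification alone does not show better prediction: whether patients moved into the right category still has to be checked against observed deaths.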

BNP used for monitoring

Heart failure diagnosis and treatment established

BNP not in therapeutic range: adapt treatment

BNP in therapeutic range: treatment unchanged

Test phases for diagnostic tests

Phase I: Investigates whether test results differ between patients with and without disease

Phase II: Investigates whether patients with disease are more likely to have positive test results than patients without disease

Phase III: Investigates how well the test distinguishes between patients with and without disease among patients suspected of having the disease

Phase IV: Investigates how informative a test is, considering additional information available at the moment of testing

Phase V: Investigates whether using the test leads to better health outcomes

Phase VI: Investigates whether using the test leads to better health outcomes at acceptable costs

Phases of diagnostic test evaluation

Phase I: patients with heart failure vs. healthy subjects

Phase II: patients with heart failure vs. healthy subjects

Index test | Heart failure | Healthy subjects
Test positive | 90 | 20
Test negative | 10 | 80

Sens: 90%, Spec: 80%, PPV: 82%, NPV: 89%, +LR: 4.5, -LR: 0.13, DOR: 36
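All measures listed for the phase II table follow from its four cells. A minimal Python sketch; the `TwoByTwo` class is a hypothetical helper written for this illustration, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """2x2 table of index test result vs. reference standard."""
    tp: int  # test positive, disease present
    fp: int  # test positive, disease absent
    fn: int  # test negative, disease present
    tn: int  # test negative, disease absent

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    @property
    def lr_pos(self) -> float:
        # +LR = sensitivity / (1 - specificity)
        return self.sensitivity / (1 - self.specificity)

    @property
    def lr_neg(self) -> float:
        # -LR = (1 - sensitivity) / specificity
        return (1 - self.sensitivity) / self.specificity

    @property
    def dor(self) -> float:
        # DOR = (TP * TN) / (FP * FN), equal to +LR / -LR
        return (self.tp * self.tn) / (self.fp * self.fn)


phase2 = TwoByTwo(tp=90, fp=20, fn=10, tn=80)
print(f"Sens {phase2.sensitivity:.0%}, Spec {phase2.specificity:.0%}, "
      f"PPV {phase2.ppv:.0%}, NPV {phase2.npv:.0%}")
print(f"+LR {phase2.lr_pos:.1f}, -LR {phase2.lr_neg:.3f}, DOR {phase2.dor:.0f}")
```

The exact -LR is 0.125, which the slide rounds to 0.13.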

Phase III: patients suspected of having disease

Index test | Heart failure | No heart failure
Test positive | 85 | 250
Test negative | 15 | 650

Sens: 85%, Spec: 72%, PPV: 25%, NPV: 98%, +LR: 3.1, -LR: 0.21, DOR: 15

Phase IV: patients suspected of having disease; age, gender, smoking and coronary heart disease status known. Probability of heart failure?
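Predictive values depend on prevalence as well as on sensitivity and specificity, which partly explains why the PPV in the phase III table (25%) is much lower than in the phase II table, where half the sample had heart failure. A minimal Python sketch applying Bayes' theorem to the phase III sensitivity and specificity at 10% prevalence (the phase III sample) and at 50% (the case mix of the phase II sample).

```python
from __future__ import annotations


def predictive_values(sens: float, spec: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test applied at a given prevalence (Bayes' theorem)."""
    tp = sens * prevalence              # true positives per person tested
    fp = (1 - spec) * (1 - prevalence)  # false positives
    fn = (1 - sens) * prevalence        # false negatives
    tn = spec * (1 - prevalence)        # true negatives
    return tp / (tp + fp), tn / (tn + fn)


# Phase III table: 85 of 100 patients with heart failure test positive,
# 650 of 900 patients without heart failure test negative.
sens, spec = 85 / 100, 650 / 900
for prevalence in (0.10, 0.50):
    ppv, npv = predictive_values(sens, spec, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}, NPV {npv:.0%}")
```

Sensitivity and specificity stay fixed in this sketch; only prevalence changes, yet the PPV roughly triples between the two settings.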

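Phase V

Patients suspected of having heart failure

Randomized trial: patients are randomized (R) to testing (test + / test -, then treatment and follow-up according to the result) or to treatment and follow-up without the test; outcome: health outcomes

or

Before-after study: patients managed until 2000 (treatment and follow-up) are compared with patients managed from 2001 (test + / test -, then treatment and follow-up); outcome: health outcomes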

Phase VI

Randomized (R) comparison as in phase V (test + / test -); outcomes: health outcomes + costs

Outcomes for phase V and VI studies


Study designs for the evaluation of diagnostic tests

Phase I: cross-sectional case-control study (pro- or retrospective)

Phase II: cross-sectional case-control study (pro- or retrospective)

Phase III: cross-sectional study of patients suspected of having disease (pro- or retrospective)

Phase IV: cross-sectional study of patients suspected of having disease

Phase V: randomized trial or before-after study

Phase VI: cost effectiveness study

Systematic review

Test phases are not well established for diagnostic studies

Clinical trials:

Phase 1: safety (maximum tolerated dose), pharmacokinetics

Phase 2: preliminary efficacy, dose response

Phase 3: efficacy, clinically relevant effects

Phase 4: safety surveillance

Diagnostic studies: 19 models have been proposed…

Lijmer et al. Med Decis Making 2009;29:E13

Synthesis of models of diagnostic test evaluation phases

Lijmer et al. Med Decis Making 2009;29:E13

Technical requirements

Test accuracy

Effects on decisions

Effects on patient outcomes

Effects on health care system

Part II: Biases in diagnostic test accuracy studies

The diagnostic test accuracy study

Aim: to obtain (unbiased) estimates of diagnostic test accuracy, such as sensitivity, specificity and likelihood ratios.

Study design: cross-sectional study of patients suspected of having disease (phase III)

Index test | Reference test: patients with disease | Reference test: patients without disease
Test positive | 85 | 250
Test negative | 15 | 650

The “perfect” diagnostic test accuracy study

Population:
- Spectrum of patients adequate for setting and intention-to-diagnose
- Prospective data collection
- Consecutive recruitment
- Referral filter described (setting)
- Prior information available (comprehensive ascertainment of patient history, exam, tests)

Index test:
- Aim of test (triage, replacement, add-on)
- Well-defined protocol
- Well-defined threshold
- Performed for all patients
- Maximized reliability (intra- and inter-rater)
- Blinded towards reference test

Reference test:
- Good measure of target condition
- Well-defined protocol
- Performed for all patients
- Performed at same time as index test
- Maximized reliability (intra- and inter-rater)
- Blinded towards index test

Sources of bias in diagnostic test accuracy studies



Sources of variability in diagnostic test accuracy studies



Bias from definition or recruitment of population: spectrum bias

Setting: outpatient cardiology clinic

Patients: referred from primary care with suspected new heart failure

Figure: number of patients by clinical manifestations (none to very severe)

Figure: BNP levels by clinical manifestations (none to very severe)







Index test (BNP levels) | Heart failure | No heart failure
Test + | 9 | 8
Test - | 2 | 23

Sens: 82%, Spec: 74%







Index test (BNP levels) | Heart failure | No heart failure
Test + | 5 | 2
Test - | 1 | 10

Sens: 83%, Spec: 83%

Eissa et al. J Urol 2010;183:493-498

Healthy controls frequently included in diagnostic test accuracy (DTA) studies

Mao et al. Gut 2010;59:1687-1693

Often strong conclusions despite being phase II studies

Empirical Evidence of Design-Related Bias in Studies of Diagnostic Tests

Lijmer, J. G. et al. JAMA 1999;282:1061-1066.

Meta-analyses of at least 5 test accuracy studies

PubMed, EMBASE, DARE, Cochrane

18 meta-analyses found, including 193 studies

Quality assessment + diagnostic odds ratio (DOR)

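Index test | Disease | No disease
Test positive | 80 | 50
Test negative | 20 | 450

$$\mathrm{DOR} = \frac{\mathrm{TP} \times \mathrm{TN}}{\mathrm{FP} \times \mathrm{FN}} = \frac{80 \times 450}{50 \times 20} = 36$$

Association of quality of studies with diagnostic odds ratio

$$\text{relative DOR} = \frac{\mathrm{DOR}_{\text{low quality}}}{\mathrm{DOR}_{\text{high quality}}}$$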
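The DOR and the relative DOR used to relate study quality to accuracy can be computed in a few lines. A minimal Python sketch; the two pooled DORs passed to `relative_dor` are hypothetical values, not results reported by Lijmer et al.

```python
def diagnostic_odds_ratio(tp: int, fp: int, fn: int, tn: int) -> float:
    """DOR = (TP * TN) / (FP * FN): odds of a positive test in diseased vs. non-diseased."""
    return (tp * tn) / (fp * fn)


def relative_dor(dor_low_quality: float, dor_high_quality: float) -> float:
    """Relative DOR above 1: studies with the design flaw report higher accuracy."""
    return dor_low_quality / dor_high_quality


dor_example = diagnostic_odds_ratio(tp=80, fp=50, fn=20, tn=450)

# Hypothetical pooled DORs for studies with and without a design shortcoming.
rdor = relative_dor(dor_low_quality=36.0, dor_high_quality=12.0)
print(f"DOR: {dor_example:.0f}, relative DOR: {rdor:.1f}")
```

A relative DOR of 1 would mean the study characteristic makes no difference to the estimated accuracy.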


Relative Diagnostic Odds Ratios of the 9 Study Characteristics






Bias from definition or recruitment of population: prospective vs. retrospective

Retrospectively collected BNP levels raise questions:

- No verification of disease status
- Validated BNP measurements?
- Same reference standard for all?
- Same assessors for reference standard for all?

Bias from index test: test review bias (blinding)

Index test: heart sound level

Figure: results with blinding vs. no blinding



Biases from reference test

Challenges for verification of disease status by reference test:

- Partial verification bias
- Differential verification bias
- Disease progression bias
- Incorporation bias
- Diagnosis review bias (blinding)

How to define heart failure?

- Clinical criteria?
- Echocardiography?
- Response to treatment?

Disease progression bias

Delay of reference test after the index test:

- Disease status may have changed
- Particularly problematic for acute diseases (infections)
- Problematic for chronic diseases if prognostic criteria are used as reference standard

Partial verification bias

- Reference test missed (not performed) at random
- No verification of disease status in patients with lower disease probability


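Reference test, all patients verified:

Index test | Heart failure | No heart failure
Test + | 9 | 8
Test - | 2 | 23

Sens: 82%, Spec: 74%

Reference test, partial verification:

Index test | Heart failure | Unclear | No heart failure
Test + | 7 | 4 | 6
Test - | 2 | 4 | 19

Complete case analysis (unclear cases excluded): Sens: 78%, Spec: 76%

Complete case analysis: if missing verification is uncorrelated with the test result (+/-), it tends to underestimate accuracy. If prevalence is low, sensitivity is more affected; if prevalence is high, specificity is more affected.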


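Add prognostic criterion: was heart failure diagnosed at a later stage?

Reference test with unclear cases:

Index test | Heart failure | Unclear | No heart failure
Test + | 7 | 4 | 6
Test - | 2 | 4 | 19

Scenario 1: all unclear test-positive patients later diagnosed with heart failure, none of the unclear test-negative patients

Index test | Heart failure | No heart failure
Test + | 11 | 6
Test - | 2 | 23

Sens: 85%, Spec: 79%

Scenario 2: unclear cases split by later diagnosis (test +: 3 heart failure, 1 no heart failure; test -: 1 heart failure, 3 no heart failure)

Index test | Heart failure | No heart failure
Test + | 10 | 7
Test - | 3 | 22

Sens: 77%, Spec: 76%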
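The sensitivity and specificity figures for these verification scenarios can be recomputed from the cell counts. A minimal Python sketch; scenario 1 (all four unclear test-positive patients later diagnosed with heart failure, none of the unclear test-negative patients) is inferred from its 11/6/2/23 table, and scenario 2 uses the 3/1 and 1/3 split of the unclear cases. The constant and function names are hypothetical.

```python
from __future__ import annotations

# Partial-verification table: counts per index-test row.
POS_HF, POS_UNCLEAR, POS_NO_HF = 7, 4, 6    # test positive
NEG_HF, NEG_UNCLEAR, NEG_NO_HF = 2, 4, 19   # test negative


def sens_spec(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Sensitivity and specificity from the four cells of a 2x2 table."""
    return tp / (tp + fn), tn / (tn + fp)


def with_unclear_resolved(pos_unclear_hf: int, neg_unclear_hf: int) -> tuple[float, float]:
    """Reassign unclear cases (e.g. by a prognostic criterion), then recompute.

    pos_unclear_hf / neg_unclear_hf: number of unclear test-positive /
    test-negative patients judged to have heart failure; the rest are
    counted as no heart failure.
    """
    tp = POS_HF + pos_unclear_hf
    fp = POS_NO_HF + (POS_UNCLEAR - pos_unclear_hf)
    fn = NEG_HF + neg_unclear_hf
    tn = NEG_NO_HF + (NEG_UNCLEAR - neg_unclear_hf)
    return sens_spec(tp, fp, fn, tn)


results = {
    "complete case (unclear excluded)": sens_spec(POS_HF, POS_NO_HF, NEG_HF, NEG_NO_HF),
    "scenario 1 (unclear: 4/4 test+ and 0/4 test- with HF)": with_unclear_resolved(4, 0),
    "scenario 2 (unclear: 3/4 test+ and 1/4 test- with HF)": with_unclear_resolved(3, 1),
}
for label, (sens, spec) in results.items():
    print(f"{label}: sens {sens:.0%}, spec {spec:.0%}")
```

How the unclear cases are resolved moves sensitivity between 77% and 85%, which is why the choice of method for handling them should be specified in the protocol.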

Methods to correct for partial and differential verification bias

Rutjes et al. Health Tech Ass 2007; Vol. 175, 1605-1612

Incorporation bias: the index test is part of the reference test

Example: diagnosis of multiple sclerosis

Index test: MRI

Reference test: clinical follow-up, cerebrospinal fluid + MRI

Solutions to minimize bias from reference standard

Sources of bias | Solutions
Applies to all | Learn from previous studies; do a systematic review; write a protocol; take time and find consensus
Case definition; intra- and inter-rater reliability | Double/triple reading; adjudication committee (expert panels)
Partial verification bias; differential verification bias | Ensure quality control for data collection; use “realistic” reference tests; anticipate missing data and consider prognostic criteria; statistical methods
Incorporation bias and diagnosis review bias | Appropriate case definition; ensure blinding
Disease progression bias | No relevant time delay between index and reference test

Part III: Randomized trials for diagnostic test evaluation


Phase V: patients randomized (R) to test-based strategies (test + / test -), compared on health outcomes



The critical questions when assessing patient outcomes


What is the intended incremental value of the test on outcomes (short- and long-term patient outcomes and costs)?

What type of evidence is needed to assess this incremental value?

Recommended approach

Define the purpose of the test

Display the existing test-treatment strategy

Display the new test-treatment strategy

Identify the critical comparison to assess the incremental value

Assess whether existing evidence suffices or if RCTs are required

Lord et al. Med Decis Making 2009;29:E1

Test-treatment strategy for replacement tests

Target population, prior tests

Existing test → test result → management: test-positive pathway (TP, FP); test-negative pathway (TN, FN) → patient outcomes

New test → test result → management: test-positive pathway (TP, FP); test-negative pathway (TN, FN) → patient outcomes

Questions: test safety & other attributes? Sensitivity and specificity? Change in management? Treatment effects?

Example: liquid-based cytology (LBC) to replace Pap smear for cervical cancer screening in order to reduce repeated testing (poor Pap smear quality)

Target population: women

Pap smear vs. LBC: test procedure identical for women

Reference standard for both: biopsy

Systematic reviews show: sensitivity and specificity very similar

Management: test-positive and test-negative pathways; test again (both strategies)

No change in management; treatment effects not different

Evidence needed: RCT to compare short-term effects from testing; no long-term RCT needed

Test-treatment strategy for add-on tests

Existing strategy: existing test → test result → management: test-positive pathway A; test-negative pathway B → patient outcomes

New strategy: existing test → test positive: pathway A; test negative: add-on test → test positive: pathway A*; test negative: pathway B* → patient outcomes

Questions: test safety & other attributes? Sensitivity & specificity? Treated populations? Treatment effects?

Example: MRI as add-on test to mammography and ultrasound in breast cancer screening to detect extra cases and inform decision on type of surgery

Existing strategy: mammography + ultrasound → test positive: breast-conserving surgery (BCS) or mastectomy; test negative: continue screening

New strategy: mammography + ultrasound → test positive: BCS or mastectomy; test negative: MRI → test positive: BCS or mastectomy*; test negative: continue screening

MRI is more sensitive: it detects more cases and detects multifocal disease

Treated populations different; treatment effects unclear

Evidence needed: RCT to compare short-term effects from testing; RCT to compare long-term effects of different treatments (different surgery and populations)

Test-treatment strategy for triage tests

Existing strategy: existing test → test result → management: test-positive pathway A; test-negative pathway B → patient outcomes

New strategy: triage test → test negative: pathway B* (TN, FN); test positive: add existing test → test positive: pathway A; test negative: pathway B → patient outcomes

Questions: test safety & other attributes? Sensitivity & specificity? Treated populations? Treatment effects?


Example: triage D-dimer test to reduce the number of ultrasounds in patients at low risk for DVT

Existing strategy: ultrasound → test positive: pathway A; test negative: pathway B

New strategy: D-dimer → test negative: pathway B*; test positive: add ultrasound → test positive: pathway A; test negative: pathway B

D-dimer sensitivity >98%; patient convenience; same treated populations; same treatment effects

Evidence needed: RCT to compare short-term effects from testing; RCT to compare long-term effects may not be necessary
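The trade-off in a triage strategy can be checked with simple arithmetic: every triage-negative patient skips the existing test, and the diseased among them are the missed cases. A minimal Python sketch; the cohort size, 10% prevalence and 40% specificity are hypothetical assumptions, and only the 98% sensitivity echoes the slide (which states >98%).

```python
from __future__ import annotations


def triage_counts(n: int, prevalence: float, sens: float, spec: float) -> dict[str, float]:
    """Expected counts when a triage test decides who goes on to the existing test.

    Triage-negative patients skip the existing test (here: ultrasound);
    triage-positive patients go on to it, as in the new strategy.
    """
    diseased = n * prevalence
    healthy = n - diseased
    fn = diseased * (1 - sens)   # DVT cases with a negative triage test (missed)
    tn = healthy * spec          # patients correctly spared an ultrasound
    avoided = fn + tn            # all triage-negative patients skip ultrasound
    return {
        "ultrasounds_avoided": avoided,
        "ultrasounds_done": n - avoided,
        "missed_cases": fn,
    }


# Hypothetical inputs: prevalence and specificity are assumptions, not slide data.
result = triage_counts(n=1000, prevalence=0.10, sens=0.98, spec=0.40)
for name, value in result.items():
    print(f"{name}: {value:.0f}")
```

Because sensitivity is high, the number of missed cases stays small even when specificity is modest; the specificity mainly determines how many ultrasounds are avoided.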

Use of randomized trials for test evaluation


Define the purpose of the test

Display the existing test-treatment strategy

Display the new test-treatment strategy

Identify the critical comparison to assess the incremental value

Assess whether existing evidence suffices or if RCTs are required
