
Post on 28-Dec-2015


R2 ImageChecker CT CAD PMA: Clinical Results

Nicholas Petrick, Ph.D.

Office of Science and Technology

Center for Devices and Radiological Health

U.S. Food and Drug Administration

2

Outline

• Applicability of Az in analysis
  • Az is the same as area under the curve (AUC)
• Pool of CT cases for clinical study
• Defining actionable nodules by panel of experts
• Clinical studies
  • Primary analysis: analysis using fixed expert panel
  • Secondary analysis: analysis using random panels of experts
• Measurement of CAD standalone performance
  • Algorithm’s performance with no reader involvement

3

Applicability of Az in analysis

• Average reader ROC curves (pre/post CAD)

[Figure: average-reader ROC curves, one labeled Pre-CAD ROC and one labeled Post-CAD ROC. Axes labeled FPP (horizontal) and TPP (vertical), each running from 0.0 to 1.0.]

4

Applicability of Az in analysis

• Pre- and post-CAD curves do not cross
  • No substantial pre/post-CAD crossing in either averaged or individual ROC curves
• Az is an appropriate performance measure
  • Az used as figure of merit in all analyses
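As a rough illustration of what an area-under-the-curve figure of merit measures, the sketch below computes a nonparametric (Mann-Whitney) AUC from reader confidence ratings. This is a swapped-in estimator for illustration only: Az is often obtained from a fitted ROC curve, and the slides do not say which estimator the sponsor used. The ratings and the function name `empirical_auc` are hypothetical.

```python
def empirical_auc(negative_scores, positive_scores):
    """Nonparametric (Mann-Whitney) estimate of the area under the ROC curve.

    Counts the fraction of (positive, negative) pairs in which the positive
    item received the higher score; ties count as half a win.
    """
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical 5-point confidence ratings, NOT data from the study.
negatives = [1, 2, 2, 3, 1]  # quadrants without an actionable nodule
positives = [2, 3, 4, 4, 5]  # quadrants with an actionable nodule

auc = empirical_auc(negatives, positives)
print(f"empirical AUC = {auc:.3f}")
```

A single area is a fair summary only when the two curves being compared do not cross, which is the point the slide makes before adopting Az.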

5

Pool of CT Cases

• Nodule cases
  • Documented cancers
    • Primary neoplasm or extrathoracic neoplasm with presumptive spread to lungs
  • Cases were allowed to contain non-nodule pathologic processes (e.g., pneumonia, emphysema)
• Non-nodule cases
  • Normal cases
    • No nodule deemed present by site P.I.
    • Primarily relied upon original radiology report
  • History of cancer, radiation therapy, or even previous thoracotomy allowed

6

Defining Actionable Nodules by Panel of Experts

• ‘Actionable’ nodules are objects of interest
• Panel of expert radiologists identifies actionable nodules
• Nodules defined using a 2-pass process

7

Defining Actionable Nodules by Panel of Experts

• 1st reading of CT cases
  • Cases read independently and blinded by 3 expert radiologists
  • Radiologists were given the subject’s age, gender, and indication for exam
  • Marked all findings deemed lung nodules
  • Radiologist provided rating
    • Intervention: actionable, further workup advised
    • Surveillance: actionable, monitor with follow-up studies
    • Probably benign, calcified: no action required
    • Probably benign, non-calcified: no action required

8

Defining Actionable Nodules by Panel of Experts

• 2nd pass
  • Findings that lacked 100% consensus after 1st pass were reviewed unblinded by all 3 radiologists
  • Locations that 2/3 or 1/3 of the radiologists called a nodule were re-evaluated
  • Radiologists rated (or re-rated) the actionability of the nodule candidates
• Thresholds applied to all findings
  • > 4 mm diameter
  • > -100 HU maximum density
• Each lung quadrant categorized by the highest actionable finding within the quadrant
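The threshold and quadrant rules can be sketched as follows. The two thresholds (> 4 mm, > -100 HU) come from the slide; the rating order (intervention above surveillance above probably benign), the dictionary field names, and the "none" label for an empty quadrant are assumptions made for illustration.

```python
# Assumed ordering of ratings, lowest to highest actionability.
RANK = {"none": 0, "probably_benign": 1, "surveillance": 2, "intervention": 3}


def passes_thresholds(finding):
    """Slide thresholds: diameter > 4 mm and maximum density > -100 HU."""
    return finding["diameter_mm"] > 4 and finding["max_hu"] > -100


def categorize_quadrant(findings):
    """Label a quadrant with the highest-ranked finding that passes the thresholds."""
    eligible = [f for f in findings if passes_thresholds(f)]
    if not eligible:
        return "none"
    return max(eligible, key=lambda f: RANK[f["rating"]])["rating"]


# Hypothetical findings in one quadrant, NOT study data.
quadrant = [
    {"diameter_mm": 6.5, "max_hu": 20, "rating": "surveillance"},
    {"diameter_mm": 9.0, "max_hu": 35, "rating": "intervention"},
    {"diameter_mm": 3.0, "max_hu": 50, "rating": "intervention"},  # too small
]
print(categorize_quadrant(quadrant))
```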

9

Defining Actionable Nodules by Panel of Experts

Disposition    Unanimous Actionable (3/3)    Majority Actionable (2/3)    Minority Actionable (1/3)
Sample Size    142                           168                          149

• 3 experts per panel

10

Clinical Studies

• ROC observer study
  • Az is test statistic
• Analysis of a 90-case dataset (360 quadrants)
• Confidence intervals and significance testing
  • ANOVA-after-jackknife
  • Bootstrap analysis

11

Clinical Studies Analysis Flowchart

[Flowchart boxes: Pool of Cases; Pool of Experts; Pool of Readers; Resampling Scheme (Jackknife or Bootstrap); Definition of Nodules; MRMC ROC Observer Study; Az Estimates]

12

ANOVA-after-Jackknife Analysis

• Parametric analysis
• Leave-one-case-out (all 4 quadrants, quadrant-based analysis)
• Analysis assumes modality as a fixed effect and readers, cases, and all interactions as random effects
• Example
  • Set: [1 2 3], Partitions: [1 2], [1 3], [2 3]
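The leave-one-out example on this slide can be reproduced in a few lines. The pseudovalue formula below is the standard jackknife one; feeding pseudovalues of Az into an ANOVA is what "ANOVA-after-jackknife" refers to, but the ANOVA step itself is not shown here. The function names are hypothetical, and the statistic used (a plain mean) stands in for Az.

```python
def leave_one_out(items):
    """All subsets obtained by deleting exactly one item.

    Deleting the last item first reproduces the order shown on the slide.
    """
    n = len(items)
    return [items[:i] + items[i + 1:] for i in reversed(range(n))]


def pseudovalues(items, statistic):
    """Jackknife pseudovalues: n * theta_all - (n - 1) * theta_without_i."""
    n = len(items)
    full = statistic(items)
    return [
        n * full - (n - 1) * statistic(items[:i] + items[i + 1:])
        for i in range(n)
    ]


def mean(values):
    return sum(values) / len(values)


partitions = leave_one_out([1, 2, 3])
print(partitions)
print(pseudovalues([1, 2, 3], mean))
```

In the study the unit left out is a whole case, so all 4 of its quadrants are removed together.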

13

Bootstrap Analysis

• Nonparametric analysis
• Randomly generated datasets, drawn from the original data with replacement
• Example
  • Set: [1 2 3], Partitions: [3 2 3], [3 1 2], [1 1 2], …
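A minimal sketch of resampling with replacement, using the same toy set. The percentile interval at the end is one common way to turn bootstrap replicates into confidence limits; the slides do not state which interval method the sponsor used, so treat that part as an assumption. Function names, the seed, and the number of resamples are arbitrary.

```python
import random


def bootstrap_samples(items, n_resamples, seed=0):
    """Draw resamples of the same size as `items`, sampling with replacement."""
    rng = random.Random(seed)
    n = len(items)
    return [[rng.choice(items) for _ in range(n)] for _ in range(n_resamples)]


def percentile_interval(values, alpha=0.05):
    """Simple percentile interval from a list of bootstrap replicates."""
    ordered = sorted(values)
    last = len(ordered) - 1
    lower = ordered[int((alpha / 2) * last)]
    upper = ordered[int((1 - alpha / 2) * last)]
    return lower, upper


samples = bootstrap_samples([1, 2, 3], n_resamples=1000)
replicate_means = [sum(s) / len(s) for s in samples]
lower, upper = percentile_interval(replicate_means)
print(samples[:3], lower, upper)
```

Unlike the jackknife partitions, a bootstrap resample can repeat an item and omit another, as in [3 2 3] on the slide.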

14

Clinical Studies Primary Analysis

[Flowchart boxes: Pool of Cases; Pool of Experts; Pool of Readers; Resampling Scheme (Jackknife or Bootstrap); Definition of Nodules; MRMC ROC Observer Study; Az Estimates]

• Fixed 3-member nodule definition panels (unanimous consensus)
• ANOVA-after-jackknife and bootstrap analysis

15

Clinical Studies Primary Analysis

• Fixed 3-member nodule definition panels

Variance Analysis    Pre-CAD Az    Post-CAD Az    ΔAz      p-value    Lower C.L.    Upper C.L.
Jackknife            0.881         0.905          0.024    0.003      0.008         0.040
Bootstrap            0.879         0.903          0.025    <0.001     0.009         0.045

16

Clinical Studies Primary Analysis

• Statistically significant improvement in Az pre- to post-CAD
  • ΔAz ~ 0.025
• ANOVA-after-jackknife and bootstrap analyses are consistent
• Analysis limited because it did not take into account any variation in the expert panel
  • Variability of panel would add uncertainty to performance estimates
  • How would performance change with a different panel makeup?
    • Different number of panel members
    • Different set of experts

17

Clinical Studies Secondary Analysis

[Flowchart boxes: Pool of Cases; Pool of Experts; Pool of Readers; Resampling Scheme (Bootstrap); Definition of Nodules; MRMC ROC Observer Study; Az Estimates]

• Random 3, 2, 1-member nodule definition panels (unanimous consensus)
• Only bootstrap analysis possible
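One way to picture a random nodule-definition panel is sketched below: draw panel members from the pool of experts, then call a finding actionable only if every drawn member rated it actionable (unanimous consensus). Sampling experts with replacement is an assumption suggested by the word "bootstrap"; the slides do not spell out the sponsor's exact procedure. Expert labels, votes, and function names are hypothetical.

```python
import random


def draw_panel(expert_ids, panel_size, rng):
    """Draw a panel by sampling experts with replacement (assumed scheme)."""
    return [rng.choice(expert_ids) for _ in range(panel_size)]


def is_actionable(votes, panel):
    """Unanimous consensus: every drawn panel member voted actionable."""
    return all(votes[expert] for expert in panel)


# Hypothetical votes on a single finding, NOT study data.
votes = {"A": True, "B": True, "C": False}

rng = random.Random(0)
for size in (3, 2, 1):
    panel = draw_panel(list(votes), size, rng)
    print(size, panel, is_actionable(votes, panel))
```

Because the truth label now depends on which experts were drawn, the reference standard itself varies from one bootstrap replicate to the next, which is the extra source of uncertainty the fixed-panel analysis leaves out.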

18

Clinical Studies Secondary Analysis

• Bootstrap analysis
• Random 3-member nodule definition panels

Random Panel Size    Pre-CAD Az    Post-CAD Az    ΔAz      p-value    Lower C.L.    Upper C.L.
3-members            0.845         0.868          0.022    <0.001     0.008         0.040
2-members            0.832         0.854          0.022    0.002      0.008         0.039
1-member             0.817         0.838          0.021    <0.001     0.008         0.037

19

Clinical Studies Secondary Analysis

• Sponsor's analysis takes into account random nature of expert panel for defining ‘actionable’ nodules
  • Different number of panel members: 3, 2, 1-member panels
  • Different panel makeup: bootstrap selection of panel
• All variations of panel makeup confirm a statistically significant improvement in Az from pre- to post-CAD
  • ΔAz ~ 0.02
• Likely to be a more appropriate analysis for assessment of devices when only panel truth is available

20

CAD Standalone Performance

• Performance of the CAD algorithm alone
  • Algorithm sensitivity and specificity (no reader involvement)
• Standalone CAD performance is important
  • Radiologists need this information to appropriately weight their confidence in the CAD markings
  • Benchmark for future revisions to the algorithm
• What is an appropriate performance measure for this device?

21

CAD Standalone Performance

• Many of 142 findings (fixed 3-member panel) did not meet criteria as a solid, discrete, spherical density
• Second panel re-evaluated nodules for appearance
  • 5 independent radiologists
  • 2 categories
    • Classic nodule: discrete, solid, spherical or ovoid
    • Non-classic:
      • Not discrete
      • Hyperdense
      • Irregularly shaped
      • Normal structure
      • Not a nodule

22

CAD Standalone Performance

No. Panelists defining as classic    No. of Findings    CAD TPF (%)    TP Median Diameter (mm)
<3/5                                 65                 32.3           7.6-9.0
3/5                                  13                 69.2           7.4
4/5                                  11                 81.8           11.2
5/5                                  53                 83.0           6.9
≥3/5                                 77                 80.5           6.9-11.2
All                                  142                58.5           7.9

CAD False Marker Rate: ~3 per case (given once for the table)
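The pooled rows of this table can be checked against the per-group rows. The sketch below recovers whole-number detection counts by rounding the reported percentages, which is an assumption (the slides give only percentages), then recomputes the pooled TPF for the ≥3/5 group and for all 142 findings.

```python
# (number of findings, reported CAD TPF in percent), copied from the table.
groups = {
    "<3/5": (65, 32.3),
    "3/5": (13, 69.2),
    "4/5": (11, 81.8),
    "5/5": (53, 83.0),
}


def detected(n_findings, tpf_percent):
    """Whole-number TP count implied by a rounded percentage (assumed)."""
    return round(n_findings * tpf_percent / 100)


def pooled_tpf(keys):
    """Pooled TPF in percent over the named groups."""
    total = sum(groups[k][0] for k in keys)
    hits = sum(detected(*groups[k]) for k in keys)
    return 100 * hits / total


classic = pooled_tpf(["3/5", "4/5", "5/5"])  # findings called classic by >= 3 of 5
overall = pooled_tpf(list(groups))           # all 142 findings
print(f"{classic:.1f} {overall:.1f}")
```

Under that rounding assumption the recomputed values agree with the 80.5% and 58.5% reported in the table.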

23

CAD Standalone Performance

• Large variation in performance of the CAD based on physicians' assessment of nodule appearance as “classic”

24

Summary

• Az appropriate test statistic for clinical analysis

• No substantial crossing of pre/post-CAD ROC curves

• Primary Analysis
  • Nodule definition panel

• Fixed 3-member expert panel

• Shows statistically significant Az improvement in detection with CAD

• ANOVA-after-jackknife and bootstrap are comparable

25

Summary

• Secondary Analysis
  • Nodule definition panel
    • Varied number of panel members
    • Varied the panel makeup (bootstrap selection of panel members)
  • Confirmed statistically significant Az improvement in detection with CAD
• Standalone performance
  • Large variation in CAD performance based on reassessment of nodule appearance
  • Necessary for appropriate utilization of the device by clinicians in the field and assessment of future algorithm revisions