Copyright © 2009 Siemens Medical Solutions USA, Inc. All rights reserved.

Mining Medical Images

R. Bharat Rao, Glenn Fung, Balaji Krishnapuram, Jinbo Bi, Murat Dundar, Vikas Raykar, Shipeng Yu, Sriram Krishnan, Xiang Zhou, Arun Krishnan, Marcos Salganicoff, Luca Bogoni, Matthias Wolf, Anna Jerebko, Jonathan Stoeckel

Outline of the talk

Mining medical images

Computer aided diagnosis (CAD)

Key data mining challenges

Clinical impact

Lessons learnt

Several thousand units of the products described in this paper have been commercially deployed in hospitals around the world since 2004

Medical Imaging

1895: X-ray, used for broken bones and locating foreign objects

1970: Computed tomography (CT), 3-D imaging

As resolution has increased, in-vivo imaging is widely used to locate medical abnormalities for diagnosis and surgery planning

Figures: Digital Mammogram; CT Scan

Mining medical imaging data

Increased resolution has resulted in data overload

Increased total study time

Increase in data does not always translate to improved diagnosis

Automatically extract the actionable information from the imaging data in order to ensure improvement in patient care and a simultaneous reduction in total study time

Raw imaging data -> Knowledge-based data-mining algorithms -> Clinically relevant information

Computer-aided diagnosis/detection (CAD)

Computer-aided diagnosis/detection (CAD)

Used as a second reader

Improves the detection performance of a radiologist

Reduces mistakes related to misinterpretation

The principal value of CAD is determined by carefully measuring the incremental value of CAD in normal clinical practice

CAD technologies support the physician by drawing attention to structures in the image that may require further review.

Lung CAD

Identify suspicious regions called nodules (which are known to be precursors of cancer) in CT scans of the lung.

Colon PEV (Polyp Enhanced Viewer)

Identify suspicious regions called polyps in CT scans of the colon.

Mammo CAD

Identify abnormal masses/calcifications in digital mammograms.

PECAD and MammoCAD are only sold outside the US.

PE CAD

Pulmonary Embolism (PE) is a sudden blockage in a pulmonary artery caused by an embolus that is formed in one part of the body and travels to the lungs in the bloodstream through the heart.

PECAD and MammoCAD are only sold outside the US.

CAD

Goal is to detect potentially malignant nodules (lung)

polyps (colon)

lesions (breast)

Pulmonary emboli (lung)

in medical images like CT scans, X-ray, MRI, etc.

Early detection provides the best prognosis

Typical CAD architecture

Image [ X-ray | CT scan | MRI ] -> Candidate Generation -> Feature Computation -> Classification -> Location of lesions

Candidate Generation outputs potential candidates: > 90% sensitivity, 60-300 FP/image

Classification (focus of the current talk) outputs lesions: > 80% sensitivity, 2-5 FP/image
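The three stages above can be sketched as a small pipeline. This is a toy illustration only: the function names, the thresholding candidate generator, the two patch features, and all cutoff values are assumptions made for the example and do not describe the deployed products.

```python
import numpy as np

def generate_candidates(image, threshold):
    """Stage 1 (hypothetical): flag every pixel brighter than `threshold`.
    Deliberately permissive: high sensitivity, many false positives."""
    rows, cols = np.nonzero(image > threshold)
    return list(zip(rows.tolist(), cols.tolist()))

def compute_features(image, candidate, half_width=1):
    """Stage 2 (hypothetical): mean and standard deviation of a small patch."""
    r, c = candidate
    patch = image[max(r - half_width, 0): r + half_width + 1,
                  max(c - half_width, 0): c + half_width + 1]
    return np.array([patch.mean(), patch.std()])

def classify(features, min_mean=0.5):
    """Stage 3 (hypothetical): keep candidates whose patch is bright on average."""
    return features[0] >= min_mean

def run_cad(image, threshold=0.8):
    """Run the three stages and return (all candidates, surviving detections)."""
    candidates = generate_candidates(image, threshold)
    detections = [c for c in candidates if classify(compute_features(image, c))]
    return candidates, detections

# Toy image: one 3x3 bright blob (the "lesion") plus one isolated bright pixel (noise).
image = np.zeros((10, 10))
image[2:5, 2:5] = 1.0
image[8, 8] = 1.0

candidates, detections = run_cad(image)
print(len(candidates), "candidates,", len(detections), "detections")
```

In this toy case the classifier discards the isolated pixel and some border pixels of the blob, while the blob itself is still detected through its remaining candidates.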

Key Data Mining Challenges

High accuracy: sensitivity > 80% at 2-5 FP/image

1. The breakdown of assumptions

2. Highly unbalanced data

3. Feature computation cost

4. Incorporating domain knowledge

5. No objective ground truth

The breakdown of assumptions

Example task: classify a region on a mammogram as lesion or not a lesion

Traditional classification algorithms (neural networks, support vector machines, logistic regression, ...) make two key assumptions:

(1) Training samples are independent

(2) The goal is to maximize classification accuracy over all candidates

Both assumptions are often violated in CAD

Violation 1: Training examples are correlated

Candidate generation produces a lot of spatially adjacent candidates.

Hence there is a high level of correlation among candidates.

Correlations also exist across different images, detector types, and hospitals.

Violation 2: Candidate level accuracy is not important

Several candidates from candidate generation (CG) point to the same lesion in the breast.

Lesion is detected if at least one of them is detected.

It is fine if we miss adjacent overlapping candidates.

Hence CAD system accuracy is measured in terms of per lesion/image/patient sensitivity.

So why not optimize the performance metric we use to evaluate our system?

Most algorithms maximize classification accuracy, i.e. they try to classify every candidate correctly.
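A minimal sketch of the gap between candidate-level and lesion-level sensitivity. The record layout (`lesion_id`, `detected`) and the toy data are assumptions for illustration; a lesion counts as found if at least one of its candidates is flagged.

```python
from collections import defaultdict

def candidate_sensitivity(records):
    """Fraction of positive candidates that the classifier flagged."""
    positives = [r for r in records if r["lesion_id"] is not None]
    return sum(r["detected"] for r in positives) / len(positives)

def lesion_sensitivity(records):
    """Fraction of lesions with at least one flagged candidate."""
    hit = defaultdict(bool)
    for r in records:
        if r["lesion_id"] is not None:
            hit[r["lesion_id"]] |= r["detected"]
    return sum(hit.values()) / len(hit)

# Hypothetical classifier output: lesion A has three overlapping candidates,
# lesion B has one, and the last candidate is a true negative.
records = [
    {"lesion_id": "A", "detected": True},
    {"lesion_id": "A", "detected": False},
    {"lesion_id": "A", "detected": False},
    {"lesion_id": "B", "detected": True},
    {"lesion_id": None, "detected": False},
]

cand = candidate_sensitivity(records)
les = lesion_sensitivity(records)
print(f"candidate-level: {cand:.2f}, lesion-level: {les:.2f}")
```

Here half of the positive candidates are missed, yet every lesion is still detected, which is why candidate-level accuracy is the wrong target.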

Solution 1: Multiple Instance Learning (Fung et al. 2006; Bi et al. 2007; Raykar et al. 2008; Krishnapuram et al. 2008)

How do we acquire labels?

Candidates that overlap with a radiologist's mark are positive. The rest are negative.

Single Instance Learning (figure labels: 1 1 0 0 0 0): classify every candidate correctly

Multiple Instance Learning (figure labels: positive bag 1, remaining candidates 0 0 0 0): classify at least one candidate in the positive bag correctly

We have modified SVM and logistic regression for multiple instance learning

Simple Illustration

Single instance learning:

Reject as many negative candidates as possible.

Detect as many positives as possible.

Figure panels: Single Instance Learning; Multiple Instance Learning

Multiple instance learning:

Reject as many negative candidates as possible.

Detect at least one candidate in a positive bag.

Accounts for correlation during training
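One common way to express "detect at least one candidate in a positive bag" is a noisy-OR over per-candidate probabilities from a logistic model. The sketch below assumes that formulation and treats each negative candidate as its own bag; the weights, bags, and function names are illustrative, and the cited papers should be consulted for the exact models used.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bag_probability(w, bag):
    """Noisy-OR: the bag is positive if at least one instance is positive."""
    p_instances = sigmoid(bag @ w)
    return 1.0 - np.prod(1.0 - p_instances)

def mil_log_loss(w, bags, bag_labels):
    """Average negative log-likelihood over bags (not over candidates)."""
    eps = 1e-12  # guards against log(0)
    loss = 0.0
    for bag, y in zip(bags, bag_labels):
        p = bag_probability(w, bag)
        loss -= y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)
    return loss / len(bags)

# Hypothetical weights and data: one positive bag with three candidates
# around the same lesion, and one negative candidate as its own bag.
w = np.array([2.0, -1.0])
positive_bag = np.array([[3.0, 0.0], [-2.0, 1.0], [-1.0, 2.0]])
negative_bag = np.array([[-2.0, 1.0]])

print(bag_probability(w, positive_bag))
print(bag_probability(w, negative_bag))
print(mil_log_loss(w, [positive_bag, negative_bag], [1, 0]))
```

Only the first candidate in the positive bag scores highly, but the bag as a whole is still assigned a high probability, so the loss does not penalize the two missed overlapping candidates.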

Solution 2: Batch Classification (Vural et al. 2009)

Accounts for correlation during testing

Change the decision boundary during test time.

Skewed data and expensive features

1. Highly unbalanced class distribution (less than 1% are abnormal)

2. Huge number of experimentally engineered features

3. Many of them are irrelevant or redundant.

4. Feature computation is expensive

5. Stringent run-time requirements

Solutions:

1. Feature selection/sparse classifiers

2. Cascaded classification architecture
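A minimal sketch of why a cascade helps with skewed data and expensive features: a cheap first stage rejects most candidates, so the costly features are computed only for the survivors. The feature functions, thresholds, and data below are hypothetical stand-ins, not the architecture from the cited work.

```python
import numpy as np

def cheap_features(candidates):
    """Hypothetical inexpensive feature: the first column."""
    return candidates[:, 0]

def expensive_features(candidates):
    """Hypothetical costly feature: the second column. Counts how many
    candidates it was computed for, to show the saving."""
    expensive_features.calls += len(candidates)
    return candidates[:, 1]

expensive_features.calls = 0

def cascade_predict(candidates, t1=0.2, t2=0.5):
    """Stage 1 filters on the cheap feature; stage 2 runs only on survivors."""
    labels = np.zeros(len(candidates), dtype=bool)
    survivors = np.nonzero(cheap_features(candidates) > t1)[0]
    if len(survivors):
        labels[survivors] = expensive_features(candidates[survivors]) > t2
    return labels

# Six toy candidates; only two pass the cheap first stage.
candidates = np.array([
    [0.10, 0.3],
    [0.05, 0.1],
    [0.90, 0.8],
    [0.30, 0.2],
    [0.00, 0.0],
    [0.15, 0.4],
])

labels = cascade_predict(candidates)
print(labels.tolist(), "expensive feature computed for", expensive_features.calls, "candidates")
```

In a real cascade the first-stage threshold would be set to keep sensitivity high, since anything rejected early can never be recovered by later stages.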

Cascaded classification architecture (Bi et al. 2006)

Novel AND-OR training of cascades (Dundar and Bi 2007)

Incorporating domain knowledge

We know that lesions have different shapes/sizes/appearance

Gated Classification architecture

Incorporating domain knowledge (Dundar et al. 2007)

Exploit different sub-classes of False Positives

Subjective ground truth (Raykar et al. 2009)

Lesion ID   Radiologist 1   Radiologist 2   Radiologist 3   Radiologist 4   Truth (unknown)
12          0               0               0               0               x
32          0               1               0               0               x
10          1               1               1               1               x
11          0               0               1               1               x
24          0               1               1               1               x
23          0               0               1               0               x
40          0               1               1               0               x

Each radiologist is asked to annotate whether a lesion is malignant (1) or not (0).

In practice there is a substantial amount of disagreement.

We have no knowledge of the actual golden ground truth.

Getting absolute ground truth (e.g. biopsy) can be expensive.

We have proposed an EM algorithm to simultaneously learn the ground truth and the classifier.
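A simplified sketch of the EM idea, using the seven annotated lesions from the table on this slide. It is an assumption-laden reduction, not the proposed algorithm: it estimates each radiologist's sensitivity and specificity together with a soft consensus label, but replaces the classifier with a single constant prevalence. The function name, the initialization by soft majority vote, and the clipping constant are all choices made for this example.

```python
import numpy as np

def em_consensus(labels, n_iter=50, eps=1e-6):
    """Estimate P(true label = 1) per lesion plus per-annotator
    sensitivity and specificity from binary annotations."""
    labels = np.asarray(labels, dtype=float)  # shape (n_lesions, n_annotators)
    mu = labels.mean(axis=1)                  # init: soft majority vote
    for _ in range(n_iter):
        # M-step: re-estimate prevalence and annotator quality from soft labels.
        prevalence = np.clip(mu.mean(), eps, 1 - eps)
        sens = np.clip(mu @ labels / mu.sum(), eps, 1 - eps)
        spec = np.clip((1 - mu) @ (1 - labels) / (1 - mu).sum(), eps, 1 - eps)
        # E-step: posterior of the hidden true label given all annotations.
        pos = prevalence * np.prod(sens ** labels * (1 - sens) ** (1 - labels), axis=1)
        neg = (1 - prevalence) * np.prod(spec ** (1 - labels) * (1 - spec) ** labels, axis=1)
        mu = pos / (pos + neg)
    return mu, sens, spec

# Rows follow the slide: lesion IDs 12, 32, 10, 11, 24, 23, 40.
annotations = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [0, 1, 1, 0],
]

mu, sens, spec = em_consensus(annotations)
print(np.round(mu, 3))
print(np.round(sens, 3), np.round(spec, 3))
```

Unlike a plain majority vote, the consensus weights each radiologist by the estimated reliability, so two annotators who disagree do not simply cancel out.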

Key Data Mining Challenges

Challenge                                 Solutions
1. Training/testing data is correlated    Multiple instance learning, batch classification
2. Evaluation metric is CAD specific      Multiple instance learning
3. Highly unbalanced data                 Cascaded classifiers
4. Feature computation cost               Cascaded classifiers, feature selection methods
5. Incorporating domain knowledge         Gated classifiers, polyhedral classifiers
6. No objective ground truth              EM algorithm

Clinical Impact

1. How much can a radiologist benefit from using the CAD software?

2. CAD is mostly deployed in second reader mode.

3. Measure the improvement in performance of a radiologist with CAD.

4. Several independent clinical studies/trials have been conducted by our collaborators worldwide.

Lung CAD

1. FDA clinical validation study with 17 radiologists and 196 cases from 4 hospitals. Average reader AUC increased by 0.048 (p<0.001) because of CAD.

2. Recent study at NYU by Godoy et al. 2008

3. New prototype also helps detect different kinds of nodules.

Nodule type              Mean sensitivity without CAD   Mean sensitivity with CAD   Increase in sensitivity
Solid nodules            60%                            85%                         15%
Part-solid nodules       80%                            95%                         15%
Ground glass opacities   75%                            86%                         11%

Reader     Sensitivity without CAD   Sensitivity with CAD   Increase in sensitivity
Reader 1   56.2%                     66.0%                  9.8%
Reader 2   79.2%                     89.8%                  10.6%

Colon PEV

Colon PEV (Polyp Enhanced Viewer) was evaluated by Baker, et al. 2007

Study with seven less-experienced readers

Without PEV average sensitivity was 0.810

With PEV average sensitivity was 0.908

An increase of 9.8 percentage points in average sensitivity (p=0.0152).

PE CAD

Das et al. 2008 conducted a study with 43 patients to assess the sensitivity of detection of pulmonary embolism.

Reader     Sensitivity without CAD   Sensitivity with CAD   Increase in sensitivity
Reader 1   87%                       98%                    11%
Reader 2   82%                       93%                    11%
Reader 3   77%                       92%                    15%

Key data mining lessons

1. The true measure of impact is how much CAD helps radiologists.

2. Design algorithms to optimize the metric you care about.

3. Carefully analyze the assumptions behind off-the-shelf data-mining algorithms. In CAD most of these assumptions break down, so new methods need to be designed.

4. Domain knowledge is very important. Collaboration with radiologists is crucial in eliciting the domain knowledge.

Conclusions

1. Radiologists have access to orders of magnitude more data for diagnosing various cancers.

2. Difficult and time-consuming to identify key clinical findings.

3. We described the data-mining challenges in commercially deployed CAD software.

4. Use of CAD as second reader improves radiologist's detection performance.

5. Key opportunity for data mining technologies to impact patient care worldwide.

Acknowledgements

Dr. D. Naidich, MD, of New York University

Dr. M. E. Baker, MD, of the Cleveland Clinic Foundation

Dr. M. Das, MD, of the University of Aachen

Dr. U. J. Schoepf, MD, of the Medical University of South Carolina

Dr. Peter Herzog, MD, of Klinikum Grosshadern, Munich

Alok Gupta, Ph.D., Ingo Schmuecking, MD, Harald Steck, Ph.D., Stefan Niculescu, Ph.D., Romer Rosales, Ph.D., Sangmin Park, Ph.D., Gerardo Valadez, Ph.D., Maleeha Qazi, and the entire SISL team.

