AWS re:Invent 2016: Automatic Grading of Diabetic Retinopathy through Deep Learning (MAC403)


© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Advisors: Robert Chang, Jeff Ullman, Andreas Paepcke

November 30, 2016

Automatic Grading of Diabetic Retinopathy through Deep Learning

Apaar Sadhwani, Leo Tam, and Jason Su

MAC403

Problem, Data and Motivation

Motivation:
- Affects ~100M people, many in the developed world; ~45% of diabetics
- Make the screening process faster, assist ophthalmologists, enable self-help
- Widespread disease; enable early diagnosis and care

Task:
- Given a fundus image, rate the severity of Diabetic Retinopathy
- 5 classes: 0 (normal), 1, 2, 3, 4 (severe)
- Hard classification (though it may be solved as an ordinal problem)
- Metric: quadratic weighted kappa (QWK), with a (pred - real)^2 penalty

Data:
- From Kaggle (California Healthcare Foundation, EyePACS)
- ~35,000 training images, ~54,000 test images
- High resolution: variable, more than 2560 x 1920
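The quadratic weighted kappa metric can be computed directly from a confusion matrix and the marginal class histograms; a minimal NumPy sketch (the function name and layout are mine, not from the talk):

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_classes=5):
    """QWK = 1 - sum(W*O) / sum(W*E), with quadratic penalty weights."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    # Observed agreement matrix O
    O = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        O[t, p] += 1
    # Quadratic penalty: (i - j)^2, normalized by the maximum disagreement
    idx = np.arange(n_classes)
    W = (idx[:, None] - idx[None, :]) ** 2 / (n_classes - 1) ** 2
    # Expected matrix E under chance agreement, from the marginals
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / len(y_true)
    return 1.0 - (W * O).sum() / (W * E).sum()
```

Perfect agreement scores 1.0; predictions far from the true grade are penalized quadratically, so confusing class 0 with class 4 costs far more than confusing adjacent grades.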


Example images

[Figure: fundus images, Class 0 (normal) and Class 4 (severe)]


Challenges

- High resolution images: atypical in vision, GPU batch size issues
- Discriminative features are small
- Grading criteria not clear (EyePACS guidelines); learn from data
- Incorrect labeling
- Artifacts in ~40% of images
- Optimizing approach to QWK
- Severe class imbalance: class 0 dominates
- Too few training examples

Image size    Batch size
224 x 224     128
2K x 2K       2
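The batch sizes in the table are roughly what memory scaling predicts: activation memory grows with the pixel count per image, so the feasible batch shrinks accordingly (my back-of-the-envelope arithmetic, not from the talk):

```python
# Activation memory scales roughly linearly with pixels per image.
small = 224 * 224          # standard vision resolution
large = 2048 * 2048        # ~2K x 2K fundus resolution
scale = large / small      # ~84x more activation memory per image
print(round(128 / scale))  # a 128-image batch at 224x224 shrinks to ~2
```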





Incorrect labeling: mentioned in the problem statement and confirmed with doctors.



Optimizing for QWK: hard classification is non-differentiable, which makes backpropagation difficult.

[Chart: penalty/loss vs. predicted class for truth labels 0-4]



Squared error is a differentiable approximation of the QWK penalty, and it permits fractional predictions (e.g., 2.5).

[Chart: squared-error penalty/loss vs. predicted class]


Handling class imbalance:
- Naive training collapses to a 3-class problem, or predicts all zeros
- Learn all classes separately: one-vs-all?
- Balanced sampling while training; but what about test time?


Big learning models need more data. Can the test set be harnessed?

Conventional Approaches

Literature survey:
- Hand-designed features to pick out each component
- Clean images, small datasets
- Optic disk and exudate segmentation fail due to artifacts
- SVM: poor performance


Our Approach

1. Registration, pre-processing
2. Convolutional neural nets (CNNs)
3. Hybrid architecture

Step 1: Pre-processing

- Registration: Hough circles, remove the portion outside the circle
- Downsize to a common size (224 x 224, 1K x 1K)
- Color correction
- Normalization (mean, variance)
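A minimal NumPy sketch of the downsize-and-normalize steps. The Hough-circle registration itself would use something like OpenCV's `HoughCircles`; the helper below assumes the fundus circle is already centered and the image square, which is my simplification:

```python
import numpy as np

def preprocess(img, out_size=224):
    """Mask outside the inscribed circle, block-average downsample,
    then standardize each channel to zero mean / unit variance."""
    h, w, _ = img.shape
    yy, xx = np.mgrid[:h, :w]
    r = min(h, w) / 2
    mask = (yy - h / 2) ** 2 + (xx - w / 2) ** 2 <= r ** 2
    img = img * mask[:, :, None]
    # Block-average downsample (assumes side divisible by out_size)
    f = h // out_size
    img = img.reshape(out_size, f, out_size, f, 3).mean(axis=(1, 3))
    # Per-channel mean/variance normalization
    mu = img.mean(axis=(0, 1))
    sd = img.std(axis=(0, 1)) + 1e-8
    return (img - mu) / sd
```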

Step 2: CNNs

Architecture (input image -> class probabilities):
- 3 conv layers (depth 96), MaxPool (stride 2)
- 3 conv layers (depth 256), MaxPool (stride 2)
- 3 conv layers (depth 384), MaxPool (stride 2)
- 3 conv layers (depth 1024), MaxPool (stride 2)
- AvgPool

- Network in Network architecture, 7.5M parameters
- No FC layers; spatial average pooling instead
- Transfer learning (ImageNet); variable learning rates: low for the "ImageNet" layers, with a schedule
- Combat lack of data and over-fitting: dropout, early stopping, data augmentation (flips, rotation)


Step 2: CNNs

Architecture (input image -> class probabilities):
- 3 conv layers (depth 96), MaxPool (stride 2)
- 3 conv layers (depth 256), MaxPool (stride 2)
- 3 conv layers (depth 384), MaxPool (stride 2)
- 3 conv layers (depths 384, 64, 5), MaxPool (stride 2)
- AvgPool

- Network in Network architecture, 2.2M parameters (final block reduced from depth 1024 to depths 384, 64, 5)
- No FC layers; spatial average pooling instead
- Transfer learning (ImageNet); variable learning rates: low for the "ImageNet" layers, with a schedule
- Combat lack of data and over-fitting: dropout, early stopping, data augmentation (flips, rotation)
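The "no FC layers" head works by giving the last conv block as many channels as classes (depth 5 here), spatially average-pooling each channel, and applying softmax; a NumPy sketch of that final step (shapes and names are illustrative, not from the talk):

```python
import numpy as np

def gap_head(feature_map):
    """feature_map: (5, H, W) conv output, one channel per DR grade.
    Global average pooling replaces fully connected layers."""
    logits = feature_map.mean(axis=(1, 2))   # (5,) spatial average pool
    e = np.exp(logits - logits.max())        # numerically stable softmax
    return e / e.sum()                       # class probabilities

probs = gap_head(np.random.randn(5, 7, 7))
```

This keeps the parameter count low and lets the same head accept different input resolutions.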


Step 2: CNN Experiments

- What image size to use? Strategize using 224 x 224, then extend to 1024 x 1024
- What loss function? Mean squared error (MSE), negative log likelihood (NLL), linear combination (annealing)
- Class imbalance: even sampling -> true sampling
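The "linear combination (annealing)" loss can be sketched as an interpolation between NLL and an MSE on the expected grade, with the weight annealed over training; the schedule and function below are my illustration, not the talk's exact recipe:

```python
import numpy as np

def combined_loss(probs, label, alpha):
    """probs: softmax output over the 5 grades; label: true grade.
    alpha=1 -> pure NLL, alpha=0 -> pure MSE on the expected grade."""
    nll = -np.log(probs[label])
    expected_grade = np.dot(np.arange(len(probs)), probs)
    mse = (expected_grade - label) ** 2
    return alpha * nll + (1 - alpha) * mse

# Anneal from NLL toward the differentiable QWK surrogate (MSE)
for epoch in range(5):
    alpha = max(0.0, 1.0 - epoch / 4)
    loss = combined_loss(np.array([0.1, 0.2, 0.4, 0.2, 0.1]), 2, alpha)
```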

Step 2: CNN Experiments (image size: 224 x 224)

Training the top layers only:

Loss Function    Result
MSE              Fails to learn
MSE              Fails to learn
NLL              Kappa < 0.1
NLL              Kappa = 0.29

Fine-tuning the pre-trained layers at 0.01x step size:

Loss Function    Result
NLL              Kappa = 0.42
NLL              Kappa = 0.51
MSE              Kappa = 0.56

Step 2: CNN Results


Computing Setup

- Amazon EC2: GPU nodes, VPC, Amazon EBS-optimized instances
- Single-GPU nodes for 224 x 224 (g2.2xlarge); multi-GPU nodes for 1K x 1K (g2.8xlarge)
- Storage: EBS, Amazon S3
- Python for processing; Torch library (Lua) for training

Computing Setup

[Diagram: model experiments run on 1- or 4-GPU nodes on EC2 inside a VPC; each master is an EBS-optimized central node serving ~10 model experiments; data lives on EBS (gp2) volumes replicated via S3 snapshots, at ~200 MB/s to each GPU node; the setup scales to multiple masters (Model 1-10, Model 11-20, ...)]

Computing Setup

g2.2xlarge (1-GPU node on EC2), 4 GB GPU memory:
- Batch size: 128 images of 224 x 224
- !! Batch size: 8 images of 1024 x 1024 !!

g2.8xlarge (4-GPU node on EC2), 16 GB GPU memory:
- Data parallelism; batch size: ~28 images of 1024 x 1024
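Data parallelism here means each GPU processes a shard of the batch and the resulting gradients are averaged. A NumPy simulation of why averaging shard gradients reproduces the full-batch gradient (a linear model for simplicity; all names are mine):

```python
import numpy as np

def mse_grad(w, X, y):
    """Gradient of mean squared error for a linear model."""
    return 2 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X, y, w = rng.normal(size=(8, 3)), rng.normal(size=8), rng.normal(size=3)

# Split the batch across 4 "GPUs" and average the shard gradients
shards = zip(np.split(X, 4), np.split(y, 4))
avg_grad = np.mean([mse_grad(w, Xs, ys) for Xs, ys in shards], axis=0)

full_grad = mse_grad(w, X, y)   # matches the averaged shard gradients
```

Because the loss is a mean over samples, equal-sized shards average back to exactly the full-batch gradient, so the 4-GPU node behaves like one GPU with 4x the memory.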

Step 3: Hybrid Architecture

[Diagram: a 2048 x 1024 image is split into 64 tiles of 256 x 256, which feed a Lesion Detector; the full image feeds the Main Network; the two outputs are fused to produce class probabilities]

Lesion Detector

- Web viewer and annotation tool
- Lesion annotation
- Extract image patches
- Train lesion classifier

Viewer and Lesion Annotation


Lesion Annotation

Extracted Image Patches

Train Lesion Detector

- Only hemorrhages so far
- Positives: 1866 extracted patches from 216 images/subjects
- Negatives: ~25k class-0 images
- Pre-processing/augmentation: crop a random 256 x 256 image from the input, plus flips
- Pre-trained Network in Network architecture
- Accuracy: 99% for negatives, 76% for positives
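The crop-and-flip augmentation can be sketched in NumPy (the patch size comes from the slide; the RNG plumbing and function name are mine):

```python
import numpy as np

def augment(img, size=256, rng=np.random.default_rng()):
    """Random size x size crop plus random horizontal/vertical flips."""
    h, w, _ = img.shape
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    patch = img[top:top + size, left:left + size]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]   # horizontal flip
    if rng.random() < 0.5:
        patch = patch[::-1, :]   # vertical flip
    return patch
```

Random crops and flips multiply the effective number of positive patches, which matters given only 1866 positives.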


Hybrid Architecture

[Diagram: the 2048 x 1024 image is split into 64 tiles of 256 x 256; the Main Network produces a 64 x 31 x 31 feature map, and the Lesion Detector a 2 x 31 x 31 map (2 x 56 x 56 per tile); concatenated to 66 x 31 x 31 and passed through 2 conv layers in the Fuse stage to produce class probabilities]
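The Fuse step concatenates the two feature maps along the channel axis before the final conv layers; with the shapes from the slide (variable names are mine):

```python
import numpy as np

main_features = np.random.randn(64, 31, 31)   # Main Network feature map
lesion_map = np.random.randn(2, 31, 31)       # assembled Lesion Detector map

# Channel-wise concatenation: 64 + 2 = 66 channels feed the fuse conv layers
fused = np.concatenate([main_features, lesion_map], axis=0)
print(fused.shape)   # (66, 31, 31)
```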

Training Hybrid Architecture

[Diagram: the same pipeline as above (64 tiles -> Lesion Detector; full image -> Main Network; Fuse -> class probabilities), with backpropagation flowing through the fused network]

Other Insights

- Supervised-unsupervised learning
- Distillation
- Hard-negative mining
- Other lesion detectors
- Attention CNNs
- Both eyes
- Ensemble

Clinical Importance

- 3-class problem
- True "4" problem
- Combining imaging modalities (OCT)
- Longitudinal analysis

Many thanks to…

- Amazon Web Services: AWS Educate, AWS Cloud Credits for Research
- Robert Chang, Jeff Ullman, Andreas Paepcke

Thank you!

Remember to complete your evaluations!