Crystallization Image Analysis on the World Community Grid

Christian A. Cumbaa and Igor Jurisica

Jurisica Lab, Division of Signaling Biology, Ontario Cancer Institute, Toronto, Ontario

Why automate classification of protein crystallization trial images?

• Hauptman-Woodward has 65,000,000 images.
  – They want 65,000,000 outcomes.

[Example images, one per outcome: clear, phase separation, precipitate, skin, crystal, garbage, unsure]

Why automate classification of protein crystallization trial images?

• Assist or replace human screening
• Speed the search phase in protein crystallization
• Improve throughput, consistency, objectivity
• Enable data mining and statistical optimization of the crystallization process

[Example images: clear, precipitate, crystal]

Image classification

[Diagram: image (100000s of numbers) → feature extraction → feature 1, feature 2, …, feature k (10s of numbers) → classification → 7 numbers, one per outcome: clear, phase separation, precipitate, skin, crystal, garbage, unsure]
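The two-stage shape of this pipeline can be sketched in a few lines of Python. This is only an illustration of the data flow (many pixel values, then a few features, then seven scores). The three features, the logistic scoring, and the all-zero weights are hypothetical placeholders, not the features or classifier used in the project.

```python
import math

OUTCOMES = ["clear", "phase separation", "precipitate", "skin",
            "crystal", "garbage", "unsure"]

def extract_features(image):
    """Reduce a 2-D grey-level image (list of rows) to a few numbers."""
    pixels = [p for row in image for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    # Mean squared difference between horizontal neighbours: a crude edge measure.
    diffs = [(row[i + 1] - row[i]) ** 2
             for row in image for i in range(len(row) - 1)]
    edge_energy = sum(diffs) / len(diffs)
    return [mean, variance, edge_energy]

def classify(features, weights, biases):
    """Map k features to one score in (0, 1) per outcome (7 numbers)."""
    scores = []
    for w, b in zip(weights, biases):
        z = b + sum(wi * fi for wi, fi in zip(w, features))
        scores.append(1.0 / (1.0 + math.exp(-z)))
    return dict(zip(OUTCOMES, scores))

# Hypothetical 4x4 image and all-zero weights, purely to show the shapes.
image = [[0, 0, 255, 255]] * 4
features = extract_features(image)
scores = classify(features, [[0.0] * 3] * 7, [0.0] * 7)
```

With untrained (zero) weights every outcome scores 0.5. The point is only that the classifier sees the short feature vector, never the raw pixels.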

Truth data

• 96-study
  – 96 proteins × 1536 images, hand-scored by 3 experts
  – Presence/absence of 7 independent outcomes
• NESG & SGPP
  – 15,000 images
  – Hand-scored by 1 expert, same scoring system
• 50% unanimously-scored images
  – 10 most interesting compound categories

[Charts: 96-study; SGPP (crystals); NESG (crystals)]

Feature set

12,375 features computed per image

– A few basic statistics
– 50 microcrystal features
– Euler number features, two variations
  1. 11 blur levels
  2. 11 blur levels × 4 thresholds
– Image “energy”
  • 11 blur levels
– 2925 Grey-Level Co-occurrence Matrix features
  • 3 different grey-level quantizations
  • 13 basic functions
  • 25 sample distances
  • ~100 directions
    – Computable from every point in the image
    – Distilled to max range, max mean, min mean
– ~9500 image-blob features
  • Radon & edge-detection
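A grey-level co-occurrence matrix (GLCM) records how often a pixel of grey level i has a neighbour of grey level j at a fixed offset. The sketch below builds one for a single offset and computes two common texture statistics from it. The slide does not name its 13 functions, so `contrast` and `energy` are assumed stand-ins, and the helper names are hypothetical. The GLCM feature count on the slide is consistent with 3 quantizations × 13 functions × 25 distances × 3 summaries (max range, max mean, min mean) = 2925, although the slide does not spell this out.

```python
from collections import Counter

def quantize(image, levels, max_value=255):
    """Map integer grey values in [0, max_value] onto `levels` bins."""
    return [[min(p * levels // (max_value + 1), levels - 1) for p in row]
            for row in image]

def glcm(image, dx, dy):
    """Normalised co-occurrence probabilities for pixel pairs offset by (dx, dy)."""
    counts = Counter()
    h, w = len(image), len(image[0])
    for y in range(h):
        for x in range(w):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < w and 0 <= y2 < h:
                counts[(image[y][x], image[y2][x2])] += 1
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}

def contrast(p):
    """Large when neighbouring pixels differ strongly in grey level."""
    return sum(((i - j) ** 2) * v for (i, j), v in p.items())

def energy(p):
    """Large when a few grey-level pairs dominate (uniform texture)."""
    return sum(v * v for v in p.values())

# Hypothetical 4x4 image: dark left half, bright right half.
q = quantize([[0, 0, 255, 255]] * 4, levels=2)
p = glcm(q, dx=1, dy=0)
```

Varying `levels`, the offset length, and the offset direction corresponds to the quantization, sample-distance, and direction parameters listed on the slide.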

Our image analysis problem

• Computing all 12,375 features takes >5 hours for a single image

• We have 165,000 images in our training set
• Features must be evaluated for quality
• The best features (10s or low 100s) must be computed for the remaining 65,000,000 images

Massive computing resources required!
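A back-of-the-envelope calculation shows the scale. It takes the slide's ">5 hours" as exactly 5 hours, so the results are lower bounds. The archive figure is a hypothetical worst case in which every feature is computed for every image, which is exactly what selecting the best features avoids.

```python
HOURS_PER_IMAGE = 5            # lower bound: the slide says ">5 hours"
TRAINING_IMAGES = 165_000
ARCHIVE_IMAGES = 65_000_000
HOURS_PER_YEAR = 24 * 365

training_cpu_hours = HOURS_PER_IMAGE * TRAINING_IMAGES
training_cpu_years = training_cpu_hours / HOURS_PER_YEAR

# Hypothetical worst case: all 12,375 features on the whole archive.
archive_cpu_years = HOURS_PER_IMAGE * ARCHIVE_IMAGES / HOURS_PER_YEAR

print(f"training set: {training_cpu_hours:,} CPU-hours, "
      f"about {training_cpu_years:.0f} CPU-years")
print(f"full archive, all features: about {archive_cpu_years:,.0f} CPU-years")
```

Under these assumptions the training set alone needs at least 825,000 CPU-hours, roughly 94 CPU-years on a single processor.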

Image analysis on the World Community Grid

• http://www.worldcommunitygrid.org
  – a global, distributed-computing platform for solving large scientific computing problems with human impact
  – 377,627 volunteers contribute idle CPU time on 960,346 devices.
• Our project: Help Conquer Cancer*
  – launched November 2007.
• HCC has two goals:
  1. To survey a wide tract of image-feature space and identify image analysis algorithms and parameters (features) that best determine crystallization outcome.
  2. To perform the necessary image analysis on Hauptman-Woodward’s archive of 65,000,000 crystallization trial images.

* fundraising slogan of the Ontario Cancer Institute and its parent organization.

Image analysis on the World Community Grid

• HCC has two phases
  – Phase I: calculate 12,375 features per image on high-priority images, including 165,441 hand-scored images.
    – November 2007-May 2008
    – analysis on hand-scored images completed January 2008
  – Phase II: calculate the best features from Phase I on the backlog of HWI images
• Grid members have contributed 8,919 CPU-years so far to HCC, an average of 55 CPU-years per day.
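The two throughput figures can be cross-checked against each other. Assuming the 55 CPU-years per day is an average over the whole life of the project so far, dividing the total by the daily rate gives the implied elapsed time.

```python
TOTAL_CPU_YEARS = 8_919
CPU_YEARS_PER_DAY = 55

elapsed_days = TOTAL_CPU_YEARS / CPU_YEARS_PER_DAY
print(f"implied elapsed time: about {elapsed_days:.0f} days")
```

That is a little over five months, which fits a November 2007 launch and a Phase I window running to May 2008.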

Phase I: feature assessment

Measuring feature quality

• Treat as random variables:
  – Image class
  – Feature value
• Measure the mutual information between them (unit: bits)
  = entropy(class) + entropy(feature) - entropy(class, feature)

[Bar chart: feature entropy and class entropy, in bits (axis 0 to 1), for each outcome: clear, phase separation, precipitate, skin, crystal, garbage, unsure]
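The mutual information formula on this slide can be computed directly from empirical counts. The sketch below is one minimal way to do it. It assumes a binary class label per outcome (present or absent) and a feature that has already been discretized. The binning helper and its edges are hypothetical, since the slide does not say how feature values were binned.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_information(classes, features):
    """I(class; feature) = H(class) + H(feature) - H(class, feature)."""
    joint = list(zip(classes, features))
    return entropy(classes) + entropy(features) - entropy(joint)

def discretize(x, edges):
    """Bin a continuous feature value: the number of edges at or below x."""
    return sum(x >= e for e in edges)

# Hypothetical data: 1 = outcome present, 0 = absent.
classes = [1, 1, 0, 0]
informative = [discretize(x, [0.5]) for x in [0.9, 0.8, 0.2, 0.1]]
uninformative = [discretize(x, [0.5]) for x in [0.9, 0.1, 0.8, 0.2]]

mi_informative = mutual_information(classes, informative)
mi_uninformative = mutual_information(classes, uninformative)
```

A feature that separates the two classes perfectly carries the full 1 bit of class entropy, while a feature independent of the class carries 0 bits, so the score ranks features by how much they reveal about the outcome.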

Measuring feature quality

[Plots, one per class: clear, precipitate (no crystal), other]

Information density: microcrystal counts parameter space

[Panels: Clear, Precipitate, Crystal]

Information density: GLCM maximum range parameter space

[Panels: Clear, Precipitate, Crystal]

Information density: Radon-Sobel soft sum parameter space

[Panels: Clear, Precipitate, Crystal]

Information density: Radon-Sobel blob metrics (means) parameter space

[Panels: Clear, Precipitate, Crystal]

Towards Phase II: image classification

Building classifiers

• handpicked 74 features from peaks in the clear, precipitate and other mutual information plots

• two classification schemes
  – three-way: clear, non-crystal precipitate, other
  – ten-way: clear, phase separation, phase + precipitate, skin, phase + crystal, precip, precip + skin, precip + crystal, crystal, garbage
• naïve Bayes model
• leave-one-out cross-validation
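The last two bullets can be sketched together: a naïve Bayes classifier evaluated by leave-one-out cross-validation. The slide does not say how feature likelihoods were modelled, so the Gaussian per-feature likelihood below is an assumption. The toy data, labels, and function names are hypothetical as well.

```python
import math
from collections import defaultdict

def fit(rows, labels):
    """Per-class prior plus per-feature mean and variance (Gaussian assumption)."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        n = len(members)
        means = [sum(col) / n for col in zip(*members)]
        # Small constant keeps the variance positive for tiny classes.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*members), means)]
        model[label] = (n / len(rows), means, variances)
    return model

def predict(model, row):
    """Class with the highest log posterior, treating features as independent."""
    def log_posterior(label):
        prior, means, variances = model[label]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances))
    return max(model, key=log_posterior)

def leave_one_out_accuracy(rows, labels):
    """Train on all samples but one, test on the held-out one, repeat for each."""
    correct = 0
    for i in range(len(rows)):
        model = fit(rows[:i] + rows[i + 1:], labels[:i] + labels[i + 1:])
        correct += predict(model, rows[i]) == labels[i]
    return correct / len(rows)

# Hypothetical two-feature data for two well-separated classes.
rows = [[0.0, 1.0], [0.2, 0.9], [0.1, 1.1], [5.0, 4.0], [5.2, 4.1], [4.9, 3.9]]
labels = ["clear"] * 3 + ["other"] * 3
accuracy = leave_one_out_accuracy(rows, labels)
```

Every image is classified by a model that never saw it, so the accuracy estimate is not inflated by testing on training data. Refitting from scratch for each held-out sample is the simplest formulation, not the cheapest.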

Measuring classifier accuracy: precision and recall

[Diagram: the set of actual crystals and the set labelled “I think these are crystals”, with regions marked true positives, false negatives, and false positives; precision and recall are indicated]
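In terms of the three regions named on this slide, the standard definitions are as follows, writing TP, FP, and FN for the counts of true positives, false positives, and false negatives:

```latex
\[
\text{precision} = \frac{TP}{TP + FP},
\qquad
\text{recall} = \frac{TP}{TP + FN}
\]
```

Precision asks what fraction of the images labelled as crystals really are crystals. Recall asks what fraction of the real crystals were labelled as such.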

Three-class distribution

Clear 24.3%

Precipitate AND NOT crystal 52.7%

Other 23.0%

Confusion matrix (rows: true class; columns: machine says)

                          clear   non-crystal precipitate   other
clear                     27615   817                       617
non-crystal precipitate   1819    45112                     15928
other                     5109    5258                      17095
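Per-class recall and precision follow directly from a confusion matrix. The sketch below assumes one particular split of the fused digit runs on this slide into nine counts. That split is an inference, but it reproduces the stated class distribution (24.3%, 52.7%, 23.0%) when rows are read as the true class. The derived recall and precision values are therefore illustrative rather than quoted from the slides.

```python
CLASSES = ["clear", "non-crystal precipitate", "other"]

# Rows: true class. Columns: what the machine says. Both in CLASSES order.
# ASSUMPTION: counts obtained by splitting the fused digit runs on the slide.
MATRIX = [
    [27615,   817,   617],
    [ 1819, 45112, 15928],
    [ 5109,  5258, 17095],
]

def recall(matrix, k):
    """Fraction of true class-k images that the machine labelled k."""
    return matrix[k][k] / sum(matrix[k])

def precision(matrix, k):
    """Fraction of images labelled k that truly belong to class k."""
    return matrix[k][k] / sum(row[k] for row in matrix)

total = sum(map(sum, MATRIX))
shares = [sum(row) / total for row in MATRIX]

for k, name in enumerate(CLASSES):
    print(f"{name}: share {shares[k]:.1%}, "
          f"recall {recall(MATRIX, k):.1%}, "
          f"precision {precision(MATRIX, k):.1%}")
```

Recall divides the diagonal entry by its row total and precision divides it by its column total, so the same matrix answers both questions.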

Recall & precision

10-class distribution

Clear 33.83%

Phase separation 7.00%

Phase separation + precipitate 0.50%

Skin 0.79%

Phase separation + crystal 2.32%

Precipitate 34.25%

Precipitate + skin 4.95%

Precipitate + crystal 7.53%

Crystal 8.34%

Garbage 0.55%

Confusion matrix (true class against machine says) over the ten classes: clear, phase separation, phase and precipitate, skin, phase and crystal, precipitate, precipitate and skin, precipitate and crystal, crystal, garbage.

[Cell counts, one matrix row per line, digits run together as extracted:]

3132002521490428
129312910729021964958656345888
8914285261110635621118522235
2930539520086923282433320512
38551240883440169075536174941972441
105512928875511853726874
2010505136372029126
331107819751632241
91503139752986682814024331446
1193920181501135122725585

Recall & precision

Acknowledgements

Hauptman-Woodward Medical Research Institute: George DeTitta, Joe Luft, Eddie Snell, Mike Malkowski, Angela Lauricella, Max Thayer, Raymond Nagel, Steve Potter, and the 96-study reviewers.

World Community Grid: Bill Bovermann, Viktors Berstis, Jonathan D. Armstrong, Tedi Hahn, Kevin Reed, Keith J. Uplinger, Nels Wadycki

IBM Deep Computing: Jerry Heyman

Jurisica Lab: Richard Lu

All crystallization images were generated at the High-Throughput Screening lab at The Hauptman-Woodward Institute.

Funding from: NIH U54 GM074899, Genome Canada, IBM, NSERC

(and earlier work from): NIH P50 GM62413, NSERC, CITO

