Computationally Efficient Histopathological Image Analysis: Use of GPUs for Classification of Stromal Development

Olcay Sertel1,2, Antonio Ruiz3, Umit Catalyurek1,2, Manuel Ujaldon3, Joel Saltz1, Metin Gurcan1

1Dept. of Biomedical Informatics, 2Dept. of Electrical & Computer Engineering, 3Dept. of Pathology, The Ohio State University

3Dept. of Computer Architecture, The University of Malaga


Why do we need high-performance tools?

• The size of a single whole-slide image is extremely large! Typically, an uncompressed whole-slide image digitized at 40x is more than 40 GB.
• A spatial resolution of 120K x 120K pixels gives: 120K x 120K x 3 bytes (RGB) per pixel ≈ 43.2 GB
• Image analysis algorithms are complicated and time-consuming.
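The size estimate is plain arithmetic and can be checked with a short sketch. The function name `uncompressed_size_gb` is illustrative, and decimal gigabytes (10^9 bytes) are assumed, since that is the convention that reproduces the 43.2 GB figure.

```python
def uncompressed_size_gb(width_px, height_px, bytes_per_pixel=3):
    """Return the uncompressed image size in decimal gigabytes (10**9 bytes)."""
    return width_px * height_px * bytes_per_pixel / 1e9


# 120K x 120K pixels, 3 bytes (RGB) per pixel
size_gb = uncompressed_size_gb(120_000, 120_000)
print(f"{size_gb:.1f} GB")  # prints "43.2 GB"
```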


Parallel processing infrastructure

[Figure: A whole-slide image is divided into image tiles (40X magnification). The tiles are classified in parallel on Processor 1 ... Processor N, and each tile is assigned a classification label (Label 1, Label 2, Label 3, or Background) to form the classification map.]
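The tile-based scheme can be sketched as follows. This is a minimal illustration and not the authors' implementation: the threshold classifier `classify_tile` is a hypothetical stand-in for the real feature-based classifier, the image is a plain list of grayscale rows, and a thread pool is used only to keep the example portable (a process pool or a GPU cluster would play the role of "Processor 1 ... Processor N").

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tiles(height, width, tile):
    """Yield (row, col, y0, y1, x0, x1) for each tile covering the image."""
    for r, y0 in enumerate(range(0, height, tile)):
        for c, x0 in enumerate(range(0, width, tile)):
            yield r, c, y0, min(y0 + tile, height), x0, min(x0 + tile, width)


def classify_tile(pixels):
    """Hypothetical stand-in classifier: label a tile by its mean intensity."""
    mean = sum(pixels) / len(pixels)
    if mean > 200:
        return "Background"
    return "Label 1" if mean > 100 else "Label 2"


def classify_image(image, tile, workers=4):
    """Classify every tile in parallel and assemble the classification map."""
    height, width = len(image), len(image[0])
    jobs = list(split_into_tiles(height, width, tile))

    def run(job):
        r, c, y0, y1, x0, x1 = job
        pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
        return r, c, classify_tile(pixels)

    n_rows = (height + tile - 1) // tile
    n_cols = (width + tile - 1) // tile
    label_map = [[None] * n_cols for _ in range(n_rows)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for r, c, label in pool.map(run, jobs):
            label_map[r][c] = label
    return label_map


# Toy 4x4 "slide" split into 2x2 tiles
toy = [
    [255, 255, 150, 150],
    [255, 255, 150, 150],
    [50, 50, 255, 255],
    [50, 50, 255, 255],
]
print(classify_image(toy, tile=2))
# [['Background', 'Label 1'], ['Label 2', 'Background']]
```

Because each tile is classified independently, the work can be spread across processors without any communication between tiles.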


What is GPGPU?

• GPGPU stands for general-purpose computing on graphics processing units.
• GPUs were initially designed for gaming applications.
• Fast GPUs are used to implement complex shader and rendering operations for real-time effects.

[Images: Doom 3, © id Software; Call of Duty, © Infinity Ward]


Applications

Physically-based Simulation

Particle Systems

Molecular Dynamics

Fluid models

Signal and Image Processing

Segmentation

Volume Rendering

Visualization

Photon Mapping

Ray Tracing

Medical Image Analysis

Databases & Data Mining

Database queries

Stream Mining


GPU resources

                         CPU          GPU
Processor clock          2.13 GHz     575 MHz
Raw computational power  10 GFLOPS    520 GFLOPS
Memory bus width         64 bits      384 bits
Memory clock             2x333 MHz    2x900 MHz
Memory bandwidth         10.8 GB/s    86.4 GB/s
Memory size and type     2 GB DDR2    768 MB GDDR3

GPUs:
• Speed is increasing at "cubed Moore’s law"!
• Ubiquitous and inexpensive
• Functional units for specific graphics-based operations (vertex & pixel shaders)
• Small memory but high raw computational power
• Memory bandwidth and clock provide superior performance
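The GPU memory bandwidth figure follows from the bus width and the effective memory clock. The sketch below uses an illustrative helper name and assumes that "2x900 MHz" means a double-data-rate clock of 1800 MHz effective. The same single-channel formula gives roughly half of the quoted CPU figure, which may reflect a dual-channel memory configuration; that is an assumption, not something the slides state.

```python
def memory_bandwidth_gb_s(bus_width_bits, effective_clock_mhz):
    """Peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_clock_mhz * 1e6 / 1e9


gpu_bw = memory_bandwidth_gb_s(384, 2 * 900)
print(f"GPU memory bandwidth: {gpu_bw:.1f} GB/s")  # prints "GPU memory bandwidth: 86.4 GB/s"

# Ratio of raw computational power from the table (GFLOPS)
print(f"GPU/CPU raw compute ratio: {520 / 10:.0f}x")  # prints "GPU/CPU raw compute ratio: 52x"
```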


GPU implementation

The implementation is crucial:
• Programming model is unusual
• Programming idioms tied to computer graphics
• Programming environment tightly constrained

Can’t simply port CPU code:
• Poorly suited to sequential, “pointer-chasing” code
• Missing support for some basic functionality (e.g., integers, bitwise operations)

Underlying architectures are:
• Inherently parallel
• Rapidly evolving (even in basic feature set!)
• Largely secret
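To illustrate why pointer-chasing code ports poorly while per-pixel code ports well, the sketch below contrasts the two access patterns. It is CPU-side Python for illustration only; the class and function names are hypothetical and nothing here is GPU code.

```python
class Node:
    """Singly linked list node."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def sum_linked_list(head):
    """Pointer chasing: each step needs the pointer loaded by the previous step,
    so the traversal is inherently sequential."""
    total = 0
    node = head
    while node is not None:
        total += node.value
        node = node.next
    return total


def brighten(pixels, offset):
    """Data-parallel pattern: each output depends only on its own input,
    so every element could be handled by a separate GPU thread."""
    return [min(255, p + offset) for p in pixels]


print(sum_linked_list(Node(1, Node(2, Node(3)))))  # prints 6
print(brighten([0, 250, 100], 10))  # prints [10, 255, 110]
```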


Computational savings on GPUs

Execution times (in msec.) for a 1Kx1K image tile.

                       CPU (Matlab)   CPU (C++)   GPU
L*a*b* conversion      3185.3         614.8       0.5
Statistical features   2081.8         28.9        13.6
LBP                    771.8          208.8       4.7
Total                  6038.9         852.5       18.8

Processing a relatively small whole-slide image of size 50Kx50K takes:

• 47 sec. on GPU
• 35 min. on CPU
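These whole-slide figures are consistent with scaling the per-tile totals to the number of tiles. The sketch below assumes that 50Kx50K means 50,000 x 50,000 pixels, that tiles are 1,000 x 1,000 pixels, and that there is no overhead beyond per-tile processing. Under those assumptions the CPU figure matches the C++ total rather than the Matlab total. The function name is illustrative.

```python
def whole_slide_seconds(side_px, tile_px, ms_per_tile):
    """Estimate processing time for a square slide from the per-tile time."""
    tiles_per_side = -(-side_px // tile_px)  # ceiling division
    return tiles_per_side ** 2 * ms_per_tile / 1000.0


gpu_s = whole_slide_seconds(50_000, 1_000, 18.8)   # GPU total per tile (msec.)
cpu_s = whole_slide_seconds(50_000, 1_000, 852.5)  # CPU (C++) total per tile (msec.)
print(f"GPU: {gpu_s:.0f} s")              # prints "GPU: 47 s"
print(f"CPU (C++): {cpu_s / 60:.1f} min")  # prints "CPU (C++): 35.5 min"
```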

Task to perform        C++ vs. Matlab    GPU vs. C++        GPU vs. Matlab
RGB to L*a*b* conv.    5.9x - 5.2x       69.2x - 1409.6x    406.1x - 7391.3x
Statistical features   122.2x - 90.0x    0.2x - 2.1x        21.8x - 192.1x
LBP operator           8.3x - 3.9x       4.2x - 38.3x       34.6x - 350.9x
TOTAL                  13.3x - 7.6x      2.6x - 46.3x       33.4x - 350.9x

Performance gain depends on image resolution, which varies from 128x128 to 1024x1024.
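For readers unfamiliar with the LBP (local binary pattern) operator, a minimal sketch of the basic 8-neighbour variant is given below. The slides do not specify which LBP variant, neighbourhood, or bit ordering was used, so these choices are assumptions, and the code is plain CPU Python rather than the GPU implementation.

```python
def lbp_code(image, y, x):
    """Basic 8-neighbour LBP code for the interior pixel at (y, x).

    Each neighbour that is >= the centre pixel sets one bit of the code.
    Neighbours are visited clockwise starting at the top-left (an assumed order).
    """
    center = image[y][x]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        if image[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            hist[lbp_code(image, y, x)] += 1
    return hist


patch = [
    [9, 1, 9],
    [1, 5, 1],
    [9, 1, 9],
]
print(lbp_code(patch, 1, 1))  # prints 85 (bits 0, 2, 4 and 6 set)
```

Each pixel's code depends only on its 3x3 neighbourhood, so all pixels can be processed independently.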


Verification of the output values

                            Mean                          Standard deviation
CPU (Matlab) / CPU (C++)    1.4 × 10^-4 to 1.2 × 10^-2    1.8 × 10^-4 to 1.0 × 10^-2
CPU (C++) / GPU             6.5 × 10^-4 to 2.1 × 10^-2    4.3 × 10^-4 to 5.0 × 10^-2
CPU (Matlab) / GPU          1.5 × 10^-3 to 1.7 × 10^-2    7.5 × 10^-3 to 5.0 × 10^-2

Verification of the output values across hardware platforms, obtained from 500 training images.

There is no variation in the classification accuracy when using the feature values computed on the GPU.
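A cross-platform check of this kind can be sketched as below. The slides do not state the exact comparison metric, so the absolute difference between corresponding feature values is an assumption, as is the function name.

```python
from statistics import mean, pstdev


def compare_features(values_a, values_b):
    """Return (mean, standard deviation) of the absolute differences between
    feature values computed on two platforms (assumed metric)."""
    if len(values_a) != len(values_b):
        raise ValueError("feature vectors must have the same length")
    diffs = [abs(a - b) for a, b in zip(values_a, values_b)]
    return mean(diffs), pstdev(diffs)


# Toy values standing in for the same feature computed on two platforms
cpu_values = [1.0, 2.0, 3.0]
gpu_values = [1.0, 2.5, 2.0]
m, s = compare_features(cpu_values, gpu_values)
print(f"mean = {m:.3f}, std = {s:.3f}")  # prints "mean = 0.500, std = 0.408"
```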


Future directions & Conclusions

• Processing of the whole-slide images is essential to overcome the sampling bias problem.
• We need HPC tools due to the huge sizes of whole-slide images and the sophisticated image analysis algorithms.
• The processing time can be reduced drastically using different infrastructures.
• We are investigating novel ways of processing whole-slide images over various computational infrastructures
    • Cluster of GPUs
• One drawback of GPUs is the low-level programmability
    • Requires good knowledge of the architecture
    • Rapid changes in the architecture
• However, higher-level development tools are available (CUDA by NVIDIA)


Thanks for your attention

Any questions?

Date post:	28-Dec-2015
