
Saoni Mukherjee, Nicholas Moore, James Brock, Miriam...

Date post: 03-Aug-2020
Transcript
Page 1: Saoni Mukherjee, Nicholas Moore, James Brock, Miriam Leeser · saoni/files/Mukherjee_PUMPS2013_poster.pdf · Implementations of 3D CT Reconstruction for Biomedical Imaging, Proc. of IEEE

[Figure labels: filtered projections; reconstructed 3D volume (F); X-ray source]

Saoni Mukherjee, Nicholas Moore, James Brock, Miriam Leeser

Department of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115, USA

[Figure labels: sinogram (a line for every angle); reconstructed cross-sectional slice data; 3D reconstructed volume]

[Figure: bar chart of time (sec) for MATLAB and C, showing backprojection time and total time; axis range 0 to 9000. Bar labels in reading order: 2hr 20m 40s, 2hr 20m 43s, 1hr 32m 36s, 1hr 32m 39s.]

Mathematical phantom
Input: 64 × 60 pixels with 72 projections
Final volume: 64 × 60 × 50 voxels
Size = 1 MB + 1 MB

Mouse scan
Input: 512 × 768 pixels with 361 projections
Final volume: 512 × 512 × 768 voxels
Size = 542 MB + 768 MB > 1 GB
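The quoted sizes can be checked with a little arithmetic. The sketch below assumes that every pixel and voxel is stored as a 4-byte single-precision float, which the poster does not state; under that assumption the numbers match the sizes above to rounding. The function name `footprint_mib` is illustrative only.

```python
# Assumption (not stated on the poster): each value is a 4-byte float.
BYTES_PER_VALUE = 4
MIB = 1024 * 1024


def footprint_mib(width, height, count):
    """Size in MiB of `count` images (or slices) of width x height values."""
    return width * height * count * BYTES_PER_VALUE / MIB


# (width, height, count) for the projections and for the final volume.
datasets = {
    "phantom": {"input": (64, 60, 72), "volume": (64, 60, 50)},
    "mouse": {"input": (512, 768, 361), "volume": (512, 512, 768)},
}

for name, shapes in datasets.items():
    in_mib = footprint_mib(*shapes["input"])
    vol_mib = footprint_mib(*shapes["volume"])
    print(f"{name}: input {in_mib:.2f} MiB + volume {vol_mib:.2f} MiB")
```

For the mouse scan this gives 541.50 MiB of projections plus 768.00 MiB of volume, which is where the "> 1 GB" total comes from.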

Programming paradigm   Speedup over single-threaded MATLAB   Speedup over single-threaded C   Speedup over multi-threaded C
C with OpenMP          45x                                   4x                               --
OpenCL (NVIDIA)        2026x                                 200x                             45x
OpenCL (AMD)           400x                                  40x                              8x
CUDA                   4500x                                 430x                             100x

Future work

• Optimize other GPU kernels
• More configurations to be tested with auto-tuning
• Streaming for bigger datasets
• Overlapping computation and communication
• Improve performance on AMD device
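Streaming is possible in principle because backprojection is a sum over projections, so partial sums from separate batches of projections can be accumulated into the same volume. The following is a minimal sketch of that idea only; the batching helpers are hypothetical and are not taken from the authors' code.

```python
def projection_batches(num_projections, batch_size):
    """Yield (start, stop) index ranges that cover all projections in order."""
    for start in range(0, num_projections, batch_size):
        yield start, min(start + batch_size, num_projections)


def accumulate_in_batches(contributions, batch_size):
    """Sum per-projection contributions to one voxel, one batch at a time."""
    total = 0.0
    for start, stop in projection_batches(len(contributions), batch_size):
        # In a real pipeline only this slice would need to be on the device.
        total += sum(contributions[start:stop])
    return total


# Hypothetical per-projection contributions for a single voxel.
contributions = list(range(361))
batched = accumulate_in_batches(contributions, batch_size=64)
```

With 361 projections and a batch size of 64, the helper yields six batches, the last one holding the remaining 41 projections, and the batched sum equals the sum over all projections at once.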

References

[1] S. Mukherjee, N. Moore, J. Brock, M. Leeser, CUDA and OpenCL Implementations of 3D CT Reconstruction for Biomedical Imaging, Proc. of IEEE High Performance Extreme Computing (2012).
[2] L. A. Feldkamp, L. C. Davis, J. W. Kress, Practical cone-beam algorithm, J. Opt. Soc. Am., Volume 1(A) (1984).
[3] F. Xu, K. Mueller, Real-time 3D computed tomographic reconstruction using commodity graphics hardware, Physics in Medicine and Biology, 52(12) (2007).
[4] F. Ino, S. Yoshida, K. Hagihara, RGBA Packing for Fast Cone Beam Reconstruction on the GPU, Proc. of SPIE, Vol. 7258 (2009).
[5] NVIDIA Corporation, NVIDIA CUDA C Programming Guide, CUDA Toolkit 4.1.
[6] Fessler's image reconstruction toolbox, http://www.eecs.umich.edu/~fessler/irt/fessler.tgz.

Klaus Mueller, Introduction to Medical Imaging, Lecture 6: X-Ray Computed Tomography, Computer Science Department, Stony Brook University

• Back projection is the most computationally intensive step, but it is highly parallelizable.
• Different voxels are independent.
• Fessler's image reconstruction toolbox [6] implements Feldkamp CBCT in MATLAB and is widely used in academia.
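Voxel independence can be illustrated with a small voxel-driven sketch in which each voxel is computed by a function that reads the projections but never touches another voxel. The geometry formulas, the 1/(2πK) scaling, the nearest-neighbour lookup, and all names here are assumptions based on a common textbook form of Feldkamp backprojection, not the authors' kernel.

```python
import math


def backproject_voxel(x, y, z, projections, angles, d):
    """Value of one voxel; uses only its own coordinates and the projections."""
    k = len(projections)
    total = 0.0
    for q, theta in zip(projections, angles):
        c, s = math.cos(theta), math.sin(theta)
        denom = d - x * c - y * s            # assumed geometry term
        u = d * (-x * s + y * c) / denom     # detector column, centred units
        v = d * z / denom                    # detector row, centred units
        weight = (d / denom) ** 2            # assumed distance weight
        rows, cols = len(q), len(q[0])
        iu = int(round(u + (cols - 1) / 2))  # nearest-neighbour lookup
        iv = int(round(v + (rows - 1) / 2))
        if 0 <= iu < cols and 0 <= iv < rows:
            total += weight * q[iv][iu]
    return total / (2 * math.pi * k)


# Tiny demo: four 3 x 3 projections of ones, source distance 100 (made up).
angles = [n * 2 * math.pi / 4 for n in range(4)]
projections = [[[1.0] * 3 for _ in range(3)] for _ in angles]
coords = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]

# Visiting the voxels in a different order gives the same volume, which is
# why each voxel can be handed to its own GPU thread.
forward = {p: backproject_voxel(*p, projections, angles, 100.0) for p in coords}
backward = {p: backproject_voxel(*p, projections, angles, 100.0)
            for p in reversed(coords)}
assert forward == backward
```

Because no voxel reads or writes another voxel's result, the outer loop over coordinates can be split across threads without synchronization.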

Sample Projections

Weighted Projection: Weighting and ramp filtering the raw data produce the filtered projections Q1, Q2, ..., QK, where Qn is collected at angle θn and 1 ≤ n ≤ K.
di = distance between the volume origin and the source.
F(x, y, z) = value of voxel (x, y, z) in volume F.
The volume F is in xyz space and the projections are in uv space.
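The two preprocessing steps can be sketched as follows. This is a minimal illustration, assuming the usual Feldkamp cosine weight d / sqrt(d² + u² + v²) and a spatial-domain Ram-Lak (ramp) kernel with unit detector spacing; the poster does not give its exact weight or filter, and the function names are hypothetical.

```python
import math


def cosine_weight(u, v, d):
    """Assumed pre-weight d / sqrt(d^2 + u^2 + v^2) at detector point (u, v)."""
    return d / math.sqrt(d * d + u * u + v * v)


def ramp_tap(n):
    """Spatial-domain Ram-Lak kernel tap for unit detector spacing."""
    if n == 0:
        return 0.25
    if n % 2 == 0:
        return 0.0
    return -1.0 / (math.pi * n) ** 2


def weight_and_filter(projection, d):
    """Weight one raw projection, then ramp-filter each detector row."""
    rows, cols = len(projection), len(projection[0])
    u0, v0 = (cols - 1) / 2, (rows - 1) / 2  # detector centre
    weighted = [[projection[r][c] * cosine_weight(c - u0, r - v0, d)
                 for c in range(cols)] for r in range(rows)]
    # Direct convolution of every row with the ramp kernel.
    return [[sum(row[j] * ramp_tap(i - j) for j in range(cols))
             for i in range(cols)] for row in weighted]


# A single-row impulse at the detector centre returns the kernel itself.
impulse = [[0, 0, 1, 0, 0]]
filtered = weight_and_filter(impulse, d=100.0)
```

Direct convolution keeps the sketch short; an FFT-based filter is the more common choice for realistic detector widths.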

Backprojection: The volume F is reconstructed using the following equations:
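The equations themselves did not survive extraction from the poster. For orientation, a commonly used form of Feldkamp backprojection is given here as an assumption, not as the authors' exact notation. It uses n as the projection index, θn as the projection angle, and dn for the source-to-origin distance that the text calls di; W2 is the weight value, and u and v are the detector coordinates of voxel (x, y, z).

```latex
\begin{aligned}
F(x, y, z) &= \frac{1}{2\pi K} \sum_{n=1}^{K} W_2(x, y, n)\,
              Q_n\bigl(u(x, y, n),\, v(x, y, z, n)\bigr) \\
u(x, y, n) &= \frac{d_n\,(-x \sin\theta_n + y \cos\theta_n)}
                   {d_n - x \cos\theta_n - y \sin\theta_n} \\
v(x, y, z, n) &= \frac{d_n\, z}{d_n - x \cos\theta_n - y \sin\theta_n} \\
W_2(x, y, n) &= \left( \frac{d_n}{d_n - x \cos\theta_n - y \sin\theta_n} \right)^{2}
\end{aligned}
```

The overall scaling factor varies between implementations, so the 1/(2πK) term in particular should be read as one convention among several.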

Our approach

Results on Phantom: Total time

Architectures and Languages used

Coordinates

Weight value

Advantage: Faster reconstruction of the final volume helps in the diagnosis and treatment of patients. Capturing the data takes only ~9 seconds, whereas reconstruction takes ~3 hours.

CT Scan Procedure

Feldkamp Algorithm

3D CT Reconstruction

Abstract

Biomedical image reconstruction applications with large datasets can benefit from acceleration. Graphics Processing Units (GPUs) are particularly useful in this context because they can produce high-fidelity images rapidly. An algorithm that reconstructs cone-beam computed tomography (CT) volumes from two-dimensional projections is implemented on GPUs. The implementation takes slices of the target, weights the projection data, filters the weighted data, and then backprojects the filtered data to create the final three-dimensional reconstruction. It is implemented on two types of hardware: a CPU and a heterogeneous system combining a CPU and a GPU. The CPU codes, written in C, OpenMP, and MATLAB, are compared with several heterogeneous versions written in CUDA C and OpenCL. The relative performance is tested and evaluated on a mathematical phantom as well as on mouse data. Speedups over a multithreaded C implementation of more than 40 times are seen on the GPU for the phantom data and close to 90 times for the larger mouse dataset.

Motivation

[Figure: Result on phantom, kernel runtime. Bar chart of time (millisec) for the weighting, filtering, and backprojection kernels; series: NVIDIA timings and AMD timings; axis range 0 to 140. Data labels in reading order: 2.25, 89.62, 14.07, 14.7, 123.23, 19.68.]

Results on Mouse: Total time

Acknowledgements

More information and software available: http://www.coe.neu.edu/Research/rcl//projects/CBCT.php

[Figure: processing pipeline. Raw projections → 1. Weighting → 2. Filtering → filtered projections → 3. Back projection → reconstructed 3D volume. Panel labels: CPU, GPU.]

Host: Intel Xeon CPU E5-2620 0 @ 2.00GHz with 6 cores, cache size: 15 MB, RAM size: 32 GB
Host languages: MATLAB, MATLAB PCT, C, C with OpenMP

Device: NVIDIA Tesla C2075, AMD Radeon HD5870
Device languages: CUDA, OpenCL

[Figure: bar chart of runtime in log scale (in sec), showing backprojection time and total time for MATLAB, MATLAB PCT, C, OpenMP, OpenCL (NVIDIA), and CUDA; axis range 0 to 5.]

[Figure: bar chart of runtime in log scale (in sec), showing backprojection time and total time for MATLAB, C, OpenMP, OpenCL (NVIDIA), OpenCL (AMD), and CUDA; axis range -2.5 to 2.]

Programming paradigm   Speedup over single-threaded MATLAB   Speedup over multi-threaded MATLAB   Speedup over single-threaded C   Speedup over multi-threaded C
MATLAB PCT             2x                                    --                                   --                               --
C with OpenMP          10x                                   5x                                   4x                               --
OpenCL (NVIDIA)        700x                                  385x                                 350x                             85x
CUDA                   800x                                  415x                                 380x                             90x
