Saoni Mukherjee, Nicholas Moore, James Brock, Miriam...

Date post: 03-Aug-2020
Transcript
Page 1: Saoni Mukherjee, Nicholas Moore, James Brock, Miriam Leeser (saoni/files/Mukherjee_GTC2013_poster.pdf)

[Figure labels: X-ray source; filtered projections; reconstructed 3D volume (F)]

Saoni Mukherjee, Nicholas Moore, James Brock, Miriam Leeser

Department of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115, USA

Biomedical image reconstruction applications with large datasets can benefit from acceleration. Graphics Processing Units (GPUs) are particularly useful in this context because they can produce high-fidelity images rapidly. An algorithm that reconstructs cone-beam computed tomography (CT) volumes from two-dimensional projections is implemented on GPUs. The implementation takes slices of the target, weights the projection data, filters the weighted data, and then backprojects the filtered data to create the final three-dimensional reconstruction. It is implemented on two types of hardware: a CPU, and a heterogeneous system combining a CPU and a GPU. The CPU codes in C and MATLAB are compared with the heterogeneous versions written in CUDA-C and OpenCL. Relative performance is evaluated on a mathematical phantom as well as on mouse data. Speedups of over thirty times using the GPU are seen for the phantom data and close to fifty times for the larger mouse datasets.

What is CT Scanning?

[Figure labels: data; sinogram (a line for every angle); reconstruction routine; reconstructed cross-sectional slice; 3D reconstructed volume]

Advantages: i) reduced X-ray exposure; ii) image accuracy (more accurate than MRI).

Disadvantage: the long time it takes to reconstruct the volume, which interrupts treatment and diagnosis.

[Equation annotations: co-ordinates; weight value]

[Figure: bar chart of backprojection time and total time for MATLAB and C. Axis: Time (sec), 0 to 9000. Bar labels as extracted: 2h 20m 40s, 2h 20m 43s, 1h 32m 36s, 1h 32m 39s]

Mathematical phantom. Input: 64 × 60 pixels with 72 projections. Final volume: 64 × 60 × 50 voxels.

Mouse scan. Input: 512 × 768 pixels with 361 projections. Final volume: 512 × 512 × 768 voxels.

Host | Device | Language
Intel Core i7 quad-core processor @ 3.4 GHz | - | MATLAB, MATLAB PCT
Intel Xeon W3580 quad-core processor @ 3.33 GHz | NVIDIA Tesla C2070 | C, C with OpenMP, CUDA
Intel Xeon E5520 CPUs @ 2.27 GHz | AMD Radeon HD5870 | OpenCL

Kernel timings, Time (millisec)

Stage | NVIDIA timings | AMD timings
Weighting | 2.25 | 14.7
Filtering | 89.62 | 123.23
Backprojection | 14.07 | 19.68
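As a rough check on where the kernel time goes, the sketch below computes each stage's share of the summed kernel time. The assignment of the six chart values to stages and devices is an assumption based on the order in which they were extracted; the dictionary layout and the `stage_shares` helper are illustrative names, not part of the poster.

```python
# Per-stage kernel timings in milliseconds, as read from the NVIDIA/AMD chart.
# Assumption: values are listed as weighting, filtering, backprojection per device.
timings_ms = {
    "NVIDIA": {"weighting": 2.25, "filtering": 89.62, "backprojection": 14.07},
    "AMD": {"weighting": 14.7, "filtering": 123.23, "backprojection": 19.68},
}


def stage_shares(stages):
    """Return each stage's fraction of the summed stage time."""
    total = sum(stages.values())
    return {name: t / total for name, t in stages.items()}


for device, stages in timings_ms.items():
    shares = stage_shares(stages)
    slowest = max(shares, key=shares.get)
    print(f"{device}: slowest stage is {slowest} ({shares[slowest]:.1%} of kernel time)")
```

Under that reading, filtering accounts for roughly 85% of the summed kernel time on the NVIDIA device and roughly 78% on the AMD device.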

Phantom timings, Time (sec)

Implementation | Backprojection time | Total time
MATLAB | 17.02 | 17.09
C | 1.36 | 1.44
C+OpenMP | 0.32 | 0.33
OpenCL-NVIDIA | 0.01 | 0.11
OpenCL-AMD | 0.1 | 0.16
CUDA | 0.01 | 0.1

[Figure: timing chart (mouse scan results). Bar labels as extracted, in extraction order: 32m 9s, 1h 14m 37s, 1h 14m 43s, 32m 12s, 42s, 2h 20m 43s, 1m 31s, 1m 7s, 2h 20m 40s, 55s]

Speedups, mathematical phantom

Programming Paradigm | Speedup over single-threaded MATLAB | Speedup over single-threaded C | Speedup over multi-threaded C
C with OpenMP | 50x | 4x | -
OpenCL (NVIDIA) | 1700x | 136x | 32x
OpenCL (AMD) | 170x | 13x | 3x
CUDA | 1700x | 136x | 32x
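The speedups in this table appear to be ratios of backprojection times. The sketch below recomputes them from the six backprojection values in the phantom timing chart (17.02, 1.36, 0.32, 0.01, 0.1, 0.01 seconds). Matching those values to implementations in chart order is an assumption, and `speedup` is an illustrative helper name.

```python
# Backprojection times in seconds from the phantom timing chart.
# Assumption: values map to implementations in the order the chart lists them.
backprojection_s = {
    "MATLAB": 17.02,
    "C": 1.36,
    "C+OpenMP": 0.32,
    "OpenCL-NVIDIA": 0.01,
    "OpenCL-AMD": 0.1,
    "CUDA": 0.01,
}


def speedup(baseline, accelerated):
    """Ratio of baseline backprojection time to accelerated backprojection time."""
    return backprojection_s[baseline] / backprojection_s[accelerated]


for target in ("C+OpenMP", "OpenCL-NVIDIA", "OpenCL-AMD", "CUDA"):
    row = [f"{speedup(base, target):.1f}x" for base in ("MATLAB", "C", "C+OpenMP")]
    print(target, row)
```

The computed ratios land close to the tabulated figures; small differences (for example about 53x against the tabulated 50x for C with OpenMP over MATLAB) are consistent with rounding in the table.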

Speedups, mouse scan

Programming Paradigm | Speedup over single-threaded MATLAB | Speedup over multi-threaded MATLAB | Speedup over single-threaded C | Speedup over multi-threaded C
MATLAB PCT | 1.5x | - | - | -
C with OpenMP | 4x | - | 2x | -
OpenCL (NVIDIA) | 125x | 80x | 70x | 30x
CUDA | 200x | 130x | 100x | 45x

Future work
1) The next bottleneck is weighted filtering, which was not a bottleneck earlier.
2) Test more configurations with auto-tuning: number of kernels to launch, number of threads.
3) Streaming for bigger datasets.
4) Overlapping computation and communication.
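To give a sense of why streaming matters for larger datasets, the sketch below estimates the memory footprint of the mouse scan and splits the volume into slabs of slices that could be reconstructed one at a time. It assumes 32-bit floating-point storage, which the poster does not state, and `mebibytes` and `slabs` are hypothetical helper names; this is one possible partitioning scheme, not the authors' design.

```python
BYTES_PER_VALUE = 4  # assumption: 32-bit floats; the poster does not state the data type


def mebibytes(*dims):
    """Size in MiB of an array with the given dimensions."""
    count = 1
    for d in dims:
        count *= d
    return count * BYTES_PER_VALUE / 2**20


def slabs(n_slices, max_slices):
    """Yield (start, stop) slice ranges covering n_slices in chunks of at most max_slices."""
    for start in range(0, n_slices, max_slices):
        yield start, min(start + max_slices, n_slices)


print("mouse volume MiB:", mebibytes(512, 512, 768))
print("mouse projections MiB:", mebibytes(512, 768, 361))
print("slabs of 256 slices:", list(slabs(768, 256)))
```

Because each slab needs only its own voxels plus the filtered projections, a slab can in principle be transferred while the previous one is being computed, which is the kind of overlap item 4 refers to.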

Acknowledgments

References
[1] S. Mukherjee, N. Moore, J. Brock, M. Leeser, CUDA and OpenCL Implementations of 3D CT Reconstruction for Biomedical Imaging, Proc. of IEEE High Performance Extreme Computing (2012).
[2] L. A. Feldkamp, L. C. Davis, J. W. Kress, Practical cone-beam algorithm, J. Opt. Soc. Am. A, Vol. 1 (1984).
[3] F. Xu, K. Mueller, Real-time 3D computed tomographic reconstruction using commodity graphics hardware, Physics in Medicine and Biology, 52(12) (2007).
[4] F. Ino, S. Yoshida, K. Hagihara, RGBA Packing for Fast Cone Beam Reconstruction on the GPU, Proc. of SPIE, Vol. 7258 (2009).
[5] NVIDIA Corporation, NVIDIA CUDA C Programming Guide, CUDA Toolkit 4.1.
[6] Fessler's image reconstruction toolbox, http://www.eecs.umich.edu/~fessler/irt/fessler.tgz.

Klaus Mueller, Introduction to Medical Imaging, Lecture 6: X-Ray Computed Tomography, Computer Science Department, Stony Brook University

• Backprojection takes most of the time but is highly parallelizable.
• Different voxels are independent.
• Fessler's image reconstruction toolbox [6] implements Feldkamp CBCT in MATLAB and is widely used in academia.
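The independence of voxels can be illustrated with a minimal voxel-driven backprojection sketch in plain Python. It assumes one common textbook form of the Feldkamp weighting and detector coordinates, nearest-neighbour detector lookup, and unit voxel and pixel spacing; `voxel_value`, `backproject`, `d_src`, and `d_det` are illustrative names, and this is not the authors' CUDA or OpenCL kernel.

```python
import math


def voxel_value(x, y, z, projections, angles, d_src, d_det):
    """Backproject one voxel. It reads the projections only, never other voxels.

    Assumes d_src (source-to-origin distance) exceeds the volume radius,
    so the denominator below stays positive.
    """
    n_v, n_u = len(projections[0]), len(projections[0][0])
    acc = 0.0
    for q, theta in zip(projections, angles):
        c, s = math.cos(theta), math.sin(theta)
        denom = d_src - x * c - y * s
        weight = (d_src / denom) ** 2
        u = d_det * (-x * s + y * c) / denom  # detector column coordinate
        v = d_det * z / denom                 # detector row coordinate
        iu = int(round(u + (n_u - 1) / 2))    # nearest detector pixel
        iv = int(round(v + (n_v - 1) / 2))
        if 0 <= iu < n_u and 0 <= iv < n_v:
            acc += weight * q[iv][iu]
    return acc / (2 * math.pi * len(angles))


def backproject(projections, angles, d_src, d_det, shape):
    """Fill an nx × ny × nz volume, indexed volume[k][j][i], centred on the origin."""
    nx, ny, nz = shape
    return [[[voxel_value(i - (nx - 1) / 2, j - (ny - 1) / 2, k - (nz - 1) / 2,
                          projections, angles, d_src, d_det)
              for i in range(nx)] for j in range(ny)] for k in range(nz)]


K = 8
angles = [2 * math.pi * n / K for n in range(K)]
flat = [[[1.0] * 5 for _ in range(5)] for _ in range(K)]  # constant test projections
volume = backproject(flat, angles, 100.0, 100.0, (3, 3, 3))
print(volume[1][1][1])
```

Since `voxel_value` has no dependence on neighbouring voxels, every call can run concurrently, which is why a GPU implementation can map one thread to each voxel.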

[Figure panels: Sample Projections; 3D CT Reconstruction]

Weighted Projection: Weighted and ramp-filtered raw data produce filtered projections Q_1, Q_2, ..., Q_K, where Q_n is collected at angle θ_n and 1 ≤ n ≤ K.

d_i = distance between the volume origin and the source.
F(x, y, z) = value of voxel (x, y, z) in volume F.
The volume F is in xyz space and the projections are in uv space.

Backprojection: The volume F is reconstructed using the following equations:
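The poster's own equations are not present in this transcript. As a hedged reconstruction, one common way of writing Feldkamp backprojection (the form used, for example, in the GPU literature cited in the references) is shown below. The symbol D for the source-to-detector distance is an assumption introduced here, and sign conventions for u may differ from the poster's.

```latex
\begin{align}
F(x, y, z) &= \frac{1}{2\pi K} \sum_{n=1}^{K} W_2(x, y, n)\,
              Q_n\bigl(u(x, y, n),\, v(x, y, z, n)\bigr) \\
W_2(x, y, n) &= \left( \frac{d_i}{d_i - x\cos\theta_n - y\sin\theta_n} \right)^{2} \\
u(x, y, n) &= \frac{D\,(-x\sin\theta_n + y\cos\theta_n)}{d_i - x\cos\theta_n - y\sin\theta_n} \\
v(x, y, z, n) &= \frac{D\, z}{d_i - x\cos\theta_n - y\sin\theta_n}
\end{align}
```

Here W_2 plays the role of the weight value, and (u, v) are the detector co-ordinates at which the filtered projection Q_n is sampled for voxel (x, y, z).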

Poster section headings: Abstract; Motivation; Feldkamp Algorithm; Our approach; Architectures and Languages used; Results (Phantom, Mouse Scan)

More information and software available: http://www.coe.neu.edu/Research/rcl//projects/CBCT.php
