GTC 2012 San Jose, CA

Ultrafast Multipinhole SPECT Iterative Reconstruction Using CUDA-based GPU Computing

F. Alhassen (1), J. D. Bowen (1), H. Kudrolli (2), B. Singh (2), R. G. Gould (1), V. V. Nagarkar (2), Y. Seo (1)

(1) Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, CA, USA, [email protected]

(2) Radiation Monitoring Devices, Inc., Watertown, MA, USA

We have developed an ultrafast statistical iterative reconstruction (SIR) method for multipinhole SPECT, programmed in CUDA and tested on a high-performance graphics processing unit (GPU). Using both computer-generated and experimental sinograms, we show a speed enhancement of up to fifty-fold in reconstruction, with virtually the same accuracy as the CPU-based SIR (0.15% normalized root mean square error).

Motivation

Why use GPUs for statistical iterative reconstructions (SIRs)? [1]

SIR compared with filtered back-projection (FBP):

Pros
•  Model stochastic processes
•  Model physical processes
•  Better noise/resolution tradeoff than FBP
•  Potentially lower dose or scan time

Cons
•  Much slower than FBP
•  Limited in what physical processes can actually be modeled due to computational constraints

GPU: NVIDIA Tesla M2070, 448 cores, 6 GB DDR RAM, 150 GB/s, 1 TFLOPS peak. © 2011 NVIDIA Corporation [2]

SIR on the GPU:

Pros
•  Model stochastic processes
•  Model physical processes
•  Better noise/resolution tradeoff than FBP
•  Potentially lower dose or scan time

Cons
•  GPU programming complexity

What’s new here?

•  GPU-based SIR for multipinhole SPECT using a pre-computed system matrix
•  Implemented using the CUDA / CUSPARSE [3] GPU computing API
•  Models finite pinhole apertures

Why multipinhole SPECT?

•  High resolution with increased sensitivity compared to single-pinhole or parallel-hole SPECT
•  Simultaneously acquired multiple views enhance accuracy in dynamic studies and reduce motion artifacts [2]

CPU vs. GPU Implementations

Hardware / Method | CPU | GPU
Processing cores | 1 core of an AMD Opteron 6128 2.0 GHz CPU | 448 cores of an NVIDIA Tesla M2070 GPU
RAM | 16 GB | 6 GB
Sparse matrix operations (projections) | Eigen 3.0 [4] | CUSPARSE 1.0 [5]
Rotation, correction, and reduction operations | C++ functions | CUDA kernels
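The projection and back-projection steps listed in the table are sparse matrix-vector products. As a rough illustration only, the sketch below implements both products for a matrix held in compressed sparse row (CSR) form in plain Python. The CSR layout and the function names `csr_matvec` and `csr_rmatvec` are assumptions made for this example; the poster does not state which storage format its Eigen or CUSPARSE code uses.

```python
def csr_matvec(n_rows, indptr, indices, data, x):
    """Forward projection y = P x for a CSR matrix P (hypothetical helper)."""
    y = [0.0] * n_rows
    for r in range(n_rows):
        # Non-zeros of row r live in data[indptr[r]:indptr[r + 1]].
        for j in range(indptr[r], indptr[r + 1]):
            y[r] += data[j] * x[indices[j]]
    return y


def csr_rmatvec(n_cols, indptr, indices, data, y):
    """Back-projection x = P^T y using the same CSR arrays (hypothetical helper)."""
    x = [0.0] * n_cols
    for r in range(len(indptr) - 1):
        for j in range(indptr[r], indptr[r + 1]):
            # Scatter each non-zero into the column (voxel) it belongs to.
            x[indices[j]] += data[j] * y[r]
    return x


# Toy 2 x 3 matrix [[1, 0, 2], [0, 3, 0]] in CSR form.
indptr, indices, data = [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]
print(csr_matvec(2, indptr, indices, data, [1.0, 1.0, 1.0]))   # [3.0, 3.0]
print(csr_rmatvec(3, indptr, indices, data, [1.0, 1.0]))       # [1.0, 3.0, 2.0]
```

In the forward product each output element depends on one row only, so rows can be processed independently. The transposed product scatters into shared outputs, which is one reason it is often handled differently on parallel hardware.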

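[Figure: example of system matrix weights for detector pixel d = 5, voxel k = 2, and pinhole i = 1, showing sub-pixel weights $\rho_{d\delta k}^{i}$ (for example $\delta = 6$ and $\delta = 9$) and the pixel weight $\rho_{dk}^{i}$.]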

MLEM SIR algorithm
Using matrix-based notation and implementation

Maximum-likelihood expectation maximization (MLEM)

1.  Initialize the activity estimate: $\hat{L}^{1} = (1\ 1\ \cdots\ 1)^{T}$
2.  Generate rotations of the estimate: $\hat{L}_R^{i} = R(\hat{L}^{i})$
3.  Project the estimated activity distribution: $X_R^{i} = P \hat{L}_R^{i}$
4.  Back-project the ratio between the real and estimated projections: $Y_R^{i} = P^{T} (X_R \div X_R^{i})$
5.  Rotate back and reduce to a single correction set: $Y^{i} = \sum_{m}^{M} R'(Y_R^{i})_m$
6.  Correct the current estimate: $\hat{L}^{i+1} = (Y^{i} \div V) \circ \hat{L}^{i}$
7.  Terminal condition satisfied? If no, return to step 2 with the new estimate. If yes, output the final estimate $\hat{L} = \hat{L}^{i}$.

Legend

•  $\hat{L}^{i}$: Activity estimate for the ith iteration
•  $R$: Rotation operation
•  $\hat{L}_R^{i}$: Rotated activity estimate
•  $P$: System matrix
•  $X_R^{i}$: Estimated projections
•  $X_R$: Measured projections
•  $Y_R^{i}$: Back-projected correction
•  $R'$: Rotation in opposite direction
•  $Y^{i}$: Correction
•  $V$: Normalization
•  $T$: Transpose operation
•  $\circ$: Element-wise multiplication
•  $\div$: Element-wise division
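The matrix-based MLEM update can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation. It assumes a single view, so the rotation and reduction operations are omitted; a small dense system matrix stored as nested lists; and a normalization V equal to the column sums of P, which is the usual MLEM sensitivity term but is not spelled out in the poster. The names `matvec`, `rmatvec`, and `mlem` are hypothetical.

```python
def matvec(P, x):
    """Forward projection P x for a dense matrix given as a list of rows."""
    return [sum(p * v for p, v in zip(row, x)) for row in P]


def rmatvec(P, y):
    """Back-projection P^T y."""
    out = [0.0] * len(P[0])
    for row, yj in zip(P, y):
        for k, p in enumerate(row):
            out[k] += p * yj
    return out


def mlem(P, measured, n_iter=50, eps=1e-12):
    """Single-view MLEM sketch (rotation and reduction steps omitted)."""
    V = rmatvec(P, [1.0] * len(P))      # assumed normalization: column sums of P
    L = [1.0] * len(P[0])               # initialize the estimate with ones
    for _ in range(n_iter):
        X_est = matvec(P, L)            # project the current estimate
        ratio = [m / max(x, eps) for m, x in zip(measured, X_est)]
        Y = rmatvec(P, ratio)           # back-project measured / estimated
        # Element-wise correction: L <- (Y / V) * L
        L = [y / max(v, eps) * l for y, v, l in zip(Y, V, L)]
    return L


# Toy problem: 3 detector bins, 2 voxels, noise-free data from activity [2, 3].
P = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
measured = matvec(P, [2.0, 3.0])
print(mlem(P, measured))
```

For this noise-free toy problem the estimate approaches the true activity [2, 3]. The fixed iteration count stands in for the terminal condition, which the poster does not define.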

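System matrix generation
Calculating weights between a pixel d and a voxel k through the ith pinhole

Ray tracing: Incremental Siddon’s Method [3]

[Figure: ray-tracing geometry showing the activity voxels (axes x, y, z), the multipinhole collimator, and the detector pixels with their sub-pixels; labels include $l_{dk}^{i}$, $f$, $f'$, and $b$.]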
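To make the idea of a ray-traced weight concrete, the sketch below computes the length of a ray segment inside each cell of a 2D grid. It uses the original, non-incremental Siddon formulation (collect every grid-line crossing, sort, and measure the pieces) in two dimensions, which is simpler than the incremental 3D variant cited on the poster. The function name `siddon_2d`, the grid placement at the origin, and the use of raw intersection length as the weight are all assumptions for illustration.

```python
import math


def siddon_2d(p0, p1, nx, ny, pixel=1.0):
    """Intersection lengths of segment p0 -> p1 with an nx-by-ny grid.

    The grid's lower-left corner is at the origin. Returns a dict mapping
    (ix, iy) to the length of the segment inside that cell.
    """
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)

    # Parametric positions (0..1) where the segment crosses grid lines.
    alphas = {0.0, 1.0}
    if dx != 0.0:
        for i in range(nx + 1):
            a = (i * pixel - x0) / dx
            if 0.0 < a < 1.0:
                alphas.add(a)
    if dy != 0.0:
        for j in range(ny + 1):
            a = (j * pixel - y0) / dy
            if 0.0 < a < 1.0:
                alphas.add(a)
    ordered = sorted(alphas)

    weights = {}
    for a_lo, a_hi in zip(ordered, ordered[1:]):
        # The midpoint of each piece identifies the cell it lies in.
        a_mid = 0.5 * (a_lo + a_hi)
        ix = math.floor((x0 + a_mid * dx) / pixel)
        iy = math.floor((y0 + a_mid * dy) / pixel)
        if 0 <= ix < nx and 0 <= iy < ny:
            weights[(ix, iy)] = weights.get((ix, iy), 0.0) + (a_hi - a_lo) * length
    return weights


# A horizontal ray through the bottom row of a 4 x 4 grid.
print(siddon_2d((-1.0, 0.5), (5.0, 0.5), 4, 4))
```

Each ray is traced independently of the others, which is consistent with the poster's observation that ray tracing maps well onto the GPU.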

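Ray tracing is faster on GPU

[Figure: timelines comparing ray tracing on the CPU (time axis marked $T_{CPU}$, $2T_{CPU}$, $3T_{CPU}$) and on the GPU (time axis marked $T_{GPU}$, $2T_{GPU}$).]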

SIR Setup

Projections from digital phantoms:
•  Hot rod phantom
•  MOBY phantom [6]

Projections from experimental acquisitions with a microcolumnar scintillator [7]:
•  Mouse heart phantom [7]

Pinhole apertures: ideal aperture and finite aperture (0.5 mm)

References

1.  Pratx, G.; Chinn, G.; Olcott, P. D.; and Levin, C. S.; "Fast, Accurate and Shift-Varying Line Projections for Iterative Reconstruction Using the GPU," IEEE Transactions on Medical Imaging, vol. 28, pp. 435-445, March 2009.

2.  Xu, F.; "Fast Implementation of Iterative Reconstruction with Exact Ray-Driven Projector on GPUs," Tsinghua Science & Technology, vol. 15, pp. 30-35, 2010.

3.  Han, G.; Liang, Z.; and You, J.; "A fast ray-tracing technique for TCT and ECT studies," 1999 IEEE Nuclear Science Symposium Conference Record, vol. 3, pp. 1515-1518, 1999.

4.  Eigen, http://eigen.tuxfamily.org/index.php?title=Main_Page, 2011.

5.  Naumov, M.; "CUSPARSE Library: A Set of Basic Linear Algebra Subroutines for Sparse Matrices," GPU Technology Conference, 2070, Sept. 23, 2010.

6.  Segars, W. P.; Tsui, B. M. W.; Frey, E. C.; Johnson, G. A.; and Berr, S. S.; "Development of a 4D digital mouse phantom for molecular imaging research," Molecular Imaging & Biology, vol. 6, no. 3, pp. 149-159, 2004.

7.  Alhassen, F.; Kudrolli, H.; Singh, B.; Kim, S.; Seo, Y.; Gould, R. G.; and Nagarkar, V. V.; "A preclinical SPECT camera with depth-of-interaction compensation using a focused-cut scintillator," Proc. SPIE 7961, 796121 (2011).

Reconstructions

Settings

Pinhole aperture | Phantom dimensions (voxels) | Sinogram dimensions (pixels) | Sub-pixels per pixel | Pinhole pixels per sub-pixel | Number of rays | System matrix size (MB)
Ideal | 64^3 | 256^2 x 60 | 3^2 | 1 | 589,824 | 76
Finite | 64^3 | 256^2 x 60 | 3^2 | 5^2 | 14,745,600 | 952-1072
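The ray counts in the settings table are consistent with a simple product: detector pixels per view, times sub-pixels per pixel, times pinhole sample points per sub-pixel. The short check below reproduces both tabulated values under that reading. The function name `rays_per_view` and the interpretation that the count is per projection view (not multiplied by the 60 views) are assumptions.

```python
def rays_per_view(detector_side, subpixels_per_side, pinhole_samples_per_side):
    """Rays traced per view: pixels x sub-pixels x pinhole sample points."""
    pixels = detector_side ** 2
    return pixels * subpixels_per_side ** 2 * pinhole_samples_per_side ** 2


ideal = rays_per_view(256, 3, 1)    # ideal aperture: one point per pinhole
finite = rays_per_view(256, 3, 5)   # finite aperture: 5 x 5 points per pinhole
print(ideal, finite)                # 589824 14745600
```

Modeling the finite aperture multiplies the number of rays by 25, which lines up with the much larger system matrix and longer ray-tracing times reported for that case.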

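Benchmarking

GPU speed enhancement and NRMSE are reported once per pinhole aperture and are listed on the GPU row.

Pinhole aperture | Computational element | Ray tracing (s) | Average per iteration (s) | All iterations (s) | Total reconstruction (s) | GPU speed enhancement | NRMSE (%)
Ideal | CPU | 6.31 | 10.26 | 513.20 | 527.05 | |
Ideal | GPU | 0.62 | 0.23 | 11.60 | 14.15 | 37.25 | 0.15
Finite | CPU | 197.68 | 47.3 | 2365.14 | 2586.02 | |
Finite | GPU | 13.30 | 0.78 | 38.86 | 51.20 | 47.71 | 1.24E-4
Finite (Mouse heart phantom) | CPU | 356.47 | 60.79 | 3039.44 | 3421.37 | |
Finite (Mouse heart phantom) | GPU | 23.11 | 0.83 | 41.28 | 66.47 | 37.25 | 4.50E-6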
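As a worked check of the benchmark figures, the snippet below computes the ratio of CPU to GPU total reconstruction time for each case. That the "GPU speed enhancement" column is defined as this ratio is an assumption; the poster does not give the definition. Under that assumption the ideal-aperture ratio matches the tabulated 37.25, and the two finite-aperture ratios come out slightly above 50, in line with the fifty-fold figure quoted in the abstract.

```python
def speed_enhancement(cpu_total_s, gpu_total_s):
    """Ratio of CPU to GPU total reconstruction time (assumed definition)."""
    return cpu_total_s / gpu_total_s


# Total reconstruction times in seconds (CPU, GPU), copied from the table.
totals = {
    "Ideal": (527.05, 14.15),
    "Finite": (2586.02, 51.20),
    "Finite (Mouse heart phantom)": (3421.37, 66.47),
}

for name, (cpu_s, gpu_s) in totals.items():
    print(f"{name}: {speed_enhancement(cpu_s, gpu_s):.2f}x")
```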

$$\mathrm{NRMSE}(i) = \frac{1}{\max(\lambda_k)} \sqrt{\frac{1}{N_k} \sum_{k} \left(\lambda_k - \hat{\lambda}_k^{i}\right)^{2}}$$
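The NRMSE metric can be computed directly from two voxel arrays. The sketch below assumes that $\lambda_k$ is the reference value in voxel k, that $\hat{\lambda}_k^{i}$ is the estimate at iteration i, that $N_k$ is the number of voxels, and that the square root implied by "root mean square" applies to the mean term. The function name `nrmse` is hypothetical, and the result is a fraction; multiply by 100 to express it in percent as in the benchmark table.

```python
import math


def nrmse(reference, estimate):
    """Root mean square error normalized by the maximum reference value."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    return math.sqrt(mse) / max(reference)


# Two voxels, each off by 1, with a maximum reference value of 2.
print(nrmse([2.0, 2.0], [1.0, 3.0]))   # 0.5
```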

Reconstruction Accuracy

[Figures: MOBY phantom convergence; GPU and CPU reconstructions with a line profile.]
