Implementing 3D SPHARM Surfaces Registration on Cell B.E. Processor

Post on 26-Jun-2015


Implementing 3D SPHARM Surfaces Registration on Cell Processor

Huian Li (huili@indiana.edu), Mi Yan (miyan@us.ibm.com), Robert Henschel (rhensche@indiana.edu), Li Shen (shenli@iupui.edu)

July 29, 2009

Contents
• SPHARM registration
• Matlab implementation
• Cell implementation
• Performance Analysis
• Conclusion

SPHARM Surfaces

• Radial and stellar surfaces
• Simply connected, arbitrarily shaped
• Vision, graphics, imaging, bioinformatics

SPHARM Expansion

Area-preserving mapping: $(\theta, \phi) \rightarrow (x, y, z)$

SHREC

(a) template, (b) object, (c) after ICP, (d) after registration of parameterization

Calculation of coefficients
• After rotating the parameter net on the surface by Euler angles (α, β, γ), the new coefficients will be:

$$c_l^m(\alpha\beta\gamma) = \sum_{n=-l}^{l} D^l_{mn}(\alpha\beta\gamma)\, c_l^n$$

where

$$D^l_{mn}(\alpha\beta\gamma) = e^{-i\gamma n} \left( \sum_{t=\max(0,\,n-m)}^{\min(l+n,\,l-m)} (-1)^t\, d^{lt}_{mn}(\beta) \right) e^{-i\alpha m}$$

and

$$d^{lt}_{mn}(\beta) = \frac{\sqrt{(l+n)!\,(l-n)!\,(l+m)!\,(l-m)!}}{(l+n-t)!\,(l-m-t)!\,(t+m-n)!\,t!} \left(\cos\frac{\beta}{2}\right)^{2l+n-m-2t} \left(\sin\frac{\beta}{2}\right)^{2t+m-n}$$
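A minimal Python sketch of the coefficient rotation may help make the indices concrete. It is not the authors' Matlab or Cell code. It assumes the summation limits t = max(0, n-m) .. min(l+n, l-m), the factor ordering e^{-iγn} d e^{-iαm}, and a hypothetical storage layout `coeffs[l][n + l]` holding one scalar coordinate of c_l^n. The names `wigner_d` and `rotate_coefficients` are illustrative. Direct factorials limit it to small l.

```python
import cmath
import math


def wigner_d(l, m, n, beta):
    """Sum over t of (-1)^t * d^{lt}_{mn}(beta); direct factorials, small l only."""
    c, s = math.cos(beta / 2.0), math.sin(beta / 2.0)
    norm = math.sqrt(math.factorial(l + n) * math.factorial(l - n)
                     * math.factorial(l + m) * math.factorial(l - m))
    total = 0.0
    # Assumed limits: they keep every factorial argument non-negative.
    for t in range(max(0, n - m), min(l + n, l - m) + 1):
        denom = (math.factorial(l + n - t) * math.factorial(l - m - t)
                 * math.factorial(t + m - n) * math.factorial(t))
        total += ((-1) ** t * norm / denom
                  * c ** (2 * l + n - m - 2 * t) * s ** (2 * t + m - n))
    return total


def rotate_coefficients(coeffs, alpha, beta, gamma):
    """coeffs[l][n + l] holds c_l^n; returns rotated c_l^m in the same layout."""
    rotated = []
    for l, row in enumerate(coeffs):
        new_row = []
        for m in range(-l, l + 1):
            acc = 0j
            for n in range(-l, l + 1):
                big_d = (cmath.exp(-1j * gamma * n) * wigner_d(l, m, n, beta)
                         * cmath.exp(-1j * alpha * m))
                acc += big_d * row[n + l]
            new_row.append(acc)
        rotated.append(new_row)
    return rotated
```

With all three angles set to zero the function returns the input coefficients unchanged, which is a quick sanity check on the index conventions.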

RMSD
• RMSD (Root Mean Square Distance): distance between two SPHARM models

$$\mathrm{RMSD} = \sqrt{\frac{1}{4\pi} \sum_{l=0}^{L_{\max}} \sum_{m=-l}^{l} \left\| c_{1,l}^{m} - c_{2,l}^{m} \right\|^{2}}$$

where $c_{1,l}^{m}$ and $c_{2,l}^{m}$ are the coefficients of the two SPHARM models.
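As an illustration, the RMSD between two coefficient sets can be sketched in a few lines of Python. The storage layout is an assumption: `c[l][m + l]` is taken to be a tuple of three complex numbers (the x, y, z components of c_l^m), and the name `spharm_rmsd` is hypothetical.

```python
import math


def spharm_rmsd(c1, c2):
    """RMSD = sqrt( (1 / 4 pi) * sum over l, m of ||c1_l^m - c2_l^m||^2 )."""
    total = 0.0
    for row1, row2 in zip(c1, c2):          # one row per degree l
        for a, b in zip(row1, row2):        # one entry per order m
            # Squared Euclidean norm of the (x, y, z) coefficient difference.
            total += sum(abs(x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(total / (4.0 * math.pi))
```

Because the measure works directly on coefficients, no surface points need to be sampled to compare two models.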

Matlab implementation
• A straightforward implementation in Matlab:

for l = 0, Lmax
  for m = -l, l
    for n = -l, l
      for t = max(0, n-m), min(l+n, l-m)
        ... performing calculations ...

• One rotation for Lmax = 50 took 823 seconds on a 2 GHz quad-core Intel Xeon E5335

Cell B.E.

Cell implementation
• Domain decomposition:

for l = 0, Lmax
  for m = -l, l
    for n = -l, l
      for t = max(0, n-m), min(l+n, l-m)
        ... calculations ...

• Decomposition along l leads to workload imbalance among SPUs
• Decomposition along m creates unnecessary data communication
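The imbalance along l can be seen by counting innermost iterations per degree. The sketch below is illustrative only. It assumes the t limits max(0, n-m) .. min(l+n, l-m), and the function names are hypothetical.

```python
def inner_iterations(l):
    """Number of (m, n, t) triples visited by the loop nest for one degree l."""
    count = 0
    for m in range(-l, l + 1):
        for n in range(-l, l + 1):
            t_lo, t_hi = max(0, n - m), min(l + n, l - m)
            count += t_hi - t_lo + 1
    return count


def work_per_degree(l_max):
    """Innermost-iteration counts for l = 0 .. l_max."""
    return [inner_iterations(l) for l in range(l_max + 1)]
```

The count grows roughly cubically with l, so an SPU that is handed the high degrees does far more work than one that is handed the low degrees.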

Cell implementation
• Loop fusion:

for l = 0, Lmax
  for m = -l, l
    for n = -l, l
      for t = max(0, n-m), min(l+n, l-m)
        ... calculations ...

• Unique index for combined loop: f(l, m) = l^2 + m + l
• Workload for each SPE: (Lmax + 1)^2 / (total # of SPEs)
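A short Python sketch of the fused index and its inverse follows. The contiguous ceiling-division split across SPEs is an assumption, since the slide gives only the average workload, and the names `fused_index`, `unfuse_index` and `spe_range` are hypothetical.

```python
import math


def fused_index(l, m):
    """f(l, m) = l^2 + m + l maps (l, m) to 0 .. (Lmax + 1)^2 - 1 without gaps."""
    return l * l + l + m


def unfuse_index(k):
    """Inverse of fused_index: degree l covers indices l^2 .. (l + 1)^2 - 1."""
    l = math.isqrt(k)
    return l, k - l * l - l


def spe_range(spe_id, n_spes, l_max):
    """Half-open range of fused indices for one SPE (assumed contiguous split)."""
    total = (l_max + 1) ** 2
    chunk = -(-total // n_spes)          # ceiling division
    start = min(spe_id * chunk, total)
    return start, min(start + chunk, total)
```

For Lmax = 50 there are 51^2 = 2601 (l, m) pairs, so each of 8 SPEs would receive at most 326 of them under this split.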

Cell implementation
• Lookup table T for factorial
• Transform exponentials & multiplications into multiplications & additions, respectively.

$$d^{lt}_{mn}(\beta) = \frac{\sqrt{(l+n)!\,(l-n)!\,(l+m)!\,(l-m)!}}{(l+n-t)!\,(l-m-t)!\,(t+m-n)!\,t!} \left(\cos\frac{\beta}{2}\right)^{2l+n-m-2t} \left(\sin\frac{\beta}{2}\right)^{2t+m-n}$$

$$= \exp\Big( \tfrac{1}{2}\big( T(l+n) + T(l-n) + T(l+m) + T(l-m) \big) - \big( T(l+n-t) + T(l-m-t) + T(t+m-n) + T(t) \big) + (2l+n-m-2t)\,\log\cos\frac{\beta}{2} + (2t+m-n)\,\log\sin\frac{\beta}{2} \Big)$$

where the table entries are $T(k) = \log(k!)$.
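The log-domain evaluation can be sketched in Python as follows. This is an illustration, not the Cell code. It assumes the table stores T(k) = log(k!) and that 0 < β < π, so that both logarithms are defined. The names `build_log_factorial_table` and `d_term_log` are hypothetical. The sign (-1)^t is applied separately by the caller.

```python
import math


def build_log_factorial_table(max_arg):
    """T[k] = log(k!), built incrementally so no large factorial is ever formed."""
    table = [0.0] * (max_arg + 1)
    for k in range(1, max_arg + 1):
        table[k] = table[k - 1] + math.log(k)
    return table


def d_term_log(T, l, m, n, t, beta):
    """Magnitude of d^{lt}_{mn}(beta) via table lookups, additions and one exp."""
    log_value = (0.5 * (T[l + n] + T[l - n] + T[l + m] + T[l - m])
                 - (T[l + n - t] + T[l - m - t] + T[t + m - n] + T[t])
                 + (2 * l + n - m - 2 * t) * math.log(math.cos(beta / 2.0))
                 + (2 * t + m - n) * math.log(math.sin(beta / 2.0)))
    return math.exp(log_value)
```

The largest factorial argument is l + n with n = l, so a table of size 2·Lmax + 1 (101 entries for Lmax = 50) covers every lookup.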

Cell implementation
• Other optimizations specific to Cell:
  • Vectorization & data alignment
  • DMA data transfer between main memory & local store
  • SPU decrementer

Cell implementation
• Single precision vs. double precision: all data in single precision

Cell implementation
• Single precision vs. double precision: partial data in double precision

Cell implementation
• Single precision vs. double precision: all critical data in double precision

Performance analysis

[Figure: Performance of one rotation on Cell BE. x-axis: Number of SPEs (1, 2, 4, 8, 16); y-axis: Time (seconds), 0 to 1.8.]

Performance analysis

[Figure: Performance of finding the shortest distance at Level 3 on Cell BE. x-axis: Number of SPEs (4, 8, 12, 16); y-axis: Time (seconds), 0 to 7000; series: GNU gcc, IBM xlc.]

Conclusion
• Performance increases dramatically on Cell due to its unique architecture and algorithm optimization.
• Care must be taken with data placement due to the limited local store.
• Care must also be taken with data transfer between local store and main memory.

The End

Questions?