
Parallel Beam Back Projection: Implementation

Date post: 06-Jan-2016
Description: Parallel Beam Back Projection: Implementation, a PowerPoint presentation by Srdjan Coric, Miriam Leeser, and Eric Miller.
Page 1: Parallel Beam Back Projection: Implementation

Parallel Beam Back Projection: Implementation

Srdjan Coric

Miriam Leeser

Eric Miller

Page 2: Parallel Beam Back Projection: Implementation

Outline

• Annapolis Wildstar
• “Simple Architecture”
  – algorithm
  – datapath
  – Performance
  – Results
• Parallelism extraction
• “Advanced Architecture 4x”
  – datapath
  – Performance
  – Results
  – Implementation issues
• Future directions

Page 3: Parallel Beam Back Projection: Implementation
Page 4: Parallel Beam Back Projection: Implementation

Data Flow

[Block diagram. Stages: sinogram data prefetch, sinogram data address generation, sinogram data retrieval, linear interpolation, data accumulation (data read, data write)]
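The stages named on this slide can be illustrated with a minimal software sketch. It is not the authors' hardware design: the function name `back_project`, the evenly spaced angles over 180 degrees, the pixel pitch equal to the detector pitch, and the sign convention of the address formula are all assumptions made for illustration.

```python
import math

def back_project(sinogram, size):
    """Unfiltered parallel-beam back projection onto a size x size image.

    `sinogram` is a list of projections, each a list of detector samples.
    """
    n_proj = len(sinogram)
    n_det = len(sinogram[0])
    center = (n_det - 1) / 2.0          # detector index of the rotation axis
    half = (size - 1) / 2.0             # image center in pixel units
    image = [[0.0] * size for _ in range(size)]
    for p, row in enumerate(sinogram):  # one projection at a time (prefetch / retrieval)
        theta = p * math.pi / n_proj    # assumed: angles evenly spread over 180 degrees
        c, s = math.cos(theta), math.sin(theta)
        for iy in range(size):
            for ix in range(size):
                # Address generation: detector position hit by this pixel.
                t = (ix - half) * c + (iy - half) * s + center
                t = min(max(t, 0.0), n_det - 1.0)
                i0 = min(int(t), n_det - 2)
                frac = t - i0           # interpolation factor
                # Linear interpolation between the two neighbouring samples.
                value = row[i0] + frac * (row[i0 + 1] - row[i0])
                # Data accumulation: read the pixel, add, write it back.
                image[iy][ix] += value
    return image
```

The inner expression `row[i0] + frac * (row[i0 + 1] - row[i0])` needs one subtraction, one multiplication and one addition per pixel, which is why it is a natural fit for a small fixed datapath.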

Page 5: Parallel Beam Back Projection: Implementation

[Figure: lookup tables and error sources for sinogram address generation]

Lookup tables:
LUT1: starting position, built from x0·cos θ, y0·sin θ, the pixel/detector spacing ratio and (N_detectors − 1)/2
LUT2: (pixel/detector)·sin θ
LUT3: (pixel/detector)·cos θ

Labels in the figure: corner starting position; critical error-accumulation path; LUT1 quantization error; LUT2 quantization error; LUT3 quantization error; bit reduction error; interpolation factor error
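A rough sketch of how three such tables could drive address generation, and how their quantization errors add up along a row or column, is given here. Everything in it is an assumption for illustration: the sign convention, the choice of pixel (0, 0) as the corner, a spacing ratio of 1, a single fractional width `frac_bits` shared by all three tables, and the helper names `quantize`, `build_luts` and `max_address_error`.

```python
import math

def quantize(value, frac_bits):
    """Round `value` to a fixed-point grid with `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale

def build_luts(n_proj, n_det, size, frac_bits):
    """Return one (LUT1, LUT2, LUT3) triple per projection angle."""
    half = (size - 1) / 2.0
    center = (n_det - 1) / 2.0
    luts = []
    for p in range(n_proj):
        theta = p * math.pi / n_proj
        c, s = math.cos(theta), math.sin(theta)
        start = center - half * c - half * s    # detector position of corner pixel (0, 0)
        luts.append((quantize(start, frac_bits),  # LUT1: starting position
                     quantize(s, frac_bits),      # LUT2: step per image row
                     quantize(c, frac_bits)))     # LUT3: step per image column
    return luts

def max_address_error(n_proj, n_det, size, frac_bits):
    """Worst difference between the table-driven and the exact detector address."""
    half = (size - 1) / 2.0
    center = (n_det - 1) / 2.0
    worst = 0.0
    for p, (lut1, lut2, lut3) in enumerate(build_luts(n_proj, n_det, size, frac_bits)):
        theta = p * math.pi / n_proj
        c, s = math.cos(theta), math.sin(theta)
        for iy in range(size):
            for ix in range(size):
                approx = lut1 + ix * lut3 + iy * lut2
                exact = (ix - half) * c + (iy - half) * s + center
                worst = max(worst, abs(approx - exact))
    return worst
```

Under these assumptions each table entry is off by at most half a least significant bit, so the address error for a `size` x `size` image is bounded by `(2 * size - 1) * 2 ** -(frac_bits + 1)`. The step errors are multiplied by the pixel index, which is one way to read the "critical error-accumulation path" label.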

Page 6: Parallel Beam Back Projection: Implementation

“Simple Architecture” Datapath

[Block diagram. Components: LUT 1, LUT 2, LUT 3, projection counter, write address counter, even and odd RAMs, local RAMs, mezzanine RAM, and SUB, MULT, ADD, ROUND, SWAP, MUX and DEMUX stages. Bus widths annotated in the figure range from 4 to 25 bits.]

Page 7: Parallel Beam Back Projection: Implementation

Performance Results: Software vs. FPGA Hardware

A  Software - Floating point - 450 MHz Pentium: ~240 s
B  Software - Floating point - 1 GHz Dual Pentium: ~94 s
C  Software - Fixed point - 450 MHz Pentium: ~50 s
D  Software - Fixed point - 1 GHz Dual Pentium: ~28 s
E  Hardware - 50 MHz: ~5.4 s

[Bar chart: execution time for cases A to E, axis from 0 to 250]

Parameters:
• 1024 projections
• 1024 samples per projection
• 512*512 pixels image
• 9-bit sinogram data
• 3-bit interpolation factor
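As a plausibility check on the hardware figure, the short calculation here estimates the run time from the slide's parameters. It assumes the datapath completes one pixel update per projection per clock cycle and ignores all start-up and host transfer overhead; the function name `estimated_runtime_s` is made up for this example.

```python
def estimated_runtime_s(n_proj, image_size, clock_hz, updates_per_cycle=1):
    """Idealized run time: total pixel updates divided by update rate."""
    updates = n_proj * image_size * image_size
    return updates / (clock_hz * updates_per_cycle)

t = estimated_runtime_s(n_proj=1024, image_size=512, clock_hz=50e6)
print(f"{t:.2f} s")  # prints "5.37 s"
```

The estimate of about 5.37 s is close to the ~5.4 s reported for case E, which suggests the assumption of one update per cycle is reasonable for this design.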

Page 8: Parallel Beam Back Projection: Implementation

Original image Hardware output image

Zoom: ~200%
Grayscale range < Pixel value range (heart features in focus)

Page 9: Parallel Beam Back Projection: Implementation

Original image Hardware output image

Zoom: ~200%
Grayscale range < Pixel value range (lung features in focus)

Page 10: Parallel Beam Back Projection: Implementation

Original image - Hardware output image

Page 11: Parallel Beam Back Projection: Implementation

Parallelism Issues

[Figure: the computation spans three axes (image rows, image columns, projections); V1, V2 and V3 mark the volume processed in each case]

Case 1: No parallelism extracted: T ~ k1·V1
Case 2: Pixel level parallelism extracted: T ~ k1·V2
Case 3: Projection level parallelism extracted: T ~ k2·V3

k1 < k2, V2 = V3 = V1/4, T = execution time

Memory bandwidth requirements at 50 MHz (for data accumulation):
Case 1: 0.4 GB/s
Case 2: 1.6 GB/s
Case 3: 0.4 GB/s

Memory bandwidth limit: 1.2 GB/s
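The bandwidth figures on this slide can be reproduced with a small calculation. The model is an assumption, not something stated on the slide: each pixel update costs one read and one write of a 4-byte accumulator word, pixel level parallelism touches four different pixels per cycle, and projection level parallelism combines four projections into a single pixel update per cycle.

```python
CLOCK_HZ = 50e6
LIMIT_GB_S = 1.2

def accumulation_bandwidth_gb_s(pixels_per_cycle, bytes_per_word=4, clock_hz=CLOCK_HZ):
    """Bandwidth for one read plus one write per updated pixel (assumed model)."""
    return clock_hz * pixels_per_cycle * 2 * bytes_per_word / 1e9

# Pixels written per clock cycle in each case (assumed, see the text above).
cases = {"Case 1": 1, "Case 2": 4, "Case 3": 1}
for name, pixels in cases.items():
    bw = accumulation_bandwidth_gb_s(pixels)
    verdict = "fits" if bw <= LIMIT_GB_S else "exceeds limit"
    print(f"{name}: {bw:.1f} GB/s ({verdict})")
```

Under this model only Case 2 exceeds the 1.2 GB/s limit, which is consistent with the later slides extracting projection level parallelism.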

Page 12: Parallel Beam Back Projection: Implementation

Advanced Architecture - Data Path
projection parallelism extracted

[Block diagram. Components: LUT 1.1 to LUT 4.3, projection counter, even and odd write counters, multiple even/odd RAM pairs, local RAMs, left and right mezzanine RAMs, and SUB, MULT, ADD, ROUND, SWAP, MUX and DEMUX stages. The “Simple Architecture” datapath is marked within the figure. Bus widths annotated in the figure range from 4 to 25 bits.]
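The idea of projection level parallelism can be sketched in software as follows. This is an illustration under assumptions, not a model of the actual datapath: the helper names `sample` and `back_project_grouped`, the evenly spaced angles, and the `accesses` counter (one read-modify-write of the image per count) are all invented for the example.

```python
import math

def sample(row, t):
    """Linearly interpolate `row` at fractional detector position `t`."""
    t = min(max(t, 0.0), len(row) - 1.0)
    i0 = min(int(t), len(row) - 2)
    return row[i0] + (t - i0) * (row[i0 + 1] - row[i0])

def back_project_grouped(sinogram, size, group=4):
    """Back project `group` projections at once; return (image, accesses)."""
    n_proj, n_det = len(sinogram), len(sinogram[0])
    center, half = (n_det - 1) / 2.0, (size - 1) / 2.0
    angles = [(math.cos(p * math.pi / n_proj), math.sin(p * math.pi / n_proj))
              for p in range(n_proj)]
    image = [[0.0] * size for _ in range(size)]
    accesses = 0
    for first in range(0, n_proj, group):
        for iy in range(size):
            for ix in range(size):
                # Sum the contributions of the whole group before touching memory.
                partial = 0.0
                for p in range(first, min(first + group, n_proj)):
                    c, s = angles[p]
                    t = (ix - half) * c + (iy - half) * s + center
                    partial += sample(sinogram[p], t)
                image[iy][ix] += partial   # one read-modify-write per group
                accesses += 1
    return image, accesses
```

With `group=4` the image memory is accessed a quarter as often as with `group=1` for the same result, which is the property that keeps the accumulation bandwidth unchanged while four projections are processed together.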

Page 13: Parallel Beam Back Projection: Implementation

Performance Results: Software vs. FPGA Hardware

A  Software - Floating point - 450 MHz Pentium: ~240 s
B  Software - Floating point - 1 GHz Dual Pentium: ~94 s
C  Software - Fixed point - 450 MHz Pentium: ~50 s
D  Software - Fixed point - 1 GHz Dual Pentium: ~28 s
E  Hardware - 50 MHz: ~5.4 s
F  Hardware (Advanced Architecture) - 50 MHz: ~1.3 s

[Bar chart: execution time for cases A to F, axis from 0 to 250]

Parameters:
• 1024 projections
• 1024 samples per projection
• 512*512 pixels image
• 9-bit sinogram data
• 3-bit interpolation factor
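The relative speedups implied by these timings can be worked out directly. The times are the approximate values from the slide, so the ratios are approximate as well; the dictionary `times_s` and the function `speedup` are names chosen for this example.

```python
# Approximate execution times in seconds, as listed on the slide.
times_s = {"A": 240.0, "B": 94.0, "C": 50.0, "D": 28.0, "E": 5.4, "F": 1.3}

def speedup(baseline, candidate):
    """How many times faster `candidate` is than `baseline`."""
    return times_s[baseline] / times_s[candidate]

for case in "ABCDE":
    print(f"F vs {case}: {speedup(case, 'F'):.1f}x")
```

The ratio between E and F comes out slightly above 4, in line with the 4-way parallel design; the rounding of both times to two significant digits easily accounts for the difference.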

Page 14: Parallel Beam Back Projection: Implementation
Page 15: Parallel Beam Back Projection: Implementation
Page 16: Parallel Beam Back Projection: Implementation
Page 17: Parallel Beam Back Projection: Implementation
Page 18: Parallel Beam Back Projection: Implementation
Page 19: Parallel Beam Back Projection: Implementation

Implementation Issues - fanout

prj_num(3): fanout = 1565!

routing delay = 7.913 ns (~39.99%)
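If the percentage is read as the share of routing delay in the total delay of that path (an assumption, since the slide does not say what the percentage refers to), the total path delay and the clock rate it would allow follow from simple arithmetic. The variable names are made up for this example.

```python
routing_delay_ns = 7.913
routing_share = 0.3999          # assumed: fraction of the total path delay

total_delay_ns = routing_delay_ns / routing_share
max_clock_mhz = 1000.0 / total_delay_ns

print(f"total path delay: {total_delay_ns:.2f} ns")
print(f"max clock for this path: {max_clock_mhz:.1f} MHz")
```

Under that reading the path takes a little under 20 ns, which would leave almost no margin at the 50 MHz clock quoted in the performance results.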

Page 20: Parallel Beam Back Projection: Implementation

Implementation Issues - fanout

odd_2_A_4[4]: fanout = 144!

Page 21: Parallel Beam Back Projection: Implementation

Memory Bridges Stuff

3 architectures implemented:
A “Simple Architecture” = non-parallel (on slide 6)
B “Advanced Architecture” = 4-way parallel (slide 12)
C “Bridge Free Advanced Arch” = as B, but contains no memory bridges (all design buffers in BlockRAMs) from the PCI bus to the memory banks, which are required for Host-Memory communication. The bridges are a separate design that is downloaded before (after) design C is downloaded, so that input data can be stored to (output data read from) the memories on the WildStar board.

Virtex1000 resource utilization:
A 11% logic, 90% BlockRAMs (with bridges)
B 39% logic, 100% BlockRAMs
C 21% logic, 100% BlockRAMs

Page 22: Parallel Beam Back Projection: Implementation

Floorplan of the “Bridge Free Advanced Architecture” (design C on the previous slide)

Page 23: Parallel Beam Back Projection: Implementation

Future Directions

• Graduate

