Efficient Multi-Ported Memories for FPGAs Eric LaForest Greg Steffan University of Toronto Computer Engineering Research Group February 22, 2010

Efficient Multi-Ported Memories for FPGAs

Eric LaForest, Greg Steffan

University of Toronto, Computer Engineering Research Group

February 22, 2010

Parallelism in FPGAs

Larger SoCs on FPGAs → parallel systems

Parallel systems on FPGAs will need: queueing, data sharing, communication, synchronization

Boils down to: FIFOs, register files

We can do all these with multi-ported memories

Multi-Ported Memory

Existing workarounds are ad hoc, “roll-your-own”, and have limited parallelism.

Conventional Approaches

2W/2R Multi-Ported Memory

Doesn't exist on FPGAs

Altera used to have one (Mercury)

Stratix III Building Blocks

Adaptive Logic Modules: registers, LUTs, adders
  Flexible, but slow

Block RAMs: M9K (e.g. 32 x 256), M144K (e.g. 32 x 4096)
  Fast, but inflexible

2W/2R Pure-ALM

Scales very poorly with memory depth

1W/nR Replication

Multiple read ports

Only one write port
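
As a rough behavioral sketch of replication (not the authors' implementation), the Python model below keeps one copy of the data per read port and lets the single write port update every copy. The class and method names are hypothetical, and each inner list merely stands in for a simple dual-port block RAM.

```python
class ReplicatedMemory:
    """Behavioral model of a 1W/nR memory built by replication (illustrative)."""

    def __init__(self, depth, num_read_ports):
        # One copy of the data per read port; each copy stands in for a
        # simple dual-port block RAM (one read port, one write port).
        self.replicas = [[0] * depth for _ in range(num_read_ports)]

    def write(self, addr, value):
        # The single write port updates every replica, so all copies
        # always hold the same contents.
        for replica in self.replicas:
            replica[addr] = value

    def read(self, port, addr):
        # Each read port reads only from its own replica.
        return self.replicas[port][addr]


mem = ReplicatedMemory(depth=4, num_read_ports=2)
mem.write(1, 42)
```

Adding read ports only costs more replicas, but a second write port has nowhere to go: every replica's write port is already in use.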

mW/nR Banking

Multiple write ports

Fragmented data
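
A minimal sketch of why banking fragments data, assuming each write port is tied to its own bank (the names below are hypothetical, not taken from the slides). Because nothing tracks which bank holds the latest value for an address, the reader has to name the bank itself.

```python
class BankedMemory:
    """Behavioral model of a banked memory (illustrative only)."""

    def __init__(self, depth_per_bank, num_banks):
        # Each bank is an independent memory with its own address space.
        self.banks = [[0] * depth_per_bank for _ in range(num_banks)]

    def write(self, port, addr, value):
        # Assumption: write port i can only reach bank i.
        self.banks[port][addr] = value

    def read(self, bank, addr):
        # The reader must already know which bank holds the data.
        return self.banks[bank][addr]


mem = BankedMemory(depth_per_bank=4, num_banks=2)
mem.write(0, 1, 42)  # write port 0 stores 42 at address 1 of bank 0
mem.write(1, 1, 23)  # write port 1 stores 23 at address 1 of bank 1
```

The same address now holds different values in different banks, which is the fragmentation the slide refers to.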

mW/nR “Multipumping”

Multiple read/write ports

No fragmentation

Divides clock speed

Read/write ordering
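
The trade-off can be sketched with a small behavioral model. It assumes one port is served per internal cycle and that all reads are served before any write within an external cycle; both are assumptions made for illustration, as are the names and the 400 MHz figure.

```python
class MultipumpedMemory:
    """Behavioral model of a multipumped memory (illustrative only)."""

    def __init__(self, depth, internal_mhz):
        self.data = [0] * depth
        self.internal_mhz = internal_mhz  # speed of the underlying memory

    def external_cycle(self, reads, writes):
        """Serve a list of read addresses and (addr, value) writes in one external cycle."""
        # Assumed ordering: reads are served in earlier internal cycles
        # than writes, so a read sees the value from before this cycle.
        results = [self.data[addr] for addr in reads]
        for addr, value in writes:
            self.data[addr] = value
        return results

    def external_mhz(self, num_ports):
        # Assumption: one port per internal cycle, so the external clock
        # is the internal clock divided by the number of ports.
        return self.internal_mhz / num_ports


mem = MultipumpedMemory(depth=4, internal_mhz=400.0)  # 400 MHz is a made-up figure
mem.external_cycle(reads=[], writes=[(1, 42)])
old = mem.external_cycle(reads=[1], writes=[(1, 7)])  # read and write the same address
```

Here `old` holds the value from before the write in the same cycle, which is the kind of ordering guarantee the slide warns about.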

Block RAMs: Simple Dual Port

Read, Write

Block RAMs: True Dual Port

R / W R / W

“Pure Multipumping”

Read as banked memory (multiple reads)

“Pure Multipumping”

Write as replicated memory (avoids fragmentation)

Methodology

Generate design variations over the design space
  Vary # of ports, depth, type of memories
    1W/2R to 8W/16R
    2 to 256 elements deep
    Pure-ALM, M9K, MLAB, Multipumped

Wrap in testbench for timing and correctness
  Target Stratix III with Quartus 9.0
  No synthesis optimizations for speed or area
  Standard P&R effort (speed, avg. over 10 runs)

Measure area as Total Equivalent Area
  Expresses area in a single unit (ALMs)

Conventional Multi-Porting Performance

1W/2R Pure-ALM Area vs. Speed

[Plot: area vs. speed, with axes oriented toward smaller and faster. Reference point: Nios II/f, 290 MHz, 500 ALMs]

Too big and slow!

1W/2R Replicated vs. Pure-ALM

1W/2R “Pure Multipumping”

LVT-Based Multi-Ported Memories

LVT-Based Memory

LVT-Based Memory

Begin with one block RAM

LVT-Based Memory

Replicate for two read ports

LVT-Based Memory

Bank for two write ports

LVT-Based Memory

Select bank to read from

LVT-Based Memory

Add bank lookup table

LVT-Based Memory

Live Value Table Operation

LVT Operation

2W/2R, 4-deep

LVT Operation

[Figure: Live Value Table with entries 0 to 3; write ports W0 and W1 supply write addresses, read ports R0 and R1 supply read addresses]

LVT Operation: Write

[Figure: W0 writes 42 @ 1 and W1 writes 23 @ 3; the LVT stores 0 at entry 1 and 1 at entry 3]

Records which write port last updated a location

LVT Operation: Read

[Figure: R0 reads @ 1 and R1 reads @ 3; the LVT entries at those addresses select the bank for each read port]

Steers read port to correct memory bank

LVT Implementation

LVT remains practical because it is very narrow

LVT Operation

Small Pure-ALM memory controlling larger block RAMs
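
The write and read steps described on the preceding slides can be sketched as a behavioral Python model. This is an illustration under stated assumptions, not the authors' hardware design: the class, method, and attribute names are hypothetical, and plain lists stand in for both the block RAMs and the Pure-ALM table.

```python
class LVTMemory:
    """Behavioral model of an mW/nR LVT-based memory (illustrative only)."""

    def __init__(self, depth, num_write_ports, num_read_ports):
        # One bank per write port; inside each bank, one replica per
        # read port. Each replica stands in for a block RAM.
        self.banks = [
            [[0] * depth for _ in range(num_read_ports)]
            for _ in range(num_write_ports)
        ]
        # Live Value Table: for each address, the index of the write
        # port that last updated it.
        self.lvt = [0] * depth

    def write(self, port, addr, value):
        # A write port updates every replica in its own bank...
        for replica in self.banks[port]:
            replica[addr] = value
        # ...and records itself as the last writer of this address.
        self.lvt[addr] = port

    def read(self, port, addr):
        # The LVT steers the read port to the bank holding the live value.
        bank = self.lvt[addr]
        return self.banks[bank][port][addr]


mem = LVTMemory(depth=4, num_write_ports=2, num_read_ports=2)
mem.write(0, 1, 42)  # W0 writes 42 @ 1
mem.write(1, 3, 23)  # W1 writes 23 @ 3
```

Reading address 1 on either read port now returns 42 from bank 0, and reading address 3 returns 23 from bank 1, so the banked storage behaves as one consistent memory.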

Advantages of LVTs

LVTs add a layer of indirection
  Everything operates in parallel
  Makes banked memory behave as consistent unit

LVTs are narrow
  Word width = log2(# of write ports)
  < 4 bits typically

Pure-ALM, but practical size and speed
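
To make the width claim concrete, here is a small sizing sketch. It assumes the word width rounds up to whole bits, and it takes the replica count from the construction steps in the earlier slides (one bank per write port, one replica per read port); the function names are hypothetical.

```python
def lvt_word_width(num_write_ports):
    """Bits needed to name one write port: ceil(log2(num_write_ports))."""
    # (n - 1).bit_length() equals ceil(log2(n)) for any integer n >= 1.
    return (num_write_ports - 1).bit_length()


def lvt_total_bits(depth, num_write_ports):
    """Total LVT storage: one entry per memory location."""
    return depth * lvt_word_width(num_write_ports)


def block_ram_replicas(num_write_ports, num_read_ports):
    """Replicas needed: one bank per write port, one replica per read port."""
    return num_write_ports * num_read_ports


width_2w = lvt_word_width(2)
width_8w = lvt_word_width(8)
bits_256_deep_8w = lvt_total_bits(256, 8)
replicas_2w4r = block_ram_replicas(2, 4)
```

Even at the largest configuration listed in the methodology (8 write ports, 256 elements deep), each LVT entry stays within the "< 4 bits" figure on the slide.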

LVT Performance

2W/4R Pure-ALM

2W/4R LVT-based vs. Pure-ALM

84% smaller

43% faster

412 MHz to 375 MHz

2W/4R Multipumping

Must be careful about read/write ordering!

Multipumping Performance

2W/4R Multipumping

2W/4R Multipumping

Pure Multipumping (279 MHz)

4W/8R Multipumping

Worsens as # of ports increases

2W/4R Multipumping

28% smaller on average

193 MHz to 174 MHz

54% slower on average

Conclusions

Pure multipumped memories are better for memories with few ports or low speed.

LVT-based memories are faster and smaller than Pure-ALM memories.

LVT-based memories are faster than pure multipumping, but at a cost in area.

Future Work

Pure multipumping for LVT-based memories
  Build banks with 2W/4R pure multipumping blocks
  Possible further area improvement

Relaxing the read/write order for multipumping
  Allows multiplexing the write ports
  Leaves designer to watch for WAR violations

Thank You

