A Multi-Ported Memory Compiler Utilizing True Dual-port BRAMs

Ameer Abdelhadi and Guy Lemieux
Department of Electrical and Computer Engineering
University of British Columbia
Vancouver, Canada

May 3rd, 2016

Motivation (1): FPGAs as parallel accelerators

• Used as parallel accelerators
• Have dual-ported memories only

[Figure: FPGA resources: 1000's of dual-ported Block RAMs, 1,000,000's of logic elements, 1000's of multipliers/DSPs]

Motivation (2): Mixed port requirements

[Figure: example datapath with functional units (/, √x, 1/x, f, g, f/g, ALU, >>), registers R0,0, R1,0, R2,0 and a shared r/w bus]

• Multi-porting approaches provide simple (fixed) ports only
• Waste of resources if these ports are not active simultaneously

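Live-Value Table (LVT)

Multi-read: Replication (Easy!)
[Figure: write port W feeds two 2-port RAMs (RAM1, RAM2); read ports R0 and R1 each read one copy]

Multi-write: LVT (Hard!)
[Figure: LVT with 2 write and 1 read ports. W0 writes 2-port RAM1 and W1 writes 2-port RAM2; in the LVT, W0 always writes 0's and W1 always writes 1's, and the LVT output selects which RAM drives read port R]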
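
The multi-write idea on this slide can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the class name `LvtMemory2W1R` and its methods are hypothetical, the model is untimed, and it does not handle two ports writing the same address in the same cycle.

```python
class LvtMemory2W1R:
    """Untimed model of a 2-write/1-read memory: two data banks plus an LVT."""

    def __init__(self, depth):
        # One data bank per write port; each bank is written only by its own port.
        self.banks = [[0] * depth for _ in range(2)]
        # The LVT stores, per address, the ID of the write port that wrote it last.
        self.lvt = [0] * depth

    def write(self, port, addr, data):
        self.banks[port][addr] = data
        self.lvt[addr] = port  # port 0 always writes 0's, port 1 always writes 1's

    def read(self, addr):
        # The LVT entry selects the bank that holds the live value.
        return self.banks[self.lvt[addr]][addr]


mem = LvtMemory2W1R(depth=8)
mem.write(0, 3, 0xAA)   # W0 writes address 3
mem.write(1, 3, 0xBB)   # W1 overwrites address 3 later
mem.write(0, 5, 0xCC)   # W0 writes address 5
print(hex(mem.read(3)), hex(mem.read(5)))
```

The read of address 3 returns the value from the second bank because the LVT entry for that address was last set by port 1.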

Data Banks Optimization

• LVT-based multi-ported RAM is composed of:
  1) LVT - tracks changes
  2) Data banks - store data copies
• Our previous work (I-LVT / FPGA'14) optimizes the LVT only
• This work
  • Optimizes the data banks (not the LVT!)
  • The first technique that requires a CAD tool

[Figure: Data banks: RAM 0, RAM 1, ..., RAM nW-1, each with 1 write / nR read ports, addressed by WAddr and RAddr; the LVT drives BankSel]

This work solves the final and most important problem: Block RAM allocation

Mixed Port Requirements (1): Fixed ports

[Figure: example datapath with functional units, registers R0,0, R1,0, R2,0 and a shared r/w bus]

Fixed (simple) ports: The majority of multi-ported memories support fixed ports only

Mixed Port Requirements (2): True ports

[Figure: example datapath with functional units, registers R0,0, R1,0, R2,0 and a shared r/w bus]

True ports: Some techniques support the construction of multi-true-ports

BRAMs in FPGAs are true dual-ported

Mixed Port Requirements (3): Switched ports

[Figure: example datapath with functional units, registers R0,0, R1,0, R2,0 and a shared r/w bus]

Switched ports: A number of writes are switched with a number of reads

True ports are a special case of switched ports
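
The three port kinds can be captured in one small data structure. The sketch below is an illustration only: the `SwitchedPort` class and its method names are hypothetical, and treating a fixed port as a switched port with zero writes or zero reads is an assumption made here for uniformity, not something the slides state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwitchedPort:
    """Hypothetical descriptor: `writes` write ports switched with `reads` read ports."""
    writes: int
    reads: int

    def fixed_ports(self):
        # Fixed-port view: every write and every read gets its own dedicated port.
        return self.writes + self.reads

    def peak_active(self):
        # Assumption: the writes and the reads of one switched port are mutually
        # exclusive, so at most max(writes, reads) accesses happen per cycle.
        return max(self.writes, self.reads)


TRUE_PORT = SwitchedPort(writes=1, reads=1)    # special case: 1 write switched with 1 read
FIXED_WRITE = SwitchedPort(writes=1, reads=0)  # assumed degenerate case
FIXED_READ = SwitchedPort(writes=0, reads=1)   # assumed degenerate case

requirements = [SwitchedPort(2, 2), TRUE_PORT, FIXED_WRITE, FIXED_READ]
print(sum(p.fixed_ports() for p in requirements))  # 8 ports if everything is fixed
print(sum(p.peak_active() for p in requirements))  # 5 accesses active at most
```

The gap between the two totals is the slack that a fixed-port implementation pays for and a switched-port implementation can try to recover.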

Switched Ports (1): Example

[Figure: example datapath with functional units, registers R0,0, R1,0, R2,0 and a shared r/w bus]

Key Observation: BRAMs' true ports can be utilized to optimize switched ports

Objective: Optimize the construction of multi-switched ports

Switched Ports (2): Fixed ports abstraction

[Figure: the example datapath with its memory accesses abstracted as fixed ports]

Switched Ports (3): Fixed data banks

[Figure: the fixed ports of the example datapath implemented with fixed data banks and an I-LVT]

Switched Ports (4): DFG modeling

Complete Bigraph: Vertex = Port, Edge = 1W1R BRAM

[Figure: the fixed ports of the example datapath modeled as a complete bigraph]
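
The modeling step can be sketched in a few lines: every write port is connected to every read port, and each edge stands for one 1W1R BRAM. The function name `complete_bigraph` and the port counts (2 writes, 3 reads) are arbitrary choices for illustration and are not taken from the slides.

```python
from itertools import product


def complete_bigraph(n_writes, n_reads):
    """Return write vertices, read vertices, and all write-to-read edges."""
    writes = [f"W{i}" for i in range(n_writes)]
    reads = [f"R{j}" for j in range(n_reads)]
    # In the fixed-port model each edge is implemented by one 1W1R BRAM.
    edges = set(product(writes, reads))
    return writes, reads, edges


writes, reads, edges = complete_bigraph(n_writes=2, n_reads=3)
print(len(edges))     # number of 1W1R BRAMs when every edge gets its own BRAM
print(sorted(edges))
```

With every edge mapped to its own BRAM, the data-bank cost grows as the product of the write and read port counts, which is the baseline the covering step tries to improve on.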

Switched Ports (5): Switched DFG

Complete Bigraph: Vertex = Port, Biclique pattern = BRAM

[Figure: the example datapath with switched ports, modeled as a bigraph]

Switched Ports (6): DFG Covering

Complete Bigraph: Vertex = Port, Biclique pattern = BRAM

[Figure: the bigraph is covered step by step with biclique patterns, one BRAM each: one pattern over vertices W1, W2, R1, R2; one pattern over vertices W1, R1, R2; and six single-edge W-R patterns]

Switched Ports (7): Switched data banks

[Figure: the eight covering patterns (one over W1, W2, R1, R2; one over W1, R1, R2; six single-edge W-R) implemented as data banks with an I-LVT, shown next to the fixed-port implementation with its own I-LVT]

Fixed Ports (Complete Bigraph): 12 BRAMs
Switched Ports (Optimized Bigraph): 8 BRAMs (33% reduction)

Multi-switched-ports Compiler

• A RAM compiler optimizes data banks construction
• Generates DFG from port requirements
• Solves set-covering problem on all edges
  • Covers are predefined biclique patterns
  • Solved as BLP problem
• Generates Verilog modules based on optimal covering
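
The covering step can be illustrated on a toy instance. The sketch below is not the compiler's solver: it replaces the BLP formulation with an exhaustive search (each candidate pattern is either chosen or not, which is the same binary decision), and the pattern names and the edge set of the `dual_W1R1_W2R2` pattern are hypothetical stand-ins for the predefined biclique patterns.

```python
from itertools import combinations


def min_cover(edges, patterns):
    """Smallest list of pattern names whose edge sets together contain every edge."""
    names = list(patterns)
    for k in range(1, len(names) + 1):          # try small covers first
        for chosen in combinations(names, k):
            covered = set().union(*(patterns[n] for n in chosen))
            if edges <= covered:
                return list(chosen)             # one BRAM per chosen pattern
    return None                                 # the patterns cannot cover the graph


# Complete bigraph for 2 writes and 3 reads: 6 edges.
edges = {(w, r) for w in ("W1", "W2") for r in ("R1", "R2", "R3")}

# Candidate patterns: one simple 1W1R BRAM per edge ...
patterns = {f"simple_{w}_{r}": {(w, r)} for (w, r) in sorted(edges)}
# ... plus one assumed true dual-port pattern (W1/R1 on one BRAM port, W2/R2 on the other).
patterns["dual_W1R1_W2R2"] = {(w, r) for w in ("W1", "W2") for r in ("R1", "R2")}

best = min_cover(edges, patterns)
print(len(best), sorted(best))
```

On this toy instance the cover needs 3 BRAMs instead of the 6 used when every edge gets its own BRAM. Exhaustive search is only practical for very small graphs, which is one reason to hand the same binary formulation to a dedicated solver.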

Available as open source contribution:
https://github.com/AmeerAbdelhadi
http://www.ece.ubc.ca/~lemieux/downloads/

Supports bypassing (RAW & RDW) and initialization

Graphical User Interface (GUI)


Source of inspiration: Multi-True-Ports by Choi et al. / UofT

• Provides true ports only (no simple/fixed ports)
• Is a special case of our generalized approach
• Doesn't need a CAD tool

[Figure: n read / n write RAM with register-based LVT (R/W data ports S0, S1, S2, S3, ..., Sn-1), and a 3 read / 3 write RAM with register-based LVT]

Experimental Results

• Run-in-batch flow manager
• Uses Altera's Quartus II for synthesis on Stratix V
• Uses Altera's ModelSim for verification with:
  • Random vectors
  • Over a million RAM access cycles
• Results on random test-cases
• Up to 8 switched ports
• Up to 4 writes and 4 reads per switched port
• Up to 28 writes/reads per test-case

Baseline           Average BRAM Reduction   Average ALMs Reduction   Average Fmax Increase
Best of Previous   18%                      -3%                      -1%
True Ports         42%                      53%                      15%

Conclusions

• A methodology to support switched write/read functionality
• True dual-ported BRAMs are utilized to optimize the RAM allocation
• A RAM compiler optimizes the problem
• An additional 18% average BRAM reduction compared to the best of other approaches
• Practical solution:
  • Initialization
  • Bypassing
  • Available as open source

Future Directions

• Applications
  • Parallel computation
  • HLS – storage binding
• Optimization of switched-port assignment
  • Extraction of mutually-exclusive functions from HDL
• Statistical approach
  • Ports which are mutually-exclusive in most cases can use a switched port
  • Access conflicts will be rare
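
One way to make the statistical idea concrete is to measure how often two candidate ports are active in the same cycle. The slides do not describe a procedure, so the sketch below is purely illustrative: the function name `conflict_rate` and the trace format (one pair of activity flags per cycle) are assumptions.

```python
def conflict_rate(trace):
    """Fraction of cycles in which both ports of a candidate pair are active."""
    conflicts = sum(1 for a_active, b_active in trace if a_active and b_active)
    return conflicts / len(trace)


# Hypothetical per-cycle activity of two ports (port A active, port B active).
trace = [
    (True, False),
    (False, True),
    (False, False),
    (True, True),   # the only cycle where both ports want access
    (True, False),
]
print(conflict_rate(trace))
```

A pair with a low rate would be a candidate for sharing a switched port, provided the design has some way to resolve the rare conflicting cycles.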

Thank You!