A Multi-Ported Memory Compiler Utilizing True Dual-port BRAMs
Ameer Abdelhadi and Guy Lemieux
Department of Electrical and Computer Engineering
University of British Columbia
Vancouver, Canada
May 3rd, 2016
Motivation (1): FPGAs as parallel accelerators
• Used as parallel accelerators
• Have dual-ported memories only
[Figure: FPGA resources: 1000's of dual-ported Block RAMs, 1,000,000's of logic elements, 1000's of multipliers/DSPs]
Motivation (2): Mixed port requirements
[Figure: example datapath; labels include /, √x, 1/x, ALU, f, g, f/g, a 0/1 multiplexer, >>, a shared r/w bus, and registers R0,0, R1,0, R2,0]
• Multi-porting approaches provide simple (fixed) ports only
• Resources are wasted if these ports are not active simultaneously
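Live-Value Table (LVT)
• Multi-read: replication (easy!)
  [Figure: write port W feeds 2-port RAM1 and 2-port RAM2; read ports R0 and R1 each read one copy]
• Multi-write: LVT (hard!)
  [Figure: write ports W0 and W1 write to 2-port RAM1 and 2-port RAM2; an LVT with 2 write and 1 read ports selects the RAM that serves read port R. In the LVT, one write port always writes 0's and the other always writes 1's]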
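The two approaches on this slide can be sketched as a small behavioral model. This is a minimal sketch, not the authors' Verilog: the class names, the `depth` parameter and the method signatures are assumptions made for illustration, and timing (same-cycle writes, bypassing) is ignored.

```python
class ReplicatedRam1W2R:
    """Multi-read by replication: one write port updates every copy."""

    def __init__(self, depth):
        # Two copies of the data, standing in for 2-port RAM1 and RAM2.
        self.copies = [[0] * depth for _ in range(2)]

    def write(self, addr, data):
        for copy in self.copies:
            copy[addr] = data

    def read(self, port, addr):
        # Read port i reads its own copy.
        return self.copies[port][addr]


class LvtRam2W1R:
    """Multi-write with an LVT: one data bank per write port."""

    def __init__(self, depth):
        self.banks = [[0] * depth for _ in range(2)]
        # Live-value table: per address, the ID of the port that wrote last.
        self.lvt = [0] * depth

    def write(self, port, addr, data):
        self.banks[port][addr] = data
        # Port 0 always records a 0 in the LVT, port 1 always records a 1.
        self.lvt[addr] = port

    def read(self, addr):
        # The LVT entry selects the bank that holds the live value.
        return self.banks[self.lvt[addr]][addr]


ram = LvtRam2W1R(depth=8)
ram.write(0, addr=3, data=0xAA)  # W0 writes address 3
ram.write(1, addr=5, data=0xBB)  # W1 writes address 5
ram.write(1, addr=3, data=0xCC)  # later, W1 overwrites address 3
print(hex(ram.read(3)), hex(ram.read(5)))
```

Replication is the easy case because a single writer keeps all copies identical. With two writers the copies diverge, so the LVT is needed to remember which bank was written last for each address.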
Data Banks Optimization
• LVT-based multi-ported RAM is composed of:
  1) LVT: tracks changes
  2) Data banks: store data copies
• Our previous work (I-LVT, FPGA'14) optimizes the LVT only
• This work
  • Optimizes the data banks (not the LVT!)
  • The first technique that requires a CAD tool
[Figure: data banks RAM 0, RAM 1, ..., RAM nW-1, each with 1 write / nR read ports, driven by WAddr and RAddr; the LVT produces BankSel]
This work solves the final step and most important problem of Block RAM allocation
Mixed Port Requirements (1): Fixed ports
[Figure: the example datapath from Motivation (2), illustrating fixed ports]
• Fixed (simple) ports: the majority of multi-ported memories support fixed ports only
Mixed Port Requirements (2): True ports
[Figure: the example datapath from Motivation (2), illustrating true ports]
• True ports: some techniques support the construction of multi-true-ports
• BRAMs in FPGAs are true dual-ported
Mixed Port Requirements (3): Switched ports
[Figure: the example datapath from Motivation (2), illustrating switched ports]
• Switched ports: a number of writes are switched with a number of reads
• True ports are a special case of switched ports
Switched Ports (1): Example
[Figure: the example datapath from Motivation (2)]
• Key observation: BRAMs' true ports can be utilized to optimize switched ports
• Objective: optimize the construction of multi-switched ports
Switched Ports (2): Fixed ports abstraction
[Figure: the example datapath from Motivation (2), with its memory accesses abstracted as fixed ports]
Switched Ports (3): Fixed data banks
[Figure: the fixed-ports abstraction of the example datapath, implemented with fixed data banks and an I-LVT]
Switched Ports (4): DFG modeling
• Complete bigraph
• Vertex: port
• Edge: 1W1R BRAM
[Figure: the fixed ports of the example datapath modeled as a complete bigraph]
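The fixed-port model above can be sketched in a few lines: every write port is connected to every read port, and each edge stands for one 1W1R BRAM. The port names and the 2-write/3-read configuration are hypothetical and are not taken from the slide's example.

```python
from itertools import product


def complete_bigraph(write_ports, read_ports):
    """Return the edges of the complete bipartite graph writes x reads."""
    return set(product(write_ports, read_ports))


# Hypothetical port requirements, chosen only for illustration.
writes = ["W0", "W1"]
reads = ["R0", "R1", "R2"]

edges = complete_bigraph(writes, reads)
# With fixed ports, each edge is implemented by its own 1W1R BRAM.
print(len(edges), sorted(edges))
```

Under this model the BRAM count of a fixed-port design is simply the number of write ports times the number of read ports, which is the baseline that the covering step tries to improve on.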
Switched Ports (5): Switched DFG
• Complete bigraph
• Vertex: port
• Biclique pattern: BRAM
[Figure: the example datapath from Motivation (2) with switched ports, modeled as a bigraph]
Switched Ports (6): DFG covering
• Complete bigraph
• Vertex: port
• Biclique pattern: BRAM
[Figure: the bigraph is covered one biclique pattern at a time, in eight steps: a "W1 R1 / W2 R2" pattern, a "W1 R1 / R2" pattern, and six "W / R" patterns]
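The covering step can be illustrated with a brute-force minimum set cover over candidate patterns. This is a sketch under stated assumptions: the slides do not spell out which edges each BRAM pattern is allowed to cover, so the pattern names and edge sets below (including `tdp_both`, a single true dual-port BRAM assumed to cover all four edges) are hypothetical, and brute force stands in for the solver used by the compiler.

```python
from itertools import combinations, product


def min_cover(edges, patterns):
    """Return a smallest list of pattern names whose edges cover `edges`.

    Brute force: tries all subsets in order of increasing size.
    """
    names = sorted(patterns)
    for k in range(1, len(names) + 1):
        for chosen in combinations(names, k):
            covered = set().union(*(patterns[n] for n in chosen))
            if edges <= covered:
                return list(chosen)
    return None  # the candidate patterns cannot cover the graph


# Hypothetical 2-write/2-read bigraph.
edges = set(product(["W1", "W2"], ["R1", "R2"]))

# Candidate patterns: one 1W1R BRAM per edge, plus one hypothetical
# true dual-port pattern that covers the whole biclique with one BRAM.
patterns = {f"bram_{w}_{r}": {(w, r)} for (w, r) in edges}
patterns["tdp_both"] = set(edges)

print(min_cover(edges, patterns))
```

With only the 1W1R patterns the cover needs four BRAMs; once the larger pattern is available, one BRAM suffices. Each selected pattern corresponds to one BRAM, so a smaller cover means fewer BRAMs.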
Switched Ports (7): Switched data banks
[Figure: the selected patterns ("W1 R1 / W2 R2", "W1 R1 / R2", and six "W / R") implement the switched data banks; both the fixed and the switched design use an I-LVT]
• Fixed ports (complete bigraph): 12 BRAMs
• Switched ports (optimized bigraph): 8 BRAMs (33% reduction)
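The reduction quoted for this example follows directly from the two BRAM counts:

```latex
\[
  \text{BRAM reduction} = \frac{12 - 8}{12} = \frac{1}{3} \approx 33\%
\]
```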
Multi-switched-ports Compiler
• A RAM compiler optimizes data banks construction
• Generates a DFG from port requirements
• Solves a set-covering problem on all edges
  • Covers are predefined biclique patterns
  • Solved as a BLP problem
• Generates Verilog modules based on the optimal covering
• Available as an open source contribution:
  https://github.com/AmeerAbdelhadi
  http://www.ece.ubc.ca/~lemieux/downloads/
• Supports bypassing (RAW & RDW) and initialization
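For reference, the set-covering step can be written in the textbook form below. This is a sketch that reads BLP as binary linear programming, which is an assumption about the abbreviation; the symbols are introduced here for illustration, and the compiler's actual objective or constraints may differ. E is the set of bigraph edges, P the set of candidate biclique patterns, and x_p is 1 when pattern p (one BRAM) is selected.

```latex
\[
\begin{aligned}
\text{minimize}   \quad & \sum_{p \in P} x_p \\
\text{subject to} \quad & \sum_{p \in P \,:\, e \in p} x_p \ge 1 && \forall e \in E \\
                        & x_p \in \{0, 1\} && \forall p \in P
\end{aligned}
\]
```

The objective counts the selected BRAMs, and each constraint requires that every write-to-read edge is covered by at least one selected pattern.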
Graphical User Interface (GUI)
Source of inspiration: Multi-True-Ports by Choi et al. / UofT
• Provides true ports only (no simple/fixed ports)
• Is a special case of our generalized approach
• Doesn't need a CAD tool
[Figure: multi-true-port RAM built from R/W RAMs and a register-based LVT, shown for 3 read / 3 write and for n read / n write]
Experimental Results
• Run-in-batch flow manager
• Uses Altera's Quartus II for synthesis on Stratix V
• Uses Altera's ModelSim for verification with:
  • Random vectors
  • Over a million RAM access cycles
• Results on random test-cases
• Up to 8 switched ports
• Up to 4 writes and 4 reads per switched port
• Up to 28 writes/reads per test-case

Baseline            Average BRAM reduction   Average ALM reduction   Average Fmax increase
Best of previous    18%                      -3%                     -1%
True ports          42%                      53%                     15%
Conclusions
• A methodology to support switched write/read functionality
• True dual-ported BRAMs are utilized to optimize the RAM allocation
• A RAM compiler solves the optimization problem
• An additional 18% average BRAM reduction compared to the best of other approaches
• Practical solution:
  • Initialization
  • Bypassing
  • Available as open source
Future Directions
• Applications
  • Parallel computation
  • HLS: storage binding
• Optimization of switched-port assignment
  • Extraction of mutually-exclusive functions from HDL
• Statistical approach
  • Ports which are mutually-exclusive in most cases can use a switched port
  • Access conflicts will be rare
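The statistical direction could start from a measurement like the one below: given per-cycle activity traces for a group of ports, estimate how often two or more of them are active at once. This is only a sketch; the function name, the trace format and the sample data are assumptions, and the slide does not say how a conflict would be resolved when it does occur.

```python
def conflict_rate(traces):
    """Fraction of cycles in which two or more ports of a group are active.

    `traces` maps a port name to a list of booleans, one entry per cycle.
    """
    cycles = list(zip(*traces.values()))
    if not cycles:
        return 0.0
    conflicts = sum(1 for active in cycles if sum(active) > 1)
    return conflicts / len(cycles)


# Hypothetical activity traces for two ports considered for one switched port.
traces = {
    "W0": [True, False, False, True, False, False, False, True],
    "R0": [False, True, False, True, False, True, False, False],
}
print(conflict_rate(traces))
```

A group of ports with a low conflict rate is a candidate for sharing a switched port.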
Thank You!