PipeRoute: A Pipelining-Aware Router for FPGAs
Akshay Sharma, Carl Ebeling* and Scott Hauck
Electrical Engineering / *Computer Science & Engineering
University of Washington
Seattle, WA 98195
Pipelined FPGA Architectures
• FPGAs enable flexible computing
• But what about maximum clock frequency?
• Examples of pipelined FPGAs
  • RaPiD (Ebeling et al., 1996)
  • HSRA (Tsu et al., 1999)
  • UCSB (Singh et al., 2001)
• A few prominent features
  • Some (or all) switch-points are registered
  • Registered LUT inputs
  • Netlists are heavily pipelined and retimed
Pipelined Routing
• PipeRoute: routes netlists on pipelined FPGAs
  • The pipelined netlist specifies the register separation between each signal's source and its sinks
  • The FPGA routing graph consists of R-nodes and D-nodes
  • The cost of using an R-node or D-node in a route is computed as in Pathfinder
• The pipelined routing problem differs from normal FPGA routing
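For reference, the Pathfinder negotiated-congestion router (McMurchie and Ebeling) is usually stated with a per-node cost of the form below, where b_n is the node's base cost, h_n its history cost accumulated over earlier routing iterations, and p_n the present-congestion penalty. The slides only say that PipeRoute charges R-nodes and D-nodes the same way Pathfinder does; exact definitions of h_n and p_n vary between implementations, so treat this as a sketch rather than PipeRoute's precise cost function.

```latex
c_n = (b_n + h_n) \cdot p_n
```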
Normal Routing – Two Terminal
• Dijkstra’s shortest-path algorithm for two-terminal routing
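A minimal Python sketch of two-terminal routing, assuming the routing graph is an adjacency dict and that a Pathfinder-style cost is charged per node. The names `dijkstra_route`, `graph` and `cost` are illustrative, not taken from PipeRoute.

```python
import heapq
from typing import Dict, List, Optional

# Hypothetical routing graph: node -> list of nodes it can drive.
Graph = Dict[str, List[str]]


def dijkstra_route(graph: Graph, cost: Dict[str, float],
                   source: str, sink: str) -> Optional[List[str]]:
    """Return the cheapest source-to-sink path, or None if unreachable.

    Path cost is the sum of the costs of every node after the source.
    """
    best = {source: 0.0}
    parent: Dict[str, str] = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == sink:
            # Walk parent pointers back to the source.
            path = [u]
            while path[-1] != source:
                path.append(parent[path[-1]])
            return path[::-1]
        if d > best.get(u, float("inf")):
            continue  # stale heap entry
        for v in graph.get(u, []):
            nd = d + cost[v]
            if nd < best.get(v, float("inf")):
                best[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return None


if __name__ == "__main__":
    g = {"S": ["a", "b"], "a": ["T"], "b": ["c"], "c": ["T"]}
    c = {"S": 0, "a": 5, "b": 1, "c": 1, "T": 1}
    print(dijkstra_route(g, c, "S", "T"))  # ['S', 'b', 'c', 'T']
```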
Pipeline Routing – Two Terminal
• Find the shortest route that passes through N registers (hereafter, “registers” are called “delays”)
• Related to the Traveling Salesman problem
  • Find the shortest route that visits all nodes in a graph
  • NP-complete
Two Terminal 1-Delay Router
• 1-delay routes can be found optimally using Dijkstra’s algorithm
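One way to see why the 1-delay case stays within reach of Dijkstra's algorithm is to search over (node, delays-so-far) states, so that entering a D-node moves the search up one layer. The sketch below assumes each D-node contributes exactly one delay and that the source and sink are not D-nodes; `route_with_delays` is a hypothetical name. This simple layered search does not stop the 0-delay and 1-delay portions of a path from reusing the same node, which a real router must prevent, so it illustrates the idea rather than reproducing PipeRoute's exact algorithm.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[str, List[str]]


def route_with_delays(graph: Graph, cost: Dict[str, float],
                      d_nodes: Set[str], source: str, sink: str,
                      delays: int = 1) -> Optional[List[str]]:
    """Cheapest source-to-sink path that enters exactly `delays` D-nodes.

    Search state is (node, delays picked up so far); entering a D-node
    moves the search to the next layer. Assumes source/sink are not D-nodes.
    """
    start, goal = (source, 0), (sink, delays)
    best = {start: 0.0}
    parent: Dict[Tuple[str, int], Tuple[str, int]] = {}
    heap = [(0.0, source, 0)]
    while heap:
        d, u, k = heapq.heappop(heap)
        if (u, k) == goal:
            path, state = [u], goal
            while state != start:
                state = parent[state]
                path.append(state[0])
            return path[::-1]
        if d > best.get((u, k), float("inf")):
            continue  # stale heap entry
        for v in graph.get(u, []):
            nk = k + (1 if v in d_nodes else 0)
            if nk > delays:
                continue  # would pick up too many delays
            nd = d + cost[v]
            if nd < best.get((v, nk), float("inf")):
                best[(v, nk)] = nd
                parent[(v, nk)] = (u, k)
                heapq.heappush(heap, (nd, v, nk))
    return None
```

Calling it with `delays=0` reduces it to ordinary two-terminal Dijkstra routing.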
Two Terminal N-Delay Router
• Greedy approximation via the 1-delay router
  • Find a 1-delay route
  • While the route has fewer delays than required:
    • Replace any 0-delay segment with its cheapest 1-delay replacement
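The greedy loop can be sketched as follows. This is a hypothetical reading of the slide: the anchors are the route's endpoints plus every D-node it already uses, each pair of consecutive anchors bounds a 0-delay segment, and each iteration applies the single replacement that raises the route's cost the least. Nodes on the rest of the route are blocked so the new segment cannot overlap it. The function names, the per-node cost model and the "smallest cost increase" selection rule are all assumptions.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[str, List[str]]


def one_delay_path(graph: Graph, cost: Dict[str, float], d_nodes: Set[str],
                   src: str, dst: str,
                   blocked: Set[str]) -> Optional[Tuple[float, List[str]]]:
    """Cheapest src->dst path with exactly one D-node strictly between the
    endpoints, avoiding `blocked` nodes. Returns (cost, path) or None."""
    best = {(src, 0): 0.0}
    parent: Dict[Tuple[str, int], Tuple[str, int]] = {}
    heap = [(0.0, src, 0)]
    while heap:
        d, u, k = heapq.heappop(heap)
        if (u, k) == (dst, 1):
            path, state = [u], (u, k)
            while state != (src, 0):
                state = parent[state]
                path.append(state[0])
            return d, path[::-1]
        if d > best[(u, k)] or u == dst:
            continue  # stale entry, or dst reached with no delay picked up
        for v in graph.get(u, []):
            if v in blocked or v == src:
                continue
            # dst's own delay (if it is a D-node) is already on the route.
            nk = k + (1 if v in d_nodes and v != dst else 0)
            if nk > 1:
                continue
            nd = d + cost[v]
            if nd < best.get((v, nk), float("inf")):
                best[(v, nk)] = nd
                parent[(v, nk)] = (u, k)
                heapq.heappush(heap, (nd, v, nk))
    return None


def greedy_n_delay(graph: Graph, cost: Dict[str, float], d_nodes: Set[str],
                   route: List[str], n: int) -> Optional[List[str]]:
    """Grow a route (e.g. a 1-delay route) until it passes through n D-nodes."""
    route = list(route)
    while sum(v in d_nodes for v in route) < n:
        # Anchors: the route's endpoints plus every D-node already used.
        anchors = [i for i, v in enumerate(route)
                   if i in (0, len(route) - 1) or v in d_nodes]
        best = None
        for i, j in zip(anchors, anchors[1:]):
            blocked = set(route[:i]) | set(route[j + 1:])
            found = one_delay_path(graph, cost, d_nodes, route[i], route[j], blocked)
            if found is None:
                continue
            old = sum(cost[v] for v in route[i + 1:j + 1])
            extra = found[0] - old  # cost increase of this replacement
            if best is None or extra < best[0]:
                best = (extra, i, j, found[1])
        if best is None:
            return None  # no 0-delay segment can absorb another delay
        _, i, j, segment = best
        route = route[:i] + segment + route[j + 1:]
    return route
```

Because each iteration commits to one locally cheapest replacement, the result is an approximation; an earlier choice can block a cheaper overall route.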
Normal Routing – Multi-Terminal
• Do two-terminal routing, one sink at a time
• Use all previously routed segments as the source for the next route
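A sketch of the multi-terminal idea: every node already in the routing tree seeds the next Dijkstra search at cost 0, so a later sink can branch off anywhere along earlier routes. `route_net` and its arguments are illustrative names, and the per-node cost model is an assumption.

```python
import heapq
from typing import Dict, List, Set

Graph = Dict[str, List[str]]


def route_net(graph: Graph, cost: Dict[str, float],
              source: str, sinks: List[str]) -> Set[str]:
    """Route a multi-terminal net one sink at a time; return the tree's nodes.

    Each search is seeded with every node already in the tree at cost 0.
    """
    tree: Set[str] = {source}
    for sink in sinks:
        best = {n: 0.0 for n in tree}
        parent: Dict[str, str] = {}
        heap = [(0.0, n) for n in tree]
        heapq.heapify(heap)
        while heap:
            d, u = heapq.heappop(heap)
            if u == sink:
                break
            if d > best[u]:
                continue  # stale heap entry
            for v in graph.get(u, []):
                nd = d + cost[v]
                if nd < best.get(v, float("inf")):
                    best[v] = nd
                    parent[v] = u
                    heapq.heappush(heap, (nd, v))
        else:
            raise ValueError(f"sink {sink} is unreachable")
        # Add the new branch (from the sink back to the existing tree).
        node = sink
        while node not in tree:
            tree.add(node)
            node = parent[node]
    return tree
```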
Multi-Terminal Router
• Sinks are considered in increasing order of delay separation
  • T1 is 2 delays away from S, and T2 is 3 delays away from S
• Accumulate 1 delay at a time
  • When routing to delay I, start from all existing routing at delays I and I-1
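The seeding rule in the last bullet can be sketched as a multi-source version of the layered (node, delay) search, where the existing routing maps each node to its delay level. This covers a single step under simplifying assumptions: it presumes earlier sinks have already grown the routing up to level I-1 (as routing T1 at 2 delays does for T2 at 3 delays), it does not show how routing to higher levels is accumulated from scratch, and the names `route_sink_at_delay` and `add_branch` are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[str, List[str]]


def route_sink_at_delay(graph: Graph, cost: Dict[str, float], d_nodes: Set[str],
                        tree: Dict[str, int], sink: str,
                        level: int) -> Optional[List[str]]:
    """Connect `sink` at `level` delays from the net's source.

    `tree` maps every node already routed for this net to its delay level.
    The search is seeded from existing routing at levels level-1 (one more
    delay still needed) and level (no more delay needed).
    Returns the new branch, starting at the tree node it grows from.
    """
    seeds = [n for n, lvl in tree.items() if lvl in (level - 1, level)]
    best = {(n, tree[n]): 0.0 for n in seeds}
    parent: Dict[Tuple[str, int], Tuple[str, int]] = {}
    heap = [(0.0, n, tree[n]) for n in seeds]
    heapq.heapify(heap)
    while heap:
        d, u, k = heapq.heappop(heap)
        if (u, k) == (sink, level):
            branch, state = [u], (u, k)
            while state in parent:  # seeds have no parent
                state = parent[state]
                branch.append(state[0])
            return branch[::-1]
        if d > best[(u, k)]:
            continue  # stale heap entry
        for v in graph.get(u, []):
            if v in tree:
                continue  # do not re-enter routing that already exists
            nk = k + (1 if v in d_nodes else 0)
            if nk > level:
                continue  # too many delays for this sink
            nd = d + cost[v]
            if nd < best.get((v, nk), float("inf")):
                best[(v, nk)] = nd
                parent[(v, nk)] = (u, k)
                heapq.heappush(heap, (nd, v, nk))
    return None


def add_branch(tree: Dict[str, int], branch: List[str], d_nodes: Set[str]) -> None:
    """Record the delay level of each new node on a routed branch."""
    for prev, node in zip(branch, branch[1:]):
        tree[node] = tree[prev] + (1 if node in d_nodes else 0)
```

Processing sinks in increasing order of delay separation, as the slide prescribes, is what makes level I-1 routing likely to exist by the time a sink at level I is routed.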
Benchmark Architecture
• Modified RaPiD architecture
  • 1-D datapath of 16-bit ALUs, multipliers, registers and memories
  • Pipelined interconnect structure
  • Long and short tracks
  • Bus connectors used to pick up delays
[Figure: RaPiD datapath with GPR, RAM, MULT and ALU units]
Testing
• Benchmark RaPiD netlists
• Pipelining-aware placement tool
• For each netlist:
  • Treat the netlist as unpipelined and determine the smallest RaPiD architecture needed to route it (Zl)
  • Determine the smallest RaPiD architecture needed to route the pipelined netlist (Zp)
  • Pipelining cost = Zp/Zl
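The pipelining-cost metric is a simple ratio. The sketch below computes it per netlist and averages the results; the architecture sizes are made-up placeholders, not benchmark data, and treating the reported average as an arithmetic mean is an assumption.

```python
from statistics import mean


def pipelining_cost(z_pipelined: int, z_unpipelined: int) -> float:
    """Pipelining cost = Zp / Zl, both given as architecture sizes
    (for example, a number of RaPiD cells)."""
    return z_pipelined / z_unpipelined


# Hypothetical (Zp, Zl) sizes for three netlists, for illustration only.
sizes = {"netlist_a": (12, 8), "netlist_b": (20, 10), "netlist_c": (9, 9)}
costs = {name: pipelining_cost(zp, zl) for name, (zp, zl) in sizes.items()}
print(costs)                 # {'netlist_a': 1.5, 'netlist_b': 2.0, 'netlist_c': 1.0}
print(mean(costs.values()))  # 1.5
```

A cost of 1.0 means pipelining needed no extra area; 2.0 means the pipelined netlist needed twice the architecture.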
Results
• Avg pipelining cost incurred = 1.74
[Figure: RESULTS, pipelining cost (PIPE-COST, 0 to 3) for each NETLIST]
Results
• Effect of netlist size on pipelining cost
  • Normalized to unpipelined netlist area
[Figure: PIPE-COST vs SIZE; x-axis: SIZE (num RaPiD cells, 0 to 60); y-axis: PIPE-COST (0 to 3)]
Results
• Effect of % pipelined signals on pipelining cost
  • Normalized to unpipelined circuit area
[Figure: PIPE-COST vs % PIPELINED SIGNALS; x-axis: % PIPELINED SIGNALS (0% to 70%); y-axis: PIPE-COST (0 to 3)]
The Future
• Delay-driven PipeRoute
  • Currently under development
• Sophisticated pipelining-aware placement algorithms
• Fast pipelined routing algorithms
• Use PipeRoute to explore pipelined FPGA architectures
  • Number and location of registered switch-points