ECE 506
Reconfigurable Computing
http://www.ece.arizona.edu/~ece506
Lecture 6
Clustering
Ali Akoglu
Before Placement: Clustering
° Intra-cluster connections: fast
° Inter-cluster connections: slow
° Need to pack BLEs
° Goals:
• Reduce stress on routing
• Take advantage of local fast interconnect
• Reduce inter-cluster wiring
• Minimize critical path (timing-driven)
° How do we do this?
• Take advantage of cluster architecture
° Tradeoffs
Basic Clustering (Betz)
° How many distinct inputs should be provided to a cluster of N 4-LUTs?
° How many 4-LUTs should be included in a cluster to create the most area-efficient logic block?
VPACK
Basic Clustering (Betz)
° Flow
• Iterate until all BLEs consumed
• Start new cluster by selecting a seed BLE
- Select the currently unclustered BLE with the most used inputs
• Add BLE with most shared inputs with current cluster to cluster
- To minimize the number of inputs that must be routed to each cluster
• Keep adding until either cluster full or input pins used up
• Hill climbing – if some cluster BLEs unused
- Add another BLE even if cluster input count temporarily overflowed
- If input count not eventually reduced, select best choice from before hill climbing
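The greedy part of this flow can be sketched in a few lines of Python. This is a minimal illustration, not the VPack source: the netlist model (`Netlist`, mapping each BLE to its input nets and output net), the function names, and the `example` circuit are all assumptions made for the sketch, and the hill-climbing step is left out.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical netlist model: BLE name -> (set of input nets, output net).
Netlist = Dict[str, Tuple[Set[str], str]]

def cluster_inputs(members: List[str], netlist: Netlist) -> Set[str]:
    """Nets that must be routed into the cluster from outside."""
    used = set().union(*(netlist[b][0] for b in members))
    produced = {netlist[b][1] for b in members}
    return used - produced

def shared_nets(ble: str, members: List[str], netlist: Netlist) -> int:
    """Number of nets the candidate BLE has in common with the cluster."""
    cluster_nets: Set[str] = set()
    for b in members:
        cluster_nets |= netlist[b][0] | {netlist[b][1]}
    ins, out = netlist[ble]
    return len((ins | {out}) & cluster_nets)

def greedy_pack(netlist: Netlist, n: int, max_inputs: int) -> List[List[str]]:
    """Pack BLEs into clusters of at most n BLEs and max_inputs input pins."""
    unclustered = set(netlist)
    clusters: List[List[str]] = []
    while unclustered:
        # Seed: the unclustered BLE with the most used inputs (ties by name).
        seed = max(sorted(unclustered), key=lambda b: len(netlist[b][0]))
        members = [seed]
        unclustered.remove(seed)
        while len(members) < n:
            # Only consider BLEs that keep the cluster within its pin budget.
            legal = [b for b in sorted(unclustered)
                     if len(cluster_inputs(members + [b], netlist)) <= max_inputs]
            if not legal:
                break
            # Add the BLE sharing the most nets with the current cluster.
            best = max(legal, key=lambda b: shared_nets(b, members, netlist))
            members.append(best)
            unclustered.remove(best)
        clusters.append(members)
    return clusters

# Made-up four-BLE circuit for illustration.
example: Netlist = {
    "A": ({"a", "b", "c", "d"}, "x"),
    "B": ({"x", "a", "b"}, "y"),
    "C": ({"e", "f"}, "z"),
    "D": ({"z", "e"}, "w"),
}
clusters = greedy_pack(example, n=2, max_inputs=4)
```

In this example B joins A because it reuses inputs a and b and consumes A's output x internally, so the pair still needs only four cluster inputs.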
Logic Utilization
Number of Inputs per Cluster
• Lots of opportunities for input sharing in large clusters (Betz – CICC’99)
• Reducing inputs reduces the size of the device and makes it faster.
• Most FPGA devices (Xilinx, Lucent) have 4 BLEs per cluster, with more inputs than actually needed.
T-VPACK
Architecture Modeling
° Tri-state buffer and pass transistor distribution
° Cluster size vs. routing resources (tile size)
° Transistor and buffer scaling based on segment length
° Flexibility of switches (Fc=W for large cluster size is a waste?)
Logic Cluster Structure
Timing-Driven Clustering – T-VPACK
° Optimization goals of VPack
• Pack each cluster to its capacity
- Minimize number of clusters
• Minimize number of inputs per cluster
- Reduce the number of external connections
Timing-Driven Clustering – T-VPACK
° Optimization goal of T-VPack
• Minimize number of external connections on critical path
• Why?
- External connections have higher delay than internal connections
- Reducing number of external nets on critical path will reduce delay
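A small sketch of why this goal matters: the same path of four BLEs, packed two different ways. The delay numbers (`d_int`, `d_ext`, `d_ble`) and the cluster assignments are made-up values chosen only to show the effect, not measured figures.

```python
from typing import Dict, List

def external_connections(path: List[str], cluster_of: Dict[str, int]) -> int:
    """Count hops on the path whose two endpoints sit in different clusters."""
    return sum(1 for a, b in zip(path, path[1:]) if cluster_of[a] != cluster_of[b])

def path_delay(path: List[str], cluster_of: Dict[str, int],
               d_int: float = 1.0, d_ext: float = 4.0, d_ble: float = 1.0) -> float:
    """Path delay = BLE delays + fast internal hops + slow external hops."""
    hops = len(path) - 1
    ext = external_connections(path, cluster_of)
    return len(path) * d_ble + (hops - ext) * d_int + ext * d_ext

path = ["b0", "b1", "b2", "b3"]
scattered = {"b0": 0, "b1": 1, "b2": 0, "b3": 1}  # every hop leaves the cluster
packed = {"b0": 0, "b1": 0, "b2": 1, "b3": 1}     # only one hop leaves the cluster

delay_scattered = path_delay(path, scattered)
delay_packed = path_delay(path, packed)
```

With these assumed delays, packing the path sequentially removes two of the three external hops and the path delay drops accordingly.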
Timing-Driven Clustering – T-VPACK
° First stage
• Identify connections that are on the critical path
° Second stage
• Pack BLEs sequentially along the critical path
• Recompute criticality of remaining BLEs
Slack and Criticality Calculation
[Figure sequence: timing graph from primary inputs PI1-PI3 to primary outputs PO1-PO3, with a delay on every edge (values between 1 and 7). Successive slides label each node with its arrival time, then with arrival time / required time, then with its slack, and finally highlight the critical path.]
° Arrival times
• Primary inputs start at 0 and values propagate forward
• Arrival times at the primary outputs: 22, 18, 18
° Required times
• All primary outputs get the largest arrival time, 22, and values propagate backward
° Slack = required time - arrival time
• Slack values in the example range from 0 to 8
° Critical path
• The zero-slack nodes: 0 / 0, 3 / 3, 9 / 9, 15 / 15, 22 / 22 (arrival time / required time)
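The arrival time, required time and slack computation walked through in these slides can be sketched as a forward and a backward pass over the timing graph. The graph below is a made-up example (the node names `n1`-`n3` and the edge list are assumptions, not the graph from the figure); it is built so that its longest path has delays 3 + 6 + 6 + 7 = 22.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Tuple

Edge = Tuple[str, str, int]  # (source, sink, delay)

def timing(edges: List[Edge]):
    """Return (arrival, required, slack) per node for a DAG with edge delays."""
    preds: Dict[str, List[Tuple[str, int]]] = {}
    succs: Dict[str, List[Tuple[str, int]]] = {}
    for u, v, d in edges:
        preds.setdefault(v, []).append((u, d))
        preds.setdefault(u, [])
        succs.setdefault(u, []).append((v, d))
        succs.setdefault(v, [])
    # Topological order: every node appears after all of its predecessors.
    graph = {n: [u for u, _ in p] for n, p in preds.items()}
    order = list(TopologicalSorter(graph).static_order())
    # Forward pass: arrival = latest (predecessor arrival + edge delay); 0 at inputs.
    arrival: Dict[str, int] = {}
    for n in order:
        arrival[n] = max((arrival[u] + d for u, d in preds[n]), default=0)
    # Backward pass: outputs are required at the largest arrival time.
    t_max = max(arrival.values())
    required: Dict[str, int] = {}
    for n in reversed(order):
        required[n] = min((required[v] - d for v, d in succs[n]), default=t_max)
    slack = {n: required[n] - arrival[n] for n in order}
    return arrival, required, slack

# Hypothetical graph, not the one on the slide.
edges = [
    ("PI1", "n1", 1), ("PI2", "n1", 3),
    ("n1", "n2", 6), ("n2", "n3", 6),
    ("n3", "PO1", 7), ("n2", "PO2", 4),
]
arrival, required, slack = timing(edges)
critical = sorted(n for n, s in slack.items() if s == 0)
```

The nodes with zero slack form the critical path; every other node can be delayed by its slack without lengthening the circuit's longest path.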
Timing-Driven Clustering – T-VPACK
° Cost metric now considers both connectivity and timing criticality
° Perform an analysis of criticality at beginning considering all wires to be inter-cluster
° Determine “Base” BLE criticality
Base Criticality
How to break ties?
° Initially, many paths may have the same number of BLEs
° Include “tie-breaking” in performance cost function
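One way to turn slack into a cost metric that mixes timing and connectivity is sketched below. It assumes criticality is normalized as 1 - slack / max slack and that the cost is a weighted sum with a small tie-breaking term. The weight `alpha`, the constant `eps`, the `paths_affected` count and the function names are assumptions for illustration; the slides do not give the exact formula.

```python
from typing import Dict

def connection_criticality(slack: Dict[str, float]) -> Dict[str, float]:
    """Map each slack to [0, 1]; zero slack gives criticality 1."""
    max_slack = max(slack.values())
    if max_slack == 0:
        # Every connection is critical.
        return {k: 1.0 for k in slack}
    return {k: 1.0 - s / max_slack for k, s in slack.items()}

def attraction(crit: float, shared_nets: int, max_nets: int,
               paths_affected: int = 0, alpha: float = 0.75,
               eps: float = 1e-3) -> float:
    """Weighted sum of a timing term and an input-sharing term.

    The eps * paths_affected term only matters when two candidates have
    the same base criticality, which is the tie-breaking role.
    """
    timing_term = crit + eps * paths_affected
    sharing_term = shared_nets / max_nets
    return alpha * timing_term + (1.0 - alpha) * sharing_term

# Slack values 0, 4 and 8 as they appear in the worked example.
crit = connection_criticality({"c0": 0, "c1": 4, "c2": 8})
score = attraction(crit["c0"], shared_nets=2, max_nets=4)
```

Keeping `eps` small means the tie-breaker can reorder candidates of equal criticality without overriding a real difference in slack.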
Results for T-VPACK versus VPACK
Why does the gap between VPack and T-VPack increase as N increases?
Results for T-VPACK versus VPACK
° T-VPack prefers to cluster a BLE with BLEs that are in its fan-in or fan-out
° VPack favors input sharing
° T-VPack completely absorbs many low-fanout nets
• Fewer nets to route!
Results for T-VPACK versus VPACK
Why does area-delay product show an increasing trend beyond cluster size of 10?
Results for T-VPACK versus VPACK
° Increased number of nets that are completely absorbed by T-VPack
° Area-delay product
• Cluster size 7-10 best choice (36-34% better than N=1)
° N=7 vs N=1
• 30% less delay, 8% less area
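The N=7 figures are consistent with the quoted area-delay improvement, as a quick check shows. The helper name is made up for this sketch.

```python
def area_delay_improvement(delay_reduction: float, area_reduction: float) -> float:
    """Fractional improvement in area-delay product from fractional reductions."""
    return 1.0 - (1.0 - delay_reduction) * (1.0 - area_reduction)

# N=7 vs N=1: 30% less delay, 8% less area.
improvement = area_delay_improvement(0.30, 0.08)
```

Since 0.70 x 0.92 = 0.644, the area-delay product is about 36% better than at N=1.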
Results for T-VPACK, DELAY !!!
Why do we see a circuit speedup?
Results for T-VPACK, DELAY !!!
[Figure: delay results, annotated with 18% and 40%]
° Intra-cluster: fast, inter-cluster: slow!
° As N increases
• Number of internal connections on the critical path increases
• Number of external connections on the critical path decreases
Why are inter-cluster connections becoming faster?
° Reduction in number of external connections (internal connections are faster)
° External connections on the critical path are becoming faster
° Reduction in routing requirements
Drawback of VPack and T-VPack