
Before Placement: Clustering


ECE 506

Reconfigurable Computing

http://www.ece.arizona.edu/~ece506

Lecture 6

Clustering

Ali Akoglu


Before Placement: Clustering

° Intra-cluster connections: fast
° Inter-cluster connections: slow
° Need to pack BLEs (basic logic elements)
° Goals:
• Reduce stress on routing
• Take advantage of local fast interconnect
• Reduce inter-cluster wiring
• Minimize critical path (timing-driven)
° How do we do this?
• Take advantage of cluster architecture
° Tradeoffs


Basic Clustering (Betz)

° How many distinct inputs should be provided to a cluster of N 4-LUTs?

° How many 4-LUTs should be included in a cluster to create the most area-efficient logic block?


VPACK


Basic Clustering (Betz)

° Flow
• Iterate until all BLEs are consumed
• Start a new cluster by selecting a seed BLE
- Select the currently unclustered BLE with the most used inputs
• Add the BLE that shares the most inputs with the current cluster
- This minimizes the number of inputs that must be routed to each cluster
• Keep adding until either the cluster is full or its input pins are used up
• Hill climbing: if some BLE slots in the cluster are still unused
- Add another BLE even if the cluster input count temporarily overflows
- If the input count is not eventually reduced, select the best choice from before hill climbing
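The flow above can be sketched as a greedy loop. This is a minimal Python sketch, not VPack's actual code. It makes three simplifying assumptions. First, each BLE is a hypothetical `Ble` record holding its input nets and its output net. Second, a cluster's input count is the number of distinct nets it uses that are not driven inside it. Third, the hill-climbing step is left out. The names `pack_vpack`, `n` (BLEs per cluster) and `i_max` (input pins per cluster) are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ble:
    """Hypothetical BLE record: a name, its input nets and its output net."""
    name: str
    inputs: frozenset
    output: str


def cluster_inputs(cluster):
    """Distinct nets the cluster must import: all BLE inputs minus nets driven inside it."""
    used = set().union(*(b.inputs for b in cluster))
    return used - {b.output for b in cluster}


def pack_vpack(bles, n, i_max):
    """Greedy VPack-style packing without hill climbing.

    n     -- maximum number of BLEs per cluster
    i_max -- maximum number of distinct inputs per cluster
    """
    unclustered = list(bles)
    clusters = []
    while unclustered:
        # Seed: the unclustered BLE with the most used inputs.
        seed = max(unclustered, key=lambda b: len(b.inputs))
        unclustered.remove(seed)
        cluster = [seed]
        while len(cluster) < n:
            # Nets already present in the cluster (imported or driven inside).
            nets = cluster_inputs(cluster) | {b.output for b in cluster}
            # Only candidates that keep the cluster within its input-pin budget.
            legal = [b for b in unclustered
                     if len(cluster_inputs(cluster + [b])) <= i_max]
            if not legal:
                break
            # Attraction: how many of the candidate's inputs the cluster already has.
            best = max(legal, key=lambda b: len(b.inputs & nets))
            unclustered.remove(best)
            cluster.append(best)
        clusters.append(cluster)
    return clusters


if __name__ == "__main__":
    a = Ble("a", frozenset({"x", "y", "z"}), "na")
    b = Ble("b", frozenset({"x", "y"}), "nb")
    c = Ble("c", frozenset({"p", "q"}), "nc")
    d = Ble("d", frozenset({"na", "nb"}), "nd")
    for cl in pack_vpack([a, b, c, d], n=2, i_max=4):
        print([ble.name for ble in cl])
```

With `n=2` and `i_max=4`, BLE `a` becomes the first seed because it has the most inputs. It then attracts `b`, which shares `x` and `y` with it. The remaining BLEs `c` and `d` form a second cluster with exactly four imported nets.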


Logic Utilization


Number of Inputs per Cluster

• Lots of opportunities for input sharing in large clusters (Betz – CICC’99)

• Reducing inputs reduces the size of the device and makes it faster.

• Most FPGA devices (Xilinx, Lucent) have 4 BLEs per cluster, with more inputs than are actually needed.
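Input sharing can be made concrete by counting the distinct nets a cluster has to import. The sketch below uses a hypothetical representation with one set of input nets per BLE. It also evaluates the rule of thumb from Betz and Rose's cluster studies: I = (K/2)(N + 1) inputs for a cluster of N K-input LUTs, which they reported as enough for about 98% BLE utilization. The function names are illustrative, and rounding up to a whole pin count is an assumption of this sketch.

```python
import math


def distinct_cluster_inputs(ble_inputs, ble_outputs):
    """Number of nets a cluster must import through its input pins.

    ble_inputs  -- list of sets, the input nets of each BLE in the cluster
    ble_outputs -- set of nets driven by BLEs inside the cluster
    """
    needed = set().union(*ble_inputs)
    # Nets produced inside the cluster use local interconnect, not input pins.
    return len(needed - ble_outputs)


def betz_rose_inputs(n, k=4):
    """Rule of thumb I = (K / 2) * (N + 1), rounded up to a whole number of pins."""
    return math.ceil(k / 2 * (n + 1))


if __name__ == "__main__":
    # Hypothetical cluster of four 4-LUTs: 16 LUT input pins, heavy sharing.
    inputs = [{"a", "b", "c", "d"}, {"a", "b", "c", "e"},
              {"a", "b", "f", "g"}, {"h", "i", "o1", "o2"}]
    outputs = {"o1", "o2", "o3", "o4"}
    print(distinct_cluster_inputs(inputs, outputs))  # 9
    print(betz_rose_inputs(4))                        # 10
```

In this example, four 4-LUTs would need 16 pins without sharing. Because of shared and locally generated nets, they import only 9 distinct nets, which is below the 10 inputs the rule suggests for N = 4.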


T-VPACK


Architecture Modeling

• Tri-state buffer and pass transistor distribution
• Cluster size vs. routing resources (tile size)
• Transistor and buffer scaling based on segment length
• Flexibility of switches (is Fc = W a waste for large cluster sizes?)


Logic Cluster Structure


Timing-Driven Clustering – T-VPACK

° Optimization goals of VPack
• Pack each cluster to its capacity
- Minimize the number of clusters
• Minimize the number of inputs per cluster
- Reduce the number of external connections


Timing-Driven Clustering – T-VPACK

° Optimization goal of T-VPack
• Minimize the number of external connections on the critical path
• Why?
- External connections have higher delay than internal connections
- Reducing the number of external nets on the critical path reduces delay
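The reasoning can be written as a delay sum. For illustration only, assume a path passes through L BLEs with delay d_BLE each. Along the way it uses n_ext inter-cluster connections of delay d_ext and n_int intra-cluster connections of delay d_int, where d_int < d_ext. Real connection delays vary with routing, so treat these as idealized values:

```latex
\begin{align}
D_{\text{path}} &= L\,d_{\text{BLE}} + n_{\text{ext}}\,d_{\text{ext}} + n_{\text{int}}\,d_{\text{int}} \\
\Delta D_{\text{path}} &= d_{\text{int}} - d_{\text{ext}} < 0
  \quad\text{when } (n_{\text{ext}},\, n_{\text{int}}) \to (n_{\text{ext}} - 1,\; n_{\text{int}} + 1)
\end{align}
```

Under these assumptions, each critical-path connection that packing moves inside a cluster shortens that path by d_ext - d_int.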


Timing-Driven Clustering – T-VPACK

° First stage
• Identify connections that are on the critical path

° Second stage
• Pack BLEs sequentially along the critical path
• Recompute the criticality of the remaining BLEs

Slack and Criticality Calculation

[Figure, built up over several slides: a timing graph from primary inputs PI1, PI2, PI3 to primary outputs PO1, PO2, PO3, annotated with delays]

° Arrival times: the primary inputs start at 0, and arrival times are propagated forward through the graph; the latest arrival time at a primary output is 22
° Required times: each primary output is given a required time of 22, and required times are propagated backward toward the inputs
° Slack = required time - arrival time
° Final values, written as arrival time/required time (slack):
• Primary inputs: 0/4 (4), 0/0 (0), 0/8 (8)
• Internal nodes: 1/5 (4), 3/3 (0), 1/9 (8), 7/9 (2), 9/9 (0), 7/13 (6), 13/15 (2), 14/18 (4), 15/15 (0), 7/15 (8)
• Primary outputs: 22/22 (0), 18/22 (4), 18/22 (4)
° Critical path: the zero-slack nodes 0/0, 3/3, 9/9, 15/15, 22/22
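The forward and backward passes in this example can be sketched in Python. The sketch makes the following assumptions:

- Delays sit on the connections (edges) of a DAG.
- Every primary output gets the same required time, equal to the largest arrival time (22 in the slides).
- Connection criticality is 1 - slack/MaxSlack, the form assumed here for T-VPack.

The circuit in the usage code is hypothetical, not the one in the figure, and `timing_analysis` and `connection_criticality` are illustrative names.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def timing_analysis(edges):
    """Static timing on a DAG with delays on connections.

    edges -- dict mapping (driver, sink) -> connection delay
    Returns (arrival, required, slack) dicts keyed by node.
    """
    preds, succs, nodes = defaultdict(list), defaultdict(list), set()
    for (u, v), d in edges.items():
        preds[v].append((u, d))
        succs[u].append((v, d))
        nodes.update((u, v))
    graph = {n: [u for u, _ in preds[n]] for n in nodes}
    order = list(TopologicalSorter(graph).static_order())

    # Forward pass: primary inputs (no fan-in) arrive at time 0.
    arrival = {}
    for n in order:
        arrival[n] = max((arrival[u] + d for u, d in preds[n]), default=0)

    # Backward pass: primary outputs (no fan-out) must be ready by the
    # latest arrival time anywhere in the circuit.
    d_max = max(arrival.values())
    required = {}
    for n in reversed(order):
        required[n] = min((required[v] - d for v, d in succs[n]), default=d_max)

    slack = {n: required[n] - arrival[n] for n in nodes}
    return arrival, required, slack


def connection_criticality(edges, arrival, required):
    """Assumed criticality per connection: 1 - slack / max slack."""
    conn_slack = {(u, v): required[v] - arrival[u] - d
                  for (u, v), d in edges.items()}
    max_slack = max(conn_slack.values())
    if max_slack == 0:
        return {e: 1.0 for e in conn_slack}
    return {e: 1.0 - s / max_slack for e, s in conn_slack.items()}


if __name__ == "__main__":
    # Hypothetical circuit, not the one in the figure.
    edges = {("PI1", "A"): 1, ("PI2", "A"): 3, ("A", "PO1"): 6,
             ("PI2", "B"): 2, ("B", "PO2"): 4}
    arrival, required, slack = timing_analysis(edges)
    print(sorted(n for n, s in slack.items() if s == 0))  # ['A', 'PI2', 'PO1']
```

As on the slides, the zero-slack nodes trace the critical path, here PI2, A and PO1. Connections on that path get criticality 1.0, and the connections with the most slack get 0.0.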


Timing-Driven Clustering – T-VPACK

° Cost metric now considers both connectivity and timing criticality

° Perform a criticality analysis at the beginning, treating all wires as inter-cluster

° Determine “Base” BLE criticality
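One way to fold connectivity and criticality into a single cost is a weighted sum. The sketch below is hypothetical in several respects:

- A BLE's base criticality is taken as the largest criticality of any connection touching it, computed once with every wire treated as inter-cluster.
- Shared nets are normalized by a constant such as K + 1 nets per BLE.
- The two terms are weighted by a tunable `alpha`.

The names and the 0.75 default are assumptions, not T-VPack's exact cost function.

```python
def base_ble_criticality(ble, conn_crit):
    """Assumed base criticality: the largest criticality among connections touching `ble`.

    conn_crit -- dict (driver, sink) -> criticality in [0, 1], computed once
                 with every connection treated as inter-cluster
    """
    return max((c for (u, v), c in conn_crit.items() if ble in (u, v)), default=0.0)


def attraction(ble, ble_nets, cluster_nets, conn_crit, max_nets, alpha=0.75):
    """Hypothetical cost: alpha * criticality + (1 - alpha) * normalized shared nets.

    ble_nets     -- nets connected to the candidate BLE
    cluster_nets -- nets already connected to the current cluster
    max_nets     -- normalizing constant, e.g. K + 1 nets for a K-LUT BLE
    """
    shared = len(ble_nets & cluster_nets) / max_nets
    return alpha * base_ble_criticality(ble, conn_crit) + (1 - alpha) * shared


if __name__ == "__main__":
    conn_crit = {("A", "B"): 1.0, ("C", "B"): 0.5, ("C", "D"): 0.2}
    cluster_nets = {"n2", "n3"}
    # C shares one net with the cluster; E shares two but is not timing critical.
    print(attraction("C", {"n1", "n2"}, cluster_nets, conn_crit, max_nets=5))
    print(attraction("E", {"n2", "n3"}, cluster_nets, conn_crit, max_nets=5))
```

With `alpha` close to 1, timing dominates and the critical BLE `C` wins over `E` despite sharing fewer nets. Setting `alpha` to 0 falls back to a purely connectivity-driven, VPack-like choice.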


Base Criticality


How to break ties?

° Initially, many paths may have the same number of BLEs

° Include “tie-breaking” in performance cost function
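Ties can be broken with a secondary key. One plausible choice is the number of input-to-output paths that pass through each BLE. This follows the spirit of T-VPack's path-based tie-breaking, but the exact key used here is an assumption. The count comes from two dynamic-programming passes over the DAG, and `paths_through` and `pick_candidate` are illustrative names.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def paths_through(edges):
    """Number of input-to-output paths passing through each node of a DAG.

    edges -- iterable of (driver, sink) pairs
    """
    preds, succs, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)
        nodes.update((u, v))
    order = list(TopologicalSorter({n: preds[n] for n in nodes}).static_order())

    n_in = {}   # paths from any primary input to this node
    for n in order:
        n_in[n] = sum(n_in[u] for u in preds[n]) or 1
    n_out = {}  # paths from this node to any primary output
    for n in reversed(order):
        n_out[n] = sum(n_out[v] for v in succs[n]) or 1
    return {n: n_in[n] * n_out[n] for n in nodes}


def pick_candidate(candidates, criticality, path_count):
    """Most critical candidate; ties go to the BLE on more paths."""
    return max(candidates, key=lambda b: (criticality[b], path_count[b]))


if __name__ == "__main__":
    edges = [("a", "c"), ("b", "c"), ("c", "d"), ("c", "e"), ("b", "f")]
    counts = paths_through(edges)
    print(pick_candidate(["a", "b"], {"a": 0.8, "b": 0.8}, counts))  # b
```

Here `a` and `b` are equally critical. `b` lies on three input-to-output paths and `a` on two, so `b` is chosen.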


Results for T-VPACK versus VPACK

Why does the gap between VPack and T-VPack increase as N increases?


Results for T-VPACK versus VPACK

° T-VPack prefers to cluster a BLE with BLEs that are in its fan-in or fan-out

° VPack favors input sharing
° T-VPack completely absorbs many low-fanout nets

• Fewer nets to route!
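You can check whether a net is absorbed directly from a packing. A net drops out of the routing problem when its driver and all of its sinks end up in the same cluster. The sketch below is minimal and uses a hypothetical netlist representation. It assumes every sink is a BLE, so a net that also feeds a primary output would need an extra check.

```python
def absorbed_nets(nets, cluster_of):
    """Nets whose driver and every sink share one cluster.

    nets       -- dict net name -> (driver BLE, list of sink BLEs)
    cluster_of -- dict BLE name -> cluster id
    Absorbed nets use only intra-cluster wiring, so the router never sees them.
    """
    absorbed = set()
    for net, (driver, sinks) in nets.items():
        home = cluster_of[driver]
        if all(cluster_of[s] == home for s in sinks):
            absorbed.add(net)
    return absorbed


if __name__ == "__main__":
    nets = {"n1": ("a", ["b"]), "n2": ("a", ["b", "c"]), "n3": ("c", ["d"])}
    cluster_of = {"a": 0, "b": 0, "c": 1, "d": 1}
    inside = absorbed_nets(nets, cluster_of)
    print(sorted(inside), "nets to route:", len(nets) - len(inside))
```

In this example, the two fanout-1 nets are absorbed. Only the fanout-2 net `n2`, which spans both clusters, is left for the router.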


Results for T-VPACK versus VPACK

Why does area-delay product show an increasing trend beyond cluster size of 10?


Results for T-VPACK versus VPACK

° Increased number of nets that are completely absorbed by T-VPack

° Area-delay product
• Cluster sizes 7-10 are the best choice (36-34% better than N=1)

° N=7 vs. N=1
• 30% less delay, 8% less area


Results for T-VPACK, DELAY !!!

Why do we see a circuit speedup?


Results for T-VPACK, DELAY !!!

[Figure: T-VPack delay results, annotated with the values 18% and 40%]

° Intra-cluster: fast, inter-cluster: slow!
° As N increases
• The number of internal connections on the critical path increases
• The number of external connections on the critical path decreases


Why are inter-cluster connections becoming faster?

° Reduction in the number of external connections (internal connections are faster)
° External connections on the critical path are becoming faster
• Reduction in routing requirements


Drawback of VPack and T-VPack

