A Semi-Persistent Clustering Technique for VLSI Circuit Placement

Charles J. Alpert (1), Andrew Kahng (2), Gi-Joon Nam (1), Sherief Reda (2) and Paul G. Villarrubia (1)

(1) IBM Corp.; (2) Department of CSE, UCSD

bigblue4 design from ISPD2005 Suite

Implications in Placement

- Scalability
- Tractability
- Runtime vs. quality trade-off

- SoC (System-on-Chip) designs
- Mixed-size objects
- White space

Problem Statement

What is the most effective and efficient clustering strategy for analytic placement?
- Quality of solution
- CPU time

Clustering Concept

[Figure: cluster A with its “closest neighbor” B, then update the circuit netlist]

Clustering Score Function: $d(u, v) = \dfrac{w_{ij}\,\mathrm{conn}(u, v)}{[\mathrm{size}(u) + \mathrm{size}(v)]^{k}}$
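A minimal sketch of the score function, reading it as the weighted connectivity divided by the combined size raised to the power k (the fraction layout did not survive on the slide, so this reading is an assumption). The net weight 1 / (n - 1) is borrowed from the example slides; the names `net_weight`, `clustering_score`, `nets`, `sizes` and `w` are hypothetical.

```python
from fractions import Fraction


def net_weight(net):
    """Weight of an n-pin net, assumed to be 1 / (n - 1) as in the example slides."""
    return Fraction(1, len(net) - 1)


def clustering_score(nets, sizes, u, v, k=1, w=1):
    """d(u, v) = w * conn(u, v) / (size(u) + size(v)) ** k.

    conn(u, v) is taken to be the total weight of the nets that contain
    both u and v; w stands in for the timing-criticality factor.
    """
    conn = sum((net_weight(net) for net in nets if u in net and v in net),
               Fraction(0))
    return w * conn / (sizes[u] + sizes[v]) ** k


# Two unit-size cells sharing a 2-pin net and a 3-pin net.
nets = [{"A", "B"}, {"A", "B", "C"}]
sizes = {"A": 1, "B": 1, "C": 1}
print(clustering_score(nets, sizes, "A", "B"))  # (1 + 1/2) / 2 = 3/4
print(clustering_score(nets, sizes, "A", "C"))  # (1/2) / 2 = 1/4
```

Exact fractions are used only so that the values can be compared with the fractions printed on the example slides.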

Clustering Literature

Tremendous amounts of research here
- Edge-Coarsening (EC)
- First-Choice (FC)
- Edge-Separability (ESC)
- Peak-Clustering
- Etc.

General drawbacks
- Clique transformation
  - Edge weight discrepancy
- Pass-based iteration
- Lack of global clustering view

Best-Choice Clustering

- Avoid clique transformation
- Avoid pass-based iterations
- More global view of clustering sequence
  - Priority-queue management
  - Lazy-update speed-up technique
- Area-controlled balanced clustering

1. Initialize the priority-queue PQ:

- For each cell u: calculate its clustering score c with its closest neighbor v.

- Insert the pair (u, v) into PQ, keyed by the score c.

2. Until the target cell number is reached:

- Pick the top of the heap (m, n)

- Cluster (m, n) into a new object mn; update the netlist

- Calculate mn's closest neighbor k; insert (mn, k) into PQ

- Recalculate the clustering score of all the neighbors of m and n
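The two steps above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it assumes k = 1, unit timing criticality, the 1 / (n - 1) net weight from the example slides, and single-letter cell names so that a merged object can simply be named by concatenation. The helper names (`score`, `closest_neighbor`, `best_choice`) are hypothetical. Python's `heapq` has no key-update operation, so each queue entry carries a version stamp and outdated entries are skipped when popped; the neighbor scores themselves are still recomputed eagerly after every merge.

```python
import heapq
from fractions import Fraction


def score(nets, sizes, u, v):
    """d(u, v) with k = 1; an n-pin net is assumed to weigh 1 / (n - 1)."""
    conn = sum((Fraction(1, len(net) - 1) for net in nets
                if u in net and v in net), Fraction(0))
    return conn / (sizes[u] + sizes[v])


def closest_neighbor(nets, sizes, u):
    """Return (score, v) for the highest-scoring neighbor v of u, or None."""
    neighbors = set().union(*(net for net in nets if u in net)) - {u}
    if not neighbors:
        return None
    return max((score(nets, sizes, u, v), v) for v in neighbors)


def best_choice(nets, sizes, target):
    """Merge the globally best pair until `target` objects remain."""
    nets = [set(net) for net in nets if len(set(net)) > 1]
    sizes = dict(sizes)
    version = dict.fromkeys(sizes, 0)  # bumped whenever an object is rescored
    heap = []

    def push(u):
        best = closest_neighbor(nets, sizes, u)
        if best is not None:
            # heapq is a min-heap, so the score is negated.
            heapq.heappush(heap, (-best[0], u, best[1], version[u]))

    # Step 1: initialize the priority queue.
    for u in sizes:
        push(u)

    # Step 2: cluster until the target object count is reached.
    merges = []
    while len(sizes) > target and heap:
        neg_s, m, n, ver = heapq.heappop(heap)
        if m not in sizes or n not in sizes or ver != version[m]:
            continue  # outdated entry: m or n is gone, or m was rescored
        mn = m + n
        touched = set().union(
            *(net for net in nets if m in net or n in net)) - {m, n}
        # Update the netlist: replace m and n by mn, drop single-pin nets.
        updated = [(net - {m, n}) | {mn} if (m in net or n in net) else net
                   for net in nets]
        nets[:] = [net for net in updated if len(net) > 1]
        sizes[mn] = sizes.pop(m) + sizes.pop(n)
        version[mn] = 0
        merges.append((m, n, -neg_s))
        push(mn)
        for t in touched:  # eager rescoring of every neighbor of m and n
            version[t] += 1
            push(t)
    return merges, sizes


example_nets = [{"A", "B"}, {"B", "C"}, {"C", "D"}, {"C", "D"}]
example_sizes = {"A": 1, "B": 1, "C": 1, "D": 1}
merges, clusters = best_choice(example_nets, example_sizes, target=2)
print(merges)    # C and D merge first (score 1), then A and B (score 1/2)
print(clusters)
```

The eager rescoring in the last loop is the part that dominates runtime on large netlists, since every merge triggers a closest-neighbor search for each affected object.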

Best-Choice Example

Assume:
- n-pin net weight = 1 / (n - 1)
- Each object size = 1
- Timing criticality is 1 for all

[Figure labels, first step: A = 1/2, D = 1; A, B = 1/2; CD = 2/3; F = 1/2, E = 1/2]

[Figure labels, second step: BCD, A; BDC = 3/8; BCD = 3/10; A = 3/8, BCD = 3/8]

[Figure labels, final step: ABCD, EF = 1/3; ABCD = 1/3; ABCDEF]

clustering_score = 2.875
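One reading of the total, offered as an assumption because the figures did not survive: clustering_score is the sum of the scores of the five merges that reduce six objects to one. The labels 1, 2/3, 1/2, 3/8 and 1/3 that appear on the example slides do add up to 2.875, which the short check below confirms. Which label belongs to which merge is not asserted here, and `merge_scores` is a hypothetical name.

```python
from fractions import Fraction

# Score labels taken from the example slides; treating them as the
# per-merge scores is an assumption.
merge_scores = [Fraction(1), Fraction(2, 3), Fraction(1, 2),
                Fraction(3, 8), Fraction(1, 3)]

total = sum(merge_scores)
print(total, float(total))  # 23/8 2.875
```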

Best-Choice Clustering Summary

- Globally optimal clustering sequence via priority-queue data structure
- Produces better quality of results
- Clustering framework
  - Arbitrary clustering score function can be plugged in

Clustering score distribution:
1) First-choice (FC): clustering_score = 5612.83
2) Best-choice (BC): clustering_score = 6671.53

[Figure: clustering score distributions, panels (1) and (2)]

Lazy Update Speed-up Technique

[Figure: priority queue PQ; labels: top of the PQ, node A]

Observations:
1. Node A might be updated a number of times before making it to the top of the PQ (if ever), but the last update is what determines its final position in the PQ.
2. Statistics indicate that in 96% of our updating steps, updating node A's score pushes A down in the PQ.

Lazy Update Speed-up Technique

Until the target cell number is reached:
- Pick the top of the heap (m, n)
- If (m, n) is invalid:
  - Recalculate m's closest neighbor n’ and insert (m, n’) into the heap
- Else:
  - Cluster (m, n) into a new object mn; update the netlist
  - Calculate mn's closest neighbor k; insert (mn, k) into the heap
  - Mark all neighbors of m and n invalid

Main Idea: Wait until A gets to the top of the priority-queue and then update its score if necessary
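A minimal sketch of the lazy-update loop, under the same assumptions as the slides' example (k = 1, unit timing criticality, net weight 1 / (n - 1)) plus single-letter cell names so that merged objects can be named by concatenation. The helper names and the `invalid` set are hypothetical. Each live object keeps a single heap entry; neighbors of a merged pair are only flagged, and an entry is rescored when it reaches the top of the heap.

```python
import heapq
from fractions import Fraction


def score(nets, sizes, u, v):
    """d(u, v) with k = 1; an n-pin net is assumed to weigh 1 / (n - 1)."""
    conn = sum((Fraction(1, len(net) - 1) for net in nets
                if u in net and v in net), Fraction(0))
    return conn / (sizes[u] + sizes[v])


def closest_neighbor(nets, sizes, u):
    """Return (score, v) for the highest-scoring neighbor v of u, or None."""
    neighbors = set().union(*(net for net in nets if u in net)) - {u}
    if not neighbors:
        return None
    return max((score(nets, sizes, u, v), v) for v in neighbors)


def best_choice_lazy(nets, sizes, target):
    """Best-choice clustering where stale scores are fixed only at the top."""
    nets = [set(net) for net in nets if len(set(net)) > 1]
    sizes = dict(sizes)
    invalid = set()  # objects whose stored score may be out of date
    heap = []

    def push(u):
        best = closest_neighbor(nets, sizes, u)
        if best is not None:
            heapq.heappush(heap, (-best[0], u, best[1]))  # min-heap: negate

    for u in sizes:
        push(u)

    merges = []
    while len(sizes) > target and heap:
        neg_s, m, n = heapq.heappop(heap)
        if m not in sizes:
            continue  # m was already merged into another object
        if m in invalid or n not in sizes:
            # Invalid pair: recalculate m's closest neighbor and reinsert.
            invalid.discard(m)
            push(m)
            continue
        mn = m + n
        touched = set().union(
            *(net for net in nets if m in net or n in net)) - {m, n}
        # Update the netlist: replace m and n by mn, drop single-pin nets.
        updated = [(net - {m, n}) | {mn} if (m in net or n in net) else net
                   for net in nets]
        nets[:] = [net for net in updated if len(net) > 1]
        sizes[mn] = sizes.pop(m) + sizes.pop(n)
        merges.append((m, n, -neg_s))
        push(mn)
        invalid |= touched  # mark the neighbors invalid instead of rescoring
    return merges, sizes


example_nets = [{"A", "B"}, {"B", "C"}, {"C", "D"}, {"C", "D"}]
example_sizes = {"A": 1, "B": 1, "C": 1, "D": 1}
lazy_merges, lazy_clusters = best_choice_lazy(example_nets, example_sizes, 2)
print(lazy_merges)
print(lazy_clusters)
```

An object whose true score went up after a merge stays at its old, lower position until it surfaces, so the merge order can in principle differ from the eager version. The 96% statistic quoted earlier is the argument for why this rarely matters.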

Lazy Update Runtime Characteristic

[Chart: x-axis Cell Reduction (%), ticks 1 to 10; series: Original, Lazy update]

Note: Practically no impact on solution quality

Experiments

- IBM CPLACE
  - Analytic placement algorithm
- Semi-persistent clustering paradigm
  - Up-front clustering
  - Selective unclustering during main global placement
  - Full unclustering before detailed placement
- Order-of-magnitude reduction by clustering
- Industrial ASIC designs
  - Size ranges from 56K to 880K placeable objects

Placement Results w/ Clustering

- Average 4.3% WL improvement over EC
- BC is 8.76x slower than EC

[Chart: designs AL, BL, CL, DL, EL, FL; series: FC, BC, BC+Lazy]

No Clustering vs. BC+Lazy Clustering

Design (size)   WL(%)    CPU    CL-CPU%
AL (270K)        2.09%   0.40   1.17%
BL (276K)       -4.28%   0.52   1.35%
CL (351K)        3.27%   0.51   1.14%
DL (426K)        0.87%   0.45   1.35%
EL (456K)        1.59%   0.33   1.10%
FL (880K)        1.41%   0.46   1.68%
AD (389K)        8.23%   0.50   0.98%
BD (285K)       -0.34%   0.47   0.94%
CD (56K)        -0.36%   0.69   0.51%
Avg.             1.39%   0.48   1.14%

Conclusions

- Globally optimal clustering sequence framework
- Independent of clustering scoring function
- Better clustering sequence
- Allows significant placement speed-up
- Almost no loss of quality of solution
- Size control via clustering scoring function
- Effective for dense designs

Future Work

Handling fixed blocks during clustering
- Ignoring nets connected to fixed objects
- Ignoring pins connected to fixed objects
- Including fixed blocks during clustering
- Etc.

No visible improvement at the moment

Cluster Size Control Results

          Standard                  Automatic
Design    Max     Avg     WL%      Max     Avg     WL%
AD        14823   171.4   0.00     1140    160.4   -0.88
BD        28600   150.0   0.00     1140    114.6    3.71
CD         9060   113.5   0.00      610    109.8   30.05

• $d(u, v) = \dfrac{w_{ij}\,\mathrm{conn}(u, v)}{[\mathrm{size}(u) + \mathrm{size}(v)]^{k}}$

Standard: $k = 1$
Automatic: $k = (\mathrm{size}(u) + \mathrm{size}(v)) / \mu$, where $\mu$ = expected avg. size
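To see how the exponent controls cluster size, here is a small sketch comparing the two settings. The symbol for the expected average size was lost from the slide, so the parameter name `mu` is an assumption, as are the function name `size_controlled_score` and the sample values. With the automatic exponent, a merge whose combined size is far above the expected average is penalized much more heavily than under k = 1.

```python
def size_controlled_score(conn, size_u, size_v, mu=None):
    """Score conn / (size_u + size_v) ** k.

    mu=None selects the standard setting (k = 1); otherwise the automatic
    setting k = (size_u + size_v) / mu is used, where mu is assumed to be
    the expected average cluster size.
    """
    total = size_u + size_v
    k = 1 if mu is None else total / mu
    return conn / total ** k


# Same connectivity, small pair vs. large pair, expected average size 4.
small_std = size_controlled_score(1.0, 1, 1)
large_std = size_controlled_score(1.0, 8, 8)
small_auto = size_controlled_score(1.0, 1, 1, mu=4)
large_auto = size_controlled_score(1.0, 8, 8, mu=4)

print(small_std / large_std)    # how strongly k = 1 prefers the small pair
print(small_auto / large_auto)  # the automatic exponent prefers it far more
```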
