Post on 31-Dec-2015
A Semi-Persistent Clustering Technique for VLSI Circuit Placement
Charles J. Alpert (1), Andrew Kahng (2), Gi-Joon Nam (1), Sherief Reda (2) and Paul G. Villarrubia (1)
(1) IBM Corp. (2) Department of CSE, UCSD
2
bigblue4 design from ISPD2005 Suite
3
Implications in Placement
- Scalability
- Tractability
- Runtime vs. quality trade-off
- SoC (System-on-Chip) designs
- Mixed-size objects
- White space
4
Problem Statement
What is the most effective and efficient clustering strategy for analytic placement?
- Quality of solution
- CPU time
5
Clustering Concept
[Figure: a netlist with objects A, B, C, D, E, F. Step 1: cluster A with its "closest neighbor". Step 2: update the circuit netlist, which now contains AC, B, D, E, F.]
Clustering Score Function: $d(u, v) = \dfrac{w_{ij}\,\mathrm{conn}(u, v)}{[\mathrm{size}(u) + \mathrm{size}(v)]^{k}}$
6
Clustering Literature
Tremendous amounts of research here
- Edge-Coarsening (EC)
- First-Choice (FC)
- Edge-Separability (ESC)
- Peak-Clustering
- Etc.
General drawbacks
- Clique transformation
  - Edge weight discrepancy
- Pass-based iteration
  - Lack of global clustering view
7
Best-Choice Clustering
- Avoid clique transformation
- Avoid pass-based iterations
- More global view of clustering sequence
- Priority-queue management
- Lazy-update speed-up technique
- Area-controlled balanced clustering
8
Best-Choice Clustering
1. Initialize the priority-queue PQ:
 - For each cell u: calculate its clustering score c with its closest neighbor v.
 - Insert the pair (u, v) into PQ, keyed by the score c.
2. Until the target cell number is reached:
 - Pick the top of the heap (m, n)
 - Cluster (m, n) into a new object mn; update the netlist
 - Calculate mn's closest neighbor k; insert (mn, k) into PQ
 - Recalculate the clustering score of all the neighbors of m and n
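The two steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the authors' implementation. The netlist representation (a list of pin sets), the names `best_choice`, `closest_neighbor` and `score`, the net weight 1 / (n - 1), and k = 1 are all assumptions made for the sketch. Because Python's `heapq` cannot re-key an entry in place, neighbors are re-inserted with a version stamp and outdated entries are skipped when popped; that is a `heapq` idiom, not part of the slide's algorithm.

```python
import heapq
from itertools import count


def score(u, v, nets, size, k=1):
    # Assumed score: total weight of the nets shared by u and v, divided by
    # (size(u) + size(v))^k, with an n-pin net weighted 1 / (n - 1).
    w = sum(1.0 / (len(net) - 1) for net in nets if u in net and v in net)
    return w / (size[u] + size[v]) ** k


def closest_neighbor(u, nets, size):
    # The closest neighbor is the connected object with the highest score.
    neighbors = sorted(set().union(*(net for net in nets if u in net)) - {u})
    if not neighbors:
        return None, 0.0
    best = max(neighbors, key=lambda v: score(u, v, nets, size))
    return best, score(u, best, nets, size)


def best_choice(nets, size, target):
    nets = [set(net) for net in nets]
    size = dict(size)
    version = {u: 0 for u in size}
    heap, tie = [], count()

    def push(u):
        v, c = closest_neighbor(u, nets, size)
        if v is not None:
            # heapq is a min-heap, so the score is negated.
            heapq.heappush(heap, (-c, next(tie), u, v, version[u]))

    # Step 1: initialize the priority queue.
    for u in sorted(size):
        push(u)

    # Step 2: cluster until the target object count is reached.
    merges = []
    while len(size) > target and heap:
        neg_c, _, m, n, ver = heapq.heappop(heap)
        if m not in size or version[m] != ver:
            continue  # outdated entry, superseded by a later push
        touched = set().union(*(x for x in nets if m in x or n in x)) - {m, n}
        mn = m + n
        size[mn] = size.pop(m) + size.pop(n)
        version[mn] = 0
        # Update the netlist: replace m and n by mn, drop single-pin nets.
        updated = [(x - {m, n}) | {mn} if (m in x or n in x) else x for x in nets]
        nets[:] = [x for x in updated if len(x) > 1]
        merges.append((m, n, -neg_c))
        push(mn)
        for u in sorted(touched):  # recalculate all neighbors of m and n
            version[u] += 1
            push(u)
    return merges


if __name__ == "__main__":
    demo_nets = [("A", "B"), ("A", "B"), ("B", "C"), ("C", "D")]
    demo_size = {"A": 1, "B": 1, "C": 1, "D": 1}
    for m, n, c in best_choice(demo_nets, demo_size, target=1):
        print(m, n, c)
```

The returned list records each merge with its score, so the total clustering score of a run is the sum of the third field.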
9
Best-Choice Example
Assume:
- N-pin net weight = 1 / (n-1)
- Each object size = 1
- Timing criticality is 1 for all nets
[Figure: example netlist with objects A, B, C, D, E, F]
10
Best-Choice Example
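[Figure: the initial netlist (A, B, C, D, E, F) with closest-neighbor labels B=1/2, A=1/2, D=1, C=1, D=3/4, D=1/2. After C and D are clustered into CD, the netlist (A, B, CD, E, F) carries the labels B=1/2, CD=2/3, B=2/3, F=1/2, E=1/2.]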
11
Best-Choice Example
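[Figure: after B and CD are clustered into BCD, the netlist (A, BCD, E, F) carries the labels BCD=3/8, F=1/2, A=3/8, E=1/2. After E and F are clustered into EF, the netlist (A, BCD, EF) carries the labels BCD=3/10, A=3/8, BCD=3/8.]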
12
Best-Choice Example
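[Figure: after A and BCD are clustered into ABCD, the netlist (ABCD, EF) carries the labels EF=1/3, ABCD=1/3. The final step clusters ABCD and EF into the single object ABCDEF.]
clustering_score = 2.875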
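The total of 2.875 appears to be the sum of the scores of the five merges in the example. The short check below uses the merge scores as read off the slide labels (C+D = 1, B+CD = 2/3, E+F = 1/2, A+BCD = 3/8, ABCD+EF = 1/3); the merge order and the dictionary name `merge_scores` are an interpretation of the figures, not something the slides state explicitly.

```python
from fractions import Fraction

# Merge scores as read from the example slides (interpretation of the labels).
merge_scores = {
    "C+D": Fraction(1),
    "B+CD": Fraction(2, 3),
    "E+F": Fraction(1, 2),
    "A+BCD": Fraction(3, 8),
    "ABCD+EF": Fraction(1, 3),
}

total = sum(merge_scores.values())
print(total, float(total))  # 23/8 2.875
```

Each score is also consistent with dividing the connecting net weight by the summed object sizes, for example 2 / (2 + 1) = 2/3 for B and CD.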
13
Best-Choice Clustering Summary
- Globally optimal clustering sequence via priority-queue data structure
- Produces better quality of results
- Clustering framework
  - Arbitrary clustering score function can be plugged in
14
Best-Choice Clustering
Clustering score distribution
1) First-choice (FC): clustering_score = 5612.83
2) Best-choice (BC): clustering_score = 6671.53
[Figure: clustering score distributions for (1) FC and (2) BC]
15
Lazy Update Speed-up Technique
[Figure: priority queue PQ, showing the top of the PQ and a node A further down]
Observations:
1. Node A might be updated a number of times before making it to the top of the PQ (if ever), but only the last update determines its final position in the PQ.
2. Statistics indicate that in 96% of our updating steps, updating node A's score pushes A down in the PQ.
16
Lazy Update Speed-up Technique
Until the target cell number is reached:
 - Pick the top of the heap (m, n)
 - If (m, n) is invalid then
   - Recalculate m's closest neighbor n' and insert (m, n') in the heap
 - else
   - Cluster (m, n) into a new object mn; update the netlist
   - Calculate mn's closest neighbor k; insert (mn, k) in the heap
   - Mark all neighbors of m and n invalid
Main Idea: Wait until A gets to the top of the priority-queue and then update its score if necessary
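A hedged Python sketch of the lazy-update loop is given below. It is an illustration under assumptions, not the authors' code: the netlist is a list of pin sets, an n-pin net is weighted 1 / (n - 1), k = 1, and the names `best_choice_lazy`, `closest` and `score` are hypothetical. Neighbors of a merged pair are only added to an `invalid` set; their score is recomputed when, and only if, their entry reaches the top of the heap.

```python
import heapq
from itertools import count


def score(u, v, nets, size):
    # Assumed score: shared net weight over the summed object sizes (k = 1).
    w = sum(1.0 / (len(net) - 1) for net in nets if u in net and v in net)
    return w / (size[u] + size[v])


def closest(u, nets, size):
    neighbors = sorted(set().union(*(net for net in nets if u in net)) - {u})
    if not neighbors:
        return None, 0.0
    v = max(neighbors, key=lambda x: score(u, x, nets, size))
    return v, score(u, v, nets, size)


def best_choice_lazy(nets, size, target):
    nets = [set(net) for net in nets]
    size = dict(size)
    invalid = set()  # objects whose heap entry may be out of date
    heap, tie = [], count()

    def push(u):
        v, c = closest(u, nets, size)
        if v is not None:
            heapq.heappush(heap, (-c, next(tie), u, v))  # max-heap via negation

    for u in sorted(size):
        push(u)

    merges = []
    while len(size) > target and heap:
        neg_c, _, m, n = heapq.heappop(heap)
        if m not in size:
            continue  # m was already absorbed into a cluster
        if m in invalid:
            # Lazy update: recompute only now that m reached the top.
            invalid.discard(m)
            push(m)
            continue
        touched = set().union(*(x for x in nets if m in x or n in x)) - {m, n}
        mn = m + n
        size[mn] = size.pop(m) + size.pop(n)
        # Update the netlist: replace m and n by mn, drop single-pin nets.
        updated = [(x - {m, n}) | {mn} if (m in x or n in x) else x for x in nets]
        nets[:] = [x for x in updated if len(x) > 1]
        merges.append((m, n, -neg_c))
        push(mn)
        invalid |= touched  # mark neighbors invalid instead of rescoring them
    return merges


if __name__ == "__main__":
    demo_nets = [("A", "B"), ("A", "B"), ("B", "C"), ("C", "D")]
    demo_size = {"A": 1, "B": 1, "C": 1, "D": 1}
    print(best_choice_lazy(demo_nets, demo_size, target=1))
```

Since most updates push a node down in the queue, deferring them rarely changes which pair is at the top, which fits the slide's note that solution quality is practically unaffected.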
17
Lazy Update Runtime Characteristic
[Figure: runtime (s), axis from 0 to 350, versus cell reduction (%), ticks 1 to 10, for two series: Original and Lazy update]
Note: Practically no impact to solution quality
18
Experiments
- IBM CPLACE
  - Analytic placement algorithm
- Semi-persistent clustering paradigm
  - Up-front clustering
  - Selective unclustering during main global placement
  - Full unclustering before detailed placement
  - Order-of-magnitude reduction by clustering
- Industrial ASIC designs
  - Size ranges from 56K to 880K placeable objects
19
Placement Results w/ Clustering
- Average 4.3% WL improvement over EC
- BC is 8.76x slower than EC
[Figure: WL improvement (%) over EC, axis from -2 to 7, on designs AL, BL, CL, DL, EL, FL, for three series: FC, BC, BC+Lazy]
20
No Clustering vs. BC+Lazy Clustering
Design (size)   WL(%)    CPU    CL-CPU%
AL (270K)        2.09%   0.40   1.17%
BL (276K)       -4.28%   0.52   1.35%
CL (351K)        3.27%   0.51   1.14%
DL (426K)        0.87%   0.45   1.35%
EL (456K)        1.59%   0.33   1.10%
FL (880K)        1.41%   0.46   1.68%
AD (389K)        8.23%   0.50   0.98%
BD (285K)       -0.34%   0.47   0.94%
CD (56K)        -0.36%   0.69   0.51%
Avg.             1.39%   0.48   1.14%
21
Conclusions
- Globally optimal clustering sequence framework
  - Independent of clustering scoring function
  - Better clustering sequence
- Allows significant placement speed-up
  - Almost no loss of quality of solution
- Size control via clustering scoring function
  - Effective for dense designs
22
Future Work
- Handling fixed blocks during clustering
  - Ignoring nets connected to fixed objects
  - Ignoring pins connected to fixed objects
  - Including fixed blocks during clustering
  - Etc.
- No visible improvement at the moment
23
Cluster Size Control Results
         Standard                  Automatic
Design   Max     Avg     WL%      Max    Avg     WL%
AD       14823   171.4   0.00     1140   160.4   -0.88
BD       28600   150.0   0.00     1140   114.6    3.71
CD        9060   113.5   0.00      610   109.8   30.05
• $d(u, v) = \dfrac{w_{ij}\,\mathrm{conn}(u, v)}{[\mathrm{size}(u) + \mathrm{size}(v)]^{k}}$
Standard: $k = 1$
Automatic: $k = [\mathrm{size}(u) + \mathrm{size}(v)] / (\text{expected avg. size})$
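To illustrate how the exponent k controls cluster size, the sketch below compares the standard and automatic settings on one small and one large candidate pair. It assumes the score has the form weight / (size(u) + size(v))^k and that the automatic exponent is the summed size divided by the expected average size; the slide's exact grouping is not fully recoverable, so both are assumptions. The function names and the sample numbers are hypothetical.

```python
def clustering_score(weight, size_u, size_v, k=1.0):
    # Assumed form: connection weight over (size(u) + size(v)) raised to k.
    return weight / (size_u + size_v) ** k


def automatic_k(size_u, size_v, expected_avg_size):
    # Assumed form: k grows with the merged size relative to the expected average.
    return (size_u + size_v) / expected_avg_size


expected_avg = 100.0  # hypothetical expected average cluster size

# A small pair (10 + 10) and a large pair (500 + 500), same connection weight.
small_std = clustering_score(1.0, 10, 10)
large_std = clustering_score(1.0, 500, 500)
small_auto = clustering_score(1.0, 10, 10, automatic_k(10, 10, expected_avg))
large_auto = clustering_score(1.0, 500, 500, automatic_k(500, 500, expected_avg))

print("standard :", small_std, large_std)
print("automatic:", small_auto, large_auto)
```

Under these assumptions, pairs below the expected average size get k < 1 and are favored, while pairs far above it get a large k and a vanishing score, which would explain the much smaller maximum cluster sizes in the Automatic columns.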