ASPDAC03 – Physical Chip Implementation – Section IV 1
Jan. 2003 ASPDAC03 – Physical Chip Implementation 1
Section IV: Timing Closure Techniques
IBM Contributions to this presentation include:
- T.J. Watson Research Center
- Austin Research Lab
- ASIC Design Centers
- EDA Organization
* For more detailed information see the references at the end of this presentation, which include a wide variety of IBM and external publications covering these areas.
Overview
- Introduction
- Review material (timing and synthesis)
- Introduction to placement
- Placement algorithms (skip)
- Paradigms for placement-synthesis integration
- Placement aware synthesis techniques (skip)
- Congestion avoidance / mitigation techniques
- Routing optimization
Timing Closure
Many aspects of a design contribute to performance, power, and density:
- Architecture / Logic Implementation
- PD Design Style (Flat, Hierarchical, etc)
- Clocking Paradigm / Test / Circuit Family
- Floor Plan / Synthesis / Placement / Routing
Design Automation for timing closure is more significant than ever before:
- Designs are larger
- Wires are longer, invalidating statistical synthesis models, and requiring lots of buffers
- Cycle times are more aggressive
Design Automation Tools are Individually Mature
- Timing analysis
- Synthesis / Technology mapping
- Placement / Routing
- Floor Planning
- Extraction / Analysis
Physical Synthesis
[Figure: "Netlist in" feeds a loop of place & route, synthesis, and timing, which produces the "Completed Design"]
Challenge is to integrate them into one cooperative application
Design Flow Evolution:
[Figure: design flow evolution]
Baseline flow: Design Entry, Synthesis w/Timing, Place, Route, Timing
Synthesis w/Timing steps:
1. Tech independent optimization
2. Tech mapping
3. Timing correction
Evolution of the flow:
- Timing driven placement
- Timing driven placement plus automatic post placement tuning
- Integrated placement and synthesis
- Integrated placement, synthesis & routing
Integrated flow steps:
1. Physically aware optimizations
2. Physically aware timing correction
3. Timing / Noise aware routing
Purpose of this Section:
- Provide users with an intuitive feel for the inner workings of the major timing closure tools
- Demonstrate the advancements in timing closure tool technology via example designs
- Explore a variety of significant design choices
What you should expect:
- High level concepts presented are generally applicable across a wide range of tools / methodologies (i.e. not IBM specific)
- Specific tool internals used in this tutorial are taken from IBM tools. They should provide a reasonable “feel” for how things are done in the industry.
Worldwide ASIC/PLD Sales: Top 5 Suppliers for 2001

Supplier   Revenue (millions of U.S. dollars)   Growth
IBM        2758                                   1.2%
Agere      1310                                 -43.5%
LSI        1243                                 -38.2%
NEC        1243                                 -35.2%
Xilinx     1149                                 -26.3%

Source: Gartner Dataquest (March 2002)
IBM ASIC Supplier #1 since 1999
[Figure: top-10 ASIC supplier rankings by year, 1996 to 2001; suppliers shown include IBM, LSI Logic, Lucent, NEC, VLSI, Xilinx, TI, Fujitsu, Toshiba, Hitachi, Altera, STM, Agilent, and Mitsubishi. Source: Dataquest 96-02]
Section Outline
- Introduction
- Review material (timing and synthesis)
- Introduction to placement
- Placement algorithms
- Paradigms for placement-synthesis integration
- Placement aware synthesis techniques
- Congestion avoidance / mitigation techniques
- Routing optimization
Static Timing Analysis
Timing Analysis Basics:
Why static timing since simulation is more accurate?
Simulation has a number of key drawbacks:
- requires input state vectors
- long runtimes
A simple example: [Figure: gate network with inputs a, b, c and output z]
How would one calculate the worst case rising delay from a to z?
        c=0           c=1
b=0     a-z delay1    a-z delay2
b=1     a-z delay3    a-z delay4
Exponential explosion as possible design input states grow!
Timing Analysis Basics: Definition of basic terms
- Arrival time (AT): the time at which a pin switches state
- Required arrival time (RAT): the time by which a signal must arrive in order to avoid a chip fail
- Slack = Required arrival time - Arrival time (positive slack good, negative slack bad)
- Slew: the rate at which a signal switches, usually the difference between the 10% and 90% points on the voltage curve
[Figure: voltage versus time waveform rising to vdd with the 10%, 50%, and 90% points marked; slew = time90 - time10, AT = time50]
Timing Analysis Basics: Block based timing
- Worst value only stored at merge points
- Each segment is processed just once
Example Problem: What is slack at PO?
[Figure: timing graph with segment delays between d=1 and d=5; arrival times propagate from at=0 at the primary inputs, with temporary values (temp at=3, temp at=7) shown at merge points, up to at=11 at PO; with rat=10 at PO, slack = 10 - 11 = -1]
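The block-based idea on the slide above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tutorial's tool: the graph, node names, and delays below are hypothetical (the slide's exact topology is not recoverable from the transcript), and only late-mode timing is handled. Arrival times are propagated forward in topological order, keeping only the worst (largest) value at each merge point; required times are propagated backward, keeping the smallest.

```python
from collections import defaultdict, deque

def topo_order(nodes, edges):
    """Return the nodes in topological order plus a fanout map."""
    indeg = {n: 0 for n in nodes}
    fanout = defaultdict(list)
    for src, dst, delay in edges:
        indeg[dst] += 1
        fanout[src].append((dst, delay))
    queue = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for dst, _ in fanout[n]:
            indeg[dst] -= 1
            if indeg[dst] == 0:
                queue.append(dst)
    return order, fanout

def late_mode_timing(nodes, edges, input_at, output_rat):
    """Block-based late-mode timing: each segment is visited once per direction."""
    order, fanout = topo_order(nodes, edges)
    at = dict(input_at)
    for n in order:  # forward pass: worst (max) arrival at merge points
        for dst, delay in fanout[n]:
            at[dst] = max(at.get(dst, float("-inf")), at[n] + delay)
    rat = dict(output_rat)
    for n in reversed(order):  # backward pass: tightest (min) required time
        for dst, delay in fanout[n]:
            rat[n] = min(rat.get(n, float("inf")), rat[dst] - delay)
    slack = {n: rat[n] - at[n] for n in nodes}
    return at, rat, slack

# Hypothetical example: three primary inputs, one internal node, one output.
nodes = ["a", "b", "c", "n1", "po"]
edges = [("a", "n1", 2), ("b", "n1", 1), ("n1", "po", 5), ("c", "po", 3)]
at, rat, slack = late_mode_timing(
    nodes, edges, input_at={"a": 0, "b": 0, "c": 0}, output_rat={"po": 6}
)
```

In this hypothetical graph the path a, n1, po is critical, so `a`, `n1`, and `po` all share the same negative slack while `b` and `c` have slack to spare.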
Timing Analysis Basics: What is Incremental Timing?
- Enabling small incremental changes without full retiming
- Only direct fanin/fanout cone is processed
[Figure: the same timing graph after a local change (new segments with d=1); only the affected cone is retimed, and the arrival time at PO becomes at=10 against rat=10, so slack=0: it passed!]
Early Mode Analysis
Definitions change as follows:
- longest becomes shortest
- slack = arrival - required
[Figure: circuit with signals a, b, c, x, y and unit delays]
$AT_a = 0$, $AT_b = 1$, $AT_c = 0$
$AT_x = 1$, $RAT_x = 2$, $SL_x = 1 - 2 = -1$
$AT_y = 1$, $SL_y = 1 - 1 = 0$
$SL_a = 0 - 0 = 0$, $SL_b = 1 - 0 = 1$, $SL_c = 0 - 1 = -1$
Timing Correction
Fix electrical violations:
- Resize cells
- Buffer nets
- Copy (clone) cells
Fix timing problems:
- Local transforms (bag of tricks)
- Path-based transforms
Local Synthesis Transforms
- Resize cells
- Buffer or clone to reduce load on critical nets
- Decompose large cells
- Swap connections on commutative pins or among equivalent nets
- Move critical signals forward
- Pad early paths
- Area recovery
Transform Example: Double Inverter Removal
[Figure: a path with delay = 4 before the transform and delay = 2 after the double inverter is removed]
Resizing
[Figure: plot of delay d (0 to 0.05) versus load (0 to 1) for cell sizes A, B, and C; a gate with inputs a, b drives sinks d, e, f with loads 0.2, 0.2, and 0.3; at size A the delay is 0.035, at size C it is 0.026]
Cloning
[Figure: plot of delay d (0 to 0.05) versus load (0 to 1) for cell sizes A, B, and C; a gate with inputs a, b drives sinks d, e, f, g, h with load 0.2 each; after cloning, the sinks are split between two copies of the gate (labeled A and B)]
Buffering
[Figure: plot of delay d (0 to 0.05) versus load (0 to 1) for cell sizes A, B, and C; a gate with inputs a, b drives sinks d, e, f, g, h with load 0.2 each; after buffering, a buffer (labeled B, input load 0.1) drives part of the fanout]
Redesign Fan-in Tree
[Figure: fan-in tree with unit gate delays and Arr(a)=4, Arr(b)=3, Arr(c)=1, Arr(d)=0; before the redesign Arr(e)=6, after the redesign Arr(e)=5]
Redesign Fan-out Tree
[Figure: fan-out tree before and after the redesign; longest path = 5 before, longest path = 4 after, which includes the slowdown of the buffer due to load]
Decomposition
Swap Commutative Pins
[Figure: gate with inputs a, b, c shown before and after swapping commutative pins; the signals have different arrival times and the pins have different delays]
Simple sorting on arrival times and delay works
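The sorting remark above can be illustrated with a small Python sketch. It is only a sketch under assumptions: the signal names, pin names, arrival times, and pin delays are hypothetical (the slide's numbers are not recoverable), and the pins are assumed to be fully interchangeable. The latest-arriving signal is paired with the fastest pin, the next latest with the next fastest, and so on; a brute-force search over all assignments is included as a sanity check.

```python
from itertools import permutations

def assign_pins(arrivals, pin_delays):
    """Pair latest-arriving signals with the fastest commutative pins."""
    signals = sorted(arrivals, key=arrivals.get, reverse=True)  # latest first
    pins = sorted(pin_delays, key=pin_delays.get)               # fastest first
    return dict(zip(signals, pins))

def output_arrival(assignment, arrivals, pin_delays):
    """Arrival time at the gate output for a signal-to-pin assignment."""
    return max(arrivals[s] + pin_delays[p] for s, p in assignment.items())

# Hypothetical arrival times and pin-to-output delays.
arrivals = {"a": 3, "b": 1, "c": 0}
pin_delays = {"p0": 1, "p1": 2, "p2": 3}

best = assign_pins(arrivals, pin_delays)
best_at = output_arrival(best, arrivals, pin_delays)

# Sanity check: exhaustive search over every possible assignment.
exhaustive_best = min(
    output_arrival(dict(zip(arrivals, perm)), arrivals, pin_delays)
    for perm in permutations(pin_delays)
)
```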
Move Critical Signals Forward
- Based on ATPG: linear in circuit size, detects redundancies efficiently
- Efficiently finds wires to be added and removed, based on mandatory assignments
[Figure: circuit with signals a, b, c, d, e before and after rewiring]
Placement Objective:
Find optimal relative ordering of cells
- minimize wire length and congestion
- maximize timing slack
Find optimal spacing of cells
- eliminate wiring congestion problems
- provide space for post placement synthesis: clock trees, buffer insertion, timing correction
Find optimal Global Position
Optimal Relative Order:
[Figure: cells A, B, C placed in a row]
To spread ... or not to spread
[Figures: A, B, C spread apart versus packed together]
Place to the left ... or to the right
[Figures: A, B, C packed to the left versus to the right]
Optimal Relative Order:
Without “free” space the problem is dominated by order
Placement Footprints:
- Standard Cell
- Data Path
- IP - Floorplanning
- Mixed Data Path & sea of gates [Figure: core, control, IO, and reserved areas]
- Perimeter IO
- Area IO
Placement objectives are subject to User Constraints / Design Style:
Hierarchical Design Constraints
- pin location
- power rail
- reserved layers
Flat Design w/Floor Plan constraints
- Fixed circuits
- IO connections
[Figure: Unconstrained Placement]
[Figure: Floor planned Placement]
[Figure: Congestion Map]
Advantages of Hierarchy
- Design is carved into smaller pieces that can be worked on in parallel (improved throughput)
- A known floor plan provides the logic design team with a large degree of placement control
- A known floor plan provides early knowledge of long wires
- Timing closure problems can be addressed by tools, logic design, and hierarchy manipulation
- Late design changes can be done with minimal turmoil to the entire design
Disadvantages of Hierarchy
- Results depend on the quality of the hierarchy. The logic hierarchy must be designed with PD taken into account.
- Additional methodology requirements must be met to enable hierarchy, e.g. pin assignment, macro abstract management, area budgeting, floor planning, timing budgets, etc.
- Late design changes may affect multiple components
- Hierarchy allows divergent methodologies
- Hierarchy hinders DA algorithms: they can no longer perform global optimizations
Physical Synthesis Flow
[Figure: flow chart]
1. Synthesized netlist with wire-load models: unplaced, physically “unaware” timing
2. Cleanup: remove buffers, nominal power levels on gates
3. Initial “basic” placement: for minimal wire-length, min-cut, Steiner tree estimates, physically aware timing
4. Logical + placement optimizations
5. Timing-driven placement w/resynthesis: for minimal netweights, based on the timing of the net; physically aware logic optimizations
6. Timing improvement? Yes: iterate again. No more: placed netlist
Example of Logical + Placement Optimizations
[Figure: cut and bin]
- Start with a placed or unplaced netlist
- Do recursive partitioning
- During and following each partition action, apply logic optimizations such as timing corrections, rebuffering, repowering, cloning, pin swapping, move boxes, etc.
Summary of Placement Methods
Simulated annealing
(+) High-quality, arbitrary objectives and constraints, parallelizable, easy to implement
(-) Doesn’t scale
Quadratic (or “analytic”)
(+) Mathematically clean, fast (ConjGrad) solvers
(-) Solving “the wrong problem”; highly illegal solutions must be legalized; fixed “anchors” needed
Example: Alpert, Nam, Villarrubia QUAD+ACG placer (ICCAD-02)
Partitioning-based
(+) Fastest, scales well if multilevel used, good quality
(-) Must be heavily tuned (hMetis, MLPart), difficult to constrain, unstable results (same quality but different structure) (?)
Example: Capo (http://gigascale.org/bookshelf/)
Overview of Common Placement Algorithms:
- Simulated Annealing
- Quadratic Placement
- Partitioning
Simulated Annealing:
for(temp = high; temp > absolute_zero; temp -= increment)
{
    make a random move
    score the move
    use temp dependent probability to decide to accept or reject
}
Note: Clustering can be used to improve performance
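As a concrete illustration of the loop above, here is a minimal Python sketch of annealing on a toy problem: ordering cells in a single row to reduce total net span. Everything specific in it is an assumption made for illustration: the cell and net names, the swap move, the Metropolis acceptance rule `exp(-delta / temp)`, and the geometric cooling schedule (the slide's loop uses a linear `temp -= increment` instead). The best ordering seen so far is tracked so the result is never worse than the starting point.

```python
import math
import random

def wirelength(order, nets):
    """Total span of each net when cells sit at consecutive row positions."""
    pos = {cell: i for i, cell in enumerate(order)}
    return sum(max(pos[c] for c in net) - min(pos[c] for c in net) for net in nets)

def anneal(cells, nets, t_high=10.0, t_low=0.01, cooling=0.95,
           moves_per_temp=50, seed=0):
    rng = random.Random(seed)
    order = list(cells)
    cost = wirelength(order, nets)
    best, best_cost = list(order), cost
    temp = t_high
    while temp > t_low:
        for _ in range(moves_per_temp):
            # Make a random move: swap two cells.
            i, j = rng.sample(range(len(order)), 2)
            order[i], order[j] = order[j], order[i]
            # Score the move.
            new_cost = wirelength(order, nets)
            delta = new_cost - cost
            # Temperature-dependent accept/reject decision.
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = list(order), cost
            else:
                order[i], order[j] = order[j], order[i]  # reject: undo the swap
        temp *= cooling
    return best, best_cost

# Hypothetical netlist: six cells connected by two- and three-pin nets.
cells = ["c0", "c1", "c2", "c3", "c4", "c5"]
nets = [("c0", "c5"), ("c1", "c4"), ("c2", "c3"), ("c0", "c3", "c5"), ("c1", "c2")]
initial_cost = wirelength(cells, nets)
best_order, best_cost = anneal(cells, nets)
```

New constraints would be added the way the following slide suggests: as extra terms in the scoring function, leaving the move generator unchanged.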
Annealing:
Pros:
- ease of implementation, dumb moves / smart scoring
- can easily accommodate new constraints: just add them to the scoring function
- great quality
- can be made to run on parallel processors
Cons:
- very long run time
Quadratic Placement
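Review:
[Figure: movable objects x1 and x2 connected in a chain between fixed points at x=100 and x=200]
$\mathrm{Cost} = (x_1 - 100)^2 + (x_1 - x_2)^2 + (x_2 - 200)^2$
$\partial \mathrm{Cost} / \partial x_1 = 2(x_1 - 100) + 2(x_1 - x_2)$
$\partial \mathrm{Cost} / \partial x_2 = -2(x_1 - x_2) + 2(x_2 - 200)$
Setting the partial derivatives = 0 we solve for the minimum Cost:
$Ax + B = 0$
$\begin{bmatrix} 4 & -2 \\ -2 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} -200 \\ -400 \end{bmatrix} = 0$
$\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} -100 \\ -200 \end{bmatrix} = 0$
$x_1 = 400/3$, $x_2 = 500/3$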
Review: Interpretation of matrices A and B
The example system (fixed points at x=100 and x=200, movable objects x1 and x2):
$\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} -100 \\ -200 \end{bmatrix} = 0$, giving $x_1 = 400/3$, $x_2 = 500/3$
- The diagonal values A[i,i] correspond to the number of connections to xi
- The off diagonal values A[i,j] are -1 if object i is connected to object j, 0 otherwise
- The values B[i] correspond to the negated sum of the locations of fixed objects connected to object i
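The interpretation above translates directly into code. The following Python sketch is an assumption-laden illustration, not the tutorial's implementation: it handles two-pin connections only, and the names `build_system`, `p100`, and `p200` are hypothetical. It assembles A and B from a connection list using the sign convention of the worked example (off-diagonal entries of -1, fixed locations entering B with a negative sign).

```python
def build_system(movable, fixed_pos, connections):
    """Assemble A and B of Ax + B = 0 from two-pin connections.

    movable:     list of movable object names (defines the row order)
    fixed_pos:   dict mapping fixed object name -> location
    connections: list of (u, v) pairs
    """
    index = {name: i for i, name in enumerate(movable)}
    n = len(movable)
    A = [[0.0] * n for _ in range(n)]
    B = [0.0] * n
    for u, v in connections:
        for p, q in ((u, v), (v, u)):
            if p not in index:
                continue  # fixed objects get no row of their own
            i = index[p]
            A[i][i] += 1.0                 # one more connection to object p
            if q in index:
                A[i][index[q]] -= 1.0      # movable neighbor
            else:
                B[i] -= fixed_pos[q]       # fixed neighbor pulls toward its location
    return A, B

# The two-object chain from the review slide: 100 -- x1 -- x2 -- 200.
A, B = build_system(
    movable=["x1", "x2"],
    fixed_pos={"p100": 100.0, "p200": 200.0},
    connections=[("p100", "x1"), ("x1", "x2"), ("x2", "p200")],
)
```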
Why formulate the problem this way?
- Because we can
- Because it is trivial to solve
- Because there is only one solution
- Because the solution is a global optimum
- Because the solution conveys “relative order” information
- Because the solution conveys “global position” information
However:
- Solution is not legal
- Solution depends on fixed anchor points
- Solution does not minimize linear wire length, congestion, or timing
- Solution is generally highly overlapping w/ high density (i.e. needs to be spread out)
What does the solution look like?
- To get an intuitive feel for the solution, examine the relaxation method for solving Ax + B = 0
- Actual program implementation may use other solution methods (that are generally less intuitive)
Solution of Quadratic using Relaxation:
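A relaxation solver for Ax + B = 0 can be sketched as below. This is a Gauss-Seidel style sweep chosen for illustration; the transcript does not say which relaxation variant the slides animate, so treat the details (function name `relax`, sweep limit, tolerance, zero starting point) as assumptions. Each sweep moves every object to the position that balances the pull of its neighbors, which for this formulation is the average of the connected positions.

```python
def relax(A, B, sweeps=500, tol=1e-12):
    """Solve Ax + B = 0 by repeatedly relaxing one unknown at a time."""
    n = len(B)
    x = [0.0] * n
    for _ in range(sweeps):
        biggest_move = 0.0
        for i in range(n):
            # Contribution of the other (currently placed) movable objects.
            off_diag = sum(A[i][j] * x[j] for j in range(n) if j != i)
            new_xi = -(B[i] + off_diag) / A[i][i]
            biggest_move = max(biggest_move, abs(new_xi - x[i]))
            x[i] = new_xi
        if biggest_move < tol:
            break  # no object moved noticeably: converged
    return x

# The review example: fixed points at 100 and 200, movable x1 and x2.
A = [[2.0, -1.0], [-1.0, 2.0]]
B = [-100.0, -200.0]
x = relax(A, B)
```

For the review example the iterates settle at x1 = 400/3 and x2 = 500/3, matching the closed-form solution.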
Constrained Solutions:
- Sometimes we want to solve for the minimum wirelength subject to a constraint
- Example: Using quadratic for partitioning, we may want the quadratic placement to be "centered"
Constrained Solutions
To minimize Cost = f(x) subject to a constraint g(x) = 0 we can use Lagrangian multipliers to modify the Cost function as follows:
$\mathrm{Cost} = f(x) + \lambda g(x)$
$\nabla_x \mathrm{Cost} = \nabla_x f(x) + \lambda \nabla_x g(x)$
Using CG as a constraint:
$CG = \frac{\sum_{i=1}^{n} s_i x_i}{\sum_{i=1}^{n} s_i}$, where $s_i$ is the size of object i and n is the number of objects
$g(x) = \left( \frac{\sum_{i=1}^{n} s_i x_i}{\sum_{i=1}^{n} s_i} \right) - CG$
$\nabla_x g(x) = \frac{s_i}{N}$, where we use N to represent the constant $\sum_{i=1}^{n} s_i$
We have already shown that $\nabla_x f(x) = 0$ leads to the system of equations $Ax + B = 0$.
Therefore solving the constrained problem $\nabla_x \mathrm{Cost} = \nabla_x f(x) + \lambda \nabla_x g(x) = 0$ leads to: $Ax + B + \lambda \frac{s_i}{N} = 0$
Constrained Solutions (cont):
To solve $Ax + B + \lambda \frac{s_i}{N} = 0$ we could use a packaged solver, add the additional unknown $\lambda$ and the equation $CG = \frac{\sum_{i=1}^{n} s_i x_i}{N}$ to our matrices, and solve.
Here is an alternative way to solve the system:
By substitution we let $x = x_u + \lambda x_l$, where $x_u$ is the unconstrained solution (i.e. the solution to $Ax + B = 0$).
Assuming we can solve the unconstrained problem, $x_u$ is known.
By substitution we get:
$A(x_u + \lambda x_l) + B + \lambda \frac{s_i}{N} = 0$
which becomes:
$A \lambda x_l + \lambda \frac{s_i}{N} = 0$ or $A x_l + \frac{s_i}{N} = 0$
Constrained Solutions (cont):
We need to solve: $A x_l + \frac{s_i}{N} = 0$
Note: The A matrix is the same as the A matrix for the unconstrained solution. Since the A matrix is the netlist connectivity specification, we already have A.
The B matrix here is $\frac{s_i}{N}$ instead of the sum of fixed location connects.
Interpretation:
The solution to $A x_l + \frac{s_i}{N} = 0$ can be obtained by modifying the original netlist and placement such that:
1.) All fixed objects are moved to x = 0
2.) A constant force vector is applied to each object. The constant force vector for the i’th object has magnitude $\frac{s_i}{N}$
Then use the same solver as was used to solve $Ax + B = 0$
Constrained Solutions (cont):
We also need to solve for $\lambda$.
From the CG relationship and $x = x_u + \lambda x_l$ we get:
$CG = \frac{\sum_{i=1}^{n} s_i (x_{u_i} + \lambda x_{l_i})}{N}$, where $N = \sum_{i=1}^{n} s_i$ (i.e. total size)
Since we have solved for $x_u$ and $x_l$, the only unknown is $\lambda$, and we get:
$\lambda = \frac{N \cdot CG - \sum_{i=1}^{n} s_i x_{u_i}}{\sum_{i=1}^{n} s_i x_{l_i}}$
Constrained Solutions (summary):
To minimize f(x) (the wl squared cost function) subject to a CG constraint we do the following:
1.) Solve for $x_u$ by solving $A x_u + B = 0$ using relaxation or some other method
2.) Solve for $x_l$ in $A x_l + \frac{s_i}{N} = 0$ as follows:
- Move all fixed objects to location = 0
- Add a constant force vector to each object. The constant force vector for the i’th object has magnitude $\frac{s_i}{N}$
- Using relaxation or some other method, solve for $x_l$
3.) Solve for $\lambda$ using $\lambda = \frac{N \cdot CG - \sum_{i=1}^{n} s_i x_{u_i}}{\sum_{i=1}^{n} s_i x_{l_i}}$
4.) Compute the final placement using $x = x_u + \lambda x_l$
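Review:
[Figure: movable objects x1 (s=10) and x2 (s=100) between fixed points at x=100 and x=200] Force CG to 150.
From the previous example we know that the solution to $A x_u + B = 0$ is $x_u = \begin{bmatrix} 133.33 \\ 166.67 \end{bmatrix}$
With this solution the CG is at $\frac{(10 \cdot 133.33) + (100 \cdot 166.67)}{110} = 163.64$ (i.e. not 150)
Now we need to solve $A x_l + \frac{s_i}{N} = 0$, which is the same as solving $A x_l + \begin{bmatrix} 0 \\ 0 \end{bmatrix} + \frac{s_i}{N} = 0$
Recall that the B matrix represents the position of fixed objects. So, this equation represents the solution to:
[Figure: the same netlist with both fixed objects moved to x = 0 and constant forces of 10/110 applied to x1 and 100/110 applied to x2]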
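The four-step procedure can be carried through numerically for this example. The Python sketch below is illustrative only: the `relax` helper is a simple Gauss-Seidel style solver assumed for the purpose (the tutorial only says "relaxation or some other method"), and `lam` stands for the Lagrange multiplier. It uses the example's values: A and B from the two-object chain, sizes 10 and 100, and a target CG of 150.

```python
def relax(A, B, sweeps=500, tol=1e-12):
    """Solve Ax + B = 0 by relaxing one unknown at a time (assumed solver)."""
    n = len(B)
    x = [0.0] * n
    for _ in range(sweeps):
        biggest_move = 0.0
        for i in range(n):
            off_diag = sum(A[i][j] * x[j] for j in range(n) if j != i)
            new_xi = -(B[i] + off_diag) / A[i][i]
            biggest_move = max(biggest_move, abs(new_xi - x[i]))
            x[i] = new_xi
        if biggest_move < tol:
            break
    return x

A = [[2.0, -1.0], [-1.0, 2.0]]   # netlist connectivity
B = [-100.0, -200.0]             # fixed points at 100 and 200
sizes = [10.0, 100.0]            # s_1, s_2
target_cg = 150.0
N = sum(sizes)

# Step 1: unconstrained solution, A x_u + B = 0.
x_u = relax(A, B)
cg_unconstrained = sum(s * xi for s, xi in zip(sizes, x_u)) / N

# Step 2: fixed objects moved to 0, constant force s_i / N on each object.
x_l = relax(A, [s / N for s in sizes])

# Step 3: solve for the multiplier from the CG relationship.
lam = (N * target_cg - sum(s * xi for s, xi in zip(sizes, x_u))) / sum(
    s * xi for s, xi in zip(sizes, x_l)
)

# Step 4: final placement and its center of gravity.
x = [xu + lam * xl for xu, xl in zip(x_u, x_l)]
cg = sum(s * xi for s, xi in zip(sizes, x)) / N
```

Because `x_u` and `x_l` do not depend on the target, trying a different CG only repeats steps 3 and 4, which is the low-cost property the advantages slide points out.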
Constrained Solutions (summary):
Advantages of this approach:
1.) The solver data structure is the netlist only, i.e. no additional memory requirements
2.) Sometimes the unconstrained solution is by itself sufficient, therefore we can avoid the additional overhead of producing the constrained solution
3.) The numerical iterations in this method are NOT dependent on the CG. We can solve for xu and xl, then try many different CG points at very low cost.
Quadratic Techniques:
Pros:
- mathematically well behaved
- efficient solution techniques find global optimum
- great quality
Cons:
- solution of Ax + B = 0 is not a legal placement, so generally some additional partitioning techniques are required
- solution of Ax + B = 0 is that of the "mapped" problem, i.e. nets are represented as cliques, and the solution minimizes wire length squared, not linear wire length, unless additional methods are deployed
- fixed IOs are required for these techniques to work well
Partitioning
Partitioning:
Objective:
Given a set of interconnected blocks, produce two sets that are of equal size, and such that the number of nets connecting the two sets is minimized.
FM Partitioning:
[Figures: Initial Random Placement, After Cut 1, After Cut 2]
list_of_sets = entire_chip;
while(any_set_has_2_or_more_objects(list_of_sets))
{
    for_each_set_in(list_of_sets)
    {
        partition_it();
    }
    /* each time through this loop the number of */
    /* sets in the list doubles. */
}
FM Partitioning:
[Figure: objects on both sides of the cut, each labeled with its gain (values from -2 to 2)]
Moves are made based on object gain.
Object Gain: The amount of change in cut crossings that will occur if an object is moved from its current partition into the other partition
- each object is assigned a gain
- objects are put into a sorted gain list
- the object with the highest gain from the smaller of the two sides is selected and moved
- the moved object is "locked"
- gains of "touched" objects are recomputed
- gain lists are resorted
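The object-gain definition above can be checked on a tiny example. This Python sketch is a minimal illustration under assumptions: the cells, nets, and initial sides are hypothetical, gains are recomputed from scratch rather than with FM's incremental bucket lists, and locking and balance handling are left out. A net contributes +1 to a cell's gain when the cell is alone on its side of that net (moving it uncuts the net) and -1 when the net lies entirely on the cell's side (moving it cuts the net).

```python
def cut_size(side, nets):
    """Number of nets with cells on both sides of the cut."""
    return sum(1 for net in nets if len({side[c] for c in net}) == 2)

def cell_gain(cell, side, nets):
    """Reduction in cut size if `cell` moves to the other side."""
    gain = 0
    for net in nets:
        if cell not in net:
            continue
        same = sum(1 for c in net if side[c] == side[cell])
        other = len(net) - same
        if same == 1:
            gain += 1   # cell is alone on its side: the move uncuts this net
        if other == 0:
            gain -= 1   # net is uncut today: the move would cut it
    return gain

# Hypothetical netlist and initial bipartition (sides 0 and 1).
nets = [("a", "c"), ("a", "b"), ("b", "c", "d")]
side = {"a": 0, "b": 0, "c": 1, "d": 1}

predicted = {c: cell_gain(c, side, nets) for c in side}

# Cross-check: actually move each cell and measure the change in cut size.
observed = {}
for c in side:
    moved = dict(side)
    moved[c] = 1 - moved[c]
    observed[c] = cut_size(side, nets) - cut_size(moved, nets)
```

In this example only `c` has positive gain, so it would be the first move candidate.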
FM Partitioning:
[Figure sequence: step-by-step FM moves on the example, with object gains updated after each move]
Partitioning:
Pros:
- very fast
- great quality
- scales nearly linearly with problem size
Cons:
- non-trivial to implement
- very directed algorithm, but this limits the ability to deal with miscellaneous constraints
FM Partitioning
- For large designs min-cut (FM) produces poor results
To compensate, there are two widely used enhancements:
1.) Quadratic seeding
2.) Multi-Level partitioning
Partitioning:
[Figure: objects on either side of a cut line, with candidate moves move1 through move4]
Global Placement - Multi-Level Partitioning:
[Figure: clustered netlist across the cut line, with moves move1 through move4 and per-cluster gains]
generate clusters;
while(there are clusters)
{
    partition_it;
    remove 1 cluster layer;
}
partition_it;
[Figure sequence: multi-level partitioning steps on the example, with gains updated as moves are made and cluster layers are removed]
MLP/FM Partitioning Cons:
- Does not know how to handle “free” space
- Results tend to be erratic, i.e. results from run to run have significant variation
MLP/FM Partitioning Pros:
- Handles designs that have no fixed connection points
- Very fast: can handle large designs
Hybrid Techniques
- Use both MLP and Quadratic techniques
- Results are more predictable due to quadratic cost function
- Partitioning is used for overlap removal
- Quadratic is used for “free” space handling and some relative order indications
Quadratic Partitioning
Analytical Constraint Generation
- Combine Quadratic techniques with MLP
- Use Quadratic solution to determine global position (i.e. balance)
- Use MLP to determine relative ordering of cells
Analytical Constraint Generation
[Figure: two bins with capacity = 2 each and objects of area = 1; panels labeled poor solution, quadratic solution, analytical constraint, and ACG solution]
Analytical Constraint Generation
MLPw/ACG
Jan. 2003 ASPDAC03 - Physical Chip Implementation 152
Global Route Results::
ASPDAC03 – Physical Chip Implementation – Section IV 77
Jan. 2003 ASPDAC03 - Physical Chip Implementation 153
MLPw/o ACG
Jan. 2003 ASPDAC03 - Physical Chip Implementation 154
Original ACG
Side by Side Comparison:
Observations on Quadratic Placement
placements are predictable and repeatable
timing is inherently better
wire length is not the best, but good
run time: slower than MLP by 4x
run time: faster than annealing by 4x
excellent “free space” handling
placements “feel” similar to those produced by annealing
Jan. 2003 ASPDAC03 - Physical Chip Implementation 158
Repeatability Example:
One circuit
Minimum linear length occurs for all solutions where y=50, 0 < x < 100
Minimum quadratic length occurs for y=50, x=50
Quadratic solution IS both minimum linear and minimum quadratic length
(0,50) (0,100)
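The repeatability argument can be checked numerically. The sketch below assumes, purely for illustration, one movable cell connected to two fixed pins at (0, 50) and (100, 50); those coordinates and the function names are assumptions chosen to match the ranges quoted on the slide, not values taken from it.

```python
def linear_length(x, y, pins):
    """Sum of Manhattan distances from the cell at (x, y) to each fixed pin."""
    return sum(abs(x - px) + abs(y - py) for px, py in pins)


def quadratic_length(x, y, pins):
    """Sum of squared Euclidean distances from the cell to each fixed pin."""
    return sum((x - px) ** 2 + (y - py) ** 2 for px, py in pins)


# Assumed fixed pins (hypothetical coordinates).
PINS = [(0, 50), (100, 50)]

# Every x between the pins gives the same linear length, so a linear
# objective cannot distinguish between these placements...
linear_values = {x: linear_length(x, 50, PINS) for x in range(10, 100, 10)}

# ...while the quadratic objective has a single minimum.
best_x = min(range(0, 101), key=lambda x: quadratic_length(x, 50, PINS))

print(linear_values)
print("quadratic optimum at x =", best_x)
```

Because the quadratic optimum is unique, two runs on the same netlist land on the same point, which is one way to read the "predictable and repeatable" observation.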
ASPDAC03 – Physical Chip Implementation – Section IV 80
Jan. 2003 ASPDAC03 - Physical Chip Implementation 159
Section Outline
Introduction
Review material (timing and synthesis)
Introduction to placement
Placement algorithms
Paradigms for placement-synthesis integration
Placement aware synthesis techniques
Congestion avoidance / mitigation techniques
Routing optimization
Jan. 2003 ASPDAC03 - Physical Chip Implementation 160
Synthesis - Placement Interface
ASPDAC03 – Physical Chip Implementation – Section IV 81
Jan. 2003 ASPDAC03 - Physical Chip Implementation 161
Partitioning Algorithm: Partition & Reflow
[Flow diagram: Netlist -> Read Data -> Preprocessing -> Global Placement: While ( any partition has > 2 cells ) { Divide each Partition; Reflow across partitions } -> Detailed Placement -> Done. Synthesis boxes attach to the placement flow.]
Jan. 2003 ASPDAC03 - Physical Chip Implementation 162
What Synthesis Can do when Invoked:
- add boxes
- delete boxes
- add nets
- delete nets
- reconnect nets
- change box sizes
- query placement locations of boxes
- query "bin" statistics
- remove a box from a bin
- add a box to a bin
ASPDAC03 – Physical Chip Implementation – Section IV 82
Jan. 2003 ASPDAC03 - Physical Chip Implementation 163
Placement and Synthesis Integration
Loosely coupled: (methodology coupling)
    do some synthesis, then write out data
    do some placement, then write out data
    ... Repeat
Interleaved: (placement & synthesis in same process)
    do pre-pd synthesis
    for each placement step
        redo synthesis
Tightly coupled: (simultaneous P&S aware transforms)
Jan. 2003 ASPDAC03 - Physical Chip Implementation 164
Loosely Coupled Placement & Synthesis:
Characteristics:
- Placement is treated as a black box
- Multiple placement runs are made
[Flow diagram: Do Placement -> Analyze -> Meet Objectives? Yes: Done w/placement. No: re-synthesize, generate constraints, then Do Placement again.]
ASPDAC03 – Physical Chip Implementation – Section IV 83
Jan. 2003 ASPDAC03 - Physical Chip Implementation 165
Interleaved Placement & Synthesis:
Characteristics:
- the placement flow is the same as in a placement only methodology
- in between each step of the placement progression, synthesis is invoked
[Figure: placement flow with Synthesis invoked between steps.]
Jan. 2003 ASPDAC03 - Physical Chip Implementation 166
Tightly Coupled
Placement and synthesis algorithms become co-dependent
Placement algorithms have awareness of synthesis activity
Synthesis algorithms have awareness of placement activity
ASPDAC03 – Physical Chip Implementation – Section IV 84
Jan. 2003 ASPDAC03 - Physical Chip Implementation 167
Section Outline
Introduction
Review material (timing and synthesis)
Introduction to placement
Placement algorithms
Paradigms for placement-synthesis integration
Placement aware synthesis techniques
Congestion avoidance / mitigation techniques
Routing optimization
Jan. 2003 ASPDAC03 - Physical Chip Implementation 168
Summary of Techniques
Placement-Driven X (PDX)
    Cloning, Spreading, Sizing, Fanout Reclustering, … (Constant-Delay Methodology)
Buffer Insertion
    Key problem: Max RAT at Source
    Interconnect Tree synthesis
    Heuristic search over topologies + Van Ginneken dynamic programming (with practical limits on polarity, buffer location, buffer library, etc. richness of formulation)
    C-Tree (IBM), recent Q-Tree (UCSD), P/S/U-Tree (UIC), etc.
    Early timing analysis (slew rate, cap load control): UCSD+IBM
Buffer Block Planning + Buffered Global Routing
    DAC-2001 Best Paper from IBM (buffer bays)
    ASPDAC-2002 Best Paper from UCSD (delay-bounded floorplan evaluation with given buffer plan)
    Primal-Dual Multi-Commodity Flow approximation
ASPDAC03 – Physical Chip Implementation – Section IV 85
Jan. 2003 ASPDAC03 - Physical Chip Implementation 169
Placement Driven Cloning
critical
non-critical
Cloning to off-load non-critical path from critical path
Jan. 2003 ASPDAC03 - Physical Chip Implementation 170
Placement Driven Expansion
Expansion Transformation
Expansion allows primitives to be placed in a more timing friendly way
[Figure: an AO gate between Logic blocks, before and after the expansion transformation.]
ASPDAC03 – Physical Chip Implementation – Section IV 86
Jan. 2003 ASPDAC03 - Physical Chip Implementation 171
Example:
Tightly Coupled Placement Driven Expansion
Jan. 2003 ASPDAC03 - Physical Chip Implementation 172
Tightly Coupled Synthesis & Placement:
abcdefg
Transform
ad
bef
cg
ASPDAC03 – Physical Chip Implementation – Section IV 87
Jan. 2003 ASPDAC03 - Physical Chip Implementation 173
ab
c
d e
fg
Tightly Coupled Synthesis & Placement Example:Suppose the primary IO constraints look like this:
Jan. 2003 ASPDAC03 - Physical Chip Implementation 174
Tightly Coupled Synthesis & Placement Example:
ab
c
d e
fg
The placement of the synthesized netlist would look something like this:
ASPDAC03 – Physical Chip Implementation – Section IV 88
Jan. 2003 ASPDAC03 - Physical Chip Implementation 175
Tightly Coupled Synthesis & Placement Example:
ab
c
d e
fg
If we could re-synthesize the netlist, we could get something that looks like this.
Jan. 2003 ASPDAC03 - Physical Chip Implementation 176
Tightly Coupled Synthesis & Placement:
abc
defg
PD-MAP
abc
defg
weight = 1/10
weight = 1
ASPDAC03 – Physical Chip Implementation – Section IV 89
Jan. 2003 ASPDAC03 - Physical Chip Implementation 177
Tightly Coupled Synthesis & Placement example:
Map_TREE
For each cut
    partition_it
    For each partition
        If (partition number > M) {
            if (related_node_count < N) merge_nodes
            if (related_node_count == 1) merge_node into neighbor partition
        }
    end
end
Jan. 2003 ASPDAC03 - Physical Chip Implementation 178
Tightly Coupled Synthesis & Placement Example:
ab
c
d e
fg
Tightly Coupled Synthesis & Placement Example:
ab
c
d e
fg
ab
c
fg
d e
Result
ASPDAC03 – Physical Chip Implementation – Section IV 93
Jan. 2003 ASPDAC03 - Physical Chip Implementation 185
Placement Driven Timing Correction
Jan. 2003 ASPDAC03 - Physical Chip Implementation 186
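Redesign Fan-in Tree
[Figure: fan-in tree with inputs a, b, c, d and output e; each gate has delay 1. Arr(a)=4, Arr(b)=3, Arr(c)=1, Arr(d)=0. Before redesign Arr(e)=6; after redesign Arr(e)=5. A further label reads Arr(e)=0.]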
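The arrival times on this slide can be reproduced with a small sketch. It assumes 2-input gates with unit delay, a balanced "before" tree pairing (a, b) and (c, d), and a Huffman-style greedy rebuild that always combines the two earliest signals first; the tree shapes and function names are assumptions made for illustration, since the figure itself is not recoverable from the transcript.

```python
import heapq


def balanced_arrival(a, b, c, d, gate_delay=1):
    """Arrival at the output of the balanced tree ((a, b), (c, d))."""
    left = max(a, b) + gate_delay
    right = max(c, d) + gate_delay
    return max(left, right) + gate_delay


def greedy_arrival(arrivals, gate_delay=1):
    """Rebuild the fan-in tree by repeatedly combining the two earliest signals.

    Each 2-input gate output arrives at max(inputs) + gate_delay, so late
    signals end up close to the output and early signals absorb the depth.
    """
    heap = list(arrivals)
    heapq.heapify(heap)
    while len(heap) > 1:
        first = heapq.heappop(heap)
        second = heapq.heappop(heap)
        heapq.heappush(heap, max(first, second) + gate_delay)
    return heap[0]


arr = {"a": 4, "b": 3, "c": 1, "d": 0}
before = balanced_arrival(arr["a"], arr["b"], arr["c"], arr["d"])
after = greedy_arrival(arr.values())
print("Arr(e) before:", before, "after:", after)
```

Under these assumptions the late input a passes through a single gate, which is the effect the redesign is after.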
ASPDAC03 – Physical Chip Implementation – Section IV 94
Jan. 2003 ASPDAC03 - Physical Chip Implementation 187
Placement Driven Repowering
Repowering is traditionally done using load based cell characterization
Placement changes continuously during partitioning
Need high efficiency algorithms to do repowering in this environment
Solution: Use Gain Based Formulation
Jan. 2003 ASPDAC03 - Physical Chip Implementation 188
Delay Models
Load based formulation:
$d = k_1 \cdot C_{out} + p$
Gain based formulation:
$d = l \cdot g + p$, with $g = C_{out} / C_{in}$
d: delay
l: logical effort
g: gain
p: intrinsic delay
[The slide also shows an inverter delay expression $d_{inv}$ in terms of $\beta$, $C_{in}$ and $C_{out}$; it is not recoverable from this transcript.]
ASPDAC03 – Physical Chip Implementation – Section IV 95
Jan. 2003 ASPDAC03 - Physical Chip Implementation 189
Area vs Delay Centric
Load Based Paradigm
    (load-based delay eq.)
    sized
    Know:
        Size of each cell
        Total Area
        -> area centric
    Don’t know:
        Wire loads
        Delay of each cell
        Delay of a path
    Estimation error is in the delay: Local ‘path based’ property.
Gain Based Paradigm
    (gain based delay eq.)
    sizeless
    Know:
        The delay of each cell.
        The delay of a path
        -> delay centric
    Don’t know:
        Wire loads
        The area of each cell
        The total area
    Estimation error is in the area: Global property.
Jan. 2003 ASPDAC03 - Physical Chip Implementation 190
Design FlowHigh Level Synthesis
Restructuring
Tech Mapping
Late Timing Corr
LibraryAnalysis GainBased Opt
Discretization
LoadBasedDelay(DCL)
GainBasedDelay
ASPDAC03 – Physical Chip Implementation – Section IV 96
Jan. 2003 ASPDAC03 - Physical Chip Implementation 191
Power Levels (Gate Sizes)
[Plot: delay d (0 to 0.05) versus Cout (0 to 1) for power levels A, B, C.]
Jan. 2003 ASPDAC03 - Physical Chip Implementation 192
Library (Gain) Analysis
[Plot: delay d (0 to 0.05) versus gain g (0.5 to 10.5) for power levels A, B, C.]
$g = C_{out} / C_{in}$
$d = l \cdot g + p$
ASPDAC03 – Physical Chip Implementation – Section IV 97
Jan. 2003 ASPDAC03 - Physical Chip Implementation 193
Area and Load Calculation
Start at primary outputs / register inputs.
Much like static timing analysis.
Incremental.
[Figure: chain of three gates with gains g1, g2, g3 driving Cout = 0.71.]
$C_3 = C_{out} / g_3$
$C_2 = C_3 / g_2$
$C_1 = C_2 / g_1$
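The backward load calculation is a single pass from the output toward the input. A minimal sketch, assuming a simple chain of gates with no branching; the gain values used in the example call are hypothetical, and only Cout = 0.71 comes from the slide.

```python
def propagate_input_caps(gains, c_out):
    """Walk from the output back to the input: C_i = C_{i+1} / g_i.

    gains[i] is the gain assigned to stage i (ordered input to output).
    Returns the input capacitance of every stage in the same order.
    """
    caps = []
    load = c_out
    for gain in reversed(gains):
        load = load / gain  # input cap of this stage = its load / its gain
        caps.append(load)
    caps.reverse()
    return caps


# Hypothetical gains for a three-stage chain driving the slide's Cout.
stage_caps = propagate_input_caps([2.0, 2.0, 2.0], c_out=0.71)
print(stage_caps)
```

Since input capacitance tracks gate size, summing these values gives a quick area estimate once the gains are fixed.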
Jan. 2003 ASPDAC03 - Physical Chip Implementation 194
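Gain Calculation
[Figure: path of four gates with delays d1 .. d4 between Cin and Cout; total delay D.]
$D = \sum_{i=1}^{N} d_i = d_1 + d_2 + d_3 + d_4$
$G = \prod_{i=1}^{N} g_i = g_1 g_2 g_3 g_4 = \frac{C_2}{C_{in}} \cdot \frac{C_3}{C_2} \cdot \frac{C_4}{C_3} \cdot \frac{C_{out}}{C_4} = \frac{C_{out}}{C_{in}}$
Minimize $D$ such that: $G = C_{out} / C_{in}$
$d = l \cdot g + p = f + p$
Minimize $D = \sum_{i=1}^{N} f_i + \sum_{i=1}^{N} p_i$
Such that: $F = \prod_{i=1}^{N} f_i = \prod_{i=1}^{N} l_i \cdot \prod_{i=1}^{N} g_i = L \cdot \frac{C_{out}}{C_{in}}$
Solution: $f_i = f$ (geometric mean)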
ASPDAC03 – Physical Chip Implementation – Section IV 98
Jan. 2003 ASPDAC03 - Physical Chip Implementation 195
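Example I
[Figure: NAND2 -> NOR2 -> INV path with stage delays d1, d2, d3 and internal loads C2, C3.]
$C_{in} = 0.19$, $C_{out} = 0.71$
NAND2: $d_1 = 0.0308 + 0.011 \times C_2 / C_{in}$
NOR2: $d_2 = 0.0496 + 0.021 \times C_3 / C_2$
INV: $d_3 = 0.0295 + 0.009 \times C_{out} / C_3$
$F = f_1 \cdot f_2 \cdot f_3 = f^3 = L \cdot C_{out} / C_{in} = 0.000008$
$f = 0.0203$

      Nand2   Nor2    Inv     Path
p     0.0308  0.0496  0.0295  0.1099
f     0.0203  0.0203  0.0203  0.0609
d     0.0511  0.0699  0.0498  0.1708
Cin   0.19    0.3364  0.3283  .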
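The equal-effort solution for this example can be recomputed in a few lines. The sketch below assumes the slope coefficients read as 0.011 (NAND2), 0.021 (NOR2) and 0.009 (INV), which is an interpretation of the garbled slide text; the function name is hypothetical. With these rounded coefficients the stage effort comes out near 0.020, close to but not exactly the 0.0203 in the table, presumably because the printed coefficients are rounded.

```python
def equal_effort_sizing(logical_efforts, intrinsic_delays, c_in, c_out):
    """Size a gate chain so every stage has the same effort f = l * g."""
    n_stages = len(logical_efforts)
    path_logical_effort = 1.0
    for effort in logical_efforts:
        path_logical_effort *= effort
    path_effort = path_logical_effort * c_out / c_in   # F = L * Cout / Cin
    stage_effort = path_effort ** (1.0 / n_stages)      # f = F^(1/N)
    gains = [stage_effort / effort for effort in logical_efforts]  # g_i = f / l_i
    path_delay = n_stages * stage_effort + sum(intrinsic_delays)
    return stage_effort, gains, path_delay


# Coefficients as read (rounded) from the Example I slide: NAND2, NOR2, INV.
L_EFFORTS = [0.011, 0.021, 0.009]
P_DELAYS = [0.0308, 0.0496, 0.0295]

f, gains, delay = equal_effort_sizing(L_EFFORTS, P_DELAYS, c_in=0.19, c_out=0.71)
print("stage effort f:", round(f, 4))
print("stage gains:", [round(g, 3) for g in gains])
print("path delay D:", round(delay, 4))
```

The product of the returned gains equals Cout / Cin by construction, which is the constraint in the minimization.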
Jan. 2003 ASPDAC03 - Physical Chip Implementation 196
Constant Delay Calculation
Inverter:
    Set: $g_c = 3.6$
    Calculate: $C_{out} = g_c \cdot C_{in}$
    Measure: $d_c$
    Set: $C_{out} = 0$
    Measure: $p$
    Calculate: $f_c = d_c - p = l \cdot g_c$
Other gates: $l \cdot g = f_c$
    $g_{nand} = 2.5$, $g_{nor} = 1.8$
    $d_c = l_{nand} \cdot g_{nand} + p_{nand}$
[Plots: $d_c$ versus $C_{out}$.]
ASPDAC03 – Physical Chip Implementation – Section IV 99
Jan. 2003 ASPDAC03 - Physical Chip Implementation 197
Discretization
From gain-based model back to appropriate power levels
There is an error in timing/load when ‘ideal’ power levels are not available.
Goal: Minimize this error.
Can be tuned to delay error or capacitance error.
[Figure: chain of three gates with gains g1, g2, g3 driving Cout = 0.71.]
$C_3 = C_{out} / g_3$, $C_2 = C_3 / g_2$, $C_1 = C_2 / g_1$
$d = l \cdot g + p$
[Kudva98][Beeftink98]
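One simple way to picture the discretization step is nearest-neighbor matching on input capacitance, i.e. the variant tuned to capacitance error. This is only a sketch: the library of available power levels below is hypothetical, and the slides do not say that this particular matching rule is what the tools use.

```python
def discretize(ideal_cin, library_cins):
    """Return the available input capacitance closest to the ideal one."""
    return min(library_cins, key=lambda cap: abs(cap - ideal_cin))


# Hypothetical power levels, expressed as input capacitances.
LIBRARY = [0.1, 0.2, 0.4, 0.8]

# Ideal (continuous) input capacitances, e.g. from a gain-based sizing pass.
ideal_caps = [0.19, 0.3364, 0.3283]
chosen = [discretize(cap, LIBRARY) for cap in ideal_caps]
errors = [abs(c - i) for c, i in zip(chosen, ideal_caps)]

print("chosen power levels:", chosen)
print("capacitance errors:", [round(e, 4) for e in errors])
```

A denser library shrinks these errors, which is one reading of the remark that good library design helps the discretization step.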
Jan. 2003 ASPDAC03 - Physical Chip Implementation 198
Gain Based: Observations:
Gain Based algorithms: A major improvement.
More homogeneous (global) algorithms and designs.Can be better targeted for area and/or delay.
Reveal inherent cell characteristics to optimization tools, leading to improved QOR
Good library design is required to facilitate discretization step
Ideally suited for operation within Physical Synthesis
ASPDAC03 – Physical Chip Implementation – Section IV 100
Jan. 2003 ASPDAC03 - Physical Chip Implementation 199
Placement Driven Buffering
Rip Out all Buffers
Insert Buffers based on placement info
Jan. 2003 ASPDAC03 - Physical Chip Implementation 200
What to do About Long Wires?
Add buffers
Tune wire sizes
Modify the placement to reduce them
ASPDAC03 – Physical Chip Implementation – Section IV 101
Jan. 2003 ASPDAC03 - Physical Chip Implementation 201
Placement Driven vs Logic Driven Buffer Insertion
Logic driven buffer insertion focuses on logic topology and buffer sizing while assuming a statistical wire load model
Placement driven buffering uses an existing placement as the fundamental constraint
Jan. 2003 ASPDAC03 - Physical Chip Implementation 202
Placement Driven Buffer Insertion: Buffopt (IBM)
Multiple buffer types
Inverters
Capacitance, Slew and Noise constraints
Wire Sizing
Simultaneous driver sizing
High order interconnect delay and Ceffective
Blockage handling
ASPDAC03 – Physical Chip Implementation – Section IV 102
Jan. 2003 ASPDAC03 - Physical Chip Implementation 203
How Do Buffers Help?
Reduce delay
    Wire delay quadratic in length
    Buffers make delay essentially linear
    Delay gate dominated, not wire dominated
Fix other problems
    Bad slews at sinks
    Capacitance range violations
    Noise induced by capacitance coupling
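The quadratic-versus-linear claim can be illustrated with the linear gate delay and Elmore wire delay models used later in this section. The sketch below assumes a uniform wire, identical equally spaced buffers, and a source and sink modeled as the same buffer type; all electrical values and function names are hypothetical.

```python
def segment_delay(length, r_unit, c_unit, r_drv, k_drv, c_load):
    """Linear gate delay plus Elmore wire delay for one driven wire segment."""
    c_wire = c_unit * length
    r_wire = r_unit * length
    gate_delay = r_drv * (c_wire + c_load) + k_drv
    wire_delay = r_wire * (c_wire / 2.0 + c_load)
    return gate_delay + wire_delay


def buffered_delay(length, n_buffers, r_unit, c_unit, r_buf, k_buf, c_buf):
    """Delay of a wire cut into equal segments by n_buffers identical buffers."""
    n_segments = n_buffers + 1
    one_segment = segment_delay(length / n_segments, r_unit, c_unit,
                                r_buf, k_buf, c_buf)
    return n_segments * one_segment


# Hypothetical per-unit wire parasitics and buffer parameters.
R_UNIT, C_UNIT = 0.1, 0.2
R_BUF, K_BUF, C_BUF = 1.0, 0.5, 0.05

for length in (10.0, 20.0, 40.0):
    n_buffers = int(length / 5) - 1  # one driven segment per 5 length units
    plain = buffered_delay(length, 0, R_UNIT, C_UNIT, R_BUF, K_BUF, C_BUF)
    buffered = buffered_delay(length, n_buffers, R_UNIT, C_UNIT, R_BUF, K_BUF, C_BUF)
    print(length, round(plain, 3), round(buffered, 3))
```

With a fixed segment length, doubling the wire doubles the buffered delay, while the wire-only Elmore term of the unbuffered wire grows fourfold.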
Jan. 2003 ASPDAC03 - Physical Chip Implementation 204
How Does Wire Sizing Help?
Highly resistive lines increase delay
Wider wires or thick metal layers reduce resistance, but can increase capacitance
For long interconnect, resistance reduction outweighs capacitance increase
ASPDAC03 – Physical Chip Implementation – Section IV 103
Jan. 2003 ASPDAC03 - Physical Chip Implementation 205
Simple Buffer Insertion Problem
Given: Source and sink locations, sink capacitances and RATs, a buffer type, source delay rules, unit wire resistance and capacitance
[Figure: source s0 driving four sinks with required arrival times RAT1 .. RAT4, and a buffer.]
Jan. 2003 ASPDAC03 - Physical Chip Implementation 206
Simple Buffer Insertion Problem
Find: Buffer locations and a routing tree such that slack at the source is maximized
$q(s_0) = \min_{1 \le i \le 4} \{ RAT(s_i) - delay(s_0, s_i) \}$
[Figure: buffered routing tree from source s0 to the sinks with RAT1 .. RAT4.]
ASPDAC03 – Physical Chip Implementation – Section IV 104
Jan. 2003 ASPDAC03 - Physical Chip Implementation 207
Fundamental Buffer Insertion
Van Ginneken’s dynamic programming algorithm
Building block: candidate (Cap, slack)
    Candidates for each node stored as a list
    Each sink has one candidate
    Propagate candidates up the tree
Guarantees optimal solution
Quadratic complexity
Jan. 2003 ASPDAC03 - Physical Chip Implementation 208
Assumptions for the Basic Van Ginneken algorithm:
Given a routing tree
Given a set of potential insertion points
Single buffer size
No sink or driver sizing
Linear gate delay model: $R_d \cdot C_{down} + K_d$
Elmore wire delay model: $R_w \cdot (C_w / 2 + C_{down})$
ASPDAC03 – Physical Chip Implementation – Section IV 105
Jan. 2003 ASPDAC03 - Physical Chip Implementation 209
Van Ginneken Extensions
Multiple buffer types
Inverters
Capacitance, Slew and Noise constraints
Wire Sizing
Simultaneous driver sizing
High order interconnect delay and Ceffective
Blockage recognition
Jan. 2003 ASPDAC03 - Physical Chip Implementation 210
Example
- Connect the end points of the net using a Steiner route
- Add Candidate Nodes
- Final buffer solution is optimal for this route, and this set of candidate nodes.
- Other routes may produce better final solutions.
- Net routing topology is an input to Van Ginneken’s algorithm
ASPDAC03 – Physical Chip Implementation – Section IV 106
Jan. 2003 ASPDAC03 - Physical Chip Implementation 211
Example
Jan. 2003 ASPDAC03 - Physical Chip Implementation 212
How Many Candidates?
Number of candidates seems to double with each additional node
Prune a candidate with worse slack when its capacitance is greater or equal
Linear number of candidates
ASPDAC03 – Physical Chip Implementation – Section IV 107
Jan. 2003 ASPDAC03 - Physical Chip Implementation 213
Pseudo Code:
List = NULL;
For each node (bottom up traversal of graph) {
    augment each item in list with wire segment up to node
    duplicate the list
    for each element of the duplicate list
        add a buffer at node
    analyze each element in list
    analyze each element in buffered (duplicate) list
    pick best element of buffered list and delete the rest
    new list is union of list and “best” element of buffered list
}
Pick best solution;
[Figure: wire with numbered candidate nodes.]
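The pseudo code can be turned into a small runnable sketch for the simplest case: a single source-to-sink path, one buffer type, the linear gate delay model and the Elmore wire model. Everything named here (`Candidate`, `Gate`, `van_ginneken_path`, the segment and buffer values) is a hypothetical illustration rather than the Buffopt implementation, and branch merging is left out.

```python
from collections import namedtuple

Candidate = namedtuple("Candidate", "cap slack buffers")
Gate = namedtuple("Gate", "r_out k_delay c_in")


def add_wire(cand, r_wire, c_wire):
    """Extend a candidate upstream across one wire segment (Elmore delay)."""
    delay = r_wire * (c_wire / 2.0 + cand.cap)
    return Candidate(cand.cap + c_wire, cand.slack - delay, cand.buffers)


def add_buffer(cand, buf, position):
    """Insert a buffer driving the candidate's downstream capacitance."""
    delay = buf.r_out * cand.cap + buf.k_delay
    return Candidate(buf.c_in, cand.slack - delay, cand.buffers + (position,))


def prune(cands):
    """Keep only candidates not dominated in both capacitance and slack."""
    kept = []
    for cand in sorted(cands, key=lambda c: (c.cap, -c.slack)):
        if not kept or cand.slack > kept[-1].slack:
            kept.append(cand)
    return kept


def van_ginneken_path(segments, sink_cap, sink_rat, buf, driver):
    """segments: (r_wire, c_wire) pairs ordered from the sink toward the source.

    A buffer may sit at the upstream end of every segment except the last,
    whose upstream end is the source driver itself.
    """
    cands = [Candidate(sink_cap, sink_rat, ())]
    last = len(segments) - 1
    for position, (r_wire, c_wire) in enumerate(segments):
        cands = [add_wire(c, r_wire, c_wire) for c in cands]
        if position < last:
            # All buffered copies share the same cap, so only the best survives.
            best = max((add_buffer(c, buf, position) for c in cands),
                       key=lambda c: c.slack)
            cands = prune(cands + [best])
    finals = [Candidate(c.cap, c.slack - (driver.r_out * c.cap + driver.k_delay),
                        c.buffers) for c in cands]
    return max(finals, key=lambda c: c.slack)


# Hypothetical electrical data: six identical segments, one buffer type.
segments = [(2.0, 2.0)] * 6
buf = Gate(r_out=1.0, k_delay=1.0, c_in=0.5)
driver = Gate(r_out=1.0, k_delay=1.0, c_in=0.0)
result = van_ginneken_path(segments, sink_cap=1.0, sink_rat=100.0,
                           buf=buf, driver=driver)
print("buffer positions:", result.buffers, "source slack:", round(result.slack, 3))
```

Because every buffered copy presents the same input capacitance, keeping only the best one per node is what holds the list to at most one new candidate per step.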
Jan. 2003 ASPDAC03 - Physical Chip Implementation 214
Example
[Figure: wire with numbered candidate nodes.]
Node 1 processing: 2 evaluations, at most 2 candidates kept
Node 2 processing: 4 evaluations, at most 3 candidates kept
Node 3 processing: 6 evaluations, at most 4 candidates kept
Node 4 processing: 8 evaluations, at most 5 candidates kept
Node 5 processing: 10 evaluations, at most 6 candidates kept
Node 6 processing: 12 evaluations, at most 7 candidates kept
Now pick the best one: Optimal solution
$Num\_evaluations = \sum_{i=1}^{N} 2i = N(N+1)$
$Num\_candidates \le N + 1$
ASPDAC03 – Physical Chip Implementation – Section IV 108
Jan. 2003 ASPDAC03 - Physical Chip Implementation 215
Merging Branches
Critical
Merge is additive
Jan. 2003 ASPDAC03 - Physical Chip Implementation 216
Van Ginneken Algorithm Summary
Good
    Clever pruning controls # of candidates
    Finds an optimal solution in quadratic time
    Easily extended to cover a variety of important considerations (like multiple buffer types, wire sizing, polarity, slew, & capacitance constraints, etc.)
Bad
    Results depend on quality of route provided
ASPDAC03 – Physical Chip Implementation – Section IV 109
Jan. 2003 ASPDAC03 - Physical Chip Implementation 217
Example Route:
Critical: cannot offload due to route
Different route leads to better solution
Jan. 2003 ASPDAC03 - Physical Chip Implementation 218
Physical Synthesis Flow
Synthesized Netlist, Wire-load Models
    Unplaced
    Physically “unaware” timing
Cleanup: Remove buffers, nominal power levels on gates
Initial “basic” placement
    For minimal wire-length, min-cut, Steiner tree estimates, physically aware timing
Logical + Placement optimizations
    Physically aware logic optimizations
Timing-driven placement w/resynthesis
    For minimal netweights, based on the timing of the net
Timing Improvement? (Yes / No more)
Placed Netlist
ASPDAC03 – Physical Chip Implementation – Section IV 110
Jan. 2003 ASPDAC03 - Physical Chip Implementation 219
Example Route:
If still critical, add net weight
Jan. 2003 ASPDAC03 - Physical Chip Implementation 220
Example Route:
ASPDAC03 – Physical Chip Implementation – Section IV 111
Jan. 2003 ASPDAC03 - Physical Chip Implementation 221
Multiple Buffer Types
Instead of one buffer type, can choose from m power levels
Generate m candidates instead of one
Still optimal
Complexity increase quadratic in m
Jan. 2003 ASPDAC03 - Physical Chip Implementation 222
Inverters
Store candidates in “+” and “-” lists
    + implies polarity preserved
    - implies polarity reversed
Adding inverter
    Switches candidate in + list to - list
    Switches candidate in - list to + list
Final result only chosen from + list
ASPDAC03 – Physical Chip Implementation – Section IV 112
Jan. 2003 ASPDAC03 - Physical Chip Implementation 223
Capacitance Constraints
Each gate g can drive at most C(g) capacitance
When inserting buffer g, check downstream capacitance. If it is bigger than C(g), throw out candidate
Increases efficiency
Jan. 2003 ASPDAC03 - Physical Chip Implementation 224
Slew Constraints
Similar to capacitance constraints
When inserting buffer, compute slews to gates driven by buffer
If any slew exceeds its target, throw out candidate
Potential difficulty: computing slew accurately in bottom-up fashion
ASPDAC03 – Physical Chip Implementation – Section IV 113
Jan. 2003 ASPDAC03 - Physical Chip Implementation 225
Noise Constraints
Each gate has acceptable noise threshold
Compute cumulative noise for each wire via Devgan noise metric
Throw out candidates that violate noise
Can avoid noise while optimizing timing!
Jan. 2003 ASPDAC03 - Physical Chip Implementation 226
Wire Sizing:
For each node (bottom up traversal of graph) {
    for each Wire Size {
        augment each item in list with Sized wire segment
        duplicate the list
        for each element of the duplicate list
            add a buffer at node
        analyze each element in list
        analyze each element in buffered (duplicate) list
        pick best element of buffered list and delete the rest
        new list is union of list and “best” element of buffered list
    }
}
Do Final pruning & Pick best solution;
[Figure: wire with numbered candidate nodes.]
ASPDAC03 – Physical Chip Implementation – Section IV 114
Jan. 2003 ASPDAC03 - Physical Chip Implementation 227
Blockage Recognition
Delete insertion points that run over blockages
Jan. 2003 ASPDAC03 - Physical Chip Implementation 228
Route Around Blockage
ASPDAC03 – Physical Chip Implementation – Section IV 115
Jan. 2003 ASPDAC03 - Physical Chip Implementation 229
Buffer Bays
Jan. 2003 ASPDAC03 - Physical Chip Implementation 230
Routing Into Buffer Bays
ASPDAC03 – Physical Chip Implementation – Section IV 116
Jan. 2003 ASPDAC03 - Physical Chip Implementation 231
“Buffer Site”
Similar to buffer bays, only exact buffer locations are pre-specified, not just areas
Useful as a mechanism for IP blocks and microprocessor design
Dummy cell that holds a buffer
    Not connected to any net
    Becomes buffer when assigned to a net
    Extra sites can serve as decoupling caps
Sprinkle sites throughout design
Allocate percentage within macros
Jan. 2003 ASPDAC03 - Physical Chip Implementation 232
Routing Into Buffer Sites
ASPDAC03 – Physical Chip Implementation – Section IV 117
Jan. 2003 ASPDAC03 - Physical Chip Implementation 233
Generate Steiner Tree
Jan. 2003 ASPDAC03 - Physical Chip Implementation 234
Reduce Congestion and Coupling
ASPDAC03 – Physical Chip Implementation – Section IV 118
Jan. 2003 ASPDAC03 - Physical Chip Implementation 235
Reduce Congestion and Coupling
Jan. 2003 ASPDAC03 - Physical Chip Implementation 236
Assign Buffers
ASPDAC03 – Physical Chip Implementation – Section IV 119
Jan. 2003 ASPDAC03 - Physical Chip Implementation 237
Comments about Buffering and Wire Sizing:
Extremely critical: One of the highest leverage timing closure items
There are extended provably correct algorithms for dealing with the problem.
Steiner route & Blockage avoidance are mostly heuristic: Hot research area!
Jan. 2003 ASPDAC03 - Physical Chip Implementation 238
Section Outline
Introduction
Review material (timing and synthesis)
Introduction to placement
Placement algorithms
Paradigms for placement-synthesis integration
Placement aware synthesis techniques
Congestion avoidance / mitigation techniques
Routing Optimization
ASPDAC03 – Physical Chip Implementation – Section IV 120
Jan. 2003 ASPDAC03 - Physical Chip Implementation 239
Congestion Mitigation
Jan. 2003 ASPDAC03 - Physical Chip Implementation 240
Sources of Congestion
Placement Quality: Do we have a good relative ordering of cells?
Placement Density: Do we have appropriate cell spreading?
Preplacement of large cells: Is there a better location for these cells?
Floorplan quality: Is this a good floorplan / hierarchy?
Netlist complexity: Are some logic groupings inherently difficult to route?
Library characteristics: Do some cells block too much metal internally?
ASPDAC03 – Physical Chip Implementation – Section IV 121
Jan. 2003 ASPDAC03 - Physical Chip Implementation 241
Congestion Mitigation
Constructive Avoidance
    control global placement pin density: fewer pins per unit area means fewer wires per unit area
    monitor congestion during placement and perform dynamic spreading
Post placement fix up
    remove problems from an already placed netlist
Jan. 2003 ASPDAC03 - Physical Chip Implementation 242
Constructive Avoidance:
Characteristics:
- as placement is formed, take action to avoid problems
- between each step of the placement progression there is the potential to evaluate congestion and take action
[Figure: placement progression with a "Groute / Spread / Redo" step between placement steps, … etc]
ASPDAC03 – Physical Chip Implementation – Section IV 122
Jan. 2003 ASPDAC03 - Physical Chip Implementation 243
Constructive Avoidance Deficiencies:
Depends on early estimates of congestion that may not be accurate enough to avoid all problems
Post placement actions such as clock tree insertion, repowering, buffering, etc. may add congestion to the design
Guard banding with conservative “constructive avoidance” causes loss of performance and density
Jan. 2003 ASPDAC03 - Physical Chip Implementation 244
Post Placement Congestion Mitigation
Use production global router, not internal placement based global router
Translate congestion values into density targets for placement regions
Perform flow based circuit spreading
Preserve relative logic ordering of cells
ASPDAC03 – Physical Chip Implementation – Section IV 123
Jan. 2003 ASPDAC03 - Physical Chip Implementation 245
Network Flow Based Spreading
[Figure: flow network with source s, supply nodes i with b(i) > 0, demand nodes j with b(j) < 0, and sink t.]
Min-cost max-flow formulation, similar to any “fix-up spreader”: “thermal” placement, Bonn’s top-down placer (Vygen), etc.
For all i with b(i) > 0: Cap(e_si) = b(i), Cost(e_si) = 0
For all i ≠ s, j ≠ t: Cap(e_ij) = Infinity (Large Int), Cost(e_ij) = K
For all j with b(j) < 0: Cap(e_jt) = -b(j), Cost(e_jt) = 0
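A toy instance of this formulation can be solved with a textbook successive-shortest-path min-cost flow. The sketch below is an illustration only: it assumes four bins in a row, connects only adjacent bins (so cost grows with the distance a cell is moved), and uses a hand-written solver; the bin values, K, and all names are hypothetical, not the production spreader.

```python
def min_cost_flow(n_nodes, edges, source, sink):
    """Successive shortest paths with Bellman-Ford.

    edges: list of (u, v, capacity, cost). Returns (total_flow, total_cost).
    """
    graph = [[] for _ in range(n_nodes)]  # node -> indices of outgoing edges
    to, cap, cost = [], [], []
    for u, v, c, w in edges:
        graph[u].append(len(to)); to.append(v); cap.append(c); cost.append(w)
        graph[v].append(len(to)); to.append(u); cap.append(0); cost.append(-w)
    inf = float("inf")
    total_flow = total_cost = 0
    while True:
        dist = [inf] * n_nodes
        prev_edge = [-1] * n_nodes
        dist[source] = 0
        for _ in range(n_nodes - 1):
            changed = False
            for u in range(n_nodes):
                if dist[u] == inf:
                    continue
                for e in graph[u]:
                    if cap[e] > 0 and dist[u] + cost[e] < dist[to[e]]:
                        dist[to[e]] = dist[u] + cost[e]
                        prev_edge[to[e]] = e
                        changed = True
            if not changed:
                break
        if dist[sink] == inf:
            return total_flow, total_cost
        # Find the bottleneck along the path, then push flow along it.
        push, v = inf, sink
        while v != source:
            e = prev_edge[v]
            push = min(push, cap[e])
            v = to[e ^ 1]  # tail of edge e is the head of its reverse edge
        v = sink
        while v != source:
            e = prev_edge[v]
            cap[e] -= push
            cap[e ^ 1] += push
            v = to[e ^ 1]
        total_flow += push
        total_cost += push * dist[sink]


# b(i) > 0: bin is over its density target; b(j) < 0: bin has spare room.
supply = [3, 0, -1, -2]
n_bins = len(supply)
S, T = n_bins, n_bins + 1
K, BIG = 1, 10 ** 9

edges = []
for i, b in enumerate(supply):
    if b > 0:
        edges.append((S, i, b, 0))    # Cap(e_si) = b(i), Cost = 0
    elif b < 0:
        edges.append((i, T, -b, 0))   # Cap(e_jt) = -b(j), Cost = 0
for i in range(n_bins - 1):           # assumed: only adjacent bins are linked
    edges.append((i, i + 1, BIG, K))
    edges.append((i + 1, i, BIG, K))

flow, cost = min_cost_flow(n_bins + 2, edges, S, T)
print("units moved:", flow, "total movement cost:", cost)
```

Charging K per bin-to-bin hop makes the cheapest flow move cells the shortest total distance, which helps preserve the relative ordering of the placement.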
Jan. 2003 ASPDAC03 - Physical Chip Implementation 246
Congestion Driven Circuit Spreading
[Flow diagram: Initial Placement -> Calculate bin level congestion -> Is congestion below threshold? Yes: Final Placement. No: Translate bin score to bin target density -> Network flow based circuit spreading -> Calculate bin level congestion again.]
We’ve Talked About
Placement algorithms
Placement / Synthesis interaction
Placement aware synthesis techniques
The Constant Delay paradigm
Physical Buffer insertion / Wire sizing
Congestion Mitigation
Jan. 2003 ASPDAC03 - Physical Chip Implementation 252
Let’s Look at some Examples:
ASPDAC03 – Physical Chip Implementation – Section IV 127
Jan. 2003 ASPDAC03 - Physical Chip Implementation 253
Pure MLP Quadratic
Jan. 2003 ASPDAC03 - Physical Chip Implementation 254
Optimization Results
[Figure labels: shatter, clone, fanin, buffer]
ASPDAC03 – Physical Chip Implementation – Section IV 128
Jan. 2003 ASPDAC03 - Physical Chip Implementation 255
Optimization Results
Jan. 2003 ASPDAC03 - Physical Chip Implementation 256
Optimization Results
ASPDAC03 – Physical Chip Implementation – Section IV 129
Jan. 2003 ASPDAC03 - Physical Chip Implementation 257
Section outline
Introduction
Review material (timing and synthesis)
Introduction to placement
Placement algorithms
Paradigms for placement-synthesis integration
Placement aware synthesis techniques
Congestion avoidance / mitigation techniques
Routing optimization
Jan. 2003 ASPDAC03 - Physical Chip Implementation 258
Routing Based Optimization: RBO (IBM)
ASPDAC03 – Physical Chip Implementation – Section IV 130
Jan. 2003 ASPDAC03 - Physical Chip Implementation 259
Routing based Timing Closure Issues
Post Routing timing problems can be significant
    affect design schedule
    may be too numerous to fix manually
Increasing design density can reduce cost, but it also increases wiring congestion
    timing and signal integrity become more significant
    available resource for manual fixup is limited
    without automation may not be doable
Rerouting with constraints may resolve some of the problems, but this process is slow
Jan. 2003 ASPDAC03 - Physical Chip Implementation 260
Solution:
Integrate global routing, detailed routing and timing correction
Global routing is efficient enough to be run in an iterative timing closure loop
Timing critical nets avoid scenic routes
Non-critical nets that go scenic can be repowered and buffered prior to detailed routing
ASPDAC03 – Physical Chip Implementation – Section IV 131
Jan. 2003 ASPDAC03 - Physical Chip Implementation 261
Example Problem:
[Figure: critical paths and non-critical paths shown under three wiring solutions.]
ideal "Steiner" routes: PDS Timing uses Steiner wires - fast
Timing deficient wiring solution: Post PD Timing catches this problem: Slow!
Timing driven wiring solution: RBO Timing Driven Routing sees this during global route stage: Fast!
Force optimal use of wiring resource (e.g. critical paths get direct route)
Jan. 2003 ASPDAC03 - Physical Chip Implementation 262
Global Routing
Divides the entire chip into localized rectangular regions called tiles.
Compresses several pin locations in each tile to a single pin location
All the shapes, wires and opens are represented in terms of global track capacity and usage.
ASPDAC03 – Physical Chip Implementation – Section IV 132
Jan. 2003 ASPDAC03 - Physical Chip Implementation 263
Global Routing
Two step approach
    Create the initial Steiner routes
    Compute the edge congestion on the grid
    Perform a rip-up and reroute using a shortest path algorithm to reduce the overall congestion of the design
Advantages
    Can communicate with detail router
    Good correlation with final detail routing solution
Jan. 2003 ASPDAC03 - Physical Chip Implementation 264
Routing Based Optimization
Current Methodology: Physical Synthesis -> Global Routing -> Detailed Routing -> Timing Analysis
    No Timing Criticality for Global router
    Costly Manual Timing Correction
RBO Methodology: Physical Synthesis -> RBO / Physical Synthesis -> Detailed Routing -> Analysis
    RBO components: XrGlobal, Extractor, Optimizer, Einstimer
ASPDAC03 – Physical Chip Implementation – Section IV 133
Jan. 2003 ASPDAC03 - Physical Chip Implementation 265
RBO Extraction Process
Very fast
Excellent correlation with final 3D extraction
Uses global routes for extraction
Neighbor information probabilistically determined based on the global routing congestion information
Based on extraction tables
Capacity of All Edges = 5
Probability of having a neighbor = (#OccupiedTracks)/(#Capacity) = 2/4 = 0.5
1
3 3 2
Jan. 2003 ASPDAC03 - Physical Chip Implementation 266
RBO results on memcntl
Design : Example 1
Nets : ~1.6M
Size : 23193 x 23193
Congestion : Attached is a display of Global congestion
ASPDAC03 – Physical Chip Implementation – Section IV 134
Jan. 2003 ASPDAC03 - Physical Chip Implementation 267
Timing Critical Nets: Without RBO
Jan. 2003 ASPDAC03 - Physical Chip Implementation 268
Nets Routed with RBO flow
ASPDAC03 – Physical Chip Implementation – Section IV 135
Jan. 2003 ASPDAC03 - Physical Chip Implementation 269
RBO Results - Example 1
Columns: Worst Slack | #Slack Violations | #Cap Violations | #Slew Violations | #Opens | #Loops
Steiner Estimates: -0.47 17 1 18
XrLocal without RBO: -1.57 4687 14 128 50 87020
RBO Timing Closure (Global Routes): -0.48 209
Detailed routing with RBO: -0.43 14 18 1 54
Jan. 2003 ASPDAC03 - Physical Chip Implementation 270
Example 2: - Critical Net Routed Without RBO
ASPDAC03 – Physical Chip Implementation – Section IV 136
Jan. 2003 ASPDAC03 - Physical Chip Implementation 271
Example 2: Critical Net Routed With RBO
Jan. 2003 ASPDAC03 - Physical Chip Implementation 272
Example 2 Results Summary

                           Worst Slack  #Slack Violations  #Cap Violations  #Slew Violations  #Opens  #Loops
Final routing without RBO  -0.54        1224               33               270               0       1152
Using RBO                  -0.29        909                32               274               0       1070
PDS - RBO Integration
[Figure: three flow diagrams labeled "Current Flow", "Projected Flow", and "Proposed Flow - Existing Tools", with regions marked "Not Noise Aware" and "Noise Aware" and step numbers 1-11. Step labels recovered from the figure, each followed by its tool:]
Steiner Routes Timing Closure: PDS-Einstimer
Timing SignOff: ChipEdit-Einstimer
Detail Routing: Xrouter
No Noise SignOff
Probabilistic Detection of Noise Problems: PDS-Einstimer-RBO
Noise Avoidance: PDS-Einstimer-RBO
SignOff Noise Detection: ETCoupling-3DNoise
Noise Correction: Manual Correction
Timing Closure: PDS-Einstimer-RBO, Steiner-Global
Global Routes Timing Closure: RBO-Einstimer
Noise Detection and Avoidance:
RBO (Detection)
  Length Based
    Initial selection includes length and slack threshold
    Further pruning based on Worst Case Miller Timing
  Switching Window based refinement
    Pattern generation based on switching window overlaps
RBO (Avoidance)
  Long Net Spreading
  Track Reordering
  Incremental Placement Changes
  Layer Assignment
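The detection steps listed above (threshold-based selection, then switching-window refinement) can be sketched as follows. This is only an illustration under stated assumptions: the `Net` record, its fields, the threshold values, and the function names are hypothetical, and the worst-case Miller timing pruning step is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Net:
    name: str
    length: float   # routed length, arbitrary units (hypothetical field)
    slack: float    # timing slack, negative means violating
    window: tuple   # (earliest, latest) switching time


def select_candidates(nets, min_length, max_slack):
    """Initial selection: keep nets that are both long and timing critical."""
    return [n for n in nets if n.length >= min_length and n.slack <= max_slack]


def windows_overlap(a: Net, b: Net) -> bool:
    """True when the two nets can switch at the same time."""
    return a.window[0] <= b.window[1] and b.window[0] <= a.window[1]


def overlapping_aggressors(victim: Net, neighbors):
    """Refinement: only neighbors whose switching windows overlap the victim's."""
    return [n for n in neighbors if windows_overlap(victim, n)]


# Made-up nets for illustration.
a = Net("a", 500.0, -0.1, (0.0, 2.0))
b = Net("b", 100.0, -0.5, (1.0, 3.0))
c = Net("c", 800.0, 0.4, (5.0, 6.0))
d = Net("d", 900.0, -0.2, (1.5, 4.0))

victims = select_candidates([a, b, c, d], min_length=400.0, max_slack=0.0)
print([n.name for n in victims])
print([n.name for n in overlapping_aggressors(a, [b, c, d])])
```

Filtering by length and slack first keeps the more expensive window comparison limited to nets where coupling could actually matter.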
Noise Detection and Avoidance:
  Wire width selection
Physical Synthesis Integration
  Fix Cap And Slew Violations with Global Routes
  Interface to RBO Noise Alleviation
  Resizing
  Noise Aware Buffering
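Track reordering, one of the RBO avoidance techniques named in this section, can be illustrated with a small Python sketch. The slides do not describe the algorithm RBO uses, so this is an assumed stand-in: it exhaustively tries every track order for a handful of nets and picks the one that minimizes the total switching-window overlap between adjacent tracks. The cost function and all names are hypothetical, and brute force is only practical for a few nets.

```python
from itertools import permutations


def overlap(w1, w2) -> float:
    """Length of the time interval in which both windows are active."""
    return max(0.0, min(w1[1], w2[1]) - max(w1[0], w2[0]))


def coupling_cost(order, windows) -> float:
    """Sum of switching-window overlaps between nets on adjacent tracks."""
    return sum(overlap(windows[a], windows[b]) for a, b in zip(order, order[1:]))


def reorder_tracks(windows):
    """Try every track order and return one with minimal adjacent overlap."""
    return min(permutations(sorted(windows)),
               key=lambda order: coupling_cost(order, windows))


# Made-up switching windows: n1 and n2 overlap, n3 switches later.
windows = {"n1": (0.0, 4.0), "n2": (1.0, 5.0), "n3": (6.0, 9.0)}
best = reorder_tracks(windows)
print(best, coupling_cost(best, windows))
```

Here the quiet net n3 ends up between n1 and n2, so no two adjacent tracks can switch at the same time.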
Long Net Spreader
Wrap Up
Timing closure today is highly dependent on integrated tools.
Tightly integrated Placement, Timing & Synthesis tools are available today from multiple vendors.
Placement techniques are dominated by quadratic techniques and partitioning.
Next on the list for integration are Routing and Signal Integrity tools (happening now).
These tools have a high degree of complexity. It takes large, well-funded DA organizations to compete in this space.
Placement References
C. J. Alpert, T. Chan, D. J.-H. Huang, I. Markov, and K. Yan, “Quadratic Placement Revisited”, Proc. 34th IEEE/ACM Design Automation Conference, 1997, pp. 752-757
C. J. Alpert, J.-H. Huang, and A. B. Kahng, “Multilevel Circuit Partitioning”, Proc. 34th IEEE/ACM Design Automation Conference, 1997, pp. 530-533
U. Brenner and A. Rohe, “An Effective Congestion Driven Placement Framework”, International Symposium on Physical Design, 2002, pp. 6-11
A. E. Caldwell, A. B. Kahng, and I. L. Markov, “Can Recursive Bisection Alone Produce Routable Placements”, Proc. 37th IEEE/ACM Design Automation Conference, 2000, pp. 477-482
M. A. Breuer, “Min-Cut Placement”, J. Design Automation and Fault Tolerant Computing, 1(4), 1977, pp. 343-362
J. Vygen, “Algorithms for Large-Scale Flat Placement”, Proc. 34th IEEE/ACM Design Automation Conference, 1997, pp. 746-751
H. Eisenmann and F. M. Johannes, “Generic Global Placement and Floorplanning”, Proc. 35th IEEE/ACM Design Automation Conference, 1998, pp. 269-274
S.-L. Ou and M. Pedram, “Timing Driven Placement Based on Partitioning with Dynamic Cut-Net Control”, Proc. 37th IEEE/ACM Design Automation Conference, 2000, pp. 472-476
C. M. Fiduccia and R. M. Mattheyses, “A Linear Time Heuristic for Improving Network Partitions”, Proc. ACM/IEEE Design Automation Conference, 1982, pp. 175-181
Synthesis References
C. L. Berman, J. L. Carter, and K. F. Day. The Fanout Problem: From Theory to Practice. In Advanced Research in VLSI: Proceedings of the 1989 Decennial Caltech Conference, pages 69-99, 1989
C. L. Berman, D. J. Hathaway, A. S. LaPaugh, and L. H. Trevillyan. Efficient Techniques for Timing Corrections. In International Symposium on Circuits and Systems, pages 415-419, 1990
F. Beeftink, P. N. Kudva, D. S. Kung, R. Puri, and L. Stok. Combinatorial Cell Design for CMOS Libraries. INTEGRATION, the VLSI Journal, 29:67-93, 2000
W. Donath, P. Kudva, L. Stok, P. Villarrubia, L. Reddy, and A. Sullivan. Transformational placement and synthesis. In DATE, pages 194-201, 2000
D. J. Hathaway, R. P. Abato, A. D. Drumm, and L. P. P. P. van Ginneken. Incremental timing analysis. Technical report, IBM Corp., 1996. U.S. patent 5,508,937
D. Kung, P. Kudva, and A. Sullivan. A Gate Sizing Algorithm using Geometric Programming. In Proc. of the International Workshop on Logic Synthesis, 1997
T. Kutzschebauch and L. Stok. Regularity driven logic synthesis. In Proc. of the Int. Conf. on Computer Aided Design, Nov 2000
P. Rezvani, A. H. Ajami, M. Pedram, and H. Savoj. LEOPARD: A Logical Effort based fanout Optimizer for Area and Delay. In IEEE/ACM International Conference on CAD, pages 516-519, 1999
L. Stok, M. Iyer, and A. Sullivan. Wavefront technology mapping. In DATE, pages 531-536, 1999
D. S. Kung. A Fast Fanout Optimization for Near-Continuous Buffer Libraries. In IEEE/ACM Design Automation Conference, pages 352-355, 1998
DP Buffer Insertion References
Buffer placement in distributed RC-tree networks for minimal Elmore delay. van Ginneken, L.P.P.P. IEEE International Symposium on Circuits and Systems, 1990, Page(s): 865-868, vol. 2
Optimal wire sizing and buffer insertion for low power and a generalized delay model. Lillis, J.; Chung-Kuan Cheng; Lin, T.-T.Y. IEEE Journal of Solid-State Circuits, Volume: 31, Issue: 3, March 1996, Page(s): 437-447
Buffer insertion for noise and delay optimization. Alpert, C.J.; Devgan, A.; Quay, S.T. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Volume: 18, Issue: 11, Nov. 1999, Page(s): 1633-1645
Buffer insertion with accurate gate and interconnect delay computation. Alpert, C.J.; Devgan, A.; Quay, S.T. Proceedings of the 36th Design Automation Conference, 1999, Page(s): 479-484
Wire Segmenting For Improved Buffer Insertion. Alpert, C.; Devgan, A. Proceedings of the 34th Design Automation Conference, 1997, Page(s): 588-593
Simultaneous routing and buffer insertion for high performance interconnect. Lillis, J.; Chung-Kuan Cheng; Ting-Ting Y. Lin. Proceedings of the Sixth Great Lakes Symposium on VLSI, 1996, Page(s): 148-153
Blockage Avoidance References
Steiner tree optimization for buffers, blockages, and bays. Alpert, C.J.; Gandham, G.; Jiang Hu; Neves, J.I.; Quay, S.T.; Sapatnekar, S.S. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Volume: 20, Issue: 4, April 2001, Page(s): 556-562
A fast algorithm for context-aware buffer insertion. Jagannathan, A.; Sung-Woo Hur; Lillis, J. Design Automation Conference, 2000. Proceedings 2000, Page(s): 368-373
Simultaneous routing and buffer insertion with restrictions on buffer locations. Hai Zhou; Wong, D.F.; I-Min Liu; Aziz, A. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Volume: 19, Issue: 7, July 2000, Page(s): 819-824
Maze routing with buffer insertion and wire sizing. Minghorng Lai; Wong, D.F. Design Automation Conference, 2000. Proceedings 2000, Page(s): 374-378
Routing tree construction under fixed buffer locations. Cong, J.; Xin Yuan. Design Automation Conference, 2000. Proceedings 2000, Page(s): 379-384
Interconnect Planning References
A practical methodology for early buffer and wire resource allocation. Alpert, C.J.; Jiang Hu; Sapatnekar, S.S.; Villarrubia, P.G. Design Automation Conference, 2001. Proceedings, 2001, Page(s): 189-194
An interconnect-centric design flow for nanometer technologies. Cong, J. Proceedings of the IEEE, Volume: 89, Issue: 4, April 2001, Page(s): 505-528
Buffer block planning for interconnect-driven floorplanning. Cong, J.; Tianming Kong; Pan, D.Z. IEEE/ACM International Conference on Computer-Aided Design, 1999. Digest of Technical Papers, Page(s): 358-363
Provably good global buffering using an available buffer block plan. Dragan, F.F.; Kahng, A.B.; Mandoiu, I.; Muddu, S.; Zelikovsky, A. ICCAD-2000. IEEE/ACM International Conference on Computer Aided Design, 2000, Page(s): 104-109
Provably good global buffering by multiterminal multicommodity flow approximation. Dragan, F.F.; Kahng, A.B.; Mandoiu, I.; Muddu, S.; Zelikovsky, A. Proceedings of the ASP-DAC 2001, Asia and South Pacific Design Automation Conference, 2001, Page(s): 120-125
Planning buffer locations by network flows. Tang, X.; Wong, D.F. International Symposium on Physical Design, April 2001, Page(s): 180-185
Routability-Driven Repeater Block Planning for Interconnect-Centric Floorplanning. Sarkar, P.; Sundararaman, V.; Koh, C.-K. International Symposium on Physical Design, April 2001, Page(s): 186-191