Date post: | 18-Mar-2018 |
Category: |
Documents |
Upload: | nguyentruc |
View: | 221 times |
Download: | 0 times |
June 2002 DAC02 - Physical Chip Implementation 2
IBM Contributions to this presentation include:n T.J. Watson Research Centern Austin Research Labn ASIC Design Centersn EDA Organization
* For more detailed information see references atthe end of this presentation, which include a widevariety of IBM and External publications covering theseareas.
June 2002 DAC02 - Physical Chip Implementation 3
Overviewn Introductionn Review material (timing and synthesis)n Introduction to placementn Placement algorithmsn Paradigms for placement-synthesis
integrationn Placement aware synthesis techniquesn Congestion avoidance / mitigation techniquesn Routing optimization
June 2002 DAC02 - Physical Chip Implementation 4
Timing Closuren Many aspects of a design contribute to performance,
power, and densityuArchitecture / Logic ImplementationuPD Design Style (Flat, Hierarchical, etc)uClocking Paradigm / Test / Circuit Familyu Floor Plan / Synthesis / Placement / Routing
n Design Automation for timing closure is more significant than ever beforeuDesigns are largeruWires are longer, invalidating statistical synthesis
models, and requiring lots of buffersuCycle times are more aggressive
June 2002 DAC02 - Physical Chip Implementation 5
Design Automation Tools are Individually Maturen Timing analysisn Synthesis / Technology mappingn Placement / Routingn Floor Planningn Extraction / Analysis
June 2002 DAC02 - Physical Chip Implementation 6
Physical SynthesisPlace&route
synthesis timing
Challenge is to integrate them into one cooperative application
Netlist in
Completed
Design
June 2002 DAC02 - Physical Chip Implementation 7
Design Flow Evolution:Design Entry
Synthesis w/Timing
Place
Route
Timing
1. Tech independent optimization
2. Tech mapping
3. Timing correctionTiming driven
placement Timing Driven
Placement plus
Automatic Post
placement tuning
Integrated Placement
and Synthesis
Integrated Placement, Synthesis &
Routing
1. Physically aware optimizations
2. Physically aware timing correction
3. Timing / Noise aware routing
June 2002 DAC02 - Physical Chip Implementation 8
Purpose of this Section:
n Provide users with an intuitive feel of the inner workings of the major timing closure tools
n Demonstrate the advancements in timing closure tools technology via example designs
n Explore a variety of significant design choices
June 2002 DAC02 - Physical Chip Implementation 9
What you should expect:
n High level concepts presented are generally applicable across a wide range of tools / methodologies (ie: not IBM specific)
n Specific tool internals used in this tutorial are taken from IBM tools. They should provide a reasonable “feel” as to how things are done in the industry.
June 2002 DAC02 - Physical Chip Implementation 10
Worldwide ASIC/PLD SalesTop 5 Suppliers for 2001n IBM $ 2758 growth 1.2%n Agere $ 1310 growth -43.5%n LSI $ 1243 growth -38.2% n NEC $ 1243 growth -35.2%
n XLIINX $ 1149 growth -26.3%
Revenue: Millions of U.S. DollarsSource: Gartner Dataquest (March 2002)
June 2002 DAC02 - Physical Chip Implementation 11
IBM ASIC Supplier #1 since 1999IBM ASIC Supplier #1 since 1999
NEC
Lucent
LSI Logic
IBM
VLSI
Xilinx
TI
Fujitsu
Toshiba
Hitachi Altera
NEC
Lucent
LSI Logic
IBM
VLSI
Xilinx
TI
Fujitsu
Toshiba
Altera
NEC
Lucent
LSI Logic
IBM
VLSI
XilinxTI
Fujitsu
Toshiba
Altera
NEC
Lucent
LSI Logic
IBM
VLSI
Xilinx
TI
Fujitsu
STM
NEC
Lucent
LSI Logic
IBM1
2
3
4
56789
10
1996 1997 1998 1999 2000
AlteraToshiba
Xilinx
TI
Fujitsu
STM
NEC
Lucent
LSI Logic
IBM
AlteraToshiba
Xilinx
Agilent
Fujitsu
Mitsubishi
2001
Dataquest 96-02
June 2002 DAC02 - Physical Chip Implementation 12
Section Outlinen Introductionn Review material (timing and synthesis)n Introduction to placementn Placement algorithmsn Paradigms for placement-synthesis
integrationn Placement aware synthesis techniquesn Congestion avoidance / mitigation techniquesn Routing optimization
June 2002 DAC02 - Physical Chip Implementation 14
Timing Analysis Basics:Why static timing since simulation is more accurate?
c=0 c=1b=0 a-z delay1 a-z delay2 b=1 a-z delay3 a-z delay4
Exponential explosion as possible design input states grow!
a
b
c
zHow would one calculate the worst case rising delay from a to z?
n A simple example:
n Simulation has a number of key drawbacksn requires input state vectorsn long runtimes
June 2002 DAC02 - Physical Chip Implementation 15
-Required arrival time(RAT) -- the time a signal must arrive at in order to avoid a chip fail
-Slack = Required arrival time - Arrival time– Positive slack good, negative slack bad
Definition of basic terms-Arrival time(AT) -- the time at switch a pin switches state
90
10time
vdd
slew = time90 -time10
50 AT = time50
-Slew - the rate at which a signal switches– usually difference of 10% and 90% on voltage curve
Timing Analysis Basics:
June 2002 DAC02 - Physical Chip Implementation 16
nBlock based timing:n Worst value only stored at merge pointsn Each segment is processed just once
d=2
d=1
d=5
d=3
d=2
d=1
d=3
d=3d=1
temp at=3 temp at=7
Example Problem: What is slack at PO?
Timing Analysis Basics:
at=0
at=0
at=0
at=1
at=2
at=5 at=6
at=5
at=8at=11
rat=10
Slack= -1
June 2002 DAC02 - Physical Chip Implementation 17
nWhat is Incremental Timing?n Enabling small incremental changes without full retimingn Only direct fanin/fanout cone is processed
at=
at=at=
at=at=
d=2
at=
at=0
at=0
rat=10at=0
at=
5
d=1
d=5
d=3
d=2
d=1
d=3
d=3d=1
2
68
5
1
11
d=1d=1d=1
at=2 slack=0
it passed!
at=3at=7
at=10
at=1
Timing Analysis Basics:
June 2002 DAC02 - Physical Chip Implementation 18
Early Mode Analysis
0=aAT
1=bAT
2=xRAT
1=xAT
121 −=−=xSL101 =−=bSL
000 =−=aSL1=yAT
0=cAT
011 =−=ySL
a
b xc
y
n Definitions change as follows– longest becomes shortest– slack = arrival - required
1 1
110 −=−=cSL
June 2002 DAC02 - Physical Chip Implementation 19
Timing Correction
n Fix electrical violationsuResize cellsuBuffer netsuCopy (clone) cells
n Fix timing problemsuLocal transforms (bag of tricks)uPath-based transforms
June 2002 DAC02 - Physical Chip Implementation 20
Local Synthesis Transforms
n Resize cellsn Buffer or clone to reduce load on critical netsn Decompose large cellsn Swap connections on commutative pins or
among equivalent netsn Move critical signals forwardn Pad early pathsn Area recovery
June 2002 DAC02 - Physical Chip Implementation 21
Transform Example
Delay = 4
…..
Double Inverter
Removal
…..
…..
Delay = 2
June 2002 DAC02 - Physical Chip Implementation 22
Resizing
00.010.020.030.040.05
0 0.2 0.4 0.6 0.8 1load
d
A B C
b
ad
e
f
0.2
0.2
0.3
?
b
aA
0.035
b
aC
0.026
June 2002 DAC02 - Physical Chip Implementation 23
Cloning
00.010.020.030.040.05
0 0.2 0.4 0.6 0.8 1load
d
A B C
b
a
d
e
f
g
h
0.2
0.2
0.2
0.20.2
?
b
a
d
e
f
g
h
A
B
June 2002 DAC02 - Physical Chip Implementation 24
Buffering
00.010.020.030.040.05
0 0.2 0.4 0.6 0.8 1load
d
A B C
b
a
d
e
f
g
h
0.2
0.2
0.2
0.20.2
? b
a
d
e
f
g
h0.1
0.2
0.2
0.20.2
BB
0.2
June 2002 DAC02 - Physical Chip Implementation 25
Redesign Fan-in Tree
a
c
d
b eArr(b)=3
Arr(c)=1
Arr(d)=0
Arr(a)=4
Arr(e)=61
1
1
c
d
e
Arr(e)=5
1
1b1
a
June 2002 DAC02 - Physical Chip Implementation 26
Redesign Fan-out Tree
1
1
1
3
1
1
1
Longest Path = 5
1
1
1
3
1
2
Longest Path = 4Slowdown of buffer due to load
June 2002 DAC02 - Physical Chip Implementation 28
Swap Commutative Pins
2
c
ab
2
1
0 1
1
1
3
a
cb
2
1
0
1
1
2
1 5
Simple Sorting on arrival times and delay works
June 2002 DAC02 - Physical Chip Implementation 29
Move Critical Signals Forward
n Based on ATPG– linear in circuit size– Detects
redundancies efficiently
n Efficiently find wires to be added and remove.– Based on
mandatory assignments..
a
b
cd e
ab
ed
c
June 2002 DAC02 - Physical Chip Implementation 30
Section outlinen Introductionn Review material (timing and synthesis)n Introduction to placementn Placement algorithmsn Paradigms for placement-synthesis
integrationn Placement aware synthesis techniquesn Congestion avoidance / mitigation techniquesn Routing optimization
June 2002 DAC02 - Physical Chip Implementation 31
Placement Objective:n Find optimal relative ordering of cellsuminimize wire length and congestionumaximize timing slack
n Find optimal spacing of cellsu eliminate wiring congestion problemsu provide space for post placement synthesis
Fclock treesFbuffer insertionF timing correction
n Find optimal Global Position
June 2002 DAC02 - Physical Chip Implementation 37
A B C
Optimal Relative Order:
Without “free” space the problem is dominated by order
June 2002 DAC02 - Physical Chip Implementation 38
Placement Footprints:Standard Cell:
Data Path:
IP - Floorplanning
June 2002 DAC02 - Physical Chip Implementation 39
Core
ControlIO
Reserved areas
Mixed Data Path &sea of gates:
Placement Footprints:
June 2002 DAC02 - Physical Chip Implementation 41
Placement objectives are subject to User Constraints / Design Style:n Hierarchical Design Constraintsu pin locationu power rail u reserved layers
n Flat Design w/Floor Plan constraintsn Fixed circuitsn IO connections
June 2002 DAC02 - Physical Chip Implementation 45
Advantages of Hierarchy
n Design is carved into smaller pieces that can be worked on in parallel (improved throughput)
n A known floor plan provides the logic design team with a large degree of placement control.
n A known floor plan provided early knowledge of long wires
n Timing closure problems can be addressed by tools, logic design, and hierarchy manipulation
n Late design changes can be done with minimal turmoil to the entire design
June 2002 DAC02 - Physical Chip Implementation 46
Disadvantages of Hierarchyn Results depend on the quality of the hierarchy. The
logic hierarchy must be designed with PD taken into account.
n Additional methodology requirements must be met to enable hierarchy. Ex. Pin assignment, Macro Abstract management, area budgeting, floor planning, timing budgets, etc
n Late design changes may affect multiple components.
n Hierarchy allows divergent methodologies n Hierarchy hinders DA algorithms. They can no
longer perform global optimizations.
June 2002 DAC02 - Physical Chip Implementation 47
Physical Synthesis FlowSynthesized NetlistWire-load Models
UnplacedPhysically “unaware” timing
Cleanup: Remove buffers, nominal power levels on gates
Initial “basic” placementFor minimal wire-length, min-cut, Steiner tree estimates, physically aware timing
Logical + Placement optimizations
Timing-driven placement w/resynthesis
For minimal netweights, based on the timing of the net
Physically aware logic optimizations
Timing Improvement
?
Placed Netlist
Yes No more
June 2002 DAC02 - Physical Chip Implementation 48
Logical + Placement Optimizations
CutBin
n Start with a placed or unplaced netlist
n Do recursive partitioningn During and following each
partition action, apply logic optimizations such asu timing correctionsu rebufferingu repoweringu cloningu pin swapping u move boxesu … etc
June 2002 DAC02 - Physical Chip Implementation 49
Section Outlinen Introductionn Review material (timing and synthesis)n Introduction to placementn Placement algorithmsn Paradigms for placement-synthesis
integrationn Placement aware synthesis techniquesn Congestion avoidance / mitigation techniquesn Routing optimization
June 2002 DAC02 - Physical Chip Implementation 50
Overview of Common Placement Algorithms:
- Simulated Annealing- Quadratic Placement- Partitioning
June 2002 DAC02 - Physical Chip Implementation 51
for(temp=high; temp > absolute_zero; temp -= increment){
make a random movescore the moveuse temp dependent probability to decide to accept or reject
}
Simulated Annealing:
Note: Clustering can be useto improve performance
June 2002 DAC02 - Physical Chip Implementation 52
Annealing ::
Pros:- ease of implementation, dumb moves / smart scoring- can easily accommodate new constraints - just add them to the
scoring function- great quality - can be made to run on parallel processors
Cons:- very long run time
June 2002 DAC02 - Physical Chip Implementation 54
Cost = (x1 − 100)2 + (x1 − x2)2 +(x2 −200)2
¹¹x1
Cost = 2(x1 − 100)+ 2(x1 − x2)
¹¹x2
Cost =− 2(x1 −x2) + 2(x2 − 200)
setting the partial derivatives = 0 we solve for the minimum Cost:
Ax + B = 0
= 04 −2−2 4
x1x2
+ −200−400
= 02 −1−1 2
x1x2
+ −100−200
x1=400/3 x2=500/3
xx22
x1
x=100 x=200Review:
June 2002 DAC02 - Physical Chip Implementation 55
setting the partial derivatives = 0 we solve for the minimum Cost:
Ax + B = 0
= 04 −2−2 4
x1x2
+ −200−400
= 02 −1−1 2
x1x2
+ −100−200
x1=400/3 x2=500/3
xx22
x1
x=100 x=200
Interpretation of matrices A and B:
The diagonal values A[i,i] correspond to the number of connections to xiThe off diagonal values A[i,j] are 1 if object i is connected to object j, 0 otherwiseThe values B[i] correspond to the sum of the locations of fixed objects connected to object i
Review:
June 2002 DAC02 - Physical Chip Implementation 56
Why formulate the problem this way?n Because we cann Because it is trivial to solven Because there is only one solutionn Because the solution is a global optimumn Because the solution conveys “relative order”
informationn Because the solution conveys “global
position” information
June 2002 DAC02 - Physical Chip Implementation 57
However:
n Solution is not legaln Solution depends of fixed anchor pointsn Solution does not minimize linear wire length,
congestion, or timingn Solution is generally highly overlapping w/
high density (ie needs to be spread out)
June 2002 DAC02 - Physical Chip Implementation 58
What does the solution look like?
n To get an intuitive feel for the solution, examine the relaxation method for solving Ax + B = 0
n Actual program implementation may use other solution methods (that are generally less intuitive).
June 2002 DAC02 - Physical Chip Implementation 61
Constrained Solutions:
n Sometimes we want to solve for the minimum wire
n length subject to a constraintn Example: Using quadratic for
partitioning, we may want the quadratic placement to be "centered"
June 2002 DAC02 - Physical Chip Implementation 64
To minimize Cost = f(x) subject to a constraint g(x) = 0 we can use langrangian multipliers to modify the Cost function as follows:
C o st = f(x) + 2g(x)
¹¹x C ost = ¹
¹x f(x) + 2¹¹x g(x)
Using CG as a constraint: where: s is the size of object_i C G = Si= 1
n s i x i +Si=1n s i
n is the number of objects g(x ) = (Si=1
n s ix i +Si=1n si ) − C G
where we use N to represent the constant ¹¹x g(x) = s i
N Si=1n si
We have already shown that
leads to the system of equations --- Ax + B = 0¹¹x f(x) = 0
T herefore solving the constrained porblem = 0 ¹¹x C ost = ¹
¹x f(x) + 2¹¹x g(x) leads to: = 0A x + B + 2 s i
N
Constrained Solutions
June 2002 DAC02 - Physical Chip Implementation 65
To solve Ax + B + = 0 we could use a packaged solver and add the 2 s i
N additional unknown and equation to our matricies and 2 CG =Si=1
n s ix i +N solve.
Here is an alternative way to solve the system:
by substitution we let x = x u+2x l where is the unconstrained solution (ie the solution to Ax + B = 0)x u
Assuming we can solve the unconstrain ed problem, is known.x u
By substitution we get:
A(x u + 2x l) + B + 2 s i
N = 0
which becomes:
or A2x l + 2 s i
N = 0 Ax l + s i
N = 0
Constrained Solutions (cont):
June 2002 DAC02 - Physical Chip Implementation 66
We need to solve: A x l + s i
N = 0
Note: The A matr ix is the same as the A matr ix for the unconstrained solut ion. Since the A matrix is the net l is t c o n nectivity specification, we have A.
The B matrix here is instead of the sum of fixed location connects.s i
N
In t repre ta t ion:
The solut ion to can be obtained by modifying the original netlist A x l + s i
N = 0 and placement such that :
1.) All f ixed objects are moved x = 02.) A constant force vector is applied to each object . The constant force vector for the i’th o b ject has magnitude s i
N
Then use the same solver as was used to solve Ax + B = 0
Constrained Solutions (cont):
June 2002 DAC02 - Physical Chip Implementation 67
We also need to solve for 2
From the CG relationship and we get:x=xu+2xl
where (ie total size)CG=Si=0n si(xui +2xli
)+N N=Si=0n si
since we have solved for and the only unkown is xu xl 2
we get:
2=NCG−Si=0
n sixui
Si=0n sixli
Constrained Solutions (cont):
June 2002 DAC02 - Physical Chip Implementation 68
To minimize f(x) ( the wl squared cost function) subject to a CG constraint we do thefol lowing:
1.) Solve for by solving using relaxation or some other methodx u A x u + B = 0
2.) Solve for as follows: x l A x l + s i
N = 0 - Move all fixed objects to locat ion=0
- Add a constant force vector to each object. The constant force vector for the i’th ob ject has magnitude s i
N - Using rela xation or some other method, solve for x l
3.) Solve for using 2 2=N C G −Si= 0
n s i x u i
Si= 0n s ix l i
4.) Compute the final placement using x = x u + 2x l
Constrained Solutions (summary):
June 2002 DAC02 - Physical Chip Implementation 69
xx22
x=100 x=200
Force CG to 150s=100
From the previous example we know that the solution to:
Axu + B = 0
x =133.33166.67
with this solution the CG is at (ie not 150)(10&133.33)+(100&166.67)110 = 163.64
Now we need to solve:
which is the same as solving -> Axl + s i
N = 0 Axl +00 + s i
N = 0
Recall that the B matrix represents the position of fixed objects. So, this equationrepresents the solution to:
x1s=10
x1x2
10/110
100/110x = 0
Review:
June 2002 DAC02 - Physical Chip Implementation 70
Constrained Solutions (summary):Advantages of this approach:
1.) The Solver data structure is the netlist onlyie. no additional memory requirements
2.) Sometimes the unconstrained solution is by itself sufficient,therefore we can avoid the additional overhead of producingthe constrained solution
3.) The numerical iterations in this method are NOT dependent onthe CG. We can solve for xu and xl, then try many different CG pointsat very low cost.
June 2002 DAC02 - Physical Chip Implementation 71
Quadratic Techniques:Pros:- mathematically well behaved- efficient solution techniques find global optimum- great quality
Cons:- solution of Ax + B = 0 is not a legal placement, so generally
some additional partitioning techniques are required.- solution of Ax + B = 0 is that of the "mapped" problem, ie
nets are represented as cliques, and the solution minimizes wire length squared, not linear wire length unless additionalmethods are deployed
- fixed IOs are required for these techniques to work well
June 2002 DAC02 - Physical Chip Implementation 73
Partitioning:
Objective:
Given a set of interconnected blocks, produce two sets thatare of equal size, and such that the number of nets connecting the two sets is minimized.
June 2002 DAC02 - Physical Chip Implementation 74
FM Partitioning:
Initial Random Placement
After Cut 1
After Cut 2
list_of_sets = entire_chip;while(any_set_has_2_or_more_objects(list_of_sets)){
for_each_set_in(list_of_sets){
partition_it();}/* each time through this loop the number of *//* sets in the list doubles. */
}
June 2002 DAC02 - Physical Chip Implementation 75
FM Partitioning:
-1
-2
-1
1
0
0
0
2
0
0
1
-
-1
-2
- each object is assigned a gain
- objects are put into a sortedgain list
- the object with the highest gainfrom the smaller of the two sidesis selected and moved.
- the moved object is "locked"- gains of "touched" objects are
recomputed- gain lists are resorted
Object Gain: The amount of change in cut crossingsthat will occur if an object is moved fromits current partition into the other partition
Moves are made based on object gain.
June 2002 DAC02 - Physical Chip Implementation 90
Partitioning:
Pros:- very fast- great quality- scales nearly linearly with problem size
Cons:- non-trivial to implement- very directed algorithm, but this limits the ability to deal with
miscellaneous constraints
June 2002 DAC02 - Physical Chip Implementation 91
FM Partitioning
- For large designs min-cut (FM) produces poor results
To Compensate, there are two widely used enhancements:
1.) Quadratic seeding
2.) Multi-Level partitioning
June 2002 DAC02 - Physical Chip Implementation 92
cut linecut line
move1
move2
move4
move3
Partitioning:
June 2002 DAC02 - Physical Chip Implementation 93
Global Placement - Multi-Level Partitioning:
move1
move2
move4
move3
0 0 0
0 11
2
00
1
1
0 0
0 11
2
0
00
1 0
0
10
0
generate clusters:while(there are clusters)
{partition_it;remove 1 cluster layer;
}partition_it;
June 2002 DAC02 - Physical Chip Implementation 94
move1
move2
move4
move30 0 0
0 11
2
00
1
1
0 0
0 11
2
0
00
1 0
0
10
0
June 2002 DAC02 - Physical Chip Implementation 95
move1
move2
move4
move30 0 0
0 11
2
00
1
1
0 0
0 11
2
0
00
1 0
0
10
0
June 2002 DAC02 - Physical Chip Implementation 96
move1
move2
move4
move30 0 0
0 11
2
00
1
1
0 0
0 11
2
0
00
1 0
0
10
0
June 2002 DAC02 - Physical Chip Implementation 97
move1
move2
move4
move3
0 0 0
0 11
2
00
1
10
0
1 0
0
10
0
June 2002 DAC02 - Physical Chip Implementation 98
move1
move2
move4
move3
0 0 0
0 11
2
00
1
10
0
1 0
0
10
0
June 2002 DAC02 - Physical Chip Implementation 99
move1
move2
move4
move3
0 0 0
0 11
2
00
1
10
0
1 0
0
10
0
June 2002 DAC02 - Physical Chip Implementation 100
move1
move2
move4
move3
0 0 0
0 11
00
1
10
0
1 0
0
10
0
June 2002 DAC02 - Physical Chip Implementation 101
move1
move2
move4
move3
0 0 0
0 11
00
1
10
0
1 0
0
10
0
June 2002 DAC02 - Physical Chip Implementation 102
move1
move2
move4
move3
0 0 0
0 11
00
1 1
00
1
0
0
June 2002 DAC02 - Physical Chip Implementation 103
move1
move2
move4
move3
0 0 0
0 11
00
1
1
00
1
0
0
June 2002 DAC02 - Physical Chip Implementation 104
move1
move2
move4
move3
0 0 0
0 11
00
1
1
00
1
0
0
June 2002 DAC02 - Physical Chip Implementation 105
move1
move2
move4
move3
0 0 0
0 11
00
1
1
00
1
0
0
June 2002 DAC02 - Physical Chip Implementation 106
move1
move2
move4
move3
0 0 0
0 11
00
1
1
00
1
0
0
June 2002 DAC02 - Physical Chip Implementation 119
MLP/FM Partitioning Cons:
n Does not know how to handle “free” spacen Results tend to be erratic, ie results from run
to run have significant variation
June 2002 DAC02 - Physical Chip Implementation 120
MLP/FM Partitioning Pros:
n Handles designs that have no fixed connection points
n Very fast - can handle large designs
June 2002 DAC02 - Physical Chip Implementation 121
Hybrid Techniques
n Use both MLP and Quadratic techniques
n Results are more predictable due to quadratic cost function
n Partitioning is used for overlap removaln Quadratic is used for “free” space
handling and some relative order indications
June 2002 DAC02 - Physical Chip Implementation 127
Analytical Constraint Generation
n Combine Quadratic techniques with MLP
n Use Quadratic solution to determine global position (ie balance)
n Use MLP to determine relative ordering of cells
June 2002 DAC02 - Physical Chip Implementation 128
Poor Solution
Analytical Constraint Generation
Capacity = 2 Capacity = 2
Quadratic solution Area=1Analytical constraintACG solution
June 2002 DAC02 - Physical Chip Implementation 156
Observations on Quadratic Placementn placements are predictable and repeatablen timing is inherently bettern wire length is not the best, but goodn run time: slower than MLP by 4xn run time: faster than annealing by 4xn excellent “free space” handlingn placements “feel” similar to those produced
by annealing
June 2002 DAC02 - Physical Chip Implementation 157
Repeatability Example:n One circuitn Minimum linear length
occurs for all solutions where y=50 0 < x < 100
n Minimum quadratic length occurs for y=50, x=50
n Quadratic solution IS both minimum linear and minimum quadratic length
(0,50) (0,100)
June 2002 DAC02 - Physical Chip Implementation 158
Section Outlinen Introductionn Review material (timing and synthesis)n Introduction to placementn Placement algorithmsn Paradigms for placement-synthesis
integrationn Placement aware synthesis techniquesn Congestion avoidance / mitigation techniquesn Routing optimization
June 2002 DAC02 - Physical Chip Implementation 160
Read Data
Divide each Partition
Preprocessing
Detailed Placement
While ( any partition has > 2 cells )
Reflow across partitions
Done
Partitioning Algorithm: Partition & Reflow
Global Placement
Netlist
SynthesisSynthesis
June 2002 DAC02 - Physical Chip Implementation 161
What Synthesis Can do when Invoked:
- add boxes- delete boxes- add nets- delete nets- reconnect nets- change box sizes- query placement locations of boxes- query "bin" statistics- remove a box from a bin- add a box to a bin
June 2002 DAC02 - Physical Chip Implementation 162
Placement and Synthesis Integrationn Loosely coupled: (methodology coupling)
u do some synthesis, then write out datau do some placement, then write out datau .. Repeat
nn Interleaved: (placement & synthesis in same process)u do pre-pd synthesis
F for each placement step redo synthesis
n Tightly coupled: (simultaneous P&S aware transforms)
June 2002 DAC02 - Physical Chip Implementation 163
Loosely Coupled Placement & Synthesis:
Characteristics:
- Placement is treated as a black box
- Multiple placement runs are made
Do Placement
Analyze
- re-synthesize- GenerateConstraints
Meet Objectives
Done w/placement
Yes
No
June 2002 DAC02 - Physical Chip Implementation 164
SynthesisSynthesis
Synthesis
Synthesis
Interleaved Placement & Synthesis:
Characteristics:
- the placement flow is the same as ina placement only methodology
- in between each step of the placementprogression, synthesis is invoked
June 2002 DAC02 - Physical Chip Implementation 165
Tightly Coupled
n Placement and synthesis algorithms become co-dependant
n Placement algorithms have awareness of synthesis activity
n Synthesis algorithms have awareness of placement activity
June 2002 DAC02 - Physical Chip Implementation 166
Section Outlinen Introductionn Review material (timing and synthesis)n Introduction to placementn Placement algorithmsn Paradigms for placement-synthesis
integrationn Placement aware synthesis techniquesn Congestion avoidance / mitigation techniquesn Routing optimization
June 2002 DAC02 - Physical Chip Implementation 167
Placement Driven Cloning
critical
non-critical
Cloning to off-load non-critical path from critical path
June 2002 DAC02 - Physical Chip Implementation 168
Placement Driven ExpansionLogic
Logic
AO
LogicLogic
Logic
LogicExpansion allows primitives to be placed in a more timing friendly way
Expansion Transformation
June 2002 DAC02 - Physical Chip Implementation 169
Example:
Tightly Coupled Placement Driven Expansion
June 2002 DAC02 - Physical Chip Implementation 170
Tightly Coupled Synthesis & Placement:
abcdefg
Transform
ad
bef
cg
June 2002 DAC02 - Physical Chip Implementation 171
ab
c
d e
fg
Tightly Coupled Synthesis & Placement Example:
Suppose the primary IO constraints look like this:
June 2002 DAC02 - Physical Chip Implementation 172
Tightly Coupled Synthesis & Placement Example:
ab
c
d e
fg
The placement of the synthesized netlist would look something like this:
June 2002 DAC02 - Physical Chip Implementation 173
Tightly Coupled Synthesis & Placement Example:
ab
c
d e
fg
If we could re-synthesize the netlist, we could get something that looks like this.
June 2002 DAC02 - Physical Chip Implementation 174
Tightly Coupled Synthesis & Placement:
abc
defg
PD-MAP
abc
defg
weight = 1/10
weight = 1
June 2002 DAC02 - Physical Chip Implementation 175
Tightly Coupled Synthesis & Placement example:
Map_TREEFor each cut
partition_itFor each partitionIf(partition number > M){
if(related_node_count < N)merge_nodes
if(related_node_count == 1)merge_node into neighbor
partition}
endend
June 2002 DAC02 - Physical Chip Implementation 176
Tightly Coupled Synthesis & Placement Example:
ab
c
d e
fg
June 2002 DAC02 - Physical Chip Implementation 177
Tightly Coupled Synthesis & Placement Example:
ab
c
d e
fg
June 2002 DAC02 - Physical Chip Implementation 178
Tightly Coupled Synthesis & Placement Example:
ab
c
d e
fg
June 2002 DAC02 - Physical Chip Implementation 179
Tightly Coupled Synthesis & Placement Example:
ab
c
d e
fg
June 2002 DAC02 - Physical Chip Implementation 180
Tightly Coupled Synthesis & Placement Example:
ab
c
d e
fg
June 2002 DAC02 - Physical Chip Implementation 181
Tightly Coupled Synthesis & Placement Example:
ab
c
d e
fg
June 2002 DAC02 - Physical Chip Implementation 182
Tightly Coupled Synthesis & Placement Example:
ab
c
d e
fg
ab
c
fg
d e
Result
June 2002 DAC02 - Physical Chip Implementation 184
Redesign Fan-in Treea
c
d
b eArr(b)=3
Arr(c)=1
Arr(d)=0
Arr(a)=4
Arr(e)=6
1
1
1
c
d
e
Arr(e)=5
1
1b1
a
e
e
Arr(e)=0
June 2002 DAC02 - Physical Chip Implementation 185
Placement Driven Repowering
n Repowering is traditionally done using load based cell characterization
n Placement changes continuously during partitioning
n Need high efficiency algorithms to do repowering in this environment
n Solution: Use Gain Based Formulation
June 2002 DAC02 - Physical Chip Implementation 186
Delay Models
inC in
out
C
Cg =
outC
pC
CEcdin
outinv +
).+=
β1(. .1
pCkd out += .1
Load based formulation:
pgld += .
inC.β
inC outC
Gain based formulation:
pC
CEcdin
outinv +
).+=
β1(. .1
d: delay
l: logical effort
g: gain
p: intrinsic delay
1k l
June 2002 DAC02 - Physical Chip Implementation 187
Area vs Delay Centricn Load Based Paradigm
• (load-based delay eq.)
• sizedn Know:
u Size of each cellu Total Area ->
– area centricn Don’t know:
u Wire loadsu Delay of each cellu Delay of a path
n Estimation error is in the delay:u Local ‘path based’ property.
n Gain Based Paradigm• (gain based delay
eq.)• sizeless
n Know:u The delay of each cell.u The delay of a path ->
– delay centricn Don’t know:
u Wire loadsu The area of each cellu The total area
n Estimation error is in the areau Global property.
June 2002 DAC02 - Physical Chip Implementation 188
Design FlowHigh Level Synthesis
Restructuring
Tech Mapping
Late Timing Corr
LibraryAnalysis GainBased Opt
Discretization
LoadBasedDelay(DCL)
GainBasedDelay
June 2002 DAC02 - Physical Chip Implementation 189
Power Levels (Gate Sizes)
00.010.020.030.040.05
0 0.2 0.4 0.6 0.8 1Cout
d
A B CoutC
d
June 2002 DAC02 - Physical Chip Implementation 190
Library (Gain) Analysis
00.010.020.030.040.05
0.5 2.5 4.5 6.5 8.5 10.5g
d
A B C
in
out
CCg =
pgld += .
inCoutC
d
June 2002 DAC02 - Physical Chip Implementation 191
Area and Load Calculation
n Start at primary outputs/ register inputs. n Much like static timing analysis.n Incremental.
1g 2g 3g
33
gC
Cout
=2
32
gCC =
1
21
gC
C =
71.0=outC
71.0=outC
1g 2g 3g
2
32
gC
C =1
21
gC
C =
June 2002 DAC02 - Physical Chip Implementation 192
Gain Calculation
outCinC
1d 3dD
4d2d
43211
dddddDN
ii +++== ∑
=
in
outout
in
N
ii C
CCC
CC
CC
CCgggggG ==
=== ∏
43
4
2
3.
24321
1....
Minimize D such that:in
out
CCG =
∑∑==
+=N
ii
N
ii pfD
11 CinCout
LN
igi
N
ili
N
ifiF .
1.
11=∏
=∏=
=∏=
=
Solution: ffi =Geometric
pfpgld +=+= .
Minimize Such that:
June 2002 DAC02 - Physical Chip Implementation 193
Example I
71.0=outC19.0=inC
1d
2
3021.00496.0 :NOR2CC
d ×+=
inCC
d2
011.00308.0 :NAND2 ×+=
3009.00295.0 :INV
CC
d out×+=
2d 3d
000008.03
3.21. ====in
out
CCLffffF
0203.0=f
Nand2 Nor2 Inv Path
p 0.0308 0.0496 0.0295 0.1099
f 0.0203 0.0203 0.0203 0.0609
d 0.0511 0.0699 0.0498 0.1708
Cin 0.19 0.3364 0.3283 .
3C2C
June 2002 DAC02 - Physical Chip Implementation 194
Constant Delay Calculation
outC
ccc
o
c
c
c
fpdglpd
Cout
dCingCout
gpgld
=−==
=
=
=+=
. :Calculate :Measure
0 :Set
:Measure
:Calculate
6.3 :Set.
:Inverter
cd
nandnandnandnandc
nor
nand
pgld
gg
+=
==
.
8.15.2
..
outC
cd
cfgl =.:gatesOther
June 2002 DAC02 - Physical Chip Implementation 195
Discretizationn From gain-based model back to appropriate power levelsn There is an error in timing/load when ‘ideal’ power levels are not
available.u Goal: Minimize this error.u Can be tuned to delay error or capacitance error..
1g 2g 3g
33
gC
Cout
=2
32
gCC =
1
21
gC
C =
71.0=outC
pgld += .
[Kudva98][Beeftink98]
June 2002 DAC02 - Physical Chip Implementation 196
Gain Based: Observations:n Gain Based algorithms: A major improvement.
u More homogeneous (global) algorithms and designs.u Can be better targeted for area and/or delay.
n Reveal inherent cell characteristics to optimization tools, leading to improved QOR
n Good library design is required to facilitate discretization step
n Ideally suited for operation within Physical Synthesis
June 2002 DAC02 - Physical Chip Implementation 197
Placement Driven Buffering
Rip Out all Buffers
Insert Buffers based on placement info
June 2002 DAC02 - Physical Chip Implementation 198
What to do About Long Wires?
n Add buffersn Tune wire sizesn Modify the placement to reduce them
June 2002 DAC02 - Physical Chip Implementation 199
Placement Driven vs Logic Driven Buffer Insertionn Logic driven buffer insertion focuses on logic
topology and buffer sizing while assuming a statistical wire load model
n Placement driven buffering uses an existing placement as the fundamental constraint
June 2002 DAC02 - Physical Chip Implementation 200
n Multiple buffer typesn Invertersn Capacitance, Slew and Noise constraintsn Wire Sizingn Simultaneous driver sizingn High order interconnect delay and Ceffectiven Blockage handling
Placement Driven Buffer Insertion: Buffopt (IBM)
June 2002 DAC02 - Physical Chip Implementation 201
How Do Buffers Help?
n Reduce delayuWire delay quadratic in lengthuBuffers make delay essentially linearuDelay gate dominated, not wire dominated
n Fix other problemsuBad slews at sinksuCapacitance range violationsuNoise induced by capacitance coupling
June 2002 DAC02 - Physical Chip Implementation 202
How Does Wire Sizing Help?
n Highly resistive lines increase delayn Wider wires or thick metal layers reduces
resistance, but can increase capacitancen For long interconnect, resistance reduction
outweighs capacitance increase
June 2002 DAC02 - Physical Chip Implementation 203
Simple Buffer Insertion ProblemGiven: Source and sink locations, sink capacitancesand RATs, a buffer type, source delay rules, unit wire resistance and capacitance
Buffer
RAT1
RAT2
RAT3
RAT4
s0
June 2002 DAC02 - Physical Chip Implementation 204
Simple Buffer Insertion ProblemFind: Buffer locations and a routing tree such that slack at the source is minimized
RAT2
RAT3
RAT4
RAT1
s0
)},()({min)( 0410 iii ssdelaysRATsq −= ≤≤
June 2002 DAC02 - Physical Chip Implementation 205
Fundamental Buffer Insertion
n Van Ginneken’s dynamic programming algorithm
n Building block: candidate (Cap, slack)uCandidates for each node stored as a listuEach sink has one candidateuPropagate candidates up the tree
n Guarantees optimal solutionn Quadratic complexity
June 2002 DAC02 - Physical Chip Implementation 206
Assumptions for the Basic Van Ginneken algorithm:n Given a routing treen Given a set of potential insertion pointsn Single buffer sizen No sink or driver sizingn Linear gate delay modeluRd Cdown + Kd
n Elmore wire delay modeluRw (Cw/2 + Cdown)
June 2002 DAC02 - Physical Chip Implementation 207
Van Ginneken Extensions
n Multiple buffer typesn Invertersn Capacitance, Slew and Noise constraintsn Wire Sizingn Simultaneous driver sizingn High order interconnect delay and Ceffectiven Blockage recognition
June 2002 DAC02 - Physical Chip Implementation 208
Example- Connect the end points of the netusing a steiner route
- Add Candidate Nodes
- Final buffer solution is optimal for this route, and this set of candidate nodes.
- Other routes may produce betterfinal solutions.
- Net routing topology is an inputto Van Ginneken’s algorithm
June 2002 DAC02 - Physical Chip Implementation 210
How Many Candidates?
n Number of candidates seems to double with each additional node
n Prune candidate with worst slack when capacitances is greater or equal
n Linear number of candidates
June 2002 DAC02 - Physical Chip Implementation 211
Pseudo Code:
List = NULL;For each node (bottom up traversal of graph){
augment each item in list with wire segment up to nodeduplicate the listfor each element of the duplicate list
add a buffer at nodeanalyze each element in listanalyze each element in buffered (duplicate) listpick best element of buffered list and delete the restnew list is union of list and “best” element of buffered list
} Pick best solution;
12 16 4 35
June 2002 DAC02 - Physical Chip Implementation 212
Example
12 16 4 35n Node 1 processing: 2 evaluations, at most 2 candidates kept n Node 2 processing: 4 evaluations, at most 3 candidates keptn Node 3 processing: 6 evaluations, at most 4 candidates keptn Node 4 processing: 8 evaluations, at most 5 candidates keptn Node 5 processing: 10 evaluations, at most 6 candidates keptn Node 6 processing: 12 evaluations, at most 7 candidates kept
u Now pick the best one: Optimal solution
1_
)1)((2_1
+<=
+== ∑=
NcandidatesNum
NNisevaluationNumN
i
June 2002 DAC02 - Physical Chip Implementation 214
Van Ginneken Algorithm Summaryn GooduClever pruning controls # of candidatesu Finds an optimal solution in quadratic timeuEasily extended to cover a variety of important
considerations (like multiple buffer types, wire sizing, polarity, slew, & capacitance constraints, etc.
n BaduResults depend on quality of route provided
June 2002 DAC02 - Physical Chip Implementation 215
Example Route:Critical: can not offloaddue to route
Different route leadsto better solution
June 2002 DAC02 - Physical Chip Implementation 216
Physical Synthesis FlowSynthesized NetlistWire-load Models
UnplacedPhysically “unaware” timing
Cleanup: Remove buffers, nominal power levels on gates
Initial “basic” placementFor minimal wire-length, min-cut, Steiner tree estimates, physically aware timing
Logical + Placement optimizations
Timing-driven placement w/resynthesis
For minimal netweights, based on the timing of the net
Physically aware logic optimizations
Timing Improvement
?
Placed Netlist
Yes No more
June 2002 DAC02 - Physical Chip Implementation 219
Multiple Buffer Types
n Instead of one buffer type, can choose from m power levels
n Generate m candidates instead of onen Still optimaln Complexity increase quadratic in m
June 2002 DAC02 - Physical Chip Implementation 220
Inverters
n Store candidates in “+” and “-” listsu+ implies polarity preservedu- implies polarity reversed
n Adding inverteruSwitches candidate in + list to - listuSwitches candidate in - to + list
n Final result only chosen from + list
June 2002 DAC02 - Physical Chip Implementation 221
Capacitance Constraints
n Each gate g can drive at most C(g) capacitancen When inserting buffer g, check downstream
capacitance. n If it is bigger than C(g), throw out candidaten Increases efficiency
June 2002 DAC02 - Physical Chip Implementation 222
Slew Constraints
n Similar to capacitance constraintsn When inserting buffer, compute slews to
gates driven by buffern If any slew exceeds its target, throw out
candidaten Potential difficulty: computing slew accurately
in bottom-up fashion
June 2002 DAC02 - Physical Chip Implementation 223
Noise Constraints
n Each gate has acceptable noise thresholdn Compute cumulative noise for each wire via
Devgan noise metricn Throw out candidates that violate noise
Can avoid noise while optimizing timing!
June 2002 DAC02 - Physical Chip Implementation 224
Wire Sizing:
For each node (bottom up traversal of graph){
for each Wire Size{
augment each item in list with Sized wire segmentduplicate the listfor each element of the duplicate list
add a buffer at nodeanalyze each element in listanalyze each element in buffered (duplicate) listpick best element of buffered list and delete the restnew list is union of list and “best” element of buffered list
}} Do Final pruning & Pick best solution;
12 16 4 35
June 2002 DAC02 - Physical Chip Implementation 225
Blockage Recognition
Delete insertion points that run over blockages
June 2002 DAC02 - Physical Chip Implementation 229
“Buffer Site”n Similar to buffer bays, only exact buffer
locations are pre-specified, not just areasn Useful as a mechanism for IP blocks and
microprocessor designn Dummy cell that holds a buffern Not connected to any netn Becomes buffer when assigned to a netn Extra sites à decoupling capsn Sprinkle sites throughout designn Allocate percentage within macros
June 2002 DAC02 - Physical Chip Implementation 235
Comments about Buffering and Wire Sizing:
n Extremely critical: One of the highest leverage timing closure items
n There are extended provably correct algorithms for dealing with the problem.
n Steiner route & Blockage avoidance are mostly heuristic: Hot research area!
June 2002 DAC02 - Physical Chip Implementation 236
Section Outlinen Introductionn Review material (timing and synthesis)n Introduction to placementn Placement algorithmsn Paradigms for placement-synthesis
integrationn Placement aware synthesis techniquesn Congestion avoidance / mitigation techniquesn Routing Optimization
June 2002 DAC02 - Physical Chip Implementation 238
Sources of Congestion
n Placement Quality: Do we have a good relative ordering of cells?
n Placement Density: Do we have appropriate cell spreading?
n Preplacement of large cells: Is there a better location for these cells?
n Floorplan quality: Is this a good floorplan / hierarchy?n Netlist complexity: Are some logic groupings
inherently difficult to routen Library characteristics: Do some cells block too
much metal internally?
June 2002 DAC02 - Physical Chip Implementation 239
Congestion Mitigation
n Constructive Avoidanceucontrol global placement pin density: fewer
pins per unit area means fewer wires per unit area
umonitor congestion during placement and perform dynamic spreading
n Post placement fix upuremove problems from an already placed
netlist
June 2002 DAC02 - Physical Chip Implementation 240
Groute / Spread / Redo
Constructive Avoidance:
Characteristics:
- as placement is formed, take action to avoid problems
- between each step of the placementprogression there is the potential to evaluate congestion and take action
Groute / Spread / Redo
Groute / Spread / Redo
… etc
June 2002 DAC02 - Physical Chip Implementation 241
Constructive Avoidance Deficiencies:n Depends on early estimates of congestion
that may not be accurate enough to avoid all problems
n Post placement actions such as clock tree insertion, repowering, buffering, etc may add congestion too the design
n Guard banding with conservative “constructive avoidance” causes lose of performance and density
June 2002 DAC02 - Physical Chip Implementation 242
Post Placement Congestion Mitigationn Use production global router, not internal
placement based global routern Translate congestion values into density
targets for placement regionsn Perform flow based circuit spreadingn Preserve relative logic ordering of cells
June 2002 DAC02 - Physical Chip Implementation 243
Network Flow based Spreading
Supply Nodes Demand Nodes
s t
i j
b(i) > 0 b(j) < 0
ÙMin-cost max-flow formulation
¼ i if b(i) > 0,Cap(esi) = b(i)
Cost(esi) = 0
¼ i ! s , j ! t, Cap(eij) = Infinity (Large Int)Cost(eij) = K
¼ j if b(j) < 0,Cap(ejt) = -b(j)Cost(ejt) = 0
June 2002 DAC02 - Physical Chip Implementation 244
Initial Placement
Calculate bin levelcongestion
Is Congestion
belowthreshold ?
Translate bin score tobin target density
Network flow based circuit spreading
Final Placement
Congestion Driven Circuit Spreading
Yes
No
June 2002 DAC02 - Physical Chip Implementation 249
We’ve Talked About
n Placement algorithmsn Placement / Synthesis interactionn Placement aware synthesis techniquesn The Constant Delay paradigmn Physical Buffer insertion / Wire sizingn Congestion Mitigation
June 2002 DAC02 - Physical Chip Implementation 255
Section outlinen Introductionn Review material (timing and synthesis)n Introduction to placementn Placement algorithmsn Paradigms for placement-synthesis
integrationn Placement aware synthesis techniquesn Congestion avoidance / mitigation techniquesn Routing optimization
June 2002 DAC02 - Physical Chip Implementation 257
Routing based Timing Closure Issuesn Post Routing timing problems can be significantu affect design scheduleumay be too numerous to fix manually
n Increasing design density can reduce cost, but it also increases wiring congestionu timing and signal integrity become more significantu available resource for manual fixup is limiteduwithout automation may not be doable
n Rerouting with constraints may resolve some of the problems, but this process is slow
June 2002 DAC02 - Physical Chip Implementation 258
Solution:
n Integrate global routing, detailed routing and timing correction
n Global routing is efficient enough to be run in an iterative timing closure loop
n Timing critical nets avoid scenic routesn Non-critical nets that go scenic can be
repowered and buffered prior to detailed routing
June 2002 DAC02 - Physical Chip Implementation 259
critical critical pathspaths
non-critical pathsPDS Timinguses steiner wires - fast
ideal "Steiner" routes
Timing deficient wiring solution
Post PD Timing Catches this problem: Slow!
Timing driven wiring solution
RBO Timing Driven Routing sees this during global route stage: Fast!
Force optimal use of wiring resource (e.g.
critical paths get direct route)
Example Problem:
June 2002 DAC02 - Physical Chip Implementation 260
Global RoutingDivides the entire chip into localized rectangular regions called tiles.
Compress several pin location in each tile to a single pin location
All the shapes, wires and open are represented in terms of globaltrack capacity and usage.
June 2002 DAC02 - Physical Chip Implementation 261
Global Routing
nTwo step approachuCreate the initial steiner routesuCompute the edge congestion's on the griduPerform a rip-up reroute using shortest path
algorithm to reduce the overall congestion of the design
nAdvantagesuCan communicate with detail routeruGood correlation with final detail routing
solution
June 2002 DAC02 - Physical Chip Implementation 262
Current Methodology
Physical Synthesis
Global Routing
Detailed Routing
Timing Analysis
RBO Methodology
Physical Synthesis
RBO / Physical
Synthesis
Detailed Routing
Analysis
XrGlobal
Extractor Optimizer
Einstimer
No Timing Criticality for Global router
Costly Manual Timing Correction
Routing Based Optimization
June 2002 DAC02 - Physical Chip Implementation 263
RBO Extraction Process
nVery fast uExcellent correlation with final 3D extraction
nUses global routes for extractionnNeighbor information probabilistically determined based on
the global routing congestion informationnBased on extraction tables
Capacity of All Edges = 5
Probability of having a neighbor = (#OccupiedTracks)/(#Capacity) = 2/4 = 0.5
1
3 3 2
June 2002 DAC02 - Physical Chip Implementation 264
RBO results onRBO results on memcntlmemcntlDesign : Example 1Nets : ~1.6MSize : 23193 x 23193Congestion : Attached is a display of
Global congestion
June 2002 DAC02 - Physical Chip Implementation 267
RBO Results - Example 1Worst Slack #Slack
Violations#Cap Violations
#Slew Violations
#Opens #Loops
Steiner Estimates
-0.47 17 1 18
XrLocalwithout RBO
-1.57 4687 14 128 50 87020
RBO Timing Closure (Global Routes)
-0.48 209
Detailed routing with RBO
-0.43 14 18 1 54
June 2002 DAC02 - Physical Chip Implementation 270
Example 2 Results Summary
Worst Slack #Slack Violations
#Cap Violations
#Slew Violations
#Opens #Loops
Final routing without RBO
-0.54 1224 33 270 0 1152
Using RBO -0.29 909 32 274 0 1070
June 2002 DAC02 - Physical Chip Implementation 271
PDS - RBO Integration
No
t No
ise Aw
are
No
ise Aw
are
112
2
3
4
56
7
8
9
10
11
11 2
23
45
678
9
10
11
Steiner Routes Timing Closure
PDS-Einstimer
Current Flow
Timing SignOff
ChipEdit-Einstimer
Detail Routing
Xrouter
No
No
iseS
ign
Off
Probabilistic Detection of Noise Problems
PDS-Einstimer-RBO
Noise Avoidance
PDS-Einstimer-RBO
SignOff Noise Detection
ETCoupling-3DNoise
Noise Correction
Manual Correction
Timing Closure
PDS-Einstimer-RBO
Steiner-Global
Projected Flow
Steiner Routes Timing Closure
PDS-Einstimer
Global Routes Timing Closure
RBO-Einstimer
SignOff Noise Detection
ETCoupling-3DNoise
Noise Correction
Manual Correction
Proposed Flow -Existing Tools
June 2002 DAC02 - Physical Chip Implementation 272
Noise Detection and Avoidance:n RBO (Detection)u Length Base
F Initial selection includes length and slack thresholdFFurther pruning based on Worst Case Miller
TiminguSwitching Window based refinement
FPattern generation based on switching window overlaps
n RBO (Avoidance)u Long Net Spreading u Track Reordering u Incremental Placement Changes u Layer Assignment
June 2002 DAC02 - Physical Chip Implementation 273
Noise Detection and Avoidance:n Wire width selectionn Physical Synthesisu Integrationu Fix Cap And Slew Violations with Global Routes u Interface to RBO uNoise Alleviation Resizing uNoise Aware Buffering
June 2002 DAC02 - Physical Chip Implementation 275
Wrap Upn Timing closure today is highly dependant on
integrated tools.n Tightly integrated Placement, Timing & Synthesis
tools are available today from multiple vendors.n Placement techniques are dominated quadratic
techniques and partitioningn Next on the list for integration are Routing and Signal
integrity tools (happening now) n These tools have a high degree of complexity. It
takes large well funded DA organizations to compete in this space.
June 2002 DAC02 - Physical Chip Implementation 276
Placement Referencesnn C. J. Alpert, T. Chan, D. J.C. J. Alpert, T. Chan, D. J.--H,H,\\. Huang, I. Markov, and K. . Huang, I. Markov, and K. YanYan, “, “Quandratic Quandratic
Placement Revisited”,Proc. 34th IEEE/ACM Design Automation ConfePlacement Revisited”,Proc. 34th IEEE/ACM Design Automation Conference, 1997, rence, 1997, pp. 752pp. 752--757757
nn C. J. Alpert, J.C. J. Alpert, J.--H Huang, and A. B. Kahng, “Multilevel Circuit Partitioning”, ProH Huang, and A. B. Kahng, “Multilevel Circuit Partitioning”, Proc. 34th c. 34th IEEE/ACM Design Automation Conference, 1997, pp. 530IEEE/ACM Design Automation Conference, 1997, pp. 530--533533
nn U. Brenner, and A. U. Brenner, and A. RoheRohe, “An Effective Congestion Driven Placement Framework”, , “An Effective Congestion Driven Placement Framework”, International Symposium on Physical Design 2002, pp. 6International Symposium on Physical Design 2002, pp. 6--1111
nn A. E. Caldwell, A. B. Kahng, and I.L. Markov, “Can Recursive BisA. E. Caldwell, A. B. Kahng, and I.L. Markov, “Can Recursive Bisection Alone ection Alone Produce Routable Placements”,Proc. 37th IEEE/ACM Design AutomatiProduce Routable Placements”,Proc. 37th IEEE/ACM Design Automation Conference, on Conference, 2000, 2000, pp pp 477477--482482
nn M.A. M.A. BreuerBreuer, “Min, “Min--Cut Placement”, J. Design Automation and Fault Tolerant Cut Placement”, J. Design Automation and Fault Tolerant Computing, I(4), 1997, Computing, I(4), 1997, pp pp 343343--362362
nn J. J. VygenVygen, “Algorithms for Large, “Algorithms for Large--Scale Flat Placement”, Scale Flat Placement”, ProcProc. 34th IEEE/ACM Design . 34th IEEE/ACM Design Automation Conference, 1988,Automation Conference, 1988,pp pp 746746--751751
nn H. H. Eisenmann Eisenmann and F. M. Johannes, “Generic Global Placement and and F. M. Johannes, “Generic Global Placement and FloorplanningFloorplanning”, ”, ProcProc. 35th IEEE/ACM Design Automation Conference, 1998, . 35th IEEE/ACM Design Automation Conference, 1998, pppp. 269. 269--274274
nn S.S.--L. L. Ou Ou and M. and M. PedramPedram, “Timing Driven Placement Based on Partitioning with , “Timing Driven Placement Based on Partitioning with Dynamic CutDynamic Cut--Net Control”, Net Control”, ProcProc. 37th IEEE/ACM Design Automation Conference, . 37th IEEE/ACM Design Automation Conference, 2000, 2000, pppp. 472. 472--476476
nn C.M. C.M. Fiduccia Fiduccia and R.M. and R.M. MattheysesMattheyses, A linear time heuristic for improving network , A linear time heuristic for improving network partitions, partitions, ProcProc. ACM/IEEE Design Automation Conference. (1982) . ACM/IEEE Design Automation Conference. (1982) pppp. 175 . 175 -- 181.181.
June 2002 DAC02 - Physical Chip Implementation 277
Synthesis Referencesnn C.L. Berman, J. L. Carter, and K.F. Day. The C.L. Berman, J. L. Carter, and K.F. Day. The Fanout Fanout Problem: From Theory to Practice. In Problem: From Theory to Practice. In
Advanced Research in VLSI: Proceedings of the 1989 Decennial CaAdvanced Research in VLSI: Proceedings of the 1989 Decennial Caltech Conference, pages ltech Conference, pages 6969--99, 198999, 1989
nn C. L. Berman, D. J. Hathaway, A. S. C. L. Berman, D. J. Hathaway, A. S. LaPaughLaPaugh, and L. H. , and L. H. TrevillyanTrevillyan. Efficient Techniques for . Efficient Techniques for Timing Corrections. In International Symposium on Circuits and Timing Corrections. In International Symposium on Circuits and Systems, Pages 415Systems, Pages 415--419, 1990419, 1990
nn F. F. BeeftingBeefting, P. N. , P. N. KudvaKudva, D. S. Kung, R. , D. S. Kung, R. PuriPuri, and L. , and L. StokStok. Combinatorial Cell Design for CMOS . Combinatorial Cell Design for CMOS Libraries INTEGRATION, the VLSI Journal, 29:67Libraries INTEGRATION, the VLSI Journal, 29:67--93, 200093, 2000
nn W. W. DonathDonath, P. , P. KudvaKudva, L. , L. StokStok, P. Villarrubia, L. Reddy, and A. Sullivan. Transformational , P. Villarrubia, L. Reddy, and A. Sullivan. Transformational placement and synthesis. In DATE, pages 194placement and synthesis. In DATE, pages 194--201, 2000201, 2000
nn D. J. Hathaway, R.P. D. J. Hathaway, R.P. AbatoAbato, A.D. , A.D. DrummDrumm, and L.P.P.P . Van , and L.P.P.P . Van GinnekenGinneken. Incremental timing . Incremental timing analysis. Technical report, IBM Corp., 1996. U.S. patent 5,508analysis. Technical report, IBM Corp., 1996. U.S. patent 5,508,937.,937.
nn D. Kung, P. D. Kung, P. KudvaKudva, and A. Sullivan. A Gate Sizing Algorithm using Geometric Prog, and A. Sullivan. A Gate Sizing Algorithm using Geometric Programming. In ramming. In ProcProc. Of the International Workshop on Logic Synthesis, 1997. Of the International Workshop on Logic Synthesis, 1997
nn T. T. Kutzschebauch Kutzschebauch and L. and L. StokStok. Regularity driven logic synthesis. In . Regularity driven logic synthesis. In Proc Proc of the of the IntInt. . ConfConf. On . On Computer Aided Design, Nov 2000.Computer Aided Design, Nov 2000.
nn P. P. RezvaniRezvani, A.H. , A.H. AjamiAjami, M. , M. PedramPedram, and H. , and H. SavojSavoj. LEOPARD: A Logical Effort based . LEOPARD: A Logical Effort based fanout fanout Optimizer for Area and Delay. In IEEE/ACM International ConfereOptimizer for Area and Delay. In IEEE/ACM International Conference on CAD, pages 516nce on CAD, pages 516--519, 519, 1999.1999.
nn L. L. StokStok, M. , M. IyerIyer, and A. Sullivan. , and A. Sullivan. Wavefront Wavefront technology mapping. In DATE, pages 531technology mapping. In DATE, pages 531--536, 536, 19991999
nn D. S. Kung. A Fast D. S. Kung. A Fast Fanout Fanout Optimization for NewOptimization for New--Continuous Buffer Libraries. In IEEE/ACM Continuous Buffer Libraries. In IEEE/ACM Design Automation Conference, pages 352Design Automation Conference, pages 352--355, 1998355, 1998
June 2002 DAC02 - Physical Chip Implementation 278
DP Buffer Insertion References
nn Buffer placement in distributed RCBuffer placement in distributed RC--tree networks for minimal Elmore delay tree networks for minimal Elmore delay van van GinnekenGinneken, L.P.P.P. Circuits and Systems, 1990., IEEE International , L.P.P.P. Circuits and Systems, 1990., IEEE International Symposium on , 1990 Page(s): 865 Symposium on , 1990 Page(s): 865 --868 vol.2868 vol.2
nn Optimal wire sizing and buffer insertion for low power and a genOptimal wire sizing and buffer insertion for low power and a generalized delay eralized delay modelmodel LillisLillis, J.; Chung, J.; Chung--KuanKuan Cheng; Lin, T.Cheng; Lin, T.--T.Y. SolidT.Y. Solid--State Circuits, IEEE State Circuits, IEEE Journal of , Volume: 31 Issue: 3 , March 1996 Page(s): 437 Journal of , Volume: 31 Issue: 3 , March 1996 Page(s): 437 ––447447
nn Buffer insertion for noise and delay optimization Alpert, C.J.;Buffer insertion for noise and delay optimization Alpert, C.J.; DevganDevgan, A.; , A.; Quay, S.T. ComputerQuay, S.T. Computer--Aided Design of Integrated Circuits and Systems, IEEE Aided Design of Integrated Circuits and Systems, IEEE Transactions on , Volume: 18 Issue: 11 , Nov. 1999 Page(s): 1633Transactions on , Volume: 18 Issue: 11 , Nov. 1999 Page(s): 1633 --16451645
nn Buffer insertion with accurate gate and interconnect delay compuBuffer insertion with accurate gate and interconnect delay computation Alpert, tation Alpert, C.J.;C.J.; DevganDevgan, A.; Quay, S.T. Design Automation Conference, 1999. , A.; Quay, S.T. Design Automation Conference, 1999. Proceedings. 36th , 1999 Page(s): 479 Proceedings. 36th , 1999 Page(s): 479 ––484484
nn Wire Segmenting For Improved Buffer Insertion Alpert, C.;Wire Segmenting For Improved Buffer Insertion Alpert, C.; DevganDevgan, A. Design , A. Design Automation Conference, 1997. Proceedings of the 34th Page(s): 58Automation Conference, 1997. Proceedings of the 34th Page(s): 588 8 ––593593
nn Simultaneous routing and buffer insertion for high performance iSimultaneous routing and buffer insertion for high performance interconnectnterconnectLillisLillis, J.; Chung, J.; Chung--KuanKuan Cheng; TingCheng; Ting--Ting Y. Lin VLSI, 1996. Proceedings., Sixth Ting Y. Lin VLSI, 1996. Proceedings., Sixth Great Lakes Symposium on , 1996 Page(s): 148 Great Lakes Symposium on , 1996 Page(s): 148 --153153
June 2002 DAC02 - Physical Chip Implementation 279
Blockage Avoidance References
nn Steiner tree optimization for buffers, blockages, and bays AlperSteiner tree optimization for buffers, blockages, and bays Alpert, C.J.;t, C.J.; GandhamGandham, G.;, G.;Jiang HuJiang Hu;; NevesNeves, J.I.; Quay, S.T.;, J.I.; Quay, S.T.; SapatnekarSapatnekar, S.S. Computer, S.S. Computer--Aided Design of Aided Design of Integrated Circuits and Systems, IEEE Transactions on , Volume: Integrated Circuits and Systems, IEEE Transactions on , Volume: 20 Issue: 4 , April 20 Issue: 4 , April 2001 Page(s): 556 2001 Page(s): 556 ––562.562.
nn A fast algorithm for contextA fast algorithm for context--aware buffer insertion aware buffer insertion JagannathanJagannathan, A.; , A.; SungSung--WooWoo HurHur;; LillisLillis, J. Design Automation Conference, 2000. , J. Design Automation Conference, 2000. Proceedings 2000 Page(s): 368 Proceedings 2000 Page(s): 368 ––373.373.
nn Simultaneous routing and buffer insertion with restrictions on bSimultaneous routing and buffer insertion with restrictions on buffer uffer locationslocations Hai ZhouHai Zhou; Wong, D.F.; I; Wong, D.F.; I--Min Liu;Min Liu; AzizAziz, A. Computer, A. Computer--Aided Aided Design of Integrated Circuits and Systems, IEEE Transactions on Design of Integrated Circuits and Systems, IEEE Transactions on , , Volume: 19 Issue: 7 , July 2000 Page(s): 819 Volume: 19 Issue: 7 , July 2000 Page(s): 819 --824824
nn Maze routing with buffer insertion and wire sizing Maze routing with buffer insertion and wire sizing MinghorngMinghorng Lai; Wong, Lai; Wong, D.F. Design Automation Conference, 2000. Proceedings 2000 Page(sD.F. Design Automation Conference, 2000. Proceedings 2000 Page(s): ): 374 374 --378378
nn Routing tree construction under fixed buffer locations Cong, J.;Routing tree construction under fixed buffer locations Cong, J.; XinXin Yuan Yuan Design Automation Conference, 2000. Proceedings 2000 Page(s): 37Design Automation Conference, 2000. Proceedings 2000 Page(s): 379 9 --384384
June 2002 DAC02 - Physical Chip Implementation 280
Interconnect Planning Referencesnn A practical methodology for early buffer and wire resource allocA practical methodology for early buffer and wire resource allocation Alpert, C.J.;ation Alpert, C.J.;
Jiang HuJiang Hu;; SapatnekarSapatnekar, S.S.;, S.S.; VillarrubiaVillarrubia, P.G. Design Automation Conference, 2001. , P.G. Design Automation Conference, 2001. Proceedings , 2001 Page(s): 189 Proceedings , 2001 Page(s): 189 ––194194
nn An interconnectAn interconnect--centric design flow for nanometer technologies Cong, J. Proceedicentric design flow for nanometer technologies Cong, J. Proceedings ngs of the IEEE , Volume: 89 Issue: 4 , April 2001 Page(s): 505 of the IEEE , Volume: 89 Issue: 4 , April 2001 Page(s): 505 --528528
nn Buffer block planning for interconnectBuffer block planning for interconnect--drivendriven floorplanningfloorplanning Cong, J.;Cong, J.; TianmingTianming Kong; Kong; Pan, D.Z. ComputerPan, D.Z. Computer--Aided Design, 1999. Digest of Technical Papers. 1999 Aided Design, 1999. Digest of Technical Papers. 1999 IEEE/ACM International Conference on , 1999 Page(s): 358 IEEE/ACM International Conference on , 1999 Page(s): 358 ––363363
nn Provably good global buffering using an available buffer block pProvably good global buffering using an available buffer block plan lan DraganDragan, F.F.;, F.F.;KahngKahng, A.B.;, A.B.; MandoiuMandoiu, I.;, I.; MudduMuddu, S.;, S.; ZelikovskyZelikovsky, A. Computer Aided Design, 2000. , A. Computer Aided Design, 2000. ICCADICCAD--2000. IEEE/ACM International Conference on , 2000 Page(s): 104 2000. IEEE/ACM International Conference on , 2000 Page(s): 104 --109109
nn Provably good global buffering byProvably good global buffering by multiterminal multicommoditymultiterminal multicommodity flow approximation flow approximation DraganDragan, F.F.;, F.F.; KahngKahng, A.B.;, A.B.; MandoiuMandoiu, I.;, I.; MudduMuddu, S.;, S.; ZelikovskyZelikovsky, A. Design , A. Design Automation Conference, 2001. Proceedings of the ASPAutomation Conference, 2001. Proceedings of the ASP--DAC 2001. Asia and South DAC 2001. Asia and South Pacific , 2001 Page(s): 120 Pacific , 2001 Page(s): 120 ––125125
nn Planning buffer locations by network flows Tang, X.; Wong, D.F.Planning buffer locations by network flows Tang, X.; Wong, D.F.; International ; International Symposium on Physical Design, April 2001 Page(s): 180Symposium on Physical Design, April 2001 Page(s): 180--185185
nn RoutabilityRoutability--Driven Repeater Block Planning for InterconnectDriven Repeater Block Planning for Interconnect--Centric Centric Floorplanning Floorplanning SarkarSarkar, P.; , P.; SundararamanSundararaman, V.; , V.; KohKoh, C., C.--K.; International Symposium on Physical K.; International Symposium on Physical Design, April 2001 Page(s): 186Design, April 2001 Page(s): 186--191191