Section IV: Timing Closure Techniques -...

June 2002 DAC02 – Physical Chip Implementation 1

Section IV: Timing Closure Techniques


IBM Contributions to this presentation include:
- T.J. Watson Research Center
- Austin Research Lab
- ASIC Design Centers
- EDA Organization

* For more detailed information see references at the end of this presentation, which include a wide variety of IBM and External publications covering these areas.


Overview
- Introduction
- Review material (timing and synthesis)
- Introduction to placement
- Placement algorithms
- Paradigms for placement-synthesis integration
- Placement aware synthesis techniques
- Congestion avoidance / mitigation techniques
- Routing optimization


Timing Closure
- Many aspects of a design contribute to performance, power, and density
    - Architecture / Logic Implementation
    - PD Design Style (Flat, Hierarchical, etc)
    - Clocking Paradigm / Test / Circuit Family
    - Floor Plan / Synthesis / Placement / Routing
- Design Automation for timing closure is more significant than ever before
    - Designs are larger
    - Wires are longer, invalidating statistical synthesis models, and requiring lots of buffers
    - Cycle times are more aggressive


Design Automation Tools are Individually Mature
- Timing analysis
- Synthesis / Technology mapping
- Placement / Routing
- Floor Planning
- Extraction / Analysis


Physical Synthesis

Challenge is to integrate them into one cooperative application

[Figure: Netlist in -> Physical Synthesis (place & route, synthesis, timing) -> Completed Design]


Design Flow Evolution:

[Figure: Design Entry -> Synthesis w/Timing -> Place -> Route -> Timing]

Synthesis w/Timing:
1. Tech independent optimization
2. Tech mapping
3. Timing correction

Stages of the evolution:
- Timing driven placement
- Timing driven placement plus automatic post placement tuning
- Integrated placement and synthesis
- Integrated placement, synthesis & routing
    1. Physically aware optimizations
    2. Physically aware timing correction
    3. Timing / Noise aware routing


Purpose of this Section:

- Provide users with an intuitive feel of the inner workings of the major timing closure tools
- Demonstrate the advancements in timing closure tools technology via example designs
- Explore a variety of significant design choices


What you should expect:

- High level concepts presented are generally applicable across a wide range of tools / methodologies (ie: not IBM specific)
- Specific tool internals used in this tutorial are taken from IBM tools. They should provide a reasonable “feel” as to how things are done in the industry.


Worldwide ASIC/PLD Sales: Top 5 Suppliers for 2001

Supplier   Revenue (millions of U.S. dollars)   Growth
IBM        2758                                   1.2%
Agere      1310                                 -43.5%
LSI        1243                                 -38.2%
NEC        1243                                 -35.2%
Xilinx     1149                                 -26.3%

Source: Gartner Dataquest (March 2002)


IBM ASIC Supplier #1 since 1999

[Figure: ASIC/PLD supplier rankings, positions 1 to 10, for each year from 1996 to 2001. Suppliers shown include IBM, LSI Logic, Lucent, NEC, VLSI, Xilinx, TI, Fujitsu, Toshiba, Hitachi, Altera, STM, Agilent, and Mitsubishi. Source: Dataquest 96-02]


Section Outline
- Introduction
- Review material (timing and synthesis)
- Introduction to placement
- Placement algorithms
- Paradigms for placement-synthesis



Static Timing Analysis


Timing Analysis Basics: Why static timing, since simulation is more accurate?

- Simulation has a number of key drawbacks
    - requires input state vectors
    - long runtimes
- A simple example: How would one calculate the worst case rising delay from a to z?

[Figure: logic block with inputs a, b, c and output z]

The a-z delay depends on the state of the side inputs b and c:

         c=0           c=1
b=0      a-z delay1    a-z delay2
b=1      a-z delay3    a-z delay4

Exponential explosion as possible design input states grow!


Timing Analysis Basics: Definition of basic terms

- Arrival time (AT): the time at which a pin switches state
- Required arrival time (RAT): the time by which a signal must arrive in order to avoid a chip fail
- Slack = Required arrival time - Arrival time
    - Positive slack good, negative slack bad
- Slew: the rate at which a signal switches
    - usually the time difference between the 10% and 90% points on the voltage curve

[Figure: rising voltage waveform from 0 to vdd versus time with the 10%, 50%, and 90% points marked; AT = time50, slew = time90 - time10]


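Block based timing:
- Worst value only stored at merge points
- Each segment is processed just once

Example Problem: What is slack at PO?

[Figure: gate network with segment delays d between 1 and 5 and at=0 at the primary inputs. Arrival times are propagated forward; temporary values at merge points (temp at=3, temp at=7) are replaced by the worst value. Intermediate arrival times are 1, 2, 5, 6, and 8, and the primary output PO has at=11 and rat=10.]

Slack = 10 - 11 = -1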
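The block based procedure can be sketched in a few lines of Python. This is only an illustration: the circuit below is a hypothetical one (the slide's figure did not survive extraction), chosen so that it also ends with at=11 against rat=10. The function name `propagate_arrival_times` and the edge list format are assumptions made for this sketch.

```python
from collections import defaultdict

def propagate_arrival_times(edges, input_at):
    """Late-mode block based timing.

    edges: list of (source pin, sink pin, segment delay)
    input_at: arrival times at the primary inputs
    Each segment is visited once; only the worst (largest) arrival
    time is kept at every merge point.
    """
    fanout = defaultdict(list)
    pending = defaultdict(int)      # unprocessed fan-in segments per pin
    for src, dst, delay in edges:
        fanout[src].append((dst, delay))
        pending[dst] += 1

    at = dict(input_at)
    ready = list(input_at)          # pins whose arrival time is final
    while ready:
        pin = ready.pop()
        for dst, delay in fanout[pin]:
            at[dst] = max(at.get(dst, float("-inf")), at[pin] + delay)
            pending[dst] -= 1
            if pending[dst] == 0:   # all fan-in segments have been merged
                ready.append(dst)
    return at

# Hypothetical network: three inputs, two internal pins, one output.
edges = [("a", "x", 3), ("b", "x", 1), ("b", "y", 2),
         ("c", "y", 5), ("x", "po", 4), ("y", "po", 6)]
at = propagate_arrival_times(edges, {"a": 0, "b": 0, "c": 0})
slack = 10 - at["po"]               # slack = RAT - AT with rat=10
print(at["po"], slack)
```

Because every pin waits until all of its fan-in segments have been merged, the run time grows with the number of segments rather than with the number of paths.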


What is Incremental Timing?
- Enabling small incremental changes without full retiming
- Only direct fanin/fanout cone is processed

[Figure: the network from the block based timing example after a local change that uses segments with d=1. Arrival times outside the affected cone are left unset (at=) and only the affected cone is recomputed; the output now has at=10 against rat=10.]

slack=0: it passed!



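Early Mode Analysis

- Definitions change as follows
    - longest becomes shortest
    - slack = arrival - required

[Figure: small network with pins a, b, c, x, y and segment delays of 1]

$$AT_a = 0, \quad AT_b = 1, \quad AT_c = 0, \quad AT_x = 1, \quad AT_y = 1, \quad RAT_x = 2$$

$$SL_a = 0 - 0 = 0$$
$$SL_b = 1 - 0 = 1$$
$$SL_c = 0 - 1 = -1$$
$$SL_x = 1 - 2 = -1$$
$$SL_y = 1 - 1 = 0$$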


Timing Correction

- Fix electrical violations
    - Resize cells
    - Buffer nets
    - Copy (clone) cells
- Fix timing problems
    - Local transforms (bag of tricks)
    - Path-based transforms


Local Synthesis Transforms

- Resize cells
- Buffer or clone to reduce load on critical nets
- Decompose large cells
- Swap connections on commutative pins or among equivalent nets
- Move critical signals forward
- Pad early paths
- Area recovery


Transform Example

[Figure: Double Inverter Removal. Before: Delay = 4. After: Delay = 2.]


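Resizing

[Figure: delay d (0 to 0.05) versus load (0 to 1) for power levels A, B, C. A gate with inputs a, b drives sinks d, e, f with loads 0.2, 0.2, and 0.3. At power level A the delay is 0.035; at power level C the delay is 0.026.]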


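Cloning

[Figure: delay d (0 to 0.05) versus load (0 to 1) for power levels A, B, C. A gate with inputs a, b drives five sinks d, e, f, g, h, each with load 0.2. After cloning, two copies of the gate (labeled A and B) share the sinks.]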

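Buffering

[Figure: delay d (0 to 0.05) versus load (0 to 1) for power levels A, B, C. A gate with inputs a, b drives five sinks d, e, f, g, h, each with load 0.2. After buffering, part of the fanout is driven through an inserted buffer (input load 0.1), so the original gate sees a smaller load.]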


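Redesign Fan-in Tree

[Figure: fan-in tree with inputs a, b, c, d and output e, built from gates with delay 1. Arr(a)=4, Arr(b)=3, Arr(c)=1, Arr(d)=0. Before: Arr(e)=6. After rebuilding the tree so that the late-arriving inputs pass through fewer gates: Arr(e)=5.]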


Redesign Fan-out Tree

[Figure: fan-out tree with per-stage delays. Before: Longest Path = 5. After redesign: Longest Path = 4, which includes the slowdown of a buffer due to load.]


Decomposition


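Swap Commutative Pins

[Figure: gate with commutative input pins that have different pin-to-output delays, driven by signals a, b, c with different arrival times. Reassigning the signals to pins reduces the arrival time at the output.]

Simple sorting on arrival times and delay works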


Move Critical Signals Forward

- Based on ATPG
    - linear in circuit size
    - detects redundancies efficiently
- Efficiently finds wires to be added and removed
    - based on mandatory assignments

[Figure: circuit with signals a, b, c, d, e before and after the critical signal is moved forward]


Section outline
- Introduction
- Review material (timing and synthesis)
- Introduction to placement
- Placement algorithms
- Paradigms for placement-synthesis



Placement Objective:
- Find optimal relative ordering of cells
    - minimize wire length and congestion
    - maximize timing slack
- Find optimal spacing of cells
    - eliminate wiring congestion problems
    - provide space for post placement synthesis
        - clock trees
        - buffer insertion
        - timing correction
- Find optimal Global Position


[Figure sequence: three cells A, B, C placed in a row, shown in several arrangements]

- Optimal Relative Order
- To spread ... or not to spread
- Place to the left … or to the right
- Optimal Relative Order: without “free” space the problem is dominated by order


Placement Footprints:
- Standard Cell
- Data Path
- IP - Floorplanning
- Mixed Data Path & sea of gates (Core, Control, IO, Reserved areas)
- Perimeter IO
- Area IO


Placement objectives are subject to User Constraints / Design Style:
- Hierarchical Design Constraints
    - pin location
    - power rail
    - reserved layers
- Flat Design w/Floor Plan constraints
- Fixed circuits
- IO connections


[Figure: Unconstrained Placement]

[Figure: Floor planned Placement]

[Figure: Congestion MAP]


Advantages of Hierarchy

- Design is carved into smaller pieces that can be worked on in parallel (improved throughput)
- A known floor plan provides the logic design team with a large degree of placement control.
- A known floor plan provides early knowledge of long wires
- Timing closure problems can be addressed by tools, logic design, and hierarchy manipulation
- Late design changes can be done with minimal turmoil to the entire design


Disadvantages of Hierarchy
- Results depend on the quality of the hierarchy. The logic hierarchy must be designed with PD taken into account.
- Additional methodology requirements must be met to enable hierarchy. Ex. Pin assignment, Macro Abstract management, area budgeting, floor planning, timing budgets, etc
- Late design changes may affect multiple components.
- Hierarchy allows divergent methodologies
- Hierarchy hinders DA algorithms. They can no longer perform global optimizations.


Physical Synthesis Flow

1. Synthesized Netlist: wire-load models, unplaced, physically “unaware” timing
2. Cleanup: remove buffers, nominal power levels on gates
3. Initial “basic” placement: for minimal wire-length, min-cut, Steiner tree estimates, physically aware timing
4. Logical + Placement optimizations: physically aware logic optimizations
5. Timing-driven placement w/resynthesis: for minimal netweights, based on the timing of the net
6. Timing Improvement? Yes: iterate again. No more: output the Placed Netlist.


Logical + Placement Optimizations

CutBin

- Start with a placed or unplaced netlist
- Do recursive partitioning
- During and following each partition action, apply logic optimizations such as
    - timing corrections
    - rebuffering
    - repowering
    - cloning
    - pin swapping
    - move boxes
    - … etc





Overview of Common Placement Algorithms:

- Simulated Annealing
- Quadratic Placement
- Partitioning


Simulated Annealing:

for (temp = high; temp > absolute_zero; temp -= increment)
{
    make a random move
    score the move
    use temp dependent probability to decide to accept or reject
}

Note: Clustering can be used to improve performance
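The annealing loop can be made concrete with a small Python sketch. Everything specific here is an assumption for illustration: the problem is a one-dimensional row of cells, the score is the total net span, the move is a swap of two cells, and the acceptance rule is the usual exp(-delta/temp) test. The names `wire_length` and `anneal` are hypothetical.

```python
import math
import random

def wire_length(order, nets):
    """Score: sum over nets of the span between leftmost and rightmost cell."""
    pos = {cell: i for i, cell in enumerate(order)}
    return sum(max(pos[c] for c in net) - min(pos[c] for c in net)
               for net in nets)

def anneal(order, nets, high=5.0, steps=100, moves_per_temp=50, seed=0):
    rng = random.Random(seed)
    current = list(order)
    cost = wire_length(current, nets)
    best, best_cost = list(current), cost
    for step in range(steps):
        temp = high * (1 - step / steps)        # linear cooling, always > 0
        for _ in range(moves_per_temp):
            i, j = rng.sample(range(len(current)), 2)
            current[i], current[j] = current[j], current[i]   # random move
            new_cost = wire_length(current, nets)             # score the move
            delta = new_cost - cost
            # temp dependent probability: always accept improvements,
            # accept a worse placement with probability exp(-delta / temp)
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = list(current), cost
            else:
                current[i], current[j] = current[j], current[i]  # reject: undo
    return best, best_cost

start = ["D", "A", "F", "C", "B", "E"]
nets = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F")]
best, best_cost = anneal(start, nets)
print(best, best_cost)
```

New constraints would be added by extending `wire_length` with extra penalty terms, which is the "dumb moves / smart scoring" property of annealing.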


Annealing:

Pros:
- ease of implementation, dumb moves / smart scoring
- can easily accommodate new constraints - just add them to the scoring function
- great quality
- can be made to run on parallel processors

Cons:
- very long run time


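Quadratic Placement

Review:

[Figure: movable objects x1 and x2 connected in a chain between fixed points at x=100 and x=200]

$$\text{Cost} = (x_1 - 100)^2 + (x_1 - x_2)^2 + (x_2 - 200)^2$$

$$\frac{\partial\,\text{Cost}}{\partial x_1} = 2(x_1 - 100) + 2(x_1 - x_2)$$

$$\frac{\partial\,\text{Cost}}{\partial x_2} = -2(x_1 - x_2) + 2(x_2 - 200)$$

Setting the partial derivatives = 0 we solve for the minimum Cost:

$$Ax + B = 0$$

$$\begin{bmatrix} 4 & -2 \\ -2 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} -200 \\ -400 \end{bmatrix} = 0$$

$$\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} -100 \\ -200 \end{bmatrix} = 0$$

$$x_1 = 400/3, \qquad x_2 = 500/3$$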


Review: Interpretation of matrices A and B

Setting the partial derivatives = 0 gave the system Ax + B = 0:

$$\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} -100 \\ -200 \end{bmatrix} = 0, \qquad x_1 = 400/3, \quad x_2 = 500/3$$

- The diagonal values A[i,i] correspond to the number of connections to x_i
- The off diagonal values A[i,j] are -1 if object i is connected to object j, 0 otherwise
- The values B[i] correspond to the negated sum of the locations of fixed objects connected to object i


Why formulate the problem this way?
- Because we can
- Because it is trivial to solve
- Because there is only one solution
- Because the solution is a global optimum
- Because the solution conveys “relative order” information
- Because the solution conveys “global position” information


However:

- Solution is not legal
- Solution depends on fixed anchor points
- Solution does not minimize linear wire length, congestion, or timing
- Solution is generally highly overlapping w/ high density (ie needs to be spread out)


What does the solution look like?

- To get an intuitive feel for the solution, examine the relaxation method for solving Ax + B = 0
- Actual program implementation may use other solution methods (that are generally less intuitive).


Solution of Quadratic using Relaxation:
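One way to picture relaxation is that each movable object is repeatedly moved to the average position of everything it connects to, until nothing moves. The sketch below applies that idea to the two-object review example (fixed points at 100 and 200). The function name `relax`, the pad names, and the fixed iteration count are assumptions for this illustration, not details taken from the tutorial.

```python
def relax(neighbors, fixed, iterations=100):
    """Gauss-Seidel style relaxation for Ax + B = 0.

    neighbors: movable object -> list of connected objects
    fixed: fixed object -> location
    Each sweep moves every movable object to the mean of its neighbors,
    which zeroes that object's own row of Ax + B.
    """
    x = {name: 0.0 for name in neighbors}
    for _ in range(iterations):
        for name, conns in neighbors.items():
            total = sum(fixed[c] if c in fixed else x[c] for c in conns)
            x[name] = total / len(conns)
    return x

neighbors = {"x1": ["left_pad", "x2"], "x2": ["x1", "right_pad"]}
fixed = {"left_pad": 100.0, "right_pad": 200.0}
x = relax(neighbors, fixed)
print(x)   # approaches x1 = 400/3, x2 = 500/3
```

The only data structure the solver touches is the netlist connectivity, which is one reason the formulation scales to large designs.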



Constrained Solutions:

- Sometimes we want to solve for the minimum wire length subject to a constraint
- Example: Using quadratic for partitioning, we may want the quadratic placement to be "centered"




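Constrained Solutions

To minimize Cost = f(x) subject to a constraint g(x) = 0 we can use Lagrangian multipliers to modify the Cost function as follows:

$$\text{Cost} = f(x) + \lambda\, g(x)$$

$$\frac{\partial}{\partial x}\text{Cost} = \frac{\partial}{\partial x} f(x) + \lambda \frac{\partial}{\partial x} g(x)$$

Using CG as a constraint:

$$CG = \frac{\sum_{i=1}^{n} s_i x_i}{\sum_{i=1}^{n} s_i}$$

where $s_i$ is the size of object i and n is the number of objects.

$$g(x) = \frac{\sum_{i=1}^{n} s_i x_i}{\sum_{i=1}^{n} s_i} - CG$$

$$\frac{\partial}{\partial x_i} g(x) = \frac{s_i}{N}$$

where we use N to represent the constant $\sum_{i=1}^{n} s_i$.

We have already shown that $\frac{\partial}{\partial x} f(x) = 0$ leads to the system of equations Ax + B = 0.

Therefore solving the constrained problem

$$\frac{\partial}{\partial x}\text{Cost} = \frac{\partial}{\partial x} f(x) + \lambda \frac{\partial}{\partial x} g(x) = 0$$

leads to:

$$Ax + B + \lambda \frac{s}{N} = 0$$

where s is the vector of object sizes $s_i$.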


Constrained Solutions (cont):

To solve $Ax + B + \lambda \frac{s}{N} = 0$ we could use a packaged solver, add the additional unknown $\lambda$ and the equation $CG = \frac{1}{N}\sum_{i=1}^{n} s_i x_i$ to our matrices, and solve.

Here is an alternative way to solve the system:

By substitution we let $x = x_u + \lambda x_l$, where $x_u$ is the unconstrained solution (ie the solution to Ax + B = 0).

Assuming we can solve the unconstrained problem, $x_u$ is known.

By substitution we get:

$$A(x_u + \lambda x_l) + B + \lambda \frac{s}{N} = 0$$

which becomes:

$$A \lambda x_l + \lambda \frac{s}{N} = 0 \qquad \text{or} \qquad A x_l + \frac{s}{N} = 0$$


We need to solve: $A x_l + \frac{s}{N} = 0$

Note: The A matrix is the same as the A matrix for the unconstrained solution. Since the A matrix is the netlist connectivity specification, we have A.

The B matrix here is $\frac{s}{N}$ instead of the sum of fixed location connects.

Interpretation:

The solution to $A x_l + \frac{s}{N} = 0$ can be obtained by modifying the original netlist and placement such that:

1.) All fixed objects are moved to x = 0
2.) A constant force vector is applied to each object. The constant force vector for the i'th object has magnitude $\frac{s_i}{N}$

Then use the same solver as was used to solve Ax + B = 0



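We also need to solve for $\lambda$.

From the CG relationship and $x = x_u + \lambda x_l$ we get:

$$CG = \frac{\sum_{i=1}^{n} s_i (x_{u,i} + \lambda x_{l,i})}{N}, \qquad N = \sum_{i=1}^{n} s_i \ \text{(ie total size)}$$

Since we have solved for $x_u$ and $x_l$, the only unknown is $\lambda$, and we get:

$$\lambda = \frac{N \cdot CG - \sum_{i=1}^{n} s_i x_{u,i}}{\sum_{i=1}^{n} s_i x_{l,i}}$$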



Constrained Solutions (summary):

To minimize f(x) (the wl squared cost function) subject to a CG constraint we do the following:

1.) Solve for $x_u$ by solving $A x_u + B = 0$ using relaxation or some other method

2.) Solve for $x_l$ from $A x_l + \frac{s}{N} = 0$ as follows:
    - Move all fixed objects to location = 0
    - Add a constant force vector to each object. The constant force vector for the i'th object has magnitude $\frac{s_i}{N}$
    - Using relaxation or some other method, solve for $x_l$

3.) Solve for $\lambda$ using $\lambda = \frac{N \cdot CG - \sum_{i=1}^{n} s_i x_{u,i}}{\sum_{i=1}^{n} s_i x_{l,i}}$

4.) Compute the final placement using $x = x_u + \lambda x_l$


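Review: Force CG to 150

[Figure: x1 (s=10) and x2 (s=100) connected in a chain between fixed points at x=100 and x=200]

From the previous example we know that the solution to $A x_u + B = 0$ is

$$x_u = \begin{bmatrix} 133.33 \\ 166.67 \end{bmatrix}$$

With this solution the CG is at

$$\frac{(10 \times 133.33) + (100 \times 166.67)}{110} = 163.64 \quad \text{(ie not 150)}$$

Now we need to solve $A x_l + \frac{s}{N} = 0$, which is the same as solving

$$A x_l + \begin{bmatrix} 0 \\ 0 \end{bmatrix} + \frac{s}{N} = 0$$

Recall that the B matrix represents the position of fixed objects. So, this equation represents the solution to:

[Figure: the same chain with both fixed points moved to x = 0 and constant forces of magnitude 10/110 on x1 and 100/110 on x2]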
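The four summary steps can be checked numerically on this example. The sketch below is an illustration only: it uses exact fractions and a hand-written 2x2 solver (`solve2`, a hypothetical helper) in place of relaxation, with A, B, the sizes, and the CG target of 150 taken from the example.

```python
from fractions import Fraction as Fr

def solve2(a, rhs):
    """Solve the 2x2 system a * x = rhs by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(rhs[0] * a[1][1] - a[0][1] * rhs[1]) / det,
            (a[0][0] * rhs[1] - rhs[0] * a[1][0]) / det]

A = [[Fr(2), Fr(-1)], [Fr(-1), Fr(2)]]
B = [Fr(-100), Fr(-200)]
s = [Fr(10), Fr(100)]            # object sizes
target_cg = Fr(150)
N = sum(s)                       # total size

# Step 1: unconstrained solution, A x_u + B = 0
x_u = solve2(A, [-b for b in B])

# Step 2: fixed objects at 0 plus constant forces, A x_l + s/N = 0
x_l = solve2(A, [-si / N for si in s])

# Step 3: lambda from the CG relationship
lam = ((N * target_cg - sum(si * xi for si, xi in zip(s, x_u)))
       / sum(si * xi for si, xi in zip(s, x_l)))

# Step 4: final placement x = x_u + lambda * x_l
x = [u + lam * l for u, l in zip(x_u, x_l)]
cg = sum(si * xi for si, xi in zip(s, x)) / N
print([float(v) for v in x], float(cg))
```

Steps 1 and 2 do not depend on the CG target, so trying a different target only repeats steps 3 and 4.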


Constrained Solutions (summary): Advantages of this approach:

1.) The Solver data structure is the netlist only, ie. no additional memory requirements

2.) Sometimes the unconstrained solution is by itself sufficient, therefore we can avoid the additional overhead of producing the constrained solution

3.) The numerical iterations in this method are NOT dependent on the CG. We can solve for xu and xl, then try many different CG points at very low cost.


Quadratic Techniques:

Pros:
- mathematically well behaved
- efficient solution techniques find global optimum
- great quality

Cons:
- solution of Ax + B = 0 is not a legal placement, so generally some additional partitioning techniques are required.
- solution of Ax + B = 0 is that of the "mapped" problem, ie nets are represented as cliques, and the solution minimizes wire length squared, not linear wire length unless additional methods are deployed
- fixed IOs are required for these techniques to work well


Partitioning


Partitioning:

Objective:

Given a set of interconnected blocks, produce two sets that are of equal size, and such that the number of nets connecting the two sets is minimized.


FM Partitioning:

[Figure: Initial Random Placement, After Cut 1, After Cut 2]

list_of_sets = entire_chip;
while (any_set_has_2_or_more_objects(list_of_sets))
{
    for_each_set_in(list_of_sets)
    {
        partition_it();
    }
    /* each time through this loop the number of */
    /* sets in the list doubles. */
}


FM Partitioning:

Moves are made based on object gain.

Object Gain: The amount of change in cut crossings that will occur if an object is moved from its current partition into the other partition

- each object is assigned a gain
- objects are put into a sorted gain list
- the object with the highest gain from the smaller of the two sides is selected and moved.
- the moved object is "locked"
- gains of "touched" objects are recomputed
- gain lists are resorted

[Figure: two partitions separated by a cut line, with each object labeled by its gain (values between -2 and 2)]
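A direct, unoptimized way to compute object gains is sketched below. It assumes the sign convention that a positive gain means fewer cut nets after the move, and it recomputes the cut from scratch for every object instead of using the sorted gain lists and incremental updates described above. The names `cut_size` and `gains` and the four-cell netlist are hypothetical.

```python
def cut_size(nets, side):
    """Number of nets with pins on both sides of the cut."""
    return sum(1 for net in nets if len({side[c] for c in net}) > 1)

def gains(nets, side):
    """Gain of each object: reduction in cut size if it alone changes sides."""
    base = cut_size(nets, side)
    result = {}
    for cell in side:
        flipped = dict(side)
        flipped[cell] = 1 - side[cell]     # move the cell across the cut
        result[cell] = base - cut_size(nets, flipped)
    return result

# Hypothetical chain a-b-c-d with alternating sides: every net is cut.
nets = [("a", "b"), ("b", "c"), ("c", "d")]
side = {"a": 0, "b": 1, "c": 0, "d": 1}
g = gains(nets, side)
best = max(g, key=g.get)                   # candidate for the next move
print(cut_size(nets, side), g, best)
```

A production FM pass keeps these gains in bucket lists and updates only the objects on nets touched by a move, which is what makes each pass close to linear in the number of pins.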


[Figure sequence: step-by-step FM partitioning animation. After each move the moved object is locked and the gains of the touched objects are updated; the gains shown range from -3 to 2.]


Partitioning:

Pros:
- very fast
- great quality
- scales nearly linearly with problem size

Cons:
- non-trivial to implement
- very directed algorithm, but this limits the ability to deal with miscellaneous constraints


FM Partitioning

- For large designs min-cut (FM) produces poor results

To Compensate, there are two widely used enhancements:

1.) Quadratic seeding

2.) Multi-Level partitioning


Partitioning:

[Figure: objects on either side of a cut line with four labeled moves, move1 to move4]


Global Placement - Multi-Level Partitioning:

[Figure: clustered netlist with labeled moves move1 to move4 and per-object gains of 0, 1, and 2]

generate clusters;
while (there are clusters)
{
    partition_it;
    remove 1 cluster layer;
}
partition_it;


[Figure sequence: step-by-step multi-level partitioning animation showing the labeled moves (move1 to move4) and the object gains after each step]


MLP/FM Partitioning Cons:

- Does not know how to handle “free” space
- Results tend to be erratic, ie results from run to run have significant variation


MLP/FM Partitioning Pros:

- Handles designs that have no fixed connection points
- Very fast - can handle large designs


Hybrid Techniques

- Use both MLP and Quadratic techniques
- Results are more predictable due to quadratic cost function
- Partitioning is used for overlap removal
- Quadratic is used for “free” space handling and some relative order indications


Quadratic Partitioning






Analytical Constraint Generation

- Combine Quadratic techniques with MLP
- Use Quadratic solution to determine global position (ie balance)
- Use MLP to determine relative ordering of cells


[Figure: two bins with Capacity = 2 each and objects of Area = 1, comparing a poor solution, the quadratic solution, the analytical constraint, and the ACG solution]
























Global Route Results:

[Figure: MLP w/ ACG]

[Figure: MLP w/o ACG]

Side by Side Comparison:

[Figure: Original versus ACG]




Observations on Quadratic Placement
- placements are predictable and repeatable
- timing is inherently better
- wire length is not the best, but good
- run time: slower than MLP by 4x
- run time: faster than annealing by 4x
- excellent “free space” handling
- placements “feel” similar to those produced by annealing


Repeatability Example:
- One circuit
- Minimum linear length occurs for all solutions where y=50, 0 < x < 100
- Minimum quadratic length occurs for y=50, x=50
- Quadratic solution IS both minimum linear and minimum quadratic length

[Figure: one circuit connected to fixed points; labels (0,50) and (0,100)]





Synthesis - Placement Interface


Partitioning Algorithm: Partition & Reflow

[Flow: Netlist -> Read Data -> Preprocessing -> Global Placement -> Detailed Placement -> Done. Global Placement loops While (any partition has > 2 cells): Divide each Partition, then Reflow across partitions. Synthesis is invoked from within the flow.]


What Synthesis Can do when Invoked:

- add boxes
- delete boxes
- add nets
- delete nets
- reconnect nets
- change box sizes
- query placement locations of boxes
- query "bin" statistics
- remove a box from a bin
- add a box to a bin


Placement and Synthesis Integration
- Loosely coupled: (methodology coupling)
    - do some synthesis, then write out data
    - do some placement, then write out data
    - .. Repeat
- Interleaved: (placement & synthesis in same process)
    - do pre-pd synthesis
        - for each placement step redo synthesis
- Tightly coupled: (simultaneous P&S aware transforms)


Loosely Coupled Placement & Synthesis:

Characteristics:
- Placement is treated as a black box
- Multiple placement runs are made

[Flow: Do Placement -> Analyze -> Meet Objectives? Yes: Done w/placement. No: re-synthesize, generate constraints, and do placement again.]


Interleaved Placement & Synthesis:

Characteristics:
- the placement flow is the same as in a placement only methodology
- in between each step of the placement progression, synthesis is invoked

[Figure: placement flow with Synthesis invoked between successive placement steps]


Tightly Coupled

- Placement and synthesis algorithms become co-dependent
- Placement algorithms have awareness of synthesis activity
- Synthesis algorithms have awareness of placement activity





Placement Driven Cloning

critical

non-critical

Cloning to off-load non-critical path from critical path


Placement Driven Expansion

[Figure: Expansion Transformation. An AO cell connected to several Logic blocks is expanded into primitives.]

Expansion allows primitives to be placed in a more timing friendly way


Example:

Tightly Coupled Placement Driven Expansion


Tightly Coupled Synthesis & Placement:

[Figure: Transform. A single grouping of signals a, b, c, d, e, f, g is regrouped as (a, d), (b, e, f), and (c, g).]


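Tightly Coupled Synthesis & Placement Example:

Suppose the primary IO constraints look like this:

[Figure: primary IOs a, b, c, d, e, f, g at fixed locations around the placement area]

The placement of the synthesized netlist would look something like this:

[Figure: placement of the synthesized netlist with the same IO locations]

If we could re-synthesize the netlist, we could get something that looks like this.

[Figure: placement of the re-synthesized netlist with the same IO locations]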


Tightly Coupled Synthesis & Placement:

[Figure: PD-MAP. Signals a, b, c and d, e, f, g before and after mapping, with net weights weight = 1/10 and weight = 1.]


Tightly Coupled Synthesis & Placement example:

Map_TREE
For each cut
    partition_it
    For each partition
        If (partition number > M)
        {
            if (related_node_count < N)
                merge_nodes
            if (related_node_count == 1)
                merge_node into neighbor
            partition
        }
    end
end



[Figure sequence: step-by-step animation of the example with IOs a, b, c, d, e, f, g, ending with the Result]


Placement Driven Timing Correction


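Redesign Fan-in Tree

[Figure: the fan-in tree example with inputs a, b, c, d and output e, built from gates with delay 1. Arr(a)=4, Arr(b)=3, Arr(c)=1, Arr(d)=0. Before: Arr(e)=6. After: Arr(e)=5. The placement driven version also shows the location of e, labeled Arr(e)=0.]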


Placement Driven Repowering

- Repowering is traditionally done using load based cell characterization
- Placement changes continuously during partitioning
- Need high efficiency algorithms to do repowering in this environment
- Solution: Use Gain Based Formulation


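Delay Models

[Figure: gate with input capacitance Cin driving output load Cout. The slide also shows an inverter-referenced form of the delay equation involving a factor (1 + β).]

Load based formulation:

$$d = k_1 \cdot C_{out} + p$$

Gain based formulation:

$$d = l \cdot g + p, \qquad g = \frac{C_{out}}{C_{in}}$$

- d: delay
- l: logical effort
- g: gain
- p: intrinsic delay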


Area vs Delay Centric

Load Based Paradigm (load-based delay eq., sized)
- Know:
    - Size of each cell
    - Total Area -> area centric
- Don’t know:
    - Wire loads
    - Delay of each cell
    - Delay of a path
- Estimation error is in the delay:
    - Local ‘path based’ property.

Gain Based Paradigm (gain based delay eq., sizeless)
- Know:
    - The delay of each cell.
    - The delay of a path -> delay centric
- Don’t know:
    - Wire loads
    - The area of each cell
    - The total area
- Estimation error is in the area
    - Global property.


Design Flow

[Flow: High Level Synthesis -> Restructuring -> Tech Mapping -> Gain Based Opt -> Discretization -> Late Timing Corr. Library Analysis feeds the gain based steps. Delay models shown: Gain Based Delay and Load Based Delay (DCL).]


Power Levels (Gate Sizes)

[Figure: delay d (0 to 0.05) versus Cout (0 to 1) for power levels A, B, C of a gate driving load Cout]

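Library (Gain) Analysis

[Figure: delay d (0 to 0.05) versus gain g (0.5 to 10.5) for power levels A, B, C of a gate with input capacitance Cin driving load Cout]

$$g = \frac{C_{out}}{C_{in}}, \qquad d = l \cdot g + p$$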

Area and Load Calculation

- Start at primary outputs / register inputs.
- Much like static timing analysis.
- Incremental.

[Figure: chain of three gates with gains g1, g2, g3 driving Cout = 0.71]

$$C_3 = \frac{C_{out}}{g_3}, \qquad C_2 = \frac{C_3}{g_2}, \qquad C_1 = \frac{C_2}{g_1}, \qquad C_{out} = 0.71$$


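Gain Calculation

[Figure: chain of four gates with delays d1, d2, d3, d4 and total delay D, input capacitance Cin, and output load Cout]

$$D = \sum_{i=1}^{N} d_i = d_1 + d_2 + d_3 + d_4$$

$$G = \prod_{i=1}^{N} g_i = g_1 \cdot g_2 \cdot g_3 \cdot g_4 = \frac{C_2}{C_{in}} \cdot \frac{C_3}{C_2} \cdot \frac{C_4}{C_3} \cdot \frac{C_{out}}{C_4} = \frac{C_{out}}{C_{in}}$$

Minimize D such that:

$$G = \frac{C_{out}}{C_{in}}$$

With $d = l \cdot g + p = f + p$, this is the same as:

Minimize

$$D = \sum_{i=1}^{N} f_i + \sum_{i=1}^{N} p_i$$

Such that:

$$F = \prod_{i=1}^{N} f_i = \prod_{i=1}^{N} l_i \cdot \prod_{i=1}^{N} g_i = L \cdot \frac{C_{out}}{C_{in}}$$

Solution: $f_i = f$ (the geometric mean, $f = F^{1/N}$)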


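Example I

[Figure: NAND2 -> NOR2 -> INV chain with stage delays d1, d2, d3 and internal node capacitances C2 and C3; Cin = 0.19, Cout = 0.71]

$$\text{NAND2:} \quad d_1 = 0.0308 + 0.011 \times \frac{C_2}{C_{in}}$$

$$\text{NOR2:} \quad d_2 = 0.0496 + 0.021 \times \frac{C_3}{C_2}$$

$$\text{INV:} \quad d_3 = 0.0295 + 0.009 \times \frac{C_{out}}{C_3}$$

$$F = f_1 \cdot f_2 \cdot f_3 = f^3 = L \cdot \frac{C_{out}}{C_{in}} = 0.000008$$

$$f = 0.0203$$

        Nand2    Nor2     Inv      Path
p       0.0308   0.0496   0.0295   0.1099
f       0.0203   0.0203   0.0203   0.0609
d       0.0511   0.0699   0.0498   0.1708
Cin     0.19     0.3364   0.3283   .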

Constant Delay Calculation

Inverter: $d = l \cdot g + p$
- Set: $g = 3.6$
- Calculate: $C_{out} = g \cdot C_{in}$
- Measure: $d_c$
- Set: $C_{out} = 0$
- Measure: $p = d_o$
- Calculate: $l \cdot g = d_c - p = f_c$

Other gates: $l \cdot g = f_c$
- $d_c = l_{nand} \cdot g_{nand} + p_{nand}$
- $g_{nand} = 2.5$, $g_{nor} = 1.8$


Discretization
- From gain-based model back to appropriate power levels
- There is an error in timing/load when ‘ideal’ power levels are not available.
    - Goal: Minimize this error.
    - Can be tuned to delay error or capacitance error.

[Figure: chain of three gates with gains g1, g2, g3 driving Cout = 0.71]

$$C_3 = \frac{C_{out}}{g_3}, \qquad C_2 = \frac{C_3}{g_2}, \qquad C_1 = \frac{C_2}{g_1}, \qquad d = l \cdot g + p$$

[Kudva98][Beeftink98]


Gain Based: Observations:
- Gain Based algorithms: A major improvement.
    - More homogeneous (global) algorithms and designs.
    - Can be better targeted for area and/or delay.
- Reveal inherent cell characteristics to optimization tools, leading to improved QOR
- Good library design is required to facilitate discretization step
- Ideally suited for operation within Physical Synthesis


Placement Driven Buffering

Rip Out all Buffers

Insert Buffers based on placement info


What to do About Long Wires?

- Add buffers
- Tune wire sizes
- Modify the placement to reduce them


Placement Driven vs Logic Driven Buffer Insertion
- Logic driven buffer insertion focuses on logic topology and buffer sizing while assuming a statistical wire load model
- Placement driven buffering uses an existing placement as the fundamental constraint


Placement Driven Buffer Insertion: Buffopt (IBM)

- Multiple buffer types
- Inverters
- Capacitance, Slew and Noise constraints
- Wire Sizing
- Simultaneous driver sizing
- High order interconnect delay and Ceffective
- Blockage handling


How Do Buffers Help?

- Reduce delay
    - Wire delay quadratic in length
    - Buffers make delay essentially linear
    - Delay gate dominated, not wire dominated
- Fix other problems
    - Bad slews at sinks
    - Capacitance range violations
    - Noise induced by capacitance coupling


How Does Wire Sizing Help?

- Highly resistive lines increase delay
- Wider wires or thick metal layers reduce resistance, but can increase capacitance
- For long interconnect, resistance reduction outweighs capacitance increase


Simple Buffer Insertion Problem

Given: Source and sink locations, sink capacitances and RATs, a buffer type, source delay rules, unit wire resistance and capacitance

[Figure: source s0, four sinks with required arrival times RAT1 to RAT4, and the available Buffer]


Simple Buffer Insertion Problem

Find: Buffer locations and a routing tree such that slack at the source is maximized

[Figure: source s0 routed to the four sinks RAT1 to RAT4 with buffers inserted]

$$q(s_0) = \min_{1 \le i \le 4} \{ RAT(s_i) - \text{delay}(s_0, s_i) \}$$


Fundamental Buffer Insertion

- Van Ginneken’s dynamic programming algorithm
- Building block: candidate (Cap, slack)
    - Candidates for each node stored as a list
    - Each sink has one candidate
    - Propagate candidates up the tree
- Guarantees optimal solution
- Quadratic complexity


Assumptions for the Basic Van Ginneken algorithm:
- Given a routing tree
- Given a set of potential insertion points
- Single buffer size
- No sink or driver sizing
- Linear gate delay model
    - Rd Cdown + Kd
- Elmore wire delay model
    - Rw (Cw/2 + Cdown)
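The two delay models are simple enough to write down directly, and doing so shows why buffers help on long wires. The sketch below is illustrative only: the function names, the unit resistance and capacitance values, and the buffer parameters are assumptions, and the delay of the source driver is ignored.

```python
def gate_delay(r_drive, k_intrinsic, c_down):
    """Linear gate delay model: Rd * Cdown + Kd."""
    return r_drive * c_down + k_intrinsic

def wire_delay(r_wire, c_wire, c_down):
    """Elmore wire delay model: Rw * (Cw / 2 + Cdown)."""
    return r_wire * (c_wire / 2 + c_down)

def unbuffered(length, r_unit, c_unit, c_sink):
    return wire_delay(r_unit * length, c_unit * length, c_sink)

def buffered(length, pieces, r_unit, c_unit, c_sink, buf):
    """Wire cut into equal pieces with a buffer between consecutive pieces.

    buf = (input capacitance, drive resistance, intrinsic delay)
    """
    c_buf, r_buf, k_buf = buf
    seg = length / pieces
    total = 0.0
    for i in range(pieces):
        load = c_sink if i == pieces - 1 else c_buf
        if i > 0:   # this piece is driven by an inserted buffer
            total += gate_delay(r_buf, k_buf, c_unit * seg + load)
        total += wire_delay(r_unit * seg, c_unit * seg, load)
    return total

# With no sink load, doubling the wire length quadruples the wire delay.
print(unbuffered(1, 1.0, 1.0, 0.0), unbuffered(2, 1.0, 1.0, 0.0))

long_plain = unbuffered(10, 1.0, 1.0, 0.1)
long_buffered = buffered(10, 5, 1.0, 1.0, 0.1, (0.1, 1.0, 0.5))
print(long_plain, long_buffered)
```

Each buffered piece sees only its own short wire plus one buffer input, so the total grows roughly linearly with the number of pieces rather than quadratically with the length.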


Van Ginneken Extensions

- Multiple buffer types
- Inverters
- Capacitance, Slew and Noise constraints
- Wire Sizing
- Simultaneous driver sizing
- High order interconnect delay and Ceffective
- Blockage recognition


Example

- Connect the end points of the net using a steiner route
- Add Candidate Nodes
- Final buffer solution is optimal for this route, and this set of candidate nodes.
- Other routes may produce better final solutions.
- Net routing topology is an input to Van Ginneken’s algorithm


Example


How Many Candidates?

- Number of candidates seems to double with each additional node
- Prune a candidate that has worse slack when its capacitance is greater than or equal to that of another candidate
- Linear number of candidates


Pseudo Code:

List = NULL;
For each node (bottom up traversal of graph)
{
    augment each item in list with wire segment up to node
    duplicate the list
    for each element of the duplicate list
        add a buffer at node
    analyze each element in list
    analyze each element in buffered (duplicate) list
    pick best element of buffered list and delete the rest
    new list is union of list and “best” element of buffered list
}
Pick best solution;

[Figure: net with six numbered candidate nodes]
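The pseudo code can be turned into a small runnable sketch for the simplest case, a single source-to-sink line with one buffer type. This is an illustration under stated assumptions, not the Buffopt implementation: candidates are (capacitance, required time) pairs, the delay models are the linear gate model and the Elmore wire model listed in the assumptions, and the function names and numeric values are hypothetical.

```python
def prune(cands):
    """Drop any candidate that has no less capacitance and no better slack
    than another candidate."""
    kept, best_q = [], float("-inf")
    for cap, q in sorted(cands, key=lambda cq: (cq[0], -cq[1])):
        if q > best_q:
            kept.append((cap, q))
            best_q = q
    return kept

def van_ginneken_line(segments, sink_cap, sink_rat, buf, drv):
    """segments: (resistance, capacitance) per wire piece, sink to source.
    buf: (input cap, drive resistance, intrinsic delay); drv: (Rd, Kd).
    Returns the best required time achievable at the source."""
    c_buf, r_buf, k_buf = buf
    cands = [(sink_cap, sink_rat)]          # each sink has one candidate
    for r, c in segments:                   # bottom up traversal
        # augment each candidate with the wire segment up to this node
        cands = [(cap + c, q - r * (c / 2 + cap)) for cap, q in cands]
        # duplicate the list and add a buffer at this node
        with_buffer = [(c_buf, q - (r_buf * cap + k_buf)) for cap, q in cands]
        # all buffered candidates present the same load: keep only the best
        best_buffered = max(with_buffer, key=lambda cq: cq[1])
        cands = prune(cands + [best_buffered])
    r_drv, k_drv = drv
    return max(q - (r_drv * cap + k_drv) for cap, q in cands)

segments = [(1.0, 1.0)] * 4
q_source = van_ginneken_line(segments, sink_cap=1.0, sink_rat=100.0,
                             buf=(0.5, 0.5, 1.0), drv=(1.0, 0.0))
print(q_source)
```

Because every buffered candidate at a node has the same input capacitance, only the one with the best slack can ever be useful, which is why the list grows by at most one element per node.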


Example

[Figure: net with six numbered candidate nodes]

- Node 1 processing: 2 evaluations, at most 2 candidates kept
- Node 2 processing: 4 evaluations, at most 3 candidates kept
- Node 3 processing: 6 evaluations, at most 4 candidates kept
- Node 4 processing: 8 evaluations, at most 5 candidates kept
- Node 5 processing: 10 evaluations, at most 6 candidates kept
- Node 6 processing: 12 evaluations, at most 7 candidates kept
    - Now pick the best one: Optimal solution

$$\text{Num\_evaluations} = \sum_{i=1}^{N} 2i = N(N+1)$$

$$\text{Num\_candidates} \le N + 1$$


Merging Branches

Critical

Merge is additive


Van Ginneken Algorithm Summary
- Good
    - Clever pruning controls # of candidates
    - Finds an optimal solution in quadratic time
    - Easily extended to cover a variety of important considerations (like multiple buffer types, wire sizing, polarity, slew, & capacitance constraints, etc.)
- Bad
    - Results depend on quality of route provided


Example Route:

Critical: cannot offload due to route

Different route leads to better solution


Physical Synthesis Flow

- Synthesized netlist: wire-load models, unplaced, physically "unaware" timing
- Cleanup: remove buffers, nominal power levels on gates
- Initial "basic" placement: for minimal wire-length, min-cut, Steiner tree estimates, physically aware timing
- Logical + placement optimizations
  - Timing-driven placement w/resynthesis: for minimal net weights, based on the timing of the net
  - Physically aware logic optimizations
- Timing improvement?
  - Yes: repeat the optimizations
  - No more: placed netlist


Example Route:

If still critical, add net weight


Example Route:


Multiple Buffer Types

- Instead of one buffer type, can choose from m power levels

- Generate m candidates instead of one
- Still optimal
- Complexity increases quadratically in m


Inverters

- Store candidates in "+" and "-" lists
  - + implies polarity preserved
  - - implies polarity reversed

- Adding inverter
  - Switches candidate in + list to - list
  - Switches candidate in - list to + list

- Final result only chosen from + list
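The "+" and "-" list bookkeeping can be sketched as below. This is an assumption-laden illustration: candidates are plain `(capacitance, slack)` tuples, the inverter uses a linear delay model, and the helper names `invert` and `insert_inverter` are hypothetical.

```python
def invert(cand, inv):
    """Candidate seen upstream of an inverter placed in front of cand.

    inv: (output resistance, input capacitance, intrinsic delay).
    """
    cap, q = cand
    r_inv, c_inv, d_inv = inv
    return (c_inv, q - d_inv - r_inv * cap)


def insert_inverter(plus, minus, inv):
    """Offer one inverter at the current node.

    The best inverted candidate from the + list joins the - list, and
    the best inverted candidate from the - list joins the + list.
    """
    best_from_plus = max((invert(c, inv) for c in plus),
                         key=lambda c: c[1], default=None)
    best_from_minus = max((invert(c, inv) for c in minus),
                          key=lambda c: c[1], default=None)
    new_plus = plus + ([best_from_minus] if best_from_minus is not None else [])
    new_minus = minus + ([best_from_plus] if best_from_plus is not None else [])
    return new_plus, new_minus
```

At the sink the + list holds the initial candidate and the - list is empty. After the traversal only the + list is searched, so any chosen solution contains an even number of inverters on every path.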


Capacitance Constraints

- Each gate g can drive at most C(g) capacitance
- When inserting buffer g, check downstream capacitance.
- If it is bigger than C(g), throw out candidate
- Increases efficiency


Slew Constraints

- Similar to capacitance constraints
- When inserting buffer, compute slews to gates driven by buffer
- If any slew exceeds its target, throw out candidate
- Potential difficulty: computing slew accurately in bottom-up fashion


Noise Constraints

- Each gate has an acceptable noise threshold
- Compute cumulative noise for each wire via the Devgan noise metric
- Throw out candidates that violate the noise threshold

Can avoid noise while optimizing timing!


Wire Sizing:

For each node (bottom up traversal of graph)
{
    for each Wire Size
    {
        augment each item in list with Sized wire segment
        duplicate the list
        for each element of the duplicate list
            add a buffer at node
        analyze each element in list
        analyze each element in buffered (duplicate) list
        pick best element of buffered list and delete the rest
        new list is union of list and "best" element of buffered list
    }
}
Do Final pruning & Pick best solution;


Blockage Recognition

Delete insertion points that run over blockages


Route Around Blockage


Buffer Bays


Routing Into Buffer Bays


"Buffer Site"

- Similar to buffer bays, only exact buffer locations are pre-specified, not just areas
- Useful as a mechanism for IP blocks and microprocessor design
- Dummy cell that holds a buffer
- Not connected to any net
- Becomes buffer when assigned to a net
- Extra sites → decoupling caps
- Sprinkle sites throughout design
- Allocate percentage within macros


Routing Into Buffer Sites


Generate Steiner Tree


Reduce Congestion and Coupling


Reduce Congestion and Coupling


Assign Buffers


Comments about Buffering and Wire Sizing:

- Extremely critical: One of the highest leverage timing closure items

- There are extended provably correct algorithms for dealing with the problem.

- Steiner route & Blockage avoidance are mostly heuristic: Hot research area!





Congestion Mitigation


Sources of Congestion

- Placement Quality: Do we have a good relative ordering of cells?

- Placement Density: Do we have appropriate cell spreading?

- Preplacement of large cells: Is there a better location for these cells?

- Floorplan quality: Is this a good floorplan / hierarchy?

- Netlist complexity: Are some logic groupings inherently difficult to route?

- Library characteristics: Do some cells block too much metal internally?


Congestion Mitigation

- Constructive Avoidance
  - control global placement pin density: fewer pins per unit area means fewer wires per unit area
  - monitor congestion during placement and perform dynamic spreading

- Post placement fix up
  - remove problems from an already placed netlist


Groute / Spread / Redo

Constructive Avoidance:

Characteristics:

- as placement is formed, take action to avoid problems

- between each step of the placement progression there is the potential to evaluate congestion and take action



… etc


Constructive Avoidance Deficiencies:

- Depends on early estimates of congestion that may not be accurate enough to avoid all problems

- Post placement actions such as clock tree insertion, repowering, buffering, etc. may add congestion to the design

- Guard banding with conservative "constructive avoidance" causes loss of performance and density


Post Placement Congestion Mitigation

- Use production global router, not internal placement based global router
- Translate congestion values into density targets for placement regions
- Perform flow based circuit spreading
- Preserve relative logic ordering of cells


Network Flow based Spreading

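Supply nodes i: b(i) > 0. Demand nodes j: b(j) < 0. Source s, sink t.

Min-cost max-flow formulation:

$$\forall i \text{ with } b(i) > 0:\quad \mathrm{Cap}(e_{si}) = b(i),\quad \mathrm{Cost}(e_{si}) = 0$$

$$\forall i \ne s,\ j \ne t:\quad \mathrm{Cap}(e_{ij}) = \infty \text{ (large integer)},\quad \mathrm{Cost}(e_{ij}) = K$$

$$\forall j \text{ with } b(j) < 0:\quad \mathrm{Cap}(e_{jt}) = -b(j),\quad \mathrm{Cost}(e_{jt}) = 0$$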
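The formulation can be exercised with a small solver. The sketch below is only an illustration under assumptions: bins are arranged in a single row (a real placer would use a 2-D bin grid), the solver is a textbook successive-shortest-path min-cost flow, and the names `MinCostFlow` and `spread` are hypothetical.

```python
from collections import deque


class MinCostFlow:
    def __init__(self, n):
        self.n = n
        # Each edge is [to, residual capacity, cost, index of reverse edge].
        self.graph = [[] for _ in range(n)]

    def add_edge(self, u, v, cap, cost):
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])

    def solve(self, s, t):
        flow = cost = 0
        while True:
            # Shortest path by cost in the residual graph (queue-based
            # Bellman-Ford, which tolerates negative residual costs).
            dist = [float("inf")] * self.n
            prev = [None] * self.n
            in_q = [False] * self.n
            dist[s] = 0
            q = deque([s])
            while q:
                u = q.popleft()
                in_q[u] = False
                for i, (v, cap, c, _) in enumerate(self.graph[u]):
                    if cap > 0 and dist[u] + c < dist[v]:
                        dist[v] = dist[u] + c
                        prev[v] = (u, i)
                        if not in_q[v]:
                            in_q[v] = True
                            q.append(v)
            if prev[t] is None:
                return flow, cost
            # Find the bottleneck capacity along the path, then push it.
            push, v = float("inf"), t
            while v != s:
                u, i = prev[v]
                push = min(push, self.graph[u][i][1])
                v = u
            v = t
            while v != s:
                u, i = prev[v]
                edge = self.graph[u][i]
                edge[1] -= push
                self.graph[v][edge[3]][1] += push
                v = u
            flow += push
            cost += push * dist[t]


def spread(b, k=1, big=10**9):
    """b[i] > 0: bin i has excess cells; b[i] < 0: bin i has free space."""
    n = len(b)
    s, t = n, n + 1
    mcf = MinCostFlow(n + 2)
    for i, bi in enumerate(b):
        if bi > 0:
            mcf.add_edge(s, i, bi, 0)      # Cap(e_si) = b(i), Cost = 0
        elif bi < 0:
            mcf.add_edge(i, t, -bi, 0)     # Cap(e_jt) = -b(j), Cost = 0
    for i in range(n - 1):                 # neighbor edges: Cap = big, Cost = K
        mcf.add_edge(i, i + 1, big, k)
        mcf.add_edge(i + 1, i, big, k)
    return mcf.solve(s, t)
```

For `b = [3, 0, -1, -2]`, one unit moves two bins and two units move three bins, so `spread` returns a flow of 3 at cost 8 with K = 1.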


Congestion Driven Circuit Spreading

- Initial Placement
- Calculate bin level congestion
- Is congestion below threshold?
  - Yes: Final Placement
  - No: Translate bin score to bin target density, perform network flow based circuit spreading, then recalculate congestion






We’ve Talked About

- Placement algorithms
- Placement / Synthesis interaction
- Placement aware synthesis techniques
- The Constant Delay paradigm
- Physical Buffer insertion / Wire sizing
- Congestion Mitigation


Let’s Look at some Examples:


Pure MLP Quadratic


shatter, clone, fanin, buffer

Optimization Results









Routing Based Optimization: RBO (IBM)


Routing based Timing Closure Issues

- Post Routing timing problems can be significant
  - affect design schedule
  - may be too numerous to fix manually

- Increasing design density can reduce cost, but it also increases wiring congestion
  - timing and signal integrity become more significant
  - available resource for manual fixup is limited
  - without automation may not be doable

- Rerouting with constraints may resolve some of the problems, but this process is slow


Solution:

- Integrate global routing, detailed routing and timing correction

- Global routing is efficient enough to be run in an iterative timing closure loop

- Timing critical nets avoid scenic routes
- Non-critical nets that go scenic can be repowered and buffered prior to detailed routing


Example Problem:

- Figure labels: critical paths, non-critical paths
- Ideal "Steiner" routes: PDS Timing uses Steiner wires - fast
- Timing deficient wiring solution: Post PD Timing catches this problem: Slow!
- Timing driven wiring solution: RBO Timing Driven Routing sees this during global route stage: Fast!
- Force optimal use of wiring resource (e.g. critical paths get direct route)


Global Routing

Divides the entire chip into localized rectangular regions called tiles.

Compresses several pin locations in each tile to a single pin location.

All the shapes, wires and opens are represented in terms of global track capacity and usage.


Global Routing

- Two step approach
  - Create the initial Steiner routes
  - Compute the edge congestion on the grid
  - Perform a rip-up and reroute using a shortest path algorithm to reduce the overall congestion of the design

- Advantages
  - Can communicate with detail router
  - Good correlation with final detail routing solution
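The tile abstraction and the edge congestion computation can be sketched as follows. This is a simplified illustration, not the production router: it assumes two-pin nets only, uses an L-shaped route (horizontal first) as a stand-in for the initial Steiner route, applies one uniform track capacity to every tile edge, and omits the rip-up and reroute step. The names `tile_of`, `l_route`, and `edge_congestion` are hypothetical.

```python
from collections import Counter


def tile_of(x, y, tile_size):
    """Compress a pin location to the tile that contains it."""
    return (int(x // tile_size), int(y // tile_size))


def l_route(a, b):
    """Tile-to-tile edges of an L-shaped route from tile a to tile b."""
    (x0, y0), (x1, y1) = a, b
    edges = []
    step = 1 if x1 >= x0 else -1
    for x in range(x0, x1, step):          # horizontal leg along row y0
        edges.append(frozenset({(x, y0), (x + step, y0)}))
    step = 1 if y1 >= y0 else -1
    for y in range(y0, y1, step):          # vertical leg along column x1
        edges.append(frozenset({(x1, y), (x1, y + step)}))
    return edges


def edge_congestion(nets, tile_size, capacity):
    """nets: list of ((x, y), (x, y)) pin pairs.

    Returns usage / capacity for every tile edge used by some route.
    """
    usage = Counter()
    for p, q in nets:
        usage.update(l_route(tile_of(*p, tile_size), tile_of(*q, tile_size)))
    return {edge: used / capacity for edge, used in usage.items()}
```

Edges whose ratio reaches or exceeds 1.0 would be the natural targets for the rip-up and reroute step.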


Current Methodology

Physical Synthesis -> Global Routing -> Detailed Routing -> Timing Analysis

- No Timing Criticality for Global router
- Costly Manual Timing Correction

RBO Methodology

Physical Synthesis -> RBO / Physical Synthesis -> Detailed Routing -> Analysis

Routing Based Optimization: XrGlobal, Extractor, Optimizer, Einstimer


RBO Extraction Process

- Very fast
  - Excellent correlation with final 3D extraction

- Uses global routes for extraction
- Neighbor information probabilistically determined based on the global routing congestion information
- Based on extraction tables

Capacity of All Edges = 5

Probability of having a neighbor = (#OccupiedTracks)/(#Capacity) = 2/4 = 0.5

1

3 3 2


RBO results on memcntl

- Design: Example 1
- Nets: ~1.6M
- Size: 23193 x 23193
- Congestion: Attached is a display of global congestion


Timing Critical Nets: Without RBO


Nets Routed with RBO flow


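RBO Results - Example 1

Columns: Worst Slack, #Slack Violations, #Cap Violations, #Slew Violations, #Opens, #Loops

(Values are listed in the order extracted; some rows have empty cells whose column positions are not recoverable.)

- Steiner Estimates: -0.47 17 1 18
- XrLocal without RBO: -1.57 4687 14 128 50 87020
- RBO Timing Closure (Global Routes): -0.48 209
- Detailed routing with RBO: -0.43 14 18 1 54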


Example 2: - Critical Net Routed Without RBO


Example 2: Critical Net Routed With RBO


Example 2 Results Summary

                            Worst Slack  #Slack Violations  #Cap Violations  #Slew Violations  #Opens  #Loops
Final routing without RBO   -0.54        1224               33               270               0       1152
Using RBO                   -0.29        909                32               274               0       1070


PDS - RBO Integration

[Figure: three flow diagrams labeled "Current Flow", "Projected Flow", and "Proposed Flow - Existing Tools", with stages marked "Not Noise Aware" or "Noise Aware". Stage labels (with tool names) appearing in the figure: Steiner Routes Timing Closure (PDS-Einstimer); Global Routes Timing Closure (RBO-Einstimer); Timing Closure (PDS-Einstimer-RBO, Steiner-Global); Detail Routing (Xrouter); Timing SignOff (ChipEdit-Einstimer); Noise SignOff; Probabilistic Detection of Noise Problems (PDS-Einstimer-RBO); Noise Avoidance (PDS-Einstimer-RBO); SignOff Noise Detection (ETCoupling-3DNoise); Noise Correction (Manual Correction).]


Noise Detection and Avoidance:

- RBO (Detection)
  - Length based
    - Initial selection includes length and slack threshold
    - Further pruning based on Worst Case Miller Timing
  - Switching Window based refinement
    - Pattern generation based on switching window overlaps

- RBO (Avoidance)
  - Long Net Spreading
  - Track Reordering
  - Incremental Placement Changes
  - Layer Assignment


Noise Detection and Avoidance:

- Wire width selection
- Physical Synthesis
  - Integration
  - Fix Cap and Slew Violations with Global Routes
  - Interface to RBO
  - Noise Alleviation Resizing
  - Noise Aware Buffering


Long Net Spreader


Wrap Up

- Timing closure today is highly dependent on integrated tools.
- Tightly integrated Placement, Timing & Synthesis tools are available today from multiple vendors.
- Placement techniques are dominated by quadratic techniques and partitioning.
- Next on the list for integration are Routing and Signal integrity tools (happening now).
- These tools have a high degree of complexity. It takes large, well-funded DA organizations to compete in this space.


Placement References

- C. J. Alpert, T. Chan, D. J.-H. Huang, I. Markov, and K. Yan, "Quadratic Placement Revisited", Proc. 34th IEEE/ACM Design Automation Conference, 1997, pp. 752-757

- C. J. Alpert, J.-H. Huang, and A. B. Kahng, "Multilevel Circuit Partitioning", Proc. 34th IEEE/ACM Design Automation Conference, 1997, pp. 530-533

- U. Brenner and A. Rohe, "An Effective Congestion Driven Placement Framework", International Symposium on Physical Design, 2002, pp. 6-11

- A. E. Caldwell, A. B. Kahng, and I. L. Markov, "Can Recursive Bisection Alone Produce Routable Placements", Proc. 37th IEEE/ACM Design Automation Conference, 2000, pp. 477-482

- M. A. Breuer, "Min-Cut Placement", J. Design Automation and Fault Tolerant Computing, 1(4), 1977, pp. 343-362

- J. Vygen, "Algorithms for Large-Scale Flat Placement", Proc. 34th IEEE/ACM Design Automation Conference, 1997, pp. 746-751

- H. Eisenmann and F. M. Johannes, "Generic Global Placement and Floorplanning", Proc. 35th IEEE/ACM Design Automation Conference, 1998, pp. 269-274

- S.-L. Ou and M. Pedram, "Timing Driven Placement Based on Partitioning with Dynamic Cut-Net Control", Proc. 37th IEEE/ACM Design Automation Conference, 2000, pp. 472-476

- C. M. Fiduccia and R. M. Mattheyses, "A linear time heuristic for improving network partitions", Proc. ACM/IEEE Design Automation Conference, 1982, pp. 175-181


Synthesis References

- C. L. Berman, J. L. Carter, and K. F. Day. The Fanout Problem: From Theory to Practice. In Advanced Research in VLSI: Proceedings of the 1989 Decennial Caltech Conference, pages 69-99, 1989

- C. L. Berman, D. J. Hathaway, A. S. LaPaugh, and L. H. Trevillyan. Efficient Techniques for Timing Corrections. In International Symposium on Circuits and Systems, pages 415-419, 1990

- F. Beeftink, P. N. Kudva, D. S. Kung, R. Puri, and L. Stok. Combinatorial Cell Design for CMOS Libraries. INTEGRATION, the VLSI Journal, 29:67-93, 2000

- W. Donath, P. Kudva, L. Stok, P. Villarrubia, L. Reddy, and A. Sullivan. Transformational placement and synthesis. In DATE, pages 194-201, 2000

- D. J. Hathaway, R. P. Abato, A. D. Drumm, and L. P. P. P. van Ginneken. Incremental timing analysis. Technical report, IBM Corp., 1996. U.S. patent 5,508,937.

- D. Kung, P. Kudva, and A. Sullivan. A Gate Sizing Algorithm using Geometric Programming. In Proc. of the International Workshop on Logic Synthesis, 1997

- T. Kutzschebauch and L. Stok. Regularity driven logic synthesis. In Proc. of the Int. Conf. on Computer Aided Design, Nov 2000.

- P. Rezvani, A. H. Ajami, M. Pedram, and H. Savoj. LEOPARD: A Logical Effort based fanout Optimizer for Area and Delay. In IEEE/ACM International Conference on CAD, pages 516-519, 1999.

- L. Stok, M. Iyer, and A. Sullivan. Wavefront technology mapping. In DATE, pages 531-536, 1999

- D. S. Kung. A Fast Fanout Optimization for Near-Continuous Buffer Libraries. In IEEE/ACM Design Automation Conference, pages 352-355, 1998


DP Buffer Insertion References

- Buffer placement in distributed RC-tree networks for minimal Elmore delay. van Ginneken, L.P.P.P. Circuits and Systems, 1990, IEEE International Symposium on, 1990, Page(s): 865-868 vol. 2

- Optimal wire sizing and buffer insertion for low power and a generalized delay model. Lillis, J.; Chung-Kuan Cheng; Lin, T.-T.Y. Solid-State Circuits, IEEE Journal of, Volume: 31 Issue: 3, March 1996, Page(s): 437-447

- Buffer insertion for noise and delay optimization. Alpert, C.J.; Devgan, A.; Quay, S.T. Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, Volume: 18 Issue: 11, Nov. 1999, Page(s): 1633-1645

- Buffer insertion with accurate gate and interconnect delay computation. Alpert, C.J.; Devgan, A.; Quay, S.T. Design Automation Conference, 1999. Proceedings. 36th, 1999, Page(s): 479-484

- Wire Segmenting For Improved Buffer Insertion. Alpert, C.; Devgan, A. Design Automation Conference, 1997. Proceedings of the 34th, Page(s): 588-593

- Simultaneous routing and buffer insertion for high performance interconnect. Lillis, J.; Chung-Kuan Cheng; Ting-Ting Y. Lin. VLSI, 1996. Proceedings., Sixth Great Lakes Symposium on, 1996, Page(s): 148-153


Blockage Avoidance References

- Steiner tree optimization for buffers, blockages, and bays. Alpert, C.J.; Gandham, G.; Jiang Hu; Neves, J.I.; Quay, S.T.; Sapatnekar, S.S. Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, Volume: 20 Issue: 4, April 2001, Page(s): 556-562

- A fast algorithm for context-aware buffer insertion. Jagannathan, A.; Sung-Woo Hur; Lillis, J. Design Automation Conference, 2000. Proceedings 2000, Page(s): 368-373

- Simultaneous routing and buffer insertion with restrictions on buffer locations. Hai Zhou; Wong, D.F.; I-Min Liu; Aziz, A. Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, Volume: 19 Issue: 7, July 2000, Page(s): 819-824

- Maze routing with buffer insertion and wire sizing. Minghorng Lai; Wong, D.F. Design Automation Conference, 2000. Proceedings 2000, Page(s): 374-378

- Routing tree construction under fixed buffer locations. Cong, J.; Xin Yuan. Design Automation Conference, 2000. Proceedings 2000, Page(s): 379-384


Interconnect Planning References

- A practical methodology for early buffer and wire resource allocation. Alpert, C.J.; Jiang Hu; Sapatnekar, S.S.; Villarrubia, P.G. Design Automation Conference, 2001. Proceedings, 2001, Page(s): 189-194

- An interconnect-centric design flow for nanometer technologies. Cong, J. Proceedings of the IEEE, Volume: 89 Issue: 4, April 2001, Page(s): 505-528

- Buffer block planning for interconnect-driven floorplanning. Cong, J.; Tianming Kong; Pan, D.Z. Computer-Aided Design, 1999. Digest of Technical Papers. 1999 IEEE/ACM International Conference on, 1999, Page(s): 358-363

- Provably good global buffering using an available buffer block plan. Dragan, F.F.; Kahng, A.B.; Mandoiu, I.; Muddu, S.; Zelikovsky, A. Computer Aided Design, 2000. ICCAD-2000. IEEE/ACM International Conference on, 2000, Page(s): 104-109

- Provably good global buffering by multiterminal multicommodity flow approximation. Dragan, F.F.; Kahng, A.B.; Mandoiu, I.; Muddu, S.; Zelikovsky, A. Design Automation Conference, 2001. Proceedings of the ASP-DAC 2001. Asia and South Pacific, 2001, Page(s): 120-125

- Planning buffer locations by network flows. Tang, X.; Wong, D.F. International Symposium on Physical Design, April 2001, Page(s): 180-185

- Routability-Driven Repeater Block Planning for Interconnect-Centric Floorplanning. Sarkar, P.; Sundararaman, V.; Koh, C.-K. International Symposium on Physical Design, April 2001, Page(s): 186-191
