Simultaneous Device and Interconnect Optimization

Post on 29-Oct-2021


ECE902 VLSI Interconnects

Fall 1999, Prof. Lei He 1

Simultaneous Device and Interconnect Optimization

■ Simultaneous device and wire sizing

■ Simultaneous buffer insertion and wire sizing

■ Simultaneous topology construction, buffer insertion and wire sizing
  • WBA tree (student presentation)
  • P-tree

Simultaneous device and wiresizing

■ Dominance-property based approach to minimize weighted sum of delay
  • Simultaneous driver/buffer and wiresizing [Cong-Koh, TVLSI’94] [Cong-Koh-Leung, ISLPED’96]
  • Simultaneous transistor and interconnect sizing [Cong-He, PDW’96, ICCAD’96]

■ Lagrangian relaxation based approach to minimize maximum delay
  • Simultaneous buffer and wire sizing [Chen-Chang-Wong, DAC’96]

■ Mathematical programming based approach to minimize area while meeting performance requirement
  • Simultaneous gate and wiresizing [Menezes-Baldick-Pileggi, ICCAD’95]


RC Delay Model for Drivers

■ Rmin = resistance of min-size driver
■ di = size of i-th driver
■ Cg = gate capacitance of min-size driver
■ Cd = diffusion capacitance of min-size driver

[Figure: cascaded driver chain d1, d2, ..., di, ..., dk; tD(T,D) spans the 1st to the 2nd-last driver]

[Figure: switch-level RC model for minimum-size driver, with Rp, Rn, Cg and Cd]

Delay of i-th driver: $\frac{R_{min}}{d_i}\,(d_i C_d + d_{i+1} C_g)$

Delay from 1st to 2nd last driver: $t_D(T,D) = \sum_{i=1}^{k-1} (\text{delay of } i\text{-th driver})$
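A minimal sketch of the driver delay model above, assuming driver sizes are given as multiples of the minimum-size driver. The function name `driver_chain_delay` and all numeric values are illustrative assumptions, not taken from the slides.

```python
def driver_chain_delay(sizes, r_min, c_g, c_d):
    """Delay from the 1st to the 2nd-last driver of a cascaded chain.

    sizes: driver sizes d_1..d_k as multiples of the minimum-size driver.
    Stage i contributes (r_min / d_i) * (d_i * c_d + d_{i+1} * c_g).
    """
    return sum(
        (r_min / d_i) * (d_i * c_d + d_next * c_g)
        for d_i, d_next in zip(sizes, sizes[1:])
    )


# Illustrative values only (ohms and farads), not from the slides.
R_MIN, C_G, C_D = 10_000.0, 2e-15, 1e-15
chain = [1, 3, 9, 27]  # constant stage ratio of 3
print(driver_chain_delay(chain, R_MIN, C_G, C_D))
```

With a constant stage ratio every stage contributes the same delay, which is why geometric sizing is the usual starting point for a driver chain.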

Total Delay Measure t(k,D,W)

Total Delay Measure: $t(k,D,W) = t_D(k,D) + t_l(W)$

■ Interconnect delay from last driver to sinks

$t_l(W) = \sum_{\text{sink } N_i} \lambda_i \times t(N_i)$

where $\lambda_i$ is a user-specified, normalized, non-negative parameter to prioritize sink $N_i$.

Here $t_D(k,D)$, written tD(T,D) in the driver delay model, is the delay from the 1st to the 2nd-last driver, and $t_l(W)$ is the interconnect delay from the last driver to the sinks.


Power Dissipation Formulation

■ Short-circuit: $ScP(i) \propto d_i$

Short-circuit power = $\sum_{i=1}^{k} ScP(i)$

■ Capacitive: $CP(i) \propto (d_i C_d + d_{i+1} C_g)$ for $i < k$; $CP(k) \propto (d_k C_d + C_{IL})$, where $C_{IL}$ is the load due to the routing tree

Capacitive power = $\sum_{i=1}^{k} CP(i)$

■ Total Power = Capacitive + Short-Circuit

Main Theorem: Relation between Driver and Wire Sizing

■ Given (D,W) and (D’, W’) for k drivers

■ If W = opt-WS(D) and W’ = opt-WS(D’): D dominates D’ ⇒ W dominates W’

■ If D = opt-DS(W) and D’ = opt-DS(W’): W dominates W’ ⇒ D dominates D’
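The theorem is stated in terms of dominance between size assignments. A minimal sketch of that check follows, assuming the usual component-wise definition (X dominates Y when every size in X is at least the matching size in Y); the slides do not spell the definition out, so treat it as an assumption.

```python
def dominates(x, y):
    """Return True if assignment x dominates y component-wise (assumed definition)."""
    if len(x) != len(y):
        raise ValueError("assignments must have the same length")
    return all(a >= b for a, b in zip(x, y))


# Hypothetical driver size assignments for k = 3 drivers.
d_large = [1, 4, 16]
d_small = [1, 3, 9]
print(dominates(d_large, d_small))
print(dominates(d_small, d_large))
```

Under the theorem, if `d_large` dominates `d_small`, the optimal wire sizing for `d_large` dominates the optimal wire sizing for `d_small`.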


K-SDWS LU-Bound algorithm for Delay Optimization

■ Lower bound of SDWS optimal solution

W0 = Min. Width Assignment (dominated by opt. sol.)
D0 = Opt-DS(W0)
W1 = Opt-WS(D0)
D1 = Opt-DS(W1)

■ (Di, Wi) monotonically increases

■ (Di, Wi) dominated by optimal solution
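The alternation between Opt-DS and Opt-WS can be sketched as a fixed-point iteration. The code below assumes the two optimal sizing routines are supplied by the caller; `toy_opt_ds` and `toy_opt_ws` are invented monotone stand-ins for illustration only and do not model any real circuit.

```python
def lower_bound(opt_ds, opt_ws, w_min, max_iter=100):
    """Alternate Opt-DS and Opt-WS starting from the minimum-width assignment.

    opt_ds(W) -> D and opt_ws(D) -> W are caller-supplied sizing routines
    (hypothetical stand-ins for Opt-DS and Opt-WS).
    Returns the fixed point (D, W), or the last iterate after max_iter rounds.
    """
    w = tuple(w_min)
    d = tuple(opt_ds(w))
    for _ in range(max_iter):
        w_next = tuple(opt_ws(d))
        d_next = tuple(opt_ds(w_next))
        if (d_next, w_next) == (d, w):
            break  # no further growth: the lower bound has converged
        d, w = d_next, w_next
    return d, w


# Toy monotone stand-ins, invented for illustration only.
def toy_opt_ds(w):
    return (min(8, 1 + sum(w)),)


def toy_opt_ws(d):
    return (min(4, 1 + d[0] // 2),) * 2


print(lower_bound(toy_opt_ds, toy_opt_ws, (1, 1)))
```

Starting the same loop from the maximum-width assignment would be the natural way to obtain a matching upper bound, although the slide above only shows the lower-bound direction.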

K-SDWS Optimal algorithm for Delay Optimization

■ Linear search for the optimal stage number, k*

Optimal k-SDWS solution

SDWS Optimal Algorithm for Delay Optimization

■ Case 1: the bounds meet

■ Case 2: bounds do not meet
  • Discretize driver sizes of k-th driver between the bounds
  • For each discretized driver
    − compute optimal sizes for k-1 drivers and wires
  • Select best d-SDWS solution

Stage number range for the linear search: $1 \le k^* \le k^{D}_{MAX}$, where $k^{D}_{MAX} = \ln\big(C_{IL}(T, W_{MAX})/C_g\big) / \ln s^*$, $s^* = e^{(a+s^*)/s^*}$ and $a = C_d/C_g$
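The linear search over the stage number can be illustrated on a plain driver chain. This is a simplified sketch, not the SDWS algorithm: it assumes the interconnect is a fixed lumped load `c_load`, that the first driver has size 1, and that all drivers use a constant stage ratio. The stage ratio is obtained from the classical condition s(ln s − 1) = Cd/Cg. The function names and the search limit `k_max` are assumptions.

```python
import math


def optimal_stage_ratio(a, tol=1e-12):
    """Solve s * (ln s - 1) = a for s >= e by bisection, with a = C_d / C_g >= 0."""
    lo = hi = math.e
    while hi * (math.log(hi) - 1.0) < a:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid * (math.log(mid) - 1.0) < a:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def chain_delay(k, c_load, r_min, c_g, c_d):
    """Delay of k drivers with d_1 = 1 and constant ratio f = (c_load / c_g) ** (1 / k).

    Each stage then contributes r_min * (c_d + f * c_g), including the last one.
    """
    f = (c_load / c_g) ** (1.0 / k)
    return k * r_min * (c_d + f * c_g)


def best_stage_number(c_load, r_min, c_g, c_d, k_max):
    """Linear search over k = 1..k_max for the smallest chain delay."""
    return min(range(1, k_max + 1),
               key=lambda k: chain_delay(k, c_load, r_min, c_g, c_d))


# Illustrative values only: load equal to e**5 minimum gate capacitances.
C_G = 2e-15
print(optimal_stage_ratio(1.0))
print(best_stage_number(math.exp(5) * C_G, 10_000.0, C_G, 0.0, k_max=10))
```

With zero diffusion capacitance the stage ratio reduces to e, so a load of e^5 gate capacitances is best driven by five stages. A nonzero Cd pushes the ratio up and the stage count down.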


K-SDWS Optimal algorithm for Combined Delay and Power Optimization

■ Linear search for the optimal stage number, k*

■ Compute Optimal Driver Sizing Solution by MAPLE

[Equations not recoverable: conditions in A, B and the driver sizes for all i = 2 to k-1, plus a condition involving dk, CL and Cg]

■ Select Monotone Solution

■ $1 \le k^* \le k^{DP}_{MAX}$, where $k^{DP}_{MAX}$ is determined by the smallest stage number that has no monotone driver solution

Experiments to Evaluate SDWS Algorithm

■ Compared with other design methods:
  • CDSMIN (Constant Driver Sizing with ratio e, and MINimum wire width)
  • ODSMIN (Optimal Driver Sizing, MINimum wire width)
  • DWSA [Cong-Koh-Leung, LPDW’94] (independent constant Driver Sizing with ratio e, optimal wire width)

$\frac{d_{i+1}}{d_i} = \left(\frac{C_L}{C_g}\right)^{1/k}$


Experimental Results on Power-Delay Trade-off

Simultaneous Transistor and Interconnect Sizing [Cong-He, PDW & ICCAD’96]

Given: Initial layout design for multiple nets, table-based models for device delay and interconnect coupling capacitances

Determine: Discrete sizes for transistors/wires

Minimize: α Delay + β Power + γ Area


Objective for Delay Minimization

$t(X) = \sum_{i,j} F(i,j) \cdot \frac{R_0(i)}{x_i} \cdot C_0(j) \cdot x_j + \sum_{i,j} F(i,j) \cdot \frac{R_0(i)}{x_i} \cdot C_1(j) + \sum_{i} G(i) \cdot \frac{R_0(i)}{x_i} + \sum_{i} H(i) \cdot \frac{R_0(i)}{x_i} \cdot C_1(i)$

  • $R_0$: resistance for unit-width transistor/wire
  • $C_0$: area capacitance for unit-width transistor/wire
  • $C_1$: fringing capacitance for transistor/wire
  • $X = \{x_1, x_2, ..., x_n\}$: discrete widths for transistors/wires

■ To minimize t(X) is a simple CH-posynomial program

Dominance Property for Simple CH-posynomial Programs

■ To minimize

$f(X) = \sum_{p=0}^{m} \sum_{q=0}^{m} \sum_{i=1}^{n} \sum_{j=1, j \ne i}^{n} \frac{a_{pi}}{x_i^{p}} \cdot b_{qj}\, x_j^{q}$

is a simple CH-posynomial program, where $a_{pi}$ and $b_{qj}$ are positive constants.

■ Theorem [Cong-He, PDW’96]
  • The dominance property holds for simple CH-posynomial program w.r.t. the local refinement.
    − If X dominates optimal solution X* and X’ = local refinement of X, then X’ dominates X*
    − Symmetric for X dominated by X*
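Local refinement and the dominance property can be tried out on a small wire sizing instance. The sketch below assumes an Elmore delay model for one driver and a uniformly segmented wire. The constants, the width set and the function names are illustrative assumptions, and the brute-force search is only there to check the bounds on this tiny example.

```python
from itertools import product

WIDTHS = (1.0, 2.0, 3.0, 4.0)    # allowed discrete widths (assumed)
R_D, C_L = 500.0, 50.0           # driver resistance and load capacitance (assumed)
R0, C0, C1 = 100.0, 1.0, 0.5     # unit-width resistance, area cap, fringe cap (assumed)


def delay(x):
    """Elmore delay of a driver plus a wire whose segments have widths x (driver side first)."""
    total_cap = C_L + sum(C0 * w + C1 for w in x)
    t = R_D * total_cap
    downstream = total_cap
    for w in x:
        seg_cap = C0 * w + C1
        downstream -= seg_cap            # capacitance strictly after this segment
        t += (R0 / w) * (seg_cap / 2.0 + downstream)
    return t


def local_refinement(x, prefer_large):
    """Re-optimize one width at a time with the others fixed, until nothing changes."""
    x = list(x)
    order = sorted(WIDTHS, reverse=prefer_large)   # tie-break toward the starting side
    changed = True
    while changed:
        changed = False
        for i in range(len(x)):
            best = min(order, key=lambda w: delay(x[:i] + [w] + x[i + 1:]))
            if best != x[i]:
                x[i], changed = best, True
    return x


n = 4
lower = local_refinement([min(WIDTHS)] * n, prefer_large=False)
upper = local_refinement([max(WIDTHS)] * n, prefer_large=True)
optimum = min(product(WIDTHS, repeat=n), key=delay)   # brute force over 4**4 candidates
print(lower, list(optimum), upper)
```

Starting from the minimum widths gives an assignment dominated by the optimum, and starting from the maximum widths gives one that dominates it, so only the widths between the two need to be searched.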


Overview of STIS Algorithm

■ Support mixed transistor sizing formulations:
  • find an optimal size for each gate, each pull-up or pull-down block, or each transistor

■ Algorithm Flow
  1. Partition devices and interconnects into DC-connected components (DCCs)
  2. Compute TIGHT lower and upper bounds by iterative LR (local refinement) for devices and wires within each DCC
  3. Compute optimal solution within bounds by bottom-up dynamic programming [Lillis et al., ICCAD’95] within each DCC

Experimental Results

■ Clock nets of a 12.7 Mchip/s all-digital BPSK direct-sequence spread-spectrum IF transceiver chip in the UCLA1 radio for wireless multimedia information systems

■ Clock nets routed interactively with Flint, fabricated in 1.2um SCMOS technology

■ CLK net: 112 inverters and 255 sinks; DCLK net: 31 inverters and 123 sinks

■ Manually designed driver/buffer: cascade chain of 4 inverters

■ Ideal inter-clock skew = 0


Manual Design versus LR-Based Optimizations

■ Transistor sizing formulation can achieve higher delay and skew reduction at a similar power dissipation

■ Runtimes (wire segmenting: 10um)
  • LR-based: SBWS 1.18s, STIS 0.88s
  • Dynamic programming runs out of memory
  • Total HSPICE simulation ~2000s

                      manual    SBWS              STIS
max delay (ns)        4.6324    4.3447 (-6.2%)    3.9632 (-14.4%)
average power (mW)    60.85     46.09 (-24.3%)    46.29 (-24.2%)
clock skew            470ps     130ps (-3.6x)     40ps (-11.7x)

Trend of Device Effective Resistance

■ R0 is NOT a constant. It depends on size, input slope tt and output load cl
  • May differ by a factor of 2
  • NOT a function of a single sizing variable

Effective resistance R0 for unit-width n-transistor

size = 100x
cl \ tt     0.05ns    0.10ns    0.20ns
0.225pf     12200     12270     19180
0.425pf     8135      9719      12500
0.825pf     8124      8665      10250

size = 400x
cl \ tt     0.05ns    0.10ns    0.20ns
0.501pf     12200     15550     19150
0.901pf     11560     13360     17440
1.701pf     8463      9688      12470

■ Invalidates the simple CH-posynomial formulation!


Bounded CH-Posynomial Program and Extended Local Refinement

■ To minimize

$f(X) = \sum_{p=0}^{m} \sum_{q=0}^{m} \sum_{i=1}^{n} \sum_{j=1, j \ne i}^{n} \frac{a_{pi}(X)}{x_i^{p}} \cdot b_{qj}(X)\, x_j^{q}$

is a general CH-posynomial program, when $a_{pi}$ and $b_{qj}$ are arbitrary functions of X, but each has an upper and lower bound.

■ Extended local refinement on $x_i$ w.r.t. X is local refinement using the following coefficients:
  • When X dominates X*, for any p, q and $j \ne i$, we use
    − $a^{max}_{pi}$ instead of $a_{pi}(X)$ for $1/x_i^{p}$
    − $a^{min}_{qj}$ instead of $a_{qj}(X)$ for $1/x_j^{q}$
    − $b^{min}_{pi}$ instead of $b_{pi}(X)$ for $x_i^{p}$
    − $b^{max}_{qj}$ instead of $b_{qj}(X)$ for $x_j^{q}$
  • Symmetric operation when X is dominated by X*

Dominance Property for Bounded CH-Posynomial Program

■ Theorem [Cong-He, ISPD’98]:
  • The dominance property holds for bounded CH-posynomial program w.r.t. the extended local refinement.
    − If X dominates optimal solution X* and X’ = extended local refinement of X, then X’ dominates X*
    − If X is dominated by X* and X’ = extended local refinement of X, then X’ is dominated by X*

■ Application:
  • Device and wire sizing problem
    − under general capacitance model
    − under table-based device delay model


Extended Local Refinement for Device

■ $R_0^{max}(i)$ and $R_0^{min}(i)$ are determined
  • under the assumption that R0 increases w.r.t.
    − increase of size and input slope
    − decrease of output load
  • by table lookup
    − using continuously updated lower and upper bounds on transistor size, input slope and output load

■ When $X \ge X^*$, we use:
  • $R_0^{max}(i)$ for LR optimization on transistor i
  • $R_0^{min}(i)$ for LR optimization on transistors other than i

■ When $X \le X^*$, we use:
  • $R_0^{min}(i)$ for LR optimization on transistor i
  • $R_0^{max}(i)$ for LR optimization on transistors other than i
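The table lookup for the R0 bounds can be sketched as follows. The values are the size = 100x entries from the effective-resistance slide. The function `r0_bounds` and its interface are assumptions, and the sketch considers only grid points for a single transistor size, with no interpolation.

```python
# Effective resistance R0 for size = 100x, taken from the slide's table:
# rows are output loads cl (pF), columns are input slopes tt (ns).
TT = (0.05, 0.10, 0.20)
CL = (0.225, 0.425, 0.825)
R0_TABLE = (
    (12200, 12270, 19180),
    (8135, 9719, 12500),
    (8124, 8665, 10250),
)


def r0_bounds(tt_range, cl_range):
    """Return (R0_min, R0_max) over the table entries inside the given bounds.

    tt_range and cl_range are (low, high) bounds on input slope and output load.
    """
    values = [
        R0_TABLE[r][c]
        for r, cl in enumerate(CL)
        for c, tt in enumerate(TT)
        if cl_range[0] <= cl <= cl_range[1] and tt_range[0] <= tt <= tt_range[1]
    ]
    if not values:
        raise ValueError("no table entry inside the given ranges")
    return min(values), max(values)


print(r0_bounds((0.05, 0.20), (0.225, 0.825)))
print(r0_bounds((0.05, 0.10), (0.425, 0.825)))
```

As the bounds on slope and load tighten during the iterations, the gap between the two returned values shrinks, which is what keeps the extended local refinement bounds tight.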

Comparison between STIS Formulations

DCLK        step-model      table-model
sgws        1.16            1.08 (-6.8%)
stis        1.13 (-2.5%)    0.96 (-17.2%)

2cm line    step-model      table-model
sgws        0.82            0.81 (-0.4%)
stis        0.75 (-8.6%)    0.69 (-16.5%)

■ Different formulations on DCLK and 2cm line
  • Parameters are based on 0.18um process
  • Optimal buffer insertion is used for 2cm line

■ Total runtime
  • LR-based optimization ~10 seconds
  • HSPICE simulation ~3000 seconds


GISS can be Solved as General CH-Posynomial Program

  • 16-bit bus, each a 10mm-long line, 500um per segment
  • MIN: min width (max spacing)
  • GISS/DP: dynamic programming based, under variable ca and cf
  • GISS/LR: LR-based, under general cap table

Center       Average delays (ns)                      Runtimes (s)
spacing      MIN     GISS/DP        GISS/LR          GISS/DP    GISS/LR
2x pitch     1.51    0.80 (-47%)    0.79 (-47%)      183        2.0
3x pitch     1.33    0.52 (-61%)    0.52 (-61%)      189        2.4
4x pitch     1.28    0.42 (-67%)    0.42 (-67%)      511        2.3
5x pitch     1.25    0.37 (-71%)    0.36 (-71%)      1086       4.9
6x pitch     1.23    0.34 (-72%)    0.32 (-73%)      1379       7.7

Simultaneous Device and Interconnect Optimization

■ Simultaneous device and wire sizing

■ Simultaneous buffer insertion and wire sizing

■ Simultaneous topology construction, buffer insertion and wire sizing


Buffer Insertion with Wiresizing [Lillis-Cheng-Lin, ICCAD’95]

■ Objective is to minimize power subject to delay constraints

■ Incorporate the effect of signal slew on buffer delay using piece-wise linear functions

■ In the bottom-up phase, consider discrete wiresizing for each edge e
  • For each option (c, q) and candidate wire width w:
    − cap(e, w) = wire cap. of e with width w
    − res(e, w) = wire res. of e with width w
    − Compute new option (c’, q’):
      c’ = c + cap(e, w)
      q’ = q - res(e, w) × (cap(e, w)/2 + c)

■ Additional pruning rule considered for power minimization: given options (c, q) with power p and (c’, q’) with power p’, prune (c, q) if p’ < p, c’ ≤ c, q’ ≥ q
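The bottom-up option update for one edge can be sketched as below. This is a simplified illustration that tracks only (c, q) pairs and applies the delay-only dominance test; the power component of the pruning rule is omitted. The function names and the toy `cap`/`res` models in the usage lines are assumptions.

```python
def prune(options):
    """Drop every option dominated by one with c' <= c and q' >= q."""
    kept = []
    # Sort by increasing capacitance; for equal capacitance, best q first.
    for c, q in sorted(set(options), key=lambda o: (o[0], -o[1])):
        if not kept or q > kept[-1][1]:
            kept.append((c, q))
    return kept


def propagate(options, widths, cap, res):
    """Push (c, q) options upstream across one edge, trying every wire width.

    cap(w) and res(w) return the edge's wire capacitance and resistance at width w.
    """
    new_options = []
    for c, q in options:
        for w in widths:
            c_new = c + cap(w)
            q_new = q - res(w) * (cap(w) / 2.0 + c)
            new_options.append((c_new, q_new))
    return prune(new_options)


# Toy edge model in arbitrary units: cap grows and res shrinks with width.
sink_options = [(10.0, 100.0)]
print(propagate(sink_options, (1, 2), cap=lambda w: 4.0 * w, res=lambda w: 2.0 / w))
```

Neither width dominates in the usage example: the wider wire presents more capacitance upstream but leaves more required-time slack, so both options are kept for the next edge.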

Simultaneous Buffer Insertion/Sizing and Wiresizing [Chu-Wong, ISPD’97]

■ Assumptions:
  • Consider only area capacitance
  • Continuous wire widths and buffer sizes without bounds

■ Problem:
  • Given a single line, driver resistance, load, and the total number of segments n to be used
  • Objective: find (i) the optimal number of buffers to be inserted, with their locations and sizes; (ii) the optimal length and width of each segment


Simultaneous Buffer Insertion/Sizing and Wiresizing (continued)

■ Results and Implications:
  • Closed form formula for optimal number of buffers
  • All segments in the optimal solution are of equal length
  • Closed form formulas for buffer and wire sizes, for any given buffer locations
  • Buffer locations do not matter, as long as delay is the only objective and the buffer and wire sizes are not bounded
    ⇒ For delay minimization, a chain of cascade drivers is as good as using buffers to break a long line
    However, power and area will be affected by buffer locations

■ For interconnect tree, apply the formulas on edges iteratively; keep buffer locations/sizes and wire widths of other edges fixed while optimizing one edge

■ Shortcoming: Ignores fringing capacitance, which is significant in deep submicron

Comparison of Several Interconnect Optimization Algorithms

■ T+B+W: Topology (T), followed by optimal buffer insertion and sizing B (B=10), then followed by optimal wire sizing (W=18)

■ TB+BW: Simultaneous T and B (B=3), followed by simultaneous buffer and wire sizing (BW) with B=40, W=18

■ Tbw+BW: Simultaneous TBW with small number of B=3 and W=3, then followed by BW as above

■ TBW: Simultaneous TBW with larger number of B=10 and W=8

■ Provided by the UCLA TRIO (Tree, Repeater, & Interconnect Optimization) package


Comparison of Optimization Results by Different Algorithms

Algorithms       T+B+W    TB+BW    Tbw+BW    TBW

5-pin nets
  Delay (ns)     0.40     0.39     0.35      0.34
  Delay (ns)     0.47     0.48     0.38      0.38
  Delay (ns)     0.42     0.41     0.36      0.35
  CPU (s)        0.1      0.1      1.4       15

10-pin nets
  Delay (ns)     0.42     0.37     0.34      0.33
  Delay (ns)     0.56     0.56     0.44      0.44
  Delay (ns)     0.47     0.45     0.38      0.38
  CPU (s)        0.8      1.0      6.4       76

20-pin nets
  Delay (ns)     0.45     0.43     0.38      0.39
  Delay (ns)     0.54     0.48     0.42      0.41
  Delay (ns)     0.46     0.43     0.38      0.38
  CPU (s)        1.6      4.0      27.6      350