Input-Specific Dynamic Power Optimization for VLSI Circuits Fei Hu Intel Corp. Folsom, CA 95630, USA...

Input-Specific Dynamic Power Optimization for VLSI Circuits

Fei Hu
Intel Corp.
Folsom, CA 95630, USA

Vishwani D. Agrawal
Department of ECE
Auburn University, AL 36849, USA

October 5, 2006

Oct. 5, 2006, Fei Hu, ISLPED 2006, Tegernsee, Germany

Outline

Background
– Dynamic power dissipation
– Glitch reduction
– Previous LP model with fixed gate delay
– Process-variation-resistant LP model

Input-specific optimization
– Without process variation
– With process variation

Experimental results

Conclusion


Background

Dynamic power dissipation
– $P_{dyn} = P_{switching} + P_{short\text{-}circuit}$

Switching power dissipation
– $P_{switching} = \frac{1}{2} k C_L V_{dd}^2 f_{clk}$

[Figure: CMOS inverter between Vdd and Gnd driving load capacitance C_L, shown for both output transitions (pull-up on/off, pull-down off/on); labeled currents: ic, isupply.]
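As a quick numeric illustration of the switching-power formula, the sketch below evaluates it for one node. The slide does not define k; it is assumed here to be the average number of transitions per clock cycle, so glitches show up as a larger k. All numeric values are hypothetical.

```python
def switching_power(k, c_load, vdd, f_clk):
    """Switching power P = 1/2 * k * C_L * Vdd^2 * f_clk (watts).

    k is assumed to be the average number of transitions per clock cycle.
    """
    return 0.5 * k * c_load * vdd ** 2 * f_clk


# Hypothetical node: 10 fF load, 1.2 V supply, 1 GHz clock.
clean = switching_power(k=1.0, c_load=10e-15, vdd=1.2, f_clk=1e9)
# Same node with 50% extra transitions caused by glitches (assumed figure).
glitchy = switching_power(k=1.5, c_load=10e-15, vdd=1.2, f_clk=1e9)

print(f"clean:   {clean * 1e6:.2f} uW")
print(f"glitchy: {glitchy * 1e6:.2f} uW")
```

Because power is linear in k, removing glitch transitions reduces switching power in direct proportion.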


Background

Glitch reduction
– An important dynamic power reduction technique
– Glitch power accounts for 30–70% of $P_{dyn}$
– Related techniques
  Balanced delay
  Hazard filtering
  Transistor/gate sizing
  Linear programming approach

[Figure: waveforms of a static glitch and a dynamic glitch.]


Glitch reduction

Original circuit

Balanced path / path balancing
– Equalize delays of all paths incident on a gate
– Balancing requires insertion of delay buffers

Hazard/glitch filtering
– Utilize the glitch filtering effect of a gate
– Not necessary to insert buffers

[Figure: example circuits annotated with gate delays (values include 1, 1.5, 0.5, and 3).]
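The two techniques can be contrasted with a small sketch. It assumes the usual inertial-delay view: a gate suppresses a glitch when its delay exceeds the spread of its input arrival times. The function names and the arrival times are hypothetical and are not taken from the slide's figure.

```python
def input_skew(arrival_times):
    """Spread between the earliest and latest input arrival times."""
    return max(arrival_times) - min(arrival_times)


def glitch_is_filtered(arrival_times, gate_delay):
    """Hazard filtering: the gate absorbs the glitch if delay > input skew."""
    return gate_delay > input_skew(arrival_times)


def balancing_buffer_delays(arrival_times):
    """Path balancing: delay to add on each input so all arrive together."""
    latest = max(arrival_times)
    return [latest - t for t in arrival_times]


arrivals = [1.0, 3.0]  # hypothetical arrival times at a two-input gate

print(glitch_is_filtered(arrivals, gate_delay=1.0))  # False: skew 2.0 is not below 1.0
print(balancing_buffer_delays(arrivals))             # [2.0, 0.0]
print(glitch_is_filtered(arrivals, gate_delay=2.5))  # True: no buffer needed
```

Path balancing removes the skew at the cost of a buffer on the early input, while hazard filtering leaves the skew in place and relies on a slower gate.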


Glitch reduction

Transistor/gate sizing
– Find transistor sizes in the circuit to realize the delay
– No need to insert delay buffers
– Suffers from nonlinearity of delay model
– Large solution space; numerical convergence and global optimization not guaranteed

Linear programming approach
– Adopts both path balancing and hazard filtering
– Finds the optimal delay assignments for gates
– Uses technology mapping to map the gate delay assignments to transistor/gate dimensions
– Guarantees optimal solution, a convenient way to solve a large scale optimization problem


Previous LP approach

[Figure: example circuit with numbered gates and signal lines (1 to 29). Each gate output carries a timing window (t, T); shown for gate 7 with delay d7, output window (t7, T7), and input windows (t5, T5) and (t6, T6).]

Gate constraints:
$T_7 \ge T_5 + d_7$
$T_7 \ge T_6 + d_7$
$t_7 \le t_5 + d_7$
$t_7 \le t_6 + d_7$
$d_7 > T_7 - t_7$

Circuit delay constraints:
$T_{11} \le maxdelay$
$T_{12} \le maxdelay$

Objective:
Minimize sum of buffer delays
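The gate constraints can be read as a forward propagation of timing windows. The sketch below takes the tightest values the inequalities allow (earliest input plus delay, latest input plus delay) and then tests the glitch-filtering condition d > T − t. It checks a given delay assignment rather than solving the LP, and the window values are hypothetical.

```python
def timing_window(input_windows, d):
    """Output window (t, T) of a gate with delay d.

    input_windows is a list of (t, T) pairs, one per fan-in signal.
    These are the tightest values satisfying t <= t_in + d and T >= T_in + d.
    """
    t = min(t_in for t_in, _ in input_windows) + d
    T = max(T_in for _, T_in in input_windows) + d
    return t, T


def satisfies_glitch_filtering(input_windows, d):
    """Glitch-filtering constraint of the fixed-delay model: d > T - t."""
    t, T = timing_window(input_windows, d)
    return d > T - t


# Hypothetical windows on the two inputs of a gate.
inputs = [(2, 2), (3, 3)]

print(timing_window(inputs, d=2))               # (4, 5)
print(satisfies_glitch_filtering(inputs, d=2))  # True: 2 > 5 - 4
print(satisfies_glitch_filtering(inputs, d=1))  # False: 1 > 1 fails
```

When the check fails, the LP has two remedies: increase the gate delay, or insert a buffer on the early input to narrow the window.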


Process-variation-resistant optimization

Motivation
– Gate delay assumed fixed in previous models
– Variation of gate delay in real circuits
  Environmental factors: temperature, $V_{dd}$
  Physical factors: process variations
– Effect of delay variation
  Glitch filtering conditions corrupted
  Power dissipation increases from the optimized value
– Our proposal
  Consider delay variations in dynamic power optimization
  Only consider process variations (major source of delay variation)


LP model based on statistical timing

Statistical timing model with random variables

[Figure: gate i with delay d_i, driven by gates 1, ..., j, ..., k. The fan-in gate outputs carry timing windows (ta_1, Ta_1), (ta_j, Ta_j), (ta_k, Ta_k); gate i has input window (tb_i, Tb_i) and output window (ta_i, Ta_i).]






Input-specific optimization

Motivation
– Previous LP models guarantee glitch filtering for ANY input vector sequence
  $T_i - t_i < d_i$ for all gates
– Redundancy in optimization
  Insertion of more buffers
  Increased overhead in power/area
– In reality, circuits operate in embedded environments
  Optimize for the input vector sequences that are possible for the circuit, e.g., functional vectors
  Same reduction in power dissipation with lower overheads


Input-specific optimization

Glitch generation pattern
– Input vector pair that can potentially generate a glitch
– AND gate example:

[Figure: AND gate input and output waveforms with 0/1 signal values illustrating glitch-generation patterns.]

Glitch generation probability $P_g[i] = N_g[i] / N$
– Probability that a glitch-generation pattern occurs at the inputs of gate i
– Steady state signal values match the pattern
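A minimal sketch of how the glitch-generation probability could be counted for a two-input AND gate. Two assumptions are made here. The glitch-generation pattern is taken to be a vector pair in which one input rises while the other falls (steady-state output 0 before and after, but a possible 0-1-0 pulse). N is taken to be the number of consecutive vector pairs in the sequence. Neither definition is spelled out on the slide.

```python
def is_and_glitch_pattern(before, after):
    """True if one input rises while the other falls (two-input AND gate)."""
    rises = any(old == 0 and new == 1 for old, new in zip(before, after))
    falls = any(old == 1 and new == 0 for old, new in zip(before, after))
    return rises and falls


def glitch_generation_probability(steady_state_inputs):
    """P_g = N_g / N over consecutive pairs of steady-state input vectors."""
    pairs = list(zip(steady_state_inputs, steady_state_inputs[1:]))
    n_g = sum(is_and_glitch_pattern(p, q) for p, q in pairs)
    return n_g / len(pairs)


# Hypothetical steady-state values seen at the gate inputs during simulation.
sequence = [(0, 1), (1, 0), (1, 1), (0, 1), (1, 0)]
print(glitch_generation_probability(sequence))  # 0.5 (2 of 4 pairs match)
```

A gate whose inputs never show the pattern under the functional vectors gets a probability of 0 and is a candidate for constraint relaxation.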


Input-specific optimization

Application to basic LP model w/ fixed gate delay model
– Static optimization
  Only static glitches/hazards considered
– Relaxation of constraints
  Relax glitch filtering constraints where glitches are unlikely
  $T_i - t_i < d_i \;\Rightarrow\; (T_i - t_i)\,\beta_i < d_i$ (relaxation factor, written here as $\beta_i$)
  Selective relaxation
  $\beta_i = 0$ if $P_g[i] = 0$; $\beta_i = 1$ if $P_g[i] > 0$
  Generalized relaxation
  $\beta_i = 1 - e^{-P_g[i]}$
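The two relaxation schemes can be sketched as functions of the glitch-generation probability. The extracted slide text is incomplete, so several things here are assumptions: the factor is named `beta`, selective relaxation is read as "0 when the probability is 0, otherwise 1", and generalized relaxation is read as an exponential of the form 1 − e^(−scale·P_g). The `scale` parameter is hypothetical and defaults to 1.

```python
import math


def selective_relaxation(p_g):
    """Drop the constraint entirely when the gate never sees a glitch pattern."""
    return 0.0 if p_g == 0 else 1.0


def generalized_relaxation(p_g, scale=1.0):
    """Smoothly weaken the constraint as the glitch pattern gets rarer.

    'scale' is a hypothetical knob, not a parameter named in the source.
    """
    return 1.0 - math.exp(-scale * p_g)


def relaxed_constraint_holds(T, t, d, beta):
    """Relaxed glitch-filtering constraint: (T - t) * beta < d."""
    return (T - t) * beta < d


# A gate with a wide timing window (T - t = 3) and a small delay (d = 1).
print(relaxed_constraint_holds(5, 2, 1, beta=1.0))                        # False
print(relaxed_constraint_holds(5, 2, 1, beta=selective_relaxation(0.0)))  # True
print(round(generalized_relaxation(0.1), 3))                              # 0.095
```

With a factor of 0 the constraint is trivially satisfied for any positive delay, so no buffer is spent on a gate that cannot glitch under the given inputs.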



Application to process-variation-resistant LP model based on statistical timing
– Static optimization
– Relaxation of constraints
  Selective relaxation
  Generalized relaxation
  $\mu_{d_i} > [\mu_{W_i} + 3k(\sigma_{W_i} + r\mu_{d_i})]\,\beta_i$
– Tuning factor
  Original objective: Minimize $\sum_j d_j$; ($j \in$ buffers)
  Current objective: Minimize $\sum_j d_j + TF \cdot \frac{1}{N}\sum_i d_i$; ($j \in$ buffers, $i \in$ other gates)



Why do we need a tuning factor
– Dominating path affects critical delay distribution

[Figure: example circuit from PIs to PO with other logic and a dominating path; annotations: "Always 0", "Can be [1,41]", delay values 1, 20, 40, 41.]






Experimental results

Experimental procedure
– Power estimation
  Event driven logic simulation
  Fanout weighted sum of switching activities
  Monte-Carlo simulation with 1,000 samples of delays under process-variation
– Results analysis
  Un-Opt., unit-delay circuit
  Opt1, previous basic LP model w/ fixed gate delay
  Opt2, process-variation-resistant LP model
  IS-Opt1, IS-Opt2, input-specific optimizations

[Figure: experimental flow; labels: Circuit, Data extraction, Constraint set data, AMPL (Dmax, r, LP models), Gate delays, Circuit generation, Optimized circuit, Logic simulations, Results.]
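The power metric named in the procedure, a fanout-weighted sum of switching activities, can be sketched as follows. The transition counts would come from the event-driven logic simulation; here they are hypothetical numbers, and normalizing against the un-optimized circuit is an assumption based on the 1.0 baseline used in the result tables.

```python
def weighted_switching_activity(transitions, fanout):
    """Sum over gates of (output transitions) x (fanout of the gate)."""
    return sum(transitions[gate] * fanout[gate] for gate in transitions)


# Hypothetical per-gate fanout and simulated transition counts.
fanout = {"g1": 2, "g2": 1, "g3": 3}
unopt_transitions = {"g1": 4, "g2": 6, "g3": 2}  # includes glitch transitions
opt_transitions = {"g1": 2, "g2": 4, "g3": 2}    # glitches filtered

baseline = weighted_switching_activity(unopt_transitions, fanout)
optimized = weighted_switching_activity(opt_transitions, fanout)

print(baseline, optimized)              # 20 14
print(round(optimized / baseline, 2))   # 0.7 (normalized power)
```

Under process variation the same metric would be averaged over the Monte-Carlo delay samples, which this sketch does not attempt.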


Experimental results – input-specific optimization

Application to “Opt1” (basic LP model), IS-Opt1

Circuit  maxdelay  Un-Opt  Opt (w/o proc. var.)    IS-Opt (input-specific, w/o proc. var.)
                   Pwr.    Pwr.   Delay  Buffers   Pwr.   Delay  Buffers
c432     34        1.0     0.74   34     66        0.74   35     66
         68        1.0     0.74   68     58        0.74   69     41
c499     22        1.0     0.94   22     48        0.94   22     33
         33        1.0     0.94   33     0         0.95   33     0
c880     48        1.0     0.54   51     35        0.54   49     32
         120       1.0     0.54   121    30        0.54   122    24
c1355    48        1.0     0.93   48     192       0.93   48     113
         120       1.0     0.93   121    128       0.93   120    25
c1908    80        1.0     0.53   82     62        0.54   86     52
         200       1.0     0.54   203    34        0.53   204    3
c2670    64        1.0     0.74   65     34        0.74   66     30
         160       1.0     0.74   163    9         0.74   162    1
c3540    94        1.0     0.59   95     139       0.59   101    122
         235       1.0     0.59   239    78        0.59   239    73
c5315    98        1.0     0.56   100    167       0.56   104    170
         245       1.0     0.56   249    53        0.56   250    52
c6288    228       1.0     0.13   226    870       0.13   228    870
         620       1.0     0.13   620    857       0.13   620    853
c7552    86        1.0     0.52   89     91        0.52   88     84
         215       1.0     0.52   220    44        0.52   221    38


Experimental results – input-specific optimization

Application to “Opt2” under process-variation, IS-Opt2 under 15% intra-die and 5% inter-die variation

Cir.    DMax  Un-opt.  Opt2 (statistical proc.)     IS-Opt2 (input-specific statistical proc.)
              Nom.     Nom.  Mean  Max Dev.  No.    Nom.  Mean  Max Dev.  No.
              Pwr.     Pwr.  Pwr.  (%)       Buf.   Pwr.  Pwr.  (%)       Buf.
c432    50    1.0      0.74  0.76  11.1      88     0.74  0.76  9.3       81
        99    1.0      0.74  0.74  3.7       106    0.74  0.74  3.3       76
c499    32    1.0      0.94  0.95  2.0       88     0.94  0.95  1.9       88
        48    1.0      0.94  0.95  1.0       129    0.94  0.95  1.8       58
c880    70    1.0      0.54  0.59  18.2      57     0.54  0.59  20.4      38
        174   1.0      0.54  0.55  8.6       62     0.54  0.56  9.0       38
c1355   70    1.0      0.93  0.98  10.2      305    0.93  1.01  13.1      253
        174   1.0      0.93  0.94  3.0       305    0.93  0.95  4.7       160
c1908   116   1.0      0.52  0.64  35.8      135    0.52  0.64  34.7      107
        290   1.0      0.52  0.58  21.4      190    0.52  0.57  18.4      104
c2670   93    1.0      0.74  0.80  13.6      249    0.73  0.79  11.3      186
        232   1.0      0.73  0.76  6.2       211    0.73  0.75  4.3       79
c3540   137   1.0      0.59  0.66  17.8      281    0.59  0.65  15.6      247
        341   1.0      0.59  0.62  10.1      311    0.59  0.61  7.4       188
c5315   143   1.0      0.55  0.63  20.8      399    0.55  0.63  21.0      389
        356   1.0      0.55  0.60  13.4      418    0.55  0.60  13.2      413
c6288   331   1.0      0.13  0.38  223.8     1121   0.13  0.38  225.2     1115
        899   1.0      0.13  0.26  125.3     1473   0.13  0.26  125.5     1243
c7552   125   1.0      0.52  0.59  18.7      481    0.52  0.58  18.1      389
        312   1.0      0.52  0.56  11.8      645    0.52  0.55  10.9      520


Experimental results – input-specific optimization

Critical delay
– Similar performance for “Opt2” and “IS-Opt2”

[Figure: critical delay of “Opt2” and “IS-Opt2”; panels: nominal delay, max. deviation.]






Conclusions

Explored a new aspect of low-power optimization for VLSI circuits
– The input-specific optimization
– Optimizing the circuit for a given input sequence that may be specified for the circuit

Defined the concept of glitch-generation probability
– Adaptively relax glitch-filtering constraints

Experimental results
– Better solution with fewer delay buffers
– Maintain similar power reduction and delay performance
– Up to 80% and 63% reductions in delay buffers


Q & A

Backups


Process and delay variations

Process variations
– Variations due to semiconductor process
  $V_T$, $t_{ox}$, $L_{eff}$, $W_{wire}$, $TH_{wire}$, etc.
– Inter-die variation
  Constant within a die, varies from one die to another die of a wafer or wafer lot
– Intra-die variation
  Variation within a die
  Due to equipment limitations or statistical effects in the fabrication process, e.g., variation in doping concentration
  Spatial correlations and deterministic variation due to CMP and optical proximity effect


Delay model and implications

Random gate delay model
– $D_{total,i} = D_{nom,i} + D_{inter,i} + D_{intra,i}$
– Truncated normal distribution
– Assume independence
– Variation in terms of $\sigma / D_{nom,i}$ ratio

Effect of inter-die variations
– Depends on its effect on switching activities
– Definition of glitch-filtering probability $P_{glt} = P\{t_2 - t_1 < d\}$
  Signal arrival times $t_1$, $t_2$
  Gate inertial delay $d$
– Theorem 1 states the change of $P_{glt}$ due to inter-die variation
  [Equation: change of $P_{glt}$ expressed in terms of erf(), k, and r; not recoverable from the extracted text.]
  erf(), the error function
  k, a path and gate dependent constant
  r, $\sigma / D_{nom,i}$ ratio for inter-die variations
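The glitch-filtering probability P{t2 − t1 < d} can be estimated by sampling, as a rough sketch of the random delay model. Several simplifications are assumed here. Each quantity is drawn independently from a normal distribution truncated at ±3σ (the truncation point is an assumption). The arrival times are sampled directly with the same σ/nominal ratio r, rather than being built up from gate delays along a path. All nominal values are hypothetical.

```python
import random


def sample_truncated(nominal, r, rng, n_sigma=3.0):
    """Draw from a normal with mean 'nominal' and sigma = r * nominal,
    rejecting samples outside +/- n_sigma * sigma (assumed truncation)."""
    sigma = r * nominal
    while True:
        value = rng.gauss(nominal, sigma)
        if abs(value - nominal) <= n_sigma * sigma:
            return value


def estimate_p_glt(t1_nom, t2_nom, d_nom, r, samples=1000, seed=0):
    """Monte-Carlo estimate of P{t2 - t1 < d}."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        t1 = sample_truncated(t1_nom, r, rng)
        t2 = sample_truncated(t2_nom, r, rng)
        d = sample_truncated(d_nom, r, rng)
        if t2 - t1 < d:
            hits += 1
    return hits / samples


# Without variation the nominal design filters the glitch (5 - 4 < 2).
print(estimate_p_glt(4.0, 5.0, 2.0, r=0.0))   # 1.0
# With variation the probability can drop below 1.
print(estimate_p_glt(4.0, 5.0, 2.0, r=0.15))
```

The sample count of 1,000 matches the number of delay samples mentioned in the experimental procedure; the fixed seed only makes the sketch repeatable.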


Delay model and implications

Process-variation-resistant design
– Can be achieved by path balancing and glitch filtering
– Critical delay may increase
  Theorem 2 states that a solution is guaranteed only if circuit delay is allowed to increase
  Proved by example, assuming 10% variation

[Figure: example circuit with gates A, B, and C and annotated delays (values include 1, 2.1, and 3.9).]



Statistical timing model with random variables

[Figure: gate i with delay d_i, driven by gates 1, ..., j, ..., k. The fan-in gate outputs carry timing windows (ta_1, Ta_1), (ta_j, Ta_j), (ta_k, Ta_k); gate i has input window (tb_i, Tb_i) and output window (ta_i, Ta_i).]

$tb_i = \mathrm{Min}(ta_1, ta_j, ta_k)$
$Tb_i = \mathrm{Max}(Ta_1, Ta_j, Ta_k)$

Minimum-maximum statistics
– Needed for $tb_i$, $Tb_i$
– Previous works
  Min, Max of two normal random variables are not necessarily normally distributed
  Can be approximated with a normal distribution
  Requires complex operations, e.g., integration, exponentiation, etc.
– Challenges for LP approach
  Requires a simple approximation w/o nonlinear operations
  Our approximation for C = Max(A, B), where A, B, and C are Gaussian RVs
  $\mu_C = \mathrm{Max}(\mu_A, \mu_B)$
  $\mu_C + 3\sigma_C = \mathrm{Max}(\mu_A + 3\sigma_A, \mu_B + 3\sigma_B)$



Min-Max statistics approximation error
– Negligible when $|\mu_A - \mu_B| > 3(\sigma_A + \sigma_B)$
– Largest when $\mu_A = \mu_B$

$\mu_C = \mathrm{Max}(\mu_A, \mu_B)$
$\sigma_C = \frac{1}{3}[\mathrm{Max}(\mu_A + 3\sigma_A, \mu_B + 3\sigma_B) - \mu_C]$

[Figure: CDFs of A and B, the approximated CDF for Max(A,B), and the actual CDF for Max(A,B); axes: x and P.]
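The linear Max approximation and its error behavior can be checked numerically. The sketch below implements the approximation as the mean being the larger mean and the 3σ upper point being the larger 3σ upper point, then compares the approximated mean against a sampled one. The sampling helper, seed, and test values are illustrative assumptions.

```python
import random


def approx_max(mu_a, sigma_a, mu_b, sigma_b):
    """Approximate C = Max(A, B) by a Gaussian using only max and linear ops."""
    mu_c = max(mu_a, mu_b)
    upper = max(mu_a + 3 * sigma_a, mu_b + 3 * sigma_b)
    sigma_c = (upper - mu_c) / 3
    return mu_c, sigma_c


def sampled_max_mean(mu_a, sigma_a, mu_b, sigma_b, samples=20000, seed=0):
    """Empirical mean of Max(A, B) for independent Gaussian A and B."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += max(rng.gauss(mu_a, sigma_a), rng.gauss(mu_b, sigma_b))
    return total / samples


# Equal means: the approximated mean (10) underestimates the sampled mean.
print(approx_max(10, 1, 10, 1), sampled_max_mean(10, 1, 10, 1))
# Well-separated means: the approximation is essentially exact.
print(approx_max(10, 1, 20, 1), sampled_max_mean(10, 1, 20, 1))
```

For two independent Gaussians with equal mean and equal σ, the true mean of the maximum exceeds the common mean by σ/√π (about 0.56σ), which is the worst case noted above.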



Variables
– Timing and delay variables, each with mean $\mu$ and std dev $\sigma$
– Auxiliary variables, including the timing-window width $W_i$ with $\mu_{W_i}$, $\sigma_{W_i}$

Constraints
– Gate constraints
  Timing window at the inputs for a two-input gate i (auxiliary variables eliminated)
  $\mu_{Tb_i} \ge \mu_{Ta_1}$; $\mu_{Tb_i} + 3\sigma_{Tb_i} \ge \mu_{Ta_1} + 3\sigma_{Ta_1}$
  $\mu_{Tb_i} \ge \mu_{Ta_2}$; $\mu_{Tb_i} + 3\sigma_{Tb_i} \ge \mu_{Ta_2} + 3\sigma_{Ta_2}$
  $\mu_{tb_i} \le \mu_{ta_1}$; $\mu_{tb_i} - 3\sigma_{tb_i} \le \mu_{ta_1} - 3\sigma_{ta_1}$
  $\mu_{tb_i} \le \mu_{ta_2}$; $\mu_{tb_i} - 3\sigma_{tb_i} \le \mu_{ta_2} - 3\sigma_{ta_2}$
  Timing window at outputs
  $\mu_{Ta_i} = \mu_{Tb_i} + \mu_{d_i}$; $\sigma_{Ta_i} = k(\sigma_{Tb_i} + r\mu_{d_i})$
  $\mu_{ta_i} = \mu_{tb_i} + \mu_{d_i}$; $\sigma_{ta_i} = k(\sigma_{tb_i} + r\mu_{d_i})$


LP model based on statistical timing

Constraints
– Gate constraint
  Linear approximation
  $\sigma_{Ta_i} = \sqrt{\sigma_{Tb_i}^2 + (r\mu_{d_i})^2} \approx k(\sigma_{Tb_i} + r\mu_{d_i})$
  $k \in [0.707, 1]$; choose $k = 0.85$, since $\frac{A + B}{\sqrt{2}} \le \sqrt{A^2 + B^2} \le A + B$
– Glitch filtering constraints
  $\mu_{W_i} = \mu_{Tb_i} - \mu_{tb_i}$
  $\sigma_{W_i} = k(\sigma_{Tb_i} + \sigma_{tb_i})$
  $\mu_{d_i} > \mu_{W_i} + 3k(\sigma_{W_i} + r\mu_{d_i})$
– Circuit delay constraint
  $(1 + 3r)\,\mu_{Ta_i} \le D_{max}$

[Figure: probability density of $d_i - W_i$ with the $3\sigma$ margin marked.]
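The choice k = 0.85 for the linear approximation of a root-sum-square can be sanity-checked numerically. The sketch below replaces sqrt(A² + B²) with k(A + B) and scans the worst relative error over all splits with A + B = 1, which covers every ratio of A to B. The scan resolution and function names are illustrative assumptions.

```python
import math


def exact_sigma(a, b):
    """Standard deviation of the sum of two independent variables."""
    return math.sqrt(a * a + b * b)


def linear_sigma(a, b, k=0.85):
    """LP-friendly replacement: k * (a + b)."""
    return k * (a + b)


def worst_relative_error(k, steps=100):
    """Largest relative error of the linear form over all splits a + b = 1."""
    worst = 0.0
    for n in range(steps + 1):
        a = n / steps
        b = 1.0 - a
        exact = exact_sigma(a, b)
        worst = max(worst, abs(linear_sigma(a, b, k) - exact) / exact)
    return worst


for k in (0.707, 0.85, 1.0):
    print(k, round(worst_relative_error(k), 3))
```

A mid-range k balances the two extremes: at k = 0.707 the worst error occurs when one term dominates, at k = 1 it occurs when the terms are equal, and 0.85 keeps the worst case near 20%.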



Parameters
– $r$, $\sigma / D_{nom,i}$ ratio
– $D_{max}$, circuit delay parameter
– $\eta$, optimism factor
  $\eta = 1$, no relaxation
  $\eta < 1$, optimistic about the actual glitch width
  $\eta = 0$, reduces to previous model
  $\mu_{d_i} > \mu_{W_i} + 3\eta k(\sigma_{W_i} + r\mu_{d_i})$

Objective
– Minimize #buffers inserted, i.e., the sum of buffer delays
