UCSD VLSI CAD Laboratory, ISQED-2009
Revisiting the Linear Programming Framework for Leakage Power vs. Performance Optimization
Kwangok Jeong, Andrew B. Kahng, Hailong Yao
http://vlsicad.ucsd.edu/
University of California, San Diego
Outline
Background
Main Contributions
LP-Problem Formulation
Experimental Results
Conclusion
Background
Sizing problem
– Knobs to optimize power, timing and area: Vdd, Vth, Lgate, Wgate, etc.
– Find the optimal sizing for the tradeoff of power, timing and area
Basic idea of leakage optimization
– High-speed (high-leakage) gates → critical paths
– Low-speed (low-leakage) gates → non-critical paths
Previous works on leakage optimization
– Iterative optimization → local optimum
– Simplified timing model → timing violations
Contributions
Key difference from previous work
– Detailed delay modeling with signoff timing analysis
  – on a per-timing-arc / per-instance basis
  – to capture local delay sensitivity accurately
– High-speed and high-quality linear programming (LP)
Key applications of the LP framework
– Leakage power minimization under timing constraints
– Simultaneous timing legalization with leakage minimization
Cell Delay Model
SPICE simulations
– 65GP technology
– All timing arcs
– Rise and fall
– 7 input slews x 7 loads
– More than 50 cell masters
– Gate length from 50nm to 75nm
Cell delay is approximately linear in gate length
[Figure: Delay vs. gate length (A1 to Y in 2-input AND)]
$$ d = \alpha \cdot L_g + \beta $$
$d$: cell delay
$L_g$: gate length
$\alpha, \beta$: calibrated coefficients for each timing entry
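The calibration of the two coefficients can be illustrated with an ordinary least-squares fit for a single timing entry. This is a minimal sketch only: the slides do not say how the coefficients were fitted, the function name `fit_linear` is hypothetical, and the sample delays are made-up numbers, not SPICE results.

```python
def fit_linear(lengths_nm, delays_ps):
    """Least-squares fit of d = alpha * L_g + beta for one timing entry."""
    n = len(lengths_nm)
    mean_l = sum(lengths_nm) / n
    mean_d = sum(delays_ps) / n
    # Slope = covariance(L, d) / variance(L)
    sxx = sum((l - mean_l) ** 2 for l in lengths_nm)
    sxy = sum((l - mean_l) * (d - mean_d)
              for l, d in zip(lengths_nm, delays_ps))
    alpha = sxy / sxx
    beta = mean_d - alpha * mean_l
    return alpha, beta


# Hypothetical samples (one slew/load entry), NOT measured data.
lengths = [50.0, 55.0, 60.0, 65.0, 70.0, 75.0]   # gate length in nm
delays = [40.0, 43.0, 46.0, 49.0, 52.0, 55.0]    # cell delay in ps
alpha, beta = fit_linear(lengths, delays)
print(f"alpha = {alpha:.3f} ps/nm, beta = {beta:.3f} ps")
```

In a per-timing-arc setting, one such fit would be repeated for each arc, transition direction and slew/load table entry.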
Circuit Delay Model
Directed acyclic graph (DAG) representation
– Cell → vertex
– Wire → edge
– Super source S and super sink T
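Delay variables
• $d_v^u$: cell delay
• $w_{u,v}$: wire delay
• $a_v$: arrival time to node v
[Figure: timing path from Flip-flop A (CK, Q) through cells u and v to Flip-flop B (D), with super source S and super sink T, annotated with $a_u$, $w_{u,v}$, $d_v^u$ and $a_v$]
Delay constraint
$$ a_u + w_{u,v} + d_v^u \le a_v $$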
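With all delays fixed, the tightest arrival times that satisfy the delay constraint on every edge are the longest-path values from S. The sketch below shows this with a topological traversal of the DAG. The function name, the edge tuple layout and the delay numbers are assumptions for illustration and do not come from the slides.

```python
from collections import defaultdict, deque


def arrival_times(edges, source="S"):
    """edges: list of (u, v, wire_delay_uv, cell_delay_of_v_from_u).

    Returns the smallest arrival times with a_S = 0 that satisfy
    a_u + w_uv + d_v^u <= a_v on every edge (longest path from S).
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = {source}
    for u, v, w, d in edges:
        succ[u].append((v, w, d))
        indeg[v] += 1
        nodes.update((u, v))
    arrival = {n: 0.0 for n in nodes}
    # Kahn-style topological order: a node is processed once all of
    # its fan-in arrival times are final.
    queue = deque(n for n in nodes if indeg[n] == 0)
    while queue:
        u = queue.popleft()
        for v, w, d in succ[u]:
            arrival[v] = max(arrival[v], arrival[u] + w + d)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return arrival


# Hypothetical 3-gate example; delays are illustrative only.
edges = [
    ("S", "g1", 0.0, 10.0),
    ("S", "g2", 0.0, 12.0),
    ("g1", "g3", 1.0, 8.0),
    ("g2", "g3", 2.0, 8.5),
    ("g3", "T", 0.5, 0.0),   # the super sink has zero cell delay
]
arrival = arrival_times(edges)
print(arrival["g3"], arrival["T"])
```

In the LP, the same inequalities appear as constraints, and the arrival times become variables that the solver chooses together with the gate lengths.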
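LP for Leakage Power Optimization
Objective:
− Maximize the weighted sum of gate lengths (to minimize leakage power) without degrading circuit performance
$$
\begin{aligned}
\text{Maximize:}\quad & \sum_{v \in V} \lambda_v x_v \\
\text{Subject to:}\quad & a_S = 0 \\
& d_T = 0 \\
& a_T \le D \\
& a_u + w_{u,v} + d_v^u \le a_v && (\forall e_{u,v} \in E \text{ and } u, v \in V) \\
& d_v^u = d_v^{u,0} + \alpha_v^u \,(x_v - L_v^0) && (\forall v \in V) \\
& \mathrm{minL}_v \le x_v \le \mathrm{maxL}_v && (\forall v \in V)
\end{aligned}
$$
$D$: max. delay
$\mathrm{minL}_v$: min. gate length
$\mathrm{maxL}_v$: max. gate length
$\lambda_v$: Power / Delay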
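For intuition, consider the special case of a single path S → g1 → ... → T. There the LP has one timing constraint plus box bounds, so it is a fractional knapsack and can be solved exactly by lengthening gates in decreasing order of weight per unit of added delay. The sketch below covers only that special case. It assumes positive weights and positive delay sensitivities, all numbers are hypothetical, and a real netlist with reconvergent paths needs a general LP solver as in the talk.

```python
def size_chain(gates, wire_total, D):
    """Exact solution of the leakage LP for a single path (chain).

    gates: list of dicts with keys
      d0, L0      nominal cell delay and nominal gate length
      alpha       delay increase per unit gate length (assumed > 0)
      lam         objective weight of the gate (assumed > 0)
      Lmin, Lmax  gate length bounds
    wire_total: sum of wire delays along the path
    D: maximum allowed path delay
    """
    # Start from the fastest point: every gate at minimum length.
    x = [g["Lmin"] for g in gates]
    delay = wire_total + sum(
        g["d0"] + g["alpha"] * (g["Lmin"] - g["L0"]) for g in gates)
    budget = D - delay
    if budget < 0:
        raise ValueError("infeasible: path misses D at minimum length")
    # Spend the delay budget where each unit of delay buys the most
    # objective value, i.e. largest lam / alpha first.
    order = sorted(range(len(gates)),
                   key=lambda i: gates[i]["lam"] / gates[i]["alpha"],
                   reverse=True)
    for i in order:
        g = gates[i]
        step = min(g["Lmax"] - g["Lmin"], budget / g["alpha"])
        x[i] += step
        budget -= step * g["alpha"]
    return x


# Hypothetical two-gate path; values are illustrative only.
chain = [
    {"d0": 20.0, "L0": 60.0, "alpha": 0.5, "lam": 2.0,
     "Lmin": 55.0, "Lmax": 65.0},
    {"d0": 30.0, "L0": 60.0, "alpha": 1.0, "lam": 1.0,
     "Lmin": 55.0, "Lmax": 65.0},
]
lengths = size_chain(chain, wire_total=5.0, D=56.0)
print(lengths)
```

In this example the first gate has the better weight-to-sensitivity ratio, so it is pushed to its maximum length, and the second gate absorbs whatever delay budget remains.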
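LP for Timing Legalization
Objective:
− Given a design with timing violations,
− Improve the worst negative slack of the design with minimum leakage increase
$$
\begin{aligned}
\text{Minimize:}\quad & a_T - \gamma \Big( \sum_{v \in V} \lambda_v x_v \Big) \\
\text{Subject to:}\quad & a_S = 0 \\
& d_T = 0 \\
& a_T \ge D \\
& a_u + w_{u,v} + d_v^u \le a_v && (\forall e_{u,v} \in E \text{ and } u, v \in V) \\
& \mathrm{minL}_v \le x_v \le \mathrm{maxL}_v && (\forall v \in V)
\end{aligned}
$$
$D$: min. delay bound
$\gamma$: scaling parameter
$\mathrm{minL}_v$: min. gate length
$\mathrm{maxL}_v$: max. gate length
$\lambda_v$: Power / Delay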
Timing and Leakage Optimization Flow
[Flow diagram: Cell libraries and Netlist + Parasitic + Timing constraints → LP-Generator → LP-Solver]
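A generator step of this kind can be pictured as a routine that walks the timing graph and writes out the LP as text. The sketch below emits the leakage-minimization LP in the widely used CPLEX LP text format. The slides do not say which file format or solver was used, so the format, the function name `generate_lp`, the variable naming scheme and all numbers are assumptions. For brevity it also uses one delay model per gate rather than one per timing arc.

```python
def generate_lp(arcs, gates, D):
    """Emit a leakage LP as CPLEX-LP-format text.

    arcs:  list of (u, v, wire_delay); 'S' and 'T' are the super
           source and super sink.
    gates: dict name -> {d0, L0, alpha, lam, Lmin, Lmax}, alpha > 0.
    D:     maximum arrival time allowed at T.
    """
    lines = ["Maximize"]
    obj = " + ".join(f"{g['lam']} x_{v}" for v, g in gates.items())
    lines.append(f" obj: {obj}")
    lines.append("Subject To")
    lines.append(" src: a_S = 0")
    lines.append(f" sink: a_T <= {D}")
    for u, v, w in arcs:
        rhs = 0.0 - w  # constants go on the right-hand side
        if v in gates:
            # a_u + w_uv + d_v^u <= a_v
            lines.append(
                f" arr_{u}_{v}: a_{u} + d_{u}_{v} - a_{v} <= {rhs}")
            # d_v^u = d0 + alpha * (x_v - L0)
            g = gates[v]
            const = g["d0"] - g["alpha"] * g["L0"]
            lines.append(
                f" dly_{u}_{v}: d_{u}_{v} - {g['alpha']} x_{v} = {const}")
        else:
            # Arc into the super sink: no cell delay term.
            lines.append(f" arr_{u}_{v}: a_{u} - a_{v} <= {rhs}")
    lines.append("Bounds")
    for v, g in gates.items():
        lines.append(f" {g['Lmin']} <= x_{v} <= {g['Lmax']}")
    lines.append("End")
    return "\n".join(lines)


# Hypothetical two-gate path; values are illustrative only.
gates = {
    "g1": {"d0": 20.0, "L0": 60.0, "alpha": 0.5, "lam": 2.0,
           "Lmin": 55.0, "Lmax": 65.0},
    "g2": {"d0": 30.0, "L0": 60.0, "alpha": 1.0, "lam": 1.0,
           "Lmin": 55.0, "Lmax": 65.0},
}
arcs = [("S", "g1", 1.0), ("g1", "g2", 2.0), ("g2", "T", 2.0)]
lp_text = generate_lp(arcs, gates, D=56.0)
print(lp_text)
```

The resulting text can be handed to any solver that reads this format. Variables not listed under Bounds default to a lower bound of zero in that format, which suits arrival times and delays.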
Experimental Setup
Test case: 65nm technology
Library preparation
– 65GP from TSMC (Lgate = 60nm)
– Multi-Lgate libraries: 50nm, 60nm, 70nm
– Lgate biasing: 55nm, 56nm, …, 65nm
– Naming convention: e.g., L60 = Lgate of 60nm
Delay and leakage evaluation
– RC extraction: Synopsys STAR-RCXT (v2007.06)
– Delay: Synopsys PrimeTime (v2006.12)
– Leakage power: Cadence SOC Encounter (v5.2)
Comparison
– Synopsys Astro (v2006.06-SP5) / Cadence SOC Encounter (v5.2)

Design  Block size (mm^2)  #Cell instances  #Nets
AES     0.08               16989            17651
JPEG    0.39               86782            91479
JPEG3   1.16               255247           269330
Leakage Optimization with Lgate-Biasing
Inputs
– Initial design with L60
– Lgate-biased libraries: from L55 to L65
Outputs
– Meet timing (WNS no worse than the original design)
– Better leakage
– 8X to 16X faster runtime than SOCE

Design  WNS (ns)                            Leakage (uW)                        Runtime (s)
        original  SOCE    ASTRO   LP        original  SOCE    ASTRO   LP        SOCE   ASTRO  LP
AES     -0.065    -0.135  -0.079  -0.062    360.5     293.5   304.2   294.6     426    317    26
JPEG    -0.109    -0.170  -0.110  -0.103    2576.2    2271.1  2466.5  2180.0    3745   427    323
JPEG3   -0.155    -0.329  -0.166  -0.151    7444.1    6283.4  7166.5  6272.0    14569  1473   1754
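The runtime claim can be checked directly from the Runtime (s) columns of the Lgate-biasing table. The short script below only does that arithmetic; the dictionary layout is an illustrative choice.

```python
# Runtimes in seconds, copied from the Lgate-biasing results table.
runtime_s = {
    "AES":   {"SOCE": 426,   "ASTRO": 317,  "LP": 26},
    "JPEG":  {"SOCE": 3745,  "ASTRO": 427,  "LP": 323},
    "JPEG3": {"SOCE": 14569, "ASTRO": 1473, "LP": 1754},
}

# Speedup of LP = tool runtime / LP runtime (values > 1 mean LP is faster).
speedup_vs_soce = {d: r["SOCE"] / r["LP"] for d, r in runtime_s.items()}
speedup_vs_astro = {d: r["ASTRO"] / r["LP"] for d, r in runtime_s.items()}

for design in runtime_s:
    print(f"{design:6s} vs SOCE: {speedup_vs_soce[design]:5.1f}X   "
          f"vs ASTRO: {speedup_vs_astro[design]:5.1f}X")
```

Against SOCE the ratios are about 16.4X, 11.6X and 8.3X, which matches the 8X to 16X range. Against Astro they are about 12.2X, 1.3X and 0.8X, so on JPEG3 the LP flow is slower than Astro.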
Leakage Optimization with Multi-Lgate
Inputs
– Initial design with L60
– Multi-Lgate libraries: L50, L60 and L70
Outputs
– Meet timing (WNS no worse than the original design)
– Better leakage than the original design
– 5X to 14X faster than SOCE

Design  WNS (ns)                            Leakage (uW)                        Runtime (s)
        original  SOCE    ASTRO   LP        original  SOCE    ASTRO   LP        SOCE   ASTRO  LP
AES     -0.065    -0.129  -0.098  -0.051    360.5     290.3   274.9   304.6     285    62     20
JPEG    -0.109    -0.164  -0.166  -0.093    2576.2    2304.2  2253.6  2365.4    1701   201    312
JPEG3   -0.155    -0.333  -0.215  -0.154    7444.1    6347.5  6599.4  6535.2    16104  735    1449
Timing Legalization with Lgate Biasing
Inputs
– Initial design with L60
– SOCE leakage optimization worsens timing slack
– Lgate-biased libraries: from L55 to L65
Outputs
– WNS restored to the original level or better
– Leakage power still smaller than the original

WNS and TNS in ns, leakage in uW.

Design  Before leakage optimization   After SOCE                     After LP
        WNS     TNS      Leakage      WNS     TNS       Leakage      WNS     TNS      Leakage  runtime (s)
AES     -0.065  -0.690   360.5        -0.135  -5.720    293.5        -0.064  -1.165   298.3    17
JPEG    -0.109  -8.650   2576.2       -0.170  -20.586   2271.1       -0.105  -7.856   2294.0   327
JPEG3   -0.155  -38.473  7444.1       -0.329  -133.168  6283.4       -0.149  -23.976  6555.0   1275
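The WNS and TNS columns follow the usual static-timing definitions: WNS is the most negative endpoint slack, and TNS is the sum of all negative endpoint slacks. The sketch below uses one common convention, reporting 0 for both when no endpoint violates. The slack values are hypothetical and are not taken from the testcases.

```python
def wns_tns(endpoint_slacks):
    """Return (WNS, TNS) for a list of endpoint slacks.

    WNS: most negative slack, or 0.0 if nothing violates.
    TNS: sum of all negative slacks, or 0.0 if nothing violates.
    """
    negative = [s for s in endpoint_slacks if s < 0.0]
    wns = min(negative) if negative else 0.0
    tns = sum(negative)
    return wns, tns


# Hypothetical endpoint slacks in ns (illustrative only).
slacks_ns = [0.120, -0.050, -0.064, 0.300, -0.001]
wns, tns = wns_tns(slacks_ns)
print(f"WNS = {wns:.3f} ns, TNS = {tns:.3f} ns")
```

Because the LP objective targets the arrival time at the super sink, it acts on WNS directly. TNS depends on how many endpoints remain in violation and can move independently of WNS.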
Timing Legalization with Multi-Lgate
Inputs
– Initial design with L60
– SOCE leakage optimization worsens timing slack
– Multi-Lgate libraries: L50, L60 and L70
Outputs
– WNS restored to the original level or better
– Leakage power still smaller than the original

WNS and TNS in ns, leakage in uW.

Design  Before leakage optimization   After SOCE                     After LP
        WNS     TNS      Leakage      WNS     TNS       Leakage      WNS     TNS      Leakage  runtime (s)
AES     -0.065  -0.690   360.5        -0.129  -10.084   290.3        -0.051  -2.333   346.4    18
JPEG    -0.109  -8.650   2576.2       -0.164  -16.635   2304.2       -0.101  -11.211  2338.5   313
JPEG3   -0.155  -38.473  7444.1       -0.333  -151.103  6347.5       -0.151  -86.648  6506.3   1266
Conclusion and Ongoing Work
We revisited and implemented LP-based frameworks
– Leakage power minimization and timing legalization
Compared with commercial tools, our work shows
– Always meets timing
– Better leakage power
– ~10X faster runtime
Our methods enable very fast, high-quality power-delay tradeoff estimation and optimization
Ongoing work
– Larger industrial testcases
– Multi-Vth, multi-Lgate and Lgate-biasing
– Timing margin and don't-touch methodologies
– Hold time
– Multi-mode/multi-corner analysis
– Dynamic/total power constraints
THANK YOU