4/8/2012
Energy and Delay Models
EE216B: VLSI Signal Processing
Prof. Dejan Marković
Lecture Overview
Goal: tie parameters of the underlying implementation technology to algorithm-level specifications
Strategy
– Technology characterization (energy, delay)
– Circuit-level tuning (gate size, supply voltage)
– Tradeoff analysis (E-D space, logic depth, activity)
Remember
– We will go all the way down to these low-level results to match algorithm specs with technology characteristics
2.2
Power and Energy Figures of Merit
Power consumption in Watts
– Determines battery life in hours
Peak power
– Determines power ground wiring designs
– Sets packaging limits
– Impacts signal noise margin and reliability analysis
Energy efficiency in Joules
– Power integrated over the time taken to perform a computation
Energy = Power * Delay
– Joules = Watts * seconds
– Lower energy number means less power to perform a computation at the same frequency
2.3
Power versus Energy
Power is the height of the waveform
– Lower power design could simply be slower
Energy is the area under the waveform
– Two approaches require the same energy
[Figure: two plots of power (Watts) vs. time comparing Approach 1 and Approach 2]
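To make the height-versus-area distinction concrete, here is a minimal Python sketch with two rectangular power waveforms. The power levels and durations are hypothetical values chosen for illustration; they are not taken from the lecture.

```python
def energy_joules(power_watts, duration_s):
    """Energy is the area under a constant-power waveform: E = P * t."""
    return power_watts * duration_s

# Hypothetical numbers, for illustration only.
approach_1 = {"power_w": 2.0, "time_s": 1.0}   # higher power, finishes sooner
approach_2 = {"power_w": 1.0, "time_s": 2.0}   # half the power, twice as slow

e1 = energy_joules(approach_1["power_w"], approach_1["time_s"])
e2 = energy_joules(approach_2["power_w"], approach_2["time_s"])
print(e1, e2)
```

Approach 2 draws half the power but takes twice as long, so both approaches consume the same energy.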
2.4
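Review: Energy and Power Equations
$E = \alpha_{0\to1} \cdot C_L \cdot V_{DD}^2 + \alpha_{0\to1} \cdot t_{sc} \cdot V_{DD} \cdot I_{peak} + V_{DD} \cdot I_{leakage} / f_{clock}$
$P = f_{0\to1} \cdot C_L \cdot V_{DD}^2 + f_{0\to1} \cdot t_{sc} \cdot V_{DD} \cdot I_{peak} + V_{DD} \cdot I_{leakage}$
– First term: dynamic (~75% today, decreasing)
– Second term: short-circuit (~5% today, decreasing)
– Third term: leakage (~20% today, slowly increasing)
$f_{0\to1} = \alpha_{0\to1} \cdot f_{clk}$
Energy = Power / fclk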
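As a quick numerical check of these equations, the sketch below evaluates the three power terms and converts them to energy per cycle. All device values are hypothetical; they were picked only so that the split comes out near the 75% / 5% / 20% breakdown quoted on the slide.

```python
def power_components(f_clk, alpha, c_load, v_dd, t_sc, i_peak, i_leak):
    """Return (dynamic, short-circuit, leakage) power in watts."""
    f01 = alpha * f_clk                       # f01 = alpha01 * fclk
    p_dyn = f01 * c_load * v_dd ** 2          # switching
    p_sc = f01 * t_sc * v_dd * i_peak         # short-circuit
    p_leak = v_dd * i_leak                    # leakage
    return p_dyn, p_sc, p_leak

# Hypothetical operating point (not from the lecture).
F_CLK = 1e9
components = power_components(
    F_CLK, alpha=0.1, c_load=7.5e-15, v_dd=1.0,
    t_sc=10e-12, i_peak=50e-6, i_leak=200e-9,
)
energy_per_cycle = sum(components) / F_CLK    # Energy = Power / fclk
shares = [p / sum(components) for p in components]
print(energy_per_cycle, shares)
```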
2.5
Dominant Energy Components
Switching: charges the load capacitance
Leakage: parasitic component
Dramatic increase in leakage energy
[Figure: normalized energy (0 to 5), split into leakage and switching, vs. technology generation (0.25 µm, 0.18 µm, 0.13 µm, 90 nm, 65 nm)]
2.6
Switching Energy
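On every 0→1 transition at the output, energy is drawn from the supply (energy source)
$E_{0\to1} = \int_{V_{OL}}^{V_{OH}} C_L \cdot V_{DD} \, dV_{out}$
$E_{0\to1} = C_L \cdot V_{DD}^2$ (for a rail-to-rail swing, $V_{OL} = 0$ and $V_{OH} = V_{DD}$)
[Figure: CMOS gate (Vin, Vout) driving load capacitance CL from supply VDD]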
2.7
Energy Balance
One half of the energy from supply is consumed in the pull-up network and one half is stored on CL
Charge from CL is discharged to Gnd during the 1→0 transition
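0→1 transition
– $E_{0\to1} = C_L \cdot V_{DD}^2$ (energy from supply)
– $E_R = 0.5 \cdot E_{0\to1}$ (dissipated as heat in the PMOS network)
– $E_C = 0.5 \cdot E_{0\to1}$ (stored on CL)
1→0 transition
– $E_{1\to0} = 0.5 \cdot C_L \cdot V_{DD}^2$
– $E_R = E_{1\to0}$ (dissipated as heat in the NMOS network)
[Figure: CMOS gate with inputs A1 … AN, PMOS pull-up network to VDD, NMOS pull-down network, and load CL at Vout]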
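The half-and-half split can be checked numerically. The sketch below is a forward-Euler simulation of CL charging through a linear resistor, which stands in for the pull-up network. The linear resistor and all component values are assumptions made for illustration; the split itself does not depend on the resistance.

```python
def charge_capacitor(c_load=1e-15, v_dd=1.0, r_on=10e3, steps=100_000, t_end_taus=20.0):
    """Simulate C_L charging from 0 V towards V_DD through r_on.

    Returns (energy from supply, energy dissipated in r_on, energy stored on C_L).
    """
    tau = r_on * c_load
    dt = t_end_taus * tau / steps
    v_out = 0.0
    e_supply = 0.0     # energy drawn from V_DD
    e_resistor = 0.0   # energy turned into heat in the pull-up
    for _ in range(steps):
        i = (v_dd - v_out) / r_on
        e_supply += v_dd * i * dt
        e_resistor += i * i * r_on * dt
        v_out += i * dt / c_load
    e_cap = 0.5 * c_load * v_out ** 2
    return e_supply, e_resistor, e_cap

e_supply, e_resistor, e_cap = charge_capacitor()
print(e_supply, e_resistor, e_cap)
```

Rerunning with a different `r_on` changes how long the transition takes but not where the energy goes.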
2.8
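Node Transition Activity and Energy
Consider switching a CMOS gate for N clock cycles
– $E_N$: the energy consumed for N clock cycles
– $n(N)$: the number of 0→1 transitions in N clock cycles
$E_N = C_L \cdot V_{DD}^2 \cdot n(N)$
$E_{avg} = \lim_{N\to\infty} \frac{E_N}{N} = \left( \lim_{N\to\infty} \frac{n(N)}{N} \right) \cdot C_L \cdot V_{DD}^2$
$\alpha_{0\to1} = \lim_{N\to\infty} \frac{n(N)}{N}$
$E_{avg} = \alpha_{0\to1} \cdot C_L \cdot V_{DD}^2$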
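In practice the limit is estimated from a finite trace of node values. The following sketch counts 0→1 transitions in a short, hypothetical trace (one sample per clock edge) and applies the average-energy formula; the function names and the trace are illustrative only.

```python
def activity_factor(node_values):
    """Fraction of clock cycles in which the node makes a 0->1 transition."""
    transitions = sum(
        1 for prev, curr in zip(node_values, node_values[1:])
        if prev == 0 and curr == 1
    )
    cycles = len(node_values) - 1
    return transitions / cycles

def average_switching_energy(node_values, c_load, v_dd):
    """E_avg = alpha01 * C_L * V_DD^2, with alpha01 estimated from the trace."""
    return activity_factor(node_values) * c_load * v_dd ** 2

trace = [0, 1, 1, 0, 0, 1, 0, 1, 1]   # hypothetical node values
print(activity_factor(trace))
print(average_switching_energy(trace, c_load=10e-15, v_dd=1.0))
```

A longer trace gives a better estimate of the limit; this one has only 8 cycles.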
2.9
Lowering Switching Energy
$E_{sw} = \alpha_{0\to1} \cdot C_L \cdot V_{DD}^2$
Capacitance: Function of fan-out, wire length, transistor sizes
Supply Voltage: Has been dropping* with CMOS scaling
Activity factor: How often, on average, do nodes switch?
2.10
Switched Capacitance
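$C_L = C_{sw} = C_{par} + C_{out}$
For large fanouts, we may neglect the parasitic component
$C_{sw} \approx C_{out} = C_{wire} + C_{gate,i+1}$
[Figure: gate i (supply VDD,i) driving gate i+1 (supply VDD,i+1), with Cparasitic,i, Cwire, and Cgate,i+1 at the connecting node]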
2.11
MOS Capacitances
Gate-Channel Capacitance
– CGC = Cox·W·Leff (Off, Linear)
– CGC = (2/3)·Cox·W·Leff (Saturation)
Gate Overlap Capacitance
– CGSO = CGDO = CO·W (Always)
Junction/Diffusion Capacitance
– Cdiff = Cj·LS·W + Cjsw·(2LS + W) (Always)
Circuit design: Cgate, Cparasitic
Simple linear models (C ∝ W)
– Designers typically use C / unit width (fF/µm)
– 90 nm gpdk: 2.5 fF/µm
γ = Cpar / Cgate (typically γ < 1)
– 90 nm gpdk: γ = 0.61
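The linear model is easy to apply by hand or in a few lines of code. The sketch below reads the 90 nm gpdk figure as 2.5 fF per µm of gate width and uses γ = 0.61. The function names, the widths, and the zero default for wire capacitance are assumptions made for illustration.

```python
C_GATE_PER_UM = 2.5e-15   # F per um of width (90 nm gpdk figure, read as fF/um)
GAMMA = 0.61              # Cpar / Cgate (90 nm gpdk figure)

def gate_capacitance(width_um):
    """Input (gate) capacitance of a gate of the given width."""
    return C_GATE_PER_UM * width_um

def parasitic_capacitance(width_um):
    """Output (parasitic) capacitance of the same gate: Cpar = gamma * Cgate."""
    return GAMMA * gate_capacitance(width_um)

def switched_capacitance(driver_width_um, load_width_um, c_wire=0.0):
    """C_sw = C_par(driver) + C_wire + C_gate(load)."""
    return (parasitic_capacitance(driver_width_um)
            + c_wire
            + gate_capacitance(load_width_um))

# Hypothetical example: a 1 um driver loaded by 4 um of gate width, no wire.
c_sw = switched_capacitance(1.0, 4.0)
print(c_sw)
```

With a fanout of 4, the parasitic term contributes about 13% of the switched capacitance, which is why it is often neglected for large fanouts.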
2.12
Leakage Energy
When the gate is idle (keeping the state), an amount of energy is taken out of supply (energy source)
The sub-threshold leakage current is the dominant component
$E_{Leak} = I_{Leak}(S_{in}) \cdot V_{DD} / f_{clock}$
[Figure: CMOS gate (Vin, Vout) with load CL and supply VDD, shown for input states Sin = 0 and Sin = 1]
2.13
Sub-Threshold ID vs. VGS
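Physical model
$I_{DS} = I_0 \cdot e^{V_{GS} / (n \cdot kT/q)} \cdot \left( 1 - e^{-V_{DS} / (kT/q)} \right)$
$I_0 = \mu \cdot \frac{W}{L} \cdot \left( \frac{kT}{q} \right)^2 \cdot e^{-V_T / (n \cdot kT/q)}$
Empirical model
$I_{DS} = I_0 \cdot \frac{W}{W_0} \cdot 10^{(V_{GS} - V_T + \gamma \cdot V_{DS}) / S}$
$S = n \cdot \frac{kT}{q} \cdot \ln(10)$ [mV/dec]
– The $\gamma \cdot V_{DS}$ term models DIBL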
2.14
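Sub-Threshold ID vs. VGS
VDS: 0 to 0.5 V
Sub-threshold slope: 90 mV/dec (a 10x change in ID per 90 mV of VGS)
Lower VT: exponential increase in ID
$I_{DS} = I_0 \cdot \frac{W}{W_0} \cdot 10^{(V_{GS} - V_T + \gamma \cdot V_{DS}) / S}$
$S = n \cdot \frac{kT}{q} \cdot \ln(10)$
[Figure: ID (A), log scale from 10^-12 to 10^-4, vs. VGS (V), 0 to 1]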
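The empirical model can be evaluated directly. In the sketch below, the slope factor n = 1.5, the threshold voltage, the DIBL coefficient, and I0 are hypothetical values; n = 1.5 is chosen because it gives a slope close to the 90 mV/dec shown on the slide.

```python
import math

def subthreshold_slope(n=1.5, temp_k=300.0):
    """S = n * (kT/q) * ln(10), in volts per decade."""
    k_over_q = 8.617333e-5   # Boltzmann constant / electron charge, V/K
    return n * k_over_q * temp_k * math.log(10)

def subthreshold_current(v_gs, v_ds, v_t=0.35, gamma=0.1, i0=1e-7,
                         w_over_w0=1.0, s=0.090):
    """Empirical model: I_DS = I0 * (W/W0) * 10**((V_GS - V_T + gamma*V_DS) / S)."""
    return i0 * w_over_w0 * 10 ** ((v_gs - v_t + gamma * v_ds) / s)

slope = subthreshold_slope()
ratio_vgs = subthreshold_current(0.19, 0.5) / subthreshold_current(0.10, 0.5)
ratio_vt = (subthreshold_current(0.0, 0.5, v_t=0.26)
            / subthreshold_current(0.0, 0.5, v_t=0.35))
print(slope, ratio_vgs, ratio_vt)
```

Raising VGS by 90 mV and lowering VT by 90 mV have the same effect in this model: each multiplies the current by ten.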
2.15
Balancing Switching and Leakage Energy
Switching energy drops quadratically with VDD
Leakage energy reaches a minimum, then increases
– This is because fclock drops exponentially at low VDD
$E_{sw} = \alpha_{0\to1} \cdot C_L \cdot V_{DD}^2$
$E_{lk} = I_{lk}(S_{in}) \cdot V_{DD} / f_{clock}$
[Figure: Energy-VDD; normalized energy (log scale, 0.001 to 1) vs. VDD (0 to 1.2 V); curves: Switching, Leakage]
2.16
Total Energy has a Minimum
Total energy is limited by sub-threshold conduction
– Current doesn’t decrease, but delay increases rapidly
[Figure: Energy-VDD; normalized energy (log scale, 0.001 to 1) vs. VDD (0 to 1.2 V); curves: Total, Switching, Leakage; annotations mark 0.3 V and a 12x energy reduction]
Interesting result: only an order of magnitude in energy reduction is possible by VDD scaling!
Simulation parameters: 65 nm CMOS, activity = 0.1, logic depth = 10
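The existence of a minimum can be reproduced with a toy model. The sketch below is not the 65 nm simulation behind the plot: it assumes an EKV-style interpolation for the drain current (so that one expression covers both sub-threshold and strong inversion), hypothetical values for VT, n, and the DIBL coefficient, and a load capacitance normalized to 1. Only the activity and logic depth are taken from the slide. The model is meaningful only where the on-current is well above the leakage current, so the sweep starts at 0.1 V.

```python
import math

PHI_T = 0.026      # thermal voltage kT/q near room temperature, V
N = 1.5            # sub-threshold slope factor (assumed)
V_T = 0.35         # threshold voltage, V (assumed)
DIBL = 0.1         # DIBL coefficient gamma (assumed)
ACTIVITY = 0.1     # from the simulation parameters above
LOGIC_DEPTH = 10   # from the simulation parameters above

def drain_current(v_gs, v_ds):
    """Normalized current, smooth across the threshold (EKV-style; an assumption)."""
    x = (v_gs - V_T + DIBL * v_ds) / (2 * N * PHI_T)
    return math.log1p(math.exp(x)) ** 2

def energy_per_cycle(v_dd):
    """Return (switching, leakage) energy per cycle for one gate, with C_L = 1."""
    gate_delay = v_dd / drain_current(v_dd, v_dd)   # t ~ C_L * V_DD / I_on
    t_cycle = LOGIC_DEPTH * gate_delay              # 1 / f_clock
    e_sw = ACTIVITY * v_dd ** 2
    e_lk = drain_current(0.0, v_dd) * v_dd * t_cycle
    return e_sw, e_lk

voltages = [0.10 + 0.01 * i for i in range(111)]    # 0.10 V ... 1.20 V
totals = [sum(energy_per_cycle(v)) for v in voltages]
e_min = min(totals)
v_min = voltages[totals.index(e_min)]
print(f"minimum-energy VDD ~ {v_min:.2f} V, "
      f"{totals[-1] / e_min:.1f}x below the energy at {voltages[-1]:.2f} V")
```

With these assumed parameters the minimum falls below VT, in the sub-threshold region, and the achievable reduction is on the order of ten to a few tens. The exact numbers depend on the assumptions and should not be compared directly with the 65 nm result.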
2.17
Alpha-Power Model of the Drain Current
Basis for delay calculation, also useful for hand analysis [1]
Empirical model
– Curve fitting (MMSE)
– α is between 1 and 2
– In 90 nm, it is ~1.4 (it depends on VTH)
  ● Can fit to α = 1, but with what VTH?
$I_{DS} = \frac{1}{2} \cdot \mu \cdot C_{ox} \cdot \frac{W}{L} \cdot (V_{GS} - V_{TH})^{\alpha}$
[1] T. Sakurai and R. Newton, “Alpha-Power Law MOSFET Model and its Applications to CMOS Inverter Delay and Other Formulas,” IEEE J. Solid-State Circuits, vol. 25, no. 2, pp. 584-594, Apr. 1990.
[Figure: normalized ID (0 to 6) vs. VDS / VDD (0 to 1) for several VGS values; simulation compared with the model]
2.18
Alpha-Power-Based Delay Model
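Fitting parameters [2]: Von, αd, Kd
$Delay = \frac{K_d \cdot V_{DD}}{(V_{DD} - V_{on} - \Delta V_{TH})^{\alpha_d}} \cdot \left( \frac{W_{out}}{W_{in}} + \frac{W_{par}}{W_{in}} \right)$
[Figure: Delay / Delayref (0 to 3.5) vs. VDD / VDDref (0.5 to 1); Inv and NAND2 data compared with the model]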
[2] V. Stojanović et al., “Energy-Delay Tradeoffs in Combinational Logic using Gate Sizing and Supply Voltage Optimization,” in Proc. Eur. Solid-State Circuits Conf., Sept. 2002, pp. 211-214.
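The delay model is straightforward to evaluate once the fitting parameters are known. The sketch below uses hypothetical parameters (Von = 0.3, αd = 1.4, Kd = 1, with voltages normalized to a reference supply of 1); real values come from curve fitting to simulation data for a given technology.

```python
def alpha_power_delay(v_dd, w_out_over_w_in, w_par_over_w_in,
                      k_d=1.0, v_on=0.3, alpha_d=1.4, delta_v_th=0.0):
    """Delay = K_d*V_DD / (V_DD - V_on - dV_TH)**alpha_d * (W_out/W_in + W_par/W_in)."""
    overdrive = v_dd - v_on - delta_v_th
    if overdrive <= 0:
        raise ValueError("model is only valid for V_DD > V_on + delta_V_TH")
    return (k_d * v_dd / overdrive ** alpha_d
            * (w_out_over_w_in + w_par_over_w_in))

# Normalized delay vs. normalized supply for a fanout-of-4 gate (hypothetical).
ref = alpha_power_delay(1.0, 4.0, 0.61)
for v in (1.0, 0.9, 0.8, 0.7, 0.6, 0.5):
    print(v, round(alpha_power_delay(v, 4.0, 0.61) / ref, 2))
```

Because the supply-dependent factor and the fanout factor multiply, the normalized delay curve is the same for any fanout in this model.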
2.19
Alpha-Power-Based Delay Model
Fitting parameters: Von, αd, Kd
Effective fanout, heff
– heff = g · (VDD, ΔVTH) · h
$Delay = \frac{K_d \cdot V_{DD}}{(V_{DD} - V_{on} - \Delta V_{TH})^{\alpha_d}} \cdot \left( \frac{W_{out}}{W_{in}} + \frac{W_{par}}{W_{in}} \right)$
[Figure: Delay / Delayref (0 to 3.5) vs. VDD / VDDref (0.5 to 1); Inv and NAND2 data compared with the model]
2.20
Gate Delay as a Function of VDD
Delay increases exponentially in sub-threshold
[Figure: Delay-VDD; normalized delay (log scale, 1 to 100,000) vs. VDD (0 to 1.2 V)]
2.21
Energy-Delay Tradeoff
Assumptions: 65 nm technology, datapath activity = 0.1, logic depth = 10
Energy ↓ 10% 25% 2x 3x 5x 10x
Delay ↑ 7% 27% 2x 4x 10x 130x
[Figure: Energy-delay; normalized energy (log scale) vs. normalized delay (log scale, 1 to 10^5); curves: Total, Switching, Leakage; annotations mark a 12x energy reduction and a 1000x delay increase]
Hardly a tradeoff: a 1000x delay increase for a 12x energy reduction
Which operating point to choose?
2.22
PDP and EDP
Power-delay product (PDP) = $P_{avg} \cdot t_p = (C_L \cdot V_{DD}^2)/2$
– PDP is the average energy consumed per switching event (Watts * sec = Joule)
– Lower power design could simply be a slower design
Energy-delay product (EDP)
– EDP = $PDP \cdot t_p = P_{avg} \cdot t_p^2$
– EDP = average energy * the computation time required
– One can trade increased delay for lower E/op (e.g. via VDD scaling)
[Figure: normalized Energy (PDP), Delay, and Energy*Delay (EDP), 0 to 1.5, vs. normalized VDD]
2.23
Choosing Optimal VDD
Optimal VDD depends on the optimization goal
– VDD increases as we put more emphasis on delay
VDD|minE < VDD|minEDP < … < VDD|minD
[Figure: normalized Energy (PDP), Delay, and Energy*Delay (EDP), 0 to 1.5, vs. normalized VDD]
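The ordering of the optimal supply voltages can be illustrated with a grid search. The sketch below assumes switching energy only (E ∝ VDD²) and an alpha-power delay with hypothetical VTH = 0.3 V and α = 1.4. Leakage is ignored, so the minimum-energy point simply lands on the lower bound of the sweep.

```python
V_TH = 0.3     # threshold voltage, V (assumed)
ALPHA = 1.4    # alpha-power exponent (assumed)

def energy(v_dd):
    return v_dd ** 2                          # switching energy only, C_L = 1

def delay(v_dd):
    return v_dd / (v_dd - V_TH) ** ALPHA      # alpha-power delay, K = 1

def best_vdd(metric, v_lo=0.35, v_hi=1.2, steps=850):
    """Return the grid point in [v_lo, v_hi] that minimizes metric(v_dd)."""
    grid = [v_lo + (v_hi - v_lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=metric)

vdd_min_e = best_vdd(energy)
vdd_min_edp = best_vdd(lambda v: energy(v) * delay(v))
vdd_min_ed2 = best_vdd(lambda v: energy(v) * delay(v) ** 2)
print(vdd_min_e, vdd_min_edp, vdd_min_ed2)
```

For this simple model the EDP optimum has the closed form 3·VTH / (3 − α), and the optimum moves to a higher VDD as the exponent on delay grows.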
2.24
Energy-Delay Tradeoff
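Unified description of wide range of E and D targets
– Choose the operating point that best meets E-D constraints
Slope of the line indicates the emphasis on E or D
– $E \cdot D$ (slope 1), $E \cdot D^2$ (2), $E \cdot D^3$ (3), …, $E \cdot D^n$ (n)
– $E^2 \cdot D$ (slope 1/2), $E^3 \cdot D$ (1/3), …, $E^n \cdot D$ (1/n)
[Figure: Energy (Emin to Emax) vs. Delay (Dmin to Dmax) curve obtained by VDD scaling, with a line for each metric]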
2.25
Energy-Delay Optimization
Equivalent formulations
– Achieve the lowest energy under delay constraint
– Achieve the best performance under energy constraint
[Figure: Energy (Emin to Emax) vs. Delay (Dmin at fclkmax to Dmax at fclkmin); starting from the unoptimized design, curves for tuning by sizing, VDD, sizing & VDD, and sizing & VDD & VTH]
2.26
Circuit-Level Optimization
Objective: minimize E
– E = E(VDD, VTH, W)
Constraint: Delay
– D = D(VDD, VTH, W)
Tuning variables: VDD, VTH, W
Constraints
– VDDmin < VDD < VDDmax
– VTHmin < VTH < VTHmax
– Wmin < W
[Figure: Circuit Optimization block; inputs: tuning variables (VDD, VTH, W), circuit topology (A or B), number of bits, delay; output: Energy-Delay curves for topologies A and B]
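As a rough illustration of this formulation, the sketch below minimizes energy under a delay constraint for a single gate by brute-force grid search over VDD and W. It is a toy stand-in, not the optimizer used in the lecture: VTH is held fixed, capacitance is taken proportional to width, the delay parameters (Von = 0.3, αd = 1.4, Kd = 1) are hypothetical, and an upper bound on W is added only to keep the grid finite.

```python
GAMMA = 0.61   # Cpar / Cgate (90 nm gpdk value quoted earlier in the lecture)

def gate_delay(v_dd, w, w_out, v_on=0.3, alpha_d=1.4, k_d=1.0):
    """Alpha-power delay of a gate of width w driving a load of width w_out."""
    return k_d * v_dd / (v_dd - v_on) ** alpha_d * (w_out / w + GAMMA)

def gate_energy(v_dd, w, w_out):
    """Switching energy: input gate cap + parasitic cap + load, all ~ width."""
    return (w + GAMMA * w + w_out) * v_dd ** 2

def minimize_energy(d_max, w_out=16.0, v_min=0.5, v_max=1.2,
                    w_min=1.0, w_max=32.0):
    """Grid search for the lowest-energy (VDD, W) with delay <= d_max."""
    best = None
    n_v = round((v_max - v_min) / 0.01)
    n_w = round((w_max - w_min) / 0.25)
    for i in range(n_v + 1):                  # VDD grid, 10 mV steps
        v_dd = v_min + 0.01 * i
        for j in range(n_w + 1):              # W grid, 0.25 steps
            w = w_min + 0.25 * j
            if gate_delay(v_dd, w, w_out) <= d_max:
                e = gate_energy(v_dd, w, w_out)
                if best is None or e < best[0]:
                    best = (e, v_dd, w)
    return best   # (energy, VDD, W), or None if the constraint is infeasible

loose = minimize_energy(d_max=8.0)
tight = minimize_energy(d_max=4.0)
print(loose, tight)
```

Tightening the delay constraint shrinks the feasible set, so the minimum energy can only stay the same or go up. Sweeping `d_max` traces out an energy-delay curve for the chosen topology.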
2.27
Summary
The goal in algorithm design is to minimize the number of operations required to perform a task
– Once the number of operations is minimized, circuit-level implementation can further reduce energy by lowering supply voltage, switching activity, or gate capacitance
– There exists a well-defined minimum-energy point in CMOS technology due to parasitic leakage currents
– Considering energy alone is insufficient; the energy-performance tradeoff reveals how much energy reduction is possible given a performance constraint
– Energy and performance models with respect to gate size, supply voltage, and threshold voltage provide a basis for circuit optimization (finding the best energy-delay tradeoff)
2.28