Sp09 CMPEN 411 L14 S.1
CMPEN 411 VLSI Digital Circuits
Spring 2009
Lecture 14: Designing for Low Power
[Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, ©2003 J. Rabaey, A. Chandrakasan, B. Nikolic]

Reminders
Next lecture: Dynamic logic
- Reading assignment: Rabaey, et al., 6.3

Review: CMOS Power Equations
$P = C_L V_{DD}^2 f + t_{sc} V_{DD} I_{peak} f + V_{DD} I_{leak}$
- Dynamic power: $C_L V_{DD}^2 f$
- Short-circuit power: $t_{sc} V_{DD} I_{peak} f$
- Leakage power: $V_{DD} I_{leak}$
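As a rough illustration of the power equation, the sketch below evaluates the three terms separately. The function name `cmos_power` and every numeric value are assumptions chosen for illustration; none of them come from the lecture.

```python
def cmos_power(c_load, v_dd, freq, t_sc, i_peak, i_leak):
    """Return (dynamic, short_circuit, leakage, total) power in watts."""
    dynamic = c_load * v_dd ** 2 * freq           # C_L * VDD^2 * f
    short_circuit = t_sc * v_dd * i_peak * freq   # t_sc * VDD * I_peak * f
    leakage = v_dd * i_leak                       # VDD * I_leak
    return dynamic, short_circuit, leakage, dynamic + short_circuit + leakage


# Hypothetical operating point (not taken from the slides).
dyn, sc, leak, total = cmos_power(
    c_load=10e-15, v_dd=2.5, freq=100e6,
    t_sc=50e-12, i_peak=100e-6, i_leak=1e-9,
)
print(f"dynamic={dyn:.3e} W, short-circuit={sc:.3e} W, leakage={leak:.3e} W")
```

Only the dynamic term depends on the square of the supply voltage, which is why the later slides concentrate on reducing VDD and switching activity.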

Power and Energy Design Space

                    Constant Throughput/Latency                     Variable Throughput/Latency
Energy              Design Time           Non-active Modules        Run Time
Active (Dynamic)    Logic design          Clock gating              DFS, DVS (Dynamic Frequency,
                    Reduced Vdd                                     Voltage Scaling)
                    Transistor sizing
                    Multi-Vdd
Leakage (Standby)   Multi-VT              Sleep transistors         Variable VT
                    Stack effect          Multi-Vdd
                    Pin ordering          Variable VT
                                          Input control

Transistor Sizing for Minimum Energy
Device sizing COMBINED with supply voltage reduction is a very effective way to reduce the energy consumption of a logic network
Device sizing affects dynamic energy consumption
- The gain is largest for networks with large overall effective fan-outs ($F = C_L / C_{g,1}$)

Dynamic Power Consumption is Data Dependent
Switching activity, $P_{0\to1}$, has two components
- A static component: function of the logic topology
- A dynamic component: function of the timing behavior (glitching)

2-input NOR Gate
A  B  Out
0  0  1
0  1  0
1  0  0
1  1  0

With input signal probabilities $P_{A=1} = 1/2$ and $P_{B=1} = 1/2$
Static transition probability $P_{0\to1} = P_{out=0} \times P_{out=1} = P_0 \times (1 - P_0)$
NOR static transition probability = 3/4 x 1/4 = 3/16
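The NOR result can be checked by enumerating the truth table. The sketch below assumes independent inputs; the helper name `static_transition_probability` is an illustrative choice, not something defined in the lecture.

```python
from fractions import Fraction
from itertools import product


def static_transition_probability(gate, input_probs):
    """P(0->1) = P(out=0) * P(out=1), assuming independent inputs."""
    p_one = Fraction(0)
    for bits in product((0, 1), repeat=len(input_probs)):
        # Probability of this input combination.
        weight = Fraction(1)
        for bit, p in zip(bits, input_probs):
            weight *= p if bit else 1 - p
        if gate(*bits):
            p_one += weight
    return (1 - p_one) * p_one


def nor2(a, b):
    return int(not (a or b))


half = Fraction(1, 2)
nor_p01 = static_transition_probability(nor2, [half, half])
print(nor_p01)  # 3/16
```

Passing a different gate function or different input probabilities reuses the same enumeration.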

NOR Gate Transition Probabilities
Switching activity is a strong function of the input signal statistics
- $P_A$ and $P_B$ are the probabilities that inputs A and B are one
$P_{0\to1} = P_0 \times P_1 = (1 - (1 - P_A)(1 - P_B)) \, (1 - P_A)(1 - P_B)$
[Figure: two-input NOR gate with inputs A and B driving load $C_L$; plot of $P_{0\to1}$ versus $P_A$ and $P_B$, each from 0 to 1]

Transition Probabilities for Some Basic Gates
$P_{0\to1} = P_{out=0} \times P_{out=1}$
NOR:  $(1 - (1 - P_A)(1 - P_B)) \times (1 - P_A)(1 - P_B)$
OR:   $(1 - P_A)(1 - P_B) \times (1 - (1 - P_A)(1 - P_B))$
NAND: $P_A P_B \times (1 - P_A P_B)$
AND:  $(1 - P_A P_B) \times P_A P_B$
XOR:  $(1 - (P_A + P_B - 2 P_A P_B)) \times (P_A + P_B - 2 P_A P_B)$
[Figure: input A (signal probability 0.5) drives node X; X and input B (signal probability 0.5) drive an AND gate with output Z]
For X: $P_{0\to1} = P_0 \times P_1 = (1 - P_A) P_A$ = 0.5 x 0.5 = 0.25
For Z: $P_{0\to1} = P_0 \times P_1 = (1 - P_X P_B) P_X P_B$ = (1 - (0.5 x 0.5)) x (0.5 x 0.5) = 3/16
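The gate formulas above reduce to one rule: compute the probability that the output is one, then multiply it by its complement. A minimal sketch, assuming independent inputs (the function names `p_one` and `p01` are illustrative):

```python
def p_one(gate, pa, pb):
    """Probability that a two-input gate outputs 1, for independent inputs."""
    table = {
        "AND": pa * pb,
        "NAND": 1 - pa * pb,
        "OR": 1 - (1 - pa) * (1 - pb),
        "NOR": (1 - pa) * (1 - pb),
        "XOR": pa + pb - 2 * pa * pb,
    }
    return table[gate]


def p01(gate, pa, pb):
    """Static 0->1 transition probability: P(out=0) * P(out=1)."""
    p1 = p_one(gate, pa, pb)
    return (1 - p1) * p1


# Worked example from the slide: X and B both have signal probability 0.5.
p_x = 0.5
x_p01 = (1 - p_x) * p_x        # node X
z_p01 = p01("AND", p_x, 0.5)   # node Z
print(x_p01, z_p01)  # 0.25 0.1875
```

Here 0.1875 is the 3/16 quoted for Z.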
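
Another Example
[Figure: inputs A and B (signal probability 0.5 each) drive an OR gate with output X; X and B drive an AND gate with output Z]
For X: $P_{0\to1}$ = (1 - 0.5)(1 - 0.5) x (1 - (1 - 0.5)(1 - 0.5)) = 3/16
For Z, treating X and B as independent, with $P_X$ = 1 - (1 - 0.5)(1 - 0.5) = 3/4: $P_{0\to1}$ = (1 - 3/4 x 0.5) x (3/4 x 0.5) = 15/64 ≈ 0.23
Note that the signal probability of X (3/4), not its transition probability (3/16), is what enters the calculation for Z.

Inter-signal Correlations
Determining switching activity is complicated by the fact that signals exhibit correlation in space and time
- Reconvergent fan-out: B reaches Z both directly and through X
Have to use conditional probabilities
$P(Z=1) = P(B=1) \cdot P(X=1 \mid B=1)$
Notice that Z = (A or B) and B = AB or B = B, so $P(Z=1)$ = 1/2 and $P_{0\to1}$ should be (and is) 1/2 x 1/2 = 1/4, not the value obtained by assuming X and B are independent.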
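The reconvergent fan-out result can be checked by enumerating the primary inputs instead of multiplying node probabilities. The sketch below assumes X = A or B and Z = X and B, with A and B independent and equally likely to be 0 or 1.

```python
from fractions import Fraction
from itertools import product

# The four (A, B) combinations are equally likely.
states = list(product((0, 1), repeat=2))
weight = Fraction(1, 4)

p_b = sum(weight for a, b in states if b)
p_x_and_b = sum(weight for a, b in states if (a or b) and b)
p_x_given_b = p_x_and_b / p_b     # X = A or B is always 1 when B = 1

p_z = p_b * p_x_given_b           # P(Z=1) = P(B=1) * P(X=1 | B=1)
p01_z = (1 - p_z) * p_z
print(p_x_given_b, p_z, p01_z)  # 1 1/2 1/4
```

Enumeration over primary inputs captures the correlation automatically, at a cost that grows exponentially with the number of inputs.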
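
Logic Restructuring
Logic restructuring: changing the topology of a logic network to reduce transitions
AND: $P_{0\to1} = P_0 \times P_1 = (1 - P_A P_B) \times P_A P_B$
[Figure: F = ABCD built from two-input AND gates, all inputs with signal probability 0.5]
- Chain implementation (W = AB, X = WC, F = XD): $P_{0\to1}$ = (1 - 0.25) x 0.25 = 3/16 at W, 7/64 at X, 15/256 at F
- Tree implementation (Y = AB, Z = CD, F = YZ): $P_{0\to1}$ = 3/16 at Y, 3/16 at Z, 15/256 at F
Chain implementation has a lower overall switching activity than the tree implementation for random inputs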
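The chain and tree totals can be reproduced by propagating signal probabilities through the AND gates. This sketch assumes the node naming W, X (chain) and Y, Z (tree) and ignores glitching; `and_node` is an illustrative helper.

```python
from fractions import Fraction


def and_node(pa, pb):
    """Return (signal probability, P(0->1)) at the output of a 2-input AND."""
    p1 = pa * pb
    return p1, (1 - p1) * p1


half = Fraction(1, 2)

# Chain: W = AB, X = WC, F = XD
p_w, act_w = and_node(half, half)
p_x, act_x = and_node(p_w, half)
_, act_f_chain = and_node(p_x, half)
chain_total = act_w + act_x + act_f_chain

# Tree: Y = AB, Z = CD, F = YZ
p_y, act_y = and_node(half, half)
p_z, act_z = and_node(half, half)
_, act_f_tree = and_node(p_y, p_z)
tree_total = act_y + act_z + act_f_tree

print(chain_total, tree_total)  # 91/256 111/256
```

The output node F has the same activity in both topologies; the difference comes entirely from the internal nodes.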

Input Ordering
Which is better wrt transition probabilities?
[Figure: F = ABC built from two two-input AND gates, with signal probabilities $P_A$ = 0.5, $P_B$ = 0.2, $P_C$ = 0.1]
- Ordering 1 (X = AB, F = XC): $P_{0\to1}$ at X = (1 - 0.5 x 0.2) x (0.5 x 0.2) = 0.09
- Ordering 2 (X = BC, F = XA): $P_{0\to1}$ at X = (1 - 0.2 x 0.1) x (0.2 x 0.1) = 0.0196
Beneficial to postpone the introduction of signals with a high transition rate (signals with signal probability close to 0.5)
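The comparison can be extended to every choice of first-stage input pair. The sketch below assumes two-input AND gates and independent inputs; the dictionary `P` holds the slide's signal probabilities and the helper names are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def and_p01(pa, pb):
    """P(0->1) at the output of a two-input AND with independent inputs."""
    p1 = pa * pb
    return (1 - p1) * p1


# Signal probabilities from the slide.
P = {"A": Fraction(1, 2), "B": Fraction(1, 5), "C": Fraction(1, 10)}


def internal_activity(pair):
    """Activity at the internal node X when `pair` feeds the first gate."""
    first, second = pair
    return and_p01(P[first], P[second])


for pair in combinations("ABC", 2):
    print(pair, float(internal_activity(pair)))

best = min(combinations("ABC", 2), key=internal_activity)
print("lowest activity at X:", best)  # ('B', 'C')
```

The output node F is the same function in every ordering, so only the internal node X changes, and it is quietest when A (probability 0.5) is introduced last.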

Glitching in Static CMOS Networks
Gates have a nonzero propagation delay, resulting in spurious transitions or glitches (dynamic hazards)
- Glitch: a node exhibits multiple transitions in a single cycle before settling to the correct logic value
[Figure: inputs A and B drive a gate with output X; X and C drive a gate with output Z; unit-delay waveforms for A, B, C, X, and Z as ABC changes from 101 to 000]
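A unit-delay simulation makes the glitch visible. The gate types are not recoverable from the text, so this sketch assumes both gates are NOR (X = NOR(A, B), Z = NOR(X, C)); the function names are illustrative.

```python
def nor(a, b):
    """Two-input NOR."""
    return int(not (a or b))


def unit_delay_trace(before, after, steps=3):
    """Trace (t, X, Z) after inputs ABC switch from `before` to `after` at t = 0.

    Assumed circuit: X = NOR(A, B), Z = NOR(X, C), one time unit per gate.
    """
    a, b, c = before
    x = nor(a, b)          # settled values before the input change
    z = nor(x, c)
    a, b, c = after
    trace = [(0, x, z)]
    for t in range(1, steps + 1):
        # Each gate sees the values its inputs had one time unit earlier.
        x, z = nor(a, b), nor(x, c)
        trace.append((t, x, z))
    return trace


def transitions(values):
    """Count the value changes in a sequence of logic levels."""
    return sum(1 for prev, cur in zip(values, values[1:]) if prev != cur)


trace = unit_delay_trace(before=(1, 0, 1), after=(0, 0, 0))
z_values = [z for _, _, z in trace]
print(trace)
print("Z transitions:", transitions(z_values))
```

Under these assumptions Z starts and ends at 0 but pulses to 1 in between, so it makes two transitions where a zero-delay model predicts none.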

Glitching in an RCA
[Figure: ripple-carry adder with carry input Cin and sum outputs S0 to S15; plot of sum output voltage (V) versus time (ps) for Cin, S0, S1, S2, S3, S4, S5, S10, and S15]

Balanced Delay Paths to Reduce Glitching
Glitching is due to a mismatch in the path lengths in the logic network; if all input signals of a gate change simultaneously, no glitching occurs
So equalize the lengths of timing paths through logic
[Figure: gates F1, F2, and F3 connected as a chain (input arrival times 0, 0, 0, 0, 1, 2) and as a tree (input arrival times 0, 0, 0, 0, 1, 1)]
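The path-length mismatch can be quantified as the spread in arrival times at each gate's inputs. The sketch below assumes unit gate delays, a chain and a tree of three gates F1 to F3, and hypothetical primary inputs named a to d.

```python
def arrival_skew(netlist, primary_inputs, gate_delay=1):
    """Return per-gate input skew for a netlist listed in topological order.

    netlist maps each gate name to the list of signals driving it.
    """
    arrival = {name: 0 for name in primary_inputs}  # inputs switch at t = 0
    skew = {}
    for gate, fanin in netlist.items():
        times = [arrival[signal] for signal in fanin]
        skew[gate] = max(times) - min(times)         # mismatch seen by this gate
        arrival[gate] = max(times) + gate_delay
    return skew


inputs = ["a", "b", "c", "d"]  # hypothetical input names
chain = {"F1": ["a", "b"], "F2": ["F1", "c"], "F3": ["F2", "d"]}
tree = {"F1": ["a", "b"], "F2": ["c", "d"], "F3": ["F1", "F2"]}

chain_skew = arrival_skew(chain, inputs)
tree_skew = arrival_skew(tree, inputs)
print(chain_skew)  # {'F1': 0, 'F2': 1, 'F3': 2}
print(tree_skew)   # {'F1': 0, 'F2': 0, 'F3': 0}
```

A skew of zero at every gate means all of its inputs change together, which is the no-glitch condition stated above.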

Dynamic Power as a Function of VDD
Decreasing the VDD
- decreases dynamic energy consumption (quadratically)
- but increases gate delay (decreases performance)
[Figure: normalized propagation delay tp (1 to 5.5) versus VDD (0.8 V to 2.4 V)]
Determine the critical path(s) at design time and use high VDD for the transistors on those paths for speed. Use a lower VDD on the other gates, especially those that drive large capacitances (as this yields the largest energy benefits).
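The quadratic dependence means that modest supply reductions give large energy savings. The sketch below only evaluates the ratio of dynamic switching energies at two supply voltages; the voltages are hypothetical, and the delay penalty is not modeled because the lecture gives it only as a plot.

```python
def dynamic_energy_ratio(v_new, v_old):
    """Ratio of dynamic switching energy (C_L * VDD^2) at two supply voltages."""
    return (v_new / v_old) ** 2


# Hypothetical supply pair: halving VDD quarters the dynamic energy.
ratio = dynamic_energy_ratio(1.25, 2.5)
print(ratio)  # 0.25
```

The load capacitance cancels in the ratio, so the same saving applies per transition to every gate moved to the lower supply; the absolute saving is largest for gates driving large capacitances.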

Multiple VDD Considerations
How many VDD? Two is becoming common
- Many chips already have two supplies (one for core and one for I/O)
When combining multiple supplies, level converters are required whenever a module at the lower supply drives a gate at the higher supply (step-up)
- If a gate supplied with VDDL drives a gate at VDDH, the PMOS never turns off
- The cross-coupled PMOS transistors do the level conversion
- The NMOS transistors operate on a reduced supply
Level converters are not needed for a step-down change in voltage
Overhead of level converters can be mitigated by doing conversions at register boundaries and embedding the level conversion inside the flipflop (see Figure 11.47)
[Figure: level converter with cross-coupled PMOS transistors connected to VDDH, input Vin from the VDDL domain, and output Vout]
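The step-up rule can be applied mechanically to a netlist. In the sketch below, the gate names, supply values, and the helper `step_up_edges` are all hypothetical and serve only to illustrate the rule.

```python
def step_up_edges(edges, supply):
    """Return the (driver, receiver) pairs that need a level converter.

    A converter is needed only when the driver's supply is lower than
    the receiver's supply (step-up); step-down connections need none.
    """
    return [(src, dst) for src, dst in edges if supply[src] < supply[dst]]


# Hypothetical netlist and supply assignment in volts.
supply = {"g1": 2.5, "g2": 1.8, "g3": 1.8, "g4": 2.5}
edges = [("g1", "g2"), ("g2", "g3"), ("g3", "g4")]

converters = step_up_edges(edges, supply)
print(converters)  # [('g3', 'g4')]
```

Only the g3 to g4 connection crosses from the lower to the higher supply, so it is the only place a converter is required.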

Dual-Supply Inside a Logic Block
Minimum energy consumption is achieved if all logic paths are critical (have the same delay)
Clustered voltage-scaling
- Each path starts with VDDH and switches to VDDL (gray logic gates) when delay slack is available
- Level conversion is done in the flipflops at the end of the paths

Stack Effect
Subthreshold leakage is a function of the circuit topology and the value of the inputs
$V_T = V_{T0} + \gamma \left( \sqrt{|-2\phi_F + V_{SB}|} - \sqrt{|-2\phi_F|} \right)$
where $V_{T0}$ is the threshold voltage at $V_{SB} = 0$; $V_{SB}$ is the source-bulk (substrate) voltage; $\gamma$ is the body-effect coefficient
[Figure: two-input gate with inputs A and B, output Out, and internal node voltage $V_X$ between the stacked transistors]
Leakage is least when A = B = 0
Leakage reduction due to stacked transistors is called the stack effect
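The body-effect equation shows one reason stacking helps: when both stacked transistors are off, the internal node $V_X$ sits above ground, giving the upper transistor a positive $V_{SB}$ and therefore a higher threshold. The sketch below evaluates the equation; the device parameters are illustrative assumptions, not values from the lecture.

```python
import math


def threshold_voltage(vt0, gamma, phi_f, v_sb):
    """V_T = V_T0 + gamma * (sqrt(|-2*phi_F + V_SB|) - sqrt(|-2*phi_F|))."""
    return vt0 + gamma * (
        math.sqrt(abs(-2 * phi_f + v_sb)) - math.sqrt(abs(-2 * phi_f))
    )


# Assumed NMOS parameters (illustrative only).
VT0, GAMMA, PHI_F = 0.43, 0.4, -0.3

vt_grounded = threshold_voltage(VT0, GAMMA, PHI_F, v_sb=0.0)
vt_stacked = threshold_voltage(VT0, GAMMA, PHI_F, v_sb=0.1)
print(vt_grounded, vt_stacked)
```

With the source raised by 0.1 V the threshold increases, and because subthreshold current depends exponentially on the threshold, the leakage of the stacked transistor drops.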

Leakage as a Function of Design Time VT
Reducing the VT increases the sub-threshold leakage current (exponentially)
- A 90 mV reduction in VT increases leakage by an order of magnitude
But, reducing VT decreases gate delay (increases performance)
[Figure: ID (A) versus VGS (V), from 0 to 1 V, for VT = 0.4 V and VT = 0.1 V]
Determine the critical path(s) at design time and use low VT devices on the transistors on those paths for speed. Use a high VT on the other logic for leakage control.
- A careful assignment of VT's can reduce the leakage by as much as 80%
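The order-of-magnitude rule can be turned into a quick estimate. The sketch below assumes the 90 mV per decade figure holds over the whole range of interest; the function name and parameter are illustrative.

```python
def leakage_increase(vt_reduction_mv, mv_per_decade=90):
    """Factor by which subthreshold leakage grows when VT drops by the given amount."""
    return 10 ** (vt_reduction_mv / mv_per_decade)


one_decade = leakage_increase(90)
two_decades = leakage_increase(180)
print(one_decade, two_decades)  # 10.0 100.0

# The plotted pair VT = 0.4 V and VT = 0.1 V differs by 300 mV.
print(leakage_increase(300))
```

Under this assumption, a 300 mV threshold reduction costs a little over three decades of leakage, which is why low-VT devices are reserved for the critical paths.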

Dual-Thresholds Inside a Logic Block
Minimum energy consumption is achieved if all logic paths are critical (have the same delay)
Use lower threshold on timing-critical paths
- Assignment can be done on a per gate or transistor basis; no clustering of the logic is needed
- No level converters are needed

IBM Cu11/Cu08 Blue Logic Library
ASIC Cu11 (130nm) Library: Dual-vt library
- 2690 total cells in standard cell library
- Nominal Vt level (~300 mV)
- Low Vt level (~210 mV)
- Low-vt version has same physical footprint
- ~15% improvement in gate delay
- ~10x increase in leakage power
ASIC Cu08 (90nm) Library: Multi-vt library
- 2118 total cells in standard cell library
- Intermediate-vt (AVT) and Low-vt (LVT) version of each cell
- Two more vt levels being planned (very low vt and high vt)

An example to summarize all design-time techniques
[Figure: example logic network with the critical path marked]

Design Time Low Power Techniques
[Figure: example network with gates marked Lower Vdd or Higher Vdd, plus a level converter]

Design Time Low Power Techniques
[Figure: example network with gates marked Higher Vth or Lower Vth]

Design Time Low Power Techniques
Stack Forcing
[Figure: gate with input In and output Out, shown with transistors of width W and with stacked transistors of width 1/2 W]

Low Power Techniques – Interaction w/ each other
[Figure: example network with gates marked Higher Vth or Lower Vth]
Apply high Vth and size-up to recover speed

Next Lecture and Reminders
Next lecture: Dynamic logic
- Reading assignment: Rabaey, et al., 6.3