On the Rules of Low-Power Design (and How to Break Them)
Prof. Todd Austin
Advanced Computer Architecture Lab, University of Michigan
Once upon a time…
Rules of Low-Power Design
1. Minimize switching activity
2. Design for lower load capacitance
3. Reduce frequency
4. Reduce leakage
and the most important of all:
5. Decrease supply voltage!
$P = \alpha C V^2 f + V I_{leak}$
[Figure: supply-voltage stack labeled Vth, 0.8 V, and 1.2 V, showing the critical voltage (determined by the critical path) together with the process, ambient, and noise margins]
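To see why the supply-voltage rule is singled out, the power equation on this slide can be evaluated directly. The sketch below is illustrative only: the function name and every numeric value are assumptions chosen for the example, not figures from the talk, and frequency and leakage current are held fixed while the voltage changes.

```python
def total_power(alpha, c_load, vdd, freq, i_leak):
    """P = alpha*C*V^2*f + V*I_leak (dynamic plus leakage power, in watts)."""
    dynamic = alpha * c_load * vdd ** 2 * freq   # switching power, quadratic in Vdd
    leakage = vdd * i_leak                       # leakage power, linear in Vdd
    return dynamic + leakage

# Hypothetical operating point (values chosen only for illustration).
nominal = total_power(alpha=0.1, c_load=1e-9, vdd=1.2, freq=200e6, i_leak=1e-3)
scaled = total_power(alpha=0.1, c_load=1e-9, vdd=0.8, freq=200e6, i_leak=1e-3)
print(f"power at 1.2 V: {nominal * 1e3:.1f} mW")
print(f"power at 0.8 V: {scaled * 1e3:.1f} mW")
```

Because the dynamic term scales with the square of the voltage, dropping from 1.2 V to 0.8 V in this toy setup cuts total power by more than half, which is a larger effect than an equal relative change in any single linear term.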
Overclockers Break the Rules
[Figure: supply-voltage stack labeled Vth, 0.8 V, and 1.2 V, with the process, ambient, and noise margins]
Goals of This Presentation
Review some of the rules of low-power design
Show how clever designs can break these rules
  Razor resilient circuits
  Subliminal subthreshold-voltage processor
Highlight the benefits of taking a rule-breaking approach to technical research
Investigating Overclocking
Two Slow Pipelines Check a Fast Pipeline
[Figure: test circuit. A fast pipeline (18x18 multiplier, clk, 90 MHz) is checked against Slow Pipeline A and Slow Pipeline B (18x18 multipliers, clk/2, 45 MHz each). Inputs come from 48-bit LFSRs, the 36-bit outputs are stabilized and compared (!=), and mismatches increment 40-bit error counters]
Observation: Voltage Margins Are Plentiful
[Plot: 18x18-bit multiplier block at 90 MHz and 27 C. Error rate (log scale, 0.0000001% to 100%) versus supply voltage (1.14 V to 1.78 V). Annotations: zero margin at 1.54 V; environmental margin at 1.69 V]
35% energy savings with 1.3% errors
20% energy savings
One error every 20 seconds!
Margin grows if a few (~1%) errors can be tolerated
Razor Resilient Circuits
[Figure: Razor flip-flop. A main flip-flop clocked by clk is paired with a shadow latch clocked by the delayed clock clk_del]
Double-sampling, metastability-tolerant latches detect timing errors
Second sample is correct-by-design
Microarchitectural support restores program state
  Timing errors treated like branch mispredictions
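The double-sampling idea can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the Razor circuit itself: the names `RazorSample` and `razor_sample`, the time units, and the example numbers are all hypothetical, and metastability is not modeled.

```python
from dataclasses import dataclass

@dataclass
class RazorSample:
    main_value: int    # value captured by the main flip-flop at the clock edge
    shadow_value: int  # value captured by the shadow latch on the delayed clock
    error: bool        # True when the two samples disagree

def razor_sample(old_value, new_value, arrival_time, clk_period, clk_delay):
    """Behavioral model of one double-sampling flip-flop.

    Assumes the design guarantees arrival_time <= clk_period + clk_delay,
    so the shadow (second) sample is always the correct one.
    """
    if arrival_time > clk_period + clk_delay:
        raise ValueError("arrival exceeds the shadow-latch window")
    # The main flip-flop sees the new value only if it arrived before the edge.
    main = new_value if arrival_time <= clk_period else old_value
    shadow = new_value
    return RazorSample(main, shadow, main != shadow)

on_time = razor_sample(old_value=0, new_value=1, arrival_time=4.0,
                       clk_period=5.0, clk_delay=2.5)
late = razor_sample(old_value=0, new_value=1, arrival_time=6.0,
                    clk_period=5.0, clk_delay=2.5)
print(on_time.error, late.error)
```

In this model a late-arriving value raises `error`, and the shadow value is what recovery would restore. A late arrival that happens to equal the old value raises no error, since the main sample is still correct.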
Distributed Pipeline Recovery
[Figure: pipeline with Razor flip-flops between the IF, ID, EX, MEM (read-only), and WB (reg/mem) stages, followed by a stabilizer flip-flop. Each stage raises an error signal and inserts a bubble; flush control propagates flushID back through the stages and the PC is recovered. A cycle-by-cycle animation (cycles 0 to 9, inst1 to inst8) shows instructions re-entering the pipeline after a timing error]
Builds on existing branch prediction framework
Multiple cycle penalty for timing failure
Scalable design as all communication is local
Razor Prototype Design
Six-stage 64-bit Alpha pipeline
  200 MHz in 0.18 µm @ 1.8 V
  Tunable via software from 200 to 50 MHz and from 1.8 to 1.1 V
32-entry, 3-port RF; 8K I-Cache / 8K D-Cache
Branch-not-taken branch predictor
Full scan capability
Razor overhead:
  192 Razor FFs out of 2408 (9%)
  Error-free power overhead:
    Razor flip-flops: < 1%
    Short path buffer: 2.1%
  Recovery power overhead:
    18x an instruction, for pipeline recovery
[Figure: prototype die photo, 3.3 mm x 3 mm, showing the I-Cache, D-Cache, register file, and the IF, ID, EX, MEM, and WB stages]
Razor Prototype Testbed
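Razor-Based Dynamic Voltage Scaling
[Figure: control loop. Error signals from the pipeline are summed (Σ) into E_sample and subtracted from the reference E_ref. The difference, $E_{diff} = E_{ref} - E_{sample}$, feeds the voltage control function, which sets Vdd through the voltage regulator]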
Current design utilizes a very simple proportional control function
Control algorithm implemented in software
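A proportional controller of the kind described here can be sketched in a few lines. This is not the prototype's software: the function name, the gain, and the choice of percent as the error-rate unit are assumptions, and the clamp limits simply reuse the 1.1 V to 1.8 V tuning range quoted for the prototype.

```python
def next_vdd(vdd, e_ref, e_sample, gain=0.02, v_min=1.1, v_max=1.8):
    """One step of a proportional voltage controller.

    e_ref and e_sample are error rates in percent; gain is in volts per
    percent (a hypothetical value). E_diff = E_ref - E_sample, as on the slide.
    """
    e_diff = e_ref - e_sample
    # Fewer errors than the target (e_diff > 0): lower the voltage.
    # More errors than the target (e_diff < 0): raise it.
    candidate = vdd - gain * e_diff
    return min(v_max, max(v_min, candidate))

print(next_vdd(1.60, e_ref=1.0, e_sample=0.0))  # below target error rate: voltage drops
print(next_vdd(1.60, e_ref=1.0, e_sample=3.0))  # above target error rate: voltage rises
```

The sign convention follows the slide's definition of E_diff: a positive difference means the pipeline has headroom, so the controller trades it for a lower supply voltage.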
Example Voltage Controller Response
Two-minute snapshot of a 15-minute run
[Plot: controller output voltage (V, 1.48 to 1.80) and percentage error rate (0 to 10) versus time (seconds, 20 to 140)]
Effects of Razor DVS
[Plot: energy and pipeline throughput (IPC) versus decreasing supply voltage. Curves: energy of processor operations, E_proc; energy of pipeline recovery, E_recovery; total energy, $E_{total} = E_{proc} + E_{recovery}$, with its optimum marked; and energy of the processor without Razor support. Error-rate markers at 1% and 50%]
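The existence of an optimal total energy can be reproduced with a toy model of $E_{total} = E_{proc} + E_{recovery}$. Everything numeric below is an assumption for illustration: the exponential error-rate model and its parameters are hypothetical, E_proc is taken as purely quadratic in Vdd, and the recovery cost of 18 instruction-energies borrows the prototype's recovery overhead figure.

```python
import math

def error_rate(vdd, v_first_fail=1.54, steepness=40.0):
    """Hypothetical model: no errors at or above v_first_fail,
    exponentially more errors as Vdd drops below it (capped at 1.0)."""
    if vdd >= v_first_fail:
        return 0.0
    return min(1.0, 1e-7 * math.exp(steepness * (v_first_fail - vdd)))

def total_energy(vdd, v_nominal=1.8, recovery_cost=18.0):
    """Normalized energy per instruction: E_total = E_proc + E_recovery."""
    e_proc = (vdd / v_nominal) ** 2                        # quadratic in Vdd
    e_recovery = error_rate(vdd) * recovery_cost * e_proc  # paid only on errors
    return e_proc + e_recovery

voltages = [1.80 - 0.01 * i for i in range(71)]  # sweep 1.80 V down to 1.10 V
best_vdd = min(voltages, key=total_energy)
print(f"lowest total energy near {best_vdd:.2f} V")
```

With these made-up parameters the minimum falls well below the first-failure voltage, where the error rate is a fraction of a percent: the quadratic savings in E_proc outweigh the recovery cost until errors become frequent.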
Razor Also Improves Yield
[Plot: voltage at 0.1% error rate versus voltage at first failure for individual chips, both axes 1.4 V to 1.8 V, with linear fit y = 0.78685x + 0.22117]
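The linear fit from the plot can be applied to a single chip as a quick worked example. The axis assignment is inferred from the labels (x is the voltage at first failure, y is the voltage at a 0.1% error rate), and the function name and the 1.6 V input are hypothetical.

```python
def voltage_at_0p1pct_error(v_first_failure):
    """Linear fit quoted on the slide: y = 0.78685 * x + 0.22117."""
    return 0.78685 * v_first_failure + 0.22117

v_fail = 1.6  # hypothetical chip whose first timing failure appears at 1.6 V
v_tolerant = voltage_at_0p1pct_error(v_fail)
print(f"{v_fail:.2f} V -> {v_tolerant:.3f} V ({v_fail - v_tolerant:.3f} V lower)")
```

Under this reading of the fit, a chip that first fails at 1.6 V could run roughly 0.12 V lower if a 0.1% error rate is tolerated and corrected.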
How Razor Breaks the Rules
Traditional worst-case design techniques must observe margin rules for reliable operation
Incorporating timing-error correction mechanisms allows margins to be erased
Infrequent use of critical paths allows for even deeper cuts in Vdd
[Figure: supply-voltage stack labeled Vth, 0.8 V, and 1.2 V, with the process, ambient, and noise margins]
Back to the Rules
1. Minimize switching activity
2. Design for lower load capacitance
3. Reduce frequency
4. Reduce leakage
and the most important of all:
5. Decrease supply voltage!
$P = \alpha C V^2 f + V I_{leak}$
[Figure: supply-voltage stack labeled Vth, 0.8 V, and 1.2 V, showing the critical voltage (determined by the critical path) together with the process, ambient, and noise margins]
Subthreshold Circuits Break The Rules
Static logic still works below Vth
  Differences in Ileak continue to (dis)charge outputs
  But diminished Ion/Ioff results in big delays
Approach works if the apps are not too demanding
[Figure: two CMOS inverters (P and N transistors, IN and OUT). Superthreshold: signals swing between 1.2 V and 0 V. Subthreshold: signals swing between 0.2 V and 0 V]
Sensing Applications
  Security
  Environmental
  Industrial
  Biomedical
Sensor Processing Data Rates
[Figure: sensor processor with sensing, communication, computation, storage, and power supply blocks]
Sensing Performance Demands are Low
Platform     Speed (Hz)   Voltage (V)
ARM 1020T    325M         1.2
ARM 920T     250M         1.2
ARM 7TDMI    133M         1.2
ARM 720T     100M         1.2
[Plot: xRT (number of times faster than real time, log scale from 1 to 10,000) for the four platforms; bar values 2965.01, 3943.47, 8036.77, and 8296.37]
Fast Growing Leakage Complicates Design
Energy per instruction = energy per cycle x cycles per instruction:
$E_{inst} = E_{cycle} \cdot CPI$
$E_{cycle} = N\left(\tfrac{1}{2}\alpha C_s V_{dd}^2 + V_{dd} I_{leak} t_{clk}\right)$
where
  $\alpha$: activity factor, the average number of transistor switches per transistor per cycle
  $C_s$: total circuit capacitance
  $V_{dd}$: supply voltage
  $I_{leak}$: leakage current
  $t_{clk}$: clock period
Impact of voltage reduction
Regime           Ileak      tclk       Eleak      Edyn      Ecycle
Superthreshold   ⇓ linear   ⇑ linear   ~const.    ⇓ quad.   ⇓ quad.
Subthreshold     ⇓ linear   ⇑ exp.     ⇑ ~exp.    ⇓ quad.   ??? (tension)
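The tension between falling dynamic energy and rising leakage energy in the subthreshold regime can be made concrete with a numerical sweep of the E_cycle equation. The sketch below is an assumption-laden illustration: all device parameters, the linear model for Ileak, the exponential model for tclk, and the 0.15 V to 0.60 V sweep range are hypothetical, and N is treated as a plain scale factor.

```python
import math

def cycle_energy(vdd, n=1.0, alpha=0.1, c_s=1e-15,
                 i_ref=1e-12, t_ref=5e-8, v_ref=0.5, slope=0.04):
    """E_cycle = N * (0.5*alpha*C_s*Vdd^2 + Vdd*I_leak*t_clk), in joules."""
    i_leak = i_ref * vdd / v_ref                     # falls linearly with Vdd
    t_clk = t_ref * math.exp((v_ref - vdd) / slope)  # grows exponentially as Vdd drops
    e_dyn = 0.5 * alpha * c_s * vdd ** 2             # dynamic energy, quadratic in Vdd
    e_leak = vdd * i_leak * t_clk                    # leakage energy over one cycle
    return n * (e_dyn + e_leak)

def instruction_energy(vdd, cpi, **kwargs):
    """E_inst = E_cycle * CPI."""
    return cycle_energy(vdd, **kwargs) * cpi

voltages = [0.15 + 0.01 * i for i in range(46)]  # sweep 0.15 V to 0.60 V
v_min = min(voltages, key=cycle_energy)
print(f"energy-optimal supply voltage in this model: {v_min:.2f} V")
```

Above the minimum the quadratic dynamic term dominates, so lowering Vdd saves energy. Below it the exponentially lengthening clock period lets leakage accumulate faster than the dynamic term shrinks, so energy per cycle rises again.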
Lessons from Architectural Studies
To minimize energy at subthreshold voltages, architects must:
  Minimize CPI, to reduce energy per instruction
  Maximize transistor utility, to reduce Vmin and energy per cycle
  Minimize area, to reduce leakage energy per cycle
Winning designs tend to be compromise designs that balance area, transistor utility, and CPI
Memory comprises the single largest factor of leakage energy; therefore, efficient designs must reduce memory storage requirements
Subliminal Architectural Overview
[Figure: block diagram with three stages (IF/ID, EX/MEM, WB). Blocks: Imem (4x16x2x12), Dmem (128x8), prefetch buffer (2x2x12), register file, scheduler, 32-bit timer, page control, OpA control, OpB control, μOperation decoder, register write control, jump control, ALU, flag control (carry, zero), fetch control, and external interrupts. Bus widths are labeled 8, 12, and 24]
First Subliminal Chip
[Figure: annotated die photo. Labeled regions: Subliminal processors, large solar cell, solar cell for processor, solar cell for adders, solar cell for discrete cells, level converter arrays, discrete adders, discrete cells, mux-based memories, custom memories, test module, and test memory]
Pareto Analysis of Sensor Network Processors
[Plot: energy per instruction (pJ, 0 to 24) versus performance (MIPS, log scale from 0.01 to 10) for Subliminal (Michigan), Hempstead (Harvard), CleverDust (Berkeley), and SNAP/LE (Cornell). Annotated points: 0.85pJ/[email protected] 4MIPS4 and 2.25pJ/Inst@1MIPS1]
How Subliminal Breaks the Rules
Traditional circuit design relies on transistor switching to perform computation
Static logic circuits continue to operate below Vth by modulating leakage currents
Approach lends itself to low-demand sensor apps, as long as care is taken to build an efficient processor
What I Really Learned…
A rule-breaking approach to technical research is effective and engaging
You will find yourself on very fertile ground
  “It is that which everyone knows is certainly true, that is indeed false.”
  “The early bird gets the worm.”
  “If you are not failing some of the time, you are not trying hard enough.”
You will more fully engage your colleagues
  One half will think your crazy idea will never work
  One half will be intrigued (with your crazy idea)
Questions