EENG449b/SavvidesLec 10.1
2/17/05
February 17, 2005
Prof. Andreas Savvides
Spring 2005
http://www.eng.yale.edu/courses/eeng449bG
EENG 449bG/CPSC 439bG Computer Systems
Lecture 11
Power Issues and DVS
Announcements
• Reading reference for this lecture: J. Pouwelse, K. Langendoen, H. Sips, “Dynamic Voltage Scaling on a Low-Power Microprocessor”, posted on the class website
• Midterm date discussion & conflicts with other classes
Why worry about power? Intel vs. Duracell
• No Moore’s Law in batteries: 2-3%/year growth
[Figure: improvement compared to year 0 (1x to 16x) versus time (0 to 6 years) for Processor (MIPS), Hard Disk (capacity), Memory (capacity) and Battery (energy stored)]
Current Battery Technology is Inadequate
• Example: 20-watt battery
» NiCd weighs 0.5 kg, lasts 1 hr, and costs $20
» Comparable Li-Ion lasts 3 hrs, but costs > 4x more

Battery         Rechargeable?   Wh/lb    Wh/litre
Alkaline MnO2   NO              65.8     347
Silver Oxide    NO              60       500
Li/MnO2         NO              105      550
Zinc Air        NO              140      1150
NiCd            YES             23       125
Li-Polymer      YES             65-90    300-415
Comparison of Energy Sources
Source                    Power (Energy) Density                                          Source of Estimates
Batteries (Zinc-Air)      1050 - 1560 mWh/cm3 (1.4 V)                                     Published data from manufacturers
Batteries (Lithium ion)   300 mWh/cm3 (3 - 4 V)                                           Published data from manufacturers
Solar (Outdoors)          15 mW/cm2 - direct sun; 0.15 mW/cm2 - cloudy day                Published data and testing
Solar (Indoor)            0.006 mW/cm2 - my desk; 0.57 mW/cm2 - 12 in. under a 60W bulb   Testing
Vibrations                0.001 - 0.1 mW/cm3                                              Simulations and testing
Acoustic Noise            3E-6 mW/cm2 at 75 dB; 9.6E-4 mW/cm2 at 100 dB sound level       Direct calculations from acoustic theory
Passive Human Powered     1.8 mW (shoe inserts >> 1 cm2)                                  Published study
Thermal Conversion        0.0018 mW - 10 deg. C gradient                                  Published study
Nuclear Reaction          80 mW/cm3; 1E6 mWh/cm3                                          Published data
Fuel Cells                300 - 500 mW/cm3; ~4000 mWh/cm3                                 Published data
Assume 1mW Average as definition of “Scavenged Energy”
Trends in Total Power Consumption
• Frightening: proportional to area & frequency
[Figure: microprocessor power dissipation trend, with the DEC 21164 marked. Source: arpa-esto]
Power Metrics in Microprocessors
nJ/Instruction
– Mostly for processors with the same instruction sets
– Does not capture the effect of operand size (e.g. 8-bit addition vs. 32-bit addition operations)
MIPS/Watt
mA – common among component data sheets
Remember:
$\text{Energy (Joules)} = \frac{1}{2} C V^2$
$\text{Power} = \frac{\text{Energy (Joules)}}{\text{time}}$
$\text{Power (Watts)} = I V$
Modeling the Battery Behavior
• Theoretical capacity of battery is decided by the amount of the active material in the cell
» batteries often modeled as buckets of constant energy
– e.g. halving the power by halving the clock frequency is assumed to double the computation time while maintaining constant computation per battery life
• In reality, delivered or nominal capacity depends on how the battery is discharged
» discharge rate (load current)
» discharge profile and duty cycle
» operating voltage and power level drained
Battery Capacity
• Current in “C” rating: load current normalized to battery’s capacity
» e.g. a discharge current of 1C for a capacity of 500 mA-hrs is 500 mA
from [Powers95]
Battery Capacity vs. Discharge Current
• Amount of energy delivered decreases as the discharge current (rate at which charge is drawn) increases
» rated as ampere hours or watt hours when discharged at a specific rate to a specific cut-off voltage
– primary cells rated at a current which is 1/100th of the capacity in ampere hours (C/100)
– secondary cells are rated at C/20 or C/10
• At high currents, the diffusion process that moves new active material from electrolytes to the electrode cannot keep up
» concentration of active material at cathode drops to zero, and cell voltage goes down below cut-off
» even though active material in cell is not exhausted!
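The C-rating convention on these slides reduces to one multiplication. A minimal sketch in Python; the function name is illustrative, and the 500 mAh cell is reused from the 1C example purely for demonstration:

```python
def discharge_current_ma(c_rate: float, capacity_mah: float) -> float:
    """Return the load current in mA for a discharge rate given in units of C."""
    return c_rate * capacity_mah


# 500 mAh cell, as in the 1C example; the other rates are the rating
# points mentioned for primary (C/100) and secondary (C/20, C/10) cells.
capacity_mah = 500.0
rates = [("1C", 1.0), ("C/10", 1 / 10), ("C/20", 1 / 20), ("C/100", 1 / 100)]
for label, c_rate in rates:
    print(f"{label}: {discharge_current_ma(c_rate, capacity_mah):.1f} mA")
```

Note that this only converts a rate to a current. It does not model the loss of delivered capacity at high currents described above.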
Battery Energy Consumers
Where does the Power Go?
• Power Supply: Battery, DC-DC Converter
• Communication: Radio Modem, RF Transceiver
• Processing: Programmable µPs & DSPs (apps, protocols etc.), Memory, ASICs
• Peripherals: Disk, Display
Power Consumption for a Computer with Wireless NIC
• Display: 36%
• Wireless LAN: 18%
• Hard Drive: 18%
• CPU/Memory: 21%
• Other: 7%
Energy Consumption of Wireless NICs (Wavelan)

Card / Mode         Specs      Measured
2 Mbps (Bronze)
  Sleep Mode        9 mA       14 mA
  Idle Mode         --------   178 mA
  Receive Mode      280 mA     200 mA
  Transmit Mode     330 mA     280 mA
11 Mbps (Silver)
  Sleep Mode        10 mA      10 mA
  Idle Mode         --------   156 mA
  Receive Mode      180 mA     190 mA
  Transmit Mode     280 mA     284 mA
Example: Power Consumption for Compaq’s iPAQ
206MHz StrongARM SA-1110 processor
320x240 resolution color TFT LCD
Touch screen
32MB SDRAM / 16MB Flash memory
USB/RS-232/IrDA connection
Speaker/Microphone
Lithium Polymer battery
PCMCIA card expansion pack & CF card expansion pack
* Notes
– The CPU is in the idle state most of the time
– Audio, IrDA and RS232 power is measured when each part is idling
– "Etc" includes the CPU, flash memory, touch screen and all other devices
– Frontlight brightness was 16
Microprocessor Power Consumption
CMOS Circuits (used in most microprocessors)
• Dynamic Component: digital circuit switching inside the processor
• Static Component: bias and leakage currents, O(1mW)
$P = \underbrace{I_{standby} V_{dd} + I_{leakage} V_{dd}}_{\text{Static}} + \underbrace{I_{sc} V_{dd} + C_l V_{dd}^2 f_{clk}}_{\text{Dynamic}}$
Power Consumption in Digital CMOS Circuits
$\text{Power} = I_{standby} V_{dd} + I_{leakage} V_{dd} + I_{sc} V_{dd} + C_l V_{dd}^2 f_{clk}$
$I_{standby}$ - current constantly drawn from the power supply
$I_{leakage}$ - determined by fabrication technology
$I_{sc}$ - short circuit current due to the DC path between the supply rails during output transitions
$C_l$ - load capacitance at the output node
$f_{clk}$ - clock frequency
$V_{dd}$ - power supply voltage
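To get a feel for the relative size of the terms, the power equation can be evaluated directly. The sketch below is a minimal Python illustration. All numeric values are hypothetical, chosen only for demonstration, and counting the short-circuit term as part of the dynamic component is an assumption of this sketch.

```python
def cmos_power_w(v_dd, i_standby, i_leakage, i_sc, c_load, f_clk):
    """Evaluate the CMOS power equation and return (static_w, dynamic_w).

    Units: volts, amperes, farads, hertz. Result is in watts.
    """
    # Static part: standby (bias) and leakage currents flow regardless of activity.
    static_w = v_dd * (i_standby + i_leakage)
    # Dynamic part (assumed grouping): short-circuit current plus switching power.
    dynamic_w = v_dd * i_sc + c_load * v_dd ** 2 * f_clk
    return static_w, dynamic_w


# Hypothetical operating point, not taken from any data sheet.
static_w, dynamic_w = cmos_power_w(
    v_dd=1.5, i_standby=0.2e-3, i_leakage=0.1e-3,
    i_sc=5e-3, c_load=1e-9, f_clk=200e6,
)
print(f"static:  {static_w * 1e3:.3f} mW")
print(f"dynamic: {dynamic_w * 1e3:.3f} mW")
```

With numbers of this kind the switching term dominates, which is why lowering the supply voltage and clock frequency is the main lever for saving power.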
Dynamic Voltage Scaling
• What can you do to conserve power on a processor?
• Dynamic power consumption is the dominant component
• Example: Transmeta’s Crusoe processor
DVS on Low Power Processor
Maximum gain when voltage is lowered BUT lower voltage increases circuit delay
Dynamic Power Component:
$P_{dynamic} = \sum_{k=1}^{M} C_k f V_{dd}^2$
– $M$: number of gates
– $C_k$: load capacitance of gate $k$
Propagation delay:
$\tau \propto \frac{V_{DD}}{(V_{DD} - V_T)^2}$
– $V_T$: CMOS transistor threshold voltage
– the proportionality depends on the transistor gain factor
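The trade-off between power and delay can be made concrete with a small numeric sketch in Python. It assumes the delay relation $\tau \propto V_{DD} / (V_{DD} - V_T)^2$ and a threshold voltage of 0.4 V, which is a hypothetical value, not a figure from the lecture. The supply voltages 1.5 V and 0.8 V are the end points used in the DVS example later in the lecture.

```python
def relative_delay(v_dd: float, v_t: float) -> float:
    """Gate delay up to a constant factor: V_dd / (V_dd - V_t)^2."""
    if v_dd <= v_t:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return v_dd / (v_dd - v_t) ** 2


V_T = 0.4  # hypothetical threshold voltage in volts

# How much slower a gate becomes when the supply drops from 1.5 V to 0.8 V.
slowdown = relative_delay(0.8, V_T) / relative_delay(1.5, V_T)
# How much switching energy per operation drops (it scales with V_dd^2).
energy_gain = (1.5 / 0.8) ** 2

print(f"delay grows by a factor of {slowdown:.2f}")
print(f"switching energy per operation shrinks by a factor of {energy_gain:.2f}")
```

The exact numbers depend strongly on the assumed threshold voltage; the point is only that delay rises steeply as the supply approaches $V_T$.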
Voltage Scaling on LART
• Dynamically lower the processor voltage and frequency to reduce power consumption
• LART wearable board
– StrongARM SA-1100 Processor 190MHz
– Various I/O capabilities
– 32 MB volatile memory
– 4 MB non-volatile memory
– Programmable voltage regulator
Processor Envelope
At 1.5V, max clock frequency is 251MHz
Min frequency at which the processor functions correctly is 59MHz
LART Power Measurement
• Note the measurement setup at different levels on the board
• Always provide hooks for measurement, testing and debugging during your design. Both for software and hardware!
[Figure: total power consumption on the LART platform, based on the Dhrystone benchmark]
System Support Requirements
• To manage DVS effectively, the computation requirements must be known in advance
• Predictive scheme
– Try to learn that behavior based on the computation profile
• Better scheme: Applications should be power aware
• Processor frequency and voltage should be changed without much delay
– This is specific to each processor
– 150us for the LART processor
Example: Power Aware Video Playback
• Annotate an H.263 video decoder with information on the clock speed required to decode a known video sequence
• Using a 12.6s video, 15fps
• Power consumption measurements for LART
– No-DVS: 198mW for CPU, 207mW for memory subsystem
– DVS: 100mW for CPU and 204mW for the memory subsystem
– 2X improvement for the CPU, but only about 25% improvement when the memory subsystem is included
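The two improvement figures follow directly from the four measurements. A short Python check, in which the variable and function names are illustrative:

```python
# Measured LART power in mW, from the bullets above.
cpu_no_dvs_mw, mem_no_dvs_mw = 198.0, 207.0
cpu_dvs_mw, mem_dvs_mw = 100.0, 204.0


def fractional_saving(before_mw: float, after_mw: float) -> float:
    """Fraction of power saved when going from before_mw to after_mw."""
    return (before_mw - after_mw) / before_mw


cpu_gain = cpu_no_dvs_mw / cpu_dvs_mw
total_saving = fractional_saving(
    cpu_no_dvs_mw + mem_no_dvs_mw, cpu_dvs_mw + mem_dvs_mw
)

print(f"CPU only: {cpu_gain:.2f}x lower power")
print(f"CPU + memory: {total_saving:.1%} lower power")
```

The memory subsystem barely benefits from DVS here, so it caps the system-level saving.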
LART Memory Performance
• Memory access is optimal when high resolution memory access timing is available
• For LART the optimal memory pattern:
– 148MHz
– 92 MB/s memory bandwidth
– Power consumption 514.2mW
– Energy cost 5.6mJ/MB
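The energy cost per megabyte is the power divided by the bandwidth, since mW divided by MB/s gives mJ/MB. A one-function Python check with an illustrative function name:

```python
def energy_per_mb_mj(power_mw: float, bandwidth_mb_per_s: float) -> float:
    """Energy cost of moving one MB, in mJ: power (mW) / bandwidth (MB/s)."""
    return power_mw / bandwidth_mb_per_s


cost = energy_per_mb_mj(514.2, 92.0)
print(f"{cost:.1f} mJ/MB")
```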
Memory Subsystem Power Consumption – Read Operation
[Figure panels: Power consumption; Memory Bandwidth; Optimal memory access waveforms]
Energy breakdown for read (based on 1MB read)
[Figure: energy breakdown, including the regulator loss factor]
Power Breakdown for H.263 Decoder
Reducing Power Consumption is a multilevel task!
• Physical layer
– Technology: reduce the surface of CMOS circuits
• Architecture/IC level
– Several optimizations in the design (e.g. parallelism and pipelining)
– Provide hooks for software driven power management (e.g. different power modes and clock speeds)
• OS Level
– Smart schedulers, interval schedulers, DVS
• Application Level
– Power aware applications that warn the OS and the hardware about the features needed during application lifetime
– Sleep modes and DVS driven by applications
• Network Level
– Networked devices may be able to apply low duty cycles, in which some of the devices are asleep and others are awake
Conclusions
• Interval based schedulers not so efficient
– Interval scheduler: reduce voltage after a pre-specified idle period is detected
• Better leverage of DVS when the processor is aware of the application requirements
– Illustrated with the H.263 decoder
• Monitor different power consumption profiles across different sections of the platform and use them to make clever decisions about power-management
• What is missing:
– Comments on power regulator efficiencies…
Announcements
• Need to start deciding on the final projects.
• We need to discuss these with you individually at the end of class
• One page detailed proposal by March 3
• This should include
– 1 paragraph motivation and description of your project
– 1 paragraph on the approach you are going to use and the tools
– 1 paragraph on evaluation
» What is the strategy you will use to evaluate the performance of your project.
DVS Example
• Consider a processor with DVS
• Frequency range 250 – 59MHz
• Supply voltage range 0.8V (@59MHz) to 1.5V (@250MHz)
• Assume that the processor can compute at 1 MIPS per MHz.
DVS Example 1
• What is the maximum energy saving the processor can achieve with dynamic voltage scaling?
• What is missing?
$P = C V^2 f$
$\text{Energy Saving} = \frac{c \cdot 1.5^2 \cdot 250\,\text{MHz}}{c \cdot 0.8^2 \cdot 59\,\text{MHz}} = \frac{562.5}{37.76} = 14.9$
Task Execution Energy Cost
• A certain task needs to run on the processor. The task requires 200 Million Instructions to complete.
• Which power level will be the most efficient?
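$P_{250} = c \cdot 1.5^2 \cdot 250 = 562.5c$
$P_{59} = c \cdot 0.8^2 \cdot 59 = 37.76c$
$T_{250} = \frac{200}{250} = 0.8\,\text{s}$
$T_{59} = \frac{200}{59} = 3.39\,\text{s}$
$E_{250} = 562.5c \times 0.8 = 450c$
$E_{59} = 37.76c \times 3.39 = 128c$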
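The comparison on this slide can be reproduced in a few lines. The Python sketch below works in units of the unknown capacitance constant c and uses the lecture's assumption of 1 MIPS per MHz; the function names are illustrative.

```python
def relative_power(v_dd: float, f_mhz: float) -> float:
    """Dynamic power P = c * V^2 * f, expressed in units of c (f in MHz)."""
    return v_dd ** 2 * f_mhz


def task_time_s(million_instructions: float, f_mhz: float,
                mips_per_mhz: float = 1.0) -> float:
    """Execution time in seconds at the given clock frequency."""
    return million_instructions / (f_mhz * mips_per_mhz)


TASK_MI = 200.0  # task length in millions of instructions

for v_dd, f_mhz in [(1.5, 250.0), (0.8, 59.0)]:
    power = relative_power(v_dd, f_mhz)
    time_s = task_time_s(TASK_MI, f_mhz)
    energy = power * time_s
    print(f"{f_mhz:.0f} MHz @ {v_dd} V: P = {power:.2f}c, "
          f"T = {time_s:.2f} s, E = {energy:.0f}c")
```

The power ratio between the two operating points is about 14.9, but the energy ratio for a fixed task is only about 3.5 because the slow setting runs roughly four times longer. For a fixed instruction count the frequency cancels out of the energy, leaving $E \propto V^2$.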
Power Consumption on Embedded Processors
• Different core I/O from Peripheral I/O – numbers here
– Cores scaling down to 0.8V. 1.8V devices are becoming common
– General Purpose I/O interfaces still at 3.0 – 3.3V
» Makes power supply harder, additional regulator inefficiency
• Sleep modes and associated cost of sleep and recovery (SA-1100 modes)
– Need time and energy to transition between states
Example: SA-1100 CPU
• RUN (400 mW)
• IDLE (50 mW)
– CPU stopped when not in use
– Monitoring for interrupts
• SLEEP (0.16 mW)
– Shutdown on-chip activity
[State diagram: RUN ↔ IDLE transitions take 10 µs each way, transitions into SLEEP take 90 µs, SLEEP → RUN takes 160 ms]
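As a rough illustration of what the three modes buy, the sketch below computes the time-weighted average power for a given mix of modes. The mode powers are the SA-1100 figures from this slide. The 10/20/70 split is a hypothetical workload, and the sketch deliberately ignores the time and energy spent in transitions.

```python
# SA-1100 mode powers in mW, from the slide.
MODE_POWER_MW = {"RUN": 400.0, "IDLE": 50.0, "SLEEP": 0.16}


def average_power_mw(time_fraction: dict) -> float:
    """Time-weighted average power, ignoring transition overheads."""
    if abs(sum(time_fraction.values()) - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    return sum(MODE_POWER_MW[mode] * frac for mode, frac in time_fraction.items())


# Hypothetical workload: 10% running, 20% idle, 70% asleep.
mix = {"RUN": 0.10, "IDLE": 0.20, "SLEEP": 0.70}
print(f"average power: {average_power_mw(mix):.2f} mW")
```

A fuller model would charge each SLEEP entry and exit with its transition cost, which matters when sleep intervals are short relative to the 160 ms wake-up time.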
Duty Cycling: Exploiting Sleep Modes
• Imagine a processor with max power consumption 120mW
• Power supply voltage 2.5V
• We need to power the device from a 2000mAh battery for 1 year
• Sleep mode draws 20uA current
• What is the duty cycle the device needs to operate at to last for at least 1 year?
Duty cycling
• 1 year has 365 x 24 = 8760 hours
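$I_{avg} = \frac{2000\,\text{mAh}}{8760\,\text{h}} = 228\,\mu\text{A}$
$P_{avg} = I_{avg} V = 228\,\mu\text{A} \times 2.5\,\text{V} = 570\,\mu\text{W}$
$I_{on} = \frac{120\,\text{mW}}{2.5\,\text{V}} = 48\,\text{mA} = 48000\,\mu\text{A}$
$I_{avg} = T_{on} I_{on} + (1 - T_{on}) I_{sleep}$
$T_{on} = \frac{I_{avg} - I_{sleep}}{I_{on} - I_{sleep}} = \frac{228 - 20}{48000 - 20} = 0.434\% \approx 38\,\text{h/year}$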
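The same calculation as a small Python function, so that other battery sizes or sleep currents can be tried. The function and parameter names are illustrative; like the slide, the sketch treats the battery as an ideal bucket of charge and ignores wake-up costs.

```python
HOURS_PER_YEAR = 365 * 24  # 8760


def required_duty_cycle(capacity_mah: float, lifetime_h: float,
                        p_on_mw: float, v_supply: float,
                        i_sleep_ua: float) -> float:
    """Largest fraction of time the device may be on to reach lifetime_h."""
    i_avg_ua = capacity_mah * 1000.0 / lifetime_h   # average current budget
    i_on_ua = p_on_mw / v_supply * 1000.0           # active current
    return (i_avg_ua - i_sleep_ua) / (i_on_ua - i_sleep_ua)


duty = required_duty_cycle(capacity_mah=2000.0, lifetime_h=HOURS_PER_YEAR,
                           p_on_mw=120.0, v_supply=2.5, i_sleep_ua=20.0)
print(f"duty cycle: {duty:.3%}")
print(f"on-time per year: {duty * HOURS_PER_YEAR:.0f} h")
```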
Voltage Reduction is Better
• Example: task with 100ms deadline, requires 50ms CPU time at full speed
– normal system gives 50ms computation, 50ms idle/stopped time
– half speed/voltage system gives 100ms computation, 0ms idle
– same number of CPU cycles but energy reduced to 1/4
[Figure: speed versus time between T1 and T2 for the two cases. Full speed: task followed by idle. Half speed: task fills the whole interval. Same work, lower energy.]
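The factor of 1/4 can be checked with the dynamic power relation $P = C V^2 f$ used throughout the lecture. The short derivation below assumes that dynamic power dominates, that idle time costs nothing, and that halving the clock allows the voltage to be halved as well, as the example states.

```latex
\begin{align*}
E_{full} &= C V^2 f \cdot 50\,\text{ms} \\
E_{half} &= C \left(\tfrac{V}{2}\right)^2 \tfrac{f}{2} \cdot 100\,\text{ms}
          = \tfrac{1}{4}\, C V^2 f \cdot 50\,\text{ms}
          = \tfrac{1}{4}\, E_{full}
\end{align*}
```

Power drops by a factor of 8, but the task runs twice as long, so the energy per task drops by a factor of 4.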
Problem with Voltage Reduction
• Voltage gets dictated by the tightest (critical) timing constraint
» not a problem if latency not important
– throughput can always be improved by pipelining, parallelism etc.
» but, real systems have bursty throughput and latency critical tasks
Solution: dynamically vary the voltage!
Varying the Supply Voltage
• Fixed Supply: active, then idle, within each frame $T_{frame}$; $E_{fixed} = \frac{1}{2} C V_{dd}^2$
• Variable Supply: active for the whole frame $T_{frame}$; $E_{var} = \frac{1}{2} C (V_{dd}/2)^2 = \frac{1}{4} E_{fixed}$
[Figure: normalized power versus normalized workload (both 0 to 1.0) for fixed supply and variable supply, from [Gutnik96] (VLSI Symposium)]
XYZ Node Frequency Scaling
Code Optimizations for Low Power
• High-level operations (e.g. C statement) can be compiled into different instruction sequences
» different instructions & ordering have different power
• Instruction Selection
– Select a minimum-power instruction mix for executing a piece of high level code
• Instruction Packing & Dual Memory Loads
– Two on-chip memory banks
» Dual load vs. two single loads
» Almost 50% energy savings
Code Optimizations for Low Power (contd.)
• Reorder instructions to reduce switching effect at functional units and I/O buses
– E.g. Cold scheduling minimizes instruction bus transitions
• Operand swapping
– Swap the operands at the input of multiplier
– Result is unaltered, but power changes significantly!
• Other standard compiler optimizations
– Intermediate level: Software pipelining, dead code elimination, redundancy elimination
– Low level: Register allocation and other machine specific optimizations
• Use processor-specific instruction styles
– e.g. on ARM the default int type is ~ 20% more efficient than char or short as the latter result in sign or zero extension
– e.g. on ARM the conditional instructions can be used instead of branches
Minimizing Memory Access Costs
• Reduce memory access, make better use of registers
– Register access consumes much less power than memory access
• Straightforward way: minimize number of read-write operations, e.g.
• Cache optimizations
– Reorder memory accesses to improve cache hit rates
• Can use existing techniques for high-performance code generation
Low-power Software Strategies
• Code running on CPU
– Code optimizations for low power
• Code accessing memory objects
– SW optimizations for memory
• Data flowing on the buses
– I/O coding for low power
• Compiler controlled power management
[Figure: CPU, Cache, Memory]
How can power consumption be reduced at the circuit design level
inside a processor?
Example: Reference Datapath
from “Digital Integrated Circuits” by Rabaey
Critical path delay: $T_{adder} + T_{comparator} = 25$ ns
Frequency: $f_{ref} = 40$ MHz
Total switched capacitance = $C_{ref}$
$V_{dd} = V_{ref} = 5$V
Power for reference datapath = $P_{ref} = C_{ref} V_{ref}^2 f_{ref}$
Parallel Datapath
from “Digital Integrated Circuits” by Rabaey
The clock rate can be reduced by x2 with the same throughput: $f_{par} = f_{ref}/2 = 20$ MHz
Total switched capacitance = $C_{par} = 2.15 C_{ref}$
$V_{par} = V_{ref}/1.7$
$P_{par} = (2.15 C_{ref})(V_{ref}/1.7)^2 (f_{ref}/2) \approx 0.36 P_{ref}$
Pipelined Datapath
from “Digital Integrated Circuits” by Rabaey
$f_{pipe} = f_{ref}$
$C_{pipe} = 1.1 C_{ref}$
$V_{pipe} = V_{ref}/1.7$
Voltage can be dropped while maintaining the original throughput
$P_{pipe} = C_{pipe} V_{pipe}^2 f_{pipe} = (1.1 C_{ref})(V_{ref}/1.7)^2 f_{ref} \approx 0.37 P_{ref}$
Datapath Architecture-Power Trade-off Summary
Datapath Architecture   Voltage   Area   Power
Original                5V        1      1
Pipelined               2.9V      1.3    0.37
Parallel                2.9V      3.4    0.34
Pipeline-Parallel       2.0V      3.7    0.18
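The scaling behind these figures is $P = C V^2 f$ applied relative to the reference datapath. The Python sketch below recomputes the parallel and pipelined cases using the capacitance factors from the datapath slides (2.15 and 1.1) and the 2.9 V supply from the table. The function name is illustrative, and the sketch makes no claim about how the pipeline-parallel row was obtained.

```python
def relative_power(cap_ratio: float, v_volts: float, freq_ratio: float,
                   v_ref_volts: float = 5.0) -> float:
    """Power relative to the reference datapath, from P = C * V^2 * f."""
    return cap_ratio * (v_volts / v_ref_volts) ** 2 * freq_ratio


# Parallel: 2.15x capacitance, half the clock rate, 2.9 V supply.
p_parallel = relative_power(cap_ratio=2.15, v_volts=2.9, freq_ratio=0.5)
# Pipelined: 1.1x capacitance, same clock rate, 2.9 V supply.
p_pipelined = relative_power(cap_ratio=1.1, v_volts=2.9, freq_ratio=1.0)

print(f"parallel:  {p_parallel:.2f} x P_ref")
print(f"pipelined: {p_pipelined:.2f} x P_ref")
```

Both architectures spend extra area and capacitance to buy timing slack, and then convert that slack into a lower supply voltage.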
Back to Processor Architecture: ARM Performance
• Some possible avenues of optimizing performance and power consumption on the ARM
– Use the on-chip cache
– Write code in 16-bit mode assembly
» Need only one memory access to fetch an instruction
– Execution in RAM vs. Flash
– Write code in assembly
• Refer to the ARM assembly language handout for more references