Who ate my battery?
Why software engineers are the key to low power software design
Jeremy Bennett, CEO, Embecosm
22 March 2012
Copyright © 2012 Embecosm. Freely available under a Creative Commons license
Agenda
Designing energy aware systems
Hardware and software working together
– unified system debug
Experience with OpenRISC
Designing Energy Aware Systems
Why?
Ericsson T65
– released 2001
– Li-Ion 720 mAh
– standby 300 h
– talk time 11 h
– includes talk/standby prediction
Sony Ericsson Xperia X10 mini
– released 2010
– Li-polymer 930 mAh
– standby up to 285 h (3G) / 360 h (2G)
– talk time up to 4 h (2G) / 3.5 h (3G)
Free Mobile Apps “Drain Battery Faster”
Energy Saving Today
Largely in the realm of hardware engineering
Hardware design aims to minimize
– static (leakage) power loss
– dynamic (switching) power loss
Techniques used
– dynamic voltage and frequency scaling (DVFS)
– multiple mode operation (standby, sleep, suspend, off)
Scope for savings
– $P = V^2/R$
– on-chip voltage can range from ~0.5V to ~1.5V
– lower frequencies mean lower voltages can be used: a win on both static and dynamic power loss
Is this where the greatest savings can be made?
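A rough numerical sketch of why lowering voltage and frequency together pays off twice. It assumes the common CMOS approximation for dynamic power, P = C * V^2 * f, which is not stated on the slide, and the capacitance and operating points are illustrative values only (the voltages sit inside the ~0.5V to ~1.5V range quoted on the slide).

```python
def dynamic_power(c_eff, voltage, freq_hz):
    """Dynamic (switching) power under the approximation P = C * V^2 * f."""
    return c_eff * voltage ** 2 * freq_hz


# Assumed, illustrative operating points: not measurements of any real chip.
C_EFF = 1e-9                                # effective switched capacitance (F)
full = dynamic_power(C_EFF, 1.5, 1.0e9)     # 1.5 V at 1 GHz
slow = dynamic_power(C_EFF, 0.75, 0.5e9)    # half the voltage, half the frequency

ratio = slow / full                         # V^2 gives 1/4, f gives 1/2
print(f"full: {full:.3f} W, scaled: {slow:.3f} W, ratio: {ratio:.3f}")
```

In this sketch the scaled point draws one eighth of the dynamic power. The same work takes twice as long at half the frequency, so the dynamic energy per task falls to a quarter rather than an eighth.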
Greater Savings at Higher Levels
[Figure: chart of power savings by design level (Layout, Gate, RTL Synthesis, Architectural), axis 0% to 100%]
Source: LSI Logic
Shouldn't We Look Higher?
A Linux implementation famously wasted 70-90% of its energy, simply waking several times a second to drive a blinking cursor.
A project had to raise its clock frequency because a standard codec caused excessive processor stalls through cache conflicts. That project was cancelled shortly afterwards.
Focussing on Software
Software controls the hardware
– algorithms and data flow
– compiler optimization has traditionally put speed over everything
Few software engineers appreciate this
– how does an algorithm affect the power consumption?
– power consumption is often a secondary design criterion in software
Yet the biggest savings are at the higher levels of abstraction
– choice of algorithm
– data handling
– entire software stack
Why?
– energy is consumed by the hardware computations
– but ultimate control of that hardware lies with the software
Greater Savings at Higher Levels
[Figure: chart of power savings by level (Layout, Gate, RTL Synthesis, Architecture, ISA, Programming Language, Compiler, Application), axis 0% to 100%]
Source: Bennett & Eder, 2011
This is Not New
Kaushik Roy and Mark C. Johnson. 1997. Software design for low power. In Low power design in deep submicron electronics, Wolfgang Nebel and Jean Mermet (Eds.). Kluwer Nato Advanced Science Institutes Series, Vol. 337. Kluwer Academic Publishers, Norwell, MA, USA, pp 433-460.
Choose the best algorithm to fit the hardware
Tune algorithms to manage memory size and memory access
Optimize for performance, making best use of parallelism
Use hardware support for power management
Minimize CPU and data path switching in the generated code
Hardware Power Analysis
Late in the design flow
Slow
Source: Eder, 2011
Software Power Analysis
Source: Eder, 2011
Tractable Software Power Analysis
Source: Eder, 2011
The Tool Challenge
How do we do high level power estimation?
Existing power analysis tools
– operate at gate level
– accurate, but very slow
Instruction level power analysis
– Tiwari et al, 1996
– highly parameterized formulaic approach
– no data on accuracy
Wattch: An architectural power analysis tool
– Brooks et al, 2000
– requires parameterized models of common functional blocks
– accurate to ±10%, relatively fast
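As a minimal sketch of what a parameterized, instruction level estimate looks like: program energy is modeled as the sum of a base cost per executed instruction, plus an overhead for each change between consecutive instructions, plus a lump term for other effects such as stalls. This follows the general shape of the instruction level approach; every number and name here is hypothetical, and treating a repeated instruction as having zero overhead is a simplifying assumption.

```python
from collections import Counter

# Hypothetical per-instruction base energy costs in nanojoules (assumed values).
BASE_COST_NJ = {"add": 1.0, "mul": 3.0, "load": 2.5, "store": 2.0}

# Hypothetical overhead for switching between two different instructions.
# Pairs not listed fall back to DEFAULT_OVERHEAD_NJ.
OVERHEAD_NJ = {("add", "mul"): 0.4, ("mul", "add"): 0.4, ("load", "store"): 0.3}
DEFAULT_OVERHEAD_NJ = 0.2


def estimate_energy_nj(trace, other_effects_nj=0.0):
    """Estimate energy for an executed instruction trace (list of mnemonics)."""
    base = sum(BASE_COST_NJ[op] * n for op, n in Counter(trace).items())
    overhead = 0.0
    for prev, cur in zip(trace, trace[1:]):
        if prev != cur:  # assumption: no overhead when an instruction repeats
            overhead += OVERHEAD_NJ.get((prev, cur), DEFAULT_OVERHEAD_NJ)
    return base + overhead + other_effects_nj


trace = ["load", "add", "mul", "add", "store"]
print(f"{estimate_energy_nj(trace):.1f} nJ")
```

The appeal of this style of model is speed: once the tables are filled in from measurement, an estimate needs only an instruction trace, not a gate level simulation.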
What Have We Learned So Far?
Use a low power instruction encoding
– minimize Hamming distance between consecutive instructions (reduces switching)
– ISA is very target application specific (requires profiling to create)
– up to 62% reduction in opcode switching observed
– Woo et al, 2001
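A small sketch of the Hamming distance idea, assuming a three-instruction toy machine. The trace and both opcode encodings are invented for illustration; they are not taken from Woo et al.

```python
def hamming(a, b):
    """Number of bit positions in which two encodings differ."""
    return bin(a ^ b).count("1")


def switching(trace, encoding):
    """Total opcode bit flips when executing the trace with a given encoding."""
    codes = [encoding[op] for op in trace]
    return sum(hamming(x, y) for x, y in zip(codes, codes[1:]))


# Hypothetical profiled trace in which add and load alternate frequently.
trace = ["add", "load", "add", "load", "add", "store"]

naive = {"add": 0b000, "load": 0b111, "store": 0b001}  # frequent pair 3 bits apart
tuned = {"add": 0b000, "load": 0b001, "store": 0b011}  # frequent pair 1 bit apart

print(switching(trace, naive), switching(trace, tuned))  # 13 6
```

The tuned encoding only wins for programs that look like the profiled trace, which is why such an ISA is so application specific.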
Partition the register file
– 25% of registers account for 83% of access time
– partition into “hot” and “cold” registers
– average 54% energy savings compared to non-partitioned
– Guan & Fei, 2010
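A sketch of how a hot/cold split could be chosen from a profile, assuming the partition is driven purely by access counts. The function name, the eight-register profile and the 25% threshold default are hypothetical; the published scheme may select registers differently.

```python
from collections import Counter


def partition_registers(accesses, hot_fraction=0.25):
    """Split registers into hot and cold sets by profiled access count."""
    counts = Counter(accesses)
    ranked = [reg for reg, _ in counts.most_common()]
    n_hot = max(1, round(len(ranked) * hot_fraction))
    hot = set(ranked[:n_hot])
    coverage = sum(counts[r] for r in hot) / len(accesses)
    return hot, set(ranked[n_hot:]), coverage


# Hypothetical access profile for an 8-register machine (100 accesses).
profile = (["r1"] * 50 + ["r2"] * 30 + ["r3"] * 5 + ["r4"] * 5
           + ["r5"] * 4 + ["r6"] * 3 + ["r7"] * 2 + ["r8"] * 1)

hot, cold, coverage = partition_registers(profile)
print(sorted(hot), f"{coverage:.0%}")  # ['r1', 'r2'] 80%
```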
Higher Abstraction Still
Specialist programming languages
– allow the programmer to exploit power/accuracy tradeoffs
– use approximate types where appropriate
– programmer annotates the code
– type checker separates out approximate and accurate code
– Sampson et al 2011
Hardware support for approximate computation
– variable bit width in floating point calculations
– up to 66% power saving
– Tong et al 2000
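To get a feel for the accuracy side of the variable bit width tradeoff, the effect can be emulated in software by discarding low-order mantissa bits of an IEEE 754 single precision value. This is only a software illustration under that assumption; it says nothing about the hardware design or the power figures reported by Tong et al, and the function name is hypothetical.

```python
import struct


def truncate_mantissa(x, keep_bits):
    """Convert x to IEEE 754 single precision, then zero all but the top
    keep_bits of its 23 mantissa bits (sign and exponent are untouched)."""
    if not 0 <= keep_bits <= 23:
        raise ValueError("keep_bits must be between 0 and 23")
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    mask = 0xFFFFFFFF & ~((1 << (23 - keep_bits)) - 1)
    (y,) = struct.unpack("<f", struct.pack("<I", bits & mask))
    return y


# Show how the error grows as mantissa bits are removed.
for keep in (23, 12, 5):
    approx = truncate_mantissa(3.14159265, keep)
    print(keep, approx, abs(approx - 3.14159265))
```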
These are niche examples. What is the generic solution?
– key is tools that allow software designers to explore solutions
– profile energy usage as easily as performance (tools from Lauterbach and research by Steve Kerrison at Bristol University)
The Energy Aware Computing Initiative (EACO)
Ultimate goal is a Europe-wide program of research
Led by the Institute for Advanced Studies at Bristol University
Kicked off with three dedicated workshops in 2011
– http://www.bristol.ac.uk/ias/workshops/current-workshops/energy-aware-computing.html
Intellectual challenges
– incremental improvements
– radically new innovative approaches
Conveners
– Dr Kerstin Eder
– Prof David May
More collaborators wanted
– contact Kerstin Eder
Hardware and Software Working Together
Unified System Debug
A Typical SoC Design Flow
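[Figure: SoC design flow in four stages, showing what each team works with at each stage]
Stage: Exploration & System Design | Implementation | System Verification | Silicon
Hardware Team: SystemC TLM | Simulation & CA Model | FPGA or Emulation | Silicon
Software Team: ISS + Debugger | ISS + Debugger | ISS/ICE + Debugger | Silicon + Debugger (?)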
The Technical Solution
Embedded software tools need two key features
– they must be “peripheral aware”
  – when the program halts, the peripherals halt
  – the debugger has visibility into the peripherals
– they must work with models as well as final silicon
  – models of the complete SoC
  – high level, low level, software or FPGA emulation
This is not a technical challenge
– most debuggers extend easily to peripherals
– JTAG provides a good abstraction of the interface
– the EDA world knows how to model SoCs
Adding “Peripheral Awareness”
Reading peripherals
– GDB info command
    (gdb) info spr picmr
    PIC.PICMR = SPR9_0 = 0 (0x0)
    (gdb)
Writing peripherals
– GDB set command
    (gdb) set spr picmr 0x00000007
    PIC.PICMR (SPR9_0) set to 7 (0x7), was: 0 (0x0)
    (gdb)
Watchpoints
– new GDB command
– depends on target abilities
    (gdb) pwatch picsr
    Peripheral watchpoint 2: PIC.PICSR (SPR9_2)
    (gdb)
Peripheral Awareness Using the GDB Remote Serial Protocol
Extend using standard Remote Serial Protocol (RSP) features
– a reliable packet based protocol
qRcmd packet used to access peripherals
– e.g. readspr to read a peripheral register
– e.g. writespr to write a peripheral register
Future proof against GDB upgrades
– RSP compatibility is always ensured
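A minimal sketch of how such a request travels over RSP. The framing ($payload#checksum, with the checksum being the payload bytes summed modulo 256) and the hex encoded command carried by the standard qRcmd monitor-command packet are part of the protocol; the helper names and the exact command string "readspr 4800" are assumptions made for illustration.

```python
def rsp_checksum(payload: bytes) -> int:
    """RSP checksum: sum of the payload bytes, modulo 256."""
    return sum(payload) % 256


def rsp_packet(payload: str) -> bytes:
    """Frame a payload as $<payload>#<two hex digit checksum>."""
    data = payload.encode("ascii")
    return b"$" + data + b"#" + f"{rsp_checksum(data):02x}".encode("ascii")


def monitor_packet(command: str) -> bytes:
    """Wrap a command string in a qRcmd packet (the command is hex encoded)."""
    return rsp_packet("qRcmd," + command.encode("ascii").hex())


# "readspr 4800" is an assumed command string, for illustration only.
print(rsp_packet("OK"))                # b'$OK#9a'
print(monitor_packet("readspr 4800"))
```

Because the peripheral access rides inside a packet type every RSP implementation already understands, nothing in the transport has to change when GDB itself is upgraded.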
Debugging Models and Silicon
SystemC TLM 2.0
– modeled as debug I/F
    class SocTlmModel : public sc_core::sc_module
    {
      …
      tlm::tlm_transport_dbg_if<JtagPayload> jtagPort;
Cycle accurate/simulation
– modeled as pins
    class SocCycleModel : public sc_core::sc_module
    {
      …
      sc_in<bool> jtagTck;
      sc_in<bool> jtagTms;
FPGA/Emulation/ASIC
– drives hardware interface
    static void
    jp1_ll_reset_jp1()
    {
      …
      write (lp, &data, sizeof (data));
      JP1_WAIT ();
Unified Debug Solution
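[Figure: unified debug solution. In the firmware world, a debugger (e.g. GDB) speaks a debugger protocol (e.g. GDB RSP) to a unified JTAG abstraction layer. In the hardware world, that layer reaches each target through its own interface:]
– hand-written TLM: TLM 2.0 JTAG model
– simulation or CA model: JTAG simulation
– emulation or FPGA: JTAG driver
– silicon: JTAG driver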
JTAG Abstraction
Generic Class Diagram
[Class diagram: RspServerSC, JtagSC, JtagRegister]
A generic GDB Remote Serial Protocol server communicating with a generic JTAG target by passing a generic JTAG register.
Both the RSP server and JTAG target are abstract classes. Concrete derived classes are created for specific architectures and specific JTAG targets.
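The shape of this abstraction can be sketched with two abstract classes and a register object passed between them. This is an illustrative Python rendering only: the class and method names here are hypothetical stand-ins, not the interfaces of the real SystemC classes, and the loopback target and toy architecture exist purely so the sketch runs.

```python
from abc import ABC, abstractmethod


class JtagRegister:
    """Generic JTAG register: a value plus its length in bits."""

    def __init__(self, bit_length, value=0):
        self.bit_length = bit_length
        self.value = value & ((1 << bit_length) - 1)


class JtagTarget(ABC):
    """Abstract JTAG target; concrete classes drive a model or real hardware."""

    @abstractmethod
    def shift_dr(self, reg: JtagRegister) -> JtagRegister:
        """Shift a register through the target and return what came out."""


class RspServer(ABC):
    """Abstract RSP server; concrete classes know one processor architecture."""

    def __init__(self, target: JtagTarget):
        self.target = target

    @abstractmethod
    def read_register(self, number: int) -> int:
        ...


class LoopbackTarget(JtagTarget):
    """Toy target that simply returns the register it was given."""

    def shift_dr(self, reg):
        return reg


class ToyArchRspServer(RspServer):
    """Toy architecture: the register number is echoed back as its value."""

    def read_register(self, number):
        return self.target.shift_dr(JtagRegister(32, number)).value


server = ToyArchRspServer(LoopbackTarget())
print(hex(server.read_register(0x1234)))  # 0x1234
```

The server never needs to know which kind of target it is talking to, so the same architecture specific server can be paired with a TLM model, a cycle accurate model or a hardware driver.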
JTAG Abstraction
Specific Targets
[Class diagram: RspServerSC, JtagRegister, and JtagSC with derived classes JtagCycleModelSC, JtagTlmSC, JtagVpiSC and Jtag2232SC]
A set of JTAG derived classes provide interfaces to common targets independent of the processor architecture being supported.
JTAG Abstraction
Specific Architecture
[Class diagram: JtagSC, JtagCycleModelSC, JtagTlmSC, JtagVpiSC, Jtag2232SC, RspServerSC, JtagRegister, ArchRspServerSC, ArchJtagRegType]
Using SystemC TLM to model JTAG
[Class diagram: tlm_generic_payload with three optional (0..1) extensions]
tlm_generic_payload
– get_command (), set_command ()
– get_address (), set_address ()
– get_data_length (), set_data_length ()
– get_extension ()
JtagPayloadCommand
– getCommand (), setCommand ()
– IGNORE, READ, WRITE, WRITE_READ
JtagPayloadBitLength
– getBitLength (), setBitLength ()
JtagPayloadAddress
– getAddress (), setAddress ()
– RESET, IR, DR
Use extensions for JTAG specific features
– address, bit length, command
Allows use of generic payload
– maximum portability
Example Adapter
[Figure: adapter between a TLM thread and a cycle accurate thread. A blocking TLM call and return on one side; a TAP state machine driving the SystemC signals tck, tdi, tdo, tms and trst on the other]
Reference: Application Note EAN5
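The TAP state machine at the heart of such an adapter is the standard 16-state controller from IEEE 1149.1, stepped once per tck according to tms. The sketch below models only the state transitions; the state names are spelled as Python identifiers and the helper function is hypothetical. It does not model the data shifting on tdi/tdo or the SystemC signal handling described in the application note.

```python
# For each state: (next state when TMS=0, next state when TMS=1).
TAP = {
    "TEST_LOGIC_RESET": ("RUN_TEST_IDLE", "TEST_LOGIC_RESET"),
    "RUN_TEST_IDLE": ("RUN_TEST_IDLE", "SELECT_DR_SCAN"),
    "SELECT_DR_SCAN": ("CAPTURE_DR", "SELECT_IR_SCAN"),
    "CAPTURE_DR": ("SHIFT_DR", "EXIT1_DR"),
    "SHIFT_DR": ("SHIFT_DR", "EXIT1_DR"),
    "EXIT1_DR": ("PAUSE_DR", "UPDATE_DR"),
    "PAUSE_DR": ("PAUSE_DR", "EXIT2_DR"),
    "EXIT2_DR": ("SHIFT_DR", "UPDATE_DR"),
    "UPDATE_DR": ("RUN_TEST_IDLE", "SELECT_DR_SCAN"),
    "SELECT_IR_SCAN": ("CAPTURE_IR", "TEST_LOGIC_RESET"),
    "CAPTURE_IR": ("SHIFT_IR", "EXIT1_IR"),
    "SHIFT_IR": ("SHIFT_IR", "EXIT1_IR"),
    "EXIT1_IR": ("PAUSE_IR", "UPDATE_IR"),
    "PAUSE_IR": ("PAUSE_IR", "EXIT2_IR"),
    "EXIT2_IR": ("SHIFT_IR", "UPDATE_IR"),
    "UPDATE_IR": ("RUN_TEST_IDLE", "SELECT_DR_SCAN"),
}


def clock_tms(state, tms_bits):
    """Advance the TAP controller by one TCK per TMS bit; return the new state."""
    for tms in tms_bits:
        state = TAP[state][1 if tms else 0]
    return state


# From reset, TMS = 0,1,0,0 walks to Shift-DR, ready to shift a data register.
print(clock_tms("TEST_LOGIC_RESET", [0, 1, 0, 0]))  # SHIFT_DR
```

A useful property to check in any model of the controller is that five clocks with tms held high reach Test-Logic-Reset from every state, which is how a debugger can resynchronize without a trst pin.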
Experience with OpenRISC
The OpenRISC 1000 Project
Objective is to develop a family of open source RISC designs
– 32 and 64-bit architectures
– floating point support
– vector operation support
Key features
– fully free and open source
– linear address space
– register-to-register ALU operations
– two addressing modes
– delayed branches
– Harvard or Stanford memory MMU/cache architecture
– fast context switch
Looks rather like MIPS or DLX
The OpenRISC 1200
[Block diagram: OpenRISC 1200, comprising CPU, ALU, Inst MMU, Inst Cache, Data MMU, Data Cache, Power Mgmt, Debug Unit, Tick Timer, PIC, JTAG and two WishBone interfaces]
32-bit Harvard RISC architecture
– MIPS/DLX like instruction set
– first in OpenRISC 1000 family
– originally developed 1999-2001
Open source under the GNU Lesser General Public License
– allows reuse as a component
Configurable design
– caches and MMUs optional
– core instruction set
Source code Verilog 2001
– approx 32k lines of code
Full GNU tool chain and Linux port
– various RTOS ported as well
Hardware Development
Objective is to use an open source EDA tool chain
– back end tools for FPGA all proprietary (free, as in beer, versions available)
– front end tools now have open source alternatives
OpenRISC 1000 simulation models
– Or1ksim: golden reference ISS (C/SystemC interpreting ISS, 2-5 MIPS)
– Verilator: cycle accurate model from the Verilog RTL (130kHz in C++ or SystemC)
– Icarus Verilog: event driven simulation (1.4kHz, 50x slower than commercial alternatives)
All OpenRISC 1000 simulation models suitable for SW use
– all support GDB debug interface
The Software Tool Chain
A standard GNU tool chain
– binutils 2.20.1
– gcc 4.5.1
– gdb 7.2
– C and C++ language support
Library support
– static libraries only
– newlib 1.18.0 for bare metal (or32-elf-*)
– uClibc 0.9.32 for Linux applications (or32-linux-*)
Testing
– regression tested using Or1ksim (both tool chains)
– or32-linux-* regression tested on hardware
– or32-elf-* regression tested on a Verilator model
Board and OS Support
Boards with BSP implementations
– Or1ksim
– DE-nano
– Terasic DE-2
RTOS support
– FreeRTOS, RTEMS and eCos all ported
Linux support
– adopted into Linux 3.1 kernel mainline
– some limitations (kernel debug, ptrace)
– BusyBox as application environment
Debug interfaces
– JTAG for bare metal
– gdbserver over Ethernet for Linux applications
Comparative Regression Testing of the OpenRISC 1200
=== gcc Summary ===          Golden SystemC TLM Model   Verilator SystemC RTL Model
# of expected passes         52753                      52677
# of unexpected failures     152                        228
# of expected failures       77                         77
# of unresolved testcases    122                        122
# of unsupported tests       716                        716
We can identify two types of problem
– tests which time out with RTL, but not merely because the model is slower
– tests which give a different result with RTL
These are candidates for possible RTL errors
Used commercially by Adapteva Inc
– 50-60 RTL errors eliminated pre-tape out
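In the summaries on this slide the RTL model shows 76 more unexpected failures than the golden model (228 against 152, matched by 76 fewer expected passes). Finding which tests those are is a simple comparison of the two result sets. The sketch below assumes DejaGnu-style result lines of the form "OUTCOME: test name"; the function names and the sample test names are hypothetical.

```python
OUTCOMES = {"PASS", "FAIL", "XFAIL", "XPASS", "UNRESOLVED", "UNSUPPORTED"}


def parse_results(log_text):
    """Map test name -> outcome from 'OUTCOME: name' lines; skip other lines."""
    results = {}
    for line in log_text.splitlines():
        outcome, sep, name = line.partition(": ")
        if sep and outcome in OUTCOMES:
            results[name.strip()] = outcome
    return results


def rtl_suspects(golden, rtl):
    """Tests that pass on the golden model but do not pass (or are missing)
    on the RTL model: the candidates for possible RTL errors."""
    return sorted(t for t, r in golden.items()
                  if r == "PASS" and rtl.get(t) != "PASS")


# Hypothetical excerpts from two result files.
golden_log = """PASS: example/a.c execution
PASS: example/b.c execution
FAIL: example/c.c execution
"""
rtl_log = """PASS: example/a.c execution
FAIL: example/b.c execution
FAIL: example/c.c execution
"""

print(rtl_suspects(parse_results(golden_log), parse_results(rtl_log)))
```

Tests that fail on both models (example/c.c here) are filtered out, since they point at the tool chain or the test rather than at the RTL.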
Summary and Acknowledgements
Summary
Future low power products will require a systems approach
– hardware and software engineers must work together
– the approach applies throughout the lifecycle
The greatest opportunity for power saving is in the software
– techniques for tackling this are still in their infancy
– we need breakthroughs in high level power modeling and simulation
We need a systems oriented tool chain
– geared to the needs of both software and hardware
– usable throughout the product lifecycle
Embecosm's unified debugging approach is an example
– allows software debugging throughout the lifecycle
The benefits can be seen already in the OpenRISC project
– hardware bugs identified by the software engineers
Acknowledgements
Most of the work described in the first section of this presentation on energy aware computing is due to my colleague, Dr Kerstin Eder of the University of Bristol.
OpenRISC is a community project, to which I am just one of the contributors. It is the cumulative result of 12 years' work by a very large number of people.
Thank You
www.embecosm.com
www.opencores.org