Introduction to Complex Embedded System Design
Chun-Jen Tsai
National Chiao Tung University
2/21/2011
Chun-Jen Tsai, CS, NCTU, 2010 2/30
Embedded Systems
Definition of embedded computing systems†:
• Any device that includes a programmable processor but is not itself a general-purpose computer
• Takes advantage of application characteristics to optimize the design
Question: Is the iPhone an embedded system?
iPhone 4 spec:
• ARM Cortex-A8 (1 GHz)
• PowerVR SGX535
• Runs iOS (derived from OS X)
• 32 GB flash storage
• 512 MB SDRAM?
†Wayne Wolf, Computers as Components, Academic Press, 2001
“Real-time” vs. “Embedded” Systems
Real-time system
• Deals with time-critical tasks; the response time is typically in the tens of milliseconds range
• Does not necessarily reside in “small” systems
Embedded system
• Response time is typically still an issue, but is less time-critical
• Mainly for small systems (“possibly” battery powered)
“Board-level” vs. “Chip-level” Design
Board-level design
• More flexibility in the development cycle
• High manufacturing cost for large quantities
• High power consumption
• Bad for small form factor products
Chip-level design (i.e. SoC)
• Difficult to design, debug, and verify
• Design constrained by the manufacturing process
• Low cost for large quantities
• Low power consumption
• Excellent for small devices
System-on-Chip Definition
SoC: a complex IC that integrates the major functional elements of a complete end product into a single chip
The SoC design typically incorporates:
• Embedded processor cores
• On-chip memory
• Custom accelerator logic
• I/O logic
• Embedded software
[Figure: SoC block diagram. A RISC core, a DSP core, custom logic, ROM (flash) for firmware, SRAM banks (scratch-pad memory), cache, and I/O logic, connected by a bus.]
Why Multi-core Architecture?
RISC cores:
• System control tasks (condition & branch)
Digital signal processor (DSP) cores:
• Parallel data processing
• Multiple buses for higher memory bandwidth
Application-specific IP cores:
• Computationally intensive, parallelizable operations
• Customized buffers for memory bandwidth reduction
[Figure: Handset block diagram. Audio interface (microphone ADC, speaker DAC); digital baseband (RISC, DSP, AS logic); analog baseband; RF interface; radio subsystem (receiver, synthesizer, modulator, power amp, antenna); display; keypad; SIM card.]
Embedded Software Mapping
ESW mapping onto a 2G dual-core handset architecture:
[Figure: Baseband processor software mapping. GPP (RISC), running the GPP RTOS: Layer 2/3, radio management, service-dependent functions, system applications. DSP (VLIW), running the DSP RTOS: channel codec, demodulation, decryption/encryption, speech codec. The two cores are linked by shared memory.]
Two-Chip 3G Architecture
[Figure: Two-chip 3G handset block diagram. Baseband side: audio interface (microphone ADC, speaker DAC); digital baseband (RISC, DSP, AS logic); analog baseband; RF interface; radio subsystem (receiver, synthesizer, modulator, power amp, antenna); display; keypad; SIM card. Application processor: multimedia accelerator and processor cores (RISC + DSP + controllers), with an image sensor, a Bluetooth transceiver, and a GPS receiver attached.]
Two-Chip 3G Software Mapping
Current 3G takes a lazy approach to software mapping:
[Figure: Two-chip 3G software mapping. Application processor: multimedia OS, middleware (for service providers), applications (service providers & 3rd parties), man-machine interface. Baseband processor: GPP RTOS (Layer 2/3, radio management, service-dependent functions, system applications) and DSP RTOS (channel codec, demodulation, decryption/encryption, speech codec). The processors communicate through IPC (shared memory).]
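The IPC (shared memory) link in the figure can be pictured as a small mailbox that one processor fills and the other drains. The following is only a minimal sketch under stated assumptions, not the mechanism of any particular chipset: the type mailbox_t, the functions mbox_init, mbox_send, and mbox_receive, and the capacity MBOX_SLOTS are hypothetical names, and it assumes exactly one producer and one consumer whose C11 atomics are coherent across the shared region (real heterogeneous platforms may also need cache maintenance or hardware semaphores).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MBOX_SLOTS 8u  /* assumed capacity; must be a power of two */

/* Hypothetical mailbox layout placed in memory visible to both processors. */
typedef struct {
    _Atomic uint32_t head;      /* free-running count, advanced only by the consumer */
    _Atomic uint32_t tail;      /* free-running count, advanced only by the producer */
    uint32_t slot[MBOX_SLOTS];  /* message payloads */
} mailbox_t;

void mbox_init(mailbox_t *m)
{
    atomic_init(&m->head, 0);
    atomic_init(&m->tail, 0);
}

/* Producer side: returns false if the mailbox is full. */
bool mbox_send(mailbox_t *m, uint32_t msg)
{
    uint32_t tail = atomic_load_explicit(&m->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&m->head, memory_order_acquire);
    if (tail - head == MBOX_SLOTS)
        return false;
    m->slot[tail % MBOX_SLOTS] = msg;
    /* Publish the slot contents before the new tail becomes visible. */
    atomic_store_explicit(&m->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false if the mailbox is empty. */
bool mbox_receive(mailbox_t *m, uint32_t *msg)
{
    uint32_t head = atomic_load_explicit(&m->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&m->tail, memory_order_acquire);
    if (head == tail)
        return false;
    *msg = m->slot[head % MBOX_SLOTS];
    /* Release the slot back to the producer. */
    atomic_store_explicit(&m->head, head + 1, memory_order_release);
    return true;
}
```

In this sketch each index has a single writer, so no lock is needed; that is the usual reason a one-way mailbox per direction is attractive when the two sides run different operating systems.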
Commercial-Grade SW Architecture
Freescale provides a cleaner architecture for the converged platform†:
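[Figure: Freescale converged-platform software architecture. RISC side: open OS & middleware framework, user interface, applications, service provider applications. DSP side: RTOS, Signaling SP Engine, audio codecs, physical layer tasks. The two sides are linked by shared memory; the protocol stack is labeled layer one, layer two, and layer three.]
[Figure: DSP load for EDGE class 12: idle (~40%), layer 1, layer 2/3.]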
†Freescale White Paper MXCWPD/D Rev. 2, Feb. 2004
Product Development Process
[Figure: Development flow: Requirements → Specification → Architecture → Component design → System integration.]
Questions raised along the way:
• Hardware platform?
• Software platform?
• Do we need a processor?
• Do we need an OS?
3G Mobile Receiver Functional Blocks
[Figure: 3G mobile receiver functional blocks, each marked as a firmware block or a hardware block: 3GPP L2, packet-based network interface, session control, session establishment, capability exchange, scene description, synchronization, spatial layout, synthetic audio decoder, speech decoder, audio decoder, timed text decoder, text, vector graphics decoder, image decoder, video decoder, terminal capabilities, user interface, graphics display, sound device.]
System Allocation/Partition Issues
[Figure: The 3G receiver functional blocks (3GPP L2, packet-based network interface, session control, session establishment, capability exchange, scene description, synchronization, spatial layout, the audio, speech, text, graphics, image, and video decoders, terminal capabilities, user interface, graphics display, sound device) shown next to the two-chip handset hardware (audio interface, digital baseband with RISC, DSP, and AS logic, analog baseband, RF interface, radio subsystem, display, keypad, SIM card, image sensor, MPEG accelerator, RISC core, DSP core, I/O control for BT and USB, Bluetooth transceiver, GPS receiver). The open issue is how to allocate the former onto the latter.]
Ideally, optimal design should be done by:
• System-level SW/HW partitioning
• Component-level SW/HW partitioning
• Hardware platform customization
Practically,
• Time-to-market pressure prevents optimal design
• HW/SW platforms will probably converge (just as Wintel did in the PC era)
Another Example: MHP STB
Multimedia Home Platform (MHP) set-top boxes are considered the future of TV
In MHP, application programs (e.g. Java) can be executed together with the audio-visual programs:
[Figure: Example MHP applications on screen: score cards, EPG, online order, menu.]
DVB-MHP Architecture (by the Book)
[Figure: DVB-MHP architecture stack.
SW: Xlets, Xlets based on GEM, and native applications; application manager (navigator); MHP middleware (Sun Java-TV, DVB, DAVIC, HAVi, JMF 1.1, platform-dependent modules); Java standard classes (CDC/PBP); Java virtual machine (CVM); audio codec (HE-AAC), graphic library (e.g. Microwin), video codec (H.264, MPEG-2), MPEG-2 TS demux; operating systems.
HW: SoC with a RISC processor (< 300 MHz), I/O devices, video accelerator, audio accelerator, graphics accelerator.]
Design Consideration for MHP STB
The roles of the OS (e.g. Linux) and the Java RE overlap: a waste of embedded system resources
The processor clock rate should stay low to keep heat dissipation down: looks are everything for an STB
The performance of the Java RE has to be high: the sky is the limit!
Why Use RISC/DSP Cores?
Alternatives: custom logic (IP cores)
Possible reasons:
• “Lower” design cost: conventional wisdom held that software design is easier/cheaper than hardware design (however, this is no longer true)
• Efficiency for complex embedded systems: RISC/DSPs use the same logic to perform many different functions
• Upgradeability: RISC/DSPs simplify the design of families of products
The Performance Paradox
Microprocessors use much more logic to implement a function than custom logic does:
• High power consumption
• High manufacturing cost (large die size)
But microprocessors are often at least as fast:
• Heavily pipelined
• Aggressive VLSI technology
• Large design teams for optimization
Why Use an OS?
Alternatives: firmware handles the GUI as well as direct interfacing to I/O
Possible reasons:
• OSes are flexible: functions (software) and modules (hardware) can easily be added to the devices
• OSes simplify the design of families of products
Strong reasons:
• Enable third-party developers to promote your platform
Performance Paradox Again
OSes add overhead between software applications and hardware (typically 2 to 5%)
A well-designed OS can force the designer to write software “the right way,” which often increases system performance
Each OS usually comes with a nice SDK, which cuts down system development time drastically
But for chip-level functionality, firmware is the only way to go
Layers of Embedded Systems
Today, embedded systems have a layered structure:
• Applications (e.g. MP3): what consumers are willing to pay for
• Middleware: what enables application portability
• OS/Firmware: what enables system portability
• Hardware: where we make money from
The layers can split or merge, depending on the cost, design expertise, time-to-market constraints, etc.
Unique Multimedia ES Features
Multimedia embedded systems have the following features:
• High level of task parallelism (spells multi-core)
• High level of data parallelism (spells SIMD)
• Low power, small form factor (spells SoC)
• Low clock rate, large data processing (spells memory-centric architecture)
• Sophisticated GUI, device, and application variety (spells OS)
• Low cost (spells trouble)
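As a small illustration of why data parallelism "spells SIMD," the sketch below adds four 8-bit samples packed in one 32-bit word with a single pass of ordinary integer operations. This is a portable software emulation (often called SWAR), not the instruction set of any specific DSP or multimedia extension; the function name add_4x8 is an assumption for illustration.

```c
#include <stdint.h>

/* Adds four packed 8-bit lanes with per-lane wraparound.
 * The top bit of each lane is handled separately so that a carry
 * never crosses from one lane into the next. */
uint32_t add_4x8(uint32_t a, uint32_t b)
{
    /* Add the low 7 bits of every lane; carries stay inside the lane. */
    uint32_t low7 = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);
    /* Fold in the top bit of every lane without generating a carry out. */
    return low7 ^ ((a ^ b) & 0x80808080u);
}
```

A hardware SIMD unit does the same job with a dedicated instruction, which is one reason pixel and audio kernels map well onto it.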
Weakness of Current Platforms (1/2)
Hardware issues
• Too many independently designed cores
  • Application processors are similar to digital baseband processors
  • Many accelerators (IP cores) contain a µP inside
• Inflexible hardwired accelerators
• Shared bus architecture
• High power consumption
Weakness of Current Platforms (2/2)
Software issues
• Symbian OS, WinCE, and Embedded Linux are still too heavy
• No kernel supports heterogeneous multiple processors at runtime
• No kernel supports automatic distributed RAM management
• Embedded software pieces are often “direct ports” from the PC
• Efficient implementations of middleware protocols are rare
Memory-Centric Applications
Many embedded systems are for memory-centric applications:
• Multimedia applications
• Network applications
There are some fundamental differences among RISC CPUs, DSPs, and hard-wired logic in memory-centric processing
Memory-Centric Processing (1/3)
Let’s say we have two functional blocks that do the following:
FU#1:
    /* element-wise addition of two N x N arrays */
    for (idx = 0; idx < N; idx++)
        for (jdx = 0; jdx < N; jdx++) {
            C[idx][jdx] = A[idx][jdx] + B[idx][jdx];
        }
FU#2:
    /* reorder C through the row and column scan tables */
    for (idx = 0; idx < N; idx++)
        for (jdx = 0; jdx < N; jdx++) {
            row = scan_row[idx], col = scan_col[jdx];
            C2[idx][jdx] = C[row][col];
        }
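To make the memory traffic of these two blocks easier to see, the sketch below (not from the original slides) writes FU#1 and FU#2 as C functions and adds a fused variant that produces C2 without ever storing or re-reading the intermediate array C. The fixed size N = 4 and the function names fu1, fu2, and fused are assumptions for illustration, and fusing is only valid when nothing else needs C.

```c
#include <stddef.h>

#define N 4  /* assumed block size, for illustration only */

/* FU#1: element-wise addition, C = A + B. */
void fu1(int A[N][N], int B[N][N], int C[N][N])
{
    for (size_t idx = 0; idx < N; idx++)
        for (size_t jdx = 0; jdx < N; jdx++)
            C[idx][jdx] = A[idx][jdx] + B[idx][jdx];
}

/* FU#2: reorder C through the row and column scan tables. */
void fu2(int C[N][N], const int scan_row[N], const int scan_col[N],
         int C2[N][N])
{
    for (size_t idx = 0; idx < N; idx++)
        for (size_t jdx = 0; jdx < N; jdx++) {
            int row = scan_row[idx], col = scan_col[jdx];
            C2[idx][jdx] = C[row][col];
        }
}

/* Fused FU#1 + FU#2: same C2, but C is never written or read back. */
void fused(int A[N][N], int B[N][N], const int scan_row[N],
           const int scan_col[N], int C2[N][N])
{
    for (size_t idx = 0; idx < N; idx++)
        for (size_t jdx = 0; jdx < N; jdx++) {
            int row = scan_row[idx], col = scan_col[jdx];
            C2[idx][jdx] = A[row][col] + B[row][col];
        }
}
```

The fused loop saves one write and one read of C per element, at the price of scattered accesses to A and B; which version wins depends on the memory system, which is the point of the slides that follow.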
Memory-Centric Processing (2/3)
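System blocks:
[Figure: processing chain FU#0 → FU#1 → FU#2 → FU#3.]
Pure software approach:
[Figure: a RISC core and memory on a shared bus; the data transfers between them are labeled 1 to 4.]
Drive up the clock frequency until the memory bandwidth is large enough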
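A rough way to see what "large enough" means in the pure software approach is to count the array accesses in the FU#1 and FU#2 loops. The sketch below assumes every array access crosses the shared bus (no cache or register reuse), that all arrays and scan tables use the same element size, and that instruction fetches are ignored; the function names and parameters are hypothetical.

```c
#include <stdint.h>

/* Bus accesses per array element, counted from the loops as written:
 * FU#1 reads A and B and writes C (3 accesses);
 * FU#2 reads scan_row, scan_col, and C, and writes C2 (4 accesses). */
enum { FU1_ACCESSES = 3, FU2_ACCESSES = 4 };

/* Bytes moved over the bus to process one N x N block. */
uint64_t bus_bytes_per_block(uint64_t n, uint64_t elem_bytes)
{
    return (uint64_t)(FU1_ACCESSES + FU2_ACCESSES) * n * n * elem_bytes;
}

/* Sustained bus bandwidth (bytes per second) for a given block rate. */
uint64_t required_bus_bandwidth(uint64_t n, uint64_t elem_bytes,
                                uint64_t blocks_per_second)
{
    return bus_bytes_per_block(n, elem_bytes) * blocks_per_second;
}
```

For example, with the hypothetical values N = 8 and 2-byte elements, one block moves 7 × 64 × 2 = 896 bytes, so the bus load scales directly with the block rate even though the arithmetic itself is trivial.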
Memory-Centric Processing (3/3)
Hardwired chip solution:
[Figure: RISC, FU#0, a FIFO, FU#1, FU#2 (AGU), and memory, connected by a bus.]
Dual-core architecture:
[Figure: RISC and memory on a bus, plus a DSP with DMA and on-chip DARAM and SARAM.]
DSP code for FU#0, FU#1, and FU#2 (may use DMA)
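One common way DSP code "may use DMA" is ping-pong (double) buffering: while the DSP computes on one on-chip buffer, the DMA fills the other. The sketch below only models that schedule sequentially in plain C. The helper dma_copy is a memcpy stand-in for a real DMA engine, and CHUNK, dma_copy, and pingpong_add are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 4  /* assumed on-chip buffer size, in elements */

/* Stand-in for a DMA transfer; a real DMA engine would run concurrently. */
static void dma_copy(int *dst, const int *src, size_t count)
{
    memcpy(dst, src, count * sizeof *dst);
}

/* Computes out[i] = a[i] + b[i], streaming the inputs through two
 * alternating pairs of small buffers that model on-chip RAM. */
void pingpong_add(const int *a, const int *b, int *out, size_t total)
{
    int buf_a[2][CHUNK], buf_b[2][CHUNK];
    size_t nchunks = (total + CHUNK - 1) / CHUNK;
    int cur = 0;

    if (nchunks == 0)
        return;

    /* Prime the first buffer pair before entering the loop. */
    size_t first = total < CHUNK ? total : CHUNK;
    dma_copy(buf_a[cur], a, first);
    dma_copy(buf_b[cur], b, first);

    for (size_t k = 0; k < nchunks; k++) {
        size_t base = k * CHUNK;
        size_t len = (total - base < CHUNK) ? total - base : CHUNK;

        /* Kick off the fetch of the next chunk into the other pair. */
        if (k + 1 < nchunks) {
            size_t nbase = base + CHUNK;
            size_t nlen = (total - nbase < CHUNK) ? total - nbase : CHUNK;
            dma_copy(buf_a[1 - cur], a + nbase, nlen);
            dma_copy(buf_b[1 - cur], b + nbase, nlen);
        }

        /* Compute on the current pair (on hardware, overlapped with DMA). */
        for (size_t i = 0; i < len; i++)
            out[base + i] = buf_a[cur][i] + buf_b[cur][i];

        cur = 1 - cur;  /* swap roles of the two buffer pairs */
    }
}
```

On real hardware the fetch and the compute would overlap in time and the code would wait on a DMA-complete flag before swapping; the sequential model only shows the buffer bookkeeping.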
The Ideal Platform for EMS
Hardware
• Dual-core, flexible DMAs, flexible accelerators
• Sophisticated memory controllers
• Distributed SRAM
• Reconfigurable logic blocks (e.g. FPGAs)
Software
• Clean OS that provides just enough abstraction
• Kernel that handles dynamic partitioning and distributed RAM utilization optimally
• Efficient middleware layer
Discussions
Multimedia devices, from handsets to digital TVs, increasingly overlap in function
Tight cost/performance requirements make embedded system design a challenging problem
IP-driven design practices leave a lot of room for further optimization
For complex embedded IP-core design, full understanding/analysis of the algorithm (esp. its data access pattern) is the key to efficiency
Platform convergence will probably happen, but it will not be as focused as in the PC market