Foil # 1
EE 382V
System-on-a-Chip Design
Class Overview
Foil # 2
System-on-Chip Market Size
[Figure: SoC market size compared with worldwide semiconductor market size, 1991 to 2011, in billions of dollars (B$), including the MS-SOC contribution to the SoC market size]
Source: SONY Corp & Market Estimates
Foil # 3
System-Level Design Flow Driver
Foil # 4
Design Productivity
[Figure: log-scale trends over time, 1981 to 2009, showing the silicon system design gap and the HW design gap]
• Capability of technology (gates/chip): 2x / 18 months (Moore's Law)
• HW design productivity (gates/day): 1.6x / 18 months
• Additional SW required for HW (LoC SW/chip): 2x every 10 months
• Software productivity (LoC/day): 2x / 5 years
• Average HW + Design Productivity
Source: IFX/ST, published in ITRS
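The rates on this foil compound, so the gaps widen exponentially. The sketch below works through that arithmetic. It assumes the quoted rates stay constant over the whole period, and the function names are illustrative, not taken from the ITRS.

```python
def growth_factor(multiplier, period_months, elapsed_months):
    """Compound growth: `multiplier` per `period_months`, after `elapsed_months`."""
    return multiplier ** (elapsed_months / period_months)


def hw_design_gap(years):
    """Ratio of technology capability (2x / 18 months) to
    HW design productivity (1.6x / 18 months) after `years`."""
    months = 12 * years
    return growth_factor(2.0, 18, months) / growth_factor(1.6, 18, months)


def sw_design_gap(years):
    """Ratio of SW required (2x every 10 months) to
    SW productivity (2x / 5 years) after `years`."""
    months = 12 * years
    return growth_factor(2.0, 10, months) / growth_factor(2.0, 60, months)


hw_gap_10y = hw_design_gap(10)   # roughly 4.4x
sw_gap_10y = sw_design_gap(10)   # 2**12 / 2**2 = 1024x
```

Under these assumptions the hardware gap grows by about 4.4x per decade, while the software gap doubles every year.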
Foil # 5
Abstraction Levels
• Growing system complexities
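[Figure: abstraction levels (transistor, gate, RTL, algorithm, system) plotted against number of components, from 1E7 at the transistor level down to 1E0 at the system level; abstraction and accuracy run in opposite directions]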
Source: R. Doemer, UC Irvine
Foil # 6
Abstraction Levels
• Growing system complexities
Move to higher levels of abstraction
[Figure: abstraction levels (transistor, gate, RTL, algorithm, system) plotted against number of components, from 1E7 down to 1E0, with the system level highlighted; abstraction and accuracy run in opposite directions]
Source: R. Doemer, UC Irvine
Foil # 7
Abstraction Levels
[Figure: implementation detail increases from high abstraction to low abstraction along two axes]
• Structure (spatial order): from unstructured to physical layout
• Timing (temporal order): from untimed to real time
Foil # 8
Top-Down Design Flow
[Figure: top-down design flow, from high to low abstraction]
• Levels: Product planning, Specification, Architecture, Implementation
• Design steps: System-Level Design, Processor-Level Design, Logic-Level Design
• Structure: requirements, pure functional, bus functional, RTL / IS, gates
• Timing: constraints, untimed, timing accurate, cycle accurate, gate delays
Foil # 9
Integrated Circuits and Systems
[Figure: course map across design levels]
• Levels: Algorithms; System: HW & SW; Macro Components; Cells
• Courses: VLSI-I, VLSI-II, VLSI Testing, Analog-1, MS System Design, HW/SW Arch, RF-IC Design, SOC Design, Embedded System Design, Wireless Communications
Foil # 10
Electronic System-Level (ESL) Design
• From system specification
  – Functionality, behavior
    • Application algorithms
    • Constraints
• To system architecture
  – Structure
    • Spatial and temporal order
    • Components and connectivity
    • Across hardware and software
Design automation (EDA/CAD) at the system level
  – Modeling and simulation
  – Synthesis
  – Verification
[Figure: a specification of communicating processes (Proc) with requirements and constraints (functional & timing) is turned by computation & communication design into an architecture (µProcessor, memories, IP component, custom HW, and interfaces on a bus), followed by implementation (HW/SW synthesis)]
Foil # 11
System Specification
• Capture high-level requirements
  – Functional
    • Free of any implementation details
  – Non-functional
    • Quality metrics, constraints
• Formal representation
  – Models of computation
    • Analysis of properties
  – Executable
    • Validation through simulation
Algorithm development
Concept to precise description of desired system behavior
Separation of computation and communication
[Figure: specification as a network of communicating processes (Proc)]
Natural language: ambiguous, incomplete
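An executable specification that separates computation from communication can be sketched as processes that interact only through channels. The example below is a minimal illustration, not the modeling style of any particular SLDL: it assumes untimed processes with blocking reads on FIFO channels (in the spirit of a Kahn process network), and the process names and the `None` end-of-stream marker are hypothetical.

```python
import queue
import threading


def producer(out_ch, samples):
    """Computation: emit samples. Communication: only via out_ch."""
    for s in samples:
        out_ch.put(s)
    out_ch.put(None)  # end-of-stream marker (a convention of this sketch)


def scale(in_ch, out_ch, gain):
    """Computation: multiply by gain. Blocks until input is available."""
    while True:
        s = in_ch.get()
        if s is None:
            out_ch.put(None)
            return
        out_ch.put(gain * s)


def run_spec(samples, gain=2):
    """Wire the processes together and collect the output stream."""
    c1, c2 = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=producer, args=(c1, samples)),
        threading.Thread(target=scale, args=(c1, c2, gain)),
    ]
    for t in threads:
        t.start()
    result = []
    while True:
        s = c2.get()
        if s is None:
            break
        result.append(s)
    for t in threads:
        t.join()
    return result
```

Because the processes never share state, each one could later be mapped to a different processing element without changing its code; only the channel implementation would change.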
Foil # 12
System Architecture
• Processing elements (PEs)
  – Processors
    • General-purpose, programmable
    • Digital signal processors (DSPs)
    • Application-specific instruction-set processors (ASIPs)
    • Custom hardware processors
    • Intellectual property (IP)
  – Memories
• Communication elements (CEs)
  – Transducers, bus bridges
  – I/O peripherals
• Busses
  – Communication media
    • Parallel, master/slave protocols
    • Serial and network media
[Figure: example architecture with a µProcessor, memories, an IP component, custom HW, and interfaces connected by a bus]
Heterogeneous multi-processor systems (MPSoCs)
Foil # 13
System Design Tasks
• Computation
  – PE Allocation
    • Processor types and numbers
    • Local and global memories
  – Partitioning
    • Behavior to processor mapping
    • Variable to storage mapping
  – Scheduling
    • Static scheduling
    • Dynamic scheduling
• Communication
  – Network allocation
    • Busses and CEs
  – Mapping
    • Shared memory vs. message-passing
    • Routing
  – Interface synthesis
    • Addresses and interrupts
    • Bus parameters, priorities
Design space exploration
Unified view across system implementation choices
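Partitioning and design space exploration can be illustrated with a toy search over behavior-to-PE mappings. This is a minimal sketch under strong assumptions: the behaviors are independent, communication cost is ignored, and the behavior names and execution times are hypothetical numbers chosen for illustration.

```python
from itertools import product

# Hypothetical execution times (ms) of each behavior on each PE type.
EXEC_TIME = {
    "fft":     {"cpu": 8,  "hw": 2},
    "viterbi": {"cpu": 10, "hw": 3},
    "audio":   {"cpu": 4,  "hw": 4},
}


def makespan(mapping):
    """Finish time when behaviors on the same PE run back to back
    and different PEs run in parallel."""
    load = {}
    for behavior, pe in mapping.items():
        load[pe] = load.get(pe, 0) + EXEC_TIME[behavior][pe]
    return max(load.values())


def explore(pes=("cpu", "hw")):
    """Exhaustively evaluate every mapping and return (cost, mapping)
    for the one with the smallest makespan."""
    behaviors = list(EXEC_TIME)
    best = None
    for choice in product(pes, repeat=len(behaviors)):
        mapping = dict(zip(behaviors, choice))
        cost = makespan(mapping)
        if best is None or cost < best[0]:
            best = (cost, mapping)
    return best


best_cost, best_mapping = explore()
```

Exhaustive search only works for a handful of behaviors, since the number of mappings grows exponentially; realistic exploration would need heuristics and a model that also accounts for communication.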
Foil # 14
Design Challenges
• Design quality metrics and constraints
  – Performance
    • Latency and throughput
  – Power
    • Static and dynamic power consumption
  – Cost
    • Unit and non-recurring engineering (NRE) costs
  – Dependability and reliability
    • Fault tolerance, safety, correctness, mean time between failures (MTBF)
  – Management
    • Time-to-market, maintainability, flexibility
  – …
Multi-objective optimization and trade-offs
Optimize metrics while satisfying all constraints
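One common way to reason about such trade-offs is to keep only the Pareto-optimal designs and then apply the constraints. The sketch below assumes three metrics where lower is better (latency, power, unit cost); the design names and numbers are hypothetical.

```python
def dominates(a, b):
    """True if design a is no worse than b in every metric and
    strictly better in at least one (lower is better)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(designs):
    """Keep the designs that no other design dominates."""
    front = {}
    for name, metrics in designs.items():
        if not any(dominates(other, metrics)
                   for other_name, other in designs.items()
                   if other_name != name):
            front[name] = metrics
    return front


def cheapest_feasible(designs, max_latency_ms, max_power_mw):
    """Among Pareto-optimal designs that meet both constraints,
    return the name of the one with the lowest unit cost."""
    candidates = {n: m for n, m in pareto_front(designs).items()
                  if m[0] <= max_latency_ms and m[1] <= max_power_mw}
    if not candidates:
        return None
    return min(candidates, key=lambda n: candidates[n][2])


# Hypothetical design points: (latency_ms, power_mw, unit_cost)
DESIGNS = {
    "sw_only":    (40, 120, 5),
    "hw_viterbi": (15, 150, 8),
    "hw_all":     (5, 300, 20),
    "bad":        (45, 160, 9),
}
```

Here `bad` is dominated by `sw_only` and drops out, while the other three designs each win on at least one metric, so the constraints decide between them.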
Foil # 15
ESL Design Today
• Simulation-centric system modeling
  – Virtual system prototyping [CoWare, VaST]
    • C-based, abstracted system modeling [TLM]
    • System-level design languages (SLDLs) [SystemC]
    • Processor instruction-set simulation (ISS) models [Tensilica, LISATek]
  – Algorithmic specification [SPW, MATLAB, COSSAP]
    • Varying models of computation (MoC) [Ptolemy]
    • Model-based design [UML, MATLAB/Simulink]
Horizontal integration of different models / components
Lack of vertical integration for a synthesis-centric approach
• High-level synthesis
  – C-to-RTL [Forte]
    • Testbench specification [SystemVerilog]
Single hardware unit only
Foil # 16
ESL Landscape
[Figure: ESL design flow. An application model (behaviors B1 to B4, channels C1 and C2, variable v1), written in C/C++ code or a system-level design language (SLDL), and a platform (CPU, Mem, HW IP, bridge, arbiter) enter a system synthesis front-end for computation & communication, which produces transaction-level models (TLM). A software / hardware synthesis back-end, alongside instruction-set simulators and C-based RTL, then produces SW binary code and HW HDL.]
Tools, languages, and standards named in the figure:
• SpecC, Metropolis, …
• Matlab/Simulink, LabView, …
• MARTE (UML)
• SPIRIT/IP-XACT (XML)
• SystemC TLM 2.0, CoWare
• VaST, ARM Realview, …
• Tensilica
• Mentor Catapult, Forte, …
• Green Hills, gcc, VxWorks, …
• VHDL/Verilog, Synopsys, …
Foil # 17
System Design Courses
[Figure: the ESL design flow (application in C/C++ code or an SLDL, platform, system synthesis front-end, transaction-level models, software / hardware synthesis back-end, SW binary code and HW HDL), annotated with the courses that cover it]
• EE382V: Embedded System Design
• EE382N: Adv. System Architecture
• EE382V: System-on-Chip Design
Foil # 18
SOC Methodology Flow Chart
[Flow chart with the following steps and decision points, each decision having Yes/No branches]
• MRD
• PRD
• Start
• Map, Model & Simulate in SPW or Matlab or C or C++
• Analyze results
• Metrics Met?
• Modify Model?
• Mapping to Platform or Components Complete?
• Freeze Architecture
• MRD Met?
• Done
• Analyze results (sub-flow): Functionality Met? System BOM Costs Met? Power Req. Met? Schedule Req. Met? Platform Req. Met? Return
Foil # 19
Goals of the Class
• This course is intended to:
  – Provide an understanding of the concepts, issues, and process of system-on-chip (SoC) design, i.e., hardware-software co-design & co-verification.
  – Expose the student to the modeling and specification of an SoC at a high level of abstraction.
  – Use co-simulation to validate system functionality.
  – Analyze hardware/software tradeoffs, algorithms, and architectures to optimize the system based on requirements and implementation constraints.
  – Describe architectures for control-dominated and data-dominated systems and real-time systems.
  – Understand hardware, software, and interface design/synthesis.
  – Describe examples of applications and systems developed using a co-design approach.
  – Appreciate issues in system-on-a-chip design associated with co-design, such as intellectual property, reuse, and verification.
Foil # 20
EE 382V-SoC-ICS
• Instructors:
  – Andreas Gerstlauer, Mark McDermott, Steven Smith, Jacob Abraham
  – TAs: Hyungman Park, Sriram Sambamurthy
• Web URL: http://www.ece.utexas.edu/~gerstl/ee382v-ics_f09
• Lecture notes, homework and lab exercises will be posted on the web site.
• There will be one exam and a project review at the end of the semester.
  – Exam: November 13th
  – Project Review: Dec 3rd and Dec 4th
• Grading:
  – Homework: 15%
  – Lab assignments: 30%
  – Exam: 20%
  – Project: 35%
Foil # 21
Labs – Lab1
• Lab1: Static profiling of DRM on ARMSD, exercise of float to fixed conversion of Viterbi Decoder.
  – This lab is an individual exercise (3 weeks, Aug 24 to Sep 13)
  – Profiling is necessary to
    • Identify time consuming functions
    • Optimize the code to increase performance
  – Static profiling of DRM code
    • Floating point code
    • Fixed point code – legacy code from last semester
    • Report and discuss the results
  – Exercise to convert floating point code to fixed point code
    • Matrix inversion
    • Get the same output for the given test cases (no loss)
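The basic mechanics of float-to-fixed conversion can be sketched as follows. The foil does not say which fixed-point format the lab uses, so this example assumes a signed 16-bit Q15 format (15 fractional bits) with saturation; the helper names are illustrative.

```python
Q = 15                      # number of fractional bits (assumed Q15 format)
SCALE = 1 << Q              # 32768
INT16_MIN, INT16_MAX = -32768, 32767


def saturate(x):
    """Clamp an integer to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, x))


def to_q15(value):
    """Convert a float in roughly [-1, 1) to Q15, saturating on overflow."""
    return saturate(int(round(value * SCALE)))


def from_q15(fixed):
    """Convert a Q15 integer back to a float."""
    return fixed / SCALE


def q15_mul(a, b):
    """Multiply two Q15 values: form the wide product, add half an LSB
    for rounding, shift back to Q15, and saturate."""
    return saturate((a * b + (1 << (Q - 1))) >> Q)


half = to_q15(0.5)                        # 16384
product_q15 = q15_mul(half, to_q15(0.25)) # 4096, i.e. 0.125
```

Note that +1.0 is not representable in Q15 and saturates to 32767, which is one reason bit-exact agreement with floating-point output depends on the value ranges in the test cases.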
Foil # 22
Labs – Lab2
• Lab2: Improve the SNR of the fixed point DRM code.
  – This lab is a group exercise (3 weeks, Sep 19 to Oct 11)
  – When converting from floating point to fixed point
    • Degradation of signal occurs due to precision loss
    • Not always possible to get the same output
    • SNR (signal to noise ratio) metric to identify the loss
      – Between the final output wave files
  – Objective: Improve SNR
    • Different levels of scaling for different functions
    • Bugs – mismatch of scaling at interface?
    • Identifying operations that need to be kept in floating point
  – Input: fixed point DRM code
  – Output: fixed point DRM code with improved SNR
    • and fewer floating point operations? (profile)
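The SNR between two output sample streams can be computed as sketched below. The foil does not give the exact definition used in the lab, so this assumes the common form, 10·log10 of reference signal energy over error energy, with the floating-point output as the reference; reading the wave files themselves is left out.

```python
import math


def snr_db(reference, test):
    """SNR in dB of `test` against `reference` (equal-length sample lists).

    Signal energy is the sum of squared reference samples; noise energy
    is the sum of squared differences between the two streams.
    """
    if len(reference) != len(test):
        raise ValueError("sample streams must have the same length")
    signal = sum(r * r for r in reference)
    noise = sum((r - t) ** 2 for r, t in zip(reference, test))
    if noise == 0:
        return math.inf       # identical outputs: no loss
    if signal == 0:
        return -math.inf      # silent reference but non-zero error
    return 10.0 * math.log10(signal / noise)


# Example: every sample is off by 10% of its amplitude, i.e. about 20 dB.
ref = [1.0, -1.0, 1.0, -1.0]
fixed_out = [0.9, -0.9, 0.9, -0.9]
example_snr = snr_db(ref, fixed_out)
```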
Foil # 23
Labs – Lab3
• Lab3: Synthesize the Viterbi decoder from C++ to Verilog
  – This lab is a group exercise (3 weeks, Oct 17 to Nov 8)
  – Input: fixed point Viterbi decoder
    • ViterbiDecoder.h, ViterbiDecoder.cpp
  – Output: viterbi_decoder.v
  – Why convert? To implement in hardware
  – Steps:
    • C++ code to stand-alone module (threaded C)
      – Why threaded? Module can be put to sleep when not used
    • Threaded C to Catapult C
      – Performance and synthesis optimizations
    • C-to-Verilog
      – Mentor Catapult high-level synthesis tool
  – Propose ways to improve the performance of the resulting Verilog code
Foil # 24
Labs – Lab4 (Tutorial)
• Tutorial
  – For establishing communication between software and hardware
    • Proper address translation in software (map to hardware physical address)
    • State machines in hardware
  – For synthesizing and downloading the hardware module into the FPGA using Xilinx software
  – Example C/C++ program that runs on Linux on the host processor (ARM)
    • Adds two numbers and displays the result; the adder is implemented in the FPGA
  – Example of using interrupts for communication
  – Power spreadsheet for calculating power (65 nm)
• Project:
  – Modules of DRM instead of adder (at least Viterbi decoder)
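The software side of the adder example amounts to writing operands to registers at fixed offsets and reading the result back. The sketch below simulates that pattern in pure software: the register offsets, the class name, and the behavior of the fake device are all assumptions for illustration, since the tutorial's actual register map is not given here. On the real board the byte array would be replaced by a mapping of the FPGA's physical address window.

```python
import struct

# Hypothetical register offsets inside the adder's address window.
REG_A, REG_B, REG_SUM = 0x0, 0x4, 0x8


class FakeAdderDevice:
    """Software stand-in for the FPGA adder (not the tutorial's design).

    The bytearray plays the role of the memory-mapped register window.
    """

    def __init__(self):
        self.mem = bytearray(12)

    def write32(self, offset, value):
        """Store a little-endian 32-bit word, as a register write would."""
        struct.pack_into("<I", self.mem, offset, value & 0xFFFFFFFF)
        if offset in (REG_A, REG_B):
            self._update_sum()

    def read32(self, offset):
        """Load a little-endian 32-bit word, as a register read would."""
        return struct.unpack_from("<I", self.mem, offset)[0]

    def _update_sum(self):
        """Model the hardware: SUM always holds (A + B) modulo 2**32."""
        total = (self.read32(REG_A) + self.read32(REG_B)) & 0xFFFFFFFF
        struct.pack_into("<I", self.mem, REG_SUM, total)


def add_in_hardware(dev, a, b):
    """Driver-side sequence: write both operands, then read the result."""
    dev.write32(REG_A, a)
    dev.write32(REG_B, b)
    return dev.read32(REG_SUM)
```

Keeping the register accesses behind `write32` and `read32` means the same driver sequence could later target real hardware, or a DRM module in place of the adder, by swapping the device object.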
Foil # 25
Project Overview
• The class project is to develop a low-power SoC implementation of the DRM software.
  – The result will be a reference design that can be incorporated into an existing MP3 player or potentially a 3G cell phone.
• The design will be prototyped on a HW platform. The HW platform will consist of an ARM processor, I/O devices, memory components, and hardware accelerators.
• The project will be a team effort. The team assignments will be determined in the next couple of weeks.
Foil # 26
Fraunhofer Software Radio for reception of DRM transmissions
The goal of Digital Radio Mondiale (DRM) is a single world standard for digital broadcasting in the AM radio bands below 30 MHz. To give many people the chance to participate in the very early transmissions, and to promote DRM technology, the Fraunhofer Institut für Integrierte Schaltungen developed a software radio capable of receiving DRM signals. The main goals of the development were early availability and a radio that is easy to reproduce.
Foil # 27
Commercially Available Receivers
Coding Technologies
Himalaya
iGear
Foil # 28
Commercially Available DRM Integrated Circuit
• Texas Instruments currently offers the TMS320DRM300/350 component:
Foil # 29
Typical DRM Broadcast Schedule
UTC        Days   kHz    Beam  Target       Power  Programme     Language
0000-0059  daily  1431   ND    Canberra     0.05   MCS           English
0000-0059  daily  9790   227   NE USA       70     TDPradio      Dance Music
0000-0300  daily  177    ND    Germany      150    DLR Kultur    German
0000-2400  daily  1386   ND    AUS-NSW      3      ABC           English
0000-2400  daily  1008   ND    Prov. Hunan  4      Economic Ch.  Chinese
0000-2400  daily  999    ND    Paris        8      DRM test      French
0000-2400  daily  25775  ND    Rennes       0.1    TDF Radio     French
0000-2400  daily  25775  ND    Cote d'azur  0.7    AGORA         French
0000-2400  daily  59500  ND    Rennes       0.15   TDF           French
Foil # 30
DRM System Architecture
Foil # 31
DRM Software Architecture
[Figure: DRM receiver processing chain; blocks are connected by buffers and each block is labeled with its function]
• Sound Card Interface: ReceiveData()
• Resampling: InputResample()
• Frequency sync acquisition, frequency offset correction: FreqSyncAcq()
• Time sync acquisition, guard interval removal: TimeSync()
• OFDM Demodulation: OFDM_Demodulation()
• Sync using pilots, DRM frame sync, sample frequency offset tracking: SyncUsingPilot()
• Channel estimation, time sync tracking: ChannelEstimation()
• OFDM Cell Demapping: OFDMCellDemappint()
• FAC MLC Decoder: FAC_MLC_Decoder()
• SDC MLC Decoder: SDC_MLC_Decoder()
• MSC Symbol Deinterleaver: Symbol_Deint()
• MSC MLC Decoder: MSC_MLC_Decoder()
• Audio Source Decoder: AudioSourceDecoder()
• Use FAC: UtilizeFAC_Data()
• Use SDC: UtilizeSDC_Data()
Legend:
• CB: Cyclic Buffer
• SB: Single Buffer
• Data types (color-coded in the figure): Time Domain, Frequency Domain, QAM Symbols, Bit Stream
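The cyclic buffers (CB) in the legend can be pictured as fixed-size ring buffers between a producing and a consuming block. The sketch below is a generic ring buffer, not the DRM software's own buffer class, whose interface is not shown on the foil; the class and method names are hypothetical.

```python
class CyclicBuffer:
    """Fixed-capacity FIFO that wraps around its storage."""

    def __init__(self, capacity):
        self.data = [None] * capacity
        self.capacity = capacity
        self.read_pos = 0   # index of the oldest stored item
        self.count = 0      # number of items currently stored

    def put(self, items):
        """Append a block of items; fail if it would not fit."""
        if self.count + len(items) > self.capacity:
            raise OverflowError("cyclic buffer full")
        for item in items:
            write_pos = (self.read_pos + self.count) % self.capacity
            self.data[write_pos] = item
            self.count += 1

    def get(self, n):
        """Remove and return the n oldest items; fail if fewer are stored."""
        if n > self.count:
            raise ValueError("not enough data in cyclic buffer")
        out = [self.data[(self.read_pos + i) % self.capacity]
               for i in range(n)]
        self.read_pos = (self.read_pos + n) % self.capacity
        self.count -= n
        return out
```

A buffer like this lets the producer and consumer work on different block sizes, for example a block that writes three items at a time feeding one that reads two at a time.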
Foil # 32
Prototyping Hardware Architecture
[Figure: block diagram of the prototyping hardware, with blocks labeled Baseboard and Processor]
• ARM-9 Embedded Processor (iMX21)
• Configurable Logic (FPGA)
• SDRAM Memory (two blocks)
• Flash Memory
• Buffers
• Ethernet, RS232, USB 1.1