Fundamental Technology on
Dependable SoC and SiP
for Embedded Real-Time Systems
Nobuyuki Yamasaki (Keio Univ.)
Kikuo Wada (NECAT)
Masayuki Inaba (Tokyo Univ.)
Applications: High-end Embedded Real-Time Systems
Robot (Humanoid, etc.)
Spacecraft
Factory automation
Intelligent rooms/buildings
Amusement system
Car
VoD (Video on Demand)
Ubiquitous Computing
Large systems in which sensors/actuators are spatially distributed
Large-scale systems that cannot be controlled by a single CPU
Fault tolerant systems, etc.
Distributed Control on Humanoid Robots
Kojiro (Tokyo Univ.)
Massively distributed control
100 degrees of freedom, 60 controllers
HRP3L-JSK (Tokyo Univ.)
Very high power leg
48 V: 50 A continuous, 100 A for 15 sec
It is hard to build such a robot system with a general-purpose CPU (x86) and a general-purpose OS (Windows/Linux).
Requirements for a high-power motor driver
• Real-time processing under high-speed communication
  - Motor temperature estimation for very high-power motor driving, such as 20x overdrive of a motor rated at 200 W
  - Current control cycle: 10 msec → 10 μsec
• Reliability, availability, and safety of communication and control in a high-stress environment
  - Huge current noise, unusual situations such as cable disconnection, etc.
  ⇒ Prevention of fatal accidents
Requirements for a large-scale distributed motor driver
• Microminiaturization of the controller (size: 36x46x7 mm)
  - Area constraint of the digital control part: 20 mm square
• Real-time communication and control under the size constraint
  - Poor processing power of the current MPU (H8S/2215, 16 MHz)
  - Limit of control cycle: 1 msec → 10 μsec
  - External computation servers (Xeon 3.4 GHz x 2) are required.
  - High communication traffic: 7.2 MB/sec
  - Limit of inter-device synchronization cycle: 8 msec (USB) → 100 μsec (Responsive Link)
• Reliability of communication under the size limitation
  - Severe noise around the logic and servo systems
• Power saving scheme for large-scale distributed control
  - Static power of the entire logic part: 80 W at idle → 1 W
Requirements in the Field of High-end Robots
Kojiro (Tokyo Univ.)
• Degrees of freedom: 82 DOFs
• Driven muscles: 109
• Number of controllers: 60
High-power leg: HRP3L-JSK (Tokyo Univ.)
• Continuous current: 80 V, 100 A, 15 sec
• Peak current: 80 V, 200 A, 10 msec
Dependability Evaluation Indicators
Classifications: Reliability, Availability, Safety, Maintainability
Evaluation item | Evaluation indicator
Real-time | Hard/soft time constraint (T/F); time quantum (sec); jitter (sec)
Power | Dynamic power (W); static power (W)
Noise tolerance | S/N ratio (%)
Heat radiation | Thermal resistance (deg C/W)
Footprint | Integration into robots (T/F)
Plug-and-Play | Correct operation (T/F)
Network failure | Network reconfiguration time (sec)
Heat control | Self-monitoring (T/F)
Parts replacement | Repair and replacement time (sec)
Board replacement | Repair and replacement time (sec)
Failure analysis | Failure analysis time (sec)
Real-Time Scheduling
Real-time processing and communication are basically controlled by real-time schedulers.
• EDF (Earliest Deadline First) [1]
• RM (Rate Monotonic) [1], …
EDF
• Deadlines are translated to priority levels.
• The earlier the deadline, the higher the priority.
• Priority is changed dynamically.
• Optimal scheduling method (for preemptive scheduling on a single processor)
• Maximum processor utilization: U = 1
[1] C. Liu and J. Layland, "Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment", Journal of the ACM, Vol. 20, pp. 46-61, 1973.
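The utilization bound can be checked mechanically. Below is a minimal Python sketch. It assumes independent periodic tasks whose deadlines equal their periods, running on one preemptive processor. The function names and the sample task set are hypothetical. For comparison, it also evaluates the Liu and Layland sufficient bound for RM, n(2^(1/n) - 1).

```python
from fractions import Fraction
from typing import List, Tuple

# A task is (wcet, period); deadlines are assumed equal to periods.
Task = Tuple[int, int]


def utilization(tasks: List[Task]) -> Fraction:
    """Total processor utilization U = sum(C_i / T_i), computed exactly."""
    return sum((Fraction(c, t) for c, t in tasks), Fraction(0))


def edf_schedulable(tasks: List[Task]) -> bool:
    """Under these assumptions EDF is optimal: schedulable iff U <= 1."""
    return utilization(tasks) <= 1


def rm_bound(n: int) -> float:
    """Liu-Layland sufficient (not necessary) utilization bound for RM."""
    return n * (2 ** (1 / n) - 1)


def rm_passes_bound(tasks: List[Task]) -> bool:
    return float(utilization(tasks)) <= rm_bound(len(tasks))


if __name__ == "__main__":
    tasks = [(1, 4), (2, 6), (3, 8)]  # hypothetical task set
    print(utilization(tasks))        # 23/24
    print(edf_schedulable(tasks))    # True
    print(rm_passes_bound(tasks))    # False (0.958 > 0.780 for n = 3)
```

Failing the RM bound does not prove that the set is unschedulable under RM. It only means this simple test is inconclusive. By contrast, U <= 1 is an exact test for EDF in this model.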
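EDF Sample Schedule
[Figure: jobs plotted over time with their release times and deadlines. The earlier the deadline, the higher the priority, so a newly released job with an earlier deadline preempts the running job (two preemptions shown).]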
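The sample schedule can be reproduced with a small discrete-time simulation. This sketch assumes unit time steps and a single processor. The job parameters are hypothetical, chosen so that one job is preempted by a later job with an earlier deadline.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Job:
    name: str
    release: int   # release time
    wcet: int      # execution time required
    deadline: int  # absolute deadline


def edf_schedule(jobs: List[Job], horizon: int) -> List[Optional[str]]:
    """Simulate preemptive EDF in unit time steps; return the job run at each step."""
    remaining: Dict[str, int] = {j.name: j.wcet for j in jobs}
    timeline: List[Optional[str]] = []
    for t in range(horizon):
        ready = [j for j in jobs if j.release <= t and remaining[j.name] > 0]
        if not ready:
            timeline.append(None)  # processor idle
            continue
        # The earlier the deadline, the higher the priority (ties: earlier release).
        current = min(ready, key=lambda j: (j.deadline, j.release))
        remaining[current.name] -= 1
        timeline.append(current.name)
    return timeline


def last_slots(timeline: List[Optional[str]]) -> Dict[str, int]:
    """Last time step in which each job ran."""
    return {name: t for t, name in enumerate(timeline) if name is not None}


def count_preemptions(timeline: List[Optional[str]]) -> int:
    """A preemption is a switch away from a job that runs again later."""
    last = last_slots(timeline)
    return sum(
        1
        for t in range(1, len(timeline))
        if timeline[t - 1] is not None
        and timeline[t] != timeline[t - 1]
        and last[timeline[t - 1]] > t - 1
    )


def missed_deadlines(jobs: List[Job], timeline: List[Optional[str]]) -> List[str]:
    last = last_slots(timeline)
    return [
        j.name
        for j in jobs
        if timeline.count(j.name) < j.wcet or last.get(j.name, -1) + 1 > j.deadline
    ]


if __name__ == "__main__":
    jobs = [Job("A", 0, 4, 10), Job("B", 2, 2, 5), Job("C", 3, 1, 12)]
    timeline = edf_schedule(jobs, 8)
    print(timeline)                          # ['A', 'A', 'B', 'B', 'A', 'A', 'C', None]
    print(count_preemptions(timeline))       # 1 (B preempts A at t = 2)
    print(missed_deadlines(jobs, timeline))  # []
```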
Distributed Real-Time Systems
• Time constraints (deadline, cycle, etc.)
• Each controller has its own actuators and sensors.
• Controllers are connected via the real-time network Responsive Link.
Almost all real-time scheduling algorithms assume:
• Preemption: context switching for processing; packet overtaking for communication
• Worst-case latency: WCET (Worst Case Execution Time) for processing; WCRT (Worst Case Response Time) for communication
Multi-Level Dependability Support
SoC level
• Real-time processing/communication
• Processing cores are connected via RT-NoC
• Redundant processing cores and network links
SiP level
• D-RMTP I SoC and DRAM modules are integrated by FFCSP
• Real-time DVFS w/ self-monitoring
• Thermal control w/ self-monitoring
Robot level
• D-RMTP I SiPs are connected via Responsive Link
• Adaptive ECC for Responsive Link
• Network reconfiguration to avoid faulty links
• Task migration from faulty SiPs
[Figure: D-RMTP I SiPs connected via Responsive Link inside a humanoid robot; each D-RMTP I SiP integrates a D-RMTP SoC and DRAM modules.]
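Task migration from a faulty SiP can be framed as a re-placement problem. The following Python sketch is hypothetical. It assumes each SiP schedules its tasks with EDF, so a SiP can accept tasks as long as its total utilization stays at or below 1. It uses worst-fit decreasing placement, which assigns the largest task to the least-loaded SiP. The project's actual migration policy is not described on these slides.

```python
from fractions import Fraction
from typing import Dict, List, Optional, Tuple

Task = Tuple[str, int, int]  # (name, wcet, period); deadline assumed = period


def load(tasks: List[Task]) -> Fraction:
    """EDF utilization of the tasks placed on one SiP."""
    return sum((Fraction(c, t) for _, c, t in tasks), Fraction(0))


def migrate(placement: Dict[str, List[Task]], failed: str) -> Optional[Dict[str, List[Task]]]:
    """Re-place the failed SiP's tasks on healthy SiPs.

    Worst-fit decreasing: the largest task goes to the least-loaded SiP that still
    passes the EDF test (U <= 1). Returns None if some task cannot be placed."""
    new = {sip: list(ts) for sip, ts in placement.items() if sip != failed}
    orphans = sorted(placement.get(failed, []),
                     key=lambda x: Fraction(x[1], x[2]), reverse=True)
    for task in orphans:
        u = Fraction(task[1], task[2])
        for sip in sorted(new, key=lambda s: load(new[s])):  # least-loaded first
            if load(new[sip]) + u <= 1:
                new[sip].append(task)
                break
        else:
            return None  # not enough spare capacity on any healthy SiP
    return new


if __name__ == "__main__":
    placement = {
        "sip0": [("a", 1, 4), ("b", 1, 4)],
        "sip1": [("c", 1, 2)],
        "sip2": [("d", 1, 10)],
    }
    print(migrate(placement, "sip0"))
```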
SoC for Embedded Real-Time Processing: Responsive Multithreaded Processor (RMTP)
Real-time processing unit: RMT PU
• Real-time execution mechanism (RMT execution)
  - A context switch is converted to prioritized SMT execution.
  - 8-thread simultaneous execution in priority order
  - Priority-based thread control (256 levels)
  - Thread wake-up by an interrupt
  - IPC (instructions per clock) control, i.e., processing speed control of real-time threads: control of WCET
• Multimedia processing units (Vector + SIMD)
  - Flexible 2D vector processing units (integer, FP)
  - Vector registers shared by multiple threads
• Context cache (32 threads): 4-clock context switch
• Trace buffer
Real-time communication: Responsive Link
• Preemptive communication: packet overtaking by priority
• Packet acceleration/deceleration: packet priority can be replaced with a new priority at each node.
• ISO/IEC 24740
Computer I/O peripherals
• PCI-X, IEEE 1394, Ethernet, etc.
Control I/O peripherals
• SpaceWire (3-ch switch), PWM generators, pulse counters, etc.
D-RMTP I
[Figure: D-RMTP I block diagram (10 mm x 10 mm die). The RMT PU (real-time execution, 8-way prioritized SMT, IPC control, 32-context cache, trace buffer, 2D vector units) sits on a 256-bit memory bus with a DDR SDRAM I/F (128/32-bit DDR SDRAM). A gateway connects it to a 32-bit I/O bus with a 256/32-bit DMAC. Peripherals: PCI-X (64-bit, 66 MHz), SPI, IEEE 1394, UART, digital port, PWM-in, PWM-out, encoder, 32-bit external bus, Ethernet MAC, SpaceWire, and Responsive Link (data link and event link), plus a 32-bit DDR SDRAM for Responsive Link, SRAM (256 kB), and a 32-bit DMAC. External devices include ADC/DAC, digital camera, ROM/I/O devices, PCI I/O devices, AC/DC motors, the Internet, and other RMTPs on the real-time network.]
Real-Time Scheduling
[Figure: Tasks 0-7 and the system task plotted over time in priority order (low to high), with release times and deadlines. Top: conventional scheduling with context switches. Bottom: elimination of the context-switch overhead.]
• Thousands of clock cycles (yellow parts in the figure) are needed to switch contexts.
• A time constraint, including deadline and cycle, is converted to a priority.
• Prioritized threads are scheduled and executed in priority order.
• A set of prioritized contexts is treated as the task queue of an RT-OS.
• The contexts are executed in priority order by hardware.
RMT Execution by RMT PU
[Figure: Threads 0-7 and the system thread plotted over time from low to high priority, with release times and deadlines; all of them execute simultaneously.]
• Multiple threads are executed simultaneously in priority order.
• Implicit context switching: a context switch is converted to RMT execution (prioritized SMT execution).
• Basically, no software context switch is needed.
• High-throughput real-time processing
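Prioritized SMT can be illustrated behaviorally. The sketch below is not the RMT PU's selection logic. From the slides, it takes only these parameters: 32 on-chip contexts, 8 simultaneously executing threads, 8-bit priorities, and a 4-instruction select width. It additionally assumes that a larger number means a higher priority and that equal priorities keep their list order. `select_active` and `allocate_issue` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

ACTIVE_SLOTS = 8     # 8-way prioritized SMT (from the slide)
CONTEXT_CACHE = 32   # contexts held in the context cache (from the slide)
ISSUE_WIDTH = 4      # instructions selected per cycle (from the pipeline figure)


@dataclass
class Thread:
    tid: int
    priority: int       # 8-bit (256 levels); "larger = higher" is an assumption
    ready: bool = True  # False while waiting, e.g. for an interrupt wake-up


def select_active(threads: List[Thread]) -> List[int]:
    """Ids of the threads executed simultaneously, highest priority first."""
    if len(threads) > CONTEXT_CACHE:
        raise ValueError("more threads than the context cache holds")
    if any(not 0 <= th.priority <= 255 for th in threads):
        raise ValueError("priority must fit in 8 bits")
    ready = [th for th in threads if th.ready]
    # list.sort() is stable, so equal priorities keep their list order.
    ready.sort(key=lambda th: th.priority, reverse=True)
    return [th.tid for th in ready[:ACTIVE_SLOTS]]


def allocate_issue(demands: List[int], width: int = ISSUE_WIDTH) -> List[int]:
    """Grant issue slots to the active threads in priority order.

    demands[i] is how many instructions the i-th active thread could issue now."""
    grants = []
    left = width
    for d in demands:
        g = min(d, left)
        grants.append(g)
        left -= g
    return grants


if __name__ == "__main__":
    prios = [10, 200, 50, 200, 0, 255, 90, 90, 30, 120]
    threads = [Thread(i, p) for i, p in enumerate(prios)]
    print(select_active(threads))    # [5, 1, 3, 9, 6, 7, 2, 8]
    print(allocate_issue([2, 3, 1]))  # [2, 2, 0]
```

In this model, a low-priority thread never delays a higher-priority one, because it only receives issue slots that the higher-priority threads leave unused.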
Requirements for Real-Time Communication
• Preemption
  - Achieved by packet overtaking: higher-priority packets can overtake lower-priority packets at each node.
• WCRT (Worst Case Response Time)
  - Network latency depends on the size of a packet and the time it is blocked.
  - With packet-level overtaking, the blocking time of a packet becomes constant.
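A classic fixed-priority response-time analysis for a single link shows why a constant blocking time matters. This is a simplified sketch. It assumes periodic packet streams with one fixed-size packet per release, priorities listed highest first, response times that stay within the period, and no per-hop switching latency. It is not presented as the WCRT analysis used for Responsive Link.

```python
from typing import List, Optional, Tuple

# (C, T, D): transmission time of one fixed-size packet, period, deadline.
# The list is ordered by priority, highest first.
Stream = Tuple[int, int, int]


def wcrt(streams: List[Stream], i: int, limit: int = 10_000) -> Optional[int]:
    """Worst-case response time of one packet of stream i on a single link, or None."""
    c_i = streams[i][0]
    higher = streams[:i]
    # Packet-level overtaking: a packet waits for at most one lower-priority packet
    # that is already being transmitted (a packet itself is not split).
    blocking = max((c for c, _, _ in streams[i + 1:]), default=0)
    w = blocking
    while True:
        # Queuing delay: blocking plus every higher-priority packet released in [0, w].
        w_next = blocking + sum((w // t_j + 1) * c_j for c_j, t_j, _ in higher)
        if w_next == w:
            return w + c_i
        if w_next + c_i > limit:
            return None  # no fixed point within the limit: treat as unschedulable
        w = w_next


if __name__ == "__main__":
    # Fixed packet size: every stream needs one time unit per packet.
    streams = [(1, 4, 4), (1, 6, 6), (1, 10, 10)]
    print([wcrt(streams, i) for i in range(len(streams))])  # [2, 3, 3]
```

Because every packet has the same fixed size, the blocking term is the same one-packet time for every stream except the lowest. This is the constant blocking time described above.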
Essential Requirement for Real-Time Communication
[Figure: packets with release times and deadlines on a link; two preemptions occur where higher-priority packets overtake.]
Preemption capability is required to apply real-time scheduling algorithms to communications.
A Real-Time Packet Scheduling Algorithm
Virtual Deadline Monotonic [2]
Connection | Period (= Deadline) | Transfer time
connection1 | 14 | 4
connection2, connection3 | 12 | 6
• Virtual deadline of connection1: 7
• Priority: connection2 < connection1, connection3 < connection1
[Figure: node1, node2, and node3 connected by Link1,2 and Link2,3; per-link timelines from TX start (t = 0) to connection1's deadline (t = 14) for connection1, connection2, and connection3.]
[2] S. Kato, Y. Fujita, and N. Yamasaki, "Periodic and Aperiodic Communication Techniques for Responsive Link", The 15th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, pp. 135-142, 2009.
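The priority order in the figure follows if each connection's end-to-end deadline is split evenly across the links it traverses. The sketch below assumes exactly that rule. It also assumes that connection1 crosses two links while connection2 and connection3 each use one. For the precise definition of the virtual deadline, consult [2]. Transfer times are not needed to assign priorities in this sketch.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Dict, List


@dataclass
class Connection:
    name: str
    deadline: int  # end-to-end deadline (= period)
    hops: int      # number of links on the route (assumed for connection2/3)


def virtual_deadline(c: Connection) -> Fraction:
    # Assumption: the end-to-end deadline is divided evenly among the hops.
    return Fraction(c.deadline, c.hops)


def assign_priorities(conns: List[Connection]) -> Dict[str, int]:
    """Deadline-monotonic order on virtual deadlines; level 0 is the highest priority."""
    distinct = sorted({virtual_deadline(c) for c in conns})
    return {c.name: distinct.index(virtual_deadline(c)) for c in conns}


if __name__ == "__main__":
    conns = [
        Connection("connection1", 14, 2),  # node1 -> node2 -> node3
        Connection("connection2", 12, 1),
        Connection("connection3", 12, 1),
    ]
    print([str(virtual_deadline(c)) for c in conns])  # ['7', '12', '12']
    print(assign_priorities(conns))
```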
Real-Time Communication: Responsive Link
• Split transmission and independent routing of data and events
• Full-duplex and differential I/F
• Virtual cut-through switch with packet overtaking (preemption) function: a packet with higher priority can overtake other packets at each node.
• Priority replacement: a packet's priority can be replaced with a new priority at each node to accelerate/decelerate the packet.
• Even when the network address is the same, a different route can be set for a different priority, realizing an exclusive line or a detour.
• Fixed packet size for WCRT estimation: data link (64 bytes), event link (16 bytes)
• Powerful adaptive error correction: {RS, None}, {BCH, Hamming, None}, {BS+NRZI, 8b10b, 4b10b}
• Flexible link speed (12.5 to 800 Mbps/link)
• Point-to-point links for any topology
• Standardization: ISO/IEC 24740
[Figure: with shared traffic, latency and throughput are indefinite. Splitting the traffic gives an event link with low latency (sync, interrupt, status, open/connect signals) and a data link with high throughput (image, sound, voice, table, text).]
Split Links for Event and Data
Dual Physical Communication Links
Event Link: like a nerve network
• Hard real-time communication for control
• Control commands, inter-processor interrupts, synchronization, etc.
• Relatively small fixed packet size (16B)
• Low latency is more important.
Data Link: like a blood-vessel network
• Soft real-time communication for bulky data
• Multimedia data transmission (images, voice, etc.)
• Relatively large fixed packet size (64B)
• Total throughput is more important.
Packet Format
[Figure: event packet format (16B) and data packet format (64B), each with source addr., destination addr., control & status, and payload. Frame format (12 bits): data bits plus redundancy bits. Control & status format (32 bits): serial number (Cnt.), Correct, Fatal, Int., Start, End, UD, Full, Data Length, and Dirty0-Dirty15.]
Responsive Link I/F
[Figure: Responsive Link cable and connector carrying differential pairs for the data link (Tx Data+/-, Rx Data+/-) and the event link (Tx Event+/-, Rx Event+/-).]
Virtual Cut-through Switch with Packet Overtaking Function
[Figure: routing table indexed by priority (Priority[7-4], Priority[3-0]) and 16-bit source/destination addresses, with EE, DE, and PE fields and output links L0-L4.]
• A different route can be set for a different priority, even if the network address is the same. (The default route is priority 0.)
• A packet's priority can be replaced with a new priority at each node.
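Priority-based routing and priority replacement can be modeled as a table keyed by destination address and priority. In this hypothetical sketch, a lookup falls back to the default route registered for priority 0. An entry may also rewrite the packet priority for the next hop. The sketch does not model the source address field or the EE/DE/PE fields of the real table.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Route:
    port: str                           # output link, e.g. "L0".."L4" as in the figure
    new_priority: Optional[int] = None  # priority replacement; None keeps the priority


class RoutingTable:
    """Hypothetical model of a priority-aware routing table (destination, priority)."""

    def __init__(self) -> None:
        self._routes: Dict[Tuple[int, int], Route] = {}

    def set_route(self, dst: int, priority: int, route: Route) -> None:
        if not (0 <= dst < 1 << 16 and 0 <= priority < 256):
            raise ValueError("expected a 16-bit address and an 8-bit priority")
        self._routes[(dst, priority)] = route

    def forward(self, dst: int, priority: int) -> Tuple[str, int]:
        """Return (output port, outgoing priority) for a packet arriving at this node."""
        route = self._routes.get((dst, priority))
        if route is None:
            # No priority-specific entry: use the default route (priority 0).
            # Raises KeyError if no default route was configured.
            route = self._routes[(dst, 0)]
        out_priority = priority if route.new_priority is None else route.new_priority
        return route.port, out_priority


if __name__ == "__main__":
    table = RoutingTable()
    table.set_route(6, 0, Route("L1"))                  # default route to node 6
    table.set_route(6, 3, Route("L2", new_priority=5))  # detour + acceleration
    print(table.forward(6, 0))  # ('L1', 0)
    print(table.forward(6, 3))  # ('L2', 5)
    print(table.forward(6, 7))  # ('L1', 7)
```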
Routing Based on Priority (1/2)
[Figure: data and event packets from the same source to the same destination take different routes depending on their priority: Data (priority 0), Data (priority 1), Event (priority 0), Event (priority 2).]
Routing Based on Priority (2/2)
[Figure: a network of Node0-Node6 in which data with priority 3 and data with priority 0 travel from the source to the destination along different routes.]
Adaptive Codecs
• Error correction code
  Byte (block) error correction
  - RS (1-byte error correction) (4B, 6B)
  - None
  Bit error correction
  - Hamming (1-bit error correction) (8b, 12b)
  - BCH (2-bit error correction) (8b, 16b)
  - None
• Line code
  - Bit stuffing + NRZI (dynamic, clock embedded, DC balancing)
  - 8b/10b (static, clock embedded, DC balancing)
  - 4b/10b (static, clock embedded, DC balancing, 1-bit error correction)
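To illustrate single-bit correction with a (12, 8) code, here is the textbook Hamming construction. Parity bits sit at positions 1, 2, 4, and 8, and the syndrome directly names the flipped bit. The bit layout is an assumption; the slides do not give Responsive Link's actual bit order.

```python
from typing import Tuple

DATA_POSITIONS = (3, 5, 6, 7, 9, 10, 11, 12)  # non-power-of-two positions
PARITY_POSITIONS = (1, 2, 4, 8)


def hamming12_encode(byte: int) -> int:
    """Encode 8 data bits into a 12-bit word; bit (k - 1) of the result is position k."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected one byte")
    bits = [0] * 13  # index 0 unused
    for i, pos in enumerate(DATA_POSITIONS):
        bits[pos] = (byte >> i) & 1
    for p in PARITY_POSITIONS:
        # Even parity over every position whose index has bit p set.
        bits[p] = sum(bits[k] for k in range(1, 13) if k & p and k != p) % 2
    return sum(bits[k] << (k - 1) for k in range(1, 13))


def hamming12_decode(word: int) -> Tuple[int, int]:
    """Correct at most one flipped bit; return (data byte, corrected position or 0)."""
    bits = [0] + [(word >> (k - 1)) & 1 for k in range(1, 13)]
    syndrome = 0
    for k in range(1, 13):
        if bits[k]:
            syndrome ^= k  # XOR of the positions of all 1 bits
    if syndrome > 12:
        raise ValueError("uncorrectable error pattern")
    if syndrome:
        bits[syndrome] ^= 1  # assume a single error at position `syndrome`
    data = sum(bits[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return data, syndrome


if __name__ == "__main__":
    word = hamming12_encode(0xA5)
    print(hamming12_decode(word ^ (1 << 6)))  # (165, 7): bit at position 7 corrected
```

This is a single-error-correcting code. Two flipped bits in one word may be miscorrected.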
Measurement of Real Communication Noise
[Figure: two D-RMTP I 30mm sq evaluation kits connected by Responsive Link + ECC; a noise generator (amplification of real motor noise) injects noise, and the transmitter and receiver pulses are compared.]
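Performance of Noise Tolerance
Combination of codecs
• Error correction: {RS, None}, {HAM, BCH, None}
• Line code: BitStuffing+NRZI, 8b10b, 4b10b
Block-level ECC | Bit-level ECC | Line code | Codec rate
w/ Reed-Solomon (48, 32) | BCH (16, 8) | BS+NRZI (9, 8) | 29.6%
w/ Reed-Solomon (48, 32) | BCH (16, 8) | 8b10b (10, 8) | 26.7%
w/ Reed-Solomon (48, 32) | BCH (16, 8) | 4b10b (10, 4) | 13.3%
w/ Reed-Solomon (48, 32) | Hamming (12, 8) | BS+NRZI (9, 8) | 39.5%
w/ Reed-Solomon (48, 32) | Hamming (12, 8) | 8b10b (10, 8) | 35.6%
w/ Reed-Solomon (48, 32) | Hamming (12, 8) | 4b10b (10, 4) | 17.8%
w/ Reed-Solomon (48, 32) | None | 8b10b (10, 8) | 53.3%
w/ Reed-Solomon (48, 32) | None | 4b10b (10, 4) | 26.7%
w/o Reed-Solomon | BCH (16, 8) | BS+NRZI (9, 8) | 44.4%
w/o Reed-Solomon | BCH (16, 8) | 8b10b (10, 8) | 40.0%
w/o Reed-Solomon | BCH (16, 8) | 4b10b (10, 4) | 20.0%
w/o Reed-Solomon | Hamming (12, 8) | BS+NRZI (9, 8) | 59.3%
w/o Reed-Solomon | Hamming (12, 8) | 8b10b (10, 8) | 53.3%
w/o Reed-Solomon | Hamming (12, 8) | 4b10b (10, 4) | 26.7%
w/o Reed-Solomon | None | 8b10b (10, 8) | 80.0%
w/o Reed-Solomon | None | 4b10b (10, 4) | 40.0%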
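Each codec rate in the table is the product of the stage rates k/n. For example, RS (48, 32) + Hamming (12, 8) + 8b10b gives (2/3)(2/3)(4/5) = 16/45 ≈ 35.6%. The sketch below recomputes the table this way. The constant names are hypothetical. BS+NRZI is treated as a fixed (9, 8) code, as in the table, even though bit stuffing is dynamic.

```python
from fractions import Fraction
from typing import Optional, Tuple

Code = Tuple[int, int]  # (n, k): n output symbols per k input symbols

RS = (48, 32)
BCH = (16, 8)
HAMMING = (12, 8)
BS_NRZI = (9, 8)   # rate as listed in the table
B8B10B = (10, 8)
B4B10B = (10, 4)


def codec_rate(block: Optional[Code], bit: Optional[Code], line: Code) -> Fraction:
    """Overall rate = product of k/n over the cascaded stages (None = stage disabled)."""
    rate = Fraction(1)
    for stage in (block, bit, line):
        if stage is not None:
            n, k = stage
            rate *= Fraction(k, n)
    return rate


def percent(r: Fraction) -> float:
    return round(float(r) * 100, 1)


if __name__ == "__main__":
    print(codec_rate(RS, HAMMING, B8B10B))           # 16/45
    print(percent(codec_rate(RS, HAMMING, B8B10B)))  # 35.6
```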
Effect of the line code: dominant (@1-bit noise length)
• BS+NRZI: 50% error at BER 10^-3 %; 100% error at BER 10^-2 %
• 8b10b: 20% error at BER 10^-3 %; 90% error at BER 10^-2 %
• 4b10b: 0% error at BER 10^-2 %
SiP Level Dependability
• D-RMTP I SoC and DRAM modules are integrated on a SiP interposer by FFCSP.
• Real-time DVFS (D-RMTP I)
  - Low power while guaranteeing deadlines
  - Voltage control for safety w/ self-monitoring
• Thermal control w/ self-monitoring
  - Prevents the D-RMTP I and DRAM from overheating
[Figure: voltage & thermal control (D-RMTP I): voltage transitions between 0.80 V and 1.10 V (0.8 V → 1.1 V, with undershoot), each taking 20 μsec. Vertical chip stacking (D-RMTP II): DRAM, D-RMTP II, and FPGA on a substrate, with redundant vertical links (LSI internal logic level).]
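A real-time DVFS policy should choose a slower, lower-voltage operating point only when the remaining work still fits before the deadline. The sketch below assumes a hypothetical table of four operating points. Only the 0.80 to 1.10 V range and the 20 μsec transition come from the slide. It also assumes that no progress is made during a voltage transition.

```python
from typing import List, Optional, Tuple

# Hypothetical operating points (MHz, V), slowest first. Only the 0.80-1.10 V range
# and the 20 usec transition time are taken from the slide.
LEVELS: List[Tuple[int, float]] = [(50, 0.80), (75, 0.90), (100, 1.00), (125, 1.10)]
TRANSITION_US = 20.0


def pick_level(wcet_cycles: int, time_left_us: float, current: int) -> Optional[int]:
    """Index of the slowest operating point that still meets the deadline, or None."""
    for idx, (mhz, _volts) in enumerate(LEVELS):
        # Assume no instructions retire while the voltage is changing.
        switch_us = 0.0 if idx == current else TRANSITION_US
        run_us = wcet_cycles / mhz  # cycles / MHz = microseconds
        if switch_us + run_us <= time_left_us:
            return idx
    return None  # even the fastest point misses: the deadline cannot be guaranteed


def relative_energy(idx: int) -> float:
    """Dynamic energy for a fixed cycle count, relative to the fastest point.

    With P ~ C * V^2 * f and run time ~ 1/f, energy per cycle scales with V^2."""
    v = LEVELS[idx][1]
    v_max = LEVELS[-1][1]
    return (v / v_max) ** 2


if __name__ == "__main__":
    level = pick_level(5000, 100.0, current=3)
    print(level, LEVELS[level])             # 1 (75, 0.9)
    print(round(relative_energy(level), 2))
```

Charging the transition time against the deadline is why the policy may stay at a faster level when switching would cost more than it saves.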
SoC/SiP Co-design for Improvement of Dependability
1. Optimization of IP and I/O pin arrangements for SiP design
2. Optimization of bump arrangement for SiP wiring
3. Inside-SoC adjustment scheme for wiring jitter that cannot be adjusted by the SiP
[Figure: conventional design scheme vs. our scheme (co-design of SoC and SiP): I/O buffers of the SoC, jitter adjustment by the SiP wiring pattern, and a jitter adjustment mechanism inside the SoC.]
• Conventional scheme: wide wiring area, low noise tolerance
• Our scheme: narrow wiring area, high noise tolerance; stabilization of the analog characteristics of the SiP wiring pattern
⇒ Improvement of dependability
30mm square D-RMTP SiP
[Figure: D-RMTP, DRAM, flash, and FPGA; thermal sensors for the D-RMTP and the DRAM; ADCs for the supply voltage and the thermal sensors (D-RMTP, DRAM); a DC/DC converter and a potentiometer for the RT-DVFS function.]
Almost all functions of a PC + embedded microcontroller + real-time processing core + real-time communication
Robot Level Dependability
[Figure: D-RMTP SiPs connected via Responsive Link in a humanoid robot; a received waveform with noise; codec combinations: Reed-Solomon (48, 32) ECC code (4 bytes) or none, then BCH (16, 8), Hamming (12, 8), or no ECC code (1 byte), then a line code: BS+NRZI (9, 8), 8b10b (10, 8), or 4b10b (10, 4).]
• D-RMTP I SiPs are connected via Responsive Link.
• (1) Permanent faults (links & boards), e.g., link disconnection
  - Network reconfiguration to avoid faulty links
  - Task migration from faulty D-RMTP I SiPs
• (2) Transient faults (links), e.g., motor noise
  - Adaptive ECC & line codes for Responsive Link
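Network reconfiguration around a faulty link can be sketched as a shortest-path search over the remaining links. This hypothetical Python sketch uses breadth-first search (fewest hops) on an undirected topology. The node names are illustrative. The actual Responsive Link reconfiguration procedure is not described on these slides.

```python
from collections import deque
from typing import Dict, FrozenSet, Iterable, List, Optional, Set, Tuple

Link = FrozenSet[str]


def make_link(a: str, b: str) -> Link:
    """Undirected link between two SiPs."""
    return frozenset((a, b))


def reroute(links: Iterable[Tuple[str, str]], faulty: Set[Link],
            src: str, dst: str) -> Optional[List[str]]:
    """Fewest-hop path from src to dst that avoids faulty links (None if cut off)."""
    adj: Dict[str, List[str]] = {}
    for a, b in links:
        if make_link(a, b) in faulty:
            continue  # reconfiguration: drop links reported as faulty
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    prev: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:  # walk back to the source
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


if __name__ == "__main__":
    ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
    print(reroute(ring, set(), "A", "C"))                   # ['A', 'B', 'C']
    print(reroute(ring, {make_link("B", "C")}, "A", "C"))   # ['A', 'D', 'C']
```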
D-RMTP On-board Motor Driver
• D-RMTP SiP on board
• Size: 85x60x36 mm
Specification of the motor driver
• Voltage: 80 V
• Current: cont. 50 A, max. 200 A
• Vector control
• Water-cooling
[Figure: motor driver block diagram: RMTP and FPGA (Altera EP2C) with an isolator, PWM and encoder (ENC) interfaces, Responsive Link x4, flash (Altera EPCS), ADC (ADS7886 x4) for 3 AD inputs, gate driver, UART, 5 V monitor, JTAG, RS-422 x2, GPIO, oscillator (CLK), and switch/reset; power rails: 5 V, 3.3 V for I/O, 1.2 V for the FPGA, 1.0 V for the RMTP; water-cooling.]
D-RMTP I Evaluation Kit
[Figure: D-RMTP SiP (D-RMTP I SoC) with Responsive Link (4ch), RS-232C (2ch), DC jack, USB (host & peripheral), FPGA I/O, PWM-IN/OUT, and encoder connectors.]
For more information, please contact: Yamasaki lab., Keio Univ., http://www.ny.ics.keio.ac.jp/
Ultra Small Evaluation Kit for Distributed Control via Responsive Link
[Figure: three stacked boards: I/O Core SiP (bottom), analog I/F board (center), and USB board (top). Components include the 30mm sq D-RMTP I SiP, DRAM, I/O core (UART, GPIO, SPI, I2C, DMAC, PCI, I-cache, D-cache), FPGA, PLL, flash ROM, Responsive Link connectors, JTAG connector, RS232C, RS485, A/D converter, acceleration sensor, thermal sensor, analog input connector, I2C and SPI connectors, USB I/F, USB peripheral and USB host connectors, and UART connector.]
Programming
• ISA of the RMT Processor: MIPS upward-compatible ISA + thread control instructions + vector instructions
  - MIPS instructions: C and C++ available
  - RMT Processor's own instructions (multithread instructions, vector operations): assembly language
  - Libraries (boost+, tvnet)
• OS: Linux, iTRON, favor, Tflight (our original RT-OS)
• Cross development tools: http://www.ny.ics.keio.ac.jp/research/rmt/
Simulators
• Programs for the D-RMT Processor can be developed anywhere. Host OS: Linux, Solaris, Cygwin
  - No real hardware is needed.
  - A laptop PC is enough to develop a program.
• High speed and low functionality: ISS (Instruction Set Simulator)
  - Processor model
  - Major I/O models (serial, Responsive Link)
  - No timing check
• Low speed and high functionality: RTLS (RTL Simulator)
  - All functional models of the D-RMTP SiP: RMT Processor, Responsive Link, PCI, PWM, etc.
[Figure: overview from the humanoid robot and high-power leg down to the control board, high-power motor driver, dependable SiP, and dependable SoC. Die: RMT PU, context cache (RF, status), vector registers (FP, Int.), instruction cache, data cache, PLL, IEEE1394, SDRAM IF, DMAC, Responsive Link. Annotations: SoC/SiP co-design; hardware/software co-design.]
[Figure: RMT PU microarchitecture. Instruction fetch and issue mechanism: thread control unit, fetch thread selector, instruction MMU, instruction cache, instruction victim cache, instruction wait buffer, decoder, PC control unit, instruction buffer, instruction issue selector, instruction analysis, register renaming. Instruction execution mechanism: context cache, reorder buffer, register file, reservation stations, memory access, branch unit, integer divider, integer unit, FP divider, FP unit, 64-bit integer, vector integer, and vector FP units, and a common data bus arbiter. Cache mechanism: data read/write buffer, data cache, data victim cache, data wait buffer, data MMU. Datapath widths: PC (32 bit x 8), priority (8 bit x 8), instructions (32 bit x 8), 4 instructions selected, issued, committed, and written back per cycle, 1 load/store operation, 64-bit load data, 32-bit physical address.]
Conclusion
[Figure: summary of the technologies: QoS RT-OS; RT-Processor; RT-Network; 3-D mounted SiP; distributed control; RT-DVFS; noise tolerance; heat radiation; real-time.]