David Zhang
Cadence Design Systems
ARMv8-Based SoC HW/SW Integration and Verification Solution
2 © 2014 Cadence Design Systems, Inc. All rights reserved.
Example ARM®-based HW/SW System
[Figure: block diagram of an example ARM-based HW/SW system, shown in three layers.]
System on PCB: LPDDR DRAM, NAND flash, cellular modem, WiFi, Bluetooth, FM receiver, GPS receiver, RF FE, motion sensors, power control, multimedia processor, memory card, touch screen controller, display driver, audio interface, camera interface. Interface labels: LLI, DigRF, LPDDR2, LPDDR3, eMMC 4.5, UFS, SD 3.0, SD 4.0, SLIMbus, DSI, CSI2, CSI3, SDIO, cJTAG, GBT, SPMI, I2C, USB 2.0, USB 3.0 OTG, HDMI 1.4, OCP 2.0, OCP 3.0.
SoC: ARM CPU subsystem (Cortex-A15 x2, Cortex-A7 x2, L2 caches, cache coherent fabric); application specific components (3D GFX, DSP, A/V, apps accelerator, modem); SoC interconnect fabric; high speed, wired interface peripherals (USB3.0 with 3.0 PHY and 2.0 PHY, PCIe Gen 2,3 PHY, Ethernet PHY, DDR3 PHY); other peripherals (SATA, MIPI, HDMI, WLAN, LTE); low-speed peripheral subsystem (PMU, MIPI, JTAG, INTC, I2C, SPI, Timer, GPIO, Display, UART).
Software: bare metal software; DSP software; modem comms (bare metal software, RTOS, drivers, firmware / HAL, communications L1, L2, L3); application processor (bare metal, firmware / HAL, drivers, operating systems (OS), middleware, applications).
Challenges at the SoC, System, & SW level
[Figure: the example system block diagram again, with Cortex-A57 and Cortex-A53 cores in the ARM CPU subsystem, annotated with the following challenges.]
• Multi-core early software bring-up and integration on 64-bit
• How do I represent the SoC environment?
• Developing environments for hardware/software integration and use-case verification on simulation/emulation platforms
• Bare-metal software use-case testing to verify multi-core cache and I/O coherency, concurrency, power shut off, etc.
• Debugging of complex multi-core SoC software scenarios on RTL simulation/emulation platforms
• Characterizing and analyzing system-on-chip (SoC) performance and efficiently debugging issues
• Verification of IPs on AMBA interconnect with adherence to ACE protocol
Software is key to verification
[Figure: software content plotted against integration level and project timeline.]
Integration levels and the software run on them: IP; Sub-System with bare metal SW; System on Chip with OS & drivers (Linux, Android); SoC in System with middleware (graphics, audio) and applications (basic to Angry Birds).
Timeline phases: idea to spec; spec; IP qualification; RTL design & IP integration & verification; netlist to GDSII; fab; post silicon validation; production.
Annotations: RTL becomes stable; only small gate-level changes and ECOs; time for critical bugs in system environment to be removed; SW development on chip; may hold final tape out if bug too critical.
Source: Cadence, IBS
Accelerating ARM-based development
Early OS & Software Bring-up
HW/SW Concurrency Gap
[Figure: three project timelines against the milestones Tapeout, Silicon Samples, and Product Ships. Legend: SW, HW, System.]
Traditional SoC then SW Flow
• HW Development & Verification
• SW Dev and Bringup on Silicon
• System Validation
SW-Enhanced SoC Flow
• HW Development & Verification
• SW Dev on model
• SW Dev and Bringup on real HW design, Silicon
• System Validation
Next Generation SW-Driven SoC Flow
• HW Development & Verification
• Continuous SW Development & Bringup
• Continuous System Validation
Enabled by: Virtual Platform, FPGA Prototype, Emulation
Powered by: Platform Hybrids (Emulation + Virtual Platform + FPGA)
Early SW Execution on Palladium
TLM Virtual Platform – VSP
- Up to 100MHz
- Early Availability for SW Developers
- Advanced SW Debug
- Fast SW Turnaround Time
Emulation – Palladium® XP I/II
- Up to 4MHz
- From early-RTL to full-SoC Validation
- Advanced HW Debug
- Fast HW Turnaround Time
Hybrid Solution with SW Integrator
- Boot Complex OS at 48MHz
- Speed up SW-Driven tests 1-10X over emulation
- Early Availability for SW Developers
- Advanced HW + SW Debug
- Fast HW and SW Turnaround Time
Palladium/VSP Hybrid Solution
Architected for SW Performance
− High-speed virtual platform
− Asynchronous HW/SW Execution with Interrupt driven sync
− High-Speed Multi-Domain Memory Coherency
Designed to integrate HW and SW flows
− Does not require changes to HW or SW stacks
− Virtual connections into SW Engineer’s environments
− Seamless hybrid execution for both HW and SW users
Proven Methodology, Unique Expertise
− Cross-platform and design integration expertise
− Exclusive hybrid methodology delivers performance and repeatability
− Proven during successful application to SW-rich SoCs
[Figure: hybrid architecture. VSP Execution Engines side: virtualized CPU sub-system, VSP virtual models (UART, eMMC, USB), customer virtual models, integration APIs. Linking the two sides: SW Integrator, smart memory, CPU bridges (ARM AMBA, interrupts, resets). Palladium side, customer design in Palladium®: RTL fabric, GPU IP, memory controller IP, other IP blocks, DDR, AVIP.]
Performance Results
• Boot OSes, run real world applications and benchmarks
• Linux kernel boot
– Palladium only = 45 mins
– Hybrid = 2 mins
• Android
– Palladium only = Hours*
– Hybrid = 40 – 50 mins
• Windows
– Palladium only = Days*
– Hybrid = 75 – 90 mins
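As a quick worked calculation, the speedup implied by the Linux kernel boot figures can be derived directly from the two quoted times. The Android and Windows rows give only "Hours" and "Days" for the Palladium-only case, so no ratio is computed for them. The function name below is illustrative and does not come from any Cadence tool.

```python
def speedup(baseline_minutes, hybrid_minutes):
    """Ratio of baseline run time to hybrid run time."""
    return baseline_minutes / hybrid_minutes

# Linux kernel boot, figures as quoted on the slide (minutes).
palladium_only = 45
hybrid = 2

print(f"Linux kernel boot speedup in hybrid mode: {speedup(palladium_only, hybrid):.1f}x")
```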
Performance Analysis
ARM ARMv8-A mobile example SoC
[Figure: block diagram of the example SoC.]
Coherent masters: Cortex™-A57 cluster (cores #1 to #4, L2 cache), Cortex™-A53 cluster (cores #1 to #4, L2 cache), customer or Mali™ GPU.
Non-coherent masters: PCIe RC, LCD, DMA, customer DMA or CoreLink DMA-330, other IP.
Interconnect and system IP: CoreLink CCI-400 (slave ports S0 to S4), CoreLink NIC-400 instances including a NIC-400 (2x1), CoreLink GIC-400, CoreLink TZC-400, ADB bridges.
Memory and peripherals: customer DDR controller (ports F0 to F3), on-chip ROM, SRAM, video SRAM, timers, UART, system control processor.
Power and clock domains: DVFS CLK/PSO domain, CLK/PSO domain.
ARM ARMv8-A mobile example SoC: Performance challenges
[Figure: the example SoC block diagram from the previous slide, annotated with a performance question.]
• What is the latency of the processor clusters to memory paths, including all async bridges?
ARM ARMv8-A mobile example SoC: Performance challenges
[Figure: the example SoC block diagram, annotated with performance questions.]
• What is the latency of the processor clusters to memory paths, including all async bridges?
• What is the bandwidth of the paths from IP with high bandwidth demands to memory?
ARM ARMv8-A mobile example SoC: Performance challenges
[Figure: the example SoC block diagram, annotated with a performance question.]
• What is the bandwidth and latency of the paths from real-time IP to memory?
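The latency and bandwidth questions above reduce to simple arithmetic once per-transaction timestamps are available for a path. The following is a minimal sketch under stated assumptions: the `Txn` record, its field names, and the sample values are hypothetical and are not taken from any monitor or tool output format.

```python
from dataclasses import dataclass

@dataclass
class Txn:
    start_ns: float  # time the request was issued by the master
    end_ns: float    # time the final response or data beat arrived
    nbytes: int      # payload size in bytes

def latency_stats(txns):
    """Return (min, mean, max) latency in ns over the transactions."""
    lat = [t.end_ns - t.start_ns for t in txns]
    return min(lat), sum(lat) / len(lat), max(lat)

def bandwidth_bytes_per_ns(txns):
    """Total bytes moved divided by the observed time window."""
    window = max(t.end_ns for t in txns) - min(t.start_ns for t in txns)
    return sum(t.nbytes for t in txns) / window

# Hypothetical sample: three overlapping 64-byte transactions on one path.
sample = [Txn(0, 100, 64), Txn(50, 170, 64), Txn(120, 200, 64)]
print("latency min/mean/max (ns):", latency_stats(sample))
print("bandwidth (bytes/ns):", bandwidth_bytes_per_ns(sample))
```

One byte per nanosecond equals 1 GB/s, so the bandwidth figure can be read directly in GB/s.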
Interconnect Workbench
For Interconnect IP Integration
• Performance of use-case traffic loads
• Verify configuration functionality
For SoC Integration
• Validate performance in context of IPs
Benefits
• Shorten performance tuning and analysis iteration loop from days to hours
• Reduce testbench development time from weeks to hours
[Figure: Interconnect Workbench flow. Inputs: CoreLink 400 System IP (RTL and IP-XACT), Cadence VIP Library for AMBA, user meta-data, IP-specific traffic profiles. Interconnect Workbench Assembly produces a UVM testbench and performance measurements. Interconnect Workbench Analysis and Debug with the Performance Analyzer supports performance analysis and verification closure. Generated testbench flow: Automate, Simulate, Analyze, Tune Architecture, compared with the manual testbench flow (manual SoC testbench). Resulting testbenches: SoC traffic testbench, SoC verification testbench.]
Subsystem testbench
[Figure: generated subsystem testbench, built using IP-XACT or CSV metadata.]
DUT: CoreLink CCI-400 (slave ports S0 to S4), NIC-400 (2x1), CoreLink TZC-400, customer DDR controller (ports F0 to F3), ADB bridges, optional peripheral IP.
UVM testbench: active AMBA VIP and passive AMBA VIP instances on the DUT ports, system scoreboard and performance monitor VIP, routing model, characterization tests.
Auto-generated functional verification test suite
• Generated test suite includes:
– Libraries of ready-to-run UVM sequences + tests
– Covers the most common test scenarios for interconnects
– Constrained-random throughout
– Multiple invocations with different seeds will yield high coverage
– Easily configurable (no need to duplicate test files)
– Remap mode to begin with
– Traffic profile for specific agents/all
• Serves as the platform for performance analysis tests
Test                          Purpose
single_master_single_slave    single path
single_master_all_slaves      all paths from a single master
all_masters_single_slave      all paths to a single slave
all_masters_all_slaves        all paths
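The four test names in the table map onto four ways of selecting master-to-slave paths. The sketch below illustrates that mapping only; it is not the generated UVM code. The function name, the choice of the first master and first slave for the "single" cases, and the agent names are all assumptions made for illustration.

```python
from itertools import product

def select_paths(test, masters, slaves):
    """Return the (master, slave) pairs that a given test would exercise."""
    if test == "single_master_single_slave":
        return [(masters[0], slaves[0])]
    if test == "single_master_all_slaves":
        return [(masters[0], s) for s in slaves]
    if test == "all_masters_single_slave":
        return [(m, slaves[0]) for m in masters]
    if test == "all_masters_all_slaves":
        return list(product(masters, slaves))
    raise ValueError(f"unknown test: {test}")

# Hypothetical agent names, loosely based on the example SoC.
masters = ["a57_cluster", "a53_cluster", "gpu"]
slaves = ["ddr", "sram"]

for test in ("single_master_single_slave", "single_master_all_slaves",
             "all_masters_single_slave", "all_masters_all_slaves"):
    print(test, "->", len(select_paths(test, masters, slaves)), "path(s)")
```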
Interconnect Workbench
Analyze performance results
• Charts split by burst length
• Increasing bandwidth as burst length increases
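One common explanation for bandwidth rising with burst length is that a fixed per-burst cost is spread over more data beats. The sketch below assumes a deliberately simple model, one data beat per cycle plus a fixed overhead per burst. The model and the overhead value are illustrative assumptions, not figures from the slides or from any interconnect specification.

```python
def bus_efficiency(burst_len, overhead_cycles=4):
    """Fraction of cycles that carry data under the assumed model:
    burst_len data cycles plus a fixed overhead per burst."""
    return burst_len / (burst_len + overhead_cycles)

for burst_len in (1, 4, 8, 16):
    print(f"burst length {burst_len:2d}: efficiency {bus_efficiency(burst_len):.2f}")
```

Under this model efficiency approaches 1.0 as bursts get longer, which matches the trend described for the charts.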
Interconnect Workbench
Analyze Characterization Results
• Analyze individual paths, masters or slaves
• Bandwidth, latency over time and distribution for each traffic direction
Interconnect Workbench
• Significant challenges in predicting and optimizing SoC performance
– Multiplicity of IP configuration options, particularly in the interconnect and DDR space
– Need a systematic approach with the potential to be automated
• Performance verification accomplished in three steps
– Characterization: fully automated and can be checked as a standard regression step
– Architectural: establish that QoS functions as expected
– Use case: hunt for corner case issues
• Cadence® Interconnect Workbench supports all stages of the process
– Automation of testbench, supports ARM CoreLink® System IP
– Automation of the characterization tests
– Comprehensive analysis and checking capabilities
– Traffic synthesizers for architectural and use-case analysis
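Checking characterization results as a regression step amounts to comparing measured numbers per path against agreed limits and failing the run on any violation. The following is a minimal sketch of that idea. The path names, field names, and limit values are hypothetical and do not reflect the Interconnect Workbench result format.

```python
def check_characterization(measured, limits):
    """Return a list of human-readable violations, empty if all paths pass."""
    failures = []
    for path, m in measured.items():
        lim = limits[path]
        if m["latency_ns"] > lim["max_latency_ns"]:
            failures.append(f"{path}: latency {m['latency_ns']} ns exceeds {lim['max_latency_ns']} ns")
        if m["bandwidth_gbps"] < lim["min_bandwidth_gbps"]:
            failures.append(f"{path}: bandwidth {m['bandwidth_gbps']} GB/s below {lim['min_bandwidth_gbps']} GB/s")
    return failures

# Hypothetical measured results and limits for two paths.
measured = {
    "a57_cluster->ddr": {"latency_ns": 95, "bandwidth_gbps": 6.1},
    "lcd->ddr": {"latency_ns": 180, "bandwidth_gbps": 1.2},
}
limits = {
    "a57_cluster->ddr": {"max_latency_ns": 120, "min_bandwidth_gbps": 5.0},
    "lcd->ddr": {"max_latency_ns": 150, "min_bandwidth_gbps": 1.0},
}

violations = check_characterization(measured, limits)
for v in violations:
    print("FAIL", v)
```

In a regression flow the run would typically exit with a non-zero status when the list is non-empty.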
HW/SW Debug
ARMv8-based SoC hardware/software debug solutions
Cortex®-A53/-A57 post-process SoC debug
• Integrated and synchronized hardware/software debug with testbench
• For verification and design teams
• Enables off-line debugging
• Consistent across IES and PXP
Cortex-A53/-A57 JTAG software debugger
• Interactive software debugging on PXP
• Support for software developers using RealView, Lauterbach, etc.
[Figure: debugger screenshots for IES and PXP, with callouts: synchronized with design and testbench debugger; embedded C source code debug with assembly view; software variable tracing.]
ARMv8-based SoC hardware/software debug solutions
JTAG debugger support for software developers on PXP
• ARM RealView Debugger
• Lauterbach Debugger
Verification IP
Cadence VIP for ARM AMBA specifications
• Benefits
– Get to market first with latest I/Fs
– Verifies SoC data integrity
– Simplify protocol compliance
– Maximize team productivity
• Highlights
– #1 ACE VIP (ARM collaboration)
– Coherent interconnect validation
– Advanced compliance testing
– Formal and acceleration support
• Specification Support
– ARM AMBA CHI, ACE
– ARM AMBA AXI4, AXI3
– ARM AMBA AHB, APB
[Figure: matrix of AMBA specifications against verification technologies, shaded by maturity level.]
Verification Technologies: Puresuite, CMS, TripleCheck, Compliance Method, Protocol Checks, Trace Debug, PureView Configurator, Formal Analysis, Interconnect Validation, Acceleration Support (1)
Maturity Level: 1-20 projects, 20-100 projects, 100-500 projects, 500+ projects
(1) Accelerated VIP sold separately
Cadence cache-coherent VIP for ACE
Full set of VIP agents to verify cache coherent designs
Active Master (M1)
• Generates coherent stimuli and responds to snoop bursts
• Includes cache model
• Can be configured as ACE or ACE-lite
Passive Master (M2)
• Monitors protocol correctness
• Collects coverage
• Includes cache model
• Can be configured as ACE or ACE-lite
Active Slave (S1)
• Responds to read/write transactions
• Model sparse memory
• ACE-lite port
Passive Slave (S3)
• Checks protocol correctness
• Collects coverage
• ACE-lite
[Figure: CoreLink CCI-400 surrounded by VIP and DUT agents. Legend: DUT, VIP. Agents shown: M1 active master and M2 passive master with caches, M2 DUT master with cache, S1 active slave and S3 passive slave with memories, S2 and S3 DUT slaves.]
Summary
Accelerating ARM-based development