Copyright © 2016 Arteris
Ncore™ Cache Coherent Interconnect
Technology Overview, 24 May 2016
Craig Forrest, Chief Technology Officer
David Kruckemyer, Chief Hardware Architect
Contents
○ About Arteris
○ Caches, Cache Coherency and Challenges
○ Introducing Ncore Cache Coherent Interconnect
○ Summary
Arteris: The on-chip interconnect leader
Arteris Product Milestones
○ Founded in 2003 to pioneer network-on-chip (NoC) interconnect
○ NoC Solution = first released NoC implementation in 2005
○ FlexNoC® = second generation Arteris NoC in 2009/2010
○ FlexPSI = die-to-die or chip-to-chip parallel interface in 2013
○ FlexNoC Resilience Package™ = Functional Safety option in 2014
○ FlexNoC Physical™ = Physically aware IP with FlexNoC Version 3 in 2015
○ Ncore™ Cache Coherent Interconnect = Heterogeneous cache coherency in 2016
Company
○ Headquarters and Engineering Development in Campbell, USA
○ Worldwide support offices (USA, France, China, Korea, India, Japan)
Customer Adoption*
Year            2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016
Customers          1    6    9   13   20   41   52   58   67   76   79
* Customer data current as of 1 May 2016
[Figure: Awards]
Arteris has become the standard for complex and low-power SoCs
Customers shipped > 1B SoCs as of 2015
240 Design Starts, 146 Tape-Outs, 108 Chips Produced*
Year            2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016
Design starts      1    5   13   26   41   85  128  159  190  229  240
Tape-outs          -    1    5   11   19   32   55   99  119  140  146
Chips produced     -    -    1    4   11   20   33   51   79  104  108
* Data is cumulative. Design data is customer-reported and subject to change. Data is current as of 1 May 2016.
Arteris Customers: Arteris technology is becoming a standard
Mobility
Automotive, IoT (Internet of Things), Camera & CE (Consumer Electronics)
SSD (Solid State Drive), Networking & Automation
Current as of 1 May 2016
Very Large SoC Maker
Toshiba Japan System OEM
Automotive SoC Maker
Major SSD Vendor
Defense Contractor
Defense Contractor
Silicon Foundry
Major IP Provider
Large Drone Maker
Japan Tier 1 SoC Maker
Defense Contractor
Major Auto & CE SoC Maker
Major Automotive OEM
Major SSD Vendor
Arteris interconnect IP now covers coherent and non-coherent use cases
[Figure: example SoC block diagram, "Arteris Interconnect IP Products"]
Interconnects shown:
○ Ncore™ Cache Coherent Interconnect
○ FlexNoC® Non-coherent Interconnect
○ FlexWay® Interconnect (subsystem interconnect)
Blocks shown:
○ CPU Subsystem: four A57 cores with L2 cache, four A53 cores with L2 cache
○ Design-Specific Subsystems: GPU Subsystem, 3D Graphics, DSP Subsystem (A/V), AES, 2D GR., MPEG, etc.
○ Application IP Subsystem: IP blocks on a FlexWay Interconnect
○ Security Subsystem: CRI CryptoFirewall (PCF+), RSA-PSS Cert. Engine, Subsystem Interconnect
○ Memory Subsystem: Memory Scheduler, Memory Controller, Wide IO, LP DDR, DDR3, PHYs
○ High Speed Wired Peripherals: USB 3, USB 2 (PHY 3.0, 2.0), PCIe (PHY), Ethernet (PHY)
○ Wireless Subsystem: WiFi, GSM, LTE, LTE Adv.
○ InterChip Links™, HDMI, MIPI, Display, PMU, JTAG
○ I/O Peripherals
Modern SoC Design Challenges
○ SCALABILITY: How to scale systems up as the number of coherent agents increases?
○ HETEROGENEITY: How to integrate coherent processing elements using different protocols, different semantics, or having different cache characteristics?
○ SYSTEM INTEGRATION: How to integrate IP that is not cache coherent and achieve better performance?
○ PHYSICAL DESIGN: How to create a cache coherent system that is easily placed on chip?
○ POWER MANAGEMENT: How to optimize power consumption of complex systems?
Why Caches?
○ Caches are small, fast memories tightly coupled to processing elements
○ Reduced average memory latency means higher performance
  • Temporal locality
  • Spatial locality
○ High bandwidth due to high frequency and wide interfaces
○ Fewer off-chip DRAM accesses resulting in lower power consumption
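The latency point above can be made concrete with the standard average memory access time (AMAT) formula, AMAT = hit time + miss rate × miss penalty. The sketch below is illustrative only: the function name and every cycle count are assumed values, not figures from this presentation.

```python
def average_memory_access_time(hit_time, miss_rate, miss_penalty):
    """AMAT = hit time + miss rate * miss penalty (all times in cycles)."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Assumed example numbers: a 4-cycle cache in front of a 200-cycle DRAM.
# "No cache" is modelled as every access paying the full DRAM penalty.
no_cache = average_memory_access_time(hit_time=0, miss_rate=1.0, miss_penalty=200)
with_cache = average_memory_access_time(hit_time=4, miss_rate=0.05, miss_penalty=200)
print(no_cache, with_cache)
```

With these assumed numbers, a 95% hit rate cuts the average access time by more than a factor of ten, which is the effect that temporal and spatial locality make possible.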
Why Cache Coherency?
○ Caches create multiple copies of data
  • Managing these copies in software is difficult
○ Hardware cache coherency creates the illusion of a flat, shared memory
  • Caches are invisible to software
  • Multiple copies are kept consistent
○ But… managing copies in hardware requires a lot of communication
  • Must check every place there may be a valid copy: a snoop
  • Snoop filters reduce communication by tracking cache contents
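A minimal sketch of how tracking cache contents reduces snoop traffic. The class, its method names, and the four cache names are hypothetical and chosen for illustration; this is a toy model, not Ncore's implementation.

```python
class SnoopFilter:
    """Toy snoop filter: remembers which caches may hold each line address."""

    def __init__(self):
        self.sharers = {}  # line address -> set of cache ids

    def record_fill(self, cache_id, addr):
        """Note that cache_id now holds a copy of the line at addr."""
        self.sharers.setdefault(addr, set()).add(cache_id)

    def record_evict(self, cache_id, addr):
        """Note that cache_id no longer holds the line at addr."""
        holders = self.sharers.get(addr, set())
        holders.discard(cache_id)
        if not holders:
            self.sharers.pop(addr, None)

    def snoop_targets(self, requester, addr):
        """Caches that must be snooped for this request."""
        return self.sharers.get(addr, set()) - {requester}


all_caches = {"A", "B", "C", "D"}
sf = SnoopFilter()
sf.record_fill("B", 0x1000)

broadcast_targets = all_caches - {"A"}              # no filter: snoop every other cache
filtered_targets = sf.snoop_targets("A", 0x1000)    # with filter: only cache B
untracked_targets = sf.snoop_targets("A", 0x2000)   # no cached copies: no snoops
print(len(broadcast_targets), filtered_targets, untracked_targets)
```

Without the filter, every request from cache A costs three snoops; with it, the request is sent only where a copy may exist.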
Ncore Cache Coherent Interconnect IP
[Figure: Ncore connects coherent agents (CPU cluster, GPU, …, each with a cache ($)), non-coherent agents (image processing, display processing, …, subsystems, peripherals, attached through a non-coherent interconnect subsystem), and memory agents (DRAM).]
Ncore Interconnect Architecture
[Figure: Ncore architecture block diagram. Blocks shown: Coherent Agent Interfaces, each attached to an agent cache ($); Coherent Memory Interfaces; Non-coherent Bridges with Proxy Caches ($) in front of the Non-coherent Subsystem; CCTI; and a Directory with multiple Snoop Filters.]
Coherent Read Example – Cache Hit
[Figure: Ncore architecture block diagram (Directory with Snoop Filters, CCTI, Coherent Agent Interfaces, Coherent Memory Interfaces, Non-coherent Bridges with Proxy Caches), annotated with numbered steps ❶ to ❸ involving a Consumer cache and a Producer cache.]
Coherent Read Example – Cache Misses
[Figure: Ncore architecture block diagram (Directory with Snoop Filters, CCTI, Coherent Agent Interfaces, Coherent Memory Interfaces, Non-coherent Bridges with Proxy Caches), annotated with numbered steps ❶ to ❹ involving a Consumer cache and Memory.]
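The two coherent read examples label their steps only by number. One plausible reading, assuming a conventional directory-based protocol (the slides do not spell the steps out), is that the consumer's request reaches the directory, the snoop filters show whether another cache holds the line, and the data is supplied either by that cache (hit) or by memory (miss). The function and variable names below are hypothetical.

```python
def coherent_read(addr, requester, snoop_filter, caches, memory):
    """Return (data, source) for a coherent read in a toy directory model.

    snoop_filter: dict mapping address -> set of cache ids that may hold it
    caches:       dict mapping cache id -> dict of address -> data
    memory:       dict mapping address -> data
    """
    owners = snoop_filter.get(addr, set()) - {requester}
    for cache_id in sorted(owners):
        line = caches[cache_id].get(addr)
        if line is not None:            # snoop hit: another cache supplies the data
            data, source = line, cache_id
            break
    else:                               # no cache holds the line: read from memory
        data, source = memory[addr], "memory"
    caches[requester][addr] = data      # the requester now holds a copy
    snoop_filter.setdefault(addr, set()).add(requester)
    return data, source


caches = {"consumer": {}, "producer": {0x40: "fresh"}}
snoop_filter = {0x40: {"producer"}}
memory = {0x40: "stale", 0x80: "from DRAM"}

hit = coherent_read(0x40, "consumer", snoop_filter, caches, memory)
miss = coherent_read(0x80, "consumer", snoop_filter, caches, memory)
print(hit, miss)
```

In the hit case the consumer receives the producer's up-to-date copy rather than the stale value in memory; in the miss case no cache is snooped and the data comes from memory.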
Ncore Benefits
1. True heterogeneous coherency
2. Highly scalable systems
3. Higher performance with non-coherent IP
4. Lower power consumption
5. Easier chip floorplanning
Benefit #1: True heterogeneous coherency
Two features are primarily responsible for enabling Ncore’s unique heterogeneous cache coherency capabilities:
1. Support for multiple coherence models
2. Use of multiple configurable snoop filters to accommodate different cache organizations
Benefit #1: True heterogeneous coherency
Support for heterogeneous coherent agents
○ Cache coherent agents can differ greatly, which increases the difficulty in integrating them into a system-on-chip
  • Logical – coherence models
  • Physical – cache organization, transaction table sizes
○ Ncore adapts to each coherent agent’s behavior and characteristics
  • Coherent agent interfaces adapt individual coherence models to a generic model using a lightweight messaging layer
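A minimal sketch of the adaptation idea: each agent interface translates its agent's native requests into one shared vocabulary. All request names, both agents, and the generic message set are made up for illustration and are not Ncore's actual messages.

```python
from enum import Enum


class GenericRequest(Enum):
    """Hypothetical generic coherence messages (not Ncore's real message set)."""
    READ_SHARED = "read_shared"
    READ_EXCLUSIVE = "read_exclusive"
    WRITE_BACK = "write_back"


class CoherentAgentInterface:
    """Translates one agent's native requests into the generic model."""

    def __init__(self, name, translation):
        self.name = name
        self.translation = translation  # native request name -> GenericRequest

    def to_generic(self, native_request):
        try:
            return self.translation[native_request]
        except KeyError:
            raise ValueError(
                f"{self.name}: unsupported request {native_request!r}"
            ) from None


# Two made-up agents that use different native vocabularies.
cpu_if = CoherentAgentInterface("cpu", {
    "LoadShared": GenericRequest.READ_SHARED,
    "LoadOwn": GenericRequest.READ_EXCLUSIVE,
    "Evict": GenericRequest.WRITE_BACK,
})
gpu_if = CoherentAgentInterface("gpu", {
    "RD": GenericRequest.READ_SHARED,
    "RDX": GenericRequest.READ_EXCLUSIVE,
    "WB": GenericRequest.WRITE_BACK,
})

print(cpu_if.to_generic("LoadOwn"), gpu_if.to_generic("RDX"))
```

Because both interfaces emit the same generic message, whatever sits behind them only has to understand one model, regardless of how many agent protocols are attached.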
Benefit #1: True heterogeneous coherency
Coherent agent interfaces adapt individual coherence models to a generic model
[Figure: Ncore architecture block diagram (Directory with Snoop Filters, CCTI, Coherent Agent Interfaces with agent caches, Coherent Memory Interfaces, Non-coherent Bridges with Proxy Caches, Non-coherent Subsystem).]
Benefit #1: True heterogeneous coherency
With multiple configurable snoop filters
[Figure: Ncore architecture block diagram (Directory with multiple Snoop Filters, CCTI, Coherent Agent Interfaces with agent caches, Coherent Memory Interfaces, Non-coherent Bridge(s) with Proxy Cache, Non-coherent Domain).]
○ Cache coherent agents can have very different behaviors
  • Cache organization
  • Coherency models
  • Workloads
○ Associating caching agents that share common properties with individual snoop filters can consume less die area than a monolithic snoop filter
Benefit #1: True heterogeneous coherency
Multiple snoop filters are more area-efficient than one
[Figure: requests (REQ) from caches A, B, C and D. Traditional approach: one monolithic snoop filter (X) tracks A, B, C and D. Ncore approach: snoop filter #1 (Y) and snoop filter #2 (Z) together track A, B, C and D.]
Multiple snoop filters are smaller: area(Y+Z) < area(X)
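One way to see how area(Y+Z) can come out smaller than area(X) is a toy bit-count model: every filter holds one entry per tracked cache line, and every entry stores a tag plus one presence bit per attached cache. The model, the tag width, and the cache sizes are illustrative assumptions, not Arteris data; real snoop filter area also depends on associativity and cache organization.

```python
def snoop_filter_bits(lines_per_cache, tag_bits):
    """Storage bits for one filter tracking the given caches.

    Assumed model: one entry per tracked cache line, and each entry holds
    a tag plus one presence bit per cache attached to the filter.
    """
    entries = sum(lines_per_cache)
    presence_bits = len(lines_per_cache)
    return entries * (tag_bits + presence_bits)


TAG_BITS = 30  # assumed tag width
# Assumed cache sizes in lines: A and B are large caches, C and D are small.
a, b, c, d = 16384, 16384, 1024, 1024

monolithic_x = snoop_filter_bits([a, b, c, d], TAG_BITS)
filter_y = snoop_filter_bits([a, b], TAG_BITS)
filter_z = snoop_filter_bits([c, d], TAG_BITS)
print(monolithic_x, filter_y + filter_z)
```

In this model the saving comes only from narrower presence vectors; grouping caches with similar organization, as the slide describes, is a separate effect that the sketch does not capture.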
Benefit #2: Highly scalable systems
With a configurable, modular approach
○ Transaction processing and data bandwidth scaling
  • Each component can be scaled individually (add or subtract components)
  • Ports per component can be scaled individually (add or remove ports)
○ Why is configurable interconnect superior to fixed-function, centralized controllers?
  • Meet performance goals without wasted resources
  • Easily adjust system design as requirements evolve
  • Build derivative chips based on the same platform
Benefit #2: Highly scalable systems
Add more components or ports to scale bandwidth
[Figure: Ncore architecture block diagram (Directory with Snoop Filters, CCTI, Coherent Agent Interfaces with agent caches, Coherent Memory Interfaces, Non-coherent Bridges with Proxy Caches, Non-coherent Subsystem) with an additional Coherent Agent Interface and cache, captioned “Add more components…” and “…or add more ports”.]
Benefit #3: Higher performance with non-coherent IP
Using configurable proxy caches
Advantages (new and novel)
1. Better for sharing data between non-coherent agents and coherent agents
2. Better for sharing data between non-coherent agents
○ Using a proxy cache minimizes communication through DRAM
○ Additional system benefits
  • Pre-fetch effect – fetch cache lines vs. individual data
  • Write-gathering benefit – writes accumulated in cache
  • Optimizes coherent memory accesses
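The write-gathering benefit can be illustrated with a toy proxy cache that merges many small writes into a single line-sized write-back. The class, its fields, and the 64-byte line size are assumptions made for this sketch, not details of the Ncore proxy cache.

```python
LINE_BYTES = 64  # assumed cache line size


class WriteGatheringProxyCache:
    """Toy proxy cache that merges small writes into whole-line write-backs."""

    def __init__(self):
        self.dirty_lines = {}    # line base address -> {offset: byte value}
        self.memory_writes = 0   # line-sized writes sent downstream so far

    def write(self, addr, value):
        """Accumulate a single-byte write in the line that contains addr."""
        base = addr - (addr % LINE_BYTES)
        self.dirty_lines.setdefault(base, {})[addr - base] = value

    def flush(self):
        """Write back every dirty line with one downstream transaction each."""
        self.memory_writes += len(self.dirty_lines)
        self.dirty_lines.clear()


proxy = WriteGatheringProxyCache()
for offset in range(LINE_BYTES):         # 64 single-byte writes to one line
    proxy.write(0x2000 + offset, offset)
proxy.flush()
print(proxy.memory_writes)
```

Sixty-four byte-sized writes from a non-coherent agent leave the proxy cache as one line-sized transaction instead of sixty-four separate memory accesses.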
Benefit #3: Higher performance with non-coherent IP
Sharing between non-coherent & coherent agents using configurable proxy caches
[Figure: Ncore architecture block diagram (Directory with Snoop Filters, CCTI, Coherent Agent Interfaces with agent caches, Coherent Memory Interfaces, Non-coherent Bridges with Proxy Caches, Non-coherent Subsystem), annotated with numbered steps ❶ to ❺ involving a Consumer and a Producer.]
Benefit #3: Higher performance with non-coherent IP
Sharing between non-coherent agents using configurable proxy caches
[Figure: Ncore architecture block diagram (Directory with Snoop Filters, CCTI, Coherent Agent Interfaces with agent caches, Coherent Memory Interfaces, Non-coherent Bridges with Proxy Caches, Non-coherent Subsystem), annotated with numbered steps ❶ to ❹ involving a Consumer and a Producer.]
Benefit #4: Lower power consumption
With multiple clock and voltage domains
[Figure: Ncore architecture block diagram (Directory with Snoop Filters, CCTI, Coherent Agent Interfaces with agent caches, Coherent Memory Interfaces, Non-coherent Bridges with Proxy Caches, Non-coherent Subsystem).]
Benefit #5: Easier chip floorplanning
With a highly distributed architecture
○ Reserve less area for cache coherent interconnect
  • Place it in existing “white space” routing channels – easier P&R
○ Locate modular Ncore components closer to critical IP – better timing
○ Minimize wiring congestion
[Figure: Hub- and crossbar-based coherent interconnects require significant contiguous reserved die area. Source: Andrei Frumusanu, AnandTech]
Summary
Ncore™ Cache Coherent Interconnect IP is targeted at heterogeneous SoCs.
Benefits
○ Scalability
○ Configurability
○ Area efficiency
○ High performance
○ Optimal power consumption
Major Unique Features
○ Multiple configurable snoop filters
○ Multiple configurable proxy caches
○ Modular distributed architecture
RESULT: Custom-configured interconnect IP that meets exact system requirements
To request more information, visit us at http://www.arteris.com/contact