CCIX: a new coherent multichip interconnect for accelerated use cases
David Koenen / Jeff Defilippi
Senior Product Manager
Arm
© 2017 Arm Limited
Tech Symposia
© Arm 2017 2
Interconnects at different scales
SoC interconnect
• Connectivity for on-chip processor, accelerator, IO and memory elements.
Server node interconnect - ‘scale-up’
• Simple multichip interconnect (typically PCIe) topology on a PCB motherboard with simple switches and expansion connectors.
Rack interconnect - ‘scale-out’
• Scale-out capabilities with complex topologies connecting thousands of server nodes and storage elements.
Key drivers for interconnect technology
• Decline of Moore’s law forcing more heterogeneous compute
• Big data analytics growing at 11.7% CAGR
• 5G wireless applications requiring 10x more bandwidth, 10x lower latency by 2021
• Increase in distributed data forcing more network intelligence at faster data rates (10GbE -> 100GbE -> 400GbE)
• Data bandwidth and sharing growth projected at 10x-50x increase vs present PCIe by 2021
CCIX™: Cache Coherent Interconnect for Accelerators
New class of interconnect for accelerated applications
The mission of the CCIX Consortium is to develop and promote adoption of an industry-standard specification that enables coherent interconnect technologies between general-purpose processors and acceleration devices for efficient heterogeneous computing.
https://www.ccixconsortium.com/
CCIX Consortium Inc
• Formed January 2016, incorporated in February 2017
• Complete ecosystem with 34 members and growing
• Hardware specification available for design starts for member companies
• CCIX pronounced: (c’ siks)
Membership tiers: Promoters, Contributors, Adopters
Members named on the slide: Arteris, Inc.; Guizhou Huaxintong Semiconductor Technology Co. Ltd.; INVECAS INC; Netronome; Phytium Technology Co., Ltd.; PLDA; Shanghai Zhaoxin Semiconductor Co., Ltd.; Silicon Laboratories Inc.; SmartDV Technologies India Private Ltd.
Applications benefiting from CCIX
4G and 5G base station
Data-center Search
Embedded Computing
High Performance (Super)Computing
In-memory database processing
Intelligent network acceleration
Machine / Deep Learning
Mobile Edge Computing
Video analytics
CCIX multichip connectivity
High performance, low latency
• CCIX defines 25GT/s (3x performance*)
• Examining 56GT/s (7x performance*) and beyond
• Enabling low latency via light transaction layer
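The "3x" and "7x" figures above can be sanity-checked with simple arithmetic. The slide's footnote for the asterisk is not reproduced here, so the sketch below assumes the comparison baseline is the PCIe Gen3 signalling rate of 8 GT/s per lane; under that assumption 25 GT/s is roughly 3x and 56 GT/s is 7x. The constant and function names are illustrative only.

```python
# Assumption: the "performance*" comparison is against PCIe Gen3 at 8 GT/s per lane.
# The slide's footnote is not available, so treat this baseline as a guess.
PCIE_GEN3_GTS = 8.0


def speedup(rate_gts: float, baseline_gts: float = PCIE_GEN3_GTS) -> float:
    """Return the raw per-lane signalling-rate ratio against the baseline."""
    return rate_gts / baseline_gts


if __name__ == "__main__":
    for rate in (25.0, 56.0):
        ratio = speedup(rate)
        print(f"{rate:g} GT/s vs {PCIE_GEN3_GTS:g} GT/s baseline: ratio {ratio}")
```

This compares raw signalling rates only; it ignores line encoding and protocol overhead, which also affect delivered bandwidth.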
Flexible, scalable interconnect topologies
• Flexible point-to-point, daisy chained and switched topologies
Seamless integration
• Runs on existing PCIe transport layer and management stack
• Supports all major instruction set architectures (ISA)
[Diagram labels: Processor, Accelerator, Smart Network, Persistent Memory, Switch]
System topology examples
[Figure: topology diagrams built from Processor, Accelerator (Accel), Memory and Switch blocks, connected through CCIX ports, with some processors and accelerators also attached over PCIe]
Direct attached, daisy chain, mesh and switched topologies
DMA Engines: The problem with traditional accelerators
Operating System vendors are interested in the opportunity for workload-optimized accelerators
Traditional DMA approach requires a special (Linux) kernel driver for every unique accelerator
Requires skilled kernel developers (a driver for each accelerator); the failure mode is catastrophic (system crash/downtime)
Operating Systems used tomorrow have already been deployed. Updates are 9-12 months apart
Drivers must be in “upstream” Linux before we support them: a year+ turnaround for every accelerator
“Trilby”: a DMA-engine-driven, FPGA-based workload accelerator built by Jon Masters for research into the barriers to adoption in the enterprise. It uses the traditional approach of a kernel driver and operating system hacks.
Coherent virtual memory eliminates data transfer overhead
Non-coherent system without Shared Virtual Memory (SVM): software must manage cache maintenance and data copying
Cache coherent system with Shared Virtual Memory (SVM): hardware-managed cache maintenance, shared address space with direct memory access
[Diagram: Processor and Accelerator pairs for each case; the non-coherent case is annotated "Clean and copy data" at each transfer]
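The difference between the two cases can be illustrated with a toy model that counts the software-visible work. This is a sketch only: `TransferStats`, `offload_non_coherent` and `offload_coherent` are hypothetical names, the byte reversal stands in for real accelerator work, and no actual cache or device is involved.

```python
from dataclasses import dataclass


@dataclass
class TransferStats:
    cache_maintenance_ops: int = 0  # software-issued cache clean/invalidate steps
    bytes_copied: int = 0           # bytes moved by explicit buffer copies


def offload_non_coherent(payload: bytes, stats: TransferStats) -> bytes:
    """Software-managed path: clean, copy in, run, invalidate, copy out."""
    stats.cache_maintenance_ops += 1      # clean CPU cache so the device sees current data
    device_buffer = bytearray(payload)    # copy into a device-visible buffer
    stats.bytes_copied += len(payload)
    device_buffer.reverse()               # stand-in for the accelerator's work
    stats.cache_maintenance_ops += 1      # invalidate stale CPU lines before reading back
    result = bytes(device_buffer)         # copy the result back to the application
    stats.bytes_copied += len(result)
    return result


def offload_coherent(shared: bytearray, stats: TransferStats) -> bytearray:
    """Coherent SVM path: the accelerator works in place on the shared buffer.

    stats is left untouched because software issues no copies or cache operations.
    """
    shared.reverse()                      # same stand-in work, done in place
    return shared


if __name__ == "__main__":
    data = bytes(range(8))
    nc_stats, co_stats = TransferStats(), TransferStats()
    offload_non_coherent(data, nc_stats)
    offload_coherent(bytearray(data), co_stats)  # bytearray() only makes the demo buffer mutable
    print("non-coherent:", nc_stats)
    print("coherent:    ", co_stats)
```

In the model, the non-coherent path copies the payload twice and issues two maintenance steps, while the coherent path issues none; in a real system that work moves into hardware rather than disappearing.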
CCIX acceleration functions that just work in the cloud
[Figure: virtualized software stack with CCIX functions. Blocks shown: Physical Machine (e.g., processors, DRAM, caches, MMU, IOMMU, other resources and SoC devices); Firmware, Option ROMs, etc. (optional, system dependent); Virtual Machine Monitor (VMM)/Hypervisor; Virtual Machines 1 to 3, each with a Guest OS hosting apps, a JVM, containers and VNFs; Other External Devices (e.g., disks, NICs, FPGAs, GPUs, crypto, other accelerators, other devices); CCIX function blocks. Privilege levels: Non-privileged, Privileged, Hyper-privileged.]
CCIX layered architecture
• Protocol Layer – coherency protocol, memory read & write flows
• Link Layer – formats CCIX messages for target transport
• Transaction Layer – adds optimized packets, manages credit-based flow control
• Physical Layer – Dual mode PHY to support extended data rates
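The Transaction Layer bullet above mentions credit-based flow control. As a generic illustration of that idea (not the CCIX or PCIe credit scheme, whose credit types and sizes are not given in this deck), the sketch below models a sender that may transmit only while it holds credits, with each credit representing one free receiver buffer slot. `CreditLink` and its methods are hypothetical names.

```python
from collections import deque


class CreditLink:
    """Toy credit-based flow control: one credit equals one free receiver slot."""

    def __init__(self, receiver_slots: int) -> None:
        self.credits = receiver_slots  # credits currently held by the sender
        self.rx_buffer = deque()       # packets waiting at the receiver

    def send(self, packet: str) -> bool:
        """Transmit only when a credit is available; otherwise the sender must wait."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.rx_buffer.append(packet)
        return True

    def receiver_drain(self) -> str:
        """Receiver consumes one packet and returns one credit to the sender."""
        packet = self.rx_buffer.popleft()
        self.credits += 1
        return packet


if __name__ == "__main__":
    link = CreditLink(receiver_slots=2)
    print([link.send(p) for p in ("a", "b", "c")])  # third send is refused: no credits left
    link.receiver_drain()                           # frees a slot and returns a credit
    print(link.send("c"))                           # now accepted
```

Because the sender never transmits without a credit, the receiver buffer cannot overflow and no packet has to be dropped and retried for lack of space.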
[Diagram: layer stack, top to bottom: CCIX Protocol Layer; CCIX Link Layer; CCIX Transaction Layer (CCIX messages) beside PCIe Transaction Layer (PCIe packets); PCIe Data Link Layer; CCIX/PCIe Physical Layer (Tx, Rx)]
CCIX example request to home data flows
[Figure: four request-to-home flow diagrams built from Req, Cache, Home, LA and Memory blocks]
• Accelerator shares processor memory
• Daisy chain to shared processor memory
• Shared processor and accelerator memory
• Shared memory with aggregation
Improved efficiency with CCIX transaction layer
Reduced latency with lightweight transaction layer
Improved packet efficiency with optimized CCIX header
CCIX port aggregation to boost bandwidth and transactions
CCIX defines a hashing function to steer requests across multiple links
Aggregation effectively multiplies the bandwidth
Aggregation could also be used to increase the number of transactions (e.g. 50GT/s vs 25GT/s)
PCIe requires separate address spaces, so requests cannot be hashed
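To make the idea of hashing requests across links concrete, here is a minimal sketch. The hash CCIX actually defines is not given in this deck, so the XOR-fold below is purely illustrative, as are the 64-byte request granularity and the name `select_link`.

```python
from collections import Counter

CACHE_LINE_BYTES = 64  # assumption: requests are steered at cache-line granularity


def select_link(address: int, num_links: int) -> int:
    """Pick a link from the cache-line index using an XOR-fold (illustrative hash only)."""
    if num_links <= 0:
        raise ValueError("num_links must be positive")
    line = address // CACHE_LINE_BYTES
    folded = line ^ (line >> 8) ^ (line >> 16)  # mix higher address bits into the low bits
    return folded % num_links


if __name__ == "__main__":
    # Spread every cache line in a 64 KiB region over two aggregated links.
    counts = Counter(
        select_link(addr, 2) for addr in range(0, 64 * 1024, CACHE_LINE_BYTES)
    )
    print(dict(counts))
```

Because the link is a pure function of the address, every request to a given cache line takes the same link, while a single shared address space is spread across all links. That is the property the slide contrasts with PCIe, where each link needs its own address space.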
[Figure: "CCIX with Port Aggregation" (Req, Cache, Home, LA and Memory blocks) compared with "PCIe with Aggregation" (Processor and Accelerator, each with Cache, Home and PCI ports; memories Mem0 and Mem1)]
CCIX SoC integration example
[Figure: example CMN-600 mesh design. CoreLink CMN-600 with XP, CML, RNI and DMC-620 blocks, connected over CXS and AXI to 3rd party PCIe/CCIX IP: PCIe Transaction Layer, CCIX Transaction Layer, Data Link Layer and PHY (up to 25Gbps), 16 lanes]
Arm CCIX demonstration vehicle
• Arm’s DynamIQ and CoreLink CMN-600 technology
• Cadence CCIX and PCIe controller and PHY IP
• TSMC 7nm process technology
• CCIX connectivity to Xilinx’s Virtex UltraScale+ FPGA
Xilinx, Arm, Cadence, and TSMC Announce World's First CCIX Silicon Demonstration Vehicle in 7nm Process Technology
Scale-up server node performance with CCIX
CCIX is a class of interconnect providing high performance and low latency for new accelerator use cases
Easy adoption and simplified development by leveraging today’s data center infrastructure
IP is available today from Arm and the ecosystem to optimize CCIX SoCs
• Servers, FPGAs, GPUs, network/storage adapters, intelligent networks and custom ASICs
For more information go to:
https://developer.arm.com
Thank You! Danke! Merci! 谢谢! ありがとう! Gracias! Kiitos!
© Arm 2017
The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners.
www.arm.com/company/policies/trademarks