At the heart of a new generation of data center infrastructures and appliances
Sept 2017
Page 2 ©2017 – Kalray SA All Rights Reserved
VIRTUALIZED DATACENTER: THE BLENDER EFFECT FOR STORAGE I/O OPERATIONS
Figure: compute nodes hosting 10,000 VMs, connected through an aggregate switch to centralized storage that must serve MIOPs of random I/O.
10,000s of VMs generate millions of random IOPs on the storage side.
NVMe SSDs: THE ANSWER TO THE RANDOM MIOPs DEMAND
SAS HDD • 150 IOPs • 6 ms to 200 ms latency
SAS SSD • 200 KIOPs • 140 µs latency
NVMe SSD • 800 KIOPs • 115 µs latency
SAS HDD to SAS SSD: x1000 more compute
SAS SSD to NVMe SSD: x4 more compute
NVMe SSDs deliver 4000x better performance than traditional SAS HDDs.
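The multipliers on this slide follow from the IOPs figures quoted for each device class. As a quick check, the sketch below recomputes the ratios; the variable names are illustrative and not taken from the slides. The x1000 label is a rounded-down value, and the 4000x headline is the product of the two rounded labels.

```python
# IOPs figures as quoted on the slide (device class -> random IOPs).
iops = {
    "SAS HDD": 150,
    "SAS SSD": 200_000,
    "NVMe SSD": 800_000,
}

# Step-by-step improvement between device generations.
hdd_to_sas_ssd = iops["SAS SSD"] / iops["SAS HDD"]     # shown as x1000 on the slide
sas_ssd_to_nvme = iops["NVMe SSD"] / iops["SAS SSD"]   # shown as x4 on the slide
hdd_to_nvme = iops["NVMe SSD"] / iops["SAS HDD"]       # unrounded end-to-end ratio

# The slide's headline multiplies the two rounded labels.
headline = 1000 * 4

print(f"SAS HDD -> SAS SSD : x{hdd_to_sas_ssd:.0f}")
print(f"SAS SSD -> NVMe SSD: x{sas_ssd_to_nvme:.0f}")
print(f"SAS HDD -> NVMe SSD: x{hdd_to_nvme:.0f} (headline: x{headline})")
```

The unrounded end-to-end ratio is somewhat higher than the headline, so the 4000x claim is on the conservative side of the quoted figures.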
FROM SAS TO NVMe
Figure: two rack architectures side by side. SAS: compute nodes reach the storage servers (FE NIC, SAS HBA, CPUs) through the Top of Rack (ToR) Ethernet switch over the FE fabric; the storage servers reach the JBOF (Just a Bunch Of Flash) SSDs through SAS switches A and B and a SAS expander over the SAS fabric. NVMe: same front end, but the storage servers (FE NIC, CPUs) attach to the JBOF SSDs over point-to-point PCIe connections.
Page 4 ©2017 – Kalray SA All Rights Reserved FMS 2017
PCIe: The issues
Figure: compute nodes connected through the Top of Rack (ToR) Ethernet switch and FE fabric to storage servers (FE NIC, CPUs), which attach to the JBOF (Just a Bunch Of Flash) SSDs over point-to-point PCIe connections.
• Primarily designed for CPU-to-peripheral communication, not for rack communication
• Scaling storage capacity (pay as you grow): adding JBOFs means adding head nodes
• Scaling storage head nodes based on services: 6 head nodes max
• Limited compute-to-storage ratio and flexibility: 1 to 6 head nodes per JBOF
NVMe JBOF: THE 2 OTHER SOLUTIONS EXISTING TODAY
X86 JBOF
Figure: compute nodes and storage head nodes (FE NIC, HBA NIC, CPUs) connected through the Top of Rack (ToR) switch over an Ethernet FE fabric, and through Ethernet switches over an NVMe-oF BE fabric to x86 JBOFs. Each x86 JBOF contains an RNIC, two CPUs, and a PCIe switch in front of the SSDs.
• Lower density • High cost/high power
HYPERCONVERGED
Figure: hyperconverged nodes (FE NIC, CPUs, local SSDs) connected over Ethernet through the Top of Rack (ToR) switch.
• Compute/storage ratio is fixed
THE IDEAL SOLUTION: NVMe-oF JBOF
Figure: NVMe-oF JBOF architecture. Compute nodes and head nodes (FE NIC, HBA NIC, CPUs) connect through the Top of Rack (ToR) switch over an Ethernet FE fabric, and through Ethernet switches over an NVMe-oF BE fabric to several JBOFs. Each JBOF contains an NVMe-oF Target Controller (RNIC, CPU) and a PCIe switch in front of the SSDs.
• Scale head nodes based on services • Scale storage as needed • Leverage existing PCIe JBOF designs • High density • Cost/power optimized
Density of PCIe JBOF with the flexibility of x86 JBOF.
NVMe-oF STORAGE SOLUTION: KALRAY TARGET CONTROLLER (KTC40/KTC80)
PCIe RC MODE FOR DIRECT SSD CONTROL • Standard Linux with NVMe Driver • Control up to 255 PCIe endpoints • Any NVMe SSD supported – no need for CMB • SSD Hot Plug Support
LOW ADDITIONAL LATENCY • 15 µs for 4KB block transfer
TARGET CONTROLLER FEATURE
NVMe-oF PROTOCOL OVER RoCEv1/v2 • More than 4x the performance of SAS (IOPs & throughput) • Scalability: connect up to 2048 initiator cores • Standard Ethernet connectivity
END USER INLINE PROCESSING • Compression, Encryption, …
KALRAY TARGET CONTROLLER FUNCTION
BOARD MANAGEMENT CONTROL (BMC) • Supervise enclosure
Manages all the storage functions of the new generation storage JBOF.
Figure: a JBOF containing two Target Controllers (RNIC, CPU), two PCIe switches, and U.2 SSDs.
KTC40 & KTC80 HARDWARE SPECIFICATION
KTC80
• MPPA®2.2-256 (Bostan2 processor)
• 80 GbE of sustained throughput
• 2 x QSFP+ ports
• 16-lane PCIe Gen3
• 2 x DDR3-1866 with ECC (4GB)
• FHHL (Full-Height, Half-Length)
• Embedded switch with bifurcation up to 4 x 4-lane
KTC40
• MPPA®2.2-256 (Bostan2 processor)
• 40 GbE
• 2 x QSFP+ ports
• 8-lane PCIe Gen3
• 2 x DDR3-1866 with ECC (2GB)
• LP (Low-profile)
KALRAY LEADS THE INDUSTRY IN NVMe-oF COMPATIBILITY
OPERATION: Ethernet SSD (NVMe Direct/Root Complex), 67% RD / 33% WR @ 4KB
• KTC40: 1.6 MIOPs, 15 µs latency
• KTC80: 3.2 MIOPs, 15 µs latency
Highest possible throughput.
A whole family of products.
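To relate the MIOPs figures to the Ethernet link rates of the two boards, the following rough sketch converts the read share of the workload into payload bandwidth. It is a back-of-the-envelope estimate, not a vendor calculation: it assumes "4KB" means 4096 bytes, that reads dominate the target-to-initiator direction of a full-duplex link, and it ignores all protocol overhead. The function name is illustrative.

```python
BLOCK_BYTES = 4 * 1024   # assumption: "4KB" means 4096 bytes
READ_FRACTION = 0.67     # 67% RD / 33% WR mix quoted for the benchmark


def read_bandwidth_gbit(miops, read_fraction=READ_FRACTION, block_bytes=BLOCK_BYTES):
    """Payload bandwidth from target to initiators in Gbit/s (no protocol overhead)."""
    reads_per_second = miops * 1e6 * read_fraction
    return reads_per_second * block_bytes * 8 / 1e9


# (board, quoted MIOPs, Ethernet capacity in Gbit/s)
for board, miops, link_gbit in (("KTC40", 1.6, 40), ("KTC80", 3.2, 80)):
    used = read_bandwidth_gbit(miops)
    print(f"{board}: {used:.1f} Gbit/s of read payload on a {link_gbit} GbE board "
          f"({used / link_gbit:.0%} of link rate)")
```

Under these assumptions the read payload alone occupies most of the available link rate on both boards, which is consistent with the slide's "highest possible throughput" statement.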
KALRAY I/O BOSTAN PROCESSOR OVERVIEW
HIGH-SPEED INTERFACES: • 2x 40GbE • 2x PCIe Gen3 8-lanes (EP/RC)
CONNECTED TO A LARGE ARRAY OF PROCESSING • Full C/C++ Programmable • Dataplane execution
VIA A HIGH BANDWIDTH LOW LATENCY NETWORK ON CHIP • Direct packet-to-core delivery • Direct core-to-core transfers • Direct connect between multiple MPPAs
AND I/O Quad CORES • Runs Linux • Runs control plane
Figure: MPPA processor block diagram. An array of compute cores (C) is interconnected by the NoC and surrounded by four I/O quad cores (quad core 0 to quad core 3), two 4x 10G / 40G Ethernet interfaces, and two 8-lane PCIe Gen3 interfaces.
KTC NVMe-oF SOFTWARE STACK
Figure: KTC software stack on the MPPA processor, split between the clusters and the quad cores, each running on top of a hypervisor.
• Linux: NoC driver, PCIe RC, TCP/IP, iSCSI, RoCE v1/v2, NVMe-oF/iSER, NVMe RA, NVMe ODP
• Firmware: Libnoc, ODP, RoCE v1/v2, NVMe-oF, NVMe RA
• Management: SES Mgmt, SES ports
• User-defined: user inline processing (dedup, encryption, …), user control plane, configuration
END USER CUSTOMIZABLE SOLUTION
INLINE PROCESSING • Compression • Encryption • Deduplication • Erasure Coding
CUSTOMIZABLE FUNCTIONS
END USER READ/WRITE OPERATION POLICY • Implement optimized read/write scheduling to eliminate outliers on critical streams • Achieve low latency for 99.9999% of operations
BOARD MANAGEMENT CONTROL (BMC) • Redfish/Swordfish • SES • OpenBMC
YOUR PCIe JBOF EASILY BECOMES AN ETHERNET JBOF WITH KALRAY TARGET CONTROLLER
KTC ENABLES FAST TIME-TO-MARKET FOR BUILDING AN NVMe-oF JBOF
Figure: a PCIe JBOF becomes an NVMe-oF JBOF with no modifications.
INITIATOR BOTTLENECK SOLUTION
PCIe JBOF
• Up to 6 x 255 PCIe endpoints
• Up to 6 x 72 initiator cores
• 12 MIOPs, 432 initiator cores, 28 KIOPS/core
NVMe-oF JBOF
• Up to 6 x 255 PCIe endpoints
• Up to 6 x 2048 initiator cores (170 x 72 cores)
• 12 MIOPs, 24,576 initiator cores, 1 KIOPS/core
NVMe-oF KTC connects to 28x more initiator cores than PCIe adapters.
This solves the initiator bottleneck issue!
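The per-core load figures on this slide can be derived from the connection limits. The sketch below is one way to reproduce them; the constant and function names are illustrative. Note that 6 x 2048 gives 12,288 initiator cores, whereas the slide's table lists 24,576; the sketch uses the 6 x 2048 figure, which is the one that matches the "170 x 72 cores" and 1 KIOPS/core values.

```python
TOTAL_MIOPS = 12               # JBOF throughput quoted for both architectures
ADAPTERS_PER_JBOF = 6          # the "6 x" factor on the slide
CORES_PER_PCIE_ADAPTER = 72    # initiator cores reachable per PCIe adapter
CORES_PER_KTC = 2048           # initiator cores reachable per KTC


def kiops_per_core(total_miops, cores):
    """Load each initiator core must generate to saturate the JBOF, in KIOPS."""
    return total_miops * 1000 / cores


pcie_cores = ADAPTERS_PER_JBOF * CORES_PER_PCIE_ADAPTER
ktc_cores = ADAPTERS_PER_JBOF * CORES_PER_KTC

print(f"PCIe JBOF   : {pcie_cores} cores, "
      f"{kiops_per_core(TOTAL_MIOPS, pcie_cores):.1f} KIOPS/core")
print(f"NVMe-oF JBOF: {ktc_cores} cores, "
      f"{kiops_per_core(TOTAL_MIOPS, ktc_cores):.2f} KIOPS/core")
print(f"Core ratio  : x{CORES_PER_KTC / CORES_PER_PCIE_ADAPTER:.1f}")
```

The point of the comparison is that with only 432 initiator cores, each core has to sustain roughly 28 KIOPS to use the JBOF fully, while spreading the same load over thousands of cores brings the requirement down to about 1 KIOPS per core.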
NVMe-oF JBOF: Scales the storage capacity
Figure: NVMe-oF JBOF versus PCIe JBOF. NVMe-oF JBOF: compute nodes and head nodes (FE NIC, HBA NIC, CPUs) connect through the Top of Rack (ToR) switch over the Ethernet FE fabric, and through Ethernet switches over the NVMe-oF BE fabric to several JBOFs, each with an NVMe-oF KTC and a PCIe switch in front of the SSDs. PCIe JBOF: storage servers (FE NIC, CPUs) attach to a single JBOF (Just a Bunch Of Flash) over point-to-point PCIe connections.
KTC ENABLES HIGH AVAILABILITY ARCHITECTURE
MULTIPATH HANDLED AT THE INITIATOR SIDE
• Standard feature available in the Linux kernel
• Supports Active-Active or Active-Standby modes
END-TO-END REDUNDANT PATH
• Dual-port U.2 NVMe SSD
• Dual PCIe trees
• Dual KTC40/80 connectivity
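To illustrate the difference between the two multipath modes, here is a minimal Python sketch of initiator-side path selection over two redundant routes. It is not the Linux kernel implementation: the `Path` and `MultipathSelector` names, the path labels, and the round-robin policy used for active-active are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Path:
    """One initiator-to-SSD route through a given KTC (hypothetical model)."""
    name: str
    healthy: bool = True


class MultipathSelector:
    """Picks a path for each I/O in active-active or active-standby mode."""

    def __init__(self, paths, mode="active-active"):
        if mode not in ("active-active", "active-standby"):
            raise ValueError(f"unknown mode: {mode}")
        self.paths = list(paths)
        self.mode = mode
        self._counter = count()

    def select(self):
        healthy = [p for p in self.paths if p.healthy]
        if not healthy:
            raise RuntimeError("no healthy path to the SSD")
        if self.mode == "active-standby":
            # Use the first healthy path; the others only serve as failover targets.
            return healthy[0]
        # Active-active: spread I/Os across all healthy paths (round-robin).
        return healthy[next(self._counter) % len(healthy)]


paths = [Path("ktc-a"), Path("ktc-b")]
selector = MultipathSelector(paths, mode="active-active")
print([selector.select().name for _ in range(4)])

paths[0].healthy = False  # simulate losing one KTC or one PCIe tree
print([selector.select().name for _ in range(2)])
```

In both modes the I/O keeps flowing after a path failure as long as one route remains, which is what the dual-port SSDs, dual PCIe trees, and dual KTC connectivity make possible.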
SCALE PERFORMANCE UP TO SSD PEAK CAPABILITIES
Figure: performance scaling charts. MIOPs: random, 66% RD / 33% WR, 4KB. Bandwidth: random, 100% RD, 4KB.
Data points shown:
• 6.4 MIOPs • 17.6 GB/s
• 19.2 MIOPs • 52.8 GB/s
• 15 MIOPs • 48 GB/s
Global performance: • 15 MIOPs • 48 GB/s
Scale up to SSD peak performance.
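One plausible reading of these data points is that controller throughput adds up linearly until the SSDs themselves become the limit. The sketch below models that reading. It rests on assumptions not stated on the slide: that the 6.4 MIOPs / 17.6 GB/s point corresponds to two KTC80 controllers, that the 19.2 MIOPs / 52.8 GB/s point corresponds to six, and that 15 MIOPs / 48 GB/s is the SSD-side ceiling. The function and constant names are illustrative.

```python
KTC80_MIOPS = 3.2        # per-controller MIOPs quoted for the KTC80 in this deck
KTC80_GBPS = 17.6 / 2    # assumption: the 17.6 GB/s point is two controllers
SSD_PEAK_MIOPS = 15.0    # assumed SSD-side ceiling (the "global" figure)
SSD_PEAK_GBPS = 48.0


def jbof_performance(n_controllers):
    """Aggregate controller capability and what the SSDs can actually deliver."""
    raw_miops = n_controllers * KTC80_MIOPS
    raw_gbps = n_controllers * KTC80_GBPS
    return {
        "raw_miops": raw_miops,
        "raw_gbps": raw_gbps,
        "delivered_miops": min(raw_miops, SSD_PEAK_MIOPS),
        "delivered_gbps": min(raw_gbps, SSD_PEAK_GBPS),
    }


for n in (2, 6):
    perf = jbof_performance(n)
    print(f"{n} x KTC80: controllers {perf['raw_miops']:.1f} MIOPs / "
          f"{perf['raw_gbps']:.1f} GB/s, delivered "
          f"{perf['delivered_miops']:.1f} MIOPs / {perf['delivered_gbps']:.1f} GB/s")
```

Under this model, six controllers offer more headroom than the SSDs can use, so the enclosure runs at SSD peak rather than being limited by the target controllers.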
x86-based JBOF Versus KTC-based JBOF: performance optimized
X86 JBOF architecture
CPU + NIC FUNCTION • 2 x XEON E5-2667v4 • 8 x 16GB DDR4 • 3 x 100G NIC
POWER: 309 W
PERFORMANCE: 9.4 MIOPs
DENSITY: 24 SSDs in 2U (77TB)
KTC-based architecture
CPU + NIC FUNCTION • 6 x KTC80
POWER: 210 W
PERFORMANCE: 15 MIOPs
DENSITY: 24 SSDs in 2U (77TB)
KTC-based versus x86: 32% more power efficient • 60% higher performance • Same density
ELIMINATE THE HIGH COST/HIGH POWER x86 SYSTEM (CPU, MEMORY, …) WHILE INCREASING PERFORMANCE BY 60%
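The 32% and 60% figures on this slide can be reproduced from the power and performance numbers listed for each architecture. A short sketch, with illustrative names:

```python
def percent_change(baseline, new):
    """Relative change of `new` against `baseline`, in percent."""
    return (new - baseline) / baseline * 100


# Figures as listed on the slide.
x86 = {"power_w": 309, "miops": 9.4}
ktc = {"power_w": 210, "miops": 15.0}

# Power drops, so negate the change to express it as a saving.
power_saving = -percent_change(x86["power_w"], ktc["power_w"])
perf_gain = percent_change(x86["miops"], ktc["miops"])

print(f"Power saving    : {power_saving:.0f}%")
print(f"Performance gain: {perf_gain:.0f}%")
```

Both results are measured against the x86 baseline, which is why the power saving is a reduction relative to 309 W and the performance gain is an increase relative to 9.4 MIOPs.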
x86-based JBOF Versus KTC-based JBOF: density optimized
X86 JBOF architecture
SPECIFICATION • 2 x XEON E5-2667v4 • 8 x 16GB DDR4 • 2 x 100G NIC
POWER: 294 W
PERFORMANCE: 6.25 MIOPs
DENSITY: 154TB in 2U
KTC-based architecture
SPECIFICATION • CHASSIS WITH 250 M.2 SSD in 2OU • 3 x KTC80-LP
POWER: 105 W
PERFORMANCE: 7.5 MIOPs
DENSITY: 240 TB in 2OU
KTC-based versus x86: 58% greater density • 64% more power efficient • 20% better performance
ELIMINATE THE HIGH COST/HIGH POWER x86 SYSTEM (CPU, MEMORY, …) WHILE INCREASING DENSITY AND OPTIMIZING COST AND POWER
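The comparison for the density-optimized configuration can be checked the same way. In the sketch below (illustrative names), the power and performance percentages match the slide; the raw capacity ratio comes out near 56%, slightly below the 58% quoted, and the two enclosures are specified in different units (2U versus 2OU), which this simple ratio does not account for. The IOPs-per-watt line is a derived metric that does not appear on the slide.

```python
# Figures as listed on the slide for each enclosure.
x86 = {"power_w": 294, "miops": 6.25, "capacity_tb": 154}
ktc = {"power_w": 105, "miops": 7.5, "capacity_tb": 240}


def gain_pct(baseline, new):
    """Relative increase of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100


power_saving_pct = -gain_pct(x86["power_w"], ktc["power_w"])
perf_gain_pct = gain_pct(x86["miops"], ktc["miops"])
density_gain_pct = gain_pct(x86["capacity_tb"], ktc["capacity_tb"])

print(f"Power saving    : {power_saving_pct:.0f}%")
print(f"Performance gain: {perf_gain_pct:.0f}%")
print(f"Capacity gain   : {density_gain_pct:.0f}%")

# Derived metric (not on the slide): IOPs delivered per watt.
for name, system in (("x86", x86), ("KTC", ktc)):
    iops_per_watt = system["miops"] * 1e6 / system["power_w"]
    print(f"{name}: {iops_per_watt:,.0f} IOPs/W")
```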
STORAGE: PAY AS YOU GROW WITH KALRAY TARGET CONTROLLER
KEEP THE SAME INFRASTRUCTURE • ToR switch • Number of storage servers
PAY AS YOU GROW ! • Pay only for additional storage capacity • Not for additional storage servers or Top of Rack Switch
KALRAY UNIQUE ADVANTAGE: CHAIN NVMe-oF JBOFs
Figure: a storage server (FE NIC, HBA NIC, CPUs) connected through the Top of Rack (ToR) switch over the Ethernet FE fabric, and through an Ethernet switch over the NVMe-oF BE fabric to a chain of three JBOFs (Just a Bunch Of Flash), each with a KTC in front of its SSDs.
Chaining equivalent to what the SAS protocol offers.
NVMe-oF JBOF ENABLES DISAGGREGATED HYPERCONVERGED ARCHITECTURE
Hyperconverged/SDS
Figure: hyperconverged nodes (FE NIC, CPUs, local SSDs) connected over Ethernet through the Top of Rack (ToR) switch.
• Hyperconverged/SDS scales naturally • Compute/storage ratio is fixed • DAS is expensive
Disaggregated Hyperconverged/SDS
Figure: hyperconverged nodes (FE NIC, CPUs) connected through the Top of Rack (ToR) switch over converged Ethernet (NVMe-oF) to Target Controllers in front of shared SSDs.
• Scale compute & storage independently • Leverage existing PCIe JBOF designs
NVMe-oF: THE SOLUTION FOR THE NEW GENERATION OF STORAGE SYSTEMS
NVMe-oF TARGET
THE SOLUTION FOR NVMe-oF JBOF
4X HIGHER IOPS THAN SAS SSD • End-to-end NVMe/NVMe-oF capabilities ensure 4X more IOPS
SCALABLE & FLEXIBLE • Scale the head nodes and storage capacity independently
FAST TIME TO MARKET • Plug the NVMe-oF Target Controller into your standard PCIe JBOF
$$$ • Eliminate the need for x86 and associated system memory
KALRAY S.A. - GRENOBLE - FRANCE 445 rue Lavoisier, 38 330 Montbonnot - France Tel: +33 (0)4 76 18 09 18 email: [email protected]
KALRAY INC. - LOS ALTOS - USA 4962 El Camino Real Los Altos, CA - USA Tel: +1 (650) 469 3729 email: [email protected]
MPPA, ACCESSCORE and the Kalray logo are trademarks or registered trademarks of Kalray in various countries. All trademarks, service marks, and trade names are the marks of the respective owner(s), and any unauthorized use thereof is strictly prohibited. All terms and prices are indicative and subject to any modification without notice.