Post on 05-Aug-2015
transcript
Ceph on All-Flash Storage – Breaking Performance Barriers
Zhou Hao, Technical Marketing Engineer
June 6th, 2015
Forward-Looking Statements
During our meeting today we will make forward-looking statements.
Any statement that refers to expectations, projections or other characterizations of future events or
circumstances is a forward-looking statement, including those relating to market growth, industry
trends, future products, product performance and product capabilities. This presentation also
contains forward-looking statements attributed to third parties, which reflect their projections as of the
date of issuance.
Actual results may differ materially from those expressed in these forward-looking statements due
to a number of risks and uncertainties, including the factors detailed under the caption “Risk Factors”
and elsewhere in the documents we file from time to time with the SEC, including our annual and
quarterly reports.
We undertake no obligation to update these forward-looking statements, which speak only as
of the date hereof or as of the date of issuance by a third party, as the case may be.
Requirement from Big Data @ PB Scale
Mixed media container, active-archiving, backup, locality of data
Large containers with application SLAs
Internet of Things, Sensor Analytics
Time-to-Value and Time-to-Insight
Hadoop
NoSQL
Cassandra
MongoDB
High read intensive access from billions of edge devices
Hi-Def video driving even greater demand for capacity and performance
Surveillance systems, analytics
CONTENT REPOSITORIES | BIG DATA ANALYTICS | MEDIA SERVICES
InfiniFlash™ System
• Ultra-dense All-Flash Appliance
- 512TB in 3U
• Scale-out software for massive capacity
- Unified Content: Block, Object
- Flash optimized software with
programmable interfaces (SDK)
• Enterprise-Class storage features
- snapshots, replication, thin
provisioning
• Enhanced Performance for Block and
Object
- 10x Improvement for Block Reads
- 2x Improvement for Object Reads
IF500 with InfiniFlash OS (Ceph)
Ideal for large-scale storage & best-in-class $/IOPS/TB
InfiniFlash Hardware System
Capacity: 512TB* raw
All-Flash 3U Storage System
64 x 8TB Flash Cards with Pfail protection
8 SAS ports total
Operational Efficiency and Resilience
Hot-swappable components, easy FRU
Low power: 450W (avg), 750W (active)
MTBF 1.5+ million hours
Scalable Performance**
780K IOPS
7GB/s Throughput
Upgrade to 12GB/s in Q315
* 1TB = 1,000,000,000,000 bytes. Actual user capacity less.
** Based on internal testing of InfiniFlash 100. Test report available.
Innovating Performance @ InfiniFlash OS
Major Improvements to Enhance Parallelism
Backend Optimizations – XFS and Flash
Messenger Performance Enhancements
• Message signing
• Socket Read aheads
• Resolved severe lock contentions
• Reduced CPU usage by ~2 cores with improved file path resolution from object ID
• CPU and Lock optimized fast path for reads
• Disabled throttling for Flash
• Index Manager caching and Shared FdCache in filestore
• Removed single Dispatch queue bottlenecks for OSD and Client (librados) layers
• Shared thread pool implementation
• Major lock reordering
• Improved lock granularity – Reader / Writer locks
• Granular locks at Object level
• Optimized OpTracking path in OSD eliminating redundant locks
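Several of the items above surface as ordinary Ceph configuration options. A minimal sketch of flash-oriented tuning in that spirit, assuming a Hammer-era ceph.conf; the values are illustrative, not the actual IFOS defaults:

```ini
[global]
# Message signing adds CPU cost on the messenger path
cephx sign messages = false

[osd]
# More sharded op queues reduce dispatch-queue contention
osd op num shards = 8
osd op num threads per shard = 2

# A larger FD cache cuts object-ID -> file path resolution cost
filestore fd cache size = 64

# Relax filestore throttles that assume spinning media
filestore queue max ops = 5000
```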
Open Source with SanDisk Advantage
InfiniFlash OS – Enterprise-Level Hardened Ceph
Enterprise Level Hardening
9,000 hours of cumulative IO tests
1,100+ unique test cases
1,000 hours of cluster rebalancing tests
1,000 hours of IO on iSCSI
Testing at Hyperscale
Over 100 server node clusters
Over 4PB of flash storage
Failure Testing
2,000 cycle node reboot
1,000 times node abrupt power cycle
1,000 times storage failure
1,000 times network failure
IO for 250 hours at a stretch
Enterprise Level Support
Enterprise-class support and services from SanDisk
Risk mitigation through long term support and a reliable long term roadmap
Continual contribution back to the community
Test Configuration – Single InfiniFlash System
Performance improves 2x to 12x depending on the Block size
Performance Improvement: Stock Ceph vs IF OS – 8K Random Blocks
[Charts: IOPS and average latency (ms), Stock Ceph (Giant) vs. IFOS 1.0; top row: queue depth (1/4/16), bottom row: % read IOs (0/25/50/75/100)]
• 2 RBD/Client x Total 4 Clients • 1 InfiniFlash node with 512TB
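In these charts IOPS, queue depth, and average latency are tied together by Little's law: average latency ≈ outstanding IOs / IOPS. A minimal sketch of that relation using the client setup above; the IOPS value below is a placeholder, not a measured result:

```python
def avg_latency_ms(iops, queue_depth, rbds_per_client=2, clients=4):
    """Little's law: average latency = outstanding IOs / throughput.

    Outstanding IOs = queue depth per RBD x RBDs per client x clients
    (2 RBD/client x 4 clients matches the test setup above).
    """
    outstanding = queue_depth * rbds_per_client * clients
    return outstanding / iops * 1000.0  # seconds -> milliseconds

# Placeholder figure: 8 RBDs at queue depth 16 sustaining 100K IOPS
print(avg_latency_ms(100_000, 16))  # ~1.28 ms
```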
Performance Improvement: Stock Ceph vs IF OS – 64K Random Blocks
[Charts: IOPS and average latency (ms), Stock Ceph vs. IFOS 1.0; top row: queue depth (1/4/16), bottom row: % read IOs (0/25/50/75/100)]
• 2 RBD/Client x Total 4 Clients • 1 InfiniFlash node with 512TB
Performance Improvement: Stock Ceph vs IF OS – 256K Random Blocks
[Charts: IOPS and average latency (ms), Stock Ceph vs. IFOS 1.0; top row: queue depth (1/4/16), bottom row: % read IOs (0/25/50/75/100)]
• 2 RBD/Client x Total 4 Clients • 1 InfiniFlash node with 512TB
Test Configuration – 3 InfiniFlash Systems (128TB each)
Performance scales linearly with additional InfiniFlash nodes
Scaling with Performance – 8K Random Blocks
[Charts: IOPS and average latency (ms); top row: queue depth (1/8/64), bottom row: % read IOs (0/25/50/75/100)]
• 2 RBD/Client x 5 Clients • 3 InfiniFlash nodes with 128TB each
Scaling with Performance – 64K Random Blocks
[Charts: IOPS (queue depths 1 to 256) and average latency (ms) (queue depths 1/8/64), at 0/25/50/75/100% read IOs]
• 2 RBD/Client x 5 Clients • 3 InfiniFlash nodes with 128TB each
Scaling with Performance – 256K Random Blocks
[Charts: IOPS (queue depths 1 to 256) and average latency (ms) (queue depths 1/8/64), at 0/25/50/75/100% read IOs]
• 2 RBD/Client x 5 Clients • 3 InfiniFlash nodes with 128TB each
Flexible Ceph Topology with InfiniFlash
[Diagram: client applications accessing LUNs through SCSI targets and RBDs/RGW; read and write IO paths to OSDs running on HSEB A/HSEB B node pairs attached over SAS]
Disaggregated Architecture
Optimized for Performance
Higher Utilization
Reduced Costs
[Diagram: Compute Farm connected to Storage Farm]
Flash + HDD with Data Tiering
Flash Performance with TCO of HDD
InfiniFlash OS performs automatic data placement and data movement between tiers, transparent to applications
User-defined policies for data placement on tiers
Can be used with Erasure coding to further reduce the TCO
Benefits
Flash based performance with HDD like TCO
Lower performance requirements on HDD tier enables use of denser and cheaper SMR drives
Denser and lower power compared to HDD only solution
InfiniFlash for High Activity data and SMR drives for Low activity data
60+ HDD per Server
Compute Farm
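The erasure-coding point above is driven by raw-capacity overhead. A quick comparison sketch of replication vs. erasure coding; the (k=8, m=3) profile is an illustrative choice, not a stated InfiniFlash configuration:

```python
def raw_capacity_pb(usable_pb, scheme):
    """Raw capacity needed for a given usable capacity.

    scheme: ('replica', n) for n-way replication, or ('ec', k, m)
    for erasure coding with k data and m coding chunks.
    """
    if scheme[0] == 'replica':
        return usable_pb * scheme[1]
    k, m = scheme[1], scheme[2]
    return usable_pb * (k + m) / k

# 96PB usable, the deployment size in the TCO model below:
print(raw_capacity_pb(96, ('replica', 3)))  # 288 PB raw
print(raw_capacity_pb(96, ('ec', 8, 3)))    # 132.0 PB raw
```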
Flash Primary + HDD Replicas
Flash Performance with TCO of HDD
Primary replica on InfiniFlash
HDD-based data node for 2nd local replica
HDD-based data node for 3rd DR replica
Higher Affinity of the Primary Replica ensures much of the compute is on InfiniFlash Data
2nd and 3rd replicas on HDDs are primarily for data protection
High throughput of InfiniFlash handles data protection and movement for all replicas without impacting application IO
Eliminates cascade data propagation requirement for HDD replicas
Flash-based accelerated Object performance for Replica 1 allows for denser and cheaper SMR HDDs for Replica 2 and 3
Compute Farm
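Upstream Ceph exposes this layout through primary affinity. A sketch of steering primaries to the flash tier, assuming a Firefly-or-later cluster and hypothetical OSD IDs (flash-backed osd.0, HDD-backed osd.10/osd.11):

```shell
# Allow primary-affinity values other than the default
ceph tell mon.* injectargs '--mon_osd_allow_primary_affinity=true'

# Flash-backed OSD keeps full weight as the read/write primary
ceph osd primary-affinity osd.0 1.0

# HDD-backed replica OSDs are deprioritized as primaries
ceph osd primary-affinity osd.10 0.0
ceph osd primary-affinity osd.11 0.0
```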
TCO Example – Object Storage
Scale-out Flash Benefits at the TCO of HDD
[Chart: 3-year TCO comparison (TCA + 3-year opex, in $10,000s) for 96PB object storage: Traditional ObjStore on HDD; InfiniFlash ObjectStore with 3 full replicas on flash; InfiniFlash with erasure coding, all flash; InfiniFlash with flash primary & HDD copies. Second chart: total racks (0 to 100) per option]
• Weekly failure rate for a 100PB deployment: 15-35 HDDs vs. 1 InfiniFlash card
• HDD cannot handle simultaneous egress/ingress
• Long HDD rebuild times, multiple failures, and data rebalancing result in service disruption
• Flash provides guaranteed & consistent SLA
• Flash capacity utilization >> HDD due to reliability & ops
• Flash low power consumption 450W(avg), 750W(active)
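The failure-rate comparison above can be sanity-checked from device counts and annualized failure rates (AFR). A back-of-the-envelope sketch; the 4TB HDD size and 3-6% HDD AFR are assumptions for illustration, while the 1.5M-hour MTBF comes from the InfiniFlash spec above:

```python
HOURS_PER_YEAR = 8766  # 365.25 days * 24

def weekly_failures(capacity_tb, device_tb, afr):
    """Expected device failures per week for a deployment."""
    devices = capacity_tb / device_tb
    return devices * afr / 52

# 100PB on 4TB HDDs, assuming 3-6% AFR (illustrative):
print(weekly_failures(100_000, 4, 0.03))  # ~14 per week
print(weekly_failures(100_000, 4, 0.06))  # ~29 per week

# 100PB on 8TB InfiniFlash cards; AFR derived from the quoted
# 1.5M-hour MTBF: AFR = hours per year / MTBF, roughly 0.58%
flash_afr = HOURS_PER_YEAR / 1_500_000
print(weekly_failures(100_000, 8, flash_afr))  # ~1.4 per week
```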
Note that operational/maintenance cost and performance benefits are not accounted for in these models.
InfiniFlash™ System
The First All-Flash Storage System Built for High Performance Ceph
© 2015 SanDisk Corporation. All rights reserved. SanDisk is a trademark of SanDisk Corporation, registered in the United States and other countries. InfiniFlash is a trademark of SanDisk Enterprise IP LLC. All other product and company names are used for identification purposes and may be trademarks of their respective holder(s).
http://bigdataflash.sandisk.com/infiniflash
Steven.Xi@SanDisk.com – Sales
Tonny.Ai@SanDisk.com – Sales Engineering
Hao.Zhou@SanDisk.com – Technical Marketing
Venkat.Kolli@SanDisk.com – Production Management