
Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising performance

Transcript
Page 1: Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising performance

Ceph Day San Francisco – March 12, 2015

Deploying Flash Storage For Ceph

Page 2: Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising performance

Leading Supplier of End-to-End Interconnect Solutions

[Diagram: end-to-end portfolio of ICs, adapter cards, switches/gateways, cables/modules, and host/fabric software, with Virtual Protocol Interconnect spanning server/compute, switch/gateway, storage front/back-end, and metro/WAN at 56G InfiniBand & FCoIB and 10/40/56GbE & FCoE.]

Comprehensive End-to-End InfiniBand and Ethernet Portfolio

Page 3: Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising performance

Solid State Storage Technology Evolution – Lower Latency

Advanced Networking and Protocol Offloads Required to Match Storage Media Performance

[Chart: access time in micro-seconds (log scale, 0.1 to 1000) across storage media technologies (hard drives, NAND flash, next-generation NVM), splitting networked-storage latency between the storage media and the network plus storage protocol (HW & SW); 50% and 100% reference marks.]

Page 4: Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising performance

Scale-Out Architecture Requires A Fast Network

Scale-out grows capacity and performance in parallel

Requires fast network for replication, sharing, and metadata (file)

• Throughput requires bandwidth

• IOPS requires low latency

• Efficiency requires CPU offload

Proven in HPC, storage appliances, cloud, and now… Ceph

Interconnect Capabilities Determine Scale Out Performance
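A rough back-of-the-envelope check of those three bullets (a minimal sketch; the block sizes, queue depths, and latencies below are illustrative assumptions, not figures from this deck):

```python
# Rough relationship between IOPS, block size, and network bandwidth.
# Illustrative numbers only; not measurements from this presentation.

def required_gbps(iops: float, block_bytes: int) -> float:
    """Network bandwidth (Gb/s) needed to carry `iops` operations of `block_bytes` each."""
    return iops * block_bytes * 8 / 1e9

# Small-block IOPS targets barely touch a 40GbE link in raw bandwidth terms...
print(required_gbps(100_000, 4 * 1024))     # ~3.3 Gb/s for 100K x 4KB IOPS
# ...while large-block throughput consumes the link quickly.
print(required_gbps(20_000, 256 * 1024))    # ~42 Gb/s for 20K x 256KB IOPS

# For IOPS the limiter is per-operation latency: with queue depth QD,
# achievable IOPS is roughly QD / latency.
def max_iops(queue_depth: int, latency_s: float) -> float:
    return queue_depth / latency_s

print(max_iops(32, 100e-6))  # ~320K IOPS at 100 us round-trip
print(max_iops(32, 1e-3))    # ~32K IOPS at 1 ms round-trip
```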

Page 5: Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising performance

Ceph and Networks

High performance networks enable maximum cluster availability

• Clients, OSDs, Monitors, and Metadata servers communicate over multiple network layers

• Real-time requirements for heartbeat, replication, recovery and re-balancing

Cluster (“backend”) network performance dictates the cluster’s performance and scalability

• “Network load between Ceph OSD Daemons easily dwarfs the network load between Ceph Clients and the Ceph Storage Cluster” (Ceph Documentation)
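One reason for that: for every client write, the primary OSD re-sends the data to each additional replica over the cluster network, so backend write traffic scales with the replica count. A minimal sketch of the arithmetic (the replica count and client load are illustrative assumptions, and recovery/rebalancing traffic comes on top):

```python
# Why the cluster ("backend") network sees more traffic than the public network:
# the primary OSD forwards each client write to every additional replica.
# Illustrative arithmetic only.

def backend_write_traffic(client_write_mbs: float, replicas: int = 3) -> float:
    """Replication traffic on the cluster network for a given client write load (MB/s)."""
    return client_write_mbs * (replicas - 1)

client_load = 1000  # assumption: clients push 1000 MB/s of writes in aggregate
print(backend_write_traffic(client_load))               # 2000 MB/s with 3x replication
print(backend_write_traffic(client_load, replicas=2))   # 1000 MB/s with 2x replication
```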

Page 6: Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising performance

How Customers Deploy Ceph with Mellanox Interconnect

Building Scalable, Performing Storage Solutions

• Cluster network @ 40Gb Ethernet

• Clients @ 10Gb/40Gb Ethernet

High Performance at Low Cost

• Allows more capacity per OSD

Flash Deployment Options

• All HDD (no flash)

• Flash for OSD Journals

• 100% Flash in OSDs

8.5PB System Currently Being Deployed
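For the “Flash for OSD Journals” option above, a common sizing rule from the Ceph documentation of that era is journal size ≈ 2 × expected throughput × filestore max sync interval. A minimal sketch (the disk throughput, sync interval, and OSDs-per-SSD values are assumptions):

```python
# Rough journal sizing for the "flash for OSD journals" option.
# Rule of thumb from the Ceph docs: 2 * expected_throughput * filestore_max_sync_interval.
# The numbers below are assumptions for illustration.

def journal_size_gb(expected_throughput_mbs: float, sync_interval_s: float = 5.0) -> float:
    """Suggested journal size in GB for one OSD."""
    return 2 * expected_throughput_mbs * sync_interval_s / 1000.0

hdd_throughput = 150  # MB/s sustained for one HDD-backed OSD (assumption)
print(journal_size_gb(hdd_throughput))   # ~1.5 GB per OSD journal

# One SSD typically hosts several OSD journals; check it can absorb the combined stream.
osds_per_ssd = 4
print(osds_per_ssd * hdd_throughput)     # ~600 MB/s of journal writes landing on the SSD
```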

Page 7: Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising performance

Ceph Deployment Using 10GbE and 40GbE

Cluster (Private) Network @ 40/56GbE

• Smooth HA, unblocked heartbeats, efficient data balancing

Throughput Clients @ 40/56GbE

• Guarantees line rate for high ingress/egress clients

IOPs Clients @ 10GbE or 40/56GbE

• 100K+ IOPs/Client @ 4K blocks

20x Higher Throughput, 4x Higher IOPs with 40Gb Ethernet Clients! (http://www.mellanox.com/related-docs/whitepapers/WP_Deploying_Ceph_over_High_Performance_Networks.pdf)

Throughput testing results based on the fio benchmark: 8M blocks, 20GB file, 128 parallel jobs, RBD kernel driver with Linux kernel 3.13.3, RHEL 6.3, Ceph 0.72.2

IOPs testing results based on the fio benchmark: 4K blocks, 20GB file, 128 parallel jobs, RBD kernel driver with Linux kernel 3.13.3, RHEL 6.3, Ceph 0.72.2
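The exact fio invocations are not reproduced in the deck; below is a minimal sketch of an equivalent IOPS run against a mapped RBD kernel device. Block size, file size, and job count follow the parameters quoted above; the device path, ioengine, random-read pattern, and time-based runtime are assumptions.

```python
# Sketch of an fio run approximating the IOPS test described above.
import subprocess

cmd = [
    "fio",
    "--name=rbd-4k-randread",
    "--filename=/dev/rbd0",   # kernel RBD device (assumption)
    "--rw=randread",          # assumption: random read, as in the IOPS charts
    "--bs=4k",
    "--size=20g",
    "--numjobs=128",
    "--ioengine=libaio",      # assumption: any async I/O engine works
    "--direct=1",
    "--time_based",
    "--runtime=60",
    "--group_reporting",
]
subprocess.run(cmd, check=True)
```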

[Diagram: Ceph nodes (Monitors, OSDs, MDS) and an admin node on a 40GbE cluster network; client nodes connected over a 10GbE/40GbE public network.]

Page 8: Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising performance

Ceph Throughput using 40Gb and 56Gb Ethernet

[Chart: throughput in MB/s (0 to 6000) for Ceph 64KB random read and 256KB random read, comparing 40Gb TCP vs. 56Gb TCP; one OSD, one client, 8 threads.]

Page 9: Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising performance

Optimizing Ceph for Flash

By SanDisk

Page 10: Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising performance

Ceph Flash Optimization

Highlights Compared to Stock Ceph

• Read performance up to 8x better

• Write performance up to 2x better with tuning

Optimizations

• All-flash storage for OSDs

• Enhanced parallelism and lock optimization

• Optimization for reads from flash

• Improvements to Ceph messenger

Test Configuration

• InfiniFlash Storage with IFOS 1.0 EAP3

• Up to 4 RBDs

• 2 Ceph OSD nodes, connected to InfiniFlash

• 40GbE NICs from Mellanox

SanDisk InfiniFlash

Page 11: Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising performance

8K Random - 2 RBD/Client with File System

[Charts: IOPS (0 to 300K) and latency in ms (0 to 120) for 2 LUNs per client (4 clients total), Stock Ceph vs. IFOS 1.0, at queue depths 1 to 32 and read percentages 0/25/50/75/100.]

Page 12: Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising performance

Performance: 64K Random - 2 RBD/Client with File System

[Charts: IOPS (0 to 160K) and latency in ms (0 to 180) for 2 LUNs per client (4 clients total), Stock Ceph vs. IFOS 1.0, at queue depths 1 to 32 and read percentages 0/25/50/75/100.]

Page 13: Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising performance

XioMessenger

Adding RDMA To Ceph

Page 14: Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising performance

I/O Offload Frees Up CPU for Application Processing

[Chart: user-space vs. system-space CPU usage without RDMA and with RDMA plus offload; roughly 53% CPU efficiency and 47% CPU overhead/idle without RDMA, versus roughly 88% CPU efficiency and 12% CPU overhead/idle with RDMA and offload.]

Page 15: Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising performance

Adding RDMA to Ceph

RDMA Beta to be Included in Hammer

• Mellanox, Red Hat, CohortFS, and Community collaboration

• Full RDMA expected in Infernalis

Refactoring of Ceph messaging layer

• New RDMA messenger layer called XioMessenger

• New class hierarchy allowing multiple transports (the existing “simple” transport is TCP)

• Async design that leverages Accelio

• Reduced locks; Reduced number of threads

XioMessenger built on top of Accelio (RDMA abstraction layer)

• Integrated into all Ceph user-space components: daemons and clients

• Covers both the “public network” and the “cluster network”
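The real messenger hierarchy is C++ inside Ceph; the following Python sketch only illustrates the pluggable-transport shape described above (the class and function names here are illustrative, not Ceph's actual classes):

```python
# Illustration of the pluggable-messenger design described on this slide.
# Illustrative only: the real hierarchy is C++ (SimpleMessenger for TCP,
# XioMessenger layered on Accelio for RDMA).
from abc import ABC, abstractmethod

class Messenger(ABC):
    """Common interface the rest of the system codes against."""
    @abstractmethod
    def connect(self, addr: str) -> None: ...
    @abstractmethod
    def send(self, payload: bytes) -> None: ...

class SimpleMessenger(Messenger):
    """TCP transport: one socket per connection, thread-per-connection style."""
    def connect(self, addr: str) -> None:
        print(f"TCP connect to {addr}")
    def send(self, payload: bytes) -> None:
        print(f"TCP send of {len(payload)} bytes")

class XioMessenger(Messenger):
    """RDMA transport layered on Accelio: asynchronous, fewer locks and threads."""
    def connect(self, addr: str) -> None:
        print(f"RDMA (Accelio) session to {addr}")
    def send(self, payload: bytes) -> None:
        print(f"RDMA post of {len(payload)} bytes, completion handled asynchronously")

def make_messenger(transport: str) -> Messenger:
    # Transport chosen by configuration, mirroring the idea of a messenger-type switch.
    return XioMessenger() if transport == "xio" else SimpleMessenger()
```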

Page 16: Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising performance

Open source!

• https://github.com/accelio/accelio/ and www.accelio.org

Faster RDMA integration into applications

Asynchronous

Maximize message and CPU parallelism

• Enable >10GB/s from single node

• Enable <10usec latency under load

In Giant and Hammer

• http://wiki.ceph.com/Planning/Blueprints/Giant/Accelio_RDMA_Messenger

Accelio, High-Performance Reliable Messaging and RPC Library

Page 17: Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising performance

Ceph Large Block Throughput with 40Gb and 56Gb Ethernet

[Chart: throughput in MB/s (0 to 6000) for 64KB and 256KB random reads, comparing 40Gb TCP, 56Gb TCP, and 56Gb RDMA; TCP runs used 8 cores in the OSD and 8 cores in the client, RDMA runs used 6 cores in the OSD and 5 cores in the client.]

Page 18: Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising performance

Ceph 4KB Read K-IOPS: 40Gb TCP vs. 40Gb RDMA

[Chart: thousands of 4KB read IOPS (0 to 450) for 40Gb TCP vs. 40Gb RDMA with 2, 4, and 8 OSDs and 4 clients; each bar is annotated with the OSD and client cores used (the same data appears in the performance summary table on page 28).]

Page 19: Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising performance

Ceph-Powered Solutions

Deployment Examples

Page 20: Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising performance

Ceph For Large Scale Storage – Fujitsu Eternus CD10000

Hyperscale Storage

• 4 to 224 nodes

• Up to 56 PB raw capacity

Runs Ceph with Enhancements

• 3 different storage nodes

• Object, block, and file storage

Mellanox InfiniBand Cluster Network

• 40Gb InfiniBand cluster network

• 10Gb Ethernet front-end network

Page 21: Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising performance

Media & Entertainment Storage – StorageFoundry Nautilus

Turnkey Object Storage

• Built on Ceph

• Pre-configured for rapid deployment

• Mellanox 10/40GbE networking

High-Capacity Configuration

• 6-8TB Helium-filled drives

• Up to 2PB in 18U

High-Performance Configuration

• Single client read 2.2 GB/s

• SSD caching + Hard Drives

• Supports Ethernet, IB, FC, FCoE front-end ports

More information: www.storagefoundry.net

Page 22: Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising performance

SanDisk InfiniFlash

Flash Storage System

• Announced March 3, 2015

• InfiniFlash OS uses Ceph

• 512 TB (raw) in one 3U enclosure

• Tested with 40GbE networking

High Throughput

• Up to 7GB/s

• Up to 1M IOPS with two nodes

More information:

• http://bigdataflash.sandisk.com/infiniflash

Page 23: Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising performance

More Ceph Solutions

Cloud – OnyxCCS ElectraStack

• Turnkey IaaS

• Multi-tenant computing system

• 5x faster Node/Data restoration

• https://www.onyxccs.com/products/8-series

Flextronics CloudLabs

• OpenStack on CloudX design

• 2 SSDs + 20 HDDs per node

• Mix of 1GbE/40GbE networking

• http://www.flextronics.com/

ISS Storage Supercore

• Healthcare solution

• 82,000 IOPS on 512B reads

• 74,000 IOPS on 4KB reads

• 1.1GB/s on 256KB reads

• http://www.iss-integration.com/supercore.html

Scalable Informatics Unison

• High availability cluster

• 60 HDD in 4U

• Tier 1 performance at archive cost

• https://scalableinformatics.com/unison.html

Page 24: Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising performance

Summary

Ceph scalability and performance benefit from high-performance networks

Ceph is being optimized for flash storage

End-to-end 40/56 Gb/s transport accelerates Ceph today

• 100Gb/s testing has begun!

• Available in various Ceph solutions and appliances

RDMA is the next step to optimize flash performance; beta in Hammer

Page 25: Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising performance

Thank You

Page 26: Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising performance

Ceph Performance Summary – 40Gb/s vs. 56Gb/s

Network speed (MTU)   | ib_send_bw, RDMA verbs level (MB/s) | 64K random read (MB/s)    | 256K random read (MB/s)
40 Gb/s (MTU 1500)    | 4350                                | 4300                      | 4350
56 Gb/s (MTU 1500)    | 5710                                | 5050 (RDMA) / 4300 (TCP)  | 5450
56 Gb/s (MTU 2500)    | 6051                                | 5355 (RDMA) / 4350 (TCP)  | 5450
56 Gb/s (MTU 4500)    | 6070                                | 5500 (RDMA) / 4460 (TCP)  | 5500
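As a sanity check on the table above (a minimal sketch that assumes decimal megabytes and ignores Ethernet/IB protocol overhead), the measured throughput can be compared with the nominal line rate of each link:

```python
# Compare measured throughput from the table above against nominal line rate.
# Assumes decimal MB (1 MB = 10^6 bytes) and ignores protocol overhead.

def line_rate_mbs(gbits: float) -> float:
    return gbits * 1000 / 8          # Gb/s -> MB/s

def efficiency(measured_mbs: float, gbits: float) -> float:
    return measured_mbs / line_rate_mbs(gbits)

print(line_rate_mbs(40))                 # 5000 MB/s nominal for 40 Gb/s
print(f"{efficiency(4300, 40):.0%}")     # 64K random read over 40 Gb/s: ~86%
print(f"{efficiency(5500, 56):.0%}")     # 64K random read, RDMA, MTU 4500: ~79%
```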

Page 27: Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising performance

Ceph Performance Summary with 1 OSD

Configuration (single OSD)     | KIOPs read         | KIOPs write        | BW read (MB/s)      | BW write (MB/s)
1 client, 1 stream, Simple     | 20 (4 OSD cores)   | 8.5 (7 OSD cores)  | 1140 (3 OSD cores)  | 450 (3 OSD cores)
1 client, 1 stream, XIO        | 21 (5 OSD cores)   | 9.1 (6 OSD cores)  | 3140 (4 OSD cores)  | 520 (3 OSD cores)
1 client, 8 streams, Simple    | 105 (20 OSD cores) | NA                 | 4300 (8 OSD cores)  | NA
1 client, 8 streams, XIO       | 125 (15 OSD cores) | NA                 | 4300 (6 OSD cores)  | NA
2 clients, 16 streams, Simple  | 125 (25 OSD cores) | NA                 | 4300 (11 OSD cores) | NA
2 clients, 16 streams, XIO     | 155 (19 OSD cores) | NA                 | 4300 (8 OSD cores)  | NA

Values in parentheses: cores used at the OSD. BW tested with 256K IOs; IOPs tested with 4K IOs. Single OSD.

Page 28: Ceph Day SF 2015 - Deploying flash storage for Ceph without compromising performance

Ceph Performance Summary with 2-8 OSDs

Configuration (4 clients)  | KIOPs read     | KIOPs write   | BW read (MB/s) | BW write (MB/s)
2 OSDs, Simple             | 230 (38 / 30)  | 25 (10 / 2.5) | 4300 (7 / 7)   | 1475 (6 / 1.5)
2 OSDs, XIO                | 332 (34 / 4)   | 26 (10 / 2)   | 4300 (6 / 5)   | 1464 (6 / 1.5)
4 OSDs, Simple             | 265 (38 / 24)  | 41 (20 / 4)   | 4300 (9 / 8)   | 2490 (11 / 2)
4 OSDs, XIO                | 423 (38 / 24)  | 45 (18 / 3)   | 4300 (6 / 5)   | 2495 (11 / 2)
8 OSDs, Simple             | 247 (36 / 32)  | 65 (32 / 6)   | 4300 (8 / 8)   | 3730 (22 / 3)
8 OSDs, XIO                | 365 (34 / 27)  | 66 (32 / 4)   | 4300 (6 / 5)   | 3700 (20 / 3)

Values in parentheses: cores used at the OSDs / cores used at the clients. BW tested with 256K IOs; IOPs tested with 4K IOs. 2-8 OSDs.
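Read from the table above, the XIO (RDMA) messenger improves small-block read IOPS over the Simple (TCP) messenger at every scale; a quick sketch of those ratios (values copied from the table):

```python
# KIOPs read with 4 clients, taken from the table above: (Simple/TCP, XIO/RDMA).
kiops_read = {
    "2 OSDs": (230, 332),
    "4 OSDs": (265, 423),
    "8 OSDs": (247, 365),
}

for config, (tcp, rdma) in kiops_read.items():
    print(f"{config}: {rdma / tcp:.2f}x more 4K read IOPS with XIO/RDMA")
# 2 OSDs: 1.44x, 4 OSDs: 1.60x, 8 OSDs: 1.48x
```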

