What’s next for Ceph? On the future of scalable storage
Martin Gerhard Loschwitz
© 2014 hastexo Professional Services GmbH. All rights reserved.
Who?
Quick reminder:
Object Storage
[Diagram: users store binary objects onto a set of hard disks (HDD), each carrying a local file system (FS)]
Cephalopod (Wikipedia, user Nhobgood)
RADOS
Redundant Autonomic Distributed Object Store
Two components
OSDs
[Diagram: an Object Storage Daemon (OSD) runs on top of each disk's file system; users store their objects through the OSDs]
Unified Storage
[Diagram: the OSD cluster scales out seamlessly; additional OSDs simply join, and users keep storing objects into one unified store]
MONs
[Diagram: three monitoring servers (MONs) alongside the users and OSDs, maintaining the cluster state]
Data Placement
Parallelization
[Diagram: objects tagged with placement-group IDs (1, 2) are written to several OSDs in parallel]
CRUSH
Controlled Replication Under Scalable Hashing
By configuring CRUSH, you make the cluster
rack-aware.
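The idea behind CRUSH-style placement can be sketched in a few lines: every client hashes an object name to a placement group and maps that group deterministically onto distinct OSDs, so no central lookup server is needed. This is a toy illustration under assumed names (`PG_COUNT`, `OSDS`, round-robin mapping), not the real CRUSH algorithm, which walks a configurable hierarchy of racks and hosts.

```python
import hashlib

# Toy placement sketch (hypothetical; NOT the real CRUSH algorithm).
PG_COUNT = 8
OSDS = ["osd.0", "osd.1", "osd.2", "osd.3", "osd.4", "osd.5"]

def object_to_pg(name: str) -> int:
    """Hash an object name to a placement group ID."""
    digest = hashlib.md5(name.encode()).digest()
    return int.from_bytes(digest[:4], "big") % PG_COUNT

def pg_to_osds(pg: int, replicas: int = 2) -> list:
    """Map a placement group onto `replicas` distinct OSDs (round-robin here;
    CRUSH uses the cluster map and placement rules instead)."""
    return [OSDS[(pg + i) % len(OSDS)] for i in range(replicas)]

pg = object_to_pg("my-object")
print(pg, pg_to_osds(pg))
```

Because the computation is deterministic, every client arrives at the same OSDs for the same object without asking anyone.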
[Diagram: the complete RADOS cluster: users storing objects onto the OSDs, with the MONs maintaining the cluster map]
RADOS Block Device: block-level interface driver for RADOS
RADOS Gateway: RESTful API to access RADOS
CephFS: POSIX file system access to RADOS
“Booooring!”
Cool Stuff ahead:
Erasure Coding
Tiering
Multi-DC Setups
Automation
CephFS
Enterprise Support
Erasure Coding
Until now, Ceph has effectively worked like
a standard RAID 1:
every binary object exists twice.
Works great. But it also reduces the net
capacity by 50%.
At least.
That is where Erasure Coding comes in.
It makes Ceph work
like a RAID 5.
Mostly developed by Loic Dachary
Idea: Split binary objects into even smaller chunks
This reduces the amount of space required for replicas enormously!
Different coding ratios are available.
But: the lower the redundancy level, the longer it takes to recalculate
missing chunks.
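The chunk-splitting idea can be shown with the simplest possible code: two data chunks plus one XOR parity chunk, RAID-5 style. This is a toy sketch with made-up function names, not Ceph's actual erasure-code plugin (which supports configurable k/m parameters), but the recovery math is the same in spirit.

```python
# Toy erasure-coding sketch: k = 2 data chunks + 1 XOR parity chunk
# (RAID-5-like; hypothetical, not Ceph's jerasure plugin).
def encode(obj: bytes, k: int = 2):
    """Split an object into k data chunks plus one XOR parity chunk."""
    if len(obj) % k:
        obj += b"\x00" * (k - len(obj) % k)   # pad to a chunk boundary
    size = len(obj) // k
    chunks = [obj[i * size:(i + 1) * size] for i in range(k)]
    parity = bytes(a ^ b for a, b in zip(*chunks))  # XOR works for k = 2
    return chunks, parity

def recover(surviving: bytes, parity: bytes) -> bytes:
    """Rebuild the single lost data chunk from the survivor and the parity."""
    return bytes(a ^ b for a, b in zip(surviving, parity))

chunks, parity = encode(b"ceph")
# Pretend chunks[0] is lost: XOR the survivor with the parity to get it back.
assert recover(chunks[1], parity) == chunks[0]
```

With k data chunks and m coding chunks, the net capacity is k/(k+m) of the raw space (2/3 here) instead of the 1/2 that two-way replication yields.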
Available in Ceph 0.80.
Tiering
Not all data stored in Ceph is equal.
Frequently accessed, fresh data is usually expected
to be served quickly.
Also, customers may be willing to accept
slower performance in exchange for lower prices.
Until now, that wasn’t easy to implement in RADOS due to a
number of limitations.
With Ceph 0.80, pools allow data to be
stored on different hardware, based on
its performance.
Wait. Pools?
Pools are a logical unit in RADOS. A pool is a bunch of
Placement Groups.
Using tiering, pools can be tied to specific hardware components.
All replication happens intra-pool
Data may be moved from one pool to
another pool in RADOS
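The fast-pool-in-front-of-slow-pool idea can be sketched as a toy cache tier: a small "fast" pool (think SSDs) fronts a large "slow" pool (spinning disks), objects get promoted on read and evicted when the fast pool fills up. The class and its behavior are hypothetical illustrations, not the RADOS implementation.

```python
# Toy cache-tiering sketch (hypothetical; not the RADOS cache-tier code).
class TieredStore:
    def __init__(self, fast_capacity: int = 2):
        self.fast = {}                 # small pool on fast hardware (SSD)
        self.slow = {}                 # large pool on slow hardware (HDD)
        self.fast_capacity = fast_capacity

    def write(self, name: str, data: bytes) -> None:
        self.slow[name] = data         # cold writes land in the slow pool

    def read(self, name: str) -> bytes:
        if name in self.fast:          # cache hit: serve from fast pool
            return self.fast[name]
        data = self.slow[name]         # cache miss: promote to fast pool
        if len(self.fast) >= self.fast_capacity:
            self.fast.pop(next(iter(self.fast)))  # evict oldest entry
        self.fast[name] = data
        return data
```

The important property is that promotion and eviction move data between pools transparently; clients just read and write objects.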
Available in Ceph 0.80.
Multi-DC Setups
Ceph was designed for high-performance,
synchronous replication
Off-Site replication is typically asynchronous.
Bummer!
But starting with Ceph 0.67, the RADOS Gateway
supports “Federation”
[Diagram: DC 1 and DC 2, each running its own MONs and a RADOS Gateway; sync agents replicate objects between the two gateways]
In fact, the federation feature adds asynchronous
replication on top of the RADOS storage cluster
Still needs better integration with the
other Ceph components
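The sync-agent idea boils down to periodically copying whatever exists in the primary zone but not yet in the secondary one. This is a hypothetical sketch of that loop, not the actual radosgw agent:

```python
# Toy sketch of asynchronous off-site replication (hypothetical; not the
# real RADOS Gateway sync agent). Pools are modeled as plain dicts.
def sync(primary: dict, secondary: dict) -> list:
    """Copy missing or changed objects; return the names that were synced."""
    synced = []
    for name, data in primary.items():
        if secondary.get(name) != data:   # missing or stale in DC 2
            secondary[name] = data
            synced.append(name)
    return synced
```

Run in a loop, this gives eventual consistency between the sites, which is exactly the asynchronous behavior the synchronous RADOS core cannot provide on its own.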
Automation
Ceph clusters will almost always be
deployed using tools for automation
Thus, Ceph needs to play well with Chef, Puppet & Co.
Chef: Yay!
Chef cookbooks are maintained and
provided by Inktank.
Puppet: Ouch
Inktank does not provide Puppet modules
for Ceph deployment
Right now, at least six competing modules exist on GitHub, some
forks of each other.
None of these use ceph-deploy, though.
But there is hope: puppet-cephdeploy does use ceph-deploy
Needs some additional work, but generally,
looks very promising and already works
Plays together nicely even with ENCs such
as the Puppet Dashboard or the Foreman project
CephFS
Considered vaporware by some people already.
But that’s not fair!
CephFS is already available and works.
Well, mostly.
For CephFS, the really critical component is the Metadata Server (MDS)
Running CephFS today with exactly one active
MDS is fine and will most likely not cause trouble.
But Sage wants the MDS to scale-out properly so
that running several active MDSes at a time works
That’s called Subtree Partitioning. Every active MDS will be responsible for the meta-data of a certain subtree of the POSIX-compatible FS
Right now, Subtree partitioning is what’s
causing trouble.
CephFS is not Inktank’s main priority; likely to
be released as “stable” in Q4 2014
Enterprise Support
Major companies willing to run Ceph need some
type of support contract.
Inktank has started to offer that support through a
product called “Inktank Ceph Enterprise” (ICE)
Gives users Long-Term support for certain Ceph releases (such as 0.80)
and hot-fixes for problems
Also brings Calamari, Inktank’s Ceph GUI
Distribution Support
Inktank already does a lot to make installing Ceph
on different distributions as smooth as possible.
Ye olde OSes:
Ubuntu 12.04
Debian Wheezy
RHEL 6
SLES 11
Ubuntu 14.04: May 2014
RHEL 7: December 2014
Release Schedule
Firefly (0.80): May 2014, along
with ICE 1.2
Giant: Summer 2014
(Non-LTS version)
The “H”-release: December 2014,
along with ICE 2.0
Ceph Days
Ceph Days are information events run by Inktank all
over the world.
Two have happened in Europe so far:
London (October 2013)
Frankfurt (February 2014)
Ceph Days let you gather with others interested in
Ceph and exchange experiences.
And you can meet
Sage Weil
No shit. You can meet
Sage Weil!
Special thanks to
Sage Weil (Twitter: @liewegas) & crew for Ceph
Inktank (Twitter: @inktank) for the Ceph logo