What is a Ceph (and why do I care). OpenStack storage - Colorado OpenStack Meetup October 14 2014

Date post: 27-Nov-2014
Upload: ian-colle
Description:
October 2014 overview of the Ceph distributed storage system architecture, integration with OpenStack, and plans for future development.
14 OCT 2014 Colorado OpenStack Meetup

HISTORICAL TIMELINE
10 years in the making
• 2004: Project starts at UCSC
• 2006: Open source
• 2010: Mainline Linux kernel
• 2011: OpenStack integration
• MAY 2012: Launch of Inktank
• 2012: CloudStack integration
• SEPT 2012: Production-ready Ceph
• 2013: Xen integration
• OCT 2013: Inktank Ceph Enterprise launch
• FEB 2014: RHEL-OSP certification
• APR 2014: Inktank acquired by Red Hat

Copyright © 2014 by Inktank

OPENSTACK USER SURVEY, 05/2014
[Chart: survey results broken out by DEV / QA, PROOF OF CONCEPT, and PRODUCTION deployments]

A STORAGE REVOLUTION
• PROPRIETARY HARDWARE, PROPRIETARY SOFTWARE, SUPPORT & MAINTENANCE
• STANDARD HARDWARE, OPEN SOURCE SOFTWARE, ENTERPRISE PRODUCTS & SERVICES
[Diagram: both stacks sit on rows of COMPUTER + DISK nodes]

ARCHITECTURE

ARCHITECTURAL COMPONENTS
• RGW: A web services gateway for object storage, compatible with S3 and Swift
• RBD: A reliable, fully-distributed block device with cloud platform integration
• CEPHFS: A distributed file system with POSIX semantics and scale-out metadata management
• LIBRADOS: A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)
• RADOS: A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors
[Diagram labels above the stack: APP, HOST/VM, CLIENT]

OBJECT STORAGE DAEMONS
[Diagram: four stacks, each an OSD on top of a filesystem (FS) on top of a DISK, alongside three monitors (M)]
Filesystem options: btrfs, xfs, ext4, zfs?

RADOS CLUSTER
[Diagram: an APPLICATION talking to a RADOS CLUSTER made up of storage nodes and monitors (M)]

RADOS COMPONENTS
OSDs:
• 10s to 10000s in a cluster
• One per disk (or one per SSD, RAID group…)
• Serve stored objects to clients
• Intelligently peer for replication & recovery
Monitors:
• Maintain cluster membership and state
• Provide consensus for distributed decision-making
• Small, odd number
• These do not serve stored objects to clients
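The "small, odd number" guidance for monitors can be illustrated with majority arithmetic. This is a minimal sketch that assumes a plain strict-majority quorum rule; the function names are illustrative and not part of Ceph.

```python
def quorum_size(monitors: int) -> int:
    """Smallest strict majority of the monitor set."""
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many monitors can fail while a majority can still form."""
    return monitors - quorum_size(monitors)


# Compare an odd count, the next even count, and the next odd count.
for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

Under this rule, going from 3 to 4 monitors raises the quorum size without tolerating any additional failure, which is one way to read the preference for odd counts.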

WHERE DO OBJECTS LIVE?
[Diagram: an APPLICATION holding an OBJECT faces a cluster of storage nodes and monitors (M), with no indication of which node should receive the object]

A METADATA SERVER?
[Diagram: the APPLICATION reaches its data in two steps, labeled 1 and 2]

CALCULATED PLACEMENT
[Diagram: the APPLICATION places object "F" by name range; the storage nodes are labeled A-G, H-N, O-T, U-Z]

EVEN BETTER: CRUSH!
[Diagram: OBJECTS are grouped into PLACEMENT GROUPS (PGs), which are spread across the CLUSTER]

CRUSH IS A QUICK CALCULATION
[Diagram: an OBJECT mapped onto nodes of the RADOS CLUSTER]

CRUSH: DYNAMIC DATA PLACEMENT
CRUSH:
• Pseudo-random placement algorithm
  • Fast calculation, no lookup
  • Repeatable, deterministic
• Statistically uniform distribution
• Stable mapping
  • Limited data migration on change
• Rule-based configuration
  • Infrastructure topology aware
  • Adjustable replication
  • Weighting
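The "stable mapping" and "limited data migration on change" properties can be demonstrated with a much simpler scheme than CRUSH. The sketch below uses rendezvous (highest-random-weight) hashing as a stand-in; it is not the CRUSH algorithm, and the OSD names, object names, and use of SHA-256 are assumptions made for illustration.

```python
import hashlib


def score(key: str, node: str) -> int:
    """Deterministic pseudo-random score for a (key, node) pair."""
    digest = hashlib.sha256(f"{key}:{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(key: str, nodes: list[str]) -> str:
    """Rendezvous hashing: choose the node with the highest score for this key."""
    return max(nodes, key=lambda node: score(key, node))


keys = [f"object-{i}" for i in range(5000)]   # hypothetical object names
before = [f"osd.{i}" for i in range(10)]      # hypothetical 10-node cluster
after = before + ["osd.10"]                   # one node added

old = {k: place(k, before) for k in keys}
new = {k: place(k, after) for k in keys}
moved = [k for k in keys if old[k] != new[k]]

print(f"{len(moved) / len(keys):.1%} of objects moved after adding one node")
```

Every object that moves lands on the new node, and the moved fraction is roughly 1/11 here, in contrast to a plain `hash % node_count` scheme where most objects would change location.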

CRUSH
[Diagram: an OBJECT is mapped to a placement group, and the placement group is mapped onto the cluster]
• hash(object name) % num pg
• CRUSH(pg, cluster state, rule set)
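The two expressions on this slide describe a two-step calculation: object name to placement group, then placement group to storage daemons. The following is a rough sketch of that shape only. SHA-256 and the top-N ranking in the second step are assumptions made for illustration; Ceph's actual hash and the CRUSH algorithm (with its topology, weights, and rule sets) are different.

```python
import hashlib


def stable_hash(text: str) -> int:
    """Hash that is identical across processes (unlike Python's built-in hash)."""
    digest = hashlib.sha256(text.encode()).digest()
    return int.from_bytes(digest[:8], "big")


def object_to_pg(object_name: str, num_pg: int) -> int:
    """Step 1 from the slide: hash(object name) % num pg."""
    return stable_hash(object_name) % num_pg


def pg_to_osds(pg: int, osds: list[str], replicas: int) -> list[str]:
    """Step 2 stand-in: rank OSDs by a per-PG score and keep the top ones.

    This is NOT CRUSH; it ignores cluster topology, weights, and rules.
    """
    ranked = sorted(osds, key=lambda osd: stable_hash(f"{pg}:{osd}"), reverse=True)
    return ranked[:replicas]


osds = [f"osd.{i}" for i in range(6)]          # hypothetical cluster
pg = object_to_pg("my-object", num_pg=128)     # hypothetical object and PG count
targets = pg_to_osds(pg, osds, replicas=3)
print(pg, targets)
```

Because both steps are pure functions of the name and the cluster description, any client can repeat the calculation and reach the same answer without consulting a lookup table.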

ACCESSING A RADOS CLUSTER
[Diagram: an APPLICATION linked with LIBRADOS stores an OBJECT in the RADOS CLUSTER over a socket]

LIBRADOS: RADOS ACCESS FOR APPS
LIBRADOS:
• Direct access to RADOS for applications
• C, C++, Python, PHP, Java, Erlang
• Direct access to storage nodes
• No HTTP overhead

THE RADOS GATEWAY
[Diagram: APPLICATIONs speak REST to RADOSGW instances; each RADOSGW is built on LIBRADOS and reaches the RADOS CLUSTER over a socket]

RADOSGW MAKES RADOS WEBBY
RADOSGW:
• REST-based object storage proxy
• Uses RADOS to store objects
• API supports buckets, accounts
• Usage accounting for billing
• Compatible with S3 and Swift applications

STORING VIRTUAL DISKS
[Diagram: a VM runs on a HYPERVISOR that uses LIBRBD to reach the RADOS CLUSTER]

SEPARATE COMPUTE FROM STORAGE
[Diagram: a VM and two HYPERVISORs, each using LIBRBD against the same RADOS CLUSTER]

KERNEL MODULE FOR MAX FLEXIBLE!
[Diagram: a LINUX HOST uses KRBD to reach the RADOS CLUSTER]

RBD STORES VIRTUAL DISKS
RADOS BLOCK DEVICE:
• Storage of disk images in RADOS
• Decouples VMs from host
• Images are striped across the cluster (pool)
• Snapshots
• Copy-on-write clones
• Support in:
  • Mainline Linux Kernel (2.6.39+)
  • Qemu/KVM, native Xen coming soon
  • OpenStack, CloudStack, Nebula, Proxmox
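"Images are striped across the cluster" means a disk image is cut into fixed-size objects that are then placed independently. A minimal sketch of the offset arithmetic follows, assuming a 4 MiB object size and a simple one-object-after-another layout; it ignores any finer-grained striping parameters, and the function names are illustrative.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # assumed object size in bytes (4 MiB)


def locate(offset: int, object_size: int = OBJECT_SIZE) -> tuple[int, int]:
    """Map a byte offset in the image to (object index, offset inside that object)."""
    return divmod(offset, object_size)


def objects_for_range(offset: int, length: int,
                      object_size: int = OBJECT_SIZE) -> list[int]:
    """Indexes of the objects touched by an I/O of `length` bytes at `offset`."""
    if length <= 0:
        return []
    first = offset // object_size
    last = (offset + length - 1) // object_size
    return list(range(first, last + 1))


MIB = 1024 * 1024
print(locate(10 * MIB))                     # → (2, 2097152)
print(objects_for_range(3 * MIB, 2 * MIB))  # → [0, 1]
```

Since each object index can map to a different placement group, a large sequential read or write ends up spread over many storage daemons.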

RBD SNAPSHOTS
• Export snapshots to geographically dispersed data centers
  ▪ Institute disaster recovery
• Export incremental snapshots
  ▪ Minimize network bandwidth by only sending changes
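The bandwidth saving from incremental snapshots comes from shipping only the blocks that differ between two points in time. The sketch below shows that idea on in-memory byte strings; it is not the RBD export format, and the tiny block size and function names are assumptions chosen for readability.

```python
BLOCK = 4  # tiny block size so the example is easy to follow


def diff(old: bytes, new: bytes, block: int = BLOCK) -> dict[int, bytes]:
    """Return {block index: new contents} for every block that changed."""
    changes = {}
    for start in range(0, max(len(old), len(new)), block):
        chunk = new[start:start + block]
        if chunk != old[start:start + block]:
            changes[start // block] = chunk
    return changes


def apply_diff(base: bytes, changes: dict[int, bytes], new_len: int,
               block: int = BLOCK) -> bytes:
    """Rebuild the newer image at the remote site from the base plus changes."""
    image = bytearray(base.ljust(new_len, b"\0")[:new_len])
    for index, chunk in changes.items():
        image[index * block:index * block + len(chunk)] = chunk
    return bytes(image)


snap1 = b"AAAABBBBCCCCDDDD"
snap2 = b"AAAAXXXXCCCCDDDD"
delta = diff(snap1, snap2)
print(delta)  # → {1: b'XXXX'}
```

Only one of the four blocks crosses the network here, and `apply_diff(snap1, delta, len(snap2))` reproduces the second snapshot at the receiving side.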

SEPARATE METADATA SERVER
[Diagram: a LINUX HOST with the KERNEL MODULE sends data and metadata along separate paths into the RADOS CLUSTER]

SCALABLE METADATA SERVERS
METADATA SERVER
• Manages metadata for a POSIX-compliant shared filesystem
  • Directory hierarchy
  • File metadata (owner, timestamps, mode, etc.)
• Stores metadata in RADOS
• Does not serve file data to clients
• Only required for shared filesystem

CALAMARI

CALAMARI ARCHITECTURE
[Diagram: an ADMIN NODE runs CALAMARI with a MASTER; MINIONs run on the nodes of the CEPH STORAGE CLUSTER, including the monitor (M) nodes]

USE CASES

WEB APPLICATION STORAGE
[Diagram: the APP SERVERs of a WEB APPLICATION speak S3/Swift to CEPH OBJECT GATEWAYs (RGW), which store data in the CEPH STORAGE CLUSTER (RADOS)]

MULTI-SITE OBJECT STORAGE
[Diagram: two sites, US-EAST and EU-WEST, each with a WEB APPLICATION APP SERVER, a CEPH OBJECT GATEWAY (RGW), and a CEPH STORAGE CLUSTER]

ARCHIVE / COLD STORAGE
[Diagram: an APPLICATION uses a CACHE POOL (REPLICATED) in front of a BACKING POOL (ERASURE CODED) inside one CEPH STORAGE CLUSTER]

ERASURE CODING
REPLICATED POOL (OBJECT stored as COPY, COPY, COPY):
• Full copies of stored objects
• Very high durability
• Quicker recovery
ERASURE CODED POOL (OBJECT stored as chunks 1, 2, 3, 4, X, Y):
• One copy plus parity
• Cost-effective durability
• Expensive recovery
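The "cost-effective durability" point is a matter of raw-capacity arithmetic. As a small worked example, the slide's picture shows three full copies on one side and four data chunks plus two coding chunks (X, Y) on the other; the 100 TB figure and the function names below are assumptions used only for illustration.

```python
def replicated_raw(usable_tb: float, copies: int) -> float:
    """Raw capacity needed when every object is stored `copies` times."""
    return usable_tb * copies


def erasure_raw(usable_tb: float, k: int, m: int) -> float:
    """Raw capacity needed with k data chunks and m coding chunks per object."""
    return usable_tb * (k + m) / k


print(replicated_raw(100, copies=3))  # → 300
print(erasure_raw(100, k=4, m=2))     # → 150.0
```

Both layouts survive the loss of any two pieces of an object, but the erasure-coded layout needs half the raw capacity in this example; the trade-off named on the slide is the more expensive recovery.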

ERASURE CODING: HOW DOES IT WORK?
[Diagram: in an ERASURE CODED POOL, an OBJECT is split into chunks 1, 2, 3, 4 plus coding chunks X and Y, each stored on a separate OSD of the CEPH STORAGE CLUSTER]
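The recovery idea behind erasure coding can be shown with the simplest possible code: a single XOR parity chunk. This is a deliberate simplification and an assumption of this sketch. The slide shows two coding chunks (X and Y), which requires a more general code than XOR, and the function names here are illustrative.

```python
from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal chunks (zero-padded) and append one XOR parity chunk."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return chunks + [reduce(xor_bytes, chunks)]


def recover(chunks: list[Optional[bytes]]) -> list[bytes]:
    """Rebuild at most one missing chunk (marked None) by XOR-ing the survivors."""
    missing = [i for i, chunk in enumerate(chunks) if chunk is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost chunk")
    result = list(chunks)
    if missing:
        survivors = [chunk for chunk in chunks if chunk is not None]
        result[missing[0]] = reduce(xor_bytes, survivors)
    return result


stored = encode(b"hello ceph!!", k=4)   # four data chunks plus one parity chunk
damaged = list(stored)
damaged[2] = None                       # simulate losing the OSD holding chunk 3
rebuilt = recover(damaged)
print(b"".join(rebuilt[:4]))
```

Rebuilding the lost chunk requires reading every surviving chunk, which hints at why recovery is more expensive than copying a replica.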

CACHE TIERING
[Diagram: CACHE: WRITEBACK MODE. The CEPH CLIENT does Read/Write against the cache, and the cache does Read/Write against the BACKING POOL (REPLICATED), all inside the CEPH STORAGE CLUSTER]

CACHE TIERING
[Diagram: CACHE: READ ONLY MODE. Arrows labeled Write, Write, Read, Read connect the CEPH CLIENT, the cache, and the BACKING POOL (REPLICATED) inside the CEPH STORAGE CLUSTER]
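The difference between the two cache modes is where writes land. The toy model below is a sketch under stated assumptions, not Ceph's implementation: pools are plain dictionaries, flushing is triggered by hand, and the read-only mode drops a cached copy on write to keep the example simple.

```python
class TieredStore:
    """Toy model of a cache pool in front of a backing pool."""

    def __init__(self, mode: str):
        if mode not in ("writeback", "readonly"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.cache: dict[str, bytes] = {}
        self.backing: dict[str, bytes] = {}
        self.dirty: set[str] = set()

    def write(self, name: str, data: bytes) -> None:
        if self.mode == "writeback":
            self.cache[name] = data      # the cache tier absorbs the write
            self.dirty.add(name)         # and owes a flush to the backing pool
        else:
            self.backing[name] = data    # writes bypass a read-only cache
            self.cache.pop(name, None)   # simplification: drop any stale copy

    def read(self, name: str) -> bytes:
        if name not in self.cache:
            self.cache[name] = self.backing[name]  # promote on a cache miss
        return self.cache[name]

    def flush(self) -> None:
        """Write dirty objects from the cache down to the backing pool."""
        for name in self.dirty:
            self.backing[name] = self.cache[name]
        self.dirty.clear()


wb = TieredStore("writeback")
wb.write("a", b"1")
print("a" in wb.backing)   # → False (not flushed yet)
wb.flush()

ro = TieredStore("readonly")
ro.write("a", b"1")
print("a" in ro.cache)     # → False (the write went to the backing pool)
```

In this model the writeback tier hides backing-pool latency for both reads and writes, while the read-only tier only accelerates reads.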

WEBSCALE APPLICATIONS
[Diagram: the APP SERVERs of a WEB APPLICATION speak the Native Protocol directly to the CEPH STORAGE CLUSTER (RADOS)]

ARCHIVE / COLD STORAGE
[Diagram: an APPLICATION uses a CACHE POOL (REPLICATED) in front of a BACKING POOL (ERASURE CODED), with CEPH STORAGE CLUSTERs at Site A and Site B]

DATABASES
[Diagram: MYSQL / MARIADB runs on a LINUX KERNEL with the CEPH BLOCK DEVICE (RBD), which speaks the Native Protocol to the CEPH STORAGE CLUSTER (RADOS)]

WHAT ABOUT CEPH AND OPENSTACK?

CEPH AND OPENSTACK
[Diagram: the OPENSTACK services KEYSTONE, SWIFT, CINDER, GLANCE, and NOVA sit above the RADOS CLUSTER; they reach it through RADOSGW (built on LIBRADOS) and through LIBRBD, with the NOVA HYPERVISOR also using LIBRBD]

OPENSTACK ADDITIONS
JUNO
• Enable Cloning for rbd-backed ephemeral disks
KILO
• Volume Migration from One Backend to Another
• Implement proper snapshotting for Ceph-based ephemeral disks
• Improve Backup in Cinder

Future Ceph Roadmap

CEPH ROADMAP
[Chart with columns Giant, Hammer, and I-Release. Items shown: RBD df, Object Versioning, RBD Mirroring, Calamari, Alternative Web Server for RGW, Object Expiration, and Performance Improvements (listed three times)]

NEXT STEPS: WHAT NOW?

Getting Started with Ceph
• Read about the latest version of Ceph: http://ceph.com/docs
• Deploy a test cluster using ceph-deploy: http://ceph.com/qsg
• Deploy a test cluster on the AWS free-tier using Juju: http://ceph.com/juju
• Ansible playbooks for Ceph: https://www.github.com/alfredodeza/ceph-ansible

Getting Involved with Ceph
• Most discussion happens on the mailing lists ceph-devel and ceph-users. Join or view archives at http://ceph.com/list
• IRC is a great place to get help (or help others!) #ceph and #ceph-devel. Details and logs at http://ceph.com/irc
• Download the code: http://www.github.com/ceph
• The tracker manages bugs and feature requests. Register and start looking around at http://tracker.ceph.com
• Doc updates and suggestions are always welcome. Learn how to contribute docs at http://ceph.com/docwriting

