Ceph Day Santa Clara: Ceph Fundamentals

Post on 16-Jan-2015

description

Ross Turk, VP, Marketing & Community, Inktank

Ceph is an open source distributed object store, network block device, and file system designed for reliability, performance, and scalability. It runs on standard hardware, has no single point of failure, and is supported by the Linux kernel. It also works great with OpenStack and CloudStack. If you’ve heard of Ceph but aren’t sure where it fits into your plans, this is the talk for you. Designed for those who are new to Ceph, this talk will cover Ceph’s design principles, overall architecture, and integration with other operational systems.

transcript

Ceph Fundamentals
Ross Turk
VP Community, Inktank

2

ME ME ME ME ME ME.

Ross Turk
VP Community, Inktank

ross@inktank.com
@rossturk

inktank.com | ceph.com

3

Ceph Architectural Overview

Ah! Finally, 32 slides in and he gets to the nerdy stuff.

4

RADOS

A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes

LIBRADOS

A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP

RBD

A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver

CEPH FS

A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE

RADOSGW

A bucket-based REST gateway, compatible with S3 and Swift

APP APP HOST/VM CLIENT

6

[Diagram: OSDs, each on a file system (btrfs, xfs, ext4) on its own DISK, alongside monitors (M, M, M)]

7

[Diagram: M, M, M, HUMAN]

8

Monitors:
• Maintain cluster membership and state
• Provide consensus for distributed decision-making
• Small, odd number
• These do not serve stored objects to clients

OSDs:
• 10s to 10000s in a cluster
• One per disk
• (or one per SSD, RAID group…)
• Serve stored objects to clients
• Intelligently peer to perform replication and recovery tasks
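
The "small, odd number" guidance for monitors follows from majority-based consensus. Below is a minimal sketch of that arithmetic, assuming a strict majority of monitors must be reachable to form a quorum. The function names are illustrative and not part of any Ceph API.

```python
def quorum_size(monitors: int) -> int:
    """Smallest strict majority of `monitors`."""
    return monitors // 2 + 1


def failures_tolerated(monitors: int) -> int:
    """How many monitors can fail while a majority is still reachable."""
    return monitors - quorum_size(monitors)


# Print a small table: monitor count, quorum size, tolerated failures.
for n in range(1, 8):
    print(n, quorum_size(n), failures_tolerated(n))
```

Under this assumption, going from 3 to 4 monitors does not tolerate any additional failure, which is one way to read the preference for an odd count.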

10

[Diagram: APP, LIBRADOS, socket, M, M, M]

LIBRADOS
• Provides direct access to RADOS for applications
• C, C++, Python, PHP, Java, Erlang
• Direct access to storage nodes
• No HTTP overhead
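
As an illustration of what "direct access to RADOS" can look like from an application, here is a minimal sketch using the Python `rados` binding. The config path `/etc/ceph/ceph.conf`, the pool name `data`, and the object name are assumptions for the example; `roundtrip` and `main` are hypothetical helper names. `main()` needs a reachable cluster and is deliberately not called here.

```python
def roundtrip(ioctx, name: str, payload: bytes) -> bytes:
    """Write an object, then read it back through an I/O context."""
    ioctx.write_full(name, payload)  # replace the whole object
    return ioctx.read(name)          # read it back from the pool


def main() -> None:
    import rados  # Python binding distributed with Ceph

    # Assumed config path and pool name; adjust for the target cluster.
    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx("data")
        try:
            print(roundtrip(ioctx, "hello-object", b"hello RADOS"))
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
```

The application talks to the cluster over the library's own connection, with no HTTP gateway in between, which is the point of the last bullet above.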

13

[Diagram: APP, REST, RADOSGW, LIBRADOS, socket, M, M, M]

14

RADOS Gateway:
• REST-based object storage proxy
• Uses RADOS to store objects
• API supports buckets, accounts
• Usage accounting for billing
• Compatible with S3 and Swift applications

17

[Diagram: VM, HYPERVISOR, LIBRBD, LIBRADOS, M, M, M]

18

[Diagram: VM, two HYPERVISORs (each with LIBRBD and LIBRADOS), M, M, M]

19

[Diagram: HOST, KRBD (KERNEL MODULE), LIBRADOS, M, M, M]

20

RADOS Block Device:
• Storage of disk images in RADOS
• Decouples VMs from host
• Images are striped across the cluster (pool)
• Snapshots
• Copy-on-write clones
• Support in:
• Mainline Linux Kernel (2.6.39+)
• QEMU/KVM, native Xen coming soon
• OpenStack, CloudStack, Nebula, Proxmox
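
"Images are striped across the cluster" means a disk image is split into many RADOS objects rather than stored as one blob. The sketch below shows the offset arithmetic under two assumptions: a fixed 4 MiB object size and plain sequential layout. The naming scheme in `object_name` is hypothetical and only illustrates that each chunk becomes its own object.

```python
from typing import Tuple

OBJECT_SIZE = 4 * 1024 * 1024  # assumed: 4 MiB per object


def locate(offset: int, object_size: int = OBJECT_SIZE) -> Tuple[int, int]:
    """Map a byte offset in the image to (object index, offset inside that object)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, object_size)


def object_name(image_id: str, index: int) -> str:
    """Hypothetical naming scheme: image id plus a zero-padded hex index."""
    return f"{image_id}.{index:016x}"


# A write at byte offset 10 MiB lands 2 MiB into the third object (index 2).
index, inner = locate(10 * 1024 * 1024)
print(object_name("img", index), inner)
```

Because each object is placed independently, reads and writes to different parts of one image can be served by different storage nodes.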

22

[Diagram: CLIENT, data (0110), metadata, M, M, M]

23

Metadata Server
• Manages metadata for a POSIX-compliant shared filesystem
• Directory hierarchy
• File metadata (owner, timestamps, mode, etc.)
• Stores metadata in RADOS
• Does not serve file data to clients
• Only required for shared filesystem

24

What Makes Ceph Unique?

Part one: it never, ever remembers where it puts stuff.

25

[Diagram: APP ?? and twelve nodes labeled DC]

26

How Long Did It Take You To Find Your Keys This Morning?

azmeen, Flickr / CC BY 2.0

27

[Diagram: APP and twelve nodes labeled DC]

28

Dear Diary: Today I Put My Keys on the Kitchen Counter

Barnaby, Flickr / CC BY 2.0

29

[Diagram: APP and twelve nodes labeled DC, grouped by name range: A-G, H-N, O-T, U-Z; lookup F*]

30

I Always Put My Keys on the Hook By the Door

vitamindave, Flickr / CC BY 2.0

31

HOW DO YOU FIND YOUR KEYS WHEN YOUR HOUSE IS INFINITELY BIG AND ALWAYS CHANGING?

32

The Answer: CRUSH!!!!!

pasukaru76, Flickr / CC SA 2.0

33

OBJECT

10 10 01 01 10 10 01 11 01 10

hash(object name) % num pg

CRUSH(pg, cluster state, rule set)
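
The two lines above describe a two-step calculation: hash the object name into a placement group (PG), then compute that PG's OSDs from the cluster state and rules. The following is a rough sketch of that shape only. MD5 stands in for Ceph's own hash, and rendezvous (highest-random-weight) hashing stands in for CRUSH, which additionally handles hierarchy, rules, and weights. All function names are hypothetical.

```python
import hashlib
from typing import List


def stable_hash(text: str) -> int:
    """Deterministic 64-bit hash (MD5 is a stand-in, not Ceph's hash)."""
    return int.from_bytes(hashlib.md5(text.encode()).digest()[:8], "big")


def object_to_pg(name: str, num_pg: int) -> int:
    """Step 1 from the slide: hash(object name) % num pg."""
    return stable_hash(name) % num_pg


def pg_to_osds(pg: int, osds: List[int], replicas: int) -> List[int]:
    """Step 2 stand-in: rank OSDs by a per-(pg, osd) score and keep the top ones."""
    ranked = sorted(osds, key=lambda osd: stable_hash(f"{pg}:{osd}"), reverse=True)
    return ranked[:replicas]


def locate(name: str, num_pg: int, osds: List[int], replicas: int = 3) -> List[int]:
    """Same inputs always give the same answer, so no lookup table is needed."""
    return pg_to_osds(object_to_pg(name, num_pg), osds, replicas)


print(locate("my-object", num_pg=64, osds=list(range(12))))
```

Any client holding the same cluster description computes the same OSD list on its own, which is the sense in which nothing has to "remember" where an object was put.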

34

OBJECT

10 10 01 01 10 10 01 11 01 10

35

CRUSH
• Pseudo-random placement algorithm
• Fast calculation, no lookup
• Repeatable, deterministic
• Statistically uniform distribution
• Stable mapping
• Limited data migration on change
• Rule-based configuration
• Infrastructure topology aware
• Adjustable replication
• Weighting
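
"Stable mapping" and "limited data migration on change" can be checked empirically on a toy placement function. The sketch below uses rendezvous hashing as a stand-in for CRUSH (it is not Ceph's algorithm) with one replica per placement group, and measures how many PGs move when an eleventh OSD joins ten. The function names are hypothetical.

```python
import hashlib
from typing import Dict, List


def score(pg: int, osd: int) -> int:
    """Deterministic pseudo-random score for a (pg, osd) pair."""
    digest = hashlib.sha256(f"{pg}:{osd}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(pg: int, osds: List[int]) -> int:
    """Pick the OSD with the highest score for this PG (single replica)."""
    return max(osds, key=lambda osd: score(pg, osd))


def placement(num_pg: int, osds: List[int]) -> Dict[int, int]:
    """Compute the OSD for every PG in the pool."""
    return {pg: place(pg, osds) for pg in range(num_pg)}


def moved_fraction(num_pg: int, before: List[int], after: List[int]) -> float:
    """Share of PGs whose OSD differs between two cluster layouts."""
    old, new = placement(num_pg, before), placement(num_pg, after)
    return sum(old[pg] != new[pg] for pg in old) / num_pg


ten = list(range(10))
print(moved_fraction(1000, ten, ten + [10]))
```

With this stand-in, a PG moves only if the new OSD scores highest for it, so roughly one PG in eleven is expected to move and every move lands on the new OSD. A naive `hash % number_of_osds` scheme would instead reshuffle most PGs.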

36

CLIENT

??

41

What Makes Ceph Unique?

Part two: it has smart block devices for all those impatient, selfish VMs.

42

[Diagram: VM, HYPERVISOR, LIBRBD, LIBRADOS, M, M, M]

43

HOW DO YOU SPIN UP THOUSANDS OF VMs INSTANTLY AND EFFICIENTLY?

44

[Diagram: instant copy. 144 + 0 + 0 + 0 + 0 = 144]

45

[Diagram: CLIENT writes to the copy. 144 + 4 = 148]

46

[Diagram: CLIENT reads from the copy. 144 + 4 = 148]
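
The numbers on these slides can be reproduced with a toy copy-on-write model: a clone of a 144-object image stores nothing at first (144 in total), four writes to the clone bring the total to 148, and reads of untouched objects fall through to the parent. This is a simplified in-memory sketch that works on whole objects. The `Image` class is hypothetical and is not librbd's API.

```python
from typing import Dict, Optional


class Image:
    """Toy image: a sparse map from object index to bytes, with an optional parent."""

    def __init__(self, parent: Optional["Image"] = None) -> None:
        self.parent = parent
        self.objects: Dict[int, bytes] = {}

    def clone(self) -> "Image":
        """Instant copy: the clone starts empty and only references its parent."""
        return Image(parent=self)

    def write(self, index: int, data: bytes) -> None:
        """Writes land in this image only; the parent is never modified."""
        self.objects[index] = data

    def read(self, index: int) -> bytes:
        """Serve from this image if present, otherwise fall through to the parent."""
        if index in self.objects:
            return self.objects[index]
        if self.parent is not None:
            return self.parent.read(index)
        return b""  # an unwritten region reads as empty

    def stored(self) -> int:
        """Number of objects this image holds itself."""
        return len(self.objects)


parent = Image()
for i in range(144):
    parent.write(i, b"base")

child = parent.clone()
print(parent.stored() + child.stored())  # total objects right after the clone

for i in range(4):
    child.write(i, b"new")
print(parent.stored() + child.stored())  # total objects after four writes
```

Creating the clone copies no data, so many VMs can start from one base image and only pay for the objects each of them changes.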

47

What Makes Ceph Unique?

Part three: it has powerful friends, ones you probably already know.

48

[Diagram: APACHE CLOUDSTACK, HYPERVISOR, PRIMARY STORAGE POOL, SECONDARY STORAGE POOL (snapshots, templates, images), M, M, M]

49

[Diagram: OPENSTACK (KEYSTONE API, SWIFT API, CINDER API, GLANCE API, NOVA API), HYPERVISOR, RADOSGW, M, M, M]

50

What Makes Ceph Unique?

Part four: clustered metadata

51

[Diagram: CLIENT, 0110, M, M, M]

52

[Diagram: M, M, M]

53

one tree

three metadata servers

??

58

DYNAMIC SUBTREE PARTITIONING

59

Questions?

Ross Turk
VP Community, Inktank

ross@inktank.com
@rossturk

inktank.com | ceph.com