2012 Virtual Cloud Day

Date post: 25-May-2015
Upload: ceph-community
Description:
Our VP, Community Ross Turk's slides from his Virtual Cloud Day talk in Nov 2012.
Page 1: 2012 Virtual Cloud Day

SCALING  STORAGE  WITH  CEPH

Ross  Turk,  Inktank  

Page 2: 2012 Virtual Cloud Day

WHO?

Ross Turk VP Community, Inktank

§  [email protected] §  @rossturk

inktank.com | ceph.com

Page 3: 2012 Virtual Cloud Day
Page 4: 2012 Virtual Cloud Day
Page 5: 2012 Virtual Cloud Day

me

Page 6: 2012 Virtual Cloud Day

RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes

LIBRADOS A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP

RBD A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver

CEPH FS A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE

RADOSGW A bucket-based REST gateway, compatible with S3 and Swift

APP APP HOST/VM CLIENT

Page 7: 2012 Virtual Cloud Day

IN  THE  BEGINNING Magic Madzik, Flickr / CC BY 2.0

Page 8: 2012 Virtual Cloud Day

EARLY   INFORMATION  STORAGE Chico.Ferreira, Flickr / CC BY 2.0

Page 9: 2012 Virtual Cloud Day

WRITING  >  CAVE  PAINTINGS kevingessner, Flickr / CC BY-SA 2.0

Page 10: 2012 Virtual Cloud Day

x1000

== x1

Page 11: 2012 Virtual Cloud Day

PEOPLE  BEGIN  WRITING  A  LOT Moyan_Brenn, Flickr / CC BY-ND 2.0

Page 12: 2012 Virtual Cloud Day

WRITING  IS  TIME-CONSUMING trekkyandy, Flickr / CC BY 2.0

Page 13: 2012 Virtual Cloud Day

THE   INDUSTRIALIZATION  OF  WRITING FateDenied, Flickr / CC BY 2.0

Page 14: 2012 Virtual Cloud Day

x1000

== x1

+ magnet = tape magnetic tape

Page 15: 2012 Virtual Cloud Day

STORAGE  BECOMES  MECHANICAL Erik Pitti, Wikipedia / CC BY-ND 2.0

Page 16: 2012 Virtual Cloud Day

HUMAN COMPUTER TAPE

HUMAN ROCK

HUMAN

INK

PAPER

Page 17: 2012 Virtual Cloud Day

COMPUTERS  NEED  PEOPLE  TO  WORK USDAgov, Flickr / CC BY 2.0

Page 18: 2012 Virtual Cloud Day

HUMAN COMPUTER TAPE

Page 19: 2012 Virtual Cloud Day

11101011 10110110 10110101 10101001 00100100 01001001 10100100 10100101 01011010 01101010 10101010 10101010 01010110 01010011

==

Page 20: 2012 Virtual Cloud Day

THROUGHPUT  BECOMES   IMPORTANT Zane Luke, Flickr / CC BY-ND 2.0

Page 21: 2012 Virtual Cloud Day

LAZ0R  B3AMS  CHANGE  EVERYTHING!! Jeff Kubina, Flickr / CC-BY-SA 2.0

Page 22: 2012 Virtual Cloud Day

HARD  DRIVES  ARE  TOTALLY  BETTER

amazing spinny hard drives sucky stupid tape slow

Page 23: 2012 Virtual Cloud Day

EVERYTHING  GETS  MESSY Rob!, Flickr / CC BY 2.0

Page 24: 2012 Virtual Cloud Day

000

aa

ac ab

ba

111010

bb bc

110

010 111

dc

101

da 000

110 001

010 011 db

Page 25: 2012 Virtual Cloud Day

owner: rturk created: aug12

last viewed: aug17 size: 42025 perms: 644 11101011 10110110 10110101

10101001 00100100 01001001 10100100 10100101 01011010 01101010 10101010 10101010

file

Page 26: 2012 Virtual Cloud Day

000

aa

ac ab

ba

111010

bb bc

110

010 111

dc

101

da 000

110 001

010 db 01 10

Page 27: 2012 Virtual Cloud Day

WE  OUTGROW  THE  HARD  DRIVE Mr. T in DC, Flickr / CC BY 2.0

Page 28: 2012 Virtual Cloud Day

HUMAN COMPUTER

DISK

DISK

DISK

DISK

DISK

DISK

DISK

HUMAN

HUMAN

Page 29: 2012 Virtual Cloud Day

(COMPUTER)

DISK

DISK

DISK

DISK

DISK

DISK

DISK

DISK

DISK

DISK

DISK

DISK

HUMAN

HUMAN

HUMAN

HUMAN HUMAN

HUMAN

HUMAN HUMAN

HUMAN HUMAN

HUMAN

HUMAN HUMAN

HUMAN

HUMAN

HUMAN

HUMAN

HUMAN

HUMAN

HUMAN

HUMAN

HUMAN (actually more like this…)

Page 30: 2012 Virtual Cloud Day

DISK COMPUTER

HUMAN

HUMAN

HUMAN

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

Page 31: 2012 Virtual Cloud Day

000

aa

ac ab

ba

111010

bb bc

110

010 111

dc

101

da 000

110 001

010 011 db X

Page 32: 2012 Virtual Cloud Day

pace: quick driver: frog

license: expired expression: agog

11101011 10110110 10110101 10101001 00100100 01001001 10100100 10100101 01011010 01101010 10101010 10101010

object

Page 33: 2012 Virtual Cloud Day

DISK COMPUTER

APP

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

Page 34: 2012 Virtual Cloud Day

DISK

COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

COMPUTER

DISK

Page 35: 2012 Virtual Cloud Day

DISK

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

COMPUTER

VM

VM

VM

Page 36: 2012 Virtual Cloud Day

STORAGE  THROUGHOUT  HISTORY Time-scale: Roughly logarithmic. Content: Whatever the opposite of “scientific” is.

Writing

Computers

Shared storage

Distributed storage

Cloud computing

Ceph

Painting

Page 37: 2012 Virtual Cloud Day

DISK COMPUTER

HUMAN

HUMAN

HUMAN

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

Page 38: 2012 Virtual Cloud Day

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

DISK COMPUTER

Page 39: 2012 Virtual Cloud Day

D C

D C

D C

D C

D C

D C

D C

D C

D C

D C

D C

D C

Page 40: 2012 Virtual Cloud Day

HUMAN

HUMAN

HUMAN

D C

D C

D C

D C

D C

D C

D C

D C

D C

D C

D C

D C

Page 41: 2012 Virtual Cloud Day

STORAGE  APPLIANCES Michael Moll, Wikipedia / CC BY-SA 2.0

Page 42: 2012 Virtual Cloud Day

6.4  MILLION  SQFT  OF  FACTORIES Dude94111, Flickr / CC BY 2.0

Page 43: 2012 Virtual Cloud Day

TECHNOLOGY   IS  A  COMMODITY RaeAllen, Flickr / CC-BY 2.0

Page 44: 2012 Virtual Cloud Day

COMMODITY  PRICES  FLUCTUATE

May-07 May-08 May-09 May-10 May-11 May-12

Page 45: 2012 Virtual Cloud Day

Hardware Appliances are Mysterious Black Boxes Abode of Chaos, Flickr / CC BY 2.0

Page 46: 2012 Virtual Cloud Day

DC

DC

DC

DC

DC

DC

DC

DC

DC

DC

DC

DC

HUMAN [DEVELOPER]

!!

Page 47: 2012 Virtual Cloud Day

DC

DC

DC

DC

D

C

DC

DC

DC

DC

DC

DC

DC

C++

Page 48: 2012 Virtual Cloud Day

DC

DC

DC

DC

D

C

DC

DC

DC

DC

DC

DC

DC

C++ X

Page 49: 2012 Virtual Cloud Day

THE WORLD NEEDS

AN OPEN STORAGE TECHNOLOGY

THAT SCALES

Page 50: 2012 Virtual Cloud Day

SAGE  WEIL

§  Co-founder of DreamHost

§  Inventor of Ceph

§  CEO of Inktank

Page 51: 2012 Virtual Cloud Day

OPEN SOURCE

philosophy design

Page 52: 2012 Virtual Cloud Day

OPEN  SOURCE  SPREADS   IDEAS orchidgalore, Flickr / CC BY 2.0

Page 53: 2012 Virtual Cloud Day

OPEN SOURCE

COMMUNITY-FOCUSED

philosophy design

Page 54: 2012 Virtual Cloud Day

WE  ARE  SMARTER  TOGETHER rturk, Linkedin Inmap

Page 55: 2012 Virtual Cloud Day

CEPH  BELONGS  TO  ALL  OF  US wackybadger, Flickr / CC BY 2.0

Page 56: 2012 Virtual Cloud Day

OPEN SOURCE

COMMUNITY-FOCUSED

SCALABLE

philosophy design

Page 57: 2012 Virtual Cloud Day

CEPH   IS  BUILT  TO  SCALE

Too much for a book

Too much for a drive

Too much for a computer

Too much for a room

Ceph

Too much for a cave

Page 58: 2012 Virtual Cloud Day

OPEN SOURCE

COMMUNITY-FOCUSED

SCALABLE

NO SINGLE POINT OF FAILURE

philosophy design

Page 59: 2012 Virtual Cloud Day

ARIOLIMAX  CALIFORNICUS aroid, Flickr / CC BY 2.0

Page 60: 2012 Virtual Cloud Day

THE  OCTOPUS   (A  METAPHOR) I love speaking in metaphors.

single point of failure

highly-available replicated

Page 61: 2012 Virtual Cloud Day

THE  BEEHIVE   (ANOTHER  METAPHOR) blumenbiene, Flickr / CC BY 2.0

Page 62: 2012 Virtual Cloud Day

OPEN SOURCE

COMMUNITY-FOCUSED

SCALABLE

NO SINGLE POINT OF FAILURE

SOFTWARE BASED

philosophy design

Page 63: 2012 Virtual Cloud Day

DC

DC

DC

DC

D

C

DC

DC

DC

DC

DC

DC

DC

C++

Page 64: 2012 Virtual Cloud Day

DC

DC

DC

DC

D

C

DC

DC

DC

DC

DC

DC

DC

C++ ✔

Page 65: 2012 Virtual Cloud Day

OPEN SOURCE

COMMUNITY-FOCUSED

SCALABLE

NO SINGLE POINT OF FAILURE

SOFTWARE BASED

SELF-MANAGING

philosophy design

Page 66: 2012 Virtual Cloud Day

DISKS  =  JUST  TINY  RECORD  PLAYERS jon_a_ross, Flickr / CC BY 2.0

Page 67: 2012 Virtual Cloud Day

D

55 times / day

= D

D D

x 1 MILLION

D D

D D

Page 68: 2012 Virtual Cloud Day
Page 69: 2012 Virtual Cloud Day

IT  ALL  STARTED  WITH  A  DREAM

Page 70: 2012 Virtual Cloud Day

+

Page 71: 2012 Virtual Cloud Day

RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes

LIBRADOS A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP

RBD A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver

CEPH FS A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE

RADOSGW A bucket-based REST gateway, compatible with S3 and Swift

APP APP HOST/VM CLIENT

Page 72: 2012 Virtual Cloud Day

RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes

LIBRADOS A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP

RBD A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver

CEPH FS A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE

RADOSGW A bucket-based REST gateway, compatible with S3 and Swift

APP APP HOST/VM CLIENT

Page 73: 2012 Virtual Cloud Day

DISK

FS

DISK DISK

OSD

DISK DISK

OSD OSD OSD OSD

FS FS FS FS btrfs xfs ext4

M M M

Page 74: 2012 Virtual Cloud Day

M

M

M

HUMAN

Page 75: 2012 Virtual Cloud Day

Monitors:
§  Maintain cluster map
§  Provide consensus for distributed decision-making
§  Must have an odd number
§  These do not serve stored objects to clients

M

OSDs:
§  One per disk (recommended)
§  At least three in a cluster
§  Serve stored objects to clients
§  Intelligently peer to perform replication tasks
§  Supports object classes
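The "odd number" advice for monitors can be illustrated with a little arithmetic. This is a minimal sketch, assuming the monitors need a strict majority to agree before they can make decisions; the function names are hypothetical and not part of any Ceph API.

```python
def quorum_size(monitors: int) -> int:
    """Smallest strict majority of `monitors` (assumption: majority-based consensus)."""
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many monitors can be lost while a strict majority still remains."""
    return monitors - quorum_size(monitors)


# Print cluster size, majority needed, and failures survived for small clusters.
for n in range(1, 8):
    print(n, quorum_size(n), tolerated_failures(n))
```

Under that assumption, going from three monitors to four does not let the cluster survive any additional failure, while going from three to five does, which is one way to read the slide's recommendation.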

Page 76: 2012 Virtual Cloud Day

RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes

LIBRADOS A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP

RBD A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver

CEPH FS A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE

RADOSGW A bucket-based REST gateway, compatible with S3 and Swift

APP APP HOST/VM CLIENT

Page 77: 2012 Virtual Cloud Day

LIBRADOS

M

M

M

APP

native

Page 78: 2012 Virtual Cloud Day

L


LIBRADOS
§  Provides direct access to RADOS for applications
§  C, C++, Python, PHP, Java
§  No HTTP overhead

Page 79: 2012 Virtual Cloud Day

RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes

LIBRADOS A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP

RBD A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver

CEPH FS A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE

RADOSGW A bucket-based REST gateway, compatible with S3 and Swift

APP APP HOST/VM CLIENT

Page 80: 2012 Virtual Cloud Day

M

M

M

native

REST

APP

LIBRADOS RADOSGW

LIBRADOS RADOSGW

APP

Page 81: 2012 Virtual Cloud Day

RADOS Gateway:
§  REST-based interface to RADOS
§  Supports buckets, accounting
§  Compatible with S3 and Swift applications

Page 82: 2012 Virtual Cloud Day

RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes

LIBRADOS A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP

CEPH FS A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE

RADOSGW A bucket-based REST gateway, compatible with S3 and Swift

APP APP HOST/VM CLIENT

RBD A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver

Page 83: 2012 Virtual Cloud Day

M

M

M

VM

LIBRADOS LIBRBD

VIRTUALIZATION CONTAINER

Page 84: 2012 Virtual Cloud Day

LIBRADOS

M

M

M

LIBRBD CONTAINER

LIBRADOS LIBRBD

CONTAINER VM

Page 85: 2012 Virtual Cloud Day

LIBRADOS

M

M

M

KRBD (KERNEL MODULE) HOST

Page 86: 2012 Virtual Cloud Day

RADOS Block Device:
§  Storage of virtual disks in RADOS
§  Allows decoupling of VMs and containers
§  Live migration!
§  Images are striped across the cluster
§  Boot support in QEMU, KVM, and OpenStack Nova
§  Mount support in the Linux kernel
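To make "images are striped across the cluster" concrete, here is a rough sketch of how a byte range on a virtual disk could be split into per-object extents. The 4 MiB object size and the function name are assumptions for illustration only; the slides do not state the object size or the layout RBD actually uses.

```python
from typing import List, Tuple

OBJECT_SIZE = 4 * 1024 * 1024  # assumed object size (4 MiB); not taken from the slides


def objects_for_range(offset: int, length: int,
                      object_size: int = OBJECT_SIZE) -> List[Tuple[int, int, int]]:
    """Split a disk byte range into (object_no, offset_in_object, length) extents."""
    extents = []
    end = offset + length
    while offset < end:
        object_no, object_offset = divmod(offset, object_size)
        # Stop at whichever comes first: the end of this object or the end of the range.
        chunk = min(object_size - object_offset, end - offset)
        extents.append((object_no, object_offset, chunk))
        offset += chunk
    return extents


# A 4 KiB write that starts 1 KiB before an object boundary touches two objects.
print(objects_for_range(OBJECT_SIZE - 1024, 4096))
```

Because each object is placed independently, a large sequential read or write on the image ends up spread over many OSDs rather than a single disk.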

Page 87: 2012 Virtual Cloud Day

RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes

LIBRADOS A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP

RBD A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver

CEPH FS A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE

RADOSGW A bucket-based REST gateway, compatible with S3 and Swift

APP APP HOST/VM CLIENT

Page 88: 2012 Virtual Cloud Day

M

M

M

CLIENT

01 10

data metadata

Page 89: 2012 Virtual Cloud Day

Metadata Server
§  Manages metadata for a POSIX-compliant shared filesystem
§  Directory hierarchy
§  File metadata (owner, timestamps, mode, etc.)
§  Stores metadata in RADOS
§  Does not serve file data to clients
§  Only required for shared filesystem

Page 90: 2012 Virtual Cloud Day

WHAT MAKES CEPH

UNIQUE?

Page 91: 2012 Virtual Cloud Day

HOW  DO  YOU  FIND  YOUR  KEYS? azmeen, Flickr / CC BY 2.0

Page 92: 2012 Virtual Cloud Day

APP ??

D C

D C

D C

D C

D C

D C

D C

D C

D C

D C

D C

D C

Page 93: 2012 Virtual Cloud Day

APP

D C

D C

D C

D C

D C

D C

D C

D C

D C

D C

D C

D C

A-G

H-N

O-T

U-Z

F*

Page 94: 2012 Virtual Cloud Day

I  ALWAYS  PUT  MY  KEYS  ON  THE  HOOK vitamindave, Flickr / CC BY 2.0

Page 95: 2012 Virtual Cloud Day

APP

D C

D C

D C

D C

D C

D C

D C

D C

D C

D C

D C

D C

Page 96: 2012 Virtual Cloud Day

DEAR  DIARY:  KEYS  =   IN  THE  KITCHEN Barnaby, Flickr / CC BY 2.0

Page 97: 2012 Virtual Cloud Day

HOW DO YOU FIND YOUR KEYS

WHEN YOUR HOUSE IS

INFINITELY BIG AND

ALWAYS CHANGING?

Page 98: 2012 Virtual Cloud Day

THE  ANSWER:  CRUSH!! pasukaru76, Flickr / CC SA 2.0

Page 99: 2012 Virtual Cloud Day

10 10 01 01 10 10 01 11 01 10

10 10 01 01 10 10 01 11 01 10

hash(object name) % num pg

CRUSH(pg, cluster state, rule set)
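The two steps on this slide can be sketched as follows. This is only an illustration: SHA-256 stands in for whatever hash Ceph really uses, and `toy_crush` is a hypothetical ranking function, not the CRUSH algorithm. The point it tries to show is that any client can compute an object's location from its name and the cluster description, with no lookup table.

```python
import hashlib
from typing import List, Tuple


def stable_hash(text: str) -> int:
    """Deterministic 64-bit hash (SHA-256 is an assumption made for this sketch)."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")


def object_to_pg(object_name: str, num_pg: int) -> int:
    """Step 1 from the slide: hash(object name) % num pg."""
    return stable_hash(object_name) % num_pg


def toy_crush(pg: int, osds: List[str], replicas: int) -> List[str]:
    """Step 2 stand-in: rank OSDs by a hash of (pg, osd) and keep the top few."""
    ranked = sorted(osds, key=lambda osd: stable_hash(f"{pg}:{osd}"), reverse=True)
    return ranked[:replicas]


def locate(object_name: str, num_pg: int, osds: List[str],
           replicas: int = 3) -> Tuple[int, List[str]]:
    """Compute the placement group and OSD list for an object name."""
    pg = object_to_pg(object_name, num_pg)
    return pg, toy_crush(pg, osds, replicas)


cluster = [f"osd.{i}" for i in range(12)]  # hypothetical 12-OSD cluster
print(locate("my-object", 64, cluster))
```

Running `locate` twice with the same inputs gives the same answer, which is why no central directory of object locations is needed in this model.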

Page 100: 2012 Virtual Cloud Day

10 10 01 01 10 10 01 11 01 10

10 10 01 01 10 10 01 11 01 10

Page 101: 2012 Virtual Cloud Day

CRUSH
§  Pseudo-random placement algorithm
§  Ensures even distribution
§  Repeatable, deterministic
§  Rule-based configuration
§  Replica count
§  Infrastructure topology
§  Weighting
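The properties listed above (deterministic, weighted, topology-aware, configurable replica count) can be imitated with weighted rendezvous hashing. To be clear, this is a different and much simpler technique than CRUSH, swapped in purely to show the behaviour; the OSD names, hosts, and weights below are made up.

```python
import hashlib
import math
from typing import Dict, List, Tuple


def unit_hash(key: str) -> float:
    """Map a string deterministically to a float strictly between 0 and 1."""
    value = int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big") >> 12
    return (value + 0.5) / 2 ** 52


def place(pg: int, osds: Dict[str, Tuple[str, float]], replicas: int) -> List[str]:
    """Pick `replicas` OSDs for a placement group, at most one per host.

    `osds` maps an OSD name to (host, weight). Heavier OSDs get higher scores
    more often, so they receive proportionally more placement groups.
    """
    def score(name: str) -> float:
        weight = osds[name][1]
        return -weight / math.log(unit_hash(f"{pg}:{name}"))

    chosen, used_hosts = [], set()
    for name in sorted(osds, key=score, reverse=True):
        host = osds[name][0]
        if host in used_hosts:
            continue  # topology rule: never put two replicas on the same host
        chosen.append(name)
        used_hosts.add(host)
        if len(chosen) == replicas:
            break
    return chosen


# Hypothetical cluster: osd.b is on a bigger disk, so it is weighted 3x.
example = {
    "osd.a": ("host1", 1.0),
    "osd.b": ("host2", 3.0),
    "osd.c": ("host3", 1.0),
    "osd.d": ("host3", 1.0),
}
print(place(7, example, replicas=3))
```

In this sketch the replica count is an argument, the topology is the host label on each OSD, and the weighting is the second element of each tuple, mirroring the three rule inputs named on the slide.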

Page 102: 2012 Virtual Cloud Day

CLIENT

??

Page 103: 2012 Virtual Cloud Day
Page 104: 2012 Virtual Cloud Day
Page 105: 2012 Virtual Cloud Day

CLIENT

??

Page 106: 2012 Virtual Cloud Day

LIBRADOS

M

M

M

VM

LIBRBD VIRTUALIZATION CONTAINER

Page 107: 2012 Virtual Cloud Day

HOW DO YOU SPIN UP

THOUSANDS OF VMs INSTANTLY

AND EFFICIENTLY?

Page 108: 2012 Virtual Cloud Day

144 0 0 0 0

instant copy

= 144

Page 109: 2012 Virtual Cloud Day

4 144

CLIENT

write

write

write

= 148

write

Page 110: 2012 Virtual Cloud Day

4 144

CLIENT read

read

read

= 148
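The numbers on these three slides (144 objects, an instant copy, then 4 more after writes, for 148 in total) follow from copy-on-write layering. Below is a small in-memory model of that idea. The `Image` class is hypothetical and assumes whole-object writes; it is not how RBD is implemented.

```python
class Image:
    """Toy layered image: reads fall through to the parent, writes stay local."""

    def __init__(self, parent=None):
        self.parent = parent
        self.objects = {}  # object number -> data owned by this image only

    def write(self, object_no, data):
        self.objects[object_no] = data

    def read(self, object_no):
        if object_no in self.objects:
            return self.objects[object_no]
        if self.parent is not None:
            return self.parent.read(object_no)
        return None

    def clone(self):
        """Instant copy: the clone starts empty and only references its parent."""
        return Image(parent=self)


def stored_objects(*images):
    """Total number of objects actually stored across the given images."""
    return sum(len(image.objects) for image in images)


golden = Image()
for n in range(144):
    golden.write(n, b"base")

vm_disk = golden.clone()
print(stored_objects(golden, vm_disk))  # still 144: nothing was copied

for n in range(4):
    vm_disk.write(n, b"new")
print(stored_objects(golden, vm_disk))  # 144 shared + 4 written = 148
```

Each additional clone of `golden` would add storage only for the objects that particular VM changes, which is what makes spinning up many VMs from one image cheap in this model.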

Page 111: 2012 Virtual Cloud Day

HOW DO YOU MANAGE

DIRECTORY HIERARCHY WITHOUT

A SINGLE POINT OF

FAILURE?

Page 112: 2012 Virtual Cloud Day

FILESYSTEMS  REQUIRE  METADATA Barnaby, Flickr / CC BY 2.0

Page 113: 2012 Virtual Cloud Day

M

M

M

CLIENT

01 10

Page 114: 2012 Virtual Cloud Day

M

M

M

Page 115: 2012 Virtual Cloud Day

one tree

three metadata servers

??

Page 116: 2012 Virtual Cloud Day
Page 117: 2012 Virtual Cloud Day
Page 118: 2012 Virtual Cloud Day
Page 119: 2012 Virtual Cloud Day
Page 120: 2012 Virtual Cloud Day

DYNAMIC SUBTREE PARTITIONING
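One way to picture dynamic subtree partitioning is a map from directory subtrees to metadata servers that can be changed at runtime. The sketch below is a simplified guess at the idea, with a hypothetical `SubtreeMap` class: each path is served by whichever metadata server owns its deepest delegated ancestor, and a busy subtree can be handed to another server.

```python
class SubtreeMap:
    """Toy authority map: directory subtree -> metadata server number."""

    def __init__(self, root_mds: int):
        self.authority = {"/": root_mds}

    def delegate(self, subtree: str, mds: int) -> None:
        """Hand a subtree (and everything below it) to another metadata server."""
        if not subtree.startswith("/"):
            raise ValueError("subtree must be an absolute path")
        self.authority[subtree.rstrip("/") or "/"] = mds

    def lookup(self, path: str) -> int:
        """Return the server for `path`: the owner of its deepest delegated ancestor."""
        if not path.startswith("/"):
            raise ValueError("path must be absolute")
        current = path.rstrip("/") or "/"
        while current not in self.authority:
            current = current.rsplit("/", 1)[0] or "/"
        return self.authority[current]


tree = SubtreeMap(root_mds=0)
tree.delegate("/home", 1)        # /home is busy, so move it to server 1
tree.delegate("/home/alice", 2)  # one hot directory gets its own server
print(tree.lookup("/home/alice/notes.txt"), tree.lookup("/home/bob"), tree.lookup("/etc"))
```

The "dynamic" part would be a policy that calls `delegate` as load shifts; that policy is left out here because the slides do not describe it.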

Page 121: 2012 Virtual Cloud Day

AND NOW BACKPEDALING

Page 122: 2012 Virtual Cloud Day

ALMOST EVERYTHING

WORKS

Page 123: 2012 Virtual Cloud Day

RADOS A reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes

LIBRADOS A library allowing apps to directly access RADOS, with support for C, C++, Java, Python, Ruby, and PHP

RBD A reliable and fully-distributed block device, with a Linux kernel client and a QEMU/KVM driver

RADOSGW A bucket-based REST gateway, compatible with S3 and Swift

APP APP HOST/VM CLIENT

CEPH FS A POSIX-compliant distributed file system, with a Linux kernel client and support for FUSE

NEARLY AWESOME

AWESOME AWESOME

AWESOME

AWESOME

Page 124: 2012 Virtual Cloud Day

LAN SCALE!! *

* OR REALLY REALLY SCARY FAST WAN

Page 125: 2012 Virtual Cloud Day

CEPH  AND  CLOUDSTACK tableatny, Flickr / CC BY 2.0

Page 126: 2012 Virtual Cloud Day

RBD  SUPPORT   IN  CLOUDSTACK

§  Allows storage of virtual disks inside RADOS
§  Works with KVM only right now
§  No snapshots yet
§  Upcoming in CloudStack 4
§  More information can be found on the mailing list:
§  ceph-devel / incubator-cloudstack-dev: http://article.gmane.org/gmane.comp.file-systems.ceph.devel/7505

Page 127: 2012 Virtual Cloud Day

QUESTIONS?

Ross Turk VP Community, Inktank

§  [email protected] §  @rossturk

inktank.com | ceph.com

