
KEEPING OPENSTACK STORAGE TRENDY WITH CEPH AND CONTAINERS

SAGE WEIL, HAOMAI WANG
OPENSTACK SUMMIT - 2015.05.20

2

AGENDA

● Motivation
● Block
● File
● Container orchestration
● Summary

MOTIVATION

4

WEB APPLICATION

[Diagram: a web application scaled across multiple app servers backed by cloud storage services]

A CLOUD SMORGASBORD

● Compelling clouds offer options

● Compute

– VM (KVM, Xen, …)

– Containers (lxc, Docker, OpenVZ, ...)

● Storage

– Block (virtual disk)

– File (shared)

– Object (RESTful, …)

– Key/value

– NoSQL

– SQL

5

WHY CONTAINERS?

Technology

● Performance

– Shared kernel

– Faster boot

– Lower baseline overhead

– Better resource sharing

● Storage

– Shared kernel → efficient IO

– Small image → efficient deployment

Ecosystem

● Emerging container host OSs

– Atomic – http://projectatomic.io

● os-tree (s/rpm/git/)

– CoreOS

● systemd + etcd + fleet

– Snappy Ubuntu

● New app provisioning model

– Small, single-service containers

– Standalone execution environment

● New open container spec: nulecule

– https://github.com/projectatomic/nulecule

6

WHY NOT CONTAINERS?

Technology

● Security

– Shared kernel

– Limited isolation

● OS flexibility

– Shared kernel limits OS choices

● Inertia

Ecosystem

● New models don't capture many legacy services

7

WHY CEPH?

● All components scale horizontally

● No single point of failure

● Hardware agnostic, commodity hardware

● Self-manage whenever possible

● Open source (LGPL)

● Move beyond legacy approaches

– client/cluster instead of client/server

– avoid ad hoc HA

8

CEPH COMPONENTS

RGW
A web services gateway for object storage, compatible with S3 and Swift

LIBRADOS
A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)

RADOS
A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors

RBD
A reliable, fully-distributed block device with cloud platform integration

CEPHFS
A distributed file system with POSIX semantics and scale-out metadata management

[Diagram: apps use RGW and LIBRADOS, hosts/VMs use RBD, clients use CEPHFS; all are built on RADOS]

BLOCK STORAGE

10

EXISTING BLOCK STORAGE MODEL

VM

● VMs are the unit of cloud compute

● Block devices are the unit of VM storage

– ephemeral: not redundant, discarded when VM dies

– persistent volumes: durable, (re)attached to any VM

● Block devices are single-user

● For shared storage,

– use objects (e.g., Swift or S3)

– use a database (e.g., Trove)

– ...

11

KVM + LIBRBD.SO

● Model

– Nova → libvirt → KVM → librbd.so

– Cinder → rbd.py → librbd.so

– Glance → rbd.py → librbd.so

● Pros

– proven

– decent performance

– good security

● Cons

– performance could be better

● Status

– most common deployment model today (~44% in latest survey)

[Diagram: Nova and Cinder drive a VM running under QEMU/KVM, which uses librbd to talk to the RADOS cluster (M = monitors)]
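As a rough illustration of this path (not from the slides; the pool name, image name, and cephx user are assumptions), the sketch below creates an RBD image and attaches it to a guest through QEMU's built-in librbd support, which is roughly what Nova, libvirt, and Cinder arrange automatically:

    # Create a 10 GB image in a pool Cinder might use (placeholder names).
    rbd create volumes/demo-vol --size 10240 --id cinder

    # Boot a guest with the image attached via librbd (QEMU rbd: URI);
    # in OpenStack this drive definition is generated by Nova/libvirt.
    qemu-system-x86_64 -m 2048 \
        -drive format=raw,file=rbd:volumes/demo-vol:id=cinder:conf=/etc/ceph/ceph.conf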

12

MULTIPLE CEPH DRIVERS

● librbd.so

– qemu-kvm

– rbd-fuse (experimental)

● rbd.ko (Linux kernel)

– /dev/rbd*

– stable and well-supported on modern kernels and distros

– some feature gap

● no client-side caching
● no “fancy striping”

– performance delta

● more efficient → more IOPS
● no client-side cache → higher latency for some workloads
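A minimal sketch of the kernel path for comparison (pool, image, and user names are placeholders): the image is mapped through rbd.ko and shows up as an ordinary block device, but without librbd's client-side cache or fancy striping.

    # Map an existing image with the kernel client.
    rbd map volumes/demo-vol --id cinder --keyring /etc/ceph/ceph.client.cinder.keyring
    rbd showmapped              # e.g. /dev/rbd0
    # ... use /dev/rbd0 like any local disk ...
    rbd unmap /dev/rbd0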

13

LXC + CEPH.KO

● The model

– libvirt-based lxc containers

– map kernel RBD on host

– pass the host device through libvirt into the container

● Pros

– fast and efficient

– implement existing Nova API

● Cons

– weaker security than VM

● Status

– lxc is maintained

– lxc is less widely used

– no prototype

[Diagram: Nova drives a container on a Linux host; rbd.ko on the host maps the device from the RADOS cluster]

14

NOVA-DOCKER + CEPH.KO

● The model

– docker container as mini-host

– map kernel RBD on host

– pass RBD device to container, or

– mount RBD, bind dir to container

● Pros

– buzzword-compliant

– fast and efficient

● Cons

– different image format

– different app model

– only a subset of docker feature set

● Status

– no prototype

– nova-docker is out of tree

https://wiki.openstack.org/wiki/Docker
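A rough sketch of the second variant above (mount the RBD on the host, bind a directory into the container); device, directory, and image names are assumptions, not from the deck:

    # Host: map and mount the volume.
    rbd map volumes/demo-vol --id cinder
    mkfs.ext4 /dev/rbd0                            # first use only
    mkdir -p /var/lib/guest-vols/demo-vol
    mount /dev/rbd0 /var/lib/guest-vols/demo-vol

    # Hand the mounted directory to the container as a bind mount.
    docker run -d -v /var/lib/guest-vols/demo-vol:/data demo-app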

15

IRONIC + CEPH.KO

● The model

– bare metal provisioning

– map kernel RBD directly from guest image

● Pros

– fast and efficient

– traditional app deployment model

● Cons

– guest OS must support rbd.ko

– requires agent

– boot-from-volume tricky

● Status

– Cinder and Ironic integration is a hot topic at summit

● 5:20p Wednesday (cinder)

– no prototype

● References
– https://wiki.openstack.org/wiki/Ironic/blueprints/cinder-integration

[Diagram: a bare-metal Linux host maps the device with rbd.ko directly from the RADOS cluster]

16

BLOCK - SUMMARY

● But

– block storage is same old boring

– volumes are only semi-elastic (grow, not shrink; tedious to resize)

– storage is not shared between guests

                        performance  efficiency  VM  client cache  striping  same images?  exists

kvm + librbd.so         best         good        X   X             X         yes           X
lxc + rbd.ko            good         best                                    close
nova-docker + rbd.ko    good         best                                    no
ironic + rbd.ko         good         best                                    close?        planned!

FILE STORAGE

18

MANILA FILE STORAGE

● Manila manages file volumes

– create/delete, share/unshare

– tenant network connectivity

– snapshot management

● Why file storage?

– familiar POSIX semantics

– fully shared volume – many clients can mount and share data

– elastic storage – amount of data can grow/shrink without explicit provisioning

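For orientation, the basic workflow looks like this with the standard Manila CLI (share name, size, and subnet are made up):

    # Create a 1 GB NFS share and allow a tenant subnet to mount it.
    manila create NFS 1 --name web-content
    manila access-allow web-content ip 10.0.0.0/24

    # Inspect shares and find the export location handed to guests.
    manila list
    manila show web-content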

19

MANILA CAVEATS

● Last mile problem

– must connect storage to guest network

– somewhat limited options (focus on Neutron)

● Mount problem

– Manila makes it possible for guest to mount

– guest is responsible for actual mount

– ongoing discussion around a guest agent …

● Manila currently has baked-in assumptions about both of these


20


APPLIANCE DRIVERS

● Appliance drivers

– tell an appliance to export NFS to guests

– map appliance IP into tenant network (Neutron)

– boring (closed, proprietary, expensive, etc.)

● Status

– several drivers from usual suspects

– security punted to vendor


21

GANESHA DRIVER

● Model

– service VM running nfs-ganesha server

– mount file system on storage network

– export NFS to tenant network

– map IP into tenant network

● Status

– in-tree, well-supported

[Diagram: Manila drives a Ganesha service VM that exports NFS to tenant KVM guests; the backing file system behind Ganesha is left open]

22


KVM + GANESHA + LIBCEPHFS

● Model

– existing Ganesha driver, backed by Ganesha's libcephfs FSAL

● Pros

– simple, existing model

– security

● Cons

– extra hop → higher latency

– service VM is SpoF

– service VM consumes resources

● Status

– Manila Ganesha driver exists

– untested with CephFS

[Diagram: Manila drives a Ganesha service VM whose libcephfs FSAL speaks native Ceph to the RADOS cluster; the tenant KVM guest mounts it over NFS with nfs.ko]
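To make the model concrete, here is a rough (and, per the slide, untested) sketch of an nfs-ganesha export backed by the Ceph FSAL; paths, IDs, and the service name are assumptions, and option names vary across Ganesha versions:

    # On the service VM: export CephFS through Ganesha's libcephfs FSAL.
    cat > /etc/ganesha/ganesha.conf <<'EOF'
    EXPORT {
        Export_ID = 1;
        Path = "/";            # CephFS path to export
        Pseudo = "/cephfs";    # NFSv4 pseudo-root
        Access_Type = RW;
        FSAL {
            Name = CEPH;       # use libcephfs rather than a local mount
        }
    }
    EOF
    systemctl restart nfs-ganesha    # unit name varies by distro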

23

KVM + CEPH.KO (CEPH-NATIVE)

● Model

– allow tenant access to storage network

– mount CephFS directly from tenant VM

● Pros

– best performance

– access to full CephFS feature set

– simple

● Cons

– guest must have modern distro/kernel

– exposes tenant to Ceph cluster

– must deliver mount secret to client

● Status

– no prototype

– CephFS isolation/security is work-in-progress

[Diagram: the tenant KVM guest mounts CephFS with ceph.ko, speaking native Ceph to the RADOS cluster; Manila manages the share]
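A minimal sketch of the tenant-side mount (monitor address, share path, and cephx user are placeholders, echoing the template that appears on the mount-problem slide later):

    # Inside the guest: mount the share directly over the storage network.
    mount -t ceph $cephmonip:6789:/manila/$uuid /mnt/share \
        -o name=manila,secretfile=/etc/ceph/manila.secret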

24

NETWORK-ONLY MODEL IS LIMITING

● Current assumption of NFS or CIFS sucks

● Always relying on guest mount support sucks

– mount -t ceph -o what?

● Even assuming storage connectivity is via the network sucks

● There are other options!

– KVM virtfs/9p

● fs pass-through to host
● 9p protocol
● virtio for fast data transfer
● upstream; not widely used

– NFS re-export from host

● mount and export fs on host
● private host/guest net
● avoid network hop from NFS service VM

– containers and 'mount --bind'

25

NOVA “ATTACH FS” API

● Mount problem is ongoing discussion by Manila team

– discussed this morning

– simple prototype using cloud-init

– Manila agent? leverage Zaqar tenant messaging service?

● A different proposal

– expand Nova to include “attach/detach file system” API

– analogous to current attach/detach volume for block

– each Nova driver may implement function differently

– “plumb” storage to tenant VM or container

● Open question

– Would API do the final “mount” step as well? (I say yes!)

26

KVM + VIRTFS/9P + CEPHFS.KO

● Model

– mount kernel CephFS on host

– pass-through to guest via virtfs/9p

● Pros

– security: tenant remains isolated from storage net + locked inside a directory

● Cons

– requires modern Linux guests

– 9p not supported on some distros

– “virtfs is ~50% slower than a native mount?”

● Status

– Prototype from Haomai Wang

[Diagram: Nova and Manila drive a host that mounts CephFS with ceph.ko (native Ceph to the RADOS cluster) and passes it into the VM via KVM virtfs/9p]
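A rough sketch of the plumbing (the QEMU virtfs options are standard, but the paths, mount tag, and share identifiers are assumptions):

    # Host: mount the CephFS share.
    mount -t ceph $cephmonip:6789:/manila/$uuid /srv/shares/$uuid \
        -o name=manila,secretfile=/etc/ceph/manila.secret

    # Host: attach the directory to the guest over virtio-9p (other VM
    # options omitted; Nova/libvirt would generate the equivalent XML).
    qemu-system-x86_64 -m 2048 \
        -fsdev local,id=share0,path=/srv/shares/$uuid,security_model=passthrough \
        -device virtio-9p-pci,fsdev=share0,mount_tag=manila_share

    # Guest: mount the 9p export.
    mount -t 9p -o trans=virtio,version=9p2000.L manila_share /mnt/share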

27

KVM + NFS + CEPHFS.KO

● Model

– mount kernel CephFS on host

– pass-through to guest via NFS

● Pros

– security: tenant remains isolated from storage net + locked inside a directory

– NFS is more standard

● Cons

– NFS has weak caching consistency

– NFS is slower

● Status

– no prototype

[Diagram: Nova and Manila drive a host that mounts CephFS with ceph.ko and re-exports it to the VM over NFS on a private host/guest network]
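A rough sketch of the re-export variant (the export options and the private host/guest subnet are assumptions):

    # Host: mount the CephFS share, then export it only to the guest network.
    mount -t ceph $cephmonip:6789:/manila/$uuid /srv/shares/$uuid \
        -o name=manila,secretfile=/etc/ceph/manila.secret
    echo "/srv/shares/$uuid 192.168.122.0/24(rw,no_root_squash)" >> /etc/exports
    exportfs -ra

    # Guest: mount over NFS from the host's address on the private network.
    mount -t nfs 192.168.122.1:/srv/shares/$uuid /mnt/share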

28

(LXC, NOVA-DOCKER) + CEPHFS.KO

● Model

– host mounts CephFS directly

– mount --bind share into container namespace

● Pros

– best performance

– full CephFS semantics

● Cons

– rely on container for security

● Status

– no prototype

[Diagram: Nova and Manila drive a host that mounts CephFS with ceph.ko and bind-mounts the share into the container]

29

IRONIC + CEPHFS.KO

● Model

– mount CephFS directly from bare metal “guest”

● Pros

– best performance

– full feature set

● Cons

– rely on CephFS security

– networking?

– agent to do the mount?

● Status

– no prototype

– no suitable (ironic) agent (yet)

[Diagram: the bare-metal host mounts CephFS with ceph.ko directly from the RADOS cluster; Nova and Manila handle provisioning]

30

THE MOUNT PROBLEM

● Containers may break the current 'network fs' assumption

– mounting becomes driver-dependent; harder for tenant to do the right thing

● Nova “attach fs” API could provide the needed entry point

– KVM: qemu-guest-agent

– Ironic: no guest agent yet...

– containers (lxc, nova-docker): use mount --bind from host

● Or, make tenant do the final mount?

– Manila API to provide command (template) to perform the mount

● e.g., “mount -t ceph $cephmonip:/manila/$uuid $PATH -o ...”

– Nova lxc and docker

● bind share to a “dummy” device /dev/manila/$uuid
● API mount command is 'mount --bind /dev/manila/$uuid $PATH'
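The dummy-device scheme above, as a shell sketch (share path, UUID, and container image are placeholders; a bind mount inside the container also needs extra privileges such as CAP_SYS_ADMIN):

    # Host: mount the share, then start the container with it bound to the
    # "dummy" device path.
    mount -t ceph $cephmonip:6789:/manila/$uuid /srv/shares/$uuid \
        -o name=manila,secretfile=/etc/ceph/manila.secret
    docker run -d --cap-add SYS_ADMIN \
        -v /srv/shares/$uuid:/dev/manila/$uuid demo-app

    # Tenant, inside the container: run the mount command Manila hands back.
    mkdir -p /data
    mount --bind /dev/manila/$uuid /data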

31

SECURITY: NO FREE LUNCH

● (KVM, Ironic) + ceph.ko

– access to storage network relies on Ceph security

● KVM + (virtfs/9p, NFS) + ceph.ko

– better security, but

– pass-through/proxy limits performance

● (by how much?)

● Containers

– security (vs a VM) is weak at baseline, but

– host performs the mount; tenant locked into their share directory

32

PERFORMANCE

● 2 nodes

– Intel E5-2660

– 96GB RAM

– 10 GbE NIC

● Server

– 3 OSD (Intel S3500)

– 1 MON

– 1 MDS

● Client VMs

– 4 cores

– 2GB RAM

● iozone, 2x available RAM

● CephFS native

– VM ceph.ko → server

● CephFS 9p/virtfs

– VM 9p → host ceph.ko → server

● CephFS NFS

– VM NFS → server ceph.ko → server

33

SEQUENTIAL

34

RANDOM

35

SUMMARY MATRIX

                           performance  consistency  VM  gateway  net hops  security agent  mount agent  prototype

kvm + ganesha + libcephfs  slower (?)   weak (nfs)   X   X        2         host            X            X
kvm + virtfs + ceph.ko     good         good         X   X        1         host            X            X
kvm + nfs + ceph.ko        good         weak (nfs)   X   X        1         host            X
kvm + ceph.ko              better       best         X            1         ceph            X
lxc + ceph.ko              best         best                      1         ceph
nova-docker + ceph.ko      best         best                      1         ceph                         IBM talk - Thurs 9am
ironic + ceph.ko           best         best                      1         ceph            X            X

CONTAINER ORCHESTRATION

37

CONTAINERS ARE DIFFERENT

● nova-docker implements a Nova view of a (Docker) container

– treats container like a standalone system

– does not leverage most of what Docker has to offer

– Nova == IaaS abstraction

● Kubernetes is the new hotness

– higher-level orchestration for containers

– draws on years of Google experience running containers at scale

– vibrant open source community

38

KUBERNETES SHARED STORAGE

● Pure Kubernetes – no OpenStack

● Volume drivers

– Local

● hostPath, emptyDir

– Unshared

● iSCSI, GCEPersistentDisk, Amazon EBS, Ceph RBD – local fs on top of existing device

– Shared

● NFS, GlusterFS, Amazon EFS, CephFS

● Status

– Ceph drivers under review

● Finalizing model for secret storage, cluster parameters (e.g., mon IPs)

– Drivers expect pre-existing volumes

● recycled; missing REST API to create/destroy volumes
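For context, a pod spec using the CephFS volume driver that was under review would look roughly like this (field names follow the proposed plugin and may have changed; the monitor address and secret name are placeholders):

    # Create a pod that mounts a pre-existing CephFS share.
    kubectl create -f - <<'EOF'
    apiVersion: v1
    kind: Pod
    metadata:
      name: cephfs-demo
    spec:
      containers:
      - name: web
        image: nginx
        volumeMounts:
        - name: shared
          mountPath: /usr/share/nginx/html
      volumes:
      - name: shared
        cephfs:
          monitors:
          - 192.168.0.10:6789
          user: admin
          secretRef:
            name: ceph-secret
          readOnly: false
    EOF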

39

KUBERNETES ON OPENSTACK

● Provision Nova VMs

– KVM or ironic

– Atomic or CoreOS

● Kubernetes per tenant

● Provision storage devices

– Cinder for volumes

– Manila for shares

● Kubernetes binds into pod/container

● Status

– Prototype Cinder plugin for Kubernetes: https://github.com/spothanis/kubernetes/tree/cinder-vol-plugin

[Diagram: Nova provisions KVM instances running a Kube master (with a volume controller) and Kube nodes hosting nginx and mysql pods; Cinder and Manila provide the volumes and shares that Kubernetes binds into the pods]
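As an illustration of the provisioning half (standard Cinder, Manila, and Nova CLI; names and sizes are made up, and Kubernetes would then bind these volumes and shares into pods):

    # Per-tenant storage for the pods shown above.
    cinder create 10 --display-name kube-mysql-vol     # block volume for a mysql pod
    manila create NFS 1 --name kube-web-content        # shared filesystem for the nginx pods

    # Attach the Cinder volume to one of the Kube node VMs.
    nova volume-attach <kube-node-instance-id> <volume-id>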

40

WHAT NEXT?

● Ironic agent

– enable Cinder (and Manila?) on bare metal

– Cinder + Ironic

● 5:20p Wednesday (Cinder)

● Expand breadth of Manila drivers

– virtfs/9p, ceph-native, NFS proxy via host, etc.

– the last mile is not always the tenant network!

● Nova “attach fs” API (or equivalent)

– simplify tenant experience

– paper over VM vs container vs bare metal differences

THANK YOU!

Sage Weil – CEPH PRINCIPAL ARCHITECT

Haomai Wang – FREE AGENT

[email protected]
[email protected]

@liewegas

42

FOR MORE INFORMATION

● http://ceph.com

● http://github.com/ceph

● http://tracker.ceph.com

● Mailing lists

[email protected]

[email protected]

● irc.oftc.net

– #ceph

– #ceph-devel

● Twitter

– @ceph

