Who am I?
● Wido den Hollander
  – Part of the Ceph community since 2010
  – Co-owner of a Dutch hosting company
  – Committer and PMC member for Apache CloudStack
● Developed:
  – phprados
  – rados-java
  – libvirt RBD storage pool support
  – CloudStack integration
● Work as a Ceph and CloudStack consultant
Ceph
Ceph is a unified, open source distributed object store
So why Ceph?
● As Sage already explained... :-)
● Traditional storage systems don't scale that well
  – All have their limitations: number of disks, shelves, CPUs, network connections, etc.
  – Scaling usually meant buying a second system
● Migrating data requires service windows and watching rsync...
● Ceph clusters can grow and shrink without service interruptions
● Ceph runs on commodity hardware
  – Just add more nodes to add capacity
  – Ceph fits in smaller budgets
Hardware failure is the rule
● As systems grow, hardware failure becomes more frequent
  – A system with 1,000 nodes will see daily hardware issues
● Commodity hardware is cheaper, but less reliable; Ceph mitigates that
RBD: the RADOS Block Device
● Ceph is an object store
  – Store billions of objects in pools
  – RADOS is the heart of Ceph
● RBD block devices are striped over RADOS objects
  – Default stripe size is 4MB
  – All objects are distributed over all available Object Storage Daemons (OSDs)
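To make the striping concrete, here is a minimal sketch of how a byte offset inside an RBD image maps to the 4MB RADOS object that backs it. Only the 4MB default stripe size comes from the slide above; the class and helper names are made up for illustration.

// Illustrative sketch only: maps an offset within an RBD image to the index
// of the backing 4MB RADOS object. Names here are hypothetical.
public class RbdStripingSketch {
    static final long OBJECT_SIZE = 4L * 1024 * 1024; // 4MB default stripe size

    // Index of the RADOS object that contains this offset
    static long objectIndex(long offset) {
        return offset / OBJECT_SIZE;
    }

    // Position of the offset inside that object
    static long offsetInObject(long offset) {
        return offset % OBJECT_SIZE;
    }

    public static void main(String[] args) {
        long offset = 10L * 1024 * 1024 + 123; // 10MB + 123 bytes into the image
        System.out.printf("object #%d, byte %d within it%n",
                objectIndex(offset), offsetInObject(offset));
        // A 40GB image ends up as 10,240 such objects, which RADOS then
        // distributes across all available OSDs.
    }
}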
RBD for Primary Storage
● In CloudStack 4.0, RBD support for Primary Storage for KVM was added
  – No support for VMware or Xen
  – Xen support is being worked on (not by me)
● Live migration is supported
● Snapshot and backup support (4.2)
● Cloning when deploying from templates
● Run System VMs from RBD (4.2)
● Uses the rados-java bindings
RBD for Primary Storage
Storage flow
System Virtual Machines
● Perform cluster tasks, e.g.:
  – DHCP
  – Serving metadata to Instances
  – Load balancing
  – Copying data between clusters
  – Run alongside user Instances
● They can now run from RBD due to a change in the way they get their metadata
  – The old way was dirty and had to be replaced
    ● It created a small disk with metadata files
Performance
● Parallel performance is high
  – Run 100s or 1000s of Instances with 200 IOps each
● Serial performance is moderate
  – A single Instance won't get 20k IOps
rados-java bindings
● Developed to have the KVM Agent perform snapshotting and cloning
  – libvirt doesn't know how to do this, but it would be best if it did
● Uses JNA, so easy deployment
● Binds both librados and librbd
● Available on github.com/ceph/rados-java
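As a rough idea of what the KVM Agent does through these bindings, here is a minimal sketch of connecting to a cluster and snapshotting an RBD image. The classes mirror librados and librbd as rados-java does, but exact method signatures may differ between versions, and the pool, image, and snapshot names are made up.

import com.ceph.rados.Rados;
import com.ceph.rados.IoCTX;
import com.ceph.rbd.Rbd;
import com.ceph.rbd.RbdImage;

import java.io.File;

// Sketch: connect to Ceph and snapshot an RBD image via rados-java, roughly
// what the CloudStack KVM Agent does. Check the bindings for exact signatures.
public class RadosJavaSketch {
    public static void main(String[] args) throws Exception {
        Rados rados = new Rados("admin");                    // cephx user id
        rados.confReadFile(new File("/etc/ceph/ceph.conf")); // cluster config
        rados.connect();

        IoCTX io = rados.ioCtxCreate("cloudstack");          // pool name (assumed)
        try {
            Rbd rbd = new Rbd(io);
            RbdImage image = rbd.open("instance-volume-1");  // image name (assumed)
            try {
                image.snapCreate("backup-snap");             // take a snapshot
            } finally {
                rbd.close(image);
            }
        } finally {
            rados.ioCtxDestroy(io);
        }
    }
}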
Future plans
● Add RBD write caching
  – Write-cache setting per Disk Offering
    ● none (default), write-back and write-through
  – Probably in 4.3
● Native RADOS support for Secondary Storage
  – Secondary Storage already supports S3
  – Ceph has an S3-compatible gateway
● Moving logic from the KVM Agent into libvirt
  – Like snapshotting and cloning RBD images
Help is needed!
● Code is tested, but testing is always welcome
● Adding more RBD logic into libvirt
  – Snapshotting RBD images
  – Cloning RBD images
  – This makes the CloudStack code cleaner and helps other users who also use libvirt with RBD
● Improving the rados-java bindings
  – Not feature complete yet
Thanks
● Find me on:
  – E-Mail: [email protected]
  – IRC: widodh @ Freenode / wido @ OFTC
  – Skype: widodh / contact42on
  – Twitter: widodh