
Accumulo Summit 2014: Dynamically Scaling Accumulo using Docker

Date post: 01-Dec-2014
Upload: accumulo-summit
Description:
Speaker: Sapan "Soup" Shah. As a whole, the community buys a lot of hardware, and currently we run Accumulo in a very static context: users provision servers up front, and many applications share the same database. While newer versions of Accumulo add more features for isolation, we take a slightly different approach. We use Docker to provision new databases, let all the databases talk on a “local” network, and share a ZooKeeper/HDFS cluster. What makes this solution even more attractive is the ability to dynamically spin up and, even better, spin down tablet servers as the database goes through peak load. Another advantage of this approach is that users can deploy iterators into this environment with little fear that someone else’s iterator will take down their Accumulo. In the future, we would like to hook into Accumulo even more, using the JMX messages that the monitor currently uses to gather statistics.
Transcript
Page 1: Accumulo Summit 2014: Dynamically Scaling Accumulo using Docker

Docker + Accumulo = <3
Dynamically Scaling Accumulo with Docker

Page 2: Accumulo Summit 2014: Dynamically Scaling Accumulo using Docker

Cloud Database

Currently there are two primary methods for running a database in the cloud:

● Virtual Machine: Cloud platforms allow users to purchase virtual machine (VM) instances for a limited time. It is possible to run a database on these virtual machines. Users can either upload their own machine image with a database installed on it, or use ready-made machine images that already include an optimized installation of a database.

● DBaaS (Database as a Service): In this configuration, application owners do not have to install and maintain the database on their own. Instead, the database service provider takes responsibility for installing and maintaining the database, and application owners pay according to their usage.

Page 3: Accumulo Summit 2014: Dynamically Scaling Accumulo using Docker

DBaaS

Pros
● All users hit the same API, so upgrades to the backend database should not be noticed by users
● Administration is easier, as administrators only have to maintain a single version of the database

Cons
● During spikes resources can be scarce, as analytics cannot be run during heavy query times

Page 4: Accumulo Summit 2014: Dynamically Scaling Accumulo using Docker

Virtualized DB

Pros
● Allows users to take full advantage of the database's features without stomping over each other’s resources
● Easier for the underlying database, as no new code needs to be created to support it

Cons
● Some resources are wasted: when a database is not hit very often, its resources sit idle
● Support staff must be more familiar with different versions of the DB

Page 5: Accumulo Summit 2014: Dynamically Scaling Accumulo using Docker

What is a Linux container?
● Inception: Linux running within Linux
● Provides resource isolation (CPU, Network, Disk I/O, and RAM) and namespacing
● Looks like a VM on the inside of a container
● Looks like a process on the outside of a container

Page 6: Accumulo Summit 2014: Dynamically Scaling Accumulo using Docker

Why Docker?
● Docker provides portable deployment across machines, by providing a mechanism to bundle an application and all its dependencies into a single format

● Docker has a built-in versioning system which is similar to git

● Docker provides component reuse, as any Docker image can be used as a “base image”

● Docker images can be shared through a public repository (http://index.docker.io/)

Page 7: Accumulo Summit 2014: Dynamically Scaling Accumulo using Docker

Networking in Docker

To have two Docker containers talk to each other, simply use the --link flag of docker run:

sudo docker run -d -P --name web --link db:db training/webapp python app.py

This command creates a secure tunnel between the two containers without exposing their ports externally. The connection information reaches the recipient container in two ways:

● Using Environment Variables

● Updating /etc/hosts

Links only work between containers which are running on the same host
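
As a rough illustration of the environment-variable mechanism, the sketch below reads the address of a linked container from inside the recipient container. It assumes the legacy link convention in which an alias "db" produces a variable named DB_PORT holding a URL such as tcp://172.17.0.5:5432; the helper name linked_service and the sample address are made up for this example.

```python
import os
from urllib.parse import urlparse


def linked_service(alias, environ=None):
    """Return (host, port) for a linked container, read from <ALIAS>_PORT."""
    environ = os.environ if environ is None else environ
    # Assumption: a link with alias "db" exports DB_PORT=tcp://<ip>:<port>.
    url = environ.get(alias.upper() + "_PORT")
    if url is None:
        raise KeyError("no link variables found for alias %r" % alias)
    parsed = urlparse(url)
    return parsed.hostname, parsed.port


if __name__ == "__main__":
    # Sample values only; inside a linked container Docker would set these.
    sample_env = {"DB_PORT": "tcp://172.17.0.5:5432"}
    print(linked_service("db", sample_env))
```

An application started with --link db:db could call linked_service("db") at startup instead of hard-coding the database address.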

Page 8: Accumulo Summit 2014: Dynamically Scaling Accumulo using Docker

MultiHost Networking

There are numerous ways to link Docker containers across multiple hosts, including VPNs and bridges. The method we will apply is the Docker ambassador pattern. An ambassador is a container that sits between two containers and takes care of the communication between them. Containers can then move between hosts without the application ever having to restart on a change; only the ambassador container does. It looks like:

(consumer) --> (redis-ambassador) ---network---> (redis-ambassador) --> (redis)
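
To make the role of the ambassador concrete, here is a minimal sketch of what such a container does internally: it accepts local connections and relays the bytes to wherever the real service currently lives. This is a plain TCP relay written for illustration, not the implementation used in the talk; the function names and addresses are assumptions.

```python
import socket
import threading


def pipe(src, dst):
    """Copy bytes from src to dst until src closes, then half-close dst."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def handle(client, upstream_addr):
    """Relay one client connection to the upstream service in both directions."""
    upstream = socket.create_connection(upstream_addr)
    forward = threading.Thread(target=pipe, args=(client, upstream), daemon=True)
    forward.start()
    pipe(upstream, client)
    forward.join()
    client.close()
    upstream.close()


def serve_ambassador(listen_addr, upstream_addr):
    """Start relaying in background threads and return the listening socket."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(listen_addr)
    server.listen(5)

    def accept_loop():
        while True:
            try:
                client, _ = server.accept()
            except OSError:
                return  # the listening socket was closed
            threading.Thread(
                target=handle, args=(client, upstream_addr), daemon=True
            ).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return server
```

The consumer always connects to the ambassador's fixed local address. When the real service moves to another host, only the ambassador is restarted with a new upstream_addr, which is the property the pattern is chosen for.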

Page 9: Accumulo Summit 2014: Dynamically Scaling Accumulo using Docker

Accumulo on Docker
● Inspired by Slider (formerly Hoya) to allow spinning up databases on the fly (check out: https://github.com/apache/incubator-slider)

● Also inspired by https://bitbucket.org/fourtwosix/bdaassrc
● Allows users to spin up their own Accumulo for their own application, while sharing a global HDFS and ZooKeeper
● Currently allows scaling by adding more tablet servers manually
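
One way to picture several Accumulo instances sharing a single HDFS and ZooKeeper is that each database gets its own site configuration pointing at the shared services but at a separate HDFS directory. The sketch below generates such a configuration. The property names instance.zookeeper.host, instance.dfs.uri, and instance.dfs.dir are taken from Accumulo 1.5-era configuration as best I recall and should be checked against the release in use; the directory layout, host names, and function names are assumptions, and this is not the tooling shown in the talk.

```python
from xml.sax.saxutils import escape


def site_properties(db_name, zookeepers, namenode_uri):
    """Build accumulo-site.xml properties for one application database."""
    return {
        # The same ZooKeeper quorum is handed to every database.
        "instance.zookeeper.host": ",".join(zookeepers),
        # The same HDFS, but a separate directory per database
        # (hypothetical layout chosen for this example).
        "instance.dfs.uri": namenode_uri,
        "instance.dfs.dir": "/accumulo/" + db_name,
    }


def render_site_xml(props):
    """Render a property dict in Hadoop-style configuration XML."""
    lines = ["<configuration>"]
    for name in sorted(props):
        lines.append("  <property>")
        lines.append("    <name>%s</name>" % escape(name))
        lines.append("    <value>%s</value>" % escape(props[name]))
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    props = site_properties("app1", ["zk1:2181", "zk2:2181"], "hdfs://namenode:8020")
    print(render_site_xml(props))
```

Each container would then be started with its own rendered file, so two applications never write into the same HDFS directory.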

Page 10: Accumulo Summit 2014: Dynamically Scaling Accumulo using Docker

Advantages
● Security between data is simpler, as users literally have their own version of the database; users do not risk data spillage by sharing the same system (nobody knows what a particular iterator is going to spit out in a log)

● Monitoring: a CIO/Program Director can easily monitor which databases are getting hit the hardest and make sure that those DBs are allocated more hardware

Page 11: Accumulo Summit 2014: Dynamically Scaling Accumulo using Docker

Advantages (ctd)
● Docker makes it trivial to back up snapshots of the running database servers
● Allows users to configure the database however they choose, which means users can have their own scheme for how they want to do compactions

● Users can figure out the peak times at which their data is being hit and can schedule analytic jobs to run during off-peak hours

Page 12: Accumulo Summit 2014: Dynamically Scaling Accumulo using Docker

Advantages (ctd)
● Allows applications to maintain different versions of their databases, and with the work on ACCUMULO-378 users can possibly create two copies of their databases: one for analytics and one for real-time query

● Lets administrators kill off application databases which are behaving badly without having to affect all the applications running on the system

Page 13: Accumulo Summit 2014: Dynamically Scaling Accumulo using Docker

Disadvantages
● Administrators potentially have to understand different versions of the database, as several can coexist on the system

● With the HDFS permission scheme, user and group management becomes difficult

● Port allocation becomes a bit tricky, and iptables rules may become a bit unruly
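
One simple way to keep port allocation manageable is to give every database a fixed, non-overlapping block of host ports, so that firewall rules can be written per block rather than per service. The sketch below is only an illustration of that idea; the service list, base port, and block size are assumptions, not values from the talk.

```python
# Illustrative list of per-database services that need a host port.
SERVICES = ("master", "tserver", "monitor", "gc")


def port_block(db_index, base=20000, block_size=10):
    """Return a service-to-host-port mapping for the db_index-th database."""
    if db_index < 0:
        raise ValueError("db_index must be non-negative")
    if len(SERVICES) > block_size:
        raise ValueError("block_size is too small for the service list")
    start = base + db_index * block_size
    if start + block_size - 1 > 65535:
        raise ValueError("port block exceeds the valid port range")
    # Services take consecutive ports at the start of the block;
    # the remainder of the block is left free for later additions.
    return {service: start + offset for offset, service in enumerate(SERVICES)}


if __name__ == "__main__":
    for index in range(3):
        print(index, port_block(index))
```

With this scheme, database 0 owns ports 20000 to 20009 and database 1 owns 20010 to 20019, so a single port-range rule covers each database.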

Page 14: Accumulo Summit 2014: Dynamically Scaling Accumulo using Docker

Accumulo: Built To Scale

With Accumulo 1.5 and beyond, even more smarts have been built in to make scaling easier

● Load balancing is built into the master, which recognizes when a tablet server dies

● Iterators are now stored in HDFS, so they do not have to be pushed to every machine

● Accumulo allows for multiple masters, which makes failover better

● The Accumulo WAL is stored in HDFS, which also makes it easier to scale, as tablet servers can read the logs from one location in HDFS

Page 15: Accumulo Summit 2014: Dynamically Scaling Accumulo using Docker

Future Improvements
● Tie this into Slider, allowing YARN to be the resource allocator and Docker to be the container that things are deployed to (see YARN-1964)

● Use the JMX statistics that the Accumulo monitor gets to dynamically scale tablet servers up and down based on load

● Add container creation and deletion via Ambari and Cloudera Manager
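
The load-based scaling idea could be driven by a small decision function like the sketch below. It assumes that some collector (for example one reading the monitor's JMX statistics) has already reduced each tablet server's load to a utilization number between 0 and 1; the thresholds, limits, and function name are hypothetical and would need tuning against a real cluster.

```python
import math


def desired_tablet_servers(loads, target=0.6, low=0.3, high=0.8,
                           min_servers=1, max_servers=20):
    """Return how many tablet servers to run, given per-server utilization."""
    current = len(loads)
    if current == 0:
        return min_servers
    average = sum(loads) / current
    # Inside the comfort band: leave the cluster alone to avoid flapping.
    if low <= average <= high:
        return current
    # Otherwise size the cluster so the average utilization lands near target.
    wanted = math.ceil(sum(loads) / target)
    return max(min_servers, min(max_servers, wanted))


if __name__ == "__main__":
    print(desired_tablet_servers([0.9, 0.95, 0.85]))     # overloaded cluster
    print(desired_tablet_servers([0.1, 0.1, 0.1, 0.1]))  # mostly idle cluster
```

A controller would call this periodically and start or stop tablet server containers until the running count matches the returned value. The band between low and high keeps small fluctuations from triggering a change.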

Page 16: Accumulo Summit 2014: Dynamically Scaling Accumulo using Docker

Future Improvements (ctd)
● Make a GUI to make deploying containers and health monitoring of the system easier
● Make a GUI to view system health and to see what databases are deployed onto the system
● Add HBase support

Page 17: Accumulo Summit 2014: Dynamically Scaling Accumulo using Docker

Questions (???)

