Accumulo Summit 2014: Dynamically Scaling Accumulo using Docker

Docker + Accumulo = <3

Dynamically Scaling Accumulo with Docker

Cloud Database

Currently there are two primary methods for delivering a Database as a Service (DBaaS):

● Virtual Machine: Cloud platforms allow users to purchase virtual machine (VM) instances for a limited time. It is possible to run a database on these virtual machines. Users can either upload their own machine image with a database installed on it, or use ready-made machine images that already include an optimized installation of a database.

● DBaaS: In this configuration, application owners do not have to install and maintain the database on their own. Instead, the database service provider takes responsibility for installing and maintaining the database, and application owners pay according to their usage.

DBaaS

Pros
● All users hit the same API, so upgrades to the backend database should not be noticed by users
● Administration is easier, as administrators only have to maintain a single version of the database

Cons
● During spikes, resources can be scarce, as analytics cannot be run during heavy query times

Virtualized DB

Pros
● Allows users to take full advantage of the database's features without stomping over each other's resources
● Easier for the underlying database, as no new code needs to be created to support it

Cons
● Some resources are wasted: a database that is not hit very often still holds resources while it idles
● Support staff must be more familiar with different versions of the DB

What is a Linux container?

● Inception: Linux running within Linux
● Provides resource isolation (CPU, network, disk I/O, and RAM) and namespacing
● Looks like a VM from the inside of a container
● Looks like a process from the outside of a container

Why Docker?

● Docker provides portable deployment across machines by providing a mechanism to bundle an application and all its dependencies into a single format

● Docker has a built-in versioning system which is similar to git

● Docker provides component reuse, as any Docker image can be used as a "base image"

● Docker images can be shared through a public repository (http://index.docker.io/)

Networking in Docker

To have two Docker containers talk to each other, use the --link flag of docker run:

    sudo docker run -d -P --name web --link db:db training/webapp python app.py

This command creates a secure tunnel between the two containers without having them expose ports to the host system. It passes the connection details to the web container in two ways:

● Setting environment variables

● Updating /etc/hosts

Links only work between containers which are running on the same host.
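As a rough illustration of the environment-variable mechanism, a consumer inside the web container could look up the linked db container as sketched below. Docker's legacy links inject variables named after the link alias and the exposed port (for example DB_PORT_5432_TCP_ADDR); the helper name linked_service and the port 5432 are assumptions for this example, not something from the talk.

```python
import os


def linked_service(alias, port, env=None):
    """Return (host, port) for a linked container.

    Reads the variables that a legacy Docker link injects, e.g.
    DB_PORT_5432_TCP_ADDR and DB_PORT_5432_TCP_PORT for alias "db".
    """
    env = os.environ if env is None else env
    prefix = f"{alias.upper()}_PORT_{port}_TCP"
    addr = env.get(f"{prefix}_ADDR")
    if addr is None:
        # No link variables found: fall back to the alias itself, which
        # the link also writes into /etc/hosts inside the container.
        return alias, port
    return addr, int(env.get(f"{prefix}_PORT", port))


if __name__ == "__main__":
    # Port 5432 is only an example; use whatever port the db container exposes.
    print(linked_service("db", 5432))
```

Falling back to the alias works because the same link also adds a hosts entry for it, so either mechanism resolves to the same container.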

MultiHost Networking

There are numerous ways to link Docker containers across multiple hosts, including VPNs and bridges. One such method is the Docker ambassador pattern. An ambassador is a container that sits between two containers and takes care of the talking, so that containers can move between hosts without the application ever having to restart; only the ambassador container has to change. It looks like:

(consumer) --> (redis-ambassador) ---network---> (redis-ambassador) --> (redis)
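To make the ambassador's role concrete, here is a minimal sketch of what such a container does internally: accept local connections and relay the bytes to wherever the real service currently lives. This is an illustrative stand-in written for this document, not the code of any particular ambassador image; the function names and the loopback bind address are assumptions.

```python
import socket
import threading


def pipe(src, dst):
    """Copy bytes from src to dst until src closes, then half-close dst."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def handle(client, target):
    """Relay one client connection to the target (host, port)."""
    upstream = socket.create_connection(target)
    forward = threading.Thread(target=pipe, args=(client, upstream), daemon=True)
    forward.start()
    pipe(upstream, client)
    forward.join()
    client.close()
    upstream.close()


def start_ambassador(listen_port, target, host="127.0.0.1"):
    """Listen on host:listen_port and forward every connection to target."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, listen_port))
    server.listen()

    def accept_loop():
        while True:
            try:
                client, _ = server.accept()
            except OSError:
                return  # listening socket was closed
            threading.Thread(target=handle, args=(client, target), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return server
```

When the service moves to another host, only the ambassador is restarted with a new target; the consumer keeps talking to the same local address.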

Accumulo on Docker

● Inspired by Slider (formerly Hoya) to allow spinning up databases on the fly (check out: https://github.com/apache/incubator-slider)

● Also inspired by https://bitbucket.org/fourtwosix/bdaassrc

● Allows users to spin up their own Accumulo for their own application while sharing a global HDFS and ZooKeeper

● Currently allows scaling by adding more tablet servers manually
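The manual scaling step could be scripted roughly as follows. This sketch only builds the docker run command lines for additional tablet server containers that point at the shared ZooKeeper and HDFS; the image name example/accumulo-tserver and the environment variable names are hypothetical placeholders, since the talk does not specify how its containers are configured.

```python
def tserver_run_command(app, index, zookeepers, hdfs_uri,
                        image="example/accumulo-tserver"):
    """Build a `docker run` command for one tablet server container.

    The image name and the environment variable names are hypothetical.
    """
    name = f"{app}-tserver-{index}"
    return [
        "docker", "run", "-d",
        "--name", name,
        "-e", f"ZOOKEEPERS={zookeepers}",    # shared ZooKeeper quorum
        "-e", f"HDFS_URI={hdfs_uri}",        # shared HDFS namenode
        "-e", f"ACCUMULO_INSTANCE={app}",    # per-application instance
        image,
    ]


def scale_up(app, current, target, zookeepers, hdfs_uri):
    """Return the commands needed to grow from `current` to `target` servers."""
    return [
        tserver_run_command(app, i, zookeepers, hdfs_uri)
        for i in range(current, target)
    ]


if __name__ == "__main__":
    for cmd in scale_up("app1", 2, 4, "zk1:2181", "hdfs://namenode:8020"):
        print(" ".join(cmd))
```

Returning the commands rather than executing them keeps the sketch side-effect free; an operator could pass each list to subprocess.run once the names match a real image.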

Advantages

● Security: keeping data separate is simpler, as users literally have their own copy of the database; users do not risk data slippage by sharing the same system (nobody knows what a particular iterator is going to spit out in a log)

● Monitoring: a CIO/Program Director can easily monitor which databases are getting hit the hardest and make sure that those DBs are allocated more hardware

Advantages (ctd)

● Docker makes it trivial to back up snapshots of the running database servers

● Allows users to configure the database however they choose, which means users can have their own scheme for how they want to do compactions

● Users can figure out the peak times at which their data is being hit and can schedule analytic jobs to run during off-peak hours

Advantages (ctd)

● Allows applications to maintain different versions of their databases; with the work on ACCUMULO-378, users can possibly create two copies of their databases, one for analytics and one for real-time query

● Lets administrators kill off application databases which are behaving badly without affecting all the applications running on the system

Disadvantages

● Administrators potentially have to understand different versions of the database, as multiple versions can coexist on the system

● With the HDFS permission scheme, user and group management becomes difficult

● Port allocation becomes a bit tricky, and iptables rules may become a bit unruly
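One way to keep port allocation manageable is to give every Accumulo instance its own fixed offset from a set of base ports. The sketch below assumes the commonly cited default client ports (9997 for tablet servers, 9999 for the master, 50095 for the monitor) and a stride of 100 per instance; both are assumptions to check against the actual configuration, and the function names are made up for this example.

```python
# Assumed default container-side ports; verify against your accumulo-site.xml.
BASE_PORTS = {"tserver": 9997, "master": 9999, "monitor": 50095}


def allocate_ports(instance_index, stride=100):
    """Map each service to a host port unique to this instance."""
    offset = instance_index * stride
    return {service: port + offset for service, port in BASE_PORTS.items()}


def publish_flags(instance_index):
    """Build the `-p host:container` flags for `docker run`."""
    flags = []
    for service, host_port in sorted(allocate_ports(instance_index).items()):
        flags += ["-p", f"{host_port}:{BASE_PORTS[service]}"]
    return flags


if __name__ == "__main__":
    for i in range(3):
        print(i, " ".join(publish_flags(i)))
```

Because the host ports follow a formula, firewall rules can be written per instance range rather than per container.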

Accumulo: Built To Scale

With Accumulo 1.5 and beyond, even more smarts have been built in to make scaling easier:

● Load balancing is built into the master, which recognizes when a tablet server dies

● Iterators are now stored in HDFS, so they do not have to be pushed to every machine

● Accumulo allows for multiple masters, which makes failover better

● The Accumulo WAL is stored in HDFS, which also makes it easier to scale, as tablet servers can read the logs from one location in HDFS

Future Improvements

● Tie this into Slider: allow YARN to be the resource allocator and Docker to be the container that things are deployed to (see YARN-1964)

● Use the JMX statistics that the Accumulo monitor gets to dynamically scale tablet servers up and down based on load

● Add container creation and deletion via Ambari and Cloudera Manager
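The load-based scaling idea could start from a decision function like the one sketched here. It assumes some collector has already turned the monitor's statistics into a per-tablet-server ingest rate; the metric choice, the target rate per server, and the bounds are all hypothetical tuning values rather than anything measured or proposed in the talk.

```python
import math


def desired_tservers(ingest_rates, target_per_server,
                     min_servers=1, max_servers=20):
    """Return how many tablet servers the current load calls for.

    ingest_rates: entries/sec observed on each running tablet server.
    target_per_server: entries/sec one server should comfortably handle.
    """
    total = sum(ingest_rates)
    needed = math.ceil(total / target_per_server) if total > 0 else min_servers
    # Clamp so the cluster never shrinks to zero or grows without bound.
    return max(min_servers, min(max_servers, needed))


if __name__ == "__main__":
    current = [900, 1100, 1000]
    print(desired_tservers(current, target_per_server=500))
```

A controller would compare this number with the running container count and start or stop tablet server containers to close the gap.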

Future Improvements (ctd)

● Make a GUI to make deploying containers and health monitoring of the system easier

● Make a GUI to view system health and to see what databases are deployed onto the system

● Add HBase support

Questions (???)

Date post:	01-Dec-2014
Category:	Technology
Upload:	accumulo-summit