Big Data in a public cloud
Maximize RAM without paying for overhead CPU, and other price/performance tricks
Big Data
• Big Data: just another way of saying large data sets
• They are so large that it's difficult to manage them with traditional tools
• Distributed computing is one approach to solving that problem
• First the data needs to be Mapped
• Then it can be analyzed – or Reduced
MapReduce
• Instead of talking about it, let's just see some code
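The code from the original slide isn't preserved in this transcript. As a stand-in, here is a minimal word-count sketch in the Hadoop Streaming style (the file names mapper.py and reducer.py are illustrative): the Map step emits a count of 1 per word, and the Reduce step sums the counts per word.

```python
# mapper.py – the Map step: emit one "word<TAB>1" line per word.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
# reducer.py – the Reduce step: sum the counts per word.
# Hadoop Streaming sorts mapper output by key before the reducer runs,
# so all lines for a given word arrive contiguously.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rsplit("\t", 1)
    if word != current_word:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, 0
    current_count += int(count)
if current_word is not None:
    print(f"{current_word}\t{current_count}")
```

Because each mapper and reducer only sees its own slice of the data, the framework can fan the work out across as many machines as you can rent – which is the point of the next slide.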
Map/Reduce – distributed work
But that’s perfect for public cloud!
CPU / RAM
• Mapping is not CPU intensive
• Reducing is (usually) not CPU intensive
• Speed: load the data into RAM, or every pass hits the HDD and creates iowait (see the sketch below)
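A quick way to feel the difference – a purely illustrative sketch that times a scan over a file on disk against a lookup in an in-memory dict built from the same data. The file name and record count are made up, and a warm OS page cache narrows the gap, but the shape of the result is the point: the RAM path does no I/O at all.

```python
import time

PATH = "records.dat"  # illustrative file of tab-separated records

# One-time setup: write some dummy records to disk.
with open(PATH, "w") as f:
    for i in range(1_000_000):
        f.write(f"key{i}\tvalue{i}\n")

# Disk path: scan the file for one key (incurs read I/O, hence iowait;
# note the OS page cache may hide this on a second run).
start = time.perf_counter()
with open(PATH) as f:
    hit = next(line for line in f if line.startswith("key999999\t"))
print("disk scan:", time.perf_counter() - start, "s")

# RAM path: load once, then every lookup is pure CPU + memory.
table = {}
with open(PATH) as f:
    for line in f:
        k, v = line.rstrip("\n").split("\t")
        table[k] = v

start = time.perf_counter()
hit = table["key999999"]
print("RAM lookup:", time.perf_counter() - start, "s")
```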
First point – a lot of RAM
Ah…now they have RAM instances
I wanted more RAM, I said :(
What about MapReduce PaaS
• Be aware of data lock-in
• Be aware of the forced tool set – it limits your workflow
• …and as a developer I just don't like it that much; it's a control thing
There is another way
What we do
What is compute like – why commodity?
Now let's get to work
• Chef & Puppet – deploy and config
• Clone; scale up and down (see the sketch after this list)
• Problem solved?
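What that looks like in practice – a sketch, assuming a hypothetical REST provisioning API (the endpoint, payload fields, and GOLDEN_IMAGE_ID below are placeholders, not any vendor's actual API). The golden image is cloned for each new worker and Puppet takes over configuration on first boot.

```python
import requests  # third-party HTTP client

API = "https://api.example-cloud.com/v1"   # hypothetical endpoint
AUTH = ("user", "secret")                  # placeholder credentials
GOLDEN_IMAGE_ID = "golden-worker-image"    # pre-built image with the tool set

def clone_worker(ram_gb: int, cpu_mhz: int) -> str:
    """Clone the golden image into a new worker and return its server id.
    With unbundled resources we can buy lots of RAM and little CPU."""
    resp = requests.post(f"{API}/servers", auth=AUTH, json={
        "clone_of": GOLDEN_IMAGE_ID,
        "ram_gb": ram_gb,
        "cpu_mhz": cpu_mhz,
        # first-boot hook hands configuration off to Puppet
        "user_data": "#!/bin/sh\npuppet agent -t\n",
    })
    resp.raise_for_status()
    return resp.json()["id"]

def scale_to(target: int, worker_ids: list[str]) -> list[str]:
    """Scale the worker pool up or down to `target` nodes."""
    while len(worker_ids) < target:
        worker_ids.append(clone_worker(ram_gb=64, cpu_mhz=2000))
    while len(worker_ids) > target:
        victim = worker_ids.pop()
        requests.delete(f"{API}/servers/{victim}", auth=AUTH).raise_for_status()
    return worker_ids
```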
And then there is reality…iowait
ioWAIT
• 1,000 IOPS on a high-end enterprise HDD
• 500,000 IOPS on a high-end SSD
• Running a public cloud on HDDs should be forbidden
Announcing: SSD priced as HDD
Who runs these workloads, and how?
• CERN – the LHC experiments to find the Higgs Boson
• 200 Petabytes per year
• 200,000 CPU days / day on hundreds of partner grids and clouds
• Monte Carlo simulations (CPU intensive) – see the sketch below
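Monte Carlo methods are embarrassingly parallel, which is why they fit a federation of grids and clouds so well: each worker runs independent random trials and only tiny aggregate counts travel back. A minimal, purely illustrative example of this class of workload (not CERN's actual code) is estimating pi by random sampling:

```python
import random

def monte_carlo_pi(trials: int) -> float:
    """Estimate pi: the fraction of random points in the unit square
    that fall inside the quarter circle approaches pi/4."""
    inside = sum(
        1 for _ in range(trials)
        if random.random() ** 2 + random.random() ** 2 <= 1.0
    )
    return 4.0 * inside / trials

# Each worker node runs its trials independently; merging results
# across nodes is just summing (inside, trials) pairs.
print(monte_carlo_pi(1_000_000))
```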
The CERN case
• Golden VM images with tools, from which they clone
• A set of coordinator servers, which scale the worker nodes up and down via the provisioning API (clone, Puppet config)
• The coordinator servers manage the workload on each worker node running Monte Carlo simulations
• Federated cloud brokers such as Enstratus and Slipstream
• Self-healing architecture – the uptime of a worker node is not critical; just spin up a new worker node instance (sketched below)
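The self-healing point boils down to a control loop on the coordinator: don't repair a dead worker, replace it. A sketch reusing the hypothetical API, AUTH, and clone_worker names from the scaling sketch earlier (the heartbeat endpoint is likewise an assumption):

```python
import time
import requests

API = "https://api.example-cloud.com/v1"   # hypothetical endpoint (as before)
AUTH = ("user", "secret")                  # placeholder credentials

def is_alive(server_id: str) -> bool:
    """Hypothetical health check: poll the worker's heartbeat URL."""
    try:
        r = requests.get(f"{API}/servers/{server_id}/heartbeat",
                         auth=AUTH, timeout=5)
        return r.ok
    except requests.RequestException:
        return False

def self_heal(worker_ids: list[str]) -> None:
    """Replace dead workers rather than repairing them; no single
    node's uptime matters, just spin up a replacement."""
    while True:
        for i, wid in enumerate(worker_ids):
            if not is_alive(wid):
                requests.delete(f"{API}/servers/{wid}", auth=AUTH)
                worker_ids[i] = clone_worker(ram_gb=64, cpu_mhz=2000)
        time.sleep(30)
```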
The loyalty program case
• The customer runs various loyalty programs
• Plain vanilla Hadoop Map/Reduce (a launch sketch follows this list):
• Chef and Puppet to deploy and configure worker nodes
• Lots of RAM, little CPU
• Self-healing architecture – the uptime of a worker node is not critical; just spin up a new worker node instance
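Tying it back to the word-count sketch earlier: a plain-vanilla job like this is launched with Hadoop's stock streaming jar, passing the mapper and reducer scripts. The jar location and HDFS paths below are illustrative and vary by distribution.

```python
import subprocess

# Launch the earlier mapper.py/reducer.py as a Hadoop Streaming job.
# Jar path and HDFS paths are illustrative; adjust to your install.
subprocess.run([
    "hadoop", "jar", "/usr/lib/hadoop/hadoop-streaming.jar",
    "-input", "/data/loyalty/events",      # HDFS input (illustrative)
    "-output", "/data/loyalty/counts",     # HDFS output (illustrative)
    "-mapper", "mapper.py",
    "-reducer", "reducer.py",
    "-file", "mapper.py",   # ship the scripts to the worker nodes
    "-file", "reducer.py",
], check=True)
```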
Find IaaS suppliers with
• Unbundled resources
• Short billing cycles
• New equipment: what's your life cycle?
• SSD as compute storage
• Allows cross connects from your equipment in the DC
• CloudSigma, ElasticHost, ProfitBricks to mention a few