BigFoot: Big Data For Every Organization
Matteo Dell’Amico
Open World Forum 2014, Paris
About BigFoot
BigFoot Goals
Big Data For Every Organization
Automatic & self-tuned deployment for private clouds
Optimization on all layers
Scalable machine learning (time-series analysis, forecasting, clustering…)
Optimizations for big data frameworks
Interactive queries on raw data
Contribute to the Free Software community
The BigFoot Architecture
My Presentation
Scheduling
HFSP: a new Hadoop scheduler
Schedsim: a playground to simulate new schedulers
OpenStack
Apache Spark on demand
Work in progress: VM placement optimizations
Scheduling in Hadoop
Size-Based Scheduling
“Fair” Sharing vs. Size-Based
[Figure: cluster usage (%) over time (s) for job 1, job 2 and job 3 under the two policies. Time-axis marks: 10, 15, 37.5, 42.5 and 50 in the first plot; 10, 20, 30 and 50 in the second.]
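The comparison in the figure can be reproduced with a small simulation. This is only a sketch: the workload (job 1 arrives at 0 s and needs 30 s of full-cluster work, job 2 arrives at 10 s and needs 10 s, job 3 arrives at 15 s and needs 10 s) is an assumption chosen because it matches the time-axis marks, and "size-based" is modeled here as "run the job with the least remaining work", which may differ from what the real schedulers do.

```python
from fractions import Fraction

# ASSUMED workload: name -> (arrival time in s, size in s of full-cluster work).
# These values are not stated on the slide; they were picked to match its axis marks.
JOBS = {"job 1": (0, 30), "job 2": (10, 10), "job 3": (15, 10)}


def simulate(jobs, policy):
    """Return {job name: completion time} for policy 'fair' or 'size'."""
    arrivals = sorted((Fraction(a), name) for name, (a, _) in jobs.items())
    remaining = {}  # work left for jobs that have arrived and are not finished
    done = {}
    t = Fraction(0)
    i = 0
    while i < len(arrivals) or remaining:
        # Admit every job that has arrived by time t.
        while i < len(arrivals) and arrivals[i][0] <= t:
            name = arrivals[i][1]
            remaining[name] = Fraction(jobs[name][1])
            i += 1
        if not remaining:  # cluster idle: jump to the next arrival
            t = arrivals[i][0]
            continue
        if policy == "fair":
            running = list(remaining)  # everyone gets an equal share
        else:
            running = [min(remaining, key=remaining.get)]  # least remaining work
        rate = Fraction(1, len(running))
        # Advance to the next event: a completion or an arrival.
        dt = min(remaining[n] for n in running) / rate
        if i < len(arrivals):
            dt = min(dt, arrivals[i][0] - t)
        for n in running:
            remaining[n] -= dt * rate
            if remaining[n] == 0:
                done[n] = float(t + dt)
                del remaining[n]
        t += dt
    return done


if __name__ == "__main__":
    for policy in ("fair", "size"):
        print(policy, simulate(JOBS, policy))
```

Under this assumed workload, job 1 finishes at 50 s with both policies, while job 2 and job 3 finish at 20 s and 30 s with the size-based policy instead of 37.5 s and 42.5 s with fair sharing.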
HFSP: Size-Based Scheduling For Hadoop
Consistently better than Fair Scheduler (and others…)
The more loaded the system, the bigger the difference
We estimate job sizes: it works!
Download from https://github.com/bigfootproject/hfsp
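To give an idea of what estimating job sizes can look like, here is a minimal sketch that extrapolates from a few completed tasks. It is an illustration under stated assumptions, not HFSP's actual estimator: the helper names, and the idea of multiplying the mean sampled task duration by the task count, are hypothetical.

```python
from statistics import mean


def estimate_job_size(sample_task_seconds, total_tasks):
    """Extrapolate total work from a few completed tasks (hypothetical helper)."""
    if not sample_task_seconds:
        raise ValueError("need at least one completed task")
    return mean(sample_task_seconds) * total_tasks


def order_by_estimated_size(jobs):
    """jobs: {name: (sampled task durations, total task count)} -> names, smallest first."""
    return sorted(jobs, key=lambda name: estimate_job_size(*jobs[name]))
```

A size-based scheduler could then serve jobs in the order returned by `order_by_estimated_size`, refining the estimates as more tasks complete.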
PSBS – Practical Size-Based Scheduler
[Figure: two panels, “Existing Schedulers” and “PSBS: Our proposal”]
Plotting scheduler response time
blue: better than traditional “fair scheduler”; red: worse
Paper: http://arxiv.org/abs/1410.6122
Simulator: https://github.com/bigfootproject/schedsim
OpenStack
OpenStack Sahara
Hadoop On-Demand
Choose number and size of machines
Choose Hadoop version
Voilà, a cluster in your datacenter!
Analytics As-A-Service
Compile your Jar
Choose number and size of machines, etc., as before
A cluster appears, does your analytics, and vanishes
Spark On Sahara
Spark Is Cool
A project started by the Berkeley AMP Lab
Fast: in-memory computing
Easy: concise code in Scala or Python
What We Did
Thanks to our work, Spark has been available on Sahara since May
OpenStack Scheduling
Work In Progress
OpenStack Scheduler
Places virtual machines one at a time
Allows hand-defined filters
Tries to place VMs on least loaded hosts
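The three points above can be sketched as a filter-then-pick loop. This is a toy model, not the OpenStack scheduler's code or API: `Host`, `ram_filter` and `place_vm` are hypothetical names, and "least loaded" is approximated here by "most free RAM".

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    total_ram_mb: int
    used_ram_mb: int = 0

    @property
    def free_ram_mb(self):
        return self.total_ram_mb - self.used_ram_mb


def ram_filter(host, vm_ram_mb):
    """Example of a hand-defined filter: the host must have enough free RAM."""
    return host.free_ram_mb >= vm_ram_mb


def place_vm(hosts, vm_ram_mb, filters=(ram_filter,)):
    """Place ONE VM: drop hosts rejected by any filter, then pick the least loaded."""
    candidates = [h for h in hosts if all(f(h, vm_ram_mb) for f in filters)]
    if not candidates:
        return None  # no host can take this VM
    best = max(candidates, key=lambda h: h.free_ram_mb)
    best.used_ram_mb += vm_ram_mb
    return best.name
```

Because each call sees only one VM, the VMs of a cluster end up wherever there happens to be room at that moment; nothing in this loop knows that they belong together.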
What We Want To Do
Place a whole cluster at once!
VMs that talk a lot to each other: place them close
Place them close to the data, too!
But not too many together: we don’t want to overload drives
Parting Words
Conclusion
Thank You!
These slides: http://bit.ly/bigfoot_owf14
Web: http://bigfootproject.eu
Twitter: @bigfoot_project
Github: http://github.com/bigfootproject/
Bitbucket: bitbucket.org/bigfootproject/