BigFoot: Big Data For Every Organization

Matteo Dell’Amico

Open World Forum 2014, Paris

About BigFoot

About BigFoot Goals

BigFoot Goals: Big Data For Every Organization

Automatic & self-tuned deployment for private clouds

Optimization on all layers

Scalable machine learning (time-series analysis, forecasting, clustering…)

Optimizations for big data frameworks

Interactive queries on raw data

Contribute to the Free Software community

About BigFoot The BigFoot Architecture

My Presentation

Scheduling

HFSP: a new Hadoop scheduler

Schedsim: a playground to simulate new schedulers

OpenStack

Apache Spark on demand

Work in progress: VM placement optimizations

Scheduling in Hadoop

Scheduling in Hadoop Size-Based Scheduling

“Fair” Sharing vs. Size-Based

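[Figure: two plots of cluster usage (%) against time (s) for job 1, job 2 and job 3. “Fair” sharing: time-axis ticks at 10, 15, 37.5, 42.5 and 50 s. Size-based: time-axis ticks at 10, 20, 30 and 50 s, with the cluster running job 1, job 2, job 3 and then job 1 again.]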
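A minimal sketch of this comparison. It assumes three jobs whose arrival times (0, 10 and 15 s) and sizes (30, 10 and 10 s of full-cluster work) are not stated on the slide; they are chosen here because they are consistent with the time-axis ticks of the two plots. The names simulate, fair_share and smallest_first are illustrative and are not part of HFSP. Fair sharing splits the cluster evenly among running jobs, while the size-based policy gives the whole cluster to the job with the least remaining work.

```python
# Assumed workload (not from the slides): (name, arrival time in s, size in s
# of work when the job has 100% of the cluster).
JOBS = [("job 1", 0.0, 30.0), ("job 2", 10.0, 10.0), ("job 3", 15.0, 10.0)]


def fair_share(remaining):
    """Every running job gets an equal fraction of the cluster."""
    return {name: 1.0 / len(remaining) for name in remaining}


def smallest_first(remaining):
    """The job with the least remaining work gets the whole cluster."""
    shortest = min(remaining, key=remaining.get)
    return {name: (1.0 if name == shortest else 0.0) for name in remaining}


def simulate(jobs, policy):
    """Return {job name: completion time} for the given sharing policy."""
    pending = sorted(jobs, key=lambda job: job[1])
    remaining = {}  # job name -> work left
    done = {}       # job name -> completion time
    t, i = 0.0, 0
    while i < len(pending) or remaining:
        if not remaining:
            # Cluster is idle: jump to the next arrival.
            t = max(t, pending[i][1])
        while i < len(pending) and pending[i][1] <= t:
            remaining[pending[i][0]] = pending[i][2]
            i += 1
        rates = policy(remaining)
        # Advance to the next completion or arrival, whichever comes first.
        dt = min(remaining[n] / r for n, r in rates.items() if r > 0)
        if i < len(pending):
            dt = min(dt, pending[i][1] - t)
        for name, rate in rates.items():
            remaining[name] -= rate * dt
        t += dt
        for name in [n for n, left in remaining.items() if left <= 1e-9]:
            done[name] = t
            del remaining[name]
    return done


if __name__ == "__main__":
    for label, policy in [("fair", fair_share), ("size-based", smallest_first)]:
        finished = simulate(JOBS, policy)
        print(label, {name: round(time, 2) for name, time in sorted(finished.items())})
```

Under these assumptions, job 2 and job 3 finish much earlier with the size-based policy, while job 1 finishes at the same time under both.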

Scheduling in Hadoop HFSP

HFSP: Size-Based Scheduling For Hadoop

Consistently better than Fair Scheduler (and others…)

The more loaded the system, the bigger the difference

We estimate job sizes: it works!

Download from https://github.com/bigfootproject/hfsp

Scheduling in Hadoop PSBS

PSBS – Practical Size-Based Scheduler

[Figure panels: Existing Schedulers; PSBS: Our proposal]

Plotting scheduler response time

blue: better than traditional “fair scheduler”; red: worse

Paper: http://arxiv.org/abs/1410.6122

Simulator: https://github.com/bigfootproject/schedsim

OpenStack

OpenStack Sahara

Hadoop On-Demand

Choose number and size of machines

Choose Hadoop version

Voilà, a cluster in your datacenter!

Analytics As-A-Service

Compile your Jar

Choose number and size of machines, etc., as before

A cluster appears, does your analytics, and vanishes

OpenStack Sahara

Spark On Sahara

Spark Is Cool

A project started by the Berkeley AMP Lab

Fast: in-memory computing

Easy: concise code in Scala or Python

What We Did

We have made Spark available on Sahara since May

OpenStack Scheduling

Work In Progress

OpenStack Scheduler

Places virtual machines one at a time

Allows hand-defined filters

Tries to place VMs on least loaded hosts
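A toy model of the three behaviours listed above, not Nova code: VMs are placed one at a time, user-supplied filters discard unsuitable hosts, and the surviving host with the most free RAM is chosen. The Host class, the has_enough_ram filter and the use of free RAM as the measure of load are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical host record; real schedulers track many more resources."""
    name: str
    total_ram_mb: int
    used_ram_mb: int = 0

    @property
    def free_ram_mb(self):
        return self.total_ram_mb - self.used_ram_mb


def has_enough_ram(host, vm_ram_mb):
    """Example of a hand-defined filter: reject hosts that cannot fit the VM."""
    return host.free_ram_mb >= vm_ram_mb


def place_vm(hosts, vm_ram_mb, filters=(has_enough_ram,)):
    """Place a single VM; return the chosen host name, or None if none fits."""
    candidates = [h for h in hosts if all(f(h, vm_ram_mb) for f in filters)]
    if not candidates:
        return None
    # "Least loaded" is approximated here by "most free RAM".
    best = max(candidates, key=lambda h: h.free_ram_mb)
    best.used_ram_mb += vm_ram_mb
    return best.name


hosts = [Host("host-a", 8192), Host("host-b", 8192, used_ram_mb=5120)]
# Each VM is placed independently of the ones that come before or after it.
placements = [place_vm(hosts, 2048) for _ in range(4)]
print(placements)
```

Because every decision looks at one VM only, nothing in this loop knows that the four VMs belong to the same cluster.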

What We Want To Do

Place a whole cluster at once!

VMs that talk a lot to each other: place them close together

Place them close to the data too!

Not too many in one place: we don’t want to overload drives

Parting Words

Parting Words Conclusion

Thank You!

These slides: http://bit.ly/bigfoot_owf14

Web: http://bigfootproject.eu

Twitter: @bigfoot_project

Github: http://github.com/bigfootproject/

Bitbucket: bitbucket.org/bigfootproject/
