Page 1: BigFoot: Big Data For Every Organization

Matteo Dell’Amico

Open World Forum 2014, Paris

About BigFoot

About BigFoot: Goals

BigFoot Goals

Big Data For Every Organization

Automatic & self-tuned deployment for private clouds

Optimization on all layers

Scalable machine learning (time-series analysis, forecasting, clustering…)

Optimizations for big data frameworks

Interactive queries on raw data

Contribute to the Free Software community

About BigFoot: The BigFoot Architecture

My Presentation

Scheduling

HFSP: a new Hadoop scheduler

Schedsim: a playground to simulate new schedulers

OpenStack

Apache Spark on demand

Work in progress: VM placement optimizations

Scheduling in Hadoop

Scheduling in Hadoop: Size-Based Scheduling

“Fair” Sharing vs. Size-Based

[Figure: cluster usage (%) over time (s) for job 1, job 2 and job 3, under “fair” sharing and under size-based scheduling]
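To make the comparison concrete, the following Python sketch models the cluster as a single resource and computes job completion times under idealized processor sharing (a stand-in for “fair” sharing) and under shortest-remaining-processing-time (SRPT), one classic size-based policy. It is not taken from HFSP or Schedsim; the `simulate` and `mean_response_time` helpers, the job tuples and the job sizes are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical job description: (name, arrival time in s, size in s of full-cluster work)
Job = Tuple[str, float, float]


def simulate(jobs: List[Job], policy: str) -> Dict[str, float]:
    """Return the completion time of each job on one shared 'cluster'.

    policy="ps":   processor sharing, i.e. all active jobs split the cluster
                   equally (an idealized model of "fair" sharing).
    policy="srpt": shortest remaining processing time first, i.e. the job with
                   the least remaining work gets the whole cluster.
    """
    pending = sorted(jobs, key=lambda j: j[1])  # jobs not yet submitted
    remaining: Dict[str, float] = {}            # active job -> work left
    done: Dict[str, float] = {}
    now, eps = 0.0, 1e-9
    while pending or remaining:
        if not remaining:                       # idle cluster: jump to next arrival
            now = max(now, pending[0][1])
        while pending and pending[0][1] <= now + eps:
            name, _, size = pending.pop(0)
            remaining[name] = size
        next_arrival = pending[0][1] if pending else float("inf")
        if policy == "ps":
            rates = {n: 1.0 / len(remaining) for n in remaining}
        elif policy == "srpt":
            best = min(remaining, key=lambda n: remaining[n])
            rates = {best: 1.0}
        else:
            raise ValueError(f"unknown policy: {policy}")
        # Advance to the next event: a running job finishing or a new job arriving.
        step = min(min(remaining[n] / r for n, r in rates.items()),
                   next_arrival - now)
        for n, r in rates.items():
            remaining[n] -= r * step
        now += step
        for n in [n for n, w in remaining.items() if w <= eps]:
            del remaining[n]
            done[n] = now
    return done


def mean_response_time(jobs: List[Job], done: Dict[str, float]) -> float:
    """Average of (completion time - arrival time) over all jobs."""
    return sum(done[name] - arrival for name, arrival, _ in jobs) / len(jobs)


if __name__ == "__main__":
    jobs = [("job 1", 0.0, 30.0), ("job 2", 0.0, 10.0), ("job 3", 0.0, 20.0)]
    for policy in ("ps", "srpt"):
        done = simulate(jobs, policy)
        print(policy, done, round(mean_response_time(jobs, done), 1))
```

With three jobs of 30 s, 10 s and 20 s of work all submitted at time 0, processor sharing finishes them at 60, 30 and 50 s (mean response time about 46.7 s), while SRPT finishes them at 60, 10 and 30 s (about 33.3 s): the largest job is no worse off, and the two smaller ones finish much earlier.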

Scheduling in Hadoop: HFSP

HFSP: Size-Based Scheduling For Hadoop

Consistently better than the Fair Scheduler (and others…)

The more loaded the system, the bigger the difference

We estimate job sizes: it works!

Download from https://github.com/bigfootproject/hfsp
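A size-based scheduler needs job sizes before jobs finish. The sketch below shows one simple way to get them, under stated assumptions: let a handful of a job's tasks run first, then extrapolate their average duration to the whole job. It is a hypothetical estimator written for illustration (`estimate_job_size`, `sample_size` and `default_task_duration` are made-up names and defaults), not HFSP's actual code, which is in the repository above.

```python
from statistics import mean
from typing import Sequence


def estimate_job_size(num_tasks: int,
                      finished_task_durations: Sequence[float],
                      sample_size: int = 5,
                      default_task_duration: float = 60.0) -> float:
    """Hypothetical size estimate (total task-seconds) for a job.

    Until `sample_size` tasks (or all tasks, for tiny jobs) have finished,
    fall back to a rough guess of num_tasks * default_task_duration. After
    that, extrapolate the mean duration of the sampled tasks to every task.
    """
    if num_tasks <= 0 or sample_size < 1:
        raise ValueError("need at least one task and a sample size of at least 1")
    needed = min(sample_size, num_tasks)
    if len(finished_task_durations) < needed:
        return num_tasks * default_task_duration
    return num_tasks * mean(finished_task_durations[:needed])


if __name__ == "__main__":
    # Two hypothetical jobs: a size-based scheduler would favor the smaller estimate.
    estimates = {
        "big": estimate_job_size(200, [40.0, 50.0, 45.0, 55.0, 60.0]),
        "small": estimate_job_size(10, [30.0, 35.0, 25.0, 30.0, 30.0]),
    }
    print(sorted(estimates, key=estimates.get))
```

The estimate only needs to rank jobs correctly, not to be exact: here the 10-task job is estimated at 300 task-seconds and the 200-task job at 10000, so the small job would be served first.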

Scheduling in Hadoop: PSBS

PSBS – Practical Size-Based Scheduler

Figure panels: “Existing Schedulers” and “PSBS: Our proposal”

Plotting scheduler response time

Blue: better than the traditional “fair scheduler”; red: worse

Paper: http://arxiv.org/abs/1410.6122

Simulator: https://github.com/bigfootproject/schedsim
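For background, here is a minimal sketch of the Fair Sojourn Protocol (FSP), a classic size-based discipline that, as far as I understand the paper linked above, PSBS extends. FSP simulates processor sharing in a virtual system and, in the real system, always runs the job that would finish first in the virtual one. The sketch feeds the virtual system estimated sizes and the real system true sizes; PSBS's own refinements are not modeled, and `ps_completions` and `fsp` are hypothetical names.

```python
from typing import Dict, List, Tuple


def ps_completions(jobs: List[Tuple[str, float, float]]) -> Dict[str, float]:
    """Completion times under processor sharing; jobs are (name, arrival, size)."""
    pending = sorted(jobs, key=lambda j: j[1])
    remaining: Dict[str, float] = {}
    done: Dict[str, float] = {}
    now, eps = 0.0, 1e-9
    while pending or remaining:
        if not remaining:
            now = max(now, pending[0][1])
        while pending and pending[0][1] <= now + eps:
            name, _, size = pending.pop(0)
            remaining[name] = size
        next_arrival = pending[0][1] if pending else float("inf")
        share = 1.0 / len(remaining)
        step = min(min(remaining.values()) / share, next_arrival - now)
        for n in remaining:
            remaining[n] -= share * step
        now += step
        for n in [n for n, w in remaining.items() if w <= eps]:
            del remaining[n]
            done[n] = now
    return done


def fsp(jobs: List[Tuple[str, float, float, float]]) -> Dict[str, float]:
    """FSP sketch; jobs are (name, arrival, estimated size, true size).

    Virtual completion times are computed up front (a simulation shortcut:
    processor sharing slows all active jobs equally, so later arrivals never
    change the order in which current jobs finish virtually). The real system
    serves one job at a time, preemptively, picking the earliest virtual one.
    """
    virtual = ps_completions([(n, a, est) for n, a, est, _ in jobs])
    pending = sorted(jobs, key=lambda j: j[1])
    remaining: Dict[str, float] = {}
    done: Dict[str, float] = {}
    now, eps = 0.0, 1e-9
    while pending or remaining:
        if not remaining:
            now = max(now, pending[0][1])
        while pending and pending[0][1] <= now + eps:
            name, _, _, true_size = pending.pop(0)
            remaining[name] = true_size
        next_arrival = pending[0][1] if pending else float("inf")
        current = min(remaining, key=lambda n: virtual[n])
        step = min(remaining[current], next_arrival - now)  # run until done or next arrival
        remaining[current] -= step
        now += step
        if remaining[current] <= eps:
            del remaining[current]
            done[current] = now
    return done
```

With exact estimates, say job a (size 10, arriving at t=0) and job b (size 2, arriving at t=5), `fsp` finishes b at t=7 and a at t=12, while processor sharing finishes them at t=9 and t=12. If instead a job of true size 10 is estimated at 1 and a job of size 2 arrives at t=1, the under-estimated job keeps the cluster until t=10 and the small job finishes at t=12, versus t=5 under processor sharing: the kind of estimation error a practical size-based scheduler has to cope with.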

OpenStack

OpenStack: Sahara

OpenStack Sahara

Hadoop On-Demand

Choose number and size of machines

Choose Hadoop version

Voilà, a cluster in your datacenter!

Analytics As A Service

Compile your JAR

Choose number and size of machines, etc., as before

A cluster appears, does your analytics, and vanishes

OpenStack: Sahara

Spark On Sahara

Spark Is Cool

A project started by the Berkeley AMPLab

Fast: in-memory computing

Easy: concise code in Scala or Python

What We Did

Thanks to our work, Spark has been available on Sahara since May

OpenStack: Scheduling

Work In Progress

OpenStack Scheduler

Places virtual machines one at a time

Allows hand-defined filters

Tries to place VMs on least loaded hosts
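The behavior described above can be modeled with a toy filter-then-pick loop. This Python sketch is not Nova's scheduler code or API: the `Host` and `VMRequest` types, the two filters and the choice of “most free RAM” as the meaning of “least loaded” are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Host:
    name: str
    free_ram_mb: int
    free_vcpus: int


@dataclass
class VMRequest:
    name: str
    ram_mb: int
    vcpus: int


# A (hand-defined) filter says whether a host can accept a VM at all.
HostFilter = Callable[[Host, VMRequest], bool]


def enough_ram(host: Host, vm: VMRequest) -> bool:
    return host.free_ram_mb >= vm.ram_mb


def enough_vcpus(host: Host, vm: VMRequest) -> bool:
    return host.free_vcpus >= vm.vcpus


def place_one(vm: VMRequest, hosts: List[Host],
              filters: List[HostFilter]) -> Optional[Host]:
    """Keep hosts passing every filter, then take the 'least loaded' one.

    Here 'least loaded' is modeled as 'most free RAM' (an assumption).
    """
    candidates = [h for h in hosts if all(f(h, vm) for f in filters)]
    if not candidates:
        return None
    best = max(candidates, key=lambda h: h.free_ram_mb)
    best.free_ram_mb -= vm.ram_mb   # later requests see the reduced capacity
    best.free_vcpus -= vm.vcpus
    return best


def place_all(vms: List[VMRequest], hosts: List[Host],
              filters: List[HostFilter]) -> Dict[str, Optional[str]]:
    """Place VMs one at a time; each decision ignores the VMs still to come."""
    placement: Dict[str, Optional[str]] = {}
    for vm in vms:
        host = place_one(vm, hosts, filters)
        placement[vm.name] = host.name if host is not None else None
    return placement
```

With hosts h1 (8192 MB, 4 vCPUs) and h2 (4096 MB, 8 vCPUs) and three identical 2048 MB / 2 vCPU requests, the first two VMs land on h1 and the third on h2. Each decision is made in isolation: nothing in the loop knows that the three VMs form one cluster.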

What We Want To Do

Place a whole cluster at once!

VMs that talk a lot to each other: place them close

Place them close to the data, too!

But not too many together: we don’t want to overload the drives
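One way to prototype this wish list is a greedy heuristic that places the whole cluster in one pass. The sketch below is purely illustrative and not the BigFoot work-in-progress algorithm: “close” is approximated as “same rack”, and `Host`, `free_slots`, `max_per_host` and the preference order are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Host:
    name: str
    rack: str
    free_slots: int  # hypothetical: how many more VMs this host can run


def place_cluster(num_vms: int, hosts: List[Host], data_hosts: Set[str],
                  max_per_host: int) -> Dict[str, int]:
    """Greedily place a whole cluster at once; returns host name -> VM count.

    Preference: hosts storing the input data, then other hosts in the same
    racks, then the remaining racks, roomiest first and one rack at a time so
    that VMs stay close to each other. No host gets more than max_per_host
    VMs, to avoid overloading its drives.
    """
    data_racks = {h.rack for h in hosts if h.name in data_hosts}
    rack_capacity: Dict[str, int] = defaultdict(int)
    for h in hosts:
        rack_capacity[h.rack] += min(h.free_slots, max_per_host)

    def preference(h: Host) -> Tuple[bool, bool, int, str, int, str]:
        return (h.name not in data_hosts,          # data-local hosts first
                h.rack not in data_racks,          # then racks holding data
                -rack_capacity[h.rack], h.rack,    # then roomiest rack, grouped
                -min(h.free_slots, max_per_host),  # then roomiest host
                h.name)                            # deterministic tie-break

    placement: Dict[str, int] = {}
    left = num_vms
    for h in sorted(hosts, key=preference):
        if left == 0:
            break
        n = min(left, h.free_slots, max_per_host)
        if n > 0:
            placement[h.name] = n
            left -= n
    if left > 0:
        raise RuntimeError(f"cluster does not fit: {left} VM(s) left unplaced")
    return placement
```

For example, with hosts a and b in rack r1 (4 free slots each), c (4 slots) and d (1 slot) in rack r2, the data stored on c, and at most 2 VMs per host, a 5-VM cluster is placed as 2 VMs on c, 1 on d and 2 on a. Unlike the one-at-a-time loop, the decision sees the whole cluster, so it can trade data locality against the per-host cap.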

Parting Words

Parting Words: Conclusion

Thank You!

These slides: http://bit.ly/bigfoot_owf14

Web: http://bigfootproject.eu

Twitter: @bigfoot_project

Github: http://github.com/bigfootproject/

Bitbucket: bitbucket.org/bigfootproject/

