BigFoot: Big Data For Every Organization

Date post: 11-Jun-2015
Upload: matteo-dellamico
Description:
Everybody wants to do big data analytics these days: storage is cheap and data is plentiful; best of all, software in the Hadoop ecosystem is free both as in speech and as in beer. If you are not Facebook or Amazon, however, you are not likely to put your precious data in the systems of cloud providers you may not trust; on the other hand, running your own small or medium cluster can be prohibitive, since deploying, tuning and maintaining it requires a lot of effort and specialized skills. BigFoot aims to simplify the data scientist's life by making existing big data software easier to deploy and tune, so that data scientists can focus on their job: getting insight from data. BigFoot contributes to OpenStack: we made it possible to deploy virtualized Spark clusters, enabling analytics-as-a-service using fast in-memory computation. HFSP, our scheduler for Hadoop MapReduce, gives priority to smaller jobs, so that large batch jobs do not harm user productivity by slowing down quicker data exploration jobs. Interestingly, HFSP achieves this without penalizing large jobs. We also contribute to the Apache Pig high-level analytics language: we propose patches that strongly enhance performance when computing aggregations on multi-dimensional data.
Transcript
Page 1: BigFoot: Big Data For Every Organization

BigFoot: Big Data For Every Organization

Matteo Dell’Amico

Open World Forum 2014, Paris

Page 2: BigFoot: Big Data For Every Organization

About BigFoot

Page 3: BigFoot: Big Data For Every Organization

About BigFoot Goals

BigFoot Goals

Big Data For Every Organization

Automatic & self-tuned deployment for private clouds

Optimization on all layers:

Scalable machine learning (time-series analysis, forecasting, clustering…)

Optimizations for big data frameworks

Interactive queries on raw data

Contribute to the Free Software community

Page 4: BigFoot: Big Data For Every Organization
Page 5: BigFoot: Big Data For Every Organization

About BigFoot The BigFoot Architecture

My Presentation

Scheduling

HFSP: a new Hadoop scheduler

Schedsim: a playground to simulate new schedulers

OpenStack

Apache Spark on demand

Work in progress: VM placement optimizations

Page 6: BigFoot: Big Data For Every Organization

Scheduling in Hadoop

Page 7: BigFoot: Big Data For Every Organization

Scheduling in Hadoop Size-Based Scheduling

“Fair” Sharing vs. Size-Based

[Figure: cluster usage (%) over time (s) for job 1, job 2 and job 3. Fair sharing panel: time-axis ticks at 10, 15, 37.5, 42.5 and 50. Size-based panel: ticks at 10, 20, 30 and 50.]
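
The two schedules compared on this slide can be reproduced with a small simulation. This is a minimal sketch, not code from HFSP: the job arrival times and sizes below are assumptions chosen because they are consistent with the time-axis ticks of the figure (10, 15, 37.5, 42.5, 50 for fair sharing; 10, 20, 30, 50 for size-based), and `simulate`, `fair_share` and `size_based` are illustrative names.

```python
# Assumed workload (not stated on the slide): name -> (arrival time, size),
# where size is the time the job needs with 100% of the cluster.
JOBS = {"job 1": (0, 30), "job 2": (10, 10), "job 3": (15, 10)}


def fair_share(remaining):
    """Every running job gets an equal fraction of the cluster."""
    return {name: 1.0 / len(remaining) for name in remaining}


def size_based(remaining):
    """The job with the least remaining work gets the whole cluster."""
    shortest = min(remaining, key=remaining.get)
    return {name: 1.0 if name == shortest else 0.0 for name in remaining}


def simulate(jobs, policy):
    """Return name -> completion time under the given sharing policy."""
    arrivals = sorted(jobs.items(), key=lambda item: item[1][0])
    remaining, done = {}, {}
    now, nxt = 0.0, 0
    while nxt < len(arrivals) or remaining:
        if not remaining:
            # Cluster is idle: jump to the next arrival.
            now = max(now, arrivals[nxt][1][0])
        # Admit every job that has arrived by now.
        while nxt < len(arrivals) and arrivals[nxt][1][0] <= now:
            name, (_, size) = arrivals[nxt]
            remaining[name] = float(size)
            nxt += 1
        share = policy(remaining)
        # Advance to the next event: a job completion or a new arrival.
        step = min(remaining[n] / share[n] for n in remaining if share[n] > 0)
        if nxt < len(arrivals):
            step = min(step, arrivals[nxt][1][0] - now)
        for name in remaining:
            remaining[name] -= share[name] * step
        now += step
        # Retire finished jobs (tolerance absorbs floating-point error).
        for name in [n for n, left in remaining.items() if left <= 1e-9]:
            done[name] = now
            del remaining[name]
    return done


if __name__ == "__main__":
    print("fair sharing:", simulate(JOBS, fair_share))
    print("size-based:  ", simulate(JOBS, size_based))
```

Under these assumed inputs, job 1 completes at time 50 with both policies, while job 2 and job 3 complete at 37.5 and 42.5 with fair sharing and at 20 and 30 with size-based scheduling. The small jobs finish much earlier and the large job is not delayed.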

Page 8: BigFoot: Big Data For Every Organization

Page 9: BigFoot: Big Data For Every Organization

Scheduling in Hadoop HFSP

HFSP: Size-Based Scheduling For Hadoop

Consistently better than Fair Scheduler (and others…)

The more the system is loaded, the bigger the difference

We estimate job sizes: it works!

Download from https://github.com/bigfootproject/hfsp
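
The slide says only that job sizes are estimated, not how. One simple possibility, sketched here as an assumption rather than as HFSP's actual estimator, is to time a few sample tasks of each job, extrapolate to the whole job, and serve the job with the smallest estimate first. The names `estimate_job_size` and `schedule_order` are hypothetical.

```python
from statistics import mean


def estimate_job_size(sample_task_seconds, total_tasks):
    """Extrapolate total work from the run times of a few sampled tasks.

    Assumption for this sketch: tasks of the same job take similar time,
    so size is roughly (mean sampled task time) * (number of tasks).
    """
    if not sample_task_seconds:
        raise ValueError("need at least one sampled task")
    return mean(sample_task_seconds) * total_tasks


def schedule_order(jobs):
    """Order jobs by estimated size, smallest first.

    jobs maps a job name to (sampled task durations, total task count).
    """
    estimates = {
        name: estimate_job_size(samples, total)
        for name, (samples, total) in jobs.items()
    }
    return sorted(estimates, key=estimates.get)


if __name__ == "__main__":
    jobs = {
        "batch": ([60, 62], 200),   # long tasks, many of them
        "explore": ([4, 6], 10),    # short exploratory query
    }
    print(schedule_order(jobs))
```

Estimates of this kind are necessarily rough; the point made on the slide is that size-based scheduling still works with estimated sizes.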

Page 10: BigFoot: Big Data For Every Organization

Scheduling in Hadoop PSBS

PSBS – Practical Size-Based Scheduler

[Figure panels: "Existing Schedulers" and "PSBS: Our proposal"]

Plotting scheduler response time

blue: better than traditional “fair scheduler”; red: worse

Paper: http://arxiv.org/abs/1410.6122

Simulator: https://github.com/bigfootproject/schedsim

Page 11: BigFoot: Big Data For Every Organization

OpenStack

Page 12: BigFoot: Big Data For Every Organization

OpenStack Sahara

OpenStack Sahara

Hadoop On-Demand

Choose number and size of machines

Choose Hadoop version

Voilà, a cluster in your datacenter!

Analytics As-A-Service

Compile your Jar

Choose number and size of machines, etc., as before

A cluster appears, does your analytics, and vanishes

Page 13: BigFoot: Big Data For Every Organization

OpenStack Sahara

Spark On Sahara

Spark Is Cool

A project started by the Berkeley AMP Lab

Fast: in-memory computing

Easy: concise code in Scala or Python

What We Did

Thanks to our work, Spark has been available on Sahara since May

Page 14: BigFoot: Big Data For Every Organization

OpenStack Scheduling

Work In Progress

OpenStack Scheduler

Places virtual machines one at a time

Allows hand-defined filters

Tries to place VMs on least loaded hosts
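
The three properties listed above (one VM at a time, hand-defined filters, preference for the least loaded host) amount to a filter-then-pick loop. The following is a minimal sketch of that idea, not the OpenStack scheduler's code or API: `Host`, `has_enough_ram` and `place_vm` are hypothetical names, and "least loaded" is assumed here to mean "most free RAM".

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    total_ram_mb: int
    used_ram_mb: int = 0

    @property
    def free_ram_mb(self):
        return self.total_ram_mb - self.used_ram_mb


def has_enough_ram(host, vm_ram_mb):
    """Example of a hand-defined filter: the VM must fit on the host."""
    return host.free_ram_mb >= vm_ram_mb


def place_vm(hosts, vm_ram_mb, filters=(has_enough_ram,)):
    """Place a single VM; return the chosen host name, or None if none fits."""
    # Step 1: keep only the hosts accepted by every filter.
    candidates = [h for h in hosts if all(f(h, vm_ram_mb) for f in filters)]
    if not candidates:
        return None
    # Step 2: among the survivors, pick the least loaded host.
    best = max(candidates, key=lambda h: h.free_ram_mb)
    best.used_ram_mb += vm_ram_mb
    return best.name


def place_one_at_a_time(hosts, vm_sizes_mb):
    """Place the VMs of a cluster independently, one after the other."""
    return [place_vm(hosts, size) for size in vm_sizes_mb]


if __name__ == "__main__":
    hosts = [Host("a", 4000), Host("b", 8000)]
    print(place_one_at_a_time(hosts, [3000, 3000, 3000, 3000]))
```

Because each VM is placed independently, nothing in this loop knows that the VMs belong to the same cluster, which is the limitation the rest of this slide addresses.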

What We Want To Do

Do the placement of a cluster!

VMs that talk a lot to each other: place them close

Also place them close to the data!

Not too many: we don’t want to overload drives

Page 15: BigFoot: Big Data For Every Organization

Parting Words

Page 16: BigFoot: Big Data For Every Organization

Parting Words Conclusion

Thank You!

These slides: http://bit.ly/bigfoot_owf14

Web: http://bigfootproject.eu

Twitter: @bigfoot_project

Github: http://github.com/bigfootproject/

Bitbucket: bitbucket.org/bigfootproject/

