
1 CS 294: Big Data System Research: Trends and Challenges Fall 2015 (MW 9:30-11:00, 310 Soda Hall)...

Date post: 16-Jan-2016
CS 294: Big Data System Research: Trends and Challenges Fall 2015 (MW 9:30-11:00, 310 Soda Hall) Ion Stoica and Ali Ghodsi (http://www.cs.berkeley.edu/~istoica/classes/cs294/15/)
Transcript
Page 1: 1 CS 294: Big Data System Research: Trends and Challenges Fall 2015 (MW 9:30-11:00, 310 Soda Hall) Ion Stoica and Ali Ghodsi (istoica/classes/cs294/15/)

CS 294: Big Data System Research: Trends and Challenges

Fall 2015 (MW 9:30-11:00, 310 Soda Hall)

Ion Stoica and Ali Ghodsi

(http://www.cs.berkeley.edu/~istoica/classes/cs294/15/)

Page 2:

Big Data

First papers:
» 2003: The Google File System paper
» 2004: The MapReduce paper

Today every major system & networking conference has Big Data sessions

Page 3:

Big Data Impact

Already helped create new businesses

Already helped disrupt existing businesses
» Retail
» Rental
» Taxi
» Home appliances
» …

Page 4:

Big Data Stack

Data Processing Layer

Resource Management Layer

Storage Layer

Page 5:

Hadoop Stack

Data Processing Layer: Hadoop MR, Hive, Pig, Impala, Storm

Resource Management Layer: Hadoop Yarn

Storage Layer: HDFS, S3, …

Page 6:

The Berkeley AMPLab

January 2011 – 2017
» 8 faculty
» > 40 students
» 3 software engineer team

Organized for collaboration

3 day retreats (twice a year)

AMP: Algorithms, Machines, People

AMPCamp3 (August, 2013): 220 campers (100+ companies)

Page 7:

The Berkeley AMPLab

Governmental and industrial funding:

Goal: Next generation of open source data analytics stack for industry & academia: Berkeley Data Analytics Stack (BDAS)

Page 8:

BDAS Stack

Data Processing Layer: Spark, Spark Streaming, Shark SQL, BlinkDB, GraphX, MLlib, MLBase

Resource Management Layer: Mesos

Storage Layer: HDFS, S3, …, Tachyon

Page 9:

BDAS & Hadoop fitting together

[Diagram: BDAS and Hadoop components side by side]

BDAS: Spark, Spark Streaming, Shark SQL, BlinkDB, GraphX, MLlib, MLBase; Mesos; HDFS, S3, …, Tachyon

Hadoop: Hadoop Yarn; HDFS, S3, …

Page 10:

How do BDAS & Hadoop fit together?

[Diagram: BDAS and Hadoop components side by side]

BDAS: Spark, Spark Streaming, Shark SQL, BlinkDB, GraphX, MLlib, MLBase; Mesos; HDFS, S3, …, Tachyon

Hadoop: Hadoop MR, Hive, Pig, Impala, Storm; Hadoop Yarn; HDFS, S3, …

Page 11:

How do BDAS & Hadoop fit together?

[Diagram: BDAS and Hadoop components combined]

Spark: Spark Streaming, Shark SQL, GraphX, ML library, BlinkDB, MLbase

Hadoop MR: Hive, Pig, Impala, Storm

Mesos, Hadoop Yarn

HDFS, S3, …, Tachyon

Page 12:

This Class

Learn about state-of-the-art research in Big Data

Work on an exciting project

Hopefully start next generation of impactful projects

Page 13:

Grading

Project: 60%

Class presentations: 40%
» Around 2 papers per student
» See Randy’s guidelines for leading discussion on papers
  • http://bnrg.eecs.berkeley.edu/~randy/Courses/CS294.F07/LeadingPapers.pdf

Page 14:

Administrative Information

Class website: http://www.cs.berkeley.edu/~istoica/classes/cs294/15/

Office Hours (Soda 465D):
» TBA

Create an (anonymized) blog account for paper reviews if you don’t have one yet (e.g., www.blogger.com)

Send me an e-mail by Monday, August 31, with your:
» Blog URL
» Preferred e-mail for the class e-mail list

Page 15:

Papers

Is the problem real?

What is the solution’s main idea (nugget)?

Why is the solution different from previous work?
» Are system assumptions different?
» Is workload different?
» Is problem new?

Does the paper (or do you) identify any fundamental/hard trade-offs?

Page 16:

Papers (cont’d)

Do you think the work will be influential in 10 years?
» Why or why not?

Predicting the future is hard, but worth a try
» Look at past examples for inspiration

Page 17:

Streaming Over TCP

Countless papers:
» Why it cannot be done…
» New protocols to do it…

Today:
» Virtually all streaming is over TCP
» Trend to stream over HTTP!

Page 18:

Why Did it Succeed?

Page 19:

Multicast

Countless papers:
» Why the world will come to a standstill without multicast…
» New protocols to do it…

Today:
» Multicast is used only in enterprise settings at best
» Overlay multicast widely used in the Internet
  • CDN based, e.g., World Cup, March Madness, inaugurations, ...
  • P2P, mostly popular outside US (e.g., China)

Page 20:

Why Did it Fail?

Page 21:

Shared Memory

Countless papers:
» How shared memory simplifies programming parallel computers
» Many, many systems proposed and built

Today:
» Message passing (MPI) took over as the de facto standard for writing parallel applications
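To make the contrast between the two programming models concrete, here is a minimal sketch in Python. It is only an illustration: threads inside one process stand in for processors, the function names `sum_shared_memory` and `sum_message_passing` are hypothetical, and no actual MPI library is used. In the shared-memory style every worker updates one variable under a lock; in the message-passing style workers own their data and communicate only by sending results.

```python
import queue
import threading


def sum_shared_memory(chunks):
    """Shared-memory style: all workers update one shared total, guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)
        with lock:  # synchronization is the programmer's responsibility
            total += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def sum_message_passing(chunks):
    """Message-passing style: workers share nothing and send partial sums as messages."""
    mailbox = queue.Queue()  # stands in for a network channel (an assumption)

    def worker(chunk):
        mailbox.put(sum(chunk))  # "send" the partial result

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # "receive" exactly one message per worker and combine them
    return sum(mailbox.get() for _ in chunks)


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 5], [6]]
    print(sum_shared_memory(data), sum_message_passing(data))
```

Both functions compute the same result. The difference is where coordination lives: in a lock around shared state, or in explicit messages between workers that share nothing.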

Page 22:

Why Did it Fail?

Page 23:

Network Computer

Big in 90s
» Promoted by an alliance of Sun, Oracle, Acorn

Promise: many of the advantages of cloud computing
» Easy to manage
» Application sharing
» …

Failed miserably

Page 24:

Why Did it Fail?

Page 25:

Coming Back: ChromeOS

Will it succeed this time?

Page 26:

What are Hard/Fundamental Tradeoffs?

Brewer’s CAP conjecture: of “Consistency, Availability, Partition-tolerance”, you can have only two in a distributed system

An in-order, reliable communication protocol cannot minimize overhead and latency simultaneously

Hard to simultaneously maximize evolvability and performance
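One way to see the overhead/latency tradeoff for a reliable protocol is a small back-of-the-envelope model, sketched below in Python. The model is an assumption, not something taken from the slides: each message is sent as `copies` redundant packets per round, packets are lost independently with probability `loss_prob`, acknowledgements are never lost, and a failed round is retried after a fixed timeout. The function name and the default delay and timeout values are hypothetical.

```python
def expected_cost(loss_prob, copies, one_way_delay_ms=50.0, timeout_ms=200.0):
    """Return (expected packets sent per message, expected delivery latency in ms).

    Assumed model: a round fails only if all `copies` packets are lost;
    after a failed round the sender waits `timeout_ms` and tries again.
    """
    if not 0.0 <= loss_prob < 1.0:
        raise ValueError("loss_prob must be in [0, 1)")
    if copies < 1:
        raise ValueError("copies must be at least 1")

    round_fail = loss_prob ** copies           # all copies in a round are lost
    expected_rounds = 1.0 / (1.0 - round_fail)  # mean of a geometric distribution
    packets = copies * expected_rounds          # overhead: packets per message
    # Each failed round costs one timeout before the round that succeeds.
    latency = one_way_delay_ms + timeout_ms * (expected_rounds - 1.0)
    return packets, latency


if __name__ == "__main__":
    for copies in (1, 2, 3):
        packets, latency = expected_cost(loss_prob=0.1, copies=copies)
        print(f"copies={copies}: {packets:.3f} packets/message, {latency:.2f} ms")
```

Under these assumptions, sending more redundant copies pushes expected latency toward the one-way delay but raises the number of packets per message, while sending a single copy minimizes packets but pays for every loss with a timeout. Neither setting minimizes both quantities at once.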

