
Scope vs YSmart

Date post: 28-Nov-2014
Upload: dacong-yan
Description:
Slides for a course.
YSmart vs SCOPE
Transcript
Page 1: Scope vs YSmart

YSmart vs SCOPE

Page 2: Scope vs YSmart

YSmart Revisited

What is YSmart?
Yet Another SQL-to-MR Translator

Why “yet another”?
Sentence-by-sentence translation fails!

Page 3: Scope vs YSmart

Example

Wrong view:
a = 1; b = 2; x = a j1 b;
c = 1; y = j2 c; d = y; z = j3 d;

Correct view:
a = <exp1>; b = <exp2>; x = a J1 b;
c = <exp1>; y = J2 c; d = y; z = J3 d;

<exp1> and <exp2> are expensive data loads!
J1, J2, and J3 are expensive computations!

[Diagram: a and b feed J1; c feeds J2, whose result d feeds J3.]
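The redundancy in the correct view can be made concrete with a small count. The sketch below is an illustration only: the tuple encoding of the statements and the two helper functions are assumptions introduced here, not part of YSmart. It counts how often an expensive load runs when each statement is translated on its own, and how often it runs when identical loads are shared.

```python
# Hypothetical encoding of the load statements from the example:
# (variable, load expression). Both `a` and `c` load <exp1>.
loads = [("a", "exp1"), ("b", "exp2"), ("c", "exp1")]


def loads_sentence_by_sentence(statements):
    """Each statement triggers its own load, even if the expression repeats."""
    return len(statements)


def loads_with_sharing(statements):
    """Each distinct expression is loaded once and then reused."""
    return len({expr for _, expr in statements})


print(loads_sentence_by_sentence(loads))  # 3
print(loads_with_sharing(loads))          # 2
```

With real tables, the saved load is a full scan of a large relation, which is why the slide calls the naive view wrong.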

Page 4: Scope vs YSmart

Big Data!

We cannot afford redundancies anymore!

Let’s eliminate redundancies: YSmart!

Page 5: Scope vs YSmart

Correlation-Aware SQL-to-MR Translator

[Diagram: SQL-like queries → Primitive MR Jobs → Identify Correlations → Merge Correlated MR Jobs → MR jobs for best performance]

Page 6: Scope vs YSmart

Input Correlation (IC)

Multiple MR jobs have input correlation (IC) if their input relation sets are not disjoint.

[Diagram: J1 reads lineitem and orders; J2 reads lineitem.]
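The IC definition amounts to a set-intersection test. As a minimal sketch (the function name and the set-based representation of a job's inputs are assumptions made for illustration, not YSmart's actual data structures):

```python
def has_input_correlation(inputs_a, inputs_b):
    """Return True if the two jobs' input relation sets share a relation."""
    return not set(inputs_a).isdisjoint(inputs_b)


# Input relations taken from the slide's diagram.
j1_inputs = {"lineitem", "orders"}
j2_inputs = {"lineitem"}

print(has_input_correlation(j1_inputs, j2_inputs))  # True
```

Here J1 and J2 both read lineitem, so a single scan of that table could in principle serve both jobs.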

Page 7: Scope vs YSmart

Transit Correlation (TC)

Multiple MR jobs have transit correlation (TC) if they have input correlation (IC) and the same partition key.

[Diagram: J1 reads lineitem and orders, key: l_orderkey; J2 reads lineitem, key: l_orderkey.]
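Transit correlation adds one condition on top of IC. The sketch below assumes a hypothetical `Job` record with an input set and a single partition key. The names `Job`, `has_transit_correlation`, and the extra job `j3` with key `l_partkey` are introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical MR job description used only for this illustration."""
    name: str
    inputs: frozenset
    partition_key: str


def has_input_correlation(a, b):
    """IC: the input relation sets are not disjoint."""
    return not a.inputs.isdisjoint(b.inputs)


def has_transit_correlation(a, b):
    """TC: IC holds and both jobs partition on the same key."""
    return has_input_correlation(a, b) and a.partition_key == b.partition_key


# J1 and J2 follow the slide; J3 is an assumed job with a different key.
j1 = Job("J1", frozenset({"lineitem", "orders"}), "l_orderkey")
j2 = Job("J2", frozenset({"lineitem"}), "l_orderkey")
j3 = Job("J3", frozenset({"lineitem"}), "l_partkey")

print(has_transit_correlation(j1, j2))  # True
print(has_transit_correlation(j1, j3))  # False
```

J1 and J3 still have input correlation, but because they partition on different keys they lack transit correlation.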


Page 8: Scope vs YSmart

Job Flow Correlation (JFC)

An MR job has job flow correlation (JFC) with one of its child MR jobs if it has the same partition key as that MR job.

[Diagram: lineitem and orders feed J1, which feeds J2. Labels: Map Func. of MR Job 1, Map Func. of MR Job 2, Partition Key, Other Data, Reduce Func. of MR Job 1, Reduce Func. of MR Job 2, Output of MR Job 2.]
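The JFC test can be sketched the same way. This example assumes that a job's "children" are the jobs whose output it consumes, so J1 is a child of J2 in the diagram. The `Job` class, the partition keys, and the extra job `j4` are hypothetical and serve only to illustrate the definition.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical MR job node; `children` are the jobs whose output it reads."""
    name: str
    partition_key: str
    children: list = field(default_factory=list)


def has_job_flow_correlation(parent, child):
    """JFC: `child` feeds `parent` and both use the same partition key."""
    is_child = any(c is child for c in parent.children)
    return is_child and parent.partition_key == child.partition_key


# Assumed keys: the slide does not state them for this diagram.
j1 = Job("J1", "l_orderkey")
j2 = Job("J2", "l_orderkey", children=[j1])
j4 = Job("J4", "o_custkey", children=[j1])  # assumed job with a different key

print(has_job_flow_correlation(j2, j1))  # True
print(has_job_flow_correlation(j4, j1))  # False
```

When the test holds, the parent's work can run inside the child's reduce phase, which is how the slide's diagram connects the two reduce functions.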

Page 9: Scope vs YSmart

Put it all together

1: Sentence-to-Sentence Translation
• 5 MR jobs

2: InputCorrelation + TransitCorrelation
• 3 MR jobs

3: InputCorrelation + TransitCorrelation + JobFlowCorrelation
• 1 MR job

4: Hand-coding (similar to Case 3)
• In the reduce function, we optimize the code according to the query semantics

[Diagram labels, Case 1: lineitem, orders, lineitem, lineitem; Join1, AGG1, AGG2, Join2, Left-outer-Join. Later cases: lineitem, orders (three times); Join2, Left-outer-Join.]

Page 10: Scope vs YSmart

YSmart vs SCOPE

[Diagram labels: (Big) Data Processing Language; Naïve Translation; Optimization; Big Data Analytic Jobs]

• YSmart: look at data dependence and control dependence
  • Identify three correlations
  • Merge jobs to eliminate redundancy (straightforward)

• SCOPE: look at the actual structure of the input data
  • Identify structural property correlations
  • Partition, group, merge (complicated)

Page 11: Scope vs YSmart

Big Picture

[Diagram labels: (Big) Data Processing Language; Naïve Translation; Input-Independent Optimization; Input-Dependent Optimization; YSmart; SCOPE; Big Data Analytic Jobs]

Page 12: Scope vs YSmart

YSmart vs SCOPE

[Diagram labels: (Big) Data Processing Language; Naïve Translation; Optimization; Big Data Analytic Jobs]

• The diagram is actually an over-abstraction.
• In reality,
  • YSmart: source-to-source transformation
  • SCOPE: run-time optimizing compiler tightly coupled with the underlying execution environment

Page 13: Scope vs YSmart

YSmart vs SCOPE

Not an apples-to-apples comparison!

But, let’s do it anyway…

Page 14: Scope vs YSmart

YSmart Alone

[Diagram labels: (Big) Data Processing Language; Naïve Translation; YSmart; Big Data Analytic Jobs]

• 3x speedup, but 17% slower than human
• It is supposed to be smarter than human!
• What went wrong:
  • Bad input code
  • Not enough optimization

Page 15: Scope vs YSmart

SCOPE Alone

[Diagram labels: (Big) Data Processing Language; Naïve Translation; SCOPE; Big Data Analytic Jobs]

• No thorough evaluation; 2x speedup on a specific case
• Problem
  • They are looking at structures, but at the wrong level.
  • Very likely, they are optimizing computations that are not strictly necessary!

Page 16: Scope vs YSmart

Discussions

1. Is SQL good enough as a big data analytics processing language?
  • Bad language design can be detrimental
  • Redundancies could be introduced unnecessarily simply due to poor expressiveness of the language

2. How to migrate traditional program analysis and compiler optimization to the big data era?
  • Correlation detection in YSmart is inherently similar to dependency analysis.
  • In compiler optimization, we focus on def-use statements and expressions; in big data, we should focus on big data transfer and big data tables.

Page 17: Scope vs YSmart

Fundamentally, we need good programming languages & program analyses for big data analytics!

