Scope vs YSmart

YSmart vs SCOPE


YSmart Revisited

What is YSmart?

Yet Another SQL-to-MR Translator

Why “yet another”?

Sentence-by-sentence translation fails!


Wrong View
a = 1; b = 2; x = a j1 b;
c = 1; y = j2 c; d = y; z = j3 d;

Correct View
a = <exp1>; b = <exp2>; x = a J1 b;
c = <exp1>; y = J2 c; d = y; z = J3 d;

<exp1>, <exp2> are expensive data loading!

<J1>, <J2>, <J3> are expensive computation!

Example

[Figure: job graph over the inputs a, b, c, d and the jobs J1, J2, J3]
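The point of the "Correct View" can be illustrated with a minimal sketch. It assumes a hypothetical `CountingLoader` standing in for the expensive load `<exp1>`; none of these names come from YSmart itself. Sentence-by-sentence translation runs the load once per statement, while a translator that notices the shared expression runs it once.

```python
class CountingLoader:
    """Hypothetical stand-in for an expensive data load such as <exp1>."""

    def __init__(self, rows):
        self._rows = rows
        self.loads = 0  # number of times the expensive load actually ran

    def load(self):
        self.loads += 1
        return list(self._rows)


def sentence_by_sentence(loader):
    """Translate each statement on its own: <exp1> is evaluated twice."""
    a = loader.load()  # a = <exp1>
    c = loader.load()  # c = <exp1>, the same load repeated
    return a, c


def shared_scan(loader):
    """Notice that a and c come from the same expression: load once."""
    rows = loader.load()  # <exp1> evaluated a single time
    return rows, rows


naive = CountingLoader([1, 2, 3])
sentence_by_sentence(naive)

shared = CountingLoader([1, 2, 3])
shared_scan(shared)

print(naive.loads, shared.loads)
```

On real tables each avoided load is a full scan, which is why the redundancy matters.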


Big Data!

We cannot afford redundancies anymore!

Let’s eliminate redundancies: YSmart!

Correlation-Aware SQL-to-MR Translator

SQL-like queries → Primitive MR Jobs → Identify Correlations → Merge Correlated MR jobs → MR Jobs for best performance

Input Correlation (IC)

Multiple MR jobs have input correlation (IC) if their input relation sets are not disjoint

[Figure: J1 reads lineitem and orders; J2 reads lineitem]
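The IC definition reduces to a set-disjointness test. The sketch below assumes a hypothetical `MRJob` record that only tracks a job's input relations; the `customer` job is invented for contrast and is not on the slide.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class MRJob:
    """Hypothetical job descriptor: a name plus the relations it reads."""
    name: str
    inputs: FrozenSet[str]


def has_input_correlation(first: MRJob, second: MRJob) -> bool:
    """IC holds when the two input relation sets are not disjoint."""
    return not first.inputs.isdisjoint(second.inputs)


j1 = MRJob("J1", frozenset({"lineitem", "orders"}))
j2 = MRJob("J2", frozenset({"lineitem"}))
j3 = MRJob("J3", frozenset({"customer"}))  # invented job with no shared input

print(has_input_correlation(j1, j2))
print(has_input_correlation(j1, j3))
```

J1 and J2 both read lineitem, so a merged job could scan that table once for both.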

Transit Correlation (TC)

Multiple MR jobs have transit correlation (TC) if they have input correlation (IC), and they have the same Partition Key

[Figure: J1 reads lineitem and orders; J2 reads lineitem; both jobs use the partition key l_orderkey]
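TC adds one condition on top of IC, which a small sketch makes explicit. The `MRJob` record and the third job keyed on `l_partkey` are assumptions made for illustration, not part of the slides.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class MRJob:
    """Hypothetical job descriptor: input relations plus partition key."""
    name: str
    inputs: FrozenSet[str]
    partition_key: str


def has_transit_correlation(first: MRJob, second: MRJob) -> bool:
    """TC = shared input relation (IC) AND identical partition key."""
    shares_input = not first.inputs.isdisjoint(second.inputs)
    return shares_input and first.partition_key == second.partition_key


j1 = MRJob("J1", frozenset({"lineitem", "orders"}), "l_orderkey")
j2 = MRJob("J2", frozenset({"lineitem"}), "l_orderkey")
# Invented job: shares lineitem with J1 (IC) but partitions differently.
j3 = MRJob("J3", frozenset({"lineitem"}), "l_partkey")

print(has_transit_correlation(j1, j2))
print(has_transit_correlation(j1, j3))
```

With TC, the shared table is scanned once and its records are also shuffled once, since both jobs group them by the same key.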


Job Flow Correlation (JFC)

A MR job has Job Flow Correlation (JFC) with one of its child MR jobs if it has the same partition key as that MR job

[Figure: map and reduce functions of MR Job 1 and MR Job 2, with records split into partition key and other data, and the output of MR Job 2; example with lineitem, orders, J1, J2]
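The JFC definition can be sketched as a check over a job tree. The sketch assumes that a "child" is a job whose output the parent consumes, and the job names and keys are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MRJob:
    """Hypothetical plan node: partition key plus the jobs it consumes."""
    name: str
    partition_key: str
    children: List["MRJob"] = field(default_factory=list)


def jfc_children(job: MRJob) -> List[MRJob]:
    """Return the child jobs that share the parent's partition key."""
    return [c for c in job.children if c.partition_key == job.partition_key]


agg = MRJob("AGG", "l_orderkey")
join = MRJob("JOIN", "l_orderkey", children=[agg])
other = MRJob("OTHER", "o_custkey", children=[join])  # different key: no JFC

print([c.name for c in jfc_children(join)])
print([c.name for c in jfc_children(other)])
```

When JFC holds, the parent's work can run inside the child's reduce phase, because the records it needs already arrive grouped by the right key.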

Put it all together

1: Sentence-to-Sentence Translation
• 5 MR jobs
[Figure: plan over lineitem, orders, lineitem, lineitem with the operators Join1, AGG1, AGG2, Join2 and Left-outer-Join]

2: InputCorrelation + TransitCorrelation
• 3 MR jobs

3: InputCorrelation + TransitCorrelation + JobFlowCorrelation
• 1 MR job

4: Hand-coding (similar to Case 3)
• In the reduce function, we optimize the code according to the query semantics
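To make the merged single-job case concrete, here is a toy in-memory sketch. It does not reproduce the slide's query: the rows, the sum-then-join logic and all function names are assumptions. It shows one map pass and one shuffle on the order key feeding a reduce function that does both an aggregation and a join.

```python
from collections import defaultdict

# Hypothetical toy rows: (l_orderkey, l_quantity) and (o_orderkey, o_custkey).
lineitem = [(1, 5), (1, 7), (2, 3)]
orders = [(1, "alice"), (2, "bob")]


def merged_map(lineitem_rows, orders_rows):
    """Scan each table once and emit tagged records keyed by order key."""
    for orderkey, qty in lineitem_rows:
        yield orderkey, ("L", qty)
    for orderkey, cust in orders_rows:
        yield orderkey, ("O", cust)


def shuffle(pairs):
    """Group map output by key, as the MR framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def merged_reduce(key, values):
    """Do the aggregation and the join in a single reduce call."""
    total = sum(v for tag, v in values if tag == "L")  # aggregation
    custs = [v for tag, v in values if tag == "O"]
    return [(key, cust, total) for cust in custs]      # join with orders


result = []
for key, values in sorted(shuffle(merged_map(lineitem, orders)).items()):
    result.extend(merged_reduce(key, values))

print(result)
```

A sentence-by-sentence translation of the same toy logic would use one job for the aggregation and a second for the join, scanning and shuffling lineitem data twice.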


YSmart vs SCOPE

(Big) Data Processing Language → Naïve Translation → Optimization → Big Data Analytic Jobs

• YSmart: look at data dependence and control dependence
  • Identify three correlations
  • Merge jobs to eliminate redundancy (straightforward)

• SCOPE: look at the actual structure of the input data
  • Identify structural property correlations
  • Partition, group, merge (complicated)


Big Picture

[Figure: two aligned pipelines. Top: Naïve Translation → Input Independent Optimization → Input Dependent Optimization. Bottom: Naïve Translation → YSmart → SCOPE]


YSmart vs SCOPE

[Figure: Naïve Translation → Optimization]

• The diagram is actually an over-abstraction.
• In reality,
  • YSmart: source-to-source transformation
  • SCOPE: run-time optimizing compiler tightly coupled with underlying execution environment

YSmart vs SCOPE

Not an apples-to-apples comparison!

But, let’s do it anyway…


YSmart Alone

[Figure: Naïve Translation → YSmart]

• 3x speedup, but 17% slower than human
• It is supposed to be smarter than human!
• What went wrong:
  • Bad input code
  • Not enough optimization


SCOPE Alone

[Figure: Naïve Translation → SCOPE]

• No thorough evaluation; 2x speedup on a specific case
• Problem
  • They are looking at structures, but at the wrong level.
  • Very likely, they are optimizing computations that are not strictly necessary!


Discussions

1. Is SQL good enough as a big data analytics processing language?
• Bad language design can be detrimental
• Redundancies could be introduced unnecessarily simply due to poor expressiveness of the language

2. How to migrate traditional program analysis and compiler optimization to the big data era?
• Correlation detection in YSmart is inherently similar to dependency analysis.
• In compiler optimization, we focus on def-use statements and expressions; in big data, we should focus on big data transfer and big data tables.


Fundamentally, we need good programming languages & program analyses for big data analytics!

Date post:	28-Nov-2014
Upload:	dacong-yan