THE ABSTRACTION THAT POWERS BIG DATA
RAÚL CASTRO FERNÁNDEZ, COMPUTER SCIENCE PHD STUDENT, IMPERIAL COLLEGE
Dataflows: The Abstraction that Powers Big Data
Raul Castro Fernandez Imperial College London
[email protected] @raulcfernandez
“Big Data needs Democratization”
3
Democratization of Data

Developers and DBAs are no longer the only ones generating, processing and analyzing data.

Decision makers, domain scientists, application users, journalists, crowd workers, and everyday consumers, sales, marketing…
+ Everyone has data
+ Many have interesting questions
- Not everyone knows how to analyze it
9
Bob Local Expert

- Barrier of human communication
- Barrier of professional relations
The limits of my language mean the limits of my world.
Ludwig Wittgenstein, “Tractatus Logico-Philosophicus” (1922)
13
First step to democratize Big Data: offer a familiar programming interface
• Motivation
• SDG: Stateful Dataflow Graphs
• Handling distributed state in SDGs
• Translating Java programs to SDGs
• Checkpoint-based fault tolerance for SDGs
• Experimental evaluation
14
Outline
Mutable State in a Recommender System
15
Matrix userItem = new Matrix();
Matrix coOcc = new Matrix();

User-Item matrix (UI):
         Item-A  Item-B
User-A      4       5
User-B      0       5

Co-Occurrence matrix (CO):
         Item-A  Item-B
Item-A      1       1
Item-B      1       2
Mutable State in a Recommender System
16
Matrix userItem = new Matrix();
Matrix coOcc = new Matrix();

void addRating(int user, int item, int rating) {
    userItem.setElement(user, item, rating);
    updateCoOccurrence(coOcc, userItem);
}

Update the User-Item matrix (UI) with new ratings; the Co-Occurrence matrix (CO) is updated from it.
Mutable State in a Recommender System
17
Matrix userItem = new Matrix();
Matrix coOcc = new Matrix();

void addRating(int user, int item, int rating) {
    userItem.setElement(user, item, rating);
    updateCoOccurrence(coOcc, userItem);
}

Vector getRec(int user) {
    Vector userRow = userItem.getRow(user);
    Vector userRec = coOcc.multiply(userRow);
    return userRec;
}

Update UI with new ratings; multiply a user's UI row by CO for a recommendation.
18
Challenges When Executing with Big Data
Big Data problem: matrices become large

> Mutable state leads to concise algorithms but complicates parallelism and fault tolerance

Matrix userItem = new Matrix();
Matrix coOcc = new Matrix();

> Cannot lose state after failure
> Need to manage state to support data-parallelism
19
Using Current Distributed Dataflow Frameworks
Input data
Output data
> No mutable state simplifies fault tolerance
> MapReduce: Map and Reduce tasks
> Storm: No support for state
> Spark: Immutable RDDs
20
> Programming distributed dataflow graphs requires learning new programming models
Imperative Big Data Processing
21
Our Goal: Run Java programs with mutable state but with the performance and fault tolerance of distributed dataflow systems
22
> @Annotations help with translation from Java to SDGs
> Mutable distributed state in dataflow graphs
Stateful Dataflow Graphs: From Imperative Programs to Distributed Dataflows
Program.java
SDGs: Stateful Dataflow Graphs
> Checkpoint-based fault tolerance recovers mutable state after failure
• Motivation
• SDG: Stateful Dataflow Graphs
• Handling distributed state in SDGs
• Translating Java programs to SDGs
• Checkpoint-based fault tolerance for SDGs
• Experimental evaluation
23
Outline
Program.java
SDG: Data, State and Computation
> SDGs separate data and state to allow data and pipeline parallelism
24
Task Elements (TEs) process data
State Elements (SEs) represent state
Dataflows represent data
> Task Elements have local access to State Elements
State Elements support two abstractions for distributed mutable state:
– Partitioned SEs: task elements always access state by key
– Partial SEs: task elements can access the complete state
25
Distributed Mutable State
26
Distributed Mutable State: Partitioned SEs
Dataflow routed according to a hash function, hash(msg.id)

[Figure: User-Item matrix (UI) accessed by key; key space [0-N] split into [0-k] and [(k+1)-N]]

State partitioned according to a partitioning key

> Partitioned SEs split into disjoint partitions
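As an illustrative sketch (with a hypothetical Node type, not the actual SEEP API), key-based routing to a partitioned SE could look like this in Java:

import java.util.List;

// Illustrative sketch of key-based routing to a Partitioned SE; Node is a
// hypothetical type for illustration, not the actual SEEP API.
class PartitionedRouter {
    interface Node { void deliver(int key, Object payload); }

    private final List<Node> partitions;   // each node owns a disjoint key range

    PartitionedRouter(List<Node> partitions) { this.partitions = partitions; }

    void send(int msgId, Object payload) {
        // hash(msg.id) selects the owning partition, so every access to a
        // given key lands on the same local state instance.
        int p = (msgId & 0x7fffffff) % partitions.size();
        partitions.get(p).deliver(msgId, payload);
    }
}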
27
Distributed Mutable State: Partial SEs
Local access: data sent to one instance
Global access: data sent to all instances

> Partial SEs give nodes local state instances
> Partial SE access by TEs can be local or global
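A matching sketch of the two access modes for a partial SE (again with a hypothetical Node type, not the SEEP API):

import java.util.List;

// Sketch of Partial SE access (illustrative, not the SEEP API): each node
// holds its own local instance of the state element.
class PartialAccess {
    interface Node { void deliver(Object data); }

    private final List<Node> instances;    // one local state instance per node

    PartialAccess(List<Node> instances) { this.instances = instances; }

    // Local access: data sent to one instance.
    void local(int node, Object data) {
        instances.get(node).deliver(data);
    }

    // Global access: data sent to all instances.
    void global(Object data) {
        for (Node n : instances) n.deliver(data);
    }
}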
28
Merging Distributed Mutable State

Multiple partial values → collect partial values → apply merge logic

> Reading all partial SE instances results in a set of partial values
> Requires application-specific merge logic
31
Outline
> @Annotations

• Motivation
• SDG: Stateful Dataflow Graphs
• Handling distributed state in SDGs
• Translating Java programs to SDGs
• Checkpoint-based fault tolerance for SDGs
• Experimental evaluation
Program.java
32
From Imperative Code to Execution
SEEP
Annotated program
> SEEP: data-parallel processing platform

• Translation occurs in two stages:
  – Static code analysis: from Java to SDG
  – Bytecode rewriting: from SDG to SEEP [SIGMOD’13]
Program.java
33
Translation Process

Annotated Program.java → Extract TEs, SEs and accesses → Live variable analysis → TE and SE access code assembly → SEEP runnable

SOOT Framework (analysis) / Javassist (code generation)

> Extract state and state access patterns through static code analysis
> Generation of runnable code using TE and SE connections
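To give a flavour of the bytecode-rewriting stage, here is a minimal Javassist snippet; it only illustrates the mechanism and is not SEEP's actual translation pass (the class and method names are taken from the running example):

import javassist.ClassPool;
import javassist.CtClass;
import javassist.CtMethod;

// Illustration of bytecode rewriting with Javassist; not SEEP's actual pass.
public class RewriteSketch {
    public static byte[] rewrite() throws Exception {
        ClassPool pool = ClassPool.getDefault();
        CtClass cc = pool.get("Program");          // the analyzed class
        CtMethod m = cc.getDeclaredMethod("addRating");
        // Inject code at the start of the method, e.g. a hook into the runtime:
        m.insertBefore("{ System.out.println(\"state access intercepted\"); }");
        return cc.toBytecode();                    // rewritten bytecode
    }
}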
35
@Partitioned Matrix userItem = new SeepMatrix();
Matrix coOcc = new Matrix();

void addRating(int user, int item, int rating) {
    userItem.setElement(user, item, rating);
    updateCoOccurrence(coOcc, userItem);
}

Vector getRec(int user) {
    Vector userRow = userItem.getRow(user);
    Vector userRec = coOcc.multiply(userRow);
    return userRec;
}
Partitioned State Annotation
> @Partitioned field annotation indicates partitioned state
hash(msg.id)
36
@Partitioned Matrix userItem = new SeepMatrix();
@Partial Matrix coOcc = new SeepMatrix();

void addRating(int user, int item, int rating) {
    userItem.setElement(user, item, rating);
    updateCoOccurrence(@Global coOcc, userItem);
}
Partial State and Global Annotations
> @Global annotates a variable to indicate access to all partial instances
> @Partial field annotation indicates partial state
37
@Partitioned Matrix userItem = new SeepMatrix();
@Partial Matrix coOcc = new SeepMatrix();

Vector getRec(int user) {
    Vector userRow = userItem.getRow(user);
    @Partial Vector puRec = @Global coOcc.multiply(userRow);
    Vector userRec = merge(puRec);
    return userRec;
}

Vector merge(@Collection Vector[] v) { /*…*/ }
Partial and Collection Annotations
> @Collection annotation indicates merge logic
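The merge body is left elided in the slide; one plausible implementation, assuming the example's Vector class exposes size/get/set accessors, is element-wise addition of the partial results:

// One plausible implementation: element-wise addition of the partial results.
// The actual merge logic is application-specific, and the Vector accessors
// used here (size/get/set) are assumptions about the example's Vector class.
Vector merge(@Collection Vector[] v) {
    Vector result = new Vector(v[0].size());
    for (Vector partial : v) {
        for (int i = 0; i < partial.size(); i++) {
            result.set(i, result.get(i) + partial.get(i));
        }
    }
    return result;
}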
38
Outline
> Failures
• Motivation
• SDG: Stateful Dataflow Graphs
• Handling distributed state in SDGs
• Translating Java programs to SDGs
• Checkpoint-based fault tolerance for SDGs
• Experimental evaluation
Program.java
39
Challenges of Making SDGs Fault Tolerant

Physical deployment of an SDG on physical nodes, with state held in RAM

> Task elements access local in-memory state
> Node failures may lead to state loss

Checkpointing state:
• No updates allowed while state is being checkpointed
• Checkpointing state should not impact the data processing path

State backup:
• Backups are large and cannot be stored in memory
• Large writes to disk through the network have a high cost
43
Checkpoint Mechanism for Fault Tolerance
1. Freeze mutable state for checkpointing
2. Dirty state supports updates concurrently
3. Reconcile dirty state

Asynchronous, lock-free checkpointing
Dirty state
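A minimal single-writer sketch of this freeze/dirty/reconcile cycle (class and method names are assumptions, and thread-safety corner cases are glossed over, so this is not SEEP's implementation):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Single-writer sketch of checkpointing with a dirty buffer; illustrative only.
class CheckpointedState {
    private final Map<String, Long> state = new ConcurrentHashMap<String, Long>();
    private final Map<String, Long> dirty = new ConcurrentHashMap<String, Long>();
    private volatile boolean checkpointing = false;

    void update(String key, long delta) {
        // 2. While a checkpoint is in progress, updates go to the dirty
        //    buffer so the frozen snapshot stays consistent.
        Map<String, Long> target = checkpointing ? dirty : state;
        Long old = target.get(key);
        target.put(key, (old == null ? 0L : old.longValue()) + delta);
    }

    void checkpoint() {
        checkpointing = true;                                // 1. freeze mutable state
        persist(new ConcurrentHashMap<String, Long>(state)); // copy off the data path
        checkpointing = false;
        for (Map.Entry<String, Long> e : dirty.entrySet())   // 3. reconcile dirty state
            update(e.getKey(), e.getValue().longValue());
        dirty.clear();
    }

    private void persist(Map<String, Long> snapshot) { /* ship to backup nodes */ }
}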
44
Distributed M to N Checkpoint Backup
M to N distributed backup and parallel recovery
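A sketch of the chunking behind M-to-N backup: each of the M nodes splits its checkpoint into N chunks so that N backup nodes each hold a fraction and recovery can proceed in parallel (illustrative types, not the SEEP API):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative chunking behind M-to-N checkpoint backup; not the SEEP API.
class MToNBackup {
    // Split one node's checkpoint into n chunks, one per backup node, so the
    // chunks can later be fetched and restored in parallel.
    static List<byte[]> partition(byte[] checkpoint, int n) {
        List<byte[]> chunks = new ArrayList<byte[]>(n);
        int chunkSize = (checkpoint.length + n - 1) / n;   // ceiling division
        for (int i = 0; i < n; i++) {
            int from = Math.min(i * chunkSize, checkpoint.length);
            int to = Math.min(from + chunkSize, checkpoint.length);
            chunks.add(Arrays.copyOfRange(checkpoint, from, to));
        }
        return chunks;
    }
}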
How does mutable state impact performance?
How efficient are translated SDGs?
What is the throughput/latency trade-off?

Experimental set-up:
– Amazon EC2 (c1 and m1 xlarge instances)
– Private cluster (4-core 3.4 GHz Intel Xeon servers with 8 GB RAM)
– Sun Java 7, Ubuntu 12.04, Linux kernel 3.10
53
Evaluation of SDG Performance
54
[Figure: Throughput (1,000 requests/s) and latency (ms) vs. workload (state read/write ratio, 1:5 to 5:1)]

Processing with Large Mutable State

Combines batch and online processing to serve fresh results over large mutable state

> addRating and getRec functions from the recommender algorithm, while changing the read/write ratio
55
[Figure: Throughput (GB/s) vs. number of nodes (25–100), comparing SDG and Spark]

Efficiency of Translated SDG

Translated SDG achieves performance similar to the non-mutable dataflow

> Batch-oriented, iterative logistic regression
56
Latency/Throughput Tradeoff

SDGs achieve high throughput while maintaining low latency

> Streaming word count query, reporting counts over windows

[Figure: Throughput (1,000 requests/s) vs. window size (ms, 10–10,000), comparing SDG, Naiad-LowLatency, Naiad-HighThroughput, and Streaming Spark]
Running Java programs with the performance of current distributed dataflow frameworks

SDG: Stateful Dataflow Graphs
– Abstractions for distributed mutable state
– Annotations to disambiguate types of distributed state and state access
– Checkpoint-based fault tolerance mechanism
59
Summary
Thank you! Any Questions?
@raulcfernandez [email protected]
https://github.com/lsds/Seep/
https://github.com/raulcf/SEEPng/
BACKUP SLIDES
61
62
[Figure: Throughput (million requests/s) and latency (ms) vs. aggregated memory (GB, 50–200)]

Scalability on State Size and Throughput

Support large state without compromising throughput or latency while staying fault tolerant

> Increasing state size in a mutable KV store
63
Iteration in SDG

> Local iteration supported by one node
> Iteration across TEs requires a cycle in the dataflow
• Partitioned
• Partial
• Global
• Collection
• Data annotations
  – Batch
  – Stream

64

Types of Annotations
Overhead of SDG Fault Tolerance
65
[Figure: Latency (ms) vs. state size (GB, 1–5 and no FT) and vs. checkpoint frequency (s, 2–10 and no FT)]

The fault tolerance mechanism's impact on performance and latency is small.

State size and checkpointing frequency do not affect performance.
66
[Figure: Throughput (10,000 requests/s) and latency (ms) vs. aggregated memory (MB), comparing SDG, Naiad-NoDisk, and Naiad-Disk]
Fault Tolerance Overhead
[Figure: Recovery time (s) vs. state size (GB, 1–4), comparing 1-to-1, 2-to-1, 1-to-2, and 2-to-2 recovery]
67
Recovery Times
68
[Figure: Throughput (1,000 requests/s) and number of nodes vs. time (s)]
Stragglers
69
[Figure: Throughput (1,000 requests/s) and latency (s) vs. state size (GB, 1–4), comparing synchronous and asynchronous checkpointing]

Fault Tolerance: Sync vs. Async
System      Large State   Mutable State   Low Latency   Iteration
MapReduce   n/a           n/a             No            No
Spark       n/a           n/a             No            Yes
Storm       n/a           n/a             Yes           No
Naiad       No            Yes             Yes           Yes
SDG         Yes           Yes             Yes           Yes
70
Comparison to State-of-the-Art
SDGs are the first stateful, fault-tolerant dataflow model, enabling execution of imperative code with explicit state
71
Characteristics of SDGs

> Runtime data parallelism (elasticity): adapts to varying workloads and provides a mechanism against stragglers
> Support for cyclic graphs: efficiently represents iterative algorithms
> Low latency: pipelining tasks decreases latency
72
Bob Local Expert
Hi, I have a query to run on “Big Data”
Ok, cool, tell me about it
I want to know sales per employee on Saturdays
… well … ok, come back in 3 days
Well, this is actually pretty urgent…
… 2 days, I’m pretty busy
2 Days After
Hi! You have the results?
Yes, here you have your sales last Saturday
My sales? I meant all employee sales, and not only last Saturday
Oops, sorry about that, give me 2 days…
17th ~ 18th NOV 2014, MADRID (SPAIN)