
Contributions to Large-Scale Data Processing Systems

PhD Defense

Matthieu Caneill

February 5, 2018

Jury:

- Daniel Hagimont, INP Toulouse / ENSEEIHT
- Jean-Marc Menaud, IMT Atlantique
- Sihem Amer-Yahia, CNRS / Université Grenoble Alpes
- Noël De Palma, Université Grenoble Alpes

Motivation

Worldwide data production

[Figure: worldwide data production, 2008-2018, growing from 0.3 ZB to an estimated 31 ZB in 2018; a marker shows the start of the PhD, around 15 ZB.]

1 zettabyte = 1000 exabytes = 10^6 petabytes = 10^9 terabytes

(1 zettabyte is 2 billion times my hard drive)


Motivation

Applications

- Genome sequencing and querying (human: 3 billion base pairs)
- Web and social networks (Facebook: 600 TB/day in 2014)
- Particle physics (CERN: 1 PB/s of collision data)
- etc.

Problems

- Data management at scale
- Data processing in reasonable time
- ...and at a reasonable price


Research questions

How to design...

- An industrial system to handle monitoring data and make predictions about future failures?
- An algorithm to improve locality in distributed streaming engines?
- A framework to compose data processing algorithms in a descriptive fashion, while reasoning on high-level abstractions?


Outline

Structure of this presentation

1. Online metrics prediction in monitoring systems
2. Locality data routing
3. λ-blocks
4. Conclusion

Metrics prediction in monitoring systems

How to design an industrial system to handle monitoring data and make predictions about future failures?

Actors and roles of Smart Support Center

- Coservit: monitoring services
- HP: cloud computing, hardware
- LIG – AMA: machine learning
- LIG – ERODS: cloud computing, systems

Scope of Smart Support Center

[Figure: monitoring data flows into a machine learning system running on a cloud.]

- Monitoring insights
- Failure prediction
- Infrastructure scaling
- More server uptime


Challenges

- Scaling the monitoring infrastructure (from 1 to N nodes)
- System design for low-latency analytics
- Fault tolerance


Metrics

- Monitoring metric: an observation point on a server in a datacenter
- CPU load, memory, service status
- Reported by agents, processed, and stored
- Computed as time series
- Associated with thresholds: warning and critical


Metrics behaviour: 6 scenarios

[Figure: a metric's value over time, with a warning zone and a critical zone; the six scenarios include quick rises, slow rises, a transient rise, and a perplexity point.]

Linear regression

[Figure: a linear trend fitted over a metric's recent values.]

- Able to identify local trends (a few hours)
- Fast to compute
- Good candidate to avoid false positives (peaks)
- Library: MLlib (part of Apache Spark)

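The production system fits these regressions with Spark MLlib; as a minimal, dependency-free sketch of the underlying idea, the following fits a least-squares line over recent samples and estimates when the local trend will cross a threshold (all names and values here are illustrative):

    def fit_line(ts, ys):
        """Ordinary least squares for y = a*t + b."""
        n = len(ts)
        mean_t = sum(ts) / n
        mean_y = sum(ys) / n
        cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
        var = sum((t - mean_t) ** 2 for t in ts)
        a = cov / var
        b = mean_y - a * mean_t
        return a, b

    def eta_to_threshold(ts, ys, threshold):
        """Seconds until the local trend reaches `threshold`, or None."""
        a, b = fit_line(ts, ys)
        if a <= 0:                       # flat or decreasing trend
            return None
        return max(0.0, (threshold - b) / a - ts[-1])

    # Example: disk usage sampled every 5 minutes, warning at 95%.
    ts = [0, 300, 600, 900, 1200]
    ys = [90.0, 90.5, 91.1, 91.4, 92.0]
    print(eta_to_threshold(ts, ys, 95.0))   # ~1850 s until the warning zone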

System architecture

[Figure: monitoring agents feed a monitoring broker; data is stored in a Cassandra database and processed by Spark + MLlib; results reach an alert manager, a GUI, etc.]


Desired properties

- Scalable: up to a few servers (150 CPU cores) to handle Coservit's load
- End-to-end fault tolerance: metrics can never be lost
- Performance: "fast" computation of metric predictions


Evaluation

Setup

- Hardware: 4 servers (16-28 cores, 128-256 GB RAM)
- Dataset: replay of production data recorded at Coservit
- 424 206 metrics, 1.5 billion data points, monitored on 25 070 servers

[Figure: swap memory metric over 10 hours, showing past values, the point where the prediction is computed, the predicted values, and the actual future values.]

[Figure: physical memory metric over 10 hours, with past, predicted, and actual future values.]

[Figure: disk partition usage over 20 hours, with past, predicted, and actual future values; the prediction crosses the warning threshold and raises an alert: disk full in 10 minutes.]

Metric blacklisting

- Some metrics are too volatile and hard to predict
- To avoid false positives/negatives, and to save resources, they are blacklisted
- Root Mean Square Error (RMSE) evaluated weekly
- Metrics are (temporarily) blacklisted if their RMSE exceeds a threshold
- 58.5% of the metrics have a low RMSE → good predictions

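A minimal sketch of such a weekly blacklisting pass (the threshold value, data layout, and helper names below are illustrative, not taken from the actual system):

    import math

    def rmse(predicted, observed):
        """Root Mean Square Error between two aligned series."""
        se = sum((p - o) ** 2 for p, o in zip(predicted, observed))
        return math.sqrt(se / len(predicted))

    def weekly_blacklist(metrics, threshold=0.5):
        """Return the ids of metrics whose predictions were too far off."""
        return {m_id for m_id, (pred, obs) in metrics.items()
                if rmse(pred, obs) > threshold}

    # Each metric id maps to (last week's predictions, observed values).
    metrics = {"cpu-42": ([1, 2, 3], [1.1, 2.0, 2.9]),
               "net-17": ([5, 5, 5], [1.0, 9.0, 2.0])}
    print(weekly_blacklist(metrics))   # {'net-17'}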

CPU load and memory consumption

[Figure: CPU and memory usage over 15 minutes on (a) the master and (b) slave-1, while running on 4 machines and 100 cores.]

Time repartition

[Figure: time breakdown (in ms) for predicting a metric, across the stages load, create dataframe, train, predict, save, and publish; each stage takes up to a few hundred milliseconds.]

Load handling

- End-to-end process for the prediction of 1 metric: 1 second.
- One monitoring server (with 24 cores) can handle the load of 1440 metrics (at worst), which corresponds to 85 servers on average.

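These figures are consistent with a simple back-of-the-envelope computation (assuming one prediction per core per second over a one-minute window, a reading that matches the numbers on the slide):

    # 24 cores, 1 second per end-to-end prediction, 60-second window:
    cores, seconds_per_metric, window = 24, 1, 60
    metrics_per_window = cores * window // seconds_per_metric   # 1440

    # 424 206 metrics spread over 25 070 servers:
    metrics_per_server = 424_206 / 25_070                       # ~16.9
    print(metrics_per_window / metrics_per_server)              # ~85 servers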

Load handling: linear scaling

[Figure: number of metrics processed in 15 minutes (x1000) versus the number of CPU cores, for 1, 2, and 3 slaves; throughput grows linearly with the number of cores.]

Related work

Positioning

No published work exhibits the same system (an end-to-end system for monitoring metrics prediction, storage, and blacklisting).

Prediction models

- Hardware failures [CAS12]
- Capacity planning (e.g. Microsoft Azure [mic])
- Datacenter temperature (e.g. Thermocast [LLL+11])
- Monitoring metrics (e.g. Zabbix [zab], with manual tuning)


Locality data routing

How to design an algorithm to improve locality in distributed streaming engines?

Actors

Collaboration with Vincent Leroy (SLIDE) and Ahmed El-Rheddane (ERODS).

Distributed streaming engines

Goals

- Real-time message handling
- Real-time metric calculations
- Parallelization
- Fault tolerance


Apache Storm → topologies.

[Figure: trending-hashtags topology: S → A (extract) → B (lower) → C (count).]

S sends tweets, operator A extracts hashtags, B converts them to lowercase, and C counts the frequency of each hashtag.

Division into tasks → distribution and parallelization made easy.


Stateful operators

States are associated with keys

For example, operator C can keep the list of trending hashtags (values) per location (keys).

[Figure: the same topology, with a state attached to operator C.]

Parallelization

To keep a consistent state, identical keys must be routed to the same instance; a sketch of the default routing follows below.

[Figure: S feeds instances A1/A2, then B1/B2, then C1/C2/C3; both "foo" tuples reach the same C instance, and both "bar" tuples reach another. Tasks A and B are stateless, C is stateful.]
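By default, engines implement this key grouping with a hash: the same key always reaches the same instance, but which server hosts that instance is arbitrary, which is why locality defaults to 1/parallelism (see the next slide). A minimal sketch:

    def hash_route(key, parallelism):
        """Default key grouping: deterministic, but locality-oblivious."""
        return hash(key) % parallelism

    # Both "foo" tuples reach the same instance of C...
    assert hash_route("foo", 3) == hash_route("foo", 3)
    # ...but the target instance, hence the target server, is arbitrary,
    # so on average only 1/parallelism of the traffic stays local.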

Situation

Consider two stateful operators, each with two instances.

[Figure: S feeds instances A1 and A2, which feed B1 and B2, spread over server 1 and server 2.]

Goal

Minimize the traffic between the machines (the links A1 → B2 and A2 → B1). By default, locality = 1/parallelism.

Constraint

Keep a good load balance between the machines.


Keys correlation

Dynamically instrument the key couples and represent them as a bipartite graph.

[Figure: bipartite graph between location keys (Asia: 7443, Oceania: 5190) and hashtag keys (#java: 4664, #ruby: 3892, #python: 4077), with edges weighted by how often each (location, hashtag) couple was observed; the resulting partition spans server 1 and server 2.]

Routing tables

- S:  Asia → A1, Oceania → A2
- A1: #java → B1, #ruby → B1, #python → B2
- A2: #python → B2, #java → B1, #ruby → B1

Graph partitioning → optimized routing, favoring local links.

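A toy sketch of the idea: count key co-occurrences while the stream runs, then derive routing tables that co-locate heavily correlated keys (the greedy placement below stands in for the real graph-partitioning step; the counts are illustrative):

    from collections import Counter

    # (location, hashtag) -> observed co-occurrence count
    pairs = Counter({("Asia", "#java"): 3463,
                     ("Oceania", "#python"): 3108,
                     ("Asia", "#ruby"): 1201})

    def build_routes(pairs, servers=2):
        """Greedily co-locate the heaviest (location, hashtag) couples."""
        loc_route, tag_route, load = {}, {}, [0] * servers
        for (loc, tag), weight in pairs.most_common():
            s = loc_route.get(loc, tag_route.get(tag))
            if s is None:                    # neither key placed yet:
                s = load.index(min(load))    # pick the least-loaded server
            loc_route.setdefault(loc, s)
            tag_route.setdefault(tag, s)
            load[s] += weight
        return loc_route, tag_route

    print(build_routes(pairs))
    # ({'Asia': 0, 'Oceania': 1}, {'#java': 0, '#python': 1, '#ruby': 0})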

Example: routing and live reconfiguration

With S feeding A1/A2 and B1/B2 across two servers, the routing tables fill in as messages arrive, then get rewritten when a better partition is found:

- Message "#python doesn't have braces", posted from Oceania: S learns the route Oceania → A1, and A learns python → B2.
- Message "#java is a verbose language", posted from Asia: S learns Asia → A2, and A learns java → B1.
- A correlation is detected between Oceania/python and Asia/java: a reconfiguration is computed and applied, swapping the A routes to python → B1 and java → B2.
- Message "#python is pretty cool!", posted from Oceania, now follows the new routes.

Trends evolve with time

Correlations between keys change frequently.

[Figure: daily frequency of #nevertrump during March 2-13, 2016, broken down by Virginia, Texas, and Florida.]


Locality decay

- Key correlations evolve with time.
- Routing tables optimized by examining old data lead to decreased locality.

Reconfiguration

- We re-compute the tables every N minutes.
- Difficulty: keeping the state consistent.


Reconfiguration protocol

Solution: an online reconfiguration protocol

- update the routing tables in a live system
- without losing any message or state

[Figure: message sequence between the manager M and the instances A1, A2, B1, B2.]

1. Get statistics
2. Send statistics (the manager then partitions the graph and computes the routing tables)
3. Send reconfiguration
4. Send ACK
5. Propagate
6. Transfer key states (then propagate to the next operator)

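A condensed sketch of the hand-over at the heart of steps 3-6, ignoring the distributed messaging: each instance transfers the state of every key it no longer owns before the new table takes effect, so no state is lost (the data layout and names are illustrative):

    def apply_reconfiguration(instances, old_route, new_route):
        """instances: id -> {key: state}; routes: key -> instance id."""
        # 6) Transfer the state of every key whose owner changed.
        for key, old_owner in old_route.items():
            new_owner = new_route[key]
            if new_owner != old_owner:
                instances[new_owner][key] = instances[old_owner].pop(key)
        # Only then does the new routing table take effect.
        return new_route

    instances = {"B1": {"#java": 12}, "B2": {"#python": 7}}
    apply_reconfiguration(instances,
                          {"#java": "B1", "#python": "B2"},
                          {"#java": "B2", "#python": "B1"})
    print(instances)   # {'B1': {'#python': 7}, 'B2': {'#java': 12}}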

Evaluation

Datasets

- From Flickr and Twitter
- Fields: location (country or place), hashtag
- Size: 173M records (Flickr), 100M (Twitter)

Setup

- 8 machines, each with 128 GB RAM and 20 cores.
- Computation of aggregated statistics (stateful workers).
- Varied: parallelism (2..6), network speed (1 Gb/s or 10 Gb/s), message size (0..20 kB).


Great speed-up when the network is the bottleneck.

Highly dependent on message size.


Evaluation – Flickr

[Figure: throughput (Ktuples/s) over 30 minutes on a 10 Gb/s network with parallelism 6, with and without reconfiguration, for message sizes of (a) 4 kB and (b) 8 kB.]

[Figure: the same throughput comparison on a 1 Gb/s network.]

[Figure: average throughput (Ktuples/s) versus parallelism (2-6), on a 1 Gb/s network with 4 kB messages, with and without reconfiguration, measured after the first reconfiguration.]

[Figure: locality over 25 weeks with parallelism 6, comparing hash-based routing, offline optimization, and online reconfiguration.]

[Figure: locality as a function of the number of collected key-correlation edges (logarithmic scale, 10^1 to 10^7), for parallelism 2 to 6.]

Related work

Scheduling: placement of operators on servers

- Using the topology [ABQ13]
- Using observed communication patterns [ABQ13]
- Using observed and/or estimated CPU and memory patterns [FB15, PHH+15]

Load balancing: limiting the impact of data skew

- Partial key grouping [NMG+15]
- Special routing for frequent keys [RQA+15]

Co-location of correlated keys

- Database partitions [CJZM10], social networks [BJJL13]


λ-blocks

How to design a framework to compose data processing algorithms in a descriptive fashion, while reasoning on high-level abstractions?

Design goals

- A data processing abstraction
- A graph of code blocks to represent an end-to-end processing system
- Separation of concerns: low-level data operations vs. high-level data processing programs
- Maximize reuse of code
- Compatible with existing (specialized) frameworks, with the possibility to mix them
- A graph manipulation toolkit
- Bring simplicity to large-scale data processing


Topologies

[Figure: a small topology of three blocks: read file (/etc/passwd), filter (contains: 'root'), count.]


The same computation, written as a standalone Python script:

    """Counts system users.
    """

    def main():
        with open('/etc/passwd') as f:
            return len(f.readlines())

    if __name__ == '__main__':
        print(main())

or as a shell one-liner:

    $ wc -l /etc/passwd


The equivalent λ-blocks topology, described in YAML:

    ---
    name: count_users
    description: Count number of system users
    modules: [lb.blocks.foo]
    ---
    - block: readfile
      name: my_readfile
      args:
        filename: /etc/passwd
    - block: count
      name: my_count
      inputs:
        data: my_readfile.result

Blocks

- Generic: read http, plot bars, show console, write line, write lines, split, concatenate, map list, flatMap, flatten list, group by count, sort
- Spark: get spark context, spark readfile, spark text to words, spark map, spark filter, spark flatMap, spark mapPartitions, spark sample, spark union, spark intersection, spark distinct, spark groupByKey, spark reduceByKey, spark aggregateByKey, spark sortByKey, spark join, spark cogroup, spark cartesian, spark pipe, spark coalesce, spark repartition, spark reduce, spark collect, spark count, spark first, spark take, spark takeSample, spark takeOrdered, spark saveAsTextFile, spark countByKey, spark foreach, spark add, spark swap
- Misc: twitter search, cat, grep, cut, head, tail

A block is a decorated Python function:

    @block(engine='localpython')
    def take(n: int=0):
        """Truncates a list of integers.

        :param int n: The length of the desired result.
        :input List[int] data: The list of items to truncate.
        :output List[int] result: The truncated result.
        """
        def inner(data: List[int]) -> ReturnType[List[int]]:
            assert n <= len(data)
            return ReturnEntry(result=data[:n])
        return inner
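A minimal sketch of how such a @block decorator could work, just to make the mechanism concrete (an illustration, not the actual λ-blocks source):

    _REGISTRY = {}   # block name -> metadata and factory

    def block(engine='localpython'):
        def register(factory):
            _REGISTRY[factory.__name__] = {
                'engine': engine,
                'factory': factory,      # called with args, returns inner
                'doc': factory.__doc__,
            }
            return factory
        return register

    # A topology runner can then instantiate a block from its YAML args:
    # fn = _REGISTRY['take']['factory'](n=3); fn(data=[1, 2, 3, 4])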

Sub-topologies

[Figure: a count_pb sub-topology (filter → count) embedded between a readfile block and a print block.]

    ---
    name: count_pb
    ---
    - block: filter
      name: filter
      args:
        contains: error
      inputs:
        data: $inputs.data
    - block: count
      name: count
      inputs:
        data: filter.result

    ---
    name: foo_errors
    ---
    - block: readfile
      name: readfile
      args:
        filename: foo.log
    - topology: count_pb
      name: count_pb
      bind_in:
        data: readfile.result
      bind_out:
        result: count.result
    - block: print
      name: print
      inputs:
        data: count_pb.result

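Executing such a topology boils down to running the blocks in topological order and wiring each block's outputs to its consumers. A dependency-free sketch of that engine loop (the names are illustrative, not the λ-blocks API; graphlib requires Python 3.9+):

    from graphlib import TopologicalSorter

    def run_topology(blocks, edges):
        """blocks: name -> callable(**inputs); edges: (src, dst) pairs."""
        deps = {name: set() for name in blocks}
        for src, dst in edges:
            deps[dst].add(src)
        results = {}
        for name in TopologicalSorter(deps).static_order():
            inputs = {src: results[src] for src in deps[name]}
            results[name] = blocks[name](**inputs)
        return results

    blocks = {
        'readfile': lambda: ["error: a", "ok", "error: b"],
        'filter': lambda readfile: [l for l in readfile if "error" in l],
        'count': lambda filter: len(filter),
    }
    print(run_topology(blocks, [('readfile', 'filter'),
                                ('filter', 'count')])['count'])   # 2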

Architecture

[Figure: architecture components: block libraries feeding a blocks registry, a graph engine executing topologies, an API and CLI, and graph plugins.]


Graph manipulations

- Verification (e.g. type checking)
- Instrumentation
- Caching
- Debugging tools
- Optimizations
- Monitoring
- Program reasoning and semantics


- Reasoning on the computation graph as a high-level object
- Plugin system
- Hooks:
  - before graph execution: pre-processing, optimizations, verifications
  - after graph execution: post-processing
  - before block execution: observation, optimizations
  - after block execution: observation


Graph manipulation example: instrumentation (excerpt)

    import time
    from collections import defaultdict

    by_block = defaultdict(dict)   # timing by block: begin, duration

    @before_block_execution
    def store_begin_time(block):
        name = block.fields['name']
        by_block[name]['begin'] = time.time()

    @after_block_execution
    def store_end_time(block, results):
        name = block.fields['name']
        by_block[name]['duration'] = \
            time.time() - by_block[name]['begin']


    @after_graph_execution
    def show_times(results):
        longest_first = sorted(by_block,
                               key=lambda b: by_block[b]['duration'],
                               reverse=True)
        for blockname in longest_first:
            print('{}\t{}'.format(
                blockname,
                by_block[blockname]['duration']))

Graph manipulation example: instrumentation

    block         duration (ms)
    read http     818
    write lines   54
    grep          49
    split         20

Evaluation

Setup

- Wordcount over https: local machine, 8 cores, 16 GB RAM
- Wordcount over disk: local machine, 8 cores, 16 GB RAM
- PageRank on Spark: Spark on 1 server (24 cores, 128 GB RAM)

Performances

[Figure: wordcount over https (Twitter feed): real, user, and sys times for LB, LB+plugins, and plain Python.]

[Figure: wordcount over disk (Wikipedia dataset): real, user, and sys times for LB, LB+plugins, and plain Python.]

[Figure: PageRank on Wikipedia hyperlinks with Spark: real, user, and sys times for LB, LB+plugins, and plain Python.]

Maximum overhead measured per topology: 50 ms.

Related work

Dataflow programming

- ML pipelines: scikit-learn [PVG+11], Spark [The17a], the Orange framework [DCE+13]
- Real-time: Apache Beam [apa], StreamPipes [RKHS15]

Blocks programming

- Recognition over recall, immediate feedback [BGK+17]

Graphs from configuration

- Pyleus [Yel16], Storm Flux [The17b]

Other

- "Serverless" architectures and stateless functions [JVSR17]


Conclusion

Context

Computer systems to process large quantities of data.

Problems: how to design...

- An industrial system to handle monitoring data and make predictions about future failures?
- An algorithm to improve locality in distributed streaming engines?
- A framework to compose data processing algorithms in a descriptive fashion, while reasoning on high-level abstractions?


Contributions

                 Metrics prediction   Locality routing          λ-blocks
    What it is   Industrial system    Online routing library    Data processing abstraction
    Layer        End-to-end           Low                       High
    Improves     Uptimes              Throughput                Programmability


Future work

Metrics prediction in monitoring systems

- Predictions on long-term global trends
- Ticketing mechanism

Locality data routing

- Replace binary locality/non-locality with a distance
- Smarter way to determine when to reschedule
- Extend to more complex topologies


λ-blocks

- Explore more graph manipulation abstractions (complexity analysis, serialization, verification...)
- Streaming and online operations
- Tight integration with clusters (data storage, caches, etc.)

Thanks! Questions?

Backup slides

λ-blocks: using a Spark cluster

[Figure: a topology mixing normal blocks and blocks calling Spark; the Spark-calling blocks delegate their work to a cluster (master, slave-1, slave-2, slave-3).]

λ-blocks: signature algorithm

    H(B) = h(B.name,      block name (not instance name)
             B.args,      list of (name, value) tuples
             B.inputs)    list of (name, H(block), connector) tuples
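This recursive signature makes block results cacheable: two blocks hash identically only if they have the same name, the same arguments, and transitively identical inputs. A minimal sketch of the idea (illustrative, not the actual implementation):

    import hashlib

    def signature(block):
        """Recursive structural hash of a block and its upstream graph."""
        payload = repr((
            block['block'],                          # block name
            sorted(block.get('args', {}).items()),   # (name, value) tuples
            [(name, signature(up), connector)        # (name, H(block), connector)
             for name, (up, connector)
             in sorted(block.get('inputs', {}).items())],
        ))
        return hashlib.sha256(payload.encode()).hexdigest()

    read = {'block': 'readfile', 'args': {'filename': 'foo.log'}}
    count = {'block': 'count', 'inputs': {'data': (read, 'result')}}
    print(signature(count)[:16])   # stable cache key for count's results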

λ-blocks: engine instrumentation

[Figure: wordcount program running under different setups (all blocks + plugins, selected blocks + plugins, selected blocks), with time broken down into: (1) startup (module imports, etc.); (2) blocks registry creation, block module imports; (3) plugin import; (4) YAML parsing and graph creation; (5) graph checks; (6) graph execution.]

Metrics prediction: database schema

- metrics: metric id (uuid), metric name (text), group id (uuid)
- measurements: metric id (uuid), timestamp (int), warn (text), crit (text), max (double), min (double), value (double), metric name (text), metric unit (text)
- predictions: metric id (uuid), timestamp (int), predicted values (list)

Image credits

- Data Center operators verifying network cable integrity, CC-BY-SA, https://commons.wikimedia.org/wiki/File:Dc_cabling_50.jpg
- Tokyo metro map, http://bento.com/subtop5.html
- Goto e spaghetti code, http://blogbv2.altervista.org/HD/il-goto-e-la-buona-programmazione-parte-ii/

Bibliography

[ABQ13] Leonardo Aniello, Roberto Baldoni, and Leonardo Querzoni. Adaptive online scheduling in Storm. In Proceedings of the 7th ACM International Conference on Distributed Event-based Systems, DEBS '13, pages 207-218. ACM, 2013.

[apa] Apache Beam. https://beam.apache.org/.

[BGK+17] David Bau, Jeff Gray, Caitlin Kelleher, Josh Sheldon, and Franklyn Turbak. Learnable programming: Blocks and beyond. Commun. ACM, 60(6):72-80, May 2017.

[BJJL13] Xiao Bai, Arnaud Jegou, Flavio Junqueira, and Vincent Leroy. DynaSoRe: Efficient in-memory store for social applications. In Middleware 2013 - ACM/IFIP/USENIX 14th International Middleware Conference, Beijing, China, December 9-13, 2013, Proceedings, pages 425-444, 2013.

[CAS12] T. Chalermarrewong, T. Achalakul, and S. C. W. See. Failure prediction of data centers using time series and fault tree analysis. In 2012 IEEE 18th International Conference on Parallel and Distributed Systems, pages 794-799, December 2012.

[CJZM10] Carlo Curino, Evan Jones, Yang Zhang, and Sam Madden. Schism: A workload-driven approach to database replication and partitioning. Proc. VLDB Endow., 3(1-2):48-57, September 2010.

[DCE+13] Janez Demsar, Tomaz Curk, Ales Erjavec, Crt Gorup, Tomaz Hocevar, Mitar Milutinovic, Martin Mozina, Matija Polajnar, Marko Toplak, Anze Staric, Miha Stajdohar, Lan Umek, Lan Zagar, Jure Zbontar, Marinka Zitnik, and Blaz Zupan. Orange: Data mining toolbox in Python. Journal of Machine Learning Research, 14:2349-2353, 2013.

[FB15] Lorenz Fischer and Abraham Bernstein. Workload scheduling in distributed stream processors using graph partitioning. In 2015 IEEE International Conference on Big Data, Santa Clara, CA, USA, pages 124-133, 2015.

[JVSR17] Eric Jonas, Shivaram Venkataraman, Ion Stoica, and Benjamin Recht. Occupy the cloud: Distributed computing for the 99%. arXiv preprint arXiv:1702.04024, 2017.

[LLL+11] Lei Li, Chieh-Jan Mike Liang, Jie Liu, Suman Nath, Andreas Terzis, and Christos Faloutsos. Thermocast: A cyber-physical forecasting model for datacenters. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '11, pages 1370-1378, New York, NY, USA, 2011. ACM.

[mic] Microsoft cloud Azure. https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-algorithm-choice.

[NMG+15] Muhammad Anis Uddin Nasir, Gianmarco De Francisci Morales, David Garcia-Soriano, Nicolas Kourtellis, and Marco Serafini. The power of both choices: Practical load balancing for distributed stream processing engines. In 31st IEEE International Conference on Data Engineering, ICDE, pages 137-148, 2015.

[PHH+15] Boyang Peng, Mohammad Hosseini, Zhihao Hong, Reza Farivar, and Roy Campbell. R-Storm: Resource-aware scheduling in Storm. In Proceedings of the 16th Annual Middleware Conference, Middleware '15, pages 149-161, New York, NY, USA, 2015. ACM.

[PVG+11] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.

[RKHS15] Dominik Riemer, Florian Kaulfersch, Robin Hutmacher, and Ljiljana Stojanovic. StreamPipes: solving the challenge with semantic stream processing pipelines. In Proceedings of the 9th ACM International Conference on Distributed Event-Based Systems, pages 330-331. ACM, 2015.

[RQA+15] Nicolo Rivetti, Leonardo Querzoni, Emmanuelle Anceaume, Yann Busnel, and Bruno Sericola. Efficient key grouping for near-optimal load balancing in stream processing systems. In Proceedings of the 9th ACM International Conference on Distributed Event-Based Systems, DEBS '15, pages 80-91, New York, NY, USA, 2015. ACM.

[The17a] The Apache Spark developers. ML Pipelines. https://spark.apache.org/docs/latest/ml-pipeline.html, 2017.

[The17b] The Apache Storm developers. Flux. http://storm.apache.org/releases/2.0.0-SNAPSHOT/flux.html, 2017.

[Yel16] YelpArchive. Pyleus. https://github.com/YelpArchive/pyleus, 2016.

[zab] Zabbix prediction triggers. https://www.zabbix.com/documentation/3.0/manual/config/triggers/prediction.