+ All Categories
Home > Software > JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Date post: 23-Jan-2018
Category:
Upload: jonas-traub
View: 194 times
Download: 0 times
Share this document with a friend
84
On-Demand Data Streaming from Sensor Nodes (ACM SoCC 2017) and A quick overview of Apache Flink Presentation at Sep. 30, 2017 University of California, Santa Barbara
Transcript
Page 1: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

On-Demand Data Streaming from Sensor Nodes

(ACM SoCC 2017)

and

A quick overview of Apache Flink

Presentation at Sep. 30, 2017

University of California, Santa Barbara

Page 2: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

About me

• Researcher and PhD candidate at – Technische Universität Berlin (DIMA)

– German Research Center for Artificial Intelligence (DFKI) / (IAM)

• Working with Volker Markl

• Before – Master’s degree in Computer Science (KTH Stockholm and TU Belin)

– Bachelor’s degree in Applied Computer Science (DHBW Stuttgart)

– Four years at IBM in Germany and the USA

Jonas Traub [email protected] [email protected]

Page 3: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

Optimized On-Demand Data

Streaming from Sensor Nodes

Jonas Traub, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl

Extended Talk for . .

Santa Clara, California,

September 25-27, 2017

Page 4: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

The Sensor Cloud

Real-time

insights

4

Page 5: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

The Sensor Cloud

Real-time

insights

Billions of sensor nodes form a sensor cloud

and provide data streams to analysis systems.

5

Page 6: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

The Sensor Cloud

Real-time

insights

Billions of sensor nodes form a sensor cloud

and provide data streams to analysis systems.

6

Page 7: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

The Sensor Cloud

Real-time

insights

Billions of sensor nodes form a sensor cloud

and provide data streams to analysis systems.

7

Page 8: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

The Sensor Cloud

Real-time

insights

Billions of sensor nodes form a sensor cloud

and provide data streams to analysis systems.

8

Page 9: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

The Sensor Cloud – Problems

Real-time

insights

9

Billions of sensor nodes form a sensor cloud

and provide data streams to analysis systems.

Page 10: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

The Sensor Cloud – Problems

Real-time

insights

Streaming all data from billions

of sensors to all applications

with maximal frequencies is impossible

10

Billions of sensor nodes form a sensor cloud

and provide data streams to analysis systems.

Page 11: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

The Sensor Cloud – Problems

Real-time

insights

Streaming all data from billions

of sensors to all applications

with maximal frequencies is impossible

Increasing data rates

require expensive

system scale-out.

11

Billions of sensor nodes form a sensor cloud

and provide data streams to analysis systems.

Page 12: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

The Sensor Cloud – Solutions

12

Tailor Data Streams to the Demand of Applications

Page 13: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

The Sensor Cloud – Solutions

13

Tailor Data Streams to the Demand of Applications

• Provide an abstraction to define the data demand of applications.

User-Defined Sampling Functions (UDSFs)

Page 14: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

The Sensor Cloud – Solutions

14

Tailor Data Streams to the Demand of Applications

• Provide an abstraction to define the data demand of applications.

• Optimize communication costs while maintaining the result accuracy.

User-Defined Sampling Functions (UDSFs)

Read-Time Optimization

Page 15: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

The Sensor Cloud – Solutions

15

Tailor Data Streams to the Demand of Applications

• Provide an abstraction to define the data demand of applications.

• Optimize communication costs while maintaining the result accuracy.

• Share sensor reads and data transfer among users and queries.

User-Defined Sampling Functions (UDSFs)

Read-Time Optimization

Multi-Query / Multi-User Optimization

Page 16: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

A Motivating Example

16

Page 17: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

A Motivating Example

17

Page 18: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

A Motivating Example

18

Different Data Data Demands:

• Query 1 adaptively increases sampling rates when accelerating or braking.

Page 19: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

A Motivating Example

19

Different Data Data Demands:

• Query 1 adaptively increases sampling rates when accelerating or braking.

Page 20: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

A Motivating Example

20

Different Data Data Demands:

• Query 1 adaptively increases sampling rates when accelerating or braking.

• Query 2 requires a sample at least every 20 meters

Page 21: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

A Motivating Example

21

Different Data Data Demands:

• Query 1 adaptively increases sampling rates when accelerating or braking.

• Query 2 requires a sample at least every 20 meters

Page 22: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

A Motivating Example

22

Different Data Data Demands:

• Query 1 adaptively increases sampling rates when accelerating or braking.

• Query 2 requires a sample at least every 20 meters

• Query 3 requires a sample at least every 0.3s.

Page 23: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

A Motivating Example

23

Different Data Data Demands:

• Query 1 adaptively increases sampling rates when accelerating or braking.

• Query 2 requires a sample at least every 20 meters

• Query 3 requires a sample at least every 0.3s.

Page 24: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

A Motivating Example

24

Different Data Data Demands:

• Query 1 adaptively increases sampling rates when accelerating or braking.

• Query 2 requires a sample at least every 20 meters

• Query 3 requires a sample at least every 0.3s.

Page 25: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

A Motivating Example - Evaluation

25

Page 26: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

A Motivating Example - Evaluation

26

-57%

Page 27: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

A Motivating Example - Evaluation

27

-57%

-72%

Page 28: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

A Motivating Example - Evaluation

28

1/3 because 3 values per tuple

-57%

-72%

Page 29: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

Architecture Overview

29

Page 30: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

Architecture Overview

30

Page 31: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

Architecture Overview

31

Page 32: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

Architecture Overview

32

Page 33: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

Architecture Overview

33

Page 34: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

Sensor Read Scheduling

34

Page 35: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

User-Defined Sampling Functions

35

Input:

Sensor read time and value Output:

Next Sensor Read Request

Page 36: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

User-Defined Sampling Functions

36

Input:

Sensor read time and value

Page 37: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

User-Defined Sampling Functions

37

Enable adaptive sampling techniques to reduce data transmission

e.g., Adam [Trihinas ‘15], FAST [Fan ‘14], L-SIP [Gaura ’13]

Page 38: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

User-Defined Sampling Functions - Examples

38

Page 39: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

User-Defined Sampling Functions - Examples

39

Page 40: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

User-Defined Sampling Functions - Examples

40

Page 41: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

Sensor Read Fusion

41

Page 42: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

Sensor Read Fusion

42

1) Minimize Sensor Reads and Data Transfer:

Latest possible read time

Page 43: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

Read Time Optimization

43

2) Optimize Sensor Read Times:

● Minimize penalty while executing the minimum number of sensor reads only

● Challenge: assign read requests to sensor reads

Page 44: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

Assigning Read Requests to Sensor Reads

44

Postpone Assign to next Read

Page 45: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

Assigning Read Requests to Sensor Reads

45

Postpone Assign to next Read

Page 46: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

Assigning Read Requests to Sensor Reads

46

Postpone Assign to next Read

Page 47: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

Assigning Read Requests to Sensor Reads

47

Postpone Assign to next Read

Page 48: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

Local Filtering

48

Page 49: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

Local Filtering

49

● Enable adaptive filtering in combination with adaptive sampling

● Enable model-driven data acquisition

Page 50: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

Local Filtering

50

Page 51: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

Evaluation

● Replay sensor data

- from a football match [DEBS Grand Challenge ’13]

- formula 1 telementry data

● Random UDSFs:

- Read in a poisson process (also simulate load peaks)

- In average 1 read per query per second

- Exponentially distributed read time tolerance

- high probability for small tolerances

- small probability for large tolerances

- In average 0.04s read time tolerance

51

Page 52: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17 52

Increasing the number of concurrent queries

• On-Demand scheduling reduces sensor reads and data transfer by up to 87%.

• The # of reads and transfers increases sub-linearly with the # of queries.

Page 53: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17 53

Increasing the number of concurrent queries

• Our read-time optimizer reduces the deviation from desired read times

by up to 69% (preserving the min. # of reads and transfers).

Page 54: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17 54

Increasing read time tolerances

Page 55: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17 55

Increasing read time tolerances

Page 56: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17 56

Query Prioritization (1/2)

Page 57: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17 57

Query Prioritization (2/2)

Page 58: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17 58

Slack Robustness of Adaptive Sampling

Page 59: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

Optimized On-Demand Data

Streaming from Sensor Nodes

Wrap-Up:

Tailor Data Streams to the Demand of Applications

• Define data demand: User-Defined Sampling Functions

• Schedule sensor reads and data transfer on-demand

• Optimize read times globally - for all users and queries

Jonas Traub, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl

Page 60: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17

A quick overview of Apache Flink - research summary -

Jonas Traub visiting September 30, 2017

Page 61: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Outline

Apache Flink Primer

• Stratosphere – The origin of Apache Flink

• What is Apache Flink? – Basic System Internals

• The Flink Community

An Apache Flink Research Summary

61

Page 62: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

62 © Volker Markl

• Relational Algebra

• Declarativity

• Query Optimization

• Robust Out-of-core

• Scalability

• User-defined

Functions

• Complex Data Types

• Schema on Read

62

Draws on

Database Technology

Draws on

MapReduce Technology

Stratosphere: General Purpose

Programming + Database Execution

Page 63: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

63 © Volker Markl

• Relational Algebra

• Declarativity

• Query Optimization

• Robust Out-of-core

• Scalability

• User-defined

Functions

• Complex Data Types

• Schema on Read

• Iterations

• Advanced Dataflows

• General APIs

• Native Streaming

63

Draws on

Database Technology

Draws on

MapReduce Technology

Adds

Stratosphere: General Purpose

Programming + Database Execution

Page 64: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

64 © Volker Markl 64

64 © Volker Markl

Apache Flink is an open source platform for scalable batch and stream

data processing.

What is Apache Flink?

http://flink.apache.org

A distributed system that you can use to process data

Like a DBMS but not exactly a DBMS

What kind of data? Data that comes in the form of streams

What kind of processing

Quite flexible. You can use Java/Scala APIs similar to programming with Java collections, the new SQL API, etc

Distributed: runs on many (1000s) of machines and hides this complexity from the user

64

Page 65: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

65 © Volker Markl 65

65 © Volker Markl

Basic application architecture

app state

app state

app state

event log

Query

service

Sources of data

(e.g., sensors,

logs, …)

A replayable log of

events with pub/sub

functionality

Processing of

events

Storage and query systems

By courtesy of Kostas Tzoumas 65

Page 66: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

66 © Volker Markl 66 © 2013 Berlin Big Data Center • All Rights Reserved

66 © Volker Markl

Technology inside Flink

case class Path (from: Long, to: Long) val tc = edges.iterate(10) { paths: DataSet[Path] => val next = paths .join(edges) .where("to") .equalTo("from") { (path, edge) => Path(path.from, edge.to) } .union(paths) .distinct() next } Program

Page 67: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

67 © Volker Markl 67 © 2013 Berlin Big Data Center • All Rights Reserved

67 © Volker Markl

Technology inside Flink

case class Path (from: Long, to: Long) val tc = edges.iterate(10) { paths: DataSet[Path] => val next = paths .join(edges) .where("to") .equalTo("from") { (path, edge) => Path(path.from, edge.to) } .union(paths) .distinct() next }

Cost-based

optimizer

Type extraction

stack

Pre-flight (Client) Program

Page 68: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

68 © Volker Markl 68 © 2013 Berlin Big Data Center • All Rights Reserved

68 © Volker Markl

Technology inside Flink

case class Path (from: Long, to: Long) val tc = edges.iterate(10) { paths: DataSet[Path] => val next = paths .join(edges) .where("to") .equalTo("from") { (path, edge) => Path(path.from, edge.to) } .union(paths) .distinct() next }

Cost-based

optimizer

Type extraction

stack

Pre-flight (Client) DataSourc

e orders.tbl

Filter

Map DataSourc

e lineitem.tbl

Join Hybrid Hash

build

HT probe

hash-part [0] hash-part [0]

GroupRed

sort

forward

Program

Dataflow

Graph

Page 69: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

69 © Volker Markl 69 © 2013 Berlin Big Data Center • All Rights Reserved

69 © Volker Markl

Technology inside Flink

case class Path (from: Long, to: Long) val tc = edges.iterate(10) { paths: DataSet[Path] => val next = paths .join(edges) .where("to") .equalTo("from") { (path, edge) => Path(path.from, edge.to) } .union(paths) .distinct() next }

Cost-based

optimizer

Type extraction

stack

Task

scheduling

Recovery

metadata

Pre-flight (Client)

Master Workers

DataSourc

e orders.tbl

Filter

Map DataSourc

e lineitem.tbl

Join Hybrid Hash

build

HT probe

hash-part [0] hash-part [0]

GroupRed

sort

forward

Program

Dataflow

Graph

deploy

operators

track

intermediate

results

Page 70: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

70 © Volker Markl 70

70 © Volker Markl

Flink community

0

50

100

150

200

250

300

Feb15 Dec15 Dec16

NumberofContributors

0

200

400

600

800

1000

1200

1400

1600

1800

2000

Feb15 Dec15 Dec16

StarsonGitHub

0

200

400

600

800

1000

1200

1400

Feb15 Dec15 Dec16

ForksonGitHub

By courtesy of Kostas Tzoumas

Project Statistics (Updated: Sep 29, 2017)

70

Page 71: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

71 © Volker Markl 71

71 © Volker Markl

Companies Using Flink

Page 72: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Apache Flink - Related Publications

System Paper 2015:

System Paper 2014:

72

Page 73: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Apache Flink - Related Publications

System Paper 2015:

System Paper 2014:

73

Page 74: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Apache Flink - Related Publications

System Paper 2015:

System Paper 2014:

74

Page 75: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

State Management

VLDB 2017

75

Page 76: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Iterative Processing

VLDB 2012

76

Page 77: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Iterative Processing

VLDB 2012

SIGMOD 2013 77

Page 78: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Fault Tolerance

78

Page 79: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Fault Tolerance

79

Page 80: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Fault Tolerance

80

Page 81: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Fault Tolerance

81

Page 82: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Streaming Window Aggregation

82

Page 83: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

Visualization of Streaming Data

83

Page 84: JT@UCSB - On-Demand Data Streaming from Sensor Nodes and A quick overview of Apache Flink

On-Demand Data Streaming from Sensor Nodes

(ACM SoCC 2017)

and

A quick overview of Apache Flink

Presentation at Sep. 30, 2017

University of California, Santa Barbara


Recommended