Date post: | 16-Apr-2017 |
Category: |
Technology |
Upload: | tugdual-grall |
#DevoxxFR
Stream Processing with Apache Flink
Tugdual “Tug” Grall Technical Evangelist @ MapR [email protected] @tgrall
{“about” : “me”}
Tugdual “Tug” Grall
• MapR: Technical Evangelist
• MongoDB, Couchbase, eXo, Oracle
• NantesJUG co-founder
• @tgrall
• http://tgrall.github.io
• [email protected] / [email protected]
[Diagram: MapR Converged Data Platform]
• On top: Open Source Engines & Tools, Commercial Engines & Applications, Cloud and Managed Services, Custom Apps, Search and Others
• Data Processing
• APIs: HDFS API; POSIX, NFS; HBase API; JSON API; Kafka API
• Web-Scale Storage (MapR-FS), Database (MapR-DB), Event Streaming (MapR Streams)
• Enterprise-Grade Platform Services: High Availability, Real Time, Unified Security, Multi-tenancy, Disaster Recovery, Global Namespace
• Unified Management and Monitoring
Streaming technology is enabling the obvious: continuous processing on data that is continuously produced
Hint: you already have streaming data
Decoupling
App A, App B, App C: state managed centrally
App A, App B, App C: applications build their own state
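As a rough illustration of applications building their own state from a shared stream, here is a minimal sketch in plain Scala with no streaming engine involved. The `Purchase` event, the sample log, and both "applications" are hypothetical and are not taken from the talk.

```scala
object Decoupling {
  // Hypothetical event type; not from the talk.
  final case class Purchase(user: String, amount: Int)

  // The shared, append-only log that every application reads independently.
  val log: List[Purchase] =
    List(Purchase("ann", 10), Purchase("bob", 5), Purchase("ann", 7))

  // "App A" folds the log into its own state: total spend per user.
  def spendPerUser(events: Seq[Purchase]): Map[String, Int] =
    events.foldLeft(Map.empty[String, Int]) { (state, e) =>
      state.updated(e.user, state.getOrElse(e.user, 0) + e.amount)
    }

  // "App B" derives a different state from the same log: a purchase counter.
  def purchaseCount(events: Seq[Purchase]): Int = events.size
}
```

Neither function depends on the other or on a central store; each can be rebuilt at any time by replaying the log.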
Streaming and Batch
[Figure: hourly events on a timeline, split into partitions]
2016-3-1 12:00 am, 2016-3-1 1:00 am, 2016-3-1 2:00 am, 2016-3-11 10:00 pm, 2016-3-11 11:00 pm, 2016-3-12 12:00 am, 2016-3-12 1:00 am, 2016-3-12 2:00 am, 2016-3-12 3:00 am, …
• Stream (low latency)
• Stream (high latency)
• Batch (bounded stream)
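The idea that a batch is just a bounded stream can be sketched in plain Scala: the same lazy transformation runs unchanged over a finite input and over an unbounded one. The `BoundedStream` object and `runningTotals` function are hypothetical names chosen for this illustration.

```scala
object BoundedStream {
  // One transformation, written once: a running total over the incoming values.
  // Iterator operations are lazy, so this also works on an endless input.
  def runningTotals(events: Iterator[Int]): Iterator[Int] =
    events.scanLeft(0)(_ + _).drop(1)

  // Batch: the input is bounded, so the whole result can be materialized.
  val batchResult: List[Int] = runningTotals(List(1, 2, 3).iterator).toList

  // Stream: the input never ends, so only a prefix of the result is consumed.
  val streamPrefix: List[Int] = runningTotals(Iterator.from(1)).take(3).toList
}
```

Both values hold the same totals; the only difference is whether the input has an end.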
Processing
• Request / Response
• Batch
• Stream Processing
• Real-time reaction to events
• Continuous applications
• Process both real-time and historical data
Flink Architecture
Deployment: Local (Single JVM), Cluster (Standalone, YARN, Mesos), Cloud (AWS, Google)
Core: Runtime (Distributed Streaming Dataflow)
API & Libraries:
• DataSet API (Batch Processing): FlinkML (Machine Learning), Gelly (Graph Processing), Table (Relational)
• DataStream API (Stream Processing): CEP (Event Processing), Table (Relational)
Batch & Stream
case class Word(word: String, frequency: Int)

// DataSet API - Batch
val lines: DataSet[String] = env.readTextFile(…)
lines.flatMap { line => line.split(" ").map(word => Word(word, 1)) }
  .groupBy("word").sum("frequency")
  .print()

// DataStream API - Streaming
val lines: DataStream[String] = env.socketTextStream(...)
lines.flatMap { line => line.split(" ").map(word => Word(word, 1)) }
  .keyBy("word")
  // 5-second windows, evaluated every second (Flink 1.x sliding time window)
  .timeWindow(Time.seconds(5), Time.seconds(1))
  .sum("frequency")
  .print()
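To make the sliding window in the streaming example more concrete, the following sketch reproduces the same idea (5-second windows that advance every second) in plain Scala, without Flink. It is only an approximation of what a window operator does: the `SlidingWordCount` object, the `counts` function, and the use of whole seconds as timestamps are assumptions made for this illustration.

```scala
object SlidingWordCount {
  // lines: (timestamp in seconds, text). Returns word counts keyed by window start.
  // A window starting at s covers the timestamps s until s + size (exclusive).
  def counts(lines: Seq[(Long, String)], size: Long, slide: Long): Map[Long, Map[String, Int]] = {
    val words: Seq[(Long, String)] =
      lines.flatMap { case (ts, line) =>
        line.split(" ").toList.filter(_.nonEmpty).map(w => (ts, w))
      }

    // With a sliding window, each element belongs to size / slide windows.
    val assigned: Seq[(Long, String)] =
      words.flatMap { case (ts, w) =>
        val lastStart = ts - (ts % slide)
        (lastStart - size + slide to lastStart by slide).map(start => (start, w))
      }

    assigned.groupBy(_._1).map { case (start, ws) =>
      start -> ws.groupBy(_._2).map { case (w, occurrences) => w -> occurrences.size }
    }
  }
}
```

For example, `SlidingWordCount.counts(Seq((0L, "a b"), (1L, "a")), 5L, 1L)` assigns the word seen at second 1 to every window that starts between seconds -3 and 1, so the window starting at 0 counts `a` twice.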
Flink Ecosystem
Source: Apache Kafka, MapR Streams, AWS Kinesis, RabbitMQ, Apache Bahir, …
Sink: Apache Kafka, MapR Streams, AWS Kinesis, RabbitMQ, Elasticsearch, HDFS/MapR-FS, …
• 10 billion events/day
• 2 TB of data/day
• 30 applications
• 2 PB of storage and growing
Source: Bouygues Telecom: http://berlin.flink-forward.org/wp-content/uploads/2016/07/Thomas-Lamirault_Mohamed-Amine-Abdessemed-A-brief-history-of-time-with-Apache-Flink.pdf
Demonstration
• Multiple notions of “Time” in Flink
• Event Time
• Ingestion Time
• Processing Time
What Is Event-Time Processing
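Processing Time (release order): Episode IV (1977), Episode V (1980), Episode VI (1983), Episode I (1999), Episode II (2002), Episode III (2005), Episode VII (2015)
Event Time (story order): Episode I, Episode II, Episode III, Episode IV, Episode V, Episode VI, Episode VII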
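The difference between the two clocks can be sketched with the same films in plain Scala. Each film is treated as an event with two timestamps: the year it reached the audience (processing time) and its position in the story (event time). The `Film` class and the `EventTimeDemo` object are hypothetical names for this illustration, not Flink API.

```scala
// Hypothetical event model: episode number stands in for event time,
// release year stands in for processing time.
final case class Film(episode: Int, releaseYear: Int)

object EventTimeDemo {
  // Release years as listed on the slide.
  val films: List[Film] = List(
    Film(4, 1977), Film(5, 1980), Film(6, 1983),
    Film(1, 1999), Film(2, 2002), Film(3, 2005), Film(7, 2015)
  )

  // Processing-time order: the order in which the events were observed.
  def byProcessingTime(fs: List[Film]): List[Int] =
    fs.sortBy(_.releaseYear).map(_.episode)

  // Event-time order: the order in which the events happened in the story.
  def byEventTime(fs: List[Film]): List[Int] =
    fs.sortBy(_.episode).map(_.episode)
}
```

An engine that only knows processing time sees the episodes as 4, 5, 6, 1, 2, 3, 7; reordering by the timestamp carried inside each event recovers 1 through 7.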
Complex Event Processing
• Analyzing a stream of events and drawing conclusions
• “if A and then B → infer event C”
• Demanding requirements on stream processor
• Low latency!
• Exactly-once semantics & event-time support
Order Events
Process is reflected in a stream of order events

Order(orderId, tStamp, "received")
Shipment(orderId, tStamp, "shipped")
Delivery(orderId, tStamp, "delivered")

orderId: Identifies the order
tStamp: Time at which the event happened
CEP to the Rescue
Define processing and delivery intervals (SLAs)

ProcessSucc(orderId, tStamp, duration)
ProcessWarn(orderId, tStamp)
DeliverySucc(orderId, tStamp, duration)
DeliveryWarn(orderId, tStamp)

orderId: Identifies the order
tStamp: Time when the event happened
duration: Duration of the processing/delivery
Processing: Order → Shipment

val processingPattern = Pattern
  .begin[Event]("received").subtype(classOf[Order])
  .followedBy("shipped").where(_.status == "shipped")
  .within(Time.hours(1))

val processingPatternStream = CEP.pattern(
  input.keyBy("orderId"),
  processingPattern)

val procResult: DataStream[Either[ProcessWarn, ProcessSucc]] =
  processingPatternStream.select {
    (pP, timestamp) => // Timeout handler
      ProcessWarn(pP("received").orderId, timestamp)
  } {
    fP => // Select function
      ProcessSucc(
        fP("received").orderId,
        fP("shipped").tStamp,
        fP("shipped").tStamp - fP("received").tStamp)
  }
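The decision that the pattern encodes can also be sketched without Flink, as a plain Scala function over a finished list of events. This is a batch approximation of the rule, not the CEP library: the `ProcessingSla` object, the `check` function, millisecond timestamps, the inclusive one-hour bound, and the choice of "received time plus SLA" as the warning timestamp are all assumptions made for this illustration.

```scala
object ProcessingSla {
  sealed trait Event { def orderId: Long; def tStamp: Long }
  final case class Order(orderId: Long, tStamp: Long) extends Event
  final case class Shipment(orderId: Long, tStamp: Long) extends Event

  final case class ProcessSucc(orderId: Long, tStamp: Long, duration: Long)
  final case class ProcessWarn(orderId: Long, tStamp: Long)

  // Assumed SLA: one hour, expressed in milliseconds.
  val OneHourMs: Long = 60L * 60L * 1000L

  // For every order, emit a success if a shipment followed within the SLA,
  // otherwise a warning stamped with the moment the SLA expired.
  def check(events: Seq[Event], sla: Long = OneHourMs): Seq[Either[ProcessWarn, ProcessSucc]] = {
    val shippedAt: Map[Long, Long] =
      events.collect { case Shipment(id, ts) => id -> ts }.toMap

    events.collect { case Order(id, received) =>
      shippedAt.get(id) match {
        case Some(ts) if ts >= received && ts - received <= sla =>
          Right(ProcessSucc(id, ts, ts - received))
        case _ =>
          Left(ProcessWarn(id, received + sla))
      }
    }
  }
}
```

Unlike this sketch, a stream processor has to reach the same verdict incrementally, keeping per-order state until either the shipment arrives or the time bound passes.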
The End
• Process events in real time and/or batch
• Complex Event Processing (CEP)
• Many other things to discover
• Deployment
• High Availability
• Table/Relational API
• …
https://mapr.com/ebooks/
Thanks to
Flink Community & Kostas Tzoumas, Stephan Ewen, Fabian Hueske, Till Rohrmann, Jamie Grier