Date posted: 21-Jan-2018
Category: Technology
Uploaded by: till-rohrmann
Apache Flink® and More
Till Rohrmann @stsffap
Jörg Schad @joerg_schad
MapReduce is crunching Data
We need to turn faster!
SMACK Stack
EVENTS: Ubiquitous data streams from connected devices (sensors, devices, clients)
INGEST: Apache Kafka. Ingest millions of events per second
STORE: Apache Cassandra. Distributed & highly scalable database
ANALYZE: Apache Spark. Real-time and batch data processing
ACT: Akka. Visualize data and build data-driven applications
All running on Mesos / DC/OS
Evolution of Data Analytics
Batch, Micro-Batch, Event Processing
Days Hours Minutes Seconds Microseconds
Solves problems using predictive and prescriptive analytics
Reports what has happened using descriptive analytics
Predictive User Interface
Real-time Pricing and Routing
Real-time Advertising
Billing, Chargeback
Product recommendations
Original creators of Apache Flink®
Providers of the dA Platform, a supported
Flink distribution
Apache Flink In a Nutshell
Event-driven applications (event sourcing, CQRS)
Stateful, event-driven, event-time-aware processing
Batch Processing (data sets)
Stream Processing / Analytics (data streams, windows, …)
Apache Flink Stack
DataStream API Stream Processing
DataSet API Batch Processing
Runtime Distributed Streaming Data Flow
Libraries
Streaming and batch as first class citizens.
Programming Model
[Diagram: sources feed a graph of computations (transformations), each holding its own state; results flow to sinks]
API & Execution
Source:
DataStream<String> lines = env.addSource(new FlinkKafkaConsumer010(…));
Transformation:
DataStream<Event> events = lines.map(line -> parse(line));
Transformation:
DataStream<Statistic> stats = events
    .keyBy("id")
    .timeWindow(Time.seconds(5))
    .apply(new MyAggregationFunction());
Sink:
stats.addSink(new BucketingSink(path));
Streaming dataflow: Source -> map() -> keyBy()/window()/apply() -> Sink
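To make the windowing step concrete, here is a minimal plain-Java sketch of what a keyed, five-second tumbling window aggregation computes. It uses a simple sum as the aggregation. It does not use Flink: the `Event` class, its fields, and `sumPerKeyAndWindow` are hypothetical names chosen for illustration, and the sketch assumes windows are assigned by an event timestamp in milliseconds.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of a keyed tumbling-window sum; this is not Flink code.
final class TumblingWindowSketch {

    // Hypothetical event type: a key, an event timestamp in milliseconds, and a value.
    static final class Event {
        final String id;
        final long timestampMs;
        final long value;

        Event(String id, long timestampMs, long value) {
            this.id = id;
            this.timestampMs = timestampMs;
            this.value = value;
        }
    }

    // Assigns each event to the window [start, start + windowSizeMs) that contains
    // its timestamp and sums the values per (key, window start).
    // Assumes non-negative timestamps and a positive window size.
    static Map<String, Map<Long, Long>> sumPerKeyAndWindow(List<Event> events, long windowSizeMs) {
        Map<String, Map<Long, Long>> sums = new TreeMap<>();
        for (Event e : events) {
            long windowStart = e.timestampMs - (e.timestampMs % windowSizeMs);
            sums.computeIfAbsent(e.id, k -> new TreeMap<>())
                .merge(windowStart, e.value, Long::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
            new Event("a", 1_000, 2),
            new Event("a", 4_999, 3),
            new Event("b", 2_000, 10),
            new Event("a", 5_000, 7));
        // Prints one sum per key and per 5-second window.
        System.out.println(sumPerKeyAndWindow(events, 5_000));
    }
}
```

A streaming engine produces the same per-key, per-window results incrementally as events arrive, instead of over a finished list.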
Distributed Runtime
Levels of Abstraction
Process Function (events, state, time): low-level (stateful stream processing)
DataStream API (streams, windows): stream processing & analytics
Table API (dynamic tables): declarative DSL
Stream SQL: high-level language
What Is Flink Good For?
Detecting fraud in real time
As fraudsters get better, need to update models without downtime
Live 24/7 service
Credit card transactions
Notifications and alerts
Evolving fraud models built by data scientists
▪ Athena X
▪ SQL to define metrics
▪ Thresholds and actions to trigger
▪ Blends analytics and actions
[Diagram: streams from Hadoop, Kafka, etc. are combined with SQL, thresholds, and actions to produce analytics, alerts, and derived streams]
▪ Route events to Kafka, ES, Hive
▪ Complex interaction sessions rules
▪ Mix of stateless / small state / large state
▪ Stream Processing as a Service
• Launching, monitoring, scaling, updating
• DSL to define jobs
▪ Blink based on Flink
▪ A core system in Alibaba Search
• Machine learning, search, recommendations
• A/B testing of search algorithms
• Online feature updates to boost conversion rate
▪ Alibaba is a major contributor to Flink
▪ Contributing many changes back to open source
Complete social network implemented using event sourcing and CQRS (Command Query Responsibility Segregation)
Apache Flink & Apache Mesos
Why Apache Mesos?
▪ Mesos offers full functionality to implement fault tolerant and elastic distributed applications
▪ 30% of survey respondents were running Flink on Mesos (prior to proper Mesos support, September 2016)
Flink’s Mesos Integration
▪ Kudos to Eron Wright (EronWright) for this work
[Diagram: the Mesos Master and the Apache Flink framework. The Mesos App Master contains the Flink MesosResourceManager and the JobManager; each TaskManager runs as a Mesos task.]
Steps shown: Allocate resources, Launch Mesos tasks, Register, Execute job
Resource Manager Components
Connection Monitor
▪ Monitors connection to Mesos
Launch Coordinator
▪ Resource offer processing and task scheduling
▪ Gathers offers and matches them to tasks using Fenzo
Task Monitor
▪ Monitors Mesos tasks
▪ Triggers reconciliation
▪ Makes sure tasks are properly killed
Reconciliation Coordinator
▪ Reconciles tasks view between ResourceManager and Mesos Master
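The reconciliation idea can be sketched in a few lines of plain Java: compare the set of tasks the ResourceManager expects with the task states reported by the Mesos Master, then derive which tasks to relaunch and which to kill. This is only an illustration under stated assumptions. The class, the `State` enum, and the `reconcile` method are hypothetical names, not Flink's or Mesos's API, and the two-state model is a simplification.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative reconciliation between an expected task set and reported task states.
// All names here are hypothetical; this is not the Flink or Mesos API.
final class ReconciliationSketch {

    // Simplified task state as reported by the cluster manager (an assumption).
    enum State { RUNNING, LOST }

    static final class Result {
        final Set<String> toRelaunch; // expected tasks that are not reported as running
        final Set<String> toKill;     // running tasks the ResourceManager does not know

        Result(Set<String> toRelaunch, Set<String> toKill) {
            this.toRelaunch = toRelaunch;
            this.toKill = toKill;
        }
    }

    static Result reconcile(Set<String> expected, Map<String, State> reported) {
        Set<String> toRelaunch = new TreeSet<>();
        for (String taskId : expected) {
            // Missing from the report or reported as lost: needs to be started again.
            if (reported.get(taskId) != State.RUNNING) {
                toRelaunch.add(taskId);
            }
        }
        Set<String> toKill = new TreeSet<>();
        for (Map.Entry<String, State> entry : reported.entrySet()) {
            // Running but unknown to the ResourceManager: make sure it is killed.
            if (entry.getValue() == State.RUNNING && !expected.contains(entry.getKey())) {
                toKill.add(entry.getKey());
            }
        }
        return new Result(toRelaunch, toKill);
    }

    public static void main(String[] args) {
        Set<String> expected = new HashSet<>();
        expected.add("tm-1");
        expected.add("tm-2");
        Map<String, State> reported = new HashMap<>();
        reported.put("tm-1", State.RUNNING);
        reported.put("tm-3", State.RUNNING);
        Result result = reconcile(expected, reported);
        System.out.println("relaunch: " + result.toRelaunch + ", kill: " + result.toKill);
    }
}
```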
Component Interplay
[Diagram: the ResourceManager (Connection Monitor, Launch Coordinator, Task Monitor, Reconciliation Coordinator) interacting with the Mesos Master and Mesos tasks]
Interactions shown: Resource offers, Launch tasks, Monitor tasks, Status messages, Trigger reconciliation, Reconcile tasks, Start TaskManagers, Recover tasks, Kill task
Fenzo
▪ Developed by Netflix
▪ Generic task scheduler for frameworks
▪ Matching between tasks and resource offers
• Pluggable fitness evaluator
[Diagram: Mesos, Launch Coordinator, Fenzo]
1. Mesos sends periodic resource offers to the Launch Coordinator
2. The Launch Coordinator tells Fenzo the offered resources & tasks
3. Fenzo returns resource-task matchings
4. The Launch Coordinator passes the tasks to launch to Mesos
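The matching of tasks to resource offers with a pluggable fitness evaluator can be illustrated with a small greedy sketch in plain Java. This does not use Fenzo: `Offer`, `Task`, `FitnessEvaluator`, and `match` are hypothetical names, and the greedy strategy and the CPU bin-packing score are assumptions chosen to keep the example short.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Greedy task-to-offer matching with a pluggable fitness function.
// Hypothetical types for illustration; this is not the Fenzo API.
final class OfferMatchingSketch {

    static final class Offer {
        final String host;
        double cpus;   // remaining CPUs, reduced as tasks are placed
        double memMb;  // remaining memory, reduced as tasks are placed

        Offer(String host, double cpus, double memMb) {
            this.host = host;
            this.cpus = cpus;
            this.memMb = memMb;
        }
    }

    static final class Task {
        final String name;
        final double cpus;
        final double memMb;

        Task(String name, double cpus, double memMb) {
            this.name = name;
            this.cpus = cpus;
            this.memMb = memMb;
        }
    }

    // Pluggable scoring: a higher value means a better fit of the task on the offer.
    interface FitnessEvaluator {
        double fitness(Task task, Offer offer);
    }

    // Example evaluator: prefers the offer the task fills most completely (CPU bin packing).
    // Only called for offers that fit, so offer.cpus > 0 as long as task.cpus > 0.
    static final FitnessEvaluator CPU_BIN_PACKING = (task, offer) -> task.cpus / offer.cpus;

    // Places each task on the fitting offer with the highest fitness.
    // Mutates the offers to track remaining resources; tasks that fit nowhere are skipped.
    static Map<String, String> match(List<Task> tasks, List<Offer> offers, FitnessEvaluator evaluator) {
        Map<String, String> assignment = new LinkedHashMap<>();
        for (Task task : tasks) {
            Offer best = null;
            double bestFitness = Double.NEGATIVE_INFINITY;
            for (Offer offer : offers) {
                if (offer.cpus < task.cpus || offer.memMb < task.memMb) {
                    continue; // the offer cannot hold this task
                }
                double fitness = evaluator.fitness(task, offer);
                if (fitness > bestFitness) {
                    best = offer;
                    bestFitness = fitness;
                }
            }
            if (best != null) {
                best.cpus -= task.cpus;
                best.memMb -= task.memMb;
                assignment.put(task.name, best.host);
            }
        }
        return assignment;
    }
}
```

Swapping in a different `FitnessEvaluator`, for example one that prefers the emptiest host, changes the placement policy without touching the matching loop.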
Datacenter
NAIVE APPROACH
Typical Datacenter: siloed, over-provisioned servers, low utilization
Industry Average 12-15% utilization
MySQL
microservice
Cassandra
Flink
Kafka
© 2017 Mesosphere, Inc. All Rights Reserved.
Apache Mesos
Mesos: automated schedulers, workload multiplexing onto the same machines
Why Mesos?
● 2-level scheduling
● Fault-tolerant, battle-tested
● Scalable to 10,000+ nodes
● Created by Mesosphere founder @ UC Berkeley; used in production by 100+ web-scale companies [1]
[1] http://mesos.apache.org/documentation/latest/powered-by-mesos/
APACHE MESOS
DC/OS
Datacenter Operating System (DC/OS)
Distributed Systems Kernel (Mesos)
Big Data + Analytics Engines
Microservices (in containers)
Streaming, Batch, Machine Learning, Analytics, Functions & Logic, Search, Time Series, SQL / NoSQL Databases
Modern App Components
Any Infrastructure (Physical, Virtual, Cloud)
© 2016 Mesosphere, Inc. All Rights Reserved.
DEMO
Conclusion
▪ Apache Flink runs on Mesos using Fenzo
▪ DC/OS offers an easy-to-use Flink package
▪ Contributions welcome!
DC/OS Office Hour June 29th
Thank you!
@stsffap @joerg_schad @ApacheFlink @dataArtisans @dcos
We are hiring! data-artisans.com/careers