Date posted: 17-Feb-2017
Category: Software
Uploaded by: stephan-ewen
Stephan Ewen (@stephanewen)
What's coming up in Apache Flink?
Quick teaser of some of the upcoming features
Disclaimer
This list of threads is incomplete
This is not an Apache Flink roadmap!
What's coming up?
APIs, Integration, Operations
Stream SQL
Queryable State
Cassandra
Deployment and Management (YARN, Mesos, Docker, …)
Dynamically Scaling Streaming Programs
Metrics
File System Sources
Side Inputs: Joining streams and static data
BigTop Integration
Kinesis
State Scalability
Stream SQL
Two definitions of Stream SQL
1. Run a continuous SQL query that reads an infinite stream and continuously produces results. That's Flink's Stream SQL.
2. Continuously ingest streams into a warehouse. Query the real-time data in the warehouse. Good use case for Kafka + Flink + Druid.
An Example
val execEnv = StreamExecutionEnvironment.getExecutionEnvironment
val tableEnv = TableEnvironment.getTableEnvironment(execEnv)

// define a JSON encoded Kafka topic as external table
val sensorSource = new KafkaJsonSource[(String, Long, Double)](
  "sensorTopic", kafkaProps, ("location", "time", "tempF"))

// register external table
tableEnv.registerTableSource("sensorData", sensorSource)

// define query on external table
val roomSensors: Table = tableEnv.sql("""
  SELECT STREAM time, location AS room, (tempF - 32) * 0.556 AS tempC
  FROM sensorData
  WHERE location LIKE 'room%'
  """)

// write the table back to Kafka as JSON
roomSensors.toSink(new KafkaJsonSink(...))
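To make the per-record meaning of the query concrete, here is a minimal sketch in plain Scala, without any Flink dependency. The case classes `SensorReading` and `RoomReading` and the object `RoomSensorsSketch` are hypothetical names introduced only for illustration; an `Iterator` stands in for the unbounded stream.

```scala
// Hypothetical record types mirroring the (location, time, tempF) fields of the example.
final case class SensorReading(location: String, time: Long, tempF: Double)
final case class RoomReading(time: Long, room: String, tempC: Double)

object RoomSensorsSketch {
  // Applies the same filter and projection as the SQL query, one record at a time.
  def roomSensors(readings: Iterator[SensorReading]): Iterator[RoomReading] =
    readings
      .filter(_.location.startsWith("room")) // WHERE location LIKE 'room%'
      .map(r => RoomReading(r.time, r.location, (r.tempF - 32) * 0.556))

  def main(args: Array[String]): Unit = {
    val input = Iterator(
      SensorReading("room1", 1L, 212.0),
      SensorReading("hall", 2L, 70.0),
      SensorReading("room2", 3L, 32.0))
    // Results are produced lazily, as each input record is consumed.
    roomSensors(input).foreach(println)
  }
}
```

Because the iterator is lazy, each result is emitted as soon as its input record arrives, which is the behaviour a continuous query is meant to have.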
The Implementation
[Figure: implementation in Flink 1.0 vs. Flink 1.1+]
Queryable State
Sharing State with Applications
Access to the stream aggregates with a latency bound
Write them to a key/value store (often the biggest bottleneck)
Queryable State
Writes to the key/value store: optional, and only at the end of windows
Send queries to Flink's internal state
What does it bring?
• Fewer moving parts in the infrastructure
• Performance!
From an extension of Yahoo!'s streaming benchmark:
• With key/value store: 280,000 events/s
• Queryable state: 15,000,000 events/s
What's the secret?
• No synchronous distributed communication
• Persistence via Flink's checkpoint (async snapshots)
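The idea can be sketched in plain Scala: the operator updates its aggregate locally for every event, and clients read that same state directly instead of waiting for a write to an external store. `QueryableCounter` is a hypothetical stand-in, not a Flink API, and persistence via checkpoints is left out of the sketch.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical stand-in for one operator's keyed state; not a Flink class.
final class QueryableCounter {
  // Concurrent map so that query threads can read while the operator writes.
  private val counts = TrieMap.empty[String, Long]

  // Called by the (single) processing thread for every event: a local update only,
  // with no synchronous call to a remote key/value store.
  def onEvent(key: String): Unit =
    counts.update(key, counts.getOrElse(key, 0L) + 1L)

  // Called by an external client: reads the live aggregate directly.
  def query(key: String): Option[Long] = counts.get(key)
}
```

The read-modify-write in `onEvent` is safe here only under the stated assumption of a single writer thread per state instance.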
Dynamic Scaling
Adjust parallelism of Streaming Programs
Initial configuration
Scale out (for load)
Scale in (save resources)
Adjusting parallelism without (significantly) interrupting the program
Initial version:
• Savepoint -> stop -> restart-with-different-parallelism
Stateless operators: Trivial
Stateful operators: Repartition state
• State reorganized by key for key/value state and windows
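A rough sketch of what "repartition state" means for key/value state, assuming the simplest scheme in which a key's partition is its hash modulo the parallelism. `RescaleSketch`, `partitionOf` and `repartition` are hypothetical names, and the snapshot is modelled as one in-memory map per parallel partition.

```scala
object RescaleSketch {
  // Assumed scheme: partition index = non-negative hash of the key modulo parallelism.
  def partitionOf(key: String, parallelism: Int): Int =
    Math.floorMod(key.hashCode, parallelism)

  // Takes the state of every old partition (as captured in a savepoint) and
  // regroups all entries by key for the new parallelism.
  def repartition[V](snapshot: Seq[Map[String, V]], newParallelism: Int): Vector[Map[String, V]] = {
    val allEntries = snapshot.flatten
    val grouped = allEntries.groupBy { case (key, _) => partitionOf(key, newParallelism) }
    Vector.tabulate(newParallelism) { i =>
      grouped.get(i).map(_.toMap).getOrElse(Map.empty[String, V])
    }
  }
}
```

Note that with this scheme every single key has to be re-hashed and possibly moved when the parallelism changes.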
Consistent Hashing
Redistribution via Key Groups
Flink 1.0: Hash keys into parallel partitions. Finest granularity is a partition.
Flink 1.1: Hash keys into KeyGroups. Assign KeyGroups to parallel partitions.
A change of parallelism means a change in the assignment of KeyGroups to parallel partitions.
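The two-level mapping can be sketched as follows. This is an illustration under assumptions, not Flink's implementation: the number of key groups (128) is an arbitrary choice, and the range-based assignment of key groups to partitions is one possible scheme. `KeyGroupSketch` and its methods are hypothetical names.

```scala
object KeyGroupSketch {
  // Fixed for the lifetime of the program; 128 is an arbitrary assumption.
  val numKeyGroups: Int = 128

  // Step 1: a key always hashes to the same key group, whatever the parallelism.
  def keyGroupOf(key: Any): Int = Math.floorMod(key.hashCode, numKeyGroups)

  // Step 2: key groups are assigned to partitions in contiguous ranges.
  // Only this assignment changes when the parallelism changes.
  def partitionOf(keyGroup: Int, parallelism: Int): Int =
    keyGroup * parallelism / numKeyGroups

  // All key groups owned by one partition at a given parallelism.
  def keyGroupsOf(partition: Int, parallelism: Int): Seq[Int] =
    (0 until numKeyGroups).filter(kg => partitionOf(kg, parallelism) == partition)
}
```

With this layout, rescaling moves whole key groups between partitions, so state can be stored and shipped per key group rather than re-hashed key by key.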
Flink Forward 2016, Berlin
Submission deadline: June 30, 2016
Early bird deadline: July 15, 2016
www.flink-forward.org
We are hiring!
data-artisans.com/careers