
Apache Flink Berlin Meetup May 2016

Date post: 17-Feb-2017
Upload: stephan-ewen
Transcript
Page 1: Apache Flink Berlin Meetup May 2016

Stephan Ewen @stephanewen

What's coming up in Apache Flink?
Quick teaser of some of the upcoming features

Page 2: Apache Flink Berlin Meetup May 2016

Disclaimer


This list of threads is incomplete

This is not an Apache Flink roadmap!

Page 3: Apache Flink Berlin Meetup May 2016

What's coming up?


APIs
Integration
Operations

Stream SQL
Queryable State
Side Inputs (joining streams and static data)
Cassandra
Kinesis
BigTop Integration
File System Sources
Deployment and Management (YARN, Mesos, Docker, …)
Dynamically Scaling Streaming Programs
Metrics
State Scalability

Page 4: Apache Flink Berlin Meetup May 2016


Stream SQL

Page 5: Apache Flink Berlin Meetup May 2016

Two definitions of Stream SQL

1. Run a continuous SQL query that reads an infinite stream and continuously produces results

2. Continuously ingest streams into a warehouse. Query the real-time data in the warehouse.


Page 6: Apache Flink Berlin Meetup May 2016

Two definitions of Stream SQL

1. Run a continuous SQL query that reads an infinite stream and continuously produces results

2. Continuously ingest streams into a warehouse. Query the real-time data in the warehouse.


That's Flink's Stream SQL

Good use case for Kafka + Flink + Druid

Page 7: Apache Flink Berlin Meetup May 2016

An Example


val execEnv = StreamExecutionEnvironment.getExecutionEnvironment
val tableEnv = TableEnvironment.getTableEnvironment(execEnv)

// define a JSON encoded Kafka topic as external table
val sensorSource = new KafkaJsonSource[(String, Long, Double)](
  "sensorTopic", kafkaProps, ("location", "time", "tempF"))

// register external table
tableEnv.registerTableSource("sensorData", sensorSource)

// define query on external table
val roomSensors: Table = tableEnv.sql("""
  SELECT STREAM time, location AS room, (tempF - 32) * 0.556 AS tempC
  FROM sensorData
  WHERE location LIKE 'room%'
""")

// write the table back to Kafka as JSON
roomSensors.toSink(new KafkaJsonSink(...))

Page 8: Apache Flink Berlin Meetup May 2016

The Implementation

(Figure: the implementation in Flink 1.0 vs. Flink 1.1+)

Page 9: Apache Flink Berlin Meetup May 2016


Queryable State

Page 10: Apache Flink Berlin Meetup May 2016

Sharing State with Applications


• Access to the stream aggregates with a latency bound
• Write them to a key/value store

Page 11: Apache Flink Berlin Meetup May 2016

Sharing State with Applications


• Access to the stream aggregates with a latency bound
• Write them to a key/value store

Often the biggest bottleneck

Page 12: Apache Flink Berlin Meetup May 2016

Queryable State


Send queries to Flink's internal state

(Optional, and only at the end of windows)

Page 13: Apache Flink Berlin Meetup May 2016

What does it bring?
• Fewer moving parts in the infrastructure
• Performance!

From an extension of Yahoo!'s streaming benchmark:
• With key/value store: 280,000 events/s
• Queryable state: 15,000,000 events/s

What's the secret?
• No synchronous distributed communication
• Persistence via Flink's checkpoints (async snapshots)
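The "fewer moving parts" point can be illustrated with a small toy model (hypothetical code, not Flink's actual API, which was still being designed at the time): the operator keeps its aggregate in local keyed state and answers point queries from that state directly, so the hot path never makes a synchronous round trip to an external key/value store.

```scala
// Toy model of queryable state (illustrative only, not Flink's API):
// the operator keeps aggregates in its own keyed state and serves
// point queries from it, instead of synchronously writing every
// update to an external key/value store.
object QueryableStateSketch {
  import scala.collection.mutable

  final class CountingOperator {
    private val state = mutable.Map.empty[String, Long].withDefaultValue(0L)

    // Hot path: update local state only -- no remote round trip.
    def process(key: String): Unit = state(key) += 1

    // Query path: read the operator's state directly.
    def query(key: String): Long = state(key)
  }

  def main(args: Array[String]): Unit = {
    val op = new CountingOperator
    Seq("room1", "room1", "room2").foreach(op.process)
    println(op.query("room1")) // 2
  }
}
```

Fault tolerance in this picture comes from checkpointing the operator state asynchronously, as the slide notes, rather than from the durability of an external store.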


Page 14: Apache Flink Berlin Meetup May 2016


Dynamic Scaling

Page 15: Apache Flink Berlin Meetup May 2016

Adjust parallelism of Streaming Programs


Initial configuration

Scale Out (for load)

Scale In (save resources)

Page 16: Apache Flink Berlin Meetup May 2016

Adjust parallelism of Streaming Programs

Adjusting parallelism without (significantly) interrupting the program

Initial version:
• Savepoint -> stop -> restart-with-different-parallelism

Stateless operators: trivial
Stateful operators: repartition state
• State reorganized by key for key/value state and windows
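Why repartitioning keyed state is the hard part can be seen from a quick sketch (hypothetical code, not Flink's implementation): with plain hash partitioning, a key's owner is hash(key) mod parallelism, so changing the parallelism reassigns most keys to a different partition and nearly all state has to move.

```scala
// Sketch of why naive hash partitioning makes rescaling expensive
// (illustrative only, not Flink's implementation).
object NaiveRepartition {
  // With plain hash partitioning, a key's owner is hash(key) mod parallelism.
  def ownerOf(key: String, parallelism: Int): Int =
    math.abs(key.hashCode % parallelism)

  def main(args: Array[String]): Unit = {
    val keys = (1 to 1000).map(i => s"sensor-$i")
    // Count how many keys change owner when scaling from 4 to 5 tasks.
    val moved = keys.count(k => ownerOf(k, 4) != ownerOf(k, 5))
    println(s"$moved of ${keys.size} keys changed partition")
  }
}
```

In expectation roughly four out of five keys change owner in this 4-to-5 example, which is what motivates the consistent-hashing and key-group schemes on the following slides.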


Page 17: Apache Flink Berlin Meetup May 2016

Consistent Hashing


Page 18: Apache Flink Berlin Meetup May 2016

Redistribution via Key Groups


Page 19: Apache Flink Berlin Meetup May 2016

Redistribution via Key Groups

Flink 1.0: Hash keys into parallel partitions. Finest granularity is a partition.

Flink 1.1: Hash keys into KeyGroups; assign KeyGroups to parallel partitions. A change of parallelism means a change of the assignment of KeyGroups to parallel partitions.
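The key-group scheme can be sketched in a few lines (a simplified model; Flink's actual assignment logic may differ in details such as the hash function): keys are hashed into a fixed number of key groups chosen up front, and each parallel task owns a contiguous range of key groups. Rescaling only changes which task owns which range, so checkpointed state moves in whole key groups rather than key by key.

```scala
// Simplified model of key-group-based state assignment
// (illustrative; Flink's real implementation may differ in details).
object KeyGroupAssignment {
  // Fixed at job start and never changed afterwards.
  val MaxParallelism = 128

  // Every key falls into exactly one of MaxParallelism key groups;
  // this mapping is independent of the current parallelism.
  def keyGroupFor(key: String): Int =
    math.abs(key.hashCode % MaxParallelism)

  // Each parallel task owns a contiguous range of key groups.
  def taskFor(keyGroup: Int, parallelism: Int): Int =
    keyGroup * parallelism / MaxParallelism

  def main(args: Array[String]): Unit = {
    // A key's key group never changes; only the group-to-task mapping does.
    val kg = keyGroupFor("room1")
    println(s"key group $kg -> task ${taskFor(kg, 2)} at parallelism 2")
    println(s"key group $kg -> task ${taskFor(kg, 4)} at parallelism 4")
  }
}
```

Because the group-to-task mapping is a simple range assignment, scaling from 2 to 4 tasks just splits each task's range, and restored state can be re-read in whole key groups.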


Page 20: Apache Flink Berlin Meetup May 2016

Flink Forward 2016, Berlin

Submission deadline: June 30, 2016
Early bird deadline: July 15, 2016

www.flink-forward.org

Page 21: Apache Flink Berlin Meetup May 2016

We are hiring!
data-artisans.com/careers

