Apache Flink Training Workshop @ HadoopCon2016 - #3 DataStream API Hands-On

Date post: 08-Jan-2017
Upload: apache-flink-taiwan-user-group
Page 1: Apache Flink Training Workshop @ HadoopCon2016 - #3 DataStream API Hands-On

Apache Flink Tutorial
DataStream API

Page 2: Apache Flink Training Workshop @ HadoopCon2016 - #3 DataStream API Hands-On

Agenda
● Basic structure of a streaming program
● Overview of various data streams
● Time characteristics
● Windows
● Window Functions

Page 3: Apache Flink Training Workshop @ HadoopCon2016 - #3 DataStream API Hands-On

Basic Structure
● For each Apache Flink DataStream Program
  ○ Obtain an execution environment.
    ■ StreamExecutionEnvironment.getExecutionEnvironment()
  ○ Load/create data sources.
    ■ read from file
    ■ read from socket
    ■ read from built-in sources (Kafka, RabbitMQ, etc.)
  ○ Execute transformations on them.
    ■ filter, map, reduce, etc. (Task chaining)
  ○ Specify where to save results of the computations.
    ■ stdout (print)
    ■ write to files
    ■ write to built-in sinks (Elasticsearch, Kafka, etc.)
  ○ Trigger the program execution.
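The five steps above can be sketched as one small program. This is a minimal sketch, assuming the Flink 1.x Scala DataStream API; the socket host and port (`localhost`, `9999`) and the object name `BasicStructureSketch` are hypothetical and not taken from the workshop material.

```scala
import org.apache.flink.streaming.api.scala._

object BasicStructureSketch {
  def main(args: Array[String]): Unit = {
    // 1. Obtain an execution environment.
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // 2. Load/create a data source (hypothetical text socket).
    val lines: DataStream[String] = env.socketTextStream("localhost", 9999)

    // 3. Execute transformations: split into words, count per word.
    val counts = lines
      .flatMap(_.toLowerCase.split("\\W+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .keyBy(_._1)
      .sum(1)

    // 4. Specify where to save the results (stdout here).
    counts.print()

    // 5. Trigger the program execution; nothing runs before this call.
    env.execute("Basic structure sketch")
  }
}
```

To try it locally, something like `nc -lk 9999` can serve as the text source before the job is started.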

Hands-on: BasicStructure

Page 4: Apache Flink Training Workshop @ HadoopCon2016 - #3 DataStream API Hands-On

Various Data Streams in Apache Flink

Page 5: Apache Flink Training Workshop @ HadoopCon2016 - #3 DataStream API Hands-On
Page 6: Apache Flink Training Workshop @ HadoopCon2016 - #3 DataStream API Hands-On

Time Characteristics

E.g., env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime), where env is the StreamExecutionEnvironment of the program.
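As a rough illustration, the time characteristic is set once on the environment, before the sources are created. The sketch below assumes the Flink 1.x Scala API as it was at the time of the workshop (newer releases may handle this differently); the sample elements and the object name are made up for the example.

```scala
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._

object TimeCharacteristicSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Choose one of ProcessingTime, IngestionTime, or EventTime.
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

    // With EventTime, elements need timestamps. Here the second tuple
    // field is a hypothetical event timestamp in milliseconds, and the
    // sample data is already in ascending order.
    env
      .fromElements(("a", 1000L), ("b", 2000L), ("c", 3000L))
      .assignAscendingTimestamps(_._2)
      .print()

    env.execute("Time characteristic sketch")
  }
}
```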

Page 7: Apache Flink Training Workshop @ HadoopCon2016 - #3 DataStream API Hands-On

Windows
● The concept of Windows
  ○ cut an infinite stream into slices with finite elements.
  ○ based on timestamp or some criteria.
● Construction of Windows
  ○ Keyed Windows
    ■ an infinite DataStream is divided based on both window and key
    ■ elements with different keys can be processed concurrently
  ○ Non-keyed Windows
● We focus on the keyed windowing.

Page 8: Apache Flink Training Workshop @ HadoopCon2016 - #3 DataStream API Hands-On

Windows
● Basic Structure
  ○ Key
  ○ Window assigner
  ○ Window function
    ■ reduce()
    ■ fold()
    ■ apply()

val input: DataStream[T] = ...

input
  .keyBy(<key selector>)
  .window(<window assigner>)
  .<windowed transformation>(<window function>)
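One possible way to fill in the three placeholders of the skeleton is shown below: the key selector picks the word, the assigner is a 5-second tumbling processing-time window, and the window function is a `reduce`. This is a sketch assuming the Flink 1.x Scala API; the socket source, window size, and names are illustrative choices, not part of the slides.

```scala
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

object KeyedWindowSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Hypothetical input: one word per line from a local socket.
    val input: DataStream[(String, Int)] = env
      .socketTextStream("localhost", 9999)
      .filter(_.nonEmpty)
      .map(word => (word, 1))

    input
      .keyBy(_._1)                                               // <key selector>
      .window(TumblingProcessingTimeWindows.of(Time.seconds(5))) // <window assigner>
      .reduce((a, b) => (a._1, a._2 + b._2))                     // <window function>
      .print()

    env.execute("Keyed window sketch")
  }
}
```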

Page 9: Apache Flink Training Workshop @ HadoopCon2016 - #3 DataStream API Hands-On

Window Assigner - Global Windows
● Single per-key global window.
● Only useful if a custom trigger is specified.

Page 10: Apache Flink Training Workshop @ HadoopCon2016 - #3 DataStream API Hands-On

Window Assigner - Tumbling Windows
● Defined by window size.
● Windows are disjoint.
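To make "disjoint" concrete, the following plain-Scala sketch computes the single tumbling window that a timestamp falls into. It is only an illustration of the idea, not Flink's implementation; it assumes non-negative millisecond timestamps, windows aligned to 0, and half-open windows `[start, end)`. `TumblingSketch` and `windowOf` are hypothetical names.

```scala
object TumblingSketch {
  /** Returns the (start, end) of the one tumbling window containing `timestamp`. */
  def windowOf(timestamp: Long, size: Long): (Long, Long) = {
    require(timestamp >= 0 && size > 0, "sketch assumes non-negative timestamps and a positive size")
    // Round the timestamp down to a multiple of the window size.
    val start = timestamp - (timestamp % size)
    (start, start + size)
  }
}
```

For example, with a 5000 ms window, `TumblingSketch.windowOf(12500L, 5000L)` yields the window starting at 10000 and ending at 15000; every timestamp maps to exactly one window.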

Page 11: Apache Flink Training Workshop @ HadoopCon2016 - #3 DataStream API Hands-On

Window Assigner - Sliding Windows
● Defined by both window size and sliding size.
● Windows may have overlap.
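The overlap can be illustrated with a plain-Scala sketch that lists the start of every sliding window containing a given timestamp. It is an illustration under stated assumptions (non-negative millisecond timestamps, windows aligned to 0, half-open windows), not Flink's own code; `SlidingSketch` and `windowStarts` are hypothetical names.

```scala
object SlidingSketch {
  /** Starts of all windows of length `size`, spaced `slide` apart, that contain `timestamp`. */
  def windowStarts(timestamp: Long, size: Long, slide: Long): List[Long] = {
    require(timestamp >= 0 && size > 0 && slide > 0, "sketch assumes non-negative timestamps")
    // The latest window start that is not after the timestamp.
    val lastStart = timestamp - (timestamp % slide)
    // Walk backwards one slide at a time while the window still covers the timestamp.
    Iterator
      .iterate(lastStart)(_ - slide)
      .takeWhile(start => start > timestamp - size)
      .toList
  }
}
```

With size 10000 ms and slide 5000 ms, a timestamp of 12500 belongs to two windows (starting at 10000 and at 5000). When size equals slide, the result collapses to a single window, which is the tumbling case.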

Page 12: Apache Flink Training Workshop @ HadoopCon2016 - #3 DataStream API Hands-On

Window Assigner - Session Windows
● Defined by gap of time.
● Window time
  ○ starts at individual time points.
  ○ ends once there has been a certain period of inactivity.
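The session behaviour can be sketched in plain Scala by grouping timestamps for one key: a session starts at an element and is extended as long as the next element arrives within the gap. This is an illustration only, not Flink's implementation; `SessionSketch` and `sessions` are hypothetical names, and treating an element that lands exactly on the session end as part of that session is an assumption of the sketch.

```scala
object SessionSketch {
  /** Groups timestamps into sessions, returned as (start, end) with end = last element + gap. */
  def sessions(timestamps: Seq[Long], gap: Long): List[(Long, Long)] = {
    require(gap > 0, "gap must be positive")
    timestamps.sorted
      .foldLeft(List.empty[(Long, Long)]) {
        // Element arrives before the current session timed out: extend the session.
        case ((start, sessionEnd) :: rest, ts) if ts <= sessionEnd =>
          (start, ts + gap) :: rest
        // Otherwise the inactivity gap was exceeded: open a new session.
        case (acc, ts) =>
          (ts, ts + gap) :: acc
      }
      .reverse
  }
}
```

With a gap of 3000 ms, the timestamps 1000, 2000, 10000, 11000 form two sessions: one from 1000 to 5000 and one from 10000 to 14000.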

Page 13: Apache Flink Training Workshop @ HadoopCon2016 - #3 DataStream API Hands-On

Cheat Sheet of Window Assigners
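The original cheat-sheet graphic is not part of this transcript. As a stand-in, the sketch below lines up the four assigners from the previous slides as they might be written against the Flink 1.x Scala API. The processing-time variants, the window sizes, the socket source, and the count of 100 are arbitrary choices for illustration.

```scala
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.{ProcessingTimeSessionWindows, SlidingProcessingTimeWindows, TumblingProcessingTimeWindows}
import org.apache.flink.streaming.api.windowing.time.Time

object AssignerCheatSheetSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Hypothetical keyed input: (word, 1) pairs from a local socket.
    val keyed = env
      .socketTextStream("localhost", 9999)
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .keyBy(_._1)

    // Tumbling: defined by window size only; windows are disjoint.
    keyed.window(TumblingProcessingTimeWindows.of(Time.seconds(5))).sum(1).print()

    // Sliding: window size plus slide; windows may overlap.
    keyed.window(SlidingProcessingTimeWindows.of(Time.seconds(10), Time.seconds(5))).sum(1).print()

    // Session: closed after a gap of inactivity.
    keyed.window(ProcessingTimeSessionWindows.withGap(Time.minutes(1))).sum(1).print()

    // Global: one window per key, so it needs a trigger to ever fire.
    // countWindow is the built-in shortcut that fires every 100 elements per key.
    keyed.countWindow(100).sum(1).print()

    env.execute("Window assigner sketch")
  }
}
```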

Hands-on: DiffWindows

Page 14: Apache Flink Training Workshop @ HadoopCon2016 - #3 DataStream API Hands-On

Window Functions
● WindowFunction
  ○ Caches elements internally
  ○ Provides Window meta information (e.g., start time, end time, etc.)
● ReduceFunction
  ○ Incremental aggregation
  ○ No access to Window meta information
● FoldFunction
  ○ Incremental aggregation
  ○ No access to Window meta information
● WindowFunction with ReduceFunction / FoldFunction
  ○ Incremental aggregation
  ○ Has access to Window meta information
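The difference between the first two styles can be sketched as follows, assuming the Flink 1.x Scala API. The `reduce` variant aggregates incrementally but sees no window metadata, while the `apply` variant receives all buffered elements together with the window, so it can emit the window end time. The socket source, the 10-second window, and all names are hypothetical.

```scala
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

object WindowFunctionSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val words: DataStream[(String, Int)] = env
      .socketTextStream("localhost", 9999)
      .filter(_.nonEmpty)
      .map(word => (word, 1))

    // ReduceFunction style: only one running value per key and window is kept.
    words
      .keyBy(_._1)
      .window(TumblingProcessingTimeWindows.of(Time.seconds(10)))
      .reduce((a, b) => (a._1, a._2 + b._2))
      .print()

    // WindowFunction style: elements are buffered until the window fires,
    // and the function can read window metadata such as the end timestamp.
    words
      .keyBy(_._1)
      .window(TumblingProcessingTimeWindows.of(Time.seconds(10)))
      .apply { (key: String, window: TimeWindow, elements: Iterable[(String, Int)],
                out: Collector[(String, Long, Int)]) =>
        out.collect((key, window.getEnd, elements.map(_._2).sum))
      }
      .print()

    env.execute("Window function sketch")
  }
}
```

The combined form listed last on the slide pairs a pre-aggregating function with a WindowFunction; its exact method name has varied between Flink releases, so it is left out of this sketch.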

Page 15: Apache Flink Training Workshop @ HadoopCon2016 - #3 DataStream API Hands-On

Dealing with Data Lateness
● Set allowed lateness to Windows
  ○ New in 1.1.0.
  ○ An element is considered late, and is dropped, once the watermark passes the end timestamp of its window + allowedLateness.
  ○ Defaults to 0, i.e., an event is dropped as soon as it is late.
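The lateness rule can be written down as a small predicate. This plain-Scala sketch is an illustration of the rule on the slide rather than Flink's code; it assumes half-open windows whose last included timestamp is `windowEnd - 1`, and the names `LatenessSketch` and `isLate` are hypothetical. In a Flink 1.x job the setting itself is applied on the windowed stream with `.allowedLateness(...)`.

```scala
object LatenessSketch {
  /**
   * True if elements for the window ending at `windowEnd` would be dropped,
   * given the current watermark and the configured allowed lateness (all in ms).
   */
  def isLate(windowEnd: Long, allowedLateness: Long, watermark: Long): Boolean = {
    // Assumption: the window covers [start, windowEnd), so its last timestamp is windowEnd - 1.
    val cleanupTime = (windowEnd - 1) + allowedLateness
    watermark >= cleanupTime
  }
}
```

With the default lateness of 0, a window ending at 10000 stops accepting elements once the watermark reaches 9999; with 5000 ms of allowed lateness it keeps accepting them until the watermark reaches 14999.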

Hands-on: WindowFuncs

Page 16: Apache Flink Training Workshop @ HadoopCon2016 - #3 DataStream API Hands-On

We’re all set. Thank you!!!

Just Flink It!

