
Apache Flink Training - DataStream API - ProcessFunction

Date post: 17-Mar-2018
Upload: dataartisans
Page 1: Apache Flink Training - DataStream API - ProcessFunction

Apache Flink® Training

Flink v1.3 – 14.9.2017

DataStream API

ProcessFunction

Page 2: Apache Flink Training - DataStream API - ProcessFunction

ProcessFunction

Combining timers with stateful event processing

Page 3: Apache Flink Training - DataStream API - ProcessFunction

Common Pattern

On each incoming element:

• update some state

• register a callback for a moment in the future

When that moment comes:

• check a condition and perform a certain action, e.g. emit an element

Page 4: Apache Flink Training - DataStream API - ProcessFunction

Flink 1.2 added ProcessFunction

Gives access to all basic building blocks:

• Events

• Fault-tolerant, Consistent State

• Timers (event- and processing-time)

Page 5: Apache Flink Training - DataStream API - ProcessFunction

ProcessFunction

Simple yet powerful API:

/**
 * Process one element from the input stream.
 */
void processElement(I value, Context ctx, Collector<O> out) throws Exception;

/**
 * Called when a timer set using {@link TimerService} fires.
 */
void onTimer(long timestamp, OnTimerContext ctx, Collector<O> out) throws Exception;

Page 6: Apache Flink Training - DataStream API - ProcessFunction

ProcessFunction

Simple yet powerful API:

The Collector<O> out parameter of processElement() and onTimer() is a collector to emit result values.

Page 7: Apache Flink Training - DataStream API - ProcessFunction

ProcessFunction

Simple yet powerful API:

The Context ctx parameter of processElement() lets you:

1. Get the timestamp of the element
2. Interact with the TimerService to:
   • query the current time
   • and register timers

The OnTimerContext ctx parameter of onTimer() lets you:

1. Do the above
2. Query if we are operating on event or processing time
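To make the timer interaction more concrete, here is a minimal plain-Java sketch of how event-time timers for a single key behave. It is not Flink code: the class EventTimeTimersSketch and its methods are hypothetical stand-ins. It only imitates two behaviours of Flink's timers: a timer fires once the watermark reaches its timestamp, and registering the same timestamp twice for one key results in a single timer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical stand-in for the timer handling of a single key; not a Flink class. */
public class EventTimeTimersSketch {
    // A sorted set, so duplicate timestamps collapse into one timer.
    private final TreeSet<Long> timers = new TreeSet<>();
    private long currentWatermark = Long.MIN_VALUE;

    /** Imitates registering an event-time timer for the current key. */
    public void registerEventTimeTimer(long time) {
        timers.add(time);
    }

    /** Imitates querying the current event time. */
    public long currentWatermark() {
        return currentWatermark;
    }

    /** Advances event time and returns the timestamps of all timers that fire, in order. */
    public List<Long> advanceWatermark(long watermark) {
        currentWatermark = Math.max(currentWatermark, watermark);
        List<Long> fired = new ArrayList<>();
        while (!timers.isEmpty() && timers.first() <= currentWatermark) {
            fired.add(timers.pollFirst());
        }
        return fired;
    }
}
```

In a real job the equivalent calls go through ctx.timerService(), and each fired timer leads to one onTimer() invocation with that timestamp.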

Page 8: Apache Flink Training - DataStream API - ProcessFunction

ProcessFunction: example

Requirements:

• maintain counts per incoming key, and

• emit the key/count pair if no element came for the key in the last 100 ms (in event time)


Page 9: Apache Flink Training - DataStream API - ProcessFunction

ProcessFunction: example

Implementation sketch:

• Store the count, key and last mod timestamp in a ValueState (scoped by key)

• For each record:
   • update the counter and the last mod timestamp
   • register a timer 100 ms from "now" (in event time)

• When the timer fires:
   • check the callback's timestamp against the last mod time for the key and
   • emit the key/count pair if they match
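The sketch above can be tried out without a Flink cluster. The following plain-Java simulation is only an illustration under stated assumptions: the class CountWithTimeoutSketch, its Entry type, and the advanceWatermark() method are hypothetical names, not Flink APIs, and event time is advanced by hand instead of by watermarks flowing through a job.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Plain-Java simulation of the implementation sketch; none of these names are Flink APIs. */
public class CountWithTimeoutSketch {
    static final long TIMEOUT_MS = 100;

    /** Per-key state: the count and the event time of the last update. */
    static final class Entry {
        long count;
        long lastModified;
    }

    private final Map<String, Entry> state = new HashMap<>();
    // Keys are kept sorted only to make the simulated output order reproducible.
    private final Map<String, TreeSet<Long>> timers = new TreeMap<>();
    private final List<String> output = new ArrayList<>();

    /** For each record: update the counter and timestamp, then register a timer. */
    public void processElement(String key, long eventTime) {
        Entry current = state.computeIfAbsent(key, k -> new Entry());
        current.count++;
        current.lastModified = eventTime;
        timers.computeIfAbsent(key, k -> new TreeSet<>()).add(eventTime + TIMEOUT_MS);
    }

    /** Advancing event time fires every timer at or before the given watermark. */
    public void advanceWatermark(long watermark) {
        for (Map.Entry<String, TreeSet<Long>> e : timers.entrySet()) {
            TreeSet<Long> keyTimers = e.getValue();
            while (!keyTimers.isEmpty() && keyTimers.first() <= watermark) {
                onTimer(e.getKey(), keyTimers.pollFirst());
            }
        }
    }

    /** When a timer fires: emit only if no newer element has moved lastModified. */
    private void onTimer(String key, long timestamp) {
        Entry current = state.get(key);
        if (current != null && timestamp == current.lastModified + TIMEOUT_MS) {
            output.add(key + "=" + current.count);
            state.remove(key);
        }
    }

    /** Returns the emitted key/count pairs, formatted as "key=count". */
    public List<String> output() {
        return output;
    }
}
```

For example, after elements for key "a" at event times 10 and 50, the timer registered for 110 fires without emitting anything, because the element at 50 moved the last modification time; only the timer for 150 emits the pair.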

Page 10: Apache Flink Training - DataStream API - ProcessFunction

ProcessFunction: example

// the data type stored in the state
public class CountWithTimestamp {
    public String key;
    public long count;
    public long lastModified;
}

// apply the process function onto a keyed stream
DataStream<Tuple2<String, Long>> result = stream
    .keyBy(0)
    .process(new CountWithTimeoutFunction());


Page 11: Apache Flink Training - DataStream API - ProcessFunction

ProcessFunction: example

public class CountWithTimeoutFunction extends
        RichProcessFunction<Tuple2<String, String>, Tuple2<String, Long>> {

    @Override
    public void open(Configuration parameters) throws Exception {
        // register our state with the state backend
    }

    // the input type must match the first type parameter of the class
    @Override
    public void processElement(Tuple2<String, String> value, Context ctx,
            Collector<Tuple2<String, Long>> out) throws Exception {
        // update our state and register a timer
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx,
            Collector<Tuple2<String, Long>> out) throws Exception {
        // check the state for the key and emit a result if needed
    }
}


Page 12: Apache Flink Training - DataStream API - ProcessFunction

ProcessFunction: example

public class CountWithTimeoutFunction extends
        RichProcessFunction<Tuple2<String, String>, Tuple2<String, Long>> {

    private ValueState<CountWithTimestamp> state;

    @Override
    public void open(Configuration parameters) throws Exception {
        state = getRuntimeContext().getState(
                new ValueStateDescriptor<>("myState", CountWithTimestamp.class));
    }
}


Page 13: Apache Flink Training - DataStream API - ProcessFunction

ProcessFunction: example

public class CountWithTimeoutFunction extends
        RichProcessFunction<Tuple2<String, String>, Tuple2<String, Long>> {

    @Override
    public void processElement(Tuple2<String, String> value, Context ctx,
            Collector<Tuple2<String, Long>> out) throws Exception {
        // first element for this key: create the state object
        CountWithTimestamp current = state.value();
        if (current == null) {
            current = new CountWithTimestamp();
            current.key = value.f0;
        }
        // update the count and the last modification (event) time
        current.count++;
        current.lastModified = ctx.timestamp();
        state.update(current);
        // schedule a check 100 ms later in event time
        ctx.timerService().registerEventTimeTimer(current.lastModified + 100);
    }
}


Page 14: Apache Flink Training - DataStream API - ProcessFunction

ProcessFunction: example

public class CountWithTimeoutFunction extends
        RichProcessFunction<Tuple2<String, String>, Tuple2<String, Long>> {

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx,
            Collector<Tuple2<String, Long>> out) throws Exception {
        CountWithTimestamp result = state.value();
        // the state may already have been cleared by an earlier timer for this key
        // (possible with out-of-order elements), so guard against null
        if (result != null && timestamp == result.lastModified + 100) {
            out.collect(new Tuple2<String, Long>(result.key, result.count));
            state.clear();
        }
    }
}

