Event Stream Processing with Out-of-Order Data Arrival

Post on 11-Jan-2016

description

Event Stream Processing with Out-of-Order Data Arrival. Presenter: Mo Liu. Presentation based on: Ming Li, Mo Liu, Luping Ding, Elke A. Rundensteiner, and Murali Mani, Worcester Polytechnic Institute, Worcester MA USA. DEPSA at ICDCS 2007, June 29th 2007, Toronto ON Canada.

transcript

Event Stream Processing with Out-of-Order Data Arrival

Presenter: Mo Liu

Presentation based on:

Ming Li, Mo Liu, Luping Ding, Elke A. Rundensteiner, and Murali Mani

Worcester Polytechnic Institute, Worcester MA USA

DEPSA at ICDCS 2007, June 29th 2007, Toronto ON Canada

Outline

Introduction
Preliminary
Problem with Out-of-Order Event Arrival
Solution
Experiment
Conclusion
Related Work

Introduction: Event Stream Processing

Rising interest in the database community

Wide-ranging and growing applications

Example of Event Stream Processing: Shoplifting in Retail Management

Event Stream Processing Engine

A stream engine specialized for event stream queries: generic for detecting and extracting expected pattern sequences

Performance gain compared to stream systems that use joins to handle event sequence queries

Introduction: Complex Event Processing (CEP)

SASE Approach

Total Order Assumption in event arrivals

The order in which events are received by the query system is the same as their timestamp order

By this assumption, “later arrival” means “larger timestamp”

What if Out-of-Order?

Out-of-order data arrival is common in distributed computing environments (e.g., due to network traffic)

Systems based on the total order assumption (e.g., SASE) miss qualified results and produce spurious results

Introduction: Limitations


EVENT <event pattern>

[WHERE <qualification>]

[WITHIN <window>]

Example: EVENT SEQ (A, B, D) WITHIN 10 seconds

Figure: query plan over the input event stream for this example, with operator labels SS: (A, B, D); SC: (A, B, D); WD: D.ts - A.ts < 10 secs; SSC; PSSC: W = 10 secs (ts: timestamp)

Queries in SASE assume the above language structure

Preliminary: Query Language
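
The query structure on this slide can be captured in a small data structure. The following is a minimal sketch, not SASE's actual parser: the names `SeqQuery` and `parse_seq_query` are hypothetical, only the `SEQ` pattern form with a window in seconds is handled, and the optional WHERE clause is not supported.

```python
import re
from typing import NamedTuple, Tuple


class SeqQuery(NamedTuple):
    """Hypothetical container for a parsed sequence query."""
    pattern: Tuple[str, ...]  # event types in sequence order, e.g. ("A", "B", "D")
    window: int               # WITHIN window, in seconds


# Assumed surface syntax: EVENT SEQ(<types>) WITHIN <n> seconds
_QUERY_RE = re.compile(
    r"\s*EVENT\s+SEQ\s*\(([^)]*)\)\s+WITHIN\s+(\d+)\s+seconds?\s*",
    re.IGNORECASE,
)


def parse_seq_query(text: str) -> SeqQuery:
    """Parse 'EVENT SEQ (A, B, D) WITHIN 10 seconds' into a SeqQuery."""
    match = _QUERY_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported query: {text!r}")
    types = tuple(t.strip() for t in match.group(1).split(",") if t.strip())
    if not types:
        raise ValueError("empty event pattern")
    return SeqQuery(types, int(match.group(2)))


query = parse_seq_query("EVENT SEQ (A, B, D) WITHIN 10 seconds")
print(query.pattern, query.window)  # ('A', 'B', 'D') 10
```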

SSC (Sequence Scan and Construction)

Sequence Scan: employs an NFA to detect matches

Sequence Construction: constructs expected results

NFA with AIS (Active Instance Stack)

Preliminary: Finding Result Sequences

AIS associates a stack with each state of the NFA, storing the events that triggered the NFA transition to this state

RIP (Most Recent Instance in Previous Stack) field: stored with each event instance, it records the temporal order relevant to the query

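Example: EVENT SEQ(A, B, D) WITHIN 10 Seconds

NFA: 0 -A-> 1 -B-> 2 -D-> 3, with two * self-loops

Input events: a c b a d f c d f f a …
Timestamps: 3 5 6 7 10 12 13 15 16 18 18 …
Also shown on the stream: b (timestamp 1) and b (timestamp 11)

AIS, each instance written as [RIP] event:
S1: [] a3, [] a7, [] a16
S2: [a3] b6, [a7] b11
S3: [b6] d10, [b11] d15

Sequences constructed: a3 b6 d10; a3 b6 d15; a3 b11 d15; a7 b11 d15, which are then passed to WD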

Preliminary: Finding Result Sequences (Cont.)
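
The stack-and-RIP mechanism of this example can be sketched in a few lines. This is an illustrative reading of the slide, not the SASE implementation: the function name `ssc_in_order` is hypothetical, RIP is stored as an index into the previous stack, the pattern is assumed to contain distinct event types, and the window check stands in for the WD operator. The stream below is the slide's events (including b1 and b11) arranged in timestamp order, which is an assumption about how the figure should be read.

```python
from typing import List, Optional, Tuple

Event = Tuple[str, int]  # (event type, timestamp)


def ssc_in_order(events: List[Event], pattern: List[str], window: int) -> List[str]:
    """Append-semantics AIS; only correct if events arrive in timestamp order."""
    # stacks[i] holds (timestamp, rip) pairs; rip is the index of the most
    # recent instance in stack i-1 at insertion time (None for the first stack).
    stacks: List[List[Tuple[int, Optional[int]]]] = [[] for _ in pattern]
    results: List[str] = []

    def prefixes(i: int, idx: int) -> List[List[int]]:
        """All timestamp sequences ending at instance idx of stack i."""
        ts, rip = stacks[i][idx]
        if i == 0:
            return [[ts]]
        out = []
        for j in range(rip, -1, -1):  # the RIP instance and everything below it
            out.extend(p + [ts] for p in prefixes(i - 1, j))
        return out

    for etype, ts in events:
        if etype not in pattern:
            continue
        i = pattern.index(etype)
        if i > 0 and not stacks[i - 1]:
            continue  # previous NFA state not yet active: event is not retrieved
        rip = len(stacks[i - 1]) - 1 if i > 0 else None
        stacks[i].append((ts, rip))
        if i == len(pattern) - 1:  # accepting state reached: construct sequences
            for seq in prefixes(i, len(stacks[i]) - 1):
                if seq[-1] - seq[0] < window:  # window check: D.ts - A.ts < W
                    results.append(" ".join(f"{t}{s}" for t, s in zip(pattern, seq)))
    return results


stream = [("b", 1), ("a", 3), ("c", 5), ("b", 6), ("a", 7), ("d", 10),
          ("b", 11), ("f", 12), ("c", 13), ("d", 15), ("f", 16)]
print(ssc_in_order(stream, ["a", "b", "d"], 10))  # ['a3 b6 d10', 'a7 b11 d15']
```

Note that b1 is dropped because no A instance has been seen yet, and that a3 b6 d15 and a3 b11 d15 are constructed but rejected by the window check.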

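Example: EVENT SEQ(A, B, D) WITHIN 10 Seconds

NFA: 0 -A-> 1 -B-> 2 -D-> 3, with two * self-loops

Input events: a c b a d f c d f f a …
Timestamps: 3 5 6 7 10 12 13 15 16 18 19 …
Also shown on the stream: b (timestamp 1) and b (timestamp 11)

AIS, each instance written as (RIP) event:
S1: () a3, () a7
S2: (a3) b6, (a7) b11
S3: (b6) d10, (b11) d15

PSSC: when d15 is seen, purge a3, and so on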

Preliminary: Purging Operator States


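Problem with Out-of-Order at SSC: Incomplete Event Retrieval

Query: EVENT SEQ(A, B, D) WITHIN 10 Seconds

NFA: 0 -A-> 1 -B-> 2 -D-> 3, with two * self-loops

Received order:
Events: a c b a d f c d f
Timestamps: 3 5 6 7 10 12 13 15 16
Also shown on the stream: b (timestamp 1) and b (timestamp 11)
Out-of-order event arrivals: a (timestamp 0), d (timestamp 2), followed by f (timestamp 18)

AIS, each instance written as (RIP) event:
S1: () a3, () a7
S2: (a3) b6, (a7) b11
S3: (b6) d10, (b11) d15

Produced result of SSC: a3 b6 d10; a7 b11 d15
Correct result: a0 b1 d2; a3 b6 d10; a7 b11 d15
Missing result: a0 b1 d2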

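Received order:
Events: a c b a d f c d f
Timestamps: 3 5 6 7 10 12 13 15 16
Also shown on the stream: b (timestamp 1) and b (timestamp 11)
Out-of-order event arrival: d (timestamp 8), followed by f (timestamp 18)

AIS, each instance written as [RIP] event:
S1: [] a3, [] a7
S2: [a3] b6, [a7] b11
S3: [b6] d10, [b11] d15, [b11] d8

Incorrect AIS appending: d8 is appended on top of S3 with RIP b11

Produced result: a3 b11 d8 (Wrong!)
Correct result: a3 b6 d8 (Missing!)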

Problem with Out-of-Order at SSC: Event Misplacement

Query: EVENT SEQ(A, B, D) WITHIN 10 Seconds

NFA: 0 -A-> 1 -B-> 2 -D-> 3, with two * self-loops

Out-of-Order Event Arrival Example 3, received order:
Events: a c b a d f c d f
Timestamps: 3 5 6 7 10 12 13 15 16
Also shown on the stream: b (timestamp 1) and b (timestamp 11)
Out-of-order event arrival: d (timestamp 8), followed by f (timestamp 18)

AIS, each instance written as (RIP) event:
S1: () a3, () a7
S2: (a3) b6, (a7) b11
S3: (b6) d10, (b11) d15

Purge in SS: when d15 is seen, a3 is purged, and so on. After that, the out-of-order (OOO) event d8 arrives, and the result a3 b6 d8 is missing. This is an unauthorized AIS purge.

CLAIM: Any data purge of the active instance stack (AIS) is unauthorized unless total order on the data arrival holds for the input stream.

If a precise query result is required and memory resources are limited, WD in SS would not be sufficient for handling out-of-order event arrival!

Problem with Out-of-Order at PSSC


Solution in SSC Event Retrieval Mechanism

To avoid incomplete retrieval, all states of the NFA need to be set active before the retrieval over the event stream.

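NFA: 0 -A-> 1 -B-> 2 -D-> 3, with two * self-loops

Received order:
Events: a c b a d f c d f
Timestamps: 3 5 6 7 10 12 13 15 16
Also shown on the stream: b (timestamp 1) and b (timestamp 11)
Out-of-order event arrivals: a (timestamp 0), d (timestamp 2), followed by f (timestamp 17)

AIS, each instance written as (RIP) event:
S1: () a0, () a3, () a7
S2: (a0) b1, (a3) b6, (a7) b11
S3: (b1) d2, (b6) d10, (b11) d15

Produced result: a0 b1 d2; a3 b6 d10; a7 b11 d15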

AIS Construction Mechanism

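To avoid event misplacement, use sort semantics instead of append semantics

Received order:
Events: a c b a d f c d f
Timestamps: 3 5 6 7 10 12 13 15 16
Also shown on the stream: b (timestamp 1) and b (timestamp 11)
Out-of-order event arrival: b (timestamp 8), followed by f (timestamp 18)

Correct AIS appending, each instance written as [RIP] event:
S1: [] a3, [] a7
S2: [a3] b6, [a7] b8, [a7] b11
S3: [b8] d10, [b11] d15

Sequences produced for b8: a3 b8 d10; a7 b8 d10; a3 b8 d15; a7 b8 d15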

Solution in SSC (Cont.)

SSC Algorithm with Out-of-Order Handling

Out-of-Order Handling Incorporated SSC:

Input: (1) Sequence Query “EVENT SEQ (E1, E2, …, Em) WITHIN W”; (2) AIS constructed from previously input events; (3) newly received event ei (under event type Ei)

Output: (1) updated AIS; (2) sequence output of SSC

1. IF event type Ei is among {E1, E2, …, Em}
2.   insert ei into stack Si (using “sort semantics”)
3.   set ei’s RIP
4.   check the RIP values of the instances in stack Si+1 and reset the ones being affected by ei
5.   produce event sequences containing ei if any
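
The five steps can be sketched as follows. This is one possible reading of the algorithm, not the authors' code: the class name `OutOfOrderSSC` and its helpers are hypothetical, RIP is stored as the timestamp of the referenced instance, and the sketch assumes distinct event types in the pattern and distinct timestamps within each stack. The window check is left to a downstream WD step, which is an assumption based on the slide listing a3 b8 d15 among the SSC outputs.

```python
import bisect
from typing import Dict, List, Optional


class OutOfOrderSSC:
    """Sketch of SSC with sort semantics (all NFA states treated as active)."""

    def __init__(self, pattern: List[str]):
        self.pattern = pattern
        self.stacks: List[List[int]] = [[] for _ in pattern]  # sorted timestamps
        self.rip: List[Dict[int, Optional[int]]] = [{} for _ in pattern]

    def insert(self, etype: str, ts: int) -> List[str]:
        if etype not in self.pattern:                        # step 1
            return []
        i = self.pattern.index(etype)
        bisect.insort(self.stacks[i], ts)                    # step 2: sort semantics
        self.rip[i][ts] = self._latest_before(i - 1, ts) if i > 0 else None  # step 3
        if i + 1 < len(self.pattern):                        # step 4: reset affected RIPs
            for later in self.stacks[i + 1]:
                old = self.rip[i + 1][later]
                if later > ts and (old is None or old < ts):
                    self.rip[i + 1][later] = ts
        return self._sequences_containing(i, ts)             # step 5

    def _latest_before(self, i: int, ts: int) -> Optional[int]:
        k = bisect.bisect_left(self.stacks[i], ts)
        return self.stacks[i][k - 1] if k else None

    def _prefixes(self, i: int, ts: int) -> List[List[int]]:
        """Sequences over stacks 0..i that end at ts, found by following RIP."""
        if i == 0:
            return [[ts]]
        rip = self.rip[i][ts]
        if rip is None:
            return []
        return [p + [ts] for prev in self.stacks[i - 1] if prev <= rip
                for p in self._prefixes(i - 1, prev)]

    def _suffixes(self, i: int, ts: int) -> List[List[int]]:
        """Sequences over stacks i..m-1 that start at ts."""
        if i == len(self.pattern) - 1:
            return [[ts]]
        return [[ts] + s for nxt in self.stacks[i + 1] if nxt > ts
                for s in self._suffixes(i + 1, nxt)]

    def _sequences_containing(self, i: int, ts: int) -> List[str]:
        return [" ".join(f"{t}{x}" for t, x in zip(self.pattern, p[:-1] + s))
                for p in self._prefixes(i, ts) for s in self._suffixes(i, ts)]


ssc = OutOfOrderSSC(["a", "b", "d"])
for ev in [("a", 3), ("c", 5), ("b", 6), ("a", 7), ("d", 10),
           ("b", 11), ("f", 12), ("c", 13), ("d", 15), ("f", 16)]:
    ssc.insert(*ev)
late = ssc.insert("b", 8)  # the out-of-order b8 from the slide
print(late)                # ['a3 b8 d10', 'a3 b8 d15', 'a7 b8 d10', 'a7 b8 d15']
print(ssc.rip[2][10])      # 8: the RIP of d10 was reset from b6 to b8
```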

Optimization

Out-of-Order Handling Incorporated SSC with AIS_CLOCK:

Input and output: Same as Algorithm 1

1. IF event type Ei is among {E1, E2, …, Em}
2.   IF ei.timestamp < AIS_CLOCK
3.     buffer ei
4.     insert ei into stack Si (using “sort semantics”)
5.     set ei’s RIP
6.     check the RIP values of the instances in stack Si+1 and reset the ones being affected
7.     produce event sequences containing ei if any
8.   ELSE
9.     buffer ei
10.    insert ei into stack Si (using “append semantics”)
11.    set ei’s RIP
12.    IF Ei = Em
13.      produce event sequences containing ei if any
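
The branching on AIS_CLOCK can be illustrated by labelling which insertion path each event would take. The slides do not define AIS_CLOCK; this sketch assumes it is the largest timestamp inserted into the AIS so far, and the function name `route_events` is hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def route_events(events: Iterable[Tuple[str, int]],
                 pattern: List[str]) -> Iterator[Tuple[str, int, str]]:
    """Label each relevant event with the insertion path it would take."""
    ais_clock = float("-inf")  # assumption: largest timestamp inserted so far
    for etype, ts in events:
        if etype not in pattern:
            continue
        if ts < ais_clock:
            yield etype, ts, "sort"    # out-of-order: sorted insert plus RIP reset
        else:
            yield etype, ts, "append"  # in-order: plain append, no RIP reset
            ais_clock = ts


received = [("a", 3), ("b", 6), ("a", 7), ("d", 10), ("b", 8), ("d", 15)]
paths = [path for _, _, path in route_events(received, ["a", "b", "d"])]
print(paths)  # ['append', 'append', 'append', 'append', 'sort', 'append']
```

Only the late b8 takes the more expensive sort path; every in-order event keeps the cost of the original append-only scan.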

Using K-Slack

We apply K-Slack based on time units. It assumes that the disorder in event arrivals is bounded by K time units; that is, an event can be delayed by at most K time units.

Solution for PSSC

Example: SEQ(A, B, D), W = 10, K = 4

Received order:
Events: a c b a d f c d f
Timestamps: 3 5 6 7 10 12 13 15 16
Also shown on the stream: b (timestamp 1) and b (timestamp 11)
Out-of-order event arrival: d (timestamp 8), followed by f (timestamp 18)

AIS, each instance written as [RIP] event: [] a3, [] a7, [a3] b6, [a7] b8, [a7] b11, [b8] d10, [b11] d15

Result shown: a3 b6 d8

Purge when f18 is met: 18 > 3 + 10 + 4, so a3 satisfies the purge condition

Purge condition: ei.timestamp + W + K < CLOCK. After waiting for K time units, no out-of-order event with timestamp less than ei.timestamp + W can arrive. Thus ei will no longer be able to contribute to forming a new candidate event sequence.

CLOCK: the largest timestamp seen so far among the received events is maintained as CLOCK.

PSSC Algorithm With Out-of-Order Handling

Out-of-Order Incorporated SSC Purge (PSSC):

Input: (1) current AIS; (2) CLOCK triggering from SSC

Output: updated AIS

1. On receiving a CLOCK triggering
2.   for event instance e in AIS
3.     IF e.timestamp + W + K < CLOCK
4.       purge e
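
The purge condition is simple enough to check by hand. The sketch below is a minimal illustration with hypothetical function names (`purge`, `advance_clock`); stacks are reduced to lists of timestamps, and the values are read off the W = 10, K = 4 example.

```python
from typing import List


def purge(stacks: List[List[int]], clock: int, window: int, k: int) -> List[List[int]]:
    """Drop every instance e with e.timestamp + W + K < CLOCK."""
    return [[ts for ts in stack if ts + window + k >= clock] for stack in stacks]


def advance_clock(clock: int, ts: int) -> int:
    """CLOCK is the largest timestamp seen so far."""
    return max(clock, ts)


# Stacks S1, S2, S3 as timestamp lists (a3, a7 / b6, b11 / d10, d15).
ais = [[3, 7], [6, 11], [10, 15]]
clock = 0
for ts in (3, 5, 6, 7, 10, 11, 12, 13, 15, 16):
    clock = advance_clock(clock, ts)

print(purge(ais, clock, window=10, k=0))  # [[7], [6, 11], [10, 15]]
print(purge(ais, clock, window=10, k=4))  # [[3, 7], [6, 11], [10, 15]]
print(purge(ais, advance_clock(clock, 18), window=10, k=4))  # [[7], [6, 11], [10, 15]]
```

With no slack (K = 0) a3 is already gone at CLOCK = 16, so a late d8 could no longer form a3 b6 d8. With K = 4, a3 survives until f18 arrives, since 3 + 10 + 4 = 17 < 18.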

Optimization 1: AIS partition

Example: SEQ(A, B, D), W = 7, K = 10 (large)

Received order (Out-of-Order Event Arrival):
Events: a c b a d f c f
Timestamps: 3 4 5 7 10 12 13 15
Also shown on the stream: b (timestamp 1), b (timestamp 11), d (timestamp 18), f (timestamp 18)

AIS, each instance written as [RIP] event, with a divider drawn across the stacks:
S1: [] a3, [] a7
S2: [] b1, [a3] b5, [a7] b11
S3: [b5] d10, [b11] d18

We can divide each stack in AIS into two parts:
outdated event instances (e.timestamp + W + K > CLOCK)
up-to-date event instances (e.timestamp + W > CLOCK)

SSC output when d13 comes: a3 b5 d18; a3 b5 d18; a3 b11 d18; a7 b11 d18

Cost!
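
Because each stack is kept sorted by timestamp, the divider can be located with a binary search. The sketch below is an assumption about how the partition might be computed, using the hypothetical name `partition`; the presumed benefit is that an in-order event only needs to be combined with the up-to-date part, while the outdated part is kept solely for late arrivals.

```python
import bisect
from typing import List, Tuple


def partition(stack: List[int], clock: int, window: int) -> Tuple[List[int], List[int]]:
    """Split a timestamp-sorted stack at the divider e.timestamp + W > CLOCK."""
    # Instances with timestamp <= CLOCK - W fall below the divider.
    divider = bisect.bisect_right(stack, clock - window)
    return stack[:divider], stack[divider:]  # (outdated, up-to-date)


stacks = {"S1": [3, 7], "S2": [1, 5, 11], "S3": [10]}
for name, stack in stacks.items():
    print(name, partition(stack, clock=13, window=7))
# S1 ([3], [7])
# S2 ([1, 5], [11])
# S3 ([], [10])
```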

For each CLOCK update, only the instances in the last AIS stack are checked for data purge. For any instance that is purged from there, we can purge instances in the other AIS stacks by following the RIP path.

AIS, each instance written as [RIP] event: [] a3, [] a7, [a3] b6, [a7] b11, [b6] d10, [b11] d15

Optimization 2: Lazy Purge
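
One way to read the lazy purge is sketched below. This is an interpretation, not the authors' algorithm: the function name `lazy_purge` is hypothetical, each instance is a (timestamp, RIP timestamp) pair, stacks are assumed sorted by timestamp, and "following the RIP path" is taken to mean removing the RIP instance and everything below it in the previous stack. Those instances are older than the purged one, so they also satisfy e.timestamp + W + K < CLOCK.

```python
from typing import List, Optional, Tuple

Instance = Tuple[int, Optional[int]]  # (timestamp, RIP timestamp or None)


def lazy_purge(stacks: List[List[Instance]], clock: int, window: int, k: int) -> None:
    """Check only the last stack, then purge earlier stacks along RIP (in place)."""
    last = stacks[-1]
    purged = [e for e in last if e[0] + window + k < clock]
    stacks[-1] = [e for e in last if e[0] + window + k >= clock]
    for i in range(len(stacks) - 2, -1, -1):
        rips = [rip for _, rip in purged if rip is not None]
        if not rips:
            break  # nothing was purged above, so earlier stacks stay untouched
        cut = max(rips)  # every instance at or below this RIP is older still
        purged = [e for e in stacks[i] if e[0] <= cut]
        stacks[i] = [e for e in stacks[i] if e[0] > cut]


# AIS from the slide: [] a3, [] a7 / [a3] b6, [a7] b11 / [b6] d10, [b11] d15
ais = [[(3, None), (7, None)], [(6, 3), (11, 7)], [(10, 6), (15, 11)]]
lazy_purge(ais, clock=25, window=10, k=4)
print(ais)  # [[(7, None)], [(11, 7)], [(15, 11)]]
```

Here d10 is the only instance of the last stack with 10 + 10 + 4 < 25; purging it removes b6 and then a3 along the RIP path without scanning the first two stacks against CLOCK.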


Experiment 1:

Sequence Scan and Construction (SSC)

CPU gain on applying the AIS_CLOCK, query SEQ(A, B, C, D, E, F)

Out-of-order data percentage is 90%

Y axis: cost of inserting events and resetting RIP

Experiment 2: Applying AIS partition during the SSC purge

Performance Gain on Memory

Performance Gain on CPU Cost


Conclusion

In this work, we address the problem of processing event streams with out-of-order data arrival:

we analyze the problems state-of-the-art event stream processing technology would experience when faced with out-of-order data arrival

we propose new implementation and optimization strategies for the core stream algebra operators

we conduct an experimental study that clearly demonstrates the effectiveness of our proposed approach over existing solutions


Related Work

Some initial work uses K-slack to investigate the out-of-order problem for homogeneous-input stream systems

Aurora deals with out-of-order data at the operator level: order-sensitive operators wait a certain period of time before closing each window

The Cayuga system deals with out-of-order data by waiting K time units before all the processing, which has higher latency than ours

Stream punctuation confirms that a certain value or timestamp will no longer appear in the future input streams. It requires a certain service to first be created and appropriately associated

Thank you!