Event Stream Processing with Out-of-Order Data Arrival

Page 1: Event Stream Processing  with Out-of-Order Data Arrival

Event Stream Processing with Out-of-Order Data Arrival

Presenter: Mo Liu

Presentation based on:

Ming Li, Mo Liu, Luping Ding, Elke A. Rundensteiner, and Murali Mani

Worcester Polytechnic Institute, Worcester MA USA

DEPSA at ICDCS 2007, June 29th 2007, Toronto ON Canada

Page 2: Event Stream Processing  with Out-of-Order Data Arrival

Outline

Introduction
Preliminary
Problem with Out-of-Order Event Arrival
Solution
Experiment
Conclusion
Related Work

Page 3: Event Stream Processing  with Out-of-Order Data Arrival

Introduction: Event Stream Processing

Rising interest in the database community; wide-ranging and growing applications.

Example of Event Stream Processing: Shoplifting in Retail Management

Page 4: Event Stream Processing  with Out-of-Order Data Arrival

Introduction: Complex Event Processing (CEP)

Event stream processing engine: a stream engine specific to event stream queries, generic for detecting and extracting expected pattern sequences. It offers a performance gain compared to stream systems that use joins to handle event sequence queries.

SASE approach

Page 5: Event Stream Processing  with Out-of-Order Data Arrival

Introduction: Limitations

Total order assumption in event arrivals: the order in which events are received by the query system is the same as their timestamp order. Under this assumption, “later arrival” means “larger timestamp”.

What if events arrive out of order? Out-of-order data arrival is common in distributed computing environments (e.g., due to network traffic). Systems based on the total order assumption (e.g., SASE) miss qualified results and produce spurious results.

Page 7: Event Stream Processing  with Out-of-Order Data Arrival

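Preliminary: Query Language

Queries in SASE assume the following language structure:

EVENT <event pattern>
[WHERE <qualification>]
[WITHIN <window>]

Example: EVENT SEQ (A, B, D) WITHIN 10 seconds

[Figure: query plans for the example over the input event stream (ts: timestamp). SS: (A, B, D) and SC: (A, B, D) together form SSC, followed by the window filter WD: D.ts - A.ts < 10 secs; with the window pushed into SSC, the plan becomes PSSC: W = 10 secs.]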
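To make the query structure concrete, here is a minimal Python sketch that parses only the SEQ form used in the example into a small record. The `SeqQuery` class, the `parse_query` function and the regular expression are hypothetical illustrations, not part of SASE; the WHERE qualification is kept as raw text and the window is assumed to be given in seconds.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SeqQuery:
    pattern: List[str]        # event types inside SEQ(...), in order
    where: Optional[str]      # raw qualification text, not interpreted here
    window: Optional[int]     # WITHIN window, assumed to be in seconds


# Hypothetical grammar covering only "EVENT SEQ(...) [WHERE ...] [WITHIN n seconds]".
QUERY_RE = re.compile(
    r"^\s*EVENT\s+SEQ\s*\((?P<pattern>[^)]*)\)"
    r"(?:\s+WHERE\s+(?P<where>.+?))?"
    r"(?:\s+WITHIN\s+(?P<window>\d+)\s*(?:seconds?|secs?))?\s*$",
    re.IGNORECASE,
)


def parse_query(text: str) -> SeqQuery:
    m = QUERY_RE.match(text)
    if m is None:
        raise ValueError(f"not a sequence query: {text!r}")
    pattern = [p.strip() for p in m.group("pattern").split(",") if p.strip()]
    window = int(m.group("window")) if m.group("window") else None
    return SeqQuery(pattern, m.group("where"), window)


print(parse_query("EVENT SEQ (A, B, D) WITHIN 10 seconds"))
# SeqQuery(pattern=['A', 'B', 'D'], where=None, window=10)
```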

Page 8: Event Stream Processing  with Out-of-Order Data Arrival

Preliminary: Finding Result Sequences

SSC (Sequence Scan and Construction)
Sequence Scan: employs an NFA to detect matches.
Sequence Construction: constructs the expected results.

NFA with AIS (Active Instance Stack): AIS associates a stack with each state of the NFA, storing the events that triggered the NFA transition to that state.

RIP (Most Recent Instance in Previous Stack) field: records the temporal order relevant to the query.
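The AIS and RIP mechanism can be sketched in Python as follows. This is a simplified illustration under the total order assumption, not SASE's implementation: each RIP is stored as the timestamp of the referenced instance rather than a pointer, pattern event types are assumed distinct, and an event is stored only if the previous stack is non-empty (its NFA state is active). Fed the example stream on the next slide, it produces the four SSC results listed there.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    etype: str
    ts: int
    rip: Optional[int] = None  # timestamp of the RIP instance in the previous stack


class AIS:
    """Active Instance Stacks for EVENT SEQ(E1, ..., Em) with append semantics.

    Valid only under the total order assumption (events arrive in timestamp order).
    """

    def __init__(self, pattern: List[str]):
        self.pattern = list(pattern)  # e.g. ["A", "B", "D"], assumed distinct
        self.stacks: List[List[Instance]] = [[] for _ in pattern]  # stack i <-> NFA state i+1

    def insert(self, etype: str, ts: int) -> List[List[Instance]]:
        """Append an event; return the sequences it completes (if it is of type Em)."""
        if etype not in self.pattern:
            return []  # irrelevant event, skipped by a * self-loop
        i = self.pattern.index(etype)
        if i > 0 and not self.stacks[i - 1]:
            return []  # previous NFA state not active yet
        rip = self.stacks[i - 1][-1].ts if i > 0 else None  # top of the previous stack
        inst = Instance(etype, ts, rip)
        self.stacks[i].append(inst)
        if i == len(self.pattern) - 1:
            return self.construct(i, inst)
        return []

    def construct(self, i: int, inst: Instance) -> List[List[Instance]]:
        """Sequence construction: follow RIPs back to the first stack."""
        if i == 0:
            return [[inst]]
        results = []
        for prev in self.stacks[i - 1]:
            if prev.ts > inst.rip:
                break  # later entries are newer than the RIP
            for prefix in self.construct(i - 1, prev):
                results.append(prefix + [inst])
        return results


ais = AIS(["A", "B", "D"])
stream = [("A", 3), ("C", 5), ("B", 6), ("A", 7), ("D", 10), ("B", 11),
          ("F", 12), ("C", 13), ("D", 15), ("F", 16)]
out = []
for etype, ts in stream:
    out += ais.insert(etype, ts)
for seq in out:
    print(" ".join(f"{e.etype.lower()}{e.ts}" for e in seq))
# a3 b6 d10
# a3 b6 d15
# a3 b11 d15
# a7 b11 d15
```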

Page 9: Event Stream Processing  with Out-of-Order Data Arrival

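Preliminary: Finding Result Sequences (Cont.)

Example: EVENT SEQ(A, B, D) WITHIN 10 Seconds

NFA: states 0, 1, 2, 3 with transitions A (0 to 1), B (1 to 2) and D (2 to 3); * marks the self-loops that skip irrelevant events.

Input stream (event type with timestamp): a3 c5 b6 a7 d10 b11 f12 c13 d15 f16 f18 a18 …

AIS (RIP in brackets):
S1: [] a3, [] a7, [] a16
S2: [a3] b6, [a7] b11
S3: [b6] d10, [b11] d15

SSC output: a3 b6 d10; a3 b6 d15; a3 b11 d15; a7 b11 d15
After WD (D.ts - A.ts < 10): a3 b6 d10; a7 b11 d15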
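The WD step is a plain filter over the SSC output. A minimal sketch, where the `window_filter` name and the (event type, timestamp) pair layout are illustrative choices:

```python
# SSC output of the example, as lists of (event type, timestamp) pairs.
ssc_output = [
    [("a", 3), ("b", 6), ("d", 10)],
    [("a", 3), ("b", 6), ("d", 15)],
    [("a", 3), ("b", 11), ("d", 15)],
    [("a", 7), ("b", 11), ("d", 15)],
]


def window_filter(sequences, window):
    """WD: keep a sequence only if last.ts - first.ts < window."""
    return [seq for seq in sequences if seq[-1][1] - seq[0][1] < window]


for seq in window_filter(ssc_output, 10):
    print(" ".join(f"{etype}{ts}" for etype, ts in seq))
# a3 b6 d10
# a7 b11 d15
```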

Page 10: Event Stream Processing  with Out-of-Order Data Arrival

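Preliminary: Purging Operator States

Example: EVENT SEQ(A, B, D) WITHIN 10 Seconds

NFA: states 0, 1, 2, 3 with transitions A, B, D and * self-loops.

Input stream: a3 c5 b6 a7 d10 b11 f12 c13 d15 f16 f18 a19 …

AIS (RIP in parentheses):
S1: () a3, () a7
S2: (a3) b6, (a7) b11
S3: (b6) d10, (b11) d15

PSSC: when d15 is seen, a3 is purged (15 - 3 > 10, so under total order a3 can no longer fall inside the window of any later event), and so on for other instances that leave the window.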
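A minimal sketch of this purge, assuming total order (no later event can have a timestamp below the current one) and stacks that hold only timestamps. The rule used here, purge e when e.ts + W < CLOCK, is the K = 0 case of the out-of-order purge condition given later in the talk; the function name is hypothetical.

```python
def purge_total_order(stacks, clock, window):
    """Return new stacks without the instances that have left the window.

    Assumes total order: every event still to come has a timestamp >= clock,
    so an instance with ts + window < clock can never appear in a new result.
    """
    return [[ts for ts in stack if not ts + window < clock] for stack in stacks]


# AIS of the example, one timestamp list per stack (S1: A, S2: B, S3: D).
stacks = [[3, 7], [6, 11], [10, 15]]
print(purge_total_order(stacks, clock=15, window=10))
# [[7], [6, 11], [10, 15]]  (a3 is purged when d15 is seen)
```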

Page 12: Event Stream Processing  with Out-of-Order Data Arrival

Problem with Out-of-Order at SSC: Incomplete Event Retrieval

EVENT SEQ(A, B, D) WITHIN 10 Seconds

NFA: states 0, 1, 2, 3 with transitions A, B, D and * self-loops.

Received order: a3 c5 b6 a7 d10 b11 f12 c13 d15 f16 and f18, with the out-of-order events a0, b1 and d2 arriving late.

AIS built by SSC (RIP in parentheses):
S1: () a3, () a7
S2: (a3) b6, (a7) b11
S3: (b6) d10, (b11) d15

Produced result: a3 b6 d10; a7 b11 d15
Correct result: a0 b1 d2; a3 b6 d10; a7 b11 d15

SSC misses the result a0 b1 d2.

Page 13: Event Stream Processing  with Out-of-Order Data Arrival

Problem with Out-of-Order at SSC: Event Misplacement

Received order: a3 c5 b6 a7 d10 b11 f12 c13 d15 f16, then the out-of-order event d8, then f18.

AIS (RIP in brackets):
S1: [] a3, [] a7
S2: [a3] b6, [a7] b11
S3: [b6] d10, [b11] d15, [b11] d8

Incorrect AIS appending: d8 is appended on top of S3 with RIP b11, although b11 occurs after d8.

Produced result: a3 b11 d8 (wrong)
Correct result: a3 b6 d8 (missing)
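The misplacement comes from how the RIP is chosen. In this sketch (function names are hypothetical; stacks hold timestamps in ascending order), append semantics takes the top of the previous stack, while a timestamp-aware choice takes the latest previous instance that actually precedes the new event:

```python
import bisect


def rip_append(prev_stack, ts):
    """Append semantics: the RIP is simply the top of the previous stack."""
    return prev_stack[-1] if prev_stack else None


def rip_sorted(prev_stack, ts):
    """Timestamp-aware RIP: the latest previous instance with a smaller timestamp."""
    i = bisect.bisect_left(prev_stack, ts)
    return prev_stack[i - 1] if i > 0 else None


S2 = [6, 11]                 # B instances b6, b11
print(rip_append(S2, 8))     # 11: d8 would point at b11, which happened after it
print(rip_sorted(S2, 8))     # 6
```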

Page 14: Event Stream Processing  with Out-of-Order Data Arrival

Problem with Out-of-Order at PSSC

EVENT SEQ(A, B, D) WITHIN 10 Seconds

Example 3. Received order: a3 c5 b6 a7 d10 b11 f12 c13 d15 f16, then the out-of-order event d8, then f18.

AIS (RIP in parentheses):
S1: () a3, () a7
S2: (a3) b6, (a7) b11
S3: (b6) d10, (b11) d15

Purge in SS: when d15 is seen, a3 is purged, and so on. When the out-of-order d8 arrives afterwards, the result a3 b6 d8 is missing because of this unauthorized AIS purge.

CLAIM: Any data purge of the active instance stack (AIS) is unauthorized unless total order on data arrival holds for the input stream.

If a precise query result is required and memory resources are limited, pushing WD into SS is not sufficient for handling out-of-order event arrival!
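The missing result can be reproduced in a few lines. This sketch (hypothetical helper name; stacks hold timestamps only) enumerates the SEQ(A, B, D) matches ending at the late d8, before and after the window-based purge triggered by d15:

```python
def matches_ending_at(s1, s2, d_ts, window):
    """(a, b, d) timestamp triples with a < b < d_ts and d_ts - a < window."""
    return [(a, b, d_ts) for b in s2 if b < d_ts
            for a in s1 if a < b and d_ts - a < window]


W = 10
s1, s2 = [3, 7], [6, 11]                 # A and B stacks before the purge
print(matches_ending_at(s1, s2, 8, W))   # [(3, 6, 8)]

# Purge triggered by d15 under the total order assumption: drop ts + W < 15.
s1 = [a for a in s1 if not a + W < 15]   # a3 is removed
s2 = [b for b in s2 if not b + W < 15]
print(matches_ending_at(s1, s2, 8, W))   # []: a3 b6 d8 is lost
```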

Page 16: Event Stream Processing  with Out-of-Order Data Arrival

Solution in SSC: Event Retrieval Mechanism

To avoid incomplete retrieval, all states of the NFA need to be set active before retrieval over the event stream begins.

Received order: a3 c5 b6 a7 d10 b11 f12 c13 d15 f16 and f17, with the out-of-order events a0, b1 and d2 arriving late.

AIS (RIP in parentheses):
S1: () a0, () a3, () a7
S2: (a0) b1, (a3) b6, (a7) b11
S3: (b1) d2, (b6) d10, (b11) d15

Produced result: a0 b1 d2; a3 b6 d10; a7 b11 d15
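One way incomplete retrieval can arise is an NFA that stores an event only when its predecessor state is already active. The hypothetical received order below delivers a match in reverse; with gating, b1 and d2 are dropped before a0 arrives, while keeping all states active stores all three. The function name and the reverse order are illustrative assumptions, not taken from the slides.

```python
def store_events(received, pattern, gate_on_previous):
    """Build per-type stacks of timestamps from `received` (arrival order).

    gate_on_previous=True mimics an NFA whose state i is active only when the
    stack for the previous pattern type is non-empty; other events are dropped.
    """
    stacks = {t: [] for t in pattern}
    for etype, ts in received:
        if etype not in stacks:
            continue                      # not part of the pattern
        i = pattern.index(etype)
        if gate_on_previous and i > 0 and not stacks[pattern[i - 1]]:
            continue                      # predecessor state inactive: event lost
        stacks[etype].append(ts)
    return stacks


pattern = ["A", "B", "D"]
received = [("D", 2), ("B", 1), ("A", 0)]   # hypothetical: a match arriving in reverse
print(store_events(received, pattern, gate_on_previous=True))
# {'A': [0], 'B': [], 'D': []}
print(store_events(received, pattern, gate_on_previous=False))
# {'A': [0], 'B': [1], 'D': [2]}
```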

Page 17: Event Stream Processing  with Out-of-Order Data Arrival

Solution in SSC (Cont.)

AIS Construction Mechanism

To avoid event misplacement, use sort semantics instead of append semantics.

Received order: a3 c5 b6 a7 d10 b11 f12 c13 d15 f16, then the out-of-order event b8, then f18.

Correct AIS insertion (RIP in brackets):
S1: [] a3, [] a7
S2: [a3] b6, [a7] b8, [a7] b11
S3: [b8] d10, [b11] d15

b8 is placed between b6 and b11, and d10's RIP is reset from b6 to b8.

New sequences containing b8: a3 b8 d10; a7 b8 d10; a3 b8 d15; a7 b8 d15

Page 18: Event Stream Processing  with Out-of-Order Data Arrival

SSC Algorithm with Out-of-Order Handling

Algorithm 1: Out-of-Order Handling Incorporated SSC
Input: (1) sequence query “EVENT SEQ (E1, E2, …, Em) WITHIN W”; (2) AIS constructed from previously input events; (3) newly received event ei (of event type Ei)
Output: (1) updated AIS; (2) sequence output of SSC

1. IF event type Ei is among {E1, E2, …, Em}
2.    insert ei into stack Si (using “sort semantics”)
3.    set ei’s RIP
4.    check the RIP values of the instances in stack Si+1 and reset the ones affected by ei
5.    produce event sequences containing ei, if any
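The five steps can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' code: pattern event types are distinct, RIPs are stored as timestamps, every state is always active (so no event of a pattern type is dropped), and step 5 combines RIP-based prefixes ending at ei with suffixes from later stacks whose RIP is at or after ei. Feeding it the stream from the sort-semantics slide and then the late b8 yields the four new sequences listed there and resets d10's RIP to b8.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Inst:
    etype: str
    ts: int
    rip: Optional[int] = None  # timestamp of the RIP instance in the previous stack


class OutOfOrderSSC:
    """Sketch of Algorithm 1: all states active, stacks kept with sort semantics."""

    def __init__(self, pattern: List[str]):
        self.pattern = list(pattern)  # E1, ..., Em (assumed distinct)
        self.stacks: List[List[Inst]] = [[] for _ in pattern]

    def _rip(self, i: int, ts: int) -> Optional[int]:
        """Latest instance in stack i-1 with a smaller timestamp (None if none)."""
        if i == 0:
            return None
        prev = self.stacks[i - 1]
        k = bisect.bisect_left([p.ts for p in prev], ts)
        return prev[k - 1].ts if k > 0 else None

    def _prefixes(self, i: int, inst: Inst) -> List[List[Inst]]:
        """Partial sequences E1..Ei ending at inst, following RIPs backwards."""
        if i == 0:
            return [[inst]]
        if inst.rip is None:
            return []
        return [p + [inst]
                for prev in self.stacks[i - 1] if prev.ts <= inst.rip
                for p in self._prefixes(i - 1, prev)]

    def _suffixes(self, i: int, inst: Inst) -> List[List[Inst]]:
        """Partial sequences Ei..Em starting at inst (successors have RIP >= inst.ts)."""
        if i == len(self.stacks) - 1:
            return [[inst]]
        return [[inst] + s
                for nxt in self.stacks[i + 1] if nxt.rip is not None and nxt.rip >= inst.ts
                for s in self._suffixes(i + 1, nxt)]

    def insert(self, etype: str, ts: int) -> List[List[Inst]]:
        if etype not in self.pattern:                                    # step 1
            return []
        i = self.pattern.index(etype)
        stack = self.stacks[i]
        e = Inst(etype, ts)
        stack.insert(bisect.bisect_left([s.ts for s in stack], ts), e)  # step 2
        e.rip = self._rip(i, ts)                                         # step 3
        if i + 1 < len(self.stacks):                                     # step 4
            for nxt in self.stacks[i + 1]:
                if nxt.ts > ts and (nxt.rip is None or nxt.rip < ts):
                    nxt.rip = ts
        return [p[:-1] + s                                               # step 5
                for p in self._prefixes(i, e) for s in self._suffixes(i, e)]


ssc = OutOfOrderSSC(["A", "B", "D"])
received = [("A", 3), ("C", 5), ("B", 6), ("A", 7), ("D", 10), ("B", 11),
            ("F", 12), ("C", 13), ("D", 15), ("F", 16)]
for etype, ts in received:
    ssc.insert(etype, ts)
late = ssc.insert("B", 8)  # out-of-order b8
for seq in late:
    print(" ".join(f"{x.etype.lower()}{x.ts}" for x in seq))
# a3 b8 d10
# a3 b8 d15
# a7 b8 d10
# a7 b8 d15
```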

Page 19: Event Stream Processing  with Out-of-Order Data Arrival

Optimization

Out-of-Order Handling Incorporated SSC with AIS_CLOCK
Input and output: same as Algorithm 1

1. IF event type Ei is among {E1, E2, …, Em}
2.    IF ei.timestamp < AIS_CLOCK
3.       buffer ei
4.       insert ei into stack Si (using “sort semantics”)
5.       set ei’s RIP
6.       check the RIP values of the instances in stack Si+1 and reset the ones being affected
7.       produce event sequences containing ei, if any
8.    ELSE
9.       buffer ei
10.      insert ei into stack Si (using “append semantics”)
11.      set ei’s RIP
12.      IF Ei = Em
13.         produce event sequences containing ei, if any
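Below is a reduced sketch of the AIS_CLOCK dispatch that tracks only stack maintenance. Several assumptions are not stated on the slide: AIS_CLOCK is taken to be the largest timestamp inserted into the AIS so far, timestamps are distinct, the “buffer ei” steps and sequence output are omitted, and the class name and counters are hypothetical. An in-order event is at least as new as everything already stored, so appending it keeps each stack sorted, and its RIP is simply the top of the previous stack.

```python
import bisect


class ClockedAIS:
    """Stack maintenance of SSC with AIS_CLOCK (sequence output omitted)."""

    def __init__(self, pattern):
        self.pattern = list(pattern)
        self.stacks = [[] for _ in pattern]  # each: ts-sorted list of [ts, rip_ts]
        self.ais_clock = float("-inf")       # assumed: largest timestamp inserted so far
        self.sorted_inserts = 0              # out-of-order path
        self.appends = 0                     # in-order path

    def _rip(self, i, ts):
        """Latest timestamp in stack i-1 below ts, or None."""
        if i == 0:
            return None
        prev = [p[0] for p in self.stacks[i - 1]]
        k = bisect.bisect_left(prev, ts)
        return prev[k - 1] if k > 0 else None

    def insert(self, etype, ts):
        if etype not in self.pattern:
            return
        i = self.pattern.index(etype)
        stack = self.stacks[i]
        if ts < self.ais_clock:
            # Out of order: sort semantics, then reset RIPs in stack i+1 that ei affects.
            self.sorted_inserts += 1
            k = bisect.bisect_left([s[0] for s in stack], ts)
            stack.insert(k, [ts, self._rip(i, ts)])
            if i + 1 < len(self.stacks):
                for nxt in self.stacks[i + 1]:
                    if nxt[0] > ts and (nxt[1] is None or nxt[1] < ts):
                        nxt[1] = ts
        else:
            # In order: append semantics; the RIP is the top of the previous stack.
            self.appends += 1
            prev = self.stacks[i - 1] if i > 0 else []
            stack.append([ts, prev[-1][0] if prev else None])
            self.ais_clock = ts


ais = ClockedAIS(["A", "B", "D"])
for etype, ts in [("A", 3), ("B", 6), ("A", 7), ("D", 10), ("B", 11), ("D", 15), ("B", 8)]:
    ais.insert(etype, ts)
print(ais.appends, ais.sorted_inserts)  # 6 1
print(ais.stacks[2])                    # [[10, 8], [15, 11]]
```

Only b8 pays for the sorted insertion and the RIP scan; the six in-order events take the cheap append path.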

Page 20: Event Stream Processing  with Out-of-Order Data Arrival

Solution for PSSC

Using K-Slack

We apply K-slack based on time units. It assumes that out-of-order event arrivals deviate by at most K time units; that is, an event can be delayed by at most K time units.

Example: SEQ(A, B, D), W = 10, K = 4
Received order: a3 c5 b6 a7 d10 b11 f12 c13 d15 f16 and f18, with the out-of-order event d8 arriving late
AIS (RIP in brackets):
S1: [] a3, [] a7
S2: [a3] b6, [a7] b8, [a7] b11
S3: [b8] d10, [b11] d15
Result: a3 b6 d8

Purge when f18 is met: 18 > 3 + 10 + 4 = 17, so a3 is purged.
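Whether a received stream respects the K-slack assumption can be checked by measuring how far each arrival trails the largest timestamp received before it (our reading of “delayed”). A minimal sketch; the arrival list is hypothetical and the helper names are illustrative:

```python
def max_lateness(arrivals):
    """Largest delay of an arrival behind the biggest timestamp already received."""
    clock = float("-inf")  # largest timestamp seen so far
    worst = 0
    for ts in arrivals:
        if ts < clock:
            worst = max(worst, clock - ts)
        clock = max(clock, ts)
    return worst


def satisfies_k_slack(arrivals, k):
    """True if no event arrives more than k time units late."""
    return max_lateness(arrivals) <= k


arrivals = [3, 5, 6, 7, 10, 8, 11, 12]  # hypothetical received order: 8 arrives after 10
print(max_lateness(arrivals))           # 2
print(satisfies_k_slack(arrivals, 4))   # True
```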

Page 21: Event Stream Processing  with Out-of-Order Data Arrival

Purge condition: ei.timestamp + W + K < CLOCK. After waiting for K time units, no out-of-order event with a timestamp smaller than ei.timestamp + W can arrive, so ei can no longer contribute to forming a new candidate event sequence.

CLOCK: maintained as the largest timestamp seen so far among the received events.

Page 22: Event Stream Processing  with Out-of-Order Data Arrival

PSSC Algorithm with Out-of-Order Handling

Out-of-Order Incorporated SSC Purge (PSSC)
Input: (1) current AIS; (2) CLOCK triggering from SSC
Output: updated AIS

1. On receiving a CLOCK triggering
2.    FOR each event instance e in AIS
3.       IF e.timestamp + W + K < CLOCK
4.          purge e
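A minimal sketch of PSSC, assuming a CLOCK triggering means any increase of CLOCK and that stacks hold only timestamps. The class and method names are hypothetical. The example pre-fills the stacks of the K-slack slide's running example with d8 already in S3 and replays the purge at f18.

```python
class KSlackPurger:
    """Sketch of PSSC: purge AIS instances once e.ts + W + K < CLOCK."""

    def __init__(self, window, k):
        self.window = window
        self.k = k
        self.clock = float("-inf")  # largest timestamp seen so far

    def observe(self, ts, stacks):
        """Feed one received timestamp; a CLOCK increase triggers a purge."""
        if ts > self.clock:
            self.clock = ts
            return self.purge(stacks)
        return []

    def purge(self, stacks):
        """Purge in place (stacks are lists of timestamps); return purged timestamps."""
        purged = []
        for stack in stacks:
            purged += [ts for ts in stack if ts + self.window + self.k < self.clock]
            stack[:] = [ts for ts in stack if not ts + self.window + self.k < self.clock]
        return purged


stacks = [[3, 7], [6, 11], [8, 10, 15]]  # S1 (A), S2 (B), S3 (D)
pssc = KSlackPurger(window=10, k=4)
print(pssc.observe(16, stacks))  # []: 3 + 10 + 4 = 17 is not below 16
print(pssc.observe(8, stacks))   # []: late event, CLOCK stays 16
print(pssc.observe(18, stacks))  # [3]: 17 < 18, so a3 is purged
print(stacks)                    # [[7], [6, 11], [8, 10, 15]]
```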

Page 23: Event Stream Processing  with Out-of-Order Data Arrival

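Optimization 1: AIS Partition

Example: SEQ(A, B, D), W = 7, K = 10 (large)
Received events: a3 c4 b5 a7 d10 f12 c13 f15, the out-of-order arrivals b11 and b1, then f18 and d18.

AIS (RIP in brackets):
S1: [] a3, [] a7
S2: [] b1, [a3] b5, [a7] b11
S3: [b5] d10, [b11] d18

Divider: each stack in AIS can be divided into two parts: outdated event instances (e.timestamp + W <= CLOCK < e.timestamp + W + K), which only out-of-order events can still combine with, and up-to-date event instances (e.timestamp + W > CLOCK).

SSC output when d18 arrives: a3 b5 d18; a3 b11 d18; a7 b11 d18.

Cost! None of these sequences satisfies the window (18 - 7 = 11 is not below 7), so constructing them from outdated instances is wasted work.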
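With timestamp-sorted stacks, the divider comes down to a single binary search. A minimal sketch (hypothetical function name; stacks hold timestamps only) using the slide's W = 7:

```python
import bisect


def split_stack(stack, clock, window):
    """Split a ts-sorted stack at the divider.

    outdated:   ts + window <= clock (only out-of-order events can still pair with these)
    up_to_date: ts + window >  clock
    """
    cut = bisect.bisect_right(stack, clock - window)
    return stack[:cut], stack[cut:]


W = 7
print(split_stack([1, 5, 11], clock=18, window=W))  # ([1, 5, 11], [])
print(split_stack([1, 5, 11], clock=15, window=W))  # ([1, 5], [11])
```

In this reading, SSC only needs the up-to-date parts when it constructs sequences for an in-order event. At CLOCK = 18 every B instance is outdated, so the three out-of-window sequences for d18 would not be built.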

Page 24: Event Stream Processing  with Out-of-Order Data Arrival

Optimization 2: Lazy Purge

For each CLOCK update, only the instances in the last AIS stack are checked for data purge. For any instance purged from there, instances in the other AIS stacks can be purged by following the RIP path.

AIS example (RIP in brackets):
S1: [ ] a3, [ ] a7
S2: [a3] b6, [a7] b11
S3: [b6] d10, [b11] d15
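A minimal sketch of lazy purge follows. Stacks are assumed to be ts-sorted lists of (ts, rip) pairs, with each RIP stored as the timestamp of the referenced instance. The parameters W = 10, K = 4 and the CLOCK value 25 are hypothetical. Everything at or below a purged instance's RIP is older than that instance, so it also satisfies the purge condition.

```python
def lazy_purge(stacks, clock, window, k):
    """Lazy purge over AIS stacks (in place).

    stacks: one list per pattern type, each ts-sorted, of (ts, rip) pairs where
    rip is the timestamp of the RIP instance in the previous stack (or None).
    Only the last stack is tested against ts + window + k < clock.
    """
    last = stacks[-1]
    n = 0
    while n < len(last) and last[n][0] + window + k < clock:
        n += 1  # purgeable instances form a prefix of the sorted stack
    if n == 0:
        return
    bound = last[n - 1][1]  # RIP of the newest purged instance (RIPs never decrease)
    del last[:n]
    for i in range(len(stacks) - 2, -1, -1):
        if bound is None:
            break
        stack = stacks[i]
        n = 0
        while n < len(stack) and stack[n][0] <= bound:
            n += 1  # older than a purged successor, so also purgeable
        next_bound = stack[n - 1][1] if n else None
        del stack[:n]
        bound = next_bound


stacks = [
    [(3, None), (7, None)],  # S1: a3, a7
    [(6, 3), (11, 7)],       # S2: b6, b11
    [(10, 6), (15, 11)],     # S3: d10, d15
]
lazy_purge(stacks, clock=25, window=10, k=4)
print(stacks)  # [[(7, None)], [(11, 7)], [(15, 11)]]
```

Purging d10 leads to b6 and then a3 along the RIP path. a7 also meets the condition (7 + 14 < 25) but stays until a later check of the last stack reaches it. That deferral is what makes the purge lazy.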

Page 26: Event Stream Processing  with Out-of-Order Data Arrival

Experiment 1: Sequence Scan and Construction (SSC)

CPU gain from applying AIS_CLOCK, query SEQ (A, B, C, D, E, F)

Out-of-order data percentage: 90%

Y axis: cost of inserting events and resetting RIP

Page 27: Event Stream Processing  with Out-of-Order Data Arrival

Experiment 2: Applying AIS partition during the SSC purge

[Figure panels: performance gain on memory; performance gain on CPU cost]

Page 29: Event Stream Processing  with Out-of-Order Data Arrival

Conclusion

In this work, we address the problem of processing event streams with out-of-order data arrival:

we analyze the problems that state-of-the-art event stream processing technology would experience when faced with out-of-order data arrival;

we propose new implementation and optimization strategies for the core stream algebra operators;

we conduct an experimental study that demonstrates the effectiveness of our proposed approach over existing solutions.

Page 31: Event Stream Processing  with Out-of-Order Data Arrival

Related Work

Some initial work uses K-slack to investigate the out-of-order problem for homogeneous-input stream systems.

Aurora deals with out-of-order data at the operator level: order-sensitive operators wait a certain period of time before closing each window.

The Cayuga system deals with out-of-order data by waiting K time units before any processing, which incurs higher latency than our approach.

Stream punctuation confirms that a certain value or timestamp will no longer appear in the future input streams. It requires a certain service to be created first and appropriately associated with the stream.

Page 32: Event Stream Processing  with Out-of-Order Data Arrival

Thank you!

