Post on 11-Jan-2016
Event Stream Processing with Out-of-Order Data Arrival
Presenter: Mo Liu
Presentation based on:
Ming Li, Mo Liu, Luping Ding, Elke A. Rundensteiner, and Murali Mani
Worcester Polytechnic Institute, Worcester MA USA
DEPSA at ICDCS 2007, June 29th 2007, Toronto ON Canada
Outline
Introduction
Preliminary
Problem with Out-of-Order Event Arrival
Solution
Experiment
Conclusion
Related Work
Introduction: Event Stream Processing
Rising interest in the database community
Wide-ranging and growing applications
Example of Event Stream Processing: Shoplifting in Retail Management
Event Stream Processing Engine
Stream engine specialized for event stream queries: generic for detecting and extracting expected pattern sequences
Performance gain compared to stream systems that use joins to handle event sequence queries
Introduction: Complex Event Processing (CEP)
SASE Approach
Total order assumption in event arrivals
The order in which events are received by the query system is the same as their timestamp order
Under this assumption, "later arrival" means "larger timestamp"
What if events arrive out of order?
Out-of-order data arrival is common in distributed computing environments (e.g., due to network traffic)
Systems based on the total order assumption (e.g., SASE) miss qualified results and produce spurious results
Introduction: Limitations
Preliminary: Query Language
EVENT <event pattern>
[WHERE <qualification>]
[WITHIN <window>]
Queries in SASE assume the above language structure
Example: EVENT SEQ (A, B, D) WITHIN 10 seconds
Query plan over the input event stream (ts: timestamp):
SS: (A, B, D)
SC: (A, B, D)
WD: D.ts - A.ts < 10 secs
SSC (SS and SC combined)
PSSC: W = 10 secs
Preliminary: Finding Result Sequences
SSC (Sequence Scan and Construction)
Sequence Scan: employs an NFA to detect matches
Sequence Construction: constructs expected results
NFA with AIS (Active Instance Stack)
AIS associates a stack with each state of the NFA; the stack stores the events that triggered the NFA transition to this state
RIP (Most Recent Instance in Previous Stack) field
The field records the temporal order relevant to the query
Preliminary: Finding Result Sequences (Cont.)
Example: EVENT SEQ(A, B, D) WITHIN 10 Seconds
NFA: 0 -A-> 1 -B-> 2 -D-> 3 (two * self-loops)
Event stream (type and timestamp): a3 c5 b6 a7 d10 f12 c13 d15 f16 f18 a18 …, with b1 and b11 drawn separately
AIS stacks, [RIP] instance:
S1: [] a3, [] a7, [] a16
S2: [a3] b6, [a7] b11
S3: [b6] d10, [b11] d15
Sequences passed to WD: a3 b6 d15; a3 b11 d15; a7 b11 d15; a3 b6 d10
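The stack construction and RIP-guided sequence construction in this example can be traced with a minimal Python sketch. It assumes in-order arrival (append semantics) and represents RIP as an index into the previous stack. The names `Instance`, `scan`, `construct`, and `within` are hypothetical and do not come from SASE, and placing b1 and b11 in timestamp order within the stream is an assumption about the figure.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    etype: str
    ts: int
    rip: int  # index of the most recent instance in the previous stack, -1 if none

    @property
    def name(self):
        return f"{self.etype.lower()}{self.ts}"


def construct(stacks, i, inst):
    """Build all partial sequences ending at inst by walking RIP pointers."""
    if i == 0:
        return [(inst,)]
    out = []
    # Every instance at or below the RIP position arrived before inst.
    for prev in stacks[i - 1][: inst.rip + 1]:
        out.extend(prefix + (inst,) for prefix in construct(stacks, i - 1, prev))
    return out


def scan(events, pattern):
    """Sequence scan with append semantics; events are (type, ts) pairs."""
    stacks = [[] for _ in pattern]
    results = []
    for etype, ts in events:
        if etype not in pattern:
            continue
        i = pattern.index(etype)
        if i > 0 and not stacks[i - 1]:
            continue  # the NFA state for this type is not active yet
        rip = len(stacks[i - 1]) - 1 if i > 0 else -1
        inst = Instance(etype, ts, rip)
        stacks[i].append(inst)
        if i == len(pattern) - 1:  # accepting state reached
            results.extend(construct(stacks, i, inst))
    return stacks, results


def within(results, window):
    """WD: keep sequences whose last and first timestamps differ by < window."""
    return [seq for seq in results if seq[-1].ts - seq[0].ts < window]


stream = [("B", 1), ("A", 3), ("C", 5), ("B", 6), ("A", 7), ("D", 10),
          ("B", 11), ("F", 12), ("C", 13), ("D", 15), ("F", 16)]
stacks, candidates = scan(stream, ["A", "B", "D"])
matches = within(candidates, 10)
```

With this stream, `candidates` holds the four sequences listed in the example, and `within(candidates, 10)` keeps a3 b6 d10 and a7 b11 d15.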
Preliminary: Purging Operator States
Example: EVENT SEQ(A, B, D) WITHIN 10 Seconds
NFA: 0 -A-> 1 -B-> 2 -D-> 3 (two * self-loops)
Event stream (type and timestamp): a3 c5 b6 a7 d10 f12 c13 d15 f16 f18 a19 …, with b1 and b11 drawn separately
AIS stacks, (RIP) instance:
S1: () a3, () a7
S2: (a3) b6, (a7) b11
S3: (b6) d10, (b11) d15
PSSC: on seeing d15, purge a3 and so on
Problem with Out-of-Order at SSC: Incomplete Event Retrieval
EVENT SEQ(A, B, D) WITHIN 10 Seconds
NFA: 0 -A-> 1 -B-> 2 -D-> 3 (two * self-loops)
Received order (type and timestamp): a3 c5 b6 a7 d10 f12 c13 d15 f16 f18, with b1 and b11 drawn separately; a0 and d2 are marked as out-of-order event arrivals
AIS stacks, (RIP) instance:
S1: () a3, () a7
S2: (a3) b6, (a7) b11
S3: (b6) d10, (b11) d15
Produced result: a3 b6 d10; a7 b11 d15
Correct result: a0 b1 d2; a3 b6 d10; a7 b11 d15
Missing from the SSC output: a0 b1 d2
Problem with Out-of-Order at SSC: Event Misplacement
Received order (type and timestamp): a3 c5 b6 a7 d10 f12 c13 d15 f16 f18, with b1 and b11 drawn separately; d8 is marked as an out-of-order event arrival
AIS stacks, [RIP] instance:
S1: [] a3, [] a7
S2: [a3] b6, [a7] b11
S3: [b6] d10, [b11] d15, [b11] d8 (incorrect AIS appending)
Sequences shown under "Produced Result" and "Correct Result": a3 b6 d8; a3 b11 d8; a3 b6 d8 (annotated "Missing!" and "Wrong!")
Problem with Out-of-Order at PSSC
EVENT SEQ(A, B, D) WITHIN 10 Seconds
NFA: 0 -A-> 1 -B-> 2 -D-> 3 (two * self-loops)
Out-of-Order Event Arrival Example 3
Received order (type and timestamp): a3 c5 b6 a7 d10 f12 c13 d15 f16 f18, with b1 and b11 drawn separately; d8 is marked as an out-of-order event arrival
AIS stacks, (RIP) instance:
S1: () a3, () a7
S2: (a3) b6, (a7) b11
S3: (b6) d10, (b11) d15
Purge in SS: on seeing d15, purge a3 and so on. After that, the out-of-order d8 arrives, so the result a3 b6 d8 is missing: an unauthorized AIS purge.
CLAIM: Any data purge of the active instance stack (AIS) is unauthorized unless total order on the data arrival holds for the input stream.
If precise query results are required and memory resources are limited, WD in SS is not sufficient for handling out-of-order event arrival!
Solution in SSC
Event Retrieval Mechanism
To avoid incomplete retrieval, all states of the NFA need to be set active before the retrieval over the event stream.
NFA: 0 -A-> 1 -B-> 2 -D-> 3 (two * self-loops)
Received order (type and timestamp): a3 c5 b6 a7 d10 f12 c13 d15 f16 f17, with b1 and b11 drawn separately; a0 and d2 are marked as out-of-order event arrivals
AIS stacks, (RIP) instance:
S1: () a3, () a7, and () a0
S2: (a3) b6, (a7) b11, and (a0) b1
S3: (b6) d10, (b11) d15, and (b1) d2
Produced result: a0 b1 d2; a3 b6 d10; a7 b11 d15; …
Solution in SSC (Cont.)
AIS Construction Mechanism
To avoid event misplacement, use sort semantics instead of append semantics.
Received order (type and timestamp): a3 c5 b6 a7 d10 f12 c13 d15 f16 f18, with b1 and b11 drawn separately; b8 is marked as an out-of-order event arrival
AIS stacks, [RIP] instance (correct AIS appending):
S1: [] a3, [] a7
S2: [a3] b6, [a7] b8, [a7] b11
S3: [b8] d10, [b11] d15
Sequences shown: a3 b8 d10; a7 b8 d10; a3 b8 d15; a7 b8 d15
SSC Algorithm with Out-of-Order Handling
Out-of-Order Handling Incorporated SSC:
Input: (1) sequence query "EVENT SEQ (E1, E2, …, Em) WITHIN W"; (2) AIS constructed from previously input events; (3) newly received event ei (of event type Ei)
Output: (1) updated AIS; (2) sequence output of SSC
1. IF event type Ei is among {E1, E2, …, Em}
2.   insert ei into stack Si (using "sort semantics")
3.   set ei's RIP
4.   check the RIP values of the instances in stack Si+1 and reset the ones affected by ei
5.   produce event sequences containing ei, if any
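The five steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: RIP is stored as an object reference, every stack accepts events from the start (all NFA states active), the WITHIN check is left to a separate WD step, and the names `Inst` and `OutOfOrderSSC` are hypothetical.

```python
import bisect
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Inst:
    etype: str
    ts: int
    rip: Optional["Inst"] = None  # most recent earlier instance in the previous stack

    @property
    def name(self):
        return f"{self.etype.lower()}{self.ts}"


class OutOfOrderSSC:
    def __init__(self, pattern):
        self.pattern = list(pattern)
        self.stacks = [[] for _ in self.pattern]  # each stack stays sorted by timestamp

    def receive(self, etype, ts):
        if etype not in self.pattern:                        # step 1
            return []
        i = self.pattern.index(etype)
        e = Inst(etype, ts)
        stack = self.stacks[i]
        stack.insert(bisect.bisect_right([x.ts for x in stack], ts), e)  # step 2
        if i > 0:                                            # step 3: set ei's RIP
            prev = self.stacks[i - 1]
            pos = bisect.bisect_left([x.ts for x in prev], ts)
            e.rip = prev[pos - 1] if pos > 0 else None
        if i + 1 < len(self.pattern):                        # step 4: reset affected RIPs
            for nxt in self.stacks[i + 1]:
                if nxt.ts > ts and (nxt.rip is None or nxt.rip.ts < ts):
                    nxt.rip = e
        return [pre[:-1] + suf                               # step 5
                for pre in self._prefixes(i, e)
                for suf in self._suffixes(i, e)]

    def _prefixes(self, i, e):
        """Sequences over stacks 0..i that end at e, found along RIP pointers."""
        if i == 0:
            return [(e,)]
        out = []
        for prev in self.stacks[i - 1]:
            if e.rip is None or prev.ts > e.rip.ts:
                break
            out.extend(pre + (e,) for pre in self._prefixes(i - 1, prev))
        return out

    def _suffixes(self, i, e):
        """Sequences over stacks i..m that start at e with increasing timestamps."""
        if i == len(self.pattern) - 1:
            return [(e,)]
        out = []
        for nxt in self.stacks[i + 1]:
            if nxt.ts > e.ts:
                out.extend((e,) + suf for suf in self._suffixes(i + 1, nxt))
        return out
```

Feeding the in-order part of the sort-semantics example and then the late b8 (`ssc.receive("B", 8)`) places b8 between b6 and b11, resets the RIP of d10 to b8, and returns the four sequences that contain b8.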
Optimization
Out-of-Order Handling Incorporated SSC with AIS_CLOCK:
Input and output: Same as Algorithm 1
1. IF event type Ei is among {E1, E2, …, Em}
2.   IF ei.timestamp < AIS_CLOCK
3.     buffer ei
4.     insert ei into stack Si (using "sort semantics")
5.     set ei's RIP
6.     check the RIP values of the instances in stack Si+1 and reset the ones affected
7.     produce event sequences containing ei, if any
8.   ELSE
9.     buffer ei
10.    insert ei into stack Si (using "append semantics")
11.    set ei's RIP
12.    IF Ei = Em
13.      produce event sequences containing ei, if any
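The branch on AIS_CLOCK can be illustrated with a small Python sketch. The slides do not define AIS_CLOCK precisely; the sketch assumes it is the largest timestamp inserted into the AIS so far. Stacks hold bare timestamps, buffering and RIP handling are omitted, and `insert_with_clock` is a hypothetical name.

```python
import bisect


def insert_with_clock(stacks, ais_clock, i, ts):
    """Insert timestamp ts into stacks[i]; return (new AIS_CLOCK, mode used)."""
    if ts < ais_clock:
        # Out-of-order event: pay for a sorted insertion.
        bisect.insort(stacks[i], ts)
        return ais_clock, "sort"
    # In-order event: every stored timestamp is <= AIS_CLOCK <= ts,
    # so a plain append keeps the stack sorted.
    stacks[i].append(ts)
    return ts, "append"


stacks = [[], [], []]
ais_clock = float("-inf")
modes = []
for i, ts in [(0, 3), (1, 6), (0, 7), (2, 10), (1, 8)]:
    ais_clock, mode = insert_with_clock(stacks, ais_clock, i, ts)
    modes.append(mode)
```

Only the last event (timestamp 8, arriving after 10) takes the sorted path; the other four are appended.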
Using K-Slack
We apply K-Slack based on time units. It assumes that the disorder in event arrivals is bounded by K time units; that is, an event can be delayed by at most K time units.
Solution for PSSC
SEQ(A, B, D), W = 10, K = 4
Received order (type and timestamp): a3 c5 b6 a7 d10 f12 c13 d15 f16 f18, with b1 and b11 drawn separately; d8 is marked as an out-of-order event arrival
AIS stacks, [RIP] instance:
S1: [] a3, [] a7
S2: [a3] b6, [a7] b8, [a7] b11
S3: [b8] d10, [b11] d15
Sequence shown: a3 b6 d8
Purge when f18 is met: 18 > 3 + 10 + 4
Purge condition: ei.timestamp + W + K < CLOCK. After waiting for K time units, no out-of-order event with a timestamp less than ei.timestamp + W can arrive, so ei can no longer contribute to forming a new candidate event sequence.
CLOCK: the largest timestamp seen so far among the received events.
PSSC Algorithm With Out-of-Order Handling
Out-of-Order Incorporated SSC Purge (PSSC):
Input: (1) current AIS; (2) CLOCK trigger from SSC
Output: updated AIS
1. On receiving a CLOCK trigger
2.   for each event instance e in AIS
3.     IF e.timestamp + W + K < CLOCK
4.       purge e
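The purge rule can be checked against the W = 10, K = 4 example with a short Python sketch. Stacks are reduced to lists of timestamps, and the function names `advance_clock` and `purge` are hypothetical.

```python
def advance_clock(clock, ts):
    """CLOCK is the largest timestamp seen so far among received events."""
    return max(clock, ts)


def purge(stacks, clock, window, k):
    """Drop every instance e with e.timestamp + W + K < CLOCK."""
    return [[ts for ts in stack if ts + window + k >= clock] for stack in stacks]


ais = [[3, 7], [6, 8, 11], [10, 15]]   # timestamps in S1, S2, S3
clock = advance_clock(16, 18)          # f18 arrives after f16
ais_after = purge(ais, clock, window=10, k=4)
```

With CLOCK = 18, only a3 satisfies 3 + 10 + 4 < 18 and is purged; at CLOCK = 16 nothing would be removed yet.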
Optimization 1: AIS partition
SEQ(A, B, D), W = 7, K = 10 (large)
Received order (type and timestamp): a3 c4 b5 a7 d10 f12 c13 f15 f18, with b11 drawn separately; b1 and d18 also appear in the figure
AIS stacks, [RIP] instance, each split by a divider:
S1: [] a3, [] a7
S2: [] b1, [a3] b5, [a7] b11
S3: [b5] d10, [b11] d18
We can divide each stack in AIS into two parts:
outdated event instances (e.timestamp + W + K > CLOCK)
up-to-date event instances (e.timestamp + W > CLOCK)
SSC output when d13 comes: a3 b5 d18; a3 b5 d18; a3 b11 d18; a7 b11 d18; … (Cost!)
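One way to locate the divider in a timestamp-sorted stack is a binary search, sketched below in Python. The sketch reads "outdated" as instances that survive the purge but have e.timestamp + W <= CLOCK, which is an interpretation of the two conditions above rather than something the slides state outright; `split_stack` is a hypothetical name.

```python
import bisect


def split_stack(timestamps, clock, window):
    """Split a sorted stack into (outdated, up_to_date) parts.

    Up-to-date instances satisfy ts + window > clock, i.e. ts > clock - window.
    """
    divider = bisect.bisect_right(timestamps, clock - window)
    return timestamps[:divider], timestamps[divider:]


# Stacks from the W = 7 example, evaluated at CLOCK = 18.
s2_outdated, s2_current = split_stack([1, 5, 11], clock=18, window=7)
s3_outdated, s3_current = split_stack([10, 18], clock=18, window=7)
```

Under this reading, an event that arrives in order only needs to be combined with the up-to-date part of each stack, while the outdated part is kept for late arrivals.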
Optimization 2: Lazy Purge
For each CLOCK update, only the instances in the last AIS stack are checked for purging. When an instance is purged from the last stack, instances in the other AIS stacks can be purged by following its RIP path.
AIS stacks, [RIP] instance:
S1: [] a3, [] a7
S2: [a3] b6, [a7] b11
S3: [b6] d10, [b11] d15
Experiment 1: Sequence Scan and Construction (SSC)
CPU gain from applying AIS_CLOCK
Query: SEQ (A, B, C, D, E, F)
Out-of-order data percentage is 90%
Y axis (cost): inserting events and resetting RIP
Experiment 2: Applying AIS partition during the SSC purge
Performance gain on memory
Performance gain on CPU cost
Conclusion
In this work, we address the problem of processing event streams with out-of-order data arrival:
We analyze the problems that state-of-the-art event stream processing technology experiences when faced with out-of-order data arrival
We propose new implementation and optimization strategies for the core stream algebra operators
We conduct an experimental study that demonstrates the effectiveness of our proposed approach over existing solutions
Related Work
Some initial work uses K-slack to investigate the out-of-order problem for homogeneous-input stream systems
Aurora deals with out-of-order data at the operator level: order-sensitive operators wait a certain period of time before closing each window
The Cayuga system deals with out-of-order data by waiting K time units before any processing, which incurs higher latency than our approach
Stream punctuation confirms that a certain value or timestamp will no longer appear in future input streams. It requires a certain service to first be created and appropriately associated
Thank you!