
Xiaochen Zhu 1, Shaoxu Song 1, Jianmin Wang 1, Philip S. Yu 2, Jiaguang Sun 1 1 Tsinghua University,...

Date post: 28-Dec-2015
Matching Heterogeneous Events with Patterns. Xiaochen Zhu (1), Shaoxu Song (1), Jianmin Wang (1), Philip S. Yu (2), Jiaguang Sun (1). (1) Tsinghua University, China; (2) University of Illinois at Chicago, USA. ICDE 2014.

Matching Heterogeneous Events with Patterns

Xiaochen Zhu (1), Shaoxu Song (1), Jianmin Wang (1), Philip S. Yu (2), Jiaguang Sun (1)

(1) Tsinghua University, China

(2) University of Illinois at Chicago, USA

ICDE 2014

Outline

- Motivation
- Event Matching Framework
- A* Search Algorithm
  - Computing the Normal Distance G
  - Simple Upper Bound of H
  - Advanced Bounding Function
- Pay-As-You-Go Matching
- Experiments
- Conclusion

Information System and Event Log

Information systems play an important role in large enterprises:

- Enterprise Resource Planning (ERP)
- Office Automation (OA)

These systems record the business history in their event logs.

Traces recorded in an event log:

Trace ID  Trace     Trace ID  Trace
1         ABCDEF    6         ACBDEF
2         ACBDEF    7         ACBDFE
3         ACBDFE    8         ACBDFE
4         ABCDFE    9         ACBDFE
5         ACBDEF    10        ACBDFE

Events of trace 1 (ABCDEF):

Event ID  Trace ID  Event Name             Timestamp
1         1         Order Received (A)     04-22 13:33:34
2         1         Payment (B)            04-22 15:10:17
3         1         Check Inventory (C)    04-22 15:18:11
4         1         Ship Goods (D)         04-22 15:31:50
5         1         Record Order (E)       04-23 08:14:26
6         1         Send Notification (F)  04-23 08:17:18

Event Data Integration

- Complex event processing
- Provenance analysis
- Decision support

Exploring the correspondence among events.

[Figure: the information systems of the Beijing, Shanghai, and Guangzhou subsidiaries each produce event logs, which feed a business data warehouse.]

Heterogeneous Events

Different events may represent the same activity.

Event log with English names:

Event Name             Timestamp
Order Received (A)     04-22 13:33:34
Payment (B)            04-22 15:10:17
Check Inventory (C)    04-22 15:18:11
Ship Goods (D)         04-22 15:31:50
Record Order (E)       04-23 08:14:26
Send Notification (F)  04-23 08:17:18

Event log with abbreviations of the Chinese phonetic representation:

Event Name  Timestamp
JD (1)      03-18 09:12:07
YD (2)      03-18 09:27:14
TJD (3)     03-18 09:30:18
CK (5)      03-18 09:35:32
ZF (4)      03-18 09:50:12
FH (6)      03-18 10:30:47
DL (7)      03-18 12:31:12
FT (8)      03-18 12:40:40

Convert Event Log to Graph

- Text similarity fails → statistics and structural information
- Event log → event dependency graph (V, E, f)

Trace ID  Trace
1         ABCDEF
2         ACBDEF
3         ACBDFE
4         ABCDFE
5         ACBDEF
6         ACBDEF
7         ACBDFE
8         ACBDFE
9         ACBDFE
10        ACBDFE

[Figure: event dependency graph over the events A to F built from the traces above. Each vertex carries its frequency of appearance, for example f(A,A) = 1.0, and each edge carries the frequency of the two events occurring consecutively, for example f(A,C) = 0.8.]
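The conversion from an event log to a dependency graph can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `dependency_graph` is hypothetical, and normalizing by the number of traces is an assumption chosen because it reproduces the two labels that survive from the slide, f(A,A) = 1.0 and f(A,C) = 0.8.

```python
from collections import Counter

def dependency_graph(traces):
    """Return (vertex frequencies, edge frequencies) for a list of traces.

    Assumption: a frequency is the fraction of traces in which the event
    (vertex) or the consecutive event pair (edge) occurs at least once.
    """
    n = len(traces)
    vertex_count, edge_count = Counter(), Counter()
    for trace in traces:
        # Count each event and each consecutive pair once per trace.
        vertex_count.update(set(trace))
        edge_count.update(set(zip(trace, trace[1:])))
    vertices = {v: c / n for v, c in vertex_count.items()}
    edges = {e: c / n for e, c in edge_count.items()}
    return vertices, edges

# The ten traces listed on the slide.
traces = ["ABCDEF", "ACBDEF", "ACBDFE", "ABCDFE", "ACBDEF",
          "ACBDEF", "ACBDFE", "ACBDFE", "ACBDFE", "ACBDFE"]
vertices, edges = dependency_graph(traces)
print(vertices["A"], edges[("A", "C")])  # 1.0 0.8
```

Only frequencies enter the graph; event names are never compared, which is why the approach still works when the two logs use unrelated naming schemes.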

Graph-Based Matching Framework

- Event logs → dependency graphs
- Event matching → vertex mapping (injective mapping M : V1 → V2)

[Figure: Event Log 1 and Event Log 2 are converted to the dependency graphs G1 (events A, B, C) and G2 (events 1, 2, 3), followed by three candidate vertex mappings between G1 and G2.]

How to evaluate the best mapping?

Evaluation of Mapping

- Feature space: Vertex+Edge
- For the mapping M = {A→1, B→2, C→3}:
  - Vertex: A, B, C with corresponding elements A→1, B→2, C→3
  - Edge: (A,B), (A,C), (C,B) with corresponding elements (A,B)→(1,2), (A,C)→(1,3), (C,B)→(2,3)
- Similarity of corresponding elements: computed per pair, for example S(B→2) for a vertex and S((A,C)→(1,3)) for an edge.

[Figure: dependency graphs G1 (events A, B, C) and G2 (events 1, 2, 3), with the vertex pair B→2 and the edge pair (A,C)→(1,3) highlighted.]

Normal Distance

Normal Distance*: the summation of the similarities of corresponding elements. Higher is better.

* J. Kang and J. F. Naughton. On schema matching with opaque column names and data values. In SIGMOD Conference, pages 205–216, 2003.
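The normal distance can be sketched as a sum over mapped vertices and edges. The similarity formula did not survive in this transcript, so the sketch assumes the form 1 - |a - b| / (a + b) for two frequencies a and b; treat that as an assumption rather than the authors' definition. The function names and the two tiny graphs are hypothetical.

```python
def sim(a, b):
    """Similarity of two frequencies (assumed form: 1 - |a - b| / (a + b))."""
    return 1.0 - abs(a - b) / (a + b) if a + b > 0 else 0.0

def normal_distance(mapping, v1, e1, v2, e2):
    """Sum the similarities of corresponding vertices and edges under mapping."""
    total = 0.0
    for u, x in mapping.items():
        total += sim(v1[u], v2[x])
    for (u, w), f in e1.items():
        if u in mapping and w in mapping:
            image = (mapping[u], mapping[w])
            if image in e2:  # an edge without a counterpart contributes nothing
                total += sim(f, e2[image])
    return total

# Hypothetical frequencies, chosen only to make the arithmetic easy to follow.
v1, e1 = {"A": 1.0, "B": 0.5}, {("A", "B"): 0.5}
v2, e2 = {1: 1.0, 2: 0.5}, {(1, 2): 0.5}

good = normal_distance({"A": 1, "B": 2}, v1, e1, v2, e2)
swapped = normal_distance({"A": 2, "B": 1}, v1, e1, v2, e2)
print(good, swapped)
```

The mapping that aligns equal frequencies scores 3.0, one point per matched element, while the swapped mapping loses the edge entirely and scores lower on both vertices.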

Event Matching Problem

Problem: Given two event logs, the event matching problem is to find an event mapping M that maximizes the normal distance D_N(M).

Candidate mappings in the example:

- {A→1, B→2, C→3}
- {A→3, B→2, C→1}

[Figure: dependency graphs G1 (events A, B, C) and G2 (events 1, 2, 3) with the two candidate mappings drawn between them.]

Vertex+Edge, Not Enough

- M = {A→6, B→2, C→1, D→3, E→4, F→5}: D_N(M) = 14.00
- M_truth = {A→3, B→4, C→5, D→6, E→7, F→8}: D_N(M_truth) = 13.91

The wrong mapping scores higher than the true one. Fail!

Vertex+Edge is not discriminative enough.

[Figure: dependency graphs G1 (events A to F) and G2 (events 1 to 8) with both mappings drawn between them.]

More Feature: Event Patterns

Event pattern: particular orders of event occurrence.

Pattern                Frequency
B                      1.0
SEQ(D,E)               0.4
AND(B,C)               1.0
SEQ(A,AND(B,C),D)      1.0

Trace ID  Trace
1         ABCDEF
2         ACBDEF
3         ACBDFE
4         ABCDFE
5         ACBDEF
6         ACBDEF
7         ACBDFE
8         ACBDFE
9         ACBDFE
10        ACBDFE

[Figure: traces are marked "match" or "not match" against a pattern.]
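The pattern frequencies above can be checked with a short sketch. It assumes that SEQ means its parts occur consecutively in the given order, that AND means its parts occur consecutively in any order, and that a pattern's frequency is the fraction of traces containing a match. The nested-tuple encoding and the function names are hypothetical, and events are assumed to be single characters.

```python
from itertools import permutations, product

def expand(pattern):
    """Return the set of event strings that a pattern allows."""
    if isinstance(pattern, str):
        return {pattern}
    op, *parts = pattern
    langs = [expand(p) for p in parts]
    # SEQ keeps the given order; AND allows every order of its parts.
    orders = [langs] if op == "SEQ" else permutations(langs)
    return {"".join(combo) for order in orders for combo in product(*order)}

def frequency(pattern, traces):
    """Fraction of traces that contain at least one allowed string."""
    allowed = expand(pattern)
    hits = sum(any(s in trace for s in allowed) for trace in traces)
    return hits / len(traces)

traces = ["ABCDEF", "ACBDEF", "ACBDFE", "ABCDFE", "ACBDEF",
          "ACBDEF", "ACBDFE", "ACBDFE", "ACBDFE", "ACBDFE"]

patterns = ["B",
            ("SEQ", "D", "E"),
            ("AND", "B", "C"),
            ("SEQ", "A", ("AND", "B", "C"), "D")]
freqs = [frequency(p, traces) for p in patterns]
print(freqs)  # [1.0, 0.4, 1.0, 1.0]
```

Under these assumptions the sketch reproduces the four frequencies listed for the slide's patterns.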

Pattern Normal Distance

- Given an event matching M and a set of patterns, the pattern normal distance sums the similarity between each pattern and its image under M.
- Vertices and edges can also be seen as patterns.
- Pattern Normal Distance is compatible with Normal Distance.

Matching Events with Patterns

Patterns:

- Vertex patterns: A, B, C, D, E, F
- Edge patterns: SEQ(A,B), SEQ(A,C), SEQ(B,C), SEQ(C,B), SEQ(B,D), SEQ(C,D), SEQ(D,E), SEQ(D,F), SEQ(E,F), SEQ(F,E)
- Complex pattern: SEQ(A, AND(B, C), D)

Under the true mapping, SEQ(A, AND(B, C), D) → SEQ(3, AND(4, 5), 6).

Scores with the complex pattern included:

- M = {A→6, B→2, C→1, D→3, E→4, F→5}: 14.00
- M_truth = {A→3, B→4, C→5, D→6, E→7, F→8}: 14.91

[Figure: dependency graphs G1 (events A to F) and G2 (events 1 to 8) with both mappings drawn between them.]

Hardness of Matching Events

- Large number of possible mappings.
- A survey of a real Chinese bus manufacturer: the average number of distinct events is 18, and the number of all possible event mappings grows factorially with the number of events.

The key issue is efficiency.
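To get a feel for the size of the search space, the following sketch counts injective mappings. The slide only gives the average number of distinct events (18) and its own figure did not survive, so taking 18 events on both sides is an assumption made purely for illustration; `count_mappings` is a hypothetical helper.

```python
import math

def count_mappings(n1, n2):
    """Number of injective mappings from n1 events into n2 events (n1 <= n2)."""
    return math.perm(n2, n1)

# Assumption: both logs have 18 distinct events, so the count is 18!.
total = count_mappings(18, 18)
print(total)  # 6402373705728000
```

Enumerating a space of this size is not practical, which motivates a guided search with pruning.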


A* Search Algorithm

- Input: two dependency graphs, pre-defined patterns
- Output: a vertex mapping M with the maximum pattern normal distance
- Process: growth of an A* tree
- Tree node: a partial mapping M with the unmapped events U1 and U2, for example the root M: {}, U1: {A,B,C,D}, U2: {1,2,3,4}
- Two scores g and h:
  - g: score of the current partial mapping (exact)
  - h: score of the remaining part (upper bound)
- Heuristic: always visit the tree node with the highest g+h

Growth of A* Search Tree

Node     M                     U1          U2          g    h    g+h
Root     {}                    {A,B,C,D}   {1,2,3,4}
node 1   {A→1}                 {B,C,D}     {2,3,4}     0.8  3.0  3.8
node 2   {A→2}                 {B,C,D}     {1,3,4}     1.0  3.0  4.0
node 3   {A→3}                 {B,C,D}     {1,2,4}     0.7  3.0  3.7
node 4   {A→4}                 {B,C,D}     {1,2,3}     0.5  3.0  3.5
node 5   {A→2,C→1}             {B,D}       {3,4}       1.8  2.0  3.8
node 6   {A→2,C→3}             {B,D}       {1,4}       2.0  2.0  4.0
node 7   {A→2,C→4}             {B,D}       {1,3}       1.2  2.0  3.2
node 10  {A→2,C→3,B→4,D→1}     {}          {}          4.0  0.0  4.0

- The root expands A over the candidates 1, 2, 3, 4; node 2 expands C over the candidates 1, 3, 4.
- g: current (exact); h: remaining (upper bound)
- Terminate when U1 or U2 is empty.
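The growth of the search tree can be sketched as a best-first search over partial mappings. This is a simplified illustration under several assumptions: only vertex and edge patterns are used, the similarity has the assumed form 1 - |a - b| / (a + b), h is the simple bound of 1.0 per pattern not yet fully mapped, events of the first graph are expanded in sorted order, and the first graph has no more events than the second. All names and the toy frequencies are hypothetical.

```python
import heapq
from itertools import count

def sim(a, b):
    """Assumed similarity of two frequencies."""
    return 1.0 - abs(a - b) / (a + b) if a + b > 0 else 0.0

def score(mapping, v1, e1, v2, e2):
    """g: exact score of the patterns whose events are all mapped."""
    g = sum(sim(v1[u], v2[x]) for u, x in mapping.items())
    for (u, w), f in e1.items():
        if u in mapping and w in mapping:
            g += sim(f, e2.get((mapping[u], mapping[w]), 0.0))
    return g

def bound(mapping, v1, e1):
    """h: simple upper bound, 1.0 for every pattern not yet fully mapped."""
    open_vertices = sum(u not in mapping for u in v1)
    open_edges = sum(u not in mapping or w not in mapping for u, w in e1)
    return float(open_vertices + open_edges)

def a_star_match(v1, e1, v2, e2):
    if len(v1) > len(v2):
        raise ValueError("sketch assumes |V1| <= |V2|")
    order = sorted(v1)   # fixed expansion order over the events of G1
    tie = count()        # tie-breaker so the heap never compares dicts
    heap = [(-bound({}, v1, e1), next(tie), {})]
    while heap:
        neg_f, _, mapping = heapq.heappop(heap)  # node with the highest g+h
        if len(mapping) == len(order):           # U1 is empty: h = 0, f = g
            return mapping, -neg_f
        u = order[len(mapping)]
        used = set(mapping.values())
        for x in v2:
            if x in used:
                continue
            child = {**mapping, u: x}
            f = score(child, v1, e1, v2, e2) + bound(child, v1, e1)
            heapq.heappush(heap, (-f, next(tie), child))
    return {}, 0.0

# Hypothetical toy graphs.
v1, e1 = {"A": 1.0, "B": 0.5}, {("A", "B"): 0.5}
v2, e2 = {1: 0.5, 2: 1.0}, {(2, 1): 0.5}
best_mapping, best_score = a_star_match(v1, e1, v2, e2)
print(best_mapping, best_score)
```

Because h never underestimates the score still obtainable, the first complete mapping taken from the queue is a best one; a tighter h prunes more nodes without changing the answer.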

Incremental Computing of G

Patterns: A, B, C, D, SEQ(A,B), SEQ(B,C), SEQ(C,B), SEQ(C,D), SEQ(A,B,C), SEQ(B,C,D)

- Parent node M1: {A→1,B→2}, U1: {C,D}, U2: {3,4}
- Child node M2: {A→1,B→2,C→3}, U1: {D}, U2: {4}

1. Newly introduced patterns: C, SEQ(B,C), SEQ(A,B,C), SEQ(C,B)
2. Prune unmapped patterns: SEQ(C,B)
3. Compute similarities: C, SEQ(B,C), SEQ(A,B,C) against 3, SEQ(2,3), SEQ(1,2,3)

g of the parent + these similarities = g of the child

[Figure: G1 with events A, B, C, D and G2 with events 1, 2, 3, 4.]
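The three steps can be sketched as a function that derives the child's g from the parent's g. The sketch encodes each SEQ pattern (and each vertex) as a tuple of events, which is a hypothetical encoding; the frequencies and the pattern set of the second log are made up for illustration, with SEQ(3,2) deliberately absent so that SEQ(C,B) is pruned as on the slide. The similarity form is again an assumption.

```python
def sim(a, b):
    """Assumed similarity of two frequencies: 1 - |a - b| / (a + b)."""
    return 1.0 - abs(a - b) / (a + b) if a + b > 0 else 0.0

def child_g(parent_g, parent_map, event, target, freq1, freq2):
    """Extend parent_map with event -> target and update g incrementally."""
    child_map = {**parent_map, event: target}
    # 1. Newly introduced patterns: contain the new event, all events mapped.
    new = [p for p in freq1 if event in p and all(e in child_map for e in p)]
    g, kept = parent_g, []
    for p in new:
        image = tuple(child_map[e] for e in p)
        # 2. Prune patterns whose image does not occur in the second log.
        if image not in freq2:
            continue
        # 3. Add the similarity between the pattern and its image.
        g += sim(freq1[p], freq2[image])
        kept.append(p)
    return child_map, g, kept

# Hypothetical frequencies for the slide's pattern set.
freq1 = {("A",): 1.0, ("B",): 1.0, ("C",): 1.0, ("D",): 1.0,
         ("A", "B"): 1.0, ("B", "C"): 0.5, ("C", "B"): 0.5, ("C", "D"): 1.0,
         ("A", "B", "C"): 0.5, ("B", "C", "D"): 0.5}
freq2 = {(1,): 1.0, (2,): 1.0, (3,): 1.0, (4,): 1.0,
         (1, 2): 1.0, (2, 3): 0.5, (3, 4): 1.0,
         (1, 2, 3): 0.5, (2, 3, 4): 0.5}

# The parent {A→1, B→2} has g = 3.0 here (A, B and SEQ(A,B) each score 1.0).
child_map, g, kept = child_g(3.0, {"A": 1, "B": 2}, "C", 3, freq1, freq2)
print(kept, g)
```

Only the patterns touched by the newly mapped event are evaluated, so the cost of creating a child node does not grow with the patterns already accounted for in the parent.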

Estimating Upper Bound of H

Patterns: A, B, C, D, SEQ(A,B), SEQ(B,C), SEQ(C,B), SEQ(C,D), SEQ(A,B,C), SEQ(B,C,D)

Simple Bounding Function

- We assume each remaining pattern has a matching pattern with similarity 1.0.
- Example: for the node M: {A→1,B→2,C→3}, U1: {D}, U2: {4}, the remaining patterns are D, SEQ(C,D), SEQ(B,C,D), so h = 3.

Advanced Bounding Function

- Motivation: estimation needs speed.
- Find the best match for each remaining pattern? Compute it online?

[Figure: G1 with events A, B, C, D and G2 with events 1, 2, 3, 4.]
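The simple bounding function amounts to counting the patterns that still contain an unmapped event. A minimal sketch, with patterns encoded as tuples of events (a hypothetical encoding) and `simple_bound` as a hypothetical name:

```python
def simple_bound(patterns, mapping):
    """Return the remaining patterns and h = 1.0 per remaining pattern."""
    remaining = [p for p in patterns if any(e not in mapping for e in p)]
    return remaining, float(len(remaining))

# Pattern set and node from the slide.
patterns = [("A",), ("B",), ("C",), ("D",),
            ("A", "B"), ("B", "C"), ("C", "B"), ("C", "D"),
            ("A", "B", "C"), ("B", "C", "D")]
mapping = {"A": 1, "B": 2, "C": 3}

remaining, h = simple_bound(patterns, mapping)
print(remaining, h)  # [('D',), ('C', 'D'), ('B', 'C', 'D')] 3.0
```

The bound is cheap to compute but loose, since it credits every remaining pattern with a perfect match regardless of the frequencies left in the second graph.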

Advanced Bounding Function

Use other frequencies in place of the unknown frequency of the matching pattern:

- Highest vertex frequency
- Highest edge frequency

An upper bound is given for each case of pattern:

- a general pattern
- a simple pattern SEQ( , ... , )
- a simple pattern AND( , ... , )
- a complex pattern


Pay-As-You-Go Matching

Motivation:

- Interesting event patterns are gradually identified.
- The best matching may change.

Two heuristic strategies:

- Continue
- Restart

[Figure: the A* search tree from the earlier example (root, the nodes {A→1} to {A→4}, the nodes {A→2,C→1}, {A→2,C→3}, {A→2,C→4}, and the answer {A→2,C→3,B→4,D→1}), annotated "Materialize leaf nodes" and "Materialize previous answer for pruning".]


Experiment Setting

- Real-life data set: collected from the bus manufacturer. The true mapping is generated manually by domain experts.
- Criteria: the accuracy of event matching is evaluated by the F-measure of precision and recall.
- Baselines: Opaque matching (1), Iterative matching (2).

No. of Event Logs  38      Min Event Size  2
No. of Traces      3000    Max Event Size  11

1. J. Kang and J. F. Naughton. On schema matching with opaque column names and data values. In SIGMOD Conference, pages 205–216, 2003.
2. S. Nejati, M. Sabetzadeh, M. Chechik, S. M. Easterbrook, and P. Zave. Matching and merging of statecharts specifications. In ICSE, pages 54–64, 2007.

Effectiveness and Efficiency

[Figure: four plots of effectiveness and efficiency, each with the curve "Our Approach" labeled.]

Performance on pay-as-you-go

- More patterns, higher accuracy.
- Pay-as-you-go strategies accelerate the re-computation of a new event matching.

Conclusion

- Pattern-based generic framework:
  - (Vertex+Edge+Complex) patterns
  - Compatible with existing methods
- An advanced bounding function.
- Support for matching in a pay-as-you-go style.

Q & A

Thanks!