Matching Heterogeneous Event Data

Xiaochen Zhu 1, Shaoxu Song 1, Xiang Lian 2, Jianmin Wang 1, Lei Zou 3

1 Tsinghua University, China
2 University of Texas - Pan American, USA
3 Peking University, China

SIGMOD 2014

Outline

Motivation
Event Matching Similarity
  Structural Similarity Function
  Iterative Computation
  Estimation
Matching Composite Events
Experiments
Conclusion

Information System and Event Log

Information systems play an important role in large enterprises:

Enterprise Resource Planning (ERP)
Office Automation (OA)

These systems record the business history in their event logs.


Trace ID  Trace    Trace ID  Trace
1         ACDEF    6         BCDEF
2         BCDFE    7         BCDFE
3         ACDFE    8         BCDEF
4         ACDFE    9         BCDFE
5         ACDEF    10        BCDFE

Event ID  Trace ID  Event Name           Timestamp
1         1         Pay by Cash (A)      04-22 13:33:34
2         1         Check Inventory (C)  04-22 15:18:11
3         1         Validate (D)         04-22 15:31:50
4         1         Ship Goods (E)       04-23 08:14:26
5         1         Email Customer (F)   04-23 08:17:18
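The relationship between the two tables above can be sketched in Python: event records that share a trace ID, ordered by timestamp, form one trace. The tuple layout and the name `traces_from_events` are assumptions made for this illustration; the slides do not show how the log is actually stored.

```python
from collections import defaultdict

# Event records copied from the table above: (event_id, trace_id, label, timestamp).
# The tuple layout is an assumption made for this sketch.
EVENTS = [
    (1, 1, "A", "04-22 13:33:34"),
    (2, 1, "C", "04-22 15:18:11"),
    (3, 1, "D", "04-22 15:31:50"),
    (4, 1, "E", "04-23 08:14:26"),
    (5, 1, "F", "04-23 08:17:18"),
]


def traces_from_events(events):
    """Group events by trace ID and order each group by timestamp."""
    grouped = defaultdict(list)
    for _event_id, trace_id, label, timestamp in events:
        grouped[trace_id].append((timestamp, label))
    # "MM-DD HH:MM:SS" strings sort chronologically within a single year.
    return {
        trace_id: "".join(label for _, label in sorted(rows))
        for trace_id, rows in grouped.items()
    }


TRACES = traces_from_events(EVENTS)
```

Here `TRACES` maps trace ID 1 to the string of event labels in the order they occurred.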

Event Data Integration

Complex event processing
Provenance analysis
Decision support

[Figure: a Business Data Warehouse integrating the event logs produced by the information systems of the Beijing, Shanghai, and Hong Kong subsidiaries]

Exploring the correspondence among events

Heterogeneous Events

Different events may represent the same activity

ID  Trace
t1  Pay by Cash (A), Check Inventory (C), Validate (D), Ship Goods (E), Email Customer (F)
t2  Pay by Credit Card (B), Check Inventory (C), Validate (D), Email Customer (F), Ship Goods (E)
…   …

ID  Trace
s1  Order Accepted (1), Pay by Cash (2), Inventory Checking & Validation (4), ????????? (5), Send Notification (6)
s2  Order Accepted (1), Pay by Credit Card (3), Inventory Checking & Validation (4), Send Notification (6), ???????? (5)
…   …

Linguistic Matching
Dislocated Matching
Semantic Matching
Opaque Matching
Composite Events Matching

Convert Event Log to Graph

Text similarity fails
Statistics and structural information
Event Log → Event Dependency Graph (V, E, f)

Trace ID  Trace
1         ACDEF
2         BCDFE
3         ACDFE
4         ACDFE
5         ACDEF
6         BCDEF
7         BCDFE
8         BCDEF
9         BCDFE
10        BCDFE

[Figure: event dependency graph over events A to F, built from the ten traces. Vertex labels give the frequency of appearance, e.g. f(A) = 0.4; edge labels give the frequency of consecutive events, e.g. f(B,C) = 0.6.]
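The conversion from log to dependency graph can be sketched as follows. This is a minimal illustration, assuming that both frequencies are normalized by the number of traces and that an event or consecutive pair is counted at most once per trace; the slides give only the two example values f(A) = 0.4 and f(B,C) = 0.6, which this reading reproduces. The function name `dependency_graph` is hypothetical.

```python
from collections import Counter

# The ten example traces from the slide.
TRACES = ["ACDEF", "BCDFE", "ACDFE", "ACDFE", "ACDEF",
          "BCDEF", "BCDFE", "BCDEF", "BCDFE", "BCDFE"]


def dependency_graph(traces):
    """Return (vertex_freq, edge_freq) for a list of traces.

    Assumption: a frequency is the fraction of traces in which the event
    (or the consecutive pair of events) occurs at least once.
    """
    n = len(traces)
    vertex_count = Counter()
    edge_count = Counter()
    for trace in traces:
        vertex_count.update(set(trace))                 # events in this trace
        edge_count.update(set(zip(trace, trace[1:])))   # consecutive pairs
    vertex_freq = {v: c / n for v, c in vertex_count.items()}
    edge_freq = {e: c / n for e, c in edge_count.items()}
    return vertex_freq, edge_freq


VERTEX_FREQ, EDGE_FREQ = dependency_graph(TRACES)
```

With the example log, `VERTEX_FREQ["A"]` comes out at 0.4 and `EDGE_FREQ[("B", "C")]` at 0.6, matching the two labels called out on the slide.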


Related Work

Matching challenges compared: Linguistic Matching, Semantic Matching, Opaque Matching, Dislocated Matching, Composite Events

Approaches compared: Graph Edit Distance [1], Opaque Schema Matching [2], Behavioral Matching [3], Event Matching Similarity

1. R. M. Dijkman, M. Dumas, and L. García-Bañuelos. Graph matching algorithms for business process model similarity search. In BPM, pages 48–63, 2009.
2. J. Kang and J. F. Naughton. On schema matching with opaque column names and data values. In SIGMOD Conference, pages 205–216, 2003.
3. S. Nejati, M. Sabetzadeh, M. Chechik, S. M. Easterbrook, and P. Zave. Matching and merging of statecharts specifications. In ICSE, pages 54–64, 2007.

Event Matching Framework

[Figure: framework pipeline. Event Logs → Dependency Graphs → Event Matching Similarities → Correspondences, with a Composite Event Matching step. The first example log (trace 1: ACDEF) yields a dependency graph over events A to F; the second (trace 1: 12456) yields a dependency graph over events 1 to 6.]

Event matching similarities:

     1     2     3     4     5     6
A    0.23  0.80  0.52  0.20  0.15  0.19
B    0.38  0.53  0.76  0.24  0.20  0.23
C    0.30  0.16  0.20  0.61  0.20  0.22
D    0.34  0.15  0.20  0.37  0.24  0.25
E    0.27  0.21  0.19  0.18  0.28  0.20
F    0.30  0.19  0.23  0.23  0.20  0.72

Correspondences: A→2, B→3, C→4, D→1, E→5, F→6

Correspondences with composite event matching: A→2, B→3, {C,D}→4, E→5, F→6
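One way to read correspondences off the similarity matrix is to pick the one-to-one assignment with the highest total similarity. The sketch below does this by exhaustive search, which is an assumption made for illustration and only practical for small event sets; the slides do not state how the correspondences are selected. The name `best_one_to_one` is hypothetical.

```python
from itertools import permutations

ROWS = "ABCDEF"
COLS = [1, 2, 3, 4, 5, 6]
# Similarity matrix copied from the slide (rows A..F, columns 1..6).
SIM = {
    "A": [0.23, 0.80, 0.52, 0.20, 0.15, 0.19],
    "B": [0.38, 0.53, 0.76, 0.24, 0.20, 0.23],
    "C": [0.30, 0.16, 0.20, 0.61, 0.20, 0.22],
    "D": [0.34, 0.15, 0.20, 0.37, 0.24, 0.25],
    "E": [0.27, 0.21, 0.19, 0.18, 0.28, 0.20],
    "F": [0.30, 0.19, 0.23, 0.23, 0.20, 0.72],
}


def best_one_to_one(sim, rows, cols):
    """Try every one-to-one assignment and keep the highest total similarity."""
    best_score, best_map = float("-inf"), None
    for perm in permutations(range(len(cols))):
        score = sum(sim[r][j] for r, j in zip(rows, perm))
        if score > best_score:
            best_score = score
            best_map = {r: cols[j] for r, j in zip(rows, perm)}
    return best_map, best_score


MATCHING, SCORE = best_one_to_one(SIM, ROWS, COLS)
```

On this matrix the search returns A→2, B→3, C→4, D→1, E→5, F→6, the same singleton correspondences listed on the slide.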


Outline

Motivation
Event Matching Similarity
  Intuition
  Iterative Computation
  Estimation
Matching Composite Events
Experiments
Conclusion

An Intuition from SimRank*

Intuition for evaluating the similarity of two events v1 and v2:

1. S(v1, v2) = 1, if both v1 and v2 have no input neighbor;
2. v1 is similar to v2, if they frequently share similar input neighbors.


* G. Jeh and J. Widom. Simrank: a measure of structural-context similarity. In KDD, pages 538–543, 2002.
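For reference, the recurrence of the original SimRank measure from the cited paper is given below, where I(a) is the set of input neighbors of node a and C is a decay constant in (0, 1). The slides adapt the idea to pairs of events taken from two different graphs; the adapted formula itself is not shown in this transcript, so this block should be read as background rather than as the authors' similarity function.

```latex
s(a, a) = 1, \qquad
s(a, b) = \frac{C}{|I(a)|\,|I(b)|}
          \sum_{i=1}^{|I(a)|} \sum_{j=1}^{|I(b)|}
          s\bigl(I_i(a),\, I_j(b)\bigr) \quad (a \neq b), \qquad
s(a, b) = 0 \ \text{if } I(a) = \emptyset \text{ or } I(b) = \emptyset .
```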

[Figure: the two example dependency graphs, over events A to F and events 1 to 6]

Problem: cannot deal with dislocated matching

Handle the Dislocated Matching

Introduce an artificial event v^X into each graph (v1^X and v2^X):

1. S(v1^X, v2^X) = 1;
2. v1 is similar to v2, if they frequently share similar input neighbors.

[Figure: the two dependency graphs, each extended with an artificial event (v1^X and v2^X)]

Iterative Computation

[Figure: the two dependency graphs with artificial events v1^X and v2^X]

I = 0

       v2^X  1     2     3     4     5     6
v1^X   1.00  0     0     0     0     0     0
A      0     0     0     0     0     0     0
B      0     0     0     0     0     0     0
C      0     0     0     0     0     0     0
D      0     0     0     0     0     0     0
E      0     0     0     0     0     0     0
F      0     0     0     0     0     0     0

I = 1

       v2^X  1     2     3     4     5     6
v1^X   1.00  0     0     0     0     0     0
A      0     0.23  0.80  0.52  0.20  0.15  0.19
B      0     0.38  0.53  0.76  0.24  0.20  0.23
C      0     0.30  0.10  0.13  0.40  0.13  0.17
D      0     0.34  0.11  0.15  0.34  0.17  0.17
E      0     0.27  0.14  0.13  0.13  0.13  0.13
F      0     0.30  0.13  0.15  0.18  0.13  0.63

I = 2

       v2^X  1     2     3     4     5     6
v1^X   1.00  0     0     0     0     0     0
A      0     0.23  0.80  0.52  0.20  0.15  0.19
B      0     0.38  0.53  0.76  0.24  0.20  0.23
C      0     0.30  0.16  0.20  0.61  0.19  0.22
D      0     0.34  0.15  0.20  0.36  0.21  0.22
E      0     0.27  0.21  0.19  0.17  0.26  0.19
F      0     0.30  0.19  0.23  0.22  0.19  0.70

I = 20

       v2^X  1     2     3     4     5     6
v1^X   1.00  0     0     0     0     0     0
A      0     0.23  0.80  0.52  0.20  0.15  0.19
B      0     0.38  0.53  0.76  0.24  0.20  0.23
C      0     0.30  0.16  0.20  0.61  0.20  0.22
D      0     0.34  0.15  0.20  0.37  0.24  0.25
E      0     0.27  0.21  0.19  0.18  0.28  0.20
F      0     0.30  0.19  0.23  0.23  0.20  0.72
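The shape of the iteration can be sketched with a plain SimRank-style update over two graphs. This is not the authors' similarity function: it ignores the vertex and edge frequencies, so it will not reproduce the numbers in the tables above. It also assumes that the artificial event precedes every event in its graph, and the decay constant `c = 0.8` and all function names are hypothetical. What it does show is the initialization (only the pair of artificial events starts at 1) and the round-by-round propagation from input neighbors.

```python
def with_artificial_event(nodes, edges, x="X"):
    """Predecessor lists after adding artificial event `x` before every node.

    Assumption of this sketch: `x` is an input neighbor of every event.
    """
    preds = {v: [x] for v in nodes}
    preds[x] = []
    for u, v in edges:
        preds[v].append(u)
    return preds


def iterate_similarity(preds1, preds2, rounds, c=0.8, x="X"):
    """SimRank-style averaging over input neighbors (frequencies ignored)."""
    sim = {(a, b): 0.0 for a in preds1 for b in preds2}
    sim[(x, x)] = 1.0  # rule 1: the two artificial events are fully similar
    for _ in range(rounds):
        new = {}
        for a, pa in preds1.items():
            for b, pb in preds2.items():
                if a == x and b == x:
                    new[(a, b)] = 1.0
                elif not pa or not pb:
                    new[(a, b)] = 0.0  # artificial event vs. a real event
                else:
                    total = sum(sim[(p, q)] for p in pa for q in pb)
                    new[(a, b)] = c * total / (len(pa) * len(pb))
        sim = new
    return sim


# Edges read off the example traces (ACDEF / BCDFE and 12456 / 13465).
G1 = with_artificial_event("ABCDEF", [("A", "C"), ("B", "C"), ("C", "D"),
                                      ("D", "E"), ("D", "F"), ("E", "F"), ("F", "E")])
G2 = with_artificial_event("123456", [("1", "2"), ("1", "3"), ("2", "4"), ("3", "4"),
                                      ("4", "5"), ("4", "6"), ("5", "6"), ("6", "5")])
SIM0 = iterate_similarity(G1, G2, rounds=0)
SIM1 = iterate_similarity(G1, G2, rounds=1)
SIM20 = iterate_similarity(G1, G2, rounds=20)
```

As in the I = 0 table, `SIM0` is zero everywhere except for the pair of artificial events; later rounds fill in the remaining entries from the similarities of input neighbors.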

Estimation

For large and complex graphs, the computation needs tens or hundreds of iterations to converge.

Instead, we run only I rounds of iteration and then estimate the converged similarities.

Trade-off between accuracy and efficiency:

Larger I: higher accuracy, more time
Smaller I: lower accuracy, less time


Matching Composite Events

Candidates of composite events: C and D, E and F, …
Candidates are pre-defined or discovered automatically.

Heuristic: choose the candidate that improves the average similarity.
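The heuristic can be sketched in two parts: merging a candidate into a single composite event, and keeping a candidate only if it raises the average similarity. The sketch below merges at the trace level and treats the similarity computation as a caller-supplied function; both choices, and the names `merge_composite` and `pick_candidate`, are assumptions made for illustration rather than the authors' procedure.

```python
def merge_composite(trace, parts, name):
    """Replace each consecutive run made up of exactly `parts` (in any order) by `name`."""
    wanted = sorted(parts)
    merged, i = [], 0
    while i < len(trace):
        window = trace[i:i + len(wanted)]
        if sorted(window) == wanted:
            merged.append(name)
            i += len(wanted)
        else:
            merged.append(trace[i])
            i += 1
    return merged


def pick_candidate(candidates, baseline, average_similarity):
    """Return the candidate with the best average similarity, or None.

    `average_similarity` is a hypothetical callback that re-runs the matching
    with the candidate merged and returns the resulting average similarity;
    `baseline` is the average similarity without any merge.
    """
    best, best_score = None, baseline
    for candidate in candidates:
        score = average_similarity(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best


MERGED_CD = merge_composite(list("ACDEF"), ["C", "D"], "CD")
MERGED_EF = merge_composite(list("BCDFE"), ["E", "F"], "EF")
```

Matching on the sorted window lets the candidate {E, F} cover both orders, EF and FE, which both occur in the example log.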

[Figure: the example dependency graphs, with the candidate composite events {C,D} and {E,F} each merged into a single node]


Experiment Setting

Real-life data set: collected from a real bus manufacturer.

Ground truth: the true event matching is produced manually by domain experts.

Criteria: accuracy of event matching, measured by the F-measure of precision and recall.

Baselines: Graph Edit Distance [1], Opaque Matching [2], Behavioral Matching [3].

1. R. M. Dijkman, M. Dumas, and L. García-Bañuelos. Graph matching algorithms for business process model similarity search. In BPM, pages 48–63, 2009.
2. J. Kang and J. F. Naughton. On schema matching with opaque column names and data values. In SIGMOD Conference, pages 205–216, 2003.
3. S. Nejati, M. Sabetzadeh, M. Chechik, S. M. Easterbrook, and P. Zave. Matching and merging of statecharts specifications. In ICSE, pages 54–64, 2007.
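The accuracy criterion can be made concrete with a short sketch: precision is the share of found correspondences that are correct, recall is the share of true correspondences that were found, and the F-measure is their harmonic mean. The ground-truth set below is invented purely to exercise the function; it is not the experts' matching from the experiments.

```python
def f_measure(found, truth):
    """Return (precision, recall, F-measure) for two sets of correspondences."""
    found, truth = set(found), set(truth)
    correct = len(found & truth)
    if correct == 0:
        return 0.0, 0.0, 0.0
    precision = correct / len(found)
    recall = correct / len(truth)
    return precision, recall, 2 * precision * recall / (precision + recall)


# Correspondences from the framework slide.
FOUND = {("A", 2), ("B", 3), ("C", 4), ("D", 1), ("E", 5), ("F", 6)}
# Hypothetical ground truth, invented for this example only.
TRUTH = {("A", 2), ("B", 3), ("C", 4), ("D", 4), ("E", 5), ("F", 6)}

PRECISION, RECALL, F1 = f_measure(FOUND, TRUTH)
```

With five of six correspondences agreeing in both directions, precision, recall, and F-measure all equal 5/6 in this made-up case.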

No. of Event Logs  149     Min Event Size  2
No. of Traces      6000    Max Event Size  11

Effectiveness and Efficiency

[Figures: effectiveness and efficiency plots, with the series "Our Approach" marked]

Trade-off in Estimation

Conclusion

Event matching framework: works well with dislocated matching and with opaque event names.

An estimation function for trading off accuracy against efficiency.

Heuristics for matching composite events.


Q & A

Thanks!