Cleaning Structured Event Logs: A Graph Repair Approach

Jianmin Wang 1, Shaoxu Song 1, Xuemin Lin 2, Xiaochen Zhu 1, Jian Pei 3

1 Tsinghua University, China
2 University of New South Wales, Australia
3 Simon Fraser University, Canada

ICDE 2015

Outline

• Motivation
• Exact Algorithm
• Approximation
• Experiments
• Conclusion

Event Log

Information systems record the business history in their event logs.

Huge amount of event data, per corporation:

Products          No. of Event Traces
Power Generator   1,230,000
Machinery         3,260,000
Train             2,600,000

Event  Name               Operator  Successor
t1     submit             M. Liu    F. Kang
t2     design             F. Kang   J. Zhe & O. Chu
t3     insulation proof   J. Zhe    X. Feng
t4     check inventory    O. Chu    X. Feng
t5     evaluate           X. Feng   System2
t6     archive            System2   -------

Structured Event Data

Structural information does exist among events, for example task-passing relationships:

Event  Name               Operator  Successor
t1     submit             A         B
t2     design             B         C & D
t3     insulation proof   C         E
t4     check inventory    D         E
t5     evaluate           E         F
t6     archive            F         -------

[Figure: the structured event log and its execution graph: submit, then design, then insulation proof and check inventory in parallel, then evaluate, then archive. Legend: human task, service task.]
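The task-passing relationships in the table determine the execution graph mechanically. The following is a minimal Python sketch, assuming that each operator handles exactly one event in the trace; the names `LOG` and `execution_graph` are illustrative and do not come from the slides.

```python
from collections import defaultdict

# Rows of the structured event log: (event, name, operator, successor operators).
LOG = [
    ("t1", "submit", "A", ["B"]),
    ("t2", "design", "B", ["C", "D"]),
    ("t3", "insulation proof", "C", ["E"]),
    ("t4", "check inventory", "D", ["E"]),
    ("t5", "evaluate", "E", ["F"]),
    ("t6", "archive", "F", []),
]


def execution_graph(log):
    """Return {event: [next events]} implied by the Successor column."""
    # Assumption: one event per operator, so an operator identifies an event.
    event_of = {operator: event for event, _, operator, _ in log}
    edges = defaultdict(list)
    for event, _, _, successors in log:
        for operator in successors:
            edges[event].append(event_of[operator])
    return dict(edges)


graph = execution_graph(LOG)
for event, nexts in graph.items():
    print(event, "->", ", ".join(nexts))
```

Here t2 gets two outgoing edges (to t3 and t4), which is the parallel split in the execution graph, and t6 has none.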

Process Specification

Business events often follow certain business rules or constraints.

Constraints by Petri net:
• Sequence
• Parallel
• Choice

[Figure: the process specification as a Petri net with places start, a, b, c, d, e, f, g, h, s, end and transitions submit, design, revise, insulation proof, electrician proof, check inventory, proof check, evaluate, merge, re-evaluate, archive, annotated with XOR split, XOR join, AND split and AND join.]

Executions that follow the specification:
• submit, design, insulation proof, check inventory, evaluate, archive
• submit, design, electrician proof, check inventory, evaluate, archive
• submit, revise, proof check, merge, re-evaluate, archive

Conformance

Representing an execution as a causal net (a Petri net without XOR).

A mapping from the causal net to the process specification:

Places:      p0:start, p1:a, p2:b, p3:c, p4:d, p5:e, p6:s, p7:end
Transitions: t1:submit, t2:design, t3:insulation proof, t4:check inventory, t5:evaluate, t6:archive

[Figure: the process specification (Petri net) next to the causal net of the execution submit, design, insulation proof, check inventory, evaluate, archive, with each place p0 to p7 and each transition t1 to t6 mapped onto the specification.]
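The mapping can be read as a local check: every transition of the causal net must map to a specification transition whose input and output places are exactly the images of its own input and output places. The sketch below illustrates that reading in Python. The pre/post sets in `SPEC` and the arcs in `NET` are assumptions read off the figure (only the fragment needed here is included), and `conforms` is an illustrative name.

```python
# Specification fragment: transition name -> (input places, output places).
# Assumption: pre/post sets as read from the figure; "revise" leads to place f.
SPEC = {
    "submit": ({"start"}, {"a"}),
    "design": ({"a"}, {"b", "c"}),            # AND split
    "revise": ({"a"}, {"f"}),                 # other XOR branch
    "insulation proof": ({"b"}, {"d"}),
    "electrician proof": ({"b"}, {"d"}),
    "check inventory": ({"c"}, {"e"}),
    "evaluate": ({"d", "e"}, {"s"}),          # AND join
    "archive": ({"s"}, {"end"}),
}

# Causal net of the execution: (transition, name, input places, output places).
NET = [
    ("t1", "submit", ["p0"], ["p1"]),
    ("t2", "design", ["p1"], ["p2", "p3"]),
    ("t3", "insulation proof", ["p2"], ["p4"]),
    ("t4", "check inventory", ["p3"], ["p5"]),
    ("t5", "evaluate", ["p4", "p5"], ["p6"]),
    ("t6", "archive", ["p6"], ["p7"]),
]

# Place mapping as listed on the slide.
PLACE_MAP = {"p0": "start", "p1": "a", "p2": "b", "p3": "c",
             "p4": "d", "p5": "e", "p6": "s", "p7": "end"}


def conforms(net, spec, place_map):
    """True if every transition's mapped pre/post sets match the specification."""
    for _, name, pre, post in net:
        if name not in spec:
            return False
        spec_pre, spec_post = spec[name]
        if {place_map[p] for p in pre} != spec_pre:
            return False
        if {place_map[p] for p in post} != spec_post:
            return False
    return True


print(conforms(NET, SPEC, PLACE_MAP))
```

Relabeling t2 as revise makes the check fail, because revise has a single output place in this fragment while t2 has two in the causal net.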

Dirty Event Data

Two types of dirty event data, according to the specification:

Inconsistent Labeling
Places:      p0:start, p1:a, p2:b, p3:c, p4:d, p5:e, p6:s, p7:end
Transitions: t1:submit, t2:revise, t3:proof, t4:--------, t5:evaluate, t6:archive
The labels t2:revise, t3:proof and t4:-------- are inconsistent with the specification; candidate names for "proof" in the specification include electrician proof, insulation proof and proof check.

Unsound Structure
Places:      p0:start, p1:a, p2:b, p3:end
Transitions: t1:submit, t2:design, t3:archive

[Figure: the process specification with the two dirty causal nets shown against it.]

Meaning of Repair

The causes of dirty events:
• Man-made errors (typos);
• System failures (power down).

Survey at a bus manufacturer: 82% of executions are dirty; 77.62% have inconsistent labeling and 4.45% have unsound structure.

Dirty event data may:
• Return wrong provenance answers;
• Mislead aggregation profiling;
• Obstruct finding interesting process patterns.

Repair Dirty Event

Inconsistent Labeling
1. Find all consistent mappings
2. Choose the one with the minimum repairing cost

Consistent mappings for the example (places p0:start, p1:a, p2:b, p3:c, p4:d, p5:e, p6:s, p7:end):
• t1:submit, t2:design, t3:electrician proof, t4:check inventory, t5:evaluate, t6:archive
• t1:submit, t2:design, t3:insulation proof, t4:check inventory, t5:evaluate, t6:archive

Unsound Structure
No valid repair is found
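The two steps for inconsistent labeling can be sketched in Python as follows. This sketch assumes that the repairing cost of a name is the character edit distance between the observed and the repaired name, which the slides do not state, and that the missing name of t4 is an empty string. `edit_distance`, `repair_cost` and `minimum_repair` are illustrative names.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def repair_cost(observed, candidate):
    """Assumed cost model: sum of per-event edit distances."""
    return sum(edit_distance(o, c) for o, c in zip(observed, candidate))


def minimum_repair(observed, consistent_mappings):
    """Step 2: pick the consistent mapping with the minimum repairing cost."""
    return min(consistent_mappings, key=lambda c: repair_cost(observed, c))


# Dirty trace from the inconsistent-labeling example ("" stands for t4's missing name).
OBSERVED = ["submit", "revise", "proof", "", "evaluate", "archive"]

# Step 1 is taken as given: the two consistent mappings listed on the slide.
CONSISTENT = [
    ["submit", "design", "electrician proof", "check inventory", "evaluate", "archive"],
    ["submit", "design", "insulation proof", "check inventory", "evaluate", "archive"],
]

best = minimum_repair(OBSERVED, CONSISTENT)
print(best, repair_cost(OBSERVED, best))
```

Under these assumptions the mapping with insulation proof costs 30 and the one with electrician proof costs 31, so the former is chosen.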

Hardness and Related Work

Hardness: owing to choices and parallelization of flows, there is a vast number of possible repairs.

Existing methods:
• Event Log Alignment [1]: does not exploit structural information.
• Graph Repair [2]: does not consider AND and XOR constraints.

[1] M. de Leoni, F. M. Maggi, and W. M. P. van der Aalst. Aligning event logs and declarative process models for conformance checking. In BPM, pages 82–97, 2012.
[2] S. Song, H. Cheng, J. X. Yu, and L. Chen. Repairing vertex labels under neighborhood constraints. PVLDB, 7(11):987–998, 2014.

Event  Name
t1     submit
t2     do revise
t3     proof
t4     -----------
t5     evaluate
t6     archive

Event  Name
t1     submit
t2     revise
t3     proof check
t4     merge
t5     re-evaluate
t6     archive

Event  Name               Operator  Successor
t1     submit             A         B
t2     design             B         C & D
t3     insulation proof   C         E
t4     check inventory    D         E
t5     evaluate           E         F
t6     archive            F         -------


Branch and Bound

Branch: try all the possible repairs, branching at each XOR split according to the specification.

Lower Bound: simple bound = current repair cost.

[Figure: search tree over partial repairs. Nodes: t1:submit; t1:submit, t2:design; t1:submit, t2:revise; under t2:design, t3:insulation proof or t3:electrician proof, each extended with t4:check inventory; the insulation-proof branch continues with t5:evaluate and t6:archive. Each node is annotated with its bound (values shown: 0, 3, 6, 16, 17, 30, 31). The complete repair t1:submit, t2:design, t3:insulation proof, t4:check inventory, t5:evaluate, t6:archive has cost=30, and one branch is marked invalid.]
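A minimal Python sketch of the branch-and-bound idea with the simple bound (the cost accumulated so far) is given below. The per-name costs in `COST`, the candidate lists and the validity test are illustrative assumptions, not values taken from the figure; `branch_and_bound` is a hypothetical helper name.

```python
import math


def branch_and_bound(n_events, candidates, cost, is_valid):
    """Depth-first search over partial repairs, pruned by the simple bound."""
    best_cost, best_repair = math.inf, None
    stack = [([], 0)]                      # (partial repair, bound = cost so far)
    while stack:
        partial, bound = stack.pop()
        if bound >= best_cost:             # cannot beat the best complete repair
            continue
        if len(partial) == n_events:       # complete repair: bound is its cost
            best_cost, best_repair = bound, partial
            continue
        k = len(partial)
        for name in candidates[k]:         # branch on the next event's name
            extended = partial + [name]
            if is_valid(extended):         # drop branches the specification rejects
                stack.append((extended, bound + cost(k, name)))
    return best_repair, best_cost


# Illustrative inputs (assumptions): candidate names per event and their costs.
CANDIDATES = [["submit"], ["design", "revise"],
              ["insulation proof", "electrician proof"],
              ["check inventory"], ["evaluate"], ["archive"]]
COST = {"submit": 0, "design": 4, "revise": 0, "insulation proof": 11,
        "electrician proof": 12, "check inventory": 15, "evaluate": 0, "archive": 0}
VALID = [
    ["submit", "design", "insulation proof", "check inventory", "evaluate", "archive"],
    ["submit", "design", "electrician proof", "check inventory", "evaluate", "archive"],
]


def is_prefix_of_valid(partial):
    """Assumed validity test: the partial repair is a prefix of a valid execution."""
    return any(v[:len(partial)] == partial for v in VALID)


repair, total = branch_and_bound(6, CANDIDATES, lambda k, name: COST[name],
                                 is_prefix_of_valid)
print(repair, total)
```

With these inputs the branch through revise is rejected as invalid even though its cost so far is lowest, and the search returns the repair through insulation proof with total cost 30.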

Pruning Invalid Branch

Pruning Rule: a branch is invalid if

  the longest path length in the causal net < the shortest path length in the specification

[Figure: a process specification (places start, a, b, c, d, end; transitions A to F) with shortest path length = 4 (transitions), and a causal net (places p0:start, p1:a, p2, p3; transitions t1:A, t2:C, t3) with longest path length = 2 (transitions). Invalid!]
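The pruning rule compares two path lengths, both counted in transitions. The sketch below is a simplified Python illustration, assuming that both nets are given as place-to-place adjacency lists in which each edge stands for one transition and that the causal net is acyclic. The two example graphs are hypothetical and only reproduce the lengths 2 and 4 from the figure.

```python
from collections import deque
from functools import lru_cache


def longest_path(edges, source):
    """Longest path (number of edges) from source in a DAG {node: [next, ...]}."""
    @lru_cache(maxsize=None)
    def depth(node):
        return max((1 + depth(nxt) for nxt in edges.get(node, [])), default=0)
    return depth(source)


def shortest_path(edges, source, target):
    """Shortest path (number of edges) from source to target, or None."""
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def is_invalid(causal_net, causal_start, spec, spec_start, spec_end):
    """Pruning rule: the causal net is too short to reach the end of the specification."""
    shortest = shortest_path(spec, spec_start, spec_end)
    return shortest is not None and longest_path(causal_net, causal_start) < shortest


# Hypothetical example: each edge is one transition between two places.
CAUSAL_NET = {"p0": ["p1"], "p1": ["p2"]}
SPEC = {"start": ["a"], "a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["end"]}

print(is_invalid(CAUSAL_NET, "p0", SPEC, "start", "end"))
```

The causal net offers at most 2 transitions while every path through the specification needs at least 4, so the branch is pruned.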

Advanced Bounding Function

Example: to estimate a lower bound for the partial repair t1:submit (naïve bound = 0) on the causal net

Places:      p0:start, p1:a, p2:b, p3:c, p4:d, p5:e, p6:s, p7:end
Transitions: t1:submit, t2:do revise, t3:proof, t4:-------, t5:evaluate, t6:archive

1. Build a conflict graph over the remaining transitions, with vertex weights w(t2)=3, w(t3)=5, w(t4)=5, w(t5)=0, where w(t) is the minimum cost over all possible repairs of t.

2. Remove edges (with their vertices) until the conflict graph becomes empty: remove (t2, t3) and (t4, t5).

3. For each removed edge, add the minimum w(t) on the edge to the lower bound:

Advanced Bound = min{w(t2), w(t3)} + min{w(t4), w(t5)} = 3 + 0 = 3
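Steps 2 and 3 can be sketched in a few lines of Python. The conflict graph is taken as given (how conflicts are detected is not shown here), edges are removed in list order, and `advanced_bound` is an illustrative name.

```python
def advanced_bound(weights, conflict_edges):
    """Remove edges together with their vertices; sum the smaller weight per edge."""
    remaining = list(conflict_edges)
    bound = 0
    while remaining:
        u, v = remaining[0]
        bound += min(weights[u], weights[v])
        # Removing u and v also removes every other edge that touches them.
        remaining = [e for e in remaining if u not in e and v not in e]
    return bound


# Values from the example on the slide.
W = {"t2": 3, "t3": 5, "t4": 5, "t5": 0}
EDGES = [("t2", "t3"), ("t4", "t5")]

print(advanced_bound(W, EDGES))  # 3
```

For the example this gives min{3, 5} + min{5, 0} = 3, compared with the naïve bound of 0.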


One Pass Algorithm

Heuristic: pass the causal net from the start to the end only once, and determine the mapping π' for each place and transition.

1. Map the start place in the causal net to the start place in the specification.

2. Candidates for transition π'(tk):
• pre(tk) have already been determined;
• choose candidates without introducing inconsistency on pre(tk).

3. Choose the candidate that introduces less inconsistency on post(tk).

Example: in the causal net, t2:design leads to p2:b, and t3 connects p2:b to p4:d. In the specification, place b leads to d through either electrician proof or insulation proof.
Candidates for t3:
• t3:insulation proof
• t3:electrician proof

The One Pass algorithm may report a false-positive unsound structure!
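The three steps can be illustrated with a simplified Python sketch. It rests on several assumptions that are not in the slides: the specification fragment `SPEC` and its pre/post sets are read off the figures, "inconsistency on post(tk)" is approximated by the difference in the number of output places, ties are broken by edit distance on the names, and output places are paired in sorted order. `one_pass` is an illustrative name.

```python
def edit_distance(a, b):
    """Levenshtein distance, used here only as a tie-breaker (assumption)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


# Specification fragment: name -> (input places, output places).
SPEC = {
    "submit": ({"start"}, {"a"}),
    "design": ({"a"}, {"b", "c"}),
    "revise": ({"a"}, {"f"}),
    "insulation proof": ({"b"}, {"d"}),
    "electrician proof": ({"b"}, {"d"}),
    "check inventory": ({"c"}, {"e"}),
    "evaluate": ({"d", "e"}, {"s"}),
    "archive": ({"s"}, {"end"}),
}

# Dirty causal net in start-to-end order: (id, observed name, pre, post).
NET = [
    ("t1", "submit", ["p0"], ["p1"]),
    ("t2", "revise", ["p1"], ["p2", "p3"]),
    ("t3", "proof", ["p2"], ["p4"]),
    ("t4", "", ["p3"], ["p5"]),
    ("t5", "evaluate", ["p4", "p5"], ["p6"]),
    ("t6", "archive", ["p6"], ["p7"]),
]


def one_pass(net, spec, start_place="p0"):
    """Return {transition id: repaired name}, or None if no candidate exists."""
    place_map = {start_place: "start"}                  # step 1
    repair = {}
    for tid, observed, pre, post in net:
        mapped_pre = {place_map.get(p) for p in pre}
        # Step 2: candidates must be consistent with the already mapped pre-set.
        candidates = [n for n, (s_pre, _) in spec.items() if s_pre == mapped_pre]
        if not candidates:
            return None                                 # reported as unsound structure
        # Step 3: least inconsistency on the post-set, then closest name.
        best = min(candidates, key=lambda n: (abs(len(spec[n][1]) - len(post)),
                                              edit_distance(observed, n)))
        repair[tid] = best
        for p, s in zip(sorted(post), sorted(spec[best][1])):
            place_map[p] = s
    return repair


print(one_pass(NET, SPEC))
```

Because each choice is final, an early wrong choice can leave a later transition without candidates, in which case this sketch returns None even though a valid repair may exist; this is the false-positive case mentioned above.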


Experiment Setting

Real-life data sets:

                                       Bus manufacturer   Telecom company
Places in process specification        24                 31
Transitions in process specification   22                 32
No. of event traces                    4722               1040
Maximum size of pre/post set           3                  3

Setting: we randomly change event names in execution traces as faults, then apply the repair methods to modify the execution trace (find a new mapping).

Criteria: to evaluate the accuracy of recovery, we use the F-measure of precision and recall.

Baselines: Event Log Alignment and Graph Repair.
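The accuracy criterion can be made concrete with a short Python sketch. The slides do not define precision and recall exactly; this sketch assumes precision is the fraction of changed events that were restored to their true name, and recall is the fraction of injected faults that were restored. `f_measure` and the example trace are illustrative.

```python
def f_measure(truth, dirty, repaired):
    """F-measure of a repair, under the assumed precision/recall definitions."""
    n = len(truth)
    changed = [i for i in range(n) if repaired[i] != dirty[i]]   # repairs made
    faults = [i for i in range(n) if dirty[i] != truth[i]]       # faults injected
    correct = [i for i in changed if repaired[i] == truth[i]]    # repairs that hit the truth
    precision = len(correct) / len(changed) if changed else 0.0
    recall = len(correct) / len(faults) if faults else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


TRUTH = ["submit", "design", "insulation proof", "check inventory", "evaluate", "archive"]
DIRTY = ["submit", "revise", "proof", "", "evaluate", "archive"]
REPAIRED = ["submit", "design", "electrician proof", "check inventory", "evaluate", "archive"]

print(round(f_measure(TRUTH, DIRTY, REPAIRED), 3))
```

In this example three events were faulty and three were changed, two of them correctly, so precision and recall are both 2/3 and so is the F-measure.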

Effectiveness and Efficiency

[Figure: four plots over fault size (3 to 27) comparing OP, Exact, Alignment and Graph on the bus manufacturer data set and the telecom company data set: time performance (time cost in ms) and accuracy (F-measure).]

Low time cost
High accuracy

Scalability on Synthetic Data

[Figure: two plots on the synthetic data set over trace size (100 to 1000), with curves labeled OP, ES, PI+ES and EA: prune power (processed elements) and time performance (time cost in s). Annotation: Pruning Invalid Branch + Advanced Bound.]


Conclusion

• Define the Minimum Repair Problem on Structured Event Logs
• A Branch and Bound Repair framework: find the minimum repair; detect unsound structure.
• Pruning and Advanced Bounding Function
• A PTIME Approximate Algorithm

Q & A

Thanks!