Cleaning Structured Event Logs: A Graph Repair Approach
Jianmin Wang¹, Shaoxu Song¹, Xuemin Lin², Xiaochen Zhu¹, Jian Pei³
¹ Tsinghua University, China
² University of New South Wales, Australia
³ Simon Fraser University, Canada
ICDE 2015
Event Log
Information systems record the business history in their event logs.
Huge Amount of Event Data:
Products           No. of Event Traces
Power Generator    1,230,000
Machinery          3,260,000
Train              2,600,000
Event  Name              Operator  Successor
t1     submit            M. Liu    F. Kang
t2     design            F. Kang   J. Zhe & O. Chu
t3     insulation proof  J. Zhe    X. Feng
t4     check inventory   O. Chu    X. Feng
t5     evaluate          X. Feng   System2
t6     archive           System2   -------
Structured Event Data
Structural information does exist among events. Task passing relationships:
Event  Name              Operator  Successor
t1     submit            A         B
t2     design            B         C & D
t3     insulation proof  C         E
t4     check inventory   D         E
t5     evaluate          E         F
t6     archive           F         -------
Figure: structured event log and its execution graph (node types: human task, service task): submit → design → {insulation proof, check inventory} → evaluate → archive.
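The task-passing relationships in the structured log can be turned into execution-graph edges mechanically. Below is a minimal Python sketch. It assumes each operator performs exactly one event in the trace, which holds for this example; real logs would need a finer rule. The names `LOG` and `execution_graph` are hypothetical.

```python
# Structured event log from the slide: event -> (name, operator, successor operators).
LOG = {
    "t1": ("submit", "A", ["B"]),
    "t2": ("design", "B", ["C", "D"]),
    "t3": ("insulation proof", "C", ["E"]),
    "t4": ("check inventory", "D", ["E"]),
    "t5": ("evaluate", "E", ["F"]),
    "t6": ("archive", "F", []),
}


def execution_graph(log):
    """Return edges (ti, tj) where ti passes the task to the operator of tj.

    Assumption (hypothetical): each operator performs exactly one event.
    """
    event_of_operator = {op: t for t, (_, op, _) in log.items()}
    return [(t, event_of_operator[op])
            for t, (_, _, successors) in log.items()
            for op in successors]


for src, dst in execution_graph(LOG):
    print(f"{LOG[src][0]} -> {LOG[dst][0]}")
```

The two edges leaving design reflect the parallel split, and the two edges entering evaluate reflect the corresponding join.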
Process Specification
Business events often follow certain business rules or constraints.
Constraints by Petri net: • Sequence • Parallel (AND split / AND join) • Choice (XOR split / XOR join)
Figure: a process specification (places start, a, b, c, d, e, f, g, h, s, end; transitions submit, design, revise, insulation proof, electrician proof, check inventory, proof check, merge, evaluate, re-evaluate, archive) and three executions that follow it:
• submit, design, insulation proof, check inventory, evaluate, archive
• submit, design, electrician proof, check inventory, evaluate, archive
• submit, revise, proof check, merge, re-evaluate, archive
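Whether an execution follows the specification can be checked with the standard Petri-net firing rule (the token game). This covers sequence, parallel, and choice constraints alike. In the minimal sketch below, the arcs in `SPEC` are a hypothetical reconstruction built from the place and transition names above. For example, design is assumed to split a into b and c, and both proof transitions are assumed to move b to d. This is not the authors' exact net. `replay` is an illustrative name.

```python
from collections import Counter

# Hypothetical reconstruction of the specification: transition -> (pre-set, post-set).
SPEC = {
    "submit": (["start"], ["a"]),
    "design": (["a"], ["b", "c"]),          # AND split
    "revise": (["a"], ["f"]),               # XOR choice with design
    "insulation proof": (["b"], ["d"]),     # XOR choice ...
    "electrician proof": (["b"], ["d"]),    # ... with insulation proof
    "check inventory": (["c"], ["e"]),
    "evaluate": (["d", "e"], ["s"]),        # AND join
    "proof check": (["f"], ["g"]),
    "merge": (["g"], ["h"]),
    "re-evaluate": (["h"], ["s"]),
    "archive": (["s"], ["end"]),
}


def replay(spec, trace, start="start", end="end"):
    """Return True if `trace` fires from one token in `start` to one token in `end`."""
    marking = Counter({start: 1})
    for name in trace:
        if name not in spec:
            return False
        pre, post = spec[name]
        if any(marking[p] < 1 for p in pre):
            return False                    # transition not enabled
        for p in pre:
            marking[p] -= 1                 # consume one token per input place
        for p in post:
            marking[p] += 1                 # produce one token per output place
    return {p: n for p, n in marking.items() if n} == {end: 1}


print(replay(SPEC, ["submit", "design", "check inventory",
                    "insulation proof", "evaluate", "archive"]))  # True
```

Parallel branches may interleave in any order. Firing both insulation proof and electrician proof fails, because place b holds only one token.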
Conformance
Representing an execution as a causal net (a Petri net without XOR), together with a mapping from the causal net to the process specification:
p0:start, p1:a, p2:b, p3:c, p4:d, p5:e, p6:s, p7:end
t1:submit, t2:design, t3:insulation proof, t4:check inventory, t5:evaluate, t6:archive
Figure: the process specification (places start, a to h, s, end) and the causal net of the execution submit, design, insulation proof, check inventory, evaluate, archive.
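The mapping above can be checked mechanically. This sketch assumes conformance means three things:
• the start place maps to start;
• every transition carries the name of the specification transition it maps to;
• pre-sets and post-sets are preserved, i.e. π(pre(t)) = pre(π(t)) and π(post(t)) = post(π(t)).

The arcs in `SPEC` are a hypothetical reconstruction, and all identifiers are illustrative.

```python
# Hypothetical specification fragment: transition -> (pre-set, post-set).
SPEC = {
    "submit": ({"start"}, {"a"}),
    "design": ({"a"}, {"b", "c"}),
    "insulation proof": ({"b"}, {"d"}),
    "electrician proof": ({"b"}, {"d"}),
    "check inventory": ({"c"}, {"e"}),
    "evaluate": ({"d", "e"}, {"s"}),
    "archive": ({"s"}, {"end"}),
}
# Causal net of the execution: transition -> (recorded name, pre-set, post-set).
NET = {
    "t1": ("submit", {"p0"}, {"p1"}),
    "t2": ("design", {"p1"}, {"p2", "p3"}),
    "t3": ("insulation proof", {"p2"}, {"p4"}),
    "t4": ("check inventory", {"p3"}, {"p5"}),
    "t5": ("evaluate", {"p4", "p5"}, {"p6"}),
    "t6": ("archive", {"p6"}, {"p7"}),
}
# The mapping listed on the slide.
PLACES = {"p0": "start", "p1": "a", "p2": "b", "p3": "c",
          "p4": "d", "p5": "e", "p6": "s", "p7": "end"}
TRANSITIONS = {"t1": "submit", "t2": "design", "t3": "insulation proof",
               "t4": "check inventory", "t5": "evaluate", "t6": "archive"}


def conforms(net, spec, places, transitions, start="p0"):
    """Check that the mapping preserves the start place, names, and pre/post sets."""
    if places[start] != "start":
        return False
    for t, (name, pre, post) in net.items():
        target = transitions[t]
        if target not in spec or name != target:
            return False                    # inconsistent labeling
        spec_pre, spec_post = spec[target]
        if {places[p] for p in pre} != spec_pre or {places[p] for p in post} != spec_post:
            return False                    # structure not preserved
    return True


print(conforms(NET, SPEC, PLACES, TRANSITIONS))    # True
dirty = dict(NET, t3=("proof", {"p2"}, {"p4"}))    # wrong recorded name
print(conforms(dirty, SPEC, PLACES, TRANSITIONS))  # False
```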
Dirty Event Data
Two types of dirty event data, according to the specification:
• Inconsistent labeling: e.g., a causal net p0:start, ..., p7:end whose transitions are recorded as t1:submit, t2:revise, t3:proof, t4:-------- (missing name), t5:evaluate, t6:archive; these names are inconsistent with the specification transitions (design, insulation proof, electrician proof, check inventory; revise, proof check, merge, re-evaluate).
• Unsound structure: e.g., the causal net p0:start, t1:submit, p1:a, t2:design, p2:b, t3:archive, p3:end, whose structure cannot be mapped to the specification.
Meaning of Repair
The causes of dirty events: man-made errors (typos) and system failures (power outages).
Survey in a bus manufacturer: 82% of executions are dirty; 77.62% have inconsistent labeling and 4.45% have unsound structure.
Dirty event data may return wrong provenance answers, mislead aggregation profiling, and obstruct the discovery of interesting process patterns.
Repair Dirty Events
Inconsistent labeling:
1. Find all consistent mappings, e.g., p0:start, p1:a, p2:b, p3:c, p4:d, p5:e, p6:s, p7:end with t1:submit, t2:design, t4:check inventory, t5:evaluate, t6:archive, and t3 mapped to either electrician proof or insulation proof.
2. Choose the one with the minimum repairing cost.
Unsound structure: no valid repair is found.
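Steps 1 and 2 can be illustrated by brute force. The sketch simplifies the setting; this is an assumption, not the paper's formulation:
• each execution of the specification is given as a graph of transition-to-transition edges;
• a consistent mapping is an edge-preserving bijection from the dirty net onto one of these graphs;
• the repairing cost is the sum of Levenshtein distances between recorded and repaired names.

`RUNS`, `min_repair`, and the other names are hypothetical.

```python
from itertools import permutations


def lev(a, b):
    """Levenshtein edit distance, used here as an assumed repair cost."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def diamond(proof):
    """Edges of an execution with the parallel split after design."""
    return {("submit", "design"), ("design", proof), ("design", "check inventory"),
            (proof, "evaluate"), ("check inventory", "evaluate"), ("evaluate", "archive")}


CHAIN = ["submit", "revise", "proof check", "merge", "re-evaluate", "archive"]
RUNS = [diamond("insulation proof"), diamond("electrician proof"), set(zip(CHAIN, CHAIN[1:]))]

# Dirty causal net: recorded names ("" stands for the missing name) and edges.
LABELS = {"t1": "submit", "t2": "do revise", "t3": "proof",
          "t4": "", "t5": "evaluate", "t6": "archive"}
EDGES = {("t1", "t2"), ("t2", "t3"), ("t2", "t4"),
         ("t3", "t5"), ("t4", "t5"), ("t5", "t6")}


def consistent_mappings(labels, edges, run):
    """Step 1: yield every bijection that maps the net's edges exactly onto `run`."""
    nodes = sorted({n for edge in run for n in edge})
    events = sorted(labels)
    if len(nodes) != len(events):
        return
    for perm in permutations(nodes):
        pi = dict(zip(events, perm))
        if {(pi[u], pi[v]) for u, v in edges} == run:
            yield pi


def min_repair(labels, edges, runs):
    """Step 2: the cheapest consistent mapping, or (None, None) if none exists."""
    best, best_cost = None, None
    for run in runs:
        for pi in consistent_mappings(labels, edges, run):
            cost = sum(lev(labels[t], pi[t]) for t in labels)
            if best_cost is None or cost < best_cost:
                best, best_cost = pi, cost
    return best, best_cost


repair, cost = min_repair(LABELS, EDGES, RUNS)
print(cost, repair["t2"], repair["t3"], repair["t4"])  # 32 design insulation proof check inventory
```

Under this metric, the sequential revise execution would cost only 17. However, it cannot be matched to the net's parallel split, so the minimum over consistent mappings is 32. A net with no consistent mapping, such as submit → design → archive, returns (None, None): no valid repair is found.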
Hardness and Related Work
Hardness: owing to choices and parallel flows, there is a vast number of possible repairs.
Existing methods:
Event Log Alignment [1]: does not exploit structural information.
Graph Repair [2]: does not consider AND and XOR constraints.
[1] M. de Leoni, F. M. Maggi, and W. M. P. van der Aalst. Aligning event logs and declarative process models for conformance checking. In BPM, pages 82–97, 2012.
[2] S. Song, H. Cheng, J. X. Yu, and L. Chen. Repairing vertex labels under neighborhood constraints. PVLDB, 7(11):987–998, 2014.
Event Name
t1 submit
t2 do revise
t3 proof
t4 -----------
t5 evaluate
t6 archive
Event Name
t1 submit
t2 revise
t3 proof check
t4 merge
t5 re-evaluate
t6 archive
Event  Name              Operator  Successor
t1     submit            A         B
t2     design            B         C & D
t3     insulation proof  C         E
t4     check inventory   D         E
t5     evaluate          E         F
t6     archive           F         -------
Branch and Bound
Branch: try all possible repairs, branching at XOR splits according to the specification.
Lower bound: simple bound = current repair cost.
Figure: search tree for the dirty trace. Each node is a partial repair (t1:submit; t1:submit, t2:design; t1:submit, t2:revise; t1:submit, t2:design, t3:insulation proof; t1:submit, t2:design, t3:electrician proof; ...) annotated with its bound, starting from bound=0. The branch through t3:insulation proof reaches cost=30 with the complete repair t1:submit, t2:design, t3:insulation proof, t4:check inventory, t5:evaluate, t6:archive. A node on the t3:electrician proof branch (t1:submit, t2:design, t3:electrician proof, t4:check inventory) is marked invalid.
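The branch-and-bound scheme can be sketched in a few lines. All of the following are illustrative assumptions:
• the arcs in `SPEC` are a hypothetical reconstruction of the running example;
• transitions are visited in topological order;
• a candidate is any specification transition whose pre-set equals the already-mapped pre-places and whose post-set has the same size;
• post places are paired by position;
• the cost is Levenshtein distance.

The XOR choices are exactly the specification transitions that share a pre-set, so branching over candidates branches at the XOR splits.

```python
import math


def lev(a, b):
    """Levenshtein distance, used as an assumed repair cost."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


# Hypothetical specification arcs: transition -> (pre-set, post-set).
SPEC = {
    "submit": (["start"], ["a"]), "design": (["a"], ["b", "c"]),
    "revise": (["a"], ["f"]), "insulation proof": (["b"], ["d"]),
    "electrician proof": (["b"], ["d"]), "check inventory": (["c"], ["e"]),
    "evaluate": (["d", "e"], ["s"]), "proof check": (["f"], ["g"]),
    "merge": (["g"], ["h"]), "re-evaluate": (["h"], ["s"]),
    "archive": (["s"], ["end"]),
}
# Dirty causal net in topological order: transition -> (recorded name, pre, post).
NET = {
    "t1": ("submit", ["p0"], ["p1"]),
    "t2": ("do revise", ["p1"], ["p2", "p3"]),
    "t3": ("proof", ["p2"], ["p4"]),
    "t4": ("", ["p3"], ["p5"]),              # the missing name "-------"
    "t5": ("evaluate", ["p4", "p5"], ["p6"]),
    "t6": ("archive", ["p6"], ["p7"]),
}


def branch_and_bound(net, spec):
    order = list(net)
    consumed = {p for _, pre, _ in net.values() for p in pre}
    sinks = {p for _, _, post in net.values() for p in post} - consumed
    best = {"cost": math.inf, "pi": None}

    def search(k, pi, cost):
        if cost >= best["cost"]:             # bound: the current repair cost
            return
        if k == len(order):
            if all(pi[p] == "end" for p in sinks):  # the net must finish at end
                best["cost"], best["pi"] = cost, pi
            return
        t = order[k]
        name, pre, post = net[t]
        if any(p not in pi for p in pre):
            return
        mapped_pre = sorted(pi[p] for p in pre)
        for s, (s_pre, s_post) in spec.items():    # branch over candidates
            if sorted(s_pre) != mapped_pre or len(s_post) != len(post):
                continue                     # structurally inconsistent
            child = dict(pi)
            child[t] = s
            child.update(zip(post, s_post))  # pair post places by position (assumed)
            search(k + 1, child, cost + lev(name, s))

    search(0, {"p0": "start"}, 0)
    return best["pi"], best["cost"]


repair, cost = branch_and_bound(NET, SPEC)
print(cost, repair["t3"])  # 32 insulation proof
```

On this trace, the search first completes the insulation proof branch at cost 32. It then abandons the electrician proof branch as soon as that branch's partial cost reaches 33. The revise candidate for t2 is skipped because its single output place cannot match the parallel split.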
Pruning Invalid Branch
Pruning Rule: if the longest path length in the causal net < the shortest path length in the specification, the branch is invalid and can be pruned.
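Figure: a process specification (places start, a, b, c, d, end; transitions A, B, C, D, E, F) and a partially mapped causal net (p0:start, t1:A, p1:a, t2:C, p2, t3, p3). The causal-net path has length 2 (transitions) while the shortest specification path has length 4, so the branch is invalid!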
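The pruning rule can be sketched as two path computations. The sketch assumes that:
• path length counts transitions;
• lengths are measured from a causal-net place to its end, and from the mapped specification place to `end`;
• AND-joins are ignored when computing the shortest specification path.

`SPEC`, `CLEAN`, `TRUNCATED`, and the function names are a hypothetical reconstruction of the running example.

```python
import math
from collections import deque

# Hypothetical specification arcs: transition -> (pre-set, post-set).
SPEC = {
    "submit": (["start"], ["a"]), "design": (["a"], ["b", "c"]),
    "revise": (["a"], ["f"]), "insulation proof": (["b"], ["d"]),
    "electrician proof": (["b"], ["d"]), "check inventory": (["c"], ["e"]),
    "evaluate": (["d", "e"], ["s"]), "proof check": (["f"], ["g"]),
    "merge": (["g"], ["h"]), "re-evaluate": (["h"], ["s"]),
    "archive": (["s"], ["end"]),
}


def shortest_to_end(spec, place, end="end"):
    """BFS over the specification; each step fires one transition."""
    dist = {place: 0}
    queue = deque([place])
    while queue:
        q = queue.popleft()
        if q == end:
            return dist[q]
        for pre, post in spec.values():
            if q in pre:
                for r in post:
                    if r not in dist:
                        dist[r] = dist[q] + 1
                        queue.append(r)
    return math.inf


def longest_to_end(net, place, memo=None):
    """Longest path (in transitions) from `place` in an acyclic causal net."""
    memo = {} if memo is None else memo
    if place not in memo:
        memo[place] = max(
            (1 + max((longest_to_end(net, q, memo) for q in post), default=0)
             for pre, post in net.values() if place in pre),
            default=0,
        )
    return memo[place]


def is_invalid(net, spec, net_place, spec_place):
    """Pruning rule: the causal net is too short to reach the end of the specification."""
    return longest_to_end(net, net_place) < shortest_to_end(spec, spec_place)


CLEAN = {"t1": (["p0"], ["p1"]), "t2": (["p1"], ["p2", "p3"]), "t3": (["p2"], ["p4"]),
         "t4": (["p3"], ["p5"]), "t5": (["p4", "p5"], ["p6"]), "t6": (["p6"], ["p7"])}
TRUNCATED = {"t1": (["p0"], ["p1"]), "t2": (["p1"], ["p2"])}
print(is_invalid(CLEAN, SPEC, "p1", "a"), is_invalid(TRUNCATED, SPEC, "p1", "a"))  # False True
```

With p1 mapped to a, the clean net still has 4 transitions ahead, which matches the specification's shortest remaining path. The truncated net has only 1, so every repair extending that branch can be discarded at once.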
Advanced Bounding Function
Example: estimate a lower bound for the partial repair t1:submit (naïve bound = 0) of the causal net p0:start, ..., p7:end with t1:submit, t2: do revise, t3: proof, t4: -------, t5: evaluate, t6: archive.
1. Build a conflict graph over t2, t3, t4, t5 with vertex weights w(t2) = 3, w(t3) = 5, w(t4) = 5, w(t5) = 0, where w(t) is the minimum cost over all possible repairs of t.
2. Remove edges (together with their vertices) until the conflict graph becomes empty: remove (t2, t3) and (t4, t5).
3. For each removed edge, add the minimum w(t) on the edge to the lower bound:
Advanced Bound = min{w(t2), w(t3)} + min{w(t4), w(t5)} = 3 + 0 = 3
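Steps 2 and 3 amount to picking disjoint edges and adding the cheaper endpoint of each. The sketch assumes the conflict-graph edges and weights are already known; here they are the two edges and the weights from the example. `advanced_bound` is a hypothetical name.

```python
def advanced_bound(edges, w):
    """Remove edges with their endpoints until none remain; sum the min weights."""
    remaining = list(edges)
    bound = 0
    while remaining:
        u, v = remaining[0]                  # remove this edge together with u and v
        bound += min(w[u], w[v])
        remaining = [(a, b) for a, b in remaining
                     if a not in (u, v) and b not in (u, v)]
    return bound


w = {"t2": 3, "t3": 5, "t4": 5, "t5": 0}
print(advanced_bound([("t2", "t3"), ("t4", "t5")], w))  # 3
```

The sketch takes the edges in the given order. A different removal order on a larger conflict graph may yield a different bound.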
One Pass Algorithm
Heuristic: pass over the causal net from the start to the end only once, determining the mapping π' for each place and transition.
1. Map the start place in the causal net to the start place in the specification.
2. Candidates for transition π'(t_k): pre(t_k) has already been determined; choose candidates that introduce no inconsistency on pre(t_k).
3. Choose the candidate that introduces the least inconsistency on post(t_k).
Example: in the causal net, t3 follows t2:design via p2:b and leads to p4:d; in the specification, place b leads to d through either electrician proof or insulation proof. Candidates for t3: t3:insulation proof, t3:electrician proof.
The One Pass algorithm may report false-positive unsound structures!
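A greedy single pass over the same ingredients looks as follows. The assumptions are illustrative:
• the arcs in `SPEC` are a hypothetical reconstruction;
• "least inconsistency on post(t_k)" is approximated as the smallest post-set size mismatch, with ties broken by Levenshtein label cost;
• post places are paired by position.

`one_pass` is a hypothetical name.

```python
def lev(a, b):
    """Levenshtein distance, used as an assumed label cost."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


# Hypothetical specification arcs: transition -> (pre-set, post-set).
SPEC = {
    "submit": (["start"], ["a"]), "design": (["a"], ["b", "c"]),
    "revise": (["a"], ["f"]), "insulation proof": (["b"], ["d"]),
    "electrician proof": (["b"], ["d"]), "check inventory": (["c"], ["e"]),
    "evaluate": (["d", "e"], ["s"]), "proof check": (["f"], ["g"]),
    "merge": (["g"], ["h"]), "re-evaluate": (["h"], ["s"]),
    "archive": (["s"], ["end"]),
}
# Dirty causal net in topological order: transition -> (recorded name, pre, post).
NET = {
    "t1": ("submit", ["p0"], ["p1"]), "t2": ("do revise", ["p1"], ["p2", "p3"]),
    "t3": ("proof", ["p2"], ["p4"]), "t4": ("", ["p3"], ["p5"]),
    "t5": ("evaluate", ["p4", "p5"], ["p6"]), "t6": ("archive", ["p6"], ["p7"]),
}


def one_pass(net, spec, cost):
    """Greedy single pass; returns the mapping pi', or None if it gets stuck."""
    pi = {"p0": "start"}                     # step 1: start place <-> start place
    for t, (name, pre, post) in net.items():
        if any(p not in pi for p in pre):
            return None
        mapped_pre = sorted(pi[p] for p in pre)
        # Step 2: candidates consistent with the already-determined pre(t).
        cands = [s for s, (s_pre, _) in spec.items() if sorted(s_pre) == mapped_pre]
        if not cands:
            return None                      # reported as unsound (possibly falsely)
        # Step 3 (assumed measure): post-set size mismatch, then label cost.
        best = min(cands, key=lambda s: (abs(len(spec[s][1]) - len(post)), cost(name, s)))
        pi[t] = best
        pi.update(zip(post, spec[best][1]))  # pair post places by position (assumed)
    return pi


pi = one_pass(NET, SPEC, lev)
print(pi["t2"], "|", pi["t3"], "|", pi["t4"])  # design | insulation proof | check inventory
```

Each choice is final. A wrong early choice can therefore leave later places without a consistent candidate, and the pass then reports an unsound structure that an exhaustive search would have repaired. This is the false positive noted above.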
Experiment Setting
Setting: we randomly change event names in execution traces as faults, then apply the repair methods to modify the execution traces (find new mappings).
Criteria: to evaluate the accuracy of recovery, we use the F-measure of precision and recall.
Baseline: Event Log Alignment and Graph Repair.
Real-life data sets:
                                       Bus manufacturer  Telecom company
No. of event traces                    4722              1040
Places in process specification        24                31
Transitions in process specification   22                32
Maximum size of pre/post set           3                 3
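The accuracy criterion can be computed per trace as follows. The sketch assumes the definitions commonly used for data repair:
• precision is the fraction of modified events whose new name is correct;
• recall is the fraction of faulty events that were repaired correctly.

The function name and the convention for empty sets are hypothetical.

```python
def repair_f_measure(truth, dirty, repaired):
    """F-measure of repair precision and recall over the events of one trace."""
    faulty = {e for e in truth if dirty[e] != truth[e]}         # injected faults
    modified = {e for e in truth if repaired[e] != dirty[e]}    # changed by the repair
    correct = {e for e in modified if repaired[e] == truth[e]}  # changed to the truth
    precision = len(correct) / len(modified) if modified else 1.0
    recall = len(correct) / len(faulty) if faulty else 1.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


truth = {"t1": "submit", "t2": "design", "t3": "insulation proof",
         "t4": "check inventory", "t5": "evaluate", "t6": "archive"}
dirty = dict(truth, t2="do revise", t3="proof", t4="")
repaired = dict(truth, t3="electrician proof")  # one of three faults fixed wrongly
print(repair_f_measure(truth, dirty, repaired))  # precision = recall = 2/3, so F = 2/3
```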
Effectiveness and Efficiency
Figure: time performance (time cost, ms) and accuracy (F-measure) versus fault size (3 to 27) on the bus manufacturer and telecom company data sets, comparing OP, Exact, Alignment, and Graph.
Low time cost
High accuracy
Scalability on Synthetic Data
Figure: on the synthetic data set, prune power (processed elements) and time performance (time cost, s) versus trace size (100 to 1000), comparing OP, ES, PI+ES, and EA.
Pruning Invalid Branch + Advanced Bound
Conclusion
Define the minimum repair problem on structured event logs
A branch and bound repair framework: finds the minimum repair and detects unsound structure
Pruning and an advanced bounding function
A PTIME approximate algorithm