Post on 01-Dec-2014
Applicability of Process Mining
Techniques in Business Environments
Annual Meeting IEEE Task Force on Process Mining
Andrea Burattin
andreaburattin
September 8, 2014
Brief Curriculum Vitæ
2009, M.Sc. Computer Science (A.I. program), University of Padova
2009 – 2012, Ph.D. Supervisor: Prof. Alessandro Sperduti. Joint school University of Bologna–Padova. Thesis defended in April 2013
2013 – 2014, Postdoc, Prompt project (prompt.processmining.it), University of Padova
Photo: Specola, Padova. http://flic.kr/p/cEW5bo
2 of 17
Ph.D. Inception
Ph.D. background
Inception during M.Sc. thesis
Companies: study on process mining
A company (Siav S.p.A., www.siav.it) funded my Ph.D.
Aim: investigate applicability of process mining techniques in business scenarios
Interaction with companies: interesting! (but sometimes...)
Outcome: "Applicability of Process Mining Techniques in Business Environments"
3 of 17
Quick Recap of Process Mining
[Figure: process mining overview. Layers: Imagination (Operational Model, Analytical Model), Process Mining (Discovery, Conformance, Extension), Incarnation / Environment (Information System, Operational Incarnation), Observation (Event Logs). Edge labels: support, control, protocol / audit, augment, compare, analyze, mine, basis, create, (re-)design, implement, describe.]
Source: C. Günther, "Process Mining in Flexible Environments". PhD thesis, TU/e, Eindhoven, 2009.
4 of 17
Theoretical vs. Industrial-related Open Problems
Some literature open problems
Duplicate tasks
Exploiting all data available
Holistic mining
Different perspectives from different sources
Noise and incompleteness
Case studies open problems
Using process mining tools and configuring algorithms
Results interpretation
Readable results
Computational power and storage capacity required
Non-overlapping sets
5 of 17
Possible Industry Scenarios
Four possible industry scenarios
Process aware vs. Process unaware
Process aware software vs. Process unaware software
[Figure: 2×2 matrix placing Company 1, Company 2, Company 3 and Company 4. One axis: Process Aware Companies vs. Process Unaware Companies. Other axis: Process Aware Information Systems vs. Process Unaware Information Systems.]
6 of 17
Thesis Structure and Organization
Process Mining Capable Event Logs
Process Representation
Model Evaluation
Process Mining Capable Event Stream
Data Preparation
Control-flow Mining
Stream Control-flow Mining
Results Evaluation
Process Extension
6 of 17
Overview: Data Preparation
Problems with Data Preparation
Problems at different complexity and abstraction levels. Examples:
Adaptation of existing data (syntax problem, easy)
Introduction of new information (difficult)
Typical set of required fields
(case-id; activity; timestamp; [process-name]; [originator])
Our context: company process aware; IS process unaware
Structure of available log
(activity; timestamp; originator; info_1; ...; info_n)
7 of 17
Problems with Data Preparation (cont.)
Case-id from info_i fields
Candidate case-id fields: a-priori knowledge
Event chains: string similarity functions
Selection of maximal chain: most activities or simplest chain
Process name is not a problem
All events belonging to the same process
Act. info1 info2
a1 AB-01 BB-01
a2 AA-02 AB-01
a3 AB-01 BB-02
a4 AB-01 BB-03
a1 AA-03 BB-04
a5 AA-03 BB-05
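To make the idea of event chains concrete, here is a minimal Python sketch run on the toy table above. It is an illustration only, not the method from the thesis: the similarity function is assumed to be exact string equality, the names `build_chains` and `maximal_chain` are hypothetical, and the "maximal chain" is taken to be the one covering the most distinct activities.

```python
from collections import defaultdict

# Toy log from the slide: (activity, info1, info2); there is no case-id column.
EVENTS = [
    ("a1", "AB-01", "BB-01"),
    ("a2", "AA-02", "AB-01"),
    ("a3", "AB-01", "BB-02"),
    ("a4", "AB-01", "BB-03"),
    ("a1", "AA-03", "BB-04"),
    ("a5", "AA-03", "BB-05"),
]


def similar(x, y):
    """Assumed similarity function: exact match (a fuzzy one could be used)."""
    return x == y


def build_chains(events, candidate_fields):
    """Group event indices whose candidate fields carry a similar value."""
    chains = defaultdict(list)
    for idx, event in enumerate(events):
        for value in {event[f] for f in candidate_fields}:
            # Reuse an existing chain key if a similar value was already seen.
            key = next((k for k in chains if similar(k, value)), value)
            chains[key].append(idx)
    return dict(chains)


def maximal_chain(chains, events):
    """Pick the chain that covers the most distinct activities."""
    return max(chains.items(),
               key=lambda kv: len({events[i][0] for i in kv[1]}))


chains = build_chains(EVENTS, candidate_fields=(1, 2))
best_value, best_events = maximal_chain(chains, EVENTS)
print(best_value, [EVENTS[i][0] for i in best_events])
```

On this table the value AB-01 links a1, a2, a3 and a4 across both info fields, so it is the strongest case-id candidate under these assumptions.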
8 of 17
Overview: Control-flow Mining
Exploiting Data Available
Events with duration instead of instantaneous events
Generalization of Heuristics Miner to exploit this new information
[Figure: a main activity running from Start to End and containing sub-activities 1 to n along a time axis; the same process over activities A, B, C, D drawn once with events as time intervals and once with instantaneous events.]
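The following Python sketch shows one way interval events can yield more information than instantaneous ones. The definitions are assumptions made for illustration (they are not taken from the slides): two events overlap when one starts before the other ends, and B directly follows A when B starts after A ends and no other event fits entirely in between. The function name `interval_relations` is hypothetical.

```python
from collections import Counter


def interval_relations(trace):
    """trace: list of (activity, start, end) tuples for a single case.

    Returns two counters: direct successions (a, b) and overlaps (a, b),
    the latter keyed by the alphabetically sorted pair.
    """
    events = sorted(trace, key=lambda e: (e[1], e[2]))
    follows, overlaps = Counter(), Counter()
    for i, (a, _, a_end) in enumerate(events):
        for j in range(i + 1, len(events)):
            b, b_start, _ = events[j]
            if b_start < a_end:
                # b starts while a is still running: candidate parallelism.
                overlaps[tuple(sorted((a, b)))] += 1
                continue
            # b starts after a ends; direct only if nothing fits in between.
            blocked = any(
                a_end <= c_start and c_end <= b_start
                for k, (_, c_start, c_end) in enumerate(events)
                if k not in (i, j)
            )
            if not blocked:
                follows[(a, b)] += 1
    return follows, overlaps


trace = [("A", 0, 2), ("B", 1, 3), ("C", 3, 5), ("D", 6, 7)]
follows, overlaps = interval_relations(trace)
print(dict(follows), dict(overlaps))
```

With instantaneous events A and B would simply appear in some order; with intervals their overlap is observed directly instead of being inferred from frequencies.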
9 of 17
Non-expert Users
Our users: non-experts in process mining, with notions of BPM
Observations
Process mining algorithms require configuration
Typically, algorithm configurations are thresholds on measures
The mining log is finite
Only a finite number of configurations is possible
We are able to discretize the parameter values
[Figure: several candidate process models over activities A to F, one per parameter configuration, captioned τ1 = ?, τ2 = ?, τ3 = ?, τ4 = ?]
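A short Python sketch of why a finite log gives only finitely many meaningful threshold values. It uses the standard Heuristics Miner dependency measure, (|a>b| − |b>a|) / (|a>b| + |b>a| + 1); the function names and the toy log are hypothetical, and restricting to positive values is a simplifying assumption.

```python
from collections import Counter
from itertools import product


def directly_follows(log):
    """Count how often activity a is directly followed by b in the log."""
    counts = Counter()
    for trace in log:
        counts.update(zip(trace, trace[1:]))
    return counts


def dependency(counts, a, b):
    """Heuristics Miner dependency measure between two distinct activities."""
    ab, ba = counts[(a, b)], counts[(b, a)]
    return (ab - ba) / (ab + ba + 1)


def candidate_thresholds(log):
    """Distinct positive dependency values: the only thresholds that matter."""
    counts = directly_follows(log)
    activities = sorted({a for trace in log for a in trace})
    values = {
        dependency(counts, a, b)
        for a, b in product(activities, repeat=2)
        if a != b
    }
    return sorted(v for v in values if v > 0)


log = [["A", "B", "C"], ["A", "C", "B"], ["A", "B", "C"]]
thresholds = candidate_thresholds(log)
print(thresholds)
```

Any threshold between two consecutive values in this list produces the same set of dependencies, so the continuous parameter can be replaced by this discrete set.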
10 of 17
Model Selection Approaches
User-guided Approach
Hierarchical clustering of models
Average linkage
Any model-to-model metric
[Figure: dendrogram over Process 1 to Process 10 (leaf order 1, 10, 9, 8, 5, 6, 4, 7, 2, 3); merge heights 0.34, 0.45, 0.63, 0.69, 0.76, 0.49, 0.71, 0.74, 0.84; distance axis from 0 to 1.]
Navigation of the dendrogram
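As an illustration of the user-guided approach, the sketch below runs average-linkage agglomerative clustering in plain Python. The model representation (a set of dependency edges) and the distance (1 minus Jaccard similarity) are assumptions chosen for brevity; any model-to-model metric could be plugged in as `dist`.

```python
def average_linkage(names, dist):
    """Agglomerative clustering with average linkage.

    names: list of model identifiers; dist(a, b) -> distance in [0, 1].
    Returns the merges as (cluster_1, cluster_2, height), bottom-up.
    """
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                height = sum(dist(a, b) for a, b in pairs) / len(pairs)
                if best is None or height < best[0]:
                    best = (height, i, j)
        height, i, j = best
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical models, each described by its set of dependency edges.
MODELS = {
    "m1": {("a", "b"), ("b", "c")},
    "m2": {("a", "b"), ("b", "c"), ("c", "d")},
    "m3": {("x", "y")},
}


def jaccard_distance(a, b):
    """Assumed model-to-model metric: 1 - Jaccard similarity of edge sets."""
    x, y = MODELS[a], MODELS[b]
    return 1 - len(x & y) / len(x | y)


merges = average_linkage(list(MODELS), jaccard_distance)
for left, right, height in merges:
    print(left, right, round(height, 2))
```

The list of merges is exactly what a dendrogram draws: the user can cut it at any height and inspect one representative model per cluster.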
Automatic Approach
Hill climbing with
Maximum plateau steps
Random restarts
(Local optimum)
$h_{MDL} = \arg\min_{h \in H} \left[ L(h) + L(D \mid h) \right]$
MDL encodings
MDL by Calders et al.
Simplified heuristics
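A minimal sketch of the automatic approach: hill climbing over a discretized parameter space, with random restarts and a bound on plateau steps, minimizing L(h) + L(D|h). The cost tables below are made-up numbers standing in for real MDL encodings, and all function names are hypothetical.

```python
import random


def hill_climb(score, neighbours, random_start, restarts=5, max_plateau=10,
               rng=None):
    """Minimize score with hill climbing, random restarts and plateau moves."""
    rng = rng or random.Random(0)
    best, best_score = None, float("inf")
    for _ in range(restarts):
        current = random_start(rng)
        current_score = score(current)
        plateau = 0
        while True:
            candidate = min(neighbours(current), key=score)
            candidate_score = score(candidate)
            if candidate_score < current_score:
                current, current_score, plateau = candidate, candidate_score, 0
            elif candidate_score == current_score and plateau < max_plateau:
                # Sideways move: allowed only a bounded number of times.
                current, plateau = candidate, plateau + 1
            else:
                break  # local optimum reached
        if current_score < best_score:
            best, best_score = current, current_score
    return best, best_score


# Made-up costs for 7 discretized configurations (index = configuration).
MODEL_COST = [1, 2, 3, 4, 5, 6, 7]   # stands in for L(h)
DATA_COST = [9, 6, 4, 3, 4, 5, 5]    # stands in for L(D|h)


def mdl(h):
    return MODEL_COST[h] + DATA_COST[h]


def neighbours(h):
    return [j for j in (h - 1, h + 1) if 0 <= j < len(MODEL_COST)]


best, best_score = hill_climb(
    mdl, neighbours, lambda rng: rng.randrange(len(MODEL_COST)))
print(best, best_score)
```

Random restarts reduce the risk of returning a poor local optimum; the plateau bound keeps the search from wandering forever across equally scored configurations.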
11 of 17
Overview: Results Evaluation
Evaluation Metrics
Model-to-model Metric
Complex process into
Permitted relations
Forbidden relations
Generation rules (based on Alpha alg.)
A → B ⇒ A > B, B ≯ A
A ‖ B ⇒ A > B, B > A
A # B ⇒ A ≯ B, B ≯ A
Comparison as Jaccard similarity on two sets (> and ≯)
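The generation rules above can be sketched in a few lines of Python. A model is assumed to be given as a dictionary of pairwise relations ("->", "||", "#"); the function names are hypothetical, and how the two Jaccard values are combined into a single score is left open, since the slide does not say.

```python
def relation_sets(relations):
    """Turn {(A, B): '->' | '||' | '#'} into (permitted, forbidden) sets."""
    permitted, forbidden = set(), set()
    for (a, b), kind in relations.items():
        if kind == "->":      # A -> B  =>  A > B permitted, B > A forbidden
            permitted.add((a, b))
            forbidden.add((b, a))
        elif kind == "||":    # A || B  =>  both directions permitted
            permitted.update({(a, b), (b, a)})
        elif kind == "#":     # A # B   =>  both directions forbidden
            forbidden.update({(a, b), (b, a)})
        else:
            raise ValueError(f"unknown relation: {kind}")
    return permitted, forbidden


def jaccard(x, y):
    """Jaccard similarity; two empty sets are treated as identical."""
    union = x | y
    return len(x & y) / len(union) if union else 1.0


def compare(model_1, model_2):
    """Return (similarity on permitted set, similarity on forbidden set)."""
    p1, f1 = relation_sets(model_1)
    p2, f2 = relation_sets(model_2)
    return jaccard(p1, p2), jaccard(f1, f2)


m1 = {("A", "B"): "->", ("B", "C"): "->"}
m2 = {("A", "B"): "->", ("B", "C"): "||"}
sim_permitted, sim_forbidden = compare(m1, m2)
print(sim_permitted, sim_forbidden)
```

Comparing relation sets rather than graphs means two models that look different but allow the same behaviour are still scored as similar.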
Model-to-log Metric
Declare constraint π and a trace σ ⇒ healthiness measures
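Activation sparsity: $1 - \frac{n_a(\sigma,\pi)}{n(\sigma)}$
Violation ratio: $\frac{n_v(\sigma,\pi)}{n_a(\sigma,\pi)}$
Fulfillment ratio: $\frac{n_f(\sigma,\pi)}{n_a(\sigma,\pi)}$
Conflict ratio: $\frac{n_c(\sigma,\pi)}{n_a(\sigma,\pi)}$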
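A small Python sketch of the healthiness measures for one trace and one constraint. The counting is shown only for the Declare response(a, b) template, where an activation is an occurrence of a, fulfilled if some b occurs later; treating conflicts as always zero for this template is an assumption of the sketch, and the function names are hypothetical.

```python
def response_counts(trace, a, b):
    """Counts for response(a, b): every a must eventually be followed by b.

    Returns (n, n_a, n_v, n_f, n_c): trace length, activations, violations,
    fulfillments, conflicts.
    """
    activations = [i for i, x in enumerate(trace) if x == a]
    fulfilled = sum(1 for i in activations if b in trace[i + 1:])
    violated = len(activations) - fulfilled
    conflicts = 0  # assumption: no conflicting activations for this template
    return len(trace), len(activations), violated, fulfilled, conflicts


def healthiness(n, n_a, n_v, n_f, n_c):
    """Compute the four measures; ratios are None without activations."""
    sparsity = 1 - n_a / n
    if n_a == 0:
        return {"activation_sparsity": sparsity, "violation_ratio": None,
                "fulfillment_ratio": None, "conflict_ratio": None}
    return {
        "activation_sparsity": sparsity,
        "violation_ratio": n_v / n_a,
        "fulfillment_ratio": n_f / n_a,
        "conflict_ratio": n_c / n_a,
    }


trace = ["a", "b", "c", "a", "c"]
measures = healthiness(*response_counts(trace, "a", "b"))
print(measures)
```

Here the first a is followed by b and the second is not, so half of the activations are fulfilled and half are violated.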
12 of 17
Overview: Process Extension
Multiperspective Mining
Given
Log with information on originators
Process model
We add roles to the model
Assumption
Roles are characterized by a consistent set of originators
1. Dependencies as handover of roles
2. Remove dependencies below threshold
Connected components are candidate roles
3. Merge candidate roles if the similarity of their user sets is above threshold
4. Entropy-based metric to tune thresholds
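The steps above can be sketched as follows. The score attached to each dependency is an assumption: the Jaccard similarity between the originator sets of the two activities, so that a low score suggests a handover between roles. The slides do not give the actual measure, the entropy-based tuning (step 4) is omitted, and all names are hypothetical.

```python
from collections import defaultdict


def jaccard(x, y):
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def candidate_roles(log, dependencies, tau_dep, tau_merge):
    """log: (activity, originator) pairs; dependencies: (a, b) model edges."""
    originators = defaultdict(set)
    for activity, originator in log:
        originators[activity].add(originator)

    # Steps 1-2: score each dependency, drop those below the threshold.
    kept = [(a, b) for a, b in dependencies
            if jaccard(originators[a], originators[b]) >= tau_dep]

    # Connected components (union-find) are the candidate roles.
    parent = {a: a for a in originators}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in kept:
        parent[find(a)] = find(b)
    components = defaultdict(set)
    for a in originators:
        components[find(a)].add(a)
    roles = list(components.values())

    def users(role):
        return set().union(*(originators[a] for a in role))

    # Step 3: merge candidate roles whose user sets are similar enough.
    merged = True
    while merged:
        merged = False
        for i in range(len(roles)):
            for j in range(i + 1, len(roles)):
                if jaccard(users(roles[i]), users(roles[j])) >= tau_merge:
                    roles[i] |= roles.pop(j)
                    merged = True
                    break
            if merged:
                break
    return roles


log = [("register", "ann"), ("register", "bob"), ("check", "ann"),
       ("check", "bob"), ("approve", "carl"), ("pay", "dora"),
       ("archive", "carl")]
deps = [("register", "check"), ("check", "approve"),
        ("approve", "pay"), ("pay", "archive")]
roles = candidate_roles(log, deps, tau_dep=0.5, tau_merge=0.5)
print(sorted(sorted(r) for r in roles))
```

In this toy log, approve and archive are not connected by a surviving dependency, but they share the same originator, so the merge step places them in one role.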
13 of 17
Overview: Stream Control-flow Mining
Stream Context
Stream Mining Peculiarities
Cannot store the entire stream
Approximation
Backtracking not feasible
One pass over data
Variable system condition
E.g. fluctuating stream rates
Adapt the model to new data
Concept drifts
Completely new problems!
Principle
Recent observations are more
important than older ones
3 versions of Heuristics Miner
Based on Sliding Window
Based on Lossy Counting
Based on Budget Lossy Counting
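To illustrate the second variant, here is a sketch of the classic Lossy Counting algorithm applied to directly-follows pairs read from an event stream. It shows only the approximate counting, in one pass and with bounded memory for the pair counts. The wrapper class and its per-case dictionary are simplifying assumptions (that dictionary is itself unbounded here), and this is not the stream Heuristics Miner from the thesis.

```python
import math


class LossyCounter:
    """Lossy Counting: approximate frequencies with error at most epsilon * n."""

    def __init__(self, epsilon):
        self.width = math.ceil(1 / epsilon)  # bucket width
        self.n = 0                           # items seen so far
        self.entries = {}                    # item -> [frequency, max_error]

    def add(self, item):
        self.n += 1
        bucket = math.ceil(self.n / self.width)
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:
            # Bucket boundary: forget items that are too infrequent.
            self.entries = {k: v for k, v in self.entries.items()
                            if v[0] + v[1] > bucket}


class StreamingDirectlyFollows:
    """Feeds directly-follows pairs of a (case, activity) stream to the counter."""

    def __init__(self, epsilon):
        self.counter = LossyCounter(epsilon)
        self.last = {}  # case id -> last activity (unbounded in this sketch)

    def observe(self, case_id, activity):
        previous = self.last.get(case_id)
        self.last[case_id] = activity
        if previous is not None:
            self.counter.add((previous, activity))


miner = StreamingDirectlyFollows(epsilon=0.25)
stream = [("c1", "a"), ("c1", "b"), ("c2", "a"), ("c2", "b"), ("c3", "a"),
          ("c3", "c"), ("c1", "d"), ("c2", "d")]
for case_id, activity in stream:
    miner.observe(case_id, activity)
print(miner.counter.entries)
```

Rare pairs such as (a, c) are dropped at the bucket boundary, while frequent ones survive; the approximate counts can then feed the usual dependency measures.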
14 of 17
Extra: Processes and Logs Generator
Companies are reluctant to share their data
Researchers need to do tests
(No BPI challenges at that time)
Processes and Logs Generator
Stochastic context-free grammar generates random processes
Rules to simulate a process and produce an event log
Reference model used to evaluate control-flow mining algorithms
[Figure: example derivation tree starting from P → a_start G a_end and applying productions such as (G ; G), A and A ; (G ∧ G) ; A until the activities a to g are reached.]
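A compact sketch of the generator idea: a stochastic context-free grammar expands the nonterminal G into a random process tree, and a simulator plays the tree out to obtain traces. The production probabilities, the depth limit and the XOR production are assumptions added for this illustration, and all names are hypothetical; it is not the actual PLG implementation.

```python
import itertools
import random


def expand(rng, names, depth):
    """G -> A | (G ; G) | A ; (G AND G) ; A | A ; (G XOR G) ; A."""
    def activity():
        return f"act_{next(names)}"

    if depth == 0:
        return activity()
    rule = rng.choices(["act", "seq", "and", "xor"],
                       weights=[0.4, 0.3, 0.15, 0.15])[0]  # assumed weights
    if rule == "act":
        return activity()
    left = expand(rng, names, depth - 1)
    right = expand(rng, names, depth - 1)
    if rule == "seq":
        return ("seq", left, right)
    # Split and join activities around the AND / XOR block.
    return ("seq", activity(), (rule, left, right), activity())


def random_process(seed, max_depth=3):
    """P -> a_start G a_end."""
    rng = random.Random(seed)
    return ("seq", "a_start", expand(rng, itertools.count(1), max_depth),
            "a_end")


def simulate(rng, node):
    """Play out one trace of the process tree."""
    if isinstance(node, str):
        return [node]
    kind, *children = node
    if kind == "seq":
        return [a for child in children for a in simulate(rng, child)]
    if kind == "xor":
        return simulate(rng, rng.choice(children))
    # "and": random interleaving of the branches.
    branches = [simulate(rng, child) for child in children]
    trace = []
    while any(branches):
        branch = rng.choice([b for b in branches if b])
        trace.append(branch.pop(0))
    return trace


process = random_process(seed=42)
log = [simulate(random.Random(i), process) for i in range(5)]
for trace in log:
    print(trace)
```

Because the generating tree is known, it can serve as the reference model against which the output of a control-flow mining algorithm is compared.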
15 of 17
Detailed Map of Performed Activities
Process Representation (e.g. Dependency Graph, Petri Net)
Legacy, Process-unaware Information Systems
Process Mining Capable Event Logs
Data Preparation
Control-flow Mining Algorithm Exploiting More Data
Event Logs Generator
User-guided Discovery Algorithm Configuration
Automatic Algorithm Configuration
Process Mining Capable Event Stream
Stream Control-flow Mining Framework
Model Evaluation (wrt Log / Original Model)
Model-to-model Metric
Model-to-log Metric
Random Process Generator
Extension of Process Models with Organizational Roles
16 of 17
Thanks!
Doing the Ph.D. has been amazing!
A huge thank you to:
My supervisor, Alessandro Sperduti
Siav S.p.A. and Roberto Pinelli
My internal examiners: Tullio Vardanega, Paolo Baldan
My external examiners: Barbara Weber, Diogo Ferreira
All the process mining community!
17 of 17