Applicability of Process Mining Techniques in Business Environments

Post on 01-Dec-2014


description

Presentation provided at the annual meeting of the IEEE Task Force on Process Mining, for the Best Dissertation Award, during BPM 2014 (in Eindhoven, the Netherlands, http://bpm2014.haifa.ac.il).

transcript

Applicability of Process Mining Techniques in Business Environments

Annual Meeting IEEE Task Force on Process Mining

Andrea Burattin

andreaburattin

September 8, 2014

Brief Curriculum Vitæ

2009, M.Sc. in Computer Science (A.I. program), University of Padova

2009-2012, Ph.D. Supervisor: Prof. Alessandro Sperduti. Joint school University of Bologna-Padova. Thesis defended in April 2013

2013-2014, Postdoc, Prompt project (prompt.processmining.it), University of Padova

Photo: Specola, Padova. http://flic.kr/p/cEW5bo

Ph.D. Inception

Ph.D. background

Inception during M.Sc. thesis

  Companies: study on process mining

A company (Siav S.p.A., www.siav.it) funded my Ph.D.

  Aim: investigate applicability of process mining techniques in business scenarios

  Interaction with companies: interesting! (but sometimes...)

Outcome

  "Applicability of Process Mining Techniques in Business Environments"

Quick Recap of Process Mining

[Figure: process mining framework. Areas: Imagination, Process Mining, Incarnation / Environment, Observation. Elements: Operational Model, Analytical Model, Event Logs, Information System, Operational Incarnation. Process mining tasks: Discovery, Conformance, Extension. Arrow labels: support, protocol / audit, control, augment, compare, analyze, mine, basis, create, (re-)design, implement, describe.]

Source: C. Günther, "Process Mining in Flexible Environments". PhD thesis, TU/e, Eindhoven, 2009.


Theoretical vs. Industrial-related Open Problems

Some literature open problems

Duplicate tasks

Exploiting all data available

Holistic mining

Different perspectives from different sources

Noise and incompleteness

Case studies open problems

Using process mining tools and configuring algorithms

Results interpretation

Readable results

Computational power and storage capacity required

Not overlapping sets


Possible Industry Scenarios

Four possible industry scenarios

Process aware vs. process unaware

Process aware software vs. process unaware software

[Figure: 2x2 matrix placing Company 1, Company 2, Company 3 and Company 4 along two axes: Process Unaware vs. Process Aware Information Systems, and Process Aware vs. Process Unaware Companies.]

Thesis Structure and Organization

[Figure: thesis structure diagram with the blocks Data Preparation, Process Mining Capable Event Logs, Process Mining Capable Event Stream, Control-flow Mining, Stream Control-flow Mining, Process Representation, Model Evaluation, Results Evaluation, Process Extension.]

Overview: Data Preparation

Problems with Data Preparation

Problems at different complexity and abstraction levels. Examples:

Adaptation of existing data (syntax problem, easy)

Introduction of new information (difficult)

Typical set of required fields

(case-id; activity; timestamp; [process-name]; [originator])

Our context: company process aware; IS process unaware

Structure of available log

(activity; timestamp; originator; info_1; ...; info_n)


Problems with Data Preparation (cont.)

Case-id from info_i fields

Candidate case-id fields: a-priori knowledge

Event chains: string similarity functions

Selection of maximal chain: most activities or simplest chain

Process name is not a problem

All events belong to the same process

Act.  info1  info2
a1    AB-01  BB-01
a2    AA-02  AB-01
a3    AB-01  BB-02
a4    AB-01  BB-03
a1    AA-03  BB-04
a5    AA-03  BB-05
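The case-id reconstruction outlined on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the thesis implementation: exact string equality stands in for the string similarity functions, the candidate fields are taken as given (the a-priori knowledge), "maximal" is read as "most distinct activities", and the names `build_chains` and `maximal_chain` are hypothetical.

```python
from collections import defaultdict

# Toy log from the slide: (activity, info1, info2)
LOG = [
    ("a1", "AB-01", "BB-01"),
    ("a2", "AA-02", "AB-01"),
    ("a3", "AB-01", "BB-02"),
    ("a4", "AB-01", "BB-03"),
    ("a1", "AA-03", "BB-04"),
    ("a5", "AA-03", "BB-05"),
]


def build_chains(log, candidate_fields):
    """Chain together the events that carry the same value in any candidate field.

    Assumption: exact equality replaces a real string similarity function.
    """
    chains = defaultdict(list)
    for event in log:
        # A set avoids adding the event twice if both fields hold the same value.
        for value in {event[idx] for idx in candidate_fields}:
            chains[value].append(event[0])
    return dict(chains)


def maximal_chain(chains):
    """Select the chain that covers the most distinct activities."""
    return max(chains.items(), key=lambda item: len(set(item[1])))


chains = build_chains(LOG, candidate_fields=(1, 2))
case_value, case_activities = maximal_chain(chains)
print(case_value, case_activities)
```

On the toy table, the value AB-01 appears in info1 for a1, a3, a4 and in info2 for a2, so those four events end up in the same chain even though no single column identifies the case.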


Overview: Control-flow Mining

Exploiting Data Available

Events with duration instead of instantaneous events

Generalization of Heuristics Miner to exploit this new information

[Figure: a main activity spanning from Start to End, with sub-activities 1, 2, ..., n-1, n along the time axis.]

[Figure: activities A, B, C, D over time, shown once as a process with events as time intervals and once as a process with instantaneous events.]
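To make the idea of events with duration concrete, here is a rough sketch of how interval information could feed a Heuristics Miner style dependency measure. With `parallel_ab=0` the function is the classic dependency measure; the extra denominator term for overlapping intervals, the blocking rule for direct succession, and the names `relations` and `dependency` are assumptions of this sketch, not details stated on the slide.

```python
from collections import Counter
from itertools import combinations


def relations(trace):
    """trace: list of (activity, start, end) tuples.

    Returns (follows, parallel):
      follows[(a, b)]          counts a directly followed by b,
      parallel[frozenset(a,b)] counts overlapping executions of a and b.
    """
    follows, parallel = Counter(), Counter()
    for x, y in combinations(trace, 2):
        if x[1] < y[2] and y[1] < x[2]:  # the two intervals overlap
            parallel[frozenset((x[0], y[0]))] += 1
    for a in trace:
        for b in trace:
            if a is b or b[1] < a[2]:  # b must start after a has ended
                continue
            # b is not a direct successor if another event fits in between
            blocked = any(
                c is not a and c is not b and c[1] >= a[2] and c[2] <= b[1]
                for c in trace
            )
            if not blocked:
                follows[(a[0], b[0])] += 1
    return follows, parallel


def dependency(follows_ab, follows_ba, parallel_ab=0):
    """Dependency measure in [-1, 1]; overlaps dilute it (assumed form)."""
    return (follows_ab - follows_ba) / (follows_ab + follows_ba + 2 * parallel_ab + 1)


# A and B overlap, then C, then D.
trace = [("A", 0, 2), ("B", 1, 3), ("C", 3, 5), ("D", 6, 7)]
follows, parallel = relations(trace)
print(dict(follows), dict(parallel))
print(dependency(follows[("A", "C")], follows[("C", "A")]))
```

With instantaneous events, A and B would look like a sequence; with intervals, their overlap is observed directly and no dependency between them is counted.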


Non-expert Users

Our users: non-experts in process mining, with notions of BPM

Observations

Process mining algorithms require configuration

Typically, algorithm configurations are thresholds on measures

The mining log is finite

Only a finite number of configurations is possible

We are able to discretize the parameter values

[Figure: different process models over activities A to F mined from the same log, depending on the configuration: τ1 = ?, τ2 = ?, τ3 = ?, τ4 = ?]
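The discretization argument can be illustrated with a small sketch. It assumes a single threshold applied to one score per dependency (the score values below are made up for illustration, and `candidate_thresholds` and `kept` are hypothetical names): since a finite log yields finitely many distinct scores, only those values need to be tried as thresholds.

```python
def candidate_thresholds(scores):
    """Distinct score values: the only thresholds at which the result can change."""
    return sorted(set(scores.values()))


def kept(scores, threshold):
    """Dependencies that survive a given threshold."""
    return {edge for edge, score in scores.items() if score >= threshold}


# Illustrative scores only, not taken from any real log.
scores = {("A", "B"): 0.9, ("B", "C"): 0.9, ("A", "C"): 0.5, ("C", "D"): 0.75}

thresholds = candidate_thresholds(scores)
models = [kept(scores, t) for t in thresholds]
print(thresholds)
print([len(m) for m in models])
```

Any threshold strictly between two consecutive candidates, for example 0.6, selects exactly the same dependencies as the next candidate above it (here 0.75), so the continuous parameter collapses to a short list of configurations.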


Model Selection Approaches

User-guided Approach

Hierarchical clustering of models

Average linkage

Any model-to-model metric

[Figure: dendrogram over Process 1 to Process 10 (leaf order 1, 10, 9, 8, 5, 6, 4, 7, 2, 3) on a 0 to 1 scale, with node values 0.34, 0.45, 0.63, 0.69, 0.76, 0.49, 0.71, 0.74, 0.84.]

Navigation of the dendrogram
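A minimal sketch of average-linkage hierarchical clustering over a set of mined models follows. It is written in plain Python for illustration; the toy models (sets of edges), the choice of 1 minus Jaccard similarity as the model-to-model distance, and the function name `average_linkage` are assumptions, since the slide allows any model-to-model metric.

```python
from itertools import combinations


def average_linkage(names, dist):
    """Agglomerative clustering with average linkage.

    dist(a, b) is any model-to-model distance between two named models.
    Returns the merge history as (cluster_a, cluster_b, height) tuples,
    which is the information a dendrogram displays.
    """
    def avg(pair):
        a, b = pair
        return sum(dist(x, y) for x in a for y in b) / (len(a) * len(b))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=avg)
        merges.append((a, b, avg((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Toy models described by their sets of edges (illustrative only).
MODELS = {
    "m1": {"AB", "BC"},
    "m2": {"AB", "BC", "CD"},
    "m3": {"XY"},
}


def jaccard_distance(x, y):
    a, b = MODELS[x], MODELS[y]
    return 1 - len(a & b) / len(a | b)


merges = average_linkage(list(MODELS), jaccard_distance)
for left, right, height in merges:
    print(left, right, round(height, 2))
```

Navigating the dendrogram then amounts to walking this merge history from the root: each step splits a group of similar models, so a user can narrow down to the family of configurations that looks right without ever touching a threshold.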

Automatic Approach

Hill climbing with

Maximum plateau steps

Random restarts

(Local optimum)

$h_{\mathrm{MDL}} = \arg\min_{h \in H} \bigl( L(h) + L(D \mid h) \bigr)$

MDL encodings

MDL by Calders et al.

Simplified heuristics
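The automatic approach can be sketched as a hill climbing over the discretized configurations, scored by the two-part MDL cost L(h) + L(D|h). Everything concrete below is an assumption made for illustration: the one-dimensional configuration grid, the toy encodings `model_length` and `data_length`, and the function name `hill_climb`. The actual MDL encodings are not given on the slide.

```python
import random


def hill_climb(cost, neighbours, starts, max_plateau_steps=5):
    """Hill climbing with restarts and a bounded number of plateau steps.

    Each start is climbed until no neighbour is better (a local optimum)
    or the plateau budget is spent; the best result over all starts wins.
    """
    best = None
    for current in starts:
        plateau = 0
        while True:
            candidate = min(neighbours(current), key=cost)
            if cost(candidate) < cost(current):
                current, plateau = candidate, 0
            elif cost(candidate) == cost(current) and plateau < max_plateau_steps:
                current, plateau = candidate, plateau + 1
            else:
                break
        if best is None or cost(current) < cost(best):
            best = current
    return best


# Toy encodings (assumed): richer models cost more bits but explain the log better.
def model_length(h):
    return h


def data_length(h):
    return 100 / (h + 1)


def mdl_cost(h):
    return model_length(h) + data_length(h)


def neighbours(h):
    return [j for j in (h - 1, h + 1) if 0 <= j <= 20]


rng = random.Random(0)
starts = [rng.randrange(21) for _ in range(3)]  # random restarts
h_mdl = hill_climb(mdl_cost, neighbours, starts)
print(h_mdl, round(mdl_cost(h_mdl), 2))
```

The toy cost has a single minimum, so every restart reaches it. On a real configuration space the search may stop in a local optimum, which is why restarts and plateau steps are part of the approach.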


Overview: Results Evaluation

Evaluation Metrics

Model-to-model Metric

Complex process translated into

Permitted relations

Forbidden relations

Generation rules (based on the Alpha algorithm)

$A \rightarrow B \Rightarrow A > B,\ B \not> A$

$A \parallel B \Rightarrow A > B,\ B > A$

$A \mathbin{\#} B \Rightarrow A \not> B,\ B \not> A$

Comparison as Jaccard similarity on two sets ($>$ and $\not>$)
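The generation rules and the comparison can be sketched as follows. The encoding of a model as a list of (a, relation, b) triples and the names `to_relations` and `jaccard` are assumptions of this sketch; how the two similarities are combined into a single score is not stated on the slide, so they are reported separately.

```python
def to_relations(model):
    """Apply the generation rules to (a, rel, b) triples.

    rel is '->' (sequence), '||' (parallel) or '#' (no relation).
    Returns (permitted, forbidden) sets of ordered pairs.
    """
    permitted, forbidden = set(), set()
    for a, rel, b in model:
        if rel == "->":
            permitted.add((a, b))
            forbidden.add((b, a))
        elif rel == "||":
            permitted.update({(a, b), (b, a)})
        elif rel == "#":
            forbidden.update({(a, b), (b, a)})
        else:
            raise ValueError(f"unknown relation: {rel}")
    return permitted, forbidden


def jaccard(x, y):
    """Jaccard similarity; two empty sets are treated as identical."""
    union = x | y
    return len(x & y) / len(union) if union else 1.0


model_1 = [("A", "->", "B"), ("B", "->", "C")]
model_2 = [("A", "->", "B"), ("B", "||", "C")]

permitted_1, forbidden_1 = to_relations(model_1)
permitted_2, forbidden_2 = to_relations(model_2)
sim_permitted = jaccard(permitted_1, permitted_2)
sim_forbidden = jaccard(forbidden_1, forbidden_2)
print(sim_permitted, sim_forbidden)
```

The two models agree on A before B but differ on B and C: the second model additionally permits C before B, which lowers both similarities.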

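Model-to-log Metric

Declare constraint $\pi$ and a trace $\sigma$ $\Rightarrow$ healthiness measures

Activation sparsity: $1 - \frac{n_a(\sigma,\pi)}{n(\sigma)}$

Violation ratio: $\frac{n_v(\sigma,\pi)}{n_a(\sigma,\pi)}$

Fulfillment ratio: $\frac{n_f(\sigma,\pi)}{n_a(\sigma,\pi)}$

Conflict ratio: $\frac{n_c(\sigma,\pi)}{n_a(\sigma,\pi)}$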
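As a worked example of the four measures, the sketch below counts activations for one Declare constraint, response(a, b), on a single trace and then applies the formulas. Restricting the example to the response template (where each occurrence of a is an activation, fulfilled if some b occurs later, and where no conflicts arise) is an assumption made to keep it short; `response_counts` and `healthiness` are hypothetical names.

```python
def response_counts(trace, a, b):
    """Counts for response(a, b): every a must eventually be followed by b.

    Returns (n, n_a, n_f, n_v, n_c): trace length, activations,
    fulfillments, violations, conflicts.
    """
    activations = [i for i, event in enumerate(trace) if event == a]
    fulfilled = sum(1 for i in activations if b in trace[i + 1:])
    violated = len(activations) - fulfilled
    return len(trace), len(activations), fulfilled, violated, 0


def healthiness(n, n_a, n_f, n_v, n_c):
    """The four measures; ratios are None when there are no activations."""
    def ratio(count):
        return count / n_a if n_a else None

    return {
        "activation_sparsity": 1 - n_a / n,
        "violation_ratio": ratio(n_v),
        "fulfillment_ratio": ratio(n_f),
        "conflict_ratio": ratio(n_c),
    }


trace = ["A", "B", "C", "A", "C"]
counts = response_counts(trace, "A", "B")
measures = healthiness(*counts)
print(counts)
print(measures)
```

Here the first A is followed by a B and the second is not, so half of the activations are fulfilled and half are violated, while three of the five events do not activate the constraint at all.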


Overview: Process Extension

Multiperspective Mining

Given

Log with information on originators

Process model

We add roles to the model

Assumption

Roles are characterized by a consistent set of originators

1. Dependencies as handover of roles

2. Remove dependencies below threshold

   Connected components are candidate roles

3. Merge candidate roles if the similarity of their user sets is above threshold

4. Entropy-based metric to tune thresholds
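Steps 2 and 3 can be sketched as a graph computation. The sketch treats the per-dependency score as an opaque input (how the handover-of-roles score is computed is not shown on the slide), uses Jaccard similarity between originator sets for the merge, and skips the entropy-based tuning of step 4; all names and numbers are illustrative assumptions.

```python
def candidate_roles(activities, scored_deps, threshold):
    """Step 2: drop dependencies scoring below the threshold;
    connected components of what remains are the candidate roles."""
    adjacent = {a: set() for a in activities}
    for (a, b), score in scored_deps.items():
        if score >= threshold:
            adjacent[a].add(b)
            adjacent[b].add(a)
    seen, roles = set(), []
    for start in activities:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adjacent[node] - component)
        seen |= component
        roles.append(component)
    return roles


def merge_roles(roles, originators, threshold):
    """Step 3: merge roles whose originator sets are similar enough."""
    def users(role):
        return set().union(*(originators[a] for a in role))

    roles = [set(r) for r in roles]
    merged = True
    while merged:
        merged = False
        for i in range(len(roles)):
            for j in range(i + 1, len(roles)):
                u, v = users(roles[i]), users(roles[j])
                if u | v and len(u & v) / len(u | v) >= threshold:
                    roles[i] |= roles.pop(j)
                    merged = True
                    break
            if merged:
                break
    return roles


activities = ["A", "B", "C", "D", "E"]
scored_deps = {("A", "B"): 0.9, ("B", "C"): 0.2, ("C", "D"): 0.8}
originators = {"A": {"ann"}, "B": {"ann", "bob"}, "C": {"carl"},
               "D": {"carl"}, "E": {"ann", "bob"}}

candidates = candidate_roles(activities, scored_deps, threshold=0.5)
roles = merge_roles(candidates, originators, threshold=0.5)
print(candidates)
print(roles)
```

Activity E is isolated in the graph, so it starts as its own candidate role, but it is performed by the same people as A and B and is therefore merged with them.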


Overview: Stream Control-flow Mining

Stream Context

Stream Mining Peculiarities

Cannot store the entire stream

  Approximation

Backtracking not feasible

  One pass over data

Variable system condition

  E.g., fluctuating stream rates

Adapt the model to new data

  Concept drifts

Completely new problems!

Principle

Recent observations are more important than older ones

Three versions of Heuristics Miner

Based on Sliding Window

Based on Lossy Counting

Based on Budget Lossy Counting
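To show what a Lossy Counting based variant has to maintain, here is a sketch of the standard Lossy Counting algorithm applied to directly-follows pairs observed on an event stream. It illustrates the general technique only: the way pairs are extracted, the unbounded `last_activity` map, and the names `LossyCounter` and `count_directly_follows` are simplifying assumptions, and the step from these counts to a Heuristics Miner model is omitted.

```python
import math


class LossyCounter:
    """Lossy Counting: approximate frequencies in one pass with bounded error."""

    def __init__(self, epsilon):
        self.width = math.ceil(1 / epsilon)  # bucket width
        self.n = 0                           # items seen so far
        self.entries = {}                    # item -> [frequency, max_error]

    def add(self, item):
        self.n += 1
        bucket = math.ceil(self.n / self.width)
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:         # bucket boundary: prune rare items
            self.entries = {
                key: value for key, value in self.entries.items()
                if value[0] + value[1] > bucket
            }


def count_directly_follows(stream, epsilon):
    """stream yields (case_id, activity); counts directly-follows pairs.

    Simplification: last_activity grows with the number of cases, whereas a
    real stream miner would have to bound this structure as well.
    """
    counter, last_activity = LossyCounter(epsilon), {}
    for case_id, activity in stream:
        if case_id in last_activity:
            counter.add((last_activity[case_id], activity))
        last_activity[case_id] = activity
    return counter


stream = [("c1", "A"), ("c2", "A"), ("c1", "B"), ("c2", "B"), ("c1", "C"), ("c2", "B")]
counter = count_directly_follows(stream, epsilon=0.25)
print(counter.entries)
```

After the fourth pair, the bucket boundary prunes every pair seen only once, so only the frequent A followed by B survives. Memory stays bounded, at the price of forgetting infrequent behaviour.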



Extra: Processes and Logs Generator

Companies are reluctant to share their data

Researchers need to do tests

(No BPI challenges at that time)

Processes and Logs Generator

Stochastic context-free grammar generates random processes

Rules to simulate a process and produce an event log

Reference model used for the evaluation of control-flow mining algorithms

[Figure: derivation tree of a random process generated by the grammar, starting from P, expanding a_start G a_end through productions such as (G ; G) and A ; (G ∧ G) ; A, down to the activities a to g.]
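The generator idea can be sketched with a tiny stochastic grammar and a simulator. Only three productions are modelled here (a single activity, a sequence, and an AND block wrapped between two activities); the rule probabilities, the depth limit, the tuple encoding of processes, and the function names are all assumptions for illustration and do not reproduce the actual tool.

```python
import random
from itertools import count


def generate(rng, ids, depth=3):
    """Expand G by picking a production at random (weights are assumed)."""
    def activity():
        return ("act", f"a{next(ids)}")

    rule = "act"
    if depth > 0:
        rule = rng.choices(["act", "seq", "and"], weights=[0.4, 0.3, 0.3])[0]
    if rule == "act":
        return activity()
    left = generate(rng, ids, depth - 1)
    right = generate(rng, ids, depth - 1)
    if rule == "seq":                        # (G ; G)
        return ("seq", [left, right])
    # A ; (G AND G) ; A
    return ("seq", [activity(), ("and", [left, right]), activity()])


def activities(node):
    """All activity names of a process, in definition order."""
    kind, body = node
    if kind == "act":
        return [body]
    return [name for child in body for name in activities(child)]


def simulate(rng, node):
    """Produce one trace: sequences concatenate, AND branches interleave."""
    kind, body = node
    if kind == "act":
        return [body]
    parts = [simulate(rng, child) for child in body]
    if kind == "seq":
        return [event for part in parts for event in part]
    trace = []
    while any(parts):                        # random interleaving of branches
        branch = rng.choice([part for part in parts if part])
        trace.append(branch.pop(0))
    return trace


rng = random.Random(7)
process = generate(rng, count(1))
log = [simulate(rng, process) for _ in range(5)]

fixed = ("seq", [("act", "a"), ("and", [("act", "b"), ("act", "c")]), ("act", "d")])
fixed_trace = simulate(rng, fixed)
print(process)
print(fixed_trace)
```

Because the process that produced the log is known, a model mined from `log` can be compared against `process` directly, which is what makes a generator useful as a reference for evaluating control-flow mining algorithms.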


Detailed Map of Performed Activities

Process Representation (e.g. Dependency Graph, Petri Net)

Legacy, Process-unaware Information Systems

Process Mining Capable Event Logs

Data Preparation

Control-flow Mining Algorithm Exploiting More Data

Event Logs Generator

User-guided Discovery Algorithm Configuration

Automatic Algorithm Configuration

Process Mining Capable Event Stream

Stream Control-flow Mining Framework

Model Evaluation (wrt Log / Original Model)

Model-to-model Metric

Model-to-log Metric

Random Process Generator

Extension of Process Models with Organizational Roles

Thanks!

Doing the Ph.D. has been amazing!

A huge thank you to:

My supervisor, Alessandro Sperduti

Siav S.p.A. and Roberto Pinelli

My internal examiners: Tullio Vardanega, Paolo Baldan

My external examiners: Barbara Weber, Diogo Ferreira

All the process mining community!
