
On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Date post: 07-Nov-2014
Upload: wil-van-der-aalst
Transcript
Page 1: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Joos Buijs

Boudewijn van Dongen

Wil van der Aalst

http://www.win.tue.nl/coselog/

Page 2: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Advances in Process Mining

• Many process discovery and conformance checking algorithms and tools are available (cf. the various ProM packages).

• Commercial software based on these ideas is also available: Disco (Fluxicon), Reflect (Futura/Perceptive), BPMOne (Pallas Athena/Perceptive), ARIS Process Performance Manager (Software AG), Interstage Automated Process Discovery (Fujitsu), QPR ProcessAnalyzer/Analysis (QPR Software), flow (fourspark), Discovery Analyst (StereoLOGIC), etc.

• We have applied process mining in over 100 organizations.

More than 75 people from more than 50 organizations created the Process Mining Manifesto in the context of the IEEE Task Force on Process Mining.

Available in 13 languages

Page 3: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Example: Process Discovery (Vestia, Dutch housing agency, 208 cases, 5987 events)

Page 4: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Example: Conformance Checking (WOZ objections, Dutch municipality, 745 objections, 9583 events, f = 0.988)

Page 5: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Challenge: Four Competing Quality Criteria

[Figure: process discovery in the center, pulled by four quality criteria]

• replay fitness: “able to replay event log”

• simplicity: “Occam’s razor”

• generalization: “not overfitting the log”

• precision: “not underfitting the log”

Page 6: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Example: one log four models

[Figure: four Petri nets, N1 to N4, for the same event log]

Activities: a = register request, b = examine thoroughly, c = examine casually, d = check ticket, e = decide, f = reinitiate request, g = pay compensation, h = reject request

N1 : fitness = +, precision = +, generalization = +, simplicity = +

N2 : fitness = -, precision = +, generalization = -, simplicity = +

N3 : fitness = +, precision = -, generalization = +, simplicity = +

N4 : fitness = +, precision = +, generalization = -, simplicity = -

Event log (21 trace variants, 1391 traces in total):

#      trace
455    acdeh
191    abdeg
177    adceh
144    abdeh
111    acdeg
82     adceg
56     adbeh
47     acdefdbeh
38     adbeg
33     acdefbdeh
14     acdefbdeg
11     acdefdbeg
9      adcefcdeh
8      adcefdbeh
5      adcefbdeg
3      acdefbdefdbeg
2      adcefdbeg
2      adcefbdefbdeg
1      adcefdbefbdeh
1      adbefbdefdbeg
1      adcefdbefcdefdbeg
1391   (total)

Page 7: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Model N1

[Figure: the event log from Page 6 (21 trace variants, 1391 traces) next to Petri net N1, which contains all eight activities a to h]

N1 : fitness = +, precision = +, generalization = +, simplicity = +

Page 8: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Model N2

[Figure: the event log from Page 6 (21 trace variants, 1391 traces) next to Petri net N2, which contains only the activities a, c, d, e and h]

N2 : fitness = -, precision = +, generalization = -, simplicity = +

Page 9: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Model N3

[Figure: the event log from Page 6 (21 trace variants, 1391 traces) next to Petri net N3, which contains all eight activities a to h]

N3 : fitness = +, precision = -, generalization = +, simplicity = +

Page 10: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Model N4

[Figure: the event log from Page 6 (21 trace variants, 1391 traces) next to Petri net N4, which has a separate sequential branch for each trace variant]

N4 : fitness = +, precision = +, generalization = -, simplicity = -


… (all 21 variants seen in the log)
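A rough way to see why N2 scores poorly on fitness and N4 scores poorly on generalization is to treat each model as the set of traces it accepts. The sketch below is an illustration only. It assumes that N2 accepts exactly the trace acdeh and that N4 accepts exactly the 21 observed variants, and it uses the fraction of fitting traces as a simple stand-in for replay fitness, which is not the metric used in the paper. The names `LOG`, `fitting_fraction` and the unseen trace are hypothetical.

```python
# Trace variants and their frequencies, as listed in the event log table.
LOG = {
    "acdeh": 455, "abdeg": 191, "adceh": 177, "abdeh": 144, "acdeg": 111,
    "adceg": 82, "adbeh": 56, "acdefdbeh": 47, "adbeg": 38, "acdefbdeh": 33,
    "acdefbdeg": 14, "acdefdbeg": 11, "adcefcdeh": 9, "adcefdbeh": 8,
    "adcefbdeg": 5, "acdefbdefdbeg": 3, "adcefdbeg": 2, "adcefbdefbdeg": 2,
    "adcefdbefbdeh": 1, "adbefbdefdbeg": 1, "adcefdbefcdefdbeg": 1,
}

# Assumption: each model is approximated by the set of traces it accepts.
N2 = {"acdeh"}   # only the most frequent variant
N4 = set(LOG)    # exactly the variants seen in the log


def fitting_fraction(model, log):
    """Fraction of traces in the log that the model can replay completely."""
    total = sum(log.values())
    fitting = sum(count for trace, count in log.items() if trace in model)
    return fitting / total


n2_fit = fitting_fraction(N2, LOG)   # 455 of the 1391 traces fit
n4_fit = fitting_fraction(N4, LOG)   # every observed trace fits

# Hypothetical unseen trace: thorough examination, reinitiate,
# casual examination, then reject.
unseen = "abdefcdeh"
n4_accepts_unseen = unseen in N4     # N4 knows nothing beyond the log
```

Under these assumptions N2 replays roughly a third of the traces, while N4 replays all of them but rejects any variant that did not happen to occur in the log.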

Page 11: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Another challenge: Huge search space

[Figure: the search space, drawn as a large cloud of candidate models, each marked "M"]

Page 12: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

… with just a few interesting candidates

[Figure: the same cloud of candidate models, each marked "M"]

Page 13: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Two requirements

1. It should be possible to seamlessly balance the different quality criteria based on user-defined preferences.

2. The algorithm should always return a "correct" process model and not waste time on models with deadlocks and other anomalies.


Page 14: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Proposal: Evolutionary Tree Miner (ETM)

• Process trees as representation (= limit search space to "good" models).

• Genetic approach (= very flexible).

• Fitness function uses all four criteria (= seamlessly balance the different "forces").

Page 15: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Representational Bias: Process Trees

• Always sound because of the block structure

• Also Loop and OR operator
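To make the block structure concrete, here is a minimal sketch of a process tree and the traces it can produce. It is not the ETM data structure. It assumes single-character activity labels, a simplified two-child loop (a "do" part and a "redo" part), a bound on loop repetitions so the language stays finite, and it leaves out the OR operator. All names (`Leaf`, `Node`, `language`, `interleave`) are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Tuple, Union


@dataclass(frozen=True)
class Leaf:
    label: str                      # a single activity, e.g. "a"


@dataclass(frozen=True)
class Node:
    op: str                         # "seq", "xor", "and" or "loop"
    children: Tuple["Tree", ...]


Tree = Union[Leaf, Node]


def interleave(u, v):
    """All interleavings of two traces that keep each trace's own order."""
    if not u:
        return {v}
    if not v:
        return {u}
    return ({u[0] + w for w in interleave(u[1:], v)}
            | {v[0] + w for w in interleave(u, v[1:])})


def language(tree, max_loops=2):
    """Traces of the tree, with at most max_loops repetitions per loop."""
    if isinstance(tree, Leaf):
        return {tree.label}
    langs = [language(child, max_loops) for child in tree.children]
    if tree.op == "xor":            # exactly one child is executed
        return set().union(*langs)
    if tree.op == "seq":            # children are executed left to right
        return {"".join(parts) for parts in product(*langs)}
    if tree.op == "and":            # children are executed in parallel
        result = {""}
        for lang in langs:
            result = {w for u in result for v in lang for w in interleave(u, v)}
        return result
    if tree.op == "loop":           # do, then (redo, do) zero or more times
        do, redo = langs
        result, current = set(do), set(do)
        for _ in range(max_loops):
            current = {u + r + d for u in current for r in redo for d in do}
            result |= current
        return result
    raise ValueError(f"unknown operator: {tree.op}")


# A loop over A with redo part B yields A (BA)*, cut off after two repetitions.
loop_traces = language(Node("loop", (Leaf("A"), Leaf("B"))))

# a, then (b or c) in parallel with d, then e, then g or h.
example = Node("seq", (
    Leaf("a"),
    Node("and", (Node("xor", (Leaf("b"), Leaf("c"))), Leaf("d"))),
    Leaf("e"),
    Node("xor", (Leaf("g"), Leaf("h"))),
))
example_traces = language(example)
```

Because every operator node fully contains its children, any tree built this way can always run to completion, which is the intuition behind "always sound".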

[Figure: example process trees over activities A to E, including one annotated with the language A (BA)*]

Page 16: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Petri Net Semantics (used for comparison and conformance checking only)

[Figure: Petri net translation of each process tree operator over activities A and B: Sequence, Exclusive Choice, Loop, Parallelism, Or Choice]

Page 17: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Steps of the Genetic ETM Algorithm

1. Create Initial Population

2. Measure Quality

3. Stop? If yes: Return Best Individual

4. If no: Change Population, then go back to step 2

Page 18: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Population Change

[Figure: Population i is turned into Population i+1 through the steps Elite, Selection, Crossover, Mutation and Replace]
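The loop and the population change described on these two slides can be sketched generically as follows. This is a minimal illustration, not the ETM implementation: tournament selection, the parameter names and the toy individuals (bit lists instead of process trees) are all assumptions made to keep the example runnable.

```python
import random


def evolve(population, quality, crossover, mutate, rng,
           elite_size=2, max_generations=50, target=1.0):
    """Generic evolutionary loop: measure, stop or change population, repeat."""
    for _ in range(max_generations):
        ranked = sorted(population, key=quality, reverse=True)  # measure quality
        if quality(ranked[0]) >= target:                        # stop?
            break
        elite = ranked[:elite_size]                             # copied unchanged
        offspring = []
        while len(offspring) < len(population) - elite_size:
            # Assumed selection scheme: best of two random candidates, twice.
            parent1 = max(rng.sample(ranked, 2), key=quality)
            parent2 = max(rng.sample(ranked, 2), key=quality)
            offspring.append(mutate(crossover(parent1, parent2, rng), rng))
        population = elite + offspring                          # replace the rest
    return max(population, key=quality)                         # best individual


# Toy stand-in for process trees: bit lists scored by their share of ones.
def quality(bits):
    return sum(bits) / len(bits)


def crossover(a, b, rng):
    cut = rng.randrange(1, len(a))          # one-point crossover
    return a[:cut] + b[cut:]


def mutate(bits, rng):
    i = rng.randrange(len(bits))            # flip one random bit
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]


rng = random.Random(42)
initial = [[rng.randint(0, 1) for _ in range(12)] for _ in range(20)]
best = evolve(initial, quality, crossover, mutate, rng)
```

Keeping the elite unchanged means the best quality in the population can never drop from one generation to the next. The search can still take many generations to converge.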

Page 19: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Four Metrics (see paper)

[Figure: the four quality criteria around process discovery: replay fitness, precision, generalization, simplicity]

1 = optimal, 0 = very bad

Page 20: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Example

[Figure: process model with activities A to G]

A = send e-mail, B = check credit, C = calculate capacity, D = check system, E = accept, F = reject, G = send e-mail

Page 21: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Conventional Algorithms (1/3) ("best effort" mapping to process trees to allow for comparison)

[Figure: models discovered by the alpha miner, the ILP miner and the language-based region miner. Slide annotations: "sound" (three times), "low fitness" (twice), "low precision", "… lucky"]

Page 22: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Conventional Algorithms (2/3)

[Figure: models discovered by the heuristic miner and the multi-phase miner. Slide annotations: "unsound", "unsound (relaxed sound)"]

Page 23: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Conventional Algorithms (3/3)

[Figure: models discovered by the state-based region miner and the genetic miner. Slide annotations: "unsound", "sound"]

Page 24: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Often unsound result and no mechanism to seamlessly balance the four criteria


unsound models

ignored criteria

Page 25: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Genetic Mining (ETM) While Considering Only One Criterion

[Figure: ETM results per criterion, with the annotation "best value possible for this log"]

ETM with weight zero for three of the four perspectives.

Page 26: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Considering Replay Fitness and One Other Criterion

ETM with weight zero for two of the four perspectives.

Page 27: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Considering 3 of 4 Criteria

Replay fitness needs to have a larger weight.

Page 28: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Considering All Four Criteria with Emphasis on Fitness


fitness has weight 10
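One way to read the weights on these slides is as a weighted average of the four metrics, each scaled to the range 0 to 1. The sketch below assumes that form. The slides only say that the fitness function uses all four criteria, so the formula, the function name `overall_quality` and the example scores are hypothetical.

```python
CRITERIA = ("fitness", "precision", "generalization", "simplicity")


def overall_quality(scores, weights):
    """Weighted average of the four metrics (assumed form of the combination)."""
    for name in CRITERIA:
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    total_weight = sum(weights[name] for name in CRITERIA)
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    weighted = sum(weights[name] * scores[name] for name in CRITERIA)
    return weighted / total_weight


# Hypothetical scores for one candidate model.
scores = {"fitness": 0.9, "precision": 0.8,
          "generalization": 0.7, "simplicity": 1.0}

# Equal weights for all criteria.
equal = overall_quality(scores, dict.fromkeys(CRITERIA, 1))

# Emphasis on replay fitness: weight 10 versus weight 1 for the others.
emphasis = overall_quality(
    scores,
    {"fitness": 10, "precision": 1, "generalization": 1, "simplicity": 1},
)

# Weight zero for three of the four criteria: only precision counts.
only_precision = overall_quality(
    scores,
    {"fitness": 0, "precision": 1, "generalization": 0, "simplicity": 0},
)
```

With these made-up scores, raising the fitness weight pulls the overall value toward the fitness score, and setting three weights to zero reduces it to the one remaining metric.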

Page 29: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Initial Model Versus Discovered Model

[Figure: the initial (simulated) model with activities A to G, next to the model discovered by ETM]

The discovered model outperforms the initial model with respect to all criteria!

Better than existing algorithms (but patience is needed)!

Page 30: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Real-Life Event Logs

• Event log L0 is the event log used before. L0 contains 100 traces, 590 events and 7 activities.

• Event Log L1 contains 105 traces, 743 events in total, with 6 different activities.

• Event Log L2 contains 444 traces, 3,269 events in total, with 6 different activities.

• Event Log L3 contains 274 traces, 1,582 events in total, with 6 different activities.


Event logs L1, L2 and L3 are extracted from the information systems of municipalities participating in the CoSeLoG project (http://www.win.tue.nl/coselog/).

Page 31: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Results


Equal weights for all criteria.

If a model is unsound, its sound behavior is approximated when creating the process tree.

Page 32: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Conclusion

• First algorithm that allows for balancing all four perspectives.

• Genetic algorithm is very flexible, but also very slow.

• Process trees only used internally (choose your favorite representation)

• Future work:

− Improve speed

− Distribute PM tasks

− Discover configurable process trees


Page 33: On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery


www.processmining.org

www.win.tue.nl/ieeetfpm/

