Process-oriented System Analysis Process Mining. BPM Lifecycle.

Post on 05-Jan-2016


Process-oriented System Analysis: Process Mining

BPM Lifecycle

Motivation

Up until now: designed or pre-defined models

Assumption that they are appropriate

Process Mining

Consideration of information from the execution of processes

This information is captured in log data

Logs

Sequence of log entries, which capture events in a company that relate to processes

Log entries

Examples of log entries

Check Invoice for Invoice No. 4567 completed on 12.11.2010 at 9:19:57

Function StoreCustomerData(„Müller“, c1987, „Bad Bentheim“) completed on 12.11.2010 at 9:22:24

Send Invoice for Invoice No. 4567 completed on 12.11.2010 at 9:23:18

Function ContactCustomer(c1987, PromoMailing) completed on 12.11.2010 at 9:24:10

Function StoreCustomerData(„Miller“, c1988, „Osnabrück“) completed on 12.11.2010 at 9:26:08

Check Invoice for Invoice No. 4568 completed on 12.11.2010 at 9:26:38

Function ContactCustomer(c1988, PromoMailing) completed on 12.11.2010 at 9:27:32

Logs bear valuable information

Logs bear valuable information to answer questions like:

When and how many process instances have been executed?

Are there recurring patterns in the execution of activities?

Can process models be derived from the data?

Which paths of execution are used how often in the process models?

Are there paths which are never taken?

Process Discovery

Process Discovery is a technique for deriving a process model from log data

Input: execution logs as ordered lists of activities with time stamp and case id

Output: process model which could have generated the execution logs

The case id is often not directly recorded in the data and needs to be generated during pre-processing

Process Conformance

Process Conformance is a technique to analyze the relationship between log data and process models

Input: Logs and process model

Output: information on the relationship, e.g. fitness
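As a rough illustration of what such an output might look like, the sketch below computes one very simple notion of fitness: the share of logged execution sequences that the model can reproduce. The slides do not define fitness, so this definition, the function name trace_fitness, and the representation of the model as a set of allowed sequences are all assumptions made for this example.

```python
def trace_fitness(log, allowed_sequences):
    """Share of logged sequences that the model can reproduce (0.0 to 1.0)."""
    if not log:
        raise ValueError("log must contain at least one execution sequence")
    replayable = sum(1 for sequence in log if sequence in allowed_sequences)
    return replayable / len(log)


# Hypothetical model: B and C may occur in either order between A and D.
model = {"ABCD", "ACBD"}
# Hypothetical log with four cases; the last case skips activity C.
log = ["ABCD", "ACBD", "ABCD", "ABD"]

fitness = trace_fitness(log, model)
print(fitness)  # 0.75
```

Three of the four logged cases fit the model, so this simple measure yields 0.75.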

Overview

Execution Logs

Assumption

Execution log defines a complete order of events, which can all be related to process activities

All events in the execution log relate to process instances of the considered process

Hint

Often log entries refer to different process models

This warrants filtering activities

Abstraction

Techniques often work on an abstraction of logs

Focus on case id and activities

Execution Log Format

Log format

(caseID, activity)

Example

Check Invoice for Invoice No. 4567 completed on 12.11.2010 at 9:19:57

Function StoreCustomerData(„Müller“, c1987, „Bad Bentheim“) completed on 12.11.2010 at 9:22:24

Send Invoice for Invoice No. 4567 completed on 12.11.2010 at 9:23:18

Resulting log

(4567, Check Invoice), (c1987, StoreCustomerData), (4567, Send Invoice), etc.
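The abstraction step above could be sketched as follows. This is only an illustration: the regular expressions are tailored to the three example entries, and the choice of the invoice number or the customer id (an argument of the form c followed by digits) as case id is an assumption made for this sketch. The names to_pair, INVOICE_ENTRY, FUNCTION_ENTRY and CUSTOMER_ID are hypothetical.

```python
import re

# Patterns fitted to the example entries only (an assumption, not a general log parser).
INVOICE_ENTRY = re.compile(r"^(?P<activity>.+?) for Invoice No\. (?P<case>\d+) completed on ")
FUNCTION_ENTRY = re.compile(r"^Function (?P<activity>\w+)\((?P<args>.*)\) completed on ")
CUSTOMER_ID = re.compile(r"\bc\d+\b")


def to_pair(entry):
    """Map one raw log entry to (caseID, activity), or None if it is not recognized."""
    match = INVOICE_ENTRY.match(entry)
    if match:
        return (match.group("case"), match.group("activity"))
    match = FUNCTION_ENTRY.match(entry)
    if match:
        customer = CUSTOMER_ID.search(match.group("args"))
        if customer:
            return (customer.group(0), match.group("activity"))
    return None


raw_entries = [
    "Check Invoice for Invoice No. 4567 completed on 12.11.2010 at 9:19:57",
    "Function StoreCustomerData(„Müller“, c1987, „Bad Bentheim“) completed on 12.11.2010 at 9:22:24",
    "Send Invoice for Invoice No. 4567 completed on 12.11.2010 at 9:23:18",
]

pairs = [to_pair(entry) for entry in raw_entries]
print(pairs)
# [('4567', 'Check Invoice'), ('c1987', 'StoreCustomerData'), ('4567', 'Send Invoice')]
```

Time stamps are dropped on purpose here, since the abstraction only keeps case id and activity.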

Execution Log

Further abstraction

A's and B's

(case id, task id)

Additional information

Event type, time, resource, data

Not considered here

Assumption

Activity execution captured by one event

No intermediate activities

case 1 : task A
case 2 : task A
case 3 : task A
case 3 : task B
case 1 : task B
case 1 : task C
case 2 : task C
case 4 : task A
case 2 : task B
case 2 : task D
case 5 : task E
case 4 : task C
case 1 : task D
case 3 : task C
case 3 : task D
case 4 : task B
case 5 : task F
case 4 : task D

The Alpha Algorithm

Process Discovery Algorithms

Simplest algorithm: the α-Algorithm

Relatively simple, some properties can be proven

Affected by noise, therefore not the first choice in practice

Noise refers to incomplete or erroneous logs

Furthermore, the α+ and α++ algorithms

α+ and α++ are extensions of the α-Algorithm for recognizing more fine-grained structure in the process model

Also affected by noise

Finally, techniques for dealing with noise

Definitions

Let T be a set of activities (tasks) and T* the set of all sequences of arbitrary length over T. Then:

σ ∈ T* is called an execution sequence if all activities in σ belong to the same process instance

W ⊆ T* is called an execution log (workflow log)

Assumptions

In each process model, each activity appears at most once

Each direct neighbor relation between activities is represented at least once

Execution Logs

case 1 : task A
case 2 : task A
case 3 : task A
case 3 : task B
case 1 : task B
case 1 : task C
case 2 : task C
case 4 : task A
case 2 : task B
case 2 : task D
case 5 : task E
case 4 : task C
case 1 : task D
case 3 : task C
case 3 : task D
case 4 : task B
case 5 : task F
case 4 : task D

Execution sequences:

Case 1: ABCD

Case 2: ACBD

Case 3: ABCD

Case 4: ACBD

Case 5: EF

Resulting workflow log: W = {ABCD, ACBD, EF}
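The grouping of events into execution sequences can be reproduced with a few lines of Python. This is a minimal sketch: the events are written as (case id, task id) tuples in log order, and the function name execution_sequences is an assumption made for illustration.

```python
from collections import defaultdict

# The example log as (case id, task id) pairs, in the order in which the events were logged.
events = [
    (1, "A"), (2, "A"), (3, "A"), (3, "B"), (1, "B"), (1, "C"),
    (2, "C"), (4, "A"), (2, "B"), (2, "D"), (5, "E"), (4, "C"),
    (1, "D"), (3, "C"), (3, "D"), (4, "B"), (5, "F"), (4, "D"),
]


def execution_sequences(events):
    """Concatenate the tasks of each case, preserving the order of the log."""
    sequences = defaultdict(str)
    for case_id, task in events:
        sequences[case_id] += task
    return dict(sequences)


sequences = execution_sequences(events)
workflow_log = set(sequences.values())

print(sequences)     # {1: 'ABCD', 2: 'ACBD', 3: 'ABCD', 4: 'ACBD', 5: 'EF'}
print(workflow_log)  # the three distinct sequences ABCD, ACBD and EF
```

Because W is a set, the two occurrences each of ABCD and ACBD collapse into one element; the frequency of each sequence is lost at this point.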

Order relations

Log-based order relations for pairs of activities a, b ∈ T in a workflow log W:

Direct successor

a >w b, i.e. in an execution sequence b directly follows a

Causality

a →w b, i.e. a >w b and not b >w a

Concurrency

a ║w b, i.e. a >w b and b >w a

Exclusiveness

a #w b, i.e. not a >w b and not b >w a

Activity pairs which never succeed each other

Execution log analysis

W = {ABCD, ACBD, EF}

1) Direct successor: A>B, A>C, B>C, B>D, C>B, C>D, E>F

2) Causality: A→B, A→C, B→D, C→D, E→F

3) Concurrency: B||C, C||B
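The relations for W = {ABCD, ACBD, EF} can be recomputed directly from the definitions of the order relations. The sketch below assumes that each task id is a single character and that an execution sequence is a string; the function name order_relations is hypothetical.

```python
from itertools import product


def order_relations(workflow_log):
    """Derive the four log-based order relations from a set of execution sequences."""
    tasks = sorted({task for sequence in workflow_log for task in sequence})
    # a >w b: b directly follows a in at least one execution sequence.
    direct = {
        (sequence[i], sequence[i + 1])
        for sequence in workflow_log
        for i in range(len(sequence) - 1)
    }
    # a ->w b: a >w b and not b >w a.
    causal = {(a, b) for (a, b) in direct if (b, a) not in direct}
    # a ||w b: a >w b and b >w a.
    concurrent = {(a, b) for (a, b) in direct if (b, a) in direct}
    # a #w b: neither a >w b nor b >w a.
    exclusive = {
        (a, b)
        for a, b in product(tasks, repeat=2)
        if (a, b) not in direct and (b, a) not in direct
    }
    return direct, causal, concurrent, exclusive


W = {"ABCD", "ACBD", "EF"}
direct, causal, concurrent, exclusive = order_relations(W)

print(sorted(direct))      # AB, AC, BC, BD, CB, CD, EF as pairs
print(sorted(causal))      # AB, AC, BD, CD, EF as pairs
print(sorted(concurrent))  # BC, CB as pairs
```

Note that the exclusiveness relation computed this way also contains pairs such as (A, D) and (A, A), since these activities never directly follow each other in W.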

α-Algorithm

The idea is to utilize order relations for deriving a workflow net that is compliant with these relations

More precisely, each order relation results in a Petri net fragment that imposes the respective relationship

Idea (a)

a → b

Idea (b)

a → b, a → c and b # c

Idea (c)

b → d, c → d and b # c

Idea (d)

a → b, a → c and b || c

Idea (e)

b → d, c → d and b || c

The Alpha-Algorithm (simplified)

1. Identify the set of all tasks in the log as TL.

2. Identify the set of all tasks that have been observed as the first task in some case as TI.

3. Identify the set of all tasks that have been observed as the last task in some case as TO.

4. Identify the set of all connections to be potentially represented in the process model as a set XL. Add the following elements to XL:

a. Pattern (a): all pairs for which a→b holds.

b. Pattern (b): all triples for which a→(b#c) holds.

c. Pattern (c): all triples for which (b#c)→d holds.

Note that triples for which Pattern (d) a→(b||c) or Pattern (e) (b||c)→d hold are not included in XL.

The Alpha-Algorithm (cont.)

5. Construct the set YL as a subset of XL by:

a. Eliminating a→b and a→c if there exists some a→(b#c).

b. Eliminating b→d and c→d if there exists some (b#c)→d.

6. Connect start and end events in the following way:

a. If there are multiple tasks in the set TI of first tasks, then draw a start event leading to an XOR-split, which connects to every task in TI. Otherwise, directly connect the start event with the only first task.

b. For each task in the set TO of last tasks, add an end event and draw an arc from the task to the end event.

The Alpha-Algorithm (cont.)

7. Construct the flow arcs in the following way:

a. Pattern (a): For each a→b in YL, draw an arc from a to b.

b. Pattern (b): For each a→(b#c) in YL, draw an arc from a to an XOR-split, and from there to b and c.

c. Pattern (c): For each (b#c)→d in YL, draw an arc from b and c to an XOR-join, and from there to d.

d. Pattern (d) and (e): If a task in the so constructed process model has multiple incoming or multiple outgoing arcs, bundle these arcs with an AND-split or AND-join, respectively.

8. Return the newly constructed process model.
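Steps 1 to 5 of the simplified algorithm can be sketched in Python as shown below. Steps 6 to 8 (drawing events, gateways and arcs) are left out. The encoding of the patterns as tagged tuples, the function name simplified_alpha_sets, and the second test log {ABD, ACD} are assumptions made for this illustration; execution sequences are assumed to be non-empty strings of single-character task ids.

```python
from itertools import combinations


def simplified_alpha_sets(workflow_log):
    """Compute TL, TI, TO, XL and YL (steps 1 to 5) for a set of execution sequences."""
    tl = {task for seq in workflow_log for task in seq}  # step 1: all tasks
    ti = {seq[0] for seq in workflow_log}                # step 2: first tasks
    to = {seq[-1] for seq in workflow_log}               # step 3: last tasks

    direct = {(s[i], s[i + 1]) for s in workflow_log for i in range(len(s) - 1)}
    causal = {(a, b) for (a, b) in direct if (b, a) not in direct}

    def exclusive(b, c):
        return (b, c) not in direct and (c, b) not in direct

    # Step 4: pattern (a) pairs, pattern (b) and pattern (c) triples.
    xl = {("a", a, b) for (a, b) in causal}
    for b, c in combinations(sorted(tl), 2):
        if not exclusive(b, c):
            continue
        for a in tl:
            if (a, b) in causal and (a, c) in causal:
                xl.add(("b", a, (b, c)))  # a -> (b # c)
        for d in tl:
            if (b, d) in causal and (c, d) in causal:
                xl.add(("c", (b, c), d))  # (b # c) -> d

    # Step 5: drop the pairs that are already covered by a triple.
    covered = set()
    for kind, left, right in xl:
        if kind == "b":
            covered |= {("a", left, right[0]), ("a", left, right[1])}
        elif kind == "c":
            covered |= {("a", left[0], right), ("a", left[1], right)}
    yl = xl - covered
    return tl, ti, to, xl, yl


# The example log of the slides: B and C are concurrent, so no triples arise.
tl, ti, to, xl, yl = simplified_alpha_sets({"ABCD", "ACBD", "EF"})
print(sorted(ti), sorted(to))  # ['A', 'E'] ['D', 'F']
print(sorted(yl))              # the five pattern (a) pairs AB, AC, BD, CD, EF

# Hypothetical log with an exclusive choice between B and C.
choice_yl = simplified_alpha_sets({"ABD", "ACD"})[4]
print(sorted(choice_yl))       # [('b', 'A', ('B', 'C')), ('c', ('B', 'C'), 'D')]
```

For the example log, YL equals XL because B and C are concurrent; the AND-split after A and the AND-join before D only appear in step 7d, when the multiple arcs of A and D are bundled.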

α-Algorithm Example

W = {ABCD, ACBD, EF}

α(W): [Figure: process model discovered from W]

Log Completeness

Level of completeness required for a log

Assume the log entries for the execution sequence EF are missing

Then, the correct process model cannot be derived

Basic assumption: each execution sequence must be part of the log

Consequence: the complete behaviour is visible

Problem: the number of required instances grows dramatically

Example:

10 activities are executed in parallel

Number of potential execution sequences: 10! = 3,628,800
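The factorial growth is easy to check. The following sketch lists all orderings of three parallel activities and then evaluates n! for a few values of n; the activity names A, B and C are placeholders chosen for this example.

```python
import math
from itertools import permutations

# All possible execution sequences of three parallel activities.
three_parallel = ["".join(order) for order in permutations("ABC")]
print(three_parallel)  # ['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']

# Number of possible execution sequences for n parallel activities.
counts = {n: math.factorial(n) for n in (3, 5, 10)}
print(counts)  # {3: 6, 5: 120, 10: 3628800}
```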

Log Completeness

Result

For the α-Algorithm it is sufficient to have completeness in terms of the successor relationship (>w)

Reason

All other relations are derived from direct successorship

Interpretation

Each time two activities may succeed each other, this must be visible in at least one execution sequence

Hint

In case of highly concurrent process models, this reduces the number of required execution sequences dramatically
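Completeness with respect to the direct successor relation can be checked mechanically. The sketch below is an illustration under the assumption that a set of fully parallel activities is given; the function names direct_successors and is_successor_complete and the small example logs are hypothetical.

```python
from itertools import permutations


def direct_successors(log):
    """All pairs (a, b) such that b directly follows a in some execution sequence."""
    return {(s[i], s[i + 1]) for s in log for i in range(len(s) - 1)}


def is_successor_complete(log, parallel_tasks):
    """True if every ordered pair of distinct parallel tasks is observed as a direct succession."""
    required = set(permutations(parallel_tasks, 2))
    return required <= direct_successors(log)


# Three parallel activities: four of the six possible sequences already show all six pairs.
small_log = ["ABC", "CBA", "ACB", "BCA"]
complete = is_successor_complete(small_log, "ABC")
incomplete = is_successor_complete(["ABC", "CBA"], "ABC")
print(complete, incomplete)  # True False

# Ten parallel activities: ordered pairs to observe, compared with 10! = 3,628,800 sequences.
n = 10
required_pairs = n * (n - 1)
print(required_pairs)  # 90
```

The count of 90 is the number of direct successor pairs that must be visible for ten parallel activities, not the number of execution sequences needed; since each sequence of ten activities shows only nine pairs, at least ten sequences are required.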

Summary

• Execution Logs

• Process Mining using the Alpha-Algorithm