
Knowl Inf Syst
DOI 10.1007/s10115-013-0697-8

REGULAR PAPER

A Markov prediction model for data-driven semi-structured business processes

Geetika T. Lakshmanan · Davood Shamsi · Yurdaer N. Doganata · Merve Unuvar · Rania Khalaf

Received: 30 January 2012 / Revised: 25 September 2013 / Accepted: 26 September 2013
© Springer-Verlag London 2013

Abstract In semi-structured case-oriented business processes, the sequence of process steps is determined by case workers based on available document content associated with a case. Transitions between process execution steps are therefore case specific and depend on independent judgment of case workers. In this paper, we propose an instance-specific probabilistic process model (PPM) whose transition probabilities are customized to the semi-structured business process instance it represents. An instance-specific PPM serves as a powerful representation to predict the likelihood of different outcomes. We also show that certain instance-specific PPMs can be transformed into a Markov chain under some non-restrictive assumptions. For instance-specific PPMs that contain parallel execution of tasks, we provide an algorithm to map them to an extended space Markov chain. This way existing Markov techniques can be leveraged to make predictions about the likelihood of executing future tasks. Predictions provided by our technique could generate early alerts for case workers about the likelihood of important or undesired outcomes in an executing case instance. We have implemented and validated our approach on a simulated automobile insurance claims handling semi-structured business process. Results indicate that an instance-specific PPM provides more accurate predictions than other methods such as conditional probability. We also show that as more document data become available, the prediction accuracy of an instance-specific PPM increases.

Keywords Markov chain · Data driven · Decision tree · Business process · Prediction

G. T. Lakshmanan (B) · Y. N. Doganata · M. Unuvar · R. Khalaf
IBM T. J. Watson Research Center, 1101 Kitchawan Rd, Yorktown Heights, NY 10598, USA
e-mail: [email protected]

D. Shamsi
Management Science and Engineering, Stanford University, Stanford, CA, USA


1 Introduction

Traditional business process management systems have been focusing on fully structured, comprehensively modeled executable workflows. Lately, there has been a shift toward handling processes covering a wider range of behavior. Business processes in reality cover a spectrum from the traditional rigid processes (modeled and running under the auspices of a strict workflow management system) to completely ad hoc unstructured flows driven by humans over e-mail and phone. Traditional business process management (BPM) systems, at one end of this spectrum, demand a process model that can be completely defined in advance and typically include restrictions such as rigid control flow and context tunneling. As one moves toward the less rigid processes in this spectrum, it is common to refer to the processes as semi-structured. Semi-structured processes arise extensively in industries such as government, insurance, banking and healthcare [2]. Automobile insurance claims processing, handling of prescription drug orders and patient case management in a hospital are a few examples of such processes. These processes depart from the traditional kind of structured and sequential predefined processes since their lifecycle is not fully driven by a process model. In case-oriented semi-structured business processes, for example, the set of activities that need to be performed and whether additional steps are required are determined by human judgment and document contents [38]. There is typically a large amount of data associated with case-oriented semi-structured processes, and it comes from disparate data sources. Workers handling a case, also known as case workers, exercise independent judgment, while obeying company guidelines, in selecting the set and sequence of process steps for handling a case instance based on available document contents and information associated with the case. Case workers can begin working on several tasks in parallel and repeat one or more tasks. Values of available data may change at any stage during the execution of a case, and new data values may arrive, influencing a case worker's decision on how to proceed. Therefore, the set of tasks and the order of their execution in a semi-structured business process instance are not known a priori.

In an environment where the progress of cases depends on the analysis of large amounts of dynamic data and impending deadlines require rapid decision making, case handling is challenging and error prone, even for case workers with a high level of expertise. The large amount of data involved with a case instance might make it difficult for even an experienced case worker to form a coherent picture of the case. All these factors make it difficult for case managers to detect critical situations where intervention might be required and to enforce compliance with policies during runtime. The outcomes of past decisions, however, can be used to make better decisions in the future. The goal of our work is to build a probabilistic model based on past decisions of case workers. On the basis of certain assumptions, we demonstrate that this model is Markovian. Section 2 outlines a motivating example for our work and motivates our reasons for adopting a Markovian approach. The probabilistic model is then used to provide predictions to a case worker about the likelihood of any potential next step, including the final outcome of a case instance that the worker is handling. This is done on the basis of document content values belonging to the case instance that are available to the case worker. We refer to this as an instance-specific probabilistic process model (PPM). Predictions generated from an instance-specific PPM could provide early alerts and guidance to case workers and managers about certain undesired or important outcomes.

Fig. 1 Steps of our prediction technique

Our technique computes the likelihood of occurrence of any potential future task from a given task in a currently running case instance, based on the documents available in that instance. In order to capture the key characteristics of semi-structured business processes, we first define a generic state model in Sect. 4.1. On the basis of certain assumptions, we then create a Markov chain from this state model in Sect. 4.2. Figure 1 summarizes the steps of our prediction technique:

1. Step 1: Given a set of execution traces, we mine a process model from these traces.
2. Step 2: Learn a decision tree at every node in the process model using the raw execution traces.
3. Step 3: Given a partial trace of a running process instance, we use decision trees to compute the probability of each edge (one-step transition probabilities) in an instance-specific PPM and create a Markov chain.
4. Step 4: If an instance-specific PPM contains parallelism, we find one-step transition probabilities between tasks by creating an extended state Markov chain.
5. Step 5: If parallel paths exist in the model, future task probabilities are computed by using the concept of first passage time and assuming that the next task depends only on the current task. Otherwise, Markov methods including absorption or first passage time probabilities are applied to compute the probability that a future task will execute, using the Markov chain created in step 3.
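As a sketch of the Markov machinery invoked in step 5, absorption probabilities can be computed from the fundamental matrix N = (I − Q)^(-1), where Q holds the one-step transitions among transient states and R the transitions into absorbing states. The four-state chain below is an invented toy example, not the insurance process from the paper:

```python
import numpy as np

# Toy chain: states 0 and 1 are transient tasks, states 2 and 3 are
# absorbing final outcomes. Row i of [Q | R] is the one-step transition
# distribution out of transient state i (each row sums to 1).
Q = np.array([[0.1, 0.6],    # transient -> transient
              [0.3, 0.2]])
R = np.array([[0.3, 0.0],    # transient -> absorbing
              [0.1, 0.4]])

# Fundamental matrix: expected number of visits to each transient state.
N = np.linalg.inv(np.eye(2) - Q)

# B[i, j] = probability of eventually reaching absorbing outcome j
# when starting from transient task i.
B = N @ R
print(B.round(4))
```

Starting from state 0, the chain is absorbed in outcome 2 with probability 0.30/0.54 ≈ 0.556; each row of B sums to 1 because every run eventually reaches some final outcome.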

Steps 1–3, covering the creation of an instance-specific PPM with Markov properties, are described in Sect. 4.3. Section 4.4 provides a discussion of the accuracy of an instance-specific PPM. Section 4.5 introduces the concepts of absorption and first passage time probability and demonstrates how the former can be applied to compute the probability of arriving at a final outcome (an absorbing state) in the Markov chain, and the latter to compute the probability of arriving at a transient state from any other transient state in the Markov chain. Section 4.6 describes how to map an instance-specific PPM that contains parallel tasks to an extended space Markov chain, as summarized in step 4. Computing the likelihood of executing a future task requires applying the new one-step transition matrix generated for the extended space Markov chain, and this is described in Sect. 4.7. In Sect. 5, we present the results of applying our prediction approach to a data-driven automobile insurance semi-structured business process implemented in a simulator. In Sect. 6, we summarize the contributions of this paper and discuss opportunities for future work.

Next, we motivate our approach with the help of an example and state assumptions thatwe make.


2 Motivating example and assumptions

Predicting outcomes for business process instances where tasks may be repeatedly executed (loops), and two or more tasks may execute in parallel, is not obvious. Suppose we collect execution traces of a set of case (or process) instances handled by case workers and mine a business process model from these traces that aggregates and summarizes the case handling behavior of the case workers in this set of traces. As mentioned earlier, case workers can typically begin working on several tasks in parallel and can repeat one or more tasks as needed. Therefore, the resulting process model could contain parallel gateways and loops, such as the process model snippet in Fig. 2. Now suppose a case worker begins to execute a new instance of this process; we can identify the task(s) the case worker is currently executing (e.g., task A in Fig. 2) in the mined process model. From task A, either task B or C can occur next in a given instance of this process. The parallel gateway emanating from task B allows multiple outgoing paths to be active simultaneously. The outgoing edge from the exclusive split gateway from F to B potentially allows loops and thus multiple executions of parallel active paths from B. Given that task A has executed, it is not obvious how to accurately predict the probability with which tasks E, D, F, G and H will occur while taking into account all the potential paths from A to each of these other potential future outcomes. Information is present in loops and parallelism that could potentially influence the number of potential future outcomes and the probability with which they occur. The prediction methodology needs to address the loops and parallelism encountered in the possible paths from the task the case worker is currently executing that lead to all potential future outcomes, in order to reliably predict the likelihood with which each of the future outcomes will occur.

The core of our approach is to create a Markovian model of a data-driven semi-structured business process, mined from a set of execution traces of the process. Our model can predict the likelihood of all potential future tasks in an instance of the semi-structured business process on the basis of data values associated with that instance, while taking into account the information present in the loops and parallelism in the process. We use decision tree classification algorithms to determine the probability of executing the next task from a given task in a process instance, based on data values specific to that instance. In this way, we create an instance-specific PPM whose transition probabilities are customized to a process instance. We assume that the next task in a business process instance depends only on the document content that has been updated or generated since the beginning of the process until the end of the current task, but not on the tasks that have already executed nor on when the document content was acquired. This allows us to use transition probabilities computed by our algorithm to construct an instance-specific one-step transition matrix of a Markov system associated with an instance-specific PPM. It may seem that an alternative to building a Markov model could be to directly train a decision tree for predicting the likelihood of a particular output. There are two important reasons why that approach does not suffice. First, every prediction problem requires constructing a new decision tree associated with the training data set for the output class to be predicted. This means that every time a user wishes to predict a new outcome or new output class, a new training data set needs to be engineered. There is no algorithm that can create a decision tree automatically for every new prediction problem. For example, predicting the time it takes to observe a particular task execution pattern requires tailoring the training data from past observations by taking the execution patterns into account. A Markov modeling approach, on the other hand, provides a closed-form expression to answer this kind of prediction problem [22,27]. Second, techniques to learn a decision tree or other classifiers are not available when parallel execution paths and loops are present in the execution behavior of a business process. Markov models can handle cycles, and in this paper, we show how to create an extended space Markov model that can represent parallelism in a business process. The novelty of our approach is that we create a Markovian model for data-driven semi-structured business processes by computing transition probabilities between tasks that are specific to a running instance of a business process. We demonstrate how machine learning techniques can generate instance-specific transition probabilities for such processes. Experimental results presented in this paper validate that our prediction technique, which takes into account the operational semantics of the business process (e.g., loops and parallelism), leads to a greater number of accurate predictions compared to techniques that ignore the operational semantics of the business process. Adding any additional information that influences decision making in a business process improves the prediction power of our methodology for instances of that business process.

Fig. 2 A snippet of a process model (labeled gateways at the bottom). It is not obvious how to address loops and parallelism in order to reliably predict from task A the likelihood of occurrence of other future tasks

One of the inputs to our approach is the execution traces of the process. Getting this trace information requires the availability of history logs of the underlying IT system. We assume that a system such as the business provenance system described in [3] collects case history from diverse sources and provides us with integrated, correlated traces [25], where each trace represents the end-to-end execution of a single process instance, including the contents of documents accessed, modified or written by each task in the trace. We also assume that we can mine a process model of a case-oriented semi-structured business process from a set of execution traces and that this mined process model is an accurate representation of the behavior of the business process during the time period within which predictions are derived from the mined model using our prediction methodology. Although workers have considerable freedom in determining the set and sequence of tasks to execute for each case instance of a semi-structured business process, we assume that the case handling behavior of multiple case workers in handling a set of cases can be aggregated and summarized, using known process mining algorithms, in a process model mined from these traces. We assume that a given instance of the process follows a path that is potentially attainable in this mined model. In other words, the tasks and the transitions between them that are executed in a given process instance are a subset of the graph represented by the mined process model of the process.


Before proceeding to the details of our algorithm (in Sect. 4), we next review existing probabilistic models and techniques proposed in the business process management and decision support literature that are relevant to this work and outline the unique contributions of this paper.

3 Related work

In earlier work [13], we conducted a preliminary investigation of predicting the likelihood of future tasks in a semi-structured business process instance while assuming that the process is acyclic and does not contain parallelism. On the basis of the chain rule of probability, we demonstrated how we could predict the likelihood of different outcomes over an aggregate set of cases modeled by a probabilistic graph. We also learned a decision tree at every decision node in the probabilistic graph in order to determine the document value conditions under which future outcomes from each decision node could occur. In this paper, we provide algorithms for predicting the likelihood of future tasks when the business process contains loops and parallelism.

A number of probabilistic models have been proposed with the goal of business process modeling and mining, including Markovian models [1], models that leverage stochastic process modeling techniques to yield a finite state machine [4] and stochastic-task-graph-based models [8,9]. All of these techniques focus on mining, modeling and simulating business processes. This is distinct from our goal of predicting the likelihood of different outcomes on the basis of available document content values in a running instance of a business process where decisions on which tasks to execute next are driven by document contents. Recently, several probabilistic models have been proposed in the business process management literature to predict the next step in a business process instance [26,30,34]. In contrast to these approaches, our probabilistic model takes into account the data values available in a process instance. Furthermore, while [34] uses nonparametric regression, and [30] uses an annotated transition system consisting of states, transitions and measurements collected at each state, our method is Markovian.

Several techniques have proposed decision trees or similar tools for making predictions for business processes. Using the SPSS Answer Tree tool, [29] explored specific correlations between the practical processing of a case and properties directly linked to a case. Our work is distinct because we focus on a general approach to determine the impact of document contents on the outcomes of a decision point and model the relationship in a probabilistic model customized to a process instance. How data attributes influence the choices made in a process, based on past process executions, by leveraging decision trees has been investigated in [24]. They explore in detail many of the broadly scoped ideas presented by [7], who develop a set of process analysis tools for managing process execution quality. Although we use the same decision tree learning algorithm as [24], there are a number of important differences in our work. The focus of [24] is to correctly identify possible decisions at decision points by examining business process event logs. Consequently, they focus on being able to identify decision points in the presence of duplicate and invisible tasks in the log. We, on the other hand, assume a process model has been mined, and our goal is to predict the likelihood of different outcomes in a currently running process instance in the presence of loops and parallelism in the process model.

Probabilistic process models have been proposed as a way of tracking the progress of a process and to answer questions such as the probability that a subprocess will be executed, given the state of other subprocesses [19]. A new language, ProPL, has been proposed to represent a probabilistic process model, and it is argued that standard inference algorithms can be leveraged to perform inference on the Dynamic Bayesian Network obtained from the probabilistic process model [19]. This work does not address how to make predictions for a running business process instance using a process model containing loops and parallelism, nor how to model data-driven decision making [19]. Colored Petri Nets (CPNs) [12] serve as an alternative expressive representation for probabilistic process models. CPNs can model a general class of stochastic processes [12,37]. Rozinat et al. combine decision mining results with a mined process model to create CPNs for business process modeling and simulation [23]. While the focus of the work by [23] is on end-to-end process instance simulation using CPNs to answer "what-if" questions and provide information such as execution and waiting times for process instances, our approach relies on a Markov model to compute the likelihood of future outcomes in a running process instance that takes into account loops and parallelism in the process model.

A stochastic policy that gives the probability of selecting an action given that the user has a goal and is in a certain state has been modeled by [17]. The objective of this work is to select a policy that minimizes the expected cost given the observed history of the user. An assistant learns a task hierarchy with relational constraints on the subtasks and their parameters. The process is modeled as a partially observable Markov decision process (POMDP) whose state space is defined as a combination of world states and the user's goal structure. For estimating the user's goal stack, they employ a Dynamic Bayesian Network (DBN). The number of levels of tasks in the DBN corresponds to the depth of the directed graph in a relational task hierarchy (i.e., a model of the process), a directed acyclic graph whose nodes are relational task schemas. The Product Data Model (PDM) developed by [35] has a tree-like structure that is similar to the relational task hierarchy employed in [17]. The PDM specifies the elements necessary to assemble a particular product. The goal of [35]'s work is to make recommendations to a business user on how to use these elements to deliver a product in the best way by modeling the problem as a Markov Decision Process (MDP). While these two approaches [17,35] solve the problem of recommendations by employing Markov models, our focus is on providing predictions for a running process instance. Secondly, their underlying model is an acyclic directed graph, unlike our model, which permits cycles and addresses parallel execution paths.

There is increasing interest in integrating probabilistic process models [19,23] with decision-theoretic models [17,35] in order to build integrated decision support systems (IDSS) [15]. A decision support system brings to the ordinary user expertise in both decision analysis and domain knowledge. For example, the one by [20] combines decision analysis with investment evaluation techniques and knowledge of the stock market. A detailed discussion of case-based reasoning (CBR), rule-based reasoning (RBR) and hybrid approaches for decision support is provided in [15]. Liu et al. [15] conducted an extensive survey of existing prognosis techniques that integrate CBR techniques (such as the one in [18]) and RBR with reasoning methods such as Bayesian Belief Networks for developing IDSS. These techniques, unlike our approach, do not take into account the semantics of the underlying business process execution (e.g., loops and parallelism) in order to predict future outcomes in a given process instance.

4 Our prediction method

Predicting future tasks in a semi-structured process instance on the basis of historical execution traces requires understanding the state model and the transition dependencies between the states of the model. If the process is driven by data, then the transition from one task to another depends on the data attribute values of a particular instance. Therefore, the likelihood of the next task is influenced by the content of the documents and data values at the current state. Since the transitions depend on the specific document content of the process instance, each process instance follows a different probabilistic model. Hence, the transition probabilities between states are instance specific. That is why the state model for the process is called an instance-specific PPM.

In this section, we first define the state of a data-driven semi-structured business process. Then, we discuss how to construct an instance-specific PPM tailored to a currently executing process instance captured in a partial trace. Finally, we explain how to calculate the probability of executing a particular task by transforming the state model into a Markov chain.

4.1 State of a data-driven business process

In order to describe the key characteristics of the semi-structured business processes targeted by this work, the state of the system needs to be defined. State is an abstraction of the general behavior of the process. Our abstraction of the state is based on the assumption that the next task depends on the document content that is available after the current task has executed. Hence, the accumulated document content and the current task are the salient features that need to be captured in a state definition. The document content is accumulated during process execution starting from the first task. Some documents are related to the process itself and some are related to the tasks that are being executed. Since we assume that all the notable activities of a task are captured by documents and recorded, the document content at the time of execution of a task is sufficient to represent the state of the system. We will use this argument to create a Markov chain from the states of the system in Sect. 4.2. Case management systems, in particular, are interested in the content, but not the order in which content is collected. Order is managed by the control flow and can be modified to optimize the process of collecting documents. One may argue that task execution information should also be part of the state representation, claiming that the next state depends on the tasks that have been executed and their order of execution. This is important when the execution of certain tasks gives information about the case, even if no data are produced. The predictive power of task execution, regardless of the data being generated, requires investigation and is part of our future work.

We will now define the state of a data-driven semi-structured process as follows. Let X_i = {t_i, d_i} denote the state of the system at step i, where t_i is the task executed and d_i is the document content generated or updated since the start of the process. This definition describes the state of the system in terms of the task being executed and the accumulated data. Next, we will show how the states of a data-driven semi-structured business process, as defined in this section, can be transformed into the states of an associated Markov model.
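The state definition X_i = {t_i, d_i} can be sketched as a small data structure; the field and task names below are our own illustration, not from the paper:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    """State X_i = (t_i, d_i): current task plus accumulated document content."""
    task: str             # t_i: the task executed at step i
    documents: frozenset  # d_i: (attribute, value) pairs accumulated so far

# A hypothetical claim-handling instance after two steps: document content
# only accumulates over time, so d_0 is contained in d_1.
x0 = State("open_claim", frozenset({("claim_amount", "high")}))
x1 = State("assess_damage", x0.documents | {("police_report", "yes")})

assert x0.documents <= x1.documents  # content accumulates, never shrinks
```

Making the state hashable (frozen dataclass, frozenset) matches its later use as a node in a countable state space.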

4.2 Creating a Markov chain

A Markov chain is a process that satisfies the Markov property and has a countable state space. The formal definition of a Markov chain is that given the present state, the future and the past states are independent. This can be expressed formally as:

Pr{X_{n+1} = x | X_1 = x_1, X_2 = x_2, ..., X_n = x_n} = Pr{X_{n+1} = x | X_n = x_n}   (1)

where {X_1, ..., X_n} are the states of the Markov process.


We are interested in finding the probability of executing a particular task after step i for a given execution history. Let t_{i+1} denote a potential task that can be executed next and let {x_0, x_1, ..., x_i} denote the state values that are visited since the start of the process. Here x_i = {t_i, d_i}, where t_i is the task being executed and d_i is the data accumulated up to step i + 1. Given these conditions, the probability that t_{i+1} is the next task to be executed at step i + 1 can then be expressed by:

Pr{T_{i+1} = t_{i+1} | X_i = x_i, X_{i-1} = x_{i-1}, ..., X_0 = x_0}
    = Σ_{d_{i+1} ∈ D} Pr{X_{i+1} = {t_{i+1}, d_{i+1}} | X_i = x_i, X_{i-1} = x_{i-1}, ..., X_0 = x_0}   (2)

where T_{i+1} is the task executed at step i + 1 and D is the set of all possible values for d_{i+1}. We are only interested in finding the likelihood of the next task being t_{i+1}, not in how the document content will change in the next step. Therefore, the dependency on the document content is eliminated by summing the right-hand side of the equation over all possible document values. We will now simplify Eq. 2 by using the following assumption:

Assumption The next task depends only on the document content that has been updated or generated since the beginning of the process until the end of the current task, but not on the tasks that have already executed until the current task.

The assumption states that the next task decision does not depend on when the document content was acquired, but on what the total document content value is. Hence,

Pr{Ti+1 = ti+1 | Xi = xi, Xi−1 = xi−1, . . . , X0 = x0} = Pr{Ti+1 = ti+1 | Xi = xi}   (3)

where xi = {ti, di}. Equation 3 indicates that we can make a decision about the next task based on the document content accumulated before the execution of the next task, since the next task depends only on the accumulated content at the end of the prior task. This facilitates creating an instance-specific Markov chain between the tasks of the process. The accumulated content value, di, is instance specific. Hence, if the accumulated content for a particular instance, φ, is dφi, then the one-step instance-specific transition probabilities between tasks are expressed from Eq. 3 as:

Pr{Ti+1 = ti+1 | Ti = ti}φ = Pr{Ti+1 = ti+1 | {ti, dφi}}   (4)

where dφi denotes the data accumulated from the start of the process until the end of task ti for a particular instance φ. Therefore, Eq. 4 yields different transition probabilities for every process instance, as dφi varies with φ.

In a semi-structured business process instance, the process instance starts with initial documents about the process itself and some instance-specific information. As the execution of the instance progresses, more documents are obtained regarding the particular process instance and appropriate tasks are executed. Process instances differ from each other since the contents of documents are typically specific to each instance. Process execution traces of a collection of process instances can be used to train machine learning algorithms to classify the outputs of various tasks. Once a machine learning algorithm is trained, partial trace information collected from a process instance in mid-execution can be used to make predictions about the next tasks potentially executable after the last task executed in the process instance. The next section focuses on using decision trees to compute the state transition probabilities specified in Eq. 4 for a probabilistic process model customized to a running data-driven semi-structured business process instance, and on establishing a Markov chain.
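The instance-specific transition probabilities of Eq. 4 can be estimated from historical traces by conditioning next-task counts on the accumulated document content. The following is a minimal pure-Python sketch of this idea; the task names and the single discretized data attribute are hypothetical:

```python
from collections import Counter, defaultdict

# Historical traces: each step is (task, accumulated_data).  For
# illustration, the accumulated data is a single discretized attribute
# ("low"/"high" damage); real traces would carry a vector of attributes.
traces = [
    [("review", "low"), ("repair", "low")],
    [("review", "low"), ("repair", "low")],
    [("review", "high"), ("payout", "high")],
    [("review", "high"), ("payout", "high")],
    [("review", "high"), ("repair", "high")],
]

# Count transitions keyed by (current task, accumulated data).  Per the
# assumption behind Eq. 4, the next task depends on the accumulated
# content, not on the path by which it was acquired.
counts = defaultdict(Counter)
for trace in traces:
    for (task, data), (next_task, _) in zip(trace, trace[1:]):
        counts[(task, data)][next_task] += 1

def transition_prob(task, data, next_task):
    """Empirical Pr{T_{i+1} = next_task | t_i = task, d_i = data}."""
    c = counts[(task, data)]
    total = sum(c.values())
    return c[next_task] / total if total else 0.0

# Two instances at the same task but with different accumulated data get
# different transition probabilities -- the model is instance specific.
print(transition_prob("review", "low", "repair"))   # 1.0
print(transition_prob("review", "high", "payout"))  # 2/3
```

The paper uses decision trees rather than raw frequency counts precisely because document attributes are typically continuous or high-dimensional, where counting over exact data values would not generalize.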


G. T. Lakshmanan et al.

4.3 Creating an instance-specific probabilistic process model

Creating an instance-specific PPM requires finding the transition probabilities between the tasks belonging to a particular process instance. The first step is to mine a process model of a process from a set of execution traces (step 1 in Fig. 1). Determining which tasks could potentially be executed in a running instance can be done by examining which tasks in the process model are reachable from the tasks the case worker last executed in the instance. The next step is to train a decision tree (from the same set of execution traces used to mine the process model) to classify the output of each decision node in the process model, with potential next tasks as the targeted output categories (step 2 in Fig. 1). A decision node in a process model is a task where execution splits into alternate branches, and it is represented as a task with an outgoing exclusive OR gateway. A decision tree provides a tree-structured plan of a set of attributes to test in order to predict the output. For each running instance of a process (i.e., a partial case instance) being handled by a case worker, we receive a partial trace of the tasks executed so far in the case and the data values known up to that point. The content of a partial trace is processed by each decision tree to classify the tasks that are the output of each decision node in the process model. As a consequence, probabilities associated with each edge emanating from each decision node to another task are assigned based on the attribute values of the current instance available at the decision node. The result is referred to as an instance-specific PPM (step 3 in Fig. 1).

We begin with a formal definition of PPMφ, a PPM specific to instance φ, and then describe the exact steps to create the PPMφ, including pseudo-code in Algorithm 4.3.

Definition 4.1 PPMφ = (G, linput, loutput, α, γ), where G(V, E) is a directed graph with a set of nodes, V, and a set of edges, E. Each edge, e ∈ E, has a probability, pe, where pe is a real number ∈ [0, 1]. There are two label functions on each node: (1) an input label linput : V → {AND, XOR} and (2) an output label loutput : V → {AND, XOR}. The start node, α, is a node without any incoming edges, and the end node, γ, is a node without any outgoing edges.

A PPMφ can have chained gateways (a series of gateways). Every gateway whose input or output is attached to another gateway is denoted as a (dummy) task and is assigned a task name.

Using existing process mining algorithms [14,28], we first mine a process model, PM (line 4 of Algorithm 4.3), from a set of process execution traces, Z. Each trace in this set is a complete execution trace of a process instance that has completed execution. Next, a copy of PM is created and referred to as PPM, with the probability of each edge in PPM initialized to 0 (line 5 of Algorithm 4.3). PPM has a set of nodes, V, where each node is a task with labels associated with it as defined in Definition 4.1, and a set of edges, E, where each edge connects two nodes in the PPM. The outdegree of a node of the PPM is the number of edges outgoing from the node. For every node ti in PPM (lines 6–7 of Algorithm 4.3) that has an outdegree of 1, a probability of 1 is assigned to its outgoing edge. In other words, let tk be the node connected to node ti, whose outdegree is 1; then pti,tk = 1 (lines 8–9 of Algorithm 4.3). For every task ti that is a decision node in the PPM, i.e., it is labeled with an outgoing exclusive OR gateway, a decision tree, Dti, is learned from the set of execution traces Z, and the tree is stored with respect to ti in PPM (lines 10–11 of Algorithm 4.3). This tree is trained to predict the nodes connected to ti by one edge. For every task ti in PPM that is labeled with an outgoing parallel AND gateway, in other words, where multiple outgoing edges of the gateway may execute in parallel, a probability of 1 is assigned to each outgoing edge connecting ti to another task node via the parallel gateway (lines 12–18 of Algorithm 4.3). All remaining


edges in the PPM are outgoing edges from decision nodes labeled with an outgoing exclusive OR gateway. The probability of each such edge is customized (in the next step) to a specific running process instance for which a partial trace yφ is available. For each partial trace yφ, a copy of PPM is created and referred to as PPMφ (lines 21–23 of Algorithm 4.3). Let T be the set of task nodes in PPMφ. For each task tj in T, if tj is a decision node, let there be |Etj| outgoing edges, {e1, e2, . . . , e|Etj|}, from the XOR gateway labeled at the output of tj (lines 25–28 of Algorithm 4.3). For each such outgoing edge from tj (via the XOR gateway) to a node tk, the probability of executing that edge is denoted by ptj,tk. This probability, ptj,tk, is initialized with the prediction, also known as confidence, made by the decision tree Dtj trained at task tj that tk executes after tj (lines 29–32 of Algorithm 4.3), given the document contents available at node tj in the partial trace yφ.

Algorithm 4.3
1: Creating an instance-specific probabilistic process model.
2: Input: Z: a set of process execution traces. A set Y of partial traces, y1, y2, . . . , y|Y|, of business process instances in mid-execution.
3: Output: A set of instance-specific PPMs, {PPM1, PPM2, . . . , PPM|Y|}, i.e., a probabilistic process model specific to each currently running business process instance represented by {y1, y2, . . . , y|Y|}.
4: Mine a process model, PM, from the set of execution traces, Z.
5: Create a copy of PM and call it PPM. Assign a probability pij to each edge from node i to node j and initialize it to 0. Let T be the set of task nodes in PPM.
6: for i = 1 → |T| do
7:   ti ← ith element of T in PPM
8:   if ti has only one outgoing edge in PPM (let the node connected to ti be tk) then
9:     pti,tk ← 1 in PPM
10:  else if ti is a decision node then
11:    Learn a decision tree, Dti, at ti from Z to predict each task reachable from ti in PPM via an edge, and store Dti with respect to ti in PPM.
12:  else if ti has |Eti| outgoing edges via a parallel AND gateway in PPM then
13:    for j = 1 → |Eti| do
14:      tj ← task connected to ti by its jth outgoing edge
15:      pti,tj ← 1 in PPM
16:      j ← j + 1
17:    end for
18:  end if
19:  i ← i + 1
20: end for
21: // Create a PPMφ corresponding to each partial trace, yφ, for φ = 1, . . . , |Y|
22: for φ = 1 → |Y| do
23:   Create a copy of PPM and call it PPMφ. Let T be the set of task nodes in PPMφ.
24:   yφ ← φth partial trace in Y
25:   for j = 1 → |T| do
26:     tj ← jth element of T
27:     if tj is a decision node then
28:       Let Etj be the list of outgoing edges from the XOR gateway emanating from tj
29:       for k = 1 → |Etj| do
30:         tk ← task connected to tj by the kth edge via the XOR gateway from tj
31:         ptj,tk ← probability output by decision tree Dtj for the data values in yφ
32:         k ← k + 1
33:       end for
34:     end if
35:     j ← j + 1
36:   end for
37:   φ ← φ + 1
38: end for

As a result of the assumption (stated in Sect. 4.2) that the next task does not depend on when document content was acquired but on what the total document content values are, each decision tree is characterized with a vector of document data values, dφi, captured by Eq. 4. In other words, we do not use the path information of previously executed tasks to train decision trees in order to create an instance-specific PPM, but only the accumulated data. The resulting instance-specific PPM is therefore a Markov system. If we did not make such an assumption, then we could compute an instance-specific PPM by training decision trees with both task path information and data, and the resulting instance-specific PPM would not have the Markov property.

The computation (lines 5–20 of Algorithm 4.3) of the probabilistic process model, PPM, is performed once, regardless of the number of partial traces. This computation relies on a set of traces, Z, to mine a process model and to learn a decision tree at each decision node in the mined model. As the aggregate behavior of how actors in a case handle case instances changes and deviates from the mined business process model, one needs to decide when and how to update the set of training traces, Z, used to construct the PPM from which each instance-specific PPM is derived. Determining when and the degree to which a business process has changed, and its implications on updates to Z, is not within the scope of this paper and is the subject of our future work.

Training a decision tree for the purpose of finding the transition probabilities between tasks requires selecting the potential next nodes as the target attributes and the data information available from the training sequences as the input attributes. For every decision node, ti, in PPM, we learn a decision tree, Dti, from Z, the set of training traces. Corresponding to each trace in Z, a training sample for a decision tree at a decision node is constructed that contains the attribute values recorded before the execution of the decision node, as well as the outcome of the decision node. For example, Table 1 shows 3 training samples for a decision tree to predict whether the outcome of a decision node is Send Repair Request. Each sample (row) is created from a trace and consists of values for the data attributes Age of Car and Damage Area Size recorded in the trace before the execution of the decision node, and the outcome of the decision node (Send Repair Request or Send Payment). Figure 3 shows a binary decision tree learned for this decision node using training samples such as those in Table 1. We refer the reader to [24,36] for additional details on decision tree training.

Note that we use the same partial trace, yφ, at every decision node to build the PPMφ. If the decision node is not the last executed node in the partial trace, then classification is done with incomplete data. This has an impact on the accuracy of PPMφ and is discussed in the next section.

Table 1 Training samples for a decision point in an automobile insurance claims process

Process instance ID | Age of car | Damage area size | Outcome of decision point
10001               | 1          | 2                | Send Repair Request
10002               | 3          | 5                | Send Repair Request
10003               | 12         | 2,000            | Send Payment
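Training samples of this shape can be fed directly to an off-the-shelf decision tree learner. The sketch below uses scikit-learn's DecisionTreeClassifier; the two rows beyond Table 1 and all attribute values are made up for illustration, and the class probabilities returned by predict_proba play the role of the edge confidences assigned in the instance-specific PPM:

```python
from sklearn.tree import DecisionTreeClassifier

# Training samples modeled on Table 1: (age of car, damage area size)
# -> outcome of the decision point.  Values are illustrative only.
X = [[1, 2], [3, 5], [12, 2000], [2, 4], [10, 1500]]
y = ["Send Repair Request", "Send Repair Request", "Send Payment",
     "Send Repair Request", "Send Payment"]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# For a running instance with known attribute values, the class
# probabilities become the outgoing-edge probabilities at this decision
# node in the instance-specific PPM (Eq. 4).
instance = [[2, 3]]  # young car, small damage area
probs = dict(zip(tree.classes_, tree.predict_proba(instance)[0]))
print(probs)  # Send Repair Request dominates on this toy data
```

In the paper's setting, one such tree is trained per decision node, and the partial trace of a running instance supplies the attribute values at prediction time.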


Fig. 3 A binary decision tree learned to predict whether Send Repair Request will execute

Fig. 4 A partial trace captures a process instance in which task t3 is the last to execute

4.4 Accuracy of an instance-specific probabilistic process model

The accuracy of the transition probabilities from a given task to neighboring tasks obtained by decision trees depends on how much document content is available at the decision node. In a typical scenario, transition probabilities are computed by using a partial trace which contains partial execution data. At the beginning of the execution of a process instance, a partial trace contains data mostly about the process but not much about the specifics of the instance. In this situation, the decision tree tries to classify the potential next tasks with missing data. This will negatively impact the prediction accuracy of the tree unless the initial process data are sufficient to make predictions. Toward the end of the execution of a process instance, more instance-specific data have accumulated as more tasks have been executed, and the prediction accuracy is expected to be higher.

Figure 4 shows a snippet of a process model and a partial execution trace, yφ, of a running instance of this process in which task t3 is the last task to execute. The partial execution trace provides information about the document content up to t3, the last task executed. yφ consists of the following trace: {(t0, d0), (t1, d1), (t3, d3)}. The process model in Fig. 4 shows that potential future tasks for this instance may include t4, and t5 or t6. A decision tree that is trained at t4 can be used to predict the probability of the next task after t4. The prediction, however, will be based on the information provided by the partial trace, which does not include the document values at t4. In the absence of the real data, the decision tree will try to estimate the values of the data at t4 based on the available training data. The prediction accuracy of the tree will therefore be lower than if the prediction were made with the data available at t4 for the case instance corresponding to the partial trace yφ.

If the process data that are available at the beginning are more important than the task-related data that accumulate as the process progresses, then the stage at which an instance-specific PPM is learned for a given process instance will not necessarily affect the accuracy of the results.


4.5 Calculating the probability of executing a particular task

In a typical semi-structured business process, the next task depends on the content of the data at the current state, which has been accumulated during the execution of the past tasks. A case worker is assumed to make a decision about the next task based on the current case data, but not necessarily based on the list or the path of the previously executed tasks. We assume that the case data accumulated before the execution of the next task implicitly carries information about the past. This is analogous to trying to predict the next destination of a traveler by examining his suitcase. The contents of the traveler's suitcase may include souvenirs bought at different travel hubs as well as papers related to the traveler's past tasks, such as car rentals, hotel bills, etc. As every traveler is a unique case, every partial execution trace is unique as well. The likelihood of the next task depends on the instance-specific content of the case data accumulated so far.

An instance-specific PPM is represented by a transition matrix where the entries of the matrix are the instance-specific transition probabilities between the tasks computed by Algorithm 4.3. A one-step transition matrix, Pφ, corresponding to instance φ, is then formed by arranging the one-step transition probabilities into a square matrix where each row sums to 1 and pφkj is the one-step transition probability from task k to task j for a particular instance φ.

       ⎛ pφ00  pφ01  pφ02  · · · ⎞
Pφ =   ⎜ pφ10  pφ11  pφ12  · · · ⎟                  (5)
       ⎜ pφ20  pφ21  pφ22  · · · ⎟
       ⎝  · · ·  · · ·  · · ·    ⎠

If the process is a Markov chain, then the one-step transition between two tasks is independent of the path that the process traverses to arrive at these tasks. This means that the transition matrix at step k, Pφ[k], is the same as Pφ for all values of k. In a typical process model, there may be multiple terminal nodes or tasks, and case workers may want to know the likelihood of completing the process at a particular terminal node so that they are prepared in advance. The likelihood of ending in a particular terminal node can be found by using the absorbing state concept in Markov chains. The terminal nodes can be considered the absorbing states of a process. Just as a Markov chain cannot leave an absorbing state and terminates if it is entered, a process instance terminates when a terminal node is hit. If there are no parallel executions in a process, the computation of absorption probabilities can be applied directly to the instance-specific PPM of the process (step 5 in Fig. 1). If the process contains parallelism, additional work is needed to compute the probability of reaching a particular task, and this is described in Sect. 4.6.

The probability that a case instance will end up in a particular terminal state is found by calculating the absorbing state probabilities [5]. For an absorbing Markov chain, a canonical form of the transition matrix is obtained by renumbering the states so that the transient states (i.e., non-absorbing states) come first. If there are r absorbing states and h transient states, the transition matrix Pφ has the following canonical form. Here, the matrix Qφ consists of the transition probabilities between the transient states, and Rφ consists of the transition probabilities from transient states to absorbing states. Obviously, there are no transitions from absorbing states to transient states, as shown by the zero matrix, and absorbing states can only have transitions to themselves, as indicated by the identity matrix I.

Pφ = ⎛ Qφ  Rφ ⎞                  (6)
     ⎝ 0    I  ⎠


The first h states are transient and the last r states are absorbing states. Let bφij be the probability that an absorbing chain will be absorbed in the absorbing state sj if it starts in the transient state si. Let Bφ be the matrix with entries bφij. Then, Bφ is an h-by-r matrix, and

Bφ = Nφ Rφ                  (7)

where Nφ = (I − Qφ)−1 is the fundamental matrix for Pφ and Rφ is as in the canonical form [6]. Once the matrix Bφ is computed, the probabilities of reaching a terminal task starting from any task can be obtained under Markovian assumptions.
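Equation 7 is straightforward to evaluate numerically. A NumPy sketch on a hypothetical five-state chain (three transient states, two absorbing states; all probabilities are made up):

```python
import numpy as np

# Hypothetical instance-specific chain in the canonical form of Eq. 6:
# transient states t0, t1, t2 and absorbing states a0, a1.
Q = np.array([[0.0, 0.6, 0.4],   # transient -> transient
              [0.0, 0.0, 0.5],
              [0.0, 0.0, 0.0]])
R = np.array([[0.0, 0.0],        # transient -> absorbing
              [0.5, 0.0],
              [0.3, 0.7]])
# Each row of the full matrix (Q | R) must sum to 1.
assert np.allclose(Q.sum(axis=1) + R.sum(axis=1), 1.0)

# Fundamental matrix N = (I - Q)^-1, then B = N R per Eq. 7.
N = np.linalg.inv(np.eye(3) - Q)
B = N @ R
print(B)              # B[i, j] = Pr{absorbed in a_j | start in t_i}
print(B.sum(axis=1))  # each row sums to 1: the chain is absorbed somewhere
```

On this toy chain, starting from t0 the instance ends in a0 with probability 0.51 and in a1 with probability 0.49.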

In order to find the probability of ever reaching a particular future task from a starting task, the concept of first passage time is used. The first passage time from state i to state j is the number of transitions made by the process in going from state i to state j for the first time. If fij(n) denotes the probability that the first passage time from state i to state j takes n steps, then the first passage time probabilities satisfy the following recursive relationships for instance φ [10]:

fφij(1) = pφij,                  (8)

fφij(2) = Σ_{k≠j} pφik fφkj(1),                  (9)

fφij(n) = Σ_{k≠j} pφik fφkj(n − 1).                  (10)

Given that task i has been executed, the probability that task j will ever get executed for instance φ, ϕφij, is found from Eq. (10) as:

ϕφij = Σ_{n=1}^{∞} fφij(n)                  (11)

Note that fφij(n) converges to zero as n increases, and the summation in Eq. (11) converges to ϕφij ≤ 1. The absorption probability to a particular terminal node can also be found by Eq. (11), since the first passage probability to an absorbing state is the absorption probability of that state. The difference between Eqs. (7) and (11) is that Eq. (7) assumes that Pφ is the transition matrix of a Markov chain where all row sums are 1. Equation (11), on the other hand, is applicable in cases where simultaneous transitions occur and row sums are therefore greater than 1.
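The recursion in Eqs. (8)–(10) and the truncated summation of Eq. (11) can be sketched in a few lines of Python; the one-step matrix below is hypothetical and, for simplicity, contains no parallelism:

```python
# First passage probabilities f_ij(n) per Eqs. (8)-(10), and the
# probability of ever reaching j from i per Eq. (11).  Hypothetical
# one-step probabilities: task 0 branches to 1 or 3; task 1 leads to 2;
# tasks 2 and 3 are terminal (absorbing).
P = {
    (0, 1): 0.6, (0, 3): 0.4,
    (1, 2): 1.0,
    (2, 2): 1.0, (3, 3): 1.0,
}
STATES = [0, 1, 2, 3]

def f(i, j, n):
    """f_ij(n): probability the first passage from i to j takes n steps."""
    if n == 1:
        return P.get((i, j), 0.0)                       # Eq. (8)
    return sum(P.get((i, k), 0.0) * f(k, j, n - 1)      # Eq. (10)
               for k in STATES if k != j)

def ever_reaches(i, j, max_n=20):
    """phi_ij: probability task j ever executes after task i (Eq. 11),
    truncated at max_n steps since f_ij(n) vanishes for large n."""
    return sum(f(i, j, n) for n in range(1, max_n + 1))

print(ever_reaches(0, 2))  # 0.6: task 2 is reached only via task 1
print(ever_reaches(0, 3))  # 0.4
```

Excluding k = j in the sum is what makes these *first* passage probabilities: paths that already visited j are not counted again.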

As mentioned above, the concept of absorption probability can be applied directly to an instance-specific PPM if the model does not contain parallel execution paths. If an instance-specific PPM contains parallelism, there may be simultaneous transitions from a single task to multiple tasks. This means that the sum of the transition probabilities from a task with parallel outgoing transitions becomes greater than 1. If there are N parallel execution paths, the sum adds up to N, one for each execution path. This means that the system can be in more than one state at a given time, and a Markov chain cannot be established. Therefore, the state space is not valid in the case of parallel execution paths, and neither is Eq. 7. Equation (10), on the other hand, can still be used as long as one-step transition probabilities are obtained. In order to obtain these transition probabilities between tasks, we show how to extend the state space in the next section.


4.6 Creating an extended space Markov chain

In this section, we explain how an instance-specific PPM containing parallel gateways can be mapped to an extended state space that contains additional states which account for the parallelism in the instance-specific PPM (step 4 in Fig. 1). In the extended state space representation, the states are represented by the edges of the process model rather than the nodes. The extended state space preserves the Markov property. Our technique for mapping an instance-specific PPM to an extended space Markov chain is similar to methods used to construct the reachability graph of a Petri net [16]. First, we explain how to map an instance-specific PPM to an extended space Markov chain with an example (Fig. 5); then, we formally define the mapping.

Consider the example of an instance-specific PPM displayed in Fig. 5a. The output gate of task a is a parallel AND gateway and the output gate of task b is an exclusive OR (XOR) gateway. Assume two partial traces, y1 and y2, have been captured for two different currently executing process instances, each belonging to the same semi-structured business process as the instance-specific PPM displayed in Fig. 5a. Let the execution sequence of tasks in y1 be ab and the execution sequence of tasks in y2 be acb. Will task c execute after task b in each process instance? The last task in both traces is b. In y1, task c could be executed after task b. In y2, it is not possible to execute task c after task b. Therefore, the question of which task will execute next in a given process instance is a function of the number of edges that are potentially executable (or active), and this could be greater than one due to parallel gateways.

Next, we consider the nodes and edges that are activated as a result of the execution of an instance of the process displayed in Fig. 5a. The process instance begins with the execution of node a, which is the start node. After the start node is triggered, its output edge (the edge between the start node and the AND gateway denoted as '+') becomes active. Now the input edge to the AND gateway is active, and thus it is executable. At this stage, the inputs to nodes b, c, d and e are not available, and thus they are not executable. After executing +, the input edge to gateway + is inactivated. Also, the edge labeled 1 (connecting + to b) and the edge labeled 2 (connecting + to c) become active. Now all inputs to nodes b and c are ready and either of them can be executed. If c executes first, edge 2 is inactivated, and the only remaining active edge is the edge labeled 1. By executing node b, the input edge to the exclusive OR (XOR) gateway becomes active. After executing XOR, the input edge to the XOR gateway is inactivated. Subsequently, edge 1 is inactivated and one of the edges 3 or 4 becomes active (for our example, assume edge 4 becomes active). At this stage, only edge 4 is active and only node e can be executed. After executing node e, all edges become inactive

Fig. 5 An example of (a) an instance-specific PPM, (b) an alternative representation of (a) with labeled edges, and (c) an extended space Markov chain of (a)


and the process instance ends. In summary, the execution sequence of tasks in this instance was a, AND, c, b, XOR, e: task a executes → gateway AND is activated → edges 1 and 2 are activated → task c executes → task b executes → gateway XOR is activated → edge 4 executes → task e is the last to execute, and the process instance ends.

Tracking all potentially active edges in an instance-specific PPM allows us to transform it into an extended space Markov chain. Figure 5c is an extended space Markov chain representation of Fig. 5b, where Fig. 5b is an alternative view of Fig. 5a showing only labeled tasks and labeled edges between them. Assume that each gateway of the instance-specific PPM resides as an input or output label of the single task it is an input to or output of. For example, the AND gateway resides as an output label of task a, and edges 1 and 2 are parallel. Similarly, the XOR gateway resides as an output label of task b, and edges 3 and 4 are exclusive. Figure 5c is an extended space Markov chain representation of this instance-specific PPM with start node s0. Each yellow ellipsoid is a state in the new extended state space of the Markov chain. The binary string within each state of the extended space Markov chain represents the active edges in the instance-specific PPM indexed by edge number. The starting state in the extended state space is s0. Labels on the edges in the extended space Markov chain represent which task in the instance-specific PPM causes the corresponding edge in the extended state space to become active. In this example, we started from s0 and executed task a; then edges 1 and 2 became active, leading to a transition to state (1, 1, 0, 0). At this point, edges 1 and 2 are active and edges 3 and 4 are inactive. In this state, there are two active tasks: b and c. Execution of task c leads to state (1, 0, 0, 0), and execution of task b leads to either state (0, 1, 0, 1) or state (0, 1, 1, 0). The potential size of an extended space Markov chain is 2^|E|, where |E| is the total number of edges of its corresponding instance-specific PPM. Not all states, however, are reachable from the starting state. For example, in Fig. 5, it is not possible to reach state (0, 0, 1, 1) because edges 3 and 4 cannot be active simultaneously in the PPM. This is why Fig. 5c has 9 states instead of its potential maximum size of 16.
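The active-edge bookkeeping described above can be reproduced by a breadth-first enumeration over edge-activation tuples. The sketch below uses a hypothetical four-edge model (an AND split from the start task a to tasks b and c, and an XOR split at b to d or e); it is not necessarily the exact topology of Fig. 5, but it illustrates how few of the 2^|E| candidate states are actually reachable:

```python
from collections import deque

# Hypothetical model: start task `a` has an AND output over edges 0 (-> b)
# and 1 (-> c); task `b` has an XOR output over edges 2 (-> d) and
# 3 (-> e); tasks c, d, e have no outgoing edges.
IN_EDGES = {"b": [0], "c": [1], "d": [2], "e": [3]}
OUT_EDGES = {"b": ("XOR", [2, 3]), "c": ("AND", []),
             "d": ("AND", []), "e": ("AND", [])}

def successors(state):
    """All states reachable in one task execution from an active-edge tuple."""
    nexts = []
    for task, ins in IN_EDGES.items():
        if all(state[e] for e in ins):          # all inputs active: task fires
            kind, outs = OUT_EDGES[task]
            base = list(state)
            for e in ins:
                base[e] = 0                     # consume input edges
            if kind == "AND" and outs:
                s = base[:]
                for e in outs:                  # AND: activate every output
                    s[e] = 1
                nexts.append(tuple(s))
            elif kind == "XOR":
                for e in outs:                  # XOR: activate one output
                    s = base[:]
                    s[e] = 1
                    nexts.append(tuple(s))
            else:
                nexts.append(tuple(base))       # no outputs: edges just clear
    return nexts

# BFS from the state right after the start task fired its AND split.
start = (1, 1, 0, 0)
seen, queue = {start}, deque([start])
while queue:
    for s in successors(queue.popleft()):
        if s not in seen:
            seen.add(s)
            queue.append(s)
print(len(seen) + 1)  # 9 states including s0, versus the 2**4 + 1 = 17 bound
```

States such as (0, 0, 1, 1), where both XOR output edges are active at once, never appear in the BFS, mirroring the unreachability argument made for Fig. 5c.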

Now, we describe an algorithm for creating an extended Markov state space from an instance-specific PPM. Each instance-specific PPM is provided as input to our algorithm (in the representation depicted in Fig. 5b). Specifically, the input consists of an instance-specific PPM denoted as PPMφ, where φ is a particular process instance. Recall from Sect. 4.3 the definition (Definition 4.1) of a PPMφ = (G, linput, loutput, α, γ). Each edge of PPMφ can be labeled as active (1) or inactive (0), allowing 2^|E| possible labelings of the edges. Element μj ∈ {0, 1} is a label on an edge j, indicating whether the edge is active (μj = 1) or inactive (μj = 0). It is assumed that a process instance always completes perfectly, i.e., when the end node is reached, there is no remaining active edge in the instance-specific PPM.

The set of states, S, in the extended space Markov chain corresponding to a PPMφ is the set of all possible unique labelings of the edges in PPMφ.

Definition 4.2 S = {s1, s2, . . . , s|S|−1} ∪ {s0}, where |S| = 2^|E| + 1. For each state s ∈ S where s ≠ s0, s = {(μ1, μ2, . . . , μ|E|) | μj ∈ {0, 1}, j = 1, 2, . . . , |E|}. State s0 corresponds to the initial state of PPMφ.

Thus, each state s in the extended space Markov chain is a binary n-tuple where n = |E|. There exists an edge between two states s1 and s2 in the extended space Markov chain if and only if there exists a node t ∈ PPMφ such that the active edges in state s1 trigger the execution of t, and after the execution of t, the set of active edges becomes s2. For example, in Fig. 5c, states (1, 1, 0, 0) and (0, 1, 0, 1) are connected, since executing task b transforms the set of active edges from (1, 1, 0, 0) to (0, 1, 0, 1).

Having established the set of states in the extended space Markov chain, we next define the transition probabilities between these states. First, we define the transition probabilities Pr{s0, s}


from the start state, s0, to all other states s = (μ1, μ2, . . . , μ|E|) ∈ S (s ≠ s0). The start node of PPMφ, α, might have an AND or XOR output label.

– Case 1: the output label of α is AND. When α executes, all of its output edges become active. Since the instance PPMφ has just started, all other edges are inactive. Therefore, with probability one, the next state can be determined. The next state s is the state in which an edge j ∈ PPMφ is active if and only if it is an output of α.

Pr{s0, s} = 1 ⟺ μj = { 1  if the edge labeled j ∈ PPMφ is an output of α
                       { 0  otherwise                                        (12)

It is not possible to go to any other state from the start state in one step; thus, for all other states s ∈ S, Pr{s0, s} = 0.

– Case 2: the output label of α is XOR. When α executes, only one of α's output edges becomes active. If α has M output edges, (e1, e2, . . . , eM), then the m-th edge is chosen with probability pm, which is the one-step transition probability between the initial task and the next one. The value of pm can be obtained from PPMφ by using decision trees as described in Sect. 4.3. Assume the m-th edge among the output edges of α is activated. Then, an edge j in the next state, s, is active if and only if the edge corresponds to the m-th output of α.

Pr{s0, s} = pm ⟺ μj = { 1  if the edge labeled j ∈ PPMφ is the m-th output of α
                        { 0  otherwise                                       (13)

From the start node, it is not possible to transition to any other state in one step; thus, for all other states s ∈ S, Pr{s0, s} = 0.

Computing the transition probabilities between all other states, s1, s2, . . . , sw (≠ s0) ∈ S, requires the definition of an active task.

Definition 4.3 Task t ∈ PPMφ, with an AND input label, is called an active task in state s = (μ1, μ2, . . . , μ|E|) ∈ S if each input edge (labeled μj) to t is active (i.e., μj = 1). Task t ∈ PPMφ, with an XOR input label, is called an active task in state s ∈ S if at least one input edge (labeled μj) to t in PPMφ is active (i.e., μj = 1).

Next, we describe how to compute the transition probabilities, Pr{s1, s}, from s1 to all other states s = (μ1, μ2, . . . , μ|E|) ∈ S (s ≠ s1). By definition, s1 = ((s1)1, (s1)2, (s1)3, . . .), where (s1)j = μj, the j-th element of state s1. Assume t ∈ PPMφ is an active task in state s1. The output label of t may be AND or XOR.

– If the output label of t is AND, then, by executing t, all output edges of t become active and all input edges to t become inactive. Thus, the transition probability from s1 to s can be defined as follows:

$$
\Pr\{s_1, s\} = \lambda_t \iff \mu_j =
\begin{cases}
1 & \text{if the edge labeled } j \text{ in } \mathrm{PPM}_\phi \text{ is an output of active task } t\\
0 & \text{if the edge labeled } j \text{ in } \mathrm{PPM}_\phi \text{ is an input of active task } t\\
(s_1)_j & \text{otherwise}
\end{cases}
\tag{14}
$$

Here λt is the probability of executing t when there are multiple active tasks; λt is a real number ∈ (0, 1]. Multiple active tasks arise due to parallelism. Let As1 = {t1, . . . , t|As1|} be the set of active tasks ∈ PPMφ corresponding to state s1. Each λtk should be chosen such that $\sum_{k=1}^{|A_{s_1}|} \lambda_{t_k} = 1$. The transition probability from s1 to any other state s is 0 if the conditions in Eq. (14) do not hold: P(s1, s) = 0.

– If the output label of task t is XOR and it has M output edges, (e1, e2, . . . , eM), then after executing t, only one of the output edges m becomes active, and its probability is provided by PPMφ as pm. Thus, an edge j is active if and only if it was active before t executed, or it is the m-th output of t, and it is not an input edge to node t. The transition probability from s1 to s can therefore be defined as follows:

$$
\Pr\{s_1, s\} = \lambda_t\, p_m \iff \mu_j =
\begin{cases}
1 & \text{if the edge labeled } j \in \mathrm{PPM}_\phi \text{ is the } m\text{-th output of active task } t\\
0 & \text{if the edge labeled } j \in \mathrm{PPM}_\phi \text{ is an input edge of active task } t\\
(s_1)_j & \text{otherwise}
\end{cases}
\tag{15}
$$

Here pm is the probability that the m-th output task will execute after the active task t, and λt is a real number ∈ (0, 1]. Since our goal is to estimate the probability of executing tasks in the future, not the order of execution, the choice of which task to execute next, i.e., λti, does not affect the accuracy of the probability estimates.

The same reasoning (Eqs. 14–15) can be applied to compute P(sw, s), w = 1, 2, . . . , |S|, for all states s ∈ S (s ≠ sw).
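To make the construction concrete, the following sketch enumerates one-step transitions (Eqs. 12 and 14) for a hypothetical PPM with a single AND split: the start task α fans out to edges 0 and 1, task b consumes edge 0 and emits edge 2, and task c consumes edge 1 and emits edge 3. The task table, edge numbering, and uniform choice of λ over active tasks are illustrative assumptions, not the paper's insurance model; an XOR output (Eq. 15) would additionally weight each alternative successor state by pm:

```python
# Hypothetical instance-specific PPM with one AND split: task b consumes
# edge 0 and emits edge 2; task c consumes edge 1 and emits edge 3. Both
# have single-edge outputs, so Eq. (14) applies as written.
TASKS = {
    "b": {"in": [0], "out": [2]},
    "c": {"in": [1], "out": [3]},
}

def active_tasks(state):
    """A task with an AND input is active when all of its input edges are 1."""
    return [t for t, d in TASKS.items() if all(state[e] for e in d["in"])]

def transitions(state):
    """One-step transitions out of `state` per Eq. (14), choosing uniformly
    (lambda_t = 1/|A_s|) among the simultaneously active tasks."""
    act = active_tasks(state)
    lam = 1.0 / len(act) if act else 0.0
    result = {}
    for t in act:
        nxt = list(state)
        for e in TASKS[t]["in"]:
            nxt[e] = 0   # input edges of the executed task become inactive
        for e in TASKS[t]["out"]:
            nxt[e] = 1   # its output edges become active
        result[tuple(nxt)] = result.get(tuple(nxt), 0.0) + lam
    return result

# Start state s0: executing alpha (AND output) activates edges 0 and 1 with
# probability one, per Eq. (12).
s0_next = {(1, 1, 0, 0): 1.0}
```

From state (1, 1, 0, 0) both b and c are active, so with λ = 0.5 the chain moves to (0, 1, 1, 0) or (1, 0, 0, 1), each with probability 0.5.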

4.7 Calculating the probability of executing a future task in a process instance containing parallel execution

In the previous section, we showed how to map an instance-specific PPM with parallel execution into an extended space Markov chain. Once the one-step transition matrix is computed for the extended Markov state space (as outlined in Sect. 4.6), the probability of reaching any state from any other state can be obtained using the concept of first passage times outlined in Sect. 4.5.

Suppose we want to compute the probability that task tf will execute for a given process instance φ containing parallel executions, in which task tc is the last task to execute. Assume PPMφ is the instance-specific PPM for instance φ, and let S denote the states in the extended Markov state space of PPMφ. As mentioned in the previous section, a task is executed if it ever gets activated. Let Atf denote the set of all states in the extended space Markov chain in which task tf is active; thus Atf ⊆ S. Recall from Sect. 4.6 that a state s ∈ S in the extended space Markov chain is a tuple of binary numbers whose indices correspond to the status of each edge in PPMφ. If the status of an edge is 1, that edge is fired or active. If at least one active edge in a state s represents an incoming edge to task tf when tf has an XOR input gate label, or alternatively if all incoming edges to tf are active in a state s when tf has an AND input gate label, then task tf is activated in state s. Therefore, Atf consists of all states in S whose active edges correspond to input edges of task tf, where the number of active edges equals the number required to be active by tf's input gateway semantics. For example, if tf is task c in Fig. 5b, since c has only one incoming edge (edge number 2) and consequently no input gate label, edge number 2 activates task c. Therefore, Ac consists of all states in S with the second edge set to 1. In other words, Ac = {(1, 1, 0, 0), (0, 1, 1, 0), (0, 1, 0, 1), (0, 1, 0, 0)}. In general, Atk can be defined as follows:

$$
A_{t_k} = \{\, s \in S \mid \text{task } t_k \text{ is active in state } s \,\} = \{ a^k_1, a^k_2, \ldots, a^k_{|A_{t_k}|} \}
\tag{16}
$$

where $a^k_i$ is a state of the extended Markov state space consisting of binary n-tuples. Hence, the probability that task tf will execute is equivalent to the probability that the process instance is in one of the states contained in Atf. Thus, the probability of executing task tf, given that the process is at tc, is obtained by summing the probability of reaching every state $a^f_i$ of Atf, for all i ≤ |Atf|, from each state in S that becomes active after the execution of task tc from each state $a^c_j$ of Atc, for j ≤ |Atc|. If we repeat this procedure for every pair of tasks tc and tf in V, where V is the set of tasks in PPMφ, the one-step transition matrix of the extended space Markov chain corresponding to PPMφ is converted into a one-step transition matrix of the process tasks. Here, Eq. (11) can be used by employing the one-step transition probabilities of the process tasks. In other words, pij in Eq. (11), in this case, denotes the transition probability from ti to tj.

The transition matrix of the process tasks obtained through the procedure described above does not satisfy the Markovian property in the case of parallel tasks. This is because, when there are parallel paths,

$$
\sum_j p^\phi_{ij} \neq 1.
\tag{17}
$$

Hence, Eq. (7) cannot be used to calculate the execution probability of a terminal task. In order to find the probability of executing a particular task in this case, however, we can still use the first passage time probability from state i to state j as in Eq. (11). Note that Eq. (11) does not assume Eq. (17) and takes into account all the parallel paths to reach a future state, assuming that the next task depends only on the current task.

In summary, the steps for computing the probability of a future task in a running instance of a process that contains parallelism, represented by PPMφ, are as follows:

1. Obtain the one-step transition matrix of the extended Markov state space of PPMφ through the algorithm outlined in Sect. 4.6 (Eqs. 12–15).

2. Let V denote the set of tasks in PPMφ. For a pair of tasks ta and tb, where ta, tb ∈ V, identify the sets of states Ata and Atb belonging to the extended Markov state space of PPMφ that map to the tasks ta and tb, respectively. The transition probability between ta and tb is obtained by summing the probability of reaching each state in Atb from each state in S that becomes active after the execution of task ta from each state in Ata. Compute the transition probability for every pair of tasks ta and tb in PPMφ in this way.

3. Once the one-step process-task transition probability matrix is obtained, compute the probability of executing any task from any other task by using Eq. (11).

Note that Eq. (11) recursively obtains state transition probabilities from one-step transition probabilities without making the Markovian assumption. In the case of parallel paths, the one-step transition probability matrix does not satisfy the Markov properties; hence, Eq. (7) cannot be used. Nevertheless, Eq. (11) is valid even if the Markov property does not hold. Suppose we want to compute the probability that tf will execute for a running process instance whose last executed task is tc. If ϕtctf in Eq. (11) converges to 1, then tf is an absorbing state. Otherwise, tf is a transient state, and Eq. (11) gives the probability of reaching tf from tc. If $f^\phi_{t_c t_f}(n) = 0$ for all values of n in Eq. (10), then the state tf cannot be reached from tc.
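The first-passage computation the section relies on can be sketched directly: the n-step first-passage probability f_ij(n) sums over paths that avoid j for the first n−1 steps, and summing f_ij(n) over n gives the probability of ever reaching j. This is a minimal sketch of that recursion; the truncation at max_n and the list-based matrix layout are our own simplifications:

```python
def first_passage_prob(P, i, j, max_n=200):
    """Probability of ever reaching state j from state i:
    f_ij(1) = p_ij;  f_ij(n) = sum_{k != j} p_ik * f_kj(n-1);
    accumulate f_ij(n) over n, truncated at max_n steps."""
    n = len(P)
    f_prev = [P[k][j] for k in range(n)]   # f_kj(1) for every start state k
    total = f_prev[i]
    for _ in range(2, max_n + 1):
        f_next = [sum(P[s][k] * f_prev[k] for k in range(n) if k != j)
                  for s in range(n)]
        total += f_next[i]
        f_prev = f_next
    return total

# Illustrative task-level matrix: task 0 loops on itself with prob 0.5 and
# reaches tasks 1 or 2 with prob 0.25 each; tasks 1 and 2 are absorbing.
P = [[0.5, 0.25, 0.25],
     [0.0, 1.0,  0.0],
     [0.0, 0.0,  1.0]]
```

For this matrix the probability of ever executing task 1 from task 0 converges to 0.5, while in a chain where task 1 is the only successor it converges to 1, matching the absorbing-state discussion above.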


5 Implementation and results

In order to evaluate our algorithms, we have designed an automobile insurance claims process, which is a typical example of a case-oriented semi-structured business process, and implemented it in a simulator. In this section, we provide details about the process and the simulator, present the results of our prediction technique on simulated auto insurance traces, and introduce the notion of normalized log-likelihood to evaluate the effectiveness of our prediction technique.

5.1 Automobile insurance scenario and simulator design

The scenario is in accordance with typical insurance industry practices [11] and has been simplified for the sake of clarity in our experiments. Automobile insurance is a good candidate for evaluating our algorithms because an automobile insurance process is typically semi-structured and document-driven. A case worker handling an auto insurance case instance typically repeats one or more tasks and executes tasks in parallel. The case worker usually receives documents such as an accident report and a claim document and has to decide which task to execute next on the basis of document contents and personal judgment. The order of task execution can vary dramatically between case instances depending on factors such as the validity of insurance documents, the claimant's insurance history, and who caused an accident.

The auto insurance claims business process implemented by our simulator is shown in Fig. 6. The process contains parallel tasks and loops (of length 2 and 5).

In order to simulate traces in which the set and order of tasks per process instance vary, we introduce the following variations in the simulator:

1. Document-content-driven decision making. Alternate paths, such as whether or not to repair the car, are taken depending on the values of one or more document contents, such as the car value, age of the car, damage estimate amount, etc.

2. Human decision making. Actors in the simulator have properties modeled as probabilities, such as the claim handler's probability of overestimating the car value. (Normally, the car value is obtained from a database, but claim handlers sometimes make honest or even dishonest mistakes. The simulator models this by providing a small chance that such an estimation error will occur.)

3. Invalid deviations. Activity outcomes may deviate from expected behavior. For instance, the notify state task is typically executed when the dollar amount in the payment document is greater than a threshold. Due to deviations introduced in the simulator, the state may sometimes not be notified, even when the payment document dollar amount exceeds the threshold.

The simulator is implemented in Java and produces complete, valid XML execution traces. For our experiments, we generated different sets of 2,000 simulation traces. Table 2 shows the documents in our simulator and the decision points at which they are available during the course of each instance of the automobile insurance process. Table 3 shows the data attributes in each document in the simulator. The attribute Party At Fault is an enumerated type. All other attributes in the simulator are numeric types.

The first step required to create an instance-specific PPM is to mine a process model from process execution traces (line 4 in Algorithm 4.3). We used the publicly available open-source ProM tool¹ [31] to mine a process model from execution traces of the simulator. The transition probabilities on each edge of an instance-specific PPM can be provided by a machine learning algorithm such as a decision tree. In order to compute these transition probabilities, we used the decision tree algorithm J48 provided by the Weka software library [36], which is an implementation of the C4.5 decision tree learning algorithm [21]. Our decision tree implementation is currently restricted to handling numerical values and enumerated types as document attributes. We restricted the parameter minNumObj of the Weka library to 100 in our experiments; minNumObj is the minimum number of traces classified by a given leaf node of the decision tree. A large value of minNumObj corresponds to the aggregation of more cases per leaf node, and thus a simpler decision tree. The time complexity of the C4.5 learning algorithm is O(ZN²), where Z is the number of process execution traces used to train the decision tree, and N is the number of different data attributes in each trace. In our experiments, the number of attributes is typically small (N < 10), and Z is roughly 1,000. In practice, the learning time per decision tree is quite small (<2 s).

¹ http://www.processmining.org/prom/start

Fig. 6 Automobile insurance process implemented by the simulator

Table 2 Documents available at each decision node in the automobile insurance scenario implemented by the simulator

Claim Document: Create Claim, Receive Claim Document, Whose Fault?, Should Car Be Totaled?, Retrieve Customer Data, Retrieve Accident Report
Police Accident Report: Retrieve Accident Report, Get Third Party Info, Whose Fault?, Should Car Be Totaled?, Retrieve Customer Data
Auto Repair Invoice: Handle Repair Request Response, Send Repair Request, Should Car Be Totaled?
Customer Insurance Record: Get Third Party Info
Deductible Document: Should Car Be Totaled?

Table 3 Data value names belonging to each document in the simulator

Claim Document: Estimated Car Value, Car Age, Car Year, Car Damage Estimate
Police Accident Report: Damage Area Size, Party At Fault
Auto Repair Invoice: Percentage of Total Cost
Customer Insurance Record: Number of Years Customer Insured, Number of Customer's Previous Accidents
Deductible Document: Deductible Amount

Figure 7 displays an instance-specific PPM of the auto insurance scenario generated by our prediction technique. The model is mined from a set of 1,000 traces generated by the automobile insurance scenario simulator. Decision trees at every decision node in the model were trained from these 1,000 traces, and the transition probabilities on edges are customized, according to Algorithm 4.3 outlined in Sect. 4.3, to a running instance of the automobile insurance process in which the last task to execute is Retrieve Accident Report.

In the next section, we discuss the results of predictions made by our technique for simulator-generated automobile insurance traces.

5.2 Evaluating the effectiveness of our prediction technique

In order to evaluate the impact of data on prediction, we introduce the likelihood function. Beginning with a set Z of automobile-insurance-simulator-generated traces, we divide Z into two groups: Z/2 training traces and Z/2 test traces. For the experiments in this section, Z = 2,000. The training data set is used to mine a process model and train decision trees at every decision node. A new instance-specific PPM is then created whose transition probabilities are customized to each test trace in the following way. Let z = (t1, t2, . . . , tn) be a trace in the test set, where the ti are executed tasks for a specific case. We choose a random number k between 1 and m, where m is the maximum number of tasks in a trace of the process, and divide trace z into two parts: z1 = (t1, t2, . . . , tk) and z2 = (tk+1, tk+2, . . . , tn). The first part of each test trace, z1, is used to customize the transition probabilities of an instance-specific PPM and to predict every potential future task from the last task to execute in z1. Since the automobile insurance process contains parallel execution, each instance-specific PPM is first mapped to an extended space Markov chain using the method outlined in Sect. 4.6. Next, the one-step transition matrix of each extended space Markov chain is converted into one-step transition probabilities between the process tasks using the method outlined in Sect. 4.7. Then, Eq. (11) for first passage times, introduced in Sect. 4.5, is applied to compute the probability of executing any potential future task.

Fig. 7 An instance-specific PPM generated by our prediction technique for an automobile insurance process instance from simulator-generated execution traces
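The per-trace evaluation split can be sketched as follows; the trace representation and the capping of k so that z2 is never empty are our own assumptions:

```python
import random

def split_trace(trace, m):
    """Split a test trace z at a random point k (1 <= k <= m): z1 is used to
    customize the instance-specific PPM, and z2 is the suffix whose tasks we
    try to predict. k is capped at len(trace) - 1 so z2 is never empty."""
    k = random.randint(1, min(m, len(trace) - 1))
    return trace[:k], trace[k:]

z1, z2 = split_trace(["create", "review", "repair", "pay"], m=10)
```

Regardless of the random split point, concatenating z1 and z2 recovers the original trace.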


z2 is the trace that the process follows after z1. The predictor finds the likelihood of observing a task ti ∈ z2 as pti = Pr{ti | z1}. In order to measure the likelihood of observing the whole trace z2 after z1, we compute the product of ptk+j over all tasks tk+j ∈ z2, i.e., $\prod_{j=1}^{n-k} p_{t_{k+j}}$. Since each ptk+j is a value between 0 and 1, $\prod_{j=1}^{n-k} p_{t_{k+j}}$ could be extremely small, so we define the log-likelihood of trace z2 as lz2 and obtain an approximate expression for the likelihood of observing z2 after z1 by assuming that the tasks of z2 are independent:

$$
l_{z_2} \approx \log\left( \prod_{j=1}^{n-k} p_{t_{k+j}} \right)
\tag{18}
$$

In reality, two consecutive tasks in trace z2 depend on each other. Hence, the exact expression for lz2 contains conditional transition probabilities between consecutive tasks. We have simplified the expression for lz2 in Eq. (18) by assuming that all tasks in trace z2 depend only on the last task of z1 and that there is no dependency among the tasks of z2. Nevertheless, for the purpose of showing the impact of document contents on prediction accuracy, this approximation provides a useful comparison (as discussed below).

We can normalize the likelihood function (by n − k, the total number of tasks in z2):

$$
\bar{l}_{z_2} = \frac{l_{z_2}}{n-k}
\tag{19}
$$
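Equations (18)–(19) amount to averaging log-probabilities over the predicted suffix. A minimal sketch (the function name is ours); note that a random-guess predictor assigning every task probability 0.5 scores log 0.5 ≈ −0.693, which matches the random-guess value reported later in this section:

```python
import math

def normalized_log_likelihood(task_probs):
    """Eqs. (18)-(19): sum of log p_{t_{k+j}} over the n - k tasks of z2,
    divided by n - k."""
    return sum(math.log(p) for p in task_probs) / len(task_probs)
```

Higher (less negative) values indicate that the predictor assigned more probability mass to the tasks that actually occurred.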

We computed the normalized log-likelihood of 1,000 test traces of the automobile insurance scenario generated by our simulator. We then varied the percentage of total document data values available at the point at which instance-specific transition probabilities are created for a probabilistic process model customized to a given test trace. The document contents are varied from 20 %, in increments of 20 %, up to 100 %. As explained in detail in Sect. 4.4, if document values are relevant to the decision-making ability of a given decision node, then providing a greater number of such document data values to train a decision tree at that node increases the accuracy of the probabilities on the edges emanating from it. Therefore, varying the document data values available when customizing the transition probabilities of a PPM to a given instance is a way to determine how the predictive power of our technique varies with data. Figure 8 reports the normalized log-likelihood for predictions made by instance-specific PPMs created with an increasing percentage of available data values. These results are for simulator-generated automobile insurance traces corresponding to the scenario in Fig. 6. The normalized log-likelihood numbers reported are aggregated over 1,000 test traces. Each experiment is repeated 50 times with a new set of randomly generated traces, with average values reported for each experiment, including error bars indicating the margin of variation over the 50 runs.

We compared the log-likelihood values of our prediction technique against the log-likelihood of predictions generated by a conditional probability approach and by random guessing. With the conditional approach, if the last executed task is tb, then task ta is predicted to execute with probability P(ta | tb), which can be estimated empirically from training traces by computing the normalized frequency of transitions from tb to ta. The conditional method excludes the potential impact of data completely, ignores loops and parallelism in the process, and simply uses historical task executions aggregated over multiple traces to predict future tasks. We also compared against the results of a random-guess prediction method in which each task is predicted to execute with probability 50 %. The average log-likelihood of the traces for random guessing (not reported in the graph) is −0.69314 ± 0.05 with simulator data. Random guessing does not utilize any structure of the process or task-related data and is included for comparison alone. As expected, random guessing is much worse than the other prediction methods and simply serves as a sanity check.

Fig. 8 Using normalized log-likelihood to compare the effectiveness of our prediction technique, as more data are made available, with a conditional probability prediction approach
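The conditional-probability baseline is a pure frequency count over training traces. A sketch, with illustrative trace data and a function name of our own choosing:

```python
from collections import Counter, defaultdict

def one_step_conditional(traces):
    """Empirical P(t_a | t_b): the normalized frequency of b -> a
    transitions observed in the training traces."""
    counts = defaultdict(Counter)
    for trace in traces:
        for b, a in zip(trace, trace[1:]):
            counts[b][a] += 1
    return {b: {a: c / sum(ctr.values()) for a, c in ctr.items()}
            for b, ctr in counts.items()}

# Hypothetical training traces: "review" is followed by "pay" twice
# and by "reject" once.
traces = [["create", "review", "pay"],
          ["create", "review", "reject"],
          ["create", "review", "pay"]]
P = one_step_conditional(traces)
```

As the text notes, this estimator ignores document data, loops, and parallelism; it conditions only on the single last executed task.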

Several observations can be made from Fig. 8:

1. The normalized log-likelihood of predictions made by an instance-specific PPM created with x % of document values differs from that of the conditional probability prediction approach. This demonstrates that taking the semantics of the business process (e.g., loops and parallelism) into account makes a difference in the prediction results compared to methods that ignore them.

2. The normalized log-likelihood of predictions made by each instance-specific PPM increases as the number of document values available to customize its transition probabilities increases. This confirms our hypothesis that making more data available that is relevant to the decision making of a process instance, when creating an instance-specific PPM, increases its predictive power. At some point, adding more document content values may not improve the prediction quality noticeably, particularly for simulator-generated data. One reason could be that, after a point, additional document values do not have any influence over the decision points in the business process.

Decision trees, like other machine learning tools, provide a measure of confidence with each prediction. For example, the decision tree learned at the decision node Should Car Be Totaled? in Fig. 7 could predict for a given partial trace that the decision is Send Repair Request with 70 % confidence. Statistically, this means that the decision tree at this node makes an inaccurate prediction 30 % of the time; in other words, the observed accuracy is statistically similar to the confidence level of the decision tree. When we use decision trees to create transition probabilities for an instance-specific PPM, we are utilizing the predictions of multiple decision trees, all of which carry a certain level of uncertainty. When we translate an instance-specific PPM into an extended space Markov chain (in order to address parallelism in the model), we are transforming the uncertainty captured by the instance-specific PPM into an extended state space. Finally, when we use the algorithms in Sect. 4.7 to compute the likelihood of a future task, that prediction is accompanied by a level of uncertainty that originally arose from the predictions of the decision trees and was eventually aggregated and transformed. Understanding and providing a comprehensive measure of the uncertainty associated with each prediction made by our technique is a topic of our future work.

6 Conclusion and future work

In this paper, we have proposed a probabilistic technique for making predictions, for a case worker handling a running instance of a semi-structured business process, about the likelihood of future tasks by creating an instance-specific PPM with Markov properties. We show how an instance-specific PPM that contains parallel execution can be transformed into an extended space Markov chain and demonstrate how Markov methods can be used to calculate the probability of execution of any potential future task in a running process instance. These predictions could provide guidance to new case workers on how cases are typically handled and generate early alerts when undesirable outcomes are likely to happen. We have implemented our algorithm and evaluated its effectiveness on simulator-generated traces of an automobile insurance scenario. Experimental results suggest that our technique provides a greater number of correct predictions than techniques that do not take into account the semantics of the business process, such as loops and parallelism. Results also show that the predictive power of an instance-specific PPM increases with the amount of data (relevant to the decision making of the instance) made available to compute its transition probabilities. In future work, we intend to evaluate the effectiveness of our prediction technique on business process execution traces belonging to a real semi-structured business process management system. We also intend to investigate the uncertainty involved in our prediction technique and design algorithms to measure it. In this paper, we assumed that existing mining techniques are sufficient to mine process models of semi-structured business processes. For real-world situations where this assumption may not hold, further research into process mining algorithms for flexible data-driven semi-structured business processes is required, and this is an area we intend to explore in the future.

Acknowledgments We thank Songyun Duan and Paul T. Keyser for valuable discussions.

References

1. Cook JE, Wolf AL (1998) Discovering models of software processes from event-based data. ACM Trans Softw Eng Method 7(3):215–249
2. Critical capabilities for composite content management applications. Gartner report (2010)
3. Curbera F, Doganata YD, Martens A, Mukhi N, Slominski A (2008) Business provenance—a technology to increase traceability of end-to-end operations. OTM conferences, 1, pp 100–119
4. Datta A (1998) Automating the discovery of AS-IS business process models: probabilistic and algorithmic approaches. Inf Syst Res 9(3):275–301
5. Feller W (1957) An introduction to probability theory and its applications, vol 1. Wiley, New York. ISBN 0-471-25708-7
6. Grienstead CM, Snell L (1991) Introduction to probability. American Mathematical Society. ISBN 0-8218-0749-8
7. Grigori D, Casati F, Castellanos M, Dayal U, Sayal M, Shan M (2004) Business process intelligence. Comput Ind 53(3):321–343
8. Herbst J (2000) A machine learning approach to workflow management. ECML, pp 183–194
9. Herbst J, Karagiannis D (1998) Integrating machine learning and workflow management to support acquisition and adaption of workflow models. DEXA workshop, pp 745–752
10. Hillier FS, Lieberman GJ (1986) Introduction to operations research, 4th edn. Holden-Day Inc., San Francisco, CA, USA
11. IBM Insurance Application Architecture. http://www-03.ibm.com/industries/insurance/us/detail/solution/P669447B27619A15.html?tab=3
12. Jensen K (1997) Coloured petri nets. Basic concepts, analysis methods and practical use, vol 3: practical use. Monographs in theoretical computer science, Springer, Berlin. ISBN 3-540-62867-3
13. Lakshmanan GT, Duan S, Keyser PT, Khalaf R, Curbera F (2010) A heuristic approach for making predictions for semi-structured case oriented business processes. Business process management workshops, pp 640–651
14. Lakshmanan GT, Khalaf R (2012) Leveraging process mining techniques to analyze semi-structured processes. IEEE IT Professional, to appear. http://doi.ieeecomputersociety.org/10.1109/MITP.2012.88
15. Liu S, Duffy AHB, Whitfield RI, Boyle IM (2010) Integration of decision support systems to improve decision support performance. Knowl Inf Syst 22(3):261–286
16. Murata T (1989) Petri nets: properties, analysis and applications. In: Proceedings of the IEEE, vol 77, no 4
17. Natarajan S, Tadepalli P, Fern A (2011) A relational hierarchical model of decision-theoretic assistance. Knowl Inf Syst (KAIS):1–21
18. Paz JF, Bajo J, González A, Rodríguez S, Corchado JM (2012) Combining case-based reasoning systems and support vector regression to evaluate the atmosphere-ocean interaction. Knowl Inf Syst 30(1):155–177
19. Pfeffer A (2005) Functional specification of probabilistic process models. AAAI, pp 663–669
20. Poh KL (2000) An intelligent decision support system for investment analysis. Knowl Inf Syst, pp 340–358
21. Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann
22. Ross S (2003) Introduction to probability models, 8th edn, Chap. 4
23. Rozinat A, Wynn MT, van der Aalst WMP, ter Hofstede AHM, Fidge CJ (2009) Workflow simulation for operational decision support. Data Knowl Eng 68(9):834–850
24. Rozinat A, van der Aalst WMP (2006) Decision mining in ProM. Bus Process Manag, pp 420–425
25. Rozsnyai S, Slominski A, Lakshmanan GT (2011) Discovering event correlation rules for semi-structured business processes. In: Proceedings of the 5th ACM international conference on distributed event-based systems, ACM, New York, pp 75–86
26. Schonenberg H, Weber B, van Dongen BF, van der Aalst WMP (2008) Supporting flexible processes through recommendations based on history. BPM, pp 51–66
27. Taylor HM, Karlin S (1998) An introduction to stochastic modeling, 3rd edn, Chap. 3–4
28. van der Aalst WMP (2011) Process mining—discovery, conformance and enhancement of business processes. Springer, Berlin, pp I–XVI, 1–352
29. van der Aalst WMP, Reijers HA, Weijters AJMM, van Dongen BF, Alves de Medeiros AK, Song M et al (2007) Business process mining: an industrial application. Inf Syst 32(5):713–732
30. van der Aalst WMP, Schonenberg MH, Song M (2011) Time prediction based on process mining. Inf Syst 36(2):450–475
31. van der Aalst WMP, van Dongen BF, Günther CW, Rozinat A, Verbeek E, Weijters T (2009) ProM: the process mining toolkit. BPM (Demos)
32. van der Aalst WMP, van Dongen BF, Herbst J, Maruster L, Schimm G, Weijters AJMM et al (2003) Workflow mining: a survey of issues and approaches. Data Knowl Eng 47(2):237–267
33. van der Aalst WMP, Weske M, Grünbauer D (2005) Case handling: a new paradigm for business process support. KDE 53(2):129–162
34. van Dongen BF, Crooy RA, van der Aalst WMP (2008) Cycle time prediction: when will this case finally be finished? OTM conferences, 1, pp 319–336
35. Vanderfeesten ITP, Reijers HA, van der Aalst WMP (2011) Product-based workflow support. Inf Syst 36(2):517–535
36. Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, Burlington
37. Zenie A (1985) Coloured stochastic petri nets. In: Proceedings of the international workshop on timed petri nets, IEEE Computer Society Press, Torino, pp 262–271
38. Zhu WD, Becker B, Boudreaux J, Baman S, Gomez D, Marin M, Vaughan A (2000) Advanced case management with IBM case manager. IBM Redbooks. http://www.redbooks.ibm.com/abstracts/sg247929.html?Open

Author Biographies

Geetika T. Lakshmanan is a Research Staff Member at IBM T. J. Watson Research Center. An important theme in Lakshmanan's research is the management, mining and prediction of data-driven business processes to address new problems introduced by emerging applications, including healthcare and banking. Lakshmanan is an active member of the business process management, services, distributed computing and knowledge management research communities, having co-authored over 30 technical papers, holding nine US patents, and having organized workshops and guest edited special issues across these domains. Lakshmanan has M.S. and Ph.D. degrees in computer science from Harvard University and a B.A. with highest honors from Smith College. She is an IBM Master Inventor, an IEEE Senior Member, and a member of the ACM. Contact her via http://researcher.watson.ibm.com/researcher/view.php?person=us-gtlakshm or at [email protected].

Davood Shamsi is currently a Research Staff Member in the Advertising.com group at AOL Inc. in Palo Alto, California. His research interests include large-scale machine learning algorithms and optimization. In particular, he is working on developing machine learning algorithms and data structures that are fast, scalable, and distributed for web-scale data sets with billions of users and millions of features. His other interests include numerical optimization, process modeling, stochastic inference, and optimization over networks. Davood holds an M.S. in Management Science and Engineering (operations research) from Stanford University and an M.S. in Computer and Electrical Engineering from Rice University. He received B.S. degrees in Mathematics and Electrical Engineering from Sharif University in 2005.

Yurdaer N. Doganata is a Research Staff Member at IBM T. J. Watson Research Center and the research lead at the IBM-CaixaBank Digital Innovation Center. Since he first joined IBM Research in 1989, he has worked on and managed projects in a broad range of research topics, including queuing theory, intelligent transportation systems, multimedia servers, web-based collaboration, electronic services, technical support search systems, unstructured information management, and business process management. His current research interests include decision management systems, predictive modeling, and managing uncertainty in semi-managed business processes. He received an M.S. from Middle East Technical University in 1983 and a Ph.D. from the California Institute of Technology in 1987, both in Electrical and Electronics Engineering. He is a Senior Member of the IEEE.

Merve Unuvar is a Research Staff Member in the Middleware and Virtualization Management group at IBM T. J. Watson Research Center. Merve holds a Ph.D. and an M.S. in Operations Research from Rutgers University, and a B.S. in Industrial Engineering from Bilkent University, Turkey. Merve's research interests include stochastic modeling and programming, probabilistic network design (such as bounding the reliability of a distributed system), optimization with discrete random variables, data mining, management, and predictive analytics.

Rania Khalaf is a Research Staff Member in, and manager of, the Component Systems group at the IBM T. J. Watson Research Center. She received her Doctorate from the University of Stuttgart and her Bachelor's and Master's degrees in Computer Science and Electrical Engineering from the Massachusetts Institute of Technology (MIT). Her interests include Service Computing, Business Process Management, and Distributed Computing. Khalaf's research focuses on business process languages and systems, along the spectrum from fully structured, modeled workflows to semi-structured processes in which people play a key role, such as Case Management, to lightweight 'business mashups' in SaaS systems, to those that are completely ad hoc, potentially without an underlying model or supporting runtime.

