  • Contents lists available at SciVerse ScienceDirect

    Information Systems

    Information Systems 37 (2012) 714–736


    journal homepage: www.elsevier.com/locate/infosys

An iterative approach to synthesize business process templates from compliance rules

Ahmed Awad a, Rajeev Goré c, Zhe Hou c, James Thomson c, Matthias Weidlich b,*

a Faculty of Computers and Information, Cairo University, Egypt
    b Hasso Plattner Institute, University of Potsdam, Germany
    c School of Computer Science, The Australian National University, Australia

Article info

    Available online 11 May 2012

    Keywords:

    Process synthesis

Analysis of business process compliance specification

    Process mining

0306-4379/$ - see front matter © 2012 Elsevier Ltd. All rights reserved.

    http://dx.doi.org/10.1016/j.is.2012.05.001

    * Corresponding author. Tel.: +49 331 5509180.

    E-mail addresses: [email protected] (A. Awad), [email protected] (R. Goré), [email protected] (Z. Hou), [email protected] (J. Thomson), [email protected] (M. Weidlich).

Abstract

Companies have to adhere to compliance requirements. The compliance analysis of business operations is typically a joint effort of business experts and compliance experts. Those experts need to create a common understanding of business processes to effectively conduct compliance management. In this paper, we present a technique that aims at supporting this process. We argue that process templates generated out of compliance requirements provide a basis for negotiation among business and compliance experts. We introduce a semi-automated and iterative approach to the synthesis of such process templates from compliance requirements expressed in Linear Temporal Logic (LTL). We show how generic constraints related to business process execution are incorporated and present criteria that point at underspecification. Further, we outline how such underspecification may be resolved to iteratively build up a complete specification. For the synthesis, we leverage existing work on process mining and process restructuring. However, our approach is not limited to the control-flow perspective, but also considers direct and indirect data-flow dependencies. Finally, we elaborate on the application of the derived process templates and present an implementation of our approach.

    © 2012 Elsevier Ltd. All rights reserved.

    1. Introduction

Compliance management has received increasing attention in recent years. Numerous financial scandals in large companies have fostered increasing attention on compliance management and have led to legislation initiatives such as SOX [1]. When aiming to control business operations, compliance checking focuses on many different aspects of a business process; for example, compliance requirements may restrict the order in which activities are executed. Often, constraints on the execution of activities are also based on the data context, and dedicated data values may require the execution, or the absence of any execution, of an activity. There may even be restrictions on role resolution to realize a separation of duty.

Driven by these trends, numerous approaches have been presented to address compliance management of business processes, and they can be classified as follows. First, compliance rules may guide the design of a business process [2,3] so that compliance is ensured by design, since compliance violations are identified in the course of process model creation. Second, existing process models are verified against compliance rules [4,5]. Given compliance requirements and a process model as input, these approaches identify violations on the process model level.

Clearly, addressing compliance during the design of business operations has many advantages. Non-compliant processes are prevented at an early stage of process implementation, and costly post-implementation compliance verification along with root cause analysis of non-compliance is not needed. In most cases, process models that are synthesized from compliance rules cannot be directly used for implementing a business process. Instead, they should be seen as a blueprint that is used as a basis for negotiation between business and compliance experts. Hence, we refer to these process models as process templates in order to emphasize that further refinements are needed to actually implement the business process [6]. While this approach has been advocated by other authors, e.g., [2,7–9], existing approaches are limited when it comes to data-dependent compliance requirements.

In this paper, we build on our previous work [10] and present an iterative approach to the synthesis of compliant process templates. We start with a set of compliance rules expressed in Linear Temporal Logic (LTL). Hence, we do not require the definition of explicit points in time as in [2,7], but focus on relative execution order dependencies. Further, we also consider data flow dependencies between activity executions. This is neglected in [8], whereas other work requires the pattern-based coupling of control flow routing and data conditions [9]. These rules are then enriched with general constraints related to business process execution. To reach the ultimate goal of generating a process template, and hence to have a common understanding of the compliance requirements, we go through a set of steps that might be repeated several times. First, we check whether the compliance requirements are satisfiable. If the compliance requirements are not satisfiable, we analyze the reasons for this inconsistency. If the requirements are satisfiable, we generate all possible execution sequences (traces) and check them for underspecification. If the requirements are well-specified, a process template is generated from these traces. Whenever a specific step fails, e.g., the requirements are inconsistent, our approach suggests some remedies, the requirements are updated by the user, and a new iteration starts. Finally, we also illustrate how generated templates are applied during process design and how the template generation may identify inconsistencies and open questions. Hence, the template guides further refinements of the process model and the compliance requirements. To evaluate the applicability of our approach, we present a prototypical implementation.

This paper revises and extends our initial approach presented in [10] in various directions. In this paper, we show how analysis is conducted if the compliance rules are inconsistent, an aspect neglected in our previous work. Further, we now use a theorem proving approach to analyze the generated traces and extend the trace criteria to be verified. That is, we explicitly verify correctness of the traces regarding the order of execution as induced by data dependencies. In addition, our new approach comes with resolution strategies when the trace correctness criteria are not satisfied. Finally, we present a novel approach for the actual synthesis, which incorporates techniques from the field of process restructuring. Compared to the synthesis presented in [10], it has the advantage that we synthesize well-structured process models and that we take indirect data dependencies (between activities that do not follow each other directly) into account. As such, our contribution is a comprehensive approach to process design grounded in compliance rules.

The remainder of this paper is structured as follows. The next section introduces preliminaries for our work, such as the applied formalism. Section 3 gives an overview of our approach of synthesizing process templates from a given set of compliance rules and discusses the LTL encoding. Section 4 elaborates on how to detect inconsistencies if the obtained LTL specification is not satisfiable. The generation of traces along with their analysis is presented in Section 5. Section 6 shows how to cope with issues that are detected during the analysis of traces. Then, we discuss the actual synthesis of a process template and its application in Section 7. A prototypical implementation of our approach is presented in Section 8. Finally, Section 10 reviews related work, before we conclude in Section 11.

    2. Preliminaries

This section gives preliminaries for our work. Section 2.1 clarifies our notion of execution semantics. Section 2.2 discusses the reason we choose LTL instead of other logics. Section 2.3 presents LTL as the logic used in this paper. Section 2.4 summarizes existing work on generating a behavioral model from a given set of LTL formulae.

    2.1. Process runs as linear sequences

In this paper, we rely on trace semantics for process models. An execution sequence σ of a process model is referred to as a process run or trace: a finite and discrete linear sequence of states σ : s0, s1, …, sn with a start state s0 and an end state sn. Clearly, a process model, as well as a set of compliance requirements, might allow for many conforming traces, e.g., due to choice points in the process model. Each state of a trace is labeled with propositions that refer to actions and results. Actions are the driving force of a trace and refer to the execution of business activities. This, in turn, may effect or be constrained by results, which relate to data values of the business process. As an example, think of an activity 'risk analysis' (ra) and a data object 'risk'. The action that represents the execution of this activity may have the result of setting the value of the data object to 'high' or 'low'. The execution of another activity, i.e., another action, may be allowed to happen solely if a certain result, e.g., the object has been set to 'high', occurred. Both actions and results are represented by Boolean propositions at each state. For instance, proposition ra being 'true' at a state si means that the action, i.e., execution of activity 'risk analysis', happened at state si. In contrast, proposition ra being 'false' at state si means that the action did not happen at state si. Given a trace σ : s0, s1, …, sn, we write p ∈ si to indicate that proposition p is true in state si, for 0 ≤ i ≤ n, and p ∈ σ if there is a state si in σ where p ∈ si, for some 0 ≤ i ≤ n.

    We represent an execution sequence as a linearsequence of states where states are labelled with bothactions and results, and (unlabelled) edges between statesrepresent the temporal ordering in the sequence.
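This trace representation can be sketched directly in code. The following is an illustrative encoding (the names `holds_at` and `occurs_in` are ours, not from the paper's implementation): each state is the set of propositions, actions and results alike, that are true in it.

```python
# Illustrative sketch of the trace semantics of Section 2.1: a trace is a
# finite sequence of states; each state is the set of true propositions.

def holds_at(p, trace, i):
    """p in s_i: proposition p is true in state i of the trace."""
    return p in trace[i]

def occurs_in(p, trace):
    """p in sigma: p is true in at least one state of the trace."""
    return any(p in state for state in trace)

# Example: 'risk analysis' (ra) happens at s1 and sets the risk to 'high'.
sigma = [
    frozenset({"start", "ri"}),   # s0: initial state, risk object 'initial'
    frozenset({"ra", "rh"}),      # s1: ra executed, result: risk 'high'
    frozenset({"od", "rh"}),      # s2: open request denied
    frozenset({"end", "rh"}),     # s3: end state
]

assert holds_at("ra", sigma, 1)
assert not holds_at("ra", sigma, 2)   # ra did not happen at s2
assert occurs_in("od", sigma)
```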


    2.2. A suitable logic for process modelling

Various logics have been proposed for the purpose of process modelling. Deontic modalities were used to define policies and business rules by Padmanabhan et al. [11]; similarly, Goedertier et al. [2,7] adopted PENELOPE, a deontic logic with temporal assignments, to enact control-flow-based processes. van der Aalst et al. [12–14], on the other hand, developed a declarative workflow management system that uses Linear Temporal Logic (LTL) [15] to drive the design and execution of processes.

The study of logics for process modelling is still ongoing; there is no conclusion that one is definitely better than another. Following our intuition to capture process runs, however, LTL seems to be a suitable logic to reason about relations of actions and results in a sequence of executions. First of all, with its temporal operators, LTL brings the power to specify the uncertain order relation between actions. This advantage is exploited in van der Aalst et al.'s approach to declaratively enact flexible processes, and can thus be considered an important future extension to our approach. Conversely, PENELOPE enforces exact times for events with its temporal assignments, and consequently fixes the structure of the process. Second, many workflow management systems record process executions as 'event logs', which contain the ordering of events in each process run. Recorded event logs can be used for process mining, and by adopting machine learning techniques, van der Aalst et al. have shown that this helps support the design of processes by rediscovering patterns from past experience. To this end, choosing LTL is plausible since it naturally captures the ordering of events in linear time. Third, by LTL theorem proving, we will show in Section 2.4 that we can derive all possible traces from the rules in an LTL specification. This means we guarantee that the process model generated from the rules is logically correct and contains all permitted executions described by the rules, with all invalid executions excluded. Finally, there are many versions of deontic logic that enable different sets of axioms [16]; it is nontrivial to decide which one is best in the process modelling context. There may be missing axioms that could be useful, or present ones that could be harmful. Therefore, for simplicity, LTL enables us to encode and reason about processes in our setting, and also provides us with possibilities to extend our work in related areas in the future.

Before we set LTL in stone as our underlying logic, we considered other logics such as Computation Tree Logic (CTL) and Propositional Dynamic Logic (PDL). The expressive power of CTL and LTL is incomparable; there are sentences that can be expressed in one but not the other. Nevertheless, since the domain knowledge and rules we define for the process are actually constraints on all traces, we can just add the universal path quantifier in front of every state quantifier to form the corresponding CTL encoding. In this way, however, the full expressive power of CTL is unexploited. Even though CTL is branching, it is hard to utilise the existential path quantifier to merge all possible paths in one model. That is, to extract all correct traces, we still have to consider all models in CTL, and the search space is not reduced. Thus CTL does not benefit us in an obvious way. PDL, as a modal logic, has semantics similar to deontic logic, which, in a sense, enables us to describe what must occur and what may occur. However, eventualities in temporal logics, which are the key to flexible structured processes, are hard to simulate in PDL. Although PDL naturally supports distinguishing control flow from data flow (actions as programmes and results as propositions), it does not help to express the temporal relations that we want to encode.

As a result, we choose LTL as the suitable logic for our approach; the details of the encoding and other techniques relating to LTL will be presented in the rest of this paper.

    2.3. Linear temporal logic

Linear Temporal Logic is a logic specifically designed for expressing and reasoning about properties of linear sequences of states. The formulae of LTL are built from atomic propositions using the connectives ∨ (or), ∧ (and), ¬ (not) and ⇒ (implication), and the following temporal connectives: X (next), F (eventually), G (always), U (until) and B (before). There are two logical constants, ⊤ (verum) and ⊥ (falsum), which are always true and always false respectively. The temporal connectives are interpreted as follows:

Xφ: in the neXt state, φ holds.

    Fφ: there is some state, either now or in the Future, where φ holds.

    Gφ: in every state Globally from now on, φ holds.

    φUψ: there is some state, either now or in the future, where ψ holds, and φ holds in every state from now Until that state.

    φBψ: Before ψ holds, if it ever does, φ must hold.

Note that Fφ can be defined as ⊤Uφ, and similarly Gφ can be defined as ⊥B¬φ. We apply LTL to encode compliance requirements. Hence, we obtain a set Γ of LTL formulae expressing the constraints to which compliant traces have to conform.
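As an illustration of these semantics, the following sketch evaluates LTL formulae over finite traces by direct quantification over positions. It is our own simplified encoding (formulae as nested tuples), not the tableau machinery used later in the paper; in particular, the B case assumes the reading "if ψ ever holds, φ holds strictly earlier", under which Gφ ≡ ⊥B¬φ comes out as expected.

```python
# A minimal evaluator for the connectives of Section 2.3 over *finite*
# traces; states are sets of true propositions, formulas are nested tuples.

def sat(trace, i, f):
    """Does formula f hold at position i of the finite trace?"""
    if isinstance(f, str):                      # atomic proposition
        return f in trace[i]
    op = f[0]
    if op == "not":
        return not sat(trace, i, f[1])
    if op == "and":
        return sat(trace, i, f[1]) and sat(trace, i, f[2])
    if op == "or":
        return sat(trace, i, f[1]) or sat(trace, i, f[2])
    if op == "X":                               # next (false at the last state)
        return i + 1 < len(trace) and sat(trace, i + 1, f[1])
    if op == "F":                               # eventually: some j >= i
        return any(sat(trace, j, f[1]) for j in range(i, len(trace)))
    if op == "G":                               # globally: all j >= i
        return all(sat(trace, j, f[1]) for j in range(i, len(trace)))
    if op == "U":                               # f1 until f2
        return any(sat(trace, j, f[2]) and
                   all(sat(trace, k, f[1]) for k in range(i, j))
                   for j in range(i, len(trace)))
    if op == "B":                               # f1 before f2 (strictly earlier)
        for j in range(i, len(trace)):
            if sat(trace, j, f[2]):
                return any(sat(trace, k, f[1]) for k in range(i, j))
        return True                             # f2 never holds
    raise ValueError(op)

# F phi agrees with "true U phi" on this trace, as noted in the text.
trace = [{"a"}, {"b"}, {"c"}]
assert sat(trace, 0, ("F", "c"))
assert sat(trace, 0, ("U", ("or", "a", ("not", "a")), "c"))
assert sat(trace, 0, ("B", "a", "c"))     # a occurs before c
```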

    2.4. Finding all LTL-models of a set of LTL formulae

Given a collection of compliance requirements (constraints) expressed as a set Γ of LTL formulae, we try to find a behavioral model that captures all formula-models, i.e., traces in our setting, which satisfy Γ. That is, such a structure describes all linear sequences of states s0, s1, …, sn such that Γ is true at s0. Since Γ may contain eventualities, such as Fφ or ψ1Uψ2, ensuring that Γ is true at s0 may require us to ensure that φ or ψ2 is true eventually at some state si with 0 ≤ i ≤ n. In contrast to model checking [17], we are not given a single trace, but construct all traces satisfying the given constraints.

The first step is to determine whether the constraints Γ are satisfiable. If not, the specification is erroneous since no trace can conform to the given constraints. The second step is the creation of the behavioral model that describes all traces.


For both steps, we use a graph-based tableaux method introduced in [18,19]. In essence, this approach works as follows. We start by creating a root node containing Γ and proceed in two phases. First, a finite (cyclic) graph of tableau nodes is created by applying tableau-expansion rules that capture the semantics of LTL and by pruning nodes containing local contradictions [18]. Second, once the graph is complete, a reachability algorithm is used to determine which nodes do not satisfy their eventualities. These nodes are removed and the reachability algorithm is reapplied until no nodes may be removed. The set of formulae Γ is satisfiable if and only if the root node has not been removed [18]. Further, the graph created by the tableau algorithm, referred to as the pseudomodel, describes all possible formula-models, i.e., possible traces [18]. We use this pseudomodel to extract possible traces during our synthesis approach.

In this paper, two types of tableaux methods are used for different purposes. One is the graph-based method mentioned above, which is used for extracting traces (cf. Section 5.1). The other is the tree-based method with back-jumping, which is used for analysing the cause of inconsistency (cf. Section 4.2). Deciding satisfiability in LTL is PSPACE-complete [20], and the tableaux methods we use are exponential in time. It is also commonly acknowledged that the time complexity of the model checking problem in LTL is exponential in the size of the formula [21], so theoretically other methods based on LTL model checking do not have an advantage over our theorem proving method in terms of run time.
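The pruning phase of the graph-based method can be sketched as follows. The node structure and names here are invented for illustration, and only the eventuality-reachability check is shown; the actual provers of [18,19] do considerably more.

```python
# Sketch of the second tableau phase: repeatedly remove nodes whose
# eventualities cannot be fulfilled by any node still reachable from them.

def prune(nodes, edges, eventualities, fulfils):
    """
    nodes:         set of node ids
    edges:         dict node -> set of successor nodes
    eventualities: dict node -> eventualities the node must satisfy
    fulfils:       dict node -> eventualities the node itself fulfils
    Returns the surviving nodes; the formula set is satisfiable iff the
    root node survives.
    """
    alive = set(nodes)
    changed = True
    while changed:
        changed = False
        for n in list(alive):
            # nodes reachable from n through surviving nodes (including n)
            reach, stack = {n}, [n]
            while stack:
                for m in edges.get(stack.pop(), ()):
                    if m in alive and m not in reach:
                        reach.add(m)
                        stack.append(m)
            # every eventuality of n must be fulfilled somewhere reachable
            if any(not any(e in fulfils.get(m, ()) for m in reach)
                   for e in eventualities.get(n, ())):
                alive.remove(n)
                changed = True
    return alive

# Root 0 must fulfil eventuality "Fp"; node 1 fulfils it, so all survive.
nodes, edges = {0, 1, 2}, {0: {1}, 1: {2}, 2: {2}}
assert prune(nodes, edges, {0: {"Fp"}}, {1: {"Fp"}}) == {0, 1, 2}
# If no node fulfils "Fp", the root is removed: the set is unsatisfiable.
assert prune(nodes, edges, {0: {"Fp"}}, {}) == {1, 2}
```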

    3. The basis of the synthesis approach

Section 3.1 gives an overview of the synthesis of process models from a set of compliance rules and introduces an example set of compliance rules used to illustrate all subsequent steps. Section 3.2 describes the LTL encoding of the compliance rules and additional domain knowledge.

    3.1. Overview

The process model in Fig. 1 visualizes the steps to synthesize a process template out of a set of compliance rules. First, a set of compliance rules is collected and formulated in LTL; we refer to the set of rules as CR, step A. In order to identify whether these requirements are consistent and thus a process template can be synthesized, related domain-specific knowledge is identified; we refer to the domain knowledge as DK, step B. In Section 3.2 we give details on the LTL encoding of both compliance rules and domain knowledge.

    Fig. 1. Process synthesis approach.

    For the conjunction Γ of the LTL formulae in CR and DK, we verify satisfiability as summarized in Section 2.4, step C. If Γ is not satisfiable, then no trace can be constructed to satisfy the given LTL formulae. At that point, inconsistency analysis is conducted, step D, to identify the causes of inconsistency; details of this step are given in Section 4. Note that all three substeps of the inconsistency analysis introduced in Section 4.4 (checking the domain knowledge, the rules, and their conjunction, respectively) are conducted in step D. That is, in step C the aim is to generate a pseudomodel out of the specification. But if the conjunction of domain knowledge and rules is not satisfiable, there can be no pseudomodel; thus the satisfiability check in step C is not for inconsistency analysis but for deciding whether the process should go to step D or step E. As discussed in Section 2.4, since the purposes of the satisfiability checking in step C and step D differ, we use a graph-based theorem prover in the former and a tree-based theorem prover in the latter.

    On the other hand, if Γ is satisfiable, then the satisfiability checker automatically returns the pseudomodel, which is a behavioral model of all traces that obey the given constraints. As a next step, finite traces are extracted from the pseudomodel by following all choice points and stopping when a trace becomes cyclic, step E. Having a finite set of traces that satisfy the compliance rules, we check them for some correctness criteria, step F. Failure of these criteria hints at issues in the specification, so that a new iteration of the synthesis may be started with refined compliance rules or adapted domain knowledge. Steps E and F are discussed in Section 5. In steps G and H, we check whether it is possible to restore the correctness criteria of the traces. Both steps are described in Section 6.

    If the traces obey the correctness criteria, we use a process synthesis algorithm to extract a process template, step J. The synthesized template is then analyzed to identify discrepancies that stem, e.g., from underspecification, step K, which is basically a manual step handled by a human expert. Depending on the result of this analysis, again, a new iteration of the synthesis may be started. Section 7 describes the template generation step.
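The iterative control flow of steps A–K can be summarized in Python-style pseudocode; all step functions below are illustrative placeholders (not from the paper's implementation) for the techniques detailed in Sections 4–7.

```python
# Python-style pseudocode of the synthesis loop of Fig. 1; each helper
# stands for one step and is a placeholder, not a runnable implementation.
def synthesize(CR, DK):                                # steps A and B: rules and domain knowledge in LTL
    while True:
        gamma = conjoin(CR, DK)
        is_sat, pseudomodel = check_sat(gamma)         # step C: graph-based prover
        if not is_sat:
            causes = analyze_inconsistency(CR, DK)     # step D: tree-based prover
            CR, DK = revise(CR, DK, causes)            # user updates, new iteration
            continue
        traces = extract_traces(pseudomodel)           # step E: follow choice points
        issues = check_criteria(traces)                # step F: trace correctness criteria
        if issues:
            if restorable(traces, issues):             # steps G and H
                traces = restore(traces, issues)
            else:
                CR, DK = revise(CR, DK, issues)
                continue
        template = synthesize_template(traces)         # step J
        if expert_accepts(template):                   # step K: manual analysis
            return template
        CR, DK = revise(CR, DK, template)              # new iteration
```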

Example. We illustrate our approach with an example from the financial domain. Anti-money-laundering guidelines [22] address financial institutes like banks, and define a set of checks to prevent money transfers with the purpose of financing criminal actions. We focus on the following guidelines for opening new bank accounts:

R1: A risk assessment has to be conducted for each 'open account' request.

    R2: A due diligence evaluation has to be conducted for each 'open account' request.

    R3: Before opening an account, the risk associated with that account must be low. Otherwise, the account is not opened.

    R4: If due diligence evaluation fails, the client has to be added to the bank's black list.

3.2. LTL encoding

    Once the compliance rules have been collected, a behavioral model that represents all traces conforming to these rules is created. In order to arrive at such a model, we need to collect extra domain-specific rules. Many of the domain-specific rules can be generated automatically from a higher-level description. Such a description needs to be defined by a human expert in the first place and comprises the following information, where 2^X is the set of all subsets of X.

Actions and Goals: The set of all actions is denoted by A. The set of goal actions G ⊆ A comprises actions that indicate the completion of a trace. Moreover, we capture contradicting actions that are not allowed to occur together in one trace in a relation CA : A → 2^A.

Table 1
    The formulae making up the domain knowledge (constraint description, followed by its formalization).

    To realize interleaving semantics, the formula interleave ensures that at most one activity can be executed, at any state.
        interleave(a) = a ⇒ ⋀_{b ∈ A\{a}} ¬b,  interleave = ⋀_{a ∈ A} interleave(a)

    The formula progress guarantees that at least one action occurs at each state.
        progress = ⋁_{a ∈ A} a

    The mutual exclusion constraints given in RE are enforced by the formula mutex, i.e., exclusive results cannot be true at the same time.
        mutex(S) = ⋀_{a,b ∈ S, a ≠ b} ¬(a ∧ b),  mutex = ⋀_{S ∈ RE} mutex(S)

    Knowledge on contradicting actions or results is taken into account by the formulae con and conRes.
        con(a) = a ⇒ G ⋀_{b ∈ CA(a)} ¬b,  conRes(r) = r ⇒ G ⋀_{s ∈ CR(r)} ¬s,  contra = ⋀_{a ∈ A} con(a) ∧ ⋀_{r ∈ R} conRes(r)

    To implement the relation between actions and results, formula cau1 states that for every entry (a, S) ∈ AM the action a must cause at least one of the results in S. Formula cau2 states that for every result r, that result can only be changed by one of the actions which can cause it.
        cau1(a, S) = a ⇒ ⋁_{r ∈ S} r,  cau2(r) = r ⇒ ((X ⋁_{(a,S) ∈ AM, {r,¬r} ∩ S ≠ ∅} a) B ¬r),  causality = ⋀_{(a,S) ∈ AM} cau1(a, S) ∧ ⋀_{r ∈ R ∪ R̄} cau2(r)

    The formula once enforces that all actions other than end occur at most once, in order to avoid infinite behavior. The formula final enforces that end persists forever to represent the process end.
        once(a) = a ⇒ XG¬a,  once = ⋀_{a ∈ A\{end}} once(a),  final = end ⇒ G end

    The formula goals is used to require that eventually the outcome of the process is determined, while initial ensures correct initial values for all objects.
        goals = ⋁_{g ∈ G} g,  initial = start ⇒ ⋀_{v ∈ IV} v

Results and Initial Values: The set of all results is denoted by R, and we define R̄ = {¬r | r ∈ R} as the set of negated results. The initial values of data objects are defined by a set IV ⊆ R ∪ R̄. Similar to contradicting actions, we capture contradicting results in a relation CR : R → 2^R.

Relation between Actions and Results: The mapping from actions to sets of results is given as a relation AM ⊆ A × 2^{R ∪ R̄}. To reflect the fact that results are caused by actions, we define results according to their corresponding actions. That is, for some a ∈ A that has results, we define Ra as the set of results of action a, thus Ra = {r ∈ R : ∃S. (a, S) ∈ AM ∧ (r ∈ S or ¬r ∈ S)}. It is easy to see that R is the union of the Ra's for the actions a that have results.

Exclusive Results Set: The set of mutually exclusive sets of results is defined as RE ⊆ 2^R, which satisfies: ∀E ∈ RE, ∀r1, r2 ∈ E s.t. r1 ≠ r2, r1 and r2 cannot hold jointly in one state.

Based on this information and two additional actions start and end that represent the initial and final states of a trace (independent of any goal states), we derive LTL rules to represent the domain knowledge according to Table 1. Common process description languages, e.g., BPMN or EPCs, assume interleaving semantics, which is enforced by the formulae interleave and progress. The information on exclusiveness constraints and on contradicting actions and results yields the formulae mutex and contra. The formula causality guarantees correct implementation of dependencies between actions and results. Finally, the formulae once, final, goals, and initial ensure correct initialization and successful termination of any trace. The conjunction of all these formulae yields the formula domain, which represents the domain knowledge:

domain = start ∧ G initial ∧ F goals ∧ F end ∧ G interleave ∧ G progress ∧ G mutex ∧ G causality ∧ G once ∧ G contra ∧ G final
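To make the intent of some of these formulae concrete, the following sketch checks a finite trace against the spirit of interleave, progress, once, and final. It is our own illustrative re-implementation over explicit traces (states as sets of true action propositions, results omitted), not the paper's LTL encoding.

```python
# Illustrative direct check of a few domain-knowledge constraints of
# Table 1 on a finite trace; states are sets of true action propositions.

def check_domain(trace, actions, end="end"):
    for state in trace:
        active = actions & state
        if len(active) > 1:     # G interleave: at most one action per state
            return False
        if not active:          # G progress: at least one action per state
            return False
    for a in actions - {end}:   # G once: actions other than end occur at most once
        if sum(a in s for s in trace) > 1:
            return False
    seen_end = False            # G final: once end holds, it holds forever
    for s in trace:
        if seen_end and end not in s:
            return False
        seen_end = seen_end or (end in s)
    return True

A = {"start", "ra", "edd", "og", "end"}
good = [{"start"}, {"ra"}, {"edd"}, {"og"}, {"end"}, {"end"}]
bad = [{"start"}, {"ra", "edd"}, {"og"}, {"end"}]   # violates interleave
assert check_domain(good, A)
assert not check_domain(bad, A)
```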

    Example. For our example, an expert first identifies thefollowing actions and results.


Actions = {ra, edd, og, od, bl}

    ra: conduct a risk assessment.



edd: evaluate due-diligence.

    og: grant a request to open an account.

    od: deny a request to open an account.

    bl: blacklist a client.

    Results = {ri, rh, rl, ei, ef, ep}

    ri: risk assessment is initial.

    rh: risk was assessed as high.

    rl: risk was assessed as low.

    ei: due-diligence evaluation is initial.

    ef: due-diligence evaluation failed.

    ep: due-diligence evaluation passed.

Note that the results are all descriptive statements, while the actions refer to activities. Moreover, we introduce positive representations for the values 'high' and 'low' of the risk object, even though both values are opposites. For example, the risk object has three possible values: high, low, or initial. The same holds true for the due-diligence object.

Based on these actions and results, the compliance rules are encoded in LTL. As a process to open a bank account is considered, the process is assumed to start by receiving such a request. Therefore, rules R1 and R2, mentioned above, are interpreted as "A risk assessment has to be conducted" and "A due diligence evaluation has to be conducted", respectively. The third rule is interpreted to mean that the risk associated with opening an account must be low at the time the request is granted, rather than at some point in the past. Similarly, in the case of denying the open request, the risk has to be high. Based on this interpretation, the rules are formalized as follows:

R1: A risk assessment has to be conducted: F ra ("eventually ra must hold").

R2: A due diligence evaluation has to be conducted: F edd ("eventually edd must hold").

R3: The risk associated with opening an account must be low when the request is granted: G(og ⇒ rl) ∧ G(od ⇒ rh) ("always, og only if rl, and always, od only if rh").

R4: If due diligence evaluation fails, the client has to be added to the bank's black list: G(edd ∧ ef ⇒ F bl) ("always, edd and ef imply eventually bl").

As a next step, the domain knowledge is defined in more detail. For instance, the action mapping AM defines ra ↦ {rh, rl} and ra ↦ {¬ri}. The former says that action ra causes the risk object to take a concrete value of 'high' or 'low'. The latter means that ra causes the risk to stop being 'initial' by forcing ri to not hold. Excluding results are defined, e.g., {ri, rl, rh} ∈ RE states that at most one of the propositions ri, rh, rl can hold at each state. The goal of the process is defined as {og, od}, and the set of initial values {ri, ei} signifies that initially, both the risk and due-diligence objects are put to an initial, unknown value. There are also contradicting actions, {og ↦ {od}, od ↦ {og}}, ensuring that we cannot grant and deny a request within the same trace.

Based on Table 1, this specification is converted into LTL. For example, this yields the formula progress = ra ∨ edd ∨ og ∨ od ∨ bl ∨ start ∨ end. The final set of LTL formulae is the union of the domain formula and all four formulae representing the compliance rules.

Given a set of LTL formulae, we apply the technique summarized in Section 2.4 to determine whether the constraints are satisfiable. If the constraints are unsatisfiable, this indicates an inconsistent specification. The details of inconsistency detection are covered in Section 4. On the other hand, if the constraints are satisfiable, we can obtain a set of traces that represents how such rules can be fulfilled; these traces are further examined before a process template is generated.

4. Analysis of domain knowledge and rules inconsistency

In this section, we describe our approach to discover the cause of inconsistency in the domain knowledge and compliance rules expressed in temporal logic. First, related techniques and definitions are discussed in Section 4.1. Then we present the details of our approach to find the inconsistent subset of rules in Section 4.2. If the user wants to know the exact reason for the inconsistency, we refine the resulting subset to a minimal unsatisfiable core, as illustrated in Section 4.3. Finally, we show how these techniques are incorporated into the business process synthesis context in Section 4.4.

    4.1. Related methods and definitions

Various approaches attempt to find explanations for the inconsistency of a set G of given formulae [23–26]. Most of them focus on extracting minimal unsatisfiable cores of G, since these narrow down the reason for the inconsistency. Although a vast amount of work has investigated the minimal unsatisfiability problem in propositional logic, the same problem for LTL has not drawn much attention. Hantry and Hacid proposed a conflict-driven tableau depth-first search for LTL; the complexity of their approach is theoretically EXPTIME [25]. Schuppan demonstrated approaches to compute unsatisfiable cores of LTL formulae from various aspects [24], but without the requirement to find a minimal one.

There are several different notions of unsatisfiable cores in the literature; here we adopt a series of definitions by Lynce et al. [27], which are appropriate in the context of business process modelling.

Definition 1 (Unsatisfiable Core). Given a set G of formulae, which is the LTL encoding of domain knowledge DK and compliance rules CR, a set UC of formulae is an unsatisfiable core for G iff UC ⊆ G and UC is unsatisfiable.

Thus an unsatisfiable core can be any subset of G that is unsatisfiable. That is, the largest unsatisfiable core is G itself, and in the worst case it can also be the minimal unsatisfiable core, which is defined as follows.

Definition 2 (Minimal Unsatisfiable Core). An unsatisfiable core UC for a given set G of formulae is a minimal unsatisfiable core iff ∀φ ∈ UC, UC \ {φ} is satisfiable.


If there are multiple minimal unsatisfiable cores in G, a minimum unsatisfiable core is one that has the least cardinality, as defined below.

Definition 3 (Minimum Unsatisfiable Core). A minimal unsatisfiable core UC for a given set G of formulae is a minimum unsatisfiable core iff every unsatisfiable core UC′ of G satisfies |UC′| ≥ |UC|.

Note that there could even be multiple minimum unsatisfiable cores for a given set of formulae, if these are of the same size. Sometimes it is useful to find all the minimal unsatisfiable cores to allow the user to select the "best" explanation.

Unfortunately, deciding whether a set of LTL formulae is a minimal unsatisfiable core is in PSPACE, and is conjectured to be PSPACE-complete [25]; it is thus expensive to pinpoint the exact reason for an inconsistency. However, in many cases it may be important to find (not necessarily minimal) unsatisfiable cores, since they still provide the user with useful information.

    4.2. Finding unsatisfiable cores

We now outline our procedure for finding unsatisfiable cores of the given domain knowledge and compliance rules. The technique we use is known in the automated reasoning and artificial intelligence communities as "back-jumping" or "use-check" [28], and is integrated into our tree-based tableaux method.

The tree-based tableaux method for checking satisfiability attempts to build a linear model of nodes for the given set of formulae G. Each node in the model contains the set of formulae which are true at that node. It does this by starting with an initial root node that contains G as a set of formulae of LTL. Under the assumption that G is LTL-satisfiable (i.e., has a model), the tableaux method adds further nodes below the current node which must also be LTL-satisfiable. It adds these nodes by applying a rule from a given finite set of rules to one of the formulae in the node to give a new node containing new formulae.

If the current node contains the formula φ ∧ ψ, as well as the set of formulae X, then the tableau rule for (∧) can be applied to obtain a child node that is identical to the current node, except that φ ∧ ψ is replaced by the two formulae φ and ψ. The rules can be written as follows:

(∧)  X; φ∧ψ  ⟶  X; φ; ψ

(∨)  X; φ∨ψ  ⟶  X; φ  |  X; ψ

(⊥)  X; p; ¬p  (no children; the branch is closed)

(X)  Z; Xφ1, Xφ2, ..., Xφn  ⟶  φ1, φ2, ..., φn,   where Z only contains literals and (⊥) is not applicable

The rule (∨) splits the current branch into two branches: one where the child node contains φ instead of φ∨ψ, and the other which contains ψ instead of φ∨ψ. These branches capture the intuition that if φ∨ψ is true, then φ is true or ψ is true.

The (X) rule intuitively creates a child node which is a "next" state to the current state. Thus, it has a side-condition which says that it can be applied only when no other rule applies. The rule (⊥) is a "stopping rule" which says that we can stop expansion of the current branch if we find a node that contains a contradictory pair of atomic formulae. That is, if we find such a pair, then the current node has no children. Such a node is said to be "closed".
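The propositional part of this procedure fits in a few lines; the tuple-based formula encoding below is our own illustration (negation is assumed to be pushed down to atoms), not the authors' implementation:

```python
def tableau_sat(formulas):
    """Apply the (and), (or), and closure rules to decide propositional
    satisfiability; formulas are nested tuples in negation normal form."""
    fs = list(formulas)
    for i, f in enumerate(fs):
        if f[0] == "and":                      # (and): replace f by both conjuncts
            return tableau_sat(fs[:i] + fs[i + 1:] + [f[1], f[2]])
        if f[0] == "or":                       # (or): branch on the disjuncts
            rest = fs[:i] + fs[i + 1:]
            return tableau_sat(rest + [f[1]]) or tableau_sat(rest + [f[2]])
    # only literals remain: the branch is closed iff p and !p co-occur
    atoms = {f[1] for f in fs if f[0] == "atom"}
    negated = {f[1][1] for f in fs if f[0] == "not"}
    return not (atoms & negated)
```

For example, {p ∨ q, ¬p} is satisfiable via the right branch, while {p ∧ ¬p} closes immediately.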

We omit the rules for F, G, U and B since they are more complicated to explain. However, F and U behave in essentially the same way as (∨) for the purposes of backtracking, since φUψ ≡ ψ ∨ (φ ∧ X(φUψ)), and G and B behave similarly to (∧) because φBψ ≡ ¬ψ ∧ (φ ∨ X(φBψ)).

The usual way to explore these branches is to traverse them in a depth-first, left-to-right order. When we apply an (∧) or (∨) rule to a node, we label the node as an ∧-node or ∨-node, respectively. When we find a closed node, we propagate that status upwards to the parent of the closed node as follows. If the parent is an ∧-node or an X-node, then it gets the status closed too. Else if the parent is an ∨-node, it gets the status closed if both of its children are closed.

Suppose we have a node X; φ∨ψ which we split into two child nodes X; φ and X; ψ. Suppose that the X; φ branch closes somewhere below this node because it creates a node Y; p; ¬p. We can immediately form the set X_l = {p, ¬p} as a "bad" set. When we backtrack up from Y; p; ¬p, we can trace the formulae p and ¬p to the formulae which created them and replace each in X_l with its respective traced formula. We repeat this tracing procedure to higher and higher nodes and collect these larger and larger "bad" formulae into a larger and larger set. Eventually, we end up at X; φ with some set X_l of "bad" formulae. That is, we know that X_l ⊆ (X; φ) is unsatisfiable (i.e., contradictory).

If we have X_l ⊆ X then we can conclude that φ is irrelevant to the contradiction. Hence we can conclude that X; ψ will be contradictory for exactly the same reason: namely the contradictory X_l that is sitting inside X. This means that we do not even have to expand the X; ψ branch; we can just declare it closed and backtrack further up the tableau branch.

On the other hand, if X_l ⊄ X then we have to explore the right branch by expanding the right child X; ψ. Suppose it gives us an unsatisfiable set X_r ⊆ (X; ψ). If X_r ⊆ X then we can pass up X_r alone, ignoring X_l. That is, this tells us that if we had swapped the order of exploration and had explored X; ψ first, then we would have back-jumped over the X; φ child, since X contains a contradictory subset X_r.

On the other hand, if X_r ⊄ X then we have found two contradictory sets X_l ⊆ (X; φ) and X_r ⊆ (X; ψ) such that X_l ⊄ X and X_r ⊄ X. We therefore form the set X_lr = (X_l \ {φ}) ∪ (X_r \ {ψ}) ∪ {φ∨ψ} by replacing φ and ψ by φ∨ψ in the union of X_l and X_r. We can guarantee that X_lr is unsatisfiable since an application of the (∨)-rule immediately gives us two children X_l; X_r \ {ψ} and X_r; X_l \ {φ}, each of which contains an unsatisfiable subset X_l and X_r, respectively.

Notice that there may be a set W ⊊ X_lr which is also unsatisfiable and strictly smaller than X_lr; thus we do not guarantee to identify a minimal unsatisfiable core.

Finally, here is what we do at an X-node. As described above, the child will give a "bad" set X_b ⊆ {φ1, ..., φn}


which we know to be unsatisfiable. We therefore return the set X_Xb = {Xφ_i | φ_i ∈ X_b} as an unsatisfiable core. Again, this set may not be minimal. There are also some further complications that arise because of the need to track unfulfilled eventualities. It is impractical to find minimal unsatisfiable cores immediately via back-jumping, so we further trim off the irrelevant formulae from the result, as presented in Section 4.3.

Back-jumping is an optimisation to quickly discard branches that do not lead to satisfiability. It improves the performance of theorem proving in practice, but does not change the worst-case time complexity. Compared to the graph-based method, which aims at finding all models of a formula, the tree-based method is faster when the formula is satisfiable because it can stop as soon as it has found one model.

    4.3. Refining the unsatisfiable core

The tree-based tableaux method with back-jumping is used in the inconsistency analysis because we can also derive unsatisfiable cores by using this method. If the user can already determine the cause of inconsistency from the result given by back-jumping, then no further refinement is needed. However, sometimes the result returned by back-jumping may be a relatively large subset. In this case, to help the user identify the cause of inconsistency, we adopt a general algorithm to find a minimal unsatisfiable core [29] and simultaneously avoid the complicated internal operations of the tree-based tableaux method.

Two well-known algorithms that use the theorem prover as a black box to find minimal unsatisfiable cores are described by Marques-Silva [29]. The deletion-based algorithm calls the theorem prover O(m) times, where m is the size of the set of formulae; the insertion-based algorithm with binary search involves O(k · log m) calls to the theorem prover, where k is a number that is much smaller than m. Despite the possible lower complexity of insertion-based algorithms and their promising results on Constraint Satisfaction Problems, they are not widely used because insertion-based methods are not effective in solving minimal unsatisfiable core problems. Moreover, the inconsistent set returned by back-jumping can often be significantly smaller than G; we therefore use the deletion-based algorithm.

Given an unsatisfiable core D returned by back-jumping, we call the theorem prover iteratively to find a minimal unsatisfiable core as follows. For each φ ∈ D, we test if D \ {φ} is unsatisfiable. If D \ {φ} is still unsatisfiable, then φ does not contribute to the inconsistency of this subset, and thus is deleted. Otherwise, we keep φ in D and test other formulae. This procedure terminates when all the formulae in D have been tested, and the remaining set of formulae is guaranteed to be a minimal unsatisfiable core.
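The deletion-based refinement can be sketched as follows, treating the theorem prover as a black-box satisfiability oracle; `is_sat` stands in for the prover and is an assumption of this sketch:

```python
def minimal_unsat_core(core, is_sat):
    """Shrink an unsatisfiable core to a minimal one: drop each formula
    whose removal keeps the remainder unsatisfiable."""
    core = list(core)
    i = 0
    while i < len(core):
        candidate = core[:i] + core[i + 1:]
        if not is_sat(candidate):   # still unsatisfiable: formula i is irrelevant
            core = candidate
        else:                       # removing it restores satisfiability: keep it
            i += 1
    return core
```

With a toy oracle in which a set is unsatisfiable exactly when it contains both a literal and its negation, `minimal_unsat_core(["p", "q", "!p", "r"], oracle)` keeps only `["p", "!p"]`.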

Notice that minimal unsatisfiable cores are not defined in terms of their sizes but rather by the inability to find an unsatisfiable strict subset. However, sometimes a smaller set might give the user a better intuition for correction. To find a minimum unsatisfiable core, the naive approach is to check each subset of G and find the smallest one that is unsatisfiable, as adopted by van der Aalst et al. to identify the cause of an error [30]. Intuitively, this procedure is done in a bottom-up fashion. That is, first check those subsets that contain one formula, then check those that contain two formulae, and so on until an unsatisfiable subset is found. In the worst case, if there are n formulae in G, one needs to test 2^n subsets; this enormous search space makes the naive approach computationally expensive. There could be more intelligent methods for this problem, but those are out of the scope of this paper.
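The bottom-up search for a minimum core is easy to state precisely; again, `is_sat` is a placeholder for the theorem prover:

```python
from itertools import combinations

def minimum_unsat_core(formulae, is_sat):
    """Check subsets of size 1, 2, ... and return the first (hence
    smallest) unsatisfiable one; None if the whole set is satisfiable."""
    for k in range(1, len(formulae) + 1):
        for subset in combinations(formulae, k):
            if not is_sat(list(subset)):
                return list(subset)
    return None
```

The exponential number of subsets visited in the worst case is exactly the 2^n blow-up described above.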

    4.4. Inconsistency analysis of the business specification

With the previously described techniques and definitions at hand, we now demonstrate our strategy to analyze the inconsistency of the domain knowledge and compliance rules. Our assumption here is that in step C of Fig. 1, the theorem prover has returned unsatisfiability for the specification, and now we want to analyse the reason why it is inconsistent. We proceed as follows.

First of all, we check the satisfiability of the domain knowledge DK. It is very rare that there are inconsistencies in the domain knowledge, since most of those formulae do not constrain the behavior of the business process. Second, we check whether the set of compliance rules CR is satisfiable; any contradictory rules are corrected by iterating this step. Finally, when the sets of domain knowledge DK formulae and compliance rules CR formulae are both independently satisfiable, we check whether the two of them together, as G = DK ∪ CR, are consistent (satisfiable).

After the first two steps, if we find any internal inconsistency in the domain knowledge or the rules, back-jumping gives an unsatisfiable core as the cause of inconsistency. On the other hand, if the checks in the first two steps are passed, we proceed to the final step, after which we report errors caused by the interaction of DK and CR, if any. Each time the cause of inconsistency is reported, our process synthesis procedure loops back to step L in Fig. 1 to refine the domain knowledge and/or rules.
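These three checks amount to a small driver loop around the prover; the following sketch, with `is_sat` again standing in for the theorem prover, is our reading of the procedure rather than the authors' code:

```python
def localize_inconsistency(dk, cr, is_sat):
    """Return which part of the specification breaks satisfiability:
    the domain knowledge, the rules, or only their interaction."""
    if is_sat(dk + cr):
        return "consistent"            # proceed to trace generation
    if not is_sat(dk):
        return "domain knowledge inconsistent"
    if not is_sat(cr):
        return "compliance rules inconsistent"
    return "interaction of DK and CR inconsistent"
```

Only the last outcome requires the joint analysis of options (4)–(6) discussed below; the first two are handled by the basic options.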

For the first two of these steps, we provide three basic options to deal with inconsistencies. If L, the set of formulae being tested, is not satisfiable, the back-jumping procedure will return an unsatisfiable core D. If the user cannot identify the cause of inconsistency from D, then we can either (1) refine D by the deletion-based algorithm and find a minimal unsatisfiable core, or (2) test each subset of D to find a local minimum unsatisfiable core in D. Note that the result returned by the latter option may not be a global minimum unsatisfiable core, since there may be smaller minimal unsatisfiable cores outside D. Therefore, we can (3) test each subset of L to find a global minimum unsatisfiable core.

For the last of these steps, the situation is more complicated. The first two steps guarantee that the set of domain knowledge DK and the set of compliance rules CR are satisfiable independently. So if their conjunction G is unsatisfiable, then both DK and CR must contribute at least one formula to the unsatisfiable cores. Since DK is the encoding of the underlying assumptions of the business process, such as "the goal must be reached" or "occurrence


of actions should lead to corresponding results", once DK is verified to be correct, it should not be changed. That is, the user may be more interested in the compliance rules that break the consistency. Therefore, in addition to the previously introduced three basic options, we provide three more that focus on the compliance rules. Suppose G is unsatisfiable and the back-jumping procedure returns an unsatisfiable core D, but the user needs a deeper analysis. We split D into two sets: D_DK = DK ∩ D and D_CR = CR ∩ D. Then we can either (4) run the deletion-based algorithm on D, but only test those formulae from D_CR, and return the remaining formula(e) in D_CR as a minimal unsatisfiable core related to CR, or (5) test each subset of D_CR together with D_DK to find a local minimum unsatisfiable core in D related to CR. Finally, we can also (6) test each subset of CR together with DK to find a global minimum unsatisfiable core in G related to CR.
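Option (4) is a small variant of the deletion-based algorithm in which the domain-knowledge part of the core is kept fixed as context and only rule formulae are candidates for deletion; a sketch under the same `is_sat`-oracle assumption:

```python
def minimal_rule_core(d_dk, d_cr, is_sat):
    """Option (4): refine only the compliance-rule part d_cr of a core,
    keeping the domain-knowledge part d_dk fixed."""
    kept = list(d_cr)
    i = 0
    while i < len(kept):
        candidate = kept[:i] + kept[i + 1:]
        if not is_sat(d_dk + candidate):   # still unsatisfiable without rule i
            kept = candidate
        else:
            i += 1
    return kept
```

The returned set is minimal with respect to the rules only; as noted below, it need not be minimal within G as a whole.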

The use of the six options depends on the user. We have some general observations, but the pros and cons are to be investigated as future work. Our business process synthesis procedure requires the domain knowledge and compliance rules to be consistent, so only when all the inconsistencies are removed can we proceed to generate the set of traces for this business process. Consequently, if there are multiple minimal unsatisfiable cores in G, we have to correct each one of them until G is satisfiable. In this sense, the order of discovery and their size do not matter. In general, finding a minimal unsatisfiable core is faster than finding a minimum one, so the user might tend to choose (1) and (4) instead of (2) and (5). Similarly, since the search space of (3) and (6) is in general much larger than that of the other options, they may be costly in terms of computational time.

As another aspect of comparison, the benefit of specifically analyzing compliance rules (options (4)–(6)) may not be obvious under some conditions, since it may "push" the cause of unsatisfiability into the domain knowledge. A minimum (locally or globally) unsatisfiable core related to CR may not be minimum in G, because there may be more formulae in DK that contribute to the inconsistency. For the same reason, merely giving a set of compliance rules as the cause of inconsistency may not be so helpful in some cases, as it is possible that the error is related more tightly to a chain of formulae in the domain knowledge, thus leaving only compliance rules that are not obviously related to each other.

Suppose the unsatisfiable core returned by back-jumping is D, the size of which is denoted by |D|. In the worst case, option (1) involves O(|D|) calls to the theorem prover, whereas option (2) needs O(2^|D|) calls. The number of times option (3) calls the theorem prover is exponential in the size of the entire formula set being tested, which is usually larger than D. Similarly, suppose that in the third step, when we test the domain knowledge and rules together, the rules in D form the set D_CR of size |D_CR|. Then option (4) calls the theorem prover O(|D_CR|) times, and for option (5) it is O(2^|D_CR|). Option (6), by contrast, invokes the theorem prover O(2^|CR|) times, where |CR| is the number of compliance rules.

Example. In this example, we add some rules regardless of their correctness and applicability in the banking area. The purpose is to demonstrate our inconsistency analysis.

Trivial errors such as adding G ¬edd to the previous four rules can be picked up by back-jumping, resulting in the subset {(F edd), (G ¬edd)}. Since this is already a minimal (and minimum) unsatisfiable core, there is no need for further refinement.

Suppose the user comes up with the idea that the due diligence evaluation should be done before the risk assessment, since if one fails the due diligence evaluation and is blacklisted, his bank account should not be opened, and thus there is no need to do a risk assessment anymore (we do not consider whether this is the case in real life). Therefore, risk assessment is only required when one passes the due diligence evaluation: G(edd ∧ ep ⇒ F ra) ∧ G(ra ⇒ ep). Moreover, the user gets confused at this point and specifies that the bank should blacklist anyone whose open-account request is denied, G(od ⇒ F bl), and that if his bank account is granted, the bank should evaluate his due diligence again for double checking, G(og ⇒ F edd).

    The above rules give the following set of LTL formulae:

{G(bl ⇒ ef), G(edd ∧ ep ⇒ F ra), G(ra ⇒ ep), G(od ⇒ F bl), G(og ⇒ F edd)}

Note that G(bl ⇒ ef) is added to complete the semantics of R4. The independent tests of the domain knowledge and compliance rules show that they are both satisfiable, but their conjunction yields a set of unsatisfiable formulae. However, this time the back-jumping procedure gives a large unsatisfiable core D which must contain formulae from both the domain knowledge and the compliance rules. To refine D, we use option (1) to find a minimal unsatisfiable core D1 ⊆ D, as shown below.

D1 = D1_CR ∪ D1_DK

where

D1_CR = {G(og ⇒ F edd), G(od ⇒ F bl), G(ra ⇒ ep), G(bl ⇒ ef), G(og ⇒ rl), F ra}

D1_DK = {G(edd ⇒ XG ¬edd), G(¬ef ⇒ X edd B ef), G(ef ⇒ X edd B ¬ef), G(ei ⇒ X edd B ¬ei), G(ri ⇒ X ra B ¬ri), G ¬(ef ∧ ep), G ¬(ei ∧ ep), G ¬(ri ∧ rl), G(ra ⇒ ¬end ∧ ¬start ∧ ¬edd ∧ ¬og ∧ ¬od ∧ ¬bl), F(og ∨ od), G(start ⇒ ri ∧ ¬rh ∧ ¬rl ∧ ei ∧ ¬ep ∧ ¬ef), start}

Interpreting the meaning of those formulae may be time consuming, so suppose the user is still not satisfied with this answer and wants to focus on compliance rules. In this case, option (4) is used on D and returns D4 = {G(og ⇒ rl), G(od ⇒ rh), G(bl ⇒ ef), G(ra ⇒ ep), G(od ⇒ F bl), G(og ⇒ F edd)} as a minimal unsatisfiable core of compliance rules. Note that this core is different from D1_CR; this highlights the fact that there may be multiple


minimal unsatisfiable cores that give different causes of the inconsistency.

It seems that the minimal unsatisfiable core D4 is still lengthy, but removing any formula φ from it gives a satisfiable set D4 \ {φ}. The reason for its large size is that it captures two interacting causes of inconsistency. First, {G(bl ⇒ ef)} unveils that bl should only be executed when one fails the due diligence evaluation, and {G(ra ⇒ ep)} indicates that the client's risk will be assessed only when he passes the due diligence evaluation. However, the formulae {G(od ⇒ rh), G(od ⇒ F bl)} enforce that if the client's risk is assessed to be high, then his opening account request will be denied, and he will be blacklisted afterwards. Thus it is possible to blacklist a client even if he passes the due diligence evaluation, and this violates the rule that restricts blacklisting to only happen when ef is true.¹ Second, {G(og ⇒ rl), G(og ⇒ F edd)} manifests that og occurs only when the risk is assessed to be low, which implies that ra, and hence edd, have already been executed. But og will lead to edd again, which is not allowed by our "once" rule in the domain knowledge. The domain knowledge specifies that the goal of this process is either to grant the opening of an account (og), or to deny it (od). However, D4 closes the option of od since it would cause bl to be executed incorrectly. The only remaining goal og gives rise to the restarting of edd, which forms a cycle that never ends.

When the unsatisfiable core is reported to the user, a new iteration is triggered so that business experts and compliance experts can discuss and redefine the domain knowledge or compliance rules. It is certainly not trivial to automatically correct the set of unsatisfiable formulae, so human involvement is needed to ensure that the intended business process is captured.

    5. Trace generation and analysis

If a set of compliance rules is satisfiable, we obtain a pseudomodel that describes all traces that conform to the domain knowledge and compliance requirements. Section 5.1 shows how we extract traces from such a pseudomodel. Then, we discuss how to check a property against a set of traces using logic in Section 5.2. This technique is applied in Section 5.3 to verify correctness criteria over these traces.

    5.1. Extracting traces

Given a pseudomodel, we extract traces as follows. Any sequence σ = s0, ..., sn of states starting at the root node of the pseudomodel can be extended into a trace. As

¹ Note that the way we capture the semantics of a process is that if a result (e.g., ef) is true at a state, then this result holds at that time. If an action is true at a state, then this action happens at that time. We enforce each action to happen only once, so each action can only be true once in a trace, but the truth value of a result can persist. Therefore, G(action ⇒ result) means the action can only happen when the result holds, while G(action1 ⇒ F action2) means that if action1 happens, then action2 must happen after that. They cannot occur at the same state because we have the "interleave" constraint.

we are modeling finite sequences with an end state, we consider a trace to be complete if end ∈ sn. Because of the once constraint introduced in Section 3, there will be no loops in the pseudomodel between the start and the end. Hence, the finite set of paths in the pseudomodel between the root state and a state labeled with end is the set of correct traces.

Note that it is possible to extract traces that take repetition of activities into account by omitting the once constraint in the domain knowledge. Still, for our purpose, this does not seem to be appropriate. Business experts rarely explicitly forbid the repetition of activity execution, but we feel that this is implicitly intended in many cases. Additionally, modeling all potential loops blurs the structure of a generated process template. As this hinders discussions between business and compliance experts, we explicitly forbid repetition for our synthesis approach.

The time complexity of our trace extraction procedure is linear in the size of the traces. If the process we are trying to model is well-structured, the number of traces will usually be small. For flexible processes, however, there could be a large number of traces. The number of states on each trace, which is also the number of actions executed on a trace (this holds because of the way we encode the domain knowledge, cf. Table 1), is another factor that determines the size of the traces.
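Since the once constraint makes the pseudomodel acyclic between start and end, extraction reduces to enumerating root-to-end paths. A sketch over a hypothetical adjacency-list representation of the pseudomodel (the `labels`/`edges` encoding is our own illustration):

```python
def extract_traces(labels, edges, root):
    """Depth-first enumeration of all paths from the root to a state
    labeled 'end'; terminates because the graph is acyclic."""
    traces = []

    def dfs(state, path):
        path = path + [state]
        if "end" in labels[state]:
            traces.append(path)
        else:
            for succ in edges.get(state, []):
                dfs(succ, path)

    dfs(root, [])
    return traces
```

Each recursive call extends one path by one state, so the total work is proportional to the combined length of the extracted traces, matching the linear bound stated above.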

Example. Returning to the rules we formulated in Section 3.2, some of the traces extracted from the pseudomodel are illustrated in Table 2. Here, the states of a trace are characterized by the conjunction of propositions that hold true in the respective state.

    5.2. Process mining using logic

Traditional process mining takes a so-called event log as input, which contains the recorded execution sequences of events. It is common that event logs contain noise and are incomplete [31], but in our approach, the set of traces that represents the execution sequences of actions is generated by a theorem prover for LTL. Therefore, incorrectness of our traces usually indicates that the rules are not well defined. Moreover, we do not have to consider probabilistic or heuristic approaches for handling errors, and can thus focus on the analysis using logic, which provides a more flexible and extensible way to reason about the information that those traces imply.

To incorporate the process mining procedure in our context, we adopt the idea of theorem proving to analyze the set of traces. To query the LTL theorem prover, we ask if G ⇒ query is valid, where G is the encoded domain knowledge plus the formulated compliance rules. Note that the implementation is a satisfiability tester, so the validity of G ⇒ query is converted to the satisfiability of G ∧ ¬query; if this turns out to be unsatisfiable, then G ⇒ query is valid. Alternatively, we can simply express the query as G ∧ query to test whether the query is satisfiable.

Since the set G of formulae, which can be used to construct the set of traces, does not change in our process mining procedure, we use the set of traces instead for the

Table 2. Excerpt of the extracted traces.

σ1: start∧ei∧ri, edd∧ep∧ri, ra∧ep∧rh, bl∧ep∧rh, od∧ep∧rh, end∧ep∧rh
σ2: start∧ei∧ri, edd∧ep∧ri, ra∧ep∧rh, od∧ep∧rh, end∧ep∧rh
σ14: start∧ei∧ri, edd∧ef∧ri, bl∧ef∧ri, ra∧ef∧rl, og∧ef∧rl, end∧ef∧rl
...
σ32: start∧ei∧ri, ra∧rl∧ei, og∧rl∧ei, edd∧ep∧rl, end∧ep∧rl
...
σ37: start∧ei∧ri, bl∧ei∧ri, edd∧ep∧ri, ra∧ep∧rh, od∧ep∧rh, end∧ep∧rh
...
σ42: start∧ei∧ri, bl∧ei∧ri, ra∧rl∧ei, og∧rl∧ei, edd∧ep∧rl, end∧ep∧rl


queries, so that there is no need to invoke the theorem prover for each query. As a consequence, the query is simplified to asking whether a formula can be satisfied by the set of traces. This greatly reduces the time cost of the procedure compared to repeated calls to the theorem prover.

The testing of a query formula φ against the set of traces is based on the semantics of LTL. To know whether G ∧ p is satisfiable, where p is an atomic proposition, we check whether a given state (in this case, the first state of each trace) contains p; if p is in that state, then G ∧ p is satisfiable. Formulae built from ¬, ∧, ∨, ⇒ are tested according to the semantics of propositional logic. Those involving the temporal operators X, F, G are tested by checking whether the formula is true at the next state, somewhere after (and including) the current state, and at all states from the current state onwards, respectively. This is different from model checking in the sense that we are testing φ against all the possible models that G produces. Therefore, as long as there is a trace that satisfies φ, the trace checker will return true; otherwise it will return false. Since G and the set of traces P represent the same information in our context, we will denote "test φ against P" as the query formula G ∧ φ in the rest of the paper.

    The above method is particularly useful when we need to check a series of properties against the same set of traces and those properties can be translated into very small formulae. For example, the formulae tested in the following sections usually contain only one temporal operator; thus the querying procedure visits each state on each trace only once in the worst case. The "next" operator X is easy to handle, since we only need to test the next state, but too many eventualities such as F and U complicate the computation. This method is not efficient if one wants to test a large formula, in which case we prefer to use theorem proving to solve the problem.
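The trace-based evaluation just described can be sketched as a small recursive checker over finite traces. The sketch below is illustrative only, not the authors' implementation: formulas are nested tuples, a state is a set of atomic propositions, and a trace is a list of states.

```python
# Evaluate a small LTL formula at position i of a finite trace.
# Formulas: 'p' | ('not', f) | ('and', f, g) | ('or', f, g)
#           | ('X', f) | ('F', f) | ('G', f)
def holds(formula, trace, i=0):
    if isinstance(formula, str):                      # atomic proposition
        return formula in trace[i]
    op = formula[0]
    if op == 'not':
        return not holds(formula[1], trace, i)
    if op == 'and':
        return holds(formula[1], trace, i) and holds(formula[2], trace, i)
    if op == 'or':
        return holds(formula[1], trace, i) or holds(formula[2], trace, i)
    if op == 'X':                                     # true at the next state
        return i + 1 < len(trace) and holds(formula[1], trace, i + 1)
    if op == 'F':                                     # somewhere from i onwards
        return any(holds(formula[1], trace, j) for j in range(i, len(trace)))
    if op == 'G':                                     # at all states from i onwards
        return all(holds(formula[1], trace, j) for j in range(i, len(trace)))
    raise ValueError(op)

# "G ∧ φ is satisfiable" becomes: some trace in the set satisfies φ.
def satisfiable(formula, traces):
    return any(holds(formula, t) for t in traces)
```

Satisfiability of G ∧ φ then amounts to finding one trace in the set that satisfies φ at its first state, exactly as described above.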

    5.3. Analysis of extracted traces

    As stated earlier, the goal of synthesizing a process template out of compliance rules is to support experts in getting a better understanding of the compliance aspects and to discover missing or under-specified requirements. However, it is possible to detect such under-specification by analyzing the extracted traces before proceeding to synthesize a process template. Yet, not every semantic error in the specification can be detected, so a human expert has to validate the synthesized process template.

    In this section, we address the issue of under-specified LTL specifications by checking correctness criteria for the extracted traces.

    Let P be a set of traces derived from a pseudo-model, cf. Section 5.1. We leverage the information on whether an action a ∈ A is optional for completing the process.

    Definition 4 (Optional Actions). Given a set of actions A and a set of traces P, the set A_O of optional actions is defined as A_O = {a ∈ A : ∃σ ∈ P : a ∉ σ}. The set A_M of mandatory actions is thus the complement of A_O, i.e., A_M = A∖A_O.

    To detect optional actions, we simply test the satisfiability of G ∧ G¬a for every action a. If this is satisfied by some trace, then a is optional.
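Over an explicit set of traces (states as sets of propositions, as in the checker sketch earlier), this optionality test reduces to asking whether some trace never mentions the action. A possible sketch, not the authors' implementation:

```python
def optional_actions(actions, traces):
    # a is optional iff some trace satisfies G(not a),
    # i.e. a occurs in no state of that trace
    return {a for a in actions
            if any(all(a not in state for state in trace) for trace in traces)}

def mandatory_actions(actions, traces):
    # A_M is the complement of A_O (Definition 4)
    return set(actions) - optional_actions(actions, traces)
```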

    We argue that the correctness of a specification where some activity is optional requires the existence of a specific data condition under which the optional activity is executed. Even if the choice of executing an activity is to be made in a non-deterministic way, an appropriate result-predicate, which is set by an artificial initial activity, must be part of the model. Then, we still obtain a complete specification of the behavior and, therefore, are able to ensure compliance of the created process template with the requirements. For the traces in Table 2, for instance, og and od are optional activities. The condition under which og executes is (rl∧ef) ∨ (rl∧ep) ∨ (rl∧ei), i.e., the risk object assumes the value 'low'. Action og is executed independently from the value of the due diligence evaluation object. For action od the condition is (rh∧ef) ∨ (rh∧ep) ∨ (rh∧ei), i.e., the risk is 'high'. In contrast, action bl is executed under the condition (ei∧ri) ∨ (ei∧rh) ∨ (ei∧rl) ∨ (ef∧ri) ∨ (ef∧rh) ∨ (ef∧rl) ∨ (ep∧rh) ∨ (ep∧rl) ∨ (ep∧ri). Hence, none of the objects influences the decision of executing bl, since bl appears with all combinations of data values. Yet, bl is optional. This indicates an under-specified LTL specification, as conditions for executing optional activities are not stated explicitly.

    Definition 5 (Optional Action Execution Condition). Let A_O be the set of optional actions w.r.t. a set of traces P, and RE the set of mutually exclusive results. For an action a ∈ A_O, the execution condition is defined as cond_a = {{r1, …, rn} : ∃σ ∈ P : ∃s ∈ σ : a ∈ s ∧ r1 ∈ s ∧ r1 ∈ S1 ∧ S1 ∈ RE ∧ … ∧ rn ∈ s ∧ rn ∈ Sn ∧ Sn ∈ RE ∧ n = |RE|}, where the sets S_i, 1 ≤ i ≤ n, are different.


    This definition describes the conditions under which an action executes by investigating, for each observation of the action a, the data effects (results) that are true in the same state as a. If an optional activity a has an execution condition which is a proper subset of the combinations of non-exclusive results, then this indicates a well-specified set of compliance rules. We formalize this trace correctness criterion as follows.

    Definition 6 (Proper Execution of Optional Actions). Let A_O be the set of optional actions with respect to a set of traces P and RE the set of mutually exclusive results. We define the set of all possible result interactions as RI = {{r1, …, rn} : r1 ∈ S1 ∧ S1 ∈ RE ∧ … ∧ rn ∈ Sn ∧ Sn ∈ RE ∧ n = |RE|}. An action a ∈ A_O has a proper execution iff cond_a ⊂ RI.

    The rationale behind this definition is that the execution condition of an optional action oa is proper if and only if there exist some combinations of results that prevent it from occurring. To check this, we test G ∧ F(r1 ∧ ⋯ ∧ rn ∧ oa) for each {r1, …, rn} ∈ RI. If there is a set in RI that together with oa is unsatisfiable, then this criterion is met for oa.

    The proper execution of actions is the first correctness criterion to be investigated on traces before synthesizing a template. Referring to the set of traces in Table 2, we find that this criterion is not met for activity bl.
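Over explicit traces, Definitions 5 and 6 can be sketched as follows. This is an illustrative reading, not the authors' implementation: `exclusive_results` stands for RE as a list of mutually exclusive result sets, and a combination counts toward cond_a only if it picks one result from each exclusive set.

```python
from itertools import product

def execution_condition(a, traces, exclusive_results):
    # cond_a: all combinations of results (one per exclusive set) that hold
    # in some state in which a occurs (Definition 5)
    cond = set()
    for trace in traces:
        for state in trace:
            if a in state:
                combo = frozenset(r for S in exclusive_results
                                    for r in S if r in state)
                if len(combo) == len(exclusive_results):
                    cond.add(combo)
    return cond

def has_proper_execution(a, traces, exclusive_results):
    # Definition 6: cond_a must be a *proper* subset of all result interactions RI
    all_interactions = {frozenset(c) for c in product(*exclusive_results)}
    return execution_condition(a, traces, exclusive_results) < all_interactions
```

For action bl in Table 2, `execution_condition` would return all nine evaluation/risk combinations, so `has_proper_execution` would be false, matching the discussion above.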

    The second correctness criterion also relates to the execution of optional actions. Even if an optional action has a proper execution condition, the set of compliance rules might be specified in a way that allows a counter-intuitive execution of optional tasks. Imagine that rule R3 from Section 3.2 is modified to

    R3: G(od ⇒ (rh ∨ ef)) ∧ G(og ⇒ (rl ∧ ep))

    Then, od has a proper execution condition, which is (ri∧ef) ∨ (rh∧ef) ∨ (rh∧ep) ∨ (rl∧ef) ∨ (rh∧ei). Yet, we can observe traces like the following.

    σ: start∧ei∧ri, edd∧ef∧ri, od∧ef∧ri, ra∧ef∧rl, …, end∧ef∧rl

    In this trace, we can observe that od has been executed before executing ra, and it is still a compliant execution, because the condition ef ∨ rh holds at the point of execution of od. However, from the execution condition of od we observe that it depends on the result of both actions ra and edd. Thus, it seems reasonable to postpone the execution of od until the state where ra and edd have been executed. Generally, we require that an optional action must not be executed until all actions upon which it depends have already been executed. This property, related to a concept called natural order, is motivated by the aim of deriving a well-structured process template, suited for discussions among experts, in an imperative modeling language that emphasizes the control flow logic. Hence, only at a few branching points are data values considered in order to decide on the continuation. The natural order, therefore, can be seen as a means to control the number of decision points, which prevents the creation of overly complex models.

    Definition 7 (Natural Order). Let A_O be the set of optional actions with respect to a set of traces P, let CondEff_a be the set of results contributing to the condition of an optional action a ∈ A_O, and let CnA_a = {ca ∈ A : R_ca ∩ CondEff_a ≠ ∅} be the set of controlling actions for a. We say that a natural order between optional action a and its controlling actions CnA_a is kept iff ∀σ ∈ P : (∀s_i ∈ σ : a ∈ s_i ⇒ ∀ca ∈ CnA_a : ∃s_j ∈ σ : ca ∈ s_j ∧ j < i), where i, j ∈ ℕ.

    Definition 7 ensures that the execution of an optional action a must always be preceded by all actions which contribute to the execution condition of a.
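Over explicit traces, this criterion is a simple positional check. In the sketch below (illustrative, not the authors' implementation), the hypothetical `controllers` set stands in for CnA_a:

```python
def natural_order_kept(a, controllers, traces):
    # Definition 7: whenever optional action a occurs at position i, every
    # controlling action must occur at some earlier position j < i
    for trace in traces:
        for i, state in enumerate(trace):
            if a in state:
                for ca in controllers:
                    if not any(ca in trace[j] for j in range(i)):
                        return False
    return True
```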

    The final correctness criterion for a set of traces is data-completeness. A set of traces P is data-complete if, for every possible combination of results produced by the mandatory activities, there is a trace in which this combination occurs.

    Definition 8 (Traces Data-Completeness). Let P be a set of traces, A_M the set of mandatory actions, and RE_M the set of mutually exclusive results of mandatory actions, defined as RE_M ⊆ RE : ∀E_M ∈ RE_M, ∀r ∈ E_M : r ∈ R_a where a ∈ A_M. We define the set RI_M = {{r1, …, rn} : r1 ∈ S1 ∧ S1 ∈ RE_M ∧ … ∧ rn ∈ Sn ∧ Sn ∈ RE_M ∧ n = |RE_M|}. The set of traces P is data-complete iff ∀C ∈ RI_M : ∃σ ∈ P : ∃s_i ∈ σ : ∀r ∈ C : r ∈ s_i, where i > 0 and the sets S_i are different.

    To verify this, we test each exclusive set of results of each mandatory action. That is, for every {r1, …, rn} ∈ RI_M, we ask G ∧ F(r1 ∧ ⋯ ∧ rn); if the answer is satisfiable for every set in RI_M, then the set of traces is data-complete.

    A process template may be generated even if data-incompleteness is detected for a set of traces. However, the template could suffer from deadlocks, as for some combinations of results the continuation of processing is not defined.
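The data-completeness check enumerates every combination of mandatory results and looks for a state that contains all of them. A possible sketch over explicit traces (illustrative only; `exclusive_results_mandatory` stands for RE_M as a list of result sets):

```python
from itertools import product

def is_data_complete(traces, exclusive_results_mandatory):
    # Definition 8: every combination of mandatory results (one result per
    # exclusive set) must be observed together in some state of some trace
    for combo in product(*exclusive_results_mandatory):
        needed = set(combo)
        if not any(needed <= state for trace in traces for state in trace):
            return False
    return True
```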

    6. Addressing failed trace correctness criteria

    In Section 5.3 we described three correctness criteria for the generated traces that must be fulfilled before a process template can be generated.

    In this section we discuss approaches that help refine the compliance rules, and thus the traces, in case an improper execution condition or data-incompleteness is found in the traces.

    6.1. Handling optional actions

    A set of traces fails the proper execution condition criterion if at least one optional action appears under all possible result interactions. The primary reason for this violation is the under-specification of the conditions under which an optional action shall be executed. Business experts usually tend to express rules in the way R4 is specified, i.e., "If due diligence evaluation fails, then the client has to be added to the bank's black list", which is formalized as G(edd ∧ ef ⇒ F bl). Thus, they focus on the reasons that call for executing some optional action without explicitly specifying the data conditions that must hold at the point in time such actions are executed. As a result, the satisfiability checker generates some traces that are compliant with the explicitly mentioned rules yet are meaningless from a business-expert point of view. In this behavior, optional actions are executed unnecessarily, cf. trace σ37 in Table 2, where the client is immediately blacklisted before any other actions occur.

    In this section, we address this problem of improper execution conditions of optional actions by detecting so-called implied runs. The term implied run is inspired by the term implied scenario [32,33] from the requirements engineering domain. An implied scenario represents unnecessary and usually unwanted extra behavior of a software system that was not intended by the users. The detection of implied scenarios indicates under-specification of requirements and triggers a new iteration of specification refinement.

    Definition 9 (Implied Process Run). Given a set of actions A and a set of process runs P, a process run σ : s_0, …, s_n ∈ P is called implied if there exists an action a ∈ A_O and an integer k such that a ∈ s_k, and there is a process run σ′ : s_0, …, s_{k−1}, s_{k+1}, …, s_n ∈ P.

    Intuitively, we can delete σ, since the almost identical σ′ can substitute for it, and the optional action a has no effect on the compliance of the process.

    Based on the implied run notion, we reduce the set of traces considered for process template generation by removing implied runs. With our approach, we strive for a minimal representation of the compliant behavior. Hence, the potential execution of an optional action should be neglected in order to synthesize compact process templates. To detect such optional execution, we check all process traces that contain an optional action such that the removal of the corresponding state results in a process run that is also in the set of all process runs. If so, this provides us with evidence that the optional action is not needed to complete the process run (trace) in a compliant manner. Hence, such an implied process run is not considered for process template generation. With the removal of implied runs, we can obtain a proper execution condition for optional tasks and thus can continue trying to generate a process template. Moreover, we can suggest to the user further rules that explicitly state conditions for the execution of optional tasks, of the form G(a ⇒ cond_a), where a is an optional task and cond_a is now the proper condition according to Definition 6.
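Detecting implied runs amounts to checking, for each trace, whether deleting one state that contains an optional action yields a trace already in the set. A possible sketch over explicit traces (illustrative, not the authors' implementation):

```python
def remove_implied_runs(traces, optional):
    # Definition 9: drop a trace if removing one state that contains an
    # optional action yields another trace already present in the set
    pool = [tuple(frozenset(s) for s in t) for t in traces]
    kept = []
    for t in pool:
        implied = any(
            a in t[k] and t[:k] + t[k + 1:] in pool
            for k in range(len(t)) for a in optional)
        if not implied:
            kept.append(t)
    return kept
```

Applied to Table 2, σ1 would be dropped because removing its bl-state yields σ2, while σ14 survives because no bl-free counterpart exists.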

    Definition 7 characterizes the natural order correctness criterion between optional actions and their controllers as the absence of traces in which an optional action appears before all of its controllers have been executed. If this criterion is not met, we follow the same approach as with the implied runs: we delete the traces in which the criterion is violated. This comes at the cost of not being completely faithful to the behavior stated in the rules, but we argue that experts care only about executing the optional action after its controlling actions have already completed. Moreover, maintaining the natural order helps produce structured models, which will be better understood by experts and better serve as a design template for operational processes.

    Based on each violating trace that we drop, we can deduce an explicit ordering rule between the optional action and its controller, of the form G(start ⇒ (ca B oa)), where oa is the optional action and ca is one of its controlling actions.

    Example. In our running example, we have identified that task bl has an improper execution condition, cf. Table 2. According to Definition 9, run σ1 is an implied run, because removing the state in which bl appears yields the trace σ2. We can see that bl was unnecessarily executed in run σ1, because the task edd resulted in ep. That is, the due diligence evaluation succeeded and there is no need to blacklist the customer. On the other hand, trace σ14 cannot be dropped, as there is no other trace that looks exactly the same but avoids executing bl: removing the state where bl executes yields a non-compliant execution, since rule R4 would not be fulfilled. After dropping the implied runs, we can suggest the rule G(bl ⇒ ef), which explicitly states that bl executes only when ef holds.

    6.2. Identifying vacuously satisfied rules

    The satisfiability checker is designed in a way that generates all possible traces that satisfy G. However, in some cases the traces satisfy a rule vacuously, especially if these rules lead to contradictions with other rules. Imagine two actions a and b, where a has the exclusive results r1 and r2. Suppose that we have two rules t1: G(a ∧ r1 ⇒ F b) and t2: G(¬b). It is obviously not possible to satisfy t1 by allowing a to produce the effect r1, because this contradicts t2. Rather, t1 is satisfied vacuously by never making the condition of t1 true.

    Data-incompleteness of the traces occurs when some rules are vacuously satisfied. This indicates that some result combinations will lead to a contradiction, and thus the satisfiability checker avoids producing these results.

    Definition 10 (Vacuously Satisfied Rules). Let CR be the set of LTL rules representing compliance requirements, DK the set of LTL rules representing the domain knowledge, where G = DK ∪ CR, and let P be the set of traces generated for G. A rule r ∈ CR ∪ DK is vacuously satisfied by P, written P ⊨_v r, iff ∀σ ∈ P : ∄s ∈ σ : s ⊨ cond(r), where cond(r) is the condition part of the rule r.

    For each rule, trace σ ∈ P, and state s ∈ σ, we need to check whether the conjunction of atomic propositions that hold true in state s logically implies the propositional formula forming the condition of the rule. The result of this scan is the set VS = {r ∈ CR ∪ DK : P ⊨_v r}.

    We construct VS by finding those rules r that are unsatisfiable when we force cond(r) to occur. That is, we check G ∧ F(cond(r)) and run the test described in Section 4. These inconsistencies are then reported to the user for correction, e.g., refining the rules, after which we iterate, cf. Fig. 1.
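Over explicit traces, the scan for vacuously satisfied rules checks whether each rule's condition holds in any state at all. A minimal sketch (illustrative only; rule conditions are modeled as hypothetical predicates over a state):

```python
def vacuously_satisfied(rules, traces):
    # Definition 10: a rule is vacuously satisfied if its condition part
    # holds in no state of any trace; rules maps rule names to predicates
    return {name for name, cond in rules.items()
            if not any(cond(state) for trace in traces for state in trace)}
```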

    7. Process template generation and evaluation

    Given a set of traces that meets the aforementioned correctness criteria, we proceed by generating a process template. To this end, we adapt techniques from the field of process mining and process restructuring to create an initial process template in Section 7.1. Then, Section 7.2 shows how the initial template is augmented with data conditions. Finally, Section 7.3 elaborates on the evaluation of the generated process template.

    7.1. Generating process templates

    The generation of an initial process template builds upon techniques proposed in the field of process mining [34] and process restructuring [35,36]. Most mining algorithms neglect the difference between control flow dependencies and data flow dependencies when generating a process model. Therefore, we cannot apply an existing algorithm directly. Also, process templates are intended to serve as a means for discussion and negotiation between business and compliance experts. Hence, we want to ensure that the generated template is easy to understand. It has been shown that block-structured process models are easier to understand than arbitrarily structured process models [37]. Block-structuredness refers to a topology of a process model that requires every node with multiple outgoing edges to have a corresponding node with multiple incoming edges, such that both nodes form a single-entry single-exit block. To obtain such a process model structure, we combine basic techniques from the field of process mining, i.e., behavioral relations known from the α-mining algorithm [34], with techniques that aim at the construction of a block-structured process model, see [35,36].

    Order of actions. As a first step, we extract the precedence of actions. To this end, we employ the order relations known from the α-mining algorithm [34].

    Definition 11 (Order Relations). Let P be a set of traces and A the set of actions. We define the following order relations for two actions a1, a2 ∈ A:

    a1 > a2: iff there is a trace σ : s_0, …, s_n ∈ P such that a1 ∈ s_i ∧ a2 ∈ s_{i+1} for some 0 ≤ i < n.

    a1 → a2: iff a1 > a2 and not a2 > a1.

    a1 ∥ a2: iff a1 > a2 and a2 > a1.

    a1 # a2: iff neither a1 > a2 nor a2 > a1.

    For two actions ordered by >, we know that the first action appears immediately before the second action. We obtain the order relations by testing satisfiability of the following LTL formula: G ∧ F(a1 ∧ X a2). If this formula is satisfiable, we conclude that a1 > a2. Then, the relations →, ∥, and # are derived from the results for the > relation as stated in Definition 11. For the model synthesis, therefore, we have to test satisfiability, using the method mentioned in Section 5.2, for n² formulae, with n the number of distinct actions.
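When the traces are available explicitly, the same relations can be read off directly instead of issuing n² satisfiability queries. A sketch in the α-algorithm style (illustrative only, not the authors' implementation):

```python
def order_relations(actions, traces):
    # direct succession: a1 > a2 iff a1 occurs in some state that is
    # immediately followed by a state containing a2 (Definition 11)
    succ = {(a1, a2) for t in traces for i in range(len(t) - 1)
            for a1 in actions if a1 in t[i]
            for a2 in actions if a2 in t[i + 1]}
    causal    = {(a, b) for (a, b) in succ if (b, a) not in succ}    # a -> b
    parallel  = {(a, b) for (a, b) in succ if (b, a) in succ}        # a || b
    exclusive = {(a, b) for a in actions for b in actions            # a # b
                 if (a, b) not in succ and (b, a) not in succ}
    return causal, parallel, exclusive
```

Note that, as in Definition 11, the exclusiveness relation here also contains pairs of actions that simply never succeed one another directly, including reflexive pairs.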

    Synthesis of process model. Having defined the order of actions, one may directly proceed by applying the α-mining algorithm [34] to construct a process model. However, this approach has drawbacks. First, the construction of a process model may result in a model that shows behavioral anomalies, such as deadlocks. The α-mining algorithm does not specify the requirements for the derivation of a sound process model without anomalies on the level of behavioral relations. Therefore, one needs to construct a model using the algorithm and check its correctness subsequently. Second, the α-mining algorithm encodes causal dependencies directly, which may lead to interference between the synchronization of parallel paths and the choice among exclusive continuations. This complicates the annotation of data conditions, since there is no unique point at which the decision is taken. For these reasons, we rely on the order relations of the α-mining algorithm, but adopt the synthesis technique presented in [35,36]. It leverages the notion of an order relations graph that is defined for a set of behavioral relations. We adapt this notion to the relations introduced earlier.

    Definition 12 (Order Relations Graph). Let {→, #, ∥} be the order relations for a set of actions A according to Definition 11. The order relations graph G = (V, E) comprises all actions as nodes, V = A, and the relations → and # as edges, E = (→ ∪ #).

    Edges in the order relations graph represent the order or exclusiveness of actions. The two relations are distinguished by unidirectional or bidirectional edges. Using this graph, we employ the synthesis technique introduced in [35,36]. We limit ourselves to an informal description of this synthesis and refer to [35,36] for the formal details.

    The synthesis employs the modular decomposition [38] of the order relations graph. It detects subgraphs which show uniform relations with all other nodes of the graph. The modular decomposition technique parses a graph into a rooted hierarchy of these subgraphs that are maximal and non-overlapping in terms of their contained nodes. The technique distinguishes different types of detected subgraphs. Trivial subgraphs comprise only a single node. Subgraphs that are complete graphs are referred to as XOR-complete; subgraphs that are edgeless are referred to as AND-complete. Further, a subgraph is linear if and only if all of its nodes are sequentially ordered. Finally, subgraphs that do not meet any of these requirements are called primitive. Algorithms to obtain a modular decomposition tree run in time linear in the size of the order relations graph [38]. The size of the decomposition tree is also linear in the size of the graph. The size of the graph, in turn, is determined by the number of actions.

    For the generation of a process template, we require the order relations graph to be free of primitives. If this is not the case, user input is required on the behavioral dependencies between the actions that are part of such a subgraph. If the order relations graph is free of primitives, the synthesis algorithm iteratively constructs a process model from the identified subgraphs, cf. [35,36]. That is, trivial subgraphs contain a single action, which is added as an activity to the process model. Subgraphs that are XOR-complete or AND-complete are represented by a block that is bordered by gateways with either XOR- or AND-logic. Finally, subgraphs that are linear lead to the construction of edges between the respective activities.

    Example. After adapting the set of constraints for our running example as discussed above, we derive the order relations graph, visualized as the first graph in Fig. 2. Since the graph is free of primitive subgraphs, we proceed by applying modular decomposition. Fig. 2 illustrates this decomposition, which identified subgraphs comprising nodes with uniform relations to all other nodes. Based on these subgraphs, the model synthesis returns the process template visualized in Fig. 3. Note that this template does not correctly represent the action 'black list a client' as an optional action. This is due to the absence of any exclusive action that is executed instead of this activity. Such anomalies are corrected when annotating the template with data conditions.

    7.2. Annotating data conditions

    The process template created so far lacks details on the data conditions that lead to the execution of optional activities. We next augment the process template with such conditions. We already introduced the notion of an execution condition for optional actions in Definition 5. It is important to see that these conditions are not local (i.e., they do not consider only the directly preceding actions, as in our previous work [10]). Instead, indirect data dependencies, in terms of results obtained by actions that happened long before, are also taken into account.

    To annotate the process template with these execution conditions, one may guard the execution of each optional activity with the respective condition separately. However, this may lead to overly complex templates. Imagine that there is a sequence of actions where all actions are guarded by the same execution condition. Then, inserting decision points for all of them separately would inflate the template drastically compared to inserting just one decision point that guards the whole sequence of actions. Hence, we proceed as follows.

    1. We consider all XOR-complete subgraphs that have been detected during the model synthesis. For each of these subgraphs, we investigate whether the actions in each of the child subgraphs show the same execution condition. If so, we annotate the edge leading from the respective splitting gateway with XOR-logic to the actions of the child subgraph with this execution condition. We keep the information on whether all actions of a child of an XOR-complete subgraph could be treated in this way.

    2. For all optional actions that have not been treated by the previous step, we insert two additional gateways with XOR-logic directly before and after the action. Note that the process template is acyclic and all actions have at most one predecessor and at most one successor. The edge between the gateway before the action and the action itself is annotated with the execution condition. Further, an edge is defined between the gateway before the action and the one after the action. This edge is annotated with the disjunction of all results under which the execution of the optional action is not observed. Those results are determined by removing the result terms of the execution condition from the set of all possible result combinations of the objects referenced in the execution condition.

    Fig. 2. Order relations graph and the single steps of the modular decomposition.

    Fig. 3. Intermediate process template synthesized for the example; annotations to constrain the execution of optional activities are still missing.

    Again, this step does not impose computational challenges. The data conditions for all actions have been determined before. The annotation of process templates first requires iteration over all subgraphs and comparison of the data conditions for all children. In a second step, we iterate over all optional actions that have not been treated before.

    Example. For our running example, we first observe that the XOR-complete subgraph that spans the actions og and od can be treated as follows. The edge leading to action og is annotated with the result rl, whereas the edge leading to action od is annotated with the result rh. That is, the request to open an account is granted only if the risk is considered to be low. If the risk is high, the request is denied. In addition, we have to deal with the optional action bl, which is not part of a child of an XOR-complete subgraph. As such, we introduce two XOR-gateways and annotate the edges as outlined above. Here, the edge bypassing the action bl is annotated with ei ∨ ep, since the evaluation object must have a value other than ef to avoid executing bl. However, due to the action edd, which must have occurred already, the object may only have value ef or ep. The obtained annotated process template is shown in Fig. 4.

    Fig. 4. Annotated process template synthesized for the example.

    Fig. 5. A compliant process template where bl and og are exclusive and the conditions for their execution have been adjusted.

    7.3. Evaluation of the synthesized process template

    Process templates aim to support experts in getting a better understanding of the compliance aspects and in discovering missing or under-specified requirements. Such under-specification is manifested in the process template in terms of semantic problems. Those problems can only be detected by human experts. In this section, we further elaborate on the running example to illustrate such problems. Using the process template in Fig. 4 as the basis of the discussion between the compliance expert and the business expert, they identify that the template allows for executing both blacklisting the client and granting the request to open the account in the same trace. This is an example of the aforementioned semantic problems caused by under-specified compliance rules. The compliance expert refines the set of constraints by indicating that blacklisting and granting the request to open an account are contradictory, cf. the CA relation in Section 3.2, formalized as G(og ⇒ G(¬bl)) and G(bl ⇒ G(¬og)). Repeating the steps of our approach reveals that the adapted set of compliance rules yields a set of traces that is data-incomplete. This is explained, based on the two added constraints, as follows. By forcing bl and og to be exclusive, we implicitly require bl to be executed only under the condition ef ∧ rh, while og is executed only under the condition ep ∧ rl. Other combinations of results are not considered. There is no trace that addresses the situation wher

