IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. X, NO. X, MONTH 20XX
Decentralized Enactment of BPEL Processes

Michael Pantazoglou, Ioannis Pogkas, and Aphrodite Tsalgatidou

Abstract—This article presents BPELcube, a framework comprising a scalable architecture and a set of distributed algorithms, which support the decentralized enactment of BPEL processes. In many application domains, BPEL processes are long-running, they involve the exchange of voluminous data with external Web services, and are concurrently accessed by large numbers of users. In such contexts, centralized BPEL process execution engines pose considerable limitations in terms of scalability and performance. To overcome such problems, a scalable hypercube peer-to-peer topology is employed by BPELcube in order to organize an arbitrary number of nodes, which can then collaborate in the decentralized execution and monitoring of BPEL processes. Contrary to traditional clustering approaches, each node does not fully take charge of executing the whole process; rather, it contributes to the overall process execution by running a subset of the process activities and maintaining a subset of the process variables. Hence, the hypercube-based infrastructure acts as a single execution engine, where workload is evenly distributed among the participating nodes in a fine-grained manner. An experimental evaluation of BPELcube and a comparison with centralized and clustered BPEL engine architectures demonstrate that the decentralized approach yields improved process execution times and throughput.

Index Terms—Composite Web Services, Processes, Business Process Management, Simulation of Business Processes


1 INTRODUCTION

The Web Services Business Process Execution Language [1], abbreviated to WS-BPEL or BPEL, is widely considered the de facto standard for the implementation of executable service-oriented business processes as compositions of Web services. The language specification defines a set of activities to support synchronous and asynchronous interactions between a process and its clients, as well as between a process and external Web services. Moreover, a number of structured activities are used to implement typical control-flow units such as sequential or parallel execution, if-else statements, loops, etc. Hence, the control flow of a business process is realized by a number of activities, which are appropriately ordered and put together. The BPEL language also provides the necessary elements to support the expression of common programming concepts such as scope encapsulation, fault handling, compensation, and thread synchronization.

Data handling is realized by means of variables, which are conveniently used by a BPEL process to hold the data that are generated and/or consumed upon execution of its constituent activities. Thus, the various activities of a process are able to share data with each other simply by reading from and writing to one or more of the process variables.

To date, most of the available solutions for the execution of BPEL processes have been designed and operate in a centralized manner, whereby an orchestrator component running on a single server is responsible for the execution of all process instances, while all relevant data are maintained at a single location (i.e., the server hosting the BPEL engine). Clearly, such an approach cannot scale in the presence of a potentially large number of simultaneous, long-running process instances that produce and consume voluminous data. While in some cases clustering techniques are supported and can be employed to address the scalability issue, the deployment and maintenance of clusters consisting of two or more centralized BPEL engines sets requirements on the underlying hardware resources, which cannot always be fulfilled by the involved organizations. Furthermore, clustering could prove an inefficient approach under certain conditions, as it cannot overcome the emergence of bottlenecks that are caused by specific activities of a BPEL process. Hence, a more fine-grained workload distribution approach is called for to ensure scalability of the BPEL execution engine at lower cost.

• The authors are with the Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Panepistimiopolis, Ilissia, Athens 15784, Greece. E-mail: {michaelp,jpogkas,atsalga}@di.uoa.gr

In the following section, we introduce a motivating scenario from the environmental domain so as to better capture the problem in real-world terms.

1.1 Motivation

In order to facilitate the development, delivery, and reuse of environmental software models, service orientation has recently been pushed forward by several important initiatives 1, 2, 3 and international standardization bodies 4 in the environmental domain. In the light of those efforts, both geospatial data and geo-processing units are exposed as Web services, which can be used as building blocks for the composition of environmental models in the form of BPEL processes [2], [3], [4], [5]. However,

1. INSPIRE, http://inspire.jrc.ec.europa.eu/
2. GMES, http://www.gmes.info
3. SEIS, http://ec.europa.eu/environment/seis/
4. OGC, http://www.opengeospatial.org/


several challenges arise upon this paradigm shift. Efficient execution and monitoring of long-running environmental processes that consume and produce large volumes of data, in the presence of multiple concurrent process instances, are among the prominent issues that one should effectively deal with.

Fig. 1. A BPEL process used for the calculation of landslide probabilities in a user-specified area.

Let us consider the BPEL process of Figure 1, which was developed by one of our industrial partners in the ENVISION project [6], and is part of a decision support system dedicated to landslide risk assessment. Overall, the process consists of 15 activities, which are represented in the diagram as rounded boxes, and 10 variables, which are shown as cornered boxes. The control flow of the process is indicated by normal arrows connecting the various activities, while the dashed arrows pointing from the variables to the activities and vice versa display its data flow. The various assign activities in the process are used to copy data from one variable to another; the invoke activities allow the process to interact with external Web services; the receive and reply activities are used by the process in order to retrieve the user input and send the final output, respectively; finally, the structured sequence and flow activities dictate the order in which their included activities will be executed.

In essence, the landslide process orchestrates four Web services. First, a digital elevation model (DEM) of the user-specified area is retrieved by invoking an appropriate service through activity Invoke1. In parallel, a sensor service is called through activity Invoke2 in order to retrieve the precipitation data of the user-specified area. These data, along with a set of user input parameters, are further fed through activity Invoke3 to a processing service, which simulates the main mechanisms of the water cycle by a system of reservoirs. The produced digital elevation model and the hydrological model containing the produced map of groundwater level in that area are finally passed as input to another processing service, which is invoked through activity Invoke4 and performs static mechanical analysis, in order to calculate the landslide probabilities in the area of study, in the form of a map of safety factors ranging between zero and one. That map is finally returned to the user as the process response.
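For illustration only, the control and data flow just described can be paraphrased in plain code; the function names, signatures, and payloads below are hypothetical stand-ins for the actual ENVISION services, not part of the published process:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the four orchestrated Web services.
def get_dem(area):                                   # Invoke1
    return {"dem": area}

def get_precipitation(area):                         # Invoke2
    return {"rain": area}

def run_hydrological_model(rain, params):            # Invoke3
    return {"groundwater": (rain, params)}

def run_mechanical_analysis(dem, hydro):             # Invoke4
    return {"safety_factors": (dem, hydro)}

def landslide_process(area, user_params):
    # Flow activity: Invoke1 and Invoke2 run in parallel.
    with ThreadPoolExecutor() as pool:
        dem_future = pool.submit(get_dem, area)
        rain_future = pool.submit(get_precipitation, area)
        dem, rain = dem_future.result(), rain_future.result()
    # Sequence: hydrological simulation, then static mechanical analysis.
    hydro = run_hydrological_model(rain["rain"], user_params)
    return run_mechanical_analysis(dem, hydro)  # map of safety factors in [0, 1]
```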

According to our partner, the execution time of the BPEL process implementing the landslide model may range from a few seconds/minutes to many days, depending on the addressed area and the desired level of detail. Those delays are usually introduced by the external Web services, and force the process instances to remain alive, consuming valuable resources of the BPEL engine. Such long-running instances also involve the generation, consumption, and processing of large data sets ranging between a few kilobytes and hundreds of megabytes, such as the sensor observations, the digital elevation model, etc. Moreover, due to their potentially long duration, clients need to monitor the execution progress by retrieving intermediate results (e.g., the external Web service outputs) from the execution engine. As the number of clients increases, the resources of the centralized infrastructure hosting the executable process are quickly exhausted, and the BPEL engine becomes bloated with requests coming from multiple concurrent users. Hence, the overall throughput of the execution infrastructure deteriorates dramatically, while the process execution times escalate to unacceptable levels.

Looking beyond the above scenario, one may find that similar scalability and performance issues have been observed in many other application domains where BPEL is used, such as supply chain management, online retail, or healthcare [7]. BPEL applications related to synchronous multiple bookings, or asynchronous composite searching and downloading [8], have also shown the limitations of centralized BPEL engines. Going back to our motivating example, all the aforementioned conditions currently prevent the exposure and reuse of the landslide model on the Web. Our partner cannot afford expensive servers to host a cluster of BPEL engines, while the use of Cloud computing solutions [9] is prohibited due to administrative and security policies. Instead, an ideal solution to the identified scalability and performance problems would be to distribute the execution of the process activities and the maintenance of the process variables across the existing infrastructure, which consists of less expensive equipment with limited hardware resources.


1.2 Contribution

In an effort to address situations such as the one described previously, we introduce a framework comprising a scalable Peer-to-Peer (P2P) architecture and a set of distributed algorithms to support the decentralized enactment of BPEL processes. Our framework, dubbed BPELcube hereinafter, particularly focuses on the improvement of the average process execution times, and the enhancement of the overall throughput of the execution infrastructure in the presence of multiple, concurrent, and long-running process instances. BPELcube is mainly characterized by the following features:

• Fully decentralized, P2P-based BPEL engine architecture. BPEL processes are deployed, executed, and monitored by a set of nodes organized in a hypercube P2P topology. Each node does not fully take charge of executing the whole process; rather, it contributes by running a subset of the process activities, and maintaining a subset of the generated process data. Thus, the BPEL execution engine is fully operational without the need of any central controller components.

• Fine-grained distribution of process activities. Decentralization of process execution fits the nature of long-running business-to-business interactions, and significantly improves the performance and throughput of the execution infrastructure. BPEL processes are fully decomposed into their constituent activities. Large-scale parallelization is feasible, as the various activities designated to run in parallel can be synchronized and executed by different nodes.

• Proximity-based distribution of process variables. Since in many application domains processes consume and produce large volumes of data, it is important that those data are distributed in order to avoid resource exhaustion. Our algorithms make sure that the data produced by a BPEL process are distributed across the nodes involved in its execution. Moreover, the data stay close to the process activities that produce them, thereby avoiding the unnecessary transfer of potentially large volumes of data between nodes as much as possible.

• Asynchronous interaction with the client. Even if a BPEL process is synchronous, following the request-response communication pattern, the interaction between the client and the distributed execution engine occurs in an asynchronous, non-blocking manner. This way, the execution engine is able to serve multiple long-running process instances without the need to maintain open connections to the respective clients over long periods of time. Furthermore, while waiting for a long-running process instance to complete, clients are given monitoring mechanisms to retrieve intermediate results, without intervening in or inflicting additional delays on the process execution.

• Efficient use of the available resources and balanced workload distribution. The BPELcube algorithms ensure that all nodes available in the P2P infrastructure contribute to the execution of BPEL processes. The frequency of use of each node is taken into account upon load balancing, while efficient routing techniques are employed in order to achieve an even distribution of the workload at any given time, and thereby avoid the emergence of performance bottlenecks.

In the following section, we present an analysis of the related literature and pinpoint the added value of our work in the context of decentralized BPEL process execution. Then, we proceed in Section 3 with the detailed presentation of our proposed approach. Examples based on the landslide BPEL process that was described in Section 1.1 are given where necessary in order to better explain the various algorithms. An experimental evaluation of our approach, along with the retrieved measurements, is presented and discussed in Section 4, while we conclude this paper and identify paths for future work in Section 5.

2 RELATED WORK

2.1 Distributed Scientific Workflow Systems

In recent years, developments in Scientific Workflow (SWF) systems have made possible the efficient, scalable execution of long-running workflows that comprise large numbers of parallel tasks and operate on large sets of data. Since the challenges met by those efforts bear some resemblance to the motivation behind BPELcube, we would like to emphasize their different scope and technical foundations.

By definition, the majority of SWF solutions are particularly designed to support the modeling and execution of in silico simulations and scientific experiments. Moreover, they are mostly based on proprietary executable languages, which are tailor-made to the needs of such applications. On the other hand, by using the BPEL standard as its underlying basis, BPELcube has more general purposes and can be used to support a wider range of environments and applications. The different scopes of BPELcube and the various SWF systems are also reflected by their pursued programming models. Due to their data-flow orientation, most SWF engines (see for instance Taverna [10]) follow a functional, data-driven model, whereas BPEL engines, including BPELcube, implement an imperative, control-driven model. Hence, the focus of BPELcube is on implementing algorithms that distribute the control flow of processes, in a way that no central coordinator is required. On the other hand, since the control flow is of minor importance to scientific workflows, SWF systems build on efficient parallelization and pipelining techniques, in order to improve the processing of large-scale data flows. For instance, Pegasus [11] attains scalability by mainly addressing the large number of parallel tasks in a single workflow,


and the corresponding voluminous data sets, through task clustering and in-advance resource provisioning. In BPELcube, we are primarily interested in improving the throughput of the BPEL engine, and the average process execution times, in the presence of large numbers of concurrently running process instances that are long-running and consume potentially large data sets.

Most SWF systems (e.g., Kepler [12], Triana [13], or Pegasus [14]) exhibit a clear separation of concerns between the design of a workflow and the execution infrastructure, although much effort has been spent on supporting Grid settings such as Globus. In general, however, Grid infrastructures are heavyweight, complex, and thus difficult to manage and maintain. In contrast, BPELcube is able to seamlessly organize and manage any set of nodes in a hypercube topology, so as to engage them in the execution of long-running and resource-demanding processes.

Still, despite their inherently different scopes, programming models, and scalability concerns, SWF systems have effectively dealt with advanced data management aspects, such as provenance [15] or high-speed data transfer [16]. These features are complementary to our approach and could be accommodated by BPELcube to further enhance its capabilities and performance.

2.2 BPEL Decentralization

The decomposition and decentralized enactment of BPEL processes is a valid problem that has been the subject of many research efforts in recent years. In the following, we review a number of related results that have become available in the literature.

A P2P-based workflow management system called SwinDeW, which enables the decentralized execution of workflows, was proposed by Yan et al. [17]. According to the authors, the generic workflow representation model is compatible with most concrete workflow languages, including BPEL, although this compatibility is not demonstrated. In any case, similar to our presented approach, SwinDeW is based on the decomposition of a given workflow into its constituent tasks, and their subsequent assignment to the available nodes of a P2P network, in order to remove the performance bottleneck of centralized engines. Unlike BPELcube, however, there is no evaluation of SwinDeW to assess its applicability in settings where processes are long-running and produce large volumes of data. Besides, a main difference between the two approaches lies in their corresponding worker recruitment algorithms: SwinDeW makes use of the JXTA [18] peer discovery mechanism to find nodes with specific capabilities, and then quantifies their workload before assigning the given task to the one with the minimum workload. Since the respective discovery protocol cannot guarantee that all relevant peers will be found upon a query, it may become possible that not all available nodes in the P2P network are equally utilized. In BPELcube, the recruitment algorithm relies on the hypercube topology, the inherent ability to perform efficient random walks, and the frequency of use of each node, in order to evenly divide the workload and thereby exploit all available resources.

The NINOS orchestration architecture [7] is based on a distributed content-based publish/subscribe (pub/sub hereinafter) infrastructure, which is leveraged to transform BPEL processes into fine-grained pub/sub agents. The latter then interact using pub/sub messages and collaborate in order to execute a process in a distributed manner. A critical departure of BPELcube from the NINOS approach lies in the respective process deployment mechanisms. In NINOS, a BPEL process is deployed prior to its execution to a number of agents, which remain the same for all subsequent executions of the process. Hence, the infrastructure may underperform in the presence of multiple concurrent instances of the same process. In our case, the BPEL process is decomposed and its constituent activities are assigned to the available nodes in the P2P network at runtime, depending on their current workload, which is inferred from their frequency of use. Furthermore, the evaluation of NINOS has been performed in a more relaxed setting in terms of message sizes and external service delays, which does not give any evidence of the applicability of that approach to long-running processes producing big data.

In an attempt to improve the throughput of the BPEL process execution infrastructure in the presence of multiple concurrent clients, a program partitioning technique has been proposed by Nanda et al. [19], which splits a given BPEL process specification into an equivalent set of processes. The latter are then executed by different server nodes without the need of a centralized coordinator. Similar approaches have also been proposed by Baresi et al. [20], as well as by Yildiz and Godart [21]. Along the same lines, the use of a penalty-based genetic algorithm to partition a given BPEL process and thereby allow decentralized execution was proposed by Ai et al. [22]. However, to realize these partitioning techniques, each participating node must host a full-fledged BPEL engine, which is often heavyweight and therefore not always affordable by many small organizations and businesses. In our approach, there is no such requirement imposed on the nodes forming the underlying P2P infrastructure, and thus each node has a relatively small memory footprint. This way, our distributed BPEL engine can leverage and be deployed on hardware with limited capabilities, as is already demonstrated by the experimental setting described in Section 4.

A solution to the problem of decentralized BPEL workflow enactment that is based on the use of tuplespace technology was reported by Wutke et al. [23]. According to that approach, workflows are defined as BPEL processes, which are split among all participating partners, and are implemented directly by the individual components. The latter are deployed and coordinate themselves using shared tuplespace(s). Like our approach, the


tuplespace technology facilitates the execution of data-intensive workflows, since it allows for data distribution and yields a decrease in the number of messages being passed between the interacting components. Unlike our approach, however, the use of tuplespace technology decouples data maintenance from the execution infrastructure. Although tuplespaces are proven technology and could be used as an alternative mechanism for data sharing by the BPELcube engine, their deployment would require additional hardware resources. Furthermore, the overall approach requires considerable preparatory work, such as component configuration, to be conducted at deployment time, which could eventually become a scalability bottleneck.

In order to effectively separate the concerns of regular BPEL engines and various other complex aspects, including decentralized orchestration, Jimenez-Peris et al. proposed the ZenFlow BPEL engine [24]. ZenFlow employs techniques from reflective and aspect-oriented programming, and makes use of specialized meta-objects to describe and plug the extensions necessary for decentralized execution into the centralized BPEL engine. In this work, however, decentralization is enabled by means of a cluster of centralized BPEL engines, with each one being responsible for the execution of the whole process each time. With BPELcube, we follow a fine-grained decentralization strategy, whereby the BPEL process is decomposed into its constituent activities, the execution of which is distributed among the nodes of a P2P network.

The CEKK machine that was presented by Yu [8] supports P2P execution of BPEL processes based on the continuation-passing style. In this approach, the execution control is formalized as a continuation, and is passed along from one processing unit to another without the interference of a central coordinating component. In this distributed execution environment, special attention is paid to the problem of failure handling and recovery, while a number of extensions to the BPEL language are introduced. Overall, this approach focuses on the formalization of a distributed process execution model and does not address aspects related to the structure of the P2P infrastructure, or the distribution of process activities and variables. Furthermore, it lacks an evaluation that would allow us to assess its applicability to the execution of long-running and data-intensive BPEL processes.

3 THE BPELCUBE FRAMEWORK

We introduce BPELcube, a framework comprising a scalable architecture and a set of distributed algorithms to support the efficient execution of BPEL processes which are long-running, consume and/or produce voluminous data, and are simultaneously accessed by large numbers of clients. Central to our approach is its underlying P2P infrastructure, which implements a binary hypercube topology to organize an arbitrary number of available nodes.

In general, a complete binary hypercube consists of N = 2^d nodes, where d is the number of dimensions, equal to the number of neighbors each node has. Hence, the network diameter, i.e., the smallest number of hops connecting the two most distant nodes in the topology, is ∆ = log₂ N.
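As an illustration, the neighbor relation and diameter of a complete binary hypercube can be sketched in a few lines of Python; the function names here are ours, not part of the BPELcube framework. Node IDs 0..N−1 are read as d-bit labels, and two nodes are neighbors iff their labels differ in exactly one bit.

```python
# Neighbor relation and diameter of a complete binary hypercube, assuming
# node IDs 0..N-1 are read as d-bit labels. Illustrative sketch only.

def hypercube_neighbors(node: int, d: int) -> list[int]:
    """The d neighbors of `node`: flip each of its d bits in turn."""
    return [node ^ (1 << i) for i in range(d)]

def diameter(n_nodes: int) -> int:
    """Smallest number of hops between the two most distant nodes: log2(N)."""
    return n_nodes.bit_length() - 1  # n_nodes assumed to be a power of two

d = 3                                  # 3 dimensions => N = 2^3 = 8 nodes
print(hypercube_neighbors(0b000, d))   # [1, 2, 4]
print(diameter(2 ** d))                # 3
```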

Hypercubes have been widely used in P2P comput-ing [25] [26] [27], and are particularly known for aseries of attributes, which are also fundamental for theapplicability of our approach:

• Network symmetry. All nodes in a hypercube topology are equivalent. No node occupies a more prominent position than the others, while any node is inherently allowed to issue a broadcast. Consequently, in our case, any node can become the entry point for the deployment and execution of a process.

• Efficient broadcasting. It is guaranteed that, upon a broadcast, a total of exactly N − 1 messages is required to reach all N nodes in the hypercube network, with the last ones being reached after at most ∆ steps, regardless of the broadcasting source. Since broadcasts are extensively used in our approach for the deployment and undeployment of BPEL processes, this property proves to be critical in terms of performance.

• Cost-effectiveness. The hypercube topology exhibits an O(log₂ N) complexity with respect to the messages that have to be sent for a node to join or leave the network. Hence, the execution of the respective join and leave protocols does not impair the overall performance of the distributed BPEL engine.

• Churn resilience. It is always possible for the hypercube topology to recover from sudden node losses. This makes possible the deployment of the distributed BPEL engine in less controlled WAN environments, if needed, where churn rates are naturally higher than those met in centrally administered LANs.
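The broadcast property above can be checked with a small simulation, assuming the standard dimension-ordered (binomial-tree) broadcast scheme; this is an illustrative sketch, not the BPELcube implementation:

```python
# Dimension-ordered hypercube broadcast: at step i, every node that already
# holds the message forwards it along dimension i. Illustrative sketch only.

def simulate_broadcast(source: int, d: int) -> tuple[int, int, int]:
    """Broadcast from `source` in a d-dimensional hypercube.

    Returns (messages sent, steps taken, nodes reached).
    """
    informed = {source}
    messages = 0
    for i in range(d):
        # Every informed node sends one message across dimension i.
        messages += len(informed)
        informed |= {node ^ (1 << i) for node in informed}
    return messages, d, len(informed)

# With d = 3 (N = 8): exactly N - 1 = 7 messages, all nodes reached in 3 steps.
print(simulate_broadcast(source=0, d=3))  # (7, 3, 8)
```

Summing the messages per step gives 1 + 2 + ... + 2^(d−1) = 2^d − 1 = N − 1, in ∆ = log₂ N steps, matching the property stated above.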

Each node in the hypercube topology is capable of executing one or more individual BPEL activities as part of a given process instance execution, while also maintaining one or more of the instance's data variables. Thus, one or more nodes are engaged to contribute to the execution of a process instance, and coordinate with each other in a completely decentralized manner that is exclusively driven by the defined structure of the corresponding process.

3.1 BPELcube Node Architecture

The main internal components of a node participating in the BPELcube engine are shown in Figure 2. The P2P Connection Listener acts as the entry point of each node, accepting incoming requests from other nodes in the hypercube. Each request is bound to a new P2P connection, which is then passed to a P2P Connection Handler for further processing. Since the latter runs in a separate


Fig. 2. Internal architecture and main components of the BPELcube node.

thread, it is possible for a node to simultaneously serve more than one incoming request.

Depending on its type, a request is always associated with a particular P2P service, which the P2P Connection Handler selects, instantiates, and executes. P2P services fall into two distinct categories:

• Hypercube services are used by the node to perform the various tasks needed for the maintenance of the hypercube topology. Such tasks implement the join and leave algorithms of the hypercube protocol, as well as additional functionality that is essential for the P2P network, such as broadcasting, random walks, and heartbeats.

• BPEL services encapsulate all functionality necessary for the distributed deployment, execution, and monitoring of BPEL processes by the BPELcube engine. Such services provide for the execution of individual BPEL activities (by employing the appropriate BPEL activity executors), the reading and writing of process variables, the handling of notifications such as the completion of an activity or of a process, etc.

The P2P services may follow a simple one-way communication, or otherwise implement the request-response pattern, in which case the corresponding P2P Connection Handler is used to send back the response message. The execution of most supported P2P services includes the invocation of one or more P2P services on other nodes within the hypercube. This is typical, for instance, in the hypercube service implementing the broadcast scheme, or in the BPEL services that are used to execute a particular activity. To support such situations, each node is equipped with a P2P Service Client, which is responsible for establishing a P2P connection with a specified node, submitting the prepared service request, and receiving the corresponding response, if any. Finally, the majority of the supported P2P services make use of a local database that is embedded within the node.

The database holds all information that is needed by a node to participate in the hypercube topology, and also maintains the various tuples which are generated upon deployment and execution of a BPEL process. The BPEL data may optionally be replicated at the node's immediate neighbors, in order to improve the network's resilience to higher churn rates. In such a case, to ensure data consistency, each change to the BPEL data of a node yields the exchange of at most log2 N replication messages.
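The log2 N bound on replication traffic follows directly from the topology: a node's immediate neighborhood has exactly d = log2 N members. A minimal sketch, assuming a hypothetical send(peer, change) callback that is not defined in the paper:

```python
def replicate_change(node_id: int, d: int, change, send) -> int:
    """Push a BPEL-data change to all d immediate neighbors of a node;
    returns the number of replication messages sent (always d = log2 N)."""
    for i in range(d):
        send(node_id ^ (1 << i), change)  # one message per dimension
    return d

sent = []
replicate_change(0b000, 3, {"var": "DEMInput"}, lambda peer, c: sent.append(peer))
# 3 messages in a 3-cube, one per neighbor: 001, 010, 100
```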

In the following paragraphs, we proceed with the description of our approach, explaining how the BPELcube architecture is used for the distributed deployment, execution, and monitoring of BPEL processes.

3.2 Process Deployment

For a BPEL process to be deployed to the BPELcube engine, a request containing a bundle with all necessary files needs to be submitted to one of the available nodes in the hypercube. In particular, this bundle contains the BPEL process specification, the WSDL interface, the WSDL files of all external services, as well as any potentially required XML schemas and/or XSLT transformation files. Upon receipt of the deploy request, the node first performs a syntactic validation of the included files, and then decomposes the process into its constituent activities and variables. The goal is to generate a convenient process representation that will facilitate its decentralized execution. Similar to the work reported in [19], our BPEL decomposition mechanism relies on the use of Program Dependence Graphs (PDGs) for representing the control, data, and synchronization dependencies of the process activities. From a well-formed PDG, it is then easy to decompose the original BPEL process (i.e. to identify the individual process activities and variables) by simply traversing the graph structure.
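The traversal step can be sketched as follows (illustrative Python; the graph shape and activity names are hypothetical, and a real PDG also carries data and synchronization edges): a depth-first walk over parent-to-children edges yields the individual activities in order of appearance.

```python
def decompose(pdg: dict, root: str) -> list:
    """Depth-first traversal of a PDG given as {activity: [children]};
    returns the activities in order of appearance."""
    ordered = []
    stack = [root]
    while stack:
        activity = stack.pop()
        ordered.append(activity)
        # Push children in reverse so they pop in declared order.
        stack.extend(reversed(pdg.get(activity, [])))
    return ordered

# e.g. a tiny process: Sequence1 -> [Receive, Flow, Reply], Flow -> [Invoke1]
```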


As soon as it has been properly parsed and decomposed, a BPEL process p is stored in a 4-tuple as follows:

⟨idp, V, A, idm⟩ (1)

In this tuple, idp is the unique identifier of the BPEL process, V = {idv1, ..., idvn} is a set containing the unique identifiers of all n variables defined within the process, A is an ordered set containing all its activities in order of appearance, while idm is the unique identifier of the main activity, i.e. the one that is always executed first.

Each element a ∈ A represents an activity of the given process and takes the form of an 8-tuple

⟨ida, s, idp(a), Vr, Vw, Ls, Lt, C⟩ (2)

where:

• ida is the unique identifier of the activity within the context of its owner process

• s holds the BPEL specification of the activity that will be used for its execution

• idp(a) is the unique identifier of the immediate parent activity of a, i.e. the structured activity that triggers the execution of a

• Vr is a set containing the identifiers of all variables that the activity reads from

• Vw is a set containing the identifiers of all variables that the activity writes to

• Ls is a set containing the identifiers of the activities which are synchronized with activity a and must be completed before a

• Lt is a set containing the identifiers of the activities which are synchronized with activity a and will wait for it to complete before executing themselves

• C is a set containing the identifiers of all activities whose parent is activity a
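For illustration, tuples (1) and (2) map naturally onto plain records. The Python below is a sketch of the stored shapes only, not the engine's actual persistence layer; the field names are ours:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ActivityTuple:                  # the 8-tuple (2)
    id_a: str                         # activity identifier, unique per process
    spec: str                         # BPEL specification fragment s
    id_parent: Optional[str]          # id_p(a); None for the main activity
    reads: frozenset = frozenset()    # V_r: variables read
    writes: frozenset = frozenset()   # V_w: variables written
    link_sources: frozenset = frozenset()  # L_s: must complete before a
    link_targets: frozenset = frozenset()  # L_t: wait for a to complete
    children: frozenset = frozenset()      # C: activities whose parent is a

@dataclass
class ProcessTuple:                   # the 4-tuple (1)
    id_p: str                         # process identifier
    variables: frozenset              # V: identifiers of all process variables
    activities: list                  # A: ActivityTuples in order of appearance
    id_main: str                      # id_m: the activity executed first
```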

The deployer node persists the decomposed process locally, and initiates a broadcast within the hypercube in order to efficiently send the process tuple to all available nodes. Thus, at the end of the broadcast, all nodes in the hypercube are in a position to serve requests for the execution of that particular process, without having to carry out the potentially time-consuming tasks of process validation and decomposition.

The undeployment of a BPEL process is performed in a similar way. It may be triggered by any node in the hypercube upon receipt of an undeploy request containing the process identifier. The recipient node checks if it is currently involved in the execution of one or more instances of that process. If it is, it schedules the deletion of the corresponding process tuple after all instances are completed. Otherwise, the process tuple is immediately removed from the node's local database. Then, the node initiates a broadcast within the hypercube, sending the undeploy request to all available nodes. Each recipient performs the same algorithm in order to safely remove the process tuple from its local database without affecting any running instances of that process. Thus, the process tuple is eventually removed from all nodes in the hypercube and the undeployment is completed.

3.3 Process Execution

The execution of an already deployed process is triggered each time an ExecuteProcess request is received by one of the available nodes in the hypercube, containing the process identifier idp and the initial input, if any. To ensure even distribution of workload, the recipient node starts a random walk within the hypercube by means of shortest-path routing, in order to appoint the node that will actually take over the role of the execution manager.

The appointed manager creates a new P2P session for the execution of the process, and stores it in a tuple:

⟨idp2p, idp, Ea, Ev, em, sp⟩ (3)

Each P2P session tuple is distinguished by a unique identifier idp2p, while it is associated with the process to be executed through the process identifier idp. In addition, it includes two tables, Ea = {(ida, e)i}, i = 1...|A|, and Ev = {(idv, e)i}, i = 1...|V|, which map the process activities and variables to the endpoint addresses of the worker nodes responsible for their execution and maintenance, respectively. Finally, the P2P session tuple contains the endpoint address em of the manager node, as well as the current status sp of the process instance (i.e. waiting, running, completed, or failed).

3.3.1 Recruiting Worker Nodes

In order to properly fill in the tables Ea and Ev before starting the execution of the process, a distributed recruitment algorithm is carried out. In doing so, the manager first assigns the initial receive and final reply activities to itself, and thereby takes over the responsibility for maintaining the variable holding the user input. Then, it broadcasts its new timestamp of last use to all its immediate neighbors in the hypercube. Afterwards, it selects its Least Recently Used (LRU) hypercube neighbor in the lowest possible dimension, in order to send to it a recruitment request containing the new P2P session tuple.

The recipient of a recruitment request continues the recruitment procedure by executing Algorithm 1. First, the receiver node updates the first entry in Ea where e is null with its own endpoint address, thereby appointing itself as a worker in the context of the particular P2P session. Then, it creates a new triple representing the specific activity instance:

⟨idp2p, ida, sa⟩ (4)

As can be seen, the activity instance is identified by the combination of the P2P session and activity identifiers, and further has an execution status sa ∈ {waiting, running, completed, failed}.

In case the assigned activity writes to a variable that has not yet been assigned to any other node (i.e. there is no entry for that variable in table Ev), the node also becomes responsible for the maintenance of that variable and updates Ev with a new entry accordingly.


Algorithm 1: Processing of a recruitment request.
input: ⟨idp2p, idp, Ea, Ev, em⟩

1  begin
2    eL ← get local endpoint address
3    Get process tuple based on idp
4    recruitment_complete ← true
5    foreach (ida, e) ∈ Ea do
6      if e = null then
7        e ← eL
8        Update Ea
9        recruitment_complete ← false
10       Get activity tuple based on process tuple and ida
11       Vw ← get from activity tuple
12       foreach idv ∈ Vw do
13         (idv, e) ← get corresponding entry from Ev
14         if e = null then
15           e ← eL
16           Update Ev
17         end
18       end
19       N ← get all hypercube neighbors
20       Broadcast timestamp of last use to all nodes ∈ N
21       break
22     end
23   end
24   if recruitment_complete = false then
25     w ← get LRU neighbor in lowest dimension
26     e ← get endpoint address from w
27     Send Recruitment request to e
28   else
29     Send RecruitmentCompleted notification to em
30   end
31 end

Each variable that a node becomes responsible for in the context of a P2P session is locally stored in a triple

⟨idp2p, idv, v⟩ (5)

where idp2p is the identifier of the P2P session, idv is the identifier of the variable, and v is a holder of the variable value.

As soon as the aforementioned updates are performed, the node broadcasts its new timestamp of last use to all its immediate neighbors in the hypercube. Finally, the new worker selects its LRU hypercube neighbor in the lowest possible dimension and forwards to it the recruitment request, which now contains the updated P2P session tuple.

The same steps are performed each time a node is visited during the recruitment procedure, until all entries in Ea are properly set and all variables are assigned in Ev. Our algorithm ensures that the process activities and variables will be distributed to the least recently used nodes in a fair manner, although some nodes may be assigned more than one activity and/or variable, depending on the process at hand and the size of the hypercube. In any case, the last recruited node sends back to the manager a notification containing the updated P2P session tuple, thereby signaling the completion of the recruitment procedure. Finally, the manager sends a message to the client containing the P2P session tuple, so that the client can use the included information later for monitoring purposes.

It should be noted that there is no one-to-one correspondence between the activities and the hypercube nodes. In other words, at any given time, it is possible for a node to be responsible for more than one activity belonging to more than one process instance. Depending on the available resources, the dimension of the hypercube may be scaled up in order to improve the performance of BPELcube as the number of concurrent process instances increases.

Example: Let us describe how the recruitment algorithm works in the case of the landslide BPEL process of Figure 1. For the sake of simplicity, in our example we assume that the BPELcube engine has just started and comprises a hypercube of eight nodes (3-cube). Figure 3, read from left to right and top to bottom, demonstrates the sequence in which the hypercube nodes are visited upon receipt of an execution request by node 000, while Table 1 shows the recruitment results, i.e. the distribution of the BPEL activities and variables to the hypercube nodes. As can be seen, the recruitment algorithm managed to engage all available nodes while taking their frequency of use into account upon distribution of the workload.

3.3.2 Running the Process Instance

As soon as the recruitment of worker nodes is finished, the manager node retrieves from table Ea the endpoint address of the worker responsible for the main activity of the process and sends to it an ExecuteActivity request containing the activity identifier and the P2P session tuple that corresponds to the process instance. Upon receipt of the request, the worker node executes the specified activity according to Algorithm 2, as described below.

First, the worker node uses the process identifier idp that is included in the P2P session tuple in order to retrieve the corresponding process tuple from its local database. Then, it uses the activity identifier ida to extract the corresponding activity tuple from set A.

Before proceeding with the activity execution, the worker waits until an ActivityCompleted notification is sent by all activities in Ls, in the context of the same P2P session. Afterwards, the worker uses the contents of Vr in order to resolve the variables that the activity needs to read from, and retrieves the endpoint addresses of the nodes responsible for their maintenance in the P2P session. Then, for each variable, it sends a ReadVariable request to the respective node and caches the response, or simply reads the variable locally, in case the worker is the actual holder.

As soon as all needed variables are read, the worker runs the activity and stores the values of all variables found in Vw. This is either done locally, in case the


Fig. 3. Recruitment of workers for the execution of the landslide environmental model.

TABLE 1
Recruitment procedure results.

Hypercube Node | Assigned Activities      | Assigned Variables
000            | Receive, Reply, Invoke2  | LandslideInput, Precipitation
100            | Sequence1, Assign3       | HydroModelInput
110            | Flow, Invoke3            | HydrologicalModel
010            | Sequence2, Assign4       | SafetyFactorsMapInput
011            | Assign1, Invoke4         | DEMInput, SafetyFactorsMapOutput
111            | Invoke1, Assign5         | DEM, LandslideOutput
101            | Sequence3                | –
001            | Assign2                  | PrecipitationInput

variable has been assigned to that worker, or remotely, by sending a WriteVariable request to the respective node.

Finally, the worker sends an ActivityCompleted notification to the nodes in charge of all activities in Lt, containing the activity identifier and the updated P2P session tuple. In case the completed activity has a parent, the worker also sends the same ActivityCompleted notification to the node in charge; otherwise, it sends a ProcessCompleted notification to the manager, signaling the completion of the process instance.

In the context of a process instance, the same algorithm is repeated by each node that receives an ExecuteActivity request. In this way, the process activities are executed one after the other, according to the designated control flow. The proper execution order of all nested activities is ensured by the BPEL activity executors that implement their parent structured activities.

Example: Let us describe how the structured activity Sequence2 of the landslide BPEL process will be executed based on the results of the recruitment procedure shown in Table 1. In principle, a sequence activity within a BPEL process is responsible for sequentially executing all its child activities. In our example, node 010, which is responsible for the execution of Sequence2, sends an ExecuteActivity request to node 011 and waits until it receives back an ActivityCompleted notification. Node 011 is responsible for the execution of activity Assign1, which is the first child activity of Sequence2.

Since the activity Assign1 is synchronized with activity Assign2 through a BPEL link (see arrow in Figure 1), node 011 will wait until an ActivityCompleted notification is sent from node 001. Then, it sends a ReadVariable request to node 000 in order to retrieve the value of the LandslideInput variable. After that, the node proceeds with the execution of the copy statements within the assign activity, and locally writes the produced outcome to the DEMInput variable. At this point, the Assign1 activity has completed, and node 011 sends an ActivityCompleted notification to node 010, which is in charge of the parent activity, Sequence2.


Algorithm 2: BPEL activity execution.
input: ida, ⟨idp2p, idp, Ea, Ev, em⟩

1  begin
2    eL ← get local endpoint address
3    Get process tuple based on idp
4    Get activity tuple based on process tuple and ida
5    Wait for all activities in Ls to complete
6    foreach idv ∈ Vr do
7      e ← get endpoint address from Ev for idv
8      if e = eL then
9        Read value of variable idv locally
10     else
11       Read value of variable idv from e
12     end
13   end
14   Execute activity a
15   foreach idv ∈ Vw do
16     e ← get endpoint address from Ev for idv
17     if e = eL then
18       Persist new value of variable idv locally
19     else
20       Persist new value of idv in e
21     end
22   end
23   Notify all activities in Lt
24   if p(a) ≠ null then
25     e ← get endpoint address from Ea for p(a)
26     Send ActivityCompleted notification to e
27   else
28     Send ProcessCompleted notification to em
29   end
30 end
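The data-handling core of Algorithm 2 (the read and write loops) can be sketched as follows; remote ReadVariable/WriteVariable requests are modeled here as plain dictionary accesses on per-node stores, an illustrative simplification rather than the actual P2P protocol:

```python
def execute_activity(local_node, reads, writes, Ev, stores, run):
    """Resolve each input variable via E_v (a local read if this node is
    the holder, a 'remote' one otherwise), run the activity body, then
    persist each output variable at its holder node."""
    inputs = {}
    for var in reads:
        holder = Ev[var]                     # node responsible for var
        inputs[var] = stores[holder][var]    # local when holder == local_node
    outputs = run(inputs)                    # the activity's actual work
    for var in writes:
        stores[Ev[var]][var] = outputs[var]  # WriteVariable to the holder
    return outputs

# Example mirroring Assign1: node 011 reads LandslideInput held by node 000
# and writes DEMInput to its own store.
```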

Node 010 resumes the execution of Sequence2, which dictates that the Invoke1 activity be executed next. To do so, an ExecuteActivity request is sent to node 111, which is responsible for that activity. Before performing the actual invocation of the Digital Elevation Model WCS, node 111 retrieves the required input by reading the DEMInput variable from node 011. After the invocation, the service output is locally written to variable DEM, and an ActivityCompleted notification is sent back to node 010, allowing it to complete the execution of activity Sequence2 and send the appropriate notification to node 110, which is in charge of the execution of the parent Flow activity.

3.4 Process Monitoring

The ability to monitor various aspects of a long-running process instance while it executes is a critical user requirement in most application domains. The BPELcube engine addresses this need and facilitates the respective monitoring tasks. At any time during the execution of a process instance, the client can easily obtain monitoring details in a non-intrusive manner. Typical monitoring information includes the process instance status, the status of individual activities, as well as the current values of the process variables. In the following, we describe how the provided monitoring mechanism enables the retrieval of such details, which can be presented to the client in a user-friendly, visualized manner through appropriate front-end software.

3.4.1 Retrieving Process Instance Status

The status of a process instance can be obtained directly from its appointed manager node, the endpoint address of which is known to the client through the P2P session tuple. In doing so, the client sends a GetProcessInstanceStatus request containing the identifier idp2p of the corresponding P2P session. In response, the manager retrieves the P2P session tuple and sends back to the client the process execution status value sp.

3.4.2 Retrieving Individual Activity Status

While waiting for a long-running process instance to complete, the client often needs to keep track of the execution progress at the activity level. Our monitoring mechanism enables the client to retrieve this information by directly contacting the nodes responsible for the execution of each individual activity. More specifically, for a given process instance and activity, the client resolves from the P2P session tuple the endpoint address of the corresponding node through table Ea, and submits to it a GetActivityStatus request containing the P2P session identifier idp2p and the activity identifier ida. The node then simply returns to the client the current status sa of the specified activity instance, taken from the corresponding activity instance tuple.

3.4.3 Retrieving Intermediate Results

BPEL processes usually depend on data from many external sources, which are commonly retrieved through invocations of the appropriate Web services and are stored in the various variables. In many applications, those intermediate data may be equally important to the user as the final result. Our monitoring mechanism facilitates the extraction of intermediate results from the 'black box' surrounding the execution of a BPEL process. Thanks to the information stored in table Ev of each P2P session (i.e. process instance), the client is able to directly interact with the node in charge of each variable, and retrieve its current value by sending a ReadVariable request containing the P2P session and variable identifiers.
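The three monitoring lookups share one pattern: the client resolves the responsible node from the P2P session tuple and queries it directly. An illustrative sketch (the request names follow the text, but the session layout is our simplification, with "remote" queries again modeled as dictionary lookups):

```python
def get_process_instance_status(session):
    """GetProcessInstanceStatus: answered by the manager node e_m."""
    return session["sp"]

def get_activity_status(session, activity_instances, id_a):
    """GetActivityStatus: answered by the node E_a[id_a]; the activity
    instance is keyed by (id_p2p, id_a) as in tuple (4)."""
    return activity_instances[(session["id_p2p"], id_a)]

def read_variable(session, stores, id_v):
    """ReadVariable: answered by the holder node E_v[id_v]."""
    return stores[session["Ev"][id_v]][id_v]
```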

3.5 Resilience to Network Failures

Thanks to the hypercube topology, the BPELcube engine exhibits resilience to changes in the underlying P2P network. In general, when a node in the hypercube goes offline, we are interested in preserving all of its hypercube-specific and BPEL-specific data. The latter translate into the variable values, activity assignments, and P2P session tuples that are maintained by the node. Here we define two recovery strategies, corresponding to graceful and unexpected node departures.


3.5.1 Recovery From Graceful Node Departures

In the case of a graceful node departure from the hypercube, the respective leave protocol is executed and one of its neighbors takes over its position. By properly extending the hypercube leave protocol, we ensure that, in addition to the hypercube-specific information, the replacing node will also obtain all BPEL-specific data from the departing node. After that, the replacing node broadcasts a notification, so that all the other nodes in the hypercube update the activity and variable assignments in all affected P2P session tuples. This is simply done by replacing the departed node's endpoint address with that of the replacing node.

3.5.2 Recovery From Unexpected Node Departures

If the hypercube topology has been deployed to an open environment with a high churn rate, it is always possible that nodes go unexpectedly offline, without performing the hypercube leave protocol. In such cases, the hypercube protocols ensure that the vacant positions will eventually be taken over by other nodes, leading to a stable and consistent topology. However, the only way to also ensure that the BPEL-specific data of the departed nodes will be preserved is to enable replication. This feature is optional in BPELcube, mainly because it comes at the price of increased network traffic and resource consumption. Nevertheless, thanks to the hypercube topology, we are able to keep the side effects limited, as the BPEL-specific contents of each node are replicated only to its neighbors, yielding the exchange of at most log2 N additional messages each time a change is made in the node's BPEL data.

4 EVALUATION

Using a prototype implementation in Java, we performed an evaluation of BPELcube in a number of experiments. The goal was to quantitatively assess the benefits of the proposed architecture and algorithms in terms of average process execution time and throughput. In order to cover various aspects affecting performance, the experimental measurements were taken in three different settings. More specifically, we investigated the effect of (i) the request rate, (ii) the message sizes, and (iii) the external service delays on the performance of the distributed execution engine. Additionally, in all cases, by comparing BPELcube to centralized and clustered architectures, we identified their limitations and gained insight into the conditions where each approach would be best suited.

4.1 Experimental Setup

For our experiments we utilized a 100 Mbps LAN of 16 workstations, each running Linux Ubuntu 8.04 (kernel 2.6.24-31) on an Intel(R) Core(TM) 2 Duo CPU [email protected] processor with 1 GB RAM. In this setting, we deployed a single node to serve as the centralized BPEL execution engine, a cluster of two nodes with a simple load balancer acting as a clustered BPEL engine, and a three-dimensional hypercube (3-cube) of 8 nodes that formed our distributed BPEL engine. Another node was reserved for the deployment of the client, which we implemented and used to feed the aforementioned nodes with ExecuteProcess requests at various rates.

All measurements were taken by executing the BPEL process of Figure 1. In order to be able to modify and test with different message sizes and delays, we implemented simulators of the four actual Web services invoked by the process, and deployed them on the same LAN. Finally, the duration of the client execution for the retrieval of each measurement in all experiments was set to 10 minutes.

4.2 Experimental Results

4.2.1 Varying Request Rate

In the first part of our experiments, we examined the behavior of the three different architectures in the presence of up to 300 concurrently running instances of the landslide process. The message size was set at 1024kb, while the external service delay was set at 15 seconds.

As the results in Figure 4 suggest, all approaches exhibited similar throughput and execution times for rates of up to 70 requests per minute. However, the performance of the centralized deployment was seriously degraded at larger request rates, making this approach unable to serve more than 150 requests per minute. It is also worth mentioning that, at 150 requests per minute, its average process execution time was about five times longer than that of BPELcube.

Moreover, even though the clustered architecture remained responsive at higher request rates, its throughput deteriorated considerably, while the average process execution times increased significantly. In conclusion, the measurements made clear that this approach cannot support more than 200 concurrent requests with acceptable response times. On the other hand, our distributed architecture managed a relatively steady throughput regardless of the request rate, and only a slight increase in the average process execution times.

4.2.2 Varying Message Size

In the second part of the experiments, we assessed the capability of BPELcube to cope with larger data produced by the external Web services, and also identified the limitations of the centralized and clustered deployments. As in the previous set of runs, the external service delay was again specified at 15 seconds, while we configured the client to feed the engine with 50 ExecuteProcess requests per minute.

The measured throughput and average process execution times are shown in Figure 5. The results showed that, for message sizes up to 1280kb, all approaches perform in a similar way. The performance of the centralized


[Two charts: throughput (/min) and average process execution time (s) versus request rate (/min), for the Centralized, Cluster with 2 engines, and 3-cube deployments.]

Fig. 4. Performance measurements with varying request rates.

[Two charts: throughput (/min) and average process execution time (s) versus message size (kb), for the Centralized, Cluster with 2 engines, and 3-cube deployments.]

Fig. 5. Performance measurements with varying message sizes.

[Two charts: throughput (/min) and average process execution time (s) versus external service delay (s), for the Centralized, Cluster with 2 engines, and 3-cube deployments.]

Fig. 6. Performance measurements with varying external service delays.

engine was seriously degraded for larger data, as it practically stalled for message sizes larger than 2304kb. On the other hand, the clustered engine exhibited the best performance of all approaches for message sizes up to 2304kb, while our distributed engine was able to maintain a relatively steady average process execution time and a slowly decreasing throughput in all cases.

The partial superiority of the clustered BPEL engine over BPELcube was due to the significantly smaller number of required message exchanges, as the produced data were maintained in two nodes only. However, the experiment showed that the cluster's resource capacity was not adequate to accommodate processes involving variables larger than 2560kb. On the contrary, such issues were not observed in our approach, thanks to the distribution of the process variables across all nodes available in the hypercube.

All in all, the results of this part of the experiments could provide a guide for the selection of the most appropriate architecture in terms of scalability. In other


words, the deployment of a cluster could be avoided for processes with small-sized variables, whereas the use of a distributed architecture such as the one championed by BPELcube would be required to accommodate concurrently running processes with data variables of higher volume.

4.2.3 Varying External Service Delay

The third part of the experiments concerned the implications of external service delays for the performance of the BPEL execution engine, in the presence of multiple concurrently running process instances. To investigate such situations, we programmatically enforced an artificial delay, ranging between zero and 60 seconds, in the execution of the four Web services invoked by the landslide process. Furthermore, we set a constant message size of 1024kb and ensured a steady rate of 50 ExecuteProcess requests per minute coming from the client.

The results, shown in Figure 6, once again indicate the limitations of centralized process execution, as the centralized engine again stalled for service delays lasting more than 45 seconds. This can be explained by the fact that too many process instances remained active, as each one of them required more time to complete due to the extended waiting for responses from the invoked services. Moreover, although the clustered engine was able to cope with the multiple concurrent and long-running process instances, its throughput was up to 50% lower than that of BPELcube. Regarding average process execution times, both the centralized and clustered approaches were outperformed by our distributed BPEL engine for delays of more than 50 seconds.
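The accumulation of active instances can be illustrated with a back-of-the-envelope estimate (a sketch of ours, not taken from the paper): by Little's law, the number of simultaneously active process instances is roughly the arrival rate multiplied by the mean instance duration. The base execution time used below is a hypothetical figure for illustration only.

```python
def expected_active_instances(arrival_rate_per_min: float,
                              service_delay_s: float,
                              base_execution_s: float = 10.0) -> float:
    """Estimate concurrently active process instances via Little's law,
    L = lambda * W, where lambda is the arrival rate and W the mean time
    an instance stays in the system.

    base_execution_s is a hypothetical processing time excluding the
    artificially injected external-service delay.
    """
    duration_s = base_execution_s + service_delay_s
    return arrival_rate_per_min / 60.0 * duration_s

# At 50 requests/min, raising the delay from 0 to 60 s multiplies the
# number of instances the engine must keep alive at once.
for delay_s in (0, 15, 30, 45, 60):
    print(delay_s, round(expected_active_instances(50, delay_s), 1))
```

Under these assumptions, a 60-second delay forces the engine to hold roughly seven times as many live instances as the zero-delay case, which is consistent with the observed stalling of the centralized engine.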

4.2.4 Workload Distribution

We conclude the presentation and discussion of our experimental results with an evaluation of BPELcube in terms of workload distribution. In the context of BPEL process execution, workload amounts to the total number of process activities and variables that are assigned to each hypercube node over time. In order to assess the efficiency of the distributed hypercube-based architecture and algorithms, we ran another 10-minute round of feeding the deployed three-dimensional hypercube with landslide process execution requests. The request rate was set at 50/min, the message size was specified at 1024 KB, and the external service delay was set at 15 seconds.
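The workload metric defined above can be tallied from an assignment log with a few lines of Python (the log format here is hypothetical, invented purely for illustration; the paper does not specify one):

```python
from collections import Counter

def workload_per_node(assignments):
    """Count activities and variables assigned to each node.

    `assignments` is an iterable of (node_id, kind) pairs, where kind is
    "activity" or "variable" -- a hypothetical record format standing in
    for whatever bookkeeping a real engine would maintain.
    """
    workload = Counter()
    for node_id, kind in assignments:
        workload[(node_id, kind)] += 1
    return workload

log = [("node1", "activity"), ("node1", "variable"),
       ("node2", "activity"), ("node1", "activity")]
counts = workload_per_node(log)
print(counts[("node1", "activity")])  # node1 was assigned two activities
```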

As can be seen in Figure 7, our recruitment algorithm achieved a remarkably even distribution of the activities, exploiting all the available hypercube nodes while taking into account their frequency of use. On the other hand, the measurements revealed some deviations regarding the distribution of process variables. Nevertheless, such behavior was anticipated, since apart from the receive, assign, and invoke activities, the rest of the activities found in the landslide BPEL process do not produce any data (i.e. they do not write

Fig. 7. Distribution of activities and variables across the nodes of the three-dimensional hypercube BPEL engine.

to any of the process variables). Hence, the same nodes responsible for the execution of data-producing activities were inevitably also used for the maintenance of the corresponding variables that held the produced data.
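The frequency-of-use criterion behind the even distribution can be pictured with a minimal selection routine (a sketch of one plausible least-frequently-used policy, not the authors' actual recruitment algorithm):

```python
def recruit_worker(usage_counts):
    """Pick the least-frequently-used node and record the assignment.

    `usage_counts` maps node ids to the number of activities/variables
    each node has received so far -- an illustrative stand-in for the
    state a real recruitment algorithm would maintain.
    """
    node = min(usage_counts, key=usage_counts.get)
    usage_counts[node] += 1
    return node

# Eight nodes of a three-dimensional hypercube, initially unused.
usage = {f"node{i}": 0 for i in range(1, 9)}
picked = [recruit_worker(usage) for _ in range(16)]
# After 16 assignments, every node has been used exactly twice.
print(all(count == 2 for count in usage.values()))
```

Such a policy balances activity assignments perfectly under uniform load; as the measurements show, variable placement then follows the data-producing activities rather than this counter.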

5 DISCUSSION

We presented a distributed architecture based on the hypercube P2P topology, along with a set of algorithms that enable the decentralized execution of BPEL processes. Our approach targets the improvement of average process execution times and the enhancement of the overall throughput of the execution infrastructure, in the presence of multiple long-running process instances that involve the exchange of large data. The presented algorithms support the decomposition of a given BPEL process and the subsequent assignment of its constituent activities and data variables to the available hypercube nodes. Execution is then performed in a completely decentralized manner, without a central coordinator. Our distributed approach also provides a lightweight monitoring mechanism that does not intrude into the process execution, but rather allows the retrieval of monitoring information from the hypercube in a seamless manner.

We evaluated our approach in a series of experiments, and compared it with centralized and clustered architectures in terms of performance. The retrieved measurements indicate that our hypercube-based architecture is more suitable for the execution of long-running and data-intensive processes, while it is able to accommodate more concurrent clients than the other two architectures. Moreover, thanks to the even distribution of workload, our approach copes with large data in a more efficient manner.

In future work, we aim to expand our worker recruitment algorithm so as to consider additional factors, such as network proximity or other QoS attributes, which are complementary to the frequency of use of the employed nodes. This expansion will facilitate the deployment of the hypercube-based engine in less controlled settings such as WANs. We are also interested in extending the proposed architecture to support Cloud-based deployment of the BPELcube engine. We anticipate that by moving BPELcube to the Cloud, we will be able to exploit elasticity capabilities for dynamically increasing or decreasing the hypercube dimension. This way, the BPELcube engine will be able to respond effectively and in a timely manner to workload changes. Finally, in terms of implementation, we will investigate the use of parallel query processing techniques to further enhance the performance of BPELcube nodes in the presence of multiple concurrently running process instances.

REFERENCES

[1] OASIS, "Web Services Business Process Execution Language Version 2.0," Apr. 2007. [Online]. Available: http://docs.oasis-open.org/wsbpel/2.0/OS/wsbpel-v2.0-OS.html

[2] B. Schaffer and T. Foerster, "A client for distributed geo-processing and workflow design," Journal of Location Based Services, vol. 2, no. 3, pp. 194-210, 2008.

[3] A. Weiser and A. Zipf, "Web service orchestration of OGC web services for disaster management," in Geomatics Solutions for Disaster Management, ser. Lecture Notes in Geoinformation and Cartography, J. Li, S. Zlatanova, A. G. Fabbri, W. Cartwright, G. Gartner, L. Meng, and M. P. Peterson, Eds. Springer Berlin Heidelberg, 2007, pp. 239-254.

[4] X. Meng, F. Bian, and Y. Xie, "Research and realization of geospatial information service orchestration based on BPEL," in Proceedings of the 2009 International Conference on Environmental Science and Information Application Technology, ESIAT 2009. IEEE Computer Society, 2009, pp. 642-645.

[5] F. Theisselmann, D. Dransch, and S. Haubrock, "Service-oriented architecture for environmental modelling - the case of a distributed dike breach information system," in Proceedings of the 18th World IMACS/MODSIM Congress, 13-17 July 2009, pp. 938-944.

[6] D. Roman, S. Schade, A. J. Berre et al., "Environmental services infrastructure with ontologies - a decision support framework," in Proceedings of EnviroInfo 2009: Environmental Informatics and Industrial Environmental Protection: Concepts, Methods and Tools. Shaker Verlag, 2009, pp. 307-315.

[7] G. Li, V. Muthusamy, and H.-A. Jacobsen, "A distributed service-oriented architecture for business process execution," ACM Transactions on the Web, vol. 4, no. 1, 2010.

[8] W. Yu, "Peer-to-peer execution of BPEL processes," in CAiSE Forum, ser. CEUR Workshop Proceedings, J. Eder, S. L. Tomassen, A. L. Opdahl, and G. Sindre, Eds., vol. 247. CEUR-WS.org, 2007.

[9] M. Armbrust, A. Fox et al., "A view of cloud computing," Communications of the ACM, vol. 53, no. 4, pp. 50-58, Apr. 2010.

[10] P. Missier, S. Soiland-Reyes et al., "Taverna, reloaded," in SSDBM 2010, M. Gertz, T. Hey, and B. Ludaescher, Eds., Heidelberg, Germany, June 2010.

[11] S. Callaghan, E. Deelman et al., "Scaling up workflow-based applications," J. Comput. Syst. Sci., vol. 76, no. 6, pp. 428-446, Sep. 2010.

[12] I. Altintas, C. Berkley et al., "Kepler: An extensible system for design and execution of scientific workflows," in SSDBM. IEEE Computer Society, 2004, pp. 423-424.

[13] I. J. Taylor, M. S. Shields et al., "Distributed P2P Computing within Triana: A Galaxy Visualization Test Case," in 17th International Parallel and Distributed Processing Symposium (IPDPS 2003). IEEE Computer Society, 2003, pp. 16-27.

[14] E. Deelman, G. Singh et al., "Pegasus: A framework for mapping complex scientific workflows onto distributed systems," Sci. Program., vol. 13, no. 3, pp. 219-237, Jul. 2005.

[15] I. Altintas, O. Barney, and E. Jaeger-Frank, "Provenance collection support in the Kepler scientific workflow system," in Proceedings of the 2006 International Conference on Provenance and Annotation of Data, ser. IPAW'06. Berlin, Heidelberg: Springer-Verlag, 2006, pp. 118-132.

[16] W. Allcock, J. Bresnahan et al., "The Globus striped GridFTP framework and server," in Proceedings of the 2005 ACM/IEEE Conference on Supercomputing, ser. SC '05. Washington, DC, USA: IEEE Computer Society, 2005, pp. 54-.

[17] J. Yan, Y. Yang, and G. K. Raikundalia, "SwinDeW - a p2p-based decentralized workflow management system," IEEE Transactions on Systems, Man, and Cybernetics, Part A, vol. 36, no. 5, pp. 922-935, 2006.

[18] L. Gong, "JXTA: A Network Programming Environment," IEEE Internet Computing, vol. 5, no. 3, pp. 88-95, 2001.

[19] M. G. Nanda, S. Chandra, and V. Sarkar, "Decentralizing execution of composite web services," in Proceedings of the 19th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA '04, J. M. Vlissides and D. C. Schmidt, Eds. ACM, 2004, pp. 170-187.

[20] L. Baresi, A. Maurino, and S. Modafferi, "Towards distributed BPEL orchestrations," Electronic Communications of the EASST, vol. 3, 2006.

[21] U. Yildiz and C. Godart, "Towards decentralized service orchestrations," in SAC '07: Proceedings of the 2007 ACM Symposium on Applied Computing. New York, NY, USA: ACM Press, 2007, pp. 1662-1666.

[22] L. Ai, M. Tang, and C. Fidge, "Partitioning composite web services for decentralized execution using a genetic algorithm," Future Generation Computer Systems, vol. 27, no. 2, pp. 157-172, 2011.

[23] D. Wutke, D. Martin, and F. Leymann, "Model and infrastructure for decentralized workflow enactment," in Proceedings of the 2008 ACM Symposium on Applied Computing, SAC '08. New York, NY, USA: ACM, 2008, pp. 90-94.

[24] R. Jimenez-Peris, M. Patino Martinez, and E. Martel-Jordan, "Decentralized web service orchestration: A reflective approach," in Proceedings of the 23rd Annual ACM Symposium on Applied Computing. ACM, 2008, pp. 494-498.

[25] M. T. Schlosser et al., "HyperCuP - hypercubes, ontologies, and efficient search on peer-to-peer networks," in Proceedings of the First International Conference on Agents and Peer-to-Peer Computing, AP2PC 2002, ser. Lecture Notes in Computer Science, G. Moro and M. Koubarakis, Eds., vol. 2530. Springer, 2002, pp. 112-124.

[26] H. Ren, Z. Wang, and Z. Liu, "A hyper-cube based p2p information service for data grid," in Proceedings of the Fifth International Conference on Grid and Cooperative Computing, GCC 2006. IEEE Computer Society, 2006, pp. 508-513.

[27] E. Anceaume, R. Ludinard et al., "PeerCube: A hypercube-based p2p overlay robust against collusion and churn," in Proceedings of the Second IEEE International Conference on Self-Adaptive and Self-Organizing Systems, SASO 2008, S. A. Brueckner, P. Robertson, and U. Bellur, Eds. IEEE Computer Society, 2008, pp. 15-24.

Michael Pantazoglou is a post-doc research associate at the Department of Informatics and Telecommunications of the National and Kapodistrian University of Athens, Greece. He is currently a member of the S3Lab (http://s3lab.di.uoa.gr) group. His current research interests span service-oriented computing, with a focus on service discovery and composition, P2P computing, and semantic web technologies.

Ioannis Pogkas received his BSc and MSc from the Department of Informatics and Telecommunications of the National and Kapodistrian University of Athens, Greece. His research interests focus on distributed search, reputation, and execution mechanisms in peer-to-peer networks. He is also interested in mobile computing.

Aphrodite Tsalgatidou is an associate professor at the Department of Informatics and Telecommunications of the National and Kapodistrian University of Athens, Greece. Aphrodite is the director of the S3Lab group (http://s3lab.di.uoa.gr), which pursues research in service engineering, software engineering, and software development.

