Computers and Chemical Engineering 38 (2012) 94–105
Contents lists available at SciVerse ScienceDirect
Computers and Chemical Engineering
journal homepage: www.elsevier.com/locate/compchemeng

Parallel tools for the bifurcation analysis of large-scale chemically reactive dynamical systems

Gaetano Continillo (a,*), Artur Grabski (a), Erasmo Mancusi (a,1), Lucia Russo (b)

a Dipartimento di Ingegneria, Università del Sannio, Piazza Roma, 82100 Benevento, Italy
b Istituto di Ricerche sulla Combustione CNR, Piazzale Tecchio, 80125 Naples, Italy

Article history: Received 17 November 2010; received in revised form 16 December 2011; accepted 20 December 2011; available online 29 December 2011.

Keywords: periodically forced chemical reactors; parameter continuation; bifurcation analysis; parallel implementation; parallelism

Abstract. In this work we propose a set of tools for the parallel application of pseudo-arclength continuation to a class of systems whose right-hand side can only be represented by a numerically computed time-evolution operator. The reverse-flow reactor and networks of reactors with periodically switched inlet and outlet sections, for example, belong to this class. Conducting a dynamical analysis of these systems as key parameters are varied requires computing the eigenvalues of the Jacobian matrix many times. Since the Jacobian can only be obtained numerically, at considerable computational cost, running this operation in parallel yields substantial savings in wall-clock time. Examples, solution diagrams and performance results for selected systems are presented and discussed. © 2011 Elsevier Ltd. All rights reserved.

1. Introduction

There is increasing interest in chemical engineering in applying the arsenal of bifurcation theory to obtain a complete dynamical characterization of the mathematical model of a system (for example, a reactor plant). In order to properly design and control chemical reactors, it is necessary to accurately describe all regime conditions as the relevant design and operation parameters are changed. More generally, mathematical models of chemically reactive systems can exhibit complex regime transitions marked by catastrophic bifurcations (sudden changes in temperature and/or concentrations) (Balakotaiah, Dommeti, & Gupta, 1999; Mancusi, Merola, Crescitelli, & Maffettone, 2000). Bifurcation analysis and parametric continuation are the tools of choice for investigating the dynamic features of nonlinear systems and, particularly, for identifying multi-stability regions (e.g. Seydel, 1988). Numerical continuation is the process of solving systems of nonlinear equations F(x, p) = 0 (or x_{n+1} − F(x_n, p) = 0) for various values of a real parameter p. Such a technique requires, at each step, the computation of the eigenvalues of the Jacobian matrix of the system in order to determine the stability of the solution regime and the possible appearance of a bifurcation (e.g. Kuznetsov, 1998). The numerical computation of these eigenvalues is one of the main tasks in such continuation algorithms. However, standard and popular codes for automatic continuation, such as AUTO (Doedel et al., 1997, 2000), CONTENT (Kuznetsov et al., 1996), CONT (Schreiber & Marek, 1991) and BIFPACK (Seydel & Hlaváček, 1987), are generally unsuitable for large-scale systems. Moreover, these software packages work efficiently only if the mathematical model of the system has an analytic expression for the Jacobian matrix.

Generally, the numerical computation of the eigenvalues is the slowest task in the execution of the continuation algorithm; when the system under study has no analytic expression for the Jacobian matrix, the Jacobian must itself be computed numerically and the whole process becomes even more time consuming. It is perhaps worth mentioning that there exist methods of automatic differentiation and associated software tools such as ADIFOR and ADIC (Bischof et al., 1992; Bischof, Roh, & Mauer-Oats, 1997). The main purpose of this software is to generate computer code (usually in a high-level language such as Fortran and/or C) that evaluates the analytical form of the derivatives of a vector function, which obviously assumes that an analytical form of the function exists. It should be made clear that we instead address systems for which an analytical form of the function does not exist and for which the evaluation of the numerical derivatives requires a huge computational effort.

* Corresponding author. E-mail addresses: [email protected], [email protected] (G. Continillo).
1 Erasmo Mancusi is spending a period as Visiting Professor at the Universidade Federal de Santa Catarina. The current address is Departamento de Engenharia Química e Engenharia de Alimentos, Universidade Federal de Santa Catarina, Laboratório de Simulação Numérica de Sistemas Químicos, LABSIN, Campus Universitário Cx. P. 476, 88.040-900 Florianópolis (SC), Brazil.

0098-1354/$ – see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.compchemeng.2011.12.016

Nomenclature

Symbols
n – order of reaction; dimension of the system; number of species
m – number of estimations of the Jacobian matrix
ns – number of slaves
Pe_M = wL/D – Peclet number of mass balance
Pe_S = wL ρ_K c_K / λ_K – Peclet number of solid phase
Pe_H = wL ρ_G c_G / λ_G – Peclet number of fluid phase
p – number of processors
r = k C_A^n – reaction speed (kmol/(m³ s))
R – gas constant (kJ/(kmol K))
RHS – right-hand side of equations
s = serial CPU time / parallel CPU time – speedup
St = A_k q/(ρ c_p F) – Stanton number
t – time (s)
T – temperature (K)
V – reactor volume (m³)
u – vector of variables
w – velocity (m/s)
q – fraction of parallel code
x – vector of variables
z – spatial coordinate (m)
k – number of iterated point (maps)

Greek symbols
α – conversion
β = (−ΔH) C_A0/(T_0 ρ c_p) – dimensionless adiabatic temperature rise
δ = A_q k_pp/(ρ c_p F) – dimensionless heat transfer coefficient
γ = E/(R T_0) – dimensionless activation energy
ε – porosity
ϕ_τ(x) – evolution operator
λ – thermal conductivity (W/(m K))
θ – dimensionless temperature
ζ_R – dimensionless spatial coordinate in reactor
ζ_E = l_E/L_E – dimensionless spatial coordinate in heat exchanger
ρ – density (kg/m³)
σ = (1 − ε) ρ_K c_K/(ρ_G c_G) – dimensionless heat volume of solid phase
τ_P = V_P/V_R – dimensionless volume of products pipe of heat exchanger
τ_F = V_F/V_R – dimensionless volume of feed pipe of heat exchanger

Subscripts and superscripts
B – backward perturbation
E – exchanger; end value
F – feed; forward perturbation
G – fluid (gas)
i – number of species
H – heat
M – mass
P – product
R – reactor
S, K – catalyst, solid; start

Parallel computation is the most promising approach to reduce the computation time in complex numerical problems. The presence of independent computation tasks that can be conducted in parallel is not, however, the only condition for a successful implementation of parallel algorithms. Numerical problems in chemical engineering can often be cast in such a way as to benefit from parallel implementations of the algorithms. Simulation of distributed systems is a typical field of application, where internal domain decomposition techniques are employed to generate decoupled sets of differential equations that can be integrated numerically in parallel (Kumar & Mazumder, 2010). Large-scale parameter estimation problems have also been tackled by decomposition methods (Liu & Wang, 2009), which convert large sets of ordinary differential equations into decoupled algebraic equations. Their method not only reduces computation time, but also generates a set of uncoupled algebraic equations that can therefore be efficiently solved in parallel. A similar property can be obtained in large-scale optimization processes when multiple-shooting techniques are employed for the relevant DAE integrations (Leineweber, Schäfer, Boch, & Schlöder, 2003): computations in each interval are completely decoupled and hence can be performed in parallel. Parallel computation may also be applied efficiently when the PDEs governing macroscopic distributed processes are not known, whereas the evolution rules at the microscopic/mesoscopic scale are well known. In such cases, short "bursts" of microscopic simulations may be run in parallel while system-level tasks, like stability analysis and bifurcation calculations, are performed with the so-called "coarse timestepper" approach, without ever obtaining the equations in closed form (Armaou, Siettos, & Kevrekidis, 2004; Kevrekidis & Samaey, 2009; Siettos, Armaou, Makeev, & Kevrekidis, 2003). It should be noted that, even though such a method can perform computational tasks otherwise unaffordable, in the case of spatially distributed macroscopic processes the coarse timestepper may still be very high dimensional. In this framework, our approach may be used to perform stability and bifurcation analysis efficiently at a macroscopic level in a parallel computing environment, with a time simulator used as a "black box".

Parallel computation can also change the comparative performance of different algorithms that can be used as alternatives to perform the same task. An example can be found in the numerical solution of boundary value problems, for which orthogonal collocation methods are memory intensive whereas shooting methods are computationally intensive. Parallel shooting, under the proper circumstances, may make the solution of large-scale boundary value problems affordable.

To obtain a worthwhile parallel speedup, the parallel fraction of the whole algorithm must be very high (Amdahl, 1967). For example, when bifurcation analysis requires extensive numerical work to integrate functions for independently changed integration bounds, and this task dominates the computing time, we are in a favorable condition for the best performance of parallel algorithms. This condition is met by a wide class of systems, which includes – but is not limited to – systems for which the Jacobian matrix must be computed numerically via repeated independent numerical integrations. Several examples are found in the literature. Discontinuous periodically forced reactors such as reverse-flow reactors (RFR) and reactor networks (RN) were analyzed (Mancusi, Russo, Altimari, Maffettone, & Crescitelli, 2007; Mancusi, Russo, Altimari, & Crescitelli, 2010; Mancusi, Russo, Altimari, & Crescitelli, 2011; Russo, Altimari, Mancusi, Maffettone, & Crescitelli, 2006; Russo, Mancusi, Maffettone, & Crescitelli, 2002) by conducting a bifurcation analysis on a properly constructed discrete map, based on the system's Poincaré map. This map is not available in analytic form and must therefore be computed numerically; as a consequence, there is no analytical expression for the Jacobian matrix. Mancusi, Russo, and Continillo (2003) pointed out that most of the computation time is spent during the repeated time integrations of the map. They successfully used their method for the continuation of various periodically forced systems and the construction of solution diagrams. Jacobsen and Berezowski (1998), in order to study the dynamic behavior of ideal homogeneous reactors with recycle, showed that a system described by a set of PDEs can be efficiently approximated by a discrete map. Also in this case the discrete map is not available in analytical form and must be computed numerically; an analysis of static bifurcations is then possible.
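The Amdahl bound just invoked is easy to make concrete. A minimal sketch (the parallel fraction q and the processor counts are illustrative values, not measurements from the paper; the formula is Amdahl's law written with the nomenclature's symbols s, q and p):

```python
def amdahl_speedup(q: float, p: int) -> float:
    """Ideal speedup s of a code whose fraction q is perfectly
    parallelizable, run on p processors (Amdahl, 1967)."""
    return 1.0 / ((1.0 - q) + q / p)

# If 95% of the run time is spent in independent numerical integrations,
# 8 processors give s of about 5.9, and the serial 5% caps s below 20
# no matter how many processors are added.
for p in (2, 8, 64):
    print(p, round(amdahl_speedup(0.95, p), 2))
```

This is why the sections below insist that the dominant cost, the independent integrations of task (8), be the part that is parallelized.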

In this work we implement and conduct parallel bifurcation analysis of systems which require extensive independent numerical integration work. The method is applied by means of versions of AUTO (Doedel et al., 1997, 2000) modified by the authors to operate in parallel. We report implementation details and results of two different applications, which are typical examples of chemical engineering problems, namely:

• bifurcation analysis of a reverse-flow reactor for the catalytic combustion of lean gas mixtures;
• bifurcation analysis of a network of connected catalytic reactors with periodically switched inlet and outlet sections.

In both cases the continuation software is applied to a suitable discrete system (Poincaré map). Results are presented and discussed, including solution diagrams and an evaluation of parallel performance.

The article is structured as follows. First, two categories of systems are described for which bifurcation analysis may benefit from a parallel approach. Then, a key task in our problems with high parallelism is described, namely the numerical computation of the Jacobian matrix for systems requiring extensive independent numerical integrations. Next, application examples are illustrated in which parallel algorithms are implemented, essentially highly computationally intensive parameter continuation of model problems. Finally, performance results are collected and discussed.

2. Two kinds of systems involving many independent numerical integrations

2.1. Study of discontinuous periodically forced systems

Discontinuous periodically forced systems, like those arising from modeling periodically forced reactors, can be formulated in abstract form as:

ẋ = f(x, p, g(T))    (1)

where g(T) represents the discontinuous forcing with period T, x is the state vector and p the parameter vector. The continuous-time system can be studied via its Poincaré map. The details of this approach are described in Russo et al. (2002). The authors adopt AUTO (Doedel et al., 1997) to conduct the bifurcation analysis. AUTO can trace the fixed-point locus of a discrete map, compute branches of stable and unstable fixed points for a discrete system, and compute the Floquet multipliers that determine the stability along these branches. Standard use of AUTO requires that the user supply an analytic expression of the discrete-time system. In this case, an analytic expression for the map is unavailable, so the authors resort to a numerical evaluation of the map. The technique consists of an interaction between AUTO (or any equivalent continuation software) and an ODE solver which efficiently evaluates the map. More explicitly, for the continuous-time forced system of Eq. (1), let the Poincaré map P be:

x_{k+1} = P(x_k, p)    (2)

The map must be evaluated numerically. For periodically forced systems the Poincaré map is easily constructed by sampling the time trajectory at each period T. Starting from x_k, the equations of the continuous-time system are integrated over a time period T, and the result can be taken as the initial condition of a new time integration of the equations, again over a time interval T. Numerically, the continuation of the fixed points of the map is conducted with calls from the AUTO main routine to an external integrator: the state vector of the system is sent to the integrator, which sends it back after a time equal to T, yielding the one-iterate.

The bifurcation analysis is conducted by solving the followingalgebraic vector equation:

x − P(x, p) = F(x, ϕ_T(x), p) = 0    (3)

where ϕ_T(x) represents the time-integration operator. Of course, since the map P has no analytical expression, the application of the Newton–Raphson method to Eq. (3) requires that both F and its Jacobian be computed numerically. This involves expensive numerical integrations in a number that grows with the square of the order of the reduced dynamical system.
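The interaction between the continuation routine and the external integrator can be sketched compactly. This is a minimal illustration, not the authors' AUTO-based implementation: the model below is a hypothetical damped, periodically forced oscillator, scipy's solve_ivp plays the role of the black-box time integrator ϕ_T, and a finite-difference Newton iteration finds the fixed point of the numerically evaluated map P, as in Eq. (3).

```python
import numpy as np
from scipy.integrate import solve_ivp

T = 2.0 * np.pi  # forcing period

def rhs(t, x, p):
    # hypothetical damped, periodically forced oscillator (illustrative only)
    return [x[1], -p * x[1] - x[0] + np.cos(t)]

def poincare_map(x, p):
    """One-iterate P(x_k): integrate the continuous-time system over one period T."""
    sol = solve_ivp(rhs, (0.0, T), x, args=(p,), rtol=1e-10, atol=1e-12)
    return sol.y[:, -1]

def fixed_point(x0, p, tol=1e-8, eps=1e-7):
    """Newton-Raphson on F(x) = x - P(x, p), Eq. (3), with a
    central-difference Jacobian built from 2n extra integrations."""
    x = np.array(x0, dtype=float)
    n = x.size
    for _ in range(50):
        F = x - poincare_map(x, p)
        if np.linalg.norm(F) < tol:
            break
        J = np.empty((n, n))
        for i in range(n):                      # 2n independent integrations
            dx = np.zeros(n)
            dx[i] = eps
            FiF = (x + dx) - poincare_map(x + dx, p)
            FiB = (x - dx) - poincare_map(x - dx, p)
            J[:, i] = (FiF - FiB) / (2.0 * eps)
        x = x - np.linalg.solve(J, F)
    return x

xs = fixed_point([0.0, 0.0], p=0.5)   # a point on the period-T regime
```

Every Newton step costs 2n + 1 independent time integrations; these are exactly the operations that Section 3 distributes among processors.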

2.2. Study of the stationary solutions of distributed chemical reactors

The problem of finding stationary solutions of nonlinear distributed dynamical systems, like those arising from distributed chemical reactors, can often be formulated as a boundary value problem. For a reaction–diffusion–convection system, the abstract form can be:

f(d²x/dζ², dx/dζ, x, p) = 0,  ζ ∈ [0, 1]
B(dx/dζ|₁, dx/dζ|₀, x(1), x(0), p) = 0    (4)

where f is a system of differential equations in the spatial variable ζ and B are the associated boundary conditions. In some cases, systems of the type (4) can be transformed into:

F(u, ϕ(x), p) = 0    (5)

and treated like a two-point boundary value problem (Berezowski, 2000). In Eq. (5), u is the subset of the state variables which are not given at, say, the right boundary, ϕ(x) represents the integration operator along the spatial variable ζ, and p is the parameter vector. Starting conditions for u are given at ζ = 0 as:

[x_S1, x_S2, ..., x_Sm]ᵀ = [u_1, u_2, ..., u_m]ᵀ    (6)

The remaining state variables must be determined as solutions of Eq. (5), so that the full stationary state vector x_S is finally computed. The problem is thus recast as a continuation problem, starting from a known stationary steady state, in which solution diagrams are computed via pseudo-arclength continuation. This method involves many Newton–Raphson iterations, and the Jacobian of F again is not available in analytical form and must be computed numerically.
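The reduction of a boundary value problem to the algebraic form (5) can be illustrated with single shooting on a toy problem. This is a sketch under stated assumptions: the linear consumption model x'' = Kx and the boundary values below are hypothetical, not one of the paper's reactor models; solve_ivp acts as the spatial integration operator ϕ, and brentq solves F(u) = 0 for the unknown left-boundary slope u.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import brentq

K = 4.0  # hypothetical dimensionless reaction-rate constant

def phi(u):
    """Spatial integration operator: march x'' = K*x from zeta = 0
    to zeta = 1 with x(0) = 1 and the unknown slope x'(0) = u."""
    sol = solve_ivp(lambda z, y: [y[1], K * y[0]],
                    (0.0, 1.0), [1.0, u], rtol=1e-10, atol=1e-12)
    return sol.y[:, -1]  # (x(1), x'(1))

def F(u):
    """Residual form of Eq. (5): the right boundary condition x(1) = 0."""
    return phi(u)[0]

# solve F(u) = 0 for the missing starting condition;
# the analytic answer for this linear problem is u = -2/tanh(2)
u_star = brentq(F, -10.0, 0.0)
```

In the paper's setting F has many components and its Jacobian must be formed numerically, which is where the parallel evaluation of Section 3 enters.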

3. Parallel numerical computation of the Jacobian matrix

A pseudo-arclength continuation algorithm (Keller, 1977) consists essentially of the following steps:

• Prediction.
• Pseudo-arclength continuation with step-size control. During this task there are m evaluations of the Jacobian matrix for the Newton solver, where m ≥ 1 depends on the chosen continuation step size and on the numerical tolerances.
• Correction.
• Detection of bifurcation points. During this task there is one evaluation of the Jacobian matrix.

In general, each point of the solution (or bifurcation) line takes two or more evaluations of the Jacobian matrix.
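The predictor–corrector cycle above can be condensed into a few lines. This is a sketch only, far from the AUTO implementation: the scalar equation F(x, p) = p + x² is a hypothetical test problem with a fold at (0, 0), the derivatives are formed by the same central differences as in Eq. (9), and the corrector solves F = 0 together with the arclength constraint.

```python
import numpy as np

def F(x, p):
    # hypothetical scalar equilibrium equation with a fold at (x, p) = (0, 0)
    return p + x**2

def derivs(x, p, eps=1e-7):
    """Central-difference derivatives F_x and F_p."""
    Fx = (F(x + eps, p) - F(x - eps, p)) / (2.0 * eps)
    Fp = (F(x, p + eps) - F(x, p - eps)) / (2.0 * eps)
    return Fx, Fp

def continuation(x, p, ds=0.05, steps=80):
    """Pseudo-arclength continuation of F(x, p) = 0 (Keller, 1977)."""
    branch = [(x, p)]
    Fx, Fp = derivs(x, p)
    norm = np.hypot(Fp, Fx)
    tx, tp = -Fp / norm, Fx / norm                 # unit tangent: null vector of [Fx Fp]
    for _ in range(steps):
        xp, pp = x + ds * tx, p + ds * tp          # predictor along the tangent
        for _ in range(20):                        # Newton corrector
            r = np.array([F(xp, pp),               # equilibrium residual
                          (xp - x) * tx + (pp - p) * tp - ds])  # arclength constraint
            if np.linalg.norm(r) < 1e-10:
                break
            Fx, Fp = derivs(xp, pp)
            J = np.array([[Fx, Fp], [tx, tp]])     # bordered (augmented) Jacobian
            dx, dp = np.linalg.solve(J, -r)
            xp, pp = xp + dx, pp + dp
        tx, tp = (xp - x) / ds, (pp - p) / ds      # secant update of the tangent
        x, p = xp, pp
        branch.append((x, p))
    return branch

branch = continuation(1.0, -1.0)  # start on the known branch p = -x**2
```

Natural continuation in p would fail at the fold, where F_x = 0; the bordered 2x2 system stays nonsingular there, which is the point of the pseudo-arclength formulation.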

Whenever a Jacobian is to be evaluated for a vector function F(x, p) for which no analytical expression is available, the derivatives are to be computed numerically. In our examples, function evaluations include the numerical integration of given expressions. Parallelism applies to the numerical computation of the derivatives. As an example, the following steps are necessary in order to compute the Jacobian matrix by means of second-order difference operators:

1. Prepare 2n perturbed vectors of starting conditions:

x_1F = [x_1 + ε, x_2, ..., x_n]ᵀ,  x_1B = [x_1 − ε, x_2, ..., x_n]ᵀ,  ...,  x_nF = [x_1, x_2, ..., x_n + ε]ᵀ,  x_nB = [x_1, x_2, ..., x_n − ε]ᵀ    (7)

2. Compute the values of the right-hand sides with the suitably perturbed conditions. Each of these operations takes significant computational power:

F_iB = F(x_iB),  F_iF = F(x_iF),  i = 1, 2, ..., n    (8)

3. Compute the numerical derivatives by second-order finite-difference operators. These operations do not require much computational power:

∂F(x)/∂x_i = (F_iF − F_iB)/(2ε),  i = 1, 2, ..., n    (9)

where n is the dimension of the system (the number of state variables) and ε is a small arbitrary perturbation, for example:

ε = 10⁻⁷ (1 + max_i(P_i))    (10)

In Eq. (7) subscript F means forward and B backward perturbation. It is worth stressing that all the operations expressed by Eqs. (7)–(9) are intrinsically independent and might be run in parallel. However, the tasks involved in Eqs. (7) and (9) would require too much communication compared to the amount of computation, so it is convenient to run them serially on the master processor. Task (8) takes almost the entire computation time, due to the significant time consumption of the numerical integration operator ϕ. These operations are therefore run in parallel by distributing them among the available processors: for example, F_1F on processor 1, F_2F on processor 2, F_3F on processor 3, and so on. In this case, the maximum number of processors that can be usefully involved is limited to twice the dimension of the system.

4. Realization of the parallel software to conduct continuation analysis

We used the MPI and PTHREADS parallel libraries for distributing tasks around the cluster. There are two possible computation regimes, namely synchronous and asynchronous. For each of the two regimes a set of parallel subroutines was prepared.

• Synchronous regime of computation.

In the synchronous regime of computation, the performance of the algorithm is limited by the slowest processor. This regime is therefore designed for homogeneous clusters and for shared-memory Symmetric Multi-Processing (SMP) machines. Generally, in such systems all processors have the same performance, and therefore the job is divided equally among the processors. Delivery of the results is expected to occur synchronously. The synchronous regime gives speedup starting from a 2-processor system.

For large systems it is expected that the delay of the network connection does not play a significant role. To investigate this aspect, a comparison between a shared-memory Symmetric Multi-Processor (SMP) machine and a distributed cluster is conducted. SMP machines are composed of a few high-performance nodes and their latency is very small, as it is governed by the memory bus speed, whereas distributed-memory machines (clusters) are typically made of many medium-performance nodes connected by a network. If the parallel speedup is comparable between the two kinds of architectures, it can be concluded that communication speed is not too important in our problems.

In both cases it makes sense to involve the master processor in the parallel computations, since otherwise it would run idle while awaiting the results. For SMP systems (with a relatively small number of processors), including the master increases the computing power employed significantly.

• Asynchronous regime of computation.

This regime is best suited for heterogeneous clusters or GRID computing, because it distributes the job dynamically to each node. When, for example, slower and faster computers are mixed, or the external load on the cluster is unknown or unpredictable, the algorithm behaves similarly to peer-to-peer networks, and the speed in general is not limited by the slowest processor. This concept requires at least three processors to yield speedup (one master and two slaves). Since the work is partitioned into 2n independent simulations, this concept works well when the dimension of the system is larger than half the number of available nodes; otherwise, all jobs would be allocated at the first distribution, dynamic allocation would not be possible, and the speedup would again be limited by the slowest processor.

The software was prepared and tested for a distributed-memory architecture; however, the same concepts can be used for shared-memory architectures. In fact, for the AUTO2000 version of the package, a parallel version was prepared by means of the PTHREADS library for shared-memory machines. The software was run on a Linux cluster equipped with the ROCKS 3.2.0 operating system and LAM-MPI 7.0.6 (Burns, Daoud, & Vaigl, 1994). For the tests we used a system built from an eight-processor shared-memory machine (Intel® Xeon™ MP CPU, 2.80 GHz, 8 GB of memory) plus 6 individual nodes (1.8 GHz, 500 MB RAM). The network cards are 100 GbE.

4.1. Synchronous approach

For the synchronous approach, the master process realizes the main continuation process. During the execution of the Jacobian subroutine, the master prepares the perturbed starting conditions and broadcasts them to the available slaves. The slaves run the simulations; during these simulations the master also runs a simulation. After the simulations are completed, the master collects the vectors of variables and prepares the next round of jobs for the slaves and for itself. This is repeated until all 2n simulations are done; then the master computes the Jacobian. As shown in Fig. 1, all nodes, including the master, perform the same number of jobs (in particular, of evaluations of the RHS). Fig. 1 reports a schematic of the synchronous algorithm.

[Fig. 1. Sketch of synchronous Jacobian subroutine. Flowchart: the main continuation task runs on the master; the master prepares and broadcasts the perturbed values to the slaves and to itself; simulations run on the master and on slaves 1 through n; the master gathers the results and computes the derivatives; the loop repeats until the Jacobian is computed.]

4.2. Asynchronous approach

In the asynchronous approach (Fig. 2) the master process, as opposed to the synchronous approach, does not work on simulations, but rather waits for results from the simulations conducted by the slaves. Thus, the master process prepares suitable perturbed starting conditions (2n) and broadcasts them to the available (ns) slaves. As soon as the result from slave k is received, the master sends a new job to the same slave (k). This is done until all 2n simulations are done (the value of m reaches 2n). Next, the master process computes the Jacobian matrix. This procedure works well for distributed-memory machines and heterogeneous clusters, and also when there is large latency among some nodes, because speed is not limited by the slowest processor or the slowest connection. To obtain speedup, it needs more than two processors. In the figure: n – dimension of the system, m – work counter, ns – number of slaves, k – slave number.
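The dispatch loop just described can be sketched with a shared work queue; in this illustrative Python sketch, threads stand in for the MPI slave processes and the RHS is a toy function (all names are assumptions, not the actual implementation):

```python
import queue
import threading

def async_jacobian(rhs, x, eps=1e-6, ns=3):
    """Master/slave sketch: the master queues the 2n perturbed points, and
    each slave takes a new job as soon as it finishes the previous one;
    here ns Python threads play the role of the slave processes."""
    n = len(x)
    jobs = queue.Queue()
    for j in range(n):                     # the 2n independent simulations
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        jobs.put((2 * j, xp))
        jobs.put((2 * j + 1, xm))

    results = {}                           # job id -> RHS vector
    lock = threading.Lock()

    def slave():
        while True:
            try:
                jid, point = jobs.get_nowait()   # take the next pending job
            except queue.Empty:
                return                            # no work left: stop
            value = rhs(point)
            with lock:
                results[jid] = value

    workers = [threading.Thread(target=slave) for _ in range(ns)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # the master assembles the Jacobian columns from the gathered results
    return [[(results[2 * j][i] - results[2 * j + 1][i]) / (2.0 * eps)
             for j in range(n)] for i in range(n)]

rhs = lambda u: [u[0] * u[0], 3.0 * u[1]]        # toy RHS
J = async_jacobian(rhs, [1.0, 2.0])
```

In the MPI version the queue is replaced by explicit send/receive pairs, but the accounting is the same: 2n jobs, m results gathered, and new work dispatched at each receive.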

4.3. PTHREADS approach

We also used the PTHREADS library in order to make the parallel AUTO2000 usable on shared memory computers. The corresponding software can be useful on a multiprocessor computer where MPI is not installed. This subroutine is asynchronous in the sense that all slaves are created at the beginning and allocated to the available processors by the system. In this implementation there is no control over which processor will execute each particular task. Tests run on a 4-processor SMP machine showed that the speedup of this approach is very close to that obtained with the synchronous MPI-based subroutines run on the same shared memory machine. It should be pointed out that the speedup depends on the number of PTHREADS processes (threads).

Fig. 2. Sketch of asynchronous Jacobian subroutine.

Fig. 3. Sketch of subroutine “master”.
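The shared-memory behavior described above can be mimicked with a thread pool: all tasks are created up front and the operating system decides which processor runs each one. A hypothetical sketch with a toy `rhs`:

```python
from concurrent.futures import ThreadPoolExecutor

def rhs(point):                      # stand-in for one simulator call
    return [point[0] ** 2, point[0] + point[1]]

points = [[1.0, 2.0], [1.5, 0.5], [2.0, -1.0], [0.0, 3.0]]

# all jobs are submitted at once; the system allocates the threads to
# whatever processors are free - there is no explicit placement control
with ThreadPoolExecutor(max_workers=4) as pool:
    values = list(pool.map(rhs, points))
```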

4.4. Subroutine MASTER

This subroutine is run on the main computer of the cluster, or on any of the processors of a multiprocessor shared memory machine. It runs the main computational task (in our examples, parameter continuation) as the master process. A block diagram is presented in Fig. 3. Note that the Jacobian matrix subroutine, both for the synchronous and the asynchronous approach, is run by the master process, which in turn spawns parallel integration jobs to the slaves. After the computation is finished, the master process sends an “end” tag to the slaves, collects the working data (number of RHS calls, name of slave processor) and possibly outputs computation statistics.

4.5. Subroutine slave

Subroutines “slave” are run on the slave nodes. They work in parallel and, obviously, independently from each other. The block diagram is given in Fig. 4. This subroutine is designed as a loop which can be terminated only by the master process. When the “work” tag is received, the subroutine receives a vector of variables and a vector

Fig. 4. Sketch of subroutine “slave”.



Table 1
Determination of the Gustafson bottleneck for a sample computation of a scalable reverse flow reactor.

Dimension of system   Time serial (s)   Time parallel (s) (no. of nodes)   Speedup sG   Gustafson bottleneck qG
10                    38.35             21.6 (2)                           1.775        0.2245
20                    119.6             36.7 (4)                           3.258        0.2470
40                    471.0             72.6 (8)                           6.487        0.2160
80                    1624              121.6 (16)                         13.35        0.1763

of parameters from the master process. Next, the RHS subroutine is called and the result – the vector of RHS values – is sent to the master process. When a slave receives an “end” tag, it sends the statistics data (number of calls of the RHS subroutine and name of the processor) to the master and exits.

5. Speedup

Parallel speedup is defined as the ratio between the real time of a serial computation and the real time of a parallel computation. Speedup in general depends on the system, especially on:

• difficulty of numerical evaluation of the RHS;
• scale of the system.

A general rule of thumb is that larger systems exhibit better speedup, and systems that are more difficult (time consuming) to integrate also lead to better speedup. This is obvious considering that larger and more difficult systems typically lead to a larger fraction of computational work conducted on each processor before data intercommunication is required.

The simplest definition of speedup is due to Amdahl (1967). For a given, fixed-size problem, and assuming infinite intercommunication speed among processors (no latency), Amdahl derived the ideal speedup s that can be obtained on p processors by an algorithm with a fraction q of parallel tasks and (1 − q) of serial tasks, in the simple form:

s = p/(q + (1 − q)p) (11)

The reverse form of Eq. (11),

q = p(s − 1)/(s(p − 1))

can be used to estimate the parallel fraction q of a given algorithm from the measured value of the speedup s.

Fig. 5. Dependence of speedup on system scale, bifurcation analysis of a Reverse-Flow reactor.

Fig. 5 shows a parallel performance comparison as a function of the scale of the system for one case of the Reverse-Flow Reactor model problem described later.
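Eq. (11) and its reverse form are easy to apply in practice; a small sketch using one measured point in the spirit of Table 2 (8 processors, speedup 6.703):

```python
def amdahl_speedup(p, q):
    """Ideal speedup on p processors for parallel fraction q (Eq. (11))."""
    return p / (q + (1.0 - q) * p)

def parallel_fraction(p, s):
    """Reverse form of Eq. (11): parallel fraction from a measured speedup."""
    return p * (s - 1.0) / (s * (p - 1.0))

q = parallel_fraction(8, 6.703)   # ~0.972, in line with Table 2
s = amdahl_speedup(8, q)          # round-trips back to 6.703
```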

Amdahl’s law expresses the obvious concept that, for a given problem, no matter how fast one performs the parallel fraction of some algorithm, the total time cannot be any shorter than the time required by the sequential (serial) part of the algorithm. This is reflected in the fact that, no matter how large p is, s can never be larger than s∞, given by

s∞ = lim(p→∞) s = 1/(1 − q)

This observation has led to skepticism towards massive parallelism. However, as observed by Gustafson (1988), when given a more powerful processor, programmers would rather use their control over grid resolution, number of time steps, difference operator complexity and other parameters that are usually adjusted, to allow the program to be run in some desired amount of time. The size of the problem then expands to make use of the increased facilities. Hence, it may be more realistic to assume that run time, not problem size, is constant. In this view, predictions of speedup based on Amdahl’s law are not very realistic. To incorporate this observation, Gustafson introduced the concept of “scalable speedup”, in which the fraction of serial code (scaled serial fraction) is no longer fixed but scales with the size of the system. The Gustafson speedup sG, again under the zero-latency assumption, is then given by Gustafson (1988):

sG = p − (p − 1)qG (12)

where qG is the serial fraction of the code, also called the Gustafson bottleneck, which can be estimated by means of the reverse form of Eq. (12):

qG = (p − sG)/(p − 1)

Thus, by running a test we can simply estimate the Gustafson bottleneck by knowing the number of processors and the measured speedup. Following Gustafson, the test is run by increasing the size of the problem proportionally to the number of available nodes.

Table 1 shows the timing of a sample computation of a scalable reverse flow reactor model. As expected, the estimated serial fraction of the code decreases (parallel performance increases) as the problem size is increased. The average Gustafson serial fraction of the code is qG = 0.2160. This value might be used to predict the possible speedup of larger systems run on correspondingly larger parallel machines (Eq. (12)). However, hereafter we chose to test the functionality of our parallel approaches following Amdahl’s paradigm (fixed size of problem, increasing size of cluster), since it places more emphasis on parallel efficiency.
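The reverse form of Eq. (12) can be checked against the last row of Table 1 (16 nodes, measured scaled speedup 13.35):

```python
def gustafson_speedup(p, qG):
    """Scaled (Gustafson) speedup on p processors, serial fraction qG (Eq. (12))."""
    return p - (p - 1.0) * qG

def gustafson_bottleneck(p, sG):
    """Reverse form of Eq. (12): serial fraction from a measured scaled speedup."""
    return (p - sG) / (p - 1.0)

qG = gustafson_bottleneck(16, 13.35)   # ~0.177 (Table 1 reports 0.1763,
                                       # computed from the unrounded speedup)
```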

6. Examples

Non-ideal (distributed-parameter) models of chemical reactors must be taken into account in order to accurately describe the dynamical behavior. Therefore, chemical reactor models are frequently described by sets of nonlinear PDEs. A set of PDEs represents an infinite dimensional problem. The most
used approach to model reduction in chemical engineering is to discretize the spatial domain (e.g. Canuto, Hussaini, Quarteroni, & Zang, 1988). Traditional reduction schemes (finite elements, weighted residuals, etc.) lead to the determination of large sets of ODEs, which can be written in abstract form as the following dynamical system:

ẋ = f(x, p, g(t)),  B(x, p) = 0 (13)

where f is the right hand side of the PDE system, B are the boundary conditions, x is the state vector and p is the parameter vector. Finally, g(t) is a time function that can represent a disturbance or, more generally, a periodic forcing.

Any non-autonomous T-periodic system can be studied with the Poincaré map P (e.g. Kuznetsov, 1998):

P : u0 ∈ Σ → ΦT(u0, 0) ∈ Σ

where u is the vector of state variables and Σ is a hyperplane (the so-called Poincaré section) orthogonal to the whole system orbits:

Σ := {u ∈ X : um+1 = ū},  ū ∈ [0, 2π[,  with X = [0, 2π[ × Rm.

In the following we always refer to the section with ū = 0, and Φt is the evolution operator of the dynamical system at t = T. Any trajectory intersects the Poincaré section Σ orthogonally every T-time, and the Poincaré map P merely tracks any initial condition u0 ∈ Σ for t = 0 after a period T. Thus, one can study the continuous-time periodic system with the globally equivalent Poincaré map. Therefore, the continuous-time system is “equivalent” (e.g. Kuznetsov, 1998) to the discrete-time system:

{Σ, {Pk}k ∈ Z},

where Pk is the k-th iterate of the map, and:

uk = P(uk−1) = Pk u0 = ΦkT(u0, 0) (14)

In general, a periodic orbit of a continuous-time system may intersect a Poincaré section k times before closing onto itself: fixed points of P correspond univocally to periodic orbits of period T = nτ of the continuous-time system; fixed points of Pk correspond to k-th-order subharmonic solutions of the continuous-time system. The stability of a periodic orbit of the continuous-time system is determined by studying the stability of the corresponding fixed point of the associated map according to Floquet theory. Details of the construction of the discrete dynamical system (13) are described in Russo et al. (2002).

6.1. Discrete dynamical system: Reverse-Flow Reactor

We consider a first order exothermic reaction occurring in a fixed catalytic reactor where the flow direction is periodically reverted. The fixed-bed reactor is modeled as a heterogeneous system with heat and mass transfer resistance between the gas and the solid phase, axial dispersion in the gas phase, axial heat conduction in the solid phase, and cooling at the reactor wall. The mathematical model for the RFR reads (Rehácek, Kubícek, & Marek, 1998):

Mass balance in the gas phase:

∂α/∂t + (2g(t) − 1) ∂α/∂R = (1/PeM) ∂²α/∂R² + δMG(αS − α) (15)

Mass balance in the solid phase:

0 = Da (1 − αS)^n exp(γβθS/(1 + βθS)) + δMS(α − αS) (16)

Heat balance in the gas phase:

∂θG/∂t + (2g(t) − 1) ∂θG/∂R = (1/PeH) ∂²θG/∂R² + δHG(θS − θG) + δ(θH − θG) (17)

Heat balance in the solid phase:

σ ∂θS/∂t = (1/PeS) ∂²θS/∂R² + Δ(αS, θS) + δHS(θG − θS) (18)

The periodic inversion of the flow direction is modeled by the following square wave function:

g(t) = 1 if (t/τ) mod 2 < 1,  g(t) = 0 if (t/τ) mod 2 > 1 (19)

It is apparent that the vector field changes discontinuously in time, and that it recovers the same form after a time nτ = T. Indeed, g(t) is a discontinuous periodic function with minimum period T, and the non-autonomous system of Eqs. (15)–(19) is T-periodic.
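A direct transcription of the square wave of Eq. (19) (variable names are illustrative):

```python
def g(t, tau):
    """Square wave of Eq. (19): the flow direction is reverted every tau,
    so the forcing repeats with period T = 2 * tau."""
    return 1 if (t / tau) % 2 < 1 else 0

tau = 0.5
flow = [g(0.2, tau), g(0.7, tau), g(1.2, tau)]   # forward, reversed, forward
```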

The following boundary conditions for concentration in the gas phase and temperature in the solid and gas phases are assumed:

R = 0:  α − (1/PeM) dα/dR = 0,  θG − (1/PeH) dθG/dR = 0,  dθS/dR = 0

R = 1:  dα/dR = 0,  dθG/dR = 0,  dθS/dR = 0 (20)

Eqs. (15)–(20) are dimensionless, with axial coordinate R = x/L, time t = v t̄/L, gas phase conversion α = (C0 − Cg)/C0, conversion on the catalyst αS = (C0 − CS)/C0, gas phase temperature θG = E(T − T0)/RT0², and catalyst temperature θS = E(TS − T0)/RT0². The definitions of all dimensionless parameters and their values are:

PeM = 317.0, PeH = 644.0, PeS = 115.08, ΔH = −0.79, Da = 0.56, σ = 1251.0, δ = 0.72, γ = 16.68, β = 0.7282, δMG = 17.5, δMS = 22.9, δHS = 28.4, δHG = 28.4.

For the subsequent numerical investigation, the infinite dimensional PDE system, Eqs. (15)–(20), is reduced to a set of 36 ODEs by an orthogonal collocation technique on finite elements (Villadsen & Michelsen, 1978). The domain of each reactor has been divided into three blocks, and four collocation points are used in each subdomain.

The stability of a periodic orbit of the system Eqs. (15)–(20) is assessed by studying the stability of the corresponding fixed point of the associated P map (Eq. (14)).
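The fixed-point formulation can be illustrated on a toy scalar T-periodic ODE in place of the 36-ODE reactor model: the stroboscopic map P is obtained by integrating over one period, its fixed point is found by a Newton iteration with a finite-difference derivative, and the resulting Floquet multiplier decides stability. This is only a sketch under those assumptions, not the AUTO machinery:

```python
import math

def rk4(f, u, t0, t1, steps=200):
    """Integrate the scalar ODE du/dt = f(t, u) from t0 to t1 with classical RK4."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = f(t, u)
        k2 = f(t + h / 2, u + h * k1 / 2)
        k3 = f(t + h / 2, u + h * k2 / 2)
        k4 = f(t + h, u + h * k3)
        u += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return u

T = 2 * math.pi                          # forcing period
f = lambda t, u: -u + math.sin(t)        # toy T-periodic vector field

def P(u0):
    """Poincare (stroboscopic) map: evolve u0 over one period (cf. Eq. (14))."""
    return rk4(f, u0, 0.0, T)

u, eps = 0.0, 1e-6
for _ in range(20):                      # Newton iteration on u - P(u) = 0
    dP = (P(u + eps) - P(u - eps)) / (2 * eps)
    u -= (u - P(u)) / (1.0 - dP)

mult = (P(u + eps) - P(u - eps)) / (2 * eps)   # Floquet multiplier of the orbit
stable = abs(mult) < 1.0                       # |multiplier| < 1: stable orbit
```

For this linear toy problem the fixed point is u* = −1/2 with multiplier e^(−2π), so the periodic orbit is stable; for the reactor models the same Newton step requires the full, numerically evaluated Jacobian discussed in Section 4.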

A typical solution diagram is represented in Fig. 6. Each point represents a T-periodic regime; stable periodic regimes are shown as solid lines, and unstable regimes are shown as dashed lines. The computation of the stability characteristics of each point on the diagram reported in Fig. 6 needs several estimations of the Jacobian matrix of the discrete system. The corresponding performance diagrams and computing times are presented in Figs. 7 and 8.

It should be underlined that large scale systems with an integration operator are characterized by little need for intercommunication among processors. The measurements on our homogeneous cluster, compared with a shared memory 4-processor machine, show that the speedup in the two cases is similar. The average estimated fraction of parallel code in the case of the distributed memory machine is 0.9725 (Table 2). The estimated fraction of parallel code in the case of the shared memory machine is 0.9866. The difference in the estimates is an indication of the deviation from the ideal behavior of processors assumed to derive Amdahl’s law in the simple form of Eq. (11), as a consequence of latencies due to intercommunication processes on distributed systems. The difference visible in Fig. 8 corresponds to about 1.4%.

Fig. 6. Solution diagram, Reverse-Flow Reactor model. Conversion in the first node α vs. switch period τ.

Fig. 7. Speedup, RFR model, distributed cluster.

Fig. 8. Speedup comparison, RFR model, distributed cluster and shared memory machine.

Table 2
Computing time, speedup and estimated fraction of parallel code (Amdahl paradigm).

Number of processors   Elapsed time (h)   Speedup   Estimated fraction q
1 (serial)             10.81              1.000     –
2                      5.551              1.949     0.974
4                      2.939              3.680     0.971
6                      2.057              5.278     0.973
8                      1.614              6.703     0.972

Table 3
Timing for the heterogeneous case.

Processor   State   Number of jobs   Time (s)
1           Free    528              105.4
2           Free    529
3           Free    527
1           Free    556              131.0
2           Free    529
3           Busy    499
1           Free    590              151.1
2           Busy    495
3           Busy    499

Fig. 9 shows the prediction of Amdahl’s speedup on larger computers by means of Eq. (11). The two lines correspond to the estimated fractions of the parallel code from the shared memory machine and the distributed cluster. As a general remark, it is seen that the ideal performance estimated by Amdahl’s law for larger configurations is highly sensitive to even small changes in the fraction (1 − q) of serial code. For a given size problem, increasing the number of nodes beyond a certain value does not provide significant benefits. However, if the size of the problem is correspondingly increased (Gustafson, 1988), parallel performance can also increase.

Fig. 9. Prediction of the speedup on larger computers based on estimates of the parallel fraction of the code.

Previous computation results all refer to the synchronous approach run on a shared memory machine or on a homogeneous cluster. The asynchronous computation approach mentioned earlier, along with the relevant software, was also developed and tested. Table 3 presents a sample computation of a portion of the solution line reported in Fig. 6.

The computing platform employed has differently available processor power. More precisely, a heterogeneous cluster was simulated by a homogeneous four-processor system in which one or more processors were loaded with additional serial computation tasks, so that the available computation power of a processor is lower than that of a free processor. The resulting final number of jobs completed by each processor and the corresponding timing are reported in Table 3. In the table, the “free” state means that the entire computation power of the processor is dedicated to the


parallel process, while “busy” means that the processor is partially loaded by an additional serial process. It is seen that, obviously, the best performance is achieved when all processors are free, while the total computing time progressively increases as one or two processors are partially busy. In all cases the computation was successfully completed, while the asynchronous approach permitted an uneven distribution of the number of jobs among the differently loaded processors, so that none of them was running idle.

6.2. Discrete dynamical system: a network of four reactors

A Reactor Network (RN) of four connected catalytic reactors with periodically switched inlet and outlet sections is selected as the representative process. The forcing strategy consists of periodically switching the feed to the second reactor of the reactor sequence (Mancusi et al., 2010, 2011). Namely, the reactors are fed according to the sequence 1-2-3-4 during the time interval [0, τ[, τ being the switch (or cycle) time; after the first switch, during the interval [τ, 2τ[, the reactors are fed according to the sequence 2-3-4-1, and so on until the initial feed configuration 1-2-3-4 is recovered. It is worth noting that the forced network is T-periodic, that is, the network recovers the initial configuration after a time T = 4τ.

The mathematical model of each fixed bed of the RN consists of a one-dimensional pseudo-homogeneous model taking into account axial mass and energy dispersive transport. The RN performs the methanol synthesis according to the following reaction:

CO + 2H2 ⇌ CH3OH

Under the assumption of a first order reversible reaction, the dimensionless mass and energy balances for the i-th reactor of the RN read as follows:

Le ∂ϑi/∂t + v ∂ϑi/∂ξ = (1/Peh) ∂²ϑi/∂ξ² + B r(xi, ϑi)

∂xi/∂t + v ∂xi/∂ξ = (1/Pem) ∂²xi/∂ξ² − r(xi, ϑi)

r(xi, ϑi) = Da exp(ϑiγ/(ϑi + γ)) [1 − xi (1 + (A2/A1) exp((1 − E2/E1)γ²/(ϑi + γ)))]

(i = 1, 2, 3, 4) (21)

with the following definitions for the dimensionless variables and parameters:

ξ = z/z0;  t = t̄u0/z0;  ϑ = γ(T − T0)/T0;  x = 1 − C/C0;  γ = E1/(RT0);  v = u/u0;  B = (−ΔH)C0γ/((ρcp)f T0);  Da = (A1z0/u0) exp(−γ);  Le = (ρcp)eff/(ρcp)f;  Peh = (ρcp)f z0u0/ke;  Pem = z0u0/Df. (22)

Here E2/E1 and A2/A1 are the ratios of the activation energies and of the pre-exponential factors of the reverse and forward reaction steps.

Table 4
Dimensionless parameters with Tin = 100 °C, T0 = 200 °C, u0 = 1 m/s, z0 = 1 m (Mancusi et al., 2010).

B = 16;  ϑin = −8.2;  Peh = 413;  Pem = 390;  Da = 0.0017;  Le = 27;  γ = 39;  E2/E1 = 1.64;  A2/A1 = 3.28;  L = 1/2

Fig. 10. The T-periodic solution diagram with the switch time, τ, as the bifurcation parameter.

The values of the model parameters and their description are reported in Table 4.

As in the case of the Reverse-Flow Reactor previously discussed, the forcing enters the model through the boundary conditions. The forcing is implemented with a discontinuous periodic wave function:

f(t) = 0 if 0 ≤ (t/τ) mod 4 < 1,  f(t) = 1 if 1 ≤ (t/τ) mod 4 < 4 (23)

Thus the boundary conditions are:

(1/Pem) ∂xi/∂ξ|ξ=0 = −[1 − f(t − (i − 1)τ)] xin − f(t − (i − 1)τ) xi−1(1, t) + xi(0, t)

(1/Peh) ∂ϑi/∂ξ|ξ=0 = −[1 − f(t − (i − 1)τ)] ϑin − f(t − (i − 1)τ) ϑi−1(1, t) + ϑi(0, t)

∂xi/∂ξ|ξ=1 = 0,  ∂ϑi/∂ξ|ξ=1 = 0  (i = 1, 2, 3, 4) (24)

The infinite dimensional PDE system (Eqs. (21)–(24)) has been reduced to a set of 128 ordinary differential equations (ODEs) by orthogonal collocation on finite elements (Villadsen & Michelsen, 1978). A coarser grid has also been used, giving rise to a smaller set of equations (80 ordinary differential equations), in order to study the effect of the system dimension on the speedup.

A typical solution diagram of the system presented above is reported in Fig. 10. The diagram was obtained by performing the parameter continuation of the full 128 equations system. The ignited (high conversion) solutions form an isola bounded by two catastrophic saddle-node bifurcation points, while the wash-out solution is not reported. As the switch time is varied within the range τS1 < τ < τS2, stable high conversion T-periodic regimes coexist with stable non-ignited T-periodic regimes.

The size of the system does not affect, qualitatively or quantitatively, the dynamic behavior. That is, the solution diagram of the coarser system (80 equations) coincides with the solution diagram reported in Fig. 10. The computation time required by a single processor to obtain the solution diagram reported in Fig. 10 is 56.4 h


Table 5
Computing time and speedup for the synchronous and asynchronous regimes.

                128 equations case                                80 equations case
Number of       Asynchronous          Synchronous                 Asynchronous          Synchronous
processors      Time (h)   Speedup    Time (h)   Speedup          Time (h)   Speedup    Time (h)   Speedup
1 (serial)      56.39      –          –          –                11.08      –          –          –
2               64.49      0.87       33.23      1.70             12.48      0.89       6.44       1.72
3               32.49      1.74       22.94      2.46             6.26       1.77       4.37       2.54
4               21.89      2.58       17.37      3.25             4.27       2.59       3.32       3.34
5               16.56      3.40       14.34      3.93             3.25       3.41       2.70       4.10
6               13.36      4.22       12.05      4.68             2.65       4.18       2.32       4.79
7               10.54      5.35       10.40      5.42             2.14       5.19       2.00       5.55
8               8.71       6.48       9.09       6.20             1.78       6.23       1.75       6.32
9               7.47       7.55       8.25       6.84             1.55       7.17       1.59       6.95
10              6.52       8.65       7.47       7.55             1.37       8.09       1.44       7.71
11              5.85       9.64       6.94       8.12             1.24       8.95       1.36       8.17
12              5.29       10.67      6.38       8.84             1.14       9.76       1.28       8.69
13              4.83       11.68      5.82       9.70             1.02       10.88      1.19       9.32
14              4.51       12.50      5.57       10.13            0.96       11.55      1.12       9.92
15              4.16       13.56      5.36       10.53            0.91       12.18      1.04       10.70
16              3.91       14.41      4.83       11.68            0.85       13.09      0.96       11.60
17              3.67       15.38      4.76       11.85            0.79       14.06      0.96       11.60
18              3.47       16.23      4.51       12.50            0.78       14.22      0.88       12.62
19              3.29       17.13      4.26       13.22            0.73       15.28      0.87       12.68
20              3.11       18.14      3.98       14.16            0.70       15.87      0.79       13.98
21              3.01       18.74      3.95       14.29            0.68       16.40      0.79       13.98
22              2.91       19.35      3.74       15.09            0.66       16.73      0.80       13.90
23              2.80       20.15      3.74       15.09            0.62       17.83      0.72       15.47
24              2.64       21.39      3.47       16.26            0.62       17.83      0.72       15.47
25              2.56       22.04      3.45       16.36            0.61       18.09      0.72       15.47
26              2.50       22.54      3.15       17.88            0.55       20.00      0.72       15.47
27              2.43       23.19      3.18       17.76            0.55       20.00      0.64       17.45
28              2.33       24.21      3.18       17.74            0.55       20.16      0.64       17.45
29              2.29       24.62      2.93       19.25            0.54       20.67      0.64       17.45
30              2.27       24.88      2.93       19.28            0.50       22.16      0.64       17.45

when the 128 ODEs model is considered, and 11.1 h for the 80 ODEs model. The corresponding performance diagrams are presented in Fig. 11a and b for the 128 and 80 equations systems, respectively.

In Fig. 11 a comparison between the speedups of the two algorithms – asynchronous and synchronous – is presented for both the 128 and 80 equations cases. It is worth noting that, independently of the system dimension, for a number of processors lower than 8 the synchronous algorithm has better performance. Better insight can be obtained by analyzing the data reported in Table 5, where the exact observed values of time consumption and speedup for both cases are presented.

As previously mentioned, the synchronous process uses all processors for the Jacobian calculation, so the entire resource is exploited. But this regime of work is characterized by synchronous

Fig. 11. Speedup of the distributed cluster for synchronous and asynchronous continuation: (a) 128 ODEs system; (b) 80 ODEs system.

job distribution and synchronous feedback from all slave nodes. This means that, if any slave process temporarily works more slowly, the entire process waits for it. This evidently leads to a decreasing speedup for higher numbers of nodes. Therefore, the synchronous algorithm works more efficiently with a small number of clustered processors or with a shared memory machine. By default, current shared memory machines are characterized by a relatively small number of processors. The asynchronous algorithm is characterized by dynamic work distribution by one (master) process to the available slave nodes, regardless of the computing power characterizing each node. The slowest nodes do not limit the entire computation speed. The master process is excluded from calculating the Jacobian matrix. The asynchronous algorithm works very efficiently for a large-scale distributed cluster because it is characterized by optimal job distribution. In the analyzed case, the asynchronous algorithm

works better than the synchronous one for 8 and more processors. It is necessary to point out that parallel execution is more effective for larger problems – the 128 equations case exhibits a higher speedup than the 80 equations one.

The stair-step shape of the synchronous speedup curves for large numbers of nodes is caused by the splitting of the natural number of jobs (simulations) by the natural number of processors. As an example, consider the 80 equations case with 16 and 17 processors. All simulations are done in several rounds of calculation. In order to calculate the derivatives of the 80 equations it is necessary to make 160 computations. Thus, if 160 computations are distributed among 16 processors, it is necessary to make 10 rounds of calculations. When the same problem is calculated by 17 processors, it would be necessary to make 9.41 rounds – more precisely, 17 processors calculate for 9 full rounds, and the remaining 7 jobs need a last, 10th round. For 18 processors, 8.88 rounds would be necessary but, more precisely, 18 processors work for 8 full rounds, and the remaining 16 jobs need a last, 9th round. So, 16 and 17 processors work for 10 rounds, and 18 processors for 9 rounds. Note that the asynchronous algorithm is also partially affected by the mentioned phenomenon; nonetheless, we can affirm that the asynchronous algorithm is more efficient than the synchronous algorithm for a large-scale distributed cluster.
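The round counting above is a plain ceiling division; a minimal sketch:

```python
import math

def rounds(jobs, processors):
    """Synchronous rounds needed to run `jobs` equal tasks on `processors`:
    a partially filled last round still costs one full round."""
    return math.ceil(jobs / processors)

# the 80-equation model needs 2 * 80 = 160 RHS evaluations per Jacobian
assert rounds(160, 16) == 10    # exactly 10 full rounds
assert rounds(160, 17) == 10    # 9 full rounds + 7 leftover jobs
assert rounds(160, 18) == 9     # 8 full rounds + 16 leftover jobs
```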

7. Conclusions and future work

We have proposed a parallel computation approach for the bifurcation analysis of a class of systems. The systems are characterized by:

• Numerically expensive evaluation of the right hand side (RHS).
• Impossibility of obtaining analytical derivatives of the RHS.
• High degree of parallelism in the computational operations related to the sensitivity matrix or parameter derivatives of the systems.

We propose the parallel computation of these derivatives and illustrate our proposal by running parallel bifurcation analysis and/or parameter continuation of mathematical models based on difference equations and algebraic equations. We found that, for relatively large systems (our largest case shown consists of a system of 128 difference equations), speedup is almost linear and does not depend much on the computer architecture. One test was conducted by following Gustafson’s paradigm (size of problem scaled with size of machine), while all other tests were conducted by following Amdahl’s paradigm (fixed size of problem for varied size of machine). We obtained similar speedup on a distributed cluster and on a shared memory, four-processor machine, with a difference of about 1.4% estimated by means of the reverse Amdahl’s law. Our results indicate a host of potential applications: there exists a wide variety of systems which are characterized by significant time consumption in computing the right hand side of the model equations and that could greatly benefit from parallel computing. Parallelism will be useful in conducting computations which need numerical derivatives of the right hand sides of such systems, for example algorithms of optimization, estimation of Lyapunov exponents for discrete dynamical systems, models involving integro-differential equations, etc. Finally, parallelization was developed, implemented and tested with two different parallel versions of the software, allowing alternatively synchronous and asynchronous regimes of computation. Generally, synchronous subroutines perform better on homogeneous shared memory machines and asynchronous subroutines on heterogeneous clusters. The corresponding software is available as modified versions of AUTO97 and AUTO2000.


Acknowledgments

We are indebted to Constantinos Siettos for helpful discussions and comments, and to Francesco Saverio Marra for kindly hosting and assisting with many computations on his Linux cluster at Istituto di Ricerche sulla Combustione, CNR, Naples, Italy.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.compchemeng.2011.12.016.

References

Amdahl, G. M. (1967). Validity of the single-processor approach to achieving largescale computing capabilitiesì. In Proceedings of AFIPS conference (p. 483).

Armaou, A., Siettos, C. I., & Kevrekidis, I. G. (2004). Time-steppers and “coarse” con-trol of distributed microscopic processes. International Journal of Robust andNonlinear Control, 14, 89.

Balakotaiah, V., Dommeti, S. M. S., & Gupta, N. (1999). Bifurcation analysis of chemical reactors and reacting flows. Chaos, 9, 13.

Berezowski, M. (2000). Method of determination of steady-state diagrams of chemical reactors. Chemical Engineering Science, 55, 4291.

Bischof, C. H., Carle, A., Corliss, G., & Griewank, A. (1992). ADIFOR: Automatic differentiation in a source translator environment. In P. S. Wang (Ed.), Papers from the international symposium on symbolic and algebraic computation (ISSAC ’92), Berkeley, CA, United States, July 27–29, 1992 (pp. 294–302). New York, NY: ACM Press.

Bischof, C. H., Roh, L., & Mauer-Oats, A. J. (1997). ADIC: An extensible automatic differentiation tool for ANSI-C. Software: Practice and Experience, 27, 1427.

Burns, G. D., Daoud, R. B., & Vaigl, J. R. (1994). LAM: An open cluster environment for MPI. In Supercomputing Symposium ’94, Toronto, Canada. (Available from: http://www.lam-mpi.org/download/files/lam-papers.tar.gz)

Canuto, C., Hussaini, M. Y., Quarteroni, A., & Zang, T. A. (1988). Spectral methods in fluid dynamics. Berlin: Springer Verlag.

Doedel, E. J., Champneys, A. R., Fairgrieve, T. F., Kuznetsov, Y. A., Sandstede, B., & Wang, X. J. (1997). AUTO97: Continuation and bifurcation software for ordinary differential equations. Technical Report, CML, Concordia University, Montreal.

Doedel, E. J., Paffenroth, R. C., Champneys, A. R., Fairgrieve, T. F., Kuznetsov, Y. A., Oldeman, B. E., Sandstede, B., & Wang, X. J. (2000). AUTO2000: Continuation and bifurcation software for ordinary differential equations. Software manual.

Gustafson, J. L. (1988). Reevaluating Amdahl’s law. Communications of the Association for Computing Machinery (CACM), 31, 532.

Jacobsen, E. W., & Berezowski, M. (1998). Chaotic dynamics in homogeneous tubular reactors with recycle. Chemical Engineering Science, 53, 4023.

Keller, H. B. (1977). Numerical solution of bifurcation and nonlinear eigenvalue problems. In P. H. Rabinowitz (Ed.), Application of bifurcation theory (pp. 359–384). New York: Academic Press.

Kevrekidis, I., & Samaey, G. (2009). Equation-free multiscale computation: Algorithms and applications. Annual Review of Physical Chemistry, 60, 321.

Kumar, A., & Mazumder, S. (2010). Toward simulation of full-scale monolithic catalytic converters with complex heterogeneous chemistry. Computers and Chemical Engineering, 34, 135.

Kuznetsov, Y. A. (1998). Elements of applied bifurcation theory (2nd ed.). New York: Springer Verlag.

Kuznetsov, Y. A., Levitin, V. V., & Skovoroda, A. R. (1996). Continuation of stationary solutions to evolution problems in CONTENT. Report AM-R9611, Centrum voor Wiskunde en Informatica, Kruislaan 413, 1098 SJ Amsterdam, The Netherlands.

Leineweber, D. B., Schäfer, A., Bock, H. G., & Schlöder, J. P. (2003). An efficient multiple shooting based reduced SQP strategy for large-scale dynamic process optimization: Part II: Software aspects and applications. Computers and Chemical Engineering, 27, 167.

Liu, H., & Wang, Z. (2009). A Bloch band based level set method for computing the semiclassical limit of Schrödinger equations. Journal of Computational Physics, 9, 3326.

Mancusi, E., Merola, G., Crescitelli, S., & Maffettone, P. L. (2000). Multistability and hysteresis in an industrial ammonia reactor. AIChE Journal, 44, 824.

Mancusi, E., Russo, L., Altimari, P., Maffettone, P. L., & Crescitelli, S. (2007). Effect of the switch strategy on the stability of reactor networks. Industrial and Engineering Chemistry Research, 46, 6510.

Mancusi, E., Russo, L., Altimari, P., & Crescitelli, S. (2010). Temperature and conversion patterns in a network of catalytic reactors for methanol synthesis for different switch strategies. Chemical Engineering Science, 65, 4579.

Mancusi, E., Russo, L., Altimari, P., & Crescitelli, S. (2011). Multiplicities of temperature wave trains in periodically forced networks of catalytic reactors for reversible exothermic reactions. Chemical Engineering Journal, 171(2), 655.

Mancusi, E., Russo, L., & Continillo, G. (2003). Bifurcation analysis of discontinuous periodically forced reactors. In 19th international colloquium on the dynamics of explosions and reactive systems, Hakone, Japan.

Rehácek, J., Kubícek, M., & Marek, M. (1998). Periodic, quasi-periodic and chaotic spatio-temporal patterns in a tubular catalytic reactor with periodic flow reversal. Computers and Chemical Engineering, 22, 283.

Russo, L., Altimari, P., Mancusi, E., Maffettone, P. L., & Crescitelli, S. (2006). Complex dynamics and spatio-temporal patterns in a network of three distributed chemical reactors with periodical feed switching. Chaos, Solitons and Fractals, 28, 682.

Russo, L., Mancusi, E., Maffettone, P. L., & Crescitelli, S. (2002). Symmetry properties and bifurcation analysis of a class of periodically forced reactors. Chemical Engineering Science, 57, 5065.

Schreiber, I., & Marek, M. (1991). Chaotic behaviour of deterministic dissipative systems. Cambridge: Cambridge University Press.

Seydel, R. (1988). From equilibrium to chaos. Practical bifurcation and stability analysis. New York: Elsevier.

Seydel, R., & Hlavácek, V. (1987). Role of continuation in engineering analysis. Chemical Engineering Science, 42, 1281.

Siettos, C. I., Armaou, A., Makeev, A. G., & Kevrekidis, I. G. (2003). Microscopic/stochastic timesteppers and “coarse” control: A KMC example. AIChE Journal, 49, 1922.

Villadsen, J., & Michelsen, M. L. (1978). Solution of differential equation models by polynomial approximation. Englewood Cliffs: Prentice-Hall.

Websites

LAM MPI: www.lam-mpi.org
AUTO97 and AUTO2000 (modified parallel versions): http://www.ing.unisannio.it/continillo/AUTO

