Parallelization of a Transient Method of Lines Navier–Stokes Code

International Journal of Computational Fluid Dynamics
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/gcfd20

Parallelization of a Transient Method of Lines Navier–Stokes Code
Cem Erşahin*, Tanil Tarhan†, Ismail H. Tuncer‡ & Nevin Selçuk
Aeronautical Engineering Department, Middle East Technical University, 06531, Ankara, Turkey
Published online: 25 Jan 2007.

To cite this article: Cem Erşahin, Tanil Tarhan, Ismail H. Tuncer & Nevin Selçuk (2004) Parallelization of a Transient Method of Lines Navier–Stokes Code, International Journal of Computational Fluid Dynamics, 18:1, 81-92, DOI: 10.1080/1061856031000094673

To link to this article: http://dx.doi.org/10.1080/1061856031000094673

Parallelization of a Transient Method of Lines Navier–Stokes Code

CEM ERSAHINa,*, TANIL TARHANa,†, ISMAIL H. TUNCERb,‡ and NEVIN SELCUKa,{

aChemical Engineering Department; bAeronautical Engineering Department, Middle East Technical University, 06531 Ankara, Turkey

(Received 5 March 2002; In final form 9 December 2002)

Parallel implementation of a serial code, namely method of lines (MOL) solution for momentum equations (MOLS4ME), previously developed for the solution of transient Navier–Stokes equations for incompressible separated internal flows in regular and complex geometries, is described.

The serial code was parallelized by using a domain decomposition strategy with overlapped boundaries at the interfaces for information exchange between the sub-domains. Parallel virtual machine (PVM) and dual-processor personal computers (PCs) connected over a switch are employed as the message passing software and the parallel computing platform, respectively.

The performance of the parallelization strategy was first tested on a transient 1-D laminar pipe flow problem for MOL, explicit and implicit finite difference methods. The performance of the parallel code (PARMOLS4ME) was then assessed by applying it to two test cases, (i) laminar pipe flow with sudden expansion and (ii) turbulent flow in a gas turbine combustor simulator (GTCS), and comparing the results of the serial and parallel codes. Comparisons show that the flow fields predicted by the parallel codes agree well with those of the serial codes at considerably lower execution times. The effect of load balancing is also investigated, and load balancing is found to have a significant effect on the execution time of the parallel code.

Keywords: Parallel computing; Domain decomposition; PVM; Direct numerical simulation; Method of lines; Navier–Stokes equations

NOMENCLATURE

D Diameter, cm
E Efficiency, %
g Gravitational acceleration, cm/s²
L Length, cm
p Number of processors; static pressure, g/(cm s²)
r Distance in radial direction, cm
Re Reynolds number
S Speed-up
S_a Numerical speed-up of parallel code on one-processor system
t Time, s
t_f Time step for the transient results to be sent from worker to master, s
t_p Time step for the boundary exchange, s
T_p Execution time of parallel code on p-processors system, s
T_1 Execution time of parallel code on one-processor system, s
u Instantaneous axial velocity, cm/s
v Instantaneous radial velocity, cm/s
z Distance in axial direction, cm

ISSN 1061-8562 print/ISSN 1029-0257 online © 2004 Taylor & Francis Ltd

DOI: 10.1080/1061856031000094673

*E-mail: [email protected]
†E-mail: [email protected]
‡E-mail: [email protected]
{Corresponding author. E-mail: [email protected]

International Journal of Computational Fluid Dynamics, January 2004 Vol. 18 (1), pp. 81–92

Greek letters

ε Percentage error
φ Dimensionless velocity
μ Dynamic viscosity, g/(cm s)
ρ Density, g/cm³
τ Dimensionless time

Subscripts

in Inlet
r r-direction
z z-direction
sj Side jet

INTRODUCTION

Recent emphasis on the prediction of the transient behavior of turbulent, reacting and radiating flows in most industrial combustion chambers such as gas turbines, combustors and rocket engines has led to the investigation of accurate and computationally efficient numerical methods for the simultaneous solution of the time-dependent conservation equations for mass, momentum, energy and species. Therefore, extensive research has been devoted to the development of computational fluid dynamics (CFD) codes and to testing their predictive accuracy by comparing their predictions with either experimental data or the results of other numerical simulation or solution techniques. Research carried out in the past two decades has shown that direct numerical simulation (DNS) is the most accurate approach for the prediction of laminar and turbulent flow fields in complex cylindrical and rectangular geometries (Eggels et al., 1994; Oymak and Selcuk, 1996; 1997a,b; Kim et al., 1987; Le et al., 1997; Parneix et al., 1998; Selcuk and Oymak, 1999; Tarhan and Selcuk, 2001). However, because a large number of grid points and time steps are needed for the direct simulation of turbulent flows, both accurate and efficient numerical techniques and high performance computers are required to complete the simulation in a short computation time. The former can be achieved by increasing the order of the spatial discretization method, which gives high accuracy with fewer grid points, and by using a numerical algorithm for the time integration that is both highly accurate and stable. The method of lines (MOL) is an alternative approach that meets this requirement for time-dependent problems. The latter requirement is met by either supercomputers or parallel computers, which require efficient parallel algorithms.

The MOL consists of converting the partial differential equations (PDEs) into an ordinary differential equation (ODE) initial value problem by discretizing the spatial derivatives together with the boundary conditions via finite differences, finite elements, finite volumes or weighted residual techniques, and integrating the resulting ODEs using a sophisticated ODE solver which takes on the burden of time discretization and chooses the time steps in such a way that the accuracy and stability of the evolving solution are maintained. The most important advantage of the MOL approach is that it combines the simplicity of the explicit methods used for the evaluation of spatial derivatives with the superior stability of the implicit methods. The MOL approach has two further advantages. First, it is possible to achieve higher-order approximations in the discretization of spatial derivatives without a significant increase in computational complexity, except at the boundaries. Second, comparable orders of accuracy can also be obtained in the time steps when highly efficient and reliable initial value ODE solvers are used.
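The two stages of the MOL (spatial discretization, then ODE integration) can be illustrated with a minimal sketch. It is not the MOLS4ME code: the model problem (1-D diffusion u_t = alpha u_xx with u = 0 at both ends), the second-order central differences, the function names and the fixed-step classical Runge–Kutta integrator standing in for a quality ODE solver are all assumptions made for illustration.

```python
import math

def rhs(u, dx, alpha):
    """Semi-discrete right-hand side du/dt of u_t = alpha*u_xx (central differences)."""
    dudt = [0.0] * len(u)  # boundary nodes keep du/dt = 0 (fixed Dirichlet values)
    for i in range(1, len(u) - 1):
        dudt[i] = alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1]) / dx ** 2
    return dudt

def rk4_step(u, dt, dx, alpha):
    """One classical Runge-Kutta step; a stand-in for a library ODE solver."""
    def shifted(scale, k):
        return [ui + scale * ki for ui, ki in zip(u, k)]
    k1 = rhs(u, dx, alpha)
    k2 = rhs(shifted(0.5 * dt, k1), dx, alpha)
    k3 = rhs(shifted(0.5 * dt, k2), dx, alpha)
    k4 = rhs(shifted(dt, k3), dx, alpha)
    return [ui + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for ui, a, b, c, d in zip(u, k1, k2, k3, k4)]

def solve_heat_mol(n=21, alpha=1.0, dt=1.0e-3, t_final=0.1):
    """Integrate the ODE system from the initial profile sin(pi*x) up to t_final."""
    dx = 1.0 / (n - 1)
    x = [i * dx for i in range(n)]
    u = [math.sin(math.pi * xi) for xi in x]
    u[0] = u[-1] = 0.0  # enforce the boundary values exactly
    for _ in range(int(round(t_final / dt))):
        u = rk4_step(u, dt, dx, alpha)
    return x, u
```

For this model problem the exact solution is exp(-pi^2 alpha t) sin(pi x), so the mid-point value returned by `solve_heat_mol()` can be compared with `math.exp(-math.pi**2 * 0.1)`. Only `rhs` knows about the PDE, so the time integrator can be exchanged without touching the spatial discretization.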

Recently, a novel code (MOL solution for momentum equations, MOLS4ME) satisfying the requirement of an accurate and efficient numerical algorithm was developed for the solution of time-dependent 2-D Navier–Stokes equations for incompressible separated internal flows in complex cylindrical and rectangular geometries (Selcuk and Oymak, 1999; Tarhan and Selcuk, 2001). The validity and the predictive ability of the code were tested by applying it to the prediction of flow fields in both laminar and turbulent flows with and without sudden expansion, and comparing its predictions with either measured data or numerical results available in the literature. The predicted flow fields were found to be in perfect agreement with measurements for laminar flow. The success of the code for turbulent flows, however, was found to depend strongly on the number of grid points, and the number required cannot be handled by most present day computers (Selcuk and Oymak, 1999). This problem can be overcome by developing an efficient parallel algorithm for the sequential code, MOLS4ME.

In an attempt to achieve this objective, sequential codes previously developed for (i) transient 1-D laminar flow in a circular pipe and (ii) transient 2-D laminar flow in a short pipe with sudden expansion are also parallelized in this study (Ersahin, 2001). These test cases were selected firstly because of the recently demonstrated superiority of MOL over the finite difference method (FDM) in terms of accuracy, central processing unit (CPU) and set-up times (Oymak and Selcuk, 1993; Selcuk et al., 2002) and its flexibility for the incorporation of other conservation equations, and secondly because sequential codes for these test cases were available for comparative testing. The first test case allows the performance of the parallel implementations of MOL and FDMs to be evaluated on a simple 1-D problem, while the second test case shows the effect of parallelization on a multi-dimensional problem with regular geometry. To the authors' knowledge, no parallel implementation of Navier–Stokes flow solvers based on MOL solutions is available to date.

GOVERNING EQUATIONS

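The Navier–Stokes equations for transient two-dimensional incompressible flows in cylindrical coordinates can be written as follows:

continuity

\[
\frac{\partial u}{\partial z} + \frac{\partial v}{\partial r} + \frac{v}{r} = 0 \tag{1}
\]

z-momentum

\[
\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial z} + v\frac{\partial u}{\partial r}
= -\frac{1}{\rho}\frac{\partial p}{\partial z}
+ \frac{\mu}{\rho}\left[\frac{\partial^2 u}{\partial r^2} + \frac{1}{r}\frac{\partial u}{\partial r} + \frac{\partial^2 u}{\partial z^2}\right] + g_z \tag{2}
\]

r-momentum

\[
\frac{\partial v}{\partial t} + u\frac{\partial v}{\partial z} + v\frac{\partial v}{\partial r}
= -\frac{1}{\rho}\frac{\partial p}{\partial r}
+ \frac{\mu}{\rho}\left[\frac{\partial^2 v}{\partial r^2} + \frac{1}{r}\frac{\partial v}{\partial r} - \frac{v}{r^2} + \frac{\partial^2 v}{\partial z^2}\right] + g_r \tag{3}
\]

where $t$ is time, $u$ and $v$ represent the axial ($z$) and radial ($r$) velocity components, respectively, $p$ is static pressure, $\rho$ and $\mu$ are density and dynamic viscosity, respectively, and $g$ stands for gravitational acceleration.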

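The initial and boundary conditions are:

\[
\begin{aligned}
\text{IC:}\quad & @\,t = 0,\ \forall r \wedge \forall z: & u &= 0, & v &= 0 \\
\text{BC 1:}\quad & @\,r = 0,\ \forall t \wedge \forall z: & \frac{\partial u}{\partial r} &= 0, & v &= 0 \\
\text{BC 2:}\quad & @\,r = R,\ \forall t \wedge \forall z: & u &= 0, & v &= 0 \\
\text{BC 3:}\quad & @\,z = 0,\ \forall t \wedge \forall r: & u &= u_{in}, & v &= 0 \\
\text{BC 4:}\quad & @\,z = L,\ \forall t \wedge \forall r: & \frac{\partial^2 u}{\partial z^2} &= 0, & \frac{\partial^2 v}{\partial z^2} &= 0.
\end{aligned}
\]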

BC 1, BC 2 and BC 3 represent axial symmetry, the no-slip velocity boundary condition at the stationary wall and the velocity profile at the inlet, respectively. BC 4 describes the soft boundary condition at the outlet denoting developing flow. Before the sudden start of the flow ($t = 0$), a zero velocity field is prescribed in the entire domain, except at the inlet of the flow geometry where a certain mass flux with a specified velocity profile is set.

NUMERICAL SOLUTION TECHNIQUE

Method of Lines

In this study, the governing equations are solved using the numerical MOL (Schiesser, 1991). Many existing numerical algorithms for transient PDEs can be considered as MOL algorithms. The most important difference between the MOL approach and the conventional methods is that the MOL approach uses higher-order, implicit and hence stable numerical algorithms for time integration. For the numerical solution of the same PDE system, the MOL approach and the conventional methods, which use lower-order explicit or implicit time integration methods, arrive at the same system of ODEs as a result of spatial discretization. Therefore, the stability of the ODE problem, which can only be achieved by scheme-adaptive spatial discretization of the convective terms in a zone-of-dependence manner, must be satisfied not only for the MOL approach but also for the conventional methods. Note, however, that stability of the ODE system resulting from spatial discretization does not necessarily mean that the final solution resulting from time integration will also be stable. So, in order to have stable and accurate solutions, the first condition is to satisfy the ODE problem stability, and the second is to use sophisticated (higher-order and implicit) time integration methods. In this study, the first is provided by an intelligent higher-order spatial discretization scheme based on the five-point Lagrange interpolation polynomial (Oymak and Selcuk, 1997a), and the second, time integration, is achieved by using the higher-order and stable schemes embedded in a quality ODE solver, namely LSODES of the LSODE family (Radhakrishnan and Hindmarsh, 1993), which implements the backward differentiation formula (BDF) method.

Treatment of Pressure Gradient and Computation of Radial Velocity Component

The computation of pressure is the most difficult and time-consuming part of the overall solution of the Navier–Stokes equations and involves an iterative procedure between the velocity and pressure fields through the solution of a Poisson-type equation for pressure to satisfy the global mass flow constraint and the divergence-free condition for confined incompressible flows. Therefore, in this paper, a non-iterative procedure for the calculation of pressure suggested by Raithby and Schneider (1979) and Patankar and Spalding (1972), and applied by Oymak and Selcuk (1996; 1997a,b), is used. This procedure is based on the fact that in the numerical solution of the Navier–Stokes equations for internal flows, the streamwise pressure gradient must be determined in such a way that mass conservation is satisfied at each cross-section. In order to accomplish this, the static pressure $p(z, r, t)$ in the momentum equations is split into two parts

\[
p(z, r, t) = \bar{p}(z, t) + \tilde{p}(z, r, t) \tag{4}
\]

as suggested in Raithby and Schneider (1979) and Patankar and Spalding (1972).

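As can be seen from Eq. (4), $\bar{p}$ is independent of the $r$ direction. Hence, differentiating Eq. (4) with respect to $z$ and $r$ yields

\[
\frac{\partial p}{\partial z} = \frac{\partial \bar{p}}{\partial z} + \frac{\partial \tilde{p}}{\partial z}
\quad\text{and}\quad
\frac{\partial p}{\partial r} = \frac{\partial \tilde{p}}{\partial r}.
\]

The physical assumption in this decoupling procedure is that $\partial\tilde{p}/\partial z$ is very small compared with $\partial\bar{p}/\partial z$. Therefore, when the pressure field is split into two in this manner, the z-momentum equation can be written as

\[
\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial z} + v\frac{\partial u}{\partial r}
= -\frac{1}{\rho}\frac{\partial \bar{p}}{\partial z}
+ \frac{\mu}{\rho}\left[\frac{\partial^2 u}{\partial r^2} + \frac{1}{r}\frac{\partial u}{\partial r} + \frac{\partial^2 u}{\partial z^2}\right] + g_z. \tag{5}
\]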

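The pressure gradient in Eq. (5), $\partial\bar{p}/\partial z$, is determined with the aid of the global mass flow constraint combined with the discretized form of the z-momentum equation as follows:

\[
u^{n+1}_{i,j} = F^{n}_{i,j} + \left(\frac{\partial \bar{p}}{\partial z}\right)^{n}_{j} C^{n} \tag{6}
\]

where

\[
F^{n}_{i,j} = u^{n}_{i,j} - \Delta t \left\{ u^{n}_{i,j}\left(\frac{\partial u}{\partial z}\right)^{n}_{i,j} + v^{n}_{i,j}\left(\frac{\partial u}{\partial r}\right)^{n}_{i,j}
- \nu\left[\left(\frac{\partial^2 u}{\partial z^2}\right)^{n}_{i,j} + \frac{1}{r}\left(\frac{\partial u}{\partial r}\right)^{n}_{i,j} + \left(\frac{\partial^2 u}{\partial r^2}\right)^{n}_{i,j}\right]\right\}, \tag{7}
\]

\[
C^{n} = -\frac{\Delta t}{\rho^{n}}.
\]

Here $\nu = \mu/\rho$ is the kinematic viscosity.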

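Equation (6) is then multiplied by the density $\rho^{n+1}$ and the resulting equation is subsequently integrated numerically over the cross-sectional area perpendicular to the streamwise direction. This yields

\[
\int_0^{2\pi}\!\!\int_0^{R} \rho^{n+1} u^{n+1}_{i,j}\, r\, dr\, d\theta = \dot{m}
= \int_0^{2\pi}\!\!\int_0^{R} \rho^{n+1} F^{n}_{i,j}\, r\, dr\, d\theta
+ \left(\frac{\partial \bar{p}}{\partial z}\right)^{n}_{j} \int_0^{2\pi}\!\!\int_0^{R} \rho^{n+1} C^{n}\, r\, dr\, d\theta. \tag{8}
\]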

Note that the density $\rho^{n+1}$ is not known a priori. Since the mass flow is pre-specified by the problem inlet boundary condition, the pressure gradient $\partial\bar{p}/\partial z$ can be computed from the following expression obtained by rearranging Eq. (8):

\[
\left(\frac{\partial \bar{p}}{\partial z}\right)^{n}_{j} = \frac{2\pi\rho^{n}\int_0^{R} F^{n}_{i,j}\, r\, dr - \dot{m}}{\pi R^{2}\,\Delta t}. \tag{9}
\]
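The rearrangement of Eq. (8) into Eq. (9) can be written out explicitly. The sketch below assumes a density that is uniform over the cross-section and unchanged between time levels, $\rho^{n+1} = \rho^{n} = \rho$, and an axisymmetric $F^{n}_{i,j}$; the text implies both for incompressible flow but does not state them. With $C^{n} = -\Delta t/\rho$,

\[
\int_0^{2\pi}\!\!\int_0^{R} \rho\, C^{n}\, r\, dr\, d\theta = -\Delta t \cdot 2\pi \cdot \frac{R^{2}}{2} = -\pi R^{2}\,\Delta t ,
\]

so that Eq. (8) becomes

\[
\dot{m} = 2\pi\rho \int_0^{R} F^{n}_{i,j}\, r\, dr - \pi R^{2}\,\Delta t \left(\frac{\partial \bar{p}}{\partial z}\right)^{n}_{j}
\quad\Longrightarrow\quad
\left(\frac{\partial \bar{p}}{\partial z}\right)^{n}_{j} = \frac{2\pi\rho \int_0^{R} F^{n}_{i,j}\, r\, dr - \dot{m}}{\pi R^{2}\,\Delta t}.
\]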

The r-component velocity $v(r, z, t)$, on the other hand, is determined directly from the continuity equation. For this purpose the continuity equation [Eq. (1)] is discretized as

\[
v^{n}_{i+1,j} = \frac{r_i}{r_{i+1}}\left[ v^{n}_{i,j} - \Delta r^{+}_{i}\left(\frac{\partial u}{\partial z}\right)^{n}_{i,j}\right]
\quad\text{for } i = 1, \ldots, IR-2;\ \ j = 2, \ldots, JZ;\ \ \Delta r^{+}_{i} = r_{i+1} - r_i. \tag{10}
\]

Hence, with this formulation not only is the r-component velocity $v(r, z, t)$ computed without an extra burden on the ODE solver, but the divergence-free condition for incompressible flows is also satisfied automatically.
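The outward march of Eq. (10) at a single axial station can be sketched as below. The function name and the list-based data layout are hypothetical, indices are zero-based (so the paper's i = 1 is index 0 here), and the axis and wall values are taken from BC 1 and BC 2; this is an illustration of the recurrence, not the MOLS4ME routine.

```python
def radial_velocity(r, dudz):
    """March Eq. (10) from the axis towards the wall at one axial station.

    r    : radial node positions, r[0] = 0 on the axis, r[-1] = R at the wall
    dudz : axial derivative du/dz at the same nodes
    Returns v at every node with v = 0 on the axis and at the wall.
    """
    n = len(r)
    v = [0.0] * n  # v[0] = 0 (symmetry) and v[n-1] = 0 (no slip) stay untouched
    for i in range(n - 2):  # fills v[1] .. v[n-2], as i = 1..IR-2 in Eq. (10)
        dr_plus = r[i + 1] - r[i]
        v[i + 1] = (r[i] / r[i + 1]) * (v[i] - dr_plus * dudz[i])
    return v
```

For example, `radial_velocity([0.0, 1.0, 2.0, 3.0], [-2.0] * 4)` marches a uniform axial deceleration outward; the first off-axis value is zero because the factor $r_i/r_{i+1}$ vanishes on the axis.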

Higher-order Intelligent Spatial Discretization Scheme

In the present study, the spatial derivatives in the Navier–Stokes equations are approximated by utilizing the general definition of the fourth-order, five-point Lagrange interpolation polynomial

\[
y = \sum_{i=1}^{5}\left[\ \prod_{\substack{j=1 \\ j \neq i}}^{5} \frac{x - x_j}{x_i - x_j}\right] y_i \tag{11}
\]

which makes it possible to investigate the solutions of the governing equations by a higher-order discretization scheme on both uniform and non-uniform grid topologies.
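Differentiating Eq. (11) with respect to $x$ gives derivative weights that are valid on uniform and non-uniform grids alike. The sketch below illustrates this; the function name is hypothetical and the routine is a direct differentiation of the Lagrange basis, not the discretization routine of the original code.

```python
def lagrange_derivative_weights(nodes, x):
    """Weights w_i such that dy/dx at x is approximately sum(w_i * y_i).

    nodes : distinct grid coordinates (five for the stencil of Eq. (11))
    x     : point at which the first derivative is wanted
    """
    n = len(nodes)
    weights = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            if k == i:
                continue
            # product rule: differentiate the k-th factor, keep the others
            term = 1.0 / (nodes[i] - nodes[k])
            for j in range(n):
                if j != i and j != k:
                    term *= (x - nodes[j]) / (nodes[i] - nodes[j])
            total += term
        weights.append(total)
    return weights
```

On the uniform stencil `[-2, -1, 0, 1, 2]` evaluated at `x = 0` the weights reduce to the familiar fourth-order central difference (1/12, -8/12, 0, 8/12, -1/12), and on any five distinct nodes the weights differentiate polynomials up to degree four exactly.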

The discretization procedure is applied in both the radial and axial directions after the transformation of the dependent variable concerned, say $\varphi(r, z, t)$, into its pseudo-1-D form, $\varphi(\bar{x}, t)$, in a certain spatial direction $\bar{x}$ for a given value of the remaining direction, by transforming the 2-D array into a 1-D array. Here, $\bar{x}$ stands for either the radial direction $r$ or the axial direction $z$, depending on the direction in which the discretization of the dependent variable is carried out. Details of the transformation and discretization procedure can be found in Oymak (1997). Substitution of the spatial derivatives into the Navier–Stokes equations yields the following coupled system of ODEs in time:

\[
\frac{d\bar{V}}{dt} = \bar{F}(\bar{V}) \tag{12}
\]

where

\[
\bar{V} = \left[u_{1,1}, u_{1,2}, \ldots, u_{IR,JZ}, v_{1,1}, v_{1,2}, \ldots, v_{IR,JZ}\right]^{T} \tag{13}
\]

\[
\bar{F} = \left[F_{1,1}, F_{1,2}, \ldots, F_{IR,JZ}\right]^{T}. \tag{14}
\]

Here $IR$ and $JZ$ are the number of grid points in the r- and z-direction, respectively.
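The ordering of Eq. (13) amounts to flattening the two velocity fields into one array before it is handed to the ODE solver and restoring them afterwards. A minimal sketch, assuming nested lists indexed `[i][j]` with `i` the radial and `j` the axial index (the function names are hypothetical):

```python
def pack(u, v):
    """Flatten two IR x JZ fields into one list: all u values row by row, then all v values."""
    return [value for row in u for value in row] + [value for row in v for value in row]

def unpack(vec, ir, jz):
    """Inverse of pack: rebuild the IR x JZ fields u and v from the 1-D list."""
    n = ir * jz
    u = [vec[i * jz:(i + 1) * jz] for i in range(ir)]
    v = [vec[n + i * jz:n + (i + 1) * jz] for i in range(ir)]
    return u, v
```

A round trip `unpack(pack(u, v), IR, JZ)` returns the original fields, which is a convenient check when the solver only ever sees the 1-D form.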

The convective terms in the governing equations are approximated in such a way that the resulting system of ODEs is stable according to the linear stability theory (Hirsch, 1988; Schiesser, 1991). This is achieved by using a multidimensional intelligent scheme (scheme-adaptive method) which is based on the choice of biased-upwind or biased-downwind stencils for the convective derivatives according to the sign of the coefficient of the associated derivative (Oymak and Selcuk, 1997a).
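The idea of choosing the stencil from the sign of the convective coefficient can be sketched as follows. The exact stencils of Oymak and Selcuk (1997a) are not reproduced in this section, so the layout used here (three nodes on the upstream side, the node itself and one node on the downstream side, shifted inward near the boundaries) and the function name are assumptions for illustration only.

```python
def biased_stencil(i, coeff, n):
    """Return the five node indices used for a convective derivative at node i.

    coeff : coefficient multiplying the derivative (e.g. the local velocity)
    n     : number of nodes along the line, n >= 5
    A non-negative coeff means information arrives from lower indices, so the
    stencil is biased towards them; a negative coeff biases it the other way.
    """
    start = i - 3 if coeff >= 0.0 else i - 1
    start = max(0, min(start, n - 5))  # keep the stencil inside the grid
    return list(range(start, start + 5))
```

For instance, `biased_stencil(5, 1.0, 11)` gives `[2, 3, 4, 5, 6]` and `biased_stencil(5, -1.0, 11)` gives `[4, 5, 6, 7, 8]`.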

Time Integration

The integration of the resulting ODEs derived from the discretization of the Navier–Stokes equations is carried out by an implicit algorithm, namely the BDF embedded in the well-known ODE solver LSODES from the LSODE family (Radhakrishnan and Hindmarsh, 1993). The implicit nature of the solution method requires some additional discussion. In order to illustrate this, a typical implicit formulation for the solution of ODEs can be written in the form of the backward Euler method as

\[
\bar{V}^{n+1} = \bar{V}^{n} + \bar{F}(\bar{V}^{n+1})\,\Delta t \tag{15}
\]

where $\bar{V}^{n+1}$ and $\bar{F}(\bar{V}^{n+1})$ are the solution and the derivative vectors, respectively. As can be seen from Eq. (15), the derivative vector is evaluated at the next time level. In other words, Eq. (15) is implicit in the derivative vector, $\bar{F}(\bar{V}^{n+1})$. It is this implicit term that gives the method its good stability properties. Therefore, the elegance of the MOL is that it shares the advantages of both the explicit and the implicit methods. In the MOL, the spatial derivatives and the source terms are evaluated at the previous time level, as in the explicit approach, so that no linearization problem arises. Furthermore, the solution of the resulting ODEs is carried out by an implicit algorithm such as the implicit Adams–Moulton method or the BDF method. Hence, it can be concluded that the MOL has the simplicity of the explicit approach and the power of the implicit one, unless a poor algorithm for the solution of ODEs is employed.
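The stabilizing effect of the implicit term in Eq. (15) can be seen on the scalar test equation dV/dt = -lam V, for which the backward Euler update can be solved in closed form. The sketch below is illustrative only; the test equation, the parameter values and the function names are assumptions and have no counterpart in the original code.

```python
def explicit_euler(v0, lam, dt, steps):
    """Forward Euler for dV/dt = -lam*V: derivative taken at the old time level."""
    v = v0
    for _ in range(steps):
        v = v + dt * (-lam * v)
    return v

def backward_euler(v0, lam, dt, steps):
    """Backward Euler, Eq. (15): V_new = V + dt*(-lam*V_new), solved for V_new."""
    v = v0
    for _ in range(steps):
        v = v / (1.0 + lam * dt)
    return v
```

With `lam = 1000.0` and `dt = 0.01` the explicit update multiplies the solution by -9 at every step and diverges, while the implicit update multiplies it by 1/11 and decays like the exact solution does.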

Code and its Mode of Operation

MOLS4ME is a general program for computing single-phase, transient, 2-D, incompressible, internal flows, which may be laminar or turbulent, in complex cylindrical geometries (Oymak, 1997). The code uses a body-fitted coordinate system. Therefore, the spacing of the computational grid in the physical domain may be arbitrarily specified as uniform/non-uniform or regular/irregular. Details of the code structure can be found elsewhere (Oymak, 1997).

The mode of operation of MOLS4ME can be described as follows. The general algorithm for the solution of the governing equations of fluid dynamics by using the MOL approach is based on the evaluation of the derivative vector by which the solution is advanced from one time step to the next. Once the derivative vector is obtained, the first step in solving the system of equations is to combine the dependent variables into a one-dimensional array. The evaluation of the derivative vector can be summarized as follows.

The complete velocity field satisfying the continuity equation is known a priori at the beginning of each cycle, either as a result of the previous cycle or from the prescribed initial conditions for the dependent variables. Once the spatial derivatives appearing in the governing equations are evaluated using the values of the present cycle, the normal component of the velocity is calculated directly from the continuity equation, so that the divergence-free condition is ensured automatically. Then the corresponding pressure gradients along the axial direction are calculated to ensure that the mass flow is conserved. Once these calculations have been made, the derivative vector is calculated over the spatial domain of interest and sent to the ODE solver in the form of a one-dimensional array to compute the dependent variables at the advanced time level. This completes the progression of the solution to the end of the new cycle, which yields the new values of the velocity field. At user specified time intervals ($t_p$) the ODE solver sends the current solutions to the main program. This cyclic procedure is then continued until the steady state is reached.

PARALLEL IMPLEMENTATION

For parallel implementation, domain decomposition and the overlapping boundary condition technique are implemented as shown in Fig. 4. The solution domain may be decomposed into as many sub-domains as the number of processors available for parallel processing. The solution process for each sub-domain may then be assigned to a processor. Parallel virtual machine (PVM) message passing software is employed for the information exchange at the overlapped intergrid boundaries (Fig. 4).

The parallel algorithm is based on the master–worker paradigm, where the master process generates the computational grid, sets the initial and physical boundary conditions, decomposes the domain into sub-domains, and sends the related data to the worker processes. The proper inlet and outlet boundary conditions are recognized and applied by the worker processes.

Each worker process implements the same solution algorithm as the serial code, described under the "Code and its mode of operation" section, on the sub-domain assigned to it. It advances in time, exchanges the necessary intergrid boundary condition data with the neighboring sub-domains at user defined time steps ($t_p$), and sends transient results to the master process at user defined time intervals so that the transient solution can be followed until the steady state is reached.

It should be noted that fourth-order accurate spatial discretization is used. To preserve the same accuracy at the intergrid boundaries, three-point overlapping is employed. As a result, at each intergrid boundary, solution variables along three grid lines parallel to the boundary are exchanged between the sub-domains.
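The bookkeeping behind a three-line overlap can be sketched in one dimension as below. Plain Python lists stand in for the PVM send and receive calls of the actual code, the near-equal split of grid lines is an assumption, and all function and key names are hypothetical.

```python
def decompose(n_points, n_sub, overlap=3):
    """Split grid-line indices 0..n_points-1 into n_sub contiguous sub-domains.

    Each entry holds the owned index range and the extended range, which adds
    `overlap` lines borrowed from each neighbouring sub-domain.
    """
    base, extra = divmod(n_points, n_sub)
    parts, lo = [], 0
    for k in range(n_sub):
        hi = lo + base + (1 if k < extra else 0)
        parts.append({"own": (lo, hi),
                      "ext": (max(0, lo - overlap), min(n_points, hi + overlap))})
        lo = hi
    return parts

def scatter(parts, field):
    """Give every sub-domain its extended slice of the global field."""
    return [list(field[p["ext"][0]:p["ext"][1]]) for p in parts]

def exchange(parts, local):
    """Refresh the overlap lines of each sub-domain from the sub-domain owning them."""
    for k, part in enumerate(parts):
        (own_lo, own_hi), (ext_lo, ext_hi) = part["own"], part["ext"]
        for g in list(range(ext_lo, own_lo)) + list(range(own_hi, ext_hi)):
            for m, other in enumerate(parts):
                if other["own"][0] <= g < other["own"][1]:
                    local[k][g - ext_lo] = local[m][g - other["ext"][0]]
                    break
```

Because `exchange` only ever reads values that the sending sub-domain owns, the order in which sub-domains are visited does not matter, which mirrors the fact that neighbours can exchange their boundary lines concurrently.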

Another characteristic of the MOL appears during marching in time. The ODE solver itself determines the time steps ($\Delta t$) for the solution of the set of ODEs, and the solutions at certain time steps ($t_p$) may be requested from the solver. The choice of $t_p$ does not affect the time steps, $\Delta t$, used for the solution of the ODE set. However, in the parallel algorithm the value of $t_p$ has an effect on the overall accuracy because the intergrid boundary condition is applied at every $t_p$ interval. As the value of $t_p$ increases, the frequency of the data exchange at the intergrid boundaries decreases. Preliminary studies show that the value of $t_p$ used in the serial code is sufficient for the accuracy of the parallel code in both transient and steady solutions.

Performance Evaluation

Speed-up ($S$) and efficiency ($E$) of the parallel processing were evaluated by comparing the execution times of the parallel code on one processor ($T_1$) and on $p$ processors ($T_p$). The speed-up and efficiency are defined as

\[
S = \frac{T_1}{T_p}, \tag{16}
\]

\[
E = \frac{S}{p}. \tag{17}
\]
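Equations (16) and (17) translate directly into code. In the sketch below the function names are hypothetical and the efficiency is returned as a fraction; multiplying by 100 gives the percentage used in the nomenclature.

```python
def speed_up(t1, tp):
    """Eq. (16): ratio of one-processor to p-processor execution time."""
    if t1 <= 0.0 or tp <= 0.0:
        raise ValueError("execution times must be positive")
    return t1 / tp

def efficiency(t1, tp, p):
    """Eq. (17): speed-up divided by the number of processors p."""
    if p < 1:
        raise ValueError("number of processors must be at least 1")
    return speed_up(t1, tp) / p
```

As an illustration with made-up timings, a run that takes 120 s on one processor and 40 s on four processors has a speed-up of 3 and an efficiency of 0.75.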

RESULTS AND DISCUSSIONS

The results obtained by both serial and parallel codes for (i) transient 1-D laminar flow in a circular pipe, (ii) transient 2-D laminar flow in a short pipe with sudden expansion and (iii) transient turbulent flow in a gas turbine combustor simulator (GTCS) were compared in terms of accuracy and execution time. For comparative testing purposes, for each test case the computations were carried out using the same grids, time steps, final time and user specified error tolerances for both serial and parallel codes.

The runs were carried out on dual-processor personal computers (PCs) equipped with 700 MHz Pentium III processors and 512 MB of RAM. The operating system used on these machines is Linux with kernel version 2.2.14, and they are connected to an Ethernet network through a 100 Mbps switch. FORTRAN 90 and PVM v.3.4.3 are employed as the programming language and message passing software, respectively.

Test Case 1: Transient 1-D Laminar Flow in a Circular Pipe

Oymak and Selcuk (1993) have investigated the accuracy and efficiency of the sequential codes for MOL and for explicit and implicit FDMs (EFDM and IFDM) on a start-up flow in an infinitely long pipe, for which an exact analytical solution has been provided by Szymanski (Bird et al., 1960). The long pipe of length $L$ and radius $R$ contains a fluid of constant density, $\rho$, and viscosity, $\mu$. Initially, the fluid in the pipe is at rest. At time $t = 0$ a pressure gradient $(p_{in} - p_L)/L$ is applied to the system and the fluid starts to flow. The problem is to determine the variation of the transient velocity profiles. In this study, these sequential codes were parallelized and tested for accuracy and efficiency of parallelization by comparing their results with those of the serial codes.

As the problem is one-dimensional, the domain decomposition was carried out in the radial direction only. Steady state velocity profiles were obtained by running the parallel and serial codes for 101 grid points with time steps of $1 \times 10^{-5}$, $1 \times 10^{-4}$ and $3 \times 10^{-3}$ for EFDM, IFDM and MOL, respectively. Relative and absolute error tolerances of the ODE solver were $10^{-4}$ and $10^{-10}$, respectively, for the MOL solution.

In order to display the discrepancies between the point values of the serial and parallel solutions, percentage absolute errors defined as

\[
\varepsilon = \frac{\left|\phi_{parallel} - \phi_{sequential}\right|}{\phi_{sequential}} \times 100 \tag{18}
\]

were calculated at all grid points for each numerical method of solution. The average and maximum absolute percentage errors for all numerical methods are displayed in Table I. As can be seen from the table, the results obtained by the parallel solutions are in good agreement with those of the serial ones.

TABLE I Accuracy of parallel code for each numerical method

Method    Average error (%)    Maximum error (%)
EFDM      0.3949               0.4912
IFDM      0.5433               0.6257
MOL       0.0246               0.0273
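The average and maximum errors reported in Table I follow from applying Eq. (18) point by point. A minimal sketch is given below; the function name is hypothetical, and skipping points where the serial value is zero (such as a no-slip wall node) is an assumption, since the text does not say how such points were treated.

```python
def error_stats(parallel, sequential):
    """Average and maximum percentage absolute error, Eq. (18), over all grid points."""
    errors = []
    for par, seq in zip(parallel, sequential):
        if seq == 0.0:
            continue  # Eq. (18) is undefined where the reference value is zero
        errors.append(abs(par - seq) / abs(seq) * 100.0)
    if not errors:
        raise ValueError("no grid point with a non-zero reference value")
    return sum(errors) / len(errors), max(errors)
```

With the made-up profiles `[1.01, 2.0, 2.97]` (parallel) and `[1.0, 2.0, 3.0]` (serial), the point errors are 1%, 0% and 1%, so the function returns an average of about 0.67% and a maximum of 1%.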

To evaluate the performance of the parallel programs for all the numerical techniques, the execution times of the serial programs and of the parallel programs run on dual processors, together with the speed-up and efficiency values, are compared in Table II. The execution times of all the parallel programs are much higher than those of the serial ones, which is reflected in speed-up values much less than unity. This is attributed to the simplicity of the test case and the low ratio of computation to communication time for the number of grid points employed. As can be seen from Table II, the efficiency of the MOL is very low compared to those of the EFDM and IFDM because its computational work is much lower than that of the others. This outcome is expected: since the MOL has the shortest execution time, the major portion of that time is occupied by communication for data exchange rather than by computation. However, for multi-dimensional problems with complex geometries, where a large number of grid points is required for accurate solutions, the parallel solution is expected to provide higher speed-up and efficiency values.

Test Case 2: Transient 2-D Laminar Flow in a Short Pipe with Sudden Expansion

The geometry under consideration in this test case is a circular pipe with a sudden expansion. Measurements and numerical predictions of the flow field for the laminar flow of engine oil in the pipe are available in the literature (Macagno and Hung, 1967). The ratio of the inlet diameter $D_{in}$ to the outlet diameter $D$ is 1:2 and the total length, $L$, of the pipe is $20\,D_{in}$. The geometry of the system is shown in Fig. 1.

The original serial code based on the MOL, developed and tested by Oymak and Selcuk (1997a,b), was parallelized in this study. The parallel and serial codes were run for three different Reynolds numbers of 36, 60 and 198, where the Reynolds number is based upon the inlet mean velocity, $u_{in}$, and the inlet diameter, $D_{in}$. Results were obtained for user specified relative and absolute error tolerances of $10^{-1}$ in the ODE solver, $10^{-3}$ for $t_p$ and a final time of 0.5 s.

The solution domain is decomposed into 4 sub-domains

in the axial direction with overlapping boundary regions at

the intersections and the parallel code is executed on one,

two and four processors. The predictive accuracy of the

parallel code is determined by comparing the steady state

results obtained by the parallel code with the ones

obtained by the serial code and by comparing some

representative points in the solution domain during

unsteady solution. The steady state solution comparisons

are given in Fig. 2 for the Reynolds numbers under

consideration. As can be seen from Fig. 2, the results of the serial and parallel codes are in very close agreement. The maximum point error in the solution domain, which occurs where sharp gradients exist, was calculated as 0.5%. Both comparisons show that the parallel code achieves the same predictive accuracy as the serial code.
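The axial decomposition with overlapping boundary regions can be illustrated with a minimal Python sketch. It splits a one-dimensional range of axial grid indices into contiguous blocks and extends each block into its neighbours. The equal-sized split, the overlap width of one grid line and the function name are assumptions made for illustration; the text does not state the overlap width or the grid size used in this test case.

```python
def decompose_axial(n_points, n_sub, overlap=1):
    """Split axial indices 0..n_points-1 into n_sub contiguous blocks.

    Each block is returned as a half-open range (lo, hi) that includes
    `overlap` extra grid lines shared with each neighbouring block.
    """
    base, extra = divmod(n_points, n_sub)
    blocks = []
    start = 0
    for k in range(n_sub):
        size = base + (1 if k < extra else 0)   # spread the remainder
        stop = start + size                     # exclusive end of owned block
        lo = max(0, start - overlap)            # extend into left neighbour
        hi = min(n_points, stop + overlap)      # extend into right neighbour
        blocks.append((lo, hi))
        start = stop
    return blocks


if __name__ == "__main__":
    # Hypothetical example: 12 axial grid lines, 4 sub-domains.
    for k, (lo, hi) in enumerate(decompose_axial(12, 4)):
        print(f"sub-domain {k}: axial indices {lo}..{hi - 1}")
```

The grid lines that appear in two neighbouring blocks are the ones whose values would be exchanged between processors at every data exchange period.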

An important parameter for the comparison of sudden

expansion flow predictions is the reattachment length.

The comparison between predictions of the serial and

parallel computations for the reattachment lengths is

given in Table III for the Reynolds numbers concerned.

As can be seen from the table, the reattachment lengths

predicted by the parallel and serial codes are in favorable

agreement.
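To put a number on this agreement, the Python sketch below computes the relative difference between the parallel and serial reattachment lengths listed in Table III. Taking the serial prediction as the reference value is an assumption made here; the text does not define a specific error measure for this comparison.

```python
# Reattachment lengths in cm from Table III: Re -> (serial, parallel).
TABLE_III = {
    36: (1.62, 1.61),
    60: (2.57, 2.62),
    198: (8.57, 8.53),
}


def relative_difference_percent(serial, parallel):
    """Relative difference of the parallel result, in percent of the serial one."""
    return 100.0 * abs(parallel - serial) / serial


if __name__ == "__main__":
    for re_number, (serial, parallel) in TABLE_III.items():
        diff = relative_difference_percent(serial, parallel)
        print(f"Re = {re_number}: {diff:.2f}%")
```

For all three Reynolds numbers the difference stays below 2% of the serial prediction.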

Speed-up (S) and efficiency (E) of the parallel code for the Reynolds numbers under consideration are given in Fig. 3. As can be seen from the speed-up (S) vs. number of processors (p) curves, the speed-up increases with the number of processors but falls below the ideal case. The same deviation from the ideal case appears in the efficiency (E) vs. number of processors (p) curves. However, the deviation is slight, so the performance of the parallel code can be considered satisfactory, and the code is expected to scale to more processors.

Although the computational load is low because the laminar case requires relatively few grid points, the results obtained, in terms of both speed-up and efficiency, are promising for turbulent cases, where a larger number of grid points is employed.

Test Case 3: Transient Turbulent Flow in a Gas Turbine Combustor Simulator

The sequential code, MOLS4ME (Selcuk and Oymak,

1999), was recently applied to a GTCS with complex

geometry. The GTCS is a cylindrical test combustor rig

43.2 cm in length (excluding the exit cone length of

15.4 cm) and 10.16 cm in diameter. The primary air

stream is injected through a 0.845 cm diameter inlet in

the center of the bluff-body and the secondary air

stream flows through the annular gap formed by

the outer edge of the bluff-body and the rig floor.

FIGURE 1 Physical system and relative dimensions.

TABLE II Performance of the parallel code for each numerical method

Method   Execution time (s)      Speed-up          Efficiency (%)
         Serial     Parallel     (2 processors)
EFDM     249        402          0.62              31
IFDM     34.7       134          0.26              13
MOL      0.22       17.2         0.01              0.5


The details of the GTCS can be found elsewhere

(Selcuk and Oymak, 1999).

As the flow in the GTCS is highly turbulent (Re = 35000), and considering that the code is based on DNS and that DNS requires a number of grid points proportional to Re^(3/2) for the present two-dimensional application, the problem requirement was relaxed by returning to the classical approach, i.e. employing artificial viscosity (Selcuk and Oymak, 1999). Favorable comparisons between steady-state predictions and measurements were obtained for an artificial viscosity of 100μ. However, the success of the code ultimately depends on employing the very large number of grid points required by DNS, which has led to the parallelization of the serial code carried out in this study. For this purpose, the physical domain was decomposed into four sub-domains in the axial direction, as shown in Fig. 4. Serial and parallel codes were run under the same conditions using the same number of grid points, 151 × 151 in the axial and radial directions.

FIGURE 2 Steady-state streamlines and axial velocity contours for (a) Re = 36; (b) Re = 60 and (c) Re = 198.

TABLE III Reattachment lengths predicted by parallel and serial codes for all Reynolds numbers

Reynolds number   Reattachment length (cm)
                  Serial code   Parallel code
36                1.62          1.61
60                2.57          2.62
198               8.57          8.53
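The growth in grid requirements implied by the Re^(3/2) scaling mentioned above can be estimated without knowing the constant of proportionality, by taking the ratio of grid-point counts at two Reynolds numbers. The Python sketch below does this for the GTCS Reynolds number relative to the highest Reynolds number of Test Case 2. Choosing that laminar case as the reference is purely illustrative, and the function name is hypothetical.

```python
def grid_point_ratio(re_high, re_low, exponent=1.5):
    """Ratio of DNS grid-point counts implied by N proportional to Re**exponent."""
    return (re_high / re_low) ** exponent


if __name__ == "__main__":
    ratio = grid_point_ratio(35000.0, 198.0)
    print(f"Re = 35000 needs about {ratio:.0f} times the grid points of Re = 198")
    # Grid actually used in this test case, for comparison.
    print(f"Grid used here: {151 * 151} points")
```

The ratio is in the thousands, which indicates why a full DNS at this Reynolds number was out of reach for the serial code and why artificial viscosity was used on the 151 × 151 grid.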

Figure 5 shows the steady-state solutions obtained by

serial (MOLS4ME) and parallel (PARMOLS4ME) codes.

As can be seen from the figure, the streamline patterns and

the velocity contours predicted by the parallel code are in

good agreement with those of the serial one. Reattachment

lengths at the wall and the center predicted by both codes

are the same.

FIGURE 3 Domain decomposition.


Time-averaged and instantaneous velocities were recorded at some points corresponding to steep gradients in the flow field close to the inlet of the GTCS. The absolute percentage error of the steady-state time-averaged velocity of the parallel code at each point was calculated as the absolute difference between the parallel and serial code results, divided by the maximum inlet velocity (usj) and multiplied by 100. The maximum error was about 2%, for points located in the region of steep gradients.
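The error measure described above can be written as a short Python sketch. The function name and the sample velocities in the usage example are hypothetical and are not data from the study; only the formula (absolute difference divided by the maximum inlet velocity, times 100) is taken from the text.

```python
def abs_percentage_errors(parallel, serial, u_max_inlet):
    """Absolute percentage error at each monitoring point.

    error_i = 100 * |u_parallel_i - u_serial_i| / u_max_inlet
    """
    return [
        100.0 * abs(u_par - u_ser) / u_max_inlet
        for u_par, u_ser in zip(parallel, serial)
    ]


if __name__ == "__main__":
    # Made-up time-averaged velocities at three monitoring points (m/s).
    u_serial = [12.0, 30.5, 44.0]
    u_parallel = [12.3, 30.1, 44.9]
    u_max_inlet = 50.0  # hypothetical maximum inlet velocity (m/s)
    errors = abs_percentage_errors(u_parallel, u_serial, u_max_inlet)
    print([f"{e:.2f}%" for e in errors], "max:", f"{max(errors):.2f}%")
```

Normalizing by the maximum inlet velocity rather than by the local velocity keeps the measure well behaved at points where the local velocity is close to zero, such as inside recirculation zones.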

To investigate the effect of load balancing in parallel computations, the computational domain was first divided into four sub-domains having an equal number of grid points, and the CPU time of each processor was recorded; the result is shown as a pie chart in Fig. 6(a). As can be seen from the figure, the CPU usage is not evenly distributed among the workers when each sub-domain has an equal number of grid points. It should be noted that the MOL solution in each sub-domain adjusts the time step, Δt, based on the solution gradients in that sub-domain. Therefore, in regions of large gradients, smaller time steps are used, resulting in longer execution times in each intergrid boundary data exchange period, tp. This uneven load distribution among the processors leaves some of the processors idle, and hence decreases the parallel efficiency.
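The effect of uneven CPU usage on efficiency can be sketched in Python as follows. The sketch assumes that all processors synchronize at the end of each data exchange period, so that the wall-clock time of a period equals the CPU time of the slowest processor. The CPU times in the usage example are hypothetical and are not the values plotted in Fig. 6.

```python
def idle_fractions(cpu_times):
    """Fraction of an exchange period each processor spends waiting,
    assuming every processor waits for the slowest one."""
    wall = max(cpu_times)
    return [1.0 - t / wall for t in cpu_times]


def load_balance_efficiency(cpu_times):
    """Mean busy time divided by the busy time of the slowest processor."""
    return sum(cpu_times) / (len(cpu_times) * max(cpu_times))


if __name__ == "__main__":
    # Made-up CPU times (s) per exchange period for four sub-domains; the
    # first one is assumed to contain the sharp gradients near the inlet.
    cpu_times = [8.0, 4.0, 2.0, 2.0]
    print("idle fractions:", idle_fractions(cpu_times))
    print("load-balance efficiency:", load_balance_efficiency(cpu_times))
```

With perfectly equal CPU times the load-balance efficiency is 1; any imbalance lowers it, independently of the communication overhead.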

In order to improve the parallel efficiency, the number

of grid points in each sub-domain is adjusted so that the

workload of each sub-domain is evenly distributed in

terms of computational work rather than the number of

grid points. The sub-domains that contain sharp gradients of the dependent variables are assigned fewer grid points than the others. As can be seen from Fig. 6(b), this gives better load balancing between the sub-domains.
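One possible way of choosing such unequal sub-domain sizes is sketched below in Python. It is not the procedure used in this study, which is not described in detail here. The sketch assumes that a relative cost per axial grid line is available, for example from the CPU times measured with the equal split, and places the cuts so that the summed cost of each sub-domain is roughly the same. The function name and the cost values are hypothetical.

```python
from itertools import accumulate


def balanced_cuts(costs, n_sub):
    """Return n_sub half-open index ranges with roughly equal summed cost.

    costs[i] is the assumed relative computational cost of axial grid line i.
    """
    prefix = list(accumulate(costs))
    total = prefix[-1]
    cuts = [0]
    for k in range(1, n_sub):
        target = total * k / n_sub
        # First grid line at which the cumulative cost reaches the k-th target.
        idx = next(i for i, c in enumerate(prefix) if c >= target) + 1
        # Keep this sub-domain and all remaining ones non-empty.
        idx = max(idx, cuts[-1] + 1)
        idx = min(idx, len(costs) - (n_sub - k))
        cuts.append(idx)
    cuts.append(len(costs))
    return list(zip(cuts[:-1], cuts[1:]))


if __name__ == "__main__":
    # Made-up costs: the first two grid lines lie in a sharp-gradient region
    # and are four times as expensive as the rest.
    costs = [4, 4, 1, 1, 1, 1, 1, 1, 1, 1]
    print(balanced_cuts(costs, 2))
```

In this example the sub-domain covering the expensive region receives two grid lines and the other receives eight, so both carry the same total cost.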

Speed-up and efficiencies with and without load

balancing are shown in Fig. 7. As can be seen from the

figure, load balancing improves the parallel efficiency

significantly. However, as the number of processors

increases, the communication overhead increases and

efficiency decreases. In the case of 16 sub-domains, due to

the small number of grid points in each sub-domain, load

balancing was not performed.

FIGURE 5 Steady-state streamlines and axial velocity contours for the parallel and serial codes.

FIGURE 4 Speed-up and efficiency versus number of processors.

CONCLUSION

In this study, a parallelization strategy for a novel transient Navier–Stokes solver (MOLS4ME) for the direct numerical simulation of incompressible separated internal flows in complex cylindrical geometries is presented. The domain decomposition technique and PVM as the message-passing paradigm were used together with an improved static load balancing technique. Overlapping sub-grid boundaries were implemented to satisfy the intergrid boundary condition between sub-domains. A cluster of dual-processor PCs was used as the parallel processing platform.

The performance of the parallel code (PARMOLS4ME) was tested by applying it to two test cases, (i) laminar pipe flow with sudden expansion and (ii) turbulent flow in a GTCS, and comparing the results of the serial and parallel codes. The numerical experiments carried out in this study show that up to nine-fold speed-up with 16 processors is possible without any loss in numerical accuracy, and that load balancing based on computational workload rather than on the number of grid points in the sub-domains plays an important role in parallel efficiency.

The parallel code developed in this study constitutes a major improvement on the serial code employed in the prediction of transient, incompressible, separated internal flows and provides an algorithm for future DNS applications.

Acknowledgements

This study was performed as part of Middle East Technical

University research funding projects AFP-2000-03-04-02

and AFP-2001-03-04-02. The support is gratefully

acknowledged.

FIGURE 7 Speed-up and efficiency versus number of processors.

FIGURE 6 CPU distribution among the processors for (a) equal number of grid points and (b) unequal number of grid points.


References

Bird, R.B., Stewart, W.E. and Lightfoot, E.N. (1960) Transport Phenomena (John Wiley & Sons, New York).

Eggels, J.G.M., Unger, F., Weiss, M.H., Westerweel, J., Adrian, R.J., Friedrich, R. and Nieuwstadt, E.T.M. (1994) “Fully developed turbulent pipe flow: a comparison between direct numerical simulation and experiment”, J. Fluid Mech. 268, 175–209.

Ersahin, C. (2001) “Parallelization of a transient Navier–Stokes code based on method of lines solution”, M.Sc. Thesis, Chem. Eng. Dept., Middle East Technical University (Turkey).

Hirsch, C. (1988) Numerical Computation of Internal and External Flows: Fundamentals of Numerical Discretization (John Wiley & Sons, Chichester) Vol. 1.

Kim, J., Moin, P. and Moser, R. (1987) “Turbulence statistics in fully developed channel flow at low Reynolds number”, J. Fluid Mech. 177, 133–166.

Le, H., Moin, P. and Kim, J. (1997) “Direct simulation of turbulent flow over a backward-facing step”, J. Fluid Mech. 330, 349–374.

Macagno, O.E. and Hung, T.-K. (1967) “Computational and experimental study of a captive annular Eddy”, J. Fluid Mech. 28, 43–64.

Oymak, O. (1997) “Method of lines solution of time-dependent two-dimensional Navier–Stokes equations for incompressible separated internal flows”, Ph.D. Thesis, Chem. Eng. Dept., Middle East Technical University (Turkey).

Oymak, O. and Selcuk, N. (1993) “MOL vs FDM solutions of an unsteady viscous flow problem”, Proc. Eighth Int. Conf. on Numerical Methods in Laminar and Turbulent Flows, Swansea, 1, 153–162.

Oymak, O. and Selcuk, N. (1996) “Method of lines solution of time-dependent two-dimensional Navier–Stokes equations”, Int. J. Numer. Meth. Fluids 23, 455–466.

Oymak, O. and Selcuk, N. (1997a) “Transient simulation of internal separated flows using an intelligent higher-order spatial discretization scheme”, Int. J. Numer. Meth. Fluids 24, 759–769.

Oymak, O. and Selcuk, N. (1997b) “The method of lines solution of time-dependent 2D Navier–Stokes equations coupled with the energy equation”, Int. Symp. on Advances in Computational Heat Transfer, Cesme, Turkey.

Parneix, S., Laurence, D. and Durbin, P.A. (1998) “A procedure for using DNS databases”, J. Fluids Eng. 120, 40–47.

Patankar, S.V. and Spalding, D.B. (1972) “A calculation procedure for heat, mass and momentum transfer in three-dimensional parabolic flows”, Int. J. Heat Mass Transfer 15, 1787–1806.

Radhakrishnan, K. and Hindmarsh, A.C. (1993) “Description and use of LSODE, The Livermore solver for ordinary differential equations”, Technical Report UCRL-ID-113855, Lawrence Livermore National Laboratory, NASA.

Raithby, G.D. and Schneider, G.E. (1979) “Numerical solution of problems in incompressible fluid flow: treatment of the velocity pressure coupling”, Numer. Heat Transfer 2, 417–440.

Schiesser, W.E. (1991) The numerical method of lines integration of partial differential equations (Academic Press, Inc., San Diego).

Selcuk, N. and Oymak, N. (1999) “A novel code for the prediction of transient flow field in a gas turbine combustor simulator”, AVT Symposium on Gas Turbine Engine Combustion, Emissions and Alternative Fuels, 12–16 October 1998, Lisbon, Portugal, NATO/RTO Meeting Proceedings 14 11/1-10.

Selcuk, N., Tarhan, T. and Tanrıkulu, S. (2002) “Comparison of method of lines and finite difference solutions of 2-D Navier–Stokes equations for transient laminar pipe flow”, Int. J. Numer. Meth. Eng. 53, 1615–1628.

Tarhan, T. and Selcuk, N. (2001) “Method of lines for transient flow fields”, Int. J. Comput. Fluid Dyn. 15, 309–328.
