International Journal of Computational Fluid Dynamics
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/gcfd20
Parallelization of a Transient Method of Lines Navier–Stokes Code
Cem Erşahin, Tanil Tarhan, Ismail H. Tuncer & Nevin Selçuk
Aeronautical Engineering Department, Middle East Technical University, 06531, Ankara, Turkey
Published online: 25 Jan 2007.
To cite this article: Cem Erşahin, Tanil Tarhan, Ismail H. Tuncer & Nevin Selçuk (2004) Parallelization of a Transient Method of Lines Navier–Stokes Code, International Journal of Computational Fluid Dynamics, 18:1, 81-92, DOI: 10.1080/1061856031000094673
To link to this article: http://dx.doi.org/10.1080/1061856031000094673
Parallelization of a Transient Method of Lines Navier–StokesCode
CEM ERSAHINa,*, TANIL TARHANa,†, ISMAIL H. TUNCERb,‡ and NEVIN SELCUKa,{
aChemical Engineering Department; bAeronautical Engineering Department, Middle East Technical University, 06531 Ankara, Turkey
(Received 5 March 2002; In final form 9 December 2002)
Parallel implementation of a serial code, namely method of lines (MOL) solution for momentum equations (MOLS4ME), previously developed for the solution of transient Navier–Stokes equations for incompressible separated internal flows in regular and complex geometries, is described.
The serial code was parallelized by using a domain decomposition strategy with overlapped boundaries at the interfaces for information exchange between the sub-domains. Parallel virtual machine (PVM) and dual-processor personal computers (PCs) connected over a switch are employed as the message passing software and the parallel computing platform, respectively.
Performance of the parallelization strategy was first tested on a transient 1-D laminar pipe flow problem for MOL, explicit and implicit finite difference methods. The performance of the parallel code (PARMOLS4ME) was then assessed by applying it to two test cases, (i) laminar pipe flow with sudden expansion and (ii) turbulent flow in a gas turbine combustor simulator (GTCS), and comparing the results of the serial and parallel codes. Comparisons show that the flow fields predicted by the parallel codes agree well with those of the serial codes at considerably lower execution times. Effects of load balancing are also investigated, and load balancing is seen to have a significant effect on the execution time of the parallel code.
Keywords: Parallel computing; Domain decomposition; PVM; Direct numerical simulation; Method of lines; Navier–Stokes equations
NOMENCLATURE
D    Diameter, cm
E    Efficiency, %
g    Gravitational acceleration, cm/s²
L    Length, cm
p    Number of processors; Static pressure, g/(cm s²)
r    Distance in radial direction, cm
Re   Reynolds number
S    Speed-up
Sa   Numerical speed-up of parallel code on one-processor system
t    Time, s
tf   Time step for the transient results to be sent from worker to master, s
tp   Time step for the boundary exchange, s
Tp   Execution time of parallel code on p-processor system, s
T1   Execution time of parallel code on one-processor system, s
u    Instantaneous axial velocity, cm/s
v    Instantaneous radial velocity, cm/s
z    Distance in axial direction, cm
ISSN 1061-8562 print/ISSN 1029-0257 online © 2004 Taylor & Francis Ltd
DOI: 10.1080/1061856031000094673
*E-mail: [email protected]
†E-mail: [email protected]
‡E-mail: [email protected]
{Corresponding author. E-mail: [email protected]
International Journal of Computational Fluid Dynamics, January 2004 Vol. 18 (1), pp. 81–92
Greek letters
ε    Percentage error
φ    Dimensionless velocity
μ    Dynamic viscosity, g/(cm s)
ρ    Density, g/cm³
τ    Dimensionless time
Subscripts
in Inlet
r r-direction
z z-direction
sj Side jet
INTRODUCTION
Recent emphasis on the prediction of transient behavior
of turbulent, reacting and radiating flows in most
industrial combustion chambers such as gas turbines,
combustors, rocket engines, etc. has led to the
investigation of accurate and computationally efficient
numerical methods for the simultaneous solution of the
time-dependent conservation equations for mass,
momentum, energy and species. Therefore, extensive
research has been devoted to the development of
computational fluid dynamics (CFD) codes and testing
their predictive accuracies by comparing their predic-
tions with either experimental data or results of other
numerical simulation or solution techniques. Research
carried out in the past two decades has shown that direct
numerical simulation (DNS) is the most accurate approach
for the prediction of laminar and turbulent flow fields in
complex cylindrical and rectangular geometries (Eggels
et al., 1994; Oymak and Selcuk, 1996; 1997a,b; Kim et al.,
1987; Le et al., 1997; Parneix et al., 1998; Selcuk and
Oymak, 1999; Tarhan and Selcuk, 2001). However, because
the direct simulation of turbulent flows needs a large
number of grid points and time steps, both accurate and
efficient numerical techniques and high performance
computers are required to complete the simulation in a
short computation time. The former requirement can be met
by increasing the order of the spatial discretization
method, which gives high accuracy with fewer grid points,
and by using a numerical algorithm for the time integration
that is both highly accurate and stable. The method of
lines (MOL) is an alternative approach that meets this
requirement for time-dependent problems. The latter
requirement is met by either supercomputers or parallel
computers, which require efficient parallel algorithms.
The MOL consists of converting the partial differen-
tial equations (PDE) into an ordinary differential
equation (ODE) initial value problem by discretizing
the spatial derivatives together with the boundary
conditions via finite differences, finite elements, finite
volumes or weighted residual techniques and integrating
the resulting ODEs using a sophisticated ODE solver
which takes the burden of time discretization
and chooses the time steps in such a way that maintains
the accuracy and stability of the evolving solution.
The most important advantage of the MOL approach is
that it has not only the simplicity of the implementation
of the explicit methods used for evaluation of spatial
derivatives but also the superiority of the implicit
methods. The advantages of the MOL approach are
two-fold. First, it is possible to achieve higher-order
approximations in the discretization of spatial deriva-
tives without a significant increase in computational
complexity except in the boundaries. Second, compar-
able orders of accuracy can also be obtained in the time
steps when highly efficient and reliable initial value
ODE solvers are used.
Recently, a novel code (MOL solution for momentum
equations, MOLS4ME) satisfying the requirement of an
accurate and efficient numerical algorithm, was developed
for the solution of time-dependent 2-D Navier–Stokes
equations for incompressible separated internal flows in
complex cylindrical and rectangular geometries (Selcuk
and Oymak, 1999; Tarhan and Selcuk, 2001). The validity
and the predictive ability of the code were tested by
applying it to the prediction of flow fields in both laminar
and turbulent flows without and with sudden expansion,
and comparing its predictions with either measured data or
numerical results available in the literature. The predicted
flow fields were found to be in perfect agreement with
measurements for laminar flow. The success of the code
for turbulent flows, however, was found to depend strongly
on the number of grid points, which cannot be handled by
most of the present day computers (Selcuk and Oymak,
1999). This problem can be overcome by developing an
efficient parallel algorithm for the sequential code,
MOLS4ME.
In an attempt to achieve this objective, sequential
codes previously developed for (i) transient 1-D laminar
flow in a circular pipe (ii) transient 2-D laminar flow in a
short pipe with sudden expansion are also parallelized in
this study (Ersahin, 2001). These test cases were selected
first because of the recently demonstrated superiority of
the MOL over finite difference methods (FDMs) in terms of
accuracy, central processing unit (CPU) time and set-up
time (Oymak and Selcuk, 1993; Selcuk et al., 2002),
together with its flexibility for incorporating other
conservation equations, and second because sequential
codes are available for these test cases for comparative
testing. The first test case provides the evaluation of
the performances of parallel implementation of MOL
and FDMs on a simple 1-D problem while the second
test case displays the effect of parallelization on a
multi-dimensional problem with regular geometry. To the
authors’ knowledge, parallel implementation of Navier–
Stokes flow solvers based on MOL solutions is not
available to date.
GOVERNING EQUATIONS
The Navier–Stokes equations for transient two-dimen-
sional incompressible flows in cylindrical coordinates can
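be written as follows:

continuity

$$\frac{\partial u}{\partial z} + \frac{\partial v}{\partial r} + \frac{v}{r} = 0 \qquad (1)$$

z-momentum

$$\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial z} + v\frac{\partial u}{\partial r} = -\frac{1}{\rho}\frac{\partial p}{\partial z} + \frac{\mu}{\rho}\left(\frac{\partial^2 u}{\partial r^2} + \frac{1}{r}\frac{\partial u}{\partial r} + \frac{\partial^2 u}{\partial z^2}\right) + g_z \qquad (2)$$

r-momentum

$$\frac{\partial v}{\partial t} + u\frac{\partial v}{\partial z} + v\frac{\partial v}{\partial r} = -\frac{1}{\rho}\frac{\partial p}{\partial r} + \frac{\mu}{\rho}\left(\frac{\partial^2 v}{\partial r^2} + \frac{1}{r}\frac{\partial v}{\partial r} - \frac{v}{r^2} + \frac{\partial^2 v}{\partial z^2}\right) + g_r \qquad (3)$$
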
where t is time, u and v represent axial (z) and radial (r)
velocity components, respectively, p is static pressure, ρ
and μ are density and dynamic viscosity, respectively, and
g stands for gravitational acceleration.
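The initial and boundary conditions are:

$$
\begin{aligned}
&\text{IC:} && \text{at } t = 0,\ \forall r \wedge \forall z: && u = 0,\quad v = 0 \\
&\text{BC 1:} && \text{at } r = 0,\ \forall t \wedge \forall z: && \frac{\partial u}{\partial r} = 0,\quad v = 0 \\
&\text{BC 2:} && \text{at } r = R,\ \forall t \wedge \forall z: && u = 0,\quad v = 0 \\
&\text{BC 3:} && \text{at } z = 0,\ \forall t \wedge \forall r: && u = u_{in},\quad v = 0 \\
&\text{BC 4:} && \text{at } z = L,\ \forall t \wedge \forall r: && \frac{\partial^2 u}{\partial z^2} = 0,\quad \frac{\partial^2 v}{\partial z^2} = 0.
\end{aligned}
$$
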
BC 1, BC 2 and BC 3 represent axial symmetry, no-
slip velocity boundary condition at the stationary wall
and the velocity profile at the inlet, respectively. BC 4
describes the soft boundary condition at the outlet
denoting developing flow. Before the sudden start of
the flow (t = 0), a zero velocity field is prescribed in
the entire domain, except at the inlet of the flow
geometry where a certain mass flux with a specified
velocity profile is set.
NUMERICAL SOLUTION TECHNIQUE
Method of Lines
In this study, the governing equations are solved using
the numerical MOL (Schiesser, 1991). Many existing
numerical algorithms for transient PDEs can be
considered as MOL algorithms. The most important
difference of the MOL approach from the conventional
methods is that in the MOL approach higher-order,
implicit and hence stable numerical algorithms for time
integration are used. For the numerical solution of the
same PDE system, the MOL approach and the
conventional methods, in which lower-order either
explicit or implicit time integration methods are used,
have the same system of ODEs as a result of spatial
discretization. Therefore, the stability of the ODE
problem, which can only be achieved by
scheme-adaptive spatial discretization of convective terms
in a zone-of-dependence manner, should be satisfied not only
for the MOL approach but also for the conventional
methods. However, note that satisfaction of the stability
of the ODE system as a result of spatial discretization
does not necessarily mean that the final solution as a
result of time integration will also be stable. So, in order
to have absolutely stable and accurate solutions, the first
condition is to satisfy the ODE problem stability, and
the second is to use sophisticated (higher-order and
implicit) time integration methods. In this study, the first
is provided by utilizing an intelligent higher-order
spatial discretization scheme, namely five-point
Lagrange interpolation polynomial, of Oymak and
Selcuk (1997a), and the second, time integration, is
achieved by using higher-order and stable schemes
embedded in a quality ODE solver, namely, LSODES of
the LSODE family (Radhakrishnan and Hindmarsh, 1993)
in which the backward differentiation formula (BDF)
method is accommodated.
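The two-stage structure described above (spatial discretization first, then an implicit ODE integrator) can be illustrated on a much simpler problem. The following is only a minimal sketch under stated assumptions, not the MOLS4ME implementation: it treats the 1-D diffusion equation ∂u/∂t = ∂²u/∂x² instead of the Navier–Stokes equations, uses second-order central differences in place of the five-point Lagrange scheme, and replaces LSODES by a fixed-step first-order BDF (backward Euler) solved with a tridiagonal elimination. The function names `thomas` and `mol_diffusion` are hypothetical.

```python
import math


def thomas(a, b, c, d):
    """Solve a tridiagonal system (a: sub-, b: main, c: super-diagonal)."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def mol_diffusion(n_interior=49, dt=1.0e-3, t_end=0.1):
    """MOL for du/dt = d2u/dx2 on [0, 1] with u = 0 at both ends.

    Step 1 (space): central differences turn the PDE into the ODE system
                    dV/dt = A V for the interior nodal values V.
    Step 2 (time):  backward Euler (first-order BDF, an assumption made here
                    for brevity) advances V by solving (I - dt*A) V_new = V.
    """
    dx = 1.0 / (n_interior + 1)
    x = [(i + 1) * dx for i in range(n_interior)]
    u = [math.sin(math.pi * xi) for xi in x]   # initial condition
    lam = dt / dx**2
    a = [-lam] * n_interior                    # sub-diagonal of I - dt*A
    b = [1.0 + 2.0 * lam] * n_interior         # main diagonal
    c = [-lam] * n_interior                    # super-diagonal
    for _ in range(round(t_end / dt)):
        u = thomas(a, b, c, u)
    return x, u


x, u = mol_diffusion()
exact = [math.exp(-math.pi**2 * 0.1) * math.sin(math.pi * xi) for xi in x]
max_error = max(abs(ui - ei) for ui, ei in zip(u, exact))
```

A production MOL code would hand the right-hand side to a variable-order, variable-step solver so that the time step adapts to the requested tolerances; the fixed step here is purely illustrative.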
Treatment of Pressure Gradient and Computation of Radial Velocity Component
The computation of pressure is the most difficult and
time-consuming part of the overall solution of the
Navier–Stokes equations and involves an iterative
procedure between the velocity and pressure fields
through the solution of a Poisson-type equation for
pressure to satisfy the global mass flow constraint and
the divergence-free condition for confined incompres-
sible flows. Therefore, in this paper, a non-iterative
procedure for the calculation of pressure suggested by
Raithby and Schneider (1979) and Patankar and
Spalding (1972), and applied by Oymak and Selcuk
(1996; 1997a,b), is used. This procedure is based on the
fact that in the numerical solution of the Navier–Stokes
equations for internal flows, the streamwise pressure
gradient must be known in such a way that the mass
conservation at each cross-section is satisfied. In order to
accomplish this, the static pressure p (z,r,t) in the
momentum equations is split into two parts
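
$$p(z, r, t) = \bar{p}(z, t) + \tilde{p}(z, r, t) \qquad (4)$$

as suggested in Raithby and Schneider (1979) and
Patankar and Spalding (1972).
As can be seen from Eq. (4), $\bar{p}$ is independent of the r
direction. Hence, differentiating Eq. (4) with respect to the
z and r directions yields

$$\frac{\partial p}{\partial z} = \frac{\partial \bar{p}}{\partial z} + \frac{\partial \tilde{p}}{\partial z}$$

and

$$\frac{\partial p}{\partial r} = \frac{\partial \tilde{p}}{\partial r}.$$

The physical assumption in this decoupling procedure is
that $\partial\tilde{p}/\partial z$ is very small compared with $\partial\bar{p}/\partial z$. Therefore,
when the pressure field is split into two in this manner, the
z-momentum equation can be written as

$$\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial z} + v\frac{\partial u}{\partial r} = -\frac{1}{\rho}\frac{\partial \bar{p}}{\partial z} + \frac{\mu}{\rho}\left(\frac{\partial^2 u}{\partial r^2} + \frac{1}{r}\frac{\partial u}{\partial r} + \frac{\partial^2 u}{\partial z^2}\right) + g_z. \qquad (5)$$

The pressure gradient in Eq. (5), $\partial\bar{p}/\partial z$, is determined
with the aid of the global mass flow constraint combined
with the discretized form of the z-momentum equation as
follows: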
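
$$u^{n+1}_{i,j} = F^{n}_{i,j} + \left(\frac{\partial \bar{p}}{\partial z}\right)^{n}_{j} C^{n} \qquad (6)$$

where

$$F^{n}_{i,j} = u^{n}_{i,j} - \Delta t\left\{ u^{n}_{i,j}\left(\frac{\partial u}{\partial z}\right)^{n}_{i,j} + v^{n}_{i,j}\left(\frac{\partial u}{\partial r}\right)^{n}_{i,j} - \nu\left[\left(\frac{\partial^2 u}{\partial z^2}\right)^{n}_{i,j} + \frac{1}{r}\left(\frac{\partial u}{\partial r}\right)^{n}_{i,j} + \left(\frac{\partial^2 u}{\partial r^2}\right)^{n}_{i,j}\right]\right\}, \qquad (7)$$

$$C^{n} = -\frac{\Delta t}{\rho^{n}},$$

and ν = μ/ρ is the kinematic viscosity.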
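Equation (6) is then multiplied by the density $\rho^{n+1}$ and
the resulting equation is subsequently integrated numerically
over the cross-sectional area perpendicular to
the streamwise direction. This yields

$$\int_0^{2\pi}\!\!\int_0^R \rho^{n+1} u^{n+1}_{i,j}\, r\, dr\, d\theta = \dot{m} = \int_0^{2\pi}\!\!\int_0^R \rho^{n+1} F^{n}_{i,j}\, r\, dr\, d\theta + \left(\frac{\partial \bar{p}}{\partial z}\right)^{n}_{j} \int_0^{2\pi}\!\!\int_0^R \rho^{n+1} C^{n}\, r\, dr\, d\theta. \qquad (8)$$

Note that the density $\rho^{n+1}$ is not known a priori.
Since the mass flow is pre-specified by the problem
inlet boundary condition, the pressure gradient $\partial\bar{p}/\partial z$
can be computed from the following expression
obtained by rearranging Eq. (8):

$$\left(\frac{\partial \bar{p}}{\partial z}\right)^{n}_{j} = \frac{2\pi\rho^{n}\int_0^R F^{n}_{i,j}\, r\, dr - \dot{m}}{\pi R^2 \Delta t}. \qquad (9)$$
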
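The way Eqs. (6) and (9) work together can be checked numerically. The sketch below is an illustration under stated assumptions rather than the authors' code: the density is taken as constant, the radial integral is evaluated with the trapezoidal rule (the quadrature used in MOLS4ME is not stated here), and the function names and the sample profile for F are hypothetical.

```python
import math


def trapz(y, x):
    """Composite trapezoidal rule on a possibly non-uniform grid."""
    return sum(0.5 * (y[k] + y[k + 1]) * (x[k + 1] - x[k])
               for k in range(len(x) - 1))


def pressure_gradient(F, r, rho, dt, mdot):
    """Eq. (9): axial pressure gradient that enforces the mass flow mdot."""
    R = r[-1]
    integral = trapz([Fi * ri for Fi, ri in zip(F, r)], r)
    return (2.0 * math.pi * rho * integral - mdot) / (math.pi * R**2 * dt)


def corrected_velocity(F, dpdz, rho, dt):
    """Eq. (6) with C = -dt / rho, applied at one axial station."""
    return [Fi - dt / rho * dpdz for Fi in F]


# Hypothetical data in cgs units: unit radius, unit density.
r = [k * 0.01 for k in range(101)]
F = [1.0 - ri**2 for ri in r]          # provisional velocity profile
rho, dt, mdot = 1.0, 1.0e-3, 2.0

dpdz = pressure_gradient(F, r, rho, dt, mdot)
u_new = corrected_velocity(F, dpdz, rho, dt)
m_new = 2.0 * math.pi * rho * trapz([ui * ri for ui, ri in zip(u_new, r)], r)
```

With these inputs the corrected profile `u_new` carries the prescribed mass flow to round-off, which is the purpose of the non-iterative procedure: no Poisson equation for the pressure has to be solved.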
The r-component velocity v(r, z, t), on the other hand, is
determined with the direct utilization of the continuity
equation. For this purpose the continuity equation [Eq. (1)]
is discretized as

$$v^{n}_{i+1,j} = \frac{r_i}{r_{i+1}}\left[v^{n}_{i,j} - \Delta r_i^{+}\left(\frac{\partial u}{\partial z}\right)^{n}_{i,j}\right] \quad \text{for } i = 1, \ldots, IR-2;\ j = 2, \ldots, JZ;\ \Delta r_i^{+} = r_{i+1} - r_i. \qquad (10)$$

Hence, by this formulation not only is the r-component
velocity v(r, z, t) computed without an extra burden to
the ODE solver, but also the divergence-free condition for
incompressible flows is satisfied automatically.
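A minimal sketch of the outward march in Eq. (10) is given below. It assumes a single axial station, zero-based indexing with the first node on the symmetry axis (where v = 0 by BC 1), and a march over all radial nodes; the index limits of the original code and the function name `radial_velocity` are not reproduced from the source.

```python
def radial_velocity(r, dudz, v_axis=0.0):
    """March Eq. (10) outward from the axis along one axial station.

    r     : radial node coordinates, with r[0] = 0 on the symmetry axis
    dudz  : axial derivative of u evaluated at the same nodes
    v_axis: radial velocity on the axis (zero by symmetry)
    """
    v = [v_axis] * len(r)
    for i in range(len(r) - 1):
        dr_plus = r[i + 1] - r[i]
        v[i + 1] = r[i] / r[i + 1] * (v[i] - dr_plus * dudz[i])
    return v


# Hypothetical check: for a constant du/dz = c the continuity equation
# gives v = -c * r / 2 exactly, so the march should stay close to that line.
r = [k * 0.01 for k in range(101)]
v = radial_velocity(r, [2.0] * len(r))
```

Because the march is a one-sided first-order update, the computed values differ from the exact line v = -r by at most one grid spacing times c/2 in this example.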
Higher-order Intelligent Spatial Discretization Scheme
In the present study, the spatial derivatives in the Navier–
Stokes equations are approximated by utilizing the general
definition of the fourth-order, five-point Lagrange
interpolation polynomial

$$y = \sum_{i=1}^{5} \; \prod_{\substack{j=1 \\ j \neq i}}^{5} \frac{x - x_j}{x_i - x_j}\; y_i \qquad (11)$$

which makes it possible to investigate the solutions of the
governing equations by a higher-order discretization
scheme on both uniform and non-uniform grid topology.
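Differentiating Eq. (11) with respect to x gives a set of five weights that turn nodal values into a derivative estimate on any grid, uniform or not. The sketch below shows one way to compute those weights for the first derivative; it is an illustration of the formula rather than the routine used in MOLS4ME, and the function name `lagrange_derivative_weights` is hypothetical.

```python
def lagrange_derivative_weights(xs, x):
    """Return weights w_i such that dy/dx at x is approximated by sum(w_i * y_i).

    Each weight is the derivative of the Lagrange basis polynomial
    L_i(x) = prod_{j != i} (x - x_j) / (x_i - x_j), obtained with the
    product rule.
    """
    n = len(xs)
    weights = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            if k == i:
                continue
            term = 1.0 / (xs[i] - xs[k])
            for j in range(n):
                if j != i and j != k:
                    term *= (x - xs[j]) / (xs[i] - xs[j])
            total += term
        weights.append(total)
    return weights


# Non-uniform five-point stencil (hypothetical coordinates).
xs = [0.0, 0.1, 0.25, 0.45, 0.7]
w = lagrange_derivative_weights(xs, 0.25)
dydx = sum(wi * xi**4 for wi, xi in zip(w, xs))   # derivative of y = x**4
```

On a uniform stencil centred on the evaluation point the weights reduce to the familiar fourth-order central difference coefficients 1/12, -2/3, 0, 2/3, -1/12 (divided by the grid spacing).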
The discretization procedure is applied on both radial
and axial directions after the transformation of the
concerned dependent variable, say φ(r, z, t), into its
pseudo-1-D form, φ($\bar{x}$, t), in a certain spatial direction $\bar{x}$,
for a fixed value of the remaining direction, by transforming
the 2-D array into a 1-D array. Here, $\bar{x}$ stands for either the
radial direction r or the axial direction z, depending on the
direction in which discretization of the dependent variable is carried
out. Details of the transformation and discretization
procedure can be found in Oymak (1997). Substitution
of the spatial derivatives into the Navier–Stokes
equations constitutes the following coupled system of
ODEs in time:

$$\frac{d\bar{V}}{dt} = \bar{F}(\bar{V}) \qquad (12)$$

where

$$\bar{V} = \left[u_{1,1}, u_{1,2}, \ldots, u_{IR,JZ}, v_{1,1}, v_{1,2}, \ldots, v_{IR,JZ}\right]^{T} \qquad (13)$$

$$\bar{F} = \left[F_{1,1}, F_{1,2}, \ldots, F_{IR,JZ}\right]^{T}. \qquad (14)$$

Here IR and JZ are the number of grid points in the
r- and z-direction, respectively.
The convective terms in the governing equation are
approximated in such a way that the resulting system of
ODEs should be stable according to the linear stability
theory (Hirsch, 1988; Schiesser, 1991). This is achieved
by using a multidimensional intelligent scheme (scheme-
adaptive method) which is based on the choice of biased-
upwind or biased-downwind stencils for the convective
derivatives according to the sign of the coefficient of the
associated derivative (Oymak and Selcuk, 1997a).
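The idea of selecting the stencil from the sign of the convective coefficient can be sketched as follows. This is a hypothetical illustration only: the exact stencils of Oymak and Selcuk (1997a) are not reproduced in this paper, so the offsets used here (three points on the upstream side, one on the downstream side) and the shifting of the stencil near the ends of the grid are assumptions, as is the function name `convective_stencil`.

```python
def convective_stencil(i, n, coeff):
    """Indices of a five-point stencil for the convective derivative at node i.

    coeff >= 0: information travels toward increasing index, so the stencil
                leans toward lower indices (i-3 .. i+1).
    coeff <  0: the stencil leans toward higher indices (i-1 .. i+3).
    Near the ends of the grid the stencil is shifted so that it stays
    inside the range 0 .. n-1.
    """
    start = i - 3 if coeff >= 0.0 else i - 1
    start = max(0, min(start, n - 5))
    return list(range(start, start + 5))


upwind = convective_stencil(10, 21, coeff=1.0)
downwind = convective_stencil(10, 21, coeff=-1.0)
```

The returned indices could then be passed, together with the node coordinates, to a routine that evaluates derivative weights from the five-point Lagrange polynomial.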
Time Integration
The integration of the resulting ODEs derived from the
discretization of the Navier–Stokes equations is carried
out by an implicit algorithm, namely, BDF embedded
in the well-known ODE solver LSODES from the
LSODE family (Radhakrishnan and Hindmarsh, 1993).
The implicit nature of the solution method requires some
additional discussion. In order to illustrate this, a typical
implicit formulation for the solution of ODEs can be
written in the form of backward Euler method as

$$\bar{V}^{n+1} = \bar{V}^{n} + \bar{F}(\bar{V}^{n+1})\,\Delta t \qquad (15)$$

where $\bar{V}^{n+1}$ and $\bar{F}(\bar{V}^{n+1})$ are the solution and the
derivative vectors, respectively. As can be seen from
Eq. (15), the derivative vector is evaluated at the next time
level. In other words, Eq. (15) is implicit in the derivative
vector, $\bar{F}(\bar{V}^{n+1})$. It is this implicit term that gives the
method its good stability properties. Therefore, the
elegance of the MOL is that it shares the advantages of
both the explicit and the implicit methods. In the MOL, the
spatial derivatives and the source terms are evaluated at
the previous time level as applied in the explicit approach,
so that no linearization problem arises. Furthermore, the
solution of the resulting ODEs is carried out by an implicit
algorithm such as the implicit Adams–Moulton method or
the BDF method. Hence, it can be concluded that the MOL
has the simplicity of the explicit approach and the power
of the implicit one, unless a poor algorithm for the solution
of ODEs is employed.
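The stability benefit of the implicit term in Eq. (15) can be seen on the scalar test equation dV/dt = -λV. The sketch below compares the backward Euler update of Eq. (15) with the explicit (forward) Euler update for a step size well beyond the explicit stability limit; the values of λ and Δt are hypothetical and chosen only to make the contrast visible.

```python
def forward_euler(lam, dt, steps, v0=1.0):
    """Explicit update: the derivative is evaluated at the old time level."""
    v = v0
    for _ in range(steps):
        v = v + dt * (-lam * v)
    return v


def backward_euler(lam, dt, steps, v0=1.0):
    """Eq. (15) for F(V) = -lam * V, solved for the new value in closed form."""
    v = v0
    for _ in range(steps):
        v = v / (1.0 + lam * dt)
    return v


lam, dt, steps = 1000.0, 0.01, 20      # lam * dt = 10, far above the explicit limit of 2
v_explicit = forward_euler(lam, dt, steps)
v_implicit = backward_euler(lam, dt, steps)
```

The explicit result grows without bound in magnitude, whereas the implicit result decays toward zero as the exact solution does. For the nonlinear systems solved by LSODES the implicit update requires an iterative solve at each step instead of the closed form used here.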
Code and its Mode of Operation
The MOLS4ME is a general program for computing
single-phase, transient, 2D, incompressible, internal flows
which may be laminar or turbulent in complex cylindrical
geometries (Oymak, 1997). The code uses a body-fitted
coordinate system. Therefore, the spacing of the
computational grid in the physical domain may be
arbitrarily specified as uniform/non-uniform or regular/-
irregular. Details of the code structure can be found
elsewhere (Oymak, 1997).
The mode of operation of MOLS4ME can be described
as follows. The general algorithm for the solution of the
governing equations of fluid dynamics by using the MOL
approach is based on the evaluation of the derivative
vector by which the solution is advanced from one time
step to the next. Once the derivative vector is obtained, the
first step in solving the system of equations is to combine
the dependent variables into a one-dimensional array.
The evaluation of the derivative vector can be summarized
as follows.
The complete velocity field satisfying the continuity
equation, is known a priori at the beginning of each cycle,
either as a result of the previous cycle or from the
prescribed initial conditions for the dependent variables.
Once the spatial derivatives appearing in the governing
equations are evaluated using values of the present
cycle, the normal component of the velocity is calculated
by the direct utilization of the continuity equation, so
that the divergence-free condition is ensured automati-
cally. Then the corresponding pressure gradients along the
axial direction are calculated to ensure that the mass flow
is conserved. Once these calculations have been made, the
derivative vector is calculated over the spatial domain of
interest, then it is sent to the ODE solver in the form of
one-dimensional array to compute the dependent variables
at the advanced time level. This completes the progression
of the solution to the end of the new cycle having the new
values of the velocity field. At user specified time intervals
(tp) the ODE solver sends the current solutions to the main
program. This cyclic procedure is then continued until the
steady state is reached.
PARALLEL IMPLEMENTATION
For parallel implementation, domain decomposition and
the overlapping boundary condition technique are
implemented as shown in Fig. 4. The solution domain
may be decomposed into as many sub-domains as the
number of processors available for parallel processing.
The solution process for each sub-domain may then be
assigned to a processor. Parallel virtual machine (PVM)
message passing software is employed for the information
exchange at the overlapped intergrid boundaries (Fig. 4).
The parallel algorithm is based on the master–worker
paradigm, where the master process generates the
computational grid, sets the initial and physical boundary
conditions, decomposes the domain into sub-domains, and
sends the related data to the worker processes. The proper
inlet and outlet boundary conditions are recognized and
applied by the worker processes.
Each worker process implements the same solution
algorithm as the serial code, described under the “Code
and its mode of operation” section, on the sub-domain
assigned to itself, advances in time and exchanges
necessary intergrid boundary condition data with the
neighboring sub-domains at user defined time steps (tp)
and sends transient results to the master process at user
defined time intervals to follow the transient solution until
the steady-state is reached.
It should be noted that fourth-order accurate spatial
discretization is used. To preserve the same accuracy at the
intergrid boundaries, three point overlapping is employed.
As a result, at each intergrid boundary, solution variables
along three grid lines parallel to the boundary are
exchanged between the sub-domains.
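The overlapped decomposition can be sketched on a 1-D array of axial grid lines. The example below is a single-process illustration under stated assumptions: list copies stand in for the PVM send and receive calls, each sub-domain is assumed to keep three extra grid lines beyond each interior boundary, and every sub-domain is assumed to own at least three lines. The names `decompose` and `exchange` are hypothetical.

```python
OVERLAP = 3  # three grid lines exchanged at each intergrid boundary


def decompose(n, p, overlap=OVERLAP):
    """Split n grid lines into p sub-domains.

    Returns tuples (own_lo, own_hi, ext_lo, ext_hi) of half-open index
    ranges: the lines a sub-domain owns and the extended range that also
    holds the overlapped lines copied from its neighbours.
    """
    base, extra = divmod(n, p)
    ranges, lo = [], 0
    for k in range(p):
        hi = lo + base + (1 if k < extra else 0)
        ranges.append((lo, hi, max(0, lo - overlap), min(n, hi + overlap)))
        lo = hi
    return ranges


def exchange(local_fields, ranges):
    """Refresh the overlapped lines of every sub-domain from their owners."""
    for k, (lo, hi, ext_lo, ext_hi) in enumerate(ranges):
        if k > 0:                                  # lines owned by neighbour k-1
            left_ext_lo = ranges[k - 1][2]
            for g in range(ext_lo, lo):
                local_fields[k][g - ext_lo] = local_fields[k - 1][g - left_ext_lo]
        if k < len(ranges) - 1:                    # lines owned by neighbour k+1
            right_ext_lo = ranges[k + 1][2]
            for g in range(hi, ext_hi):
                local_fields[k][g - ext_lo] = local_fields[k + 1][g - right_ext_lo]


# Hypothetical field on 40 grid lines split over 4 sub-domains.
n, p = 40, 4
ranges = decompose(n, p)
field = [float(i) for i in range(n)]
local_fields = [[field[g] if lo <= g < hi else -1.0 for g in range(e_lo, e_hi)]
                for lo, hi, e_lo, e_hi in ranges]
exchange(local_fields, ranges)
```

After the exchange every sub-domain holds current values on its overlapped lines, so a five-point stencil applied at the last owned line sees the same data as in the serial code.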
Another characteristic of the MOL appears during
marching in time. The ODE solver itself determines the
time steps (Δt) for the solution of the set of ODEs, and the
solutions at certain time intervals (tp) may be requested
from the solver. The choice of tp does not affect the time
steps, Δt, used for the solution of the ODE set. However,
in the parallel algorithm the value of tp has an effect on
the overall accuracy, because the intergrid boundary
condition is applied at each time interval tp. As the value
of tp increases, the frequency of the data exchange at the
intergrid boundaries decreases. Preliminary studies show
that the value of tp used in the serial code is sufficient for
the accuracy of the parallel code in both transient and
steady solutions.
Performance Evaluation
Speed-up (S) and efficiency (E) of the parallel processing
were evaluated by comparing the execution times of the
parallel code on single-processor (T1) and p-processor (Tp)
runs. The speed-up and efficiency are defined as

$$S = \frac{T_1}{T_p}, \qquad (16)$$

$$E = \frac{S}{p}. \qquad (17)$$

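As a worked instance of Eqs. (16) and (17), the short sketch below evaluates both quantities for the EFDM timings reported in Table II (249 s and 402 s on two processors). Note that Table II uses the serial execution time as the reference time, which is taken as T1 here; the function names are hypothetical.

```python
def speed_up(t1, tp):
    """Eq. (16): ratio of the reference time to the p-processor time."""
    return t1 / tp


def efficiency(t1, tp, p):
    """Eq. (17): speed-up per processor."""
    return speed_up(t1, tp) / p


S = speed_up(249.0, 402.0)             # EFDM row of Table II
E = efficiency(249.0, 402.0, p=2)
summary = f"S = {S:.2f}, E = {100 * E:.0f}%"
```

A speed-up below unity, as in this case, means that the two-processor run takes longer than the reference run.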
RESULTS AND DISCUSSIONS
The results obtained by both serial and parallel codes for
(i) transient 1-D laminar flow in a circular pipe, (ii)
transient 2-D laminar flow in a short pipe with sudden
expansion and (iii) transient turbulent flow in a gas turbine
combustor simulator (GTCS) were compared in terms of
accuracy and execution time. For comparative testing
purposes, computations were carried out using the same
grids, time steps, final time and user specified error
tolerances for both serial and parallel codes and for each
test case, respectively.
The runs were carried out on dual-processor personal
computers (PCs), which are equipped with Pentium III-
700 MHz processors and with 512 MB RAM. The
operating system used on these machines is Linux kernel
version 2.2.14 and they are on an ethernet network through
a 100 Mbps switch. FORTRAN 90 and PVM v.3.4.3 are
employed as the programming language and message
passing software, respectively.
Test Case 1: Transient 1-D Laminar Flow in a Circular Pipe
Oymak and Selcuk (1993) have investigated the accuracy
and efficiency of the sequential codes for MOL, explicit
and implicit FDMs on a start-up flow in an infinitely long
pipe for which an exact analytical solution has been
provided by Szymanski (Bird et al., 1960). The long pipe
of length L and radius R contains a fluid at constant
density, ρ, and viscosity, μ. Initially, the fluid in the pipe is
at rest. At time t = 0 a pressure gradient $(p_{in} - p_L)/L$ is
applied to the system and the fluid starts to flow. The
problem is to determine the variation of transient velocity
profiles. In this study, these sequential codes were
parallelized and tested for accuracy and efficiency of
parallelization by comparing their results with those of
serial codes.
As the problem is one-dimensional, the domain
decomposition was carried out in the radial direction
only. Steady state velocity profiles were obtained by
running parallel and serial codes for 101 grid points with a
time step of $1 \times 10^{-5}$, $1 \times 10^{-4}$ and $3 \times 10^{-3}$ for the explicit
FDM (EFDM), the implicit FDM (IFDM) and the MOL, respectively.
Relative and absolute error tolerances of the ODE solver were
$10^{-4}$ and $10^{-10}$, respectively, for the MOL solution.
In order to display the discrepancies between point
values of the serial and parallel solutions, percentage
absolute errors defined as

$$\varepsilon = \frac{\left|\Phi_{\text{parallel}} - \Phi_{\text{sequential}}\right|}{\Phi_{\text{sequential}}} \times 100 \qquad (18)$$

were calculated for all grid points and for each numerical
method of solution. Average and maximum absolute
percentage errors for all numerical methods are
displayed in Table I. As can be seen from the table, the
results obtained by the parallel solution are in good
agreement with those of the serial ones.
TABLE I Accuracy of parallel code for each numerical method

Method    Average error (%)    Maximum error (%)
EFDM      0.3949               0.4912
IFDM      0.5433               0.6257
MOL       0.0246               0.0273
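The average and maximum errors of the kind reported in Table I follow from applying Eq. (18) at every grid point. A minimal sketch is given below; the sample values are hypothetical, the absolute value of the reference is used in the denominator, and points where the reference solution is zero (for example at a no-slip wall) are skipped, which is an assumption made here to avoid division by zero.

```python
def percentage_errors(parallel, sequential):
    """Return (average, maximum) of Eq. (18) over all usable grid points."""
    errors = [abs(p - s) / abs(s) * 100.0
              for p, s in zip(parallel, sequential) if s != 0.0]
    return sum(errors) / len(errors), max(errors)


# Hypothetical nodal values; the last point mimics a wall node with zero velocity.
sequential = [1.0, 2.0, 0.0]
parallel = [1.01, 2.0, 0.0]
average_error, maximum_error = percentage_errors(parallel, sequential)
```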
For the evaluation of performance of the parallel
programs for all the numerical techniques, the execution
times of the serial programs and of the parallel programs
run on two processors, together with the speed-up and
efficiency values, are summarized in Table II. The execution
times of all the parallel programs are higher than those of
the serial ones, which is reflected by speed-up values
between 0.01 and 0.62. This is attributed to the simplicity
of the test case and the low ratio of computation to
communication time for the number of grid points employed.
As can be seen from Table II, the efficiency of the MOL is
very low compared to those of the EFDM and IFDM because
its computational work, and hence its execution time, is
much smaller than that of the others. The major
portion of its execution time is occupied by communication
for data exchange rather than computation.
However, for multi-dimensional problems with complex
geometries where a large number of grid points is required
for accurate solutions, parallel solution is expected to
provide higher speed-up and efficiency values.
Test Case 2: Transient 2-D Laminar Flow in a Short Pipe with Sudden Expansion
The geometry under consideration in this test case is a
circular pipe with a sudden expansion. Measurements and
numerical predictions of the flow field for the laminar flow
of engine oil in the pipe are available in the literature
(Macagno and Hung, 1967). The aspect ratio of the inlet
diameter Din to the outlet diameter D is 1:2 and the total
length, L, of the pipe is 20 Din. The geometry of the system
is shown in Fig. 1.
The original serial code based on the MOL developed
and tested by Oymak and Selcuk (1997a,b) was parallelized
in this study. The parallel and serial codes were run for
three different Reynolds numbers of 36, 60 and 198,
where the Reynolds number is based upon the inlet mean
velocity, uin, and the inlet diameter, Din. Results were
obtained for user specified relative and absolute error
tolerances of $10^{-1}$ in the ODE solver, $10^{-3}$ for tp and for
a final time of 0.5 s.
The solution domain is decomposed into 4 sub-domains
in the axial direction with overlapping boundary regions at
the intersections and the parallel code is executed on one,
two and four processors. The predictive accuracy of the
parallel code is determined by comparing the steady state
results obtained by the parallel code with the ones
obtained by the serial code and by comparing some
representative points in the solution domain during
unsteady solution. The steady state solution comparisons
are given in Fig. 2 for the Reynolds numbers under
consideration. As can be seen from Fig. 2, results of both
the serial and parallel code are in perfect agreement with
each other. The maximum point error in the solution
domain, where sharp gradients exist, has been calculated
as 0.5%. Both comparisons show that the parallel code
satisfies the same predictive accuracy as the serial code.
An important parameter for the comparison of sudden
expansion flow predictions is the reattachment length.
The comparison between predictions of the serial and
parallel computations for the reattachment lengths is
given in Table III for the Reynolds numbers concerned.
As can be seen from the table, the reattachment lengths
predicted by the parallel and serial codes are in favorable
agreement.
Speed-up (S) and efficiency (E) of the parallel code
for the Reynolds numbers under consideration are given
in Fig. 3. As can be seen from the speed-up (S) vs.
number of processors (p) curves, increasing the
number of processors increases the speed-up, although
with a deviation from the ideal case. This deviation
can also be seen in the efficiency
(E) vs. number of processors (p) curves. However,
the deviation is slight; it can therefore be concluded
that the performance of the parallel
code is satisfactory and that the code should scale to more
processors.
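For reference, the two measures can be evaluated as in the sketch below. It assumes the usual definitions, S = (serial execution time)/(parallel execution time) and E = S/p, which reproduce the two-processor entries of Table II; the function names are hypothetical.

```python
def speed_up(t_serial, t_parallel):
    """Speed-up S: serial execution time divided by parallel execution time."""
    return t_serial / t_parallel


def efficiency_percent(t_serial, t_parallel, processors):
    """Efficiency E in percent: speed-up divided by the number of processors."""
    return speed_up(t_serial, t_parallel) / processors * 100.0


# Execution times (s) transcribed from Table II: method -> (serial, parallel),
# with the parallel runs on 2 processors.
table_ii_times = {
    "EFDM": (249.0, 402.0),
    "IFDM": (34.7, 134.0),
    "MOL": (0.22, 17.2),
}

results = {
    method: (speed_up(ts, tp), efficiency_percent(ts, tp, 2))
    for method, (ts, tp) in table_ii_times.items()
}

for method, (s, e) in results.items():
    print(f"{method}: S = {s:.2f}, E = {e:.1f}%")
```

The ideal case corresponds to S = p and E = 100%. The same definitions give an efficiency of roughly 56% for the nine-fold speed-up on 16 processors reported in the Conclusion.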
Although the computational load is low because the
laminar case requires relatively few grid points, the
results obtained, both in terms of speed-up and efficiency,
are promising for turbulent cases, where a larger
number of grid points is employed.
Test Case 3: Transient Turbulent Flow in a Gas
Turbine Combustor Simulator
The sequential code, MOLS4ME (Selcuk and Oymak,
1999), was recently applied to a GTCS with complex
geometry. The GTCS is a cylindrical test combustor rig
43.2 cm in length (excluding the exit cone length of
15.4 cm) and 10.16 cm in diameter. The primary air
stream is injected through a 0.845 cm diameter inlet in
the center of the bluff-body and the secondary air
stream flows through the annular gap formed by
the outer edge of the bluff-body and the rig floor.
FIGURE 1 Physical system and relative dimensions.
TABLE II Performance of the parallel code for each numerical method

Method   Execution time (s)      Speed-up         Efficiency
         Serial     Parallel     (2 processors)   (%)
EFDM     249        402          0.62             31
IFDM     34.7       134          0.26             13
MOL      0.22       17.2         0.01             0.5
The details of the GTCS can be found elsewhere
(Selcuk and Oymak, 1999).
As the flow in the GTCS is highly turbulent (Re = 35000),
and considering that the code is based on DNS
and that DNS requires a number of grid points
proportional to $Re^{3/2}$ for the present two-dimensional
application, the problem requirement was relaxed by
returning to the classical approach, i.e. employing
artificial viscosity (Selcuk and Oymak, 1999). Favorable
FIGURE 2 Steady-state streamlines and axial velocity contours for (a) Re = 36; (b) Re = 60 and (c) Re = 198.
TABLE III Reattachment lengths predicted by parallel and serial codes for all Reynolds numbers

Reynolds   Reattachment length (cm)
number     Serial code   Parallel code
36         1.62          1.61
60         2.57          2.62
198        8.57          8.53
comparisons between steady-state predictions and
measurements were obtained for an artificial viscosity
of 100μ. However, the success of the code ultimately
depends on providing the very large number of grid points
required by DNS, which motivated the parallelization of the
serial code carried out in this study. For this purpose the
physical domain was decomposed into four sub-domains
in the axial direction as shown in Fig. 4. Serial and
parallel codes were run under the same conditions using
the same number of grid points, 151 × 151 in the axial
and radial directions.
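The axial decomposition described above can be sketched as an index partition. This is an illustration under stated assumptions rather than the scheme used in PARMOLS4ME: the function name is hypothetical, and the overlap width of one grid line per interface is a placeholder, since the width of the overlapping regions is not given here.

```python
def decompose_axial(n_points, n_sub, overlap=1):
    """Split n_points axial grid lines into n_sub overlapping index ranges.

    Returns a list of half-open ranges (start, stop). Each sub-domain owns a
    contiguous core block and is extended by `overlap` grid lines into each
    neighbour; the extension is clipped at the ends of the domain.
    The default overlap of 1 is a hypothetical value.
    """
    base, extra = divmod(n_points, n_sub)
    ranges = []
    core_start = 0
    for k in range(n_sub):
        # The first `extra` sub-domains take one additional grid line.
        core_stop = core_start + base + (1 if k < extra else 0)
        start = max(0, core_start - overlap)
        stop = min(n_points, core_stop + overlap)
        ranges.append((start, stop))
        core_start = core_stop
    return ranges


# 151 axial grid lines split into four sub-domains, as in the present test case.
subdomains = decompose_axial(151, 4, overlap=1)
for k, (start, stop) in enumerate(subdomains):
    print(f"sub-domain {k}: axial indices {start} to {stop - 1}")
```

Adjacent ranges share grid lines, and it is the values on these shared lines that would be exchanged between neighbouring processes to satisfy the intergrid boundary condition.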
Figure 5 shows the steady-state solutions obtained by
serial (MOLS4ME) and parallel (PARMOLS4ME) codes.
As can be seen from the figure, the streamline patterns and
the velocity contours predicted by the parallel code are in
good agreement with those of the serial one. Reattachment
lengths at the wall and the center predicted by both codes
are the same.
FIGURE 3 Domain decomposition.
Time-averaged and instantaneous velocities at some
points corresponding to steep gradients in the flow field
close to the inlet of the GTCS were recorded. Absolute
percentage errors of the steady-state time-averaged
velocities of the parallel code were calculated for all
points under consideration as the absolute difference
between the parallel and serial code results divided by the
maximum inlet velocity (usj) and multiplied by 100. The
maximum error is about 2% and occurs at the points located in
the region of steep gradients.
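The error measure defined above amounts to the following calculation. The sketch is illustrative: the function names are hypothetical and the sample velocities are made-up numbers chosen only to exercise the formula, not values from the computations reported here.

```python
def percent_error(u_parallel, u_serial, u_ref):
    """Absolute difference between the two codes, relative to u_ref, in percent."""
    return abs(u_parallel - u_serial) / u_ref * 100.0


def max_percent_error(parallel, serial, u_ref):
    """Largest percentage error over a set of monitoring points."""
    return max(percent_error(p, s, u_ref) for p, s in zip(parallel, serial))


# Made-up time-averaged velocities at three monitoring points (illustration only).
u_serial_samples = [10.0, 25.0, 40.0]
u_parallel_samples = [10.5, 24.0, 40.2]
u_reference = 50.0  # stands in for the maximum inlet velocity

worst = max_percent_error(u_parallel_samples, u_serial_samples, u_reference)
print(f"maximum error: {worst:.2f}%")
```

Normalising by a single reference velocity rather than by the local velocity keeps the measure bounded at points where the local velocity is close to zero, such as inside recirculation zones.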
To investigate the effect of load balancing in parallel
computations, the computational domain is first divided
into four sub-domains having an equal number of grid
points and the CPU time of each processor is recorded and
shown in a pie chart in Fig. 6 (a). As can be seen from the
figure, the CPU usage is not evenly distributed among the
workers for an equal number of grid points in each sub-
domain. It should be noted that the MOL solution in each
sub-domain adjusts the time step, Δt, based on the solution
gradients in that domain. Therefore, in regions of large
gradients, smaller time steps are used, resulting in longer
execution times in each intergrid boundary data exchange
period, tp. This uneven load distribution among the
processors leads to idleness of some of the processors, and
hence decreases the parallel efficiency.
In order to improve the parallel efficiency, the number
of grid points in each sub-domain is adjusted so that the
workload of each sub-domain is evenly distributed in
terms of computational work rather than the number of
grid points. The sub-domains, which contain sharp
gradients of dependent variables, are assigned a lower
number of grid points than the others. As can be seen
from Fig. 6(b), this ensures a better load balancing
between the sub-domains.
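One way to arrive at such a distribution is sketched below. This is not necessarily the procedure followed in this study: it assumes that the CPU time per grid point measured in each sub-domain for the equal split stays the same after redistribution, which is only approximately true because the regions of sharp gradients do not move with the sub-domain boundaries. The function name and the sample timings are hypothetical.

```python
def rebalance(points, cpu_times):
    """Redistribute grid points so that the estimated CPU times are equal.

    points    -- grid points currently assigned to each sub-domain
    cpu_times -- CPU time measured for each sub-domain with that assignment
    Assumes (hypothetically) a fixed cost per grid point in each sub-domain.
    """
    total = sum(points)
    cost_per_point = [t / n for t, n in zip(cpu_times, points)]
    # Equal work means the point count is inversely proportional to the cost.
    weights = [1.0 / c for c in cost_per_point]
    scale = total / sum(weights)
    ideal = [w * scale for w in weights]
    new_points = [int(x) for x in ideal]
    # Hand out the points lost to truncation, largest fractional part first.
    leftover = total - sum(new_points)
    by_fraction = sorted(
        range(len(ideal)), key=lambda k: ideal[k] - new_points[k], reverse=True
    )
    for k in by_fraction[:leftover]:
        new_points[k] += 1
    return new_points


# Made-up timings for an equal split into four sub-domains (illustration only).
equal_points = [40, 40, 40, 40]
measured_cpu = [80.0, 40.0, 20.0, 20.0]

balanced_points = rebalance(equal_points, measured_cpu)
estimated_cpu = [
    n * t / m for n, t, m in zip(balanced_points, measured_cpu, equal_points)
]
print(balanced_points, estimated_cpu)
```

In this example the sub-domain with the highest cost per point receives the fewest grid points, which is the behaviour described above. Because the allocation is fixed before the run, this is a static scheme; it cannot follow gradients that migrate between sub-domains during the transient.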
Speed-up and efficiencies with and without load
balancing are shown in Fig. 7. As can be seen from the
figure, load balancing improves the parallel efficiency
significantly. However, as the number of processors
increases, the communication overhead increases and
efficiency decreases. In the case of 16 sub-domains, due to
the small number of grid points in each sub-domain, load
balancing was not performed.
FIGURE 5 Steady-state streamlines and axial velocity contours for parallel code and serial code.
FIGURE 4 Speed-up and efficiency versus number of processors.
CONCLUSION
In this study, a parallelization strategy for a novel transient
Navier–Stokes solver (MOLS4ME) for the direct
numerical simulation of incompressible separated internal
flows in complex cylindrical geometries is presented.
The domain decomposition technique and PVM as
a message passing paradigm were utilized with
an improved static load balancing technique. Overlapped
sub-grid boundaries are implemented to satisfy
the intergrid boundary condition between sub-domains. A
cluster of PCs with dual-processors was used as the parallel
processing platform.
Performance of the parallel code (PARMOLS4ME) was
tested by applying it to two test cases, (i) laminar pipe flow
with sudden expansion and (ii) turbulent flow in a GTCS,
and comparing the results of the serial and parallel codes.
The numerical experiments carried out in this study show
that up to nine-fold speed-up with 16 processors is
possible without any loss in numerical accuracy and that
load balancing based on computational workload rather
than on the number of grid points in the sub-domains
plays an important role in the parallel efficiency.
The parallel code developed in this study constitutes a
major improvement to the serial code employed in the
prediction of transient, incompressible, separated internal
flows and provides an algorithm for future DNS
applications.
Acknowledgements
This study was performed as part of Middle East Technical
University research funding projects AFP-2000-03-04-02
and AFP-2001-03-04-02. The support is gratefully
acknowledged.
FIGURE 7 Speed-up and efficiency versus number of processors.
FIGURE 6 CPU distribution among the processors for (a) equal number of grid points (b) unequal number of grid points.
References
Bird, R.B., Stewart, W.E. and Lightfoot, E.N. (1960) Transport Phenomena (John Wiley & Sons, New York).
Eggels, J.G.M., Unger, F., Weiss, M.H., Westerweel, J., Adrian, R.J., Friedrich, R. and Nieuwstadt, E.T.M. (1994) “Fully developed turbulent pipe flow: a comparison between direct numerical simulation and experiment”, J. Fluid Mech. 268, 175–209.
Ersahin, C. (2001) “Parallelization of a transient Navier–Stokes code based on method of lines solution”, M.Sc. Thesis, Chem. Eng. Dept., Middle East Technical University (Turkey).
Hirsch, C. (1988) Numerical Computation of Internal and External Flows: Fundamentals of Numerical Discretization (John Wiley & Sons, Chichester) Vol. 1.
Kim, J., Moin, P. and Moser, R. (1987) “Turbulence statistics in fully developed channel flow at low Reynolds number”, J. Fluid Mech. 177, 133–166.
Le, H., Moin, P. and Kim, J. (1997) “Direct simulation of turbulent flow over a backward-facing step”, J. Fluid Mech. 330, 349–374.
Macagno, O.E. and Hung, T.-K. (1967) “Computational and experimental study of a captive annular eddy”, J. Fluid Mech. 28, 43–64.
Oymak, O. (1997) “Method of lines solution of time-dependent two-dimensional Navier–Stokes equations for incompressible separated internal flows”, Ph.D. Thesis, Chem. Eng. Dept., Middle East Technical University (Turkey).
Oymak, O. and Selcuk, N. (1993) “MOL vs FDM solutions of an unsteady viscous flow problem”, Proc. Eighth Int. Conf. on Numerical Methods in Laminar and Turbulent Flows, Swansea, 1, 153–162.
Oymak, O. and Selcuk, N. (1996) “Method of lines solution of time-dependent two-dimensional Navier–Stokes equations”, Int. J. Numer. Meth. Fluids 23, 455–466.
Oymak, O. and Selcuk, N. (1997a) “Transient simulation of internal separated flows using an intelligent higher-order spatial discretization scheme”, Int. J. Numer. Meth. Fluids 24, 759–769.
Oymak, O. and Selcuk, N. (1997b) “The method of lines solution of time-dependent 2D Navier–Stokes equations coupled with the energy equation”, Int. Symp. on Advances in Computational Heat Transfer, Cesme, Turkey.
Parneix, S., Laurence, D. and Durbin, P.A. (1998) “A procedure for using DNS databases”, J. Fluids Eng. 120, 40–47.
Patankar, S.V. and Spalding, D.B. (1972) “A calculation procedure for heat, mass and momentum transfer in three-dimensional parabolic flows”, Int. J. Heat Mass Transfer 15, 1787–1806.
Radhakrishnan, K. and Hindmarsh, A.C. (1993) “Description and use of LSODE, the Livermore solver for ordinary differential equations”, Technical Report UCRL-ID-113855, Lawrence Livermore National Laboratory, NASA.
Raithby, G.D. and Schneider, G.E. (1979) “Numerical solution of problems in incompressible fluid flow: treatment of the velocity-pressure coupling”, Numer. Heat Transfer 2, 417–440.
Schiesser, W.E. (1991) The Numerical Method of Lines: Integration of Partial Differential Equations (Academic Press, Inc., San Diego).
Selcuk, N. and Oymak, N. (1999) “A novel code for the prediction of transient flow field in a gas turbine combustor simulator”, AVT Symposium on Gas Turbine Engine Combustion, Emissions and Alternative Fuels, 12–16 October 1998, Lisbon, Portugal, NATO/RTO Meeting Proceedings 14 11/1-10.
Selcuk, N., Tarhan, T. and Tanrıkulu, S. (2002) “Comparison of method of lines and finite difference solutions of 2-D Navier–Stokes equations for transient laminar pipe flow”, Int. J. Numer. Meth. Eng. 53, 1615–1628.
Tarhan, T. and Selcuk, N. (2001) “Method of lines for transient flow fields”, Int. J. Comput. Fluid Dyn. 15, 309–328.