AIAA 90-0333
Parallel Numerical Algorithms for Fluid Dynamics Simulation
A. Lin, Temple Univ., Philadelphia, PA
28th Aerospace Sciences Meeting, January 8-11, 1990 / Reno, Nevada
For permission to copy or republish, contact the American Institute of Aeronautics and Astronautics, 370 L'Enfant Promenade, S.W., Washington, D.C. 20024
AIAA 90-0333

Parallel Numerical Algorithms for Fluid Dynamics Simulation

Avi Lin*
Temple University
Philadelphia, PA 19122

1 Abstract

We describe and briefly analyze a new approach for the fast solution of fluid flows. It is based on two relatively new approaches: the domain decomposition (DDC) method and the operator splitting (OPD) method. By blending these methods together, a degenerate parallel scheme is obtained which is very efficient, easy and simple to use, and attractive in the sense that it is not sensitive to the nature of the parallel system employed. This parallel numerical scheme, which is basically a multilevel iterative procedure, has all the ingredients to fit heterogeneous parallel systems well. In terms of the iterative scheme, it can use almost any iterative procedure in the OPD parallel levels.


2 Introduction

In the present paper we consider the numerical simulation of the two and three dimensional Navier-Stokes (NS) equations on a hierarchical, loosely coupled parallel computing environment. The importance of this parallel computing issue has grown in recent years [1, 2]. We will describe an efficient parallel implementation of the computational NS equations on a parallel machine. This is a part of an on-going research effort to generate near optimal parallel schemes for the NS equations; [3], [4], [5] and [31] can serve as a basis for this approach.

The parallel machines to be considered here are quite general distributed memory machines. The goal is to develop an approach which is not sensitive to the parallel architecture and the communication network, at least for large problems. There are several different parallel distributed environments, and thus portability and efficiency are important issues. The classical way to overcome this difficulty is by generating parallel algorithms with much smaller interconnection time complexity than their parallel computation time complexity. Another consideration that a general parallel scheme may yield is the flexibility to choose two or more different methods and algorithms to solve a given problem in a parallel manner. An appropriate combination of the schemes can result in a much better method. Most of the time this blending of several parallel algorithms (for the same problem) is insensitive to the parallel computing environment (a nice example of this approach was set up in the past for the parallel sort [11]). It was suggested recently [31] that sometimes the best parallelism is achieved for problems that contain several different independent features. Then a parallel algorithm is applied more naturally and, most of the time, more effectively. This approach is very suitable for the NS equations, and is one of the main subjects of the present paper.

*Associate Professor, Department of Mathematics; Senior Member, AIAA

We will consider here the steady-state incompressible laminar NS equations, which are of the elliptic type [8]. The non-linear nature of the convection terms is not of primary concern in the present paper, and we will assume that they are treated using the quasi-linear (second order) iterative procedure [9, 31]. The following linear system is solved at each iteration level:

$$(V\cdot\nabla)\,U + \nabla p - \frac{1}{Re}\,\nabla^2 U = 0\,, \qquad \frac{\partial u_i}{\partial x_i} = 0 \qquad (1)$$

This system is solved for U, which is the current flow velocity vector, where V is the known flow velocity vector from the previous iteration. (1) is a linear elliptic PDE system, and thus combines two basic independent elements: the first is the domain over which this system is solved and the second is the set of equations to be solved. In the present approach we take advantage of this fact, suggesting a parallel strategy that combines these two elements, which may lead to a quite efficient parallel algorithm.


3 The Parallel Split Technique

The model of computation that will be used here is the parallel degenerate system [13, 31]. According to this model, any machine in the parallel environment can itself be a parallel machine. This description suits many of the parallel machines today. Such a system can be used most effectively when the parallel algorithm is hierarchical in nature. We will show that parallel computational schemes for equation (1) can be formulated in some kind of a hierarchical structure, and thus are well suited to the degenerate parallel computational model. Parallel numerical schemes for elliptic PDEs can be hierarchical in nature, since they contain two different elements, namely, the domain Ω and the set of PDEs. Thus, at each level there are two independent directions along which the parallelism can be explored. One is a parallel split of the set of PDEs (operator decomposition, or OPD) into several subsystems such that an appropriate parallel combination of their solutions will give the final result [31]. The second direction is a parallel split of the domain, known also as domain decomposition (or DDC) [10].

Domain decomposition (or substructuring, as it is sometimes called) is quite classical in nature, and has been used as a practical carrier for numerical computations for a long time [15, 16, 31]. It consists of a solution procedure in which the algebraic system (resulting from the discretization over a regular grid) is reduced to the solution of problems in the subdomains plus a linear system for the interface unknowns, which is usually called the capacitance system [14]. The number of subdomains is usually the same as the number of processors P available to solve the problem. Most of the recent DDC algorithms make use of the preconditioned conjugate gradient iteration in the principal iteration loop of the solution procedure [17]. There are other types of DDC methods, like a combination of DDC with Jacobi methods [18] and with multigrid methods [19]. Most of the time the DDC is applied iteratively, mainly because it is inconvenient to solve the capacitance system directly (it is very sparse). Usually this is done using the Schwarz concept [26, 27].
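The Schwarz alternating concept mentioned above can be illustrated on a minimal one dimensional model (this sketch is not from the paper; the model problem u'' = f, the grid size, and the choice of two overlapping subdomains are assumptions made here): each subdomain is solved with Dirichlet data taken from the latest global approximation at its artificial boundary, and the sweeps are repeated until the subdomain solutions agree with the direct global solve.

```python
# Alternating Schwarz sketch for u'' = f on (0,1), u(0) = u(1) = 0,
# with f = -pi^2 sin(pi x) so the exact solution is sin(pi x).
import math

def thomas(a, b, c, d):
    """Solve a tridiagonal system a[i]x[i-1] + b[i]x[i] + c[i]x[i+1] = d[i]."""
    n = len(b)
    bp, dp = b[:], d[:]
    for i in range(1, n):
        m = a[i] / bp[i - 1]
        bp[i] -= m * c[i - 1]
        dp[i] -= m * dp[i - 1]
    x = [0.0] * n
    x[-1] = dp[-1] / bp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (dp[i] - c[i] * x[i + 1]) / bp[i]
    return x

def solve_poisson(f, ul, ur, h):
    """Direct FD solve of u'' = f on an interval with Dirichlet data ul, ur;
    f holds the source at the interior points only."""
    n = len(f)
    d = [h * h * fi for fi in f]
    d[0] -= ul
    d[-1] -= ur
    return thomas([1.0] * n, [-2.0] * n, [1.0] * n, d)

N = 40                               # global grid x_0 .. x_N
h = 1.0 / N
x = [i * h for i in range(N + 1)]
f = [-math.pi ** 2 * math.sin(math.pi * xi) for xi in x]

# reference: one direct solve over the whole domain
ref = [0.0] + solve_poisson(f[1:N], 0.0, 0.0, h) + [0.0]

# alternating Schwarz: left subdomain [x_0, x_24], right subdomain [x_16, x_40]
L, R = 24, 16                        # overlap of 8 cells
u = [0.0] * (N + 1)
for sweep in range(40):
    # left solve; the value at the artificial boundary x_L comes from u
    u[1:L] = solve_poisson(f[1:L], 0.0, u[L], h)
    # right solve; the value at the artificial boundary x_R comes from u
    u[R + 1:N] = solve_poisson(f[R + 1:N], u[R], 0.0, h)

err = max(abs(u[i] - ref[i]) for i in range(N + 1))
```

For the 1D Laplacian the interface error contracts by a fixed factor each double sweep, so the overlapping solves converge to the same discrete solution as the direct global solve.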

For the OPD, assume the following elliptic problem for φ:

$$A\varphi = R$$

where A is an elliptic operator. According to the OPD splitting technique [31], we seek a set of different (independent in some sense) operators {C_i}, i = 1, ..., m, such that each element C_i will be as close as possible to A and easier to invert (in terms of computational time complexity) than A. The procedure starts by assuming an initial guess φ^(0); at each iteration k, m problems of the type

$$C_i\,\varphi_i^{(k)} = R - B_i\,\varphi^{(k-1)}, \qquad i = 1, 2, \ldots, m \qquad (2)$$

are solved in parallel (using m different processors). Then the results are linearly combined to give the value of the next iterate:

$$\varphi^{(k)} = \sum_{i=1}^{m} a_i\,\varphi_i^{(k)} \qquad (3)$$

where a = {a_i}, i = 1, ..., m, are m operators satisfying

$$\sum_{i=1}^{m} a_i = I$$

and the m operators B = {B_i}, i = 1, ..., m, satisfy the following condition:

$$\sum_{i=1}^{m} a_i\,C_i^{-1}\,(A - B_i) = I \qquad (4)$$

The two sets, a and B, should be chosen so that the rate of convergence of the iterative scheme will be maximized:

$$\lambda\Big[\sum_{j=1}^{m} a_j\,C_j^{-1} B_j\Big] \to 0 \qquad (5)$$

where λ is the spectral radius of the appropriate operator. There are several possibilities for the set C when the NS equations are considered. The main approach, which is analyzed in [5] and [31], is to choose the C_i's so that they will contain as many terms as possible that are similar to those of A. This may be done simply by deleting from A the minimum number of terms so that the complexity of the computation of the inverse is reduced [say from O(n³) to O(n²)]. In any event, the leading concept in finding such matrices is that they will be as "mutually distinct" as possible: C₁ and C₂ are distinct if the set of solutions generated by C₁, with a given set of boundary conditions, is independent of the set of solutions generated by the operator C₂ with the same set of boundary conditions.
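A toy numerical illustration of the OPD iteration (2)-(5) may help (this sketch is not from the paper; the 3×3 matrix A and the triangular "easily inverted" operators C₁, C₂ are arbitrary choices made here). Taking B_i = A − C_i, condition (4) holds identically with a₁ = a₂ = ½, since then C_i⁻¹(A − B_i) = C_i⁻¹C_i = I:

```python
# OPD sketch with m = 2: C1 is the lower triangular part of A (forward solve),
# C2 the upper triangular part (backward solve); B_i = A - C_i, a_i = 1/2.
# Each C_i solve could run on its own processor; here they run in sequence.

A = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
R = [1.0, 1.0, 1.0]                  # exact solution of A*phi = R is (1.5, 2, 1.5)

C1 = [[2.0, 0.0, 0.0], [-1.0, 2.0, 0.0], [0.0, -1.0, 2.0]]   # lower triangle
C2 = [[2.0, -1.0, 0.0], [0.0, 2.0, -1.0], [0.0, 0.0, 2.0]]   # upper triangle
B1 = [[A[i][j] - C1[i][j] for j in range(3)] for i in range(3)]
B2 = [[A[i][j] - C2[i][j] for j in range(3)] for i in range(3)]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def lower_solve(Lo, b):              # forward substitution: cheap to "invert"
    x = [0.0] * 3
    for i in range(3):
        x[i] = (b[i] - sum(Lo[i][j] * x[j] for j in range(i))) / Lo[i][i]
    return x

def upper_solve(Up, b):              # backward substitution
    x = [0.0] * 3
    for i in range(2, -1, -1):
        x[i] = (b[i] - sum(Up[i][j] * x[j] for j in range(i + 1, 3))) / Up[i][i]
    return x

phi = [0.0, 0.0, 0.0]
for k in range(60):
    b1 = matvec(B1, phi)
    b2 = matvec(B2, phi)
    phi1 = lower_solve(C1, [R[i] - b1[i] for i in range(3)])   # eq. (2), i = 1
    phi2 = upper_solve(C2, [R[i] - b2[i] for i in range(3)])   # eq. (2), i = 2
    phi = [0.5 * (phi1[i] + phi2[i]) for i in range(3)]        # eq. (3), a_i = 1/2
```

The iteration error is governed by the operator of (5); for this matrix its norm is below one, so the averaged iterate converges to the solution of Aφ = R.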

It was suggested [5] and shown [31] that the parabolized operators along the domain's coordinates are a reasonable choice for the elements of C. The modified strongly implicit (MSI) procedure was also suggested in [31] as a possible OPD for the linear version of the NS equations. Before analyzing these and other OPDs, let us discuss briefly the numerical approach for equation (1).

Past experience indicates [31] that usually the parallel implementation of the OPD concept is done more effectively in an iterative manner. It should be noted that there are some theoretical indications [4, 5] that this is the only way to implement the OPD in parallel.

For the second order upwind convection scheme described in the next section, the rate of convergence towards the steady state (determined mainly by the short waves of the error propagation) is quite fast: if l and U are the respective length scale and convection coefficient scale used to normalize the linear system, then the choice of Δ = l/U gives the best coefficient of convergence, which is about 3.77 [29, 31]. The boundary conditions are treated similarly, and the reader can consult [29] or [31].

Figure 1: Location of the grid points for the present scheme.

As a study case for this paper we choose the square cavity problem. The flow inside a two dimensional driven square cavity is fairly simple and well understood, so that we can concentrate on the parallel computational method.

4 The Numerical Scheme

We will consider here the solution of two and three dimensional fields using the finite difference approach. For the second derivatives of φ we use the standard central difference approximation over a non-uniform grid,

$$(\varphi_{xx})_{i,j,k} = \frac{2\left[\varphi_{i+1,j,k} - (1+\sigma_i)\,\varphi_{i,j,k} + \sigma_i\,\varphi_{i-1,j,k}\right]}{h_i\,(h_i + h_{i-1})} + T_i \qquad (6)$$

where σ_i = h_i/h_{i−1} is the local mesh ratio, h_i is the local x spacing, and T_i is the value of the truncation error at the point (i, j, k), which is second order for an analytical distribution of the mesh ratio.

For the convection terms let us consider the setup given in Figure 1. Let Δ denote the time step and n the time step index; then the following FD approximations are suggested for u_{i,j,k} > 0:

$$\left(\frac{\partial\varphi}{\partial t}\right)_{i,j,k} = \frac{\varphi^{(n)}_{i,j,k} - \varphi^{(n-1)}_{i,j,k}}{\Delta} \qquad (7)$$

$$\left(\frac{\partial\varphi}{\partial x}\right)_{i,j,k} = \frac{1}{h_{i-1}}\left[(1+\sigma_{i-1})\left(\varphi^{(n)}_{i,j,k} - \varphi^{(n)}_{i-1,j,k}\right) - \frac{\sigma_{i-1}^2}{1+\sigma_{i-1}}\left(\varphi^{(n)}_{i,j,k} - \varphi^{(n)}_{i-2,j,k}\right)\right] \qquad (8)$$

This second order upwind scheme was proven to be unconditionally stable for all n ≥ 1 [29].

5 One Dimensional OPD-DDC

We bring here the one dimensional case to show clearly the use of the OPD-DDC approach on parallel systems in conjunction with elliptic problems. Consider the linear boundary value problem

$$\Phi_{xx} + b(x)\,\Phi_x + c(x)\,\Phi = d(x) \qquad (10)$$

that has to be solved over Ω(L, R), L < R, subject, for simplicity, to the following Dirichlet boundary conditions:

$$\Phi(L) = \Phi_L\ ; \qquad \Phi(R) = \Phi_R \qquad (11)$$

A FD approximation of equation (10) using n grid points in Ω results in a tri-diagonal system which can be solved in O(n) time complexity. An appropriate implementation of the OPD is to choose a set C with a time complexity of O(1) for computing the C_i⁻¹. This means that the φ_i^(k) [in equation (2)] are explicit expressions of φ^(k−1), i.e., at any grid point j the new iterate Φ_j^(n+1) is an explicit combination of the previous iterate at j and its neighbors, where n is the iteration level; for most cases it is better to choose B_i^(j) = 0 if i ≠ j. Obviously, this set-up does not have a natural DDC implementation. If the DDC has to be applied, it can be done by spreading the values of Φ over the processors, probably by using some "close-neighbors" [31] approach.

If we want to implement the DDC approach to its full extent, then Ω has to be split into several subdomains Ω_i. Equation (10) is solved almost independently using


the approach in [32]. More specifically, let us define the following variables:

X: X^T ≡ (x₁, x₂, x₃), L ≤ x₁ < x₂ < x₃ ≤ R.

Φ: Φ^T ≡ (Φ₁, Φ₂, Φ₃, 1), where Φ_i ≡ Φ(x_i), i = 1, 2, 3. Usually we'll denote it by Φ(X).

C: C^T ≡ (C₁, C₂, C₃, C₄).

the spacings h ≡ x₃ − x₂, k ≡ x₂ − x₁.

With these definitions, the following problem is defined:

The Problem P: For a given vector X, find a coefficient vector C such that

$$C \cdot \Phi(X) = 0 \qquad (12)$$

where Φ(X) are the discrete values of the function Φ that fulfills eqs. (10) and (11).

It can be proven that the solution to this problem exists and is unique [32].

The parallel implementation of the DDC consists of the following steps:

Step 1: Choose a set W of P internal discrete points in Ω: W = {x₁, ..., x_P}, with x₀ = L and x_{P+1} = R. Define

$$Y_i = (x_{i-1},\, x_i,\, x_{i+1}), \qquad i = 1, 2, \ldots, P \qquad (13)$$

Step 2: Solve in a parallel manner P problems of the type P(Φ, Y_i; C_i), i = 1, 2, ..., P, where the i-th processor solves the i-th problem independently of the others. One of the main issues that may be of some concern is load balancing, but we will not discuss it here.

Step 3: Solve the following tri-diagonal linear system for the set {Φ_i}, i = 1, ..., P:

$$C_i \cdot \Phi(Y_i) = 0, \qquad i = 1, 2, \ldots, P \qquad (14)$$

using one of the processors.

Step 2 is general in the sense that each processor i can spread a FD grid over Y_i and execute the solution for P. The number of such grid points, and the way they are chosen, is determined by the requirements on the accuracy of the solution C_i. It was shown [32] that for every boundary value problem it is possible to find a numerical scheme that will be accurate to any order of accuracy, and still keep the FD approximation spread over only three grid points. Because of load balancing it can be shown that the maximum parallel efficiency is 50% in this case.
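Steps 1-3 above can be sketched concretely (an illustrative reconstruction, not the paper's code; the model problem Φ'' = d(x) on (0, 1) with zero boundary data, the subdomain grid size, and the helper names are all choices made here). For each interface point x_i, three subdomain solves on (x_{i−1}, x_{i+1}) produce, by linearity, the relation α_iΦ_{i−1} − Φ_i + β_iΦ_{i+1} + γ_i = 0, which is the coefficient vector C_i of Problem P; Step 3 then solves the resulting tri-diagonal interface system:

```python
# 1D OPD-DDC sketch: subdomain solves build the interface coefficients C_i,
# then a single tridiagonal solve recovers the interface values Phi_i.
import math

def thomas(a, b, c, d):
    """Solve a tridiagonal system a[i]x[i-1] + b[i]x[i] + c[i]x[i+1] = d[i]."""
    n = len(b)
    bp, dp = b[:], d[:]
    for i in range(1, n):
        m = a[i] / bp[i - 1]
        bp[i] -= m * c[i - 1]
        dp[i] -= m * dp[i - 1]
    x = [0.0] * n
    x[-1] = dp[-1] / bp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (dp[i] - c[i] * x[i + 1]) / bp[i]
    return x

def midpoint_value(xl, xr, ul, ur, rhs, n=64):
    """FD solve of phi'' = rhs(x) on (xl, xr) with Dirichlet data (ul, ur);
    return phi at the midpoint grid index (n must be even)."""
    h = (xr - xl) / n
    d = [h * h * rhs(xl + j * h) for j in range(1, n)]
    d[0] -= ul
    d[-1] -= ur
    interior = thomas([1.0] * (n - 1), [-2.0] * (n - 1), [1.0] * (n - 1), d)
    return ([ul] + interior + [ur])[n // 2]

P = 7                                       # number of internal interface points
xs = [i / (P + 1) for i in range(P + 2)]
rhs = lambda x: -math.pi ** 2 * math.sin(math.pi * x)   # exact Phi = sin(pi x)
zero = lambda x: 0.0

# Step 2: each "processor" i computes its coefficients (alpha_i, beta_i, gamma_i)
alpha, beta, gamma = [], [], []
for i in range(1, P + 1):
    xl, xr = xs[i - 1], xs[i + 1]
    alpha.append(midpoint_value(xl, xr, 1.0, 0.0, zero))   # response to Phi_{i-1}
    beta.append(midpoint_value(xl, xr, 0.0, 1.0, zero))    # response to Phi_{i+1}
    gamma.append(midpoint_value(xl, xr, 0.0, 0.0, rhs))    # particular part

# Step 3: tridiagonal interface system alpha_i*Phi_{i-1} - Phi_i + beta_i*Phi_{i+1} = -gamma_i
# (the known boundary values Phi_0 = Phi_{P+1} = 0 contribute nothing here)
phi_if = thomas(alpha, [-1.0] * P, beta, [-g for g in gamma])
```

The interface system plays the role of the capacitance system: once the Φ_i are known, each subdomain solution can be completed independently in parallel.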

Here we did not incorporate any OPD. However, when this approach is applied in the continuous space (rather than in the discrete FD space), it can also be viewed as an OPD, solving several boundary value problems on different domains. Thus we may conclude that, for all practical applications, the OPD and the DDC are the same in the one dimensional case.

6 The Two Dimensional DDC

Most of the time the DDC is used in a very simple manner: Ω is split into P convenient pieces. The brute force splitting is usually into even strips, so that the computational effort is minimal. This is because the interaction between the internal boundaries of the subdomains takes a very simple form, and the structure of the capacitance matrix is simple. In general it can be said that the leading concept of the DDC splitting is simplicity of the computational procedure.

In the present study, the DDC was implemented differently. Since Ω is not homogeneous in terms of the solution, it is important to condense more processors around condensed (or concentrated) values of the function. For simplicity, we can use the definition that the concentration of a function is (linearly) proportional to the absolute values of its gradients.

Assume that Ω_i is a subdomain and that the concentration of a function f at a point is ρ = |∇f|. The concentration μ_i of f over this domain is

$$\mu_i = \int_{\Omega_i} \rho\, d\Omega \quad \text{(continuous field)}, \qquad \mu_i = \sum_{k \in \Omega_i} \rho_k \quad \text{(discrete field)}$$

If there are P subdomains, the total concentration is

$$\mu = \sum_{i=1}^{P} \mu_i$$

The DDC is implemented here in such a way that the concentration of f over each of the subdomains has the same value. Obviously, this approach is implemented iteratively. At the beginning Ω is split evenly into P subdomains. Then an approximate solution (probably


using the OPD approach) is obtained. Using this solution f, a better split of the subdomains is generated. Using the given f as an initial approximation, a better approximate solution is sought. This solution is the basis for the next DDC split, and so on. Intuitively, none of the approximations has to be well converged, but the quality (convergence criteria) has to improve with the global iterations.
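A minimal sketch of the equal-concentration strip split described above (illustrative only; the sample field, the grid, and the rule for picking strip boundaries are choices of this sketch, not the paper's): the rows of a discrete field are grouped into P strips so that each strip carries roughly the same sum of |∇f|.

```python
# Equal-concentration strip split: a strip boundary is placed each time the
# running sum of rho = |grad f| crosses the next multiple of (total / P).
import math

P = 4
nx = ny = 100
h = 1.0 / (nx - 1)

# sample field with an unevenly distributed gradient (steep front near y = 0.3)
f = [[math.tanh(8.0 * (i * h - 0.3)) for j in range(nx)] for i in range(ny)]

# per-row concentration, using one-sided differences for |grad f|
rho_row = []
for i in range(ny - 1):
    s = 0.0
    for j in range(nx - 1):
        fx = (f[i][j + 1] - f[i][j]) / h
        fy = (f[i + 1][j] - f[i][j]) / h
        s += math.hypot(fx, fy)
    rho_row.append(s)

total = sum(rho_row)

bounds = [0]
run, k = 0.0, 1
for i, r in enumerate(rho_row):
    run += r
    if run >= k * total / P and len(bounds) < P:
        bounds.append(i + 1)         # close the current strip after row i
        k += 1
bounds.append(len(rho_row))

strip_sums = [sum(rho_row[bounds[s]:bounds[s + 1]]) for s in range(P)]
```

Because the gradient is concentrated near the front, the strips that straddle it come out narrow while the flat regions get wide strips, which is exactly the load-equalizing behavior the dynamic DDC is after.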

Figure 2 presents this procedure for the square cavity problem for a Reynolds number of 1000 using a 50 × 50 grid.

7 The Two Dimensional OPD

Over each subdomain, the continuous or the discretized version of equations (1) is decomposed into several subsystems, each of which resembles more or less the original system. It is quite an obvious and reasonable approach to use the parabolized operators as parts of the OPD, since they are obtained by deleting one of the two diffusion terms of system (1), treating it as known (from the last global iteration). This is discussed briefly in [5], where the operators are parabolized along the system's coordinates. For that case, the number of elements in C is the same as the dimensionality of the geometrical domain. For the two dimensional case we have the following form:

$$C_1\,\tilde\varphi_1 = R\,, \qquad C_2\,\tilde\varphi_2 = S$$

where φ = (u, v, p)^T, C₁ and C₂ are the operators of (1) parabolized along the two coordinates, and the two φ̃'s are combined together to give the next approximation as follows:

$$\varphi = \tilde\varphi_1 + \alpha\,(\tilde\varphi_2 - \tilde\varphi_1)$$

The source terms R and S depend on the values of the last iterate of φ.

Thus each subdomain requires two processors, one for each direction. Each of these processors finds an approximate solution over Ω_i by marching along one of

Figure 2: Dynamic DDC for Re = 1000 (snapshots at Residue = 0.05 and Residue = 0.007). Residue = |Residue(x-momentum)| + |Residue(y-momentum)|.


the coordinates. After each marching procedure is completed, their solutions are weight-averaged before proceeding to the next global iteration. The global iterative procedure is terminated after a certain accuracy is achieved.

The weighted average of the solutions of the two processors is dictated mainly by the maximization of the rate of convergence. It can be shown that the parameter of this averaging is Q_x/Q_y, where Q_x and Q_y are the average mass fluxes in the x direction and in the y direction, respectively.
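The two-direction sweep-and-average idea can be sketched on a scalar model problem (not the paper's code: the 5-point Laplacian on the unit square replaces system (1), and equal weights ½ replace the flux-based weighting described above). One "processor" solves line-implicit tridiagonal systems along x treating the y-coupling as known, the other does the same along y, and the two results are averaged:

```python
# Two-direction parabolized sweep for the discrete Poisson problem
# 4u_ij - u_{i-1,j} - u_{i+1,j} - u_{i,j-1} - u_{i,j+1} = h^2 * g_ij, u = 0 on the boundary.

n = 8                                # interior points per direction
h = 1.0 / (n + 1)
rhs = [[h * h * 1.0 for i in range(n)] for j in range(n)]   # source g = 1

def thomas(a, b, c, d):
    m = len(b)
    bp, dp = b[:], d[:]
    for i in range(1, m):
        w = a[i] / bp[i - 1]
        bp[i] -= w * c[i - 1]
        dp[i] -= w * dp[i - 1]
    x = [0.0] * m
    x[-1] = dp[-1] / bp[-1]
    for i in range(m - 2, -1, -1):
        x[i] = (dp[i] - c[i] * x[i + 1]) / bp[i]
    return x

def sweep_x(u):
    """Line-implicit solve along x; the y-coupling is lagged (taken from u)."""
    v = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = []
        for i in range(n):
            s = rhs[j][i]
            if j > 0: s += u[j - 1][i]
            if j < n - 1: s += u[j + 1][i]
            d.append(s)
        v[j] = thomas([-1.0] * n, [4.0] * n, [-1.0] * n, d)
    return v

def sweep_y(u):
    """Line-implicit solve along y; the x-coupling is lagged."""
    v = [[0.0] * n for _ in range(n)]
    for i in range(n):
        d = []
        for j in range(n):
            s = rhs[j][i]
            if i > 0: s += u[j][i - 1]
            if i < n - 1: s += u[j][i + 1]
            d.append(s)
        col = thomas([-1.0] * n, [4.0] * n, [-1.0] * n, d)
        for j in range(n):
            v[j][i] = col[j]
    return v

def residual(u):
    r = 0.0
    for j in range(n):
        for i in range(n):
            s = 4.0 * u[j][i] - rhs[j][i]
            if i > 0: s -= u[j][i - 1]
            if i < n - 1: s -= u[j][i + 1]
            if j > 0: s -= u[j - 1][i]
            if j < n - 1: s -= u[j + 1][i]
            r = max(r, abs(s))
    return r

u = [[0.0] * n for _ in range(n)]
r0 = residual(u)
for it in range(250):
    ux = sweep_x(u)                  # could run on processor 1
    uy = sweep_y(u)                  # could run on processor 2
    u = [[0.5 * (ux[j][i] + uy[j][i]) for i in range(n)] for j in range(n)]
```

Both sweeps use the same lagged iterate, so they are independent and can run concurrently; the averaged result is the next global iterate.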

Another possibility for the OPD is the alternating strongly implicit procedure. We have been using the alternating scheme of the modified version of the strongly implicit procedure [28]. Let us assume that the discrete version of equations (1) at the (i, j)-th grid point is:

$$E_{ij}\,\varphi_{i+1,j} + W_{ij}\,\varphi_{i-1,j} + N_{ij}\,\varphi_{i,j+1} + S_{ij}\,\varphi_{i,j-1} + P_{ij}\,\varphi_{i,j} = R_{ij}$$

where E, W, N, S and P are matrix coefficients and R is the matrix source term. The basic strongly implicit (SI) procedure can be described as follows:

$$D_{ij} = \left(P_{ij} + X_{i,j-1}\,S_{ij} + Y_{i-1,j}\,W_{ij}\right)^{-1}$$

$$X_{ij} = -D_{ij}\,N_{ij}\ ; \qquad Y_{ij} = -D_{ij}\,E_{ij}$$

$$V_{ij} = D_{ij}\left(R^{(k)}_{ij} - S_{ij}\,V_{i,j-1} - W_{ij}\,V_{i-1,j}\right)\ ; \qquad \varphi^{(k)}_{ij} = V_{ij} + X_{ij}\,\varphi^{(k)}_{i,j+1} + Y_{ij}\,\varphi^{(k)}_{i+1,j}$$

and k is the iteration level. The MSI method is obtained by adding an extra step to the SI procedure. This step is a combination of the last two equations, and after calculating the φ field an implicit equation is solved again for the values of φ along the diagonals of the field. (Note that a diagonal is defined by i + j = constant.) A similar equation can be obtained for the other (normal) diagonal direction. These two versions of the last equation are executed independently by two processors; after each iteration their results are averaged and serve as the next guess for the solution of the system.

8 Some Examples and Discussion

Referring to Figure 2, these results were obtained using the INTEL Hypercube parallel machine with 8 processors. Each couple of them was allocated to a different subdomain. The major limitation we forced on the DDC procedure is that the subdomains' boundaries should be parallel to the coordinate axes. This makes the OPD procedure far simpler than in the general case. However, we lose some of the load balancing because now the concentrations over the subdomains are not the same. For that case we have used a (straightforward) procedure to minimize these differences. For the variables' values along the internal boundaries, we have used the serial (scalar) version of the incomplete conjugate gradient procedure to solve them in an iterative manner. A similar numerical experiment was done using the modified strongly implicit approach for the OPD. A sample of the equations' residue versus the CPU time is presented in the next table. We compare the residues of the parallel and serial versions of the parabolized and the MSI versions of our scheme for a Reynolds number of 2500. For the parallel programs we measure the time and the residues just before changing the DDC splitting.

CPU Time   Parabolized          MSI
           Scalar    Parallel   Scalar   Parallel
7          0.36      0.044      0.039    0.012
…          0.14      …          …        …
…          0.015     0.00013    0.0071   0.00010

This table shows clearly that this approach has a lot of potential, and we intend to investigate it much more deeply in the near future.

References

[1] International Journal on Numerical Methods in Fluids, Special Issue on "Parallel and Super Computing in Flows", in press.

[2] Conference on Parallel CFD - Implementations and Results Using MIMD Computers, Los Angeles, May 8-9, 1989.

[3] Lin, A., (1987), "Parallel and Super Computing of Elliptic Operators", in Supercomputing, Kartashev, L. and Kartashev, S. eds., The International Supercomputing Institute, pp. 497-502.

[4] Lin, A., (1987), "Parallel Algorithm for Three Dimensional Flows", Numerical Methods in Flows, 5, part 1, pp. 48-56, Pineridge Press, UK.

[5] Lin, A., (1989), "Solving Numerically the Navier-Stokes Equations on Parallel Systems", Int. J. on Numerical Methods in Fluids, in press.

[6] Kung, K. C., (1984), "Concurrency in parallel processing systems", UCLA CSD Rep. 840039, Computer Science Department, University of California, Los Angeles.

[7] Hwang, K., Tseng, P.-S. and Kim, D., (1989), "An orthogonal multiprocessor for parallel scientific computations", IEEE Trans. on Computers, 36, 1, pp. 47-61.

[8] Roache, P., (1976), Computational Fluid Dynamics, Hermosa, Albuquerque, NM.

[9] Lin, A., (1986), "High Order Three Points Schemes for Boundary Value Problems. II: Nonlinear Problems", J. of Computational and Applied Mathematics, 15, 2, pp. 269-282.

[10] Anderson, C. R., (1985), "On domain decomposition", Technical Report, Mathematics Department and Computer Science Department, Stanford University.

[11] Scherson, I. D. and Sen, S., (1989), "Parallel sorting in two dimensional VLSI models of computation", IEEE Trans. on Computers, 38, 2, pp. 238-249.

[12] Temam, R., (1984), Navier-Stokes Equations: Theory and Numerical Analysis, North-Holland.

[13] Gehringer, E. F., Abullarade, J. and Gulyn, M. H., (1988), "A survey of commercial parallel processors", Computer Architecture News, 16, 4, pp. 75-107. On parallel degenerate systems.

[14] Hockney, R. W., (1970), "The potential calculation and some applications", Meth. Comput. Physics, 9, pp. 135-211.

[15] Przemieniecki, J. S., (1963), "Matrix structural analysis of substructures", AIAA J., 1, pp. 138-147.

[16] Bjorstad, P. E. and Widlund, O. B., (1984), "Iterative methods for a solution of elliptic problems on regions partitioned into substructures", Technical Report 136, Courant Institute of Mathematical Sciences, New York University, September 1984 and 1985.

[17] Concus, P., Golub, G. H. and O'Leary, D. P., (1975), "A generalized conjugate gradient method for the numerical solution of elliptic partial differential equations", in Proceedings of the Symposium on Sparse Matrix Computations, J. R. Bunch and D. J. Rose eds., Academic Press, New York, pp. 309-332.

[18] Rodrigue, G. and Simon, J., (1984), "Jacobi splittings and the method of overlapping domains for solving elliptic PDEs", in Advances in Computer Methods for Partial Differential Equations - V, Vichnevetsky, R. and Stepleman, R. S. eds., IMACS, pp. 383-386.

[19] Oliger, J., Skamarock, W. and Tang, W.-P., (1985), "Schwarz alternating procedure and S.O.R. accelerations", Technical Report, Computer Science Department, Stanford University.

[20] Dryja, M., (1982), "A capacitance matrix method for Dirichlet problem on polygonal region", Numer. Math., 39, pp. 51-64.

[21] Golub, G. H. and Mayers, D., (1983), "The use of preconditioning over irregular regions", the Sixth Int. Conf. on Computing Methods in Applied Science and Engineering, Versailles, France.

[22] Bjorstad, P. E. and Widlund, O. B., (1985), "Iterative methods for the solution of elliptic problems on regions partitioned into substructures", Technical Report 130, Courant Institute of Mathematical Sciences, New York University.

[23] Bramble, J. H., Pasciak, J. E. and Schatz, A. H., (1986), "The construction of preconditioners for elliptic problems by substructuring: I", Mathematics of Computation, 47, pp. 103-134.

[24] Dryja, M. and Proskurowski, W., (1984), "Fast elliptic solvers on rectangular regions subdivided into strips", in Advances in Computer Methods for Partial Differential Equations - V, R. Vichnevetsky and R. S. Stepleman, eds., IMACS, pp. 360-368.

[25] Chan, T. F. and Resasco, D. C., (1987), "A domain decomposition fast Poisson solver on a rectangle", SIAM J. Sci. Stat. Computing, 8, 1, pp. 14-26.

[26] Schwarz, H. A., (1869), "Ueber einige Abbildungsaufgaben", Journal f. die reine und angew. Math., 70, pp. 105-120.

[27] Schwarz, H. A., (1890), Gesammelte Mathematische Abhandlungen, Springer, Berlin, 2, pp. 133-134.

[28] Lin, A., (1989), "On the modified strongly implicit method", submitted.

[29] Lin, A., (1989), "Stable Second Order Accurate Iterative Solutions for Second Order Elliptic Problems", The International Journal of Numerical Methods in Fluids, 9, pp. 87-102.

[30] Rubin, S. G. and Khosla, P. K., (1981), "Navier-Stokes calculations with a coupled strongly implicit method - parts 1 & 2", Computers and Fluids, 9, pp. 103-180.

[31] Lin, A., (1989), "Fast Computations of Incompressible Fields", The 6th International Conference on Numerical Methods in Laminar and Turbulent Flows, England.

[32] Lin, A., (1989), "Parallel Algorithms for Boundary Value Problems", submitted.

