Page 1: [American Institute of Aeronautics and Astronautics 11th Computational Fluid Dynamics Conference - Orlando,FL,U.S.A. (06 July 1993 - 09 July 1993)] 11th Computational Fluid Dynamics

Inexact Newton's Method Solutions to the Incompressible Navier-Stokes and Energy Equations Using Standard and Matrix-Free Implementations

Paul R. McHugh and Dana A. Knoll
Computational Fluid Dynamics Unit
Idaho National Engineering Laboratory
Idaho Falls, Idaho 83415-3895

Abstract

Fully implicit Newton's method is coupled with conjugate gradient-like iterative algorithms to form inexact Newton algorithms for solving the steady, incompressible Navier-Stokes and energy equations in primitive variables. Finite volume differencing is employed using the power law convection-diffusion scheme on a uniform but staggered mesh. The well-known model problem of natural convection in an enclosed cavity is solved. Three conjugate gradient-like algorithms are selected from a class of algorithms based upon the Lanczos biorthogonalization procedure: the conjugate gradients squared algorithm (CGS), the transpose-free quasi-minimal residual algorithm (QMRCGS), and a more smoothly convergent version of the bi-conjugate gradients algorithm (Bi-CGSTAB). A fourth algorithm, the popular generalized minimal residual algorithm (GMRES), is based upon the Arnoldi process. The performance of a standard inexact Newton's method implementation is compared with a matrix-free implementation. Results indicate that the performance of the matrix-free implementation is strongly dependent upon grid size (number of unknowns) and the selection of the conjugate gradient-like method. GMRES was found to be superior to the Lanczos based algorithms within the context of a matrix-free implementation. Among the Lanczos based algorithms, QMRCGS and Bi-CGSTAB were better suited to the matrix-free implementation than CGS because of their smoother convergence behavior.

Introduction

The use of robust, fully implicit algorithms to solve the Navier-Stokes equations is growing in popularity, mainly due to rapid advances in computer speed and available memory. An 'inexact' Newton's method refers to the use of an iterative solver to approximately solve the linear systems arising from a Newton linearization of the governing equations. The primary advantage of an iterative solver is reduced memory requirements. It is also advantageous in that the tolerance of the linear equation solve can be relaxed when far from the true solution and tightened as the true solution is approached. Preconditioned conjugate gradient-like iterative algorithms have been successfully used in this fashion.

Copyright 1993 by the American Institute of Aeronautics and Astronautics, Inc. The submitted manuscript has been authored by a contractor of the U.S. Government under DOE Contract DE-AC07-76ID01570. Accordingly, the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or allow others to do so, for U.S. Government purposes.

True conjugate gradient methods compute approximations to x in the affine space $x_0 + K_k$, where $K_k$ is the Krylov subspace of dimension k [11]. They are characterized by an optimality condition that is realized by constructing an orthogonal basis for the Krylov subspace (orthogonality condition). In many of these Krylov projection methods the Jacobian matrix appears only in the form of matrix-vector products. This becomes very important in the context of an inexact Newton iteration because these products may be approximated by finite differences as follows,

$$ J v \approx \frac{F(x + \varepsilon v) - F(x)}{\varepsilon} \qquad (1) $$

where J is the Jacobian matrix, x is the state vector, v is an arbitrary vector, $\varepsilon$ is a perturbation, and F represents the system of nonlinear equations. The existence of this approximation is significant because it suggests the possibility of matrix-free implementations of Newton's method, thereby circumventing the main drawback associated with its use. The performance of this matrix-free implementation compared with the standard implementation is the focus of this paper.
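The finite-difference product in Equation (1) can be sketched in a few lines. The example below is a minimal illustration, not the authors' code: the two-equation system `residual`, the fixed perturbation `eps`, and all function names are assumptions introduced here, and the analytic Jacobian is included only to check the approximation.

```python
def residual(x):
    """Hypothetical two-equation nonlinear system F(x), for illustration only."""
    return [x[0] ** 2 + x[1] - 3.0,
            x[0] + x[1] ** 3 - 5.0]


def jacobian_vector_fd(F, x, v, eps=1.0e-7):
    """Approximate J(x) v by the forward difference [F(x + eps*v) - F(x)] / eps."""
    x_pert = [xi + eps * vi for xi, vi in zip(x, v)]
    f0 = F(x)
    f1 = F(x_pert)
    return [(a - b) / eps for a, b in zip(f1, f0)]


def jacobian_vector_exact(x, v):
    """Analytic J(x) v for the illustrative system, used only as a reference."""
    return [2.0 * x[0] * v[0] + v[1],
            v[0] + 3.0 * x[1] ** 2 * v[1]]


x = [1.0, 2.0]
v = [0.5, -1.0]
approx = jacobian_vector_fd(residual, x, v)   # needs no stored Jacobian
exact = jacobian_vector_exact(x, v)
error = max(abs(a - e) for a, e in zip(approx, exact))
```

Only two residual evaluations are needed per product, which is why the Jacobian never has to be formed or stored; the price is a truncation and round-off error that depends on `eps` and on `v`.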

Note that for symmetric matrices the orthogonality condition mentioned above is easily satisfied using short vector recurrence relationships, resulting in constant work and storage requirements on each iteration. For non-symmetric matrices, however, short recurrences do not exist [13], so the work and storage requirements increase with the iteration number, making the use of true conjugate gradient methods impractical. Some problems do allow successful application of true conjugate gradient algorithms to the normal equations (i.e., $A^T A x = A^T b$). The disadvantages of this approach, however, are that the condition number of the new system is made much worse, and matrix-vector multiplications with $A^T$ are required. Working with $A^T$ is undesirable for several reasons: first, the transpose is not always readily available; second, the efficiency of matrix-vector multiplications with the transpose may be reduced on vector/parallel computers; and third, working with the transpose eliminates the option of the above-mentioned matrix-free implementations of Newton's method. For these reasons we chose to concentrate on the performance of conjugate gradient-like algorithms.

Conjugate gradient-like algorithms are derived by relaxing either the optimality condition or the orthogonality condition [14]. The optimality condition may be relaxed by either allowing periodic algorithm restarts or by applying some sort of quasi-minimization process. The orthogonality condition may be relaxed by either artificially truncating the recursion (i.e., the new direction vector is orthogonal to only the previous s direction vectors), or by using the Lanczos biorthogonalization procedure (i.e., using three-term recursions to build a pair of biorthogonal bases) [14].

The restarted Generalized Minimal RESidual (GMRES) algorithm, based on the use of the Arnoldi process, is derived by relaxing the optimality condition [15]. Note that the full GMRES algorithm is a true conjugate gradient method in the sense that it satisfies both the orthogonality and optimality conditions. However, the storage requirements grow linearly and the work quadratically with the iteration number. Thus, it is often necessary to use the restarted version, GMRES(k), where k indicates the selected dimension of the Krylov subspace. In this case, the algorithm is restarted after k iterations. Restarts may cause the GMRES algorithm to exhibit very slow convergence, but GMRES will not encounter the breakdowns that are possible with other algorithms.

Algorithms derived by relaxing the orthogonality condition using the nonsymmetric Lanczos procedure include the Bi-Conjugate Gradient (BCG) algorithm, the Bi-CGSTAB algorithm and its variants [19], the Conjugate Gradients Squared (CGS) algorithm [20], and a set of algorithms based on the Quasi-Minimal-Residual idea [21]. Compared with the Arnoldi based method (i.e., GMRES), these Lanczos based methods require less work and storage per iteration [22].

The first of the Lanczos based methods developed, BCG, suffers from three main shortcomings:

1.) It requires matrix-vector multiplications with the matrix transpose.

2.) It lacks the minimization property inherent in Arnoldi based methods, resulting in irregular convergence properties.

3.) Algorithm breakdown is possible.

The first shortcoming was overcome with the development of CGS, a variant of BCG that avoids use of the matrix transpose [20]. CGS doubles the rate of convergence of BCG, but unfortunately it also doubles the rate of divergence and still exhibits irregular convergence behavior. In an attempt to control this irregular convergence behavior, several new algorithms have been developed: Van der Vorst used local steepest descent steps in the Bi-CGSTAB algorithm, and Freund applied the quasi-minimal residual idea (QMRCGS) to obtain more smoothly convergent CGS-like solutions. Note that both Bi-CGSTAB and QMRCGS may still encounter algorithm breakdown. The look-ahead Lanczos procedure has been used in several algorithms to avoid these breakdowns, but these algorithms once again require working with the matrix transpose [16].

This paper investigates the use of CGS, QMRCGS, Bi-CGSTAB, and GMRES(20) within both standard and matrix-free implementations of inexact Newton's method. The resulting algorithms are used to solve the model problem described below.

Model Problem Description

The model problem considered here is natural convection in an enclosed square cavity as illustrated in Figure 1. The flow is assumed incompressible. The coupling between the momentum and energy equations occurs through the buoyancy force term in the momentum equation, using the Boussinesq approximation [23]. In conservative and dimensionless form the governing equations can be expressed as,

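Continuity

$$ \frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} = 0 \qquad (2) $$

Momentum

$$ \frac{\partial u^2}{\partial x} + \frac{\partial uv}{\partial y} = -\frac{\partial p}{\partial x} + \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} \qquad (3) $$

Energy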


where Gr is the Grashof number, Pr is the Prandtl number (= 0.71), and the Rayleigh number, Ra, is given by Ra = Gr Pr (= $10^4$). The gravity vector is assumed to point in the negative y-direction. Boundary conditions for this problem are specified in Figure 1.
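For orientation, the v-momentum and energy equations that accompany Equation (3) can be sketched under the same scaling. The form below is an assumption, not a quotation: it supposes unit coefficients on the viscous terms as in Equation (3), a buoyancy term Gr T acting in the positive y-direction (opposite to gravity), a temperature T made dimensionless with the wall temperature difference, and the equation numbers (4) and (5).

```latex
\begin{align}
\frac{\partial uv}{\partial x} + \frac{\partial v^2}{\partial y}
  &= -\frac{\partial p}{\partial y}
     + \frac{\partial^2 v}{\partial x^2}
     + \frac{\partial^2 v}{\partial y^2}
     + Gr\,T \tag{4} \\
\frac{\partial uT}{\partial x} + \frac{\partial vT}{\partial y}
  &= \frac{1}{Pr}\left(
     \frac{\partial^2 T}{\partial x^2}
     + \frac{\partial^2 T}{\partial y^2}\right) \tag{5}
\end{align}
```

With this scaling the only parameters are Gr and Pr, which matches the two quantities defined in the text (Ra = Gr Pr).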

Finite volume differencing using the power law scheme of Patankar is used to discretize these governing equations on a uniform, staggered grid where velocities are located on cell faces and pressures and temperatures are located at cell centers.

Figure 1. Geometry and boundary conditions for the model problem.

Numerical Solution Algorithm

The numerical solution algorithm used here is based on Newton's method. Implementation is simplified using a numerically evaluated Jacobian, while robustness is improved using mesh sequencing and a damped Newton iteration. The linear systems that arise on each Newton iteration are solved using preconditioned conjugate gradient-like iterative algorithms. Consequently, the resulting solution algorithm is referred to as an 'inexact' Newton's method. Both standard and matrix-free implementations of inexact Newton's method are discussed below.

Newton's Method

Newton's method is a robust technique for solving systems of nonlinear equations of the form,

$$ F(x) = \left[ f_1(x), f_2(x), \ldots, f_n(x) \right]^T = 0 \qquad (6) $$

where the state variable, x, can be expressed as,

$$ x = \left[ x_1, x_2, \ldots, x_n \right]^T \qquad (7) $$

Application of Newton's method requires the solution of the linear system,

$$ J^n \, \delta x^n = -F(x^n) \qquad (8) $$

where the elements of the Jacobian, J, are defined by,

$$ J_{ij} = \frac{\partial f_i}{\partial x_j} \qquad (9) $$

and the new solution approximation is obtained from,

$$ x^{n+1} = x^n + d \, \delta x^n \qquad (10) $$

The constant, d, in Equation (10) is used to damp the Newton updates. The damping strategy is designed to prevent the calculation of non-physical variable values (e.g., negative temperature), and to scale large variable updates when the solution is far from the true solution. However, damping was not necessary to obtain the results presented here.
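The Newton iteration with a numerically evaluated Jacobian and a damping constant can be sketched as follows. This is a minimal illustration under stated assumptions: the two-equation system `residual`, the forward-difference step `h`, the dense Gaussian elimination (standing in for the iterative solvers studied in the paper), and the residual-based stopping test are all choices made here for brevity, not the authors' implementation.

```python
def residual(x):
    """Hypothetical two-equation system, used only for illustration."""
    return [x[0] ** 2 + x[1] - 3.0,
            x[0] + x[1] ** 3 - 5.0]


def numerical_jacobian(F, x, h=1.0e-7):
    """Forward-difference Jacobian, J[i][j] ~ d f_i / d x_j, built column by column."""
    n = len(x)
    f0 = F(x)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        step = h * max(abs(x[j]), 1.0)
        x_pert = list(x)
        x_pert[j] += step
        f1 = F(x_pert)
        for i in range(n):
            J[i][j] = (f1[i] - f0[i]) / step
    return J


def solve_dense(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= factor * M[k][c]
    y = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][c] * y[c] for c in range(i + 1, n))
        y[i] = s / M[i][i]
    return y


def damped_newton(F, x0, d=1.0, tol=1.0e-10, max_iter=50):
    """Repeat: solve J dx = -F(x), then update x <- x + d*dx."""
    x = list(x0)
    for iteration in range(max_iter):
        f = F(x)
        if max(abs(fi) for fi in f) < tol:
            return x, iteration
        J = numerical_jacobian(F, x)
        dx = solve_dense(J, [-fi for fi in f])
        x = [xi + d * dxi for xi, dxi in zip(x, dx)]
    raise RuntimeError("Newton iteration did not converge")


solution, iterations = damped_newton(residual, [1.0, 1.0])
```

Setting `d` below one shortens every update, which trades the fast convergence of the undamped iteration for protection against overshooting when the starting guess is poor.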

The convergence criterion for the Newton iteration is based upon a relative update defined by

$$ R_n^o = \max_i \left[ \frac{|\delta x_i^n|}{\max\{|x_i^n|, 1\}} \right] \qquad (11) $$

where $\delta x_i^n$ is the ith component of the Newton update, the superscript on $R_n^o$ refers to the outer Newton iteration, and the subscript indicates the dependence on the Newton iteration. Convergence is then assumed when $R_n^o < 1 \times 10^{-6}$. This means that six digits of accuracy are required when the magnitude of the state variable is greater than one, and six decimal places of accuracy are required when the magnitude of the state variable is less than one.
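The mixed relative and absolute character of this test can be illustrated with a short sketch. It assumes that the relative update divides each component of the Newton update by the larger of the variable's magnitude and one, which is the form implied by the description above; the function names and sample values are hypothetical.

```python
def relative_update(dx, x):
    """Largest value over all components of |dx_i| / max(|x_i|, 1)."""
    return max(abs(dxi) / max(abs(xi), 1.0) for dxi, xi in zip(dx, x))


def newton_converged(dx, x, tol=1.0e-6):
    """Outer convergence test: the relative update must fall below tol."""
    return relative_update(dx, x) < tol


# A large variable is tested relatively (2e-4 / 1000 = 2e-7),
# a small variable is tested absolutely (5e-7 / 1 = 5e-7).
x = [1000.0, 0.01]
dx = [2.0e-4, 5.0e-7]
measure = relative_update(dx, x)
```

Dividing by `max(|x_i|, 1)` keeps the test meaningful for variables that are close to zero, where a purely relative measure would never be satisfied.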

We employ a natural ('uvpT') ordering of the variables, where the u-momentum equation is solved for the u-velocity; the v-momentum equation is solved for the v-velocity; the continuity equation is solved for the pressure; and the energy equation is solved for the temperature. Cells are numbered from left to right and then bottom to top. This ordering and our finite volume differencing scheme result in a sparse banded Jacobian matrix. We exploit this sparse banded structure by storing only the non-zero diagonal bands of the Jacobian matrix.
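One way to picture this ordering is as a mapping from a cell and a variable to a position in the state vector. The sketch below assumes zero-based indices and four unknowns interleaved per cell in the order u, v, p, T; the function names are hypothetical and staggered-grid details at the boundaries are ignored.

```python
VARIABLES = ("u", "v", "p", "T")


def global_index(i, j, var, nx):
    """Index of variable `var` in cell (i, j) for an nx-wide grid.

    i counts cells from left to right and j from bottom to top, so cells
    are numbered row by row; the four unknowns of a cell are stored
    together in 'uvpT' order.
    """
    cell = j * nx + i
    return 4 * cell + VARIABLES.index(var)


def num_unknowns(nx, ny):
    """Total number of unknowns with four variables per cell."""
    return 4 * nx * ny


first_u = global_index(0, 0, "u", 10)        # 0
neighbor_T = global_index(1, 0, "T", 10)     # 7
row_above_u = global_index(0, 1, "u", 10)    # 40
```

Under these assumptions a cell couples to the cell above it at an index offset of `4 * nx`, which is what confines the nonzero entries to a limited set of diagonals.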


Standard Inexact Newton Iteration

An inexact Newton method is one in which Equation (8) is not solved exactly on each Newton iteration. Conjugate gradient-like iterative algorithms are used to solve Equation (8). The accuracy of the iterative solve is controlled by an inner convergence criterion similar to that proposed by Averick and Ortega [1] and Dembo. Specifically, the inner QMRCGS iteration is assumed converged when,

where the superscript on $R_n^i$ refers to the inner iteration, and the subscript indicates the dependence on the Newton iteration. The selection of the best value of $\gamma_n$ is based upon previous results [10,25]. When starting from a flat initial guess, $\gamma_n$ is allowed to vary on each Newton iteration according to

When starting from a reasonably good initial guess (i.e., the interpolated solution from a coarser grid), $\gamma_n$ is set equal to $10^{-2}$.
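A common way to express an inner convergence test of this kind is to compare the relative residual of the linear Newton system with the tolerance $\gamma_n$. The sketch below assumes that form, with a Euclidean norm; the paper's exact definition of $R_n^i$ may differ, and the matrix, vectors, and function names are hypothetical.

```python
import math


def norm2(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(vi * vi for vi in v))


def matvec(A, v):
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def relative_linear_residual(J, dx, F):
    """||J dx + F|| / ||F||: how well dx satisfies the Newton system J dx = -F."""
    r = [jd + f for jd, f in zip(matvec(J, dx), F)]
    return norm2(r) / norm2(F)


def inner_converged(J, dx, F, gamma):
    """Accept the approximate update once the relative residual is below gamma."""
    return relative_linear_residual(J, dx, F) < gamma


J = [[4.0, 1.0], [1.0, 3.0]]
F = [1.0, 2.0]
rough_dx = [-0.09, -0.64]      # near the exact update (-1/11, -7/11)
ratio = relative_linear_residual(J, rough_dx, F)
```

With a loose tolerance such as `gamma = 1e-2` the rough update above is accepted, while a tight tolerance would force further inner iterations; this is the trade-off that varying $\gamma_n$ between Newton iterations is meant to exploit.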

Matrix-Free Inexact Newton Iteration

The Krylov projection methods used to solve Equation (8) require the Jacobian matrix only in the form of matrix-vector products, which may be approximated by Equation (1). Use of this approximation leads to a matrix-free inexact Newton's method. Equation (1) indicates that the accuracy of the matrix-free approximation is strongly dependent upon the vector, v. Since this vector changes within the inner conjugate gradient-like iteration, the accuracy of the matrix-free approximation is subject to some uncertainty. In our matrix-free implementation, the perturbation constant, $\varepsilon$, is chosen as follows,

where $x_i$ is the ith component of the state vector of dimension n, and a is a perturbation constant whose magnitude is on the order of the square root of computer round-off.
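To make the role of these quantities concrete, the sketch below uses one plausible scaling for the perturbation: the constant a times a typical magnitude of the state vector, divided by the norm of v. This particular formula is an assumption chosen for illustration and is not claimed to be the expression used in the paper; only the ingredients (the components $x_i$, the dimension n, and a near the square root of round-off) come from the text.

```python
import math
import sys


def perturbation(x, v, a=None):
    """Illustrative perturbation: a * (mean |x_i| + 1) / ||v||_2 (assumed form)."""
    if a is None:
        # Square root of machine epsilon, about 1.5e-8 in double precision.
        a = math.sqrt(sys.float_info.epsilon)
    n = len(x)
    v_norm = math.sqrt(sum(vi * vi for vi in v))
    if v_norm == 0.0:
        raise ValueError("v must be a nonzero vector")
    typical_size = sum(abs(xi) for xi in x) / n + 1.0
    return a * typical_size / v_norm


x = [1.0, -2.0, 0.5]
v = [3.0, 4.0, 0.0]
eps_small_v = perturbation(x, v)
eps_large_v = perturbation(x, [1000.0 * vi for vi in v])
```

Dividing by the norm of v keeps the size of the actual perturbation `eps * v` independent of how v happens to be scaled, which matters when the vectors produced by the inner iteration vary widely in magnitude.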

Preconditioning

We employ right preconditioning in order to improve the performance of the conjugate gradient-like algorithms. Selection of an effective preconditioner is a very important, but sometimes difficult, task. In this paper the focus is on the performance of the standard versus the matrix-free implementations of inexact Newton's method. Therefore, we restrict our attention to incomplete lower-upper (ILU) factorization, specifically ILU(0) preconditioning [14,16]. This means that the preconditioning matrix assumes the same nonzero sparsity pattern as the Jacobian matrix. In our implementation, however, we take advantage of the banded structure of our Jacobian matrix and store non-zero diagonals. We assume then that our preconditioner has the same number of nonzero diagonals.

A difficulty that arises when the continuity equation is solved for pressure warrants a short discussion with regard to preconditioning. Because pressure does not explicitly appear in this equation, zeros appear on the main diagonal in every row of the Jacobian matrix representing the continuity equation. These zero diagonal entries reduce the number of effective preconditioners that can be derived from the Jacobian matrix. However, incomplete lower-upper (ILU) factorization can still be used as an effective preconditioner in this case because fill-in resulting from the incomplete factorization will generate non-zero entries in all the diagonal rows except those corresponding to a finite volume with faces adjacent to both left and bottom boundaries. In our model problem, the only cell with faces adjacent to both a left and a bottom boundary is located in the lower-left corner. We handle this problem by simply fixing the pressure to zero in that cell, which is justified for this model problem and incompressible flow because pressure is determined only up to an additive constant [6,27].
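The way fill-in repairs a zero diagonal entry can be seen on a tiny example. The sketch below is an illustration under stated assumptions: it keeps entries only on diagonals where the matrix has a nonzero (plus the main diagonal), which mirrors the diagonal-based storage described above, but it uses dense lists for readability and a hypothetical 3x3 matrix rather than an actual Jacobian.

```python
def ilu0_banded(A):
    """Incomplete LU restricted to the nonzero diagonals of A.

    Returns one matrix holding the unit lower factor L below the diagonal
    and the upper factor U on and above it. Entries on diagonals that are
    entirely zero in A are never created.
    """
    n = len(A)
    offsets = {0}
    for i in range(n):
        for j in range(n):
            if A[i][j] != 0.0:
                offsets.add(j - i)
    LU = [row[:] for row in A]
    for i in range(1, n):
        for k in range(i):
            if (k - i) not in offsets:
                continue
            if LU[k][k] == 0.0:
                raise ZeroDivisionError("zero pivot in row %d" % k)
            LU[i][k] /= LU[k][k]
            for j in range(k + 1, n):
                if (j - i) in offsets:
                    LU[i][j] -= LU[i][k] * LU[k][j]
    return LU


# The middle row has a zero on the main diagonal, as a continuity row would.
A = [[2.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 2.0]]
LU = ilu0_banded(A)
diagonal = [LU[i][i] for i in range(3)]
```

Here the zero in the middle of the diagonal is replaced by a nonzero pivot during elimination, because the row above contributes to it. A row with no such contribution, like the first row, keeps whatever value it started with, which is the situation the text describes for the lower-left corner cell.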

Results

The advantages and disadvantages of the matrix-free implementation are studied with respect to performance and robustness. The computer memory advantages of the matrix-free implementation are obvious. The potential performance advantage lies in reducing the CPU cost of forming and using the Jacobian matrix, without inhibiting or degrading convergence. Performance is studied using both Lanczos based iterative solvers (CGS, QMRCGS, and Bi-CGSTAB) and an Arnoldi based iterative solver (GMRES(20)). Note that in applications with a large number of equations, where the cost of forming the Jacobian matrix is a significant fraction of the total CPU time, the potential advantages of the matrix-free implementation are very appealing.


Table 1. Comparison of standard and matrix-free implementations on a 10x10 grid (m_max = 20).

Iterative   Standard Implementation                 Matrix-Free Implementation
Solver      n    \bar{m}  CPU Time  m_max hits      n    \bar{m}  CPU Time  m_max hits
CGS         7    7        2.3       0               9    9        6.4       2
QMRCGS      8    7        2.9       0               8    12       7.2       3
Bi-CGSTAB   8    7        2.5       0               7    8        4.4       1
GMRES(20)   9    7        2.8       0               9    7        3.5       0

Table 2. Comparison of standard and matrix-free implementations on a 20x20 grid (m_max = 40).

Iterative   Standard Implementation                 Matrix-Free Implementation
Solver      n    \bar{m}  CPU Time  m_max hits      n    \bar{m}  CPU Time  m_max hits
CGS         8    17       16.7      0
QMRCGS      8    21       24.4      0               10   34       85.4      6
Bi-CGSTAB   8    18       17.3      0               10   27       60.3      4
GMRES(20)   10   24       17.7      3               10   23       27.7      1

Table 3. Comparison of standard and matrix-free implementations on a 40x40 grid (m_max = 80).

Iterative   Standard Implementation                 Matrix-Free Implementation
Solver      n    \bar{m}  CPU Time  m_max hits      n    \bar{m}  CPU Time  m_max hits
CGS         9    55       188.2     3               No solution
QMRCGS      10   61       305.7     4               58   79       4313.4    57
Bi-CGSTAB   13   69       322.0     8               No solution
GMRES(20)

Table 4. Comparison of standard and matrix-free implementations on a 40x40 grid using a 10x10, 20x20, and 40x40 mesh sequence (m_max = 20, 40, and 80, respectively).

Iterative   Standard Implementation                 Matrix-Free Implementation
Solver      n    \bar{m}  CPU Time  m_max hits      n    \bar{m}  CPU Time  m_max hits
CGS         5    62       131.7     3               No solution
QMRCGS      5    70       195.1     4               29   80       2249.9    29
Bi-CGSTAB   7    71       196.6     8               28   79       1841.1    26
GMRES(20)   15   80       270.1     15              15   80       485.5     15

Tables 1 through 4 and Figures 2 through 5 show performance data for the steady state solution of the natural convection test problem with Ra = $10^4$ and Pr = 0.71. All computations were run on an IBM RISC/6000 model 320 workstation. Calculations were initiated from a flat initial guess (i.e., u = v = 0, T = 0.5). Note that data obtained for $u_{max}$ and $v_{max}$ on the 40x40 grid differed from the benchmark solution of De Vahl Davis by less than 1%. The tables above present the required number of Newton iterations (n), the average number of inner iterations per Newton iteration ($\bar{m}$), the required CPU time, and the number of times the maximum inner iteration limit ($m_{max}$) was encountered. The maximum inner iteration limit was set equal to the square root of the number of unknowns. Note that the implementation used to generate these results is not practical in the sense that a new ILU(0) preconditioner was formed on each Newton iteration, which in turn requires the formation of the Jacobian matrix. A more practical implementation might use the same preconditioner for several Newton iterations, or a less expensive preconditioner that does not require forming the complete Jacobian matrix. However, the purpose of this article is to investigate the effects of the matrix-free approximation. With this goal in mind, a more practical implementation was not necessary, and so ILU(0) was selected as the only preconditioner. Consequently, CPU times should not be used as a basis for comparing the two implementations, but rather as a basis for comparing the performance of the different iterative solvers. Convergence behavior is used as a basis of comparison for the standard and matrix-free implementations.

Table 1 presents performance data for a coarse 10x10 grid solution. Corresponding convergence plots are shown for the standard and matrix-free implementations in Figures 2 and 3, respectively. These figures plot the maximum relative Newton update, $R_n^o$, as a function of the Newton iteration count. The performance of the different iterative solvers is similar for both the standard and the matrix-free implementations on this coarse grid.

Tables 2 and 3 investigate the effect of grid refinement. Use of the Lanczos based iterative algorithms with the matrix-free approximation led to a marked degradation in performance as the grid was refined, while the Arnoldi based method (GMRES) performed similarly for both implementations. In fact, for the 40x40 grid (Table 3) no solutions were obtained using the matrix-free approximation with CGS or Bi-CGSTAB. The use of mesh sequencing in Table 4 led to a solution using Bi-CGSTAB, but still did not enable a solution using CGS.

Convergence plots for the 40x40 grid solutions in Table 4 are shown in Figures 4 and 5. Figure 6 is a plot of $R_n^i$ versus the inner iteration number for the first Newton iteration on the 40x40 grid corresponding to the standard implementation solutions in Table 4. Note that the first iteration was chosen because then each of the iterative algorithms is solving roughly the same linear system. Note from Figure 4 the relatively slow convergence obtained using GMRES(20) for the standard implementation. This behavior follows from the choice for the dimension of our Krylov subspace (k = 20). Twenty iterations were not sufficient to satisfy the inner iteration convergence criterion (Equation (13)). This necessitated periodic algorithm restarts, which in turn slowed the convergence of the GMRES algorithm. This observation is evidenced by the large number of $m_{max}$ hits encountered in Tables 3 and 4, and the convergence flattening trend shown in Figure 6 for the GMRES(20) curve. GMRES(20) convergence is excellent during the first 20 iterations, but thereafter begins to flatten or stall as more periodic algorithm restarts are needed. Increasing the dimension of the Krylov subspace would improve performance, but it would also further increase algorithm memory requirements.

Several additional observations can be gleaned from Figure 6. Notice the rather erratic convergence behavior of the CGS algorithm that was alluded to previously. In spite of this behavior, the tabulated data show that when the CGS algorithm converged it was more CPU efficient than the other algorithms. Figure 6 shows that for this problem, QMRCGS is more successful than Bi-CGSTAB at controlling the erratic CGS convergence behavior, but at a noticeably higher CPU cost, as can be seen in the tabulated results. The QMRCGS convergence curve tends to temporarily stall or flatten out when CGS is displaying very erratic convergence behavior. The Bi-CGSTAB convergence curve, although more controlled than the CGS curve, still exhibits some erratic convergence behavior.

These observations may lend some insight into the relatively poor performance of the matrix-free implementation when the Lanczos based methods are used. This performance is illustrated in Figure 5, which once again corresponds to the solutions presented in Table 4. Recall that no solution was obtained using the CGS algorithm with the matrix-free approximation. In the case of GMRES(20), replacing the standard implementation with the matrix-free approximation resulted in nearly identical convergence behavior. This suggests that Equation (1) yielded acceptable approximations for the needed matrix-vector products. In contrast, the convergence behavior using QMRCGS and Bi-CGSTAB degraded appreciably when the standard implementation was replaced with the matrix-free approximation. In an attempt to relate this behavior to the observations cited above, consider the convergence behavior of the iterative solvers as shown in Figure 7 for the first Newton iteration. The erratic convergence behavior of CGS coupled with the use of Equation (1) results in very poor approximations for the needed matrix-vector products. The CGS algorithm could not recover from the erratic jumps and eventually returned a very bad Newton update that led to divergence. Once again, the QMRCGS algorithm is observed to stall when the CGS iteration is behaving badly. During this Newton iteration, QMRCGS stalls with a value of $R_n^i$ near one. This behavior, although resulting in poor convergence, does not cause divergence of the algorithm. During this first Newton iteration it is fortuitous that Bi-CGSTAB converged, because during later iterations it most often encountered the $m_{max}$ limit, as shown in Table 4. In addition, note that a solution could not be obtained using the matrix-free approximation with Bi-CGSTAB on the 40x40 grid starting from a flat initial guess. In that case, behavior similar to that of CGS over several Newton iterations led to divergence.

Recall that the accuracy of the matrix-free approximation in Equation (1) is dependent upon the vector, v. In the case of the Lanczos based algorithms, the characteristics of this vector may vary wildly, as evidenced by the sometimes erratic CGS convergence behavior. In the case of GMRES, however, an orthonormal basis is constructed for the Krylov subspace so that only normalized vectors appear in matrix-vector products. Presumably, this feature enables Equation (1) to generate acceptable approximations for the required matrix-vector products needed within the GMRES algorithm.

Figure 2. Inexact Newton iteration convergence behavior on a 10x10 grid using the standard implementation.

Figure 3. Inexact Newton iteration convergence behavior on a 10x10 grid using the matrix-free implementation.

Figure 4. Inexact Newton iteration convergence behavior on a 40x40 grid using the standard implementation.

Figure 5. Inexact Newton iteration convergence behavior on a 40x40 grid using the matrix-free implementation (CGS: no solution).

Summary and Conclusions

Figure 6. Inner iteration convergence behavior for the first Newton iteration on a 40x40 grid using the standard implementation.

Figure 7. Inner itcration convcrgcncc bchavior for thc first Ncwton ilcralion on a 40x40 grid using thc matrix- frec implementation.

Inexact Newton algorithms were used to solve the well known problem of natural convection in an enclosed square cavity, which is assumed to be governed by the incompressible Navier-Stokes and energy equations. Coupling between the momentum and energy equations occurred through the buoyancy force terms in the momentum equations using the Boussinesq approximation [23]. These equations were solved with Ra = 10^4 and Pr = 0.71 on several staggered, finite volume grids of increasing refinement. Several conjugate gradient-like algorithms were selected from a class of algorithms based upon the Lanczos biorthogonalization process. These included CGS, QMRCGS, and Bi-CGSTAB. A fourth algorithm was based upon the Arnoldi process, namely the restarted GMRES algorithm. We chose the dimension of the Krylov subspace for the restarted GMRES algorithm to be 20. Right ILU(0) preconditioning was used to improve the performance of the iterative solvers. Both standard and matrix-free implementations were investigated.
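For reference, right preconditioning of the Newton correction equation can be written as below. The notation is assumed here for illustration (J for the Jacobian, F for the nonlinear residual, M for the ILU(0) preconditioner, δu for the Newton update, ε for a small perturbation), and the finite-difference expression is the commonly used form rather than a transcription of Equation (1).

```latex
J\,\delta u = -F(u)
\quad\Longrightarrow\quad
\left(J M^{-1}\right) y = -F(u), \qquad \delta u = M^{-1} y,
\qquad
J M^{-1} v \;\approx\; \frac{F\!\left(u + \epsilon\, M^{-1} v\right) - F(u)}{\epsilon}.
```

In this form the Krylov solver iterates on y, and in a matrix-free setting the vector that enters the difference quotient is the preconditioned vector M^{-1} v rather than v itself.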

In general, GMRES(20) outperformed the Lanczos based methods when the matrix-free approximation was used. GMRES was able to maintain an acceptable level of performance when the standard implementation was replaced with the matrix-free approximation. In contrast, the Lanczos based methods were not able to maintain the same level of performance. Among these methods, CGS was found to be poorly suited to matrix-free implementations of inexact Newton's method because of its erratic convergence behavior. QMRCGS and Bi-CGSTAB performed better than CGS because of their smoother convergence behavior, but still suffered a notable drop in performance when the matrix-free approximation was used.

Standard implementations using the Lanczos based algorithms seemed to outperform the standard implementation using GMRES(20) when the grid was refined (number of unknowns increased). Convergence of the GMRES(20) algorithm was not ensured within 20 iterations, the selected dimension of the Krylov subspace. Consequently, periodic algorithm restarts were necessary, leading to slower convergence. The GMRES(20) iteration frequently encountered the upper limit for the number of inner iterations on the finest grid. This resulted in the return of mediocre Newton updates and slower convergence of the outer Newton iteration compared with the use of the Lanczos based methods.
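The restart mechanism behind this behavior can be sketched in a few lines. The following is a minimal NumPy illustration of restarted GMRES(m), not the SPARSKIT routine used in this work. It omits preconditioning, and the model matrix, function names, tolerance, and cycle limit are assumptions made for the example.

```python
import numpy as np

def model_problem(n=50, c=0.5):
    """Hypothetical nonsymmetric tridiagonal test matrix and right-hand side."""
    A = (4.0 * np.eye(n)
         + np.diag(np.full(n - 1, -1.0 - c), -1)
         + np.diag(np.full(n - 1, -1.0 + c), 1))
    return A, np.ones(n)

def gmres_restarted(A, b, m=20, tol=1e-8, max_cycles=200):
    """Restarted GMRES(m) sketch; returns (x, cycles_used, converged)."""
    n = b.size
    x = np.zeros(n)
    b_norm = np.linalg.norm(b)
    for cycle in range(1, max_cycles + 1):
        r = b - A @ x                      # true residual at each restart
        beta = np.linalg.norm(r)
        if beta <= tol * b_norm:
            return x, cycle - 1, True
        # Arnoldi process: orthonormal basis V for the Krylov subspace.
        V = np.zeros((n, m + 1))
        H = np.zeros((m + 1, m))
        V[:, 0] = r / beta
        k = m
        for j in range(m):
            w = A @ V[:, j]                # V[:, j] always has unit norm
            for i in range(j + 1):         # modified Gram-Schmidt
                H[i, j] = V[:, i] @ w
                w -= H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] < 1e-14:        # breakdown: subspace is invariant
                k = j + 1
                break
            V[:, j + 1] = w / H[j + 1, j]
        # Minimize ||beta*e1 - H y|| over the k-dimensional subspace.
        e1 = np.zeros(k + 1)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:k + 1, :k], e1, rcond=None)
        x = x + V[:, :k] @ y               # basis is discarded after this update
    converged = np.linalg.norm(b - A @ x) <= tol * b_norm
    return x, max_cycles, converged

A, b = model_problem()
for m in (2, 5, 20):
    _, cycles, ok = gmres_restarted(A, b, m=m)
    print(f"GMRES({m}): restart cycles = {cycles}, converged = {ok}")
```

Each restart throws away the accumulated Krylov basis, so a small m typically needs more cycles than a large one. The sketch also shows that every vector passed to the matrix-vector product has unit norm, which is the property credited above for the robustness of GMRES in the matrix-free setting.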


Acknowledgements

This work was supported through the EG&G Idaho Long Term Research Initiative in Computational Mechanics under DOE Idaho Field Office Contract DE-AC07-76ID01570. The authors thank Peter Brown for suggesting the use of SPARSKIT [29] and Youcef Saad for granting permission to use the GMRES algorithm from that package.

References

1. Averick, B. M. and Ortega, J. M., "Solutions of nonlinear Poisson-type equations," Applied Numerical Mathematics 8, 443-455 (1991).

2. Dembo, R. et al., "Inexact Newton methods," SIAM J. Numer. Anal. 19, 400-408 (1982).

3. Einset, E. O. and Jensen, K. F., "A Finite Element Solution of Three-Dimensional Mixed Convection Gas Flows in Horizontal Channels Using Preconditioned Iterative Matrix Methods," Int. J. Numer. Meth. in Fluids 14, 817-841 (1992).

4. Dahl, O. and Wille, S. O., "An ILU Preconditioner with Coupled Node Fill-in for Iterative Solution of the Mixed Finite Element Formulation of the 2D and 3D Navier-Stokes Equations," Int. J. Numer. Meth. Fluids 15, 525-534 (1992).

5. McHugh, P. R. and Knoll, D. A., "Fully Implicit Solution of the Benchmark Backward Facing Step Problem Using Finite Volume Differencing and Inexact Newton's Method," ASME HTD-Vol. 222, p. 77, 1992 ASME Winter Annual Meeting, Anaheim, CA, Nov. 8-13, 1992.

6. Chin, P., et al., "Preconditioned Conjugate Gradient Methods for the Incompressible Navier-Stokes Equations," Int. J. Num. Meth. Fluids 15, 273-295 (1992).

7. Ern, A., et al., "Towards Polyalgorithmic Linear System Solvers for Nonlinear Elliptic Problems," presented at the Copper Mountain Conference on Iterative Methods, Copper Mountain, Colorado, April 9-14, 1992.

8. Gear, C. W. and Saad, Y., "Iterative Solution of Linear Equations in ODE Codes," SIAM J. Sci. Stat. Comput. 4, 583-601 (1983).

9. Brown, P. N. and Hindmarsh, A. C., "Matrix-Free Methods for Stiff Systems of ODE's," SIAM J. Numer. Anal. 23, 610-638 (1986).

10. Brown, P. N. and Saad, Y., "Hybrid Krylov Methods for Nonlinear Systems of Equations," SIAM J. Sci. Stat. Comput. 11, 450-481 (1990).

11. Saad, Y. and Schultz, M. H., "Conjugate Gradient-Like Algorithms for Solving Nonsymmetric Linear Systems," Mathematics of Computation 44, No. 170, 417-424 (1985).

12. Ashby, S. F. et al., "A Taxonomy for Conjugate Gradient Methods," SIAM J. Numer. Anal. 27, 1542-1568 (1990).

13. Faber, V. and Manteuffel, T., "Necessary and Sufficient Conditions for the Existence of a Conjugate Gradient Method," SIAM J. Numer. Anal. 21, 352 (1984).

14. Ashby, S. et al., Preconditioned Polynomial Iterative Methods, A Tutorial, University of Colorado at Denver, April 7-8, 1992.

15. Saad, Y. and Schultz, M. H., "GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems," SIAM J. Sci. Stat. Comput. 7, 856 (1986).

16. Freund, R. W. et al., Iterative Solution of Linear Systems, Numerical Analysis Project, Computer Science Department, Stanford University, Manuscript NA-91-05, November 1991.

17. Lanczos, C., "Solution of Systems of Linear Equations by Minimized Iterations," J. Res. Natl. Bur. Stand. 49, 33-53 (1952).

18. Fletcher, R., "Conjugate Gradient Methods for Indefinite Systems," in Proc. Dundee Conference on Numerical Analysis, 1975, Lecture Notes in Mathematics 506, G. A. Watson, ed., Springer-Verlag, Berlin, pp. 73-89 (1976).

19. Van der Vorst, H. A., "Bi-CGSTAB: A Fast and Smoothly Converging Variant of Bi-CG for the Solution of Nonsymmetric Linear Systems," SIAM J. Sci. Stat. Comput. 13, 631-644 (1992).

20. Sonneveld, P., "CGS, a Fast Lanczos-type Solver for Nonsymmetric Linear Systems," SIAM J. Sci. Stat. Comput. 10, 36-52 (1989).

21. Freund, R. W., "A Transpose-Free Quasi-Minimal Residual Algorithm for non-Hermitian Linear Systems," SIAM J. Sci. Comput. 14, 470-482 (1993).

22. Tong, C. H., A Comparative Study of Preconditioned Lanczos Methods for Nonsymmetric Linear Systems, Sandia National Laboratories Report, SAND91-8240, UC-404, January 1992.

23. Landau, L. D. and Lifshitz, E. M., Fluid Mechanics, 2nd ed., 6, 1987.

24. Patankar, S. V., Numerical Heat Transfer and Fluid Flow, Hemisphere, New York, 1980.

25. McHugh, P. R. and Knoll, D. A., "Fully Implicit Finite Volume Solutions of the Incompressible Navier-Stokes and Energy Equations Using an Inexact Newton Method," (submitted to Int. J. Num. Meth. Fluids).

26. Sangback, M. and Chronopoulos, A. T., "Implementation of Iterative Methods for Large Sparse Nonsymmetric Linear Systems on A Parallel Vector Machine," Int. J. Supercomputer Applications 4, 9-24 (1990).

27. Gresho, P. M., "Some Current CFD Issues Relevant to the Incompressible Navier-Stokes Equations," Computer Meth. in Appl. Mech. and Eng. 87, 201-252 (1991).

28. De Vahl Davis, G., "Natural Convection of Air in a Square Cavity: A Benchmark Numerical Solution," Int. J. Num. Meth. Fluids 3, 249-264 (1983).

29. Saad, Y., SPARSKIT, A Basic Tool Kit for Sparse Matrix Computations, RIACS Technical Report 90.20, 1991.

