Chapter 3

Methods for the Solution of Linear Systems Deriving from Elliptic PDEs
A finite volume discretisation of the elliptic PDEs (Partial Differential Equations) used to describe a fluid mechanics problem generates a large sparse set of linear equations. Typically a CFD algorithm will involve the repetitive solution of a Poisson pressure equation along with scalar transport equations for momentum, enthalpy, concentration, and any other fields of interest. Normally the program will spend most of its execution time in solving these linearised equations, and so the efficiency of the linear solvers underpins the efficiency of the solution method as a whole. A crucial aspect of any efficient solution of a fluid mechanics problem is therefore the speed with which these linear(ised) equations can be solved.
In this chapter a number of different algorithms for the solution of linear equations are discussed and their resource use is compared, both in terms of speed and memory use. Whilst there are many papers comparing two or three linear solvers, comparisons of several classes of linear solver are rare in the literature. Ferziger and Peric's book [43] compares a number of methods, but the comparisons are made in terms of the number of iterations to converge instead of the time to converge, a rather meaningless measure given the variation in computational effort per iteration amongst the solvers. Botta et al [14] compare a number of methods, but the solvers were written by different groups, and so there is the possibility that some of the variation in performance is due to different coding standards. In both cases the number of methods covered is less than in the current study.
This chapter is divided into two parts; the first describes the linear solvers, whilst the second discusses their suitability for the solution of the equations that result from a finite volume discretisation of elliptic PDEs.
3.1 A Description of the Linear Solvers
Methods for solving linear equations can be divided into two classes: direct methods (i.e., those which execute in a predetermined number of operations) and iterative methods (i.e., those which attempt to converge to the desired answer in an unknown number of repeated steps). Direct methods are often used for small dense problems, but for the large sparse problems which are typically encountered in the solution of PDEs, iterative methods are usually more efficient.

The direct methods which are discussed here are Gauss-Jordan elimination, LU (Lower Upper) factorisation, Cholesky factorisation, LDL (Lower Diagonal Lower-transpose) decomposition, the Thomas tridiagonal algorithm, and block tridiagonal solvers. The iterative methods fall into four classes: simple iterative methods, incomplete factorisation methods, Krylov space methods, and multigrid methods.

The distinction between the classes of iterative solvers blurs somewhat, since multigrid methods can be considered as an acceleration technique to improve the performance of the simple iterative and incomplete factorisation methods, and Krylov space methods can use the simple iterative and incomplete factorisation methods as preconditioners. A Krylov space method using an incomplete factorisation smoothed multigrid preconditioner is a solver that combines three of the classes of iterative solver.
3.1.1 Linear Equations Resulting from a Finite Volume Discretisation of a PDE
The general form of a set of linear equations can be written

    A x = b,    (3.1)

or, considering an individual equation,

    Σ_j a_ij x_j = b_i.    (3.2)

For a finite volume discretisation of a three dimensional elliptic PDE, the matrix A will typically take on a hepta-diagonal structure, with the non-zero components occupying only seven diagonals of the matrix. For a two dimensional PDE there will be only five diagonals which are non-zero, and for a one dimensional PDE there are three non-zero diagonals. This regular structure enables a considerable reduction in memory use and in the number of operations performed, since only these diagonals need to be stored and operated upon. Using the compass notation of the equations discussed previously in Section 2.1, the above linear equation becomes

    a_P φ_P + a_E φ_E + a_W φ_W = b_P    (3.3)

for a discretisation of a one dimensional PDE,

    a_P φ_P + a_E φ_E + a_W φ_W + a_N φ_N + a_S φ_S = b_P    (3.4)

for a discretisation of a two dimensional PDE, and

    a_P φ_P + a_E φ_E + a_W φ_W + a_N φ_N + a_S φ_S + a_T φ_T + a_B φ_B = b_P    (3.5)

for a three dimensional PDE. The subscript P refers to the point at which the equation is centred, and the E, W, N, S, T and B subscripts refer to the neighbouring East, West, North, South, Top and Bottom points.

For some systems the equations are symmetric,

    a_ij = a_ji.    (3.7)
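The diagonal storage described above can be illustrated with a short sketch (an illustration only — the solvers in this study were written in Fortran 90; the coefficients below are those of a 5-point Laplacian stencil on a uniform mesh, with the boundary neighbours folded out of the system):

```python
def assemble_2d_laplacian(nx, ny):
    """Compass coefficients for a 5-point finite volume Laplacian on a
    uniform nx-by-ny mesh (unit spacing, Dirichlet boundaries folded
    into the source).  Point (i, j) is stored at index i + j*nx."""
    n = nx * ny
    aP = [4.0] * n
    aE = [-1.0] * n
    aW = [-1.0] * n
    aN = [-1.0] * n
    aS = [-1.0] * n
    for j in range(ny):
        for i in range(nx):
            p = i + j * nx
            if i == nx - 1: aE[p] = 0.0   # no East neighbour on the boundary
            if i == 0:      aW[p] = 0.0
            if j == ny - 1: aN[p] = 0.0
            if j == 0:      aS[p] = 0.0
    return {"P": aP, "E": aE, "W": aW, "N": aN, "S": aS}

def matvec(coef, x, nx, ny):
    """y = A x using only the five stored diagonals."""
    y = [0.0] * (nx * ny)
    for j in range(ny):
        for i in range(nx):
            p = i + j * nx
            s = coef["P"][p] * x[p]
            if i < nx - 1: s += coef["E"][p] * x[p + 1]
            if i > 0:      s += coef["W"][p] * x[p - 1]
            if j < ny - 1: s += coef["N"][p] * x[p + nx]
            if j > 0:      s += coef["S"][p] * x[p - nx]
            y[p] = s
    return y
```

Only the five coefficient arrays are stored, so a matrix-vector product costs O(5n) operations and O(5n) words rather than the O(n²) of a dense representation.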
3.1.2 Direct Methods

LU (Lower Upper) Factorisation
A matrix A can be factored into lower and upper triangular components such that

    A = L U.    (3.8)

This decomposition can be used to solve the equation

    A x = (L U) x = L (U x) = b    (3.9)

by first solving for the vector y such that

    L y = b    (3.10)

and then solving

    U x = y.    (3.11)
The only problem remaining is factoring A into L and U, a process that can be performed using Crout's algorithm. Taking L to have a unit diagonal, then for each column j = 1, 2, ..., n of the matrix,

    u_ij = a_ij − Σ_{k=1}^{i−1} l_ik u_kj,    i = 1, 2, ..., j
    l_ij = ( a_ij − Σ_{k=1}^{j−1} l_ik u_kj ) / u_jj,    i = j+1, ..., n.    (3.14)

The algorithm for factoring the A array into its L and U components is given in Figure 3.1, with the solution of the system (L U) x = b being given in Figure 3.2.

    for j = 1, 2, ..., n
        for i = 1, 2, ..., j
            u_ij = a_ij − Σ_{k<i} l_ik u_kj
        for i = j+1, ..., n
            l_ij = ( a_ij − Σ_{k<j} l_ik u_kj ) / u_jj

Figure 3.1: Factoring A into L U.

    for i = 1, 2, ..., n
        y_i = b_i − Σ_{k<i} l_ik y_k
    for i = n, n−1, ..., 1
        x_i = ( y_i − Σ_{k>i} u_ik x_k ) / u_ii

Figure 3.2: Solving the system (L U) x = b.
For a system of n equations (which would correspond to the n points on a finite volume mesh) the storage requirement for an LU factorisation is n² + n words, whilst the number of operations is of O(n³) for the factorisation, and O(n²) for the solution. For a sparse system of equations such a scheme is rather inefficient, since n is likely to be large and most of A is zero. A more efficient band diagonal version, where the array is stored as a band only wide enough to hold the farthest off-diagonal band, is implemented in Press et al [133].
For symmetric matrices, where A = Aᵀ, advantage can be taken of the symmetry of the system. The Cholesky factorisation A = L Lᵀ (sometimes referred to as the square root of the matrix) halves the storage required for the factors, whilst the LDL (Lower Diagonal Lower-transpose) decomposition A = L D Lᵀ in addition avoids the n square root operations of the Cholesky factorisation, a square root being a slow operation on many computers.
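A minimal sketch of both symmetric factorisations (dense Python, assuming A is symmetric positive definite):

```python
import math

def cholesky(a):
    """Lower triangular L with A = L L^T."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            # n square roots in total, one per diagonal element
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def ldl(a):
    """(L, d) with A = L D L^T, L unit lower triangular, D = diag(d).
    The square roots of the Cholesky factorisation are avoided."""
    n = len(a)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = a[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            L[i][j] = (a[i][j] - sum(L[i][k] * L[j][k] * d[k] for k in range(j))) / d[j]
    return L, d
```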
Thomas Tridiagonal Solver
For the tridiagonal systems resulting from the discretisation of one dimensional PDEs (equation (3.3)), the LU factorisation reduces to the Thomas algorithm. Writing the system as

    l_i x_{i−1} + d_i x_i + u_i x_{i+1} = b_i,    i = 1, 2, ..., n    (3.15)

the solution requires only a forward and a backward substitution; the algorithm is given in Figure 3.3.

    u′_1 = u_1 / d_1
    b′_1 = b_1 / d_1
    for i = 2, 3, ..., n
        m = d_i − l_i u′_{i−1}
        u′_i = u_i / m
        b′_i = ( b_i − l_i b′_{i−1} ) / m
    x_n = b′_n
    for i = n−1, n−2, ..., 1
        x_i = b′_i − u′_i x_{i+1}

Figure 3.3: The Thomas Tridiagonal Algorithm.
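Figure 3.3 translates directly into code (a sketch; sub[0] and sup[n−1] are unused, and the system is assumed to need no pivoting, as is the case for the diagonally dominant systems considered here):

```python
def thomas(sub, diag, sup, rhs):
    """Solve the tridiagonal system sub[i]*x[i-1] + diag[i]*x[i] +
    sup[i]*x[i+1] = rhs[i] in O(n) operations: one forward elimination
    sweep followed by one backward substitution."""
    n = len(diag)
    cp = [0.0] * n                 # modified super-diagonal
    dp = [0.0] * n                 # modified right hand side
    cp[0] = sup[0] / diag[0]
    dp[0] = rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - sub[i] * cp[i - 1]
        cp[i] = sup[i] / m
        dp[i] = (rhs[i] - sub[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[n - 1] = dp[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

Both the storage and the operation count are linear in n, which is why the tridiagonal solver is the benchmark the block solvers below are measured against.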
Block Tridiagonal Solvers
For a finite volume discretisation of a two dimensional PDE as described by equation (3.4), the matrix structure can be viewed as block tridiagonal,

    B_i x_{i−1} + A_i x_i + C_i x_{i+1} = b_i,    i = 1, 2, ..., m    (3.16)

where x_i and b_i are the vectors of unknowns and sources for the i-th mesh line, the submatrices A_i are tridiagonal, containing the a_P, a_E and a_W coefficients of that line, and the submatrices B_i and C_i are diagonal, containing the a_S and a_N coefficients. In a manner similar to the Thomas tridiagonal algorithm these equations can be solved by a forward and a backwards block substitution. A similar block structure can be used to solve three dimensional PDEs.
The matrix inverses used in Figure 3.4, which contains the block tridiagonal algorithm, are purely for notational purposes. Instead of calculating an inverse and performing a matrix multiplication, each system of the form A_i Γ_i = C_i can be solved by factoring A_i into its L and U components and substituting for each column of the right hand side.

    Γ_1 = A_1⁻¹ C_1
    y_1 = A_1⁻¹ b_1
    for i = 2, 3, ..., m
        M = A_i − B_i Γ_{i−1}
        Γ_i = M⁻¹ C_i
        y_i = M⁻¹ ( b_i − B_i y_{i−1} )
    x_m = y_m
    for i = m−1, m−2, ..., 1
        x_i = y_i − Γ_i x_{i+1}

Figure 3.4: The Block Tridiagonal Algorithm.
The two dimensional version of the solver uses n^(3/2) words of storage and takes O(n²) operations (assuming a square mesh), whilst the three dimensional version uses n^(5/3) words of storage and O(n^(7/3)) operations. Whilst this is better than LU factorisation, it is not as good as the one dimensional tridiagonal solver, and is less efficient than most iterative schemes. However for cases where one dimension of the problem is much greater than the others the dense submatrices can be made smaller and the efficiency of the method (both in terms of storage and number of operations) can be greatly improved¹.
3.1.3 Iterative Methods

The four classes of iterative methods discussed in this chapter are simple iterative methods, such as Jacobi iteration; incomplete factorisation schemes, such as SIP (the Strongly Implicit Procedure, also known as Stone's method); Krylov space methods, such as the Conjugate Gradient method; and multigrid schemes.
¹For an n × n × m three dimensional mesh, the storage required is of order m n⁴ words, and the operation count of order O(m n⁶). By changing from a cubic mesh to one where m is much smaller than n, the storage requirement and the operation count are both reduced accordingly.
Convergence of an Iterative Scheme

Some measure must be made of the fitness of the approximate solutions so that the decision to terminate the solver's iterative loop can be made. If the solution at iteration k is x^k, then the error at iteration k is e^k = x − x^k, where x is the exact solution. Of course the exact solution is unknown, and thus so is the error. However we can easily calculate the residual at any step, and the residual is proportional to the error, so if the residual decreases by a factor of 10⁶ so does the error. The residual of the system (3.1) is defined by

    r^k = b − A x^k.    (3.22)

Most iterative linear solvers include a calculation of the residual (or a close approximation) as part of the solution algorithm, and so with these methods the convergence of the solver can be monitored at no extra computational cost.
A scalar measure of the residual vector r's length is given by its norm. A family of norms is given by

    ‖r‖_p = ( Σ_{i=1}^{n} |r_i|^p )^(1/p).    (3.23)

The one, two and infinity norms are

    ‖r‖_1 = Σ_{i=1}^{n} |r_i|,    (3.24)

    ‖r‖_2 = ( Σ_{i=1}^{n} r_i² )^(1/2),    (3.25)

and

    ‖r‖_∞ = max_{i=1,...,n} |r_i|.    (3.26)

For the tests described in the latter sections of this chapter the infinity norm was used as the measure of convergence.
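The norm family (3.23)-(3.26) is a few lines of code (a sketch; here p=None selects the infinity norm):

```python
def norm(r, p=None):
    """p-norm of the residual vector r; p=None gives the infinity norm
    used as the convergence measure in this chapter."""
    if p is None:
        return max(abs(ri) for ri in r)
    return sum(abs(ri) ** p for ri in r) ** (1.0 / p)
```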
The Residual Form of the Equations

If δ is the difference between successive solutions by an iterative method, δ = x^{k+1} − x^k, then the system

    A x = b    (3.27)

can be rewritten as

    A ( x^k + δ ) = b,
    A δ = b − A x^k.    (3.28)

The right hand side of the last equation in (3.28) is the residual of the k-th iteration, and thus we can rewrite equation (3.27) in its residual form,

    r^k = b − A x^k,
    A δ = r^k,    (3.29)
    x^{k+1} = x^k + δ.
Solving the equations in their residual form can also improve the numerical behaviour of a solver, since it avoids the subtraction of terms where differences can be several orders of magnitude smaller than the terms themselves.
3.1.4 Simple Iterative Methods

These methods are based on a simple update formula which is applied iteratively to the system to be solved. Jacobi's method, Gauss-Seidel, Successive Over Relaxation (SOR), Symmetric Successive Over Relaxation (SSOR) and Red-Black Successive Over Relaxation (RBSOR) are all linear solvers of this form. These methods are the simplest to implement, but are typically the slowest to converge to a solution. However they can be effectively used as preconditioners for Krylov space methods, or as smoothers for multigrid schemes.
Jacobi's Method

Each equation of the system can be rearranged to express the central unknown in terms of the remaining terms. This suggests an iterative method defined by

    x_i^k = ( b_i − Σ_{j≠i} a_ij x_j^{k−1} ) / a_ii    (3.33)

where the terms on the right hand side of the equation are all from the previous iteration. The algorithm is given in Figure 3.6.

    set k = 1
    while k ≤ k_max and ε > ε_tol
        δ_i = ( b_i − Σ_{j≠i} a_ij x_j^{k−1} ) / a_ii − x_i^{k−1}
        x_i^k = x_i^{k−1} + ω δ_i
        ε = ‖δ‖_∞
        k = k + 1

Figure 3.6: The Jacobi method.
The Jacobi method requires n words of additional storage, to hold the update δ.
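The loop of Figure 3.6 can be sketched for a dense system (an illustration only, including the relaxation factor ω and the infinity-norm stopping test on the update):

```python
def jacobi(a, b, tol=1e-6, kmax=10000, omega=1.0):
    """Relaxed Jacobi iteration.  Every term on the right hand side uses
    only values from the previous sweep, so the update order is
    irrelevant and the loop vectorises or parallelises trivially."""
    n = len(b)
    x = [0.0] * n
    for _ in range(kmax):
        delta = []
        for i in range(n):
            s = sum(a[i][j] * x[j] for j in range(n) if j != i)
            delta.append((b[i] - s) / a[i][i] - x[i])
        x = [x[i] + omega * delta[i] for i in range(n)]
        if max(abs(d) for d in delta) < tol:   # infinity norm of the update
            break
    return x
```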
Successive Over Relaxation (SOR)

The Gauss-Seidel method uses the values already updated in the current sweep as soon as they become available, and SOR accelerates it with a relaxation factor ω,

    x_i^k = x_i^{k−1} + ω [ ( b_i − Σ_{j<i} a_ij x_j^k − Σ_{j>i} a_ij x_j^{k−1} ) / a_ii − x_i^{k−1} ].    (3.34)

The equations now contain a data dependency, with the new value of x_i^k depending on the updated values from the previous equations. This limits the ability to parallelise or vectorise the algorithm. The dependency can be removed by changing the order in which the equations are solved (such as is done with RBSOR) but this in turn will affect the rate of convergence, typically for the worse.
    set k = 1
    while k ≤ k_max and ε > ε_tol
        for i = 1, 2, ..., n
            δ = ( b_i − Σ_{j<i} a_ij x_j − Σ_{j>i} a_ij x_j ) / a_ii − x_i
            x_i = x_i + ω δ
        ε = ‖δ‖_∞
        k = k + 1

Figure 3.7: The SOR algorithm.
The choice of the value of ω is crucial to the convergence of the SOR method. A theorem due to Kahan [74] shows that SOR fails to converge if ω is outside the interval (0, 2). If the relaxation term ω = 1 then SOR reduces to the Gauss-Seidel method.
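A corresponding sketch of Figure 3.7 (the in-place update of x is what creates the data dependency discussed above; ω = 1 gives Gauss-Seidel):

```python
def sor(a, b, omega=1.2, tol=1e-6, kmax=10000):
    """Successive Over Relaxation for a dense system.  Each equation
    uses the values already updated in the current sweep, so the result
    depends on the ordering of the loop over i."""
    n = len(b)
    x = [0.0] * n
    for _ in range(kmax):
        eps = 0.0
        for i in range(n):
            s = sum(a[i][j] * x[j] for j in range(n) if j != i)
            delta = (b[i] - s) / a[i][i] - x[i]
            x[i] += omega * delta          # in-place: creates the dependency
            eps = max(eps, abs(delta))
        if eps < tol:
            break
    return x
```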
Symmetric SOR and Red-Black SOR utilise the same algorithm but with changes made to the order of the operations. With Symmetric SOR every sweep through the equations in the order 1, 2, ..., n is followed by a sweep in the reverse order, n, n−1, ..., 1. With Red-Black SOR the operations are done in two passes, the first pass operating on the odd elements of the array, 1, 3, ..., n−1, followed by the second pass which operates on the even elements, 2, 4, ..., n.

The advantage of Red-Black SOR is that dependencies between adjacent array values are removed, enabling the method to be vectorised or parallelised. However, as is seen in Figures 3.22 to 3.25, this is at the cost of a greatly reduced rate of convergence.
3.1.5 Incomplete Factorisation Methods

These methods perform an approximate factorisation of the A array into its L and U components, the factors being constrained to have non-zero elements only within the bands of the original matrix and one or two further bands. For each iteration a forward and backward substitution process using this incomplete factorisation is applied to the residual formulation of the system of equations.

These methods are efficient in their own right, but also have value as preconditioners for the Krylov space solvers, and as smoothers for the multigrid schemes. The most commonly encountered implementations are Incomplete Cholesky factorisation (commonly used as a preconditioner for the Conjugate Gradient method) and the Strongly Implicit method of Stone [163].
Incomplete Cholesky Factorisation (IC)

    set m = 1
    while m ≤ m_max and ε > ε_tol
        r^{m−1} = b − A x^{m−1}
        ε = ‖r^{m−1}‖_∞
        for all mesh points (i, j, k), ascending
            y_ijk = d_ijk ( r_ijk − a_W,ijk y_{i−1,j,k} − a_S,ijk y_{i,j−1,k} − a_B,ijk y_{i,j,k−1} )
        for all mesh points (i, j, k), descending
            y_ijk = y_ijk − d_ijk ( a_E,ijk y_{i+1,j,k} + a_N,ijk y_{i,j+1,k} + a_T,ijk y_{i,j,k+1} )
        x^m = x^{m−1} + y
        m = m + 1

Figure 3.8: The Incomplete Cholesky method.

    for all mesh points (i, j, k), ascending
        d_ijk = 1 / sqrt( a_P,ijk − (a_W,ijk d_{i−1,j,k})² − (a_S,ijk d_{i,j−1,k})² − (a_B,ijk d_{i,j,k−1})² )

Figure 3.9: The Incomplete Cholesky factorisation.

If the L array is taken to have the same off-diagonal values as A, then the incomplete factorisation M can be written as

    M = ( D + L ) ( D + L )ᵀ    (3.37)
where the D array only has non-zero components on the diagonal, the values being

    d_i = 1 / sqrt( a_ii − Σ_{j<i} ( a_ij d_j )² )    (3.38)

whilst the off-diagonal values of the L matrix have the same values as the corresponding elements in the A matrix. For each iteration the residual is calculated,

    r^{m−1} = b − A x^{m−1}    (3.39)

then an update is calculated by solving

    ( D + L ) y = r^{m−1},    ( D + L )ᵀ δ = y    (3.40)

with the update then being added to the previous iteration's solution,

    x^m = x^{m−1} + δ.    (3.41)

The solution algorithm is given in Figure 3.8, with the factorisation being given in Figure 3.9.
    set m = 1
    while m ≤ m_max and ε > ε_tol
        r^{m−1} = b − A x^{m−1}
        ε = ‖r^{m−1}‖_∞
        for all mesh points (i, j, k), ascending
            y_ijk = d_ijk ( r_ijk − a_W,ijk y_{i−1,j,k} − a_S,ijk y_{i,j−1,k} − a_B,ijk y_{i,j,k−1} )
        for all mesh points (i, j, k), descending
            y_ijk = y_ijk − d_ijk ( a_E,ijk y_{i+1,j,k} + a_N,ijk y_{i,j+1,k} + a_T,ijk y_{i,j,k+1} )
        x^m = x^{m−1} + y
        m = m + 1

Figure 3.10: The Incomplete LU method.
The method is not as fast as the SIP and MSI formulations described below, and is not often used as a solver. However it has frequently been used as a preconditioner for the Conjugate Gradient method, the combination being referred to as the Incomplete Cholesky-Conjugate Gradient method, or ICCG. The solver can be modified to remove the need to take square roots in the factorisation of the equations (the Incomplete LDL method), and to be applied to non-symmetric systems (the Incomplete LU method (ILU)). The factorisation and solution algorithms for the ILU solver are given in Figures 3.10 and 3.11.
The factorisation step of the IC and ILU solvers only requires the storage of the diagonal of the D matrix, whilst the solvers themselves require the storage of the residual r. Therefore the memory usage of the method is only 2n words.
    for all mesh points (i, j, k), ascending
        d_ijk = 1 / ( a_P,ijk − a_W,ijk a_E,{i−1,j,k} d_{i−1,j,k} − a_S,ijk a_N,{i,j−1,k} d_{i,j−1,k} − a_B,ijk a_T,{i,j,k−1} d_{i,j,k−1} )

Figure 3.11: The Incomplete LU factorisation.
Strongly Implicit Procedure (SIP)

In the SIP method of Stone [163] the approximate factorisation that is used is

    M = L U = A + E    (3.42)

where E is the error between the exact and approximate factorisations. The factorisation introduces extra non-zero diagonals into M: two extra non-zero diagonals if A is a five-diagonal matrix resulting from a two dimensional PDE, or six extra diagonals if it is a seven-diagonal matrix from a three dimensional PDE.

To make M a good approximation of A, the E array is set such that

    E φ ≈ 0.    (3.43)

This is done by recognising that the system being solved is from a finite volume approximation of a PDE. Thus the values of the φ field in the extra diagonals of E can be approximated by a second order extrapolation of the values of φ at the stencil points. By putting the terms for the extrapolation into the elements of E and cancelling with the values of A in the extra diagonals of E, the system can be made to approximate equation (3.43). Finally, to make the LU factorisation unique, the diagonal elements of U are set to 1.
The system of equations is then solved iteratively in a similar manner to the solution of the Incomplete Cholesky system; for each iteration the residual is calculated,

    r^{m−1} = b − A x^{m−1}    (3.44)

then an update is calculated by solving

    L y = r^{m−1},    U δ = y    (3.45)

with the update then being added to the previous iteration's solution,

    x^m = x^{m−1} + δ.    (3.46)

As is seen in Figures 3.22 to 3.25, the SIP solver is much faster than the simpler ILU and IC schemes, and so is suitable as a solver in its own right, as well as a smoother with other iterative methods. It requires O(n) words of storage for the solution of both two and three dimensional PDEs, the constant being larger in three dimensions.
In the following, l and u denote the coefficients of the incomplete L and U factors, and p the reciprocal of the diagonal of L.

    set m = 1
    while m ≤ m_max and ε > ε_tol
        r^{m−1} = b − A x^{m−1}
        ε = ‖r^{m−1}‖_∞
        for all mesh points (i, j, k), ascending
            y_ijk = p_ijk ( r_ijk − l_W,ijk y_{i−1,j,k} − l_S,ijk y_{i,j−1,k} − l_B,ijk y_{i,j,k−1} )
        for all mesh points (i, j, k), descending
            y_ijk = y_ijk − u_E,ijk y_{i+1,j,k} − u_N,ijk y_{i,j+1,k} − u_T,ijk y_{i,j,k+1}
        x^m = x^{m−1} + y
        m = m + 1

Figure 3.12: The Strongly Implicit Procedure (SIP) of Stone.
The formulas for the factorisation, which involve the partial cancellation parameter α, are given in Figure 3.13.

Figure 3.13: The incomplete LU factorisation used in SIP.
Modified Strongly Implicit Procedure (MSI)

In the MSI method of Schneider and Zedan, the L and U factors of the decomposition in equation (3.42) are allowed to have more non-zero elements than the equation matrix A, giving a closer approximation to A.
The solution algorithm, which sweeps over this extended stencil, is given in Figure 3.14.

Figure 3.14: The Modified Strongly Implicit procedure (MSI) of Schneider and Zedan.
Figure 3.15: The incomplete LU factorisation used in the MSI method.
3.1.6 Krylov Space Methods

Since the development of the Conjugate Gradient scheme a number of other Krylov space schemes have been devised, with good summaries of the methods being given in the books by Barrett et al [7] and Golub and Van Loan [51]. A more intuitive introduction to the methods is given in the paper by Shewchuk [154].

In this chapter the Conjugate Gradient (CG), the Bi-Conjugate Gradient Stabilised (BiCGSTAB), and the Generalised Minimal Residual (GMRES) methods are discussed. Other methods which are commonly encountered in the literature include the Steepest Descent (SD), the Bi-Conjugate Gradient (BiCG), the Conjugate Gradient Squared (CGS), and the Quasi-Minimal Residual (QMR) methods. These additional methods were also implemented for this study but offered no advantages over the three methods discussed below. They are briefly discussed at the end of the section, but for a fuller description the reader is referred to Barrett et al [7] and Golub and Van Loan [51].
Conjugate Gradient (CG)

Following the presentation given in Barrett et al [7]: for each iteration the residual is minimised along a path orthogonal to the previous searches, the solver stepping along the residual surface in the solution space to find the minimum residual. At each iteration the iterate x^k is updated with a search direction vector p^k,

    x^k = x^{k−1} + α^k p^k.    (3.47)

The residual r^k = b − A x^k is updated as

    r^k = r^{k−1} − α^k q^k    where    q^k = A p^k.    (3.48)

The choice

    α^k = (r^{k−1})ᵀ r^{k−1} / (p^k)ᵀ A p^k

minimises (r^k)ᵀ r^k over all choices of α. The search directions are updated using the residuals,

    p^k = r^{k−1} + β^{k−1} p^{k−1}    (3.49)

where the choice

    β^{k−1} = (r^{k−1})ᵀ r^{k−1} / (r^{k−2})ᵀ r^{k−2}    (3.50)

ensures that r^k and r^{k−1}, and p^k and p^{k−1}, are orthogonal.
The pseudocode for the preconditioned conjugate gradient method is given in Figure 3.16, the preconditioning being the "solve M z^{k−1} = r^{k−1}" operation. If M is set to the identity matrix I then for each iteration z = r and the algorithm simplifies to its unpreconditioned form. The preconditioned form of the solver requires 4n words of storage (not including that required by the preconditioner), whilst the unpreconditioned form requires 3n words.

    set k = 1
    while k ≤ k_max and ε > ε_tol
        solve M z^{k−1} = r^{k−1}
        ρ^{k−1} = (r^{k−1})ᵀ z^{k−1}
        if k = 1
            p^1 = z^0
        else
            β^{k−1} = ρ^{k−1} / ρ^{k−2}
            p^k = z^{k−1} + β^{k−1} p^{k−1}
        q^k = A p^k
        α^k = ρ^{k−1} / (p^k)ᵀ q^k
        x^k = x^{k−1} + α^k p^k
        r^k = r^{k−1} − α^k q^k
        ε = ‖r^k‖_∞
        k = k + 1

Figure 3.16: The preconditioned Conjugate Gradient algorithm.
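The structure of Figure 3.16 can be sketched with the matrix and preconditioner supplied as callables (an illustration only; psolve applies M⁻¹ to a vector, and omitting it gives the unpreconditioned form):

```python
def pcg(matvec, b, psolve=None, tol=1e-8, kmax=1000):
    """Preconditioned Conjugate Gradient.  matvec(v) returns A v;
    psolve(r) returns M^-1 r (identity when omitted).  Starts from
    x = 0, so the initial residual is b itself."""
    n = len(b)
    x = [0.0] * n
    r = b[:]
    if psolve is None:
        psolve = lambda v: v[:]
    rho_old = 1.0
    p = None
    for k in range(kmax):
        if max(abs(ri) for ri in r) < tol:     # infinity-norm test
            break
        z = psolve(r)
        rho = sum(ri * zi for ri, zi in zip(r, z))
        if p is None:
            p = z[:]
        else:
            beta = rho / rho_old
            p = [zi + beta * pi for zi, pi in zip(z, p)]
        q = matvec(p)
        alpha = rho / sum(pi * qi for pi, qi in zip(p, q))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rho_old = rho
    return x
```

Passing the routine only a matrix-vector product is what allows the compact diagonal storage of Section 3.1.1 to be used unchanged.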
Generalised Minimal Residual (GMRES)

The GMRES iterates are constructed as the series

    x^k = x^0 + y_1 v^1 + ... + y_k v^k    (3.51)

where the v^i form an orthogonal basis for the Krylov space and the y_i coefficients have been chosen to minimise the residual norm. The number of operations in the calculation of the x^k iterate thus increases linearly with the number of iterations, as does the storage used. To place an upper limit on the storage required by the scheme, the solver is commonly implemented with a restart after n_restart iterations, limiting the memory usage to n + n_restart ( n + n_restart + 4 ) words of storage.
The algorithm for the restarted GMRES solver is given in Figure 3.17. It is taken from the method suggested by Saad and Schultz [145].

Figure 3.17: The preconditioned restarted GMRES method.
Bi-Conjugate Gradient Stabilised (BiCGSTAB)

The BiCGSTAB solver is a variant of the Bi-Conjugate Gradient method with smoother convergence behaviour; the preconditioned algorithm is given in Figure 3.18.

Figure 3.18: The preconditioned Bi-Conjugate Gradient Stabilised algorithm.
Other Krylov Space Methods

The Steepest Descent scheme (SD) is an optimisation method: by minimising the residuals of the linear equations it arrives at the equations' solution [154]. For the current iteration the solver corrects the solution in the direction of the steepest downward gradient. It is simple but inefficient.

The Bi-Conjugate Gradient (BiCG) method generates two Conjugate Gradient-like sequences of vectors, one based on the system matrix A and the other on its transpose Aᵀ. The Conjugate Gradient Squared (CGS) method is a modification of the BiCG solver that applies the updating operations for the A sequence and the Aᵀ sequence to both vectors. Ideally this would double the convergence rate, but in practice convergence is very irregular.

The Quasi-Minimal Residual (QMR) [46] method applies a least squares solve and update to the BiCG residuals, smoothing out the convergence of the method and preventing the breakdowns that can occur with BiCG.
Preconditioners

A preconditioner is applied by solving a system of the form M z = r, where r is the current residual field, z is the preconditioned residual, and M is a matrix having similar properties to A. If M is identical to A then the preconditioned solver would converge in a single iteration, but applying the preconditioner would be as expensive as solving the original system; in practice M is chosen to be a close approximation to A that is cheap to solve.

Figure 3.19: Comparison of the convergence of the MSI, the unpreconditioned CG, and the MSI preconditioned CG solvers.
The convergence of an unpreconditioned CG solver, and of an MSI solver together with an MSI preconditioned CG solver, are shown in Figure 3.19. The conjugate gradient solver converges slowly at first, with the convergence rate increasing after 10 seconds of solution time. In contrast, the incomplete factorisation based solvers converge at a steadier rate.
Among the easiest preconditioning methods to implement are the Jacobi and Symmetric SOR algorithms. More complex methods such as Incomplete Cholesky decomposition, Incomplete LU methods, and the multigrid schemes give faster convergence, at the expense of greater memory usage and a more complex implementation.
3.1.7 Multigrid Methods

For the multigrid method discussed here, a set of equations is given for a PDE discretised on a fine mesh. The multigrid scheme then transforms the equations onto a series of progressively coarser meshes, solving the equations fully on the coarsest mesh. The solution is then solved for on the series of successively finer meshes, using the solution from the previous (coarser) mesh as an initial estimate of the solution, finally solving on the finest mesh. By transferring from a fine to a coarse mesh the medium wavelength errors in the fine mesh solution are transformed into short wavelength errors on the coarse mesh, which are much easier to smooth. The computational cost of solving on the coarse meshes is low, and the cost of solving on the finer meshes is reduced by using the coarse mesh solution as an initial estimate.
The three basic operators for the multigrid technique are the smoother, which improves the current estimate of the solution on a given mesh, and the restriction and prolongation operators, which map a set of equations and a solution between a fine and a coarse mesh.
Given two meshes, with the fine mesh having a mesh spacing of h and the coarse mesh a spacing of 2h, the restriction operator maps the fine mesh solution onto the coarse mesh, and is written

    φ^{2h} = I_h^{2h} φ^h    (3.53)

whilst the prolongation operator performs the inverse operation, interpolating a coarse mesh solution onto a fine mesh,

    φ^h = I_{2h}^h φ^{2h}.    (3.54)
For all but the coarsest mesh the multigrid solver is recursively applied to solve the restricted system of equations. On the coarsest mesh the restricted system is solved either by an iterative method, solving the equations to full convergence, or by the use of a direct method.

On the finest mesh the infinity norm of the residual is taken and compared to a supplied tolerance. If the norm of the residual is less than the tolerance then the solver is taken to have converged and the iteration is terminated.
    while k ≤ k_max
        smooth A^h x^k = b^h
        r^k = b^h − A^h x^k
        if on finest mesh
            ε^k = ‖r^k‖_∞
            if ε^k ≤ ε_tol exit
        r^{2h} = I_h^{2h} r^k
        if on coarsest mesh
            solve A^{2h} δ^{2h} = r^{2h}
        else
            apply multigrid to A^{2h} δ^{2h} = r^{2h}
        δ = I_{2h}^h δ^{2h}
        x^{k+1} = x^k + δ
        k = k + 1

Figure 3.20: The multigrid algorithm.

The superscript k refers to the iteration number, whilst the superscript 2h specifies that the variable applies to the coarse mesh.
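A compact recursive sketch of Figure 3.20 for the one dimensional Poisson equation −u″ = f (an illustration only: for robustness it uses weighted Jacobi smoothing with the common full-weighting restriction and linear-interpolation prolongation, rather than the injection operators derived in the next subsection; the number of interior points must be 2^k − 1):

```python
def smooth(x, b, h, sweeps=3, w=2.0 / 3.0):
    """Weighted Jacobi smoothing for the stencil (-1, 2, -1)/h^2
    with zero Dirichlet boundaries."""
    n = len(x)
    for _ in range(sweeps):
        xn = x[:]
        for i in range(n):
            left = x[i - 1] if i > 0 else 0.0
            right = x[i + 1] if i < n - 1 else 0.0
            xn[i] = (1 - w) * x[i] + w * (b[i] * h * h + left + right) / 2.0
        x = xn
    return x

def residual(x, b, h):
    n = len(x)
    r = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        r.append(b[i] - (2.0 * x[i] - left - right) / (h * h))
    return r

def v_cycle(x, b, h):
    """One V-cycle: smooth, restrict the residual, solve or recurse on
    the coarse mesh, prolongate the correction, smooth again."""
    x = smooth(x, b, h)
    n = len(x)
    if n <= 1:
        return [b[0] * h * h / 2.0]          # exact solve on the coarsest mesh
    r = residual(x, b, h)
    nc = (n - 1) // 2                        # coarse points sit at odd fine indices
    r2h = [(r[2 * i] + 2.0 * r[2 * i + 1] + r[2 * i + 2]) / 4.0 for i in range(nc)]
    d2h = v_cycle([0.0] * nc, r2h, 2.0 * h)
    d = [0.0] * n                            # linear-interpolation prolongation
    for i in range(nc):
        d[2 * i + 1] += d2h[i]
        d[2 * i] += 0.5 * d2h[i]
        d[2 * i + 2] += 0.5 * d2h[i]
    x = [xi + di for xi, di in zip(x, d)]
    return smooth(x, b, h)
```

Each V-cycle costs O(n) work, and the residual is typically reduced by a roughly constant factor per cycle, independent of the mesh size.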
Prolongation and Restriction Operators

For the first type of prolongation and restriction scheme, where the equations are rederived and the boundary conditions re-applied on each mesh, the solver is necessarily closely tied to the discretisation of the PDE. The resulting code is not very general, and for complicated differencing schemes the calculation of the equations can be slow². For such a scheme the value of the solution at each point can be transferred to the corresponding point on the coarser/finer mesh, an operation called straight injection. For a finite volume solver this carries some extra overhead in that the cell centres of the two meshes don't align, and instead of a simple injection process the solution must be averaged over the cells.
For a black box solver that is not directly coupled to the discretisation process, the prolongation and restriction operators must be derived from the fine mesh equations rather than re-discretised from the underlying equations. A method that applies to the solution of PDEs is developed below. The method is applied to meshes that have 2^j + 3 nodes along each axis including boundary nodes (ie: 2^j + 1 internal points on each axis). However it can be used for systems with m 2^j + 3 nodes on each axis, where m + 1 is the number of points along the axis on the coarsest mesh. The method can be used for the equations arising from both finite volume and finite difference discretisations, and straight injection can be used for transferring fields between the different meshes without the problems of averaging solutions, as can be the case with the finite volume schemes. In addition the boundary conditions do not need to be re-applied on each mesh.
²Moreover, for some equations, such as the pressure correction equation in SIMPLE coupling schemes, the variables are defined only upon the mesh they were created on.
Consider a one dimensional system of fine mesh equations,

    a_P1 φ_1 + a_E1 φ_2 = b_1
    a_Wi φ_{i−1} + a_Pi φ_i + a_Ei φ_{i+1} = b_i,    i = 2, ..., n−1    (3.55)
    a_Wn φ_{n−1} + a_Pn φ_n = b_n.

For the above system the φ_1 and φ_n nodes are interior boundary points (ie: the first row of points on the interior of a solution domain), and the boundary conditions have been applied in a form that removes the need for the points physically located on the boundary (see Section 2.3).

For an elliptic PDE the solution must be reasonably smooth, and so the solution values at the even numbered points in the mesh can be estimated from a second order centred interpolation from the odd numbered points,

    φ_2 = ½ ( φ_1 + φ_3 ),    φ_4 = ½ ( φ_3 + φ_5 ),    ....    (3.56)
By substituting these equations into the initial system, a system with only half the equations of the original system is generated; for the odd numbered equations,

    a_Wi φ_{i−2} + ( 2a_Pi + a_Ei + a_Wi ) φ_i + a_Ei φ_{i+2} = 2 b_i    (3.57)

which can be rewritten

    a_W^{2h} φ_{i−2} + a_P^{2h} φ_i + a_E^{2h} φ_{i+2} = b^{2h}    (3.58)

where

    a_P^{2h} = 2 a_P + a_E + a_W,
    a_E^{2h} = a_E,    a_W^{2h} = a_W,    (3.59)
    b^{2h} = 2 b^h.
For these equations, restricting the field from one mesh to the next coarser mesh can be accomplished using simple injection,

    φ_i^{2h} = φ_{2i−1}^h.    (3.60)

The corresponding prolongation operator is

    φ_{2i−1}^h = φ_i^{2h},    i = 1, 2, 3, ...    (3.61)

for the odd numbered points, whilst the even numbered points are found by linear interpolation,

    φ_j^h = ½ ( φ_{j−1}^h + φ_{j+1}^h ),    j = 2, 4, ....    (3.62)
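Equations (3.59)-(3.62) can be sketched directly (an illustration only; 0-based lists, so the odd numbered points 1, 3, 5, ... of the text become indices 0, 2, 4, ...):

```python
def restrict_equations(aW, aP, aE, b):
    """Coarse mesh equations via (3.59): substitute the centred
    interpolation (3.56) into the odd numbered fine mesh equations."""
    aW2, aP2, aE2, b2 = [], [], [], []
    for i in range(0, len(aP), 2):          # odd numbered points, 0-based
        aP2.append(2.0 * aP[i] + aE[i] + aW[i])
        aE2.append(aE[i])
        aW2.append(aW[i])
        b2.append(2.0 * b[i])
    return aW2, aP2, aE2, b2

def restrict_field(phi):
    """Simple injection (3.60): keep the odd numbered fine values."""
    return phi[0::2]

def prolong_field(phi2, n):
    """Prolongation (3.61)-(3.62): inject the coarse values, then fill
    the even numbered points by linear interpolation (zero beyond the
    ends, matching the folded boundary conditions of (3.55))."""
    phi = [0.0] * n
    phi[0::2] = phi2
    for j in range(1, n, 2):
        right = phi[j + 1] if j + 1 < n else 0.0
        phi[j] = 0.5 * (phi[j - 1] + right)
    return phi
```

For the 1D Laplacian coefficients a_P = 2, a_E = a_W = −1, the restricted operator keeps the same stencil, which is what makes the recursion in Figure 3.20 straightforward.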
For a two dimensional system the equation restriction operator becomes

    a_P^{2h} = 2 a_P + a_E + a_W + a_N + a_S,
    a_E^{2h} = a_E,    a_W^{2h} = a_W,
    a_N^{2h} = a_N,    a_S^{2h} = a_S,    (3.63)
    b_ij^{2h} = 2 b_{2i−1,2j−1}^h,

where the coefficients on the right hand sides are those of the fine mesh equation at the point (2i−1, 2j−1). The field restriction is

    φ_ij^{2h} = φ_{2i−1,2j−1}^h    (3.64)

and the prolongation operator becomes

    φ_{2i−1,2j−1}^h = φ_ij^{2h},    i = 1, 2, ...,  j = 1, 2, ...    (3.65)

with the remaining points being found by bilinear interpolation.
Similarly, in three dimensions the equation restriction operators are

    a_P^{2h} = 2 a_P + a_E + a_W + a_N + a_S + a_T + a_B,
    a_E^{2h} = a_E,    a_W^{2h} = a_W,
    a_N^{2h} = a_N,    a_S^{2h} = a_S,    (3.66)
    a_T^{2h} = a_T,    a_B^{2h} = a_B,
    b_ijk^{2h} = 2 b_{2i−1,2j−1,2k−1}^h,

the field restriction is

    φ_ijk^{2h} = φ_{2i−1,2j−1,2k−1}^h    (3.67)

and the prolongation becomes

    φ_{2i−1,2j−1,2k−1}^h = φ_ijk^{2h},    i, j, k = 1, 2, ...    (3.68)

with the remaining points found by trilinear interpolation.
The solve operation in Figure 3.20 varies depending on what level of the mesh hierarchy the solver is on. For all but the coarsest mesh the solution to

    A^{2h} δ^{2h} = r^{2h}    (3.69)

is found by recursing, applying the multigrid solver to the system. At the coarsest mesh the system is solved either by an iterative technique applied to convergence, or by using a direct method. Since this system is for the coarsest mesh the computational cost of its solution is minimal.
The test for overall convergence of the scheme is performed on the finest mesh, where the norm of the residual is calculated and compared with the user supplied solution tolerance. Once the norm reduces below the specified error bound the solver is assumed to have converged and the process is terminated.
3.2 A Comparison of the Linear Solvers

The linear solvers were compared in terms of their speed and memory usage. When comparing the speeds of the solvers several factors come into play. For direct solvers the number of operations is fixed for a given number of equations, but for iterative methods the time taken to converge to a solution depends not only on the number of equations but also on the properties of the equations themselves (such as the boundary conditions and diagonal dominance), and on the convergence criteria and tolerance chosen.
The number of equations to be solved, the layout of the data in memory, and minor implementation details such as the syntax used to perform a matrix-vector operation also have a big effect on speed. These effects are discussed in the following chapter, but it is important to note that comparisons of different codes should be made for a range of array sizes and with a consistent coding style to reduce the variability due to these factors.
In the following sections the test case used to compare the solvers is described, and then the solvers are compared on the basis of their convergence characteristics and scaling.
3.2.1 The Solver Test Case
To compare the speeds of the linear solvers they were used to solve two finite volume problems, one with Dirichlet and the other with Neumann boundary conditions, which simulate the equations encountered in a finite volume CFD code. The test cases were run for both two and three dimensional problems, and were solved to full convergence.
The test case was a finite volume discretisation of the Laplace equation applied to a unit square or cubic domain, with a sinusoidally varying source term. For the three dimensional case the equations were

del^2 T = sin(2 pi x) sin(2 pi y) sin(2 pi z) (3.70)

with Dirichlet boundary conditions, and

del^2 T = cos(2 pi x) cos(2 pi y) cos(2 pi z) (3.71)

with Neumann boundary conditions. For the Neumann boundaries a zero normal gradient was applied to all boundaries, and for the Dirichlet problem the x = 0 and x = 1 boundaries were set to T = -1 and T = 1 respectively, with all other boundaries set to T = 0. With the Neumann problem the solution is not unique, and a further condition of

T = 0 (3.72)

was imposed at the centre of the solution domain to ensure uniqueness. For the two dimensional test cases the z forcing component of the source term was dropped. The solutions to the two dimensional forms of the test functions are shown in Figure 3.21.
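A small instance of such a test system can be assembled directly (a Python sketch with a hypothetical helper `poisson_2d`, not thesis code; homogeneous Dirichlet wall values are assumed here for brevity, rather than the boundary values of the test case, and the coefficients follow the usual 5-point finite volume face-flux construction):

```python
import numpy as np

def poisson_2d(n, neumann=False):
    """5-point finite volume discretisation of del^2 T = S on the unit
    square with an n x n cell mesh; S is a product of sinusoids.
    For the Neumann case the solution is pinned to zero at the centre."""
    h = 1.0 / n
    x = (np.arange(n) + 0.5) * h                  # cell-centre coordinates
    X, Y = np.meshgrid(x, x, indexing="ij")
    S = (np.cos(2*np.pi*X) * np.cos(2*np.pi*Y) if neumann
         else np.sin(np.pi*X) * np.sin(np.pi*Y))
    N = n * n
    A = np.zeros((N, N))
    b = h * h * S.ravel()                         # cell-integrated source
    idx = lambda i, j: i * n + j
    for i in range(n):
        for j in range(n):
            k = idx(i, j)
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < n and 0 <= jj < n:   # interior face flux
                    A[k, k] -= 1.0
                    A[k, idx(ii, jj)] += 1.0
                elif not neumann:                 # Dirichlet wall at T = 0,
                    A[k, k] -= 2.0                # half-cell distance to wall
                # Neumann face: zero normal gradient, no contribution
    if neumann:
        k = idx(n // 2, n // 2)                   # pin the centre cell so the
        A[k, :] = 0.0; A[k, k] = 1.0; b[k] = 0.0  # singular system is unique
    return A, b
```

Without the pinned cell the Neumann matrix is singular, since any constant can be added to a solution.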
For each equation two runs were made. The first compared the convergence of the iterative methods at one mesh size: a 257^2 mesh in two dimensions, and a 65^3 mesh in three. The other run was made for a range of mesh sizes, with results obtained for the time to reduce the maximum residual by a factor of 10^6, the solution then being considered fully converged. The direct methods were also timed for the same range of array sizes. All runs were made from an initial field of T = 0, and the relaxation parameters for the simple iterative and incomplete LU solvers were set to 1 and 0.7 respectively.
The runs were made on a DEC Alpha 500au workstation running Digital Unix 4.0E, using Fortran 90 code compiled with the Digital Fortran compiler and using double precision storage for the floating point data. Timings were made using the C getrusage and gettimeofday functions, which provide accuracy to 1/1000 of a second, with multiple runs being made at the small array sizes to ensure an accurate resolution of the run time.
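The repeated-run strategy can be sketched as follows (a Python analogue with a hypothetical `time_solver` harness; the thesis itself calls the C getrusage and gettimeofday routines from Fortran):

```python
import time

def time_solver(solve, args, min_time=0.2):
    """Return the average run time of solve(*args), repeating the call
    until at least min_time seconds have elapsed so that very short runs
    are resolved accurately despite the finite clock granularity."""
    reps = 0
    elapsed = 0.0
    start = time.perf_counter()
    while elapsed < min_time:
        solve(*args)
        reps += 1
        elapsed = time.perf_counter() - start
    return elapsed / reps
```

Averaging over many repetitions matters most for the small array sizes, where a single solve may take far less than the clock resolution.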
One failing of the test problem (and one which the author has not had time to remedy) is the square or cubic dimensions of the solution domains, and the unit aspect ratio of the mesh cells. Iterative methods can stall when the mesh is distorted from such a regular topology[43], an analysis of the mechanisms behind the failing being given by Brandt[17]. Unfortunately, time prevented a thorough investigation of this phenomenon, although Ferziger and Peric[43] claim its effect is less pronounced on convection-diffusion problems.
3.2.2 Convergence of the Solvers

The solvers were run on the 257^2 and 65^3 meshes for both the Neumann and Dirichlet boundary condition problems, with the residual at each iteration being printed out for plotting, the abscissa being plotted as solution time. Other studies of iterative solvers have either studied only one or two classes of solver (such as Briggs[18], Yu[183]), or have made comparisons in terms of the number of iterations (Ferziger and Peric[43], Iserles[68], Barrett et al[7], and Kershaw[77]). The speed of each iteration can vary widely between different solvers, and so comparisons in terms of iterations alone are rather meaningless. One study by Botta et al[14] makes a comparison of a number of different methods in terms of computation speed, but the rate of convergence is not given, and the coding of the different solvers by different groups allows for a wide range of coding styles which may affect the relative solver speeds.
In general the iterative methods took longer to solve the problems with Neumann boundary conditions than those with Dirichlet boundaries. The system from the Neumann boundary problem is singular unless the solution is specified at some point in the solution domain. Most methods performed more efficiently if the system was left singular, with the exception of the direct Cholesky factorisation method, which would fail with the singular system.
Figures 3.22 and 3.23 show the convergence of the solvers for the 2D test cases with Dirichlet and Neumann boundary conditions respectively. Figures 3.24 and 3.25 show the convergence for the 3D Dirichlet and Neumann test runs.
The Simple Iterative Methods
The Neumann problems were slower to solve than those with Dirichlet boundary conditions. For the Dirichlet boundary conditions the convergence exhibits an initial sharp drop in the maximum residual, followed by a much longer period where the relative residual drops by a constant ratio with each iteration. For the Neumann boundary conditions this initial sharp drop was not observed; it was thought to be due to the rapid smoothing of the short wavelength structures adjacent to the boundaries in the Dirichlet problem.
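The two-phase behaviour can be reproduced on a 1D model problem (a Python sketch, assuming undamped Jacobi on the Laplace equation; the asymptotic reduction ratio is the spectral radius of the Jacobi iteration matrix, cos(pi h)):

```python
import numpy as np

def jacobi_history(n, sweeps=400):
    """Jacobi sweeps on the 1D problem u'' = 0, u(0) = u(1) = 1, from a
    zero interior field; returns the maximum residual after each sweep."""
    u = np.zeros(n + 1)
    u[0] = u[-1] = 1.0
    hist = []
    for _ in range(sweeps):
        r = u[:-2] + u[2:] - 2.0 * u[1:-1]   # residual at interior points
        hist.append(np.abs(r).max())
        u[1:-1] += 0.5 * r                   # one (undamped) Jacobi sweep
    return np.array(hist)

hist = jacobi_history(32)
# The first sweeps give a sharp drop as the short wavelength error near the
# boundaries is smoothed; thereafter the residual falls by a nearly constant
# ratio per sweep, close to the spectral radius cos(pi/32) = 0.995.
```

The long constant-ratio tail is what dominates the total solution time for the simple iterative schemes.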
Incomplete Factorisation Methods
Multigrid Schemes
Typically the SIP smoothed multigrid was fastest, being faster than the MSI smoothed version, but in many 2D cases the SSOR smoothed multigrid had a similar speed to the SIP smoothed scheme.
Figure 3.22: Convergence with time of the 2D solvers on a 257^2 mesh with Dirichlet boundary conditions. The panels plot the residual against solution time for the simple iterative, incomplete LU and multigrid solvers, and for the CG, BiCGSTAB and GMRES solvers. Note the scale of the x axis varies from graph to graph.
Krylov Space Methods
Figure 3.23: Convergence with time of the 2D solvers on a 257^2 mesh with Neumann boundary conditions. The panels plot the residual against solution time for the simple iterative, incomplete LU and multigrid solvers, and for the CG, BiCGSTAB and GMRES solvers. Note the scale of the x axis varies from graph to graph.
Preconditioning improved the smoothness of the convergence of the CG solver, with the incomplete factorisation smoothed multigrid preconditioners forcing a smooth linear reduction in the residual. The Jacobi smoothed multigrid preconditioner, however, gave a very poor performance for the Neumann boundary problem.
Figure 3.24: Convergence of the 3D solvers on a 65^3 mesh with Dirichlet boundary conditions. The panels plot the residual against solution time for the simple iterative, incomplete LU and multigrid solvers, and for the CG, BiCGSTAB and GMRES solvers.
where it gave a much slower rate of convergence.

GMRES: The GMRES solver typically displayed a monotonic convergence. The heavily preconditioned versions with multigrid or incomplete factorisation preconditioners converged at a similar rate to the CG or BiCGSTAB solvers, but the unpreconditioned solver had a very slow convergence rate, comparable to the simple iterative schemes.
Summary
Figure 3.25: Convergence of the 3D solvers on a 65^3 mesh with Neumann boundary conditions. The panels plot the residual against solution time for the simple iterative, incomplete LU and multigrid solvers, and for the CG, BiCGSTAB and GMRES solvers.
For the Dirichlet boundary problems the incomplete factorisation methods give a faster initial reduction in the residual, behaviour that was thought to be due to their rapid smoothing of the short wavelength components in the residual at the boundary.

The multigrid-SIP and Krylov-multigrid-incomplete factorisation methods give the best performance, both exhibiting a smooth monotonic reduction in the residual, whilst being faster than the other methods.
3.2.3 Scaling of the Solution Time

The solvers were run for a range of mesh sizes, with a reduction of 10^6 in the infinity norm of the residual (the maximum residual) being used as the convergence tolerance. The solvers were run for the equations resulting from the Dirichlet and Neumann boundary condition problems (see Equations (3.70) and (3.71) and Figure 3.21), with both the 2D and 3D versions of the solvers being tested. The time for solution of the test codes is plotted as a function of mesh size in Figures 3.26 to 3.29, with Figures 3.26 and 3.27 showing the solution time for the 2D problems with Dirichlet and Neumann boundary conditions, whilst Figures 3.28 and 3.29 show the solution times for the 3D problems.
Direct Solvers
These solvers run for the same length of time regardless of the boundary conditions: no pivoting was done, so there was a set number and order of operations regardless of the equations being solved. With large systems of equations the time that the dense LU and LDL solvers took to solve scaled differently from the O(N^3) one expects from an operation count, where N is the number of equations, and no explanation is available for the scaling discrepancy.
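The O(N^3) operation count for dense LU factorisation can be verified by counting the operations in the standard elimination loops (a Python sketch; `lu_flops` is an illustrative helper, not thesis code):

```python
def lu_flops(N):
    """Count the floating point operations in a naive in-place LU
    factorisation of a dense N x N matrix (no pivoting), without doing
    any arithmetic on matrix entries."""
    ops = 0
    for k in range(N - 1):
        ops += N - 1 - k             # divisions forming the multipliers
        ops += 2 * (N - 1 - k)**2    # multiply and subtract in the update
    return ops
```

Doubling N multiplies the count by a factor approaching 8, the signature of cubic scaling; the leading term of the sum is 2N^3/3.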
For two dimensional problems the banded LU and Cholesky solvers were both O(N^2) in run time. A jump in the solver run time is clearly visible for the 2D banded LU and Cholesky solvers at the point where the solver data exceeds the size of the cache of the test machine(4). The block tridiagonal solvers also exhibit O(N^2) scaling, but run 15 and 4 times faster than the banded LU and Cholesky solvers respectively.
The three dimensional solvers exhibit slightly different behaviour to those solving two dimensional problems. As might be expected, the dense LU and LDL solvers scale identically to the two dimensional versions. However the tridiagonal solvers scale like O(N^3), as does the banded LU solver.
The direct methods were rarely faster than the iterative schemes. However, for the two dimensional test problems the block tridiagonal solver was faster than the simple iterative schemes, and for large two dimensional systems (N > 20000) with Neumann boundaries it was faster than the incomplete factorisation schemes.
The Simple Iterative Methods
Incomplete Factorisation Methods
For the two dimensional problems the solution time for the incomplete factorisation solvers scales roughly linearly with N for small numbers of equations, but as the number of equations increases they start to exhibit O(N^2) behaviour. For the three dimensional problems the solvers exhibited a scaling intermediate between these two.

For small to moderate sized problems (N up to around 1000 for Neumann problems, and up to around 10000 for Dirichlet problems) the MSI solver was typically the fastest of all the methods tested.
(4) The effects of cache size are discussed in Chapter 6.
Figure 3.26: The time taken to solve a two dimensional discretisation of the Laplace equation with Dirichlet boundary conditions. The panels plot solution time against problem size for the direct, simple iterative, incomplete factorisation, multigrid and Krylov space solvers.
Figure 3.27: The time taken to solve a two dimensional discretisation of the Laplace equation with Neumann boundary conditions. The panels plot solution time against problem size for the direct, simple iterative, incomplete factorisation, multigrid and Krylov space solvers.
Figure 3.28: The time taken to solve a three dimensional discretisation of the Laplace equation with Dirichlet boundary conditions. The panels plot solution time against problem size for the direct, simple iterative, incomplete factorisation, multigrid and Krylov space solvers.
Figure 3.29: The time taken to solve a three dimensional discretisation of the Laplace equation with Neumann boundary conditions. The panels plot solution time against problem size for the direct, simple iterative, incomplete factorisation, multigrid and Krylov space solvers.
Multigrid Schemes

For the two dimensional test cases the time taken by the multigrid methods to solve the system of equations scaled roughly linearly with N for the Dirichlet boundary condition problem, and slightly worse than linearly for the problem with Neumann boundaries. With the three dimensional test case the scaling was roughly linear in N for both boundary conditions.
For small problems the multigrid schemes are typically the slowest, for N below about 100 being slower than the slowest of the direct methods. However their superior scaling means that for large problem sizes (N > 20000) the multigrid solvers are the fastest of the methods studied.
Krylov Space Methods
The Krylov space methods were typically the fastest solvers for problems of up to around 10000 equations when coupled with the MSI or SIP smoothed multigrid preconditioners. For larger problems the slightly superior scaling of the multigrid solvers ensures that they become faster.
The multigrid-preconditioned Krylov space methods scaled better than methods that used other preconditioners, due to the better scaling of multigrid schemes as a whole.

CG: For the two dimensional test case the time taken by the conjugate gradient method to solve the system of equations scaled between roughly O(N^1.5) for the unpreconditioned and simply preconditioned solvers and close to O(N) for the multigrid preconditioned solvers, with similar behaviour for the Neumann boundaries. For the three dimensional test cases the multigrid preconditioned versions of the solver again scaled better than the general versions.

BiCGSTAB: For the two dimensional tests the scaling was similar to that of CG, with the multigrid-preconditioned codes again scaling best, and likewise for the three dimensional problems.

GMRES: This solver exhibited similar scaling to its CG and BiCGSTAB brethren. Generally it was slower than those two solvers.
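The link between mesh refinement and iteration count can be illustrated with an unpreconditioned CG sketch on a 1D model problem (Python, not the thesis code; the condition number of the discrete Laplacian grows as the mesh is refined, and the iteration count grows with it, which is the origin of the worse-than-linear time scaling):

```python
import numpy as np

def cg_iterations(n, tol=1e-6):
    """Unpreconditioned CG on the 1D Poisson problem with n interior
    points and Dirichlet ends; returns the iterations needed to cut the
    residual norm by the factor 1/tol."""
    def A_mul(v):   # matrix-free 1D Laplacian with padded boundary zeros
        w = np.zeros_like(v)
        w[1:-1] = -v[:-2] + 2.0 * v[1:-1] - v[2:]
        return w
    b = np.zeros(n + 2)
    b[1:-1] = 1.0 / (n + 1)**2         # constant source, scaled by h^2
    x = np.zeros(n + 2)
    r = b - A_mul(x)
    p = r.copy()
    rr0 = rr = r @ r
    its = 0
    while rr > tol**2 * rr0:
        Ap = A_mul(p)
        alpha = rr / (p @ Ap)          # step length
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        p = r + (rr_new / rr) * p      # new conjugate search direction
        rr = rr_new
        its += 1
    return its
```

Refining the mesh by a factor of four roughly quadruples the iteration count, while each iteration also costs proportionally more.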
Summary
For small problems the MSI solver, an incomplete factorisation method, was typically the fastest method to solve the equations. For moderate sized problems (where the number of equations was of the order of 10000) the CG and BiCGSTAB Krylov space methods were fastest when coupled with either the MSI or SIP preconditioners or the multigrid preconditioners that use MSI or SIP smoothing. For large systems (where the number of equations exceeded 30000) the multigrid solvers were the fastest.
The solution times for the multigrid methods scaled roughly linearly with N for large sets of equations, unlike the other iterative schemes, for which the solution time generally scaled between roughly O(N^1.5) and O(N^2). For the preconditioned Krylov space methods the scaling was somewhere between the scaling of the Krylov method and that of its preconditioner.
The direct methods were typically the slowest of the methods tested. However it is noteworthy that for two dimensional systems with Neumann boundary conditions and more than 20000 equations the block tridiagonal solver was faster than the incomplete factorisation schemes such as MSI. The solution time for the dense LU solver scaled as noted above, whilst the block tridiagonal solver had O(N^2) and O(N^3) scaling for the 2D and 3D cases respectively.
3.2.4 Memory Usage
The memory usage quoted for each solver is the storage required in addition to that of the equations themselves (the matrix coefficients and the source vector), which are 5N and 7N words for the 2D and 3D finite volume discretisations used.
Figure 3.30: The memory usage in words of the 2D (left) and 3D (right) linear solvers, plotted against the number of equations for the LU decomposition, block tridiagonal, Jacobi, MSI, multigrid-Jacobi, multigrid-SIP, CG, BiCGSTAB and GMRES solvers.
The simple iterative schemes such as SOR used the least amount of memory, with SOR, RBSOR and SSOR not requiring any memory above the storage requirements of the equations, and Jacobi requiring only N words of memory. The simplest of the incomplete factorisation methods are also efficient in memory use, with the Incomplete Cholesky and ILU methods both requiring only 2N words of memory above the storage of the equations. However the SIP and MSI solvers require more memory, using 6N and 8N words of memory for the 2D versions, and 8N and 16N in 3D.
The Krylov space methods similarly vary in their storage requirements. The unpreconditioned CG solver is the most efficient, requiring 3N words of memory above the storage of the equations, with the unpreconditioned BiCGSTAB solver requiring 6N words of memory. The GMRES solver is comparatively greedy in its memory usage, requiring (m + 1)N + m^2 words of memory, where m is the number of search vectors stored before the solver restarts.
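Taking the counts quoted here (CG 3N, BiCGSTAB 6N, and GMRES(m) at (m + 1)N + m^2 words), a small estimator can be sketched (Python; `krylov_workspace_words` is an illustrative helper, not thesis code):

```python
def krylov_workspace_words(solver, n, m=20):
    """Workspace in words above the storage of the equations themselves,
    using the counts quoted in the text; m is the GMRES restart length."""
    if solver == "cg":
        return 3 * n
    if solver == "bicgstab":
        return 6 * n
    if solver == "gmres":
        return (m + 1) * n + m * m   # m stored search vectors plus Hessenberg
    raise ValueError(solver)
```

For large N the m^2 term is negligible, so GMRES(20) costs roughly seven times the workspace of unpreconditioned CG.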
The memory usage of the multigrid solvers depends both on the number of equations and upon the number of grid levels used. With M being the number of equations on a particular grid level, the 2D multigrid solver requires 9M, 13M and 15M words of memory for that level for the Jacobi/ILU, SIP and MSI versions of the solver respectively. For the 3D solver the requirements are 11M, 15M and 23M words. Each restriction from a fine to a coarse mesh decreases the number of equations by a factor of 2^2 for the 2D solver and 2^3 for the 3D solver, so the number of points on all meshes is given by the series (1 + 1/4 + 1/16 + ...)N for the 2D solver and (1 + 1/8 + 1/64 + ...)N for the 3D solver. Truncating both of these series at three terms gives an approximate estimate of the overall memory requirement for the solvers as 11.8N, 17.1N and 19.7N for the Jacobi/ILU, SIP and MSI versions of the 2D multigrid solver, and 12.5N, 17.1N and 26.2N for the Jacobi/ILU, SIP and MSI versions of the 3D solver.
Finally, the direct methods have a much larger memory footprint than the iterative schemes. The naive LU decomposition scheme uses N^2 words of memory, whilst the memory usage of the block tridiagonal methods depends on the relative axis lengths of the mesh upon which the finite volume equations have been discretised. For the worst case, a square or cubic mesh, the 2D block tridiagonal solver uses of the order of N^3/2 words of memory, whilst the 3D solver uses of the order of N^5/3 words. If the meshes are much shorter along one axis, however, the memory usage is reduced.
3.3 Conclusions
A numberof linearsolverssuitablefor thesolutionof elliptic PDEshave
beendescribed,andtested on theequationsarisingfrom a two
andthreedimensionalfinite volumediscretisationof theLaplace
equation.Comparisonsof thesolvershavebeenmadein termsof thespeedto
solve thesystems,and thememoryused.
For small problems (where the number of equations is less than 5000) the MSI (Modified Strongly Implicit) solver, an incomplete factorisation method, is typically the fastest method to solve the equations. For moderate sized problems (where the number of equations is of the order of 10000) the CG (Conjugate Gradient) and BiCGSTAB (Bi-Conjugate Gradient Stabilised) Krylov space methods become fastest when coupled with either the MSI or SIP (Strongly Implicit Procedure) preconditioners or the multigrid preconditioners that use MSI or SIP smoothing.
For large systems (where the number of equations exceeds 20000) the multigrid solvers become the fastest. The solution times for the multigrid methods scale roughly linearly with N for large sets of equations, unlike the other iterative schemes, for which the solution time scales between roughly O(N^1.5) and O(N^2).
The direct methods were typically the slowest methods tested. However in one case (two dimensional systems with more than 20000 equations and Neumann boundary conditions) the block tridiagonal solver was faster than the simple iterative schemes such as SOR (Successive Over-Relaxation) and the incomplete factorisation schemes such as MSI.