Chapter 3

Methods for the Solution of Linear Systems Deriving from Elliptic PDEs
A finite volume discretisation of the elliptic PDEs (Partial Differential Equations) used to describe a fluid mechanics problem generates a large sparse set of linear equations. Typically a CFD algorithm will involve the repetitive solution of a Poisson pressure equation along with scalar transport equations for momentum, enthalpy, concentration, and any other fields of interest. Normally the program will spend most of its execution time in solving these linearised equations, and so the efficiency of the linear solvers underpins the efficiency of the solution method as a whole. A crucial aspect of any efficient solution of a fluid mechanics problem is therefore the speed with which these linear(ised) equations can be solved.
In this chapter a number of different algorithms for the solution of linear equations are discussed and their resource use is compared, both in terms of speed and memory use. Whilst there are many papers comparing two or three linear solvers, comparisons of several classes of linear solver are rare in the literature. Ferziger and Peric's book [43] compares a number of methods, but the comparisons are made in terms of the number of iterations to converge instead of the time to converge, a rather meaningless measure given the variation in computational effort per iteration amongst the solvers. Botta et al [14] compare a number of methods, but the solvers were written by different groups, and so there is the possibility that some of the variation in performance is due to different coding standards. In both cases the number of methods covered is less than in the current study.
This chapter is divided into two parts; the first describes the linear solvers, whilst the second discusses their suitability for the solution of the equations that result from a finite volume discretisation of elliptic PDEs.
3.1 A Description of the Linear Solvers
Methods for solving linear equations can be divided into two classes: direct methods (i.e., those which execute in a predetermined number of operations) and iterative methods (i.e., those which attempt to converge to the desired answer in an unknown number of repeated steps). Direct methods are often used for small dense problems, but for the large sparse problems which are typically encountered in the solution of PDEs, iterative methods are usually more efficient.

The direct methods which are discussed here are Gauss-Jordan elimination, LU (Lower Upper) factorisation, Cholesky factorisation, LDL (Lower Diagonal Lower-transpose) decomposition, the Thomas tridiagonal algorithm, and block tridiagonal solvers. The iterative methods fall into four classes: simple iterative methods, incomplete factorisation methods, Krylov space methods, and multigrid methods.

The distinction between the classes of iterative solvers blurs somewhat, since multigrid methods can be considered as an acceleration technique to improve the performance of the simple iterative and incomplete factorisation methods, and Krylov space methods can use the simple iterative and incomplete factorisation methods as preconditioners. A Krylov space method using an incomplete factorisation smoothed multigrid preconditioner is a solver that combines three of the classes of iterative solver.
3.1.1 Linear Equations Resulting from a Finite Volume Discretisation of a PDE
The general form of a set of linear equations can be written

    A x = b,    (3.1)

or, considering an individual equation,

    Σ_j a_ij x_j = b_i.    (3.2)

For a finite volume discretisation of a three dimensional elliptic PDE, the matrix A will typically take on a hepta-diagonal structure, with the non-zero components occupying only seven diagonals of the matrix. For a two dimensional PDE there will be only five diagonals which are non-zero, and for a one dimensional PDE there are three non-zero diagonals. This regular structure enables a considerable reduction in memory use and in the number of operations performed, since only these diagonals need to be stored and operated upon. Using the compass notation of the equations discussed previously in Section 2.1, the above linear equation becomes

    a_P φ_P + a_E φ_E + a_W φ_W = b_P    (3.3)

for a discretisation of a one dimensional PDE,

    a_P φ_P + a_E φ_E + a_W φ_W + a_N φ_N + a_S φ_S = b_P    (3.4)

for a discretisation of a two dimensional PDE, and

    a_P φ_P + a_E φ_E + a_W φ_W + a_N φ_N + a_S φ_S + a_T φ_T + a_B φ_B = b_P    (3.5)

for a three dimensional PDE. The subscript P refers to the point at which the equation is centred, and the E, W, N, S, T and B subscripts refer to the neighbouring East, West, North, South, Top and Bottom points.

For some systems the equations are symmetric,

    a_ij = a_ji.    (3.7)
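The diagonal storage described above can be illustrated with a short sketch (an illustration only — the solvers in this study were written in Fortran 90; the coefficients below are those of a 5-point Laplacian stencil on a uniform mesh, with the boundary neighbours folded out of the system):

```python
def assemble_2d_laplacian(nx, ny):
    """Compass coefficients for a 5-point finite volume Laplacian on a
    uniform nx-by-ny mesh (unit spacing, Dirichlet boundaries folded
    into the source).  Point (i, j) is stored at index i + j*nx."""
    n = nx * ny
    aP = [4.0] * n
    aE = [-1.0] * n
    aW = [-1.0] * n
    aN = [-1.0] * n
    aS = [-1.0] * n
    for j in range(ny):
        for i in range(nx):
            p = i + j * nx
            if i == nx - 1: aE[p] = 0.0   # no East neighbour on the boundary
            if i == 0:      aW[p] = 0.0
            if j == ny - 1: aN[p] = 0.0
            if j == 0:      aS[p] = 0.0
    return {"P": aP, "E": aE, "W": aW, "N": aN, "S": aS}

def matvec(coef, x, nx, ny):
    """y = A x using only the five stored diagonals."""
    y = [0.0] * (nx * ny)
    for j in range(ny):
        for i in range(nx):
            p = i + j * nx
            s = coef["P"][p] * x[p]
            if i < nx - 1: s += coef["E"][p] * x[p + 1]
            if i > 0:      s += coef["W"][p] * x[p - 1]
            if j < ny - 1: s += coef["N"][p] * x[p + nx]
            if j > 0:      s += coef["S"][p] * x[p - nx]
            y[p] = s
    return y
```

Only the five coefficient arrays are stored, so a matrix-vector product costs O(5n) operations and O(5n) words rather than the O(n²) of a dense representation.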
3.1.2 Direct Methods

LU (Lower Upper) Factorisation
A matrix A can be factored into lower and upper triangular components such that

    A = L U.    (3.8)

This decomposition can be used to solve the equation

    A x = (L U) x = L (U x) = b    (3.9)

by first solving for the vector y such that

    L y = b    (3.10)

and then solving

    U x = y.    (3.11)
The only problem remaining is factoring A into L and U, a process that can be performed using Crout's algorithm. Taking L to have a unit diagonal, then for each column j = 1, 2, ..., n of the matrix,

    u_ij = a_ij − Σ_{k=1}^{i−1} l_ik u_kj,    i = 1, 2, ..., j
    l_ij = ( a_ij − Σ_{k=1}^{j−1} l_ik u_kj ) / u_jj,    i = j+1, ..., n.    (3.14)

The algorithm for factoring the A array into its L and U components is given in Figure 3.1, with the solution of the system (L U) x = b being given in Figure 3.2.

    for j = 1, 2, ..., n
        for i = 1, 2, ..., j
            u_ij = a_ij − Σ_{k<i} l_ik u_kj
        for i = j+1, ..., n
            l_ij = ( a_ij − Σ_{k<j} l_ik u_kj ) / u_jj

Figure 3.1: Factoring A into L U.

    for i = 1, 2, ..., n
        y_i = b_i − Σ_{k<i} l_ik y_k
    for i = n, n−1, ..., 1
        x_i = ( y_i − Σ_{k>i} u_ik x_k ) / u_ii

Figure 3.2: Solving the system (L U) x = b.
For a system of n equations (which would correspond to the n points on a finite volume mesh) the storage requirement for an LU factorisation is n² + n words, whilst the number of operations is of O(n³) for the factorisation, and O(n²) for the solution. For a sparse system of equations such a scheme is rather inefficient, since n is likely to be large and most of A is zero. A more efficient band diagonal version, where the array is stored as a band only wide enough to hold the farthest off-diagonal band, is implemented in Press et al [133].
For symmetric matrices, where A = Aᵀ, advantage can be taken of the symmetry of the system. The Cholesky factorisation A = L Lᵀ (sometimes referred to as the square root of the matrix) halves the storage required for the factors, whilst the LDL (Lower Diagonal Lower-transpose) decomposition A = L D Lᵀ in addition avoids the n square root operations of the Cholesky factorisation, a square root being a slow operation on many computers.
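A minimal sketch of both symmetric factorisations (dense Python, assuming A is symmetric positive definite):

```python
import math

def cholesky(a):
    """Lower triangular L with A = L L^T."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            # n square roots in total, one per diagonal element
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def ldl(a):
    """(L, d) with A = L D L^T, L unit lower triangular, D = diag(d).
    The square roots of the Cholesky factorisation are avoided."""
    n = len(a)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = a[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            L[i][j] = (a[i][j] - sum(L[i][k] * L[j][k] * d[k] for k in range(j))) / d[j]
    return L, d
```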
Thomas Tridiagonal Solver
For the tridiagonal systems resulting from the discretisation of one dimensional PDEs (equation (3.3)), the LU factorisation reduces to the Thomas algorithm. Writing the system as

    l_i x_{i−1} + d_i x_i + u_i x_{i+1} = b_i,    i = 1, 2, ..., n    (3.15)

the solution requires only a forward and a backward substitution; the algorithm is given in Figure 3.3.

    u′_1 = u_1 / d_1
    b′_1 = b_1 / d_1
    for i = 2, 3, ..., n
        m = d_i − l_i u′_{i−1}
        u′_i = u_i / m
        b′_i = ( b_i − l_i b′_{i−1} ) / m
    x_n = b′_n
    for i = n−1, n−2, ..., 1
        x_i = b′_i − u′_i x_{i+1}

Figure 3.3: The Thomas Tridiagonal Algorithm.
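Figure 3.3 translates directly into code (a sketch; sub[0] and sup[n−1] are unused, and the system is assumed to need no pivoting, as is the case for the diagonally dominant systems considered here):

```python
def thomas(sub, diag, sup, rhs):
    """Solve the tridiagonal system sub[i]*x[i-1] + diag[i]*x[i] +
    sup[i]*x[i+1] = rhs[i] in O(n) operations: one forward elimination
    sweep followed by one backward substitution."""
    n = len(diag)
    cp = [0.0] * n                 # modified super-diagonal
    dp = [0.0] * n                 # modified right hand side
    cp[0] = sup[0] / diag[0]
    dp[0] = rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - sub[i] * cp[i - 1]
        cp[i] = sup[i] / m
        dp[i] = (rhs[i] - sub[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[n - 1] = dp[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

Both the storage and the operation count are linear in n, which is why the tridiagonal solver is the benchmark the block solvers below are measured against.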
Block Tridiagonal Solvers
For a finite volume discretisation of a two dimensional PDE as described by equation (3.4), the matrix structure can be viewed as block tridiagonal,

    B_i x_{i−1} + A_i x_i + C_i x_{i+1} = b_i,    i = 1, 2, ..., m    (3.16)

where x_i and b_i are the vectors of unknowns and sources for the i-th mesh line, the submatrices A_i are tridiagonal, containing the a_P, a_E and a_W coefficients of that line, and the submatrices B_i and C_i are diagonal, containing the a_S and a_N coefficients. In a manner similar to the Thomas tridiagonal algorithm these equations can be solved by a forward and a backwards block substitution. A similar block structure can be used to solve three dimensional PDEs.
The matrix inverses used in Figure 3.4, which contains the block tridiagonal algorithm, are purely for notational purposes. Instead of calculating an inverse and performing a matrix multiplication, each system of the form A_i Γ_i = C_i can be solved by factoring A_i into its L and U components and substituting for each column of the right hand side.

    Γ_1 = A_1⁻¹ C_1
    y_1 = A_1⁻¹ b_1
    for i = 2, 3, ..., m
        M = A_i − B_i Γ_{i−1}
        Γ_i = M⁻¹ C_i
        y_i = M⁻¹ ( b_i − B_i y_{i−1} )
    x_m = y_m
    for i = m−1, m−2, ..., 1
        x_i = y_i − Γ_i x_{i+1}

Figure 3.4: The Block Tridiagonal Algorithm.
The two dimensional version of the solver uses n^(3/2) words of storage and takes O(n²) operations (assuming a square mesh), whilst the three dimensional version uses n^(5/3) words of storage and O(n^(7/3)) operations. Whilst this is better than LU factorisation, it is not as good as the one dimensional tridiagonal solver, and is less efficient than most iterative schemes. However for cases where one dimension of the problem is much greater than the others the dense submatrices can be made smaller and the efficiency of the method (both in terms of storage and number of operations) can be greatly improved¹.
3.1.3 Iterative Methods

The four classes of iterative methods discussed in this chapter are simple iterative methods, such as Jacobi iteration; incomplete factorisation schemes, such as SIP (the Strongly Implicit Procedure, also known as Stone's method); Krylov space methods, such as the Conjugate Gradient method; and multigrid schemes.
¹For an n × n × m three dimensional mesh, the storage required is of order m n⁴ words, and the operation count of order O(m n⁶). By changing from a cubic mesh to one where m is much smaller than n, the storage requirement and the operation count are both reduced accordingly.
Convergence of an Iterative Scheme

Some measure must be made of the fitness of the approximate solutions so that the decision to terminate the solver's iterative loop can be made. If the solution at iteration k is x^k, then the error at iteration k is e^k = x − x^k, where x is the exact solution. Of course the exact solution is unknown, and thus so is the error. However we can easily calculate the residual at any step, and the residual is proportional to the error, so if the residual decreases by a factor of 10⁶ so does the error. The residual of the system (3.1) is defined by

    r^k = b − A x^k.    (3.22)

Most iterative linear solvers include a calculation of the residual (or a close approximation) as part of the solution algorithm, and so with these methods the convergence of the solver can be monitored at no extra computational cost.
A scalar measure of the residual vector r's length is given by its norm. A family of norms is given by

    ‖r‖_p = ( Σ_{i=1}^{n} |r_i|^p )^(1/p).    (3.23)

The one, two and infinity norms are

    ‖r‖_1 = Σ_{i=1}^{n} |r_i|,    (3.24)

    ‖r‖_2 = ( Σ_{i=1}^{n} r_i² )^(1/2),    (3.25)

and

    ‖r‖_∞ = max_{i=1,...,n} |r_i|.    (3.26)

For the tests described in the latter sections of this chapter the infinity norm was used as the measure of convergence.
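The norm family (3.23)-(3.26) is a few lines of code (a sketch; here p=None selects the infinity norm):

```python
def norm(r, p=None):
    """p-norm of the residual vector r; p=None gives the infinity norm
    used as the convergence measure in this chapter."""
    if p is None:
        return max(abs(ri) for ri in r)
    return sum(abs(ri) ** p for ri in r) ** (1.0 / p)
```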
The Residual Form of the Equations

If δ is the difference between successive solutions by an iterative method, δ = x^{k+1} − x^k, then the system

    A x = b    (3.27)

can be rewritten as

    A ( x^k + δ ) = b,
    A δ = b − A x^k.    (3.28)

The right hand side of the last equation in (3.28) is the residual of the k-th iteration, and thus we can rewrite equation (3.27) in its residual form,

    r^k = b − A x^k,
    A δ = r^k,    (3.29)
    x^{k+1} = x^k + δ.
Solving the equations in their residual form can also improve the numerical behaviour of a solver, since it avoids the subtraction of terms where differences can be several orders of magnitude smaller than the terms themselves.
3.1.4 Simple Iterative Methods

These methods are based on a simple update formula which is applied iteratively to the system to be solved. Jacobi's method, Gauss-Seidel, Successive Over Relaxation (SOR), Symmetric Successive Over Relaxation (SSOR) and Red-Black Successive Over Relaxation (RBSOR) are all linear solvers of this form. These methods are the simplest to implement, but are typically the slowest to converge to a solution. However they can be effectively used as preconditioners for Krylov space methods, or as smoothers for multigrid schemes.
Jacobi's Method

Each equation of the system can be rearranged to express the central unknown in terms of the remaining terms. This suggests an iterative method defined by

    x_i^k = ( b_i − Σ_{j≠i} a_ij x_j^{k−1} ) / a_ii    (3.33)

where the terms on the right hand side of the equation are all from the previous iteration. The algorithm is given in Figure 3.6.

    set k = 1
    while k ≤ k_max and ε > ε_tol
        δ_i = ( b_i − Σ_{j≠i} a_ij x_j^{k−1} ) / a_ii − x_i^{k−1}
        x_i^k = x_i^{k−1} + ω δ_i
        ε = ‖δ‖_∞
        k = k + 1

Figure 3.6: The Jacobi method.
The Jacobi method requires n words of additional storage, to hold the update δ.
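The loop of Figure 3.6 can be sketched for a dense system (an illustration only, including the relaxation factor ω and the infinity-norm stopping test on the update):

```python
def jacobi(a, b, tol=1e-6, kmax=10000, omega=1.0):
    """Relaxed Jacobi iteration.  Every term on the right hand side uses
    only values from the previous sweep, so the update order is
    irrelevant and the loop vectorises or parallelises trivially."""
    n = len(b)
    x = [0.0] * n
    for _ in range(kmax):
        delta = []
        for i in range(n):
            s = sum(a[i][j] * x[j] for j in range(n) if j != i)
            delta.append((b[i] - s) / a[i][i] - x[i])
        x = [x[i] + omega * delta[i] for i in range(n)]
        if max(abs(d) for d in delta) < tol:   # infinity norm of the update
            break
    return x
```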
Successive Over Relaxation (SOR)

The Gauss-Seidel method uses the values already updated in the current sweep as soon as they become available, and SOR accelerates it with a relaxation factor ω,

    x_i^k = x_i^{k−1} + ω [ ( b_i − Σ_{j<i} a_ij x_j^k − Σ_{j>i} a_ij x_j^{k−1} ) / a_ii − x_i^{k−1} ].    (3.34)

The equations now contain a data dependency, with the new value of x_i^k depending on the updated values from the previous equations. This limits the ability to parallelise or vectorise the algorithm. The dependency can be removed by changing the order in which the equations are solved (such as is done with RBSOR) but this in turn will affect the rate of convergence, typically for the worse.
    set k = 1
    while k ≤ k_max and ε > ε_tol
        for i = 1, 2, ..., n
            δ = ( b_i − Σ_{j<i} a_ij x_j − Σ_{j>i} a_ij x_j ) / a_ii − x_i
            x_i = x_i + ω δ
        ε = ‖δ‖_∞
        k = k + 1

Figure 3.7: The SOR algorithm.
The choice of the value of ω is crucial to the convergence of the SOR method. A theorem due to Kahan [74] shows that SOR fails to converge if ω is outside the interval (0, 2). If the relaxation term ω = 1 then SOR reduces to the Gauss-Seidel method.
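A corresponding sketch of Figure 3.7 (the in-place update of x is what creates the data dependency discussed above; ω = 1 gives Gauss-Seidel):

```python
def sor(a, b, omega=1.2, tol=1e-6, kmax=10000):
    """Successive Over Relaxation for a dense system.  Each equation
    uses the values already updated in the current sweep, so the result
    depends on the ordering of the loop over i."""
    n = len(b)
    x = [0.0] * n
    for _ in range(kmax):
        eps = 0.0
        for i in range(n):
            s = sum(a[i][j] * x[j] for j in range(n) if j != i)
            delta = (b[i] - s) / a[i][i] - x[i]
            x[i] += omega * delta          # in-place: creates the dependency
            eps = max(eps, abs(delta))
        if eps < tol:
            break
    return x
```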
Symmetric SOR and Red-Black SOR utilise the same algorithm but with changes made to the order of the operations. With Symmetric SOR every sweep through the equations in the order 1, 2, ..., n is followed by a sweep in the reverse order, n, n−1, ..., 1. With Red-Black SOR the operations are done in two passes, the first pass operating on the odd elements of the array, 1, 3, ..., n−1, followed by the second pass which operates on the even elements, 2, 4, ..., n.

The advantage of Red-Black SOR is that dependencies between adjacent array values are removed, enabling the method to be vectorised or parallelised. However, as is seen in Figures 3.22 to 3.25, this is at the cost of a greatly reduced rate of convergence.
3.1.5 Incomplete Factorisation Methods

These methods perform an approximate factorisation of the A array into its L and U components, the factors being constrained to have non-zero elements only within the bands of the original matrix and one or two further bands. For each iteration a forward and backward substitution process using this incomplete factorisation is applied to the residual formulation of the system of equations.

These methods are efficient in their own right, but also have value as preconditioners for the Krylov space solvers, and as smoothers for the multigrid schemes. The most commonly encountered implementations are Incomplete Cholesky factorisation (commonly used as a preconditioner for the Conjugate Gradient method) and the Strongly Implicit method of Stone [163].
Incomplete Cholesky Factorisation (IC)

    set m = 1
    while m ≤ m_max and ε > ε_tol
        r^{m−1} = b − A x^{m−1}
        ε = ‖r^{m−1}‖_∞
        for all mesh points (i, j, k), ascending
            y_ijk = d_ijk ( r_ijk − a_W,ijk y_{i−1,j,k} − a_S,ijk y_{i,j−1,k} − a_B,ijk y_{i,j,k−1} )
        for all mesh points (i, j, k), descending
            y_ijk = y_ijk − d_ijk ( a_E,ijk y_{i+1,j,k} + a_N,ijk y_{i,j+1,k} + a_T,ijk y_{i,j,k+1} )
        x^m = x^{m−1} + y
        m = m + 1

Figure 3.8: The Incomplete Cholesky method.

    for all mesh points (i, j, k), ascending
        d_ijk = 1 / sqrt( a_P,ijk − (a_W,ijk d_{i−1,j,k})² − (a_S,ijk d_{i,j−1,k})² − (a_B,ijk d_{i,j,k−1})² )

Figure 3.9: The Incomplete Cholesky factorisation.

If the L array is taken to have the same off-diagonal values as A, then the incomplete factorisation M can be written as

    M = ( D + L ) ( D + L )ᵀ    (3.37)
where the D array only has non-zero components on the diagonal, the values being

    d_i = 1 / sqrt( a_ii − Σ_{j<i} ( a_ij d_j )² )    (3.38)

whilst the off-diagonal values of the L matrix have the same values as the corresponding elements in the A matrix. For each iteration the residual is calculated,

    r^{m−1} = b − A x^{m−1}    (3.39)

then an update is calculated by solving

    ( D + L ) y = r^{m−1},    ( D + L )ᵀ δ = y    (3.40)

with the update then being added to the previous iteration's solution,

    x^m = x^{m−1} + δ.    (3.41)

The solution algorithm is given in Figure 3.8, with the factorisation being given in Figure 3.9.
    set m = 1
    while m ≤ m_max and ε > ε_tol
        r^{m−1} = b − A x^{m−1}
        ε = ‖r^{m−1}‖_∞
        for all mesh points (i, j, k), ascending
            y_ijk = d_ijk ( r_ijk − a_W,ijk y_{i−1,j,k} − a_S,ijk y_{i,j−1,k} − a_B,ijk y_{i,j,k−1} )
        for all mesh points (i, j, k), descending
            y_ijk = y_ijk − d_ijk ( a_E,ijk y_{i+1,j,k} + a_N,ijk y_{i,j+1,k} + a_T,ijk y_{i,j,k+1} )
        x^m = x^{m−1} + y
        m = m + 1

Figure 3.10: The Incomplete LU method.
The method is not as fast as the SIP and MSI formulations described below, and is not often used as a solver. However it has frequently been used as a preconditioner for the Conjugate Gradient method, the combination being referred to as the Incomplete Cholesky-Conjugate Gradient method, or ICCG. The solver can be modified to remove the need to take square roots in the factorisation of the equations (the Incomplete LDL method), and to be applied to non-symmetric systems (the Incomplete LU method (ILU)). The factorisation and solution algorithms for the ILU solver are given in Figures 3.10 and 3.11.
The factorisation step of the IC and ILU solvers only requires the storage of the diagonal of the D matrix, whilst the solvers themselves require the storage of the residual r. Therefore the memory usage of the method is only 2n words.
    for all mesh points (i, j, k), ascending
        d_ijk = 1 / ( a_P,ijk − a_W,ijk a_E,{i−1,j,k} d_{i−1,j,k} − a_S,ijk a_N,{i,j−1,k} d_{i,j−1,k} − a_B,ijk a_T,{i,j,k−1} d_{i,j,k−1} )

Figure 3.11: The Incomplete LU factorisation.
Strongly Implicit Procedure (SIP)

In the SIP method of Stone [163] the approximate factorisation that is used is

    M = L U = A + E    (3.42)

where E is the error between the exact and approximate factorisations. The factorisation introduces extra non-zero diagonals into M: two extra non-zero diagonals if A is a five-diagonal matrix resulting from a two dimensional PDE, or six extra diagonals if it is a seven-diagonal matrix from a three dimensional PDE.

To make M a good approximation of A, the E array is set such that

    E φ ≈ 0.    (3.43)

This is done by recognising that the system being solved is from a finite volume approximation of a PDE. Thus the values of the φ field in the extra diagonals of E can be approximated by a second order extrapolation of the values of φ at the stencil points. By putting the terms for the extrapolation into the elements of E and cancelling with the values of A in the extra diagonals of E, the system can be made to approximate equation (3.43). Finally, to make the LU factorisation unique, the diagonal elements of U are set to 1.
The system of equations is then solved iteratively in a similar manner to the solution of the Incomplete Cholesky system; for each iteration the residual is calculated,

    r^{m−1} = b − A x^{m−1}    (3.44)

then an update is calculated by solving

    L y = r^{m−1},    U δ = y    (3.45)

with the update then being added to the previous iteration's solution,

    x^m = x^{m−1} + δ.    (3.46)

As is seen in Figures 3.22 to 3.25, the SIP solver is much faster than the simpler ILU and IC schemes, and so is suitable as a solver in its own right, as well as a smoother with other iterative methods. It requires O(n) words of storage for the solution of both two and three dimensional PDEs, the constant being larger in three dimensions.
In the following, l and u denote the coefficients of the incomplete L and U factors, and p the reciprocal of the diagonal of L.

    set m = 1
    while m ≤ m_max and ε > ε_tol
        r^{m−1} = b − A x^{m−1}
        ε = ‖r^{m−1}‖_∞
        for all mesh points (i, j, k), ascending
            y_ijk = p_ijk ( r_ijk − l_W,ijk y_{i−1,j,k} − l_S,ijk y_{i,j−1,k} − l_B,ijk y_{i,j,k−1} )
        for all mesh points (i, j, k), descending
            y_ijk = y_ijk − u_E,ijk y_{i+1,j,k} − u_N,ijk y_{i,j+1,k} − u_T,ijk y_{i,j,k+1}
        x^m = x^{m−1} + y
        m = m + 1

Figure 3.12: The Strongly Implicit Procedure (SIP) of Stone.
The formulas for the factorisation, which involve the partial cancellation parameter α, are given in Figure 3.13.

Figure 3.13: The incomplete LU factorisation used in SIP.
Modified Strongly Implicit Procedure (MSI)

In the MSI method of Schneider and Zedan, the L and U factors of the decomposition in equation (3.42) are allowed to have more non-zero elements than the equation matrix A, giving a closer approximation to A.
The solution algorithm, which sweeps over this extended stencil, is given in Figure 3.14.

Figure 3.14: The Modified Strongly Implicit procedure (MSI) of Schneider and Zedan.
Figure 3.15: The incomplete LU factorisation used in the MSI method.
3.1.6 Krylov Space Methods

Since the development of the Conjugate Gradient scheme a number of other Krylov space schemes have been devised, with good summaries of the methods being given in the books by Barrett et al [7] and Golub and Van Loan [51]. A more intuitive introduction to the methods is given in the paper by Shewchuk [154].

In this chapter the Conjugate Gradient (CG), the Bi-Conjugate Gradient Stabilised (BiCGSTAB), and the Generalised Minimal Residual (GMRES) methods are discussed. Other methods which are commonly encountered in the literature include the Steepest Descent (SD), the Bi-Conjugate Gradient (BiCG), the Conjugate Gradient Squared (CGS), and the Quasi-Minimal Residual (QMR) methods. These additional methods were also implemented for this study but offered no advantages over the three methods discussed below. They are briefly discussed at the end of the section, but for a fuller description the reader is referred to Barrett et al [7] and Golub and Van Loan [51].
Conjugate Gradient (CG)

Following the presentation given in Barrett et al [7]: for each iteration the residual is minimised along a path orthogonal to the previous searches, the solver stepping along the residual surface in the solution space to find the minimum residual. At each iteration the iterate x^k is updated with a search direction vector p^k,

    x^k = x^{k−1} + α^k p^k.    (3.47)

The residual r^k = b − A x^k is updated as

    r^k = r^{k−1} − α^k q^k    where    q^k = A p^k.    (3.48)

The choice

    α^k = (r^{k−1})ᵀ r^{k−1} / (p^k)ᵀ A p^k

minimises (r^k)ᵀ r^k over all choices of α. The search directions are updated using the residuals,

    p^k = r^{k−1} + β^{k−1} p^{k−1}    (3.49)

where the choice

    β^{k−1} = (r^{k−1})ᵀ r^{k−1} / (r^{k−2})ᵀ r^{k−2}    (3.50)

ensures that r^k and r^{k−1}, and p^k and p^{k−1}, are orthogonal.
The pseudocode for the preconditioned conjugate gradient method is given in Figure 3.16, the preconditioning being the "solve M z^{k−1} = r^{k−1}" operation. If M is set to the identity matrix I then for each iteration z = r and the algorithm simplifies to its unpreconditioned form. The preconditioned form of the solver requires 4n words of storage (not including that required by the preconditioner), whilst the unpreconditioned form requires 3n words.

    set k = 1
    while k ≤ k_max and ε > ε_tol
        solve M z^{k−1} = r^{k−1}
        ρ^{k−1} = (r^{k−1})ᵀ z^{k−1}
        if k = 1
            p^1 = z^0
        else
            β^{k−1} = ρ^{k−1} / ρ^{k−2}
            p^k = z^{k−1} + β^{k−1} p^{k−1}
        q^k = A p^k
        α^k = ρ^{k−1} / (p^k)ᵀ q^k
        x^k = x^{k−1} + α^k p^k
        r^k = r^{k−1} − α^k q^k
        ε = ‖r^k‖_∞
        k = k + 1

Figure 3.16: The preconditioned Conjugate Gradient algorithm.
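The structure of Figure 3.16 can be sketched with the matrix and preconditioner supplied as callables (an illustration only; psolve applies M⁻¹ to a vector, and omitting it gives the unpreconditioned form):

```python
def pcg(matvec, b, psolve=None, tol=1e-8, kmax=1000):
    """Preconditioned Conjugate Gradient.  matvec(v) returns A v;
    psolve(r) returns M^-1 r (identity when omitted).  Starts from
    x = 0, so the initial residual is b itself."""
    n = len(b)
    x = [0.0] * n
    r = b[:]
    if psolve is None:
        psolve = lambda v: v[:]
    rho_old = 1.0
    p = None
    for k in range(kmax):
        if max(abs(ri) for ri in r) < tol:     # infinity-norm test
            break
        z = psolve(r)
        rho = sum(ri * zi for ri, zi in zip(r, z))
        if p is None:
            p = z[:]
        else:
            beta = rho / rho_old
            p = [zi + beta * pi for zi, pi in zip(z, p)]
        q = matvec(p)
        alpha = rho / sum(pi * qi for pi, qi in zip(p, q))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rho_old = rho
    return x
```

Passing the routine only a matrix-vector product is what allows the compact diagonal storage of Section 3.1.1 to be used unchanged.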
Generalised Minimal Residual (GMRES)

The GMRES iterates are constructed as the series

    x^k = x^0 + y_1 v^1 + ... + y_k v^k    (3.51)

where the v^i form an orthogonal basis for the Krylov space and the y_i coefficients have been chosen to minimise the residual norm. The number of operations in the calculation of the x^k iterate thus increases linearly with the number of iterations, as does the storage used. To place an upper limit on the storage required by the scheme, the solver is commonly implemented with a restart after n_restart iterations, limiting the memory usage to n + n_restart ( n + n_restart + 4 ) words of storage.
The algorithm for the restarted GMRES solver is given in Figure 3.17. It is taken from the method suggested by Saad and Schultz [145].

Figure 3.17: The preconditioned restarted GMRES method.
Bi-Conjugate Gradient Stabilised (BiCGSTAB)

The BiCGSTAB solver is a variant of the Bi-Conjugate Gradient method with smoother convergence behaviour; the preconditioned algorithm is given in Figure 3.18.

Figure 3.18: The preconditioned Bi-Conjugate Gradient Stabilised algorithm.
Other Krylov Space Methods

The Steepest Descent scheme (SD) is an optimisation method: by minimising the residuals of the linear equations it arrives at the equations' solution [154]. For the current iteration the solver corrects the solution in the direction of the steepest downward gradient. It is simple but inefficient.

The Bi-Conjugate Gradient (BiCG) method generates two Conjugate Gradient-like sequences of vectors, one based on the system matrix A and the other on its transpose Aᵀ. The Conjugate Gradient Squared (CGS) method is a modification of the BiCG solver that applies the updating operations for the A sequence and the Aᵀ sequence to both vectors. Ideally this would double the convergence rate, but in practice convergence is very irregular.

The Quasi-Minimal Residual (QMR) [46] method applies a least squares solve and update to the BiCG residuals, smoothing out the convergence of the method and preventing the breakdowns that can occur with BiCG.
Preconditioners

A preconditioner is applied by solving a system of the form M z = r, where r is the current residual field, z is the preconditioned residual, and M is a matrix having similar properties to A. If M is identical to A then the preconditioned solver would converge in a single iteration, but applying the preconditioner would be as expensive as solving the original system; in practice M is chosen to be a close approximation to A that is cheap to solve.

Figure 3.19: Comparison of the convergence of the MSI, the unpreconditioned CG, and the MSI preconditioned CG solvers.
The convergence of an unpreconditioned CG solver, and of an MSI solver together with an MSI preconditioned CG solver, are shown in Figure 3.19. The conjugate gradient solver converges slowly at first, with the convergence rate increasing after 10 seconds of solution time. In contrast, the incomplete factorisation based solvers converge at a steadier rate.
Among the easiest preconditioning methods to implement are the Jacobi and Symmetric SOR algorithms. More complex methods such as Incomplete Cholesky decomposition, Incomplete LU methods, and the multigrid schemes give faster convergence, at the expense of greater memory usage and a more complex implementation.
3.1.7 Multigrid Methods

For the multigrid method discussed here, a set of equations is given for a PDE discretised on a fine mesh. The multigrid scheme then transforms the equations onto a series of progressively coarser meshes, solving the equations fully on the coarsest mesh. The solution is then solved for on the series of successively finer meshes, using the solution from the previous (coarser) mesh as an initial estimate of the solution, finally solving on the finest mesh. By transferring from a fine to a coarse mesh the medium wavelength errors in the fine mesh solution are transformed into short wavelength errors on the coarse mesh, which are much easier to smooth. The computational cost of solving on the coarse meshes is low, and the cost of solving on the finer meshes is reduced by using the coarse mesh solution as an initial estimate.
The three basic operators for the multigrid technique are the smoother, which improves the current estimate of the solution on a given mesh, and the restriction and prolongation operators, which map a set of equations and a solution between a fine and a coarse mesh.
Given two meshes, with the fine mesh having a mesh spacing of h and the coarse mesh a spacing of 2h, the restriction operator maps the fine mesh solution onto the coarse mesh, and is written

    φ^{2h} = I_h^{2h} φ^h    (3.53)

whilst the prolongation operator performs the inverse operation, interpolating a coarse mesh solution onto a fine mesh,

    φ^h = I_{2h}^h φ^{2h}.    (3.54)
For all but the coarsest mesh the multigrid solver is recursively applied to solve the restricted system of equations. On the coarsest mesh the restricted system is solved either by an iterative method, solving the equations to full convergence, or by the use of a direct method.

On the finest mesh the infinity norm of the residual is taken and compared to a supplied tolerance. If the norm of the residual is less than the tolerance then the solver is taken to have converged and the iteration is terminated.
    while k ≤ k_max
        smooth A^h x^k = b^h
        r^k = b^h − A^h x^k
        if on finest mesh
            ε^k = ‖r^k‖_∞
            if ε^k ≤ ε_tol exit
        r^{2h} = I_h^{2h} r^k
        if on coarsest mesh
            solve A^{2h} δ^{2h} = r^{2h}
        else
            apply multigrid to A^{2h} δ^{2h} = r^{2h}
        δ = I_{2h}^h δ^{2h}
        x^{k+1} = x^k + δ
        k = k + 1

Figure 3.20: The multigrid algorithm.

The superscript k refers to the iteration number, whilst the superscript 2h specifies that the variable applies to the coarse mesh.
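A compact recursive sketch of Figure 3.20 for the one dimensional Poisson equation −u″ = f (an illustration only: for robustness it uses weighted Jacobi smoothing with the common full-weighting restriction and linear-interpolation prolongation, rather than the injection operators derived in the next subsection; the number of interior points must be 2^k − 1):

```python
def smooth(x, b, h, sweeps=3, w=2.0 / 3.0):
    """Weighted Jacobi smoothing for the stencil (-1, 2, -1)/h^2
    with zero Dirichlet boundaries."""
    n = len(x)
    for _ in range(sweeps):
        xn = x[:]
        for i in range(n):
            left = x[i - 1] if i > 0 else 0.0
            right = x[i + 1] if i < n - 1 else 0.0
            xn[i] = (1 - w) * x[i] + w * (b[i] * h * h + left + right) / 2.0
        x = xn
    return x

def residual(x, b, h):
    n = len(x)
    r = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        r.append(b[i] - (2.0 * x[i] - left - right) / (h * h))
    return r

def v_cycle(x, b, h):
    """One V-cycle: smooth, restrict the residual, solve or recurse on
    the coarse mesh, prolongate the correction, smooth again."""
    x = smooth(x, b, h)
    n = len(x)
    if n <= 1:
        return [b[0] * h * h / 2.0]          # exact solve on the coarsest mesh
    r = residual(x, b, h)
    nc = (n - 1) // 2                        # coarse points sit at odd fine indices
    r2h = [(r[2 * i] + 2.0 * r[2 * i + 1] + r[2 * i + 2]) / 4.0 for i in range(nc)]
    d2h = v_cycle([0.0] * nc, r2h, 2.0 * h)
    d = [0.0] * n                            # linear-interpolation prolongation
    for i in range(nc):
        d[2 * i + 1] += d2h[i]
        d[2 * i] += 0.5 * d2h[i]
        d[2 * i + 2] += 0.5 * d2h[i]
    x = [xi + di for xi, di in zip(x, d)]
    return smooth(x, b, h)
```

Each V-cycle costs O(n) work, and the residual is typically reduced by a roughly constant factor per cycle, independent of the mesh size.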
Prolongation and Restriction Operators

For the first type of prolongation and restriction scheme, where the equations are rederived and the boundary conditions re-applied on each mesh, the solver is necessarily closely tied to the discretisation of the PDE. The resulting code is not very general, and for complicated differencing schemes the calculation of the equations can be slow². For such a scheme the value of the solution at each point can be transferred to the corresponding point on the coarser/finer mesh, an operation called straight injection. For a finite volume solver this carries some extra overhead in that the cell centres of the two meshes don't align, and instead of a simple injection process the solution must be averaged over the cells.
For a black box solver that is not directly coupled to the discretisation process, the prolongation and restriction operators must be derived from the fine mesh equations rather than re-discretised from the underlying equations. A method that applies to the solution of PDEs is developed below. The method is applied to meshes that have 2^j + 3 nodes along each axis including boundary nodes (ie: 2^j + 1 internal points on each axis). However it can be used for systems with m 2^j + 3 nodes on each axis, where m + 1 is the number of points along the axis on the coarsest mesh. The method can be used for the equations arising from both finite volume and finite difference discretisations, and straight injection can be used for transferring fields between the different meshes without the problems of averaging solutions, as can be the case with the finite volume schemes. In addition the boundary conditions do not need to be re-applied on each mesh.
²Moreover, for some equations, such as the pressure correction equation in SIMPLE coupling schemes, the variables are defined only upon the mesh they were created on.
Consider a one dimensional system of fine mesh equations,

    a_P1 φ_1 + a_E1 φ_2 = b_1
    a_Wi φ_{i−1} + a_Pi φ_i + a_Ei φ_{i+1} = b_i,    i = 2, ..., n−1    (3.55)
    a_Wn φ_{n−1} + a_Pn φ_n = b_n.

For the above system the φ_1 and φ_n nodes are interior boundary points (ie: the first row of points on the interior of a solution domain), and the boundary conditions have been applied in a form that removes the need for the points physically located on the boundary (see Section 2.3).

For an elliptic PDE the solution must be reasonably smooth, and so the solution values at the even numbered points in the mesh can be estimated from a second order centred interpolation from the odd numbered points,

    φ_2 = ½ ( φ_1 + φ_3 ),    φ_4 = ½ ( φ_3 + φ_5 ),    ....    (3.56)
By substituting these equations into the initial system, a system with only half the equations of the original system is generated; for the odd numbered equations,

    a_Wi φ_{i−2} + ( 2a_Pi + a_Ei + a_Wi ) φ_i + a_Ei φ_{i+2} = 2 b_i    (3.57)

which can be rewritten

    a_W^{2h} φ_{i−2} + a_P^{2h} φ_i + a_E^{2h} φ_{i+2} = b^{2h}    (3.58)

where

    a_P^{2h} = 2 a_P + a_E + a_W,
    a_E^{2h} = a_E,    a_W^{2h} = a_W,    (3.59)
    b^{2h} = 2 b^h.
For these equations, restricting the field from one mesh to the next coarser mesh can be accomplished using simple injection,

    φ_i^{2h} = φ_{2i−1}^h.    (3.60)

The corresponding prolongation operator is

    φ_{2i−1}^h = φ_i^{2h},    i = 1, 2, 3, ...    (3.61)

for the odd numbered points, whilst the even numbered points are found by linear interpolation,

    φ_j^h = ½ ( φ_{j−1}^h + φ_{j+1}^h ),    j = 2, 4, ....    (3.62)
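Equations (3.59)-(3.62) can be sketched directly (an illustration only; 0-based lists, so the odd numbered points 1, 3, 5, ... of the text become indices 0, 2, 4, ...):

```python
def restrict_equations(aW, aP, aE, b):
    """Coarse mesh equations via (3.59): substitute the centred
    interpolation (3.56) into the odd numbered fine mesh equations."""
    aW2, aP2, aE2, b2 = [], [], [], []
    for i in range(0, len(aP), 2):          # odd numbered points, 0-based
        aP2.append(2.0 * aP[i] + aE[i] + aW[i])
        aE2.append(aE[i])
        aW2.append(aW[i])
        b2.append(2.0 * b[i])
    return aW2, aP2, aE2, b2

def restrict_field(phi):
    """Simple injection (3.60): keep the odd numbered fine values."""
    return phi[0::2]

def prolong_field(phi2, n):
    """Prolongation (3.61)-(3.62): inject the coarse values, then fill
    the even numbered points by linear interpolation (zero beyond the
    ends, matching the folded boundary conditions of (3.55))."""
    phi = [0.0] * n
    phi[0::2] = phi2
    for j in range(1, n, 2):
        right = phi[j + 1] if j + 1 < n else 0.0
        phi[j] = 0.5 * (phi[j - 1] + right)
    return phi
```

For the 1D Laplacian coefficients a_P = 2, a_E = a_W = −1, the restricted operator keeps the same stencil, which is what makes the recursion in Figure 3.20 straightforward.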
For a two dimensional system the equation restriction operator becomes

    a_P^{2h} = 2 a_P + a_E + a_W + a_N + a_S,
    a_E^{2h} = a_E,    a_W^{2h} = a_W,
    a_N^{2h} = a_N,    a_S^{2h} = a_S,    (3.63)
    b_ij^{2h} = 2 b_{2i−1,2j−1}^h,

where the coefficients on the right hand sides are those of the fine mesh equation at the point (2i−1, 2j−1). The field restriction is

    φ_ij^{2h} = φ_{2i−1,2j−1}^h    (3.64)

and the prolongation operator becomes

    φ_{2i−1,2j−1}^h = φ_ij^{2h},    i = 1, 2, ...,  j = 1, 2, ...    (3.65)

with the remaining points being found by bilinear interpolation.
Similarly, in three dimensions the equation restriction operators are

    a_P^{2h} = 2 a_P + a_E + a_W + a_N + a_S + a_T + a_B,
    a_E^{2h} = a_E,    a_W^{2h} = a_W,
    a_N^{2h} = a_N,    a_S^{2h} = a_S,    (3.66)
    a_T^{2h} = a_T,    a_B^{2h} = a_B,
    b_ijk^{2h} = 2 b_{2i−1,2j−1,2k−1}^h,

the field restriction is

    φ_ijk^{2h} = φ_{2i−1,2j−1,2k−1}^h    (3.67)

and the prolongation becomes

    φ_{2i−1,2j−1,2k−1}^h = φ_ijk^{2h},    i, j, k = 1, 2, ...    (3.68)

with the remaining points found by trilinear interpolation.
The solve operation in Figure 3.20 varies depending on what level of the mesh hierarchy the solver is on. For all but the coarsest mesh the solution to

    A^{2h} δ^{2h} = r^{2h}    (3.69)

is found by recursing, applying the multigrid solver to the system. At the coarsest mesh the system is solved either by an iterative technique applied to convergence, or by using a direct method. Since this system is for the coarsest mesh the computational cost of its solution is minimal.
The test for overall convergence of the scheme is performed on the finest mesh, where the norm of the residual is calculated and compared with the user supplied solution tolerance. Once the norm reduces below the specified error bound the solver is assumed to have converged and the process is terminated.
3.2 A Comparison of the Linear Solvers

The linear solvers were compared in terms of their speed and memory usage. When comparing the speeds of the solvers several factors come into play. For direct solvers the number of operations is fixed for a given number of equations, but for iterative methods the time taken to converge to a solution depends not only on the number of equations but also on the properties of the equations themselves (such as the boundary conditions and diagonal dominance), and on the convergence criteria and tolerance chosen.
The number of equations to be solved, the layout of the data in memory, and minor implementation details such as the syntax used to perform a matrix-vector operation also have a big effect on speed. These effects are discussed in the following chapter, but it is important to note that comparisons of different codes should be made for a range of array sizes and with a consistent coding style to reduce the variability due to these factors.
In the following sections the test case used to compare the solvers is described, and then the solvers are compared on the basis of their convergence characteristics and scaling.
3.2.1 The Solver Test Case
To compare the speeds of the linear solvers they were used to solve two finite volume problems, one with Dirichlet and the other with Neumann boundary conditions, which simulate the equations encountered in a finite volume CFD code. The test cases were run for both two and three dimensional problems, and were solved to full convergence.
The test case was a finite volume discretisation of the Laplace equation applied to a unit square or cubic domain, with a sinusoidally varying source term. For the three dimensional case the equations were

del^2 T = sin(2 pi x) sin(2 pi y) sin(2 pi z) (3.70)

with Dirichlet boundary conditions, and

del^2 T = cos(2 pi x) cos(2 pi y) cos(2 pi z) (3.71)

with Neumann boundary conditions. For the Neumann boundaries a zero normal gradient was applied to all boundaries, and for the Dirichlet problem the x = 0 and x = 1 boundaries were set to T = -1 and T = 1 respectively, with all other boundaries set to T = 0. With the Neumann problem the solution is not unique, and a further condition of

T = 0 (3.72)

was imposed at the centre of the solution domain to ensure uniqueness. For the two dimensional test cases the z forcing component of the source term was dropped. The solutions to the two dimensional forms of the test functions are shown in Figure 3.21.
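A small instance of such a test system can be assembled directly (a Python sketch with a hypothetical helper `poisson_2d`, not thesis code; homogeneous Dirichlet wall values are assumed here for brevity, rather than the boundary values of the test case, and the coefficients follow the usual 5-point finite volume face-flux construction):

```python
import numpy as np

def poisson_2d(n, neumann=False):
    """5-point finite volume discretisation of del^2 T = S on the unit
    square with an n x n cell mesh; S is a product of sinusoids.
    For the Neumann case the solution is pinned to zero at the centre."""
    h = 1.0 / n
    x = (np.arange(n) + 0.5) * h                  # cell-centre coordinates
    X, Y = np.meshgrid(x, x, indexing="ij")
    S = (np.cos(2*np.pi*X) * np.cos(2*np.pi*Y) if neumann
         else np.sin(np.pi*X) * np.sin(np.pi*Y))
    N = n * n
    A = np.zeros((N, N))
    b = h * h * S.ravel()                         # cell-integrated source
    idx = lambda i, j: i * n + j
    for i in range(n):
        for j in range(n):
            k = idx(i, j)
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < n and 0 <= jj < n:   # interior face flux
                    A[k, k] -= 1.0
                    A[k, idx(ii, jj)] += 1.0
                elif not neumann:                 # Dirichlet wall at T = 0,
                    A[k, k] -= 2.0                # half-cell distance to wall
                # Neumann face: zero normal gradient, no contribution
    if neumann:
        k = idx(n // 2, n // 2)                   # pin the centre cell so the
        A[k, :] = 0.0; A[k, k] = 1.0; b[k] = 0.0  # singular system is unique
    return A, b
```

Without the pinned cell the Neumann matrix is singular, since any constant can be added to a solution.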
For each equation two runs were made. The first compared the convergence of the iterative methods at one mesh size: a 257^2 mesh in two dimensions, and a 65^3 mesh in three. The other run was made for a range of mesh sizes, with results obtained for the time to reduce the maximum residual by a factor of 10^6, the solution then being considered fully converged. The direct methods were also timed for the same range of array sizes. All runs were made from an initial field of T = 0, and the relaxation parameters for the simple iterative and incomplete LU solvers were set to 1 and 0.7 respectively.
The runs were made on a DEC Alpha 500au workstation running Digital Unix 4.0E, using Fortran 90 code compiled with the Digital Fortran compiler and using double precision storage for the floating point data. Timings were made using the C getrusage and gettimeofday functions, which provide accuracy to 1/1000 of a second, with multiple runs being made at the small array sizes to ensure an accurate resolution of the run time.
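The repeated-run strategy can be sketched as follows (a Python analogue with a hypothetical `time_solver` harness; the thesis itself calls the C getrusage and gettimeofday routines from Fortran):

```python
import time

def time_solver(solve, args, min_time=0.2):
    """Return the average run time of solve(*args), repeating the call
    until at least min_time seconds have elapsed so that very short runs
    are resolved accurately despite the finite clock granularity."""
    reps = 0
    elapsed = 0.0
    start = time.perf_counter()
    while elapsed < min_time:
        solve(*args)
        reps += 1
        elapsed = time.perf_counter() - start
    return elapsed / reps
```

Averaging over many repetitions matters most for the small array sizes, where a single solve may take far less than the clock resolution.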
One failing of the test problem (and one which the author has not had time to remedy) is the square or cubic dimensions of the solution domains, and the unit aspect ratio of the mesh cells. Iterative methods can stall when the mesh is distorted from such a regular topology[43], an analysis of the mechanisms behind the failing being given by Brandt[17]. Unfortunately, time prevented a thorough investigation of this phenomenon, although Ferziger and Peric[43] claim its effect is less pronounced on convection-diffusion problems.
3.2.2 Convergence of the Solvers

The solvers were run on the 257^2 and 65^3 meshes for both the Neumann and Dirichlet boundary condition problems, with the residual at each iteration being printed out for plotting, the abscissa being plotted as solution time. Other studies of iterative solvers have either studied only one or two classes of solver (such as Briggs[18], Yu[183]), or have made comparisons in terms of the number of iterations (Ferziger and Peric[43], Iserles[68], Barrett et al[7], and Kershaw[77]). The speed of each iteration can vary widely between different solvers, and so comparisons in terms of iterations alone are rather meaningless. One study by Botta et al[14] makes a comparison of a number of different methods in terms of computation speed, but the rate of convergence is not given, and the coding of the different solvers by different groups allows for a wide range of coding styles which may affect the relative solver speeds.
In general the iterative methods took longer to solve the problems with Neumann boundary conditions than those with Dirichlet boundaries. The system from the Neumann boundary problem is singular unless the solution is specified at some point in the solution domain. Most methods performed more efficiently if the system was left singular, with the exception of the direct Cholesky factorisation method, which would fail with the singular system.
Figures 3.22 and 3.23 show the convergence of the solvers for the 2D test cases with Dirichlet and Neumann boundary conditions respectively. Figures 3.24 and 3.25 show the convergence for the 3D Dirichlet and Neumann test runs.
The Simple Iterative Methods
The Neumann problems were slower to solve than those with Dirichlet boundary conditions. For the Dirichlet boundary conditions the convergence exhibits an initial sharp drop in the maximum residual, followed by a much longer period where the relative residual drops by a constant ratio with each iteration. For the Neumann boundary conditions this initial sharp drop was not observed; it was thought to be due to the rapid smoothing of the short wavelength structures adjacent to the boundaries in the Dirichlet problem.
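The two-phase behaviour can be reproduced on a 1D model problem (a Python sketch, assuming undamped Jacobi on the Laplace equation; the asymptotic reduction ratio is the spectral radius of the Jacobi iteration matrix, cos(pi h)):

```python
import numpy as np

def jacobi_history(n, sweeps=400):
    """Jacobi sweeps on the 1D problem u'' = 0, u(0) = u(1) = 1, from a
    zero interior field; returns the maximum residual after each sweep."""
    u = np.zeros(n + 1)
    u[0] = u[-1] = 1.0
    hist = []
    for _ in range(sweeps):
        r = u[:-2] + u[2:] - 2.0 * u[1:-1]   # residual at interior points
        hist.append(np.abs(r).max())
        u[1:-1] += 0.5 * r                   # one (undamped) Jacobi sweep
    return np.array(hist)

hist = jacobi_history(32)
# The first sweeps give a sharp drop as the short wavelength error near the
# boundaries is smoothed; thereafter the residual falls by a nearly constant
# ratio per sweep, close to the spectral radius cos(pi/32) = 0.995.
```

The long constant-ratio tail is what dominates the total solution time for the simple iterative schemes.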
Incomplete Factorisation Methods
Multigrid Schemes
Typically the SIP smoothed multigrid was fastest, being faster than the MSI smoothed version, but in many 2D cases the SSOR smoothed multigrid had a similar speed to the SIP smoothed scheme.
Figure 3.22: Convergence with time of the 2D solvers on a 257^2 mesh with Dirichlet boundary conditions. The panels plot the residual against solution time for the simple iterative, incomplete LU and multigrid solvers, and for the CG, BiCGSTAB and GMRES solvers. Note the scale of the x axis varies from graph to graph.
Krylov Space Methods
Figure 3.23: Convergence with time of the 2D solvers on a 257^2 mesh with Neumann boundary conditions. The panels plot the residual against solution time for the simple iterative, incomplete LU and multigrid solvers, and for the CG, BiCGSTAB and GMRES solvers. Note the scale of the x axis varies from graph to graph.
Preconditioning improved the smoothness of the convergence of the CG solver, with the incomplete factorisation smoothed multigrid preconditioners forcing a smooth linear reduction in the residual. The Jacobi smoothed multigrid preconditioner, however, gave a very poor performance for the Neumann boundary problem.
Figure 3.24: Convergence of the 3D solvers on a 65^3 mesh with Dirichlet boundary conditions. The panels plot the residual against solution time for the simple iterative, incomplete LU and multigrid solvers, and for the CG, BiCGSTAB and GMRES solvers.
where it gave a much slower rate of convergence.

GMRES: The GMRES solver typically displayed a monotonic convergence. The heavily preconditioned versions with multigrid or incomplete factorisation preconditioners converged at a similar rate to the CG or BiCGSTAB solvers, but the unpreconditioned solver had a very slow convergence rate, comparable to the simple iterative schemes.
Summary
Figure 3.25: Convergence of the 3D solvers on a 65^3 mesh with Neumann boundary conditions. The panels plot the residual against solution time for the simple iterative, incomplete LU and multigrid solvers, and for the CG, BiCGSTAB and GMRES solvers.
For the Dirichlet boundary problems the incomplete factorisation methods give a faster initial reduction in the residual, behaviour that was thought to be due to their rapid smoothing of the short wavelength components in the residual at the boundary.

The multigrid-SIP and Krylov-multigrid-incomplete factorisation methods give the best performance, both exhibiting a smooth monotonic reduction in the residual, whilst being faster than the other methods.
3.2.3 Scaling of the Solution Time

The solvers were run for a range of mesh sizes, with a reduction of 10^6 in the infinity norm of the residual (the maximum residual) being used as the convergence tolerance. The solvers were run for the equations resulting from the Dirichlet and Neumann boundary condition problems (see Equations (3.70) and (3.71) and Figure 3.21), with both the 2D and 3D versions of the solvers being tested. The time for solution of the test codes is plotted as a function of mesh size in Figures 3.26 to 3.29, with Figures 3.26 and 3.27 showing the solution time for the 2D problems with Dirichlet and Neumann boundary conditions, whilst Figures 3.28 and 3.29 show the solution times for the 3D problems.
Direct Solvers
These solvers run for the same length of time regardless of the boundary conditions: no pivoting was done, so there was a set number and order of operations regardless of the equations being solved. With large systems of equations the time that the dense LU and LDL solvers took to solve scaled differently from the O(N^3) one expects from an operation count, where N is the number of equations, and no explanation is available for the scaling discrepancy.
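The O(N^3) operation count for dense LU factorisation can be verified by counting the operations in the standard elimination loops (a Python sketch; `lu_flops` is an illustrative helper, not thesis code):

```python
def lu_flops(N):
    """Count the floating point operations in a naive in-place LU
    factorisation of a dense N x N matrix (no pivoting), without doing
    any arithmetic on matrix entries."""
    ops = 0
    for k in range(N - 1):
        ops += N - 1 - k             # divisions forming the multipliers
        ops += 2 * (N - 1 - k)**2    # multiply and subtract in the update
    return ops
```

Doubling N multiplies the count by a factor approaching 8, the signature of cubic scaling; the leading term of the sum is 2N^3/3.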
For two dimensional problems the banded LU and Cholesky solvers were both O(N^2) in run time. A jump in the solver run time is clearly visible for the 2D banded LU and Cholesky solvers at the point where the solver data exceeds the size of the cache of the test machine(4). The block tridiagonal solvers also exhibit O(N^2) scaling, but run 15 and 4 times faster than the banded LU and Cholesky solvers respectively.
The three dimensional solvers exhibit slightly different behaviour to those solving two dimensional problems. As might be expected, the dense LU and LDL solvers scale identically to the two dimensional versions. However the tridiagonal solvers scale like O(N^3), as does the banded LU solver.
The direct methods were rarely faster than the iterative schemes. However, for the two dimensional test problems the block tridiagonal solver was faster than the simple iterative schemes, and for large two dimensional systems (N > 20000) with Neumann boundaries it was faster than the incomplete factorisation schemes.
The Simple Iterative Methods
Incomplete Factorisation Methods
For the two dimensional problems the solution time for the incomplete factorisation solvers scales roughly linearly with N for small numbers of equations, but as the number of equations increases they start to exhibit O(N^2) behaviour. For the three dimensional problems the solvers exhibited a scaling intermediate between these two.

For small to moderate sized problems (N up to around 1000 for Neumann problems, and up to around 10000 for Dirichlet problems) the MSI solver was typically the fastest of all the methods tested.
(4) The effects of cache size are discussed in Chapter 6.
Figure 3.26: The time taken to solve a two dimensional discretisation of the Laplace equation with Dirichlet boundary conditions. The panels plot solution time against problem size for the direct, simple iterative, incomplete factorisation, multigrid and Krylov space solvers.
Figure 3.27: The time taken to solve a two dimensional discretisation of the Laplace equation with Neumann boundary conditions. The panels plot solution time against problem size for the direct, simple iterative, incomplete factorisation, multigrid and Krylov space solvers.
Figure 3.28: The time taken to solve a three dimensional discretisation of the Laplace equation with Dirichlet boundary conditions. The panels plot solution time against problem size for the direct, simple iterative, incomplete factorisation, multigrid and Krylov space solvers.
Figure 3.29: The time taken to solve a three dimensional discretisation of the Laplace equation with Neumann boundary conditions. The panels plot solution time against problem size for the direct, simple iterative, incomplete factorisation, multigrid and Krylov space solvers.
Multigrid Schemes

For the two dimensional test cases the time taken by the multigrid methods to solve the system of equations scaled roughly linearly with N for the Dirichlet boundary condition problem, and slightly worse than linearly for the problem with Neumann boundaries. With the three dimensional test case the scaling was roughly linear in N for both boundary conditions.
For small problems the multigrid schemes are typically the slowest, for N below about 100 being slower than the slowest of the direct methods. However their superior scaling means that for large problem sizes (N > 20000) the multigrid solvers are the fastest of the methods studied.
Krylov Space Methods
The Krylov space methods were typically the fastest solvers for problems of up to around 10000 equations when coupled with the MSI or SIP smoothed multigrid preconditioners. For larger problems the slightly superior scaling of the multigrid solvers ensures that they become faster.
The multigrid-preconditioned Krylov space methods scaled better than methods that used other preconditioners, due to the better scaling of multigrid schemes as a whole.

CG: For the two dimensional test case the time taken by the conjugate gradient method to solve the system of equations scaled between roughly O(N^1.5) for the unpreconditioned and simply preconditioned solvers and close to O(N) for the multigrid preconditioned solvers, with similar behaviour for the Neumann boundaries. For the three dimensional test cases the multigrid preconditioned versions of the solver again scaled better than the general versions.

BiCGSTAB: For the two dimensional tests the scaling was similar to that of CG, with the multigrid-preconditioned codes again scaling best, and likewise for the three dimensional problems.

GMRES: This solver exhibited similar scaling to its CG and BiCGSTAB brethren. Generally it was slower than those two solvers.
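The link between mesh refinement and iteration count can be illustrated with an unpreconditioned CG sketch on a 1D model problem (Python, not the thesis code; the condition number of the discrete Laplacian grows as the mesh is refined, and the iteration count grows with it, which is the origin of the worse-than-linear time scaling):

```python
import numpy as np

def cg_iterations(n, tol=1e-6):
    """Unpreconditioned CG on the 1D Poisson problem with n interior
    points and Dirichlet ends; returns the iterations needed to cut the
    residual norm by the factor 1/tol."""
    def A_mul(v):   # matrix-free 1D Laplacian with padded boundary zeros
        w = np.zeros_like(v)
        w[1:-1] = -v[:-2] + 2.0 * v[1:-1] - v[2:]
        return w
    b = np.zeros(n + 2)
    b[1:-1] = 1.0 / (n + 1)**2         # constant source, scaled by h^2
    x = np.zeros(n + 2)
    r = b - A_mul(x)
    p = r.copy()
    rr0 = rr = r @ r
    its = 0
    while rr > tol**2 * rr0:
        Ap = A_mul(p)
        alpha = rr / (p @ Ap)          # step length
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        p = r + (rr_new / rr) * p      # new conjugate search direction
        rr = rr_new
        its += 1
    return its
```

Refining the mesh by a factor of four roughly quadruples the iteration count, while each iteration also costs proportionally more.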
Summary
For small problems the MSI solver, an incomplete factorisation method, was typically the fastest method to solve the equations. For moderate sized problems (where the number of equations was of the order of 10000) the CG and BiCGSTAB Krylov space methods were fastest when coupled with either the MSI or SIP preconditioners or the multigrid preconditioners that use MSI or SIP smoothing. For large systems (where the number of equations exceeded 30000) the multigrid solvers were the fastest.
The solution times for the multigrid methods scaled roughly linearly with N for large sets of equations, unlike the other iterative schemes, for which the solution time generally scaled between roughly O(N^1.5) and O(N^2). For the preconditioned Krylov space methods the scaling was somewhere between the scaling of the Krylov method and that of its preconditioner.
The direct methods were typically the slowest of the methods tested. However it is noteworthy that for two dimensional systems with Neumann boundary conditions and more than 20000 equations the block tridiagonal solver was faster than the incomplete factorisation schemes such as MSI. The solution time for the dense LU solver scaled as noted above, whilst the block tridiagonal solver had O(N^2) and O(N^3) scaling for the 2D and 3D cases respectively.
3.2.4 Memory Usage
The memory usage quoted for each solver is the storage required in addition to that of the equations themselves (the matrix coefficients and the source vector), which are 5N and 7N words for the 2D and 3D finite volume discretisations used.
Figure 3.30: The memory usage in words of the 2D (left) and 3D (right) linear solvers, plotted against the number of equations for the LU decomposition, block tridiagonal, Jacobi, MSI, multigrid-Jacobi, multigrid-SIP, CG, BiCGSTAB and GMRES solvers.
The simple iterative schemes such as SOR used the least amount of memory, with SOR, RBSOR and SSOR not requiring any memory above the storage requirements of the equations, and Jacobi requiring only N words of memory. The simplest of the incomplete factorisation methods are also efficient in memory use, with the Incomplete Cholesky and ILU methods both requiring only 2N words of memory above the storage of the equations. However the SIP and MSI solvers require more memory, using 6N and 8N words of memory for the 2D versions, and 8N and 16N in 3D.
The Krylov space methods similarly vary in their storage requirements. The unpreconditioned CG solver is the most efficient, requiring 3N words of memory above the storage of the equations, with the unpreconditioned BiCGSTAB solver requiring 6N words of memory. The GMRES solver is comparatively greedy in its memory usage, requiring (m + 1)N + m^2 words of memory, where m is the number of search vectors stored before the solver restarts.
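Taking the counts quoted here (CG 3N, BiCGSTAB 6N, and GMRES(m) at (m + 1)N + m^2 words), a small estimator can be sketched (Python; `krylov_workspace_words` is an illustrative helper, not thesis code):

```python
def krylov_workspace_words(solver, n, m=20):
    """Workspace in words above the storage of the equations themselves,
    using the counts quoted in the text; m is the GMRES restart length."""
    if solver == "cg":
        return 3 * n
    if solver == "bicgstab":
        return 6 * n
    if solver == "gmres":
        return (m + 1) * n + m * m   # m stored search vectors plus Hessenberg
    raise ValueError(solver)
```

For large N the m^2 term is negligible, so GMRES(20) costs roughly seven times the workspace of unpreconditioned CG.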
The memory usage of the multigrid solvers depends both on the number of equations and upon the number of grid levels used. With M being the number of equations on a particular grid level, the 2D multigrid solver requires 9M, 13M and 15M words of memory for that level for the Jacobi/ILU, SIP and MSI versions of the solver respectively. For the 3D solver the requirements are 11M, 15M and 23M words. Each restriction from a fine to a coarse mesh decreases the number of equations by a factor of 2^2 for the 2D solver and 2^3 for the 3D solver, so the number of points on all meshes is given by the series (1 + 1/4 + 1/16 + ...)N for the 2D solver and (1 + 1/8 + 1/64 + ...)N for the 3D solver. Truncating both of these series at three terms gives an approximate estimate of the overall memory requirement for the solvers as 11.8N, 17.1N and 19.7N for the Jacobi/ILU, SIP and MSI versions of the 2D multigrid solver, and 12.5N, 17.1N and 26.2N for the Jacobi/ILU, SIP and MSI versions of the 3D solver.
Finally, the direct methods have a much larger memory footprint than the iterative schemes. The naive LU decomposition scheme uses N^2 words of memory, whilst the memory usage of the block tridiagonal methods depends on the relative axis lengths of the mesh upon which the finite volume equations have been discretised. For the worst case, a square or cubic mesh, the 2D block tridiagonal solver uses of the order of N^3/2 words of memory, whilst the 3D solver uses of the order of N^5/3 words. If the meshes are much shorter along one axis, however, the memory usage is reduced.
3.3 Conclusions
A numberof linearsolverssuitablefor thesolutionof elliptic PDEshave
beendescribed,andtested on theequationsarisingfrom a two
andthreedimensionalfinite volumediscretisationof theLaplace
equation.Comparisonsof thesolvershavebeenmadein termsof thespeedto
solve thesystems,and thememoryused.
For small problems (where the number of equations is less than 5000) the MSI (Modified Strongly Implicit) solver, an incomplete factorisation method, is typically the fastest method to solve the equations. For moderate sized problems (where the number of equations is of the order of 10000) the CG (Conjugate Gradient) and BiCGSTAB (Bi-Conjugate Gradient Stabilised) Krylov space methods become fastest when coupled with either the MSI or SIP (Strongly Implicit Procedure) preconditioners or the multigrid preconditioners that use MSI or SIP smoothing.
For large systems (where the number of equations exceeds 20000) the multigrid solvers become the fastest. The solution times for the multigrid methods scale roughly linearly with N for large sets of equations, unlike the other iterative schemes, for which the solution time scales between roughly O(N^1.5) and O(N^2).
The direct methods were typically the slowest methods tested. However in one case (two dimensional systems with more than 20000 equations and Neumann boundary conditions) the block tridiagonal solver was faster than the simple iterative schemes such as SOR (Successive Over-Relaxation) and the incomplete factorisation schemes such as MSI.