
Chapter 7: Linear Systems: Iterative Methods

Page 1: Chapter 7: Linear Systems: Iterative Methods

September 15, 2014

Chapter 7: Linear Systems: Iterative Methods

Uri M. Ascher and Chen Greif
Department of Computer Science
The University of British Columbia

{ascher,greif}@cs.ubc.ca

Slides for the book A First Course in Numerical Methods (published by SIAM, 2011)

http://bookstore.siam.org/cs07/

Page 2: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Goals

Goals of chapter

• To learn simple and effective iterative methods for linear systems where direct methods are ineffective;

• to analyze these methods, establishing when and where they can be applied and how effective they are;

• to understand modern algorithms, specifically preconditioned conjugate gradients;

• *to get introduced to more advanced and more general Krylov subspace and multigrid techniques which often include the methods of choice for large scale computations.

Page 3: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Outline

Outline

• Stationary iteration and relaxation methods

• Application: model Poisson problem

• Convergence of stationary methods

• Conjugate gradient method

• *Krylov subspace methods

• *Multigrid methods

*advanced


Page 4: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Motivation

Iterative methods for a linear problem

• In this chapter we consider the same problem as in Chapter 5: a linear system

Ax = b

where A is nonsingular n × n.

• Iterative method: starting from an initial guess x0, generate iterates x1, x2, . . . , xk, . . ., hopefully converging to the solution x = x∗.

• This approach is typical for nonlinear problems, see Chapters 3 and 9. Here, however, it is applied to a linear problem.

• But why not simply use LU decomposition, or

x = A\b   (“A backslash b”)

in Matlab?

• Generally, the matrix A must be somehow special to consider iterative methods!

Page 5: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Motivation

Why (or when) not to use a direct method

• If A is large and sparse, LU decomposition (Gaussian elimination) may introduce fill-in.

• Want to take advantage when only a rough approximation to x is required.

• Want to take advantage when a good x0 approximating x is known (warm start).

• Sometimes A is not explicitly available; only matrix-vector products Av for any vector v can be carried out efficiently.

Page 6: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Motivation

The famous Poisson matrix

A particularly famous example of a sparse matrix is that of the discretization of the Poisson partial differential equation. Here is an example of the sparsity pattern of a 100 × 100 Poisson matrix, using the Matlab command spy.

[Figure: spy plot of the 100 × 100 Poisson matrix; nz = 460.]

Page 7: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Stationary iteration and relaxation methods

Outline

• Stationary iteration and relaxation methods

• Application: model Poisson problem

• Convergence of stationary methods

• Conjugate gradient method

• *Krylov subspace methods

• *Multigrid methods


Page 8: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Stationary iteration and relaxation methods

Jacobi and Gauss-Seidel relaxation methods

Given A, denote by D its diagonal part and by E its lower triangular part:

D = diag(diag(A)) ;  E = tril(A)

e.g.

A =
  [  7   3    1
    −3  10    2
     1   7  −15 ]
⇒
D =
  [  7   0    0
     0  10    0
     0   0  −15 ] ,
E =
  [  7   0    0
    −3  10    0
     1   7  −15 ] .

Given iterates x0, x1, x2, . . . , xk, . . ., denote the residual vector

rk = b−Axk.

Jacobi’s method

xk+1 = xk +D−1rk

Gauss-Seidel method

xk+1 = xk + E−1rk
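
In code, both relaxations amount to one diagonal or triangular solve per sweep. A minimal Matlab/Octave sketch (ours, not the book's), assuming a system Ax = b is given and using the splitting notation above:

    D = diag(diag(A));                 % diagonal part of A
    E = tril(A);                       % lower triangular part of A (with the diagonal)
    x = zeros(size(b));                % initial guess x0 = 0
    for k = 1:20
        r = b - A*x;                   % residual r_k
        x = x + D \ r;                 % Jacobi update; use  x = x + E \ r  for Gauss-Seidel
    end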


Page 9: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Stationary iteration and relaxation methods

A stationary method

Given iterates x0,x1,x2, . . . ,xk, . . . denote residual vectors rk = b−Axk.

• Any given matrix M of the same size as A defines a splitting

A = M − (M −A).

• Such a splitting defines the stationary iterative method

Mxk+1 = (M −A)xk + b, k = 0, 1, 2, . . . .

• This iteration can be written equivalently as

xk+1 = xk +M−1rk.

• It is called stationary because M is independent of the iteration counter k.

• Both Jacobi and GS are simple examples of a stationary method.


Page 10: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Stationary iteration and relaxation methods

Example: Jacobi (simultaneous relaxation)

• Consider the linear system

7x1 + 3x2 + x3 = 3

−3x1 + 10x2 + 2x3 = 4

x1 + 7x2 − 15x3 = 2 .

• Write as

7x1 = 3− 3x2 − x3

10x2 = 4 + 3x1 − 2x3

−15x3 = 2− x1 − 7x2 .

(Corresponds to a splitting A = M −N with M = D.)

• Evaluate right hand side at current iterate k and left hand side as unknown

x1^(k+1) = (3 − 3 x2^(k) − x3^(k)) / 7

x2^(k+1) = (4 + 3 x1^(k) − 2 x3^(k)) / 10

x3^(k+1) = (2 − x1^(k) − 7 x2^(k)) / (−15) ,

for k = 0, 1, 2, . . .
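
A minimal Matlab/Octave sketch (ours) of this Jacobi iteration for the 3 × 3 example; note that the whole right-hand side is evaluated at the old iterate before x is overwritten, which is exactly the simultaneous update above:

    x = [0; 0; 0];                              % x^(0)
    for k = 1:20
        x = [(3 - 3*x(2) -   x(3)) / 7;
             (4 + 3*x(1) - 2*x(3)) / 10;
             (2 -   x(1) - 7*x(2)) / (-15)];    % all three use the previous x
    end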


Page 11: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Stationary iteration and relaxation methods

Example: Gauss-Seidel

• Write same linear system as

7x1 = 3− 3x2 − x3

10x2 − 3x1 = 4− 2x3

−15x3 + x1 + 7x2 = 2 .

(Corresponds to a splitting A = M −N with M = E.)

• Evaluate right hand side at the current iterate k and left hand side as unknown (i.e., forward substitution)

x1^(k+1) = (3 − 3 x2^(k) − x3^(k)) / 7

x2^(k+1) = (4 + 3 x1^(k+1) − 2 x3^(k)) / 10

x3^(k+1) = (2 − x1^(k+1) − 7 x2^(k+1)) / (−15) ,

for k = 0, 1, 2, . . .
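
The corresponding Gauss-Seidel sweep, sketched in Matlab/Octave (ours): updating x in place means each component immediately uses the newest values of the earlier components.

    x = [0; 0; 0];
    for k = 1:20
        x(1) = (3 - 3*x(2) -   x(3)) / 7;
        x(2) = (4 + 3*x(1) - 2*x(3)) / 10;      % uses the new x(1)
        x(3) = (2 -   x(1) - 7*x(2)) / (-15);   % uses the new x(1) and x(2)
    end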


Page 12: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Stationary iteration and relaxation methods

Properties of Jacobi and Gauss-Seidel relaxations

• Jacobi is more easily parallelized.

• Jacobi matrix M is symmetric.

• GS converges whenever Jacobi converges and often (but not always) twice as fast.

• Both Jacobi and GS converge if A is strictly (or at least irreducibly) diagonally dominant. GS is also guaranteed to converge if A is symmetric positive definite.

• Both methods are simple but converge slowly (if they converge at all). They are used as building blocks for faster, more complex methods.

• Both Jacobi and GS are simple examples of a stationary method.

Page 13: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Stationary iteration and relaxation methods

Over-relaxation and under-relaxation

• There are more sophisticated stationary methods than Jacobi and GS. The methods introduced below are based on a simple modification of the ancient ones.

• Let xk+1 be obtained from xk by either Jacobi or GS. Modify it further by

xk+1 ← ω xk+1 + (1 − ω) xk

where ω is a parameter.

• Two useful variants:

  • Based on Gauss-Seidel (GS) and 1 < ω < 2, obtain faster successive over-relaxation (SOR):

    xk+1 = xk + ω [(1 − ω)D + ωE]^{-1} rk.

  • Based on Jacobi and ω ≈ 0.8, obtain slower under-relaxation, which is a good smoother in some applications:

xk+1 = xk + ωD−1rk.
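
A short Matlab/Octave sketch (ours) of both variants inside a relaxation loop, assuming A and b are given and D, E are the splitting matrices defined earlier:

    D = diag(diag(A));  E = tril(A);
    omega = 1.5;                                          % 1 < omega < 2 for SOR
    x = zeros(size(b));
    for k = 1:20
        r = b - A*x;
        x = x + omega * (((1 - omega)*D + omega*E) \ r);  % SOR step
        % damped (under-relaxed) Jacobi would instead use:  x = x + 0.8*(D \ r);
    end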


Page 14: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods The Poisson model problem

Outline

• Stationary iteration and relaxation methods

• Application: model Poisson problem

• Convergence of stationary methods

• Conjugate gradient method

• *Krylov subspace methods

• *Multigrid methods


Page 15: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods The Poisson model problem

Poisson problem

• There are many practical problems involving large, sparse matrices (not our 3 × 3 example!) where simple stationary methods are relevant. We now develop one such prototype example.

• The Poisson equation is a partial differential equation that in its simplest form is defined on the open unit square, 0 < x, y < 1, and reads

−(∂²u/∂x² + ∂²u/∂y²) = g(x, y).

Here u(x, y) is the unknown function sought and g(x, y) is a given source.

• Boundary conditions: assume that the sought function u(x, y) satisfies homogeneous Dirichlet boundary conditions along the entire boundary of the unit square, written as

u(x, 0) = u(x, 1) = u(0, y) = u(1, y) = 0.

Page 16: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods The Poisson model problem

The need to discretize

[Figure: a 2D cross-section of a 3D domain with a square grid added — (a) an earthly domain with air (σ = 0) above ground (σ > 0), separated by the air-earth interface Γ; (b) the same domain Ω1 with a discretization grid of width h.]

Page 17: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods The Poisson model problem

Finite Difference Discretization

• Discretizing using centred differences for the partial derivatives we obtain the equations

4ui,j − ui+1,j − ui−1,j − ui,j+1 − ui,j−1 = bi,j , 1 ≤ i, j ≤ N,

ui,j = 0 otherwise.

• In this difference scheme, ui,j is the value at the (i, j)th node of a square planar grid, and bi,j = h²g(ih, jh) are given values at the same grid locations. Here h = 1/(N + 1) is the grid width.

• See the grid on the next slide, and note in particular the location of a ui,j and those of its neighbours which appear in the above formula (distinguished by red dots).

• For ui,j to approximate the differential equation solution u(ih, jh) well, we need to set h “sufficiently small”. Hence, N can easily become large. For example, h = 0.01 gives N = 99.

Page 18: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods The Poisson model problem

[Figure: the N × N grid of unknowns u1,1, . . . , uN,N, with a typical interior node ui,j and its four neighbours highlighted.]

Page 19: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods The Poisson model problem

A linear system

• Obviously, these are linear relations, so we can express them as a system of linear equations

Au = b,

where u consists of the n = N² unknowns ui,j somehow organized as a vector, and b is composed likewise from the values bi,j.

• The two-dimensional problem has features related to sparsity that are not seen in the one-dimensional problem; we will return to this point soon.

• Note that n can easily become very large. Even for a simple 2D case with N = 100, the matrix dimension is n = 10,000. In 3D and the same h we would get n = 10^6, so A cannot even be stored as a full matrix. This, for the simplest problem of its type!

Page 20: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods The Poisson model problem

Ordering the unknowns

How should we order the grid unknowns {ui,j}, 1 ≤ i, j ≤ N, into a vector u? We can do this in many ways. A simple (and rational) way is lexicographically, say by columns, which yields

u = (u1,1, u2,1, . . . , uN,1, u1,2, u2,2, . . . , uN,2, u1,3, . . . , uN,N)^T ,

b = (b1,1, b2,1, . . . , bN,1, b1,2, b2,2, . . . , bN,2, b1,3, . . . , bN,N)^T .

Page 21: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods The Poisson model problem

Block tridiagonal matrix

The n × n matrix A (with n = N²) has the form

A =
  [  J  −I
    −I   J  −I
          ·    ·    ·
             −I   J  −I
                  −I   J  ] ,

where J is the tridiagonal N × N matrix

J =
  [  4  −1
    −1   4  −1
          ·    ·    ·
             −1   4  −1
                  −1   4  ] ,

and I denotes the identity matrix of size N.
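
A minimal Matlab/Octave sketch (ours, not the book's) that assembles this block tridiagonal matrix with Kronecker products and checks its sparsity:

    N = 10;  e = ones(N, 1);  I = speye(N);
    J = spdiags([-e 4*e -e], -1:1, N, N);     % the tridiagonal block J
    S = spdiags([e e], [-1 1], N, N);         % 1's marking the block sub/super diagonals
    A = kron(I, J) - kron(S, I);              % n-by-n Poisson matrix, n = N^2
    spy(A), nnz(A)                            % matches the spy plot shown earlier: 460 nonzeros for N = 10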


Page 22: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods The Poisson model problem

Example

For instance, if N = 3 then

A =
  [  4 −1  0 −1  0  0  0  0  0
    −1  4 −1  0 −1  0  0  0  0
     0 −1  4  0  0 −1  0  0  0
    −1  0  0  4 −1  0 −1  0  0
     0 −1  0 −1  4 −1  0 −1  0
     0  0 −1  0 −1  4  0  0 −1
     0  0  0 −1  0  0  4 −1  0
     0  0  0  0 −1  0 −1  4 −1
     0  0  0  0  0 −1  0 −1  4 ] .

Note the zero diagonals within the band! In general there are N − 2 such diagonals, because neighbouring unknowns ui,j , ui±1,j±1 are no longer consecutive in the vector u.

Page 23: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods The Poisson model problem

Sparsity pattern

Recall the figure of the sparsity structure of the 100 × 100 (i.e., N = 10) “Poisson matrix”:

[Figure: spy plot of the 100 × 100 Poisson matrix; nz = 460.]

Page 24: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods The Poisson model problem

Sparsity issues

• There is a fundamental difference between the 1D and the 2D & 3D problems.

• The 1D problem (i.e., only x, no y variable) produces a tridiagonal matrix, for which Gaussian elimination (GE) requires a linear, i.e., O(n), number of floating point operations (optimal order of complexity). No iterative methods are needed here.

• Higher-dimensional problems give rise to matrices that are sparse within the band, and the complexity of GE is certainly not linear (in 2D it is O(n²)). This is in fact where iterative methods can be very effective.

• As a rule of thumb, for this problem in 2D special direct methods are still competitive, but in 3D they no longer are.

Page 25: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods The Poisson model problem

Eigenvalues of the Poisson matrix

• For any size N , the matrix A is diagonally dominant and nonsingular.

• It can be verified directly that the n = N² eigenvalues of A are given by

λl,m = 4 − 2 (cos(lπh) + cos(mπh)) , 1 ≤ l,m ≤ N

(recall (N + 1)h = 1).

• Thus λl,m > 0 for all 1 ≤ l,m ≤ N, so the matrix A is symmetric positive definite.

• It also follows that the condition number satisfies κ(A) = λN,N / λ1,1 = O(n).

• Knowing the eigenvalues explicitly is helpful in understanding performance, convergence and accuracy issues related to iterative solvers. Such specific knowledge is not typically available for more complex problems of this type, yet the conclusions drawn for the model Poisson problem are often indicative of what happens more generally.

Page 26: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods The Poisson model problem

Performance of our relaxation methods for Poisson

• Recall that A is composed of relations of the form

4ui,j − ui+1,j − ui−1,j − ui,j+1 − ui,j−1 = bi,j , 1 ≤ i, j ≤ N.

• Can apply the relaxation methods directly, without ever forming A (see the sketch below), but bear in mind vectorization issues.

• On a 15 × 15 grid: N = 15, n = 225, obtain

  relaxation method   ω      Error after 2 itns   Error after 20 itns
  Jacobi              1      7.1e-2               5.4e-2
  GS                  1      6.9e-2               3.8e-2
  SOR                 1.69   5.6e-2               4.8e-4

• So, Jacobi and GS are both very slow to converge; SOR with ω = 1.69 is significantly faster.

• For larger N these methods require even more iterations, as we shall soon see.
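
A hedged Matlab/Octave sketch (ours, not the book's code) of one vectorized Jacobi sweep applied directly on the N × N grid, never forming A; the array names U and B are our own.

    N = 15;  h = 1/(N+1);
    B = h^2 * ones(N);                 % b_{i,j} = h^2 g(ih,jh); here the source g == 1
    U = zeros(N);                      % initial guess; zero boundary values are implied
    for k = 1:20
        Up = zeros(N+2);  Up(2:N+1, 2:N+1) = U;          % pad with the zero boundary
        U  = (B + Up(1:N, 2:N+1) + Up(3:N+2, 2:N+1) ...
                + Up(2:N+1, 1:N) + Up(2:N+1, 3:N+2)) / 4;  % average of b and the 4 neighbours
    end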


Page 27: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Convergence of general stationary methods

Outline

• Stationary iteration and relaxation methods

• Application: model Poisson problem

• Convergence of stationary methods

• Conjugate gradient method

• *Krylov subspace methods

• *Multigrid methods


Page 28: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Convergence of general stationary methods

Convergence of a general stationary method

• Recall our general iterative method

xk+1 = xk +M−1rk = xk +M−1(b−Axk).

• Write this as

xk+1 = Txk +M−1b, T = I −M−1A.

The matrix T is called the iteration matrix.

• For the exact solution x∗ = x, we similarly have

x = x + M^{-1}(b − Ax) = Tx + M^{-1}b.

• Define the error in the kth iteration

ek = x − xk.

• Then, subtracting the expression for xk+1 from the identity for x,

ek+1 = T ek = T(T ek−1) = · · · = T^{k+1} e0.

• Thus, convergence iff T^k → 0.

Page 29: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Convergence of general stationary methods

Convergence of stationary methods

• Recall (Chapter 4) that the spectral radius of a square matrix B with eigenvalues µ1, . . . , µn is the maximum eigenvalue magnitude

ρ(B) = maxi|µi|.

• Our stationary method with iteration matrix T converges if and only if

ρ(T ) < 1.

• How fast does the iteration converge? Define rate of convergence

rate = − log10 ρ(T ).

• Then the number of iterations required to reduce the error by a factor of 10 (i.e., gain a decimal digit) is

k ≈ 1 / rate.
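
A small Matlab/Octave illustration (ours) of these definitions, using the 3 × 3 example and the Jacobi splitting M = D:

    A = [7 3 1; -3 10 2; 1 7 -15];
    M = diag(diag(A));
    T = eye(3) - M \ A;                 % iteration matrix
    rho  = max(abs(eig(T)))             % spectral radius; < 1 means convergence
    rate = -log10(rho)                  % decimal digits gained per iteration
    iters_per_digit = ceil(1/rate)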


Page 30: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Convergence of general stationary methods

Norm vs spectral radius

• You could ask, why do we need this spectral radius stuff? Indeed we could simply write instead that, since ek+1 = T ek,

∥ek+1∥ ≤ ∥T∥ ∥ek∥ ≤ · · · ≤ ∥T∥^{k+1} ∥e0∥,

hence convergence follows if ∥T∥ < 1 in any induced matrix norm.

• Unfortunately, the norm condition is indeed sufficient for convergence, but not necessary.

• Example: for the iteration (or amplification) matrix

T =
  [ 0  1/4
    1   0  ]

we have that

T (1, 0)^T = (0, 1)^T ,

hence in any of our usual norms, ∥T∥ ≥ 1. However, the eigenvalues are ±1/2, hence ρ(T) = 1/2. So, not only is there convergence, it is pretty fast, with rate > 1/4.

Page 31: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Convergence of general stationary methods

Convergence of Jacobi for model Poisson problem

• Recall matrix is defined by

4ui,j − ui+1,j − ui−1,j − ui,j+1 − ui,j−1 = bi,j , 1 ≤ i, j ≤ N.

Eigenvalues

λl,m = 4 − 2 ( cos(lπ/(N + 1)) + cos(mπ/(N + 1)) ), 1 ≤ l,m ≤ N.

• Consider Jacobi: M = D. Then

T = I − D^{-1}A = I − (1/4)A.

• So the eigenvalues of the iteration matrix are

µl,m = 1 − (1/4)λl,m = (1/2) ( cos(lπ/(N + 1)) + cos(mπ/(N + 1)) ), 1 ≤ l,m ≤ N.

Page 32: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Convergence of general stationary methods

Convergence of Jacobi cont.

• Eigenvalues of T are

µl,m = 1 − (1/4)λl,m = (1/2) ( cos(lπ/(N + 1)) + cos(mπ/(N + 1)) ), 1 ≤ l,m ≤ N.

• Spectral radius

ρ(T) = ρJ(T) = µ1,1 = cos(π/(N + 1)) ≤ 1 − c/n.

• Rate of convergence

rate = − log ρ(T) ∼ 1/n.

Thus, O(n) iterations are required for error reduction by a constant factor. (Asymptotically the same cost as GE.)
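
A small Matlab/Octave check (ours) of this spectral radius formula, reusing the block construction of the Poisson matrix sketched earlier:

    N = 10;  h = 1/(N+1);  e = ones(N,1);  I = speye(N);
    J = spdiags([-e 4*e -e], -1:1, N, N);
    S = spdiags([e e], [-1 1], N, N);
    A = kron(I, J) - kron(S, I);            % n-by-n Poisson matrix, n = N^2
    T = speye(N^2) - A/4;                   % Jacobi iteration matrix (here D = 4I)
    rho_computed  = max(abs(eig(full(T))))
    rho_predicted = cos(pi*h)               % = cos(pi/(N+1))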


Page 33: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Convergence of general stationary methods

Optimal parameter for SOR

1. Denote the optimal parameter (i.e., that which reduces the error most rapidly) by ωopt.

2. For a class of matrices including the model Poisson problem, the optimal parameter is given by

ωopt = 2 / (1 + √(1 − ρJ²)) > 1,

where ρJ is the spectral radius of the Jacobi iteration matrix.

3. For the model Poisson problem we therefore obtain

ωopt = 2 / (1 + sin(π/(N + 1))).

4. For this class of matrices, the spectral radius of the SOR matrix for ωopt is ωopt − 1 = 1 − c/N.
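
A one-line Matlab/Octave check (ours) of these formulas for the model problem:

    N = 15;
    rhoJ      = cos(pi/(N+1));                  % Jacobi spectral radius
    omega_opt = 2 / (1 + sqrt(1 - rhoJ^2))      % equals 2/(1 + sin(pi/(N+1)))
    rho_SOR   = omega_opt - 1                   % spectral radius of SOR with omega_opt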


Page 34: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Convergence of general stationary methods

Convergence of stationary methods for Poisson

For N = 15, n = 225:

[Figure: residual norm vs. iteration count (semilog scale) for Jacobi, Gauss-Seidel, and SOR.]

Page 35: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Convergence of general stationary methods

Relaxation methods for the model Poisson problem

• Jacobi requires O(n) iterations hence O(n²) flops.

• Gauss-Seidel is twice as fast, so the same order.

• For this problem and others, can rearrange unknowns in red-black order, where all neighbours of a “red” unknown are “black” and vice versa. Then a GS sweep is the same as two Jacobi half-sweeps: gain parallelism.

• SOR with optimal ω requires O(N) iterations hence O(n^{3/2}) flops!

• Lower bound: the best possible method would require at least O(n) flops.

• So there seems to be room for improvement: are there better methods?

Page 36: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Gradient descent and conjugate gradient methods

Outline

• Stationary iteration and relaxation methods

• Application: model Poisson problem

• Convergence of stationary methods

• Conjugate gradient method

• *Krylov subspace methods

• *Multigrid methods


Page 37: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Gradient descent and conjugate gradient methods

Nonstationary methods

• None of the stationary methods we have seen is very fast even for the model Poisson problem when n is really large.

• Another, general disadvantage of all these relaxation methods is that they require an explicitly given matrix A: what if A is only given implicitly, through matrix-vector products?

• Try to take advantage of accumulating knowledge as the iteration proceeds.

• Consider M = Mk, and more generally

xk+1 = xk + αkpk.

The scalar αk > 0 is a step size; pk is a search direction.

• Restrict consideration to cases where A is symmetric positive definite.

• Then solving Ax = b is equivalent to

min_x ϕ(x) = (1/2) x^T A x − b^T x.

Page 38: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Gradient descent and conjugate gradient methods

Gradient descent and steepest descent

• Gradient descent: take pk = rk, i.e., Mk = αk^{-1} I.

• How to choose the step size?

• Simple-minded, greedy approach: exact line search,

min_α ϕ(xk + α rk) = (1/2)(xk + α rk)^T A (xk + α rk) − b^T (xk + α rk).

• Critical point: differentiate with respect to α and set to 0. Obtain steepest descent (a poor, but popular, choice of name) with

αk = (rk^T rk) / (rk^T A rk) = ⟨rk, rk⟩ / ⟨rk, A rk⟩.

Page 39: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Gradient descent and conjugate gradient methods

Steepest descent algorithm

Given an initial guess x0 and a tolerance tol, set at first r0 = b − Ax0, δ0 = ⟨r0, r0⟩, bδ = ⟨b, b⟩ and k = 0. Then:
While δk > tol² bδ,

    sk = A rk
    αk = δk / ⟨rk, sk⟩
    xk+1 = xk + αk rk
    rk+1 = rk − αk sk
    δk+1 = ⟨rk+1, rk+1⟩
    k = k + 1.

Note the organization so that only one matrix-vector multiplication per iteration is required.
Properties of this algorithm will be summarized after we introduce the better, conjugate gradient algorithm.
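
A Matlab/Octave sketch (ours; the function name is our own) of the steepest descent algorithm above:

    function x = steepest_descent(A, b, x, tol)
      r = b - A*x;  delta = r'*r;  bdelta = b'*b;
      while delta > tol^2 * bdelta
          s = A*r;                    % the single matrix-vector product per iteration
          alpha = delta / (r'*s);
          x = x + alpha*r;
          r = r - alpha*s;
          delta = r'*r;
      end
    end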


Page 40: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Gradient descent and conjugate gradient methods

Conjugate gradient (CG) algorithm

Here the search direction is no longer the residual but rather a judicious linear combination of the residual with the previous search direction.

Given an initial guess x0 and a tolerance tol, set at first r0 = b − Ax0, δ0 = ⟨r0, r0⟩, bδ = ⟨b, b⟩, k = 0 and p0 = r0. Then:
While δk > tol² bδ,

    sk = A pk
    αk = δk / ⟨pk, sk⟩
    xk+1 = xk + αk pk
    rk+1 = rk − αk sk
    δk+1 = ⟨rk+1, rk+1⟩
    pk+1 = rk+1 + (δk+1 / δk) pk
    k = k + 1.
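
A Matlab/Octave sketch (ours; the function name is our own) of the CG algorithm above, e.g. called as x = conjugate_gradient(A, b, zeros(size(b)), 1e-6):

    function x = conjugate_gradient(A, b, x, tol)
      r = b - A*x;  p = r;
      delta = r'*r;  bdelta = b'*b;
      while delta > tol^2 * bdelta
          s = A*p;                      % one matrix-vector product per iteration
          alpha = delta / (p'*s);
          x = x + alpha*p;
          r = r - alpha*s;
          delta_new = r'*r;
          p = r + (delta_new/delta)*p;  % new search direction
          delta = delta_new;
      end
    end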


Page 41: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Gradient descent and conjugate gradient methods

Convergence of methods for Poisson

For N = 31, n = 961:

[Figure: residual norm vs. iterations (semilog scale) — left panel: Jacobi, Gauss-Seidel and SOR; right panel: SOR and CG.]

Page 42: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Gradient descent and conjugate gradient methods

CG algorithm notes

• Although the CG algorithm may look unfriendly at first, it is actually short and straightforward to program.

• It is arranged to look like a one-step method, where each step (from k to k + 1) involves just one matrix-vector multiplication. The rest are vector and scalar operations.

• Assuming that the cost of the matrix-vector product dominates the other operations, the total iteration cost is comparable to that of Jacobi.

• Furthermore, here the matrix A is not required explicitly, and we need not know any of its elements.

Page 43: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Gradient descent and conjugate gradient methods

Krylov subspace

• Easy to see, for both methods, that

rk = pk(A)r0,

where pk is a polynomial of degree k satisfying pk(0) = 1.

• Also

ek = pk(A)e0, xk − x0 = qk−1(A)r0.

• Define Krylov Subspace of nonsingular C with respect to y by

Kk(C; y) = span{y, Cy, C²y, . . . , C^{k−1}y}.

• Thus,

rk ∈ Kk+1(A; r0), and

xk − x0 ∈ Kk(A; r0).

Page 44: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Gradient descent and conjugate gradient methods

Energy norm

• For B a symmetric positive definite matrix,

∥z∥B = √(z^T B z) ≡ √⟨z, Bz⟩.

• Then

∥rk∥A−1 = ∥ek∥A.

• Can interpret CG as minimizing in the kth iteration the energy norm ∥ek∥A over the space x0 + Kk(A; r0).

• Hence, in exact arithmetic, the exact solution is obtained after n iterations.

Page 45: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Gradient descent and conjugate gradient methods

CG properties

• Subspace minimization (and uninteresting exact termination)

• Search directions are A-conjugate:

⟨pl, Apj⟩ = 0, l ≠ j

• Residual directions are orthogonal:

⟨rl, rj⟩ = 0, l ≠ j

• Key convergence rate estimate:

∥ek∥A ≤ 2 ( (√κ(A) − 1) / (√κ(A) + 1) )^k ∥e0∥A,

hence for large κ, the number of iterations k needed to reduce the initial error by a factor c satisfies

k ≤ 0.5 √κ(A) ln(2/c) + 1.

Page 46: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Gradient descent and conjugate gradient methods

Steepest descent (SD) vs Conjugate gradients (CG)

• Gradient descent uses residual rk as search direction; CG uses

pk = rk + (∥rk∥² / ∥rk−1∥²) pk−1.

• SD is simpler, greedy, one-step, easier to show convergence. CG is two-step (requires initialization), can be more fragile.

• SD requires O(κ(A)) iterations whereas CG requires O(√κ(A)) iterations!

• For both methods

rk ∈ Kk+1(A; r0), xk − x0 ∈ Kk(A; r0).

• For CG also

xk = argmin_{y∈S} ϕ(y) = argmin_{z∈S} ∥x − z∥A,

where S = x0 + Kk(A; r0). Thus, convergence (in exact arithmetic) in at most n iterations.

Page 47: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Gradient descent and conjugate gradient methods

CG vs SOR

• When both methods may be applied optimally (i.e., A is symmetric positive definite for CG, and we know ωopt for SOR) these methods require asymptotically a similar number of iterations.

• However, the SOR parameter is often unknown and hard to approximate well, which is a major drawback of SOR.

• CG uses only matrix-vector multiplications: more applicable.

• Moreover, there are ways to speed up CG by preconditioning, which are not available to SOR.

• Conclusion: continue to improve (and approve) CG.

Page 48: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Gradient descent and conjugate gradient methods

Preconditioning

• Often, even O(√κ(A)) iterations are too many!

(In the Poisson example, √κ(A) ∼ N.)

• Consider solving instead

(P^{-1/2} A P^{-1/2}) (P^{1/2} x) = P^{-1/2} b

with a preconditioner P such that P^{-1/2} A P^{-1/2} is better conditioned.

• Fortunately, can simply consider

P^{-1}Ax = P^{-1}b,

and we want B = P^{-1}A to be better conditioned than A.

• So, the search is on for a preconditioner P that is both close enough to A to get a good condition number for B and far enough from A to be easily invertible.

Page 49: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Gradient descent and conjugate gradient methods

Preconditioned conjugate gradients (PCG)

Given an initial guess x0 and a tolerance tol, set at first r0 = b − Ax0, h0 = P^{-1}r0, δ0 = ⟨r0, h0⟩, bδ = ⟨b, P^{-1}b⟩, k = 0 and p0 = h0. Then:
While δk > tol² bδ,

    sk = A pk
    αk = δk / ⟨pk, sk⟩
    xk+1 = xk + αk pk
    rk+1 = rk − αk sk
    hk+1 = P^{-1} rk+1
    δk+1 = ⟨rk+1, hk+1⟩
    pk+1 = hk+1 + (δk+1 / δk) pk
    k = k + 1.
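
A Matlab/Octave sketch (ours; function and argument names are our own) of this PCG algorithm, where Psolve is a function handle that applies P^{-1} to a vector (e.g. Psolve = @(v) L' \ (L \ v) for an incomplete Cholesky factor L):

    function x = precond_cg(A, b, x, tol, Psolve)
      r = b - A*x;  h = Psolve(r);  p = h;
      delta = r'*h;  bdelta = b' * Psolve(b);
      while delta > tol^2 * bdelta
          s = A*p;
          alpha = delta / (p'*s);
          x = x + alpha*p;
          r = r - alpha*s;
          h = Psolve(r);                % one preconditioner solve per iteration
          delta_new = r'*h;
          p = h + (delta_new/delta)*p;
          delta = delta_new;
      end
    end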


Page 50: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Gradient descent and conjugate gradient methods

Choices for preconditioner

• Symmetric SOR (SSOR)

• Incomplete Cholesky (IC) P = F F^T; more generally incomplete LU (ILU)
  • IC(0) – avoid fill-in altogether
  • IC(tol) – carry out an elimination step only if the result is above the drop tolerance tol

• Often a good preconditioner is unknown in practice!
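
A hedged usage sketch (ours) of Matlab's built-in ichol and pcg routines for the two IC variants named above, assuming A is a sparse symmetric positive definite matrix and b a matching right-hand side; the tolerance and iteration limit shown are just plausible choices:

    L = ichol(A);                                   % IC(0): no fill-in
    x = pcg(A, b, 1e-6, 200, L, L');
    opts.type = 'ict';  opts.droptol = 1e-2;        % IC with a drop tolerance
    L = ichol(A, opts);
    x = pcg(A, b, 1e-6, 200, L, L');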


Page 51: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Gradient descent and conjugate gradient methods

Convergence of methods for Poisson

For N = 31, n = 961:

[Figure: residual norm vs. iterations (semilog scale) for CG, PCG/IC(0), and PCG/ICT(.01).]

Page 52: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods *Krylov subspace methods

Outline

• Stationary iteration and relaxation methods

• Application: model Poisson problem

• Convergence of stationary methods

• Conjugate gradient method

• *Krylov subspace methods

• *Multigrid methods


Page 53: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods *Krylov subspace methods

Iterative methods for general linear systems

• A major limitation of CG is the requirement that A be symmetric positive definite.

• Given a more general (nonsingular) matrix, naively we could solve the normal equations problem A^T A x = A^T b, where now A^T A is symmetric positive definite. However, the condition number is squared, and hence convergence is much slower!

• Opt instead for other Krylov subspace methods.

• Consider also a favourite family of multigrid methods, which exploit multi-scale structure in the given problem.

Page 54: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods *Krylov subspace methods

General nonsingular matrix

• Assume A nonsingular, but not necessarily symmetric positive definite.

• Extend CG by searching in similar Krylov subspaces: xk − x0 ∈ Kk(A; r0)

Kk(A; r0) = span{r0, Ar0, A²r0, . . . , A^{k−1}r0}.

• Building blocks:
  1. Construct an orthogonal basis for the Krylov subspace.
  2. Define an optimality property.
  3. Use an effective preconditioner.

Page 55: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods *Krylov subspace methods

Orthogonal basis for the subspace: Arnoldi

• The obvious basis (powers A^j r0) is poorly conditioned.

• So orthonormalize these vectors: Arnoldi algorithm.

q1 = r0 / ∥r0∥
for j = 1 to k
    z = A qj
    for i = 1 to j
        hi,j = ⟨qi, z⟩
        z = z − hi,j qi
    end
    hj+1,j = ∥z∥
    if hj+1,j = 0, quit
    qj+1 = z / hj+1,j
end
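
A Matlab/Octave sketch (ours; the function name is our own) of the Arnoldi process above, returning the orthonormal basis vectors as columns of Q and the (k+1)-by-k upper Hessenberg matrix H:

    function [Q, H] = arnoldi(A, r0, k)
      n = length(r0);
      Q = zeros(n, k+1);  H = zeros(k+1, k);
      Q(:,1) = r0 / norm(r0);
      for j = 1:k
          z = A * Q(:,j);
          for i = 1:j
              H(i,j) = Q(:,i)' * z;     % orthogonalize against earlier basis vectors
              z = z - H(i,j) * Q(:,i);
          end
          H(j+1,j) = norm(z);
          if H(j+1,j) == 0, break, end  % lucky breakdown: invariant subspace found
          Q(:,j+1) = z / H(j+1,j);
      end
    end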


Page 56: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods *Krylov subspace methods

Orthogonal basis in symmetric case: Lanczos

• If A is symmetric, the upper Hessenberg H is tridiagonal.

• Obtain three-term recurrence: Lanczos algorithm.

z = A qj
γj = ⟨qj, z⟩
z = z − βj−1 qj−1 − γj qj
βj = ∥z∥
qj+1 = z / βj .

Start with q1 = r0/∥r0∥, β0 = 0.

Page 57: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods *Krylov subspace methods

GMRES and MINRES

Minimize residual norm.

• Main components of the GMRES iteration:
  1. perform a step of the Arnoldi process;
  2. update the QR factorization of the updated upper Hessenberg matrix;
  3. solve the resulting least squares problem.

• GMRES(m): limited memory GMRES – restart after m iterations

• MINRES: for the symmetric case: no memory problem.

Page 58: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods *Krylov subspace methods

Preconditioning

• For a general matrix A, a preconditioner is often more crucial than P in PCG for the s.p.d. case.

• Incomplete LU (ILU)

• Incomplete LU with drop tolerance (ILUT)

• Example: convection-diffusion equation

−(∂²u/∂x² + ∂²u/∂y²) + σ ∂u/∂x + τ ∂u/∂y = g(x, y).

[Figure: relative residual vs. iteration for non-preconditioned GMRES and ILUT-preconditioned GMRES.]
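
A hedged usage sketch (ours) of Matlab's built-in ilu and gmres routines for an experiment of this kind, assuming A and b come from a discretization such as the convection-diffusion problem above (not constructed here); the restart length, tolerances and drop tolerance are just plausible choices:

    setup.type = 'ilutp';  setup.droptol = 1e-2;    % ILUT-style incomplete LU
    [L, U] = ilu(A, setup);
    x = gmres(A, b, 20, 1e-8, 100, L, U);           % restarted GMRES(20), ILU-preconditioned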

Page 59: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods *Multigrid method

Outline

• Stationary iteration and relaxation methods

• Application: model Poisson problem

• Convergence of stationary methods

• Conjugate gradient method

• *Krylov subspace methods

• *Multigrid methods


Page 60: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods *Multigrid method

Multigrid method

• Consider iteration for model problem: Poisson equation on the unit square.

• Highest frequencies of residual or error correspond to largest eigenvalues, most oscillatory eigenvectors. These are “not seen” on a coarser grid.

• Simple relaxations such as damped Jacobi (ω = .8) or Gauss-Seidel are slow methods but fast smoothers (reduce high frequency components of residual/error).

• The smoothed residual can be transferred to a coarser grid, where the procedure can be repeated, more economically.

Page 61: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods *Multigrid method

Smoothing

Illustration of the smoothing effect using damped Jacobi with ω = 0.8 on a “Poisson in 1D” problem:

[Figure: the residual (top) and the smoothed residual (bottom) plotted against x on [0, 1].]

Page 62: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods *Multigrid method

Smoothing for model Poisson

• Recall eigenvalues

λl,m = 4− 2 (cos(lπh) + cos(mπh)) , 1 ≤ l,m ≤ N

• So, if we move from a grid (mesh) with N ≈ 1/h to a grid with N/2 ≈ 1/H, H = 2h, then the largest eigenvalues, and their corresponding most oscillatory eigenvectors, are no longer “seen”.

• On the finer, h-grid, we need to take care only of error/residual components which correspond, at least in one direction, to the large eigenvalues, i.e., where either l > N/2 or m > N/2 or both.

Page 63: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods *Multigrid method

Multigrid method cycle

function x = multigrid(A, b, x, level, ν1, ν2, γ)

  if level = coarsest, solve exactly Ax = b, return

  for j = 1 : ν1, x = relax(A, b, x), end                % pre-smoothing (e.g. damped Jacobi or GS)
  r = b − Ax                                             % fine-grid residual
  [Ac, rc] = restrict(A, r)                              % transfer residual (and operator) to the coarser grid
  vc = 0
  for l = 1 : γ                                          % γ = 1 gives a V-cycle, γ = 2 a W-cycle
      vc = multigrid(Ac, rc, vc, level − 1, ν1, ν2, γ)   % approximately solve the coarse-grid correction equation
  end
  x = x + prolongate(vc)                                 % interpolate the correction back to the fine grid
  for j = 1 : ν2, x = relax(A, b, x), end                % post-smoothing

Page 64: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods *Multigrid method

Multigrid performance

Can be used as a standalone solver or as a preconditioner for a Krylov subspace method. The latter is more robust, and as such is generally preferred.
The figure below is for the model Poisson problem with n = 255².

[Figure: relative residual norm vs. iterations for CG, PCG/ICT(.01), PCG/MG, and MG.]

Page 65: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods *Multigrid method

Multigrid performance

Still on the model Poisson problem.
Number of iterations and CPU times for achieving convergence to 10^{-6}, N = 2^l − 1, l = 5, 6, 7, 8, 9:

[Figure: iteration counts (left) and CPU seconds (right) vs. N for CG, PCG/ICT(.01), PCG/MG, and MG.]

