
Chapter 7: Linear Systems: Iterative Methods

Page 1: Chapter 7: Linear Systems: Iterative Methods

September 15, 2014

Chapter 7: Linear Systems: Iterative Methods

Uri M. Ascher and Chen Greif
Department of Computer Science
The University of British Columbia

{ascher,greif}@cs.ubc.ca

Slides for the book A First Course in Numerical Methods (published by SIAM, 2011)

http://bookstore.siam.org/cs07/

Page 2: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Goals

Goals of chapter

• To learn simple and effective iterative methods for linear systems where direct methods are ineffective;

• to analyze these methods, establishing when and where they can be applied and how effective they are;

• to understand modern algorithms, specifically preconditioned conjugate gradients;

• *to get introduced to more advanced and more general Krylov subspace and multigrid techniques which often include the methods of choice for large scale computations.

Page 3: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Outline

Outline

• Stationary iteration and relaxation methods

• Application: model Poisson problem

• Convergence of stationary methods

• Conjugate gradient method

• *Krylov subspace methods

• *Multigrid methods

*advanced


Page 4: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Motivation

Iterative methods for a linear problem

• In this chapter we consider the same problem as in Chapter 5: a linear system

Ax = b

where A is nonsingular n × n.

• Iterative method: starting from an initial guess x0, generate iterates x1, x2, . . . , xk, . . ., hopefully converging to the solution x = x∗.

• This approach is typical for nonlinear problems, see Chapters 3 and 9. Here, however, it is applied to a linear problem.

• But why not simply use LU decomposition, or

x = A\b   (“A backslash b”)

in Matlab?

• Generally, the matrix A must be somehow special to consider iterative methods!

Page 5: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Motivation

Why (or when) not to use a direct method

• If A is large and sparse, LU decomposition (Gaussian elimination) may introduce fill-in.

• Want to take advantage when only a rough approximation to x is required.

• Want to take advantage when a good x0 approximating x is known (warm start).

• Sometimes A is not explicitly available; only matrix-vector products Av for any vector v can be carried out efficiently.

Page 6: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Motivation

The famous Poisson matrix

A particularly famous example of a sparse matrix is that of the discretization of the Poisson partial differential equation. Here is an example of the sparsity pattern of a 100 × 100 Poisson matrix, using the Matlab command spy.

[Figure: spy plot of the 100 × 100 Poisson matrix; nz = 460.]

Page 7: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Stationary iteration and relaxation methods

Outline

• Stationary iteration and relaxation methods

• Application: model Poisson problem

• Convergence of stationary methods

• Conjugate gradient method

• *Krylov subspace methods

• *Multigrid methods


Page 8: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Stationary iteration and relaxation methods

Jacobi and Gauss-Seidel relaxation methods

Given A, denote by D its diagonal part and by E its lower triangular part:

D = diag(diag(A)) ;  E = tril(A)

e.g.

A =
  [  7   3    1
    −3  10    2
     1   7  −15 ]
⇒
D =
  [  7   0    0
     0  10    0
     0   0  −15 ] ,
E =
  [  7   0    0
    −3  10    0
     1   7  −15 ] .

Given iterates x0, x1, x2, . . . , xk, . . ., denote the residual vector

rk = b−Axk.

Jacobi’s method

xk+1 = xk +D−1rk

Gauss-Seidel method

xk+1 = xk + E−1rk
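
In code, both relaxations amount to one diagonal or triangular solve per sweep. A minimal Matlab/Octave sketch (ours, not the book's), assuming a system Ax = b is given and using the splitting notation above:

    D = diag(diag(A));                 % diagonal part of A
    E = tril(A);                       % lower triangular part of A (with the diagonal)
    x = zeros(size(b));                % initial guess x0 = 0
    for k = 1:20
        r = b - A*x;                   % residual r_k
        x = x + D \ r;                 % Jacobi update; use  x = x + E \ r  for Gauss-Seidel
    end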


Page 9: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Stationary iteration and relaxation methods

A stationary method

Given iterates x0,x1,x2, . . . ,xk, . . . denote residual vectors rk = b−Axk.

• Any given matrix M of the same size as A defines a splitting

A = M − (M −A).

• Such a splitting defines the stationary iterative method

Mxk+1 = (M −A)xk + b, k = 0, 1, 2, . . . .

• This iteration can be written equivalently as

xk+1 = xk +M−1rk.

• It is called stationary because M is independent of the iteration counter k.

• Both Jacobi and GS are simple examples of a stationary method.


Page 10: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Stationary iteration and relaxation methods

Example: Jacobi (simultaneous relaxation)

• Consider the linear system

7x1 + 3x2 + x3 = 3

−3x1 + 10x2 + 2x3 = 4

x1 + 7x2 − 15x3 = 2 .

• Write as

7x1 = 3− 3x2 − x3

10x2 = 4 + 3x1 − 2x3

−15x3 = 2− x1 − 7x2 .

(Corresponds to a splitting A = M −N with M = D.)

• Evaluate right hand side at current iterate k and left hand side as unknown

x1^(k+1) = (3 − 3 x2^(k) − x3^(k)) / 7

x2^(k+1) = (4 + 3 x1^(k) − 2 x3^(k)) / 10

x3^(k+1) = (2 − x1^(k) − 7 x2^(k)) / (−15) ,

for k = 0, 1, 2, . . .
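
A minimal Matlab/Octave sketch (ours) of this Jacobi iteration for the 3 × 3 example; note that the whole right-hand side is evaluated at the old iterate before x is overwritten, which is exactly the simultaneous update above:

    x = [0; 0; 0];                              % x^(0)
    for k = 1:20
        x = [(3 - 3*x(2) -   x(3)) / 7;
             (4 + 3*x(1) - 2*x(3)) / 10;
             (2 -   x(1) - 7*x(2)) / (-15)];    % all three use the previous x
    end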


Page 11: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Stationary iteration and relaxation methods

Example: Gauss-Seidel

• Write same linear system as

7x1 = 3− 3x2 − x3

10x2 − 3x1 = 4− 2x3

−15x3 + x1 + 7x2 = 2 .

(Corresponds to a splitting A = M −N with M = E.)

• Evaluate right hand side at the current iterate k and left hand side as unknown (i.e., forward substitution)

x1^(k+1) = (3 − 3 x2^(k) − x3^(k)) / 7

x2^(k+1) = (4 + 3 x1^(k+1) − 2 x3^(k)) / 10

x3^(k+1) = (2 − x1^(k+1) − 7 x2^(k+1)) / (−15) ,

for k = 0, 1, 2, . . .
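
The corresponding Gauss-Seidel sweep, sketched in Matlab/Octave (ours): updating x in place means each component immediately uses the newest values of the earlier components.

    x = [0; 0; 0];
    for k = 1:20
        x(1) = (3 - 3*x(2) -   x(3)) / 7;
        x(2) = (4 + 3*x(1) - 2*x(3)) / 10;      % uses the new x(1)
        x(3) = (2 -   x(1) - 7*x(2)) / (-15);   % uses the new x(1) and x(2)
    end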


Page 12: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Stationary iteration and relaxation methods

Properties of Jacobi and Gauss-Seidel relaxations

• Jacobi is more easily parallelized.

• Jacobi matrix M is symmetric.

• GS converges whenever Jacobi converges and often (but not always) twice as fast.

• Both Jacobi and GS converge if A is strictly (or at least irreducibly) diagonally dominant. GS is also guaranteed to converge if A is symmetric positive definite.

• Both methods are simple but converge slowly (if they converge at all). They are used as building blocks for faster, more complex methods.

• Both Jacobi and GS are simple examples of a stationary method.

Page 13: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Stationary iteration and relaxation methods

Over-relaxation and under-relaxation

• There are more sophisticated stationary methods than Jacobi and GS. The methods introduced below are based on a simple modification of the ancient ones.

• Let xk+1 be obtained from xk by either Jacobi or GS. Modify it further by

xk+1 ← ω xk+1 + (1 − ω) xk

where ω is a parameter.

• Two useful variants:

  • Based on Gauss-Seidel (GS) and 1 < ω < 2, obtain faster successive over-relaxation (SOR):

    xk+1 = xk + ω [(1 − ω)D + ωE]^{-1} rk.

  • Based on Jacobi and ω ≈ 0.8, obtain slower under-relaxation, which is a good smoother in some applications:

xk+1 = xk + ωD−1rk.
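
A short Matlab/Octave sketch (ours) of both variants inside a relaxation loop, assuming A and b are given and D, E are the splitting matrices defined earlier:

    D = diag(diag(A));  E = tril(A);
    omega = 1.5;                                          % 1 < omega < 2 for SOR
    x = zeros(size(b));
    for k = 1:20
        r = b - A*x;
        x = x + omega * (((1 - omega)*D + omega*E) \ r);  % SOR step
        % damped (under-relaxed) Jacobi would instead use:  x = x + 0.8*(D \ r);
    end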


Page 14: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods The Poisson model problem

Outline

• Stationary iteration and relaxation methods

• Application: model Poisson problem

• Convergence of stationary methods

• Conjugate gradient method

• *Krylov subspace methods

• *Multigrid methods


Page 15: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods The Poisson model problem

Poisson problem

• There are many practical problems involving large, sparse matrices (not our 3 × 3 example!) where simple stationary methods are relevant. We now develop one such prototype example.

• The Poisson equation is a partial differential equation that in its simplest form is defined on the open unit square, 0 < x, y < 1, and reads

−(∂²u/∂x² + ∂²u/∂y²) = g(x, y).

Here u(x, y) is the unknown function sought and g(x, y) is a given source.

• Boundary conditions: assume that the sought function u(x, y) satisfies homogeneous Dirichlet boundary conditions along the entire boundary of the unit square, written as

u(x, 0) = u(x, 1) = u(0, y) = u(1, y) = 0.

Page 16: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods The Poisson model problem

The need to discretize

[Figure: a 2D cross-section of a 3D domain with a square grid added — (a) an earthly domain with air (σ = 0) above ground (σ > 0), separated by the air-earth interface Γ; (b) the same domain Ω1 with a discretization grid of width h.]

Page 17: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods The Poisson model problem

Finite Difference Discretization

• Discretizing using centred differences for the partial derivatives we obtain the equations

4ui,j − ui+1,j − ui−1,j − ui,j+1 − ui,j−1 = bi,j , 1 ≤ i, j ≤ N,

ui,j = 0 otherwise.

• In this difference scheme, ui,j is the value at the (i, j)th node of a square planar grid, and bi,j = h²g(ih, jh) are given values at the same grid locations. Here h = 1/(N + 1) is the grid width.

• See the grid on the next slide, and note in particular the location of a ui,j and those of its neighbours which appear in the above formula (distinguished by red dots).

• For ui,j to approximate the differential equation solution u(ih, jh) well, we need to set h “sufficiently small”. Hence, N can easily become large. For example, h = 0.01 gives N = 99.

Page 18: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods The Poisson model problem

[Figure: the N × N grid of unknowns u1,1, . . . , uN,N, with a typical interior node ui,j and its four neighbours highlighted.]

Page 19: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods The Poisson model problem

A linear system

• Obviously, these are linear relations, so we can express them as a system of linear equations

Au = b,

where u consists of the n = N² unknowns ui,j somehow organized as a vector, and b is composed likewise from the values bi,j.

• The two-dimensional problem has features related to sparsity that are not seen in the one-dimensional problem; we will return to this point soon.

• Note that n can easily become very large. Even for a simple 2D case with N = 100, the matrix dimension is n = 10,000. In 3D and the same h we would get n = 10^6, so A cannot even be stored as a full matrix. This, for the simplest problem of its type!

Page 20: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods The Poisson model problem

Ordering the unknowns

How should we order the grid unknowns {ui,j}, 1 ≤ i, j ≤ N, into a vector u? We can do this in many ways. A simple (and rational) way is lexicographically, say by columns, which yields

u = (u1,1, u2,1, . . . , uN,1, u1,2, u2,2, . . . , uN,2, u1,3, . . . , uN,N)^T ,

b = (b1,1, b2,1, . . . , bN,1, b1,2, b2,2, . . . , bN,2, b1,3, . . . , bN,N)^T .

Page 21: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods The Poisson model problem

Block tridiagonal matrix

The n × n matrix A (with n = N²) has the form

A =
  [  J  −I
    −I   J  −I
          ·    ·    ·
             −I   J  −I
                  −I   J  ] ,

where J is the tridiagonal N × N matrix

J =
  [  4  −1
    −1   4  −1
          ·    ·    ·
             −1   4  −1
                  −1   4  ] ,

and I denotes the identity matrix of size N.
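
A minimal Matlab/Octave sketch (ours, not the book's) that assembles this block tridiagonal matrix with Kronecker products and checks its sparsity:

    N = 10;  e = ones(N, 1);  I = speye(N);
    J = spdiags([-e 4*e -e], -1:1, N, N);     % the tridiagonal block J
    S = spdiags([e e], [-1 1], N, N);         % 1's marking the block sub/super diagonals
    A = kron(I, J) - kron(S, I);              % n-by-n Poisson matrix, n = N^2
    spy(A), nnz(A)                            % matches the spy plot shown earlier: 460 nonzeros for N = 10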


Page 22: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods The Poisson model problem

Example

For instance, if N = 3 then

A =
  [  4 −1  0 −1  0  0  0  0  0
    −1  4 −1  0 −1  0  0  0  0
     0 −1  4  0  0 −1  0  0  0
    −1  0  0  4 −1  0 −1  0  0
     0 −1  0 −1  4 −1  0 −1  0
     0  0 −1  0 −1  4  0  0 −1
     0  0  0 −1  0  0  4 −1  0
     0  0  0  0 −1  0 −1  4 −1
     0  0  0  0  0 −1  0 −1  4 ] .

Note the zero diagonals within the band! In general there are N − 2 such diagonals, because neighbouring unknowns ui,j , ui±1,j±1 are no longer consecutive in the vector u.

Page 23: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods The Poisson model problem

Sparsity pattern

Recall the figure of the sparsity structure of the 100 × 100 (i.e., N = 10) “Poisson matrix”:

[Figure: spy plot of the 100 × 100 Poisson matrix; nz = 460.]

Page 24: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods The Poisson model problem

Sparsity issues

• There is a fundamental difference between the 1D and the 2D & 3D problems.

• The 1D problem (i.e., only x, no y variable) produces a tridiagonal matrix, for which Gaussian elimination (GE) requires a linear, i.e., O(n), number of floating point operations (optimal order of complexity). No iterative methods are needed here.

• Higher-dimensional problems give rise to matrices that are sparse within the band, and the complexity of GE is certainly not linear (in 2D it is O(n²)). This is in fact where iterative methods can be very effective.

• As a rule of thumb, for this problem in 2D special direct methods are still competitive, but in 3D they no longer are.

Page 25: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods The Poisson model problem

Eigenvalues of the Poisson matrix

• For any size N , the matrix A is diagonally dominant and nonsingular.

• It can be verified directly that the n = N² eigenvalues of A are given by

λl,m = 4 − 2 (cos(lπh) + cos(mπh)) , 1 ≤ l,m ≤ N

(recall (N + 1)h = 1).

• Thus λl,m > 0 for all 1 ≤ l,m ≤ N, so the matrix A is symmetric positive definite.

• It also follows that the condition number satisfies κ(A) = λN,N / λ1,1 = O(n).

• Knowing the eigenvalues explicitly is helpful in understanding performance, convergence and accuracy issues related to iterative solvers. Such specific knowledge is not typically available for more complex problems of this type, yet the conclusions drawn for the model Poisson problem are often indicative of what happens more generally.

Page 26: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods The Poisson model problem

Performance of our relaxation methods for Poisson

• Recall that A is composed of relations of the form

4ui,j − ui+1,j − ui−1,j − ui,j+1 − ui,j−1 = bi,j , 1 ≤ i, j ≤ N.

• Can apply the relaxation methods directly, without ever forming A (see the sketch below), but bear in mind vectorization issues.

• On a 15 × 15 grid: N = 15, n = 225, obtain

  relaxation method   ω      Error after 2 itns   Error after 20 itns
  Jacobi              1      7.1e-2               5.4e-2
  GS                  1      6.9e-2               3.8e-2
  SOR                 1.69   5.6e-2               4.8e-4

• So, Jacobi and GS are both very slow to converge; SOR with ω = 1.69 is significantly faster.

• For larger N these methods require even more iterations, as we shall soon see.
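
A hedged Matlab/Octave sketch (ours, not the book's code) of one vectorized Jacobi sweep applied directly on the N × N grid, never forming A; the array names U and B are our own.

    N = 15;  h = 1/(N+1);
    B = h^2 * ones(N);                 % b_{i,j} = h^2 g(ih,jh); here the source g == 1
    U = zeros(N);                      % initial guess; zero boundary values are implied
    for k = 1:20
        Up = zeros(N+2);  Up(2:N+1, 2:N+1) = U;          % pad with the zero boundary
        U  = (B + Up(1:N, 2:N+1) + Up(3:N+2, 2:N+1) ...
                + Up(2:N+1, 1:N) + Up(2:N+1, 3:N+2)) / 4;  % average of b and the 4 neighbours
    end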


Page 27: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Convergence of general stationary methods

Outline

• Stationary iteration and relaxation methods

• Application: model Poisson problem

• Convergence of stationary methods

• Conjugate gradient method

• *Krylov subspace methods

• *Multigrid methods


Page 28: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Convergence of general stationary methods

Convergence of a general stationary method

• Recall our general iterative method

xk+1 = xk +M−1rk = xk +M−1(b−Axk).

• Write this as

xk+1 = Txk +M−1b, T = I −M−1A.

The matrix T is called the iteration matrix.

• For the exact solution x∗ = x, we similarly have

x = x + M^{-1}(b − Ax) = Tx + M^{-1}b.

• Define the error in the kth iteration

ek = x − xk.

• Then, subtracting the expression for xk+1 from the identity for x,

ek+1 = T ek = T(T ek−1) = · · · = T^{k+1} e0.

• Thus, convergence iff T^k → 0.

Page 29: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Convergence of general stationary methods

Convergence of stationary methods

• Recall (Chapter 4) that the spectral radius of a square matrix B with eigenvalues µ1, . . . , µn is the maximum eigenvalue magnitude

ρ(B) = maxi|µi|.

• Our stationary method with iteration matrix T converges if and only if

ρ(T ) < 1.

• How fast does the iteration converge? Define rate of convergence

rate = − log10 ρ(T ).

• Then the number of iterations required to reduce the error by a factor of 10 (i.e., gain a decimal digit) is

k ≈ 1 / rate.
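
A small Matlab/Octave illustration (ours) of these definitions, using the 3 × 3 example and the Jacobi splitting M = D:

    A = [7 3 1; -3 10 2; 1 7 -15];
    M = diag(diag(A));
    T = eye(3) - M \ A;                 % iteration matrix
    rho  = max(abs(eig(T)))             % spectral radius; < 1 means convergence
    rate = -log10(rho)                  % decimal digits gained per iteration
    iters_per_digit = ceil(1/rate)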


Page 30: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Convergence of general stationary methods

Norm vs spectral radius

• You could ask, why do we need this spectral radius stuff? Indeed we could simply write instead that, since ek+1 = T ek,

∥ek+1∥ ≤ ∥T∥ ∥ek∥ ≤ · · · ≤ ∥T∥^{k+1} ∥e0∥,

hence convergence follows if ∥T∥ < 1 in any induced matrix norm.

• Unfortunately, the norm condition is indeed sufficient for convergence, but not necessary.

• Example: for the iteration (or amplification) matrix

T =
  [ 0  1/4
    1   0  ]

we have that

T (1, 0)^T = (0, 1)^T ,

hence in any of our usual norms, ∥T∥ ≥ 1. However, the eigenvalues are ±1/2, hence ρ(T) = 1/2. So, not only is there convergence, it is pretty fast, with rate > 1/4.

Page 31: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Convergence of general stationary methods

Convergence of Jacobi for model Poisson problem

• Recall matrix is defined by

4ui,j − ui+1,j − ui−1,j − ui,j+1 − ui,j−1 = bi,j , 1 ≤ i, j ≤ N.

Eigenvalues

λl,m = 4 − 2 ( cos(lπ/(N + 1)) + cos(mπ/(N + 1)) ), 1 ≤ l,m ≤ N.

• Consider Jacobi: M = D. Then

T = I − D^{-1}A = I − (1/4)A.

• So the eigenvalues of the iteration matrix are

µl,m = 1 − (1/4)λl,m = (1/2) ( cos(lπ/(N + 1)) + cos(mπ/(N + 1)) ), 1 ≤ l,m ≤ N.

Page 32: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Convergence of general stationary methods

Convergence of Jacobi cont.

• Eigenvalues of T are

µl,m = 1 − (1/4)λl,m = (1/2) ( cos(lπ/(N + 1)) + cos(mπ/(N + 1)) ), 1 ≤ l,m ≤ N.

• Spectral radius

ρ(T) = ρJ(T) = µ1,1 = cos(π/(N + 1)) ≤ 1 − c/n.

• Rate of convergence

rate = − log ρ(T) ∼ 1/n.

Thus, O(n) iterations are required for error reduction by a constant factor. (Asymptotically the same cost as GE.)
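
A small Matlab/Octave check (ours) of this spectral radius formula, reusing the block construction of the Poisson matrix sketched earlier:

    N = 10;  h = 1/(N+1);  e = ones(N,1);  I = speye(N);
    J = spdiags([-e 4*e -e], -1:1, N, N);
    S = spdiags([e e], [-1 1], N, N);
    A = kron(I, J) - kron(S, I);            % n-by-n Poisson matrix, n = N^2
    T = speye(N^2) - A/4;                   % Jacobi iteration matrix (here D = 4I)
    rho_computed  = max(abs(eig(full(T))))
    rho_predicted = cos(pi*h)               % = cos(pi/(N+1))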


Page 33: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Convergence of general stationary methods

Optimal parameter for SOR

1. Denote the optimal parameter (i.e., that which reduces the error most rapidly) by ωopt.

2. For a class of matrices including the model Poisson problem, the optimal parameter is given by

ωopt = 2 / (1 + √(1 − ρJ²)) > 1,

where ρJ is the spectral radius of the Jacobi iteration matrix.

3. For the model Poisson problem we therefore obtain

ωopt = 2 / (1 + sin(π/(N + 1))).

4. For this class of matrices, the spectral radius of the SOR matrix for ωopt is ωopt − 1 = 1 − c/N.
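
A one-line Matlab/Octave check (ours) of these formulas for the model problem:

    N = 15;
    rhoJ      = cos(pi/(N+1));                  % Jacobi spectral radius
    omega_opt = 2 / (1 + sqrt(1 - rhoJ^2))      % equals 2/(1 + sin(pi/(N+1)))
    rho_SOR   = omega_opt - 1                   % spectral radius of SOR with omega_opt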


Page 34: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Convergence of general stationary methods

Convergence of stationary methods for Poisson

For N = 15, n = 225:

[Figure: residual norm vs. iteration count (semilog scale) for Jacobi, Gauss-Seidel, and SOR.]

Page 35: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Convergence of general stationary methods

Relaxation methods for the model Poisson problem

• Jacobi requires O(n) iterations hence O(n²) flops.

• Gauss-Seidel is twice as fast, so the same order.

• For this problem and others, can rearrange unknowns in red-black order, where all neighbours of a “red” unknown are “black” and vice versa. Then a GS sweep is the same as two Jacobi half-sweeps: gain parallelism.

• SOR with optimal ω requires O(N) iterations hence O(n^{3/2}) flops!

• Lower bound: the best possible method would require at least O(n) flops.

• So there seems to be room for improvement: are there better methods?

Page 36: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Gradient descent and conjugate gradient methods

Outline

• Stationary iteration and relaxation methods

• Application: model Poisson problem

• Convergence of stationary methods

• Conjugate gradient method

• *Krylov subspace methods

• *Multigrid methods


Page 37: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Gradient descent and conjugate gradient methods

Nonstationary methods

• None of the stationary methods we have seen is very fast even for the model Poisson problem when n is really large.

• Another, general disadvantage of all these relaxation methods is that they require an explicitly given matrix A: what if A is only given implicitly, through matrix-vector products?

• Try to take advantage of accumulating knowledge as the iteration proceeds.

• Consider M = Mk, and more generally

xk+1 = xk + αkpk.

The scalar αk > 0 is a step size; pk is a search direction.

• Restrict consideration to cases where A is symmetric positive definite.

• Then solving Ax = b is equivalent to

min_x ϕ(x) = (1/2) x^T A x − b^T x.

Page 38: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Gradient descent and conjugate gradient methods

Gradient descent and steepest descent

• Gradient descent: take pk = rk, i.e., Mk = αk^{-1} I.

• How to choose the step size?

• Simple-minded, greedy approach: exact line search,

min_α ϕ(xk + α rk) = (1/2)(xk + α rk)^T A (xk + α rk) − b^T (xk + α rk).

• Critical point: differentiate with respect to α and set to 0. Obtain steepest descent (a poor, but popular, choice of name) with

αk = (rk^T rk) / (rk^T A rk) = ⟨rk, rk⟩ / ⟨rk, A rk⟩.

Page 39: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Gradient descent and conjugate gradient methods

Steepest descent algorithm

Given an initial guess x0 and a tolerance tol, set at first r0 = b − Ax0, δ0 = ⟨r0, r0⟩, bδ = ⟨b, b⟩ and k = 0. Then:
While δk > tol² bδ,

    sk = A rk
    αk = δk / ⟨rk, sk⟩
    xk+1 = xk + αk rk
    rk+1 = rk − αk sk
    δk+1 = ⟨rk+1, rk+1⟩
    k = k + 1.

Note the organization so that only one matrix-vector multiplication per iteration is required.
Properties of this algorithm will be summarized after we introduce the better, conjugate gradient algorithm.
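
A Matlab/Octave sketch (ours; the function name is our own) of the steepest descent algorithm above:

    function x = steepest_descent(A, b, x, tol)
      r = b - A*x;  delta = r'*r;  bdelta = b'*b;
      while delta > tol^2 * bdelta
          s = A*r;                    % the single matrix-vector product per iteration
          alpha = delta / (r'*s);
          x = x + alpha*r;
          r = r - alpha*s;
          delta = r'*r;
      end
    end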


Page 40: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Gradient descent and conjugate gradient methods

Conjugate gradient (CG) algorithm

Here the search direction is no longer the residual but rather a judicious linear combination of the residual with the previous search direction.

Given an initial guess x0 and a tolerance tol, set at first r0 = b − Ax0, δ0 = ⟨r0, r0⟩, bδ = ⟨b, b⟩, k = 0 and p0 = r0. Then:
While δk > tol² bδ,

    sk = A pk
    αk = δk / ⟨pk, sk⟩
    xk+1 = xk + αk pk
    rk+1 = rk − αk sk
    δk+1 = ⟨rk+1, rk+1⟩
    pk+1 = rk+1 + (δk+1 / δk) pk
    k = k + 1.
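
A Matlab/Octave sketch (ours; the function name is our own) of the CG algorithm above, e.g. called as x = conjugate_gradient(A, b, zeros(size(b)), 1e-6):

    function x = conjugate_gradient(A, b, x, tol)
      r = b - A*x;  p = r;
      delta = r'*r;  bdelta = b'*b;
      while delta > tol^2 * bdelta
          s = A*p;                      % one matrix-vector product per iteration
          alpha = delta / (p'*s);
          x = x + alpha*p;
          r = r - alpha*s;
          delta_new = r'*r;
          p = r + (delta_new/delta)*p;  % new search direction
          delta = delta_new;
      end
    end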


Page 41: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Gradient descent and conjugate gradient methods

Convergence of methods for Poisson

For N = 31, n = 961:

[Figure: residual norm vs. iterations (semilog scale) — left panel: Jacobi, Gauss-Seidel and SOR; right panel: SOR and CG.]

Page 42: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Gradient descent and conjugate gradient methods

CG algorithm notes

• Although the CG algorithm may look unfriendly at first, it is actually short and straightforward to program.

• It is arranged to look like a one-step method, where each step (from k to k + 1) involves just one matrix-vector multiplication. The rest are vector and scalar operations.

• Assuming that the cost of the matrix-vector product dominates the other operations, the total iteration cost is comparable to that of Jacobi.

• Furthermore, here the matrix A is not required explicitly, and we need not know any of its elements.

Page 43: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Gradient descent and conjugate gradient methods

Krylov subspace

• Easy to see, for both methods, that

rk = pk(A)r0,

where pk is a polynomial of degree k satisfying pk(0) = 1.

• Also

ek = pk(A)e0, xk − x0 = qk−1(A)r0.

• Define Krylov Subspace of nonsingular C with respect to y by

Kk(C; y) = span{y, Cy, C²y, . . . , C^{k−1}y}.

• Thus,

rk ∈ Kk+1(A; r0), and

xk − x0 ∈ Kk(A; r0).

Page 44: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Gradient descent and conjugate gradient methods

Energy norm

• For B a symmetric positive definite matrix,

∥z∥B = √(z^T B z) ≡ √⟨z, Bz⟩.

• Then

∥rk∥A−1 = ∥ek∥A.

• Can interpret CG as minimizing in the kth iteration the energy norm ∥ek∥A over the space x0 + Kk(A; r0).

• Hence, in exact arithmetic, the exact solution is obtained after n iterations.

Page 45: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Gradient descent and conjugate gradient methods

CG properties

• Subspace minimization (and uninteresting exact termination)

• Search directions are A-conjugate:

⟨pl, Apj⟩ = 0, l ≠ j

• Residual directions are orthogonal:

⟨rl, rj⟩ = 0, l ≠ j

• Key convergence rate estimate:

∥ek∥A ≤ 2 ( (√κ(A) − 1) / (√κ(A) + 1) )^k ∥e0∥A,

hence for large κ, the number of iterations k needed to reduce the initial error by a factor c satisfies

k ≤ 0.5 √κ(A) ln(2/c) + 1.

Page 46: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Gradient descent and conjugate gradient methods

Steepest descent (SD) vs Conjugate gradients (CG)

• Gradient descent uses residual rk as search direction; CG uses

pk = rk + (∥rk∥² / ∥rk−1∥²) pk−1.

• SD is simpler, greedy, one-step, easier to show convergence. CG is two-step (requires initialization), can be more fragile.

• SD requires O(κ(A)) iterations whereas CG requires O(√κ(A)) iterations!

• For both methods

rk ∈ Kk+1(A; r0), xk − x0 ∈ Kk(A; r0).

• For CG also

xk = argmin_{y∈S} ϕ(y) = argmin_{z∈S} ∥x − z∥A,

where S = x0 + Kk(A; r0). Thus, convergence (in exact arithmetic) in at most n iterations.

Page 47: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Gradient descent and conjugate gradient methods

CG vs SOR

• When both methods may be applied optimally (i.e., A is symmetric positive definite for CG, and we know ωopt for SOR) these methods require asymptotically a similar number of iterations.

• However, the SOR parameter is often unknown and hard to approximate well, which is a major drawback of SOR.

• CG uses only matrix-vector multiplications: more applicable.

• Moreover, there are ways to speed up CG by preconditioning, which are not available to SOR.

• Conclusion: continue to improve (and approve) CG.

Page 48: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Gradient descent and conjugate gradient methods

Preconditioning

• Often, even O(√κ(A)) iterations are too many!

(In the Poisson example, √κ(A) ∼ N.)

• Consider solving instead

(P^{-1/2} A P^{-1/2}) (P^{1/2} x) = P^{-1/2} b

with a preconditioner P such that P^{-1/2} A P^{-1/2} is better conditioned.

• Fortunately, can simply consider

P^{-1}Ax = P^{-1}b,

and we want B = P^{-1}A to be better conditioned than A.

• So, the search is on for a preconditioner P that is both close enough to A to get a good condition number for B and far enough from A to be easily invertible.

Page 49: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Gradient descent and conjugate gradient methods

Preconditioned conjugate gradients (PCG)

Given an initial guess x0 and a tolerance tol, set at first r0 = b − Ax0, h0 = P^{-1}r0, δ0 = ⟨r0, h0⟩, bδ = ⟨b, P^{-1}b⟩, k = 0 and p0 = h0. Then:
While δk > tol² bδ,

    sk = A pk
    αk = δk / ⟨pk, sk⟩
    xk+1 = xk + αk pk
    rk+1 = rk − αk sk
    hk+1 = P^{-1} rk+1
    δk+1 = ⟨rk+1, hk+1⟩
    pk+1 = hk+1 + (δk+1 / δk) pk
    k = k + 1.
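
A Matlab/Octave sketch (ours; function and argument names are our own) of this PCG algorithm, where Psolve is a function handle that applies P^{-1} to a vector (e.g. Psolve = @(v) L' \ (L \ v) for an incomplete Cholesky factor L):

    function x = precond_cg(A, b, x, tol, Psolve)
      r = b - A*x;  h = Psolve(r);  p = h;
      delta = r'*h;  bdelta = b' * Psolve(b);
      while delta > tol^2 * bdelta
          s = A*p;
          alpha = delta / (p'*s);
          x = x + alpha*p;
          r = r - alpha*s;
          h = Psolve(r);                % one preconditioner solve per iteration
          delta_new = r'*h;
          p = h + (delta_new/delta)*p;
          delta = delta_new;
      end
    end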


Page 50: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Gradient descent and conjugate gradient methods

Choices for preconditioner

• Symmetric SOR (SSOR)

• Incomplete Cholesky (IC) P = F F^T; more generally incomplete LU (ILU)
  • IC(0) – avoid fill-in altogether
  • IC(tol) – carry out an elimination step only if the result is above the drop tolerance tol

• Often a good preconditioner is unknown in practice!
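
A hedged usage sketch (ours) of Matlab's built-in ichol and pcg routines for the two IC variants named above, assuming A is a sparse symmetric positive definite matrix and b a matching right-hand side; the tolerance and iteration limit shown are just plausible choices:

    L = ichol(A);                                   % IC(0): no fill-in
    x = pcg(A, b, 1e-6, 200, L, L');
    opts.type = 'ict';  opts.droptol = 1e-2;        % IC with a drop tolerance
    L = ichol(A, opts);
    x = pcg(A, b, 1e-6, 200, L, L');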


Page 51: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods Gradient descent and conjugate gradient methods

Convergence of methods for Poisson

For N = 31, n = 961:

[Figure: residual norm vs. iterations (semilog scale) for CG, PCG/IC(0), and PCG/ICT(.01).]

Page 52: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods *Krylov subspace methods

Outline

• Stationary iteration and relaxation methods

• Application: model Poisson problem

• Convergence of stationary methods

• Conjugate gradient method

• *Krylov subspace methods

• *Multigrid methods


Page 53: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods *Krylov subspace methods

Iterative methods for general linear systems

• A major limitation of CG is the requirement that A be symmetric positive definite.

• Given a more general (nonsingular) matrix, naively we could solve the normal equations problem A^T A x = A^T b, where now A^T A is symmetric positive definite. However, the condition number is squared, and hence convergence is much slower!

• Opt instead for other Krylov subspace methods.

• Consider also a favourite family of multigrid methods, which exploit multi-scale structure in the given problem.

Page 54: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods *Krylov subspace methods

General nonsingular matrix

• Assume A nonsingular, but not necessarily symmetric positive definite.

• Extend CG by searching in similar Krylov subspaces: xk − x0 ∈ Kk(A; r0)

Kk(A; r0) = span{r0, Ar0, A²r0, . . . , A^{k−1}r0}.

• Building blocks:
  1. Construct an orthogonal basis for the Krylov subspace.
  2. Define an optimality property.
  3. Use an effective preconditioner.

Page 55: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods *Krylov subspace methods

Orthogonal basis for the subspace: Arnoldi

• The obvious basis (powers A^j r0) is poorly conditioned.

• So orthonormalize these vectors: Arnoldi algorithm.

q1 = r0 / ∥r0∥
for j = 1 to k
    z = A qj
    for i = 1 to j
        hi,j = ⟨qi, z⟩
        z = z − hi,j qi
    end
    hj+1,j = ∥z∥
    if hj+1,j = 0, quit
    qj+1 = z / hj+1,j
end
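
A Matlab/Octave sketch (ours; the function name is our own) of the Arnoldi process above, returning the orthonormal basis vectors as columns of Q and the (k+1)-by-k upper Hessenberg matrix H:

    function [Q, H] = arnoldi(A, r0, k)
      n = length(r0);
      Q = zeros(n, k+1);  H = zeros(k+1, k);
      Q(:,1) = r0 / norm(r0);
      for j = 1:k
          z = A * Q(:,j);
          for i = 1:j
              H(i,j) = Q(:,i)' * z;     % orthogonalize against earlier basis vectors
              z = z - H(i,j) * Q(:,i);
          end
          H(j+1,j) = norm(z);
          if H(j+1,j) == 0, break, end  % lucky breakdown: invariant subspace found
          Q(:,j+1) = z / H(j+1,j);
      end
    end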


Page 56: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods *Krylov subspace methods

Orthogonal basis in symmetric case: Lanczos

• If A is symmetric, the upper Hessenberg H is tridiagonal.

• Obtain three-term recurrence: Lanczos algorithm.

z = A qj
γj = ⟨qj, z⟩
z = z − βj−1 qj−1 − γj qj
βj = ∥z∥
qj+1 = z / βj .

Start with q1 = r0/∥r0∥, β0 = 0.

Page 57: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods *Krylov subspace methods

GMRES and MINRES

Minimize residual norm.

• Main components of the GMRES iteration:
  1. perform a step of the Arnoldi process;
  2. update the QR factorization of the updated upper Hessenberg matrix;
  3. solve the resulting least squares problem.

• GMRES(m): limited memory GMRES – restart after m iterations

• MINRES: for the symmetric case: no memory problem.

Page 58: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods *Krylov subspace methods

Preconditioning

• For a general matrix A, a preconditioner is often more crucial than P in PCG for the s.p.d. case.

• Incomplete LU (ILU)

• Incomplete LU with drop tolerance (ILUT)

• Example: convection-diffusion equation

−(∂²u/∂x² + ∂²u/∂y²) + σ ∂u/∂x + τ ∂u/∂y = g(x, y).

[Figure: relative residual vs. iteration for non-preconditioned GMRES and ILUT-preconditioned GMRES.]
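
A hedged usage sketch (ours) of Matlab's built-in ilu and gmres routines for an experiment of this kind, assuming A and b come from a discretization such as the convection-diffusion problem above (not constructed here); the restart length, tolerances and drop tolerance are just plausible choices:

    setup.type = 'ilutp';  setup.droptol = 1e-2;    % ILUT-style incomplete LU
    [L, U] = ilu(A, setup);
    x = gmres(A, b, 20, 1e-8, 100, L, U);           % restarted GMRES(20), ILU-preconditioned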

Page 59: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods *Multigrid method

Outline

• Stationary iteration and relaxation methods

• Application: model Poisson problem

• Convergence of stationary methods

• Conjugate gradient method

• *Krylov subspace methods

• *Multigrid methods


Page 60: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods *Multigrid method

Multigrid method

• Consider iteration for model problem: Poisson equation on the unit square.

• Highest frequencies of residual or error correspond to largest eigenvalues, most oscillatory eigenvectors. These are “not seen” on a coarser grid.

• Simple relaxations such as damped Jacobi (ω = .8) or Gauss-Seidel are slow methods but fast smoothers (reduce high frequency components of residual/error).

• The smoothed residual can be transferred to a coarser grid, where the procedure can be repeated, more economically.

Page 61: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods *Multigrid method

Smoothing

Illustration of the smoothing effect using damped Jacobi with ω = 0.8 on a “Poisson in 1D” problem:

[Figure: the residual (top) and the smoothed residual (bottom) plotted against x on [0, 1].]

Page 62: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods *Multigrid method

Smoothing for model Poisson

• Recall eigenvalues

λl,m = 4− 2 (cos(lπh) + cos(mπh)) , 1 ≤ l,m ≤ N

• So, if we move from a grid (mesh) with N ≈ 1/h to a grid with N/2 ≈ 1/H, H = 2h, then the largest eigenvalues, and their corresponding most oscillatory eigenvectors, are no longer “seen”.

• On the finer, h-grid, we need to take care only of error/residual components which correspond, at least in one direction, to the large eigenvalues, i.e., where either l > N/2 or m > N/2 or both.

Page 63: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods *Multigrid method

Multigrid method cycle

function x = multigrid(A, b, x, level, ν1, ν2, γ)

  if level = coarsest, solve exactly Ax = b, return

  for j = 1 : ν1, x = relax(A, b, x), end                % pre-smoothing (e.g. damped Jacobi or GS)
  r = b − Ax                                             % fine-grid residual
  [Ac, rc] = restrict(A, r)                              % transfer residual (and operator) to the coarser grid
  vc = 0
  for l = 1 : γ                                          % γ = 1 gives a V-cycle, γ = 2 a W-cycle
      vc = multigrid(Ac, rc, vc, level − 1, ν1, ν2, γ)   % approximately solve the coarse-grid correction equation
  end
  x = x + prolongate(vc)                                 % interpolate the correction back to the fine grid
  for j = 1 : ν2, x = relax(A, b, x), end                % post-smoothing

Page 64: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods *Multigrid method

Multigrid performance

Can be used as a standalone solver or as a preconditioner for a Krylov subspace method. The latter is more robust, and as such is generally preferred.
The figure below is for the model Poisson problem with n = 255².

[Figure: relative residual norm vs. iterations for CG, PCG/ICT(.01), PCG/MG, and MG.]

Page 65: Chapter 7: Linear Systems: Iterative Methods

Linear Systems: Iterative Methods *Multigrid method

Multigrid performance

Still on the model Poisson problem.
Number of iterations and CPU times for achieving convergence to 10^{-6}, N = 2^l − 1, l = 5, 6, 7, 8, 9:

[Figure: iteration counts (left) and CPU seconds (right) vs. N for CG, PCG/ICT(.01), PCG/MG, and MG.]

