Page 1: Lect15_2

Iterative Methods in Linear Algebra(part 2)

Stan Tomov

Innovative Computing LaboratoryComputer Science Department

The University of Tennessee

Wednesday April 27, 2011

CS 594, 04-27-2011


Page 3: Lect15_2

Outline

Part I: Krylov iterative solvers

Part II: Convergence and preconditioning

Part III: Iterative eigen-solvers


Page 4: Lect15_2

Part I

Krylov iterative solvers


Page 5: Lect15_2

Krylov iterative solvers

Building blocks for Krylov iterative solvers covered so far

Projection/minimization in a subspace

Petrov-Galerkin conditions
Least squares minimization, etc.

Orthogonalization

CGS and MGS
Cholesky- or Householder-based QR


Page 6: Lect15_2

Krylov iterative solvers

We also covered abstract formulations for iterative solvers and eigen-solvers

What are the goals of this lecture?

Give specific examples of Krylov solvers

Show how examples relate to the abstract formulation

Show how the examples relate to the building blocks covered so far, specifically to

Projection, and
Orthogonalization

But we are not going into the details!


Page 7: Lect15_2

Krylov iterative solvers

How are these techniques related to Krylov iterative solvers?

Remember the projection slides 26 & 27 from Lecture 7

Projection in a subspace is the basis for an iterative method

Here the projection is in V
In Krylov methods V is the Krylov subspace

Km(A, r0) = span{r0, Ar0, A^2 r0, . . . , A^(m−1) r0}

where r0 ≡ b − Ax0 and x0 is an initial guess.

Often V or W are orthonormalized

The projection is 'easier' to find when we work with an orthonormal basis (e.g. problem 4 from homework 5: projection in a general vs an orthonormal basis)
The orthonormalization can be CGS, MGS, Cholesky or Householder based, etc.
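To make this concrete, here is a minimal NumPy sketch (not from the slides; the function name project and the toy sizes are illustrative) of why an orthonormal basis makes the projection easy: the coefficients are simply V^T r, with no Gram system to solve.

    import numpy as np

    def project(V, r):
        # Orthogonal projection of r onto span(V), assuming V has orthonormal columns.
        # With an orthonormal basis the coefficients are simply V.T @ r;
        # a general basis would require solving the Gram system (V.T V) c = V.T r.
        return V @ (V.T @ r)

    # Toy check with an orthonormal basis of a random 3-dimensional subspace of R^6
    rng = np.random.default_rng(0)
    V, _ = np.linalg.qr(rng.standard_normal((6, 3)))
    r = rng.standard_normal(6)
    p = project(V, r)
    assert np.allclose(V.T @ (r - p), 0)   # residual is orthogonal to the subspace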


Page 8: Lect15_2

Krylov Iterative Methods

To summarize, Krylov iterative methods in general

expand the Krylov subspace by a matrix-vector product, and

do a projection in it.

Various methods result from specific choices of the expansion and projection.


Page 9: Lect15_2

Krylov Iterative Methods

A specific example with the

Conjugate Gradient Method (CG)


Page 10: Lect15_2

Conjugate Gradient Method

The method is for SPD matrices

Both V and W are the Krylov subspaces, i.e. at iteration i

V ≡ W ≡ Ki(A, r0) ≡ span{r0, Ar0, . . . , A^(i−1) r0}

The projection xi ∈ Ki(A, r0) satisfies the Petrov-Galerkin conditions

(Axi, φ) = (b, φ)  for all φ ∈ Ki(A, r0)


Page 11: Lect15_2

Conjugate Gradient Method (continued)

At every iteration there is a way (to be shown later) to construct a new search direction pi such that

span{p0, p1, . . . , pi} ≡ Ki+1(A, r0) and (Api, pj) = 0 for i ≠ j.

Note: A is SPD ⇒ (Api, pj) ≡ (pi, pj)_A can be used as an inner product, i.e. p0, . . . , pi is an (·, ·)_A orthogonal basis for Ki+1(A, r0)

⇒ we can easily find xi+1 ≈ x as

xi+1 = x0 + α0 p0 + · · · + αi pi   s.t.

(Axi+1, pj) = (b, pj) for j = 0, . . . , i

Namely, because of the (·, ·)_A orthogonality of p0, . . . , pi, at iteration i + 1 we have to find only αi:

(Axi+1, pi) = (A(xi + αi pi), pi) = (b, pi)  ⇒  αi = (ri, pi) / (Api, pi)

Note: xi above actually can be replaced by any x0 + v, v ∈ Ki(A, r0) (Why?)


Page 12: Lect15_2

Conjugate Gradient Method (continued)

Conjugate Gradient Method

1: Compute r0 = b − Ax0 for some initial guess x0
2: for i = 0 to ... do
3:    ρi = ri^T ri
4:    if i = 0 then
5:       p0 = r0
6:    else
7:       pi = ri + (ρi / ρi−1) pi−1
8:    end if
9:    qi = Api
10:   αi = ρi / (pi^T qi)
11:   xi+1 = xi + αi pi
12:   ri+1 = ri − αi qi
13:   check convergence; continue if necessary
14: end for

Note:

One matrix-vector product per iteration (at line 9)

Two inner products per iteration (lines 3 and 10)

In exact arithmetic ri+1 = b − Axi+1 (apply A to both sides of line 11 and subtract from b to get line 12)

Update for xi+1 is as pointed out before, i.e. with

αi = (ri, ri) / (Api, pi) = (ri, pi) / (Api, pi)

since (ri, pi−1) = 0 (exercise)

Other relations to be proved (exercise)

pi's span the Krylov space
pi's are (·, ·)_A orthogonal, etc.
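For concreteness, here is a minimal NumPy sketch of the algorithm above (the function name cg and the tol/maxiter parameters are illustrative additions, not from the slides; A is assumed to be an SPD NumPy array):

    import numpy as np

    def cg(A, b, x0=None, tol=1e-8, maxiter=1000):
        # Conjugate Gradient for SPD A, following the numbered steps above.
        x = np.zeros_like(b) if x0 is None else x0.copy()
        r = b - A @ x                      # step 1: initial residual
        p = r.copy()
        rho = r @ r                        # step 3: rho_i = ri^T ri
        for _ in range(maxiter):
            q = A @ p                      # step 9: the one matrix-vector product per iteration
            alpha = rho / (p @ q)          # step 10
            x = x + alpha * p              # step 11
            r = r - alpha * q              # step 12
            rho_new = r @ r
            if np.sqrt(rho_new) < tol:     # step 13: convergence check on ||r||_2
                break
            p = r + (rho_new / rho) * p    # step 7: next A-orthogonal search direction
            rho = rho_new
        return x

As on the slide, each pass performs one matrix-vector product (q = A p) and two inner products (r^T r and p^T q).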


Page 13: Lect15_2

Conjugate Gradient Method (continued)

To sum it up:

In exact arithmetic we get the exact solution in at most n steps, i.e.

x = x0 + α0 p0 + · · · + αi pi + αi+1 pi+1 + · · · + αn−1 pn−1

At every iteration one more term αj pj is added to the current approximation

xi = x0 + α0 p0 + · · · + αi−1 pi−1
xi+1 = x0 + α0 p0 + · · · + αi−1 pi−1 + αi pi ≡ xi + αi pi

Note: we do not have to solve a linear system at every iteration because of the A-orthogonal basis that we manage to maintain and expand at every iteration

It can be proved that the error ei = x − xi satisfies

||ei||_A ≤ 2 ( (√k(A) − 1) / (√k(A) + 1) )^i ||e0||_A


Page 14: Lect15_2

Building an orthogonal basis for a Krylov subspace

We have seen the importance of

Defining projections

not just for linear solvers

Abstract linear solvers and eigen-solver formulations

A specific example

in CG where the basis for the Krylov subspaces is A-orthogonal (A is SPD)

We have seen how to build it

CGS, MGS, Cholesky or Householder based, etc.

These techniques can be used in a method specifically designed for Krylov subspaces (general non-Hermitian matrix), namely in

Arnoldi’s Method


Page 15: Lect15_2

Arnoldi’s Method

Arnoldi’s method:

Build an orthogonal basis for Km(A, r0)
A can be general, non-Hermitian

1: v1 = r0 / ||r0||2
2: for j = 1 to m do
3:    hij = (Avj, vi) for i = 1, . . . , j
4:    wj = Avj − h1j v1 − . . . − hjj vj
5:    hj+1,j = ||wj||2
6:    if hj+1,j = 0 Stop
7:    vj+1 = wj / hj+1,j
8: end for

Note:

This orthogonalization is based on CGS (line 4)

wj = Avj − (Avj, v1) v1 − . . . − (Avj, vj) vj

⇒ up to iteration j the vectors v1, . . . , vj are orthogonal

The space of this orthogonal basis grows by taking the next vector to be Avj

If we do not exit at step 6 we will have

Km(A, r0) = span{v1, v2, . . . , vm}

(exercise)
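Here is a minimal NumPy sketch of the CGS-based loop above (the function name arnoldi and the breakdown handling are illustrative; v1 is normalized as in step 1):

    import numpy as np

    def arnoldi(A, r0, m):
        # Build an orthonormal basis V of K_m(A, r0) and the Hessenberg matrix H
        # with H[i, j] = (A v_j, v_i); A can be general, non-Hermitian.
        n = r0.shape[0]
        V = np.zeros((n, m + 1))
        H = np.zeros((m + 1, m))
        V[:, 0] = r0 / np.linalg.norm(r0)            # step 1: v1 = r0 / ||r0||_2
        for j in range(m):
            w = A @ V[:, j]
            H[:j + 1, j] = V[:, :j + 1].T @ w        # step 3: h_ij = (A v_j, v_i)
            w -= V[:, :j + 1] @ H[:j + 1, j]         # step 4: CGS orthogonalization
            H[j + 1, j] = np.linalg.norm(w)          # step 5: h_{j+1,j} = ||w_j||_2
            if H[j + 1, j] == 0:                     # step 6: breakdown, K_j is invariant
                return V[:, :j + 1], H[:j + 1, :j + 1]
            V[:, j + 1] = w / H[j + 1, j]            # step 7
        return V, H                                  # V is n x (m+1), H is (m+1) x m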


Page 16: Lect15_2

Arnoldi’s Method (continued)

Arnoldi’s method in matrix notation

Denote

Vm ≡ [v1, . . . , vm],   Hm+1 = {hij}, an (m+1) × m matrix,

and by Hm the matrix Hm+1 without the last row.

Note that Hm is upper Hessenberg (zeros below the first subdiagonal) and

A Vm = Vm Hm + wm em^T

Vm^T A Vm = Hm

(exercise)


Page 17: Lect15_2

Arnoldi’s Method (continued)

Variations:

Explained using CGS

Can be implemented with MGS, Householder, etc.

How to use it in linear solvers?

Example with the Full Orthogonalization Method (FOM)


Page 18: Lect15_2

FOM

FOM

1: β = ||r0||2
2: Compute v1, . . . , vm with Arnoldi
3: ym = β Hm^(−1) e1
4: xm = x0 + Vm ym

Look for the solution in the form

xm = x0 + ym(1) v1 + · · · + ym(m) vm ≡ x0 + Vm ym

Petrov-Galerkin conditions will be

Vm^T A xm = Vm^T b

⇒ Vm^T A (x0 + Vm ym) = Vm^T b

⇒ Vm^T A Vm ym = Vm^T r0

⇒ Hm ym = Vm^T r0 = β e1

which is given by steps 3 and 4 of the algorithm
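A minimal sketch of the four steps, reusing the arnoldi sketch from the earlier slide (an assumption of this example); since Hm is small, step 3 is a cheap dense solve:

    import numpy as np

    def fom(A, b, x0, m):
        # Full Orthogonalization Method: Galerkin condition on K_m(A, r0).
        r0 = b - A @ x0
        beta = np.linalg.norm(r0)          # step 1
        V, H = arnoldi(A, r0, m)           # step 2 (Arnoldi sketch from the earlier slide)
        k = H.shape[1]                     # Krylov dimension actually built
        Hm = H[:k, :k]                     # square Hm (drop the last row)
        e1 = np.zeros(k); e1[0] = beta
        y = np.linalg.solve(Hm, e1)        # step 3: ym = beta * Hm^(-1) e1
        return x0 + V[:, :k] @ y           # step 4: xm = x0 + Vm ym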


Page 19: Lect15_2

Restarted FOM

What happens when m increases?

computation grows at least as O(m^2) · n

memory is O(mn)

A remedy is to restart the algorithm, leading to restarted FOM

FOM(m)

1: β = ||r0||2
2: Compute v1, . . . , vm with Arnoldi
3: ym = β Hm^(−1) e1
4: xm = x0 + Vm ym. Stop if residual is small enough.
5: Set x0 := xm and go to 1
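A sketch of the restart wrapper around the fom sketch above (the tolerance and restart limit are illustrative):

    import numpy as np

    def fom_restarted(A, b, x0, m, tol=1e-8, max_restarts=50):
        # FOM(m): keep memory and work bounded by restarting from the current iterate.
        x = x0.copy()
        for _ in range(max_restarts):
            x = fom(A, b, x, m)                    # one FOM cycle of dimension m
            if np.linalg.norm(b - A @ x) < tol:    # stop if the residual is small enough
                break
        return x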


Page 20: Lect15_2

GMRES

Generalized Minimum Residual Method (GMRES)
Similar to FOM

Again look for solution

xm = x0 + Vmym

where Vm is from the Arnoldi process (i.e. from Km(A, r0))
The test conditions Wm from the abstract formulation (slide 27, Lecture 7)

Wm^T A Vm ym = Wm^T r0

are Wm = A Vm.

The difference results in step 3 from FOM, namely

ym = β Hm^(−1) e1

being replaced by

ym = argmin_y ||β e1 − Hm+1 y||2
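A minimal sketch of one GMRES cycle, again reusing the arnoldi sketch (an assumption); the small least-squares problem is solved with a dense routine here, whereas production codes update a QR factorization of Hm+1 incrementally (see the next slide):

    import numpy as np

    def gmres(A, b, x0, m):
        # One GMRES cycle: minimize ||b - A x||_2 over x in x0 + K_m(A, r0).
        r0 = b - A @ x0
        beta = np.linalg.norm(r0)
        V, H = arnoldi(A, r0, m)                       # Arnoldi sketch from Part I
        k = H.shape[1]                                 # Krylov dimension actually built
        rhs = np.zeros(H.shape[0]); rhs[0] = beta      # beta * e1
        y, *_ = np.linalg.lstsq(H, rhs, rcond=None)    # ym = argmin_y ||beta e1 - H_{m+1} y||_2
        return x0 + V[:, :k] @ y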


Page 21: Lect15_2

GMRES

Similarly to FOM, GMRES can be defined with

Various orthogonalizations in the Arnoldi process

Restart

Note:

Solving the least squares (LS) problem

argmin_y ||β e1 − Hm+1 y||2

can be done with QR factorization as discussed in Lecture 7, Slide 25


Page 22: Lect15_2

Lanczos Algorithm

Can we improve on Arnoldi if A is symmetric?

Yes! Hm becomes symmetric, so it will be just tridiagonal

the simplification of Arnoldi in this case leads to the Lanczos Algorithm

Lanczos can be used in deriving CG

The Lanczos Algorithm

1: v1 = r0 / ||r0||2, β1 = 0, v0 = 0
2: for j = 1 to m do
3:    wj = Avj − βj vj−1
4:    αj = (wj, vj)
5:    wj = wj − αj vj
6:    βj+1 = ||wj||2. If βj+1 = 0 then Stop
7:    vj+1 = wj / βj+1
8: end for

Matrix Hm here is tridiagonal, with diagonal

hii = αi

and off-diagonal

hi,i+1 = βi+1

In exact arithmetic the vi's are orthogonal, but in practice orthogonality is lost rapidly
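A minimal NumPy sketch of the three-term recurrence above (the function name lanczos is illustrative; A is assumed symmetric). As the slide warns, a practical code would reorthogonalize, since orthogonality is lost in floating point:

    import numpy as np

    def lanczos(A, r0, m):
        # Lanczos for symmetric A: returns the basis Vm, the diagonal alpha and the
        # off-diagonal beta of the tridiagonal Hm.
        n = r0.shape[0]
        V = np.zeros((n, m + 1))
        alpha = np.zeros(m)
        beta = np.zeros(m + 1)                        # beta[0] plays the role of beta_1 = 0
        V[:, 0] = r0 / np.linalg.norm(r0)             # step 1
        for j in range(m):
            w = A @ V[:, j]
            if j > 0:
                w = w - beta[j] * V[:, j - 1]         # step 3 (the v_{j-1} term; v_0 = 0)
            alpha[j] = w @ V[:, j]                    # step 4
            w = w - alpha[j] * V[:, j]                # step 5
            beta[j + 1] = np.linalg.norm(w)           # step 6
            if beta[j + 1] == 0:
                return V[:, :j + 1], alpha[:j + 1], beta[1:j + 1]
            V[:, j + 1] = w / beta[j + 1]             # step 7
        return V[:, :m], alpha, beta[1:m]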


Page 23: Lect15_2

Choice of basis for the Krylov subspace

We saw how a different basis for the Krylov subspace is characteristic for various methods, e.g.

GMRES uses orthogonal

CG uses A-orthogonal

This is true for other methods as well

Conjugate Residual (CR; for symmetric problems) uses A^T A-orthogonal (i.e. the Api's are orthogonal)

A^T A-orthogonal basis can be generalized to the non-symmetric case as well, e.g. in the Generalized Conjugate Residual (GCR)


Page 24: Lect15_2

Other Krylov methods

We considered various methods that construct a basis for theKrylov subspaces

Another big class of methods is based on biorthogonalization (algorithm due to Lanczos)

For non-symmetric matrices, build a pair of bi-orthogonal bases for the two subspaces

Km(A, v1) = span{v1, Av1, . . . , A^(m−1) v1}

Km(A^T, w1) = span{w1, A^T w1, . . . , (A^T)^(m−1) w1}

Examples here are BCG and QMR (not to be discussed)

These methods are more difficult to analyze


Page 25: Lect15_2

Part II

Convergence and preconditioning


Page 26: Lect15_2

Convergence

Convergence can be analyzed by

Exploit the optimality properties (of the projection) when such properties exist

A useful tool is Chebyshev polynomials

Bounds depend on the condition number of the matrix, e.g.

in CG it is

||ei||_A ≤ 2 ( (√k(A) − 1) / (√k(A) + 1) )^i ||e0||_A


Page 27: Lect15_2

Preconditioning

Convergence can be slow or even stagnate

for ill-conditioned matrices (with large condition number)

But can be improved with preconditioning

xi+1 = xi + P(b − Axi )

Think of P as a preconditioner, an operator/matrix P ≈ A^(−1)

for P = A^(−1) it takes 1 iteration
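A tiny sketch of the iteration above (the function name and stopping parameters are illustrative); P is passed as an action on a vector, with the Jacobi choice shown as a usage comment:

    import numpy as np

    def preconditioned_richardson(A, b, apply_P, x0, tol=1e-8, maxiter=100):
        # Iterate x_{i+1} = x_i + P(b - A x_i), where apply_P(v) approximates A^(-1) v.
        x = x0.copy()
        for _ in range(maxiter):
            r = b - A @ x
            if np.linalg.norm(r) < tol:
                break
            x = x + apply_P(r)
        return x

    # Jacobi choice of P (inverse of the diagonal of A):
    # x = preconditioned_richardson(A, b, lambda r: r / np.diag(A), x0)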


Page 28: Lect15_2

Preconditioning

Properties desired in a preconditioner:

Should approximate A^(−1)

Should be easy to compute, apply to a vector, and store

Iterative solvers can be extended to support preconditioning(How?)


Page 29: Lect15_2

Preconditioning

Extending iterative solvers to support preconditioning

The same solver can be used but on a modified problem, e.g.

Problem Ax = b is transformed into

PAx = Pb

known as left preconditioning

Problem Ax = b is transformed into

APu = b, x = Pu

known as right preconditioning

Convergence of the modified problem would depend on k(PA) (e.g. with left preconditioning)
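A sketch of reusing an unpreconditioned solver on the transformed problems, assuming an explicit matrix P and, for the usage comment, the gmres sketch from Part I; in practice P is usually applied as an operator rather than formed explicitly:

    import numpy as np

    def solve_left_preconditioned(solver, A, b, P, x0, **kwargs):
        # Left preconditioning: run the same solver on P A x = P b.
        return solver(P @ A, P @ b, x0, **kwargs)

    def solve_right_preconditioned(solver, A, b, P, u0, **kwargs):
        # Right preconditioning: solve A P u = b, then recover x = P u.
        u = solver(A @ P, b, u0, **kwargs)
        return P @ u

    # e.g. x = solve_left_preconditioned(gmres, A, b, P, x0, m=30)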


Page 30: Lect15_2

Preconditioning

Examples:

Incomplete LU factorization (e.g. ILU(0))

Jacobi (inverse of the diagonal)

Other stationary iterative solvers (GS, SOR, SSOR)

Block preconditioners and domain decomposition

Additive Schwarz (think of Block-Jacobi)
Multiplicative Schwarz (think of Block-GS)


Page 31: Lect15_2

Preconditioning

Examples so far:

algebraic preconditioners, i.e. based exclusively on the matrix

Often, for problems coming from PDEs, PDE and discretization information can be used in designing a preconditioner, e.g.

FFTs can be used to approximate differential operators on regular grids (as in Fourier space the operators are diagonal matrices)

Grid and problem information to define multigrid preconditioners

Indefinite problems are often composed of sub-blocks that are definite: this is used in defining specific preconditioners and even in modifying solvers for these needs, etc.


Page 32: Lect15_2

Part III

Iterative eigen-solvers


Page 33: Lect15_2

Iterative Eigen-Solvers

How are iterative eigensolvers related to Krylov subspaces?

Remember the projection slides 29 & 30 from Lecture 7

Again, as in linear solvers, projection in a subspace is the basis for an iterative eigen-solver

V and W are often based on Krylov subspaces

Km(A, r0) = span{r0, Ar0, A^2 r0, . . . , A^(m−1) r0}

where r0 ≡ b − Ax0 and x0 is an initial guess.

Often parts of V or W are orthogonalized

For stability
The orthogonalization can be CGS, MGS, Cholesky or Householder based, etc.
The smaller Rayleigh-Ritz problems are usually solved with LAPACK routines
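A minimal sketch of the Rayleigh-Ritz idea on a Krylov subspace, reusing the arnoldi sketch from Part I (an assumption of this example); the small eigenproblem for Hm is handled by LAPACK underneath NumPy:

    import numpy as np

    def ritz_values(A, v0, m):
        # Rayleigh-Ritz on K_m(A, v0): eigenvalues of Hm = Vm^T A Vm (Ritz values)
        # approximate eigenvalues of A; for an eigenpair (theta, y) of Hm,
        # Vm @ y approximates the corresponding eigenvector (Ritz vector).
        V, H = arnoldi(A, v0, m)        # Arnoldi sketch from Part I
        k = H.shape[1]
        Hm = H[:k, :]                   # square Hm (drop the last row)
        return np.linalg.eigvals(Hm)    # small dense eigenproblem (LAPACK)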


Page 34: Lect15_2

Learning Goals

A brief introduction to Krylov iterative solvers and eigen-solvers

Links to building blocks that we have already covered

Abstract formulation
Projection, and
Orthogonalization

Specific examples and issues (preconditioning, parallelization, etc.)
