
Iterative linear solvers: Conjugate Gradients

Tessa Uroić, Ph.D.
Chair of Turbomachinery
Department of Energy, Power Engineering and Environment
Faculty of Mechanical Engineering and Naval Architecture, University of Zagreb
[email protected]

OFW15, June 23, 2020

This lecture was inspired by J.R. Shewchuk (An Introduction to the Conjugate Gradient Method Without the Agonizing Pain, 1994).


Overview

1 Positive-definite matrix
2 Method of Steepest Descent
3 Method of Conjugate Directions
4 Conjugate Gradient Method
5 Preconditioning
6 Other variants of Krylov subspace solvers
7 Summary


What is a positive-definite matrix?

• A matrix A is positive-definite if, for every nonzero vector x:

  x^T A x > 0

• This is not a very intuitive idea, and it is hard to imagine how a positive-definite matrix might look different from one that isn't. We will get a feeling for what positive-definiteness is about when we see how it affects the shape of quadratic forms.
• A quadratic form is simply a scalar, quadratic function of a vector, of the form

  f(x) = (1/2) x^T A x − b^T x + c

  where A is a matrix, x and b are vectors, and c is a scalar constant.
• Take this example, with c = 0:

  [3 2; 2 6] [x1; x2] = [2; −8]

• The solution lies at the intersection of n hyperplanes, each having dimension n − 1. For this problem, the solution is [x1; x2] = [2; −2]; a numerical check follows below.
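The example is small enough to verify directly. A minimal NumPy sketch (the solver call and the eigenvalue test are standard NumPy routines, not part of the lecture):

  import numpy as np

  A = np.array([[3.0, 2.0],
                [2.0, 6.0]])
  b = np.array([2.0, -8.0])

  # Solve Ax = b directly and confirm the solution (2, -2) quoted above.
  x = np.linalg.solve(A, b)
  print(x)                      # [ 2. -2.]

  # Positive-definiteness: x^T A x > 0 for all nonzero x, which for a
  # symmetric matrix is equivalent to all eigenvalues being positive.
  print(np.linalg.eigvalsh(A))  # [2. 7.] -> both positive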


Quadratic form

Figure 1: Left: graph of a quadratic form. The minimum point is the solution to Ax = b. Right: contours of the quadratic form. Each ellipsoidal curve has constant f(x).


Quadratic form

• The gradient of a quadratic form is a vector field that, for a given point x, points in the direction of greatest increase of f(x):

  f′(x) = [∂f/∂x1, ∂f/∂x2, ..., ∂f/∂xn]^T

• Now, calculate the gradient of the quadratic form:

  f′(x) = (1/2) A^T x + (1/2) A x − b

• If A is symmetric, this reduces to

  f′(x) = A x − b

• Setting the gradient to zero, we obtain the system of equations we wish to solve. Therefore, the solution to Ax = b is a critical point of f(x). If A is positive-definite and symmetric, the solution of the system is the minimum of f(x). A numerical check of the gradient follows below.
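As a sanity check, the analytic gradient A x − b can be compared with a finite-difference approximation of f. A small sketch reusing the example matrix (the step size h and the test point are arbitrary choices):

  import numpy as np

  A = np.array([[3.0, 2.0], [2.0, 6.0]])
  b = np.array([2.0, -8.0])

  def f(x):
      # Quadratic form f(x) = 1/2 x^T A x - b^T x  (c = 0).
      return 0.5 * x @ A @ x - b @ x

  x = np.array([1.0, 1.0])
  grad = A @ x - b                  # analytic gradient (A is symmetric)

  # Central finite differences for comparison.
  h = 1e-6
  num = np.array([(f(x + h*e) - f(x - h*e)) / (2*h) for e in np.eye(2)])
  print(grad, num)                  # both approximately [ 3. 16.]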



Quadratic form

Figure 2: Gradient f′(x) of the quadratic form. For every x, the gradient points in the direction of steepest increase of f(x) and is orthogonal to the contour lines.


Quadratic form

• The fact that f(x) is a paraboloid is our best intuition of what it means for a matrix to be positive-definite. If A is not positive-definite, there are several other possibilities:
  • A could be negative-definite: the result of negating a positive-definite matrix (hold the paraboloid upside-down).
  • A might be singular, in which case no solution is unique; the set of solutions is a line or hyperplane having a uniform value of f.
  • If A is none of the above, then x is a saddle point, and techniques like Steepest Descent and CG will likely fail.
• The values of b and c determine where the minimum point of the paraboloid lies, but do not affect the paraboloid's shape. (A small classification sketch follows below.)
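For a symmetric matrix, these cases can be told apart by the signs of the eigenvalues. A classification sketch (the test matrices and the tolerance are illustrative choices, not from the lecture):

  import numpy as np

  def classify(A, tol=1e-12):
      # Symmetric case: the sign pattern of the eigenvalues decides.
      lam = np.linalg.eigvalsh(A)
      if np.all(lam > tol):           return "positive-definite"
      if np.all(lam < -tol):          return "negative-definite"
      if np.any(np.abs(lam) <= tol):  return "singular"
      return "indefinite"

  print(classify(np.array([[3.0, 2.0], [2.0, 6.0]])))   # positive-definite
  print(classify(-np.array([[3.0, 2.0], [2.0, 6.0]])))  # negative-definite
  print(classify(np.array([[1.0, 0.0], [0.0, -1.0]])))  # indefinite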



Quadratic form

Figure 3: (a) Quadratic form for a positive-definite matrix. (b) For a negative-definite matrix. (c) For a singular (and positive-indefinite) matrix; a line that runs through the bottom of the valley is the set of solutions. (d) For an indefinite matrix; because the solution is a saddle point, Steepest Descent and CG will not work. In three dimensions or higher, a singular matrix can also have a saddle.


Method of Steepest Descent

• Let's recap:
  • The error is a vector which indicates how far we are from the solution:

    e(i) = x(i) − x

  • The residual indicates how far we are from the correct value of the right-hand side of the system:

    r(i) = b − A x(i)

  • The residual is actually the error transformed by A into the space of b:

    r(i) = −A e(i)

  • More importantly:

    r(i) = −f′(x(i))

• In the method of Steepest Descent, we start at an arbitrary point x(0) and take a series of steps x(1), x(2), ... sliding down the paraboloid, until we are satisfied that we are close enough to the solution x. When we take a step, we choose the direction in which f decreases most quickly, which is the direction opposite to f′(x(i)). So the residual is the direction of steepest descent.
• But how big a step should we take?


Line search

• A line search is a procedure that chooses α to minimize f along a line. We are restricted to choosing a point on the intersection of the vertical plane and the paraboloid (Figure 4b); Figure 4c shows the parabola at that intersection. What is the value of α at the bottom of the parabola?
• From calculus, α minimizes f when

  (d/dα) f(x(1)) = 0

• By the chain rule:

  (d/dα) f(x(1)) = f′(x(1))^T (d/dα) x(1) = f′(x(1))^T r(0)

• Setting this expression to zero (it is a scalar product of two vectors) leads to the conclusion that α should be chosen so that r(0) and f′(x(1)) are orthogonal (Figure 4d).


Line search

Figure 4: The method of Steepest Descent. (a) Starting at x(0) = [−2; −2], take a step in the direction of steepest descent of f. (b) Find the point on the intersection of these two surfaces that minimizes f. (c) This parabola is the intersection of surfaces; the bottommost point is our target. (d) The gradient at the bottommost point is orthogonal to the gradient of the previous step.


Line search

A visual explanation: look at the figure below, which shows the gradient vectors at various points along the search line. The slope of the parabola at any point is equal to the magnitude of the projection of the gradient onto the line. These projections represent the rate of increase of f as one traverses the search line; f is minimized where the projection is zero, i.e. where the gradient is orthogonal to the search line.

Figure 5: The gradient f′ is shown at several locations along the search line (solid arrows). Each gradient's projection onto the line is also shown (dotted arrows). The gradient vectors represent the direction of steepest increase of f, and the projections represent the rate of increase as one traverses the search line. On the search line, f is minimized where the gradient is orthogonal to the search line.


Line search

To determine α:

  f′(x(1)) = −r(1)
  r(1)^T r(0) = 0
  (b − A x(1))^T r(0) = 0
  [b − A (x(0) + α r(0))]^T r(0) = 0
  (b − A x(0))^T r(0) − α (A r(0))^T r(0) = 0
  (b − A x(0))^T r(0) = α (A r(0))^T r(0)
  r(0)^T r(0) = α r(0)^T A r(0)

  α = (r(0)^T r(0)) / (r(0)^T A r(0))
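Plugging the example problem into this formula confirms the derivation: after one optimal step, the new residual is orthogonal to the old one. A NumPy sketch:

  import numpy as np

  A = np.array([[3.0, 2.0], [2.0, 6.0]])
  b = np.array([2.0, -8.0])

  x0 = np.array([-2.0, -2.0])
  r0 = b - A @ x0
  alpha = (r0 @ r0) / (r0 @ (A @ r0))   # the formula just derived

  x1 = x0 + alpha * r0
  r1 = b - A @ x1
  print(r1 @ r0)                        # ~0: r(1) is orthogonal to r(0)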



Method of Steepest Descent

Finally, the method of Steepest Descent is:

  r(i) = b − A x(i)
  α(i) = (r(i)^T r(i)) / (r(i)^T A r(i))
  x(i+1) = x(i) + α(i) r(i)

Figure 6: Here, the method of Steepest Descent starts at x(0) = [−2; −2] and converges at x = [2; −2]. Note the zig-zag path, which appears because each gradient is orthogonal to the previous gradient.
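The three update formulas translate directly into a loop. A minimal sketch (the tolerance and the iteration cap are arbitrary choices, not from the slides):

  import numpy as np

  def steepest_descent(A, b, x, tol=1e-8, max_iter=1000):
      for _ in range(max_iter):
          r = b - A @ x                     # residual r(i) = b - A x(i)
          if np.linalg.norm(r) < tol:
              break
          alpha = (r @ r) / (r @ (A @ r))   # optimal step length along r(i)
          x = x + alpha * r                 # x(i+1) = x(i) + alpha r(i)
      return x

  A = np.array([[3.0, 2.0], [2.0, 6.0]])
  b = np.array([2.0, -8.0])
  print(steepest_descent(A, b, np.array([-2.0, -2.0])))  # -> [ 2. -2.]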


Method of Conjugate Directions

• Steepest Descent often finds itself taking steps in the same direction as earlier steps. Wouldn't it be better if, every time we took a step, we got it right the first time?
• Let's pick a set of orthogonal search directions d(0), d(1), ..., d(n−1). In each search direction, we'll take exactly one step, and that step will be just the right length to line up evenly with x. After n steps, we'll be done.
• For each step, choose a point:

  x(i+1) = x(i) + α(i) d(i)

• From the fact that the search direction should be orthogonal to the error term, find α: since e(i+1) = e(i) + α(i) d(i),

  d(i)^T e(i+1) = 0  ⟹  α(i) = −(d(i)^T e(i)) / (d(i)^T d(i))

• But we don't know the error term, so we haven't accomplished anything.


Method of Conjugate Directions

• Solution: make the search directions A-orthogonal, or conjugate. Two vectors d(i) and d(j) are conjugate if:

  d(i)^T A d(j) = 0

• Requiring the error e(i+1) to be A-orthogonal to d(i) again corresponds to finding the minimum point along the search direction d(i).
• Figure 7a shows what A-orthogonal vectors look like. Imagine stretching the paper to deform the picture until the ellipses appeared circular: the vectors would then appear orthogonal, as in Figure 7b.

Figure 7: (a) A-orthogonal vectors. (b) Orthogonal vectors.


Method of Conjugate Directions

• Expression for α using conjugate directions:

  α(i) = −(d(i)^T A e(i)) / (d(i)^T A d(i)) = (d(i)^T r(i)) / (d(i)^T A d(i))

• To generate a set of A-orthogonal search directions, there is a simple process called the conjugate Gram–Schmidt process.
• When we take a step in a search direction, we never need to step in that direction again: the error term is A-orthogonal to all the old search directions, and consequently the residual is orthogonal to them, because r(i) = −A e(i).

Figure 8: The method of Conjugate Directions converges in n steps. (a) The first step is taken along some direction d(0); the minimum point x(1) is chosen by the constraint that e(1) must be A-orthogonal to d(0). (b) The initial error e(0) can be expressed as a sum of A-orthogonal components (gray arrows); each step of Conjugate Directions eliminates one of these components.


Gram–Schmidt conjugation

• We have a set of n linearly independent vectors u(0), u(1), ..., u(n−1). To construct the search direction d(i), take u(i) and subtract out any components that are not A-orthogonal to the previous d vectors.
• Set d(0) = u(0), and for i > 0:

  d(i) = u(i) + Σ_{k=0}^{i−1} β(ik) d(k)

• To find the β(ik): because the search directions are A-orthogonal, it is possible to eliminate all the β(ik) values but one by taking the inner product of this expression with A d(j):

  d(i)^T A d(j) = u(i)^T A d(j) + Σ_{k=0}^{i−1} β(ik) d(k)^T A d(j)

  0 = u(i)^T A d(j) + β(ij) d(j)^T A d(j),  for i > j

  β(ij) = −(u(i)^T A d(j)) / (d(j)^T A d(j))

• The difficulty of using this method is that all the old search vectors must be kept in memory to construct each new one, as the sketch below makes explicit.
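A direct transcription of the process (a sketch; the column-vector layout is my convention, not the lecture's):

  import numpy as np

  def conjugate_gram_schmidt(A, U):
      # Turn the columns of U into directions with d(i)^T A d(j) = 0, i != j.
      n = U.shape[1]
      D = U.copy().astype(float)    # d(0) = u(0)
      for i in range(1, n):
          for j in range(i):        # every old direction is needed: O(n) storage
              beta = -(U[:, i] @ (A @ D[:, j])) / (D[:, j] @ (A @ D[:, j]))
              D[:, i] += beta * D[:, j]
      return D

  A = np.array([[3.0, 2.0], [2.0, 6.0]])
  D = conjugate_gram_schmidt(A, np.eye(2))
  print(D[:, 0] @ (A @ D[:, 1]))    # ~0: the two directions are conjugate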



Gram–Schmidt conjugation

Figure 9: Gram–Schmidt conjugation of two vectors. Begin with two linearly independent vectors u(0) and u(1). Set d(0) = u(0). The vector u(1) is composed of two components: u*, which is A-orthogonal (conjugate) to d(0), and u+, which is parallel to d(0). After conjugation, only the A-orthogonal component remains, and d(1) = u*.


Conjugate Gradient Method

• The Conjugate Gradient Method is simply the method of Conjugate Directions where the search directions are constructed by conjugation of the residuals (that is, u(i) = r(i)).
• Why the residual? The residual has the nice property that it is orthogonal to the previous search directions, so it is guaranteed to produce a new, linearly independent search direction unless the residual is zero, in which case the problem is already solved.

Figure 10: In the method of Conjugate Gradients, each new residual is orthogonal to all the previous residuals and search directions, and each new search direction is constructed (from the residual) to be A-orthogonal to all the previous residuals and search directions. The endpoints of r(2) and d(2) lie on a plane parallel to D(2) (the shaded subspace); d(2) is a linear combination of r(2) and d(1).


Conjugate Gradient Method

• Because the search vectors are built from the residuals, the subspace span{r(0), r(1), ..., r(i−1)} is equal to D(i).
• As each residual is orthogonal to the previous search directions, it is also orthogonal to the previous residuals: r(i)^T r(j) = 0 for i ≠ j.
• It can be shown that each new residual is a linear combination of the previous residual and A d(i):

  r(i+1) = −A e(i+1) = −A (e(i) + α(i) d(i)) = r(i) − α(i) A d(i)

• Since d(i) ∈ D(i+1), each new subspace D(i+1) is formed from the union of the previous subspace D(i) and the subspace A D(i).
• Such a subspace is called a Krylov subspace: a subspace created by repeatedly applying a matrix to a vector.
• Because A D(i) is included in D(i+1), the fact that the next residual r(i+1) is orthogonal to D(i+1) implies that r(i+1) is A-orthogonal to D(i). Gram–Schmidt conjugation becomes easy, because r(i+1) is already A-orthogonal to all of the previous search directions except d(i). A sketch of the resulting algorithm follows below.
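Putting the pieces together gives the standard CG iteration, in which the Gram–Schmidt sum collapses to a single coefficient β per step (the simplification β = (r_new^T r_new)/(r_old^T r_old) is derived in Shewchuk's notes). A sketch of the textbook algorithm:

  import numpy as np

  def conjugate_gradient(A, b, x, tol=1e-10, max_iter=None):
      r = b - A @ x
      d = r.copy()                    # first search direction: the residual
      rs_old = r @ r
      for _ in range(max_iter or len(b)):
          Ad = A @ d
          alpha = rs_old / (d @ Ad)   # step length along d(i)
          x = x + alpha * d
          r = r - alpha * Ad          # r(i+1) = r(i) - alpha A d(i)
          rs_new = r @ r
          if np.sqrt(rs_new) < tol:
              break
          beta = rs_new / rs_old      # the single surviving Gram-Schmidt term
          d = r + beta * d            # new direction: conjugated residual
          rs_old = rs_new
      return x

  A = np.array([[3.0, 2.0], [2.0, 6.0]])
  b = np.array([2.0, -8.0])
  print(conjugate_gradient(A, b, np.zeros(2)))   # -> [ 2. -2.] in <= 2 steps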



Preconditioning

• Convergence of Krylov subspace methods depends (in a complicated way) on the spectral properties of the matrix: the eigenvalue distribution, the field of values, the condition number. In practice, for large-scale problems, CG methods are always used with a preconditioner.
• The idea is to find an operator M such that M^{-1} A has better (but still unknown) spectral properties.

Where does the idea of preconditioning come from?

• For M = A:

  M^{-1} A x = A^{-1} A x = I x = x = M^{-1} b

  • The system would become ideal, since the matrix would reduce to an identity matrix and all subspace methods would deliver the true solution in one single step.
  • But inverting A is too expensive!
• The hope is that, for M in some sense close to A, a Krylov method applied to

  M^{-1} A x = M^{-1} b

  would need only a few iterations to yield a close enough approximation of the solution.


Preconditioning

• To be an efficient preconditioner, the linear operator M should satisfy:
  • M is a good approximation of A in some sense.
  • The cost of constructing M is not prohibitive.
  • The system My = z is much easier to solve than the original system.
• Except for some trivial situations, the matrix M^{-1} A is never formed explicitly! (We would usually get a dense matrix, which would kill the efficiency.)
• Instead, each required application of M^{-1} A to some vector y is split in two:

  A y = w,  then  M z = w  (so that z = M^{-1} A y)

  • Apply A to y and calculate w.
  • Apply M^{-1} to w and calculate z (done by solving Mz = w).
• Only very special and simple preconditioners can be applied explicitly to A (for example, a diagonal preconditioner). A sketch of the two-step application follows below.
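In code, the two-step application looks as follows; a sketch using a diagonal (Jacobi) preconditioner, for which solving Mz = w is a componentwise division:

  import numpy as np

  A = np.array([[3.0, 2.0], [2.0, 6.0]])
  M_diag = np.diag(A)                # Jacobi preconditioner: M = diag(A)

  def apply_preconditioned(A, M_diag, y):
      w = A @ y                      # first apply A to y
      z = w / M_diag                 # then solve M z = w (trivial: M diagonal)
      return z                       # z = M^{-1} A y, without ever forming M^{-1} A

  print(apply_preconditioned(A, M_diag, np.array([1.0, 1.0])))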



Preconditioning

Applying the same preconditioner on either side of the matrix gives the same eigenvalues (M^{-1}A and AM^{-1} are similar matrices); however, convergence also depends on the eigenvectors, specifically on the components of the starting residual in the eigenvector directions. Different implementations of a preconditioner can have quite different eigenvectors, and thus different convergence behaviour.

Left preconditioning
• Apply the iterative method to

  M^{-1} A x = M^{-1} b

• Note that with left preconditioning we are minimizing the preconditioned residual M^{-1}(b − A x(k)), which may be quite different from the residual b − A x(k). This can have consequences for stopping criteria based on the norm of the residual.

Right preconditioning
• Apply the iterative method to

  A M^{-1} y = b,  with M x = y

• The advantage of right preconditioning is that it affects only the operator and not the right-hand side.
• Note that the error norm in y may be much smaller than the error norm in x.


Preconditioning

• Preconditioning can be considered an attempt to stretch the quadratic form to make it appear more spherical (the eigenvalues become more clustered; the form is scaled along the eigenvector axes). A perfect preconditioner is M = A, because then M^{-1} A is the identity matrix and the system can be solved in a single iteration; but computing A^{-1} is equivalent to solving the original system.
• The simplest preconditioner is the Jacobi or diagonal preconditioner, which scales the quadratic form along the coordinate axes. It is fast because a diagonal matrix is easy to invert.
• A better preconditioner is Cholesky preconditioning, a technique for factoring a matrix into the form L L^T, where L is a lower triangular matrix. It is a version of the LU factorization for symmetric matrices.
• The LU factorization is performed by a modified Gaussian elimination: eliminate the entries below the main diagonal to obtain the upper triangular matrix U, and remember the multiple of row k that is subtracted from row i to zero the coefficient a(ik); this multiple is the subdiagonal entry of L. The elimination update is

  a(ij) = a(ij) − a(ik) a(kk)^{-1} a(kj)  for all i, j > k    (1)
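Specialized to the symmetric case, the elimination update (1) yields a compact Cholesky factorization. A dense sketch (incomplete variants used as preconditioners follow the same recipe but skip updates outside a chosen sparsity pattern):

  import numpy as np

  def cholesky(A):
      # Factor a symmetric positive-definite A as L L^T.
      n = A.shape[0]
      L = np.zeros_like(A, dtype=float)
      for k in range(n):
          L[k, k] = np.sqrt(A[k, k] - L[k, :k] @ L[k, :k])
          for i in range(k + 1, n):
              # Multiple of "row k" that eliminates a(ik), cf. update (1).
              L[i, k] = (A[i, k] - L[i, :k] @ L[k, :k]) / L[k, k]
      return L

  A = np.array([[3.0, 2.0], [2.0, 6.0]])
  L = cholesky(A)
  print(np.allclose(L @ L.T, A))     # True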



Other variants of Krylov subspace solvers

Generalized Minimal Residual (GMRES)
• Applicable to unsymmetric systems.
• Generates a sequence of orthogonal vectors, but in the absence of symmetry all previously computed vectors in the orthogonal sequence have to be retained.
• Minimizes the residual norm.
• Restarted versions of the method are used to save memory.

Biconjugate Gradient (BiCG)
• Replaces the orthogonal sequence of residuals by two mutually orthogonal sequences, at the price of no longer providing a minimization.
• The update relations for residuals in CG are augmented in BiCG by relations that are similar but based on A^T instead of A.

Biconjugate Gradient Stabilized (BiCGStab)
• Can be interpreted as a product of BiCG and repeatedly applied GMRES. At least locally, a residual vector is minimized, which leads to smoother convergence.
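These variants (and CG itself) are available off the shelf, for example in SciPy's sparse solvers. A usage sketch (solver defaults are SciPy's, not the lecture's; the example matrix is the one from the earlier slides):

  import numpy as np
  from scipy.sparse.linalg import cg, bicgstab, gmres

  A = np.array([[3.0, 2.0], [2.0, 6.0]])   # symmetric positive-definite
  b = np.array([2.0, -8.0])

  x, info = cg(A, b)        # CG: symmetric positive-definite systems
  print(x, info)            # info == 0 signals convergence

  x, info = bicgstab(A, b)  # BiCGStab: also handles unsymmetric systems
  x, info = gmres(A, b)     # GMRES: unsymmetric; restarting saves memory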



Summary

• CG solver: for symmetric matrices.
• BiCG, BiCGStab, GMRES: for unsymmetric matrices.
• Use preconditioners with Krylov subspace solvers (diagonal or some variant of ILU).

Used and recommended literature:
• Y. Saad: Iterative Methods for Sparse Linear Systems, 2000.
• J.R. Shewchuk: An Introduction to the Conjugate Gradient Method Without the Agonizing Pain, 1994.
• H.A. van der Vorst: Iterative Krylov Methods for Large Linear Systems, 2003.

