Painless Conjugate Gradient Figs

Classroom Figures for the Conjugate Gradient Method Without the Agonizing Pain

Edition 1 1/4

Jonathan Richard Shewchuk

August 4, 1994

School of Computer Science

Carnegie Mellon University

Pittsburgh, PA 15213

Abstract

This report contains a set of full-page figures designed to be used as classroom transparencies for teaching from the article "An Introduction to the Conjugate Gradient Method Without the Agonizing Pain".

Supported in part by the Natural Sciences and Engineering Research Council of Canada under a 1967 Science and Engineering Scholarship and by the National Science Foundation under Grant ASC-9318163. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either express or implied, of NSERC, NSF, or the U.S. Government.


Keywords: conjugate gradient method, transparencies, agonizing pain


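Sample 2-d linear system of the form $Ax = b$:

$$\begin{bmatrix} 3 & 2 \\ 2 & 6 \end{bmatrix} x = \begin{bmatrix} 2 \\ -8 \end{bmatrix}$$

[Plot: the lines $3x_1 + 2x_2 = 2$ and $2x_1 + 6x_2 = -8$ on axes $x_1$, $x_2$.]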
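As a quick check of the sample system, here is a minimal sketch in plain Python (no libraries) that solves it with Cramer's rule. The helper name `solve_2x2` is an illustrative choice, not something from the report.

```python
# Sample system from the figure: A x = b.
A = [[3.0, 2.0],
     [2.0, 6.0]]
b = [2.0, -8.0]

def solve_2x2(A, b):
    """Solve a 2x2 linear system with Cramer's rule (hypothetical helper)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    x1 = (b[0] * A[1][1] - A[0][1] * b[1]) / det
    x2 = (A[0][0] * b[1] - b[0] * A[1][0]) / det
    return [x1, x2]

x = solve_2x2(A, b)
print(x)  # the point where the two lines intersect
```

The two lines of the system cross at this point, which is also where the quadratic form built from the same $A$ and $b$ attains its minimum.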


Graph of quadratic form $f(x) = \frac{1}{2} x^T A x - b^T x + c$. The minimum point of this surface is the solution to $Ax = b$.

Contours of the quadratic form. Each ellipsoidal curve has constant $f(x)$.


Gradient $f'(x)$ of the quadratic form. For every $x$, the gradient points in the direction of steepest increase of $f(x)$, and is orthogonal to the contour lines.
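For a symmetric matrix, the gradient of the quadratic form $f(x) = \frac{1}{2} x^T A x - b^T x + c$ is $f'(x) = Ax - b$. The sketch below evaluates both for the sample system and compares the analytic gradient with a finite-difference estimate. The value $c = 0$ and all helper names are assumptions made for illustration.

```python
A = [[3.0, 2.0], [2.0, 6.0]]
b = [2.0, -8.0]
c = 0.0  # assumed constant term

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def f(x):
    """Quadratic form f(x) = 1/2 x^T A x - b^T x + c."""
    return 0.5 * dot(x, matvec(A, x)) - dot(b, x) + c

def grad(x):
    """Analytic gradient A x - b (valid because A is symmetric)."""
    return [ax - bi for ax, bi in zip(matvec(A, x), b)]

def numeric_grad(x, h=1e-6):
    """Central-difference estimate of the gradient, for comparison."""
    g = []
    for k in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[k] += h
        xm[k] -= h
        g.append((f(xp) - f(xm)) / (2.0 * h))
    return g

x_star = [2.0, -2.0]   # solution of A x = b
x_test = [-2.0, -2.0]  # arbitrary test point

print(grad(x_star))    # the gradient vanishes at the minimum
print(f(x_star))       # value of f at the minimum
print(grad(x_test), numeric_grad(x_test))
```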


(a) Quadratic form for a positive-definite matrix.
(b) For a negative-definite matrix.
(c) For a singular (and positive-indefinite) matrix. A line that runs through the bottom of the valley is the set of solutions.
(d) For an indefinite matrix.
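The four cases can be told apart by the signs of the eigenvalues. The following sketch classifies a symmetric 2x2 matrix using the closed-form eigenvalues. Only the first matrix is the sample matrix; the others are hypothetical examples, and the function names are illustrative.

```python
import math

def eigenvalues_sym_2x2(A):
    """Eigenvalues of a symmetric 2x2 matrix, smallest first."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    return (tr - disc) / 2.0, (tr + disc) / 2.0

def classify(A, tol=1e-12):
    """Name the shape of the quadratic form from the eigenvalue signs."""
    lo, hi = eigenvalues_sym_2x2(A)
    if lo > tol:
        return "positive-definite"
    if hi < -tol:
        return "negative-definite"
    if lo < -tol and hi > tol:
        return "indefinite"
    return "singular"  # at least one eigenvalue is (numerically) zero

print(classify([[3.0, 2.0], [2.0, 6.0]]))      # sample matrix
print(classify([[-3.0, -2.0], [-2.0, -6.0]]))  # hypothetical: negated sample
print(classify([[1.0, 2.0], [2.0, 4.0]]))      # hypothetical: rank one
print(classify([[1.0, 0.0], [0.0, -1.0]]))     # hypothetical: saddle
```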


The method of Steepest Descent (four panels, (a) through (d)).


Solid arrows: Gradients.
Dotted arrows: Slope along search line.


The method of Steepest Descent.
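A minimal sketch of Steepest Descent for the sample system follows. Each step moves along the residual $r = b - Ax$ (the negative gradient) by the exact line-search step $\alpha = r^T r / r^T A r$. The starting point $[-2, -2]^T$, the tolerance, and the function names are assumptions chosen for illustration.

```python
A = [[3.0, 2.0], [2.0, 6.0]]
b = [2.0, -8.0]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def steepest_descent(A, b, x0, tol=1e-10, max_iter=1000):
    """Minimize the quadratic form by repeated exact line searches."""
    x = list(x0)
    for i in range(max_iter):
        # Residual r = b - A x, which is the direction of steepest descent.
        r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        rr = dot(r, r)
        if rr ** 0.5 < tol:
            return x, i
        # Step length that minimizes f along the line x + alpha * r.
        alpha = rr / dot(r, matvec(A, r))
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
    return x, max_iter

x, iters = steepest_descent(A, b, [-2.0, -2.0])
print(x, iters)
```

Successive search directions are orthogonal, which produces the zigzag path seen in the figure.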


$v$ is an eigenvector of $B$ with a corresponding eigenvalue of $-0.5$. As $i$ increases, $B^i v$ converges to zero.

Here, $v$ has a corresponding eigenvalue of 2. As $i$ increases, $B^i v$ diverges to infinity.

$x$ is a linear combination of two eigenvectors, $v_1$ and $v_2$. One eigenvector diverges, so $B^i x$ also diverges.
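The effect of repeated multiplication can be reproduced with a small hypothetical matrix whose eigenvalues are $-0.5$ and $2$. The matrix `B` and the vectors below are chosen for illustration only and are not the ones drawn in the figures.

```python
# Hypothetical symmetric matrix: eigenvalue -0.5 for [1, -1], eigenvalue 2 for [1, 1].
B = [[0.75, 1.25],
     [1.25, 0.75]]

def apply_power(B, x, i):
    """Return B^i x by applying B to x a total of i times."""
    for _ in range(i):
        x = [sum(bij * xj for bij, xj in zip(row, x)) for row in B]
    return x

def norm(x):
    return sum(xi * xi for xi in x) ** 0.5

v1 = [1.0, -1.0]  # eigenvector with eigenvalue -0.5: shrinks, flipping sign each step
v2 = [1.0, 1.0]   # eigenvector with eigenvalue 2: doubles each step
x = [2.0, 0.0]    # x = v1 + v2

for i in (0, 1, 5, 10):
    print(i,
          norm(apply_power(B, v1, i)),
          norm(apply_power(B, v2, i)),
          norm(apply_power(B, x, i)))
```

The component of `x` along `v1` dies out while the component along `v2` grows without bound, so the norm of $B^i x$ grows as well.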


The eigenvectors of $A$ are directed along the axes of the paraboloid defined by the quadratic form $f(x)$. Each eigenvector is labeled with its associated eigenvalue (7 and 2).
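The eigenvalues of the sample matrix can be checked by hand or with the short sketch below. The vectors $[1, 2]^T$ and $[-2, 1]^T$ are eigenvectors worked out for this sketch rather than read from the figure; the code verifies them directly.

```python
import math

A = [[3.0, 2.0], [2.0, 6.0]]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Closed-form eigenvalues of a symmetric 2x2 matrix.
tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
disc = math.sqrt(tr * tr - 4.0 * det)
lam_min, lam_max = (tr - disc) / 2.0, (tr + disc) / 2.0

v_max = [1.0, 2.0]    # candidate eigenvector for the larger eigenvalue
v_min = [-2.0, 1.0]   # candidate eigenvector for the smaller eigenvalue

print(lam_min, lam_max)
print(matvec(A, v_max))   # equals lam_max * v_max
print(matvec(A, v_min))   # equals lam_min * v_min
print(lam_max / lam_min)  # spectral condition number
```

The steep axis of the paraboloid corresponds to the larger eigenvalue and the shallow axis to the smaller one.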


Convergence of the Jacobi Method (panels (a) through (f)).

In (a), the eigenvectors of $B$ are shown with their corresponding eigenvalues ($-0.47$ and $0.47$). These eigenvectors are NOT the axes of the paraboloid.
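A sketch of the Jacobi iteration for the sample system is given below. It assumes the usual splitting $A = D + E$ with $D$ the diagonal part, so that the iteration matrix is $B = -D^{-1}E$. The starting point and iteration count are arbitrary choices for illustration.

```python
A = [[3.0, 2.0], [2.0, 6.0]]
b = [2.0, -8.0]

def jacobi(A, b, x0, iterations):
    """Jacobi iteration: x_new[i] = (b[i] - sum_{j != i} A[i][j] * x[j]) / A[i][i]."""
    x = list(x0)
    n = len(b)
    for _ in range(iterations):
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x

# For a 2x2 system the iteration matrix B = -D^{-1} E has a zero diagonal,
# so its eigenvalues are +/- sqrt(B[0][1] * B[1][0]).
B01 = -A[0][1] / A[0][0]
B10 = -A[1][0] / A[1][1]
spectral_radius = (B01 * B10) ** 0.5

print(spectral_radius)                  # about 0.47
print(jacobi(A, b, [-2.0, -2.0], 50))   # approaches the solution [2, -2]
```

Because the spectral radius is below one, every error component shrinks and the iteration converges.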


Steepest Descent converges to the exact solution on the first iteration if the error term is an eigenvector.


Steepest Descent converges to the exact solution on the first iteration if the eigenvalues are all equal.


The energy norm of these two vectors is equal.
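The energy norm is $\|e\|_A = \sqrt{e^T A e}$. The sketch below builds two vectors of different Euclidean length that have the same energy norm under the sample matrix. The two vectors are illustrative choices along the eigenvector axes, not the vectors drawn in the figure.

```python
import math

A = [[3.0, 2.0], [2.0, 6.0]]

def energy_norm(A, e):
    """Energy norm ||e||_A = sqrt(e^T A e), for symmetric positive-definite A."""
    Ae = [sum(a * ej for a, ej in zip(row, e)) for row in A]
    return math.sqrt(sum(ei * aei for ei, aei in zip(e, Ae)))

def euclidean_norm(e):
    return math.sqrt(sum(ei * ei for ei in e))

e1 = [1.0, 2.0]           # along the eigenvector with eigenvalue 7
s = math.sqrt(3.5)        # scale factor sqrt(7 / 2)
e2 = [-2.0 * s, 1.0 * s]  # along the eigenvector with eigenvalue 2, stretched

print(energy_norm(A, e1), energy_norm(A, e2))  # equal
print(euclidean_norm(e1), euclidean_norm(e2))  # different
```

Both vectors end on the same elliptical contour around the origin, which is why their energy norms agree.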


Convergence $\omega$ of Steepest Descent. $\mu$ is the slope of $e_{(i)}$ with respect to the eigenvector axes. $\kappa$ is the condition number of $A$. Convergence is worst when $\mu = \pm\kappa$.
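The surface in this figure can be explored numerically. The sketch assumes the two-dimensional expression $\omega^2 = 1 - \frac{(\kappa^2 + \mu^2)^2}{(\kappa + \mu^2)(\kappa^3 + \mu^2)}$ from the accompanying article (quoted here from memory, so it is worth checking against the article) and scans $\mu$ on a grid to locate the worst case.

```python
def omega(kappa, mu):
    """Per-iteration energy-norm reduction factor of Steepest Descent (2-d case)."""
    num = (kappa ** 2 + mu ** 2) ** 2
    den = (kappa + mu ** 2) * (kappa ** 3 + mu ** 2)
    return max(1.0 - num / den, 0.0) ** 0.5

kappa = 3.5  # condition number of the sample matrix

# Scan mu from 0.01 to 20.00 and keep the value that gives the slowest convergence.
grid = [m / 100.0 for m in range(1, 2001)]
worst_mu = max(grid, key=lambda mu: omega(kappa, mu))

print(worst_mu)                       # close to kappa
print(omega(kappa, kappa))            # worst-case factor
print((kappa - 1.0) / (kappa + 1.0))  # the same value in closed form
```

Substituting $\mu^2 = \kappa^2$ into the expression gives $\omega = (\kappa - 1)/(\kappa + 1)$, which matches the last two printed values.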


(a) Large $\kappa$, small $\mu$.
(b) An example of poor convergence. $\kappa$ and $\mu$ are both large.
(c) Small $\kappa$ and $\mu$.
(d) Small $\kappa$, large $\mu$.


Solid lines: Worst starting points for Steepest Descent.
Dashed lines: Steps toward convergence.
Grey arrows: Eigenvector axes.
Here, $\kappa = 3.5$.


Convergence of Steepest Descent (per iteration) worsens as the condition number of the matrix increases.


The Method of Orthogonal Directions.


These pairs of vectors are $A$-orthogonal (first plot) because these pairs of vectors are orthogonal (second plot, the same space stretched so that the contours are circular).
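Two vectors $d_{(i)}$ and $d_{(j)}$ are $A$-orthogonal when $d_{(i)}^T A d_{(j)} = 0$. The sketch below checks this for a hypothetical pair of vectors and shows that they become orthogonal in the ordinary sense after a stretching of the space. The stretching used here is the transpose of the Cholesky factor of $A$, which is one convenient choice and not necessarily the transformation used to draw the figure.

```python
import math

A = [[3.0, 2.0], [2.0, 6.0]]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def a_inner(u, v):
    """The A-inner product u^T A v."""
    return dot(u, matvec(A, v))

# Cholesky factor L of A (A = L L^T), written out for the 2x2 case.
l00 = math.sqrt(A[0][0])
l10 = A[1][0] / l00
l11 = math.sqrt(A[1][1] - l10 * l10)

def stretch(d):
    """Map d to L^T d, so that u^T A v equals stretch(u) . stretch(v)."""
    return [l00 * d[0] + l10 * d[1], l11 * d[1]]

d0 = [1.0, 0.0]   # hypothetical pair of A-orthogonal vectors
d1 = [-2.0, 3.0]

print(dot(d0, d1))                    # not orthogonal in the original space
print(a_inner(d0, d1))                # A-orthogonal
print(dot(stretch(d0), stretch(d1)))  # orthogonal after stretching
```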


The method of Conjugate Directions converges in $n$ steps. $e_{(1)}$ must be $A$-orthogonal to $d_{(0)}$.


[Figure labels: $u_0$, $u_1$, $u^+$, $u^*$, $d_{(0)}$, $d_{(1)}$.]

Gram-Schmidt conjugation of two vectors.
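Gram-Schmidt conjugation takes linearly independent vectors $u_i$ and subtracts from each one its components that are not $A$-orthogonal to the directions already built: $d_{(i)} = u_i + \sum_{k<i} \beta_{ik} d_{(k)}$ with $\beta_{ik} = -u_i^T A d_{(k)} / d_{(k)}^T A d_{(k)}$. A minimal sketch, using the axial unit vectors as an assumed input:

```python
A = [[3.0, 2.0], [2.0, 6.0]]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def conjugate_gram_schmidt(A, us):
    """Turn linearly independent vectors into mutually A-orthogonal directions."""
    ds = []
    for u in us:
        d = list(u)
        for dk in ds:
            Adk = matvec(A, dk)
            # Coefficient that cancels the part of u not A-orthogonal to dk.
            beta = -dot(u, Adk) / dot(dk, Adk)
            d = [di + beta * dki for di, dki in zip(d, dk)]
        ds.append(d)
    return ds

d0, d1 = conjugate_gram_schmidt(A, [[1.0, 0.0], [0.0, 1.0]])
print(d0, d1)
print(dot(d0, matvec(A, d1)))  # zero up to rounding
```

Every new direction requires all earlier directions, which is the storage and work cost that Conjugate Gradients later avoids.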


The method of Conjugate Directions using the axial unit vectors, also known as Gaussian elimination.


The shaded area is $e_{(0)} + \mathcal{D}_2 = e_{(0)} + \operatorname{span}\{d_{(0)}, d_{(1)}\}$. The ellipsoid is a contour on which the energy norm is constant. After two steps, CG finds $e_{(2)}$, the point on $e_{(0)} + \mathcal{D}_2$ that minimizes $\|e\|_A$.


(a) 2D problem.
(b) Stretched 2D problem.
(c) 3D problem.
(d) Stretched 3D problem.


$u_0$ and $u_1$ span the same subspace as $d_{(0)}$ and $d_{(1)}$ (the gray-colored plane $\mathcal{D}_2$).
$e_{(2)}$ is $A$-orthogonal to $\mathcal{D}_2$.
$r_{(2)}$ is orthogonal to $\mathcal{D}_2$.
$d_{(2)}$ is constructed (from $u_2$) to be $A$-orthogonal to $\mathcal{D}_2$.


$r_{(0)}$ and $r_{(1)}$ span the same subspace as $d_{(0)}$ and $d_{(1)}$ (the gray-colored plane $\mathcal{D}_2$).
$e_{(2)}$ is $A$-orthogonal to $\mathcal{D}_2$.
$r_{(2)}$ is orthogonal to $\mathcal{D}_2$.
$d_{(2)}$ is constructed (from $r_{(2)}$) to be $A$-orthogonal to $\mathcal{D}_2$.


The method of Conjugate Gradients.
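A compact sketch of the linear Conjugate Gradient iteration for the sample system follows. It uses the standard recurrences (search directions built from residuals, with $\beta = r_{new}^T r_{new} / r^T r$). The starting point, tolerance, and names are assumptions made for illustration.

```python
A = [[3.0, 2.0], [2.0, 6.0]]
b = [2.0, -8.0]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive-definite A."""
    x = list(x0)
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]  # initial residual
    d = list(r)                                         # first direction = residual
    rr = dot(r, r)
    if max_iter is None:
        max_iter = len(b)  # n steps suffice in exact arithmetic
    iters = 0
    while iters < max_iter and rr ** 0.5 > tol:
        Ad = matvec(A, d)
        alpha = rr / dot(d, Ad)                          # exact line search along d
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)] # updated residual
        rr_new = dot(r, r)
        beta = rr_new / rr                               # makes the new d A-orthogonal to the old
        d = [ri + beta * di for ri, di in zip(r, d)]
        rr = rr_new
        iters += 1
    return x, iters

x, iters = conjugate_gradient(A, b, [-2.0, -2.0])
print(x, iters)
```

For this two-dimensional problem the iteration reaches the solution in two steps, in contrast to the many zigzag steps of Steepest Descent.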


[Panels: (a) $P_0(\lambda)$, (b) $P_1(\lambda)$, (c) $P_2(\lambda)$, (d) $P_2(\lambda)$; the horizontal axis marks the eigenvalues 2 and 7.]

The convergence of CG after $i$ iterations depends on how close a polynomial $P_i$ of degree $i$ can be to zero on each eigenvalue, given the constraint that $P_i(0) = 1$.


Chebyshev polynomials of degree 2, 5, 10, and 49.


The optimal polynomial $P_2(\lambda)$ for $\lambda_{min} = 2$ and $\lambda_{max} = 7$ in the general case. The energy norm of the error, $\|e\|_A$, is reduced by a factor of at least 0.183 after two iterations of CG.
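The factor 0.183 can be reproduced from the Chebyshev bound $\|e_{(i)}\|_A \le T_i\!\left(\frac{\lambda_{max} + \lambda_{min}}{\lambda_{max} - \lambda_{min}}\right)^{-1} \|e_{(0)}\|_A$. The sketch below evaluates $T_i$ with the three-term recurrence; the function names are illustrative.

```python
def chebyshev(i, x):
    """Chebyshev polynomial T_i(x) via T_{k+1} = 2 x T_k - T_{k-1}."""
    t_prev, t_curr = 1.0, x
    if i == 0:
        return t_prev
    for _ in range(i - 1):
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return t_curr

def cg_error_bound(i, lam_min, lam_max):
    """Upper bound on ||e_(i)||_A / ||e_(0)||_A after i CG iterations."""
    arg = (lam_max + lam_min) / (lam_max - lam_min)
    return 1.0 / chebyshev(i, arg)

bound = cg_error_bound(2, 2.0, 7.0)
print(round(bound, 3))
```

Here the argument of $T_2$ is $9/5 = 1.8$ and $T_2(1.8) = 2(1.8)^2 - 1 = 5.48$, so the bound is $1/5.48$.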


Convergence of Conjugate Gradients (per iteration) as a function of condition number.

Number of iterations of Steepest Descent required to match one iteration of CG.
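One way to read these two plots is through the standard per-iteration factors $\omega_{SD} = (\kappa - 1)/(\kappa + 1)$ and $\omega_{CG} = (\sqrt{\kappa} - 1)/(\sqrt{\kappa} + 1)$. Under that assumption, the number of Steepest Descent iterations needed to match one CG iteration is $\ln \omega_{CG} / \ln \omega_{SD}$. The sketch below tabulates these quantities; it is an interpretation, not a reproduction of how the plots were generated.

```python
import math

def omega_sd(kappa):
    """Worst-case per-iteration factor for Steepest Descent (kappa > 1)."""
    return (kappa - 1.0) / (kappa + 1.0)

def omega_cg(kappa):
    """Asymptotic per-iteration factor for Conjugate Gradients (kappa > 1)."""
    s = math.sqrt(kappa)
    return (s - 1.0) / (s + 1.0)

def sd_iterations_per_cg_iteration(kappa):
    """How many SD steps give the same reduction as one CG step."""
    return math.log(omega_cg(kappa)) / math.log(omega_sd(kappa))

for kappa in (3.5, 10.0, 100.0, 1000.0):
    print(kappa, omega_sd(kappa), omega_cg(kappa),
          sd_iterations_per_cg_iteration(kappa))
```

The ratio grows roughly like $\sqrt{\kappa}$, which is consistent with the vertical range of the second plot.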


Contour lines of the quadratic form of the diagonally preconditioned sample problem. The condition number has improved from 3.5 to roughly 2.8.
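Both condition numbers can be checked directly. The sketch assumes the diagonal preconditioner $M = \operatorname{diag}(A)$ and measures the condition number as the ratio of the largest to the smallest eigenvalue of $M^{-1}A$, whose eigenvalues are real here even though the matrix is not symmetric.

```python
import math

A = [[3.0, 2.0], [2.0, 6.0]]

def eigenvalues_2x2(M):
    """Eigenvalues of a 2x2 matrix, assuming they are real (smallest first)."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = math.sqrt(tr * tr - 4.0 * det)
    return (tr - disc) / 2.0, (tr + disc) / 2.0

def condition_number(M):
    lo, hi = eigenvalues_2x2(M)
    return hi / lo

# M = diag(A), so M^{-1} A is A with each row divided by its diagonal entry.
MinvA = [[A[i][j] / A[i][i] for j in range(2)] for i in range(2)]

print(condition_number(A))      # original problem
print(condition_number(MinvA))  # diagonally preconditioned problem
```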


The nonlinear Conjugate Gradient Method.
(b) Fletcher-Reeves CG.
(c) Cross-section of the surface corresponding to the first step of Fletcher-Reeves.
(d) Polak-Ribière CG.


Nonlinear CG can be more effective with periodic restarts.


The Newton-Raphson method.
Solid curve: The function to minimize.
Dashed curve: Parabolic approximation to the function, based on first and second derivatives at $x$.
$z$ is chosen at the base of the parabola.
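In one dimension, jumping to the base of the fitted parabola amounts to the update $x \leftarrow x - f'(x)/f''(x)$. The sketch below applies it to a hypothetical test function, $f(x) = e^x - 2x$, whose minimum is at $\ln 2$. The function and starting point are assumptions chosen for illustration and are unrelated to the curve in the figure.

```python
import math

def df(x):
    """First derivative of the hypothetical function f(x) = exp(x) - 2x."""
    return math.exp(x) - 2.0

def d2f(x):
    """Second derivative of the same function."""
    return math.exp(x)

def newton_raphson_minimize(df, d2f, x0, tol=1e-12, max_iter=50):
    """Repeatedly jump to the minimum of the local parabolic approximation."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)   # distance to the base of the parabola
        x -= step
        if abs(step) < tol:
            break
    return x

x_min = newton_raphson_minimize(df, d2f, 0.0)
print(x_min, math.log(2.0))
```

The update is only sensible where the second derivative is positive; otherwise the parabola opens downward and its stationary point is a maximum.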


The Secant method.
Solid curve: The function to minimize.
Dashed curve: Parabolic approximation to the function, based on first derivatives at $\alpha = 0$ and $\alpha = 2$.
$z$ is chosen at the base of the parabola.
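The Secant method replaces the second derivative with a finite difference of first derivatives taken at two points. The sketch below uses the classical secant iteration on $f'$ for the hypothetical function $f(x) = e^x - 2x$, starting from the two points 0 and 2. The function and names are assumptions for illustration.

```python
import math

def df(x):
    """First derivative of the hypothetical function f(x) = exp(x) - 2x."""
    return math.exp(x) - 2.0

def secant_minimize(df, a, b, tol=1e-12, max_iter=100):
    """Find a zero of df using only first derivatives at two points."""
    ga, gb = df(a), df(b)
    for _ in range(max_iter):
        if abs(gb) < tol or gb == ga:
            break
        # Base of the parabola whose slope matches df at both a and b.
        new = b - gb * (b - a) / (gb - ga)
        a, ga = b, gb
        b, gb = new, df(new)
    return b

x_min = secant_minimize(df, 0.0, 2.0)
print(x_min, math.log(2.0))
```

Compared with Newton-Raphson, each step needs no second derivative, at the cost of somewhat slower convergence.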


The preconditioned nonlinear Conjugate Gradient Method.
Polak-Ribière formula and a diagonal preconditioner.
The space has been stretched to show the improvement in circularity of the contour lines around the minimum.
