Iterative Methods and Preconditioning for Large and Sparse Linear Systems
Luca Bergamaschi
Department of Civil, Environmental and Architectural Engineering, University of Padova
e-mail: [email protected]    webpage: www.dmsa.unipd.it/~berga
Sparse linear systems
Given a square n × n nonsingular matrix A and a vector b ∈ R^n, the unique solution of the system
Ax = b
can be obtained by a direct method: factorize A = LDU, then solve the triangular systems Lz = b, Dy = z, Ux = y.
What's wrong with direct methods?
Sparse linear systems
We deal with sparse matrices, that is, matrices whose number of nonzeros nz(A) satisfies
nz(A) = c · n
with c independent of n.
Matrices have large size, with n = 10^5, 10^6 (to be solved on PCs), n = 10^7, 10^8 (on supercomputers), n = 10^9, 10^10 (e.g. on the Marconi supercomputer at Bologna, Italy, with more than 10^5 processors simultaneously running).
We can't afford a factorization of A:
1. The factors L and U are not guaranteed to be as sparse as A ⟹ they can't be stored.
2. The computation of L, U is too costly: O(n^3) operations.
(Partial) remedy: reordering of the elements of the matrix A so as to "minimize" the density of the triangular factors.
Example. Matrix with n = 21355 and nz = 4425300 (approx. 60 MB of storage).
The L and U factors need more than 5 GB of RAM.
Routine used: state-of-the-art HSL-MA57, a multifrontal method with optimal reordering and dense linear algebra subroutines (BLAS) to optimize cache accesses.
Finite Element (FE)/ Finite Difference (FD) discretization of Partial Differential Equations (PDEs);
Constrained optimization problems;
Google PageRank matrix G. G is the adjacency matrix of the set of Web pages W (in 2017 the size of W, n, was about 30 trillion, i.e. 3 × 10^13):
g_ij = 1 if there is a hyperlink from page i to page j, 0 otherwise.
G is huge, but very sparse; its number of nonzeros is the total number of hyperlinks in the pages in W.
Rectangular domain discretized by triangular elements.
The nonzeros of the discretization matrix are the diagonal elements plus
a_ij = nonzero if there is an edge from node i to node j, 0 otherwise.
In row 15 the only (possible) nonzeros are a_{15,8}, a_{15,9}, a_{15,14}, a_{15,15}, a_{15,16}, a_{15,21}, a_{15,22}.
At most 7 nonzeros per row, whatever the size of the problem n.
Linear Algebra Concepts and Notation
We indicate with Mn the set of real square n × n matrices.
Standard or Euclidean scalar product:
v^T w = <v, w> = ∑_{i=1}^n v_i w_i.
Euclidean norm. It is associated with the standard scalar product:
‖v‖_2 = √(v^T v) = ( ∑_{i=1}^n v_i^2 )^{1/2}.
This is a particular case of a p-norm:
‖v‖_p = ( |v_1|^p + |v_2|^p + . . . + |v_n|^p )^{1/p}.
Other useful norms are obtained for p = 1 or p = ∞:
‖v‖_1 = ∑_{i=1}^n |v_i|,   ‖v‖_∞ = max_i |v_i|.
Some useful definitions
Definition. A symmetric matrix is said to be symmetric positive definite (SPD in short) if
v^T A v > 0   ∀ v ≠ 0.
In such a case ‖v‖_A = √(v^T A v) defines a vector norm (prove it as an exercise).
Definition. A matrix A ∈ M_n is said to be strictly diagonally dominant if
|a_ii| > ∑_{k≠i} |a_ik|,   i = 1, . . . , n.
Orthogonality
Definition. The vectors v and w are said to be orthogonal if v^T w = 0, and to be orthonormal if, in addition, ‖v‖ = ‖w‖ = 1.
The vectors v and w are said to be A-orthogonal if v^T A w = 0.
Definition. An n × n real matrix with orthonormal columns is called orthogonal (unitary if it is complex). For an orthogonal matrix U we have U^T U = U U^T = I, where I is the identity matrix.
Eigenvalues and eigenvectors
Given a square matrix A, a vector u ≠ 0 and a scalar λ ∈ C are said to be an eigenvector and an eigenvalue, respectively, if
Au = λu.
Eigenvalues of symmetric matrices are real. Eigenvalues of a symmetric positive definite matrix are real and positive.
A matrix is said to be normal if A^T A = A A^T (in general matrices DO NOT commute). All symmetric matrices are normal.
A symmetric matrix A can be decomposed as
A = U Λ U^T,   where Λ = diag(λ_1, · · · , λ_n), U = [u_1, · · · , u_n].
The columns of U are the n linearly independent (orthogonal) eigenvectors of A, so that U^T U = I.
Matrix norms
A matrix norm is a function M_n → R satisfying, for all matrices A, B and scalars α ∈ R, the following properties:
1. ‖A‖ ≥ 0, and ‖A‖ = 0 ⟺ A = 0 (zero matrix)
2. ‖αA‖ = |α| · ‖A‖
3. ‖A + B‖ ≤ ‖A‖ + ‖B‖
4. ‖AB‖ ≤ ‖A‖ · ‖B‖
Example. The Frobenius norm is defined as ‖A‖_F = ( ∑_{i,j=1}^n a_ij^2 )^{1/2}.
Matrix norms
Given a vector norm in R^n, the induced matrix norm is defined as
‖A‖ = sup_{y ∈ R^n, y ≠ 0} ‖Ay‖ / ‖y‖.
The following norms are induced by the corresponding vector norms:
‖A‖_1 = max_j ∑_{i=1}^n |a_ij|,
‖A‖_2 = √( λ_max(A^T A) ).
To see the last definition, recall the variational characterization of the eigenvalues of a Hermitian matrix:
λ_max(A^H A) = max_{y ≠ 0} (y^H A^H A y) / (y^H y).
The condition number κ(A) is defined as
κ(A) = ‖A‖ ‖A^{-1}‖.
Matrix norms and eigenvalues
The spectral radius ρ(A) of a square matrix is
ρ(A) = max{ |λ| : λ is an eigenvalue of A } = lim_{k→∞} ‖A^k‖^{1/k}.
Moreover ρ(A) ≤ ‖A‖ for every matrix norm, and conversely:
Theorem. Given a square matrix A and ε > 0, there is a matrix norm ‖·‖ such that
ρ(A) > ‖A‖ − ε.
Iterative Methods
An iterative method tries to find the unique solution x* of Ax = b by constructing a sequence of vectors
x^(1), x^(2), · · · , x^(k), · · ·
An iterative method is said to be convergent if
lim_{k→∞} ‖x* − x^(k)‖ = 0.
The error vector at each iteration is defined as e^(k) = x* − x^(k).
The residual vector at each iteration is defined as r^(k) = b − A x^(k).
It can be easily proved that
lim_{k→∞} e^(k) = 0  ⟺  lim_{k→∞} r^(k) = 0.
An iterative method never produces the “exact” solution even in “infinite” precision. (Direct factorization methods theoretically give the exact solution but working on a computer introduces numerical errors).
When employing an iterative method we can choose a priori the accuracy of our solution. Let us call tol the maximum error we allow in the solution. We would like to have
‖e^(k)‖ < tol (absolute error)   or   ‖e^(k)‖ / ‖x*‖ < tol (relative error).
However, since x*, and hence e^(k), are not known, this exit test is not practical. We use instead
‖r^(k)‖ < tol (test on the absolute residual)   or   ‖r^(k)‖ / ‖b‖ < tol (test on the relative residual).
In which sense is this test reliable?
To answer this question we first have to connect the residual with the error: r^(k) = b − A x^(k) = A x* − A x^(k) = A e^(k).
Then, using the properties of matrix norms,
‖e^(k)‖ ≤ ‖A^{-1}‖ ‖r^(k)‖.   (1)
Moreover, note that ‖b‖ ≤ ‖A‖ ‖x*‖, so that
‖e^(k)‖ / ‖x*‖ ≤ ‖A‖ ‖A^{-1}‖ ‖r^(k)‖ / ‖b‖ = κ(A) ‖r^(k)‖ / ‖b‖.
Iterative Methods. Stopping criteria
Conclusion: the exit test on the residual is reliable if κ(A) is not too large; otherwise it is better to use
κ(A) ‖r^(k)‖ / ‖b‖ < tol,
but how can we estimate κ(A) (without knowing A^{-1})?
If we use the spectral norm ‖·‖_2, recall that
κ(A) = √( λ_max(A^T A) / λ_min(A^T A) ),
and, for A symmetric positive definite,
κ(A) = λ_max(A) / λ_min(A).
The problem reduces to (roughly) approximating the maximum and the minimum eigenvalues of A.
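For a sparse SPD matrix these extreme eigenvalues can be approximated cheaply, e.g. in MATLAB; a minimal sketch (A is assumed to be already available as a sparse matrix):
% Estimate kappa_2(A) = lambda_max/lambda_min for a sparse SPD matrix A
lmax = eigs(A, 1, 'LM');        % largest-magnitude eigenvalue
lmin = eigs(A, 1, 'SM');        % smallest-magnitude eigenvalue (uses a factorization of A)
kappa_est = lmax / lmin;
% Alternative: condest(A) estimates the 1-norm condition number
kappa1_est = condest(A);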
Stationary Iterative Methods
A straightforward approach to an iterative solution of Ax = b is to rewrite it as a linear fixed point equation:
x = x + r = x − Ax + b = (I − A)x + b,
from which the iterative method
x_{k+1} = (I − A)x_k + b = x_k + r_k,   k ≥ 0.   (1)
The method in this last form is known as the Richardson method, which can be generalized to
x_{k+1} = x_k + α_k r_k.
The method (1) is also a particular case of the general iteration
x_{k+1} = H x_k + c.
Stationary Iterative Methods
Consider a splitting
A = M − N,
where M is invertible.
From Ax = b ⇐⇒ Mx = Nx + b ⇐⇒ x = M−1Nx + M−1b
the following iterative method can be defined:
x(k+1) = M−1Nx(k) + M−1b ≡ Hx(k) + q
H is called iteration matrix.
When H is constant the method is called stationary.
Equivalently we can write (since M−1N = I −M−1A)
x(k+1) = x(k) + M−1(b− Ax(k)) = x(k) + M−1r(k).
Lemma. If ‖H‖ < 1 then I − H is nonsingular,
‖(I − H)^{-1}‖ ≤ 1 / (1 − ‖H‖),   (3)
and ∑_{i=0}^∞ H^i = (I − H)^{-1}.
Proof. The sequence of partial sums S_k = ∑_{i=0}^k H^i is a Cauchy sequence in R^{n×n}; in fact, for all m > k,
‖S_m − S_k‖ ≤ ∑_{i=k+1}^m ‖H‖^i = ‖H‖^{k+1} (1 − ‖H‖^{m−k}) / (1 − ‖H‖) → 0
as m, k → ∞. Hence S_k converges and its limit is S = (I − H)^{-1}, since
H S_k + I = S_{k+1}  ⟹  H S + I = S  ⟹  S = (I − H)^{-1}.
Finally, (3) comes from ‖(I − H)^{-1}‖ = ‖∑_{i=0}^∞ H^i‖ ≤ ∑_{i=0}^∞ ‖H‖^i = 1 / (1 − ‖H‖).
Approximate inverse
A direct consequence of the previous Lemma is the following result
Corollary
If ‖H‖ < 1 then the iteration x_{k+1} = H x_k + c converges to x = (I − H)^{-1} c for all initial guesses x_0.
As a consequence, the Richardson iteration converges if ‖I − A‖ < 1.
Preconditioning: sometimes it is useful to premultiply the system by a suitable matrix, obtaining
B A x = B b,
in order to accelerate the iterative process.
Definition. B is an approximate inverse of A if ‖I − BA‖ < 1.
The following result is known as the Banach Lemma.
Lemma. Let B be an approximate inverse of A. Then A and B are both nonsingular, and
‖A^{-1}‖ ≤ ‖B‖ / (1 − ‖I − BA‖).
Proof. Using the previous Lemma with H = I − BA, we have that I − H = BA is nonsingular, so that both A and B are nonsingular.
The equality A^{-1} B^{-1} = (BA)^{-1} = (I − H)^{-1} implies A^{-1} = (I − H)^{-1} B.
Taking norms, and using again the Lemma,
‖A^{-1}‖ ≤ ‖(I − H)^{-1}‖ ‖B‖ ≤ ‖B‖ / (1 − ‖H‖) = ‖B‖ / (1 − ‖I − BA‖).
A = L + D + U
D is the diagonal of A,
L is the strict lower triangular part of A,
U is the strict upper triangular part of A.
Assume that a_ii ≠ 0, i = 1, . . . , n.
Classical examples
1. Jacobi's method. Define M = D and N = −(L + U). Then
x^(k+1) = −D^{-1}(L + U) x^(k) + D^{-1} b,   k = 0, 1, . . .
2. Gauss-Seidel's method. Define M = D + L and N = −U. Then
x^(k+1) = −(D + L)^{-1} U x^(k) + (D + L)^{-1} b,   k = 0, 1, . . .
Computational issues
A single Jacobi iteration is very cheap: it requires a sparse matrix-vector multiplication (the number of multiplications is of the order of nz(A)) and the inversion of a diagonal matrix.
A single Gauss-Seidel iteration is also cheap. It needs solution of the sparse triangular linear system (D + L)x(k+1) = b− Ux(k).
Componentwise:
Jacobi:   x_i^(k+1) = ( b_i − ∑_{j≠i} a_ij x_j^(k) ) / a_ii,   i = 1, . . . , n.
Gauss-Seidel:   x_i^(k+1) = ( b_i − ∑_{j=1}^{i−1} a_ij x_j^(k+1) − ∑_{j=i+1}^{n} a_ij x_j^(k) ) / a_ii,   i = 1, . . . , n.
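A minimal MATLAB sketch of one Jacobi sweep and one Gauss-Seidel sweep implementing the componentwise formulas above (x and b are assumed to be column vectors and A sparse with nonzero diagonal):
function [xJ, xGS] = jacobi_gs_sweep(A, b, x)
% One Jacobi sweep (vectorized) and one Gauss-Seidel sweep (componentwise).
n = length(b);
d = diag(A);                     % diagonal of A
xJ = x + (b - A*x) ./ d;         % Jacobi: x_i = (b_i - sum_{j~=i} a_ij x_j)/a_ii
xGS = x;
for i = 1:n
    % Gauss-Seidel uses the already-updated components xGS(1:i-1)
    xGS(i) = (b(i) - A(i,1:i-1)*xGS(1:i-1) - A(i,i+1:n)*xGS(i+1:n)) / d(i);
end
end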
Convergence
Theorem Stationary iterative methods converge for all initial guesses x(0) ∈ Rn if and only if
ρ(H) < 1.
Proof. Subtracting the equation Nx = Mx + b from Nx(k+1) = Mx(k) + b yields
Me(k+1) = Ne(k) =⇒ e(k+1) = M−1Ne(k) = He(k).
Recursively we have e^(k) = H^k e^(0); hence
lim_{k→∞} e^(k) = 0, ∀ e^(0)  ⟺  lim_{k→∞} H^k = 0  ⟺  ρ(H) < 1.
Theorem If matrix A is strictly diagonally dominant then Jacobi and Gauss-Seidel iterations converge for every initial guess.
How fast are stationary methods?
Theorem
lim_{k→∞} ( ‖e^(k)‖ / ‖e^(0)‖ )^{1/k} = ρ(H).
The asymptotic convergence factor µ = ρ(H) is the average reduction factor of the error norm. The smaller µ, the faster the convergence.
Definition. The rate of convergence of a stationary method is R = −log10 ρ(H).
Practical consequences.
How many iterations must be performed in order to have ‖e^(k)‖ / ‖e^(0)‖ ≤ 10^{-p}? Roughly µ^k ≤ 10^{-p}, i.e.
k ≥ p / (−log10 µ) = p / R.
For example, with µ = 0.9 and p = 6, about k ≥ 6/0.046 ≈ 131 iterations are needed.
In real problems, e.g. systems arising from the discretization of PDEs, µ ≈ 1 for the Jacobi/Gauss-Seidel iterations. Note that if µ = 1 − ε then R ≈ ε / ln 10, so the number of iterations grows like 1/ε.
Stationary Iterative methods
The so-called successive over-relaxation method (SOR in short) is a generalization of the Gauss-Seidel method:
x^(k+1) = −(D + L)^{-1} U x^(k) + (D + L)^{-1} b = H_S x^(k) + q_S.
It is based on a real parameter ω.
SOR method
x^(k+1) = (D + ωL)^{-1} ( (1 − ω)D − ωU ) x^(k) + ω (D + ωL)^{-1} b = H_SOR x^(k) + q_SOR.
Componentwise:
x_i^(k+1) = (1 − ω) x_i^(k) + ω x_{i,S}^(k+1),
where x_{i,S}^(k+1) = ( b_i − ∑_{j=1}^{i−1} a_ij x_j^(k+1) − ∑_{j=i+1}^{n} a_ij x_j^(k) ) / a_ii is the Gauss-Seidel-type update computed from the current values.
Lemma (Kahan)
For every ω, ρ(H_SOR(ω)) ≥ |ω − 1|; hence SOR can converge only if ω ∈ (0, 2).
Theorem (Ostrowski)
If A is symmetric and positive definite then SOR is convergent for every ω in (0, 2).
Theorem (Young & Varga)
If A is 2-cyclic and consistently ordered then
λ(H_GS) = λ(H_J)^2   (where H_J is the Jacobi iteration matrix).
If in addition all the eigenvalues of the Jacobi iteration matrix are real and ρ(H_J) < 1, then there is an optimal value of ω ∈ (1, 2):
ω_opt = 2 / ( 1 + √(1 − ρ(H_J)^2) ),   with ρ(H_SOR(ω_opt)) = ω_opt − 1.
Remark. Tridiagonal matrices, and also matrices arising from the FD discretization of the Laplace operator, are 2-cyclic and consistently ordered.
Let us now focus on symmetric positive definite (SPD) matrices.
Let us define a function φ : R^n → R,
φ(x) = ½ e^T A e = ½ (x* − x)^T A (x* − x),   e = x* − x.
Note that
φ(x) > 0 ∀ e ≠ 0 (hence ∀ x ≠ x*), by definition of SPD matrix, and
φ(x) = 0  ⟺  x = x* = A^{-1} b.
Summarizing, φ has a unique minimum at x = x*, so that minimizing φ can be regarded as a method to solve the linear system Ax = b.
The function
½ x^T A x − x^T b
differs from φ by a constant and hence attains its minimum at the same point x*.
Suppose we have an approximate solution x_k.
We seek a more accurate approximation
x_{k+1} = x_k + α_k p_k
by optimally choosing the direction p_k and the scalar α_k.
How do we choose the search direction p_k?
In the method of the steepest descent, p_k is (minus) the gradient of φ:
p_k = −∇φ(x_k).
As is well known, the direction opposite to the gradient vector is the direction of greatest decrease.
Steepest descent Directional Derivatives
For a function f (x1, x2, . . . , xn) the partial derivative with respect to x1 gives the rate of change of f in the x1 direction.
The rate of change of a function of several variables in an arbitrary unitary direction u is called the directional derivative in the direction u.
D_u f = ∇f · u = ∑_{i=1}^n (∂f/∂x_i) u_i = |∇f| |u| cos θ,
where θ is the angle between the gradient vector and u.
D_u f takes its greatest positive value when θ = 0; if θ = π it assumes its smallest (negative) value. So the direction of maximal decrease is that of u = −∇f.
The graph z = φ(x) is a paraboloid in the (n+1)-dimensional space whose minimum is attained at the solution x*.
The level curves φ(x) = c are concentric ellipses (hyperellipsoids in R^n) centered at x*, as shown in the figure.
The SD method goes from xk toward the center of the hyperellipsoid going in the opposite direction to the gradient.
The gradient of φ at the point x_k is
∇φ(x_k) = A x_k − b = −r_k.
The search direction is therefore the residual of our linear system, namely p_k = −∇φ(x_k) = r_k.
How far must we go along this direction?
The next approximation is written as
xk+1 = xk + αk rk
where αk is a suitable scalar to be determined in order to minimize φ(x) along the search direction rk .
We have to perform a one-dimensional minimization of the function φ. Formally,
α_k = argmin_t φ(x_k + t r_k).
Steepest descent
Write φ(x_k + t r_k) in detail (up to a factor 2 and additive constants, which do not affect the minimizer): it turns out to be a second order polynomial in the variable t,
g(t) = (x_k + t r_k)^T A (x_k + t r_k) − 2 (x_k + t r_k)^T b
     = x_k^T A x_k + 2t r_k^T A x_k + t^2 r_k^T A r_k − 2 x_k^T b − 2t r_k^T b
     = t^2 r_k^T A r_k + 2t r_k^T (A x_k − b) + const
     = t^2 r_k^T A r_k − 2t r_k^T r_k + const.
Since r_k^T A r_k > 0, this parabola has a minimum, attained at
α_k ≡ t = r_k^T r_k / (r_k^T A r_k).
Equivalently, setting g'(t) = 0 gives (b − A(x_k + t r_k))^T r_k = r_k^T r_k − t r_k^T A r_k = 0, i.e. the new residual is orthogonal to r_k.
Steepest descent. Algorithm
The (k+1)-th iterate of the method of the steepest descent can be written as
r_k = b − A x_k,
α_k = r_k^T r_k / (r_k^T A r_k),
x_{k+1} = x_k + α_k r_k.
Note that this iteration costs mainly two matrix-vector products.
We can save a matrix-vector product since
x_{k+1} = x_k + α_k r_k  ⟹  r_{k+1} = b − A x_{k+1} = b − A x_k − α_k A r_k = r_k − α_k A r_k.
Hence only one matrix-vector product (by r_k) is required at each iteration.
Consecutive residuals r_{k+1}, r_k are orthogonal (prove it as an exercise).
The error e_{k+1} is A-orthogonal to the search direction r_k (prove it as an exercise).
r_0 = b − A x_0, k = 0
while ‖r_k‖ > tol ‖b‖ and k < kmax do
 1  z = A r_k
 2  α_k = r_k^T r_k / (z^T r_k)
 3  x_{k+1} = x_k + α_k r_k
 4  r_{k+1} = r_k − α_k z
 5  k = k + 1
end while
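A minimal MATLAB version of the loop above (names follow the pseudocode; tol and kmax are assumed given):
function [x, k] = steepest_descent(A, b, x, tol, kmax)
% Steepest descent for SPD A: x_{k+1} = x_k + alpha_k r_k.
r = b - A*x;  k = 0;  nb = norm(b);
while norm(r) > tol*nb && k < kmax
    z = A*r;                     % one matrix-vector product per iteration
    alpha = (r'*r) / (r'*z);     % exact line search along r
    x = x + alpha*r;
    r = r - alpha*z;             % recurrence r_{k+1} = r_k - alpha_k A r_k
    k = k + 1;
end
end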
Theorem
√(2 φ(x_k)) = ‖e_k‖_A ≤ ( (κ(A) − 1) / (κ(A) + 1) )^k ‖e_0‖_A.
Order the eigenvalues of A as 0 < λ_n ≤ · · · ≤ λ_2 ≤ λ_1; then κ(A) = λ_1 / λ_n.
Again, when dealing with systems arising from PDEs, κ(A) may be very large.
Compute an estimate of the number of iterations needed to gain p digits in the approximation of the solution:
‖e_k‖_A / ‖e_0‖_A ≤ ( (κ(A) − 1) / (κ(A) + 1) )^k ≤ 10^{-p}  ⟹  k ≥ p / log10( (κ(A) + 1) / (κ(A) − 1) ) ≈ 1.15 p κ(A).
Exercises.
1. Solve the system Ax = 0 with A = [1 0; 0 9] by the steepest descent method with initial guess x_0 = (9, 1)^T. Find x_1, x_2. Prove that x_k = 0.8^k (9, (−1)^k)^T. Verify the convergence bound of the previous frame.
2. Solve the system Ax = 0 with A = [1 0; 0 9] with initial guess x_0 = (9, 1/9)^T.
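The first exercise can be checked numerically with a few lines of MATLAB (a small sketch inlining one steepest-descent step per loop pass):
A = diag([1 9]);  b = [0; 0];  x = [9; 1];
for k = 1:5
    r = b - A*x;
    alpha = (r'*r) / (r'*(A*r));
    x = x + alpha*r;
    fprintf('k=%d  x = (%g, %g),  0.8^k*(9,(-1)^k) = (%g, %g)\n', ...
            k, x(1), x(2), 0.8^k*9, 0.8^k*(-1)^k);
end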
Krylov subspace methods
The Krylov subspace methods are currently considered to be among the most important iterative techniques available for solving large linear systems.
These techniques are based on projection processes, both orthogonal and oblique, onto Krylov subspaces.
Definition Km(A, v) = span{v,Av, . . . ,Am−1v} is the Krylov subspace of size m generated by the vector v.
A general projection method for solving the linear system
Ax = b
extracts an approximate solution xm from an affine subspace x0 + Km(A, v) of dimension m.
The solution xm is obtained by imposing the Petrov-Galerkin orthogonality condition
Lm ⊥ rm
where Lm is another subspace of dimension m.
A Krylov subspace method is a method for which the subspace Km is generated by the initial residual r0 = b− Ax0.
Thus the approximate solution at step m has the form
A^{-1} b ≈ x_m = x_0 + q_{m−1}(A) r_0,
in which q_{m−1} is a certain polynomial of degree m − 1. Note that in the simplest case where x_0 = 0, A^{-1} b ≈ x_m = q_{m−1}(A) b.
Krylov solvers for linear systems
Different versions of Krylov subspace methods arise from different choices of the subspace Lm. Most common choices for the space Lm:
Lm = Km.
Lm = AKm.
We will describe in depth two methods.
The Conjugate Gradient method (CG). Lm = Km.
The Generalized Minimal RESidual (GMRES). Lm = AKm.
They both minimize at the k-th iteration some measure of the error over the affine subspace
x0 +Kk (A, r0)
The Conjugate Gradient (CG) Method
The Conjugate Gradient method can be viewed as an improvement of the SD method.
We start from a similar recurrence relation between consecutive iterates.
xk+1 = xk + αkpk
Now we allow our search direction to be different from the gradient, namely
p0 = r0
pk = rk + βk−1pk−1, k > 0
Unlike SD, we require that the error at step k + 1 is also A-orthogonal to the previous direction p_{k−1}.
M. R. Hestenes and E. Stiefel,
Methods of Conjugate Gradients for Solving Linear Systems,
J. Res. Nat. Bur. Standards, 1952
We impose A orthogonality of two consecutive directions:
β_{k−1} = − r_k^T A p_{k−1} / (p_{k−1}^T A p_{k−1}).
From this definition of β_{k−1} we have that p_{k−1}^T A p_k = 0, i.e. p_k and p_{k−1} are A-orthogonal.
As in the SD case, the choice of α_k is aimed at minimizing φ(x_{k+1}) = φ(x_k + α_k p_k), giving the formula
α_k = r_k^T p_k / (p_k^T A p_k).
The finite termination property of CG
To prove the most important properties of the CG method we have to premise the following important
Lemma. Given a subspace of dimension m spanned by v_1, . . . , v_m, let V = [v_1, . . . , v_m] and let x be written as x_0 plus a generic vector of the subspace, i.e. x = x_0 + V y, y ∈ R^m. Then
‖x‖_A is minimum  ⟺  V^T A x = 0.
Proof.
‖x‖_A^2 = (x_0 + V y)^T A (x_0 + V y) ≡ f(y).
‖x‖_A^2 is minimum if and only if ∇f = 0. Now
∇f = 2 V^T A (x_0 + V y) = 2 V^T A x.
The finite termination property of CG
Theorem. The CG algorithm generates the exact solution x* in at most n steps. Moreover the sequences e_k, r_k and p_k satisfy
e_{k+1}^T A p_j = 0,  p_{k+1}^T A p_j = 0,  r_{k+1}^T r_j = 0,   ∀ j ≤ k.
The error at step k + 1 lies in the affine space e_0 + A·span{e_0, A e_0, . . . , A^k e_0} and it is the vector of minimum A-norm in this space.
Proof. Assume that r_j ≠ 0 for j ≤ k.
We first prove the thesis for j = k, i.e. that
e_{k+1}^T A p_k = 0,  p_{k+1}^T A p_k = 0,  r_{k+1}^T r_k = 0.
The relation p_{k+1}^T A p_k = 0 follows immediately from the definition of β_k. Moreover
r_{k+1}^T r_k = 0 (prove it as an exercise). Finally, since A e_k = r_k,
e_{k+1}^T A p_k = (e_k − α_k p_k)^T A p_k = r_k^T p_k − α_k p_k^T A p_k = 0,
by the definition of α_k.
It remains to prove by induction that
e_{k+1}^T A p_j = 0,  p_{k+1}^T A p_j = 0,  r_{k+1}^T r_j = 0,   ∀ j ≤ k − 1.
Basis of the induction:
p_1^T A p_0 = 0 by the definition of β_0;
r_0^T r_1 = r_0^T (r_0 − α_0 A r_0) = r_0^T r_0 − α_0 r_0^T A r_0 = 0, since α_0 = r_0^T p_0 / (p_0^T A p_0) = r_0^T r_0 / (r_0^T A r_0) (recall that p_0 = r_0).
We assume as inductive hypothesis that
e_k^T A p_j = 0,  p_k^T A p_j = 0,  r_k^T r_j = 0,   ∀ j ≤ k − 1.
Note that the first relation also implies that r_k^T p_j = 0, ∀ j ≤ k − 1.
Then, for j ≤ k − 1, we have
e_{k+1}^T A p_j = (e_k − α_k p_k)^T A p_j = e_k^T A p_j − α_k p_k^T A p_j = 0   (⟹ r_{k+1}^T p_j = 0);
r_{k+1}^T r_j = r_{k+1}^T ( p_j − β_{j−1} p_{j−1} ) = 0;   (4)
p_{k+1}^T A p_j = (r_{k+1} + β_k p_k)^T A p_j = r_{k+1}^T A p_j + β_k p_k^T A p_j = r_{k+1}^T ( r_j − r_{j+1} ) / α_j = 0, using r_{k+1}^T r_k = 0 and (4).
Finally it is easily proved that span{p_0, . . . , p_k} ≡ A·span{e_0, A e_0, . . . , A^k e_0}.
Hence e_{k+1} is A-orthogonal to span{p_0, . . . , p_k}; therefore, by the previous Lemma, it is the vector of minimum A-norm in the affine space e_0 + A·span{e_0, A e_0, . . . , A^k e_0}.
Important consequence
The CG method finds the "exact" solution (in exact arithmetic!) in at most n iterations.
In fact, from the above theorem we have that r_n = 0, p_n = 0, which implies x_n = x*.
However, in floating point arithmetic, rounding errors destroy the finite termination property.
Useful relations.
1. α_k = r_k^T p_k / (p_k^T A p_k) = r_k^T r_k / (p_k^T A p_k),
since p_k^T r_k = r_k^T r_k. In fact, from e_k^T A p_j = 0 it immediately follows that r_k^T p_j = 0 (j ≤ k − 1), and hence
p_k^T r_k = (r_k + β_{k−1} p_{k−1})^T r_k = r_k^T r_k.
2. An alternative (more stable) formulation for β_k is
β_k = − r_{k+1}^T A p_k / (p_k^T A p_k) = r_{k+1}^T r_{k+1} / (r_k^T r_k).
Algorithm: Conjugate Gradient
Input: x_0, A, b, kmax, tol
r_0 = p_0 = b − A x_0, k = 0
while ‖r_k‖ > tol ‖b‖ and k < kmax do
 1  z = A p_k
 2  α_k = r_k^T r_k / (z^T p_k)
 3  x_{k+1} = x_k + α_k p_k
 4  r_{k+1} = r_k − α_k z
 5  β_k = r_{k+1}^T r_{k+1} / (r_k^T r_k)
 6  p_{k+1} = r_{k+1} + β_k p_k
 7  k = k + 1
end while
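A minimal MATLAB transcription of the algorithm above (a sketch; x is the initial guess and is overwritten with the approximate solution):
function [x, k] = cg(A, b, x, tol, kmax)
% Conjugate Gradient for SPD A, following the algorithm above.
r = b - A*x;  p = r;  k = 0;  nb = norm(b);  rho = r'*r;
while sqrt(rho) > tol*nb && k < kmax
    z = A*p;
    alpha = rho / (z'*p);
    x = x + alpha*p;
    r = r - alpha*z;
    rho_new = r'*r;
    beta = rho_new / rho;        % beta_k = r_{k+1}'r_{k+1} / (r_k'r_k)
    p = r + beta*p;
    rho = rho_new;
    k = k + 1;
end
end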
From the previous Theorem we have
e_k = e_0 + P_{k−1}(A) r_0 = e_0 + P_{k−1}(A) A e_0 = Q_k(A) e_0,   (5)
with Q_k a polynomial of degree k satisfying Q_k(0) = 1.
The minimizing property of the Conjugate Gradient method can therefore be restated as
‖e_k‖_A = min_{P_k ∈ Π_k, P_k(0)=1} ‖P_k(A) e_0‖_A,   (6)
where Π_k is the set of polynomials of degree k.
The initial error e_0 can be expressed as a linear combination of the eigenvectors u_j of A as
e_0 = ∑_{j=1}^n η_j u_j.
Then (6) becomes:
‖e_k‖_A = min_{P_k(0)=1} ‖ P_k(A) ∑_{j=1}^n η_j u_j ‖_A ≤ min_{P_k(0)=1} max_j |P_k(λ_j)| ‖e_0‖_A ≤ min_{P_k(0)=1} max_{x∈[a,b]} |P_k(x)| ‖e_0‖_A,   (7)
where 0 < a ≤ λ_min(A) ≤ λ_max(A) ≤ b.
This minimization problem is solved by Chebyshev polynomials defined as
T_k(t) = cos(k arccos t), |t| ≤ 1;   T_k(t) = cosh(k arccosh t) = ½ [ (t + √(t^2 − 1))^k + (t − √(t^2 − 1))^k ], |t| ≥ 1;
satisfying the three-term recurrence relation
T_{k+1}(t) = 2 t T_k(t) − T_{k−1}(t),   T_0(t) = 1, T_1(t) = t.
A suitably shifted and scaled Chebyshev polynomial provides the minimum of (7).
Theorem. Let [α, β] (≠ ∅) ⊂ R and let γ be any scalar outside [α, β]. Then the minimum
min_{P ∈ Π_k, P(γ)=1} max_{t∈[α,β]} |P(t)|
is attained by
T̂_k(t) = T_k( (2t − α − β)/(β − α) ) / T_k( (2γ − α − β)/(β − α) ).
Moreover, if γ < α, there holds
max_{t∈[α,β]} |T̂_k(t)| = | T_k( (2γ − α − β)/(β − α) ) |^{-1} = T_k( 1 + 2(α − γ)/(β − α) )^{-1}.
CG convergence
Theorem. The A-norm of the error at the k-th CG iterate satisfies
‖e_k‖_A / ‖e_0‖_A ≤ 2 ( (√κ(A) − 1) / (√κ(A) + 1) )^k,   (8)
where κ(A) = λ_max(A) / λ_min(A) is the spectral condition number of A.
Proof. First use the previous Theorem with γ = 0, α = λ_min(A), β = λ_max(A) to state that
‖e_k‖_A ≤ T_k( (β + α)/(β − α) )^{-1} ‖e_0‖_A.
Then, with t = (β + α)/(β − α) = (κ + 1)/(κ − 1), one has t + √(t^2 − 1) = (√κ + 1)/(√κ − 1), so that
T_k(t) = ½ [ (t + √(t^2−1))^k + (t − √(t^2−1))^k ] ≥ ½ ( (√κ + 1)/(√κ − 1) )^k,
and (8) follows.
It is useful to compute an estimate of the number of iteration needed to gain p digits in the approximation of the solution. To this end we define the asymptotic convergence rate as
R = − log10(µ)
where µ is the average error reduction factor. In practice, R^{-1} is an estimate of the number of iterations needed to reduce the norm of the error by an order of magnitude.
Here, neglecting the factor 2 in (8), which is irrelevant for large k, µ = (√κ(A) − 1)/(√κ(A) + 1), so
R = −log(µ) / log 10 = log10( (√κ(A) + 1)/(√κ(A) − 1) ).
Hence ‖e_k‖_A / ‖e_0‖_A ≤ 10^{-p} is guaranteed for
k ≥ p / R ≈ (ln 10 / 2) p (√κ(A) + 1) ≈ 1.15 p (√κ(A) + 1).
Chebyshev polynomials are not optimal. Example.
Suppose the eigenvalues of A are 1, 2, 10. The scaled and shifted 2nd order Chebyshev polynomial on [1, 10] is
T̂_2(x) = T_2( (2x − 11)/9 ) / T_2(−11/9) = ( 2(2x − 11)^2 − 81 ) / 161,   max_{x∈[1,10]} |T̂_2(x)| = 81/161 ≈ 0.503.
A sharper bound would be obtained by a polynomial for which max{ |P_2(λ_i)|, i = 1, 2, 3 } is minimum. Writing P_2(x) = 1 + a x + b x^2, we look for the optimal a, b by imposing the conditions |P_2(1)| = |P_2(2)| = |P_2(10)|, i.e. by solving the linear system
(P_2(1) =)   1 + a + b = γ
(P_2(2) =)   1 + 2a + 4b = −γ
(P_2(10) =)  1 + 10a + 100b = γ,
which gives a = −33/42, b = 3/42 and γ = 2/7 ≈ 0.2857.
Summing up, the bound on ‖e_2‖_A / ‖e_0‖_A is:
≤ 0.503   (Chebyshev polynomial)
≤ 0.2857  ("optimal" polynomial)
= 0.2857  (computed ratio (1))
(1) Maximum reduction over 1000 CG runs with random right-hand side and random initial guess.
Convergence of the CG Method
A bound of the error norm reduction can be given in terms of shifted and scaled Chebyshev polynomials which satisfy a so called minimax condition.
Theorem
‖e_k‖_A / ‖e_0‖_A ≤ 2 ( (√κ(A) − 1) / (√κ(A) + 1) )^k.
Estimate of the number of iterations needed to gain p digits in the approximation of the solution:
‖e_k‖_A / ‖e_0‖_A ≤ 10^{-p}   for   k ≥ p / log10( (√κ(A) + 1)/(√κ(A) − 1) ) ≈ 1.15 p (√κ(A) + 1).
Preconditioning
Fast convergence of the CG method is guaranteed if κ(A) is sufficiently small.
To obtain a faster convergence we would like
either to reduce the condition number
or to obtain a clustering of the majority of the eigenvalues around a single value
Preconditioning a linear system consists in (pre)multiplying the system by a nonsingular matrix, denoted M^{-1}, thus yielding the equivalent linear system
M^{-1} A x = M^{-1} b   (9)
in order to accelerate the iterative process.
The preconditioner M should be easily (cheaply) invertible; in other words, it should allow a fast solution of linear systems of the form M y = c.
Problem. The new system matrix M^{-1}A is no longer symmetric.
If M is SPD then we can write M = U D U^T, with D diagonal and d_ii > 0. Hence there exists an SPD matrix M^{1/2} = U D^{1/2} U^T such that M = M^{1/2} M^{1/2}.
System (9) can be written as
M^{-1/2} M^{-1/2} A x = M^{-1/2} M^{-1/2} b
or, premultiplying by M^{1/2},
M^{-1/2} A x = M^{-1/2} b.
Let us now define x' = M^{1/2} x, b' = M^{-1/2} b; we can rewrite the system as
M^{-1/2} A M^{-1/2} x' = b'   (10)
with the system matrix now SPD (and also similar to M^{-1}A).
Write the CG method for system (10), with B = M^{-1/2} A M^{-1/2}. The quantities transform as follows:
x'_k = M^{1/2} x_k,   p'_k = M^{1/2} p_k,
r'_k = b' − B x'_k = M^{-1/2} b − M^{-1/2} A M^{-1/2} M^{1/2} x_k = M^{-1/2} r_k,
α_k = r'^T_k r'_k / (p'^T_k B p'_k) = r_k^T M^{-1} r_k / (p_k^T A p_k),   β_k = r'^T_{k+1} r'_{k+1} / (r'^T_k r'_k) = r_{k+1}^T M^{-1} r_{k+1} / (r_k^T M^{-1} r_k).
The recurrences transform back to the original variables:
x'_{k+1} = x'_k + α_k p'_k  ⟹  M^{1/2} x_{k+1} = M^{1/2} x_k + α_k M^{1/2} p_k  ⟹  x_{k+1} = x_k + α_k p_k
r'_{k+1} = r'_k − α_k B p'_k  ⟹  M^{-1/2} r_{k+1} = M^{-1/2} r_k − α_k M^{-1/2} A M^{-1/2} M^{1/2} p_k  ⟹  r_{k+1} = r_k − α_k A p_k
p'_{k+1} = r'_{k+1} + β_k p'_k  ⟹  M^{1/2} p_{k+1} = M^{-1/2} r_{k+1} + β_k M^{1/2} p_k  ⟹  p_{k+1} = M^{-1} r_{k+1} + β_k p_k
GOOD NEWS: computing M^{1/2} (or M^{-1/2}) is not really needed.
Preconditioning CG
Input: x_0, A, M, b, kmax, tol
r_0 = b − A x_0, p_0 = M^{-1} r_0, k = 0, ρ_0 = r_0^T M^{-1} r_0
while ‖r_k‖ > tol ‖b‖ and k < kmax do
 1  z_k = A p_k
 2  α_k = ρ_k / (z_k^T p_k)
 3  x_{k+1} = x_k + α_k p_k
 4  r_{k+1} = r_k − α_k z_k
 5  g_{k+1} = M^{-1} r_{k+1},  ρ_{k+1} = r_{k+1}^T g_{k+1}
 6  β_k = ρ_{k+1} / ρ_k
 7  p_{k+1} = g_{k+1} + β_k p_k
 8  k = k + 1
end while
Cost of each iteration:
1. 3 scalar products + 3 daxpys (6 O(n) operations);
2. 1 matrix-vector product;
3. 1 application of the preconditioner (solution of M g_{k+1} = r_{k+1}).
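In practice the whole preconditioned loop is available as MATLAB's built-in pcg; a minimal sketch combining it with an incomplete Cholesky preconditioner (the drop tolerance 1e-3 and the iteration limit are arbitrary choices):
% Preconditioned CG with an incomplete Cholesky factor, M = L*L'
L = ichol(A, struct('type','ict','droptol',1e-3));
[x, flag, relres, iter] = pcg(A, b, 1e-8, 1000, L, L');
fprintf('PCG: flag=%d, relres=%.2e, iterations=%d\n', flag, relres, iter);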
How to choose the preconditioner?
It should mediate between two opposite requirements:
1 Reducing the number of iterations (by reducing the condition number – clustering eigenvalues).
2 Allowing cheap solution of system Mgk+1 = rk+1.
Classical preconditioners
Jacobi preconditioner M = D where D is the (spd) diagonal matrix satisfying
dii = aii
Computation of M−1rk+1 costs O(n) operations and hence is very cheap.
If A is diagonally dominant, this preconditioner is expected to reduce the condition number of the system
Approximate inverse preconditioners. They build M^{-1} = Z^T Z, where Z is a sparse lower triangular matrix approximating L^{-1}.
AINV approach (Benzi et al.) Z is constructed by means of a sparse Gram-Schmidt orthogonalization.
Particularly suited to parallel computation (its application needs two matrix-vector products).
Incomplete Cholesky preconditioner
M = LLT where L is a sparse approximation of the triangular factor obtained by the Cholesky factorization.
Common choice (−→ IC(0) preconditioner): set to zero values in L where also the coefficient in A is zero:
aij = 0 =⇒ lij ≡ 0
The IC factorization may not be properly defined even if A is spd (Existence results are known for M-matrices).
Computation of g_{k+1} = M^{-1} r_{k+1} is performed by the sequential solution of two sparse triangular linear systems: from g_{k+1} = (L L^T)^{-1} r_{k+1} we obtain L L^T g_{k+1} = r_{k+1} and hence
L y = r_{k+1},   then   L^T g_{k+1} = y.
Computational cost:
The cost of a single iteration is roughly doubled with respect to the non-preconditioned CG.
The condition number may decrease by 1-2 orders of magnitude ⟹ remarkable decrease in the number of iterations.
The diffusion equation
A number of different physical processes can be described by the diffusion equation:
∂u(x⃗, t)/∂t − ∇·( K ∇u(x⃗, t) ) = f(x⃗),   x⃗ ∈ Ω, t ∈ (0, T).
Here u might represent the temperature distribution at time t in an open domain Ω ⊂ R^d, to which an external heat source f is applied.
The positive definite matrix K(x⃗) is the thermal conductivity of the material.
To determine the temperature at time t, we need to know an initial temperature distribution u(x⃗, 0) and some boundary conditions, of the form
u(x⃗, t) = 0 on ∂Ω_1 (Dirichlet BCs),   ∇u(x⃗, t) · n⃗ = 0 on ∂Ω_2 (Neumann BCs),
where n⃗ is the outward normal to the domain Ω.
Laplace Equation: Finite Difference discretization
A simplified form of the diffusion equation is the Laplace equation (setting K ≡ I and f ≡ 0), whose solution gives the steady state.
Let Ω be an open subset of R^2 and Γ its boundary (consider only Dirichlet BCs):
∂^2 u/∂x^2 + ∂^2 u/∂y^2 = 0,   (x, y) ∈ Ω,
u(x, y) = g(x, y),   (x, y) ∈ Γ.
In the Finite Difference discretization the domain is subdivided into rectangles whose vertices form a grid as in the figure.
Laplace Equation: Finite Difference discretization
For every internal gridpoint (i, j) the partial derivatives are replaced by a standard centered difference approximation in both directions:
∂^2 u/∂x^2 ≈ ( u_{i+1,j} − 2 u_{i,j} + u_{i−1,j} ) / Δx^2,   ∂^2 u/∂y^2 ≈ ( u_{i,j+1} − 2 u_{i,j} + u_{i,j−1} ) / Δy^2.
Assume that:
Ω = (0, 1) × (0, 1) is the unit square;
the grid spacing Δx = Δy ≡ h is constant.
We call nx the number of gridpoints along the x and y axes, which is equal to
nx = 1/h − 1.
The Laplace equation at node (i, j) is then approximated by
u_{i+1,j} + u_{i−1,j} + u_{i,j+1} + u_{i,j−1} − 4 u_{i,j} = 0,   1 ≤ i, j ≤ nx.
The result is a system with as many rows as the interior nodes of the grid.
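For reference, the matrix of this 5-point scheme on the unit square can be assembled in MATLAB with Kronecker products (a minimal sketch; the h^{-2} scaling matches the form used in the following slides):
nx = 50;  h = 1/(nx+1);                     % interior grid points per direction
e = ones(nx,1);
T = spdiags([-e 2*e -e], -1:1, nx, nx);     % 1D second-difference matrix tridiag(-1,2,-1)
I = speye(nx);
A = (kron(I,T) + kron(T,I)) / h^2;          % 2D minus-Laplacian, size nx^2, SPD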
Linear system arising from FD discretization
Every interior node produces a linear equation with the values of temperature at nodes as unknowns.
Temperature on boundary nodes is known and contributes to the right hand side.
Now u is a vector with global ordering of the unknowns (see previous figure).
Turning now to the diffusion equation:
After discretization in space, the PDE is converted into a system of ODEs of the form
∂u/∂t = A u + b.
The stiffness matrix A is symmetric negative definite.
Finally, discretization in time transforms the system of ODEs into a linear system, of size equal to the number of gridpoints, to be solved at each time step.
Time discretization divides the time interval (0, T] into subintervals of length Δt_i, so that the solution in time is computed at t_k = ∑_{i=1}^k Δt_i, and uses Finite Differences to approximate the time derivative.
For example, applying the implicit Euler method ( u̇ ≈ (u(t_{k+1}) − u(t_k)) / Δt_k ) leads to the linear system (I − Δt_k A) u^{k+1} = u^k + Δt_k b to be solved at each step.
Eigenvalues and condition number of the Laplacian
The 2D FD discretization of the (minus) Laplace equation takes a general form (in the case Δx = Δy = h):
A = (1/h^2) ( I ⊗ S + S ⊗ I ),
i.e. A is block tridiagonal with diagonal blocks tridiag(−1, 4, −1) and off-diagonal blocks −I, where S = tridiag(−1, 2, −1) is the 1D second-difference matrix.
Note that A is SPD, as the Laplacian operator is symmetric negative definite.
The eigenvalues of S, and hence those of A, are explicitly known since S is tridiagonal and A block tridiagonal:
λ_{j,k}(A) = (4/h^2) ( sin^2(jπh/2) + sin^2(kπh/2) ),   1 ≤ j, k ≤ nx.
Eigenvalues and condition number of the Laplacian
Theorem. The smallest and largest eigenvalues of A behave like λ_min ≈ 2π^2, λ_max ≈ 8 h^{-2}.
Proof.
λ_max = (8/h^2) sin^2(nx πh/2) = (8/h^2) cos^2(πh/2) ≈ 8 h^{-2},
λ_min = (8/h^2) sin^2(πh/2) ≈ 2π^2.
The condition number of A therefore behaves like
κ(A) = (4/π^2) h^{-2} + O(1)
(this also holds in 3 dimensions).
Hence the number of iterations of the CG for solving Ax = b is proportional to √κ(A) ≈ h^{-1} ≈ √n for the 2D Laplacian.
Finite Difference discretization of the steady-state Laplace equation on a unitary square domain with nx ∈ {100, 200, 400, 800}.
Solver: CG (no preconditioner), PCG accelerated with IC preconditioner provided by MatLAB L = ichol(A,struct(’type’,’ict’,’droptol’,tol)); based on drop tolerances of tol = 10−2, 10−3.
nx ≈ h^{-1}      n        CG    PCG (10^{-2})   PCG (10^{-3})
   98          9 604     370        65              27
  198         39 204     719       120              47
  398        158 404    1365       230              88
  798        636 804    2655       409             169
Table: Number of iterations of CG and PCG with the IC factorization.
Note from the table the dependence of the number of iterations on h^{-1} ≈ √n.
Numerical Exercise
Exercise 1.
1 Implement in MatLAB the Jacobi and SOR methods (recall Gauss-Seidel is equivalent to SOR with ω = 1).
2 Generate as A the matrix discretizing the Laplacian operator. This matrix can be obtained by the following MatLAB commands
G = numgrid('S', 30);  A = delsq(G);
This will produce an SPD matrix with h = 1/(30 − 1) of size n = (30 − 2)^2 = 784 (it is 2-cyclic and consistently ordered).
3. Write a script which solves the linear system Ax = b (with b = A*ones(n,1)) with the Jacobi, Gauss-Seidel and SOR (ω = 1.2) methods. Produce a semilogarithmic picture with the convergence profiles of the three methods.
4. Evaluate the spectral radius of the Jacobi iteration matrix. With this information it is possible to compute the number of iterations needed to reduce the error by p orders of magnitude. Compare this value with the actual number of iterations. The same can be done for GS and SOR by the Young-Varga theorem. A possible starting point is sketched below.
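A sketch of the setup (it only builds the matrix, the right-hand side and the Jacobi spectral radius, leaving the iteration loops to the reader):
G = numgrid('S', 30);  A = delsq(G);           % 2D Laplacian, n = 28^2 = 784
n = size(A,1);  b = A*ones(n,1);               % exact solution is the vector of ones
HJ = speye(n) - spdiags(1./diag(A),0,n,n)*A;   % Jacobi iteration matrix I - D^{-1}A
rhoJ = abs(eigs(HJ, 1, 'LM'));                 % spectral radius of H_J
fprintf('rho(H_J) = %.4f, predicted iterations per digit: %.1f\n', rhoJ, -1/log10(rhoJ));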
Numerical Exercise
Exercise 2.
1. Referring to the previous matrix, compute the optimal value of ω and run the SOR method with this value.
2. Implement the preconditioned conjugate gradient method. Verify the correctness of your implementation by comparing it with the MatLAB pcg function (see the help).
3. Solve the system Ax = b with the CG method (no preconditioning).
4. Solve the system Ax = b with the CG method with the incomplete Cholesky preconditioner (ichol, with no fill-in).
5. Produce a semilogarithmic figure with the convergence profiles of the three methods.
6. How many iterations do we expect for the CG without preconditioning?
The so-called successive over-relaxation method (SOR in short) is a generalization of the Gauss-Seidel method:
x_{k+1} = −(D + L)^{-1} U x_k + (D + L)^{-1} b = E_GS x_k + q_GS.
It is based on a real parameter ω.
SOR method
x_{k+1} = (D + ωL)^{-1} ( (1 − ω)D − ωU ) x_k + ω (D + ωL)^{-1} b = E_SOR x_k + q_SOR.
Componentwise: x_i^{k+1} = (1 − ω) x_i^k + ω x_{i,GS}^{k+1}, with x_{i,GS}^{k+1} the Gauss-Seidel-type update.
Theorem (Young & Varga)
If A is block tridiagonal with nonsingular diagonal blocks and all the eigenvalues of the Jacobi iteration matrix are real, then
1. λ(E_GS) = λ(E_J)^2;
2. if in addition ρ(E_J) < 1, then there is an optimal value of ω:
ω_opt = 2 / ( 1 + √(1 − ρ(E_J)^2) ).
Recall that for the FD Laplacian A = (1/h^2)(I ⊗ S + S ⊗ I), with diagonal D = diag(A) = (4/h^2) I. Hence the eigenvalues of the Jacobi iteration matrix E_J = I − D^{-1}A are
λ_{j,k}(E_J) = 1 − sin^2(jπh/2) − sin^2(kπh/2),
so that ρ(E_J) = cos(πh) ≈ 1 − (πh)^2/2: convergence is very slow for small h.
What about CG? Asymptotically the reduction factor is
µ = (√κ − 1)/(√κ + 1) ≈ 1 − πh,
since √κ(A) ≈ 2/(πh).
Some results
2D discretization of the diffusion equation by FD, with h = 0.005, tol = 10^{-8}.
Results:
Method         iterations   CPU time   comments
Jacobi           100000      262.01    stopped with ‖r_k‖ > 10^{-6}
Gauss-Seidel      62207      186.36
SOR                 817        2.68    ω = 1.969 ≈ ω_opt
SOR                1819        5.98    ω = 1.95
SOR                1207        3.91    ω = 1.98
CG                  357        1.61    no prec
CG                  146        1.38    IC(0) prec
2D discretization of the diffusion equation by FD, with h = 0.0025, tol = 10^{-8}.
Results:
Method   iterations   CPU time   comments
SOR         1614        73.16    ω = 1.984 ≈ ω_opt
CG           702        34.48    no prec
CG           244        22.84    IC(0) prec
Results and Comments
Results with h = 0.00125, tol = 10^{-8}:
Method   iterations   CPU time   comments
SOR         3572       605.40    ω = 1.992 ≈ ω_opt
CG          1380       345.04    no prec
CG           451       199.35    IC(0) prec
Comments
Theoretically SOR(ω_opt) is the fastest method, with CG very close.
ω_opt is in general very difficult to assess.
The convergence estimate for SOR is sharp, while the CG bound is only an upper estimate (often pessimistic).
CG can be preconditioned, while SOR cannot.
CG performs better than the estimates also when the eigenvalues are spread inside the spectral interval.
The dependence of the condition number on h^{-2} (and of the number of iterations on h^{-1}) is only alleviated by preconditioning.
SOR depends dramatically on ω. When solving the FD-discretized diffusion equation, it takes advantage of the a priori knowledge of the spectral interval.
CG is expected to converge only on SPD matrices. Why?
One reason is that for non SPD matrices it is impossible to have at the same time orthogonality and a minimization property by means of a short term recurrence.
CG at iteration k minimizes the error on a subspace of size k and constructs an orthogonal basis using a short-term recurrence.
Extensions of CG for general nonsymmetric matrices:
1. CG method applied to the (SPD) system A^T A x = A^T b (normal equations). This system is often ill-conditioned; in fact
κ_2(A^T A) = λ_max(A^T A) / λ_min(A^T A) = ( ‖A‖_2 · ‖A^{-1}‖_2 )^2 = κ_2(A)^2.
It is also difficult to find a preconditioner if A^T A cannot be formed explicitly.
2 Methods that provide orthogonality + minimization by using a long-term recurrence (GMRES)
3 Methods that provide (bi) orthogonality. Examples: BiCG, BiCGstab
4 Methods that provide some minimization properties. Examples: QMR, TFQMR.
The GMRES method
Given an arbitrary nonzero vector v, a Krylov subspace of dimension m is defined as
K_m(v) = span( v, Av, A^2 v, . . . , A^{m−1} v ).
The GMRES (Generalized Minimal RESidual) method finds the solution of the linear system
Ax = b
by minimizing the norm of the residual r_m = b − A x_m over all the vectors x_m written as
x_m = x_0 + y,   y ∈ K_m(r_0),
where x_0 is an arbitrary initial vector and K_m is the Krylov subspace generated by the initial residual. First note that the basis
{ r_0, A r_0, A^2 r_0, . . . , A^{m−1} r_0 }
is not suitable for numerical computation: its vectors tend to become almost linearly dependent, as the following result shows.
Theorem
If the eigenvalues of A are such that |λ_1| > |λ_2| ≥ · · · then, up to a scaling factor,
A^k r_0 = v_1 + O( (|λ_2|/|λ_1|)^k ),
with v_1 the eigenvector corresponding to λ_1.
To compute a really independent basis for K_m we have to orthonormalize these vectors using the Gram-Schmidt procedure.
This algorithm is known as the Arnoldi method.
1: β = ‖r_0‖, v_1 = r_0/β
2: for k = 1 : m do
3:   w_{k+1} = A v_k
4:   for j = 1 : k do
5:     h_{jk} = w_{k+1}^T v_j
6:     w_{k+1} = w_{k+1} − h_{jk} v_j
7:   end for
8:   h_{k+1,k} = ‖w_{k+1}‖;  v_{k+1} = w_{k+1} / h_{k+1,k}
9: end for
W. E. Arnoldi,
The Principle of Minimized Iteration in the Solution of the Matrix Eigenvalue Problem,
Quart. Appl. Math., 1951
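A direct MATLAB transcription of the Arnoldi process (a sketch; V collects the orthonormal basis vectors as columns and H is the (m+1) x m Hessenberg matrix):
function [V, H] = arnoldi(A, r0, m)
% Arnoldi: build an orthonormal basis of K_m(A, r0) by modified Gram-Schmidt.
n = length(r0);
V = zeros(n, m+1);  H = zeros(m+1, m);
V(:,1) = r0 / norm(r0);
for k = 1:m
    w = A*V(:,k);
    for j = 1:k
        H(j,k) = w' * V(:,j);    % orthogonalize against previous basis vectors
        w = w - H(j,k)*V(:,j);
    end
    H(k+1,k) = norm(w);
    if H(k+1,k) == 0, break; end % breakdown: K_k is invariant under A
    V(:,k+1) = w / H(k+1,k);
end
end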
Proposition
If the Arnoldi process does not stop before the m-th step, then the vectors v_1, . . . , v_m form an orthonormal basis of the Krylov subspace K_m. Moreover, if the Arnoldi algorithm breaks down at step j (i.e. h_{j+1,j} = 0), then K_j is invariant under A (A K_j ⊆ K_j).
Theorem
The new vectors v_k satisfy
A v_k = ∑_{j=1}^{k+1} h_{jk} v_j,   k = 1, . . . , m.
Proof. From steps 4-7 of the Gram-Schmidt orthogonalization we have
w_{k+1} = A v_k − ∑_{j=1}^{k} h_{jk} v_j.
Substituting w_{k+1} = h_{k+1,k} v_{k+1} we obtain h_{k+1,k} v_{k+1} = A v_k − ∑_{j=1}^{k} h_{jk} v_j.
Setting V_m = [v_1, . . . , v_m] and H̄_m = (h_{jk}), these relations can be written in matrix form as
A V_m = V_{m+1} H̄_m,
which also implies V_m^T A V_m = H_m, where H_m is the square matrix obtained by dropping the last row of H̄_m.
H̄_m is a rectangular (m + 1) × m matrix. It is a Hessenberg matrix, since h_{jk} = 0 for j > k + 1: the only nonzeros below the main diagonal are h_{21}, h_{32}, . . . , h_{m+1,m}.
The Full Orthogonalization Method (FOM)
After the Arnoldi process, different selection of the subspace Lm yields different methods. Consider e.g. the choice Lm = Km.
This gives rise to the Full Orthogonalization Method (FOM), which can be easily developed by imposing the orthogonality condition K_m ⊥ r_m:
0 = V_m^T r_m = V_m^T ( r_0 − A V_m y ) = V_m^T r_0 − V_m^T A V_m y,
which gives
y = ( V_m^T A V_m )^{-1} V_m^T r_0 = H_m^{-1} V_m^T r_0 = β H_m^{-1} e_1,
where H_m is the (square) matrix obtained by dropping the last row of H̄_m and we used V_m^T r_0 = β e_1.
Finally,
x_m = x_0 + V_m y = x_0 + V_m H_m^{-1} V_m^T r_0.
Let us turn to the GMRES method.
It aims at minimizing ‖r_m‖ among all the vectors
x_m = x_0 + ∑_{j=1}^m y_j v_j = x_0 + V_m y.
Now
r_m = b − A x_m = r_0 − A V_m y = β v_1 − V_{m+1} H̄_m y = V_{m+1} ( β e_1 − H̄_m y ).
Y. Saad and M. H. Schultz,
GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems,
SIAM Journal on Scientific and Statistical Computing, 1986
Minimizing the residual
‖r_m‖ = √( r_m^T r_m ) = √( (β e_1 − H̄_m y)^T V_{m+1}^T V_{m+1} (β e_1 − H̄_m y) ) = ‖ β e_1 − H̄_m y ‖.
Comments:
H̄_m has more rows than columns, hence H̄_m y = β e_1 has (in general) no solution.
The minimization problem y = argmin ‖β e_1 − H̄_m y‖ is very small (m = 20, 50, 100) compared to the original size n. Its computational solution is therefore very cheap.
Least square minimization
Any (even rectangular) matrix H can be factorized as a product of an orthogonal matrix Q and an “upper triangular” matrix R (known as QR factorization).
When H is not square the resulting R has as many final zero rows as the gap between rows and columns.
Let us now factorize our H̄_m as H̄_m = Q R (computational cost O(m^3)). Then, in view of the orthogonality of Q,
min ‖β e_1 − H̄_m y‖ = min ‖Q ( β Q^T e_1 − R y )‖ = min ‖g − R y‖,   g = β Q^T e_1.
Here R is (m + 1) × m and upper triangular, with its last row equal to zero:
R = [ R̂ ; 0 ],   R̂ upper triangular m × m.
The solution of min ‖g − R y‖ is simply accomplished by solving R̂ y = ĝ, where R̂ is obtained from R by dropping the last row and ĝ collects the first m components of g.
This last system, being square, small and upper triangular, is easily and cheaply solved.
Finally note that min ‖r_m‖ = |g_{m+1}|.
Obtaining GMRES by the orthogonality condition
The GMRES method can be developed in an alternative way by exploiting the orthogonality condition L_m ≡ A K_m ⊥ r_m, which reads
V_m^T A^T ( r_0 − A V_m y ) = 0.
This gives
y = ( V_m^T A^T A V_m )^{-1} V_m^T A^T r_0
  = ( H̄_m^T H̄_m )^{-1} H̄_m^T V_{m+1}^T r_0
  = ( H̄_m^T H̄_m )^{-1} H̄_m^T β e_1,
which is mathematically equivalent to the solution of the least squares problem
H̄_m y = β e_1.
Algorithm: GMRES
Input: x_0, A, b, kmax, tol
r_0 = b − A x_0, k = 0, ρ_0 = ‖r_0‖, β = ρ_0, v_1 = r_0/β
while ρ_k > tol ‖b‖ and k < kmax do
 1  k = k + 1
 2  v_{k+1} = A v_k
 3  for j = 1, k
      h_{jk} = v_{k+1}^T v_j
      v_{k+1} = v_{k+1} − h_{jk} v_j
    end for
 4  h_{k+1,k} = ‖v_{k+1}‖
 5  v_{k+1} = v_{k+1} / h_{k+1,k}
 6  z_k = argmin ‖β e_1 − H̄_k z‖
 7  ρ_k = ‖β e_1 − H̄_k z_k‖
end while
x_k = x_0 + V_k z_k
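A compact MATLAB sketch of the algorithm above (unrestarted; the small least-squares problem is solved with backslash, which internally uses a QR factorization as discussed in the previous slides):
function [x, resvec] = gmres_simple(A, b, x0, m, tol)
% Plain GMRES sketch: Arnoldi basis of K_m(A,r0) plus the small
% least-squares problem min ||beta*e1 - Hbar*y|| at each step.
r0 = b - A*x0;  beta = norm(r0);  nb = norm(b);
n = length(b);
V = zeros(n, m+1);  H = zeros(m+1, m);  V(:,1) = r0/beta;
resvec = zeros(m,1);
for k = 1:m
    w = A*V(:,k);
    for j = 1:k
        H(j,k) = w'*V(:,j);  w = w - H(j,k)*V(:,j);
    end
    H(k+1,k) = norm(w);
    rhs = zeros(k+1,1);  rhs(1) = beta;      % beta*e1
    y = H(1:k+1,1:k) \ rhs;                  % small least-squares problem
    resvec(k) = norm(rhs - H(1:k+1,1:k)*y);  % current residual norm
    if H(k+1,k) < eps || resvec(k) <= tol*nb
        resvec = resvec(1:k);  break;
    end
    V(:,k+1) = w / H(k+1,k);
end
x = x0 + V(:,1:k)*y;
end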
Practical GMRES implementations.
GMRES is optimal in the sense that it minimizes the residual over a subspace of increasing size (finite termination property).
GMRES is a VERY computationally costly method.
1 Storage: It needs to keep in memory all the vectors of Krylov basis. When the number of iterations becomes large (order of hundreds) the storage may be prohibitive.
2 Computational cost. Main operations at each iteration
1 One matrix-vector product 2 At iteration k: k scalar products and k updates of vectors
The cost of a Gram-Schmidt orthogonalization increases with the iteration number k (O(kn)) so again problems arise when the number of iteration is high.
Practical implementations fix a maximum number of vectors to be kept in memory, say p. After p iterations, x_p is computed and a new Krylov subspace is constructed starting from r_p.
p is the restart parameter.
In this way, however, we lose optimality, with a consequent slowing down of the method.
Assume A is diagonalizable:
A = V Λ V^{-1},
where Λ is the diagonal matrix of eigenvalues and the columns of V are normalized eigenvectors of A. Then
‖r_k‖ = min_{P_k ∈ Π_k, P_k(0)=1} ‖ V P_k(Λ) V^{-1} r_0 ‖,
or
‖r_k‖ / ‖r_0‖ ≤ κ(V) min_{P_k} max_i |P_k(λ_i)|.
Matrix V may be ill-conditioned.
We cannot simply relate the convergence of GMRES to the eigenvalue distribution of A.
GMRES converges in at most s iterations, where
s is the number of distinct eigenvalues of A (if diagonalizable),
s is the degree of the minimal polynomial of A (if not).
Effect of restart
Nonsymmetric matrix with n = 7100 and nz = 35104, κ(A) = 1.98× 104.
[Figure: GMRES convergence profiles (relative residual vs. iterations) for restart = 10, 20, 40, 80, 160.]
restart   iter   CPU
   10     7337   3.1018
   20     3664   2.1784
   40     1954   1.8013
   80     1043   1.7907
  160      592   2.0677
Also for general systems preconditioning is premultiplying the original system by a nonsingular matrix.
Here preconditioning, more than reducing condition number of A is aimed at clustering eigenvalues away from the origin.
Common general preconditioners.
1 Diagonal: M = diag(Ai), with Ai the ith row of A.
2 ILU decomposition: M = LU (based on pattern and/or on dropping tolerance).
3 Approximate inverse preconditioners
Computational cost. Instead of applying A, one should apply M^{-1}A to a vector (step 2 of the algorithm):
t = A v_k,   then solve M v_{k+1} = t.
If M = LU (ILU preconditioner), step 2 becomes:
1: v = A v_k;   2: solve L w = v;   3: solve U t = w   (and set v_{k+1} = t).
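In MATLAB the whole ILU-preconditioned iteration is available through the built-in ilu and gmres functions; a minimal sketch (restart length, tolerances and drop tolerance are arbitrary choices):
% ILU-preconditioned restarted GMRES: implicitly solves (LU)^{-1} A x = (LU)^{-1} b
setup = struct('type','ilutp','droptol',1e-3);
[L, U] = ilu(A, setup);
restart = 40;
[x, flag, relres, iter] = gmres(A, b, restart, 1e-8, 200, L, U);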
Other methods
MINRES. Well suited for symmetric but indefinite (with both negative and positve eigenvalues) linear systems. It needs however an SPD preconditioner.
C. C. Paige and M. A. Saunders. Solution of sparse indefinite systems of linear equations, SIAM J. Numerical Analysis 12, 617-629, 1975
BiCG (Roger Fletcher, 1976). A short-term recurrence constructs two sequences of conjugate directions and residuals by means of a biorthogonalization process. It requires multiplication by AT .
Bi-Conjugate Gradient Stabilized (BiCGSTAB). It is a stable variant of BiCG avoiding erratic convergence behaviour.
H. A. van der Vorst. Bi-CGSTAB: A Fast and Smoothly Converging Variant of Bi-CG for the Solution of Nonsymmetric Linear Systems, SIAM J. Sci. Stat. Comput., 1992
· · · and many others · · ·
The MINRES method
It is an efficient variant of GMRES for solving symmetric (not SPD) linear systems.
If A is symmetric but indefinite, T̄_m ≡ H̄_m = H̄_m^T on the square part, hence it is tridiagonal.
As a consequence, the current vector v_{k+1} must be orthogonalized only against the two previous vectors v_k and v_{k−1}. Hence the cost of a single MINRES iteration is constant (like the CG method).
Basic operations for the construction of the orthonormal Krylov basis (Lanczos recurrence):
1: v_k = v_k / β_k
2: v_{k+1} = A v_k − β_k v_{k−1}
3: α_k = v_{k+1}^T v_k;  v_{k+1} = v_{k+1} − α_k v_k
4: β_{k+1} = ‖v_{k+1}‖
Prove as an exercise that the choice of β_k above yields orthogonality between v_{k−1} and v_{k+1}.
As in the case of GMRES, the MINRES method minimizes the norm of the residual r_m by finding the least squares solution of the overdetermined linear system
T̄_m y_m = β e_1.   (11)
The matrix T̄_m is (m + 1) × m and tridiagonal, with the coefficients α_k on the main diagonal and the β_k on the two adjacent diagonals. As in GMRES,
r_m = V_{m+1} ( β e_1 − T̄_m y ).   (12)
In analogy with the GMRES method, and recalling that V_{m+1}^T V_{m+1} = I_{m+1}, the solution y_m can be found by performing a QR factorization of T̄_m.
Moreover, the solution of the least squares problem min ‖β e_1 − T̄_m y‖ can be computed in an incremental way (updating the QR factorization of the previous step).
The MINRES convergence bound reads
‖r_k‖ / ‖r_0‖ ≤ min_{P_k ∈ Π_k, P_k(0)=1} max_{x∈[a,b]} |P_k(x)|,
where [a, b] contains the spectrum of A; here, differently from the SPD case, we allow zero to belong to [a, b].
Worst-case MINRES convergence behavior: replace the discrete set of the eigenvalues by the union of two intervals containing all of them and excluding the origin, say
I^- = [λ_min, λ_s],   I^+ = [λ_{s+1}, λ_max],
with λ_min ≤ λ_s < 0 < λ_{s+1} ≤ λ_max.
When both intervals are of the same length, i.e. λ_max − λ_{s+1} = λ_s − λ_min, the following bound for the min-max value holds:
‖r_k‖ / ‖r_0‖ ≤ min_{P_k(0)=1} max_{x∈I^-∪I^+} |P_k(x)| ≤ 2 ( ( √(|λ_min λ_max|) − √(|λ_s λ_{s+1}|) ) / ( √(|λ_min λ_max|) + √(|λ_s λ_{s+1}|) ) )^{⌊k/2⌋}.   (13)
Suppose that |λ_min| = λ_max = 1 and |λ_s| = λ_{s+1}. Then the two intervals have the same length; setting κ = λ_{s+1}^{-1}, the right-hand side of (13) reduces to
2 ( (κ − 1)/(κ + 1) )^{⌊k/2⌋},
which corresponds to the value of the right-hand side of the CG bound at step ⌊k/2⌋ for an SPD matrix having all its eigenvalues in the interval [λ_{s+1}^2, 1].
In the general case when the two intervals are not of the same length, the explicit solution of the min-max approximation problem on I− ∪ I+ becomes quite complicated,
and no simple and explicit bound on the MINRES convergence is known.
Jörg Liesen and Petr Tichý,
Convergence analysis of Krylov subspace methods,
GAMM, 2014
Preconditioner for MINRES
When A is symmetric and indefinite, any preconditioner for MINRES must be symmetric and positive definite. This is necessary since otherwise there is no equivalent symmetric system for the preconditioned matrix.
Thus, if a symmetric and indefinite preconditioner is employed for a symmetric and indefinite matrix, a nonsymmetric iterative method (e.g. GMRES) must in general be used.
A preconditioner for a symmetric indefinite matrix A for use with MINRES therefore cannot be an approximation of the inverse of A, since this is also indefinite.
With a symmetric and positive definite preconditioner M, the preconditioned MINRES convergence bounds above become
‖r_k‖_{M^{-1}} / ‖r_0‖_{M^{-1}} ≤ min_{P_k(0)=1} max_{x∈[a,b]} |P_k(x)| ≤ min_{P_k(0)=1} max_{x∈I^-∪I^+} |P_k(x)|,
where the intervals I^- and I^+ now refer to the eigenvalue distribution of the preconditioned matrix.
In summary: a good preconditioner must yield two intervals I^- = [−β, −α], I^+ = [a, b] for which the quantity (β · b)/(α · a) is as small as possible.
Block Diagonal preconditioner: spectral analysis
MINRES is particularly suited for indefinite saddle point linear systems like
H = [ A  B^T ; B  0 ],
with A ∈ R^{n×n} SPD and B ∈ R^{m×n} (m < n) rectangular.
Optimal (ideal) preconditioner:
M = [ A  0 ; 0  S ],   S = B A^{-1} B^T  (Schur complement).
Theorem. Every eigenvalue of M^{-1}H belongs to { 1, (1 − √5)/2, (1 + √5)/2 }.
Proof of the Theorem
Proof. λ ∈ σ(M^{-1}H) satisfies H u = λ M u for some u ≠ 0:
[ A  B^T ; B  0 ] [ u_1 ; u_2 ] = λ [ A  0 ; 0  S ] [ u_1 ; u_2 ],   i.e.   A u_1 + B^T u_2 = λ A u_1,   B u_1 = λ S u_2.
If u_2 = 0 then from the first equation we have A u_1 = λ A u_1, which implies λ = 1.
Assume now u_2 ≠ 0. Then multiplying the first equation by B A^{-1} on the left yields
B u_1 + S u_2 = λ B u_1.
Now substituting B u_1 with λ S u_2 from the second equation we get
(λ^2 − λ − 1) S u_2 = 0,   which gives λ = (1 ± √5)/2.
Note. This preconditioner is ideal, since its application requires solving with A and S. In practice Â and Ŝ are computed as cheap approximations of A and S, respectively, and M is defined as
M = [ Â  0 ; 0  Ŝ ].
MINRES is particularly suited for symmetric indefinite linear systems.
Example: The Stokes problem
Finite Element discretization yields a highly indefinite matrix
H = [ F  B^T ; B  0 ],
where F is the velocity block and B = [ B^x  B^y ] with
b^x_ij = −∫ ψ_i ∂φ_j/∂x,   b^y_ij = −∫ ψ_i ∂φ_j/∂y.
The (1,1) block is approximated by an incomplete Cholesky factorization of F on each velocity component:
G = [ IC(F)  0 ; 0  IC(F) ].
The Schur complement is not explicitly computed. It is approximated by the mass matrix Q,
q_ij = ∫ φ_i φ_j,
so that the block diagonal preconditioner is
M = [ G  0 ; 0  Q ].
Andrew J. Wathen, David J. Silvester, and Howard C. Elman
Finite Elements and Fast Iterative Solvers: With Applications in Incompressible Fluid Dynamics
Oxford University Press, 2nd edition, 2014