
4 Solving linear systems

"We must admit with humility that, while number is purely a product of our minds, space has a reality outside our minds."

Carl Friedrich Gauss (1777-1855)

"Almost everything, which the mathematics of our century has brought forth in the way of original scientific ideas, attaches to the name of Gauss."

Leopold Kronecker (1823-1891)

The problem treated in this chapter is the numerical solution of systems of m linear equations in n variables, of the form

Ax = b,

where A ∈ R^{m×n}, b ∈ R^m and x ∈ R^n is the unknown solution vector. Such systems of equations are encountered in almost every problem in scientific computing. Among the situations in which the numerical solution of such problems is required appear notably the approximation of the solution of ordinary or partial differential equations by finite difference, finite element or finite volume methods (Chapters 5, 6, ??). Even the solution of a nonlinear problem is usually obtained by solving a sequence of linear systems (Newton's method). Since the algorithms for solving linear systems are widely used in a large range of applications, the methods must be efficient, accurate, reliable and robust.

Only in the case where the matrix A has full row and column rank, i.e., rank(A) = m = n, does the linear system Ax = b have a unique solution for any right-hand side b. When rank(A) < n, the system either has infinitely many solutions (it is underdetermined) or has no solution (it is overdetermined).

Two classes of methods for solving systems of linear equations are of interest: direct methods and iterative methods. In a direct method, the matrix of the initial linear system is transformed or factorized, using elementary transformations, into a simpler form involving diagonal or triangular matrices, which can be solved easily. The exact solution is obtained in a finite number of arithmetic operations, if numerical rounding errors are not taken into account. The most prominent direct method is Gaussian elimination. Iterative methods, on the other hand, compute a sequence of approximate solutions, which


converge to the exact solution in the limit, i.e., in practice, until a desired accuracy is reached.

For a long time, direct methods were preferred to iterative methods for solving linear systems, mainly because of their simplicity and robustness. However, the emergence of conjugate gradient methods and Krylov subspace iterations provided an efficient alternative to direct solvers. Nowadays, iterative methods are almost mandatory in complex applications, notably because of memory and computational requirements that prohibit the use of direct methods. They usually involve a matrix-vector multiplication procedure that is cheap to compute on modern computer architectures. When the matrix A is very large and composed of a majority of zero elements, the LU factorization, for example, would contain many more nonzero coefficients than the matrix A itself. Nonetheless, in some particular applications, very ill-conditioned matrices arise that may require a direct method for solving the problem at hand.

In the important case where the matrix A is symmetric positive definite, about half of the work can be spared. If only a small fraction of the elements of A are nonzero, then the linear system is called sparse. Nowadays, without the knowledge and the exploitation of the sparsity of the matrix A, many application problems could not be solved.

Numerical methods for solving linear systems are a good illustration of the difference between analytical mathematics and "engineering" numerical analysis. Indeed, significant progress in the design of algorithms has been made in the last decades, thanks to the advent of efficient computer architectures. Again, we face this intriguing context where some perfectly theoretically sound methods turn out to be useless for computing the numerical solution.

In this chapter, we consider matrices with real or complex entries and therefore denote by K a field that is either R or C. In Section 1, the elementary properties of finite-dimensional vector spaces and matrix algebra are briefly reviewed, and statements may be given without proofs. When the matrix A is square, dense and without apparent structure, Gaussian elimination and LU factorization methods are likely to be the methods of choice, as will be seen in Section 2. Classical iterative algorithms and projection methods for solving sparse linear systems are presented in Section 3. Finally, a small section is devoted to methods for computing eigenvalues.

4.1 A linear algebra primer

Before introducing various methods for solving linear systems of the form Ax = b, we propose a brief overview of the main results in linear algebra suitable for our purpose. We refer the reader to classical courses in linear algebra and matrix computations for further details (see the bibliography section at the end of this chapter).


Since we refer frequently to vectors and matrices, let us recall some conventional notations. A lowercase letter like x will always denote a vector, and its jth component will be written x_j. Vectors are almost always considered as column vectors. A matrix is denoted by an uppercase letter like A, and its (i,j)th element will be written a_{ij}. M_{m,n}(K) will denote the set of m × n (real or complex) matrices. Any matrix A ∈ M_{m,n}(K) may be defined by its columns c_j ∈ K^m as A = [c_1 | … | c_n]. Given a matrix A = (a_{ij}), (A^t)_{ij} = a_{ji} and (A^*)_{ij} = \bar{a}_{ji} will denote the transpose of the matrix A and the conjugate transpose of A, respectively. If A ∈ M_{m,n}(K), then |A| ∈ M_{m,n}(K) denotes the matrix of the absolute values of the entries of A: (|A|)_{ij} = |a_{ij}|.

4.1.1 Basic notions

Definition 4.1.1. A vector space over the field K is a nonempty set V in which addition and scalar multiplication are defined and such that for all vectors u, v ∈ V and any scalars α, β ∈ K, the following properties hold:

1. addition is commutative and associative;
2. additive identity: u + 0 = u, where 0 is the zero vector;
3. additive inverse: for any u, there exists −u such that u + (−u) = 0;
4. distributivity properties:
\[ \forall \alpha \in K,\ \forall u, v \in V, \quad \alpha(u + v) = \alpha u + \alpha v, \]
\[ \forall \alpha, \beta \in K,\ \forall u \in V, \quad (\alpha + \beta)u = \alpha u + \beta u; \]
5. associative property:
\[ \forall \alpha, \beta \in K,\ \forall u \in V, \quad (\alpha\beta)u = \alpha(\beta u); \]
6. scalar multiplication identity: 1u = u.

The set W of linear combinations of a system of p vectors of V, {u_1, …, u_p}, is a subspace of V called the span of the vector system and denoted by
\[ W = \operatorname{span}\{u_1, \dots, u_p\} = \Big\{\, w = \sum_{i=1}^{p} \alpha_i u_i,\ \text{with } \alpha_i \in K \,\Big\}. \]
A set of m vectors {u_1, …, u_m} of V is called linearly independent if none of its elements can be expressed as a linear combination of the other vectors, i.e., if the relation
\[ \sum_{i=1}^{m} \alpha_i u_i = 0, \]
with (\alpha_i)_{1 \le i \le m} \in K, implies that every α_i = 0. Otherwise, it is called linearly dependent. A basis of V is then a linearly independent subset of V that spans V.


Let u = (u_1, …, u_n)^t be a vector over a field K. We denote by u^* ∈ K^n the adjoint of the column vector u, such that u^* = (\bar{u}_1, …, \bar{u}_n), and by u^t = (u_1, …, u_n) the transpose of u, a row vector. We recall the definition of the matrix-vector and matrix-matrix products. Given a matrix A ∈ M_{m,n}(K) and a vector u ∈ K^n, the vector v = Au ∈ K^m is such that
\[ v_i = \sum_{j=1}^{n} a_{ij}\, u_j, \qquad i = 1, \dots, m, \]
and given B ∈ M_{n,p}(K) we define C = AB ∈ M_{m,p}(K) as
\[ c_{ij} = \sum_{k=1}^{n} a_{ik}\, b_{kj}, \qquad i = 1, \dots, m, \ \ j = 1, \dots, p. \]
In the canonical basis of K^n, the dot product, also known as the scalar product, of two vectors u, v ∈ K^n is denoted (u, v) and is the scalar defined as
\[ (u, v) = \sum_{i=1}^{n} u_i\, \bar{v}_i. \]
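To make these componentwise definitions concrete, here is a minimal sketch (ours, not part of the original text) that implements the three formulas literally in Python and checks them against NumPy:

```python
import numpy as np

def matvec(A, u):
    """v_i = sum_j a_ij * u_j, for A of size m x n and u of size n."""
    m, n = len(A), len(A[0])
    return [sum(A[i][j] * u[j] for j in range(n)) for i in range(m)]

def matmat(A, B):
    """c_ij = sum_k a_ik * b_kj, for A of size m x n and B of size n x p."""
    m, n, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

def dot(u, v):
    """(u, v) = sum_i u_i * conj(v_i)."""
    return sum(ui * np.conj(vi) for ui, vi in zip(u, v))

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # 3 x 2
B = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]     # 2 x 3
u = [1.0, -1.0]
x = [1.0 + 1j, 2.0]

assert np.allclose(matvec(A, u), np.array(A) @ np.array(u))
assert np.allclose(matmat(A, B), np.array(A) @ np.array(B))
assert np.isclose(dot(x, x), np.vdot(x, x))   # sum of |x_i|^2
```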

We now consider the case of square matrices A, having n rows and n columns, that belong to the set M_n(K), which is a noncommutative algebra for the multiplication. The neutral element is usually denoted I_n and is defined by its entries (δ_{ij})_{1≤i,j≤n}, where δ_{ij} is the Kronecker symbol. We recall that a square matrix A ∈ M_n(K) is said to be invertible (or nonsingular) if there exists a matrix B ∈ M_n(K) such that AB = BA = I_n. This inverse matrix B is denoted A^{-1}. In contrast, a noninvertible matrix is called singular.

Definition 4.1.2. Suppose A ∈ M_n(K) is a square matrix.

• A is a normal matrix if and only if AA^* = A^*A;
• A is a unitary matrix if and only if AA^* = A^*A = I_n. Moreover, if K = R, A is an orthogonal matrix, AA^t = A^tA = I_n and A^t = A^{-1};
• A is a Hermitian matrix if and only if A^* = A. Moreover, if K = R, A is a symmetric or self-adjoint matrix and A^t = A.

Proposition 4.1.1. Every Hermitian matrix A is a normal matrix.

Proof. Suppose A ∈ M_n(K) is such that A^* = A. Then we have
\[ AA^* = AA = A^*A. \qquad \square \]

In many applications, we will consider sparse matrices, i.e., matrices primarily filled with zeros. In particular, a matrix A = (a_{ij}) ∈ M_n(K) is a lower triangular matrix if and only if a_{ij} = 0 for 1 ≤ i < j ≤ n; A is an upper triangular matrix if and only if a_{ij} = 0 for 1 ≤ j < i ≤ n; and A is a diagonal matrix if a_{ij} = 0 for i ≠ j. By contrast, a matrix that is mainly populated by nonzeros is called a dense matrix.


Lemma 4.1.1. Suppose L and U are two invertible lower triangular and upper triangular matrices in M_n(K), respectively. Then L^{-1} (resp. U^{-1}) is a lower (resp. upper) triangular matrix.

Proof. We show the first assertion only. Suppose A = (a_{ij}) is the inverse of L = (l_{ij}). Writing LA = I_n and using the fact that l_{ik} = 0 for k > i, we have
\[ \delta_{ij} = \sum_{k=1}^{n} l_{ik}\, a_{kj} = \sum_{k=1}^{i} l_{ik}\, a_{kj}, \qquad \forall\, 1 \le i, j \le n. \]
Hence, for i = 1 and j > 1 we have l_{11}a_{1j} = 0, yielding a_{1j} = 0. Likewise, for i = 2 and j > 2 we have l_{21}a_{1j} + l_{22}a_{2j} = l_{22}a_{2j} = 0 and thus a_{2j} = 0. By recurrence, we show that a_{ij} = 0 for every j > i. It is then easy to conclude that A, the inverse of L, is a lower triangular matrix. □

The trace of a square matrix A = (a_{ij}) ∈ M_n(K) is the sum of its diagonal elements:
\[ \operatorname{tr} A = \sum_{i=1}^{n} a_{ii}. \]
The determinant of a square matrix A = (a_{ij}) ∈ M_n(K) is defined as
\[ \det A = \sum_{\sigma \in S_n} \varepsilon(\sigma) \prod_{i=1}^{n} a_{i\sigma(i)}, \]
where ε(σ) = (−1)^{p(σ)} is the signature of the permutation σ, equal to 1 or −1, S_n is the set of all permutations of the set {1, 2, …, n} onto itself, and the number p(σ) is the number of inversions in σ. Hence ε(σ) is 1 if σ is even and −1 if σ is odd.

Lemma 4.1.2. Given two square matrices A and B in M_n(K), then

1. det(AB) = (det A)(det B) = det(BA);
2. det(A^t) = det A;
3. A is invertible if and only if det A ≠ 0.

The kernel, or null space, of a matrix A ∈ M_n(K) is the set, denoted Ker A, of vectors x ∈ K^n such that Ax = 0. The range of A is the set, denoted Im A, of vectors y ∈ K^n such that y = Ax for some x ∈ K^n. By definition, the dimension of the space Im A is called the rank of A and is denoted by rank A.

Lemma 4.1.3 (invertible matrix). For any square matrix A ∈ M_n(K), the following assertions are equivalent:

1. A is invertible;
2. Ker A = {0};
3. Im A = K^n;
4. there exists B ∈ M_n(K) such that AB = I_n and BA = I_n (and B = A^{-1}).


Lemma 4.1.4. Let A and B be two invertible matrices in M_n(K). Then
\[ (AB)^{-1} = B^{-1}A^{-1}. \]

The characteristic polynomial of A ∈ M_n(K) is the polynomial P_A(λ) of degree n defined on K by
\[ P_A(\lambda) = \det(A - \lambda I_n). \]
The n roots λ_i ∈ K of this polynomial, not necessarily distinct, are called the eigenvalues of A. A vector x ∈ K^n, x ≠ 0, such that Ax = λx is an eigenvector of A associated with the eigenvalue λ. There exists at least one such eigenvector. The set of eigenvalues of A is called the spectrum of A and is denoted Sp(A):
\[ \operatorname{Sp}(A) = \{\, \lambda_i \in K ;\ 1 \le i \le n ;\ \exists\, x_i \in K^n : x_i \neq 0,\ Ax_i = \lambda_i x_i \,\}. \]
The spectral radius of A is the maximum of the moduli of the eigenvalues of A:
\[ \rho(A) = \max_{1 \le i \le n} \{\, |\lambda_i| ;\ \lambda_i \in \operatorname{Sp}(A) \,\}. \]

Remark 4.1.1. 1. If A ∈ M_n(R), A may have complex eigenvalues;
2. the characteristic polynomial P_A is invariant under any change of basis, i.e., for any invertible matrix P we have
\[ \det(P^{-1}AP - \lambda I_n) = \det(A - \lambda I_n); \]
3. the eigenvalues of a Hermitian matrix are all real.

The vector subspace defined by E_λ = Ker(A − λI_n) is the eigenspace associated with the eigenvalue λ. Moreover, the vector subspace
\[ F_\lambda = \bigcup_{k \ge 1} \operatorname{Ker}(A - \lambda I_n)^k \]
is the generalized eigenspace associated with the eigenvalue λ.

Theorem 4.1.1 (Schur's theorem). If A ∈ M_n(K) is a square matrix, then there exists a unitary matrix U ∈ M_n(K) such that U^*AU is triangular and its diagonal entries are all the eigenvalues of A.

Corollary 4.1.1. If A ∈ M_n(K) is a Hermitian matrix, then there exists a unitary matrix U ∈ M_n(K) such that U^*AU is diagonal and its entries are all the eigenvalues of A.

Theorem 4.1.2 (Diagonalization). A ∈ M_n(K) is normal if and only if there exists a unitary matrix U ∈ M_n(K) such that
\[ A = U\,\operatorname{diag}(\lambda_1, \dots, \lambda_n)\,U^{-1}, \]
where the λ_i are the eigenvalues of A.


Lemma 4.1.5. For any matrix A ∈ M_{m,n}(K), the matrix A^*A is Hermitian and has real, nonnegative eigenvalues.

The singular values of A ∈ M_{m,n}(K) are the nonnegative square roots of the n eigenvalues of A^*A. It is easy to see that the singular values of a normal matrix are the moduli of its eigenvalues.

Lemma 4.1.6 (SVD decomposition). Let A ∈ M_{m,n}(K) be an arbitrary matrix, with m > n, having r positive singular values µ_i. Then there exist two unitary matrices V ∈ M_m(K) and U ∈ M_n(K) and a (rectangular) diagonal matrix Σ ∈ M_{m,n}(R), whose diagonal entries are the singular values µ_1 ≥ µ_2 ≥ ⋯ ≥ µ_r ≥ 0 (the remaining diagonal entries being zero), such that
\[ A = V\,\Sigma\,U^*. \]
If m < n, the singular value decomposition (SVD) is defined by applying this result to A^* and taking the adjoint.

We observe that the rank of A is equal to r, the number of nonzero singular values of A.

The SVD decomposition also has a geometric interpretation. Let S^{n-1} be the unit sphere in R^n, i.e.,
\[ S^{n-1} = \Big\{\, x = (x_1, \dots, x_n)^t \in \mathbb{R}^n ;\ \sum_i x_i^2 = 1 \,\Big\}. \]
Then the image of the unit sphere S^{n-1} by a nonsingular matrix A is an ellipsoid centered at the origin of R^n, with semiaxes µ_i v_i, where v_i is the ith column of V.

The SVD can be used to define the pseudoinverse of a matrix A ∈ M_{m,n}(K). Let A = VΣU^* be the SVD factorization of A; then the pseudoinverse of A is the matrix A^+ ∈ M_{n,m}(K) defined by A^+ = UΣ^+V^*, where Σ^+ is the pseudoinverse of Σ, obtained by replacing every nonzero entry of Σ by its reciprocal (its inverse). We have the following properties.

Proposition 4.1.2. Let A ∈ M_{m,n}(K) be an arbitrary matrix having r positive singular values µ_i. The following identities hold:
\[ A^+A = U\Sigma^+\Sigma U^* = \sum_{i=1}^{r} u_i u_i^* ; \qquad AA^+ = V\Sigma\Sigma^+V^* = \sum_{i=1}^{r} v_i v_i^* ; \]
\[ A = \sum_{i=1}^{r} \mu_i\, v_i u_i^* \quad\text{and}\quad A^+ = \sum_{i=1}^{r} \frac{1}{\mu_i}\, u_i v_i^*. \tag{4.1} \]
Furthermore, if A has maximal rank (r = n ≤ m), its pseudoinverse is then given by
\[ A^+ = (A^*A)^{-1}A^*. \]
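As an illustration (our addition, not part of the original text), the pseudoinverse can be assembled directly from the SVD returned by numpy.linalg.svd and compared with the maximal-rank formula A^+ = (A^*A)^{-1}A^*:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))          # m > n, full column rank (almost surely)

# Thin SVD: in the book's notation A = V Sigma U^*, NumPy returns A = u @ diag(s) @ vh
V, mu, Uh = np.linalg.svd(A, full_matrices=False)

# Pseudoinverse: A^+ = U Sigma^+ V^*, reciprocals of the nonzero singular values
tol = max(A.shape) * np.finfo(float).eps * mu.max()
mu_plus = np.array([1.0 / s if s > tol else 0.0 for s in mu])
A_plus = (Uh.conj().T * mu_plus) @ V.conj().T

assert np.allclose(A_plus, np.linalg.pinv(A))
# Maximal-rank formula A^+ = (A^* A)^{-1} A^*
assert np.allclose(A_plus, np.linalg.solve(A.T @ A, A.T))
```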


4.1.2 Vector and matrix norms

Norms will be primarily used to measure errors in matrix computations. We recall the definition of a norm on a vector space K^n (see Chapter 1, Section 1.1.4).

Definition 4.1.3. A mapping ‖·‖ : K^n → R is called a norm if it satisfies all of the following:

1. for any x ∈ K^n, ‖x‖ ≥ 0 (nonnegativity);
2. ‖x‖ = 0 if and only if x = 0 (nondegeneracy);
3. for every α ∈ K, ‖αx‖ = |α| ‖x‖ (multiplicativity);
4. for every x, y ∈ K^n, ‖x + y‖ ≤ ‖x‖ + ‖y‖ (triangle inequality).

We recall the definition of the inner product.

Definition 4.1.4. Suppose V is a vector space over the field R. An inner product on V is a positive definite bilinear form (·,·)_V : V × V → R that satisfies the following properties, for all vectors x, y ∈ V:

1. (x, y)_V = (y, x)_V (symmetry);
2. (x, x)_V > 0 for x ≠ 0, and (x, x)_V = 0 ⇔ x = 0 (positive definiteness).

Proposition 4.1.3. Suppose (·,·)_V is an inner product on V. Then
\[ \forall x \in V, \qquad \|x\|_V = (x, x)_V^{1/2} \]
defines a norm on V. Furthermore, we have the Cauchy-Schwarz inequality
\[ \forall x, y \in V, \qquad |(x, y)_V| \le \|x\|_V\,\|y\|_V, \]
with equality only if x and y are linearly dependent.

Proof. The Cauchy-Schwarz inequality is trivial in the case y = 0. Suppose (y, y)_V ≠ 0; then posing α = (y, y)_V^{-1}(x, y)_V gives
\[ 0 \le (x - \alpha y,\, x - \alpha y)_V = (x, x)_V - (y, y)_V^{-1}\,|(x, y)_V|^2, \]
which holds only if |(x, y)_V|^2 ≤ (x, x)_V (y, y)_V, and the result follows. □

If V = K^n is endowed with a scalar or Hermitian product (·,·) (the subscript (·,·)_V is omitted on purpose), then the mapping x ↦ (x, x)^{1/2} defines a norm on K^n. The following norms are the most important:

• the Euclidean norm: ‖x‖_2 = (Σ_{i=1}^n |x_i|²)^{1/2};
• the ℓ_p-norms, defined as ‖x‖_p = (Σ_{i=1}^n |x_i|^p)^{1/p}, p ≥ 1;
• the ℓ_∞-norm: ‖x‖_∞ = max_{1≤i≤n} |x_i|.


Fig. 4.1. Unit disks of R^2 for the norms ℓ_1, ℓ_2 and ℓ_∞.

The unit disks of R^2 for these norms are represented in Figure 4.1. A classical result about ℓ_p-norms is the Hölder inequality:
\[ |x^t y| \le \|x\|_p\,\|y\|_q, \qquad \frac{1}{p} + \frac{1}{q} = 1, \qquad \forall x, y \in K^n, \]
and an important special case (p = q = 2) is the Cauchy-Schwarz inequality:
\[ |x^t y| \le \|x\|_2\,\|y\|_2. \]

Theorem 4.1.3. All norms are equivalent on K^n, i.e., for every pair of norms ‖·‖_α, ‖·‖_β on K^n, there exist two constants c_1 and c_2, with 0 < c_1 ≤ c_2, such that for all x ∈ K^n we have
\[ c_1\,\|x\|_\alpha \le \|x\|_\beta \le c_2\,\|x\|_\alpha. \]

Example 4.1.1. For the previous ℓ_p-norms, we have the constants
\[ \|x\|_\infty \le \|x\|_p \le n^{1/p}\,\|x\|_\infty, \qquad \|x\|_2 \le \|x\|_1 \le \sqrt{n}\,\|x\|_2. \]

Since the space of matrices M_{m,n}(K) is a vector space isomorphic to K^{m×n}, the definition of a matrix norm is equivalent to the definition of a vector norm.

Definition 4.1.5. A mapping ‖·‖ : M_{m,n}(K) → R is a matrix norm if the following properties hold:

1. ‖A‖ ≥ 0, and ‖A‖ = 0 if and only if A = 0;
2. ‖αA‖ = |α| ‖A‖, for any scalar α;
3. ‖A + B‖ ≤ ‖A‖ + ‖B‖, for all A, B ∈ M_{m,n}(K).

Additionally, if A, B ∈ M_n(R), the matrix norm may satisfy


4. ‖AB‖ ≤ ‖A‖ ‖B‖, and the matrix norm is then called a submultiplicative norm.

The most frequently used matrix norms are:

• the Frobenius (or Schur) norm, ‖A‖_F = (Σ_{i=1}^m Σ_{j=1}^n |a_{ij}|²)^{1/2};
• the ℓ_p-norm, ‖A‖_p = (Σ_{i=1}^m Σ_{j=1}^n |a_{ij}|^p)^{1/p}, for p ≥ 1;
• the ℓ_∞-norm, ‖A‖_∞ = max_{1≤i≤m, 1≤j≤n} |a_{ij}|.

It is wise to define matrix norms that are subordinate to a vector norm on K^n.

Definition 4.1.6 (induced norm). Suppose ‖·‖ is a vector norm on K^n. It induces the matrix norm, said to be subordinate to the vector norm ‖·‖, defined by
\[ \|A\| = \sup_{x \in K^n,\ x \neq 0} \frac{\|Ax\|}{\|x\|}. \]
For convenience, we use the same notation for vector norms and matrix norms.

Proposition 4.1.4. For any pair of vector norms ‖·‖_α on K^n and ‖·‖_β on K^m, we have
\[ \|Ax\|_\beta \le \|A\|_{\alpha,\beta}\,\|x\|_\alpha, \]
where ‖·‖_{α,β} is the matrix norm defined by
\[ \|A\|_{\alpha,\beta} = \sup_{x \neq 0} \frac{\|Ax\|_\beta}{\|x\|_\alpha}, \]
which is subordinate to the vector norms.

Since the set {x ∈ K^n ; ‖x‖_α = 1} is compact and ‖·‖_β is continuous, it follows that
\[ \|A\|_{\alpha,\beta} = \max_{\|x\|_\alpha = 1} \|Ax\|_\beta = \|Ay\|_\beta \]
for some y ∈ K^n having a unit norm.

Proposition 4.1.5. The Frobenius and ℓ_p-norms satisfy the following properties, for A ∈ M_{m,n}(K):
\[ \|A\|_2 \le \|A\|_F \le \sqrt{n}\,\|A\|_2, \]
\[ \max_{i,j}|a_{ij}| \le \|A\|_2 \le \sqrt{mn}\,\max_{i,j}|a_{ij}|, \]
\[ \|A\|_1 = \max_{1\le j\le n} \sum_{i=1}^{m} |a_{ij}| \quad\text{and}\quad \|A\|_\infty = \max_{1\le i\le m} \sum_{j=1}^{n} |a_{ij}|, \]
\[ \frac{1}{\sqrt{n}}\,\|A\|_\infty \le \|A\|_2 \le \sqrt{m}\,\|A\|_\infty, \]
\[ \frac{1}{\sqrt{m}}\,\|A\|_1 \le \|A\|_2 \le \sqrt{n}\,\|A\|_1. \]


Theorem 4.1.4. If A ∈ M_{m,n}(K), then there exists a unit ℓ_2-norm vector z ∈ K^n such that
\[ A^*Az = \mu^2 z, \qquad\text{where } \mu = \|A\|_2. \]
This result implies that ‖A‖_2² is a zero of the polynomial
\[ P(\lambda) = \det(A^*A - \lambda I_n). \]
In particular, the ℓ_2-norm is the square root of the largest eigenvalue of A^*A:
\[ \|A\|_2 = \sqrt{\rho(A^*A)} = \sqrt{\rho(AA^*)}. \]

Proof. Consider K = C. We have then
\[ \|A\|_2^2 = \sup_{x \in \mathbb{C}^n} \frac{\|Ax\|_2^2}{\|x\|_2^2} = \sup_{x \in \mathbb{C}^n} \frac{(Ax)^*(Ax)}{x^*x} = \sup_{x \in \mathbb{C}^n} \frac{x^*A^*Ax}{x^*x}. \]
We check that (A^*A)^* = A^*(A^*)^* = A^*A. Hence, the matrix A^*A is Hermitian and diagonalizable: there exists U, with U^*U = I_n, such that
\[ U^*A^*AU = \operatorname{diag}(\mu_k^2), \]
where the values µ_k are the singular values of the matrix A and thus the µ_k² are the eigenvalues of A^*A. It yields
\[ \sup_{x \in \mathbb{C}^n} \frac{x^*A^*Ax}{x^*x} = \sup_{x \in \mathbb{C}^n} \frac{x^*U\,\operatorname{diag}(\mu_k^2)\,U^*x}{x^*UU^*x}, \]
and by posing y = U^*x we have
\[ \|A\|_2^2 = \sup_{y \in \mathbb{C}^n} \frac{y^*\operatorname{diag}(\mu_k^2)\,y}{y^*y} = \sup_{y \in \mathbb{C}^n} \frac{\sum_{k=1}^{n} \mu_k^2\,|y_k|^2}{\sum_{k=1}^{n} |y_k|^2}. \]
By considering for y the canonical basis vector associated with the largest µ_k², we obtain ‖A‖_2² = ρ(A^*A) = ρ(AA^*). □

Corollary 4.1.2. If A ∈ M_{m,n}(K), then ‖A‖_2 ≤ (‖A‖_1‖A‖_∞)^{1/2}.

In the next paragraphs, we consider square matrices only.

Definition 4.1.7 (convergence). A sequence of matrices (A_k)_{k≥1} converges to a limit A for a matrix norm ‖·‖ if
\[ \lim_{k \to \infty} \|A_k - A\| = 0, \]
and we write A = lim_{k→∞} A_k.

Obviously, since M_n(K) is a vector space of finite dimension, all norms are equivalent and thus the notion of convergence does not depend on the norm considered. The following result gives a sufficient condition for a sequence of iterated powers of a matrix to converge to 0.


Lemma 4.1.7. Suppose A ∈ M_n(C). The following assertions are equivalent:

1. lim_{k→∞} A^k = 0;
2. lim_{k→∞} A^k x = 0, for all x ∈ C^n;
3. ρ(A) < 1;
4. there exists at least one subordinate norm such that ‖A‖ < 1.

Proposition 4.1.6. Let A ∈ M_n(K) be a matrix such that ‖A‖_p < 1. Then the matrix I_n − A is nonsingular and we have
\[ (I_n - A)^{-1} = \sum_{k=0}^{+\infty} A^k, \qquad\text{with}\quad \|(I_n - A)^{-1}\|_p \le \frac{1}{1 - \|A\|_p}. \]
Notice that ‖(I_n − A)^{-1} − I_n‖_p ≤ ‖A‖_p/(1 − ‖A‖_p). Hence, if ‖A‖_p = ε ≪ 1, then O(ε) perturbations of I_n induce O(ε) perturbations of the inverse.

Lemma 4.1.8. If A is nonsingular and r = ‖A^{-1}E‖_p < 1, then A + E is nonsingular and we have
\[ \|(A + E)^{-1} - A^{-1}\|_p \le \frac{\|E\|_p\,\|A^{-1}\|_p^2}{1 - r}. \]

4.1.3 Conditioning issues

We now turn to an important issue in numerical analysis: how numerically well-conditioned the problem at hand is. For instance, the condition number associated with the system Ax = b provides a bound on the discrepancy between the exact and the computed solution; it is a measure of the accuracy that can be expected from the computation, before even considering round-off errors. It is indeed a property of the matrix.

Definition 4.1.8. The quantity cond(A) = ‖A‖·‖A^{-1}‖ is called the condition number of the matrix A with respect to the matrix norm ‖·‖ subordinate to the vector norm ‖·‖.

Consider an invertible matrix A ∈ M_n(K) and a vector b ∈ K^n. Let x ∈ K^n be the solution of the linear system Ax = b, given by x = A^{-1}b. Given a small perturbation δb of b, we denote by x + δx the solution of
\[ A(x + \delta x) = b + \delta b. \]
For any vector norm ‖·‖ and its induced matrix norm, our goal is to bound the relative change ‖δx‖/‖x‖ with respect to the relative error ‖δb‖/‖b‖. By linearity and using the triangle inequality for vector norms, we write
\[ A\,\delta x = \delta b \;\Rightarrow\; \|\delta x\| \le \|A^{-1}\|\,\|\delta b\|, \]
and also


\[ Ax = b \;\Rightarrow\; \|b\| \le \|A\|\,\|x\|, \]
which is equivalent to
\[ \frac{1}{\|x\|} \le \|A\|\,\frac{1}{\|b\|}. \]
We can combine these inequalities to obtain
\[ \frac{\|\delta x\|}{\|x\|} \le \|A^{-1}\|\,\|A\|\,\frac{\|\delta b\|}{\|b\|} \le \operatorname{cond}(A)\,\frac{\|\delta b\|}{\|b\|}. \]
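A small numerical experiment (ours, not from the text; the Hilbert matrix is used only as a standard example of an ill-conditioned matrix) illustrates how cond(A) governs the amplification of a perturbation of the right-hand side:

```python
import numpy as np

n = 8
A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])  # Hilbert matrix
x_exact = np.ones(n)
b = A @ x_exact

# Perturb the right-hand side by a tiny relative amount
rng = np.random.default_rng(1)
db = 1e-10 * np.linalg.norm(b) * rng.standard_normal(n)
x_pert = np.linalg.solve(A, b + db)

rel_x = np.linalg.norm(x_pert - x_exact) / np.linalg.norm(x_exact)
rel_b = np.linalg.norm(db) / np.linalg.norm(b)
cond = np.linalg.cond(A, 2)

print(f"cond_2(A)          = {cond:.2e}")
print(f"relative db        = {rel_b:.2e}")
print(f"relative dx        = {rel_x:.2e}")
print(f"cond(A) * rel db   = {cond * rel_b:.2e}   (theoretical upper bound)")
```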

We have the following properties.

Proposition 4.1.7. Let A ∈ M_n(K) be a square invertible matrix and ‖·‖ a matrix norm subordinate to a vector norm ‖·‖. Then we have:

1. cond(A^{-1}) = cond(A);
2. cond(αA) = cond(A), for any nonzero scalar α ∈ K;
3. cond(I_n) = 1;
4. cond(A) ≥ 1;
5. a linear system Ax = b is said to be well-conditioned (resp. ill-conditioned) if cond(A) is small (resp. large).

The condition number cond(A) quantifies the rate of change of the solution x with respect to a change in the data b. It has one main drawback though: it involves ‖A^{-1}‖, which is usually difficult to compute. Nevertheless, for a normal matrix and the ℓ_2 matrix norm, we have the following results.

Proposition 4.1.8. Consider the ℓ_2 vector norm and its induced matrix norm.

1. If A ∈ M_n(K), then
\[ \operatorname{cond}_2(A) = \frac{\mu_{\max}}{\mu_{\min}}, \]
where µ_max and µ_min are the maximal and minimal singular values of A, respectively.
2. If A ∈ M_n(K) is a Hermitian matrix, we have
\[ \|A\|_2 = \rho(A). \]
3. If A ∈ M_n(R) is a symmetric invertible matrix, then
\[ \operatorname{cond}_2(A) = \frac{|\lambda_{\max}|}{|\lambda_{\min}|} = \rho(A)\,\rho(A^{-1}), \tag{4.2} \]
where λ_max and λ_min are the maximal and minimal (in modulus) eigenvalues of A, respectively.
4. For any unitary matrix U, cond_2(U) = 1.
5. For any unitary matrix U, cond_2(AU) = cond_2(UA) = cond_2(A).


Proof. We prove the identities concerning Hermitian and symmetric matrices.
Since A is a Hermitian matrix, there exists U such that UU^* = I_n and U^*AU = diag(λ_i), where the (λ_i)_i are the eigenvalues of A. We have
\[ \|A\|_2^2 = \sup_{x \in K^n} \frac{\|Ax\|_2^2}{\|x\|_2^2} = \sup_{x \in K^n} \frac{(Ax)^*(Ax)}{x^*x}. \]
However,
\[ x^*A^*Ax = x^*UU^*A^*UU^*AUU^*x = (U^*x)^*\,(\operatorname{diag}(\lambda_i))^*(\operatorname{diag}(\lambda_i))\,U^*x, \]
and U^*AU = (U^*AU)^*. We pose y = U^*x, thus leading to
\[ x^*A^*Ax = y^*\,\operatorname{diag}(\bar\lambda_i)\,\operatorname{diag}(\lambda_i)\,y, \]
and the result follows:
\[ \|A\|_2^2 = \sup_{y \in K^n} \frac{y^*\operatorname{diag}(|\lambda_i|^2)\,y}{y^*y} = \rho(A)^2. \]
Suppose now that A ∈ M_n(R) is an invertible symmetric matrix. We apply the previous result to the inverse matrix A^{-1}, which is also symmetric, to obtain
\[ \|A^{-1}\|_2 = \rho(A^{-1}) = \frac{1}{|\lambda_{\min}|}, \]
since 1/λ_i is an eigenvalue of A^{-1}, and we deduce
\[ \operatorname{cond}_2(A) = \|A\|_2\,\|A^{-1}\|_2 = \frac{|\lambda_{\max}|}{|\lambda_{\min}|}, \]
which is the desired result. □

Remark 4.1.2. The identity (4.2) is optimal, since for any matrix norm we have
\[ \operatorname{cond}(A) = \|A\|\,\|A^{-1}\| \ge \rho(A)\,\rho(A^{-1}). \]

Lemma 4.1.9. The conditionings cond_1, cond_2 and cond_∞ are equivalent, and we have, for any invertible matrix A,
\[ \tfrac{1}{n}\,\operatorname{cond}_2(A) \le \operatorname{cond}_1(A) \le n\,\operatorname{cond}_2(A), \]
\[ \tfrac{1}{n}\,\operatorname{cond}_\infty(A) \le \operatorname{cond}_2(A) \le n\,\operatorname{cond}_\infty(A), \]
\[ \tfrac{1}{n^2}\,\operatorname{cond}_1(A) \le \operatorname{cond}_\infty(A) \le n^2\,\operatorname{cond}_1(A). \]

Proof. The inequalities result from the equivalences between the corresponding matrix and vector norms. □


4.2 Direct methods

The algorithms presented in this section are called direct methods because, in the absence of rounding errors, they would give the exact solution x of the problem Ax = b after a finite number of elementary operations. The principle of direct methods is to find an invertible matrix M such that MA is an upper triangular matrix. This is called the elimination procedure. Hence, it remains to solve a linear system

MAx = Mb

using a back-substitution procedure, so called because the unknowns (the components of x) are computed in backward order, from x_n to x_1. Notice that in practice the matrix M is not explicitly evaluated; only the matrix MA and the right-hand side vector Mb are calculated.

The matrix interpretation of the Gauss pivoting method is the LU factorization, which shows that every invertible matrix can be decomposed as the product of a lower triangular matrix L by an upper triangular matrix U. This factorization can be further specialized to the case of symmetric positive definite matrices; it is then called the Cholesky factorization.

Remark 4.2.1. We shall indicate that

1. the resolution of Ax = b, with A ∈ M_n(K), is not obtained by computing A^{-1} and then calculating x = A^{-1}b. The evaluation of A^{-1} is indeed equivalent to solving the n linear systems
\[ Ax_i = e_i, \qquad 1 \le i \le n, \]
where (e_i)_{1\le i\le n} denote the basis vectors of K^n.
2. if A is an upper triangular matrix, the resolution is trivial; we have then
\[ \begin{cases}
a_{11}x_1 + \dots + a_{1,n-1}x_{n-1} + a_{1n}x_n = b_1\\
\qquad\qquad\vdots\\
a_{n-1,n-1}x_{n-1} + a_{n-1,n}x_n = b_{n-1}\\
a_{nn}x_n = b_n
\end{cases} \]
and since \prod_{i=1}^{n} a_{ii} = \det(A) \neq 0, the system is solved by first computing x_n from the last equation, then x_{n-1}, and so on, thus leading to
\[ \begin{cases}
x_n = a_{nn}^{-1}\, b_n\\
x_{n-1} = a_{n-1,n-1}^{-1}\,(b_{n-1} - a_{n-1,n}x_n)\\
\qquad\qquad\vdots\\
x_1 = a_{11}^{-1}\,(b_1 - a_{12}x_2 - \dots - a_{1,n-1}x_{n-1} - a_{1n}x_n)
\end{cases} \]
This backward substitution method requires 1 + 2 + \dots + (n-1) = n(n-1)/2 additions and n(n-1)/2 multiplications.
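A direct transcription of these backward substitution formulas, as a sketch in Python (our code, not from the text):

```python
import numpy as np

def backward_substitution(A, b):
    """Solve Ax = b for an upper triangular A with nonzero diagonal,
    computing x_n, x_{n-1}, ..., x_1 as in Remark 4.2.1."""
    n = len(b)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - np.dot(A[i, i + 1:], x[i + 1:])) / A[i, i]
    return x

U = np.array([[2.0, 1.0, -1.0],
              [0.0, 3.0,  2.0],
              [0.0, 0.0,  4.0]])
b = np.array([1.0, 7.0, 8.0])
x = backward_substitution(U, b)
assert np.allclose(U @ x, b)   # x = [1.0, 1.0, 2.0]
```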


4.2.1 Cramer formulas

We consider a square linear system having n equations and n unknowns. We know that if the matrix A ∈ M_n(R) is nonsingular, then there exists a unique solution to the system Ax = b. The following proposition provides explicit formulas to compute the solution.

Proposition 4.2.1 (Cramer formulas). Let A ∈ M_n(R) be a nonsingular matrix. The solution x = (x_1, …, x_n)^t of the linear system Ax = b is given componentwise by
\[ x_i = \frac{\det(A_i)}{\det(A)}, \qquad \forall\, 1 \le i \le n, \]
where A_i = (a_1 | \dots | a_{i-1} | b | a_{i+1} | \dots | a_n) is the matrix formed by replacing the ith column of A by the column vector b.

Proof. The system Ax = b is equivalent to \sum_{i=1}^{n} a_i x_i = b, where a_i ∈ R^n represents the ith column of A. The components x_i are the coordinates of b in the basis formed by the columns a_i, and thus we deduce that
\[ \det(a_1 | \dots | a_{i-1} | b | a_{i+1} | \dots | a_n) = x_i\,\det(A), \]
since the determinant is an alternating multilinear form. □

Nevertheless, having explicit formulas for the solution x of the linear system is not always interesting, especially for large systems. Indeed, the number of multiplications required to compute the determinant of a square matrix of order n (by cofactor expansion) is larger than n!. Consequently, more than (n+1)! multiplications are needed to compute the solution of Ax = b. To give an idea of the computational cost, consider a system of n = 10 equations: it would require more than 4,000,000 operations! Hence, Cramer formulas are not used in practice because of their computational cost, and we need to look for more efficient techniques for solving linear systems.

4.2.2 Gaussian elimination

A fundamental observation is that the following elementary operations can be performed on the system Ax = b without changing its set of solutions:

1. adding a multiple of the ith equation to the jth equation;
2. interchanging two equations (row swap);
3. multiplying an entire row by a nonzero scalar.

It is also possible to interchange two columns of A, provided the corresponding interchanges are made in the components of the solution vector x.

Definition 4.2.1. The first nonzero entry in each row of A ∈ M_n(R) is called the leading entry (or the pivot) of the row. The matrix A is in row echelon form if


1. all nonzero rows are above any zero rows;
2. the leading entry of a row i is strictly to the right of the leading entry of the row i−1.

The matrix A is in reduced row echelon form if it is in row echelon form and

3. every leading entry a_{ik} is 1 and a_{ik} is the only nonzero entry in its column k.

A system of linear equations Ax = b is in row echelon form if its augmented matrix [A|b] (the entries in the last column of the matrix are the components of b, a_{j,n+1} = b_j, and thus [A|b] ∈ M_{n,n+1}(R)) is in row echelon form.

As mentioned, the idea behind Gaussian elimination is to use the elementary operations to eliminate the unknowns in the system Ax = b, so as to obtain an equivalent triangular system, solved by back substitution.

We pose A^{(1)} = A and b^{(1)} = b. For i = 1, …, n−1, we calculate A^{(i+1)} and b^{(i+1)} such that the systems A^{(i)}x = b^{(i)} and A^{(i+1)}x = b^{(i+1)} are equivalent, where the matrices A^{(i)} and A^{(n)} have the following form:
\[
A^{(i)} = \begin{pmatrix}
a^{(1)}_{11} & \dots & \dots & \dots & \dots & a^{(1)}_{1n}\\
0 & \ddots & & & & \vdots\\
\vdots & & a^{(i)}_{ii} & & & \vdots\\
\vdots & & a^{(i)}_{i+1,i} & \ddots & & \vdots\\
\vdots & & \vdots & & \ddots & \vdots\\
0 & \dots & a^{(i)}_{ni} & \dots & \dots & a^{(i)}_{nn}
\end{pmatrix}
\qquad\text{and}\qquad
A^{(n)} = \begin{pmatrix}
a^{(n)}_{11} & \dots & \dots & \dots & \dots & a^{(n)}_{1n}\\
0 & \ddots & & & & \vdots\\
\vdots & & \ddots & & & \vdots\\
\vdots & & & \ddots & & \vdots\\
\vdots & & & & \ddots & \vdots\\
0 & \dots & \dots & \dots & 0 & a^{(n)}_{nn}
\end{pmatrix}.
\]

The two steps to solve the system Ax = b are summarized as follows.

1. Gaussian elimination: using elementary operations, we modify A^{(i)} and b^{(i)} accordingly. Linear combinations of rows allow us to cancel the entries of the ith column below the ith row, in view of obtaining an upper triangular matrix. The two-step algorithm is written as:
i) for k ≤ i and j = 1, …, n, we pose a^{(i+1)}_{kj} = a^{(i)}_{kj} and b^{(i+1)}_k = b^{(i)}_k;
ii) for k > i, if a^{(i)}_{ii} ≠ 0, we pose
\[ a^{(i+1)}_{kj} = a^{(i)}_{kj} - \frac{a^{(i)}_{ki}}{a^{(i)}_{ii}}\, a^{(i)}_{ij}, \quad j = i, \dots, n, \qquad\qquad b^{(i+1)}_{k} = b^{(i)}_{k} - \frac{a^{(i)}_{ki}}{a^{(i)}_{ii}}\, b^{(i)}_{i}. \]
The matrix A^{(i+1)} has the form defined hereabove and the two systems are equivalent. The diagonal entries a_{11}, a^{(2)}_{22}, …, a^{(n)}_{nn} which appear during the elimination procedure are called pivotal elements. Let A_k denote the kth


leading principal submatrix of A. Since the determinant of a matrix does not change when a multiple of one row is added to another row, we have
\[ \det(A_k) = a^{(1)}_{11} \cdots a^{(k)}_{kk}, \qquad k = 1, \dots, n. \]
This implies that all pivotal elements a^{(i)}_{ii}, 1 ≤ i ≤ n, in Gaussian elimination are nonzero if and only if det(A_k) ≠ 0 for k = 1, …, n. In this case, after (n−1) steps of elimination, we obtain the single equation
\[ a^{(n)}_{nn}\, x_n = b^{(n)}_{n}. \]
We finally obtain an upper triangular system, and we also have
\[ \det(A) = a^{(1)}_{11}\, a^{(2)}_{22} \cdots a^{(n)}_{nn}. \]

2. back substitution: denoting by U = A^{(n)} the upper triangular matrix obtained, the unknowns (x_i)_{1\le i\le n} can be computed as follows:
\[ x_n = \frac{b^{(n)}_n}{a^{(n)}_{nn}}, \quad\text{and then}\quad x_i = \frac{1}{a^{(n)}_{ii}} \Big( b^{(i)}_i - \sum_{j=i+1}^{n} a^{(n)}_{ij}\, x_j \Big), \quad\text{for } i = n-1, \dots, 1. \]
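The two steps above translate into the following sketch (ours, not from the text; no pivoting is performed, so all pivotal elements are assumed nonzero, see the remark below):

```python
import numpy as np

def gauss_solve(A, b):
    """Solve Ax = b by Gaussian elimination without pivoting,
    followed by back substitution.  Assumes nonzero pivots."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    # elimination: zero out column i below row i
    for i in range(n - 1):
        for k in range(i + 1, n):
            m = A[k, i] / A[i, i]          # multiplier a_ki / a_ii
            A[k, i:] -= m * A[i, i:]
            b[k] -= m * b[i]
    # back substitution on the upper triangular system
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - np.dot(A[i, i + 1:], x[i + 1:])) / A[i, i]
    return x

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])
b = np.array([4.0, 10.0, 24.0])
assert np.allclose(gauss_solve(A, b), np.linalg.solve(A, b))
```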

Remark 4.2.2. Suppose that we face the case a^{(k)}_{kk} = 0 at step k of Gaussian elimination. If A is nonsingular, then in particular its first k columns are linearly independent, and so are the first k columns of the reduced matrix. Hence, some a^{(k)}_{ik}, i ≥ k, must be nonzero, say a^{(k)}_{pk} ≠ 0. By interchanging rows k and p, this entry can be taken as pivot and the process can continue.

An important conclusion is given by the next proposition.

Proposition 4.2.2. Any nonsingular matrix can be reduced to triangular form by Gaussian elimination (if appropriate row interchanges are made).

Gaussian elimination requires about 2n^3/3 additions and multiplications and n^2/2 divisions. Thus, if n = 10, Gaussian elimination requires about 700 operations (to be compared with the cost of the Cramer formulas).

In principle, the reduced form of the matrix yields the rank of the matrix A. From the numerical point of view, i.e., when performing floating-point operations, it is in general necessary to perform row interchanges not only when a pivotal element is exactly zero, but also when its absolute value is small, to ensure the numerical stability of the subsequent operations. To this end, one usually chooses between two pivoting strategies for Gaussian elimination.


1. in partial pivoting, the pivot is taken as the largest entry (in absolute value) in the unreduced part of column k: choose r as the smallest integer for which
\[ |a^{(k)}_{rk}| = \max_{k \le i \le n} |a^{(k)}_{ik}|, \]
and interchange rows k and r;
2. in complete pivoting, the element of largest magnitude in the whole unreduced part of the matrix is chosen as pivot: choose r and s as the smallest integers for which
\[ |a^{(k)}_{rs}| = \max_{k \le i, j \le n} |a^{(k)}_{ij}|, \]
and interchange rows k and r and columns k and s.

The growth of the elements in the reduced matrices can be measured by the growth ratio defined as
\[ g_n = \frac{\max_{i,j,k} |a^{(k)}_{ij}|}{\max_{i,j} |a_{ij}|}. \]
For partial and complete pivoting, the multipliers \ell_{ik} = a^{(k)}_{ik}/a^{(k)}_{kk} satisfy |\ell_{ik}| \le 1, so that
\[ |a^{(k+1)}_{ij}| \le |a^{(k)}_{ij}| + |\ell_{ik}|\,|a^{(k)}_{kj}| \le |a^{(k)}_{ij}| + |a^{(k)}_{kj}| \le 2 \max_{i,j} |a^{(k)}_{ij}|, \]
and the bound g_n ≤ 2^{n-1} follows by induction.

Theorem 4.2.1. If Gaussian elimination is performed without pivoting, we have the following bounds on g_n(A):

• if A is nonsingular and diagonally dominant, i.e., |a_{ii}| \ge \sum_{j \neq i} |a_{ij}| for all i = 1, …, n, then
\[ g_n(A) \le 2; \]
• if A is symmetric and positive definite, i.e., A^t = A and x^tAx > 0 for all x ≠ 0, then
\[ g_n(A) \le 1; \]
• if A is totally positive (nonnegative), i.e., if the determinant of every submatrix of A is positive (nonnegative), then
\[ g_n(A) \le 1. \]

4.2.3 The LU factorization

For better understanding, we can rewrite the Gaussian elimination algorithm in a more abstract form, using matrix products only. Indeed, Gaussian elimination consists in decomposing the matrix A as the product of a lower triangular matrix L by an upper triangular matrix U: A = LU.


It can be observed that
\[ A^{(n)} = L^{(n-1)} A^{(n-1)} = L^{(n-1)} \cdots L^{(1)} A, \]
where L^{(k)} = I_n − B^{(k)}, with
\[ B^{(k)} = \begin{pmatrix}
0 & \dots & 0 & 0 & 0 & \dots & 0\\
\vdots & & \vdots & \vdots & \vdots & & \vdots\\
0 & \dots & 0 & \ell^{(k)}_{k+1} & 0 & \dots & 0\\
\vdots & & \vdots & \vdots & \vdots & & \vdots\\
0 & \dots & 0 & \ell^{(k)}_{n} & 0 & \dots & 0
\end{pmatrix},
\qquad
\ell^{(k)}_{i} = \frac{a^{(k)}_{ik}}{a^{(k)}_{kk}}, \quad i = k+1, \dots, n, \tag{4.3} \]
the nonzero entries sitting in column k, below the diagonal, as A^{(k+1)} = L^{(k)}A^{(k)} and b^{(k+1)} = L^{(k)}b^{(k)}.

Lemma 4.2.1. Let B^{(k)} ∈ M_n(R) be a square matrix of the form (4.3). Then we have:

1. B^{(k)}B^{(l)} = 0 for every 1 ≤ k < l ≤ n;
2. for L^{(k)} = I_n − B^{(k)}, L^{(k)} is invertible and (L^{(k)})^{-1} = I_n + B^{(k)};
3. L^{(k)}L^{(l)} = I_n − (B^{(k)} + B^{(l)}), for every 1 ≤ k < l ≤ n.

Proof. Let B^{(k)} and B^{(l)} ∈ M_n(R); we have, for 1 ≤ i, j ≤ n,
\[ (B^{(k)}B^{(l)})_{ij} = \sum_{m=1}^{n} (B^{(k)})_{im}(B^{(l)})_{mj}, \]
but, for m ≠ k, (B^{(k)})_{im} = 0 by assumption, and thus
\[ (B^{(k)}B^{(l)})_{ij} = (B^{(k)})_{ik}(B^{(l)})_{kj}. \]
Moreover, if j ≠ l, we have (B^{(l)})_{kj} = 0 and thus (B^{(k)}B^{(l)})_{ij} = 0. If j = l, we have then
\[ (B^{(k)}B^{(l)})_{il} = (B^{(k)})_{ik}(B^{(l)})_{kl}. \]
However, since k < l, the coefficient (B^{(l)})_{kl} is null and thus (B^{(k)}B^{(l)})_{il} = 0. Finally, we obtain B^{(k)}B^{(l)} = 0 for every 1 ≤ k < l ≤ n. The remainder of the proof is more straightforward. First, we check that
\[ (I_n - B^{(k)})(I_n + B^{(k)}) = I_n - (B^{(k)})^2 = I_n, \]
since (B^{(k)})^2 = 0, and thus
\[ (L^{(k)})^{-1} = (I_n - B^{(k)})^{-1} = I_n + B^{(k)}. \]
Then we verify that, for 1 ≤ k < l ≤ n,
\[ L^{(k)}L^{(l)} = (I_n - B^{(k)})(I_n - B^{(l)}) = I_n - (B^{(k)} + B^{(l)}), \]
and the result follows. □


Using this result, we now write
\[ A = (L^{(n-1)} \cdots L^{(1)})^{-1} A^{(n)} = (L^{(1)})^{-1} \cdots (L^{(n-1)})^{-1} A^{(n)} = (I_n + B^{(1)}) \cdots (I_n + B^{(n-1)})\, A^{(n)}. \]
And, using a similar property, changing B^{(k)} into −B^{(k)}, we have
\[ (I_n + B^{(1)}) \cdots (I_n + B^{(n-1)}) = I_n + B^{(1)} + \cdots + B^{(n-1)}, \]
thus leading to
\[ A = (I_n + B^{(1)} + \cdots + B^{(n-1)})\, A^{(n)}, \]
where the matrix L = I_n + B^{(1)} + \cdots + B^{(n-1)} is a lower triangular matrix (with unit diagonal) and the matrix U = A^{(n)} is upper triangular.

We introduce the following result for the Gaussian elimination method without pivoting.

Lemma 4.2.2. Let A ∈ M_n(R) be a nonsingular matrix. Suppose that all its leading principal submatrices of order 1 to n−1 are invertible. Then there exist a unique lower triangular matrix L with unit diagonal coefficients and a unique invertible upper triangular matrix U such that
\[ A = LU. \]

Proof. We first prove the uniqueness. Suppose we have two decompositions A = L_1U_1 and A = L_2U_2, where L_1 and L_2 are lower triangular matrices with unit diagonal coefficients; thus L_1 and L_2 are invertible. Similarly, U_1 and U_2 are upper triangular invertible matrices. Then we can write
\[ L_2^{-1}L_1 = U_2U_1^{-1}. \]
L_2^{-1}L_1 is a lower triangular matrix with unit diagonal and U_2U_1^{-1} is an upper triangular matrix; this matrix is thus diagonal, and we have
\[ L_2^{-1}L_1 = I_n \quad\text{and}\quad U_2U_1^{-1} = I_n. \]
Hence L_1 = L_2 and U_1 = U_2, which proves the uniqueness.

To apply Gaussian elimination without pivoting, the coefficient a^{(i)}_{ii} must be different from zero at each step. We prove by induction, using the invertibility of the leading principal submatrices, that this is the case. We have a_{11} ≠ 0 and thus a^{(1)}_{11} ≠ 0. We assume that a^{(k)}_{kk} ≠ 0 for every k = 1, …, i−1. Using Gaussian elimination, we have then


\[ A = A^{(1)} = (L^{(i-1)} \cdots L^{(1)})^{-1} A^{(i)}. \]
By developing by blocks, we obtain
\[ \begin{pmatrix} A_i & \cdot\\ \cdot & \cdot \end{pmatrix} = \begin{pmatrix} L_i & 0\\ \cdot & \cdot \end{pmatrix} \begin{pmatrix} A^{(i)}_i & \cdot\\ \cdot & \cdot \end{pmatrix}, \]
where A_i and A^{(i)}_i are the leading principal submatrices of order i of A and A^{(i)}, respectively, and L_i ∈ M_i(R) is a lower triangular matrix with unit coefficients on its diagonal. Hence, we have
\[ \det(A_i) = \det(L_i A^{(i)}_i) = \det(L_i)\,\det(A^{(i)}_i) = 1 \cdot a^{(1)}_{11} \cdots a^{(i)}_{ii}. \]
By hypothesis, det(A_i) ≠ 0 and thus
\[ \prod_{k=1}^{i} a^{(k)}_{kk} \neq 0 \;\Rightarrow\; a^{(i)}_{ii} \neq 0. \]
We can thus apply a new step of Gaussian elimination. At the end, we will have
\[ A = (L^{(1)})^{-1}L^{(1)}A^{(1)} = (L^{(1)})^{-1}A^{(2)} = \cdots = (I_n + B^{(1)} + \cdots + B^{(n-1)})\,A^{(n)}. \]
We set U = A^{(n)} and L = (L^{(1)})^{-1} \cdots (L^{(n-1)})^{-1} = I_n + B^{(1)} + \cdots + B^{(n-1)}, and we conclude. □

Corollary 4.2.1. Let A ∈ M_n(C) be a given Hermitian positive definite matrix. Then A admits an LU decomposition.

Proof. It suffices to check that the leading principal submatrices of A are invertible. These matrices are Hermitian and positive definite, and thus they are invertible. Using Lemma 4.2.2, we conclude that A admits an LU decomposition. □

Remark 4.2.3. Suppose the problem Ax = b needs to be solved for the same matrix A but for various data b, as may happen when dealing with approximation methods. It is then important, for efficiency purposes, to avoid recomputing the LU decomposition each time. Hence, the matrix A^{(n)} must be computed only once, while the vectors b^{(n)} and x must be evaluated for each right-hand side b. In practice, the matrices L and U are computed during the resolution of the first system and are kept in memory. The solution of every subsequent system Ax = b is then obtained by solving two subproblems involving triangular matrices:

1. find y ∈ R^n solving Ly = b;
2. find x ∈ R^n solving Ux = y.

We have the following result.


Theorem 4.2.2 (LU factorization). Let A ∈ M_n(R) be a given nonsingular matrix. Then there exists a permutation matrix P such that the Gaussian elimination on the matrix PA can be carried out without pivoting, giving the factorization
\[ PA = LU, \]
where the pair of matrices (L, U), with L lower triangular with unit diagonal and U upper triangular, is uniquely determined.

Proof. cf. [Cia89]. □

In practice, the LU factorization of a matrix A can be achieved, if it exists, by setting the matrices L and U as follows:
\[ L = \begin{pmatrix}
1 & 0 & \dots & 0\\
l_{21} & \ddots & \ddots & \vdots\\
\vdots & \ddots & \ddots & 0\\
l_{n1} & \dots & l_{n,n-1} & 1
\end{pmatrix}
\qquad\text{and}\qquad
U = \begin{pmatrix}
u_{11} & \dots & \dots & u_{1n}\\
0 & u_{22} & & \vdots\\
\vdots & \ddots & \ddots & \vdots\\
0 & \dots & 0 & u_{nn}
\end{pmatrix}, \]
and then, since L (resp. U) is lower (resp. upper) triangular, we write, for 1 ≤ i, j ≤ n,
\[ a_{ij} = \sum_{k=1}^{n} l_{ik}u_{kj} = \sum_{k=1}^{m} l_{ik}u_{kj}, \qquad\text{where } m = \min(i, j). \]
The entries l_{ij} and u_{ij} are easily calculated by reading the columns of A in increasing order (a complete description of the algorithm can be found in [All08], for instance).
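Reading the identity a_{ij} = Σ_{k=1}^{min(i,j)} l_{ik}u_{kj} row by row gives a simple dense algorithm. The sketch below (ours, not from the text) omits the permutation P, so it assumes the hypotheses of Lemma 4.2.2; it also shows the two triangular solves of Remark 4.2.3:

```python
import numpy as np

def lu_factor(A):
    """Return (L, U) with A = L U, L unit lower triangular, U upper triangular.
    Assumes all leading principal submatrices of A are invertible (no pivoting)."""
    n = A.shape[0]
    L = np.eye(n)
    U = np.zeros((n, n))
    for i in range(n):
        # row i of U:  u_ij = a_ij - sum_{k<i} l_ik u_kj   (j >= i)
        for j in range(i, n):
            U[i, j] = A[i, j] - L[i, :i] @ U[:i, j]
        # column i of L:  l_ji = (a_ji - sum_{k<i} l_jk u_ki) / u_ii   (j > i)
        for j in range(i + 1, n):
            L[j, i] = (A[j, i] - L[j, :i] @ U[:i, i]) / U[i, i]
    return L, U

def lu_solve(L, U, b):
    """Solve LUx = b: forward substitution for Ly = b, then back substitution for Ux = y."""
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = b[i] - L[i, :i] @ y[:i]          # unit diagonal of L
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

A = np.array([[4.0, 3.0, 2.0],
              [6.0, 3.0, 1.0],
              [2.0, 1.0, 3.0]])
b = np.array([1.0, 2.0, 3.0])
L, U = lu_factor(A)
assert np.allclose(L @ U, A)
assert np.allclose(lu_solve(L, U, b), np.linalg.solve(A, b))
```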

Remark 4.2.4. 1. The LU factorization offers two advantages. It divides the solution of a linear system Ax = b into two independent steps:
a) the factorization PA = LU;
b) the resolution of two triangular systems, Ly = Pb and Ux = y, for y and x respectively. Indeed, we have
\[ PAx = LUx = L(Ux) = Pb. \]
2. If the LU factorization of PA is known, then the solution x can be computed by solving the two systems Ly = Pb and Ux = y in O(n^2) operations.
3. If the matrix A is tridiagonal, then solving the system Ax = b requires only 3(n−1) additions, 3(n−1) multiplications and 2n divisions [Cia89].


4.2.4 Cholesky method

The Cholesky method applies only to symmetric and positive definite matrices. We recall that a real symmetric matrix A is positive definite if all its eigenvalues are positive. If A is symmetric positive definite, it is invertible.

The Cholesky method¹ consists in determining a lower triangular matrix L such that
\[ A = LL^t, \]
so that solving the linear system Ax = b becomes equivalent to solving two triangular systems
\[ Ly = b \quad\text{and}\quad L^t x = y. \]
This decomposition also allows one to compute the inverse matrix A^{-1} and the determinant, det(A) = (\prod_{i=1}^{n} l_{ii})^2. The matrix L sometimes appears as the square root of the matrix A.

Suppose the decomposition LL^t has been obtained; then we solve successively

1. the system Ly = b, written as follows:
\[ Ly = \begin{pmatrix} l_{11} & & 0\\ \vdots & \ddots & \\ l_{n1} & \dots & l_{nn} \end{pmatrix}
\begin{pmatrix} y_1\\ \vdots\\ y_n \end{pmatrix}
= \begin{pmatrix} b_1\\ \vdots\\ b_n \end{pmatrix}, \]
which can be rewritten componentwise, for i = 1, …, n, as
\[ l_{11}y_1 = b_1 \;\Rightarrow\; y_1 = l_{11}^{-1}b_1, \qquad
l_{21}y_1 + l_{22}y_2 = b_2 \;\Rightarrow\; y_2 = l_{22}^{-1}(b_2 - l_{21}y_1), \qquad \dots \]
\[ \sum_{j=1}^{n} l_{nj}y_j = b_n \;\Rightarrow\; y_n = l_{nn}^{-1}\Big( b_n - \sum_{j=1}^{n-1} l_{nj}y_j \Big), \]
which gives (y_i)_{1\le i\le n};
2. the system L^t x = y for x,
\[ L^t x = \begin{pmatrix} l_{11} & \dots & l_{n1}\\ & \ddots & \vdots\\ 0 & & l_{nn} \end{pmatrix}
\begin{pmatrix} x_1\\ \vdots\\ x_n \end{pmatrix}
= \begin{pmatrix} y_1\\ \vdots\\ y_n \end{pmatrix}, \]
which gives (x_i)_{1\le i\le n} using the classical backward substitution, as follows:

¹ Named after the French mathematician André-Louis Cholesky (1875-1918).


\[ l_{nn}x_n = y_n \;\Rightarrow\; x_n = l_{nn}^{-1}y_n, \qquad \dots \]
\[ \sum_{j=1}^{n} l_{j1}x_j = y_1 \;\Rightarrow\; x_1 = l_{11}^{-1}\Big( y_1 - \sum_{j=2}^{n} l_{j1}x_j \Big). \]

The existence and uniqueness of the LL^t decomposition of a symmetric and positive definite matrix A are given thereafter.

Theorem 4.2.3 (Cholesky factorization). Let A ∈ M_n(R) be a symmetric and positive definite matrix. There exists a unique real lower triangular matrix L, having positive diagonal entries l_{ii} > 0 for all 1 ≤ i ≤ n, such that
\[ A = LL^t. \]

Proof. We have seen (Theorem 4.2.2) that there exists a pair of matrices (L, U) such that PA = LU. Here, we show that the decomposition can be obtained without any permutation. More precisely, Lemma 4.2.2 will allow us to conclude to the existence and uniqueness of the decomposition.

Since A is a positive definite matrix, all its leading principal (symmetric) submatrices A_k are also positive definite and thus invertible. Considering the LU decomposition of A, we write
\[ \det(A_k) = \det((LU)_k) = \prod_{i=1}^{k} u_{ii}, \]
and we observe that det(A_k) > 0, since all the eigenvalues of A_k are positive (and det(A_k) corresponds to the product of the eigenvalues of A_k); hence we have u_{ii} > 0 for all 1 ≤ i ≤ k. By induction we show that u_{ii} > 0 for all 1 ≤ i ≤ n. Next, we introduce the real matrix Λ = diag(√u_{ii}) in the LU factorization and we obtain
\[ A = (L\Lambda)(\Lambda^{-1}U) = \tilde L\,\tilde U \qquad\text{with}\quad \tilde L = L\Lambda, \ \ \tilde U = \Lambda^{-1}U. \]
Furthermore, since A is symmetric, we write \tilde L\tilde U = \tilde U^t\tilde L^t or, similarly,
\[ (\tilde U^t)^{-1}\tilde L = \tilde L^t\,\tilde U^{-1}. \]
Since \tilde U^t and (\tilde U^t)^{-1} are lower triangular, (\tilde U^t)^{-1}\tilde L is lower triangular. Likewise, \tilde L^t\tilde U^{-1} is upper triangular. This matrix identity is only possible if both matrices are diagonal; computing the diagonal entries,
\[ (\tilde L^t\tilde U^{-1})_{ii} = \sqrt{u_{ii}}\;\frac{1}{\sqrt{u_{ii}}} = 1, \]
i.e., \tilde L^t\tilde U^{-1} = I_n, or \tilde L^t = \tilde U. This proves the existence of (at least) one Cholesky factorization, namely A = \tilde L\tilde L^t.

The uniqueness of the decomposition is a direct consequence of the uniqueness property of the LU decomposition. □


Remark 4.2.5. If the diagonal entries of the matrix L are not required to be all positive, then the LL^t decomposition may not be unique.

The analysis of the factorization gives a practical algorithm for computing the matrix L. From the identity A = LL^t, we deduce
\[ a_{ij} = (LL^t)_{ij} = \sum_{k=1}^{n} l_{ik}l_{jk} = \sum_{k=1}^{m} l_{ik}l_{jk}, \qquad 1 \le i, j \le n, \]
where m = min(i, j), by noticing that l_{pq} = 0 for 1 ≤ p < q ≤ n. The matrix A being symmetric, the previous identities need only be satisfied for all i ≤ j, and the entries l_{ij} of L must be such that
\[ a_{ij} = \sum_{k=1}^{i} l_{ik}l_{jk}, \qquad 1 \le i \le j \le n. \]

Like for the LU factorization, the entries of L are computed by reading the columns of A in increasing order:

1. for i = 1, the first column of L is given by
\[ a_{11} = l_{11}l_{11} \;\Rightarrow\; l_{11} = \sqrt{a_{11}}, \qquad
a_{12} = l_{11}l_{21} \;\Rightarrow\; l_{21} = l_{11}^{-1}a_{12}, \qquad \dots, \qquad
a_{1n} = l_{11}l_{n1} \;\Rightarrow\; l_{n1} = l_{11}^{-1}a_{1n}; \]
2. for i ≥ 2, we compute the ith column of L, assuming the first (i−1) columns of L have been previously computed:
\[ a_{ii} = \sum_{k=1}^{i} l_{ik}l_{ik} \;\Rightarrow\; l_{ii} = \Big( a_{ii} - \sum_{k=1}^{i-1} l_{ik}^2 \Big)^{1/2}, \]
\[ a_{i,i+1} = \sum_{k=1}^{i} l_{ik}l_{i+1,k} \;\Rightarrow\; l_{i+1,i} = l_{ii}^{-1}\Big( a_{i,i+1} - \sum_{k=1}^{i-1} l_{ik}l_{i+1,k} \Big), \]
\[ \vdots \]
\[ a_{in} = \sum_{k=1}^{i} l_{ik}l_{nk} \;\Rightarrow\; l_{ni} = l_{ii}^{-1}\Big( a_{in} - \sum_{k=1}^{i-1} l_{ik}l_{nk} \Big), \]
and we know that
\[ a_{ii} - \sum_{k=1}^{i-1} l_{ik}^2 > 0. \]
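Transcribed column by column, these relations give the following sketch (ours, not from the text); it raises an error if a nonpositive quantity appears under the square root, which signals that A is not symmetric positive definite:

```python
import numpy as np

def cholesky(A):
    """Return the lower triangular L with positive diagonal such that A = L L^t.
    A is assumed symmetric positive definite."""
    n = A.shape[0]
    L = np.zeros_like(A, dtype=float)
    for j in range(n):                      # column j of L
        d = A[j, j] - L[j, :j] @ L[j, :j]   # a_jj - sum_{k<j} l_jk^2
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j, j] = np.sqrt(d)
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - L[i, :j] @ L[j, :j]) / L[j, j]
    return L

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 3.0, 1.0],
              [2.0, 1.0, 3.0]])
L = cholesky(A)
assert np.allclose(L @ L.T, A)
assert np.allclose(L, np.linalg.cholesky(A))
# det(A) = (det L)^2 = (prod of l_ii)^2, as noted in Remark 4.2.6
assert np.isclose(np.prod(np.diag(L))**2, np.linalg.det(A))
```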


Remark 4.2.6. 1. The Cholesky method requires about n^3/6 additions, n^3/6 multiplications, n^2/2 divisions and n square root computations.
2. The Cholesky method is also useful for computing the determinant of A, by noticing that
\[ \det(A) = (\det(L))^2. \]

Band matrices are commonly found in many applications (see Chapters 5, 6, ??, for instance). Their peculiar structure proves useful to spare memory and to improve computational efficiency during the factorization.

Definition 4.2.2. A matrix A ∈ M_n(R) is said to be a band matrix if a_{ij} = 0 for |i − j| > p, p ∈ N. The bandwidth of A is then 2p + 1.

Proposition 4.2.3. The Cholesky factorization preserves the band structure of the matrix A.

Remark 4.2.7. Suppose the matrix A is nonsymmetric, or symmetric but not positive definite. We observe that, for A invertible,
\[ Ax = b \;\Leftrightarrow\; A^tAx = A^tb, \]
since det(A^t) = det(A) ≠ 0. The matrix A^tA is symmetric and positive definite. Indeed, given any matrix B ∈ M_n(R), B is symmetric if and only if (Bx, y) = (x, By), and we always have (Bx, y) = (x, B^ty), for all (x, y) ∈ R^n × R^n. Taking B = A^tA, we deduce that A^tA is symmetric. Moreover, since A is invertible, (A^tAx, x) = (Ax, Ax) = ‖Ax‖² > 0 if x ≠ 0. Hence, A^tA is symmetric and positive definite.

The Cholesky method can thus be used to compute the solution x of Ax = b when A is nonsymmetric. It consists in calculating A^tA and A^tb, then solving A^tAx = A^tb using the classical Cholesky method. However, this approach is more computationally expensive; it requires O(4n^3/3) operations, compared to O(2n^3/3) operations for the LU factorization.

Remark 4.2.8. Regarding nonsquare systems of the form Ax = b, with A ∈ M_{m,n}(R), b ∈ R^m and x ∈ R^n, we shall notice that such systems have in general no solution when there are more equations than unknowns; the systems are then said to be overdetermined. Nevertheless, we may look for an x that solves the equations "at best". To this end, we define f : R^n → R by
\[ f(x) = \|Ax - b\|_2^2, \]
where ‖·‖_2 = (·,·)^{1/2} denotes the Euclidean norm. Then we search for x solution of the optimization problem
\[ f(x) \le f(y), \qquad \forall y \in \mathbb{R}^n. \]
We can develop f(x) as follows:

Page: 127 job: book macro: svmono.cls date/time: 11-Jun-2009/15:34

Page 28: 4 Solving linear systems - sorbonne-universite systems.pdf · sparse linear systems are presented in Section 3. Finally, a small section is devoted to methods for computing eigenvalues.

128 4 Solving linear systems

f(x) = (AtAx, x)# 2(b, Ax) + (b, b) .

If there exists a solution to the optimization problem, it is given by the reso-lution of

AtAx = Atb

and we have seen that the Choleski method can be used to solve this system.

4.2.5 The QR decomposition

Another matrix factorization method, the QR decomposition, attempts toreduce the linear system Ax = b to a triangular system. Here, the matrixA is factorized as the product of a orthogonal (unitary) matrix Q (such thatQt = Q$1 or Q" = Q$1) by an upper triangular matrix R. For any nonsingularmatrix A, there exists an orthogonal matrix Q.

For solving the system Ax = b, the QR factorization determines an orthog-onal matrix Q such that Q"A = R is upper triangular and compute Q"b, andthe solution x is then obtained by solving the triangular system Rx = Q"b.

Theorem 4.2.4 (QR factorization). Let A ! Mn(K) be a nonsingularmatrix. There exists a unique pair (Q, R) of matrices, where Q is orthogonaland R is upper triangular with positive diagonal entries rii > 0, such that

A = QR .

In order for us to prove this result, we need to introduce the Gram-Schmidtorthogonalization process. This process provides a convenient method for or-thogonalizing a set of vector in a vector or an inner product space. We havethe following statement, given here without proof.

Lemma 4.2.3 (Gram-Schmidt). Consider a finite linearly independent setS = {x1, . . . , xk} in Kn, for k ' n. Then, there exists an orthogonal setSo = {y1, ,yk} that spans the same k-dimensional subspace of Kn as S, i.e.,

span{y1, . . . , yk} = span{x1, . . . , xk} .

Firstly, we define the projection operator, that projects a vector y orthogonallyonto a vector x, by

projx(y) =(x, y)(x, x)

x = (x, y)x

(x, x)

where (·, ·) denotes the inner product. The Gram-Schmidt process is thendefined by induction on k

Page: 128 job: book macro: svmono.cls date/time: 11-Jun-2009/15:34

Page 29: 4 Solving linear systems - sorbonne-universite systems.pdf · sparse linear systems are presented in Section 3. Finally, a small section is devoted to methods for computing eigenvalues.

4.2 Direct methods 129

y1 = x1 e1 =y1

,y1,

y2 = x2 # projy1(x2) e2 =

y2

,y2,...

...

yk = xk #k$1!

j=1

projyk(xk) ek =

yk

,yk,

At completion, {y1, . . . , yk} is the system of orthogonal vectors and the set{e1, . . . , ek} is orthonormal.

Proof (of Theorem 4.2.4). The uniqueness of the factorization is obtained bycontradiction. Suppose there exist two decompositions

A = Q1R1 and A = Q2R2 .

Then Q"2Q1 = R2R$11 is upper triangular with positive diagonal entries. We

have then(Q"2Q1)(Q"2Q1)" = In = LL"

and we observe that L = (Q"2Q1) is a Cholesky factorization of In and thusL = In.

Since A is nonsingular, its column vectors a1, . . . , an form a basis of Rn.We apply the Gram-Schmidt orthogonalization process to the (ai)1#i#n. Thisyields an orthonormal basis {q1, . . . , qn} defined by, for all 1 ' i ' n

qi =xi

,xi,, with xi = ai #

i$1!

k=1

(qk, ai)qk .

Then, posing rki = (qk, ai) for 1 ' k ' i# 1, we deduce

ai =i!

k=1

rkiqk and rii = ,ai #i$1!

k=1

(qk, ai)qk, > 0 .

We denote by Q = (q1, . . . , qn) the orthogonal matrix and by R = (rij) theupper triangular matrix, setting rki = 0 for k > i; we have then A = QR. %&

Remark 4.2.9. The Gram-Schmidt algorithm for the QR factorization involvesalmost three times more operations than the Gaussian elimination. This draw-back explains why it is not used in practice for solving square linear systems.However, it may be used for solving least square fitting problems involvingrectangular matrices.

By extension, a rectangular matrix A ! Mm,n(R), with m * n, admits aQR decomposition if there exists an orthogonal matrix Q !Mm,m(R) and an

Page: 129 job: book macro: svmono.cls date/time: 11-Jun-2009/15:34

Page 30: 4 Solving linear systems - sorbonne-universite systems.pdf · sparse linear systems are presented in Section 3. Finally, a small section is devoted to methods for computing eigenvalues.

130 4 Solving linear systems

upper trapezoidal matrix R !Mm,n(R) with rows n + 1 to m all null, suchthat

A = Q

3R0

4.

Remark 4.2.10. On a computer, the Gram-Schmidt process gives vectors thatare often not orthogonal because of rounding errors and therefore this processis said to be numerically unstable [Cia89]. Other orthogonalization methodsare usually favored, like Householder transformations or Given rotations (seethe references at the end of this chapter).

4.3 Iterative methods

The direct methods presented in the previous section are e"cient, they providethe exact solution x (in the absence of rounding errors) of the linear systemAx = b. However, they require a large memory to store the matrix A. If thesystem results from the discretization of a partial di!erential equation, thematrix A is generally sparse, i.e., contains a majority of zeroes, but can bevery large. Iterative methods take advantage of the sparse structure of thematrix in memory and usually involve only matrix-vector products. Hence,a desired feature for a resolution method is to preserve at best the sparsityof the matrix A. This criterion is clearly discrediting most direct methods.In this regard, the LU decomposition fills the sparse structure of the matrix.If the bandwidth of each matrix L and U is p + 1, the number of nonzeroelements per line is then 2p+1. However, we are considering here large sparsematrices A !Mn(R), such that

m 1 2p + 1 1 n

where m denotes the maximal number of nonzero elements per line in A. Inother words, the size required to store matrices L and U is much larger thanthe memory required to store the nonzero entries of A using a suitable datastructure, like the CSR (Compressed Sparse Row) format.

4.3.1 General context

All algorithms considered in this section are called iterative because theycompute a sequence (x(k))k&1 of approximate solutions, given an initial datax(0), that converges under certain assumptions, toward the solution x of theproblem Ax = b, when k tends to +3.

Since linear systems are considered, it would seem then reasonable to buildthe sequence of iterates of the general form

x(k+1) = Bx(k) + c , k * 0 , for any x(0) . (4.4)

Page: 130 job: book macro: svmono.cls date/time: 11-Jun-2009/15:34

Page 31: 4 Solving linear systems - sorbonne-universite systems.pdf · sparse linear systems are presented in Section 3. Finally, a small section is devoted to methods for computing eigenvalues.

4.3 Iterative methods 131

Here, B !Mn(R) and c ! Rn must be carefully chosen to ensure the conver-gence of the method. The next lemma gives a condition on the convergenceof such method.

Lemma 4.3.1. An iterative method of the form (4.4) converges to the solutionx of Ax = b, for any choice of x(0), if and only if '(B) < 1.

Proof. Having denoted e(k) = x(k) # x, the error at iteration k, we have

e(k) = Bke(0)

and thanks to Lemma 4.1.7, it follows that limk*'

Bke(0) = 0 for any vector

e(0) if and only if '(B) < 1.Conversely, suppose that '(B) > 1, then there exists at least one eigenvalue

&(B)| > 1. Let e(0) be an eigenvector associated with &, then Be(0) = &e(0)

and e(k) = &ke(0). This prevents e(k) from tending to 0 as k -3. %&

Remark 4.3.1. 1. An iterative method of the form (4.4) is not very useful inpractice, because it requires the calculation of the inverse in A$1b. Supposethe method is convergent. Then, by taking the limit in the inductionrelation, we obtain x = Bx + c and, since x = A$1b, this leads to setc = (In #B)A$1b.

2. An iterative method of the form (4.4) is a special instance of iterativemethods to find a fixed point for the mapping

F : x ! Rn - F (x) = (Bx + c) ! Rn ,

that is a contraction if ,B, < 1, with respect to any matrix norm. Bytaking the matrix norm induced by the vector norm , ·, , we have

,F (x)# F (y), ' ,B,,x# y, .

Among the many iterative methods to solve linear systems, we restrictourselves to a few algorithms that epitomize this class of methods.

4.3.2 Linear iterative methods

In this section, we consider only iterative methods in which the iterate x(k+1)

is a function of x(k) only and not of x(k$1), . . . x(1), i.e., x(k+1) = F (x(k). Aclassical and general approach is based on a regular decomposition (or split-ting) of the matrix A of the form

A = M #N ,

where M is a nonsingular matrix, easy to invert in practice. We have then theset of equivalences

Ax = b . Mx = Nx + b . x = M$1Nx + M$1b .

Page: 131 job: book macro: svmono.cls date/time: 11-Jun-2009/15:34

Page 32: 4 Solving linear systems - sorbonne-universite systems.pdf · sparse linear systems are presented in Section 3. Finally, a small section is devoted to methods for computing eigenvalues.

132 4 Solving linear systems

We can then define the iterative method based on the splitting as follows

x(k+1) = M$1Nx(k) + M$1b , k * 0 , for any x(0) , (4.5)

whose iteration matrix is B = M$1N = In #M$1A.

Remark 4.3.2. If the sequence (x(k))k&1 converges to a limit x as k -3, thenby taking the limt in the induction relation we have (M #N)x = Ax = b.

Definition 4.3.1. Let A = M # N ! Mn(K) with M nonsingular. Themethod of the form (4.5) is convergent if, for any b ! Kn and for any choiceof x(0) ! Kn, the sequence (x(k))k&1 converges to x, the solution of Ax = b inKn. We have then

limk*'

,x(k) # x, = 0 ,

where , ·, denotes any norm in Kn.

As previously, we consider the error at iteration k + 1, e(k+1) = x(k+1) # x.The method is convergent if e(k+1) tends to 0 as k -3. However, since x isthe unknown, e(k+1) cannot be evaluated. We would have come to the sameconclusion by considering the residual r(k+1) = b # Ax(k+1) instead of theerror e(k+1). Nevertheless, we observe that

e(k+1) = M ($1)N(x(k) # x) = M$1Ne(k) = Be(k) = Bk+1e(0) .

The next lemma provides convergence critera for the iterative method.

Lemma 4.3.2. The iterative method (4.5) converges if and only if the spec-tral radius of B satisfies '(B) < 1.

Proof. see Lemma 4.3.1. %&

Jacobi, Gauss-Seidel and relaxation methods

In this section, we consider a few classical linear iterative methods, corre-sponding to di!erent regular decompositions.

Suppose the diagonal entries aii of A are nonzero.

Definition 4.3.2. The Jacobi iterative method is defined by the regular de-composition

M = D , N = D #A = E + F ,

where D = diag(aii), and E denotes the lower triangular matrix of entrieseij = #aij, if i > j and F is the upper triangular matrix of entries fij = #aij

if j > i.

Page: 132 job: book macro: svmono.cls date/time: 11-Jun-2009/15:34

Page 33: 4 Solving linear systems - sorbonne-universite systems.pdf · sparse linear systems are presented in Section 3. Finally, a small section is devoted to methods for computing eigenvalues.

4.3 Iterative methods 133

The iteration matrix BJ f the Jacobi method is then

BJ = M$1N = D$1(E + F ) = In #D$1A ,

and the iterative algorithm is written as

Dx(k+1) = (E + F )x(k) + b , k * 0 , for any x(0) .

Once x(0) has been chosen, the vector x(k+1) can be computed with the for-mula

aiix(k+1)i = bi #

!

j>ij (=i

aijx(k)j , i = 1, . . . , n . (4.6)

The convergence of Jacobi algorithm is given by the following lemma.

Lemma 4.3.3. For any initial vector x(0), the Jacobi method (4.6) convergesif the matrix A is strictly diagonally dominant, i.e., if

|aii| >!

j (=i

|aij | , i = 1, . . . , n .

Proof. From Lemmas 4.1.7 and 4.3.2, we know that it is su"cient to find asubordinate norm , ·, such that the matrix B = D$1(E + F ) satisfies

,B, < 1 .

Since A is strictly diagonally dominant, it is easy to show that

n!

j=1

|bij | = |aii|$1n!

j=1j (=i

|aij | < 1 ,

and thus ,B,' = max1#i#n

n!

j=1

|bij | < 1. %&

Definition 4.3.3. The Gauss-Seidel method is defined by the regular decom-position

M = D # E , N = F ,

where the matrices D,E and F are the same as with Jacobi method.

The iteration matrix BGS is then

BGS = M$1N = (D # E)$1F ,

and the iterative algorithm can be written as

(D # E)x(k+1) = Fx(k) + b , k * 0 , for any x(0) .

Page: 133 job: book macro: svmono.cls date/time: 11-Jun-2009/15:34

Page 34: 4 Solving linear systems - sorbonne-universite systems.pdf · sparse linear systems are presented in Section 3. Finally, a small section is devoted to methods for computing eigenvalues.

134 4 Solving linear systems

Once x(0) has been chosen, the vector x(k+1) can be computed with the for-mula

aiix(k+1)i = bi #

i$1!

j=i

aijx(k)j #

n!

j=i+1

aijx(k)j , i = 1, . . . , n . (4.7)

And we have the following results for Gauss-Seidel method.

Proposition 4.3.1. Let A = M # N be a symmetric and positive definitematrix. If M t + N is symmetric and positive definite, then '(M$1N) < 1.

Lemma 4.3.4. If A is symmetric and positive definite, then for any initialvector x(0), the Gauss-Seidel method converges to the solution x of Ax = b.

Proof. We have to show that M t + N is symmetric and positive definite,

M t + N = (D # E)t = F = D # Et + F ,

and since A is symmetric, Et = F and thus M t + N = D, that is symmetricand positive definite. The previous proposition allows to conclude. %&

A variant consists in introducing a relaxation parameter + in the Gauss-Seidelmethod to improve its convergence rate.

Definition 4.3.4. Let + ! R+. The iterative relaxation method is defined bythe regular decomposition

M =D

+# E , N =

1# +

+D + F ,

where the matrices D,E and F are the same as with Jacobi method.

The iteration matrix B% is then

B% = M$1N =3

D

+# E

4$1 31# +

+D + F

4

= (In # +D$1E)5(1# +)In + +D$1F

6.

and the iterative algorithm is written as

aiix(k+1)i = +

&

'bi #i$1!

j=1

aijx(k+1)j #

n!

j=i+1

aijx(k+1)j

(

) + (1# +)x(k)i ,

Remark 4.3.3. 1. The relaxation method is well defined if D is invertible;2. if + = 1, the relaxation method coincides with the Gauss-Seidel method;3. if + > 1 (resp. + < 1) it is called over-relaxation (resp. under-relaxation)

method.

The convergence of the relaxation method results from the following property.

Page: 134 job: book macro: svmono.cls date/time: 11-Jun-2009/15:34

Page 35: 4 Solving linear systems - sorbonne-universite systems.pdf · sparse linear systems are presented in Section 3. Finally, a small section is devoted to methods for computing eigenvalues.

4.3 Iterative methods 135

Proposition 4.3.2. Let A = M #N be the regular deomposition of a Hermi-tian positive definite matrix A, with M nonsingular. Then the matrix M"+Nis Hermitian and, if M"+N is also positive definite, we have '(M$1N) < 1.

Lemma 4.3.5. If A is a Hermitian positive definite matrix, then for any + ![0, 2[ and any initial vector x(0), the relaxation method converges to x thesolution of Ax = b.

Proof. Matrix D is definite positive as A is positive definite. Hence, +$1D#Eis nonsingular and, by noticing that E" = F , we have

M" + N = +$1D #" ++$1(1# +)D + F = +$1(2# +)D ,

and we can conclude that M" + N is positive definite if and only if + ! [0, 2[and the previous proposition leads to the result. %&Lemma 4.3.6. For any matrix A, the spectral radius of B% is such that

'(B%) * |1# +| , for all + (= 0 ,

and thus the relaxation method converges only if + !]0, 2[.

Remark 4.3.4. The over-relaxation procedure can be also applied to Jacobimethod and the algorithm (A.3) is then written as

x(k+1) = x(k) + +D$1(b#Ax(k))

and the vector x(k+1) is computed as

aiix(k+1)i = +

&

11'bi #n!

j>ij (=i

aijx(k)j

(

22) + (1# +)x(k)i , i = 1, . . . , n . (4.8)

The corresponding iteration matrix is then

BJ" = +BJ + (1# +)In .

Practical issues

When computing the sequence of approximate solutions, a practical stoppingcriterion is necessary. Since the solution vector x is unknown, it is of limitedusefulness to evaluate the di!erence ,x(k) # x, and to stop when the desiredaccuracy $ is reached. Since Ax is known, a simple convergence criterion couldbe ,b#Ax(k), ' $. However, if ,A$1, is large, we have then

,x# x(k), ' ,A$1,,b#Ax(k), ' $,A$1, ,

and this term may not be small. For this reason, another criterion based onthe residual is favored

,b#Ax(k), ' $ ,b#Ax(0), . ,r(k), ' $,r(0), .

The computational cost of the iterative methods is at most 3/2n2 periteration, which is favorable compared to direct methods if the number ofiterations if small before n.

Page: 135 job: book macro: svmono.cls date/time: 11-Jun-2009/15:34

Page 36: 4 Solving linear systems - sorbonne-universite systems.pdf · sparse linear systems are presented in Section 3. Finally, a small section is devoted to methods for computing eigenvalues.

136 4 Solving linear systems

4.3.3 Gradient methods

In this section, we restrict ourselves to the case of real symmetric matrices,although the case of complex self-adjoint matrices can be handled quite sim-ilarly.

The linear iterative methods based on regular decomposition rely on pa-rameters that may be di"cult to set properly. In this regard, the gradientmethod (also called steepest descent method or Richardson’s method) is de-fined as follows.

Definition 4.3.5. Let ! ! R". The gradient method is defined by the regulardecomposition

M = !$1In and N = (!$1In #A) ,

and the corresponding iterative algorithm can be written as

x(k+1) = x(k) + !(b#Ax(k)) , k * 1 , for any x(0) . (4.9)

Likewise, we have a convergence result for this method.

Lemma 4.3.7. Let (&i)1#i#n denote the real eigenvalues of A and suppose&i > 0, for all 1 ' i ' n. Then, the gradient method converges if and only ifthe parameter ! is such that 0 < ! < 2/&max.Moreover, the optimal parameter !opt which minimizes the spectral radius ofthe iteration matrix '(M$1N) is given by

!opt =2

&min + &maxand 'opt =

&max # &min

&min + &max=

cond2(A)# 1cond2(A) + 1

.

Proof. Since we assumed 0 < !min ' · · · ' &max, we can easily deduce that

#1 < 1# !&max 2 ! <2

&max.

The optimal value !opt is obtained by considering the function f : & /-|1#!&|. The analysis shows that f is decreasing on ]#3, 1/![ and increasingon ]1/!,+3[. Hence, '(M$1N) = max(|1 # !&min|, |1 # !&max|). On theother hand, the function ! ! [0, 2/&max /- '(M$1N) admits a minimum atthe point !opt defined by 1#!opt = !opt&max#1 (Figure 4.2). By substitution,the value 'opt s obtained. %&

Next, we show another interpretation of the gradient method that will justifythe denomination of projection method. To this end, we recall the followingnotions.

Definition 4.3.6. Let f be a function from Rn into R. The gradient of f atthe point x ! Rn is defined by 4f(x) = (Df(x))t ! Rn and we denote

4f(x) =3

,f

,x1(x), . . .

,f

,xn(x)

4t

.

Page: 136 job: book macro: svmono.cls date/time: 11-Jun-2009/15:34

Page 37: 4 Solving linear systems - sorbonne-universite systems.pdf · sparse linear systems are presented in Section 3. Finally, a small section is devoted to methods for computing eigenvalues.

4.3 Iterative methods 137

2!max

!opt

1

|1! "#k|

1!k

1!max

""opt

|1! "#min|

|1! "#max|

1!min

Fig. 4.2. Spectral radius of In ! "A as a function of ".

For (x, y) ! Rn " Rn, we have then

Df(x)y =n!

j=i

,f

,xj(x)yj = (4f(x), y) .

And we have the following results.

Lemma 4.3.8 (Existence). Let f be a function from Rn into R such that

(i) f is continuous;(ii) f(x) - +3 when ,x, - +3

Then, there exists a point x0 ! Rn such that f(x0) ' f(y) for all y ! Rn.

Notice that this result does not hold if f : E - R, where E is a Banach space(i.e., if the dimension of E is not finite). The point x0 is called a minimumof f , or f is said to attain its minimum at x0.

Definition 4.3.7 (Convexity). Let f be a function from a vector space Einto R. The function f is called convex if, for any two points (x, y) ! E2,x (= y and for t ! [0, 1]

f(tx + (1# t)y) ' tf(x) + (1# t)f(y) ,

Furthemore, the function f is called strictly convex if, for t !]0, 1[

f(tx + (1# t)y) < tf(x) + (1# t)f(y) .

Corollary 4.3.1 (Existence and uniqueness). Let f be a strictly convexfunction from Rn into R satisfying the hypothesis of Lemma 4.3.8. Then, thereexists a unique point x0 ! Rn such that

f(x0) = infy%Rn

f(y) .

Page: 137 job: book macro: svmono.cls date/time: 11-Jun-2009/15:34

Page 38: 4 Solving linear systems - sorbonne-universite systems.pdf · sparse linear systems are presented in Section 3. Finally, a small section is devoted to methods for computing eigenvalues.

138 4 Solving linear systems

We have the following characterization of a convex function.

Proposition 4.3.3. Suppose E is a normed vector space on R and let considera function f ! C1(E, R). Then, for any two points (x, y) ! E2,

1. f is convex if and only if f(y) * f(x) + Df(x)(y # x);2. f is strictly convex if and only if f(y) > f(x) + Df(x)(y # x), x (= y.

Proposition 4.3.4. Let f be a C1 continuous convex function from Rn intoR and let x0 ! Rn. Then,

f(x0) = infy%Rn

f(y) . 4f(x0) = 0 .

Next, we introduce an optimization problem.

Minimization of a quadratic functional.

Let consider a symmetric matrix A ! Mn(R), a vector b ! Rn and thefunction f from Rn into R defined by

f(x) =12(Ax, x)# (b, x) =

12

n!

i,j=1

aijxixj #n!

i=1

bixi , (4.10)

Then, f ! C'(Rn, R). The evaluation of the gradient of f shows that

4f(x) =12(Ax + Atx)# b ,

and hence, since A is symmetric we have

4f(x) = Ax# b .

Indeed, computing the kth partial derivative of f yields

,f

,xk(x) = akkxk +

12

!

i (=k

aikxi +12

!

i (=k

akixi # bk

=!

i

aikxi # bk = (Ax# b)k .

Proposition 4.3.5. Let A ! Mn(R) be a symmetric and positive definitematrix and b ! Rn. Then, there exists a unique point x0 ! Rn such thatf(x0) ' f(x) for any x ! Rn, and x0 is the unique minimum of f that issolution of the linear system Ax = b.

Remark 4.3.5. The quadratic form f defined Equation (4.10), is often calledthe energy of system Ax = b. If x0 is a solution of the system, then

f(y) = f(x0 + (y # x0)) = f(x0) +12(A(y # x0), y # x0) , for all y ! Rn ,

and we conclude that f(y) > f(x0) if y (= x0. This shows that x0 is a minimizerof the functional f .

Page: 138 job: book macro: svmono.cls date/time: 11-Jun-2009/15:34

Page 39: 4 Solving linear systems - sorbonne-universite systems.pdf · sparse linear systems are presented in Section 3. Finally, a small section is devoted to methods for computing eigenvalues.

4.3 Iterative methods 139

The following result provides an interpretation of the minimum as a projectionon a vector subspace.

Corollary 4.3.2. Let A !Mn(R) be a real symmetric and positive definitematrix and let f be the function defined by (4.10). Let E be a vector subspaceof Rn. There exists a unique x0 ! E such that

f(x0) ' f(x) , for all x ! E .

Moreover, x0 is the unique vector of E such that

(Ax0 # b, y) = 0 , for all y ! E .

If we denote by P the matrix representation of the orthogonal projection onthe vector subspace E in the canonical basis, we have

minx%E

f(x) = miny%Rn

f(Py) * minx%Rn

f(x) .

Hence, we have f(Py) = 12 (P tAPy, y)#(P tb, y) and the matrix P tAP is non-

negative. Moreover, P tb belongs to Im(P tAP ) and the minimum of f(Py) isattained by all x such that P tAPx = P tb. We can then enounce the followingresult.

Theorem 4.3.1. Let A !Mn(R) be a symmetric and positive definite ma-trix, and let f be the function defined on Rn by (4.10). Then,

1. x ! Rn is the minimum of f if and only if 4f(x) = 0 ,2. suppose 4f(x) (= 0 at x ! Rn. Then, for any value ! !]0, 2/'(A)[, we

havef(x# !4f(x)) < f(x) .

Proof. (admitted here, see [All08] for instance). %&

This result yields the definition of an iterative method to find the minimumof the function f . Namely, we define the sequence of points (xk)k&1 such thatthe sequence (f(k))k&1 is decreasing

xk+1 = xk # !4f(xk) = xk + !(b#Axk) , given x0 ,

and we observe that this relation is exactly identical to the algorithm definedby Equation (4.9). In other words, solving a linear system Ax = b for whichthe matrix A is symmetric and positive definite, is equivalent to minimiz-ing a quadratic functional. This concept will be used to define optimizationalgorithms.

Page: 139 job: book macro: svmono.cls date/time: 11-Jun-2009/15:34

Page 40: 4 Solving linear systems - sorbonne-universite systems.pdf · sparse linear systems are presented in Section 3. Finally, a small section is devoted to methods for computing eigenvalues.

140 4 Solving linear systems

Gradient descent methods

Let consider E = Rn and f ! C0(E, R). We assume that there exists a pointx0 ! Rn such that f(x0) = infy%E f(y). The goal is to compute this point x0.

Definition 4.3.8. Let f ! C0(E, R) and E = Rn. Let x ! E.

1. A vector w ! E\{0} is a descent direction at x if there exists !0 > 0 suchthat

f(x + !w) ' f(x) , for all ! ! [0, !0] .

2. A vector w ! E\{0} is a strict descent direction at x if there exists !0 > 0such that

f(x + !w) ' f(x) , for all ! !]0, !0] .

Using this definition, we can suggest the following descent method for findingthe point x0 that realize f(x0) = infy%E f(y). It consist in constructing thesequence (x(k))k&1 as follows

1. set x(0) = x0 ! E;2. given x(0), . . . , x(k$1)

a) find a strict descent direction wk of x(k),b) define x(k+1) = x(k) + !kwk, with !k suitably chosen.

The next result explains how to chose the parameter !k.

Proposition 4.3.6. Let E = Rn, f ! C1(E, R), x ! E and w ! E\{0}. Wehave the following properties

(i) if w is a descent direction at x then (w,4f(x)) ' 0,(ii)if 4f(x) (= 0 then w = #4f(x) is a strict descent direction at x.

Proof. We introduce the function - ! C1(R, R) defined by

-(!) = f(x + !w) .

(i) We have then -+(!) = (4f(x + !w), w). Since w is a descent direction atx, we have -(!) < -(0) for any ! !]0, !0] and thus

-(!)# -(0)!# 0

' 0 , for all ! !]0, !0] ,

By passing to the limit, when ! - 0, we deduce that -+(0) ' 0, i.e.,(4f(x), w) ' 0.(ii) Let w = #4f(x) (= 0. The objective is to find !0 > 0 such thatif ! !]0, !0] then f(x + !w) < f(x). This is equivalent to showing that-(!) < -(0). We have

-+(0) = (4f(x), w) = #|4f(x)|2 < 0 .

Page: 140 job: book macro: svmono.cls date/time: 11-Jun-2009/15:34

Page 41: 4 Solving linear systems - sorbonne-universite systems.pdf · sparse linear systems are presented in Section 3. Finally, a small section is devoted to methods for computing eigenvalues.

4.3 Iterative methods 141

Since -+ is continuous, there exists !0 > 0 such that, if ! !]0, !0], -+(!) < 0.However, if ! !]0, !0], then

-(!)# -(0) =7 #

0-+(t)dt < 0 ,

and thus -(!) < -(0) for any ! !]0, !0], that shows that w is a strict descentdirection at x. %&

And we deduce easily the iterative algorithm of Definition (4.3.5) for comput-ing x(k+1) with wk = #4f(x(k)) = b#Ax(k).

The main drawback of this iterative gradient method is that it involves aparameter !, chosen constant at each step. We have seen previously (Lemma4.3.7) that the optimal parameter !opt requires the knowledge of the smallestand the largest eigenvalues of A. Hence, this algorithm is of limited practicaluselfulness. However, it can be improved by chosing at each step a di!erentcoe"cient !k that minimizes f(x(k) # !4f(x(k))).

Definition 4.3.9. The gradient method with variable step size for solving thelinear system Ax = b is defined by the algorithm

x(k+1) = x(k) + !k(b#Ax(k)) , for any x(0) ,

where !k is set as the minimum of the function f(x(k) # !4f(x(k))), i.e.,

f(x(k) # !k4f(x(k))) ' f(x(k) # !4f(x(k))) , for all ! * 0 .

It remains to find a computational expression for !k.

Lemma 4.3.9. Let A be a symmetric and positive definite matrix. For thegradient algorithm with variable step, there exists a unique optimal step size!k defined by

!k =(w(k), w(k))

(w(k), Aw(k)), where w(k) = b#Ax(k) .

Proof. To find the parameter !k, let us write the quadratic functional (4.10)for x(k+1):

f(x(k+1)) =12(A(x(k) # !kw(k)), x(k) # !kw(k))# (b, x(k) # !kw(k))

= f(x(k))# !k(w(k), w(k)) +12!2

k(Aw(k), w(k)) .

Di!erentiating this function with respect to !k and setting it to zero (to findthe minimum), gives the expected value of !k. %&

Remark 4.3.6. 1. For a more general function f , there is usually no explicitformula for the parameter !k. It can be obtained numerically by usingNewton’s method to find the roots of the gradient function.

2. The convergence of the gradient method can be quite slow (linear in gen-eral), especially if the condition number cond2(A) is large.

Page: 141 job: book macro: svmono.cls date/time: 11-Jun-2009/15:34

Page 42: 4 Solving linear systems - sorbonne-universite systems.pdf · sparse linear systems are presented in Section 3. Finally, a small section is devoted to methods for computing eigenvalues.

142 4 Solving linear systems

4.3.4 Projection methods

The research on unconstrained optimization problems had a positive impacton the improvement of gradient methods and e!ectively led to the conju-gate gradient method and Krylov subspace methods2. We introduce first thisimportant notion.

Definition 4.3.10. Let r ! Rn and A ! Mn(R). We call order-k Krylovspace generated by the vector r and associated with the matrix A, the vectorsubspace of Rn, denoted by Kk(A, r) (or simply by Kk if there is no ambiguity),spanned by the k + 1 vectors {r, Ar, . . . , Akr}, i.e.,

Kk(A, r) = span(r, Ar, . . . , Akr) .

By definition, dim(Kk(A, r)) = k + 1.

Lemma 4.3.10. The sequence of Krylov spaces (Kk)k&0 is increasing, i.e.,

Kk 5 Kk+1 , for all k * 0 .

Consider for instance the iterate x(k+1) of the gradient method

x(k+1) = x(k) + !kw(k) = x(0) +k!

j=0

!jw(j) , with w(k) = (b#Ax(k)) .

The vector w(k) is called the residual and it can be seen that

1. w(k) belongs to the Krylov space Kk corresponding to the initial residualw(0);

2. x(k+1) belongs to the following a"ne space Wk = [x(0) + Kk] defined asthe set of vectors v such that v # x(0) belongs to Kk, i.e.,

Wk =8

v = x(0) + y , y ! Kk(A, w(0))9

.

The term+k

j=0 !jw(j) is a polynomial in A of degree less than k and theapproximate solution x of Ax = b is searched for in the space Wk. It seemswise to devise a method that searches for the approximate solution of the form

x(k+1) = x(0) + pk(A)w(0) , (4.11)

where pK(·) is a suitably defined polynomial, for instance such that x(k+1)

represents the best approximation of x in Wk, in a sense that remains to bespecified. A method that looks for the solution x(k+1) ! Wk of the form (4.11)is then called a Krylov method.2 Named after the Russian mathematician Alexei Krylov (1863-1945).

Page: 142 job: book macro: svmono.cls date/time: 11-Jun-2009/15:34

Page 43: 4 Solving linear systems - sorbonne-universite systems.pdf · sparse linear systems are presented in Section 3. Finally, a small section is devoted to methods for computing eigenvalues.

4.3 Iterative methods 143

Proposition 4.3.7. Let (x(k))k&0 be a sequence in Rn and let Kk be theKrylov subspace generated by the residual vector w(0) = b # Ax(0) and as-sociated with A. If x(k+1) ! Wk = [x(0) +Kk], then w(k+1) ! Kk+1.

Proof. If x(k+1) ! Wk there exists (!j)0#j#k such that

x(k+1) = x(0) +k!

j=0

!jAjw(0) ,

and by multiplying by A and subtracting to b, we obtain

w(k+1) = w(0) #k!

j=0

!jAj+1w(0) .

Consequently, w(k+1) ! Kk+1 which is the expected result. %&

Definition 4.3.11. Let A !Mn(R) be a symmetric positive definite matrix.

1. two vectors x and y in Rn\{0} are A-orthogonal, or A-conjugate, if(Ax, y) = (y, Ax) = 0;

2. A collection (w(0), . . . , w(p)) in Rn\{0} is A-conjugate if (w(i), Aw(j)) = 0for any pair (i, j) ! {1, . . . , p}2, i (= j.

Remark 4.3.7. Since A is symmetric positive definite, (·, A·) defines a innerproduct

(x, y)A = (Ax, y) = (x, Ay) ,

and the corresponding norm ,x,A = (Ax, x)1/2 is called the energy norm.

Proposition 4.3.8. Let A !Mn(R) be a symmetric positive definite matrix,(w(0), . . . , w(p)) a collection of vectors in Rn. Then,

1. if the set (w(0), . . . , w(p)) is A-conjugate then the vectors are linearly in-dependent;

2. if the case p = n, if the set (w(0), . . . , w(n)) is A-conjugate then it formsa basis of Rn.

The Conjugate Gradient Method

In this section, we assume the matrices to be symmetric and positive definite.We pointed out that Krylov methods attempt to find the solution x(k+1) inthe space Wk. Obviously, there are infinitely many possible choices for x(k+1)

in the a"ne space Wk. To remove the ambiguity, we can chose x(k+1) ! Wk

such that the residual w(k+1) is orthogonal to Kk. This projection allows todefine an iterative method.

Page: 143 job: book macro: svmono.cls date/time: 11-Jun-2009/15:34

Page 44: 4 Solving linear systems - sorbonne-universite systems.pdf · sparse linear systems are presented in Section 3. Finally, a small section is devoted to methods for computing eigenvalues.

144 4 Solving linear systems

The Hestenes-Stiefel conjugate gradient method3 starts from an initialguess x(0) of the solution x and the corresponding residual w(0) = b# Ax(0).And the A-conjugate directions of the method can be characterized as follows.At iteration k + 1, it considers the Krylov subspace Kk of dimension k + 1

Kk = span(w(0), . . . , Akw(0)) ,

and realizes the projection of w(k) on Kk.

Definition 4.3.12 (Conjugate gradient method). Let A ! Mn(R) be asymmetric positive definite matrix, b ! Rn. We define the conjugate gradientalgorithm as follows

1. Initialization:x(0) ! Rn, givenw(0) = b#Ax(0);d(0) = w(0);

2. Iteration:for k * 0, until ,w(k), < $,construct the sequence

!k =(w(k), d(k))(d(k), Ad(k))

x(k+1) = x(k) + !kd(k)

w(k+1) = w(k) # !kAd(k)

:----;

----<

compute new solution

"k =(Ad(k), w(k+1))(Ad(k), d(k))

d(k+1) = w(k+1) # "kd(k)

:-;

-<compute new direction

It can be observed that the two parameters !k and "k can be also computedas

!k =,w(k),22

(d(k), Ad(k)), and "k =

,w(k+1),22,w(k),22

.

The conjugate gradient algorithm enjoys several orthogonality properties.

Proposition 4.3.9. Let A !Mn(R) be a symmetric positive definite matrix,b ! Rn. Let (x(k))k&0 be the sequence of approximations of the conjugategradient method. Then

1. The Krylov space Kk = span(w(0), . . . , Akw(k)) is such that

Kk = span(w(0), . . . , w(k)) = span(d(0), . . . , d(k)) ,

3 Named after the American mathematician Magnus Hestenes (1905-1991) and theSwiss mathematician Eduard Steifel (1909-1978), was first published in 1952.

Page: 144 job: book macro: svmono.cls date/time: 11-Jun-2009/15:34

Page 45: 4 Solving linear systems - sorbonne-universite systems.pdf · sparse linear systems are presented in Section 3. Finally, a small section is devoted to methods for computing eigenvalues.

4.3 Iterative methods 145

2. the sequence (w(k))0#k#n$1 is orthogonal, i.e.,

(w(k), w(l)) = 0 for all 0 ' l < k ' n# 1 ,

furthermore, the residual w(k) is orthogonal to all previous search direc-tions (d(l))0#l<k, i.e.,

(w(k), d(l)) = 0 , for all 0 ' l < k ' n# 1 ,

3. the sequence (d(k))0#k#n$1 is A-conjugate, i.e.,

(Ad(k), d(l)) = 0 for all 0 ' l < k ' n# 1 ,

Proof. The definition of the conjugate gradient implies that w(k+1) ! Kk+1

and w(k+1)6Kk. Thus, Kk 5 Kk+1 and we have dim(Kk) = k + 1. The family(w(k))0#k#n$1 is free and orthogonal and we deduceKk = span(w(0), . . . , w(k)).The family of directions (d(k))0#k#n$1 is also orthogonal for the inner prod-uct induced by A and forms a vector space of dimension k + 1. By induction,we can show that span(d(0), . . . , d(k)) 5 Kk. Suppose the assertion is true fork # 1, we have then

d(k) = w(k) # "kd(k$1) ! Kk 7Kk$1 5 Kk ,

and the results follows.Suppose (Ad(l), d(k)) = 0 for l < k # 1. We have then

(Ad(l), d(k)) = (Ad(l), w(k) # "kd(k$1))

= (Ad(l), w(k)) = !$1l (w(l) # w(l+1), w(k)) = 0 .

We shall notice that !l (= 0. We deduce that the family (d(k))k&0 is orthogonalfo the inner product (A·, ·). Since an orthogonal family of nonzero vectors isfree then Kk = span(d(0), . . . , d(k)). %&

Lemma 4.3.11. Let A !Mn(R) be a symmetric positive definite matrix andb ! Rn. The sequence (x(k))0#k#p, with p ' n, defined by the conjugate gradi-ent method is such that x(n) = x0 with Ax0 = b. Furthermore, the algorithmconverges to the solution in at most n iterations.

Remark 4.3.8. The conjugate gradient method could have been defined bychosing x(k+1) in the a"ne space [x(0) +Kk] that minimizes in this space thefunctional

f(x) =12(Ax, x)# (b, x) .

Indeed, if we define, for any y ! Kk, the function

g(y) = f(x(0) + y) =12(Ay, y)# (w(0), y) + f(x(0)) ,

Page: 145 job: book macro: svmono.cls date/time: 11-Jun-2009/15:34

Page 46: 4 Solving linear systems - sorbonne-universite systems.pdf · sparse linear systems are presented in Section 3. Finally, a small section is devoted to methods for computing eigenvalues.

146 4 Solving linear systems

we observe that minimizing f on [x(0) + Kk] is equivalent to minimizing gon the Krylov space Kk. Thanks to Corollary 4.3.2, g(y) admits a uniqueminimum in Kk, denoted (x(k+1)#x(0), that gives a unique x(k+1). And usingthe same Corollary, we have

(Ax(k+1) # b, y) = 0 , for all y ! Kk ,

which is equivalent to writing that w(k+1) is orthogonal to Kk. Hence, x(k+1)

minimizes the energy functional (4.10) and from this property, it follows thatthe energy norm , ·, A in the conjugate gradient method is monotonicallydecreasing.

Practical issues

In theory, the convergence of the conjugate gradient is achieved when the resid-ual w(k) = 0 and then x(k) is the solution to the system Ax = b. However,this termination property is only valid in exact arithmetic. Indeed, becauseof rounding errors, this criterion may not be satisfied in practice. This ex-plains why a parameter $ 1 1 (in general chosen in the range [10$8, 10$4]) isintroduced and the convergence is considered numerically achieved when

,w(k), ' $ ,w(0), .

Another issue concerns the choice of the initial vector x(0). Since there isin general no information on the solution to the system Ax = b, a typicalchoice is to set x(0) = 0. But if a sequence of closely related problems, x(0)

can be initialized to the solution of the previous resolution.The conjugate gradient algorithm involves only the single matrix-vector

product Ad(k) at each iteration, and for obvious e"ciency reasons, the residualw(k) is computed using the induction formula and not via the theoreticalformula w(k) = b#Ax(k).

At this point, one may wonder whether the conjugate gradient shall beconsidered as a direct or as an iterative method. On the one hand, we knowthat x(n$1) = x, the exact solution of Ax = b, but on the other hand, x(k+1)

is computed from x(k) and from the previous descent directions d(k). Shouldit be regarded as a direct method, the number of operations in the worst casescenario k = n# 1 would then be in O(n3), more computationally expensivethan Cholesky decomposition for instance. Clearly, the practice tends to con-sider the conjugate gradient method as an iterative method with an optimalconvergence rate. We have the following result

Proposition 4.3.10. Let A !Mn(R) be a symmetric positive definite matrixand x ! Rn be the exact soution of Ax = b. Let (x(k))k be the sequence ofapproximate solutions obtained by the conjugate gradient algorithm. Then wehave the following estimates

Page: 146 job: book macro: svmono.cls date/time: 11-Jun-2009/15:34

Page 47: 4 Solving linear systems - sorbonne-universite systems.pdf · sparse linear systems are presented in Section 3. Finally, a small section is devoted to methods for computing eigenvalues.

4.3 Iterative methods 147

,x(k) # x,2 ' 2 cond2(A)1/2

3cond2(A)1/2 # 1cond2(A)1/2 + 1

4k

,x(0) # x,2

,x(k) # x,A ' 23

cond2(A)1/2 # 1cond2(A)1/2 + 1

4k

,x(0) # x,A .

Proof. (see [GvL83], [Saa96]). %&

It is interesting to notice that the approximation of x is strongly dependenton the number of iterations that are performed, i.e., more iterations leadto a better solution. This confirms that the conjugate gradient is indeed aniterative method. Furthermore, the convergence rate is quadratic, i.e., relatedto the square root of the condition number of A, and this is a much betterresult than with the gradient method. This also indicates that if cond2(A) isclose to 1 (optimal), then the convergence will be accelerated. Preconditioningis often a way of improving the e"ciency of the conjugate gradient method (seenext section). Finally, the convergence rate is often pessimistic asymptoticallyand the e!ective convergence is better. The e!ect of extremal eigenvaluestend to be eliminated as the number of iterations increases, thus leading to asuperlinear convergence.

Preconditioned conjugate gradients

Suppose the matrix A ! Mn(R) is ill-conditioned and consider for instance$ = cond2(A)$1 1 1. At each iteration of the conjugate gradient, the error,x(k) # x,A is then reduced by a factor 1 # 2

0$ and the convergence rate

is thus very slow. The idea of preconditioning is to replace the matrix Ain the original system by another matrix, leading to the same solution, ifits condition number is better than the condition number of A. Hence, theconjugate gradient method is expected to converge faster toward the solution.

Definition 4.3.13. Let Ax = b be the linear system to solve. Suppose P is asymmetric positive definite matrix, easy to invert and such that cond2(P$1A)is smaller than cond2(A). The equivalent system P$1Ax = P$1b is called apreconditioned system.

Since A is symmetric and positive definite, the preconditioned matrix mustalso have this property. To this end, we take P symmetric positive definiteand we consider a Choleski factorization P = LLt. Posing y = Ltx leads tosolve the systems

L$1ALty = L$1b and (Lt)$1y = x ,

and this last system is easy to solve since L is upper triangular.Consider A ! Mn(R) defined by A = L$1A(Lt)$1. The matrix A is

symmetric since

Page: 147 job: book macro: svmono.cls date/time: 11-Jun-2009/15:34

Page 48: 4 Solving linear systems - sorbonne-universite systems.pdf · sparse linear systems are presented in Section 3. Finally, a small section is devoted to methods for computing eigenvalues.

148 4 Solving linear systems

At = ((Lt)$1)tAt(L$1)t = L$1A(Lt)$1 = A ,

and is positive definite as

(Ax, x) = (L$1A(Lt)$1x, x) = (A(Lt)$1x, (Lt)$1x), ,

and thus (Ax, x) > 0 if x (= 0. Thus, the conjugate gradient algorithm canbe employed to solve the system Ay = L$1b, i.e., to compute a sequence ofapproximate solutions (y(k))k&0 such that

(Ay(k) # L$1b, y) = 0 , for all y ! Kk ,

and then to deduce (x(k))k&0 solution of the system. However, it is possibleto compute directly the sequence (x(k))k&0.

Definition 4.3.14 (Preconditioned conjugate gradient). Let A !Mn(R)be a symmetric positive definite matrix, b ! Rn. The preconditioned conjugategradient algorithm is defined as

1. Initialization:x(0) ! Rn, givenw(0) = b#Ax(0);s(0) = (LLt)$1w(0);d(0) = s(0);

2. Iteration:for k * 0, until ,w(k), < $,construct the sequence

!k =(w(k), d(k))(d(k), Ad(k))

x(k+1) = x(k) + !kd(k)

w(k+1) = w(k) # !kAd(k)

s(k+1) = (LLt)$1w(k+1)

:------;

------<

compute new solution

"k =(Ad(k), s(k+1))(Ad(k), d(k))

d(k+1) = s(k+1) # "kd(k)

:-;

-<compute new direction

We observe that the only new operation is the computation, at each itera-tion, of the term s(k) = (LLt)$1w(k) that requires solving the linear system(LLt)s(k) = w(k). This procedure can be achieved in O(n2) operations in theworst case scenario. The error estimate is the same as for the unprecondtionedconjugate gradient, simply replacing the matrix A by P$1A.

Page: 148 job: book macro: svmono.cls date/time: 11-Jun-2009/15:34

Page 49: 4 Solving linear systems - sorbonne-universite systems.pdf · sparse linear systems are presented in Section 3. Finally, a small section is devoted to methods for computing eigenvalues.

4.3 Iterative methods 149

4.3.5 Nonsymmetric matrices: GMRes

We briefly mention another method for solving a linear system Ax = b, inthe case where the matrix A is nonsymmetric. The GMRes4 (GeneralizedMinimal Residual) method approximates the solution in a Krylov subspacewith minimal residual. For the proofs and further analysis, the reader is referedto [Axe94], [Saa96], [QSS00].

Definition 4.3.15. The degree of r ! Rn with respect to A ! Mn(R) isdefined as the minimum degree of a non null polynomial p in A, for which(p(A), r) = 0.

The dimension of the Krylov space Kk(A, r) is equal to the minimum betweenk and the degree of r with respect to A. Hence, the dimension of the Krylovsubspaces is an increasing function of k.

Here, we show that at the iteration k, it is possible to construct a Krylovsubspace of dimension k that minimizes the residual w(k). An orthonormalbasis of Kk(A, w(k)) can be computed using Arnoldi’s algorithm, for a fixed k.

Definition 4.3.16 (Arnoldi’s algorithm). Posing v(1) = w(0)/,w(0),2,Arnoldi’s algorithm generates the orthonormal basis for Kk(A, v(1)) using theGram-Schmidt procedure. We have

1. Initialization:

v(1) =w(0)

,w(0),22. Iteration:

for j = 1, . . . , k compute

hij = (Av(j), v(i)) , i = 1, 2, . . . , j

s(j) = Av(j) #j!

i=1

hijv(i) ,

hj+1j = ,s(j), , if hj+1j = 0 then exit.

v(j+1) =s(j)

hj+1j.

If s(j) = 0, the process terminates by a breakdown.

Lemma 4.3.12. Suppose hj+1j (= 0, 1 ' j ' k. Then the family (v(1), . . . , v(k))is orthonormal and forms a basis of Kk(A, v(1)).4 Y. Saad and M.H. Schultz (1986), GMRES: A generalized minimal residual algo-

rithm for solving nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 7:856-869.

Page: 149 job: book macro: svmono.cls date/time: 11-Jun-2009/15:34

Page 50: 4 Solving linear systems - sorbonne-universite systems.pdf · sparse linear systems are presented in Section 3. Finally, a small section is devoted to methods for computing eigenvalues.

150 4 Solving linear systems

Denoting Vk = [v(1)| . . . |v(k)] ! Mn,k(R) the rectangular matrix whosecolumns are the vectors v(j), we have

V tk AVk = Hk , V t

k+1AVk = Hk ,

where Hk !Mk+1,k(R) is the upper Hessenberg matrix whose entries hij aregiven by the Arnoldi’s algorithm and Hk !Mk(R) is the restriction of Hk tothe first k rows and k columns.

This algorithm will be used to solve the linear system Ax = b by a Krylovmethod. Classically, we search for x(k) ! Wk as the vector that minimizes thenorm of the residual ,w(k),2, i.e., such that

,Ax(k) # b,2 = miny%Wk

,Ay # b,2 .

To this end, we writex(k) = x(0) + Vkz(k), ,

where z(k) are coe"cients to be determined. Hence, we have

w(k) = w(0) #AVkz(k) = w(0) # Vk+1Hkz(k) = Vk+1(,w(0),2e1 # Hkz(k)) ,

where e1 is the first unit vector of Rk+1.

Lemma 4.3.13. Since the matrix Vk+1 is orthogonal, the minimum of w(k)

is characterized by

,,w(0),2e1 # Hkz(k),2 ' ,,w(0),2e1 # Hky(k),2 , for all y(k) ! Rk .

Hence, at each step k, z(k) is chosen in such a way to minimize the functional,,w(0),2e1# Hkz(k),2. This requires solving a linear least-squares problem ofsize k, at each iteration.

The GMRes method terminates in at most n iterations with the exactsolution.

Proposition 4.3.11. A breakdown occurs in the GMRes at iteration k < n ifand only if x(k) = x, the solution of Ax = b.

Practical issues

The GMRes algorithm requires storing a basis of the Krylov subspace inmemory. Hene, the memory requirement increases linearly with the numberof iterations k while the computational e!ort to orthogonalize Av(k) is propor-tional to n2. Therefore, a maximal dimension of the Krylov space is fixed inpractice, usually between 10 and 50. After k iterations, the GMRes algorithmis restarted with x(0) = x(k) and a new Krylov subspace is constructed.

It has been observed in numerical experiments that the GMRes algorithmoften exhibits a asuperlinear convergence. This indicates that the rate of con-vergence improves at each iteration.

Page: 150 job: book macro: svmono.cls date/time: 11-Jun-2009/15:34

Page 51: 4 Solving linear systems - sorbonne-universite systems.pdf · sparse linear systems are presented in Section 3. Finally, a small section is devoted to methods for computing eigenvalues.

4.4 Methods for computing eigenvalues 151

4.4 Methods for computing eigenvalues

In this section, we deal with approximations of the eigenvalues and eigenvec-tors of a matrix A !Mn(C). This topic is of importance in many applicationfields, like structural dynamics for instance. Eigenvalues provide useful infor-mation about the evolution of a system governed by a matrix. They are alsoimportant in the analysis of the stability of many numerical methods. Twoclasses of method can be identified, local methods that compute only the ex-trema eigenvalues of A and global methods that provide the whole spectrumof A.

We recall that the eigenvalues and eigenvectors of A are solutions of thelinear homogeneous system

(A# &In)x = 0 , x (= 0 . (4.12)

Hence, if & is an eigenvalue of A then the matrix A#&In is a singular matrix.The eigenvalues are the roots of the characteristic equation

PA(&) = det(A# &In) = 0 ,

where PA(&) denotes the characteristic polynomial of A. Thus, the matrixA has exactly n eigenvalues &i, counting multiple roots according to theirmultiplicities and then we have

PA(&) =n"

i=1

(&i # &) .

The next result relates the spectral radius '(A) to matrix norms and indicatesthat all the eigenvalues of A are enclosed in a circle a radius ,A, centered atthe origin in the complex plane.

Lemma 4.4.1. Let A !Mn(C) and let , ·, denote a matrix norm. Then

'(A) ' ,A, , or |&| ' ,A, , for all & ! Sp(A) .

Proof. Suppose & is an eigenvalue of A and x (= 0 an associated eigenvector.We have then

|&|,x, = ,&x, = ,Ax, ' ,A,,x, ,

and thus |&| ', A,. %&

The following theorem is used to locate eigenvalues of A.

Theorem 4.4.1 (Gershgorin’s circle5). Let A ! Mn(C). Then, all theeigenvalues of A lie in the union of the Gershgorin disks in the complex plane,i.e.,5 Gerschgorin S., Uber die Abgrenzung der Eigenwerte einer Matrix, Izv. Akad.

Nauk. USSR Otd. Fiz.-Mat. Nauk, 7: 749-754, (1931).

Page: 151 job: book macro: svmono.cls date/time: 11-Jun-2009/15:34

Page 52: 4 Solving linear systems - sorbonne-universite systems.pdf · sparse linear systems are presented in Section 3. Finally, a small section is devoted to methods for computing eigenvalues.

152 4 Solving linear systems

Sp(A) 8 SC =n#

i=1

Di , Di = {z ! C ; |z # aii| < ri} , ri =n!

j (=i

|aij | .

Proof. Let & be an eigenvalue of A and x (= 0 a corresponding eigenvector.Then, for all i = 1, . . . , n

(&# aii)xi =n!

j=1,i (=j

aijxj .

Introducing the , · ,' norm, we consider the index i such that |xi| = ,x,'.Then, we have

|&# aii| 'n!

j=1

|aij ||xj ||xi|

' ri .

This inequality implies the existence of a disk centered at aii of radius ri thatcontains each and any eigenvalue &. Hence, all eigenvalues lie in the union ofthese disks. %&

Definition 4.4.1. A matrix A !Mn(C) is said to be reducible if there existsa permutation matrix P such that

PAP t =3

A11 A12

0 A22

4,

where Aii are square matrices. Otherwise, A is called irreducible.

Proposition 4.4.1. Let A ! Mn(C) be a irreducible matrix. Then eacheigenvalue & ! Sp(A) lies in the union of the Gershgorin disks ( i.e., can-not lie on the boundary of the union) unless its lies on the boundary of allGershgorin disks.

4.4.1 The power method

The power method if one of the oldest methods for approximating the eigen-values and eigenvectors of a matrix and, more specifically, its eigenvalues &1

of largest module. Its interesting feature is that it does not decompose thematrix A and thus can be used with large sparse matrices.

Let A ! Mn(C) be a diagonalizable matrix and let * ! Mn(C) be thematrix of its eigenvectors (xi)1#i#n associated with the eigenvalues &1, . . . ,&n,supposed ordered as

|&1| > |&2| * · · · * |&n| ,

Here, &1 is called the dominant eigenvalue of A.

Definition 4.4.2. Let A !Mn(C) be a diagonalizable matrix. Given a start-ing vector unit norm q(0) ! Cn, the power method constructs the sequence ofvectors Aq(k) using the recursive algorithm

Page: 152 job: book macro: svmono.cls date/time: 11-Jun-2009/15:34

Page 53: 4 Solving linear systems - sorbonne-universite systems.pdf · sparse linear systems are presented in Section 3. Finally, a small section is devoted to methods for computing eigenvalues.

4.4 Methods for computing eigenvalues 153

z(k) = Aq(k$1)

q(k) =z(k)

,z(k), 2

.

By induction, we have then

q(k) =Akq(0)

,Akq(0),2, k * 1 ,

the initial vector q(0) can be expanded along the basis of the eigenvectors xi

of A

q(0) =n!

j=1

!jxj , !j ! C , j = 1, . . . , n ,

and we have then for all k = 1, . . . , n

q(k) =n!

j=1

&kj !jxj = &k

1

&

'!1x1 +n!

j=2

3&j

&1

4k

!j&j

(

) ,

and likewise, since Axi = &ixi, we can write

Akq(0) = !1&k1

&

'x1 +n!

j=2

!j

!1

3&j

&1

4k

xj

(

) , k = 1, . . . , n .

Since |&j/&1| < 1, for j = 2, . . . , n, we observe that

1&k

1

q(k) = !1x1 + O

$====&2

&1

====k%

, (4.13)

and the vector q(k) has an increasingly significant component in the directionof x1 and thus will converge toward a limit vector which is he eigenvectorassociated with the dominating eigenvalue &1.

Lemma 4.4.2. Let A ! Mn(C) be a diagonalizable matrix with eigenvaluessuch that |&1| > |&2| * · · · * |&n|. If !1 (= 0, there exists a constant C > 0such that

,q(k) # x1,2 ' C

====&2

&1

====k

, k * 1 ,

where

q(k) =q(k),Akq(0),2

!1&k1

= x1 +n!

j=2

!j

!1

3&i

&1

4k

xi , k * 1 .

Convergence of the power method can be shown for any initial vector q(0)

since it depends only on the assumption !1 (= 0. However, the convergencerate (and thus the e"ciency of the method) depends on the separation of thedominant eigenvalues, |&2|/|&1|1 1.

Page: 153 job: book macro: svmono.cls date/time: 11-Jun-2009/15:34

Page 54: 4 Solving linear systems - sorbonne-universite systems.pdf · sparse linear systems are presented in Section 3. Finally, a small section is devoted to methods for computing eigenvalues.

154 4 Solving linear systems

Remark 4.4.1. Because it provides only the dominant eigenvalue, the powermethod is of limited practical uselfulness. Nevertheless, specific applicationsrequire such algorithm. For example, it is used in Google’s PageRank algo-rithm, which is an eigenvector of a matrix of order 2.7 billion6.

4.4.2 Inverse iteration method

At the di!erence of the power method that converges always to the eigenvalueof largest module, the inverse iteration method7 allows to chose the eigenvalueto converge to. More precisely, we look here for the eigenvalue of A !Mn(C)which is closest to a given number µ ! C, µ /! Sp(A).

Firstly, we observe that the power method allows to compute the smallesteigenvalue in modulus of a nonsingular matrix A !Mn(C), by applying thealgorithm to A$1. Indeed, from (4.13) we deduce that if the eigenvalues of Aare such that

|&1| *| &2| * · · · > |&n| ,and the power method is applied to A$1, the vector q(k) will converge towardthe eigenvector xn corresponding to &n.

Let µ /! Sp(A) be a crude approximation of an eigenvalue & of A. The poweriteration can be applied to the matrix B$1 = (A# µIn)$1. The eigenvectorsof (A # µIn) are those of A and the spectrum of this matrix if shifted by µ(called a shift for this reason). In other words, the eigenvalues .i of the matrixB$1 are related to the eigenvalues of A by

.i =1

&i # µ, &i = µ +

1.i

.

Suppose µ is closer to &m than any other eigenvalue of A, i.e.,

|&m # µ| < |&i # µ| , for all i = 1, . . . , n , i (= m ,

then &m # µ is the smallest eigenvalue of (A # µIn) and likewise, .m is theeigenvalue of B$1 with largest modulus. In the peculiar case µ = 0, .m is alsothe smallest eigenvalue of A.

Definition 4.4.3. Let A ! Mn(C). Given a starting vector q(0) ! Cn, theinverse iteration algorithm constructs the sequence

(A# µIn)z(k) = q(k$1)

q(k) =z(k)

,z(k),2.

6 Ipsen I., Wills R.M., Analysis and Computation of Google’s PageRank, in 7thIMACS International Symposium on Iterative Methods in Scientific Computing,Fields Institute, Toronto, Canada, 5-8 May (2005).

7 Wielandt H., Das Iterationsverfahren bei nicht selbstadjungierten linearen Eigen-wertaufgaben, Math. Z., 50: 93-143, (1944).

Page: 154 job: book macro: svmono.cls date/time: 11-Jun-2009/15:34

Page 55: 4 Solving linear systems - sorbonne-universite systems.pdf · sparse linear systems are presented in Section 3. Finally, a small section is devoted to methods for computing eigenvalues.

4.4 Methods for computing eigenvalues 155

Notice that the matrix (A# µIn) is not inverted explicitly. At each iterationstep k, a linear system must be solved. In the symmetric case, a factorizationis performed at k = 1 on the matrix A # µIn = LDLt where D is a blockdiagonal matrix and L is block lower triangular. In the unsymmetric case, theLU factorization is performed at step k = 1, so that at each step two linearsystems are solved.

If the shift µ is sufficiently close to an eigenvalue λ_m,

\[
|\lambda_m - \mu| \ll |\lambda_i - \mu| , \quad \text{for all } \lambda_i \ne \lambda_m ,
\]

then (λ_m − µ)^{-1} is a dominant eigenvalue of (A − µI_n)^{-1} and q^{(k)} will converge quickly to the eigenvector x_m.
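The sketch below illustrates the implementation strategy just described: the shifted matrix is factorized once, and only solves with the stored factors are performed inside the loop. For simplicity it uses SciPy's LU routines (the unsymmetric strategy) even when A is symmetric; the function name, tolerance and stopping test are illustrative assumptions, not prescribed by the text.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def inverse_iteration(A, mu, q0, maxit=100, tol=1e-12):
    """Shifted inverse iteration: approximate the eigenvalue of A closest
    to the shift mu.  The shifted matrix is factorized only once."""
    n = A.shape[0]
    lu, piv = lu_factor(A - mu * np.eye(n))   # LU factorization, done once (k = 1)
    q = q0 / np.linalg.norm(q0)
    lam = mu
    for _ in range(maxit):
        z = lu_solve((lu, piv), q)            # solve (A - mu I) z^(k) = q^(k-1)
        q = z / np.linalg.norm(z)             # q^(k) = z^(k) / ||z^(k)||_2
        lam = q @ A @ q                       # Rayleigh quotient: eigenvalue estimate
        if np.linalg.norm(A @ q - lam * q) <= tol * np.linalg.norm(A, 1):
            break
    return lam, q

# The eigenvalue of A closest to mu = 2.5 is (7 - sqrt(5))/2, about 2.382.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
print(inverse_iteration(A, mu=2.5, q0=np.ones(2))[0])
```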

4.4.3 QR iteration

As its name suggests, the QR iteration method⁸ relies on the QR decomposition (see Section 4.2.5), which writes a matrix as the product of an orthogonal matrix Q and an upper triangular matrix R.

We consider here the case of nonsingular real matrices and assume that they have distinct real eigenvalues, |λ_1| > ··· > |λ_n| > 0.

Definition 4.4.4. Let A ∈ M_n(R) such that |λ_1| > ··· > |λ_n| > 0. The QR method constructs the sequence of matrices (A_k)_{k≥1} with A_1 = A and

\[
A_{k+1} = R_k Q_k ,
\]

where Q_k R_k = A_k is the QR decomposition of A_k (see Section 4.2.5).

Since A_{k+1} = R_k Q_k = Q_k^t (Q_k R_k) Q_k = Q_k^t A_k Q_k = Q_k^{-1} A_k Q_k, we show by induction that

\[
A_{k+1} = Q_k^t A_k Q_k = Q_k^t Q_{k-1}^t A_{k-1} Q_{k-1} Q_k = \cdots = (Q^{(k)})^t A\, Q^{(k)} ,
\]

with Q^{(k)} = Q_1 \cdots Q_k, and thus A Q^{(k)} = Q^{(k+1)} R_{k+1}. In other words, every matrix A_k is orthogonally similar to the matrix A.

Lemma 4.4.3. Let A ∈ M_n(R) such that |λ_1| > ··· > |λ_n|. Then the sequence of matrices (A_k)_{k≥1} generated by the QR iteration algorithm converges to an upper triangular matrix whose diagonal entries are the eigenvalues of A. If the matrix A is symmetric, the sequence (A_k)_{k≥1} tends to a diagonal matrix.

Proof. See [DB74], [All08], for instance. □

8 Francis J.G.F., The QR Transformation, I, The Computer Journal, 4(3): 265-271, (1961) and Kublanovskaya V.N., On some algorithms for the solution of the complete eigenvalue problem, USSR Computational Mathematics and Mathematical Physics, 3: 637-657, (1961).
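A direct transcription of Definition 4.4.4 in Python/NumPy is sketched below. Practical QR eigensolvers first reduce A to Hessenberg form and use shifts to accelerate convergence; this minimal version omits both, and its function name, iteration count and tolerance are illustrative choices only.

```python
import numpy as np

def qr_iteration(A, maxit=200, tol=1e-10):
    """Unshifted QR iteration: A_1 = A, A_{k+1} = R_k Q_k where A_k = Q_k R_k.
    The diagonal of the last iterate approximates the eigenvalues of A."""
    Ak = A.copy()
    for _ in range(maxit):
        Q, R = np.linalg.qr(Ak)                    # QR decomposition of A_k
        Ak = R @ Q                                 # A_{k+1}, orthogonally similar to A
        if np.linalg.norm(np.tril(Ak, -1)) < tol:  # strictly lower part small enough?
            break
    return Ak

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
print(np.sort(np.diag(qr_iteration(A))))   # compare with np.linalg.eigvalsh(A)
```

Since the test matrix is symmetric with distinct eigenvalues, the iterates tend to a diagonal matrix, as stated in Lemma 4.4.3.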


4.4.4 The Lanczos method

The Lanczos method⁹ computes the eigenvalues of a real symmetric matrix A whose real eigenvalues are ordered as λ_1 ≥ λ_2 ≥ ··· ≥ λ_n. Actually, the Lanczos algorithm can be viewed as a simplified version of the Arnoldi algorithm (see Definition 4.3.16).

Definition 4.4.5 (Lanczos algorithm). Given a nonzero vector w^{(0)} ∈ R^n and K_k(A, w^{(0)}) the Krylov space spanned by (w^{(0)}, . . . , A^k w^{(0)}), the Lanczos algorithm generates a sequence of vectors (v^{(j)})_{j≥1} by induction:

1. Initialization:
\[
v^{(0)} = 0 , \qquad v^{(1)} = \frac{w^{(0)}}{\|w^{(0)}\|} , \qquad \beta_1 = 0 ,
\]
2. Iteration: for j = 1, . . . , k, compute
\[
s^{(j)} = A v^{(j)} - \beta_j v^{(j-1)} ,
\]
\[
\alpha_j = (s^{(j)}, v^{(j)}) ,
\]
\[
s^{(j)} = s^{(j)} - \alpha_j v^{(j)} ,
\]
\[
\beta_{j+1} = \|s^{(j)}\| , \quad \text{if } \beta_{j+1} = 0 \text{ then exit,}
\]
\[
v^{(j+1)} = \frac{s^{(j)}}{\beta_{j+1}} .
\]

By comparison with Arnoldi's algorithm, we observe that α_j = h_{jj} and β_j = h_{j-1,j}. The Lanczos algorithm generates a symmetric tridiagonal matrix

\[
T_k = \begin{pmatrix}
\alpha_1 & \beta_2 & & & \\
\beta_2 & \alpha_2 & \beta_3 & & \\
 & \ddots & \ddots & \ddots & \\
 & & \beta_{k-1} & \alpha_{k-1} & \beta_k \\
 & & & \beta_k & \alpha_k
\end{pmatrix}
\]

and a matrix V_k = (v^{(1)} | \ldots | v^{(k)}) with orthogonal columns spanning the Krylov space K_k(A, v^{(1)}) such that

\[
A V_k = V_k T_k + \beta_{k+1} v^{(k+1)} e_k^t ,
\]

where e_k is the kth vector of the canonical basis of R^k. The interesting result is that the eigenvalues of T_k (the Ritz values) provide increasingly accurate approximations of the extreme eigenvalues of A as k grows, and are exact eigenvalues of A whenever β_{k+1} = 0.

9 Lanczos C., An Iteration Method for the Solution of the Eigenvalue Problem of Linear Differential and Integral Operators, J. Res. Nat. Bur. Stand., 45: 255-282, (1950).
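A minimal Python/NumPy sketch of the Lanczos recurrence is given below. It assumes exact-arithmetic orthogonality of the vectors v^{(j)}; practical implementations reorthogonalize them, which this sketch omits, and the function name and test matrix are illustrative only.

```python
import numpy as np

def lanczos(A, w0, k):
    """Lanczos recurrence for a symmetric matrix A: builds the tridiagonal T_k
    whose eigenvalues (Ritz values) approximate extreme eigenvalues of A."""
    n = A.shape[0]
    alpha = np.zeros(k)
    beta = np.zeros(k + 1)              # beta[0] plays the role of beta_1 = 0
    V = np.zeros((n, k))
    v_prev = np.zeros(n)
    v = w0 / np.linalg.norm(w0)         # v^(1)
    for j in range(k):
        V[:, j] = v
        s = A @ v - beta[j] * v_prev    # s^(j) = A v^(j) - beta_j v^(j-1)
        alpha[j] = s @ v                # alpha_j = (s^(j), v^(j))
        s = s - alpha[j] * v
        beta[j + 1] = np.linalg.norm(s) # beta_{j+1}
        if beta[j + 1] == 0.0:          # invariant Krylov subspace reached: exit
            break
        v_prev, v = v, s / beta[j + 1]
    T = np.diag(alpha) + np.diag(beta[1:k], 1) + np.diag(beta[1:k], -1)
    return T, V

rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = M + M.T                                       # a random symmetric test matrix
T, V = lanczos(A, rng.standard_normal(50), k=20)
print(np.linalg.eigvalsh(T)[-1], np.linalg.eigvalsh(A)[-1])   # largest Ritz value vs lambda_1
```

The extreme eigenvalues of T_k typically match those of A to many digits already for k much smaller than n, which is the main practical appeal of the method.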


4.5 Exercises and Problems

Exercise 4.5.1 (Symmetric positive definite matrices).

1. Let A ∈ M_n(R) be a symmetric matrix. Show that A is positive definite if and only if all its eigenvalues are strictly positive.

2. Let A ∈ M_n(R) be a symmetric positive definite matrix. Show that we can define a symmetric positive definite matrix B ∈ M_n(R) such that B² = A.

Exercise 4.5.2 (Spectral radius).

1. What is the spectral radius of the matrix

\[
A = \begin{pmatrix} a & 4 \\ 0 & a \end{pmatrix} ?
\]

Check that if a ∈ ]0, 1[ then ρ(A) < 1, but ‖A^p‖_2^{1/p} can be greater than 1.

2. Consider the matrix A ∈ M_n(R) defined by

\[
A = \begin{pmatrix}
2 & -1 & & & \\
-1 & 2 & -1 & & \\
 & \ddots & \ddots & \ddots & \\
 & & -1 & 2 & -1 \\
 & & & -1 & 2
\end{pmatrix} .
\]

Compute its Gershgorin disks and show that all eigenvalues λ of A are strictly positive.

Problem 4.5.1 (Linear system). Consider the linear system Ax = b where

\[
A = \begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix} , \qquad b = \begin{pmatrix} 3 \\ 5 \end{pmatrix} ,
\]

and the iterative method to solve it

x^{(k+1)} = B

References

[All08] Allaire G., Kaber S.M., Numerical Linear Algebra, Texts in Applied Mathematics, 55, Springer-Verlag, New York, (2008).

[Axe94] Axelsson O., Iterative Solution Methods, Cambridge University Press, New York, (1994).

[Bel70] Bellman R., Introduction to Matrix Analysis, McGraw-Hill, New York, (1970).

[Che05] Chen K., Matrix Preconditioning Techniques and Applications, Cambridge Monographs on Applied and Computational Mathematics, Cambridge University Press, New York, (2005).


[Cia89] Ciarlet P.G., Introduction to Numerical Linear Algebra and Optimisation, Cambridge Texts in Applied Mathematics, Cambridge University Press, Cambridge, (1989).

[DB74] Dahlquist G., Bjorck A., Numerical Methods, Prentice Hall, Series in Automatic Computation, Englewood Cliffs, NJ, (1974).

[Dem97] Demmel J.W., Applied Numerical Linear Algebra, SIAM, Philadelphia, (1997).

[DER86] Duff I.S., Erisman A.M., Reid J.K., Direct Methods for Sparse Matrices, Oxford University Press, London, (1986).

[Fle80] Fletcher R., Practical Methods of Optimization, J. Wiley, New York, (1980).

[GvL83] Golub G.H., van Loan C.F., Matrix Computations, The Johns Hopkins University Press, Baltimore, 3rd edition, (1983).

[Hac94] Hackbusch W., Iterative Solution of Large Sparse Systems of Equations, Springer-Verlag, New York, (1994).

[HK71] Hoffman K., Kunze R., Linear Algebra, 2nd ed., Prentice Hall, Englewood Cliffs, NJ, (1971).

[Hog07] Hogben L., Handbook of Linear Algebra, Discrete Mathematics and its Applications, L. Hogben ed., Chapman & Hall/CRC, Boca Raton, (2007).

[Kel95] Kelley C.T., Iterative Methods for Linear and Nonlinear Equations, SIAM, Philadelphia, (1995).

[Kre05] Kressner D., Numerical Methods for General and Structured Eigenvalue Problems, Lecture Notes in Computational Science and Engineering, 46, Springer, Berlin, (2005).

[Lan89] Lang S., Linear Algebra, Undergraduate Texts in Mathematics, Springer-Verlag, New York, (1989).

[Lay03] Lay D.C., Linear Algebra and its Applications, 3rd ed., Addison Wesley Publishing Co., Reading, MA, (2003).

[Lax97] Lax P., Linear Algebra, John Wiley, New York, (1997).

[Ort87] Ortega J.M., Matrix Theory, a Second Course, Plenum, New York, (1987).

[QSS00] Quarteroni A., Sacco R., Saleri F., Numerical Mathematics, Texts in Applied Mathematics, 37, Springer-Verlag, New York, (2000).

[Saa96] Saad Y., Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, (1996).

[Str80] Strang G., Linear Algebra and its Applications, 2nd ed., Academic Press Inc., New York, (1980).

[TB97] Trefethen L.N., Bau D., Numerical Linear Algebra, SIAM, Philadelphia, (1997).

[Var62] Varga R.S., Matrix Iterative Analysis, Prentice Hall, Englewood Cliffs, NJ, (1962).

[Wat02] Watkins D.S., Fundamentals of Matrix Computations, Pure and Applied Mathematics, John Wiley & Sons, New York, (2002).
