8/11/2019 NumLinAlg
1/72
Notes on Numerical Linear
Algebra
Dr. George W Benthien
December 9, 2006
E-mail: [email protected]
Contents

Preface

1 Mathematical Preliminaries
  1.1 Matrices and Vectors
  1.2 Vector Spaces
    1.2.1 Linear Independence and Bases
    1.2.2 Inner Product and Orthogonality
    1.2.3 Matrices As Linear Transformations
  1.3 Derivatives of Vector Functions
    1.3.1 Newton's Method

2 Solution of Systems of Linear Equations
  2.1 Gaussian Elimination
    2.1.1 The Basic Procedure
    2.1.2 Row Pivoting
    2.1.3 Iterative Refinement
  2.2 Cholesky Factorization
  2.3 Elementary Unitary Matrices and the QR Factorization
    2.3.1 Gram-Schmidt Orthogonalization
    2.3.2 Householder Reflections
    2.3.3 Complex Householder Matrices
    2.3.4 Givens Rotations
    2.3.5 Complex Givens Rotations
    2.3.6 QR Factorization Using Householder Reflectors
    2.3.7 Uniqueness of the Reduced QR Factorization
    2.3.8 Solution of Least Squares Problems
  2.4 The Singular Value Decomposition
    2.4.1 Derivation and Properties of the SVD
    2.4.2 The SVD and Least Squares Problems
    2.4.3 Singular Values and the Norm of a Matrix
    2.4.4 Low Rank Matrix Approximations
    2.4.5 The Condition Number of a Matrix
    2.4.6 Computation of the SVD

3 Eigenvalue Problems
  3.1 Reduction to Tridiagonal Form
  3.2 The Power Method
  3.3 The Rayleigh Quotient
  3.4 Inverse Iteration with Shifts
  3.5 Rayleigh Quotient Iteration
  3.6 The Basic QR Method
    3.6.1 The QR Method with Shifts
  3.7 The Divide-and-Conquer Method

4 Iterative Methods
  4.1 The Lanczos Method
  4.2 The Conjugate Gradient Method
  4.3 Preconditioning

Bibliography
List of Figures

2.1 Householder reflection
2.2 Householder reduction of a matrix to bidiagonal form
3.1 Graph of f(λ) = 1 + .5/(d_1 - λ) + .5/(d_2 - λ) + .5/(d_3 - λ) + .5/(d_4 - λ)
3.2 Graph of f(λ) = 1 + .5/(d_1 - λ) + .01/(d_2 - λ) + .5/(d_3 - λ) + .5/(d_4 - λ)
Preface
The purpose of these notes is to present some of the standard procedures of numerical linear algebra from the perspective of a user rather than a computer specialist. You will not find extensive error analysis or programming details. The aim is to give the user a general idea of what the numerical procedures are doing. You can find more extensive discussions in the references:

Applied Numerical Linear Algebra by J. Demmel, SIAM, 1997
Numerical Linear Algebra by L. Trefethen and D. Bau, SIAM, 1997
Matrix Computations by G. Golub and C. Van Loan, Johns Hopkins University Press, 1996
The notes are divided into four chapters. The first chapter presents some of the notation used in these notes and reviews some of the basic results of linear algebra. The second chapter discusses methods for solving linear systems of equations, the third chapter discusses eigenvalue problems, and the fourth discusses iterative methods. Of course we cannot discuss every possible method, so I have tried to pick out those that I believe are the most used. I have assumed that the user has some basic knowledge of linear algebra.
Chapter 1
Mathematical Preliminaries
In this chapter we will describe some of the notation that will be used in these notes and review some of the basic results from Linear Algebra.
1.1 Matrices and Vectors
A matrix is a two-dimensional array of real or complex numbers arranged in rows and columns. If a matrix A has m rows and n columns, we say that it is an m × n matrix. We denote the element in the i-th row and j-th column of A by a_ij. The matrix A is often written in the form

    A = [ a11 ... a1n ]
        [  :        : ]
        [ am1 ... amn ]

We sometimes write A = (a_1, ..., a_n) where a_1, ..., a_n are the columns of A. A vector (or n-vector) is an n × 1 matrix. The collection of all n-vectors is denoted by R^n if the elements (components) are all real and by C^n if the elements are complex. We define the sum of two m × n matrices componentwise, i.e., the i,j entry of A + B is a_ij + b_ij. Similarly, we define the multiplication of a scalar α times a matrix A to be the matrix whose i,j component is αa_ij.

If A is a real matrix with components a_ij, then the transpose of A (denoted by A^T) is the matrix whose i,j component is a_ji, i.e., rows and columns are interchanged. If A is a matrix with complex components, then A^H is the matrix whose i,j-th component is the complex conjugate of the j,i-th component of A. We denote the complex conjugate of a by ā. Thus, (A^H)_ij = ā_ji. A real matrix A is said to be symmetric if A = A^T. A complex matrix is said to be Hermitian if A = A^H. Notice that the diagonal elements of a Hermitian matrix must be real. The n × n matrix whose diagonal components are all one and whose off-diagonal components are all zero is called the identity matrix and is denoted by I.
If A is an m × k matrix and B is a k × n matrix, then the product AB is the m × n matrix with components given by

    (AB)_ij = Σ_{r=1}^{k} a_ir b_rj.

The matrix product AB is only defined when the number of columns of A is the same as the number of rows of B. In particular, the product of an m × n matrix A and an n-vector x is given by

    (Ax)_i = Σ_{k=1}^{n} a_ik x_k,    i = 1, ..., m.

It can be easily verified that IA = A if the number of columns in I equals the number of rows in A. It can also be shown that (AB)^T = B^T A^T and (AB)^H = B^H A^H. In addition, we have (A^T)^T = A and (A^H)^H = A.
1.2 Vector Spaces
R^n and C^n together with the operations of addition and scalar multiplication are examples of a structure called a vector space. A vector space V is a collection of vectors for which addition and scalar multiplication are defined in such a way that the following conditions hold:

1. If x and y belong to V and α is a scalar, then x + y and αx belong to V.
2. x + y = y + x for any two vectors x and y in V.
3. x + (y + z) = (x + y) + z for any three vectors x, y, and z in V.
4. There is a vector 0 in V such that x + 0 = x for all x in V.
5. For each x in V there is a vector -x in V such that x + (-x) = 0.
6. (αβ)x = α(βx) for any scalars α, β and any vector x in V.
7. 1x = x for any x in V.
8. α(x + y) = αx + αy for any x and y in V and any scalar α.
9. (α + β)x = αx + βx for any x in V and any scalars α, β.

A subspace of a vector space V is a subset that is also a vector space in its own right.
1.2.1 Linear Independence and Bases
A set of vectors v_1, ..., v_r is said to be linearly independent if the only way we can have α_1 v_1 + ... + α_r v_r = 0 is for α_1 = ... = α_r = 0. A set of vectors v_1, ..., v_n is said to span a vector space V if every vector x in V can be written as a linear combination of the vectors v_1, ..., v_n, i.e., x = α_1 v_1 + ... + α_n v_n. The set of all linear combinations of the vectors v_1, ..., v_r is a subspace denoted by <v_1, ..., v_r> and called the span of these vectors. If a set of vectors v_1, ..., v_n is linearly independent and spans V, it is called a basis for V. If a vector space V has a basis consisting of a finite number of vectors, then the space is said to be finite dimensional. In a finite-dimensional vector space every basis has the same number of vectors. This number is called the dimension of the vector space. Clearly R^n and C^n have dimension n. Let e_k denote the vector in R^n or C^n that consists of all zeroes except for a one in the k-th position. It is easily verified that e_1, ..., e_n is a basis for either R^n or C^n.
1.2.2 Inner Product and Orthogonality
If x and y are two n-vectors, then the inner (dot) product x · y is the scalar value defined by x^H y. If the vector space is real we can replace x^H by x^T. The inner product x · y has the properties:

1. y · x is the complex conjugate of x · y.
2. x · (αy) = α(x · y).
3. x · (y + z) = x · y + x · z.
4. x · x ≥ 0, and x · x = 0 if and only if x = 0.

Vectors x and y are said to be orthogonal if x · y = 0. A basis v_1, ..., v_n is said to be orthonormal if

    v_i · v_j = { 0   i ≠ j
                { 1   i = j.

We define the norm ||x|| of a vector x by ||x|| = sqrt(x · x) = sqrt(|x_1|² + ... + |x_n|²). The norm has the properties

1. ||αx|| = |α| ||x||
2. ||x|| = 0 implies that x = 0
3. ||x + y|| ≤ ||x|| + ||y||.

If v_1, ..., v_n is an orthonormal basis and x = α_1 v_1 + ... + α_n v_n, then it can be shown that ||x||² = |α_1|² + ... + |α_n|². The norm and inner product satisfy the Cauchy inequality

    |x · y| ≤ ||x|| ||y||.
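As a quick numerical illustration of these definitions, the following pure-Python sketch computes an inner product and norm for real vectors and checks the Cauchy inequality (the helper names are mine, not from the notes):

```python
import math

def dot(x, y):
    # Inner product of two real vectors: x . y = sum of x_i * y_i
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    # Euclidean norm ||x|| = sqrt(x . x)
    return math.sqrt(dot(x, x))

x = [1.0, 2.0, 2.0]
y = [3.0, 0.0, 4.0]

# Cauchy inequality: |x . y| <= ||x|| ||y||
assert abs(dot(x, y)) <= norm(x) * norm(y)

print(norm(x))  # sqrt(1 + 4 + 4) = 3.0
```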
1.2.3 Matrices As Linear Transformations
An m × n matrix A can be considered as a mapping of the space R^n (C^n) into the space R^m (C^m) where the image of the n-vector x is the matrix-vector product Ax. This mapping is linear, i.e., A(x + y) = Ax + Ay and A(αx) = αAx. The range of A (denoted by Range(A)) is the space of all m-vectors y such that y = Ax for some n-vector x. It can be shown that the range of A is the space spanned by the columns of A. The null space of A (denoted by Null(A)) is the vector space consisting of all n-vectors x such that Ax = 0. An n × n square matrix A is said to be invertible if it is a one-to-one mapping of the space R^n (C^n) onto itself. It can be shown that a square matrix A is invertible if and only if the null space Null(A) consists of only the zero vector. If A is invertible, then the inverse A^-1 of A is defined by A^-1 y = x where x is the unique n-vector satisfying Ax = y. The inverse has the properties A^-1 A = A A^-1 = I and (AB)^-1 = B^-1 A^-1. We denote (A^-1)^T and (A^T)^-1 by A^-T.

If A is an m × n matrix, x is an n-vector, and y is an m-vector, then it can be shown that

    (Ax) · y = x · (A^H y).
1.3 Derivatives of Vector Functions
The central idea behind differentiation is the local approximation of a function by a linear function. If f is a function of one variable, then the locus of points (x, f(x)) is a plane curve C. The tangent line to C at (x, f(x)) is the graphical representation of the best local linear approximation to f at x. We call this local linear approximation the differential. We represent this local linear approximation by the equation dy = f'(x) dx. If f is a function of two variables, then the locus of points (x, y, f(x, y)) represents a surface S. Here the best local linear approximation to f at (x, y) is graphically represented by the tangent plane to the surface S at the point (x, y, f(x, y)).

We will generalize this idea of a local linear approximation to vector-valued functions of n variables. Let f be a function mapping n-vectors into m-vectors. We define the derivative Df(x) of f at the n-vector x to be the unique linear transformation (m × n matrix) satisfying

    f(x + h) = f(x) + Df(x)h + o(||h||)    (1.1)

whenever such a transformation exists. Here the o notation signifies a function with the property

    lim_{||h||→0} o(||h||)/||h|| = 0.

Thus, Df(x) is a linear transformation that locally approximates f.

We can also define a directional derivative δ_h f(x) in the direction h by

    δ_h f(x) = lim_{ε→0} [f(x + εh) - f(x)]/ε = (d/dε) f(x + εh) |_{ε=0}    (1.2)
whenever the limit exists. This directional derivative is also referred to as the variation of f in the direction h. If Df(x) exists, then

    δ_h f(x) = Df(x)h.

However, the existence of δ_h f(x) for every direction h does not imply the existence of Df(x). If we take h = e_i, then δ_h f(x) is just the partial derivative ∂f(x)/∂x_i.
1.3.1 Newton's Method

Newton's method is an iterative scheme for finding the zeroes of a smooth function f. If x is a guess, then we approximate f near x by

    f(x + h) ≈ f(x) + Df(x)h.

If x + h is the zero of this linear approximation, then

    h = -Df(x)^-1 f(x)

or

    x + h = x - Df(x)^-1 f(x).    (1.3)

We can take x + h as an improved approximation to the nearby zero of f. If we keep iterating with equation (1.3), then the (k+1)-iterate x^(k+1) is related to the k-iterate x^(k) by

    x^(k+1) = x^(k) - Df(x^(k))^-1 f(x^(k)).    (1.4)
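In the one-variable case Df(x) is just f'(x), and iteration (1.4) takes a particularly simple form. A minimal Python sketch (function names are mine, not from the notes):

```python
def newton(f, df, x0, tol=1e-12, max_iter=50):
    # Newton iteration x_{k+1} = x_k - f(x_k)/f'(x_k)
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Find the positive zero of f(x) = x^2 - 4, i.e. x = 2
root = newton(lambda x: x * x - 4.0, lambda x: 2.0 * x, x0=3.0)
print(root)  # approximately 2.0
```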
Chapter 2
Solution of Systems of Linear Equations
2.1 Gaussian Elimination
Gaussian elimination is the standard way of solving a system of linear equations Ax = b when A is a square matrix with no special properties. The first known use of this method was in the Chinese text Nine Chapters on the Mathematical Art, written between 200 BC and 100 BC. Here it was used to solve a system of three equations in three unknowns. The coefficients (including the right-hand side) were written in tabular form and operations were performed on this table to produce a triangular form that could be easily solved. It is remarkable that this was done long before the development of matrix notation or even a notation for variables. The method was used by Gauss in the early 1800s to solve a least squares problem for determining the orbit of the asteroid Pallas. Using observations of Pallas taken between 1803 and 1809, he obtained a system of six equations in six unknowns which he solved by the method now known as Gaussian elimination. The concept of treating a matrix as an object and the development of an algebra for matrices were first introduced by Cayley [2] in the paper A Memoir on the Theory of Matrices.

In these notes we will first describe the basic method and show that it is equivalent to factoring the matrix into the product of a lower triangular and an upper triangular matrix, i.e., A = LU. We will then introduce the method of row pivoting that is necessary in order to keep the method stable. We will show that row pivoting is equivalent to a factorization PA = LU or A = PLU where P is the identity matrix with its rows permuted. Having obtained this factorization, the solution for a given right-hand side b is obtained by solving the two triangular systems Ly = Pb and Ux = y by simple processes called forward and backward substitution.

There are a number of good computer implementations of Gaussian elimination with row pivoting. Matlab has a good implementation obtained by the call [L,U,P]=lu(A). Another good implementation is the LAPACK routine SGESV (DGESV, CGESV). It can be obtained in either Fortran or C from the site www.netlib.org.

We will end by showing how the accuracy of a solution can be improved by a process called
iterative refinement.
2.1.1 The Basic Procedure
Gaussian elimination begins by producing zeroes below the diagonal in the first column, i.e.,

    [ × × ... × ]      [ × × ... × ]
    [ × × ... × ]  →   [ 0 × ... × ]
    [ : :     : ]      [ : :     : ]
    [ × × ... × ]      [ 0 × ... × ]    (2.1)

If a_ij is the element of A in the i-th row and the j-th column, then the first step in the Gaussian elimination process consists of multiplying A on the left by the lower triangular matrix L_1 given by

    L_1 = [ 1          0  0  ...  0 ]
          [ -a21/a11   1  0  ...  0 ]
          [ -a31/a11   0  1       : ]
          [ :          :      ... 0 ]
          [ -an1/a11   0  ... 0   1 ]    (2.2)

i.e., zeroes are produced in the first column by adding appropriate multiples of the first row to the other rows. The next step is to produce zeroes below the diagonal in the second column, i.e.,

    [ × × ... × ]      [ × × × ... × ]
    [ 0 × ... × ]  →   [ 0 × × ... × ]
    [ : :     : ]      [ 0 0 × ... × ]
    [ 0 × ... × ]      [ : :       : ]
                       [ 0 0 × ... × ]    (2.3)

This can be obtained by multiplying L_1 A on the left by the lower triangular matrix L_2 given by

    L_2 = [ 1  0                  0  0  ...  0 ]
          [ 0  1                  0  0  ...  0 ]
          [ 0  -a32^(1)/a22^(1)   1  0  ...  0 ]
          [ 0  -a42^(1)/a22^(1)   0  1       : ]
          [ :  :                         ... 0 ]
          [ 0  -an2^(1)/a22^(1)   0  ... 0   1 ]    (2.4)

where a_ij^(1) is the i,j-th element of L_1 A. Continuing in this manner, we can define lower triangular matrices L_3, ..., L_{n-1} so that L_{n-1} ... L_1 A is upper triangular, i.e.,

    L_{n-1} ... L_1 A = U.    (2.5)
Taking the inverses of the matrices L_1, ..., L_{n-1}, we can write A as

    A = L_1^-1 ... L_{n-1}^-1 U.    (2.6)

Let

    L = L_1^-1 ... L_{n-1}^-1.    (2.7)

Then it follows from equation (2.6) that

    A = LU.    (2.8)

We will now show that L is lower triangular. Each of the matrices L_k can be written in the form

    L_k = I - u^(k) e_k^T    (2.9)

where e_k is the vector whose components are all zero except for a one in the k-th position and u^(k) is a vector whose first k components are zero. The term u^(k) e_k^T is an n × n matrix whose elements are all zero except for those below the diagonal in the k-th column. In fact, the components of u^(k) are given by

    u_i^(k) = { 0                         1 ≤ i ≤ k
              { a_ik^(k-1) / a_kk^(k-1)   k < i    (2.10)

where a_ij^(k-1) is the i,j-th element of L_{k-1} ... L_1 A. Since e_k^T u^(k) = u_k^(k) = 0, it follows that

    (I + u^(k) e_k^T)(I - u^(k) e_k^T) = I + u^(k) e_k^T - u^(k) e_k^T - u^(k) e_k^T u^(k) e_k^T = I,    (2.11)

i.e.,

    L_k^-1 = I + u^(k) e_k^T.    (2.12)

Thus, L_k^-1 is the same as L_k except for a change of sign of the elements below the diagonal in column k. Combining equations (2.7) and (2.12), we obtain

    L = (I + u^(1) e_1^T) ... (I + u^(n-1) e_{n-1}^T) = I + u^(1) e_1^T + ... + u^(n-1) e_{n-1}^T.    (2.13)

In this expression the cross terms dropped out since

    (u^(i) e_i^T)(u^(j) e_j^T) = u_i^(j) u^(i) e_j^T = 0    for i < j.

Equation (2.13) implies that L is lower triangular and that the k-th column of L looks like the k-th column of L_k with the signs reversed on the elements below the diagonal, i.e.,

    L = [ 1         0                ...  0 ]
        [ a21/a11   1                     : ]
        [ a31/a11   a32^(1)/a22^(1)  1      ]
        [ :         :                ...    ]
        [ an1/a11   an2^(1)/a22^(1)  ...  1 ]    (2.14)
Having the LU factorization given in equation (2.8), it is possible to solve the system of equations Ax = LUx = b for any right-hand side b. If we let y = Ux, then y can be found by solving the triangular system Ly = b. Having y, x can be obtained by solving the triangular system Ux = y. Triangular systems are very easy to solve. For example, in the system Ux = y, the last equation can be solved for x_n (the only unknown in this equation). Having x_n, the next-to-last equation can be solved for x_{n-1} (the only unknown left in this equation). Continuing in this manner we can solve for the remaining components of x. For the system Ly = b, we start by computing y_1 and then work our way down. Solving an upper triangular system is called back substitution. Solving a lower triangular system is called forward substitution.
To compute L requires approximately n³/3 operations, where an operation consists of an addition and a multiplication. For each right-hand side, solving the two triangular systems requires approximately n² operations. Thus, as far as solving systems of equations is concerned, having the LU factorization of A is just as good as having the inverse of A and is less costly to compute.
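The factor-once, solve-cheaply idea above can be sketched in pure Python. This is a bare LU factorization without the row pivoting discussed in the next section, so it is for exposition only (helper names are mine, not from the notes):

```python
def lu_factor(A):
    # Gaussian elimination without pivoting: A = L U,
    # L unit lower triangular, U upper triangular.
    n = len(A)
    U = [row[:] for row in A]
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]      # multiplier a_ik / a_kk
            L[i][k] = m
            for j in range(k, n):
                U[i][j] -= m * U[k][j]
    return L, U

def solve_lu(L, U, b):
    n = len(b)
    # Forward substitution: L y = b
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    # Back substitution: U x = y
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

A = [[2.0, 1.0, 1.0],
     [4.0, 3.0, 3.0],
     [8.0, 7.0, 9.0]]
L, U = lu_factor(A)
x = solve_lu(L, U, [4.0, 10.0, 24.0])
print(x)  # [1.0, 1.0, 1.0]
```

Once `L` and `U` are computed, additional right-hand sides reuse the factorization at only the cost of the two triangular solves.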
2.1.2 Row Pivoting
There is one problem with Gaussian elimination that has yet to be addressed. It is possible for one of the diagonal elements a_kk^(k-1) that occur during Gaussian elimination to be zero or to be very small. This causes a problem since we must divide by this diagonal element. If one of the diagonals is exactly zero, the process obviously blows up. However, there can still be a problem if one of the diagonals is small. In this case large elements are produced in both the L and U matrices. These large entries lead to a loss of accuracy when there are subtractions involving these big numbers. This problem can occur even for well-behaved matrices. To eliminate this problem we introduce row pivoting. In performing Gaussian elimination, it is not necessary to take the equations in the order they are given. Suppose we are at the stage where we are zeroing out the elements below the diagonal in the k-th column. We can interchange any of the rows from the k-th row on without changing the structure of the matrix. In row pivoting we find the largest in magnitude of the elements a_kk^(k-1), a_{k+1,k}^(k-1), ..., a_nk^(k-1) and interchange rows to bring that element to the k,k-position. Mathematically we can perform this row interchange by multiplying on the left by the matrix P_k that is like the identity matrix with the appropriate rows interchanged. The matrix P_k has the property P_k P_k = I, i.e., P_k is its own inverse. With row pivoting, equation (2.5) is replaced by

    L_{n-1} P_{n-1} ... L_2 P_2 L_1 P_1 A = U.    (2.15)

We can write this equation in the form

    L_{n-1} (P_{n-1} L_{n-2} P_{n-1}^-1) (P_{n-1} P_{n-2} L_{n-3} P_{n-2}^-1 P_{n-1}^-1) ... (P_{n-1} ... P_2 L_1 P_2^-1 ... P_{n-1}^-1) (P_{n-1} ... P_1) A = U.    (2.16)

Define

    L'_{n-1} = L_{n-1}  and  L'_k = P_{n-1} ... P_{k+1} L_k P_{k+1}^-1 ... P_{n-1}^-1,    k = 1, ..., n-2.    (2.17)
2.1.3 Iterative Refinement
If the solution of Ax = b is not sufficiently accurate, the accuracy can be improved by applying Newton's method to the function f(x) = Ax - b. If x^(k) is an approximate solution to f(x) = 0, then a Newton iteration produces an approximation x^(k+1) given by

    x^(k+1) = x^(k) - Df(x^(k))^-1 f(x^(k)) = x^(k) - A^-1 (Ax^(k) - b).    (2.21)

An iteration step can be summarized as follows:

1. Compute the residual r^(k) = Ax^(k) - b.
2. Solve the system Ad^(k) = r^(k) using the LU factorization of A.
3. Compute x^(k+1) = x^(k) - d^(k).

The residual is usually computed in double precision. If the above calculations were carried out exactly, the answer would be obtained in one iteration, as is always true when applying Newton's method to a linear function. However, because of roundoff errors, it may require more than one iteration to obtain the desired accuracy.
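The three steps above can be sketched as follows. For brevity this sketch re-solves with a plain Gaussian elimination helper instead of a stored LU factorization, and computes the residual at working precision rather than in extended precision (names are mine, not from the notes):

```python
def gauss_solve(A, b):
    # Simple Gaussian elimination solve (stand-in for reusing a stored LU).
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for k in range(n):
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def refine(A, b, x, steps=2):
    # Iterative refinement: r = A x - b; solve A d = r; x <- x - d.
    n = len(b)
    for _ in range(steps):
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
        d = gauss_solve(A, r)
        x = [xi - di for xi, di in zip(x, d)]
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [5.0, 4.0]
x0 = [1.1, 0.9]          # a slightly inaccurate starting solution
x = refine(A, b, x0)
print(x)  # close to the exact solution [1.0, 1.0]
```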
2.2 Cholesky Factorization
Matrices that are Hermitian (A^H = A) and positive definite (x^H A x > 0 for all x ≠ 0) occur sufficiently often in practice that it is worth describing a variant of Gaussian elimination that is often used for this class of matrices. Recall that Gaussian elimination amounted to a factorization of a square matrix A into the product of a lower triangular matrix and an upper triangular matrix, i.e., A = LU. The Cholesky factorization represents a Hermitian positive definite matrix A by the product of a lower triangular matrix and its conjugate transpose, i.e., A = L L^H. Because of the symmetries involved, this factorization can be formed in roughly half the number of operations as are needed for Gaussian elimination.

Let us begin by looking at some of the properties of positive definite matrices. If e_i is the i-th column of the identity matrix and A = (a_ij) is positive definite, then a_ii = e_i^T A e_i > 0, i.e., the diagonal components of A are real and positive. Suppose X is a nonsingular matrix of the same size as the Hermitian, positive definite matrix A. Then

    x^H (X^H A X) x = (Xx)^H A (Xx) > 0    for all x ≠ 0.

Thus, A Hermitian positive definite implies that X^H A X is Hermitian positive definite. Conversely, suppose X^H A X is Hermitian positive definite. Then

    A = (X X^-1)^H A (X X^-1) = (X^-1)^H (X^H A X) (X^-1)

is Hermitian positive definite.
Next we will show that the component of largest magnitude of a Hermitian positive definite matrix A always lies on the diagonal. Suppose instead that |a_kl| = max_{i,j} |a_ij| with k ≠ l. If a_kl = |a_kl| e^{iθ_kl}, let α = -e^{iθ_kl} and x = αe_k + e_l. Then

    x^H A x = |α|² a_kk + ᾱ a_kl + α ā_kl + a_ll = a_kk + a_ll - 2|a_kl| ≤ 0.

This contradicts the fact that A is positive definite. Therefore, max_{i,j} |a_ij| = max_i a_ii. Suppose we partition the Hermitian positive definite matrix A as follows:

    A = [ B  C^H ]
        [ C  D   ]

If y is a nonzero vector compatible with D, let x^H = (0, y^H). Then

    x^H A x = (0, y^H) [ B  C^H ] [ 0 ]  =  y^H D y > 0,
                       [ C  D   ] [ y ]

i.e., D is Hermitian positive definite. Similarly, letting x^H = (y^H, 0), we can show that B is Hermitian positive definite.
We will now show that if A is a Hermitian, positive-definite matrix, then there is a unique lower triangular matrix L with positive diagonals such that A = L L^H. This factorization is called the Cholesky factorization. We will establish this result by induction on the dimension n. Clearly, the result is true for n = 1, for in this case we can take L = (sqrt(a11)). Suppose the result is true for matrices of dimension n - 1. Let A be a Hermitian, positive-definite matrix of dimension n. We can partition A as follows:

    A = [ a11  w^H ]
        [ w    K   ]    (2.22)

where w is a vector of dimension n - 1 and K is an (n-1) × (n-1) matrix. It is easily verified that

    A = [ a11  w^H ] = B^H [ 1  0               ] B    (2.23)
        [ w    K   ]       [ 0  K - w w^H/a11  ]

where

    B = [ sqrt(a11)  w^H/sqrt(a11) ]
        [ 0          I             ]    (2.24)

We will first show that the matrix B is invertible. If

    Bx = [ sqrt(a11)  w^H/sqrt(a11) ] [ x1 ]  =  [ sqrt(a11) x1 + w^H x2 / sqrt(a11) ]  =  0,
         [ 0          I             ] [ x2 ]     [ x2                                ]

then x2 = 0 and sqrt(a11) x1 = 0, so x1 = 0. Therefore, B is invertible. From our discussion at the beginning of this section it follows from equation (2.23) that the matrix

    [ 1  0              ]
    [ 0  K - w w^H/a11  ]
is Hermitian positive definite. By the results on the partitioning of a positive definite matrix, it follows that the matrix K - w w^H/a11 is Hermitian positive definite. By the induction hypothesis, there exists a unique lower triangular matrix L̂ with positive diagonals such that

    K - w w^H/a11 = L̂ L̂^H.    (2.25)
Substituting equation (2.25) into equation (2.23), we get

    A = B^H [ 1  0        ] B = B^H [ 1  0  ] [ 1  0    ] B = [ sqrt(a11)      0  ] [ sqrt(a11)  w^H/sqrt(a11) ]    (2.26)
            [ 0  L̂ L̂^H ]          [ 0  L̂ ] [ 0  L̂^H ]     [ w/sqrt(a11)   L̂ ] [ 0          L̂^H          ]
which is the desired factorization of A. To show uniqueness, suppose that

    A = [ a11  w^H ] = [ l11  0  ] [ l11  v^H ]    (2.27)
        [ w    K   ]   [ v    L̂ ] [ 0    L̂^H ]

is a Cholesky factorization of A. Equating components in equation (2.27), we see that l11² = a11 and hence that l11 = sqrt(a11). Also l11 v = w, or v = w/l11 = w/sqrt(a11). Finally, v v^H + L̂ L̂^H = K, or K - v v^H = K - w w^H/a11 = L̂ L̂^H. Since L̂ L̂^H is the unique factorization of the (n-1) × (n-1) Hermitian, positive-definite matrix K - w w^H/a11, we see that the Cholesky factorization of A is unique. It now follows by induction that there is a unique Cholesky factorization of any Hermitian, positive-definite matrix.
The factorization in equation (2.23) is the basis for the computation of the Cholesky factorization. The matrix B^H is lower triangular. Since the matrix K - w w^H/a11 is positive definite, it can be factored in the same manner. Continuing in this manner until the center matrix becomes the identity matrix, we obtain lower triangular matrices L_1, ..., L_n such that

    A = L_1 ... L_n L_n^H ... L_1^H.

Letting L = L_1 ... L_n, we have the desired Cholesky factorization.

As was mentioned previously, the number of operations in the Cholesky factorization is about half the number in Gaussian elimination. Unlike Gaussian elimination, the Cholesky method does not need pivoting in order to maintain stability. The Cholesky factorization can also be written in the form

    A = L D L^H

where D is diagonal and L now has all ones on the diagonal.
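The recursive construction above unrolls into the standard componentwise algorithm. A real-matrix Python sketch, where L^H reduces to L^T (the function name is mine, not from the notes):

```python
import math

def cholesky(A):
    # Factor a real symmetric positive definite A as A = L L^T,
    # with L lower triangular and a positive diagonal.
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        L[j][j] = math.sqrt(s)            # positive diagonal entry
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

A = [[4.0, 2.0, 2.0],
     [2.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
L = cholesky(A)
print(L)  # [[2.0, 0.0, 0.0], [1.0, 2.0, 0.0], [1.0, 1.0, 2.0]]
```

Only the lower triangle of A is touched, which is where the factor-of-two savings over Gaussian elimination comes from.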
2.3 Elementary Unitary Matrices and the QR Factorization
In Gaussian elimination we saw that a square matrix A could be reduced to triangular form by multiplying on the left by a series of elementary lower triangular matrices. This process can also be expressed as a factorization A = LU where L is lower triangular and U is upper triangular. In least squares problems the number of rows m in A is usually greater than the number of columns n. The standard technique for solving least-squares problems of this type is to make use of a factorization A = QR where Q is an m × m unitary matrix and R has the form

    R = [ R̂ ]
        [ 0 ]

with R̂ an n × n upper triangular matrix. The usual way of obtaining this factorization is to reduce the matrix A to triangular form by multiplying on the left by a series of elementary unitary matrices that are sometimes called Householder matrices (reflectors). We will show how to use this QR factorization to solve least squares problems. If Q̂ is the m × n matrix consisting of the first n columns of Q, then

    A = Q̂ R̂.

This factorization is called the reduced QR factorization. Elementary unitary matrices are also used to reduce square matrices to a simplified form (Hessenberg or tridiagonal) prior to eigenvalue calculation.

There are several good computer implementations that use the Householder QR factorization to solve the least squares problem. The LAPACK routine is called SGELS (DGELS, CGELS). In Matlab the solution of the least squares problem is given by A\b. The QR factorization can be obtained with the call [Q,R]=qr(A).
2.3.1 Gram-Schmidt Orthogonalization
A reduced QR factorization can be obtained by an orthogonalization procedure known as the Gram-Schmidt process. Suppose we would like to construct an orthonormal set of vectors q_1, ..., q_n from a given linearly independent set of vectors a_1, ..., a_n. The process is recursive. At the j-th step we construct a unit vector q_j that is orthogonal to q_1, ..., q_{j-1} using

    v_j = a_j - Σ_{i=1}^{j-1} (q_i^H a_j) q_i,
    q_j = v_j / ||v_j||.

The orthonormal basis constructed has the additional property

    <q_1, ..., q_j> = <a_1, ..., a_j>,    j = 1, 2, ..., n.
If we consider a_1, ..., a_n as columns of a matrix A, then this process is equivalent to the matrix factorization A = Q̂ R̂ where Q̂ = (q_1, ..., q_n) and R̂ is upper triangular. Although the Gram-Schmidt process is very useful in theoretical considerations, it does not lead to a stable numerical procedure. In the next section we will discuss Householder reflectors, which lead to a more stable procedure for obtaining a QR factorization.
2.3.2 Householder Reflections
Let us begin by describing the Householder reflectors. In this section we will restrict ourselves to real matrices. Afterwards we will see that there are a number of generalizations to the complex case. If v is a fixed vector of dimension m with ||v|| = 1, then the set of all vectors orthogonal to v is an (m-1)-dimensional subspace called a hyperplane. If we denote this hyperplane by H, then

    H = { u : v^T u = 0 }.    (2.28)

Here v^T denotes the transpose of v. If x is a point not on H, let x̄ denote the orthogonal projection of x onto H (see Figure 2.1). The difference x̄ - x must be orthogonal to H and hence a multiple of v, i.e.,

    x̄ - x = αv   or   x̄ = x + αv.    (2.29)

Figure 2.1: Householder reflection

Since x̄ lies on H and v^T v = ||v||² = 1, we must have

    v^T x̄ = v^T x + α v^T v = v^T x + α = 0.    (2.30)

Thus, α = -v^T x and consequently

    x̄ = x - (v^T x)v = x - v v^T x = (I - v v^T)x.    (2.31)
Define P = I - v v^T. Then P is a projection matrix that projects vectors orthogonally onto H. The projection x̄ is obtained by going a certain distance from x in the direction v. Figure 2.1 suggests that the reflection x̂ of x across H can be obtained by going twice that distance in the same direction, i.e.,

    x̂ = x - 2(v^T x)v = x - 2 v v^T x = (I - 2 v v^T)x.    (2.32)

With this motivation we define the Householder reflector Q by

    Q = I - 2 v v^T,    ||v|| = 1.    (2.33)

An alternate form for the Householder reflector is

    Q = I - 2 u u^T / ||u||²    (2.34)

where here u is not restricted to be a unit vector. Notice that, in this form, replacing u by a multiple of u does not change Q. The matrix Q is clearly symmetric, i.e., Q^T = Q. Moreover,

    Q^T Q = Q² = (I - 2 v v^T)(I - 2 v v^T) = I - 2 v v^T - 2 v v^T + 4 v v^T v v^T = I,    (2.35)

i.e., Q is an orthogonal matrix. As with all orthogonal matrices, Q preserves the norm of a vector, i.e.,

    ||Qx||² = (Qx)^T Qx = x^T Q^T Q x = x^T x = ||x||².    (2.36)
To reduce a matrix to one that is upper triangular it is necessary to zero out columns below a certain position. We will show how to construct a Householder reflector so that its action on a given vector x is a multiple of e_1, the first column of the identity matrix. To zero out a vector below row k we can use a matrix of the form

    Q = [ I  0 ]
        [ 0  Q̃ ]

where I is the (k-1) × (k-1) identity matrix and Q̃ is an (m-k+1) × (m-k+1) Householder matrix. Thus, for a given vector x we would like to choose a vector u so that Qx is a multiple of the unit vector e_1, i.e.,

    Qx = x - [2(u^T x)/||u||²] u = αe_1.    (2.37)

Since Q preserves norms, we must have |α| = ||x||. Therefore, equation (2.37) becomes

    Qx = x - [2(u^T x)/||u||²] u = ∓||x|| e_1.    (2.38)

It follows from equation (2.38) that u must be a multiple of the vector x ± ||x|| e_1. Since u can be replaced by a multiple of u without changing Q, we let

    u = x ± ||x|| e_1.    (2.39)

It follows from the definition of u in equation (2.39) that

    u^T x = ||x||² ± ||x|| x_1    (2.40)
and

‖u‖² = uᵀu = ‖x‖² ∓ ‖x‖x1 ∓ ‖x‖x1 + ‖x‖² = 2(‖x‖² ∓ ‖x‖x1). (2.41)

Therefore,

2(uᵀx)/‖u‖² = 1, (2.42)

and hence Qx becomes

Qx = x − (2(uᵀx)/‖u‖²)u = x − u = ±‖x‖e1 (2.43)

as desired. From what has been discussed so far, either of the signs in equation (2.39) would
produce the desired result. However, if x1 is very large compared to the other components, then it
is possible to lose accuracy through subtraction in the computation of u = x ∓ ‖x‖e1. To prevent
this we choose u to be

u = x + sign(x1)‖x‖e1 (2.44)

where sign(x1) is defined by

sign(x1) = +1 if x1 ≥ 0, −1 if x1 < 0. (2.45)

With this choice of u, equation (2.43) becomes

Qx = −sign(x1)‖x‖e1. (2.46)

In practice, u is often scaled so that u1 = 1, i.e.,

u = (x + sign(x1)‖x‖e1)/(x1 + sign(x1)‖x‖). (2.47)

With this choice of u,

‖u‖² = 2‖x‖/(‖x‖ + |x1|). (2.48)

The matrix Q applied to a general vector y is given by

Qy = y − (2(uᵀy)/‖u‖²)u. (2.49)
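The construction in equations (2.44)–(2.49) translates directly into code. The following is a minimal NumPy sketch (not part of the original notes): it builds the scaled Householder vector u with u1 = 1 and applies Q = I − (2/‖u‖²)uuᵀ to a vector without ever forming the matrix.

```python
import numpy as np

def householder_vector(x):
    """Return (u, beta) with u[0] = 1 and beta = 2/||u||^2 such that
    (I - beta*u*u^T) x = -sign(x1)*||x||*e1  (cf. eqs. (2.44)-(2.48))."""
    x = np.asarray(x, dtype=float)
    normx = np.linalg.norm(x)
    sgn = 1.0 if x[0] >= 0 else -1.0
    u = x.copy()
    u[0] += sgn * normx               # u = x + sign(x1)*||x||*e1   (eq. 2.44)
    u /= x[0] + sgn * normx           # scale so that u[0] = 1      (eq. 2.47)
    beta = 2.0 / np.dot(u, u)
    return u, beta

x = np.array([3.0, 1.0, 5.0, 1.0])    # ||x|| = 6
u, beta = householder_vector(x)
Qx = x - beta * np.dot(u, x) * u      # apply Q to x as in eq. (2.49)
print(Qx)                             # ≈ [-6, 0, 0, 0]
```

Note that the sign choice in (2.44) guarantees that the addition in `u[0] += sgn * normx` never cancels, exactly as argued in the text.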
2.3.3 Complex Householder Matrices
There are several ways to generalize Householder matrices to the complex case. The most obvious
is to let

U = I − (2/‖u‖²)uuᴴ

where the superscript H denotes conjugate transpose. It can be shown that a matrix of this form
is both Hermitian (U = Uᴴ) and unitary (UᴴU = I). However, it is sometimes convenient
to be able to construct a U such that Uᴴx is a real multiple of e1. This is especially true when
converting a Hermitian matrix to tridiagonal form prior to an eigenvalue computation. For in this
case the tridiagonal matrix becomes a real symmetric matrix even when starting with a complex
Hermitian matrix. Thus, it is not necessary to have a separate eigenvalue routine for the complex
case. It turns out that there is no Hermitian unitary matrix U, as defined above, that is guaranteed to
produce a real multiple of e1. Therefore, linear algebra libraries such as LAPACK use elementary
unitary matrices of the form

U = I − τwwᴴ (2.50)

where τ can be complex. These matrices are not in general Hermitian. If U is to be unitary, we
must have

I = UᴴU = (I − τ̄wwᴴ)(I − τwwᴴ) = I − (τ + τ̄ − |τ|²‖w‖²)wwᴴ

and hence

|τ|²‖w‖² = 2 Re(τ). (2.51)

Notice that replacing w by w/α and τ by |α|²τ in equation (2.50) leaves U unchanged. Thus, a
scaling of w can be absorbed in τ. We would like to choose w and τ so that

Uᴴx = x − τ̄(wᴴx)w = γ‖x‖e1 (2.52)

where γ = ±1. It can be seen from equation (2.52) that w must be proportional to the vector
x − γ‖x‖e1. Since the factor of proportionality can be absorbed in τ, we choose

w = x − γ‖x‖e1. (2.53)
Substituting this expression for w into equation (2.52), we get

Uᴴx = x − τ̄(wᴴx)(x − γ‖x‖e1) = (1 − τ̄wᴴx)x + τ̄(wᴴx)γ‖x‖e1 = γ‖x‖e1. (2.54)

Thus, we must have

τ̄(wᴴx) = 1 or τ = 1/(xᴴw). (2.55)

This choice of τ gives

Uᴴx = γ‖x‖e1.

It follows from equation (2.53) that

xᴴw = ‖x‖² − γ‖x‖x̄1 (2.56)

and

‖w‖² = (xᴴ − γ‖x‖e1ᵀ)(x − γ‖x‖e1) = ‖x‖² − γ‖x‖x1 − γ‖x‖x̄1 + ‖x‖²
     = 2(‖x‖² − γ‖x‖ Re(x1)). (2.57)
Thus, it follows from equations (2.55)–(2.57) that

2 Re(τ)/|τ|² = τ/|τ|² + τ̄/|τ|² = 1/τ̄ + 1/τ = wᴴx + xᴴw
            = ‖x‖² − γ‖x‖x1 + ‖x‖² − γ‖x‖x̄1 = 2‖x‖² − 2γ‖x‖ Re(x1) = ‖w‖²,

i.e., the condition in equation (2.51) is satisfied. It follows that the matrix U defined by equation
(2.50) is unitary when w is defined by equation (2.53) and τ is defined by equation (2.55). As
before we choose γ to prevent the loss of accuracy due to subtraction in equation (2.53). In this
case we choose γ = −sign(Re(x1)). Thus, w becomes

w = x + sign(Re(x1))‖x‖e1. (2.58)
Let us define a real constant ζ by

ζ = sign(Re(x1))‖x‖. (2.59)

With this definition w becomes

w = x + ζe1. (2.60)

It follows that

xᴴw = ‖x‖² + ζx̄1 = ζ² + ζx̄1 = ζ(ζ + x̄1), (2.61)

and hence

τ = 1/(ζ(ζ + x̄1)). (2.62)

In LAPACK w is scaled so that w1 = 1, i.e.,

w = (x + ζe1)/(x1 + ζ). (2.63)

With this w, τ becomes

τ = |x1 + ζ|²/(ζ(ζ + x̄1)) = (x1 + ζ)(x̄1 + ζ)/(ζ(x̄1 + ζ)) = (x1 + ζ)/ζ. (2.64)

Clearly this τ satisfies the inequality

|τ − 1| = |x1|/|ζ| = |x1|/‖x‖ ≤ 1. (2.65)

It follows from equation (2.64) that τ is real when x1 is real. Thus, U is Hermitian when x1 is real.
An alternate approach to defining a complex Householder matrix is to let

U = I − (2/‖w‖²)wwᴴ. (2.66)
This U is Hermitian and

UᴴU = (I − (2/‖w‖²)wwᴴ)(I − (2/‖w‖²)wwᴴ) = I − (4/‖w‖²)wwᴴ + (4/‖w‖⁴)w(wᴴw)wᴴ = I, (2.67)

i.e., U is unitary. We want to choose w so that

Uᴴx = Ux = x − (2(wᴴx)/‖w‖²)w = γ‖x‖e1 (2.68)

where |γ| = 1. Multiplying equation (2.68) by xᴴ, we get

xᴴUx = γ‖x‖x̄1. (2.69)

Since xᴴUx is real, γx̄1 must be real. If x1 = |x1|e^{iθ1}, then γ must have the form

γ = ±e^{iθ1}. (2.70)
It follows from equation (2.68) that w must be proportional to the vector x ∓ e^{iθ1}‖x‖e1. Since
multiplying w by a constant factor doesn't change U, we take

w = x ∓ e^{iθ1}‖x‖e1. (2.71)

Again, to avoid accuracy problems, we choose the plus sign in the above formula, i.e.,

w = x + e^{iθ1}‖x‖e1. (2.72)

It follows from this definition that

‖w‖² = (xᴴ + e^{−iθ1}‖x‖e1ᵀ)(x + e^{iθ1}‖x‖e1) = ‖x‖² + |x1|‖x‖ + |x1|‖x‖ + ‖x‖²
     = 2‖x‖(‖x‖ + |x1|) (2.73)

and

wᴴx = (xᴴ + e^{−iθ1}‖x‖e1ᵀ)x = ‖x‖² + e^{−iθ1}x1‖x‖ = ‖x‖(‖x‖ + |x1|). (2.74)

Therefore,

2(wᴴx)/‖w‖² = 1, (2.75)

and hence

Ux = x − w = x − (x + e^{iθ1}‖x‖e1) = −e^{iθ1}‖x‖e1. (2.76)
This alternate form for the Householder matrix has the advantage that it is Hermitian and that the
multiplier of wwᴴ is real. However, it can't in general map a given vector x into a real multiple of
e1. Both EISPACK and LINPACK use elementary unitary matrices similar to this. The LAPACK
form is not Hermitian and involves a complex multiplier of wwᴴ, but can produce a real multiple of
e1 when acting on x. As stated before, this can be a big advantage when reducing matrices to
tridiagonal form prior to an eigenvalue computation.
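As an aside (not part of the original notes), the Hermitian form (2.66) with the choice (2.72) is easy to verify numerically; a small NumPy sketch:

```python
import numpy as np

def hermitian_householder_w(x):
    """Return w such that U = I - (2/||w||^2) w w^H (eq. 2.66) maps x to
    -e^{i*theta1}*||x||*e1, where theta1 = arg(x1)  (eqs. 2.72, 2.76)."""
    x = np.asarray(x, dtype=complex)
    phase = x[0] / abs(x[0]) if x[0] != 0 else 1.0   # e^{i*theta1}
    w = x.copy()
    w[0] += phase * np.linalg.norm(x)                # w = x + e^{i*theta1}||x||e1
    return w

x = np.array([1.0 + 1.0j, 2.0, 2.0j])
w = hermitian_householder_w(x)
Ux = x - 2.0 * np.vdot(w, x) / np.vdot(w, w) * w     # Ux = x - w = -e^{i*theta1}||x||e1
print(abs(Ux[0]), np.round(Ux[1:], 12))              # ||x||, then zeros
```

As computed in (2.73)–(2.75), the scalar 2wᴴx/‖w‖² equals 1, so Ux = x − w; the resulting multiple of e1 carries the complex phase e^{iθ1}, unlike the LAPACK form discussed above.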
2.3.4 Givens Rotations
Householder matrices are very good at producing long strings of zeroes in a row or column. Sometimes,
however, we want to produce a zero in a matrix while altering as little of the matrix as
possible. This is true when dealing with matrices that are very sparse (most of the elements are
already zero) or when performing many operations in parallel. Givens rotations can sometimes
be used for this purpose. We will begin by considering the case where all matrices and vectors are
real. The complex case will be considered in the next section.
The two-dimensional matrix

R = [ cos θ   sin θ
     −sin θ   cos θ ]

rotates a 2-vector through the angle θ. If we let c = cos θ and s = sin θ, then the matrix R can be
written as

R = [ c   s
     −s   c ]

where c² + s² = 1. If x is a 2-vector, we can determine c and s so that Rx is a multiple of e1.
Since

Rx = ( cx1 + sx2
      −sx1 + cx2 ),

R will have the desired property if c = x1/√(x1² + x2²) and s = x2/√(x1² + x2²). In fact
Rx = √(x1² + x2²) e1.
Givens matrices are an extension of this two-dimensional rotation to higher dimensions. For j > i,
the Givens matrix G(i,j) is an m×m matrix that performs a rotation in the (i,j) coordinate plane.
It can be obtained by replacing the (i,i) and (j,j) components of the m×m identity matrix by c,
the (i,j) component by s, and the (j,i) component by −s. It has the matrix form

            col i    col j
G(i,j) = [ 1
             ⋱
              c   ⋯   s            ]  ← row i
         [    ⋮   ⋱   ⋮            ]
         [   −s   ⋯   c            ]  ← row j
         [              ⋱
                           1       ]  (2.77)

where c² + s² = 1. The matrix G(i,j) is clearly orthogonal. In terms of components,

G(i,j)_kl = 1   if k = l, k ≠ i, and k ≠ j,
          = c   if k = l = i or k = l = j,
          = s   if k = i, l = j,
          = −s  if k = j, l = i,
          = 0   otherwise. (2.78)
Multiplying a vector by G(i,j) only affects the i and j components. If y = G(i,j)x, then

y_k = x_k        for k ≠ i and k ≠ j,
y_i = cx_i + sx_j,
y_j = −sx_i + cx_j. (2.79)

Suppose we want to make y_j = 0. We can do this by setting

c = x_i/√(x_i² + x_j²) and s = x_j/√(x_i² + x_j²). (2.80)

With this choice for c and s, y becomes

y_k = x_k        for k ≠ i and k ≠ j,
y_i = √(x_i² + x_j²),
y_j = 0. (2.81)

Multiplying a matrix A on the left by G(i,j) only alters rows i and j. Similarly, multiplying A
on the right by G(i,j) only alters columns i and j.
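A minimal NumPy sketch (not part of the original notes) of equations (2.80)–(2.81), zeroing one component of a vector with a Givens rotation:

```python
import numpy as np

def givens(xi, xj):
    """Return (c, s) with c^2 + s^2 = 1 so that the rotation with rows
    [c, s] and [-s, c] maps (xi, xj) to (r, 0), r = sqrt(xi^2 + xj^2)."""
    r = np.hypot(xi, xj)
    if r == 0.0:
        return 1.0, 0.0               # nothing to rotate
    return xi / r, xj / r             # eq. (2.80)

x = np.array([4.0, 7.0, 3.0])
i, j = 0, 2
c, s = givens(x[i], x[j])
G = np.eye(3)                         # build G(i, j) as in eq. (2.77)
G[i, i] = G[j, j] = c
G[i, j] = s
G[j, i] = -s
print(G @ x)                          # → [5. 7. 0.]  (only components i and j change)
```

Note that component 1 passes through untouched, illustrating why Givens rotations suit sparse matrices better than Householder reflectors.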
2.3.5 Complex Givens Rotations
For the complex case we replace R in the previous section by

R = [ c   s
     −s̄   c ]  where c is real. (2.82)

It can be easily verified that R is unitary if and only if c and s satisfy

c² + |s|² = 1.

Given a 2-vector x, we want to choose R so that Rx is a multiple of e1. For R unitary, we must
have

Rx = γ‖x‖e1 where |γ| = 1. (2.83)
Multiplying equation (2.83) by Rᴴ, we get

x = RᴴRx = γ‖x‖Rᴴe1 = γ‖x‖ ( c
                              s̄ ) (2.84)

or

c = x1/(γ‖x‖) and s̄ = x2/(γ‖x‖). (2.85)

We define sign(u) for u complex by

sign(u) = u/|u| if u ≠ 0, and sign(u) = 1 if u = 0. (2.86)

If c is to be real, γ must have the form

γ = ±sign(x1).

Choosing the plus sign, c and s become

c = |x1|/‖x‖ and s = sign(x1)x̄2/‖x‖. (2.87)
If we want the complex case to reduce to the real case when x1 and x2 are real, then we can
choose γ = sign(Re(x1)). As before, we can construct G(i,j) by replacing the (i,i) and (j,j)
components of the identity matrix by c, the (i,j) component by s, and the (j,i) component by
−s̄. In the expressions for c and s in equation (2.87), we replace x1 by x_i, x2 by x_j, and ‖x‖ by
√(|x_i|² + |x_j|²).
2.3.6 QR Factorization Using Householder Reflectors
Let A be an m×n matrix with m > n. Let Q1 be a Householder matrix that maps the first column
of A into a multiple of e1. Then Q1A will have zeroes below the diagonal in the first column. Now
let

Q2 = [ 1   0
       0   Q̂2 ]

where Q̂2 is an (m−1)×(m−1) Householder matrix that will zero out the entries below the
diagonal in the second column of Q1A. Continuing in this manner, we can construct Q2, …, Q_{n−1}
so that

Q_{n−1} ⋯ Q1 A = [ R̂
                   0 ] (2.88)

where R̂ is an n×n triangular matrix. The matrices Q_k have the form

Q_k = [ I   0
        0   Q̂_k ] (2.89)
where Q̂_k is an (m−k+1)×(m−k+1) Householder matrix. If we define

Qᴴ = Q_{n−1} ⋯ Q1 and R = [ R̂
                            0 ], (2.90)

then equation (2.88) can be written

QᴴA = R. (2.91)

Moreover, since each Q_k is unitary, we have

QᴴQ = (Q_{n−1} ⋯ Q1)(Q1ᴴ ⋯ Q_{n−1}ᴴ) = I, (2.92)

i.e., Q is unitary. Therefore, equation (2.91) can be written

A = QR. (2.93)

Equation (2.93) is the desired factorization. The operation count for this factorization is approximately
mn² operations, where an operation is an addition and a multiplication. In practice it is not
necessary to construct the matrix Q explicitly. Usually only the vectors v defining each Q_k are saved.
If Q̂ is the matrix consisting of the first n columns of Q, then

A = Q̂R̂ (2.94)

where Q̂ is an m×n matrix with orthonormal columns and R̂ is an n×n upper triangular matrix.
The factorization in equation (2.94) is the reduced QR factorization.
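The procedure above can be sketched in a few lines of NumPy (an illustration, not the notes' implementation; for simplicity it accumulates Q explicitly, which production codes avoid):

```python
import numpy as np

def householder_qr(A):
    """Factor A (m x n, m >= n) as A = Q R, applying one Householder
    reflector per column as in Section 2.3.6."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q, R = np.eye(m), A.copy()
    for k in range(n):
        x = R[k:, k]
        normx = np.linalg.norm(x)
        if normx == 0.0:
            continue                                   # column already zero
        u = x.copy()
        u[0] += np.copysign(normx, x[0])               # u = x + sign(x1)||x||e1
        beta = 2.0 / np.dot(u, u)
        R[k:, :] -= beta * np.outer(u, u @ R[k:, :])   # left-multiply by Q_k
        Q[:, k:] -= beta * np.outer(Q[:, k:] @ u, u)   # accumulate Q = Q1 ... Qk
    return Q, R

A = np.random.default_rng(0).standard_normal((5, 3))
Q, R = householder_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(5)))  # True True
```

Each reflector is applied as a rank-one update, so Q_k is never formed as a matrix, in the spirit of the remark above about saving only the defining vectors.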
2.3.7 Uniqueness of the Reduced QR Factorization
In this section we will show that a matrix A of full rank has a unique reduced QR factorization if
we require that the triangular matrix R has positive diagonal elements. All other reduced QR
factorizations of A are simply related to this one with positive diagonals.
The reduced QR factorization can be written

A = (a1, a2, …, an) = (q1, q2, …, qn) [ r11  r12  ⋯  r1n
                                             r22  ⋯  r2n
                                                  ⋱   ⋮
                                                      rnn ]. (2.95)

If A has full rank, then all of the diagonal elements rjj must be nonzero. Equating columns in
equation (2.95), we get

a_j = Σ_{k=1}^{j} r_{kj} q_k = r_{jj} q_j + Σ_{k=1}^{j−1} r_{kj} q_k
or

q_j = (1/r_{jj}) ( a_j − Σ_{k=1}^{j−1} r_{kj} q_k ). (2.96)

When j = 1 equation (2.96) reduces to

q1 = a1/r11. (2.97)

Since q1 must have unit norm, it follows that

|r11| = ‖a1‖. (2.98)

Equations (2.97) and (2.98) determine q1 and r11 up to a factor having absolute value one, i.e.,
there is a d1 with |d1| = 1 such that

r11 = d1 r̂11 and q1 = q̂1/d1

where r̂11 = ‖a1‖ and q̂1 = a1/r̂11.
For j = 2, equation (2.96) becomes

q2 = (1/r22)(a2 − r12 q1).

Since the columns q1 and q2 must be orthonormal, it follows that

0 = q1ᴴq2 = (1/r22)(q1ᴴa2 − r12)

and hence that

r12 = q1ᴴa2 = d1 q̂1ᴴa2. (2.99)

Here we have used the fact that 1/d̄1 = d1. Since q2 has unit norm, it follows that

1 = ‖q2‖ = (1/|r22|)‖a2 − r12 q1‖ = (1/|r22|)‖a2 − d1(q̂1ᴴa2)q̂1/d1‖ = (1/|r22|)‖a2 − (q̂1ᴴa2)q̂1‖

and hence that

|r22| = ‖a2 − (q̂1ᴴa2)q̂1‖ ≡ r̂22.

Therefore, there exists a scalar d2 with |d2| = 1 such that

r22 = d2 r̂22 and q2 = q̂2/d2

where q̂2 = (a2 − (q̂1ᴴa2)q̂1)/r̂22.
For j = 3, equation (2.96) becomes

q3 = (1/r33)(a3 − r13 q1 − r23 q2).

Since the columns q1, q2, and q3 must be orthonormal, it follows that

0 = q1ᴴq3 = (1/r33)(q1ᴴa3 − r13),
0 = q2ᴴq3 = (1/r33)(q2ᴴa3 − r23),

and hence that

r13 = q1ᴴa3 = d1 q̂1ᴴa3,
r23 = q2ᴴa3 = d2 q̂2ᴴa3.

Since q3 has unit norm, it follows that

1 = ‖q3‖ = (1/|r33|)‖a3 − r13 q1 − r23 q2‖ = (1/|r33|)‖a3 − (q̂1ᴴa3)q̂1 − (q̂2ᴴa3)q̂2‖

and hence that

|r33| = ‖a3 − (q̂1ᴴa3)q̂1 − (q̂2ᴴa3)q̂2‖ ≡ r̂33.

Therefore, there exists a scalar d3 with |d3| = 1 such that

r33 = d3 r̂33 and q3 = q̂3/d3 (2.100)

where q̂3 = (a3 − (q̂1ᴴa3)q̂1 − (q̂2ᴴa3)q̂2)/r̂33. Continuing in this way we obtain the matrix
Q̂ = (q̂1, …, q̂n) with orthonormal columns and the triangular matrix

R̂ = [ r̂11  r̂12  ⋯  r̂1n
            r̂22  ⋯  r̂2n
                 ⋱   ⋮
                     r̂nn ]

such that A = Q̂R̂ is the unique reduced QR factorization of A with R̂ having positive diagonal
elements. If A = QR is any other reduced QR factorization of A, then

R = diag(d1, …, dn) R̂ and Q = Q̂ diag(1/d1, …, 1/dn) = Q̂ diag(d̄1, …, d̄n)

where |d1| = ⋯ = |dn| = 1.
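A small NumPy illustration of this uniqueness result (not part of the original notes): any reduced QR factorization can be normalized to the one with positive diagonal by absorbing the unimodular factors d_k (here ±1, since the matrix is real).

```python
import numpy as np

A = np.random.default_rng(1).standard_normal((6, 4))
Q, R = np.linalg.qr(A)           # reduced QR; the diagonal signs of R are arbitrary

d = np.sign(np.diag(R))          # d_k = +-1
Q_hat = Q * d                    # Q_hat = Q diag(d): scale the columns of Q
R_hat = d[:, None] * R           # R_hat = diag(d) R: scale the rows of R

print(np.allclose(Q_hat @ R_hat, A), np.all(np.diag(R_hat) > 0))  # True True
```

Since d_k² = 1, the product Q̂R̂ is unchanged, while the diagonal of R̂ becomes positive, giving the unique normalized factorization of the theorem.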
2.3.8 Solution of Least Squares Problems
In this section we will show how to use the QR factorization to solve the least squares problem.
Consider the system of linear equations

Ax = b (2.101)

where A is an m×n matrix with m > n. In general there is no solution to this system of equations.
Instead we seek to find an x so that ‖Ax − b‖ is as small as possible. In view of the QR
factorization, we have

‖Ax − b‖² = ‖QRx − b‖² = ‖Q(Rx − Qᴴb)‖² = ‖Rx − Qᴴb‖². (2.102)
We can write Q in the partitioned form Q = (Q1, Q2) where Q1 is an m×n matrix. Then

Rx − Qᴴb = [ R̂x   −  [ Q1ᴴb   =  [ R̂x − Q1ᴴb
             0  ]      Q2ᴴb ]      −Q2ᴴb      ]. (2.103)

It follows from equation (2.103) that

‖Rx − Qᴴb‖² = ‖R̂x − Q1ᴴb‖² + ‖Q2ᴴb‖². (2.104)

Combining equations (2.102) and (2.104), we get

‖Ax − b‖² = ‖R̂x − Q1ᴴb‖² + ‖Q2ᴴb‖². (2.105)

It can be easily seen from this equation that ‖Ax − b‖ is minimized when x is the solution of the
triangular system

R̂x = Q1ᴴb (2.106)
when such a solution exists. This is the standard way of solving least squares systems. Later we will
discuss the singular value decomposition (SVD), which provides even more information relative
to the least squares problem. However, the SVD is much more expensive to compute than the QR
decomposition.
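The standard procedure of equation (2.106) is easy to carry out with a library QR routine; a NumPy sketch (not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 3))          # overdetermined system, m > n
b = rng.standard_normal(8)

Q1, R_hat = np.linalg.qr(A)              # reduced QR: A = Q1 R_hat
x = np.linalg.solve(R_hat, Q1.T @ b)     # the triangular system of eq. (2.106)

# Agrees with the library least-squares solver:
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
```

(`np.linalg.solve` is used here for brevity; a dedicated back-substitution would exploit the triangular structure of R̂.)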
2.4 The Singular Value Decomposition
The Singular Value Decomposition (SVD) is one of the most important and probably one of the
least well known of the matrix factorizations. It has many applications in statistics, signal processing,
image compression, pattern recognition, weather prediction, and modal analysis to name a
few. It is also a powerful diagnostic tool. For example, it provides approximations to the rank and
the condition number of a matrix as well as providing orthonormal bases for both the range and
the null space of a matrix. It also provides optimal low rank approximations to a matrix. The SVD
is applicable to both square and rectangular matrices. In this regard it provides a general solution
to the least squares problem.
The SVD was first discovered by differential geometers in connection with the analysis of bilinear
forms. Eugenio Beltrami [1] (1873) and Camille Jordan [10] (1874) independently discovered
that the singular values of the matrix associated with a bilinear form comprise a complete set
of invariants for the form under orthogonal substitutions. The first proof of the singular value
decomposition for rectangular and complex matrices seems to be by Eckart and Young [5] in 1939.
They saw it as a generalization of the principal axis transformation for Hermitian matrices.
We will begin by deriving the SVD and presenting some of its most important properties. We will
then discuss its application to least squares problems and matrix approximation problems. Following
this we will show how singular values can be used to determine the condition of a matrix (how
close the rows or columns are to being linearly dependent). We will conclude with a brief outline
of the methods used to compute the SVD. Most of the methods are modifications of methods used
to compute eigenvalues and vectors of a square matrix. The details of the computational methods
are beyond the scope of this presentation, but we will provide references for those interested.
2.4.1 Derivation and Properties of the SVD
Theorem 1 (Singular Value Decomposition). Let A be a nonzero m×n matrix. Then there exist
an orthonormal basis u1, …, um of m-vectors, an orthonormal basis v1, …, vn of n-vectors, and
positive numbers σ1, …, σr such that

1. u1, …, ur is a basis of the range of A;

2. v_{r+1}, …, vn is a basis of the null space of A;

3. A = Σ_{k=1}^{r} σ_k u_k v_kᴴ.

Proof: AᴴA is a Hermitian n×n matrix that is positive semidefinite. Therefore, there is an
orthonormal basis v1, …, vn and nonnegative numbers σ1², …, σn² such that

AᴴA v_k = σ_k² v_k, k = 1, …, n. (2.107)

Since A is nonzero, at least one of the eigenvalues σ_k² must be positive. Let the eigenvalues be
arranged so that σ1² ≥ σ2² ≥ ⋯ ≥ σr² > 0 and σ_{r+1}² = ⋯ = σn² = 0. Consider now the vectors
Av1, …, Avn. We have

(Av_i)ᴴ(Av_j) = v_iᴴ AᴴA v_j = σ_j² v_iᴴ v_j = 0, i ≠ j, (2.108)

i.e., Av1, …, Avn are orthogonal. When i = j,

‖Av_i‖² = v_iᴴ AᴴA v_i = σ_i² v_iᴴ v_i = σ_i² > 0 for i = 1, …, r, and ‖Av_i‖² = 0 for i > r. (2.109)

Thus Av_{r+1} = ⋯ = Avn = 0, and hence v_{r+1}, …, vn belong to the null space of A. Define
u1, …, ur by

u_i = (1/σ_i) Av_i, i = 1, …, r. (2.110)
Then u1, …, ur is an orthonormal set of vectors in the range of A that spans the range of A. Thus,
u1, …, ur is a basis for the range of A. The dimension r of the range of A is called the rank of
A. If r < m, we can extend the set u1, …, ur of orthonormal vectors to an orthonormal basis
u1, …, um of m-space using the Gram-Schmidt process. If x is an n-vector, we can write x in
terms of the basis v1, …, vn as

x = Σ_{k=1}^{n} (v_kᴴx) v_k. (2.111)

It follows from equations (2.110) and (2.111) that

Ax = Σ_{k=1}^{n} (v_kᴴx) Av_k = Σ_{k=1}^{r} (v_kᴴx) σ_k u_k = Σ_{k=1}^{r} σ_k u_k v_kᴴ x. (2.112)

Since x in equation (2.112) was arbitrary, we must have

A = Σ_{k=1}^{r} σ_k u_k v_kᴴ. (2.113)

The representation of A in equation (2.113) is called the singular value decomposition (SVD). If
x belongs to the null space of A (Ax = 0), then it follows from equation (2.112) and the linear
independence of the vectors u1, …, ur that v_kᴴx = 0 for k = 1, …, r. It then follows from
equation (2.111) that

x = Σ_{k=r+1}^{n} (v_kᴴx) v_k,

i.e., v_{r+1}, …, vn span the null space of A. Since v_{r+1}, …, vn are orthonormal vectors belonging
to the null space of A, they form a basis for the null space of A.

We will now express the SVD in matrix form. Define U = (u1, …, um), V = (v1, …, vn), and
S = diag(σ1, …, σr). Then

A = U [ S  0
        0  0 ] Vᴴ. (2.114)
Generally we write the SVD in the form (2.114) with the understanding that some of the zero
portions might collapse and disappear.
We next give a geometric interpretation of the SVD. For this purpose we will restrict ourselves to
the real case. Let x be a point on the unit sphere, i.e., ‖x‖ = 1. Since u1, …, ur is a basis for the
range of A, there exist numbers y1, …, yr such that

Ax = Σ_{k=1}^{r} y_k u_k = Σ_{k=1}^{r} σ_k (v_kᵀx) u_k.

Therefore, y_k = σ_k (v_kᵀx), k = 1, …, r. Since the columns of V form an orthonormal basis, we
have

x = Σ_{k=1}^{n} (v_kᵀx) v_k.

Therefore,

‖x‖² = Σ_{k=1}^{n} (v_kᵀx)² = 1.

It follows that

y1²/σ1² + ⋯ + yr²/σr² = (v1ᵀx)² + ⋯ + (vrᵀx)² ≤ 1.
Here equality holds when r = n. Thus, the image of x lies on or interior to the hyperellipsoid
with semiaxes σ1u1, …, σrur. Conversely, if y1, …, yr satisfy

y1²/σ1² + ⋯ + yr²/σr² ≤ 1,

we define β² = 1 − Σ_{k=1}^{r} (y_k/σ_k)² and

x = Σ_{k=1}^{r} (y_k/σ_k) v_k + β v_{r+1}.

Since v_{r+1} is in the null space of A and Av_k = σ_k u_k (k ≤ r), it follows that

Ax = Σ_{k=1}^{r} (y_k/σ_k) Av_k + β Av_{r+1} = Σ_{k=1}^{r} y_k u_k.

In addition,

‖x‖² = Σ_{k=1}^{r} y_k²/σ_k² + β² = 1.
Thus, we have shown that the image of the unit sphere ‖x‖ = 1 under the mapping A is the hyperellipsoid

y1²/σ1² + ⋯ + yr²/σr² ≤ 1

relative to the basis u1, …, ur. When r = n, equality holds and the image is the surface of the
hyperellipsoid

y1²/σ1² + ⋯ + yn²/σn² = 1.
2.4.2 The SVD and Least Squares Problems
In least squares problems we seek an x that minimizes ‖Ax − b‖. In view of the singular value
decomposition, we have

‖Ax − b‖² = ‖ U [ S 0; 0 0 ] Vᴴx − b ‖² = ‖ U ( [ S 0; 0 0 ] Vᴴx − Uᴴb ) ‖²
          = ‖ [ S 0; 0 0 ] Vᴴx − Uᴴb ‖². (2.118)
If we define

y = ( y1
      y2 ) = Vᴴx (2.119)

b̂ = ( b̂1
      b̂2 ) = Uᴴb, (2.120)

then equation (2.118) can be written

‖Ax − b‖² = ‖ ( Sy1 − b̂1
                −b̂2      ) ‖² = ‖Sy1 − b̂1‖² + ‖b̂2‖². (2.121)
It is clear from equation (2.121) that ‖Ax − b‖ is minimized when y1 = S⁻¹b̂1. Therefore, the y
that minimizes ‖Ax − b‖ is given by

y = ( S⁻¹b̂1
      y2     ), y2 arbitrary. (2.122)

In view of equation (2.119), the x that minimizes ‖Ax − b‖ is given by

x = Vy = V ( S⁻¹b̂1
             y2     ), y2 arbitrary. (2.123)
Since V is unitary, it follows from equation (2.123) that

‖x‖² = ‖S⁻¹b̂1‖² + ‖y2‖².

Thus, there is a unique x of minimum norm that minimizes ‖Ax − b‖, namely the x corresponding
to y2 = 0. This x is given by

x = V ( S⁻¹b̂1
        0      ) = V [ S⁻¹ 0; 0 0 ] ( b̂1
                                      b̂2 ) = V [ S⁻¹ 0; 0 0 ] Uᴴb.

The matrix multiplying b on the right-hand side of this equation is called the generalized inverse
of A and is denoted by A⁺, i.e.,

A⁺ = V [ S⁻¹  0
         0    0 ] Uᴴ. (2.124)

Thus, the minimum norm solution of the least squares problem is given by x = A⁺b. The n×m
matrix A⁺ plays the same role in least squares problems that A⁻¹ plays in the solution of linear
equations. We will now show that this definition of the generalized inverse gives the same result
as the classical Moore-Penrose conditions.
Theorem 2. If A has a singular value decomposition given by

A = U [ S  0
        0  0 ] Vᴴ,

then the matrix X defined by

X = A⁺ = V [ S⁻¹  0
             0    0 ] Uᴴ

is the unique solution of the Moore-Penrose conditions:

1. AXA = A

2. XAX = X

3. (AX)ᴴ = AX

4. (XA)ᴴ = XA.
Proof:

AXA = U [ S 0; 0 0 ] VᴴV [ S⁻¹ 0; 0 0 ] UᴴU [ S 0; 0 0 ] Vᴴ = U [ S 0; 0 0 ][ I 0; 0 0 ] Vᴴ
    = U [ S 0; 0 0 ] Vᴴ = A,

i.e., X satisfies condition (1).

XAX = V [ S⁻¹ 0; 0 0 ] UᴴU [ S 0; 0 0 ] VᴴV [ S⁻¹ 0; 0 0 ] Uᴴ = V [ S⁻¹ 0; 0 0 ] Uᴴ = X,

i.e., X satisfies condition (2). Since

AX = U [ S 0; 0 0 ] VᴴV [ S⁻¹ 0; 0 0 ] Uᴴ = U [ I 0; 0 0 ] Uᴴ

and

XA = V [ S⁻¹ 0; 0 0 ] UᴴU [ S 0; 0 0 ] Vᴴ = V [ I 0; 0 0 ] Vᴴ,

it follows that both AX and XA are Hermitian, i.e., X satisfies conditions (3) and (4). To show
uniqueness let us suppose that both X and Y satisfy the Moore-Penrose conditions. Then

X = XAX                                  by (2)
  = X(AX)ᴴ = XXᴴAᴴ                       by (3)
  = XXᴴ(AYA)ᴴ = XXᴴAᴴYᴴAᴴ                by (1)
  = XXᴴAᴴ(AY)ᴴ = XXᴴAᴴAY                 by (3)
  = X(AX)ᴴAY = XAXAY                     by (3)
  = XAY                                  by (2)
  = X(AYA)Y                              by (1)
  = XA(YA)Y = XA(YA)ᴴY = XAAᴴYᴴY         by (4)
  = (XA)ᴴAᴴYᴴY = AᴴXᴴAᴴYᴴY               by (4)
  = (AXA)ᴴYᴴY = AᴴYᴴY                    by (1)
  = (YA)ᴴY = YAY                         by (4)
  = Y                                    by (2).

Thus, there is only one matrix X satisfying the Moore-Penrose conditions.
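A NumPy sketch (not part of the original notes) that builds A⁺ from the SVD as in equation (2.124) and checks the four Moore-Penrose conditions:

```python
import numpy as np

def pinv_svd(A, tol=1e-12):
    """Generalized inverse A+ = V S^{-1} U^H from the SVD (eq. 2.124)."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    s_inv = np.array([1.0 / si if si > tol * s[0] else 0.0 for si in s])
    return Vh.conj().T @ (s_inv[:, None] * U.conj().T)

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
X = pinv_svd(A)

print(np.allclose(A @ X @ A, A), np.allclose(X @ A @ X, X))            # True True
print(np.allclose((A @ X).T, A @ X), np.allclose((X @ A).T, X @ A))    # True True
print(np.allclose(X, np.linalg.pinv(A)))                               # True
```

The `tol` cutoff treats singular values below a relative threshold as zero, which is how the zero block of S⁻¹ in (2.124) is realized in floating point.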
2.4.3 Singular Values and the Norm of a Matrix
Let A be an m×n matrix. By virtue of the SVD, we have

Ax = Σ_{k=1}^{r} σ_k (v_kᴴx) u_k for any n-vector x. (2.125)

Since the vectors u1, …, ur are orthonormal, we have

‖Ax‖² = Σ_{k=1}^{r} σ_k² |v_kᴴx|² ≤ σ1² Σ_{k=1}^{r} |v_kᴴx|² ≤ σ1² ‖x‖². (2.126)

The last inequality comes from the fact that x has the expansion x = Σ_{k=1}^{n} (v_kᴴx) v_k in terms of
the orthonormal basis v1, …, vn and hence

‖x‖² = Σ_{k=1}^{n} |v_kᴴx|².
Thus, we have

‖Ax‖ ≤ σ1‖x‖ for all x. (2.127)

Since Av1 = σ1u1, we have ‖Av1‖ = σ1 = σ1‖v1‖. Hence,

max_{x≠0} ‖Ax‖/‖x‖ = σ1, (2.128)

i.e., A can't stretch the length of a vector by a factor greater than σ1. One of the definitions of the
norm of a matrix is

‖A‖ = sup_{x≠0} ‖Ax‖/‖x‖. (2.129)

It follows from equations (2.128) and (2.129) that ‖A‖ = σ1 (the maximum singular value of A).
If A is of full rank (r = n), then it follows by a similar argument that

min_{x≠0} ‖Ax‖/‖x‖ = σn.

If A is an m×n matrix and B is an n×p matrix, then for every p-vector x we have

‖ABx‖ ≤ ‖A‖‖Bx‖ ≤ ‖A‖‖B‖‖x‖,

and hence ‖AB‖ ≤ ‖A‖‖B‖.
2.4.4 Low Rank Matrix Approximations
You can think of the rank of a matrix as a measure of redundancy. Matrices of low rank should
have lots of redundancy and hence should be capable of specification by fewer parameters than the
total number of entries. For example, if the matrix consists of the pixel values of a digital image,
then a lower rank approximation of this image should represent a form of image compression. We
will make this concept more precise in this section.
One choice for a low rank approximation to A is the matrix A_k = Σ_{i=1}^{k} σ_i u_i v_iᴴ for k < r. A_k is
a truncated SVD expansion of A. Clearly

A − A_k = Σ_{i=k+1}^{r} σ_i u_i v_iᴴ. (2.130)

Since the largest singular value of A − A_k is σ_{k+1}, we have

‖A − A_k‖ = σ_{k+1}. (2.131)
Suppose B is another m×n matrix of rank k. Then the null space N of B has dimension n − k. Let
w1, …, w_{n−k} be a basis for N. The n + 1 vectors w1, …, w_{n−k}, v1, …, v_{k+1} must be linearly
dependent, i.e., there are constants α1, …, α_{n−k} and β1, …, β_{k+1}, not all zero, such that

Σ_{i=1}^{n−k} α_i w_i + Σ_{i=1}^{k+1} β_i v_i = 0.

Not all of the α_i can be zero, since v1, …, v_{k+1} are linearly independent. Similarly, not all of the
β_i can be zero. Therefore, the vector h defined by

h = Σ_{i=1}^{n−k} α_i w_i = − Σ_{i=1}^{k+1} β_i v_i

is a nonzero vector that belongs to both N and ⟨v1, …, v_{k+1}⟩. By proper scaling, we can
assume that h is a vector with unit norm. Since h belongs to ⟨v1, …, v_{k+1}⟩, we have

h = Σ_{i=1}^{k+1} (v_iᴴh) v_i. (2.132)
Therefore,

‖h‖² = Σ_{i=1}^{k+1} |v_iᴴh|². (2.133)

Since Av_i = σ_i u_i for i = 1, …, r, it follows from equation (2.132) that

Ah = Σ_{i=1}^{k+1} (v_iᴴh) Av_i = Σ_{i=1}^{k+1} (v_iᴴh) σ_i u_i. (2.134)

Therefore,

‖Ah‖² = Σ_{i=1}^{k+1} |v_iᴴh|² σ_i² ≥ σ_{k+1}² Σ_{i=1}^{k+1} |v_iᴴh|² = σ_{k+1}² ‖h‖². (2.135)
Since h belongs to the null space N and has unit norm, we have

‖A − B‖² ≥ ‖(A − B)h‖² = ‖Ah‖² ≥ σ_{k+1}² ‖h‖² = σ_{k+1}². (2.136)

Combining equations (2.131) and (2.136), we obtain

‖A − B‖ ≥ σ_{k+1} = ‖A − A_k‖. (2.137)

Thus, A_k is the rank-k matrix that is closest to A.
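A NumPy illustration (not part of the original notes) of the truncated expansion A_k and the error identity (2.131):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 5))
U, s, Vh = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = (U[:, :k] * s[:k]) @ Vh[:k, :]     # A_k = sum_{i=1}^{k} s_i u_i v_i^H

err = np.linalg.norm(A - A_k, 2)         # spectral norm of A - A_k
print(np.isclose(err, s[k]))             # True: equals the first discarded sigma
```

For an image stored as a matrix of pixel values, keeping only the first k terms in this way is exactly the compression scheme mentioned above: k(m + n + 1) numbers instead of mn.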
2.4.5 The Condition Number of a Matrix
Suppose A is an n×n invertible matrix and x is the solution of the system of equations Ax = b.
We want to see how sensitive x is to perturbations of the matrix A. Let x + δx be the solution to
the perturbed system (A + δA)(x + δx) = b. Expanding the left-hand side of this equation and
neglecting the second order perturbation δA δx, we get

A δx + δA x = 0 or δx = −A⁻¹ δA x. (2.138)

It follows from equation (2.138) that

‖δx‖ ≤ ‖A⁻¹‖‖δA‖‖x‖

or

(‖δx‖/‖x‖) / (‖δA‖/‖A‖) ≤ ‖A⁻¹‖‖A‖. (2.139)

The quantity ‖A⁻¹‖‖A‖ is called the condition number of A and is denoted by κ(A), i.e.,

κ(A) = ‖A⁻¹‖‖A‖.

Thus, equation (2.139) can be written

(‖δx‖/‖x‖) / (‖δA‖/‖A‖) ≤ κ(A). (2.140)
We have seen previously that ‖A‖ = σ1, the largest singular value. Since A⁻¹ has the singular
value decomposition A⁻¹ = VS⁻¹Uᴴ, it follows that ‖A⁻¹‖ = 1/σn. Therefore, the condition
number is given by

κ(A) = σ1/σn. (2.141)

The condition number is a sort of aspect ratio of the hyperellipsoid into which A maps the unit
sphere.
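Equation (2.141) can be checked directly in NumPy (an illustration, not part of the original notes):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
s = np.linalg.svd(A, compute_uv=False)   # singular values, in decreasing order

kappa = s[0] / s[-1]                     # kappa(A) = sigma_1 / sigma_n (eq. 2.141)
print(np.isclose(kappa, np.linalg.cond(A, 2)))   # True: the 2-norm condition number
```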
2.4.6 Computation of the SVD
The methods for calculating the SVD are all variations of methods used to calculate eigenvalues
and eigenvectors of Hermitian matrices. The most natural procedure would be to follow the derivation
of the SVD and compute the squares of the singular values and the unitary matrix V by solving
the eigenproblem for AᴴA. The U matrix would then be obtained from AV. Unfortunately, this
procedure is not very accurate, due to the fact that the singular values of AᴴA are the squares of the
singular values of A. As a result the ratio of largest to smallest singular value can be much larger
for AᴴA than for A. There are, however, implicit methods that solve the eigenproblem for AᴴA
without ever explicitly forming AᴴA. Most of the SVD algorithms first reduce A to bidiagonal
form (all elements zero except the diagonal and first superdiagonal). This can be accomplished
using Householder reflections alternately on the left and right as shown in Figure 2.2.
A1 = U1ᴴA =  [ x x x x       A2 = A1V1 =  [ x x 0 0
               0 x x x                      0 x x x
               0 x x x                      0 x x x
               0 x x x                      0 x x x
               0 x x x ]                    0 x x x ]

A3 = U2ᴴA2 = [ x x 0 0       A4 = A3V2 =  [ x x 0 0
               0 x x x                      0 x x 0
               0 0 x x                      0 0 x x
               0 0 x x                      0 0 x x
               0 0 x x ]                    0 0 x x ]

A5 = U3ᴴA4 = [ x x 0 0       A6 = U4ᴴA5 = [ x x 0 0
               0 x x 0                      0 x x 0
               0 0 x x                      0 0 x x
               0 0 0 x                      0 0 0 x
               0 0 0 x ]                    0 0 0 0 ]

Figure 2.2: Householder reduction of a matrix to bidiagonal form.
Since the Householder reflections applied on the right don't try to zero all the elements
to the right of the diagonal, they don't affect the zeroes already obtained in the columns. We have
seen that, even in the complex case, the Householder matrices can be chosen so that the resulting
bidiagonal matrix is real. Notice also that when the number of rows m is greater than the number
of columns n, the reduction produces zero rows after row n. Similarly, when n > m, the reduction
produces zero columns after column m. If we replace the products of the Householder reflections
by the unitary matrices Û and V̂, the reduction to a bidiagonal B can be written as

B = ÛᴴAV̂ or A = ÛBV̂ᴴ. (2.142)
If B has the SVD B = ŪΣV̄ᵀ, then A has the SVD

A = Û(ŪΣV̄ᵀ)V̂ᴴ = (ÛŪ)Σ(V̂V̄)ᴴ = UΣVᴴ,

where U = ÛŪ and V = V̂V̄. Thus, it is sufficient to find the SVD of the real bidiagonal matrix
B. Moreover, it is not necessary to carry along the zero rows or columns of B. For if the square
portion B1 of B has the SVD B1 = U1Σ1V1ᵀ, then

B = [ B1; 0 ] = [ U1Σ1V1ᵀ; 0 ] = [ U1 0; 0 I ][ Σ1; 0 ] V1ᵀ (2.143)

or

B = (B1, 0) = (U1Σ1V1ᵀ, 0) = U1(Σ1, 0) [ V1 0; 0 I ]ᵀ. (2.144)

Thus, it is sufficient to consider the computation of the SVD for a real, square, bidiagonal matrix
B.
In addition to the implicit methods for finding the eigenvalues of BᵀB, some methods look instead
at the symmetric matrix [ 0 Bᵀ; B 0 ]. If the SVD of B is B = UΣVᵀ, then [ 0 Bᵀ; B 0 ] has the
eigenequation

[ 0  Bᵀ   [ V   V     =  [ V   V    [ Σ   0
  B  0  ]   U  −U ]        U  −U ]    0  −Σ ]. (2.145)

In addition, the matrix [ 0 Bᵀ; B 0 ] can be reduced to a real tridiagonal matrix T by the relation

T = Pᵀ [ 0  Bᵀ
         B  0  ] P (2.146)

where P = (e1, e_{n+1}, e2, e_{n+2}, …, en, e_{2n}) is a permutation matrix formed by a rearrangement
of the columns e1, e2, …, e_{2n} of the 2n×2n identity matrix. The matrix P is unitary and is
sometimes called the perfect shuffle, since its operation on a vector mimics a perfect card shuffle of
the components. The algorithms based on this double-size symmetric matrix don't actually form
the double-size matrix, but make efficient use of the symmetries involved in this eigenproblem.
For those interested in the details of the various SVD algorithms, I would refer you to the book by
Demmel [4].
In Matlab the SVD can be obtained by the call [U,S,V]=svd(A). In LAPACK the general driver
routines for the SVD are SGESVD, DGESVD, and CGESVD depending on whether the matrix is
real single precision, real double precision, or complex.
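The NumPy analogue of the Matlab call above is `np.linalg.svd` (an aside, not part of the original notes):

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
U, s, Vh = np.linalg.svd(A, full_matrices=False)   # reduced form: A = U diag(s) Vh

print(np.allclose(U @ np.diag(s) @ Vh, A))   # True
print(np.all(np.diff(s) <= 0))               # True: singular values sorted decreasing
```

Note that, unlike Matlab's `svd`, NumPy returns the singular values as a vector `s` and the factor `Vh` already transposed (conjugate-transposed in the complex case).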
Chapter 3
Eigenvalue Problems
Eigenvalue problems occur quite often in physics. For example, in quantum mechanics eigenvalues
correspond to certain energy states; in structural mechanics problems eigenvalues often
correspond to resonance frequencies of the structure; and in time evolution problems eigenvalues
are often related to the stability of the system.

Let A be an m×m square matrix. A nonzero vector x is an eigenvector of A, and λ is its corresponding
eigenvalue, if

Ax = λx.

The set of vectors

V_λ = {x : Ax = λx}

is a subspace called the eigenspace corresponding to λ. The equation Ax = λx is equivalent to
(A − λI)x = 0. If λ is an eigenvalue, then the matrix A − λI is singular and hence

det(A − λI) = 0.

Thus, the eigenvalues of A are roots of a polynomial equation of order m. This polynomial equation
is called the characteristic equation of A. Conversely, if p(z) = a0 + a1z + ⋯ + a_{n−1}z^{n−1} + a_n z^n
is an arbitrary polynomial of degree n (a_n ≠ 0), then the companion matrix

[ 0               −a0/an
  1  0            −a1/an
     1  0         −a2/an
        ⋱  ⋱       ⋮
           1  0   −a_{n−2}/an
              1   −a_{n−1}/an ]

has p(z) = 0 as its characteristic equation.
In some problems an eigenvalue might correspond to a multiple root of the characteristic equation. The multiplicity of the root $\lambda$ is called its algebraic multiplicity. The dimension of the eigenspace $V_\lambda$ is called its geometric multiplicity. If for some eigenvalue $\lambda$ of $A$ the algebraic multiplicity of $\lambda$ does not equal its geometric multiplicity, this eigenvalue is said to be defective. A matrix with one or more defective eigenvalues is said to be a defective matrix. An example of a defective matrix is the matrix

$$\begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}.$$

This matrix has the single eigenvalue 2 with algebraic multiplicity 3. However, the eigenspace corresponding to the eigenvalue 2 has dimension 1: all the eigenvectors are multiples of $e_1$. In these notes we will only consider eigenvalue problems involving Hermitian matrices ($A^H = A$). We will see that all such matrices are non-defective.
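As a quick numerical check of this example (the code is illustrative only), the geometric multiplicity is the dimension of the null space of $A - 2I$, i.e., $n$ minus the rank of $A - 2I$:

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 2.0]])

# Algebraic multiplicity: 2 is a triple root of det(A - z I) = (2 - z)^3.
eigenvalues = np.linalg.eigvals(A)

# Geometric multiplicity: dim null(A - 2I) = n - rank(A - 2I).
n = A.shape[0]
geometric_multiplicity = n - np.linalg.matrix_rank(A - 2.0 * np.eye(n))
```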
If $S$ is a nonsingular $m \times m$ matrix, then the matrix $S^{-1}AS$ is said to be similar to $A$. Since

$$\det(S^{-1}AS - \lambda I) = \det\bigl(S^{-1}(A - \lambda I)S\bigr) = \det(S^{-1}) \det(A - \lambda I) \det(S) = \det(A - \lambda I),$$

it follows that $S^{-1}AS$ and $A$ have the same characteristic equation and hence the same eigenvalues. It can be shown that a Hermitian matrix $A$ always has a complete set of orthonormal eigenvectors. If we form the unitary matrix $U$ whose columns are the eigenvectors belonging to this orthonormal set, then

$$AU = U\Lambda \quad \text{or} \quad U^H A U = \Lambda \tag{3.1}$$

where $\Lambda$ is a diagonal matrix whose diagonal entries are the eigenvalues. Thus, a Hermitian matrix is similar to a diagonal matrix. Since a diagonal matrix is clearly non-defective, it follows that all Hermitian matrices are non-defective.
If $e$ is a unit eigenvector of the Hermitian matrix $A$ and $\lambda$ is the corresponding eigenvalue, then

$$Ae = \lambda e \quad \text{and hence} \quad \lambda = e^H A e.$$

It follows that $\bar{\lambda} = (e^H A e)^H = e^H A^H e = e^H A e = \lambda$, i.e., the eigenvalues of a Hermitian matrix are real.
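Both facts — that $U^H A U$ is diagonal, equation (3.1), and that the eigenvalues are real — are easy to verify numerically. A minimal sketch using NumPy's `eigh` (the random Hermitian matrix is an arbitrary example of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = (B + B.conj().T) / 2          # Hermitian by construction: A^H = A

lam, U = np.linalg.eigh(A)        # eigh assumes and exploits A^H = A
Lambda = U.conj().T @ A @ U       # should equal diag(lam), cf. (3.1)
```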
It was shown by Abel, Galois, and others in the nineteenth century that there can be no algebraic expression for the roots of a polynomial equation whose order is greater than four. Since eigenvalues are roots of the characteristic equation, and since the roots of any polynomial are the eigenvalues of some matrix, there can be no purely algebraic method for computing eigenvalues. Thus, algorithms for finding eigenvalues must at some stage be iterative in nature. The methods to be discussed here first reduce the Hermitian matrix $A$ to a real, symmetric, tridiagonal matrix $T$ by means of a unitary similarity transformation. The eigenvalues of $T$ are then found using certain iterative procedures. The most common iterative procedures are the $QR$ algorithm and the divide-and-conquer algorithm.
Let $v_1, \dots, v_n$ be the orthonormal eigenvectors of $A$ and let $\lambda_1, \dots, \lambda_n$ be the corresponding eigenvalues. We will assume that the eigenvalues and eigenvectors are so ordered that

$$|\lambda_1| \geq |\lambda_2| \geq \cdots \geq |\lambda_n|.$$

We will assume further that $|\lambda_1| > |\lambda_2|$. Let $v$ be an arbitrary vector with $\|v\| = 1$. Then there exist constants $c_1, \dots, c_n$ such that

$$v = c_1 v_1 + \cdots + c_n v_n. \tag{3.2}$$

We will make the further assumption that $c_1 \neq 0$. Successively applying $A$ to equation (3.2), we obtain

$$A^k v = c_1 A^k v_1 + \cdots + c_n A^k v_n = c_1 \lambda_1^k v_1 + \cdots + c_n \lambda_n^k v_n. \tag{3.3}$$

You can see from equation (3.3) that the term $c_1 \lambda_1^k v_1$ will eventually dominate, and thus $A^k v$, if properly scaled at each step to prevent overflow, will approach a multiple of the eigenvector $v_1$. This convergence can be slow if there are other eigenvalues close in magnitude to $\lambda_1$. The condition $c_1 \neq 0$ is equivalent to the condition

$$\langle v \rangle \cap \langle v_2, \dots, v_n \rangle = \{0\}.$$
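The resulting power method, with rescaling at each step to prevent overflow, can be sketched as follows (the function name, test matrix, and iteration count are illustrative choices of mine):

```python
import numpy as np

def power_method(A, num_iters=200, seed=0):
    """Power iteration: repeatedly apply A and rescale to unit norm.

    Returns an approximate dominant eigenpair (lambda_1, v_1).
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)            # the starting vector v of equation (3.2)
    for _ in range(num_iters):
        w = A @ v                     # one application of A
        v = w / np.linalg.norm(w)     # rescale to prevent overflow
    lam = v @ A @ v                   # Rayleigh quotient estimate of lambda_1
    return lam, v

# Symmetric test matrix with a well-separated dominant eigenvalue
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
lam, v = power_method(A)
```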
3.3 The Rayleigh Quotient
The Rayleigh quotient of a vector $x$ is the real number

$$r(x) = \frac{x^T A x}{x^T x}.$$
If $x$ is an eigenvector of $A$ corresponding to the eigenvalue $\lambda$, then $r(x) = \lambda$. If $x$ is any nonzero vector, then

$$\begin{aligned}
\|Ax - \lambda x\|^2 &= (x^T A^T - \lambda x^T)(Ax - \lambda x) \\
&= x^T A^T A x - 2\lambda x^T A x + \lambda^2 x^T x \\
&= x^T A^T A x - 2\lambda r(x)\, x^T x + \lambda^2 x^T x + r^2(x)\, x^T x - r^2(x)\, x^T x \\
&= x^T A^T A x + x^T x \bigl[\lambda - r(x)\bigr]^2 - r^2(x)\, x^T x.
\end{aligned}$$

Thus, $\lambda = r(x)$ minimizes $\|Ax - \lambda x\|$. If $x$ is an approximate eigenvector, then $r(x)$ is an approximate eigenvalue.
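A quick numerical illustration that $\lambda = r(x)$ minimizes $\|Ax - \lambda x\|$ (the matrix and vector below are arbitrary choices of mine):

```python
import numpy as np

def rayleigh_quotient(A, x):
    """r(x) = x^T A x / (x^T x) for real symmetric A and nonzero x."""
    return (x @ A @ x) / (x @ x)

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
x = np.array([1.0, 0.5])              # an arbitrary nonzero (non-eigen) vector
r = rayleigh_quotient(A, x)

def residual(mu):
    """||Ax - mu*x|| as a function of the scalar shift mu."""
    return np.linalg.norm(A @ x - mu * x)
```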
3.4 Inverse Iteration with Shifts
For any $\mu$ that is not an eigenvalue of $A$, the matrix $(A - \mu I)^{-1}$ has the same eigenvectors as $A$ and has eigenvalues $(\lambda_j - \mu)^{-1}$, where $\{\lambda_j\}$ are the eigenvalues of $A$. Suppose $\mu$ is close to the eigenvalue $\lambda_i$. Then $(\lambda_i - \mu)^{-1}$ will be large compared to $(\lambda_j - \mu)^{-1}$ for $j \neq i$. If we apply power iteration to $(A - \mu I)^{-1}$, the process will converge to a multiple of the eigenvector $v_i$ corresponding to $\lambda_i$. This procedure is called inverse iteration with shifts. Although the power method is not used in practice, the inverse power method with shifts is frequently used to compute eigenvectors once an approximate eigenvalue has been obtained.
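A minimal sketch of inverse iteration with a fixed shift (solving with the shifted matrix rather than forming its inverse; the shift and test matrix are arbitrary choices of mine):

```python
import numpy as np

def inverse_iteration(A, mu, num_iters=50, seed=0):
    """Power iteration applied to (A - mu*I)^{-1}.

    Converges to the eigenvector whose eigenvalue is closest to the
    shift mu (assumed not itself an eigenvalue of A).
    """
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    M = A - mu * np.eye(n)
    for _ in range(num_iters):
        w = np.linalg.solve(M, v)     # solve rather than invert explicitly
        v = w / np.linalg.norm(w)
    return v

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
v = inverse_iteration(A, mu=1.2)      # shift near the smallest eigenvalue
lam = v @ A @ v                       # Rayleigh quotient of the result
```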
3.5 Rayleigh Quotient Iteration
The Rayleigh quotient can be used to obtain the shifts at each stage of inverse iteration. The
procedure can be summarized as follows.
1. Choose a starting vector $v^{(0)}$ of unit magnitude.

2. Let $\lambda^{(0)} = (v^{(0)})^T A v^{(0)}$ be the corresponding Rayleigh quotient.

3. For $k = 1, 2, \dots$:

   (a) Solve $\bigl(A - \lambda^{(k-1)} I\bigr) w = v^{(k-1)}$ for $w$, i.e., compute $\bigl(A - \lambda^{(k-1)} I\bigr)^{-1} v^{(k-1)}$.

   (b) Normalize $w$ to obtain $v^{(k)} = w/\|w\|$.

   (c) Let $\lambda^{(k)} = (v^{(k)})^T A v^{(k)}$ be the corresponding Rayleigh quotient.
It can be shown that the convergence of Rayleigh quotient iteration is ultimately cubic. Cubic
convergence triples the number of significant digits on each iteration.
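The steps above can be sketched as follows (the starting vector and the safeguard against an exactly singular shift are my own choices):

```python
import numpy as np

def rayleigh_quotient_iteration(A, v0, num_iters=20):
    """Inverse iteration in which the shift is updated to the current
    Rayleigh quotient at every step (ultimately cubically convergent)."""
    n = A.shape[0]
    v = v0 / np.linalg.norm(v0)
    lam = v @ A @ v                        # lambda^(0)
    for _ in range(num_iters):
        try:
            w = np.linalg.solve(A - lam * np.eye(n), v)
        except np.linalg.LinAlgError:
            break                          # the shift hit an eigenvalue exactly
        v = w / np.linalg.norm(w)          # v^(k)
        lam = v @ A @ v                    # lambda^(k)
    return lam, v

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
lam, v = rayleigh_quotient_iteration(A, v0=np.array([1.0, 0.1, 0.0]))
```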
3.6 The Basic QR Method
The QR method was discovered independently by Francis [6] and Kublanovskaya [11] in 1961.
It is one of the standard methods for finding eigenvalues. The discussion in this section is based
largely on the paper Understanding the QR Algorithm by Watkins [13]. As before, we will assume that the matrix $A$ is real and symmetric. Therefore, there is an orthonormal basis $v_1, \dots, v_n$ such that $Av_j = \lambda_j v_j$ for each $j$. We will assume that the eigenvalues $\lambda_j$ are ordered so that $|\lambda_1| \geq |\lambda_2| \geq \cdots \geq |\lambda_n|$.
The QR algorithm can be summarized as follows:
1. Choose $A_0 = A$.

2. For $m = 1, 2, \dots$ compute

   $$A_{m-1} = Q_m R_m \qquad (QR \text{ factorization})$$
   $$A_m = R_m Q_m$$

3. Stop when $A_m$ is approximately diagonal.
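The basic (unshifted) $QR$ iteration translates directly into code (a minimal sketch with an arbitrary test matrix; a practical implementation would first reduce $A$ to tridiagonal form and use shifts, as noted earlier):

```python
import numpy as np

def qr_algorithm(A, num_iters=500):
    """Unshifted QR iteration: factor A_{m-1} = Q_m R_m, then set A_m = R_m Q_m."""
    Am = A.copy()
    for _ in range(num_iters):
        Q, R = np.linalg.qr(Am)
        Am = R @ Q               # similar to A_{m-1}: R Q = Q^T A_{m-1} Q
    return Am

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
Am = qr_algorithm(A)             # approximately diagonal; diagonal = eigenvalues
```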
It is probably not obvious what this algorithm has to do with eigenvalues. We will show that the QR
method is a way of organizing simultaneous iteration, which in turn is a multivector generalization
of the power method.
We can apply the power method to subspaces as well as to single vectors. Suppose $S$ is a $k$-dimensional subspace. We can compute the sequence of subspaces $S, AS, A^2 S, \dots$. Under certain conditions this sequence will converge to the subspace spanned by the eigenvectors $v_1, v_2, \dots, v_k$ corresponding to the $k$ largest eigenvalues of $A$. We will not provide a rigorous convergence proof, but we will attempt to make this result seem plausible. Assume that $|\lambda_k| > |\lambda_{k+1}|$ and define the subspaces

$$T = \langle v_1, \dots, v_k \rangle \qquad U = \langle v_{k+1}, \dots, v_n \rangle.$$

We will first show that all the null vectors of $A$ lie in $U$. Suppose $v$ is a null vector of $A$, i.e., $Av = 0$. We can expand $v$ in terms of the basis $v_1, \dots, v_n$, giving

$$v = c_1 v_1 + \cdots + c_k v_k + c_{k+1} v_{k+1} + \cdots + c_n v_n.$$

Thus,

$$Av = c_1 \lambda_1 v_1 + \cdots + c_k \lambda_k v_k + c_{k+1} \lambda_{k+1} v_{k+1} + \cdots + c_n \lambda_n v_n = 0.$$

Since the vectors $\{v_j\}$ are linearly independent and $|\lambda_1| \geq \cdots \geq |\lambda_k| > 0$, it follows that $c_1 = c_2 = \cdots = c_k = 0$, i.e., $v$ belongs to the subspace $U$. We will now make the additional assumption $S \cap U = \{0\}$. This assumption is analogous to the assumption $c_1 \neq 0$ in the power method. If $x$ is a nonzero vector in $S$, then we can write
$$x = \underbrace{c_1 v_1 + c_2 v_2 + \cdots + c_k v_k}_{\text{component in } T} + \underbrace{c_{k+1} v_{k+1} + \cdots + c_n v_n}_{\text{component in } U}.$$

Thus,

$$A^m x / \lambda_k^m = c_1 (\lambda_1/\lambda_k)^m v_1 + \cdots + c_{k-1} (\lambda_{k-1}/\lambda_k)^m v_{k-1} + c_k v_k + c_{k+1} (\lambda_{k+1}/\lambda_k)^m v_{k+1} + \cdots + c_n (\lambda_n/\lambda_k)^m v_n.$$
Since $x$ doesn't belong to $U$, at least one of the coefficients $c_1, \dots, c_k$ must be nonzero. Notice that the first $k$ terms on the right-hand side do not decrease in absolute value as $m \to \infty$, whereas the remaining terms approach zero. Thus, $A^m x$, if properly scaled, approaches the subspace $T$ as $m \to \infty$. In the limit $A^m S$ must approach a subspace of $T$. Since $S \cap U = \{0\}$, $A$ can have no null vectors in $S$. Thus, $A$ is invertible on $S$. It follows that all of the subspaces $A^m S$ have dimension $k$, and hence the limit cannot be a proper subspace of $T$, i.e., $A^m S \to T$ as $m \to \infty$.
Numerically, we can't iterate on an entire subspace. Therefore, we pick a basis of this subspace and iterate on this basis. Let $q_1^0, \dots, q_k^0$ be a basis of $S$. Since $A$ is invertible on $S$, $Aq_1^0, \dots, Aq_k^0$ is a basis of $AS$. Similarly, $A^m q_1^0, \dots, A^m q_k^0$ is a basis of $A^m S$ for all $m$. Thus, in principle we can iterate on a basis of $S$ to obtain bases for $AS, A^2 S, \dots$. However, for large $m$ these bases become ill-conditioned, since all the vectors tend to point in the direction of the eigenvector corresponding to the eigenvalue of largest absolute value. To avoid this we orthonormalize the basis at each step. Thus, given an orthonormal basis $q_1^m, \dots, q_k^m$ of $A^m S$, we compute $Aq_1^m, \dots, Aq_k^m$ and then orthonormalize these vectors (using something like the Gram-Schmidt process) to obtain an orthonormal basis $q_1^{m+1}, \dots, q_k^{m+1}$ of $A^{m+1} S$. This process is called simultaneous iteration. Notice that this process of orthonormalization has the property

$$\langle Aq_1^m, \dots, Aq_i^m \rangle = \langle q_1^{m+1}, \dots, q_i^{m+1} \rangle \quad \text{for } i = 1, \dots, k.$$
Let us consider now what happens when we apply simultaneous iteration to the complete set of orthonormal vectors $e_1, \dots, e_n$, where $e_k$ is the $k$-th column of the identity matrix. Let us define

$$S_k = \langle e_1, \dots, e_k \rangle, \qquad T_k = \langle v_1, \dots, v_k \rangle, \qquad U_k = \langle v_{k+1}, \dots, v_n \rangle$$

for $k = 1, 2, \dots, n-1$. We also assume that $S_k \cap U_k = \{0\}$ and $|\lambda_k| > |\lambda_{k+1}| > 0$ for each $1 \leq k \leq n-1$. It follows from our previous discussion that $A^m S_k \to T_k$ as $m \to \infty$. In terms of bases, the orthonormal vectors $q_1^m, \dots, q_n^m$ will converge to an orthonormal basis $q_1, \dots, q_n$ such that $T_k = \langle q_1, \dots, q_k \rangle$ for each $k = 1, \dots, n-1$. Each of the subspaces $T_k$ is invariant under $A$, i.e., $AT_k \subseteq T_k$. We will now look at a property of invariant subspaces. Suppose $T$ is an invariant subspace of $A$. Let $Q = (Q_1, Q_2)$ be an orthogonal matrix such that the columns of $Q_1$ form a basis of $T$. Then

$$Q^T A Q = \begin{pmatrix} Q_1^T A Q_1 & Q_1^T A Q_2 \\ Q_2^T A Q_1 & Q_2^T A Q_2 \end{pmatrix} = \begin{pmatrix} Q_1^T A Q_1 & 0 \\ 0 & Q_2^T A Q_2 \end{pmatrix},$$

i.e., the basis consisting of the columns of $Q$ block diagonalizes $A$. Let $Q$ be the matrix with columns $q_1, \dots, q_n$. Since each $T_k$ is invariant under $A$, the matrix $Q^T A Q$ has the block diagonal form

$$Q^T A Q = \begin{pmatrix} A_1 & 0 \\ 0 & A_2 \end{pmatrix} \quad \text{where } A_1 \text{ is } k \times k$$

for each $k = 1, \dots, n-1$. Therefore, $Q^T A Q$ must be diagonal. The diagonal entries are the eigenvalues of $A$. If we define $A_m = Q_m^T A Q_m$ where $Q_m = (q_1^m, \dots, q_n^m)$, then $A_m$ will become approximately diagonal for large $m$.
We can summarize simultaneous iteration as follows:

1. We start with the orthogonal matrix $Q_0 = I$, whose columns form a basis of $n$-space.

2. For $m = 1, 2, \dots$ we compute

$$Z_m = A Q_{m-1} \qquad \text{power iteration step} \tag{3.4a}$$
$$Z_m = Q_m R_m \qquad \text{orthonormalize columns of } Z_m \tag{3.4b}$$
$$A_m = Q_m^T A Q_m \qquad \text{test for diagonal matrix} \tag{3.4c}$$
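These three steps translate directly into code (a minimal sketch of mine; the QR factorization plays the role of the orthonormalization in (3.4b)):

```python
import numpy as np

def simultaneous_iteration(A, num_iters=500):
    """Steps (3.4a)-(3.4c): multiply by A, re-orthonormalize via QR,
    and form Q_m^T A Q_m, which should approach a diagonal matrix."""
    n = A.shape[0]
    Q = np.eye(n)                     # Q_0 = I
    for _ in range(num_iters):
        Z = A @ Q                     # (3.4a) power iteration step
        Q, _ = np.linalg.qr(Z)        # (3.4b) orthonormalize columns of Z_m
    return Q.T @ A @ Q, Q             # (3.4c) test for diagonal matrix

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
Am, Q = simultaneous_iteration(A)
```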
The $QR$ algorithm is an efficient way to organize these calculations. Equations (3.4a) and (3.4b) can be combined to give

$$A Q_{m-1} = Q_m R_m. \tag{3.5}$$

Combining equations (3.4c) and (3.5), we get

$$A_{m-1} = Q_{m-1}^T A Q_{m-1} = Q_{m-1}^T (Q_m R_m) = (Q_{m-1}^T Q_m) R_m = \hat{Q}_m R_m \tag{3.6}$$

where $\hat{Q}_m = Q_{m-1}^T Q_m$. Equation (3.5) can be rewritten as

$$Q_m^T A Q_{m-1} = R_m. \tag{3.7}$$

Combining equations (3.4c) and (3.7), we get

$$A_m = Q_m^T A Q_m = (Q_m^T A Q_{m-1}) Q_{m-1}^T Q_m = R_m (Q_{m-1}^T Q_m) = R_m \hat{Q}_m. \tag{3.8}$$

Equation (3.6) is a $QR$ factorization of $A_{m-1}$. Equation (3.8) shows that $A_m$ has the same $Q$ and $R$ factors but with their order reversed. Thus, the $QR$ algorithm generates the matrices $A_m$ recursively without having to compute $Z_m$ and $Q_m$ at each step. Note that the orthogonal matrices $\hat{Q}_m$ and $Q_m$ satisfy the relation

$$\hat{Q}_1 \hat{Q}_2 \cdots \hat{Q}_k = (Q_0^T Q_1)(Q_1^T Q_2) \cdots (Q_{k-1}^T Q_k) = Q_k.$$
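This equivalence can be checked numerically. The sketch below (my own check) runs the two recursions side by side; the iterates agree up to the sign ambiguity in the computed QR factorizations, which can flip signs of off-diagonal entries but leaves the diagonal unchanged:

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
n = A.shape[0]

Am_qr = A.copy()              # QR-algorithm iterate A_m
Q_sim = np.eye(n)             # simultaneous-iteration factor Q_m (Q_0 = I)
for _ in range(5):
    Qhat, R = np.linalg.qr(Am_qr)
    Am_qr = R @ Qhat                      # A_m = R_m Q_m, equation (3.8)
    Q_sim, _ = np.linalg.qr(A @ Q_sim)    # orthonormalize A Q_{m-1}, (3.4a)-(3.4b)
Am_sim = Q_sim.T @ A @ Q_sim              # A_m = Q_m^T A Q_m, (3.4c)
```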
We have now seen that the $QR$ method can be considered as a generalization of the power method. We will see that the $QR$ algorithm is also related to inverse power iteration. In fact, we have the following duality result.
Theorem 3. If $A$ is an $n \times n$ symmetric nonsingular matrix and $S$ and $S^\perp$ are orthogonal complementary subspaces, then $A^m S$ and $A^{-m} S^\perp$ are also orthogonal complements.

Proof. If $x$ and $y$ are $n$-vectors, then

$$x \cdot y = x^T y = x^T A^T (A^T)^{-1} y = (Ax)^T (A^T)^{-1} y = (Ax)^T A^{-1} y = Ax \cdot A^{-1} y.$$

Applying this result repeatedly, we obtain

$$x \cdot y = A^m x \cdot A^{-m} y.$$
It is clear from this relation that every element in $A^m S$ is orthogonal to every element in $A^{-m} S^\perp$. Let $q_1, \dots, q_k$ be a basis of $S$ and let $q_{k+1}, \dots, q_n$ be a basis of $S^\perp$. Then $A^m q_1, \dots, A^m q_k$ is a basis of $A^m S$ and $A^{-m} q_{k+1}, \dots, A^{-m} q_n$ is a basis of $A^{-m} S^\perp$. Suppose there exist scalars $c_1, \dots, c_n$ such that

$$c_1 A^m q_1 + \cdots + c_k A^m q_k + c_{k+1} A^{-m} q_{k+1} + \cdots + c_n A^{-m} q_n = 0. \tag{3.9}$$

Taking the dot product of this relation with $c_1 A^m q_1 + \cdots + c_k A^m q_k$, we obtain

$$\|c_1 A^m q_1 + \cdots + c_k A^m q_k\| = 0$$

and hence $c_1 A^m q_1 + \cdots + c_k A^m q_k = 0$. Since $A^m q_1, \dots, A^m q_k$ are linearly independent, it follows that $c_1 = c_2 = \cdots = c_k = 0$. In a similar manner we obtain $c_{k+1} = \cdots = c_n = 0$. Therefore, $A^m q_1, \dots, A^m q_k, A^{-m} q_{k+1}, \dots, A^{-m} q_n$ are linearly independent and hence form a basis for $n$-space. Thus, $A^m S$ and $A^{-m} S^\perp$ are orthogonal complements.
It can be seen from this theorem that performing power iteration on the subspaces $S_k$ is also performing inverse power iteration on $S_k^\perp$. Since

$$\langle q_1^m, \dots, q_k^m \rangle = \langle A^m e_1, \dots, A^m e_k \rangle,$$

Theorem 3 implies that

$$\langle q_{k+1}^m, \dots, q_n^m \rangle = \langle A^{-m} e_{k+1}, \dots, A^{-m} e_n \rangle.$$

For $k = n-1$ we have $\langle q_n^m \rangle = \langle A^{-m} e_n \rangle$. Thus, $q_n^m$ is the result at the $m$-th step of applying the inverse power method to $e_n$. It follows that $q_n^m$ should converge to an eigenvector corresponding to the smallest eigenvalue in absolute value.