8/11/2019 NumLinAlg
1/72
Notes on Numerical Linear
Algebra
Dr. George W Benthien
December 9, 2006
E-mail: [email protected]
Contents

Preface

1 Mathematical Preliminaries
  1.1 Matrices and Vectors
  1.2 Vector Spaces
    1.2.1 Linear Independence and Bases
    1.2.2 Inner Product and Orthogonality
    1.2.3 Matrices As Linear Transformations
  1.3 Derivatives of Vector Functions
    1.3.1 Newton's Method

2 Solution of Systems of Linear Equations
  2.1 Gaussian Elimination
    2.1.1 The Basic Procedure
    2.1.2 Row Pivoting
    2.1.3 Iterative Refinement
  2.2 Cholesky Factorization
  2.3 Elementary Unitary Matrices and the QR Factorization
    2.3.1 Gram-Schmidt Orthogonalization
    2.3.2 Householder Reflections
    2.3.3 Complex Householder Matrices
    2.3.4 Givens Rotations
    2.3.5 Complex Givens Rotations
    2.3.6 QR Factorization Using Householder Reflectors
    2.3.7 Uniqueness of the Reduced QR Factorization
    2.3.8 Solution of Least Squares Problems
  2.4 The Singular Value Decomposition
    2.4.1 Derivation and Properties of the SVD
    2.4.2 The SVD and Least Squares Problems
    2.4.3 Singular Values and the Norm of a Matrix
    2.4.4 Low Rank Matrix Approximations
    2.4.5 The Condition Number of a Matrix
    2.4.6 Computation of the SVD

3 Eigenvalue Problems
  3.1 Reduction to Tridiagonal Form
  3.2 The Power Method
  3.3 The Rayleigh Quotient
  3.4 Inverse Iteration with Shifts
  3.5 Rayleigh Quotient Iteration
  3.6 The Basic QR Method
    3.6.1 The QR Method with Shifts
  3.7 The Divide-and-Conquer Method

4 Iterative Methods
  4.1 The Lanczos Method
  4.2 The Conjugate Gradient Method
  4.3 Preconditioning

Bibliography
List of Figures

2.1 Householder reflection
2.2 Householder reduction of a matrix to bidiagonal form
3.1 Graph of f(λ) = 1 + .5/(d_1 - λ) + .5/(d_2 - λ) + .5/(d_3 - λ) + .5/(d_4 - λ)
3.2 Graph of f(λ) = 1 + .5/(d_1 - λ) + .01/(d_2 - λ) + .5/(d_3 - λ) + .5/(d_4 - λ)
Preface
The purpose of these notes is to present some of the standard procedures of numerical linear algebra from the perspective of a user rather than a computer specialist. You will not find extensive error analysis or programming details. The aim is to give the user a general idea of what the numerical procedures are doing. You can find more extensive discussions in the references:

Applied Numerical Linear Algebra by J. Demmel, SIAM, 1997
Numerical Linear Algebra by L. Trefethen and D. Bau, SIAM, 1997
Matrix Computations by G. Golub and C. Van Loan, Johns Hopkins University Press, 1996
The notes are divided into four chapters. The first chapter presents some of the notation used in these notes and reviews some of the basic results of linear algebra. The second chapter discusses methods for solving linear systems of equations, the third chapter discusses eigenvalue problems, and the fourth discusses iterative methods. Of course we cannot discuss every possible method, so I have tried to pick out those that I believe are the most used. I have assumed that the user has some basic knowledge of linear algebra.
Chapter 1
Mathematical Preliminaries
In this chapter we will describe some of the notation that will be used in these notes and review some of the basic results from Linear Algebra.
1.1 Matrices and Vectors
A matrix is a two-dimensional array of real or complex numbers arranged in rows and columns. If a matrix A has m rows and n columns, we say that it is an m × n matrix. We denote the element in the i-th row and j-th column of A by a_ij. The matrix A is often written in the form

    A = [ a11 ... a1n ]
        [  :        : ]
        [ am1 ... amn ]

We sometimes write A = (a_1, ..., a_n) where a_1, ..., a_n are the columns of A. A vector (or n-vector) is an n × 1 matrix. The collection of all n-vectors is denoted by R^n if the elements (components) are all real and by C^n if the elements are complex. We define the sum of two m × n matrices componentwise, i.e., the i,j entry of A + B is a_ij + b_ij. Similarly, we define the multiplication of a scalar α times a matrix A to be the matrix whose i,j component is αa_ij.

If A is a real matrix with components a_ij, then the transpose of A (denoted by A^T) is the matrix whose i,j component is a_ji, i.e., rows and columns are interchanged. If A is a matrix with complex components, then A^H is the matrix whose i,j-th component is the complex conjugate of the j,i-th component of A. We denote the complex conjugate of a by ā. Thus, (A^H)_ij = ā_ji. A real matrix A is said to be symmetric if A = A^T. A complex matrix is said to be Hermitian if A = A^H. Notice that the diagonal elements of a Hermitian matrix must be real. The n × n matrix whose diagonal components are all one and whose off-diagonal components are all zero is called the identity matrix and is denoted by I.
If A is an m × k matrix and B is a k × n matrix, then the product AB is the m × n matrix with components given by

    (AB)_ij = Σ_{r=1}^{k} a_ir b_rj.

The matrix product AB is only defined when the number of columns of A is the same as the number of rows of B. In particular, the product of an m × n matrix A and an n-vector x is given by

    (Ax)_i = Σ_{k=1}^{n} a_ik x_k,    i = 1, ..., m.

It can be easily verified that IA = A if the number of columns in I equals the number of rows in A. It can also be shown that (AB)^T = B^T A^T and (AB)^H = B^H A^H. In addition, we have (A^T)^T = A and (A^H)^H = A.
1.2 Vector Spaces
R^n and C^n together with the operations of addition and scalar multiplication are examples of a structure called a vector space. A vector space V is a collection of vectors for which addition and scalar multiplication are defined in such a way that the following conditions hold:

1. If x and y belong to V and α is a scalar, then x + y and αx belong to V.
2. x + y = y + x for any two vectors x and y in V.
3. x + (y + z) = (x + y) + z for any three vectors x, y, and z in V.
4. There is a vector 0 in V such that x + 0 = x for all x in V.
5. For each x in V there is a vector -x in V such that x + (-x) = 0.
6. (αβ)x = α(βx) for any scalars α, β and any vector x in V.
7. 1x = x for any x in V.
8. α(x + y) = αx + αy for any x and y in V and any scalar α.
9. (α + β)x = αx + βx for any x in V and any scalars α, β.

A subspace of a vector space V is a subset that is also a vector space in its own right.
1.2.1 Linear Independence and Bases
A set of vectors v_1, ..., v_r is said to be linearly independent if the only way we can have α_1 v_1 + ... + α_r v_r = 0 is for α_1 = ... = α_r = 0. A set of vectors v_1, ..., v_n is said to span a vector space V if every vector x in V can be written as a linear combination of the vectors v_1, ..., v_n, i.e., x = α_1 v_1 + ... + α_n v_n. The set of all linear combinations of the vectors v_1, ..., v_r is a subspace denoted by <v_1, ..., v_r> and called the span of these vectors. If a set of vectors v_1, ..., v_n is linearly independent and spans V, it is called a basis for V. If a vector space V has a basis consisting of a finite number of vectors, then the space is said to be finite dimensional. In a finite-dimensional vector space every basis has the same number of vectors. This number is called the dimension of the vector space. Clearly R^n and C^n have dimension n. Let e_k denote the vector in R^n or C^n that consists of all zeroes except for a one in the k-th position. It is easily verified that e_1, ..., e_n is a basis for either R^n or C^n.
1.2.2 Inner Product and Orthogonality
If x and y are two n-vectors, then the inner (dot) product x · y is the scalar value defined by x^H y. If the vector space is real we can replace x^H by x^T. The inner product x · y has the properties:

1. y · x is the complex conjugate of x · y.
2. x · (αy) = α(x · y).
3. x · (y + z) = x · y + x · z.
4. x · x ≥ 0, and x · x = 0 if and only if x = 0.

Vectors x and y are said to be orthogonal if x · y = 0. A basis v_1, ..., v_n is said to be orthonormal if

    v_i · v_j = { 0   i ≠ j
                { 1   i = j.

We define the norm ||x|| of a vector x by ||x|| = sqrt(x · x) = sqrt(|x_1|² + ... + |x_n|²). The norm has the properties

1. ||αx|| = |α| ||x||
2. ||x|| = 0 implies that x = 0
3. ||x + y|| ≤ ||x|| + ||y||.

If v_1, ..., v_n is an orthonormal basis and x = α_1 v_1 + ... + α_n v_n, then it can be shown that ||x||² = |α_1|² + ... + |α_n|². The norm and inner product satisfy the Cauchy inequality

    |x · y| ≤ ||x|| ||y||.
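As a quick numerical illustration of these definitions, the following pure-Python sketch computes an inner product and norm for real vectors and checks the Cauchy inequality (the helper names are mine, not from the notes):

```python
import math

def dot(x, y):
    # Inner product of two real vectors: x . y = sum of x_i * y_i
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    # Euclidean norm ||x|| = sqrt(x . x)
    return math.sqrt(dot(x, x))

x = [1.0, 2.0, 2.0]
y = [3.0, 0.0, 4.0]

# Cauchy inequality: |x . y| <= ||x|| ||y||
assert abs(dot(x, y)) <= norm(x) * norm(y)

print(norm(x))  # sqrt(1 + 4 + 4) = 3.0
```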
1.2.3 Matrices As Linear Transformations
An m × n matrix A can be considered as a mapping of the space R^n (C^n) into the space R^m (C^m) where the image of the n-vector x is the matrix-vector product Ax. This mapping is linear, i.e., A(x + y) = Ax + Ay and A(αx) = αAx. The range of A (denoted by Range(A)) is the space of all m-vectors y such that y = Ax for some n-vector x. It can be shown that the range of A is the space spanned by the columns of A. The null space of A (denoted by Null(A)) is the vector space consisting of all n-vectors x such that Ax = 0. An n × n square matrix A is said to be invertible if it is a one-to-one mapping of the space R^n (C^n) onto itself. It can be shown that a square matrix A is invertible if and only if the null space Null(A) consists of only the zero vector. If A is invertible, then the inverse A^-1 of A is defined by A^-1 y = x where x is the unique n-vector satisfying Ax = y. The inverse has the properties A^-1 A = A A^-1 = I and (AB)^-1 = B^-1 A^-1. We denote (A^-1)^T and (A^T)^-1 by A^-T.

If A is an m × n matrix, x is an n-vector, and y is an m-vector, then it can be shown that

    (Ax) · y = x · (A^H y).
1.3 Derivatives of Vector Functions
The central idea behind differentiation is the local approximation of a function by a linear function. If f is a function of one variable, then the locus of points (x, f(x)) is a plane curve C. The tangent line to C at (x, f(x)) is the graphical representation of the best local linear approximation to f at x. We call this local linear approximation the differential. We represent this local linear approximation by the equation dy = f'(x) dx. If f is a function of two variables, then the locus of points (x, y, f(x, y)) represents a surface S. Here the best local linear approximation to f at (x, y) is graphically represented by the tangent plane to the surface S at the point (x, y, f(x, y)).

We will generalize this idea of a local linear approximation to vector-valued functions of n variables. Let f be a function mapping n-vectors into m-vectors. We define the derivative Df(x) of f at the n-vector x to be the unique linear transformation (m × n matrix) satisfying

    f(x + h) = f(x) + Df(x)h + o(||h||)    (1.1)

whenever such a transformation exists. Here the o notation signifies a function with the property

    lim_{||h||→0} o(||h||)/||h|| = 0.

Thus, Df(x) is a linear transformation that locally approximates f.

We can also define a directional derivative δ_h f(x) in the direction h by

    δ_h f(x) = lim_{ε→0} [f(x + εh) - f(x)]/ε = (d/dε) f(x + εh) |_{ε=0}    (1.2)
whenever the limit exists. This directional derivative is also referred to as the variation of f in the direction h. If Df(x) exists, then

    δ_h f(x) = Df(x)h.

However, the existence of δ_h f(x) for every direction h does not imply the existence of Df(x). If we take h = e_i, then δ_h f(x) is just the partial derivative ∂f(x)/∂x_i.
1.3.1 Newton's Method

Newton's method is an iterative scheme for finding the zeroes of a smooth function f. If x is a guess, then we approximate f near x by

    f(x + h) ≈ f(x) + Df(x)h.

If x + h is the zero of this linear approximation, then

    h = -Df(x)^-1 f(x)

or

    x + h = x - Df(x)^-1 f(x).    (1.3)

We can take x + h as an improved approximation to the nearby zero of f. If we keep iterating with equation (1.3), then the (k+1)-iterate x^(k+1) is related to the k-iterate x^(k) by

    x^(k+1) = x^(k) - Df(x^(k))^-1 f(x^(k)).    (1.4)
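In the one-variable case Df(x) is just f'(x), and iteration (1.4) takes a particularly simple form. A minimal Python sketch (function names are mine, not from the notes):

```python
def newton(f, df, x0, tol=1e-12, max_iter=50):
    # Newton iteration x_{k+1} = x_k - f(x_k)/f'(x_k)
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Find the positive zero of f(x) = x^2 - 4, i.e. x = 2
root = newton(lambda x: x * x - 4.0, lambda x: 2.0 * x, x0=3.0)
print(root)  # approximately 2.0
```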
Chapter 2
Solution of Systems of Linear Equations
2.1 Gaussian Elimination
Gaussian elimination is the standard way of solving a system of linear equations Ax = b when A is a square matrix with no special properties. The first known use of this method was in the Chinese text Nine Chapters on the Mathematical Art, written between 200 BC and 100 BC. Here it was used to solve a system of three equations in three unknowns. The coefficients (including the right-hand side) were written in tabular form and operations were performed on this table to produce a triangular form that could be easily solved. It is remarkable that this was done long before the development of matrix notation or even a notation for variables. The method was used by Gauss in the early 1800s to solve a least squares problem for determining the orbit of the asteroid Pallas. Using observations of Pallas taken between 1803 and 1809, he obtained a system of six equations in six unknowns which he solved by the method now known as Gaussian elimination. The concept of treating a matrix as an object and the development of an algebra for matrices were first introduced by Cayley [2] in the paper A Memoir on the Theory of Matrices.

In these notes we will first describe the basic method and show that it is equivalent to factoring the matrix into the product of a lower triangular and an upper triangular matrix, i.e., A = LU. We will then introduce the method of row pivoting that is necessary in order to keep the method stable. We will show that row pivoting is equivalent to a factorization PA = LU or A = PLU where P is the identity matrix with its rows permuted. Having obtained this factorization, the solution for a given right-hand side b is obtained by solving the two triangular systems Ly = Pb and Ux = y by simple processes called forward and backward substitution.

There are a number of good computer implementations of Gaussian elimination with row pivoting. Matlab has a good implementation obtained by the call [L,U,P]=lu(A). Another good implementation is the LAPACK routine SGESV (DGESV, CGESV). It can be obtained in either Fortran or C from the site www.netlib.org.

We will end by showing how the accuracy of a solution can be improved by a process called
iterative refinement.
2.1.1 The Basic Procedure
Gaussian elimination begins by producing zeroes below the diagonal in the first column, i.e.,

    [ × × ... × ]      [ × × ... × ]
    [ × × ... × ]  →   [ 0 × ... × ]
    [ : :     : ]      [ : :     : ]
    [ × × ... × ]      [ 0 × ... × ]    (2.1)

If a_ij is the element of A in the i-th row and the j-th column, then the first step in the Gaussian elimination process consists of multiplying A on the left by the lower triangular matrix L_1 given by

    L_1 = [ 1          0  0  ...  0 ]
          [ -a21/a11   1  0  ...  0 ]
          [ -a31/a11   0  1       : ]
          [ :          :      ... 0 ]
          [ -an1/a11   0  ... 0   1 ]    (2.2)

i.e., zeroes are produced in the first column by adding appropriate multiples of the first row to the other rows. The next step is to produce zeroes below the diagonal in the second column, i.e.,

    [ × × ... × ]      [ × × × ... × ]
    [ 0 × ... × ]  →   [ 0 × × ... × ]
    [ : :     : ]      [ 0 0 × ... × ]
    [ 0 × ... × ]      [ : :       : ]
                       [ 0 0 × ... × ]    (2.3)

This can be obtained by multiplying L_1 A on the left by the lower triangular matrix L_2 given by

    L_2 = [ 1  0                  0  0  ...  0 ]
          [ 0  1                  0  0  ...  0 ]
          [ 0  -a32^(1)/a22^(1)   1  0  ...  0 ]
          [ 0  -a42^(1)/a22^(1)   0  1       : ]
          [ :  :                         ... 0 ]
          [ 0  -an2^(1)/a22^(1)   0  ... 0   1 ]    (2.4)

where a_ij^(1) is the i,j-th element of L_1 A. Continuing in this manner, we can define lower triangular matrices L_3, ..., L_{n-1} so that L_{n-1} ... L_1 A is upper triangular, i.e.,

    L_{n-1} ... L_1 A = U.    (2.5)
Taking the inverses of the matrices L_1, ..., L_{n-1}, we can write A as

    A = L_1^-1 ... L_{n-1}^-1 U.    (2.6)

Let

    L = L_1^-1 ... L_{n-1}^-1.    (2.7)

Then it follows from equation (2.6) that

    A = LU.    (2.8)

We will now show that L is lower triangular. Each of the matrices L_k can be written in the form

    L_k = I - u^(k) e_k^T    (2.9)

where e_k is the vector whose components are all zero except for a one in the k-th position and u^(k) is a vector whose first k components are zero. The term u^(k) e_k^T is an n × n matrix whose elements are all zero except for those below the diagonal in the k-th column. In fact, the components of u^(k) are given by

    u_i^(k) = { 0                         1 ≤ i ≤ k
              { a_ik^(k-1) / a_kk^(k-1)   k < i    (2.10)

where a_ij^(k-1) is the i,j-th element of L_{k-1} ... L_1 A. Since e_k^T u^(k) = u_k^(k) = 0, it follows that

    (I + u^(k) e_k^T)(I - u^(k) e_k^T) = I + u^(k) e_k^T - u^(k) e_k^T - u^(k) e_k^T u^(k) e_k^T = I,    (2.11)

i.e.,

    L_k^-1 = I + u^(k) e_k^T.    (2.12)

Thus, L_k^-1 is the same as L_k except for a change of sign of the elements below the diagonal in column k. Combining equations (2.7) and (2.12), we obtain

    L = (I + u^(1) e_1^T) ... (I + u^(n-1) e_{n-1}^T) = I + u^(1) e_1^T + ... + u^(n-1) e_{n-1}^T.    (2.13)

In this expression the cross terms dropped out since

    (u^(i) e_i^T)(u^(j) e_j^T) = u_i^(j) u^(i) e_j^T = 0    for i < j.

Equation (2.13) implies that L is lower triangular and that the k-th column of L looks like the k-th column of L_k with the signs reversed on the elements below the diagonal, i.e.,

    L = [ 1         0                ...  0 ]
        [ a21/a11   1                     : ]
        [ a31/a11   a32^(1)/a22^(1)  1      ]
        [ :         :                ...    ]
        [ an1/a11   an2^(1)/a22^(1)  ...  1 ]    (2.14)
Having the LU factorization given in equation (2.8), it is possible to solve the system of equations Ax = LUx = b for any right-hand side b. If we let y = Ux, then y can be found by solving the triangular system Ly = b. Having y, x can be obtained by solving the triangular system Ux = y. Triangular systems are very easy to solve. For example, in the system Ux = y, the last equation can be solved for x_n (the only unknown in this equation). Having x_n, the next-to-last equation can be solved for x_{n-1} (the only unknown left in this equation). Continuing in this manner we can solve for the remaining components of x. For the system Ly = b, we start by computing y_1 and then work our way down. Solving an upper triangular system is called back substitution. Solving a lower triangular system is called forward substitution.
To compute L requires approximately n³/3 operations, where an operation consists of an addition and a multiplication. For each right-hand side, solving the two triangular systems requires approximately n² operations. Thus, as far as solving systems of equations is concerned, having the LU factorization of A is just as good as having the inverse of A and is less costly to compute.
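The factor-once, solve-cheaply idea above can be sketched in pure Python. This is a bare LU factorization without the row pivoting discussed in the next section, so it is for exposition only (helper names are mine, not from the notes):

```python
def lu_factor(A):
    # Gaussian elimination without pivoting: A = L U,
    # L unit lower triangular, U upper triangular.
    n = len(A)
    U = [row[:] for row in A]
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]      # multiplier a_ik / a_kk
            L[i][k] = m
            for j in range(k, n):
                U[i][j] -= m * U[k][j]
    return L, U

def solve_lu(L, U, b):
    n = len(b)
    # Forward substitution: L y = b
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    # Back substitution: U x = y
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

A = [[2.0, 1.0, 1.0],
     [4.0, 3.0, 3.0],
     [8.0, 7.0, 9.0]]
L, U = lu_factor(A)
x = solve_lu(L, U, [4.0, 10.0, 24.0])
print(x)  # [1.0, 1.0, 1.0]
```

Once `L` and `U` are computed, additional right-hand sides reuse the factorization at only the cost of the two triangular solves.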
2.1.2 Row Pivoting
There is one problem with Gaussian elimination that has yet to be addressed. It is possible for one of the diagonal elements a_kk^(k-1) that occur during Gaussian elimination to be zero or to be very small. This causes a problem since we must divide by this diagonal element. If one of the diagonals is exactly zero, the process obviously blows up. However, there can still be a problem if one of the diagonals is small. In this case large elements are produced in both the L and U matrices. These large entries lead to a loss of accuracy when there are subtractions involving these big numbers. This problem can occur even for well-behaved matrices. To eliminate this problem we introduce row pivoting. In performing Gaussian elimination, it is not necessary to take the equations in the order they are given. Suppose we are at the stage where we are zeroing out the elements below the diagonal in the k-th column. We can interchange any of the rows from the k-th row on without changing the structure of the matrix. In row pivoting we find the largest in magnitude of the elements a_kk^(k-1), a_{k+1,k}^(k-1), ..., a_nk^(k-1) and interchange rows to bring that element to the k,k-position. Mathematically we can perform this row interchange by multiplying on the left by the matrix P_k that is like the identity matrix with the appropriate rows interchanged. The matrix P_k has the property P_k P_k = I, i.e., P_k is its own inverse. With row pivoting, equation (2.5) is replaced by

    L_{n-1} P_{n-1} ... L_2 P_2 L_1 P_1 A = U.    (2.15)

We can write this equation in the form

    L_{n-1} (P_{n-1} L_{n-2} P_{n-1}^-1) (P_{n-1} P_{n-2} L_{n-3} P_{n-2}^-1 P_{n-1}^-1) ... (P_{n-1} ... P_2 L_1 P_2^-1 ... P_{n-1}^-1) (P_{n-1} ... P_1) A = U.    (2.16)

Define

    L'_{n-1} = L_{n-1}  and  L'_k = P_{n-1} ... P_{k+1} L_k P_{k+1}^-1 ... P_{n-1}^-1,    k = 1, ..., n-2.    (2.17)
2.1.3 Iterative Refinement
If the solution of Ax = b is not sufficiently accurate, the accuracy can be improved by applying Newton's method to the function f(x) = Ax - b. If x^(k) is an approximate solution to f(x) = 0, then a Newton iteration produces an approximation x^(k+1) given by

    x^(k+1) = x^(k) - Df(x^(k))^-1 f(x^(k)) = x^(k) - A^-1 (Ax^(k) - b).    (2.21)

An iteration step can be summarized as follows:

1. Compute the residual r^(k) = Ax^(k) - b.
2. Solve the system Ad^(k) = r^(k) using the LU factorization of A.
3. Compute x^(k+1) = x^(k) - d^(k).

The residual is usually computed in double precision. If the above calculations were carried out exactly, the answer would be obtained in one iteration, as is always true when applying Newton's method to a linear function. However, because of roundoff errors, it may require more than one iteration to obtain the desired accuracy.
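The three steps above can be sketched as follows. For brevity this sketch re-solves with a plain Gaussian elimination helper instead of a stored LU factorization, and computes the residual at working precision rather than in extended precision (names are mine, not from the notes):

```python
def gauss_solve(A, b):
    # Simple Gaussian elimination solve (stand-in for reusing a stored LU).
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for k in range(n):
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def refine(A, b, x, steps=2):
    # Iterative refinement: r = A x - b; solve A d = r; x <- x - d.
    n = len(b)
    for _ in range(steps):
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
        d = gauss_solve(A, r)
        x = [xi - di for xi, di in zip(x, d)]
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [5.0, 4.0]
x0 = [1.1, 0.9]          # a slightly inaccurate starting solution
x = refine(A, b, x0)
print(x)  # close to the exact solution [1.0, 1.0]
```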
2.2 Cholesky Factorization
Matrices that are Hermitian (A^H = A) and positive definite (x^H A x > 0 for all x ≠ 0) occur sufficiently often in practice that it is worth describing a variant of Gaussian elimination that is often used for this class of matrices. Recall that Gaussian elimination amounted to a factorization of a square matrix A into the product of a lower triangular matrix and an upper triangular matrix, i.e., A = LU. The Cholesky factorization represents a Hermitian positive definite matrix A by the product of a lower triangular matrix and its conjugate transpose, i.e., A = L L^H. Because of the symmetries involved, this factorization can be formed in roughly half the number of operations as are needed for Gaussian elimination.

Let us begin by looking at some of the properties of positive definite matrices. If e_i is the i-th column of the identity matrix and A = (a_ij) is positive definite, then a_ii = e_i^T A e_i > 0, i.e., the diagonal components of A are real and positive. Suppose X is a nonsingular matrix of the same size as the Hermitian, positive definite matrix A. Then

    x^H (X^H A X) x = (Xx)^H A (Xx) > 0    for all x ≠ 0.

Thus, A Hermitian positive definite implies that X^H A X is Hermitian positive definite. Conversely, suppose X^H A X is Hermitian positive definite. Then

    A = (X X^-1)^H A (X X^-1) = (X^-1)^H (X^H A X) (X^-1)

is Hermitian positive definite.
Next we will show that the component of largest magnitude of a Hermitian positive definite matrix A always lies on the diagonal. Suppose instead that |a_kl| = max_{i,j} |a_ij| with k ≠ l. If a_kl = |a_kl| e^{iθ_kl}, let α = -e^{iθ_kl} and x = αe_k + e_l. Then

    x^H A x = |α|² a_kk + ᾱ a_kl + α ā_kl + a_ll = a_kk + a_ll - 2|a_kl| ≤ 0.

This contradicts the fact that A is positive definite. Therefore, max_{i,j} |a_ij| = max_i a_ii. Suppose we partition the Hermitian positive definite matrix A as follows:

    A = [ B  C^H ]
        [ C  D   ]

If y is a nonzero vector compatible with D, let x^H = (0, y^H). Then

    x^H A x = (0, y^H) [ B  C^H ] [ 0 ]  =  y^H D y > 0,
                       [ C  D   ] [ y ]

i.e., D is Hermitian positive definite. Similarly, letting x^H = (y^H, 0), we can show that B is Hermitian positive definite.
We will now show that if A is a Hermitian, positive-definite matrix, then there is a unique lower triangular matrix L with positive diagonals such that A = L L^H. This factorization is called the Cholesky factorization. We will establish this result by induction on the dimension n. Clearly, the result is true for n = 1, for in this case we can take L = (sqrt(a11)). Suppose the result is true for matrices of dimension n - 1. Let A be a Hermitian, positive-definite matrix of dimension n. We can partition A as follows:

    A = [ a11  w^H ]
        [ w    K   ]    (2.22)

where w is a vector of dimension n - 1 and K is an (n-1) × (n-1) matrix. It is easily verified that

    A = [ a11  w^H ] = B^H [ 1  0               ] B    (2.23)
        [ w    K   ]       [ 0  K - w w^H/a11  ]

where

    B = [ sqrt(a11)  w^H/sqrt(a11) ]
        [ 0          I             ]    (2.24)

We will first show that the matrix B is invertible. If

    Bx = [ sqrt(a11)  w^H/sqrt(a11) ] [ x1 ]  =  [ sqrt(a11) x1 + w^H x2 / sqrt(a11) ]  =  0,
         [ 0          I             ] [ x2 ]     [ x2                                ]

then x2 = 0 and sqrt(a11) x1 = 0, so x1 = 0. Therefore, B is invertible. From our discussion at the beginning of this section it follows from equation (2.23) that the matrix

    [ 1  0              ]
    [ 0  K - w w^H/a11  ]
is Hermitian positive definite. By the results on the partitioning of a positive definite matrix, it follows that the matrix K - w w^H/a11 is Hermitian positive definite. By the induction hypothesis, there exists a unique lower triangular matrix L̂ with positive diagonals such that

    K - w w^H/a11 = L̂ L̂^H.    (2.25)
Substituting equation (2.25) into equation (2.23), we get

    A = B^H [ 1  0        ] B = B^H [ 1  0  ] [ 1  0    ] B = [ sqrt(a11)      0  ] [ sqrt(a11)  w^H/sqrt(a11) ]    (2.26)
            [ 0  L̂ L̂^H ]          [ 0  L̂ ] [ 0  L̂^H ]     [ w/sqrt(a11)   L̂ ] [ 0          L̂^H          ]
which is the desired factorization of A. To show uniqueness, suppose that

    A = [ a11  w^H ] = [ l11  0  ] [ l11  v^H ]    (2.27)
        [ w    K   ]   [ v    L̂ ] [ 0    L̂^H ]

is a Cholesky factorization of A. Equating components in equation (2.27), we see that l11² = a11 and hence that l11 = sqrt(a11). Also l11 v = w, or v = w/l11 = w/sqrt(a11). Finally, v v^H + L̂ L̂^H = K, or K - v v^H = K - w w^H/a11 = L̂ L̂^H. Since L̂ L̂^H is the unique factorization of the (n-1) × (n-1) Hermitian, positive-definite matrix K - w w^H/a11, we see that the Cholesky factorization of A is unique. It now follows by induction that there is a unique Cholesky factorization of any Hermitian, positive-definite matrix.
The factorization in equation (2.23) is the basis for the computation of the Cholesky factorization. The matrix B^H is lower triangular. Since the matrix K - w w^H/a11 is positive definite, it can be factored in the same manner. Continuing in this manner until the center matrix becomes the identity matrix, we obtain lower triangular matrices L_1, ..., L_n such that

    A = L_1 ... L_n L_n^H ... L_1^H.

Letting L = L_1 ... L_n, we have the desired Cholesky factorization.

As was mentioned previously, the number of operations in the Cholesky factorization is about half the number in Gaussian elimination. Unlike Gaussian elimination, the Cholesky method does not need pivoting in order to maintain stability. The Cholesky factorization can also be written in the form

    A = L D L^H

where D is diagonal and L now has all ones on the diagonal.
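The recursive construction above unrolls into the standard componentwise algorithm. A real-matrix Python sketch, where L^H reduces to L^T (the function name is mine, not from the notes):

```python
import math

def cholesky(A):
    # Factor a real symmetric positive definite A as A = L L^T,
    # with L lower triangular and a positive diagonal.
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        L[j][j] = math.sqrt(s)            # positive diagonal entry
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

A = [[4.0, 2.0, 2.0],
     [2.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
L = cholesky(A)
print(L)  # [[2.0, 0.0, 0.0], [1.0, 2.0, 0.0], [1.0, 1.0, 2.0]]
```

Only the lower triangle of A is touched, which is where the factor-of-two savings over Gaussian elimination comes from.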
2.3 Elementary Unitary Matrices and the QR Factorization
In Gaussian elimination we saw that a square matrix A could be reduced to triangular form by multiplying on the left by a series of elementary lower triangular matrices. This process can also be expressed as a factorization A = LU where L is lower triangular and U is upper triangular. In least squares problems the number of rows m in A is usually greater than the number of columns n. The standard technique for solving least-squares problems of this type is to make use of a factorization A = QR where Q is an m × m unitary matrix and R has the form

    R = [ R̂ ]
        [ 0 ]

with R̂ an n × n upper triangular matrix. The usual way of obtaining this factorization is to reduce the matrix A to triangular form by multiplying on the left by a series of elementary unitary matrices that are sometimes called Householder matrices (reflectors). We will show how to use this QR factorization to solve least squares problems. If Q̂ is the m × n matrix consisting of the first n columns of Q, then

    A = Q̂ R̂.

This factorization is called the reduced QR factorization. Elementary unitary matrices are also used to reduce square matrices to a simplified form (Hessenberg or tridiagonal) prior to eigenvalue calculation.

There are several good computer implementations that use the Householder QR factorization to solve the least squares problem. The LAPACK routine is called SGELS (DGELS, CGELS). In Matlab the solution of the least squares problem is given by A\b. The QR factorization can be obtained with the call [Q,R]=qr(A).
2.3.1 Gram-Schmidt Orthogonalization
A reduced QR factorization can be obtained by an orthogonalization procedure known as the Gram-Schmidt process. Suppose we would like to construct an orthonormal set of vectors q_1, ..., q_n from a given linearly independent set of vectors a_1, ..., a_n. The process is recursive. At the j-th step we construct a unit vector q_j that is orthogonal to q_1, ..., q_{j-1} using

    v_j = a_j - Σ_{i=1}^{j-1} (q_i^H a_j) q_i,
    q_j = v_j / ||v_j||.

The orthonormal basis constructed has the additional property

    <q_1, ..., q_j> = <a_1, ..., a_j>,    j = 1, 2, ..., n.
If we consider a_1, ..., a_n as columns of a matrix A, then this process is equivalent to the matrix factorization A = Q̂ R̂ where Q̂ = (q_1, ..., q_n) and R̂ is upper triangular. Although the Gram-Schmidt process is very useful in theoretical considerations, it does not lead to a stable numerical procedure. In the next section we will discuss Householder reflectors, which lead to a more stable procedure for obtaining a QR factorization.
2.3.2 Householder Reflections
Let us begin by describing the Householder reflectors. In this section we will restrict ourselves to real matrices. Afterwards we will see that there are a number of generalizations to the complex case. If v is a fixed vector of dimension m with ||v|| = 1, then the set of all vectors orthogonal to v is an (m-1)-dimensional subspace called a hyperplane. If we denote this hyperplane by H, then

    H = { u : v^T u = 0 }.    (2.28)

Here v^T denotes the transpose of v. If x is a point not on H, let x̄ denote the orthogonal projection of x onto H (see Figure 2.1). The difference x̄ - x must be orthogonal to H and hence a multiple of v, i.e.,

    x̄ - x = αv   or   x̄ = x + αv.    (2.29)

Figure 2.1: Householder reflection

Since x̄ lies on H and v^T v = ||v||² = 1, we must have

    v^T x̄ = v^T x + α v^T v = v^T x + α = 0.    (2.30)

Thus, α = -v^T x and consequently

    x̄ = x - (v^T x)v = x - v v^T x = (I - v v^T)x.    (2.31)
Define P = I - v v^T. Then P is a projection matrix that projects vectors orthogonally onto H. The projection x̄ is obtained by going a certain distance from x in the direction v. Figure 2.1 suggests that the reflection x̂ of x across H can be obtained by going twice that distance in the same direction, i.e.,

    x̂ = x - 2(v^T x)v = x - 2 v v^T x = (I - 2 v v^T)x.    (2.32)

With this motivation we define the Householder reflector Q by

    Q = I - 2 v v^T,    ||v|| = 1.    (2.33)

An alternate form for the Householder reflector is

    Q = I - 2 u u^T / ||u||²    (2.34)

where here u is not restricted to be a unit vector. Notice that, in this form, replacing u by a multiple of u does not change Q. The matrix Q is clearly symmetric, i.e., Q^T = Q. Moreover,

    Q^T Q = Q² = (I - 2 v v^T)(I - 2 v v^T) = I - 2 v v^T - 2 v v^T + 4 v v^T v v^T = I,    (2.35)

i.e., Q is an orthogonal matrix. As with all orthogonal matrices, Q preserves the norm of a vector, i.e.,

    ||Qx||² = (Qx)^T Qx = x^T Q^T Q x = x^T x = ||x||².    (2.36)
To reduce a matrix to one that is upper triangular it is necessary to zero out columns below a certain position. We will show how to construct a Householder reflector so that its action on a given vector x is a multiple of e_1, the first column of the identity matrix. To zero out a vector below row k we can use a matrix of the form

    Q = [ I  0 ]
        [ 0  Q̃ ]

where I is the (k-1) × (k-1) identity matrix and Q̃ is an (m-k+1) × (m-k+1) Householder matrix. Thus, for a given vector x we would like to choose a vector u so that Qx is a multiple of the unit vector e_1, i.e.,

    Qx = x - [2(u^T x)/||u||²] u = αe_1.    (2.37)

Since Q preserves norms, we must have |α| = ||x||. Therefore, equation (2.37) becomes

    Qx = x - [2(u^T x)/||u||²] u = ∓||x|| e_1.    (2.38)

It follows from equation (2.38) that u must be a multiple of the vector x ± ||x|| e_1. Since u can be replaced by a multiple of u without changing Q, we let

    u = x ± ||x|| e_1.    (2.39)

It follows from the definition of u in equation (2.39) that

    u^T x = ||x||² ± ||x|| x_1    (2.40)
and

‖u‖² = uᵀu = ‖x‖² ∓ ‖x‖x1 ∓ ‖x‖x1 + ‖x‖² = 2(‖x‖² ∓ ‖x‖x1). (2.41)

Therefore,

2(uᵀx)/‖u‖² = 1, (2.42)

and hence Qx becomes

Qx = x − (2(uᵀx)/‖u‖²)u = x − u = ±‖x‖e1 (2.43)

as desired. From what has been discussed so far, either of the signs in equation (2.39) would
produce the desired result. However, if x1 is very large compared to the other components, then it
is possible to lose accuracy through subtraction in the computation of u = x ∓ ‖x‖e1. To prevent
this we choose u to be

u = x + sign(x1)‖x‖e1 (2.44)

where sign(x1) is defined by

sign(x1) = +1 if x1 ≥ 0, −1 if x1 < 0. (2.45)

With this choice of u, equation (2.43) becomes

Qx = −sign(x1)‖x‖e1. (2.46)

In practice, u is often scaled so that u1 = 1, i.e.,

u = (x + sign(x1)‖x‖e1)/(x1 + sign(x1)‖x‖). (2.47)

With this choice of u,

‖u‖² = 2‖x‖/(‖x‖ + |x1|). (2.48)

The matrix Q applied to a general vector y is given by

Qy = y − (2(uᵀy)/‖u‖²)u. (2.49)
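The construction in equations (2.44)–(2.49) translates directly into code. The following is a minimal NumPy sketch (not part of the original notes): it builds the scaled Householder vector u with u1 = 1 and applies Q = I − (2/‖u‖²)uuᵀ to a vector without ever forming the matrix.

```python
import numpy as np

def householder_vector(x):
    """Return (u, beta) with u[0] = 1 and beta = 2/||u||^2 such that
    (I - beta*u*u^T) x = -sign(x1)*||x||*e1  (cf. eqs. (2.44)-(2.48))."""
    x = np.asarray(x, dtype=float)
    normx = np.linalg.norm(x)
    sgn = 1.0 if x[0] >= 0 else -1.0
    u = x.copy()
    u[0] += sgn * normx               # u = x + sign(x1)*||x||*e1   (eq. 2.44)
    u /= x[0] + sgn * normx           # scale so that u[0] = 1      (eq. 2.47)
    beta = 2.0 / np.dot(u, u)
    return u, beta

x = np.array([3.0, 1.0, 5.0, 1.0])    # ||x|| = 6
u, beta = householder_vector(x)
Qx = x - beta * np.dot(u, x) * u      # apply Q to x as in eq. (2.49)
print(Qx)                             # ≈ [-6, 0, 0, 0]
```

Note that the sign choice in (2.44) guarantees that the addition in `u[0] += sgn * normx` never cancels, exactly as argued in the text.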
2.3.3 Complex Householder Matrices
There are several ways to generalize Householder matrices to the complex case. The most obvious
is to let

U = I − (2/‖u‖²)uuᴴ

where the superscript H denotes conjugate transpose. It can be shown that a matrix of this form
is both Hermitian (U = Uᴴ) and unitary (UᴴU = I). However, it is sometimes convenient
to be able to construct a U such that Uᴴx is a real multiple of e1. This is especially true when
converting a Hermitian matrix to tridiagonal form prior to an eigenvalue computation. For in this
case the tridiagonal matrix becomes a real symmetric matrix even when starting with a complex
Hermitian matrix. Thus, it is not necessary to have a separate eigenvalue routine for the complex
case. It turns out that there is no Hermitian unitary matrix U, as defined above, that is guaranteed to
produce a real multiple of e1. Therefore, linear algebra libraries such as LAPACK use elementary
unitary matrices of the form

U = I − τwwᴴ (2.50)

where τ can be complex. These matrices are not in general Hermitian. If U is to be unitary, we
must have

I = UᴴU = (I − τ̄wwᴴ)(I − τwwᴴ) = I − (τ + τ̄ − |τ|²‖w‖²)wwᴴ

and hence

|τ|²‖w‖² = 2 Re(τ). (2.51)

Notice that replacing w by w/α and τ by |α|²τ in equation (2.50) leaves U unchanged. Thus, a
scaling of w can be absorbed in τ. We would like to choose w and τ so that

Uᴴx = x − τ̄(wᴴx)w = γ‖x‖e1 (2.52)

where γ = ±1. It can be seen from equation (2.52) that w must be proportional to the vector
x − γ‖x‖e1. Since the factor of proportionality can be absorbed in τ, we choose

w = x − γ‖x‖e1. (2.53)
Substituting this expression for w into equation (2.52), we get

Uᴴx = x − τ̄(wᴴx)(x − γ‖x‖e1) = (1 − τ̄wᴴx)x + τ̄(wᴴx)γ‖x‖e1 = γ‖x‖e1. (2.54)

Thus, we must have

τ̄(wᴴx) = 1 or τ = 1/(xᴴw). (2.55)

This choice of τ gives

Uᴴx = γ‖x‖e1.

It follows from equation (2.53) that

xᴴw = ‖x‖² − γ‖x‖x̄1 (2.56)

and

‖w‖² = (xᴴ − γ‖x‖e1ᵀ)(x − γ‖x‖e1) = ‖x‖² − γ‖x‖x1 − γ‖x‖x̄1 + ‖x‖²
     = 2(‖x‖² − γ‖x‖ Re(x1)). (2.57)
Thus, it follows from equations (2.55)–(2.57) that

2 Re(τ)/|τ|² = τ/|τ|² + τ̄/|τ|² = 1/τ̄ + 1/τ = wᴴx + xᴴw
            = ‖x‖² − γ‖x‖x1 + ‖x‖² − γ‖x‖x̄1 = 2‖x‖² − 2γ‖x‖ Re(x1) = ‖w‖²,

i.e., the condition in equation (2.51) is satisfied. It follows that the matrix U defined by equation
(2.50) is unitary when w is defined by equation (2.53) and τ is defined by equation (2.55). As
before we choose γ to prevent the loss of accuracy due to subtraction in equation (2.53). In this
case we choose γ = −sign(Re(x1)). Thus, w becomes

w = x + sign(Re(x1))‖x‖e1. (2.58)
Let us define a real constant ζ by

ζ = sign(Re(x1))‖x‖. (2.59)

With this definition w becomes

w = x + ζe1. (2.60)

It follows that

xᴴw = ‖x‖² + ζx̄1 = ζ² + ζx̄1 = ζ(ζ + x̄1), (2.61)

and hence

τ = 1/(ζ(ζ + x̄1)). (2.62)

In LAPACK w is scaled so that w1 = 1, i.e.,

w = (x + ζe1)/(x1 + ζ). (2.63)

With this w, τ becomes

τ = |x1 + ζ|²/(ζ(ζ + x̄1)) = (x1 + ζ)(x̄1 + ζ)/(ζ(x̄1 + ζ)) = (x1 + ζ)/ζ. (2.64)

Clearly this τ satisfies the inequality

|τ − 1| = |x1|/|ζ| = |x1|/‖x‖ ≤ 1. (2.65)

It follows from equation (2.64) that τ is real when x1 is real. Thus, U is Hermitian when x1 is real.
An alternate approach to defining a complex Householder matrix is to let

U = I − (2/‖w‖²)wwᴴ. (2.66)
This U is Hermitian and

UᴴU = (I − (2/‖w‖²)wwᴴ)(I − (2/‖w‖²)wwᴴ) = I − (4/‖w‖²)wwᴴ + (4/‖w‖⁴)w(wᴴw)wᴴ = I, (2.67)

i.e., U is unitary. We want to choose w so that

Uᴴx = Ux = x − (2(wᴴx)/‖w‖²)w = γ‖x‖e1 (2.68)

where |γ| = 1. Multiplying equation (2.68) by xᴴ, we get

xᴴUx = γ‖x‖x̄1. (2.69)

Since xᴴUx is real, γx̄1 must be real. If x1 = |x1|e^{iθ1}, then γ must have the form

γ = ±e^{iθ1}. (2.70)
It follows from equation (2.68) that w must be proportional to the vector x ∓ e^{iθ1}‖x‖e1. Since
multiplying w by a constant factor doesn't change U, we take

w = x ∓ e^{iθ1}‖x‖e1. (2.71)

Again, to avoid accuracy problems, we choose the plus sign in the above formula, i.e.,

w = x + e^{iθ1}‖x‖e1. (2.72)

It follows from this definition that

‖w‖² = (xᴴ + e^{−iθ1}‖x‖e1ᵀ)(x + e^{iθ1}‖x‖e1) = ‖x‖² + |x1|‖x‖ + |x1|‖x‖ + ‖x‖²
     = 2‖x‖(‖x‖ + |x1|) (2.73)

and

wᴴx = (xᴴ + e^{−iθ1}‖x‖e1ᵀ)x = ‖x‖² + e^{−iθ1}x1‖x‖ = ‖x‖(‖x‖ + |x1|). (2.74)

Therefore,

2(wᴴx)/‖w‖² = 1, (2.75)

and hence

Ux = x − w = x − (x + e^{iθ1}‖x‖e1) = −e^{iθ1}‖x‖e1. (2.76)
This alternate form for the Householder matrix has the advantage that it is Hermitian and that the
multiplier of wwᴴ is real. However, it can't in general map a given vector x into a real multiple of
e1. Both EISPACK and LINPACK use elementary unitary matrices similar to this. The LAPACK
form is not Hermitian and involves a complex multiplier of wwᴴ, but can produce a real multiple of
e1 when acting on x. As stated before, this can be a big advantage when reducing matrices to
tridiagonal form prior to an eigenvalue computation.
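As an aside (not part of the original notes), the Hermitian form (2.66) with the choice (2.72) is easy to verify numerically; a small NumPy sketch:

```python
import numpy as np

def hermitian_householder_w(x):
    """Return w such that U = I - (2/||w||^2) w w^H (eq. 2.66) maps x to
    -e^{i*theta1}*||x||*e1, where theta1 = arg(x1)  (eqs. 2.72, 2.76)."""
    x = np.asarray(x, dtype=complex)
    phase = x[0] / abs(x[0]) if x[0] != 0 else 1.0   # e^{i*theta1}
    w = x.copy()
    w[0] += phase * np.linalg.norm(x)                # w = x + e^{i*theta1}||x||e1
    return w

x = np.array([1.0 + 1.0j, 2.0, 2.0j])
w = hermitian_householder_w(x)
Ux = x - 2.0 * np.vdot(w, x) / np.vdot(w, w) * w     # Ux = x - w = -e^{i*theta1}||x||e1
print(abs(Ux[0]), np.round(Ux[1:], 12))              # ||x||, then zeros
```

As computed in (2.73)–(2.75), the scalar 2wᴴx/‖w‖² equals 1, so Ux = x − w; the resulting multiple of e1 carries the complex phase e^{iθ1}, unlike the LAPACK form discussed above.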
2.3.4 Givens Rotations
Householder matrices are very good at producing long strings of zeroes in a row or column. Sometimes,
however, we want to produce a zero in a matrix while altering as little of the matrix as
possible. This is true when dealing with matrices that are very sparse (most of the elements are
already zero) or when performing many operations in parallel. Givens rotations can sometimes
be used for this purpose. We will begin by considering the case where all matrices and vectors are
real. The complex case will be considered in the next section.
The two-dimensional matrix

R = [ cos θ   sin θ
     −sin θ   cos θ ]

rotates a 2-vector through the angle θ. If we let c = cos θ and s = sin θ, then the matrix R can be
written as

R = [ c   s
     −s   c ]

where c² + s² = 1. If x is a 2-vector, we can determine c and s so that Rx is a multiple of e1.
Since

Rx = ( cx1 + sx2
      −sx1 + cx2 ),

R will have the desired property if c = x1/√(x1² + x2²) and s = x2/√(x1² + x2²). In fact
Rx = √(x1² + x2²) e1.
Givens matrices are an extension of this two-dimensional rotation to higher dimensions. For j > i,
the Givens matrix G(i,j) is an m×m matrix that performs a rotation in the (i,j) coordinate plane.
It can be obtained by replacing the (i,i) and (j,j) components of the m×m identity matrix by c,
the (i,j) component by s, and the (j,i) component by −s. It has the matrix form

            col i    col j
G(i,j) = [ 1
             ⋱
              c   ⋯   s            ]  ← row i
         [    ⋮   ⋱   ⋮            ]
         [   −s   ⋯   c            ]  ← row j
         [              ⋱
                           1       ]  (2.77)

where c² + s² = 1. The matrix G(i,j) is clearly orthogonal. In terms of components,

G(i,j)_kl = 1   if k = l, k ≠ i, and k ≠ j,
          = c   if k = l = i or k = l = j,
          = s   if k = i, l = j,
          = −s  if k = j, l = i,
          = 0   otherwise. (2.78)
Multiplying a vector by G(i,j) only affects the i and j components. If y = G(i,j)x, then

y_k = x_k        for k ≠ i and k ≠ j,
y_i = cx_i + sx_j,
y_j = −sx_i + cx_j. (2.79)

Suppose we want to make y_j = 0. We can do this by setting

c = x_i/√(x_i² + x_j²) and s = x_j/√(x_i² + x_j²). (2.80)

With this choice for c and s, y becomes

y_k = x_k        for k ≠ i and k ≠ j,
y_i = √(x_i² + x_j²),
y_j = 0. (2.81)

Multiplying a matrix A on the left by G(i,j) only alters rows i and j. Similarly, multiplying A
on the right by G(i,j) only alters columns i and j.
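A minimal NumPy sketch (not part of the original notes) of equations (2.80)–(2.81), zeroing one component of a vector with a Givens rotation:

```python
import numpy as np

def givens(xi, xj):
    """Return (c, s) with c^2 + s^2 = 1 so that the rotation with rows
    [c, s] and [-s, c] maps (xi, xj) to (r, 0), r = sqrt(xi^2 + xj^2)."""
    r = np.hypot(xi, xj)
    if r == 0.0:
        return 1.0, 0.0               # nothing to rotate
    return xi / r, xj / r             # eq. (2.80)

x = np.array([4.0, 7.0, 3.0])
i, j = 0, 2
c, s = givens(x[i], x[j])
G = np.eye(3)                         # build G(i, j) as in eq. (2.77)
G[i, i] = G[j, j] = c
G[i, j] = s
G[j, i] = -s
print(G @ x)                          # → [5. 7. 0.]  (only components i and j change)
```

Note that component 1 passes through untouched, illustrating why Givens rotations suit sparse matrices better than Householder reflectors.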
2.3.5 Complex Givens Rotations
For the complex case we replace R in the previous section by

R = [ c   s
     −s̄   c ]  where c is real. (2.82)

It can be easily verified that R is unitary if and only if c and s satisfy

c² + |s|² = 1.

Given a 2-vector x, we want to choose R so that Rx is a multiple of e1. For R unitary, we must
have

Rx = γ‖x‖e1 where |γ| = 1. (2.83)
Multiplying equation (2.83) by Rᴴ, we get

x = RᴴRx = γ‖x‖Rᴴe1 = γ‖x‖ ( c
                              s̄ ) (2.84)

or

c = x1/(γ‖x‖) and s̄ = x2/(γ‖x‖). (2.85)

We define sign(u) for u complex by

sign(u) = u/|u| if u ≠ 0, and sign(u) = 1 if u = 0. (2.86)

If c is to be real, γ must have the form

γ = ±sign(x1).

Choosing the plus sign, c and s become

c = |x1|/‖x‖ and s = sign(x1)x̄2/‖x‖. (2.87)
If we want the complex case to reduce to the real case when x1 and x2 are real, then we can
choose γ = sign(Re(x1)). As before, we can construct G(i,j) by replacing the (i,i) and (j,j)
components of the identity matrix by c, the (i,j) component by s, and the (j,i) component by
−s̄. In the expressions for c and s in equation (2.87), we replace x1 by x_i, x2 by x_j, and ‖x‖ by
√(|x_i|² + |x_j|²).
2.3.6 QR Factorization Using Householder Reflectors
Let A be an m×n matrix with m > n. Let Q1 be a Householder matrix that maps the first column
of A into a multiple of e1. Then Q1A will have zeroes below the diagonal in the first column. Now
let

Q2 = [ 1   0
       0   Q̂2 ]

where Q̂2 is an (m−1)×(m−1) Householder matrix that will zero out the entries below the
diagonal in the second column of Q1A. Continuing in this manner, we can construct Q2, …, Q_{n−1}
so that

Q_{n−1} ⋯ Q1 A = [ R̂
                   0 ] (2.88)

where R̂ is an n×n triangular matrix. The matrices Q_k have the form

Q_k = [ I   0
        0   Q̂_k ] (2.89)
where Q̂_k is an (m−k+1)×(m−k+1) Householder matrix. If we define

Qᴴ = Q_{n−1} ⋯ Q1 and R = [ R̂
                            0 ], (2.90)

then equation (2.88) can be written

QᴴA = R. (2.91)

Moreover, since each Q_k is unitary, we have

QᴴQ = (Q_{n−1} ⋯ Q1)(Q1ᴴ ⋯ Q_{n−1}ᴴ) = I, (2.92)

i.e., Q is unitary. Therefore, equation (2.91) can be written

A = QR. (2.93)

Equation (2.93) is the desired factorization. The operation count for this factorization is approximately
mn² operations, where an operation is an addition and a multiplication. In practice it is not
necessary to construct the matrix Q explicitly. Usually only the vectors v defining each Q_k are saved.
If Q̂ is the matrix consisting of the first n columns of Q, then

A = Q̂R̂ (2.94)

where Q̂ is an m×n matrix with orthonormal columns and R̂ is an n×n upper triangular matrix.
The factorization in equation (2.94) is the reduced QR factorization.
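The procedure above can be sketched in a few lines of NumPy (an illustration, not the notes' implementation; for simplicity it accumulates Q explicitly, which production codes avoid):

```python
import numpy as np

def householder_qr(A):
    """Factor A (m x n, m >= n) as A = Q R, applying one Householder
    reflector per column as in Section 2.3.6."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q, R = np.eye(m), A.copy()
    for k in range(n):
        x = R[k:, k]
        normx = np.linalg.norm(x)
        if normx == 0.0:
            continue                                   # column already zero
        u = x.copy()
        u[0] += np.copysign(normx, x[0])               # u = x + sign(x1)||x||e1
        beta = 2.0 / np.dot(u, u)
        R[k:, :] -= beta * np.outer(u, u @ R[k:, :])   # left-multiply by Q_k
        Q[:, k:] -= beta * np.outer(Q[:, k:] @ u, u)   # accumulate Q = Q1 ... Qk
    return Q, R

A = np.random.default_rng(0).standard_normal((5, 3))
Q, R = householder_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(5)))  # True True
```

Each reflector is applied as a rank-one update, so Q_k is never formed as a matrix, in the spirit of the remark above about saving only the defining vectors.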
2.3.7 Uniqueness of the Reduced QR Factorization
In this section we will show that a matrix A of full rank has a unique reduced QR factorization if
we require that the triangular matrix R has positive diagonal elements. All other reduced QR
factorizations of A are simply related to this one with positive diagonals.
The reduced QR factorization can be written

A = (a1, a2, …, an) = (q1, q2, …, qn) [ r11  r12  ⋯  r1n
                                             r22  ⋯  r2n
                                                  ⋱   ⋮
                                                      rnn ]. (2.95)

If A has full rank, then all of the diagonal elements rjj must be nonzero. Equating columns in
equation (2.95), we get

a_j = Σ_{k=1}^{j} r_{kj} q_k = r_{jj} q_j + Σ_{k=1}^{j−1} r_{kj} q_k
or

q_j = (1/r_{jj}) ( a_j − Σ_{k=1}^{j−1} r_{kj} q_k ). (2.96)

When j = 1 equation (2.96) reduces to

q1 = a1/r11. (2.97)

Since q1 must have unit norm, it follows that

|r11| = ‖a1‖. (2.98)

Equations (2.97) and (2.98) determine q1 and r11 up to a factor having absolute value one, i.e.,
there is a d1 with |d1| = 1 such that

r11 = d1 r̂11 and q1 = q̂1/d1

where r̂11 = ‖a1‖ and q̂1 = a1/r̂11.
For j = 2, equation (2.96) becomes

q2 = (1/r22)(a2 − r12 q1).

Since the columns q1 and q2 must be orthonormal, it follows that

0 = q1ᴴq2 = (1/r22)(q1ᴴa2 − r12)

and hence that

r12 = q1ᴴa2 = d1 q̂1ᴴa2. (2.99)

Here we have used the fact that 1/d̄1 = d1. Since q2 has unit norm, it follows that

1 = ‖q2‖ = (1/|r22|)‖a2 − r12 q1‖ = (1/|r22|)‖a2 − d1(q̂1ᴴa2)q̂1/d1‖ = (1/|r22|)‖a2 − (q̂1ᴴa2)q̂1‖

and hence that

|r22| = ‖a2 − (q̂1ᴴa2)q̂1‖ ≡ r̂22.

Therefore, there exists a scalar d2 with |d2| = 1 such that

r22 = d2 r̂22 and q2 = q̂2/d2

where q̂2 = (a2 − (q̂1ᴴa2)q̂1)/r̂22.
For j = 3, equation (2.96) becomes

q3 = (1/r33)(a3 − r13 q1 − r23 q2).

Since the columns q1, q2, and q3 must be orthonormal, it follows that

0 = q1ᴴq3 = (1/r33)(q1ᴴa3 − r13),
0 = q2ᴴq3 = (1/r33)(q2ᴴa3 − r23),

and hence that

r13 = q1ᴴa3 = d1 q̂1ᴴa3,
r23 = q2ᴴa3 = d2 q̂2ᴴa3.

Since q3 has unit norm, it follows that

1 = ‖q3‖ = (1/|r33|)‖a3 − r13 q1 − r23 q2‖ = (1/|r33|)‖a3 − (q̂1ᴴa3)q̂1 − (q̂2ᴴa3)q̂2‖

and hence that

|r33| = ‖a3 − (q̂1ᴴa3)q̂1 − (q̂2ᴴa3)q̂2‖ ≡ r̂33.

Therefore, there exists a scalar d3 with |d3| = 1 such that

r33 = d3 r̂33 and q3 = q̂3/d3 (2.100)

where q̂3 = (a3 − (q̂1ᴴa3)q̂1 − (q̂2ᴴa3)q̂2)/r̂33. Continuing in this way we obtain the matrix
Q̂ = (q̂1, …, q̂n) with orthonormal columns and the triangular matrix

R̂ = [ r̂11  r̂12  ⋯  r̂1n
            r̂22  ⋯  r̂2n
                 ⋱   ⋮
                     r̂nn ]

such that A = Q̂R̂ is the unique reduced QR factorization of A with R̂ having positive diagonal
elements. If A = QR is any other reduced QR factorization of A, then

R = diag(d1, …, dn) R̂ and Q = Q̂ diag(1/d1, …, 1/dn) = Q̂ diag(d̄1, …, d̄n)

where |d1| = ⋯ = |dn| = 1.
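A small NumPy illustration of this uniqueness result (not part of the original notes): any reduced QR factorization can be normalized to the one with positive diagonal by absorbing the unimodular factors d_k (here ±1, since the matrix is real).

```python
import numpy as np

A = np.random.default_rng(1).standard_normal((6, 4))
Q, R = np.linalg.qr(A)           # reduced QR; the diagonal signs of R are arbitrary

d = np.sign(np.diag(R))          # d_k = +-1
Q_hat = Q * d                    # Q_hat = Q diag(d): scale the columns of Q
R_hat = d[:, None] * R           # R_hat = diag(d) R: scale the rows of R

print(np.allclose(Q_hat @ R_hat, A), np.all(np.diag(R_hat) > 0))  # True True
```

Since d_k² = 1, the product Q̂R̂ is unchanged, while the diagonal of R̂ becomes positive, giving the unique normalized factorization of the theorem.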
2.3.8 Solution of Least Squares Problems
In this section we will show how to use the QR factorization to solve the least squares problem.
Consider the system of linear equations

Ax = b (2.101)

where A is an m×n matrix with m > n. In general there is no solution to this system of equations.
Instead we seek to find an x so that ‖Ax − b‖ is as small as possible. In view of the QR
factorization, we have

‖Ax − b‖² = ‖QRx − b‖² = ‖Q(Rx − Qᴴb)‖² = ‖Rx − Qᴴb‖². (2.102)
We can write Q in the partitioned form Q = (Q1, Q2) where Q1 is an m×n matrix. Then

Rx − Qᴴb = [ R̂x   −  [ Q1ᴴb   =  [ R̂x − Q1ᴴb
             0  ]      Q2ᴴb ]      −Q2ᴴb      ]. (2.103)

It follows from equation (2.103) that

‖Rx − Qᴴb‖² = ‖R̂x − Q1ᴴb‖² + ‖Q2ᴴb‖². (2.104)

Combining equations (2.102) and (2.104), we get

‖Ax − b‖² = ‖R̂x − Q1ᴴb‖² + ‖Q2ᴴb‖². (2.105)

It can be easily seen from this equation that ‖Ax − b‖ is minimized when x is the solution of the
triangular system

R̂x = Q1ᴴb (2.106)
when such a solution exists. This is the standard way of solving least squares systems. Later we will
discuss the singular value decomposition (SVD), which provides even more information relative
to the least squares problem. However, the SVD is much more expensive to compute than the QR
decomposition.
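The standard procedure of equation (2.106) is easy to carry out with a library QR routine; a NumPy sketch (not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 3))          # overdetermined system, m > n
b = rng.standard_normal(8)

Q1, R_hat = np.linalg.qr(A)              # reduced QR: A = Q1 R_hat
x = np.linalg.solve(R_hat, Q1.T @ b)     # the triangular system of eq. (2.106)

# Agrees with the library least-squares solver:
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
```

(`np.linalg.solve` is used here for brevity; a dedicated back-substitution would exploit the triangular structure of R̂.)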
2.4 The Singular Value Decomposition
The Singular Value Decomposition (SVD) is one of the most important and probably one of the
least well known of the matrix factorizations. It has many applications in statistics, signal processing,
image compression, pattern recognition, weather prediction, and modal analysis to name a
few. It is also a powerful diagnostic tool. For example, it provides approximations to the rank and
the condition number of a matrix as well as providing orthonormal bases for both the range and
the null space of a matrix. It also provides optimal low rank approximations to a matrix. The SVD
is applicable to both square and rectangular matrices. In this regard it provides a general solution
to the least squares problem.
The SVD was first discovered by differential geometers in connection with the analysis of bilinear
forms. Eugenio Beltrami [1] (1873) and Camille Jordan [10] (1874) independently discovered
that the singular values of the matrix associated with a bilinear form comprise a complete set
of invariants for the form under orthogonal substitutions. The first proof of the singular value
decomposition for rectangular and complex matrices seems to be by Eckart and Young [5] in 1939.
They saw it as a generalization of the principal axis transformation for Hermitian matrices.
We will begin by deriving the SVD and presenting some of its most important properties. We will
then discuss its application to least squares problems and matrix approximation problems. Following
this we will show how singular values can be used to determine the condition of a matrix (how
close the rows or columns are to being linearly dependent). We will conclude with a brief outline
of the methods used to compute the SVD. Most of the methods are modifications of methods used
to compute eigenvalues and vectors of a square matrix. The details of the computational methods
are beyond the scope of this presentation, but we will provide references for those interested.
2.4.1 Derivation and Properties of the SVD
Theorem 1 (Singular Value Decomposition). Let A be a nonzero m×n matrix. Then there exist
an orthonormal basis u1, …, um of m-vectors, an orthonormal basis v1, …, vn of n-vectors, and
positive numbers σ1, …, σr such that

1. u1, …, ur is a basis of the range of A;

2. v_{r+1}, …, vn is a basis of the null space of A;

3. A = Σ_{k=1}^{r} σ_k u_k v_kᴴ.

Proof: AᴴA is a Hermitian n×n matrix that is positive semidefinite. Therefore, there is an
orthonormal basis v1, …, vn and nonnegative numbers σ1², …, σn² such that

AᴴA v_k = σ_k² v_k, k = 1, …, n. (2.107)

Since A is nonzero, at least one of the eigenvalues σ_k² must be positive. Let the eigenvalues be
arranged so that σ1² ≥ σ2² ≥ ⋯ ≥ σr² > 0 and σ_{r+1}² = ⋯ = σn² = 0. Consider now the vectors
Av1, …, Avn. We have

(Av_i)ᴴ(Av_j) = v_iᴴ AᴴA v_j = σ_j² v_iᴴ v_j = 0, i ≠ j, (2.108)

i.e., Av1, …, Avn are orthogonal. When i = j,

‖Av_i‖² = v_iᴴ AᴴA v_i = σ_i² v_iᴴ v_i = σ_i² > 0 for i = 1, …, r, and ‖Av_i‖² = 0 for i > r. (2.109)

Thus Av_{r+1} = ⋯ = Avn = 0, and hence v_{r+1}, …, vn belong to the null space of A. Define
u1, …, ur by

u_i = (1/σ_i) Av_i, i = 1, …, r. (2.110)
Then u1, …, ur is an orthonormal set of vectors in the range of A that spans the range of A. Thus,
u1, …, ur is a basis for the range of A. The dimension r of the range of A is called the rank of
A. If r < m, we can extend the set u1, …, ur of orthonormal vectors to an orthonormal basis
u1, …, um of m-space using the Gram-Schmidt process. If x is an n-vector, we can write x in
terms of the basis v1, …, vn as

x = Σ_{k=1}^{n} (v_kᴴx) v_k. (2.111)

It follows from equations (2.110) and (2.111) that

Ax = Σ_{k=1}^{n} (v_kᴴx) Av_k = Σ_{k=1}^{r} (v_kᴴx) σ_k u_k = Σ_{k=1}^{r} σ_k u_k v_kᴴ x. (2.112)

Since x in equation (2.112) was arbitrary, we must have

A = Σ_{k=1}^{r} σ_k u_k v_kᴴ. (2.113)

The representation of A in equation (2.113) is called the singular value decomposition (SVD). If
x belongs to the null space of A (Ax = 0), then it follows from equation (2.112) and the linear
independence of the vectors u1, …, ur that v_kᴴx = 0 for k = 1, …, r. It then follows from
equation (2.111) that

x = Σ_{k=r+1}^{n} (v_kᴴx) v_k,

i.e., v_{r+1}, …, vn span the null space of A. Since v_{r+1}, …, vn are orthonormal vectors belonging
to the null space of A, they form a basis for the null space of A.

We will now express the SVD in matrix form. Define U = (u1, …, um), V = (v1, …, vn), and
S = diag(σ1, …, σr). Then

A = U [ S  0
        0  0 ] Vᴴ. (2.114)
Generally we write the SVD in the form (2.114) with the understanding that some of the zero
portions might collapse and disappear.
We next give a geometric interpretation of the SVD. For this purpose we will restrict ourselves to
the real case. Let x be a point on the unit sphere, i.e., ‖x‖ = 1. Since u1, …, ur is a basis for the
range of A, there exist numbers y1, …, yr such that

Ax = Σ_{k=1}^{r} y_k u_k = Σ_{k=1}^{r} σ_k (v_kᵀx) u_k.

Therefore, y_k = σ_k (v_kᵀx), k = 1, …, r. Since the columns of V form an orthonormal basis, we
have

x = Σ_{k=1}^{n} (v_kᵀx) v_k.

Therefore,

‖x‖² = Σ_{k=1}^{n} (v_kᵀx)² = 1.

It follows that

y1²/σ1² + ⋯ + yr²/σr² = (v1ᵀx)² + ⋯ + (vrᵀx)² ≤ 1.
Here equality holds when r = n. Thus, the image of x lies on or interior to the hyperellipsoid
with semiaxes σ1u1, …, σrur. Conversely, if y1, …, yr satisfy

y1²/σ1² + ⋯ + yr²/σr² ≤ 1,

we define β² = 1 − Σ_{k=1}^{r} (y_k/σ_k)² and

x = Σ_{k=1}^{r} (y_k/σ_k) v_k + β v_{r+1}.

Since v_{r+1} is in the null space of A and Av_k = σ_k u_k (k ≤ r), it follows that

Ax = Σ_{k=1}^{r} (y_k/σ_k) Av_k + β Av_{r+1} = Σ_{k=1}^{r} y_k u_k.

In addition,

‖x‖² = Σ_{k=1}^{r} y_k²/σ_k² + β² = 1.
Thus, we have shown that the image of the unit sphere ‖x‖ = 1 under the mapping A is the hyperellipsoid

y1²/σ1² + ⋯ + yr²/σr² ≤ 1

relative to the basis u1, …, ur. When r = n, equality holds and the image is the surface of the
hyperellipsoid

y1²/σ1² + ⋯ + yn²/σn² = 1.
2.4.2 The SVD and Least Squares Problems
In least squares problems we seek an x that minimizes ‖Ax − b‖. In view of the singular value
decomposition, we have

‖Ax − b‖² = ‖ U [ S 0; 0 0 ] Vᴴx − b ‖² = ‖ U ( [ S 0; 0 0 ] Vᴴx − Uᴴb ) ‖²
          = ‖ [ S 0; 0 0 ] Vᴴx − Uᴴb ‖². (2.118)
If we define

y = ( y1
      y2 ) = Vᴴx (2.119)

b̂ = ( b̂1
      b̂2 ) = Uᴴb, (2.120)

then equation (2.118) can be written

‖Ax − b‖² = ‖ ( Sy1 − b̂1
                −b̂2      ) ‖² = ‖Sy1 − b̂1‖² + ‖b̂2‖². (2.121)
It is clear from equation (2.121) that ‖Ax − b‖ is minimized when y1 = S⁻¹b̂1. Therefore, the y
that minimizes ‖Ax − b‖ is given by

y = ( S⁻¹b̂1
      y2     ), y2 arbitrary. (2.122)

In view of equation (2.119), the x that minimizes ‖Ax − b‖ is given by

x = Vy = V ( S⁻¹b̂1
             y2     ), y2 arbitrary. (2.123)
Since V is unitary, it follows from equation (2.123) that

‖x‖² = ‖S⁻¹b̂1‖² + ‖y2‖².

Thus, there is a unique x of minimum norm that minimizes ‖Ax − b‖, namely the x corresponding
to y2 = 0. This x is given by

x = V ( S⁻¹b̂1
        0      ) = V [ S⁻¹ 0; 0 0 ] ( b̂1
                                      b̂2 ) = V [ S⁻¹ 0; 0 0 ] Uᴴb.

The matrix multiplying b on the right-hand side of this equation is called the generalized inverse
of A and is denoted by A⁺, i.e.,

A⁺ = V [ S⁻¹  0
         0    0 ] Uᴴ. (2.124)

Thus, the minimum norm solution of the least squares problem is given by x = A⁺b. The n×m
matrix A⁺ plays the same role in least squares problems that A⁻¹ plays in the solution of linear
equations. We will now show that this definition of the generalized inverse gives the same result
as the classical Moore-Penrose conditions.
Theorem 2. If A has a singular value decomposition given by

A = U [ S  0
        0  0 ] Vᴴ,

then the matrix X defined by

X = A⁺ = V [ S⁻¹  0
             0    0 ] Uᴴ

is the unique solution of the Moore-Penrose conditions:

1. AXA = A

2. XAX = X

3. (AX)ᴴ = AX

4. (XA)ᴴ = XA.
Proof:

AXA = U [ S 0; 0 0 ] VᴴV [ S⁻¹ 0; 0 0 ] UᴴU [ S 0; 0 0 ] Vᴴ = U [ S 0; 0 0 ][ I 0; 0 0 ] Vᴴ
    = U [ S 0; 0 0 ] Vᴴ = A,

i.e., X satisfies condition (1).

XAX = V [ S⁻¹ 0; 0 0 ] UᴴU [ S 0; 0 0 ] VᴴV [ S⁻¹ 0; 0 0 ] Uᴴ = V [ S⁻¹ 0; 0 0 ] Uᴴ = X,

i.e., X satisfies condition (2). Since

AX = U [ S 0; 0 0 ] VᴴV [ S⁻¹ 0; 0 0 ] Uᴴ = U [ I 0; 0 0 ] Uᴴ

and

XA = V [ S⁻¹ 0; 0 0 ] UᴴU [ S 0; 0 0 ] Vᴴ = V [ I 0; 0 0 ] Vᴴ,

it follows that both AX and XA are Hermitian, i.e., X satisfies conditions (3) and (4). To show
uniqueness let us suppose that both X and Y satisfy the Moore-Penrose conditions. Then

X = XAX                                  by (2)
  = X(AX)ᴴ = XXᴴAᴴ                       by (3)
  = XXᴴ(AYA)ᴴ = XXᴴAᴴYᴴAᴴ                by (1)
  = XXᴴAᴴ(AY)ᴴ = XXᴴAᴴAY                 by (3)
  = X(AX)ᴴAY = XAXAY                     by (3)
  = XAY                                  by (2)
  = X(AYA)Y                              by (1)
  = XA(YA)Y = XA(YA)ᴴY = XAAᴴYᴴY         by (4)
  = (XA)ᴴAᴴYᴴY = AᴴXᴴAᴴYᴴY               by (4)
  = (AXA)ᴴYᴴY = AᴴYᴴY                    by (1)
  = (YA)ᴴY = YAY                         by (4)
  = Y                                    by (2).

Thus, there is only one matrix X satisfying the Moore-Penrose conditions.
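A NumPy sketch (not part of the original notes) that builds A⁺ from the SVD as in equation (2.124) and checks the four Moore-Penrose conditions:

```python
import numpy as np

def pinv_svd(A, tol=1e-12):
    """Generalized inverse A+ = V S^{-1} U^H from the SVD (eq. 2.124)."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    s_inv = np.array([1.0 / si if si > tol * s[0] else 0.0 for si in s])
    return Vh.conj().T @ (s_inv[:, None] * U.conj().T)

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
X = pinv_svd(A)

print(np.allclose(A @ X @ A, A), np.allclose(X @ A @ X, X))            # True True
print(np.allclose((A @ X).T, A @ X), np.allclose((X @ A).T, X @ A))    # True True
print(np.allclose(X, np.linalg.pinv(A)))                               # True
```

The `tol` cutoff treats singular values below a relative threshold as zero, which is how the zero block of S⁻¹ in (2.124) is realized in floating point.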
2.4.3 Singular Values and the Norm of a Matrix
Let A be an m×n matrix. By virtue of the SVD, we have

Ax = Σ_{k=1}^{r} σ_k (v_kᴴx) u_k for any n-vector x. (2.125)

Since the vectors u1, …, ur are orthonormal, we have

‖Ax‖² = Σ_{k=1}^{r} σ_k² |v_kᴴx|² ≤ σ1² Σ_{k=1}^{r} |v_kᴴx|² ≤ σ1² ‖x‖². (2.126)

The last inequality comes from the fact that x has the expansion x = Σ_{k=1}^{n} (v_kᴴx) v_k in terms of
the orthonormal basis v1, …, vn and hence

‖x‖² = Σ_{k=1}^{n} |v_kᴴx|².
Thus, we have

‖Ax‖ ≤ σ1‖x‖ for all x. (2.127)

Since Av1 = σ1u1, we have ‖Av1‖ = σ1 = σ1‖v1‖. Hence,

max_{x≠0} ‖Ax‖/‖x‖ = σ1, (2.128)

i.e., A can't stretch the length of a vector by a factor greater than σ1. One of the definitions of the
norm of a matrix is

‖A‖ = sup_{x≠0} ‖Ax‖/‖x‖. (2.129)

It follows from equations (2.128) and (2.129) that ‖A‖ = σ1 (the maximum singular value of A).
If A is of full rank (r = n), then it follows by a similar argument that

min_{x≠0} ‖Ax‖/‖x‖ = σn.

If A is an m×n matrix and B is an n×p matrix, then for every p-vector x we have

‖ABx‖ ≤ ‖A‖‖Bx‖ ≤ ‖A‖‖B‖‖x‖,

and hence ‖AB‖ ≤ ‖A‖‖B‖.
2.4.4 Low Rank Matrix Approximations
You can think of the rank of a matrix as a measure of redundancy. Matrices of low rank should
have lots of redundancy and hence should be capable of specification by fewer parameters than the
total number of entries. For example, if the matrix consists of the pixel values of a digital image,
then a lower rank approximation of this image should represent a form of image compression. We
will make this concept more precise in this section.
One choice for a low rank approximation to A is the matrix A_k = Σ_{i=1}^{k} σ_i u_i v_iᴴ for k < r. A_k is
a truncated SVD expansion of A. Clearly

A − A_k = Σ_{i=k+1}^{r} σ_i u_i v_iᴴ. (2.130)

Since the largest singular value of A − A_k is σ_{k+1}, we have

‖A − A_k‖ = σ_{k+1}. (2.131)
Suppose B is another m×n matrix of rank k. Then the null space N of B has dimension n − k. Let
w1, …, w_{n−k} be a basis for N. The n + 1 vectors w1, …, w_{n−k}, v1, …, v_{k+1} must be linearly
dependent, i.e., there are constants α1, …, α_{n−k} and β1, …, β_{k+1}, not all zero, such that

Σ_{i=1}^{n−k} α_i w_i + Σ_{i=1}^{k+1} β_i v_i = 0.

Not all of the α_i can be zero, since v1, …, v_{k+1} are linearly independent. Similarly, not all of the
β_i can be zero. Therefore, the vector h defined by

h = Σ_{i=1}^{n−k} α_i w_i = − Σ_{i=1}^{k+1} β_i v_i

is a nonzero vector that belongs to both N and ⟨v1, …, v_{k+1}⟩. By proper scaling, we can
assume that h is a vector with unit norm. Since h belongs to ⟨v1, …, v_{k+1}⟩, we have

h = Σ_{i=1}^{k+1} (v_iᴴh) v_i. (2.132)
Therefore,

‖h‖² = Σ_{i=1}^{k+1} |v_iᴴh|². (2.133)

Since Av_i = σ_i u_i for i = 1, …, r, it follows from equation (2.132) that

Ah = Σ_{i=1}^{k+1} (v_iᴴh) Av_i = Σ_{i=1}^{k+1} (v_iᴴh) σ_i u_i. (2.134)

Therefore,

‖Ah‖² = Σ_{i=1}^{k+1} |v_iᴴh|² σ_i² ≥ σ_{k+1}² Σ_{i=1}^{k+1} |v_iᴴh|² = σ_{k+1}² ‖h‖². (2.135)
Since h belongs to the null space N and has unit norm, we have

‖A − B‖² ≥ ‖(A − B)h‖² = ‖Ah‖² ≥ σ_{k+1}² ‖h‖² = σ_{k+1}². (2.136)

Combining equations (2.131) and (2.136), we obtain

‖A − B‖ ≥ σ_{k+1} = ‖A − A_k‖. (2.137)

Thus, A_k is the rank-k matrix that is closest to A.
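A NumPy illustration (not part of the original notes) of the truncated expansion A_k and the error identity (2.131):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 5))
U, s, Vh = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = (U[:, :k] * s[:k]) @ Vh[:k, :]     # A_k = sum_{i=1}^{k} s_i u_i v_i^H

err = np.linalg.norm(A - A_k, 2)         # spectral norm of A - A_k
print(np.isclose(err, s[k]))             # True: equals the first discarded sigma
```

For an image stored as a matrix of pixel values, keeping only the first k terms in this way is exactly the compression scheme mentioned above: k(m + n + 1) numbers instead of mn.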
2.4.5 The Condition Number of a Matrix
Suppose A is an n×n invertible matrix and x is the solution of the system of equations Ax = b.
We want to see how sensitive x is to perturbations of the matrix A. Let x + δx be the solution to
the perturbed system (A + δA)(x + δx) = b. Expanding the left-hand side of this equation and
neglecting the second order perturbation δA δx, we get

A δx + δA x = 0 or δx = −A⁻¹ δA x. (2.138)

It follows from equation (2.138) that

‖δx‖ ≤ ‖A⁻¹‖‖δA‖‖x‖

or

(‖δx‖/‖x‖) / (‖δA‖/‖A‖) ≤ ‖A⁻¹‖‖A‖. (2.139)

The quantity ‖A⁻¹‖‖A‖ is called the condition number of A and is denoted by κ(A), i.e.,

κ(A) = ‖A⁻¹‖‖A‖.

Thus, equation (2.139) can be written

(‖δx‖/‖x‖) / (‖δA‖/‖A‖) ≤ κ(A). (2.140)
We have seen previously that ‖A‖ = σ1, the largest singular value. Since A⁻¹ has the singular
value decomposition A⁻¹ = VS⁻¹Uᴴ, it follows that ‖A⁻¹‖ = 1/σn. Therefore, the condition
number is given by

κ(A) = σ1/σn. (2.141)

The condition number is a sort of aspect ratio of the hyperellipsoid into which A maps the unit
sphere.
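Equation (2.141) can be checked directly in NumPy (an illustration, not part of the original notes):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
s = np.linalg.svd(A, compute_uv=False)   # singular values, in decreasing order

kappa = s[0] / s[-1]                     # kappa(A) = sigma_1 / sigma_n (eq. 2.141)
print(np.isclose(kappa, np.linalg.cond(A, 2)))   # True: the 2-norm condition number
```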
2.4.6 Computation of the SVD
The methods for calculating the SVD are all variations of methods used to calculate eigenvalues
and eigenvectors of Hermitian matrices. The most natural procedure would be to follow the derivation
of the SVD and compute the squares of the singular values and the unitary matrix V by solving
the eigenproblem for AᴴA. The U matrix would then be obtained from AV. Unfortunately, this
procedure is not very accurate, due to the fact that the singular values of AᴴA are the squares of the
singular values of A. As a result the ratio of largest to smallest singular value can be much larger
for AᴴA than for A. There are, however, implicit methods that solve the eigenproblem for AᴴA
without ever explicitly forming AᴴA. Most of the SVD algorithms first reduce A to bidiagonal
form (all elements zero except the diagonal and first superdiagonal). This can be accomplished
using Householder reflections alternately on the left and right as shown in Figure 2.2.
A1 = U1ᴴA =  [ x x x x       A2 = A1V1 =  [ x x 0 0
               0 x x x                      0 x x x
               0 x x x                      0 x x x
               0 x x x                      0 x x x
               0 x x x ]                    0 x x x ]

A3 = U2ᴴA2 = [ x x 0 0       A4 = A3V2 =  [ x x 0 0
               0 x x x                      0 x x 0
               0 0 x x                      0 0 x x
               0 0 x x                      0 0 x x
               0 0 x x ]                    0 0 x x ]

A5 = U3ᴴA4 = [ x x 0 0       A6 = U4ᴴA5 = [ x x 0 0
               0 x x 0                      0 x x 0
               0 0 x x                      0 0 x x
               0 0 0 x                      0 0 0 x
               0 0 0 x ]                    0 0 0 0 ]

Figure 2.2: Householder reduction of a matrix to bidiagonal form.
Since the Householder reflections applied on the right don't try to zero all the elements
to the right of the diagonal, they don't affect the zeroes already obtained in the columns. We have
seen that, even in the complex case, the Householder matrices can be chosen so that the resulting
bidiagonal matrix is real. Notice also that when the number of rows m is greater than the number
of columns n, the reduction produces zero rows after row n. Similarly, when n > m, the reduction
produces zero columns after column m. If we replace the products of the Householder reflections
by the unitary matrices Û and V̂, the reduction to a bidiagonal B can be written as

B = ÛᴴAV̂ or A = ÛBV̂ᴴ. (2.142)
If B has the SVD B = ŪΣV̄ᵀ, then A has the SVD

A = Û(ŪΣV̄ᵀ)V̂ᴴ = (ÛŪ)Σ(V̂V̄)ᴴ = UΣVᴴ,

where U = ÛŪ and V = V̂V̄. Thus, it is sufficient to find the SVD of the real bidiagonal matrix
B. Moreover, it is not necessary to carry along the zero rows or columns of B. For if the square
portion B1 of B has the SVD B1 = U1Σ1V1ᵀ, then

B = [ B1; 0 ] = [ U1Σ1V1ᵀ; 0 ] = [ U1 0; 0 I ][ Σ1; 0 ] V1ᵀ (2.143)

or

B = (B1, 0) = (U1Σ1V1ᵀ, 0) = U1(Σ1, 0) [ V1 0; 0 I ]ᵀ. (2.144)

Thus, it is sufficient to consider the computation of the SVD for a real, square, bidiagonal matrix
B.
In addition to the implicit methods for finding the eigenvalues of BᵀB, some methods look instead
at the symmetric matrix [ 0 Bᵀ; B 0 ]. If the SVD of B is B = UΣVᵀ, then [ 0 Bᵀ; B 0 ] has the
eigenequation

[ 0  Bᵀ   [ V   V     =  [ V   V    [ Σ   0
  B  0  ]   U  −U ]        U  −U ]    0  −Σ ]. (2.145)

In addition, the matrix [ 0 Bᵀ; B 0 ] can be reduced to a real tridiagonal matrix T by the relation

T = Pᵀ [ 0  Bᵀ
         B  0  ] P (2.146)

where P = (e1, e_{n+1}, e2, e_{n+2}, …, en, e_{2n}) is a permutation matrix formed by a rearrangement
of the columns e1, e2, …, e_{2n} of the 2n×2n identity matrix. The matrix P is unitary and is
sometimes called the perfect shuffle, since its operation on a vector mimics a perfect card shuffle of
the components. The algorithms based on this double-size symmetric matrix don't actually form
the double-size matrix, but make efficient use of the symmetries involved in this eigenproblem.
For those interested in the details of the various SVD algorithms, I would refer you to the book by
Demmel [4].
In Matlab the SVD can be obtained by the call [U,S,V]=svd(A). In LAPACK the general driver
routines for the SVD are SGESVD, DGESVD, and CGESVD depending on whether the matrix is
real single precision, real double precision, or complex.
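The NumPy analogue of the Matlab call above is `np.linalg.svd` (an aside, not part of the original notes):

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
U, s, Vh = np.linalg.svd(A, full_matrices=False)   # reduced form: A = U diag(s) Vh

print(np.allclose(U @ np.diag(s) @ Vh, A))   # True
print(np.all(np.diff(s) <= 0))               # True: singular values sorted decreasing
```

Note that, unlike Matlab's `svd`, NumPy returns the singular values as a vector `s` and the factor `Vh` already transposed (conjugate-transposed in the complex case).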
Chapter 3
Eigenvalue Problems
Eigenvalue problems occur quite often in physics. For example, in quantum mechanics eigenvalues
correspond to certain energy states; in structural mechanics problems eigenvalues often
correspond to resonance frequencies of the structure; and in time evolution problems eigenvalues
are often related to the stability of the system.

Let A be an m×m square matrix. A nonzero vector x is an eigenvector of A, and λ is its corresponding
eigenvalue, if

Ax = λx.

The set of vectors

V_λ = {x : Ax = λx}

is a subspace called the eigenspace corresponding to λ. The equation Ax = λx is equivalent to
(A − λI)x = 0. If λ is an eigenvalue, then the matrix A − λI is singular and hence

det(A − λI) = 0.

Thus, the eigenvalues of A are roots of a polynomial equation of order m. This polynomial equation
is called the characteristic equation of A. Conversely, if p(z) = a0 + a1z + ⋯ + a_{n−1}z^{n−1} + a_n z^n
is an arbitrary polynomial of degree n (a_n ≠ 0), then the companion matrix

[ 0               −a0/an
  1  0            −a1/an
     1  0         −a2/an
        ⋱  ⋱       ⋮
           1  0   −a_{n−2}/an
              1   −a_{n−1}/an ]

has p(z) = 0 as its characteristic equation.
In some problems an eigenvalue might correspond to a multiple root of the characteristic equation. The multiplicity of the root $\lambda$ is called its algebraic multiplicity. The dimension of the eigenspace $V_\lambda$ is called its geometric multiplicity. If for some eigenvalue $\lambda$ of $A$ the algebraic multiplicity of $\lambda$ does not equal its geometric multiplicity, this eigenvalue is said to be defective. A matrix with one or more defective eigenvalues is said to be a defective matrix. An example of a defective matrix is the matrix

$$\begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}.$$

This matrix has the single eigenvalue 2 with algebraic multiplicity 3. However, the eigenspace corresponding to the eigenvalue 2 has dimension 1: all the eigenvectors are multiples of $e_1$. In these notes we will only consider eigenvalue problems involving Hermitian matrices ($A^H = A$). We will see that all such matrices are non-defective.
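As a quick numerical check of this example (the code is illustrative only), the geometric multiplicity is the dimension of the null space of $A - 2I$, i.e., $n$ minus the rank of $A - 2I$:

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 2.0]])

# Algebraic multiplicity: 2 is a triple root of det(A - z I) = (2 - z)^3.
eigenvalues = np.linalg.eigvals(A)

# Geometric multiplicity: dim null(A - 2I) = n - rank(A - 2I).
n = A.shape[0]
geometric_multiplicity = n - np.linalg.matrix_rank(A - 2.0 * np.eye(n))
```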
If $S$ is a nonsingular $m \times m$ matrix, then the matrix $S^{-1}AS$ is said to be similar to $A$. Since

$$\det(S^{-1}AS - \lambda I) = \det\bigl(S^{-1}(A - \lambda I)S\bigr) = \det(S^{-1}) \det(A - \lambda I) \det(S) = \det(A - \lambda I),$$

it follows that $S^{-1}AS$ and $A$ have the same characteristic equation and hence the same eigenvalues. It can be shown that a Hermitian matrix $A$ always has a complete set of orthonormal eigenvectors. If we form the unitary matrix $U$ whose columns are the eigenvectors belonging to this orthonormal set, then

$$AU = U\Lambda \quad \text{or} \quad U^H A U = \Lambda \tag{3.1}$$

where $\Lambda$ is a diagonal matrix whose diagonal entries are the eigenvalues. Thus, a Hermitian matrix is similar to a diagonal matrix. Since a diagonal matrix is clearly non-defective, it follows that all Hermitian matrices are non-defective.
If $e$ is a unit eigenvector of the Hermitian matrix $A$ and $\lambda$ is the corresponding eigenvalue, then

$$Ae = \lambda e \quad \text{and hence} \quad \lambda = e^H A e.$$

It follows that $\bar{\lambda} = (e^H A e)^H = e^H A^H e = e^H A e = \lambda$, i.e., the eigenvalues of a Hermitian matrix are real.
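Both facts — that $U^H A U$ is diagonal, equation (3.1), and that the eigenvalues are real — are easy to verify numerically. A minimal sketch using NumPy's `eigh` (the random Hermitian matrix is an arbitrary example of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = (B + B.conj().T) / 2          # Hermitian by construction: A^H = A

lam, U = np.linalg.eigh(A)        # eigh assumes and exploits A^H = A
Lambda = U.conj().T @ A @ U       # should equal diag(lam), cf. (3.1)
```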
It was shown by Abel, Galois, and others in the nineteenth century that there can be no algebraic expression for the roots of a polynomial equation whose order is greater than four. Since eigenvalues are roots of the characteristic equation, and since the roots of any polynomial are the eigenvalues of some matrix, there can be no purely algebraic method for computing eigenvalues. Thus, algorithms for finding eigenvalues must at some stage be iterative in nature. The methods to be discussed here first reduce the Hermitian matrix $A$ to a real, symmetric, tridiagonal matrix $T$ by means of a unitary similarity transformation. The eigenvalues of $T$ are then found using certain iterative procedures. The most common iterative procedures are the $QR$ algorithm and the divide-and-conquer algorithm.
Let $v_1, \dots, v_n$ be the orthonormal eigenvectors of $A$ and let $\lambda_1, \dots, \lambda_n$ be the corresponding eigenvalues. We will assume that the eigenvalues and eigenvectors are so ordered that

$$|\lambda_1| \geq |\lambda_2| \geq \cdots \geq |\lambda_n|.$$

We will assume further that $|\lambda_1| > |\lambda_2|$. Let $v$ be an arbitrary vector with $\|v\| = 1$. Then there exist constants $c_1, \dots, c_n$ such that

$$v = c_1 v_1 + \cdots + c_n v_n. \tag{3.2}$$

We will make the further assumption that $c_1 \neq 0$. Successively applying $A$ to equation (3.2), we obtain

$$A^k v = c_1 A^k v_1 + \cdots + c_n A^k v_n = c_1 \lambda_1^k v_1 + \cdots + c_n \lambda_n^k v_n. \tag{3.3}$$

You can see from equation (3.3) that the term $c_1 \lambda_1^k v_1$ will eventually dominate, and thus $A^k v$, if properly scaled at each step to prevent overflow, will approach a multiple of the eigenvector $v_1$. This convergence can be slow if there are other eigenvalues close in magnitude to $\lambda_1$. The condition $c_1 \neq 0$ is equivalent to the condition

$$\langle v \rangle \cap \langle v_2, \dots, v_n \rangle = \{0\}.$$
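The resulting power method, with rescaling at each step to prevent overflow, can be sketched as follows (the function name, test matrix, and iteration count are illustrative choices of mine):

```python
import numpy as np

def power_method(A, num_iters=200, seed=0):
    """Power iteration: repeatedly apply A and rescale to unit norm.

    Returns an approximate dominant eigenpair (lambda_1, v_1).
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)            # the starting vector v of equation (3.2)
    for _ in range(num_iters):
        w = A @ v                     # one application of A
        v = w / np.linalg.norm(w)     # rescale to prevent overflow
    lam = v @ A @ v                   # Rayleigh quotient estimate of lambda_1
    return lam, v

# Symmetric test matrix with a well-separated dominant eigenvalue
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
lam, v = power_method(A)
```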
3.3 The Rayleigh Quotient
The Rayleigh quotient of a vector $x$ is the real number

$$r(x) = \frac{x^T A x}{x^T x}.$$
If $x$ is an eigenvector of $A$ corresponding to the eigenvalue $\lambda$, then $r(x) = \lambda$. If $x$ is any nonzero vector, then

$$\begin{aligned}
\|Ax - \lambda x\|^2 &= (x^T A^T - \lambda x^T)(Ax - \lambda x) \\
&= x^T A^T A x - 2\lambda x^T A x + \lambda^2 x^T x \\
&= x^T A^T A x - 2\lambda r(x)\, x^T x + \lambda^2 x^T x + r^2(x)\, x^T x - r^2(x)\, x^T x \\
&= x^T A^T A x + x^T x \bigl[\lambda - r(x)\bigr]^2 - r^2(x)\, x^T x.
\end{aligned}$$

Thus, $\lambda = r(x)$ minimizes $\|Ax - \lambda x\|$. If $x$ is an approximate eigenvector, then $r(x)$ is an approximate eigenvalue.
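A quick numerical illustration that $\lambda = r(x)$ minimizes $\|Ax - \lambda x\|$ (the matrix and vector below are arbitrary choices of mine):

```python
import numpy as np

def rayleigh_quotient(A, x):
    """r(x) = x^T A x / (x^T x) for real symmetric A and nonzero x."""
    return (x @ A @ x) / (x @ x)

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
x = np.array([1.0, 0.5])              # an arbitrary nonzero (non-eigen) vector
r = rayleigh_quotient(A, x)

def residual(mu):
    """||Ax - mu*x|| as a function of the scalar shift mu."""
    return np.linalg.norm(A @ x - mu * x)
```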
3.4 Inverse Iteration with Shifts
For any $\mu$ that is not an eigenvalue of $A$, the matrix $(A - \mu I)^{-1}$ has the same eigenvectors as $A$ and has eigenvalues $(\lambda_j - \mu)^{-1}$, where $\{\lambda_j\}$ are the eigenvalues of $A$. Suppose $\mu$ is close to the eigenvalue $\lambda_i$. Then $(\lambda_i - \mu)^{-1}$ will be large compared to $(\lambda_j - \mu)^{-1}$ for $j \neq i$. If we apply power iteration to $(A - \mu I)^{-1}$, the process will converge to a multiple of the eigenvector $v_i$ corresponding to $\lambda_i$. This procedure is called inverse iteration with shifts. Although the power method is not used in practice, the inverse power method with shifts is frequently used to compute eigenvectors once an approximate eigenvalue has been obtained.
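A minimal sketch of inverse iteration with a fixed shift (solving with the shifted matrix rather than forming its inverse; the shift and test matrix are arbitrary choices of mine):

```python
import numpy as np

def inverse_iteration(A, mu, num_iters=50, seed=0):
    """Power iteration applied to (A - mu*I)^{-1}.

    Converges to the eigenvector whose eigenvalue is closest to the
    shift mu (assumed not itself an eigenvalue of A).
    """
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    M = A - mu * np.eye(n)
    for _ in range(num_iters):
        w = np.linalg.solve(M, v)     # solve rather than invert explicitly
        v = w / np.linalg.norm(w)
    return v

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
v = inverse_iteration(A, mu=1.2)      # shift near the smallest eigenvalue
lam = v @ A @ v                       # Rayleigh quotient of the result
```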
3.5 Rayleigh Quotient Iteration
The Rayleigh quotient can be used to obtain the shifts at each stage of inverse iteration. The
procedure can be summarized as follows.
1. Choose a starting vector $v^{(0)}$ of unit magnitude.

2. Let $\lambda^{(0)} = (v^{(0)})^T A v^{(0)}$ be the corresponding Rayleigh quotient.

3. For $k = 1, 2, \dots$:

   (a) Solve $\bigl(A - \lambda^{(k-1)} I\bigr) w = v^{(k-1)}$ for $w$, i.e., compute $\bigl(A - \lambda^{(k-1)} I\bigr)^{-1} v^{(k-1)}$.

   (b) Normalize $w$ to obtain $v^{(k)} = w/\|w\|$.

   (c) Let $\lambda^{(k)} = (v^{(k)})^T A v^{(k)}$ be the corresponding Rayleigh quotient.
It can be shown that the convergence of Rayleigh quotient iteration is ultimately cubic. Cubic
convergence triples the number of significant digits on each iteration.
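The steps above can be sketched as follows (the starting vector and the safeguard against an exactly singular shift are my own choices):

```python
import numpy as np

def rayleigh_quotient_iteration(A, v0, num_iters=20):
    """Inverse iteration in which the shift is updated to the current
    Rayleigh quotient at every step (ultimately cubically convergent)."""
    n = A.shape[0]
    v = v0 / np.linalg.norm(v0)
    lam = v @ A @ v                        # lambda^(0)
    for _ in range(num_iters):
        try:
            w = np.linalg.solve(A - lam * np.eye(n), v)
        except np.linalg.LinAlgError:
            break                          # the shift hit an eigenvalue exactly
        v = w / np.linalg.norm(w)          # v^(k)
        lam = v @ A @ v                    # lambda^(k)
    return lam, v

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
lam, v = rayleigh_quotient_iteration(A, v0=np.array([1.0, 0.1, 0.0]))
```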
3.6 The Basic QR Method
The QR method was discovered independently by Francis [6] and Kublanovskaya [11] in 1961.
It is one of the standard methods for finding eigenvalues. The discussion in this section is based
largely on the paper Understanding the QR Algorithm by Watkins [13]. As before, we will assume that the matrix $A$ is real and symmetric. Therefore, there is an orthonormal basis $v_1, \dots, v_n$ such that $Av_j = \lambda_j v_j$ for each $j$. We will assume that the eigenvalues $\lambda_j$ are ordered so that $|\lambda_1| \geq |\lambda_2| \geq \cdots \geq |\lambda_n|$.
The QR algorithm can be summarized as follows:
1. Choose $A_0 = A$.

2. For $m = 1, 2, \dots$ compute

   $$A_{m-1} = Q_m R_m \qquad (QR \text{ factorization})$$
   $$A_m = R_m Q_m$$

3. Stop when $A_m$ is approximately diagonal.
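The basic (unshifted) $QR$ iteration translates directly into code (a minimal sketch with an arbitrary test matrix; a practical implementation would first reduce $A$ to tridiagonal form and use shifts, as noted earlier):

```python
import numpy as np

def qr_algorithm(A, num_iters=500):
    """Unshifted QR iteration: factor A_{m-1} = Q_m R_m, then set A_m = R_m Q_m."""
    Am = A.copy()
    for _ in range(num_iters):
        Q, R = np.linalg.qr(Am)
        Am = R @ Q               # similar to A_{m-1}: R Q = Q^T A_{m-1} Q
    return Am

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
Am = qr_algorithm(A)             # approximately diagonal; diagonal = eigenvalues
```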
It is probably not obvious what this algorithm has to do with eigenvalues. We will show that the QR
method is a way of organizing simultaneous iteration, which in turn is a multivector generalization
of the power method.
We can apply the power method to subspaces as well as to single vectors. Suppose $S$ is a $k$-dimensional subspace. We can compute the sequence of subspaces $S, AS, A^2 S, \dots$. Under certain conditions this sequence will converge to the subspace spanned by the eigenvectors $v_1, v_2, \dots, v_k$ corresponding to the $k$ largest eigenvalues of $A$. We will not provide a rigorous convergence proof, but we will attempt to make this result seem plausible. Assume that $|\lambda_k| > |\lambda_{k+1}|$ and define the subspaces

$$T = \langle v_1, \dots, v_k \rangle \qquad U = \langle v_{k+1}, \dots, v_n \rangle.$$

We will first show that all the null vectors of $A$ lie in $U$. Suppose $v$ is a null vector of $A$, i.e., $Av = 0$. We can expand $v$ in terms of the basis $v_1, \dots, v_n$, giving

$$v = c_1 v_1 + \cdots + c_k v_k + c_{k+1} v_{k+1} + \cdots + c_n v_n.$$

Thus,

$$Av = c_1 \lambda_1 v_1 + \cdots + c_k \lambda_k v_k + c_{k+1} \lambda_{k+1} v_{k+1} + \cdots + c_n \lambda_n v_n = 0.$$

Since the vectors $\{v_j\}$ are linearly independent and $|\lambda_1| \geq \cdots \geq |\lambda_k| > 0$, it follows that $c_1 = c_2 = \cdots = c_k = 0$, i.e., $v$ belongs to the subspace $U$. We will now make the additional assumption $S \cap U = \{0\}$. This assumption is analogous to the assumption $c_1 \neq 0$ in the power method. If $x$ is a nonzero vector in $S$, then we can write
$$x = \underbrace{c_1 v_1 + c_2 v_2 + \cdots + c_k v_k}_{\text{component in } T} + \underbrace{c_{k+1} v_{k+1} + \cdots + c_n v_n}_{\text{component in } U}.$$

Thus,

$$A^m x / \lambda_k^m = c_1 (\lambda_1/\lambda_k)^m v_1 + \cdots + c_{k-1} (\lambda_{k-1}/\lambda_k)^m v_{k-1} + c_k v_k + c_{k+1} (\lambda_{k+1}/\lambda_k)^m v_{k+1} + \cdots + c_n (\lambda_n/\lambda_k)^m v_n.$$
Since $x$ doesn't belong to $U$, at least one of the coefficients $c_1, \dots, c_k$ must be nonzero. Notice that the first $k$ terms on the right-hand side do not decrease in absolute value as $m \to \infty$, whereas the remaining terms approach zero. Thus, $A^m x$, if properly scaled, approaches the subspace $T$ as $m \to \infty$. In the limit $A^m S$ must approach a subspace of $T$. Since $S \cap U = \{0\}$, $A$ can have no null vectors in $S$. Thus, $A$ is invertible on $S$. It follows that all of the subspaces $A^m S$ have dimension $k$, and hence the limit cannot be a proper subspace of $T$, i.e., $A^m S \to T$ as $m \to \infty$.
Numerically, we can't iterate on an entire subspace. Therefore, we pick a basis of this subspace and iterate on this basis. Let $q_1^0, \dots, q_k^0$ be a basis of $S$. Since $A$ is invertible on $S$, $Aq_1^0, \dots, Aq_k^0$ is a basis of $AS$. Similarly, $A^m q_1^0, \dots, A^m q_k^0$ is a basis of $A^m S$ for all $m$. Thus, in principle we can iterate on a basis of $S$ to obtain bases for $AS, A^2 S, \dots$. However, for large $m$ these bases become ill-conditioned, since all the vectors tend to point in the direction of the eigenvector corresponding to the eigenvalue of largest absolute value. To avoid this we orthonormalize the basis at each step. Thus, given an orthonormal basis $q_1^m, \dots, q_k^m$ of $A^m S$, we compute $Aq_1^m, \dots, Aq_k^m$ and then orthonormalize these vectors (using something like the Gram-Schmidt process) to obtain an orthonormal basis $q_1^{m+1}, \dots, q_k^{m+1}$ of $A^{m+1} S$. This process is called simultaneous iteration. Notice that this process of orthonormalization has the property

$$\langle Aq_1^m, \dots, Aq_i^m \rangle = \langle q_1^{m+1}, \dots, q_i^{m+1} \rangle \quad \text{for } i = 1, \dots, k.$$
Let us consider now what happens when we apply simultaneous iteration to the complete set of orthonormal vectors $e_1, \dots, e_n$, where $e_k$ is the $k$-th column of the identity matrix. Let us define

$$S_k = \langle e_1, \dots, e_k \rangle, \qquad T_k = \langle v_1, \dots, v_k \rangle, \qquad U_k = \langle v_{k+1}, \dots, v_n \rangle$$

for $k = 1, 2, \dots, n-1$. We also assume that $S_k \cap U_k = \{0\}$ and $|\lambda_k| > |\lambda_{k+1}| > 0$ for each $1 \leq k \leq n-1$. It follows from our previous discussion that $A^m S_k \to T_k$ as $m \to \infty$. In terms of bases, the orthonormal vectors $q_1^m, \dots, q_n^m$ will converge to an orthonormal basis $q_1, \dots, q_n$ such that $T_k = \langle q_1, \dots, q_k \rangle$ for each $k = 1, \dots, n-1$. Each of the subspaces $T_k$ is invariant under $A$, i.e., $AT_k \subseteq T_k$. We will now look at a property of invariant subspaces. Suppose $T$ is an invariant subspace of $A$. Let $Q = (Q_1, Q_2)$ be an orthogonal matrix such that the columns of $Q_1$ form a basis of $T$. Then

$$Q^T A Q = \begin{pmatrix} Q_1^T A Q_1 & Q_1^T A Q_2 \\ Q_2^T A Q_1 & Q_2^T A Q_2 \end{pmatrix} = \begin{pmatrix} Q_1^T A Q_1 & 0 \\ 0 & Q_2^T A Q_2 \end{pmatrix},$$

i.e., the basis consisting of the columns of $Q$ block diagonalizes $A$. Let $Q$ be the matrix with columns $q_1, \dots, q_n$. Since each $T_k$ is invariant under $A$, the matrix $Q^T A Q$ has the block diagonal form

$$Q^T A Q = \begin{pmatrix} A_1 & 0 \\ 0 & A_2 \end{pmatrix} \quad \text{where } A_1 \text{ is } k \times k$$

for each $k = 1, \dots, n-1$. Therefore, $Q^T A Q$ must be diagonal. The diagonal entries are the eigenvalues of $A$. If we define $A_m = Q_m^T A Q_m$ where $Q_m = (q_1^m, \dots, q_n^m)$, then $A_m$ will become approximately diagonal for large $m$.
We can summarize simultaneous iteration as follows:

1. We start with the orthogonal matrix $Q_0 = I$, whose columns form a basis of $n$-space.

2. For $m = 1, 2, \dots$ we compute

$$Z_m = A Q_{m-1} \qquad \text{power iteration step} \tag{3.4a}$$
$$Z_m = Q_m R_m \qquad \text{orthonormalize columns of } Z_m \tag{3.4b}$$
$$A_m = Q_m^T A Q_m \qquad \text{test for diagonal matrix} \tag{3.4c}$$
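These three steps translate directly into code (a minimal sketch of mine; the QR factorization plays the role of the orthonormalization in (3.4b)):

```python
import numpy as np

def simultaneous_iteration(A, num_iters=500):
    """Steps (3.4a)-(3.4c): multiply by A, re-orthonormalize via QR,
    and form Q_m^T A Q_m, which should approach a diagonal matrix."""
    n = A.shape[0]
    Q = np.eye(n)                     # Q_0 = I
    for _ in range(num_iters):
        Z = A @ Q                     # (3.4a) power iteration step
        Q, _ = np.linalg.qr(Z)        # (3.4b) orthonormalize columns of Z_m
    return Q.T @ A @ Q, Q             # (3.4c) test for diagonal matrix

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
Am, Q = simultaneous_iteration(A)
```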
The $QR$ algorithm is an efficient way to organize these calculations. Equations (3.4a) and (3.4b) can be combined to give

$$A Q_{m-1} = Q_m R_m. \tag{3.5}$$

Combining equations (3.4c) and (3.5), we get

$$A_{m-1} = Q_{m-1}^T A Q_{m-1} = Q_{m-1}^T (Q_m R_m) = (Q_{m-1}^T Q_m) R_m = \hat{Q}_m R_m \tag{3.6}$$

where $\hat{Q}_m = Q_{m-1}^T Q_m$. Equation (3.5) can be rewritten as

$$Q_m^T A Q_{m-1} = R_m. \tag{3.7}$$

Combining equations (3.4c) and (3.7), we get

$$A_m = Q_m^T A Q_m = (Q_m^T A Q_{m-1}) Q_{m-1}^T Q_m = R_m (Q_{m-1}^T Q_m) = R_m \hat{Q}_m. \tag{3.8}$$

Equation (3.6) is a $QR$ factorization of $A_{m-1}$. Equation (3.8) shows that $A_m$ has the same $Q$ and $R$ factors but with their order reversed. Thus, the $QR$ algorithm generates the matrices $A_m$ recursively without having to compute $Z_m$ and $Q_m$ at each step. Note that the orthogonal matrices $\hat{Q}_m$ and $Q_m$ satisfy the relation

$$\hat{Q}_1 \hat{Q}_2 \cdots \hat{Q}_k = (Q_0^T Q_1)(Q_1^T Q_2) \cdots (Q_{k-1}^T Q_k) = Q_k.$$
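This equivalence can be checked numerically. The sketch below (my own check) runs the two recursions side by side; the iterates agree up to the sign ambiguity in the computed QR factorizations, which can flip signs of off-diagonal entries but leaves the diagonal unchanged:

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
n = A.shape[0]

Am_qr = A.copy()              # QR-algorithm iterate A_m
Q_sim = np.eye(n)             # simultaneous-iteration factor Q_m (Q_0 = I)
for _ in range(5):
    Qhat, R = np.linalg.qr(Am_qr)
    Am_qr = R @ Qhat                      # A_m = R_m Q_m, equation (3.8)
    Q_sim, _ = np.linalg.qr(A @ Q_sim)    # orthonormalize A Q_{m-1}, (3.4a)-(3.4b)
Am_sim = Q_sim.T @ A @ Q_sim              # A_m = Q_m^T A Q_m, (3.4c)
```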
We have now seen that the $QR$ method can be considered as a generalization of the power method. We will see that the $QR$ algorithm is also related to inverse power iteration. In fact, we have the following duality result.
Theorem 3. If $A$ is an $n \times n$ symmetric nonsingular matrix and $S$ and $S^\perp$ are orthogonal complementary subspaces, then $A^m S$ and $A^{-m} S^\perp$ are also orthogonal complements.

Proof. If $x$ and $y$ are $n$-vectors, then

$$x \cdot y = x^T y = x^T A^T (A^T)^{-1} y = (Ax)^T (A^T)^{-1} y = (Ax)^T A^{-1} y = Ax \cdot A^{-1} y.$$

Applying this result repeatedly, we obtain

$$x \cdot y = A^m x \cdot A^{-m} y.$$
It is clear from this relation that every element in $A^m S$ is orthogonal to every element in $A^{-m} S^\perp$. Let $q_1, \dots, q_k$ be a basis of $S$ and let $q_{k+1}, \dots, q_n$ be a basis of $S^\perp$. Then $A^m q_1, \dots, A^m q_k$ is a basis of $A^m S$ and $A^{-m} q_{k+1}, \dots, A^{-m} q_n$ is a basis of $A^{-m} S^\perp$. Suppose there exist scalars $c_1, \dots, c_n$ such that

$$c_1 A^m q_1 + \cdots + c_k A^m q_k + c_{k+1} A^{-m} q_{k+1} + \cdots + c_n A^{-m} q_n = 0. \tag{3.9}$$

Taking the dot product of this relation with $c_1 A^m q_1 + \cdots + c_k A^m q_k$, we obtain

$$\|c_1 A^m q_1 + \cdots + c_k A^m q_k\| = 0$$

and hence $c_1 A^m q_1 + \cdots + c_k A^m q_k = 0$. Since $A^m q_1, \dots, A^m q_k$ are linearly independent, it follows that $c_1 = c_2 = \cdots = c_k = 0$. In a similar manner we obtain $c_{k+1} = \cdots = c_n = 0$. Therefore, $A^m q_1, \dots, A^m q_k, A^{-m} q_{k+1}, \dots, A^{-m} q_n$ are linearly independent and hence form a basis for $n$-space. Thus, $A^m S$ and $A^{-m} S^\perp$ are orthogonal complements.
It can be seen from this theorem that performing power iteration on the subspaces $S_k$ is also performing inverse power iteration on $S_k^\perp$. Since

$$\langle q_1^m, \dots, q_k^m \rangle = \langle A^m e_1, \dots, A^m e_k \rangle,$$

Theorem 3 implies that

$$\langle q_{k+1}^m, \dots, q_n^m \rangle = \langle A^{-m} e_{k+1}, \dots, A^{-m} e_n \rangle.$$

For $k = n-1$ we have $\langle q_n^m \rangle = \langle A^{-m} e_n \rangle$. Thus, $q_n^m$ is the result at the $m$-th step of applying the inverse power method to $e_n$. It follows that $q_n^m$ should converge to an eigenvector corresponding to the smallest eigenvalue in absolute value.