
Notes on Numerical Linear Algebra

    Dr. George W Benthien

    December 9, 2006

    E-mail: [email protected]

  • 8/11/2019 NumLinAlg

    2/72

    Contents

    Preface 5

    1 Mathematical Preliminaries 6

    1.1 Matrices and Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

    1.2 Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

    1.2.1 Linear Independence and Bases . . . . . . . . . . . . . . . . . . . . . . . 8

    1.2.2 Inner Product and Orthogonality . . . . . . . . . . . . . . . . . . . . . . . 8

    1.2.3 Matrices As Linear Transformations . . . . . . . . . . . . . . . . . . . . . 9

    1.3 Derivatives of Vector Functions. . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.3.1 Newton's Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

    2 Solution of Systems of Linear Equations 11

    2.1 Gaussian Elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

    2.1.1 The Basic Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

    2.1.2 Row Pivoting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

    2.1.3 Iterative Refinement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

    2.2 Cholesky Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

    2.3 Elementary Unitary Matrices and the QR Factorization . . . . . . . . . . . . . . . 19

    2.3.1 Gram-Schmidt Orthogonalization . . . . . . . . . . . . . . . . . . . . . . 19


    2.3.2 Householder Reflections . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

    2.3.3 Complex Householder Matrices . . . . . . . . . . . . . . . . . . . . . . . 22

    2.3.4 Givens Rotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

    2.3.5 Complex Givens Rotations . . . . . . . . . . . . . . . . . . . . . . . . . . 27

    2.3.6 QR Factorization Using Householder Reflectors. . . . . . . . . . . . . . . 28

    2.3.7 Uniqueness of the Reduced QR Factorization . . . . . . . . . . . . . . . . 29

    2.3.8 Solution of Least Squares Problems . . . . . . . . . . . . . . . . . . . . . 32

    2.4 The Singular Value Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . 32

    2.4.1 Derivation and Properties of the SVD . . . . . . . . . . . . . . . . . . . . 33

    2.4.2 The SVD and Least Squares Problems . . . . . . . . . . . . . . . . . . . . 36

    2.4.3 Singular Values and the Norm of a Matrix . . . . . . . . . . . . . . . . . . 39

    2.4.4 Low Rank Matrix Approximations. . . . . . . . . . . . . . . . . . . . . . 39

    2.4.5 The Condition Number of a Matrix . . . . . . . . . . . . . . . . . . . . . 41

    2.4.6 Computation of the SVD . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

    3 Eigenvalue Problems 44

    3.1 Reduction to Tridiagonal Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

    3.2 The Power Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

    3.3 The Rayleigh Quotient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

    3.4 Inverse Iteration with Shifts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

    3.5 Rayleigh Quotient Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

    3.6 The Basic QR Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

    3.6.1 The QR Method with Shifts . . . . . . . . . . . . . . . . . . . . . . . . . 52

    3.7 The Divide-and-Conquer Method. . . . . . . . . . . . . . . . . . . . . . . . . . . 55

    4 Iterative Methods 61


    4.1 The Lanczos Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

    4.2 The Conjugate Gradient Method . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

    4.3 Preconditioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

    Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71


    List of Figures

    2.1 Householder reflection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

    2.2 Householder reduction of a matrix to bidiagonal form. . . . . . . . . . . . . . . . 42

3.1 Graph of $f(\lambda) = 1 + \frac{.5}{1-\lambda} + \frac{.5}{2-\lambda} + \frac{.5}{3-\lambda} + \frac{.5}{4-\lambda}$ . . . . . . . . . . . . . . . . . . . 58

3.2 Graph of $f(\lambda) = 1 + \frac{.5}{1-\lambda} + \frac{.01}{2-\lambda} + \frac{.5}{3-\lambda} + \frac{.5}{4-\lambda}$ . . . . . . . . . . . . . . . . . . . 59


    Preface

The purpose of these notes is to present some of the standard procedures of numerical linear algebra from the perspective of a user and not a computer specialist. You will not find extensive error analysis or programming details. The purpose is to give the user a general idea of what the numerical procedures are doing. You can find more extensive discussions in the references

Applied Numerical Linear Algebra by J. Demmel, SIAM 1997
Numerical Linear Algebra by L. Trefethen and D. Bau, SIAM 1997
Matrix Computations by G. Golub and C. Van Loan, Johns Hopkins University Press 1996

The notes are divided into four chapters. The first chapter presents some of the notation used in this paper and reviews some of the basic results of Linear Algebra. The second chapter discusses methods for solving linear systems of equations, the third chapter discusses eigenvalue problems, and the fourth discusses iterative methods. Of course we cannot discuss every possible method, so I have tried to pick out those that I believe are the most used. I have assumed that the user has some basic knowledge of linear algebra.


    Chapter 1

    Mathematical Preliminaries

In this chapter we will describe some of the notation that will be used in these notes and review some of the basic results from Linear Algebra.

    1.1 Matrices and Vectors

A matrix is a two-dimensional array of real or complex numbers arranged in rows and columns. If a matrix $A$ has $m$ rows and $n$ columns, we say that it is an $m \times n$ matrix. We denote the element in the $i$-th row and $j$-th column of $A$ by $a_{ij}$. The matrix $A$ is often written in the form
\[
A = \begin{pmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{m1} & \cdots & a_{mn} \end{pmatrix}.
\]
We sometimes write $A = (a_1, \ldots, a_n)$ where $a_1, \ldots, a_n$ are the columns of $A$. A vector (or $n$-vector) is an $n \times 1$ matrix. The collection of all $n$-vectors is denoted by $\mathbb{R}^n$ if the elements (components) are all real and by $\mathbb{C}^n$ if the elements are complex. We define the sum of two $m \times n$ matrices componentwise, i.e., the $i,j$ entry of $A + B$ is $a_{ij} + b_{ij}$. Similarly, we define the multiplication of a scalar $\alpha$ times a matrix $A$ to be the matrix whose $i,j$ component is $\alpha a_{ij}$.

If $A$ is a real matrix with components $a_{ij}$, then the transpose of $A$ (denoted by $A^T$) is the matrix whose $i,j$ component is $a_{ji}$, i.e., rows and columns are interchanged. If $A$ is a matrix with complex components, then $A^H$ is the matrix whose $i,j$-th component is the complex conjugate of the $j,i$-th component of $A$. We denote the complex conjugate of $a$ by $\bar{a}$. Thus, $(A^H)_{ij} = \bar{a}_{ji}$. A real matrix $A$ is said to be symmetric if $A = A^T$. A complex matrix is said to be Hermitian if $A = A^H$. Notice that the diagonal elements of a Hermitian matrix must be real. The $n \times n$ matrix whose diagonal components are all one and whose off-diagonal components are all zero is called the identity matrix and is denoted by $I$.


If $A$ is an $m \times k$ matrix and $B$ is a $k \times n$ matrix, then the product $AB$ is the $m \times n$ matrix with components given by
\[
(AB)_{ij} = \sum_{r=1}^{k} a_{ir} b_{rj}.
\]
The matrix product $AB$ is only defined when the number of columns of $A$ is the same as the number of rows of $B$. In particular, the product of an $m \times n$ matrix $A$ and an $n$-vector $x$ is given by
\[
(Ax)_i = \sum_{k=1}^{n} a_{ik} x_k \qquad i = 1, \ldots, m.
\]
It can be easily verified that $IA = A$ if the number of columns in $I$ equals the number of rows in $A$. It can also be shown that $(AB)^T = B^T A^T$ and $(AB)^H = B^H A^H$. In addition, we have $(A^T)^T = A$ and $(A^H)^H = A$.

    1.2 Vector Spaces

$\mathbb{R}^n$ and $\mathbb{C}^n$ together with the operations of addition and scalar multiplication are examples of a structure called a vector space. A vector space $V$ is a collection of vectors for which addition and scalar multiplication are defined in such a way that the following conditions hold:

1. If $x$ and $y$ belong to $V$ and $\alpha$ is a scalar, then $x + y$ and $\alpha x$ belong to $V$.
2. $x + y = y + x$ for any two vectors $x$ and $y$ in $V$.
3. $x + (y + z) = (x + y) + z$ for any three vectors $x$, $y$, and $z$ in $V$.
4. There is a vector $0$ in $V$ such that $x + 0 = x$ for all $x$ in $V$.
5. For each $x$ in $V$ there is a vector $-x$ in $V$ such that $x + (-x) = 0$.
6. $(\alpha\beta)x = \alpha(\beta x)$ for any scalars $\alpha$, $\beta$ and any vector $x$ in $V$.
7. $1x = x$ for any $x$ in $V$.
8. $\alpha(x + y) = \alpha x + \alpha y$ for any $x$ and $y$ in $V$ and any scalar $\alpha$.
9. $(\alpha + \beta)x = \alpha x + \beta x$ for any $x$ in $V$ and any scalars $\alpha$, $\beta$.

A subspace of a vector space $V$ is a subset that is also a vector space in its own right.


    1.2.1 Linear Independence and Bases

A set of vectors $v_1, \ldots, v_r$ is said to be linearly independent if the only way we can have $\alpha_1 v_1 + \cdots + \alpha_r v_r = 0$ is for $\alpha_1 = \cdots = \alpha_r = 0$. A set of vectors $v_1, \ldots, v_n$ is said to span a vector space $V$ if every vector $x$ in $V$ can be written as a linear combination of the vectors $v_1, \ldots, v_n$, i.e., $x = \alpha_1 v_1 + \cdots + \alpha_n v_n$. The set of all linear combinations of the vectors $v_1, \ldots, v_r$ is a subspace denoted by $\langle v_1, \ldots, v_r \rangle$ and called the span of these vectors. If a set of vectors $v_1, \ldots, v_n$ is linearly independent and spans $V$, it is called a basis for $V$. If a vector space $V$ has a basis consisting of a finite number of vectors, then the space is said to be finite dimensional. In a finite-dimensional vector space every basis has the same number of vectors. This number is called the dimension of the vector space. Clearly $\mathbb{R}^n$ and $\mathbb{C}^n$ have dimension $n$. Let $e_k$ denote the vector in $\mathbb{R}^n$ or $\mathbb{C}^n$ that consists of all zeroes except for a one in the $k$-th position. It is easily verified that $e_1, \ldots, e_n$ is a basis for either $\mathbb{R}^n$ or $\mathbb{C}^n$.

    1.2.2 Inner Product and Orthogonality

If $x$ and $y$ are two $n$-vectors, then the inner (dot) product $x \cdot y$ is the scalar value defined by $x^H y$. If the vector space is real we can replace $x^H$ by $x^T$. The inner product $x \cdot y$ has the properties:

1. $y \cdot x = \overline{x \cdot y}$
2. $x \cdot (\alpha y) = \alpha (x \cdot y)$
3. $x \cdot (y + z) = x \cdot y + x \cdot z$
4. $x \cdot x \ge 0$ and $x \cdot x = 0$ if and only if $x = 0$.

Vectors $x$ and $y$ are said to be orthogonal if $x \cdot y = 0$. A basis $v_1, \ldots, v_n$ is said to be orthonormal if
\[
v_i \cdot v_j = \begin{cases} 0 & i \ne j\\ 1 & i = j. \end{cases}
\]
We define the norm $\|x\|$ of a vector $x$ by $\|x\| = \sqrt{x \cdot x} = \sqrt{|x_1|^2 + \cdots + |x_n|^2}$. The norm has the properties

1. $\|\alpha x\| = |\alpha|\,\|x\|$
2. $\|x\| = 0$ implies that $x = 0$
3. $\|x + y\| \le \|x\| + \|y\|$.

If $v_1, \ldots, v_n$ is an orthonormal basis and $x = \alpha_1 v_1 + \cdots + \alpha_n v_n$, then it can be shown that $\|x\|^2 = |\alpha_1|^2 + \cdots + |\alpha_n|^2$. The norm and inner product satisfy the Cauchy inequality
\[
|x \cdot y| \le \|x\|\,\|y\|.
\]


    1.2.3 Matrices As Linear Transformations

An $m \times n$ matrix $A$ can be considered as a mapping of the space $\mathbb{R}^n$ ($\mathbb{C}^n$) into the space $\mathbb{R}^m$ ($\mathbb{C}^m$) where the image of the $n$-vector $x$ is the matrix-vector product $Ax$. This mapping is linear, i.e., $A(x + y) = Ax + Ay$ and $A(\alpha x) = \alpha Ax$. The range of $A$ (denoted by $\mathrm{Range}(A)$) is the space of all $m$-vectors $y$ such that $y = Ax$ for some $n$-vector $x$. It can be shown that the range of $A$ is the space spanned by the columns of $A$. The null space of $A$ (denoted by $\mathrm{Null}(A)$) is the vector space consisting of all $n$-vectors $x$ such that $Ax = 0$. An $n \times n$ square matrix $A$ is said to be invertible if it is a one-to-one mapping of the space $\mathbb{R}^n$ ($\mathbb{C}^n$) onto itself. It can be shown that a square matrix $A$ is invertible if and only if the null space $\mathrm{Null}(A)$ consists of only the zero vector. If $A$ is invertible, then the inverse $A^{-1}$ of $A$ is defined by $A^{-1}y = x$ where $x$ is the unique $n$-vector satisfying $Ax = y$. The inverse has the properties $A^{-1}A = AA^{-1} = I$ and $(AB)^{-1} = B^{-1}A^{-1}$. We denote $(A^{-1})^T$ and $(A^T)^{-1}$ by $A^{-T}$.

If $A$ is an $m \times n$ matrix, $x$ is an $n$-vector, and $y$ is an $m$-vector, then it can be shown that
\[
(Ax) \cdot y = x \cdot (A^H y).
\]

    1.3 Derivatives of Vector Functions

The central idea behind differentiation is the local approximation of a function by a linear function. If $f$ is a function of one variable, then the locus of points $(x, f(x))$ is a plane curve $C$. The tangent line to $C$ at $(x, f(x))$ is the graphical representation of the best local linear approximation to $f$ at $x$. We call this local linear approximation the differential. We represent this local linear approximation by the equation $dy = f'(x)\,dx$. If $f$ is a function of two variables, then the locus of points $(x, y, f(x,y))$ represents a surface $S$. Here the best local linear approximation to $f$ at $(x,y)$ is graphically represented by the tangent plane to the surface $S$ at the point $(x, y, f(x,y))$.

We will generalize this idea of a local linear approximation to vector-valued functions of $n$ variables. Let $f$ be a function mapping $n$-vectors into $m$-vectors. We define the derivative $Df(x)$ of $f$ at the $n$-vector $x$ to be the unique linear transformation ($m \times n$ matrix) satisfying
\[
f(x + h) = f(x) + Df(x)h + o(\|h\|) \tag{1.1}
\]
whenever such a transformation exists. Here the $o$ notation signifies a function with the property
\[
\lim_{\|h\| \to 0} \frac{o(\|h\|)}{\|h\|} = 0.
\]
Thus, $Df(x)$ is a linear transformation that locally approximates $f$.

We can also define a directional derivative $\delta_h f(x)$ in the direction $h$ by
\[
\delta_h f(x) = \lim_{\epsilon \to 0} \frac{f(x + \epsilon h) - f(x)}{\epsilon} = \frac{d}{d\epsilon} f(x + \epsilon h)\Big|_{\epsilon = 0} \tag{1.2}
\]


whenever the limit exists. This directional derivative is also referred to as the variation of $f$ in the direction $h$. If $Df(x)$ exists, then
\[
\delta_h f(x) = Df(x)h.
\]
However, the existence of $\delta_h f(x)$ for every direction $h$ does not imply the existence of $Df(x)$. If we take $h = e_i$, then $\delta_h f(x)$ is just the partial derivative $\partial f(x)/\partial x_i$.

1.3.1 Newton's Method

Newton's method is an iterative scheme for finding the zeroes of a smooth function $f$. If $x$ is a guess, then we approximate $f$ near $x$ by
\[
f(x + h) = f(x) + Df(x)h.
\]
If $x + h$ is the zero of this linear approximation, then
\[
h = -Df(x)^{-1} f(x)
\]
or
\[
x + h = x - Df(x)^{-1} f(x). \tag{1.3}
\]
We can take $x + h$ as an improved approximation to the nearby zero of $f$. If we keep iterating with equation (1.3), then the $(k+1)$-iterate $x^{(k+1)}$ is related to the $k$-iterate $x^{(k)}$ by
\[
x^{(k+1)} = x^{(k)} - Df(x^{(k)})^{-1} f(x^{(k)}). \tag{1.4}
\]
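As an illustration, here is a minimal Python/NumPy sketch of the iteration (1.4). The names (newton, f, Df) and the example system are my own and are not taken from these notes; the sketch assumes the user supplies both the function and its Jacobian, and it solves a linear system at each step rather than forming the inverse.

```python
import numpy as np

def newton(f, Df, x0, tol=1e-10, max_iter=50):
    """Solve f(x) = 0 by the iteration x_{k+1} = x_k - Df(x_k)^{-1} f(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(Df(x), f(x))   # solve Df(x) h = f(x) instead of inverting Df(x)
        x = x - step
        if np.linalg.norm(step) < tol:
            break
    return x

# Hypothetical example: intersect the circle x^2 + y^2 = 4 with the curve y = x^2
f  = lambda x: np.array([x[0]**2 + x[1]**2 - 4.0, x[1] - x[0]**2])
Df = lambda x: np.array([[2*x[0], 2*x[1]], [-2*x[0], 1.0]])
print(newton(f, Df, np.array([1.0, 1.0])))
```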


    Chapter 2

    Solution of Systems of Linear Equations

    2.1 Gaussian Elimination

Gaussian elimination is the standard way of solving a system of linear equations $Ax = b$ when $A$ is a square matrix with no special properties. The first known use of this method was in the Chinese text Nine Chapters on the Mathematical Art written between 200 BC and 100 BC. Here it was used to solve a system of three equations in three unknowns. The coefficients (including the right-hand side) were written in tabular form and operations were performed on this table to produce a triangular form that could be easily solved. It is remarkable that this was done long before the development of matrix notation or even a notation for variables. The method was used by Gauss in the early 1800s to solve a least squares problem for determining the orbit of the asteroid Pallas. Using observations of Pallas taken between 1803 and 1809, he obtained a system of six equations in six unknowns which he solved by the method now known as Gaussian elimination. The concept of treating a matrix as an object and the development of an algebra for matrices were first introduced by Cayley [2] in the paper A Memoir on the Theory of Matrices.

In this paper we will first describe the basic method and show that it is equivalent to factoring the matrix into the product of a lower triangular and an upper triangular matrix, i.e., $A = LU$. We will then introduce the method of row pivoting that is necessary in order to keep the method stable. We will show that row pivoting is equivalent to a factorization $PA = LU$ or $A = PLU$ where $P$ is the identity matrix with its rows permuted. Having obtained this factorization, the solution for a given right-hand side $b$ is obtained by solving the two triangular systems $Ly = Pb$ and $Ux = y$ by simple processes called forward and backward substitution.

There are a number of good computer implementations of Gaussian elimination with row pivoting. Matlab has a good implementation obtained by the call [L,U,P]=lu(A). Another good implementation is the LAPACK routine SGESV (DGESV, CGESV). It can be obtained in either Fortran or C from the site www.netlib.org.

We will end by showing how the accuracy of a solution can be improved by a process called iterative refinement.



    2.1.1 The Basic Procedure

Gaussian elimination begins by producing zeroes below the diagonal in the first column, i.e.,
\[
\begin{pmatrix} \times & \times & \cdots & \times\\ \times & \times & \cdots & \times\\ \vdots & \vdots & & \vdots\\ \times & \times & \cdots & \times \end{pmatrix}
\;\longrightarrow\;
\begin{pmatrix} \times & \times & \cdots & \times\\ 0 & \times & \cdots & \times\\ \vdots & \vdots & & \vdots\\ 0 & \times & \cdots & \times \end{pmatrix}. \tag{2.1}
\]
If $a_{ij}$ is the element of $A$ in the $i$-th row and the $j$-th column, then the first step in the Gaussian elimination process consists of multiplying $A$ on the left by the lower triangular matrix $L_1$ given by
\[
L_1 = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0\\
-a_{21}/a_{11} & 1 & 0 & \cdots & 0\\
-a_{31}/a_{11} & 0 & 1 & & \vdots\\
\vdots & \vdots & & \ddots & 0\\
-a_{n1}/a_{11} & 0 & \cdots & 0 & 1
\end{pmatrix}, \tag{2.2}
\]
i.e., zeroes are produced in the first column by adding appropriate multiples of the first row to the other rows. The next step is to produce zeroes below the diagonal in the second column, i.e.,
\[
\begin{pmatrix} \times & \times & \cdots & \times\\ 0 & \times & \cdots & \times\\ \vdots & \vdots & & \vdots\\ 0 & \times & \cdots & \times \end{pmatrix}
\;\longrightarrow\;
\begin{pmatrix} \times & \times & \times & \cdots & \times\\ 0 & \times & \times & \cdots & \times\\ 0 & 0 & \times & \cdots & \times\\ \vdots & \vdots & \vdots & & \vdots\\ 0 & 0 & \times & \cdots & \times \end{pmatrix}. \tag{2.3}
\]
This can be obtained by multiplying $L_1 A$ on the left by the lower triangular matrix $L_2$ given by
\[
L_2 = \begin{pmatrix}
1 & 0 & 0 & 0 & \cdots & 0\\
0 & 1 & 0 & 0 & \cdots & 0\\
0 & -a^{(1)}_{32}/a^{(1)}_{22} & 1 & 0 & \cdots & 0\\
0 & -a^{(1)}_{42}/a^{(1)}_{22} & 0 & 1 & & 0\\
\vdots & \vdots & \vdots & & \ddots & 0\\
0 & -a^{(1)}_{n2}/a^{(1)}_{22} & 0 & \cdots & 0 & 1
\end{pmatrix} \tag{2.4}
\]
where $a^{(1)}_{ij}$ is the $i,j$-th element of $L_1 A$. Continuing in this manner, we can define lower triangular matrices $L_3, \ldots, L_{n-1}$ so that $L_{n-1} \cdots L_1 A$ is upper triangular, i.e.,
\[
L_{n-1} \cdots L_1 A = U. \tag{2.5}
\]


Taking the inverses of the matrices $L_1, \ldots, L_{n-1}$, we can write $A$ as
\[
A = L_1^{-1} \cdots L_{n-1}^{-1} U. \tag{2.6}
\]
Let
\[
L = L_1^{-1} \cdots L_{n-1}^{-1}. \tag{2.7}
\]
Then it follows from equation (2.6) that
\[
A = LU. \tag{2.8}
\]
We will now show that $L$ is lower triangular. Each of the matrices $L_k$ can be written in the form
\[
L_k = I - u^{(k)} e_k^T \tag{2.9}
\]
where $e_k$ is the vector whose components are all zero except for a one in the $k$-th position and $u^{(k)}$ is a vector whose first $k$ components are zero. The term $u^{(k)} e_k^T$ is an $n \times n$ matrix whose elements are all zero except for those below the diagonal in the $k$-th column. In fact, the components of $u^{(k)}$ are given by
\[
u^{(k)}_i = \begin{cases} 0 & 1 \le i \le k\\ a^{(k-1)}_{ik}/a^{(k-1)}_{kk} & k < i \end{cases} \tag{2.10}
\]
where $a^{(k-1)}_{ij}$ is the $i,j$-th element of $L_{k-1} \cdots L_1 A$. Since $e_k^T u^{(k)} = u^{(k)}_k = 0$, it follows that
\[
\bigl(I + u^{(k)} e_k^T\bigr)\bigl(I - u^{(k)} e_k^T\bigr) = I + u^{(k)} e_k^T - u^{(k)} e_k^T - u^{(k)} e_k^T u^{(k)} e_k^T = I - u^{(k)} e_k^T u^{(k)} e_k^T = I, \tag{2.11}
\]
i.e.,
\[
L_k^{-1} = I + u^{(k)} e_k^T. \tag{2.12}
\]
Thus, $L_k^{-1}$ is the same as $L_k$ except for a change of sign of the elements below the diagonal in column $k$. Combining equations (2.7) and (2.12), we obtain
\[
L = \bigl(I + u^{(1)} e_1^T\bigr) \cdots \bigl(I + u^{(n-1)} e_{n-1}^T\bigr) = I + u^{(1)} e_1^T + \cdots + u^{(n-1)} e_{n-1}^T. \tag{2.13}
\]
In this expression the cross terms dropped out since
\[
\bigl(u^{(i)} e_i^T\bigr)\bigl(u^{(j)} e_j^T\bigr) = u^{(j)}_i u^{(i)} e_j^T = 0 \qquad \text{for } i < j.
\]
Equation (2.13) implies that $L$ is lower triangular and that the $k$-th column of $L$ looks like the $k$-th column of $L_k$ with the signs reversed on the elements below the diagonal, i.e.,
\[
L = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0\\
a_{21}/a_{11} & 1 & 0 & \cdots & 0\\
a_{31}/a_{11} & a^{(1)}_{32}/a^{(1)}_{22} & 1 & & 0\\
\vdots & \vdots & & \ddots & \vdots\\
a_{n1}/a_{11} & a^{(1)}_{n2}/a^{(1)}_{22} & \cdots & & 1
\end{pmatrix}. \tag{2.14}
\]


Having the $LU$ factorization given in equation (2.8), it is possible to solve the system of equations $Ax = LUx = b$ for any right-hand side $b$. If we let $y = Ux$, then $y$ can be found by solving the triangular system $Ly = b$. Having $y$, $x$ can be obtained by solving the triangular system $Ux = y$. Triangular systems are very easy to solve. For example, in the system $Ux = y$, the last equation can be solved for $x_n$ (the only unknown in this equation). Having $x_n$, the next to the last equation can be solved for $x_{n-1}$ (the only unknown left in this equation). Continuing in this manner we can solve for the remaining components of $x$. For the system $Ly = b$, we start by computing $y_1$ and then work our way down. Solving an upper triangular system is called back substitution. Solving a lower triangular system is called forward substitution.

To compute $L$ requires approximately $n^3/3$ operations where an operation consists of an addition and a multiplication. For each right-hand side, solving the two triangular systems requires approximately $n^2$ operations. Thus, as far as solving systems of equations is concerned, having the $LU$ factorization of $A$ is just as good as having the inverse of $A$ and is less costly to compute.
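A small Python/NumPy sketch of the two triangular solves described above. The helper names forward_substitution and back_substitution are illustrative, and scipy.linalg.lu (which returns P, L, U with A = P L U) is used only as a stand-in for an LU routine such as the Matlab or LAPACK calls mentioned earlier.

```python
import numpy as np
from scipy.linalg import lu

def forward_substitution(L, b):
    """Solve L y = b for lower triangular L, working from the first equation down."""
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def back_substitution(U, y):
    """Solve U x = y for upper triangular U, working from the last equation up."""
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

A = np.array([[4.0, 3.0, 2.0], [2.0, 3.0, 1.0], [6.0, 1.0, 5.0]])
b = np.array([1.0, 2.0, 3.0])
P, L, U = lu(A)                        # A = P L U, so P^T A = L U
y = forward_substitution(L, P.T @ b)
x = back_substitution(U, y)
print(np.allclose(A @ x, b))
```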

    2.1.2 Row Pivoting

There is one problem with Gaussian elimination that has yet to be addressed. It is possible for one of the diagonal elements $a^{(k-1)}_{kk}$ that occur during Gaussian elimination to be zero or to be very small. This causes a problem since we must divide by this diagonal element. If one of the diagonals is exactly zero, the process obviously blows up. However, there can still be a problem if one of the diagonals is small. In this case large elements are produced in both the $L$ and $U$ matrices. These large entries lead to a loss of accuracy when there are subtractions involving these big numbers. This problem can occur even for well behaved matrices. To eliminate this problem we introduce row pivoting. In performing Gaussian elimination, it is not necessary to take the equations in the order they are given. Suppose we are at the stage where we are zeroing out the elements below the diagonal in the $k$-th column. We can interchange any of the rows from the $k$-th row on without changing the structure of the matrix. In row pivoting we find the largest in magnitude of the elements $a^{(k-1)}_{kk}, a^{(k-1)}_{k+1,k}, \ldots, a^{(k-1)}_{nk}$ and interchange rows to bring that element to the $k,k$-position. Mathematically we can perform this row interchange by multiplying on the left by the matrix $P_k$ that is like the identity matrix with the appropriate rows interchanged. The matrix $P_k$ has the property $P_k P_k = I$, i.e., $P_k$ is its own inverse. With row pivoting equation (2.5) is replaced by
\[
L_{n-1} P_{n-1} \cdots L_2 P_2 L_1 P_1 A = U. \tag{2.15}
\]
We can write this equation in the form
\[
L_{n-1}\bigl(P_{n-1} L_{n-2} P_{n-1}^{-1}\bigr)\bigl(P_{n-1} P_{n-2} L_{n-3} P_{n-2}^{-1} P_{n-1}^{-1}\bigr) \cdots \bigl(P_{n-1} \cdots P_2 L_1 P_2^{-1} \cdots P_{n-1}^{-1}\bigr) P_{n-1} \cdots P_1 A = U. \tag{2.16}
\]
Define $L'_{n-1} = L_{n-1}$ and
\[
L'_k = P_{n-1} \cdots P_{k+1} L_k P_{k+1}^{-1} \cdots P_{n-1}^{-1} \qquad k = 1, \ldots, n-2. \tag{2.17}
\]
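For illustration, here is a deliberately simple Python/NumPy sketch of Gaussian elimination with row pivoting that returns factors satisfying $PA = LU$. The function name lu_pivot and the loop structure are my own and are written for clarity, not performance.

```python
import numpy as np

def lu_pivot(A):
    """Gaussian elimination with row pivoting: returns P, L, U with P @ A = L @ U."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    U, L, P = A.copy(), np.eye(n), np.eye(n)
    for k in range(n - 1):
        # pick the largest entry in magnitude on or below the diagonal of column k
        p = k + np.argmax(np.abs(U[k:, k]))
        U[[k, p]] = U[[p, k]]
        P[[k, p]] = P[[p, k]]
        L[[k, p], :k] = L[[p, k], :k]          # swap the multipliers already stored
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]        # multiplier a_ik / a_kk
            U[i, k:] -= L[i, k] * U[k, k:]     # eliminate entry below the diagonal
    return P, L, U

A = np.array([[0.0, 2.0, 1.0], [1.0, 1.0, 0.0], [2.0, 1.0, 1.0]])
P, L, U = lu_pivot(A)
print(np.allclose(P @ A, L @ U))
```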


    2.1.3 Iterative Refinement

If the solution of $Ax = b$ is not sufficiently accurate, the accuracy can be improved by applying Newton's method to the function $f(x) = Ax - b$. If $x^{(k)}$ is an approximate solution to $f(x) = 0$, then a Newton iteration produces an approximation $x^{(k+1)}$ given by
\[
x^{(k+1)} = x^{(k)} - Df(x^{(k)})^{-1} f(x^{(k)}) = x^{(k)} - A^{-1}\bigl(Ax^{(k)} - b\bigr). \tag{2.21}
\]
An iteration step can be summarized as follows:

1. compute the residual $r^{(k)} = Ax^{(k)} - b$;
2. solve the system $Ad^{(k)} = r^{(k)}$ using the $LU$ factorization of $A$;
3. compute $x^{(k+1)} = x^{(k)} - d^{(k)}$.

The residual is usually computed in double precision. If the above calculations were carried out exactly, the answer would be obtained in one iteration as is always true when applying Newton's method to a linear function. However, because of roundoff errors, it may require more than one iteration to obtain the desired accuracy.
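A short Python sketch of the three steps listed above, accumulating the residual in extended precision. The use of scipy.linalg.lu_factor / lu_solve is an assumption about tooling (they stand in for the stored LU factors); the function name refine is illustrative.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def refine(A, b, x, lu_piv, n_steps=3):
    """Iterative refinement: r = A x - b, solve A d = r from the LU factors, x <- x - d."""
    for _ in range(n_steps):
        r = A.astype(np.longdouble) @ x - b           # residual in extended precision
        d = lu_solve(lu_piv, np.asarray(r, dtype=float))
        x = x - d
    return x

A = np.random.rand(50, 50)
b = np.random.rand(50)
lu_piv = lu_factor(A)                                 # LU factorization with row pivoting, computed once
x = refine(A, b, lu_solve(lu_piv, b), lu_piv)
print(np.linalg.norm(A @ x - b))
```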

    2.2 Cholesky Factorization

Matrices that are Hermitian ($A^H = A$) and positive definite ($x^H A x > 0$ for all $x \ne 0$) occur sufficiently often in practice that it is worth describing a variant of Gaussian elimination that is often used for this class of matrices. Recall that Gaussian elimination amounted to a factorization of a square matrix $A$ into the product of a lower triangular matrix and an upper triangular matrix, i.e., $A = LU$. The Cholesky factorization represents a Hermitian positive definite matrix $A$ by the product of a lower triangular matrix and its conjugate transpose, i.e., $A = LL^H$. Because of the symmetries involved, this factorization can be formed in roughly half the number of operations as are needed for Gaussian elimination.

Let us begin by looking at some of the properties of positive definite matrices. If $e_i$ is the $i$-th column of the identity matrix and $A = (a_{ij})$ is positive definite, then $a_{ii} = e_i^T A e_i > 0$, i.e., the diagonal components of $A$ are real and positive. Suppose $X$ is a nonsingular matrix of the same size as the Hermitian, positive definite matrix $A$. Then
\[
x^H (X^H A X) x = (Xx)^H A (Xx) > 0 \qquad \text{for all } x \ne 0.
\]
Thus, $A$ Hermitian positive definite implies that $X^H A X$ is Hermitian positive definite. Conversely, suppose $X^H A X$ is Hermitian positive definite. Then
\[
A = (X X^{-1})^H A (X X^{-1}) = (X^{-1})^H (X^H A X) (X^{-1})
\]
is Hermitian positive definite.


Next we will show that the component of largest magnitude of a Hermitian positive definite matrix $A$ always lies on the diagonal. Suppose that $|a_{kl}| = \max_{i,j} |a_{ij}|$ and $k \ne l$. If $a_{kl} = |a_{kl}| e^{i\theta_{kl}}$, let $\alpha = -e^{-i\theta_{kl}}$ and $x = e_k + \alpha e_l$. Then
\[
x^H A x = e_k^T A e_k + \bar{\alpha}\, e_l^T A e_k + \alpha\, e_k^T A e_l + |\alpha|^2 e_l^T A e_l = a_{kk} + a_{ll} - 2|a_{kl}| \le 0.
\]
This contradicts the fact that $A$ is positive definite. Therefore, $\max_{i,j} |a_{ij}| = \max_i a_{ii}$. Suppose we partition the Hermitian positive definite matrix $A$ as follows
\[
A = \begin{pmatrix} B & C^H\\ C & D \end{pmatrix}.
\]
If $y$ is a nonzero vector compatible with $D$, let $x^H = (0, y^H)$. Then
\[
x^H A x = (0, y^H)\begin{pmatrix} B & C^H\\ C & D \end{pmatrix}\begin{pmatrix} 0\\ y \end{pmatrix} = y^H D y > 0,
\]
i.e., $D$ is Hermitian positive definite. Similarly, letting $x^H = (y^H, 0)$, we can show that $B$ is Hermitian positive definite.

We will now show that if $A$ is a Hermitian, positive-definite matrix, then there is a unique lower triangular matrix $L$ with positive diagonals such that $A = LL^H$. This factorization is called the Cholesky factorization. We will establish this result by induction on the dimension $n$. Clearly, the result is true for $n = 1$. For in this case we can take $L = (\sqrt{a_{11}})$. Suppose the result is true for matrices of dimension $n - 1$. Let $A$ be a Hermitian, positive-definite matrix of dimension $n$. We can partition $A$ as follows
\[
A = \begin{pmatrix} a_{11} & w^H\\ w & K \end{pmatrix} \tag{2.22}
\]
where $w$ is a vector of dimension $n - 1$ and $K$ is a $(n-1) \times (n-1)$ matrix. It is easily verified that
\[
A = \begin{pmatrix} a_{11} & w^H\\ w & K \end{pmatrix} = B^H \begin{pmatrix} 1 & 0\\ 0 & K - \frac{w w^H}{a_{11}} \end{pmatrix} B \tag{2.23}
\]
where
\[
B = \begin{pmatrix} \sqrt{a_{11}} & \frac{w^H}{\sqrt{a_{11}}}\\ 0 & I \end{pmatrix}. \tag{2.24}
\]
We will first show that the matrix $B$ is invertible. If
\[
Bx = \begin{pmatrix} \sqrt{a_{11}} & \frac{w^H}{\sqrt{a_{11}}}\\ 0 & I \end{pmatrix}\begin{pmatrix} x_1\\ x_2 \end{pmatrix} = \begin{pmatrix} \sqrt{a_{11}}\, x_1 + \frac{w^H x_2}{\sqrt{a_{11}}}\\ x_2 \end{pmatrix} = 0,
\]
then $x_2 = 0$ and $\sqrt{a_{11}}\, x_1 = 0$, so that $x_1 = 0$. Therefore, $B$ is invertible. From our discussion at the beginning of this section it follows from equation (2.23) that the matrix
\[
\begin{pmatrix} 1 & 0\\ 0 & K - \frac{w w^H}{a_{11}} \end{pmatrix}
\]


is Hermitian positive definite. By the results on the partitioning of a positive definite matrix, it follows that the matrix $K - \frac{w w^H}{a_{11}}$ is Hermitian positive definite. By the induction hypothesis, there exists a unique lower triangular matrix $\hat{L}$ with positive diagonals such that
\[
K - \frac{w w^H}{a_{11}} = \hat{L}\hat{L}^H. \tag{2.25}
\]
Substituting equation (2.25) into equation (2.23), we get
\[
A = B^H \begin{pmatrix} 1 & 0\\ 0 & \hat{L}\hat{L}^H \end{pmatrix} B
  = B^H \begin{pmatrix} 1 & 0\\ 0 & \hat{L} \end{pmatrix}\begin{pmatrix} 1 & 0\\ 0 & \hat{L}^H \end{pmatrix} B
  = \begin{pmatrix} \sqrt{a_{11}} & 0\\ \frac{w}{\sqrt{a_{11}}} & \hat{L} \end{pmatrix}\begin{pmatrix} \sqrt{a_{11}} & \frac{w^H}{\sqrt{a_{11}}}\\ 0 & \hat{L}^H \end{pmatrix} \tag{2.26}
\]
which is the desired factorization of $A$. To show uniqueness, suppose that
\[
A = \begin{pmatrix} a_{11} & w^H\\ w & K \end{pmatrix} = \begin{pmatrix} l_{11} & 0\\ v & \hat{L} \end{pmatrix}\begin{pmatrix} l_{11} & v^H\\ 0 & \hat{L}^H \end{pmatrix} \tag{2.27}
\]
is a Cholesky factorization of $A$. Equating components in equation (2.27), we see that $l_{11}^2 = a_{11}$ and hence that $l_{11} = \sqrt{a_{11}}$. Also $l_{11} v = w$ or $v = w/l_{11} = w/\sqrt{a_{11}}$. Finally, $v v^H + \hat{L}\hat{L}^H = K$ or $K - v v^H = K - w w^H/a_{11} = \hat{L}\hat{L}^H$. Since $\hat{L}\hat{L}^H$ is the unique factorization of the $(n-1) \times (n-1)$ Hermitian, positive-definite matrix $K - w w^H/a_{11}$, we see that the Cholesky factorization of $A$ is unique. It now follows by induction that there is a unique Cholesky factorization of any Hermitian, positive-definite matrix.

The factorization in equation (2.23) is the basis for the computation of the Cholesky factorization. The matrix $B^H$ is lower triangular. Since the matrix $K - w w^H/a_{11}$ is positive definite, it can be factored in the same manner. Continuing in this manner until the center matrix becomes the identity matrix, we obtain lower triangular matrices $L_1, \ldots, L_n$ such that
\[
A = L_1 \cdots L_n L_n^H \cdots L_1^H.
\]
Letting $L = L_1 \cdots L_n$, we have the desired Cholesky factorization.

As was mentioned previously, the number of operations in the Cholesky factorization is about half the number in Gaussian elimination. Unlike Gaussian elimination the Cholesky method does not need pivoting in order to maintain stability. The Cholesky factorization can also be written in the form
\[
A = LDL^H
\]
where $D$ is diagonal and $L$ now has all ones on the diagonal.
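A compact Python/NumPy sketch of the factorization $A = LL^H$, written as a column-at-a-time loop rather than the recursive partitioning used in the proof. The function name and test matrix are illustrative only.

```python
import numpy as np

def cholesky(A):
    """Return lower triangular L with A = L L^H for a Hermitian positive-definite A."""
    A = np.array(A, dtype=complex)
    n = A.shape[0]
    L = np.zeros_like(A)
    for j in range(n):
        d = A[j, j] - L[j, :j] @ L[j, :j].conj()       # a_jj minus |L_jk|^2 terms
        L[j, j] = np.sqrt(d.real)                      # diagonal of a Hermitian PD matrix is real
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - L[i, :j] @ L[j, :j].conj()) / L[j, j]
    return L

A = np.array([[4.0, 2.0], [2.0, 3.0]])
L = cholesky(A)
print(np.allclose(L @ L.conj().T, A))
```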


    2.3 Elementary Unitary Matrices and the QR Factorization

In Gaussian elimination we saw that a square matrix $A$ could be reduced to triangular form by multiplying on the left by a series of elementary lower triangular matrices. This process can also be expressed as a factorization $A = LU$ where $L$ is lower triangular and $U$ is upper triangular. In least squares problems the number of rows $m$ in $A$ is usually greater than the number of columns $n$. The standard technique for solving least-squares problems of this type is to make use of a factorization $A = QR$ where $Q$ is an $m \times m$ unitary matrix and $R$ has the form
\[
R = \begin{pmatrix} \hat{R}\\ 0 \end{pmatrix}
\]
with $\hat{R}$ an $n \times n$ upper triangular matrix. The usual way of obtaining this factorization is to reduce the matrix $A$ to triangular form by multiplying on the left by a series of elementary unitary matrices that are sometimes called Householder matrices (reflectors). We will show how to use this QR factorization to solve least squares problems. If $\hat{Q}$ is the $m \times n$ matrix consisting of the first $n$ columns of $Q$, then
\[
A = \hat{Q}\hat{R}.
\]
This factorization is called the reduced QR factorization. Elementary unitary matrices are also used to reduce square matrices to a simplified form (Hessenberg or tridiagonal) prior to eigenvalue calculation.

There are several good computer implementations that use the Householder QR factorization to solve the least squares problem. The LAPACK routine is called SGELS (DGELS, CGELS). In Matlab the solution of the least squares problem is given by A\b. The QR factorization can be obtained with the call [Q,R]=qr(A).

    2.3.1 Gram-Schmidt Orthogonalization

A reduced QR factorization can be obtained by an orthogonalization procedure known as the Gram-Schmidt process. Suppose we would like to construct an orthonormal set of vectors $q_1, \ldots, q_n$ from a given linearly independent set of vectors $a_1, \ldots, a_n$. The process is recursive. At the $j$-th step we construct a unit vector $q_j$ that is orthogonal to $q_1, \ldots, q_{j-1}$ using
\[
v_j = a_j - \sum_{i=1}^{j-1} (q_i^H a_j)\, q_i, \qquad q_j = v_j/\|v_j\|.
\]
The orthonormal basis constructed has the additional property
\[
\langle q_1, \ldots, q_j \rangle = \langle a_1, \ldots, a_j \rangle \qquad j = 1, 2, \ldots, n.
\]


If we consider $a_1, \ldots, a_n$ as columns of a matrix $A$, then this process is equivalent to the matrix factorization $A = \hat{Q}\hat{R}$ where $\hat{Q} = (q_1, \ldots, q_n)$ and $\hat{R}$ is upper triangular. Although the Gram-Schmidt process is very useful in theoretical considerations, it does not lead to a stable numerical procedure. In the next section we will discuss Householder reflectors, which lead to a more stable procedure for obtaining a QR factorization.
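A Python/NumPy sketch of the classical Gram-Schmidt recursion above, producing the reduced factorization $A = \hat{Q}\hat{R}$. As noted, this is conceptually useful but not the numerically stable way to compute a QR factorization; the function name is illustrative.

```python
import numpy as np

def gram_schmidt(A):
    """Classical Gram-Schmidt: reduced QR factorization A = Q @ R (real case)."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]      # r_ij = q_i^H a_j
            v -= R[i, j] * Q[:, i]           # subtract the component along q_i
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R

A = np.random.rand(6, 3)
Q, R = gram_schmidt(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))
```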

    2.3.2 Householder Reflections

Let us begin by describing the Householder reflectors. In this section we will restrict ourselves to real matrices. Afterwards we will see that there are a number of generalizations to the complex case. If $v$ is a fixed vector of dimension $m$ with $\|v\| = 1$, then the set of all vectors orthogonal to $v$ is an $(m-1)$-dimensional subspace called a hyperplane. If we denote this hyperplane by $H$, then
\[
H = \{u : v^T u = 0\}. \tag{2.28}
\]
Here $v^T$ denotes the transpose of $v$. If $x$ is a point not on $H$, let $\bar{x}$ denote the orthogonal projection of $x$ onto $H$ (see Figure 2.1). The difference $\bar{x} - x$ must be orthogonal to $H$ and hence a multiple of $v$, i.e.,
\[
\bar{x} - x = \alpha v \quad \text{or} \quad \bar{x} = x + \alpha v. \tag{2.29}
\]

Figure 2.1: Householder reflection

Since $\bar{x}$ lies on $H$ and $v^T v = \|v\|^2 = 1$, we must have
\[
v^T \bar{x} = v^T x + \alpha v^T v = v^T x + \alpha = 0. \tag{2.30}
\]
Thus, $\alpha = -v^T x$ and consequently
\[
\bar{x} = x - (v^T x)v = x - v v^T x = (I - v v^T)x. \tag{2.31}
\]


Define $P = I - v v^T$. Then $P$ is a projection matrix that projects vectors orthogonally onto $H$. The projection $\bar{x}$ is obtained by going a certain distance from $x$ in the direction $v$. Figure 2.1 suggests that the reflection $\hat{x}$ of $x$ across $H$ can be obtained by going twice that distance in the same direction, i.e.,
\[
\hat{x} = x - 2(v^T x)v = x - 2 v v^T x = (I - 2 v v^T)x. \tag{2.32}
\]
With this motivation we define the Householder reflector $Q$ by
\[
Q = I - 2 v v^T, \qquad \|v\| = 1. \tag{2.33}
\]
An alternate form for the Householder reflector is
\[
Q = I - 2\,\frac{u u^T}{\|u\|^2} \tag{2.34}
\]
where here $u$ is not restricted to be a unit vector. Notice that, in this form, replacing $u$ by a multiple of $u$ does not change $Q$. The matrix $Q$ is clearly symmetric, i.e., $Q^T = Q$. Moreover,
\[
Q^T Q = Q^2 = (I - 2 v v^T)(I - 2 v v^T) = I - 2 v v^T - 2 v v^T + 4 v v^T v v^T = I, \tag{2.35}
\]
i.e., $Q$ is an orthogonal matrix. As with all orthogonal matrices $Q$ preserves the norm of a vector, i.e.,
\[
\|Qx\|^2 = (Qx)^T Qx = x^T Q^T Q x = x^T x = \|x\|^2. \tag{2.36}
\]
To reduce a matrix to one that is upper triangular it is necessary to zero out columns below a certain position. We will show how to construct a Householder reflector so that its action on a given vector $x$ is a multiple of $e_1$, the first column of the identity matrix. To zero out a vector below row $k$ we can use a matrix of the form
\[
Q = \begin{pmatrix} I & 0\\ 0 & \bar{Q} \end{pmatrix}
\]
where $I$ is the $(k-1) \times (k-1)$ identity matrix and $\bar{Q}$ is a $(m-k+1) \times (m-k+1)$ Householder matrix. Thus, for a given vector $x$ we would like to choose a vector $u$ so that $Qx$ is a multiple of the unit vector $e_1$, i.e.,
\[
Qx = x - \frac{2(u^T x)}{\|u\|^2}\, u = \alpha e_1. \tag{2.37}
\]
Since $Q$ preserves norms, we must have $|\alpha| = \|x\|$. Therefore, equation (2.37) becomes
\[
Qx = x - \frac{2(u^T x)}{\|u\|^2}\, u = \pm\|x\| e_1. \tag{2.38}
\]
It follows from equation (2.38) that $u$ must be a multiple of the vector $x \mp \|x\| e_1$. Since $u$ can be replaced by a multiple of $u$ without changing $Q$, we let
\[
u = x \mp \|x\| e_1. \tag{2.39}
\]
It follows from the definition of $u$ in equation (2.39) that
\[
u^T x = \|x\|^2 \mp \|x\| x_1 \tag{2.40}
\]


and
\[
\|u\|^2 = u^T u = \|x\|^2 \mp \|x\| x_1 \mp \|x\| x_1 + \|x\|^2 = 2\bigl(\|x\|^2 \mp \|x\| x_1\bigr). \tag{2.41}
\]
Therefore,
\[
\frac{2(u^T x)}{\|u\|^2} = 1, \tag{2.42}
\]
and hence $Qx$ becomes
\[
Qx = x - \frac{2(u^T x)}{\|u\|^2}\, u = x - u = \pm\|x\| e_1 \tag{2.43}
\]
as desired. From what has been discussed so far, either of the signs in equation (2.39) would produce the desired result. However, if $x_1$ is very large compared to the other components, then it is possible to lose accuracy through subtraction in the computation of $u = x \mp \|x\| e_1$. To prevent this we choose $u$ to be
\[
u = x + \operatorname{sign}(x_1)\|x\| e_1 \tag{2.44}
\]
where $\operatorname{sign}(x_1)$ is defined by
\[
\operatorname{sign}(x_1) = \begin{cases} +1 & x_1 \ge 0\\ -1 & x_1 < 0. \end{cases} \tag{2.45}
\]
With this choice of $u$, equation (2.43) becomes
\[
Qx = -\operatorname{sign}(x_1)\|x\| e_1. \tag{2.46}
\]
In practice, $u$ is often scaled so that $u_1 = 1$, i.e.,
\[
u = \frac{x + \operatorname{sign}(x_1)\|x\| e_1}{x_1 + \operatorname{sign}(x_1)\|x\|}. \tag{2.47}
\]
With this choice of $u$,
\[
\|u\|^2 = \frac{2\|x\|}{\|x\| + |x_1|}. \tag{2.48}
\]
The matrix $Q$ applied to a general vector $y$ is given by
\[
Qy = y - \frac{2 u^T y}{\|u\|^2}\, u. \tag{2.49}
\]
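A Python/NumPy sketch of equations (2.44)-(2.49): building the scaled vector $u$ with $u_1 = 1$ and applying the reflector to a vector without ever forming $Q$. The helper names are hypothetical.

```python
import numpy as np

def householder_vector(x):
    """Return u (scaled so u[0] = 1) and beta = 2/||u||^2 so that (I - beta u u^T) x = -sign(x1)||x|| e1."""
    x = np.asarray(x, dtype=float)
    normx = np.linalg.norm(x)
    sign = 1.0 if x[0] >= 0 else -1.0
    u = x / (x[0] + sign * normx)                 # u = (x + sign(x1)||x|| e1) / (x1 + sign(x1)||x||)
    u[0] = 1.0
    beta = (sign * x[0] + normx) / normx          # 2/||u||^2 = (||x|| + |x1|)/||x||, cf. eq. (2.48)
    return u, beta

def apply_reflector(u, beta, y):
    """Compute Q y = y - beta (u^T y) u without forming Q."""
    return y - beta * (u @ y) * u

x = np.array([3.0, 1.0, 2.0])
u, beta = householder_vector(x)
print(apply_reflector(u, beta, x))                # a multiple of e1 with magnitude ||x||
```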

    2.3.3 Complex Householder Matrices

There are several ways to generalize Householder matrices to the complex case. The most obvious is to let
\[
U = I - 2\,\frac{u u^H}{\|u\|^2}
\]
where the superscript $H$ denotes conjugate transpose. It can be shown that a matrix of this form is both Hermitian ($U = U^H$) and unitary ($U^H U = I$).


However, it is sometimes convenient to be able to construct a $U$ such that $U^H x$ is a real multiple of $e_1$. This is especially true when converting a Hermitian matrix to tridiagonal form prior to an eigenvalue computation. For in this case the tridiagonal matrix becomes a real symmetric matrix even when starting with a complex Hermitian matrix. Thus, it is not necessary to have a separate eigenvalue routine for the complex case. It turns out that there is no Hermitian unitary matrix $U$, as defined above, that is guaranteed to produce a real multiple of $e_1$. Therefore, linear algebra libraries such as LAPACK use elementary unitary matrices of the form
\[
U = I - \tau w w^H \tag{2.50}
\]
where $\tau$ can be complex. These matrices are not in general Hermitian. If $U$ is to be unitary, we must have
\[
I = U^H U = (I - \bar{\tau} w w^H)(I - \tau w w^H) = I - \bigl(\tau + \bar{\tau} - |\tau|^2\|w\|^2\bigr) w w^H
\]
and hence
\[
|\tau|^2\|w\|^2 = 2\operatorname{Re}(\tau). \tag{2.51}
\]
Notice that replacing $w$ by $w/\gamma$ and $\tau$ by $|\gamma|^2\tau$ in equation (2.50) leaves $U$ unchanged. Thus, a scaling of $w$ can be absorbed in $\tau$. We would like to choose $w$ and $\tau$ so that
\[
U^H x = x - \bar{\tau}(w^H x)w = \mu\|x\| e_1 \tag{2.52}
\]
where $\mu = \pm 1$. It can be seen from equation (2.52) that $w$ must be proportional to the vector $x - \mu\|x\| e_1$. Since the factor of proportionality can be absorbed in $\tau$, we choose
\[
w = x - \mu\|x\| e_1. \tag{2.53}
\]
Substituting this expression for $w$ into equation (2.52), we get
\[
U^H x = x - \bar{\tau}(w^H x)(x - \mu\|x\| e_1) = \bigl(1 - \bar{\tau} w^H x\bigr)x + \bar{\tau}(w^H x)\mu\|x\| e_1 = \mu\|x\| e_1. \tag{2.54}
\]
Thus, we must have
\[
\bar{\tau}(w^H x) = 1 \quad \text{or} \quad \tau = \frac{1}{x^H w}. \tag{2.55}
\]
This choice of $\tau$ gives
\[
U^H x = \mu\|x\| e_1.
\]
It follows from equation (2.53) that
\[
x^H w = \|x\|^2 - \mu\|x\|\bar{x}_1 \tag{2.56}
\]
and
\[
\|w\|^2 = (x^H - \mu\|x\| e_1^T)(x - \mu\|x\| e_1) = \|x\|^2 - \mu\|x\| x_1 - \mu\|x\|\bar{x}_1 + \|x\|^2 = 2\|x\|^2 - 2\mu\|x\|\operatorname{Re}(x_1). \tag{2.57}
\]


Thus, it follows from equations (2.55)-(2.57) that
\[
\frac{2\operatorname{Re}(\tau)}{|\tau|^2} = \frac{\tau + \bar{\tau}}{|\tau|^2} = \frac{1}{\bar{\tau}} + \frac{1}{\tau} = w^H x + x^H w
 = \|x\|^2 - \mu\|x\| x_1 + \|x\|^2 - \mu\|x\|\bar{x}_1 = 2\|x\|^2 - 2\mu\|x\|\operatorname{Re}(x_1) = \|w\|^2,
\]
i.e., the condition in equation (2.51) is satisfied. It follows that the matrix $U$ defined by equation (2.50) is unitary when $w$ is defined by equation (2.53) and $\tau$ is defined by equation (2.55). As before we choose $\mu$ to prevent the loss of accuracy due to subtraction in equation (2.53). In this case we choose $\mu = -\operatorname{sign}\bigl(\operatorname{Re}(x_1)\bigr)$. Thus, $w$ becomes
\[
w = x + \operatorname{sign}\bigl(\operatorname{Re}(x_1)\bigr)\|x\| e_1. \tag{2.58}
\]
Let us define a real constant $\alpha$ by
\[
\alpha = \operatorname{sign}\bigl(\operatorname{Re}(x_1)\bigr)\|x\|. \tag{2.59}
\]
With this definition $w$ becomes
\[
w = x + \alpha e_1. \tag{2.60}
\]
It follows that
\[
x^H w = \|x\|^2 + \alpha\bar{x}_1 = \alpha^2 + \alpha\bar{x}_1 = \alpha(\alpha + \bar{x}_1), \tag{2.61}
\]
and hence
\[
\tau = \frac{1}{\alpha(\alpha + \bar{x}_1)}. \tag{2.62}
\]
In LAPACK $w$ is scaled so that $w_1 = 1$, i.e.,
\[
w = \frac{x + \alpha e_1}{x_1 + \alpha}. \tag{2.63}
\]
With this $w$, $\tau$ becomes
\[
\tau = \frac{|x_1 + \alpha|^2}{\alpha(\alpha + \bar{x}_1)} = \frac{(x_1 + \alpha)(\bar{x}_1 + \alpha)}{\alpha(\alpha + \bar{x}_1)} = \frac{x_1 + \alpha}{\alpha}. \tag{2.64}
\]
Clearly this $\tau$ satisfies the inequality
\[
|\tau - 1| = \frac{|x_1|}{|\alpha|} = \frac{|x_1|}{\|x\|} \le 1. \tag{2.65}
\]
It follows from equation (2.64) that $\tau$ is real when $x_1$ is real. Thus, $U$ is Hermitian when $x_1$ is real.

An alternate approach to defining a complex Householder matrix is to let
\[
U = I - 2\,\frac{w w^H}{\|w\|^2}. \tag{2.66}
\]


This $U$ is Hermitian and
\[
U^H U = \Bigl(I - 2\,\frac{w w^H}{\|w\|^2}\Bigr)\Bigl(I - 2\,\frac{w w^H}{\|w\|^2}\Bigr)
      = I - 2\,\frac{w w^H}{\|w\|^2} - 2\,\frac{w w^H}{\|w\|^2} + 4\,\frac{\|w\|^2 w w^H}{\|w\|^4} = I, \tag{2.67}
\]
i.e., $U$ is unitary. We want to choose $w$ so that
\[
U^H x = Ux = x - \frac{2 w^H x}{\|w\|^2}\, w = \mu\|x\| e_1 \tag{2.68}
\]
where $|\mu| = 1$. Multiplying equation (2.68) by $x^H$, we get
\[
\overline{x^H U x} = x^H U^H x = x^H U x = \mu\|x\|\bar{x}_1. \tag{2.69}
\]
Since $x^H U x$ is real, it follows that $\mu\bar{x}_1$ is real. If $x_1 = |x_1| e^{i\theta_1}$, then $\mu$ must have the form
\[
\mu = \pm e^{i\theta_1}. \tag{2.70}
\]
It follows from equation (2.68) that $w$ must be proportional to the vector $x \mp e^{i\theta_1}\|x\| e_1$. Since multiplying $w$ by a constant factor doesn't change $U$, we take
\[
w = x \mp e^{i\theta_1}\|x\| e_1. \tag{2.71}
\]
Again, to avoid accuracy problems, we choose the plus sign in the above formula, i.e.,
\[
w = x + e^{i\theta_1}\|x\| e_1. \tag{2.72}
\]
It follows from this definition that
\[
\|w\|^2 = \bigl(x^H + e^{-i\theta_1}\|x\| e_1^T\bigr)\bigl(x + e^{i\theta_1}\|x\| e_1\bigr)
        = \|x\|^2 + |x_1|\|x\| + |x_1|\|x\| + \|x\|^2 = 2\|x\|\bigl(\|x\| + |x_1|\bigr) \tag{2.73}
\]
and
\[
w^H x = \bigl(x^H + e^{-i\theta_1}\|x\| e_1^T\bigr)x = \|x\|^2 + e^{-i\theta_1} x_1\|x\| = \|x\|\bigl(\|x\| + |x_1|\bigr). \tag{2.74}
\]
Therefore,
\[
\frac{2 w^H x}{\|w\|^2} = 1, \tag{2.75}
\]
and hence
\[
Ux = x - w = x - \bigl(x + e^{i\theta_1}\|x\| e_1\bigr) = -e^{i\theta_1}\|x\| e_1. \tag{2.76}
\]
This alternate form for the Householder matrix has the advantage that it is Hermitian and that the multiplier of $w w^H$ is real. However, it can't in general map a given vector $x$ into a real multiple of $e_1$. Both EISPACK and LINPACK use elementary unitary matrices similar to this. The LAPACK form is not Hermitian, involves a complex multiplier of $w w^H$, but can produce a real multiple of $e_1$ when acting on $x$. As stated before, this can be a big advantage when reducing matrices to triangular form prior to an eigenvalue computation.


    2.3.4 Givens Rotations

Householder matrices are very good at producing long strings of zeroes in a row or column. Sometimes, however, we want to produce a zero in a matrix while altering as little of the matrix as possible. This is true when dealing with matrices that are very sparse (most of the elements are already zero) or when performing many operations in parallel. The Givens rotations can sometimes be used for this purpose. We will begin by considering the case where all matrices and vectors are real. The complex case will be considered in the next section.

The two-dimensional matrix
\[
R = \begin{pmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{pmatrix}
\]
rotates a 2-vector counterclockwise through an angle $\theta$. If we let $c = \cos\theta$ and $s = \sin\theta$, then the matrix $R$ can be written as
\[
R = \begin{pmatrix} c & -s\\ s & c \end{pmatrix}
\]
where $c^2 + s^2 = 1$. If $x$ is a 2-vector, we can determine $c$ and $s$ so that $Rx$ is a multiple of $e_1$. Since
\[
Rx = \begin{pmatrix} c x_1 - s x_2\\ s x_1 + c x_2 \end{pmatrix},
\]
$R$ will have the desired property if $c = x_1/\sqrt{x_1^2 + x_2^2}$ and $s = -x_2/\sqrt{x_1^2 + x_2^2}$. In fact $Rx = \sqrt{x_1^2 + x_2^2}\; e_1$.

Givens matrices are an extension of this two-dimensional rotation to higher dimensions. For $j > i$, the Givens matrix $G(i,j)$ is an $m \times m$ matrix that performs a rotation in the $(i,j)$ coordinate plane. It can be obtained by replacing the $(i,i)$ and $(j,j)$ components of the $m \times m$ identity matrix by $c$, the $(i,j)$ component by $s$ and the $(j,i)$ component by $-s$. It has the matrix form
\[
G(i,j) = \begin{pmatrix}
1 & & & & & & \\
 & \ddots & & & & & \\
 & & c & \cdots & s & & \\
 & & \vdots & \ddots & \vdots & & \\
 & & -s & \cdots & c & & \\
 & & & & & \ddots & \\
 & & & & & & 1
\end{pmatrix} \tag{2.77}
\]
where the $c$'s lie in the $i$-th and $j$-th diagonal positions, $s$ lies in row $i$, column $j$, and $-s$ lies in row $j$, column $i$,


with $c^2 + s^2 = 1$. The matrix $G(i,j)$ is clearly orthogonal. In terms of components
\[
G(i,j)_{kl} = \begin{cases}
1 & k = l,\; k \ne i \text{ and } k \ne j\\
c & k = l,\; k = i \text{ or } k = j\\
s & k = i,\; l = j\\
-s & k = j,\; l = i\\
0 & \text{otherwise.}
\end{cases} \tag{2.78}
\]
Multiplying a vector by $G(i,j)$ only affects the $i$ and $j$ components. If $y = G(i,j)x$, then
\[
y_k = \begin{cases}
x_k & k \ne i \text{ and } k \ne j\\
c x_i + s x_j & k = i\\
-s x_i + c x_j & k = j.
\end{cases} \tag{2.79}
\]
Suppose we want to make $y_j = 0$. We can do this by setting
\[
c = \frac{x_i}{\sqrt{x_i^2 + x_j^2}} \quad \text{and} \quad s = \frac{x_j}{\sqrt{x_i^2 + x_j^2}}. \tag{2.80}
\]
With this choice for $c$ and $s$, $y$ becomes
\[
y_k = \begin{cases}
x_k & k \ne i \text{ and } k \ne j\\
\sqrt{x_i^2 + x_j^2} & k = i\\
0 & k = j.
\end{cases} \tag{2.81}
\]
Multiplying a matrix $A$ on the left by $G(i,j)$ only alters rows $i$ and $j$. Similarly, multiplying $A$ on the right by $G(i,j)$ only alters columns $i$ and $j$.
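A Python/NumPy sketch of equations (2.79)-(2.81): computing $c$ and $s$ and applying $G(i,j)$ to a matrix from the left, touching only rows $i$ and $j$. Function names are illustrative.

```python
import numpy as np

def givens(xi, xj):
    """Return c, s with [[c, s], [-s, c]] @ [xi, xj] = [r, 0]."""
    r = np.hypot(xi, xj)
    return (1.0, 0.0) if r == 0 else (xi / r, xj / r)

def apply_givens(A, i, j, c, s):
    """Left-multiply A by G(i, j) in place: only rows i and j change."""
    Ai, Aj = A[i, :].copy(), A[j, :].copy()
    A[i, :] = c * Ai + s * Aj
    A[j, :] = -s * Ai + c * Aj

A = np.array([[3.0, 1.0], [4.0, 2.0]])
c, s = givens(A[0, 0], A[1, 0])
apply_givens(A, 0, 1, c, s)
print(A)        # the (1, 0) entry is now zero
```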

    2.3.5 Complex Givens Rotations

For the complex case we replace $R$ in the previous section by
\[
R = \begin{pmatrix} c & s\\ -\bar{s} & c \end{pmatrix} \qquad \text{where } c \text{ is real.} \tag{2.82}
\]
It can be easily verified that $R$ is unitary if and only if $c$ and $s$ satisfy
\[
c^2 + |s|^2 = 1.
\]
Given a 2-vector $x$, we want to choose $R$ so that $Rx$ is a multiple of $e_1$. For $R$ unitary, we must have
\[
Rx = \mu\|x\| e_1 \qquad \text{where } |\mu| = 1. \tag{2.83}
\]


Multiplying equation (2.83) by $R^H$, we get
\[
x = R^H R x = \mu\|x\| R^H e_1 = \mu\|x\|\begin{pmatrix} c\\ \bar{s} \end{pmatrix} \tag{2.84}
\]
or
\[
c = \frac{\bar{\mu}\, x_1}{\|x\|} \quad \text{and} \quad s = \frac{\mu\,\bar{x}_2}{\|x\|}. \tag{2.85}
\]
We define $\operatorname{sign}(u)$ for $u$ complex by
\[
\operatorname{sign}(u) = \begin{cases} u/|u| & u \ne 0\\ 1 & u = 0. \end{cases} \tag{2.86}
\]
If $c$ is to be real, $\mu$ must have the form $\mu = \pm\operatorname{sign}(x_1)$. With this choice of $\mu$, $c$ and $s$ become
\[
c = \frac{|x_1|}{\|x\|} \quad \text{and} \quad s = \frac{\operatorname{sign}(x_1)\,\bar{x}_2}{\|x\|}. \tag{2.87}
\]
If we want the complex case to reduce to the real case when $x_1$ and $x_2$ are real, then we can choose $\mu = \operatorname{sign}\bigl(\operatorname{Re}(x_1)\bigr)$. As before, we can construct $G(i,j)$ by replacing the $(i,i)$ and $(j,j)$ components of the identity matrix by $c$, the $(i,j)$ component by $s$, and the $(j,i)$ component by $-\bar{s}$. In the expressions for $c$ and $s$ in equation (2.87), we replace $x_1$ by $x_i$, $x_2$ by $x_j$, and $\|x\|$ by $\sqrt{|x_i|^2 + |x_j|^2}$.

    2.3.6 QR Factorization Using Householder Reflectors

Let $A$ be an $m \times n$ matrix with $m > n$. Let $Q_1$ be a Householder matrix that maps the first column of $A$ into a multiple of $e_1$. Then $Q_1 A$ will have zeroes below the diagonal in the first column. Now let
\[
Q_2 = \begin{pmatrix} 1 & 0\\ 0 & \hat{Q}_2 \end{pmatrix}
\]
where $\hat{Q}_2$ is an $(m-1) \times (m-1)$ Householder matrix that will zero out the entries below the diagonal in the second column of $Q_1 A$. Continuing in this manner, we can construct $Q_2, \ldots, Q_{n-1}$ so that
\[
Q_{n-1} \cdots Q_1 A = \begin{pmatrix} \hat{R}\\ 0 \end{pmatrix} \tag{2.88}
\]
where $\hat{R}$ is an $n \times n$ triangular matrix. The matrices $Q_k$ have the form
\[
Q_k = \begin{pmatrix} I & 0\\ 0 & \hat{Q}_k \end{pmatrix} \tag{2.89}
\]


where $\hat{Q}_k$ is an $(m-k+1) \times (m-k+1)$ Householder matrix. If we define
\[
Q^H = Q_{n-1} \cdots Q_1 \quad \text{and} \quad R = \begin{pmatrix} \hat{R}\\ 0 \end{pmatrix}, \tag{2.90}
\]
then equation (2.88) can be written
\[
Q^H A = R. \tag{2.91}
\]
Moreover, since each $Q_k$ is unitary, we have
\[
Q^H Q = (Q_{n-1} \cdots Q_1)(Q_1^H \cdots Q_{n-1}^H) = I, \tag{2.92}
\]
i.e., $Q$ is unitary. Therefore, equation (2.91) can be written
\[
A = QR. \tag{2.93}
\]
Equation (2.93) is the desired factorization. The operations count for this factorization is approximately $mn^2$ operations where an operation is an addition and a multiplication. In practice it is not necessary to construct the matrix $Q$ explicitly. Usually only the vectors $v$ defining each $Q_k$ are saved.

If $\hat{Q}$ is the matrix consisting of the first $n$ columns of $Q$, then
\[
A = \hat{Q}\hat{R} \tag{2.94}
\]
where $\hat{Q}$ is a $m \times n$ matrix with orthonormal columns and $\hat{R}$ is an $n \times n$ upper triangular matrix. The factorization in equation (2.94) is the reduced QR factorization.
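A Python/NumPy sketch of the reduction (2.88) by Householder reflectors. For clarity it accumulates $Q$ explicitly even though, as noted above, practical codes store only the defining vectors; the function name is illustrative.

```python
import numpy as np

def householder_qr(A):
    """Householder reduction of A (m x n, m >= n): returns Q (m x m) and R with A = Q @ R."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    R = A.copy()
    Q = np.eye(m)
    for k in range(n):
        x = R[k:, k].copy()
        sign = 1.0 if x[0] >= 0 else -1.0
        u = x
        u[0] += sign * np.linalg.norm(x)               # u = x + sign(x1) ||x|| e1, cf. eq. (2.44)
        norm_u = np.linalg.norm(u)
        if norm_u == 0.0:                              # column already zero below the diagonal
            continue
        u /= norm_u
        R[k:, k:] -= 2.0 * np.outer(u, u @ R[k:, k:])  # apply Q_k = I - 2 u u^T on the left
        Q[:, k:]  -= 2.0 * np.outer(Q[:, k:] @ u, u)   # accumulate Q = Q_1 Q_2 ... Q_n
    return Q, R

A = np.random.rand(5, 3)
Q, R = householder_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(5)))
```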

    2.3.7 Uniqueness of the Reduced QR Factorization

In this section we will show that a matrix $A$ of full rank has a unique reduced QR factorization if we require that the triangular matrix $R$ has positive diagonals. All other reduced QR factorizations of $A$ are simply related to this one with positive diagonals.

The reduced QR factorization can be written
\[
A = (a_1, a_2, \ldots, a_n) = (q_1, q_2, \ldots, q_n)
\begin{pmatrix}
r_{11} & r_{12} & \cdots & r_{1n}\\
 & r_{22} & \cdots & r_{2n}\\
 & & \ddots & \vdots\\
 & & & r_{nn}
\end{pmatrix}. \tag{2.95}
\]
If $A$ has full rank, then all of the diagonal elements $r_{jj}$ must be nonzero. Equating columns in equation (2.95), we get
\[
a_j = \sum_{k=1}^{j} r_{kj} q_k = r_{jj} q_j + \sum_{k=1}^{j-1} r_{kj} q_k
\]


or
\[
q_j = \frac{1}{r_{jj}}\Bigl(a_j - \sum_{k=1}^{j-1} r_{kj} q_k\Bigr). \tag{2.96}
\]
When $j = 1$ equation (2.96) reduces to
\[
q_1 = \frac{a_1}{r_{11}}. \tag{2.97}
\]
Since $q_1$ must have unit norm, it follows that
\[
|r_{11}| = \|a_1\|. \tag{2.98}
\]
Equations (2.97) and (2.98) determine $q_1$ and $r_{11}$ up to a factor having absolute value one, i.e., there is a $d_1$ with $|d_1| = 1$ such that
\[
r_{11} = d_1 \hat{r}_{11}, \qquad q_1 = \hat{q}_1/d_1
\]
where $\hat{r}_{11} = \|a_1\|$ and $\hat{q}_1 = a_1/\hat{r}_{11}$.

For $j = 2$, equation (2.96) becomes
\[
q_2 = \frac{1}{r_{22}}\bigl(a_2 - r_{12} q_1\bigr).
\]
Since the columns $q_1$ and $q_2$ must be orthonormal, it follows that
\[
0 = q_1^H q_2 = \frac{1}{r_{22}}\bigl(q_1^H a_2 - r_{12}\bigr)
\]
and hence that
\[
r_{12} = q_1^H a_2 = d_1 \hat{q}_1^H a_2. \tag{2.99}
\]
Here we have used the fact that $\bar{d}_1 = 1/d_1$. Since $q_2$ has unit norm, it follows that
\[
1 = \|q_2\| = \frac{1}{|r_{22}|}\|a_2 - r_{12} q_1\|
 = \frac{1}{|r_{22}|}\bigl\|a_2 - (d_1 \hat{q}_1^H a_2)\hat{q}_1/d_1\bigr\|
 = \frac{1}{|r_{22}|}\bigl\|a_2 - (\hat{q}_1^H a_2)\hat{q}_1\bigr\|
\]
and hence that
\[
|r_{22}| = \bigl\|a_2 - (\hat{q}_1^H a_2)\hat{q}_1\bigr\| \equiv \hat{r}_{22}.
\]
Therefore, there exists a scalar $d_2$ with $|d_2| = 1$ such that
\[
r_{22} = d_2 \hat{r}_{22} \quad \text{and} \quad q_2 = \hat{q}_2/d_2
\]
where $\hat{q}_2 = \bigl(a_2 - (\hat{q}_1^H a_2)\hat{q}_1\bigr)/\hat{r}_{22}$.

For $j = 3$, equation (2.96) becomes
\[
q_3 = \frac{1}{r_{33}}\bigl(a_3 - r_{13} q_1 - r_{23} q_2\bigr).
\]


Since the columns $q_1$, $q_2$ and $q_3$ must be orthonormal, it follows that
\[
0 = q_1^H q_3 = \frac{1}{r_{33}}\bigl(q_1^H a_3 - r_{13}\bigr), \qquad
0 = q_2^H q_3 = \frac{1}{r_{33}}\bigl(q_2^H a_3 - r_{23}\bigr)
\]
and hence that
\[
r_{13} = q_1^H a_3 = d_1 \hat{q}_1^H a_3, \qquad r_{23} = q_2^H a_3 = d_2 \hat{q}_2^H a_3.
\]
Since $q_3$ has unit norm, it follows that
\[
1 = \|q_3\| = \frac{1}{|r_{33}|}\|a_3 - r_{13} q_1 - r_{23} q_2\|
 = \frac{1}{|r_{33}|}\bigl\|a_3 - (\hat{q}_1^H a_3)\hat{q}_1 - (\hat{q}_2^H a_3)\hat{q}_2\bigr\|
\]
and hence that
\[
|r_{33}| = \bigl\|a_3 - (\hat{q}_1^H a_3)\hat{q}_1 - (\hat{q}_2^H a_3)\hat{q}_2\bigr\| \equiv \hat{r}_{33}.
\]
Therefore, there exists a scalar $d_3$ with $|d_3| = 1$ such that
\[
r_{33} = d_3 \hat{r}_{33} \quad \text{and} \quad q_3 = \hat{q}_3/d_3 \tag{2.100}
\]
where $\hat{q}_3 = \bigl(a_3 - (\hat{q}_1^H a_3)\hat{q}_1 - (\hat{q}_2^H a_3)\hat{q}_2\bigr)/\hat{r}_{33}$. Continuing in this way we obtain the unitary matrix $\hat{Q} = (\hat{q}_1, \ldots, \hat{q}_n)$ and the triangular matrix
\[
\hat{R} = \begin{pmatrix}
\hat{r}_{11} & \hat{r}_{12} & \cdots & \hat{r}_{1n}\\
 & \hat{r}_{22} & \cdots & \hat{r}_{2n}\\
 & & \ddots & \vdots\\
 & & & \hat{r}_{nn}
\end{pmatrix}
\]
such that $A = \hat{Q}\hat{R}$ is the unique reduced QR factorization of $A$ with $R$ having positive diagonal elements. If $A = QR$ is any other reduced QR factorization of $A$, then
\[
R = \begin{pmatrix} d_1 & & \\ & \ddots & \\ & & d_n \end{pmatrix}\hat{R}
\quad \text{and} \quad
Q = \hat{Q}\begin{pmatrix} 1/d_1 & & \\ & \ddots & \\ & & 1/d_n \end{pmatrix}
  = \hat{Q}\begin{pmatrix} \bar{d}_1 & & \\ & \ddots & \\ & & \bar{d}_n \end{pmatrix}
\]
where $|d_1| = \cdots = |d_n| = 1$.


    2.3.8 Solution of Least Squares Problems

In this section we will show how to use the QR factorization to solve the least squares problem. Consider the system of linear equations
\[
Ax = b \tag{2.101}
\]
where $A$ is an $m \times n$ matrix with $m > n$. In general there is no solution to this system of equations. Instead we seek to find an $x$ so that $\|Ax - b\|$ is as small as possible. In view of the QR factorization, we have
\[
\|Ax - b\|^2 = \|QRx - b\|^2 = \|Q(Rx - Q^H b)\|^2 = \|Rx - Q^H b\|^2. \tag{2.102}
\]
We can write $Q$ in the partitioned form $Q = (Q_1, Q_2)$ where $Q_1$ is an $m \times n$ matrix. Then
\[
Rx - Q^H b = \begin{pmatrix} \hat{R}x\\ 0 \end{pmatrix} - \begin{pmatrix} Q_1^H b\\ Q_2^H b \end{pmatrix}
           = \begin{pmatrix} \hat{R}x - Q_1^H b\\ -Q_2^H b \end{pmatrix}. \tag{2.103}
\]
It follows from equation (2.103) that
\[
\|Rx - Q^H b\|^2 = \|\hat{R}x - Q_1^H b\|^2 + \|Q_2^H b\|^2. \tag{2.104}
\]
Combining equations (2.102) and (2.104), we get
\[
\|Ax - b\|^2 = \|\hat{R}x - Q_1^H b\|^2 + \|Q_2^H b\|^2. \tag{2.105}
\]
It can be easily seen from this equation that $\|Ax - b\|$ is minimized when $x$ is the solution of the triangular system
\[
\hat{R}x = Q_1^H b \tag{2.106}
\]
when such a solution exists. This is the standard way of solving least squares systems. Later we will discuss the singular value decomposition (SVD) that will provide even more information relative to the least squares problem. However, the SVD is much more expensive to compute than the QR decomposition.
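A short Python/NumPy sketch of solving the least squares problem through equation (2.106), using NumPy's built-in reduced QR factorization as a stand-in for the Householder routine described above; the function name is illustrative.

```python
import numpy as np

def lstsq_qr(A, b):
    """Minimize ||A x - b|| by solving R_hat x = Q_hat^H b (eq. (2.106))."""
    Q, R = np.linalg.qr(A, mode='reduced')        # A = Q_hat R_hat
    return np.linalg.solve(R, Q.conj().T @ b)

A = np.random.rand(8, 3)
b = np.random.rand(8)
x = lstsq_qr(A, b)
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))
```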

    2.4 The Singular Value Decomposition

    The Singular Value Decomposition (SVD) is one of the most important and probably one of the

least well known of the matrix factorizations. It has many applications in statistics, signal processing, image compression, pattern recognition, weather prediction, and modal analysis to name a

    few. It is also a powerful diagnostic tool. For example, it provides approximations to the rank and

    the condition number of a matrix as well as providing orthonormal bases for both the range and

    the null space of a matrix. It also provides optimal low rank approximations to a matrix. The SVD

    is applicable to both square and rectangular matrices. In this regard it provides a general solution

    to the least squares problem.


    The SVD was first discovered by differential geometers in connection with the analysis of bilinear

    forms. Eugenio Beltrami [1] (1873) and Camille Jordan [10] (1874) independently discovered

    that the singular values of the matrix associated with a bilinear form comprise a complete set

    of invariants for the form under orthogonal substitutions. The first proof of the singular value

    decomposition for rectangular and complex matrices seems to be by Eckart and Young [5] in 1939.

    They saw it as a generalization of the principal axis transformation for Hermitian matrices.

    We will begin by deriving the SVD and presenting some of its most important properties. We will

then discuss its application to least squares problems and matrix approximation problems. Following this we will show how singular values can be used to determine the condition of a matrix (how

    close the rows or columns are to being linearly dependent). We will conclude with a brief outline

    of the methods used to compute the SVD. Most of the methods are modifications of methods used

    to compute eigenvalues and vectors of a square matrix. The details of the computational methods

    are beyond the scope of this presentation, but we will provide references for those interested.

    2.4.1 Derivation and Properties of the SVD

Theorem 1. (Singular Value Decomposition) Let $A$ be a nonzero $m \times n$ matrix. Then there exists an orthonormal basis $u_1, \ldots, u_m$ of $m$-vectors, an orthonormal basis $v_1, \ldots, v_n$ of $n$-vectors, and positive numbers $\sigma_1, \ldots, \sigma_r$ such that

1. $u_1, \ldots, u_r$ is a basis of the range of $A$
2. $v_{r+1}, \ldots, v_n$ is a basis of the null space of $A$
3. $A = \sum_{k=1}^{r} \sigma_k u_k v_k^H$.

Proof: $A^H A$ is a Hermitian $n \times n$ matrix that is positive semidefinite. Therefore, there is an orthonormal basis $v_1, \ldots, v_n$ and nonnegative numbers $\sigma_1^2, \ldots, \sigma_n^2$ such that
\[
A^H A v_k = \sigma_k^2 v_k \qquad k = 1, \ldots, n. \tag{2.107}
\]
Since $A$ is nonzero, at least one of the eigenvalues $\sigma_k^2$ must be positive. Let the eigenvalues be arranged so that $\sigma_1^2 \ge \sigma_2^2 \ge \cdots \ge \sigma_r^2 > 0$ and $\sigma_{r+1}^2 = \cdots = \sigma_n^2 = 0$. Consider now the vectors $Av_1, \ldots, Av_n$. We have
\[
(Av_i)^H A v_j = v_i^H A^H A v_j = \sigma_j^2 v_i^H v_j = 0 \qquad i \ne j, \tag{2.108}
\]
i.e., $Av_1, \ldots, Av_n$ are orthogonal. When $i = j$
\[
\|Av_i\|^2 = v_i^H A^H A v_i = \sigma_i^2 v_i^H v_i = \sigma_i^2\;
\begin{cases} > 0 & i = 1, \ldots, r\\ = 0 & i > r. \end{cases} \tag{2.109}
\]
Thus, $Av_{r+1} = \cdots = Av_n = 0$ and hence $v_{r+1}, \ldots, v_n$ belong to the null space of $A$. Define $u_1, \ldots, u_r$ by
\[
u_i = (1/\sigma_i) A v_i \qquad i = 1, \ldots, r. \tag{2.110}
\]


Then $u_1, \ldots, u_r$ is an orthonormal set of vectors in the range of $A$ that span the range of $A$. Thus, $u_1, \ldots, u_r$ is a basis for the range of $A$. The dimension $r$ of the range of $A$ is called the rank of $A$. If $r < m$, we can extend the set $u_1, \ldots, u_r$ of orthonormal vectors to an orthonormal basis $u_1, \ldots, u_m$ of $m$-space using the Gram-Schmidt process. If $x$ is an $n$-vector, we can write $x$ in terms of the basis $v_1, \ldots, v_n$ as
\[
x = \sum_{k=1}^{n} (v_k^H x)\, v_k. \tag{2.111}
\]
It follows from equations (2.110) and (2.111) that
\[
Ax = \sum_{k=1}^{n} (v_k^H x)\, A v_k = \sum_{k=1}^{r} (v_k^H x)\,\sigma_k u_k = \sum_{k=1}^{r} \sigma_k u_k v_k^H x. \tag{2.112}
\]
Since $x$ in equation (2.112) was arbitrary, we must have
\[
A = \sum_{k=1}^{r} \sigma_k u_k v_k^H. \tag{2.113}
\]
The representation of $A$ in equation (2.113) is called the singular value decomposition (SVD). If $x$ belongs to the null space of $A$ ($Ax = 0$), then it follows from equation (2.112) and the linear independence of the vectors $u_1, \ldots, u_r$ that $v_k^H x = 0$ for $k = 1, \ldots, r$. It then follows from equation (2.111) that
\[
x = \sum_{k=r+1}^{n} (v_k^H x)\, v_k,
\]
i.e., $v_{r+1}, \ldots, v_n$ span the null space of $A$. Since $v_{r+1}, \ldots, v_n$ are orthonormal vectors belonging to the null space of $A$, they form a basis for the null space of $A$.

We will now express the SVD in matrix form. Define $U = (u_1, \ldots, u_m)$, $V = (v_1, \ldots, v_n)$, and $S = \operatorname{diag}(\sigma_1, \ldots, \sigma_r)$. If $r < \min(m, n)$, the remaining entries are filled out with zero blocks so that the SVD can be written in the matrix form
\[
A = U\begin{pmatrix} S & 0\\ 0 & 0 \end{pmatrix}V^H. \tag{2.114}
\]


Generally we write the SVD in the form (2.114) with the understanding that some of the zero portions might collapse and disappear.

We next give a geometric interpretation of the SVD. For this purpose we will restrict ourselves to the real case. Let $x$ be a point on the unit sphere, i.e., $\|x\| = 1$. Since $u_1, \ldots, u_r$ is a basis for the range of $A$, there exist numbers $y_1, \ldots, y_r$ such that
\[
Ax = \sum_{k=1}^{r} y_k u_k = \sum_{k=1}^{r} \sigma_k (v_k^T x)\, u_k.
\]
Therefore, $y_k = \sigma_k (v_k^T x)$, $k = 1, \ldots, r$. Since the columns of $V$ form an orthonormal basis, we have
\[
x = \sum_{k=1}^{n} (v_k^T x)\, v_k.
\]
Therefore,
\[
\|x\|^2 = \sum_{k=1}^{n} (v_k^T x)^2 = 1.
\]
It follows that
\[
\frac{y_1^2}{\sigma_1^2} + \cdots + \frac{y_r^2}{\sigma_r^2} = (v_1^T x)^2 + \cdots + (v_r^T x)^2 \le 1.
\]
Here equality holds when $r = n$. Thus, the image of $x$ lies on or interior to the hyper ellipsoid with semi axes $\sigma_1 u_1, \ldots, \sigma_r u_r$. Conversely, if $y_1, \ldots, y_r$ satisfy
\[
\frac{y_1^2}{\sigma_1^2} + \cdots + \frac{y_r^2}{\sigma_r^2} \le 1,
\]
we define $\beta^2 = 1 - \sum_{k=1}^{r} (y_k/\sigma_k)^2$ and
\[
x = \sum_{k=1}^{r} \frac{y_k}{\sigma_k}\, v_k + \beta v_{r+1}.
\]
Since $v_{r+1}$ is in the null space of $A$ and $Av_k = \sigma_k u_k$ ($k \le r$), it follows that
\[
Ax = \sum_{k=1}^{r} \frac{y_k}{\sigma_k}\, A v_k + \beta A v_{r+1} = \sum_{k=1}^{r} y_k u_k.
\]
In addition,
\[
\|x\|^2 = \sum_{k=1}^{r} \frac{y_k^2}{\sigma_k^2} + \beta^2 = 1.
\]


Thus, we have shown that the image of the unit sphere $\|x\| = 1$ under the mapping $A$ is the hyperellipsoid
\[
\frac{y_1^2}{\sigma_1^2} + \cdots + \frac{y_r^2}{\sigma_r^2} \le 1
\]
relative to the basis $u_1,\ldots,u_r$. When $r = n$, equality holds and the image is the surface of the hyperellipsoid
\[
\frac{y_1^2}{\sigma_1^2} + \cdots + \frac{y_n^2}{\sigma_n^2} = 1.
\]
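This fact is easy to observe numerically. The following sketch (assuming NumPy; the random matrix and the loop are purely illustrative) maps unit vectors through a square, full-rank $A$ and checks that the coordinates $y_k = u_k^T A x$ satisfy the ellipsoid equation with equality since $r = n$:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))          # square, almost surely full rank (r = n)
U, s, Vh = np.linalg.svd(A)

for _ in range(5):
    x = rng.standard_normal(3)
    x /= np.linalg.norm(x)               # point on the unit sphere
    y = U.T @ (A @ x)                    # coordinates of Ax in the basis u_1,...,u_n
    print(np.sum((y / s) ** 2))          # = 1 up to roundoff since r = n
```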

    2.4.2 The SVD and Least Squares Problems

In least squares problems we seek an $x$ that minimizes $\|Ax - b\|$. In view of the singular value decomposition, we have
\[
\|Ax - b\|^2
= \left\| U \begin{pmatrix} S & 0 \\ 0 & 0 \end{pmatrix} V^H x - b \right\|^2
= \left\| U \left[ \begin{pmatrix} S & 0 \\ 0 & 0 \end{pmatrix} V^H x - U^H b \right] \right\|^2
= \left\| \begin{pmatrix} S & 0 \\ 0 & 0 \end{pmatrix} V^H x - U^H b \right\|^2. \tag{2.118}
\]
If we define
\[
y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = V^H x \tag{2.119}
\]
\[
\hat b = \begin{pmatrix} \hat b_1 \\ \hat b_2 \end{pmatrix} = U^H b, \tag{2.120}
\]
then equation (2.118) can be written
\[
\|Ax - b\|^2 = \left\| \begin{pmatrix} S y_1 - \hat b_1 \\ -\hat b_2 \end{pmatrix} \right\|^2
= \|S y_1 - \hat b_1\|^2 + \|\hat b_2\|^2. \tag{2.121}
\]
It is clear from equation (2.121) that $\|Ax - b\|$ is minimized when $y_1 = S^{-1} \hat b_1$. Therefore, the $y$ that minimizes $\|Ax - b\|$ is given by
\[
y = \begin{pmatrix} S^{-1} \hat b_1 \\ y_2 \end{pmatrix}, \qquad y_2 \ \text{arbitrary.} \tag{2.122}
\]
In view of equation (2.119), the $x$ that minimizes $\|Ax - b\|$ is given by
\[
x = V y = V \begin{pmatrix} S^{-1} \hat b_1 \\ y_2 \end{pmatrix}, \qquad y_2 \ \text{arbitrary.} \tag{2.123}
\]


Since $V$ is unitary, it follows from equation (2.123) that
\[
\|x\|^2 = \|S^{-1} \hat b_1\|^2 + \|y_2\|^2.
\]
Thus, there is a unique $x$ of minimum norm that minimizes $\|Ax - b\|$, namely the $x$ corresponding to $y_2 = 0$. This $x$ is given by
\[
x = V \begin{pmatrix} S^{-1} \hat b_1 \\ 0 \end{pmatrix}
= V \begin{pmatrix} S^{-1} & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \hat b_1 \\ \hat b_2 \end{pmatrix}
= V \begin{pmatrix} S^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^H b.
\]
The matrix multiplying $b$ on the right-hand side of this equation is called the generalized inverse of $A$ and is denoted by $A^{+}$, i.e.,
\[
A^{+} = V \begin{pmatrix} S^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^H. \tag{2.124}
\]
Thus, the minimum norm solution of the least squares problem is given by $x = A^{+} b$. The $n \times m$ matrix $A^{+}$ plays the same role in least squares problems that $A^{-1}$ plays in the solution of linear equations. We will now show that this definition of the generalized inverse gives the same result as the classical Moore-Penrose conditions.
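The generalized inverse (2.124) is straightforward to build from the SVD factors. The sketch below (assuming NumPy; the rank-deficient test matrix and cutoff are illustrative) constructs $A^{+}$ directly and checks that it matches the library pseudoinverse and yields the minimum-norm least squares solution:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 4))   # 6x4, rank 3
b = rng.standard_normal(6)

U, s, Vh = np.linalg.svd(A)
r = int(np.sum(s > 1e-12 * s[0]))

# Equation (2.124): A+ = V [S^-1 0; 0 0] U^H
Splus = np.zeros((A.shape[1], A.shape[0]))
Splus[:r, :r] = np.diag(1.0 / s[:r])
A_plus = Vh.T @ Splus @ U.T

x = A_plus @ b                                                 # minimum-norm LS solution
print(np.allclose(A_plus, np.linalg.pinv(A)))                  # True
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))    # True
```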

Theorem 2. If $A$ has a singular value decomposition given by
\[
A = U \begin{pmatrix} S & 0 \\ 0 & 0 \end{pmatrix} V^H,
\]
then the matrix $X$ defined by
\[
X = A^{+} = V \begin{pmatrix} S^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^H
\]
is the unique solution of the Moore-Penrose conditions:

1. $AXA = A$
2. $XAX = X$
3. $(AX)^H = AX$
4. $(XA)^H = XA$.


Proof:
\[
AXA = U \begin{pmatrix} S & 0 \\ 0 & 0 \end{pmatrix} V^H V \begin{pmatrix} S^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^H U \begin{pmatrix} S & 0 \\ 0 & 0 \end{pmatrix} V^H
= U \begin{pmatrix} S & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix} V^H
= U \begin{pmatrix} S & 0 \\ 0 & 0 \end{pmatrix} V^H = A,
\]
i.e., $X$ satisfies condition (1).
\[
XAX = V \begin{pmatrix} S^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^H U \begin{pmatrix} S & 0 \\ 0 & 0 \end{pmatrix} V^H V \begin{pmatrix} S^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^H
= V \begin{pmatrix} S^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^H = X,
\]
i.e., $X$ satisfies condition (2). Since
\[
AX = U \begin{pmatrix} S & 0 \\ 0 & 0 \end{pmatrix} V^H V \begin{pmatrix} S^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^H = U \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix} U^H
\]
and
\[
XA = V \begin{pmatrix} S^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^H U \begin{pmatrix} S & 0 \\ 0 & 0 \end{pmatrix} V^H = V \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix} V^H,
\]
it follows that both $AX$ and $XA$ are Hermitian, i.e., $X$ satisfies conditions (3) and (4). To show uniqueness let us suppose that both $X$ and $Y$ satisfy the Moore-Penrose conditions. Then
\begin{align*}
X &= XAX && \text{by (2)} \\
  &= X(AX)^H = X X^H A^H && \text{by (3)} \\
  &= X X^H (AYA)^H = X X^H A^H Y^H A^H && \text{by (1)} \\
  &= X X^H A^H (AY)^H = X X^H A^H A Y && \text{by (3)} \\
  &= X (AX)^H A Y = XAXAY && \text{by (3)} \\
  &= XAY && \text{by (2)} \\
  &= X(AYA)Y && \text{by (1)} \\
  &= XA(YA)Y \\
  &= XA(YA)^H Y = XAA^H Y^H Y && \text{by (4)} \\
  &= (XA)^H A^H Y^H Y = A^H X^H A^H Y^H Y && \text{by (4)} \\
  &= (AXA)^H Y^H Y = A^H Y^H Y && \text{by (1)} \\
  &= (YA)^H Y = YAY && \text{by (4)} \\
  &= Y && \text{by (2).}
\end{align*}
Thus, there is only one matrix $X$ satisfying the Moore-Penrose conditions.
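The four conditions are also easy to verify numerically. A minimal sketch (assuming NumPy, with an arbitrary real test matrix, so the Hermitian transposes reduce to ordinary transposes):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))
X = np.linalg.pinv(A)

print(np.allclose(A @ X @ A, A))          # condition (1)
print(np.allclose(X @ A @ X, X))          # condition (2)
print(np.allclose((A @ X).T, A @ X))      # condition (3)
print(np.allclose((X @ A).T, X @ A))      # condition (4)
```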


    2.4.3 Singular Values and the Norm of a Matrix

Let $A$ be an $m \times n$ matrix. By virtue of the SVD, we have
\[
Ax = \sum_{k=1}^{r} \sigma_k (v_k^H x)\, u_k \qquad \text{for any $n$-vector } x. \tag{2.125}
\]
Since the vectors $u_1,\ldots,u_r$ are orthonormal, we have
\[
\|Ax\|^2 = \sum_{k=1}^{r} \sigma_k^2 |v_k^H x|^2 \le \sigma_1^2 \sum_{k=1}^{r} |v_k^H x|^2 \le \sigma_1^2 \|x\|^2. \tag{2.126}
\]
The last inequality comes from the fact that $x$ has the expansion $x = \sum_{k=1}^{n} (v_k^H x) v_k$ in terms of the orthonormal basis $v_1,\ldots,v_n$ and hence
\[
\|x\|^2 = \sum_{k=1}^{n} |v_k^H x|^2.
\]
Thus, we have
\[
\|Ax\| \le \sigma_1 \|x\| \qquad \text{for all } x. \tag{2.127}
\]
Since $Av_1 = \sigma_1 u_1$, we have $\|Av_1\| = \sigma_1 = \sigma_1 \|v_1\|$. Hence,
\[
\max_{x \ne 0} \frac{\|Ax\|}{\|x\|} = \sigma_1, \tag{2.128}
\]
i.e., $A$ can't stretch the length of a vector by a factor greater than $\sigma_1$. One of the definitions of the norm of a matrix is
\[
\|A\| = \sup_{x \ne 0} \frac{\|Ax\|}{\|x\|}. \tag{2.129}
\]
It follows from equations (2.128) and (2.129) that $\|A\| = \sigma_1$ (the maximum singular value of $A$). If $A$ is of full rank ($r = n$), then it follows by a similar argument that
\[
\min_{x \ne 0} \frac{\|Ax\|}{\|x\|} = \sigma_n.
\]
If $A$ is an $m \times n$ matrix and $B$ is an $n \times p$ matrix, then for every $p$-vector $x$ we have
\[
\|ABx\| \le \|A\|\,\|Bx\| \le \|A\|\,\|B\|\,\|x\|
\]
and hence $\|AB\| \le \|A\|\,\|B\|$.
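These identities can be checked directly; the following is a small sketch (assuming NumPy; the test matrices and the tolerance added to the product bound are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3))
B = rng.standard_normal((3, 4))

s = np.linalg.svd(A, compute_uv=False)
print(np.isclose(np.linalg.norm(A, 2), s[0]))     # ||A|| equals the largest singular value
print(np.isclose(np.linalg.norm(A, -2), s[-1]))   # smallest singular value (A has full column rank)
print(np.linalg.norm(A @ B, 2)
      <= np.linalg.norm(A, 2) * np.linalg.norm(B, 2) + 1e-12)   # ||AB|| <= ||A|| ||B||
```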

    2.4.4 Low Rank Matrix Approximations

You can think of the rank of a matrix as a measure of redundancy. Matrices of low rank should have lots of redundancy and hence should be capable of specification by fewer parameters than the


    total number of entries. For example, if the matrix consists of the pixel values of a digital image,

    then a lower rank approximation of this image should represent a form of image compression. We

    will make this concept more precise in this section.

One choice for a low rank approximation to $A$ is the matrix $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^H$ for $k < r$. $A_k$ is a truncated SVD expansion of $A$. Clearly
\[
A - A_k = \sum_{i=k+1}^{r} \sigma_i u_i v_i^H. \tag{2.130}
\]
Since the largest singular value of $A - A_k$ is $\sigma_{k+1}$, we have
\[
\|A - A_k\| = \sigma_{k+1}. \tag{2.131}
\]
Suppose $B$ is another $m \times n$ matrix of rank $k$. Then the null space $N$ of $B$ has dimension $n-k$. Let $w_1,\ldots,w_{n-k}$ be a basis for $N$. The $n+1$ $n$-vectors $w_1,\ldots,w_{n-k}, v_1,\ldots,v_{k+1}$ must be linearly dependent, i.e., there are constants $\alpha_1,\ldots,\alpha_{n-k}$ and $\beta_1,\ldots,\beta_{k+1}$, not all zero, such that
\[
\sum_{i=1}^{n-k} \alpha_i w_i + \sum_{i=1}^{k+1} \beta_i v_i = 0.
\]
Not all of the $\alpha_i$ can be zero since $v_1,\ldots,v_{k+1}$ are linearly independent. Similarly, not all of the $\beta_i$ can be zero. Therefore, the vector $h$ defined by
\[
h = \sum_{i=1}^{n-k} \alpha_i w_i = -\sum_{i=1}^{k+1} \beta_i v_i
\]
is a nonzero vector that belongs to both $N$ and $\langle v_1,\ldots,v_{k+1} \rangle$. By proper scaling, we can assume that $h$ is a vector with unit norm. Since $h$ belongs to $\langle v_1,\ldots,v_{k+1} \rangle$, we have
\[
h = \sum_{i=1}^{k+1} (v_i^H h)\, v_i. \tag{2.132}
\]
Therefore,
\[
\|h\|^2 = \sum_{i=1}^{k+1} |v_i^H h|^2. \tag{2.133}
\]
Since $Av_i = \sigma_i u_i$ for $i = 1,\ldots,r$, it follows from equation (2.132) that
\[
Ah = \sum_{i=1}^{k+1} (v_i^H h)\, A v_i = \sum_{i=1}^{k+1} (v_i^H h)\, \sigma_i u_i. \tag{2.134}
\]
Therefore,
\[
\|Ah\|^2 = \sum_{i=1}^{k+1} |v_i^H h|^2 \sigma_i^2 \ge \sigma_{k+1}^2 \sum_{i=1}^{k+1} |v_i^H h|^2 = \sigma_{k+1}^2 \|h\|^2. \tag{2.135}
\]


Since $h$ belongs to the null space $N$, we have
\[
\|A - B\|^2 \ge \|(A - B)h\|^2 = \|Ah\|^2 \ge \sigma_{k+1}^2 \|h\|^2 = \sigma_{k+1}^2. \tag{2.136}
\]
Combining equations (2.131) and (2.136), we obtain
\[
\|A - B\| \ge \sigma_{k+1} = \|A - A_k\|. \tag{2.137}
\]
Thus, $A_k$ is the rank $k$ matrix that is closest to $A$.
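Forming the truncated expansion is a one-liner once the SVD is available. The sketch below (assuming NumPy; the matrix size and truncation level are illustrative) also confirms the error identity (2.131):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((8, 6))
U, s, Vh = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = (U[:, :k] * s[:k]) @ Vh[:k, :]                   # truncated SVD, rank k
print(np.linalg.matrix_rank(A_k))                      # 2
print(np.isclose(np.linalg.norm(A - A_k, 2), s[k]))    # ||A - A_k|| = sigma_{k+1}
```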

    2.4.5 The Condition Number of a Matrix

Suppose $A$ is an $n \times n$ invertible matrix and $x$ is the solution of the system of equations $Ax = b$. We want to see how sensitive $x$ is to perturbations of the matrix $A$. Let $x + \delta x$ be the solution to the perturbed system $(A + \delta A)(x + \delta x) = b$. Expanding the left-hand side of this equation and neglecting the second order perturbations $\delta A\, \delta x$, we get
\[
A\, \delta x + \delta A\, x = 0 \quad \text{or} \quad \delta x = -A^{-1} \delta A\, x. \tag{2.138}
\]
It follows from equation (2.138) that
\[
\|\delta x\| \le \|A^{-1}\|\,\|\delta A\|\,\|x\|
\]
or
\[
\frac{\|\delta x\|/\|x\|}{\|\delta A\|/\|A\|} \le \|A^{-1}\|\,\|A\|. \tag{2.139}
\]
The quantity $\|A^{-1}\|\,\|A\|$ is called the condition number of $A$ and is denoted by $\kappa(A)$, i.e.,
\[
\kappa(A) = \|A^{-1}\|\,\|A\|.
\]
Thus, equation (2.139) can be written
\[
\frac{\|\delta x\|/\|x\|}{\|\delta A\|/\|A\|} \le \kappa(A). \tag{2.140}
\]
We have seen previously that $\|A\| = \sigma_1$, the largest singular value. Since $A^{-1}$ has the singular value decomposition $A^{-1} = V S^{-1} U^H$, it follows that $\|A^{-1}\| = 1/\sigma_n$. Therefore, the condition number is given by
\[
\kappa(A) = \frac{\sigma_1}{\sigma_n}. \tag{2.141}
\]
The condition number is sort of an aspect ratio of the hyperellipsoid that $A$ maps the unit sphere into.
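A small sketch (assuming NumPy; the matrix, perturbation size, and the 1% slack allowing for the neglected second-order terms are all illustrative) computes $\kappa(A)$ from the singular values and observes the sensitivity bound (2.140):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 4))
s = np.linalg.svd(A, compute_uv=False)

kappa = s[0] / s[-1]                          # equation (2.141)
print(np.isclose(kappa, np.linalg.cond(A)))   # np.linalg.cond uses the same 2-norm definition

b = rng.standard_normal(4)
x = np.linalg.solve(A, b)
dA = 1e-8 * rng.standard_normal((4, 4))       # small perturbation of A
dx = np.linalg.solve(A + dA, b) - x
ratio = (np.linalg.norm(dx) / np.linalg.norm(x)) / (np.linalg.norm(dA, 2) / np.linalg.norm(A, 2))
print(ratio <= 1.01 * kappa)                  # bound (2.140), up to second-order terms
```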


    2.4.6 Computation of the SVD

The methods for calculating the SVD are all variations of methods used to calculate eigenvalues and eigenvectors of Hermitian matrices. The most natural procedure would be to follow the derivation of the SVD and compute the squares of the singular values and the unitary matrix $V$ by solving the eigenproblem for $A^H A$. The $U$ matrix would then be obtained from $AV$. Unfortunately, this procedure is not very accurate due to the fact that the singular values of $A^H A$ are the squares of the singular values of $A$. As a result the ratio of largest to smallest singular value can be much larger for $A^H A$ than for $A$. There are, however, implicit methods that solve the eigenproblem for $A^H A$ without ever explicitly forming $A^H A$. Most of the SVD algorithms first reduce $A$ to bidiagonal form (all elements zero except the diagonal and first superdiagonal). This can be accomplished using Householder reflections alternately on the left and right as shown in Figure 2.2.

\[
A_1 = U_1^H A = \begin{pmatrix}
x & x & x & x \\
0 & x & x & x \\
0 & x & x & x \\
0 & x & x & x \\
0 & x & x & x
\end{pmatrix}
\qquad
A_2 = A_1 V_1 = \begin{pmatrix}
x & x & 0 & 0 \\
0 & x & x & x \\
0 & x & x & x \\
0 & x & x & x \\
0 & x & x & x
\end{pmatrix}
\]
\[
A_3 = U_2^H A_2 = \begin{pmatrix}
x & x & 0 & 0 \\
0 & x & x & x \\
0 & 0 & x & x \\
0 & 0 & x & x \\
0 & 0 & x & x
\end{pmatrix}
\qquad
A_4 = A_3 V_2 = \begin{pmatrix}
x & x & 0 & 0 \\
0 & x & x & 0 \\
0 & 0 & x & x \\
0 & 0 & x & x \\
0 & 0 & x & x
\end{pmatrix}
\]
\[
A_5 = U_3^H A_4 = \begin{pmatrix}
x & x & 0 & 0 \\
0 & x & x & 0 \\
0 & 0 & x & x \\
0 & 0 & 0 & x \\
0 & 0 & 0 & x
\end{pmatrix}
\qquad
A_6 = U_4^H A_5 = \begin{pmatrix}
x & x & 0 & 0 \\
0 & x & x & 0 \\
0 & 0 & x & x \\
0 & 0 & 0 & x \\
0 & 0 & 0 & 0
\end{pmatrix}
\]

Figure 2.2: Householder reduction of a matrix to bidiagonal form.

Since the Householder reflections applied on the right don't try to zero all the elements to the right of the diagonal, they don't affect the zeros already obtained in the columns. We have seen that, even in the complex case, the Householder matrices can be chosen so that the resulting bidiagonal matrix is real. Notice also that when the number of rows $m$ is greater than the number of columns $n$, the reduction produces zero rows after row $n$. Similarly, when $n > m$, the reduction produces zero columns after column $m$. If we replace the products of the Householder reflections by the unitary matrices $\hat U$ and $\hat V$, the reduction to a bidiagonal $B$ can be written as
\[
B = \hat U^H A \hat V \quad \text{or} \quad A = \hat U B \hat V^H. \tag{2.142}
\]
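A dense sketch of this reduction for a real matrix with $m \ge n$ is given below (assuming NumPy; the helper names and the random test matrix are illustrative, and a production code would store the reflectors implicitly rather than accumulate full $U$ and $V$):

```python
import numpy as np

def householder_bidiagonalize(A):
    """Reduce a real m x n matrix (m >= n) to upper bidiagonal form B = U^T A V."""
    A = A.astype(float).copy()
    m, n = A.shape
    U = np.eye(m)
    V = np.eye(n)

    def reflector(x):
        # Unit Householder vector v so that (I - 2 v v^T) x is a multiple of e1.
        v = x.copy()
        v[0] += np.sign(x[0] if x[0] != 0 else 1.0) * np.linalg.norm(x)
        norm = np.linalg.norm(v)
        return v / norm if norm > 0 else v

    for k in range(n):
        # Left reflection: zero out column k below the diagonal.
        v = reflector(A[k:, k])
        A[k:, :] -= 2.0 * np.outer(v, v @ A[k:, :])
        U[:, k:] -= 2.0 * np.outer(U[:, k:] @ v, v)
        if k < n - 2:
            # Right reflection: zero out row k beyond the first superdiagonal.
            w = reflector(A[k, k + 1:])
            A[:, k + 1:] -= 2.0 * np.outer(A[:, k + 1:] @ w, w)
            V[:, k + 1:] -= 2.0 * np.outer(V[:, k + 1:] @ w, w)
    return U, A, V

A = np.random.rand(5, 4)
U, B, V = householder_bidiagonalize(A)
print(np.allclose(U.T @ A @ V, B))                          # B = U^T A V
print(np.allclose(np.triu(np.tril(B, 1)), B, atol=1e-12))   # B is upper bidiagonal
```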


If $B$ has the SVD $B = \bar U \bar\Sigma \bar V^T$, then $A$ has the SVD
\[
A = \hat U (\bar U \bar\Sigma \bar V^T) \hat V^H = (\hat U \bar U)\, \bar\Sigma\, (\hat V \bar V)^H = U \Sigma V^H,
\]
where $U = \hat U \bar U$ and $V = \hat V \bar V$. Thus, it is sufficient to find the SVD of the real bidiagonal matrix $B$. Moreover, it is not necessary to carry along the zero rows or columns of $B$. For if the square portion $B_1$ of $B$ has the SVD $B_1 = U_1 \Sigma_1 V_1^T$, then
\[
B = \begin{pmatrix} B_1 \\ 0 \end{pmatrix}
= \begin{pmatrix} U_1 \Sigma_1 V_1^T \\ 0 \end{pmatrix}
= \begin{pmatrix} U_1 & 0 \\ 0 & I \end{pmatrix} \begin{pmatrix} \Sigma_1 \\ 0 \end{pmatrix} V_1^T \tag{2.143}
\]
or
\[
B = (B_1,\ 0) = (U_1 \Sigma_1 V_1^T,\ 0) = U_1 (\Sigma_1,\ 0) \begin{pmatrix} V_1 & 0 \\ 0 & I \end{pmatrix}^T. \tag{2.144}
\]
Thus, it is sufficient to consider the computation of the SVD for a real, square, bidiagonal matrix $B$.

In addition to the implicit methods of finding the eigenvalues of $B^T B$, some methods look instead at the symmetric matrix $\begin{pmatrix} 0 & B^T \\ B & 0 \end{pmatrix}$. If the SVD of $B$ is $B = U \Sigma V^T$, then $\begin{pmatrix} 0 & B^T \\ B & 0 \end{pmatrix}$ has the eigen-equation
\[
\begin{pmatrix} 0 & B^T \\ B & 0 \end{pmatrix}
\begin{pmatrix} V & V \\ U & -U \end{pmatrix}
=
\begin{pmatrix} V & V \\ U & -U \end{pmatrix}
\begin{pmatrix} \Sigma & 0 \\ 0 & -\Sigma \end{pmatrix}. \tag{2.145}
\]
In addition, the matrix $\begin{pmatrix} 0 & B^T \\ B & 0 \end{pmatrix}$ can be reduced to a real tridiagonal matrix $T$ by the relation
\[
T = P^T \begin{pmatrix} 0 & B^T \\ B & 0 \end{pmatrix} P \tag{2.146}
\]
where $P = (e_1, e_{n+1}, e_2, e_{n+2}, \ldots, e_n, e_{2n})$ is a permutation matrix formed by a rearrangement of the columns $e_1, e_2, \ldots, e_{2n}$ of the $2n \times 2n$ identity matrix. The matrix $P$ is unitary and is sometimes called the perfect shuffle since its operation on a vector mimics a perfect card shuffle of the components. The algorithms based on this double-size symmetric matrix don't actually form the double-size matrix, but make efficient use of the symmetries involved in this eigenproblem.
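The effect of the perfect shuffle is easy to see on a small example. The following sketch (assuming NumPy; the bidiagonal entries are arbitrary) builds $P$ explicitly and verifies that the permuted doubled matrix is tridiagonal with zero diagonal:

```python
import numpy as np

n = 4
d = np.array([4.0, 3.0, 2.0, 1.0])           # diagonal of B
f = np.array([0.5, 0.4, 0.3])                 # superdiagonal of B
B = np.diag(d) + np.diag(f, 1)                # real upper bidiagonal

C = np.block([[np.zeros((n, n)), B.T],
              [B, np.zeros((n, n))]])         # symmetric doubled matrix

# Perfect shuffle: columns e_1, e_{n+1}, e_2, e_{n+2}, ..., e_n, e_{2n} (0-based below).
perm = np.empty(2 * n, dtype=int)
perm[0::2] = np.arange(n)
perm[1::2] = np.arange(n, 2 * n)
P = np.eye(2 * n)[:, perm]

T = P.T @ C @ P
off = np.triu(T, 2) + np.tril(T, -2)          # entries outside the tridiagonal band
print(np.allclose(off, 0))                    # True: T is tridiagonal
print(np.allclose(np.diag(T), 0))             # with zero diagonal
```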

For those interested in the details of the various SVD algorithms, I would refer you to the book by Demmel [4].

In Matlab the SVD can be obtained by the call [U,S,V]=svd(A). In LAPACK the general driver routines for the SVD are SGESVD, DGESVD, and CGESVD depending on whether the matrix is real single precision, real double precision, or complex single precision (ZGESVD handles the complex double precision case).


    Chapter 3

    Eigenvalue Problems

Eigenvalue problems occur quite often in Physics. For example, in Quantum Mechanics eigenvalues correspond to certain energy states; in structural mechanics problems eigenvalues often

    correspond to resonance frequencies of the structure; and in time evolution problems eigenvalues

    are often related to the stability of the system.

Let $A$ be an $m \times m$ square matrix. A nonzero vector $x$ is an eigenvector of $A$ and $\lambda$ is its corresponding eigenvalue, if
\[
Ax = \lambda x.
\]
The set of vectors
\[
V_\lambda = \{\, x : Ax = \lambda x \,\}
\]
is a subspace called the eigenspace corresponding to $\lambda$. The equation $Ax = \lambda x$ is equivalent to $(A - \lambda I)x = 0$. If $\lambda$ is an eigenvalue, then the matrix $A - \lambda I$ is singular and hence
\[
\det(A - \lambda I) = 0.
\]
Thus, the eigenvalues of $A$ are roots of a polynomial equation of order $m$. This polynomial equation is called the characteristic equation of $A$. Conversely, if $p(z) = a_0 + a_1 z + \cdots + a_{n-1} z^{n-1} + a_n z^n$ is an arbitrary polynomial of degree $n$ ($a_n \ne 0$), then the matrix
\[
\begin{pmatrix}
0 &   &        &        & -a_0/a_n \\
1 & 0 &        &        & -a_1/a_n \\
  & 1 & 0      &        & -a_2/a_n \\
  &   & \ddots & \ddots & \vdots \\
  &   &        & 1      & -a_{n-1}/a_n
\end{pmatrix}
\]
has $p(z) = 0$ as its characteristic equation.
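As a quick numerical illustration (a sketch assuming NumPy; the cubic below with roots 1, 2, 3 is just an example), the eigenvalues of this companion matrix recover the roots of the polynomial:

```python
import numpy as np

# p(z) = a0 + a1 z + ... + an z^n; here p(z) = (z-1)(z-2)(z-3) = z^3 - 6z^2 + 11z - 6
a = np.array([-6.0, 11.0, -6.0, 1.0])

n = len(a) - 1
C = np.zeros((n, n))
C[1:, :-1] = np.eye(n - 1)                   # ones on the subdiagonal
C[:, -1] = -a[:-1] / a[-1]                   # last column: -a_k / a_n

print(np.sort(np.linalg.eigvals(C).real))    # approximately [1. 2. 3.]
```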

In some problems an eigenvalue might correspond to a multiple root of the characteristic equation. The multiplicity of the root $\lambda$ is called its algebraic multiplicity. The dimension of the space


$V_\lambda$ is called its geometric multiplicity. If for some eigenvalue $\lambda$ of $A$ the geometric multiplicity of $\lambda$ does not equal its algebraic multiplicity, this eigenvalue is said to be defective. A matrix with one or more defective eigenvalues is said to be a defective matrix. An example of a defective matrix is the matrix
\[
\begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}.
\]
This matrix has the single eigenvalue 2 with algebraic multiplicity 3. However, the eigenspace corresponding to the eigenvalue 2 has dimension 1. All the eigenvectors are multiples of $e_1$. In these notes we will only consider eigenvalue problems involving Hermitian matrices ($A^H = A$). We will see that all such matrices are nondefective.

If $S$ is a nonsingular $m \times m$ matrix, then the matrix $S^{-1}AS$ is said to be similar to $A$. Since
\[
\det(S^{-1}AS - \lambda I) = \det\bigl(S^{-1}(A - \lambda I)S\bigr) = \det(S^{-1}) \det(A - \lambda I) \det(S) = \det(A - \lambda I),
\]
it follows that $S^{-1}AS$ and $A$ have the same characteristic equation and hence the same eigenvalues. It can be shown that a Hermitian matrix $A$ always has a complete set of orthonormal eigenvectors. If we form the unitary matrix $U$ whose columns are the eigenvectors belonging to this orthonormal set, then
\[
AU = U\Lambda \quad \text{or} \quad U^H A U = \Lambda \tag{3.1}
\]
where $\Lambda$ is a diagonal matrix whose diagonal entries are the eigenvalues. Thus, a Hermitian matrix is similar to a diagonal matrix. Since a diagonal matrix is clearly nondefective, it follows that all Hermitian matrices are nondefective.

If $e$ is a unit eigenvector of the Hermitian matrix $A$ and $\lambda$ is the corresponding eigenvalue, then
\[
Ae = \lambda e \quad \text{and hence} \quad \lambda = e^H A e.
\]
It follows that $\bar\lambda = (e^H A e)^H = e^H A^H e = e^H A e = \lambda$, i.e., the eigenvalues of a Hermitian matrix are real.

    It was shown by Abel, Galois and others in the nineteenth century that there can be no alge-

    braic expression for the roots of a polynomial equation whose order is greater than four. Since

    eigenvalues are roots of the characteristic equation and since the roots of any polynomial are the

    eigenvalues of some matrix, there can be no purely algebraic method for computing eigenvalues.

    Thus, algorithms for finding eigenvalues must at some stage be iterative in nature. The methods

to be discussed here first reduce the Hermitian matrix $A$ to a real, symmetric, tridiagonal matrix $T$ by means of a unitary similarity transformation. The eigenvalues of $T$ are then found using certain iterative procedures. The most common iterative procedures are the QR algorithm and the

    divide-and-conquer algorithm.


and let $\lambda_1,\ldots,\lambda_n$ be the corresponding eigenvalues. We will assume that the eigenvalues and eigenvectors are so ordered that
\[
|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n|.
\]
We will assume further that $|\lambda_1| > |\lambda_2|$. Let $v$ be an arbitrary vector with $\|v\| = 1$. Then there exist constants $c_1,\ldots,c_n$ such that
\[
v = c_1 v_1 + \cdots + c_n v_n. \tag{3.2}
\]
We will make the further assumption that $c_1 \ne 0$. Successively applying $A$ to equation (3.2), we obtain
\[
A^k v = c_1 A^k v_1 + \cdots + c_n A^k v_n = c_1 \lambda_1^k v_1 + \cdots + c_n \lambda_n^k v_n. \tag{3.3}
\]
You can see from equation (3.3) that the term $c_1 \lambda_1^k v_1$ will eventually dominate and thus $A^k v$, if properly scaled at each step to prevent overflow, will approach a multiple of the eigenvector $v_1$. This convergence can be slow if there are other eigenvalues close in magnitude to $\lambda_1$. The condition $c_1 \ne 0$ is equivalent to the condition
\[
\langle v \rangle \cap \langle v_2,\ldots,v_n \rangle = \{0\}.
\]
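A minimal power iteration sketch follows (assuming NumPy; the function name, iteration count, and symmetric test matrix are illustrative). The vector is normalized at every step to prevent overflow, as described above:

```python
import numpy as np

def power_iteration(A, num_iters=200, seed=0):
    """Return an approximate dominant eigenpair of the symmetric matrix A."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)                 # random start, so c_1 != 0 almost surely
    for _ in range(num_iters):
        w = A @ v
        v = w / np.linalg.norm(w)          # rescale to prevent overflow
    return v @ A @ v, v                    # eigenvalue estimate and eigenvector estimate

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
lam, v = power_iteration(A)
print(lam, np.linalg.norm(A @ v - lam * v))   # dominant eigenvalue, small residual
```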

    3.3 The Rayleigh Quotient

The Rayleigh quotient of a vector $x$ is the real number
\[
r(x) = \frac{x^T A x}{x^T x}.
\]
If $x$ is an eigenvector of $A$ corresponding to the eigenvalue $\lambda$, then $r(x) = \lambda$. If $x$ is any nonzero vector, then
\begin{align*}
\|Ax - \lambda x\|^2 &= (x^T A^T - \lambda x^T)(Ax - \lambda x) \\
&= x^T A^T A x - 2\lambda\, x^T A x + \lambda^2 x^T x \\
&= x^T A^T A x - 2\lambda\, r(x)\, x^T x + \lambda^2 x^T x + r^2(x)\, x^T x - r^2(x)\, x^T x \\
&= x^T A^T A x + x^T x\, \bigl(\lambda - r(x)\bigr)^2 - r^2(x)\, x^T x.
\end{align*}
Thus, $\lambda = r(x)$ minimizes $\|Ax - \lambda x\|$. If $x$ is an approximate eigenvector, then $r(x)$ is an approximate eigenvalue.

    3.4 Inverse Iteration with Shifts

For any $\mu$ that is not an eigenvalue of $A$, the matrix $(A - \mu I)^{-1}$ has the same eigenvectors as $A$ and has eigenvalues $(\lambda_j - \mu)^{-1}$ where $\{\lambda_j\}$ are the eigenvalues of $A$. Suppose $\mu$ is close to the eigenvalue $\lambda_i$. Then $(\lambda_i - \mu)^{-1}$ will be large compared to $(\lambda_j - \mu)^{-1}$ for $j \ne i$. If we apply power iteration to $(A - \mu I)^{-1}$, the process will converge to a multiple of the eigenvector $v_i$ corresponding to $\lambda_i$. This procedure is called inverse iteration with shifts. Although the power method is not used in practice, the inverse power method with shifts is frequently used to compute eigenvectors once an approximate eigenvalue has been obtained.
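A sketch of inverse iteration with a fixed shift is given below (assuming NumPy; the function name, shift value, and test matrix are illustrative, and a practical code would factor $A - \mu I$ once rather than re-solving from scratch each step):

```python
import numpy as np

def inverse_iteration(A, mu, num_iters=20, seed=0):
    """Approximate the eigenvector of A whose eigenvalue is closest to the shift mu."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    M = A - mu * np.eye(n)                  # in practice, factor M once (e.g., LU)
    for _ in range(num_iters):
        w = np.linalg.solve(M, v)           # apply (A - mu I)^{-1}
        v = w / np.linalg.norm(w)
    return v @ A @ v, v

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
lam, v = inverse_iteration(A, mu=1.3)       # shift near the smallest eigenvalue
print(lam, np.linalg.norm(A @ v - lam * v)) # converges to the eigenpair closest to mu
```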

    3.5 Rayleigh Quotient Iteration

The Rayleigh quotient can be used to obtain the shifts at each stage of inverse iteration. The procedure can be summarized as follows.

1. Choose a starting vector $v^{(0)}$ of unit magnitude.

2. Let $\lambda^{(0)} = (v^{(0)})^T A v^{(0)}$ be the corresponding Rayleigh quotient.

3. For $k = 1, 2, \ldots$
   Solve $\bigl(A - \lambda^{(k-1)} I\bigr) w = v^{(k-1)}$ for $w$, i.e., compute $\bigl(A - \lambda^{(k-1)} I\bigr)^{-1} v^{(k-1)}$.
   Normalize $w$ to obtain $v^{(k)} = w/\|w\|$.
   Let $\lambda^{(k)} = (v^{(k)})^T A v^{(k)}$ be the corresponding Rayleigh quotient.

It can be shown that the convergence of Rayleigh quotient iteration is ultimately cubic. Cubic convergence triples the number of significant digits on each iteration.
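A direct transcription of the three steps above (a sketch assuming NumPy; the guard against an exactly singular shifted matrix and the test matrix are illustrative additions, not part of the notes):

```python
import numpy as np

def rayleigh_quotient_iteration(A, num_iters=10, seed=0):
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)                       # step 1: unit starting vector
    lam = v @ A @ v                              # step 2: initial Rayleigh quotient
    for _ in range(num_iters):
        try:
            w = np.linalg.solve(A - lam * np.eye(n), v)   # step 3: shifted solve
        except np.linalg.LinAlgError:            # shift hit an eigenvalue exactly
            break
        v = w / np.linalg.norm(w)
        lam = v @ A @ v
    return lam, v

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
lam, v = rayleigh_quotient_iteration(A)
print(lam, np.linalg.norm(A @ v - lam * v))      # converges in a handful of iterations
```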

    3.6 The Basic QR Method

The QR method was discovered independently by Francis [6] and Kublanovskaya [11] in 1961. It is one of the standard methods for finding eigenvalues. The discussion in this section is based largely on the paper Understanding the QR Algorithm by Watkins [13]. As before, we will assume that the matrix $A$ is real and symmetric. Therefore, there is an orthonormal basis $v_1,\ldots,v_n$ such that $Av_j = \lambda_j v_j$ for each $j$. We will assume that the eigenvalues $\lambda_j$ are ordered so that $|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n|$.

    The QR algorithm can be summarized as follows:


1. Choose $A_0 = A$.

2. For $m = 1, 2, \ldots$
\[
A_{m-1} = Q_m R_m \quad \text{(QR factorization)}
\]
\[
A_m = R_m Q_m
\]

3. Stop when $A_m$ is approximately diagonal.
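A bare-bones version of this loop is shown below (a sketch assuming NumPy; the test matrix and iteration count are illustrative, and practical codes first reduce $A$ to tridiagonal form and use shifts):

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

Am = A.copy()
for m in range(100):
    Q, R = np.linalg.qr(Am)      # A_{m-1} = Q_m R_m
    Am = R @ Q                   # A_m = R_m Q_m

print(np.round(Am, 6))                 # approximately diagonal
print(np.sort(np.diag(Am)))            # diagonal entries approximate the eigenvalues
print(np.sort(np.linalg.eigvalsh(A)))  # reference eigenvalues for comparison
```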

It is probably not obvious what this algorithm has to do with eigenvalues. We will show that the QR method is a way of organizing simultaneous iteration, which in turn is a multivector generalization of the power method.

We can apply the power method to subspaces as well as to single vectors. Suppose $S$ is a $k$-dimensional subspace. We can compute the sequence of subspaces $S, AS, A^2 S, \ldots$. Under certain conditions this sequence will converge to the subspace spanned by the eigenvectors $v_1, v_2, \ldots, v_k$ corresponding to the $k$ largest eigenvalues of $A$. We will not provide a rigorous convergence proof, but we will attempt to make this result seem plausible. Assume that $|\lambda_k| > |\lambda_{k+1}|$ and define the subspaces
\[
T = \langle v_1,\ldots,v_k \rangle \qquad U = \langle v_{k+1},\ldots,v_n \rangle.
\]
We will first show that all the null vectors of $A$ lie in $U$. Suppose $v$ is a null vector of $A$, i.e., $Av = 0$. We can expand $v$ in terms of the basis $v_1,\ldots,v_n$ giving
\[
v = c_1 v_1 + \cdots + c_k v_k + c_{k+1} v_{k+1} + \cdots + c_n v_n.
\]
Thus,
\[
Av = c_1 \lambda_1 v_1 + \cdots + c_k \lambda_k v_k + c_{k+1} \lambda_{k+1} v_{k+1} + \cdots + c_n \lambda_n v_n = 0.
\]
Since the vectors $\{v_j\}$ are linearly independent and $|\lambda_1| \ge \cdots \ge |\lambda_k| > 0$, it follows that $c_1 = c_2 = \cdots = c_k = 0$, i.e., $v$ belongs to the subspace $U$. We will now make the additional assumption $S \cap U = \{0\}$. This assumption is analogous to the assumption $c_1 \ne 0$ in the power method. If $x$ is a nonzero vector in $S$, then we can write
\[
x = \underbrace{c_1 v_1 + c_2 v_2 + \cdots + c_k v_k}_{\text{component in } T}
  + \underbrace{c_{k+1} v_{k+1} + \cdots + c_n v_n}_{\text{component in } U}.
\]
Thus,
\[
A^m x/(\lambda_k)^m = c_1 (\lambda_1/\lambda_k)^m v_1 + \cdots + c_{k-1}(\lambda_{k-1}/\lambda_k)^m v_{k-1} + c_k v_k
+ c_{k+1}(\lambda_{k+1}/\lambda_k)^m v_{k+1} + \cdots + c_n (\lambda_n/\lambda_k)^m v_n.
\]
Since $x$ doesn't belong to $U$, at least one of the coefficients $c_1,\ldots,c_k$ must be nonzero. Notice that the first $k$ terms on the right-hand side do not decrease in absolute value as $m \to \infty$ whereas the remaining terms approach zero. Thus, $A^m x$, if properly scaled, approaches the subspace $T$ as $m \to \infty$. In the limit $A^m S$ must approach a subspace of $T$. Since $S \cap U = \{0\}$, $A$ can have no null vectors in $S$. Thus, $A$ is invertible on $S$. It follows that all of the subspaces $A^m S$ have dimension $k$ and hence the limit cannot be a proper subspace of $T$, i.e., $A^m S \to T$ as $m \to \infty$.

Numerically, we can't iterate on an entire subspace. Therefore, we pick a basis of this subspace and iterate on this basis. Let $q_1^0,\ldots,q_k^0$ be a basis of $S$. Since $A$ is invertible on $S$, $Aq_1^0,\ldots,Aq_k^0$ is a basis of $AS$. Similarly, $A^m q_1^0,\ldots,A^m q_k^0$ is a basis of $A^m S$ for all $m$. Thus, in principle we can iterate on a basis of $S$ to obtain bases for $AS, A^2 S, \ldots$. However, for large $m$ these bases become ill-conditioned since all the vectors tend to point in the direction of the eigenvector corresponding to the eigenvalue of largest absolute value. To avoid this we orthonormalize the basis at each step. Thus, given an orthonormal basis $q_1^m,\ldots,q_k^m$ of $A^m S$, we compute $Aq_1^m,\ldots,Aq_k^m$ and then orthonormalize these vectors (using something like the Gram-Schmidt process) to obtain an orthonormal basis $q_1^{m+1},\ldots,q_k^{m+1}$ of $A^{m+1} S$. This process is called simultaneous iteration. Notice that this process of orthonormalization has the property
\[
\langle Aq_1^m,\ldots,Aq_i^m \rangle = \langle q_1^{m+1},\ldots,q_i^{m+1} \rangle \qquad \text{for } i = 1,\ldots,k.
\]

Let us consider now what happens when we apply simultaneous iteration to the complete set of orthonormal vectors $e_1,\ldots,e_n$ where $e_k$ is the $k$-th column of the identity matrix. Let us define
\[
S_k = \langle e_1,\ldots,e_k \rangle, \qquad T_k = \langle v_1,\ldots,v_k \rangle, \qquad U_k = \langle v_{k+1},\ldots,v_n \rangle
\]
for $k = 1, 2, \ldots, n-1$. We also assume that $S_k \cap U_k = \{0\}$ and $|\lambda_k| > |\lambda_{k+1}| > 0$ for each $1 \le k \le n-1$. It follows from our previous discussion that $A^m S_k \to T_k$ as $m \to \infty$. In terms of bases, the orthonormal vectors $q_1^m,\ldots,q_n^m$ will converge to an orthonormal basis $q_1,\ldots,q_n$ such that $T_k = \langle q_1,\ldots,q_k \rangle$ for each $k = 1,\ldots,n-1$. Each of the subspaces $T_k$ is invariant under $A$, i.e., $AT_k \subset T_k$. We will now look at a property of invariant subspaces. Suppose $T$ is an invariant subspace of $A$. Let $Q = (Q_1, Q_2)$ be an orthogonal matrix such that the columns of $Q_1$ form a basis of $T$. Then
\[
Q^T A Q = \begin{pmatrix} Q_1^T A Q_1 & Q_1^T A Q_2 \\ Q_2^T A Q_1 & Q_2^T A Q_2 \end{pmatrix}
= \begin{pmatrix} Q_1^T A Q_1 & 0 \\ 0 & Q_2^T A Q_2 \end{pmatrix},
\]
i.e., the basis consisting of the columns of $Q$ block diagonalizes $A$. Let $Q$ be the matrix with columns $q_1,\ldots,q_n$. Since each $T_k$ is invariant under $A$, the matrix $Q^T A Q$ has the block diagonal form
\[
Q^T A Q = \begin{pmatrix} A_1 & 0 \\ 0 & A_2 \end{pmatrix} \quad \text{where $A_1$ is $k \times k$}
\]
for each $k = 1,\ldots,n-1$. Therefore, $Q^T A Q$ must be diagonal. The diagonal entries are the eigenvalues of $A$. If we define $A_m = Q_m^T A Q_m$ where $Q_m = (q_1^m,\ldots,q_n^m)$, then $A_m$ will become approximately diagonal for large $m$.

    We can summarize simultaneous iteration as follows:


1. We start with the orthogonal matrix $Q_0 = I$ whose columns form a basis of $n$-space.

2. For $m = 1, 2, \ldots$ we compute
\begin{align*}
Z_m &= A Q_{m-1} && \text{power iteration step} \tag{3.4a}\\
Z_m &= Q_m R_m && \text{orthonormalize columns of } Z_m \tag{3.4b}\\
A_m &= Q_m^T A Q_m && \text{test for diagonal matrix.} \tag{3.4c}
\end{align*}

The QR algorithm is an efficient way to organize these calculations. Equations (3.4a) and (3.4b) can be combined to give
\[
A Q_{m-1} = Q_m R_m. \tag{3.5}
\]
Combining equations (3.4c) and (3.5), we get
\[
A_{m-1} = Q_{m-1}^T A Q_{m-1} = Q_{m-1}^T (Q_m R_m) = (Q_{m-1}^T Q_m) R_m = \hat Q_m R_m \tag{3.6}
\]
where $\hat Q_m = Q_{m-1}^T Q_m$. Equation (3.5) can be rewritten as
\[
Q_m^T A Q_{m-1} = R_m. \tag{3.7}
\]
Combining equations (3.4c) and (3.7), we get
\[
A_m = Q_m^T A Q_m = (Q_m^T A Q_{m-1}) Q_{m-1}^T Q_m = R_m (Q_{m-1}^T Q_m) = R_m \hat Q_m. \tag{3.8}
\]
Equation (3.6) is a QR factorization of $A_{m-1}$. Equation (3.8) shows that $A_m$ has the same Q and R factors but with their order reversed. Thus, the QR algorithm generates the matrices $A_m$ recursively without having to compute $Z_m$ and $Q_m$ at each step. Note that the orthogonal matrices $\hat Q_m$ and $Q_m$ satisfy the relation
\[
\hat Q_1 \hat Q_2 \cdots \hat Q_k = (Q_0^T Q_1)(Q_1^T Q_2) \cdots (Q_{k-1}^T Q_k) = Q_k.
\]

We have now seen that the QR method can be considered as a generalization of the power method. We will see that the QR algorithm is also related to inverse power iteration. In fact we have the following duality result.

Theorem 3. If $A$ is an $n \times n$ symmetric nonsingular matrix and if $S$ and $S^\perp$ are orthogonal complementary subspaces, then $A^m S$ and $A^{-m} S^\perp$ are also orthogonal complements.

Proof. If $x$ and $y$ are $n$-vectors, then
\[
x \cdot y = x^T y = x^T A^T (A^T)^{-1} y = (Ax)^T (A^T)^{-1} y = (Ax)^T A^{-1} y = Ax \cdot A^{-1} y.
\]
Applying this result repeatedly, we obtain
\[
x \cdot y = A^m x \cdot A^{-m} y.
\]
It is clear from this relation that every element in $A^m S$ is orthogonal to every element in $A^{-m} S^\perp$. Let $q_1,\ldots,q_k$ be a basis of $S$ and let $q_{k+1},\ldots,q_n$ be a basis of $S^\perp$. Then $A^m q_1,\ldots,A^m q_k$ is a basis of $A^m S$ and $A^{-m} q_{k+1},\ldots,A^{-m} q_n$ is a basis of $A^{-m} S^\perp$. Suppose there exist scalars $c_1,\ldots,c_n$ such that
\[
c_1 A^m q_1 + \cdots + c_k A^m q_k + c_{k+1} A^{-m} q_{k+1} + \cdots + c_n A^{-m} q_n = 0. \tag{3.9}
\]
Taking the dot product of this relation with $c_1 A^m q_1 + \cdots + c_k A^m q_k$, we obtain
\[
\|c_1 A^m q_1 + \cdots + c_k A^m q_k\|^2 = 0
\]
and hence $c_1 A^m q_1 + \cdots + c_k A^m q_k = 0$. Since $A^m q_1,\ldots,A^m q_k$ are linearly independent, it follows that $c_1 = c_2 = \cdots = c_k = 0$. In a similar manner we obtain $c_{k+1} = \cdots = c_n = 0$. Therefore, $A^m q_1,\ldots,A^m q_k, A^{-m} q_{k+1},\ldots,A^{-m} q_n$ are linearly independent and hence form a basis for $n$-space. Thus, $A^m S$ and $A^{-m} S^\perp$ are orthogonal complements.

It can be seen from this theorem that performing power iteration on subspaces $S_k$ is also performing inverse power iteration on $S_k^\perp$. Since
\[
\langle q_1^m,\ldots,q_k^m \rangle = \langle A^m e_1,\ldots,A^m e_k \rangle,
\]
Theorem 3 implies that
\[
\langle q_{k+1}^m,\ldots,q_n^m \rangle = \langle A^{-m} e_{k+1},\ldots,A^{-m} e_n \rangle.
\]
For $k = n-1$ we have $\langle q_n^m \rangle = \langle A^{-m} e_n \rangle$. Thus, $q_n^m$ is the result at the $m$-th step of applying the inverse power method to $e_n$. It follows that $q_n^m$ should converge to an eigenvector corresponding to the smallest eigenvalue.

