Eigenvalue Problems
Last Time …
Social Network Graphs
Betweenness
Girvan-Newman Algorithm
Graph Laplacian
Spectral Bisection
$\lambda_2$, $w_2$
Today …
A small detour into eigenvalue problems …
Formulation
Standard eigenvalue problem: Given an $n \times n$ matrix $A$, find scalar $\lambda$ and a nonzero vector $x$ such that
$$A x = \lambda x$$
$\lambda$ is an eigenvalue, and $x$ is the corresponding eigenvector
Spectrum = $\lambda(A)$ = set of eigenvalues of $A$
Spectral radius = $\rho(A) = \max\{\, |\lambda| : \lambda \in \lambda(A) \,\}$
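A quick check of these definitions with NumPy (a minimal sketch; the $2 \times 2$ matrix is an arbitrary example, not from the slides):

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

    # Spectrum: all eigenvalues (and eigenvectors) of A
    evals, evecs = np.linalg.eig(A)
    print(evals)                         # eigenvalues, here 3 and 1
    print(np.max(np.abs(evals)))         # spectral radius rho(A) = 3.0

    # Verify A x = lambda x for one eigenpair
    lam, x = evals[0], evecs[:, 0]
    print(np.allclose(A @ x, lam * x))   # True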
Characteristic Polynomial
Equation $A x = \lambda x$ is equivalent to
$$(A - \lambda I)x = 0$$
Eigenvalues of $A$ are roots of the characteristic polynomial
$$\det(A - \lambda I) = 0$$
The characteristic polynomial is a powerful theoretical tool but usually not useful computationally
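A small worked example: for $A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$,
$$\det(A - \lambda I) = (2 - \lambda)^2 - 1 = \lambda^2 - 4\lambda + 3 = (\lambda - 1)(\lambda - 3)$$
so the eigenvalues are $1$ and $3$. For large $n$, however, forming the polynomial and finding its roots is ill-conditioned, which is why this route is avoided in practice.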
Considerations
Properties of eigenvalue problem affecting choice of algorithm
Are all eigenvalues needed, or only a few?
Are only eigenvalues needed, or are corresponding eigenvectors also needed?
Is matrix real or complex?
Is matrix relatively small and dense, or large and sparse?
Does matrix have any special properties, such as symmetry?
Problem Transformations
Shift: If $A x = \lambda x$ and $\sigma$ is any scalar, then $(A - \sigma I)x = (\lambda - \sigma)x$
Inversion: If $A$ is nonsingular and $A x = \lambda x$, then $\lambda \neq 0$ and $A^{-1} x = (1/\lambda)\, x$
Powers: If $A x = \lambda x$, then $A^k x = \lambda^k x$
Polynomial: If $A x = \lambda x$ and $p(t)$ is a polynomial, then $p(A)\, x = p(\lambda)\, x$
Similarity Transforms
$B$ is similar to $A$ if there is a nonsingular matrix $T$ such that
$$B = T^{-1} A T$$
Then,
$$B y = \lambda y \;\Rightarrow\; T^{-1} A T y = \lambda y \;\Rightarrow\; A (T y) = \lambda (T y)$$
Similarity transformations preserve eigenvalues, and eigenvectors are easily recovered: if $y$ is an eigenvector of $B$, then $x = T y$ is an eigenvector of $A$
Diagonal form
Eigenvalues of a diagonal matrix are the diagonal entries, and the eigenvectors are columns of the identity matrix
The diagonal form is highly desirable in simplifying eigenvalue
problems for general matrices by similarity transformations
But not all matrices are diagonalizable by similarity transformations
Triangular form
Any matrix can be transformed into a triangular form by similarity
The eigenvalues are simply the diagonal values
Eigenvectors are not as obvious, but still easy to compute
Power iteration
Simplest method for computing one eigenvalue-eigenvector pair
$$x_k = A x_{k-1}$$
Converges to multiple of eigenvector corresponding to dominant
eigenvalue
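A minimal NumPy sketch of power iteration (with the standard normalization at each step, which the bare recurrence above omits, to avoid overflow; the function name and test matrix are illustrative):

    import numpy as np

    def power_iteration(A, x0, iters=100):
        # Repeatedly apply A; the iterate aligns with the dominant eigenvector
        x = x0 / np.linalg.norm(x0)
        for _ in range(iters):
            y = A @ x
            x = y / np.linalg.norm(y)    # normalize to prevent overflow
        lam = x @ A @ x                  # Rayleigh quotient estimate of lambda_n
        return lam, x

    A = np.array([[2.0, 1.0], [1.0, 2.0]])
    lam, w = power_iteration(A, np.array([1.0, 0.0]))
    print(lam, w)   # ~3.0 and ~[0.707, 0.707]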
We have seen this before while computing PageRank
Proof of convergence?
Convergence of Power iteration
Express the starting vector $x_0$ in terms of the eigenvectors of $A$:
$$x_0 = \sum_{j=1}^{n} \alpha_j v_j$$
Then,
$$x_k = A x_{k-1} = A^2 x_{k-2} = \cdots = A^k x_0 = \sum_{j=1}^{n} \lambda_j^k \alpha_j v_j = \lambda_n^k \left( \alpha_n v_n + \sum_{j=1}^{n-1} \left( \frac{\lambda_j}{\lambda_n} \right)^{\!k} \alpha_j v_j \right)$$
Since $|\lambda_j / \lambda_n| < 1$ for $j < n$, successively higher powers go to zero
Power iteration with shift
Convergence rate of power iteration depends on the ratio $|\lambda_{n-1} / \lambda_n|$
It is possible to choose a shift, $A - \sigma I$, such that
$$\left| \frac{\lambda_{n-1} - \sigma}{\lambda_n - \sigma} \right| < \left| \frac{\lambda_{n-1}}{\lambda_n} \right|$$
so convergence is accelerated
Shift must be added back to result to obtain eigenvalue of original matrix
Inverse iteration
If the smallest eigenvalues are required rather than the largest, we can make use of the fact that the eigenvalues of $A^{-1}$ are the reciprocals of those of $A$, so the smallest eigenvalue of $A$ is the reciprocal of the largest eigenvalue of $A^{-1}$
This leads to the inverse iteration scheme
$$A y_k = x_{k-1}, \qquad x_k = y_k / \|y_k\|_\infty$$
Inverse of $A$ is not computed explicitly, but some factorization of $A$ is used to solve the system at each iteration
Shifted inverse iteration
As before, the shifting strategy, using a scalar $\sigma$, can greatly improve convergence
It is particularly useful for computing the eigenvector corresponding to an approximate eigenvalue
Inverse iteration is also useful for computing the eigenvalue closest to a given value $\beta$: if $\beta$ is used as the shift, then the desired eigenvalue corresponds to the smallest eigenvalue in magnitude of the shifted matrix
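A minimal sketch of shifted inverse iteration (the function name and test values are illustrative): per the slides, the shifted matrix is factored once, here with SciPy's LU routines, and the factorization is reused at every step.

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    def shifted_inverse_iteration(A, sigma, x0, iters=50):
        # Converges to the eigenpair whose eigenvalue is closest to sigma
        n = A.shape[0]
        lu = lu_factor(A - sigma * np.eye(n))   # factor once, reuse below
        x = x0 / np.linalg.norm(x0, np.inf)
        for _ in range(iters):
            y = lu_solve(lu, x)                 # solve (A - sigma I) y = x
            x = y / np.linalg.norm(y, np.inf)
        lam = (x @ A @ x) / (x @ x)             # Rayleigh quotient on the original A
        return lam, x

    A = np.array([[2.0, 1.0], [1.0, 2.0]])
    print(shifted_inverse_iteration(A, sigma=0.8, x0=np.array([1.0, 0.0]))[0])   # ~1.0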
Deflation
Once the dominant eigenvalue and eigenvector, $\lambda_n$ and $w_n$, have been computed, the remaining eigenvalues can be computed using deflation, which effectively removes the known eigenvalue
Let $H$ be any nonsingular matrix such that $H w_n = \alpha e_1$
Then the similarity transformation determined by $H$ transforms $A$ into
$$H A H^{-1} = \begin{bmatrix} \lambda_n & b^T \\ 0 & B \end{bmatrix}$$
Can now work with $B$ to compute the next eigenvalue
Process can be repeated to find additional eigenvalues and eigenvectors
Deflation
Alternate approach: let $u_n$ be any vector such that $u_n^T w_n = \lambda_n$
Then $A - w_n u_n^T$ has eigenvalues $\lambda_{n-1}, \ldots, \lambda_1, 0$
Possible choices for $u_n$:
$u_n = \lambda_n w_n$, if $A$ is symmetric and $w_n$ is normalized so that $\|w_n\|_2 = 1$
$u_n = \lambda_n y_n$, where $y_n$ is the corresponding left eigenvector ($A^T y_n = \lambda_n y_n$), normalized so that $y_n^T w_n = 1$
$u_n = A^T e_k$, if $w_n$ is normalized so that $\|w_n\|_\infty = 1$ and the $k$th component of $w_n$ is 1
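A minimal sketch of the first (symmetric) choice, reusing the power_iteration sketch from earlier: subtracting $\lambda_n w_n w_n^T$ zeroes out the known eigenvalue, and power iteration on the deflated matrix then finds the next one.

    # Symmetric deflation: A - w_n u_n^T = A - lambda_n w_n w_n^T
    lam1, w1 = power_iteration(A, np.array([1.0, 0.0]))
    A_defl = A - lam1 * np.outer(w1, w1)        # spectrum becomes {lambda_{n-1}, ..., lambda_1, 0}
    lam2, w2 = power_iteration(A_defl, np.array([1.0, 0.0]))
    print(lam1, lam2)   # ~3.0 then ~1.0 for the 2x2 example above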
QR Iteration
Iteratively converges to triangular or block-triangular form, yielding all eigenvalues of $A$
Starting with $A_0 = A$, at iteration $k$ compute the QR factorization
$$Q_k R_k = A_{k-1}$$
and form the reverse product
$$A_k = R_k Q_k$$
Product of the orthogonal matrices $Q_k$ converges to the matrix of corresponding eigenvectors
If $A$ is symmetric, then symmetry is preserved by QR iteration, so $A_k$ converges to a matrix that is both symmetric and triangular, hence diagonal
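A minimal sketch of unshifted QR iteration (practical codes add shifts and the preliminary reduction described next, but the bare loop already shows the convergence):

    import numpy as np

    def qr_iteration(A, iters=200):
        Ak = A.copy()
        for _ in range(iters):
            Q, R = np.linalg.qr(Ak)   # factor A_{k-1} = Q_k R_k
            Ak = R @ Q                # reverse product; similar to A_{k-1}
        return Ak

    A = np.array([[2.0, 1.0], [1.0, 2.0]])
    print(np.diag(qr_iteration(A)))   # ~[3. 1.]: eigenvalues on the diagonal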
Preliminary reductions
Efficiency of QR iteration can be enhanced by first transforming the
matrix to be as close to triangular form as possible
Hessenberg matrix is triangular except for one additional nonzero
diagonal immediately adjacent to the main diagonal
Symmetric Hessenberg matrix is tridiagonal
Any matrix can be reduced to Hessenberg form in finite number of
steps using Householder transformations
Work per iteration is reduced from $\mathcal{O}(n^3)$ to $\mathcal{O}(n^2)$ for general matrices and to $\mathcal{O}(n)$ for symmetric matrices
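SciPy exposes this reduction directly; a minimal check that the Hessenberg form is similar to $A$, and tridiagonal when $A$ is symmetric:

    import numpy as np
    from scipy.linalg import hessenberg

    rng = np.random.default_rng(0)
    A = rng.random((5, 5))
    H, Q = hessenberg(A, calc_q=True)    # A = Q H Q^T with H upper Hessenberg
    print(np.allclose(Q @ H @ Q.T, A))   # True: similarity, eigenvalues preserved

    S = A + A.T                          # symmetric input
    T, _ = hessenberg(S, calc_q=True)
    print(np.allclose(T, np.triu(np.tril(T, 1), -1)))   # True: T is tridiagonal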
Krylov subspace methods
Reduces matrix to Hessenberg or tridiagonal form using only matrix-vector products
For an arbitrary starting vector $x_0$, if
$$K_n = \begin{bmatrix} x_0 & A x_0 & \cdots & A^{n-1} x_0 \end{bmatrix}$$
then
$$K_n^{-1} A K_n = C_n$$
where $C_n$ is upper Hessenberg
To obtain a better conditioned basis for $\mathrm{span}(K_n)$, compute the QR factorization
$$Q_n R_n = K_n$$
so that
$$Q_n^H A Q_n = R_n C_n R_n^{-1} \equiv H$$
with $H$ upper Hessenberg
Krylov subspace methods
Equating $k$th columns on each side of the equation $A Q_n = Q_n H$ yields
$$A q_k = h_{1k} q_1 + \cdots + h_{kk} q_k + h_{k+1,k} q_{k+1}$$
relating $q_{k+1}$ to the preceding vectors $q_1, \ldots, q_k$
Premultiplying by $q_j^H$ and using orthonormality,
$$h_{jk} = q_j^H A q_k, \qquad j = 1, \ldots, k$$
These relationships yield the Arnoldi iteration
Arnoldi iteration
$x_0$ = arbitrary nonzero starting vector
$q_1 = x_0 / \|x_0\|_2$
for $k = 1, 2, \ldots$
    $u_k = A q_k$
    for $j = 1$ to $k$
        $h_{jk} = q_j^H u_k$
        $u_k = u_k - h_{jk} q_j$
    $h_{k+1,k} = \|u_k\|_2$
    if $h_{k+1,k} = 0$ then stop
    $q_{k+1} = u_k / h_{k+1,k}$
Arnoldi iteration
If $Q_k = \begin{bmatrix} q_1 & \cdots & q_k \end{bmatrix}$, then
$$H_k = Q_k^H A Q_k$$
is an upper Hessenberg matrix
Eigenvalues of $H_k$, called Ritz values, are approximate eigenvalues of $A$, and Ritz vectors, given by $Q_k y$ where $y$ is an eigenvector of $H_k$, are the corresponding approximate eigenvectors of $A$
Eigenvalues of $H_k$ must be computed by another method, such as QR iteration, but this is an easier problem since $k \ll n$
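A direct NumPy transcription of the loop above, with Ritz values extracted from $H_k$ by a dense eigensolver (a minimal sketch, ignoring restarts; names and test sizes are illustrative):

    import numpy as np

    def arnoldi(A, x0, m):
        n = A.shape[0]
        Q = np.zeros((n, m + 1))
        H = np.zeros((m + 1, m))
        Q[:, 0] = x0 / np.linalg.norm(x0)
        for k in range(m):
            u = A @ Q[:, k]
            for j in range(k + 1):           # orthogonalize against previous q_j
                H[j, k] = Q[:, j] @ u
                u = u - H[j, k] * Q[:, j]
            H[k + 1, k] = np.linalg.norm(u)
            if H[k + 1, k] == 0:             # breakdown: invariant subspace found
                return Q[:, :k + 1], H[:k + 1, :k + 1]
            Q[:, k + 1] = u / H[k + 1, k]
        return Q[:, :m], H[:m, :m]

    rng = np.random.default_rng(0)
    A = rng.random((100, 100))
    Qk, Hk = arnoldi(A, rng.random(100), m=20)
    ritz = np.linalg.eigvals(Hk)             # Ritz values approximate eigenvalues of A
    print(max(abs(ritz)), max(abs(np.linalg.eigvals(A))))   # nearly equal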
Arnoldi iteration
Is fairly expensive in both work and storage because each new vector $q_k$ must be orthogonalized against all previous columns of $Q_k$, and all must be stored for that purpose
Is usually restarted periodically with a carefully chosen starting
vector
Ritz vectors and values produced are often good approximations to eigenvalues and eigenvectors of $A$ after relatively few iterations
Lanczos iteration
Work and storage costs drop dramatically if the matrix is symmetric or Hermitian, since the recurrence has only three terms and $H_k$ is tridiagonal
$q_0 = 0$, $\beta_0 = 0$, and $x_0$ = arbitrary nonzero starting vector
$q_1 = x_0 / \|x_0\|_2$
for $k = 1, 2, \ldots$
    $u_k = A q_k$
    $\alpha_k = q_k^H u_k$
    $u_k = u_k - \beta_{k-1} q_{k-1} - \alpha_k q_k$
    $\beta_k = \|u_k\|_2$
    if $\beta_k = 0$ then stop
    $q_{k+1} = u_k / \beta_k$
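The same loop in NumPy; only the two most recent basis vectors enter the recurrence, so older vectors are discarded (a minimal sketch, which suffices when only Ritz values are needed):

    import numpy as np

    def lanczos(A, x0, m):
        # Returns diagonal alpha and subdiagonal beta of tridiagonal T_m
        alpha, beta = np.zeros(m), np.zeros(m - 1)
        q_prev, b = np.zeros_like(x0), 0.0
        q = x0 / np.linalg.norm(x0)
        for k in range(m):
            u = A @ q
            alpha[k] = q @ u
            u = u - b * q_prev - alpha[k] * q   # three-term recurrence
            b = np.linalg.norm(u)
            if b == 0:                          # breakdown: Ritz values already exact
                return alpha[:k + 1], beta[:k]
            if k < m - 1:
                beta[k] = b
            q_prev, q = q, u / b
        return alpha, beta

    rng = np.random.default_rng(1)
    M = rng.random((200, 200)); A = (M + M.T) / 2   # symmetric test matrix
    a, b = lanczos(A, rng.random(200), m=30)
    T = np.diag(a) + np.diag(b, 1) + np.diag(b, -1)
    print(np.linalg.eigvalsh(T)[-1], np.linalg.eigvalsh(A)[-1])   # nearly equal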
Lanczos iteration
$\alpha_k$ and $\beta_k$ are the diagonal and subdiagonal entries of the symmetric tridiagonal matrix $T_k$
As with Arnoldi, Lanczos does not produce eigenvalues and eigenvectors directly, but only the tridiagonal matrix $T_k$, whose eigenvalues and eigenvectors must be computed by another method to obtain Ritz values and vectors
If $\beta_k = 0$, then the invariant subspace has already been identified, i.e., the Ritz values and vectors are already exact at that point
Lanczos iteration
In principle, if we let Lanczos run until $k = n$, the resulting tridiagonal matrix would be orthogonally similar to $A$
In practice, rounding errors cause loss of orthogonality
Problem can be overcome by reorthogonalizing vectors
In practice, this is usually ignored. The resulting approximations are
still good
Krylov subspace methods
Great advantage of Arnoldi and Lanczos is their ability to produce
good approximations to extreme eigenvalues for $k \ll n$
They only require one matrix-vector product per step and little
auxiliary storage, so are ideally suited for large sparse matrices
If eigenvalues are needed in the middle of the spectrum, say near $\sigma$, then the algorithms can be applied to $(A - \sigma I)^{-1}$, assuming it is practical to solve systems of the form $(A - \sigma I)x = y$
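SciPy's sparse eigensolvers implement this shift-and-invert strategy; a minimal sketch on a 1-D Laplacian, asking for the eigenvalues nearest an arbitrary interior value $\sigma = 0.5$:

    import scipy.sparse as sp
    from scipy.sparse.linalg import eigsh

    n = 1000
    L = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format='csc')

    # Internally factors (L - sigma I) and runs Lanczos on its inverse
    vals = eigsh(L, k=4, sigma=0.5, which='LM', return_eigenvectors=False)
    print(vals)   # the 4 eigenvalues of L closest to 0.5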