http://www.uta.edu/math/preprint/

Technical Report 2013-03

Convergence of the Block Lanczos Method for Eigenvalue Clusters

Ren-Cang Li Lei-Hong Zhang


Convergence of the Block Lanczos Method for Eigenvalue Clusters

Ren-Cang Li*   Lei-Hong Zhang†

April 26, 2013. Revised January 7, 2014.

Abstract

The Lanczos method is often used to solve a large scale symmetric matrix eigenvalue problem. It is well-known that the single-vector Lanczos method can find only one copy of any multiple eigenvalue and encounters slow convergence towards clustered eigenvalues. On the other hand, the block Lanczos method can compute all or some of the copies of a multiple eigenvalue and, with a suitable block size, can also compute clustered eigenvalues much faster. The existing convergence theory due to Saad for the block Lanczos method, however, does not fully reflect this phenomenon, since the theory was established to bound approximation errors in each individual approximate eigenpair. Here, it is argued that in the presence of an eigenvalue cluster, the entire approximate eigenspace associated with the cluster should be considered as a whole, instead of each individual approximate eigenvector, and likewise for approximating clusters of eigenvalues. In this paper, we obtain error bounds on approximating eigenspaces and eigenvalue clusters. Our bounds are much sharper than the existing ones and expose the true rates of convergence of the block Lanczos method towards eigenvalue clusters. Furthermore, their sharpness is independent of the closeness of eigenvalues within a cluster. Numerical examples are presented to support our claims. Also a possible extension to the generalized eigenvalue problem is outlined.

Key words. Block Lanczos method, Krylov subspace, clustered eigenvalues, eigenspace, rate of convergence, Chebyshev polynomial

AMS subject classifications. 65F15

*Department of Mathematics, University of Texas at Arlington, P.O. Box 19408, Arlington, TX 76019-0408, USA. ([email protected].) Supported in part by NSF grants DMS-1115834 and DMS-1317330, and a Research Gift Grant from Intel Corporation.

†Department of Applied Mathematics, Shanghai University of Finance and Economics, 777 Guoding Road, Shanghai 200433, People's Republic of China. Supported in part by the National Natural Science Foundation of China NSFC-11101257, NSFC-11371102, and the Basic Academic Discipline Program, the 11th five year plan of 211 Project for Shanghai University of Finance and Economics. Part of this work was done while this author was a visiting scholar at the Department of Mathematics, University of Texas at Arlington, from February 2013 to January 2014.


1 Introduction

The Lanczos method [15] is widely used for finding a small number of extreme eigenvalues and their associated eigenvectors of a symmetric matrix (or a Hermitian matrix in the complex case). It requires only matrix-vector products to extract enough information to compute the desired solutions, and thus is very attractive in practice when the matrix is sparse and its size is too large for, e.g., the QR algorithm [8, 20], or when the matrix does not exist explicitly but is available only through matrix-vector multiplications.

Let $A$ be an $N\times N$ Hermitian matrix. Given an initial vector $v_0$, the single-vector Lanczos method begins by recursively computing an orthonormal basis $\{q_1,q_2,\dots,q_n\}$ of the $n$th Krylov subspace of $A$ on $v_0$:

$$\mathcal{K}_n(A,v_0)=\operatorname{span}\{v_0,Av_0,\dots,A^{n-1}v_0\}, \tag{1.1}$$

and at the same time the projection of $A$ onto $\mathcal{K}_n(A,v_0)$: $T_n=Q_n^{\rm H}AQ_n$, where $Q_n=[q_1,q_2,\dots,q_n]$ and usually $n\ll N$. Afterwards some of the eigenpairs $(\lambda,w)$ of $T_n$:

$$T_nw=\lambda w,$$

especially the extreme ones, are used to construct approximate eigenpairs $(\lambda,Q_nw)$ of $A$. The number $\lambda$ is called a Ritz value and $Q_nw$ a Ritz vector. This procedure for computing approximate eigenpairs is not limited to Krylov subspaces but works in general for any given subspace; it is called the Rayleigh-Ritz procedure.
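For concreteness, the Rayleigh-Ritz procedure can be written in a few lines of MATLAB (a minimal sketch of our own, not code from [15] or [20]; the function name and the descending ordering are our choices):

```matlab
% Rayleigh-Ritz procedure: given a Hermitian A and an orthonormal basis
% matrix Q of a trial subspace, return Ritz values and Ritz vectors.
function [ritzvals, ritzvecs] = rayleigh_ritz(A, Q)
  T = Q' * A * Q;                    % projection of A onto R(Q)
  T = (T + T') / 2;                  % symmetrize to guard against roundoff
  [W, Omega] = eig(T);               % eigenpairs of the small matrix T
  [ritzvals, idx] = sort(diag(Omega), 'descend');
  ritzvecs = Q * W(:, idx);          % Ritz vectors Q*w
end
```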

The single-vector Lanczos method has difficulty in computing all copies of a multiple eigenvalue of $A$: in fact, only one copy of the eigenvalue can be found. To compute all copies, one has to use either a block Lanczos method with a block size no smaller than the multiplicity of the eigenvalue, or some variation of the Lanczos method with deflating strategies. The single-vector Lanczos method also has difficulty in handling eigenvalue clusters: convergence to each individual eigenvalue in the cluster is slow, and the closer the eigenvalues in the cluster are, the slower the convergence will be. A block version should perform better. This is indeed true and well-known.

There are a few block versions, e.g., the ones introduced by Golub and Underwood [9], Cullum and Donath [5], and, more recently, by Ye [24] for an adaptive block Lanczos method (see also Cullum and Willoughby [6], Golub and Van Loan [10]). The basic idea is to use an $N\times n_b$ matrix $V_0$, instead of a single vector $v_0$; accordingly an orthonormal basis of the $n$th Krylov subspace of $A$ on $V_0$,

$$\mathcal{K}_n(A,V_0)=\operatorname{span}\{V_0,AV_0,\dots,A^{n-1}V_0\}, \tag{1.2}$$

will be generated, as well as the projection of $A$ onto $\mathcal{K}_n(A,V_0)$. Afterwards the same Rayleigh-Ritz procedure is applied to compute approximate eigenpairs of $A$.

There has been a wealth of development, in both theory and implementation, for the Lanczos methods, mostly for the single-vector version. The most complete reference up to 1998 is Parlett [20]. This paper is concerned with the convergence theory of the block Lanczos method. Related past works include Kaniel [12], Paige [19],


Saad [21], Li [18], as well as the potential-theoretic approach of Kuijlaars [13, 14] who, from a very different perspective, investigated which eigenvalues are found first according to the eigenvalue distribution as $N\to\infty$, and what their associated convergence rates are as $n$ goes to $\infty$ while $n/N$ stays fixed. Results from these papers are all about the convergence of a single eigenvalue and eigenvector, even in the analysis of Saad on the block Lanczos method.

The focus of this paper is, however, on the convergence of a cluster of eigenvalues, including multiple eigenvalues, and their associated eigenspace for the Hermitian eigenvalue problem. Our results distinguish themselves from those of Saad [21] in that they bound errors in approximate eigenpairs belonging to eigenvalue clusters together, rather than separately for each individual eigenpair. The consequence is much sharper bounds, as our later numerical examples will demonstrate.

One of the key steps in analyzing the convergence of Lanczos methods is to pick (sub)optimal polynomials to minimize error bounds. For any eigenpair other than the first one, it is standard practice, as in [21], that each chosen polynomial has a factor to annihilate vector components in all preceding eigenvector directions, resulting in a "bulky" factor in the error bound, in the form of a product involving all previous eigenvalues/Ritz values. The factor can be big, and it is likely an artifact of the analyzing technique. We also propose a new kind of error bounds that do not have such a "bulky" factor, but as a tradeoff require knowledge of the distance from the eigenspace of interest to a Krylov subspace $\mathcal{K}_i$ of lower order.

The rest of this paper is organized as follows. Section 2 collects some necessary results on unitarily invariant norms and canonical angles between subspaces for later use. Section 3 presents the (simplest) block Lanczos method, whose convergence analysis, resulting in error bounds of the eigenspace/eigenvalue-cluster type, is carried out in section 4 for eigenspaces and in section 5 for eigenvalues. In section 6, we perform a brief theoretical comparison between our results and related results derived from those of Saad [21], and point out when Saad's bounds will overestimate the true rate of convergence. Numerical examples are given in section 7 to support our comparison analysis. Section 8 establishes more bounds, based on the knowledge of Krylov subspaces of lower orders. In section 9, we outline a possible extension of our results to the generalized eigenvalue problem $A-\lambda M$ solved by a block Lanczos method. Finally, we present our conclusion in section 10.

Throughout this paper, $A$ is an $N\times N$ Hermitian matrix and has

$$\begin{aligned}
\text{eigenvalues:}&\quad \lambda_1\ge\lambda_2\ge\cdots\ge\lambda_N, \ \text{ and }\ \Lambda=\operatorname{diag}(\lambda_1,\lambda_2,\dots,\lambda_N),\\
\text{orthonormal eigenvectors:}&\quad u_1,u_2,\dots,u_N, \ \text{ and }\ U=[u_1,u_2,\dots,u_N],\\
\text{eigen-decomposition:}&\quad A=U\Lambda U^{\rm H} \ \text{ and }\ U^{\rm H}U=I_N.
\end{aligned} \tag{1.3}$$

$\mathbb{C}^{n\times m}$ is the set of all $n\times m$ complex matrices, $\mathbb{C}^n=\mathbb{C}^{n\times1}$, and $\mathbb{C}=\mathbb{C}^1$. $\mathbb{P}_k$ is the set of polynomials of degree no bigger than $k$. $I_n$ (or simply $I$ if its dimension is clear from the context) is the $n\times n$ identity matrix, and $e_j$ is its $j$th column. The superscript "$\cdot^{\rm H}$" takes the complex conjugate transpose of a matrix or vector. We shall also adopt


MATLAB-like conventions to access the entries of vectors and matrices. Let $i:j$ be the set of integers from $i$ to $j$ inclusive. For a vector $u$ and a matrix $X$, $u_{(j)}$ is $u$'s $j$th entry and $X_{(i,j)}$ is $X$'s $(i,j)$th entry; $X$'s submatrices $X_{(k:\ell,i:j)}$, $X_{(k:\ell,:)}$, and $X_{(:,i:j)}$ consist of the intersections of rows $k$ to $\ell$ with columns $i$ to $j$, rows $k$ to $\ell$, and columns $i$ to $j$, respectively. $\mathcal{R}(X)$ is the column space of $X$, i.e., the subspace spanned by the columns of $X$, and $\operatorname{eig}(X)$ denotes the set of all eigenvalues of a square matrix $X$. For matrices or scalars $X_i$, both $\operatorname{diag}(X_1,\dots,X_k)$ and $X_1\oplus\cdots\oplus X_k$ denote the same block diagonal matrix with $i$th diagonal block $X_i$.

2 Preliminaries

2.1 Unitarily invariant norm

A matrix norm $|||\cdot|||$ is called a unitarily invariant norm on $\mathbb{C}^{m\times n}$ if it is a matrix norm and has the following two properties [1, 22]:

1. $|||X^{\rm H}BY|||=|||B|||$ for all unitary matrices $X$ and $Y$ of apt sizes and $B\in\mathbb{C}^{m\times n}$;

2. $|||B|||=\|B\|_2$, the spectral norm of $B$, if $\operatorname{rank}(B)=1$.

Two commonly used unitarily invariant norms are

$$\text{the spectral norm:}\quad \|B\|_2=\max_j\sigma_j, \qquad \text{the Frobenius norm:}\quad \|B\|_F=\sqrt{\textstyle\sum_j\sigma_j^2},$$

where $\sigma_1,\sigma_2,\dots,\sigma_{\min\{m,n\}}$ are the singular values of $B$. The trace norm

$$|||B|||_{\rm trace}=\sum_j\sigma_j$$

is a unitarily invariant norm, too. In what follows, $|||\cdot|||$ denotes a general unitarily invariant norm.

In this article, for convenience, any $|||\cdot|||$ we use is generic to matrix sizes in the sense that it applies to matrices of all sizes. Examples include the matrix spectral norm $\|\cdot\|_2$, the Frobenius norm $\|\cdot\|_F$, and the trace norm. One important property of unitarily invariant norms is

$$|||XYZ|||\le\|X\|_2\cdot|||Y|||\cdot\|Z\|_2$$

for any matrices $X$, $Y$, and $Z$ of compatible sizes.

Lemma 2.1. Let $H$ and $M$ be two Hermitian matrices, and let $S$ be a matrix of compatible size, as determined by the Sylvester equation $HY-YM=S$. If $\operatorname{eig}(H)\cap\operatorname{eig}(M)=\emptyset$, then the equation has a unique solution $Y$, and moreover

$$|||Y|||\le\frac{c}{\eta}\,|||S|||,$$

where $\eta=\min|\mu-\omega|$ over all $\mu\in\operatorname{eig}(M)$ and $\omega\in\operatorname{eig}(H)$, and the constant $c$ lies between $1$ and $\pi/2$; it is $1$ for the Frobenius norm, or if either $\operatorname{eig}(H)$ is in a closed interval that contains no eigenvalue of $M$ or vice versa.


This lemma, for the Frobenius norm and for the case when either $\operatorname{eig}(H)$ is in a closed interval that contains no eigenvalue of $M$ or vice versa, is essentially in [7] (see also [22]); it is due to [2, 3] for the most general case: $\operatorname{eig}(H)\cap\operatorname{eig}(M)=\emptyset$ and any unitarily invariant norm.
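For readers who wish to experiment, the lemma is easy to check numerically (a sketch under our own setup; MATLAB's sylvester solves $AX+XB=C$, so $-M$ is passed, and the Frobenius norm is used, for which $c=1$):

```matlab
% Numerical check of Lemma 2.1 in the Frobenius norm (c = 1).
n = 6; m = 4;
H = diag(1:n);                          % eig(H) = {1,...,6}
M = diag(10:9+m);                       % eig(M) = {10,...,13}, disjoint
S = randn(n, m);
Y = sylvester(H, -M, S);                % solves H*Y - Y*M = S
eta = min(min(abs((1:n)' - (10:9+m)))); % eta = min |omega - mu| = 4
fprintf('||Y||_F = %.3e <= ||S||_F/eta = %.3e\n', ...
        norm(Y, 'fro'), norm(S, 'fro')/eta);
```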

2.2 Angles between subspaces

Consider two subspaces $\mathcal{X}$ and $\mathcal{Y}$ of $\mathbb{C}^N$ and suppose

$$k:=\dim(\mathcal{X})\le\dim(\mathcal{Y})=:\ell. \tag{2.1}$$

Let $X\in\mathbb{C}^{N\times k}$ and $Y\in\mathbb{C}^{N\times\ell}$ be orthonormal basis matrices of $\mathcal{X}$ and $\mathcal{Y}$, respectively, i.e.,

$$X^{\rm H}X=I_k,\ \mathcal{X}=\mathcal{R}(X) \quad\text{and}\quad Y^{\rm H}Y=I_\ell,\ \mathcal{Y}=\mathcal{R}(Y),$$

and denote by $\sigma_j$ for $1\le j\le k$, in ascending order, i.e., $\sigma_1\le\cdots\le\sigma_k$, the singular values of $Y^{\rm H}X$. The $k$ canonical angles $\theta_j(\mathcal{X},\mathcal{Y})$ from¹ $\mathcal{X}$ to $\mathcal{Y}$ are defined by

$$0\le\theta_j(\mathcal{X},\mathcal{Y}):=\arccos\sigma_j\le\frac{\pi}{2} \quad\text{for } 1\le j\le k. \tag{2.2}$$

They are in descending order, i.e., $\theta_1(\mathcal{X},\mathcal{Y})\ge\cdots\ge\theta_k(\mathcal{X},\mathcal{Y})$. Set

$$\Theta(\mathcal{X},\mathcal{Y})=\operatorname{diag}(\theta_1(\mathcal{X},\mathcal{Y}),\dots,\theta_k(\mathcal{X},\mathcal{Y})). \tag{2.3}$$

It can be seen that the angles so defined are independent of the orthonormal basis matrices $X$ and $Y$, which are not unique. A different way to define these angles is through the orthogonal projections onto $\mathcal{X}$ and $\mathcal{Y}$ [23].

When $k=1$, i.e., $\mathcal{X}$ is a vector, there is only one canonical angle from $\mathcal{X}$ to $\mathcal{Y}$, and so we will simply write $\theta(\mathcal{X},\mathcal{Y})$.

In what follows, we sometimes place a vector or matrix in one or both arguments of $\theta_j(\,\cdot\,,\,\cdot\,)$, $\theta(\,\cdot\,,\,\cdot\,)$, and $\Theta(\,\cdot\,,\,\cdot\,)$, with the understanding that it refers to the subspace spanned by the vector or by the columns of the matrix argument.

¹If $k=\ell$, we may say that these angles are between $\mathcal{X}$ and $\mathcal{Y}$.
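Computationally, the canonical angles can be obtained directly from the SVD in (2.2); the following MATLAB sketch (our own, with hypothetical names) accepts arbitrary basis matrices and orthonormalizes them first:

```matlab
% Canonical angles from R(X0) to R(Y0), returned in descending order.
function theta = canonical_angles(X0, Y0)
  [X, ~] = qr(X0, 0);                % thin QR: orthonormal basis of R(X0)
  [Y, ~] = qr(Y0, 0);                % thin QR: orthonormal basis of R(Y0)
  s = svd(Y' * X);                   % cosines of the canonical angles
  s = min(max(s, 0), 1);             % clip tiny roundoff outside [0,1]
  theta = sort(acos(s), 'descend');  % theta_1 >= ... >= theta_k
end
```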

Proposition 2.1. Let $\mathcal{X}$ and $\mathcal{Y}$ be two subspaces in $\mathbb{C}^N$ satisfying (2.1).

(a) For any $\widehat{\mathcal{Y}}\subseteq\mathcal{Y}$ with $\dim(\widehat{\mathcal{Y}})=\dim(\mathcal{X})=k$, we have $\theta_j(\mathcal{X},\mathcal{Y})\le\theta_j(\mathcal{X},\widehat{\mathcal{Y}})$ for $1\le j\le k$.

(b) There exist an orthonormal basis $\{x_1,x_2,\dots,x_k\}$ for $\mathcal{X}$ and an orthonormal basis $\{y_1,y_2,\dots,y_\ell\}$ for $\mathcal{Y}$ such that

$$\theta_j(\mathcal{X},\mathcal{Y})=\theta(x_j,y_j) \ \text{ for } 1\le j\le k, \quad\text{and}\quad x_i^{\rm H}y_j=0 \ \text{ for } 1\le i\le k,\ k+1\le j\le\ell.$$

In particular, $\Theta(\mathcal{X},\mathcal{Y})=\Theta(\mathcal{X},\widehat{\mathcal{Y}})$, where $\widehat{\mathcal{Y}}=\operatorname{span}\{y_1,y_2,\dots,y_k\}$, and the subspace $\widehat{\mathcal{Y}}$ is unique if $\theta_1(\mathcal{X},\mathcal{Y})<\pi/2$.


Proof. Let $X\in\mathbb{C}^{N\times k}$ and $Y\in\mathbb{C}^{N\times\ell}$ be orthonormal basis matrices of $\mathcal{X}$ and $\mathcal{Y}$, respectively. If $\widehat{\mathcal{Y}}\subseteq\mathcal{Y}$ with $\dim(\widehat{\mathcal{Y}})=\dim(\mathcal{X})=k$, then there is $W\in\mathbb{C}^{\ell\times k}$ with orthonormal columns such that $\widehat Y=YW$ is an orthonormal basis matrix of $\widehat{\mathcal{Y}}$. The $\cos\theta_j(\mathcal{X},\widehat{\mathcal{Y}})$ for $1\le j\le k$ are the singular values of $X^{\rm H}YW$, which are no bigger than the corresponding singular values of $X^{\rm H}Y$; i.e., $\cos\theta_j(\mathcal{X},\widehat{\mathcal{Y}})\le\cos\theta_j(\mathcal{X},\mathcal{Y})$ individually, or equivalently $\theta_j(\mathcal{X},\widehat{\mathcal{Y}})\ge\theta_j(\mathcal{X},\mathcal{Y})$ for $1\le j\le k$. This is item (a).

For item (b), let $X^{\rm H}Y=V\Sigma W^{\rm H}$ be the SVD of $X^{\rm H}Y$, where

$$\Sigma=[\operatorname{diag}(\sigma_1,\sigma_2,\dots,\sigma_k),\ 0_{k\times(\ell-k)}].$$

Then $XV$ and $YW$ are orthonormal basis matrices of $\mathcal{X}$ and $\mathcal{Y}$, respectively, and their columns, denoted by $x_i$ and $y_j$, respectively, satisfy the specified requirements of the proposition. If also $\theta_1(\mathcal{X},\mathcal{Y})<\pi/2$, then all $\sigma_i>0$ and the first $k$ columns of $W$ span $\mathcal{R}(Y^{\rm H}X)$, which is unique; so $\widehat{\mathcal{Y}}$ is unique for each given basis matrix $Y$. It remains to prove that $\widehat{\mathcal{Y}}$ is independent of the choice of $Y$. Let $\widetilde Y$ be another orthonormal basis matrix of $\mathcal{Y}$. Then $\widetilde Y=YZ$ for some $\ell\times\ell$ unitary matrix $Z$. Following the above construction for $\widehat{\mathcal{Y}}$, we will have a new $\widehat{\mathcal{Y}}_{\rm new}=\mathcal{R}(\widetilde Y\widetilde W_{(:,1:k)})$, where $\widetilde W$ is from the SVD $X^{\rm H}\widetilde Y=\widetilde V\Sigma\widetilde W^{\rm H}$. Notice

$$X^{\rm H}\widetilde Y=X^{\rm H}YZ=V\Sigma(W^{\rm H}Z),$$

which is yet another SVD of $X^{\rm H}\widetilde Y$. Thus the columns of $(Z^{\rm H}W)_{(:,1:k)}=Z^{\rm H}W_{(:,1:k)}$ span the column space of $\widetilde Y^{\rm H}X$, which is also spanned by the columns of $\widetilde W_{(:,1:k)}$. Hence $\widetilde W_{(:,1:k)}=Z^{\rm H}W_{(:,1:k)}M$ for some nonsingular matrix $M$, and

$$\widetilde Y\widetilde W_{(:,1:k)}=YW_{(:,1:k)}M,$$

which implies $\widehat{\mathcal{Y}}_{\rm new}=\mathcal{R}(\widetilde Y\widetilde W_{(:,1:k)})=\mathcal{R}(YW_{(:,1:k)})=\widehat{\mathcal{Y}}$, as expected.

Proposition 2.2. Let $\mathcal{X}$ and $\mathcal{Y}$ be two subspaces in $\mathbb{C}^N$ satisfying (2.1), and let $X\in\mathbb{C}^{N\times k}$ be an orthonormal basis matrix of $\mathcal{X}$, i.e., $X^{\rm H}X=I_k$. Then

$$\max_{1\le j\le k}\sin\theta(X_{(:,j)},\mathcal{Y})\ \le\ |||\sin\Theta(\mathcal{X},\mathcal{Y})|||\ \le\ \sum_{j=1}^{k}\sin\theta(X_{(:,j)},\mathcal{Y}), \tag{2.4}$$

$$\max_{1\le j\le k}\sin\theta(X_{(:,j)},\mathcal{Y})\ \le\ \|\sin\Theta(\mathcal{X},\mathcal{Y})\|_F=\sqrt{\sum_{j=1}^{k}\sin^2\theta(X_{(:,j)},\mathcal{Y})}, \tag{2.5}$$

$$|||\sin\Theta(\mathcal{X},\mathcal{Y})|||\ \le\ |||\tan\Theta(\mathcal{X},\mathcal{Y})|||\ \le\ \frac{|||\sin\Theta(\mathcal{X},\mathcal{Y})|||}{\sqrt{1-\sin^2\theta_1(\mathcal{X},\mathcal{Y})}}. \tag{2.6}$$

Proof. Let $Y_\perp\in\mathbb{C}^{N\times(N-\ell)}$ be an orthonormal basis matrix of the orthogonal complement of $\mathcal{Y}$ in $\mathbb{C}^N$. We observe that $\sin\theta_j(\mathcal{X},\mathcal{Y})$ for $1\le j\le k$ are the singular values of $X^{\rm H}Y_\perp$, and thus $|||\sin\Theta(\mathcal{X},\mathcal{Y})|||=|||X^{\rm H}Y_\perp|||$. Observe also that $\sin\theta(X_{(:,j)},\mathcal{Y})=|||X_{(:,j)}^{\rm H}Y_\perp|||=\|X_{(:,j)}^{\rm H}Y_\perp\|_2$. Therefore

$$\max_{1\le j\le k}\big|\big|\big|X_{(:,j)}^{\rm H}Y_\perp\big|\big|\big|\ \le\ |||\sin\Theta(\mathcal{X},\mathcal{Y})|||=|||X^{\rm H}Y_\perp|||\ \le\ \sum_{j=1}^{k}\|X_{(:,j)}^{\rm H}Y_\perp\|_2,$$

$$\max_{1\le j\le k}\|X_{(:,j)}^{\rm H}Y_\perp\|_F\ \le\ \|\sin\Theta(\mathcal{X},\mathcal{Y})\|_F=\|X^{\rm H}Y_\perp\|_F=\sqrt{\sum_{j=1}^{k}\|X_{(:,j)}^{\rm H}Y_\perp\|_F^2}.$$

They yield both (2.4) and (2.5). For (2.6), we notice

$$\sin\theta_j(\mathcal{X},\mathcal{Y})\le\tan\theta_j(\mathcal{X},\mathcal{Y})=\frac{\sin\theta_j(\mathcal{X},\mathcal{Y})}{\cos\theta_j(\mathcal{X},\mathcal{Y})}\le\frac{\sin\theta_j(\mathcal{X},\mathcal{Y})}{\cos\theta_1(\mathcal{X},\mathcal{Y})}$$

for $1\le j\le k$.

Proposition 2.3. Let $\mathcal{X}$ and $\mathcal{Y}$ be two subspaces in $\mathbb{C}^N$ with equal dimension: $\dim(\mathcal{X})=\dim(\mathcal{Y})=k$. Let $X\in\mathbb{C}^{N\times k}$ be an orthonormal basis matrix of $\mathcal{X}$, i.e., $X^{\rm H}X=I_k$, and let $Y$ be a (not necessarily orthonormal) basis matrix of $\mathcal{Y}$ such that each column of $Y$ is a unit vector, i.e., $\|Y_{(:,j)}\|_2=1$ for all $j$. Then

$$\|\sin\Theta(\mathcal{X},\mathcal{Y})\|_F^2\ \le\ \|(Y^{\rm H}Y)^{-1}\|_2\sum_{j=1}^{k}\sin^2\theta(X_{(:,j)},Y_{(:,j)}). \tag{2.7}$$

Proof. Since $\sin^2\theta_j(\mathcal{X},\mathcal{Y})$ for $1\le j\le k$ are the eigenvalues of

$$I_k-\big[X^{\rm H}Y(Y^{\rm H}Y)^{-1/2}\big]^{\rm H}\big[X^{\rm H}Y(Y^{\rm H}Y)^{-1/2}\big]=(Y^{\rm H}Y)^{-1/2}\big[Y^{\rm H}Y-(X^{\rm H}Y)^{\rm H}(X^{\rm H}Y)\big](Y^{\rm H}Y)^{-1/2},$$

we have

$$\begin{aligned}
\|\sin\Theta(\mathcal{X},\mathcal{Y})\|_F^2&=\sum_{j=1}^{k}\sin^2\theta_j(\mathcal{X},\mathcal{Y})\\
&=\Big|\Big|\Big|\,I_k-\big[X^{\rm H}Y(Y^{\rm H}Y)^{-1/2}\big]^{\rm H}\big[X^{\rm H}Y(Y^{\rm H}Y)^{-1/2}\big]\,\Big|\Big|\Big|_{\rm trace}\\
&\le\|(Y^{\rm H}Y)^{-1/2}\|_2\,\Big|\Big|\Big|\,Y^{\rm H}Y-(X^{\rm H}Y)^{\rm H}(X^{\rm H}Y)\,\Big|\Big|\Big|_{\rm trace}\,\|(Y^{\rm H}Y)^{-1/2}\|_2\\
&=\|(Y^{\rm H}Y)^{-1}\|_2\operatorname{trace}\big(Y^{\rm H}Y-(X^{\rm H}Y)^{\rm H}(X^{\rm H}Y)\big)\\
&=\|(Y^{\rm H}Y)^{-1}\|_2\sum_{j=1}^{k}\big[1-Y_{(:,j)}^{\rm H}XX^{\rm H}Y_{(:,j)}\big]\\
&\le\|(Y^{\rm H}Y)^{-1}\|_2\sum_{j=1}^{k}\big[1-Y_{(:,j)}^{\rm H}X_{(:,j)}X_{(:,j)}^{\rm H}Y_{(:,j)}\big] \qquad\qquad (2.8)\\
&=\|(Y^{\rm H}Y)^{-1}\|_2\sum_{j=1}^{k}\sin^2\theta(X_{(:,j)},Y_{(:,j)}),
\end{aligned}$$

as was to be shown. In obtaining (2.8), we have used

$$Y_{(:,j)}^{\rm H}XX^{\rm H}Y_{(:,j)}=\|X^{\rm H}Y_{(:,j)}\|_2^2\ \ge\ Y_{(:,j)}^{\rm H}X_{(:,j)}X_{(:,j)}^{\rm H}Y_{(:,j)}$$

because $X_{(:,j)}^{\rm H}Y_{(:,j)}$ is the $j$th entry of the vector $X^{\rm H}Y_{(:,j)}$.


Remark 2.1. The inequality (2.7) is about controlling the subspace angle $\Theta(\mathcal{X},\mathcal{Y})$ by the individual angles between corresponding basis vectors. These individual angles depend on the selection of the basis vectors as well as their labelling. By Proposition 2.1(b), it is possible to find basis vectors for both $\mathcal{X}$ and $\mathcal{Y}$ and match them perfectly, so that the $\theta_j(\mathcal{X},\mathcal{Y})$ collectively are the same as the individual angles between corresponding basis vectors. But, on the other hand, it is possible that for two close subspaces, in the sense that $\Theta(\mathcal{X},\mathcal{Y})$ is tiny, there are unfortunately chosen and labelled basis vectors that make one or more individual angles between corresponding basis vectors near or even equal to $\pi/2$. In fact, this can happen even when $\mathcal{X}=\mathcal{Y}$. Therefore, in general the collection $\{\theta(X_{(:,j)},Y_{(:,j)}),\ 1\le j\le k\}$ cannot be controlled by $\Theta(\mathcal{X},\mathcal{Y})$ without additional information.

Proposition 2.4. Let $\mathcal{X}$ and $\mathcal{Y}$ be two subspaces in $\mathbb{C}^N$ with equal dimension: $\dim(\mathcal{X})=\dim(\mathcal{Y})=k$. Suppose $\theta_1(\mathcal{X},\mathcal{Y})<\pi/2$.

(a) For any $\widehat{\mathcal{Y}}\subseteq\mathcal{Y}$ of dimension $k_1=\dim(\widehat{\mathcal{Y}})\le k$, there is a unique $\widehat{\mathcal{X}}\subseteq\mathcal{X}$ of dimension $k_1$ such that $P_{\mathcal{Y}}\widehat{\mathcal{X}}=\widehat{\mathcal{Y}}$, where $P_{\mathcal{Y}}$ is the orthogonal projection onto $\mathcal{Y}$. Moreover,

$$\theta_{j+k-k_1}(\mathcal{X},\mathcal{Y})\le\theta_j(\widehat{\mathcal{X}},\widehat{\mathcal{Y}})\le\theta_j(\mathcal{X},\mathcal{Y}) \quad\text{for } 1\le j\le k_1, \tag{2.9}$$

which implies $|||\sin\Theta(\widehat{\mathcal{X}},\widehat{\mathcal{Y}})|||\le|||\sin\Theta(\mathcal{X},\mathcal{Y})|||$.

(b) For any set $\{y_1,y_2,\dots,y_{k_1}\}$ of orthonormal vectors in $\mathcal{Y}$, there is a set $\{x_1,x_2,\dots,x_{k_1}\}$ of linearly independent vectors in $\mathcal{X}$ such that $P_{\mathcal{Y}}x_j=y_j$ for $1\le j\le k_1$. Moreover, (2.9) holds for $\widehat{\mathcal{X}}=\operatorname{span}\{x_1,x_2,\dots,x_{k_1}\}$ and $\widehat{\mathcal{Y}}=\operatorname{span}\{y_1,y_2,\dots,y_{k_1}\}$.

Proof. Let $X\in\mathbb{C}^{N\times k}$ and $Y\in\mathbb{C}^{N\times k}$ be orthonormal basis matrices of $\mathcal{X}$ and $\mathcal{Y}$, respectively. $\widehat{\mathcal{Y}}\subseteq\mathcal{Y}$ can be represented by its orthonormal basis matrix $Y\widehat Y$, where $\widehat Y\in\mathbb{C}^{k\times k_1}$ satisfies $\widehat Y^{\rm H}\widehat Y=I_{k_1}$. We need to find an $\widehat{\mathcal{X}}\subseteq\mathcal{X}$ with the desired property. $\widehat{\mathcal{X}}\subseteq\mathcal{X}$ can be represented by its basis matrix (not necessarily orthonormal) $X\widehat X$, where $\widehat X\in\mathbb{C}^{k\times k_1}$ has full column rank and is to be determined. The equation $P_{\mathcal{Y}}\widehat{\mathcal{X}}=\widehat{\mathcal{Y}}$ is the same as

$$YY^{\rm H}X\widehat X=Y\widehat Y \ \Leftrightarrow\ Y^{\rm H}X\widehat X=\widehat Y \ \Leftrightarrow\ \widehat X=(Y^{\rm H}X)^{-1}\widehat Y, \tag{2.10}$$

because $\theta_1(\mathcal{X},\mathcal{Y})<\pi/2$ implies that $Y^{\rm H}X$ is nonsingular. This proves the existence of $\widehat{\mathcal{X}}=\mathcal{R}(X\widehat X)$. Following the argument, one can also prove that this $\widehat{\mathcal{X}}$ is independent of how the orthonormal basis matrices $X$ and $Y$ are chosen, and thus unique. To prove (2.9), we note that $\widehat\sigma_j:=\cos\theta_j(\widehat{\mathcal{X}},\widehat{\mathcal{Y}})$ for $1\le j\le k_1$ are the singular values of

$$(Y\widehat Y)^{\rm H}(X\widehat X)\big[(X\widehat X)^{\rm H}(X\widehat X)\big]^{-1/2}=\widehat Y^{\rm H}Y^{\rm H}X\widehat X\big[\widehat X^{\rm H}\widehat X\big]^{-1/2}=\Big[\widehat Y^{\rm H}(Y^{\rm H}X)^{-\rm H}(Y^{\rm H}X)^{-1}\widehat Y\Big]^{-1/2}.$$

So the eigenvalues of $\widehat Y^{\rm H}(Y^{\rm H}X)^{-\rm H}(Y^{\rm H}X)^{-1}\widehat Y$ are $\widehat\sigma_j^{-2}$ for $1\le j\le k_1$. On the other hand, $\sigma_j:=\cos\theta_j(\mathcal{X},\mathcal{Y})$ for $1\le j\le k$ are the singular values of $Y^{\rm H}X$. So the eigenvalues of $(Y^{\rm H}X)^{-\rm H}(Y^{\rm H}X)^{-1}$ are $\sigma_j^{-2}$ for $1\le j\le k$. Use the Cauchy interlacing inequalities [20] to conclude that

$$\sigma_j^{-2}\ \ge\ \widehat\sigma_j^{-2}\ \ge\ \sigma_{j+k-k_1}^{-2} \quad\text{for } 1\le j\le k_1,$$

which yields (2.9). This proves item (a).

To prove item (b), we pick the orthonormal basis matrix $Y$ above in such a way that its first $k_1$ columns are $y_1,y_2,\dots,y_{k_1}$. In (2.10), let $\widehat Y=[e_1,e_2,\dots,e_{k_1}]$, i.e., $\widehat{\mathcal{Y}}=\operatorname{span}\{y_1,y_2,\dots,y_{k_1}\}$, and let $\widehat X=(Y^{\rm H}X)^{-1}\widehat Y$. Then $[x_1,x_2,\dots,x_{k_1}]:=X\widehat X$ gives what we need, because of (2.10).

Remark 2.2. The part of Proposition 2.4 on the existence of $\widehat{\mathcal{X}}$ in the case of $k_1=1$ is essentially taken from [21, Lemma 4].

Remark 2.3. The canonical angles are defined under the standard inner product $\langle x,y\rangle=x^{\rm H}y$ in $\mathbb{C}^N$. In a straightforward way, they can be defined under any given $M$-inner product $\langle x,y\rangle_M=x^{\rm H}My$, where $M\in\mathbb{C}^{N\times N}$ is Hermitian and positive definite. We will call these angles the $M$-canonical angles. All results we have proved in this section are valid, in slightly different forms, for the $M$-canonical angles. Details are omitted.

3 Block Lanczos method

Given $V_0\in\mathbb{C}^{N\times n_b}$ with $\operatorname{rank}(V_0)=n_b$, the block Lanczos process [5, 9] of Algorithm 3.1 is the simplest version; it generates an orthonormal basis of the Krylov subspace $\mathcal{K}_n(A,V_0)$ as well as a projection of $A$ onto the Krylov subspace. It is the simplest because we assume that all $Z$ at Lines 4 and 8 of Algorithm 3.1 have full column rank $n_b$ for all $j$. Then $V_j\in\mathbb{C}^{N\times n_b}$, and

$$\mathcal{K}_n:=\mathcal{K}_n(A,V_0)=\mathcal{R}(V_1)\oplus\cdots\oplus\mathcal{R}(V_n), \tag{3.1}$$

the direct sum of $\mathcal{R}(V_j)$ for $j=1,2,\dots,n$. A fundamental relation of the process is

$$AQ_n=Q_nT_n+[0_{N\times(n-1)n_b},\ V_{n+1}B_n], \tag{3.2}$$

where

$$Q_n=[V_1,V_2,\dots,V_n]\in\mathbb{C}^{N\times nn_b}, \tag{3.3a}$$

$$T_n=Q_n^{\rm H}AQ_n=\begin{bmatrix}
A_1 & B_1^{\rm H} & & &\\
B_1 & A_2 & B_2^{\rm H} & &\\
 & \ddots & \ddots & \ddots &\\
 & & B_{n-2} & A_{n-1} & B_{n-1}^{\rm H}\\
 & & & B_{n-1} & A_n
\end{bmatrix}\in\mathbb{C}^{nn_b\times nn_b}. \tag{3.3b}$$

$T_n=Q_n^{\rm H}AQ_n$ is the so-called Rayleigh quotient matrix with respect to $\mathcal{K}_n$, and it is the projection of $A$ onto $\mathcal{K}_n$, too. Let

$$\Pi_n=Q_nQ_n^{\rm H}, \tag{3.4}$$

Algorithm 3.1 Simple Block Lanczos Process

Given Hermitian $A\in\mathbb{C}^{N\times N}$ and $V_0\in\mathbb{C}^{N\times n_b}$ with $\operatorname{rank}(V_0)=n_b$, this generic block Lanczos process performs a partial tridiagonal reduction on $A$.

1: perform orthogonalization on the given $V_0$ to obtain $V_0=V_1B_0$ (e.g., via modified Gram-Schmidt), where $V_1\in\mathbb{C}^{N\times n_b}$ satisfies $V_1^{\rm H}V_1=I_{n_b}$ and $B_0\in\mathbb{C}^{n_b\times n_b}$;
2: $Z=AV_1$, $A_1=V_1^{\rm H}Z$;
3: $Z=Z-V_1A_1$;
4: perform orthogonalization on $Z$ to obtain $Z=V_2B_1$, where $V_2\in\mathbb{C}^{N\times n_b}$ satisfies $V_2^{\rm H}V_2=I_{n_b}$ and $B_1\in\mathbb{C}^{n_b\times n_b}$;
5: for $j=2$ to $n$ do
6:   $Z=AV_j$, $A_j=V_j^{\rm H}Z$;
7:   $Z=Z-V_jA_j-V_{j-1}B_{j-1}^{\rm H}$;
8:   perform orthogonalization on $Z$ to obtain $Z=V_{j+1}B_j$, where $V_{j+1}\in\mathbb{C}^{N\times n_b}$ satisfies $V_{j+1}^{\rm H}V_{j+1}=I_{n_b}$ and $B_j\in\mathbb{C}^{n_b\times n_b}$;
9: end for

The matrix $\Pi_n$ defined in (3.4) is the orthogonal projection onto $\mathcal{K}_n$. In particular, $\Pi_1=Q_1Q_1^{\rm H}=V_1V_1^{\rm H}$ is the orthogonal projection onto $\mathcal{R}(V_0)=\mathcal{R}(V_1)$.

Basically, the block Lanczos method is this block Lanczos process followed by solving the eigenvalue problem of $T_n$ to obtain approximate eigenpairs for $A$: any eigenpair $(\tilde\lambda_j,w_j)$ of $T_n$ gives an approximate eigenpair $(\tilde\lambda_j,Q_nw_j)$ for $A$. The number $\tilde\lambda_j$ is called a Ritz value and $\tilde u_j:=Q_nw_j$ a Ritz vector.

We introduce the following notation for $T_n$, to be used in the next two sections:

$$\begin{aligned}
\text{eigenvalues (also Ritz values):}&\quad \tilde\lambda_1\ge\tilde\lambda_2\ge\cdots\ge\tilde\lambda_{nn_b}, \ \text{ and }\ \Omega=\operatorname{diag}(\tilde\lambda_1,\tilde\lambda_2,\dots,\tilde\lambda_{nn_b}),\\
\text{orthonormal eigenvectors:}&\quad w_1,w_2,\dots,w_{nn_b}, \ \text{ and }\ W=[w_1,w_2,\dots,w_{nn_b}],\\
\text{eigen-decomposition:}&\quad T_nW=W\Omega \ \text{ and }\ W^{\rm H}W=I_{nn_b},\\
\text{Ritz vectors:}&\quad \tilde u_j=Q_nw_j \ \text{ for } 1\le j\le nn_b, \ \text{ and }\ \widetilde U=[\tilde u_1,\tilde u_2,\dots,\tilde u_{nn_b}].
\end{aligned} \tag{3.5}$$

Note that the dependency of $\tilde\lambda_j$, $w_j$, $W$ on $n$ is suppressed for convenience.

As in Saad [21], there is no loss of generality in assuming that all eigenvalues of $A$ are of multiplicity not exceeding $n_b$. In fact, let $P_j$ be the orthogonal projections onto the eigenspaces corresponding to the distinct eigenvalues of $A$. Then

$$\mathbb{U}:=\bigoplus_j\mathcal{R}(P_jV_0)$$

is an invariant subspace of $A$, and $A|_{\mathbb{U}}$, the restriction of $A$ onto $\mathbb{U}$, has the same distinct eigenvalues as $A$, and the multiplicity of any distinct eigenvalue of $A|_{\mathbb{U}}$ is no bigger than $n_b$. Since $\mathcal{K}_n(A,V_0)\subseteq\mathbb{U}$, what the block Lanczos method does is essentially to approximate some of the eigenpairs of $A|_{\mathbb{U}}$.

When $n_b=1$, Algorithm 3.1 reduces to the single-vector Lanczos process.

4 Convergence of eigenspaces

Recall $\Pi_n$ in (3.4), and in particular $\Pi_1=Q_1Q_1^{\rm H}=V_1V_1^{\rm H}$. The matrix $X_{i,k,\ell}$ defined by the following lemma obviously depends on $n_b$ as well. This dependency is suppressed because $n_b$ is reserved throughout this article. For the rest of this and the next section, each of $i$, $k$, and $\ell$ will also be reserved for one assignment only: we are considering the $i$th to $(i+n_b-1)$st eigenpairs of $A$, among which the $k$th to $\ell$th eigenvalues may form a cluster, as in

$$\lambda_1,\ \dots,\ \lambda_i,\ \dots,\ \underbrace{\lambda_k,\ \dots,\ \lambda_\ell}_{\text{(cluster)}},\ \dots,\ \lambda_{i+n_b-1},\ \dots,\ \lambda_N,$$

where

$$1\le i<n,\qquad i\le k\le\ell\le i+n_b-1.$$

Recall (1.3). We are interested in bounding

1. the canonical angles from the invariant subspace $\mathcal{R}(U_{(:,k:\ell)})$ to the Krylov subspace $\mathcal{K}_n\equiv\mathcal{K}_n(A,V_0)$;

2. the canonical angles between the invariant subspace $\mathcal{R}(U_{(:,k:\ell)})$ and $\operatorname{span}\{\tilde u_k,\dots,\tilde u_\ell\}$ (which we call a Ritz subspace);

3. the differences between the eigenvalues $\lambda_j$ and the Ritz values $\tilde\lambda_j$ for $k\le j\le\ell$.

In doing so, we will use the $j$th Chebyshev polynomial of the first kind $\mathscr{T}_j(t)$:

$$\mathscr{T}_j(t)=\cos(j\arccos t) \quad\text{for } |t|\le1, \tag{4.1}$$
$$\phantom{\mathscr{T}_j(t)}=\frac12\Big[\big(t+\sqrt{t^2-1}\big)^j+\big(t+\sqrt{t^2-1}\big)^{-j}\Big] \quad\text{for } t\ge1. \tag{4.2}$$

It frequently shows up in numerical analysis and computations because of its numerous nice properties, for example $|\mathscr{T}_j(t)|\le1$ for $|t|\le1$, while $|\mathscr{T}_j(t)|$ grows extremely fast² for $|t|>1$. We will also need [17]

$$\bigg|\mathscr{T}_j\Big(\frac{1+t}{1-t}\Big)\bigg|=\bigg|\mathscr{T}_j\Big(\frac{t+1}{t-1}\Big)\bigg|=\frac12\big[\Delta_t^{\,j}+\Delta_t^{-j}\big] \quad\text{for } 1\ne t>0, \tag{4.3}$$

where

$$\Delta_t:=\frac{\sqrt t+1}{|\sqrt t-1|} \quad\text{for } t>0. \tag{4.4}$$

²In fact, a result due to Chebyshev himself says that if $p(t)$ is a polynomial of degree no bigger than $j$ and $|p(t)|\le1$ for $-1\le t\le1$, then $|p(t)|\le|\mathscr{T}_j(t)|$ for any $t$ outside $[-1,1]$ [4, p. 65].
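Identity (4.3), which converts a Chebyshev argument of the form $(t+1)/(t-1)$ into the growth factor $\Delta_t$, is easy to verify numerically; here is a small MATLAB check (our own illustration, with names of our choosing):

```matlab
% Check (4.3): |T_j((t+1)/(t-1))| = (Delta_t^j + Delta_t^(-j))/2 for t > 0.
j = 20; t = 4;                          % any t > 0 with t ~= 1
x = (t + 1)/(t - 1);                    % here x > 1, so (4.2) applies
Tj = ((x + sqrt(x^2 - 1))^j + (x + sqrt(x^2 - 1))^(-j))/2;   % (4.2)
Delta = (sqrt(t) + 1)/abs(sqrt(t) - 1);                      % (4.4)
fprintf('T_j = %.6e, (Delta^j + Delta^-j)/2 = %.6e\n', ...
        Tj, (Delta^j + Delta^(-j))/2);  % the two numbers agree
```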


In the rest of this section and the entire next section, we will always assume

$$\operatorname{rank}(V_0^{\rm H}U_{(:,i:i+n_b-1)})=\operatorname{rank}(V_1^{\rm H}U_{(:,i:i+n_b-1)})=n_b, \tag{4.5}$$

and $X_{i,k,\ell}\in\mathbb{C}^{N\times(\ell-k+1)}$ will have the same assignment, to be given in a moment. Consider an application of Proposition 2.4(b) with $k_1=\ell-k+1$,

$$\mathcal{X}=\mathcal{R}(V_0)=\mathcal{R}(V_1),\qquad \mathcal{Y}=\mathcal{R}(U_{(:,i:i+n_b-1)}),\qquad [y_1,y_2,\dots,y_{k_1}]=[u_k,u_{k+1},\dots,u_\ell].$$

The application yields a unique $X_{i,k,\ell}:=[x_1,x_2,\dots,x_{k_1}]$ such that $\mathcal{R}(X_{i,k,\ell})\subseteq\mathcal{R}(V_0)$ and

$$U_{(:,i:i+n_b-1)}U_{(:,i:i+n_b-1)}^{\rm H}X_{i,k,\ell}=U_{(:,k:\ell)}\equiv[u_k,u_{k+1},\dots,u_\ell]. \tag{4.6}$$

Moreover, by Proposition 2.4(a),

$$|||\sin\Theta(U_{(:,k:\ell)},X_{i,k,\ell})|||\le|||\sin\Theta(U_{(:,i:i+n_b-1)},V_0)|||, \tag{4.7}$$
$$|||\tan\Theta(U_{(:,k:\ell)},X_{i,k,\ell})|||\le|||\tan\Theta(U_{(:,i:i+n_b-1)},V_0)|||. \tag{4.8}$$

These show that the chosen $\mathcal{R}(X_{i,k,\ell})$ has a significant component in the eigenspace $\mathcal{R}(U_{(:,k:\ell)})$ of interest if the initial $\mathcal{R}(V_0)$ has a significant component in the eigenspace $\mathcal{R}(U_{(:,i:i+n_b-1)})$. The idea of picking such an $X_{i,k,\ell}$ is essentially borrowed from [21, Lemma 4] (see Remark 2.2).

Theorem 4.1. For any unitarily invariant norm $|||\cdot|||$, we have

$$|||\tan\Theta(U_{(:,k:\ell)},\mathcal{K}_n)|||\ \le\ \frac{\xi_{i,k}}{\mathscr{T}_{n-i}(\kappa_{i,\ell,n_b})}\,|||\tan\Theta(U_{(:,k:\ell)},X_{i,k,\ell})|||, \tag{4.9}$$

$$|||\sin\Theta(U_{(:,k:\ell)},\widetilde U_{(:,k:\ell)})|||\ \le\ \gamma\,|||\sin\Theta(U_{(:,k:\ell)},\mathcal{K}_n)||| \tag{4.10}$$
$$\phantom{|||\sin\Theta(U_{(:,k:\ell)},\widetilde U_{(:,k:\ell)})|||}\ \le\ \frac{\gamma\,\xi_{i,k}}{\mathscr{T}_{n-i}(\kappa_{i,\ell,n_b})}\,|||\tan\Theta(U_{(:,k:\ell)},X_{i,k,\ell})|||, \tag{4.11}$$

where $X_{i,k,\ell}$ is defined by (4.6), and³

$$\xi_{i,k}=\prod_{j=1}^{i-1}\frac{\lambda_j-\lambda_N}{\lambda_j-\lambda_k},\qquad \delta_{i,\ell,n_b}=\frac{\lambda_\ell-\lambda_{i+n_b}}{\lambda_\ell-\lambda_N},\qquad \kappa_{i,\ell,n_b}=\frac{1+\delta_{i,\ell,n_b}}{1-\delta_{i,\ell,n_b}}, \tag{4.12}$$

$$\gamma=1+\frac{c}{\eta}\,\|\Pi_nA(I-\Pi_n)\|_2, \tag{4.13}$$

where the constant $c$ lies between $1$ and $\pi/2$, and it is⁴ $1$ for the Frobenius norm or if $\tilde\lambda_{k-1}>\lambda_k$, and

$$\eta=\min_{\substack{k\le j\le\ell\\ p<k\ \text{or}\ p>\ell}}|\lambda_j-\tilde\lambda_p|. \tag{4.14}$$

For the Frobenius norm, $\gamma$ in (4.13) can be improved to

$$\gamma=\sqrt{1+\Big(\frac1\eta\,\|\Pi_nA(I-\Pi_n)\|_2\Big)^2}. \tag{4.15}$$

³By convention, $\prod_{j=1}^{0}(\cdots)\equiv1$.
⁴A by-product of this is that $c=1$ if $k=\ell$.


Proof. Write

$$U=\begin{bmatrix}U_1 & U_2 & U_3\end{bmatrix},\qquad \Lambda=\begin{bmatrix}\Lambda_1 & & \\ & \Lambda_2 & \\ & & \Lambda_3\end{bmatrix}, \tag{4.16a}$$

where $U_1\in\mathbb{C}^{N\times(i-1)}$, $U_2\in\mathbb{C}^{N\times n_b}$, $U_3\in\mathbb{C}^{N\times(N-n_b-i+1)}$, and $\Lambda_1$, $\Lambda_2$, $\Lambda_3$ are partitioned conformally, and set

$$\widehat U_2=U_{(:,k:\ell)}=(U_2)_{(:,k-i+1:\ell-i+1)}=[u_k,\dots,u_\ell], \tag{4.16b}$$
$$\widehat\Lambda_2=\Lambda_{(k:\ell,k:\ell)}=(\Lambda_2)_{(k-i+1:\ell-i+1,\,k-i+1:\ell-i+1)}=\operatorname{diag}(\lambda_k,\dots,\lambda_\ell). \tag{4.16c}$$

For convenience, let us drop the subscripts of $X_{i,k,\ell}$, because $i$, $k$, $\ell$ do not change. We have

$$X=UU^{\rm H}X=U_1U_1^{\rm H}X+U_2U_2^{\rm H}X+U_3U_3^{\rm H}X=U_1U_1^{\rm H}X+\widehat U_2\widehat U_2^{\rm H}X+U_3U_3^{\rm H}X \tag{4.17}$$

by (4.6). Let $X_0=X(X^{\rm H}X)^{-1/2}$, which has orthonormal columns. We know that

$$\mathcal{R}(f(A)X_0)\subset\mathcal{K}_n \quad\text{for any } f\in\mathbb{P}_{n-1},$$

since $\mathcal{R}(X_0)=\mathcal{R}(X)\subseteq\mathcal{R}(V_0)$. By (4.17),

$$Y:=f(A)X_0=U_1f(\Lambda_1)U_1^{\rm H}X_0+\widehat U_2f(\widehat\Lambda_2)\widehat U_2^{\rm H}X_0+U_3f(\Lambda_3)U_3^{\rm H}X_0. \tag{4.18}$$

By (4.6), $\widehat U_2^{\rm H}X=I_{\ell-k+1}$, and thus $\widehat U_2^{\rm H}X_0$ is nonsingular. Now if also $f(\widehat\Lambda_2)$ is nonsingular (which is true for the $f$ selected later), then

$$Y\big(\widehat U_2^{\rm H}X_0\big)^{-1}\big[f(\widehat\Lambda_2)\big]^{-1}=U_1f(\Lambda_1)U_1^{\rm H}X_0\big(\widehat U_2^{\rm H}X_0\big)^{-1}\big[f(\widehat\Lambda_2)\big]^{-1}+\widehat U_2+U_3f(\Lambda_3)U_3^{\rm H}X_0\big(\widehat U_2^{\rm H}X_0\big)^{-1}\big[f(\widehat\Lambda_2)\big]^{-1}, \tag{4.19}$$

and consequently, by Proposition 2.1,

$$\begin{aligned}
|||\tan\Theta(\widehat U_2,\mathcal{K}_n)|||&\le|||\tan\Theta(\widehat U_2,Y)|||\\
&=\left|\left|\left|\begin{bmatrix} f(\Lambda_1)U_1^{\rm H}X_0\big(\widehat U_2^{\rm H}X_0\big)^{-1}\big[f(\widehat\Lambda_2)\big]^{-1}\\[2pt] f(\Lambda_3)U_3^{\rm H}X_0\big(\widehat U_2^{\rm H}X_0\big)^{-1}\big[f(\widehat\Lambda_2)\big]^{-1}\end{bmatrix}\right|\right|\right|\\
&=\left|\left|\left|\begin{bmatrix} f(\Lambda_1) & \\ & f(\Lambda_3)\end{bmatrix}\begin{bmatrix} U_1^{\rm H}X_0\big(\widehat U_2^{\rm H}X_0\big)^{-1}\\[2pt] U_3^{\rm H}X_0\big(\widehat U_2^{\rm H}X_0\big)^{-1}\end{bmatrix}\big[f(\widehat\Lambda_2)\big]^{-1}\right|\right|\right|\\
&\le\max_{\substack{1\le j\le i-1\ \text{or}\\ i+n_b\le j\le N}}|f(\lambda_j)|\times\max_{k\le j\le\ell}\frac1{|f(\lambda_j)|}\times|||\tan\Theta(\widehat U_2,X_0)|||. \tag{4.20}
\end{aligned}$$

We need to pick an $f\in\mathbb{P}_{n-1}$ to make the right-hand side of (4.20) as small as we can. To this end, for the case $i=1$, we choose

$$f(t)=\mathscr{T}_{n-1}\Big(\frac{2t-(\lambda_{n_b+1}+\lambda_N)}{\lambda_{n_b+1}-\lambda_N}\Big)\Big/\mathscr{T}_{n-1}(\kappa_{1,\ell,n_b}), \tag{4.21}$$

for which

$$\min_{k\le j\le\ell}f(\lambda_j)=f(\lambda_\ell)=1,\qquad \max_{n_b+1\le j\le N}|f(\lambda_j)|\le\frac1{\mathscr{T}_{n-1}(\kappa_{1,\ell,n_b})}. \tag{4.22}$$

This, together with (4.20), concludes the proof of (4.9) for $i=1$.

In general, for $i>1$, we shall consider polynomials of the form

$$f(t)=(\lambda_1-t)\cdots(\lambda_{i-1}-t)\times g(t), \tag{4.23}$$

and search for a $g\in\mathbb{P}_{n-i}$ such that $\max_{i+n_b\le j\le N}|g(\lambda_j)|$ is made as small as we can while $\min_{k\le j\le\ell}|g(\lambda_j)|=g(\lambda_\ell)=1$. To this end, we choose

$$g(t)=\mathscr{T}_{n-i}\Big(\frac{2t-(\lambda_{i+n_b}+\lambda_N)}{\lambda_{i+n_b}-\lambda_N}\Big)\Big/\mathscr{T}_{n-i}(\kappa_{i,\ell,n_b}), \tag{4.24}$$

for which

$$\min_{k\le j\le\ell}g(\lambda_j)=g(\lambda_\ell)=1,\qquad \max_{i+n_b\le j\le N}|g(\lambda_j)|\le\frac1{\mathscr{T}_{n-i}(\kappa_{i,\ell,n_b})}. \tag{4.25}$$

This, together with (4.20) and (4.23), concludes the proof of (4.9) for $i>1$.

Next we prove (4.10), with an argument influenced by [11]. Recall (4.16). Let $Q_\perp\in\mathbb{C}^{N\times(N-nn_b)}$ be such that $[Q_n,Q_\perp]$ is unitary, and write

$$\widehat U_2=Q_nZ+Q_\perp Z_\perp, \tag{4.26}$$

where $Z=Q_n^{\rm H}\widehat U_2$ and $Z_\perp=Q_\perp^{\rm H}\widehat U_2$. Then

$$|||\cos\Theta(\widehat U_2,\mathcal{K}_n)|||=|||Z|||,\qquad |||\sin\Theta(\widehat U_2,\mathcal{K}_n)|||=|||Z_\perp|||. \tag{4.27}$$

Keeping in mind that $A\widehat U_2=\widehat U_2\widehat\Lambda_2$, $Q_n^{\rm H}AQ_n=T_n$, and $T_nW=W\Omega$ from (3.5), we have

$$Q_n^{\rm H}A[Q_n,Q_\perp][Q_n,Q_\perp]^{\rm H}\widehat U_2=Q_n^{\rm H}\widehat U_2\widehat\Lambda_2
\ \Rightarrow\ [T_n,\ Q_n^{\rm H}AQ_\perp]\begin{bmatrix}Z\\ Z_\perp\end{bmatrix}=Z\widehat\Lambda_2
\ \Rightarrow\ T_nZ-Z\widehat\Lambda_2=-Q_n^{\rm H}AQ_\perp Z_\perp. \tag{4.28}$$

Similarly to (4.16), partition $W$ and $\Omega$ as

$$W=\begin{bmatrix}W_1 & W_2 & W_3\end{bmatrix},\qquad \Omega=\begin{bmatrix}\Omega_1 & & \\ & \Omega_2 & \\ & & \Omega_3\end{bmatrix},$$

where $W_1$ has $k-1$ columns, $W_2$ has $\ell-k+1$ columns, $W_3$ has $nn_b-\ell$ columns, and $\Omega_1$, $\Omega_2$, $\Omega_3$ are partitioned conformally; set $W_{1,3}:=[W_1,W_3]$ and $\Omega_{1,3}:=\Omega_1\oplus\Omega_3$. Multiply (4.28) by $W^{\rm H}$ from the left to get $\Omega W^{\rm H}Z-W^{\rm H}Z\widehat\Lambda_2=-W^{\rm H}Q_n^{\rm H}AQ_\perp Z_\perp$, and thus

$$\Omega_{1,3}W_{1,3}^{\rm H}Z-W_{1,3}^{\rm H}Z\widehat\Lambda_2=-W_{1,3}^{\rm H}Q_n^{\rm H}AQ_\perp Z_\perp. \tag{4.29}$$

By Lemma 2.1, we conclude that

$$|||W_{1,3}^{\rm H}Z|||\le\frac{c}{\eta}\,|||W_{1,3}^{\rm H}Q_n^{\rm H}AQ_\perp Z_\perp|||\le\frac{c}{\eta}\,\|\Pi_nA(I-\Pi_n)\|_2\,|||Z_\perp|||. \tag{4.30}$$

Let $\widetilde U_{1,3}=Q_nW_{1,3}$. It can be verified that

$$W_{1,3}^{\rm H}Z=(\widetilde U_{1,3})^{\rm H}(Q_nZ)=(\widetilde U_{1,3})^{\rm H}(\widehat U_2-Q_\perp Z_\perp)=(\widetilde U_{1,3})^{\rm H}\widehat U_2$$

by (4.26). Therefore

$$|||\sin\Theta(\widehat U_2,\widetilde U_{(:,k:\ell)})|||=\big|\big|\big|[\widetilde U_{1,3},Q_\perp]^{\rm H}\widehat U_2\big|\big|\big|
\le\big|\big|\big|(\widetilde U_{1,3})^{\rm H}\widehat U_2\big|\big|\big|+\big|\big|\big|Q_\perp^{\rm H}\widehat U_2\big|\big|\big|
=|||W_{1,3}^{\rm H}Z|||+|||Z_\perp|||, \tag{4.31}$$

which, for the Frobenius norm, can be improved to the identity

$$\|\sin\Theta(\widehat U_2,\widetilde U_{(:,k:\ell)})\|_F=\sqrt{\|W_{1,3}^{\rm H}Z\|_F^2+\|Z_\perp\|_F^2}.$$

The inequality (4.10) is now a consequence of (4.27), (4.30), and (4.31).

Remark 4.1. Although the appearance of the three integers $i$, $k$, and $\ell$ makes Theorem 4.1 awkward and more complicated than simply taking $i=k$ or $i+n_b-1=\ell$, it provides flexibility when it comes to applying (4.9) with balanced $\xi_{i,k}$ (which should be made as small as possible) and $\delta_{i,\ell,n_b}$ (which should be made as large as possible). In fact, for given $k$ and $\ell$, both $\xi_{i,k}$ and $\delta_{i,\ell,n_b}$ increase with $i$. But the right-hand side of (4.9) increases as $\xi_{i,k}$ increases and decreases (rapidly) as $\delta_{i,\ell,n_b}$ increases. So we would like to make $\xi_{i,k}$ as small as we can and $\delta_{i,\ell,n_b}$ as large as we can. In particular, if $k\le\ell\le n_b$, one can always pick $i=1$ so that (4.9) gets used with $\xi_{1,k}=1$; but if $\delta_{1,\ell,n_b}$ is then tiny, (4.9) is better used with some $i>1$. A general guideline is to make sure $\{\lambda_j\}_{j=k}^{\ell}$ is a cluster and the rest of the $\lambda_j$ are relatively far away.

5 Convergence of eigenvalues

In this section, we will bound the differences between the eigenvalues $\lambda_j$ and the Ritz values $\tilde\lambda_j$ for $k\le j\le\ell$.

Theorem 5.1. Let $k=i$. For any unitarily invariant norm, we have

$$|||\operatorname{diag}(\lambda_i-\tilde\lambda_i,\lambda_{i+1}-\tilde\lambda_{i+1},\dots,\lambda_\ell-\tilde\lambda_\ell)|||\ \le\ (\lambda_i-\lambda_N)\Big[\frac{\zeta_i}{\mathscr{T}_{n-i}(\kappa_{i,\ell,n_b})}\Big]^2\,|||\tan^2\Theta(U_{(:,k:\ell)},X_{i,k,\ell})|||, \tag{5.1}$$

where $\kappa_{i,\ell,n_b}$ is the same as the one in (4.12), and

$$\zeta_i=\max_{i+1\le p\le N}\prod_{j=1}^{i-1}\bigg|\frac{\tilde\lambda_j-\lambda_p}{\tilde\lambda_j-\lambda_i}\bigg|. \tag{5.2}$$

In particular, if also $\tilde\lambda_{i-1}\ge\lambda_i$, then

$$\zeta_i=\prod_{j=1}^{i-1}\bigg|\frac{\tilde\lambda_j-\lambda_N}{\tilde\lambda_j-\lambda_i}\bigg|. \tag{5.3}$$

Proof. Upon shifting $A$ by $\lambda_iI$ to $A-\lambda_iI_N$, we may assume $\lambda_i=0$. Doing so doesn't change the Krylov subspace, $\mathcal{K}_n(A,V_0)=\mathcal{K}_n(A-\lambda_iI,V_0)$, and doesn't change any eigenvector or Ritz vector of $A$, but it does shift all eigenvalues and Ritz values of $A$ by the same amount, i.e., $\lambda_i$, so that all differences $\lambda_p-\lambda_j$ and $\lambda_p-\tilde\lambda_j$ remain unchanged. Suppose, therefore, that $\lambda_i=0$, and thus $\lambda_{i-1}\ge\lambda_i=0\ge\lambda_{i+1}$.

Recall (3.5), and adopt the proof of Theorem 4.1 up to (4.19). Take $f$ as

$$f(t)=(\tilde\lambda_1-t)\cdots(\tilde\lambda_{i-1}-t)\times g(t), \tag{5.4}$$

where $g\in\mathbb{P}_{n-i}$. We claim $Y^{\rm H}Q_nw_j=0$ for $1\le j\le i-1$. This is because $Y$ can be represented by $Y=(A-\tilde\lambda_jI)\widehat Y$ for some matrix $\widehat Y\in\mathbb{C}^{N\times(\ell-i+1)}$ with $\mathcal{R}(\widehat Y)\subseteq\mathcal{K}_n$, which further implies $\widehat Y=Q_n\check Y$ for some matrix $\check Y\in\mathbb{C}^{nn_b\times(\ell-i+1)}$. Thus $Y=(A-\tilde\lambda_jI)Q_n\check Y$ and

$$Y^{\rm H}Q_nw_j=\check Y^{\rm H}Q_n^{\rm H}(A-\tilde\lambda_jI)Q_nw_j=\check Y^{\rm H}(T_n-\tilde\lambda_jI)w_j=0.$$

Set

$$Y_0=Y\big(\widehat U_2^{\rm H}X_0\big)^{-1}\big[f(\widehat\Lambda_2)\big]^{-1}=U_1R_1+\widehat U_2+U_3R_3, \tag{5.5}$$

where $R_j=f(\Lambda_j)U_j^{\rm H}X_0\big(\widehat U_2^{\rm H}X_0\big)^{-1}\big[f(\widehat\Lambda_2)\big]^{-1}$.

Let $Z=Y_0(Y_0^{\rm H}Y_0)^{-1/2}$, and denote by $\mu_1\ge\cdots\ge\mu_{\ell-i+1}$ the eigenvalues of $Z^{\rm H}AZ$, which depend on the $f$ in (5.4), still to be determined for the best error bounds. Note that

$$Z^{\rm H}Q_n[w_1,\dots,w_{i-1}]=0,\qquad \mathcal{R}(Z)\subseteq\mathcal{K}_n,\qquad Z^{\rm H}Z=I_{\ell-i+1}. \tag{5.6}$$

Write $Z=Q_n\widehat Z$, possible because $\mathcal{R}(Z)\subseteq\mathcal{K}_n$, where $\widehat Z$ has orthonormal columns. Apply Cauchy's interlacing inequalities to $Z^{\rm H}AZ=\widehat Z^{\rm H}(Q_n^{\rm H}AQ_n)\widehat Z$ and $Q_n^{\rm H}AQ_n$ to get $\tilde\lambda_{i+j-1}\ge\mu_j$ for $1\le j\le\ell-i+1$, and thus

$$0\le\lambda_{i+j-1}-\tilde\lambda_{i+j-1}\le\lambda_{i+j-1}-\mu_j \quad\text{for } 1\le j\le\ell-i+1. \tag{5.7}$$

In particular, this implies $\mu_j\le\tilde\lambda_{i+j-1}\le\lambda_i\le0$, and consequently $Y_0^{\rm H}AY_0$ is negative semidefinite. Therefore, for any nonzero vector $y\in\mathbb{C}^{\ell-i+1}$,

$$0\ge y^{\rm H}Y_0^{\rm H}AY_0y=y^{\rm H}R_1^{\rm H}\Lambda_1R_1y+y^{\rm H}\widehat\Lambda_2y+y^{\rm H}R_3^{\rm H}\Lambda_3R_3y\ \ge\ y^{\rm H}\widehat\Lambda_2y+y^{\rm H}R_3^{\rm H}\Lambda_3R_3y,$$

$$y^{\rm H}Y_0^{\rm H}Y_0y=y^{\rm H}R_1^{\rm H}R_1y+y^{\rm H}y+y^{\rm H}R_3^{\rm H}R_3y\ \ge\ y^{\rm H}y,$$

where we have used $y^{\rm H}R_1^{\rm H}\Lambda_1R_1y\ge0$, $y^{\rm H}R_1^{\rm H}R_1y\ge0$, and $y^{\rm H}R_3^{\rm H}R_3y\ge0$. Therefore

$$0\ \ge\ \frac{y^{\rm H}Y_0^{\rm H}AY_0y}{y^{\rm H}Y_0^{\rm H}Y_0y}\ \ge\ \frac{y^{\rm H}Y_0^{\rm H}AY_0y}{y^{\rm H}y}\ \ge\ \frac{y^{\rm H}(\widehat\Lambda_2+R_3^{\rm H}\Lambda_3R_3)y}{y^{\rm H}y}. \tag{5.8}$$

Denote by $\check\mu_1\ge\cdots\ge\check\mu_{\ell-i+1}$ the eigenvalues of $\widehat\Lambda_2+R_3^{\rm H}\Lambda_3R_3$. By (5.8), we know $\mu_j\ge\check\mu_j$ for $1\le j\le\ell-i+1$, which, together with (5.7), leads to

$$0\le\lambda_{i+j-1}-\tilde\lambda_{i+j-1}\le\lambda_{i+j-1}-\check\mu_j \quad\text{for } 1\le j\le\ell-i+1. \tag{5.9}$$

Hence, for any unitarily invariant norm [16, 22],

$$\begin{aligned}
|||\operatorname{diag}(\lambda_i-\tilde\lambda_i,\lambda_{i+1}-\tilde\lambda_{i+1},\dots,\lambda_\ell-\tilde\lambda_\ell)|||
&\le|||\operatorname{diag}(\lambda_i-\check\mu_1,\lambda_{i+1}-\check\mu_2,\dots,\lambda_\ell-\check\mu_{\ell-i+1})|||\\
&\le|||R_3^{\rm H}\Lambda_3R_3|||\ \le\ (\lambda_i-\lambda_N)\,|||R_3^{\rm H}R_3|||, \tag{5.10}
\end{aligned}$$

where the last inequality is true because 1) $R_3^{\rm H}\Lambda_3R_3$ is negative semidefinite, 2) we shifted $A$ by $\lambda_iI$, and 3) the $j$th largest eigenvalue of $R_3^{\rm H}(\lambda_iI-\Lambda_3)R_3$, which is positive semidefinite, is bounded by the $j$th largest eigenvalue of $(\lambda_i-\lambda_N)R_3^{\rm H}R_3$.

Denote by $\widehat\sigma_j$ (in descending order), for $1\le j\le\ell-i+1$, the singular values of $R_3=f(\Lambda_3)U_3^{\rm H}X_0\big(\widehat U_2^{\rm H}X_0\big)^{-1}\big[f(\widehat\Lambda_2)\big]^{-1}$, and by $\sigma_j$ (in descending order), for $1\le j\le\ell-i+1$, the singular values of $U_3^{\rm H}X_0\big(\widehat U_2^{\rm H}X_0\big)^{-1}$. Then $\sigma_j$ is less than or equal to the $j$th largest singular value of $\begin{bmatrix}U_1^{\rm H}X_0\big(\widehat U_2^{\rm H}X_0\big)^{-1}\\ U_3^{\rm H}X_0\big(\widehat U_2^{\rm H}X_0\big)^{-1}\end{bmatrix}$, which is $\tan\theta_j(\widehat U_2,X)$. We have

$$\widehat\sigma_j\ \le\ \max_{i+n_b\le p\le N}|f(\lambda_p)|\times\max_{i\le p\le\ell}\frac1{|f(\lambda_p)|}\times\sigma_j\ \le\ \max_{i+n_b\le p\le N}|f(\lambda_p)|\times\max_{i\le p\le\ell}\frac1{|f(\lambda_p)|}\times\tan\theta_j(\widehat U_2,X). \tag{5.11}$$

What remains to be done is to pick an $f\in\mathbb{P}_{n-1}$ to make the right-hand side of (5.11) as small as we can.

For the case $i=1$, we choose $f$ as in (4.21), and then (4.22) holds. Combining (5.10) and (5.11) with (4.22) leads to (5.1) for the case $i=1$.

In general, for $i>1$, we take $f(t)$ as in (5.4) with $g(t)$ given by (4.24) satisfying (4.25), together again with (5.10) and (5.11), to conclude the proof.

Remark 5.1. In Theorem 5.1, $k=i$ always, unlike in Theorem 4.1, because we need the first equation in (5.6) for our proof to work.


6 A comparison with the existing results

The existing results related to ours in the previous two sections include those by Kaniel [12], Paige [19], and Saad [21] (see also Parlett [20]). The most complete and strongest ones are in Saad [21].

In comparing ours with Saad's results, the major difference is that ours are of the eigenspace/eigenvalue-cluster type while Saad's results are of the single-vector/eigenvalue type. When specialized to an individual eigenvector/eigenvalue, our results reduce to those of Saad. Specifically, for $k=\ell=i$ the inequality (4.9) becomes Saad [21, Theorem 5], and Theorem 5.1 becomes Saad [21, Theorem 6]. Certain parts of our proofs bear similarities to Saad's proofs for the block case, but there are subtleties in our proofs that cannot be handled in a straightforward way following Saad's proofs.

It is well-known [7] that eigenvectors associated with eigenvalues in a tight cluster are sensitive to perturbations/rounding errors, while the whole invariant subspace associated with the cluster is not. Therefore it is natural to treat the entire invariant subspace as a whole, instead of each individual eigenvector in the invariant subspace separately.

In what follows, we will perform a brief theoretical comparison between our results and those of Saad, and point out when Saad's bounds may be too large. Numerical examples in the next section support this comparison.

As mentioned, Saad's bounds are of the single-vector/eigenvalue type, so a direct comparison cannot be made. But it is possible to derive bounds for eigenspaces/eigenvalue clusters from Saad's bounds, except that these derived bounds are less elegant and (likely) less sharp (as we will demonstrate numerically in section 7). One possible derivation based on [21, Theorem 5] is as follows. By Proposition 2.2,

$$\max_{k\le j\le\ell}\sin\theta(u_j,\mathcal{K}_n)\ \le\ |||\sin\Theta(U_{(:,k:\ell)},\mathcal{K}_n)|||\ \le\ \sum_{j=k}^{\ell}\sin\theta(u_j,\mathcal{K}_n), \tag{6.1}$$

$$\max_{k\le j\le\ell}\sin\theta(u_j,\mathcal{K}_n)\ \le\ \|\sin\Theta(U_{(:,k:\ell)},\mathcal{K}_n)\|_F\ \le\ \sqrt{\sum_{j=k}^{\ell}\sin^2\theta(u_j,\mathcal{K}_n)}. \tag{6.2}$$

These inequalities imply that the largest $\sin\theta(u_j,\mathcal{K}_n)$ is comparable to $\sin\Theta(U_{(:,k:\ell)},\mathcal{K}_n)$, and thus finding good bounds for all the $\sin\theta(u_j,\mathcal{K}_n)$ is comparably equivalent to finding a good bound for $\sin\Theta(U_{(:,k:\ell)},\mathcal{K}_n)$.

The right-most sides of (6.1) and (6.2) can be bounded using [21, Theorem 5] (i.e., the inequality (4.9) for the case $k=\ell=i$). In the notation of Theorem 4.1, we have, for $k\le j\le\ell$,

$$\tan\theta(u_j,\mathcal{K}_n)\ \le\ \frac{\xi_{j,j}}{\mathscr{T}_{n-j}(\kappa_{j,j,n_b})}\tan\theta(u_j,X_{j,j,j}), \tag{6.3}$$

and use $\sin\theta\le\tan\theta$ to get

$$|||\sin\Theta(U_{(:,k:\ell)},\mathcal{K}_n)|||\ \le\ \sum_{j=k}^{\ell}\frac{\xi_{j,j}}{\mathscr{T}_{n-j}(\kappa_{j,j,n_b})}\tan\theta(u_j,X_{j,j,j}), \tag{6.4}$$

$$\|\sin\Theta(U_{(:,k:\ell)},\mathcal{K}_n)\|_F\ \le\ \sqrt{\sum_{j=k}^{\ell}\Big[\frac{\xi_{j,j}}{\mathscr{T}_{n-j}(\kappa_{j,j,n_b})}\tan\theta(u_j,X_{j,j,j})\Big]^2}, \tag{6.5}$$

where the $X_{j,j,j}\in\mathbb{C}^N$ are as defined right before Theorem 4.1, and the $\kappa_{j,j,n_b}$ are as defined in Theorem 4.1. If $\theta_1(U_{(:,k:\ell)},\mathcal{K}_n)$ is not too close to $\pi/2$, which we will assume, the left-hand sides of (4.9), (6.4), and (6.5) are comparable by Proposition 2.2, but there isn't any easy way to compare their right-hand sides. Nevertheless, we argue that the right-hand side of (4.9) is preferable. First, it is much simpler; second, it is potentially much sharper, for two reasons:

1. One or more $\xi_{j,j}$ for $k\le j\le\ell$ in (6.4) and (6.5) may be much bigger than $\xi_{i,k}$ in (4.9).

2. By Proposition 2.3, $\Theta(U_{(:,k:\ell)},[X_{k,k,k},\dots,X_{\ell,\ell,\ell}])$ can be bounded in terms of the angles in $\{\theta(u_j,X_{j,j,j}),\ k\le j\le\ell\}$, but not the other way around; i.e., together the $\{\theta(u_j,X_{j,j,j}),\ k\le j\le\ell\}$ cannot in general be bounded by anything in terms of $\Theta(U_{(:,k:\ell)},[X_{k,k,k},\dots,X_{\ell,\ell,\ell}])$, as we argued in Remark 2.1.

For bounding the errors between eigenvectors $u_j$ and Ritz vectors $\tilde u_j$, the following inequality was established in [21, Theorem 3] (which is also true for the block Lanczos method, as commented there [21, p. 703]):

$$\sin\theta(u_j,\tilde u_j)\ \le\ \chi_j\sin\theta(u_j,\mathcal{K}_n) \quad\text{with}\quad \chi_j=\sqrt{1+\bigg(\frac{\|\Pi_nA(I-\Pi_n)\|_2}{\min_{p\ne j}|\lambda_j-\tilde\lambda_p|}\bigg)^2}. \tag{6.6}$$

This inequality can grossly overestimate $\sin\theta(u_j,\tilde u_j)$, even with the "exact" (i.e., computed) $\sin\theta(u_j,\mathcal{K}_n)$, for the eigenvectors associated with a cluster of eigenvalues, due to the possibly extremely tiny gap $\min_{p\ne j}|\lambda_j-\tilde\lambda_p|$, not to mention after using (6.3) to bound $\sin\theta(u_j,\mathcal{K}_n)$. By Proposition 2.3, we have

$$\|\sin\Theta(U_{(:,k:\ell)},\widetilde U_{(:,k:\ell)})\|_F\ \le\ \sqrt{\sum_{j=k}^{\ell}\sin^2\theta(u_j,\tilde u_j)}\ \le\ \sqrt{\sum_{j=k}^{\ell}\chi_j^2\sin^2\theta(u_j,\mathcal{K}_n)} \tag{6.7}$$

$$\phantom{\|\sin\Theta(U_{(:,k:\ell)},\widetilde U_{(:,k:\ell)})\|_F}\ \le\ \sqrt{\sum_{j=k}^{\ell}\Big[\frac{\chi_j\,\xi_{j,j}}{\mathscr{T}_{n-j}(\kappa_{j,j,n_b})}\tan\theta(u_j,X_{j,j,j})\Big]^2}. \tag{6.8}$$

Unlike before, where we claimed that finding good bounds for all the $\sin\theta(u_j,\mathcal{K}_n)$ and finding a good bound for $\sin\Theta(U_{(:,k:\ell)},\mathcal{K}_n)$ are comparably equivalent because of (6.1) and (6.2), any bound derived from bounds for all the $\sin^2\theta(u_j,\tilde u_j)$ inherently, and likely very much, overestimates $\|\sin\Theta(U_{(:,k:\ell)},\widetilde U_{(:,k:\ell)})\|_F^2$, because there is no simple way to bound $\sum_{j=k}^{\ell}\sin^2\theta(u_j,\tilde u_j)$ in terms of $\|\sin\Theta(U_{(:,k:\ell)},\widetilde U_{(:,k:\ell)})\|_F^2$; i.e., the former may already be much bigger than the latter, as we argued in Remark 2.1. So we anticipate the bounds of (6.7) and (6.8) to be bad when the $\lambda_j$ with $k\le j\le\ell$ form a tight cluster.

Saad [21, Theorem 6] provides a bound on each $\lambda_j-\tilde\lambda_j$ individually. The theorem is the same as our Theorem 5.1 for the case $k=\ell=i$. Concerning the eigenvalue cluster $\{\lambda_k,\dots,\lambda_\ell\}$, Saad's theorem gives

$$\sqrt{\sum_{j=k}^{\ell}(\lambda_j-\tilde\lambda_j)^2}\ \le\ \sqrt{\sum_{j=k}^{\ell}(\lambda_j-\lambda_N)^2\Big[\frac{\zeta_j}{\mathscr{T}_{n-j}(\kappa_{j,j,n_b})}\tan\theta(u_j,X_{j,j,j})\Big]^4}. \tag{6.9}$$

This bound, too, could be much bigger than the one of (5.1), because one or more $\zeta_j$ with $k=i\le j\le\ell$ are much bigger than $\zeta_i$.

7 Numerical examples

In this section, we test numerically the effectiveness of our upper bounds on the convergence of the block Lanczos method in the case of a cluster. In particular, we will demonstrate that our upper bounds are preferable to those of the single-vector/eigenvalue type of Saad [21], especially in the case of a tight cluster. Specifically,

(a) the subspace angle $\Theta(U_{(:,k:\ell)},X_{i,k,\ell})$ used in our bounds is more reliable than the individual angles in $\{\theta(u_j,X_{j,j,j}),\ k\le j\le\ell\}$ together for bounding $\Theta(U_{(:,k:\ell)},\mathcal{K}_n)$ (see the remarks in section 6), and

(b) our upper bounds are much sharper than those derived from Saad's bounds in the presence of tight clusters of eigenvalues.

We point out that the worst individual bound of (6.3) or (6.6), or for $\lambda_j-\tilde\lambda_j$, is of the same magnitude as the derived bound of (6.5) or (6.8) or (6.9), respectively. So if a derived bound is bad, the worst corresponding individual bound cannot be much better. For this reason, we shall focus on comparing our bounds to the derived bounds of section 6.

Our first example below is intended to illustrate point (a), while the second example supports point (b). The third example is essentially taken from [21], again to show the effectiveness of our upper bounds.

We implemented the block Lanczos method in MATLAB, with full reorthogonalization, to simulate the block Lanczos method in exact arithmetic. This is the best one can do in a floating point environment. In our tests, without loss of generality, we take

$$A=\operatorname{diag}(\lambda_1,\dots,\lambda_N)$$

with special eigenvalue distributions to be specified later. Although we stated our theorems in unitarily invariant norms for generality, our numerical results are presented in terms of the Frobenius norm for easy understanding (and thus, for Theorem 4.1, the $\gamma$ of (4.15) is used). No breakdown was encountered during the Lanczos iterations.


Table 7.1: Example 7.1: $N=600$, $n=20$, $i=k=1$, $\ell=3$, $n_b=3$, and $V_0$ as in (7.2)

  quantity                                                        observed          our bound                  "Saad's" bound
  $\sqrt{\sum_{j=k}^{\ell}|\lambda_j-\tilde\lambda_j|^2}$         $1.9\times10^{-14}$   $4.4\times10^{-14}$ by (5.1)   $4.2\times10^{-3}$ by (6.9)
  $\|\sin\Theta(U_{(:,k:\ell)},\widetilde U_{(:,k:\ell)})\|_F$    $3.5\times10^{-8}$    $1.3\times10^{-7}$ by (4.11)   $6.8\times10^{-2}$ by (6.8)
  $\|\sin\Theta(U_{(:,k:\ell)},\mathcal{K}_n)\|_F$                $3.3\times10^{-8}$    $1.0\times10^{-7}$ by (4.9)    $2.5\times10^{-2}$ by (6.5)

We will measure the following errors (in all examples, $k=i=1$ and $\ell=n_b=3$):

$$\epsilon_1=\sqrt{\sum_{j=k}^{\ell}|\lambda_j-\tilde\lambda_j|^2}, \tag{7.1a}$$
$$\epsilon_2=\|\sin\Theta(U_{(:,k:\ell)},\widetilde U_{(:,k:\ell)})\|_F, \tag{7.1b}$$
$$\epsilon_3=\|\sin\Theta(U_{(:,k:\ell)},\mathcal{K}_n)\|_F, \tag{7.1c}$$
$$\epsilon_4=\|\tan\Theta(U_{(:,k:\ell)},\mathcal{K}_n)\|_F, \tag{7.1d}$$

comparing their numerically observed values (considered "exact"), the bounds of Theorems 4.1 and 5.1, and the derived bounds of (6.9), (6.8), and (6.5), which are regarded as "Saad's bounds" for comparison purposes. Rigorously speaking, these are not Saad's bounds.

For very tiny $\theta_1(U_{(:,k:\ell)},\mathcal{K}_n)$, $\epsilon_3\approx\epsilon_4$, since

$$\epsilon_3\le\epsilon_4\le\frac{\epsilon_3}{\sqrt{1-\sin^2\theta_1(U_{(:,k:\ell)},\mathcal{K}_n)}}.$$

Indeed, in the examples that follow, the difference between $\epsilon_3$ and $\epsilon_4$ is so tiny that we can safely ignore it. Therefore we will focus on $\epsilon_1$, $\epsilon_2$, and $\epsilon_3$, but not $\epsilon_4$.
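A minimal driver for reproducing such observed quantities could look as follows (our own sketch, reusing the block_lanczos and canonical_angles sketches from sections 2 and 3; the spectrum is that of Example 7.1 below):

```matlab
% Observed errors (7.1a)-(7.1c) for A = diag(lambda), a sketch.
N = 600; n = 20; nb = 3; k = 1; ell = 3;
lambda = [3.5; 3; 2.5; 1 - 5*(4:N)'/N]; % spectrum of Example 7.1
A = diag(lambda); U = eye(N);           % exact eigenvectors are e_j
V0 = randn(N, nb);
[ritzvals, ritzvecs] = block_lanczos(A, V0, n);
eps1 = norm(lambda(k:ell) - ritzvals(k:ell));                        % (7.1a)
eps2 = norm(sin(canonical_angles(U(:,k:ell), ritzvecs(:,k:ell))));   % (7.1b)
Qn = orth(ritzvecs);                    % R(ritzvecs) = K_n(A, V0)
eps3 = norm(sin(canonical_angles(U(:,k:ell), Qn)));                  % (7.1c)
```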

Example 7.1. We take $N=600$, the number of Lanczos steps $n=20$, and

$$\lambda_1=3.5,\quad \lambda_2=3,\quad \lambda_3=2.5,\quad \lambda_j=1-\frac{5j}{N} \ \text{ for } j=4,\dots,N,$$

and set $i=k=1$, $\ell=3$, and $n_b=3$. There are two eigenvalue clusters: $\{\lambda_1,\lambda_2,\lambda_3\}$ and $\{\lambda_4,\dots,\lambda_N\}$. We seek approximations related to the first cluster $\{\lambda_1,\lambda_2,\lambda_3\}$. The gap $0.5$ between eigenvalues within the first cluster is there to ensure that our investigation of point (a) will not be affected too much by the eigenvalue closeness within the cluster. The effect of that closeness is, however, the subject of Example 7.2.

Our first test run is with a special $V_0$ given by

$$V_0=\begin{bmatrix}
1 & 0 & 0\\
0 & 1 & 0\\
0 & 0 & 1\\
\frac1N & \sin1 & \cos1\\
\vdots & \vdots & \vdots\\
\frac{N-n_b}{N} & \sin(N-n_b) & \cos(N-n_b)
\end{bmatrix}. \tag{7.2}$$


Table 7.1 reports $\epsilon_1$, $\epsilon_2$, and $\epsilon_3$ and their bounds. Our bounds are very sharp – comparable to the observed values – but the ones from (6.9), (6.8), and (6.5) overestimate the true errors too much to be of much use. Looking at (6.5) carefully, we find two contributing factors that make the resulting bound too big. The first contributing factor is the constants

$$\xi_{1,1}=1,\qquad \xi_{2,2}=15,\qquad \xi_{3,3}=105. \tag{7.3}$$

The second contributing factor is the angles $\theta(u_j,X_{j,j,j})$ for $k\le j\le\ell$:

  $j$                                        1        2        3
  $\theta_j(U_{(:,k:\ell)},X_{i,k,\ell})$    1.51299  1.51298  1.49976
  $\theta(u_j,X_{j,j,j})$                    1.49976  1.57066  1.57069

What we see is that the canonical angles $\theta_j(U_{(:,k:\ell)},X_{i,k,\ell})$ are bounded away from $\pi/2$, but the last two of the angles $\theta(u_j,X_{j,j,j})$ are nearly $\pi/2\approx1.57080$. This of course has something to do with the particular initial $V_0$ in (7.2). But given that the exact eigenvectors are $e_1$, $e_2$, $e_3$, this $V_0$ should not be considered a choice deliberately made just to cast our bounds in a favorable light.

Table 7.2: Example 7.1: averages over 20 random $V_0$

  quantity                                                        observed          our bound                  "Saad's" bound
  $\sqrt{\sum_{j=k}^{\ell}|\lambda_j-\tilde\lambda_j|^2}$         $1.2\times10^{-14}$   $2.5\times10^{-13}$ by (5.1)   $8.2\times10^{-8}$ by (6.9)
  $\|\sin\Theta(U_{(:,k:\ell)},\widetilde U_{(:,k:\ell)})\|_F$    $5.9\times10^{-8}$    $2.5\times10^{-7}$ by (4.11)   $2.7\times10^{-4}$ by (6.8)
  $\|\sin\Theta(U_{(:,k:\ell)},\mathcal{K}_n)\|_F$                $5.6\times10^{-8}$    $2.0\times10^{-7}$ by (4.9)    $9.9\times10^{-5}$ by (6.5)

Similar reasons explain why the upper bounds of (6.9) and (6.8) are poor as well. In fact, now the first contributing factor is

$$\zeta_1=1,\qquad \zeta_2=15-1.5\times10^{-13}\approx15,\qquad \zeta_3=105-3.8\times10^{-12}\approx105, \tag{7.4}$$
$$\chi_1\approx\chi_2\approx\chi_3\approx2.6860, \tag{7.5}$$

and then again the set of angles $\theta(u_j,X_{j,j,j})$ for $k\le j\le\ell$ is the second contributing factor.

Next we test random initial $V_0$, as generated by randn(N,nb) in MATLAB. Table 7.2 reports the averages over 20 test runs of the same errors/bounds as reported before. The bounds of (6.9), (6.8), and (6.5) are much better now, but still about 1000 times bigger than ours. The following table displays the canonical angles $\theta_j(U_{(:,k:\ell)},X_{i,k,\ell})$ as well as $\theta(u_j,X_{j,j,j})$:

  $j$                                        1       2       3
  $\theta_j(U_{(:,k:\ell)},X_{i,k,\ell})$    1.5497  1.5183  1.4712
  $\theta(u_j,X_{j,j,j})$                    1.5481  1.5359  1.5281


It shows that the randomness in $V_0$ makes none of the $\theta(u_j,X_{j,j,j})$ for $k\le j\le\ell$ as close to $\pi/2$ as the $V_0$ in (7.2) does. In fact, they are at about the same level as the canonical angles $\theta_j(U_{(:,k:\ell)},X_{i,k,\ell})$. Therefore, the sole contributing factor that makes the bounds of (6.9), (6.8), and (6.5) worse than ours is the $\xi$'s and $\zeta$'s in (7.3) and (7.4).

Table 7.3: Example 7.2: averages over 20 random $V_0$. For each $\delta$, the three column groups give $\epsilon_1$, $\epsilon_2$, and $\epsilon_3$ as: observed value, our bound ((5.1)/(4.11)/(4.9)), and "Saad's" bound ((6.9)/(6.8)/(6.5)).

  $\delta$   $\epsilon_1$ obs.  (5.1)     (6.9)     $\epsilon_2$ obs.  (4.11)    (6.8)     $\epsilon_3$ obs.  (4.9)     (6.5)
  $10^{-1}$  5.6e-15            2.6e-13   2.6e-06   7.3e-08            3.8e-07   8.9e-03   6.6e-08            2.2e-07   6.7e-04
  $10^{-2}$  1.8e-13            5.4e-12   8.9e-03   4.1e-07            1.5e-06   4.9e+00   3.8e-07            3.8e-07   3.8e-02
  $10^{-3}$  6.1e-15            2.5e-14   6.1e+02   3.4e-08            1.2e-07   4.1e+03   3.2e-08            7.2e-08   3.2e+01
  $10^{-4}$  2.3e-14            5.1e-14   1.7e+06   4.7e-08            1.6e-07   6.7e+06   4.3e-08            1.0e-07   5.3e+02
  $10^{-5}$  4.0e-15            3.7e-14   6.2e+10   4.9e-08            1.5e-07   1.3e+10   4.5e-08            9.7e-08   1.0e+05

Example 7.2. Let $N=1000$, $n=25$, and

$$\lambda_1=2+\delta,\quad \lambda_2=2,\quad \lambda_3=2-\delta,\quad \lambda_j=1-\frac{5j}{N} \ \text{ for } j=4,\dots,N,$$

and again set $i=k=1$, $\ell=3$, $n_b=3$. We adjust the parameter $\delta>0$ to control the tightness among the eigenvalues within the cluster $\{\lambda_1,\lambda_2,\lambda_3\}$ and to see how it affects the upper bounds and the convergence rate of the block Lanczos method. We will demonstrate that our bounds, which are of the eigenspace/eigenvalue-cluster type, are insensitive to $\delta$ and barely change as $\delta$ goes to $0$, while "Saad's bounds" are very sensitive and quickly become useless as $\delta$ goes to $0$. We randomly generate initial $V_0$ and investigate the average errors/bounds over 20 test runs. Since the randomness reduces the difference between the contributions of $\Theta(U_{(:,k:\ell)},X_{i,k,\ell})$ and of the $\theta(u_j,X_{j,j,j})$, as we have seen in Example 7.1, the gap $\delta$ within the cluster and the gap between the cluster and the rest of the eigenvalues are the only contributing factors to the approximation errors $\epsilon_i$.

Table 7.3 reports the averages of the errors defined in (7.1) and the averages of their bounds from Theorems 4.1 and 5.1 and of "Saad's bounds" (6.9), (6.8), and (6.5) over 20 test runs. From the table, we observe that

(i) all of our bounds, which are of the eigenspace/eigenvalue-cluster type, are rather sharp – comparable to the observed ("exact") errors – and moreover, they seem to be independent of the parameter $\delta$ as it becomes tinier and tinier;

(ii) as $\delta$ gets tinier and tinier, "Saad's bounds" (6.9), (6.8), and (6.5) increase dramatically and quickly carry no useful information, for $\delta=10^{-3}$ or smaller.

To explain observation (ii), we first note that the $\xi_{j,j}$ in (6.5) are given by

$$\xi_{1,1}=1,\qquad \xi_{2,2}=1+\frac6\delta,\qquad \xi_{3,3}=\frac6\delta+\frac{36}{\delta^2}.$$

[Figure 7.1 consists of two log-log plots against $\delta$ (from $10^{-12}$ to $10^{0}$): the left panel shows the observed $\epsilon_5$, the observed $\epsilon_2$, and our bound on $\epsilon_2$; the right panel shows the observed $\epsilon_5$ and its "Saad's bounds" $\beta_1$ and $\beta_2$.]

Figure 7.1: Observed $\epsilon_5$ deteriorates as $\delta$ goes to $0$ while $\epsilon_2$ seems to remain unchanged in magnitude. "Saad's bounds" on $\epsilon_5$ are not included in the left plot in order not to obscure the radical difference between $\epsilon_2$ and $\epsilon_5$ for tiny $\delta$.

These grow uncontrollably to $\infty$ as $\delta$ goes to $0$. Therefore, the "Saad's bound" of (6.5) is about $O(\delta^{-2})$ for tiny $\delta$. Similarly, since $\xi_{j,j}$ and $\zeta_j$ are almost of the same magnitude, it can be seen from (6.9) that the "Saad's bound" for $\epsilon_1$ is about $O(\delta^{-4})$. For (6.8), when $n$ is moderate, $\chi_j$ is about $O(\delta^{-1})$, and therefore the "Saad's bound" for $\epsilon_2$ is about $O(\delta^{-3})$ for tiny $\delta$.

We argued in section 6 that Saad's bound on $\theta(u_j,\tilde u_j)$ can be poor for a tight cluster of eigenvalues. Practically, it is also unreasonable to expect it to go as low as the machine's unit roundoff $\mathfrak{u}$ as the number of Lanczos steps increases. For this example, by the Davis-Kahan $\sin\theta$ theorem [7] (see also [22]), we should expect, for $1\le j\le3$,

$$(\text{observed})\ \sin\theta(u_j,\tilde u_j)=O(\text{Lanczos approximation error})+O(\mathfrak{u}/\delta),$$

where the $O(\mathfrak{u}/\delta)$ term is due to machine roundoff and can dominate the Lanczos approximation error after a certain number of Lanczos steps. To illustrate this point, we plot in Figure 7.1

$$\epsilon_5=\sqrt{\sum_{j=k}^{\ell}\sin^2\theta(u_j,\tilde u_j)}$$

as $\delta$ varies from $10^{-1}$ down to $10^{-11}$. Also plotted are its two upper bounds $\beta_1$ and $\beta_2$ from (6.7) and (6.8),

$$\epsilon_5\ \le\ \beta_1:=\sqrt{\sum_{j=k}^{\ell}\chi_j^2\sin^2\theta(u_j,\mathcal{K}_n)}\ \le\ \beta_2:=\sqrt{\sum_{j=k}^{\ell}\Big[\frac{\chi_j\,\xi_{j,j}}{\mathscr{T}_{n-j}(\kappa_{j,j,n_b})}\tan\theta(u_j,X_{j,j,j})\Big]^2},$$


Table 7.4: Example 7.3: N = 900, n = 12, i = k = 1, ℓ = 3, n_b = 3

    quantity                                   observed         our bound              Saad bound
    (∑_{j=k}^{ℓ} |λ_j − λ̃_j|²)^{1/2}           9.4 × 10^{-10}   1.5 × 10^{-8}  (5.1)   4.8 × 10^{-4}  (6.9)
    ‖sin Θ(U_(:,k:ℓ), Ũ_(:,k:ℓ))‖_F            3.9 × 10^{-5}    1.3 × 10^{-4}  (4.11)  3.0 × 10^{-2}  (6.8)
    ‖sin Θ(U_(:,k:ℓ), K_n)‖_F                  3.7 × 10^{-5}    1.1 × 10^{-4}  (4.9)   1.9 × 10^{-2}  (6.5)

as well as the observed values of ϵ_2 = ‖sin Θ(U_(:,k:ℓ), Ũ_(:,k:ℓ))‖_F defined in (7.1b) and its upper bound of (4.11). From the figure, we see that the observed ϵ_5 is about 10^{-7} for δ ≥ 10^{-7} and then deteriorates from about 10^{-7} up to 10^{-3} as δ goes down to 10^{-8} or smaller. At the same time, the magnitudes of ϵ_2 and its bound of (4.11) remain unchanged at around 10^{-7}. This supports our point that one should measure the convergence of the entire invariant subspace corresponding to tightly clustered eigenvalues rather than each individual eigenvector within the subspace.
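All of the observed quantities above reduce to canonical angles between subspaces, which can be computed from the singular values of a cross product of orthonormal bases. The following is a minimal NumPy sketch (our own helper, not code from the paper); quantities such as ϵ_2 and ϵ_5 are then norms of the returned vector of sines:

    import numpy as np

    def sin_canonical_angles(X, Y):
        """Sines of the canonical angles between range(X) and range(Y),
        where X and Y have full column rank and range(X) has the smaller
        dimension."""
        Qx = np.linalg.qr(X)[0]                   # orthonormal bases
        Qy = np.linalg.qr(Y)[0]
        c = np.linalg.svd(Qy.conj().T @ Qx, compute_uv=False)   # cosines
        return np.sqrt(np.maximum(0.0, 1.0 - np.minimum(c, 1.0) ** 2))

    # e.g., with hypothetical bases U_cluster and U_ritz of the two subspaces:
    # eps2 = np.linalg.norm(sin_canonical_angles(U_cluster, U_ritz))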

Example 7.3. Our last example is from [21]:

    λ_1 = 2,  λ_2 = 1.6,  λ_3 = 1.4,  λ_j = 1 − (j − 3)/N for j = 4, . . . , N,

    V_0 = [ 1  1  1  ⋯  1  1  1
            1  0 −1  ⋯  1  0 −1
            1 −2  1  ⋯  1 −2  1 ]^H.

Since the first three eigenvalues form a cluster, we take i = k = 1, ℓ = 3, and n_b = 3. Saad [21] tested this problem for N = 60 and n = 12. We ran our code with various N, including N = 60, and saw little variation in the observed errors and their bounds. What we report in Table 7.4 is for N = 900 and n = 12. The table shows that our bounds agree very well with the corresponding observed values, whereas “Saad’s bounds” are much bigger, roughly the square roots of the observed values and of our bounds. This is due mainly to the small gap between λ_2 and λ_3, since we observed that θ_j(U_(:,k:ℓ), X_{i,k,ℓ}) ≈ θ(u_j, X_{j,j,j}) ≈ 1.51303 for j = 1, 2, 3.
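For reproducibility, the data of this example can be generated as follows; this is a sketch under our own variable names (lam, A, V0), with A taken diagonal since only the eigenvalues matter here:

    import numpy as np

    N = 900
    # lambda_1 = 2, lambda_2 = 1.6, lambda_3 = 1.4, lambda_j = 1 - (j-3)/N for j >= 4
    lam = np.concatenate(([2.0, 1.6, 1.4],
                          1.0 - (np.arange(4, N + 1) - 3.0) / N))
    A = np.diag(lam)

    # V0^H has rows cycling the patterns (1,1,1), (1,0,-1), (1,-2,1)
    V0 = np.tile(np.array([[1.0,  1.0,  1.0],
                           [1.0,  0.0, -1.0],
                           [1.0, -2.0,  1.0]]), (N // 3, 1))   # N x 3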

8 More bounds

The numerical examples in the previous section indicate that ξ_{i,k} in Theorem 4.1 and ζ_i in Theorem 5.1 can worsen the error bounds, especially in the case of tightly clustered eigenvalues. However, both can be made 1 if only the first n_b eigenpairs are considered (see Remark 4.1). To use Theorem 4.1 or 5.1 for eigenvalues/eigenvectors beyond the first n_b eigenpairs, we have to have ξ_{i,k} or ζ_i in the mix, and the resulting bounds can potentially overestimate the actual errors by too much to be indicative.

Another situation in which the bounds of Theorems 4.1 and 5.1 would overestimate the actual quantities by too much is when there are clustered eigenvalues with cluster sizes bigger than n_b, because then δ_{i,ℓ,n_b} is very tiny for any choice of i, k, and ℓ. Ye [24] proposed an adaptive strategy that uses a variable n_b, updating n_b so that it stays at least as large as the biggest cluster size of interest.

In what follows, we propose more error bounds in which the roles of ξ_{i,k} and ζ_i are always played by 1. This, however, requires knowledge of the canonical angles from the eigenspace of interest to K_i(A, V_0), where i < n is a positive integer. Roughly speaking, the new results show that if the eigenspace is not too far from K_i(A, V_0) for some i < n (in the sense that no canonical angle is too near π/2), then the canonical angles from the eigenspace to K_n(A, V_0) are reduced by a factor depending purely on the optimal polynomial reduction.

To proceed, let i < n. We now consider the 1st to (i n_b)th eigenpairs of A, among which the kth to ℓth eigenvalues may form a cluster, as in

    λ_1 ≥ ⋯ ≥ λ_k ≥ ⋯ ≥ λ_ℓ ≥ ⋯ ≥ λ_{i n_b} ≥ ⋯ ≥ λ_N,   with λ_k, . . . , λ_ℓ the cluster,

where 1 ≤ k < ℓ ≤ i n_b and 1 ≤ i < n.

Suppose θ_1(U_(:,1:i n_b), Q_i) = θ_1(U_(:,1:i n_b), K_i(A, V_0)) < π/2, i.e.,

    rank(Q_i^H U_(:,1:i n_b)) = i n_b.   (8.1)

Consider an application of Proposition 2.4(b) with k_1 = ℓ − k + 1,

    X = R(Q_i),  Y = R(U_(:,1:i n_b)),  [y_1, y_2, . . . , y_{k_1}] = [u_k, u_{k+1}, . . . , u_ℓ].

The application yields a unique Z_{i,k,ℓ} := [x_1, x_2, . . . , x_{k_1}] such that R(Z_{i,k,ℓ}) ⊆ R(Q_i) and

    U_(:,1:i n_b) U_(:,1:i n_b)^H Z_{i,k,ℓ} = U_(:,k:ℓ) ≡ [u_k, u_{k+1}, . . . , u_ℓ].   (8.2)

Moreover

‌sinΘpUp:,k:ℓq, Zi,k,ℓq‌

‌ ď‌

‌sinΘpUp:,1:inbq, Qiq‌

‌ , (8.3)‌

‌tanΘpUp:,k:ℓq, Zi,k,ℓq‌

‌ ď‌

‌tanΘpUp:,1:inbq, Qiq‌

‌ . (8.4)

Finally, we observe that

    K_{n−i+1}(A, Q_i) = K_n(A, V_0).   (8.5)

The rest is a straightforward application of Theorems 4.1 and 5.1 to K_{n−i+1}(A, Q_i).

Theorem 8.1. For any unitarily invariant norm ||| · |||, we have

    ||| tan Θ(U_(:,k:ℓ), K_n) ||| ≤ [ 1 / T_{n−i}(κ_{i,ℓ,n_b}) ] ||| tan Θ(U_(:,k:ℓ), Z_{i,k,ℓ}) |||,   (8.6)

    ||| sin Θ(U_(:,k:ℓ), Ũ_(:,k:ℓ)) ||| ≤ γ ||| sin Θ(U_(:,k:ℓ), K_n) |||   (8.7)
                                        ≤ [ γ / T_{n−i}(κ_{i,ℓ,n_b}) ] ||| tan Θ(U_(:,k:ℓ), Z_{i,k,ℓ}) |||,   (8.8)


where Z_{i,k,ℓ} is defined by (8.2),

    δ_{i,ℓ,n_b} = (λ_ℓ − λ_{i n_b + 1}) / (λ_ℓ − λ_N),   κ_{i,ℓ,n_b} = (1 + δ_{i,ℓ,n_b}) / (1 − δ_{i,ℓ,n_b}),   (8.9)

and γ takes the same form as in (4.13) with, again, the constant c lying between 1 and π/2 and being 1 for the Frobenius norm or if λ_{k−1} > λ_k, and with η the same as the one in (4.14). For the Frobenius norm, γ can be improved to the one in (4.15).

Theorem 8.2. Let k = 1. For any unitarily invariant norm, we have

    ||| diag(λ_1 − λ̃_1, λ_2 − λ̃_2, . . . , λ_ℓ − λ̃_ℓ) ||| ≤ (λ_1 − λ_N) [ 1 / T_{n−i}(κ_{i,ℓ,n_b}) ]² ||| tan² Θ(U_(:,1:ℓ), Z_{i,1,ℓ}) |||,   (8.10)

where κ_{i,ℓ,n_b} is the same as in (8.9).

9 Generalized eigenvalue problem

Consider the generalized eigenvalue problem for A − λM, where A and M are N × N Hermitian and M is positive definite. Notice that this eigenvalue problem is equivalent to the standard Hermitian eigenvalue problem for

    Â := M^{−1/2} A M^{−1/2}.   (9.1)

A (block) Lanczos process for A − λM can essentially be obtained by symbolically writing out the process for Â and then translating it into one without the matrix square root [20].

Take Algorithm 3.1 as an example, and adopt the convention that each notation symbol for Â always carries a hat. The key mathematical step of Algorithm 3.1 applied to Â is the equation

    V̂_{j+1} B̂_j = Â V̂_j − V̂_j Â_j − V̂_{j−1} B̂_{j−1}^H,   (9.2)

implemented at Lines 4 to 8 there, where V̂_j has orthonormal columns. Define

    V_j := M^{−1/2} V̂_j,  A_j := Â_j,  B_j := B̂_j.   (9.3)

Then V_j has M-orthonormal columns, i.e., V_j^H M V_j = I_{n_b}. Multiplying (9.2) by M^{1/2} from the left gives

    M V_{j+1} B_j = A V_j − M V_j A_j − M V_{j−1} B_{j−1}^H,   (9.4)

which is the key equation behind the following generic block Lanczos process for A − λM. In the generic situation, i.e., when all Z in Lines 4 and 8 have full column rank n_b, this process produces

    Q_n = [V_1, V_2, . . . , V_n] ∈ C^{N×n n_b},  Q_n^H M Q_n = I_{n n_b},   (9.5a)


Algorithm 9.1 Simple Block Lanczos Process for A − λM

Given a Hermitian pencil A − λM ∈ C^{N×N} with M positive definite, and V_0 ∈ C^{N×n_b} with rank(V_0) = n_b, this generic block Lanczos process performs a partial tridiagonal reduction on A − λM.

1: perform M-orthogonalization on the given V_0 (rank(V_0) = n_b) to obtain V_0 = V_1 B_0 (e.g., via the modified Gram-Schmidt process), where V_1 ∈ C^{N×n_b} satisfies V_1^H M V_1 = I_{n_b} and B_0 ∈ C^{n_b×n_b};
2: Y = A V_1, A_1 = V_1^H Y;
3: Y = Y − M V_1 A_1; solve M Z = Y for Z;
4: perform M-orthogonalization on Z to obtain Z = V_2 B_1, where V_2 ∈ C^{N×n_b} satisfies V_2^H M V_2 = I_{n_b} and B_1 ∈ C^{n_b×n_b};
5: for j = 2 to n do
6:    Y = A V_j, A_j = V_j^H Y;
7:    Y = Y − M V_j A_j − M V_{j−1} B_{j−1}^H; solve M Z = Y for Z;
8:    perform M-orthogonalization on Z to obtain Z = V_{j+1} B_j, where V_{j+1} ∈ C^{N×n_b} satisfies V_{j+1}^H M V_{j+1} = I_{n_b} and B_j ∈ C^{n_b×n_b};
9: end for

    T_n = Q_n^H A Q_n =

        ⎡ A_1    B_1^H                                    ⎤
        ⎢ B_1    A_2    B_2^H                             ⎥
        ⎢         ⋱      ⋱       ⋱                        ⎥  ∈ C^{n n_b × n n_b}   (9.5b)
        ⎢              B_{n−2}  A_{n−1}  B_{n−1}^H        ⎥
        ⎣                       B_{n−1}  A_n              ⎦

that are related by the following fundamental equation

    A Q_n = M Q_n T_n + [0_{N×(n−1)n_b}, M V_{n+1} B_n].   (9.6)

As for the standard eigenvalue problem, the block Lanczos method for A − λM is this Lanczos process followed by solving the eigenvalue problem for the Hermitian matrix T_n to obtain approximate eigenpairs for A − λM: any eigenpair (λ̃_j, w_j) of T_n gives an approximate eigenpair (λ̃_j, Q_n w_j) for A − λM. These λ̃_j are again called Ritz values and ũ_j = Q_n w_j Ritz vectors. We will use the same notation as detailed in (3.5) for T_n.
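To make the process concrete, here is a minimal dense NumPy/SciPy sketch of Algorithm 9.1 together with the Ritz extraction just described. It assumes real symmetric A and symmetric positive definite M, performs the M-orthogonalizations via Cholesky factors of small Gram matrices, and uses function names (m_orth, block_lanczos_pencil) that are ours; a practical implementation would add reorthogonalization and rank checks:

    import numpy as np
    from scipy.linalg import cholesky, solve_triangular

    def m_orth(Z, M):
        # M-orthogonalization: factor Z = V @ B with V.T @ M @ V = I,
        # via the Cholesky factorization Z^T M Z = B^T B (B upper triangular).
        B = cholesky(Z.T @ (M @ Z))
        V = solve_triangular(B, Z.T, trans='T').T   # V = Z @ inv(B)
        return V, B

    def block_lanczos_pencil(A, M, V0, n):
        """Sketch of Algorithm 9.1: returns Qn (M-orthonormal) and the block
        tridiagonal Tn = Qn.T @ A @ Qn of (9.5b); the residual term in (9.6)
        is not returned."""
        nb = V0.shape[1]
        R = cholesky(M)                             # M = R.T @ R, reused for M-solves
        msolve = lambda Y: solve_triangular(R, solve_triangular(R, Y, trans='T'))
        T = np.zeros((n * nb, n * nb))
        V, _ = m_orth(V0, M)                        # line 1: V0 = V1 @ B0
        Vs, V_prev, B = [V], None, None
        for j in range(n):
            Y = A @ V                               # lines 2/6
            Aj = V.T @ Y
            T[j*nb:(j+1)*nb, j*nb:(j+1)*nb] = Aj
            if j == n - 1:                          # Tn complete; skip residual step
                break
            Y = Y - M @ (V @ Aj)                    # lines 3/7
            if V_prev is not None:
                Y = Y - M @ (V_prev @ B.T)          # subtract M V_{j-1} B_{j-1}^T
            V_prev = V
            V, B = m_orth(msolve(Y), M)             # lines 4/8: M Z = Y, Z = V_{j+1} B_j
            T[(j+1)*nb:(j+2)*nb, j*nb:(j+1)*nb] = B
            T[j*nb:(j+1)*nb, (j+1)*nb:(j+2)*nb] = B.T
            Vs.append(V)
        return np.hstack(Vs), T

    # Ritz pairs, as described above:
    #   lam_tilde, W = scipy.linalg.eigh(T)   # Ritz values for A - lambda*M
    #   u_tilde = Qn @ W                      # corresponding Ritz vectors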

Let V̂_0 = M^{1/2} V_0. We have

    K_n(Â, V̂_0) = K_n(M^{−1/2} A M^{−1/2}, M^{1/2} V_0) = M^{1/2} K_n(M^{−1} A, V_0),

and thus

    K_n(M^{−1} A, V_0) = M^{−1/2} K_n(Â, V̂_0)
                       = M^{−1/2} [R(V̂_1) ⊕ ⋯ ⊕ R(V̂_n)]
                       = R(V_1) ⊕ ⋯ ⊕ R(V_n).   (9.7)


Correspondingly, the M-orthogonal projection onto K_n(M^{−1} A, V_0) is

    Π_n = Q_n Q_n^H M.

In particular, Π_1 = V_1 V_1^H M.

Our previous approximation theorems in sections 4 and 5 apply directly to Â and V̂_0. The resulting theorems can then be translated into theorems for A − λM upon using the relationships in (9.3) and (9.7).

We introduce the following notation for A − λM:

    eigenvalues: λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_N, and Λ = diag(λ_1, λ_2, . . . , λ_N),
    M-orthonormal eigenvectors: u_1, u_2, . . . , u_N, and U = [u_1, u_2, . . . , u_N],
    eigen-decomposition: U^H A U = Λ and U^H M U = I_N.   (9.8)

Let û_j = M^{1/2} u_j and Û = M^{1/2} U. Then for Â:

    eigenvalues: λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_N, and Λ = diag(λ_1, λ_2, . . . , λ_N),
    orthonormal eigenvectors: û_1, û_2, . . . , û_N, and Û = [û_1, û_2, . . . , û_N],
    eigen-decomposition: Â = Û Λ Û^H and Û^H Û = I_N.

Next we have the following angle relationship: for any Y, Ŷ ∈ C^{N×ℓ_1} and Z, Ẑ ∈ C^{N×ℓ_2} related by Ŷ = M^{1/2} Y and Ẑ = M^{1/2} Z,

    Θ(Ŷ, Ẑ) = Θ(M^{1/2} Y, M^{1/2} Z) =: Θ_M(Y, Z),   (9.9)

which defines the M-canonical angles from whichever of R(Y) and R(Z) has the lower dimension to the other.
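In computations, the square root M^{1/2} in (9.9) can be avoided with a Cholesky factor: if M = R^H R, then R = W M^{1/2} for some unitary W, so Θ(M^{1/2} Y, M^{1/2} Z) = Θ(R Y, R Z). A small NumPy/SciPy sketch (our own helper, real case assumed):

    import numpy as np
    from scipy.linalg import cholesky

    def sin_m_canonical_angles(Y, Z, M):
        """Sines of the M-canonical angles Theta_M(Y, Z) of (9.9), computed
        as the ordinary canonical angles between R @ Y and R @ Z for a
        Cholesky factor M = R^T R."""
        R = cholesky(M)                    # upper triangular, M = R.T @ R
        Qy = np.linalg.qr(R @ Y)[0]
        Qz = np.linalg.qr(R @ Z)[0]
        c = np.linalg.svd(Qz.T @ Qy, compute_uv=False)
        return np.sqrt(np.maximum(0.0, 1.0 - np.minimum(c, 1.0) ** 2))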

To proceed, we need an extension of Proposition 2.4 to the M-inner product. In fact, the proposition remains valid with these modifications: replace 1) all canonical angles by the corresponding M-canonical angles, 2) orthonormal by M-orthonormal, and 3) the orthogonal projection P_Y by the M-orthogonal projection Y Y^H M, where Y is the M-orthonormal basis matrix of Y.

Consider an application of the modified Proposition 2.4(b) for the M-inner product with k_1 = ℓ − k + 1,

    X = R(V_0) = R(V_1),  Y = R(U_(:,i:i+n_b−1)),  [y_1, y_2, . . . , y_{k_1}] = [u_k, u_{k+1}, . . . , u_ℓ],

assuming

    rank(V_0^H M U_(:,i:i+n_b−1)) = rank(V_1^H M U_(:,i:i+n_b−1)) = n_b.   (9.10)

The application yields a unique X_{i,k,ℓ} := [x_1, x_2, . . . , x_{k_1}] such that R(X_{i,k,ℓ}) ⊆ R(V_0) and

    U_(:,i:i+n_b−1) U_(:,i:i+n_b−1)^H M X_{i,k,ℓ} = U_(:,k:ℓ) ≡ [u_k, u_{k+1}, . . . , u_ℓ].   (9.11)


Moreover,

    ||| sin Θ_M(U_(:,k:ℓ), X_{i,k,ℓ}) ||| ≤ ||| sin Θ_M(U_(:,i:i+n_b−1), V_0) |||,   (9.12)
    ||| tan Θ_M(U_(:,k:ℓ), X_{i,k,ℓ}) ||| ≤ ||| tan Θ_M(U_(:,i:i+n_b−1), V_0) |||.   (9.13)

In the rest of this section, we will always assume (9.10), and X_{i,k,ℓ} ∈ C^{N×(ℓ−k+1)} will have the same assignment as given here.

In the two theorems below, ξ_{i,k}, κ_{i,ℓ,n_b}, η, and ζ_i take the same forms as in (4.12), (4.14), (5.2), and (5.3), respectively, but with the eigenvalues λ_j in (9.8) for the pencil A − λM, and with the Ritz values λ̃_j being the eigenvalues of T_n produced here by Algorithm 9.1.

Theorem 9.1. For any unitarily invariant norm, we have

    ||| tan Θ_M(U_(:,k:ℓ), K_n) ||| ≤ [ ξ_{i,k} / T_{n−i}(κ_{i,ℓ,n_b}) ] ||| tan Θ_M(U_(:,k:ℓ), X_{i,k,ℓ}) |||,   (9.14)

    ||| sin Θ_M(U_(:,k:ℓ), Ũ_(:,k:ℓ)) ||| ≤ γ ||| sin Θ_M(U_(:,k:ℓ), K_n) |||   (9.15)
                                          ≤ [ γ ξ_{i,k} / T_{n−i}(κ_{i,ℓ,n_b}) ] ||| tan Θ_M(U_(:,k:ℓ), X_{i,k,ℓ}) |||,   (9.16)

where

    γ = 1 + (c/η) ‖M^{−1/2} Π_n^H A (I − Π_n) M^{−1/2}‖_2,   (9.17)

and the constant c lies between 1 and π/2, and is 1 for the Frobenius norm or if λ_{k−1} > λ_k. For the Frobenius norm, γ can be improved to

    γ = ( 1 + [ (1/η) ‖M^{−1/2} Π_n^H A (I − Π_n) M^{−1/2}‖_2 ]² )^{1/2}.   (9.18)

Proof. Only the factor ‖M^{−1/2} Π_n^H A (I − Π_n) M^{−1/2}‖_2 in (9.17) and (9.18) needs a justification, since the rest is rather obvious. The inequality (9.15) is derived from (4.10) applied to Â = M^{−1/2} A M^{−1/2}, resulting in a factor ‖Π̂_n Â (I − Π̂_n)‖_2. Since Π̂_n = Q̂_n Q̂_n^H = M^{1/2} Q_n Q_n^H M^{1/2}, we have

    Π̂_n Â (I − Π̂_n) = M^{1/2} Q_n Q_n^H M^{1/2} · M^{−1/2} A M^{−1/2} · (I − M^{1/2} Q_n Q_n^H M^{1/2})
                      = M^{−1/2} M Q_n Q_n^H A (I − Q_n Q_n^H M) M^{−1/2}
                      = M^{−1/2} Π_n^H A (I − Π_n) M^{−1/2},

which contributes the factor ‖M^{−1/2} Π_n^H A (I − Π_n) M^{−1/2}‖_2 in (9.17) and (9.18).
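This identity is easy to sanity-check numerically. Here is a small script we find helpful (a sketch only, with randomly generated A and M and an M-orthonormal Q standing in for Q_n; sqrtm is used purely for verification, never in the actual method):

    import numpy as np
    from scipy.linalg import sqrtm, cholesky, solve_triangular

    rng = np.random.default_rng(1)
    N, m = 40, 6
    A = rng.standard_normal((N, N)); A = (A + A.T) / 2             # symmetric
    B = rng.standard_normal((N, N)); M = B.T @ B + N * np.eye(N)   # positive definite

    X = rng.standard_normal((N, m))             # M-orthonormalize: Q^T M Q = I
    C = cholesky(X.T @ M @ X)
    Q = solve_triangular(C, X.T, trans='T').T

    Mh = sqrtm(M).real                          # M^{1/2}
    Mh_inv = np.linalg.inv(Mh)
    Ahat = Mh_inv @ A @ Mh_inv                  # A_hat = M^{-1/2} A M^{-1/2}
    Pi_hat = Mh @ Q @ Q.T @ Mh                  # Pi_hat_n = M^{1/2} Qn Qn^H M^{1/2}
    Pi = Q @ Q.T @ M                            # Pi_n = Qn Qn^H M

    lhs = Pi_hat @ Ahat @ (np.eye(N) - Pi_hat)
    rhs = Mh_inv @ (Pi.T @ A @ (np.eye(N) - Pi)) @ Mh_inv
    print(np.linalg.norm(lhs - rhs))            # ~ machine precision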

Theorem 9.2. Let k = i. For any unitarily invariant norm, we have

    ||| diag(λ_i − λ̃_i, λ_{i+1} − λ̃_{i+1}, . . . , λ_ℓ − λ̃_ℓ) ||| ≤ (λ_i − λ_N) [ ζ_i / T_{n−i}(κ_{i,ℓ,n_b}) ]² ||| tan² Θ_M(U_(:,k:ℓ), X_{i,k,ℓ}) |||.   (9.19)

Similarly, we can derive two more theorems as extensions of Theorems 8.1 and 8.2. We omit the detailed statements.


10 Conclusions

We have established a new convergence theory for solving the large scale Hermitian eigenvalue problem by the block Lanczos method, from the perspective of bounding approximation errors for the entire eigenspace associated with all eigenvalues in a tight cluster, in contrast to bounding errors in each individual approximate eigenvector as was done in Saad [21]. In a way, this is a natural approach to follow because the block Lanczos method is known to be capable of computing multiple/clustered eigenvalues much faster than the single-vector Lanczos method (which will miss all but one copy of each multiple eigenvalue). The outcome is three error bounds on

1. the canonical angles from the eigenspace to the generated Krylov subspace,

2. the canonical angles between the eigenspace and its Ritz approximate subspace,

3. the total differences between the eigenvalues in the cluster and their correspondingRitz values.

These bounds are much sharper than the existing ones and expose true rates of convergence of the block Lanczos method towards eigenvalue clusters, as illustrated by numerical examples. Furthermore, their sharpness is independent of the closeness of eigenvalues within a cluster.

As is well known, the (block) Lanczos method favors the eigenvalues at both ends of the spectrum. So far, we have focused only on the convergence of the few largest eigenvalues and their associated eigenspaces, but, as is usually done, applying what we have established to the eigenvalue problem for −A will lead to the corresponding convergence results for the few smallest eigenvalues and their associated eigenspaces.

All results are stated in terms of unitarily invariant norms for generality, but specializing them to the spectral norm and the Frobenius norm will be sufficient for all practical purposes.

References

[1] R. Bhatia. Matrix Analysis. Graduate Texts in Mathematics, vol. 169. Springer, New York, 1996.

[2] R. Bhatia, C. Davis, and P. Koosis. An extremal problem in Fourier analysis with applications to operator theory. J. Funct. Anal., 82:138–150, 1989.

[3] R. Bhatia, C. Davis, and A. McIntosh. Perturbation of spectral subspaces and solution of linear operator equations. Linear Algebra Appl., 52-53:45–67, 1983.

[4] E. W. Cheney. Introduction to Approximation Theory. Chelsea Publishing Company, New York, 2nd edition, 1982.

[5] Jane K. Cullum and W. E. Donath. A block Lanczos algorithm for computing the q algebraically largest eigenvalues and a corresponding eigenspace of large, sparse, real symmetric matrices. In Proceedings of the 1974 IEEE Conference on Decision and Control including the 13th Symposium on Adaptive Processes, volume 13, pages 505–509, 1974.

[6] Jane K. Cullum and Ralph A. Willoughby. Lanczos Algorithms for Large Symmetric Eigenvalue Computations, Vol. I: Theory. SIAM, Philadelphia, 2002.

[7] C. Davis and W. Kahan. The rotation of eigenvectors by a perturbation. III. SIAM J. Numer. Anal., 7:1–46, 1970.

[8] J. Demmel. Applied Numerical Linear Algebra. SIAM, Philadelphia, PA, 1997.

[9] G. H. Golub and R. R. Underwood. The block Lanczos method for computing eigenvalues. In J. R. Rice, editor, Mathematical Software III, pages 361–377. Academic Press, New York, 1977.

[10] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, Maryland, 3rd edition, 1996.

[11] Zhongxiao Jia and G. W. Stewart. An analysis of the Rayleigh-Ritz method for approximating eigenspaces. Math. Comp., 70:637–647, 2001.

[12] S. Kaniel. Estimates for some computational techniques in linear algebra. Math. Comp., 20(95):369–378, July 1966.

[13] A. B. J. Kuijlaars. Which eigenvalues are found by the Lanczos method? SIAM J. Matrix Anal. Appl., 22(1):306–321, 2000.

[14] A. B. J. Kuijlaars. Convergence analysis of Krylov subspace iterations with methods from potential theory. SIAM Rev., 48(1):3–40, 2006.

[15] C. Lanczos. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J. Res. Nat. Bur. Standards, 45:255–282, 1950.

[16] Ren-Cang Li. Matrix perturbation theory. In L. Hogben, R. Brualdi, A. Greenbaum, and R. Mathias, editors, Handbook of Linear Algebra, chapter 15. CRC Press, Boca Raton, FL, 2006.

[17] Ren-Cang Li. On Meinardus' examples for the conjugate gradient method. Math. Comp., 77(261):335–352, 2008. Electronically published on September 17, 2007.

[18] Ren-Cang Li. Sharpness in rates of convergence for the symmetric Lanczos method. Math. Comp., 79(269):419–435, 2010.

[19] C. C. Paige. The Computation of Eigenvalues and Eigenvectors of Very Large Sparse Matrices. PhD thesis, London University, London, England, 1971.

[20] B. N. Parlett. The Symmetric Eigenvalue Problem. SIAM, Philadelphia, 1998. This SIAM edition is an unabridged, corrected reproduction of the work first published by Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1980.

[21] Y. Saad. On the rates of convergence of the Lanczos and the block-Lanczos methods. SIAM J. Numer. Anal., 15(5):687–706, October 1980.

[22] G. W. Stewart and Ji-Guang Sun. Matrix Perturbation Theory. Academic Press, Boston, 1990.

[23] P.-Å. Wedin. On angles between subspaces. In B. Kågström and A. Ruhe, editors, Matrix Pencils, pages 263–285, New York, 1983. Springer.

[24] Qiang Ye. An adaptive block Lanczos algorithm. Numerical Algorithms, 12:97–110, 1996.
