
DEFLATION TECHNIQUES FOR AN IMPLICITLY RESTARTED ARNOLDI ITERATION

R. B. LEHOUCQ AND D. C. SORENSEN

Abstract. A deflation procedure is introduced that is designed to improve the convergence of an implicitly restarted Arnoldi iteration for computing a few eigenvalues of a large matrix. As the iteration progresses, the Ritz value approximations of the eigenvalues of $A$ converge at different rates. A numerically stable scheme is introduced that implicitly deflates the converged approximations from the iteration. We present two forms of implicit deflation. The first, a locking operation, decouples converged Ritz values and associated vectors from the active part of the iteration. The second, a purging operation, removes unwanted but converged Ritz pairs. Convergence of the iteration is improved and a reduction in computational effort is also achieved. The deflation strategies make it possible to compute multiple or clustered eigenvalues with a single-vector restart method. A block method is not required. These schemes are analyzed with respect to numerical stability, and computational results are presented.

Key words. Arnoldi method, Lanczos method, eigenvalues, deflation, implicit restarting

AMS subject classifications. 65F15, 65G05

1. Introduction. The Arnoldi method is an efficient procedure for approximating a subset of the eigensystem of a large sparse $n \times n$ matrix $A$. The Arnoldi method is a generalization of the Lanczos process and reduces to that method when the matrix $A$ is symmetric. After $k$ steps the algorithm produces an upper Hessenberg matrix $H_k$ of order $k$. The eigenvalues of this small matrix $H_k$ are used to approximate a subset of the eigenvalues of the large matrix $A$. The matrix $H_k$ is an orthogonal projection of $A$ onto a particular Krylov subspace, and the eigenvalues of $H_k$ are usually called Ritz values or Ritz approximations.

There are a number of numerical difficulties with Arnoldi/Lanczos methods. In [34] a variant of this method was developed to overcome these difficulties.
This technique, the Implicitly Restarted Arnoldi iteration (IRA-iteration), may be viewed as a truncation of the standard implicitly shifted QR-iteration. This connection will be reviewed during the course of the paper. Because of this connection, an IRA-iteration shares a number of the QR-iteration's desirable properties. These include the well-understood deflation rules of the QR-iteration. These deflation techniques are extremely important with respect to the convergence and stability of the QR-iteration. Deflation rules have contributed greatly to the emergence of the practical QR algorithm as the method of choice for computing the eigensystem of dense matrices. In particular, the deflation rules allow the QR-iteration to compute multiple and clustered eigenvalues.

This paper introduces deflation schemes that may be used within an IRA-iteration. This iteration is designed to compute a selected subset of the spectrum of $A$, such as the $k$ eigenvalues of largest real part. We refer to this selected subset as wanted and the remainder of the spectrum as unwanted. As the iteration progresses, some of the Ritz approximations to eigenvalues of $A$ may converge long before the entire set of wanted eigenvalues has been computed. These converged Ritz values may be part of the wanted or the unwanted portion of the spectrum. In either case, it is desirable to deflate the converged Ritz values and corresponding Ritz vectors from the unconverged portion of the factorization. If the converged Ritz value is wanted, it is necessary to keep it in the subsequent Arnoldi factorizations. This is called locking. If the converged Ritz value is unwanted, then it must be removed from the current and subsequent Arnoldi factorizations. This is called purging. These notions will be made precise during the course of the paper. For the moment we note that the advantages of a numerically stable deflation strategy include:

- Reduction of the working size of the desired invariant subspace.
- Preventing the effects of the forward instability of the Lanczos and QR algorithms [27, 39].
- The ability to determine clusters of nearby eigenvalues without need for a block Arnoldi method [18, 32, 33].

The fundamentals of the Arnoldi algorithm are introduced in § 2, as well as the determination of Ritz value convergence. The IRA-iteration is reviewed in § 3. Deflating within the IRA-iteration is examined in § 4. The deflation scheme for converged Ritz values is presented in § 5. The practical issues associated with our deflation scheme are examined in § 6. These include block generalizations of the ideas examined in § 5 for dealing with a number of Ritz values simultaneously, and avoiding the use of complex arithmetic when a complex conjugate pair of Ritz values converges. An error analysis of the deflated process is presented in § 7. A brief survey of and comparisons with other deflation strategies is given in § 8. An interesting connection with the various algorithms used to re-order a Schur form of a matrix is presented in § 9. Numerical results are presented in § 10.

Capital and lower case letters denote matrices and vectors while lower case Greek letters denote scalars.

* This work was supported in part by ARPA (U.S. Army ORA4466.01), by the U.S. Department of Energy (Contracts DE-FG0f-91ER25103 and W-31-109-Eng-38), and by the National Science Foundation (Cooperative agreement CCR-9120008).
† Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL ([email protected]).
‡ Computational and Applied Mathematics Department, Rice University, Houston, TX ([email protected]).
The $j$-th canonical basis vector is denoted by $e_j$. The norms used are the Euclidean and Frobenius ones, denoted by $\|\cdot\|$ and $\|\cdot\|_F$, respectively. The range of a matrix $A$ is denoted by $\mathcal{R}(A)$.

2. The Arnoldi Factorization. Arnoldi's method [1] is an orthogonal projection method for approximating a subset of the eigensystem of a general square matrix. The method builds, step by step, an orthogonal basis for the Krylov space

    $\mathcal{K}_k(A, v_1) \equiv \mathrm{span}\{v_1, Av_1, \ldots, A^{k-1}v_1\}$

for $A$ generated by the vector $v_1$. The original algorithm in [1] was designed to reduce a dense matrix to upper Hessenberg form. However, the method only requires knowledge of $A$ through matrix-vector products, and its ultimate value as a technique for approximating a few eigenvalues of a large sparse matrix was soon realized. When the matrix $A$ is symmetric the procedure reduces to the Lanczos method [22].

Over a decade of research was devoted to understanding and overcoming the numerical difficulties of the Lanczos method [26]. Development of the Arnoldi method lagged behind due to the inordinate computational and storage requirements associated with the original method when a large number of steps are required for convergence. Not only is more storage required for $V_k$ and $H_k$ when $A$ is nonsymmetric, but in general more steps are required to compute the desired Ritz value approximations. An explicitly restarted Arnoldi iteration (ERA-iteration) was introduced by Saad [30] to overcome these difficulties. The idea is based on similar ones developed for the Lanczos process by Paige [25], Cullum and Donath [10], and Golub and Underwood [17]. Karush proposed the first example of a restarted iteration in [21].


After $k$ steps, the Arnoldi algorithm computes a truncated factorization

    $AV_k = V_k H_k + f_k e_k^T$    (2.1)

of $A \in \mathbb{R}^{n \times n}$ into upper Hessenberg form, where $V_k^T V_k = I_k$. The vector $f_k$ is the residual and is orthogonal to the columns of $V_k$. The matrix $H_k \in \mathbb{R}^{k \times k}$ is an upper Hessenberg matrix that is the orthogonal projection of $A$ onto $\mathcal{R}(V_k) \equiv \mathcal{K}_k(A, v_1)$.

The following procedure shows how the factorization is extended from length $k$ to $k + p$.

Algorithm 2.1.
function $[V_{k+p}, H_{k+p}, f_{k+p}] = \mathrm{Arnoldi}(A, V_k, H_k, f_k, k, p)$
Input: $AV_k - V_k H_k = f_k e_k^T$ with $V_k^T V_k = I_k$ and $V_k^T f_k = 0$.
Output: $AV_{k+p} - V_{k+p} H_{k+p} = f_{k+p} e_{k+p}^T$ with $V_{k+p}^T V_{k+p} = I_{k+p}$ and $V_{k+p}^T f_{k+p} = 0$.
1. For $j = 1, 2, \ldots, p$
2.   $\beta_{k+j} \leftarrow \|f_{k+j-1}\|$; if $\beta_{k+j} = 0$ then stop;
3.   $v_{k+j} \leftarrow f_{k+j-1} \beta_{k+j}^{-1}$; $V_{k+j} \leftarrow [\, V_{k+j-1} \;\; v_{k+j} \,]$;
4.   $w \leftarrow A v_{k+j}$;
5.   $h_{k+j} \leftarrow V_{k+j-1}^T w$; $\alpha_{k+j} \leftarrow v_{k+j}^T w$;
6.   $H_{k+j} \leftarrow \begin{bmatrix} H_{k+j-1} & h_{k+j} \\ \beta_{k+j} e_{k+j-1}^T & \alpha_{k+j} \end{bmatrix}$;
7.   $f_{k+j} \leftarrow w - V_{k+j-1} h_{k+j} - v_{k+j} \alpha_{k+j}$;

If $k = 0$ then $V_1 = v_1$ represents the initial vector. In order to ensure that $V_k^T f_k \approx 0$ in finite precision arithmetic, the above algorithm requires some form of re-orthogonalization at step 7; see Chapter 7 of [23].

In exact arithmetic, the algorithm continues until $f_k = 0$ for some $k \le n$. All of the intermediate Hessenberg matrices $H_j$ are unreduced for $j \le k$. A Hessenberg matrix is said to be unreduced if all of its main sub-diagonal elements are nonzero. The residual vanishes at the first step $k$ such that $\dim \mathcal{K}_{k+1}(A, v_1) = k$ and hence is guaranteed to vanish for some $k \le n$. The following result indicates when an exact truncated factorization occurs. This is desirable since the columns of $V_k$ form a basis for an invariant subspace and the eigenvalues of $H_k$ are a subset of those of $A$.

Theorem 2.2. Let equation (2.1) define a $k$-step Arnoldi factorization of $A$, with $H_k$ unreduced. Then $f_k = 0$ if and only if $v_1 = Q_k y$ where $AQ_k = Q_k R_k$ with $Q_k^T Q_k = I_k$, and $R_k$ an upper quasi-triangular matrix of order $k$.
Proof. See Chapter 2 of [23] or [34] for a proof based on the Jordan canonical form.

In Theorem 2.2, the span of the $k$ columns of $Q_k$ represents an invariant subspace for $A$. The matrix equation $AQ_k = Q_k R_k$ is a partial real Schur decomposition of order $k$ for $A$. The diagonal blocks of $R_k$ contain the eigenvalues of $A$: the complex conjugate pairs are in blocks of order two, and the real eigenvalues are on the diagonal of $R_k$. In particular, the theorem gives that if the initial vector is a linear combination of $k$ linearly independent eigenvectors, then the $k$-th residual vector vanishes. It is therefore desirable to devise a method that forces the starting vector $v_1$ to lie in the invariant subspace associated with the wanted eigenvalues.

The algorithms of this paper are appropriate when the order of $A$ is so large that storage and computational requirements prohibit completion of the algorithm that produces $V_n$ and $H_n$. We also remark that working in finite precision arithmetic generally removes the possibility of the computed residual ever vanishing exactly. As the norm of $f_k$ decreases, the eigenvalues of $H_k$ become better approximations to those of $A$. Experience indicates that $\|f_k\|$ rarely becomes small, let alone zero.
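Algorithm 2.1 is easy to exercise with dense matrices. The following numpy sketch is an illustration with assumed data (matrix, dimensions, and seed are arbitrary), not the authors' implementation; the re-orthogonalization pass at step 7 follows the suggestion above, with its correction folded back into $H$. The final lines also compute the Ritz pairs of $H$ and the quantity $\|f\|\,|e_k^T y|$ that the paper develops below as the Ritz estimate (2.2).

```python
import numpy as np

def arnoldi(A, V, H, f, p):
    """Extend a k-step Arnoldi factorization A V = V H + f e_k^T by p steps.

    Dense-numpy sketch of Algorithm 2.1: V is n x k with orthonormal columns,
    H is k x k upper Hessenberg, f is the residual.  k = 0 is allowed, in
    which case f holds the starting vector.
    """
    for _ in range(p):
        k = V.shape[1]
        beta = np.linalg.norm(f)            # step 2
        if beta == 0.0:                     # exact invariant subspace found
            break
        v = f / beta                        # step 3
        V = np.hstack([V, v[:, None]])
        H = np.pad(H, ((0, 1), (0, 1)))     # grow H by one row and column
        if k > 0:
            H[k, k - 1] = beta              # sub-diagonal entry
        w = A @ v                           # step 4
        h = V.T @ w                         # steps 5-6 (alpha is h[-1])
        f = w - V @ h                       # step 7
        c = V.T @ f                         # re-orthogonalization pass;
        f = f - V @ c                       # fold the correction back into H
        H[:, k] = h + c
    return V, H, f

# demonstration with assumed data
rng = np.random.default_rng(0)
n, k = 12, 6
A = rng.standard_normal((n, n))
V, H, f = arnoldi(A, np.zeros((n, 0)), np.zeros((0, 0)), rng.standard_normal(n), k)

# Ritz pairs of H and the estimate ||f|| |e_k^T y|
theta, Y = np.linalg.eig(H)                 # columns of Y have unit norm
X = V @ Y                                   # Ritz vectors x = V y
direct = np.linalg.norm(A @ X - X * theta, axis=0)
estimate = np.linalg.norm(f) * np.abs(Y[-1, :])
```

In exact arithmetic `direct` and `estimate` coincide, so convergence of a Ritz pair can be monitored from $H$ and $f$ alone, without further products with $A$.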


However, as the order of $H_k$ increases, certain eigenvalues of $H_k$ may emerge as excellent estimates of eigenvalues of $A$. When an eigenvalue of $H_k$ is sufficiently near one of $A$, we will say that convergence occurred. Since the interest is in a small subset of the eigensystem of $A$, alternate criteria that allow termination for $k \ll n$ are needed. Let $H_k y = y\theta$ where $\|y\| = 1$. Define the vector $x = V_k y$ to be a Ritz vector and $\theta$ to be a Ritz value. Then

    $\|AV_k y - V_k H_k y\| = \|Ax - x\theta\| = \|f_k\| \, |e_k^T y|$    (2.2)

indicates that if the last component of an eigenvector for $H_k$ is small, the Ritz pair $(x, \theta)$ is an approximation to an eigenpair of $A$. This pair is exact for a nearby problem: it is easily shown that $(A + E)x = x\theta$ with $E = -(e_k^T y) f_k x^H$. The advantage of using the Ritz estimate (2.2) is to avoid explicit formation of the quantity $AV_k y - V_k y\theta$ when assessing the numerical accuracy of an approximate eigenpair.

Recent work by Chatelin and Frayssé [8, 9] and Godet-Thobie [14] suggests that when $A$ is highly non-normal, the size of $e_k^T y$ is not an appropriate guide for detecting convergence. If the relative departure from normality defined by the Henrici number $\|AA^T - A^T A\|_F / \|A^2\|_F$ is large, the matrix $A$ is considered highly non-normal. Assuming that $A$ is diagonalizable, a large Henrici number implies that the basis of eigenvectors is ill-conditioned [8]. Bennani and Braconnier compare the use of the Ritz estimate and the direct residual $\|Ax - x\theta\|$ in Arnoldi algorithms [4]. They suggest normalizing the Ritz estimate by the norm of $A$, resulting in a stopping criterion based on the backward error. The backward error is defined as the smallest, in norm, perturbation $\Delta A$ such that the Ritz pair is an eigenpair for $A + \Delta A$. Scott [33] presents a lucid account of the many issues involved in determining stopping criteria for the unsymmetric problem.

3. The Implicitly Restarted Arnoldi Iteration.
Theorem 2.2 motivates the selection of a starting vector that will lead to the construction of an approximate basis for the desired invariant subspace of $A$. The best possible starting vector would be a linear combination of a Schur basis for the desired invariant subspace. The IRA-iteration iteratively restarts the Arnoldi factorization with the goal of forcing the starting vector closer and closer to the desired invariant subspace. The scheme is called implicit because the updating of the starting vector is accomplished with an implicitly shifted QR mechanism on $H_k$. This will allow us to update the starting vector by working with orthogonal matrices that live in $\mathbb{R}^{k \times k}$ rather than in $\mathbb{R}^{n \times n}$.

The iteration starts by extending a length $k$ Arnoldi factorization by $p$ steps. Next, $p$ shifted QR steps are performed on $H_{k+p}$. The last $p$ columns of the factorization are discarded, resulting in a length $k$ factorization. The iteration is defined by repeating the above process until convergence.

As an example, suppose that $p = 1$ and that $k$ represents the dimension of the desired invariant subspace. Let $\mu$ be a real shift and let $H_{k+1} - \mu I = QR$ with $Q$ orthogonal and $R$ upper triangular matrices, respectively. Then from (2.1)

    $(A - \mu I)V_{k+1} - V_{k+1}(H_{k+1} - \mu I) = f_{k+1} e_{k+1}^T$,    (3.1)
    $(A - \mu I)V_{k+1} - V_{k+1} QR = f_{k+1} e_{k+1}^T$,
    $(A - \mu I)(V_{k+1} Q) - (V_{k+1} Q)(RQ) = f_{k+1} e_{k+1}^T Q$,
    $A(V_{k+1} Q) - (V_{k+1} Q)(RQ + \mu I) = f_{k+1} e_{k+1}^T Q$.    (3.2)


The matrices are updated via $V_{k+1}^+ \leftarrow V_{k+1} Q$ and $H_{k+1}^+ \leftarrow RQ + \mu I$, and the latter matrix remains upper Hessenberg since $R$ is upper triangular and $Q$ is upper Hessenberg. However, equation (3.2) is not quite a legitimate Arnoldi factorization: it fails to be one because the matrix $f_{k+1} e_{k+1}^T Q$ has a non-zero $k$-th column. Partitioning the matrices in the updated equation results in

    $A [\, V_k^+ \;\; v_{k+1}^+ \,] = [\, V_k^+ \;\; v_{k+1}^+ \,] \begin{bmatrix} H_k^+ & h_{k+1}^+ \\ \beta_{k+1}^+ e_k^T & \alpha_{k+1}^+ \end{bmatrix} + f_{k+1} [\, \sigma_k e_k^T \;\; \gamma_k \,]$,    (3.3)

where $\sigma_k = e_{k+1}^T Q e_k$ and $\gamma_k = e_{k+1}^T Q e_{k+1}$. Equating the first $k$ columns of (3.3) gives

    $AV_k^+ = V_k^+ H_k^+ + (\beta_{k+1}^+ v_{k+1}^+ + \sigma_k f_{k+1}) e_k^T$.    (3.4)

Performing the update $f_k^+ \leftarrow \beta_{k+1}^+ v_{k+1}^+ + \sigma_k f_{k+1}$ and noting that $(V_k^+)^T f_k^+ = 0$, it follows that equation (3.4) is a length $k$ Arnoldi factorization.

We now show that the IRA-iteration is equivalent to forming the leading portion of an implicitly shifted QR-iteration. Note that equations (3.1)-(3.2) are valid for $1 \le k \le n$. In particular, extending the factorization of equation (3.1) by $n - k$ steps gives $f_n = 0$, and $AV_n - V_n H_n = 0$ defines a decomposition of $A$ into upper Hessenberg form. Let $Q_n R_n = H_n - \mu I$ where $Q_n$ and $R_n$ are orthogonal and upper triangular matrices of order $n$, respectively. Since $Q$ and $R$ are the leading principal sub-matrices of order $k+1$ of $Q_n$ and $R_n$, respectively, $V_n Q_n R_n e_1 = V_{k+1} QR e_1$ and $e_1^T R_n e_1 = e_1^T R e_1$ follow. Post-multiplication of equation (3.2) with $e_1$ exposes the relationship

    $(A - \mu I)v_1 = V_{k+1} Q e_1 \rho_{11} = V_n Q_n e_1 \rho_{11} = v_1^+ \rho_{11}$,

where $\rho_{11} = e_1^T R e_1$, $v_1 = V_{k+1} e_1$ and $V_{k+1}^+ e_1 = v_1^+$. In words, the first column of the updated $k$-step factorization matrix is the same as the first column of the orthogonal matrix obtained after a complete QR step on $A$ with shift $\mu$. Thus, the IRA-iteration may be viewed as a truncated version of the standard implicitly shifted QR-iteration. This idea may be extended for up to $p > 1$ shifts [34]. One cycle of the iteration is pictured in Figures 3.1-3.3.
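The single-shift cycle of equations (3.1)-(3.4) can be sketched in a few lines of dense numpy. This is an illustration with assumed data (matrix, dimensions, seed, and shift are arbitrary), not the authors' implementation:

```python
import numpy as np

def implicit_restart_step(V, H, f, mu):
    """One implicitly shifted restart, following equations (3.1)-(3.4).

    Given a length-(k+1) Arnoldi factorization A V = V H + f e_{k+1}^T and a
    real shift mu, return a length-k factorization whose starting vector has
    been implicitly replaced by a multiple of (A - mu I) v_1.
    """
    m = H.shape[0]                            # m = k + 1
    Q, R = np.linalg.qr(H - mu * np.eye(m))
    Hp = R @ Q + mu * np.eye(m)               # similarity transform of H
    Vp = V @ Q
    beta = Hp[m - 1, m - 2]                   # beta_{k+1}^+ of eq. (3.3)
    sigma = Q[m - 1, m - 2]                   # sigma_k = e_{k+1}^T Q e_k
    fnew = beta * Vp[:, m - 1] + sigma * f    # updated residual, eq. (3.4)
    return Vp[:, : m - 1], Hp[: m - 1, : m - 1], fnew

# demonstration: build a length-7 factorization inline, then restart once
rng = np.random.default_rng(4)
n, m = 20, 7
A = rng.standard_normal((n, n))
V = np.zeros((n, m)); H = np.zeros((m, m)); f = rng.standard_normal(n)
v1 = f / np.linalg.norm(f)                    # starting vector, kept for checking
for j in range(m):
    V[:, j] = f / np.linalg.norm(f)
    if j > 0:
        H[j, j - 1] = np.linalg.norm(f)
    w = A @ V[:, j]
    h = V[:, : j + 1].T @ w
    f = w - V[:, : j + 1] @ h
    c = V[:, : j + 1].T @ f                   # re-orthogonalization pass
    f -= V[:, : j + 1] @ c
    H[: j + 1, j] = h + c
mu = 0.5                                      # an assumed real shift
Vk, Hk, fk = implicit_restart_step(V, H, f, mu)
```

After the step, the triple again satisfies $AV_k = V_k H_k + f_k e_k^T$ with orthonormal $V_k$, and the new first column is proportional to $(A - \mu I)v_1$, which is the polynomial-filter reading of implicit restarting discussed above.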
Application of the shifts may be performed implicitly as in the QR algorithm. If the shifts occur in complex conjugate pairs, then the implicit double shift can be used to avoid complex arithmetic.

Numerous choices are possible for the selection of the $p$ shifts. One immediate choice is to use the $p$ unwanted eigenvalues of $H_{k+p}$. In exact arithmetic, the last $p$ off-diagonal elements of $H_{k+p}$ are then zero and the Arnoldi factorization decouples. For example, in equation (3.4), $\beta_{k+1}^+ = 0$ when $\mu$ is an eigenvalue of $H_k$. The reader is referred to [7, 23, 34] for further information.

The number of shifts to apply at each cycle of the above iteration is problem dependent. At present there is no a priori analysis to guide the selection of $p$ relative to $k$. The only formal requirement is that $1 \le p \le n - k$. However, computational experience suggests that $p \ge k$ is preferable. If many problems of the same type are to be solved, experimentation with $p$ for a fixed $k$ should be undertaken. Increasing $p$ usually decreases the required number of matrix-vector operations but increases the work and storage required to maintain the orthogonal basis vectors. The optimal cross-over with respect to CPU time varies and must be determined empirically. Lehoucq makes a connection with subspace iteration in Chapter 8 of [23]. There has been considerable


experience with subspace iteration, and this connection may eventually shed light on how to select $p$ relative to $k$. For example, it is well known that performing subspace iteration on a subspace of dimension larger than the number of eigenvalues required typically leads to improved convergence rates; see the paper of Duff and Scott [12] for a discussion and further references.

Among the several advantages an implicit updating scheme possesses are:

- Fixed storage requirements.
- The ability to maintain a prescribed level of orthogonality for the columns of $V$ since $k$ is of modest size.
- Application of the matrix polynomial $v_1^+ \leftarrow \psi(A) v_1$, where the roots of $\psi$ are the shifts, without the need to apply additional matrix-vector products with $A$.
- The incorporation of the well-understood numerical and theoretical behavior of the QR algorithm.

The last two points warrant further discussion. Quite often, the dominant cost during Arnoldi iterations is the matrix-vector products with $A$. Thus, the IRA-iteration may result in a substantial reduction in time when building a length $k + p$ Arnoldi factorization. The last point is important since it allows the possibility of constructing general purpose and reliable software for the large scale eigenvalue problem.

4. Deflation within an IRA-iteration. As the iteration progresses, the Ritz estimates (2.2) decrease at different rates. When a Ritz estimate is small enough, the corresponding Ritz value is said to have converged. The converged Ritz value may be wanted or unwanted. In either case, a mechanism to deflate the converged Ritz value from the current factorization is desired. Depending on whether the converged Ritz value is wanted or not, it is useful to define two types of deflation. Before we do this, it will prove helpful to illustrate how deflation is achieved. Suppose that after $m$ steps of the Arnoldi algorithm we have

    $A [\, V_1 \;\; V_2 \,] = [\, V_1 \;\; V_2 \,] \begin{bmatrix} H_1 & G \\ \beta e_1 e_j^T & H_2 \end{bmatrix} + f e_m^T$,    (4.1)

where $V_1 \in \mathbb{R}^{n \times j}$, $H_1 \in \mathbb{R}^{j \times j}$ for $1 \le j < m$.
If $\beta$ is suitably small, then the factorization decouples in the sense that a Ritz pair $(y, \theta)$ for $H_1$ provides an approximate eigenpair $(x = V_1 y, \theta)$ with a Ritz estimate of $|\beta e_j^T y|$. Setting $\beta$ to zero splits a nearby problem exactly, and setting $\beta = 0$ is called deflation. If $\beta$ is suitably small, then all the eigenvalues of $H_1$ may be regarded as converged Ritz values.

4.1. Locking. If deflation has taken place, the column vectors in $V_1$ are considered locked. This means that subsequent implicit restarting is done on the basis $V_2$. The sub-matrices affected during implicit restarting are $G$, $H_2$ and $V_2$. However, during the phase of the iteration that extends the Arnoldi factorization from $k$ to $k + p$ steps, all of the columns of $[\, V_1 \;\; V_2 \,]$ participate just as if no deflation had occurred. This assures that all of the new Arnoldi basis vectors are orthogonalized against converged Ritz vectors and prevents the introduction of spurious eigenvalues into the subsequent iteration.

After deflation, equating the last $m - j$ columns of (4.1) results in $(I - V_1 V_1^T) A V_2 = V_2 H_2 + f e_{m-j}^T$. Thus, deflating $V_1$ and $H_1$ from the factorization defines a new Arnoldi factorization with the matrix $(I - V_1 V_1^T) A$ and starting vector $V_2 e_1$. This equivalence was noted by Saad [31, page 182]. Moreover, this provides a means to safely compute multiple eigenvalues when they are present. A block method is not required if deflation and locking are used. The concept of locking was introduced by Jennings and Stewart [37] as a deflation technique for simultaneous iteration.


4.2. Purging. If deflation has occurred but some of the deflated Ritz values are unwanted, then a further mechanism, purging, must be introduced to remove the unwanted Ritz values and corresponding vectors from the factorization. The basic idea of purging is perhaps best explained with the case of a single deflated Ritz value. Let $j = 1$ in (4.1) and equate the first columns of both sides to obtain

    $Av_1 = v_1 \theta_1 + \beta V_2 e_1$,    (4.2)

where $v_1 = V_1 e_1$ and $H_1 = \theta_1$. Equation (4.2) is an Arnoldi factorization of length one. The Ritz value $\theta_1$ has Ritz estimate $|\beta|$.

Equating the last $m - 1$ columns of (4.1) results in

    $AV_2 = V_1 G + V_2 H_2 + f e_{m-1}^T$.    (4.3)

Suppose that $\theta_1$ represents an unwanted Ritz value. If $A$ were symmetric, then $G = \beta e_1^T$ and equation (4.3) would become

    $(A + E) V_2 = V_2 H_2 + f e_{m-1}^T$,

where $E = -\beta v_1 (V_2 e_1)^T - \beta (V_2 e_1) v_1^T$. Since $\|E\| = \beta$, equation (4.3) defines a length $m - 1$ Arnoldi factorization for a nearby problem. The unwanted Ritz pair $(v_1, \theta_1)$ may be purged from the factorization simply by taking $V = V_2$ and $H = H_2$ and setting $G = 0$ in (4.3). If $A$ is not symmetric, the $1 \times (m-1)$ matrix $G$ couples $v_1$ to the rest of the basis vectors $V_2$. This vector may be decoupled using the standard Sylvester equation approach [15, pages 386-387]. Purging then takes place as in the symmetric case. However, the new set of basis vectors must be re-orthogonalized in order to return to an Arnoldi factorization. This procedure is developed in § 5 and § 6, including the case of purging several vectors.

4.3. Complications. An immediate question is: do any sub-diagonal elements in the Hessenberg matrix of the factorization (4.1) become negligible as an IRA-iteration progresses? Since a cycle of the Arnoldi iteration involves performing a sequence of QR steps, the question is answered by considering the behavior of the QR-iteration upon upper Hessenberg matrices.
In exact arithmetic, under the assumption that the Hessenberg matrix is unreduced, only the last sub-diagonal element may become zero when shifting. But the other sub-diagonal elements may become arbitrarily small.

In addition, in exact arithmetic, the purging technique would not be necessary, as the implicit shift technique would accomplish the removal of the unwanted Ritz pairs from the leading portion of the iteration. For example, using the unwanted Ritz values as shifts accomplishes this removal.

Computing in finite precision arithmetic complicates the situation. A robust implementation of the QR algorithm sets a sub-diagonal element to zero if it is in magnitude less than some prescribed threshold, and this technique is also adopted for deflation. This deflation overcomes the technical difficulty associated with tiny sub-diagonals and improves the convergence of the IRA-iteration. In addition, it may be impossible to accomplish the removal of the unwanted Ritz values from the leading portion of the iteration due to the forward instability [27, 39] of the QR algorithm.

The phenomenon of the forward instability of the tridiagonal QR-iteration [27] was initially explored by Parlett and Le. They observe that while the implicitly shifted QR-iteration is always backward stable, there are cases where severe forward instability can occur. It is possible for a QR-iteration to result in a computed Hessenberg matrix with entries that have no significant digits in common with the corresponding entries of the Hessenberg matrix that would have been determined in exact arithmetic. The implication is that the computed sub-diagonal entries may not be reliable indicators for decoupling the Arnoldi factorization. Le and Parlett's analysis formally implies that the computed Hessenberg matrix may lose significant digits when the shift used is nearly an eigenvalue of $H$ and the last component of the normalized eigenvector is small. We also mention the work of Watkins [39], which investigates the transmission of the shift through $H$ during a QR step.

Since convergence of a Ritz value is predicated upon the associated Ritz estimate being small, using shifts that are near these converged values may force the IRA-iteration to undergo forward instability. This indicates that it may be impossible to filter out unwanted eigenvalues with the implicit restarting technique, and this is the motivation for developing both the locking and purging techniques. Further details may be found in Chapter 5 of [23].

5. Deflating Converged Ritz Values. During an Arnoldi iteration, a Ritz value may be near an eigenvalue of $A$ with no small elements appearing on the sub-diagonal of $H_k$. However, when a Ritz value converges, it is always possible to make an orthogonal change of basis in which the appropriate sub-diagonal of $H_k$ is zero. The following result indicates how to exploit the convergence information available in the last row of the eigenvector matrix for $H_k$. For notational convenience, all subscripts are dropped on the Arnoldi matrices $V$, $H$ and $f$ for the remainder of this section.

Lemma 5.1. Let $Hy = y\theta$, where $H \in \mathbb{R}^{k \times k}$ is an unreduced upper Hessenberg matrix and $\theta \in \mathbb{R}$ with $\|y\| = 1$. Let $W$ be a Householder matrix such that $Wy = e_1 \eta$ where $\eta = -\mathrm{sign}(e_1^T y)$. Then

    $e_k^T W = e_k^T + w^T$,    (5.1)

where $\|w\| \le \sqrt{2} \, |e_k^T y|$, and

    $W^T H W e_1 = e_1 \theta$.    (5.2)
Proof. The required Householder matrix has the form

    $W = I - \gamma (y - \eta e_1)(y - \eta e_1)^T$,

where $\gamma = (1 + |e_1^T y|)^{-1}$ and $\eta = -\mathrm{sign}(e_1^T y)$. A direct computation reveals that

    $e_k^T W = e_k^T + w^T$,    (5.3)

where $w^T = \gamma \, e_k^T y \, (\eta e_1^T - y^T)$. Estimating

    $\|w\| = \dfrac{|e_k^T y|}{1 + |e_1^T y|} \, \|y - \eta e_1\| = \dfrac{|e_k^T y|}{1 + |e_1^T y|} \sqrt{2(1 + |e_1^T y|)} \le \sqrt{2} \, |e_k^T y|$

establishes the bound on $\|w\|$. The final assertion (5.2) follows from

    $W^T H W e_1 = \eta^{-1} W^T H y = \eta^{-1} \theta \, W^T y = \eta^{-1} \theta \, W y = \theta e_1$,

using $W^T = W$ together with $W e_1 = \eta^{-1} y$, which holds since $W^2 = I$ and $Wy = \eta e_1$.
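The construction in the proof is easy to check numerically. The sketch below uses assumed data: a random symmetric tridiagonal $H$, which is unreduced upper Hessenberg with real eigenpairs, so the hypotheses of Lemma 5.1 hold.

```python
import numpy as np

# Numerical check of Lemma 5.1 with assumed data: H symmetric tridiagonal
# (hence unreduced upper Hessenberg with real eigenpairs), y a unit eigenvector.
rng = np.random.default_rng(1)
k = 10
d = rng.standard_normal(k)
off = rng.standard_normal(k - 1) + 2.0          # sub-diagonals bounded away from 0
H = np.diag(d) + np.diag(off, 1) + np.diag(off, -1)
evals, evecs = np.linalg.eigh(H)
theta, y = evals[0], evecs[:, 0]                # an eigenpair, ||y|| = 1

eta = -1.0 if y[0] >= 0 else 1.0                # eta = -sign(e_1^T y)
gamma = 1.0 / (1.0 + abs(y[0]))
u = y.copy()
u[0] -= eta                                     # u = y - eta e_1
W = np.eye(k) - gamma * np.outer(u, u)          # the Householder matrix of the proof
w = W[k - 1, :] - np.eye(k)[k - 1, :]           # e_k^T W = e_k^T + w^T
e1 = np.eye(k)[:, 0]
```

To roundoff, $Wy = \eta e_1$, $W^T H W e_1 = \theta e_1$, and $\|w\| \le \sqrt{2}\,|e_k^T y|$: the similarity decouples the converged Ritz value while the last row of the basis change stays within $O(|e_k^T y|)$ of $e_k^T$.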


Lemma 5.1 indicates that the last row and column of $W$ differ from the last row and column of $I_k$ by terms of order $|e_k^T y|$. The Ritz estimate (2.2) will indicate when it is safe to deflate the corresponding Ritz value $\theta$. Rewriting (2.1) as

    $AVW = VW \, W^T H W + f e_k^T W$,

and using both (5.1) and (5.2) and partitioning, we obtain

    $AVW = VW \begin{bmatrix} \theta & \bar{h}^T \\ 0 & \bar{H} \end{bmatrix} + f e_k^T + f w^T$.    (5.4)

Equation (5.4) is not an Arnoldi factorization. In order to return to an Arnoldi factorization, the matrix $\bar{H}$ of order $k - 1$ needs to be returned to upper Hessenberg form and the term $f w^T$ dropped. Care must be taken not to disturb the matrix $f e_k^T$ and the first column of $W^T H W$. To start the process we compute a Householder matrix $Y_1$ such that

    $Y_1^T \bar{H} Y_1 = \begin{bmatrix} \bar{G} & \bar{g} \\ \bar{\beta} e_{k-2}^T & \bar{\alpha} \end{bmatrix}$, with $e_{k-1}^T Y_1 = e_{k-1}^T$.

The above idea is repeated, resulting in Householder matrices $Y_1, Y_2, \ldots, Y_{k-3}$ that return $\bar{H}$ to upper Hessenberg form. Defining

    $Y = \begin{bmatrix} 1 & 0 \\ 0 & Y_1 Y_2 \cdots Y_{k-3} \end{bmatrix}$,

it follows by the construction of the $Y_j$ that $e_k^T Y = e_k^T$ and

    $Y^T W^T H W Y e_1 = \theta e_1$.    (5.5)

The process of computing a similarity transformation as in equation (5.5) is not new. Wilkinson discusses similar techniques in [40, pages 587-596]. Wilkinson references the work of Feller and Forsythe [13], who appear to be the first to use elementary Householder transformations for deflation. Problem 7.4.8 of [15, page 371] addresses the case of working with upper Hessenberg matrices. What appears to be new is the application to the Arnoldi factorization for converged Ritz values.

Since $\|f w^T Y\| = \|f\| \, \|Y^T w\| = \|f\| \, \|w\|$, the size of $\|f w^T\|$ remains unchanged. Making the updates

    $V \leftarrow VWY$, $\quad H \leftarrow Y^T W^T H W Y$, $\quad w^T \leftarrow w^T Y$,

we obtain the relation

    $AV = VH + f e_k^T + f w^T$.    (5.6)

A deflated Arnoldi factorization is obtained from equation (5.6) by discarding the term $f w^T$. The following theorem shows that the deflated Arnoldi factorization resulting from this scheme is an exact $k$-step factorization of a nearby matrix.

Theorem 5.2. Let an Arnoldi factorization of length $k$ be given by (5.6), where $Hy = y\theta$ and $\sqrt{2} \, |e_k^T y| \, \|f\| \le \epsilon \|A\|$ for some $\epsilon > 0$. Then there exists a matrix $E \in \mathbb{R}^{n \times n}$ such that

    $(A + E)V = VH + f e_k^T$,    (5.7)


where $\|E\| \le \epsilon \|A\|$.

Proof. Subtract $f w^T$ from both sides of equation (5.6). Set $E = -f (Vw)^T$; then

    $EV = -f (Vw)^T V = -f w^T$,

and equation (5.7) follows. Using Lemma 5.1 gives

    $\|E\| = \|f\| \, \|w\| = \sqrt{2} \, |e_k^T y| \, \|f\| \le \epsilon \|A\|$.

If $A$ is symmetric, then the choice $E = -f (Vw)^T - (Vw) f^T$ results in a symmetric perturbation. If $\epsilon$ is on the order of unit roundoff, then the deflation scheme introduces a perturbation of the same order as those already present from computing the Arnoldi factorization in floating point arithmetic.

Once a converged Ritz value $\theta$ is deflated, the Arnoldi vector corresponding to $\theta$ is locked or purged as described in the previous section. The only difficulty that remains is purging when $A$ is nonsymmetric.

If $A$ is not symmetric, then the Ritz pair may not be purged immediately because of the presence of $\bar{h}$. A standard reduction of $H$ to block diagonal form is used. If $\theta$ is not an eigenvalue of $\bar{H}$, then we may construct a vector $z \in \mathbb{R}^{k-1}$ so that

    $\begin{bmatrix} \theta & \bar{h}^T \\ & \bar{H} \end{bmatrix} \begin{bmatrix} 1 & z^T \\ & I_{k-1} \end{bmatrix} = \begin{bmatrix} 1 & z^T \\ & I_{k-1} \end{bmatrix} \begin{bmatrix} \theta & \\ & \bar{H} \end{bmatrix}$.    (5.8)

Solving the linear system

    $(\bar{H}^T - \theta I_{k-1}) z = \bar{h}$    (5.9)

determines $z$. Define

    $Z \equiv \begin{bmatrix} 1 & z^T \\ & I_{k-1} \end{bmatrix}$.

Post-multiplication of equation (5.6) by $Z$ results in

    $AVZ = VZ \begin{bmatrix} \theta & \\ & \bar{H} \end{bmatrix} + f e_k^T + f w^T Z$,

since $e_k^T Z = e_k^T$. Equating the last $k - 1$ columns of the previous expression results in

    $AV \begin{bmatrix} z^T \\ I_{k-1} \end{bmatrix} = V \begin{bmatrix} z^T \\ I_{k-1} \end{bmatrix} \bar{H} + f e_{k-1}^T + f w^T \begin{bmatrix} z^T \\ I_{k-1} \end{bmatrix}$.    (5.10)

Compute the factorization (using $k - 1$ Givens rotations)

    $QR = \begin{bmatrix} z^T \\ I_{k-1} \end{bmatrix}$,    (5.11)


where $Q \in \mathbb{R}^{k\times(k-1)}$ with $Q^TQ = I_{k-1}$ and $R$ is an upper triangular matrix of order $k-1$. Since the last $k-1$ columns of $Z$ are linearly independent, $R$ is nonsingular. Post-multiplying equation (5.10) by $R^{-1}$ gives

$$ AVQ = VQ\,R\bar{H}R^{-1} + \rho_{k-1}^{-1} f e_{k-1}^T + f w^T Q, \tag{5.12} $$

where $\rho_{k-1} = e_{k-1}^T R e_{k-1}$. The last term $f w^T Q$ in (5.12) is discarded by the deflation scheme, and this relation shows that the discarded term is not magnified in norm by the purging procedure. The matrix $R\bar{H}R^{-1}$ remains upper Hessenberg since $R$ is upper triangular.

Partitioning $Q$ conformally with the right side of equation (5.11) results in

$$ \begin{bmatrix} q_{11}^T \\ Q_{21} \end{bmatrix} R = \begin{bmatrix} z^T \\ I_{k-1} \end{bmatrix}, $$

and it follows that $R^{-1} = Q_{21}$. Using the Cauchy–Schwarz inequality, it follows that $|\rho_{k-1}^{-1}| = |e_{k-1}^T Q_{21} e_{k-1}| \le 1$, and hence the Arnoldi residual is not amplified by the purging. The final purged Arnoldi factorization is

$$ AVQ = VQ\,R\bar{H}Q_{21} + \rho_{k-1}^{-1} f e_{k-1}^T. \tag{5.13} $$

Performing the set of updates

$$ V \leftarrow VQ, \qquad H \leftarrow R\bar{H}Q_{21}, \qquad f \leftarrow \rho_{k-1}^{-1} f, $$

defines equation (5.13) as an Arnoldi factorization of length $k-1$. Theorem 5.2 implies this is an Arnoldi factorization for a nearby matrix. It is easily verified that $V^T f (e_{k-1}^T + w^T) = 0$ and that $H$ is an upper Hessenberg matrix of order $k-1$. Moreover, since the updated $H$ is a block diagonal submatrix of a similarity transformation of the original $H$, the remaining Ritz values are unchanged. Since the term $f w^T$ is discarded, the Ritz estimates given by the updated Arnoldi factorization for the remaining Ritz values will be slightly inaccurate. Lemma 5.1 and the fact that $\|R^{-1}\| \le 1$ may be used to show that the errors in these estimates are bounded above by $\|f\|(\sqrt{2}\,|e_k^T y|)$. If $w = 0$ then the Ritz estimates for the updated factorization would be exactly the same as the Ritz residuals and estimates for the original one.

6. A Practical Deflating Procedure for the Arnoldi Factorization. The practical issues associated with a numerically stable deflating procedure are addressed in this section. These include:
1. Performing the deflation in real arithmetic when a converged Ritz value has a non-zero imaginary component.
2. Deflation with more than one converged Ritz value.
3. Error analysis.
Section 6.2 presents two algorithms that implement the deflation schemes. The error analysis of the two deflation schemes is presented in the next section.

6.1. Deflation with Real Arithmetic. Suppose $H(y + iz) = (\alpha + i\beta)(y + iz)$, where $y$ and $z$ are unit vectors in $\mathbb{R}^k$, $H \in \mathbb{R}^{k\times k}$ and $\beta \ne 0$. It then follows that

$$ H \begin{bmatrix} y & z \end{bmatrix} = \begin{bmatrix} y & z \end{bmatrix} \begin{bmatrix} \alpha & \beta \\ -\beta & \alpha \end{bmatrix} \equiv \begin{bmatrix} y & z \end{bmatrix} C. $$

Thus, we may deflate a complex Ritz value in real arithmetic if $|e_k^T y|$ and $|e_k^T z|$ are small enough.
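A quick numerical check of this identity (a hypothetical NumPy sketch, not from the paper; the planted eigenvalues and the random seed are arbitrary):

```python
import numpy as np

# Plant the eigenvalues 2 +/- 3i in a real matrix via a 2x2 rotation-like
# block, then verify H [y z] = [y z] C for the real and imaginary parts
# of the complex eigenvector.
alpha, beta = 2.0, 3.0
core = np.diag([1.0, -1.0, 0.5, 4.0])
core[:2, :2] = [[alpha, beta], [-beta, alpha]]
Q, _ = np.linalg.qr(np.random.default_rng(0).standard_normal((4, 4)))
H = Q @ core @ Q.T                     # real matrix with eigenvalues 2 +/- 3i

lam, X = np.linalg.eig(H)
j = int(np.argmax(lam.imag))           # index of the eigenvalue 2 + 3i
y, z = X[:, j].real.copy(), X[:, j].imag.copy()

C = np.array([[lam[j].real, lam[j].imag],
              [-lam[j].imag, lam[j].real]])
assert np.allclose(H @ np.column_stack([y, z]), np.column_stack([y, z]) @ C)
```

The assertion verifies that the pair is carried entirely by the real $2\times 2$ block $C$, so no complex arithmetic is needed.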

Suppose that $H$ corresponds to an Arnoldi factorization of length $k$ and that $|e_k^T y| = O(\epsilon) = |e_k^T z|$. Factor

$$ \begin{bmatrix} y & z \end{bmatrix} = U \begin{bmatrix} T \\ 0 \end{bmatrix}, \tag{6.1} $$

where $U^T U = I_k$ and $T$ is an upper triangular matrix. It is easily shown that $y$ and $z$ are linearly independent as vectors in $\mathbb{R}^k$ since $\beta \ne 0$, and the nonsingularity of $T$ follows. Performing a similarity transformation on $H$ with $U$ gives

$$ U^T H U \begin{bmatrix} e_1 & e_2 \end{bmatrix} = \begin{bmatrix} TCT^{-1} \\ 0 \end{bmatrix}. $$

In order to deflate the complex conjugate pair of eigenvalues from the factorization in an implicit manner, we require that $e_k^T U = e_k^T + u^T$ where $\|u\| = O(\epsilon)$.

We now show that the magnitudes of the last components of $y$ and $z$ are not sufficient to guarantee the required form for $U$. Suppose that $z = y\cos\phi + r\sin\phi$, where $r$ is a unit vector orthogonal to $y$ and $\phi$ measures the positive angle between $y$ and $z$. Lemma 5.1 implies that a Householder matrix $W$ may be constructed such that

$$ W^T \begin{bmatrix} y & z \end{bmatrix} = \begin{bmatrix} \gamma_1 e_1 & \gamma_1 e_1 \cos\phi + W^T r \sin\phi \end{bmatrix} \equiv \begin{bmatrix} \gamma_1 & \mu \\ 0 & \hat z \end{bmatrix}, $$

where $\gamma_1 = \pm 1$ and the last column and row of $W$ and $I_k$ are the same up to the order of $e_k^T y$. To compute the required orthogonal factorization in equation (6.1) another Householder matrix $Q = \begin{bmatrix} 1 & 0 \\ 0 & \hat Q \end{bmatrix}$ is needed so that $\hat Q^T \hat z = \pm\|\hat z\| e_1$. But Lemma 5.1 only results in $e_{k-1}^T \hat Q = e_{k-1}^T + q^T$ with $\|q\| = O(\epsilon)$ if $e_{k-1}^T \hat z$ is small relative to $\|\hat z\|$. Unfortunately, if $\phi$ is small, $W^T z \approx \gamma_1 e_1$ and $\|\hat z\| \approx \phi$. Hence we cannot obtain the required form for $U = WQ$.

Fortunately, when $y$ and $z$ are nearly aligned, $\beta$ may be neglected, as the following result demonstrates.

Lemma 6.1. Let $H(y + iz) = (\alpha + i\beta)(y + iz)$, where $y$ and $z$ are unit vectors in $\mathbb{R}^k$, $H \in \mathbb{R}^{k\times k}$ and $\beta \ne 0$. Suppose that $\phi$ measures the positive angle between $y$ and $z$. Then

$$ |\beta| \le \sin\phi\,\|H\|. \tag{6.2} $$

Proof. Let $z = y\cos\phi + r\sin\phi$, where $r$ is a unit vector orthogonal to $y$ and $\phi$ measures the positive angle between $y$ and $z$. Equating real and imaginary parts of $H(y + iz) = (\alpha + i\beta)(y + iz)$ results in $Hy = y\alpha - z\beta$ and $Hz = y\beta + z\alpha$. The desired estimate follows since

$$ 2\beta = y^T H z - z^T H y = \sin\phi\,(y^T H r - r^T H y) $$

results in $|\beta| \le \sin\phi\,\|H\|$.

For small $\phi$, $y$ and $z$ are almost parallel eigenvectors of $H$ corresponding to a nearly multiple eigenvalue. Numerically, we set $\beta$ to zero and deflate one copy of $\alpha$ from the Arnoldi factorization.
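Lemma 6.1 is easy to verify numerically. The sketch below (hypothetical, not from the paper) rescales a computed complex eigenvector by a phase so that its real and imaginary parts have equal norm, normalizes them to unit length as the lemma requires, and checks $|\beta| \le \sin\phi\,\|H\|$:

```python
import numpy as np

# Real matrix with the complex pair 1 +/- 0.25i planted in a 2x2 block.
alpha, beta = 1.0, 0.25
core = np.diag([3.0, -2.0, 0.5, 4.0])
core[:2, :2] = [[alpha, beta], [-beta, alpha]]
Q, _ = np.linalg.qr(np.random.default_rng(1).standard_normal((4, 4)))
H = Q @ core @ Q.T

lam, X = np.linalg.eig(H)
j = int(np.argmax(lam.imag))
v = X[:, j]

# Phase rotation equalizing the norms of the real and imaginary parts,
# then normalization, so that y and z are unit vectors as in the lemma.
a, b = v.real, v.imag
theta = 0.5 * np.arctan2(a @ a - b @ b, 2.0 * (a @ b))
w = np.exp(1j * theta) * v
y = w.real / np.linalg.norm(w.real)
z = w.imag / np.linalg.norm(w.imag)

sin_phi = np.sqrt(max(0.0, 1.0 - float(y @ z) ** 2))
assert abs(lam[j].imag) <= sin_phi * np.linalg.norm(H, 2) + 1e-12
```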

A computable bound on the size of the angle $\phi$ is now determined using only the real and imaginary parts of the eigenvector. The second Householder matrix $\hat Q$ should not be computed if

$$ |e_{k-1}^T \hat z| > \|\hat z\|\,|e_k^T z|. \tag{6.3} $$

Recall that Lemma 5.1 gives $e_k^T W = e_k^T + w^T$, where $w^T = e_k^T y\,\gamma(\pm e_1^T - y^T)$ and $\gamma = (1 + |e_1^T y|)^{-1}$. Thus

$$ e_{k-1}^T \hat z = e_k^T W^T z = e_k^T W z = e_k^T z + w^T z, $$

where the symmetry of $W$ is used. The estimate

$$ \|\hat z\| = \left\| \begin{bmatrix} 0 & \hat z^T \end{bmatrix}^T \right\| = \|W^T r\| \sin\phi = \sin\phi $$

follows since $W$ is orthogonal and $r$ is a unit vector. Rewriting equation (6.3), we obtain

$$ \sin\phi < \left| \frac{e_k^T z + w^T z}{e_k^T z} \right| = \left| 1 + \frac{w^T z}{e_k^T z} \right| = \left| 1 + \gamma(\pm e_1^T z - y^T z)\,\frac{e_k^T y}{e_k^T z} \right| \tag{6.4} $$

as our computable bound.

Suppose that $HX = XD$, where $X \in \mathbb{R}^{k\times j}$ and $D$ is a quasi-diagonal matrix. The eigenvalues of $H$ are on the diagonal of $D$ if they have zero imaginary component and in blocks of two for the complex conjugate pairs. The columns of $X$ span the eigenspace corresponding to the diagonal values of $D$. For the blocks of order two on the diagonal, the corresponding complex eigenvector is stored in two consecutive columns of $X$, the first holding the real part and the second the imaginary part. If we want to block deflate $X$, where the last row is small, from $H$, we could proceed as follows. Compute the orthogonal factorization $X = Q \begin{bmatrix} R \\ 0 \end{bmatrix}$ via Householder reflectors, where $Q^T Q = I_k$ and $R \in \mathbb{R}^{j\times j}$ is upper triangular. Then the last row and column of $Q$ differ from those of $I_k$ by terms on the same order as the entries in the last row of $X$, provided the condition number of $R$ is modest. Thus, if the columns of $X$ are not almost linearly dependent, an appropriate $Q$ may be determined. Finally, we note that when $H$ is a symmetric tridiagonal matrix, an appropriate $Q$ may always be determined.

6.2. Algorithms for Deflating Converged Ritz Values. The two procedures presented in this section extend the ideas of § 4 to provide deflation of more than one converged Ritz value at a time. The first purges the factorization of the unwanted converged Ritz values. The second locks the Arnoldi vectors corresponding to the desired converged Ritz values. When both deflation algorithms are incorporated within an IRA-iteration, the locked vectors form a basis for an approximate invariant subspace of $A$. This truncated factorization is an approximate partial Schur decomposition. When $A$ is symmetric, the approximate Schur vectors are Ritz vectors and the upper quasi-triangular matrix is the diagonal matrix of Ritz values.

Partition a length $m$ Arnoldi factorization as

$$ A \begin{bmatrix} V_j & \bar V_{m-j} \end{bmatrix} = \begin{bmatrix} V_j & \bar V_{m-j} \end{bmatrix} \begin{bmatrix} H_j & G_j \\ 0 & \bar H_{m-j} \end{bmatrix} + f_m e_m^T + f w^T, \tag{6.5} $$

where $H_j$ and $\bar H_{m-j}$ are upper quasi-triangular and unreduced upper Hessenberg matrices, respectively. The matrix $H_j \in \mathbb{R}^{j\times j}$ contains the wanted converged Ritz values of the matrix $H_m$. The columns of $V_j \in \mathbb{R}^{n\times j}$ are the locked Arnoldi vectors that represent an approximate Schur basis for the invariant subspace of interest. The matrix $\bar H_{m-j}$ designates the trailing sub-matrix of order $m-j$. Analogously, the last $m-j$ columns of $V_m$ are denoted by $\bar V_{m-j}$. We shall refer to the last $m-j$ columns of (6.5) as the active part of the factorization. Finally, $G_j \in \mathbb{R}^{j\times(m-j)}$ denotes the sub-matrix in the north-east corner of $H_m$. Figure 6.1 illustrates the matrix product $V_m H_m$ of equation (6.5).

If $A$ is symmetric, the two deflation procedures simplify considerably. In fact, purging is only used when $A$ is nonsymmetric, for otherwise $G_j = 0_{j\times(m-j)}$ and both $H_j$ and $\bar H_{m-j}$ are symmetric tridiagonal matrices. Both algorithms are followed by remarks concerning some of the specific details.

Algorithm 6.2. function $[V_m, H_m, f_m]$ = Lock$(V_m, H_m, f_m, X_i, j)$

INPUT: A length $m$ Arnoldi factorization $A V_m = V_m H_m + f_m e_m^T$. The first $j$ columns of $V_m$ represent an approximate invariant subspace for $A$. The leading principal submatrix $H_j$ of order $j$ of $H_m$ is upper quasi-triangular and contains the converged Ritz values of interest. The columns of $X_i \in \mathbb{R}^{(m-j)\times i}$ are the eigenvectors corresponding to the eigenvalues that are to be locked.

OUTPUT: A length $m$ Arnoldi factorization defined by $V_m$, $H_m$ and $f_m$, where the first $j+i$ columns of $V_m$ are an approximate invariant subspace for $A$.

1. Compute the orthogonal factorization
$$ Q \begin{bmatrix} R_i \\ 0_{m-j-i} \end{bmatrix} = X_i, $$
where $Q \in \mathbb{R}^{(m-j)\times(m-j)}$, using Householder matrices;
2. Update the factorization: $\bar H_{m-j} \leftarrow Q^T \bar H_{m-j} Q$; $\bar V_{m-j} \leftarrow \bar V_{m-j} Q$; $G_j \leftarrow G_j Q$;
3. Compute an orthogonal matrix $P \in \mathbb{R}^{(m-j-i)\times(m-j-i)}$, using Householder matrices, that restores $\bar H_{m-j-i}$ to upper Hessenberg form;
4. Update the factorization: $\bar H_{m-j-i} \leftarrow P^T \bar H_{m-j-i} P$; $\bar V_{m-j-i} \leftarrow \bar V_{m-j-i} P$; $G_{j+i} \leftarrow G_{j+i} P$;

Line 1 computes an orthogonal basis for the eigenvectors of $\bar H_{m-j}$ that correspond to the Ritz estimates that are converged. The matrix of eigenvectors in line 1 satisfies the equation $\bar H_{m-j} X_i = X_i D_i$, where $D_i$ is a quasi-diagonal matrix containing the eigenvalues to be locked. From § 6.1, we see that the leading sub-matrix of $Q^T \bar H_{m-j} Q$ of order $i$ is upper quasi-triangular. The required relation $e_m^T Q = e_m^T + q^T$, with $\|q\|$ small, is guaranteed if the condition number of $R_i$ is modest. Since $i$ is typically a small number, we compute the condition number of $R_i$. The number of vectors to be locked is assumed to be such that the condition number of $R_i$ is small. In particular, if $H_m$ is a symmetric tridiagonal matrix, $Q$ always has the required form. Lines 3–4 return the updated $\bar H_{m-j}$ to upper Hessenberg form.

Before entering Purge, the unwanted converged Ritz pairs are placed at the front of the factorization. A prior call to Lock places the unwanted values and vectors to the
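In NumPy/SciPy terms the locking step can be sketched as follows. This is a hypothetical helper, not the authors' MATLAB code; it applies the two orthogonal transformations as full similarity transformations and, as in the text, simply discards the small coupling term $f(e_m^T Q - e_m^T)$ by leaving $f$ untouched:

```python
import numpy as np
from scipy.linalg import hessenberg

def lock(V, H, f, X, j):
    """Sketch of Algorithm 6.2: lock the i converged eigenvectors X of the
    active block H[j:, j:] of a factorization A V = V H + f e_m^T.  The
    small coupling term f (e_m^T Q - e_m^T) is discarded, as in the text."""
    m, i = H.shape[0], X.shape[1]
    Qs, _ = np.linalg.qr(X, mode='complete')   # step 1: Householder QR of X
    Q = np.eye(m)
    Q[j:, j:] = Qs
    H, V = Q.T @ H @ Q, V @ Q                  # step 2: similarity by Q
    Ps = hessenberg(H[j + i:, j + i:], calc_q=True)[1]
    P = np.eye(m)
    P[j + i:, j + i:] = Ps                     # steps 3-4: back to Hessenberg
    return V @ P, P.T @ H @ P, f

# Demo: lock the two smallest eigenpairs of a symmetric tridiagonal matrix.
m = 6
H0 = np.diag(np.arange(1.0, m + 1)) \
   + np.diag(np.ones(m - 1), 1) + np.diag(np.ones(m - 1), -1)
w, U = np.linalg.eigh(H0)
V1, H1, f1 = lock(np.eye(m), H0.copy(), np.zeros(m), U[:, :2], 0)
assert np.allclose(H0 @ V1, V1 @ H1)             # still a factorization of H0
assert np.allclose(H1[2:, :2], 0.0, atol=1e-10)  # locked block decoupled
```

In the symmetric demo the locked $2\times 2$ leading block comes out diagonal, illustrating the remark that locking simplifies considerably when $A$ is symmetric.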

beginning of the factorization. Unlike Lock, the procedure Purge requires accessing and updating the entire factorization when $A$ is nonsymmetric. Thus, for large-scale nonsymmetric eigenvalue computations, the amount of purging performed should be kept to a minimum.

Algorithm 6.3. function $[V_{m-i}, H_{m-i}, f_{m-i}]$ = Purge$(V_m, H_m, f_m, j, i)$

INPUT: A length $m$ Arnoldi factorization $A V_m = V_m H_m + f_m e_m^T$. The first $i+j$ columns of $V_m$ represent an approximate invariant subspace for $A$. The leading principal submatrix $H_{i+j}$ of order $i+j$ of $H_m$ is upper quasi-triangular and contains the converged Ritz values. The $i$ unwanted converged eigenvalues are in the leading portion of $H_{i+j}$. The converged complex conjugate Ritz pairs are stored in $2\times 2$ blocks on the diagonal of $H_{i+j}$.

OUTPUT: A length $m-i$ Arnoldi factorization defined by $V_{m-i}$, $H_{m-i}$ and $f_{m-i}$, purged of the unwanted converged Ritz values and corresponding Schur vectors.

Lines 1–3 purge the factorization of the unwanted converged Ritz values contained in the leading portion of $H_m$:

1. Solve the Sylvester equation
$$ Z \bar H_{m-i} - H_i Z = G_i $$
for $Z \in \mathbb{R}^{i\times(m-i)}$, which arises from block diagonalizing $H_m$:
$$ H_m \begin{bmatrix} I_i & Z \\ & I_{m-i} \end{bmatrix} = \begin{bmatrix} I_i & Z \\ & I_{m-i} \end{bmatrix} \begin{bmatrix} H_i & \\ & \bar H_{m-i} \end{bmatrix}; $$
2. Compute the orthogonal factorization
$$ Q R_{m-i} = \begin{bmatrix} Q_i \\ Q_{m-i} \end{bmatrix} R_{m-i} = \begin{bmatrix} Z \\ I_{m-i} \end{bmatrix}, $$
where $Q \in \mathbb{R}^{m\times(m-i)}$, using Householder matrices;
3. Update the factorization and obtain a length $m-i$ factorization:
$$ H_{m-i} \leftarrow R_{m-i} \bar H_{m-i} Q_{m-i}; \qquad V_{m-i} \leftarrow V_m Q; \qquad f_{m-i} \leftarrow \rho_{m-i,m-i}^{-1} f_m, $$
where $\rho_{m-i,m-i} = e_{m-i}^T R_{m-i} e_{m-i}$;

At the completion of Algorithm 6.3 the factorization is of length $m-i$ and the leading sub-matrix of order $j$ will be upper quasi-triangular. The wanted converged Ritz values will either be on the diagonal if real or in blocks of two for the complex conjugate pairs. Figure 6.2 shows the structure of the updated $V_m H_m$ just prior to discarding the unwanted portions.

The solution of the Sylvester equation at line 1 determines the matrix $Z$ that block diagonalizes the spectrum of $H_m$ into two sub-matrices. The unwanted portion is in the leading corner and the remaining eigenvalues of $H_m$ are in the other block. A solution $Z$ exists when $H_i$ and $\bar H_{m-i}$ do not have a common eigenvalue. If there is an eigenvalue that is shared by $H_i$ and $\bar H_{m-i}$, then $H_m$ has an eigenvalue of multiplicity greater than one. The remedy is a criterion that determines whether to increase or decrease $i$, the number of Ritz values that require purging. Analysis similar to that in section 5 demonstrates that after line 3 the Ritz estimates for the eigenvalues of $H_{m-i}$ are not altered. We also remark that $R_{m-i}$ is nonsingular, since the matrix $\begin{bmatrix} Z \\ I_{m-i} \end{bmatrix}$ is of full column rank, and that $|\rho_{m-i,m-i}^{-1}| \le 1$.
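A minimal NumPy/SciPy sketch of the procedure (a hypothetical helper, not the authors' code). Note that `scipy.linalg.solve_sylvester` solves $AX + XB = Q$, so line 1 is passed with $A = -H_i$:

```python
import numpy as np
from scipy.linalg import solve_sylvester

def purge(V, H, f, i):
    """Sketch of Algorithm 6.3: remove the i unwanted converged Ritz values
    held in the leading i-by-i block of H from A V = V H + f e_m^T."""
    m = H.shape[0]
    Hi, G, Hbar = H[:i, :i], H[:i, i:], H[i:, i:]
    Z = solve_sylvester(-Hi, Hbar, G)          # line 1: Z Hbar - Hi Z = G
    Q, R = np.linalg.qr(np.vstack([Z, np.eye(m - i)]))  # line 2: QR of [Z; I]
    # line 3: R^{-1} equals the trailing block Q[i:, :], so no solves needed.
    return V @ Q, R @ Hbar @ Q[i:, :], f / R[-1, -1]

# Demo: the unwanted Ritz value 5 sits in the leading 1x1 block.
H0 = np.array([[5.0, 1.0, 2.0, 1.0],
               [0.0, 1.0, 1.0, 0.0],
               [0.0, 2.0, 1.0, 1.0],
               [0.0, 0.0, 3.0, 2.0]])
V1, H1, f1 = purge(np.eye(4), H0.copy(), np.zeros(4), 1)
assert np.allclose(H0 @ V1, V1 @ H1)           # length-3 factorization of H0
```

The updated factorization retains the remaining eigenvalues of $H_0$ (here $-1$ and $(5 \pm \sqrt 5)/2$) while the eigenvalue $5$ is gone.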

7. Error Analysis. This section examines the numerical stability of the two deflation algorithms when computing in finite precision arithmetic. A stable algorithm computes the exact solution of a nearby problem. It will be shown that Algorithms 6.3 and 6.2 deflate slightly perturbed matrices.

For ease of notation, $H = \begin{bmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{bmatrix}$ replaces the $H_m \in \mathbb{R}^{m\times m}$ used by procedures Lock and Purge of § 6.2. The sub-matrix $H_{11}$ is of order $i$, and $H_{21}$ is zero except for the sub-diagonal entry of $H$ located in its north-east corner. Analogously, $\hat H$ represents $H$ after the similarity transformation performed by Lock or Purge, partitioned conformally.

7.1. Locking. The locking scheme is considered successful if the desired eigenvalues end up in $\hat H_{11}$ and $\hat H_{21}$ is small in norm. The largest source of error is from computing an orthogonal factorization from the approximate eigenvector matrix containing the vectors to be locked.

The matrix pair $(X, D)$ represents an approximate quasi-diagonal form for $H$. The computed eigenvalues of $H$ are on the diagonal of $D$ if they have zero imaginary component and in blocks of two for the complex conjugate pairs. The computed columns of $X$ span the right eigenspace corresponding to the diagonal values of $D$. For the blocks of order two on the diagonal, the corresponding complex eigenvector is stored in two consecutive columns of $X$, the first holding the real part and the second the imaginary part. We assume that $X$ is a non-singular matrix and that each column is a unit vector.

Standard results give $\|XD - HX\| \le \epsilon_1 \|H\|$, where $\epsilon_1$ is a small multiple of machine precision for a stable algorithm. Defining the matrix $E = (XD - HX)Y^T$, where $X^{-1} = Y^T$, it follows that $(H + E)X = XD$. If $\sigma_m(X)$ is the smallest singular value of $X$, then $\|X^{-1}\| = \sigma_m^{-1}(X)$. Since each column of $X$ is a unit vector, $\|X\| \le \sqrt m$. If $\kappa(X) = \|X\|\,\|X^{-1}\|$ is the condition number for the matrix of approximate eigenvectors, $\|E\| \le \epsilon_1 \kappa(X) \|H\|$. If $X$ is a well conditioned matrix then the approximate quasi-diagonal form for $H$ is exact for a nearby matrix. In particular, if $H$ is symmetric then $E$ is always a small perturbation. As the columns of $X$ become linearly dependent, $\sigma_m(X)$ decreases and $E$ may represent a large perturbation.

The following result informs us that locking is a conditionally stable process.

Theorem 7.1. Let $H \in \mathbb{R}^{m\times m}$ be an unreduced upper Hessenberg matrix with distinct eigenvalues. Suppose that $X = \begin{bmatrix} X_1 & X_2 \end{bmatrix}$ and $D = \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix}$ are an approximate quasi-diagonal form for $H$ that satisfies $(H + E)X = XD$, where $\|E\| \le \epsilon_1 \kappa(X) \|H\|$. Let $Q_1 R_1 = X_1 \in \mathbb{R}^{m\times j}$, where $Q_1^T Q_1 = I_j$. Suppose a QR factorization of $X_1$ is computed so that $\tilde Q \tilde R = X_1 + \tilde E$, where $\tilde Q^T \tilde Q = I_m$ and $\|\tilde E\| \le \epsilon_2 \|X_1\|$. Both $\epsilon_1$ and $\epsilon_2$ are small multiples of the machine precision $\epsilon_M$. Let $\epsilon = \max(\epsilon_1, 2\epsilon_2)$, let $\kappa(R_1) = \|R_1\|\,\|R_1^{-1}\|$ be the condition number for $R_1$, and set

$$ \tau = \frac{\kappa(R_1)}{1 - \epsilon_2 \kappa(R_1)}. $$

If $\delta = \epsilon\bigl(\kappa(X) + \tau(1 + \epsilon\tau\kappa(R_1))\bigr) < 1$, then there exists a matrix $C \in \mathbb{R}^{m\times m}$ such that

$$ \tilde Q^T (H - C) \tilde Q = \hat H = \begin{bmatrix} \hat H_{11} & \hat H_{12} \\ 0 & \hat H_{22} \end{bmatrix}, $$

where $\hat H_{11}$ is an upper quasi-triangular matrix similar to $D_1$ and

$$ \|C\| \le \epsilon(\kappa(X) + \tau)\|H\| + O(\epsilon^2). \tag{7.1} $$

A few remarks are in order.
1. If $H$ is symmetric, $\hat H_{12} = 0$ and $\hat H_{11}$ is diagonal. Procedure Lock is stable since, as noted previously, $\kappa(X) = 1$ and $\tau \approx 1$. Parlett [26, pages 85–86] proves Theorem 7.1 for symmetric matrices when locking one approximate eigenvector.
2. If only one column is locked, then $\tau = 1 + O(\epsilon)$ and $\|C\|$ is small relative to $\kappa(X)\|H\|$.
3. If $\kappa(R_1)$ is large, the columns of $X_1$ are nearly dependent. In this case, $\kappa(X)$ will also be large and locking will likely introduce no more error into the computation than is already present from computing the quasi-diagonal pair $(X, D)$. The factor $\tau$ may be minimized by decreasing $j$, the number of columns locked.
4. A conservative strategy locks only one vector at a time. The only real concern is when locking two vectors corresponding to a complex conjugate pair. If the real and imaginary parts of the complex eigenvector are nearly aligned, $\tau$ will be large and locking may be unstable. But as § 6.1 explains, the complex conjugate pair may be numerically regarded as a double eigenvalue with zero imaginary part. Only one copy is deflated and $\tau \approx 1$.

Proof. Partition $X = \begin{bmatrix} X_1 & X_2 \end{bmatrix}$ and $D = \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix}$. The $j$ columns of $X_1$ are a basis for the right eigenspace to be locked and $D_1$ contains the corresponding eigenvalues. We assume that the eigenvalues of $D_1$ and $D_2$ are distinct and that $X$ is non-singular. Let $Y^T = \begin{bmatrix} Y_1^T \\ Y_2^T \end{bmatrix}$ denote the inverse of $X$. The rows of $Y_1^T$ span the left eigenspace associated with the computed eigenvalues of $D_1$.

Let the product $QR$ be an exact QR factorization of a matrix near $X_1$: $\tilde Q \tilde R = \begin{bmatrix} \tilde Q_1 & \tilde Q_2 \end{bmatrix} \begin{bmatrix} \tilde R_1 \\ 0 \end{bmatrix} = X_1 + \tilde E$, where $\|\tilde E\| \le \epsilon_2 \|X_1\|$. Using Theorem 1.1 of Stewart [36], since $\|R_1^{-1}\|\,\|\tilde E\| < 1$ there exist matrices $W_1 \in \mathbb{R}^{m\times j}$ and $F_1 \in \mathbb{R}^{j\times j}$ such that $(Q_1 + W_1)(R_1 + F_1) = \tilde Q_1 \tilde R_1$, where $QR = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R_1 \\ 0 \end{bmatrix} = X_1$ and $(Q_1 + W_1)^T (Q_1 + W_1) = I_j$. Define $F = \begin{bmatrix} F_1 \\ 0 \end{bmatrix}$ and $W = \begin{bmatrix} W_1 & 0 \end{bmatrix}$. The matrices $W$ and $F$ are the perturbations that account for the backward error $\tilde E$ produced by computation.

Partitioning $W$ conformally with $Q$ gives

$$ \tilde Q^T H \tilde Q = \tilde Q^T X D Y^T \tilde Q - \tilde Q^T E \tilde Q = \tilde Q^T (X_1 D_1 Y_1^T + X_2 D_2 Y_2^T) \tilde Q - \tilde Q^T E \tilde Q $$
$$ \approx \begin{bmatrix} Q_1^T \\ Q_2^T \end{bmatrix} (X_1 D_1 Y_1^T + X_2 D_2 Y_2^T) \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} + W^T (X_1 D_1 Y_1^T + X_2 D_2 Y_2^T) \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} + \begin{bmatrix} Q_1^T \\ Q_2^T \end{bmatrix} (X_1 D_1 Y_1^T + X_2 D_2 Y_2^T) W - \tilde Q^T E \tilde Q, \tag{7.2} $$

where the second order terms involving $W$ are ignored. From the decomposition $X_1 = Q_1 R_1$ it follows that $Q_1 = X_1 R_1^{-1}$, which gives $Q_2^T X_1 = 0$. The equality $Y^T = X^{-1}$ implies that $Y_l^T X_l = I$ for $l = 1, 2$ and $Y_2^T X_1 = 0 = Y_1^T X_2$, and hence $Y_2^T Q_1 = 0$.

Using these relationships, equation (7.2) becomes

$$ \tilde Q^T H \tilde Q = \begin{bmatrix} R_1 D_1 R_1^{-1} & Q_1^T X D Y^T Q_2 \\ 0 & Q_2^T X_2 D_2 Y_2^T Q_2 \end{bmatrix} + \hat C \tag{7.3} $$
$$ \equiv \hat H + \hat C, \tag{7.4} $$

where the matrix $\hat C$ absorbs the three matrix products involving $W$ or $E$ on the right hand side of equation (7.2). We note that if $H$ is symmetric, $Q_1^T X_2 = 0 = Y_1^T Q_2$, $R_1$ is a diagonal matrix, and hence $R_1 D_1 R_1^{-1} = D_1$. Thus $\hat H$ is also a symmetric matrix. Defining $C = \tilde Q \hat C \tilde Q^T$, equation (7.4) is rewritten as $\tilde Q^T (H - C) \tilde Q = \hat H$. Since $\tilde Q \hat H = (X_1 D_1 Y_1^T + X_2 D_2 Y_2^T) \tilde Q$ and using the definition of $\hat C$ from equation (7.2),

$$ \hat C = W^T \tilde Q \hat H + \tilde Q^T W \hat H - \tilde Q^T E \tilde Q, \tag{7.5} $$

it follows that $\|C\| \le 2\|W^T \tilde Q\|\,\|\hat H\| + \|E\|$. The result of Theorem 1.1 of Stewart [36] also provides the estimate

$$ \|W^T \tilde Q\| \le \|W\| \le \epsilon_2 \tau (1 + \epsilon_2 \tau \kappa(R_1)), $$

where $O(\epsilon^3)$ terms are ignored. For modest values of $\tau$, $W$ is numerically orthogonal to $\tilde Q$. From equation (7.5),

$$ \|C\| = \|\hat C\| \le 2\epsilon_2 \tau (1 + \epsilon_2 \tau \kappa(R_1)) \|\hat H\| + \epsilon_1 \kappa(X) \|H\| \le 2\epsilon_2 \tau (1 + \epsilon_2 \tau \kappa(R_1)) (\|H\| + \|C\|) + \epsilon_1 \kappa(X) \|H\| \le \epsilon\bigl(\kappa(X) + \tau(1 + \epsilon\tau\kappa(R_1))\bigr)\|H\| + \epsilon\tau(1 + \epsilon\tau\kappa(R_1))\|C\| \le \delta\|H\| + \delta\|C\|, $$

where the second inequality uses equation (7.4). Since $\delta < 1$, rearranging the last inequality gives $\|C\|(1 - \delta) \le \delta\|H\|$. Ignoring $O(\epsilon^2)$ terms, $\|C\| \le \delta\|H\|$. The estimate on the size of $C$ in equation (7.1) now follows since $\delta = \epsilon\bigl(\kappa(X) + \tau(1 + \epsilon\tau\kappa(R_1))\bigr) = \epsilon(\kappa(X) + \tau) + O(\epsilon^2)$.

7.2. Purging. The success of the purging scheme depends upon the solution of the Sylvester equation required by Algorithm 6.3, which we rewrite as $Z H_{22} - H_{11} Z = H_{12}$. The job is to examine the effect of performing the similarity transformation $R H_{22} R^{-1}$, where

$$ QR \equiv \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} R = \begin{bmatrix} Z \\ I \end{bmatrix} \equiv S. $$

The last relation implies that $R^{-1} = Q_2$. In actual computation, this equality obviates the need to solve the linear systems with $R$ that the similarity transformation would otherwise require. For the error analysis that follows, $R^{-1}$ is used in a formal sense.

Let $\tilde Z$ be the computed solution of the Sylvester equation. In a similar analysis, Bai and Demmel [2] assume that the QR factorization of $S$ is performed exactly, and we do also. The major source of error is that arising from computing $\tilde Z$.

Suppose that $\tilde Q \tilde R = \begin{bmatrix} \tilde Z \\ I \end{bmatrix} \equiv \tilde S$. Write $\tilde Z = Z + E$, where $E$ is the error in $\tilde Z$. If $QR = S$ and $\|R^{-1}\|\,\|E\| < 1$, then Theorem 1.1 of Stewart [36] gives matrices $W$ and $F$ such that $(Q + W)(R + F) = \tilde Q \tilde R$, where $(Q + W)^T (Q + W) = I$. The result gives the bound $\|F\| \le \|R\|\,\|E\| + O(\|E\|^2)$. Up to first order perturbation terms,

$$ \tilde R H_{22} \tilde R^{-1} = (R + F) H_{22} (R + F)^{-1} = R H_{22} R^{-1} - R H_{22} R^{-1} F R^{-1} + F H_{22} R^{-1}. $$

Defining the error matrix $C = R^{-1} F H_{22} - H_{22} R^{-1} F$, it follows that

$$ \tilde R H_{22} \tilde R^{-1} = R (H_{22} + C) R^{-1}. $$

Ignoring second order terms, we obtain the estimate

$$ \|C\| \le 2 \|R^{-1}\|\,\|F\|\,\|H_{22}\| \le 2 \kappa(S) \|E\|\,\|H_{22}\|. $$

The invariance of $\|\cdot\|$ under orthogonal transformations gives $\kappa(S) = \|R^{-1}\|\,\|R\|$. Since the singular values of $S$ are the square roots of the eigenvalues of $S^T S$, it follows that

$$ \kappa(S) = \sqrt{\frac{1 + \sigma_{\max}^2(Z)}{1 + \sigma_{\min}^2(Z)}}, $$

where $\sigma_{\max}(Z)$ and $\sigma_{\min}(Z)$ are the largest and smallest singular values of $Z$. Since $Z^T Z$ is a symmetric positive semi-definite matrix, $\lambda_{\max}(Z^T Z) = \|Z\|^2$, and then $\kappa(S) \le \sqrt{1 + \|Z\|^2}$, with equality if zero is an eigenvalue of $Z^T Z$.

The previous discussion is summarized in the following result.

Theorem 7.2. Let $\tilde Z$ be the computed solution of the Sylvester equation $Z H_{22} - H_{11} Z = H_{12}$, where the eigenvalues of $H_{11}$ and $H_{22}$ are distinct. Let $\tilde Z = Z + E$, where $E$ is the error in $\tilde Z$, and suppose that $\|R^{-1}\|\,\|E\| < 1$, where $QR = \begin{bmatrix} Z \\ I \end{bmatrix}$. Then there exists a matrix $C$ such that

$$ \tilde R H_{22} \tilde R^{-1} = R (H_{22} + C) R^{-1}, $$

where

$$ \|C\| \le 2\sqrt{1 + \|Z\|^2}\;\|E\|\,\|H\|. \tag{7.6} $$

If $\|E\|$ is a modest multiple of machine precision and the solution of the Sylvester equation is not large in norm, then purging is backward stable, since $\|C\|$ is small relative to $\|H\|$.

The two standard approaches [3, 16] for solving Sylvester's equation show that $\|\tilde F\|_F \le \epsilon_3 (\|H_{11}\|_F + \|H_{22}\|_F) \|\tilde Z\|_F$, where $\tilde F \equiv H_{12} - \tilde Z H_{22} + H_{11} \tilde Z$ and $\epsilon_3$ is a modest multiple of machine precision. Standard bounds [8, 15] also give $\|Z\|_F \le \mathrm{sep}^{-1}(H_{11}, H_{22}) \|H_{12}\|_F$, where

$$ \mathrm{sep}(H_{11}, H_{22}) \equiv \min_{X \ne 0} \frac{\|X H_{22} - H_{11} X\|_F}{\|X\|_F} $$

is the separation between $H_{11}$ and $H_{22}$. Although

$$ \mathrm{sep}(H_{11}, H_{22}) \le \min_{k,l} |\lambda_k(H_{11}) - \lambda_l(H_{22})|, $$
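Since $\mathrm{vec}(X H_{22} - H_{11} X) = (H_{22}^T \otimes I - I \otimes H_{11})\,\mathrm{vec}(X)$, the separation is the smallest singular value of this Kronecker operator and can be computed directly for small blocks. The sketch below (hypothetical; the matrices are arbitrary) shows sep falling far below the eigenvalue gap for a non-normal block:

```python
import numpy as np

def sep(H11, H22):
    """sep(H11, H22): smallest singular value of the Sylvester operator
    X -> X H22 - H11 X, written via Kronecker products."""
    p, q = H11.shape[0], H22.shape[0]
    K = np.kron(H22.T, np.eye(p)) - np.kron(np.eye(q), H11)
    return np.linalg.svd(K, compute_uv=False)[-1]

# Highly non-normal block: the eigenvalue gap is 0.1, but sep is about 2e-4.
H11 = np.array([[0.0, 100.0],
                [0.0, 0.1]])
H22 = np.array([[-0.1]])
gap = min(abs(l + 0.1) for l in np.linalg.eigvals(H11))
assert sep(H11, H22) < 1e-2 * gap
```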

Varah [38] indicates that if the matrices involved are highly non-normal, the smallest difference between the spectra of $H_{11}$ and $H_{22}$ may be an over-estimate of the actual separation. Recently, Higham [19] gives a detailed error analysis for the solution of Sylvester's equation. The analysis takes into account the special structure of the equations involved. For example, Higham shows that $\|E\|_F \le \mathrm{sep}^{-1}(H_{11}, H_{22}) \|\tilde F\|_F$, but this may lead to an arbitrarily large estimate of the true forward error. For use in practical error estimation, "LAPACK-style" software is available.

A robust implementation of procedure Purge determines the backward stability by estimating both $\|Z\|$ and $\|E\|$.

8. Other Deflation Techniques. Wilkinson [40, pages 584–602] has given a comprehensive treatment of various deflation schemes associated with iterative methods. Recently, Saad [31, pages 117–125, 180–182] discussed several deflation strategies used with both simultaneous iteration and Arnoldi's method. Algorithm 6.2 is an in-place version of one of these schemes [31, page 181]. Saad's version explicitly orthonormalizes the newly converged Ritz vectors against the $j$ approximate Schur vectors already computed. This is the form of locking used by Scott [33]. Instead, procedure Lock achieves the same task implicitly through the use of Householder matrices in $\mathbb{R}^{m\times m}$. Thus we are able to orthogonalize vectors in $\mathbb{R}^n$ at a reduced expense since $m \ll n$.

Other deflation strategies include the various Wielandt deflation techniques [31, 40]. We briefly review those that do not require the approximate left eigenvectors of $A$ or complex arithmetic. Denote by $\lambda_1, \dots, \lambda_j$ the wanted eigenvalues of $A$. The Wielandt and Schur–Wielandt forms of deflation determine a rank-$j$ modification of $A$,

$$ A_j = A - U_j S_j U_j^T, \tag{8.1} $$

where $S_j \in \mathbb{R}^{j\times j}$ and $j$ represents the dimension of the approximate invariant subspace already computed. The idea is to choose $S_j$ so that an iteration with $A_j$ will converge to the remainder of the invariant subspace desired.
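A small numerical illustration of the rank-$j$ modification for $j = 1$ (a hypothetical NumPy sketch; the spectrum and the shift are arbitrary): deflating a converged eigenpair moves only its eigenvalue.

```python
import numpy as np

# Wielandt deflation with j = 1: A1 = A - u (sigma u^T) moves the converged
# eigenvalue 10 to 10 - sigma and leaves the rest of the spectrum intact.
D = np.diag([10.0, 4.0, 3.0, 1.0])
Q, _ = np.linalg.qr(np.random.default_rng(3).standard_normal((4, 4)))
A = Q @ D @ Q.T                        # symmetric, eigenvalues {10, 4, 3, 1}

lam, X = np.linalg.eigh(A)
u = X[:, -1:]                          # converged eigenvector for lambda = 10
sigma = 10.0
A1 = A - u @ (sigma * u.T)             # the rank-one modification (8.1)
assert np.allclose(np.linalg.eigvalsh(A1), [0.0, 1.0, 3.0, 4.0])
```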
For example, $S_j$ is selected to be a diagonal matrix of shifts $\sigma_1, \dots, \sigma_j$, so that $A_j$ has eigenvalues $\{\lambda_1 - \sigma_1, \dots, \lambda_j - \sigma_j, \lambda_{j+1}, \dots, \lambda_n\}$.

The two forms of deflation differ in the choice of $U_j$. The Wielandt variant uses converged Ritz vectors while the Schur–Wielandt variant uses an approximate Schur basis set of vectors. With either form of deflation, the eigenvalues of $A_j$ are $\lambda_i - \sigma_i$ for $i \le j$ and $\lambda_i$ otherwise, and both forms leave the Schur vectors unchanged. This motivates Saad to suggest that an approximate Schur basis should be incrementally built as Ritz vectors of $A_j$ converge. Braconnier [6] employs the Wielandt variant and discusses the details of deflating, in real arithmetic, a converged Ritz value that has nonzero imaginary part.

We now compare our locking scheme to the Schur–Wielandt deflation techniques. We shall assume that $A U_j = U_j R_j$ is a real partial Schur form of order $j$ for $A$, and we will put $S_j = R_j$ in the Schur–Wielandt deflation scheme. Suppose that

$$ A \begin{bmatrix} U_j & V_m \end{bmatrix} = \begin{bmatrix} U_j & V_m \end{bmatrix} \begin{bmatrix} R_j & M_j \\ 0 & H_m \end{bmatrix} + f_{m+j} e_{m+j}^T \tag{8.2} $$

is a length $m+j$ Arnoldi factorization obtained after locking. Consider any associated roundoff errors as being absorbed in $A$ here. Equate the last $m$ columns of equation (8.2) to obtain

$$ A V_m = U_j M_j + V_m H_m + f_{m+j} e_m^T. \tag{8.3} $$

Since $U_j$ is orthogonal to $V_m$, it follows that $(I - U_j U_j^T) A (I - U_j U_j^T) V_m = V_m H_m + f_{m+j} e_m^T$. This implies that the Arnoldi factorization (8.2) is equivalent to applying Arnoldi's method to the projected matrix $(I - U_j U_j^T) A (I - U_j U_j^T)$ with the first column of $V_m$ as the starting vector. Keeping the locked vectors active in the construction and the IRA update of this Arnoldi factorization assures that the Krylov space generated by $V_m$ remains free of components corresponding to locked Ritz values. The appearance of spurious Ritz values in the subsequent factorization is automatically avoided. Note that when $A$ is symmetric, this is equivalent to the selective orthogonalization scheme [26, pages 275–284] proposed by Parlett and Scott.

In contrast to locking, consider the consequences of applying the Schur–Wielandt deflation scheme to construct a new Arnoldi factorization using $V_m e_1$ as a starting vector. In the symmetric case with exact arithmetic, the two schemes would be mathematically equivalent. Without these assumptions, there may be considerable differences. From equation (8.3), it follows that

$$ (A - U_j R_j U_j^T) V_m = A (I - U_j U_j^T) V_m = U_j M_j + V_m H_m + f_{m+j} e_m^T. \tag{8.4} $$

From equation (8.4) we can use an easy induction to derive the relations

$$ (A - U_j R_j U_j^T)^i V_m e_1 = (U_j M_j + V_m H_m) H_m^{i-1} e_1, \qquad i \ge 1. $$

Thus, the Krylov subspace $\mathcal K_k(A - U_j R_j U_j^T, V_m e_1)$, and hence the corresponding Arnoldi factorization of $A - U_j R_j U_j^T$, must be corrupted with components in $\mathcal R(U_j)$ even though the starting vector is orthogonal to $\mathcal R(U_j)$. Within the context of Arnoldi iterations, the Schur–Wielandt techniques do not deflate the invariant subspace information contained in $\mathcal R(U_j)$ from the remainder of the iteration.

This helps to explain why Saad suggests that Wielandt and Schur–Wielandt deflation techniques should not be used "to compute more than a few eigenvalues and eigenvectors."¹ We note that if $M_j \approx 0$, then the Wielandt forms of deflation may safely be used within an Arnoldi iteration.
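This last point can be checked directly for $j = 1$ (a hypothetical sketch, not from the paper): for a nonsymmetric $A$ the deflated operator maps a vector orthogonal to $u_1$ back onto $u_1$, while for a symmetric $A$ it does not.

```python
import numpy as np

# Nonsymmetric A: u1 = e1 is a right eigenvector, A u1 = 2 u1.  The
# Schur-Wielandt operator A - 2 u1 u1^T maps w _|_ u1 back onto u1.
A = np.array([[2.0, 1.0],
              [0.0, 1.0]])
u1 = np.array([1.0, 0.0])
w = np.array([0.0, 1.0])                  # starting vector orthogonal to u1
x = (A - 2.0 * np.outer(u1, u1)) @ w
assert abs(u1 @ x) > 0.5                  # corrupted: component along u1

# Symmetric A: the deflated operator keeps the iteration clean (M_j = 0).
B = np.diag([2.0, 1.0])
y = (B - 2.0 * np.outer(u1, u1)) @ w
assert abs(u1 @ y) < 1e-15
```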
This will always be true when $A$ is symmetric.

The cost of matrix-vector products with $A_j$ increases due to the rank-$j$ modifications of $A$ required. Moreover, every time an approximate Schur vector or a Ritz vector converges, the iteration needs to be explicitly restarted with $A_j$. The two deflation techniques introduced in this paper allow the iteration to be implicitly restarted, avoiding the need to build a new factorization from scratch.

Finally, we mention that the idea of deflating a converged Ritz value from a Lanczos iteration is also discussed by Parlett and Nour-Omid [28]. They present an explicit deflation technique that uses the QR algorithm with converged Ritz values as shifts. Parlett indicates that this was a primary reason for undertaking the study concerning the forward instability of the QR algorithm [27].

9. Reordering the Schur Form of a Matrix. We now establish a connection between the IRA-iteration with locking and the algorithms used to re-order the Schur form of a matrix. Suppose a matrix $A$ is reduced to upper quasi-triangular form by the QR algorithm:

$$ Q^T A Q = T \equiv \begin{bmatrix} T_{11} & T_{12} \\ & T_{22} \end{bmatrix}, \tag{9.1} $$

¹ Page 125 of [31]

where $Q$ is the orthogonal matrix computed by the algorithm. Equation (9.1) is a Schur form for $A$ of order $p + q$, where the sub-matrices $T_{11}$ and $T_{22}$ are of order $p$ and $q$, respectively. Assume that the spectra of $T_{11}$ and $T_{22}$ are distinct. In practice, the order in which the computed eigenvalues of $A$ appear on the diagonal of $T$ is somewhat random. The first $p$ columns of $Q$ are an orthogonal basis for the unique invariant subspace associated with the eigenvalues of $T_{11}$. If the eigenvalues of interest are located in $T_{22}$ and an orthonormal basis for them is wanted, we must either increase the number of columns of $Q$ used or somehow place them at the top of $T$. Algorithms for re-ordering a Schur form accomplish this task by using orthogonal matrices that move the wanted eigenvalues to the top of $T$. The recent work of Bai and Demmel [2] attempts to correct the occasional numerical problems encountered by Stewart's algorithm EXCHNG [35]. Their work was motivated by that of Ruhe [29] and that of Dongarra, Hammarling, and Wilkinson [11]. Both algorithms swap consecutive $1\times 1$ and $2\times 2$ blocks of a quasi-triangular matrix to attain the desired ordering.

Let both $T_{11}$ and $T_{22}$ of equation (9.1) be matrices of at most order two. When swapping adjacent blocks of order one, $p = 1 = q$, EXCHNG constructs a plane rotation that zeros the second component of the eigenvector corresponding to the eigenvalue $\lambda_2 = T_{22}$. A similarity transformation is performed on $T$ with the plane rotation and the diagonal blocks are interchanged. We refer to a strategy that constructs an orthogonal matrix and performs a similarity transformation to interchange the eigenvalues as a direct swapping algorithm.
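For two $1\times 1$ blocks the direct swap amounts to a few lines. The sketch below (hypothetical; it assumes $t_{11} \ne t_{22}$) builds the plane rotation that maps the eigenvector of $t_{22}$ onto $e_1$ and applies the similarity:

```python
import numpy as np

def swap_1x1(T):
    """Direct swap of the diagonal entries of an upper triangular 2x2 block
    via a plane rotation (assumes t11 != t22)."""
    v = np.array([T[0, 1], T[1, 1] - T[0, 0]])   # eigenvector for t22
    c, s = v / np.linalg.norm(v)
    Z = np.array([[c, -s],
                  [s, c]])                       # first column: unit eigenvector
    return Z.T @ T @ Z, Z

T = np.array([[1.0, 3.0],
              [0.0, 2.0]])
Ts, Z = swap_1x1(T)
assert np.allclose(np.diag(Ts), [2.0, 1.0])      # eigenvalues interchanged
assert abs(Ts[1, 0]) < 1e-14                     # still (quasi-)triangular
```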
Consider the following alternate iterative swapping algorithm: perform a similarity transformation on $T$ with an arbitrary orthogonal matrix, followed by one step of the QR-iteration with shift equal to $\lambda_2$. The arbitrary orthogonal similarity transformation introduces a non-zero off-diagonal element in the $(2,1)$ entry, so that the transformed $T$ is an unreduced upper Hessenberg matrix with the diagonal blocks now coupled. The standard convergence theory of the QR algorithm dictates that $\lambda_1$ and $\lambda_2$ are switched and the $(2,1)$ entry is zero. If the order of $T_{22}$ is equal to two, EXCHNG uses the iterative swapping strategy with a standard double shift to re-order the diagonal blocks. The direct swapping algorithm, instead, computes an appropriate orthogonal matrix by computing the QR factorization of a basis of two vectors that span the desired invariant subspace. For example, the factorization used in equation (6.1) of § 6.1 may be used. The reader is referred to [2, 11] for further details.

The iterative swapping algorithm is equivalent to the implicit restarting technique used by the IRA-iteration, since both depend upon an implicitly shifted QR step applied to an unreduced upper Hessenberg matrix to interchange $T_{11}$ and $T_{22}$. The direct swapping algorithm is equivalent to the locking technique. An orthogonal matrix is constructed from a basis for the invariant subspace corresponding to $T_{22}$. When this is applied as a similarity transformation, the diagonal blocks of $T$ are swapped. In exact arithmetic, both swapping variants result in a matrix that is upper quasi-triangular with the blocks interchanged. Unfortunately, these existing reordering techniques do not preserve the leading portion of the Arnoldi factorization, and thus explicit restarting would have to be used.

The following example demonstrates that the two variants may produce drastically different output matrices when computed in floating point arithmetic.
The following experiment was carried out in MATLAB, Version 4.2a, on a SUN SPARCstation IPX. The floating point arithmetic is IEEE standard double precision with
DEFLATION TECHNIQUES FOR IMPLICIT RESTARTING

machine precision of ε_M ≈ 2^{-52} ≈ 2.2204 × 10^{-16}. Let

    T = \begin{pmatrix} 1 + 10\epsilon_M & 1 \\ 0 & 1 \end{pmatrix}.

An eigenvector corresponding to λ₂ = 1 is (−1, 10ε_M)^T. Denote by Z the plane rotation that transforms this eigenvector to a multiple of the first column of the identity matrix in R^{2×2}. Let

    U = \begin{pmatrix} 1 & -5\epsilon_M \\ 10\epsilon_M & 1 \end{pmatrix},

so that U is orthogonal up to a small multiple of machine precision. The matrix U acts as the arbitrary orthogonal transformation required by the iterative algorithm. Let T̄ denote the matrix computed by performing one step of the QR iteration on the matrix U^T T U with shift equal to λ₁ = 1 + 10ε_M. We remark that for matrices of order two, the explicit and implicit formulations of the QR iteration are equivalent. The two computed matrices are

    Z^T T Z = \begin{pmatrix} 1 & -1 \\ 0 & 1 + 10\epsilon_M \end{pmatrix},

    T̄ = \begin{pmatrix} 1.400000000000003 & -7.999999999999996 \times 10^{-1} \\ 2.000000000000002 \times 10^{-1} & 6.000000000000001 \times 10^{-1} \end{pmatrix}.

The computed eigenvalues of T̄ are 1.000000033320011 and 9.999999666799921 × 10^{-1}; both have lost eight digits of accuracy. If we perform another QR step on T̄ with the same shift,

    \begin{pmatrix} 1.000000000000003 & 1.000000000000001 \\ -1.09 \times 10^{-15} & 1 \end{pmatrix}

is computed. Note that the off-diagonal element is slightly larger than machine precision, so a standard QR algorithm does not set it to zero. Moreover, even if the off-diagonal element is set to zero, the iterative swapping algorithm fails to interchange the eigenvalues. Continuing to apply QR steps with the shift equal to λ₁ does not produce a properly interchanged matrix.

The explanation of why the iterative algorithm fails is simple enough. The matrix T constructed here is poorly conditioned with respect to the eigenvalue problem, since its eigenvectors are nearly aligned. The computed eigenvalues of U^T T U are 1.000000033320011 and 9.999999666799921 × 10^{-1}. Thus the small relative errors, on the order of machine precision, that occur when computing U^T T U produce a nearby matrix whose eigenvalues differ from those of T in the eighth digit. Performing a shifted QR step with λ₁ incurs forward instability, since the last components of the eigenvectors of U^T T U are on the order of √ε_M; this is the necessary and sufficient condition of Parlett and Le [27]. Another QR step with the same shift on T̄ almost zeros out the subdiagonal element, since the last components of the eigenvectors of T̄ are of order 10^{-1} and the shift is almost the average of the eigenvalues of T̄ and quite close to both. We emphasize that the loss of accuracy in the computed eigenvalues is one of the deleterious effects of forward instability.
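The eight-digit loss described above is easy to reproduce. The sketch below (plain Python in IEEE double precision; the helper functions are ours, not taken from the paper's MATLAB codes) forms the computed product UᵀTU and extracts its eigenvalues from the characteristic polynomial. Rounding errors of order ε_M in this nearly defective pair move the eigenvalues by roughly √ε_M ≈ 10⁻⁸.

```python
import math

EPS = 2.0 ** -52  # unit roundoff, IEEE double precision

def matmul2(A, B):
    """Product of two 2x2 matrices stored as nested lists of floats."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def eig2(A):
    """Eigenvalues of a real 2x2 matrix from the characteristic polynomial.
    Assumes a nonnegative discriminant, which holds in this example."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = math.sqrt(tr * tr - 4.0 * det)
    return (tr - disc) / 2.0, (tr + disc) / 2.0

T = [[1.0 + 10 * EPS, 1.0], [0.0, 1.0]]   # eigenvalues 1 + 10*EPS and 1
U = [[1.0, -5 * EPS], [10 * EPS, 1.0]]    # orthogonal up to O(EPS)
Ut = [[U[0][0], U[1][0]], [U[0][1], U[1][1]]]

lo, hi = eig2(matmul2(matmul2(Ut, T), U))  # eigenvalues of fl(U^T T U)

# The exact eigenvalues are 1 and 1 + 10*EPS, but rounding errors of
# order EPS perturb this nearly defective pair by roughly sqrt(EPS):
print(abs(hi - (1.0 + 10 * EPS)))  # about 3e-8: eight digits lost
print(abs(lo - 1.0))               # about 3e-8
```

The subdiagonal coupling of size about 5ε_M introduced by rounding is the whole story: a perturbation of size δ to a 2×2 Jordan-like block moves its eigenvalues by about √δ.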


R. B. LEHOUCQ AND D. C. SORENSEN

    Table 10.1: Formal description of an IRA iteration.
    1. Initialize an Arnoldi factorization of length k.
    2. Main loop:
    3.    Extend the Arnoldi factorization to length k + p.
    4.    Check for convergence; exit if the k wanted Ritz values have
          converged. Let i and j denote the number of wanted and unwanted
          converged Ritz values, respectively.
    5.    Lock the i + j converged Ritz values.
    6.    Implicitly apply shifts, resulting in an Arnoldi factorization
          of length k + j.
    7.    Purge the j unwanted converged Ritz values.

Bai and Demmel [2] present an example that compares their direct swapping approach with Stewart's algorithm EXCHNG. The matrix considered is

    A(\tau) = \begin{pmatrix} 7.001 & -87 & 39.4\tau & 22.2 \\ -5 & 7.001 & -12.2\tau & 36.0 \\ 0 & 0 & 7.01 & -11.7567 \\ 0 & 0 & 37 & 7.01 \end{pmatrix}.

When τ = 10, ten QR iterations are required to interchange the two blocks. As before, the eigenvalues undergo a loss of accuracy. The iterative swapping algorithm fails for the matrix A(100). No explanation is given there for the failure of Stewart's algorithm; it is the same as for the previous example. Using a direct algorithm, the eigenvalues of A(10) and A(100) are correctly swapped and lose only a tiny amount of accuracy.

Bai and Demmel present a rigorous analysis of their direct swapping algorithm. Although backward stability is not guaranteed, it appears that stability is lost only when T₁₁ and T₂₂ are both of order two and have almost indistinguishable eigenvalues [5]; in this case the interchange is not performed. Bojanczyk and Van Dooren [5] present an alternate swapping algorithm that appears to be backward stable.

10. Numerical Results. An IRA iteration using the two deflation procedures of section 6.2 was written in MATLAB, Version 4.2a. An informal description, given parameters k and p, appears in Table 10.1. The codes are available from the first author upon request.
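To make Table 10.1 concrete, the sketch below implements steps 1 through 4 and step 6 of the loop in Python/NumPy. Steps 5 and 7 (locking and purging, the subject of this paper) are deliberately omitted, and the diagonal test matrix, the parameters k, p, tol, and the starting vector are our own choices rather than the paper's. Convergence is declared through the Ritz-estimate test (10.1); the shifted QR steps are performed explicitly on the small projected matrix H, which is mathematically equivalent to the implicit formulation.

```python
import numpy as np

def extend_arnoldi(A, V, H, f, m):
    """Extend an Arnoldi factorization A V = V H + f e^T to length m,
    using Gram-Schmidt with one reorthogonalization pass."""
    n, k = V.shape
    V = np.pad(V, ((0, 0), (0, m - k)))
    H = np.pad(H, ((0, m - k), (0, m - k)))
    for j in range(k, m):
        beta = np.linalg.norm(f)
        V[:, j] = f / beta
        if j > 0:
            H[j, j - 1] = beta
        w = A @ V[:, j]
        h = V[:, :j + 1].T @ w
        f = w - V[:, :j + 1] @ h
        c = V[:, :j + 1].T @ f          # reorthogonalize once
        f = f - V[:, :j + 1] @ c
        H[:j + 1, j] = h + c
    return V, H, f

def implicit_restart(V, H, f, k, shifts):
    """Apply the shifts to H by shifted QR steps, then truncate back to a
    length-k Arnoldi factorization.  Steps 5 and 7 of Table 10.1 (locking
    and purging) are omitted in this sketch."""
    m = H.shape[0]
    Q = np.eye(m)
    for mu in shifts:
        q, _ = np.linalg.qr(H - mu * np.eye(m))
        H = q.T @ H @ q
        Q = Q @ q
    Vq = V @ Q
    f = Vq[:, k] * H[k, k - 1] + f * Q[m - 1, k - 1]
    return Vq[:, :k], H[:k, :k], f

# Our own toy problem: a diagonal matrix with eigenvalues 1, 2, ..., 100;
# seek the k = 4 eigenvalues of largest real part using p = 8 shifts.
n, k, p, tol = 100, 4, 8, 1e-8
A = np.diag(np.arange(1.0, n + 1))
V, H = np.zeros((n, 0)), np.zeros((0, 0))
f = np.random.default_rng(0).standard_normal(n)
for _ in range(300):
    V, H, f = extend_arnoldi(A, V, H, f, k + p)
    theta, Y = np.linalg.eig(H)
    theta, Y = theta.real, Y.real        # A is symmetric: Ritz values are real
    order = np.argsort(theta)
    wanted = order[p:]
    est = np.abs(Y[-1, wanted]) * np.linalg.norm(f)   # Ritz estimates, test (10.1)
    if np.all(est <= tol * np.abs(theta[wanted])):
        break
    V, H, f = implicit_restart(V, H, f, k, theta[order[:p]])

print(np.sort(theta[wanted]))            # should approach 97, 98, 99, 100
```

The residual update f ← (VQ)e_{k+1}β + fσ at the end of `implicit_restart` is what makes the restart implicit: no new starting vector is ever formed explicitly. This skeleton is exactly the setting in which the locking and purging operations of section 6.2 would be inserted at steps 5 and 7.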
A high-quality and robust implementation of the deflation procedures is planned for the Fortran software package ARPACK [24].

In the examples that follow, Q_k and R_k denote the approximate Schur factors for an invariant subspace of order k computed by an IRA iteration. All the experiments used the starting vector randn(n, 1), where the seed is set with randn('seed', 0) and n is the order of the matrix. The shifting strategy uses the unwanted eigenvalues of H_{k+p} that have not converged. An eigenpair (θ, y) of H_{k+p} is accepted if its Ritz estimate (2.2) satisfies

    |e_{k+p}^T y| \, \|f_{k+p}\| \le tol \, |\theta|.    (10.1)

The value of tol is chosen according to the relative accuracy desired in the Ritz values.

10.1. Example 1. The first example illustrates the use of the deflation techniques when the underlying matrix has several complex repeated eigenvalues.


The example also demonstrates how the iteration locks and purges blocks of Ritz values in real arithmetic. A block diagonal matrix C was generated having n blocks of order two, each of the form

    \begin{pmatrix} \lambda_l & \mu_l \\ -\mu_l & \lambda_l \end{pmatrix},

where

    \lambda_{l} = 4\sin^2\!\Big(\frac{i\pi}{2(n+1)}\Big) + 4\sin^2\!\Big(\frac{j\pi}{2(n+1)}\Big), \qquad l = i + j - 1,

for 1 ≤ i, j ≤ n, and μ_l = √λ_l. The eigenvalues of C are λ_l ± μ_l i, where i = √−1. Since the eigenvalues of a quasi-diagonal matrix are invariant under orthogonal similarity transformations, using an IRA iteration on C with a randomly generated starting vector is general. An IRA iteration was used to compute the k = 12 eigenvalues of C_{450} with smallest real part. The number of shifts used was p = 16, and the convergence tolerance tol was set to 10^{-10}. With these choices of k and p, the iteration stores at most twenty-eight Arnoldi vectors. There are four eigenvalues with multiplicity two. Table 10.2 shows the results attained. Let the diagonal matrix D_{12} denote the eigenvalues of the upper triangular matrix R_{12} computed by the iteration; the diagonal matrix Λ_{12} contains the wanted eigenvalues. After twenty-four iterations twelve Ritz values had converged, but the pair of Ritz values purged at iteration twenty-one was a previously locked pair which the iteration discarded. This behavior is typical when there are clusters of eigenvalues.

    Table 10.2: Convergence history for Example 1.
    IRA iteration for C_{450}: k = 12 and p = 16 with convergence tolerance tol = 10^{-10}.

    Iteration   Ritz values locked   Ritz values purged
        9               2                    0
       10               2                    0
       12               2                    0
       13               2                    0
       17               2                    0
       21               0                    2
       24               2                    0
       28               0                    2
       31               2                    0
     Totals            14                    4

    Number of matrix-vector products: 436
    ‖C_{450}Q_{12} − Q_{12}R_{12}‖ ≈ 10^{-12}
    ‖Q_{12}^T C_{450} Q_{12} − R_{12}‖ ≈ 10^{-11}
    ‖Q_{12}^T Q_{12} − I_{12}‖ ≈ 10^{-14}
    ‖D_{12} − Λ_{12}‖₁ ≈ 10^{-15}

10.2. Example 2. Consider the eigenvalue problem for the convection-diffusion operator

    -\Delta u(x, y) + \tau (u_x(x, y) + u_y(x, y)) = \lambda u(x, y),


on the unit square [0, 1] × [0, 1] with zero boundary data. Using a standard five-point scheme with centered finite differences, the matrix L_{n²} that arises from the discretization is of order n², where h = 1/(n+1) is the cell size. The eigenvalues of L_{n²} are

    \lambda_{ij} = 2\Big(1 - \sqrt{1-\gamma^2}\,\cos\frac{i\pi}{n+1}\Big) + 2\Big(1 - \sqrt{1-\gamma^2}\,\cos\frac{j\pi}{n+1}\Big),

for 1 ≤ i, j ≤ n, where γ = τh/2. An IRA iteration was used to compute the k = 6 smallest eigenvalues of L_{625}, where τ = 25. The number of shifts used was p = 10, and the convergence tolerance tol was set to 10^{-8}. With these choices of k and p, the iteration stores at most sixteen Lanczos vectors. Let the diagonal matrix D_6 denote the eigenvalues of the upper triangular matrix R_6 computed by the iteration. The diagonal matrix Λ_6 ∈ R^{6×6} contains the six smallest eigenvalues. We note that there are two eigenvalues with multiplicity two. Table 10.3 shows the results attained. The diagonal matrix D_6 approximates Λ_6. After thirty iterations six Ritz values had converged, but the Ritz value purged at iteration twenty-four was a previously locked value. The other purged Ritz values are approximations to eigenvalues of L_{625} larger than those in Λ_6.

Figure 10.1 gives a graphical interpretation of the expense of an IRA iteration, in terms of matrix-vector products, as the value of p is increased. For all values of p shown, the results of the iteration were similar to those of Table 10.3; the results in Table 10.3 correspond to the value of p that gave the minimum number of matrix-vector products. For p = 1, the iteration converged to the five smallest eigenvalues after nine hundred ninety-nine matrix-vector products but was unable to converge to the second copy of λ₅. For p = 2, the only form of deflation employed was locking. All other values of p shown demonstrated behavior similar to that of Table 10.3.

In order to determine the benefit of the two deflation techniques, the experiments were repeated without the use of locking or purging.
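The eigenvalue formula for L_{n²} above can be checked directly on a small mesh. The values n = 8 and τ = 4 below are our own reduced choices (keeping γ = τh/2 < 1 so the spectrum is real; the paper uses n = 25 and τ = 25), and the matrix is assembled as a Kronecker sum of the 1D centered-difference operator.

```python
import numpy as np

# Reduced problem size for illustration; the paper uses n = 25, tau = 25.
# gamma = tau*h/2 must stay below 1 for the eigenvalues to be real.
n, tau = 8, 4.0
h = 1.0 / (n + 1)
gamma = tau * h / 2.0

# 1D centered-difference convection-diffusion operator (scaled by h^2):
# tridiag(-1 - gamma, 2, -1 + gamma).
T1 = (2.0 * np.eye(n)
      + np.diag(np.full(n - 1, -1.0 - gamma), -1)
      + np.diag(np.full(n - 1, -1.0 + gamma), 1))

# The five-point 2D operator is the Kronecker sum of two 1D operators.
L = np.kron(np.eye(n), T1) + np.kron(T1, np.eye(n))

computed = np.sort(np.linalg.eigvals(L).real)
i, j = np.meshgrid(np.arange(1, n + 1), np.arange(1, n + 1))
c = np.sqrt(1.0 - gamma ** 2)
predicted = np.sort((2 * (1 - c * np.cos(i * np.pi * h))
                     + 2 * (1 - c * np.cos(j * np.pi * h))).ravel())
print(np.max(np.abs(computed - predicted)))  # agreement to roundoff
```

The symmetry of the formula in i and j is also what produces the multiplicity-two eigenvalues mentioned above: λ_{ij} = λ_{ji} whenever i ≠ j.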
In addition, all the unwanted Ritz values were used as shifts, converged or not. The first run used the same parameters as given in Table 10.3. After 210 matrix-vector products, the iteration converged to six Ritz values, but the second copy of the fifth smallest eigenvalue was not among the final six. The value of p was increased to twenty-three with the same results.

10.3. Example 3. The following example shows the behavior of the iteration on a matrix with a very ill conditioned basis of eigenvectors. Define the Clement tridiagonal matrix [20] of order n + 1,

    B_{n+1} = \begin{pmatrix} 0 & n & & \\ 1 & 0 & n-1 & \\ & \ddots & \ddots & \ddots \\ & & n & 0 \end{pmatrix}.

The eigenvalues are ±n, ±(n−2), …, ±1, and zero if n is even. We note that B_{n+1} = S_{n+1} A_{n+1} S_{n+1}^{-1}, where S_{n+1}^2 = diag(1, n/1, (n/1)((n−1)/2), …, n!/n!) is a diagonal matrix. Thus the condition number of the basis of eigenvectors for B_{n+1} is ‖S_{n+1}‖ ‖S_{n+1}^{-1}‖, which implies that the eigenvalue problem for B_{n+1} is quite ill conditioned. An IRA iteration was used to compute the k = 4 largest-in-magnitude eigenvalues of B_{1000}. The number of shifts used was p = 16, and the convergence tolerance tol was set to 10^{-6}. With these choices of k and p, the iteration stores at most twenty Arnoldi vectors.
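The Clement matrix and the conditioning claim can be illustrated at small scale. Here n = 10 is our own choice (B_{1000} itself is far too ill conditioned for a dense check); the diagonal entries of S² are the binomial coefficients C(n, k), and the orientation of the similarity below (S B S⁻¹ rather than S⁻¹ B S) is chosen to make the symmetrization explicit and may differ from the paper's notational convention.

```python
import numpy as np
from math import comb, sqrt

n = 10  # the paper uses n + 1 = 1000; kept tiny here for a dense check

# Clement tridiagonal matrix of order n + 1: superdiagonal n, n-1, ..., 1
# and subdiagonal 1, 2, ..., n.
B = np.diag(np.arange(n, 0.0, -1.0), 1) + np.diag(np.arange(1.0, n + 1), -1)

eigs = np.sort(np.linalg.eigvals(B).real)
print(eigs)  # n, n-2, ..., 2, 0, -2, ..., -n (zero present since n is even)

# A diagonal similarity with binomial-coefficient scaling symmetrizes B,
# so the conditioning of its eigenvector basis grows like sqrt(C(n, n/2)).
S = np.diag([sqrt(comb(n, k)) for k in range(n + 1)])
A_sym = S @ B @ np.linalg.inv(S)
print(np.max(np.abs(A_sym - A_sym.T)))  # zero to roundoff: A_sym is symmetric
print(sqrt(comb(n, n // 2)))            # conditioning estimate; huge for n = 999
```

For n = 999 the estimate √C(n, ⌊n/2⌋) is astronomically large, which is why the iteration in Table 10.4 needs so many matrix-vector products despite the simple, well-separated spectrum.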


    Table 10.3: Convergence history for Example 2.
    IRA iteration on L_{625}: k = 6 and p = 10 with convergence tolerance tol = 10^{-8}.

    Iteration   Ritz values locked   Ritz values purged
       14               1                    0
       16               1                    0
       19               1                    0
       21               1                    0
       23               1                    1
       24               0                    1
       30               1                    0
       35               0                    1
       38               1                    1
     Totals             7                    4

    Number of matrix-vector products: 325
    ‖L_{625}Q_6 − Q_6R_6‖ ≈ 10^{-9}
    ‖Q_6^T L_{625} Q_6 − R_6‖ ≈ 10^{-9}
    ‖Q_6^T Q_6 − I_6‖ ≈ 10^{-14}
    ‖D_6 − Λ_6‖₁ ≈ 10^{-7}

    Table 10.4: Convergence history for Example 3.
    IRA iteration on B_{1000}: k = 4 and p = 16 with convergence tolerance tol = 10^{-6}.

    Iteration   Ritz values locked   Ritz values purged
       76               1                    0
       85               1                    0
       91               2                    0
     Totals             4                    0

    Number of matrix-vector products: 1423
    ‖B_{1000}Q_4 − Q_4R_4‖ / ‖B_{1000}‖ ≈ 10^{-6}
    ‖Q_4^T B_{1000} Q_4 − R_4‖ ≈ 10^{-6}
    ‖Q_4^T Q_4 − I_4‖ ≈ 10^{-14}
    ‖D_4 − Λ_4‖₁ / ‖B_{1000}‖₁ ≈ 10^{-6}

Let the diagonal matrix D_4 denote the eigenvalues of the upper triangular matrix R_4 computed by the iteration. The diagonal matrix Λ_4 ∈ R^{4×4} contains the four largest-in-magnitude eigenvalues. Table 10.4 shows the results attained. Although the iteration needed a large number of matrix-vector products, it extracted Ritz values accurate to the convergence tolerance.

10.4. Example 4. Finally, we present a dramatic example of how the convergence of an IRA iteration benefits from the two deflation procedures.


A matrix T of order ten had the values

    \lambda_1 = 10^{-6}, \qquad \lambda_i = i \times 10^{-3} \;(i = 2, \ldots, 8), \qquad \lambda_9 = \lambda_{10} = 1,

on the diagonal. Since the eigenvalues of a matrix are invariant under orthogonal similarity transformations, using an IRA iteration on T with a randomly generated starting vector is general. An IRA iteration was used to compute an approximation to the smallest eigenvalue. The number of shifts used was p = 3, and the convergence tolerance tol was set to 10^{-3}. Table 10.5 shows the results attained.

    Table 10.5: Convergence history for Example 4.
    IRA iteration on T: k = 1 and p = 3 with convergence tolerance tol = 10^{-3}.

    Iteration   Ritz values locked   Ritz values purged
        1               0                    1
       15               1                    1
     Totals             1                    2

    Number of matrix-vector products: 32
    ‖TQ₁ − Q₁R₁‖ / λ₁ ≈ 10^{-3}
    ‖Q₁^T T Q₁ − R₁‖ / λ₁ ≈ 10^{-3}
    ‖Q₁^T Q₁ − I₁‖ ≈ 10^{-15}
    ‖R₁ − Λ₁‖₁ / λ₁ ≈ 10^{-3}

Another experiment was run with the locking and purging mechanisms turned off, and with all unwanted Ritz values used as shifts. The same parameters were used as in Table 10.5, but the iteration now consumed forty-one matrix-vector products. As in the results of Table 10.5, the modified iteration converged to one of the dominant eigenvalues after one iteration. After six iterations, the leading block of H₄ split off, having converged to the invariant subspace corresponding to λ₉ and λ₁₀. But since purging was turned off, the modified iteration had to continue attempting to converge to λ₁ using only the lower block of order two in H₄. Incidentally, if the iteration instead simply discarded the leading portion of the factorization corresponding to λ₉ and λ₁₀ after the sixth iteration, convergence to λ₁ never occurred. Crucial to the success of an IRA iteration is the ability to deflate converged Ritz values in a stable manner. Both purging and locking allow faster convergence.

11. Conclusions. In this paper we developed deflation techniques for an implicitly restarted Arnoldi iteration.
The first technique, locking, allows an orthogonal change of basis for an Arnoldi factorization that results in a partial Schur decomposition containing the converged Ritz values; the corresponding Ritz values are deflated in an implicit but direct manner. The second technique, purging, allows the implicit removal of unwanted but converged Ritz values from the Arnoldi iteration. Both deflation techniques are accomplished by working with matrices in the projected Krylov space, which for large eigenvalue problems is of order a fraction of that of the matrix from which estimates are sought. Since both deflation techniques are applied implicitly to the Arnoldi factorization, the need for the explicit restarting associated with all other deflation strategies is avoided. Both techniques were carefully examined with respect to numerical stability, and computational results were presented.


Convergence of the Arnoldi iteration is improved, and a reduction in computational effort is realized. Although a direct comparison with block Arnoldi/Lanczos methods was not given, computational experience shows that if an IRA iteration builds a factorization of the same size as that used by the block methods, and the convergence tolerance is small enough, multiple or clustered eigenvalues are correctly computed. The connection between an IRA iteration and the QR iteration explains the reason for the size of the convergence tolerance used.

REFERENCES

[1] W. E. Arnoldi, The principle of minimized iterations in the solution of the matrix eigenvalue problem, Quarterly of Applied Mathematics, 9 (1951), pp. 17-29.
[2] Z. Bai and J. W. Demmel, On swapping diagonal blocks in real Schur form, Linear Algebra and Its Applications, 186 (1993), pp. 73-95.
[3] R. H. Bartels and G. W. Stewart, Algorithm 432: Solution of the matrix equation AX + XB = C, Communications of the ACM, 15 (1972), pp. 820-826.
[4] M. Bennani and T. Braconnier, Stopping criteria for eigensolvers, technical report, November 1994. Submitted to Jour. Num. Lin. Alg. Appl.
[5] A. Bojanczyk and P. Van Dooren, Reordering diagonal blocks in the Schur form, in Linear Algebra for Large Scale and Real Time Applications, NATO ASI Series, Kluwer Academic Publishers, 1993, pp. 351-352.
[6] T. Braconnier, The Arnoldi-Tchebycheff algorithm for solving large nonsymmetric eigenproblems, Technical Report TR/PA/93/25, CERFACS, Toulouse, France, 1993.
[7] D. Calvetti, L. Reichel, and D. C. Sorensen, An implicitly restarted Lanczos method for large symmetric eigenvalue problems, ETNA, 2 (1994), pp. 1-21.
[8] F. Chatelin, Eigenvalues of Matrices, Wiley, 1993.
[9] F. Chatelin and V. Frayssé, Qualitative computing: elements of a theory for finite-precision computation, tech. report, CERFACS and THOMSON-CSF, June 1993. Lecture Notes for the Commett European Course, June 8-10, Orsay, France.
[10] J. Cullum and W. E.
Donath, A block Lanczos algorithm for computing the q algebraically largest eigenvalues and a corresponding eigenspace for large, sparse symmetric matrices, in Proceedings of the 1974 IEEE Conference on Decision and Control, New York, 1974, pp. 505-509.
[11] J. Dongarra, S. Hammarling, and J. Wilkinson, Numerical considerations in computing invariant subspaces, SIAM Journal on Matrix Analysis and Applications, 13 (1992), pp. 145-161.
[12] I. S. Duff and J. A. Scott, Computing selected eigenvalues of large sparse unsymmetric matrices using subspace iteration, ACM Transactions on Mathematical Software, 19 (1993), pp. 137-159.
[13] W. Feller and G. Forsythe, New matrix transformations for obtaining characteristic vectors, Quarterly of Applied Mathematics, 8 (1951), pp. 325-331.
[14] S. Godet-Thobie, Eigenvalues of large highly nonnormal matrices, PhD thesis, University Paris IX, Dauphine, Paris, France, 1993. C.E.R.F.A.C.S. Report Ref.: TH/PA/93/06.
[15] G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins, second ed., 1989.
[16] G. H. Golub, S. Nash, and C. F. Van Loan, A Hessenberg-Schur method for the problem AX + XB = C, IEEE Transactions on Automatic Control, AC-24 (1979), pp. 909-913.
[17] G. H. Golub and R. Underwood, The block Lanczos method for computing eigenvalues, in Mathematical Software III, J. R. Rice, ed., 1977, pp. 361-377.
[18] R. G. Grimes, J. G. Lewis, and H. D. Simon, A shifted block Lanczos algorithm for solving sparse symmetric generalized eigenproblems, SIAM Journal on Matrix Analysis and Applications, 15 (1994), pp. 228-272.
[19] N. J. Higham, Perturbation theory and backward error for AX − XB = C, BIT, 33 (1993), pp. 124-136.
[20] N. J. Higham, The Test Matrix Toolbox for Matlab, Numerical Analysis Report No. 237, University of Manchester, England, Dec. 1993.
[21] W. Karush, An iterative method for finding characteristic vectors of a symmetric matrix, Pacific Journal of Mathematics, 1 (1951), pp. 233-248.
[22] C.
Lanczos, An iteration method for the solution of the eigenvalue problem of linear differential and integral operators, Journal of Research of the National Bureau of Standards, 45 (1950), pp. 255-282. Research Paper 2133.


[23] R. B. Lehoucq, Analysis and Implementation of an Implicitly Restarted Iteration, PhD thesis, Rice University, Houston, Texas, May 1995. Also available as Technical Report TR95-13, Dept. of Computational and Applied Mathematics.
[24] R. B. Lehoucq, D. C. Sorensen, and P. Vu, ARPACK: An implementation of the Implicitly Restarted Arnoldi Iteration that computes some of the eigenvalues and eigenvectors of a large sparse matrix, 1995. Available from [email protected] under the directory scalapack.
[25] C. C. Paige, The computation of eigenvalues and eigenvectors of very large sparse matrices, PhD thesis, University of London, London, England, 1971.
[26] B. N. Parlett, The Symmetric Eigenvalue Problem, Prentice-Hall, 1980.
[27] B. N. Parlett and J. Le, Forward instability of tridiagonal QR, SIAM Journal on Matrix Analysis and Applications, 14 (1993), pp. 279-316.
[28] B. N. Parlett and B. Nour-Omid, The use of a refined error bound when updating eigenvalues of tridiagonals, Linear Algebra and Its Applications, 68 (1984), pp. 179-219.
[29] A. Ruhe, An algorithm for numerical determination of the structure of a general matrix, BIT, 10 (1970), pp. 196-216.
[30] Y. Saad, Variations on Arnoldi's method for computing eigenelements of large unsymmetric matrices, Linear Algebra and Its Applications, 34 (1980), pp. 269-295.
[31] Y. Saad, Numerical Methods for Large Eigenvalue Problems, Halsted Press, 1992.
[32] M. Sadkane, A block Arnoldi-Chebyshev method for computing the leading eigenpairs of large sparse unsymmetric matrices, Numerische Mathematik, 64 (1993), pp. 181-193.
[33] J. A. Scott, An Arnoldi code for computing selected eigenvalues of sparse real unsymmetric matrices, ACM Transactions on Mathematical Software, 21 (1995), pp. 432-475.
[34] D. C. Sorensen, Implicit application of polynomial filters in a k-step Arnoldi method, SIAM Journal on Matrix Analysis and Applications, 13 (1992), pp. 357-385.
[35] G. W.
Stewart, ALGORITHM 506: HQR3 and EXCHNG: Fortran subroutines for calculating and ordering the eigenvalues of a real upper Hessenberg matrix [F2], ACM Transactions on Mathematical Software, 2 (1976), pp. 275-280.
[36] G. W. Stewart, Perturbation bounds for the QR factorization of a matrix, SIAM Journal on Numerical Analysis, 14 (1977), pp. 509-518.
[37] W. Stewart and A. Jennings, A simultaneous iteration algorithm for real matrices, ACM Transactions on Mathematical Software, 7 (1981), pp. 184-198.
[38] J. M. Varah, On the separation of two matrices, SIAM Journal on Numerical Analysis, 16 (1979), pp. 216-222.
[39] D. S. Watkins, Forward stability and transmission of shifts in the QR algorithm, SIAM Journal on Matrix Analysis and Applications, 16 (1995), pp. 469-487.
[40] J. H. Wilkinson, The Algebraic Eigenvalue Problem, Clarendon Press, Oxford, UK, 1965.


Fig. 3.1. The set of rectangles represents the matrix equation AV_{k+p} = V_{k+p}H_{k+p} + f_{k+p}e_{k+p}^T of an Arnoldi factorization. The unshaded region on the right is a zero matrix of k + p − 1 columns.

Fig. 3.2. After performing p implicitly shifted QR steps on H_{k+p}, the middle set of pictures illustrates V_{k+p}Q(Q^T H_{k+p}Q) + f_{k+p}e_{k+p}^T Q. The last p + 1 columns of f_{k+p}e_{k+p}^T Q are nonzero because of the QR iteration.

Fig. 3.3. After discarding the last p columns, the final set represents AV_k = V_kH_k + f_ke_k^T, a length-k Arnoldi factorization.


Fig. 6.1. The matrix product V_mH_m of the factorization upon entering Algorithm 6.2 or 6.3. The shaded region corresponds to the converged (locked) portion of the factorization; the remainder is the active factorization.

Fig. 6.2. The matrix product V_mH_m of the factorization just prior to discarding in Algorithm 6.3. The darkly shaded regions (the vectors to be purged) may now be dropped from the factorization, leaving the locked vectors and the active factorization.


Fig. 10.1. Bar graph of the number of matrix-vector products used by an IRA iteration for Example 2 as a function of p (p from 0 to 16 on the horizontal axis; 0 to 1200 matrix-vector products on the vertical axis).

