REORTHOGONALIZATION FOR THE GOLUB–KAHAN–LANCZOS BIDIAGONAL REDUCTION: PART I – SINGULAR VALUES
JESSE L. BARLOW ∗
Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802–6822 USA. email: barlow@cse.psu.edu
Abstract. The Golub–Kahan–Lanczos bidiagonal reduction generates a factorization of a matrix X ∈ R^{m×n}, m ≥ n, such that

X = UBV^T

where U ∈ R^{m×n} is left orthogonal, V ∈ R^{n×n} is orthogonal, and B ∈ R^{n×n} is bidiagonal. When the Lanczos recurrence is implemented in finite precision arithmetic, the columns of U and V tend to lose orthogonality, making a reorthogonalization strategy necessary to preserve convergence of the singular values.
It is shown that if

orth(V) = ‖I − V^T V‖_2,

then the singular values of B and those of X satisfy

( ∑_{j=1}^n (σ_j(X) − σ_j(B))^2 )^{1/2} ≤ O(ε_M + orth(V))‖X‖_2,

where ε_M is machine precision. Moreover, a strategy is introduced for neglecting small off-diagonal elements during reorthogonalization that preserves the above bound on the singular values.
AMS subject classifications. 65F15, 65F25.
Key words. Lanczos vectors, orthogonality, singular values, left orthogonal matrix.
1. Introduction. Bidiagonal reduction, the first step in many algorithms for computing the singular value decomposition (SVD) [10, 2], is also used for solving least squares problems [20, 17], for solving ill-posed problems [9, 5, 13], for the computation of matrix functions [7], [12, §13.2.3], for matrix approximation [4], and for the solution of the Netflix problem in [16].
In [10], Golub and Kahan give two Lanczos-based bidiagonal reduction algorithms, which we call the Golub–Kahan–Lanczos (GKL) algorithms. The first GKL algorithm takes a matrix X ∈ R^{m×n}, m ≥ n, and generates the factorization
X = UBV^T  (1.1)

with

U = (u_1, …, u_n) ∈ R^{m×n}, left orthogonal,  (1.2)
V = (v_1, …, v_n) ∈ R^{n×n}, orthogonal,  (1.3)
∗ The research of Jesse L. Barlow was supported by the National Science Foundation under grant no. CCF-0429481.
and B ∈ R^{n×n} having the upper bidiagonal form

B = | γ_1  φ_2                |
    |      γ_2  φ_3           |
    |           ⋱     ⋱       |
    |           γ_{n−1}  φ_n  |
    |                    γ_n  |

  def= ubidiag(γ_1, …, γ_n; φ_2, …, φ_n).  (1.4)
For certain structured matrices, even with reorthogonalization, this GKL algorithm yields a faster method of producing a bidiagonal reduction to compute the complete singular value decomposition. For large sparse matrices, it is often the method of choice to compute a few singular values and associated singular vectors.
The recurrence generating the decomposition (1.1)–(1.4) is constructed by choosing a vector v_1 ∈ R^n such that ‖v_1‖_2 = 1, letting u_k ∈ R^m, k = 1, …, n, and v_k ∈ R^n, k = 2, …, n, be unit vectors, and letting γ_k, φ_k, k = 1, …, n, be scaling constants such that
γ_1 u_1 = X v_1,  (1.5)
φ_{k+1} v_{k+1} = X^T u_k − γ_k v_k,   k = 1, …, n − 1,  (1.6)
γ_{k+1} u_{k+1} = X v_{k+1} − φ_{k+1} u_k.  (1.7)
The other GKL algorithm in [10] starts with u_1 and instead generates a lower bidiagonal matrix. The discussion below also applies to that recurrence if we note that the second GKL algorithm is just the first applied to [u_1, X] with v_1 = e_1. For our purposes, it is best to associate V with the minimum of the two dimensions m and n of X. The recurrence (1.5)–(1.7) is equivalent to the symmetric Lanczos tridiagonalization algorithm performed on the matrix

M = [ 0, X^T ; X, 0 ]  (1.8)

with the starting vector [v_1; 0].
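As a quick numerical sanity check (my own illustration, not from the paper), the equivalence can be seen from the spectrum of M: its eigenvalues are ±σ_i(X) together with m − n zeros.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4
X = rng.standard_normal((m, n))

# The symmetric matrix M of (1.8).  Its eigenvalues are +/- the singular
# values of X plus m - n zeros, which is why tridiagonalizing M with the
# starting vector [v1; 0] reproduces the bidiagonal reduction of X.
M = np.block([[np.zeros((n, n)), X.T],
              [X, np.zeros((m, m))]])
eigs = np.sort(np.linalg.eigvalsh(M))[::-1][:n]   # n largest = +sigma_i(X)
svals = np.linalg.svd(X, compute_uv=False)
assert np.allclose(eigs, svals)
```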
Since the vectors u_1, …, u_n and v_1, …, v_n tend to lose orthogonality in finite precision arithmetic, reorthogonalization is performed when the bidiagonal reduction algorithm (1.5)–(1.7) is used to compute the singular value decomposition as in [10], in regularization algorithms as in [5, 9], or in the computation of matrix functions as in [7].
Paige [18] pointed out that the loss of orthogonality in Lanczos reductions is structured in the sense that it is coincident with the convergence of approximate eigenvalues and eigenvectors (called Ritz values and vectors). Parlett and Scott [22] used this observation to develop partial reorthogonalization procedures. A good summary of the surrounding issues is given by Parlett [21, Chapter 13].
To understand how the algorithm works with reorthogonalization of V, we define the loss of orthogonality measures

orth(V) def= ‖I − V^T V‖_2,  (1.9)

η_k = ‖V_{k−1}^T v_k‖_2,   η̄_k = ( ∑_{j=1}^k η_j^2 )^{1/2}.  (1.10)
Noting that orth(V_k) satisfies the upper bound

orth(V_k) ≤ ‖I − V_k^T V_k‖_F ≤ √2 ( ∑_{j=1}^k η_j^2 )^{1/2} = √2 η̄_k,  (1.11)

and the lower bound

orth(V_k) ≥ max_{1≤j≤k} η_j ≥ (1/√k) η̄_k,

we have that orth(V_k) and η̄_k are large or small together. Thus we express our bounds in terms of η̄_k with the understanding that, with minor modification, they could be expressed in terms of orth(V_k).
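These inequalities are easy to verify numerically. The sketch below (my own illustration, not the paper's) builds a mildly non-orthogonal V with unit columns and checks the upper bound (1.11) and the lower bound above.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 5
# Nearly orthonormal columns with a small, controlled loss of orthogonality.
V, _ = np.linalg.qr(rng.standard_normal((n, k)))
V = V + 1e-4 * rng.standard_normal((n, k))
V /= np.linalg.norm(V, axis=0)          # unit columns, as in the recurrence

# eta_j = ||V_{j-1}^T v_j||_2 for j = 2, ..., k, and their aggregate eta_bar.
eta = [np.linalg.norm(V[:, :j].T @ V[:, j]) for j in range(1, k)]
eta_bar = np.sqrt(np.sum(np.square(eta)))
orth = np.linalg.norm(np.eye(k) - V.T @ V, 2)

# Upper bound (1.11) and the lower bound following it.
assert orth <= np.sqrt(2) * eta_bar + 1e-12
assert orth >= eta_bar / np.sqrt(k) - 1e-12
```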
The singular values of B, given by
σ1(B) ≥ σ2(B) ≥ · · · ≥ σn(B),
and the corresponding singular values of X satisfy an O(ε_M + η̄_n) bound given in equation (3.25) in Theorem 3.6. Thus the accuracy of the computed singular values depends upon our ability to preserve the orthogonality of V. These results are similar to those for a procedure due to Barlow, Bosner, and Drmač [2] that generates V using Householder transformations and U by the recurrences (1.5)–(1.7).
We structure this paper as follows. In §2, we establish the framework for the analysis in §3. In §3, we prove our main theorem (Theorem 3.1) and results on the singular values of B. In §4, we give three reorthogonalization strategies for V and give a method for neglecting small superdiagonal elements resulting from reorthogonalization. In §5, we give numerical tests based upon regulating the orthogonality of V in various ways, which we follow with a conclusion in §6.
In Part II of this work [1], this author uses Theorem 3.1 to produce an algorithm that computes left singular vectors with stronger residual and orthogonality bounds than previous versions of the GKL algorithm in the literature.
2. The Lanczos Bidiagonal Recurrence with Reorthogonalization. In exact arithmetic, the columns of V in (1.3), computed according to (1.5)–(1.7), are orthonormal, but, in floating point arithmetic, some reorthogonalization of these vectors is necessary. A model of how that reorthogonalization could be done is proposed and analyzed below.
To recover v_{k+1} from v_1, …, v_k and u_1, …, u_k, we compute

r_k = X^T u_k − γ_k v_k.  (2.1)

We then reorthogonalize r_k against v_1, …, v_k so that

φ_{k+1} v_{k+1} = r_k − ∑_{j=1}^k h_{j,k+1} v_j  (2.2)
             = r_k − V_k h_{k+1},   h_{k+1} = (h_{1,k+1}, …, h_{k,k+1})^T,  (2.3)

for some coefficients h_{j,k+1}, j = 1, …, k. Combining (2.1) and (2.3), we have that

φ_{k+1} v_{k+1} = X^T u_k − V_k ĥ_{k+1}  (2.4)
where ĥ_{k+1} = γ_k e_k + h_{k+1}. To encapsulate our approaches to reorthogonalization, we assume the existence of a general function reorthog that performs step (2.3) in some manner. Thus the (k + 1)st Lanczos vector comes from (2.1) followed by
[v_{k+1}, h_{k+1}, φ_{k+1}] = reorthog(B_k, V_k, r_k),  (2.5)
where

B_k = ubidiag(γ_1, …, γ_k; φ_2, …, φ_k)  (2.6)

may provide necessary information for the partial reorthogonalization schemes. In floating point arithmetic, we assume that the steps (2.1) and (2.5) produce vectors v_{k+1} and h_{k+1}, and a scalar φ_{k+1}, such that
X^T u_k = V_k ĥ_{k+1} + φ_{k+1} v_{k+1} + β_{k+1}  (2.7)
where

‖β_{k+1}‖_2 ≤ ε_M q(m)‖X‖_2  (2.8)

for some modest sized function q(m). The value of q(m) varies depending upon which orthogonalization method is used, but, for, say, the complete reorthogonalization scheme in Function 4.1, we would have q(m) = O(m). In general, we have the recurrence
X^T U_k = V_{k+1} H_{k+1} + E_k  (2.9)
where

H_2 = [ ĥ_2 ; φ_2 ],   H_{k+1} = [ H_k, ĥ_{k+1} ; 0, φ_{k+1} ],   E_k = (β_2, …, β_{k+1}).
The following function specifies the first k steps of the Lanczos bidiagonal reduction.
Function 2.1 (First k steps of Lanczos Bidiagonal Reduction with reorthogonalization).
function [B_k, U_k, V_k] = lanczos_bidiag(X, v_1, k)
  V_1 = (v_1); s_1 = X v_1; γ_1 = ‖s_1‖_2; u_1 = s_1/γ_1;
  for j = 2 : k
    r_j = X^T u_{j−1} − γ_{j−1} v_{j−1};
    [v_j, h_j, φ_j] = reorthog(B_{j−1}, V_{j−1}, r_j);
    s_j = X v_j − φ_j u_{j−1}; γ_j = ‖s_j‖_2; u_j = s_j/γ_j;
    V_j = [V_{j−1}, v_j]; U_j = [U_{j−1}, u_j];
    B_j = [ B_{j−1}, φ_j e_{j−1} ; 0, γ_j ];
  end;
end lanczos_bidiag
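A minimal NumPy rendering of Function 2.1 is sketched below. The reorthog stand-in simply applies classical Gram–Schmidt twice (one of the complete-reorthogonalization options of §4.1, without the acceptance tests); the function names are mine, not the paper's.

```python
import numpy as np

def reorthog(V, r):
    """Stand-in for the generic reorthog of (2.5): classical Gram-Schmidt
    applied twice ('twice is enough')."""
    h = V.T @ r
    r = r - V @ h
    h2 = V.T @ r
    r = r - V @ h2
    phi = np.linalg.norm(r)
    return r / phi, h + h2, phi

def lanczos_bidiag(X, v1, k):
    """First k steps of Function 2.1: produces X*V = U*B, B upper bidiagonal."""
    m, n = X.shape
    V = np.zeros((n, k)); U = np.zeros((m, k)); B = np.zeros((k, k))
    V[:, 0] = v1
    s = X @ v1
    B[0, 0] = np.linalg.norm(s)             # gamma_1
    U[:, 0] = s / B[0, 0]
    for j in range(1, k):
        r = X.T @ U[:, j - 1] - B[j - 1, j - 1] * V[:, j - 1]
        V[:, j], _, B[j - 1, j] = reorthog(V[:, :j], r)   # phi_{j+1}, h discarded
        s = X @ V[:, j] - B[j - 1, j] * U[:, j - 1]
        B[j, j] = np.linalg.norm(s)         # gamma_{j+1}
        U[:, j] = s / B[j, j]
    return B, U, V

rng = np.random.default_rng(2)
X = rng.standard_normal((8, 5))
v1 = np.zeros(5); v1[0] = 1.0
B, U, V = lanczos_bidiag(X, v1, 5)
# After n steps, X*V = U*B up to roundoff, so B and X share singular values.
assert np.allclose(X @ V, U @ B)
assert np.allclose(np.linalg.svd(B, compute_uv=False),
                   np.linalg.svd(X, compute_uv=False))
```

Note that, as in the text, the coefficient vector h is discarded; only B, U, and V are kept.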
We discuss three specific methods for performing the reorthogonalization in reorthog in §4. In the remainder of this section, we discuss a model for that loss that leads to the analysis in §3.
Although h_k is discarded by Function 2.1 in the construction of U_k, B_k, and V_k, we show that throwing out this information affects the accuracy of the singular values of B only through the loss of orthogonality of V.
We assume orth(V_k) < 1 for all k; otherwise V_k is not meaningfully close to left orthogonal. Using the definition of orth(V_k) in (1.9), the singular values of V_k are bounded by
σ_1(V_k) = √(λ_1(V_k^T V_k)) ≤ √(λ_1(I_k) + ‖I − V_k^T V_k‖_2) = √(1 + orth(V_k)),  (2.10)
σ_k(V_k) = √(λ_k(V_k^T V_k)) ≥ √(λ_k(I_k) − ‖I − V_k^T V_k‖_2) = √(1 − orth(V_k)).  (2.11)

If V_k^† is the Moore–Penrose pseudoinverse of V_k, then

‖V_k‖_2 = σ_1(V_k) ≤ (1 + orth(V_k))^{1/2},  (2.12)
‖V_k^†‖_2 = σ_k(V_k)^{−1} ≤ (1 − orth(V_k))^{−1/2}.  (2.13)
Equation (2.7) can be rewritten

X^T u_k − β_{k+1} = V_{k+1} [ ĥ_{k+1} ; φ_{k+1} ].

Using our assumption that orth(V_{k+1}) < 1, V_{k+1} must have full column rank, so V_{k+1}^† satisfies V_{k+1}^† V_{k+1} = I_{k+1}; thus

[ ĥ_{k+1} ; φ_{k+1} ] = V_{k+1}^† (X^T u_k − β_{k+1}).
Adding the assumption that orth(V_{k+1}) and q(m)ε_M are sufficiently small that

ω def= (1 + q(m)ε_M)/(1 − orth(V_{k+1}))^{1/2}  (2.14)

is a reasonable constant, we infer that

‖H_{k+1} e_k‖_2 = ‖[ ĥ_{k+1} ; φ_{k+1} ]‖_2 ≤ ‖V_{k+1}^†‖_2 (‖X^T u_k‖_2 + ‖β_{k+1}‖_2)
             ≤ [(1 + q(m)ε_M)/(1 − orth(V_{k+1}))^{1/2}] ‖X‖_2
             = ω‖X‖_2.  (2.15)

Thus, the columns of H_{k+1} are bounded as long as reasonable orthogonality is maintained for V_{k+1}.
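The singular value bounds (2.10)–(2.13) underlying this argument can be checked numerically. The following sketch (my own illustration) perturbs an orthonormal V_k and verifies that its extreme singular values stay within the orth(V_k) envelope.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 10, 6
V, _ = np.linalg.qr(rng.standard_normal((n, k)))
V = V + 1e-3 * rng.standard_normal((n, k))   # mild loss of orthogonality

orth = np.linalg.norm(np.eye(k) - V.T @ V, 2)
s = np.linalg.svd(V, compute_uv=False)
# (2.10)-(2.13): the singular values of V_k, hence ||V_k||_2 and ||V_k^+||_2,
# are controlled by orth(V_k).
assert s[0] <= np.sqrt(1 + orth) + 1e-12
assert s[-1] >= np.sqrt(1 - orth) - 1e-12
assert 1.0 / s[-1] <= (1 - orth) ** (-0.5) + 1e-12
```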
3. Error Bounds for GKL Bidiagonalization with One-Sided Reorthogonalization. The results in this paper and in [1] are based upon Theorem 3.1, stated next.
Theorem 3.1. Let Function 2.1 be implemented in floating point arithmetic with machine unit ε_M. Assume that V_k = (v_1, …, v_k), with orthogonality parametrized by η̄_k in (1.10), U_k = (u_1, …, u_k), and B_k = ubidiag(γ_1, …, γ_k; φ_2, …, φ_k) are output from that function. Assume also that orth(V_k) < 1. Define

C_k = [ 0 ; X V_k ]  (n zero rows above the m rows of X V_k),  (3.1)

W_j = I − w_j w_j^T,   w_j = [ −e_j ; u_j ],  (3.2)

𝒲_k = W_1 ⋯ W_k.  (3.3)

If q(m) is defined in (2.8) and ω is given by (2.14), then for k = 1, …, n,

C_k + δC_k = 𝒲_k [ B_k ; 0 ]  (B_k above m + n − k zero rows)  (3.4)

where

‖δC_k‖_F ≤ [f_1(m, n, k)ε_M + f_2(k)η̄_k]‖X‖_2 + O(ε_M^2)  (3.5)

and

f_1(m, n, k) = √k [√(2/3) k q(m) + m + n + 2],   f_2(k) = ω √(2/3) k^{3/2}.  (3.6)
The matrix 𝒲_k is orthogonal because u_1, …, u_k are unit vectors, so that each W_j is a Householder transformation. Some details of the form of 𝒲_k are given in [19, Theorem 2.1].
Three technical lemmas are necessary to prove the result (3.4)–(3.6); the first concerns the effect of W_{k−1}.
Lemma 3.2. Let φ_k, γ_k, u_k, and v_k be computed by the kth step of Function 2.1. Let W_j be defined in (3.2). Then, for k ≥ 2,

W_{k−1} [ φ_k e_{k−1} ; X v_k − φ_k u_{k−1} ] = [ 0 ; X v_k ] + δz_{k−1}  (3.7)

where

‖δz_{k−1}‖_2 ≤ √2 (ω η_k + q(m)ε_M)‖X‖_2.  (3.8)
Proof. We have that

W_{k−1} [ φ_k e_{k−1} ; X v_k − φ_k u_{k−1} ]
  = W_{k−1} [ φ_k e_{k−1} ; 0 ] + W_{k−1} [ 0 ; X v_k − φ_k u_{k−1} ]
  = [ 0 ; φ_k u_{k−1} ] + [ 0 ; X v_k − φ_k u_{k−1} ] − u_{k−1}^T (X v_k − φ_k u_{k−1}) w_{k−1}
  = [ 0 ; X v_k ] − u_{k−1}^T (X v_k − φ_k u_{k−1}) w_{k−1}.  (3.9)

To bound the last term, we note that from (2.7) we have

X^T u_{k−1} = V_{k−1} ĥ_k + φ_k v_k + β_k,   ‖β_k‖_2 ≤ q(m)ε_M‖X‖_2,

so that

u_{k−1}^T X v_k = φ_k v_k^T v_k + ĥ_k^T V_{k−1}^T v_k + β_k^T v_k = φ_k + δφ_k  (3.10)

where

δφ_k = ĥ_k^T V_{k−1}^T v_k + β_k^T v_k.

Thus

|δφ_k| ≤ ‖V_{k−1}^T v_k‖_2 ‖ĥ_k‖_2 + ‖β_k‖_2
      ≤ ‖V_{k−1}^T v_k‖_2 ‖H_k e_{k−1}‖_2 + ‖β_k‖_2
      ≤ (ω η_k + q(m)ε_M)‖X‖_2.  (3.11)

Combining (3.9), (3.10), and (3.11), we have (3.7) where

δz_{k−1} = −(δφ_k) w_{k−1}.

Thus ‖δz_{k−1}‖_2 = |δφ_k| ‖w_{k−1}‖_2. Since ‖w_{k−1}‖_2 = ‖(−e_{k−1}^T, u_{k−1}^T)^T‖_2 = √2, we have the bound (3.8) for ‖δz_{k−1}‖_2.
Our second lemma bounds the effect of W_j, j = 1, 2, …, k − 2.
Lemma 3.3. Assume the hypothesis and notation of Lemma 3.2. For k ≥ 3 and j ≤ k − 2, we have

W_j [ 0 ; X v_k ] = [ 0 ; X v_k ] + δz_j  (3.12)

where

‖δz_j‖_2 ≤ √2 (ω η_k + q(m)ε_M)‖X‖_2.  (3.13)
Proof. First, we note that

W_j [ 0 ; X v_k ] = [ 0 ; X v_k ] − w_j w_j^T [ 0 ; X v_k ] = [ 0 ; X v_k ] − (u_j^T X v_k) w_j.  (3.14)

Again, using (2.7), we have

X^T u_j = V_{j+1} [ ĥ_{j+1} ; φ_{j+1} ] + β_{j+1}.

Thus

|u_j^T X v_k| = |[ ĥ_{j+1} ; φ_{j+1} ]^T V_{j+1}^T v_k + β_{j+1}^T v_k|
            ≤ ‖H_{j+1} e_j‖_2 ‖V_{j+1}^T v_k‖_2 + ‖β_{j+1}‖_2
            ≤ ω‖X‖_2 η_k + ‖β_{j+1}‖_2,

since j + 1 ≤ k − 1 implies ‖V_{j+1}^T v_k‖_2 ≤ ‖V_{k−1}^T v_k‖_2 = η_k. Therefore, using the bound in (2.15), we have

|u_j^T X v_k| ≤ [ω η_k + q(m)ε_M]‖X‖_2.

Using (3.14) yields

δz_j = −(u_j^T X v_k) w_j,

so (3.13) follows from

‖δz_j‖_2 = |u_j^T X v_k| ‖w_j‖_2 ≤ √2 [ω η_k + q(m)ε_M]‖X‖_2.
We now combine Lemmas 3.2 and 3.3 to give the effect of the product of Householder transformations.
Lemma 3.4. Assume the hypothesis and notation of Lemma 3.2. Let 𝒲_k = W_1 ⋯ W_k be given by (3.3). Then

𝒲_k [ φ_k e_{k−1} ; γ_k e_1 ] = [ 0 ; X v_k ] + δc_k  (3.15)

where

‖δc_k‖_2 ≤ [√2 (k − 1)(ω η_k + q(m)ε_M) + (m + n + 2)ε_M]‖X‖_2.  (3.16)
Proof. First, we note that W_k in (3.2) is defined so that

W_k [ φ_k e_{k−1} ; γ_k e_1 ] = [ φ_k e_{k−1} ; γ_k u_k ],  (3.17)

where on the left the blocks have k − 1 and s = m + n + 1 − k rows, and on the right n and m rows; thus

𝒲_k [ φ_k e_{k−1} ; γ_k e_1 ] = 𝒲_{k−1} W_k [ φ_k e_{k−1} ; γ_k e_1 ] = 𝒲_{k−1} [ φ_k e_{k−1} ; γ_k u_k ].

From the Lanczos recurrence,

(γ_k + δγ_k) u_k = s_k = fl(X v_k − φ_k u_{k−1}) = X v_k − φ_k u_{k−1} − δs_k

where

|δγ_k| ≤ m ε_M |γ_k| + O(ε_M^2) ≤ m ε_M ‖X‖_2 + O(ε_M^2),
‖δs_k‖_2 ≤ (n + 2) ε_M ‖X‖_2 + O(ε_M^2).
Thus,

γ_k u_k = X v_k − φ_k u_{k−1} + δz_k  (3.18)

where

δz_k = −δs_k − (δγ_k) u_k.

That yields the bound

‖δz_k‖_2 ≤ (m + n + 2) ε_M ‖X‖_2 + O(ε_M^2).

Thus, if we let δz̄_k = [ 0 ; δz_k ] (n zero rows above the m-vector δz_k),
then using (3.17) and (3.18) and a simple recurrence, we have

𝒲_k [ φ_k e_{k−1} ; γ_k e_1 ] = 𝒲_{k−1} [ φ_k e_{k−1} ; γ_k u_k ]
  = 𝒲_{k−1} [ φ_k e_{k−1} ; X v_k − φ_k u_{k−1} ] + 𝒲_{k−1} δz̄_k
  = 𝒲_{k−2} W_{k−1} [ φ_k e_{k−1} ; X v_k − φ_k u_{k−1} ] + 𝒲_{k−1} δz̄_k
  = 𝒲_{k−2} [ 0 ; X v_k ] + 𝒲_{k−2} δz_{k−1} + 𝒲_{k−1} δz̄_k,

the last step by Lemma 3.2.
After k − 2 applications of Lemma 3.3, this becomes

𝒲_k [ φ_k e_{k−1} ; γ_k e_1 ] = [ 0 ; X v_k ] + ∑_{j=1}^{k−1} 𝒲_j δz_{j+1} = [ 0 ; X v_k ] + δc_k

where (with δz_k interpreted as the padded vector δz̄_k)

δc_k = ∑_{j=1}^{k−1} 𝒲_j δz_{j+1}.

Thus

‖δc_k‖_2 ≤ ∑_{j=1}^{k−1} ‖𝒲_j δz_{j+1}‖_2 = ∑_{j=1}^{k−1} ‖δz_{j+1}‖_2
        ≤ [√2 (k − 1)(ω η_k + q(m)ε_M) + (m + n + 2)ε_M]‖X‖_2 + O(ε_M^2),
establishing the result.
We now prove Theorem 3.1.
Proof (of Theorem 3.1). We use induction on k. For k = 1, we have

B_1 = (γ_1), U_1 = (u_1), V_1 = (v_1).

Thus

s_1 = fl(X v_1) = (γ_1 + δγ_1) u_1,   |δγ_1| ≤ m |γ_1| ε_M + O(ε_M^2),  (3.19)

and

s_1 + δs_1 = X v_1,   ‖δs_1‖_2 ≤ n ε_M ‖X‖_2 + O(ε_M^2).  (3.20)

Combining (3.19) and (3.20), we have

γ_1 u_1 = X v_1 − δs_1 − (δγ_1) u_1 = X v_1 + δc_1,   δc_1 = −δs_1 − (δγ_1) u_1,

where

‖δc_1‖_2 ≤ (m + n)‖X‖_2 ε_M + O(ε_M^2).
Rewriting the above in terms of W_1, we have

W_1 [ B_1 ; 0 ] = (I − w_1 w_1^T) γ_1 e_1 = [ 0 ; γ_1 u_1 ] = [ 0 ; X v_1 + δc_1 ] = [ 0 ; X v_1 ] + δc̄_1,   δc̄_1 = [ 0 ; δc_1 ],

where

‖δc̄_1‖_2 ≤ (m + n)‖X‖_2 ε_M + O(ε_M^2),

thus establishing the result for k = 1.
To construct the induction step, we write
𝒲_k [ B_k ; 0 ] = 𝒲_k [ B_{k−1}, φ_k e_{k−1} ; 0, γ_k e_1 ]
  = [ 𝒲_k [ B_{k−1} ; 0 ],  𝒲_k [ φ_k e_{k−1} ; γ_k e_1 ] ]
  = [ 𝒲_{k−1} [ B_{k−1} ; 0 ],  𝒲_k [ φ_k e_{k−1} ; γ_k e_1 ] ]
  = [ [ 0 ; X V_{k−1} ] + δC_{k−1},  𝒲_k [ φ_k e_{k−1} ; γ_k e_1 ] ]

where

δC_{k−1} = (δc̄_1, …, δc̄_{k−1}).
From Lemma 3.4, we have (3.15) and (3.16). Thus

𝒲_k [ B_k ; 0 ] = [ 0 ; X V_k ] + δC_k,   δC_k = (δc̄_1, …, δc̄_k),

where δc̄_k is the vector δc_k from (3.15).
The bound on ‖δC_k‖_F in (3.5)–(3.6) comes from

‖δC_k‖_F = ( ∑_{j=1}^k ‖δc̄_j‖_2^2 )^{1/2}
        ≤ ( ∑_{j=1}^k [√2 (j − 1)(ω η_j + q(m)ε_M) + (m + n + 2)ε_M]^2 )^{1/2} ‖X‖_2 + O(ε_M^2).

Using the triangle inequality, this becomes

‖δC_k‖_F ≤ ( ∑_{j=1}^k (√2 ω (j − 1) η_j)^2 )^{1/2} ‖X‖_2
         + [ ( ∑_{j=1}^k (√2 (j − 1) q(m) ε_M)^2 )^{1/2} + ( ∑_{j=1}^k ((m + n + 2)ε_M)^2 )^{1/2} ] ‖X‖_2 + O(ε_M^2).  (3.21)

Since ∑_{j=1}^k (j − 1)^2 ≤ k^3/3 and ∑_j a_j^2 η_j^2 ≤ (∑_j a_j^2)(∑_j η_j^2), the first two terms satisfy

( ∑_{j=1}^k (√2 ω (j − 1) η_j)^2 )^{1/2} ≤ √2 ω ( ∑_{j=1}^k (j − 1)^2 )^{1/2} η̄_k ≤ ω √(2/3) k^{3/2} η̄_k,  (3.22)

( ∑_{j=1}^k (√2 (j − 1) q(m) ε_M)^2 )^{1/2} = √2 q(m) ε_M ( ∑_{j=1}^k (j − 1)^2 )^{1/2} ≤ √(2/3) k^{3/2} q(m) ε_M,  (3.23)

while the third term equals √k (m + n + 2) ε_M;
thus combining (3.21) with (3.22)–(3.23), we have (3.5)–(3.6).
To use Theorem 3.1 to bound the distance between the singular values of B = B_n and those of X, we need a lemma that bounds the difference between the singular values of X and XV.
Lemma 3.5. Let V = V_n be the result of n steps of Function 2.1. If σ_k(X) is the kth singular value of X, then

σ_k(XV)/(1 + orth(V))^{1/2} ≤ σ_k(X) ≤ σ_k(XV)/(1 − orth(V))^{1/2}.  (3.24)

Proof. We use the inequality in [15, p. 419, Corollary 7.3.8] of the form

σ_k(X) σ_n(V) ≤ σ_k(XV) ≤ σ_k(X) σ_1(V).
Using the bounds in (2.10)–(2.11) on V, we get (3.24).
Using Lemma 3.5 together with Theorem 3.1, we obtain a bound on the singular values of B in terms of those of X.
Theorem 3.6. Assume the hypothesis and terminology of Theorem 3.1. Excluding terms of O(ε_M^2 + η̄_n^2), the singular values of X and those of B are related by

( ∑_{j=1}^n (σ_j(X) − σ_j(B))^2 )^{1/2} ≤ ‖δC_n‖_F + ‖X‖_F [(1 − orth(V))^{−1/2} − 1]  (3.25)
                                       ≤ [f_1(m, n, n) ε_M + (f_2(n) + n/2) η̄_n] ‖X‖_2.  (3.26)
Proof. From the triangle inequality applied to the two norm,

( ∑_{j=1}^n (σ_j(X) − σ_j(B))^2 )^{1/2} ≤ ( ∑_{j=1}^n (σ_j(X) − σ_j(XV))^2 )^{1/2} + ( ∑_{j=1}^n (σ_j(XV) − σ_j(B))^2 )^{1/2}.
From Theorem 3.1 and the Wielandt–Hoffman theorem [11, p. 450, Thm. 8.6.4], we have that

( ∑_{k=1}^n (σ_k(XV) − σ_k(B))^2 )^{1/2} ≤ ‖δC_n‖_F.  (3.27)
From Lemma 3.5, we have that

|σ_k(X) − σ_k(XV)| ≤ σ_k(X) max{1 − (1 − orth(V))^{1/2}, (1 + orth(V))^{1/2} − 1}
                  ≤ [(1 − orth(V))^{−1/2} − 1] σ_k(X).

Using this inequality, we obtain the bound

∑_{k=1}^n (σ_k(X) − σ_k(XV))^2 ≤ [(1 − orth(V))^{−1/2} − 1]^2 ∑_{k=1}^n σ_k(X)^2
                               = [(1 − orth(V))^{−1/2} − 1]^2 ‖X‖_F^2.  (3.28)

Combining (3.27) and (3.28) yields (3.25). To obtain (3.26), note the bound on ‖δC_n‖_F in (3.5)–(3.6) and that

(1 − orth(V))^{−1/2} − 1 = (1/2) orth(V) + O(orth^2(V)) ≤ (√n/2) η̄_n + O(η̄_n^2).  (3.29)

Since ‖X‖_F ≤ √n ‖X‖_2, (3.26) results from combining (3.29) with (3.25).
The bound (3.25) shows that the singular values of B are close to those of X as long as orth(V) is kept small.
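This behavior can be observed in a small experiment in the spirit of §5 (my own sketch, not the paper's tests): run the GKL recurrence with and without reorthogonalization of the right Lanczos vectors on a graded matrix, and check that the Wielandt–Hoffman-style error stays within a modest multiple of (ε_M + orth(V))‖X‖_2. The constant 10^4 below is an ad hoc choice for this illustration.

```python
import numpy as np

def gkl(X, v1, k, reorth):
    """Plain GKL recurrence (1.5)-(1.7); if reorth is True, the right Lanczos
    vectors are reorthogonalized with two Gram-Schmidt passes."""
    m, n = X.shape
    V = np.zeros((n, k)); U = np.zeros((m, k)); B = np.zeros((k, k))
    V[:, 0] = v1
    s = X @ v1
    B[0, 0] = np.linalg.norm(s); U[:, 0] = s / B[0, 0]
    for j in range(1, k):
        r = X.T @ U[:, j - 1] - B[j - 1, j - 1] * V[:, j - 1]
        if reorth:
            r -= V[:, :j] @ (V[:, :j].T @ r)
            r -= V[:, :j] @ (V[:, :j].T @ r)
        B[j - 1, j] = np.linalg.norm(r); V[:, j] = r / B[j - 1, j]
        s = X @ V[:, j] - B[j - 1, j] * U[:, j - 1]
        B[j, j] = np.linalg.norm(s); U[:, j] = s / B[j, j]
    return B, V

# Graded test matrix with geometrically decaying singular values (cf. Sec. 5).
rng = np.random.default_rng(4)
n = 30
P, _ = np.linalg.qr(rng.standard_normal((n, n)))
Z, _ = np.linalg.qr(rng.standard_normal((n, n)))
X = P @ np.diag(np.logspace(0, -10, n)) @ Z.T
v1 = np.zeros(n); v1[0] = 1.0

sx = np.linalg.svd(X, compute_uv=False)
for reorth in (False, True):
    B, V = gkl(X, v1, n, reorth)
    orth = np.linalg.norm(np.eye(n) - V.T @ V, 2)
    err = np.linalg.norm(sx - np.linalg.svd(B, compute_uv=False))
    # The error in the singular values tracks eps_M + orth(V), as in (3.25).
    assert err <= 1e4 * (np.finfo(float).eps + orth) * np.linalg.norm(X, 2)
```

Without reorthogonalization, orth(V) grows to O(1) and the singular values of B degrade accordingly; with reorthogonalization, both stay near roundoff.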
4. Reorthogonalization Strategies. We discuss two common approaches to specifying reorthog from Function 2.1. The first is to use Gram–Schmidt reorthogonalization of v_{k+1} against all previously computed right Lanczos vectors (§4.1); the second is to use a selective reorthogonalization strategy (§4.2). A third method, specified in §4.3, is used for our numerical tests to quantify the effect of loss of orthogonality in V.
4.1. Complete Reorthogonalization. Our strategy for complete reorthogonalization, which grows out of "twice is enough" approaches in [14] and approaches in [8, 24], [21, §6.9], is a version of Gram–Schmidt reorthogonalization given by Barlow et al. [3].
First, we compute

h^{(1)} = V_k^T r_k,   r_k^{(1)} = r_k − V_k h^{(1)}.  (4.1)

If

‖r_k^{(1)}‖_2 ≥ √(4/5) ‖r_k‖_2,  (4.2)

then we accept

φ_{k+1} = ‖r_k^{(1)}‖_2,   v_{k+1} = r_k^{(1)}/φ_{k+1},   h_{k+1} = h^{(1)}.

Otherwise, we compute

h^{(2)} = V_k^T r_k^{(1)},   r_k^{(2)} = r_k^{(1)} − V_k h^{(2)}.  (4.3)

If

‖r_k^{(2)}‖_2 ≥ √(4/5) ‖r_k^{(1)}‖_2,  (4.4)

then we accept

φ_{k+1} = ‖r_k^{(2)}‖_2,   v_{k+1} = r_k^{(2)}/φ_{k+1},   h_{k+1} = h^{(1)} + h^{(2)}.
If either (4.2) or (4.4) holds, we show in the Appendix that, ignoring rounding error, we have

‖V_k^T v_{k+1}‖_2 ≤ 0.5 ξ_k + O(ξ_k^2)

where

ξ_k = ‖I − V_k^T V_k‖_2 = orth(V_k).  (4.5)
If both (4.2) and (4.4) are false, we use a method from [8], modified in [3], to construct φ_{k+1} and v_{k+1}. We find e_J such that

‖V_k^T e_J‖_2 = min_{1≤j≤m} ‖V_k^T e_j‖_2,

and then compute

c^{(1)} = V_k^T e_J,   t^{(1)} = e_J − V_k c^{(1)},
c^{(2)} = V_k^T t^{(1)},   t^{(2)} = t^{(1)} − V_k c^{(2)}.

Then

v_{k+1} = t^{(2)}/‖t^{(2)}‖_2  (4.6)

satisfies

‖V_k^T v_{k+1}‖_2 ≤ √(k/(m − k)) ξ_k^2 + O(ξ_k^4).
For all practical purposes, this choice of v_{k+1} restarts the Lanczos process. We propose two possible ways to choose φ_{k+1}. In exact arithmetic,

X^T u_k = V_k ĥ_{k+1} + r_k^{(2)},

where, in the Appendix, we show that

‖r_k^{(2)}‖_2 = (φ_{k+1}^2 + ‖n_k‖_2^2)^{1/2} ≤ 2 ξ_k (1 + ξ_k) ‖r_k‖_2 ≤ 2 ξ_k (1 + ξ_k) ‖X‖_2,  (4.7)

which is small relative to ‖X‖_2. Our first choice of φ_{k+1}, given by

φ_{k+1} = v_{k+1}^T r_k^{(2)},  (4.8)

produces

X^T u_k = V_k ĥ_{k+1} + φ_{k+1} v_{k+1} + n_k

where ‖n_k‖_2 is minimized over all choices of φ_{k+1}. However, since

|φ_{k+1}| ≤ ‖r_k^{(2)}‖_2 ≤ 2 ξ_k (1 + ξ_k) ‖X‖_2,

our second choice of φ_{k+1}, given by

φ_{k+1} = 0,

neglects an element of size O(ξ_k ‖X‖_2), the magnitude of the bounds on the errors in the singular values in Theorem 3.6.
We encapsulate this algorithm in Function 4.1. The Boolean variable setzero is true if we set φ_{k+1} to zero when (4.2) and (4.4) are false, and false if we compute φ_{k+1} as in (4.8).
Function 4.1 (Gram–Schmidt reorthogonalization of r_k against V_k).
function [v_{k+1}, h_k, φ_{k+1}] = GS_reorthog(V_k, r_k, setzero)
  h^{(1)} = V_k^T r_k; r_k^{(1)} = r_k − V_k h^{(1)};
  if ‖r_k^{(1)}‖_2 ≥ √(4/5) ‖r_k‖_2
    φ_{k+1} = ‖r_k^{(1)}‖_2; v_{k+1} = r_k^{(1)}/φ_{k+1}; h_k = h^{(1)};
  else
    h^{(2)} = V_k^T r_k^{(1)}; r_k^{(2)} = r_k^{(1)} − V_k h^{(2)};
    h_k = h^{(1)} + h^{(2)};
    if ‖r_k^{(2)}‖_2 ≥ √(4/5) ‖r_k^{(1)}‖_2
      φ_{k+1} = ‖r_k^{(2)}‖_2; v_{k+1} = r_k^{(2)}/φ_{k+1};
    else
      Find e_J such that ‖V_k^T e_J‖_2 = min_{1≤j≤m} ‖V_k^T e_j‖_2;
      c^{(1)} = V_k^T e_J; t^{(1)} = e_J − V_k c^{(1)};
      c^{(2)} = V_k^T t^{(1)}; t^{(2)} = t^{(1)} − V_k c^{(2)};
      v_{k+1} = t^{(2)}/‖t^{(2)}‖_2;
      if setzero
        φ_{k+1} = 0;
      else
        φ_{k+1} = v_{k+1}^T r_k^{(2)};
      end;
    end;
  end;
end GS_reorthog
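A NumPy sketch of Function 4.1 follows. The zero-norm guards and the function name gs_reorthog are my additions, not part of the paper's function; otherwise the logic mirrors (4.1)–(4.6).

```python
import numpy as np

def gs_reorthog(V, r, setzero):
    """Sketch of Function 4.1: at most two classical Gram-Schmidt passes,
    with a restart against the least-represented coordinate vector e_J
    when both acceptance tests (4.2) and (4.4) fail."""
    thresh = np.sqrt(4.0 / 5.0)
    h1 = V.T @ r
    r1 = r - V @ h1
    phi1 = np.linalg.norm(r1)
    if phi1 >= thresh * np.linalg.norm(r) and phi1 > 0:
        return r1 / phi1, h1, phi1                 # test (4.2) passed
    h2 = V.T @ r1
    r2 = r1 - V @ h2
    h = h1 + h2
    phi2 = np.linalg.norm(r2)
    if phi2 >= thresh * phi1 and phi2 > 0:
        return r2 / phi2, h, phi2                  # test (4.4) passed
    # Restart: e_J minimizes ||V^T e_j||_2, i.e. the row of V of least norm.
    J = int(np.argmin(np.linalg.norm(V, axis=1)))
    t = np.zeros(V.shape[0]); t[J] = 1.0
    t = t - V @ (V.T @ t)
    t = t - V @ (V.T @ t)
    v = t / np.linalg.norm(t)
    phi = 0.0 if setzero else float(v @ r2)
    return v, h, phi

rng = np.random.default_rng(6)
V, _ = np.linalg.qr(rng.standard_normal((9, 4)))
# Generic r: one or two passes suffice and the new vector is orthogonal to V.
v, h, phi = gs_reorthog(V, rng.standard_normal(9), setzero=True)
assert np.linalg.norm(V.T @ v) < 1e-10 and abs(np.linalg.norm(v) - 1) < 1e-12
# r exactly in range(V): both tests fail, the restart fires, phi is set to 0.
v, h, phi = gs_reorthog(np.eye(9)[:, :4], np.eye(9)[:, 0], setzero=True)
assert phi == 0.0 and np.linalg.norm(v[:4]) == 0.0
```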
4.2. Selective Reorthogonalization. Selective reorthogonalization was created by Parlett and Scott [22] from a result of Paige [18] showing that most of the loss of orthogonality in V_k is confined to converged right singular vectors. The variant of that strategy for this decomposition takes the SVD of B_k given by

B_k = Q Θ S^T,  (4.9)
Q = (q_1, …, q_k) = (q_{ij}),   S = (s_1, …, s_k) = (s_{ij}),  (4.10)
Θ = diag(θ_1, …, θ_k),  (4.11)

and finds indices ℓ_1, …, ℓ_τ such that for a given tolerance tol we have

|φ_{k+1} q_{k,ℓ_j}| ≤ tol · ‖X‖_F,   j = 1, …, τ.

It then lets

S_τ = (s_{ℓ_1}, …, s_{ℓ_τ})

be the corresponding right singular vectors of B_k, so that the matrix

Z_τ = V_k S_τ = (z_{ℓ_1}, …, z_{ℓ_τ})

consists of converged right singular vectors of X. A reorthogonalization procedure, say GS_reorthog, with Z_τ computes a vector t_{k+1} and v_{k+1} according to

[v_{k+1}, t_{k+1}, φ_{k+1}] = GS_reorthog(Z_τ, r_k),

with the resulting h_{k+1} in (2.7) given by

h_{k+1} = S_τ t_{k+1}.

The strategies in our examples in §5 are variants on performing Gram–Schmidt on all previous right Lanczos vectors, thereby allowing us to demonstrate the effect on the orthogonality of V. Since we expect that τ ≪ k, this reorthogonalization practice is often much cheaper than complete reorthogonalization.
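The selection step can be sketched as follows (an illustration only; the function converged_ritz_directions and its signature are mine, not the paper's):

```python
import numpy as np

def converged_ritz_directions(Bk, phi_next, Vk, tol, normX_F):
    """Flag right singular vectors of B_k whose residual estimate
    |phi_{k+1} q_{k,l}| falls below tol*||X||_F, and return
    Z_tau = V_k S_tau, the converged right singular directions."""
    Q, theta, St = np.linalg.svd(Bk)
    S = St.T                                 # columns s_l of (4.10)
    sel = np.abs(phi_next * Q[-1, :]) <= tol * normX_F
    return Vk @ S[:, sel], S[:, sel]

# Small demonstration on a 3x3 upper bidiagonal B_k.
Bk = np.diag([3.0, 2.0, 1.0]) + np.diag([0.1, 0.1], 1)
Vk, _ = np.linalg.qr(np.random.default_rng(7).standard_normal((6, 3)))
Z_all, _ = converged_ritz_directions(Bk, 0.0, Vk, 1e-8, 10.0)    # phi = 0: all flagged
Z_none, _ = converged_ritz_directions(Bk, 1.0, Vk, 1e-12, 10.0)  # tiny tol: none
assert Z_all.shape == (6, 3) and Z_none.shape == (6, 0)
```

Reorthogonalizing only against the τ selected columns is what makes the strategy cheaper than complete reorthogonalization when τ ≪ k.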
4.3. Parametrized Reorthogonalization for Numerical Tests. To construct our numerical tests in §5, we give a parametrized modification of GS_reorthog in §4.1. Let

φ_{k+1}^{(0)} = ‖r_k‖_2,   v_{k+1}^{(0)} = r_k/φ_{k+1}^{(0)},  (4.12)

and let

r_k^{(j)} = (I − V_k V_k^T)^j r_k,   φ_{k+1}^{(j)} = ‖r_k^{(j)}‖_2,   j = 1, 2,  (4.13)
v_{k+1}^{(j)} = r_k^{(j)}/φ_{k+1}^{(j)}.  (4.14)

For j = 0, 1, 2, we accept v_{k+1} = v_{k+1}^{(j)} (and do no further reorthogonalization) if

‖V_k^T v_{k+1}^{(j)}‖_2 ≤ η  (4.15)

for some specified parameter η. If (4.15) is not satisfied for j = 0, 1, 2, we compute v_{k+1} according to (4.6).
Function 4.2 (Parametrized Gram–Schmidt reorthogonalization of r_k against V_k).
function [v_{k+1}, h_k, φ_{k+1}] = GS_reorthog_eta(V_k, r_k, η, setzero)
  h^{(1)} = V_k^T r_k; φ_{k+1} = ‖r_k‖_2;
  if ‖h^{(1)}‖_2 ≤ φ_{k+1} · η
    v_{k+1} = r_k/φ_{k+1}; h_k = 0;
  else
    r_k^{(1)} = r_k − V_k h^{(1)}; h^{(2)} = V_k^T r_k^{(1)}; φ_{k+1} = ‖r_k^{(1)}‖_2;
    if ‖h^{(2)}‖_2 ≤ φ_{k+1} · η
      v_{k+1} = r_k^{(1)}/φ_{k+1}; h_k = h^{(1)};
    else
      r_k^{(2)} = r_k^{(1)} − V_k h^{(2)}; h^{(3)} = V_k^T r_k^{(2)};
      φ_{k+1} = ‖r_k^{(2)}‖_2;
      if ‖h^{(3)}‖_2 ≤ φ_{k+1} · η
        v_{k+1} = r_k^{(2)}/φ_{k+1}; h_k = h^{(1)} + h^{(2)};
      else
        Find e_J such that ‖V_k^T e_J‖_2 = min_{1≤j≤m} ‖V_k^T e_j‖_2;
        c^{(1)} = V_k^T e_J; t^{(1)} = e_J − V_k c^{(1)};
        c^{(2)} = V_k^T t^{(1)}; t^{(2)} = t^{(1)} − V_k c^{(2)};
        v_{k+1} = t^{(2)}/‖t^{(2)}‖_2;
        if setzero
          φ_{k+1} = 0;
        else
          φ_{k+1} = v_{k+1}^T r_k^{(2)};
        end;
      end;
    end;
  end;
end GS_reorthog_eta
This routine guarantees that

‖V_k^T v_{k+1}‖_2 ≤ max{η, √(n/(m − n)) ξ_k^2}

where ξ_k is from (4.5).
5. Numerical Tests.
Example 5.1. For these examples, we construct m × n matrices of the form

X = P Σ Z^T

where n = 50, 60, …, 300, m = ⌊1.5n⌋, P ∈ R^{m×n} is left orthogonal, Z ∈ R^{n×n} is orthogonal, and Σ is positive and diagonal. The matrices P and Z come from the two MATLAB commands

P = orth(randn(m, n)); Z = orth(randn(n, n));

where the randn command generates an m × n matrix with entries from the standard normal distribution, and the orth command produces an orthogonal basis for the range of its argument. The diagonal matrix Σ is given by

Σ = diag(σ_1, …, σ_n)

where σ_1 = 1, σ_k = r^{k−1}, and r^{n−1} = 10^{−18}, so that X has a geometric distribution of singular values.
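One instance of this construction, written in NumPy rather than MATLAB (a reduced QR factorization stands in for orth(randn(...))), is sketched below for n = 50.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 50
m = int(1.5 * n)
P, _ = np.linalg.qr(rng.standard_normal((m, n)))   # left orthogonal, m x n
Z, _ = np.linalg.qr(rng.standard_normal((n, n)))   # orthogonal, n x n
sigma = np.logspace(0, -18, n)      # sigma_1 = 1, sigma_n = 1e-18, geometric
X = P @ np.diag(sigma) @ Z.T

svals = np.linalg.svd(X, compute_uv=False)
# The larger singular values are reproduced; the tiny ones sit at the
# roundoff floor eps_M * ||X||_2.
assert np.allclose(svals[:15], sigma[:15], rtol=1e-6)
```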
The bidiagonal reduction of X was computed in three different ways.
1. The Golub–Kahan–Householder (GKH) algorithm from [10].
2. The Golub–Kahan–Lanczos procedure using Function 4.1 to do reorthogonalization, setting setzero = false. We call this GKL-nonzero.
3. The GKL procedure as in item 2, except setting setzero = true for restarts, which we call GKL-setzero. For this case, several elements of the superdiagonal are set to zero; we show how many in Figure 5.2.
The singular values of the resulting bidiagonal matrices are computed by the MATLAB svd command and compared to the result of the svd command applied directly to X.
The upper window of Figure 5.1 compares the singular values from the GKH algorithm to those from the GKL-nonzero algorithm; they are displayed together with orth(V). These errors and orth(V) are about 10^{−15}, thus V is nearly orthogonal and the singular values computed by the two methods are very close. The errors are the differences between their computed singular values and those computed by the MATLAB svd command on X. In the lower window of Figure 5.1, we look at the difference in the singular values between GKL-nonzero and GKL-setzero, again displayed with orth(V). The differences are about 10^{−16}, thus the two strategies are almost indistinguishable.
Example 5.2. We construct our examples exactly as in Example 5.1, except that we do reorthogonalization of V with Function 4.2, GS_reorthog_eta, with η = 10^{−8}. We do the same three kinds of bidiagonalization: GKH, GKL-nonzero, and GKL-setzero.
The upper window of Figure 5.3 shows the error in the singular values for GKH and GKL-nonzero, plotted beside orth(V). We see that orth(V) is in the range of 10^{−8}, and the error in the singular values for GKL-nonzero is a little smaller than that, consistent with Theorem 3.6. In the lower window of Figure 5.3, we compare the singular values of the two restart strategies GKL-nonzero and GKL-setzero. Their difference is about 10^{−9} and thus a little smaller than orth(V).
[Figure omitted: two panels plotting, against dimension 50–300, the log10 of the Wielandt–Hoffman error for GKH and GKL together with orth(V) (upper window), and the log10 of the difference in singular values between the restart strategies, the F-norm of the discards, and orth(V) (lower window).]
Fig. 5.1. Wielandt–Hoffman Error in Singular Values from Example 5.1
6. Conclusion. When the Golub–Kahan–Lanczos algorithm is applied to X ∈ R^{m×n}, we can reorthogonalize just the right Lanczos vectors, as proposed in [23]. Theorem 3.1 establishes a key relationship between V_k, the matrix of the first k right Lanczos vectors, B_k, the leading k × k submatrix of B, and an orthogonal matrix generated from the left Lanczos vectors. As a consequence, the computed singular values of B are a distance bounded by O([ε_M + orth(V)]‖X‖_2) from those of X.
Moreover, if the reorthogonalization strategies used to produce v_{k+1}, the (k + 1)st column of V, do not produce a sufficiently orthogonal v_{k+1}, then v_{k+1} can be produced from a restart strategy and the corresponding upper bidiagonal element φ_{k+1} can be set to zero without significant loss of accuracy.
In [1], Theorem 3.1 is used to change the manner in which left singular vectors are computed from the left Lanczos vectors.
[Figure omitted: number of zeroed-out superdiagonal elements from restarts, plotted against dimension 50–300 (roughly 5 to 40 zeros).]
Fig. 5.2. Number of zeros in superdiagonal, Example 5.1
REFERENCES

[1] J.L. Barlow. Reorthogonalization for the Golub–Kahan–Lanczos bidiagonal reduction: Part II – singular vectors. http://www.cse.psu.edu/~barlow/lanczos_svd_orthII.pdf, 2010.
[2] J.L. Barlow, N. Bosner, and Z. Drmač. A new backward stable bidiagonal reduction method. Linear Alg. Appl., 397:35–84, 2005.
[3] J.L. Barlow, A. Smoktunowicz, and H. Erbay. Improved Gram–Schmidt downdating methods. BIT, 45:259–285, 2005.
[4] M. Berry, Z. Drmač, and E. Jessup. Matrices, vector spaces, and information retrieval. SIAM Review, 41:335–362, 1999.
[5] Å. Björck. A bidiagonalization algorithm for solving large and sparse ill-posed systems of linear equations. BIT, 28:659–670, 1988.
[6] N. Bosner and J. Barlow. Block and parallel versions of one-sided bidiagonalization. SIAM J. Matrix Anal. Appl., 29(3):927–953, 2007.
[7] D. Calvetti and L. Reichel. Tikhonov regularization of large linear problems. BIT, 43:263–283, 2003.
[8] J.W. Daniel, W.B. Gragg, L. Kaufman, and G.W. Stewart. Reorthogonalization and stable algorithms for updating the Gram–Schmidt QR factorization. Math. Comp., 30(136):772–795, 1976.
[9] L. Eldén. Algorithms for the regularization of ill-conditioned least squares problems. BIT, 17:134–145, 1977.
[10] G.H. Golub and W.M. Kahan. Calculating the singular values and pseudoinverse of a matrix. SIAM J. Num. Anal. Ser. B, 2:205–224, 1965.
[11] G.H. Golub and C.F. Van Loan. Matrix Computations, Third Edition. The Johns Hopkins University Press, Baltimore, MD, 1996.
[12] N.J. Higham. Functions of Matrices: Theory and Computation. SIAM Publications, Philadelphia, PA, 2008.
[Figure omitted: two panels plotting, against dimension 50–300, the log10 of the Wielandt–Hoffman error for GKH and GKL together with orth(V) (upper window), and the log10 of the difference in singular values between the restart strategies, the F-norm of the discards, and orth(V) (lower window).]
Fig. 5.3. Maximum Error in Singular Values from Example 5.2
[13] I. Hnětynková, M. Plešinger, and Z. Strakoš. Golub–Kahan iterative bidiagonalization and determining the size of the noise in data. BIT, 49:669–696, 2009.
[14] W. Hoffmann. Iterative algorithms for Gram–Schmidt orthogonalization. Computing, 41:353–367, 1989.
[15] R.A. Horn and C.R. Johnson. Matrix Analysis. Cambridge University Press, Cambridge, UK, 1985.
[16] R. Mazumder, T. Hastie, and R. Tibshirani. Spectral regularization algorithms for learning large incomplete matrices. http://www-stat.stanford.edu/~hastie/Papers/SVD_JMLR.pdf.
[17] C.C. Paige and M.A. Saunders. LSQR: An algorithm for sparse linear equations and least squares problems. ACM Trans. on Math. Software, 8:43–71, 1982.
[18] C.C. Paige. The Computation of Eigenvalues and Eigenvectors of Very Large Sparse Matrices. PhD thesis, University of London, 1971.
[19] C.C. Paige. A useful form of unitary matrix from any sequence of unit 2-norm n-vectors. SIAM J. Matrix Anal. Appl., 31(2):565–583, 2009.
[20] C.C. Paige and M.A. Saunders. Algorithm 583. LSQR: Sparse linear equations and least squares problems. ACM Trans. on Math. Software, 8:195–209, 1982.
[21] B.N. Parlett. The Symmetric Eigenvalue Problem. SIAM Publications, Philadelphia, PA, 1998. Republication of 1980 book.
[22] B.N. Parlett and D.S. Scott. The Lanczos algorithm with selective reorthogonalization. Math. Comp., 33:217–238, 1979.
[23] H. Simon and H. Zha. Low rank matrix approximation using the Lanczos bidiagonalization process. SIAM J. Sci. Stat. Computing, 21:2257–2274, 2000.
[24] K. Yoo and H. Park. Accurate downdating of a modified Gram–Schmidt QR decomposition. BIT, 36:166–181, 1996.
Appendix. In this appendix, we show two bounds that are related to Function 4.1. They are similar to bounds proved in [3], but are instead stated in terms of ξ_k = orth(V_k).
First we assume that either (4.2) or (4.4) holds. Assuming (4.2), we have that

v_{k+1} = r_k^{(1)}/‖r_k^{(1)}‖_2.

The argument for (4.4) is identical. Our argument assumes exact arithmetic, but the arguments in [3] show that if ξ_k ≥ ε_M, rounding error has little qualitative effect on the behavior of this procedure.
First, note that

‖V_k^T v_{k+1}‖_2 ≤ ‖V_k^T r_k^{(1)}‖_2/‖r_k^{(1)}‖_2.
Since

r_k^{(1)} = (I − V_k V_k^T) r_k,

we have that

‖V_k^T r_k^{(1)}‖_2/‖r_k^{(1)}‖_2 = ‖V_k^T (I − V_k V_k^T) r_k‖_2/‖r_k^{(1)}‖_2
                               = ‖(I − V_k^T V_k) V_k^T r_k‖_2/‖r_k^{(1)}‖_2
                               ≤ ‖I − V_k^T V_k‖_2 ‖V_k^T r_k‖_2/‖r_k^{(1)}‖_2
                               = ξ_k ‖V_k^T r_k‖_2/‖r_k^{(1)}‖_2.
We now bound the ratio

‖V_k^T r_k‖_2/‖r_k^{(1)}‖_2.  (6.1)
To do that, we note that

‖r_k^{(1)}‖_2^2 = ‖(I − V_k V_k^T) r_k‖_2^2
             = ‖r_k‖_2^2 − 2 r_k^T V_k V_k^T r_k + ‖V_k V_k^T r_k‖_2^2
             ≤ ‖r_k‖_2^2 − 2 ‖V_k^T r_k‖_2^2 + ‖V_k‖_2^2 ‖V_k^T r_k‖_2^2.

Since

‖V_k‖_2^2 ≤ 1 + ξ_k,

we have

‖r_k^{(1)}‖_2^2 ≤ ‖r_k‖_2^2 − (1 − ξ_k) ‖V_k^T r_k‖_2^2.

Using (4.2), this becomes

(4/5) ‖r_k‖_2^2 ≤ ‖r_k‖_2^2 − (1 − ξ_k) ‖V_k^T r_k‖_2^2,
implying that

‖V_k^T r_k‖_2 ≤ [1/√(5(1 − ξ_k))] ‖r_k‖_2.  (6.2)

Combining (6.2) with (4.2), our bound for (6.1) is

‖V_k^T r_k‖_2/‖r_k^{(1)}‖_2 ≤ 1/(2√(1 − ξ_k)).

Thus

‖V_k^T v_{k+1}‖_2 ≤ ξ_k/(2√(1 − ξ_k)) = 0.5 ξ_k + O(ξ_k^2).
If (4.2) and (4.4) are both false, analysis in [8] yields

‖V_k^T e_J‖_2 ≤ √(k/m).

Thus, similarly to the above, if we let

v_{k+1}^{(1)} = t^{(1)}/‖t^{(1)}‖_2,

then

‖V_k^T v_{k+1}^{(1)}‖_2 = ‖V_k^T t^{(1)}‖_2/‖t^{(1)}‖_2 ≤ ξ_k ‖V_k^T e_J‖_2/‖t^{(1)}‖_2 ≤ √(k/(m − k)) ξ_k.
Now we look at

v_{k+1} = t^{(2)}/‖t^{(2)}‖_2.

A repeat of the arguments above yields

‖V_k^T v_{k+1}‖_2 = ‖V_k^T t^{(2)}‖_2/‖t^{(2)}‖_2 ≤ ξ_k ‖V_k^T v_{k+1}^{(1)}‖_2/‖t^{(2)}‖_2 ≤ √(k/(m − k)) ξ_k^2/‖t^{(2)}‖_2,

where here (after a harmless rescaling by ‖t^{(1)}‖_2) we take

t^{(2)} = (I − V_k V_k^T) v_{k+1}^{(1)}.

We have that

‖t^{(2)}‖_2^2 = ‖v_{k+1}^{(1)}‖_2^2 − 2 ‖V_k^T v_{k+1}^{(1)}‖_2^2 + ‖V_k V_k^T v_{k+1}^{(1)}‖_2^2
           ≥ 1 − 2 ξ_k^2 k/(m − k) + σ_k^2(V_k) ‖V_k^T v_{k+1}^{(1)}‖_2^2.
Using (2.13), we have

‖t^{(2)}‖_2^2 ≥ 1 − ξ_k^2 (1 + ξ_k) k/(m − k).

Thus

‖V_k^T v_{k+1}‖_2 ≤ (k/(m − k))^{1/2} ξ_k^2/(1 − ξ_k^2 (1 + ξ_k) k/(m − k))^{1/2}
                = √(k/(m − k)) ξ_k^2 + O(ξ_k^4).

We now bound ‖r_k^{(2)}‖_2 in the case where ‖r_k^{(2)}‖_2 < √(4/5) ‖r_k^{(1)}‖_2.
Computing two-norms yields

‖r_k^{(2)}‖_2^2 = ‖r_k^{(1)}‖_2^2 + ‖V_k V_k^T r_k^{(1)}‖_2^2 − 2 [r_k^{(1)}]^T V_k V_k^T r_k^{(1)}
             = ‖r_k^{(1)}‖_2^2 + ‖V_k V_k^T r_k^{(1)}‖_2^2 − 2 ‖V_k^T r_k^{(1)}‖_2^2.

The second term expands to

‖V_k V_k^T r_k^{(1)}‖_2^2 = [r_k^{(1)}]^T V_k (V_k^T V_k) V_k^T r_k^{(1)}
                       = ‖V_k^T r_k^{(1)}‖_2^2 − [r_k^{(1)}]^T V_k G_k V_k^T r_k^{(1)},

where G_k = I − V_k^T V_k and, from (4.5), ‖G_k‖_2 = orth(V_k) = ξ_k. Thus

‖r_k^{(2)}‖_2^2 = ‖r_k^{(1)}‖_2^2 − ‖V_k^T r_k^{(1)}‖_2^2 − [r_k^{(1)}]^T V_k G_k V_k^T r_k^{(1)}.  (6.3)
Equation (6.3) leads to the bound

‖r_k^{(2)}‖_2^2 ≥ ‖r_k^{(1)}‖_2^2 − (1 + ξ_k) ‖V_k^T r_k^{(1)}‖_2^2,

and since (4.4) is false,

(4/5) ‖r_k^{(1)}‖_2^2 ≥ ‖r_k^{(2)}‖_2^2 ≥ ‖r_k^{(1)}‖_2^2 − (1 + ξ_k) ‖V_k^T r_k^{(1)}‖_2^2,

so that

‖V_k^T r_k^{(1)}‖_2^2 ≥ [1/(5(1 + ξ_k))] ‖r_k^{(1)}‖_2^2,  (6.4)

which implies

‖V_k^T r_k^{(1)}‖_2^2 ≥ [1/(5(1 + ξ_k))] (5/4) ‖r_k^{(2)}‖_2^2 = [1/(4(1 + ξ_k))] ‖r_k^{(2)}‖_2^2.  (6.5)
Since we have

V_k^T r_k^{(1)} = V_k^T (I − V_k V_k^T) r_k = G_k V_k^T r_k,

then

‖V_k^T r_k^{(1)}‖_2^2 ≤ ‖G_k‖_2^2 ‖V_k^T r_k‖_2^2 ≤ ξ_k^2 (1 + ξ_k) ‖r_k‖_2^2.  (6.6)

Combining (6.6) and (6.5) and taking square roots, we have

‖r_k^{(2)}‖_2 ≤ 2 ξ_k (1 + ξ_k) ‖r_k‖_2.

Since ‖r_k‖_2 ≤ ‖X‖_2, we have that

‖r_k^{(2)}‖_2 ≤ 2 ξ_k (1 + ξ_k) ‖X‖_2.  (6.7)