REORTHOGONALIZATION FOR THE GOLUB–KAHAN–LANCZOS BIDIAGONAL REDUCTION: PART I – SINGULAR VALUES
JESSE L. BARLOW ∗
Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802–6822 USA. email: barlow@cse.psu.edu
Abstract. The Golub–Kahan–Lanczos bidiagonal reduction generates a factorization of a matrix X ∈ R^{m×n}, m ≥ n, such that

X = UBV^T

where U ∈ R^{m×n} is left orthogonal, V ∈ R^{n×n} is orthogonal, and B ∈ R^{n×n} is bidiagonal. When the Lanczos recurrence is implemented in finite precision arithmetic, the columns of U and V tend to lose orthogonality, making a reorthogonalization strategy necessary to preserve convergence of the singular values.
It is shown that if

orth(V) = ‖I − V^T V‖_2,

then the singular values of B and those of X satisfy

( ∑_{j=1}^n (σ_j(X) − σ_j(B))^2 )^{1/2} ≤ O(ε_M + orth(V))‖X‖_2,

where ε_M is machine precision. Moreover, a strategy is introduced for neglecting small off-diagonal elements during reorthogonalization that preserves the above bound on the singular values.
AMS subject classifications. 65F15, 65F25.
Key words. Lanczos vectors, orthogonality, singular values, left orthogonal matrix.
1. Introduction. Bidiagonal reduction, the first step in many algorithms for computing the singular value decomposition (SVD) [10, 2], is also used for solving least squares problems [20, 17], for solving ill-posed problems [9, 5, 13], for the computation of matrix functions [7], [12, §13.2.3], for matrix approximation [4], and for the solution of the Netflix problem in [16].
In [10], Golub and Kahan give two Lanczos-based bidiagonal reduction algorithms, which we call the Golub–Kahan–Lanczos (GKL) algorithms. The first GKL algorithm takes a matrix X ∈ R^{m×n}, m ≥ n, and generates the factorization
X = UBV^T  (1.1)

with

U = (u_1, …, u_n) ∈ R^{m×n}, left orthogonal,  (1.2)
V = (v_1, …, v_n) ∈ R^{n×n}, orthogonal,  (1.3)
∗ The research of Jesse L. Barlow was supported by the National Science Foundation under grant no. CCF-0429481.
and B ∈ R^{n×n} having the upper bidiagonal form

B = | γ_1  φ_2                |
    |      γ_2  φ_3           |
    |           ⋱     ⋱       |
    |           γ_{n−1}  φ_n  |
    |                    γ_n  |

  def= ubidiag(γ_1, …, γ_n; φ_2, …, φ_n).  (1.4)
For certain structured matrices, even with reorthogonalization, this GKL algorithm yields a faster method of producing a bidiagonal reduction to compute the complete singular value decomposition. For large sparse matrices, it is often the method of choice to compute a few singular values and associated singular vectors.
The recurrence generating the decomposition (1.1)–(1.4) is constructed by choosing a vector v_1 ∈ R^n such that ‖v_1‖_2 = 1, letting u_k ∈ R^m, k = 1, …, n, and v_k ∈ R^n, k = 2, …, n, be unit vectors, and letting γ_k, φ_k, k = 1, …, n, be scaling constants such that
γ_1 u_1 = X v_1,  (1.5)
φ_{k+1} v_{k+1} = X^T u_k − γ_k v_k,   k = 1, …, n − 1,  (1.6)
γ_{k+1} u_{k+1} = X v_{k+1} − φ_{k+1} u_k.  (1.7)
The other GKL algorithm in [10] starts with u_1 and instead generates a lower bidiagonal matrix. The discussion below also applies to that recurrence if we note that the second GKL algorithm is just the first applied to [u_1, X] with v_1 = e_1. For our purposes, it is best to associate V with the minimum of the two dimensions m and n of X. The recurrence (1.5)–(1.7) is equivalent to the symmetric Lanczos tridiagonalization algorithm performed on the matrix

M = [ 0, X^T ; X, 0 ]  (1.8)

with the starting vector [v_1; 0].
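As a quick numerical sanity check (my own illustration, not from the paper), the equivalence can be seen from the spectrum of M: its eigenvalues are ±σ_i(X) together with m − n zeros.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4
X = rng.standard_normal((m, n))

# The symmetric matrix M of (1.8).  Its eigenvalues are +/- the singular
# values of X plus m - n zeros, which is why tridiagonalizing M with the
# starting vector [v1; 0] reproduces the bidiagonal reduction of X.
M = np.block([[np.zeros((n, n)), X.T],
              [X, np.zeros((m, m))]])
eigs = np.sort(np.linalg.eigvalsh(M))[::-1][:n]   # n largest = +sigma_i(X)
svals = np.linalg.svd(X, compute_uv=False)
assert np.allclose(eigs, svals)
```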
Since the vectors u_1, …, u_n and v_1, …, v_n tend to lose orthogonality in finite precision arithmetic, reorthogonalization is performed when the bidiagonal reduction algorithm (1.5)–(1.7) is used to compute the singular value decomposition as in [10], in regularization algorithms as in [5, 9], or in the computation of matrix functions as in [7].
Paige [18] pointed out that the loss of orthogonality in Lanczos reductions is structured in the sense that it is coincident with the convergence of approximate eigenvalues and eigenvectors (called Ritz values and vectors). Parlett and Scott [22] used this observation to develop partial reorthogonalization procedures. A good summary of the surrounding issues is given by Parlett [21, Chapter 13].
To understand how the algorithm works with reorthogonalization of V, we define the loss of orthogonality measures

orth(V) def= ‖I − V^T V‖_2,  (1.9)

η_k = ‖V_{k−1}^T v_k‖_2,   η̄_k = ( ∑_{j=1}^k η_j^2 )^{1/2}.  (1.10)
Noting that orth(V_k) satisfies the upper bound

orth(V_k) ≤ ‖I − V_k^T V_k‖_F ≤ √2 ( ∑_{j=1}^k η_j^2 )^{1/2} = √2 η̄_k,  (1.11)

and the lower bound

orth(V_k) ≥ max_{1≤j≤k} η_j ≥ (1/√k) η̄_k,

we have that orth(V_k) and η̄_k are large or small together. Thus we express our bounds in terms of η̄_k with the understanding that, with minor modification, they could be expressed in terms of orth(V_k).
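These inequalities are easy to verify numerically. The sketch below (my own illustration, not the paper's) builds a mildly non-orthogonal V with unit columns and checks the upper bound (1.11) and the lower bound above.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 5
# Nearly orthonormal columns with a small, controlled loss of orthogonality.
V, _ = np.linalg.qr(rng.standard_normal((n, k)))
V = V + 1e-4 * rng.standard_normal((n, k))
V /= np.linalg.norm(V, axis=0)          # unit columns, as in the recurrence

# eta_j = ||V_{j-1}^T v_j||_2 for j = 2, ..., k, and their aggregate eta_bar.
eta = [np.linalg.norm(V[:, :j].T @ V[:, j]) for j in range(1, k)]
eta_bar = np.sqrt(np.sum(np.square(eta)))
orth = np.linalg.norm(np.eye(k) - V.T @ V, 2)

# Upper bound (1.11) and the lower bound following it.
assert orth <= np.sqrt(2) * eta_bar + 1e-12
assert orth >= eta_bar / np.sqrt(k) - 1e-12
```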
The singular values of B, given by
σ1(B) ≥ σ2(B) ≥ · · · ≥ σn(B),
and the corresponding singular values of X satisfy an O(ε_M + η̄_n) bound given in equation (3.25) in Theorem 3.6. Thus the accuracy of the computed singular values depends upon our ability to preserve the orthogonality of V. These results are similar to those for a procedure due to Barlow, Bosner, and Drmač [2] that generates V using Householder transformations and U by the recurrences (1.5)–(1.7).
We structure this paper as follows. In §2, we establish the framework for the analysis in §3. In §3, we prove our main theorem (Theorem 3.1) and results on the singular values of B. In §4, we give three reorthogonalization strategies for V and give a method for neglecting small superdiagonal elements resulting from reorthogonalization. In §5, we give numerical tests based upon regulating the orthogonality of V in various ways, which we follow with a conclusion in §6.
In Part II of this work [1], this author uses Theorem 3.1 to produce an algorithm that computes left singular vectors with stronger residual and orthogonality bounds than previous versions of the GKL algorithm in the literature.
2. The Lanczos Bidiagonal Recurrence with Reorthogonalization. In exact arithmetic, the columns of V in (1.3), computed according to (1.5)–(1.7), are orthonormal, but, in floating point arithmetic, some reorthogonalization of these vectors is necessary. A model of how that reorthogonalization could be done is proposed and analyzed below.
To recover v_{k+1} from v_1, …, v_k and u_1, …, u_k, we compute

r_k = X^T u_k − γ_k v_k.  (2.1)

We then reorthogonalize r_k against v_1, …, v_k so that

φ_{k+1} v_{k+1} = r_k − ∑_{j=1}^k h_{j,k+1} v_j  (2.2)
             = r_k − V_k h_{k+1},   h_{k+1} = (h_{1,k+1}, …, h_{k,k+1})^T,  (2.3)

for some coefficients h_{j,k+1}, j = 1, …, k. Combining (2.1) and (2.3), we have that

φ_{k+1} v_{k+1} = X^T u_k − V_k ĥ_{k+1}  (2.4)
where ĥ_{k+1} = γ_k e_k + h_{k+1}. To encapsulate our approaches to reorthogonalization, we assume the existence of a general function reorthog that performs step (2.3) in some manner. Thus the (k + 1)st Lanczos vector comes from (2.1) followed by
[v_{k+1}, h_{k+1}, φ_{k+1}] = reorthog(B_k, V_k, r_k),  (2.5)
where

B_k = ubidiag(γ_1, …, γ_k; φ_2, …, φ_k)  (2.6)

may provide necessary information for the partial reorthogonalization schemes. In floating point arithmetic, we assume that the steps (2.1) and (2.5) produce vectors v_{k+1} and h_{k+1}, and a scalar φ_{k+1}, such that
X^T u_k = V_k ĥ_{k+1} + φ_{k+1} v_{k+1} + β_{k+1}  (2.7)
where

‖β_{k+1}‖_2 ≤ ε_M q(m)‖X‖_2  (2.8)

for some modest sized function q(m). The value of q(m) varies depending upon which orthogonalization method is used, but, for, say, the complete reorthogonalization scheme in Function 4.1, we would have q(m) = O(m). In general, we have the recurrence
X^T U_k = V_{k+1} H_{k+1} + E_k  (2.9)
where

H_2 = [ ĥ_2 ; φ_2 ],   H_{k+1} = [ H_k, ĥ_{k+1} ; 0, φ_{k+1} ],   E_k = (β_2, …, β_{k+1}).
The following function specifies the first k steps of the Lanczos bidiagonal reduction.
Function 2.1 (First k steps of Lanczos Bidiagonal Reduction with reorthogonalization).
function [B_k, U_k, V_k] = lanczos_bidiag(X, v_1, k)
  V_1 = (v_1); s_1 = X v_1; γ_1 = ‖s_1‖_2; u_1 = s_1/γ_1;
  for j = 2 : k
    r_j = X^T u_{j−1} − γ_{j−1} v_{j−1};
    [v_j, h_j, φ_j] = reorthog(B_{j−1}, V_{j−1}, r_j);
    s_j = X v_j − φ_j u_{j−1}; γ_j = ‖s_j‖_2; u_j = s_j/γ_j;
    V_j = [V_{j−1}, v_j]; U_j = [U_{j−1}, u_j];
    B_j = [ B_{j−1}, φ_j e_{j−1} ; 0, γ_j ];
  end;
end lanczos_bidiag
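A minimal NumPy rendering of Function 2.1 is sketched below. The reorthog stand-in simply applies classical Gram–Schmidt twice (one of the complete-reorthogonalization options of §4.1, without the acceptance tests); the function names are mine, not the paper's.

```python
import numpy as np

def reorthog(V, r):
    """Stand-in for the generic reorthog of (2.5): classical Gram-Schmidt
    applied twice ('twice is enough')."""
    h = V.T @ r
    r = r - V @ h
    h2 = V.T @ r
    r = r - V @ h2
    phi = np.linalg.norm(r)
    return r / phi, h + h2, phi

def lanczos_bidiag(X, v1, k):
    """First k steps of Function 2.1: produces X*V = U*B, B upper bidiagonal."""
    m, n = X.shape
    V = np.zeros((n, k)); U = np.zeros((m, k)); B = np.zeros((k, k))
    V[:, 0] = v1
    s = X @ v1
    B[0, 0] = np.linalg.norm(s)             # gamma_1
    U[:, 0] = s / B[0, 0]
    for j in range(1, k):
        r = X.T @ U[:, j - 1] - B[j - 1, j - 1] * V[:, j - 1]
        V[:, j], _, B[j - 1, j] = reorthog(V[:, :j], r)   # phi_{j+1}, h discarded
        s = X @ V[:, j] - B[j - 1, j] * U[:, j - 1]
        B[j, j] = np.linalg.norm(s)         # gamma_{j+1}
        U[:, j] = s / B[j, j]
    return B, U, V

rng = np.random.default_rng(2)
X = rng.standard_normal((8, 5))
v1 = np.zeros(5); v1[0] = 1.0
B, U, V = lanczos_bidiag(X, v1, 5)
# After n steps, X*V = U*B up to roundoff, so B and X share singular values.
assert np.allclose(X @ V, U @ B)
assert np.allclose(np.linalg.svd(B, compute_uv=False),
                   np.linalg.svd(X, compute_uv=False))
```

Note that, as in the text, the coefficient vector h is discarded; only B, U, and V are kept.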
We discuss three specific methods for performing the reorthogonalization in reorthog in §4. In the remainder of this section, we discuss a model for that loss that leads to the analysis in §3.
Although h_k is discarded by Function 2.1 in the construction of U_k, B_k, and V_k, we show that throwing out this information affects the accuracy of the singular values of B only through the loss of orthogonality of V.
We assume orth(V_k) < 1 for all k; otherwise V_k is not meaningfully close to left orthogonal. Using the definition of orth(V_k) in (1.9), the singular values of V_k are bounded by
σ_1(V_k) = √(λ_1(V_k^T V_k)) ≤ √(λ_1(I_k) + ‖I − V_k^T V_k‖_2) = √(1 + orth(V_k)),  (2.10)
σ_k(V_k) = √(λ_k(V_k^T V_k)) ≥ √(λ_k(I_k) − ‖I − V_k^T V_k‖_2) = √(1 − orth(V_k)).  (2.11)

If V_k^† is the Moore–Penrose pseudoinverse of V_k, then

‖V_k‖_2 = σ_1(V_k) ≤ (1 + orth(V_k))^{1/2},  (2.12)
‖V_k^†‖_2 = σ_k(V_k)^{−1} ≤ (1 − orth(V_k))^{−1/2}.  (2.13)
Equation (2.7) can be rewritten

X^T u_k − β_{k+1} = V_{k+1} [ ĥ_{k+1} ; φ_{k+1} ].

Using our assumption that orth(V_{k+1}) < 1, V_{k+1} must have full column rank, so V_{k+1}^† satisfies V_{k+1}^† V_{k+1} = I_{k+1}; thus

[ ĥ_{k+1} ; φ_{k+1} ] = V_{k+1}^† (X^T u_k − β_{k+1}).
Adding the assumption that orth(V_{k+1}) and q(m)ε_M are sufficiently small that

ω def= (1 + q(m)ε_M)/(1 − orth(V_{k+1}))^{1/2}  (2.14)

is a reasonable constant, we infer that

‖H_{k+1} e_k‖_2 = ‖[ ĥ_{k+1} ; φ_{k+1} ]‖_2 ≤ ‖V_{k+1}^†‖_2 (‖X^T u_k‖_2 + ‖β_{k+1}‖_2)
             ≤ [(1 + q(m)ε_M)/(1 − orth(V_{k+1}))^{1/2}] ‖X‖_2
             = ω‖X‖_2.  (2.15)

Thus, the columns of H_{k+1} are bounded as long as reasonable orthogonality is maintained for V_{k+1}.
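The singular value bounds (2.10)–(2.13) underlying this argument can be checked numerically. The following sketch (my own illustration) perturbs an orthonormal V_k and verifies that its extreme singular values stay within the orth(V_k) envelope.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 10, 6
V, _ = np.linalg.qr(rng.standard_normal((n, k)))
V = V + 1e-3 * rng.standard_normal((n, k))   # mild loss of orthogonality

orth = np.linalg.norm(np.eye(k) - V.T @ V, 2)
s = np.linalg.svd(V, compute_uv=False)
# (2.10)-(2.13): the singular values of V_k, hence ||V_k||_2 and ||V_k^+||_2,
# are controlled by orth(V_k).
assert s[0] <= np.sqrt(1 + orth) + 1e-12
assert s[-1] >= np.sqrt(1 - orth) - 1e-12
assert 1.0 / s[-1] <= (1 - orth) ** (-0.5) + 1e-12
```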
3. Error Bounds for GKL Bidiagonalization with One-Sided Reorthogonalization. The results in this paper and in [1] are based upon Theorem 3.1, stated next.
Theorem 3.1. Let Function 2.1 be implemented in floating point arithmetic with machine unit ε_M. Assume that V_k = (v_1, …, v_k), with orthogonality parametrized by η̄_k in (1.10), U_k = (u_1, …, u_k), and B_k = ubidiag(γ_1, …, γ_k; φ_2, …, φ_k) are output from that function. Assume also that orth(V_k) < 1. Define

C_k = [ 0 ; X V_k ]  (n zero rows above the m rows of X V_k),  (3.1)

W_j = I − w_j w_j^T,   w_j = [ −e_j ; u_j ],  (3.2)

𝒲_k = W_1 ⋯ W_k.  (3.3)

If q(m) is defined in (2.8) and ω is given by (2.14), then for k = 1, …, n,

C_k + δC_k = 𝒲_k [ B_k ; 0 ]  (B_k above m + n − k zero rows)  (3.4)

where

‖δC_k‖_F ≤ [f_1(m, n, k)ε_M + f_2(k)η̄_k]‖X‖_2 + O(ε_M^2)  (3.5)

and

f_1(m, n, k) = √k [√(2/3) k q(m) + m + n + 2],   f_2(k) = ω √(2/3) k^{3/2}.  (3.6)
The matrix 𝒲_k is orthogonal because u_1, …, u_k are unit vectors, so that each W_j is a Householder transformation. Some details of the form of 𝒲_k are given in [19, Theorem 2.1].
Three technical lemmas are necessary to prove the result (3.4)–(3.6); the first concerns the effect of W_{k−1}.
Lemma 3.2. Let φ_k, γ_k, u_k, and v_k be computed by the kth step of Function 2.1. Let W_j be defined in (3.2). Then, for k ≥ 2,

W_{k−1} [ φ_k e_{k−1} ; X v_k − φ_k u_{k−1} ] = [ 0 ; X v_k ] + δz_{k−1}  (3.7)

where

‖δz_{k−1}‖_2 ≤ √2 (ω η_k + q(m)ε_M)‖X‖_2.  (3.8)
Proof. We have that

W_{k−1} [ φ_k e_{k−1} ; X v_k − φ_k u_{k−1} ]
  = W_{k−1} [ φ_k e_{k−1} ; 0 ] + W_{k−1} [ 0 ; X v_k − φ_k u_{k−1} ]
  = [ 0 ; φ_k u_{k−1} ] + [ 0 ; X v_k − φ_k u_{k−1} ] − u_{k−1}^T (X v_k − φ_k u_{k−1}) w_{k−1}
  = [ 0 ; X v_k ] − u_{k−1}^T (X v_k − φ_k u_{k−1}) w_{k−1}.  (3.9)

To bound the last term, we note that from (2.7) we have

X^T u_{k−1} = V_{k−1} ĥ_k + φ_k v_k + β_k,   ‖β_k‖_2 ≤ q(m)ε_M‖X‖_2,

so that

u_{k−1}^T X v_k = φ_k v_k^T v_k + ĥ_k^T V_{k−1}^T v_k + β_k^T v_k = φ_k + δφ_k  (3.10)

where

δφ_k = ĥ_k^T V_{k−1}^T v_k + β_k^T v_k.

Thus

|δφ_k| ≤ ‖V_{k−1}^T v_k‖_2 ‖ĥ_k‖_2 + ‖β_k‖_2
      ≤ ‖V_{k−1}^T v_k‖_2 ‖H_k e_{k−1}‖_2 + ‖β_k‖_2
      ≤ (ω η_k + q(m)ε_M)‖X‖_2.  (3.11)

Combining (3.9), (3.10), and (3.11), we have (3.7) where

δz_{k−1} = −(δφ_k) w_{k−1}.

Thus ‖δz_{k−1}‖_2 = |δφ_k| ‖w_{k−1}‖_2. Since ‖w_{k−1}‖_2 = ‖(−e_{k−1}^T, u_{k−1}^T)^T‖_2 = √2, we have the bound (3.8) for ‖δz_{k−1}‖_2.
Our second lemma bounds the effect of W_j, j = 1, 2, …, k − 2.
Lemma 3.3. Assume the hypothesis and notation of Lemma 3.2. For k ≥ 3 and j ≤ k − 2, we have

W_j [ 0 ; X v_k ] = [ 0 ; X v_k ] + δz_j  (3.12)

where

‖δz_j‖_2 ≤ √2 (ω η_k + q(m)ε_M)‖X‖_2.  (3.13)
Proof. First, we note that

W_j [ 0 ; X v_k ] = [ 0 ; X v_k ] − w_j w_j^T [ 0 ; X v_k ] = [ 0 ; X v_k ] − (u_j^T X v_k) w_j.  (3.14)

Again, using (2.7), we have

X^T u_j = V_{j+1} [ ĥ_{j+1} ; φ_{j+1} ] + β_{j+1}.

Thus

|u_j^T X v_k| = |[ ĥ_{j+1} ; φ_{j+1} ]^T V_{j+1}^T v_k + β_{j+1}^T v_k|
            ≤ ‖H_{j+1} e_j‖_2 ‖V_{j+1}^T v_k‖_2 + ‖β_{j+1}‖_2
            ≤ ω‖X‖_2 η_k + ‖β_{j+1}‖_2,

since j + 1 ≤ k − 1 implies ‖V_{j+1}^T v_k‖_2 ≤ ‖V_{k−1}^T v_k‖_2 = η_k. Therefore, using the bound in (2.15), we have

|u_j^T X v_k| ≤ [ω η_k + q(m)ε_M]‖X‖_2.

Using (3.14) yields

δz_j = −(u_j^T X v_k) w_j,

so (3.13) follows from

‖δz_j‖_2 = |u_j^T X v_k| ‖w_j‖_2 ≤ √2 [ω η_k + q(m)ε_M]‖X‖_2.
We now combine Lemmas 3.2 and 3.3 to give the effect of the product of Householder transformations.
Lemma 3.4. Assume the hypothesis and notation of Lemma 3.2. Let 𝒲_k = W_1 ⋯ W_k be given by (3.3). Then

𝒲_k [ φ_k e_{k−1} ; γ_k e_1 ] = [ 0 ; X v_k ] + δc_k  (3.15)

where

‖δc_k‖_2 ≤ [√2 (k − 1)(ω η_k + q(m)ε_M) + (m + n + 2)ε_M]‖X‖_2.  (3.16)
Proof. First, we note that W_k in (3.2) is defined so that

W_k [ φ_k e_{k−1} ; γ_k e_1 ] = [ φ_k e_{k−1} ; γ_k u_k ],  (3.17)

where on the left the blocks have k − 1 and s = m + n + 1 − k rows, and on the right n and m rows; thus

𝒲_k [ φ_k e_{k−1} ; γ_k e_1 ] = 𝒲_{k−1} W_k [ φ_k e_{k−1} ; γ_k e_1 ] = 𝒲_{k−1} [ φ_k e_{k−1} ; γ_k u_k ].

From the Lanczos recurrence,

(γ_k + δγ_k) u_k = s_k = fl(X v_k − φ_k u_{k−1}) = X v_k − φ_k u_{k−1} − δs_k

where

|δγ_k| ≤ m ε_M |γ_k| + O(ε_M^2) ≤ m ε_M ‖X‖_2 + O(ε_M^2),
‖δs_k‖_2 ≤ (n + 2) ε_M ‖X‖_2 + O(ε_M^2).
Thus,

γ_k u_k = X v_k − φ_k u_{k−1} + δz_k  (3.18)

where

δz_k = −δs_k − (δγ_k) u_k.

That yields the bound

‖δz_k‖_2 ≤ (m + n + 2) ε_M ‖X‖_2 + O(ε_M^2).

Thus, if we let δz̄_k = [ 0 ; δz_k ] (n zero rows above the m-vector δz_k),
then using (3.17) and (3.18) and a simple recurrence, we have

𝒲_k [ φ_k e_{k−1} ; γ_k e_1 ] = 𝒲_{k−1} [ φ_k e_{k−1} ; γ_k u_k ]
  = 𝒲_{k−1} [ φ_k e_{k−1} ; X v_k − φ_k u_{k−1} ] + 𝒲_{k−1} δz̄_k
  = 𝒲_{k−2} W_{k−1} [ φ_k e_{k−1} ; X v_k − φ_k u_{k−1} ] + 𝒲_{k−1} δz̄_k
  = 𝒲_{k−2} [ 0 ; X v_k ] + 𝒲_{k−2} δz_{k−1} + 𝒲_{k−1} δz̄_k,

the last step by Lemma 3.2.
After k − 2 applications of Lemma 3.3, this becomes

𝒲_k [ φ_k e_{k−1} ; γ_k e_1 ] = [ 0 ; X v_k ] + ∑_{j=1}^{k−1} 𝒲_j δz_{j+1} = [ 0 ; X v_k ] + δc_k

where (with δz_k interpreted as the padded vector δz̄_k)

δc_k = ∑_{j=1}^{k−1} 𝒲_j δz_{j+1}.

Thus

‖δc_k‖_2 ≤ ∑_{j=1}^{k−1} ‖𝒲_j δz_{j+1}‖_2 = ∑_{j=1}^{k−1} ‖δz_{j+1}‖_2
        ≤ [√2 (k − 1)(ω η_k + q(m)ε_M) + (m + n + 2)ε_M]‖X‖_2 + O(ε_M^2),
establishing the result.
We now prove Theorem 3.1.
Proof (of Theorem 3.1). We use induction on k. For k = 1, we have

B_1 = (γ_1), U_1 = (u_1), V_1 = (v_1).

Thus

s_1 = fl(X v_1) = (γ_1 + δγ_1) u_1,   |δγ_1| ≤ m |γ_1| ε_M + O(ε_M^2),  (3.19)

and

s_1 + δs_1 = X v_1,   ‖δs_1‖_2 ≤ n ε_M ‖X‖_2 + O(ε_M^2).  (3.20)

Combining (3.19) and (3.20), we have

γ_1 u_1 = X v_1 − δs_1 − (δγ_1) u_1 = X v_1 + δc_1,   δc_1 = −δs_1 − (δγ_1) u_1,

where

‖δc_1‖_2 ≤ (m + n)‖X‖_2 ε_M + O(ε_M^2).
Rewriting the above in terms of W_1, we have

W_1 [ B_1 ; 0 ] = (I − w_1 w_1^T) γ_1 e_1 = [ 0 ; γ_1 u_1 ] = [ 0 ; X v_1 + δc_1 ] = [ 0 ; X v_1 ] + δc̄_1,   δc̄_1 = [ 0 ; δc_1 ],

where

‖δc̄_1‖_2 ≤ (m + n)‖X‖_2 ε_M + O(ε_M^2),

thus establishing the result for k = 1.
To construct the induction step, we write
𝒲_k [ B_k ; 0 ] = 𝒲_k [ B_{k−1}, φ_k e_{k−1} ; 0, γ_k e_1 ]
  = [ 𝒲_k [ B_{k−1} ; 0 ],  𝒲_k [ φ_k e_{k−1} ; γ_k e_1 ] ]
  = [ 𝒲_{k−1} [ B_{k−1} ; 0 ],  𝒲_k [ φ_k e_{k−1} ; γ_k e_1 ] ]
  = [ [ 0 ; X V_{k−1} ] + δC_{k−1},  𝒲_k [ φ_k e_{k−1} ; γ_k e_1 ] ]

where

δC_{k−1} = (δc̄_1, …, δc̄_{k−1}).
From Lemma 3.4, we have (3.15) and (3.16). Thus

𝒲_k [ B_k ; 0 ] = [ 0 ; X V_k ] + δC_k,   δC_k = (δc̄_1, …, δc̄_k),

where δc̄_k is the vector δc_k from (3.15).
The bound on ‖δC_k‖_F in (3.5)–(3.6) comes from

‖δC_k‖_F = ( ∑_{j=1}^k ‖δc̄_j‖_2^2 )^{1/2}
        ≤ ( ∑_{j=1}^k [√2 (j − 1)(ω η_j + q(m)ε_M) + (m + n + 2)ε_M]^2 )^{1/2} ‖X‖_2 + O(ε_M^2).

Using the triangle inequality, this becomes

‖δC_k‖_F ≤ ( ∑_{j=1}^k (√2 ω (j − 1) η_j)^2 )^{1/2} ‖X‖_2
         + [ ( ∑_{j=1}^k (√2 (j − 1) q(m) ε_M)^2 )^{1/2} + ( ∑_{j=1}^k ((m + n + 2)ε_M)^2 )^{1/2} ] ‖X‖_2 + O(ε_M^2).  (3.21)

Since ∑_{j=1}^k (j − 1)^2 ≤ k^3/3 and ∑_j a_j^2 η_j^2 ≤ (∑_j a_j^2)(∑_j η_j^2), the first two terms satisfy

( ∑_{j=1}^k (√2 ω (j − 1) η_j)^2 )^{1/2} ≤ √2 ω ( ∑_{j=1}^k (j − 1)^2 )^{1/2} η̄_k ≤ ω √(2/3) k^{3/2} η̄_k,  (3.22)

( ∑_{j=1}^k (√2 (j − 1) q(m) ε_M)^2 )^{1/2} = √2 q(m) ε_M ( ∑_{j=1}^k (j − 1)^2 )^{1/2} ≤ √(2/3) k^{3/2} q(m) ε_M,  (3.23)

while the third term equals √k (m + n + 2) ε_M;
thus combining (3.21) with (3.22)–(3.23), we have (3.5)–(3.6).
To use Theorem 3.1 to bound the distance between the singular values of B = B_n and those of X, we need a lemma that bounds the difference between the singular values of X and XV.
Lemma 3.5. Let V = V_n be the result of n steps of Function 2.1. If σ_k(X) is the kth singular value of X, then

σ_k(XV)/(1 + orth(V))^{1/2} ≤ σ_k(X) ≤ σ_k(XV)/(1 − orth(V))^{1/2}.  (3.24)

Proof. We use the inequality in [15, p. 419, Corollary 7.3.8] of the form

σ_k(X) σ_n(V) ≤ σ_k(XV) ≤ σ_k(X) σ_1(V).
Using the bounds in (2.10)–(2.11) on V, we get (3.24).
Using Lemma 3.5 together with Theorem 3.1, we obtain a bound on the singular values of B in terms of those of X.
Theorem 3.6. Assume the hypothesis and terminology of Theorem 3.1. Excluding terms of O(ε_M^2 + η̄_n^2), the singular values of X and those of B are related by

( ∑_{j=1}^n (σ_j(X) − σ_j(B))^2 )^{1/2} ≤ ‖δC_n‖_F + ‖X‖_F [(1 − orth(V))^{−1/2} − 1]  (3.25)
                                       ≤ [f_1(m, n, n) ε_M + (f_2(n) + n/2) η̄_n] ‖X‖_2.  (3.26)
Proof. From the triangle inequality applied to the two norm,

( ∑_{j=1}^n (σ_j(X) − σ_j(B))^2 )^{1/2} ≤ ( ∑_{j=1}^n (σ_j(X) − σ_j(XV))^2 )^{1/2} + ( ∑_{j=1}^n (σ_j(XV) − σ_j(B))^2 )^{1/2}.
From Theorem 3.1 and the Wielandt–Hoffman theorem [11, p. 450, Thm. 8.6.4], we have that

( ∑_{k=1}^n (σ_k(XV) − σ_k(B))^2 )^{1/2} ≤ ‖δC_n‖_F.  (3.27)
From Lemma 3.5, we have that

|σ_k(X) − σ_k(XV)| ≤ σ_k(X) max{1 − (1 − orth(V))^{1/2}, (1 + orth(V))^{1/2} − 1}
                  ≤ [(1 − orth(V))^{−1/2} − 1] σ_k(X).

Using this inequality, we obtain the bound

∑_{k=1}^n (σ_k(X) − σ_k(XV))^2 ≤ [(1 − orth(V))^{−1/2} − 1]^2 ∑_{k=1}^n σ_k(X)^2
                               = [(1 − orth(V))^{−1/2} − 1]^2 ‖X‖_F^2.  (3.28)

Combining (3.27) and (3.28) yields (3.25). To obtain (3.26), note the bound on ‖δC_n‖_F in (3.5)–(3.6) and that

(1 − orth(V))^{−1/2} − 1 = (1/2) orth(V) + O(orth^2(V)) ≤ (√n/2) η̄_n + O(η̄_n^2).  (3.29)

Since ‖X‖_F ≤ √n ‖X‖_2, (3.26) results from combining (3.29) with (3.25).
The bound (3.25) shows that the singular values of B are close to those of X as long as orth(V) is kept small.
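This behavior can be observed in a small experiment in the spirit of §5 (my own sketch, not the paper's tests): run the GKL recurrence with and without reorthogonalization of the right Lanczos vectors on a graded matrix, and check that the Wielandt–Hoffman-style error stays within a modest multiple of (ε_M + orth(V))‖X‖_2. The constant 10^4 below is an ad hoc choice for this illustration.

```python
import numpy as np

def gkl(X, v1, k, reorth):
    """Plain GKL recurrence (1.5)-(1.7); if reorth is True, the right Lanczos
    vectors are reorthogonalized with two Gram-Schmidt passes."""
    m, n = X.shape
    V = np.zeros((n, k)); U = np.zeros((m, k)); B = np.zeros((k, k))
    V[:, 0] = v1
    s = X @ v1
    B[0, 0] = np.linalg.norm(s); U[:, 0] = s / B[0, 0]
    for j in range(1, k):
        r = X.T @ U[:, j - 1] - B[j - 1, j - 1] * V[:, j - 1]
        if reorth:
            r -= V[:, :j] @ (V[:, :j].T @ r)
            r -= V[:, :j] @ (V[:, :j].T @ r)
        B[j - 1, j] = np.linalg.norm(r); V[:, j] = r / B[j - 1, j]
        s = X @ V[:, j] - B[j - 1, j] * U[:, j - 1]
        B[j, j] = np.linalg.norm(s); U[:, j] = s / B[j, j]
    return B, V

# Graded test matrix with geometrically decaying singular values (cf. Sec. 5).
rng = np.random.default_rng(4)
n = 30
P, _ = np.linalg.qr(rng.standard_normal((n, n)))
Z, _ = np.linalg.qr(rng.standard_normal((n, n)))
X = P @ np.diag(np.logspace(0, -10, n)) @ Z.T
v1 = np.zeros(n); v1[0] = 1.0

sx = np.linalg.svd(X, compute_uv=False)
for reorth in (False, True):
    B, V = gkl(X, v1, n, reorth)
    orth = np.linalg.norm(np.eye(n) - V.T @ V, 2)
    err = np.linalg.norm(sx - np.linalg.svd(B, compute_uv=False))
    # The error in the singular values tracks eps_M + orth(V), as in (3.25).
    assert err <= 1e4 * (np.finfo(float).eps + orth) * np.linalg.norm(X, 2)
```

Without reorthogonalization, orth(V) grows to O(1) and the singular values of B degrade accordingly; with reorthogonalization, both stay near roundoff.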
4. Reorthogonalization Strategies. We discuss two common approaches to specifying reorthog from Function 2.1. The first is to use Gram–Schmidt reorthogonalization of v_{k+1} against all previously computed right Lanczos vectors (§4.1); the second is to use a selective reorthogonalization strategy (§4.2). A third method, specified in §4.3, is used for our numerical tests to quantify the effect of loss of orthogonality in V.
4.1. Complete Reorthogonalization. Our strategy for complete reorthogonalization, which grows out of "twice is enough" approaches in [14] and approaches in [8, 24], [21, §6.9], is a version of Gram–Schmidt reorthogonalization given by Barlow et al. [3].
First, we compute

h^{(1)} = V_k^T r_k,   r_k^{(1)} = r_k − V_k h^{(1)}.  (4.1)

If

‖r_k^{(1)}‖_2 ≥ √(4/5) ‖r_k‖_2,  (4.2)

then we accept

φ_{k+1} = ‖r_k^{(1)}‖_2,   v_{k+1} = r_k^{(1)}/φ_{k+1},   h_{k+1} = h^{(1)}.

Otherwise, we compute

h^{(2)} = V_k^T r_k^{(1)},   r_k^{(2)} = r_k^{(1)} − V_k h^{(2)}.  (4.3)

If

‖r_k^{(2)}‖_2 ≥ √(4/5) ‖r_k^{(1)}‖_2,  (4.4)

then we accept

φ_{k+1} = ‖r_k^{(2)}‖_2,   v_{k+1} = r_k^{(2)}/φ_{k+1},   h_{k+1} = h^{(1)} + h^{(2)}.
If either (4.2) or (4.4) holds, we show in the Appendix that, ignoring rounding error, we have

‖V_k^T v_{k+1}‖_2 ≤ 0.5 ξ_k + O(ξ_k^2)

where

ξ_k = ‖I − V_k^T V_k‖_2 = orth(V_k).  (4.5)
If both (4.2) and (4.4) are false, we use a method from [8], modified in [3], to construct φ_{k+1} and v_{k+1}. We find e_J such that

‖V_k^T e_J‖_2 = min_{1≤j≤m} ‖V_k^T e_j‖_2,

and then compute

c^{(1)} = V_k^T e_J,   t^{(1)} = e_J − V_k c^{(1)},
c^{(2)} = V_k^T t^{(1)},   t^{(2)} = t^{(1)} − V_k c^{(2)}.

Then

v_{k+1} = t^{(2)}/‖t^{(2)}‖_2  (4.6)

satisfies

‖V_k^T v_{k+1}‖_2 ≤ √(k/(m − k)) ξ_k^2 + O(ξ_k^4).
For all practical purposes, this choice of v_{k+1} restarts the Lanczos process. We propose two possible ways to choose φ_{k+1}. In exact arithmetic,

X^T u_k = V_k ĥ_{k+1} + r_k^{(2)},

where, in the Appendix, we show that

‖r_k^{(2)}‖_2 = (φ_{k+1}^2 + ‖n_k‖_2^2)^{1/2} ≤ 2 ξ_k (1 + ξ_k) ‖r_k‖_2 ≤ 2 ξ_k (1 + ξ_k) ‖X‖_2,  (4.7)

which is small relative to ‖X‖_2. Our first choice of φ_{k+1}, given by

φ_{k+1} = v_{k+1}^T r_k^{(2)},  (4.8)

produces

X^T u_k = V_k ĥ_{k+1} + φ_{k+1} v_{k+1} + n_k

where ‖n_k‖_2 is minimized over all choices of φ_{k+1}. However, since

|φ_{k+1}| ≤ ‖r_k^{(2)}‖_2 ≤ 2 ξ_k (1 + ξ_k) ‖X‖_2,

our second choice of φ_{k+1}, given by

φ_{k+1} = 0,

neglects an element of size O(ξ_k ‖X‖_2), the magnitude of the bounds on the errors in the singular values in Theorem 3.6.
We encapsulate this algorithm in Function 4.1. The Boolean variable setzero is true if we set φ_{k+1} to zero when (4.2) and (4.4) are false, and false if we compute φ_{k+1} as in (4.8).
Function 4.1 (Gram–Schmidt reorthogonalization of r_k against V_k).
function [v_{k+1}, h_k, φ_{k+1}] = GS_reorthog(V_k, r_k, setzero)
  h^{(1)} = V_k^T r_k; r_k^{(1)} = r_k − V_k h^{(1)};
  if ‖r_k^{(1)}‖_2 ≥ √(4/5) ‖r_k‖_2
    φ_{k+1} = ‖r_k^{(1)}‖_2; v_{k+1} = r_k^{(1)}/φ_{k+1}; h_k = h^{(1)};
  else
    h^{(2)} = V_k^T r_k^{(1)}; r_k^{(2)} = r_k^{(1)} − V_k h^{(2)};
    h_k = h^{(1)} + h^{(2)};
    if ‖r_k^{(2)}‖_2 ≥ √(4/5) ‖r_k^{(1)}‖_2
      φ_{k+1} = ‖r_k^{(2)}‖_2; v_{k+1} = r_k^{(2)}/φ_{k+1};
    else
      Find e_J such that ‖V_k^T e_J‖_2 = min_{1≤j≤m} ‖V_k^T e_j‖_2;
      c^{(1)} = V_k^T e_J; t^{(1)} = e_J − V_k c^{(1)};
      c^{(2)} = V_k^T t^{(1)}; t^{(2)} = t^{(1)} − V_k c^{(2)};
      v_{k+1} = t^{(2)}/‖t^{(2)}‖_2;
      if setzero
        φ_{k+1} = 0;
      else
        φ_{k+1} = v_{k+1}^T r_k^{(2)};
      end;
    end;
  end;
end GS_reorthog
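A NumPy sketch of Function 4.1 follows. The zero-norm guards and the function name gs_reorthog are my additions, not part of the paper's function; otherwise the logic mirrors (4.1)–(4.6).

```python
import numpy as np

def gs_reorthog(V, r, setzero):
    """Sketch of Function 4.1: at most two classical Gram-Schmidt passes,
    with a restart against the least-represented coordinate vector e_J
    when both acceptance tests (4.2) and (4.4) fail."""
    thresh = np.sqrt(4.0 / 5.0)
    h1 = V.T @ r
    r1 = r - V @ h1
    phi1 = np.linalg.norm(r1)
    if phi1 >= thresh * np.linalg.norm(r) and phi1 > 0:
        return r1 / phi1, h1, phi1                 # test (4.2) passed
    h2 = V.T @ r1
    r2 = r1 - V @ h2
    h = h1 + h2
    phi2 = np.linalg.norm(r2)
    if phi2 >= thresh * phi1 and phi2 > 0:
        return r2 / phi2, h, phi2                  # test (4.4) passed
    # Restart: e_J minimizes ||V^T e_j||_2, i.e. the row of V of least norm.
    J = int(np.argmin(np.linalg.norm(V, axis=1)))
    t = np.zeros(V.shape[0]); t[J] = 1.0
    t = t - V @ (V.T @ t)
    t = t - V @ (V.T @ t)
    v = t / np.linalg.norm(t)
    phi = 0.0 if setzero else float(v @ r2)
    return v, h, phi

rng = np.random.default_rng(6)
V, _ = np.linalg.qr(rng.standard_normal((9, 4)))
# Generic r: one or two passes suffice and the new vector is orthogonal to V.
v, h, phi = gs_reorthog(V, rng.standard_normal(9), setzero=True)
assert np.linalg.norm(V.T @ v) < 1e-10 and abs(np.linalg.norm(v) - 1) < 1e-12
# r exactly in range(V): both tests fail, the restart fires, phi is set to 0.
v, h, phi = gs_reorthog(np.eye(9)[:, :4], np.eye(9)[:, 0], setzero=True)
assert phi == 0.0 and np.linalg.norm(v[:4]) == 0.0
```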
4.2. Selective Reorthogonalization. Selective reorthogonalization was created by Parlett and Scott [22] from a result of Paige [18] showing that most of the loss of orthogonality in V_k is confined to converged right singular vectors. The variant of that strategy for this decomposition takes the SVD of B_k given by

B_k = Q Θ S^T,  (4.9)
Q = (q_1, …, q_k) = (q_{ij}),   S = (s_1, …, s_k) = (s_{ij}),  (4.10)
Θ = diag(θ_1, …, θ_k),  (4.11)

and finds indices ℓ_1, …, ℓ_τ such that for a given tolerance tol we have

|φ_{k+1} q_{k,ℓ_j}| ≤ tol · ‖X‖_F,   j = 1, …, τ.

It then lets

S_τ = (s_{ℓ_1}, …, s_{ℓ_τ})

be the corresponding right singular vectors of B_k, so that the matrix

Z_τ = V_k S_τ = (z_{ℓ_1}, …, z_{ℓ_τ})

consists of converged right singular vectors of X. A reorthogonalization procedure, say GS_reorthog, with Z_τ computes a vector t_{k+1} and v_{k+1} according to

[v_{k+1}, t_{k+1}, φ_{k+1}] = GS_reorthog(Z_τ, r_k),

with the resulting h_{k+1} in (2.7) given by

h_{k+1} = S_τ t_{k+1}.

The strategies in our examples in §5 are variants on performing Gram–Schmidt on all previous right Lanczos vectors, thereby allowing us to demonstrate the effect on the orthogonality of V. Since we expect that τ ≪ k, this reorthogonalization practice is often much cheaper than complete reorthogonalization.
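The selection step can be sketched as follows (an illustration only; the function converged_ritz_directions and its signature are mine, not the paper's):

```python
import numpy as np

def converged_ritz_directions(Bk, phi_next, Vk, tol, normX_F):
    """Flag right singular vectors of B_k whose residual estimate
    |phi_{k+1} q_{k,l}| falls below tol*||X||_F, and return
    Z_tau = V_k S_tau, the converged right singular directions."""
    Q, theta, St = np.linalg.svd(Bk)
    S = St.T                                 # columns s_l of (4.10)
    sel = np.abs(phi_next * Q[-1, :]) <= tol * normX_F
    return Vk @ S[:, sel], S[:, sel]

# Small demonstration on a 3x3 upper bidiagonal B_k.
Bk = np.diag([3.0, 2.0, 1.0]) + np.diag([0.1, 0.1], 1)
Vk, _ = np.linalg.qr(np.random.default_rng(7).standard_normal((6, 3)))
Z_all, _ = converged_ritz_directions(Bk, 0.0, Vk, 1e-8, 10.0)    # phi = 0: all flagged
Z_none, _ = converged_ritz_directions(Bk, 1.0, Vk, 1e-12, 10.0)  # tiny tol: none
assert Z_all.shape == (6, 3) and Z_none.shape == (6, 0)
```

Reorthogonalizing only against the τ selected columns is what makes the strategy cheaper than complete reorthogonalization when τ ≪ k.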
4.3. Parametrized Reorthogonalization for Numerical Tests. To construct our numerical tests in §5, we give a parametrized modification of GS_reorthog in §4.1. Let

φ_{k+1}^{(0)} = ‖r_k‖_2,   v_{k+1}^{(0)} = r_k/φ_{k+1}^{(0)},  (4.12)

and let

r_k^{(j)} = (I − V_k V_k^T)^j r_k,   φ_{k+1}^{(j)} = ‖r_k^{(j)}‖_2,   j = 1, 2,  (4.13)
v_{k+1}^{(j)} = r_k^{(j)}/φ_{k+1}^{(j)}.  (4.14)

For j = 0, 1, 2, we accept v_{k+1} = v_{k+1}^{(j)} (and do no further reorthogonalization) if

‖V_k^T v_{k+1}^{(j)}‖_2 ≤ η  (4.15)

for some specified parameter η. If (4.15) is not satisfied for j = 0, 1, 2, we compute v_{k+1} according to (4.6).
Function 4.2 (Parametrized Gram–Schmidt reorthogonalization of r_k against V_k).
function [v_{k+1}, h_k, φ_{k+1}] = GS_reorthog_eta(V_k, r_k, η, setzero)
  h^{(1)} = V_k^T r_k; φ_{k+1} = ‖r_k‖_2;
  if ‖h^{(1)}‖_2 ≤ φ_{k+1} · η
    v_{k+1} = r_k/φ_{k+1}; h_k = 0;
  else
    r_k^{(1)} = r_k − V_k h^{(1)}; h^{(2)} = V_k^T r_k^{(1)}; φ_{k+1} = ‖r_k^{(1)}‖_2;
    if ‖h^{(2)}‖_2 ≤ φ_{k+1} · η
      v_{k+1} = r_k^{(1)}/φ_{k+1}; h_k = h^{(1)};
    else
      r_k^{(2)} = r_k^{(1)} − V_k h^{(2)}; h^{(3)} = V_k^T r_k^{(2)};
      φ_{k+1} = ‖r_k^{(2)}‖_2;
      if ‖h^{(3)}‖_2 ≤ φ_{k+1} · η
        v_{k+1} = r_k^{(2)}/φ_{k+1}; h_k = h^{(1)} + h^{(2)};
      else
        Find e_J such that ‖V_k^T e_J‖_2 = min_{1≤j≤m} ‖V_k^T e_j‖_2;
        c^{(1)} = V_k^T e_J; t^{(1)} = e_J − V_k c^{(1)};
        c^{(2)} = V_k^T t^{(1)}; t^{(2)} = t^{(1)} − V_k c^{(2)};
        v_{k+1} = t^{(2)}/‖t^{(2)}‖_2;
        if setzero
          φ_{k+1} = 0;
        else
          φ_{k+1} = v_{k+1}^T r_k^{(2)};
        end;
      end;
    end;
  end;
end GS_reorthog_eta
This routine guarantees that

‖V_k^T v_{k+1}‖_2 ≤ max{η, √(n/(m − n)) ξ_k^2}

where ξ_k is from (4.5).
5. Numerical Tests.
Example 5.1. For these examples, we construct m × n matrices of the form

X = P Σ Z^T

where n = 50, 60, …, 300, m = ⌊1.5n⌋, P ∈ R^{m×n} is left orthogonal, Z ∈ R^{n×n} is orthogonal, and Σ is positive and diagonal. The matrices P and Z come from the two MATLAB commands

P = orth(randn(m, n)); Z = orth(randn(n, n));

where the randn command generates an m × n matrix with entries from the standard normal distribution, and the orth command produces an orthogonal basis for the range of its argument. The diagonal matrix Σ is given by

Σ = diag(σ_1, …, σ_n)

where σ_1 = 1, σ_k = r^{k−1}, and r^{n−1} = 10^{−18}, so that X has a geometric distribution of singular values.
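One instance of this construction, written in NumPy rather than MATLAB (a reduced QR factorization stands in for orth(randn(...))), is sketched below for n = 50.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 50
m = int(1.5 * n)
P, _ = np.linalg.qr(rng.standard_normal((m, n)))   # left orthogonal, m x n
Z, _ = np.linalg.qr(rng.standard_normal((n, n)))   # orthogonal, n x n
sigma = np.logspace(0, -18, n)      # sigma_1 = 1, sigma_n = 1e-18, geometric
X = P @ np.diag(sigma) @ Z.T

svals = np.linalg.svd(X, compute_uv=False)
# The larger singular values are reproduced; the tiny ones sit at the
# roundoff floor eps_M * ||X||_2.
assert np.allclose(svals[:15], sigma[:15], rtol=1e-6)
```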
The bidiagonal reduction of X was computed in three different ways.
1. The Golub–Kahan–Householder (GKH) algorithm from [10].
2. The Golub–Kahan–Lanczos procedure using Function 4.1 to do reorthogonalization, setting setzero = false. We call this GKL-nonzero.
3. The GKL procedure as in item 2, except setting setzero = true for restarts, which we call GKL-setzero. For this case, several elements of the superdiagonal are set to zero; we show how many in Figure 5.2.
The singular values of the resulting bidiagonal matrices are computed by the MATLAB svd command and compared to the result of the svd command applied directly to X.
The upper window of Figure 5.1 compares the singular values from the GKH algorithm to those from the GKL-nonzero algorithm; they are displayed together with orth(V). These errors and orth(V) are about 10^{−15}, thus V is nearly orthogonal and the singular values computed by the two methods are very close. The errors are the differences between their computed singular values and those computed by the MATLAB svd command on X. In the lower window of Figure 5.1, we look at the difference in the singular values between GKL-nonzero and GKL-setzero, again displayed with orth(V). The differences are about 10^{−16}, thus the two strategies are almost indistinguishable.
Example 5.2. We construct our examples exactly as in Example 5.1, except that we do reorthogonalization of V with Function 4.2, GS_reorthog_eta, with η = 10^{−8}. We do the same three kinds of bidiagonalization: GKH, GKL-nonzero, and GKL-setzero.
The upper window of Figure 5.3 shows the error in the singular values for GKH and GKL-nonzero, plotted beside orth(V). We see that orth(V) is in the range of 10^{−8}, and the error in the singular values for GKL-nonzero is a little smaller than that, consistent with Theorem 3.6. In the lower window of Figure 5.3, we compare the singular values of the two restart strategies GKL-nonzero and GKL-setzero. Their difference is about 10^{−9} and thus a little smaller than orth(V).
[Figure omitted: two panels plotting, against dimension 50–300, the log10 of the Wielandt–Hoffman error for GKH and GKL together with orth(V) (upper window), and the log10 of the difference in singular values between the restart strategies, the F-norm of the discards, and orth(V) (lower window).]
Fig. 5.1. Wielandt–Hoffman Error in Singular Values from Example 5.1
6. Conclusion. When the Golub–Kahan–Lanczos algorithm is applied to X ∈ R^{m×n}, we can reorthogonalize just the right Lanczos vectors, as proposed in [23]. Theorem 3.1 establishes a key relationship between V_k, the matrix of the first k right Lanczos vectors, B_k, the leading k × k submatrix of B, and an orthogonal matrix generated from the left Lanczos vectors. As a consequence, the computed singular values of B are a distance bounded by O([ε_M + orth(V)]‖X‖_2) from those of X.
Moreover, if the reorthogonalization strategies used to produce v_{k+1}, the (k + 1)st column of V, do not produce a sufficiently orthogonal v_{k+1}, then v_{k+1} can be produced from a restart strategy and the corresponding upper bidiagonal element φ_{k+1} can be set to zero without significant loss of accuracy.
In [1], Theorem 3.1 is used to change the manner in which left singular vectors are computed from the left Lanczos vectors.
[Figure omitted: number of zeroed-out superdiagonal elements from restarts, plotted against dimension 50–300 (roughly 5 to 40 zeros).]
Fig. 5.2. Number of zeros in superdiagonal, Example 5.1
REFERENCES

[1] J.L. Barlow. Reorthogonalization for the Golub–Kahan–Lanczos bidiagonal reduction: Part II – singular vectors. http://www.cse.psu.edu/~barlow/lanczos_svd_orthII.pdf, 2010.
[2] J.L. Barlow, N. Bosner, and Z. Drmač. A new backward stable bidiagonal reduction method. Linear Alg. Appl., 397:35–84, 2005.
[3] J.L. Barlow, A. Smoktunowicz, and H. Erbay. Improved Gram–Schmidt downdating methods. BIT, 45:259–285, 2005.
[4] M. Berry, Z. Drmač, and E. Jessup. Matrices, vector spaces, and information retrieval. SIAM Review, 41:335–362, 1999.
[5] Å. Björck. A bidiagonalization algorithm for solving large and sparse ill-posed systems of linear equations. BIT, 28:659–670, 1988.
[6] N. Bosner and J. Barlow. Block and parallel versions of one-sided bidiagonalization. SIAM J. Matrix Anal. Appl., 29(3):927–953, 2007.
[7] D. Calvetti and L. Reichel. Tikhonov regularization of large linear problems. BIT, 43:263–283, 2003.
[8] J.W. Daniel, W.B. Gragg, L. Kaufman, and G.W. Stewart. Reorthogonalization and stable algorithms for updating the Gram–Schmidt QR factorization. Math. Comp., 30(136):772–795, 1976.
[9] L. Eldén. Algorithms for the regularization of ill-conditioned least squares problems. BIT, 17:134–145, 1977.
[10] G.H. Golub and W.M. Kahan. Calculating the singular values and pseudoinverse of a matrix. SIAM J. Num. Anal. Ser. B, 2:205–224, 1965.
[11] G.H. Golub and C.F. Van Loan. Matrix Computations, Third Edition. The Johns Hopkins University Press, Baltimore, MD, 1996.
[12] N.J. Higham. Functions of Matrices: Theory and Computation. SIAM Publications, Philadelphia, PA, 2008.
[Figure omitted: two panels plotting, against dimension 50–300, the log10 of the Wielandt–Hoffman error for GKH and GKL together with orth(V) (upper window), and the log10 of the difference in singular values between the restart strategies, the F-norm of the discards, and orth(V) (lower window).]
Fig. 5.3. Maximum Error in Singular Values from Example 5.2
[13] I. Hnětynková, M. Plešinger, and Z. Strakoš. Golub–Kahan iterative bidiagonalization and determining the size of the noise in data. BIT, 49:669–696, 2009.
[14] W. Hoffmann. Iterative algorithms for Gram–Schmidt orthogonalization. Computing, 41:353–367, 1989.
[15] R.A. Horn and C.R. Johnson. Matrix Analysis. Cambridge University Press, Cambridge, UK, 1985.
[16] R. Mazumder, T. Hastie, and R. Tibshirani. Spectral regularization algorithms for learning large incomplete matrices. http://www-stat.stanford.edu/~hastie/Papers/SVD_JMLR.pdf.
[17] C.C. Paige and M.A. Saunders. LSQR: An algorithm for sparse linear equations and least squares problems. ACM Trans. on Math. Software, 8:43–71, 1982.
[18] C.C. Paige. The Computation of Eigenvalues and Eigenvectors of Very Large Sparse Matrices. PhD thesis, University of London, 1971.
[19] C.C. Paige. A useful form of unitary matrix from any sequence of unit 2-norm n-vectors. SIAM J. Matrix Anal. Appl., 31(2):565–583, 2009.
[20] C.C. Paige and M.A. Saunders. Algorithm 583. LSQR: Sparse linear equations and least squares problems. ACM Trans. on Math. Software, 8:195–209, 1982.
[21] B.N. Parlett. The Symmetric Eigenvalue Problem. SIAM Publications, Philadelphia, PA, 1998. Republication of 1980 book.
[22] B.N. Parlett and D.S. Scott. The Lanczos algorithm with selective reorthogonalization. Math. Comp., 33:217–238, 1979.
[23] H. Simon and H. Zha. Low rank matrix approximation using the Lanczos bidiagonalization process. SIAM J. Sci. Stat. Computing, 21:2257–2274, 2000.
[24] K. Yoo and H. Park. Accurate downdating of a modified Gram–Schmidt QR decomposition. BIT, 36:166–181, 1996.
Appendix. In this appendix, we show two bounds that are related to Function 4.1. They are similar to bounds proved in [3], but are instead stated in terms of ξ_k = orth(V_k).
First we assume that either (4.2) or (4.4) holds. Assuming (4.2), we have that

v_{k+1} = r_k^{(1)}/‖r_k^{(1)}‖_2.

The argument for (4.4) is identical. Our argument assumes exact arithmetic, but the arguments in [3] show that if ξ_k ≥ ε_M, rounding error has little qualitative effect on the behavior of this procedure.
First, note that

‖V_k^T v_{k+1}‖_2 ≤ ‖V_k^T r_k^{(1)}‖_2/‖r_k^{(1)}‖_2.
Since

r_k^{(1)} = (I − V_k V_k^T) r_k,

we have that

‖V_k^T r_k^{(1)}‖_2/‖r_k^{(1)}‖_2 = ‖V_k^T (I − V_k V_k^T) r_k‖_2/‖r_k^{(1)}‖_2
                               = ‖(I − V_k^T V_k) V_k^T r_k‖_2/‖r_k^{(1)}‖_2
                               ≤ ‖I − V_k^T V_k‖_2 ‖V_k^T r_k‖_2/‖r_k^{(1)}‖_2
                               = ξ_k ‖V_k^T r_k‖_2/‖r_k^{(1)}‖_2.
We now bound the ratio

‖V_k^T r_k‖_2/‖r_k^{(1)}‖_2.  (6.1)
To do that, we note that

‖r_k^{(1)}‖_2^2 = ‖(I − V_k V_k^T) r_k‖_2^2
             = ‖r_k‖_2^2 − 2 r_k^T V_k V_k^T r_k + ‖V_k V_k^T r_k‖_2^2
             ≤ ‖r_k‖_2^2 − 2 ‖V_k^T r_k‖_2^2 + ‖V_k‖_2^2 ‖V_k^T r_k‖_2^2.

Since

‖V_k‖_2^2 ≤ 1 + ξ_k,

we have

‖r_k^{(1)}‖_2^2 ≤ ‖r_k‖_2^2 − (1 − ξ_k) ‖V_k^T r_k‖_2^2.

Using (4.2), this becomes

(4/5) ‖r_k‖_2^2 ≤ ‖r_k‖_2^2 − (1 − ξ_k) ‖V_k^T r_k‖_2^2,
implying that

‖V_k^T r_k‖_2 ≤ [1/√(5(1 − ξ_k))] ‖r_k‖_2.  (6.2)

Combining (6.2) with (4.2), our bound for (6.1) is

‖V_k^T r_k‖_2/‖r_k^{(1)}‖_2 ≤ 1/(2√(1 − ξ_k)).

Thus

‖V_k^T v_{k+1}‖_2 ≤ ξ_k/(2√(1 − ξ_k)) = 0.5 ξ_k + O(ξ_k^2).
If (4.2) and (4.4) are both false, analysis in [8] yields

‖V_k^T e_J‖_2 ≤ √(k/m).

Thus, similarly to the above, if we let

v_{k+1}^{(1)} = t^{(1)}/‖t^{(1)}‖_2,

then

‖V_k^T v_{k+1}^{(1)}‖_2 = ‖V_k^T t^{(1)}‖_2/‖t^{(1)}‖_2 ≤ ξ_k ‖V_k^T e_J‖_2/‖t^{(1)}‖_2 ≤ √(k/(m − k)) ξ_k.
Now we look at

v_{k+1} = t^{(2)}/‖t^{(2)}‖_2.

A repeat of the arguments above yields

‖V_k^T v_{k+1}‖_2 = ‖V_k^T t^{(2)}‖_2/‖t^{(2)}‖_2 ≤ ξ_k ‖V_k^T v_{k+1}^{(1)}‖_2/‖t^{(2)}‖_2 ≤ √(k/(m − k)) ξ_k^2/‖t^{(2)}‖_2,

where here (after a harmless rescaling by ‖t^{(1)}‖_2) we take

t^{(2)} = (I − V_k V_k^T) v_{k+1}^{(1)}.

We have that

‖t^{(2)}‖_2^2 = ‖v_{k+1}^{(1)}‖_2^2 − 2 ‖V_k^T v_{k+1}^{(1)}‖_2^2 + ‖V_k V_k^T v_{k+1}^{(1)}‖_2^2
           ≥ 1 − 2 ξ_k^2 k/(m − k) + σ_k^2(V_k) ‖V_k^T v_{k+1}^{(1)}‖_2^2.
Using (2.13), we have

‖t^{(2)}‖_2^2 ≥ 1 − ξ_k^2 (1 + ξ_k) k/(m − k).

Thus

‖V_k^T v_{k+1}‖_2 ≤ (k/(m − k))^{1/2} ξ_k^2/(1 − ξ_k^2 (1 + ξ_k) k/(m − k))^{1/2}
                = √(k/(m − k)) ξ_k^2 + O(ξ_k^4).

We now bound ‖r_k^{(2)}‖_2 in the case where ‖r_k^{(2)}‖_2 < √(4/5) ‖r_k^{(1)}‖_2.
Computing two-norms yields

‖r_k^{(2)}‖_2^2 = ‖r_k^{(1)}‖_2^2 + ‖V_k V_k^T r_k^{(1)}‖_2^2 − 2 [r_k^{(1)}]^T V_k V_k^T r_k^{(1)}
             = ‖r_k^{(1)}‖_2^2 + ‖V_k V_k^T r_k^{(1)}‖_2^2 − 2 ‖V_k^T r_k^{(1)}‖_2^2.

The second term expands to

‖V_k V_k^T r_k^{(1)}‖_2^2 = [r_k^{(1)}]^T V_k (V_k^T V_k) V_k^T r_k^{(1)}
                       = ‖V_k^T r_k^{(1)}‖_2^2 − [r_k^{(1)}]^T V_k G_k V_k^T r_k^{(1)},

where G_k = I − V_k^T V_k and, from (4.5), ‖G_k‖_2 = orth(V_k) = ξ_k. Thus

‖r_k^{(2)}‖_2^2 = ‖r_k^{(1)}‖_2^2 − ‖V_k^T r_k^{(1)}‖_2^2 − [r_k^{(1)}]^T V_k G_k V_k^T r_k^{(1)}.  (6.3)
Equation (6.3) leads to the bound

‖r_k^{(2)}‖_2^2 ≥ ‖r_k^{(1)}‖_2^2 − (1 + ξ_k) ‖V_k^T r_k^{(1)}‖_2^2,

and since (4.4) is false,

(4/5) ‖r_k^{(1)}‖_2^2 ≥ ‖r_k^{(2)}‖_2^2 ≥ ‖r_k^{(1)}‖_2^2 − (1 + ξ_k) ‖V_k^T r_k^{(1)}‖_2^2,

so that

‖V_k^T r_k^{(1)}‖_2^2 ≥ [1/(5(1 + ξ_k))] ‖r_k^{(1)}‖_2^2,  (6.4)

which implies

‖V_k^T r_k^{(1)}‖_2^2 ≥ [1/(5(1 + ξ_k))] (5/4) ‖r_k^{(2)}‖_2^2 = [1/(4(1 + ξ_k))] ‖r_k^{(2)}‖_2^2.  (6.5)
Since we have

V_k^T r_k^{(1)} = V_k^T (I − V_k V_k^T) r_k = G_k V_k^T r_k,

then

‖V_k^T r_k^{(1)}‖_2^2 ≤ ‖G_k‖_2^2 ‖V_k^T r_k‖_2^2 ≤ ξ_k^2 (1 + ξ_k) ‖r_k‖_2^2.  (6.6)

Combining (6.6) and (6.5) and taking square roots, we have

‖r_k^{(2)}‖_2 ≤ 2 ξ_k (1 + ξ_k) ‖r_k‖_2.

Since ‖r_k‖_2 ≤ ‖X‖_2, we have that

‖r_k^{(2)}‖_2 ≤ 2 ξ_k (1 + ξ_k) ‖X‖_2.  (6.7)