
Randomized Numerical Linear Algebra

Zhang

Random Projection: The Johnson and Lindenstrauss Lemma

Randomized SVD

Subspace Embedding

Random Selection: Column Selection

CUR Decomposition

The Nyström Method

References

Randomized Numerical Linear Algebra: Review and Progresses

Zhihua Zhang

Department of Computer Science and Engineering
Shanghai Jiao Tong University

The 12th China Workshop on Machine Learning and Applications

Xi’an, November 2014


Randomized numerical linear algebra is an interdisciplinary area at the intersection of Theoretical Computer Science (TCS), Numerical Linear Algebra (NLA), and modern data analysis.
Many data mining and machine learning algorithms involve matrix decompositions, matrix inverses, and matrix determinants, and some methods are based on low-rank matrix approximation.
The Big Data phenomenon brings new challenges and opportunities to machine learning and data mining.


Singular Value Decomposition (SVD)

Input: an $m \times n$ data matrix $A$ of rank $r$ and an integer $k$ less than $r$.
The (condensed) SVD: $A = U\Sigma V^T$, where $U^TU = I_r$, $V^TV = I_r$, and $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_r)$ with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$.

Time complexity: $O(mn\min(m,n))$.

The truncated SVD: $A_k = U_k\Sigma_kV_k^T$, where $U_k$ and $V_k$ are the first $k$ columns of $U$ and $V$, and $\Sigma_k$ is the $k \times k$ top-left sub-block of $\Sigma$.

$A_k$ is the "closest" rank-$k$ approximation to $A$. That is,

$$A_k = \mathop{\mathrm{argmin}}_{\mathrm{rank}(X) \le k} \|A - X\|_\xi,$$

where $\xi = 2$ denotes the matrix spectral norm and $\xi = F$ the matrix Frobenius norm.

Time complexity: $O(mnk)$.
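A minimal NumPy sketch of the truncated SVD, assuming a random Gaussian test matrix (the helper name `truncated_svd` is illustrative, not from any library). It checks the optimality claim numerically through the Eckart-Young identities $\|A - A_k\|_2 = \sigma_{k+1}$ and $\|A - A_k\|_F = (\sum_{i>k}\sigma_i^2)^{1/2}$; with 0-based indexing, `s[k]` is $\sigma_{k+1}$. The full SVD used here costs $O(mn\min(m,n))$, not the $O(mnk)$ of a dedicated truncated solver.

```python
import numpy as np

def truncated_svd(A, k):
    """Illustrative helper: return A_k = U_k Sigma_k V_k^T and all singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # condensed SVD
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    return A_k, s

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 40))
k = 5
A_k, s = truncated_svd(A, k)

err_2 = np.linalg.norm(A - A_k, 2)      # spectral norm of the residual
err_F = np.linalg.norm(A - A_k, "fro")  # Frobenius norm of the residual
print(np.isclose(err_2, s[k]), np.isclose(err_F, np.sqrt(np.sum(s[k:] ** 2))))
```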


The CUR Decomposition

A CUR decomposition algorithm seeks a subset of $c$ columns of $A$ to form a matrix $C \in \mathbb{R}^{m\times c}$, a subset of $r$ rows to form a matrix $R \in \mathbb{R}^{r\times n}$, and an intersection matrix $U \in \mathbb{R}^{c\times r}$ such that $\|A - CUR\|_\xi$ is minimized.

The CUR decomposition yields an interpretable matrix approximation to $A$.
There are $\binom{n}{c}$ possible choices for constructing $C$ and $\binom{m}{r}$ possible choices for constructing $R$, so selecting the best subsets is a hard problem.


Kernel Methods

$K$: the $n \times n$ kernel matrix.

Matrix inverse: $b = (K + \alpha I_n)^{-1}y$. Time complexity: $O(n^3)$. Used by Gaussian process regression, least squares SVM, and kernel ridge regression.

Partial eigenvalue decomposition of $K$. Time complexity: $O(n^2k)$. Used by kernel PCA and some manifold learning methods.

Space complexity: $O(n^2)$.

Iterative algorithms make many passes through the data, so the entire kernel matrix should be kept in RAM. If the data do not fit in RAM, every pass requires a swap between RAM and disk.
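A minimal NumPy sketch of the dense computation above, assuming an RBF kernel and $\alpha = 10^{-2}$ chosen only for illustration (`rbf_kernel` is a hypothetical helper). It forms the full $n \times n$ matrix and solves the regularized system directly, which is exactly the $O(n^2)$ memory and $O(n^3)$ time cost this slide lists.

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    """Hypothetical helper: Gaussian (RBF) kernel matrix, an illustrative kernel choice."""
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared pairwise distances
    return np.exp(-gamma * np.maximum(D2, 0.0))

rng = np.random.default_rng(0)
n, alpha = 300, 1e-2
X = rng.standard_normal((n, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

K = rbf_kernel(X)                              # dense n x n matrix: O(n^2) memory
b = np.linalg.solve(K + alpha * np.eye(n), y)  # dense solve: O(n^3) time
print(K.nbytes)                                # 720000 bytes = 8 * n^2 for float64
```

Kernel ridge regression would then predict with $k(x)^Tb$; both the quadratic storage and the cubic solve grow quickly with $n$.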


Approaches for Large-Scale Matrix Computations

Two typical approaches: incremental and distributed.
Randomized algorithms have also been used.


Outline

1 Random Projection: The Johnson and Lindenstrauss Lemma; Randomized SVD

2 Subspace Embedding

3 Random Selection: Column Selection; CUR Decomposition; The Nyström Method

4 References


The Johnson and Lindenstrauss Lemma

This lemma was given by Johnson and Lindenstrauss (1984), but the proof was not constructive.
Indyk and Motwani (1998) and Dasgupta and Gupta (2003) constructed a result based on a Gaussian random projection matrix $R = [r_{ij}]$ with $r_{ij} \overset{iid}{\sim} N(0,1)$.
Matoušek (2008) generalized the result to the case where the $r_{ij}$'s are any subgaussian random variables; that is, $r_{ij} \overset{iid}{\sim} G(\nu^2)$ for $\nu \ge 1$.



Definition (ε-isometry)

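Given $\epsilon \in (0,1)$, a map $f: \mathbb{R}^p \to \mathbb{R}^q$ with $p > q$ is called an ε-isometry on a set $X \subset \mathbb{R}^p$ if for every pair $x, y \in X$ we have

$$(1-\epsilon)\|x-y\|_2^2 \le \|f(x)-f(y)\|_2^2 \le (1+\epsilon)\|x-y\|_2^2.$$

We consider the case where $f$ is a linear map $R \in \mathbb{R}^{q\times p}$. The basic idea is to construct a random projection $R \in \mathbb{R}^{q\times p}$ that is an exact isometry "in expectation"; that is, for every $x \in \mathbb{R}^p$,

$$\mathbb{E}\big[\|Rx\|_2^2\big] = \|x\|_2^2.$$

For a Gaussian matrix this requires the normalization $R = \frac{1}{\sqrt{q}}[r_{ij}]$ with $r_{ij} \overset{iid}{\sim} N(0,1)$.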
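A one-line check of the "isometry in expectation" property, assuming $R = \frac{1}{\sqrt{q}}G$ with $g_{ij} \overset{iid}{\sim} N(0,1)$ and writing $g_i^T$ for the $i$-th row of $G$ (notation introduced here):

```latex
\mathbb{E}\|Rx\|_2^2
  = \frac{1}{q}\sum_{i=1}^{q}\mathbb{E}\big[(g_i^T x)^2\big]
  = \frac{1}{q}\sum_{i=1}^{q} x^T\,\mathbb{E}\big[g_i g_i^T\big]\,x
  = \frac{1}{q}\sum_{i=1}^{q} x^T I_p\, x
  = \|x\|_2^2 .
```

Without the $1/\sqrt{q}$ factor the same computation gives $q\|x\|_2^2$, which is why the normalization matters.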



Theorem (The Johnson and Lindenstrauss Lemma)

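Let $X = \{x_1, \ldots, x_n\} \subset \mathbb{R}^p$, and let $\epsilon, \delta \in (0,1)$. Assume that $R \in \mathbb{R}^{q\times p}$ ($p > q$), where $r_{ij} \sim G(\nu^2)$ for some $\nu \ge 1$. If $q \ge 100\nu^2\epsilon^{-2}\log(n/\sqrt{\delta})$, then with probability at least $1-\delta$, $R$ is an ε-isometry on $X$; that is,

$$\Pr\Big\{\sup_{y\in Y}\big|\|Ry\|_2^2 - 1\big| \ge \epsilon\Big\} \le \delta,$$

where

$$Y = \Big\{\frac{x_i - x_j}{\|x_i - x_j\|_2} : x_i, x_j \in X,\ x_i \ne x_j\Big\}.$$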
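A small numerical experiment with the theorem, as a sketch under stated assumptions: Gaussian entries ($\nu = 1$) scaled by $1/\sqrt{q}$ so that the isometry holds in expectation (the slide leaves this normalization implicit), natural logarithms, and illustrative sizes. The helpers `jl_dimension` and `max_distortion` are hypothetical names; the latter evaluates $\sup_{y\in Y}|\|Ry\|_2^2 - 1|$ directly over all normalized pairwise differences.

```python
import numpy as np

def jl_dimension(n, eps, delta, nu=1.0):
    """Hypothetical helper: q >= 100 nu^2 eps^-2 log(n / sqrt(delta)), natural log."""
    return int(np.ceil(100 * nu ** 2 / eps ** 2 * np.log(n / np.sqrt(delta))))

def max_distortion(X, R):
    """Hypothetical helper: sup over normalized differences y of | ||R y||^2 - 1 |."""
    RX = X @ R.T                       # project every point once
    worst, n = 0.0, X.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            ratio = np.sum((RX[i] - RX[j]) ** 2) / np.sum((X[i] - X[j]) ** 2)
            worst = max(worst, abs(ratio - 1.0))
    return worst

rng = np.random.default_rng(0)
n, p, eps, delta = 20, 3000, 0.5, 0.1
q = jl_dimension(n, eps, delta)               # 1659 for these values
X = rng.standard_normal((n, p))
R = rng.standard_normal((q, p)) / np.sqrt(q)  # Gaussian entries scaled by 1/sqrt(q)
print(q, max_distortion(X, R) < eps)
```

The constant 100 is conservative: in runs like this the observed distortion is typically far below $\epsilon$.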


Prototype for Randomized SVD

Given an $m \times n$ matrix $A$, a target number $k$ of singular vectors, and an integer $c$ such that $k < c < \min(m,n)$, a proto-algorithm based on random projection for the singular value decomposition (SVD) of $A$ is as follows.

1 Construct an $m \times c$ column-orthonormal matrix $Q$ and form $B = Q^TA$;

2 Compute the SVD of the small $c \times n$ matrix: $B = U_B\Sigma_BV_B^T$;

3 Set $U = QU_B$;

4 Return $U\Sigma_BV_B^T$ as an approximate SVD of $A$, and $U_k\Sigma_{B,k}V_{B,k}^T$ as an approximate truncated SVD of $A$, where $U_k$ holds the first $k$ columns of $U$.
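Why the quality of the prototype depends only on $Q$: substituting steps 1 to 3 (exact arithmetic assumed) gives

```latex
U\Sigma_B V_B^T = (QU_B)\,\Sigma_B V_B^T = Q\,(U_B\Sigma_B V_B^T) = QB = QQ^TA,
\qquad
\|A - U\Sigma_B V_B^T\|_\xi = \|(I_m - QQ^T)A\|_\xi ,
```

and $U$ is column-orthonormal because $U^TU = U_B^TQ^TQU_B = U_B^TU_B = I_c$.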


A Proto-Algorithm for Construction of the Random Projection Matrix Q

Let $A$ be an $m \times n$ matrix, and let $k$ be a target number of singular vectors.

1 Generate an $n \times 2k$ Gaussian test matrix $\Omega$.

2 Form $Y = (AA^T)^\gamma A\Omega$, where $\gamma = 1$ or $\gamma = 2$.

3 Construct a matrix $Q$ whose columns form an orthonormal basis for the range of $Y$.
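The two proto-algorithms combine into a short randomized SVD. This NumPy sketch follows the numbered steps literally, with $c = 2k$; the wrapper `CountingOperator` and the function `randomized_svd` are hypothetical names used to count passes over $A$. In floating point, Halko et al. (2011) recommend re-orthonormalizing between the power steps; this sketch omits that for brevity.

```python
import numpy as np

class CountingOperator:
    """Hypothetical wrapper that counts passes (products with A or A^T)."""
    def __init__(self, A):
        self.A, self.passes = A, 0
    def matmul(self, X):   # A @ X
        self.passes += 1
        return self.A @ X
    def rmatmul(self, X):  # A^T @ X
        self.passes += 1
        return self.A.T @ X

def randomized_svd(op, n, k, gamma=1, seed=None):
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((n, 2 * k))  # step 1: n x 2k Gaussian test matrix
    Y = op.matmul(Omega)                     # A Omega
    for _ in range(gamma):                   # step 2: Y = (A A^T)^gamma A Omega
        Y = op.matmul(op.rmatmul(Y))
    Q, _ = np.linalg.qr(Y)                   # step 3: orthonormal basis of range(Y)
    B = op.rmatmul(Q).T                      # prototype step 1: B = Q^T A
    U_B, s, Vt = np.linalg.svd(B, full_matrices=False)  # step 2: SVD of small B
    return Q @ U_B, s, Vt                    # step 3: U = Q U_B

rng = np.random.default_rng(0)
m, n, k = 200, 150, 10
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))  # rank-k signal
A += 1e-3 * rng.standard_normal((m, n))                        # plus small noise
op = CountingOperator(A)
U, s, Vt = randomized_svd(op, n, k, gamma=1, seed=1)
s_exact = np.linalg.svd(A, compute_uv=False)
print(op.passes)  # 2 * gamma + 2 = 4
```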


Computational Complexity for the Randomized SVD

The randomized SVD procedure requires only $2(\gamma + 1)$ passes over the matrix. The flop count is

$$(2\gamma + 2)\,k\,T_{\mathrm{mult}} + O\big(k^2(m+n)\big),$$

where $T_{\mathrm{mult}}$ is the flop count of a matrix-vector multiply with $A$ or $A^T$.
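A back-of-the-envelope count for a dense matrix, where $T_{\mathrm{mult}} = O(mn)$; the sizes $m = n = 10^4$, $k = 10$, $\gamma = 1$ are illustrative and constant factors are dropped:

```latex
(2\gamma+2)\,k\,T_{\mathrm{mult}} \approx 4 \cdot 10 \cdot 10^{8} = 4\times 10^{9},
\qquad
k^2(m+n) = 100 \cdot 2\times 10^{4} = 2\times 10^{6},
\qquad
mn\min(m,n) = 10^{12}.
```

So in this regime the randomized procedure is dominated by its four passes over $A$ and is roughly two orders of magnitude cheaper than a full SVD.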


Theoretical Analysis for the Randomized SVD

Theorem (Halko et al., 2011)

Let $A \in \mathbb{R}^{m\times n}$. Given an exponent $\gamma$ and a target number $k$ of singular vectors, where $2 \le k \le \frac{1}{2}\min(m,n)$, running the randomized SVD algorithm yields a rank-$2k$ factorization $U_{2k}\Sigma_{2k}V_{2k}^T$. Then

$$\mathbb{E}\|A - U_{2k}\Sigma_{2k}V_{2k}^T\|_2 \le \Big[1 + 4\sqrt{\frac{2\min(m,n)}{k-1}}\Big]^{1/(2\gamma+1)}\sigma_{k+1},$$

where the expectation is taken w.r.t. the random test matrix and $\sigma_{k+1}$ is the $(k+1)$-th largest singular value of $A$.
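To see how much the exponent $\gamma$ helps, plug in illustrative values $m = n = 1000$ and $k = 10$, for which $1 + 4\sqrt{2000/9} \approx 60.6$:

```latex
\Big[1 + 4\sqrt{\tfrac{2\cdot 1000}{9}}\Big]^{1/(2\gamma+1)} \approx
\begin{cases}
60.6, & \gamma = 0,\\
3.93, & \gamma = 1,\\
2.27, & \gamma = 2.
\end{cases}
```

Each power iteration sharply shrinks the constant in front of $\sigma_{k+1}$, which is why $\gamma = 1$ or $\gamma = 2$ is used.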


The Subspace Embedding Problem

For a fixed $m \times n$ matrix $A$ of rank $r$ and an error parameter $\epsilon \in (0,1)$, we call $S: \mathbb{R}^m \to \mathbb{R}^k$ a subspace embedding matrix for $A$ if

$$(1-\epsilon)\|Ax\|_2 \le \|SAx\|_2 \le (1+\epsilon)\|Ax\|_2$$

for all $x \in \mathbb{R}^n$. The subspace embedding problem is to find such an embedding matrix obliviously. More specifically, one designs a distribution $\pi$ over linear maps from $\mathbb{R}^m$ to $\mathbb{R}^k$ such that for any fixed $m \times n$ matrix $A$, if we choose $S \sim \pi$, then with high probability $S$ is an embedding matrix for $A$.
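Checking the embedding property reduces to a singular value computation. Write the condensed SVD as $A = U_A\Sigma_AV_A^T$ with $U_A \in \mathbb{R}^{m\times r}$ (notation introduced here). Every $Ax$ equals $U_Az$ with $z = \Sigma_AV_A^Tx$, which ranges over all of $\mathbb{R}^r$, and $\|Ax\|_2 = \|z\|_2$; hence

```latex
(1-\epsilon)\|Ax\|_2 \le \|SAx\|_2 \le (1+\epsilon)\|Ax\|_2 \quad \forall x \in \mathbb{R}^n
\iff
1-\epsilon \le \sigma_{\min}(SU_A) \le \sigma_{\max}(SU_A) \le 1+\epsilon .
```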


Sparse Embedding Matrices

For a fixed $m \times n$ matrix $A$ with $m > n$, let $\mathrm{nnz}(A)$ denote the number of nonzero entries of $A$. Assume that $\mathrm{nnz}(A) \ge m$ and that $A$ has no all-zero rows or columns. Let $[m] = \{1, 2, \ldots, m\}$. For a parameter $k$, define a random linear map $\Phi D: \mathbb{R}^m \to \mathbb{R}^k$ as follows:

$h: [m] \to [k]$ is a random map such that for each $i \in [m]$, $h(i) = t$ with probability $1/k$ for each $t \in [k]$.
$\Phi \in \{0,1\}^{k\times m}$ is a $k \times m$ binary matrix with $\Phi_{h(i),i} = 1$ and all remaining entries 0.
$D$ is an $m \times m$ random diagonal matrix, with each diagonal entry independently chosen to be $+1$ or $-1$ with equal probability.

A matrix of the form $S = \Phi D$ is referred to as a sparse embedding matrix (Dasgupta et al., 2010; Clarkson and Woodruff, 2013).
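A direct transcription of the definition into NumPy, as a sketch for small sizes only (the function name `sparse_embedding` is illustrative, and indices are 0-based). It materializes $S = \Phi D$ densely to make the structure visible; this construction is also widely known as CountSketch.

```python
import numpy as np

def sparse_embedding(m, k, seed=None):
    """Illustrative helper: explicit dense k x m matrix S = Phi D."""
    rng = np.random.default_rng(seed)
    h = rng.integers(0, k, size=m)           # h(i) uniform on {0, ..., k-1}
    signs = rng.choice([-1.0, 1.0], size=m)  # diagonal of D
    Phi = np.zeros((k, m))
    Phi[h, np.arange(m)] = 1.0               # Phi[h(i), i] = 1
    S = Phi * signs[None, :]                 # same as Phi @ np.diag(signs)
    return S, h, signs

S, h, signs = sparse_embedding(m=1000, k=50, seed=0)
# Every column of S holds exactly one nonzero entry, equal to +1 or -1.
print(np.all(np.count_nonzero(S, axis=0) == 1))
```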


Subspace Embedding in Input-Sparsity Time

Theorem (Meng and Mahoney, 2013)

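Let $S = \Phi D \in \mathbb{R}^{k\times m}$ with $k = \frac{n^2+n}{\epsilon^2\delta}$. Then with probability at least $1-\delta$,

$$(1-\epsilon)\|Ax\|_2 \le \|SAx\|_2 \le (1+\epsilon)\|Ax\|_2$$

for all $x \in \mathbb{R}^n$. In addition, $SA$ can be computed in $O(\mathrm{nnz}(A))$ time.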
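The $O(\mathrm{nnz}(A))$ claim comes from never forming $S$: each row of $A$ is multiplied by its sign and added into row $h(i)$ of $SA$. A NumPy sketch under illustrative sizes (helper names are hypothetical); `np.add.at` performs the unbuffered scatter-add, and the distortion is measured through the singular values of $SQ$ for an orthonormal basis $Q$ of $\mathrm{range}(A)$. With a dense array every entry is nonzero, so the work here is $O(mn) = O(\mathrm{nnz}(A))$.

```python
import numpy as np

def apply_sparse_embedding(A, k, seed=None):
    """Hypothetical helper: compute S A for S = Phi D without forming S."""
    rng = np.random.default_rng(seed)
    m = A.shape[0]
    h = rng.integers(0, k, size=m)
    signs = rng.choice([-1.0, 1.0], size=m)
    SA = np.zeros((k, A.shape[1]))
    np.add.at(SA, h, signs[:, None] * A)  # add D_ii * (row i of A) into row h(i)
    return SA

def distortion_range(A, SA):
    """Hypothetical helper: extreme singular values of S Q, Q an orthonormal basis of range(A)."""
    _, R = np.linalg.qr(A)                # A = Q R, R invertible for full column rank A
    SQ = np.linalg.solve(R.T, SA.T).T     # S Q = (S A) R^{-1}
    sv = np.linalg.svd(SQ, compute_uv=False)
    return sv.min(), sv.max()

rng = np.random.default_rng(0)
m, n, eps, delta = 20000, 5, 0.5, 0.2
k = int(np.ceil((n ** 2 + n) / (eps ** 2 * delta)))  # (25 + 5) / 0.05 = 600
A = rng.standard_normal((m, n))
lo, hi = distortion_range(A, apply_sparse_embedding(A, k, seed=1))
print(k, 1 - eps <= lo and hi <= 1 + eps)
```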


Spectral Sparsifiers

Theorem (Batson, Spielman and Srivastava, 2014)

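Suppose $\rho > 1$ and $\{v_1, v_2, \ldots, v_m\} \subseteq \mathbb{R}^n$ with

$$\sum_{i\le m} v_iv_i^T = I_n.$$

Then there exist scalars $d_i \ge 0$ with $|\{i : d_i \ne 0\}| \le \lceil \rho n \rceil$ such that

$$\Big(1 - \frac{1}{\sqrt{\rho}}\Big)^2 I_n \preceq \sum_{i\le m} d_iv_iv_i^T \preceq \Big(1 + \frac{1}{\sqrt{\rho}}\Big)^2 I_n.$$

This theorem shows that

$$\frac{\lambda_1\big(\sum_{i\le m} d_iv_iv_i^T\big)}{\lambda_n\big(\sum_{i\le m} d_iv_iv_i^T\big)} \le \frac{\rho + 1 + 2\sqrt{\rho}}{\rho + 1 - 2\sqrt{\rho}}.$$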
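The condition-number bound is just the ratio of the two sandwich constants, $\frac{(1+1/\sqrt{\rho})^2}{(1-1/\sqrt{\rho})^2} = \frac{(\sqrt{\rho}+1)^2}{(\sqrt{\rho}-1)^2}$. Plugging in the illustrative value $\rho = 4$:

```latex
\tfrac14 I_n = \Big(1-\tfrac12\Big)^2 I_n \preceq \sum_{i\le m} d_iv_iv_i^T \preceq \Big(1+\tfrac12\Big)^2 I_n = \tfrac94 I_n,
\qquad
\frac{4+1+2\cdot 2}{4+1-2\cdot 2} = \frac{9}{1} = 9,
```

with at most $4n$ nonzero weights $d_i$.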


Column Selection and The CX Decomposition

Given an $m \times n$ matrix $A$, column selection algorithms aim to find a matrix $C$ consisting of $c$ columns of $A$ such that $\|A - CC^+A\|_\xi = \|(I_m - CC^+)A\|_\xi$ is minimized. Here $\xi = 2$, $\xi = F$, and $\xi = *$ respectively denote the matrix spectral norm, the matrix Frobenius norm, and the matrix nuclear norm, and $C^+$ is the Moore-Penrose inverse of $C$.
Let $X \in \mathbb{R}^{c\times n}$ be such that $CX$ is the best rank-$k$ approximation to $A$ in the column space of $C$. Then $CX$ is called the CX decomposition of $A$.
Since there are $\binom{n}{c}$ possible choices for constructing $C$, selecting the best subset is a hard problem.
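A toy illustration of the combinatorial search, as a sketch with illustrative sizes (`selection_error` is a hypothetical helper name): exhaustive search over all $\binom{n}{c}$ subsets is feasible only for tiny $n$, and even the best subset cannot beat the optimal rank-$c$ error $(\sum_{i>c}\sigma_i^2)^{1/2}$.

```python
import itertools
import math
import numpy as np

def selection_error(A, cols, ord="fro"):
    """Hypothetical helper: ||A - C C^+ A||_xi for C = A[:, cols]; ord may be 2, 'fro' or 'nuc'."""
    C = A[:, list(cols)]
    return np.linalg.norm(A - C @ np.linalg.pinv(C) @ A, ord)

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 10))
c = 3
subsets = list(itertools.combinations(range(A.shape[1]), c))  # all (n choose c) subsets
best = min(subsets, key=lambda cols: selection_error(A, cols))
s = np.linalg.svd(A, compute_uv=False)
print(len(subsets) == math.comb(10, 3), selection_error(A, best))
```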


A Randomized Algorithm for Column Selection

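Given an $m \times n$ matrix $A$ and a rank parameter $k$, random sampling based on the statistical leverage scores proceeds as follows:

Compute the importance sampling probabilities $\{\pi_i\}_{i=1}^n$, where $\pi_i = \frac{1}{k}\|V_k^{(i)}\|_2^2$, $V_k^{(i)}$ is the $i$-th row of $V_k$, and $V_k$ is an $n \times k$ orthonormal matrix spanning the top-$k$ right singular subspace of $A$. (These probabilities sum to one because $\|V_k\|_F^2 = k$.)
Randomly select $c = O(k\epsilon^{-2}\log k)$ columns of $A$ according to these probabilities to form the matrix $C$.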
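A NumPy sketch of the sampling step, assuming i.i.d. sampling with replacement and an exact SVD for the leverage scores (in practice the SVD is the expensive part). Helper names are hypothetical, and $c = 40$ is arbitrary because the slide's constant in $O(\cdot)$ is unspecified.

```python
import numpy as np

def leverage_probabilities(A, k):
    """Hypothetical helper: pi_i = ||V_k^{(i)}||_2^2 / k from an exact SVD."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = Vt[:k].T                        # n x k, orthonormal columns
    return np.sum(Vk ** 2, axis=1) / k   # sums to ||V_k||_F^2 / k = 1

def leverage_column_sample(A, k, c, seed=None):
    """Hypothetical helper: sample c columns i.i.d. with the leverage probabilities."""
    rng = np.random.default_rng(seed)
    p = leverage_probabilities(A, k)
    idx = rng.choice(A.shape[1], size=c, replace=True, p=p)
    return A[:, idx], idx, p

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 80))
C, idx, p = leverage_column_sample(A, k=5, c=40, seed=1)
print(C.shape, round(p.sum(), 12))
```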


Theoretical Result for the Random Column Selection (Drineas et al., 2008)

Let $C_k$ be the best rank-$k$ approximation to the matrix $C$, and define the projection matrix $P_{C_k} = C_kC_k^+$. Then, with high probability,

$$\|A - P_{C_k}A\|_F \le (1+\epsilon)\|A - A_k\|_F,$$

where $A_k = U_k\Sigma_kV_k^T$ is the best rank-$k$ approximation to $A$.
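Computing $P_{C_k}A$ needs only the top-$k$ left singular vectors of $C$, because $C_kC_k^+ = U_kU_k^T$ when $C_k = U_k\Sigma_kV_k^T$ with $\sigma_k > 0$. A sketch (hypothetical helper names) that evaluates the ratio bounded above: with $C = A$ it is exactly 1, and for any $C$ it is at least 1 since $P_{C_k}A$ has rank at most $k$.

```python
import numpy as np

def rank_k_projection(A, C, k):
    """Hypothetical helper: P_{C_k} A = U_k U_k^T A, U_k the top-k left singular vectors of C."""
    U, _, _ = np.linalg.svd(C, full_matrices=False)
    Uk = U[:, :k]
    return Uk @ (Uk.T @ A)

def relative_error(A, C, k):
    """Hypothetical helper: ||A - P_{C_k} A||_F / ||A - A_k||_F."""
    s = np.linalg.svd(A, compute_uv=False)
    best = np.sqrt(np.sum(s[k:] ** 2))  # ||A - A_k||_F
    return np.linalg.norm(A - rank_k_projection(A, C, k), "fro") / best

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 40))
k = 5
print(relative_error(A, A, k))          # C = A gives P_{C_k} A = A_k, ratio 1
cols = rng.choice(40, size=20, replace=False)
print(relative_error(A, A[:, cols], k)) # any C gives a ratio of at least 1
```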


The Adaptive Sampling Algorithm

Lemma (Deshpande et al., 2006)

Given a matrix $A \in \mathbb{R}^{m\times n}$, let $C_1 \in \mathbb{R}^{m\times c_1}$ consist of $c_1$ columns of $A$, and define the residual $B = A - C_1C_1^+A$. Additionally, for $i = 1, \ldots, n$, define

$$\pi_i = \|b_i\|_2^2 / \|B\|_F^2.$$

We further sample $c_2$ columns i.i.d. from $A$, in each trial of which the $i$-th column is chosen with probability $\pi_i$. Let $C_2 \in \mathbb{R}^{m\times c_2}$ contain the $c_2$ sampled columns and let $C = [C_1, C_2] \in \mathbb{R}^{m\times(c_1+c_2)}$. Then, for any integer $k > 0$, the following inequality holds:

$$\mathbb{E}\|A - CC^+A\|_F^2 \le \|A - A_k\|_F^2 + \frac{k}{c_2}\|A - C_1C_1^+A\|_F^2,$$

where the expectation is taken w.r.t. $C_2$.
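A NumPy sketch of one round of adaptive sampling, assuming $C_1$ is drawn uniformly (the lemma allows any $C_1$) and using illustrative sizes; helper names are hypothetical. Projections use an SVD-based orthonormal basis with a small tolerance, because sampling with replacement can repeat columns and make $C$ rank deficient. Enlarging $C_1$ to $C$ can never increase the projection error.

```python
import numpy as np

def orth_basis(M, tol=1e-10):
    """Hypothetical helper: orthonormal basis for range(M), dropping numerically zero directions."""
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, s > tol * s[0]]

def projection_error(A, C):
    """Hypothetical helper: ||A - C C^+ A||_F via an orthonormal basis of range(C)."""
    Q = orth_basis(C)
    return np.linalg.norm(A - Q @ (Q.T @ A), "fro")

def adaptive_sample(A, C1, c2, seed=None):
    rng = np.random.default_rng(seed)
    Q1 = orth_basis(C1)
    B = A - Q1 @ (Q1.T @ A)                                   # residual B = A - C1 C1^+ A
    p = np.sum(B ** 2, axis=0) / np.sum(B ** 2)               # pi_i = ||b_i||^2 / ||B||_F^2
    idx = rng.choice(A.shape[1], size=c2, replace=True, p=p)  # c2 i.i.d. trials
    return np.hstack([C1, A[:, idx]])                         # C = [C1, C2]

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 5)) @ rng.standard_normal((5, 80))
A += 0.1 * rng.standard_normal((60, 80))
C1 = A[:, rng.choice(80, size=5, replace=False)]
C = adaptive_sample(A, C1, c2=10, seed=1)
print(projection_error(A, C1), projection_error(A, C))
```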


The Near-Optimal Column Selection Algorithm

Boutsidis et al. (2013) derived a near-optimal algorithm, which consists of three steps:

1 the approximate SVD via random projection (Halko et al., 2011);

2 a dual set sparsification algorithm, an extension of the spectral sparsifier (BSS);

3 the adaptive sampling algorithm (Deshpande et al., 2006).


The Near-Optimal Column Selection Algorithm

Theorem (Boutsidis et al., 2013)

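Given a matrix $A \in \mathbb{R}^{m\times n}$ of rank $\rho$, a target rank $k$ ($2 \le k < \rho$), and $0 < \epsilon < 1$, the algorithm selects

$$c = \frac{2k}{\epsilon}\big(1 + o(1)\big)$$

columns of $A$ to form a matrix $C \in \mathbb{R}^{m\times c}$. Then the following inequality holds:

$$\mathbb{E}\|A - CC^+A\|_F^2 \le (1+\epsilon)\|A - A_k\|_F^2,$$

where the expectation is taken w.r.t. $C$. Furthermore, the matrix $C$ can be obtained in time

$$O\big(mk^2\epsilon^{-4/3} + nk^3\epsilon^{-2/3}\big) + T_{\mathrm{Multiply}}\big(mnk\epsilon^{-2/3}\big).$$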


The CUR Decomposition (Drineas et al., 2008; Mahoney and Drineas, 2009)

Given an $m \times n$ matrix $A$ and integers $c < n$ and $r < m$, the CUR decomposition of $A$ finds $C \in \mathbb{R}^{m\times c}$ with $c$ columns from $A$, $R \in \mathbb{R}^{r\times n}$ with $r$ rows from $A$, and $U \in \mathbb{R}^{c\times r}$ such that $A = CUR + E$. Here $E = A - CUR$ is the residual error matrix.
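Once $C$ and $R$ are fixed, a standard choice of intersection matrix is $U = C^+AR^+$, which minimizes $\|A - CUR\|_F$ over $U$ (it is also the $U$ returned by the adaptive sampling CUR algorithm later in this deck). A NumPy sketch with hypothetical helper names, using a tolerance-based pseudo-inverse because $C$ and $R$ may be rank deficient. For an exactly rank-3 matrix, 6 random columns and 6 random rows already span the column and row spaces, so the residual $E$ vanishes up to rounding.

```python
import numpy as np

def pinv_tol(M, tol=1e-10):
    """Hypothetical helper: pseudo-inverse that discards singular values below tol * sigma_max."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    keep = s > tol * s[0]
    return Vt[keep].T @ np.diag(1.0 / s[keep]) @ U[:, keep].T

def cur_from_indices(A, col_idx, row_idx):
    """Hypothetical helper: C, U = C^+ A R^+, R from the given column and row indices."""
    C, R = A[:, col_idx], A[row_idx, :]
    U = pinv_tol(C) @ A @ pinv_tol(R)
    return C, U, R

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 3)) @ rng.standard_normal((3, 50))  # rank 3
cols = rng.choice(50, size=6, replace=False)
rows = rng.choice(60, size=6, replace=False)
C, U, R = cur_from_indices(A, cols, rows)
E = A - C @ U @ R                      # residual error matrix
print(np.linalg.norm(E) / np.linalg.norm(A))
```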


The CUR Problem

Definition (The CUR Decomposition)

Given an $m \times n$ matrix $A$ of rank $\rho$, a rank parameter $k$, and an accuracy parameter $\epsilon \in (0,1)$, construct a matrix $C \in \mathbb{R}^{m\times c}$ with $c$ columns from $A$, $R \in \mathbb{R}^{r\times n}$ with $r$ rows from $A$, and $U \in \mathbb{R}^{c\times r}$, with $c$, $r$, and $\mathrm{rank}(U)$ as small as possible, such that

$$\|A - CUR\|_F^2 \le (1+\epsilon)\|A - A_k\|_F^2.$$

Here $A_k = U_k\Sigma_kV_k^T \in \mathbb{R}^{m\times n}$ is the best rank-$k$ matrix obtained via the SVD of $A$: $A = U\Sigma V^T$.


The Subspace Sampling CUR Algorithm

Drineas et al. (2008) proposed a two-stage randomized CUR algorithm called subspace sampling.

The first stage samples $c$ columns of $A$ to construct $C$ according to sampling probabilities proportional to the squared $\ell_2$-norms of the rows of $V_k$;
The second stage samples $r$ rows from $A$ and $C$ simultaneously to construct $R$ and $W$, and sets $U = W^\dagger$. The sampling probabilities in this stage are proportional to the leverage scores of $A$ and $C$, respectively.


The Subspace Sampling CUR Algorithm

Lemma (Drineas et al., 2008)

Given an $m \times n$ matrix $A$ and a target rank $k \ll \min\{m,n\}$, the subspace sampling algorithm selects $c = O(k\epsilon^{-2}\log k\log(1/\delta))$ columns and $r = O(c\epsilon^{-2}\log c\log(1/\delta))$ rows without replacement. Then

$$\|A - CUR\|_F = \|A - CW^+R\|_F \le (1+\epsilon)\|A - A_k\|_F$$

holds with probability at least $1-\delta$, where $W$ contains the rows of $C$ with scaling. The running time is dominated by the truncated SVD of $A$, that is, $O(mnk)$.
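A compact NumPy sketch of the two stages, under assumptions that differ from the lemma in details: columns and rows are sampled i.i.d. with replacement, sampled entries are rescaled by the common $1/\sqrt{c\,p_j}$ and $1/\sqrt{r\,q_i}$ factors, and an exact SVD replaces the truncated one. Function names are hypothetical. The check uses an exactly rank-3 matrix, where $CW^+R$ reproduces $A$ whenever $C$ spans $\mathrm{range}(A)$ and $\mathrm{rank}(W) = \mathrm{rank}(C)$.

```python
import numpy as np

def pinv_tol(M, tol=1e-10):
    """Hypothetical helper: pseudo-inverse that discards singular values below tol * sigma_max."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    keep = s > tol * s[0]
    return Vt[keep].T @ np.diag(1.0 / s[keep]) @ U[:, keep].T

def subspace_sampling_cur(A, k, c, r, seed=None):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Stage 1: column probabilities ||V_k^{(j)}||^2 / k (exact SVD for illustration).
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    p = np.sum(Vt[:k] ** 2, axis=0) / k
    cols = rng.choice(n, size=c, p=p)
    C = A[:, cols] / np.sqrt(c * p[cols])     # rescaled sampled columns
    # Stage 2: row probabilities from the leverage scores of C.
    Uc, s, _ = np.linalg.svd(C, full_matrices=False)
    Uc = Uc[:, s > 1e-10 * s[0]]
    q = np.sum(Uc ** 2, axis=1) / Uc.shape[1]
    rows = rng.choice(m, size=r, p=q)
    scale = 1.0 / np.sqrt(r * q[rows])[:, None]
    R = scale * A[rows]                       # rescaled rows of A
    W = scale * C[rows]                       # the same rows of C
    return C, pinv_tol(W), R                  # U = W^+

rng = np.random.default_rng(0)
A = rng.standard_normal((80, 3)) @ rng.standard_normal((3, 60))  # exactly rank 3
C, U, R = subspace_sampling_cur(A, k=3, c=12, r=20, seed=1)
print(np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A))
```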


The Adaptive Sampling CUR Algorithm

Wang and Zhang (2013) proposed an adaptive sampling CUR algorithm:

1 Select $c = \frac{2k}{\epsilon}(1 + o(1))$ columns of $A$ to construct $C \in \mathbb{R}^{m\times c}$ using the algorithm of Boutsidis et al. (2013);

2 Select $r_1 = c$ rows of $A$ to construct $R_1 \in \mathbb{R}^{r_1\times n}$ using the algorithm of Boutsidis et al. (2013);

3 Adaptively sample $r_2 = c/\epsilon$ rows from $A$ according to the residual $A - AR_1^\dagger R_1$;

4 Return $C$, $R = [R_1^T, R_2^T]^T$, and $U = C^\dagger AR^\dagger$.
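A NumPy sketch of the four steps with one loudly flagged substitution: uniform sampling stands in for the Boutsidis et al. (2013) selection in steps 1 and 2, so this illustrates the adaptive row step and $U = C^\dagger AR^\dagger$, not the near-optimal guarantee. Function names and sizes are hypothetical. Adding the adaptively sampled rows can only shrink the row-space projection, so the error never increases.

```python
import numpy as np

def pinv_tol(M, tol=1e-10):
    """Hypothetical helper: pseudo-inverse that discards singular values below tol * sigma_max."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    keep = s > tol * s[0]
    return Vt[keep].T @ np.diag(1.0 / s[keep]) @ U[:, keep].T

def adaptive_cur(A, c, r1, r2, seed=None):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Assumption: uniform sampling replaces the Boutsidis et al. (2013) selection in steps 1-2.
    C = A[:, rng.choice(n, size=c, replace=False)]
    R1 = A[rng.choice(m, size=r1, replace=False)]
    B = A - A @ pinv_tol(R1) @ R1                 # step 3: residual A - A R1^+ R1
    p = np.sum(B ** 2, axis=1) / np.sum(B ** 2)   # row probabilities ||b^(i)||^2 / ||B||_F^2
    R2 = A[rng.choice(m, size=r2, replace=True, p=p)]
    R = np.vstack([R1, R2])                       # R = [R1; R2]
    U = pinv_tol(C) @ A @ pinv_tol(R)             # step 4: U = C^+ A R^+
    return C, U, R, R1

rng = np.random.default_rng(0)
A = rng.standard_normal((80, 5)) @ rng.standard_normal((5, 60))
A += 0.1 * rng.standard_normal((80, 60))
C, U, R, R1 = adaptive_cur(A, c=10, r1=10, r2=20, seed=1)
U1 = pinv_tol(C) @ A @ pinv_tol(R1)
err_R1 = np.linalg.norm(A - C @ U1 @ R1, "fro")
err_R = np.linalg.norm(A - C @ U @ R, "fro")
print(err_R1, err_R)  # the extra rows cannot increase the error
```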



Lemma (Wang and Zhang, 2013)

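Given a matrix $A \in \mathbb{R}^{m\times n}$ and a matrix $C \in \mathbb{R}^{m\times c}$ such that $\mathrm{rank}(C) = \mathrm{rank}(CC^\dagger A) = \rho$ ($\rho \le c \le n$), let $R_1 \in \mathbb{R}^{r_1\times n}$ consist of $r_1$ rows of $A$ and define the residual $B = A - AR_1^\dagger R_1$. Additionally, for $i = 1, \ldots, m$, define

$$\pi_i = \|b^{(i)}\|_2^2 / \|B\|_F^2,$$

where $b^{(i)}$ is the $i$-th row of $B$. We further sample $r_2$ rows i.i.d. from $A$, in each trial of which the $i$-th row is chosen with probability $\pi_i$. Let $R_2 \in \mathbb{R}^{r_2\times n}$ contain the $r_2$ sampled rows and let $R = [R_1^T, R_2^T]^T \in \mathbb{R}^{(r_1+r_2)\times n}$. Then we have

$$\mathbb{E}\|A - CC^\dagger AR^\dagger R\|_F^2 \le \|A - CC^\dagger A\|_F^2 + \frac{\rho}{r_2}\|A - AR_1^\dagger R_1\|_F^2,$$

where the expectation is taken w.r.t. $R_2$.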



Theorem (Wang and Zhang, 2013)

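Given a matrix $A \in \mathbb{R}^{m\times n}$ and a positive integer $k \ll \min\{m,n\}$, the adaptive sampling CUR algorithm randomly selects $c = \frac{2k}{\epsilon}(1 + o(1))$ columns of $A$ to construct $C \in \mathbb{R}^{m\times c}$, and then selects $r = \frac{c}{\epsilon}(1 + \epsilon)$ rows of $A$ to construct $R \in \mathbb{R}^{r\times n}$. Then we have

$$\mathbb{E}\|A - CUR\|_F = \mathbb{E}\|A - C(C^\dagger AR^\dagger)R\|_F \le (1+\epsilon)\|A - A_k\|_F.$$

The algorithm costs time

$$O\big((m+n)k^3\epsilon^{-2/3} + mk^2\epsilon^{-2} + nk^2\epsilon^{-4}\big) + T_{\mathrm{Multiply}}\big(mnk\epsilon^{-1}\big)$$

to compute the matrices $C$, $U$, and $R$.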


Optimal CUR Algorithm

Boutsidis and Woodruff (2014) proposed an optimal CUR algorithm.

Constructing $C$ with $O(k + k/\epsilon)$ columns:

1 Compute the top $k$ singular vectors of $A$: $Z_1$;
2 Sample $O(k\log k)$ columns from $Z_1^T$ with the leverage scores;
3 Down-sample the columns to $c_1 = O(k)$ columns with the sampling algorithm of Boutsidis et al. (2013);
4 Adaptively sample $c_2 = O(k/\epsilon)$ columns of $A$.

Constructing $R$ with $O(k + k/\epsilon)$ rows:

1 Find $Z_2$ in the span of $C$ such that $\|A - Z_2Z_2^TA\|_F^2 \le (1+\epsilon)\|A - A_k\|_F^2$;
2 Sample $O(k\log k)$ rows from $Z_2$ with the leverage scores;
3 Down-sample the rows to $r_1 = O(k)$ rows with the sampling algorithm of Boutsidis et al. (2013);
4 Sample $r_2 = O(k/\epsilon)$ rows with adaptive sampling.



Lemma (Boutsidis and Woodruff, 2014)

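Given a matrix $A \in \mathbb{R}^{m\times n}$, a matrix $V \in \mathbb{R}^{m\times c}$, and an integer $k$, let $V = Y\Psi$ be a QR decomposition of $V$, $\Gamma = Y^TA$, and $\Gamma_k = \Delta\Sigma_kV_k^T$ a rank-$k$ SVD of $\Gamma$, with $\Delta \in \mathbb{R}^{c\times k}$. Then $Y\Delta\Delta^TY^T$ satisfies

$$\|A - Y\Delta\Delta^TY^TA\|_F^2 \le \|A - Y\Delta\Sigma_kV_k^T\|_F^2 = \|A - \Pi^F_{V,k}(A)\|_F^2,$$

where $\Pi^F_{V,k}(A)$ denotes the best rank-$k$ approximation to $A$ in the column space of $V$ with respect to the Frobenius norm.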
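The lemma's construction maps directly to NumPy: a QR factorization, one small SVD of $\Gamma = Y^TA$, and two products. In exact arithmetic the two sides actually coincide, since $\Delta^T\Gamma = \Sigma_kV_k^T$. This sketch (hypothetical helper name, illustrative sizes, $V$ taken as 8 columns of $A$) checks that, and that the result beats another rank-$k$ matrix in $\mathrm{span}(V)$, namely $YY^TA_k$.

```python
import numpy as np

def best_rank_k_in_span(A, V, k):
    """Hypothetical helper: Y Delta Delta^T Y^T A and Y Delta Sigma_k V_k^T from the lemma."""
    Y, _ = np.linalg.qr(V)                    # V = Y Psi, Y: m x c orthonormal
    Gamma = Y.T @ A                           # c x n
    Delta, s, Vt = np.linalg.svd(Gamma, full_matrices=False)
    Delta = Delta[:, :k]                      # c x k
    P1 = Y @ Delta @ (Delta.T @ (Y.T @ A))    # Y Delta Delta^T Y^T A
    P2 = Y @ Delta @ np.diag(s[:k]) @ Vt[:k]  # Y Delta Sigma_k V_k^T
    return P1, P2

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 30))
V = A[:, rng.choice(30, size=8, replace=False)]  # e.g. 8 columns of A
P1, P2 = best_rank_k_in_span(A, V, k=4)
print(np.allclose(P1, P2))
```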



Theorem (Boutsidis and Woodruff, 2014)

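Given a matrix $A \in \mathbb{R}^{m\times n}$ of rank $\rho$, a target rank $1 \le k \le \rho$, and $0 < \epsilon < 1$, the optimal CUR algorithm selects at most $c = O(k/\epsilon)$ columns and at most $r = O(k/\epsilon)$ rows from $A$ to form matrices $C \in \mathbb{R}^{m\times c}$, $R \in \mathbb{R}^{r\times n}$, and $U \in \mathbb{R}^{c\times r}$ with $\mathrm{rank}(U) = k$ such that, with constant probability,

$$\|A - CUR\|_F^2 \le \big(1 + O(\epsilon)\big)\|A - A_k\|_F^2.$$

The matrices $C$, $U$, and $R$ can be computed in time

$$O\big(\mathrm{nnz}(A)\log n + (m+n)\cdot\mathrm{poly}(\log n, k, 1/\epsilon)\big).$$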


The Nyström Method

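Random selection: select $c$ ($\ll n$) columns of $K$ to construct $C$ using some randomized algorithm. After permutation we have

$$K = \begin{bmatrix} W & K_{21}^T \\ K_{21} & K_{22} \end{bmatrix}, \qquad C = \begin{bmatrix} W \\ K_{21} \end{bmatrix}.$$

The Nyström approximation: $K \approx K_c^{nys}$, where

$$\underbrace{K_c^{nys}}_{n\times n} = \underbrace{C}_{n\times c}\;\underbrace{W^\dagger}_{c\times c}\;\underbrace{C^T}_{c\times n}.$$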
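A NumPy sketch of the standard Nyström method with uniform column sampling, assuming an RBF kernel on 5-dimensional Gaussian points so that $W$ is well conditioned (helper names are hypothetical). Two properties are easy to check: the approximation reproduces the sampled columns exactly, and for a symmetric positive semidefinite $K$ the residual $K - CW^\dagger C^T$ is itself positive semidefinite (it is a Schur complement).

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    """Hypothetical helper: Gaussian (RBF) kernel matrix."""
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-gamma * np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))

def nystrom(K, idx):
    """Hypothetical helper: standard Nystrom approximation C W^+ C^T from columns idx."""
    C = K[:, idx]                # n x c sampled columns
    W = K[np.ix_(idx, idx)]      # c x c block at the selected rows and columns
    return C @ np.linalg.pinv(W) @ C.T, C, W

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
K = rbf_kernel(X)
idx = rng.choice(200, size=20, replace=False)  # uniform sampling of c = 20 columns
K_nys, C, W = nystrom(K, idx)
print(np.linalg.norm(K - K_nys, "fro") / np.linalg.norm(K, "fro"))
```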


The Nyström Approximation

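The Nyström approximation:

$$K \approx K_c^{nys} = CW^\dagger C^T$$

(a low-rank factorization).

[Figure: the $n\times n$ Nyström approximation drawn as the product of an $n\times c$, a $c\times c$, and a $c\times n$ matrix.]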


Problem Formulation

Problem: how to select informative columns of $K \in \mathbb{R}^{n\times n}$ to construct $C \in \mathbb{R}^{n\times c}$?
The approximation error $\|K - CUC^T\|_F$ or $\|K - CUC^T\|_2$ should be as small as possible.


Criterion: Upper Error Bounds

Use approximation algorithms to find $c$ good columns (not necessarily the best).
We hope that the ratio $\frac{\|K - CUC^T\|_F}{\|K - K_k\|_F}$ has an upper bound; the smaller the bound, the better.


Uniform Sampling: The Simplest Algorithm

Sample $c$ columns of $K$ uniformly at random to construct $C$.

It is the simplest algorithm, yet the most widely used.


Adaptive Sampling

The adaptive sampling algorithm [Deshpande et al., 2006]:

1 Sample $c_1$ columns of $K$ to construct $C_1$ using some algorithm;

2 Compute the residual $B = K - C_1C_1^\dagger K$;

3 Compute the sampling probabilities $p_i = \|b_i\|_2^2 / \|B\|_F^2$ for $i = 1, \ldots, n$;

4 Sample $c_2$ further columns of $K$ in $c_2$ i.i.d. trials, in each of which the $i$-th column is chosen with probability $p_i$; denote the selected columns by $C_2$;

5 Return $C = [C_1, C_2]$.


Adaptive Sampling

The error term $\|K - CC^\dagger K\|_F$ is bounded theoretically, but $\|K - CW^\dagger C^T\|_F$ is not.

Empirically, the adaptive sampling algorithm works very well.
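A NumPy sketch that runs the five adaptive sampling steps on an RBF kernel and reports both error terms next to uniform sampling. Assumptions: uniform sampling produces $C_1$ (the slide leaves step 1 open), the sizes are illustrative, and helper names are hypothetical; no particular numbers are claimed. One relation always holds: $\|K - CC^\dagger K\|_F \le \|K - CW^\dagger C^T\|_F$, because $CC^\dagger K$ is the best Frobenius approximation of the form $CX$ and $CW^\dagger C^T$ is one such matrix.

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    """Hypothetical helper: Gaussian (RBF) kernel matrix."""
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-gamma * np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))

def svd_tol(M, tol=1e-10):
    """Hypothetical helper: thin SVD with numerically zero singular values removed."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    keep = s > tol * s[0]
    return U[:, keep], s[keep], Vt[keep]

def nystrom_errors(K, idx):
    """(||K - C C^+ K||_F, ||K - C W^+ C^T||_F) for the columns idx of K."""
    C, W = K[:, idx], K[np.ix_(idx, idx)]
    Q, _, _ = svd_tol(C)                       # orthonormal basis of range(C)
    Uw, sw, Vtw = svd_tol(W)
    W_pinv = Vtw.T @ np.diag(1.0 / sw) @ Uw.T  # truncated W^+ (W may repeat columns)
    return (np.linalg.norm(K - Q @ (Q.T @ K), "fro"),
            np.linalg.norm(K - C @ W_pinv @ C.T, "fro"))

def adaptive_indices(K, c1, c2, rng):
    n = K.shape[0]
    idx1 = rng.choice(n, size=c1, replace=False)      # step 1: C1 by uniform sampling
    Q1, _, _ = svd_tol(K[:, idx1])
    B = K - Q1 @ (Q1.T @ K)                           # step 2: B = K - C1 C1^+ K
    p = np.sum(B ** 2, axis=0) / np.sum(B ** 2)       # step 3: p_i = ||b_i||^2 / ||B||_F^2
    idx2 = rng.choice(n, size=c2, replace=True, p=p)  # step 4: c2 i.i.d. trials
    return np.concatenate([idx1, idx2])               # step 5: C = [C1, C2]

rng = np.random.default_rng(0)
K = rbf_kernel(rng.standard_normal((300, 3)))
err_uniform = nystrom_errors(K, rng.choice(300, size=20, replace=False))
err_adaptive = nystrom_errors(K, adaptive_indices(K, c1=10, c2=10, rng=rng))
print("uniform  (proj, nys):", err_uniform)
print("adaptive (proj, nys):", err_adaptive)
```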


Better Sampling Algorithms?

We hope that $\frac{\|K - CW^\dagger C^T\|_F}{\|K - K_k\|_F}$ will be very small if the column sampling algorithm is good enough.

But it cannot be arbitrarily small.

Lower Error Bound

Theorem (Wang & Zhang, JMLR 2013)

Whatever column sampling algorithm is used to select $c$ columns, there exists a bad case $K$ such that

$$\frac{\|K - CW^\dagger C^T\|_F^2}{\|K - K_k\|_F^2} \ge \Omega\Big(1 + \frac{nk}{c^2}\Big).$$


Different Types of Low-Rank Approximation?

The ensemble Nyström method [Kumar et al., JMLR 2012]:

$$K \approx \sum_{i=1}^{t}\frac{1}{t}\,C^{(i)}W^{(i)\dagger}C^{(i)T}.$$

It does not improve the lower error bound.

Lower Error Bound

Whatever column sampling algorithm is used to select $c$ columns, there exists a bad case $K$ such that

$$\frac{\big\|K - \sum_{i=1}^{t}\frac{1}{t}C^{(i)}W^{(i)\dagger}C^{(i)T}\big\|_F^2}{\|K - K_k\|_F^2} \ge \Omega\Big(1 + \frac{nk}{c^2}\Big).$$
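A NumPy sketch of the ensemble formula with the uniform weights $1/t$ shown above, assuming independent uniform column samples for each member and an RBF kernel (helper names are hypothetical). With $t = 1$ it reduces to the standard Nyström method, and because each member's residual is positive semidefinite, so is the averaged residual.

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    """Hypothetical helper: Gaussian (RBF) kernel matrix."""
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-gamma * np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))

def nystrom(K, idx):
    """Hypothetical helper: standard Nystrom approximation C W^+ C^T."""
    C, W = K[:, idx], K[np.ix_(idx, idx)]
    return C @ np.linalg.pinv(W) @ C.T

def ensemble_nystrom(K, c, t, seed=None):
    """Hypothetical helper: average of t standard Nystrom approximations, weights 1/t."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    approx = np.zeros_like(K)
    for _ in range(t):
        idx = rng.choice(n, size=c, replace=False)  # independent uniform column sample
        approx += nystrom(K, idx) / t
    return approx

rng = np.random.default_rng(0)
K = rbf_kernel(rng.standard_normal((200, 5)))
K_ens = ensemble_nystrom(K, c=20, t=5, seed=1)
K_one = nystrom(K, np.random.default_rng(1).choice(200, size=20, replace=False))
print(np.linalg.norm(K - K_ens, "fro"), np.linalg.norm(K - K_one, "fro"))
```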


The Modified Nyström Method

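The modified Nyström method [Wang & Zhang, JMLR 2013]:

$$K \approx C\big(\underbrace{C^\dagger K(C^\dagger)^T}_{c\times c}\big)C^T.$$

Using a suitable column sampling algorithm, the error incurred by the modified Nyström method satisfies

$$\frac{\mathbb{E}\big\|K - C\big(C^\dagger K(C^\dagger)^T\big)C^T\big\|_F^2}{\|K - K_k\|_F^2} \le 1 + \sqrt{\frac{k}{c}}.$$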
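A NumPy sketch comparing the two intersection matrices on the same sampled columns, assuming an RBF kernel and uniform sampling (helper names and sizes are hypothetical). $U^{mod} = C^\dagger K(C^\dagger)^T$ minimizes $\|K - CUC^T\|_F$ over all $U$, so the modified error can never exceed the standard one; the code comments mark where the extra cost of the modified method comes from.

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    """Hypothetical helper: Gaussian (RBF) kernel matrix."""
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-gamma * np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))

def standard_and_modified(K, idx):
    """Hypothetical helper: return (C W^+ C^T, C (C^+ K (C^+)^T) C^T)."""
    C = K[:, idx]
    W = K[np.ix_(idx, idx)]
    U_nys = np.linalg.pinv(W)       # c x c pseudo-inverse: cheap
    C_pinv = np.linalg.pinv(C)      # needs an SVD of the n x c matrix C
    U_mod = C_pinv @ K @ C_pinv.T   # involves an n x n times n x c product
    return C @ U_nys @ C.T, C @ U_mod @ C.T

rng = np.random.default_rng(0)
K = rbf_kernel(rng.standard_normal((300, 5)))
idx = rng.choice(300, size=30, replace=False)
K_nys, K_mod = standard_and_modified(K, idx)
err_nys = np.linalg.norm(K - K_nys, "fro")
err_mod = np.linalg.norm(K - K_mod, "fro")
print(err_nys, err_mod)  # U_mod is Frobenius-optimal, so err_mod <= err_nys
```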


Comparisons between the Two Methods

The standard Nyström method: fast. It costs only $T_{\mathrm{SVD}}(c^3)$ time to compute the intersection matrix $U^{nys} = W^\dagger$.

The modified Nyström method: slow. It costs $T_{\mathrm{SVD}}(nc^2) + T_{\mathrm{Multiply}}(n^2c)$ time to compute the intersection matrix $U^{mod} = C^\dagger K(C^\dagger)^T$ naively.



The standard Nyström method: inaccurate. It cannot attain a $1 + \epsilon$ Frobenius relative-error bound unless

$$c \ge \sqrt{nk/\epsilon}$$

columns are selected, whatever column selection algorithm is used (due to its lower error bound).

The modified Nyström method: accurate. Some adaptive sampling based algorithms attain a $1 + \epsilon$ Frobenius relative-error bound when

$$c = O(k/\epsilon^2).$$

(The smaller $c$, the better.)
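Where the $\sqrt{nk/\epsilon}$ threshold comes from, ignoring the constants hidden in $\Omega(\cdot)$ and $O(\cdot)$, together with an illustrative size $n = 10^6$, $k = 10$, $\epsilon = 0.1$:

```latex
1 + \frac{nk}{c^2} \le 1 + \epsilon
\;\Longleftrightarrow\;
c \ge \sqrt{\frac{nk}{\epsilon}} = \sqrt{\frac{10^6 \cdot 10}{0.1}} = 10^4,
\qquad\text{whereas}\qquad
\frac{k}{\epsilon^2} = \frac{10}{0.01} = 10^3 .
```

The standard method's requirement grows with $\sqrt{n}$, while the modified method's does not depend on $n$ at all.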



Theorem (Exact Recovery)

For the symmetric matrix $K$ defined previously, the following three statements are equivalent:

1 $\mathrm{rank}(W) = \mathrm{rank}(K)$;

2 $K = CW^\dagger C^T$ (i.e., the standard Nyström method is exact);

3 $K = C\big(C^\dagger K(C^\dagger)^T\big)C^T$ (i.e., the modified Nyström method is exact).
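A NumPy check of the equivalence on a rank-4 linear kernel $K = XX^T$, assuming a tolerance-based pseudo-inverse and numerical rank (helper names are hypothetical). With $c = 10$ sampled columns all three statements hold; with $c = 3$ the block $W$ cannot reach rank 4, and both reconstructions fail together, as the theorem predicts.

```python
import numpy as np

def pinv_tol(M, tol=1e-10):
    """Hypothetical helper: pseudo-inverse discarding singular values below tol * sigma_max."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    keep = s > tol * s[0]
    return Vt[keep].T @ np.diag(1.0 / s[keep]) @ U[:, keep].T

def num_rank(M, rtol=1e-8):
    """Hypothetical helper: numerical rank relative to the largest singular value."""
    return np.linalg.matrix_rank(M, tol=rtol * np.linalg.norm(M, 2))

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 4))
K = X @ X.T                            # symmetric PSD with rank(K) = 4
results = {}
for c in (10, 3):                      # c = 3 forces rank(W) < rank(K)
    idx = rng.choice(100, size=c, replace=False)
    C, W = K[:, idx], K[np.ix_(idx, idx)]
    Cp = pinv_tol(C)
    K_std = C @ pinv_tol(W) @ C.T      # standard Nystrom
    K_mod = C @ (Cp @ K @ Cp.T) @ C.T  # modified Nystrom
    results[c] = (bool(num_rank(W) == num_rank(K)),
                  bool(np.allclose(K_std, K)), bool(np.allclose(K_mod, K)))
print(results)  # the three statements agree for each c
```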


References

Santosh S. Vempala. The Random Projection Method. American Mathematical Society, 2000.

Michael W. Mahoney. Randomized Algorithms for Matrices and Data. Foundations and Trends in Machine Learning, 3(2): 123-224, 2011.

N. Halko, P. G. Martinsson, and J. A. Tropp. Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions. SIAM Review, 53(2): 217-288, 2011.

W. B. Johnson and J. Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 1984.

S. Dasgupta and A. Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures & Algorithms, 2003.

J. Matoušek. On variants of the Johnson-Lindenstrauss lemma. Random Structures & Algorithms, 2008.


A. Dasgupta, R. Kumar, and T. Sarlós. A sparse Johnson-Lindenstrauss transform. In STOC, 2010.

K. L. Clarkson and D. P. Woodruff. Low rank approximation and regression in input sparsity time. In STOC, 2013.

X. Meng and M. W. Mahoney. Low-distortion subspace embeddings in input-sparsity time and applications to robust linear regression. In STOC, 2013.

J. Nelson and H. L. Nguyễn. OSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings. In FOCS, 2013.

J. Batson, D. Spielman, and N. Srivastava. Twice-Ramanujan sparsifiers. SIAM Review, 2014.


A. Frieze, R. Kannan, and S. Vempala. Fast Monte-Carlo algorithms for finding low-rank approximations. In FOCS, 1998; Journal of the ACM, 2004.

A. Deshpande, L. Rademacher, S. Vempala, and G. Wang. Matrix approximation and projective clustering via volume sampling. Theory of Computing, 2006.

C. Boutsidis, P. Drineas, and M. Magdon-Ismail. Near optimal column-based matrix reconstruction. SIAM Journal on Computing, 2013.

V. Guruswami and A. K. Sinop. Optimal column-based low-rank matrix reconstruction. In SODA, 2012.


P. Drineas, M. W. Mahoney, and S. Muthukrishnan. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 2008.

M. W. Mahoney and P. Drineas. CUR matrix decompositions for improved data analysis. Proceedings of the National Academy of Sciences, 2009.

Shusen Wang and Zhihua Zhang. Improving CUR matrix decomposition and the Nyström approximation via adaptive sampling. JMLR, 2013.

C. Boutsidis and D. P. Woodruff. Optimal CUR matrix decompositions. In STOC, 2014.

S. Kumar, M. Mohri, and A. Talwalkar. Sampling methods for the Nyström method. JMLR, 2012.
