Randomized Numerical Linear Algebra: Review and Progresses
Zhihua Zhang
Department of Computer Science and Engineering, Shanghai Jiao Tong University
The 12th China Workshop on Machine Learning and Applications
Xi’an, November 2014
Source: see.xidian.edu.cn/vipsl/MLA2014/ZhangZhihua.pdf

- An interdisciplinary field spanning Theoretical Computer Science (TCS), Numerical Linear Algebra (NLA), and modern data analysis.
- Many data mining and machine learning algorithms involve matrix decomposition, matrix inversion, and matrix determinants; some methods are based on low-rank matrix approximation.
- The Big Data phenomenon brings new challenges and opportunities to machine learning and data mining.

Singular Value Decomposition (SVD)

Input: an $m \times n$ data matrix $A$ of rank $r$ and an integer $k$ less than $r$.

- The (condensed) SVD: $A = U \Sigma V^T$, where $U^T U = I_r$, $V^T V = I_r$, and $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_r)$ with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$.
  Time complexity: $O(mn \min(m,n))$.
- The truncated SVD: $A_k = U_k \Sigma_k V_k^T$, where $U_k$ and $V_k$ are the first $k$ columns of $U$ and $V$, and $\Sigma_k$ is the $k \times k$ top sub-block of $\Sigma$.
- $A_k$ is the "closest" rank-$k$ approximation to $A$. That is,

$$A_k = \operatorname*{argmin}_{\mathrm{rank}(X) \le k} \|A - X\|_\xi,$$

  where $\xi = 2$ denotes the matrix spectral norm and $\xi = F$ denotes the matrix Frobenius norm.
  Time complexity: $O(mnk)$.
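The optimality of $A_k$ can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper name `truncated_svd` and the random test matrix are illustrative choices, not part of the slides. It calls the dense `np.linalg.svd`, so it has the $O(mn\min(m,n))$ cost of the full decomposition rather than $O(mnk)$.

```python
import numpy as np

def truncated_svd(A, k):
    """Return (U_k, s_k, Vt_k) so that A_k = U_k @ diag(s_k) @ Vt_k."""
    # Condensed SVD; singular values are returned in decreasing order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6))   # illustrative test matrix
k = 2

U_k, s_k, Vt_k = truncated_svd(A, k)
A_k = (U_k * s_k) @ Vt_k          # scales column j of U_k by s_k[j]

sigma = np.linalg.svd(A, compute_uv=False)
spectral_err = np.linalg.norm(A - A_k, 2)       # equals sigma_{k+1}
frobenius_err = np.linalg.norm(A - A_k, "fro")  # equals sqrt(sum_{i>k} sigma_i^2)
```

In the spectral norm the error equals $\sigma_{k+1}$ (0-based index `sigma[k]`), and in the Frobenius norm it equals the root of the sum of the squared discarded singular values.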

The CUR Decomposition

A CUR decomposition algorithm seeks a subset of $c$ columns of $A$ to form a matrix $C \in \mathbb{R}^{m \times c}$, a subset of $r$ rows to form a matrix $R \in \mathbb{R}^{r \times n}$, and an intersection matrix $U \in \mathbb{R}^{c \times r}$ such that $\|A - CUR\|_\xi$ is minimized.

- The CUR decomposition results in an interpretable matrix approximation to $A$.
- There are $\binom{n}{c}$ possible choices for constructing $C$ and $\binom{m}{r}$ possible choices for constructing $R$, so selecting the best subsets is a hard problem.

Kernel Methods

$K$: $n \times n$ kernel matrix.

- Matrix inverse: $b = (K + \alpha I_n)^{-1} y$
  - time complexity: $O(n^3)$
  - performed by Gaussian process regression, least squares SVM, kernel ridge regression
- Partial eigenvalue decomposition of $K$
  - time complexity: $O(n^2 k)$
  - performed by kernel PCA and some manifold learning methods
- Space complexity: $O(n^2)$
  - iterative algorithms make many passes through the data
  - it is best to hold the entire kernel matrix in RAM
  - if the data does not fit in RAM, each pass incurs a swap between RAM and disk.
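To make the cost concrete, here is a minimal sketch of the linear solve $b = (K + \alpha I_n)^{-1} y$, assuming NumPy. The RBF kernel, the helper name `rbf_kernel`, and the synthetic data are assumptions for illustration; the slides do not fix a kernel.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Dense RBF kernel matrix (an illustrative kernel choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clip tiny negative distances

rng = np.random.default_rng(0)
n = 200
X = rng.standard_normal((n, 3))   # synthetic inputs
y = rng.standard_normal(n)        # synthetic targets
alpha = 0.1

K = rbf_kernel(X)                               # stores n^2 entries
b = np.linalg.solve(K + alpha * np.eye(n), y)   # dense solve, O(n^3) flops
```

Both the $O(n^2)$ storage of `K` and the $O(n^3)$ solve become prohibitive as $n$ grows, which is the motivation for the low-rank approximations discussed later.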

Approaches for Large Scale Matrix Computations

- Two typical approaches: incremental and distributed.
- Randomized algorithms have also been used.

Outline

1 Random Projection
    The Johnson and Lindenstrauss Lemma
    Randomized SVD

2 Subspace Embedding

3 Random Selection
    Column Selection
    CUR Decomposition
    The Nyström Method

4 References

The Johnson and Lindenstrauss Lemma

- The lemma was given by Johnson and Lindenstrauss (1984), but the proof was not constructive.
- Indyk and Motwani (1998) and Dasgupta and Gupta (2003) gave a construction based on a Gaussian random projection matrix $R = [r_{ij}]$ where $r_{ij} \overset{\text{iid}}{\sim} N(0,1)$.
- Matoušek (2008) generalized the result to the case where the $r_{ij}$ are any subgaussian random variables; that is, $r_{ij} \overset{\text{iid}}{\sim} G(\nu^2)$ for $\nu \ge 1$.

The Johnson and Lindenstrauss Lemma

Definition (ε-isometry)

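Given $\varepsilon \in (0,1)$, a map $f : \mathbb{R}^p \to \mathbb{R}^q$ where $p > q$ is called an $\varepsilon$-isometry on a set $X \subset \mathbb{R}^p$ if for every pair $x, y \in X$ we have

$$(1-\varepsilon)\|x-y\|_2^2 \le \|f(x)-f(y)\|_2^2 \le (1+\varepsilon)\|x-y\|_2^2.$$

We consider the case where $f$ is defined by a linear map $R \in \mathbb{R}^{q \times p}$. The basic idea is to construct a random projection $R \in \mathbb{R}^{q \times p}$ that is an exact isometry "in expectation"; that is, for every $x \in \mathbb{R}^p$,

$$\mathbb{E}\big[\|Rx\|_2^2\big] = \|x\|_2^2.$$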
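A small numerical illustration of the definition follows, assuming NumPy. It is a sketch under stated assumptions: the entries of $R$ are drawn i.i.d. from $N(0,1)$ and then scaled by $1/\sqrt{q}$, since without that scaling $\mathbb{E}\|Rx\|_2^2 = q\|x\|_2^2$. The dimensions and the point set are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, n = 1000, 400, 20            # ambient dim, target dim, number of points
X = rng.standard_normal((n, p))    # rows are the points x_1, ..., x_n

# Gaussian random projection; the 1/sqrt(q) scaling (an assumption here)
# makes E||Rx||^2 = ||x||^2.
R = rng.standard_normal((q, p)) / np.sqrt(q)
Z = X @ R.T                        # rows are f(x_i) = R x_i

def pairwise_sq_dists(P):
    """Matrix of squared Euclidean distances between the rows of P."""
    sq = np.sum(P ** 2, axis=1)
    return sq[:, None] + sq[None, :] - 2.0 * P @ P.T

iu = np.triu_indices(n, k=1)       # each unordered pair once
ratio = pairwise_sq_dists(Z)[iu] / pairwise_sq_dists(X)[iu]
max_distortion = np.max(np.abs(ratio - 1.0))   # smallest eps that works here
```

`max_distortion` is the smallest $\varepsilon$ for which this particular draw of $R$ is an $\varepsilon$-isometry on the sampled points.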

The Johnson and Lindenstrauss Lemma

Theorem (The Johnson and Lindenstrauss Lemma)

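Let $X = \{x_1, \ldots, x_n\} \subset \mathbb{R}^p$, and let $\varepsilon, \delta \in (0,1)$. Assume that $R \in \mathbb{R}^{q \times p}$ ($p > q$) where $r_{ij} \in G(\nu^2)$ for some $\nu \ge 1$. If $q \ge 100\,\nu^2 \varepsilon^{-2} \log(n/\sqrt{\delta})$, then with probability at least $1-\delta$, $R$ is an $\varepsilon$-isometry on $X$; that is,

$$\Pr\Big\{ \sup_{y \in Y} \big| \|Ry\|_2^2 - 1 \big| \ge \varepsilon \Big\} \le \delta,$$

where

$$Y = \Big\{ \frac{x_i - x_j}{\|x_i - x_j\|_2} : x_i, x_j \in X,\ x_i \ne x_j \Big\}.$$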
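As a worked example of the dimension bound, the sketch below evaluates $q \ge 100\,\nu^2 \varepsilon^{-2} \log(n/\sqrt{\delta})$ for concrete values. The function name `jl_target_dim` is hypothetical, and the logarithm is assumed to be the natural logarithm.

```python
import math

def jl_target_dim(n, eps, delta, nu=1.0):
    """Smallest integer q with q >= 100 * nu^2 * eps^-2 * log(n / sqrt(delta)).

    Assumes the natural logarithm in the bound.
    """
    return math.ceil(100.0 * nu ** 2 / eps ** 2 * math.log(n / math.sqrt(delta)))

# Example: 1000 points, distortion 0.5, failure probability 0.01.
q_min = jl_target_dim(n=1000, eps=0.5, delta=0.01)
# Halving eps multiplies the bound by roughly four.
q_tighter = jl_target_dim(n=1000, eps=0.25, delta=0.01)
```

The bound depends on the number of points only through $\log n$ and not on the ambient dimension $p$, but the constant 100 makes it conservative for small problems.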

Prototype for Randomized SVD

Given an $m \times n$ matrix $A$, a target number $k$ of singular vectors, and an integer $c$ such that $k < c < \min(m,n)$, a proto-algorithm based on random projection for the Singular Value Decomposition (SVD) of $A$ is as follows.

1 Construct an $m \times c$ column-orthonormal matrix $Q$ and form $B = Q^T A$;

2 Compute the SVD of the small matrix: $B = U_B \Sigma_B V_B^T$;

3 Set $U = Q U_B$;

4 Return $U \Sigma_B V_B^T$ as an approximate SVD of $A$, and $U_k \Sigma_{B,k} V_{B,k}^T$ with $U_k = Q U_{B,k}$ as a truncated SVD of $A$.

A Proto-Algorithm for Construction of Random Projection Matrix Q

Let $A$ be an $m \times n$ matrix, and $k$ be a target number of singular vectors.

1 Generate an $n \times 2k$ Gaussian test matrix $\Omega$ (it needs $n$ rows so that $A\Omega$ is defined).

2 Form $Y = (AA^T)^{\gamma} A \Omega$ where $\gamma = 1$ or $\gamma = 2$.

3 Construct a matrix $Q$ whose columns form an orthonormal basis for the range of $Y$.
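The two proto-algorithms above (constructing $Q$, then taking the SVD of $B = Q^T A$) can be sketched together as follows, assuming NumPy. The function names are illustrative. The test matrix $\Omega$ is drawn with $n$ rows so that $A\Omega$ is defined, and the power iteration is applied without intermediate re-orthonormalization, which a careful implementation would likely add for numerical stability.

```python
import numpy as np

def randomized_range_finder(A, k, gamma=1, rng=None):
    """Orthonormal Q (m x 2k) approximating the range of A."""
    rng = np.random.default_rng(rng)
    n = A.shape[1]
    Omega = rng.standard_normal((n, 2 * k))   # Gaussian test matrix
    Y = A @ Omega
    for _ in range(gamma):                    # Y = (A A^T)^gamma A Omega
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)                    # orthonormal basis for range(Y)
    return Q

def randomized_svd(A, k, gamma=1, rng=None):
    """Approximate rank-k truncated SVD of A via random projection."""
    Q = randomized_range_finder(A, k, gamma, rng)
    B = Q.T @ A                               # small (2k x n) matrix
    U_B, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_B                               # lift back to m dimensions
    return U[:, :k], s[:k], Vt[:k, :]

# Illustrative check on a matrix of exact rank 5.
rng = np.random.default_rng(0)
A = rng.standard_normal((60, 5)) @ rng.standard_normal((5, 40))
U, s, Vt = randomized_svd(A, k=5, gamma=1, rng=1)
rel_err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
```

For a matrix whose rank does not exceed $k$, the range of $Y$ captures the range of $A$ almost surely, so the reconstruction is exact up to rounding. For general matrices the error is governed by the tail singular values.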

Computational Complexity for the Randomized SVD

- The randomized SVD procedure requires only $2(\gamma + 1)$ passes over the matrix.
- The flop count is

$$(2\gamma + 2)\,k\,T_{\mathrm{mult}} + O\big(k^2(m + n)\big),$$

  where $T_{\mathrm{mult}}$ is the flop count of a matrix-vector multiply with $A$ or $A^T$.

Theoretical Analysis for the Randomized SVD

Theorem (Halko et al., 2011)

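Let $A \in \mathbb{R}^{m \times n}$. Given an exponent $\gamma$ and a target number $k$ of singular vectors, where $2 \le k \le \frac{1}{2}\min(m,n)$, running the randomized SVD algorithm yields a rank-$2k$ factorization $U_{2k} \Sigma_{2k} V_{2k}^T$. Then

$$\mathbb{E}\,\|A - U_{2k} \Sigma_{2k} V_{2k}^T\|_2 \le \left[ 1 + 4\sqrt{\frac{2\min(m,n)}{k-1}} \right]^{1/(2\gamma+1)} \sigma_{k+1},$$

where the expectation is taken w.r.t. the random test matrix and $\sigma_{k+1}$ is the $(k+1)$-th largest singular value of $A$.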

The Subspace Embedding Problem

- For a fixed $m \times n$ matrix $A$ of rank $r$ and an error parameter $\varepsilon \in (0,1)$, we call $S : \mathbb{R}^m \to \mathbb{R}^k$ a subspace embedding matrix for $A$ if

$$(1-\varepsilon)\|Ax\|_2 \le \|SAx\|_2 \le (1+\varepsilon)\|Ax\|_2$$

  for all $x \in \mathbb{R}^n$.
- The Subspace Embedding Problem is to find such an embedding matrix obliviously. More specifically, one designs a distribution $\pi$ over linear maps from $\mathbb{R}^m$ to $\mathbb{R}^k$ such that for any fixed $m \times n$ matrix $A$, if we choose $S \sim \pi$, then with high probability $S$ is an embedding matrix for $A$.

Sparse Embedding Matrices

For a fixed $m \times n$ matrix $A$ with $m > n$, let $\mathrm{nnz}(A)$ denote the number of non-zero entries of $A$. Assume that $\mathrm{nnz}(A) \ge m$ and that there are no all-zero rows or columns in $A$. Let $[m] = \{1, 2, \ldots, m\}$. For a parameter $k$, define a random linear map $\Phi D : \mathbb{R}^m \to \mathbb{R}^k$ as follows:

- $h : [m] \to [k]$ is a random map so that for each $i \in [m]$, $h(i) = t$ for $t \in [k]$ with probability $1/k$.
- $\Phi \in \{0,1\}^{k \times m}$ is a $k \times m$ binary matrix with $\phi_{h(i),i} = 1$ and all remaining entries 0.
- $D$ is an $m \times m$ random diagonal matrix, with each diagonal entry independently chosen to be $+1$ or $-1$ with equal probability.

A matrix of the form $S = \Phi D$ is referred to as a sparse embedding matrix (Dasgupta et al., 2010; Clarkson and Woodruff, 2013).
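The construction of $S = \Phi D$ can be sketched as follows, assuming NumPy. The function names are illustrative and indices are 0-based. The product $SA$ is formed without materializing $S$: each row of $A$ is multiplied by its sign and added to the bucket selected by $h$. For a dense array this touches every entry once; an implementation for sparse $A$ would iterate over non-zeros only.

```python
import numpy as np

def sparse_embedding(m, k, rng=None):
    """Draw the hash map h and the diagonal of D that define S = Phi D."""
    rng = np.random.default_rng(rng)
    h = rng.integers(0, k, size=m)             # h(i) uniform over k buckets
    signs = rng.choice([-1.0, 1.0], size=m)    # diagonal entries of D
    return h, signs

def apply_sparse_embedding(A, h, signs, k):
    """Compute S A by accumulating signed rows of A into k buckets."""
    SA = np.zeros((k, A.shape[1]))
    np.add.at(SA, h, signs[:, None] * A)       # SA[h[i]] += signs[i] * A[i]
    return SA

rng = np.random.default_rng(0)
m, n, k = 50, 4, 10
A = rng.standard_normal((m, n))
h, signs = sparse_embedding(m, k, rng=1)

# Explicit S, built only to compare against the fast path.
Phi = np.zeros((k, m))
Phi[h, np.arange(m)] = 1.0
S = Phi @ np.diag(signs)
SA_fast = apply_sparse_embedding(A, h, signs, k)
```

Each column of $S$ has exactly one non-zero entry, equal to $\pm 1$, which is why applying $S$ costs time proportional to the number of non-zeros of $A$.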

Subspace Embedding in Input-Sparsity Time

Theorem (Meng and Mahoney, 2013)

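Let $S = \Phi D \in \mathbb{R}^{k \times m}$ with $k = \frac{n^2 + n}{\varepsilon^2 \delta}$. Then with probability at least $1 - \delta$,

$$(1-\varepsilon)\|Ax\|_2 \le \|SAx\|_2 \le (1+\varepsilon)\|Ax\|_2$$

for all $x \in \mathbb{R}^n$. In addition, $SA$ can be computed in $O(\mathrm{nnz}(A))$ time.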
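The guarantee can be probed empirically. The sketch below, assuming NumPy, sets $k$ from the formula in the theorem and checks the embedding inequality through the singular values of $SU$, where $U$ is an orthonormal basis for the range of $A$: since every $Ax$ equals $Uz$ with $\|Ax\|_2 = \|z\|_2$, the ratio $\|SAx\|_2 / \|Ax\|_2$ always lies between the smallest and largest singular values of $SU$. The problem sizes and the Gaussian test matrix are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5000, 3
eps, delta = 0.5, 0.1
k = int(np.ceil((n ** 2 + n) / (eps ** 2 * delta)))   # embedding dimension

A = rng.standard_normal((m, n))
U, _ = np.linalg.qr(A)                     # orthonormal basis of range(A)

# Sparse embedding S = Phi D applied to U without forming S.
h = rng.integers(0, k, size=m)
signs = rng.choice([-1.0, 1.0], size=m)
SU = np.zeros((k, n))
np.add.at(SU, h, signs[:, None] * U)

# Extreme values of ||S A x|| / ||A x|| over all nonzero x.
sv = np.linalg.svd(SU, compute_uv=False)
lower, upper = sv.min(), sv.max()
```

A single draw satisfying $1-\varepsilon \le$ `lower` and `upper` $\le 1+\varepsilon$ is consistent with the theorem but does not by itself verify the stated probability $1-\delta$.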

Spectral Sparsifiers

Theorem (Batson, Spielman and Srivastava, 2014)

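Suppose $\rho > 1$ and $\{v_1, v_2, \ldots, v_m\} \subseteq \mathbb{R}^n$ with

$$\sum_{i \le m} v_i v_i^T = I_n.$$

Then there exist scalars $d_i \ge 0$ with $|\{i : d_i \ne 0\}| \le \lceil \rho n \rceil$ such that

$$\Big(1 - \frac{1}{\sqrt{\rho}}\Big)^2 I_n \preceq \sum_{i \le m} d_i v_i v_i^T \preceq \Big(1 + \frac{1}{\sqrt{\rho}}\Big)^2 I_n.$$

This theorem shows that

$$\frac{\lambda_1\big(\sum_{i \le m} d_i v_i v_i^T\big)}{\lambda_n\big(\sum_{i \le m} d_i v_i v_i^T\big)} \le \frac{\rho + 1 + 2\sqrt{\rho}}{\rho + 1 - 2\sqrt{\rho}}.$$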
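The condition-number bound follows directly from the two-sided inequality in the theorem. A short derivation, writing $M = \sum_{i \le m} d_i v_i v_i^T$ (the symbol $M$ is introduced here only for brevity):

```latex
\frac{\lambda_1(M)}{\lambda_n(M)}
  \le \frac{\bigl(1 + 1/\sqrt{\rho}\bigr)^2}{\bigl(1 - 1/\sqrt{\rho}\bigr)^2}
  = \frac{\bigl(\sqrt{\rho} + 1\bigr)^2}{\bigl(\sqrt{\rho} - 1\bigr)^2}
  = \frac{\rho + 1 + 2\sqrt{\rho}}{\rho + 1 - 2\sqrt{\rho}} .
```

The first step uses $\lambda_1(M) \le (1 + 1/\sqrt{\rho})^2$ and $\lambda_n(M) \ge (1 - 1/\sqrt{\rho})^2$; the second multiplies numerator and denominator by $\rho$. For example, $\rho = 4$ gives a bound of $9$.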

Column Selection and The CX Decomposition

- Given an $m \times n$ matrix $A$, column selection algorithms aim to find a matrix $C$ with $c$ columns of $A$ such that $\|A - CC^{+}A\|_\xi = \|(I_m - CC^{+})A\|_\xi$ achieves the minimum. Here $\xi = 2$, $\xi = F$, and $\xi = *$ respectively denote the matrix spectral norm, the matrix Frobenius norm, and the matrix nuclear norm, and $C^{+}$ is the Moore-Penrose inverse of $C$.
- Let $X$ be the best rank-$k$ approximation to $A$ in the column span of $C$. Then $CX$ is called the CX decomposition of $A$.
- Since there are $\binom{n}{c}$ possible choices for constructing $C$, selecting the best subset is a hard problem.

A Randomized Algorithm for Column Selection

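Given an $m \times n$ matrix $A$ and a rank parameter $k$, a random sampling scheme based on the statistical leverage scores is:

- Compute the importance sampling probabilities $\{\pi_i\}_{i=1}^{n}$. Here $\pi_i = \frac{1}{k}\|V_k^{(i)}\|_2^2$, where $V_k$ is an $n \times k$ orthonormal matrix spanning the top-$k$ right singular subspace of $A$ and $V_k^{(i)}$ is its $i$-th row. The squared row norms sum to $k$, so the $\pi_i$ sum to 1.
- Randomly select $c = O(k \log(k/\varepsilon^2))$ columns of $A$ according to these probabilities to form the matrix $C$.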
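The two steps above can be sketched as follows, assuming NumPy. The function names are illustrative. The sketch computes $V_k$ from an exact SVD, draws the columns i.i.d. with replacement (an assumption about the sampling scheme), and omits the rescaling of sampled columns that some analyses apply.

```python
import numpy as np

def leverage_score_probs(A, k):
    """pi_i = ||i-th row of V_k||_2^2 / k for the top-k right singular vectors."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    V_k = Vt[:k, :].T                       # n x k, orthonormal columns
    return np.sum(V_k ** 2, axis=1) / k     # nonnegative, sums to 1

def sample_columns(A, probs, c, rng=None):
    """Draw c column indices i.i.d. from probs and return the submatrix C."""
    rng = np.random.default_rng(rng)
    idx = rng.choice(A.shape[1], size=c, replace=True, p=probs)
    return A[:, idx], idx

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 12))           # illustrative test matrix
k, c = 3, 6
probs = leverage_score_probs(A, k)
C, idx = sample_columns(A, probs, c, rng=1)
```

Computing the exact leverage scores costs as much as a truncated SVD of $A$, which is the dominant cost of this sampling scheme.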

Theoretical Result for the Random Column Selection (Drineas et al., 2008)

Let $C_k$ be the best rank-$k$ approximation to the matrix $C$, and define the projection matrix $P_{C_k} = C_k C_k^{+}$. Then

$$\|A - P_{C_k} A\|_F \le (1 + \varepsilon)\|A - A_k\|_F,$$

where $A_k = U_k \Sigma_k V_k^T$ is the best rank-$k$ approximation to $A$.

The Adaptive Sampling Algorithm

Lemma (Deshpande et al., 2006)

Given a matrix $A \in \mathbb{R}^{m \times n}$, let $C_1 \in \mathbb{R}^{m \times c_1}$ consist of $c_1$ columns of $A$, and define the residual $B = A - C_1 C_1^{+} A$. Additionally, for $i = 1, \ldots, n$, define

$$\pi_i = \|b_i\|_2^2 / \|B\|_F^2,$$

where $b_i$ is the $i$-th column of $B$. We further sample $c_2$ columns i.i.d. from $A$, in each trial of which the $i$-th column is chosen with probability $\pi_i$. Let $C_2 \in \mathbb{R}^{m \times c_2}$ contain the $c_2$ sampled columns and let $C = [C_1, C_2] \in \mathbb{R}^{m \times (c_1 + c_2)}$. Then, for any integer $k > 0$, the following inequality holds:

$$\mathbb{E}\,\|A - CC^{+}A\|_F^2 \le \|A - A_k\|_F^2 + \frac{k}{c_2}\|A - C_1 C_1^{+} A\|_F^2,$$

where the expectation is taken w.r.t. $C_2$.
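The sampling step of the lemma can be sketched as follows, assuming NumPy. The function names and the choice of $C_1$ (the first four columns) are illustrative. The projection $CC^{+}A$ is computed through a least-squares solve rather than an explicit pseudo-inverse, which behaves well even if the same column is drawn twice.

```python
import numpy as np

def projection_residual(A, C):
    """Return A - C C^+ A, using a minimum-norm least-squares solve."""
    coeffs = np.linalg.lstsq(C, A, rcond=None)[0]
    return A - C @ coeffs

def adaptive_sampling(A, C1, c2, rng=None):
    """Sample c2 more columns with probabilities from the residual of C1."""
    rng = np.random.default_rng(rng)
    B = projection_residual(A, C1)
    pi = np.sum(B ** 2, axis=0) / np.sum(B ** 2)   # ||b_i||^2 / ||B||_F^2
    idx = rng.choice(A.shape[1], size=c2, replace=True, p=pi)
    return np.hstack([C1, A[:, idx]]), pi

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 25))
C1 = A[:, :4]                                      # initial columns (arbitrary)
C, pi = adaptive_sampling(A, C1, c2=6, rng=1)

err_before = np.linalg.norm(projection_residual(A, C1), "fro")
err_after = np.linalg.norm(projection_residual(A, C), "fro")
```

Columns already contained in $C_1$ have zero residual and therefore (up to rounding) zero sampling probability, so the second round concentrates on directions that $C_1$ does not yet explain.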

The Near-Optimal Column Selection Algorithm

Boutsidis et al. (2013) derived a near-optimal algorithm, which consists of three steps:

- the approximate SVD via random projection (Halko et al., 2011)
- a dual set sparsification algorithm, an extension of the spectral sparsifier (BSS)
- the adaptive sampling algorithm (Deshpande et al., 2006)

The Near-Optimal Column Selection Algorithm

Theorem (Boutsidis et al., 2013)

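Given a matrix $A \in \mathbb{R}^{m \times n}$ of rank $\rho$, a target rank $k$ ($2 \le k < \rho$), and $0 < \varepsilon < 1$, the algorithm selects

$$c = \frac{2k}{\varepsilon}\big(1 + o(1)\big)$$

columns of $A$ to form a matrix $C \in \mathbb{R}^{m \times c}$. Then the following inequality holds:

$$\mathbb{E}\,\|A - CC^{+}A\|_F^2 \le (1 + \varepsilon)\,\|A - A_k\|_F^2,$$

where the expectation is taken w.r.t. $C$. Furthermore, the matrix $C$ can be obtained in time

$$O\big(mk^2\varepsilon^{-4/3} + nk^3\varepsilon^{-2/3}\big) + T_{\mathrm{Multiply}}\big(mnk\varepsilon^{-2/3}\big).$$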

The CUR Decomposition (Drineas et al., 2008; Mahoney and Drineas, 2009)

Given an $m \times n$ matrix $A$, and integers $c < n$ and $r < m$, the CUR decomposition of $A$ finds $C \in \mathbb{R}^{m \times c}$ with $c$ columns from $A$, $R \in \mathbb{R}^{r \times n}$ with $r$ rows from $A$, and $U \in \mathbb{R}^{c \times r}$ such that $A = CUR + E$. Here $E = A - CUR$ is the residual error matrix.
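A minimal sketch of a CUR factorization follows, assuming NumPy. Two choices here are assumptions for illustration rather than the method of the cited papers: columns and rows are picked uniformly at random without replacement, and the intersection matrix is set to $U = C^{+} A R^{+}$, which minimizes $\|A - CUR\|_F$ for fixed $C$ and $R$. The function name `cur_decomposition` is hypothetical.

```python
import numpy as np

def cur_decomposition(A, col_idx, row_idx):
    """Build C, U, R from chosen column and row indices of A."""
    C = A[:, col_idx]                                # m x c, actual columns of A
    R = A[row_idx, :]                                # r x n, actual rows of A
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)    # c x r
    return C, U, R

rng = np.random.default_rng(0)
m, n, k = 30, 20, 3
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))   # exact rank k

col_idx = rng.choice(n, size=k, replace=False)       # uniform choice (assumption)
row_idx = rng.choice(m, size=k, replace=False)
C, U, R = cur_decomposition(A, col_idx, row_idx)

E = A - C @ U @ R                                    # residual error matrix
rel_err = np.linalg.norm(E) / np.linalg.norm(A)
```

For this exactly rank-$k$ example, $k$ generic columns and rows already span the column and row spaces of $A$, so the residual $E$ vanishes up to rounding. For general matrices the quality depends strongly on how the columns and rows are selected, which is the subject of the sampling algorithms in this section.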

The CUR Problem

Definition (The CUR Decomposition)

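Given an $m \times n$ matrix $A$ of rank $\rho$, a rank parameter $k$, and an accuracy parameter $\varepsilon \in (0,1)$, construct a matrix $C \in \mathbb{R}^{m \times c}$ with $c$ columns from $A$, $R \in \mathbb{R}^{r \times n}$ with $r$ rows from $A$, and $U \in \mathbb{R}^{c \times r}$, with $c$, $r$, and $\mathrm{rank}(U)$ being as small as possible, such that

$$\|A - CUR\|_F^2 \le (1 + \varepsilon)\|A - A_k\|_F^2.$$

Here $A_k = U_k \Sigma_k V_k^T \in \mathbb{R}^{m \times n}$ is the best rank-$k$ matrix obtained via the SVD of $A$: $A = U \Sigma V^T$.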

The Subspace Sampling CUR Algorithm

Drineas et al. (2008) proposed a two-stage randomized CUR algorithm called Subspace Sampling.

- The first stage samples $c$ columns of $A$ to construct $C$, with sampling probabilities proportional to the squared $\ell_2$-norms of the rows of $V_k$;
- The second stage samples $r$ rows from $A$ and $C$ simultaneously to construct $R$ and $W$, and lets $U = W^{\dagger}$. The sampling probabilities in this stage are proportional to the leverage scores of $A$ and $C$, respectively.

The Subspace Sampling CUR Algorithm

Lemma (Drineas et al., 2008)

Given an $m \times n$ matrix $A$ and a target rank $k \ll \min\{m, n\}$, the subspace sampling algorithm selects $c = O(k \varepsilon^{-2} \log k \log(1/\delta))$ columns and $r = O(c \varepsilon^{-2} \log c \log(1/\delta))$ rows without replacement. Then

$$\|A - CUR\|_F = \|A - C W^\dagger R\|_F \le (1 + \varepsilon)\,\|A - A_k\|_F$$

holds with probability at least $1 - \delta$, where $W$ contains the rows of $C$ with scaling. The running time is dominated by the truncated SVD of $A$, that is, $O(mnk)$.


The Adaptive Sampling CUR Algorithm

Wang and Zhang (2013) proposed an Adaptive Sampling CUR algorithm.

- Select $c = \frac{2k}{\varepsilon}(1 + o(1))$ columns of $A$ to construct $C \in \mathbb{R}^{m \times c}$ using the algorithm of Boutsidis et al. (2013);
- Select $r_1 = c$ rows of $A$ to construct $R_1 \in \mathbb{R}^{r_1 \times n}$ using the algorithm of Boutsidis et al. (2013);
- Adaptively sample $r_2 = c/\varepsilon$ rows from $A$ according to the residual $A - A R_1^\dagger R_1$;
- Return $C$, $R = [R_1^T, R_2^T]^T$, and $U = C^\dagger A R^\dagger$.
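The data flow of these steps can be sketched as follows. This is a hedged illustration, not the authors' implementation: plain uniform sampling stands in for the column and row selection of Boutsidis et al. (2013), the function names and the `rcond` cutoff are assumptions, and the values of `c`, `r1`, `r2` are arbitrary rather than those prescribed by the theory.

```python
import numpy as np

def pinv(M):
    # Explicit cutoff so duplicated sampled rows do not blow up the pseudo-inverse.
    return np.linalg.pinv(M, rcond=1e-10)

def adaptive_rows(A, R1, r2, rng):
    """Sample r2 row indices with probability proportional to the squared
    row norms of the residual B = A - A R1^+ R1."""
    B = A - A @ pinv(R1) @ R1
    p = np.sum(B ** 2, axis=1)
    total = p.sum()
    p = p / total if total > 0 else np.full(A.shape[0], 1.0 / A.shape[0])
    return rng.choice(A.shape[0], size=r2, replace=True, p=p)

def adaptive_cur(A, c, r1, r2, rng):
    m, n = A.shape
    # ASSUMPTION: uniform sampling replaces the Boutsidis et al. (2013) selection.
    cols = rng.choice(n, size=c, replace=False)
    rows1 = rng.choice(m, size=r1, replace=False)
    C, R1 = A[:, cols], A[rows1, :]
    rows2 = adaptive_rows(A, R1, r2, rng)
    R = np.vstack([R1, A[rows2, :]])          # R = [R1^T, R2^T]^T
    U = pinv(C) @ A @ pinv(R)                 # U = C^+ A R^+
    return C, U, R

rng = np.random.default_rng(0)
A = (rng.standard_normal((60, 5)) @ rng.standard_normal((5, 40))
     + 0.1 * rng.standard_normal((60, 40)))   # toy low-rank-plus-noise matrix
C, U, R = adaptive_cur(A, c=10, r1=10, r2=20, rng=rng)
err = np.linalg.norm(A - C @ U @ R)
```

Since $CUR = CC^\dagger A R^\dagger R$ projects $A$ on both sides, the error can never be smaller than the column-only error $\|A - CC^\dagger A\|_F$, and never larger than $\|A\|_F$.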


The Adaptive Sampling CUR Algorithm

Lemma (Wang and Zhang, 2013)

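Given a matrix $A \in \mathbb{R}^{m \times n}$ and a matrix $C \in \mathbb{R}^{m \times c}$ such that $\mathrm{rank}(C) = \mathrm{rank}(CC^\dagger A) = \rho$ ($\rho \le c \le n$), let $R_1 \in \mathbb{R}^{r_1 \times n}$ consist of $r_1$ rows of $A$ and define the residual $B = A - A R_1^\dagger R_1$. Additionally, for $i = 1, \dots, m$, we define

$$p_i = \|b^{(i)}\|_2^2 / \|B\|_F^2 ,$$

where $b^{(i)}$ is the $i$-th row of $B$. We further sample $r_2$ rows i.i.d. from $A$, in each trial of which the $i$-th row is chosen with probability $p_i$. Let $R_2 \in \mathbb{R}^{r_2 \times n}$ contain the $r_2$ sampled rows and let $R = [R_1^T, R_2^T]^T \in \mathbb{R}^{(r_1 + r_2) \times n}$. Then we have

$$\mathbb{E}\,\|A - CC^\dagger A R^\dagger R\|_F^2 \le \|A - CC^\dagger A\|_F^2 + \frac{\rho}{r_2}\,\|A - A R_1^\dagger R_1\|_F^2 ,$$

where the expectation is taken w.r.t. $R_2$.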


The Adaptive Sampling CUR Algorithm

Theorem (Wang and Zhang, 2013)

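Given a matrix $A \in \mathbb{R}^{m \times n}$ and a positive integer $k \ll \min\{m, n\}$, the Adaptive Sampling CUR algorithm randomly selects $c = \frac{2k}{\varepsilon}(1 + o(1))$ columns of $A$ to construct $C \in \mathbb{R}^{m \times c}$, and then selects $r = \frac{c}{\varepsilon}(1 + \varepsilon)$ rows of $A$ to construct $R \in \mathbb{R}^{r \times n}$. Then we have

$$\mathbb{E}\,\|A - CUR\|_F = \mathbb{E}\,\|A - C(C^\dagger A R^\dagger)R\|_F \le (1 + \varepsilon)\,\|A - A_k\|_F .$$

The algorithm costs time

$$O\big((m + n)k^3\varepsilon^{-2/3} + mk^2\varepsilon^{-2} + nk^2\varepsilon^{-4}\big) + T_{\mathrm{Multiply}}(mnk\varepsilon^{-1})$$

to compute matrices $C$, $U$ and $R$.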


Optimal CUR Algorithm

Boutsidis and Woodruff (2014) proposed the Optimal CUR algorithm.

Construction of $C$ with $O(k + \frac{k}{\varepsilon})$ columns:

- Compute the top $k$ singular vectors of $A$: $Z_1$
- Sample $O(k \log k)$ columns from $Z_1^T$ with the leverage scores
- Down-sample columns to $c_1 = O(k)$ columns with the sampling algorithm of Boutsidis et al. (2013)
- Adaptively sample $c_2 = O(\frac{k}{\varepsilon})$ columns of $A$

Construction of $R$ with $O(k + \frac{k}{\varepsilon})$ rows:

- Find $Z_2$ in the span of $C$ such that $\|A - Z_2 Z_2^T A\|_F^2 \le (1 + \varepsilon) \cdot \|A - A_k\|_F^2$
- Sample $O(k \log k)$ rows from $Z_2$ with the leverage scores
- Down-sample rows to $r_1 = O(k)$ rows with the sampling algorithm of Boutsidis et al. (2013)
- Sample $r_2 = O(\frac{k}{\varepsilon})$ rows with adaptive sampling


Optimal CUR Algorithm

Lemma (Boutsidis and Woodruff, 2014)

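Given a matrix $A \in \mathbb{R}^{m \times n}$, $V \in \mathbb{R}^{m \times c}$ and an integer $k$, let $V = Y\Psi$ be a QR decomposition of $V$, $\Gamma = Y^T A$, and $\Gamma_k = \Delta \Sigma_k V_k^T$ be a rank-$k$ SVD of $\Gamma$, with $\Delta \in \mathbb{R}^{c \times k}$. Then $Y \Delta \Delta^T Y^T$ satisfies:

$$\|A - Y \Delta \Delta^T Y^T A\|_F^2 \le \|A - Y \Delta \Sigma_k V_k^T\|_F^2 = \|A - \Pi^F_{V,k}(A)\|_F^2 .$$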
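The construction in this lemma translates directly into NumPy. The sketch below assumes exact QR and SVD factorizations and takes $\Pi^F_{V,k}(A)$ to mean the best rank-$k$ Frobenius-norm approximation of $A$ within the column space of $V$, which is how the symbol is commonly used; the function name and the test matrices are hypothetical.

```python
import numpy as np

def best_rank_k_in_span(A, V, k):
    """Return Y, Delta, and the approximation Y Delta Delta^T Y^T A."""
    Y, _ = np.linalg.qr(V)                   # V = Y Psi, Y has orthonormal columns
    Gamma = Y.T @ A                          # c x n
    left, _, _ = np.linalg.svd(Gamma, full_matrices=False)
    Delta = left[:, :k]                      # top-k left singular vectors of Gamma
    approx = Y @ Delta @ (Delta.T @ Gamma)   # rank at most k, inside span(V)
    return Y, Delta, approx

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 30))
V = A[:, :10]                                # e.g. ten columns of A
k = 3
Y, Delta, approx = best_rank_k_in_span(A, V, k)
err = np.linalg.norm(A - approx)

# Unconstrained best rank-k approximation A_k, for comparison.
Ua, sa, Vta = np.linalg.svd(A, full_matrices=False)
A_k = (Ua[:, :k] * sa[:k]) @ Vta[:k]
```

With exact factorizations the two sides of the lemma coincide; the inequality matters when $\Delta$ comes from an approximate SVD. The restricted error `err` can never beat the unconstrained error $\|A - A_k\|_F$.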


Optimal CUR Algorithm

Theorem (Boutsidis and Woodruff, 2014)

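Given a matrix $A \in \mathbb{R}^{m \times n}$ of rank $\rho$, a target rank $1 \le k \le \rho$, and $0 < \varepsilon < 1$, the optimal CUR algorithm selects at most $c = O(k/\varepsilon)$ columns and at most $r = O(k/\varepsilon)$ rows from $A$ to form matrices $C \in \mathbb{R}^{m \times c}$, $R \in \mathbb{R}^{r \times n}$, and $U \in \mathbb{R}^{c \times r}$ with $\mathrm{rank}(U) = k$ such that, with some probability,

$$\|A - CUR\|_F^2 \le (1 + O(\varepsilon))\,\|A - A_k\|_F^2 .$$

The matrices $C$, $U$, and $R$ can be computed in time

$$O\big[\mathrm{nnz}(A) \log n + (m + n) \times \mathrm{poly}(\log n, k, 1/\varepsilon)\big] .$$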


The Nyström Method

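Random Selection: select $c$ ($\ll n$) columns of $K$ to construct $C$ using some randomized algorithm. After permutation we have

$$K = \begin{bmatrix} W & K_{21}^T \\ K_{21} & K_{22} \end{bmatrix}, \qquad C = \begin{bmatrix} W \\ K_{21} \end{bmatrix}.$$

The Nyström Approximation: $K_c^{\mathrm{nys}} \approx K$, where

$$\underbrace{K_c^{\mathrm{nys}}}_{n \times n} = \underbrace{C}_{n \times c}\; \underbrace{W^\dagger}_{c \times c}\; \underbrace{C^T}_{c \times n} .$$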
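A minimal NumPy sketch of this approximation follows, assuming uniform column sampling and a toy RBF kernel matrix; the function name and the `rcond` cutoff are illustrative choices, not part of the slides. No explicit permutation is needed because $W$ is simply the block of $K$ indexed by the selected columns.

```python
import numpy as np

def nystrom(K, idx, rcond=1e-10):
    """Standard Nystrom approximation C W^+ C^T built from the columns in idx."""
    C = K[:, idx]                      # n x c sampled columns
    W = K[np.ix_(idx, idx)]            # c x c intersection block
    return C @ np.linalg.pinv(W, rcond=rcond) @ C.T

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-0.5 * sq_dists)            # toy RBF kernel matrix (SPSD)

idx = rng.choice(100, size=10, replace=False)   # uniform sampling, c = 10
K_nys = nystrom(K, idx)
rel_err = np.linalg.norm(K - K_nys) / np.linalg.norm(K)
```

The approximation reproduces the sampled columns of $K$ and only approximates the $K_{22}$ block; the explicit cutoff guards against a nearly singular $W$.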


The Nyström Approximation

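The Nyström Approximation:

$$K \approx K_c^{\mathrm{nys}} = C W^\dagger C^T$$

(a low-rank factorization).

[Figure: the $n \times n$ Nyström approximation drawn as the product of an $n \times c$ matrix, a $c \times c$ matrix, and a $c \times n$ matrix.]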


Problem Formulation

Problem: How to select informative columns of $K \in \mathbb{R}^{n \times n}$ to construct $C \in \mathbb{R}^{n \times c}$?

The approximation error $\|K - CUC^T\|_F$ or $\|K - CUC^T\|_2$ should be as small as possible.


Criterion: Upper Error Bounds

Use approximation algorithms to find $c$ good columns (not necessarily the best).

We hope that the ratio $\dfrac{\|K - CUC^T\|_F}{\|K - K_k\|_F}$ has an upper bound; the smaller the bound, the better.


Uniform Sampling: The Simplest Algorithm

Sample $c$ columns of $K$ uniformly at random to construct $C$.

This is the simplest algorithm, and also the most widely used.


Adaptive Sampling

The adaptive sampling algorithm [Deshpande et al., 2006]:

1. Sample $c_1$ columns of $K$ to construct $C_1$ using some algorithm;
2. Compute the residual $B = K - C_1 C_1^\dagger K$;
3. Compute sampling probabilities $p_i = \|b_i\|_2^2 / \|B\|_F^2$, for $i = 1$ to $n$;
4. Sample a further $c_2$ columns of $K$ in $c_2$ i.i.d. trials; in each trial the $i$-th column is chosen with probability $p_i$. Denote the selected columns by $C_2$;
5. Return $C = [C_1, C_2]$.
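The five steps above map onto a short NumPy routine. In this sketch the unspecified "some algorithm" of step 1 is assumed to be uniform sampling, $b_i$ is taken to be the $i$-th column of $B$, and the function name and toy matrix are hypothetical.

```python
import numpy as np

def adaptive_sampling(K, c1, c2, rng):
    n = K.shape[1]
    # Step 1 (ASSUMPTION): uniform sampling plays the role of "some algorithm".
    idx1 = rng.choice(n, size=c1, replace=False)
    C1 = K[:, idx1]
    # Step 2: residual B = K - C1 C1^+ K.
    B = K - C1 @ (np.linalg.pinv(C1, rcond=1e-10) @ K)
    # Step 3: p_i = ||b_i||_2^2 / ||B||_F^2 over the columns b_i of B.
    p = np.sum(B ** 2, axis=0)
    p = p / p.sum()
    # Step 4: c2 i.i.d. trials with probabilities p.
    idx2 = rng.choice(n, size=c2, replace=True, p=p)
    # Step 5: C = [C1, C2], returned here as column indices.
    return np.concatenate([idx1, idx2]), p

rng = np.random.default_rng(0)
X = rng.standard_normal((80, 20))
K = X @ X.T                              # toy SPSD matrix of rank 20
idx, p = adaptive_sampling(K, c1=5, c2=10, rng=rng)
C = K[:, idx]
```

Columns already in $C_1$ have essentially zero residual, so the second round concentrates on directions that $C_1$ does not yet capture.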


Adaptive Sampling

The error term $\|K - CC^\dagger K\|_F$ is bounded theoretically, but $\|K - CW^\dagger C^T\|_F$ is not.

Empirically, the adaptive sampling algorithm works very well.


Better Sampling Algorithms?

We hope $\dfrac{\|K - CW^\dagger C^T\|_F}{\|K - K_k\|_F}$ will be very small if the column sampling algorithm is good enough.

But it cannot be arbitrarily small.

Lower Error Bound

Theorem (Wang & Zhang, JMLR 2013)

Whatever column sampling is used to select $c$ columns, there exists a bad case $K$ such that

$$\frac{\|K - CW^\dagger C^T\|_F^2}{\|K - K_k\|_F^2} \ge \Omega\Big(1 + \frac{nk}{c^2}\Big).$$


Different Types of Low-Rank Approximation?

The Ensemble Nyström Method [Kumar et al., JMLR 2012]:

$$K \approx \sum_{i=1}^{t} \frac{1}{t}\, C^{(i)} {W^{(i)}}^\dagger {C^{(i)}}^T$$

It does not improve the lower error bound.

Lower Error Bound

Theorem (Wang & Zhang, JMLR 2013)

Whatever column sampling is used to select $c$ columns, there exists a bad case $K$ such that

$$\frac{\big\|K - \sum_{i=1}^{t} \frac{1}{t}\, C^{(i)} {W^{(i)}}^\dagger {C^{(i)}}^T\big\|_F^2}{\|K - K_k\|_F^2} \ge \Omega\Big(1 + \frac{nk}{c^2}\Big).$$
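The ensemble formula with equal weights $1/t$ amounts to averaging $t$ independent standard Nyström approximations. The sketch below assumes uniform column sampling for every member and a toy RBF kernel; the function name and parameters are hypothetical.

```python
import numpy as np

def ensemble_nystrom(K, c, t, rng, rcond=1e-10):
    """Average of t standard Nystrom approximations, each with weight 1/t."""
    n = K.shape[0]
    approx = np.zeros((n, n))
    for _ in range(t):
        idx = rng.choice(n, size=c, replace=False)   # ASSUMPTION: uniform sampling
        C = K[:, idx]
        W = K[np.ix_(idx, idx)]
        approx += C @ np.linalg.pinv(W, rcond=rcond) @ C.T / t
    return approx

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-0.5 * sq_dists)              # toy RBF kernel matrix (SPSD)
K_ens = ensemble_nystrom(K, c=10, t=5, rng=rng)
rel_err = np.linalg.norm(K - K_ens) / np.linalg.norm(K)
```

Each member uses only a $c \times c$ intersection matrix, so the ensemble inherits the structure of the standard method, in line with the statement that it does not improve the lower bound.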


The Modified Nyström Method

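The Modified Nyström Method [Wang & Zhang, JMLR 2013]:

$$K \approx C \big(\underbrace{C^\dagger K (C^\dagger)^T}_{c \times c}\big) C^T .$$

Theorem (Wang & Zhang, JMLR 2013)

Using a column sampling algorithm, the error incurred by the modified Nyström method satisfies

$$\frac{\mathbb{E}\,\big\|K - C\big(C^\dagger K (C^\dagger)^T\big)C^T\big\|_F^2}{\|K - K_k\|_F^2} \le 1 + \sqrt{\frac{k}{c}} .$$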
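To see the two intersection matrices side by side, the sketch below computes both approximations from the same uniformly sampled columns of a toy RBF kernel. The function names are hypothetical, and the modified intersection matrix is formed naively. For a fixed $C$, $C^\dagger K (C^\dagger)^T$ minimizes $\|K - CUC^T\|_F$ over all $U$, so the modified error is never larger than the standard one.

```python
import numpy as np

def standard_nystrom(K, idx, rcond=1e-10):
    C = K[:, idx]
    W = K[np.ix_(idx, idx)]
    return C @ np.linalg.pinv(W, rcond=rcond) @ C.T      # U = W^+

def modified_nystrom(K, idx, rcond=1e-10):
    C = K[:, idx]
    Cp = np.linalg.pinv(C, rcond=rcond)
    U = Cp @ K @ Cp.T                                    # U = C^+ K (C^+)^T, c x c
    return C @ U @ C.T

rng = np.random.default_rng(0)
X = rng.standard_normal((120, 3))
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-0.5 * sq_dists)              # toy RBF kernel matrix (SPSD)

idx = rng.choice(120, size=15, replace=False)
err_std = np.linalg.norm(K - standard_nystrom(K, idx))
err_mod = np.linalg.norm(K - modified_nystrom(K, idx))
```

The price is that forming $C^\dagger K (C^\dagger)^T$ touches all of $K$, whereas $W^\dagger$ only needs the $c \times c$ block.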


Comparisons between the Two Methods

The Standard Nyström Method: fast. It costs only $T_{\mathrm{SVD}}(c^3)$ time to compute the intersection matrix $U^{\mathrm{nys}} = W^\dagger$.

The Modified Nyström Method: slow. It costs $T_{\mathrm{SVD}}(nc^2) + T_{\mathrm{Multiply}}(n^2 c)$ time to compute the intersection matrix $U^{\mathrm{mod}} = C^\dagger K (C^\dagger)^T$ naively.


Comparisons between the Two Methods

The Standard Nyström Method: inaccurate. It cannot attain a $1 + \varepsilon$ Frobenius relative-error bound unless

$$c \ge \sqrt{nk/\varepsilon}$$

columns are selected, whatever column selection algorithm is used. (This is due to its lower error bound.)

The Modified Nyström Method: accurate. Some adaptive sampling based algorithms attain a $1 + \varepsilon$ Frobenius relative-error bound when

$$c = O(k/\varepsilon^2).$$

(The smaller $c$ is, the better.)
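The threshold for the standard method follows from the lower error bound by a one-line rearrangement. The sketch below ignores the constant hidden in $\Omega(\cdot)$, so it should be read as an order-of-magnitude argument rather than an exact statement.

```latex
% A (1 + \varepsilon) relative-error bound must be compatible with the lower bound:
1 + \frac{nk}{c^{2}} \;\le\; 1 + \varepsilon
\quad\Longleftrightarrow\quad
c^{2} \;\ge\; \frac{nk}{\varepsilon}
\quad\Longleftrightarrow\quad
c \;\ge\; \sqrt{\frac{nk}{\varepsilon}} .
```

The required $c$ therefore grows like $\sqrt{n}$ for the standard method, whereas the $c = O(k/\varepsilon^2)$ of the modified method does not depend on $n$.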


Comparisons between the Two Methods

Theorem (Exact Recovery)

For the symmetric matrix $K$ defined previously, the following three statements are equivalent:

1. $\mathrm{rank}(W) = \mathrm{rank}(K)$,
2. $K = CW^\dagger C^T$ (i.e., the standard Nyström method is exact),
3. $K = C\big(C^\dagger K (C^\dagger)^T\big)C^T$ (i.e., the modified Nyström method is exact).
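A quick numerical check of one direction of this equivalence: for a toy rank-4 SPSD matrix, eight uniformly sampled columns give $\mathrm{rank}(W) = \mathrm{rank}(K)$, and both methods then recover $K$ up to rounding error. The matrix, the sample size, and the tolerances are arbitrary choices for illustration.

```python
import numpy as np

def pinv(M):
    # Explicit cutoff: W is rank deficient here, so tiny singular values must be dropped.
    return np.linalg.pinv(M, rcond=1e-10)

rng = np.random.default_rng(1)
X = rng.standard_normal((60, 4))
K = X @ X.T                               # symmetric PSD matrix of rank 4

idx = rng.choice(60, size=8, replace=False)
C = K[:, idx]
W = K[np.ix_(idx, idx)]

rank_K = np.linalg.matrix_rank(K, tol=1e-8)
rank_W = np.linalg.matrix_rank(W, tol=1e-8)

K_std = C @ pinv(W) @ C.T                 # standard Nystrom
Cp = pinv(C)
K_mod = C @ (Cp @ K @ Cp.T) @ C.T         # modified Nystrom

max_dev_std = np.max(np.abs(K - K_std))
max_dev_mod = np.max(np.abs(K - K_mod))
```

Both deviations are at the level of floating-point noise here; with fewer than four sampled columns, $\mathrm{rank}(W)$ would fall below $\mathrm{rank}(K)$ and neither reconstruction could be exact.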


Outline

1. Random Projection
   The Johnson and Lindenstrauss Lemma
   Randomized SVD

2. Subspace Embedding

3. Random Selection
   Column Selection
   CUR Decomposition
   The Nyström Method

4. References


References

Santosh S. Vempala. The Random Projection Method. American Mathematical Society, 2000.

Michael W. Mahoney. Randomized Algorithms for Matrices and Data. Foundations and Trends in Machine Learning, 3(2): 123-224, 2011.

N. Halko, P. G. Martinsson, and J. A. Tropp. Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions. SIAM Review, 53(2): 217-288, 2011.

W. B. Johnson and J. Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 1984.

S. Dasgupta and A. Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures & Algorithms, 2003.

J. Matoušek. On variants of the Johnson-Lindenstrauss lemma. Random Structures & Algorithms, 2008.


References

A. Dasgupta, R. Kumar, and T. Sarlós. A sparse Johnson-Lindenstrauss Transform. In STOC, 2010.

K. L. Clarkson and D. P. Woodruff. Low Rank Approximation and Regression in Input Sparsity Time. In STOC, 2013.

X. Meng and M. W. Mahoney. Low-distortion subspace embeddings in input-sparsity time and applications to robust linear regression. In STOC, 2013.

J. Nelson and H. L. Nguyên. OSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings. In FOCS, 2013.

J. Batson, D. Spielman, and N. Srivastava. Twice-Ramanujan Sparsifiers. SIAM Review, 2014.


References

A. Frieze, R. Kannan, and S. Vempala. Fast Monte-Carlo algorithms for finding low-rank approximations. In FOCS, 1998; Journal of the ACM, 2004.

A. Deshpande, L. Rademacher, S. Vempala, and G. Wang. Matrix approximation and projective clustering via volume sampling. Theory of Computing, 2006.

C. Boutsidis, P. Drineas, and M. Magdon-Ismail. Near optimal column-based matrix reconstruction. SIAM Journal on Computing, 2013.

V. Guruswami and A. K. Sinop. Optimal column based low-rank matrix reconstruction. In SODA, 2012.


References

P. Drineas, M. W. Mahoney, and S. Muthukrishnan. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 2008.

M. W. Mahoney and P. Drineas. CUR matrix decompositions for improved data analysis. Proceedings of the National Academy of Sciences, 2009.

Shusen Wang and Zhihua Zhang. Improving CUR matrix decomposition and the Nyström approximation via adaptive sampling. JMLR, 2013.

C. Boutsidis and D. P. Woodruff. Optimal CUR matrix decompositions. In STOC, 2014.

S. Kumar, M. Mohri, and A. Talwalkar. Sampling methods for the Nyström method. JMLR, 2012.

