Randomized Numerical Linear Algebra
Zhang
Random Projection
The Johnson and Lindenstrauss Lemma
Randomized SVD
Subspace Embedding
Random Selection
Column Selection
CUR Decomposition
The Nyström Method
References
Randomized Numerical Linear Algebra: Review and Progresses
Zhihua Zhang
Department of Computer Science and Engineering
Shanghai Jiao Tong University
The 12th China Workshop on Machine Learning and Applications
Xi’an, November 2014
- An interdisciplinary area spanning Theoretical Computer Science (TCS), Numerical Linear Algebra (NLA), and Modern Data Analysis.
- Many data mining and machine learning algorithms involve matrix decompositions, matrix inverses, and matrix determinants, and some methods are based on low-rank matrix approximation.
- The Big Data phenomenon brings new challenges and opportunities to machine learning and data mining.
Singular Value Decomposition (SVD)
Input: an m × n data matrix A of rank r and an integer k less than r.
- The (condensed) SVD: $A = U \Sigma V^T$, where $U^T U = I_r$, $V^T V = I_r$, and $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_r)$ with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$.
  Time complexity: $O(mn \min(m, n))$.
- The truncated SVD: $A_k = U_k \Sigma_k V_k^T$, where $U_k$ and $V_k$ are the first k columns of U and V, and $\Sigma_k$ is the k × k top sub-block of Σ.
- $A_k$ is the "closest" rank-k approximation to A. That is,
  $$A_k = \arg\min_{\mathrm{rank}(X) \le k} \|A - X\|_\xi,$$
  where ξ = 2 is the matrix spectral norm and ξ = F is the matrix Frobenius norm.
  Time complexity: $O(mnk)$.
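As a rough illustration of the truncated SVD and its optimality property, the following NumPy sketch builds $A_k$ and compares the approximation errors with the tail singular values. The matrix sizes, the seed, and the helper name `truncated_svd` are assumptions made for the example; the helper computes a full SVD and truncates it, so it does not achieve the $O(mnk)$ cost quoted above.

```python
import numpy as np

def truncated_svd(A, k):
    """Return the top-k singular triplets (U_k, sigma_k, V_k^T) of A."""
    # Full condensed SVD followed by truncation (simple, but not O(mnk)).
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

rng = np.random.default_rng(0)      # fixed seed, illustrative only
A = rng.standard_normal((60, 40))
k = 5

U_k, s_k, Vt_k = truncated_svd(A, k)
A_k = U_k @ np.diag(s_k) @ Vt_k     # A_k = U_k Sigma_k V_k^T

sigma = np.linalg.svd(A, compute_uv=False)
spectral_err = np.linalg.norm(A - A_k, 2)        # should equal sigma_{k+1}
frobenius_err = np.linalg.norm(A - A_k, "fro")   # should equal sqrt(sum_{i>k} sigma_i^2)
```

Both error values can be checked against `sigma[k]` and the tail of `sigma`, which is the Eckart-Young characterization of $A_k$.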
The CUR Decomposition
A CUR decomposition algorithm seeks a subset of c columns of A to form a matrix $C \in \mathbb{R}^{m \times c}$, a subset of r rows to form a matrix $R \in \mathbb{R}^{r \times n}$, and an intersection matrix $U \in \mathbb{R}^{c \times r}$ such that $\|A - CUR\|_\xi$ is minimized.
- The CUR decomposition results in an interpretable matrix approximation to A.
- There are $\binom{n}{c}$ possible choices for C and $\binom{m}{r}$ possible choices for R, so selecting the best subsets is a hard problem.
Kernel Methods
K: n × n kernel matrix.
- Matrix inverse: $b = (K + \alpha I_n)^{-1} y$
  - time complexity: $O(n^3)$
  - performed by Gaussian process regression, least squares SVM, and kernel ridge regression
- Partial eigenvalue decomposition of K
  - time complexity: $O(n^2 k)$
  - performed by kernel PCA and some manifold learning methods
- Space complexity: $O(n^2)$
  - iterative algorithms make many passes through the data
  - ideally the entire kernel matrix is held in RAM
  - if the data does not fit in RAM, each pass incurs a swap between RAM and disk
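To make the cost concrete, here is a minimal sketch of the linear solve $b = (K + \alpha I_n)^{-1} y$. The RBF kernel, its bandwidth `gamma`, the value of `alpha`, and the synthetic data are all assumptions for illustration; the slides do not fix a kernel.

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    """Gaussian (RBF) kernel matrix; the kernel choice is an assumption."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # pairwise squared distances
    return np.exp(-gamma * np.maximum(d2, 0.0))

rng = np.random.default_rng(0)
n, alpha = 200, 0.1
X = rng.standard_normal((n, 3))
y = rng.standard_normal(n)

K = rbf_kernel(X)                                # stores n^2 entries
b = np.linalg.solve(K + alpha * np.eye(n), y)    # dense solve, O(n^3) time
```

The explicit inverse is never formed; a dense solve has the same cubic cost but is numerically preferable.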
Approaches for Large Scale Matrix Computations
- Two typical approaches: incremental and distributed.
- Randomized algorithms have also been used.
Outline
1 Random Projection
  - The Johnson and Lindenstrauss Lemma
  - Randomized SVD
2 Subspace Embedding
3 Random Selection
  - Column Selection
  - CUR Decomposition
  - The Nyström Method
4 References
The Johnson and Lindenstrauss Lemma
- The lemma is due to Johnson and Lindenstrauss (1984), but the proof was not constructive.
- Indyk and Motwani (1998) and Dasgupta and Gupta (2003) constructed a result based on a Gaussian random projection matrix $R = [r_{ij}]$, where $r_{ij} \overset{iid}{\sim} N(0, 1)$.
- Matoušek (2008) generalized the result to the case where the $r_{ij}$ are any subgaussian random variables; that is, $r_{ij} \overset{iid}{\sim} G(\nu^2)$ for $\nu \ge 1$.
The Johnson and Lindenstrauss Lemma
Definition (ε-isometry)
Given $\varepsilon \in (0, 1)$, a map $f : \mathbb{R}^p \to \mathbb{R}^q$ with $p > q$ is called an ε-isometry on a set $X \subset \mathbb{R}^p$ if for every pair $x, y \in X$ we have
$$(1 - \varepsilon)\|x - y\|_2^2 \le \|f(x) - f(y)\|_2^2 \le (1 + \varepsilon)\|x - y\|_2^2.$$
We consider the case where f is a linear map $R \in \mathbb{R}^{q \times p}$. The basic idea is to construct a random projection $R \in \mathbb{R}^{q \times p}$ that is an exact isometry "in expectation"; that is, for every $x \in \mathbb{R}^p$,
$$\mathbb{E}\big[\|Rx\|_2^2\big] = \|x\|_2^2.$$
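The "isometry in expectation" property can be checked numerically. The sketch below assumes Gaussian entries scaled by $1/\sqrt{q}$, i.e. $r_{ij} \sim N(0, 1/q)$; the scaling, the dimensions, and the number of trials are assumptions chosen for illustration, since unscaled $N(0,1)$ entries would give $\mathbb{E}\|Rx\|_2^2 = q\|x\|_2^2$.

```python
import numpy as np

def gaussian_projection(q, p, rng):
    """q x p matrix with i.i.d. N(0, 1/q) entries (the scaling is an assumption)."""
    return rng.standard_normal((q, p)) / np.sqrt(q)

rng = np.random.default_rng(0)
p, q, trials = 200, 50, 2000
x = rng.standard_normal(p)

# Monte Carlo estimate of E[||Rx||^2] / ||x||^2 over independent draws of R.
sq_norms = [np.sum((gaussian_projection(q, p, rng) @ x) ** 2) for _ in range(trials)]
ratio = np.mean(sq_norms) / np.sum(x ** 2)
```

With these settings `ratio` should be close to 1, while any single draw of R can deviate noticeably; controlling that deviation is what the lemma is about.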
The Johnson and Lindenstrauss Lemma
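Theorem (The Johnson and Lindenstrauss Lemma)
Let $X = \{x_1, \ldots, x_n\} \subset \mathbb{R}^p$, and let $\varepsilon, \delta \in (0, 1)$. Assume that $R \in \mathbb{R}^{q \times p}$ ($p > q$), where $r_{ij} \in G(\nu^2)$ for some $\nu \ge 1$. If $q \ge 100 \nu^2 \varepsilon^{-2} \log(n / \sqrt{\delta})$, then with probability at least $1 - \delta$, R is an ε-isometry on X, i.e.,
$$\Pr\Big\{ \sup_{y \in Y} \big| \|Ry\|_2^2 - 1 \big| \ge \varepsilon \Big\} \le \delta,$$
where $Y = \Big\{ \frac{x_i - x_j}{\|x_i - x_j\|_2} : x_i, x_j \in X,\ x_i \ne x_j \Big\}$.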
Prototype for Randomized SVD
Given an m × n matrix A, a target number k of singular vectors, and an integer c such that $k < c < \min(m, n)$, a proto-algorithm based on random projection for the Singular Value Decomposition (SVD) of A is as follows.
1 Construct an m × c column-orthonormal matrix Q and form $B = Q^T A$;
2 Compute the SVD of the small matrix: $B = U_B \Sigma_B V_B^T$;
3 Set $U = Q U_B$;
4 Return $U \Sigma_B V_B^T$ as an approximate SVD of A, and $U_k \Sigma_{B,k} V_{B,k}^T$ with $U_k = Q U_{B,k}$ as a truncated SVD of A.
A Proto-Algorithm for Construction of Random Projection Matrix Q
Let A be an m × n matrix, and let k be a target number of singular vectors.
1 Generate an n × 2k Gaussian test matrix Ω.
2 Form $Y = (A A^T)^\gamma A \Omega$, where γ = 1 or γ = 2.
3 Construct a matrix Q whose columns form an orthonormal basis for the range of Y.
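The construction of Q and the prototype randomized SVD can be sketched together in NumPy as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the test matrix has shape n × 2k so that AΩ is defined, the basis is obtained with a reduced QR factorization, and no re-orthonormalization is done between power iterations (practical implementations often add it for numerical stability).

```python
import numpy as np

def randomized_range_finder(A, k, gamma=1, rng=None):
    """Orthonormal basis Q (m x 2k) for the range of Y = (A A^T)^gamma A Omega."""
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[1]
    Omega = rng.standard_normal((n, 2 * k))   # Gaussian test matrix
    Y = A @ Omega
    for _ in range(gamma):
        Y = A @ (A.T @ Y)                     # one application of A A^T
    Q, _ = np.linalg.qr(Y)                    # reduced QR: Q has 2k orthonormal columns
    return Q

def randomized_svd(A, k, gamma=1, rng=None):
    """Approximate truncated SVD of A via the small matrix B = Q^T A."""
    Q = randomized_range_finder(A, k, gamma, rng)
    B = Q.T @ A                               # 2k x n
    U_B, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_B                               # lift left singular vectors back to R^m
    return U[:, :k], s[:k], Vt[:k, :]

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 5)) @ rng.standard_normal((5, 40))   # exactly rank 5
U, s, Vt = randomized_svd(A, k=5, gamma=1, rng=rng)
rel_err = np.linalg.norm(A - U @ np.diag(s) @ Vt) / np.linalg.norm(A)
```

On this exactly rank-5 example the relative error should be at the level of rounding error; for matrices with slowly decaying spectra the error depends on γ and on the singular value decay.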
Computational Complexity for the Randomized SVD
- The randomized SVD procedure requires only 2(γ + 1) passes over the matrix.
- The flop count is
  $$(2\gamma + 2)\, k\, T_{\mathrm{mult}} + O(k^2 (m + n)),$$
  where $T_{\mathrm{mult}}$ is the flop count of a matrix-vector multiply with A or $A^T$.
Theoretical Analysis for the Randomized SVD
Theorem (Halko et al., 2011)
Let $A \in \mathbb{R}^{m \times n}$. Given an exponent γ and a target number k of singular vectors, where $2 \le k \le \frac{1}{2} \min(m, n)$, running the Randomized SVD algorithm yields a rank-2k factorization $U_{2k} \Sigma_{2k} V_{2k}^T$. Then
$$\mathbb{E}\|A - U_{2k} \Sigma_{2k} V_{2k}^T\|_2 \le \left[1 + 4\sqrt{\frac{2 \min(m, n)}{k - 1}}\right]^{1/(2\gamma + 1)} \sigma_{k+1},$$
where the expectation is taken w.r.t. the random test matrix and $\sigma_{k+1}$ is the (k + 1)th largest singular value of A.
The Subspace Embedding Problem
For a fixed m × n matrix A of rank r and an error parameter $\varepsilon \in (0, 1)$, we call $S : \mathbb{R}^m \to \mathbb{R}^k$ a subspace embedding matrix for A if
$$(1 - \varepsilon)\|Ax\|_2 \le \|SAx\|_2 \le (1 + \varepsilon)\|Ax\|_2$$
for all $x \in \mathbb{R}^n$.
The Subspace Embedding Problem is to find such an embedding matrix obliviously. More specifically, one designs a distribution π over linear maps from $\mathbb{R}^m$ to $\mathbb{R}^k$ such that for any fixed m × n matrix A, if we choose S ∼ π, then with high probability S is an embedding matrix for A.
Sparse Embedding Matrices
For a fixed m × n matrix A with m > n, let nnz(A) denote the number of non-zero entries of A. Assume that nnz(A) ≥ m and that there are no all-zero rows or columns in A. Let $[m] = \{1, 2, \ldots, m\}$. For a parameter k, define a random linear map $\Phi D : \mathbb{R}^m \to \mathbb{R}^k$ as follows:
- $h : [m] \to [k]$ is a random map such that, for each $i \in [m]$, $h(i) = t$ for $t \in [k]$ with probability 1/k.
- $\Phi \in \{0, 1\}^{k \times m}$ is a k × m binary matrix with $\phi_{h(i), i} = 1$ and all remaining entries 0.
- D is an m × m random diagonal matrix, with each diagonal entry independently chosen to be +1 or −1 with equal probability.
A matrix of the form $S = \Phi D$ is referred to as a sparse embedding matrix (Dasgupta et al., 2010; Clarkson and Woodruff, 2013).
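The definition of $S = \Phi D$ translates almost directly into code. The sketch below uses hypothetical helper names and 0-based indices, draws h and the signs, and applies S to A without forming S; a dense S is built afterwards only to verify the result. For a dense A this touches each entry once; a sparse-matrix variant would be needed to realize the nnz(A) cost in general.

```python
import numpy as np

def sparse_embedding(m, k, rng):
    """Draw the hash map h and the diagonal signs of D."""
    h = rng.integers(0, k, size=m)            # h(i) uniform on {0, ..., k-1}
    signs = rng.choice([-1.0, 1.0], size=m)   # D_ii = +1 or -1 with equal probability
    return h, signs

def apply_sparse_embedding(A, h, signs, k):
    """Compute SA = Phi D A: row h(i) of SA accumulates D_ii * A[i, :]."""
    SA = np.zeros((k, A.shape[1]))
    np.add.at(SA, h, signs[:, None] * A)      # unbuffered add handles repeated h(i)
    return SA

rng = np.random.default_rng(0)
m, n, k = 1000, 5, 200
A = rng.standard_normal((m, n))
h, signs = sparse_embedding(m, k, rng)
SA = apply_sparse_embedding(A, h, signs, k)

# Dense S = Phi D, built only to check the computation above.
S = np.zeros((k, m))
S[h, np.arange(m)] = signs
```

Each column of S has exactly one non-zero entry, which is why applying S costs one signed addition per non-zero of A.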
Subspace Embedding in Input-Sparsity Time
Theorem (Meng and Mahoney, 2013)
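Let $S = \Phi D \in \mathbb{R}^{k \times m}$ with $k = \frac{n^2 + n}{\varepsilon^2 \delta}$. Then with probability at least $1 - \delta$,
$$(1 - \varepsilon)\|Ax\|_2 \le \|SAx\|_2 \le (1 + \varepsilon)\|Ax\|_2$$
for all $x \in \mathbb{R}^n$. In addition, SA can be computed in O(nnz(A)) time.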
Spectral Sparsifiers
Theorem (Batson, Spielman and Srivastava, 2014)
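Suppose $\rho > 1$ and $\{v_1, v_2, \ldots, v_m\} \subseteq \mathbb{R}^n$ with
$$\sum_{i \le m} v_i v_i^T = I_n.$$
Then there exist scalars $d_i \ge 0$ with $|\{i : d_i \ne 0\}| \le \lceil \rho n \rceil$ such that
$$\Big(1 - \frac{1}{\sqrt{\rho}}\Big)^2 I_n \preceq \sum_{i \le m} d_i v_i v_i^T \preceq \Big(1 + \frac{1}{\sqrt{\rho}}\Big)^2 I_n.$$
This theorem shows that
$$\frac{\lambda_1\big(\sum_{i \le m} d_i v_i v_i^T\big)}{\lambda_n\big(\sum_{i \le m} d_i v_i v_i^T\big)} \le \frac{\rho + 1 + 2\sqrt{\rho}}{\rho + 1 - 2\sqrt{\rho}}.$$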
Column Selection and The CX Decomposition
- Given an m × n matrix A, column selection algorithms aim to find a matrix C with c columns of A such that $\|A - C C^+ A\|_\xi = \|(I_m - C C^+) A\|_\xi$ is minimized. Here ξ = 2, ξ = F, and ξ = ∗ denote the matrix spectral norm, the matrix Frobenius norm, and the matrix nuclear norm, respectively, and $C^+$ is the Moore-Penrose inverse of C.
- Let X be the best rank-k approximation to A in the column span of C. Then CX is called the CX decomposition of A.
- Since there are $\binom{n}{c}$ possible choices for C, selecting the best subset is a hard problem.
A Randomized Algorithm for Column Selection
Given an m × n matrix A and a rank parameter k, a random sampling scheme based on the statistical leverage scores is:
- Compute the importance sampling probabilities $\{\pi_i\}_{i=1}^n$. Here $\pi_i = \frac{1}{k} \|V_k^{(i)}\|_2^2$, where $V_k$ is an n × k orthonormal matrix spanning the top-k right singular subspace of A and $V_k^{(i)}$ is its i-th row.
- Randomly select $c = O(k \log(k / \varepsilon^2))$ columns of A according to these probabilities to form the matrix C.
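A small sketch of leverage-score sampling follows. It assumes the probabilities are the squared row norms of $V_k$ divided by k (so that they sum to one), uses an exact SVD to obtain $V_k$, samples i.i.d. with replacement, and omits the column rescaling used in some analyses; the function names and sizes are hypothetical.

```python
import numpy as np

def leverage_probabilities(A, k):
    """pi_i = ||i-th row of V_k||_2^2 / k, from the top-k right singular vectors."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    V_k = Vt[:k, :].T                      # n x k, orthonormal columns
    return np.sum(V_k ** 2, axis=1) / k

def sample_columns(A, probs, c, rng):
    """Draw c column indices i.i.d. according to probs."""
    return rng.choice(A.shape[1], size=c, replace=True, p=probs)

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 120))
k, c = 4, 20
pi = leverage_probabilities(A, k)
idx = sample_columns(A, pi, c, rng)
C = A[:, idx]                              # m x c, may contain repeated columns

# Projection error ||A - C C^+ A||_F, computed on the distinct sampled columns.
C_distinct = A[:, np.unique(idx)]
residual = np.linalg.norm(A - C_distinct @ np.linalg.pinv(C_distinct) @ A, "fro")
```

Dropping repeated columns before the pseudo-inverse does not change the column span, so the projection error is the same as for C.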
Theoretical Result for the Random Column Selection (Drineas et al., 2008)
Let $C_k$ be the best rank-k approximation to the matrix C, and define the projection matrix $P_{C_k} = C_k C_k^+$. Then
$$\|A - P_{C_k} A\|_F \le (1 + \varepsilon) \|A - A_k\|_F,$$
where $A_k = U_k \Sigma_k V_k^T$ is the best rank-k approximation to A.
The Adaptive Sampling Algorithm
Lemma (Deshpande et al., 2006)
Given a matrix $A \in \mathbb{R}^{m \times n}$, let $C_1 \in \mathbb{R}^{m \times c_1}$ consist of $c_1$ columns of A, and define the residual $B = A - C_1 C_1^+ A$. Additionally, for $i = 1, \ldots, n$, define
$$\pi_i = \|b_i\|_2^2 / \|B\|_F^2.$$
We further sample $c_2$ columns i.i.d. from A, in each trial of which the i-th column is chosen with probability $\pi_i$. Let $C_2 \in \mathbb{R}^{m \times c_2}$ contain the $c_2$ sampled columns and let $C = [C_1, C_2] \in \mathbb{R}^{m \times (c_1 + c_2)}$. Then, for any integer k > 0, the following inequality holds:
$$\mathbb{E}\|A - C C^+ A\|_F^2 \le \|A - A_k\|_F^2 + \frac{k}{c_2} \|A - C_1 C_1^+ A\|_F^2,$$
where the expectation is taken w.r.t. $C_2$.
The Near-Optimal Column Selection Algorithm
Boutsidis et al. (2013) derived a near-optimal algorithm, which consists of three steps:
- the approximate SVD via random projection (Halko et al., 2011)
- a dual set sparsification algorithm, an extension of the spectral sparsifier (BSS)
- the adaptive sampling algorithm (Deshpande et al., 2006)
The Near-Optimal Column Selection Algorithm
Theorem (Boutsidis et al., 2013)
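Given a matrix $A \in \mathbb{R}^{m \times n}$ of rank ρ, a target rank k ($2 \le k < \rho$), and $0 < \varepsilon < 1$, the algorithm selects
$$c = \frac{2k}{\varepsilon}\big(1 + o(1)\big)$$
columns of A to form a matrix $C \in \mathbb{R}^{m \times c}$. Then the following inequality holds:
$$\mathbb{E}\|A - C C^+ A\|_F^2 \le (1 + \varepsilon) \|A - A_k\|_F^2,$$
where the expectation is taken w.r.t. C. Furthermore, the matrix C can be obtained in time
$$O\big(m k^2 \varepsilon^{-4/3} + n k^3 \varepsilon^{-2/3}\big) + T_{\mathrm{Multiply}}\big(m n k \varepsilon^{-2/3}\big).$$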
The CUR Decomposition (Drineas et al., 2008; Mahoney and Drineas, 2009)
Given an m × n matrix A and integers c < n and r < m, the CUR decomposition of A finds $C \in \mathbb{R}^{m \times c}$ with c columns from A, $R \in \mathbb{R}^{r \times n}$ with r rows from A, and $U \in \mathbb{R}^{c \times r}$ such that $A = CUR + E$. Here $E = A - CUR$ is the residual error matrix.
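To make the shapes of C, U, R, and E concrete, the sketch below assembles a CUR factorization from given index sets. Two choices here are assumptions rather than part of the definition: the indices are drawn uniformly at random purely as a placeholder, and the intersection matrix is taken to be $U = C^+ A R^+$, which minimizes $\|A - CUR\|_F$ for fixed C and R.

```python
import numpy as np

def cur_from_indices(A, col_idx, row_idx):
    """Build C, U, R from chosen column and row indices, with U = C^+ A R^+."""
    C = A[:, col_idx]                               # m x c, actual columns of A
    R = A[row_idx, :]                               # r x n, actual rows of A
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)   # c x r intersection matrix
    return C, U, R

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 30))
col_idx = rng.choice(30, size=6, replace=False)     # uniform selection, a placeholder
row_idx = rng.choice(40, size=6, replace=False)

C, U, R = cur_from_indices(A, col_idx, row_idx)
E = A - C @ U @ R                                   # residual error matrix
```

The quality of the approximation depends almost entirely on how the columns and rows are selected, which is the subject of the algorithms that follow.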
The CUR Problem
Definition (The CUR Decomposition)
Given an m × n matrix A of rank ρ, a rank parameter k, and an accuracy parameter $\varepsilon \in (0, 1)$, construct a matrix $C \in \mathbb{R}^{m \times c}$ with c columns from A, $R \in \mathbb{R}^{r \times n}$ with r rows from A, and $U \in \mathbb{R}^{c \times r}$, with c, r, and rank(U) being as small as possible, such that
$$\|A - CUR\|_F^2 \le (1 + \varepsilon) \|A - A_k\|_F^2.$$
Here $A_k = U_k \Sigma_k V_k^T \in \mathbb{R}^{m \times n}$ is the best rank-k matrix obtained via the SVD of A: $A = U \Sigma V^T$.
The Subspace Sampling CUR Algorithm
Drineas et al. (2008) proposed a two-stage randomized CUR algorithm called Subspace Sampling.
- The first stage samples c columns of A to construct C, with sampling probabilities proportional to the squared ℓ2-norms of the rows of $V_k$.
- The second stage samples r rows from A and C simultaneously to construct R and W, and sets $U = W^\dagger$. The sampling probabilities in this stage are proportional to the leverage scores of A and C, respectively.
The Subspace Sampling CUR Algorithm
Lemma (Drineas et al., 2008)
Given an m × n matrix A and a target rank $k \ll \min\{m, n\}$, the subspace sampling algorithm selects $c = O(k \varepsilon^{-2} \log k \log(1/\delta))$ columns and $r = O(c \varepsilon^{-2} \log c \log(1/\delta))$ rows without replacement. Then
$$\|A - CUR\|_F = \|A - C W^+ R\|_F \le (1 + \varepsilon) \|A - A_k\|_F$$
holds with probability at least $1 - \delta$, where W contains the rows of C with scaling. The running time is dominated by the truncated SVD of A, that is, O(mnk).
The Adaptive Sampling CUR Algorithm
Wang and Zhang (2013) proposed an Adaptive Sampling CUR Algorithm.
- Select $c = \frac{2k}{\varepsilon}\big(1 + o(1)\big)$ columns of A to construct $C \in \mathbb{R}^{m \times c}$ using the algorithm of Boutsidis et al. (2013);
- Select $r_1 = c$ rows of A to construct $R_1 \in \mathbb{R}^{r_1 \times n}$ using the algorithm of Boutsidis et al. (2013);
- Adaptively sample $r_2 = c / \varepsilon$ rows from A according to the residual $A - A R_1^\dagger R_1$;
- Return C, $R = [R_1^T, R_2^T]^T$, and $U = C^\dagger A R^\dagger$.
The Adaptive Sampling CUR Algorithm
Lemma (Wang and Zhang, 2013)
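Given a matrix $A \in \mathbb{R}^{m \times n}$ and a matrix $C \in \mathbb{R}^{m \times c}$ such that $\mathrm{rank}(C) = \mathrm{rank}(C C^\dagger A) = \rho$ ($\rho \le c \le n$), let $R_1 \in \mathbb{R}^{r_1 \times n}$ consist of $r_1$ rows of A and define the residual $B = A - A R_1^\dagger R_1$. Additionally, for $i = 1, \ldots, m$, we define
$$p_i = \|b^{(i)}\|_2^2 / \|B\|_F^2,$$
where $b^{(i)}$ is the i-th row of B. We further sample $r_2$ rows i.i.d. from A, in each trial of which the i-th row is chosen with probability $p_i$. Let $R_2 \in \mathbb{R}^{r_2 \times n}$ contain the $r_2$ sampled rows and let $R = [R_1^T, R_2^T]^T \in \mathbb{R}^{(r_1 + r_2) \times n}$. Then we have
$$\mathbb{E}\|A - C C^\dagger A R^\dagger R\|_F^2 \le \|A - C C^\dagger A\|_F^2 + \frac{\rho}{r_2} \|A - A R_1^\dagger R_1\|_F^2,$$
where the expectation is taken w.r.t. $R_2$.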
The Adaptive Sampling CUR Algorithm
Theorem (Wang and Zhang, 2013)
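Given a matrix $A \in \mathbb{R}^{m \times n}$ and a positive integer $k \ll \min\{m, n\}$, the Adaptive Sampling CUR algorithm randomly selects $c = \frac{2k}{\varepsilon}(1 + o(1))$ columns of A to construct $C \in \mathbb{R}^{m \times c}$, and then selects $r = \frac{c}{\varepsilon}(1 + \varepsilon)$ rows of A to construct $R \in \mathbb{R}^{r \times n}$. Then we have
$$\mathbb{E}\|A - CUR\|_F = \mathbb{E}\|A - C (C^\dagger A R^\dagger) R\|_F \le (1 + \varepsilon) \|A - A_k\|_F.$$
The algorithm costs time
$$O\big((m + n) k^3 \varepsilon^{-2/3} + m k^2 \varepsilon^{-2} + n k^2 \varepsilon^{-4}\big) + T_{\mathrm{Multiply}}\big(m n k \varepsilon^{-1}\big)$$
to compute the matrices C, U, and R.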
Optimal CUR Algorithm
Boutsidis and Woodruff (2014) proposed the Optimal CUR Algorithm.
Construction of C with $O(k + k/\varepsilon)$ columns:
- Compute the top k singular vectors of A: $Z_1$
- Sample $O(k \log k)$ columns from $Z_1^T$ with the leverage scores
- Down-sample to $c_1 = O(k)$ columns with the sampling algorithm of Boutsidis et al. (2013)
- Adaptively sample $c_2 = O(k/\varepsilon)$ columns of A
Construction of R with $O(k + k/\varepsilon)$ rows:
- Find $Z_2$ in the span of C such that $\|A - Z_2 Z_2^T A\|_F^2 \le (1 + \varepsilon) \cdot \|A - A_k\|_F^2$
- Sample $O(k \log k)$ rows from $Z_2$ with the leverage scores
- Down-sample to $r_1 = O(k)$ rows with the sampling algorithm of Boutsidis et al. (2013)
- Sample $r_2 = O(k/\varepsilon)$ rows with adaptive sampling
Optimal CUR Algorithm
Lemma (Boutsidis and Woodruff, 2014)
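Given a matrix $A \in \mathbb{R}^{m \times n}$, $V \in \mathbb{R}^{m \times c}$, and an integer k, let $V = Y \Psi$ be a QR decomposition of V, $\Gamma = Y^T A$, and $\Gamma_k = \Delta \Sigma_k V_k^T$ be a rank-k SVD of Γ, with $\Delta \in \mathbb{R}^{c \times k}$. Then $Y \Delta \Delta^T Y^T$ satisfies
$$\|A - Y \Delta \Delta^T Y^T A\|_F^2 \le \|A - Y \Delta \Sigma_k V_k^T\|_F^2 = \|A - \Pi^F_{V,k}(A)\|_F^2.$$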
Optimal CUR Algorithm
Theorem (Boutsidis and Woodruff, 2014)
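Given a matrix $A \in \mathbb{R}^{m \times n}$ of rank ρ, a target rank $1 \le k \le \rho$, and $0 < \varepsilon < 1$, the optimal CUR algorithm selects at most $c = O(k/\varepsilon)$ columns and at most $r = O(k/\varepsilon)$ rows from A to form matrices $C \in \mathbb{R}^{m \times c}$, $R \in \mathbb{R}^{r \times n}$, and $U \in \mathbb{R}^{c \times r}$ with rank(U) = k such that, with some probability,
$$\|A - CUR\|_F^2 \le (1 + O(\varepsilon)) \|A - A_k\|_F^2.$$
The matrices C, U, and R can be computed in time
$$O\big[\mathrm{nnz}(A) \log n + (m + n) \times \mathrm{poly}(\log n, k, 1/\varepsilon)\big].$$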
The Nyström Method
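Random Selection: select c (≪ n) columns of K to construct C using some randomized algorithm. After permutation we have
$$K = \begin{bmatrix} W & K_{21}^T \\ K_{21} & K_{22} \end{bmatrix}, \qquad C = \begin{bmatrix} W \\ K_{21} \end{bmatrix}.$$
The Nyström Approximation: $K_c^{\mathrm{nys}} \approx K$, where
$$\underbrace{K_c^{\mathrm{nys}}}_{n \times n} = \underbrace{C}_{n \times c} \, \underbrace{W^\dagger}_{c \times c} \, \underbrace{C^T}_{c \times n}.$$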
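A minimal sketch of the Nyström approximation follows. The RBF kernel, its bandwidth, the problem sizes, and the use of uniform sampling without replacement are assumptions made for the example, and `nystrom` is a hypothetical helper name; no permutation is needed because W is read off by indexing.

```python
import numpy as np

def nystrom(K, idx):
    """Nystrom approximation C W^+ C^T built from the columns of K indexed by idx."""
    C = K[:, idx]                      # n x c sampled columns
    W = K[np.ix_(idx, idx)]            # c x c intersection block
    return C @ np.linalg.pinv(W) @ C.T

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
sq = np.sum(X ** 2, axis=1)
d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
K = np.exp(-0.5 * d2)                  # RBF kernel matrix (assumed kernel)

c = 20
idx = rng.choice(100, size=c, replace=False)   # uniform column selection
K_nys = nystrom(K, idx)
rel_err = np.linalg.norm(K - K_nys) / np.linalg.norm(K)
```

Only the n × c block C and the c × c block W are needed to form the approximation, which is the source of the method's savings; the full K is used here only to measure the error.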
The Nyström Approximation
The Nyström Approximation:
$$K \approx K_c^{\mathrm{nys}} = C W^\dagger C^T$$
(a low-rank factorization).
[Figure: the n × n Nyström approximation drawn as the product of an n × c, a c × c, and a c × n matrix.]
Problem Formulation
Problem:
- How to select informative columns of $K \in \mathbb{R}^{n \times n}$ to construct $C \in \mathbb{R}^{n \times c}$?
- The approximation error $\|K - C U C^T\|_F$ or $\|K - C U C^T\|_2$ should be as small as possible.
Criterion: Upper Error Bounds
- Use approximation algorithms to find c good columns (not necessarily the best).
- Hope that the ratio $\frac{\|K - C U C^T\|_F}{\|K - K_k\|_F}$ has an upper bound; the smaller the bound, the better.
Uniform Sampling: The Simplest Algorithm
Sample c columns of K uniformly at random to construct C.
The simplest, but the most widely used.
Adaptive Sampling
The adaptive sampling algorithm [Deshpande et al., 2006]:
1 Sample $c_1$ columns of K to construct $C_1$ using some algorithm;
2 Compute the residual $B = K - C_1 C_1^\dagger K$;
3 Compute the sampling probabilities $p_i = \|b_i\|_2^2 / \|B\|_F^2$ for i = 1 to n;
4 Sample a further $c_2$ columns of K in $c_2$ i.i.d. trials, where in each trial the i-th column is chosen with probability $p_i$; denote the selected columns by $C_2$;
5 Return $C = [C_1, C_2]$.
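The five steps above can be sketched as follows. The initial columns are chosen uniformly at random as a stand-in for "some algorithm", the test matrix is a synthetic positive semidefinite matrix, and the function returns column indices rather than the matrix C; all of these are assumptions made for illustration.

```python
import numpy as np

def adaptive_sampling(K, idx1, c2, rng):
    """Given initial column indices idx1, sample c2 more columns by residual norms."""
    C1 = K[:, idx1]
    B = K - C1 @ (np.linalg.pinv(C1) @ K)            # residual B = K - C1 C1^+ K
    p = np.sum(B ** 2, axis=0) / np.sum(B ** 2)      # p_i = ||b_i||^2 / ||B||_F^2
    idx2 = rng.choice(K.shape[1], size=c2, replace=True, p=p)   # c2 i.i.d. trials
    return np.concatenate([idx1, idx2]), p

rng = np.random.default_rng(0)
G = rng.standard_normal((60, 60))
K = G @ G.T                                          # synthetic PSD matrix
idx1 = rng.choice(60, size=5, replace=False)         # step 1: uniform, a placeholder
idx, p = adaptive_sampling(K, idx1, c2=10, rng=rng)
C = K[:, idx]                                        # C = [C1, C2]
```

Columns already in $C_1$ have zero residual and are therefore (up to rounding) never re-drawn, so the second stage concentrates on directions that $C_1$ fails to capture.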
Adaptive Sampling
- The error term $\|K - C C^\dagger K\|_F$ is bounded theoretically, but $\|K - C W^\dagger C^T\|_F$ is not.
- Empirically, the adaptive sampling algorithm works very well.
Better Sampling Algorithms?
- We hope $\frac{\|K - C W^\dagger C^T\|_F}{\|K - K_k\|_F}$ will be very small if the column sampling algorithm is good enough.
- But it cannot be arbitrarily small.
Lower Error Bound
Theorem (Wang & Zhang, JMLR 2013)
Whatever column sampling algorithm is used to select c columns, there exists a bad case K such that
$$\frac{\|K - C W^\dagger C^T\|_F^2}{\|K - K_k\|_F^2} \ge \Omega\Big(1 + \frac{nk}{c^2}\Big).$$
Different Types of Low-Rank Approximation?
The Ensemble Nyström Method [Kumar et al., JMLR 2012]:
$$K \approx \sum_{i=1}^{t} \frac{1}{t}\, C^{(i)} W^{(i)\dagger} C^{(i)T}.$$
It does not improve the lower error bound.
Lower Error Bound
Theorem (Wang & Zhang, JMLR 2013)
Whatever column sampling algorithm is used to select $c$ columns, there exists a bad case $K$ such that
$$\frac{\left\|K - \sum_{i=1}^{t} \frac{1}{t}\, C^{(i)} W^{(i)\dagger} C^{(i)T}\right\|_F^2}{\|K - K_k\|_F^2} \ge \Omega\left(1 + \frac{nk}{c^2}\right).$$
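The ensemble formula with equal weights $1/t$ can be sketched as below. The sketch assumes each of the $t$ samples is drawn uniformly without replacement and uses a pseudoinverse cutoff `rcond=1e-10`; both choices, and the function name, are assumptions made for illustration.

```python
import numpy as np


def ensemble_nystrom(K, c, t, seed=None, rcond=1e-10):
    """Average t standard Nystrom approximations, each built from c columns."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    approx = np.zeros_like(K, dtype=float)
    for _ in range(t):
        # Assumption: every ensemble member uses uniform column sampling.
        idx = rng.choice(n, size=c, replace=False)
        C = K[:, idx]                  # n x c sampled columns
        W = K[np.ix_(idx, idx)]        # c x c intersection block
        approx += (C @ np.linalg.pinv(W, rcond=rcond) @ C.T) / t
    return approx
```

Each ensemble member is an ordinary standard Nyström approximation, which is why the lower bound above carries over unchanged.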
The Modified Nyström Method
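The Modified Nyström Method [Wang & Zhang, JMLR 2013]:
$$K \approx C \big(\underbrace{C^\dagger K (C^\dagger)^T}_{c \times c}\big) C^T.$$
Theorem (Wang & Zhang, JMLR 2013)
Using a column sampling algorithm, the error incurred by the modified Nyström method satisfies
$$\mathbb{E}\,\frac{\left\|K - C \big(C^\dagger K (C^\dagger)^T\big) C^T\right\|_F^2}{\|K - K_k\|_F^2} \le 1 + \sqrt{\frac{k}{c}}.$$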
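To make the two intersection matrices concrete, the sketch below computes $W^\dagger$ and $C^\dagger K (C^\dagger)^T$ for the same set of sampled columns and compares their Frobenius errors. Function names, the RBF test kernel, and the cutoff `rcond=1e-10` are assumptions for illustration. Because $C^\dagger K (C^\dagger)^T$ minimizes $\|K - CUC^T\|_F$ over all $c \times c$ matrices $U$, the modified error should never exceed the standard one for a fixed $C$.

```python
import numpy as np


def nystrom_intersections(K, idx, rcond=1e-10):
    """Return C and the standard and modified intersection matrices."""
    C = K[:, idx]
    W = K[np.ix_(idx, idx)]
    U_nys = np.linalg.pinv(W, rcond=rcond)       # standard: W^+
    C_pinv = np.linalg.pinv(C, rcond=rcond)
    U_mod = C_pinv @ K @ C_pinv.T                # modified: C^+ K (C^+)^T
    return C, U_nys, U_mod


def frobenius_error(K, C, U):
    """Frobenius norm of the residual K - C U C^T."""
    return np.linalg.norm(K - C @ U @ C.T, "fro")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 5))
    sq = np.sum(X ** 2, axis=1)
    # Hypothetical test kernel: RBF with unit bandwidth.
    K = np.exp(-0.5 * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    idx = rng.choice(200, size=20, replace=False)
    C, U_nys, U_mod = nystrom_intersections(K, idx)
    print("standard:", frobenius_error(K, C, U_nys))
    print("modified:", frobenius_error(K, C, U_mod))
```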
Comparisons between the Two Methods
The Standard Nyström Method: fast. It costs only $T_{\mathrm{SVD}}(c^3)$ time to compute the intersection matrix $U_{\mathrm{nys}} = W^\dagger$.
The Modified Nyström Method: slow. It costs $T_{\mathrm{SVD}}(nc^2) + T_{\mathrm{Multiply}}(n^2 c)$ time to compute the intersection matrix $U_{\mathrm{mod}} = C^\dagger K (C^\dagger)^T$ naively.
Comparisons between the Two Methods
The Standard Nyström Method: inaccurate. It cannot attain a $1 + \epsilon$ Frobenius relative-error bound unless
$$c \ge \sqrt{nk/\epsilon}$$
columns are selected, whatever column selection algorithm is used. (Due to its lower error bound.)
The Modified Nyström Method: accurate. Some adaptive sampling based algorithms attain a $1 + \epsilon$ Frobenius relative-error bound when
$$c = O(k/\epsilon^2).$$
(The smaller $c$ is, the better.)
Comparisons between the Two Methods
Theorem (Exact Recovery)
For the symmetric matrix $K$ defined previously, the following three statements are equivalent:
1 $\mathrm{rank}(W) = \mathrm{rank}(K)$;
2 $K = CW^\dagger C^T$ (i.e., the standard Nyström method is exact);
3 $K = C \big(C^\dagger K (C^\dagger)^T\big) C^T$ (i.e., the modified Nyström method is exact).
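The equivalence can be checked numerically on a small example. The sketch below is only an illustration: the helper names are hypothetical, and ranks are computed with a relative singular-value threshold of `1e-10`, which is an assumption about what counts as numerically zero.

```python
import numpy as np


def numerical_rank(A, rtol=1e-10):
    """Count singular values above rtol times the largest one."""
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > rtol * s[0]))


def exact_recovery_check(K, idx, rtol=1e-10):
    """Evaluate the three statements of the theorem for the columns idx."""
    C = K[:, idx]
    W = K[np.ix_(idx, idx)]
    K_nys = C @ np.linalg.pinv(W, rcond=rtol) @ C.T
    C_pinv = np.linalg.pinv(C, rcond=rtol)
    K_mod = C @ (C_pinv @ K @ C_pinv.T) @ C.T
    return {
        "rank_equal": numerical_rank(W, rtol) == numerical_rank(K, rtol),
        "nystrom_exact": bool(np.allclose(K_nys, K)),
        "modified_exact": bool(np.allclose(K_mod, K)),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((30, 3))
    K = X @ X.T  # rank-3 symmetric matrix
    print(exact_recovery_check(K, np.arange(6)))
```

On a rank-3 matrix with six sampled columns all three entries should agree; on a full-rank matrix with fewer columns than rows they should all be false.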
References
Santosh S. Vempala. The Random Projection Method. American Mathematical Society, 2000.
Michael W. Mahoney. Randomized Algorithms for Matrices and Data. Foundations and Trends in Machine Learning, 3(2): 123-224, 2011.
N. Halko, P. G. Martinsson, and J. A. Tropp. Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions. SIAM Review, 53(2): 217-288, 2011.
W. B. Johnson and J. Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 1984.
S. Dasgupta and A. Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures & Algorithms, 2003.
J. Matoušek. On variants of the Johnson-Lindenstrauss lemma. Random Structures & Algorithms, 2008.
A. Dasgupta, R. Kumar, and T. Sarlós. A sparse Johnson-Lindenstrauss Transform. In STOC, 2010.
K. L. Clarkson and D. P. Woodruff. Low Rank Approximation and Regression in Input Sparsity Time. In STOC, 2013.
X. Meng and M. W. Mahoney. Low-distortion subspace embeddings in input-sparsity time and applications to robust linear regression. In STOC, 2013.
J. Nelson and H. L. Nguyễn. OSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings. In FOCS, 2013.
J. Batson, D. Spielman, and N. Srivastava. Twice-Ramanujan Sparsifiers. SIAM Review, 2014.
A. Frieze, R. Kannan, and S. Vempala. Fast Monte-Carlo algorithms for finding low-rank approximations. In FOCS, 1998. Journal of the ACM, 2004.
A. Deshpande, L. Rademacher, S. Vempala, and G. Wang. Matrix approximation and projective clustering via volume sampling. Theory of Computing, 2006.
C. Boutsidis, P. Drineas, and M. Magdon-Ismail. Near optimal column-based matrix reconstruction. SIAM Journal on Computing, 2013.
V. Guruswami and A. K. Sinop. Optimal column based low-rank matrix reconstruction. In SODA, 2012.
P. Drineas, M. W. Mahoney, and S. Muthukrishnan. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 2008.
M. W. Mahoney and P. Drineas. CUR matrix decompositions for improved data analysis. Proceedings of the National Academy of Sciences, 2009.
Shusen Wang and Zhihua Zhang. Improving CUR matrix decomposition and the Nyström approximation via adaptive sampling. JMLR, 2013.
C. Boutsidis and D. P. Woodruff. Optimal CUR matrix decompositions. In STOC, 2014.
S. Kumar, M. Mohri, and A. Talwalkar. Sampling methods for the Nyström method. JMLR, 2012.