CSCI B609: "Foundations of Data Science"
Grigory Yaroslavtsev http://grigory.us
Lecture 8/9: Faster Power Method and Applications of SVD
Slides at http://grigory.us/data-science-class.html
Faster Power Method
• PM drawback: $B = A^T A$ is dense even for sparse $A$
• Pick a random Gaussian $x$ and compute $B^k x$
• $x = \sum_{i=1}^{d} c_i v_i$ (augment the $v_i$'s to an o.n.b. if $r < d$)
• $B^k x = \sum_{i=1}^{d} \sigma_i^{2k} c_i v_i \approx \sigma_1^{2k} c_1 v_1$
• $B^k x = (A^T A)(A^T A) \cdots (A^T A)\, x$: multiply right to left, alternating $A$ and $A^T$, so $A^T A$ is never formed (sketch below)
• Theorem: If $x$ is a unit $\mathbb{R}^d$-vector with $|x^T v_1| \ge \delta$:
  − $V$ = subspace spanned by the $v_i$'s with $\sigma_i \ge (1 - \epsilon)\sigma_1$
  − $w$ = unit vector after $k = \frac{1}{2\epsilon} \ln \frac{1}{\epsilon\delta}$ iterations of PM
  − $w$ has a component of at most $\epsilon$ orthogonal to $V$
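A minimal numpy sketch of this idea (the function name and parameters are mine, not from the slides): $B^k x$ is computed by alternating multiplications with $A$ and $A^T$, so the dense matrix $A^T A$ is never materialized.

```python
import numpy as np

def faster_power_method(A, k, rng=None):
    """Approximate the top right singular vector v_1 of A.

    Computes B^k x for B = A^T A without forming A^T A: each
    iteration multiplies by A and then by A^T.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = A.shape[1]
    x = rng.standard_normal(d)        # random Gaussian start vector
    for _ in range(k):
        x = A.T @ (A @ x)             # one application of B = A^T A
        x /= np.linalg.norm(x)        # renormalize to avoid overflow
    return x

# Tiny usage example: compare against the top right singular vector from a full SVD.
A = np.random.default_rng(1).standard_normal((50, 20))
w = faster_power_method(A, k=100)
v1 = np.linalg.svd(A)[2][0]
print(abs(w @ v1))                    # close to 1
```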
Faster Power Method: Analysis
• $A = \sum_{i=1}^{r} \sigma_i u_i v_i^T$ and $x = \sum_{i=1}^{d} c_i v_i$
• $B^k x = \left( \sum_{i=1}^{r} \sigma_i^{2k} v_i v_i^T \right) \left( \sum_{j=1}^{d} c_j v_j \right) = \sum_{i=1}^{d} \sigma_i^{2k} c_i v_i$
• $|B^k x|_2^2 = \sum_{i=1}^{d} \sigma_i^{4k} c_i^2 \ge \sigma_1^{4k} c_1^2 \ge \sigma_1^{4k} \delta^2$
• (Squared) component of $B^k x$ orthogonal to $V$ is
  $\sum_{i=m+1}^{d} \sigma_i^{4k} c_i^2 \le (1-\epsilon)^{4k} \sigma_1^{4k} \sum_{i=m+1}^{d} c_i^2 \le (1-\epsilon)^{4k} \sigma_1^{4k}$,
  where $m$ is the number of singular values $\sigma_i \ge (1-\epsilon)\sigma_1$
• Component of $w$ orthogonal to $V$ is $\le (1-\epsilon)^{2k}/\delta \le \epsilon$
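To see the last inequality, note that dividing the orthogonal component of $B^k x$ by $|B^k x|_2 \ge \sigma_1^{2k}\delta$ gives the $(1-\epsilon)^{2k}/\delta$ bound; then plug in the theorem's choice $k = \frac{1}{2\epsilon}\ln\frac{1}{\epsilon\delta}$ and use $1-\epsilon \le e^{-\epsilon}$:

$$\frac{(1-\epsilon)^{2k}}{\delta} \;\le\; \frac{e^{-2\epsilon k}}{\delta} \;=\; \frac{1}{\delta}\exp\!\Big(-\ln\tfrac{1}{\epsilon\delta}\Big) \;=\; \frac{\epsilon\delta}{\delta} \;=\; \epsilon.$$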
Choice of $x$
• $y$ = random spherical Gaussian with unit variance; set $x = y / |y|$:
  $\Pr\left[ |x^T v_1| \le \frac{1}{20\sqrt{d}} \right] \le \frac{1}{10} + 3e^{-d/64}$
• $\Pr\left[ |y| \ge 2\sqrt{d} \right] \le 3e^{-d/64}$ (Gaussian Annulus)
• $y^T v_1 \sim N(0,1) \Rightarrow \Pr\left[ |y^T v_1| \le \frac{1}{10} \right] \le \frac{1}{10}$
• Can set $\delta = \frac{1}{20\sqrt{d}}$ in the "faster power method" (empirical check below)
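A quick empirical check of this choice (the dimension, trial count, and the stand-in for $v_1$ are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
d, trials = 200, 10_000

v1 = np.zeros(d)
v1[0] = 1.0                                   # any fixed unit vector plays the role of v_1

y = rng.standard_normal((trials, d))          # unit-variance spherical Gaussians
x = y / np.linalg.norm(y, axis=1, keepdims=True)
bad = np.abs(x @ v1) <= 1 / (20 * np.sqrt(d))
print(bad.mean())                             # empirical failure rate, well below 1/10 + 3e^{-d/64}
```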
Singular Vectors and Eigenvectors
• Right singular vectors are eigenvectors of $A^T A$
• $\sigma_i^2$ are the eigenvalues of $A^T A$
• Left singular vectors are eigenvectors of $A A^T$
• $B = A^T A$ satisfies $\forall x\colon x^T B x \ge 0$
  − $B = \sum_i \sigma_i^2 v_i v_i^T$
  − $\forall x\colon x^T v_i v_i^T x = (x^T v_i)^2 \ge 0$
  − Such matrices are called positive semi-definite
• Any p.s.d. matrix can be decomposed as $A^T A$
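A small numpy check of these facts (the random matrix sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))

# Squared singular values of A vs. eigenvalues of A^T A (both sorted descending).
sigma = np.linalg.svd(A, compute_uv=False)
eigvals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
print(np.allclose(sigma**2, eigvals))         # True

# A^T A is positive semi-definite: x^T (A^T A) x = |Ax|^2 >= 0 for every x.
x = rng.standard_normal(10)
print(x @ (A.T @ A) @ x >= 0)                 # True
```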
Application of SVD: Centering Data
• Minimize the sum of squared distances from the $\mathbf{a}_i$ to $S_k$
• SVD gives the best fitting $S_k$ if the data is centered
• What if not?
• Thm. The $S_k$ that minimizes the squared distance goes through the centroid of the point set: $\mu = \frac{1}{n} \sum_{i=1}^{n} \mathbf{a}_i$
• Will only prove for $k = 1$; the proof for arbitrary $k$ is analogous (see textbook)
Application of SVD: Centering Data
• Thm. The line that minimizes the squared distance goes through the centroid
• Line: $\ell = \{a + \lambda v\}$ with $|v| = 1$ and (WLOG) $a \perp v$; distance $dist(\mathbf{a}_i, \ell)$
• $|\mathbf{a}_i - a|_2^2 = dist(\mathbf{a}_i, \ell)^2 + \langle v, \mathbf{a}_i \rangle^2$
• Center so that $\sum_{i=1}^{n} \mathbf{a}_i = 0$ by subtracting the centroid
• $\sum_{i=1}^{n} dist(\mathbf{a}_i, \ell)^2 = \sum_{i=1}^{n} \left( |\mathbf{a}_i - a|_2^2 - \langle v, \mathbf{a}_i \rangle^2 \right)$
  $= \sum_{i=1}^{n} \left( |\mathbf{a}_i|_2^2 + |a|_2^2 - 2\langle \mathbf{a}_i, a \rangle - \langle v, \mathbf{a}_i \rangle^2 \right)$
  $= \sum_{i=1}^{n} |\mathbf{a}_i|_2^2 + n|a|_2^2 - 2\left\langle \sum_{i=1}^{n} \mathbf{a}_i, a \right\rangle - \sum_{i=1}^{n} \langle v, \mathbf{a}_i \rangle^2$
  $= \sum_{i=1}^{n} |\mathbf{a}_i|_2^2 + n|a|_2^2 - \sum_{i=1}^{n} \langle v, \mathbf{a}_i \rangle^2$
• Minimized when $a = 0$ (numerical check below)
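A numerical illustration of the theorem (synthetic data; the helper `ssd` is mine): the best-fit direction comes from the SVD of the centered points, and placing the line through the centroid gives a smaller sum of squared distances than the same direction through the origin.

```python
import numpy as np

rng = np.random.default_rng(0)
# Points roughly along a line, far from the origin.
t = rng.standard_normal(100)
pts = np.outer(t, [3.0, 1.0]) + np.array([10.0, -5.0]) + 0.1 * rng.standard_normal((100, 2))

centroid = pts.mean(axis=0)
v = np.linalg.svd(pts - centroid)[2][0]       # best-fit direction from the centered SVD

def ssd(points, a, v):
    """Sum of squared distances from the points to the line {a + lambda * v}."""
    diff = points - a
    proj = np.outer(diff @ v, v)
    return np.sum((diff - proj) ** 2)

print(ssd(pts, centroid, v))                  # line through the centroid: smallest
print(ssd(pts, np.zeros(2), v))               # same direction through the origin: larger
```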
Principal Component Analysis
• $n \times d$ matrix: customers × movies preference matrix
• $n$ = #customers, $d$ = #movies
• $A_{ij}$ = how much customer $i$ likes movie $j$
• Assumption: $A_{ij}$ can be described with $k$ factors
  − Customers and movies: vectors $u_i$ and $v_j \in \mathbb{R}^k$
  − $A_{ij} = \langle u_i, v_j \rangle$
• Solution: the rank-$k$ SVD approximation $A_k$ (sketch below)
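A short numpy sketch of this recipe (the synthetic preference matrix and sizes are illustrative):

```python
import numpy as np

def best_rank_k(A, k):
    """Best rank-k approximation A_k of A via the truncated SVD."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]

rng = np.random.default_rng(0)
# Synthetic preference matrix generated from k latent factors plus a little noise.
n, d, k = 200, 50, 3
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, d)) + 0.01 * rng.standard_normal((n, d))

A_k = best_rank_k(A, k)
print(np.linalg.norm(A - A_k) / np.linalg.norm(A))   # small relative error
```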
Class Project
• Survey of 3-5 research papers
  − Closely related to the topics of the class
    • Algorithms for high-dimensional data
    • Fast algorithms for numerical linear algebra
    • Algorithms for machine learning and/or clustering
    • Algorithms for streaming and massive data
  − Office hours if you need suggestions
  − Individual (not a group) project
  − 1-page Proposal Due: October 31, 2016 at 23:59 EST
  − Final Deadline: December 09, 2016 at 23:59 EST
• Submission by e-mail to Lisul Islam (IU id: islammdl)
  − Submission Email Title: Project + Space + "Your Name"
  − Submission format: PDF from LaTeX
Separating mixture of $k$ Gaussians
• Sample origin problem:
  − Given samples from $k$ well-separated spherical Gaussians
  − Q: Did they come from the same Gaussian?
• $\Delta$ = distance between centers
• For two Gaussians, naïve separation requires $\Delta = \omega(d^{1/4})$
• Thm. $\Delta = \Omega(k^{1/4})$ suffices
• Idea:
  − Project on a $k$-dimensional subspace through the centers
  − Key fact: this subspace can be found via SVD
  − Apply the naïve algorithm (sketch below)
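A minimal numpy sketch of the three-step idea above for two Gaussians (the dimension, sample counts, and separation value are illustrative): project the samples onto the top-$k$ right singular subspace and compare pairwise distances there.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n_per, sep = 500, 2, 200, 8.0           # dimension, #Gaussians, samples per Gaussian, center separation

centers = np.zeros((k, d))
centers[1, 0] = sep
X = np.vstack([rng.standard_normal((n_per, d)) + c for c in centers])

# Top-k right singular subspace of the sample matrix approximates the span of the centers.
Vk = np.linalg.svd(X, full_matrices=False)[2][:k]
Y = X @ Vk.T                                  # project samples to k dimensions

same = np.linalg.norm(Y[0] - Y[1])            # two samples from the same Gaussian: ~sqrt(2k)
diff = np.linalg.norm(Y[0] - Y[n_per])        # samples from different Gaussians: ~sqrt(2k + sep^2)
print(same, diff)
```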
Separating mixture of $k$ Gaussians
• Easy fact: Projection preserves the property of being a unit-variance spherical Gaussian
• Def. If $p$ is a probability distribution, the best fit line $\{\lambda v : \lambda \in \mathbb{R}\}$ is given by:
  $v = \arg\max_{|v| = 1} E_{x \sim p}\left[ (v^T x)^2 \right]$
• Thm: The best fit line for a Gaussian centered at $\mu$ passes through $\mu$ and the origin
Best fit line for a Gaussian
• Thm: The best fit line for a Gaussian centered at $\mu$ passes through $\mu$ and the origin
• $E_{x \sim p}\left[ (v^T x)^2 \right] = E_{x \sim p}\left[ \left( v^T (x - \mu) + v^T \mu \right)^2 \right]$
  $= E_{x \sim p}\left[ (v^T (x - \mu))^2 + 2 (v^T \mu)\, v^T (x - \mu) + (v^T \mu)^2 \right]$
  $= E_{x \sim p}\left[ (v^T (x - \mu))^2 \right] + 2 (v^T \mu)\, E_{x \sim p}\left[ v^T (x - \mu) \right] + (v^T \mu)^2$
  $= E_{x \sim p}\left[ (v^T (x - \mu))^2 \right] + (v^T \mu)^2$
  $= \sigma^2 + (v^T \mu)^2$
• Where we used:
  − $E_{x \sim p}\left[ v^T (x - \mu) \right] = 0$
  − $E_{x \sim p}\left[ (v^T (x - \mu))^2 \right] = \sigma^2$
• Best fit line maximizes $(v^T \mu)^2$
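The step from "maximizes $(v^T \mu)^2$" to the theorem's conclusion is a one-line Cauchy-Schwarz argument, not spelled out on the slide:

$$(v^T \mu)^2 \le |v|^2\,|\mu|^2 = |\mu|^2, \qquad \text{with equality iff } v = \pm\,\mu/|\mu|,$$

so the maximizing line $\{\lambda v\}$ is exactly the line through the origin and $\mu$.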
Best fit subspace for one Gaussian
• Best fit $k$-dimensional subspace $V_k$:
  $V_k = \arg\max_{V : \dim(V) = k} E_{x \sim p}\left[ |proj(x, V)|_2^2 \right]$
• For a spherical Gaussian, $V$ is a best-fit $k$-dimensional subspace iff it contains $\mu$
• If $\mu = 0$ then any $k$-dim. subspace is a best fit
• If $\mu \ne 0$ then the best fit line $v$ goes through $\mu$
  − The same greedy process as in the SVD then projects onto the subspace orthogonal to $v$
  − After the projection we have a Gaussian with $\mu = 0$
  − Any $(k-1)$-dimensional subspace would do
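A short worked expansion behind the "iff it contains $\mu$" claim, using the previous slide and an orthonormal basis $b_1, \dots, b_k$ of $V$ (notation mine):

$$E_{x \sim p}\big[ |proj(x, V)|_2^2 \big] = \sum_{i=1}^{k} E_{x \sim p}\big[ (b_i^T x)^2 \big] = \sum_{i=1}^{k} \big( \sigma^2 + (b_i^T \mu)^2 \big) = k\sigma^2 + |proj(\mu, V)|_2^2,$$

which is maximized exactly when $|proj(\mu, V)| = |\mu|$, i.e. when $\mu \in V$.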
Best fit subspace for $k$ Gaussians
• Thm. If $p$ is a mixture of $k$ spherical Gaussians, then the best fit $k$-dim. subspace contains their centers
• $p = w_1 p_1 + w_2 p_2 + \dots + w_k p_k$
• Let $V$ be a subspace of dimension $\le k$:
  $E_{x \sim p}\left[ |proj(x, V)|_2^2 \right] = \sum_{i=1}^{k} w_i\, E_{x \sim p_i}\left[ |proj(x, V)|_2^2 \right]$
• Each term is maximized if $V$ contains the corresponding center $\mu_i$, so the sum is maximized if $V$ contains all the $\mu_i$'s
• If we only have a finite number of samples then the accuracy has to be analyzed carefully
HITS Algorithm for Hubs and Authorities
• Document ranking: project on the 1st singular vector
• WWW: directed graph with links = edges
• $n$ Authorities: pages containing original info
• $m$ Hubs: collections of links to authorities
  − Authority depends on the importance of the pointing hubs
  − Hub quality depends on how authoritative its links are
• Authority vector $v_j$, $j = 1, \dots, n$: $v_j \sim \sum_{i=1}^{m} u_i A_{ij}$
• Hub vector $u_i$, $i = 1, \dots, m$: $u_i \sim \sum_{j=1}^{n} A_{ij} v_j$
• Use the power method: $u = A v$, $v = A^T u$ (sketch below)
• Converges to the first left/right singular vectors
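A compact numpy sketch of the HITS iteration (the toy link graph is illustrative):

```python
import numpy as np

def hits(A, iters=100):
    """Hub and authority scores by power iteration; A[i, j] = 1 if hub i links to authority j."""
    u = np.ones(A.shape[0])                   # hub scores
    v = np.ones(A.shape[1])                   # authority scores
    for _ in range(iters):
        u = A @ v                             # hubs accumulate scores of authorities they link to
        u /= np.linalg.norm(u)
        v = A.T @ u                           # authorities accumulate scores of hubs pointing to them
        v /= np.linalg.norm(v)
    return u, v

# Toy link graph: 3 hub pages pointing at 4 authority pages.
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 0, 1, 1]], dtype=float)
hubs, auths = hits(A)
print(hubs, auths)                            # converge to the first left/right singular vectors of A
```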