
CSCI B609: "Foundations of Data Science"

Grigory Yaroslavtsev http://grigory.us

Lecture 8/9: Faster Power Method and Applications of SVD

Slides at http://grigory.us/data-science-class.html

Faster Power Method

• PM drawback: A^T A is dense even for sparse A

• Pick a random Gaussian x and compute B^k x, where B = A^T A:
  B^k x = (A^T A)(A^T A) … (A^T A) x

• x = Σ_{i=1}^d c_i v_i (augment the v_i's to an orthonormal basis if r < d)

• B^k x ≈ σ_1^{2k} v_1 v_1^T (Σ_{i=1}^d c_i v_i) = σ_1^{2k} c_1 v_1

• Theorem: If x is a unit vector in ℝ^d with |x^T v_1| ≥ δ:
  – V = subspace spanned by the v_i's with σ_i ≥ (1 − ε) σ_1
  – w = unit vector after k = (1/(2ε)) ln(1/(εδ)) iterations of PM
  ⇒ w has a component of at most ε orthogonal to V

Faster Power Method: Analysis

• A = Σ_{i=1}^r σ_i u_i v_i^T and x = Σ_{i=1}^d c_i v_i

• B^k x = (Σ_{i=1}^d σ_i^{2k} v_i v_i^T)(Σ_{j=1}^d c_j v_j) = Σ_{i=1}^d σ_i^{2k} c_i v_i

• ‖B^k x‖_2^2 = ‖Σ_{i=1}^d σ_i^{2k} c_i v_i‖_2^2 = Σ_{i=1}^d σ_i^{4k} c_i^2 ≥ σ_1^{4k} c_1^2 ≥ σ_1^{4k} δ^2

• (Squared) component orthogonal to V (with m = number of σ_i ≥ (1 − ε) σ_1) is
  Σ_{i=m+1}^d σ_i^{4k} c_i^2 ≤ (1 − ε)^{4k} σ_1^{4k} Σ_{i=m+1}^d c_i^2 ≤ (1 − ε)^{4k} σ_1^{4k}

• Component of w ⊥ V ≤ (1 − ε)^{2k}/δ ≤ e^{−2εk}/δ = ε for k = (1/(2ε)) ln(1/(εδ))
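A quick numeric sanity check of this bound (illustrative random matrix, assuming numpy): run the iteration for the k given by the theorem and verify that the component of w orthogonal to V is at most ε.

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((40, 20))
U, s, Vt = np.linalg.svd(A)

eps = 0.1
m = int(np.sum(s >= (1 - eps) * s[0]))   # V = span of v_1, ..., v_m

y = rng.standard_normal(20)
x = y / np.linalg.norm(y)                # random unit start vector
delta = abs(x @ Vt[0])
k = int(np.ceil(np.log(1 / (eps * delta)) / (2 * eps)))

w = x
for _ in range(k):                       # power method without forming B
    w = A.T @ (A @ w)
    w /= np.linalg.norm(w)

orth = np.linalg.norm(w - Vt[:m].T @ (Vt[:m] @ w))   # component outside V
print(orth, eps)                         # orth is at most eps, as promised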

Choice of x

• y = random spherical Gaussian with unit variance

• x = y/‖y‖_2 satisfies:
  Pr[|x^T v_1| ≤ 1/(20√d)] ≤ 1/10 + 3e^{−d/64}

• Pr[‖y‖_2 ≥ 2√d] ≤ 3e^{−d/64} (Gaussian Annulus)

• y^T v_1 ∼ N(0,1) ⇒ Pr[|y^T v_1| ≤ 1/10] ≤ 1/10

• Can set δ = 1/(20√d) in the "faster power method"
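A small Monte Carlo sketch of this bound (toy dimension, assuming numpy; taking v_1 to be a fixed coordinate vector is fine by rotational symmetry): estimate how often |x^T v_1| falls below 1/(20√d) and compare with the stated probability.

import numpy as np

rng = np.random.default_rng(0)
d, trials = 100, 10_000
v1 = np.zeros(d); v1[0] = 1.0              # stand-in for the top singular vector

y = rng.standard_normal((trials, d))       # spherical Gaussians
x = y / np.linalg.norm(y, axis=1, keepdims=True)
fail = np.mean(np.abs(x @ v1) <= 1 / (20 * np.sqrt(d)))
print(fail, 1 / 10 + 3 * np.exp(-d / 64))  # empirical failure rate vs. the bound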

Singular Vectors and Eigenvectors

• Right singular vectors are eigenvectors of A^T A

• σ_i^2 are the eigenvalues of A^T A

• Left singular vectors are eigenvectors of A A^T

• B = A^T A satisfies ∀x: x^T B x ≥ 0
  – B = Σ_i σ_i^2 v_i v_i^T
  – ∀x: x^T v_i v_i^T x = (x^T v_i)^2 ≥ 0
  – Such matrices are called positive semi-definite

• Any p.s.d. matrix B can be decomposed as B = A^T A
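A short numpy check of these facts (toy matrix, illustrative): the eigendecomposition of A^T A recovers the right singular vectors and the squared singular values, and A^T A is p.s.d.

import numpy as np

A = np.array([[1., 2.], [3., 4.], [5., 6.]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)

eigvals, eigvecs = np.linalg.eigh(A.T @ A)            # ascending order
print(np.allclose(np.sort(s**2), eigvals))            # eigenvalues are sigma_i^2
print(np.allclose(np.abs(eigvecs[:, ::-1].T), np.abs(Vt)))  # same vectors up to sign

x = np.random.default_rng(0).standard_normal(2)
print(x @ (A.T @ A) @ x >= 0)    # p.s.d.: x^T (A^T A) x = ||Ax||^2 >= 0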

Application of SVD: Centering Data

• Minimize the sum of squared distances from the points A_i to S_k

• SVD: best fitting S_k if the data is centered

• What if not?

• Thm. The S_k that minimizes squared distance goes through the centroid of the point set: (1/n) Σ_i A_i

• Will only prove for k = 1; the proof for arbitrary k is analogous (see textbook)

Application of SVD: Centering Data

• Thm. The line that minimizes squared distance goes through the centroid

• Line: ℓ = {a + λv}; distance dist(A_i, ℓ); WLOG choose a ⊥ v

• ‖A_i − a‖_2^2 = dist(A_i, ℓ)^2 + ⟨v, A_i⟩^2 (since a ⊥ v)

• Center so that Σ_{i=1}^n A_i = 0 by subtracting the centroid

• Σ_{i=1}^n dist(A_i, ℓ)^2 = Σ_{i=1}^n (‖A_i − a‖_2^2 − ⟨v, A_i⟩^2)
  = Σ_{i=1}^n (‖A_i‖_2^2 + ‖a‖_2^2 − 2⟨A_i, a⟩ − ⟨v, A_i⟩^2)
  = Σ_{i=1}^n ‖A_i‖_2^2 + n‖a‖_2^2 − 2⟨Σ_{i=1}^n A_i, a⟩ − Σ_{i=1}^n ⟨v, A_i⟩^2
  = Σ_{i=1}^n ‖A_i‖_2^2 + n‖a‖_2^2 − Σ_{i=1}^n ⟨v, A_i⟩^2

• Minimized when a = 0, i.e. the line passes through the centroid
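An illustrative numpy sketch (toy data, not from the slides): the best-fit line of the centered points, anchored at the centroid, achieves a smaller sum of squared distances than a best-fit line forced through the origin.

import numpy as np

rng = np.random.default_rng(0)
t = rng.standard_normal(200)
# Toy points around a line that does NOT pass through the origin
pts = np.outer(t, [3., 1.]) + [10., 5.] + 0.1 * rng.standard_normal((200, 2))

def sq_dist_to_line(pts, a, v):
    # Sum of squared distances to the line {a + lambda*v} (v a unit vector):
    # ||p - a||^2 minus the squared component along v
    centered = pts - a
    along = centered @ v
    return np.sum(np.sum(centered**2, axis=1) - along**2)

centroid = pts.mean(axis=0)
v_raw = np.linalg.svd(pts)[2][0]              # best direction, uncentered
v_ctr = np.linalg.svd(pts - centroid)[2][0]   # best direction after centering

print(sq_dist_to_line(pts, np.zeros(2), v_raw))   # line through the origin
print(sq_dist_to_line(pts, centroid, v_ctr))      # line through centroid: smaller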

Principal Component Analysis

• n × d matrix: customers × movies preferences

• n = #customers, d = #movies

• A_ij = how much customer i likes movie j

• Assumption: A_ij can be described with k factors
  – Customers and movies: vectors u_i and v_j ∈ ℝ^k
  – A_ij = ⟨u_i, v_j⟩

• Solution: A_k, the best rank-k approximation from the SVD
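A minimal numpy sketch (synthetic ratings, illustrative; splitting the singular values evenly between the two factor matrices is one common convention, not prescribed by the slides): truncate the SVD to get A_k and read off k-dimensional customer and movie vectors.

import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 30, 3
# Synthetic ratings with (approximately) rank-k structure plus small noise
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, d))
A += 0.01 * rng.standard_normal((n, d))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vt[:k]             # best rank-k approximation

cust = U[:, :k] * np.sqrt(s[:k])              # customer vectors u_i (rows)
movie = Vt[:k].T * np.sqrt(s[:k])             # movie vectors v_j (rows)
print(np.allclose(cust @ movie.T, A_k))       # A_k[i, j] = <u_i, v_j>
print(np.linalg.norm(A - A_k))                # small residual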

Class Project

• Survey of 3-5 research papers
  – Closely related to the topics of the class
    • Algorithms for high-dimensional data
    • Fast algorithms for numerical linear algebra
    • Algorithms for machine learning and/or clustering
    • Algorithms for streaming and massive data
  – Office hours if you need suggestions
  – Individual (not a group) project
  – 1-page Proposal Due: October 31, 2016 at 23:59 EST
  – Final Deadline: December 09, 2016 at 23:59 EST

• Submission by e-mail to Lisul Islam (IU id: islammdl)
  – Submission Email Title: Project + Space + "Your Name"
  – Submission format: PDF from LaTeX

Separating mixture of ๐‘˜ Gaussians

โ€ข Sample origin problem: โ€“ Given samples from ๐’Œ well-separated spherical Gaussians

โ€“ Q: Did they come from the same Gaussian?

โ€ข ๐›ฟ = distance between centers

โ€ข For two Gaussians naรฏve separation requires

๐›ฟ > ๐œ” ๐’…๐Ÿ/๐Ÿ’

โ€ข Thm. ๐›ฟ = ฮฉ(๐’Œ1

4) suffices

โ€ข Idea: โ€“ Project on a ๐’Œ-dimensional subspace through centers

โ€“ Key fact: This subspace can be found via SVD

โ€“ Apply naรฏve algorithm
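An illustrative numpy sketch of the idea under toy assumptions (two Gaussians, a comfortably large separation constant): project onto the top-k SVD subspace and observe the distance gap that the naïve algorithm exploits.

import numpy as np

rng = np.random.default_rng(0)
d, k, n = 200, 2, 500
centers = np.zeros((k, d))
centers[1, 0] = 6 * d**0.25                   # "well-separated" toy centers

labels = rng.integers(0, k, size=n)
X = centers[labels] + rng.standard_normal((n, d))   # unit-variance Gaussians

Xc = X - X.mean(axis=0)
Vk = np.linalg.svd(Xc, full_matrices=False)[2][:k]  # top-k SVD subspace
proj = X @ Vk.T                                     # n x k projected samples

# Naive separation now works in k dimensions: same-Gaussian pairs are
# ~ sqrt(2k) apart, cross pairs ~ sqrt(2k + delta^2), so a gap appears
D = np.linalg.norm(proj[:, None] - proj[None, :], axis=-1)
same = labels[:, None] == labels[None, :]
off_diag = ~np.eye(n, dtype=bool)
print(D[same & off_diag].max(), D[~same].min())     # max within < min across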

Separating mixture of ๐‘˜ Gaussians โ€ข Easy fact: Projection preserves the property of

being a unit-variance spherical Gaussian

โ€ข Def. If ๐‘ is a probability distribution, best fit line *๐‘๐’—, ๐‘ โˆˆ โ„+ is:

๐’— = ๐‘Ž๐‘Ÿ๐‘”๐‘š๐‘Ž๐‘ฅ ๐’— =1 ๐”ผ๐’™โˆผ๐‘ ๐’—๐‘ป๐’™๐Ÿ

โ€ข Thm: Best fit line for a Gaussian centered at ๐ passes through ๐ and the origin

Best fit line for a Gaussian

• Thm: The best fit line for a Gaussian centered at μ passes through μ and the origin

E_{x∼p}[(v^T x)^2] = E_{x∼p}[(v^T(x − μ) + v^T μ)^2]
= E_{x∼p}[(v^T(x − μ))^2 + 2(v^T μ) v^T(x − μ) + (v^T μ)^2]
= E_{x∼p}[(v^T(x − μ))^2] + 2(v^T μ) E_{x∼p}[v^T(x − μ)] + (v^T μ)^2
= E_{x∼p}[(v^T(x − μ))^2] + (v^T μ)^2
= σ^2 + (v^T μ)^2

• Where we used:
  – E_{x∼p}[v^T(x − μ)] = 0
  – E_{x∼p}[(v^T(x − μ))^2] = σ^2

• The best fit line maximizes (v^T μ)^2, so v is parallel to μ and the line passes through μ and the origin
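A quick Monte Carlo check of this identity (toy parameters, assuming numpy): the empirical E[(v^T x)^2] matches σ² + (v^T μ)², and is maximized when v points along μ.

import numpy as np

rng = np.random.default_rng(0)
d, sigma, n = 10, 1.0, 200_000
mu = rng.standard_normal(d)
X = mu + sigma * rng.standard_normal((n, d))     # x ~ N(mu, sigma^2 I)

v_mu = mu / np.linalg.norm(mu)                   # unit vector along mu
# A unit vector orthogonal to mu, via QR of [mu, random]:
v_perp = np.linalg.qr(np.column_stack([mu, rng.standard_normal(d)]))[0][:, 1]

for v in (v_mu, v_perp):
    print(np.mean((X @ v) ** 2), sigma**2 + (v @ mu) ** 2)
# Both empirical values match sigma^2 + (v^T mu)^2; the maximum over unit
# vectors is attained along mu, so the best fit line passes through mu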

Best fit subspace for one Gaussian

• Best fit k-dimensional subspace V_k:
  V_k = argmax_{V: dim(V)=k} E_{x∼p}[‖proj(x, V)‖_2^2]

• For a spherical Gaussian, V is a best-fit k-dimensional subspace iff it contains μ

• If μ = 0 then any k-dim. subspace is a best fit

• If μ ≠ 0 then the best fit line v goes through μ
  – The same greedy process as in SVD projects on v
  – After projection we have a Gaussian with μ = 0
  – Any (k − 1)-dimensional subspace would do

Best fit subspace for ๐‘˜ Gaussians

โ€ข Thm. ๐’‘ is a mixture of ๐‘˜ spherical Gaussians โ‡’ best fit ๐‘˜-dim. subspace contains their centers

โ€ข ๐‘ = ๐‘ค1๐’‘1 +๐‘ค2๐’‘2 +โ‹ฏ+๐‘ค๐‘˜๐’‘๐‘˜

โ€ข Let ๐‘ฝ be a subspace of dimension โ‰ค ๐‘˜

๐”ผ๐’™โˆผ๐’‘ ๐‘๐‘Ÿ๐‘œ๐‘— ๐’™, ๐‘ฝ ๐Ÿ

๐Ÿ= ๐‘ค๐‘–

๐‘˜

๐‘–=1

๐”ผ๐’™โˆผ๐’‘๐’Š ๐‘๐‘Ÿ๐‘œ๐‘— ๐’™, ๐‘ฝ ๐Ÿ

๐Ÿ

โ€ข Each term is maximized if ๐‘ฝ contains all ๐๐’Šโ€ฒ๐‘ 

โ€ข If we only have a finite number of samples then accuracy has to be analyzed carefully
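An illustrative finite-sample check of the theorem (toy parameters, assuming numpy): the top-k right singular subspace of samples drawn from a k-Gaussian mixture nearly contains each center, up to sampling error.

import numpy as np

rng = np.random.default_rng(0)
d, k, n = 100, 3, 30_000
centers = 10 * rng.standard_normal((k, d))
labels = rng.integers(0, k, size=n)
X = centers[labels] + rng.standard_normal((n, d))   # mixture sample

Vk = np.linalg.svd(X, full_matrices=False)[2][:k]   # top-k right sing. subspace

for mu in centers:
    resid = mu - Vk.T @ (Vk @ mu)                   # component outside the subspace
    print(np.linalg.norm(resid) / np.linalg.norm(mu))  # small: center nearly inside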

HITS Algorithm for Hubs and Authorities

• Document ranking: project on 1st singular vector

• WWW: directed graph with links = edges

• n Authorities: pages containing original info

• d Hubs: collections of links to authorities
  – Authority depends on importance of pointing hubs
  – Hub quality depends on how authoritative its links are

• Authority vector: v_j, j = 1, …, n: v_j ∼ Σ_{i=1}^d u_i A_ij

• Hub vector: u_i, i = 1, …, d: u_i ∼ Σ_{j=1}^n v_j A_ij

• Use power method: u = Av, v = A^T u

• Converges to first left/right singular vectors
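A compact numpy sketch of this iteration (the tiny link matrix below is made up for illustration): alternate u = Av and v = A^T u with normalization; u and v converge to the first left/right singular vectors, i.e. the hub and authority scores.

import numpy as np

def hits(A, iters=100):
    # Power method on the d x n hub-to-authority link matrix A:
    # alternating u = A v and v = A^T u (with normalization) converges
    # to the top left/right singular vectors: hub and authority scores.
    d, n = A.shape
    v = np.ones(n) / np.sqrt(n)       # initial authority scores
    for _ in range(iters):
        u = A @ v
        u /= np.linalg.norm(u)        # hub scores
        v = A.T @ u
        v /= np.linalg.norm(v)        # authority scores
    return u, v

# Made-up 3-hub x 4-authority links: authority 0 is linked by every hub
A = np.array([[1., 1., 0., 0.],
              [1., 0., 1., 0.],
              [1., 0., 0., 1.]])
u, v = hits(A)
print(v)    # authority 0 receives the highest score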

Exercises

• Ex. 1: A is an n × n matrix with orthonormal rows
  – Show that it also has orthonormal columns

• Ex. 2: Interpret the left and right singular vectors of the document × term matrix

• Ex. 3: Use the power method to compute the singular values of the matrix:
  (1 2)
  (3 4)

