Low-rank Models for Data Analysis Carlos Fernandez-Granda www.cims.nyu.edu/~cfgranda 2/27/2018

Low-rank Models for Data Analysis

Carlos Fernandez-Granda

www.cims.nyu.edu/~cfgranda

2/27/2018


Background

Low-rank models

Matrix completion

Structured low-rank models

Data-driven Analysis of Infant Sleep Patterns


Rank

For any matrix A

dim (col (A)) = dim (row (A))

This is the rank of A

Singular value decomposition

Every rank-r real matrix $A \in \mathbb{R}^{m \times n}$ has a singular value decomposition (SVD) of the form

$$
A = \begin{bmatrix} \vec{u}_1 & \vec{u}_2 & \cdots & \vec{u}_r \end{bmatrix}
\begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_r \end{bmatrix}
\begin{bmatrix} \vec{v}_1^T \\ \vec{v}_2^T \\ \vdots \\ \vec{v}_r^T \end{bmatrix}
= U S V^T
$$

Singular value decomposition

- The singular values $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r$ are positive real numbers
- The left singular vectors $\vec{u}_1, \vec{u}_2, \ldots, \vec{u}_r$ form an orthonormal set
- The right singular vectors $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_r$ also form an orthonormal set
- The SVD is unique, up to a simultaneous sign change of $\vec{u}_i$ and $\vec{v}_i$, if all the singular values are different
- If $\sigma_i = \sigma_{i+1} = \cdots = \sigma_{i+k}$, then $\vec{u}_i, \ldots, \vec{u}_{i+k}$ can be replaced by any orthonormal basis of their span (the same holds for $\vec{v}_i, \ldots, \vec{v}_{i+k}$)
- The SVD of an $m \times n$ matrix with $m \geq n$ can be computed in $O(mn^2)$
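
The properties listed above can be checked numerically. The following is a minimal sketch using NumPy's `numpy.linalg.svd`; the random 6×4 test matrix and all variable names are illustrative assumptions, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))  # arbitrary example matrix (m = 6, n = 4)

# Reduced SVD: U is 6x4, s holds the singular values, Vt is V^T (4x4).
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A = U S V^T
reconstruction_ok = np.allclose(A, U @ np.diag(s) @ Vt)

# Singular values are positive and sorted in non-increasing order.
sorted_ok = bool(np.all(np.diff(s) <= 0) and np.all(s > 0))

# Left and right singular vectors form orthonormal sets.
U_orthonormal = np.allclose(U.T @ U, np.eye(4))
V_orthonormal = np.allclose(Vt @ Vt.T, np.eye(4))

# The number of nonzero singular values equals the rank.
rank_matches = np.linalg.matrix_rank(A) == np.count_nonzero(s > 1e-10)

# Sign ambiguity: flipping u_1 and v_1 together leaves the product unchanged.
U_flipped, Vt_flipped = U.copy(), Vt.copy()
U_flipped[:, 0] *= -1
Vt_flipped[0] *= -1
sign_flip_ok = np.allclose(A, U_flipped @ np.diag(s) @ Vt_flipped)

print(reconstruction_ok, sorted_ok, U_orthonormal, V_orthonormal,
      rank_matches, sign_flip_ok)
```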

Column and row space

- The left singular vectors $\vec{u}_1, \vec{u}_2, \ldots, \vec{u}_r$ are a basis for the column space
- The right singular vectors $\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_r$ are a basis for the row space

Best rank-k approximation

Let $U S V^T$ be the SVD of a matrix $A \in \mathbb{R}^{m \times n}$

The truncated SVD $U_{:,1:k} S_{1:k,1:k} V_{:,1:k}^T$ is the best rank-k approximation

$$
U_{:,1:k} S_{1:k,1:k} V_{:,1:k}^T = \arg\min_{\tilde{A} \,|\, \operatorname{rank}(\tilde{A}) = k} \left\| A - \tilde{A} \right\|_F
$$
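
As an illustration of the statement above, the sketch below builds the truncated SVD with NumPy and compares its Frobenius error with that of randomly drawn matrices of rank at most k. The helper name `truncated_svd`, the random test matrix and the number of competitors are assumptions made for this example; the random comparison is a sanity check, not a proof.

```python
import numpy as np

def truncated_svd(A, k):
    """Return U[:, :k] S[:k, :k] V[:, :k]^T, the rank-k truncation of the SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 5))  # arbitrary example matrix
k = 2

A_k = truncated_svd(A, k)
err = np.linalg.norm(A - A_k, "fro")

# The error equals the root of the sum of the discarded squared singular values.
s = np.linalg.svd(A, compute_uv=False)
expected_err = np.sqrt(np.sum(s[k:] ** 2))

# Random matrices of rank at most k never achieve a smaller error.
competitor_errs = [
    np.linalg.norm(
        A - rng.standard_normal((8, k)) @ rng.standard_normal((k, 5)), "fro"
    )
    for _ in range(1000)
]

print(err, expected_err, min(competitor_errs))
```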


Background

Low-rank models

Matrix completion

Structured low-rank models

Data-driven Analysis of Infant Sleep Patterns

Motivation

A quantity y[i, j] depends on indices i and j

We observe examples and want to predict new instances

In collaborative filtering, y[i, j] is the rating given to movie i by user j

Collaborative filtering

Y :=

                         Bob   Molly   Mary   Larry
The Dark Knight          1     1       5      4
Spiderman 3              2     1       4      5
Love Actually            4     5       2      1
Bridget Jones's Diary    5     4       2      1
Pretty Woman             4     5       1      2
Superman 2               1     2       5      5

Simple model

Assumptions:

- Some movies are more popular in general
- Some users are more generous in general

$$
y[i, j] \approx a[i] \, b[j]
$$

- a[i] quantifies the popularity of movie i
- b[j] quantifies the generosity of user j

Rank-1 model

Assume m movies are all rated by n users

Model becomes

$$
Y \approx \vec{a} \, \vec{b}^T
$$

We can fit it by solving

$$
\min_{\vec{a} \in \mathbb{R}^m, \; \vec{b} \in \mathbb{R}^n} \left\| Y - \vec{a} \, \vec{b}^T \right\|_F \quad \text{subject to} \quad \|\vec{a}\|_2 = 1
$$

Equivalent to

$$
\min_{X \in \mathbb{R}^{m \times n}} \|Y - X\|_F \quad \text{subject to} \quad \operatorname{rank}(X) = 1
$$

Rank-1 model

$$
\sigma_1 \vec{u}_1 \vec{v}_1^T = \arg\min_{X \in \mathbb{R}^{m \times n}} \|Y - X\|_F \quad \text{subject to} \quad \operatorname{rank}(X) = 1
$$

The solution to

$$
\min_{\vec{a} \in \mathbb{R}^m, \; \vec{b} \in \mathbb{R}^n} \left\| Y - \vec{a} \, \vec{b}^T \right\|_F \quad \text{subject to} \quad \|\vec{a}\|_2 = 1
$$

is

$$
\vec{a}_{\min} = \vec{u}_1, \qquad \vec{b}_{\min} = \sigma_1 \vec{v}_1
$$

Rank-r model

Certain people like certain movies: r factors

$$
y[i, j] \approx \sum_{l=1}^{r} a_l[i] \, b_l[j]
$$

For each factor l

- $a_l[i]$: movie i is positively (> 0), negatively (< 0) or not (≈ 0) associated to factor l
- $b_l[j]$: user j likes (> 0), hates (< 0) or is indifferent (≈ 0) to factor l

Rank-r model

Equivalent to

$$
Y \approx A B, \qquad A \in \mathbb{R}^{m \times r}, \quad B \in \mathbb{R}^{r \times n}
$$

SVD solves

$$
\min_{A \in \mathbb{R}^{m \times r}, \; B \in \mathbb{R}^{r \times n}} \|Y - A B\|_F \quad \text{subject to} \quad \|\vec{a}_1\|_2 = 1, \ldots, \|\vec{a}_r\|_2 = 1
$$

Problem: Many possible ways of choosing $\vec{a}_1, \ldots, \vec{a}_r$, $\vec{b}_1, \ldots, \vec{b}_r$

SVD constrains them to be orthogonal

SVD

SVD of the centered ratings matrix:

$$
Y - \mu \vec{1} \vec{1}^T = U S V^T = U \begin{bmatrix} 7.79 & 0 & 0 & 0 \\ 0 & 1.62 & 0 & 0 \\ 0 & 0 & 1.55 & 0 \\ 0 & 0 & 0 & 0.62 \end{bmatrix} V^T
$$

$$
\mu := \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} Y_{ij}
$$

Rank 1 model

$$
\mu \vec{1} \vec{1}^T + \sigma_1 \vec{u}_1 \vec{v}_1^T =
$$

                   Bob        Molly      Mary       Larry
The Dark Knight    1.34 (1)   1.19 (1)   4.66 (5)   4.81 (4)
Spiderman 3        1.55 (2)   1.42 (1)   4.45 (4)   4.58 (5)
Love Actually      4.45 (4)   4.58 (5)   1.55 (2)   1.42 (1)
B.J.'s Diary       4.43 (5)   4.56 (4)   1.57 (2)   1.44 (1)
Pretty Woman       4.43 (4)   4.56 (5)   1.57 (1)   1.44 (2)
Superman 2         1.34 (1)   1.19 (2)   4.66 (5)   4.81 (5)

(true ratings in parentheses)

Movies

                 D. Knight   Sp. 3   Love Act.   B.J.'s Diary   P. Woman   Sup. 2
$\vec{a}_1$ = (  −0.45       −0.39   0.39        0.39           0.39       −0.45  )

Coefficients cluster movies into action (−) and romantic (+)

Users

                 Bob    Molly   Mary    Larry
$\vec{b}_1$ = (  3.74   4.05    −3.74   −4.05  )

Coefficients cluster people into action (−) and romantic (+)
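
The rank-1 analysis of the ratings example can be reproduced with a few lines of NumPy. This is a sketch under the assumption that the matrix is centered with the mean of all its entries before the SVD; variable names are illustrative. Since singular vectors are only defined up to a joint sign flip, the code fixes the sign so that the romantic movies get positive coefficients.

```python
import numpy as np

movies = ["The Dark Knight", "Spiderman 3", "Love Actually",
          "Bridget Jones's Diary", "Pretty Woman", "Superman 2"]
users = ["Bob", "Molly", "Mary", "Larry"]

# Ratings matrix: rows are movies, columns are users.
Y = np.array([[1, 1, 5, 4],
              [2, 1, 4, 5],
              [4, 5, 2, 1],
              [5, 4, 2, 1],
              [4, 5, 1, 2],
              [1, 2, 5, 5]], dtype=float)

mu = Y.mean()                      # average rating over all entries
U, s, Vt = np.linalg.svd(Y - mu, full_matrices=False)

a1 = U[:, 0]                       # movie coefficients (unit norm)
b1 = s[0] * Vt[0]                  # user coefficients (scaled by sigma_1)
if a1[2] < 0:                      # resolve the sign ambiguity of the SVD
    a1, b1 = -a1, -b1

rank1 = mu + np.outer(a1, b1)      # rank-1 model of the ratings

print("singular values:", np.round(s, 2))
for name, coeff in zip(movies, a1):
    print(f"{name:24s} {coeff:+.2f}")
for name, coeff in zip(users, b1):
    print(f"{name:24s} {coeff:+.2f}")
print(np.round(rank1, 2))
```

The printed singular values and coefficients should be close to the rounded values shown in the slides.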


Background

Low-rank models

Matrix completion

Structured low-rank models

Data-driven Analysis of Infant Sleep Patterns

Netflix Prize

[Figure: ratings matrix in which the entries are unknown (shown as ?)]

Matrix completion

                         Bob   Molly   Mary   Larry
The Dark Knight          1     ?       5      4
Spiderman 3              ?     1       4      5
Love Actually            4     5       2      ?
Bridget Jones's Diary    5     4       2      1
Pretty Woman             4     5       1      2
Superman 2               1     2       ?      5

Isn't this completely ill posed?

Can't we fill in the missing entries arbitrarily?

Yes, but not if the matrix is low rank

Then it depends on ≈ r(m + n) parameters

As long as data > parameters recovery is possible (in principle)

$$
\begin{bmatrix} 1 & 1 & 1 & 1 & ? & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 \\ ? & 1 & 1 & 1 & 1 & 1 \end{bmatrix}
$$

Matrix cannot be sparse

$$
\begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 23 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}
$$

Singular vectors cannot be sparse

$$
\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \end{bmatrix}
+ \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 & 4 \end{bmatrix}
= \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 2 & 3 & 4 & 5 \end{bmatrix}
$$

Incoherence

The matrix must be incoherent: its singular vectors must be spread out

For $1/\sqrt{n} \leq \mu \leq 1$

$$
\max_{1 \leq i \leq r, \; 1 \leq j \leq m} |U_{ij}| \leq \mu
$$

$$
\max_{1 \leq i \leq r, \; 1 \leq j \leq n} |V_{ij}| \leq \mu
$$

for the left $U_1, \ldots, U_r$ and right $V_1, \ldots, V_r$ singular vectors, where $U_{ij}$ denotes the jth entry of $U_i$
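
To make the condition concrete, the sketch below computes the largest absolute entry of the singular vectors for two of the matrices used as examples above: the all-ones matrix (spread-out singular vectors) and the matrix with a single nonzero entry (maximally concentrated singular vectors). The function name `max_singular_vector_entry` and the tolerance are assumptions made for this illustration.

```python
import numpy as np

def max_singular_vector_entry(M, tol=1e-10):
    """Largest absolute entry of the left and right singular vectors
    associated with nonzero singular values of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    r = int(np.sum(s > tol))       # numerical rank
    return np.abs(U[:, :r]).max(), np.abs(Vt[:r]).max()

ones = np.ones((4, 6))             # rank 1, singular vectors are constant
spike = np.zeros((4, 6))
spike[1, 3] = 23.0                 # rank 1, singular vectors are standard basis vectors

mu_ones = max_singular_vector_entry(ones)
mu_spike = max_singular_vector_entry(spike)

print("all-ones matrix:", mu_ones)   # entries as small as possible: 1/sqrt(4), 1/sqrt(6)
print("single spike:   ", mu_spike)  # entries as large as possible: 1, 1
```

The all-ones matrix attains the smallest possible values, whereas the spike attains the largest, so a missing spike entry could never be inferred from the observed zeros.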

Measurements

We must see at least one entry in each row and column

$$
\begin{bmatrix} 1 & 1 & 1 & 1 \\ ? & ? & ? & ? \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}
= \begin{bmatrix} 1 \\ ? \\ 1 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \end{bmatrix}
$$

Assumption: Random sampling (usually does not hold in practice!)

Low-rank matrix estimation

First idea:

$$
\min_{X \in \mathbb{R}^{m \times n}} \operatorname{rank}(X) \quad \text{such that} \quad X_\Omega = y
$$

- $\Omega$: indices of revealed entries
- $y$: revealed entries

Convex functions

A function $f : \mathbb{R}^n \to \mathbb{R}$ is convex if for any $\vec{x}, \vec{y} \in \mathbb{R}^n$ and any $\theta \in (0, 1)$

$$
\theta f(\vec{x}) + (1 - \theta) f(\vec{y}) \geq f(\theta \vec{x} + (1 - \theta) \vec{y})
$$

Convex functions

[Figure: a convex function, with the values $f(\vec{x})$, $f(\vec{y})$, $f(\theta \vec{x} + (1 - \theta) \vec{y})$ and $\theta f(\vec{x}) + (1 - \theta) f(\vec{y})$ marked]


Minimizing convex functions


Minimizing nonconvex functions

The rank is not convex

The rank of matrices in $\mathbb{R}^{n \times n}$, interpreted as a function from $\mathbb{R}^{n \times n}$ to $\mathbb{R}$, is not convex

$$
X := \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \qquad Y := \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}
$$

For any $\theta \in (0, 1)$

$$
\operatorname{rank}(\theta X + (1 - \theta) Y) = 2
$$

$$
\theta \operatorname{rank}(X) + (1 - \theta) \operatorname{rank}(Y) = 1
$$

so the inequality that defines convexity is violated
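
The counterexample is easy to verify numerically. The sketch below evaluates both sides of the convexity inequality for a few values of θ using `numpy.linalg.matrix_rank`; the chosen values of θ and the variable names are arbitrary.

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [0.0, 0.0]])
Y = np.array([[0.0, 0.0],
              [0.0, 1.0]])

violations = []
for theta in (0.1, 0.5, 0.9):
    # Rank of the convex combination of X and Y.
    rank_of_combination = np.linalg.matrix_rank(theta * X + (1 - theta) * Y)
    # Convex combination of the ranks of X and Y.
    combination_of_ranks = (theta * np.linalg.matrix_rank(X)
                            + (1 - theta) * np.linalg.matrix_rank(Y))
    # Convexity would require rank_of_combination <= combination_of_ranks.
    violations.append(rank_of_combination > combination_of_ranks)
    print(theta, rank_of_combination, combination_of_ranks)
```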

Norms are convex

For any $\vec{x}, \vec{y} \in \mathbb{R}^n$ and any $\theta \in (0, 1)$

$$
\|\theta \vec{x} + (1 - \theta) \vec{y}\| \leq \|\theta \vec{x}\| + \|(1 - \theta) \vec{y}\| = \theta \|\vec{x}\| + (1 - \theta) \|\vec{y}\|
$$

Promoting low-rank structure

Toy problem: Find t such that

$$
M(t) := \begin{bmatrix} 0.5 + t & 1 & 1 \\ 0.5 & 0.5 & t \\ 0.5 & 1 - t & 0.5 \end{bmatrix}
$$

is low rank

Strategy: Minimize

$$
f(t) := \|M(t)\|
$$

Matrix norms

Frobenius norm

$$
\|A\|_F := \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij}^2} = \sqrt{\sum_{i=1}^{\min\{m,n\}} \sigma_i^2}
$$

Operator norm

$$
\|A\| := \max_{\|\vec{x}\|_2 = 1 \,|\, \vec{x} \in \mathbb{R}^n} \|A \vec{x}\|_2 = \sigma_1
$$

Nuclear norm

$$
\|A\|_* := \sum_{i=1}^{\min\{m,n\}} \sigma_i
$$

Promoting low-rank structure

[Figure: rank, operator norm, Frobenius norm and nuclear norm of $M(t)$ plotted against t, for t from −1.0 to 1.5]
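
The comparison in this toy problem can be recomputed directly from the singular values of $M(t)$. The sketch below evaluates the three norms and the rank on a grid of values of t; the grid (251 points between −1 and 1.5) and the variable names are assumptions made for the example.

```python
import numpy as np

def M(t):
    """Toy matrix from the slides, parametrized by the scalar t."""
    return np.array([[0.5 + t, 1.0, 1.0],
                     [0.5, 0.5, t],
                     [0.5, 1.0 - t, 0.5]])

t_grid = np.linspace(-1.0, 1.5, 251)   # step of 0.01, contains t = 0.5

# Singular values of M(t) for every t on the grid (one row per t).
sv = np.array([np.linalg.svd(M(t), compute_uv=False) for t in t_grid])

operator = sv[:, 0]                        # largest singular value
frobenius = np.sqrt((sv ** 2).sum(axis=1)) # root of the sum of squares
nuclear = sv.sum(axis=1)                   # sum of the singular values
ranks = np.array([np.linalg.matrix_rank(M(t)) for t in t_grid])

for name, values in [("operator", operator),
                     ("Frobenius", frobenius),
                     ("nuclear", nuclear),
                     ("rank", ranks)]:
    print(f"{name:10s} minimized at t = {t_grid[np.argmin(values)]:.2f}")
```

At t = 0.5 all rows of $M(t)$ are multiples of (1, 1, 1), so the matrix has rank 1; on this grid the nuclear norm attains its minimum at that same value of t.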

Exact recovery

Guarantees by Gross 2011, Candès and Recht 2008, Candès and Tao 2009

$$
\min_{X \in \mathbb{R}^{m \times n}} \|X\|_* \quad \text{such that} \quad X_\Omega = y
$$

achieves exact recovery with high probability as long as the number of samples is proportional to r(n + m) up to log terms

Low-rank matrix estimation

If data are noisy

$$
\min_{X \in \mathbb{R}^{m \times n}} \|X_\Omega - \vec{y}\|_2^2 + \lambda \|X\|_*
$$

where $\lambda > 0$ is a regularization parameter
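
The slides do not say which algorithm is used to solve this problem. One standard option, shown here only as a hedged sketch, is proximal gradient descent: a gradient step on the data-fit term followed by soft-thresholding of the singular values, which is the proximal operator of the nuclear norm. The function names, the step size of 1/2, the value of λ and the all-ones test matrix with two hidden entries are assumptions made for this example.

```python
import numpy as np

def svt(Z, tau):
    """Soft-threshold the singular values of Z by tau
    (proximal operator of tau times the nuclear norm)."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def objective(X, Y, mask, lam):
    """Squared error on the revealed entries plus lam times the nuclear norm."""
    return np.sum((X[mask] - Y[mask]) ** 2) + lam * np.linalg.norm(X, "nuc")

def complete_nuclear(Y, mask, lam, n_iter=500):
    """Proximal gradient descent for the regularized completion problem.
    Entries of Y outside the mask are ignored."""
    Y_obs = np.where(mask, Y, 0.0)
    X = np.zeros_like(Y_obs)
    history = []
    for _ in range(n_iter):
        # Gradient of the data term is 2 * mask * (X - Y); step size 1/2.
        Z = X - mask * (X - Y_obs)
        # Proximal step: threshold is step size times lam.
        X = svt(Z, lam / 2.0)
        history.append(objective(X, Y_obs, mask, lam))
    return X, history

# Rank-1 example: all-ones matrix with two hidden entries.
Y = np.ones((4, 6))
mask = np.ones((4, 6), dtype=bool)
mask[0, 4] = mask[3, 0] = False

X_hat, history = complete_nuclear(Y, mask, lam=0.1)
print(np.round(X_hat, 2))
```

Because of the regularization, the estimated entries are slightly shrunk towards zero; a smaller λ reduces this bias at the cost of fitting more of the noise when the data are noisy.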

Matrix completion via nuclear-norm minimization

                         Bob     Molly   Mary    Larry
The Dark Knight          1       2 (1)   5       4
Spiderman 3              2 (2)   1       4       5
Love Actually            4       5       2       2 (1)
Bridget Jones's Diary    5       4       2       1
Pretty Woman             4       5       1       2
Superman 2               1       2       5 (5)   5

(estimates of the missing entries, with the true ratings in parentheses)

Real data

- Movielens database
- 671 users
- 300 movies
- Training set: 9,135 ratings
- Test set: 1,016 ratings

Real data

[Figure: average absolute rating error on the training and test sets as a function of $\lambda$, for $\lambda$ from $10^{-2}$ to $10^{4}$]

Low-rank matrix completion

Intractable problem

$$
\min_{X \in \mathbb{R}^{m \times n}} \operatorname{rank}(X) \quad \text{such that} \quad X_\Omega \approx \vec{y}
$$

Nuclear norm: convex but computationally expensive

Alternative

- Fix rank r beforehand
- Parametrize the matrix as $AB$ where $A \in \mathbb{R}^{m \times r}$ and $B \in \mathbb{R}^{r \times n}$
- Solve

$$
\min_{A \in \mathbb{R}^{m \times r}, \; B \in \mathbb{R}^{r \times n}} \left\| (A B)_\Omega - \vec{y} \right\|_2
$$

by alternating minimization

Alternating minimization

Sequence of least-squares problems (much faster than computing SVDs)

- To compute $A^{(k)}$ fix $B^{(k-1)}$ and solve

$$
\min_{A \in \mathbb{R}^{m \times r}} \left\| \left( A B^{(k-1)} \right)_\Omega - \vec{y} \right\|_2
$$

- To compute $B^{(k)}$ fix $A^{(k)}$ and solve

$$
\min_{B \in \mathbb{R}^{r \times n}} \left\| \left( A^{(k)} B \right)_\Omega - \vec{y} \right\|_2
$$

Theoretical guarantees: Jain, Netrapalli, Sanghavi 2013
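
The two alternating steps can be sketched as follows. Each least-squares problem decouples across the rows of A (or the columns of B), so the code solves one small problem per row or column with `numpy.linalg.lstsq`, using only the revealed entries. The function name, the random initialization, the number of iterations and the all-ones test matrix are assumptions made for this example; it assumes every row and column has at least one revealed entry.

```python
import numpy as np

def alternating_minimization(Y, mask, r, n_iter=100, seed=0):
    """Fit Y ≈ A B on the revealed entries (mask == True) by alternating
    least squares. A is m x r and B is r x n."""
    m, n = Y.shape
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, r))
    B = rng.standard_normal((r, n))
    for _ in range(n_iter):
        # Fix B: each row of A solves a least-squares problem that only
        # involves the revealed entries of the corresponding row of Y.
        for i in range(m):
            cols = mask[i]
            A[i] = np.linalg.lstsq(B[:, cols].T, Y[i, cols], rcond=None)[0]
        # Fix A: each column of B solves a least-squares problem that only
        # involves the revealed entries of the corresponding column of Y.
        for j in range(n):
            rows = mask[:, j]
            B[:, j] = np.linalg.lstsq(A[rows], Y[rows, j], rcond=None)[0]
    return A, B

# Rank-1 example: all-ones matrix with two hidden entries.
Y = np.ones((4, 6))
mask = np.ones((4, 6), dtype=bool)
mask[0, 4] = mask[3, 0] = False

A, B = alternating_minimization(Y, mask, r=1)
X_hat = A @ B
print(np.round(X_hat, 3))
```

The problem is nonconvex, so in general the result depends on the initialization; in this small noiseless example the hidden entries are recovered.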


Background

Low-rank models

Matrix completion

Structured low-rank models

Data-driven Analysis of Infant Sleep Patterns

Nonnegative matrix factorization

Nonnegative atoms/coefficients can make results easier to interpret

$$
X \approx A B, \qquad A_{i,j} \geq 0, \quad B_{i,j} \geq 0, \quad \text{for all } i, j
$$

Nonconvex optimization problem:

$$
\begin{aligned}
\text{minimize} \quad & \left\| X - A B \right\|_F^2 \\
\text{subject to} \quad & A_{i,j} \geq 0, \\
& B_{i,j} \geq 0, \quad \text{for all } i, j
\end{aligned}
$$

$A \in \mathbb{R}^{m \times r}$ and $B \in \mathbb{R}^{r \times n}$


Face dataset


Faces dataset: Principal component analysis


Faces dataset: Nonnegative matrix factorization

Topic modeling

A :=

Articles   singer   GDP   senate   election   vote   stock   bass   market   band
a          6        1     1        0          0      1       9      0        8
b          1        0     9        5          8      1       0      1        0
c          8        1     0        1          0      0       9      1        7
d          0        7     1        0          0      9       1      7        0
e          0        5     6        7          5      6       0      7        2
f          1        0     8        5          9      2       0      0        1

SVD

$$
A = U S V^T = U \begin{bmatrix}
23.64 & 0 & 0 & 0 & 0 & 0 \\
0 & 18.82 & 0 & 0 & 0 & 0 \\
0 & 0 & 14.23 & 0 & 0 & 0 \\
0 & 0 & 0 & 3.63 & 0 & 0 \\
0 & 0 & 0 & 0 & 2.03 & 0 \\
0 & 0 & 0 & 0 & 0 & 1.36
\end{bmatrix} V^T
$$

Left singular vectors

           a       b       c       d       e       f
U_1 = (    −0.24   −0.47   −0.24   −0.32   −0.58   −0.47  )
U_2 = (    0.64    −0.23   0.67    −0.03   −0.18   −0.21  )
U_3 = (    −0.08   −0.39   −0.08   0.77    0.28    −0.40  )

Right singular vectors

           singer   GDP     senate   election   vote    stock   bass    market   band
V_1 = (    −0.18    −0.24   −0.51    −0.38      −0.46   −0.34   −0.2    −0.3     −0.22  )
V_2 = (    0.47     0.01    −0.22    −0.15      −0.25   −0.07   0.63    −0.05    0.49   )
V_3 = (    −0.13    0.47    −0.3     −0.14      −0.37   0.52    −0.04   0.49     −0.07  )

Nonnegative matrix factorization

$$
X \approx W H
$$

$$
W_{i,j} \geq 0, \quad H_{i,j} \geq 0, \quad \text{for all } i, j
$$

Right nonnegative factors

           singer   GDP    senate   election   vote   stock   bass   market   band
H_1 = (    0.34     0      3.73     2.54       3.67   0.52    0      0.35     0.35  )
H_2 = (    0        2.21   0.21     0.45       0      2.64    0.21   2.43     0.22  )
H_3 = (    3.22     0.37   0.19     0.2        0      0.12    4.13   0.13     3.43  )

Interpretations:

- Count atom: Counts for each doc are a weighted sum of H_1, H_2, H_3
- Coefficients: They cluster words into politics, music and economics

Left nonnegative factors

           a      b      c      d      e      f
W_1 = (    0.03   2.23   0      0      1.59   2.24  )
W_2 = (    0.1    0      0.08   3.13   2.32   0     )
W_3 = (    2.13   0      2.22   0      0      0.03  )

Interpretations:

- Count atom: Counts for each word are a weighted sum of W_1, W_2, W_3
- Coefficients: They cluster docs into politics, music and economics
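
The slides do not state how the nonnegative factors were computed. As a hedged sketch, the code below applies the classical multiplicative updates of Lee and Seung for the Frobenius-norm objective to the word-count matrix of the topic-modeling example. The function name, the random initialization, the iteration count and the small constant `eps` are assumptions; since the problem is nonconvex, the factors it finds may differ from the ones in the slides in scale, ordering and value.

```python
import numpy as np

# Word counts: rows are articles a-f, columns are the nine words.
X = np.array([[6, 1, 1, 0, 0, 1, 9, 0, 8],
              [1, 0, 9, 5, 8, 1, 0, 1, 0],
              [8, 1, 0, 1, 0, 0, 9, 1, 7],
              [0, 7, 1, 0, 0, 9, 1, 7, 0],
              [0, 5, 6, 7, 5, 6, 0, 7, 2],
              [1, 0, 8, 5, 9, 2, 0, 0, 1]], dtype=float)

def nmf_multiplicative(X, r, n_iter=500, seed=0, eps=1e-12):
    """Approximate X by W H with nonnegative W (m x r) and H (r x n)
    using multiplicative updates for the squared Frobenius error."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.uniform(0.1, 1.0, size=(m, r))
    H = rng.uniform(0.1, 1.0, size=(r, n))
    errors = []
    for _ in range(n_iter):
        # Elementwise updates keep the entries nonnegative;
        # eps avoids division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(X - W @ H, "fro"))
    return W, H, errors

W, H, errors = nmf_multiplicative(X, r=3)
print(np.round(H, 2))    # one row per factor, one column per word
print(np.round(W.T, 2))  # one row per factor, one column per article
print(errors[0], errors[-1])
```

The fitting error can never be smaller than that of the rank-3 truncated SVD, which is the best rank-3 approximation without the nonnegativity constraints.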

Sparse PCA

Sparse atoms can make results easier to interpret

$$
X \approx A B, \qquad A \text{ sparse}
$$

Nonconvex optimization problem:

$$
\begin{aligned}
\text{minimize} \quad & \left\| X - A B \right\|_2^2 + \lambda \sum_{i=1}^{k} \left\| A_i \right\|_1 \\
\text{subject to} \quad & \left\| A_i \right\|_2 = 1, \quad 1 \leq i \leq k
\end{aligned}
$$

$A \in \mathbb{R}^{m \times r}$ and $B \in \mathbb{R}^{r \times n}$


Faces dataset


Background

Low-rank models

Matrix completion

Structured low-rank models

Data-driven Analysis of Infant Sleep Patterns


Acknowledgements

Joint work with Mark Cheng, David Heeger and Sheng Liu

Data

[Figure: three panels; horizontal axis: time of day (half-hour intervals); vertical axis: age (days)]

Sample mean

[Figure: fraction of sleep as a function of time of day (half-hour intervals) and age (days)]

Sample mean

[Figure: probability of sleep against time of day (half-hour intervals), one curve per age: 10, 60, 120, 200 and 300 days]

Sample mean

[Figure: probability of sleep against age (days), one curve per time of day: 2 a.m., 5 a.m., 8 a.m., 11 a.m., 2 p.m., 5 p.m., 8 p.m. and 11 p.m.]

Low-rank model

$$
\text{minimize} \quad \sum_{d=1}^{365} \sum_{t=1}^{48} \sum_{b \in B_{d,t}} \left( S(d, t, b) - \sum_{i=1}^{k} D_i(d) \, T_i(t) \right)^2
$$

Low-rank model

[Figure: fraction of sleep as a function of time of day (half-hour intervals) and age (days)]

Low-rank model

[Figure: probability of sleep against time of day (half-hour intervals), one curve per age: 10, 60, 120, 200 and 300 days]

Low-rank model

[Figure: probability of sleep against age (days), one curve per time of day: 2 a.m., 5 a.m., 8 a.m., 11 a.m., 2 p.m., 5 p.m., 8 p.m. and 11 p.m.]

Factors

[Figure: factors plotted against time of day (half-hour intervals)]

Factors

[Figure: factors plotted against age (days)]

Low-rank model with nonnegative factors

$$
\begin{aligned}
\text{minimize} \quad & \sum_{d=1}^{365} \sum_{t=1}^{48} \sum_{b \in B_{d,t}} \left( S(d, t, b) - \sum_{i=1}^{k} D_i(d) \, T_i(t) \right)^2 \\
\text{subject to} \quad & D_i(d) \geq 0, \quad T_i(t) \geq 0 \quad \text{for all } i, d, t
\end{aligned}
$$

Factors

[Figure: nonnegative factors plotted against time of day (half-hour intervals)]

Factors

[Figure: nonnegative factors plotted against age (days)]

RMSE

                      Low-rank model                        Nonnegative low-rank model
            Mean      k=1      k=2      k=3      k=4        k=1      k=2      k=3      k=4
Training    0.3586    0.3663   0.3596   0.3593   0.3591     0.3663   0.3596   0.3593   0.3593
Test        0.4282    0.3640   0.3585   0.3581   0.3579     0.3640   0.3585   0.3581   0.3582

Emergence of circadian rhythm

[Figure: correlation against age (days), with curves labeled 1, 2 and 3]
