Computing Nonnegative Matrix Factorizations

Nicolas Gillis

Joint work with François Glineur, Robert Luce, Stephen Vavasis, Arnaud Vandaele, Jérémy Cohen

Where is Mons?

Nonnegative Matrix Factorization (NMF)

Given a matrix $M \in \mathbb{R}^{p \times n}_+$ and a factorization rank $r \ll \min(p, n)$, find $U \in \mathbb{R}^{p \times r}$ and $V \in \mathbb{R}^{r \times n}$ such that

$$\min_{U \geq 0,\, V \geq 0} \|M - UV\|_F^2 = \sum_{i,j} (M - UV)_{ij}^2. \qquad \text{(NMF)}$$

NMF is a linear dimensionality reduction technique for nonnegative data:

$$\underbrace{M(:, i)}_{\geq 0} \;\approx\; \sum_{k=1}^{r} \underbrace{U(:, k)}_{\geq 0}\, \underbrace{V(k, i)}_{\geq 0} \quad \text{for all } i.$$

Why nonnegativity?

→ Interpretability: nonnegativity constraints lead to easily interpretable factors (and a sparse, part-based representation).
→ Many applications: image processing, text mining, hyperspectral unmixing, community detection, clustering, etc.
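A minimal numpy illustration of this model (a synthetic example, not one of the datasets discussed later): we build nonnegative factors, form M = UV, and check that each column of M is a nonnegative combination of the columns of U.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, r = 6, 10, 3
U = rng.random((p, r))   # U >= 0
V = rng.random((r, n))   # V >= 0
M = U @ V                # nonnegative data matrix of rank <= r

# Column i of M is the combination sum_k U[:, k] * V[k, i], with V[:, i] >= 0.
i = 0
assert np.allclose(M[:, i], sum(U[:, k] * V[k, i] for k in range(r)))
```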

Example 1: Blind hyperspectral unmixing

Figure: Urban hyperspectral image, 162 spectral bands and 307-by-307 pixels.

Problem. Identify the materials and classify the pixels.

Linear mixing model

Example 1: Blind hyperspectral unmixing with NMF

The basis elements allow us to recover the different endmembers: U ≥ 0;
the abundances of the endmembers in each pixel: V ≥ 0.

Urban hyperspectral image

Figure: Decomposition of the Urban dataset.

Example 2: topic recovery and document classification

The basis elements allow us to recover the different topics;
the weights allow us to assign each text to its corresponding topics.

Example 3: feature extraction and classification

The basis elements extract facial features such as eyes, nose and lips.

Outline

1. Computational complexity
2. Standard non-linear optimization schemes and acceleration
3. Exact NMF (M = UV) and its geometric interpretation
4. NMF under the separability assumption

Computational Complexity of NMF

Complexity of NMF

$$\min_{U \in \mathbb{R}^{p \times r},\, V \in \mathbb{R}^{r \times n}} \|M - UV\|_F^2 \quad \text{such that } U \geq 0,\ V \geq 0.$$

For r = 1, the problem is solvable in polynomial time via the Eckart–Young and Perron–Frobenius theorems (the leading singular vectors of a nonnegative matrix can be chosen nonnegative).

Checking whether there exists an exact factorization M = UV is NP-hard (Vavasis, 2009) when p, n and r are not fixed.

Using quantifier elimination (a reformulation with a fixed number of variables):
Cohen and Rothblum [1991]: $(mn)^{O(mr + nr)}$, non-polynomial.
Arora et al. [2012]: $(mn)^{O(2^r)}$, polynomial for fixed r.
Moitra [2013]: $(mn)^{O(r^2)}$, polynomial for fixed r → not really useful in practice...

This does not imply that rank$_+$ (the minimum r such that M = UV) can be computed in polynomial time (because there is no upper bound on rank$_+$).

Complexity for other norms

$$\min_{u \in \mathbb{R}^p,\, v \in \mathbb{R}^n} \|M - uv^T\|_1 = \sum_{i,j} |M_{ij} - u_i v_j|. \qquad (\ell_1 \text{ norm})$$

If M is binary, $M \in \{0,1\}^{p \times n}$, any optimal solution $(u^*, v^*)$ can be assumed to be binary, that is, $(u^*, v^*) \in \{0,1\}^p \times \{0,1\}^n$.

$$\min_{u \in \mathbb{R}^p,\, v \in \mathbb{R}^n} \|M - uv^T\|_W^2 = \sum_{i,j} W_{ij} (M - uv^T)_{ij}^2, \qquad \text{(weighted } \ell_2 \text{ norm)}$$

where W is a nonnegative weight matrix. This model can be used when
data is missing ($W_{ij} = 0$ for missing entries), or
entries have different variances ($W_{ij} = 1/\sigma_{ij}^2$).

G., Vavasis, On the Complexity of Robust PCA and ℓ1-Norm Low-Rank Matrix Approximation, Mathematics of Operations Research, 2018.
G., Glineur, Low-Rank Matrix Approximation with Weights or Missing Data is NP-hard, SIAM J. Matrix Anal. Appl., 2011.

NMF Algorithms and Acceleration

NMF Algorithms

Given a matrix $M \in \mathbb{R}^{m \times n}_+$ and a factorization rank $r \in \mathbb{N}$:

$$\min_{U \in \mathbb{R}^{m \times r}_+,\, V \in \mathbb{R}^{r \times n}_+} \|M - UV\|_F^2 = \sum_{i,j} (M - UV)_{ij}^2. \qquad \text{(NMF)}$$

This is a difficult non-linear optimization problem with potentially many local minima.

Standard framework (a minimal sketch follows below):

0. Initialize (U, V). Then, alternately update U and V:
1. Update $V \approx \operatorname{argmin}_{X \geq 0} \|M - UX\|_F^2$. (NNLS)
2. Update $U \approx \operatorname{argmin}_{Y \geq 0} \|M - YV\|_F^2$. (NNLS)

Most NMF algorithms come with no guarantees (except convergence to stationary points).

The solution is in general highly non-unique: identifiability issues.
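A minimal sketch of this alternating framework, assuming numpy/scipy; each NNLS subproblem is solved exactly, column by column, with scipy's active-set solver (fine for small examples; the HALS updates on the next slide scale better):

```python
import numpy as np
from scipy.optimize import nnls

def anls_nmf(M, r, n_iter=50, seed=0):
    """Alternating nonnegative least squares for min ||M - UV||_F^2, U, V >= 0."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    U = rng.random((m, r))
    for _ in range(n_iter):
        # Step 1: V <- argmin_{X >= 0} ||M - UX||_F^2 (one NNLS per column of M)
        V = np.column_stack([nnls(U, M[:, j])[0] for j in range(n)])
        # Step 2: U <- argmin_{Y >= 0} ||M - YV||_F^2 (one NNLS per row of M)
        U = np.column_stack([nnls(V.T, M[i, :])[0] for i in range(m)]).T
    return U, V
```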

Block coordinate descent method

Use block-coordinate descent on the NNLS subproblems → closed-form solutions for the columns of U and rows of V:

$$U_{:k}^* = \operatorname{argmin}_{U_{:k} \geq 0} \|R_k - U_{:k} V_{k:}\|_F^2 = \max\!\left(0, \frac{R_k V_{k:}^T}{\|V_{k:}\|_2^2}\right) \quad \forall k,$$

where $R_k = M - \sum_{j \neq k} U_{:j} V_{j:}$, and similarly for V. This is the so-called HALS algorithm (a sketch of the update follows the list below).

It can be accelerated:

1. Gauss–Seidel coordinate descent (Hsieh, Dhillon, 2011).
2. Loop several times over the columns of U / rows of V to perform more iterations at a lower computational cost (Glineur, G., 2012).
3. Randomized shuffling (Chow, Wu, Yin, 2017).
4. Use an extrapolation step $\hat{W}^{(k+1)} = W^{(k+1)} + \beta_k \big(W^{(k+1)} - W^{(k)}\big)$ (Ang, G., 2018).
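A sketch of one HALS sweep over the columns of U, assuming numpy; rather than forming each residual $R_k$ explicitly, it uses the standard identity $R_k V_{k:}^T = (MV^T)_{:k} - U(VV^T)_{:k} + U_{:k}(VV^T)_{kk}$, so a full sweep costs only one pair of matrix products:

```python
import numpy as np

def hals_sweep_U(M, U, V, eps=1e-16):
    """One HALS pass updating the columns of U in place for min ||M - UV||_F^2."""
    A = M @ V.T   # m x r, equals M V^T
    B = V @ V.T   # r x r, equals V V^T
    for k in range(U.shape[1]):
        denom = max(B[k, k], eps)  # guard against an all-zero row V[k, :]
        # Closed-form update: max(0, R_k V_k:^T / ||V_k:||_2^2)
        U[:, k] = np.maximum(0.0, U[:, k] + (A[:, k] - U @ B[:, k]) / denom)
    return U
```

The sweep for V follows by symmetry: `V = hals_sweep_U(M.T, V.T, U.T).T` updates the rows of V.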

Illustration on the CBCL face image data set

Exact NMF: Geometry and Extended Formulations

Geometric interpretation of exact NMF

Given M = UV, one can scale M and U so that they become column stochastic, implying that V is column stochastic too:

$$M = UV \iff M' = M D_M = (U D_U)(D_U^{-1} V D_M) = U' V'.$$

The columns of M are then convex combinations of the columns of U:

$$M_{:j} = \sum_{i=1}^{k} U_{:i} V_{ij} \quad \text{with} \quad \sum_{i=1}^{k} V_{ij} = 1 \ \forall j, \quad V_{ij} \geq 0 \ \forall i, j.$$

In other words,

$$\operatorname{conv}(M) \subseteq \operatorname{conv}(U) \subseteq S_n,$$

where conv(X) is the convex hull of the columns of X, and $S_n = \{x \in \mathbb{R}^n \mid x \geq 0,\ \sum_{i=1}^{n} x_i = 1\}$ is the unit simplex.

Exact NMF ≡ find r points whose convex hull is nested between two given polytopes.
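A quick numerical check of this rescaling, assuming numpy; $D_M$ and $D_U$ are the diagonal matrices of inverse column sums:

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.random((5, 3))
V = rng.random((3, 8))
M = U @ V

D_M = np.diag(1.0 / M.sum(axis=0))      # makes M' column stochastic
D_U = np.diag(1.0 / U.sum(axis=0))      # makes U' column stochastic
Mp, Up = M @ D_M, U @ D_U
Vp = np.diag(U.sum(axis=0)) @ V @ D_M   # V' = D_U^{-1} V D_M

assert np.allclose(Mp, Up @ Vp)          # M' = U'V' still holds
assert np.allclose(Vp.sum(axis=0), 1.0)  # and V' is column stochastic
```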

Geometric interpretation of NMF

Example: two nested hexagons ($\operatorname{rank}(M_a) = 3$), with

$$M_a = \frac{1}{a}\begin{pmatrix}
1 & a & 2a-1 & 2a-1 & a & 1\\
1 & 1 & a & 2a-1 & 2a-1 & a\\
a & 1 & 1 & a & 2a-1 & 2a-1\\
2a-1 & a & 1 & 1 & a & 2a-1\\
2a-1 & 2a-1 & a & 1 & 1 & a\\
a & 2a-1 & 2a-1 & a & 1 & 1
\end{pmatrix}, \quad a > 1.$$

Case 1: a = 2, $\operatorname{rank}_+(M_a) = 3$, col(M) = col(U).

Figure: $\Delta_p \cap \operatorname{col}(M_2)$, $\operatorname{conv}(M_2)$ and $\operatorname{conv}(U)$.

Case 2: a = 3, $\operatorname{rank}_+(M_a) = 4$, col(M) = col(U).

Figure: $\Delta_p \cap \operatorname{col}(M_3)$, $\operatorname{conv}(M_3)$ and $\operatorname{conv}(U)$.

Case 3: a → +∞, $\operatorname{rank}_+(M_a) = 5$, col(M) ≠ col(U).

An amazing result: NMF and extended formulations

Let P be a polytope

$$P = \{x \in \mathbb{R}^k \mid b_i - A(i,:)x \geq 0 \text{ for } 1 \leq i \leq m\},$$

and let the $v_j$'s ($1 \leq j \leq n$) be its vertices.

We define the m-by-n slack matrix $S_P$ of P as follows:

$$S_P(i, j) = b_i - A(i,:)v_j \geq 0, \quad 1 \leq i \leq m,\ 1 \leq j \leq n.$$

The hexagon:

$$S_P = \begin{pmatrix}
0 & 1 & 2 & 2 & 1 & 0\\
0 & 0 & 1 & 2 & 2 & 1\\
1 & 0 & 0 & 1 & 2 & 2\\
2 & 1 & 0 & 0 & 1 & 2\\
2 & 2 & 1 & 0 & 0 & 1\\
1 & 2 & 2 & 1 & 0 & 0
\end{pmatrix}$$

An extended formulation of P is a higher-dimensional polyhedron $Q \subseteq \mathbb{R}^{k+p}$ that (linearly) projects onto P. The minimum number of facets of such a polytope is called the extension complexity xp(P) of P.

Theorem (Yannakakis, 1991). $\operatorname{rank}_+(S_P) = \operatorname{xp}(P)$.

Proof (one direction). Given $P = \{x \in \mathbb{R}^k \mid b - Ax \geq 0\}$, any exact NMF $S_P = UV$ with $U \geq 0$, $V \geq 0$ provides an explicit extended formulation (with some redundant equalities) of P:

$$P = \{x \mid b - Ax \geq 0\} = \{x \mid b - Ax = Uy \text{ for some } y \geq 0\}.$$

Remark. The slack matrix $S_P$ of P satisfies

$$\operatorname{conv}(S_P) = S_m \cap \operatorname{col}(S_P).$$

To get a small factorization, we need to go to a higher-dimensional space: rank(U) > rank(M).

The Hexagon

$$S_P = \begin{pmatrix}
0 & 1 & 2 & 2 & 1 & 0\\
0 & 0 & 1 & 2 & 2 & 1\\
1 & 0 & 0 & 1 & 2 & 2\\
2 & 1 & 0 & 0 & 1 & 2\\
2 & 2 & 1 & 0 & 0 & 1\\
1 & 2 & 2 & 1 & 0 & 0
\end{pmatrix}
= \begin{pmatrix}
1 & 0 & 0 & 1/2 & 0\\
0 & 1 & 0 & 1 & 0\\
0 & 0 & 1 & 1/2 & 0\\
0 & 0 & 1 & 0 & 1/2\\
0 & 1 & 0 & 0 & 1\\
1 & 0 & 0 & 0 & 1/2
\end{pmatrix}
\begin{pmatrix}
0 & 1 & 2 & 1 & 0 & 0\\
0 & 0 & 1 & 0 & 0 & 1\\
1 & 0 & 0 & 0 & 1 & 2\\
0 & 0 & 0 & 2 & 2 & 0\\
2 & 2 & 0 & 0 & 0 & 0
\end{pmatrix},$$

with

$$\operatorname{rank}(S_P) = 3 \leq \operatorname{rank}_+(S_P) = 5 \leq \min(m, n) = 6.$$
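A numerical check of this factorization, assuming numpy:

```python
import numpy as np

# Slack matrix of the regular hexagon and its rank-5 nonnegative factorization.
S = np.array([[0,1,2,2,1,0],[0,0,1,2,2,1],[1,0,0,1,2,2],
              [2,1,0,0,1,2],[2,2,1,0,0,1],[1,2,2,1,0,0]])
U = np.array([[1,0,0,.5,0],[0,1,0,1,0],[0,0,1,.5,0],
              [0,0,1,0,.5],[0,1,0,0,1],[1,0,0,0,.5]])
V = np.array([[0,1,2,1,0,0],[0,0,1,0,0,1],[1,0,0,0,1,2],
              [0,0,0,2,2,0],[2,2,0,0,0,0]])

assert np.allclose(U @ V, S)               # exact NMF with 5 factors
print(np.linalg.matrix_rank(S))            # 3, while rank_+(S) = 5
```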

Some implications

Problem: limits of LP for solving combinatorial problems: given a polytope, what is the most compact way to represent it?
Its extension complexity = the nonnegative rank of its slack matrix.
Key tool: lower-bound techniques for the nonnegative rank.

Ex. The matching problem cannot be solved via a polynomial-size LP.
Rothvoss (2014). The matching polytope has exponential extension complexity, STOC.

This can be generalized to
approximations (no poly-size LP can approximate these problems up to some precision):
Braun, Fiorini, Pokutta & Steurer (2012). Approximation limits of linear programs (beyond hierarchies), FOCS.
any convex cone, in particular PSD (the so-called PSD-rank):
See the survey: Fawzi, Gouveia, Parrilo, Robinson & Thomas, Positive semidefinite rank, Mathematical Programming, 2015.

Exact NMF computation and regular n-gons

Can we use numerical solvers to get insight into these problems? Yes!

We have developed a library to compute exact NMFs for small matrices using meta-heuristics.
[V14] Vandaele, G., Glineur & D. Tuyttens, Heuristics for Exact NMF (2014).

Extension complexity of the octagon?

$$\operatorname{rank}(S_P) = 3 \leq \operatorname{rank}_+(S_P) = 6 \leq \min(m, n) = 8.$$

We observed a special structure in the solutions for regular n-gons, leading to the best known upper bound and closing the gap for some n-gons:

$$\operatorname{rank}_+(S_n) \leq
\begin{cases}
2\lceil \log_2(n) \rceil - 1 & \text{for } 2^{k-1} < n \leq 2^{k-1} + 2^{k-2},\\
2\lceil \log_2(n) \rceil & \text{for } 2^{k-1} + 2^{k-2} < n \leq 2^k.
\end{cases}$$

[V15] Vandaele, G. & Glineur, On the Linear Extension Complexity of Regular n-gons (2015).

Implication: conic quadratic programming is 'polynomially reducible' to linear programming.
[BTN01] Ben-Tal and Nemirovski (2001). On polyhedral approximations of the second-order cone. Mathematics of Operations Research, 26(2), 193-205.

NMF under the separability assumption

Separability Assumption

Separability of M: there exists an index set $\mathcal{K}$ with $|\mathcal{K}| = r$ and $V \geq 0$ such that

$$M = \underbrace{M(:, \mathcal{K})}_{U} V.$$

[AGKM12] Arora, Ge, Kannan, Moitra, Computing a Nonnegative Matrix Factorization – Provably, STOC 2012.

Applications

In hyperspectral imaging, this is the pure-pixel assumption: for each material, there is a 'pure' pixel containing only that material.
[M+14] Ma et al., A Signal Processing Perspective on Hyperspectral Unmixing: Insights from Remote Sensing, IEEE Signal Processing Magazine 31(1):67-81, 2014.

In document classification: for each topic, there is a 'pure' word used only by that topic (an 'anchor' word).
[A+13] Arora et al., A Practical Algorithm for Topic Modeling with Provable Guarantees, ICML 2013.

Time-resolved Raman spectra analysis: each substance has a peak in its spectrum where the other spectra are close to zero.
[L+16] Luce et al., Using Separable Nonnegative Matrix Factorization for the Analysis of Time-Resolved Raman Spectra, Appl. Spectrosc. 2016.

Others: video summarization, foreground-background separation.
[ESV12] Elhamifar, Sapiro, Vidal, See all by looking at a few: Sparse modeling for finding representative objects, CVPR 2012.
[KSK13] Kumar, Sindhwani, Near-separable Non-negative Matrix Factorization with ℓ1- and Bregman Loss Functions, SIAM Data Mining 2015.

Geometric Interpretation

The columns of U are the vertices of the convex hull of the columns of M:

$$M(:, j) = \sum_{k=1}^{r} U(:, k)\, V(k, j) \quad \forall j, \quad \text{where} \quad \sum_{k=1}^{r} V(k, j) = 1, \ V \geq 0.$$

Geometric Interpretation with Noise

The columns of U are the vertices of the convex hull of the columns of M:

$$M(:, j) \approx \sum_{k=1}^{r} U(:, k)\, V(k, j) \quad \forall j, \quad \text{where} \quad \sum_{k=1}^{r} V(k, j) = 1, \ V \geq 0.$$

Goal: a theoretical analysis of the robustness to noise of separable NMF algorithms.

Key Parameters: Noise and Conditioning

We assume

$$M = U [I_r, V'] \Pi + N,$$

where $V' \geq 0$, Π is a permutation and N is the noise.

We will assume that the noise is bounded (but otherwise arbitrary),

$$\|N(:, j)\|_2 \leq \epsilon \quad \text{for all } j,$$

and some dependence on the conditioning $\kappa(U) = \frac{\sigma_{\max}(U)}{\sigma_{\min}(U)}$ is unavoidable.

Successive Projection Algorithm (SPA)

0: Initially $\mathcal{K} = \emptyset$.
For i = 1 : r
  1: Find $j^* = \operatorname{argmax}_j \|M(:, j)\|$.
  2: $\mathcal{K} = \mathcal{K} \cup \{j^*\}$.
  3: $M \leftarrow (I - uu^T)\, M$ where $u = \frac{M(:, j^*)}{\|M(:, j^*)\|_2}$.
end
∼ modified Gram–Schmidt with column pivoting.

Theorem. If $\epsilon \leq O\!\left(\frac{\sigma_{\min}(U)}{\sqrt{r}\, \kappa^2(U)}\right)$, SPA satisfies

$$\|U - M(:, \mathcal{K})\| = \max_{1 \leq k \leq r} \|U(:, k) - M(:, \mathcal{K}(k))\| \leq O\!\left(\epsilon\, \kappa^2(U)\right).$$

Advantages. Extremely fast, no parameters.
Drawbacks. Requires U to be full rank; the bound is weak.

[GV14] G., Vavasis, Fast and Robust Recursive Algorithms for Separable Nonnegative Matrix Factorization, IEEE Trans. Patt. Anal. Mach. Intell. 36 (4), pp. 698-714, 2014.
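A direct numpy translation of this pseudocode (a sketch; the column norms are taken to be ℓ2):

```python
import numpy as np

def spa(M, r):
    """Successive Projection Algorithm; returns the selected column indices K."""
    M = M.astype(float).copy()   # work on a copy, M is projected in place
    K = []
    for _ in range(r):
        j = int(np.argmax(np.linalg.norm(M, axis=0)))  # step 1
        K.append(j)                                    # step 2
        u = M[:, j] / np.linalg.norm(M[:, j])
        M -= np.outer(u, u @ M)                        # step 3: M <- (I - uu^T) M
    return K
```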

Pre-conditioning for More Robust SPA

Observation. Pre-multiplying M preserves separability:

$$PM = P\big(U[I_r, V']\Pi + N\big) = (PU)[I_r, V']\Pi + PN.$$

Ideally, $P = U^{-1}$, so that $\kappa(PU) = 1$ (assuming m = r).

Solving for the minimum-volume ellipsoid centered at the origin and containing all the columns of M (which is SDP representable),

$$\min_{A \in \mathbb{S}^{r}_{+}} \log \det(A)^{-1} \quad \text{s.t.} \quad m_j^T A\, m_j \leq 1 \ \forall j,$$

allows one to approximate $U^{-1}$: in fact, $A^* \approx (UU^T)^{-1}$.

Theorem. If $\epsilon \leq O\!\left(\frac{\sigma_{\min}(U)}{r\sqrt{r}}\right)$, preconditioned SPA satisfies

$$\|U - M(:, \mathcal{K})\| \leq O(\epsilon\, \kappa(U)).$$

[GV15] G., Vavasis, SDP-based Preconditioning for More Robust Near-Separable NMF, SIAM J. on Optimization, 2015.
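A sketch of this ellipsoid computation, assuming cvxpy with a conic solver that supports log_det (e.g. SCS), and assuming M has already been reduced to r rows (for instance via a truncated SVD) so that m = r as above; since $A^* \approx (UU^T)^{-1}$, taking $P = (A^*)^{1/2}$ gives $\kappa(PU) \approx 1$:

```python
import cvxpy as cp
import numpy as np
from scipy.linalg import sqrtm

def min_vol_ellipsoid_precond(M):
    """Minimum-volume origin-centered ellipsoid {x : x^T A x <= 1} containing
    the columns of M (assumed to have r rows), solved as an SDP; returns the
    preconditioner P = A^{1/2}."""
    r, n = M.shape
    A = cp.Variable((r, r), PSD=True)
    cons = [cp.quad_form(M[:, j], A) <= 1 for j in range(n)]
    # min log det(A)^{-1} is the same as max log det(A)
    cp.Problem(cp.Maximize(cp.log_det(A)), cons).solve(solver=cp.SCS)
    return np.real(sqrtm(A.value))
```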

Geometric Interpretation

Figure: geometric interpretation of the SDP-based preconditioning.

See also Mizutani, Ellipsoidal Rounding for Nonnegative Matrix Factorization Under Noisy Separability, JMLR, 2014.

Synthetic data sets

Each entry of $U \in \mathbb{R}^{40 \times 20}_+$ is uniform in [0, 1]; each column is normalized.
The other columns of M are the middle points of pairs of columns of U (hence there are $\binom{20}{2} = 190$ of them).
The noise moves the middle points toward the outside of the convex hull of the columns of U.

Results for the synthetic data sets

Figure: average fraction of columns correctly extracted as a function of the noise level (for each noise level, 25 matrices are generated).

Combinatorial formulation for separable NMF

We want to find the index set $\mathcal{K}$ with $|\mathcal{K}| = r$ such that

$$M = M(:, \mathcal{K})\, V.$$

This is equivalent to finding $X \in \mathbb{R}^{n \times n}$ with r non-zero rows such that

$$M = MX.$$

A combinatorial formulation:

$$\min_X \|X\|_{\text{row},0} \quad \text{such that} \quad M = MX \ \text{ or } \ \|M - MX\| \leq \epsilon.$$

How to make X row sparse?

Page 100: Nicolas Gillis Joint work with Fran˘cois Glineur, …laurent.risser.free.fr › TMP_SHARE › optimCIMI2018 › slides...Computing Nonnegative Matrix Factorizations Nicolas Gillis

A Linear Optimization Model

min_{X ∈ R^{n×n}_+} trace(X) = ||diag(X)||_1

such that ||M − MX|| ≤ ε, X_ij ≤ X_ii ≤ 1 for all i, j.

Robustness: noise ≤ O(κ^{−1}) ⇒ error ≤ O(rεκ) [GL14].

This model is an improvement over [B+12]: it is more robust and detects the factorization rank r automatically.

It is equivalent [GL16] to using ||X||_{1,∞} = Σ_{i=1}^n ||X(i,:)||_∞ as a convex surrogate for ||X||_{row,0} [E+12].

[GL14] G., Luce, Robust Near-Separable NMF Using Linear Optimization, JMLR 2014.

[B+12] Bittorf, Recht, Ré, Tropp, Factoring nonnegative matrices with LPs, NIPS 2012.

[E+12] Esser et al., A convex model for NMF and dimensionality reduction on physical space, IEEE Trans. Image Processing, 2012.

[GL16] G. and Luce, A Fast Gradient Method for Nonnegative Sparse Regression with Self Dictionary, IEEE Trans. Image Processing, 2018.
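A direct transcription of the model in a convex modeling language is straightforward. The sketch below uses cvxpy and, for simplicity, the Frobenius norm (a polyhedral norm such as ℓ1 would yield a genuine LP). Selecting K from the largest diagonal entries of the solution is a common post-processing step, stated here as an assumption.

```python
import numpy as np
import cvxpy as cp

def self_dictionary_model(M, eps, r):
    n = M.shape[1]
    X = cp.Variable((n, n), nonneg=True)
    cons = [cp.norm(M - M @ X, 'fro') <= eps, cp.diag(X) <= 1]
    cons += [X[i, :] <= X[i, i] for i in range(n)]   # X_ij <= X_ii for all j
    cp.Problem(cp.Minimize(cp.trace(X)), cons).solve()
    # Assumption: the r largest diagonal entries indicate the extracted columns.
    return np.argsort(-np.diag(X.value))[:r]
```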


Practical Model and Algorithm

min_{X ∈ Ω} ||M − MX||_F^2 + μ tr(X),

Ω = { X ∈ R^{n×n} | X_ii ≤ 1, w_i X_ij ≤ w_j X_ii ∀ i, j }.

We used a fast gradient method (an optimal first-order method):

1. Choose an initial point X^(0); set Y = X^(0) and α_1 ∈ (0, 1).

2. For k = 1, 2, . . .

   2.1 X^(k) = P_Ω(Y − (1/L) ∇f(Y)).

   2.2 Y = X^(k) + β_k (X^(k) − X^(k−1)),

   where β_k = α_k(1 − α_k) / (α_k^2 + α_{k+1}), with α_{k+1} ≥ 0 such that α_{k+1}^2 = (1 − α_{k+1}) α_k^2.

The projection onto Ω can be computed efficiently in O(n^2 log(n)) operations.

The total computational cost is O(pn^2) operations.

[GL16] G. and Luce, A Fast Gradient Method for Nonnegative Sparse Regression with Self Dictionary, IEEE Trans. Image Processing, 2018.
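A minimal sketch of this scheme for f(X) = ||M − MX||_F^2 + μ tr(X), with the α/β updates above. The exact O(n^2 log(n)) projection onto Ω is more involved (see [GL16]); the clipping projection below is only a crude stand-in, labeled as such.

```python
import numpy as np

def fgm(M, mu, proj, iters=500, alpha=0.5):
    """Fast gradient method for min_{X in Omega} ||M - MX||_F^2 + mu*tr(X)."""
    n = M.shape[1]
    MtM = M.T @ M
    L = 2 * np.linalg.norm(MtM, 2)                # Lipschitz constant of grad f
    X = np.zeros((n, n)); Y = X.copy()
    for _ in range(iters):
        grad = 2 * (MtM @ Y - MtM) + mu * np.eye(n)
        X_new = proj(Y - grad / L)
        # alpha_{k+1} >= 0 solves alpha_{k+1}^2 = (1 - alpha_{k+1}) * alpha_k^2.
        alpha_new = 0.5 * alpha * (np.sqrt(alpha**2 + 4) - alpha)
        beta = alpha * (1 - alpha) / (alpha**2 + alpha_new)
        Y = X_new + beta * (X_new - X)
        X, alpha = X_new, alpha_new
    return X

def proj_simple(Z):
    """Crude stand-in for P_Omega: enforces 0 <= X_ij <= X_ii <= 1 only
    (the true projection of [GL16] also handles the weights w_i)."""
    Z = np.clip(Z, 0, 1)
    d = np.diag(Z).copy()
    Z = np.minimum(Z, d[:, None])
    np.fill_diagonal(Z, d)
    return Z
```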


Hyperspectral unmixing

                  r = 6                        r = 8
            Time (s.)  Rel. err. (%)    Time (s.)  Rel. err. (%)
VCA             1.02       18.05            1.05       22.68
VCA-500         0.03        7.19            0.09        7.25
SPA             0.26        9.58            0.32        9.45
SPA-500        <0.01       10.05           <0.01        8.86
SNPA           13.60        9.63           23.02        5.64
SNPA-500        0.15       10.05            0.25        8.86
XRAY           28.17        7.50           95.34        6.82
XRAY-500        0.15        8.07            0.28        7.36
H2NMF          12.20        5.81           14.92        5.47
H2NMF-500       0.27        5.87            0.37        5.68
FGNSR-500      40.11        5.07           39.49        4.08

Table: Numerical results for the Urban HSI (the lowest relative error in each case is achieved by FGNSR-500).


Figure: Abundance maps extracted by FGNSR-500.


Minimum-volume NMF: Relaxing separability

Separable NMF:

min_{K, V≥0} ||M − M(:,K)V||_F^2 such that |K| = r.

Relaxing the constraint that the columns of U are chosen among the columns of M leads to minimum-volume NMF:

min_{U≥0, V≥0} vol(U) such that ||M − UV||_F^2 ≤ ε,

where vol(U) ∼ det(U^T U) and V(:, j) ∈ Δ^r for all j.

Open problems: efficient algorithms for min-vol NMF, robustness to noise.

Fu, Huang, Sidiropoulos, Ma, Nonnegative matrix factorization for signal and data analytics: Identifiability, algorithms, and applications, arXiv:1803.01257, 2018.
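In practice the constrained problem is often handled in penalized form with a smoothed volume term. A sketch of the resulting objective, where the penalty weight lam and the smoothing delta are assumptions (log det(UᵀU + δI) is a common choice in the min-vol literature):

```python
import numpy as np

def minvol_objective(M, U, V, lam, delta=1e-3):
    """Penalized min-vol objective; lam and delta are assumptions."""
    fit = np.linalg.norm(M - U @ V, 'fro')**2
    vol = np.log(np.linalg.det(U.T @ U + delta * np.eye(U.shape[1])))
    return fit + lam * vol
```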


Identifiability with sparsity

Decompose a low-rank matrix whose coefficients have known sparsity:

M = UV, rank(M) = rank(U) = r, ||V(:, j)||_0 ≤ k = r − s < r ∀ j.

Many theoretical results (see, e.g., [Gribonval 16]) and algorithms (dictionary learning) exist. But:

✗ Not many results are specific to the low-rank case.

✗ There are only two deterministic identifiability results [Elad 06, Georgiev 05].

✗ Not much exists in the NMF case except ℓ1 regularization.


Identifiability with sparsity: example

Example: p = 3, r = 3, s = sparsity = 1, n = 9.

Figure: data points, first decomposition, second decomposition (two essentially different decompositions of the same data).


Identifiability results

Theorem

Let M = UV where rank(U) = rank(M) = r and each column of V has at least s zeros. The factorization (U, V) is essentially unique if, on each hyperplane spanned by all but one column of U, there are ⌊r(r−2)/s⌋ + 1 data points with spark r.

✓ For s = 1, this requires r^3 − 2r^2 + r data points, and it is tight up to the constant r (counterexamples exist for any n = r^3 − 2r^2).

✓ For s = r − 1, this requires r data points, and it is tight (one on each intersection of r − 1 hyperplanes).

✓ It is tight up to constant factors for any s = βr, for any fixed constant β.

✓ Nonnegativity is not taken into account in the analysis; it helps both in theory and in practice (further work).

[CG18] Cohen, G., Identifiability of Low-Rank Sparse Component Analysis, arXiv:1808.08765.
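The spark condition can be checked by brute force on small examples (the spark is the size of the smallest linearly dependent subset of columns, so computing it is hard in general). A hedged helper, with an illustrative name:

```python
import numpy as np
from itertools import combinations

def spark(A, tol=1e-10):
    """Smallest number of linearly dependent columns of A (brute force)."""
    n = A.shape[1]
    for k in range(1, n + 1):
        for cols in combinations(range(n), k):
            if np.linalg.matrix_rank(A[:, list(cols)], tol=tol) < k:
                return k
    return np.inf   # all columns are linearly independent
```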


Geometric intuition

Example: p = 3, r = 3, sparsity = 1, n = 4 + 3 + 2 = 9.

Figure: data points, unique decomposition.


Sparsity in action

Spectral unmixing, r = 6, s = 4.

✓ Sparsity is another way to obtain identifiability for matrix decompositions.

✗ It leads to hard combinatorial problems to solve...


Take-home messages

1. NMF is a useful and widely used linear model in data analysis and machine learning.

2. NMF is difficult (NP-hard) and ill-posed (non-uniqueness).

3. NMF is closely related to the nested polytopes problem and to extended formulations.

4. NMF with a (self-)dictionary is tractable and well-posed (separable NMF).

5. To obtain identifiable NMF models, minimum volume or sparsity can be used but, as opposed to separability, they do not lead to tractable models. This is an important direction of research (robustness to noise, tractability).


Thank you for your attention!

Code and papers available from https://sites.google.com/site/nicolasgillis
