Outline
1 Introduction
2 Definitions
3 Algorithms and Complexity
4 Experiments
    Synthetic Data
    Real Data
5 Conclusions
Saara Hyvönen, Pauli Miettinen, and Evimaria Terzi
Interpretable Nonnegative Matrix Decompositions
A Motivating Problem
A dialectologist has some dialectal information in a matrix $A = (a_{ij})$:
    rows correspond to dialectal features
    columns correspond to areas (e.g., municipalities)
    $a_{ij} = 1$ if feature $i$ is present in the dialect spoken in area $j$.
The dialectologist wants to solve the following two problems:
    1 What are the $k$ main characteristic features of dialects?
    2 What are the $k$ characteristic areas for dialects?
      (For example, to conduct further studies on a few selected areas.)
Some type of matrix decomposition is sought.
First Idea: NMF
The dialectologist does not want to see negative values in the decomposition.
    "The dialect spoken in area A contains 1.2 of feature X and −0.2 of feature Y" vs. "The dialect spoken in area A contains 0.7 of feature Z and 0.3 of feature V."
    Negative values can yield negative features.
She considers Nonnegative Matrix Factorization (NMF).
    $A$ is represented as $A \approx WH$, where $W$ and $H$ are nonnegative and their inner dimension is $k$.
But the columns of $W$ and the rows of $H$ are just some nonnegative vectors.
⇒ They do not give the dialectologist her characteristic areas and features.
Second Idea: CX and CUR Decompositions
The dialectologist could use Column (CX) and Column-Row (CUR) decompositions.
    CX: Matrix $A$ is represented as $A \approx CX$, with $C$ containing $k$ columns of $A$ (while $X$ is arbitrary).
    CUR: Matrix $A$ is represented as $A \approx CUR$, with $C$ as above and $R$ containing $r$ rows of $A$ (while $U$ is arbitrary).
The columns of $C$ and the rows of $R$ now give the desired characteristic areas and features.
But now $X$ and $U$ can have negative values.
Solution: Nonnegative CX and CUR Decompositions
The dialectologist's solution is to force $X$ and $U$ to be nonnegative as well. Thus:
    Characteristic areas are given by the columns of $C$.
    Characteristic features are given by the rows of $R$ (or by the columns of $C$ when the CX decomposition is applied to $A^T$).
    Other features and areas are represented using only nonnegative linear combinations.
[Figure: dialect regions labeled Southeastern, Northern, S.E. Tavastian, Central Tavastian, Savonian, Southwestern, and S. Ostrobothnian.]
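The Nonnegative CX Decomposition
Problem (Nonnegative CX Decomposition, NNCX)
Given a matrix $A \in \mathbb{R}_+^{m \times n}$ and an integer $k$, find an $m \times k$ matrix $C$ of $k$ columns of $A$ and a matrix $X \in \mathbb{R}_+^{k \times n}$ minimizing $\|A - CX\|_F$.
Example:
$$A = \begin{pmatrix} 0.6 & 0.9 & 0.6 & 0.4 & 0.7 \\ 1.0 & 0.7 & 0.9 & 1.0 & 0.9 \\ 0.6 & 0.5 & 0.2 & 0.4 & 1.0 \end{pmatrix}$$
$$C = \begin{pmatrix} 0.6 & 0.9 & 0.6 \\ 1.0 & 0.7 & 0.9 \\ 0.6 & 0.5 & 0.2 \end{pmatrix}$$
$$X = \begin{pmatrix} 1.0 & 0.0 & 0.0 & 0.9 & 1.7 \\ 0.0 & 1.0 & 0.0 & 0.0 & 0.5 \\ 0.0 & 0.0 & 1.0 & 0.5 & 0.0 \end{pmatrix}$$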
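The Nonnegative CUR Decomposition
Problem (Nonnegative CUR Decomposition, NNCUR)
Given a matrix $A \in \mathbb{R}_+^{m \times n}$ and integers $k$ and $r$, find an $m \times k$ matrix $C$ of $k$ columns of $A$, an $r \times n$ matrix $R$ of $r$ rows of $A$, and a matrix $U \in \mathbb{R}_+^{k \times r}$ minimizing $\|A - CUR\|_F$.
Example:
$$A = \begin{pmatrix} 0.6 & 0.9 & 0.6 & 0.4 & 0.7 \\ 1.0 & 0.7 & 0.9 & 1.0 & 0.9 \\ 0.6 & 0.5 & 0.2 & 0.4 & 1.0 \end{pmatrix}$$
$$C = \begin{pmatrix} 0.6 & 0.9 & 0.6 \\ 1.0 & 0.7 & 0.9 \\ 0.6 & 0.5 & 0.2 \end{pmatrix}$$
$$U = \begin{pmatrix} 0.0 & 1.3 \\ 2.2 & 0.0 \\ 0.0 & 0.7 \end{pmatrix}$$
$$R = \begin{pmatrix} 0.6 & 0.9 & 0.6 & 0.4 & 0.7 \\ 1.0 & 0.7 & 0.9 & 1.0 & 0.9 \end{pmatrix}$$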
NNCX as a Convex Cone
Columns of $A$ represent points in space.
$$A = \begin{pmatrix} 0.6 & 0.9 & 0.6 & 0.4 & 0.7 \\ 1.0 & 0.7 & 0.9 & 1.0 & 0.9 \\ 0.6 & 0.5 & 0.2 & 0.4 & 1.0 \end{pmatrix}$$
[Figure: the columns of $A$ plotted as points in three-dimensional space.]
$C$ selects some of these points (in the example, the first three columns of $A$).
[Figure: the same plot with the points selected by $C$ marked.]
Points in $C$ generate some convex cone $\mathcal{C}$.
$v \in \mathcal{C}$ if there is $x \in \mathbb{R}_+^k$ s.t. $v = Cx$.
[Figure: the convex cone generated by the points in $C$.]
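$\|A - CX\|_F^2$ equals the sum of squared shortest distances from $A$'s columns to the cone's points. For the fourth column of $A$, for example:
$$\left\| \begin{pmatrix} 0.6 & 0.9 & 0.6 \\ 1.0 & 0.7 & 0.9 \\ 0.6 & 0.5 & 0.2 \end{pmatrix} \begin{pmatrix} 0.6 \\ 0 \\ 0.3 \end{pmatrix} - \begin{pmatrix} 0.4 \\ 1.0 \\ 0.4 \end{pmatrix} \right\|_2^2 = 0.0369$$
[Figure: the cone with the distances from the columns of $A$ to it.]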
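The arithmetic of the convex cone example (matrix $C$, coefficient vector $(0.6, 0, 0.3)$, target column $(0.4, 1.0, 0.4)$, result 0.0369) can be checked with a few lines of plain Python. This is only an illustrative check of that one column; the variable names are chosen here and do not come from the slides.

```python
# C holds the first three columns of the example matrix A (stored row by row).
C = [[0.6, 0.9, 0.6],
     [1.0, 0.7, 0.9],
     [0.6, 0.5, 0.2]]
a = [0.4, 1.0, 0.4]   # fourth column of A
x = [0.6, 0.0, 0.3]   # nonnegative coefficients used in the example

# Point of the cone generated by C: v = C x
v = [sum(c_ij * x_j for c_ij, x_j in zip(row, x)) for row in C]

# Squared Euclidean distance between the column a and the cone point v
sq_dist = sum((v_i - a_i) ** 2 for v_i, a_i in zip(v, a))
print(round(sq_dist, 4))  # 0.0369
```

Summing this quantity over all columns of $A$, each with its own nonnegative coefficient vector, gives $\|A - CX\|_F^2$.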
The Two Subproblems of [NN]CX
Finding matrix $C$ (also known as the Column Subset Selection problem)
    more combinatorial in nature
    the nonnegativity constraint, in general, has no effect
    computational complexity is unknown (assumed to be NP-hard)
Finding matrix $X$ when some matrix $C$ is given
    a least-squares fitting problem (constrained, in NNCX)
    well-known methods solve the problem in polynomial time
        for CX, one can use the Moore–Penrose pseudo-inverse: $X = C^\dagger A$
        for NNCX, the problem is a convex quadratic program (solved using, e.g., quasi-Newton methods).
The Local Algorithm for NNCX
Assume we can find $X$ when $C$ is given. Local performs a standard greedy local search to select $C$.
Local
1 initialize $C$ randomly and compute $X$
2 while reconstruction error decreases
    1 select $c$, a column of $C$, and $a$, a column of $A$ not in $C$, such that replacing $c$ with $a$ decreases the reconstruction error the most
    2 replace $c$ with $a$
3 compute $X$ and return $C$ and $X$
N.B. We need to solve for $X$ in step 2.1.
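The Local pseudocode above could be sketched in plain Python roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`solve_x`, `nncx_error`, `local_search`) are hypothetical, $X$ is computed by least squares via the normal equations followed by setting negative entries to zero, and the selected columns are assumed to be linearly independent so that the normal equations can be solved.

```python
import random

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(P, Q):
    Qt = transpose(Q)
    return [[sum(p * q for p, q in zip(row, col)) for col in Qt] for row in P]

def solve(G, B):
    """Solve G Y = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(G)
    aug = [list(g) + list(b) for g, b in zip(G, B)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(aug[r][i]))
        aug[i], aug[p] = aug[p], aug[i]
        piv = aug[i][i]  # assumed nonzero (linearly independent columns)
        aug[i] = [v / piv for v in aug[i]]
        for r in range(n):
            if r != i:
                f = aug[r][i]
                aug[r] = [vr - f * vi for vr, vi in zip(aug[r], aug[i])]
    return [row[n:] for row in aug]

def solve_x(C, A):
    """Least-squares X for A ~ CX, then negative entries set to zero."""
    Ct = transpose(C)
    X = solve(matmul(Ct, C), matmul(Ct, A))
    return [[max(v, 0.0) for v in row] for row in X]

def nncx_error(A, idx):
    """Frobenius error ||A - CX||_F when C consists of columns idx of A."""
    C = [[row[j] for j in idx] for row in A]
    CX = matmul(C, solve_x(C, A))
    return sum((a - b) ** 2 for ra, rb in zip(A, CX) for a, b in zip(ra, rb)) ** 0.5

def local_search(A, k, seed=0):
    n = len(A[0])
    idx = random.Random(seed).sample(range(n), k)    # step 1: random C
    best = nncx_error(A, idx)
    while True:                                       # step 2
        swaps = [idx[:p] + [j] + idx[p + 1:]
                 for p in range(k) for j in range(n) if j not in idx]
        err, cand = min((nncx_error(A, s), s) for s in swaps)  # step 2.1
        if err >= best - 1e-12:                       # no swap improves: stop
            break
        best, idx = err, cand                         # step 2.2
    C = [[row[j] for j in idx] for row in A]
    return idx, C, solve_x(C, A)                      # step 3

A = [[0.6, 0.9, 0.6, 0.4, 0.7],
     [1.0, 0.7, 0.9, 1.0, 0.9],
     [0.6, 0.5, 0.2, 0.4, 1.0]]
idx, C, X = local_search(A, k=2)
print(sorted(idx), nncx_error(A, idx))
```

Each pass evaluates all $k(n-k)$ candidate swaps, and each evaluation solves for $X$ once, which is why a cheap way of computing $X$ matters for this algorithm.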
The ALS Algorithm
The ALS algorithm uses the alternating least squares method often employed in NMF algorithms.
ALS
1 initialize $\tilde{C}$ randomly
2 while reconstruction error decreases
    1 find nonnegative $X$ to minimize $\|A - \tilde{C}X\|_F$
    2 find nonnegative $\tilde{C}$ to minimize $\|A - \tilde{C}X\|_F$
3 match columns of $\tilde{C}$ to their nearest columns in $A$
4 let $C$ be those columns, compute $X$ and return $C$ and $X$
$\tilde{C}$ does not contain $A$'s columns, hence the matching in step 3.
Matching can be done in polynomial time.
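Step 3 of ALS could be sketched as below. The slides only state that the matching can be done in polynomial time, so this greedy nearest-column assignment is a simplified stand-in rather than the authors' method, and it does not necessarily minimize the total distance. The names `sq_dist` and `match_columns` and the sample values of $\tilde{C}$ are made up for illustration.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def match_columns(C_tilde_cols, A_cols):
    """Greedily assign each column of C~ to its nearest unused column of A.

    Both arguments are lists of columns; returns the chosen column indices.
    """
    unused = set(range(len(A_cols)))
    chosen = []
    for c in C_tilde_cols:
        j = min(sorted(unused), key=lambda i: sq_dist(c, A_cols[i]))
        chosen.append(j)
        unused.remove(j)   # each column of A is used at most once
    return chosen

# Columns of the example matrix A
A_cols = [[0.6, 1.0, 0.6], [0.9, 0.7, 0.5], [0.6, 0.9, 0.2],
          [0.4, 1.0, 0.4], [0.7, 0.9, 1.0]]
# Hypothetical output of the alternating steps (not taken from the slides)
C_tilde_cols = [[0.58, 1.0, 0.61], [0.7, 0.9, 0.95]]
print(match_columns(C_tilde_cols, A_cols))  # [0, 4]
```

An exact minimum-cost matching would instead solve an assignment problem between the columns of $\tilde{C}$ and the columns of $A$.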
How to Use Columns: Convex Quadratic Programming
Given $C$, we can find a nonnegative $X$ minimizing $\|A - CX\|_F$ in polynomial time:
    convex quadratic programming
    quasi-Newton methods (L-BFGS)
    other convex optimization methods are also possible
But these methods can take quite some time.
    Local needs to solve for $X$ $k(n - k)$ times for a single local swap.
    Once the final $C$ is selected, they can be used as a post-processing step.
How to Use Columns: Projection Method
We employ a simple projection method:
    1 let $X = C^\dagger A$ (Moore–Penrose pseudo-inverse)
    2 for every $x_{ij} < 0$, let $x_{ij} = 0$
This method is fast in practice and is often used in NMF algorithms.
However, no guarantees on its performance can be given.
In the experiments, we used only this method for a fair comparison.
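The two steps of the projection method can be illustrated with a small dependency-free sketch. It assumes that $C$ has full column rank, in which case $C^\dagger A$ equals the solution of the normal equations $(C^T C) X = C^T A$; a general pseudo-inverse routine from a linear algebra library would be the usual choice in practice. The function names are hypothetical.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(P, Q):
    Qt = transpose(Q)
    return [[sum(p * q for p, q in zip(row, col)) for col in Qt] for row in P]

def solve(G, B):
    """Solve G Y = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(G)
    aug = [list(g) + list(b) for g, b in zip(G, B)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(aug[r][i]))
        aug[i], aug[p] = aug[p], aug[i]
        piv = aug[i][i]  # assumed nonzero (C has full column rank)
        aug[i] = [v / piv for v in aug[i]]
        for r in range(n):
            if r != i:
                f = aug[r][i]
                aug[r] = [vr - f * vi for vr, vi in zip(aug[r], aug[i])]
    return [row[n:] for row in aug]

def projection_method(C, A):
    Ct = transpose(C)
    # Step 1: X = C^+ A, computed here through the normal equations
    X = solve(matmul(Ct, C), matmul(Ct, A))
    # Step 2: set every negative entry of X to zero
    return [[max(v, 0.0) for v in row] for row in X]

A = [[0.6, 0.9, 0.6, 0.4, 0.7],
     [1.0, 0.7, 0.9, 1.0, 0.9],
     [0.6, 0.5, 0.2, 0.4, 1.0]]
C = [row[:3] for row in A]   # first three columns of A, as in the NNCX example
X = projection_method(C, A)
for row in X:
    print([round(v, 3) for v in row])
```

Because the clipping in step 2 is applied after the unconstrained fit, the result is nonnegative but not necessarily the best nonnegative $X$, which matches the caveat that no performance guarantee is available.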
Algorithms Used
Local
ALS
844 by Berry, Pulatova, and Stewart (ACM Trans. Math. Softw. 2005)
DMM by Drineas, Mahoney, and Muthukrishnan (ESA, APPROX, and arXiv 2006–07)
    based on sampling; approximates SVD within $1 + \varepsilon$ with high probability, but needs lots of columns in $C$
K-means, which selects $C$ using $k$-means
NMF
    theoretical lower bound for NNCX and NNCUR
SVD
    lower bound for all methods
Synthetic Data
To Find Optimal X or Not
We used convex optimization (CVX) to solve for the optimal $X$.
    SVD's distance to optimal CX decomposition (OPT CVX)
    ALS is optimal even without CVX (ALS, ALS CVX, and OPT CVX coincide everywhere)
    Local benefits somewhat from convex optimization post-processing
[Figure: Frobenius distance (log scale) versus noise (0 to 0.5) for SVD, ALS, local, ALS CVX, local CVX, and OPT CVX.]
Synthetic Data
Noise and Decomposition Size
[Figure, left panel: Frobenius distance (log scale) versus $k$ for SVD, 844, ALS, NMF, local, kmeans, and DMM.]
Left: Local is the best (excluding SVD).
[Figure, right panel: Frobenius distance (log scale) versus noise (0 to 0.5) for SVD, 844, ALS, NMF, local, kmeans, and DMM.]
Right: ALS is the best (excluding SVD).
Real Data
CUR and NNCUR Decompositions of the Newsgroup Data
Newsgroup data with CUR and NNCUR decompositions.
    Local and ALS are the two best methods.
    Only a very small increase in reconstruction error when nonnegativity is required ∴ the data has latent NNCUR structure.
    DMM is not included due to its bad performance.
[Figure: Frobenius distance versus $k$ (2 to 12) for SVD, normD, 844_nn, 844, ALS_nn, ALS, local_nn, and local.]
Real Data
How Many Columns Are Needed to Beat SVD?
Relative error against SVD: error / SVD(5)
Jester joke dataset; a similar experiment was done in Drineas et al. (arXiv); [NN]CX decomposition
    Local is mostly the best, better even than DMM without the nonnegativity constraint.
    It takes $k = 16$ for Local to be better than SVD with $k = 5$.
[Figure: relative reconstruction error versus $k$ (5 to 30) for ALS, Local, DMM, and DMM_nn.]
Conclusions
We studied nonnegative variants of CX and CUR decompositions.
    Several real-world datasets seem to have such structure.
    Very simple algorithms were able to find good decompositions.
    Our algorithms can be better than general CX and CUR algorithms.
Better algorithms are sought.
    Perhaps the convex cone interpretation helps.
Model-selection issue: how big should $C$ and $R$ be?
CX and CUR decompositions are still relatively little studied in the CS (esp. data mining) community.
Thank You!