kdd08slides (1)

Date post: 07-Jul-2018

Transcript
  • 8/19/2019 kdd08slides (1)

    1/33

  • 8/19/2019 kdd08slides (1)

    2/33

    Outline Introduction Definitions Algorithms and Complexity Experiments Conclusions

    Outline

    1 Introduction

    2 Definitions

    3 Algorithms and Complexity

    4 Experiments
        Synthetic Data
        Real Data

    5 Conclusions

    Saara Hyvönen, Pauli Miettinen, and Evimaria Terzi

    Interpretable Nonnegative Matrix Decompositions

  • 8/19/2019 kdd08slides (1)

    4/33

    A Motivating Problem

    A dialectologist has some dialectal information in a matrix $A = (a_{ij})$:

    rows correspond to dialectal features,

    columns correspond to areas (e.g., municipalities),

    $a_{ij} = 1$ if feature $i$ is present in the dialect spoken in area $j$.

    The dialectologist wants to solve the following two problems:

    1 What are the k main characteristic features of dialects?

    2 What are the k characteristic areas for dialects?

    The aim is to carry out further studies on a few selected areas.

    Some type of matrix decomposition is sought.

  • 8/19/2019 kdd08slides (1)

    5/33

    First Idea: NMF

    The dialectologist does not want to see negative values in the decomposition.

    "The dialect spoken in area A contains 1.2 of feature X and −0.2 of feature Y" vs. "The dialect spoken in area A contains 0.7 of feature Z and 0.3 of feature V."

    Negative values can yield negative features.

    She considers Nonnegative Matrix Factorization (NMF): $A$ is represented as $A \approx WH$, where $W$ and $H$ are nonnegative and their inner dimension is $k$.

    But the columns of $W$ and the rows of $H$ are just some nonnegative vectors.

    ⇒ They do not give the dialectologist her characteristic areas and features.
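
    The factorization $A \approx WH$ can be tried out with a few lines of NumPy. The slides do not say which NMF algorithm is meant, so the sketch below assumes the common multiplicative-update scheme; the function name, the 0/1 matrix and all parameter values are made up for illustration.

```python
import numpy as np


def nmf_multiplicative(A, k, n_iter=200, seed=0, eps=1e-9):
    """Approximate a nonnegative A (m x n) as W @ H with W, H >= 0.

    Assumed algorithm: multiplicative updates (not taken from the slides).
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so W and H stay >= 0.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical feature-by-area 0/1 matrix (values invented for this sketch).
A = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
], dtype=float)

W, H = nmf_multiplicative(A, k=2)
error = float(np.linalg.norm(A - W @ H, "fro"))
print("reconstruction error:", round(error, 4))
print("columns of W:")
print(np.round(W, 2))
```

    The columns of W printed at the end are generic nonnegative vectors, which is the interpretability problem this slide points out.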

  • 8/19/2019 kdd08slides (1)

    6/33

    Second Idea: CX and CUR Decompositions

    The dialectologist could use Column (CX) and Column-Row (CUR) decompositions.

    CX: Matrix $A$ is represented as $A \approx CX$, with $C$ containing $k$ columns of $A$ (while $X$ is arbitrary).

    CUR: Matrix $A$ is represented as $A \approx CUR$, with $C$ as above and $R$ containing $r$ rows of $A$ (while $U$ is arbitrary).

    Columns of $C$ and rows of $R$ now give the desired characteristic areas and features.

    But now $X$ and $U$ can have negative values.
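
    To see how negative values arise, here is a minimal sketch of an unconstrained CX decomposition. It assumes that $X$ is computed with the Moore–Penrose pseudo-inverse and that $C$ is simply the first three columns of a small nonnegative example matrix; both choices are illustrative.

```python
import numpy as np

# Small nonnegative example matrix (3 features x 5 areas).
A = np.array([
    [0.6, 0.9, 0.6, 0.4, 0.7],
    [1.0, 0.7, 0.9, 1.0, 0.9],
    [0.6, 0.5, 0.2, 0.4, 1.0],
])

cols = [0, 1, 2]               # indices of the k = 3 columns kept in C
C = A[:, cols]
X = np.linalg.pinv(C) @ A      # unconstrained least-squares coefficients

error = float(np.linalg.norm(A - C @ X, "fro"))
print("reconstruction error:", error)
print("smallest entry of X:", float(X.min()))
```

    Here $C$ is invertible, so the reconstruction is exact up to rounding, yet some coefficients in $X$ are negative: the last two columns of $A$ are not nonnegative combinations of the chosen columns.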

  • 8/19/2019 kdd08slides (1)

    7/33

    Solution: Nonnegative CX and CUR Decompositions

    The dialectologist's solution is to also force $X$ and $U$ to be nonnegative. Thus:

    Characteristic areas are given by columns of $C$.

    Characteristic features are given by rows of $R$ (or by columns of $C$ when the CX decomposition is applied to $A^T$).

    Other features and areas are represented using only nonnegative linear combinations.

    [Figure legend: Southeastern, Northern, S.E. Tavastian, Central Tavastian, Savonian, Southwestern, S. Ostrobothnian]

  • 8/19/2019 kdd08slides (1)

    9/33

    The Nonnegative CX Decomposition

    Problem (Nonnegative CX Decomposition, NNCX)

    Given a matrix $A \in \mathbb{R}_+^{m \times n}$ and an integer $k$, find an $m \times k$ matrix $C$ of $k$ columns of $A$ and a matrix $X \in \mathbb{R}_+^{k \times n}$ minimizing $\|A - CX\|_F$.

    Example:

    $$A = \begin{pmatrix} 0.6 & 0.9 & 0.6 & 0.4 & 0.7 \\ 1.0 & 0.7 & 0.9 & 1.0 & 0.9 \\ 0.6 & 0.5 & 0.2 & 0.4 & 1.0 \end{pmatrix}$$

    $$C = \begin{pmatrix} 0.6 & 0.9 & 0.6 \\ 1.0 & 0.7 & 0.9 \\ 0.6 & 0.5 & 0.2 \end{pmatrix}$$

    $$X = \begin{pmatrix} 1.0 & 0.0 & 0.0 & 0.9 & 1.7 \\ 0.0 & 1.0 & 0.0 & 0.0 & 0.5 \\ 0.0 & 0.0 & 1.0 & 0.5 & 0.0 \end{pmatrix}$$

  • 8/19/2019 kdd08slides (1)

    10/33

    The Nonnegative CUR Decomposition

    Problem (Nonnegative CUR Decomposition, NNCUR)

    Given a matrix $A \in \mathbb{R}_+^{m \times n}$ and integers $k$ and $r$, find an $m \times k$ matrix $C$ of $k$ columns of $A$, an $r \times n$ matrix $R$ of $r$ rows of $A$, and a matrix $U \in \mathbb{R}_+^{k \times r}$ minimizing $\|A - CUR\|_F$.

    Example:

    $$A = \begin{pmatrix} 0.6 & 0.9 & 0.6 & 0.4 & 0.7 \\ 1.0 & 0.7 & 0.9 & 1.0 & 0.9 \\ 0.6 & 0.5 & 0.2 & 0.4 & 1.0 \end{pmatrix}$$

    $$C = \begin{pmatrix} 0.6 & 0.9 & 0.6 \\ 1.0 & 0.7 & 0.9 \\ 0.6 & 0.5 & 0.2 \end{pmatrix}$$

    $$U = \begin{pmatrix} 0.0 & 1.3 \\ 2.2 & 0.0 \\ 0.0 & 0.7 \end{pmatrix}$$

    $$R = \begin{pmatrix} 0.6 & 0.9 & 0.6 & 0.4 & 0.7 \\ 1.0 & 0.7 & 0.9 & 1.0 & 0.9 \end{pmatrix}$$
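
    In both problems $C$ and $R$ are not free matrices but selections of columns and rows of $A$, so a decomposition is fully described by index lists plus the coefficient matrix. The sketch below shows this with NumPy indexing; the helper names `cx_error` and `cur_error` and the placeholder $U$ are assumptions made for illustration.

```python
import numpy as np


def cx_error(A, cols, X):
    """Frobenius error of A ~ C X, where C holds the columns `cols` of A."""
    C = A[:, cols]
    return float(np.linalg.norm(A - C @ X, "fro"))


def cur_error(A, cols, rows, U):
    """Frobenius error of A ~ C U R with C = A[:, cols] and R = A[rows, :]."""
    C = A[:, cols]
    R = A[rows, :]
    return float(np.linalg.norm(A - C @ U @ R, "fro"))


# The example matrix from the slide.
A = np.array([
    [0.6, 0.9, 0.6, 0.4, 0.7],
    [1.0, 0.7, 0.9, 1.0, 0.9],
    [0.6, 0.5, 0.2, 0.4, 1.0],
])

cols, rows = [0, 1, 2], [0, 1]   # first three columns, first two rows
C, R = A[:, cols], A[rows, :]
print(C.shape, R.shape)          # (3, 3) (2, 5)

# Any nonnegative k x r matrix is a feasible U; this one is only a placeholder.
U = np.full((len(cols), len(rows)), 0.1)
print("NNCUR error for the placeholder U:", cur_error(A, cols, rows, U))
```

    With these index lists, `C` and `R` coincide with the matrices shown in the example; only $X$ or $U$ remains to be optimized once the indices are fixed.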

  • 8/19/2019 kdd08slides (1)

    11/33

    NNCX as a Convex Cone

    Columns of $A$ represent points in space.

    $$A = \begin{pmatrix} 0.6 & 0.9 & 0.6 & 0.4 & 0.7 \\ 1.0 & 0.7 & 0.9 & 1.0 & 0.9 \\ 0.6 & 0.5 & 0.2 & 0.4 & 1.0 \end{pmatrix}$$

    [Figure: 3D plot of the columns of $A$ as points]

  • 8/19/2019 kdd08slides (1)

    12/33

    NNCX as a Convex Cone

    $C$ selects some of these points.

    $$A = \begin{pmatrix} 0.6 & 0.9 & 0.6 & 0.4 & 0.7 \\ 1.0 & 0.7 & 0.9 & 1.0 & 0.9 \\ 0.6 & 0.5 & 0.2 & 0.4 & 1.0 \end{pmatrix}$$

    [Figure: 3D plot of the columns of $A$ as points]

  • 8/19/2019 kdd08slides (1)

    13/33

    NNCX as a Convex Cone

    Points in $C$ generate some convex cone $\mathcal{C}$.

    $v \in \mathcal{C}$ if there is $x \in \mathbb{R}_+^k$ s.t. $v = Cx$.

    [Figure: 3D plot of the points and the convex cone]

  • 8/19/2019 kdd08slides (1)

    14/33

    NNCX as a Convex Cone

    $\|A - CX\|_F^2$ equals the sum of squared shortest distances from $A$'s columns to the cone's points.

    $$\left\| \begin{pmatrix} 0.6 & 0.9 & 0.6 \\ 1.0 & 0.7 & 0.9 \\ 0.6 & 0.5 & 0.2 \end{pmatrix} \begin{pmatrix} 0.6 \\ 0 \\ 0.3 \end{pmatrix} - \begin{pmatrix} 0.4 \\ 1.0 \\ 0.4 \end{pmatrix} \right\|_2^2 = 0.0369$$

    [Figure: 3D plot of the points and the convex cone]
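
    The arithmetic in this example can be checked directly: the coefficient vector (0.6, 0, 0.3) picks a point of the cone generated by the columns of $C$, and its squared distance to the column (0.4, 1.0, 0.4) of $A$ is the value on the slide. A short sketch, with variable names chosen here for illustration:

```python
import numpy as np

C = np.array([
    [0.6, 0.9, 0.6],
    [1.0, 0.7, 0.9],
    [0.6, 0.5, 0.2],
])
x = np.array([0.6, 0.0, 0.3])    # nonnegative coefficients from the slide
a = np.array([0.4, 1.0, 0.4])    # fourth column of A

v = C @ x                         # a point of the cone generated by C's columns
sq_dist = float(np.sum((v - a) ** 2))
print(np.round(v, 2))             # the cone point, roughly (0.54, 0.87, 0.42)
print(round(sq_dist, 4))          # 0.0369
```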

  • 8/19/2019 kdd08slides (1)

    18/33

    The Two Subproblems of [NN]CX

    Finding matrix $C$ (a.k.a. the Column Subset Selection problem)

        more combinatorial in nature

        the nonnegativity constraint, in general, has no effect

        computational complexity is unknown (assumed to be NP-hard)

    Finding matrix $X$ when some matrix $C$ is given

        a constrained (in NNCX) least-squares fitting problem

        well-known methods solve the problem in polynomial time

        for CX one can use the Moore–Penrose pseudo-inverse: $X = C^\dagger A$

        for NNCX the problem is a convex quadratic program (solved using, e.g., quasi-Newton methods)

  • 8/19/2019 kdd08slides (1)

    19/33

    The Local Algorithm for NNCX

    Assume we can find $X$ when $C$ is given. Local performs a standard greedy local search to select $C$.

    Local

    1 initialize C randomly and compute X
    2 while reconstruction error decreases
        2.1 select c, a column of C, and a, a column of A not in C, such that replacing c with a decreases the reconstruction error the most
        2.2 replace c with a
    3 compute X and return C and X

    N.B. We need to solve for X in step 2.1.
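
    The Local procedure can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes $X$ is obtained with the pseudo-inverse followed by zeroing negative entries, and the function names, the tolerance and the random test matrix are all hypothetical.

```python
import numpy as np


def solve_x(A, C):
    """Assumed X solver: pseudo-inverse solution with negatives set to zero."""
    return np.maximum(np.linalg.pinv(C) @ A, 0.0)


def error(A, cols):
    """Reconstruction error when C consists of the columns `cols` of A."""
    C = A[:, cols]
    return float(np.linalg.norm(A - C @ solve_x(A, C), "fro"))


def local_search(A, k, seed=0, tol=1e-12):
    """Greedy swap search for k column indices of A."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    cols = [int(c) for c in rng.choice(n, size=k, replace=False)]  # step 1
    best = error(A, cols)
    while True:                                                    # step 2
        best_swap = None
        # Step 2.1: try all k * (n - k) swaps and remember the best one.
        for pos in range(k):
            for cand in range(n):
                if cand in cols:
                    continue
                trial = cols.copy()
                trial[pos] = cand
                e = error(A, trial)
                if e < best - tol:
                    best, best_swap = e, (pos, cand)
        if best_swap is None:          # no swap decreases the error: stop
            break
        cols[best_swap[0]] = best_swap[1]                          # step 2.2
    C = A[:, cols]
    return cols, C, solve_x(A, C)                                  # step 3


A = np.random.default_rng(1).random((6, 10))   # hypothetical test data
cols, C, X = local_search(A, k=3)
print("selected columns:", cols)
print("reconstruction error:", error(A, cols))
```

    The inner double loop makes the cost of step 2.1 visible: every candidate swap requires solving for $X$ again.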

  • 8/19/2019 kdd08slides (1)

    20/33

    The ALS Algorithm

    The ALS algorithm uses the alternating least squares method often employed in NMF algorithms.

    ALS

    1 initialize $\tilde{C}$ randomly
    2 while reconstruction error decreases
        2.1 find nonnegative $X$ to minimize $\|A - \tilde{C}X\|_F$
        2.2 find nonnegative $\tilde{C}$ to minimize $\|A - \tilde{C}X\|_F$
    3 match columns of $\tilde{C}$ to their nearest columns in $A$
    4 let $C$ be those columns, compute $X$ and return $C$ and $X$

    $\tilde{C}$ does not contain $A$'s columns.

    Matching can be done in polynomial time.
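
    A rough sketch of the ALS steps is given below. Two simplifications are assumed and are not from the slides: the nonnegative least-squares steps use the pseudo-inverse with negatives set to zero, and the matching in step 3 is done greedily rather than with an optimal assignment. All names and the test data are hypothetical.

```python
import numpy as np


def nonneg_ls(B, M):
    """Heuristic nonnegative Z for min ||B - M Z||_F: pinv, then clip at 0."""
    return np.maximum(np.linalg.pinv(M) @ B, 0.0)


def als_nncx(A, k, n_iter=100, seed=0, tol=1e-12):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    C_tilde = rng.random((m, k))                    # step 1
    prev = np.inf
    for _ in range(n_iter):                         # step 2
        X = nonneg_ls(A, C_tilde)                   # 2.1: fix C~, update X
        C_new = nonneg_ls(A.T, X.T).T               # 2.2: fix X, update C~
        err = np.linalg.norm(A - C_new @ X, "fro")
        if err >= prev - tol:                       # error stopped decreasing
            break
        C_tilde, prev = C_new, err
    # Step 3 (greedy stand-in): match each column of C~ to its nearest
    # column of A that has not been used yet.
    dist = np.linalg.norm(A[:, None, :] - C_tilde[:, :, None], axis=0)  # k x n
    cols = []
    for j in range(k):
        order = np.argsort(dist[j])
        cols.append(int(next(i for i in order if i not in cols)))
    C = A[:, cols]                                  # step 4
    return cols, C, nonneg_ls(A, C)


A = np.random.default_rng(1).random((6, 10))        # hypothetical test data
cols, C, X = als_nncx(A, k=3)
print("selected columns:", cols)
print("reconstruction error:", float(np.linalg.norm(A - C @ X, "fro")))
```

    The greedy matching keeps the sketch free of extra dependencies; a bipartite assignment solver could replace it where an optimal matching is wanted.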

  • 8/19/2019 kdd08slides (1)

    21/33

    How to Use Columns: Convex Quadratic Programming

    Given $C$, we can find a nonnegative $X$ minimizing $\|A - CX\|_F$ in polynomial time:

        convex quadratic programming

        quasi-Newton methods (L-BFGS)

        other convex optimization methods are also possible

    But these methods can take quite some time.

        Local needs to solve for $X$ $k(n - k)$ times for a single local swap.

        Once the final $C$ is selected, they can be used as a post-processing step.
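
    As an illustration of solving the convex problem for $X$, here is a minimal sketch that uses projected gradient descent. This is a deliberately simple stand-in for the quadratic-programming and L-BFGS solvers named on the slide, not the method used by the authors; the function name and iteration count are assumptions.

```python
import numpy as np


def nnls_projected_gradient(C, A, n_iter=5000):
    """Minimize 0.5 * ||A - C X||_F^2 over X >= 0 by projected gradient."""
    G = C.T @ C
    B = C.T @ A
    step = 1.0 / np.linalg.norm(C, 2) ** 2     # 1 / largest eigenvalue of C^T C
    X = np.zeros((C.shape[1], A.shape[1]))
    for _ in range(n_iter):
        grad = G @ X - B                       # gradient of the objective
        X = np.maximum(X - step * grad, 0.0)   # gradient step, then project
    return X


# Example matrix from the definitions; C is its first three columns.
A = np.array([
    [0.6, 0.9, 0.6, 0.4, 0.7],
    [1.0, 0.7, 0.9, 1.0, 0.9],
    [0.6, 0.5, 0.2, 0.4, 1.0],
])
C = A[:, :3]
X = nnls_projected_gradient(C, A)

col4_sq_error = float(np.sum((A[:, 3] - C @ X[:, 3]) ** 2))
print("coefficients for the fourth column:", np.round(X[:, 3], 3))
print("squared distance to the cone:", round(col4_sq_error, 4))
```

    For the fourth column the squared distance comes out slightly below the 0.0369 obtained with the rounded coefficients (0.6, 0, 0.3) in the convex-cone example.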

  • 8/19/2019 kdd08slides (1)

    22/33

    How to Use Columns: Projection Method

    We employ a simple projection method:

    1 let $X = C^\dagger A$ (Moore–Penrose pseudo-inverse)
    2 for $x_{ij} < 0$, let $x_{ij} = 0$

    This method is fast in practice and is often used in NMF algorithms.

    However, no guarantees on its performance can be given.

    In the experiments, we used only this method for a fair comparison.
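
    The two steps of the projection method translate almost literally into NumPy. The sketch below applies them to the example matrix from the NNCX definition with $C$ equal to its first three columns; the function name `project_x` is an assumption.

```python
import numpy as np


def project_x(A, C):
    """Step 1: X = pinv(C) A. Step 2: replace negative entries by zero."""
    X = np.linalg.pinv(C) @ A
    X[X < 0] = 0.0
    return X


A = np.array([
    [0.6, 0.9, 0.6, 0.4, 0.7],
    [1.0, 0.7, 0.9, 1.0, 0.9],
    [0.6, 0.5, 0.2, 0.4, 1.0],
])
C = A[:, :3]
X = project_x(A, C)

print(np.round(X, 1))      # rounded, this matches X in the NNCX example
col4_sq_error = float(np.sum((A[:, 3] - C @ X[:, 3]) ** 2))
print("squared error on the fourth column:", round(col4_sq_error, 4))
```

    The squared error on the fourth column is far larger than the 0.0369 reached in the convex-cone example, which illustrates why no performance guarantee can be given for this heuristic.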

  • 8/19/2019 kdd08slides (1)

    23/33

  • 8/19/2019 kdd08slides (1)

    25/33

    Algorithms Used

    Local

    ALS

    844 by Berry, Pulatova, and Stewart (ACM Trans. Math. Softw. 2005)

    DMM by Drineas, Mahoney, and Muthukrishnan (ESA, APPROX, and arXiv 2006–07)
        based on sampling; approximates SVD within 1 + ε w.h.p., but needs lots of columns in C

    K-means, which selects C using k-means

    NMF
        theoretical lower bound for NNCX and NNCUR

    SVD
        lower bound for all methods
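
    SVD serves as a lower bound because the truncated SVD gives the best rank-k approximation in the Frobenius norm, and any CX decomposition with k columns has rank at most k. The sketch below checks this on the small example matrix by brute force over all column subsets; the helper names are hypothetical.

```python
import numpy as np
from itertools import combinations


def svd_error(A, k):
    """Frobenius error of the best rank-k approximation of A."""
    s = np.linalg.svd(A, compute_uv=False)
    return float(np.sqrt(np.sum(s[k:] ** 2)))


def cx_error(A, cols):
    """Frobenius error of the unconstrained CX decomposition for `cols`."""
    C = A[:, cols]
    return float(np.linalg.norm(A - C @ (np.linalg.pinv(C) @ A), "fro"))


A = np.array([
    [0.6, 0.9, 0.6, 0.4, 0.7],
    [1.0, 0.7, 0.9, 1.0, 0.9],
    [0.6, 0.5, 0.2, 0.4, 1.0],
])
k = 2
bound = svd_error(A, k)
best_cx = min(cx_error(A, list(c)) for c in combinations(range(A.shape[1]), k))
print("SVD lower bound:", bound)
print("best CX error over all column pairs:", best_cx)
```

    Adding the nonnegativity constraint can only increase the error further, so the same bound applies to NNCX.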

  • 8/19/2019 kdd08slides (1)

    26/33

    Synthetic Data

    To Find Optimal X or Not

    We used convex optimization (CVX) to solve for the optimal X.

        SVD's distance to the optimal CX decomposition (OPT CVX)

        ALS is optimal even without CVX (ALS, ALS CVX, and OPT CVX coincide everywhere)

        Local benefits somewhat from convex optimization post-processing

    [Figure: Frobenius distance (log scale) vs. noise (0 to 0.5). Series: SVD, ALS, local, ALS CVX, local CVX, OPT CVX]

  • 8/19/2019 kdd08slides (1)

    27/33

    Synthetic Data

    Noise and Decomposition Size

    [Figure, left: Frobenius distance (log scale) vs. k (2 to 20). Series: SVD, 844, ALS, NMF, local, kmeans, DMM]

    Left: Local is the best (excluding SVD).

    [Figure, right: Frobenius distance (log scale) vs. noise (0 to 0.5). Series: SVD, 844, ALS, NMF, local, kmeans, DMM]

    Right: ALS is the best (excluding SVD).

  • 8/19/2019 kdd08slides (1)

    28/33

    Real Data

    CUR and NNCUR Decompositions of the Newsgroup Data

    Newsgroup data with CUR and NNCUR decompositions.

        Local and ALS are the two best methods.

        Only a very small increase in reconstruction error when nonnegativity is required ∴ the data has latent NNCUR structure.

        DMM is not included due to its bad performance.

    [Figure: Frobenius distance (200 to 320) vs. k (2 to 12). Series: SVD, normD, 844_nn, 844, ALS_nn, ALS, local_nn, local]

  • 8/19/2019 kdd08slides (1)

    29/33

  • 8/19/2019 kdd08slides (1)

    30/33

    Real Data

    How Many Columns Are Needed to Beat SVD?

    Relative error against SVD: error / SVD(5)

    Jester joke dataset; a similar experiment was done in Drineas et al. (arXiv); [NN]CX decomposition

        Local is mostly the best, better than DMM without nonnegativity.

        It takes k = 16 for Local to be better than SVD with k = 5.

    [Figure: relative reconstruction error (0.95 to 1.15) vs. k (5 to 30). Series: ALS, Local, DMM, DMM_nn]

  • 8/19/2019 kdd08slides (1)

    32/33

    Conclusions

    We studied nonnegative variants of CX and CUR decompositions.

        Several real-world datasets seem to have such structure.

        Very simple algorithms were able to find good decompositions.

        Our algorithms can be better than general CX and CUR algorithms.

    Better algorithms are sought.

        Perhaps the convex cone interpretation helps.

    Model-selection issue: how big should C and R be?

    CX and CUR decompositions are still relatively little studied in the CS (esp. data mining) community.

  • 8/19/2019 kdd08slides (1)

    33/33

    Thank You!
