NMF with Python

http://amath.unist.ac.kr

2016.01.21~22

Kyunghoon Kim

Why use Low Rank Approximation?

• Data Compression and Storage when k << r

• Remove noise and uncertainty

⇒ improved performance on the data mining task of retrieval (e.g., finding similar items)

⇒ improved performance on the data mining task of clustering

http://langvillea.people.cofc.edu/NISS-NMF.pdf
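The storage argument above can be illustrated with a small NumPy sketch. The matrix sizes, the noise level, and the variable names are assumptions chosen for illustration only; the slides do not specify a dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 100, 80, 5  # hypothetical sizes, not taken from the slides

# Nonnegative matrix that is approximately rank k, plus a little noise.
A = rng.random((m, k)) @ rng.random((k, n)) + 0.01 * rng.random((m, n))

# Rank-k truncated SVD: keep only the k largest singular triplets.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

full_storage = m * n                # numbers stored for A itself
lowrank_storage = k * (m + n + 1)   # numbers stored for U_k, s_k, V_k
rel_error = np.linalg.norm(A - A_k) / np.linalg.norm(A)

print(full_storage, lowrank_storage, rel_error)
```

When k is much smaller than the matrix dimensions, the three truncated factors need far fewer numbers than A while the relative error stays small.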

Weakness of Low Rank Approximation

• storage: the factors are usually completely dense

• interpretation of basis vectors is difficult due to mixed signs

• good truncation point k is hard to determine

All create basis vectors that are mixed in sign. Negative elements make interpretation difficult!

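IDEA of NMF

Use a low-rank approximation with nonnegative factors to address the weaknesses of the truncated SVD.

Truncated SVD: $A_k = U_k \Sigma_k V_k^T$, where $A_k$ and $\Sigma_k$ are nonneg, while $U_k$ and $V_k^T$ are mixed in sign.

NMF: $A_k = W_k H_k$, where $A_k$, $W_k$, and $H_k$ are all nonneg.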
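The mixed-sign behaviour of the truncated SVD can be checked numerically. This is a minimal sketch on a small random nonnegative matrix; the matrix and the choice k = 2 are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((6, 5))  # entries in [0, 1), so A is nonnegative

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
U_k, Vt_k = U[:, :k], Vt[:k, :]

# Singular values are always nonnegative, but the singular vectors are not:
# the second left singular vector is orthogonal to the first, so it must
# contain both positive and negative entries.
has_mixed_signs = bool(U_k.min() < 0 and U_k.max() > 0)

print(np.round(U_k, 3))
print("mixed signs in U_k:", has_mixed_signs)
```

Negative entries like these are what make the SVD basis vectors hard to read as additive parts.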

Interpretation of NMF

Columns of W are the underlying basis vectors, i.e., each of the n columns of A can be built from the k columns of W.

Columns of H give the weights associated with each basis vector.

$$A_k e_1 = W_k H_{*1} = \begin{bmatrix} \vdots \\ w_1 \\ \vdots \end{bmatrix} h_{11} + \begin{bmatrix} \vdots \\ w_2 \\ \vdots \end{bmatrix} h_{21} + \cdots + \begin{bmatrix} \vdots \\ w_k \\ \vdots \end{bmatrix} h_{k1}$$
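The column-wise reading of the product W H can be verified directly. The sketch below uses small random nonnegative factors; the sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
m, k, n = 4, 3, 5  # hypothetical sizes
W = rng.random((m, k))  # columns w_1, ..., w_k are the basis vectors
H = rng.random((k, n))  # column j of H holds the weights for column j of A_k

A_k = W @ H

# First column of A_k, computed two ways.
first_col = A_k[:, 0]
weighted_sum = sum(H[i, 0] * W[:, i] for i in range(k))

print(np.allclose(first_col, weighted_sum))
```

Because all weights are nonnegative, each column is built purely by adding basis vectors, never by cancelling them.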

Properties of NMF

• basis vectors $w_i$ are not ⊥ ⇒ can have overlap of topics

• can restrict W, H to be sparse

• immediate interpretation

large $w_{ij}$'s ⇒ basis vector $w_i$ is mostly about terms j

$h_{i1}$: how much doc1 is pointing in the "direction" of topic vector $w_i$

• NMF is algorithm-dependent: W, H not unique
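One simple way to see that W and H are not unique is to rescale the factors. The sketch below assumes a positive diagonal scaling matrix D, which is only one of many transformations that leave the product unchanged.

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.random((4, 2))
H = rng.random((2, 5))

# Hypothetical positive diagonal scaling: W D and D^{-1} H stay nonnegative.
D = np.diag([2.0, 0.5])
W2 = W @ D
H2 = np.linalg.inv(D) @ H

same_product = np.allclose(W @ H, W2 @ H2)
print(same_product, np.allclose(W, W2))
```

Both factor pairs are nonnegative and give the same product, so different algorithms or different initializations can return different but equally valid factorizations.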

Mean squared error objective function

$A \approx WH$

$$\min_{W,H} \|A - WH\|_F^2 \quad \text{s.t.} \quad W, H \ge 0$$

Nonlinear Optimization Problem

• convex in W or H, but not both ⇒ tough to get global min

• huge # unknowns: mk for W and kn for H

• above objective is one of many possible

http://math.stackexchange.com/questions/393447/why-does-the-non-negative-matrix-factorization-problem-non-convex
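Because the objective is convex in W for fixed H and in H for fixed W, many algorithms alternate between the two factors. The sketch below uses the Lee and Seung multiplicative update rules, one common choice for the Frobenius objective; the slides do not say which algorithm is intended, and the function name, iteration count, and test matrix are assumptions.

```python
import numpy as np

def nmf_multiplicative(A, k, n_iter=200, eps=1e-10, seed=0):
    """Approximate a nonnegative A (m x n) by W (m x k) @ H (k x n)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k)) + eps  # random positive initialization
    H = rng.random((k, n)) + eps
    errors = []
    for _ in range(n_iter):
        # Update H with W fixed, then W with H fixed.
        # Multiplying by nonnegative ratios keeps both factors nonnegative.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(A - W @ H, "fro") ** 2)
    return W, H, errors

rng = np.random.default_rng(42)
A = rng.random((20, 4)) @ rng.random((4, 15))  # hypothetical nonnegative data
W, H, errors = nmf_multiplicative(A, k=4)
print(errors[0], errors[-1])
```

The result depends on the random initialization, which is one practical consequence of the problem being non-convex in W and H jointly.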

Python Library: NIMFA

http://nimfa.biolab.si/

Installation

    pip install nimfa

Code

    import nimfa

    nmf = nimfa.Nmf(matrix, seed="random_vcol", rank=2, max_iter=2000)
    fit = nmf()
    W = fit.basis()
    H = fit.coef()

Date post:	15-Apr-2017
Category:	Education
Upload:	kyunghoon-kim