Date posted: 15-Apr-2017 | Category: Education | Uploaded by: kyunghoon-kim
Why use Low Rank Approximation?
• Data compression and storage when k << r (approximation rank k much smaller than the rank r of A)
• Remove noise and uncertainty
⇒ improved performance on the data mining task of retrieval (e.g., finding similar items)
⇒ improved performance on data mining task of clustering
http://langvillea.people.cofc.edu/NISS-NMF.pdf
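A minimal NumPy sketch of the compression argument, assuming a hypothetical 200 x 100 matrix built as a rank-5 nonnegative signal plus small noise. It shows that storing the truncated factors takes k(m + n + 1) numbers instead of mn, and that the Frobenius error of the rank-k truncated SVD equals the norm of the discarded singular values (Eckart-Young). The function name `truncated_svd` and all sizes are illustrative choices, not from the slides.

```python
import numpy as np

def truncated_svd(A, k):
    """Return U_k, the top-k singular values, and V_k^T from the SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

rng = np.random.default_rng(0)
m, n, k = 200, 100, 5
# Hypothetical data: a rank-5 nonnegative "signal" plus small nonnegative noise.
A = rng.random((m, k)) @ rng.random((k, n)) + 0.01 * rng.random((m, n))

Uk, sk, Vtk = truncated_svd(A, k)
Ak = Uk @ np.diag(sk) @ Vtk          # rank-k approximation A_k = U_k Sigma_k V_k^T

full_storage = m * n                 # numbers needed to store A
lowrank_storage = k * (m + n + 1)    # numbers needed to store U_k, Sigma_k, V_k
rel_err = np.linalg.norm(A - Ak) / np.linalg.norm(A)

# Eckart-Young: the Frobenius error equals the norm of the discarded singular values.
s_all = np.linalg.svd(A, compute_uv=False)
print(full_storage, lowrank_storage)                                          # 20000 1505
print(np.isclose(np.linalg.norm(A - Ak), np.sqrt(np.sum(s_all[k:] ** 2))))    # True
print(rel_err < 0.01)                                                         # True
```

Because the noise here is small and has no low-rank structure, dropping the trailing singular values mostly discards noise, which is the "remove noise" point above.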
Weakness of Low Rank Approximation
• storage: the truncated factors $U_k$ and $V_k$ are usually completely dense
• interpretation of basis vectors is difficult due to mixed signs
• good truncation point k is hard to determine
Truncated SVD, like the other standard low-rank methods, creates basis vectors that are mixed in sign. Negative elements make interpretation difficult!
Use a low-rank approximation with nonnegative factors to address the weaknesses of truncated SVD:
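$$\underbrace{A_k}_{\text{nonneg}} = \underbrace{U_k}_{\text{mixed}}\,\underbrace{\Sigma_k}_{\text{nonneg}}\,\underbrace{V_k^T}_{\text{mixed}} \qquad \text{(truncated SVD)}$$

$$\underbrace{A_k}_{\text{nonneg}} = \underbrace{W_k}_{\text{nonneg}}\,\underbrace{H_k}_{\text{nonneg}} \qquad \text{(NMF)}$$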
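A small NumPy check of why the SVD factors are "mixed": for a hypothetical strictly positive 8 x 6 matrix, the leading singular vectors have a single sign (Perron-Frobenius), so the second singular vectors, which must be orthogonal to them, cannot share that sign everywhere. The matrix and the helper `mixed_sign` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical strictly positive data matrix (e.g., term-document weights).
A = rng.random((8, 6)) + 0.1

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Uk = U[:, :k]
Vk = Vt[:k, :].T

def mixed_sign(M):
    """True if M contains both strictly positive and strictly negative entries."""
    return bool((M > 0).any() and (M < 0).any())

print(mixed_sign(Uk), mixed_sign(Vk))  # True True
print((A >= 0).all())                  # True
```

So even when every entry of A is nonnegative, a rank-2 truncated SVD already produces negative entries in $U_k$ and $V_k$.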
IDEA of NMF
columns of W are the underlying basis vectors,
i.e., each of the n columns of A can be built from the k columns of W.
Interpretation of NMF
columns of H give the weights associated with each basis vector.
$$A_k \mathbf{e}_1 = W_k H_{*1} = \mathbf{w}_1 h_{11} + \mathbf{w}_2 h_{21} + \cdots + \mathbf{w}_k h_{k1}$$
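A quick NumPy check of the identity above, assuming hypothetical random nonnegative factors W (5 x 3) and H (3 x 4): the first column of $A_k = WH$ is exactly the combination of the basis vectors $\mathbf{w}_i$ weighted by the first column of H.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k = 5, 4, 3
# Hypothetical nonnegative factors W (m x k) and H (k x n).
W = rng.random((m, k))
H = rng.random((k, n))
Ak = W @ H

e1 = np.zeros(n)
e1[0] = 1.0
col1 = Ak @ e1  # first column of A_k

# Same column built as w_1 h_11 + w_2 h_21 + ... + w_k h_k1
combo = sum(W[:, i] * H[i, 0] for i in range(k))
print(np.allclose(col1, combo))  # True
```

Since every $\mathbf{w}_i$ and every $h_{i1}$ is nonnegative, each document column is an additive mix of basis vectors with no cancellation.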
Properties of NMF
• basis vectors $\mathbf{w}_i$ are not ⊥ ⇒ can have overlap of topics
• can restrict W, H to be sparse
• immediate interpretation
⇒ large entries $w_{ji}$ of $\mathbf{w}_i$: basis vector $\mathbf{w}_i$ is mostly about term $j$
⇒ $h_{i1}$: how much doc 1 points in the "direction" of topic vector $\mathbf{w}_i$
• NMF is algorithm-dependent: W, H not unique
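One concrete source of non-uniqueness, sketched with NumPy under the assumption of hypothetical random factors: rescaling each basis vector by a positive factor and undoing it in H gives a different nonnegative pair with the same product. (Permuting the columns of W together with the rows of H is another.) The scale factors `d` are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical nonnegative factors of a 6 x 5 matrix with k = 3.
W = rng.random((6, 3))
H = rng.random((3, 5))

# Rescale each basis vector by a positive factor and undo it in H:
# (W D)(D^{-1} H) = W H, and both new factors are still nonnegative.
d = np.array([2.0, 0.5, 10.0])
W2 = W * d              # same as W @ np.diag(d): scales column i by d[i]
H2 = H / d[:, None]     # same as np.diag(1 / d) @ H: scales row i by 1 / d[i]

print(np.allclose(W @ H, W2 @ H2))                   # True
print(bool((W2 >= 0).all() and (H2 >= 0).all()))     # True
print(np.allclose(W, W2))                            # False
```

Which of these equivalent pairs an NMF routine returns depends on its initialization and update rules, hence "algorithm-dependent".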
$$A \approx WH$$

$$\min_{W,H} \; \|A - WH\|_F^2 \quad \text{s.t.} \quad W \ge 0,\; H \ge 0$$
Mean squared error objective function
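One common way to attack this problem, not necessarily the algorithm the slides or any particular library uses, is Lee and Seung's multiplicative update rules. The sketch below assumes hypothetical random data; the function name `nmf_multiplicative` and the parameters `n_iter`, `eps`, and `seed` are illustrative choices. Each update multiplies by a nonnegative ratio, so W and H never leave the feasible set.

```python
import numpy as np

def nmf_multiplicative(A, k, n_iter=500, eps=1e-10, seed=0):
    """Approximately minimize ||A - WH||_F^2 s.t. W, H >= 0 (Lee-Seung updates)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    history = []
    for _ in range(n_iter):
        # eps guards against division by zero; ratios stay nonnegative.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
        history.append(np.linalg.norm(A - W @ H, "fro") ** 2)
    return W, H, history

rng = np.random.default_rng(42)
A = rng.random((30, 20))  # hypothetical nonnegative data
W, H, history = nmf_multiplicative(A, k=4)
print(history[0], history[-1])  # objective after the first and last iteration
```

These updates only drive the objective toward a local solution; different seeds can end at different (W, H).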
Nonlinear Optimization Problem
• convex in W alone or in H alone, but not jointly in both ⇒ hard to find the global minimum
• huge number of unknowns: mk for W and kn for H
• the objective above is one of many possible choices
http://math.stackexchange.com/questions/393447/why-does-the-non-negative-matrix-factorization-problem-non-convex
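The smallest possible case already shows the lack of joint convexity. Assuming 1 x 1 matrices A = 1, W = w, H = h, the objective is $f(w,h) = (1 - wh)^2$: it is a convex quadratic in w for fixed h (and vice versa), yet two global minimizers have a midpoint with a strictly larger value. The matrix sizes used for the unknown count are hypothetical.

```python
def f(w, h):
    """Scalar case of ||A - WH||_F^2 with A = 1, W = w, H = h (1 x 1 matrices)."""
    return (1.0 - w * h) ** 2

p = (2.0, 0.5)   # w * h = 1, so f = 0
q = (0.5, 2.0)   # w * h = 1, so f = 0
mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)  # (1.25, 1.25)

print(f(*p), f(*q), f(*mid))  # 0.0 0.0 0.31640625

# Number of unknowns for hypothetical sizes m = 1000, n = 500, k = 10.
m, n, k = 1000, 500, 10
print(m * k + k * n)  # 15000
```

A convex function would satisfy $f(\text{mid}) \le \tfrac12 (f(p) + f(q)) = 0$, which fails here, so local search can stall at different minima.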
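Python Library: NIMFA
http://nimfa.biolab.si/

Installation:
pip install nimfa

Code:
```python
import numpy as np
import nimfa

# Placeholder input: any nonnegative matrix (random example data here).
matrix = np.random.rand(40, 100)

nmf = nimfa.Nmf(matrix, seed="random_vcol", rank=2, max_iter=2000)
fit = nmf()         # run the factorization
W = fit.basis()     # basis matrix W (m x k)
H = fit.coef()      # coefficient matrix H (k x n)
```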