Evaluation of Distance Metrics for Recognition Based on Non-Negative Matrix Factorization


David Guillamet, Jordi Vitrià
Pattern Recognition Letters 24:1599-1605, June 2003

John Galeotti
Advanced Perception
March 23, 2004

Actually, Two ICPR’02 Papers

Analyzing Non-Negative Matrix Factorization for Image Classification
  David Guillamet, Bernt Schiele, Jordi Vitrià
Determining a Suitable Metric When using Non-negative Matrix Factorization
  David Guillamet, Jordi Vitrià

Non-Negative Matrix Factorization

TLA: NMF
Used for dimensionality reduction
$V_{n \times m} \approx W_{n \times r} H_{r \times m}$, with $r < nm/(n+m)$
  V has non-negative training samples as its columns
  W contains the non-negative basis vectors
  H contains the non-negative coefficients to approximate each column of V using W
Results similar in concept to PCA, but with non-negative “basis vectors”
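The bound $r < nm/(n+m)$ is the condition under which W and H together hold fewer numbers than V itself, since W has $nr$ entries, H has $rm$ entries, and $(n+m)r < nm$. A small sketch of that arithmetic follows; the function name `max_compressing_rank` and the example sizes (784-dimensional vectors, 60,000 samples) are illustrative choices, not values from the papers.

```python
def max_compressing_rank(n: int, m: int) -> int:
    """Largest integer r that satisfies r * (n + m) < n * m."""
    return (n * m - 1) // (n + m)


# Illustrative sizes only (assumption): n = vector length, m = number of samples.
n, m = 784, 60_000
r = max_compressing_rank(n, m)
print("largest r:", r)
print("entries in W and H:", r * (n + m), "entries in V:", n * m)
```

In practice r is chosen far below this bound; the bound only marks where the factorization stops being a reduction at all.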

NMF Distinguishing Properties

Requires positive data
Computationally expensive
Part-based decomposition
  Because only additive combinations of original data are allowed
Not an orthonormal basis

Different Decomposition Types

[Figure: PCA and NMF bases for numeric digits, shown for 20 dimensions and for 50 dimensions]

Why not just use PCA?

PCA is optimal for reconstruction
PCA is not optimal for separation and recognition of classes

NMF Issues Addressed

If/when is NMF better at dimensionality reduction than PCA for classification?
Can combining PCA and NMF lead to better performance?
What is the best distance metric to use with the nonorthonormal basis of NMF?

How NMF Works

$V_{n \times m} \approx W_{n \times r} H_{r \times m}$, with $r < nm/(n+m)$
Begin with an n x m matrix of training data V
  Each column is a vectorized data point
Randomly initialize W and H with positive values
Iterate according to the update rules [equations for W and H not recoverable from the source]
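The update equations did not survive in this copy of the slides. As a minimal sketch, assuming they are the multiplicative rules of Lee and Seung for the divergence (Poisson likelihood) objective, one iteration scales W by the ratio V/(WH) weighted by H, renormalizes the columns of W, and then scales H by the same ratio weighted by W. The helper names (`matmul`, `ratio`, `nmf`), the iteration count, and the small `eps` guard are assumptions made here for illustration.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def ratio(v, w, h, eps):
    """Element-wise V / (WH); eps guards against division by zero."""
    wh = matmul(w, h)
    return [[v_ij / (wh_ij + eps) for v_ij, wh_ij in zip(v_row, wh_row)]
            for v_row, wh_row in zip(v, wh)]


def nmf(v, r, iters=200, seed=0, eps=1e-12):
    """Factor a non-negative n x m matrix v into w (n x r) and h (r x m)."""
    n, m = len(v), len(v[0])
    rng = random.Random(seed)
    # Random strictly positive starting values, as the slide prescribes.
    w = [[rng.random() + 0.1 for _ in range(r)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(r)]
    for _ in range(iters):
        # W_ia <- W_ia * sum_j (V_ij / (WH)_ij) * H_aj
        q = ratio(v, w, h, eps)
        w = [[w[i][a] * sum(q[i][j] * h[a][j] for j in range(m))
              for a in range(r)] for i in range(n)]
        # Normalize every column of W to sum to one.
        col_sums = [sum(w[i][a] for i in range(n)) + eps for a in range(r)]
        w = [[w[i][a] / col_sums[a] for a in range(r)] for i in range(n)]
        # H_aj <- H_aj * sum_i W_ia * (V_ij / (WH)_ij), using the updated W.
        q = ratio(v, w, h, eps)
        h = [[h[a][j] * sum(w[i][a] * q[i][j] for i in range(n))
              for j in range(m)] for a in range(r)]
    return w, h


# Tiny usage example: a rank-1 matrix is recovered with r = 1.
v = [[1, 2], [2, 4], [3, 6]]
w, h = nmf(v, r=1, iters=5)
print(matmul(w, h))
```

Because every update multiplies by a non-negative factor, W and H stay non-negative without any explicit constraint handling.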

How NMF Works

In general, NMF requires the non-linear optimization of an objective function
The update rules just given correspond to a popular objective function, and are guaranteed to converge.
That objective function relates to the probability of generating the images in V from the bases W and encodings H [equation not recoverable from the source]
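The equation itself is missing from this copy. Assuming the slide follows Lee and Seung's original formulation (an assumption, since the source does not reproduce it), the objective to be maximized subject to $W, H \ge 0$ would be:

```latex
F = \sum_{i=1}^{n} \sum_{\mu=1}^{m}
    \left[ V_{i\mu} \log (WH)_{i\mu} - (WH)_{i\mu} \right]
```

Up to terms that do not depend on W or H, this is the log-likelihood of V when each entry $V_{i\mu}$ is modeled as Poisson-distributed with mean $(WH)_{i\mu}$, which is the sense in which it "relates to the probability of generating the images".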

NMF vs. PCA Experiments

Dataset: 10 classes of natural textures
  Clouds, grass, ice, trees, sand, sky, etc.
  932 color images total
  Each image tessellated into 10x10 patches
  1000 patches for training, 1000 for testing
  Each patch classified as a single texture
Raw feature vectors: Color histograms
  Each region histogrammed into 8 bins per color, 16 colors
  512 dimensional vectors
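One reading that makes the numbers agree is a joint histogram over three color channels with 8 bins each, which gives $8^3 = 512$ dimensions. The sketch below assumes that reading, along with RGB pixels and 8-bit channels; the slides do not say which color space was used, and the "16 colors" remark is not explained by the source. The function name `rgb_histogram` is hypothetical.

```python
def rgb_histogram(pixels, bins_per_channel=8):
    """Joint color histogram of (r, g, b) pixels with 8-bit channels.

    Assumption: one bin per combination of quantized r, g and b values,
    so 8 bins per channel yields an 8 * 8 * 8 = 512-dimensional vector.
    """
    width = 256 // bins_per_channel  # intensity values covered by one bin
    hist = [0] * bins_per_channel ** 3
    for r, g, b in pixels:
        index = ((r // width) * bins_per_channel + g // width) * bins_per_channel + b // width
        hist[index] += 1
    return hist


patch = [(0, 0, 0), (31, 31, 31), (255, 255, 255)]
features = rgb_histogram(patch)
print(len(features), sum(features))
```

Histogram counts are non-negative by construction, which is what makes them usable as columns of V.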

NMF vs. PCA Experiments

Learn both NMF and PCA subspaces for each class of histogram
For both NMF and PCA:
  Project queries onto the learned subspaces of each class
  Label each query by the subspace that best reconstructs the query
This seems like a poor scheme for NMF
  (Other experiments allow better schemes)
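The labeling scheme above can be sketched for the NMF side as follows. The source does not say how a query is projected onto an NMF basis, so this sketch assumes non-negative coefficients found with the multiplicative update for squared error while the basis is held fixed; for PCA the projection would instead be an ordinary orthogonal projection. All names (`encode`, `reconstruction_error`, `classify`) and the toy bases are hypothetical.

```python
import math


def encode(basis, x, iters=200, eps=1e-12):
    """Non-negative coefficients h with sum_a h[a] * basis[a] close to x.

    basis is a list of r basis vectors, each a list of n non-negative numbers.
    Assumption: multiplicative update for squared error, basis held fixed.
    """
    r = len(basis)
    h = [1.0] * r
    wtx = [sum(b_i * x_i for b_i, x_i in zip(b, x)) for b in basis]
    gram = [[sum(p * q for p, q in zip(b1, b2)) for b2 in basis] for b1 in basis]
    for _ in range(iters):
        h = [h[a] * wtx[a] / (sum(gram[a][c] * h[c] for c in range(r)) + eps)
             for a in range(r)]
    return h


def reconstruction_error(basis, x):
    """Euclidean distance between x and its reconstruction from the basis."""
    h = encode(basis, x)
    approx = [sum(h[a] * basis[a][i] for a in range(len(basis)))
              for i in range(len(x))]
    return math.sqrt(sum((x_i - y_i) ** 2 for x_i, y_i in zip(x, approx)))


def classify(class_bases, x):
    """Label x with the class whose basis reconstructs it best."""
    return min(class_bases, key=lambda label: reconstruction_error(class_bases[label], x))


# Toy bases (made up): class "a" spans the first two axes, class "b" the last two.
class_bases = {
    "a": [[1, 0, 0, 0], [0, 1, 0, 0]],
    "b": [[0, 0, 1, 0], [0, 0, 0, 1]],
}
print(classify(class_bases, [0.5, 0.5, 0.0, 0.0]))
```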

NMF vs. PCA Results

NMF works best for dispersed classes
PCA works best for compact classes
Both seem useful…try combining them
But why are fewer than half of the sky vectors best reconstructed by PCA, when for sky PCA has a mean reconstruction error less than 1/4 that of NMF? Mistakes?

NMF+PCA Experiments

During training, we learned whether NMF or PCA worked best for each class
Project a query to a class using only the method that works best for that class
Result: 2.3% improvement in the recognition rate over NMF alone (5.8% over PCA alone), but is this significant at a recognition rate of about 60%?
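A rough way to put the significance question in numbers is the binomial standard error of a recognition rate. The sketch below assumes the rate is about 60% and that it was measured on 1000 independent test patches; both figures are readings of the slides rather than values confirmed by the papers. Under those assumptions the standard error is about 1.5 percentage points, so a 2.3-point gain is roughly one and a half standard errors.

```python
import math


def proportion_std_error(rate: float, n_samples: int) -> float:
    """Binomial standard error of a rate estimated from n_samples trials."""
    return math.sqrt(rate * (1.0 - rate) / n_samples)


# Assumed values: 60% recognition rate, 1000 independent test patches.
se = proportion_std_error(0.60, 1000)
print(f"standard error: {se:.4f}")
print(f"2.3-point gain in standard errors: {0.023 / se:.2f}")
```

This treats the two classifiers as if they were evaluated independently. Both were presumably run on the same test patches, so a paired test would be the more appropriate check.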

Hierarchy Experiments

At level k of the hierarchy, project the query onto each original class’ NMF or PCA subspace
But, to choose the direction to descend the hierarchy, we only care about the level k super-class containing the matching class
Furthermore, for each class the choice of PCA vs. NMF can be independently set at each level of the hierarchy

Hierarchy Results

2% improvement in recognition rate
  I really suspect that this is insignificant, and resulting only from the additional degrees of freedom
They employ various additional neighborhood-based hacks to increase their accuracy further, but I don’t see any relevance to NMF specifically

Need for a better metric

Want to classify based on nearest neighbor, rather than reprojection error
Unfortunately, NMF generates a nonorthonormal basis, and so the relative distance to a base depends on the uniqueness of that base
  Bases will share a lot of pixels in common areas

Earth Movers Distance (EMD)

Defined as the minimal amount of “work” that must be performed to transform one feature distribution into the other
A special case of the “transportation problem” from linear optimization
  Let I = set of suppliers, J = set of consumers, $c_{ij}$ = cost to ship from supplier i to consumer j, $f_{ij}$ = amount shipped from i to j
Distance = cost to make datasets equal

Earth Movers Distance (EMD)

Based on finding a measure of correlation between bases to define its cost matrix
The cost matrix weights the transition of one basis ($b_i$) to another ($b_j$)
$c_{ij} = \mathrm{dist}_{angle}(b_i, b_j) = -\frac{x \cdot y}{\|x\| \, \|y\|}$, where $x = b_i$ and $y = b_j$
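As a minimal sketch, the cost matrix defined by this formula can be built directly from the basis vectors. The function names `dist_angle` and `cost_matrix` and the toy bases are chosen here for illustration, and every basis vector is assumed to be nonzero.

```python
import math


def dist_angle(x, y):
    """Negative cosine of the angle between x and y (both assumed nonzero)."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return -dot / norms


def cost_matrix(bases):
    """c[i][j] = dist_angle(b_i, b_j) for every pair of basis vectors."""
    return [[dist_angle(bi, bj) for bj in bases] for bi in bases]


# Toy basis vectors (made up): the third overlaps with both of the others.
bases = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
for row in cost_matrix(bases):
    print([round(c, 3) for c in row])
```

For non-negative bases every entry lies between -1 (same direction) and 0 (no shared pixels), so heavily overlapping bases are the cheapest to move mass between.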

EMD: Transportation Problem

$f_{ij}$ = quantity shipped from i to j
Constraints [equations not recoverable from the source]:
  Consumers don’t ship
  Don’t exceed demand
  Don’t exceed supply
Demand must equal supply for EMD to be a metric
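The constraint equations are missing from this copy. A sketch of the linear program they most likely describe, following the formulation in Rubner et al.'s EMD work, is given below; the symbols $s_i$ (supply of supplier i) and $d_j$ (demand of consumer j) are names introduced here, not taken from the slides.

```latex
\begin{aligned}
\text{minimize}\quad   & \sum_{i \in I} \sum_{j \in J} c_{ij} f_{ij} \\
\text{subject to}\quad & f_{ij} \ge 0
                         && \text{(consumers don't ship)} \\
                       & \sum_{i \in I} f_{ij} \le d_j \quad \forall j \in J
                         && \text{(don't exceed demand)} \\
                       & \sum_{j \in J} f_{ij} \le s_i \quad \forall i \in I
                         && \text{(don't exceed supply)} \\
                       & \sum_{i \in I} \sum_{j \in J} f_{ij}
                         = \min\Bigl( \sum_{i \in I} s_i, \; \sum_{j \in J} d_j \Bigr) \\[4pt]
\mathrm{EMD}           & = \frac{\sum_{i,j} c_{ij} f_{ij}}{\sum_{i,j} f_{ij}}
\end{aligned}
```

When total supply equals total demand, the two inequality constraints hold with equality at the optimum, which is the case the last line of the slide refers to.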

EMD vs. “Other” Experiments

Digit recognition from MNIST digit database
  60,000 training images + 10,000 for test
Classify by NN and 5NN in the subspace
Result: EMD works best in low-dimensional subspaces, but in high-dimensional subspaces EMD does not work well
More specifically, EMD works well when the bases contain some intersecting pixels
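The NN and 5NN classification step can be sketched with the distance left as a parameter, so that Euclidean distance, dist_angle, or an EMD routine can be swapped in. The function names and the toy training points are illustrative, and Euclidean distance is used here only because it needs no extra code.

```python
from collections import Counter


def knn_label(query, train, k, dist):
    """Majority label among the k training vectors closest to query.

    train is a list of (vector, label) pairs; dist is any function that
    takes two vectors and returns a number (smaller means closer).
    """
    nearest = sorted(train, key=lambda item: dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def euclidean(x, y):
    """Plain Euclidean distance, used here as a stand-in metric."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5


# Toy subspace coefficients (made up) with two classes.
train = [([0, 0], "a"), ([0, 1], "a"), ([1, 0], "a"), ([5, 5], "b"), ([5, 6], "b")]
print(knn_label([0.2, 0.1], train, k=1, dist=euclidean))
print(knn_label([0.2, 0.1], train, k=5, dist=euclidean))
```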

Occlusion Experiments

Randomly occlude either 1 or 2 of the 4 quadrants of an image (25% and 50% occlusion)
Why does dist_angle do so well?

Best subspace & distance with occlusions

|               | Low dim.              | High dim.            |
|---------------|-----------------------|----------------------|
| 25% Occlusion | NMF+dist_angle        | PCA sometimes better |
| 50% Occlusion | NMF+dist_angle OR EMD | NMF+dist_angle       |
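The occlusion procedure (1 or 2 of the 4 quadrants, chosen at random) can be sketched as below. The source does not say what value occluded pixels take, so setting them to zero is an assumption, as is the function name `occlude_quadrants`.

```python
import random


def occlude_quadrants(image, count, rng=random):
    """Copy of a 2D image (list of rows) with `count` random quadrants zeroed.

    Assumption: occluded pixels are set to 0.
    """
    rows, cols = len(image), len(image[0])
    half_r, half_c = rows // 2, cols // 2
    quadrants = [
        (0, half_r, 0, half_c), (0, half_r, half_c, cols),
        (half_r, rows, 0, half_c), (half_r, rows, half_c, cols),
    ]
    out = [list(row) for row in image]
    for r0, r1, c0, c1 in rng.sample(quadrants, count):
        for r in range(r0, r1):
            for c in range(c0, c1):
                out[r][c] = 0
    return out


image = [[1] * 4 for _ in range(4)]
for row in occlude_quadrants(image, count=2, rng=random.Random(0)):
    print(row)
```

Zeroed pixels keep the data non-negative, so occluded queries can still be encoded with an NMF basis.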

Demo

NMF difficulties
EMD experiments instead
  Demonstrate using existing code within the desired framework of a cost matrix
  Their code: http://robotics.stanford.edu/~rubner/emd/default.htm
  My code: http://www.vialab.org/john/Pres9-code/


Conclusion

NMF is a parts-based alternative to PCA
NMF and PCA should be combined for minimum-reprojection-error classification
For nearest-neighbor classification, NMF needs a better metric
When the subspace dimensionality is chosen appropriately for good bases, NMF+EMD or NMF+dist_angle have the highest recognition rates
