Principal Manifolds and Probabilistic Subspaces for Visual Recognition
Baback Moghaddam, TPAMI, June 2002
John Galeotti, Advanced Perception
February 12, 2004
It’s all about subspaces
Traditional subspaces: PCA, ICA, Kernel PCA (and neural-network NLPCA)
Probabilistic subspaces
Linear PCA
We already know this. Main properties:
Approximate reconstruction: x ≈ Φy
Orthonormality of the basis: Φ^T Φ = I
Decorrelated principal components: E[y_i y_j] = 0 for i ≠ j
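To make these properties concrete, here is a minimal NumPy sketch (my own illustration, not from the slides); the function name and layout are assumptions:

```python
import numpy as np

def pca_basis(X, M):
    """Top-M orthonormal eigenvectors (columns of Phi) of the sample covariance
    of X, where rows of X are observations."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1][:M]      # keep the M largest eigenvalues
    return eigvecs[:, order], eigvals[order]   # Phi (N x M, Phi^T Phi = I), lambdas

# y = Phi.T @ (x - mean)  -> decorrelated principal components
# x ≈ mean + Phi @ y      -> approximate reconstruction
```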
Linear ICA
Like PCA, but the components’ distributions are designed to be sub/super Gaussian, giving statistical independence
Main properties:
Approximate reconstruction: x ≈ Ay
Nonorthogonality of the basis A: A^T A ≠ I
Near factorization of the joint distribution: P(y) ≈ ∏ p(y_i)
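For illustration only (not part of the slides), a small FastICA example; scikit-learn, the Laplacian sources, and the random mixing matrix are my assumptions:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
S = rng.laplace(size=(1000, 3))          # independent, super-Gaussian sources
A = rng.normal(size=(3, 3))              # non-orthogonal mixing matrix
X = S @ A.T                              # observed mixtures, x = A s

ica = FastICA(n_components=3, random_state=0)
Y = ica.fit_transform(X)                 # estimated independent components y
A_hat = ica.mixing_                      # estimated basis, x ≈ A_hat y (A_hat^T A_hat ≠ I)
```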
Nonlinear PCA (NLPCA)
AKA principal curves
Essentially nonlinear regression
Finds a curved subspace passing “through the middle of the data”
Nonlinear PCA (NLPCA)
Main properties:
Nonlinear projection: y = f(x)
Approximate reconstruction: x ≈ g(y)
No prior knowledge regarding the joint distribution of the components (typically): P(y) = ?
Two main methods: neural network encoder and Kernel PCA (KPCA)
NLPCA neural network encoder
Trained to match the output to the input
Uses a “bottleneck” layer to force a lower-dimensional representation
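A minimal sketch of such a bottleneck autoencoder, assuming PyTorch and arbitrarily chosen hidden-layer sizes (the 21x12 = 252-pixel input size is taken from the experiments later in the talk):

```python
import torch.nn as nn

N, M = 252, 20   # e.g. 21x12-pixel images squeezed to a 20-dim representation
encoder = nn.Sequential(nn.Linear(N, 64), nn.Tanh(), nn.Linear(64, M))
decoder = nn.Sequential(nn.Linear(M, 64), nn.Tanh(), nn.Linear(64, N))
autoencoder = nn.Sequential(encoder, decoder)

# Training minimizes the reconstruction error ||x - autoencoder(x)||^2, so the
# bottleneck activations act as nonlinear principal components:
#   y = encoder(x)   (nonlinear projection y = f(x))
#   x ≈ decoder(y)   (approximate reconstruction x ≈ g(y))
```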
KPCA
Similar to kernel-based nonlinear SVM
Maps data to a higher-dimensional space in which linear PCA is applied
Nonlinear input mapping Φ(x): R^N → R^L, with N < L
Covariance is computed with dot products
For economy, keep Φ(x) implicit: k(x_i, x_j) = ( Φ(x_i) · Φ(x_j) )
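A rough NumPy sketch of this kernel trick, assuming a Gaussian kernel with a hand-picked width gamma (both my assumptions, not prescribed by the slides):

```python
import numpy as np

def kernel_pca(X, M, gamma=1.0):
    """Linear PCA in an implicit feature space, using only dot products via the
    kernel k(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))   # kernel matrix
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n   # center data in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:M]
    alphas = eigvecs[:, order] / np.sqrt(eigvals[order])  # unit-norm feature-space axes
    return Kc @ alphas                                    # M nonlinear components per sample
```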
KPCA
Does not require nonlinear optimization
Is not subject to overfitting
Requires no prior knowledge of network architecture or number of dimensions
Requires the (unprincipled) selection of an “optimal” kernel and its parameters
Nearest-neighbor recognition
Find the labeled image most similar to the N-dim input vector using a suitable M-dim subspace
Similarity example: S(I_1, I_2) ∝ ||Δ||^(-1), where Δ = I_1 - I_2
Observation: two types of image variation
Critical: images of different objects
Incidental: images of the same object under different lighting, surroundings, etc.
Problem: the preceding subspace projections do not help distinguish the variation type when calculating similarity
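A tiny sketch of this baseline matcher (my own phrasing), ranking gallery images by the inverse norm of the image-difference vector:

```python
import numpy as np

def nearest_neighbor_label(query, gallery, labels):
    """S(I1, I2) ∝ 1 / ||Δ||, Δ = I1 - I2: the most similar gallery image is
    simply the one at the smallest Euclidean distance from the query."""
    dists = np.linalg.norm(gallery - query, axis=1)   # ||Δ|| for every gallery image
    return labels[np.argmin(dists)]
```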
Probabilistic similarity
Similarity based on the probability that Δ is characteristic of incidental variations
Δ = image-difference vector (N-dim)
Ω_I = incidental (intrapersonal) variations
Ω_E = critical (extrapersonal) variations
S(Δ) = P(Ω_I | Δ) = P(Δ | Ω_I) P(Ω_I) / [ P(Δ | Ω_I) P(Ω_I) + P(Δ | Ω_E) P(Ω_E) ]
Probabilistic similarity
Likelihoods P(Δ|Ω) are estimated using subspace density estimation
Priors P(Ω) are set to reflect specific operating conditions (often uniform)
Two images are of the same object if P(Ω_I | Δ) > P(Ω_E | Δ), i.e. S(Δ) > 0.5
S(Δ) = P(Ω_I | Δ) = P(Δ | Ω_I) P(Ω_I) / [ P(Δ | Ω_I) P(Ω_I) + P(Δ | Ω_E) P(Ω_E) ]
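A direct transcription of the similarity above into code (function and argument names are mine):

```python
def map_similarity(p_delta_I, p_delta_E, prior_I=0.5, prior_E=0.5):
    """S(Δ) = P(Ω_I | Δ) by Bayes' rule.  With uniform priors, two images are
    judged to show the same object when the returned value exceeds 0.5."""
    num = p_delta_I * prior_I
    return num / (num + p_delta_E * prior_E)
```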
Subspace density estimation
Necessary for each P(Δ|Ω), Ω ∈ {Ω_I, Ω_E}
Perform PCA on training sets of Δ for each Ω
The covariance matrix Σ defines a Gaussian
Two subspaces:
F = M-dimensional principal subspace of Σ
F̄ = non-principal subspace orthogonal to F
y_i = Δ projected onto the principal eigenvectors
λ_i = ranked eigenvalues
Non-principal eigenvalues are typically unknown and are estimated by fitting a decaying function of the form f^(-n) to the known eigenvalues
Subspace density estimation
ε²(Δ) = PCA residual (reconstruction error)
ρ = density in the non-principal subspace ≈ average of the (estimated) F̄ eigenvalues
P(Δ|Ω) is marginalized into each subspace:
Marginal density is exact in F
Marginal density is approximate in F̄
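A sketch of the resulting two-part density estimate (following the Moghaddam–Pentland form the slides describe; the NumPy phrasing and variable names are my assumptions):

```python
import numpy as np

def subspace_density(delta, V, lam, rho):
    """Estimate P(Δ|Ω) as an exact Gaussian marginal in the M-dim principal
    subspace F (eigenvectors V: N x M, eigenvalues lam) times an approximate
    isotropic Gaussian in the orthogonal complement F̄ with variance rho
    (≈ average of the estimated non-principal eigenvalues).
    Assumes delta already has the training mean subtracted."""
    N, M = V.shape
    y = V.T @ delta                              # principal components of Δ
    eps2 = delta @ delta - y @ y                 # ε²(Δ): PCA residual (reconstruction error)
    p_F = np.exp(-0.5 * np.sum(y**2 / lam)) / ((2 * np.pi) ** (M / 2) * np.sqrt(np.prod(lam)))
    p_Fbar = np.exp(-eps2 / (2 * rho)) / ((2 * np.pi * rho) ** ((N - M) / 2))
    return p_F * p_Fbar                          # exact in F, approximate in F̄
```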
Efficient similarity computation
After doing PCA, use a whitening transform to preprocess the labeled images into single sets of coefficients for each of the principal subspaces:
y = Λ^(-1/2) V^T x
where Λ and V are matrices of the principal eigenvalues and eigenvectors of either Σ_I or Σ_E
At run time, apply the same whitening transform to the input image
Efficient similarity computation
The whitening transform reduces the marginal Gaussian calculations in the principal subspaces F to simple Euclidean distances
The denominators are easy to precompute
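A sketch of this preprocessing step (NumPy, row-vector images; the formula follows the whitening transform on the previous slide):

```python
import numpy as np

def whiten(images, V, lam):
    """Whitening transform y = Λ^(-1/2) V^T x, with V and lam the principal
    eigenvectors/eigenvalues of Σ_I (or Σ_E).  After whitening, the exponent of
    the marginal Gaussian in F for a pair of images reduces to the plain
    Euclidean distance ||y_1 - y_2||², and the Gaussian normalization constants
    (the denominators) can be precomputed once."""
    return (images @ V) / np.sqrt(lam)   # one row of whitened coefficients per image
```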
Efficient similarity computationEfficient similarity computation
Further speedup can be gained by using a Further speedup can be gained by using a maximum likelihood (ML) rule instead of a maximum likelihood (ML) rule instead of a maximum a posteriori (MAP) rule:maximum a posteriori (MAP) rule:
Typically, ML is only a few percent less Typically, ML is only a few percent less accurate than MAP, but ML is twice as fastaccurate than MAP, but ML is twice as fast In general, In general, ΩΩEE seems less important than seems less important than ΩΩII
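A minimal sketch of the ML matcher (my own framing): rank gallery images by the intrapersonal likelihood alone, so only one subspace density is evaluated per comparison.

```python
import numpy as np

def match_ml(intra_likelihoods, labels):
    """ML rule S'(Δ) = P(Δ | Ω_I): pick the gallery image whose difference from
    the query is most likely under the intrapersonal density, ignoring Ω_E."""
    return labels[np.argmax(intra_likelihoods)]
```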
Similarity Comparison
[Figure comparing Eigenface (PCA) similarity with probabilistic similarity]
Experiments
21x12 low-res faces, aligned and normalized
5-fold cross validation
~140 unique individuals per subset
No overlap of individuals between subsets, to test generalization performance
80% of the data only determines the subspace(s)
20% of the data is divided into labeled images and query images for nearest-neighbor testing
Subspace dimensions d = 20
Chosen so PCA is ~80% accurate
Experiments
KPCA: empirically tweaked Gaussian, polynomial, and sigmoidal kernels; the Gaussian kernel performed the best, so it is used in the comparison
MAP: even split of the 20 subspace dimensions, M_E = M_I = d/2 = 10 so that M_E + M_I = 20
Results
Recognition accuracy (percent)
[Figure: accuracy of each method, with N-dimensional nearest neighbor (no subspace) as the baseline]
Results
Recognition accuracy vs. subspace dimensionality
Note: data split 50/50 for training/testing rather than using CV
Conclusions
Bayesian matching outperforms all other tested methods and even achieves ≈90% accuracy with only 4 projections (2 for each class of variation)
Bayesian matching is an order of magnitude faster to train than KPCA
Bayesian superiority with higher-resolution images was verified in independent US Army FERET tests
Wow! You should use this
My results
50% accuracy. Why so bad?
I implemented all suggested approximations
Poor data (hand registered)
Too little data
Note: data split 50/50 for training/testing rather than using CV
My results
[Figure: results on my data vs. his data]