PCA and Face Recognition - Tamara

PCA and Face Recognition

Dinghuang Ji

What is PCA?

Candidate features: color, age, hair, eyes, mouth, nose, with/without glasses, with/without earrings, ………

1. Which features are the most important (even if not semantic) for telling different groups of people apart?
2. Can we combine these features to reduce this list?

Toy example

• 1. generate some data sample

x = 1:100;
y = 20 + 3*x + 60*randn(100,1)';
scatter(x, y, 25, 'b', '*')

Example courtesy of Marc

Toy example

• 2. find a line fit f(x)

P = polyfit(x, y, 1);
yfit = P(1)*x + P(2);
hold on; plot(x, yfit, 'r-.');

Toy example

• 3. find a line fit g(y)

P1 = polyfit(y, x, 1);
xfit = P1(1)*y + P1(2);
plot(xfit, y, 'b-.');

Toy example

• 4. find a line fit with the first principal component

x_u = x - mean(x); y_u = y - mean(y);
cov_xy = cov(x_u, y_u);
[eigenVec, eigenVal] = eig(cov_xy);
plot(x, eigenVec(2,2)/eigenVec(2,1)*x_u + mean(y), 'g-.');

Principal Component Analysis

• Principal component analysis (PCA) is a technique that is useful for the compression and classification of data. The purpose is to reduce the dimensionality of a data set (sample) by finding a new set of variables, smaller than the original set of variables, that nonetheless retains most of the sample's information.

• By information we mean the variation present in the sample, given by the correlations between the original variables. The new variables, called principal components (PCs), are uncorrelated, and are ordered by the fraction of the total information each retains.

Slides courtesy of Frank Masci

Principal Component Analysis

principal component

Slides courtesy of Deng Cai

Principal Component Analysis

• Multiple dimensions

• The 1st PC z1 is a minimum-distance fit to a line in X space.

• The 2nd PC z2 is a minimum-distance fit to a line in the plane perpendicular to the 1st PC, and has the largest variance among directions perpendicular to the 1st PC.

• The PCs are a series of linear least-squares fits to lines, each orthogonal to all the previous ones, and each with the largest variance among directions perpendicular to all previous PCs.

Principal Component Analysis

• Main steps for computing PCA:

• Center the data and form the covariance matrix S.

• Compute its eigenvectors: {a_i}, i = 1, …, p.

• Use the first d eigenvectors {a_i}, i = 1, …, d, to form the d PCs.

• The transformation A is given by A = [a_1, …, a_d].

• Dimension reduction: X ∈ R^(p×n) → A^T X ∈ R^(d×n).

• Reconstruction of the original data: A^T X ∈ R^(d×n) → X̃ = A (A^T X) ∈ R^(p×n).
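As a minimal illustration of these steps (a NumPy sketch, not the course's MATLAB; all names here are mine):

```python
import numpy as np

def pca(X, d):
    """PCA following the steps above.
    X: p x n data matrix (one sample per column); d: target dimension."""
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean                        # center the data
    S = Xc @ Xc.T / (X.shape[1] - 1)     # p x p covariance matrix
    vals, vecs = np.linalg.eigh(S)       # eigenvalues in ascending order
    A = vecs[:, ::-1][:, :d]             # first d eigenvectors a_1, ..., a_d
    Z = A.T @ Xc                         # d x n reduced data
    X_rec = A @ Z + mean                 # p x n reconstruction
    return A, Z, X_rec
```

With d = p the reconstruction is exact; smaller d trades reconstruction error for dimensionality.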

Slides courtesy of Deng Cai

Face recognition based on PCA models

• Face Recognition using Eigenfaces

• Facial Recognition Using Active Shape Models, Local Patches and Support Vector Machines

• Face Recognition Based on Fitting a 3D Morphable Model

EigenFace

• The test image x is projected into the face space to obtain a vector p:

p = A^T (x − m)

• The distance of p to each face class is defined by

ε_k² = ||p − p_k||²,  k = 1, …, m

• A distance threshold θ_c is half the largest distance between any two face images:

θ_c = ½ max_{j,k} ||p_j − p_k||,  j, k = 1, …, m

Slides courtesy of Peter N. Belhumeur

EigenFace

• Find the distance ε between the original image x and its reconstruction from the eigenface space, x_f:

ε² = ||x − x_f||², where x_f = A p + m

• Recognition process:

• If ε ≥ θ_c, then the input image is not a face image.

• If ε < θ_c and ε_k ≥ θ_c for all k, then the input image contains an unknown face.

• If ε < θ_c and ε_{k*} = min_k ε_k < θ_c, then the input image contains the face of individual k*.
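The three rules can be sketched directly (a NumPy sketch; the function name and inputs are mine):

```python
import numpy as np

def classify_face(eps, eps_k, theta_c):
    """Apply the eigenface decision rules above.
    eps: reconstruction distance of the input image;
    eps_k: distances to the m face classes; theta_c: distance threshold."""
    if eps >= theta_c:
        return "not a face"
    if np.all(eps_k >= theta_c):
        return "unknown face"
    return int(np.argmin(eps_k))  # index k* of the recognized individual
```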

Eigenface

• Limitations

• Variations in lighting conditions: different lighting conditions for enrolment and query; bright light causing image saturation.

• Differences in pose: head orientation makes 2D feature distances appear to distort.

• Expression: change in feature location and shape.

Active shape model

• Proposed by Cootes et al., based on the point distribution model.

• For facial images, landmarks are manually labelled and aligned with the Procrustes algorithm.

• PCA is applied to the aligned shapes; the resulting coefficients, called shape parameters, are used to change the facial shape.

• Procrustes algorithm:

• Finds a rigid transformation between two shapes.

• Can be computed by least squares.
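That least-squares alignment can be sketched with the SVD-based (Kabsch) solution (a NumPy sketch; the function name is mine):

```python
import numpy as np

def procrustes_align(P, Q):
    """Least-squares rigid alignment (rotation R, translation t) mapping
    landmark set P onto Q; both are n x 2 arrays."""
    mu_p, mu_q = P.mean(axis=0), Q.mean(axis=0)
    H = (P - mu_p).T @ (Q - mu_q)                 # 2 x 2 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflection
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = mu_q - R @ mu_p
    return R, t                                   # aligned: (R @ P.T).T + t
```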

Active shape model

• Shape model, profile-pixel modelling, and boundary finding with the Mahalanobis distance.

Active shape model

• Algorithm steps:

1. Fit the mean model.

2. Find accurate landmark positions.

3. Optimize to get a better fit.

4. Repeat until convergence.
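The constraint at the heart of step 3 keeps the fitted shape plausible by clamping each shape parameter; a sketch (the ±3·sqrt(λ) bound is the standard ASM choice; names and toy sizes are mine):

```python
import numpy as np

def constrain_shape(x, x_mean, P, lam, k=3.0):
    """Project an observed shape x into the PCA shape space and clamp each
    shape parameter b_i to +/- k*sqrt(lambda_i).
    P: columns are eigenvectors; lam: the corresponding eigenvalues."""
    b = P.T @ (x - x_mean)            # shape parameters
    limit = k * np.sqrt(lam)
    b = np.clip(b, -limit, limit)     # enforce plausibility bounds
    return x_mean + P @ b             # constrained shape
```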

Face recognition with ASM, LP and SVM

• Obtain a set of landmark correspondences.

• Compute local patch features around the landmarks:

• 348-dim Gabor wavelet features.

• Alternatives: LBP, geometric blur, etc.

• Train a one-versus-all SVM model.

Experiments

• Do PCA on features

Face recognition with ASM, LP and SVM

• Pros: more robust to in-plane rotation and illumination.

• Cons: cannot handle profile-view faces or a wide range of illuminations.

Face recognition with 3D Morphable Model

1. Manually label 7 landmarks on the test image.

2. Fit the 3D model to the 2D landmarks.

3. Project the 3D model to the 2D image and iteratively optimize the model coefficients.

4. Minimize the resulting error.

How Do They Do It?

By exploiting the statistics of known faces.

The morphable model is built from 3D scans of 100 males and 100 females of different ages. The structure of newly generated faces is constrained to be in the range of that of known faces.

Slides courtesy of Volker Blanz

The Morphable 3D Face Model

The actual 3D structure of known faces is captured in the shape vector S = (x1, y1, z1, x2, …, yn, zn)T, containing the (x, y, z) coordinates of the n vertices of a face, and the texture vector T = (R1, G1, B1, R2, …, Gn, Bn)T, containing the color values at the corresponding vertices.

Slides courtesy of Volker Blanz

The Morphable 3D face model

Again, assuming that we have m such vector pairs in full correspondence, we can form new shapes S_model and new textures T_model as:

S_model = Σ_{i=1..m} a_i S_i,   T_model = Σ_{i=1..m} β_i T_i

The eigenvalues σ_i² of C_S represent the variance of the data set along the direction s_i, the corresponding eigenvector of C_S. So S_model can now be expressed as:

The Morphable 3D Face Model

S_model = S_av + Σ_{i=1..m} α_i s_i

and the probability density fitted over our data set is a function of α = (α_1, α_2, …, α_m)^T:

p(α) ∝ exp( −½ Σ_{i=1..m} (α_i / σ_i)² )
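Sampling a plausible shape from this model amounts to drawing each α_i from N(0, σ_i²); a NumPy sketch (all names are mine):

```python
import numpy as np

def sample_face_shape(S_av, s_vecs, sigma, rng):
    """Draw S = S_av + sum_i alpha_i * s_i with alpha_i ~ N(0, sigma_i^2),
    matching the prior p(alpha) above.
    s_vecs: columns are the eigen-shapes s_i; sigma: per-direction std devs."""
    alpha = rng.standard_normal(len(sigma)) * sigma
    return S_av + s_vecs @ alpha
```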

Optimization

• They employ a maximum a posteriori estimator, minimizing the image reconstruction error plus a prior term on the model coefficients.

Experiments

• Can handle harsh illumination, non-frontal views, or glasses.

Experiments

3D Morphable model

• Demo: FaceGen

Thank you

• Questions are welcome

Recognition using Compressed Sensing

Sparse signals

Slide credit: Duarte, Marco F., et al. "Single-pixel imaging via compressive sampling." Signal Processing Magazine, IEEE 25.2 (2008): 83-91.

Selection of features is immaterial as long as the feature space is sparse

Eigenfaces, Fisherfaces, Laplacianfaces

Occluded images

Patches of image as features

Slide credit: Wright, John, et al. "Robust face recognition via sparse representation." Pattern Analysis and Machine Intelligence, IEEE Transactions on 31.2 (2009): 210-227.

Sparse feature space and formulation of the recognition problem

• Ideal solution (NP-hard): min ||x||_0 subject to A x = y

• Compressed sensing solution: min ||x||_1 subject to A x = y

ℓ1 and ℓ0 minimization routines

• ℓ1 norm:

– Basis pursuit

– Quadratic solvers

• ℓ0 norm:

– Matching pursuits (greedy)

– Smoothed ℓ0 algorithm (SL0)
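As one concrete routine from this family, here is a minimal orthogonal matching pursuit sketch in NumPy (a greedy approximation to the ℓ0 problem; the toy dictionary used below is an assumption, not the paper's setup):

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit: greedily select the column of A most
    correlated with the residual, then re-fit the coefficients by least squares."""
    residual, support = y.copy(), []
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))   # best-matching atom
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x
```

With a well-conditioned dictionary and a truly k-sparse signal, this recovers the sparse coefficient vector exactly.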

Valid image vs invalid image


Results


Robust to noise and occlusion


Demo: Raw dataset

• MSI data for 1 user in 1 session, captured at six bands: 460 nm, 630 nm, 700 nm, 850 nm, 940 nm, and white light.

Demo: ROI extraction

Raw image → find angle and largest rectangle → crop out the largest rectangle.

Demo: Features for recognition algorithm: image patches

Demo: Successful recognition heat maps

• Small number of users:

• Large number of users:

Note that the signal is sparse.

Demo: Unsuccessful recognition heat maps

• Small number of users:

• Large number of users:

Note that the signal is NOT sparse.

Resources

• http://dsp.rice.edu/cs

Recognizing Actions in Movies

KTH Actions Dataset

Movie Dataset

Space-time Interest Points

• Describe a video segment instead of a single image

• Detected for multiple space-time scales

• Corners in space-time

Optical Flow

• Direction of movement of each pixel

Space-time Features

• Normalized histograms are concatenated into descriptor vectors

• K-means clustering on training-data features to form a visual vocabulary

Video Sequence Classification

• Space-time pyramid

• Histogram of visual-word occurrences over a space-time volume

• Histograms of subsequences of video are concatenated and normalized

• Non-linear SVM using a Gaussian kernel
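The quantization and histogram steps can be sketched in NumPy (illustrative; the toy vocabulary used below is an assumption):

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """Assign each space-time descriptor to its nearest visual word and
    return the normalized histogram of word occurrences.
    descriptors: n x d array; vocabulary: k x d array of k-means centers."""
    # squared distance from every descriptor to every visual word
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                       # nearest-center assignment
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()                        # normalize
```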

Results

Using Grammars for Action Recognition

Aniket Bera

Video analysis with CFGs

The "Inverse Hollywood Problem": From video to scripts and storyboards via causal analysis. Brand 1997

Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998

Recognizing Multitasked Activities from Video using Stochastic Context-Free Grammar. Moore and Essa 2001

CFG for human activities

enter detach leave enter detach attach touch touch detach attach leave


Parse tree for SCENE (Open up a PC): SCENE splits into IN, ACTION (Open PC), and OUT; the action expands through ADD, MOVE, and REMOVE nodes, with a nested ACTION (unscrew), down to the primitive sequence:

enter detach leave enter detach attach touch touch detach attach leave

• Deterministic low-level primitive detection

• Deterministic parsing

Stochastic CFGs


Gesture analysis with CFGs

Primitive recognition with HMMs


left-right


up-down


right-left


down-up


Parse Tree

S → RH; RH → TOP BOT; TOP → LR UD; BOT → RL DU; with terminals left-right, up-down, right-left, down-up.
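A toy probabilistic CYK parser over these four primitives illustrates the idea (the grammar, its CNF form, and the probabilities are illustrative assumptions, not the paper's actual rules):

```python
# Lexical rules: primitive -> preterminal (illustrative toy grammar)
LEX = {'left-right': 'LR', 'up-down': 'UD', 'right-left': 'RL', 'down-up': 'DU'}
# Binary rules in Chomsky normal form: (left, right) -> (parent, probability)
RULES = {
    ('LR', 'UD'): ('TOP', 1.0),
    ('RL', 'DU'): ('BOT', 1.0),
    ('TOP', 'BOT'): ('S', 1.0),
}

def cyk_prob(tokens, goal='S'):
    """Probabilistic CYK: chart[i][j] maps nonterminal -> best probability
    of deriving tokens[i:j]; returns the probability that goal derives tokens."""
    n = len(tokens)
    chart = [[{} for _ in range(n + 1)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        if tok in LEX:
            chart[i][i + 1][LEX[tok]] = 1.0
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for m in range(i + 1, j):
                for lsym, lp in chart[i][m].items():
                    for rsym, rp in chart[m][j].items():
                        if (lsym, rsym) in RULES:
                            parent, p = RULES[(lsym, rsym)]
                            cand = p * lp * rp
                            if cand > chart[i][j].get(parent, 0.0):
                                chart[i][j][parent] = cand
    return chart[0][n].get(goal, 0.0)
```

A valid gesture sequence parses with nonzero probability; an out-of-order sequence gets probability 0.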


Errors

• Each primitive HMM outputs a likelihood value over time (not discrete symbols).

• Errors are inevitable, but the grammar acts as a top-down constraint.

Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 23

Dealing with uncertainty & errors

• Stolcke-Earley (probabilistic) parser.

• SKIP rules to deal with insertion errors.

SCFG for Blackjack

Recognizing Multitasked Activities from Video using Stochastic Context-Free Grammar. Moore and Essa 2001

• Deals with more complex activities

• Deals with more error types

Stochastic Grammars: Overview

• Representation: stochastic grammar

• Terminals: object interactions

• Context-sensitive due to internal scene models

• Domain: Towers of Hanoi

• Requires activities with strong temporal constraints

• Contributions:

• Showed recognition & decomposition with very weak appearance models

• Demonstrated usefulness of feedback from high- to low-level reasoning components

Expectation Grammars (CVPR 2003)

• Analyze video of a person physically solving the Towers of Hanoi task

• Recognize valid activity

• Identify each move

• Segment objects

• Detect distracters / noise

System Overview

ToH: Low-Level Vision

Pipeline: raw video → background model → foreground and shadow detection → foreground components.

Low-Level Features

• Explanation-based symbols

• Blob interaction events: merge, split, enter, exit, tracked, noise

• Future work: hidden, revealed, blob-part, coalesce

• All possible explanations generated

• Inconsistent explanations heuristically pruned


Contributions

• Showed activity recognition and decomposition without appearance models

• Demonstrated usefulness of feedback from high-level, long-term interpretations to low-level, short-term decisions