BEYOND SIMPLE FEATURES: A LARGE-SCALE FEATURE SEARCH APPROACH TO UNCONSTRAINED FACE RECOGNITION
Nicolas Pinto, Massachusetts Institute of Technology
David Cox, The Rowland Institute at Harvard, Harvard University
International Conference on Automatic Face and Gesture Recognition (FG), 2011.
Transcript
Page 1: Nicolas  Pinto  Massachusetts  Institute of  Technology David Cox

BEYOND SIMPLE FEATURES: A LARGE-SCALE FEATURE SEARCH

APPROACH TO UNCONSTRAINED FACE

RECOGNITION

Nicolas Pinto, Massachusetts Institute of Technology
David Cox, The Rowland Institute at Harvard, Harvard University

International Conference on Automatic Face and Gesture Recognition (FG), 2011.

Page 2:

Outline
- Introduction
- Method
  - V1-like visual representation
  - High-throughput-derived multilayer visual representations
  - Kernel Combination
- Experiment
- Result
- Discussion

Page 3:

Introduction
“Biologically-inspired” representations capture aspects of the computational architecture of the brain and mimic its computational abilities.

Page 4:

Introduction
Large-scale feature search framework: generate models with different parameters, then screen them.

Page 5:

Method - V1-like visual representation
“Null model”: represents only a first-order description of the primary visual cortex.
Details:
- Preprocessing: resize the image to 150 pixels with aspect ratio preserved, using bicubic interpolation.
- Input normalization: divide each pixel’s intensity value by the norm of the pixels in its 3x3 neighboring region.
- Gabor wavelets: 16 orientations, 6 spatial frequencies.
- Output normalization: divide by the norm of the pixels in the 3x3 neighboring region.
- Thresholding and clipping: output values below 0 are set to 0, and values above 1 are set to 1.
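The local normalization step above can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' code: the `eps` stabilizer, image size, and loop-based implementation are assumptions for clarity.

```python
import numpy as np

def local_divisive_norm(img, size=3, eps=1e-6):
    """Divide each pixel by the L2 norm of its size x size neighborhood
    (a sketch of the input/output normalization step; eps is an assumed
    stabilizer to avoid division by zero, not from the slides)."""
    h, w = img.shape
    pad = size // 2
    padded = np.pad(img, pad, mode="constant")
    out = np.empty_like(img, dtype=float)
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + size, j:j + size]
            out[i, j] = img[i, j] / (np.linalg.norm(patch) + eps)
    return out

img = np.random.rand(8, 8)
norm = local_divisive_norm(img)
# thresholding/clipping: force all outputs into [0, 1]
clipped = np.clip(norm, 0.0, 1.0)
```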

Page 6:

V1-like visual representation Gabor Filter
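A Gabor filter bank of the kind shown on this slide (16 orientations x 6 spatial frequencies) can be generated as below. The kernel size, the specific frequency values, and the Gaussian width `sigma` are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def gabor_kernel(size, theta, freq, sigma):
    """Real-valued Gabor kernel: an oriented sinusoid (orientation theta,
    spatial frequency freq) under an isotropic Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return envelope * np.cos(2.0 * np.pi * freq * xr)

# 16 orientations x 6 spatial frequencies, as on the slide; the kernel
# size (21), the frequencies, and sigma below are assumed values.
thetas = np.linspace(0, np.pi, 16, endpoint=False)
freqs = [1/2, 1/3, 1/4, 1/6, 1/11, 1/18]
bank = [gabor_kernel(21, t, f, sigma=4.0) for t in thetas for f in freqs]
```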

Page 7:

Method - High-throughput-derived multilayer visual representations
Model architecture: candidate models were composed of a hierarchy of two (HT-L2) or three (HT-L3) layers.

Page 8:

High-throughput-derived multilayer visual representations
Input size:
- HT-L2: 100 x 100 pixels
- HT-L3: 200 x 200 pixels
Input was converted to grayscale and locally normalized.

Page 9:

High-throughput-derived multilayer visual representations: Linear Filter
The input is linearly filtered using a bank of filters to produce a stack of feature maps.
This operation is analogous to the weighted integration of synaptic inputs, where each filter in the filterbank represents a different cell.
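The filtering step can be sketched as a "valid" 2-D correlation of the input with each filter in the bank, yielding one feature map per filter ("cell"). The image size, filter count, and kernel size below are illustrative, not the model's screened values.

```python
import numpy as np

def filter_stack(image, filters):
    """'Valid' 2-D correlation of one image with each filter in the bank,
    producing a stack of feature maps (one map per filter/'cell')."""
    k = filters.shape[-1]
    h, w = image.shape
    oh, ow = h - k + 1, w - k + 1
    # Gather all k x k patches, then apply every filter as a dot product.
    patches = np.lib.stride_tricks.sliding_window_view(image, (k, k))
    patches = patches.reshape(oh * ow, k * k)
    maps = patches @ filters.reshape(len(filters), -1).T
    return maps.reshape(oh, ow, len(filters))

rng = np.random.default_rng(0)
filters = rng.uniform(-1, 1, size=(32, 5, 5))     # 32 random 5x5 kernels
maps = filter_stack(rng.random((20, 20)), filters)  # stack of feature maps
```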

Page 10:

High-throughput-derived multilayer visual representations: Linear Filter (cont.)
Parameters:
- The filter sizes were chosen randomly from {3, 5, 7, 9}.
- Depending on the layer l considered, the number of filters was chosen randomly from a layer-dependent set.
- All filter kernels were fixed to random values drawn from a uniform distribution.

Page 11:

High-throughput-derived multilayer visual representations: Activation Function
Output values were clipped to lie within a parametrically defined range.

Page 12:

High-throughput-derived multilayer visual representations: Activation Function (cont.)
Parameters:
- The minimum was randomly chosen to be −∞ or 0.
- The maximum was randomly chosen to be 1 or +∞.
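The activation step reduces to a parametric clip, where an infinite bound disables clipping on that side. A minimal sketch (the sample values are illustrative):

```python
import numpy as np

def activation(x, vmin, vmax):
    """Clip outputs into the parametric range [vmin, vmax];
    -inf / +inf disable the corresponding bound."""
    return np.clip(x, vmin, vmax)

x = np.array([-2.0, 0.3, 5.0])
# e.g. minimum drawn from {-inf, 0}, maximum drawn from {1, +inf}
assert np.allclose(activation(x, 0.0, 1.0), [0.0, 0.3, 1.0])
assert np.allclose(activation(x, -np.inf, 1.0), [-2.0, 0.3, 1.0])
```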

Page 13:

High-throughput-derived multilayer visual representations: Pooling
Values in each neighboring region were then pooled together, and the resulting outputs were spatially downsampled.

Page 14:

High-throughput-derived multilayer visual representations: Pooling (cont.)
Parameters:
- The stride parameter was fixed to 2, resulting in a downsampling factor of 4.
- The size of the neighborhood was randomly chosen from {3, 5, 7, 9}.
- The pooling exponent p was randomly chosen from {1, 2, 10}: p = 1 is equivalent to blurring; p = 2 or 10 corresponds to an Lp-norm.
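The pooling step can be sketched as an Lp-norm over each neighborhood followed by strided downsampling. The feature-map size and seed are illustrative assumptions:

```python
import numpy as np

def lp_pool(fmap, size=3, p=2, stride=2):
    """Pool each size x size neighborhood with an Lp norm (p=1 is plain
    summation, i.e. blurring up to scale), then downsample by `stride`."""
    windows = np.lib.stride_tricks.sliding_window_view(fmap, (size, size))
    pooled = (windows ** p).sum(axis=(-1, -2)) ** (1.0 / p)
    return pooled[::stride, ::stride]

# nonnegative inputs, as produced by the activation/clipping stage
fmap = np.abs(np.random.default_rng(1).standard_normal((16, 16)))
out = lp_pool(fmap, size=3, p=2, stride=2)
```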

Page 15:

High-throughput-derived multilayer visual representations: Normalization
Draws biological inspiration from the competitive interactions observed in natural neuronal systems (e.g. contrast gain control mechanisms in cortical area V1 and elsewhere).

Page 16:

High-throughput-derived multilayer visual representations: Normalization (cont.)
Parameters:
- The size of the neighborhood region was randomly chosen from {3, 5, 7, 9}.
- A binary parameter was chosen from {0, 1}.
- The vector of neighboring values could also be stretched by a randomly chosen gain value.
- The threshold value was also randomly chosen.

Page 17:

Method - Evaluation
- Binary hard-margin linear SVM
- 4 feature vectors

Page 18:

Method: Model overview

Page 19:

Method – Screening
Screening (model selection): select the best five models on the LFW View 1 aligned set.
Output dimensionalities ranged from 256 to 73,984.
Number of models:
- HT-L2: 5915
- HT-L3: 6917

Page 20:

Feature Augmentation
Multiple rescaled crops:
- Three different centered crops: 250x250, 150x150, 125x75
- Resized to the standard input size
- SVMs trained separately

Page 21:

Kernel Combination
Three strategies:
- Blend kernels resulting from different crops: simple kernel addition, with each kernel trace-normalized.
- Blend the 5 models within the same class.
- Hierarchical blends across model classes: assign exponentially larger weights to higher-level representations (V1-like < HT-L2 < HT-L3).
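Trace normalization followed by kernel addition can be sketched as below. The feature dimensions and seed are illustrative; the weighting shown in the comment is the optional hierarchical variant.

```python
import numpy as np

def trace_normalize(K):
    """Scale a kernel (Gram) matrix to unit trace so that kernels from
    different models contribute on a comparable scale."""
    return K / np.trace(K)

rng = np.random.default_rng(2)
X1, X2 = rng.random((10, 5)), rng.random((10, 8))  # two feature sets
K1, K2 = X1 @ X1.T, X2 @ X2.T                      # linear kernels
# simple blend: add trace-normalized kernels (a hierarchical blend would
# instead use exponentially larger weights for higher-level models)
K = trace_normalize(K1) + trace_normalize(K2)
```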

Page 22:

Kernel Combination: Kernel Method

Page 23:

Kernel Combination
The original formulation is equivalent.
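One way to read the equivalence on this slide, assuming linear kernels (the random data here is purely illustrative): summing Gram matrices is the same as concatenating the underlying feature vectors.

```python
import numpy as np

# If K1 = X1 @ X1.T and K2 = X2 @ X2.T (linear kernels), then
# K1 + K2 = [X1, X2] @ [X1, X2].T — adding kernels is equivalent to
# training a linear SVM on the concatenated features.
rng = np.random.default_rng(3)
X1, X2 = rng.random((6, 4)), rng.random((6, 7))
K_sum = X1 @ X1.T + X2 @ X2.T
K_cat = np.hstack([X1, X2]) @ np.hstack([X1, X2]).T
assert np.allclose(K_sum, K_cat)
```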

Page 24:

Kernel Combination: Multiple Kernel Learning (MKL)
Learn the kernel directly from the data.

Page 25:

Kernel Combination Multiple Kernel Learning (MKL)

Page 26:

Experiment
- Screen models on LFW View 1.
- Train SVMs and evaluate results using 10-fold cross-validation on LFW View 2.
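The fold construction for such a protocol can be sketched as follows (in LFW View 2 the ten splits are fixed by the benchmark; the random shuffling here is just a generic illustration).

```python
import numpy as np

def kfold_indices(n, k=10, seed=0):
    """Shuffle n sample indices and split them into k disjoint folds;
    each fold serves once as the test set, the rest as training data."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

folds = kfold_indices(100, k=10)
for i, test in enumerate(folds):
    train = np.concatenate([f for j, f in enumerate(folds) if j != i])
    # ... train the SVM on `train`, evaluate on `test` ...
```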

Page 27:

Result

Page 28:

Result Some error cases

Page 29:

Discussion
- Uses whole-image pixel values; does not deal with pose variation.
- Does it take advantage of background information, or is it disturbed by the background?
- Performance increases when adding different crops.

Page 30:

16-GPU Monster-Class Supercomputer
Environment:
- GNU/Linux
- Python, C, C++, Cython
- CUDA, PyCUDA
