The Analysis of Faces in Brains and Machines
9.523 Aspects of a Computational Theory of Intelligence
Rafael Reif
stay tuned...
Why is face analysis important for intelligence?
• Remember/recognize people we’ve seen before
• Categorization – e.g. gender, race, age, kinship
• Social communication – emotions/mood, intentions, trustworthiness, competence or intelligence, attractiveness
• Scene understanding, e.g. direction of gaze suggests focus of attention
Why is face recognition hard?
• changing pose
• changing illumination
• changing expression
• clutter
• occlusion
• aging
Jenkins, White, Van Montfort & Burton, Cognition, 2011
How good are we at face recognition?
Face recognition performance in humans
chance performance
testmybrain.org
Wilmer et al., 2012 Duchaine & Nakayama, 2006
Bruce et al., 1999
Face recognition performance in humans
Which of the 10 photos on the bottom depicts the target face?
• Viewers are ~ 70% correct
• Performance degrades with changes in pose, expression
• Only slight improvement with short video clip of target
Importance of familiar vs. unfamiliar face recognition!
How good are the best machines?
Public databases of face images serve as benchmarks:
• Labeled Faces in the Wild (LFW, http://vis-www.cs.umass.edu/lfw): > 13,000 images of celebrities, 5,749 different identities
• YouTube Faces Database (YTF, http://www.cs.tau.ac.il/~wolf/ytfaces): 3,425 videos, 1,595 different identities
Private face image datasets:
• (Facebook) Social Face Classification dataset: 4.4 million face photos, 4,030 different identities
• (Google) 100-200 million face images, ~ 8 million different identities
Accuracy             LFW      YTF
Facebook DeepFace    97.4%    91.4%
Google FaceNet       99.6%    95.1%
Human performance    97.5%    89.7%
Machine vision applications of face recognition
surveillance
access control
security, forensics
More applications of face recognition
content-based image retrieval
social media
graphics, HCI
humanoid robots
Aspects of face processing
• Face detection – find image regions that contain faces
• Face identification – who is the person?
• Categorization – gender, age, race
• Facial expression – mood, emotion
• Non-verbal social perception and communication
It all began with Takeo Kanade (1973)…
PhD thesis, Picture Processing System by Computer Complex and Recognition of Human Faces
• Special purpose algorithms to locate eyes, nose, mouth, boundaries of face
• ~ 40 geometric features, e.g. ratios of distances and angles between features
Eigenfaces for recognition (Turk & Pentland)
Principal Components Analysis (PCA)
Goal: reduce the dimensionality of the data while retaining as much information as possible in the original dataset
PCA allows us to compute a linear transformation that maps data from a high dimensional space to a lower dimensional subspace
Typical sample training set…
One or more images per person
Aligned & cropped to common pose, size
Simple background
Sample images from the Yale face database, results from C. deCoro http://www.cs.princeton.edu/~cdecoro/eigenfaces/
Eigenfaces for recognition (Turk & Pentland)
Perform PCA on a large set of training images, to create a set of eigenfaces, Ei(x,y), that span the data set
First components capture most of the variation across the data set, later components capture subtle variations
Each face image F(x,y) can be expressed as a weighted combination of the eigenfaces Ei(x,y):
F(x,y) = Ψ(x,y) + Σi wi*Ei(x,y)
Ψ(x,y): average face (across all faces)
http://vismod.media.mit.edu/vismod/demos/facerec/basic.html
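As a worked step, the weights can be obtained by projecting the mean-subtracted face onto each eigenface. The block below assumes the eigenfaces are orthonormal (as PCA components are) and that only the first k components are kept, so the expansion becomes an approximation; the index k is introduced here for illustration.

```latex
% Assumes orthonormal eigenfaces E_i and truncation to the first k components.
w_i = \sum_{x,y} E_i(x,y)\,\bigl[F(x,y) - \Psi(x,y)\bigr], \qquad i = 1,\dots,k

F(x,y) \approx \Psi(x,y) + \sum_{i=1}^{k} w_i\,E_i(x,y)
```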
Representing individual faces
Each face image F(x,y) can be expressed as a weighted combination of the eigenfaces Ei(x,y):
F(x,y) = Ψ(x,y) + Σi wi*Ei(x,y)
Recognition process:
(1) Compute weights wi for novel face image
(2) Find image m in face database with most similar weights, e.g. $\min_m \sum_{i=1}^{k} (w_i - w_i^m)^2$
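The two recognition steps can be sketched as follows. This is an illustrative sketch under stated assumptions, not the Turk & Pentland implementation: the function names are hypothetical, the "faces" are random arrays standing in for aligned, flattened images, and the eigenfaces are computed with an SVD.

```python
import numpy as np

def fit_eigenfaces(faces, k):
    """faces: (n_images, n_pixels) array of flattened, aligned face images."""
    mean_face = faces.mean(axis=0)
    _, _, vt = np.linalg.svd(faces - mean_face, full_matrices=False)
    return mean_face, vt[:k]          # rows of vt[:k] are the eigenfaces

def face_weights(image, mean_face, eigenfaces):
    """Step (1): weights w_i of one flattened image."""
    return eigenfaces @ (image - mean_face)

def recognize(image, mean_face, eigenfaces, database_weights):
    """Step (2): index m of the database face with the closest weights."""
    w = face_weights(image, mean_face, eigenfaces)
    distances = ((database_weights - w) ** 2).sum(axis=1)
    return int(np.argmin(distances))

# Synthetic stand-in for a face database: 20 random "faces" of 16 x 16 pixels.
rng = np.random.default_rng(1)
faces = rng.random((20, 256))
mean_face, eigenfaces = fit_eigenfaces(faces, k=10)
database_weights = (faces - mean_face) @ eigenfaces.T

# A noisy copy of face 7 is used as the novel image.
probe = faces[7] + 0.05 * rng.normal(size=256)
match = recognize(probe, mean_face, eigenfaces, database_weights)
```

Comparing k weights instead of all pixels is what makes the search cheap; the nearest-neighbor rule is the simplest choice and could be replaced by any classifier over the weight vectors.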
Changing expressions & lighting
Eigenfaces approach handles changes in facial expression ok…
… but not changes in lighting
(results from C. deCoro)
Face detection: Viola & Jones
Multiple view-based classifiers based on simple features that best discriminate faces vs. non-faces
Most discriminating features learned from thousands of samples of face and non-face image windows
Attentional mechanism: cascade of increasingly discriminating classifiers improves performance
Viola & Jones use simple features
Use simple rectangle features:
Σ I(x,y) in gray area – Σ I(x,y) in white area, within 24 x 24 image sub-windows
• Initially consider 160,000 potential features per sub-window!
• features computed very efficiently
Which features best distinguish face vs. non-face?
Learn most discriminating features from thousands of samples of face and non-face image windows
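One way rectangle features can be computed efficiently is with an integral image (summed-area table), the device Viola & Jones describe for this purpose. The sketch below is a minimal illustration: the function names and the toy 24 x 24 window are assumptions, and only a single horizontal two-rectangle feature is shown.

```python
import numpy as np

def integral_image(img):
    """Cumulative sum table with an extra zero row and column at the top-left."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of pixels in a rectangle using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def two_rect_feature(ii, top, left, height, width):
    """Gray (left half) sum minus white (right half) sum of a horizontal pair."""
    half = width // 2
    gray = rect_sum(ii, top, left, height, half)
    white = rect_sum(ii, top, left + half, height, half)
    return gray - white

# Toy 24 x 24 sub-window: bright left half, dark right half.
window = np.zeros((24, 24))
window[:, :12] = 1.0
ii = integral_image(window)

# The feature straddles the vertical edge, so it responds strongly.
value = two_rect_feature(ii, top=4, left=6, height=8, width=12)
```

After the table is built once per image, every rectangle sum costs four lookups regardless of its size, which is why evaluating a very large pool of candidate features stays affordable.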
Learning the best features (AdaBoost)
Weak classifier using one feature, where x = image window, f = feature, p = +1 or -1, θ = threshold
• n training samples (x1,w1,1) … (xn,wn,0), equal weights, known classes
• normalize weights
• find next best weak classifier
• use classification errors to update weights
• final classifier
~ 200 features yields good results for “monolithic” classifier
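The training loop can be sketched as below. Several things here are assumptions rather than content from the slides: the weak classifier is taken to be h(x) = 1 if p·f(x) < p·θ and 0 otherwise, the weight update and final vote follow the commonly published form of the Viola & Jones variant of AdaBoost, all function names are hypothetical, and the feature values are synthetic numbers rather than rectangle features of real images.

```python
import numpy as np

def weak_predict(feature_values, polarity, threshold):
    """Assumed weak classifier: h(x) = 1 if p * f(x) < p * theta, else 0."""
    return (polarity * feature_values < polarity * threshold).astype(int)

def best_weak_classifier(features, labels, weights):
    """Search every feature, threshold, and polarity for the lowest weighted error."""
    best = None
    for j in range(features.shape[1]):
        values = features[:, j]
        for threshold in np.unique(values):
            for polarity in (+1, -1):
                predictions = weak_predict(values, polarity, threshold)
                error = weights[predictions != labels].sum()
                if best is None or error < best[0]:
                    best = (error, j, polarity, threshold)
    return best

def adaboost(features, labels, rounds):
    n = len(labels)
    weights = np.full(n, 1.0 / n)                 # equal initial weights
    classifiers = []
    for _ in range(rounds):
        weights = weights / weights.sum()         # normalize weights
        error, j, polarity, threshold = best_weak_classifier(features, labels, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # guard against division by zero
        beta = error / (1.0 - error)
        predictions = weak_predict(features[:, j], polarity, threshold)
        # Correctly classified samples are down-weighted; mistakes keep their weight.
        weights = weights * np.where(predictions == labels, beta, 1.0)
        classifiers.append((np.log(1.0 / beta), j, polarity, threshold))
    return classifiers

def strong_predict(classifiers, features):
    """Final classifier: weighted vote compared with half the total vote weight."""
    total = sum(alpha for alpha, _, _, _ in classifiers)
    votes = sum(alpha * weak_predict(features[:, j], p, t)
                for alpha, j, p, t in classifiers)
    return (votes >= 0.5 * total).astype(int)

# Synthetic stand-in: 100 "face" and 100 "non-face" windows with 5 feature values each.
rng = np.random.default_rng(2)
labels = np.repeat([1, 0], 100)
features = rng.normal(size=(200, 5)) + np.where(labels[:, None] == 1, 1.0, -1.0)

classifiers = adaboost(features, labels, rounds=20)
accuracy = (strong_predict(classifiers, features) == labels).mean()
```

Each round selects one feature, so the number of rounds is also the number of features in the final classifier; the exhaustive threshold search is kept simple here and would be too slow for a pool of 160,000 features without further optimization.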
“Attentional cascade” of increasingly discriminating classifiers
Early classifiers use a few highly discriminating features, low threshold
• 1st classifier uses two features, removes 50% non-face windows
• later classifiers distinguish harder examples
• Increases efficiency
• Allows use of many more features
→ Cascade of 38 classifiers, using ~6000 features
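The control flow of such a cascade can be sketched as follows. The stages here are hypothetical stand-ins (a sum of a few feature values compared with a threshold) rather than trained boosted classifiers, and the feature values are made up; the point is only that a window is rejected at the first stage that fails, so most non-face windows cost very little.

```python
from typing import Callable, List, Sequence, Tuple

Stage = Callable[[Sequence[float]], bool]

def make_stage(feature_indices: List[int], threshold: float) -> Stage:
    """Hypothetical stage: accept when the selected feature values sum to the threshold or more."""
    def stage(window_features: Sequence[float]) -> bool:
        return sum(window_features[i] for i in feature_indices) >= threshold
    return stage

def cascade_accepts(stages: List[Stage],
                    window_features: Sequence[float]) -> Tuple[bool, int]:
    """Return (accepted, number of stages evaluated); stop at the first rejection."""
    for evaluated, stage in enumerate(stages, start=1):
        if not stage(window_features):
            return False, evaluated
    return True, len(stages)

# Early stages use few features and a low threshold; later stages use more.
stages = [
    make_stage([0, 1], threshold=0.5),
    make_stage([0, 1, 2, 3], threshold=2.0),
    make_stage([0, 1, 2, 3, 4, 5], threshold=4.0),
]

easy_non_face = [0.1, 0.1, 0.9, 0.9, 0.9, 0.9]   # rejected by the first stage
hard_non_face = [0.6, 0.6, 0.5, 0.5, 0.2, 0.2]   # survives until the last stage
face_like = [0.9, 0.8, 0.7, 0.9, 0.8, 0.9]       # passes every stage

results = [cascade_accepts(stages, w) for w in (easy_non_face, hard_non_face, face_like)]
```

Because the low early thresholds rarely reject true faces, the later, more expensive stages only run on the small fraction of windows that remain.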
Training with normalized faces
• 5000 faces
• many more non-face patches
• faces are normalized for scale, rotation
• small variation in pose
Viola & Jones results
With additional diagonal features, classifiers were created to handle image rotations and profile views
Feature based vs. holistic processing
Face Inversion Effect
• inversion disrupts recognition of faces more than other objects
• prosopagnosics do not show inversion effect
Composite Face Effect
• identical top halves seen as different when aligned with different bottom halves
• when misaligned, top halves perceived as identical
Feature based vs. holistic processing
Which features are more diagnostic?
Whole-Part Effect
Identification of the “studied” face is significantly better in the whole vs. part condition
Test conditions
Eyebrows are important!
View generalization mediated by motion?
Hypothesis: Temporal association is used to link multiple views of a person’s face
12 female faces scanned for 3D shape and visual texture
image sequences were created that morph between two different faces
observers viewed morph sequences, back and forth
same or different person? (shown separated in time)
performance within morph groups was compromised by temporal association
Wallis & Bulthoff, PNAS, 2001
The power of averages (Burton & colleagues)
Improves accuracy in the recognition of famous faces
- PCA
- commercial system
- human experiments
average “texture”
average “shape”
Faces everywhere...