What can computational models tell us about face processing?
Garrison W. CottrellGary's Unbelievable Research Unit (GURU)Computer Science and Engineering DepartmentInstitute for Neural ComputationUCSD
Collaborators, Past, Present and Future:Ralph Adolphs, Luke Barrington, Serge Belongie, Kristin Branson, Tom Busey, Andy Calder, Eric Christiansen, Matthew Dailey, Piotr Dollar, Michael Fleming, AfmZakaria Haque, Janet Hsiao, Carrie Joyce, Brenden Lake, Kang Lee, Joe McCleery, Janet Metcalfe, Jonathan Nelson, Nam Nguyen, Curt Padgett, Angelina Saldivar, Honghao Shan, Maki Sugimoto, Matt Tong, Brian Tran, Keiji Yamada, Lingyun Zhang
IEEE Computational Intelligence Society
4/12/2006
7/100
And now for something completely different…
• The CIS goal is to “mimic nature for problem solving”
• My goal is to mimic nature in order to understand nature
• In fact, as a cognitive scientist, I am glad when my models make the same mistakes people do…
• Because that means the model is fitting the data better -- so maybe I have a better model!
• So - don’t look for a better problem solver here… instead, I hope you’ll find some insights into how people process faces.
Why use models to understand thought?
• Models rush in where theories fear to tread.
• Models can be manipulated in ways people cannot.
• Models can be analyzed in ways people cannot.
Models rush in where theories fear to tread
• Theories are high-level descriptions of the processes underlying behavior.
• They are often not explicit about the processes involved.
• They are difficult to reason about if no mechanisms are explicit -- they may be too high level to make explicit predictions.
• Theory formation itself is difficult.
• Using machine learning techniques, one can often build a working model of a task for which we have no theories or algorithms (e.g., expression recognition).
• A working model provides an “intuition pump” for how things might work, especially if it is “neurally plausible” (e.g., development of face processing - Dailey and Cottrell).
• A working model may make unexpected predictions (e.g., the Interactive Activation Model and SLNT).
Models can be manipulated in ways people cannot
• We can see the effects of variations in cortical architecture (e.g., split (hemispheric) vs. non-split models (Shillcock and Monaghan word perception model)).
• We can see the effects of variations in processing resources (e.g., variations in number of hidden units in Plaut et al. models).
• We can see the effects of variations in environment (e.g., what if our parents were cans, cups or books instead of humans? I.e., is there something special about face expertise versus visual expertise in general? (Sugimoto and Cottrell, Joyce and Cottrell)).
• We can see variations in behavior due to different kinds of brain damage within a single “brain” (e.g. Juola and Plunkett, Hinton and Shallice).
Models can be analyzed in ways people cannot
In the following, I specifically refer to neural network models.
• We can do single unit recordings.
• We can selectively ablate and restore parts of the network, even down to the single unit level, to assess the contribution to processing.
• We can measure the individual connections -- e.g., the receptive and projective fields of a unit.
• We can measure responses at different layers of processing (e.g., which level accounts for a particular judgment: perceptual, object, or categorization? (Dailey et al., J Cog Neuro, 2002)).
How (I like) to build Cognitive Models
• I like to be able to relate them to the brain, so “neurally plausible” models are preferred -- neural nets.
• The model should be a working model of the actual task, rather than a cartoon version of it.
• Of course, the model should nevertheless be simplifying (i.e., it should be constrained to the essential features of the problem at hand):
  • Do we really need to model the (supposed) translation invariance and size invariance of biological perception?
  • As far as I can tell, NO!
• Then, take the model “as is” and fit the experimental data: 0 fitting parameters is preferred over 1, 2, or 3.
The other way (I like) to build Cognitive Models
• Same as above, except:
  • Use them as exploratory models -- in domains where there is little direct data (e.g., no single cell recordings in infants or undergraduates) -- to suggest what we might find if we could get the data. These can then serve as “intuition pumps.”
• Examples:
  • Why we might get specialized face processors
  • Why those face processors get recruited for other tasks
Outline
• Review of our model of face and object processing
• Some insights from modeling:
  • What could “holistic processing” mean?
  • Does a specialized processor for faces need to be innately specified?
  • Why would a face area process BMWs?
• Some new directions:
  • How do we select where to look next?
  • How is information integrated across saccades?
The Face Processing System
[Diagram: Pixel (Retina) Level → Gabor Filtering → Perceptual (V1) Level → PCA → Object (IT) Level → Neural Net → Category Level: Happy, Sad, Afraid, Angry, Surprised, Disgusted]
The Face Processing System
[Diagram: Pixel (Retina) Level → Gabor Filtering → Perceptual (V1) Level → PCA → Object (IT) Level → Neural Net → Category Level: Bob, Carol, Ted, Alice]
The Face Processing System
[Diagram: Pixel (Retina) Level → Gabor Filtering → Perceptual (V1) Level → PCA (Feature level) → Object (IT) Level → Neural Net → Category Level: Bob, Carol, Ted, Cup, Can, Book]
The Face Processing System
[Diagram: Pixel (Retina) Level → Gabor Filtering → Perceptual (V1) Level → LSF PCA and HSF PCA → Object (IT) Level → Neural Net → Category Level: Bob, Carol, Ted, Cup, Can, Book]
The Gabor Filter Layer
• Basic feature: the 2-D Gabor wavelet filter (Daugman, 1985)
• These model the processing in early visual areas
[Pipeline: Convolution with the filter bank → Magnitudes → Subsample in a 29x36 grid]
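The pipeline above (convolve with a bank of 2-D Gabor wavelets, keep the response magnitudes, subsample on a coarse grid) can be sketched as follows. The filter-bank parameters, the tiny image, and the subsampling stride are illustrative stand-ins, not the actual settings from the talk:

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Complex 2-D Gabor wavelet (Daugman-style): Gaussian envelope times a complex sinusoid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)           # rotate the carrier axis
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    carrier = np.exp(2j * np.pi * xr / wavelength)
    return envelope * carrier

def gabor_magnitudes(image, wavelengths=(4, 8), orientations=4, size=9, sigma=3.0):
    """Convolve with a small filter bank, keep magnitudes, and subsample
    on a coarse grid (a toy stand-in for the 29x36 grid in the talk)."""
    h, w = image.shape
    feats = []
    for lam in wavelengths:
        for k in range(orientations):
            kern = gabor_kernel(size, lam, np.pi * k / orientations, sigma)
            resp = np.zeros((h - size + 1, w - size + 1), dtype=complex)
            for i in range(resp.shape[0]):               # naive valid convolution
                for j in range(resp.shape[1]):
                    resp[i, j] = np.sum(image[i:i+size, j:j+size] * kern)
            feats.append(np.abs(resp[::4, ::4]))         # magnitude + subsample
    return np.stack(feats)

rng = np.random.default_rng(0)
img = rng.random((32, 32))
feats = gabor_magnitudes(img)                            # shape: (8 filters, 6, 6)
```

Taking the complex magnitude discards phase, which is why this representation is noninvertible (a point that matters later when visualizing receptive fields).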
Principal Components Analysis
• The Gabor filters give us 40,600 numbers
• We use PCA to reduce this to 50 numbers
• PCA is like Factor Analysis: it finds the underlying directions of maximum variance
• PCA can be computed in a neural network through a competitive Hebbian learning mechanism
• Hence this is also a biologically plausible processing step
• We suggest this leads to representations similar to those in Inferior Temporal cortex
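The reduction step might look like this with SVD-based PCA; the toy sizes stand in for the real 40,600 → 50 reduction:

```python
import numpy as np

def pca_reduce(X, k):
    """Project rows of X onto the top-k principal components
    (the directions of maximum variance), via SVD of the centered data."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by variance explained.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T, Vt[:k], mean

rng = np.random.default_rng(1)
X = rng.random((60, 500))            # 60 images x 500 Gabor numbers (toy sizes)
Z, components, mean = pca_reduce(X, 50)   # 60 images x 50 numbers
```

By construction the first projected coordinate carries at least as much variance as the second, and so on down the list.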
How to do PCA with a neural network
(Cottrell, Munro & Zipser, 1987; Cottrell & Fleming 1990; Cottrell & Metcalfe 1990; O’Toole et al. 1991)
A self-organizing network that learns whole-object representations
(features, Principal Components, Holons, eigenfaces)
[Diagram: Input from Perceptual Layer → Holons (Gestalt layer)]
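The deck does not spell out the Hebbian mechanism, but one classic way to extract principal components with competitive Hebbian learning is Sanger's generalized Hebbian algorithm; treat this as one plausible instantiation, on toy data with a single dominant variance direction:

```python
import numpy as np

def sanger_update(W, x, lr):
    """One step of Sanger's generalized Hebbian rule:
    dW_ij = lr * y_i * (x_j - sum_{k<=i} y_k * W_kj).
    Rows of W converge to the leading principal components, in order."""
    y = W @ x
    # Cumulative reconstruction: each unit subtracts what earlier units
    # (and itself) already explain -- the "competitive" part.
    recon = np.cumsum(y[:, None] * W, axis=0)
    return W + lr * y[:, None] * (x[None, :] - recon)

rng = np.random.default_rng(2)
d, n_units = 10, 3
# Toy data: isotropic noise plus one dominant direction (the first axis)
direction = np.zeros(d)
direction[0] = 1.0
X = rng.normal(size=(2000, d)) * 0.1 + rng.normal(size=(2000, 1)) * direction
W = rng.normal(scale=0.1, size=(n_units, d))
for epoch in range(5):
    for x in X:
        W = sanger_update(W, x, lr=0.01)
# The first unit's weight vector should align with the dominant direction.
```

The cumulative-subtraction term is what makes later units learn successive components rather than all collapsing onto the first one.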
Holons
• They act like face cells (Desimone, 1991):
  • Response of single units is strong despite occluding eyes, e.g.
  • Response drops off with rotation
  • Some fire to my dog’s face
• A novel representation: distributed templates --
  • each unit’s optimal stimulus is a ghostly looking face (template-like),
  • but many units participate in the representation of a single face (distributed).
• For this audience: neither exemplars nor prototypes!
• They explain holistic processing:
  • Why? If stimulated with a partial match, the firing represents votes for this template: units “downstream” don’t know what caused this unit to fire.
The Final Layer: Classification
(Cottrell & Fleming 1990; Cottrell & Metcalfe 1990; Padgett & Cottrell 1996; Dailey & Cottrell 1999; Dailey et al. 2002)
The holistic representation is then used as input to a categorization network trained by supervised learning.
Excellent generalization performance demonstrates the sufficiency of the holistic representation for recognition.
[Diagram: Input from Perceptual Layer → Holons → Categories. Output: Cup, Can, Book, Greeble, Face, Bob, Carol, Ted, Happy, Sad, Afraid, etc.]
The Final Layer: Classification
• Categories can be at different levels: basic, subordinate.
• Simple learning rule (~delta rule). It says (mild lie here):
  • add inputs to your weights (synaptic strengths) when you are supposed to be on,
  • subtract them when you are supposed to be off.
• This makes your weights “look like” your favorite patterns – the ones that turn you on.
• With no hidden units, there is no back propagation of error.
• With hidden units, we get task-specific features (most interesting when we use the basic/subordinate distinction).
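The "add inputs when you should be on, subtract them when you should be off" rule can be sketched as a delta rule on sigmoid output units; the two-category toy data below is illustrative:

```python
import numpy as np

def delta_rule_train(X, T, lr=0.1, epochs=200):
    """Single-layer delta rule: w += lr * (target - output) * input.
    When a unit should be on and isn't, its weights move toward the input
    pattern; when it should be off, they move away -- so each unit's weights
    come to 'look like' the patterns that turn it on."""
    n_in, n_out = X.shape[1], T.shape[1]
    W = np.zeros((n_in, n_out))
    b = np.zeros(n_out)
    for _ in range(epochs):
        for x, t in zip(X, T):
            y = 1.0 / (1.0 + np.exp(-(x @ W + b)))   # sigmoid outputs
            W += lr * np.outer(x, t - y)
            b += lr * (t - y)
    return W, b

# Toy task: two linearly separable "categories"
X = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
T = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
W, b = delta_rule_train(X, T)
pred = 1.0 / (1.0 + np.exp(-(X @ W + b)))
```

With no hidden layer this needs no backpropagation, matching the slide; adding a hidden layer is what yields the task-specific features discussed later.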
Outline
• Review of our model of face and object processing
• Some insights from modeling:
  • What could “holistic processing” mean?
  • Does a specialized processor for faces need to be innately specified?
  • Why would a face area process BMWs?
• Some new directions:
  • How do we select where to look next?
  • How is information integrated across saccades?
Holistic Processing
• Holistic processing refers to a type of processing where visual stimuli are treated “as a piece” -- in fact, we are unable to ignore other apparent “parts” of an image.
• Face processing, in particular, is thought to be “holistic” in nature.
  • We are better at recognizing “Bob’s nose” when it is on his face
  • Changing the spacing between the eyes makes the nose look different
  • We are unable to ignore conflicting information from other parts of a face
• All of these might be summarized as “context influences perception,” but the context is obligatory.
Who do you see?
• Context influences perception
Same-Different Task
These look like very different women
But all that has changed is the height of the eyes, right?
Take the configural processing test!
What emotion is being shown in the top half of the image below?
Happy, Sad, Afraid, Surprised, Disgusted, or Angry?
Now, what do you see?
Answer: Sad
Do Holons explain these effects?
• Recall that they are templates --
• each unit’s optimal stimulus is a ghostly looking face (template-like)
• What will happen if there is a partial match?
• Suppose there is a holon that “likes happy faces”.
• The mouth will match, causing this unit to fire.
• Units downstream have learned to associate this firing with a happy face.
• They will “think” the top of the face is happier than it is…
Do Holons explain these effects?
• Clinton/Gore: The outer part of the face votes for Gore.
• The nose effect: a match at the eyes votes for that template’s nose.
• Expression/identity configural effects:
  • Split faces: the bottom votes for one person, the top another, but both vote for the WHOLE face…
  • Split expressions: the bottom votes for one expression, the top another…
Attention to half an image
[Diagram: Input Pixel Image → Gabor Filtering → Gabor Pattern → Attenuate → Attenuated Pattern]
Composite vs. non-composite facial expressions (Calder et al. 2000)
[Figure: network errors vs. human reaction times; error bars indicate one standard deviation]
Is Configural Processing of Identity and Expression Independent?
• Calder et al. (2000) found that adding additional inconsistent information that is not relevant to the task didn’t further slow reaction times.
• E.g., when the task is “who is it on the top?”, having a different person’s face on the bottom hurts your performance, but additionally having a different expression there does not hurt any further.
Same Identity, Different Expression
Different Identity, Same Expression
Different Identity, Different Expression
(Lack of) Interaction between expression and identity
[Figure: network reaction time (1 – correct output) vs. human reaction time (ms)]
Cottrell, Branson, and Calder, 2002
Why does this work?
[Diagram: Pixel (Retina) Level → Gabor Filtering → Perceptual (V1) Level → PCA → Object (IT) Level → Neural Net → Category Level: Happy, Sad, Afraid, Bob, Carol, Ted]
• Attenuated inconsistent information at the input leads to a weaker representation at the object level, because the wrong template is only weakly activated.
• The representation of shifted (non-configural) information has little impact, because the bottom half doesn’t match any template.
Configural/holistic processing phenomena accounted for
• Interference from incorrect information in the other half of the image.
• Lack of interference from misaligned incorrect information.
• We have shown this for identity and expression, as well as the lack of interaction between these.
• Calder suggested from his data that we must have two representations, one for expression and one for identity; but our model has only one representation.
Outline
• Review of our model of face and object processing
• Some insights from modeling:
  • What could “holistic processing” mean?
  • Does a specialized processor for faces need to be innately specified?
  • Why would a face area process BMWs?
• Some new directions:
  • How do we select where to look next?
  • How is information integrated across saccades?
Introduction
• The brain appears to devote specialized resources to face processing.
• The issue: innate or learned?
• Our approach: computational models guided by neuropsychological and experimental data.
• The model: competing neural networks + biologically plausible task and input biases.
• Results: interaction between face discrimination and low visual acuity leads to networks specializing for face recognition.
• No innateness necessary!
Step one: a model with parts
• Independent networks compete to perform new tasks
• A mediator rewards winners
• The question: What might cause a specialized face processor?
[Diagram: Stimulus → Feature Extraction units → Face Processing, Object Processing, and ?? modules → Mediator → Decision]
Developmental biases in learning
• The task: we have a strong need to discriminate between faces but not between baby bottles.
  • Mother’s face recognition at 4 days (Pascalis et al., 1995)
• The input: low spatial frequencies - which tend to be more holistic in nature
  • Infant sensitivity to high spatial frequencies is low at birth (from Banks and Salapatek, 1981)
Neural Network Implementation
• Separate nets in competition
• Output mixed by gate network
• More error feedback to the “winner”
[Diagram: Input Stimulus Image → Preprocessing → high spatial frequency and low spatial frequency modules → Output, mixed via multiplicative connections from the Gate]
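This design -- separate nets, a gate that mixes their outputs, and more error feedback to the winner -- is essentially a mixture-of-experts architecture. The deck gives no equations, so the responsibility rule, learning rates, and toy one-hot patterns below are assumptions for illustration only:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class MixtureOfExperts:
    """Linear expert networks whose outputs are mixed by a softmax gate.
    Error feedback to each expert is scaled by its responsibility, so the
    current 'winner' on a pattern learns the most from that pattern."""
    def __init__(self, n_in, n_out, n_experts=2, seed=0):
        rng = np.random.default_rng(seed)
        self.We = rng.normal(scale=0.1, size=(n_experts, n_out, n_in))  # experts
        self.Wg = rng.normal(scale=0.1, size=(n_experts, n_in))         # gate

    def forward(self, x):
        g = softmax(self.Wg @ x)                       # gate mixing weights
        ys = np.array([W @ x for W in self.We])        # per-expert outputs
        return g, ys, (g[:, None] * ys).sum(axis=0)    # mixed output

    def train_step(self, x, t, lr=0.05):
        g, ys, _ = self.forward(x)
        # Responsibility: gate weight times how well each expert fits the target
        fit = np.array([np.exp(-0.5 * np.sum((t - yi) ** 2)) for yi in ys])
        h = g * fit / np.sum(g * fit)
        for i in range(len(self.We)):                  # winners get more feedback
            self.We[i] += lr * h[i] * np.outer(t - ys[i], x)
        self.Wg += lr * np.outer(h - g, x)             # gate tracks the winners

rng = np.random.default_rng(3)
moe = MixtureOfExperts(n_in=2, n_out=1)
for _ in range(3000):
    if rng.random() < 0.5:
        x, t = np.array([1.0, 0.0]), np.array([1.0])    # one "category" of input
    else:
        x, t = np.array([0.0, 1.0]), np.array([-1.0])   # another category
    moe.train_step(x, t)
g_face, _, y_face = moe.forward(np.array([1.0, 0.0]))
g_obj, _, y_obj = moe.forward(np.array([0.0, 1.0]))
```

Recording the gate outputs per pattern (as the "Measuring specialization" slide describes) is then just a matter of logging `g` from `forward` during training.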
Experimental methods
• Image data: 12 faces, 12 books, 12 cups, 12 soda cans, five examples each.
• 8-bit grayscale, cropped and scaled to 64x64 pixels
Image Preprocessing
[Diagram: Gabor jet filter responses (512x5 elements) → dimensionality reduction by PCA, applied separately at each of the 5 scales → pattern vector (8x5 elements)]
Task Manipulation
Trained networks for two types of task:
• Superordinate four-way classification (book? face?)
• Subordinate classification within one class; simple classification for others (book? John?)
[Network output units -- Task 1 (superordinate): Face, Book, Cup, Can. Task 2 (subordinate): Book, Cup, Can, Bob, Carol, Ted, ..., Alice]
Input spatial frequency manipulation
Used two input pattern formats:
• Each module receives the same full pattern vector
• One module receives low spatial frequencies; the other receives high spatial frequencies
Measuring specialization
• Train the network
• Record how gate network outputs change with each pattern
[Diagram: for one pattern the gate weights Net 1 and Net 2 at 0.2 and 0.8; for another, at 0.7 and 0.3]
Specialization Results
[Figure: average gating unit weight for Faces, Books, Cups, and Cans under three tasks -- four-way classification (Face, Book, Cup, Can?), book identification (Face, Cup, Can, Book1, Book2, ...?), and face identification (Book, Cup, Can, Bob, Carol, Ted, ...?) -- shown for Module 1 vs. Module 2 (all frequencies) and for the high- vs. low-frequency modules (hi/lo split)]
Modeling prosopagnosia: we can “damage” the specialized network.
[Figure: % generalization accuracy vs. % damage (0-100%) for Faces, Books, Cups, and Cans, for the low-frequency module and the high-frequency module (Face Id task, split inputs)]
• Damage to the high spatial frequency network degrades object classification
• Damage to the low spatial frequency network degrades face identification
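Lesioning experiments like the damage curves above can be mimicked by zeroing a random fraction of a trained network's weights and re-measuring accuracy. The toy "trained" classifier below (template weights over random prototypes) is an illustrative stand-in for the actual modules:

```python
import numpy as np

def accuracy(W, X, labels):
    """Classification accuracy of a linear template classifier."""
    return float((np.argmax(X @ W, axis=1) == labels).mean())

def damage(W, fraction, rng):
    """Ablate a random fraction of connections by zeroing them."""
    Wd = W.copy()
    Wd[rng.random(W.shape) < fraction] = 0.0
    return Wd

rng = np.random.default_rng(4)
# Toy "trained" classifier: 20 inputs, 4 classes, random class prototypes
protos = rng.normal(size=(4, 20))
X = np.repeat(protos, 25, axis=0) + 0.1 * rng.normal(size=(100, 20))
labels = np.repeat(np.arange(4), 25)
W = protos.T                        # template weights: one column per class

acc_intact = accuracy(W, X, labels)              # near-perfect before damage
acc_half = accuracy(damage(W, 0.5, rng), X, labels)
acc_full = accuracy(damage(W, 1.0, rng), X, labels)  # chance-level at 100% damage
```

Sweeping `fraction` from 0 to 1 and plotting accuracy per category reproduces the shape of a damage curve like the ones summarized above.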
Conclusions so far…
• There is a strong interaction between task and spatial frequency in the degree of specialization for face processing.
• The model suggests that the infant’s low visual acuity and the need to discriminate between faces but not other objects could “lock in” a special face processor early in development.
• => General mechanisms (competition, known innate biases) could lead to a specialized face processing “module”
• No need for an innately-specified processor
Outline
• Review of our model of face and object processing
• Some insights from modeling:
  • What could “holistic processing” mean?
  • Does a specialized processor for faces need to be innately specified?
  • Why would a face area process BMWs?
• Some new directions:
  • How do we select where to look next?
  • How is information integrated across saccades?
Are you a perceptual expert? Take the expertise test!**
“Identify this object with the first name that comes to mind.”
**Courtesy of Jim Tanaka, University of Victoria
“Car” - Not an expert
“2002 BMW Series 7” - Expert!
“Bird” or “Blue Bird” - Not an expert
“Indigo Bunting” - Expert!
“Face” or “Man” - Not an expert
“George Dubya”- Expert!
Greeble Experts (Gauthier et al. 1999)
• Subjects trained over many hours to recognize individual Greebles.
• Activation of the FFA increased for Greebles as the training proceeded.
The visual expertise mystery
If the so-called “Fusiform Face Area” (FFA) is specialized for face processing, then why would it also be used for cars, birds, dogs, or Greebles?
Our view: the FFA is an area associated with a process: fine level discrimination of homogeneous categories.
But the question remains: why would an area that presumably starts as a face area get recruited for these other visual tasks? Surely, they don’t share features, do they?
Sugimoto & Cottrell (2001), Proceedings of the Cognitive Science Society
Solving the mystery with models
Main idea:
• There are multiple visual areas that could compete to be the Greeble expert - “basic” level areas and the “expert” (FFA) area.
• The expert area must use features that distinguish similar looking inputs -- that’s what makes it an expert.
• Perhaps these features will be useful for other fine-level discrimination tasks.
We will create:
• Basic level models - trained to identify an object’s class
• Expert level models - trained to identify individual objects
• Then we will put them in a race to become Greeble experts.
• Then we can deconstruct the winner to see why it won.
Sugimoto & Cottrell (2001), Proceedings of the Cognitive Science Society
Model Database
• A network that can differentiate faces, books, cups and cans is a “basic level network.”
• A network that can also differentiate individuals within ONE class (faces, cups, cans OR books) is an “expert.”
Model
• Pretrain two groups of neural networks on different tasks.
• Compare their abilities to learn a new individual Greeble classification task.
[Diagram: a shared hidden layer feeds output units. Experts are pretrained on subordinate outputs (Bob, Carol, Ted, book, cup, can); non-experts on basic outputs (face, book, cup, can); both then learn Greeble1, Greeble2, Greeble3]
Expertise begets expertise
• Learning to individuate cups, cans, books, or faces first leads to faster learning of Greebles (can’t try this with kids!!!).
• The more expertise, the faster the learning of the new task!
• Hence in a competition with the object area, FFA would win.
• If our parents were cans, the FCA (Fusiform Can Area) would win.
[Figure: amount of training required to be a Greeble expert vs. training time on the first task]
Entry Level Shift: Subordinate RT decreases with training
(rt = uncertainty of response = 1.0 - max(output))
[Figure: human data and network data; RT for subordinate (---) vs. basic responses as a function of number of training sessions]
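The network "reaction time" measure defined above, rt = 1.0 - max(output), is straightforward to compute; the output vectors here are made-up examples of an uncertain early network and a confident trained one:

```python
import numpy as np

def network_rt(outputs):
    """Network 'reaction time' as response uncertainty: rt = 1.0 - max(output).
    A confident (peaked) output vector yields a fast (small) RT."""
    return 1.0 - float(np.max(outputs))

early = np.array([0.4, 0.35, 0.25])   # uncertain early in training -> slow RT
late = np.array([0.9, 0.06, 0.04])    # confident after training -> fast RT
```

As training sharpens the subordinate-level outputs, this proxy RT falls, which is how the model's curves are compared with the human entry-level-shift data.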
How do experts learn the task?
• Expert level networks must be sensitive to within-class variation:
  • Representations must amplify small differences
• Basic level networks must ignore within-class variation:
  • Representations should reduce differences
Observing hidden layer representations
• Principal Components Analysis on hidden unit activations:
  • PCA of hidden unit activations allows us to reduce the dimensionality (to 2) and plot representations.
  • We can then observe how tightly clustered stimuli are in a low-dimensional subspace.
• We expect basic level networks to separate classes, but not individuals.
• We expect expert networks to separate classes and individuals.
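The analysis above might look like this on synthetic activations; the "basic" and "expert" codes are simulated by shrinking or enlarging within-class variation around shared class centers, which is an assumption for illustration, not data from the model:

```python
import numpy as np

def pca_2d(H):
    """Project hidden-unit activation vectors (rows) onto their top-2 PCs."""
    Hc = H - H.mean(axis=0)
    U, S, Vt = np.linalg.svd(Hc, full_matrices=False)
    return Hc @ Vt[:2].T

def mean_within_class_spread(Z, labels):
    """Average distance of points to their own class centroid in the 2-D plot."""
    total, n = 0.0, 0
    for c in np.unique(labels):
        pts = Z[labels == c]
        total += np.linalg.norm(pts - pts.mean(axis=0), axis=1).sum()
        n += len(pts)
    return total / n

rng = np.random.default_rng(5)
centers = rng.normal(scale=3.0, size=(4, 10))   # 4 classes, 10 hidden units
labels = np.repeat(np.arange(4), 20)
# A "basic-level" style code collapses within-class differences...
H_basic = centers[labels] + 0.1 * rng.normal(size=(80, 10))
# ...while an "expert" style code amplifies them.
H_expert = centers[labels] + 1.0 * rng.normal(size=(80, 10))

spread_basic = mean_within_class_spread(pca_2d(H_basic), labels)
spread_expert = mean_within_class_spread(pca_2d(H_expert), labels)
```

Both codes separate the four classes, but only the "expert" code spreads individuals out within each class -- the pattern the PCA plots in the next slides show.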
Subordinate level training magnifies small differences within object representations
[Figure: hidden-layer PCA plots after 1, 80, and 1280 epochs for the face, basic, and Greeble networks]
Greeble representations are spread out prior to Greeble training
[Figure: PCA of Greeble representations in the face, basic, and Greeble networks prior to Greeble training]
Variability Decreases Learning Time
[Figure: Greeble learning time vs. Greeble variance prior to learning Greebles (r = -0.834)]
Examining the Net’s Representations
• We want to visualize “receptive fields” in the network.
• But the Gabor magnitude representation is noninvertible.
• We can learn an approximate inverse mapping, however.
• We used linear regression to find the best linear combination of Gabor magnitude principal components for each image pixel.
• Then projecting each hidden unit’s weight vector into image space with the same mapping visualizes its “receptive field.”
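Under the assumptions above, the inverse mapping is an ordinary least-squares problem: regress pixel values on the Gabor-magnitude PCA coefficients, then push a hidden unit's weight vector through the learned map. This toy sketch uses illustrative names and dimensions, and a synthetic linear image-formation process, not the actual Gabor/PCA pipeline.

```python
import numpy as np

# Illustrative data: each image has a pixel vector and a vector of
# Gabor-magnitude principal-component coefficients.
rng = np.random.default_rng(1)
n_images, n_pcs, n_pixels = 100, 8, 64
pcs = rng.normal(size=(n_images, n_pcs))      # Gabor-PCA codes per image
true_map = rng.normal(size=(n_pcs, n_pixels))
pixels = pcs @ true_map                        # toy linear image formation

# Least-squares regression from PCA space back to pixel space:
# find W minimizing ||pcs @ W - pixels||^2, one column of W per pixel.
W, *_ = np.linalg.lstsq(pcs, pixels, rcond=None)

# A hidden unit's weight vector lives in PCA space; projecting it
# through W renders an approximate "receptive field" image.
hidden_weights = rng.normal(size=n_pcs)
receptive_field = hidden_weights @ W
print(receptive_field.shape)  # (64,)
```

Reshaping `receptive_field` back to the image's height and width gives the kind of visualization shown on the next slide.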
Two hidden unit receptive fields
AFTER TRAINING AS A FACE EXPERT
AFTER FURTHER TRAINING ON GREEBLES
HU 16
HU 36
NOTE: These are not face-specific!
Controlling for the number of classes
• We obtained 13 classes from hemera.com:
• 10 of these are learned at the basic level.
• 10 faces, each with 8 expressions, make the expert task
• 3 (lamps, ships, swords) are used for the novel expertise task.
Results: Pre-training
• New initial tasks of similar difficulty: in previous work, the basic level task was much easier.
• These are the learning curves for the 10 object classes and the 10 faces.
Results
• As before, experts still learned new expert level tasks faster.
[Figure: number of epochs to learn swords after learning faces or objects, vs. number of training epochs on faces or objects]
Outline
• Review of our model of face and object processing
• Some insights from modeling:
• What could "holistic processing" mean?
• Does a specialized processor for faces need to be innately specified?
• Why would a face area process BMWs?
• Some new directions:
• How do we select where to look next?
• How is information integrated across saccades?
Issues I haven't addressed…
1. Development - what is the trajectory of the system from infant to adult? How do representations change over development?
2. How do earlier acquired representations differ from later ones? I.e., what is the representational basis of Age of Acquisition effects?
3. How do representations change based on familiarity?
4. Does the FFA participate in basic level processing?
5. Dynamics of expertise: eye movements
   1. How do they change with expertise?
   2. Are there visual routines for different tasks?
   3. How much does the stimulus influence eye movements? I.e., how flexible are the routines?
   4. How do we decide where to look next?
6. How are samples integrated across saccades?
How do we decide where to look next?
• Both bottom up and top down influences:
• Local stimulus complexity == "interestingness"
• Task requirements: look for discriminative features
• We've looked at at least two ideas:
1. Gabor filter response variance
2. Mutual information between the features and the categories
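The first idea, variance of Gabor filter responses as an "interestingness" measure, can be sketched as follows. This is a minimal illustration: the kernel size, frequency, bandwidth, and number of orientations are arbitrary choices, not the values used in the actual model.

```python
import numpy as np

def gabor_kernel(size, theta, freq, sigma):
    """Real Gabor kernel: an oriented cosine grating under a Gaussian."""
    r = np.arange(size) - size // 2
    x, y = np.meshgrid(r, r)
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * freq * xr)

def interest_map(image, n_orientations=4):
    """Variance across oriented Gabor responses at each pixel.
    High variance = strongly oriented local structure = 'interesting'."""
    F = np.fft.fft2(image)
    responses = []
    for k in range(n_orientations):
        kern = gabor_kernel(15, np.pi * k / n_orientations, 0.2, 3.0)
        K = np.fft.fft2(kern, s=image.shape)  # zero-padded to image size
        responses.append(np.abs(np.fft.ifft2(F * K)))
    return np.var(np.stack(responses), axis=0)

img = np.zeros((64, 64))
img[:, 32] = 1.0            # a vertical edge: oriented, hence "interesting"
m = interest_map(img)
print(m.shape)  # (64, 64)
```

Taking the top-scoring locations of `m` gives candidate interest points like those shown on the next slide.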
Interest points created using Gabor filter variance
Where do we look next #2: Mutual Information
• Ullman et al. (2002) proposed that features of intermediate complexity are best for classification.
• They used the mutual information between image patches and categories to find patches that were good discriminators:
• Faces vs. non-faces; cars vs. non-cars
• They found that medium-sized and medium-resolution patches were best for these tasks.
• Our question: what features are best for subordinate-level classification tasks that require expertise, like facial identity recognition?
• We found that traditional features such as eyes, noses, and mouths are informative for identity ONLY in the context of each other, i.e., in a configuration.
• Conclusion: Holistic processing develops because “it is good.”
Ullman et al 2002
• Features of intermediate complexity (size and resolution) are best for classification.
• These were determined by computing the mutual information between an image patch and the class.
Facial Identity Classification
• Will features that are good for telling faces from objects be good for identification?
• We expect that more specific features will be needed for face identification.
Data Set
• We used 36 frontal images of 6 individuals (6 images each) from FERET [Phillips et al., 1998]. The images were aligned.
• Gabor filter responses were extracted from rectangular grids.
Patches
• Rectangular patches of different centers, sizes and Gabor filter frequencies were taken from images.
Corresponding Patches
• Patches are defined as "corresponding" when they are in the same position, size, and Gabor filter frequency across images.
• If a “Fred patch” matches the corresponding patch in another image, this is evidence for the “Fredness” of the new image.
• We can then use some measure of how many Fred patches match, and a threshold, to decide if this face is “Fred.”
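This match-and-threshold decision can be sketched directly. Everything here is illustrative: `is_fred`, the match tolerance, the minimum match count, and the toy patch vectors are hypothetical stand-ins, not the actual features or thresholds used in the study.

```python
import numpy as np

def is_fred(probe_patches, fred_patches, match_tol, min_matches):
    """Decide identity from corresponding patches.
    A probe patch 'matches' when its feature vector is within match_tol
    (Euclidean distance) of the corresponding stored Fred patch."""
    d = np.linalg.norm(probe_patches - fred_patches, axis=1)
    return (d <= match_tol).sum() >= min_matches

# Toy feature vectors: 8 corresponding patches, 4 features each.
fred = np.ones((8, 4))
same = fred + 0.05   # small appearance change: every patch still matches
other = fred + 2.0   # different person: no patch matches
print(is_fred(same, fred, 0.5, 5), is_fred(other, fred, 0.5, 5))  # True False
```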
Mutual Information
• How useful the patches were for face identification was measured by mutual information:
• I(C,F) = H(C) - H(C|F)
• C, F are binary variables standing for class and feature
• C = 1 when the image is of the individual
• F = 1 when the patch is present in the image
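For binary C and F, the quantity I(C,F) = H(C) - H(C|F) defined above can be computed directly from observed presence/absence data. The example data below are illustrative, not drawn from the FERET experiment.

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a Bernoulli variable with P(1) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def mutual_information(c, f):
    """I(C;F) = H(C) - H(C|F) for binary arrays:
    c[i] = 1 if image i is of the individual, f[i] = 1 if the patch is present."""
    c, f = np.asarray(c), np.asarray(f)
    h_c = entropy(c.mean())
    h_c_given_f = 0.0
    for v in (0, 1):
        mask = f == v
        if mask.any():
            h_c_given_f += mask.mean() * entropy(c[mask].mean())
    return h_c - h_c_given_f

# A perfectly diagnostic patch: present exactly on images of the individual.
c = [1, 1, 1, 0, 0, 0]
print(mutual_information(c, c))        # 1.0 (one full bit)
# An uninformative patch: present in every image.
print(mutual_information(c, [1] * 6))  # 0.0
```

Ranking patches by this score is what produces the "best patches" on the next slide.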
Results: Best Patches
The 6 patches with the highest mutual information. Frequencies 1 to 5 denote the highest to the lowest Gabor filter frequency.
These patches are similar to each other because we do not eliminate redundancy among them (they are not independent).
Conclusions so far…
• Contrary to intuition, local features such as eyes and mouths are not, by themselves, very informative for face identity.
• Local features need to be processed within medium-sized face areas for identification - where they sit in a particular configuration with other features.
• This may explain why holistic processing has developed for face processing - simply because it is good, or even necessary, for identification.
Integration across saccades
• Now, given these patches sampled from an image, what do we do with them?
• Joyca LaCroix's (2004) Natural Input Memory (NIM) model of recognition memory:
• At study, sample the image at random points. Store the patches.
• At test, sample the new image at random points.
• Count how many stored patches fall inside a ball of radius R around each new patch. The average of these counts is the recognition score.
• This is a kernel density estimation model, like the GCM, but the exemplars are patches, not whole images.
• I.e., the NIM answer to the integration problem is: don't integrate!
• NIM is a natural partner to our eye movement modeling.
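The NIM scoring rule described above can be sketched in a few lines. The patch features, radius, and sample counts below are arbitrary illustrative choices, not the parameters of LaCroix's model.

```python
import numpy as np

def nim_score(stored, probe, radius):
    """NIM-style recognition score (sketch).
    stored: (n_stored, d) feature vectors of patches stored at study.
    probe:  (n_probe, d) patches sampled from the test image.
    For each probe patch, count the stored patches within a ball of
    radius `radius`; the recognition score is the average count."""
    dists = np.linalg.norm(stored[None, :, :] - probe[:, None, :], axis=2)
    return (dists <= radius).sum(axis=1).mean()

rng = np.random.default_rng(2)
studied = rng.normal(size=(50, 10))
old = studied[:5] + 0.01 * rng.normal(size=(5, 10))  # near-copies of studied patches
new = rng.normal(size=(5, 10)) + 5.0                 # far from everything studied
print(nim_score(studied, old, 0.5) > nim_score(studied, new, 0.5))  # True
```

Because the score is an average of local counts, no explicit integration of the samples into a whole-image representation is required, which is exactly the "don't integrate" point above.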
Implications of the NIM model
• What would this mean for expertise?
• Lots of experience -> lots of fragments -> better discrimination
• Familiarity also means lots of fragments, under many lighting conditions, all associated with one name
• Augmentation with an interest operator (e.g., looking at high-variance points on the face; see previous slides and Yamada & Cottrell, 1994) could easily lead to parts-based representations!
Wrap up
• We are able to explain a variety of results in face processing.
• We have a mechanistic way of talking about “holistic processing.”
• How a specialized area might arise for faces, and why low spatial frequencies (LSF) appear to be important in face processing (specialization model: LSF -> better learning and generalization).
• Why a face area would be recruited to be a Greeble area: expert level (fine discrimination) processing leads to highly differentiated features useful for other discrimination tasks.
• And…we have plans to go beyond simple passive recognition models…
END