Post on 02-Feb-2016
Arbib: CS564 - Brain Theory and Artificial Intelligence, USC, Fall 2001. Lecture 7. Object Recognition
CS564 – Lecture 7. Object Recognition and Scene Analysis
Reading Assignments: TMB2, Sections 2.2 and 5.2
“Handout”: Extracts from HBTNN 2e drafts:
Shimon Edelman and Nathan Intrator: Visual Processing of Object Structure
Guy Wallis and Heinrich Bülthoff: Object recognition, neurophysiology
Simon Thorpe and Michèle Fabre-Thorpe: Fast Visual Processing
(My thanks to Laurent Itti and Bosco Tjan for permission to use the slides they
prepared for lectures on this topic.)
Bottom-Up Segmentation or Top-Down Control?
Object Recognition
What is Object Recognition?
Segmentation/Figure-Ground Separation: prerequisite or consequence?
Labeling an object [The focus of most studies]
Extracting a parametric description as well
Object Recognition versus Scene Analysis
An object may be part of a scene or itself be recognized as a “scene”
What is Object Recognition for?
As a context for recognizing something else (locating a house by the tree in the garden)
As a target for action (climb that tree)
“What” versus “How” in Humans
[Figure: from visual cortex, a dorsal (“how”) pathway runs to parietal cortex, supporting reach programming and grasp programming, and a ventral (“what”) pathway runs to inferotemporal cortex]
DF (Goodale and Milner): lesion in the ventral pathway; inability to verbalize or pantomime size or orientation
AT (Jeannerod et al.): lesion in the dorsal pathway; inability to preshape the hand (except for objects whose size is “in the semantics”)
Monkey data: Mishkin and Ungerleider on “what” versus “where”
Clinical Studies
Studies of patients with visual deficits strongly suggest that tight interaction
between the “where” and “what/how” visual streams is necessary for scene interpretation.
Visual agnosia: can see objects, copy drawings of them, etc., but cannot
recognize or name them!
Dorsal agnosia: cannot recognize objects if more than two are presented
simultaneously; the problem is one of localization.
Ventral agnosia: cannot identify objects.
These studies suggest…
We bind features of objects into objects (feature binding)
We bind objects in space into some arrangement (space binding)
We perceive the scene.
Feature binding = what/how stream
Space binding = where stream
Double role of spatial relationships:
To relate different portions of an object or scene as a guide to recognition
Augmented by other “how” parameters, to guide our behavior with respect to the observed scene.
Inferotemporal Pathways
Later stages of IT (AIT/CIT) connect to the frontal
lobe, whereas earlier ones (CIT/PIT) connect to the
parietal lobe. This functional distinction may well
be important in forming a complete picture of
inter-lobe interaction.
Shape perception and scene analysis
- Shape-selective neurons in cortex
- Coding: one neuron per object or population codes?
- Biologically inspired algorithms for shape perception
- The "gist" of a scene: how can we get it in 100 ms or less?
- Visual memory: how much do we remember of what we have seen?
- The world as an outside memory and our eyes as a lookup tool
Face Cells in Monkey
Object recognition
- The basic issues
- Translation and rotation invariance
- Neural models that do it
- 3D viewpoint invariance (data and models)
- Classical computer vision approaches: template matching and matched filters; wavelet transforms; correlation; etc.
- Examples: face recognition.
- More examples of biologically inspired object recognition systems which work remarkably well
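Of the classical approaches listed above, template matching by correlation is the simplest to state. The sketch below is a minimal, brute-force normalized cross-correlation (NCC) matcher, assuming NumPy and a grayscale image stored as a 2D array; the function name `ncc_match` is hypothetical. It handles translation only by exhaustive search and does nothing for rotation or scale, which is exactly the invariance problem listed above.

```python
import numpy as np


def ncc_match(image: np.ndarray, template: np.ndarray):
    """Slide `template` over `image` and return ((row, col), score) for the
    window with the highest normalized cross-correlation (range -1 to 1)."""
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    best_pos, best_score = (0, 0), -np.inf
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            patch = image[r:r + th, c:c + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p ** 2).sum()) * t_norm
            if denom == 0:
                continue  # a flat patch (or flat template) has no defined correlation
            score = float((p * t).sum() / denom)
            if score > best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score


# Usage: cut a template out of a random image and find it again.
rng = np.random.default_rng(0)
image = rng.random((20, 20))
template = image[5:10, 7:12].copy()
pos, score = ncc_match(image, template)
print(pos, round(score, 3))  # (5, 7) 1.0
```

Subtracting the means and dividing by the norms makes the score insensitive to overall brightness and contrast changes of the patch, but not to any geometric change.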
Extended Scene Perception
Attention-based analysis: Scan scene with attention, accumulate
evidence from detailed local analysis at each attended location.
Main issues:
- what is the internal representation?
- how detailed is memory?
- do we really have a detailed internal representation at all!!?
Gist: we can very quickly (120 ms) classify entire scenes or do simple
recognition tasks, yet we can shift attention only twice in that much time!
Thorpe: Recognizing Whether a Scene Contains an Animal
[Figure (panels A and B): reaction-time distributions for targets and distractors, with the minimum response time marked; ERPs to animal and non-animal scenes and their difference (mean of 15 subjects; scale -6 to 6 µV; 100 to 300 ms), with the ERP difference onset marked]
Claim: This is so quick that only feedforward processing can be involved
Eye Movements: Beyond Feedforward Processing
1) Examine the scene freely
2) Estimate the material circumstances of the family
3) Give the ages of the people
4) Surmise what the family had been doing before the arrival of the “unexpected visitor”
5) Remember the clothes worn by the people
6) Remember the positions of the people and objects
7) Estimate how long the “unexpected visitor” had been away from the family
The World as an Outside Memory
Kevin O’Regan, early 90s:
why build a detailed internal representation of the world?
too complex…
not enough memory…
… and useless?
The world is the memory. Attention and the eyes are a look-up tool!
The “Attention Hypothesis”
Rensink, 2000
No “integrative buffer”
Early processing extracts information up to “proto-object” complexity in a massively parallel manner
Attention is necessary to bind the different proto-objects into complete objects, as well as to bind object and location
Once attention leaves an object, the binding “dissolves.” This is not a problem: the binding can be formed again whenever needed, by shifting attention back to the object.
Only a rather sketchy “virtual representation” is kept in memory, and attention/eye movements are used to gather details as needed
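Rensink's proposal and O'Regan's “world as an outside memory” can be caricatured in a few lines. The sketch below is a hypothetical toy (the class and method names are invented for illustration, not taken from either author): the observer keeps only a list of locations, binds one object at a time when attending, and re-reads the details from the scene itself, so a change made while an object is unattended is simply picked up on the next look.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class VirtualRepresentation:
    """Toy reading of the attention hypothesis (hypothetical design):
    the scene stores the details, the observer keeps only a sketchy
    layout, and at most one object is bound at a time."""
    world: Dict[str, Dict[str, Any]]                 # external memory: location -> features
    layout: List[str] = field(init=False)            # sketchy internal summary
    bound: Optional[Tuple[str, Dict[str, Any]]] = field(default=None, init=False)

    def __post_init__(self) -> None:
        # Keep only where things are, not what they look like.
        self.layout = sorted(self.world)

    def attend(self, location: str) -> Tuple[str, Dict[str, Any]]:
        # Shifting attention replaces (dissolves) the previous binding and
        # re-reads the details from the world: "the world is the memory".
        self.bound = (location, dict(self.world[location]))
        return self.bound


scene = {"A": {"shape": "cup", "color": "red"}, "B": {"shape": "tree", "color": "green"}}
observer = VirtualRepresentation(scene)
observer.attend("A")
observer.attend("B")                 # the binding of A dissolves
scene["A"]["color"] = "blue"         # the world changes while A is unattended
print(observer.attend("A"))          # ('A', {'shape': 'cup', 'color': 'blue'})
```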
Challenges of Object Recognition
The binding problem: binding different features (color, orientation, etc.)
to yield a unitary percept (see next slide).
Bottom-up vs. top-down processing: how
much is assumed top-down vs. extracted
from the image?
Perception vs. recognition vs. categorization: seeing an object vs. seeing
it as something; matching views of known objects to memory vs.
matching a novel object to object categories in memory.
Viewpoint invariance: a major issue is to recognize objects irrespective
of the viewpoint from which we see them.
Four stages of representation (Marr, 1982)
1) pixel-based (light intensity)
2) primal sketch (discontinuities in intensity)
3) 2 ½ D sketch (oriented surfaces, relative depth between surfaces)
4) 3D model (shapes, spatial relationships, volumes)
TMB2 view: This may work in ideal cases, but in general “cooperative
computation” of multiple visual cues and perceptual schemas will be
required.
problem: computationally intractable!
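The step from stage 1 (pixel intensities) to stage 2 (discontinuities in intensity) can be illustrated with a minimal sketch, assuming NumPy. Marking pixels with a large finite-difference gradient is an assumption made for brevity: Marr's primal sketch is built from zero-crossings of filtered images plus considerable grouping, so this is only a crude stand-in. The function name is hypothetical.

```python
import numpy as np


def discontinuity_map(intensity: np.ndarray, threshold: float) -> np.ndarray:
    """Crude stage 1 -> stage 2 step: mark pixels where light intensity
    changes sharply, using plain finite-difference gradients."""
    gy, gx = np.gradient(intensity.astype(float))  # derivatives along rows, cols
    return np.hypot(gx, gy) > threshold


step = np.zeros((4, 8))
step[:, 4:] = 1.0                     # vertical luminance edge between columns 3 and 4
edges = discontinuity_map(step, threshold=0.25)
print(np.flatnonzero(edges[0]))       # [3 4]
```

Even this toy shows why later stages are needed: the output is a set of marked pixels, not yet surfaces, depths, or volumes.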
VISIONS
A computer vision system from 1987 developed by
Allen Hanson and Edward Riseman on the basis of
the HEARSAY system for speech understanding (TMB2 Sec. 4.2)
and Arbib’s Schema Theory (TMB2 Sec. 2.2 and Chap. 5)
This is schema-based and can be “mapped” onto hypotheses
about cooperative computation in the brain.
Key idea: Bringing context and scene knowledge into play so that
recognition of objects proceeds via islands of reliability to yield a
consensus interpretation of the scene.
See TMB2 Sec. 5.2 for the figures.
Biederman: Recognition by Components
Biederman et al. (1991 – )
“geons”: units of
3D geometric structure
JIM 3 (Hummel)
Collection of Fragments (Edelman and Intrator)
Collection of Fragments 2
Viewpoint Invariance
Major problem for recognition.
Biederman & Gerhardstein, 1994:
We can recognize two views of an unfamiliar object as being the same
object. Since an unfamiliar object has no stored views to match against,
viewpoint invariance cannot rely only on matching views to memory.
Models of Object Recognition
See Hummel, 1995, The Handbook of Brain Theory & Neural Networks
Direct Template Matching:
Processing hierarchy yields activation of view-tuned units.
A collection of view-tuned units is associated with one object.
View-tuned units are built from V4-like units,
using sets of weights that differ for each object.
e.g., Poggio & Edelman, 1990; Riesenhuber & Poggio, 1999
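The direct template-matching scheme above can be sketched with Gaussian (radial-basis-function) view-tuned units, in the spirit of the cited models. This is a simplified illustration, assuming NumPy: the 2-D "feature vectors" stand in for V4-like unit outputs and are hypothetical, and every view-tuned unit contributes with weight 1, whereas the cited models use object-specific weights.

```python
import numpy as np


def view_tuned_unit(x: np.ndarray, stored_view: np.ndarray, sigma: float) -> float:
    """Gaussian tuning: maximal when the input matches the stored view,
    falling off with distance from it."""
    return float(np.exp(-np.sum((x - stored_view) ** 2) / (2 * sigma ** 2)))


def object_scores(x, views_by_object, sigma=1.0):
    """Each object's score is the sum of its view-tuned units' responses
    (all weights fixed at 1 in this sketch)."""
    return {
        name: sum(view_tuned_unit(x, v, sigma) for v in views)
        for name, views in views_by_object.items()
    }


# Hypothetical feature vectors for two stored views of each object.
memory = {
    "cube": [np.array([0.0, 0.0]), np.array([1.0, 0.0])],
    "cone": [np.array([5.0, 5.0]), np.array([6.0, 5.0])],
}
scores = object_scores(np.array([0.5, 0.2]), memory)
best = max(scores, key=scores.get)
print(best)  # cube
```

A view lying between two stored views of the same object still excites both of them, which is how such networks interpolate between familiar views.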
Computational Model of Object Recognition
(Riesenhuber and Poggio, 1999)
The model neurons are tuned for the size and 3D orientation of the object.
Models of Object Recognition
Hierarchical Template Matching:
Image passed through layers of units with progressively more complex
features at progressively less specific locations.
Hierarchical in that features at one stage are built from features at
earlier stages.
e.g., Fukushima & Miyake (1982)’s Neocognitron:
Several processing layers, comprising
simple (S) and complex (C) cells.
S-cells in one layer respond to conjunctions of C-cells in previous layer.
C-cells in one layer are excited by
small neighborhoods of S-cells.
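One S-then-C stage of such a hierarchy can be sketched as below, assuming NumPy. Two simplifications are assumptions: S-cells compute a rectified correlation with a single feature template, and C-cells take the MAX over their neighborhood, which is the pooling rule of the later Riesenhuber & Poggio (1999) model rather than the Neocognitron's original pooling. The function names are hypothetical. The usage shows what the C layer buys: a one-pixel shift within a pooling neighborhood leaves the C output unchanged.

```python
import numpy as np


def s_layer(image: np.ndarray, kernels) -> np.ndarray:
    """S-cells: each kernel is a feature template; a cell's response is the
    rectified correlation of its kernel with the image patch it sees."""
    kh, kw = kernels[0].shape            # all kernels assumed the same size
    H, W = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((len(kernels), H, W))
    for k, ker in enumerate(kernels):
        for r in range(H):
            for c in range(W):
                out[k, r, c] = max(0.0, float((image[r:r + kh, c:c + kw] * ker).sum()))
    return out


def c_layer(s_maps: np.ndarray, pool: int = 2) -> np.ndarray:
    """C-cells: each cell takes the maximum over a pool x pool neighborhood
    of S-cells with the same feature (non-overlapping neighborhoods)."""
    K, H, W = s_maps.shape
    H2, W2 = H // pool, W // pool
    trimmed = s_maps[:, :H2 * pool, :W2 * pool]
    return trimmed.reshape(K, H2, pool, W2, pool).max(axis=(2, 4))


vertical = np.zeros((3, 3))
vertical[:, 1] = 1.0                     # a vertical-bar detector


def bar_image(col: int) -> np.ndarray:
    img = np.zeros((8, 8))
    img[:, col] = 1.0
    return img


c1 = c_layer(s_layer(bar_image(1), [vertical]))
c2 = c_layer(s_layer(bar_image(2), [vertical]))   # bar shifted by one pixel
c3 = c_layer(s_layer(bar_image(3), [vertical]))   # shifted into the next pool
print(np.array_equal(c1, c2), np.array_equal(c1, c3))  # True False
```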
Models of Object Recognition
Transform & Match:
First take care of rotation, translation, scale, etc. invariances.
Then recognize based on standardized pixel representation of objects.
e.g., Olshausen et al., 1993,
dynamic routing model
Template match: e.g., with
an associative memory based on
a Hopfield network.
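The template-match stage can be sketched with a classic Hopfield network, assuming the transform stage has already delivered standardized binary (+1/-1) patterns. This is a generic textbook Hopfield memory (Hebbian outer-product weights, asynchronous sign updates), not the specific network used by Olshausen et al.; the 16-pixel patterns are hypothetical stand-ins for normalized objects.

```python
import numpy as np


def train_hopfield(patterns) -> np.ndarray:
    """Hebbian outer-product rule for +1/-1 patterns, no self-connections."""
    P = np.asarray(patterns, dtype=float)
    W = P.T @ P / P.shape[1]
    np.fill_diagonal(W, 0.0)
    return W


def recall(W: np.ndarray, probe, max_sweeps: int = 10) -> np.ndarray:
    """Asynchronous updates (one unit at a time) until no unit changes."""
    s = np.asarray(probe, dtype=float).copy()
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            new_value = 1.0 if W[i] @ s >= 0 else -1.0
            if new_value != s[i]:
                s[i] = new_value
                changed = True
        if not changed:
            break
    return s


# Two stored "standardized" 16-pixel patterns (hypothetical).
p1 = np.array([1] * 8 + [-1] * 8, dtype=float)
p2 = np.array([1, -1] * 8, dtype=float)
W = train_hopfield([p1, p2])

noisy = p1.copy()
noisy[[0, 1]] *= -1                      # corrupt two pixels
print(np.array_equal(recall(W, noisy), p1))  # True
```

The division of labor matters: the Hopfield memory only completes patterns it has seen in exactly this standardized form, so all the invariance has to come from the transform stage.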
Recognition by Components
Structural approach to object recognition:
Biederman, 1987:
Complex objects are composed of simpler pieces
We can recognize a novel/unfamiliar object by parsing it in terms of its
component pieces, then comparing the assemblage of pieces to those of
known objects.
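The parse-then-compare idea can be sketched as set matching over a structural description. Everything here is a hypothetical simplification: parts are plain labels standing in for geons (so repeated parts collapse into one), relations are labeled triples, and similarity is the sum of two Jaccard overlaps, a choice made for illustration rather than taken from Biederman (1987). The example shows why relations matter: two objects built from the same geons are distinguished only by how the parts are arranged.

```python
from typing import Dict, FrozenSet, Tuple

Relation = Tuple[str, str, str]                  # (part, relation, part)
Description = Tuple[FrozenSet[str], FrozenSet[Relation]]


def describe(parts, relations) -> Description:
    return frozenset(parts), frozenset(relations)


def jaccard(a: frozenset, b: frozenset) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def match(query: Description, known: Dict[str, Description]) -> str:
    """Pick the known object whose parts AND part relations overlap most with the query."""
    def score(d: Description) -> float:
        return jaccard(query[0], d[0]) + jaccard(query[1], d[1])
    return max(known, key=lambda name: score(known[name]))


# Hypothetical geon labels and relations: same two parts, different arrangement.
known = {
    "cup": describe({"cylinder", "curved handle"}, {("curved handle", "attached to side of", "cylinder")}),
    "pail": describe({"cylinder", "curved handle"}, {("curved handle", "attached to top of", "cylinder")}),
}
novel_view = describe({"cylinder", "curved handle"}, {("curved handle", "attached to side of", "cylinder")})
print(match(novel_view, known))  # cup
```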
Recognition by components (Biederman, 1987)
GEONS: geometric elements of which all objects are composed
(cylinders, cones, etc.); on the order of 30 different shapes.
Skips 2 ½ D sketch: Geons are directly recognized from edges, based
on their nonaccidental properties (i.e., 3D features that are usually
preserved by the projective imaging process).
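Nonaccidental properties are easy to check numerically. A minimal sketch, assuming NumPy and an ideal pinhole camera with focal length `f` (the function names are hypothetical): three equally spaced collinear 3D points still project to collinear image points, whereas their equal spacing, a metric property, is not preserved.

```python
import numpy as np


def project(points3d, f: float = 1.0) -> np.ndarray:
    """Pinhole perspective projection onto the image plane z = f."""
    P = np.asarray(points3d, dtype=float)
    return f * P[:, :2] / P[:, 2:3]


def collinear(a, b, c, tol: float = 1e-9) -> bool:
    """Three 2-D points are collinear when the cross product of (b - a) and (c - a) vanishes."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])) < tol


start, direction = np.array([0.0, 0.0, 2.0]), np.array([1.0, 2.0, 3.0])
pts3d = [start, start + direction, start + 2 * direction]   # equally spaced on a 3D line
a, b, c = project(pts3d)

print(collinear(a, b, c))                                   # True: collinearity survives
print(np.linalg.norm(b - a), np.linalg.norm(c - b))         # unequal: spacing does not
```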
Basic Properties of GEONs
They are sufficiently different from each other to be easily
discriminated
They are view-invariant (look identical from most viewpoints)
They are robust to noise (can be identified even with parts of image
missing)
Support for RBC: We can recognize partially occluded
objects easily if the occlusions do not obscure the set
of geons which constitute the object.
Potential difficulties
Edelman, 1997
A. Structural description not enough; also need metric info
B. Difficult to extract geons from real images
C. Ambiguity in the structural description: most often we have several candidates
D. For some objects, deriving a structural representation can be difficult
Geon Neurons in IT?
These are preferred
stimuli for some IT neurons.
Fusiform Face Area in Humans
Standard View on Visual Processing (Tjan, 1999)
[Figure: visual processing produces a hierarchy of representations]
Early representation:
• Image specific
• Supports fine discrimination
• Noise tolerant
Late representation:
• Image invariant
• Supports generalization
• Noise sensitive
Multiple Memory/Decision Sites (Tjan, 1999)
[Figure: early (primary) visual processing feeds separate sites for faces, places, and possibly common objects (e.g., Kanwisher et al.; Ishai et al.)]
Tjan’s “Recognition by Anarchy”
[Figure: primary visual processing and sensory memory feed n memory sites; each site makes an independent decision (“R1” … “Ri” … “Rn”) after its own delay (t1 … ti … tn); the homunculus’ response is the first arriving response]
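The “first arriving response” rule can be sketched as a race between independent sites, assuming NumPy. The exponential delay distribution, the mean delays, and the function names are assumptions for illustration, not Tjan's parameters; the point is only that the site for which recognition is easiest (shortest expected delay) tends to determine the answer.

```python
import numpy as np


def first_arriving(responses, delays):
    """Homunculus' rule: report the response of whichever site decides first."""
    winner = int(np.argmin(delays))
    return responses[winner], winner


def win_rates(mean_delays, n_trials: int = 10_000, seed: int = 0) -> np.ndarray:
    """Hypothetical race: each site's decision delay on a trial is drawn from
    an exponential distribution with that site's mean; return the fraction
    of trials each site wins."""
    rng = np.random.default_rng(seed)
    means = np.asarray(mean_delays, dtype=float)
    delays = rng.exponential(means, size=(n_trials, means.size))
    return np.bincount(delays.argmin(axis=1), minlength=means.size) / n_trials


print(first_arriving(["e", "c", "o"], [120.0, 95.0, 140.0]))  # ('c', 1)
rates = win_rates([100.0, 50.0, 200.0])   # the second site is fastest on average
```

With independent exponential delays, each site's chance of winning is proportional to the reciprocal of its mean delay, so here the expected win rates are 2/7, 4/7 and 1/7.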
A toy visual system
Task: identify letters (e.g., “e”) presented at arbitrary positions and orientations
[Figure: single-site toy system; stages labeled image, normalize position, normalize orientation, down-sampling, and one memory]
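A minimal sketch of the single-site toy system, assuming NumPy and binary letter images. The stage order (normalize, then down-sample, then look up), the letter shapes, and all function names are assumptions. Position is normalized by moving the intensity centroid to the image center; the orientation-normalization stage is omitted for brevity, so this version copes with new positions but not with new orientations.

```python
import numpy as np


def normalize_position(img: np.ndarray) -> np.ndarray:
    """Shift the image so its intensity centroid sits at the image center.
    np.roll wraps around, so the letter is assumed not to touch the border."""
    rows, cols = np.indices(img.shape)
    total = img.sum()
    cy, cx = (rows * img).sum() / total, (cols * img).sum() / total
    dy = int(np.floor((img.shape[0] - 1) / 2 - cy + 0.5))   # round half up, consistently
    dx = int(np.floor((img.shape[1] - 1) / 2 - cx + 0.5))
    return np.roll(img, (dy, dx), axis=(0, 1))


def downsample(img: np.ndarray, factor: int) -> np.ndarray:
    """Block-average; image sides must be divisible by `factor`."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def preprocess(img: np.ndarray, factor: int = 2) -> np.ndarray:
    return downsample(normalize_position(img), factor)


def recognize(img: np.ndarray, memory: dict) -> str:
    """Nearest stored template (squared distance) after preprocessing."""
    probe = preprocess(img)
    return min(memory, key=lambda label: float(np.sum((memory[label] - probe) ** 2)))


def place(shape: np.ndarray, top: int, left: int, size: int = 12) -> np.ndarray:
    img = np.zeros((size, size))
    img[top:top + shape.shape[0], left:left + shape.shape[1]] = shape
    return img


L_shape = np.array([[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 1, 1, 1]], dtype=float)
T_shape = np.array([[1, 1, 1, 1], [0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]], dtype=float)
memory = {"L": preprocess(place(L_shape, 1, 1)), "T": preprocess(place(T_shape, 1, 1))}
print(recognize(place(L_shape, 6, 5), memory))  # L
```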
[Figure: the same toy system with three memory sites, Site 1, Site 2 and Site 3, attached to the processing stages (image, down-sampling, normalize position, normalize orientation)]
Study stimuli: 5 orientations and 20 positions, at high SNR
Test stimuli: 1) familiar (studied) views, 2) new positions, 3) new positions & orientations
Signal-to-noise ratio {RMS contrast} levels: 1800 {30%}, 1500 {25%}, 800 {20%}, 450 {15%}, 210 {10%}
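The contrast levels above are given as RMS contrast. A common definition is the standard deviation of luminance divided by mean luminance; the sketch below assumes that definition (the slide does not say how its SNR values were computed, and no relation between SNR and contrast is derived here). The function names and the example patch are hypothetical.

```python
import numpy as np


def rms_contrast(luminance) -> float:
    """RMS contrast: standard deviation of luminance divided by its mean."""
    L = np.asarray(luminance, dtype=float)
    return float(L.std() / L.mean())


def set_rms_contrast(luminance, target: float) -> np.ndarray:
    """Rescale deviations around the mean so the image has the target RMS contrast."""
    L = np.asarray(luminance, dtype=float)
    m = L.mean()
    return m + (L - m) * (target / rms_contrast(L))


patch = np.array([[0.35, 0.65], [0.65, 0.35]])   # mean 0.5, std 0.15
print(round(rms_contrast(patch), 3))             # 0.3
low = set_rms_contrast(patch, 0.10)              # e.g., the lowest test contrast listed above
```

Scaling deviations around the mean keeps mean luminance fixed, so only contrast changes between test levels.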
[Figure legend: raw image, norm. pos., norm. ori.; Site 1, Site 2, Site 3]
The processing speed of each recognition module depends on how difficult
the recognition is for that module.
[Figure: proportion correct (0 to 1) vs. contrast (%, axis from 10 to 100) in three panels: familiar views; novel positions; novel positions & orientations. Curves: raw image, norm. pos., norm. ori. (Site 1, Site 2, Site 3)]
[Figure: the same three panels (familiar views; novel positions; novel positions & orientations), proportion correct vs. contrast (%), with the curves for raw image, norm. pos. and norm. ori. (Site 1, Site 2, Site 3) plus the full model]
Black curve: full model in which recognition is based
on the fastest of the responses from the three stages.
Experimental techniques in visual neuroscience
- Recording from neurons: electrophysiology
- Multi-unit recording using electrode arrays
- Stimulating while recording
- Anesthetized vs. awake animals
- Single-neuron recording in awake humans
- Probing the limits of vision: visual psychophysics
- Functional neuroimaging: Techniques
- Experimental design issues
- Optical imaging
- Transcranial magnetic stimulation