
Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 1

Object Recognition

Outline:

• Introduction

• Representation: Concept

• Representation: Features

• Learning & Recognition

• Segmentation & Recognition

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 2

Credits: major sources of material, including figures and slides, were:

• Riesenhuber, M. & Poggio, T. Hierarchical models of object recognition in cortex. Nature Neuroscience, 1999.

• Mel, B. W. SEEMORE: Combining color, shape, and texture histogramming in a neurally inspired approach to visual object recognition. Neural Computation, 1997.

• Ullman, S., Vidal-Naquet, M. & Sali, E. Visual features of intermediate complexity and their use in classification. Nature Neuroscience, 2002.

• Lowe, D. G. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 2004.

• and various resources on the WWW

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 3

Why is it difficult?

Because appearance varies drastically with:

• position/pose/scale
• lighting/shadows
• articulation/expression
• partial occlusion

→ we need invariant recognition!

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 4

The “Classical View”

Historically: Image → Segmentation → Feature Extraction → Recognition

Problem: Bottom-up segmentation only works in a very limited range of situations! This architecture is fundamentally flawed!

Two ways out: 1) “direct” recognition, 2) integration of segmentation & recognition.

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 5

Ventral Stream

V1 → V2 → V4 → IT
→ larger RFs, higher “complexity”, higher invariance →
from edges and bars (V1) to objects and faces (IT)

[Figures: D. van Essen (V2), K. Tanaka (IT)]

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 6

Basic Models

Seminal work by Fukushima; a newer version by Riesenhuber and Poggio.

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 7

Questions

• what are the intermediate features?
• how/why are they learned?
• how is the invariance computation implemented?
  • what nonlinearities; at what level (dendrites?)
• how is invariance learned?
  • temporal continuity; role of eye movements?
• the basic model is feedforward; what do the feedback connections do?
  • attention/segmentation/Bayesian inference?

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 8

Representation: Concept

• 3-D models: not covered here
• view-based:
  • holistic descriptions of a view
  • invariant features / histogram techniques
  • spatial constellations of localized features

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 9

Holistic Descriptions I: Templates

Idea:
• compare image (regions) directly to a template
• image patches and the object template are represented as high-dimensional vectors
• simple comparison metrics (Euclidean distance, normalized correlation, ...); see the sketch below

Problem:
• such metrics are not robust even to small changes in position, aspect, or scale, or to deformations → difficult to achieve invariance
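A minimal sketch of the two comparison metrics mentioned above, assuming grayscale patch and template arrays of identical shape (function names are illustrative):

```python
import numpy as np

def euclidean_distance(patch, template):
    # Both inputs: grayscale arrays of identical shape, viewed as vectors.
    p = patch.ravel().astype(float)
    t = template.ravel().astype(float)
    return float(np.linalg.norm(p - t))

def normalized_correlation(patch, template):
    # Zero-mean, unit-norm comparison -> value in [-1, 1].
    p = patch.ravel().astype(float); p -= p.mean()
    t = template.ravel().astype(float); t -= t.mean()
    return float(p @ t / (np.linalg.norm(p) * np.linalg.norm(t) + 1e-12))
```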

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 10

Holistic Descriptions II: Eigenspace Approach

Somewhat better: “eigenspace” approaches
• perform Principal Component Analysis (PCA) on the training images (e.g., “Eigenfaces”)
• compare images by projecting onto a subset of the PCs (see the sketch below)

Turk & Pentland (1992); Murase & Nayar (1995)
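A minimal eigenspace sketch, assuming a set of segmented, aligned grayscale training images of identical size (names and the number of components are illustrative):

```python
import numpy as np

def build_eigenspace(train_images, n_components=20):
    # Stack images as rows of a data matrix, one high-dimensional vector each.
    X = np.stack([im.ravel() for im in train_images]).astype(float)
    mean = X.mean(axis=0)
    # PCA via SVD; rows of Vt are the principal components ("eigenfaces").
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(image, mean, components):
    # Coefficients of the image in the low-dimensional eigenspace.
    return components @ (image.ravel().astype(float) - mean)

# Recognition: compare projections, e.g. by Euclidean distance in PC space.
```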

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 11

Assessment

• quite successful for segmented and carefully aligned images (e.g., eyes and nose are at the same pixel coordinates in all images)
• but similar problems as above:
  • not well suited for clutter
  • problems with occlusions
  • some notable extensions trying to deal with this (e.g., Leonardis, 1996, 1997)

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 12

Feature Histograms

Idea: reach invariance by computing invariant features.
Examples: Mel (1997), Schiele & Crowley (1997, 2000)

Histogram pooling: occurrences of a simple feature from all image regions are thrown together into one “bin” (see the sketch below).
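A sketch of histogram pooling, assuming a precomputed map of discrete feature labels (one label per pixel or region); the label map and the comparison measure are assumptions, not the specific features used by Mel or Schiele & Crowley:

```python
import numpy as np

def feature_histogram(feature_map, n_features):
    # Pool all occurrences over the whole image into one histogram,
    # discarding where each feature occurred (this is the "binding" loss).
    hist = np.bincount(feature_map.ravel(), minlength=n_features).astype(float)
    return hist / (hist.sum() + 1e-12)

def histogram_intersection(h1, h2):
    # A common similarity measure for comparing two feature histograms.
    return float(np.minimum(h1, h2).sum())
```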

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 13

Assessment:
• works very well for segmented images with only one object, but...

Problem:
• histograms of simple features over the whole image lead to a “superposition catastrophe”; there is no “binding” mechanism
• consider several objects in a scene: the histogram contains all of their features, with no representation of which features came from the same object
• the system breaks down for clutter or complex backgrounds

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 14

[Figures from B. Mel (1997)]

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 15

Training and test images, performance (figure panels A-E).

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 16

Feature Constellations

Observation: holistic templates and histogram techniques can't handle cluttered scenes well.

Idea: how about constellations of features? E.g., a face is a constellation of eyes, nose, mouth, etc.

Elastic matching techniques (“Elastic Graph Matching”, EGM): Fischler & Elschlager (1973), Lades et al. (1993)

Tremendously successful for:
• face finding/recognition
• object recognition
• gesture recognition
• cluttered scene analysis

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 17

Representation: Features

Only discuss local features:

• image patches

• wavelet basis, e.g., Haar, Gabor

• complex features, e.g., SIFT (= Scale Invariant Feature Transform)

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 18

Image Patches

Fragment (image patch) features are scored by (Ullman, Vidal-Naquet & Sali, 2002):

• likelihood ratio: p(fragment detected | class) / p(fragment detected | non-class)
• “merit”: the mutual information between fragment detection and the class (see the sketch below)
• weight: the log of the likelihood ratio, used to weight detected fragments in classification
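A sketch of the “merit” as mutual information between a binary fragment-detection variable and the class label, in the spirit of Ullman et al. (2002); the detection step that produces the binary variables is assumed to be done elsewhere:

```python
import numpy as np

def mutual_information(detected, labels):
    # detected, labels: binary (0/1) arrays over a set of training images;
    # detected[i] = 1 if the fragment was found in image i,
    # labels[i] = 1 if image i contains the object class.
    mi = 0.0
    for d in (0, 1):
        for c in (0, 1):
            p_dc = np.mean((detected == d) & (labels == c))
            p_d, p_c = np.mean(detected == d), np.mean(labels == c)
            if p_dc > 0:
                mi += p_dc * np.log2(p_dc / (p_d * p_c))
    return mi  # in bits; higher merit = more informative fragment
```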

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 19

Intermediate complexity is best (a trivial result, really).

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 20

Recognition examples:

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 21

Gabor Wavelets

[Figure: a Gabor wavelet shown in image space and in frequency space]

• in frequency space, a Gabor wavelet is a Gaussian
• “wavelet”: the different wavelets are scaled/rotated versions of a mother wavelet

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 22

Gabor Wavelets as Filters

Gabor filters: a sin() part and a cos() part (together, one complex-valued filter)

Compute the correlation of the image with the filter at every location x0.

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 23

Tiling of Frequency Space: Jets

[Figure: measured frequency tuning of biological neurons (left) and a dense coverage of frequency space by a family of Gabor filters (right)]

Applying different Gabor filters (with different k) to the same image location gives a vector of filter responses: a “jet” (see the sketch below).
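A sketch of computing a jet, assuming a grayscale image and a location far enough from the border; the filter constants and scale spacing are illustrative, not the exact values used in the slides:

```python
import numpy as np

def gabor_kernel(k, theta, sigma=2 * np.pi, size=33):
    # Complex Gabor: Gaussian envelope times a DC-free complex carrier.
    ax = np.arange(size) - size // 2
    x, y = np.meshgrid(ax, ax)
    kx, ky = k * np.cos(theta), k * np.sin(theta)
    envelope = (k**2 / sigma**2) * np.exp(-k**2 * (x**2 + y**2) / (2 * sigma**2))
    return envelope * (np.exp(1j * (kx * x + ky * y)) - np.exp(-sigma**2 / 2))

def jet(image, row, col, n_scales=5, n_orientations=8, size=33):
    # Correlate the local patch with each filter of the family; the resulting
    # complex vector of responses is the "jet" at (row, col).
    half = size // 2
    patch = image[row - half:row + half + 1, col - half:col + half + 1].astype(float)
    responses = []
    for s in range(n_scales):
        k = (np.pi / 2) * 2.0 ** (-s)  # assumed scale spacing
        for o in range(n_orientations):
            g = gabor_kernel(k, np.pi * o / n_orientations, size=size)
            responses.append(np.sum(patch * np.conj(g)))
    return np.array(responses)
```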

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 24

SIFT Features

• step 1: find scale-space extrema

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 25

• step 2: apply contrast and curvature requirements

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 26

• step 3: the local image descriptor extracted at each keypoint is a 128-dimensional vector (see the sketch below)
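A hedged sketch using OpenCV's SIFT implementation (cv2.SIFT_create, available in opencv-python >= 4.4), which performs the three steps above internally; the file name is only an example:

```python
import cv2

# Load a grayscale image (example path; any image will do).
img = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
# descriptors has shape (n_keypoints, 128): one 128-dim vector per keypoint.
```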

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 27

Learning and Recognition

• top-down model matching
  • elastic graph matching
• bottom-up indexing
  • with or without shared features

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 28

Elastic Graph Matching (EGM)

“View based”: different graphs are needed for different views.

Representation: graph nodes are labelled with jets (Gabor filter responses at different scales/orientations).

Matching: minimize a cost function that punishes dissimilarities of the Gabor responses and distortions of the graph, using stochastic optimization techniques (see the sketch below).
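A sketch of such a cost function: summed jet similarity over nodes minus a penalty on edge distortions; the similarity measure and the weighting factor are common choices, not taken verbatim from the slides:

```python
import numpy as np

def jet_similarity(j1, j2):
    # Normalized dot product of jet magnitudes (a common choice).
    a1, a2 = np.abs(j1), np.abs(j2)
    return float(a1 @ a2 / (np.linalg.norm(a1) * np.linalg.norm(a2) + 1e-12))

def matching_cost(image_jets, model_jets, image_edges, model_edges, lam=0.1):
    # image_jets/model_jets: lists of jets per node, in corresponding order;
    # *_edges: dict mapping an edge to the 2-D displacement between its nodes.
    similarity = sum(jet_similarity(ji, jm)
                     for ji, jm in zip(image_jets, model_jets))
    distortion = sum(np.sum((image_edges[e] - model_edges[e]) ** 2)
                     for e in model_edges)
    return -similarity + lam * distortion  # minimized during matching
```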

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 29

Bunch Graphs

Idea: add invariance by labelling each graph node with a collection or “bunch” of different feature exemplars (Wiskott et al., 1995, 1997).

Advantage: finding the facial features can be decoupled from identification.

Matching uses a MAX rule (see the sketch below).
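A sketch of the MAX rule, reusing jet_similarity from the EGM sketch above: each node scores the image jet against every exemplar in its bunch and keeps the best match:

```python
def bunch_node_similarity(image_jet, bunch_of_jets):
    # Best-matching exemplar wins; uses jet_similarity from the EGM sketch.
    return max(jet_similarity(image_jet, exemplar) for exemplar in bunch_of_jets)
```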

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 30

Indexing Methods

• when you want to recognize very many objects, it’s inefficient to individually check for each model by searching for all of its features in a top-down fashion

• better: indexing methods
• also: share features among object models

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 31

Recognition with SIFT Features

• recognition: extract SIFT features; match each to its nearest neighbor in a database of stored features; use a Hough transform to pool the votes (see the sketch below)
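A sketch of the nearest-neighbor matching step, assuming descriptor arrays such as those produced by the SIFT sketch above; the Hough-transform vote pooling over position, scale, and orientation is omitted here:

```python
import numpy as np

def match_features(query_desc, db_desc):
    # For each query descriptor, find its nearest neighbor in the database
    # by Euclidean distance in the 128-dim descriptor space.
    matches = []
    for i, q in enumerate(query_desc):
        d = np.linalg.norm(db_desc - q, axis=1)
        matches.append((i, int(np.argmin(d)), float(d.min())))
    return matches  # (query index, database index, distance) triples
```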

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 32

Recognition with Gabor Jets and Color Features

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 33

Scaling Behavior when Sharing Features between Models

• recognition speed is limited more by the number of features than by the number of object models; a modest number of features is o.k.
• can incorporate many feature types
• can incorporate stereo (reasoning about occlusions)

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 34

Hierarchies of Features

Long history of using hierarchies: Fukushima's Neocognitron (1983), Nelson & Selinger (1998, 1999).

Advantages of using a hierarchy:
• faster learning and processing
• better grip on correlated deformations
• easier to find the proper specificity vs. invariance tradeoff?

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 35

Feature Learning

• unsupervised clustering: not necessarily optimal for discrimination
• use a big bag of features and fish out the useful ones (e.g., via boosting: Viola, 1997): training takes very long, since every feature in that big bag has to be considered
• note: the usefulness of one feature depends on which other ones are already being used
• learn higher-level features as (nonlinear) combinations of lower-level features (Perona et al., 2000): also takes very long to train, with only up to 5 features so far; but a locality constraint could be used

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 36

Feedback

Question: why all the feedback connections in the brain? Are they important for on-line processing?

Neuroscience: object recognition in 150 ms (Thorpe et al., 1996), but interesting temporal response properties of IT neurons (Oram & Richmond, 1999); some V1 neurons “restore” a line behind an occluder.

Idea: a feed-forward architecture cannot later correct errors made at early stages; a feedback architecture can!

“High level hypotheses try to reinforce their lower level evidence while hypotheses compete at all levels.”

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 37

Recognition & Segmentation

• Basic Idea: integrate recognition with segmentation in a feedback architecture:

• object hypotheses reinforce their supporting evidence and inhibit competing evidence, suppressing features that do not belong to them (idea goes back to at least the PDP books)

• at the same time: restore missing features due to partial occlusion (associative memory property)

Jochen Triesch, UC San Diego, http://cogsci.ucsd.edu/~triesch 38

Current work in this area

• mostly demonstrating how recognition can aid segmentation

• what is missing is a clear and elegant demonstration of a truly integrated system that shows how the two kinds of processing help each other

• maybe don't treat them as two kinds of processing at all, but as one inference problem

• how best to do this? the “million-dollar question”

