
Post on 07-Aug-2020


Lecture 14, Fei-Fei Li

Lecture 14: Introduction to Object Recognition & Bag‐of‐Words (BoW) Models

Professor Fei-Fei Li, Stanford Vision Lab

8-Nov-11

What will we learn today?

• Introduction to object recognition
  – Representation
  – Learning
  – Recognition
• Bag-of-Words models (Problem Set 4, Q2)
  – Basic representation
  – Different learning and recognition algorithms

What are the different visual recognition tasks?

Classification: Does this image contain a building? [yes/no]

Yes!

Classification: Is this a beach?

Image Search

Organizing photo collections

Detection: Does this image contain a car? [where?]

(Figure label: car)

Detection: Which objects does this image contain? [where?]

(Figure labels: building, clock, person, car)

Detection: Accurate localization (segmentation)

(Figure label: clock)

Detection: Estimating object semantic & geometric attributes

Object: Person, back; 1-2 meters away

Object: Police car, side view, 4-5 m away

Object: Building, 45º pose, 8-10 meters away; it has bricks

Applications of computer vision

• Surveillance
• Assistive technologies
• Security
• Assistive driving
• Computational photography

Categorization vs. single-instance recognition

Does this image contain the Chicago Macy's building?

Categorization vs. single-instance recognition

Where is the crunchy nut?

Applications of computer vision

• Recognizing landmarks on mobile platforms (+ GPS)

Activity or event recognition

What are these people doing?

Visual Recognition

• Design algorithms that are capable of:
  – Classifying images or videos
  – Detecting and localizing objects
  – Estimating semantic and geometrical attributes
  – Classifying human activities and events

Why is this challenging?

How many object categories are there?

Challenges: viewpoint variation

Michelangelo 1475-1564

Challenges: illumination

image credit: J. Koenderink

Challenges: scale

Challenges: deformation

Challenges: occlusion

Magritte, 1957

Challenges: background clutter

Kilmeny Niland, 1995

Challenges: intra-class variation

Some early works on object categorization

• Turk and Pentland, 1991
• Belhumeur, Hespanha, & Kriegman, 1997
• Schneiderman & Kanade, 2004
• Viola and Jones, 2000
• Amit and Geman, 1999
• LeCun et al., 1998
• Belongie and Malik, 2002
• Agarwal and Roth, 2002
• Poggio et al., 1993

Basic issues

• Representation
  – How to represent an object category; which classification scheme?
• Learning
  – How to learn the classifier, given training data
• Recognition
  – How the classifier is to be used on novel data

Representation: building blocks, sampling strategies

(Figure panels: interest operators; multiple interest operators; dense, uniformly; randomly)

Image credits: L. Fei-Fei, E. Nowak, J. Sivic

Representation
– Appearance only or location and appearance

Representation
– Invariances
  • Viewpoint
  • Illumination
  • Occlusion
  • Scale
  • Deformation
  • Clutter
  • etc.

Representation
– To handle intra-class variability, it is convenient to describe object categories using probabilistic models
– Object models: generative vs. discriminative vs. hybrid

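Object categorization: the statistical viewpoint

$p(\text{zebra} \mid \text{image})$ vs. $p(\text{no zebra} \mid \text{image})$

• Bayes rule:

$$\frac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})} = \frac{p(\text{image} \mid \text{zebra})}{p(\text{image} \mid \text{no zebra})} \cdot \frac{p(\text{zebra})}{p(\text{no zebra})}$$

(posterior ratio = likelihood ratio × prior ratio)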

• Discriminative methods model the posterior
• Generative methods model the likelihood and the prior
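The Bayes-rule identity can be checked with a few numbers. The sketch below uses hypothetical probabilities, which are illustrative assumptions and not values from the lecture. It multiplies the likelihood ratio by the prior ratio and confirms that the result equals the posterior ratio computed directly. Exact fractions are used so that the equality check is not affected by floating-point error.

```python
from fractions import Fraction


def posterior_ratio(lik_zebra, lik_no_zebra, prior_zebra):
    """p(zebra|image) / p(no zebra|image) = likelihood ratio * prior ratio.

    lik_zebra    -- p(image | zebra)
    lik_no_zebra -- p(image | no zebra)
    prior_zebra  -- p(zebra); p(no zebra) is taken as 1 - p(zebra)
    """
    likelihood_ratio = lik_zebra / lik_no_zebra
    prior_ratio = prior_zebra / (1 - prior_zebra)
    return likelihood_ratio * prior_ratio


# Hypothetical values: the image is 8x more likely under "zebra",
# but only 1 image in 5 contains a zebra.
lik_z, lik_nz, prior_z = Fraction(8, 100), Fraction(1, 100), Fraction(1, 5)
ratio = posterior_ratio(lik_z, lik_nz, prior_z)

# Cross-check against the posteriors computed directly with Bayes rule.
evidence = lik_z * prior_z + lik_nz * (1 - prior_z)
post_z = lik_z * prior_z / evidence
post_nz = lik_nz * (1 - prior_z) / evidence
assert ratio == post_z / post_nz

print(ratio)  # 2: the posterior ratio exceeds 1, so decide "zebra"
```

Even with a likelihood ratio of 8, the low prior pulls the posterior ratio down to 2. This is the interplay that generative methods model explicitly.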

Discriminative models

• Modeling the posterior ratio:

$$\frac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})}$$

(Figure: zebra and non-zebra examples separated by a decision boundary)

Discriminative models

• Support Vector Machines: Guyon, Vapnik, Heisele, Serre, Poggio, …
• Boosting: Viola, Jones 2001; Torralba et al. 2004; Opelt et al. 2006; …
• Nearest neighbor: Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005; …
• Neural networks: LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998; …
• Latent SVM, Structural SVM: Felzenszwalb 00; Ramanan 03; …

(Figure annotation: 10⁶ examples)

Source: Vittorio Ferrari, Kristen Grauman, Antonio Torralba

Generative models

• Modeling the likelihood ratio:

$$\frac{p(\text{image} \mid \text{zebra})}{p(\text{image} \mid \text{no zebra})}$$

(Figure: class densities $p(x \mid C_1)$ and $p(x \mid C_2)$ plotted against $x$)

Generative models

(Figure: class densities $p(x \mid C_1)$ and $p(x \mid C_2)$; for example images, $p(\text{image} \mid \text{zebra})$ and $p(\text{image} \mid \text{no zebra})$ are marked High/Low and Low/High)

Generative models

• Naïve Bayes classifier
  – Csurka, Bray, Dance & Fan, 2004
• Hierarchical Bayesian topic models (e.g., pLSA and LDA)
  – Object categorization: Sivic et al. 2005, Sudderth et al. 2005
  – Natural scene categorization: Fei-Fei et al. 2005
• 2D part-based models
  – Constellation models: Weber et al. 2000; Fergus et al. 200
  – Star models: ISM (Leibe et al. 05)
• 3D part-based models
  – Multi-aspects: Sun et al., 2009

Basic issues

• Representation
  – How to represent an object category; which classification scheme?
• Learning
  – How to learn the classifier, given training data
• Recognition
  – How the classifier is to be used on novel data

Learning

• Learning parameters: What are you maximizing? Likelihood (generative) or performance on the training/validation set (discriminative)
• Level of supervision
  – Manual segmentation; bounding box; image labels; noisy labels
• Batch/incremental
• Training images
  – Issue of overfitting
  – Negative images for discriminative methods
• Priors

Basic issues

• Representation
  – How to represent an object category; which classification scheme?
• Learning
  – How to learn the classifier, given training data
• Recognition
  – How the classifier is to be used on novel data

Recognition
– Recognition task: classification, detection, etc.

Recognition
– Recognition task
– Search strategy: sliding windows (Viola, Jones 2001)
  • Simple
  • Computational complexity (x, y, S, , N of classes)
    ‐ BSW by Lampert et al. 08
    ‐ Also, Alexe et al. 10
  • Localization
    ‐ Objects are not boxes
  • Prone to false positives
    ‐ Non-max suppression: Canny '86 …. Desai et al., 2009
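A sliding-window detector typically fires on many overlapping windows around a single object, which is one source of the false positives listed above. A common remedy is greedy non-maximum suppression. The lecture does not say which NMS variant it has in mind, so the sketch below shows one standard greedy, IoU-based version. The boxes, scores, and 0.5 threshold are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring window, discard every remaining
    window that overlaps it by more than iou_threshold, and repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Three overlapping window hits on one car, plus one hit elsewhere.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (11, 9, 49, 51), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7, 0.6]
print(non_max_suppression(boxes, scores))  # [0, 3]
```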

Recognition
– Recognition task
– Search strategy
– Attributes
  • Category: car; azimuth = 225º; zenith = 30º (Savarese, 2007; Sun et al. 2009; Liebelt et al. '08, '10; Farhadi et al. 09)
  • It has metal; it is glossy; it has wheels (Farhadi et al. 09; Lampert et al. 09; Wang & Forsyth 09)

Recognition
– Recognition task
– Search strategy
– Attributes
– Context
  • Semantic: Torralba et al. 03; Rabinovich et al. 07; Gupta & Davis 08; Heitz & Koller 08; L.-J. Li et al. 08; Yao & Fei-Fei 10
  • Geometric: Hoiem et al. 06; Gould et al. 09; Bao, Sun, Savarese 10

Basic issues

• Representation
  – How to represent an object category; which classification scheme?
• Learning
  – How to learn the classifier, given training data
• Recognition
  – How the classifier is to be used on novel data

Part 1: Bag-of-words models

This segment is based on the tutorial “Recognizing and Learning Object Categories: Year 2007”, by Prof. L. Fei-Fei, A. Torralba, and R. Fergus

Related works

• Early “bag of words” models: mostly texture recognition
  – Cula & Dana, 2001; Leung & Malik, 2001; Mori, Belongie & Malik, 2001; Schmid, 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
• Hierarchical Bayesian models for documents (pLSA, LDA, etc.)
  – Hofmann, 1999; Blei, Ng & Jordan, 2004; Teh, Jordan, Beal & Blei, 2004
• Object categorization
  – Csurka, Bray, Dance & Fan, 2004; Sivic, Russell, Efros, Freeman & Zisserman, 2005; Sudderth, Torralba, Freeman & Willsky, 2005
• Natural scene categorization
  – Vogel & Schiele, 2004; Fei-Fei & Perona, 2005; Bosch, Zisserman & Munoz, 2006

(Figure: an object and its bag of ‘words’)

Analogy to documents

Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.

Bag of words: sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel

China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with a 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.

Bag of words: China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value

Definition of “BoW”
– Independent features

(Figure labels: face, bike, violin)

Definition of “BoW”
– Independent features
– Histogram representation

(Figure: codewords dictionary)

(Figure: overview of the BoW pipeline)
• Representation: feature detection & representation; codewords dictionary; image representation
• Learning: category models (and/or) classifiers
• Recognition: category decision

1. Feature detection and representation

• Regular grid
  – Vogel & Schiele, 2003
  – Fei-Fei & Perona, 2005
• Interest point detector
  – Csurka, Bray, Dance & Fan, 2004
  – Fei-Fei & Perona, 2005
  – Sivic, Russell, Efros, Freeman & Zisserman, 2005
• Other methods
  – Random sampling (Vidal-Naquet & Ullman, 2002)
  – Segmentation-based patches (Barnard, Duygulu, Forsyth, de Freitas, Blei, Jordan, 2003)
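Regular-grid and random sampling are simple enough to sketch directly. In the sketch below, the patch size, stride, and image dimensions are hypothetical parameters, and the functions return only patch locations. A descriptor such as SIFT would then be computed on each patch, which is not shown.

```python
import random


def dense_grid_patches(width, height, patch_size, stride):
    """Top-left corners (x, y) of square patches on a regular grid.

    Only patches lying entirely inside the width x height image are kept.
    """
    return [(x, y)
            for y in range(0, height - patch_size + 1, stride)
            for x in range(0, width - patch_size + 1, stride)]


def random_patches(width, height, patch_size, count, seed=0):
    """Top-left corners of `count` patches drawn uniformly at random."""
    rng = random.Random(seed)
    return [(rng.randrange(0, width - patch_size + 1),
             rng.randrange(0, height - patch_size + 1))
            for _ in range(count)]


# Hypothetical settings: a 64x48 image, 16x16 patches, stride 8.
grid = dense_grid_patches(64, 48, patch_size=16, stride=8)
print(len(grid))  # 35 (7 columns x 5 rows)
print(grid[:3])   # [(0, 0), (8, 0), (16, 0)]
```

A stride smaller than the patch size makes neighboring patches overlap. This is one common choice for dense sampling, but it is a design decision, not a requirement.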

1. Feature detection and representation

• Detect patches [Mikolajczyk and Schmid '02] [Matas, Chum, Urban & Pajdla '02] [Sivic & Zisserman '03]
• Normalize patch
• Compute SIFT descriptor [Lowe '99]

Slide credit: Josef Sivic

2. Codewords dictionary formation

• Clustering/vector quantization
• Cluster center = code word

(Figure from Fei-Fei et al. 2005)
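The slide does not name a specific clustering algorithm for vector quantization. A common choice is k-means, and the sketch below assumes plain Lloyd's k-means. The 2-D toy points stand in for real descriptors, and k and the iteration count are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(descriptors, k, iterations=20, seed=0):
    """Lloyd's k-means; the returned cluster centers are the codewords."""
    rng = random.Random(seed)
    centers = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iterations):
        # Assignment step: each descriptor joins its closest center.
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            j = min(range(k), key=lambda c: squared_distance(d, centers[c]))
            clusters[j].append(d)
        # Update step: each center moves to the mean of its members.
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


# Toy 2-D "descriptors" (real SIFT descriptors are 128-D) in two groups.
descriptors = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
               (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
codebook = kmeans(descriptors, k=2)
print(sorted(codebook))  # approximately [[0.1, 0.1], [5.033, 5.0]]
```

The vocabulary size k is exactly the trade-off discussed under "Visual vocabularies: Issues". The linear scan in the assignment step is what vocabulary trees are meant to speed up.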

Image patch examples of codewords

Sivic et al. 2005

Visual vocabularies: Issues

• How to choose vocabulary size?
  – Too small: visual words not representative of all patches
  – Too large: quantization artifacts, overfitting
• Computational efficiency
  – Vocabulary trees (Nister & Stewenius, 2006)

3. Bag-of-words representation

• Codewords dictionary
• Nearest-neighbor assignment
• K-D tree search strategy

(Figure: histogram of codeword frequencies over the codewords dictionary)
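The image representation step can be sketched as nearest-codeword assignment followed by counting. The sketch assumes the codebook is a list of cluster centers, such as k-means output, and it uses a brute-force scan instead of the k-D tree search mentioned on the slide. The codebook and descriptors are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bow_histogram(descriptors, codebook, normalize=True):
    """Assign every descriptor to its nearest codeword and count how often
    each codeword occurs; optionally normalize the counts to sum to 1."""
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda j: squared_distance(d, codebook[j]))
        counts[nearest] += 1
    if normalize and descriptors:
        return [c / len(descriptors) for c in counts]
    return counts


codebook = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
image_descriptors = [(0.1, 0.2), (4.8, 5.1), (5.2, 4.7), (0.3, -0.1)]
print(bow_histogram(image_descriptors, codebook, normalize=False))  # [2, 2, 0]
print(bow_histogram(image_descriptors, codebook))                   # [0.5, 0.5, 0.0]
```

Normalizing makes histograms of images with different numbers of patches comparable. Whether to normalize is a modeling choice.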

Representation
1. Feature detection & representation
2. Codewords dictionary
3. Image representation

Learning and Recognition

(Figure: codewords dictionary; category models (and/or) classifiers; category decision)

Learning and Recognition

1. Discriminative method:
   – NN
   – SVM
2. Generative method:
   – graphical models

(Figure: category models (and/or) classifiers)

Discriminative classifiers

(Figure: category models for Class 1 … Class N in model space)

Discriminative classifiers

(Figure: a query image in model space; winning class: pink)

Nearest Neighbors classifier

• Assign the label of the nearest training data point to each test data point

(Figure: a query image in model space; winning class: pink)

K-Nearest Neighbors classifier

• For a new point, find the k closest points from the training data
• Labels of the k points “vote” to classify
• Works well provided there is lots of data and the distance function is good

(Figure: a query image in model space; winning class: pink)
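The k-NN voting rule can be sketched on BoW histograms. In the sketch, the L1 distance and the tie-break toward the closer neighbor are assumptions; the slide leaves both open. Setting k=1 gives the plain nearest-neighbor classifier from the previous slide. The training histograms and labels are hypothetical.

```python
from collections import Counter


def l1_distance(h1, h2):
    return sum(abs(a - b) for a, b in zip(h1, h2))


def knn_classify(query, train_hists, train_labels, k=3, distance=l1_distance):
    """Majority vote among the k training histograms closest to the query.

    Ties are broken in favor of the label of the closer neighbor.
    """
    ranked = sorted(range(len(train_hists)),
                    key=lambda i: distance(query, train_hists[i]))
    votes = Counter(train_labels[i] for i in ranked[:k])
    best = max(votes.values())
    for i in ranked[:k]:  # walk neighbors from closest to farthest
        if votes[train_labels[i]] == best:
            return train_labels[i]


train = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.8, 0.1],
         [0.2, 0.7, 0.1], [0.1, 0.1, 0.8]]
labels = ["face", "face", "bike", "bike", "violin"]
print(knn_classify([0.6, 0.3, 0.1], train, labels, k=3))  # face
print(knn_classify([0.1, 0.2, 0.7], train, labels, k=1))  # violin
```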

K-Nearest Neighbors classifier

• Voronoi partitioning of feature space for 2-category 2-D and 3-D data (from Duda et al.)
• For k dimensions: k-D tree = space-partitioning data structure for organizing points in a k-dimensional space
• Enables efficient search
• Nice tutorial: http://www.cs.umd.edu/class/spring2002/cmsc420-0401/pbasic.pdf
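The k-D tree idea can be sketched in a few lines. The sketch uses median splits that cycle through the axes, plus exact nearest-neighbor search that skips the far side of a split when the splitting plane is farther away than the best match so far. It is an illustrative sketch, not a tuned implementation, and the dict-based node layout is a hypothetical choice.

```python
def build_kdtree(points, depth=0):
    """Split at the median along axis depth % dim, recursively."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build_kdtree(points[:mid], depth + 1),
            "right": build_kdtree(points[mid + 1:], depth + 1)}


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node, query, best=None):
    """Exact nearest neighbor of `query` with branch pruning."""
    if node is None:
        return best
    point, axis = node["point"], node["axis"]
    if best is None or sq_dist(query, point) < sq_dist(query, best):
        best = point
    diff = query[axis] - point[axis]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, query, best)
    # The far side can only help if the splitting plane is closer than best.
    if diff ** 2 < sq_dist(query, best):
        best = nearest(far, query, best)
    return best


tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(nearest(tree, (9, 2)))  # (8, 1)
```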

Functions for comparing histograms

• L1 distance:

$$D(h_1, h_2) = \sum_{i=1}^{N} \lvert h_1(i) - h_2(i) \rvert$$

• χ² distance:

$$D(h_1, h_2) = \sum_{i=1}^{N} \frac{\left(h_1(i) - h_2(i)\right)^2}{h_1(i) + h_2(i)}$$

• Quadratic distance (cross-bin):

$$D(h_1, h_2) = \sum_{i,j} A_{ij} \left(h_1(i) - h_2(j)\right)^2$$

Jan Puzicha, Yossi Rubner, Carlo Tomasi, Joachim M. Buhmann: Empirical Evaluation of Dissimilarity Measures for Color and Texture. ICCV 1999
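The three histogram distances translate directly into code. In the χ² sketch, bins that are empty in both histograms are skipped to avoid dividing by zero, which is an implementation assumption. The cross-bin version follows the formula as written, with A a hypothetical bin-similarity matrix. With A equal to the identity it reduces to the squared Euclidean distance.

```python
def l1_distance(h1, h2):
    return sum(abs(a - b) for a, b in zip(h1, h2))


def chi2_distance(h1, h2):
    # Bins empty in both histograms contribute nothing (avoids 0/0).
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def quadratic_distance(h1, h2, A):
    # Cross-bin form: sum over all (i, j) of A[i][j] * (h1(i) - h2(j))^2.
    n = len(h1)
    return sum(A[i][j] * (h1[i] - h2[j]) ** 2 for i in range(n) for j in range(n))


h1, h2 = [0.5, 0.5, 0.0], [0.25, 0.25, 0.5]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(l1_distance(h1, h2))                   # 1.0
print(chi2_distance(h1, h2))                 # about 0.667
print(quadratic_distance(h1, h2, identity))  # 0.375
```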

Learning and Recognition

1. Discriminative method:
   – NN
   – SVM
2. Generative method:
   – graphical models

Discriminative classifiers (linear classifier)

(Figure: category models for Class 1 … Class N in model space)

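Support vector machines

• Find the hyperplane that maximizes the margin between the positive and negative examples

Distance between point and hyperplane: $\dfrac{\lvert \mathbf{x}_i \cdot \mathbf{w} + b \rvert}{\lVert \mathbf{w} \rVert}$

Support vectors: $\mathbf{x}_i \cdot \mathbf{w} + b = \pm 1$

Margin $= 2 / \lVert \mathbf{w} \rVert$

Solution: $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$

Classification function (decision boundary): $\mathbf{w} \cdot \mathbf{x} + b = \sum_i \alpha_i y_i \, \mathbf{x}_i \cdot \mathbf{x} + b$

(Figure: margin and support vectors)

Slide credit: S. Lazebnik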

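Support vector machines
• Classification:

$$\mathbf{w} \cdot \mathbf{x} + b = \sum_i \alpha_i y_i \, \mathbf{x}_i \cdot \mathbf{x} + b$$

Test point $\mathbf{x}$: if $\mathbf{w} \cdot \mathbf{x} + b \geq 0$, class 1; if $\mathbf{w} \cdot \mathbf{x} + b < 0$, class 2

(Figure: margin and a test point)

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998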
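The dual-form decision function can be evaluated directly. The sketch assumes the multipliers α_i and the offset b are already known. They come from a toy problem solved by hand (one positive point at (1, 1) and one negative point at (−1, −1)), not from an SVM solver. The sketch checks that the support vectors land on w·x + b = ±1 and that the margin equals 2/‖w‖.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def decision_value(x, support_vectors, alphas, labels, b):
    """w . x + b in dual form: sum_i alpha_i y_i (x_i . x) + b."""
    return sum(a * y * dot(sv, x)
               for sv, a, y in zip(support_vectors, alphas, labels)) + b


def classify(x, support_vectors, alphas, labels, b):
    """Class 1 if w . x + b >= 0, class 2 otherwise."""
    return 1 if decision_value(x, support_vectors, alphas, labels, b) >= 0 else 2


def primal_weights(support_vectors, alphas, labels):
    """w = sum_i alpha_i y_i x_i."""
    dim = len(support_vectors[0])
    return [sum(a * y * sv[t] for sv, a, y in zip(support_vectors, alphas, labels))
            for t in range(dim)]


# Hand-solved toy problem: alphas satisfy sum_i alpha_i y_i = 0.
svs, alphas, labels, b = [(1.0, 1.0), (-1.0, -1.0)], [0.25, 0.25], [1, -1], 0.0
w = primal_weights(svs, alphas, labels)
print(w)                                               # [0.5, 0.5]
print(decision_value(svs[0], svs, alphas, labels, b))  # 1.0 (on the margin)
print(2 / math.sqrt(dot(w, w)))                        # margin, about 2.83
print(classify((2.0, 0.5), svs, alphas, labels, b))    # 1
print(classify((-0.3, -1.0), svs, alphas, labels, b))  # 2
```

Only the support vectors enter the sum. Training points with α_i = 0 drop out of the classifier entirely.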

Nonlinear SVMs

• Datasets that are linearly separable work out great
• But what if the dataset is just too hard?
• We can map it to a higher-dimensional space

(Figure: 1-D datasets along the x axis; the hard case is mapped to the (x, x²) plane)

Slide credit: Andrew Moore

Nonlinear SVMs

• General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable

Φ: x → φ(x) (lifting transformation)

Slide credit: Andrew Moore

Nonlinear SVMs

• Nonlinear decision boundary in the original feature space:

$$\sum_i \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b$$

• The kernel K is the dot product of the lifting transformation φ(x):

$$K(\mathbf{x}_i, \mathbf{x}_j) = \varphi(\mathbf{x}_i) \cdot \varphi(\mathbf{x}_j)$$

NOTE:
• It is not required to compute φ(x) explicitly
• The kernel must satisfy the “Mercer inequality”

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
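The kernel trick and the lifting idea can both be checked numerically. The sketch picks the homogeneous degree-2 polynomial kernel K(x, z) = (x·z)² on 2-D inputs as an example, which is an assumption since the slide names no specific kernel. For that kernel, an explicit φ with φ(x)·φ(z) = K(x, z) is easy to write down. The second part reproduces the 1-D case: negatives lying between positives become linearly separable after lifting x → (x, x²). The separating line used there is hand-picked and hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel K(x, z) = (x . z)^2 on R^2."""
    return dot(x, z) ** 2


def lift(x):
    """Explicit phi: R^2 -> R^3 with phi(x) . phi(z) = (x . z)^2."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


x, z = (1.0, 2.0), (3.0, -1.0)
print(poly2_kernel(x, z))     # 1.0, computed without forming phi
print(dot(lift(x), lift(z)))  # also 1.0, up to floating-point rounding

# 1-D data that no single threshold separates: negatives sit between positives.
negatives, positives = [-1.0, 0.0, 1.0], [-3.0, 3.0]
# After lifting x -> (x, x^2), the hand-picked line x^2 = 4 separates them.
w, b = (0.0, 1.0), -4.0
scores = {v: dot(w, (v, v * v)) + b for v in negatives + positives}
print(scores)  # negatives score below 0, positives above 0
```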

Kernels for bags of features

• Histogram intersection kernel:

$$I(h_1, h_2) = \sum_{i=1}^{N} \min\left(h_1(i), h_2(i)\right)$$

• Generalized Gaussian kernel:

$$K(h_1, h_2) = \exp\left(-\frac{1}{A} D(h_1, h_2)^2\right)$$

• D can be Euclidean distance, χ² distance, etc.

J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid, Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study, IJCV 2007
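Both bag-of-features kernels are short functions. The sketch follows the generalized Gaussian formula as written above, with Euclidean distance as the default D. The scale parameter A is a free, hypothetical value, and in practice it would be tuned.

```python
import math


def histogram_intersection(h1, h2):
    """I(h1, h2) = sum_i min(h1(i), h2(i))."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def euclidean_distance(h1, h2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


def generalized_gaussian_kernel(h1, h2, A, distance=euclidean_distance):
    """K(h1, h2) = exp(-D(h1, h2)^2 / A); A > 0 is a scale parameter."""
    return math.exp(-distance(h1, h2) ** 2 / A)


h1, h2 = [0.5, 0.5, 0.0], [0.25, 0.25, 0.5]
print(histogram_intersection(h1, h2))              # 0.5
print(histogram_intersection(h1, h1))              # 1.0 for a normalized histogram
print(generalized_gaussian_kernel(h1, h1, A=1.0))  # 1.0 for identical histograms
print(generalized_gaussian_kernel(h1, h2, A=1.0))  # exp(-0.375), about 0.687
```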

Pyramid match kernel

• Fast approximation of Earth Mover's Distance
• Weighted sum of histogram intersections at multiple resolutions (linear in the number of features instead of cubic)

K. Grauman and T. Darrell. The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features, ICCV 2005.
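The core of the "weighted sum of histogram intersections" can be shown in a heavily simplified 1-D sketch. It is not Grauman and Darrell's full algorithm: it uses integer features, bin widths that double at each level, and a weight of 1/2^i on the matches that first appear at level i, and it omits the multidimensional binning and other details of the paper.

```python
def histogram(features, bin_width):
    """Counts of integer features grouped into bins of the given width."""
    counts = {}
    for f in features:
        key = f // bin_width
        counts[key] = counts.get(key, 0) + 1
    return counts


def intersection(c1, c2):
    return sum(min(v, c2.get(k, 0)) for k, v in c1.items())


def pyramid_match(x_feats, y_feats, levels):
    """Simplified 1-D pyramid match over bin widths 1, 2, 4, ..., 2**levels.

    Matches that first appear at level i (the increase in intersection
    over level i - 1) are weighted by 1 / 2**i, so matches found in
    finer bins count more.
    """
    score, previous = 0.0, 0
    for i in range(levels + 1):
        width = 2 ** i
        current = intersection(histogram(x_feats, width), histogram(y_feats, width))
        score += (current - previous) / width
        previous = current
    return score


print(pyramid_match([0, 4, 9], [1, 4, 15], levels=4))  # 1.625
```

In this example, one exact match at level 0 contributes 1. A new match at bin width 2 adds 1/2, and a last match at bin width 8 adds 1/8. Because each level needs only one histogram pass, the cost grows linearly with the number of features.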

Spatial Pyramid Matching

Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. S. Lazebnik, C. Schmid, and J. Ponce. CVPR 2006

What about multi-class SVMs?

• No “definitive” multi-class SVM formulation
• In practice, we have to obtain a multi-class SVM by combining multiple two-class SVMs
• One vs. others
  – Training: learn an SVM for each class vs. the others
  – Testing: apply each SVM to the test example and assign it the class of the SVM that returns the highest decision value
• One vs. one
  – Training: learn an SVM for each pair of classes
  – Testing: each learned SVM “votes” for a class to assign to the test example

Slide credit: S. Lazebnik
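The two combination schemes differ only in how binary decision values are turned into one label. In the sketch, the "trained SVMs" are hypothetical stand-ins: each class is summarized by a center point, and its score is the negative squared distance to that center. Real use would plug in actual SVM decision functions.

```python
from collections import Counter


def one_vs_rest(x, scorers):
    """scorers: {class: f_c}, each f_c trained as 'class c vs. the others'.
    Assign the class whose SVM returns the highest decision value."""
    return max(scorers, key=lambda c: scorers[c](x))


def one_vs_one(x, pair_scorers):
    """pair_scorers: {(a, b): f_ab}; f_ab(x) >= 0 votes for a, else for b."""
    votes = Counter()
    for (a, b), f in pair_scorers.items():
        votes[a if f(x) >= 0 else b] += 1
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for trained decision functions.
centers = {"face": (1.0, 0.0), "bike": (0.0, 1.0), "violin": (-1.0, -1.0)}


def neg_sq_dist(c):
    return lambda x: -sum((a - b) ** 2 for a, b in zip(x, centers[c]))


scorers = {c: neg_sq_dist(c) for c in centers}
pair_scorers = {(a, b): (lambda x, a=a, b=b: scorers[a](x) - scorers[b](x))
                for a in centers for b in centers if a < b}

print(one_vs_rest((0.9, 0.1), scorers))      # face
print(one_vs_one((0.9, 0.1), pair_scorers))  # face
```

One-vs-others trains N classifiers. One-vs-one trains N(N − 1)/2 classifiers, but each one sees only the data from two classes.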

SVMs: Pros and cons

• Pros
  – Many publicly available SVM packages: http://www.kernel-machines.org/software
  – Kernel-based framework is very powerful, flexible
  – SVMs work very well in practice, even with very small training sample sizes
• Cons
  – No “direct” multi-class SVM, must combine two-class SVMs
  – Computation, memory
    • During training time, must compute matrix of kernel values for every pair of examples
    • Learning can take a very long time for large-scale problems

Object recognition results

• ETH-80 database of 8 object classes (Eichhorn and Chapelle 2004)
• Features:
  – Harris detector
  – PCA-SIFT descriptor, d=10

Kernel | Complexity | Recognition rate
Match [Wallraven et al.] | | 84%
Bhattacharyya affinity [Kondor & Jebara] | | 85%
Pyramid match | | 84%

Slide credit: Kristen Grauman

Learning and Recognition

1. Discriminative method:
   – NN
   – SVM
2. Generative method:
   – graphical models

Model the probability distribution that produces a given bag of features

Generative models

1. Naïve Bayes classifier
   – Csurka, Bray, Dance & Fan, 2004
2. Hierarchical Bayesian text models (pLSA and LDA)
   – Background: Hofmann 2001; Blei, Ng & Jordan, 2004
   – Object categorization: Sivic et al. 2005, Sudderth et al. 2005
   – Natural scene categorization: Fei-Fei et al. 2005

Some notations

• $\mathbf{w}$: a collection of all N codewords in the image, $\mathbf{w} = [w_1, w_2, \ldots, w_N]$
• c: category of the image

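The Naïve Bayes model

$$p(c \mid \mathbf{w}) \propto p(c)\, p(\mathbf{w} \mid c)$$

• Posterior $p(c \mid \mathbf{w})$: probability that image I is of category c
• $p(c)$: prior probability of the object classes
• $p(\mathbf{w} \mid c)$: image likelihood given the class

(Figure: graphical model with class node c and a plate over the N codewords w)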

The Naïve Bayes model

$$c^* = \arg\max_c \, p(c \mid \mathbf{w}), \qquad p(c \mid \mathbf{w}) \propto p(c)\, p(\mathbf{w} \mid c) = p(c) \prod_{n=1}^{N} p(w_n \mid c)$$

• Object class decision: $c^*$
• $p(w_n \mid c)$: likelihood of the n-th visual word given the class, estimated by the empirical frequencies of codewords in images from that class

(Figure: graphical model)
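The Naïve Bayes decision rule fits in two functions: one estimates p(c) and p(w | c) from codeword counts, and the other picks the class with the largest posterior. Two details are added assumptions, not from the slides. The likelihood estimates use add-alpha (Laplace) smoothing, so a codeword never seen in a class does not zero out the whole product. The product is also computed as a sum of logs to avoid numerical underflow. The training counts are hypothetical.

```python
import math
from collections import Counter


def train_naive_bayes(histograms, labels, alpha=1.0):
    """Estimate p(c) and p(w | c) from codeword-count histograms.

    p(w | c) is the empirical codeword frequency in the images of class c,
    with add-alpha smoothing (an assumption, not from the slides).
    """
    vocab_size = len(histograms[0])
    class_counts = Counter(labels)
    priors = {c: n / len(labels) for c, n in class_counts.items()}
    likelihoods = {}
    for c in class_counts:
        totals = [0] * vocab_size
        for h, y in zip(histograms, labels):
            if y == c:
                totals = [t + v for t, v in zip(totals, h)]
        denom = sum(totals) + alpha * vocab_size
        likelihoods[c] = [(t + alpha) / denom for t in totals]
    return priors, likelihoods


def classify(histogram, priors, likelihoods):
    """c* = argmax_c log p(c) + sum_n log p(w_n | c).

    A count h[v] = m means codeword v occurs m times among the N words,
    so its factor p(v | c) enters the product m times.
    """
    def log_posterior(c):
        return math.log(priors[c]) + sum(
            m * math.log(p) for m, p in zip(histogram, likelihoods[c]))
    return max(priors, key=log_posterior)


# Toy training set with a 3-codeword vocabulary (hypothetical counts).
train = [[5, 1, 0], [4, 2, 0], [0, 1, 5], [1, 0, 4]]
labels = ["face", "face", "bike", "bike"]
priors, likelihoods = train_naive_bayes(train, labels)
print(classify([3, 1, 0], priors, likelihoods))  # face
print(classify([0, 0, 4], priors, likelihoods))  # bike
```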

Csurka et al. 2004

Other generative BoW models

• Hierarchical Bayesian topic models (e.g., pLSA and LDA)
  – Object categorization: Sivic et al. 2005, Sudderth et al. 2005
  – Natural scene categorization: Fei-Fei et al. 2005

Generative vs. discriminative

• Discriminative methods
  – Computationally efficient & fast
• Generative models
  – Convenient for weakly- or un-supervised, incremental training
  – Prior information
  – Flexibility in modeling parameters

Weaknesses of the BoW models

• No rigorous geometric information about the object components
• It's intuitive to most of us that objects are made of parts; BoW encodes no such information
• Not extensively tested yet for
  – Viewpoint invariance
  – Scale invariance
• Segmentation and localization unclear

What have we learned today?

• Introduction to object recognition
  – Representation
  – Learning
  – Recognition
• Bag-of-Words models (Problem Set 4, Q2)
  – Basic representation
  – Different learning and recognition algorithms