Lecture 14:
Introduction to Object Recognition
& Bag-of-Words (BoW) Models
Professor Fei-Fei Li
Stanford Vision Lab
What will we learn today?
• Introduction to object recognition
– Representation
– Learning
– Recognition
• Bag-of-words models (Problem Set 4, Q2)
– Basic representation
– Different learning and recognition algorithms
What are the different visual recognition tasks?
Classification: Does this image contain a building? [yes/no]
Yes!
Classification: Is this a beach?
Image Search
Organizing photo collections
Detection: Does this image contain a car? [Where?]
car
Detection: Which objects does this image contain? [Where?]
building, clock, person, car
Detection: Accurate localization (segmentation)
clock
Detection: Estimating object semantic & geometric attributes
Object: person, back view; 1-2 meters away
Object: police car, side view; 4-5 meters away
Object: building, 45° pose; 8-10 meters away; it has bricks
Applications of computer vision
• Surveillance
• Assistive technologies
• Security
• Assistive driving
• Computational photography
Categorization vs. single-instance recognition
Does this image contain the Chicago Macy's building?
Categorization vs. single-instance recognition
Where is the Crunchy Nut?
Applications of computer vision
• Recognizing landmarks on mobile platforms (+ GPS)
Activity or event recognition: What are these people doing?
Visual Recognition
• Design algorithms that can
– Classify images or videos
– Detect and localize objects
– Estimate semantic and geometric attributes
– Classify human activities and events
Why is this challenging?
How many object categories are there?
Challenges: viewpoint variation
Michelangelo 1475-1564
Challenges: illumination
image credit: J. Koenderink
Challenges: scale
Challenges: deformation
Challenges: occlusion
Magritte, 1957
Challenges: background clutter
Kilmeny Niland, 1995
Challenges: intra-class variation
Some early works on object categorization
• Turk and Pentland, 1991
• Belhumeur, Hespanha & Kriegman, 1997
• Schneiderman & Kanade, 2004
• Viola and Jones, 2000
• Amit and Geman, 1999
• LeCun et al., 1998
• Belongie and Malik, 2002
• Agarwal and Roth, 2002
• Poggio et al., 1993
Basic issues
• Representation
– How to represent an object category; which classification scheme to use?
• Learning
– How to learn the classifier, given training data
• Recognition
– How to use the classifier on novel data
Representation – building blocks: sampling strategies
• Randomly
• Multiple interest operators
• Interest operators
• Dense, uniform grid
Image credits: L. Fei-Fei, E. Nowak, J. Sivic
Representation – appearance only, or location and appearance?
Representation – invariances
• Viewpoint
• Illumination
• Occlusion
• Scale
• Deformation
• Clutter
• etc.
Representation
– To handle intra-class variability, it is convenient to describe object categories using probabilistic models
– Object models: generative vs. discriminative vs. hybrid
Object categorization: the statistical viewpoint
p(\text{zebra} \mid \text{image}) \quad \text{vs.} \quad p(\text{no zebra} \mid \text{image})
• Bayes rule:
\frac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})} = \frac{p(\text{image} \mid \text{zebra})}{p(\text{image} \mid \text{no zebra})} \cdot \frac{p(\text{zebra})}{p(\text{no zebra})}
posterior ratio = likelihood ratio × prior ratio (a toy numeric example follows below)
• Discriminative methods model the posterior
• Generative methods model the likelihood and prior
Discriminative models
• Modeling the posterior ratio:
\frac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})}
[Figure: zebra and non-zebra training examples separated by a decision boundary]
Discriminative models
• Support vector machines: Guyon, Vapnik; Heisele, Serre, Poggio…
• Boosting: Viola, Jones 2001; Torralba et al. 2004; Opelt et al. 2006…
• Nearest neighbor (≈10^6 examples): Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005…
• Neural networks: LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998…
• Latent SVM, structural SVM: Felzenszwalb '00; Ramanan '03…
Source: Vittorio Ferrari, Kristen Grauman, Antonio Torralba
Generative models
• Modeling the likelihood ratio:
\frac{p(\text{image} \mid \text{zebra})}{p(\text{image} \mid \text{no zebra})}
[Figure: class-conditional densities p(x \mid C_1) and p(x \mid C_2) over a feature x; where p(image | zebra) is high and p(image | no zebra) is low, the ratio favors zebra, and vice versa]
Generative models
• Naïve Bayes classifier: Csurka, Bray, Dance & Fan, 2004
• Hierarchical Bayesian topic models (e.g. pLSA and LDA)
– Object categorization: Sivic et al. 2005; Sudderth et al. 2005
– Natural scene categorization: Fei-Fei et al. 2005
• 2D part-based models
– Constellation models: Weber et al. 2000; Fergus et al. 2003
– Star models: ISM (Leibe et al. 2005)
• 3D part-based models
– Multi-aspect: Sun et al. 2009
Basic issues
• Representation
– How to represent an object category; which classification scheme to use?
• Learning
– How to learn the classifier, given training data
• Recognition
– How to use the classifier on novel data
Learning
• Learning parameters: what are you maximizing? Likelihood (generative) or performance on a train/validation set (discriminative)
• Level of supervision: manual segmentation; bounding boxes; image labels; noisy labels
• Batch vs. incremental learning
• Priors
• Training images:
– Issue of overfitting
– Negative images for discriminative methods
Basic issues
• Representation
– How to represent an object category; which classification scheme to use?
• Learning
– How to learn the classifier, given training data
• Recognition
– How to use the classifier on novel data
Recognition
– Recognition task: classification, detection, etc.
– Search strategy: sliding windows (Viola, Jones 2001)
• Simple
• Computational complexity: (x, y, scale, θ, number of classes)
– Beyond sliding windows: Lampert et al. '08; also Alexe et al. '10
• Localization: objects are not boxes
• Prone to false positives
– Non-maximum suppression: Canny '86; …; Desai et al. 2009
Recognition
– Recognition task
– Search strategy
– Attributes
Geometric attributes, e.g. category: car; azimuth = 225°; zenith = 30° (Savarese 2007; Sun et al. 2009; Liebelt et al. '08, '10)
Semantic attributes, e.g. it has metal, it is glossy, it has wheels (Farhadi et al. '09; Lampert et al. '09; Wang & Forsyth '09)
Recognition
– Recognition task
– Search strategy
– Attributes
– Context
Semantic context: Torralba et al. '03; Rabinovich et al. '07; Gupta & Davis '08; Heitz & Koller '08; L-J Li et al. '08; Yao & Fei-Fei '10
Geometric context: Hoiem et al. '06; Gould et al. '09; Bao, Sun, Savarese '10
Basic issues
• Representation
– How to represent an object category; which classification scheme to use?
• Learning
– How to learn the classifier, given training data
• Recognition
– How to use the classifier on novel data
Part 1: Bag-of-words models
This segment is based on the tutorial "Recognizing and Learning Object Categories: Year 2007" by Prof. L. Fei-Fei, A. Torralba, and R. Fergus.
Related works
• Early "bag of words" models: mostly texture recognition
– Cula & Dana, 2001; Leung & Malik, 2001; Mori, Belongie & Malik, 2001; Schmid, 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
• Hierarchical Bayesian models for documents (pLSA, LDA, etc.)
– Hofmann, 1999; Blei, Ng & Jordan, 2003; Teh, Jordan, Beal & Blei, 2004
• Object categorization
– Csurka, Bray, Dance & Fan, 2004; Sivic, Russell, Efros, Freeman & Zisserman, 2005; Sudderth, Torralba, Freeman & Willsky, 2005
• Natural scene categorization
– Vogel & Schiele, 2004; Fei-Fei & Perona, 2005; Bosch, Zisserman & Munoz, 2006
Object → Bag of 'words'
Analogy to documents
Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.
sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel
China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with a 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.
China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value
Definition of "BoW"
– Independent features
(example categories: face, bike, violin)
Definition of "BoW"
– Independent features
– Histogram representation over a codewords dictionary
Representation
[Pipeline diagram: feature detection & representation → codewords dictionary → image representation; learning produces category models (and/or classifiers); recognition yields a category decision]
1. Feature detection and representation
• Regular grid
– Vogel & Schiele, 2003
– Fei-Fei & Perona, 2005
• Interest point detector
– Csurka, Bray, Dance & Fan, 2004
– Fei-Fei & Perona, 2005
– Sivic, Russell, Efros, Freeman & Zisserman, 2005
• Other methods
– Random sampling (Vidal-Naquet & Ullman, 2002)
– Segmentation-based patches (Barnard, Duygulu, Forsyth, de Freitas, Blei & Jordan, 2003)
1. Feature detection and representation
• Detect patches [Mikolajczyk & Schmid '02; Matas, Chum, Urban & Pajdla '02; Sivic & Zisserman '03]
• Normalize each patch
• Compute the SIFT descriptor [Lowe '99] (see the sketch below)
Slide credit: Josef Sivic
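As a minimal sketch of this detect/normalize/describe pipeline, here is an equivalent using OpenCV's SIFT (an assumption of this example; the cited papers use their own detectors). The image path is hypothetical, and opencv-python >= 4.4 is assumed:

```python
import cv2

# Load a grayscale image (hypothetical path).
img = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
# detectAndCompute finds interest points and returns one 128-D SIFT
# descriptor per keypoint; patch normalization is handled internally.
keypoints, descriptors = sift.detectAndCompute(img, None)
print(len(keypoints), descriptors.shape)  # e.g. N keypoints, (N, 128)
```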
2. Codewords dictionary formation
• Clustering / vector quantization of patch descriptors (see the k-means sketch below)
• Each cluster center = a codeword
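A minimal sketch of dictionary formation by k-means vector quantization, assuming scikit-learn; the descriptor array and vocabulary size K are placeholders:

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for SIFT descriptors pooled from all training images.
descriptors = np.random.rand(10000, 128)

K = 200  # vocabulary size (a design choice; see the issues slide below)
kmeans = KMeans(n_clusters=K, n_init=10, random_state=0).fit(descriptors)
codewords = kmeans.cluster_centers_  # each cluster center is one codeword
```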
[Figure: example codewords dictionary, Fei-Fei et al. 2005]
Image patch examples of codewords
Sivic et al. 2005
Visual vocabularies: Issues
• How to choose vocabulary size?
– Too small: visual words not representative of all patches
– Too large: quantization artifacts, overfitting
• Computational efficiency
– Vocabulary trees (Nister & Stewenius, 2006)
3. Bag-of-words representation
• Assign each feature descriptor to the nearest codeword in the dictionary (nearest-neighbors assignment; see the sketch below)
• k-d tree search strategy for efficiency
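A minimal sketch of this step, reusing the `kmeans` quantizer from the dictionary-formation sketch above (scikit-learn's `predict` performs the nearest-codeword assignment):

```python
import numpy as np

def bow_histogram(descriptors, kmeans):
    # Map an image's descriptors to a normalized codeword-frequency histogram.
    K = kmeans.n_clusters
    words = kmeans.predict(descriptors)              # nearest-codeword assignment
    hist = np.bincount(words, minlength=K).astype(float)
    return hist / hist.sum()                         # normalize to sum to 1

# image_hist = bow_histogram(image_descriptors, kmeans)
```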
3. Bag-of-words representation
[Figure: histogram of codeword frequencies over the codewords dictionary]
Representation
1. Feature detection & representation
2. Codewords dictionary
3. Image representation
Learning and Recognition
[Diagram: codewords dictionary → category models (and/or classifiers) → category decision]
Learning and Recognition
1. Discriminative methods: NN, SVM
2. Generative methods: graphical models
Discriminative classifiers
[Figure: category models for Class 1 … Class N in a model space]
Nearest-neighbor classifier
• Assign the label of the nearest training data point to each test data point
[Figure: query image in model space; winning class: pink]
k-nearest-neighbors classifier (see the sketch below)
• For a new point, find the k closest points in the training data
• The labels of the k points "vote" to classify
• Works well provided there is lots of data and the distance function is good
[Figure: query image in model space; winning class: pink]
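A minimal k-NN sketch over BoW histograms, assuming scikit-learn; the training data and labels are random placeholders:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.random.rand(100, 200)         # BoW histograms of training images
y_train = np.random.randint(0, 5, 100)     # class labels

knn = KNeighborsClassifier(n_neighbors=5)  # the k = 5 closest points vote
knn.fit(X_train, y_train)
pred = knn.predict(np.random.rand(1, 200)) # label for a query histogram
```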
k-nearest-neighbors classifier
• k-d tree: a space-partitioning data structure for organizing points in k-dimensional space
• Enables efficient search
• Voronoi partitioning of the feature space for 2-category 2-D and 3-D data (figure from Duda et al.)
• Nice tutorial: http://www.cs.umd.edu/class/spring2002/cmsc420-0401/pbasic.pdf
Functions for comparing histograms (implementations sketched below)
• L1 distance:
D(h_1, h_2) = \sum_{i=1}^{N} \lvert h_1(i) - h_2(i) \rvert
• χ² distance:
D(h_1, h_2) = \sum_{i=1}^{N} \frac{(h_1(i) - h_2(i))^2}{h_1(i) + h_2(i)}
• Quadratic distance (cross-bin):
D(h_1, h_2) = \sum_{i,j} A_{ij} \, (h_1(i) - h_2(j))^2
Jan Puzicha, Yossi Rubner, Carlo Tomasi, Joachim M. Buhmann: Empirical Evaluation of Dissimilarity Measures for Color and Texture. ICCV 1999
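Minimal NumPy implementations of the three dissimilarities above (the `eps` guard for empty bins is an addition of this sketch):

```python
import numpy as np

def l1_distance(h1, h2):
    return np.abs(h1 - h2).sum()

def chi2_distance(h1, h2, eps=1e-10):
    # eps avoids division by zero where h1(i) + h2(i) = 0
    return (((h1 - h2) ** 2) / (h1 + h2 + eps)).sum()

def quadratic_distance(h1, h2, A):
    # Cross-bin: A[i, j] weights the (i, j) bin pair.
    d = h1[:, None] - h2[None, :]
    return (A * d ** 2).sum()
```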
Learning and Recognition
1. Discriminative methods: NN, SVM
2. Generative methods: graphical models
Discriminative classifiers (linear classifier)
[Figure: category models for Class 1 … Class N in model space]
Support vector machines
• Find the hyperplane that maximizes the margin between the positive and negative examples
• Distance between point x_i and the hyperplane: \frac{\lvert x_i \cdot w + b \rvert}{\lVert w \rVert}
• Support vectors: x_i \cdot w + b = \pm 1
• Margin = 2 / \lVert w \rVert
• Solution: w = \sum_i \alpha_i y_i x_i
• Classification function (decision boundary): w \cdot x + b = \sum_i \alpha_i y_i \, x_i \cdot x + b
Credit slide: S. Lazebnik
Support vector machines: classification
• For a test point x, evaluate w \cdot x + b = \sum_i \alpha_i y_i \, x_i \cdot x + b
• If w \cdot x + b \ge 0, assign class 1; if w \cdot x + b < 0, assign class 2 (a training sketch follows below)
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
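A minimal sketch of training a two-class linear SVM on BoW histograms, assuming scikit-learn; the data is a random placeholder. `decision_function` returns w·x + b, whose sign gives the class as in the rule above:

```python
import numpy as np
from sklearn.svm import LinearSVC

X = np.random.rand(200, 200)           # BoW histograms
y = np.random.randint(0, 2, 200)       # binary labels

svm = LinearSVC(C=1.0).fit(X, y)
scores = svm.decision_function(X[:5])  # signed values of w.x + b
labels = (scores >= 0).astype(int)     # class 1 if w.x + b >= 0
```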
Nonlinear SVMs
• Datasets that are linearly separable work out great
• But what if the dataset is just too hard?
• We can map it to a higher-dimensional space, e.g. x → (x, x²)
[Figure: 1-D data on the x axis, lifted to 2-D with x² as the second coordinate]
Slide credit: Andrew Moore
Nonlinear SVMs
• General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable
• Lifting transformation: Φ: x → φ(x)
Slide credit: Andrew Moore
Nonlinear SVMs
• Nonlinear decision boundary in the original feature space: \sum_i \alpha_i y_i \, K(x_i, x) + b
• The kernel K is the inner product of the lifting transformation φ: K(x_i, x_j) = φ(x_i) · φ(x_j)
NOTE:
• It is not required to compute φ(x) explicitly
• The kernel must satisfy Mercer's condition
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
Kernels for bags of features
• Histogram intersection kernel (sketched in code below):
I(h_1, h_2) = \sum_{i=1}^{N} \min(h_1(i), h_2(i))
• Generalized Gaussian kernel:
K(h_1, h_2) = \exp\left(-\frac{1}{A} D(h_1, h_2)^2\right)
• D can be the Euclidean distance, χ² distance, etc.
J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid, Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study, IJCV 2007
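A minimal sketch of an SVM with the histogram intersection kernel, passed to scikit-learn's `SVC` as a custom kernel callable that returns the Gram matrix; data is a placeholder:

```python
import numpy as np
from sklearn.svm import SVC

def intersection_kernel(X, Y):
    # Gram matrix: K[a, b] = sum_i min(X[a, i], Y[b, i])
    return np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=2)

X = np.random.rand(100, 200)      # BoW histograms
y = np.random.randint(0, 2, 100)
svm = SVC(kernel=intersection_kernel).fit(X, y)
pred = svm.predict(X[:5])
```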
Pyramid match kernel
• Fast approximation of the Earth Mover's Distance
• Weighted sum of histogram intersections at multiple resolutions (linear in the number of features instead of cubic)
K. Grauman and T. Darrell, The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features, ICCV 2005
Spatial Pyramid Matching
S. Lazebnik, C. Schmid, and J. Ponce, Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, CVPR 2006
What about multi-class SVMs?
• No "definitive" multi-class SVM formulation
• In practice, we obtain a multi-class SVM by combining multiple two-class SVMs
• One vs. others (see the sketch below)
– Training: learn an SVM for each class vs. the others
– Testing: apply each SVM to the test example and assign it the class of the SVM that returns the highest decision value
• One vs. one
– Training: learn an SVM for each pair of classes
– Testing: each learned SVM "votes" for a class to assign to the test example
Credit slide: S. Lazebnik
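A minimal one-vs.-others sketch: scikit-learn's `LinearSVC` applies this strategy internally for multi-class labels (its default is `multi_class='ovr'`); data is a placeholder:

```python
import numpy as np
from sklearn.svm import LinearSVC

X = np.random.rand(300, 200)       # BoW histograms (placeholder data)
y = np.random.randint(0, 10, 300)  # 10 classes

clf = LinearSVC().fit(X, y)  # trains one SVM per class vs. the others
pred = clf.predict(X[:3])    # class whose SVM returns the highest decision value
```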
SVMs: Pros and cons
• Pros
– Many publicly available SVM packages:
http://www.kernel-machines.org/software
– Kernel-based framework is very powerful, flexible
– SVMs work very well in practice, even with very small training sample sizes
• Cons
– No “direct” multi-class SVM, must combine two-class SVMs
– Computation, memory
• During training time, must compute matrix of kernel values for every pair of
examples
• Learning can take a very long time for large-scale problems
Object recognition results
• ETH-80 database of 8 object classes (Eichhorn and Chapelle 2004)
• Features: Harris detector; PCA-SIFT descriptor, d = 10
Kernel | Recognition rate
Match [Wallraven et al.] | 84%
Bhattacharyya affinity [Kondor & Jebara] | 85%
Pyramid match | 84%
Slide credit: Kristen Grauman
Learning and Recognition
1. Discriminative methods: NN, SVM
2. Generative methods: graphical models
→ Model the probability distribution that produces a given bag of features
Generative models
1. Naïve Bayes classifier: Csurka, Bray, Dance & Fan, 2004
2. Hierarchical Bayesian text models (pLSA and LDA)
– Background: Hofmann, 2001; Blei, Ng & Jordan, 2003
– Object categorization: Sivic et al. 2005; Sudderth et al. 2005
– Natural scene categorization: Fei-Fei et al. 2005
Some notation
• w: a collection of all N codewords in the image, w = [w_1, w_2, …, w_N]
• c: category of the image
The Naïve Bayes model
[Graphical model: class node c, with a plate over the N codeword nodes w]
p(c \mid w) \propto p(c) \, p(w \mid c)
• Posterior: the probability that image I is of category c
• p(c): prior probability of the object classes
• p(w \mid c): image likelihood given the class
The Naïve Bayes model (a classifier sketch follows below)
p(c \mid w) \propto p(c) \, p(w \mid c) = p(c) \prod_{n=1}^{N} p(w_n \mid c)
• Object class decision: c^* = \arg\max_c p(c \mid w)
• p(w_n \mid c): likelihood of the nth visual word given the class, estimated from the empirical frequencies of codewords in images of that class
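A minimal NumPy sketch of this decision rule: per-class codeword frequencies estimate p(w_n | c) (with Laplace smoothing, an addition of this sketch), and classification takes the arg max of the log-posterior:

```python
import numpy as np

def train_nb(histograms, labels, n_classes, alpha=1.0):
    # histograms: (n_images, K) codeword counts; labels: (n_images,)
    K = histograms.shape[1]
    log_prior = np.log(np.bincount(labels, minlength=n_classes) / len(labels))
    log_lik = np.zeros((n_classes, K))
    for c in range(n_classes):
        counts = histograms[labels == c].sum(axis=0) + alpha  # Laplace smoothing
        log_lik[c] = np.log(counts / counts.sum())            # log p(w_n | c)
    return log_prior, log_lik

def classify_nb(hist, log_prior, log_lik):
    # c* = argmax_c [ log p(c) + sum_n hist[n] * log p(w_n | c) ]
    return np.argmax(log_prior + log_lik @ hist)
```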
[Result figures: Csurka et al. 2004]
Other generative BoW models
• Hierarchical Bayesian topic models (e.g. pLSA and LDA)
– Object categorization: Sivic et al. 2005; Sudderth et al. 2005
– Natural scene categorization: Fei-Fei et al. 2005
Generative vs discriminative
• Discriminative methods
– Computationally efficient & fast
• Generative models
– Convenient for weakly supervised, unsupervised, and incremental training
– Can incorporate prior information
– Flexibility in modeling parameters
Weaknesses of the BoW models
• No rigorous geometric information about the object components
• It is intuitive to most of us that objects are made of parts, but BoW carries no such information
• Not extensively tested yet for
– Viewpoint invariance
– Scale invariance
• Segmentation and localization unclear
What have we learned today?
• Introduction to object recognition
– Representation
– Learning
– Recognition
• Bag-of-words models (Problem Set 4, Q2)
– Basic representation
– Different learning and recognition algorithms