8/2/2019 slides: Machine Learning for Computer Vision
http://slidepdf.com/reader/full/slides-machine-learning-for-computer-vision 1/135
Machine Learning in
Computer Vision
Fei-Fei Li
What is (computer) vision?
• When we “see” something, what does it involve?
• Take a picture with a camera: it is just a bunch of colored dots (pixels)
• Want to make computers understand images
• Looks easy, but not really…
Image (or video) → Sensing device → Interpreting device → Interpretations
Corn / mature corn in a cornfield / plant / blue sky in the background
Etc.
What is it related to?
Computer Vision
Neuroscience
Machine learning
Speech
Information retrieval
Maths
Computer Science
Information Engineering
Physics
Biology
Robotics
Cognitive sciences
Psychology
Quiz?
What about this?
A picture is worth a thousand words.--- Confucius
or Printers’ Ink Ad (1921)
horizontal lines
vertical
blue on the top
porous
oblique
white
shadow to the left
textured
large green patches
A picture is worth a thousand words.--- Confucius
or Printers’ Ink Ad (1921)
A picture is worth a thousand words.--- Confucius
or Printers’ Ink Ad (1921)
building court yard
clear sky
campus
trees
autumn leaves
people
day time
bicycles
talking outdoor
Today: machine learning methods
for object recognition
Outline
• Intro to object categorization
• Brief overview
– Generative
– Discriminative
• Generative models
• Discriminative models
How many object categories are there?
Biederman 1987
Challenges 1: view point variation
Michelangelo 1475-1564
Challenges 2: illumination
slide credit: S. Ullman
Challenges 3: occlusion
Magritte, 1957
Challenges 4: scale
Challenges 5: deformation
Xu, Beihong 1943
Challenges 6: background clutter
Klimt, 1913
History: single object recognition
• Lowe, et al. 1999, 2003
• Mahamud and Hebert, 2000
• Ferrari, Tuytelaars, and Van Gool, 2004
• Rothganger, Lazebnik, and Ponce, 2004
• Moreels and Perona, 2005
• …
Challenges 7: intra-class variation
Object categorization: the statistical viewpoint
$p(\text{zebra} \mid \text{image})$ vs. $p(\text{no zebra} \mid \text{image})$
• Bayes rule:
$$\frac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})} = \frac{p(\text{image} \mid \text{zebra})}{p(\text{image} \mid \text{no zebra})} \cdot \frac{p(\text{zebra})}{p(\text{no zebra})}$$
posterior ratio = likelihood ratio · prior ratio
Object categorization: the statistical viewpoint
$$\frac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})} = \frac{p(\text{image} \mid \text{zebra})}{p(\text{image} \mid \text{no zebra})} \cdot \frac{p(\text{zebra})}{p(\text{no zebra})}$$
posterior ratio = likelihood ratio · prior ratio
• Discriminative methods model posterior
• Generative methods model likelihood and
prior
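The ratio form of Bayes rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the numeric likelihoods and prior are made up for the example and do not come from the slides.

```python
def posterior_ratio(lik_zebra, lik_no_zebra, prior_zebra):
    """Return p(zebra|image) / p(no zebra|image) using the ratio form of Bayes rule."""
    likelihood_ratio = lik_zebra / lik_no_zebra      # p(image|zebra) / p(image|no zebra)
    prior_ratio = prior_zebra / (1.0 - prior_zebra)  # p(zebra) / p(no zebra)
    return likelihood_ratio * prior_ratio


def decide(lik_zebra, lik_no_zebra, prior_zebra):
    """Choose the class whose posterior is larger (posterior ratio above 1)."""
    ratio = posterior_ratio(lik_zebra, lik_no_zebra, prior_zebra)
    return "zebra" if ratio > 1.0 else "no zebra"


# Hypothetical numbers: the image is 8x more likely under "zebra",
# but zebras are rare (prior 0.1), so the prior ratio is 1/9.
ratio = posterior_ratio(lik_zebra=0.08, lik_no_zebra=0.01, prior_zebra=0.1)
```

A generative method would estimate the two likelihoods and the prior separately and combine them as above; a discriminative method would estimate the posterior ratio directly.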
Discriminative
• Direct modeling of $\dfrac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})}$
[Figure: a decision boundary separating zebra from non-zebra]
Generative
• Model $p(\text{image} \mid \text{zebra})$ and $p(\text{image} \mid \text{no zebra})$
[Table, one row per example image]
$p(\text{image} \mid \text{zebra})$ | $p(\text{image} \mid \text{no zebra})$
Low | Middle
High | Middle→Low
Three main issues
• Representation
– How to represent an object category
• Learning
– How to form the classifier, given training data
• Recognition
– How the classifier is to be used on novel data
Representation
– Generative / discriminative / hybrid
Representation
– Generative / discriminative / hybrid
– Appearance only or location and appearance
Representation
– Generative / discriminative / hybrid
– Appearance only or location and appearance
– Invariances
• View point
• Illumination
• Occlusion
• Scale
• Deformation
• Clutter
• etc.
Representation
– Generative / discriminative / hybrid
– Appearance only or location and appearance
– Invariances
– Part-based or global w/ sub-window
Representation
– Generative / discriminative / hybrid
– Appearance only or location and appearance
– Invariances
– Parts or global w/ sub-window
– Use set of features or each pixel in image
Learning
– Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference -- hence current interest in machine learning
Learning
– Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference -- hence current interest in machine learning
– Methods of training: generative vs. discriminative
[Figure: left panel, class densities p(x|C1) and p(x|C2) plotted against x; right panel, posterior probabilities p(C1|x) and p(C2|x) plotted against x]
Learning
– Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference -- hence current interest in machine learning
– What are you maximizing? Likelihood (Gen.) or performance on train/validation set (Disc.)
– Level of supervision
• Manual segmentation; bounding box; image labels; noisy labels
[Figure label: “Contains a motorbike”]
Learning
– Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference -- hence current interest in machine learning
– What are you maximizing? Likelihood (Gen.) or performance on train/validation set (Disc.)
– Level of supervision
• Manual segmentation; bounding box; image labels; noisy labels
– Batch/incremental (on category and image level; user feedback)
Learning
– Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference -- hence current interest in machine learning
– What are you maximizing? Likelihood (Gen.) or performance on train/validation set (Disc.)
– Level of supervision
• Manual segmentation; bounding box; image labels; noisy labels
– Batch/incremental (on category and image level; user feedback)
– Training images:
• Issue of overfitting
• Negative images for discriminative methods
Learning
– Unclear how to model categories, so we learn what distinguishes them rather than manually specify the difference -- hence current interest in machine learning
– What are you maximizing? Likelihood (Gen.) or performance on train/validation set (Disc.)
– Level of supervision
• Manual segmentation; bounding box; image labels; noisy labels
– Batch/incremental (on category and image level; user feedback)
– Training images:
• Issue of overfitting
• Negative images for discriminative methods
– Priors
Recognition
– Scale / orientation range to search over
– Speed
Bag-of-words models
Object    Bag of ‘words’
Analogy to documents
Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.
sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel
China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with a 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.
China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value
learning    recognition
feature detection & representation
codewords dictionary
image representation
category models (and/or) classifiers
category decision
Representation
1. feature detection & representation
2. codewords dictionary
3. image representation
1. Feature detection and representation
1. Feature detection and representation
Detect patches
[Mikolajczyk and Schmid ’02]
[Matas et al. ’02]
[Sivic et al. ’03]
Normalize patch
Compute SIFT descriptor
[Lowe ’99]
Slide credit: Josef Sivic
2. Codewords dictionary formation
Vector quantization
Slide credit: Josef Sivic
Fei-Fei et al. 2005
3. Image representation
[Figure: histogram of codeword frequency; x-axis: codewords, y-axis: frequency]
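The three representation steps (descriptors, codeword dictionary by vector quantization, histogram of codeword frequency) can be sketched as follows. This is a minimal pure-Python illustration, not the pipeline used in the cited papers: the slides name vector quantization but not a specific algorithm, so plain k-means is an assumption here, and the 2-D toy "descriptors" stand in for real SIFT vectors.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, d):
    """Index of the codeword closest to descriptor d (vector quantization)."""
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k], d))


def build_codebook(descriptors, k, iters=10, seed=0):
    """Form a k-entry codeword dictionary with plain k-means (an assumed choice)."""
    rng = random.Random(seed)
    codebook = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest(codebook, d)].append(d)
        for j, members in enumerate(clusters):
            if members:  # keep the old centre if a cluster is empty
                codebook[j] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def bow_histogram(codebook, descriptors):
    """Represent an image as counts of its descriptors per codeword."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest(codebook, d)] += 1
    return hist


# Toy 2-D descriptors (hypothetical data).
train = [(0.0, 0.1), (0.1, 0.0), (0.9, 1.0), (1.0, 0.9)]
codebook = build_codebook(train, k=2)
hist = bow_histogram(codebook, [(0.05, 0.05), (0.95, 0.95), (1.0, 1.0)])
```

The histogram discards where each patch was found, which is the "bag" part of the model; k-means can also settle in a poor local optimum, so real systems typically run it several times.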
Learning and Recognition
codewords dictionary
category models (and/or) classifiers
category decision
2 case studies
1. Naïve Bayes classifier
– Csurka et al. 2004
2. Hierarchical Bayesian text models (pLSA and LDA)
– Background: Hofmann 2001, Blei et al. 2004
– Object categorization: Sivic et al. 2005, Sudderth et al. 2005
– Natural scene categorization: Fei-Fei et al. 2005
First, some notations
• $w_n$: each patch in an image
– $w_n = [0, 0, \ldots, 1, \ldots, 0, 0]^T$
• $\mathbf{w}$: a collection of all N patches in an image
– $\mathbf{w} = [w_1, w_2, \ldots, w_N]$
• $d_j$: the jth image in an image collection
• $c$: category of the image
• $z$: theme or topic of the patch
Case #1: the Naïve Bayes model
[Graphical model with nodes c and w; plate over the N patches]
$$c^* = \arg\max_c \, p(c \mid \mathbf{w}) \propto p(c)\, p(\mathbf{w} \mid c) = p(c) \prod_{n=1}^{N} p(w_n \mid c)$$
• $c^*$: object class decision
• $p(c)$: prior prob. of the object classes
• $p(\mathbf{w} \mid c)$: image likelihood given the class
Csurka et al. 2004
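The Naïve Bayes decision rule can be sketched in Python as below. This is a hedged illustration rather than the implementation of Csurka et al.: images are given as lists of codeword indices, log-probabilities are used to avoid underflow in the product, and the add-one (Laplace) smoothing via `alpha` is an assumption not stated on the slide. The class names and toy data are hypothetical.

```python
import math
from collections import Counter


def train_naive_bayes(images, labels, vocab_size, alpha=1.0):
    """Estimate log p(c) and log p(w_n = v | c) from codeword-index lists."""
    classes = sorted(set(labels))
    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    log_lik = {}
    for c in classes:
        counts = Counter(w for img, y in zip(images, labels) if y == c for w in img)
        total = sum(counts.values()) + alpha * vocab_size  # smoothed normaliser
        log_lik[c] = [math.log((counts[v] + alpha) / total) for v in range(vocab_size)]
    return log_prior, log_lik


def classify(image, log_prior, log_lik):
    """Return c* = argmax_c [ log p(c) + sum_n log p(w_n | c) ]."""
    scores = {c: log_prior[c] + sum(log_lik[c][w] for w in image) for c in log_prior}
    return max(scores, key=scores.get)


# Toy data: each image is a list of codeword indices (hypothetical).
images = [[0, 0, 1], [0, 1, 0], [2, 2, 3], [3, 2, 2]]
labels = ["face", "face", "car", "car"]
log_prior, log_lik = train_naive_bayes(images, labels, vocab_size=4)
prediction = classify([0, 0, 2], log_prior, log_lik)
```

Summing log-probabilities is equivalent to the product over the N patches in the formula, so the argmax is unchanged.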
Case #2: Hierarchical Bayesian text models
Probabilistic Latent Semantic Analysis (pLSA)
Hofmann, 2001
[Graphical model with nodes d, z, w; plates N and D]
Latent Dirichlet Allocation (LDA)
Blei et al., 2001
[Graphical model with nodes c, π, z, w; plates N and D]
Case #2: Hierarchical Bayesian text models
Probabilistic Latent Semantic Analysis (pLSA)
[Graphical model with nodes d, z, w; plates N and D]
“face”
Sivic et al. ICCV 2005
Case #2: Hierarchical Bayesian text models
Latent Dirichlet Allocation (LDA)
[Graphical model with nodes c, π, z, w; plates N and D]
“beach”
Fei-Fei et al. ICCV 2005
Another application
• Human action classification
Invariance issues
• Scale and rotation
– Implicit
– Detectors and descriptors
Kadir and Brady. 2003
Invariance issues
• Scale and rotation
• Occlusion
– Implicit in the models
– Codeword distribution: small variations
– (In theory) Theme (z) distribution: different occlusion patterns
Invariance issues
• Scale and rotation
• Occlusion
• Translation
– Encode (relative) location information
Sudderth et al. 2005
Invariance issues
• Scale and rotation
• Occlusion
• Translation
• View point (in theory)
– Codewords: detector and descriptor
– Theme distributions: different view points
Fergus et al. 2005
Model properties
[Figure: the visual-perception document and its keyword list from the “Analogy to documents” slide]
• Intuitive
– Analogy to documents
Model properties
• Intuitive
• (Could use) generative models
– Convenient for weakly- or un-supervised training
– Prior information
– Hierarchical Bayesian framework
Sivic et al., 2005, Sudderth et al., 2005
Model properties
• Intuitive
• (Could use) generative models
• Learning and recognition relatively fast
– Compare to other methods
Weakness of the model
• No rigorous geometric information of the object components
• It’s intuitive to most of us that objects are made of parts – no such information
• Not extensively tested yet for
– View point invariance
– Scale invariance
• Segmentation and localization unclear
part-based models
Slides courtesy of Rob Fergus for “part-based models”
One-shot learning
of object categories
Fei-Fei et al. ’03, ’04, ’06
One-shot learning
of object categories
P. Bruegel, 1562
Fei-Fei et al. ‘03, ‘04, ‘06
model representation
One-shot learning
of object categories
Fei-Fei et al. ’03, ’04, ’06
X (location)
(x,y) coords. of region center
A (appearance)
normalize: 11x11 patch
Projection onto PCA basis: c1, c2, …, c10
The Generative Model
X (location)
(x,y) coords. of region center
A (appearance)
normalize: 11x11 patch
Projection onto PCA basis: c1, c2, …, c10
[Graphical model with nodes I, h, X, A and parameters μX, ΓX, μA, ΓA]
The Generative Model
[Graphical model with nodes I, h, X, A and parameters μX, ΓX, μA, ΓA]
Legend: observed variables; hidden variable; parameters
The Generative Model
[Graphical model with nodes I, h, X, A and parameters μX, ΓX, μA, ΓA]
[Figure: parameter space with axes θ1, θ2, …, θn]
where θ = {μX, ΓX, μA, ΓA}
ML/MAP
Weber et al. ’98 ’00, Fergus et al. ’03
The Generative Model
[Figure: parameter space with axes θ1, θ2, …, θn]
where θ = {μX, ΓX, μA, ΓA}
ML/MAP
• shape model: μX, ΓX
• appearance model: μA, ΓA
The Generative Model
[Graphical model with nodes I, h, X, A and parameters μX, ΓX, μA, ΓA]
[Figure: parameter space with axes θ1, θ2, …, θn]
ML/MAP
The Generative Model
[Graphical model with nodes I, h, X, A, P; parameters μX, ΓX, μA, ΓA; priors m0X, β0X, a0X, B0X, m0A, β0A, a0A, B0A]
Bayesian
Parameters to estimate: {mX, βX, aX, BX, mA, βA, aA, BA}, i.e. parameters of the Normal-Wishart distribution
[Figure: parameter space with axes θ1, θ2, …, θn]
Fei-Fei et al. ’03, ’04, ’06
The Generative Model
[Graphical model with nodes I, h, X, A, P; parameters μX, ΓX, μA, ΓA; priors m0X, β0X, a0X, B0X, m0A, β0A, a0A, B0A]
Legend: parameters; priors
Fei-Fei et al. ’03, ’04, ’06
The Generative Model
[Graphical model with nodes I, h, X, A, P; parameters μX, ΓX, μA, ΓA; priors m0X, β0X, a0X, B0X, m0A, β0A, a0A, B0A]
[Figure labels: prior distribution; priors]
Fei-Fei et al. ’03, ’04, ’06
One-shot learning
of object categories
1. human vision
2. model representation
3. learning & inferences
4. evaluation & dataset & application
learning & inferences
One-shot learning
of object categories
Fei-Fei et al. 2003, 2004, 2006
No labeling No segmentation No alignment
learning & inferences
One-shot learning
of object categories
Fei-Fei et al. 2003, 2004, 2006
Bayesian
[Figure: parameter space with axes θ1, θ2, …, θn]
Random initialization
Variational EM
E-Step
new estimate
of p(θ|train)
M-Step
prior knowledge of p(θ)
Attias, Jordan, Hinton etc.
evaluation & dataset
One-shot learning
of object categories
Fei-Fei et al. 2004, 2006a, 2006b
evaluation & dataset -- Caltech 101 Dataset
One-shot learning
of object categories
Fei-Fei et al. 2004, 2006a, 2006b
evaluation & dataset -- Caltech 101 Dataset
One-shot learning
of object categories
Fei-Fei et al. 2004, 2006a, 2006b
Part 3: discriminative methods
Discriminative methods
Object detection and recognition is formulated as a classification problem.
Where are the screens?
The image is partitioned into a set of overlapping windows
… and a decision is taken at each window about whether it contains a target object or not.
[Figure: bag of image patches in some feature space, with a decision boundary separating “Computer screen” from “Background”]
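The window-by-window formulation can be sketched as a sliding-window loop. This is a minimal illustration under assumptions: the image is a plain list of pixel rows, windows are square at a single scale, and `bright_patch` is a hypothetical stand-in for a trained classifier returning +1 (object) or -1 (background).

```python
def sliding_windows(width, height, win, stride):
    """Yield (x, y, win, win) boxes; a stride smaller than win makes them overlap."""
    for y in range(0, height - win + 1, stride):
        for x in range(0, width - win + 1, stride):
            yield (x, y, win, win)


def detect(image, win, stride, classifier):
    """Run a binary classifier on every window and keep the positive ones."""
    height, width = len(image), len(image[0])
    hits = []
    for (x, y, w, h) in sliding_windows(width, height, win, stride):
        patch = [row[x:x + w] for row in image[y:y + h]]
        if classifier(patch) > 0:
            hits.append((x, y, w, h))
    return hits


# Toy 6x6 image with a bright 2x2 "object" at column 2, row 2.
image = [[0] * 6 for _ in range(6)]
for r in range(2, 4):
    for c in range(2, 4):
        image[r][c] = 1


def bright_patch(patch):
    """Hypothetical classifier: +1 if every pixel of a 2x2 patch is bright."""
    return 1 if sum(map(sum, patch)) >= 4 else -1


hits = detect(image, win=2, stride=1, classifier=bright_patch)
```

A practical detector would repeat the scan over several window sizes or an image pyramid, which is where the scale range and speed concerns from the recognition slide come in.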
Discriminative vs. generative
• Generative model
• Discriminative model
• Classification function
[Figure: three plots, one per item, each against x = data; annotations “(The lousy painter)” and “(The artist)”]
Discriminative methods
• Nearest neighbor ($10^6$ examples)
Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005; …
• Neural networks
LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998; …
• Support Vector Machines and Kernels
Guyon, Vapnik; Heisele, Serre, Poggio, 2001; …
• Conditional Random Fields
McCallum, Freitag, Pereira 2000; Kumar, Hebert 2003; …
Formulation
• Formulation: binary classification
Features x = $x_1, x_2, x_3, \ldots, x_N$ (training data); $x_{N+1}, x_{N+2}, \ldots, x_{N+M}$ (test data)
Labels y = +1 or -1 for training data; ? for test data
Training data: each image patch is labeled as containing the object or background
• Classification function: maps features x to a label y, where the function belongs to some family of functions
• Minimize misclassification error
(Not that simple: we need some guarantees that there will be generalization)
Overview of section
• Object detection with classifiers
• Boosting
– Gentle boosting
– Weak detectors
– Object model
– Object detection
• Multiclass object detection
Why boosting?
• A simple algorithm for learning robust classifiers
– Freund & Schapire, 1995
– Friedman, Hastie, Tibshirani, 1998
• Provides efficient algorithm for sparse visual feature selection
– Tieu & Viola, 2000
– Viola & Jones, 2003
• Easy to implement; requires no external optimization tools.
Boosting
• Defines a classifier using an additive model:
[Equation labels: strong classifier; weak classifier; weight; features vector]
Boosting
• Defines a classifier using an additive model:
[Equation labels: strong classifier; weak classifier; weight; features vector]
• We need to define a family of weak classifiers
[Equation label: weak classifier, from a family of weak classifiers]
Boosting
• It is a sequential procedure:
[Figure: data points $x_{t=1}, x_{t=2}, \ldots, x_t$]
Each data point $x_t$ has
a class label: $y_t \in \{+1, -1\}$
and a weight: $w_t = 1$
Toy example
Weak learners from the family of lines
h => p(error) = 0.5: it is at chance
Each data point has
a class label: $y_t \in \{+1, -1\}$
and a weight: $w_t = 1$
Toy example
This one seems to be the best
This is a ‘weak classifier’: it performs slightly better than chance.
Each data point has
a class label: $y_t \in \{+1, -1\}$
and a weight: $w_t = 1$
Toy example
We set a new problem for which the previous weak classifier performs at chance again
Each data point has
a class label: $y_t \in \{+1, -1\}$
We update the weights: $w_t \leftarrow w_t \exp\{-y_t H_t\}$
Toy example
The strong (non-linear) classifier is built as the combination of all the weak (linear) classifiers f1, f2, f3, f4.
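The toy example can be sketched in a few lines. This is an illustration under stated assumptions, not the slides' exact procedure: the weak learners are restricted to axis-aligned lines (decision stumps), the weight of each weak classifier follows the discrete AdaBoost rule alpha = 0.5 * log((1 - err) / err), and H_t in the weight update is read as alpha * h(x_t). The data points and all function names are hypothetical.

```python
import math


def stump_predict(stump, x):
    """Axis-aligned line: +polarity if x[dim] > thr, otherwise -polarity."""
    dim, thr, polarity = stump
    return polarity if x[dim] > thr else -polarity


def weighted_error(stump, X, y, w):
    """Fraction of the total weight that the stump misclassifies."""
    wrong = sum(wi for xi, yi, wi in zip(X, y, w)
                if stump_predict(stump, xi) != yi)
    return wrong / sum(w)


def best_stump(X, y, w):
    """Exhaustive search over dimensions, midpoint thresholds and polarities."""
    candidates = []
    for dim in range(len(X[0])):
        values = sorted({xi[dim] for xi in X})
        for lo, hi in zip(values, values[1:]):
            for polarity in (1, -1):
                candidates.append((dim, (lo + hi) / 2, polarity))
    return min(candidates, key=lambda s: weighted_error(s, X, y, w))


def boost(X, y, rounds):
    """Sequentially add weak classifiers, reweighting the data each round."""
    w = [1.0] * len(X)  # every data point starts with weight w_t = 1
    model = []
    for _ in range(rounds):
        stump = best_stump(X, y, w)
        err = weighted_error(stump, X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # assumed AdaBoost weighting
        # w_t <- w_t * exp(-y_t * H_t), with H_t = alpha * h(x_t)
        w = [wi * math.exp(-yi * alpha * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        model.append((alpha, stump))
    return model, w


def classify(model, x):
    """Sign of the weighted sum of all weak classifiers."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in model)
    return 1 if score >= 0 else -1


# Hypothetical 2D points: the positives sit in a band that no single line isolates.
X = [(1, 1), (2, 3), (3, 2), (4, 1), (5, 3), (6, 2)]
y = [-1, -1, 1, 1, -1, -1]
model, weights = boost(X, y, rounds=3)
```

With this choice of alpha, the stump selected in a round has weighted error exactly 0.5 under the updated weights, which is the "performs at chance again" property: the next round is forced to pick a different line.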
From images to features: Weak detectors
We will now define a family of visual features that can be used as weak classifiers (“weak detectors”).
A weak detector takes an image as input and produces a binary response.
Weak detectors
Textures of textures
Tieu and Viola, CVPR 2000
Every combination of three filters generates a different feature.
This gives thousands of features. Boosting selects a sparse subset, so computations at test time are very efficient. Boosting also avoids overfitting to some extent.
Weak detectors
Haar filters and integral image
Viola and Jones, ICCV 2001
The average intensity in the block is computed with four sums, independently of the block size.
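A small sketch of the integral image idea, assuming a grayscale image stored as a list of rows. The table is padded with a zero row and column so that any block sum needs only four lookups. The two-rectangle filter at the end is one common Haar-like layout, included as an assumption for illustration; all function names are hypothetical.

```python
def integral_image(img):
    """ii[r][c] holds the sum of img over rows < r and columns < c."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def block_sum(ii, top, left, bottom, right):
    """Sum over rows top..bottom-1 and columns left..right-1: four lookups."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def block_mean(ii, top, left, bottom, right):
    """Average intensity of the block, at a cost independent of its size."""
    area = (bottom - top) * (right - left)
    return block_sum(ii, top, left, bottom, right) / area


def haar_two_rect(ii, top, left, height, width):
    """Left half minus right half of a block that is 2 * width columns wide."""
    left_part = block_sum(ii, top, left, top + height, left + width)
    right_part = block_sum(ii, top, left + width, top + height, left + 2 * width)
    return left_part - right_part


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
mean_top_left = block_mean(ii, 0, 0, 2, 2)  # mean of 1, 2, 4, 5
```

The table is built once per image; after that, every rectangular filter response is a handful of additions regardless of the rectangle size.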
Weak detectors
Other weak detectors:
• Carmichael, Hebert 2004
• Yuille, Snow, Nitzberg, 1998
• Amit, Geman 1998
• Papageorgiou, Poggio, 2000
• Heisele, Serre, Poggio, 2001
• Agarwal, Awan, Roth, 2004
• Schneiderman, Kanade 2004
• …
Weak detectors
Part based: similar to part-based generative models. We create weak detectors by using parts and voting for the object center location.
Car model Screen model
These features are used for the detector on the course web site.
Weak detectors
First we collect a set of part templates from a set of training objects.
Vidal-Naquet, Ullman (2003)
…
Weak detectors
We now define a family of “weak detectors” as:
[Figure: diagram of the weak detector computation]
Better than chance
Weak detectors
We can do a better job using filtered images
[Figure: diagram of the weak detector computed on filtered images]
Still a weak detector, but better than before
Training
First we evaluate all the N features on all the training images.
Then, we sample the feature outputs on the object center and at random locations in the background:
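The sampling step can be sketched as follows, assuming each of the N features has already been evaluated over the whole image and stored as a 2D map of the same size. The function name, the `min_dist` exclusion zone around the object center, and the fixed random seed are assumptions made for this illustration.

```python
import random


def sample_training_vectors(feature_maps, center, n_background,
                            min_dist=2, seed=0):
    """Build feature vectors: one positive at the object center,
    n_background negatives at random background locations."""
    rng = random.Random(seed)
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    cr, cc = center

    def vector_at(r, c):
        # One entry per feature: the output of that feature at (r, c).
        return [fm[r][c] for fm in feature_maps]

    X = [vector_at(cr, cc)]
    y = [1]
    # Background candidates: locations far enough from the object center.
    candidates = [(r, c) for r in range(rows) for c in range(cols)
                  if max(abs(r - cr), abs(c - cc)) >= min_dist]
    for r, c in rng.sample(candidates, n_background):
        X.append(vector_at(r, c))
        y.append(-1)
    return X, y


# Two hypothetical 5x5 feature maps standing in for real feature outputs.
maps = [[[r * 5 + c for c in range(5)] for r in range(5)],
        [[(r + c) % 2 for c in range(5)] for r in range(5)]]
X, y = sample_training_vectors(maps, center=(2, 2), n_background=3)
```

The resulting vectors and labels are in the form a boosting procedure expects: one row of N feature outputs per sampled location, labeled +1 for the object and -1 for background.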
Representation and object model
Selected features for the screen detector
[Figure: selected features 1, 2, 3, 4, …, 10, …, 100]
Lousy painter
Representation and object model
Selected features for the car detector
[Figure: selected features 1, 2, 3, 4, …, 10, …, 100]
Overview of section
• Object detection with classifiers
• Boosting
– Gentle boosting
– Weak detectors
– Object model
– Object detection
• Multiclass object detection
Example: screen detection
[Figure columns: feature output, thresholded output, strong classifier]
Weak ‘detector’
Produces many false alarms.
Strong classifier at iteration 1
Second weak ‘detector’
Produces a different set of false alarms.
Strong classifier at iteration 2
…
Strong classifier at iteration 10
…
Adding features
Strong classifier at iteration 200
Final classification
applications
Document Analysis
Digit recognition, AT&T labs
http://www.research.att.com/~yann/
Medical Imaging
Robotics
Toys and robots
Fingerprints
Surveillance
Security
Searching the web