Computer Vision I - Algorithms and Applications:
Recognition and Wrap-Up
Carsten Rother
Roadmap this lecture
• Some comments on previous lectures
• Image categorization: Generative versus Discriminative Approach (finalize from last lecture)
• Object Detection
• Wrap-up
Reminder: Simulated annealing (Lecture 10)
Gibbs distribution with a temperature 𝑇: 𝑃_𝑇(𝒙) = (1/𝑓) exp{ −𝐸(𝒙)/𝑇 }
Basic idea: start the optimization at a high temperature to find a good initial solution, then go to lower temperatures.
[Figure: the function −𝐸 and the resulting distributions 𝑃_𝑇 over the label 𝑥 of a single pixel (labels 0 to 40): the original distribution and increasingly flattened versions at higher temperatures]
Reminder: (simplified) Simulated annealing
Algorithm:
Choose an annealing schedule, e.g. (𝑡_1, … , 𝑡_𝑚) = (100, 100, … , 1, 1, 1)
Initialize all variables 𝒙
Compute current highest probability: 𝑃_𝑏𝑒𝑠𝑡 = 𝑃_𝑡1(𝒙)
For 𝑗 = 1 … 𝑚 {
    For a random selection of variables 𝑥_𝑖 {
        select a new random label 𝑘 for 𝑥_𝑖    // in full simulated annealing one samples (see CV2)
        if 𝑃_𝑡𝑗(𝒙) ≥ 𝑃_𝑏𝑒𝑠𝑡 {
            accept 𝑥_𝑖 = 𝑘
            𝑃_𝑏𝑒𝑠𝑡 = 𝑃_𝑡𝑗(𝒙)
        }
    }
}
Gibbs distribution with a temperature 𝑇: 𝑃_𝑇(𝒙) = (1/𝑓) exp{ −𝐸(𝒙)/𝑇 }
(This was wrong)
Why is it wrong?
Problem: if you pick a random label, then the chance of increasing the probability (i.e. of passing the test 𝑃_𝑡𝑗(𝒙) ≥ 𝑃_𝑏𝑒𝑠𝑡) is independent of the temperature! At any fixed temperature, exp{ −𝐸(𝒙)/𝑇 } is monotone in −𝐸(𝒙), so the test accepts exactly when the energy decreases, whatever 𝑇 is; the annealing schedule changes nothing.
[Figure: as on the previous slide, the distributions 𝑃_𝑇 over the label 𝑥 of a single pixel (labels 0 to 40) at several temperatures, including the original distribution]
Simulated annealing (correct)
Algorithm:
Choose an annealing schedule, e.g. (𝑡_1, … , 𝑡_𝑚) = (100, 100, … , 1, 1, 1)
Initialize all variables 𝒙
Compute current highest probability: 𝑃_𝑏𝑒𝑠𝑡 = 𝑃_𝑡1(𝒙)
For 𝑗 = 1 … 𝑚 {
    For a random selection of variables 𝑥_𝑖 {
        select a new sample 𝑘 for 𝑥_𝑖    // simple sampling procedure on the next slide
        if 𝑃_𝑡𝑗(𝒙) ≥ 𝑃_𝑏𝑒𝑠𝑡 {
            accept 𝑥_𝑖 = 𝑘
            𝑃_𝑏𝑒𝑠𝑡 = 𝑃_𝑡𝑗(𝒙)
        }
    }
}
Gibbs distribution with a temperature 𝑇: 𝑃_𝑇(𝒙) = (1/𝑓) exp{ −𝐸(𝒙)/𝑇 }
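A minimal Python sketch of this loop, under stated assumptions: `energy` is a user-supplied energy function over the label vector, the normalization 1/𝑓 is ignored in the comparisons (as on the slide), and `sample_label` is the sampling procedure sketched on the next slide; none of these names come from the lecture.

```python
import numpy as np

def simulated_annealing(energy, x, num_labels,
                        schedule=(100, 50, 20, 10, 5, 2, 1, 1, 1)):
    """Simplified simulated annealing over a vector x of discrete labels.
    Works in log space: log P_T(x) = -E(x)/T (up to the constant -log f)."""
    rng = np.random.default_rng(0)
    best_logp = -energy(x) / schedule[0]       # P_best = P_t1(x)
    for t in schedule:                         # annealing schedule t_1 ... t_m
        for i in rng.permutation(len(x)):      # visit the variables in random order
            old = x[i]
            # draw a new label for x_i from its conditional at temperature t
            x[i] = sample_label(energy, x, i, num_labels, t, rng)
            logp = -energy(x) / t
            if logp >= best_logp:              # accept if the probability increased
                best_logp = logp
            else:
                x[i] = old                     # otherwise undo the change
    return x
```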
How to sample a label for one pixel?
How to sample from a general discrete probability distribution of one variable 𝑝(𝑥), 𝑥 ∈ {0,1,… , 𝑛}?
1. Define "intervals" whose lengths are proportional to 𝑝(𝑥)
2. Concatenate these intervals
3. Sample uniformly from the concatenated interval
4. Check which interval the sampled value falls into
Below is an example for 𝑝(𝑥) ∝ {1, 2, 3} (three values).
[Figure: the concatenated interval split into three parts of relative lengths 1, 2 and 3]
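A matching sketch of this interval (inverse-CDF) sampler in Python, specialized to the per-pixel conditional 𝑝(𝑘) ∝ exp{ −𝐸(𝒙 with 𝑥_𝑖 = 𝑘)/𝑇 } used by the annealing loop above; the helper name and signature are my own, not from the slides.

```python
import numpy as np

def sample_label(energy, x, i, num_labels, t, rng):
    """Sample a new label for x[i] from its conditional Gibbs distribution
    at temperature t, using the interval method described above."""
    energies = np.empty(num_labels)
    for k in range(num_labels):            # evaluate E for every candidate label
        x[i] = k
        energies[k] = energy(x)
    # unnormalized p(k) ∝ exp(-E/t); subtracting the min is for stability only
    p = np.exp(-(energies - energies.min()) / t)
    cdf = np.cumsum(p)                     # concatenated interval lengths
    u = rng.uniform(0.0, cdf[-1])          # uniform sample into the whole interval
    return int(np.searchsorted(cdf, u))    # which sub-interval u falls into
```

With 𝑡 = 1 and energies (−log 1, −log 2, −log 3) this reproduces the example 𝑝(𝑥) ∝ {1, 2, 3}: the labels are drawn with frequencies 1/6, 2/6 and 3/6.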
Reminder: Random Forests
1) Compute a 𝑑-dimensional feature, depending on some parameters 𝜃.
Example (𝑑 = 2): we want to classify the white pixel.
Feature: the color of the green pixel and the color of the red pixel.
Parameters: 4D (two offset vectors).
[Figure: how this will be visualized in the following slides]
Reminder: Random Forests
[Figure: Kinect body-tracking pipeline: input depth image → body-part labelling (label each pixel) → clustering → body joint hypotheses, shown in front, side and top views]
Reminder: Decision Tree – Train Time
Input: all training points
Input data in feature space; each point has a class label. Here: the set of all labelled training points, 35 red and 23 blue.
Split the training set at each node
Measure 𝑝(𝑐) at each leaf; it could be 3 red and 1 blue, i.e. 𝑝(red) = 0.75, 𝑝(blue) = 0.25.
(Remember, the feature space is also optimized via 𝜃.)
Random Forests – Training of features (illustration)
What does it mean to optimize over 𝜃?
• For each pixel, the same feature test (at a given split node) is applied.
• One has to define what happens with feature tests that reach outside the image.
Goal: classify each pixel as belonging to class red or blue.
Feature:
Value 𝑥_1: the value of the green (could also be red or blue) color channel when looking 𝜃_1 pixels right and 𝜃_2 pixels up.
Value 𝑥_2: the value of the green (could also be red or blue) color channel when looking 𝜃_3 pixels right and 𝜃_4 pixels down.
[Figure: one choice of 𝜃 versus another choice of 𝜃]
Goal: find a 𝜃 that best separates the data; a sketch of such a feature test follows below.
[Figure: image labeling (2 classes, red and blue); the feature reads pixels at positions 𝑝𝑜𝑠 + (𝜃_1, 𝜃_2) and 𝑝𝑜𝑠 + (𝜃_3, 𝜃_4)]
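A minimal sketch of such an offset feature test in Python; clamping out-of-image reads to the border is one possible convention for the out-of-image case, chosen here only for illustration.

```python
import numpy as np

def offset_feature(img, pos, theta, channel=1):
    """Read one color channel at pos + (theta_1, theta_2) (right/up) and at
    pos + (theta_3, theta_4) (right/down), giving the 2D feature (x_1, x_2).
    img: H x W x 3 array; pos: (row, col); theta: (t1, t2, t3, t4)."""
    h, w = img.shape[:2]
    t1, t2, t3, t4 = theta
    r1 = min(max(pos[0] - t2, 0), h - 1)   # t2 pixels up (row index decreases)
    c1 = min(max(pos[1] + t1, 0), w - 1)   # t1 pixels right
    r2 = min(max(pos[0] + t4, 0), h - 1)   # t4 pixels down
    c2 = min(max(pos[1] + t3, 0), w - 1)   # t3 pixels right
    return np.array([img[r1, c1, channel], img[r2, c2, channel]])
```

Optimizing over 𝜃 then means trying many offset tuples at a split node and keeping the one whose split best separates red from blue.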
Roadmap this lecture
• Some comments on previous lectures
• Image categorization: Generative versus Discriminative Approach (finalize from last lecture)
• Object Detection
• Wrap-up
Bag of Words - Overview
Feature Detection and Representation
Codeword dictionary formation
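Dictionary formation is typically k-means clustering of the local descriptors. A minimal sketch in Python, assuming descriptor extraction has already happened; plain Lloyd iterations stand in for whatever clusterer is actually used:

```python
import numpy as np

def build_dictionary(descriptors, k=174, iters=20, seed=0):
    """Cluster an N x d matrix of local descriptors into k codewords."""
    descriptors = np.asarray(descriptors, dtype=float)
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), size=k, replace=False)].copy()
    for _ in range(iters):
        # assign every descriptor to its nearest center
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        # move each center to the mean of its assigned descriptors
        for j in range(k):
            members = descriptors[assign == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers
```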
Codeword dictionary visualization
[Figure: K = 174 codewords, visualized as the averaged patches of each cluster; from Fei-Fei Li]
Bag of Words – Image Representation
K = 174
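Given the dictionary, each image becomes a K-bin histogram over the codewords; a minimal sketch reusing `build_dictionary` from above:

```python
import numpy as np

def bow_histogram(descriptors, centers):
    """Assign each descriptor of one image to its nearest codeword and
    return the normalized histogram over the K visual words."""
    descriptors = np.asarray(descriptors, dtype=float)
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    hist = np.bincount(d2.argmin(axis=1), minlength=len(centers))
    return hist / max(hist.sum(), 1)
```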
Bag of Words - Overview
Two approaches
Generative approach: models the distributions
Discriminative function: models the decision function
Discriminative functions
“2D space (two codewords)”
The Support Vector Machine is the optimal classifier -> see Machine Learning 1
[Figure: two classes, e.g. cars versus sky, in the 2D codeword space, separated by a decision function]
Which Hyperplane is best and why?
SVM classifier: max-margin behavior gives the best generalization
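At test time a linear SVM is just a hyperplane test; a minimal sketch, with the weights 𝒘, 𝑏 assumed to come from any standard SVM trainer (not shown here):

```python
import numpy as np

def svm_predict(x, w, b):
    """Linear SVM decision function: which side of the hyperplane
    w.x + b = 0 the point x lies on. Training chooses w, b so that the
    margin (distance of the closest training points) is maximal."""
    return 1 if np.dot(w, x) + b >= 0 else -1
```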
Reminder: Random Forests also have a max-margin effect due to bagging
[Figure: training and testing the different trees of the forest on the training points; parameters: T = 200, D = 2, weak learner = aligned, leaf model = probabilistic]
Support Vector Machines
Simpler decision functions are better
[Figure: simpler decision functions give the best generalization; from Florian Markowetz]
Two approaches
Generative approach: models the distributions
Discriminative function: models the decision function
Bayes Decision Theory
Bayes' rule: 𝑃(𝑐|𝒙) = 𝑃(𝒙|𝑐) 𝑃(𝑐) / 𝑃(𝒙), where 𝑃(𝒙|𝑐) is the likelihood.
Bayes Decision Theory
Bayesian Decision Theory
Bayesian Decision Theory
Bayesian Decision Theory
Bayesian Decision Theory
MAP classifier: 𝑐* = argmax_𝑐 𝑃(𝑐|𝒙)
A classifier obeying this rule is called a MAP (maximum a posteriori) classifier, sometimes also called the Bayes optimal classifier.
Relation to previous lectures 10 and 11
• The whole image gets one label (class): 𝐾 possible labelings
• Each pixel gets a label (class): 𝐾^𝑛 possible labelings
• The pixels are structured
• In ML / CV 2 we treat further classifiers, in particular ones minimizing the expected risk
Naive Bayes Classifier
A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions
• Encode each image as a feature vector 𝒙 = (𝑥_1, … , 𝑥_𝑛), where 𝑛 is the number of interest points.
• 𝑥_𝑗 ∈ {𝑤_1, … , 𝑤_𝑚}, where 𝑚 is the number of visual words (codewords).
[Figure: an example image with ~200 interest points used as codewords (visual words)]
• The naive Bayes classifier assumes that visual words are conditionally independent given the object class: 𝑃(𝒙|𝑐) = ∏_𝑗 𝑃(𝑥_𝑗|𝑐) (which is rarely true in practice).
• Naive Bayes classifier:
𝑐* = argmax_𝑐 𝑃(𝑐|𝒙) = argmax_𝑐 𝑃(𝑐) 𝑃(𝒙|𝑐) = argmax_𝑐 𝑃(𝑐) ∏_𝑗 𝑃(𝑥_𝑗|𝑐)
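A minimal sketch of the training and the MAP decision in Python. Word-count histograms as input and Laplace smoothing (so unseen words never zero out the product) are standard choices assumed here, not spelled out on the slide:

```python
import numpy as np

def naive_bayes_train(histograms, labels, num_classes, alpha=1.0):
    """histograms: N x m word-count matrix (one row per training image),
    labels: N class indices. Returns P(c) and P(w|c), smoothed by alpha."""
    histograms, labels = np.asarray(histograms), np.asarray(labels)
    prior = np.bincount(labels, minlength=num_classes) / len(labels)
    word_given_class = np.empty((num_classes, histograms.shape[1]))
    for c in range(num_classes):
        counts = histograms[labels == c].sum(axis=0) + alpha
        word_given_class[c] = counts / counts.sum()
    return prior, word_given_class

def naive_bayes_predict(histogram, prior, word_given_class):
    """MAP decision in log space: argmax_c log P(c) + sum_j n_j log P(w_j|c)."""
    log_post = np.log(prior) + histogram @ np.log(word_given_class).T
    return int(log_post.argmax())
```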
Example (blackboard)
Training:
Example (blackboard)
Test:
Image Classification with Naive Bayes
Image Classification with Naive Bayes
Bag of words – Done!
Summary and Discussion
• Bag-of-words representation:
  • Sparse representation of object categories
  • Many machine learning techniques can be applied (here naive Bayes and SVM)
  • Robust to occlusion
  • Allows sharing of the representation between multiple classes (via the codeword dictionary)
• Problems:
  • Localization of objects in the image is not addressed
  • The spatial distribution of visual words is not modelled
Half-way slide
3 minutes break
Questions?
Roadmap this lecture
• Some comments on previous lectures
• Image categorization: Generative versus Discriminative Approach (finalize from last lecture)
• Object Detection
• Wrap-up
Reminder: Class-based recognition: Level of Detail
• Image Categorization
• One or more categories per image
• Object Class Detection
• Also find bounding box
• Part-based Object Detection
• Find parts of the object (and in this way the full object)
• Semantic Segmentation (see last lecture) (segmentation implies pixel-wise accuracy)
• Object-class segmentation
[Figure: example image; categorization: "frog, branch"; detection: a 2D bounding box for each frog]
Object Detection – General Process
From Derek Hoiem
Specifying an Object Model
0. Window-based bag-of-words model: use a bag-of-words model for each possible window.
Comment: this can be done fast with branch-and-bound search:
Christoph H. Lampert, Matthew B. Blaschko, Thomas Hofmann, "Beyond Sliding Windows: Object Localization by Efficient Subwindow Search", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, AK, 2008.
Specifying an Object Model
Specifying an Object Model
Specifying an Object Model
… Details to come in CV 2
Specifying an Object Model
Object Detection – General Process
Generate Hypothesis
Generate Hypothesis
Each window is evaluated separately
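A sliding-window sketch of this in Python; the window size, stride and scoring hook are illustrative assumptions, not values from the lecture:

```python
def sliding_windows(img_h, img_w, win_h=64, win_w=64, stride=8):
    """Enumerate all window hypotheses (top, left, height, width)
    over a regular grid; each one is evaluated separately."""
    for top in range(0, img_h - win_h + 1, stride):
        for left in range(0, img_w - win_w + 1, stride):
            yield (top, left, win_h, win_w)

# e.g. score each window's bag-of-words histogram with a trained classifier:
# detections = [(win, score(image, win)) for win in sliding_windows(H, W)]
```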
Generate Hypothesis
Generate Hypothesis
Object Detection – General Process
Object Detection – General Process
Resolution
Resolution
Some Influential Work in this Field
Roadmap this lecture
• Some comments on previous lectures
• Image categorization: Generative versus Discriminative Approach (finalize from last lecture)
• Object Detection
• Wrap-up
This was Computer Vision 1
• Introduction
• Basics of Image Processing
• Basics of Image Processing – Part 2
• Image Formation Process and Single Camera Model
• Image Formation Process and Single Camera Model – Part 2
• Two-View Geometry
• Robust Two-View Geometry
• Multi-View 3D Reconstruction
• Dense Correspondence Fields
• Discrete Labeling
• Semantic Segmentation
• Recognition
• Recognition and Wrap-Up
Computer Vision timeline
[Michael Black]
Computer Vision timeline
[Michael Black]
Computer Vision timeline
[Michael Black]
1999
Computer Vision timeline
[Michael Black]
1999 1999-2012
Computer Vision timeline
[Michael Black]
1999   1999-2012   2012 - ?
Please ask me if you want to work on computer vision
• We have openings for Master and Bachelor theses, seminar and project work, and SHK (student assistant) positions
• Lots of academic collaborations in Europe: Oxford, UCL, Heidelberg, Darmstadt, etc.
• Good contacts to companies: Adobe, Microsoft, etc.
Activities in the CVLD – Project 1
The Two-Frame Inverse Rendering Project:
Input: Two large-displacement images
Output: depth, motion, material, light, objects, segmentation, etc.
Application: image editing, image enhancement, image search, augmented reality, …
Challenge: build a joint statistical model:
1) How to do efficient inference, such as approximating the rendering equation?
2) What aspects should be learned and what derived from physics?
3) Do you get better results when deriving multiple outputs?
4) What are good scene priors? Can we use synthetic models from graphics?
Activities in the CVLD – Project 1
[Figure: two RGBD input images]
Activities in the CVLD – Project 1
http://www.inf.tu-dresden.de/index.php?node_id=1886&ln=en
[Figure: scene in Blender with Object 1 (material: wood), Object 2 (material: stone) and Object 3 (material: glass); input: two images]
Activities in the CVLD – Project 2
• Fast detection and 6D object pose estimation of multiple objects from an RGBD stream:
• Scan-in 3D objects (using KinectFusion [Izadi et al. UIST 2011])
Object Instance Recognition
Activities in the CVLD – Project 3
• Improved Regression Tree Fields
Gibbs distribution with energy: 𝐸(𝒚, 𝒙, 𝒘) = ∑_𝐹 𝐸_𝐹(𝒚_𝐹, 𝒙, 𝑤_𝐹)
[Figure: factor graph and compact factor-graph notation over the variables 𝑦_𝑖]
Activities in the CVLD – Project 3
State of the art for image de-convolution:
Input: 𝒙 = 𝐾𝒚 (the blurred image); Output: 𝒚
Computer Vision 2: Models, Inference and Learning
• Introduction to structured models (summary of CV 1)
• Discrete models – inference (state of the art)
  • Pairwise models – combinatorial optimization, message passing, etc.
  • Concept of re-parametrization, tree-reweighted message passing
  • Higher-order models – dual decomposition, 𝑃^𝑛-Potts, etc.
  • Continuous-label models – Gaussian random fields, IRLS, PMBP, etc.
  • Probabilistic inference – variational methods, sampling
  • OpenGM and other state-of-the-art libraries
• Discrete models – learning (state of the art)
  • Probabilistic learning – Fields of Experts
  • Loss-based learning – Regression/Decision Tree Fields, struct-SVM
• Applications:
  • Object recognition and detection (part-based, deformable shape models, etc.)
  • 3D scene understanding
  • Intrinsic image decomposition
  • Image partitioning, segmentation and matting
• Continuous domain models
Feedback
• Please give me feedback