Computer Vision I - Algorithms and Applications:
Recognition and Wrap-Up
Carsten Rother
Roadmap this lecture
• Some comments on previous lectures
• Image categorization: Generative versus Discriminative Approach (finalize from last lecture)
• Object Detection
• Wrap-up
Reminder: Simulated annealing (Lecture 10)
Gibbs distribution with a temperature 𝑇: 𝑃_𝑇(𝒙) = (1/𝑓) exp{ −𝐸(𝒙)/𝑇 }
Basic idea: start the optimization at a high temperature to find a good initial solution, then go to lower temperatures.
[Figure: the function −𝐸 and the resulting distributions 𝑃_𝑇 over the label 𝑥 of a single pixel (labels 0 to 40): the original distribution and increasingly flattened versions at higher temperatures]
Reminder: (simplified) Simulated annealing
Algorithm:
Choose an annealing schedule, e.g. (𝑡_1, … , 𝑡_𝑚) = (100, 100, … , 1, 1, 1)
Initialize all variables 𝒙
Compute current highest probability: 𝑃_𝑏𝑒𝑠𝑡 = 𝑃_𝑡1(𝒙)
For 𝑗 = 1 … 𝑚 {
    For a random selection of variables 𝑥_𝑖 {
        select a new random label 𝑘 for 𝑥_𝑖    // in full simulated annealing one samples (see CV2)
        if 𝑃_𝑡𝑗(𝒙) ≥ 𝑃_𝑏𝑒𝑠𝑡 {
            accept 𝑥_𝑖 = 𝑘
            𝑃_𝑏𝑒𝑠𝑡 = 𝑃_𝑡𝑗(𝒙)
        }
    }
}
Gibbs distribution with a temperature 𝑇: 𝑃_𝑇(𝒙) = (1/𝑓) exp{ −𝐸(𝒙)/𝑇 }
(This was wrong)
Why is it wrong?
Problem: if you pick a random label, then the chance of increasing the probability (i.e. of passing the test 𝑃_𝑡𝑗(𝒙) ≥ 𝑃_𝑏𝑒𝑠𝑡) is independent of the temperature! At any fixed temperature, exp{ −𝐸(𝒙)/𝑇 } is monotone in −𝐸(𝒙), so the test accepts exactly when the energy decreases, whatever 𝑇 is; the annealing schedule changes nothing.
[Figure: as on the previous slide, the distributions 𝑃_𝑇 over the label 𝑥 of a single pixel (labels 0 to 40) at several temperatures, including the original distribution]
Simulated annealing (correct)
Algorithm:
Choose an annealing schedule, e.g. (𝑡_1, … , 𝑡_𝑚) = (100, 100, … , 1, 1, 1)
Initialize all variables 𝒙
Compute current highest probability: 𝑃_𝑏𝑒𝑠𝑡 = 𝑃_𝑡1(𝒙)
For 𝑗 = 1 … 𝑚 {
    For a random selection of variables 𝑥_𝑖 {
        select a new sample 𝑘 for 𝑥_𝑖    // simple sampling procedure on the next slide
        if 𝑃_𝑡𝑗(𝒙) ≥ 𝑃_𝑏𝑒𝑠𝑡 {
            accept 𝑥_𝑖 = 𝑘
            𝑃_𝑏𝑒𝑠𝑡 = 𝑃_𝑡𝑗(𝒙)
        }
    }
}
Gibbs distribution with a temperature 𝑇: 𝑃_𝑇(𝒙) = (1/𝑓) exp{ −𝐸(𝒙)/𝑇 }
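A minimal Python sketch of this loop, under stated assumptions: `energy` is a user-supplied energy function over the label vector, the normalization 1/𝑓 is ignored in the comparisons (as on the slide), and `sample_label` is the sampling procedure sketched on the next slide; none of these names come from the lecture.

```python
import numpy as np

def simulated_annealing(energy, x, num_labels,
                        schedule=(100, 50, 20, 10, 5, 2, 1, 1, 1)):
    """Simplified simulated annealing over a vector x of discrete labels.
    Works in log space: log P_T(x) = -E(x)/T (up to the constant -log f)."""
    rng = np.random.default_rng(0)
    best_logp = -energy(x) / schedule[0]       # P_best = P_t1(x)
    for t in schedule:                         # annealing schedule t_1 ... t_m
        for i in rng.permutation(len(x)):      # visit the variables in random order
            old = x[i]
            # draw a new label for x_i from its conditional at temperature t
            x[i] = sample_label(energy, x, i, num_labels, t, rng)
            logp = -energy(x) / t
            if logp >= best_logp:              # accept if the probability increased
                best_logp = logp
            else:
                x[i] = old                     # otherwise undo the change
    return x
```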
How to sample a label for one pixel?
How to sample from a general discrete probability distribution of one variable 𝑝(𝑥), 𝑥 ∈ {0,1,… , 𝑛}?
1. Define "intervals" whose lengths are proportional to 𝑝(𝑥)
2. Concatenate these intervals
3. Sample uniformly from the concatenated interval
4. Check which interval the sampled value falls into
Below is an example for 𝑝(𝑥) ∝ {1, 2, 3} (three values).
[Figure: the concatenated interval split into three parts of relative lengths 1, 2 and 3]
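A matching sketch of this interval (inverse-CDF) sampler in Python, specialized to the per-pixel conditional 𝑝(𝑘) ∝ exp{ −𝐸(𝒙 with 𝑥_𝑖 = 𝑘)/𝑇 } used by the annealing loop above; the helper name and signature are my own, not from the slides.

```python
import numpy as np

def sample_label(energy, x, i, num_labels, t, rng):
    """Sample a new label for x[i] from its conditional Gibbs distribution
    at temperature t, using the interval method described above."""
    energies = np.empty(num_labels)
    for k in range(num_labels):            # evaluate E for every candidate label
        x[i] = k
        energies[k] = energy(x)
    # unnormalized p(k) ∝ exp(-E/t); subtracting the min is for stability only
    p = np.exp(-(energies - energies.min()) / t)
    cdf = np.cumsum(p)                     # concatenated interval lengths
    u = rng.uniform(0.0, cdf[-1])          # uniform sample into the whole interval
    return int(np.searchsorted(cdf, u))    # which sub-interval u falls into
```

With 𝑡 = 1 and energies (−log 1, −log 2, −log 3) this reproduces the example 𝑝(𝑥) ∝ {1, 2, 3}: the labels are drawn with frequencies 1/6, 2/6 and 3/6.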
Reminder: Random Forests
1) Compute a 𝑑-dimensional feature, depending on some parameters 𝜃.
Example (𝑑 = 2): we want to classify the white pixel.
Feature: the color of the green pixel and the color of the red pixel.
Parameters: 4D (two offset vectors).
[Figure: how this will be visualized in the following slides]
Reminder: Random Forests
[Figure: Kinect body-tracking pipeline: input depth image → body-part labelling (label each pixel) → clustering → body joint hypotheses, shown in front, side and top views]
Reminder: Decision Tree – Train Time
Input: all training points
Input data in feature space; each point has a class label. Here: the set of all labelled training points, 35 red and 23 blue.
Split the training set at each node
Measure 𝑝(𝑐) at each leaf; it could be 3 red and 1 blue, i.e. 𝑝(red) = 0.75, 𝑝(blue) = 0.25.
(Remember, the feature space is also optimized via 𝜃.)
Random Forests – Training of features (illustration)
What does it mean to optimize over 𝜃?
• For each pixel, the same feature test (at a given split node) is applied.
• One has to define what happens with feature tests that reach outside the image.
Goal: classify each pixel as belonging to class red or blue.
Feature:
Value 𝑥_1: the value of the green (could also be red or blue) color channel when looking 𝜃_1 pixels right and 𝜃_2 pixels up.
Value 𝑥_2: the value of the green (could also be red or blue) color channel when looking 𝜃_3 pixels right and 𝜃_4 pixels down.
[Figure: one choice of 𝜃 versus another choice of 𝜃]
Goal: find a 𝜃 that best separates the data; a sketch of such a feature test follows below.
[Figure: image labeling (2 classes, red and blue); the feature reads pixels at positions 𝑝𝑜𝑠 + (𝜃_1, 𝜃_2) and 𝑝𝑜𝑠 + (𝜃_3, 𝜃_4)]
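A minimal sketch of such an offset feature test in Python; clamping out-of-image reads to the border is one possible convention for the out-of-image case, chosen here only for illustration.

```python
import numpy as np

def offset_feature(img, pos, theta, channel=1):
    """Read one color channel at pos + (theta_1, theta_2) (right/up) and at
    pos + (theta_3, theta_4) (right/down), giving the 2D feature (x_1, x_2).
    img: H x W x 3 array; pos: (row, col); theta: (t1, t2, t3, t4)."""
    h, w = img.shape[:2]
    t1, t2, t3, t4 = theta
    r1 = min(max(pos[0] - t2, 0), h - 1)   # t2 pixels up (row index decreases)
    c1 = min(max(pos[1] + t1, 0), w - 1)   # t1 pixels right
    r2 = min(max(pos[0] + t4, 0), h - 1)   # t4 pixels down
    c2 = min(max(pos[1] + t3, 0), w - 1)   # t3 pixels right
    return np.array([img[r1, c1, channel], img[r2, c2, channel]])
```

Optimizing over 𝜃 then means trying many offset tuples at a split node and keeping the one whose split best separates red from blue.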
Roadmap this lecture
• Some comments on previous lectures
• Image categorization: Generative versus Discriminative Approach (finalize from last lecture)
• Object Detection
• Wrap-up
Bag of Words - Overview
Feature Detection and Representation
Codeword dictionary formation
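Dictionary formation is typically k-means clustering of the local descriptors. A minimal sketch in Python, assuming descriptor extraction has already happened; plain Lloyd iterations stand in for whatever clusterer is actually used:

```python
import numpy as np

def build_dictionary(descriptors, k=174, iters=20, seed=0):
    """Cluster an N x d matrix of local descriptors into k codewords."""
    descriptors = np.asarray(descriptors, dtype=float)
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), size=k, replace=False)].copy()
    for _ in range(iters):
        # assign every descriptor to its nearest center
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        # move each center to the mean of its assigned descriptors
        for j in range(k):
            members = descriptors[assign == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers
```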
Codeword dictionary visualization
[Figure: K = 174 codewords, visualized as the averaged patches of each cluster; from Fei-Fei Li]
Bag of Words – Image Representation
K = 174
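Given the dictionary, each image becomes a K-bin histogram over the codewords; a minimal sketch reusing `build_dictionary` from above:

```python
import numpy as np

def bow_histogram(descriptors, centers):
    """Assign each descriptor of one image to its nearest codeword and
    return the normalized histogram over the K visual words."""
    descriptors = np.asarray(descriptors, dtype=float)
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    hist = np.bincount(d2.argmin(axis=1), minlength=len(centers))
    return hist / max(hist.sum(), 1)
```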
Bag of Words - Overview
Two approaches
Generative approach: models the distributions
Discriminative function: models the decision function
Discriminative functions
“2D space (two codewords)”
The Support Vector Machine is the optimal classifier -> see Machine Learning 1
[Figure: two classes, e.g. cars versus sky, in the 2D codeword space, separated by a decision function]
Which Hyperplane is best and why?
SVM classifier: max-margin behavior gives the best generalization
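At test time a linear SVM is just a hyperplane test; a minimal sketch, with the weights 𝒘, 𝑏 assumed to come from any standard SVM trainer (not shown here):

```python
import numpy as np

def svm_predict(x, w, b):
    """Linear SVM decision function: which side of the hyperplane
    w.x + b = 0 the point x lies on. Training chooses w, b so that the
    margin (distance of the closest training points) is maximal."""
    return 1 if np.dot(w, x) + b >= 0 else -1
```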
Reminder: Random Forests also have a max-margin effect due to bagging
[Figure: training and testing the different trees of the forest on the training points; parameters: T = 200, D = 2, weak learner = aligned, leaf model = probabilistic]
Support Vector Machines
Simpler decision functions are better
[Figure: simpler decision functions give the best generalization; from Florian Markowetz]
Two approaches
Generative approach: models the distributions
Discriminative function: models the decision function
Bayes Decision Theory
Bayes' rule: 𝑃(𝑐|𝒙) = 𝑃(𝒙|𝑐) 𝑃(𝑐) / 𝑃(𝒙), where 𝑃(𝒙|𝑐) is the likelihood.
Bayes Decision Theory
Bayesian Decision Theory
Bayesian Decision Theory
Bayesian Decision Theory
Bayesian Decision Theory
MAP classifier: 𝑐* = argmax_𝑐 𝑃(𝑐|𝒙)
A classifier obeying this rule is called a MAP (maximum a posteriori) classifier, sometimes also called the Bayes optimal classifier.
Relation to previous lectures 10 and 11
• The whole image gets one label (class): 𝐾 possible labelings
• Each pixel gets a label (class): 𝐾^𝑛 possible labelings
• The pixels are structured
• In ML / CV 2 we treat further classifiers, in particular ones minimizing the expected risk
Naive Bayes Classifier
A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions
• Encode each image as a feature vector 𝒙 = (𝑥_1, … , 𝑥_𝑛), where 𝑛 is the number of interest points.
• 𝑥_𝑗 ∈ {𝑤_1, … , 𝑤_𝑚}, where 𝑚 is the number of visual words (codewords).
[Figure: an example image with ~200 interest points used as codewords (visual words)]
• The naive Bayes classifier assumes that visual words are conditionally independent given the object class: 𝑃(𝒙|𝑐) = ∏_𝑗 𝑃(𝑥_𝑗|𝑐) (which is rarely true in practice).
• Naive Bayes classifier:
𝑐* = argmax_𝑐 𝑃(𝑐|𝒙) = argmax_𝑐 𝑃(𝑐) 𝑃(𝒙|𝑐) = argmax_𝑐 𝑃(𝑐) ∏_𝑗 𝑃(𝑥_𝑗|𝑐)
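A minimal sketch of the training and the MAP decision in Python. Word-count histograms as input and Laplace smoothing (so unseen words never zero out the product) are standard choices assumed here, not spelled out on the slide:

```python
import numpy as np

def naive_bayes_train(histograms, labels, num_classes, alpha=1.0):
    """histograms: N x m word-count matrix (one row per training image),
    labels: N class indices. Returns P(c) and P(w|c), smoothed by alpha."""
    histograms, labels = np.asarray(histograms), np.asarray(labels)
    prior = np.bincount(labels, minlength=num_classes) / len(labels)
    word_given_class = np.empty((num_classes, histograms.shape[1]))
    for c in range(num_classes):
        counts = histograms[labels == c].sum(axis=0) + alpha
        word_given_class[c] = counts / counts.sum()
    return prior, word_given_class

def naive_bayes_predict(histogram, prior, word_given_class):
    """MAP decision in log space: argmax_c log P(c) + sum_j n_j log P(w_j|c)."""
    log_post = np.log(prior) + histogram @ np.log(word_given_class).T
    return int(log_post.argmax())
```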
Example (blackboard)
Training:
Example (blackboard)
Test:
Image Classification with Naive Bayes
Image Classification with Naive Bayes
Bag of words – Done!
Summary and Discussion
• Bag-of-words representation:
  • Sparse representation of object categories
  • Many machine learning techniques can be applied (here naive Bayes and SVM)
  • Robust to occlusion
  • Allows sharing of the representation between multiple classes (via the codeword dictionary)
• Problems:
  • Localization of objects in the image is not addressed
  • The spatial distribution of visual words is not modelled
Half-way slide
3 minutes break
Questions?
Roadmap this lecture
• Some comments on previous lectures
• Image categorization: Generative versus Discriminative Approach (finalize from last lecture)
• Object Detection
• Wrap-up
Reminder: Class-based recognition: Level of Detail
• Image Categorization
• One or more categories per image
• Object Class Detection
• Also find bounding box
• Part-based Object Detection
• Find parts of the object (and in this way the full object)
• Semantic Segmentation (see last lecture) (segmentation implies pixel-wise accuracy)
• Object-class segmentation
[Figure: example image; categorization: "frog, branch"; detection: a 2D bounding box for each frog]
Object Detection – General Process
From Derek Hoiem
Specifying an Object Model
0. Window-based bag-of-words model: use a bag-of-words model for each possible window.
Comment: this can be done fast with branch-and-bound search:
Christoph H. Lampert, Matthew B. Blaschko, Thomas Hofmann, "Beyond Sliding Windows: Object Localization by Efficient Subwindow Search", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, AK, 2008.
Specifying an Object Model
Specifying an Object Model
Specifying an Object Model
… Details to come in CV 2
Specifying an Object Model
Object Detection – General Process
Generate Hypothesis
Generate Hypothesis
Each window is evaluated separately
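A sliding-window sketch of this in Python; the window size, stride and scoring hook are illustrative assumptions, not values from the lecture:

```python
def sliding_windows(img_h, img_w, win_h=64, win_w=64, stride=8):
    """Enumerate all window hypotheses (top, left, height, width)
    over a regular grid; each one is evaluated separately."""
    for top in range(0, img_h - win_h + 1, stride):
        for left in range(0, img_w - win_w + 1, stride):
            yield (top, left, win_h, win_w)

# e.g. score each window's bag-of-words histogram with a trained classifier:
# detections = [(win, score(image, win)) for win in sliding_windows(H, W)]
```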
Generate Hypothesis
Generate Hypothesis
Object Detection – General Process
Object Detection – General Process
Resolution
Resolution
Some Influential Work in this Field
Roadmap this lecture
• Some comments on previous lectures
• Image categorization: Generative versus Discriminative Approach (finalize from last lecture)
• Object Detection
• Wrap-up
This was Computer Vision 1
• Introduction
• Basics of Image Processing
• Basics of Image Processing – Part 2
• Image Formation Process and Single Camera Model
• Image Formation Process and Single Camera Model – Part 2
• Two-View Geometry
• Robust Two-View Geometry
• Multi-View 3D Reconstruction
• Dense Correspondence Fields
• Discrete Labeling
• Semantic Segmentation
• Recognition
• Recognition and Wrap-Up
Computer Vision timeline
[Michael Black]
Computer Vision timeline
[Michael Black]
Computer Vision timeline
[Michael Black]
1999
Computer Vision timeline
[Michael Black]
1999 1999-2012
Computer Vision timeline
[Michael Black]
1999   1999-2012   2012 - ?
Please ask me if you want to work on computer vision
• We have openings for Master and Bachelor theses, seminar and project work, and SHK (student assistant) positions
• Lots of academic collaborations in Europe: Oxford, UCL, Heidelberg, Darmstadt, etc.
• Good contacts to companies: Adobe, Microsoft, etc.
Activities in the CVLD – Project 1
The Two-Frame Inverse Rendering Project:
Input: Two large-displacement images
Output: depth, motion, material, light, objects, segmentation, etc.
Application: image editing, image enhancement, image search, augmented reality, …
Challenge: build a joint statistical model:
1) How to do efficient inference, such as approximating the rendering equation?
2) What aspects should be learned and what derived from physics?
3) Do you get better results when deriving multiple outputs?
4) What are good scene priors? Can we use synthetic models from graphics?
Activities in the CVLD – Project 1
[Figure: two RGBD input images]
Activities in the CVLD – Project 1
http://www.inf.tu-dresden.de/index.php?node_id=1886&ln=en
[Figure: scene in Blender with Object 1 (material: wood), Object 2 (material: stone) and Object 3 (material: glass); input: two images]
Activities in the CVLD – Project 2
• Fast detection and 6D object pose estimation of multiple objects from an RGBD stream:
• Scan-in 3D objects (using KinectFusion [Izadi et al. UIST 2011])
Object Instance Recognition
Activities in the CVLD – Project 3
• Improved Regression Tree Fields
Gibbs distribution with energy: 𝐸(𝒚, 𝒙, 𝒘) = ∑_𝐹 𝐸_𝐹(𝒚_𝐹, 𝒙, 𝑤_𝐹)
[Figure: factor graph and compact factor-graph notation over the variables 𝑦_𝑖]
Activities in the CVLD – Project 3
State of the art for image de-convolution:
Input: 𝒙 = 𝐾𝒚 (the blurred image); Output: 𝒚
Computer Vision 2: Models, Inference and Learning
• Introduction to structured models (summary of CV 1)
• Discrete models – inference (state of the art)
  • Pairwise models – combinatorial optimization, message passing, etc.
  • Concept of re-parametrization, tree-reweighted message passing
  • Higher-order models – dual decomposition, 𝑃^𝑛-Potts, etc.
  • Continuous-label models – Gaussian random fields, IRLS, PMBP, etc.
  • Probabilistic inference – variational methods, sampling
  • OpenGM and other state-of-the-art libraries
• Discrete models – learning (state of the art)
  • Probabilistic learning – Fields of Experts
  • Loss-based learning – Regression/Decision Tree Fields, struct-SVM
• Applications:
  • Object recognition and detection (part-based, deformable shape models, etc.)
  • 3D scene understanding
  • Intrinsic image decomposition
  • Image partitioning, segmentation and matting
• Continuous domain models
Feedback
• Please give me feedback