Detecting Object Instances Without Discriminative Features
Edward Hsiao
June 19, 2013
Thesis Committee: Martial Hebert, Chair
Alexei Efros
Takeo Kanade
Andrew Zisserman, University of Oxford
Object Instance Detection
Find this object under arbitrary viewpoint, lighting, clutter and occlusions
Robotic Manipulation
Scene Understanding
Stove Refrigerator
Microwave Coffee maker
Paper towel
Dishwasher
Faucet
Visual Search
Recognition Using Discriminative Features [SIFT, Lowe 2004]
1. Extract keypoints from the model and the test image
2. Generate 1-to-1 correspondences
3. Enforce geometric constraints
4. Recognized object
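The correspondence step of this pipeline can be sketched in a few lines. This is a minimal illustration, assuming descriptors are plain lists of floats and using the nearest-neighbor distance ratio test from [SIFT, Lowe 2004]. The function names, the toy descriptors and the default ratio of 0.8 are choices made for this sketch, not details taken from the slides.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ratio_test_matches(model_desc, image_desc, ratio=0.8):
    """Return (image_index, model_index) pairs that pass the ratio test.

    An image descriptor is assigned to its nearest model descriptor only if
    that neighbor is clearly closer than the second nearest one. Ambiguous
    descriptors (for example from repeated patterns) are discarded.
    """
    matches = []
    for i, d in enumerate(image_desc):
        ranked = sorted((euclidean(d, m), j) for j, m in enumerate(model_desc))
        if len(ranked) < 2:
            continue  # the ratio test needs two candidates
        (best, j_best), (second, _) = ranked[0], ranked[1]
        if best < ratio * second:
            matches.append((i, j_best))
    return matches

# Hypothetical toy data: the last two model descriptors are near-duplicates.
model = [[0.0, 0.0], [10.0, 10.0], [10.0, 10.2]]
image = [[0.1, 0.0], [10.0, 10.1]]
matches = ratio_test_matches(model, image)
```

In this toy example the second image descriptor sits between two nearly identical model descriptors, so the ratio test drops it and only the first correspondence survives.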
Failure of Feature Matching
0 correct correspondences between model and test image
Overview: Lack of Discriminative Features
Ambiguous Keypoint Features
Feature-poor objects
Occlusions
Ambiguous Keypoint Features
Repeated Patterns
Failure of Discriminative Matching
Geometric model with model descriptors mdesc1, mdesc2, ...
One-to-one matching: does the image keypoint descriptor match mdesc1 or mdesc2?
Most approaches discard ambiguous features
Quantized Matching
Geometric model with quantized model descriptors qdesc1, qdesc2, ...
Quantized matching of the image keypoint descriptor
Preserve ambiguity of match until geometric verification
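A minimal sketch of keeping ambiguous matches alive, assuming that quantization means assigning each descriptor to its nearest codeword in a given codebook. The codebook, the function names and the toy descriptors are hypothetical. Every model feature that shares a codeword with an image feature is kept as a candidate, so the decision is deferred to geometric verification.

```python
import math
from collections import defaultdict

def nearest_codeword(desc, codebook):
    """Index of the codeword closest to desc (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda k: math.dist(desc, codebook[k]))

def quantized_matches(model_desc, image_desc, codebook):
    """Return every (image_index, model_index) pair sharing a codeword."""
    bins = defaultdict(list)
    for j, m in enumerate(model_desc):
        bins[nearest_codeword(m, codebook)].append(j)
    candidates = []
    for i, d in enumerate(image_desc):
        for j in bins.get(nearest_codeword(d, codebook), []):
            candidates.append((i, j))
    return candidates

# Hypothetical toy data: two model descriptors fall into the same bin.
codebook = [[0.0, 0.0], [10.0, 10.0]]
model = [[0.0, 0.0], [10.0, 10.0], [10.0, 10.2]]
image = [[10.0, 10.1]]
candidates = quantized_matches(model, image, codebook)
```

Unlike a one-to-one test, the ambiguous image descriptor here yields two candidate correspondences instead of none.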
Detection Performance
CMU Grocery Dataset: 620 images, 10 household objects
[Figure: Average Precision (higher is better); legend entries: one-to-one matching, [Collet et al. 2009], quantized matching]
Failure of Feature Matching
0 correct correspondences between model and test image
Keypoint Comparison
Success Failure
Uninformative Keypoints
“Informative” Keypoints
980 keypoints vs. 10 keypoints
Keypoints contained entirely within the object
Keypoints due to specularities
Feature-richness
Objects range from fewer keypoints to more keypoints
Feature Matching Experiment
At least 5 good correspondences between all pairs of images
Fewer keypoints vs. more keypoints
Feature matching works on feature-rich objects and fails on feature-poor objects
Feature-poor Objects: Shape Matching
Template shape, input window, matched shape
Representing Feature-poor Objects
• Sparse Edge Points: [Berg 2005], [Leordeanu 2007], [Duchenne 2009], [Hinterstoisser 2011]
• Lines & Contour Fragments: [Ferrari 2006 & 2008], [Opelt 2006], [Srinivasan 2010]
• Histogram of Oriented Gradients (HOG): [Dalal and Triggs 2005], [Lai 2011]
Sparse Edge Points
Local information: gradient orientation and color
Edge connectivity is lost
[Figure legend: matched, not matched]
Lines & Contour Fragments
Line fitting is brittle
Difficult to parameterize
Dependent on edge extraction
Splines sensitive to occlusions
Histogram of Oriented Gradients
Coarse statistics of gradient orientation and magnitude
Corrupted by background clutter
Ambiguous shape
[Figure: image patches and their HOG representations]
Gradient Networks: Our Approach
1. Match shape explicitly
2. Enforce connectivity without extracting edges
Gradient Networks: Overview
Shape template, input window
Gradient Networks: Local Shape Potential
How well does each pixel match locally?
Gradient Networks: Predicted Shape Match
Find long connected components which follow shape
Local Shape Potential
Components: distance to template, local orientation, color, edge potential
Local Orientation Potential
[Figure: local orientation potential between model and test]
Gradient Networks
Each pixel 𝑝 is a node in the network
Connect each node to neighbors in tangent direction
[Figure labels: 𝑝, q, pQ0, pQ1]
Find paths in the network that match the shape well
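A minimal sketch of the network construction, assuming each pixel carries an image gradient (gx, gy) and that the tangent direction is the gradient rotated by 90 degrees. Rounding the neighbor position to the nearest pixel is a simplification chosen for this sketch; the function name and parameters are hypothetical.

```python
import math

def tangent_neighbors(x, y, gx, gy, width, height):
    """Return the pixels one step along +/- the tangent at (x, y).

    The tangent is the gradient (gx, gy) rotated by 90 degrees, so it runs
    along the edge rather than across it. Neighbors outside the image, or
    pixels with zero gradient, yield no connection.
    """
    norm = math.hypot(gx, gy)
    if norm == 0.0:
        return []
    tx, ty = -gy / norm, gx / norm
    neighbors = []
    for sign in (+1, -1):
        qx, qy = round(x + sign * tx), round(y + sign * ty)
        if 0 <= qx < width and 0 <= qy < height:
            neighbors.append((qx, qy))
    return neighbors

# A vertical edge has a horizontal gradient, so its tangent is vertical.
nbrs = tangent_neighbors(5, 5, gx=1.0, gy=0.0, width=10, height=10)
```

Because the connections follow the local gradient, connectivity along a contour is available without running an edge detector first.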
Message Passing [Bhat et al. 2010]
Shape similarity at 𝑝: local shape potential, message from left, message from right
Initially, it is just the local shape potential
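A minimal sketch of the idea, assuming a 1D chain of pixels already ordered along the tangent direction. The additive update and the `pairwise` attenuation factor are simplifications chosen for illustration; they are not the update rule of [Bhat et al. 2010] or of the thesis.

```python
def pass_messages(potentials, pairwise=0.4, iterations=3):
    """Propagate local shape potentials along a chain of pixels.

    similarity[i] = potential[i] + message from left + message from right,
    where each message is the neighbor's previous similarity scaled by
    `pairwise`. With zero iterations the result is the local potential.
    """
    similarity = list(potentials)
    n = len(potentials)
    for _ in range(iterations):
        previous = similarity
        similarity = []
        for i in range(n):
            left = pairwise * previous[i - 1] if i > 0 else 0.0
            right = pairwise * previous[i + 1] if i < n - 1 else 0.0
            similarity.append(potentials[i] + left + right)
    return similarity

# A long run of good local matches versus a single isolated good match.
run = pass_messages([1.0, 1.0, 1.0, 1.0, 1.0])
isolated = pass_messages([0.0, 0.0, 1.0, 0.0, 0.0])
```

The pixel in the middle of the long run ends up with a higher similarity than the isolated pixel, which is the behavior wanted when looking for long connected components that follow the shape.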
Predicted Shape Match
Message passing turns the local shape potential into the predicted match
CMU Kitchen Occlusion Dataset
• 1600 images of 8 feature-poor objects
• Single and multiple viewpoints
• Cluttered scenes and occlusions
[Figure: objects and example images]
Shape Matching Results
Template, input window, local shape potential, predicted match
Object Detection: Sliding Window
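Sliding-window detection can be sketched generically as follows. The scoring function is a hypothetical stand-in for whatever per-window score is used (for example a shape-matching score); the window size, stride and threshold are illustrative values.

```python
def sliding_windows(image_w, image_h, win_w, win_h, stride):
    """Yield the top-left corner (x, y) of every window inside the image."""
    for y in range(0, image_h - win_h + 1, stride):
        for x in range(0, image_w - win_w + 1, stride):
            yield x, y

def detect(image_w, image_h, win_w, win_h, stride, score_fn, threshold):
    """Score each window with score_fn and keep those at or above threshold."""
    detections = []
    for x, y in sliding_windows(image_w, image_h, win_w, win_h, stride):
        s = score_fn(x, y, win_w, win_h)
        if s >= threshold:
            detections.append((x, y, s))
    return detections

# Hypothetical scorer: peaks when the window's top-left corner is at (8, 4).
def toy_score(x, y, w, h):
    return 1.0 / (1.0 + abs(x - 8) + abs(y - 4))

hits = detect(32, 16, 8, 8, stride=4, score_fn=toy_score, threshold=0.9)
```

In practice the same loop is repeated over several scales of the image, which this sketch omits.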
Detection Performance
False positives with shape only
Object, false positive window, GN point-wise confidences
Interior Appearance
BaRT: Boundary and Region Templates
Boundary: explicit shape (rLINE2D and GN)
Region: consider appearance within the object interior (HOG and color)
BaRT combines explicit boundary and region information
HOG: Uniform Regions
Uniform regions not represented well
HOG Normalization
Each cell normalized with respect to magnitude of neighbors
Amplifies noise if magnitude close to 0
Uniform Regions: Learning?
HOG + SVM (multiple images): weight = 0
HOG + exemplar SVM (single image): weight = random
Modify HOG Normalization
Set cell to zero if normalization below threshold
[Figure: modified HOG vs. HOG]
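A minimal sketch of the modified normalization, assuming a grid of cell histograms stored as nested lists. Normalizing by the L2 energy of the 3x3 cell neighborhood is a simplification chosen for this sketch (standard HOG uses overlapping blocks), and the threshold value is hypothetical.

```python
import math

def normalize_cells(cells, threshold=0.1, modified=True):
    """Normalize a grid of HOG cell histograms by neighborhood energy.

    cells[r][c] is a list of orientation-bin magnitudes. Each cell is divided
    by the L2 norm of all histograms in its 3x3 neighborhood. With
    modified=True, a cell whose norm falls below `threshold` is set to zero
    instead of being divided by a near-zero value.
    """
    rows, cols = len(cells), len(cells[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            energy = 0.0
            for rr in range(max(0, r - 1), min(rows, r + 2)):
                for cc in range(max(0, c - 1), min(cols, c + 2)):
                    energy += sum(v * v for v in cells[rr][cc])
            norm = math.sqrt(energy)
            if modified and norm < threshold:
                out_row.append([0.0] * len(cells[r][c]))
            else:
                out_row.append([v / (norm + 1e-12) for v in cells[r][c]])
        out.append(out_row)
    return out

# A uniform region: tiny gradient magnitudes that are pure noise.
noise = [[[0.001, 0.002], [0.002, 0.001]],
         [[0.001, 0.001], [0.002, 0.002]]]
standard_out = normalize_cells(noise, modified=False)
modified_out = normalize_cells(noise, modified=True)
```

With the standard normalization the noise is scaled up to a sizeable descriptor, while the modified version leaves the uniform region empty.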
Matching Uniform Regions
More accurate confidences in uniform regions
[Figure: test image with matching confidences, ours vs. HOG]
Example Detections
Detection, zoomed in, boundary (GN), region (HOG+color)
Detection Performance
Detection Performance Under Different Occlusion Levels
Occlusions
Occlusions happen in 3D
Occlusion Reasoning
Matched Not matched
Which of these hypotheses is most likely?
Occlusion Reasoning
• Local Coherency: Fransens ’06, Wang ’09
• Learn Occlusion Structure: Gao ’11, Kwak ’11
• Object Detection Depth Ordering: Wu ’05, Wang ’11
Structure of Occlusions
$V_i$: binary variable that equals 1 if $X_i$ is visible
Occlusion Conditional Likelihood: probability a point is visible given the visibility labeling of all other points
$O_c$: occlusion under a given camera viewpoint $c$
Matched Not matched
Occlusion Reasoning Per Environment
Estimate of object dimensions (objH, objW, objL)
Distribution of object dimensions for a given environment
Occlusion Model
[Figure: object with dimensions objW and objH; occluder with dimensions w and h]
Occlusion Conditional Likelihood
Integral Geometry
$A_{V_j, O_c}$: area covering all positions where $X_j$ is visible and object occluded
$A_{V_i, V_j, O_c}$: area covering all positions where $X_i$ and $X_j$ are visible and object occluded
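A minimal sketch of how the two areas could be combined, assuming the likelihood that $X_i$ is visible given that $X_j$ is visible is the ratio $A_{V_i,V_j,O_c} / A_{V_j,O_c}$. The slides name the areas but do not show the formula, so the ratio is an assumption. The sketch also assumes a frontal view and a rectangular occluding block standing on the same support surface, and it replaces the analytic areas with a count over discretized block positions. All names and dimensions are hypothetical.

```python
def is_visible(point, x0, w, h):
    """A point (x, y) is visible if it lies outside the block [x0, x0+w] x [0, h]."""
    x, y = point
    return not (x0 <= x <= x0 + w and y <= h)

def conditional_visibility(xi, xj, obj_w, w, h, step=0.01):
    """Estimate P(Xi visible | Xj visible, object occluded) by enumeration.

    The occluder is a w-by-h block standing on the support surface (y = 0),
    and the object spans [0, obj_w] horizontally. Every block position that
    overlaps the object counts as 'object occluded'.
    """
    n_j = 0    # positions where Xj is visible (stands in for A_{Vj,Oc})
    n_ij = 0   # positions where Xi and Xj are visible (stands in for A_{Vi,Vj,Oc})
    steps = int(round((obj_w + w) / step))
    for k in range(1, steps):
        x0 = -w + k * step  # left edge of the block; it overlaps the object
        if is_visible(xj, x0, w, h):
            n_j += 1
            if is_visible(xi, x0, w, h):
                n_ij += 1
    return n_ij / n_j if n_j else 0.0

obj_w, w, h = 1.0, 0.5, 0.6
xj = (0.5, 0.2)
near = conditional_visibility((0.55, 0.2), xj, obj_w, w, h)  # close to Xj
far = conditional_visibility((0.95, 0.2), xj, obj_w, w, h)   # far from Xj
high = conditional_visibility((0.5, 0.9), xj, obj_w, w, h)   # above any block
```

Under these assumptions a point close to a visible point is more likely to be visible than a distant one, and a point higher than the occluder is always visible.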
Occlusion Conditional Likelihood
[Figure: occlusion conditional likelihood given $X_j$]
Occlusion Conditional Likelihood Under Different Viewpoints
Occlusion Conditional Likelihood Penalty (OCLP)
$f_{OCLP}$ at an unmatched point $X_i$:
High penalty if unlikely to be occluded by a valid object on same support surface
Low penalty if likely to be occluded by a valid object on same support surface
[Figure legend: matched, not matched]
Example Detections
Detection Performance
Detection Performance Under Different Occlusion Levels
Limitation
Binary matching pattern, occlusion conditional likelihood
Misclassifications can have impact on distribution
Occlusion Efficient Subwindow Search (OESS)
Probabilistic Matching Pattern
OESS for True Positive
Occlusion can be explained well: 95% explained
OESS for False Positive
Only 50% explained
OESS Scoring Matching Pattern
Point labels: 𝑝 = 1, 𝑝 = 0
No occluding block: score = (1) + (1) + (-1) + (-1) = 0
Occluding block, rewarded: score = (1) + (1) + (1) + (-1) = 2
Occluding block, penalized: score = (-1) + (1) + (1) + (-1) = 0
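The three worked scores can be reproduced with a small function. The sign convention below is inferred from those scores and is an assumption: a matched point (p = 1) counts +1 and an unmatched point (p = 0) counts -1, and the sign flips for points covered by the occluding block. The point layout and the function name are hypothetical.

```python
def pattern_score(points, block=None):
    """Score a matching pattern, optionally explained by one occluding block.

    points: list of (x, y, p) with p = 1 for a matched point, p = 0 otherwise.
    block:  (x_min, y_min, x_max, y_max) or None.
    Outside the block a matched point scores +1 and an unmatched point -1.
    Inside the block the signs flip: an unmatched point is explained by the
    occluder (+1), while a matched point contradicts it (-1).
    """
    score = 0
    for x, y, p in points:
        value = 1 if p == 1 else -1
        if block is not None:
            x_min, y_min, x_max, y_max = block
            if x_min <= x <= x_max and y_min <= y <= y_max:
                value = -value
        score += value
    return score

# Four points: two matched (top row), two unmatched (bottom row).
points = [(0, 1, 1), (1, 1, 1), (0, 0, 0), (1, 0, 0)]
no_block = pattern_score(points)
good_block = pattern_score(points, block=(0, 0, 0, 0))  # covers one unmatched point
bad_block = pattern_score(points, block=(0, 0, 0, 1))   # also covers a matched point
```

A block that covers only unmatched points raises the score, while a block that also covers a matched point gains nothing, which is what drives the search toward plausible occluders.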
OESS
1. Reformulate as Efficient Subwindow Search (ESS)
2. Find best occluder object
3. Remove all explained points
4. Iterate
5. Final prediction
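The find, remove and iterate loop can be sketched as follows. This sketch swaps the branch-and-bound search of ESS for an exhaustive search over all axis-aligned boxes on a small integer grid, which is only practical for toy inputs. The gain function (+1 per unmatched point, -1 per matched point inside the box) and the stopping rule are assumptions made for illustration.

```python
from itertools import product

def box_gain(points, box):
    """Gain of a box: +1 per unmatched point, -1 per matched point inside it."""
    x0, y0, x1, y1 = box
    return sum((1 if p == 0 else -1)
               for x, y, p in points if x0 <= x <= x1 and y0 <= y <= y1)

def find_occluders(points, size, max_boxes=5):
    """Greedily pick boxes that explain unmatched points, one box at a time."""
    remaining = list(points)
    boxes = []
    coords = range(size)
    for _ in range(max_boxes):
        best_box, best_gain = None, 0
        for x0, y0, x1, y1 in product(coords, repeat=4):
            if x1 < x0 or y1 < y0:
                continue
            gain = box_gain(remaining, (x0, y0, x1, y1))
            if gain > best_gain:
                best_box, best_gain = (x0, y0, x1, y1), gain
        if best_box is None:
            break  # nothing left that a box can explain
        boxes.append(best_box)
        x0, y0, x1, y1 = best_box
        # Remove all points covered by the chosen box, then iterate.
        remaining = [(x, y, p) for x, y, p in remaining
                     if not (x0 <= x <= x1 and y0 <= y <= y1)]
    return boxes

# 4x4 grid of points; unmatched (p = 0) in the bottom-left 2x2 corner
# and at the single point (3, 3), matched everywhere else.
points = [(x, y, 0 if (x < 2 and y < 2) or (x, y) == (3, 3) else 1)
          for x in range(4) for y in range(4)]
boxes = find_occluders(points, size=4)
```

The first pass picks the box over the 2x2 corner, the second pass explains the remaining isolated point, and the third pass finds no box with positive gain and stops.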
Results
Groundtruth, predicted oboxes, boundary, region, window, detection
Occlusion Prediction Performance
Predicted vs. groundtruth
Average Intersection over Union (IoU)
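For reference, Intersection over Union between a predicted and a groundtruth occlusion mask can be computed as below. Representing the masks as nested lists of 0/1 values is an assumption of this sketch; the reported average would then be the mean of this value over test images.

```python
def mask_iou(predicted, groundtruth):
    """Intersection over Union of two binary masks given as nested lists."""
    intersection = union = 0
    for row_p, row_g in zip(predicted, groundtruth):
        for p, g in zip(row_p, row_g):
            intersection += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Two empty masks agree perfectly, so define their IoU as 1.0.
    return intersection / union if union else 1.0

predicted = [[1, 1, 0],
             [1, 1, 0]]
groundtruth = [[0, 1, 1],
               [0, 1, 1]]
iou = mask_iou(predicted, groundtruth)  # 2 shared cells out of 6 covered cells
```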
Detection Performance
Summary: Lack of Discriminative Features
• Ambiguous Keypoint Features
• Feature-poor objects: Gradient Networks, Boundary and Region Templates
• Occlusions: Occlusion Conditional Likelihood, Occlusion Efficient Subwindow Search
Main Contributions: Ambiguous Keypoint Features
Making specific features less discriminative
Main Contributions: Representing Feature-poor Objects
Gradient Networks: explicit shape matching without extracting edges
Boundary and Region Templates: capture explicit boundary and region information
Main Contributions: Occlusion Reasoning
Occlusion Conditional Likelihood: representing occlusion structure under arbitrary viewpoint
Occlusion Efficient Subwindow Search: directly search for occluding blocks to explain matching pattern
Acknowledgements
Martial Hebert, Alexei Efros, Takeo Kanade, Andrew Zisserman
Background
Augmented Reality
3D model, target environment
Instance vs. Category Recognition
Instance: arbitrary viewpoint and lighting; single image per view
Category: intra-class variations; many images per view
Ambiguous Viewpoint
Failure of SIFT Matching
Invariant Approaches
Future Directions
• Fine-grained verification
• Scalability
• 3D
Datasets
CMU Grocery Dataset
• 620 images of household objects
– 10 objects
• 25 single instance, 25 double instance
• 12 with ground truth pose
– Clutter, viewpoint, lighting, occlusion
CMU Kitchen Occlusion Dataset (Hsiao and Hebert, CVPR 2012)
• 1600 images of 8 household objects
• Single and multiple viewpoints
• Cluttered scenes and occlusions
[Figure: objects and example images]
Gradient Networks
Local Shape Potential: region of influence, appearance, edge
Local Appearance: gradient orientation, color
Potentials: pairwise, unary
Message Passing
Shape Similarity
Probability Calibration (Scheirer et al. CVPR 2012)
Weibull fit to tail of negative distribution
[Figure: density of NOT Object and Object scores; probability of Object given by the CDF of NOT Object]
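A minimal pure-Python sketch of this kind of calibration, assuming the calibrated probability of "object" is the CDF of a Weibull distribution fitted to the highest scores of the negative (NOT Object) class. The tail size, the shift that makes the tail positive, and the bisection-based maximum-likelihood fit are choices made for this sketch and are not taken from Scheirer et al. or the thesis.

```python
import math

def fit_weibull(samples, iters=60):
    """Maximum-likelihood Weibull fit (shape k, scale lam) for positive samples."""
    top = max(samples)
    xs = [s / top for s in samples]  # rescale so that x ** k cannot overflow
    mean_log = sum(math.log(x) for x in xs) / len(xs)

    def g(k):
        # Likelihood equation for the shape parameter; its root is the MLE of k.
        pw = [x ** k for x in xs]
        weighted = sum(p * math.log(x) for p, x in zip(pw, xs)) / sum(pw)
        return weighted - 1.0 / k - mean_log

    lo, hi = 1e-3, 1e3
    for _ in range(iters):  # bisection in log space; g is increasing in k
        mid = math.sqrt(lo * hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    k = math.sqrt(lo * hi)
    lam = top * (sum(x ** k for x in xs) / len(xs)) ** (1.0 / k)
    return k, lam

def calibrate(score, negative_scores, tail_size=20):
    """Probability of 'object' as the Weibull CDF fitted to the negative tail."""
    tail = sorted(negative_scores)[-tail_size:]
    shift = tail[0] - 1e-6  # shift so that all tail values are positive
    k, lam = fit_weibull([t - shift for t in tail])
    z = score - shift
    return 0.0 if z <= 0 else 1.0 - math.exp(-((z / lam) ** k))

# Hypothetical negative scores spread evenly over [0, 1).
negatives = [i / 100 for i in range(100)]
p_low = calibrate(0.85, negatives)   # inside the negative tail
p_high = calibrate(1.5, negatives)   # well beyond every negative score
```

A score that lies far beyond the negative tail receives a probability close to 1, while a score inside the tail receives a much lower one.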
Soft Shape Model
Additional Results
Color Potential
LINE2D Similarity
LINE2D (Hinterstoisser et al., PAMI 2011)
$$\mathrm{score}_{\mathrm{LINE2D}} = \sum_{i=1}^{N} \cos(\Delta\theta_i)$$
$p_i$: model point
$\Delta\theta_i$: difference between the quantized gradient orientation of model point $p_i$ and the quantized gradient orientation of the best matching image point in a local neighborhood
$\cos(0^\circ) = 1.00$, $\cos(45^\circ) = 0.71$
Robust LINE2D Similarity
rLINE2D (Hsiao and Hebert, CVPR 2012)
$$\mathrm{score}_{\mathrm{rLINE2D}} = \sum_{i=1}^{N} \delta(\Delta\theta_i = 0)$$
with $p_i$ and $\Delta\theta_i$ defined as for LINE2D
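The difference between the two similarity scores can be made concrete with a short sketch. It assumes the per-point orientation differences have already been computed and are given in degrees; the function names and the sample values are hypothetical.

```python
import math

def line2d_score(delta_thetas_deg):
    """Sum of cos(delta theta_i) over model points (LINE2D-style score)."""
    return sum(math.cos(math.radians(d)) for d in delta_thetas_deg)

def rline2d_score(delta_thetas_deg):
    """Count of model points whose quantized orientation matches exactly."""
    return sum(1 for d in delta_thetas_deg if d == 0)

# Orientation differences for four model points, in degrees.
deltas = [0, 0, 45, 45]
soft = line2d_score(deltas)      # 1.00 + 1.00 + 0.71 + 0.71
robust = rline2d_score(deltas)   # 1 + 1 + 0 + 0
```

The cosine score still gives substantial credit to points that are off by one orientation bin, whereas the robust score only counts exact matches.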
Message Passing Iterations
Probability Calibration
F-Measure of Shape Matching
Single View
Multiple View
Detection Rate @ 1.0 FPPI
False Positives
BaRT
Grid Optimization
Un-optimized: 57 cells; optimized: 60 cells
HOG Normalization
Amplifies noise in uniform region!
Sensitive to shading effects!
HOG Normalization: Pedestrians
Average Precision
Single View
Multiple View
False positives
Match both boundary and region
BaRT False Positives
Insufficient edge evidence
Unlikely occlusion configuration
Region information is only informative after there is a plausible hypothesis based on the boundary
Occlusion Reasoning
Occlusion Model
Occlusion Scoring
[Diagram elements: sliding window, object detector, occlusion hypothesis (binary), score of window, occlusion model]
Occlusion Conditional Likelihood
Approximation: analytic vs. approximate
Distribution of Physical Dimensions: Household Objects
Occlusion Statistics
Validity of Occlusion Model
Occlusion Penalty
Occlusion Prior Penalty (OPP)
Occlusion Conditional Likelihood Penalty (OCLP)
Average Precision
Performance vs. Occlusion
Learning from Data
Parameter Sensitivity
OESS
Occlusion Upper Bound
OESS Algorithm
OESS vs. Brute Force
Occlusion Prediction
Object Detection Performance
Ambiguous Features
Problem
• Not enough correct matches
Result of our system Difficult to obtain matches
Discriminative hierarchical matching (DHM)
Image features get a discriminative match against each level:
• Model features (Level 0)
• Quantized features (Level 1)
• Quantized features (Level 2)
Candidate correspondences are aggregated
DHM example
All features
DHM result
DHM – 11 correct matches (soymilk can)
Ratio test – 3 correct matches (soymilk can)
Simulated Affine (SA)
Morel & Yu 2009
Baseline systems
• Gordon & Lowe
– SIFT + RANSAC
– Levenberg-Marquardt non-linear optimization
• Enhanced PnP (EPnP)
– Gordon & Lowe
– EPnP non-iterative pose estimation algorithm
• Collet et al.
– Gordon & Lowe
– Mean-shift spatial clustering of image features
Averaged precision-recall
Average Precision
Object detection results
Failure cases
Pose ambiguity
Repeated patterns
Extreme lighting, occlusion, viewpoint, etc.