Detecting Object Instances Without Discriminative Features

Edward Hsiao

June 19, 2013

Thesis Committee:

Martial Hebert, Chair

Alexei Efros

Takeo Kanade

Andrew Zisserman, University of Oxford

Object Instance Detection

Find this object under arbitrary viewpoint, lighting, clutter and occlusions

3

4

Robotic Manipulation

5

Scene Understanding

6

Scene Understanding

Stove Refrigerator

Microwave Coffee maker

Paper towel

Dishwasher

Faucet

7

Visual Search

8

Recognition Using Discriminative Features

model test image

9

[SIFT, Lowe 2004]

Extract Keypoints

test image

10

model

[SIFT, Lowe 2004]

Generate 1-To-1 Correspondences

test image

11

model

[SIFT, Lowe 2004]

Enforce Geometric Constraints

test image

12

model

[SIFT, Lowe 2004]

Recognized Object

test image

13

model

[SIFT, Lowe 2004]

Failure of Feature Matching

14

test image model

0 correct correspondences

Overview: Lack of Discriminative Features

• Ambiguous Keypoint Features
• Feature-poor objects
• Occlusions




Occlusions

16


17

Repeated Patterns

18

Failure of Discriminative Matching

Geometric model

mdesc2

Model descriptors

mdesc1

.

.

.

Image keypoint descriptor

19


Geometric model

mdesc2

Model descriptors

mdesc1

.

.

.


? or

One-to-one matching

20


Geometric model

mdesc2

Model descriptors

mdesc1

.

.

.


? or

One-to-one matching

Most approaches discard ambiguous features
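
One common way ambiguous features get discarded in one-to-one matching is a nearest-neighbor distance ratio test in the style of [SIFT, Lowe 2004]. The following is a minimal sketch of that idea, not the thesis implementation: the 0.8 threshold, the function names and the toy 2-D descriptors are assumptions chosen for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ratio_test_match(image_desc, model_descs, max_ratio=0.8):
    """Return the index of the matched model descriptor, or None if ambiguous.

    A match is kept only when the nearest model descriptor is clearly
    closer than the second nearest (distance ratio below max_ratio).
    max_ratio is an assumed, illustrative threshold.
    """
    dists = sorted((euclidean(image_desc, m), i) for i, m in enumerate(model_descs))
    (d1, best), (d2, _) = dists[0], dists[1]
    if d2 == 0.0 or d1 / d2 >= max_ratio:
        return None  # ambiguous: two model descriptors explain it about equally well
    return best

# Toy 2-D "descriptors" (hypothetical values; real SIFT descriptors are 128-D).
distinct_model = [(0.0, 0.0), (10.0, 10.0)]
repeated_model = [(0.0, 0.0), (0.2, 0.0), (10.0, 10.0)]  # two near-identical entries

unique_match = ratio_test_match((0.1, 0.0), distinct_model)     # kept
ambiguous_match = ratio_test_match((0.1, 0.0), repeated_model)  # discarded
```

With a repeated pattern on the model, the same image descriptor that matched cleanly before is now rejected, which is the failure mode the slide describes.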

Quantized Matching

Geometric model

qdesc2

Quantized model descriptors

qdesc1

.

.

.


22

Quantized Matching

Geometric model

qdesc2

Quantized model descriptors

qdesc1

.

.

.


Quantized matching

Preserve ambiguity of match until geometric verification
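
A minimal sketch of what preserving ambiguity could look like, assuming descriptors are quantized to the nearest entry of a small codebook and every model feature in the same bin is kept as a candidate correspondence. The codebook, function names and toy descriptors are hypothetical; the later geometric verification step is not shown.

```python
from collections import defaultdict

def quantize(desc, codebook):
    """Index of the nearest codeword (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((x - c) ** 2 for x, c in zip(desc, codebook[k])))

def build_index(model_descs, codebook):
    """Map each codeword index to the model features quantized to it."""
    index = defaultdict(list)
    for feature_id, desc in enumerate(model_descs):
        index[quantize(desc, codebook)].append(feature_id)
    return index

def candidate_correspondences(image_desc, index, codebook):
    """All model features that share the image descriptor's codeword."""
    return list(index.get(quantize(image_desc, codebook), []))

# Hypothetical 2-D codebook and model descriptors.
codebook = [(0.0, 0.0), (10.0, 10.0)]
model_descs = [(0.0, 0.0), (0.2, 0.0), (10.0, 10.0)]
index = build_index(model_descs, codebook)

# Both near-identical model features survive as candidates.
candidates = candidate_correspondences((0.1, 0.0), index, codebook)
```

Instead of rejecting the image feature, both model features are passed on, and the choice between them is deferred to the geometric model.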

Detection Performance

CMU Grocery Dataset: 620 images, 10 household objects

Figure: Average Precision (higher is better) for one-to-one matching [Collet et al. 2009] and quantized matching

Failure of Feature Matching

test image

0 correct correspondences

model

Keypoint Comparison

Success Failure

26

Uninformative Keypoints

27


28


29

“Informative” Keypoints

980 keypoints 10 keypoints

Keypoints contained entirely within the object

“Informative” Keypoints

980 keypoints 10 keypoints

Keypoints due to specularities

Fewer keypoints More keypoints

Feature-richness

32


Feature-richness

33


Feature-richness

34


Feature-richness

35

Feature Matching Experiment

36


37


38


39

At least 5 good correspondences between all pairs of images


Works Fails

40


Works Fails

41


Works Fails

42


Feature-rich Feature-poor

43


Feature-rich Feature-poor

44




Occlusions

45

46

Feature-poor Objects Shape Matching

Template shape Input window Matched shape

47

Representing Feature-poor Objects

Sparse Edge Points [Berg 2005], [Leordeanu 2007], [Duchenne 2009], [Hinterstoisser 2011]

Lines & Contour Fragments [Ferrari 2006 & 2008], [Opelt 2006], [Srinivasan 2010]

Histogram of Oriented Gradients (HOG) [Dalal and Triggs 2005], [Lai 2011]

48

Sparse Edge Points

49

Local information: gradient orientation and color

Sparse Edge Points Matched Not matched

50

Sparse Edge Points Matched Not matched

51

Sparse Edge Points

Edge connectivity is lost

Matched Not matched

52

Lines & Contour Fragments

53


Line fitting is brittle

Difficult to parameterize

Dependent on edge extraction

Splines sensitive to occlusions

54



Histogram of Oriented Gradients

56

Coarse statistics of gradient orientation and magnitude


Corrupted by background clutter Ambiguous shape

patch HOG HOG patch

57



Gradient Networks: Our Approach

1. Match shape explicitly
2. Enforce connectivity without extracting edges

Gradient Networks Overview

Shape template Input window

60


Gradient Networks: Local Shape Potential

How well does each pixel match locally?

Gradient Networks: Predicted Shape Match

Find long connected components which follow shape

Local Shape Potential

Distance to template, local orientation, color, edge potential







Local Orientation Potential

67

model test

local orientation potential








70

Gradient Networks

𝑝

𝑝

Each pixel is a node in the network

𝑝

Gradient Networks

pQ0

pQ1

q

𝑝

Connect each node to neighbors in tangent direction

Gradient Networks

𝑝

𝑝

Find paths in the network that match the shape well

𝑝

Message Passing Local shape potential

shape similarity

local shape potential

message from left

message from right

[Bhat et al. 2010]

74

𝑝

Message Passing Local shape potential

Initially, it is just the local shape potential

Message Passing

𝑝

Local shape potential

76

Message Passing

𝑝


77

Message Passing

𝑝


78

Predicted Shape Match

Local shape potential Predicted match

Message passing
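
The slides do not give the message update rule, so the following is only a toy sketch of the idea on a 1-D chain of pixels: each node starts from its local shape potential and receives one message from the left and one from the right, so that long connected runs of well-matching pixels score higher than isolated ones. The update rule, the function name and the example potentials are all assumptions for illustration, not the formulation used in the thesis.

```python
def chain_shape_similarity(potentials):
    """Toy message passing on a 1-D chain (illustrative update rule only).

    potentials[i] is the local shape potential of node i, in [0, 1].
    A message accumulates support from connected neighbors and is
    cut off by any node whose potential is zero.
    """
    n = len(potentials)
    from_left = [0.0] * n
    from_right = [0.0] * n
    # Forward sweep: message arriving at node i from its left neighbor.
    for i in range(1, n):
        from_left[i] = potentials[i - 1] * (1.0 + from_left[i - 1])
    # Backward sweep: message arriving at node i from its right neighbor.
    for i in range(n - 2, -1, -1):
        from_right[i] = potentials[i + 1] * (1.0 + from_right[i + 1])
    # Combine the local potential with both incoming messages.
    return [p * (1.0 + l + r) for p, l, r in zip(potentials, from_left, from_right)]

# A connected run of four matching pixels, a gap, then one isolated match.
local_potential = [1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0]
similarity = chain_shape_similarity(local_potential)
```

In this toy rule every pixel in the run of four ends up with the same high score, while the isolated pixel keeps only its own local potential.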

79

CMU Kitchen Occlusion Dataset

• 1600 images of 8 feature-poor objects
• Single and multiple viewpoints
• Cluttered scenes and occlusions

80

Objects Example images

Shape Matching Results

Template Input window Local shape potential Predicted match

Object Detection Sliding Window

84

Object Detection Sliding Window

85



False positives with shape only

Object False positive window

GN point-wise confidences

88

Interior Appearance

Object False positive window

GN point-wise confidences

89

BaRT: Boundary and Region Templates

Boundary

Explicit shape: rLINE2D and GN

Region

Consider appearance within the object interior: HOG and color

BaRT

Combines explicit boundary and region information

HOG Uniform Regions

Uniform regions not represented well

97

HOG Normalization

Each cell normalized with respect to magnitude of neighbors

Amplifies noise if magnitude close to 0

Uniform Regions

100

Learning?

HOG + SVM (multiple images): weight = 0

HOG + exemplar SVM (single image): weight = random

Modify HOG Normalization

Modified HOG HOG

Set cell to zero if normalization below threshold
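
A minimal sketch of the modified rule, under stated assumptions: the cell's own L2 norm stands in for the neighborhood normalizer, and the threshold `min_norm` and the regularizer `eps` are hypothetical values. It shows why plain normalization makes a near-uniform cell look as strong as a textured one, and how zeroing avoids that.

```python
import math

def l2(values):
    """L2 norm of a list of numbers."""
    return math.sqrt(sum(v * v for v in values))

def normalize_cell(hist, block_norm, eps=1e-3):
    """Standard-style normalization: divide by the neighborhood gradient energy."""
    return [h / math.sqrt(block_norm ** 2 + eps ** 2) for h in hist]

def normalize_cell_modified(hist, block_norm, min_norm=0.1, eps=1e-3):
    """Modified rule: zero the cell when the normalizer is below a threshold.

    min_norm is an assumed, illustrative threshold.
    """
    if block_norm < min_norm:
        return [0.0] * len(hist)
    return normalize_cell(hist, block_norm, eps)

# Hypothetical 4-bin orientation histograms.
textured = [4.0, 1.0, 0.5, 2.0]           # strong gradients
uniform = [0.004, 0.001, 0.0005, 0.002]   # same pattern at noise magnitude

textured_std = normalize_cell(textured, l2(textured))
uniform_std = normalize_cell(uniform, l2(uniform))           # noise is amplified
uniform_mod = normalize_cell_modified(uniform, l2(uniform))  # noise is suppressed
textured_mod = normalize_cell_modified(textured, l2(textured))
```

After standard normalization the uniform cell is almost indistinguishable from the textured one, while the modified rule leaves the textured cell unchanged and returns all zeros for the uniform cell.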

104

Matching Uniform Regions

Test image: HOG vs. Ours

More accurate confidences in uniform regions

Example Detections

Panels: detection, zoomed in, boundary (GN), region (HOG+color)

Detection Performance Under Different Occlusion Levels

114


115




Occlusions

116

Occlusions

117

Occlusions

118

Occlusions happen in 3D

119


120


121


122

Occlusion Reasoning

Matched Not matched

Which of these hypotheses is most likely?


Occlusion Reasoning

Local Coherency: Fransens '06, Wang '09

Learn Occlusion Structure: Gao '11, Kwak '11

Object Detection Depth Ordering: Wu '05, Wang '11

Structure of Occlusions

$V_i$: binary variable that equals 1 if $X_i$ is visible

Occlusion Conditional Likelihood: probability a point is visible given the visibility labeling of all other points

$O_c$: occlusion under a given camera viewpoint $c$

Matched Not matched

Occlusion Reasoning Per Environment

Object dimensions: $H_{obj}$, $W_{obj}$, $L_{obj}$

Estimate of object dimensions

Distribution of object dimensions for a given environment

Occlusion Model

Figure labels: Object, Occluder, $W_{obj}$, $H_{obj}$, $w$, $h$


Integral Geometry

$A_{V_j,O_c}$: area covering all positions where $X_j$ is visible and object occluded

$A_{V_i,V_j,O_c}$: area covering all positions where $X_i$ and $X_j$ are visible and object occluded

Occlusion Conditional Likelihood Under Different Viewpoints

140


Occlusion Conditional Likelihood Penalty (OCLP)

Matched Not matched

$f_{OCLP}$: High penalty if $X_i$ is unlikely to be occluded by a valid object on same support surface

$f_{OCLP}$: Low penalty if $X_i$ is likely to be occluded by a valid object on same support surface

Example Detections

145


146


147

Limitation

Binary Matching Pattern Occlusion Conditional Likelihood

Misclassifications can have impact on distribution

Occlusion Efficient Subwindow Search (OESS)

Probabilistic Matching Pattern

OESS for True Positive

Occlusion can be explained well: 95% explained

OESS for False Positive

Only 50% explained

OESS Scoring Matching Pattern

Without an occluding block, matched points ($p = 1$) score +1 and unmatched points ($p = 0$) score -1:

score = (1) + (1) + (-1) + (-1) = 0

An unmatched point inside the occluding block is rewarded (+1):

score = (1) + (1) + (1) + (-1) = 2

A matched point inside the occluding block is penalized (-1):

score = (-1) + (1) + (1) + (-1) = 0
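
The three scores on these slides can be reproduced with a short sketch. It assumes a binary matching pattern and a per-point flag saying whether the point lies inside the hypothesized occluding block. The function name and the ordering of the four points are illustrative choices.

```python
def oess_score(matched, occluded):
    """Score a binary matching pattern against a hypothesized occluding block.

    matched[i]  -- True if model point i was matched in the image (p = 1)
    occluded[i] -- True if point i lies inside the hypothesized occluding block
    A point scores +1 when the hypothesis explains it (visible and matched,
    or occluded and not matched) and -1 otherwise.
    """
    return sum(1 if m != o else -1 for m, o in zip(matched, occluded))

# Four points: two matched, two not matched.
matched = [True, True, False, False]

no_block = oess_score(matched, [False, False, False, False])   # 1 + 1 - 1 - 1
good_block = oess_score(matched, [False, False, True, False])  # 1 + 1 + 1 - 1
bad_block = oess_score(matched, [True, False, True, False])    # -1 + 1 + 1 - 1
```

A block that covers only unmatched points raises the score, while a block that also covers a matched point loses what it gained.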

OESS

Reformulate as Efficient Subwindow Search (ESS)

Find best occluder object

Remove all explained points

Iterate

Final prediction
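
The find, remove, iterate loop can be sketched as a greedy search. This sketch swaps in an exhaustive search over axis-aligned boxes on a tiny grid in place of the branch-and-bound search that ESS performs, and it ignores any physical constraints on occluders. The grid size, function names and the stopping rule (stop when no box has positive gain) are assumptions.

```python
from itertools import combinations_with_replacement

def box_gain(points, box):
    """Unmatched points inside the box count +1, matched points inside count -1."""
    x0, y0, x1, y1 = box
    gain = 0
    for x, y, matched in points:
        if x0 <= x <= x1 and y0 <= y <= y1:
            gain += -1 if matched else 1
    return gain

def best_box(points, size):
    """Exhaustive search over all axis-aligned boxes on a size x size grid."""
    spans = list(combinations_with_replacement(range(size), 2))  # (lo, hi), lo <= hi
    boxes = [(x0, y0, x1, y1) for x0, x1 in spans for y0, y1 in spans]
    return max(boxes, key=lambda b: box_gain(points, b))

def greedy_occluders(points, size, max_boxes=5):
    """Repeatedly pick the best box, drop the points it covers, and iterate."""
    remaining, occluders = list(points), []
    for _ in range(max_boxes):
        box = best_box(remaining, size)
        if box_gain(remaining, box) <= 0:
            break  # nothing left that an occluder would explain
        occluders.append(box)
        x0, y0, x1, y1 = box
        remaining = [p for p in remaining
                     if not (x0 <= p[0] <= x1 and y0 <= p[1] <= y1)]
    return occluders

# 4 x 4 grid of model points; the lower-right 2 x 2 corner is not matched.
points = [(x, y, not (x >= 2 and y >= 2)) for x in range(4) for y in range(4)]
occluders = greedy_occluders(points, size=4)
```

On this toy pattern a single box covering the unmatched corner explains all the missing points, so the loop stops after one iteration.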

Results

groundtruth predicted oboxes boundary region window detection

165


Occlusion Prediction Performance

predicted vs. groundtruth

Average Intersection over Union (IoU)
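
For reference, a minimal sketch of the IoU measure, assuming the predicted and groundtruth occlusions are each given as a set of occluded pixel coordinates. The function names and the convention for two empty sets are assumptions.

```python
def iou(predicted, groundtruth):
    """Intersection over union of two sets of occluded pixel coordinates."""
    predicted, groundtruth = set(predicted), set(groundtruth)
    union = predicted | groundtruth
    if not union:
        return 1.0  # both empty: treated here as perfect agreement (assumption)
    return len(predicted & groundtruth) / len(union)

def average_iou(pairs):
    """Mean IoU over (predicted, groundtruth) pairs."""
    return sum(iou(p, g) for p, g in pairs) / len(pairs)

# Two 2 x 2 pixel regions that overlap in one column.
predicted = {(0, 0), (0, 1), (1, 0), (1, 1)}
groundtruth = {(0, 1), (1, 1), (0, 2), (1, 2)}
overlap = iou(predicted, groundtruth)  # 2 shared pixels out of 6 in the union
mean_overlap = average_iou([(predicted, groundtruth), (predicted, predicted)])
```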


Summary Lack of Discriminative Features

Gradient Networks

Boundary and Region Templates


Occlusion Efficient Subwindow Search



Occlusions

Main Contributions: Ambiguous Keypoint Features

Making specific features less discriminative

Main Contributions: Representing Feature-poor Objects

Gradient Networks: explicit shape matching without extracting edges

Boundary and Region Templates: capture explicit boundary and region information





176





177

Main Contributions Occlusion Reasoning


Representing occlusion structure under arbitrary viewpoint


Directly search for occluding blocks to explain matching pattern

178






179






180

Acknowledgements

181

Martial Hebert, Alexei Efros, Takeo Kanade, Andrew Zisserman

182

183

184

Background

Augmented Reality

185

3D model Target environment

Augmented Reality

186

3D model Target environment

Instance vs. Category Recognition

187

Instance: arbitrary viewpoint and lighting; single image per view

Category: intra-class variations; many images per view

Ambiguous Viewpoint

188

Failure of SIFT Matching

189

Invariant Approaches

190

Future Directions

Fine-grained verification

Scalability

3D

Fine-grained Verification

192

Scalability

193

3D

194

195

Datasets

CMU Grocery Dataset

• 620 images of household objects
  – 10 objects
    • 25 single instance, 25 double instance
    • 12 with ground truth pose
  – Clutter, viewpoint, lighting, occlusion

CMU Kitchen Occlusion Dataset

• 1600 images of 8 household objects
• Single and multiple viewpoints
• Cluttered scenes and occlusions

Hsiao and Hebert, CVPR 2012.

Objects Example images

198

Gradient Networks


199

Region of influence Appearance Edge

Local Appearance

200

Gradient Orientation Color

Potentials

201

Pairwise

Unary

Message Passing

202

Shape Similarity

Probability Calibration

Scheirer et al. CVPR 2012

Figure: score distributions for NOT Object and Object. Axes: scores, density of NOT Object, probability of Object. Annotations: Weibull fit to tail of negative distribution; CDF of NOT Object.

Soft Shape Model

204

Additional Results

205

Color Potential

206

LINE2D Similarity

LINE2D (Hinterstoisser et al., PAMI 2011)

$$\mathrm{score}_{\mathrm{LINE2D}} = \sum_{i=1}^{N} \cos(\Delta\theta_i)$$

$\cos(0^\circ) = 1.00$, $\cos(45^\circ) = 0.71$

$\Delta\theta_i$: difference between the quantized gradient orientation of model point $p_i$ and the quantized gradient orientation of the best matching image point in a local neighborhood

Robust LINE2D Similarity

rLINE2D (Hsiao and Hebert, CVPR 2012)

$$\mathrm{score}_{\mathrm{rLINE2D}} = \sum_{i=1}^{N} \delta(\Delta\theta_i = 0)$$

$\Delta\theta_i$ is defined as for LINE2D: the quantized gradient orientation of model point $p_i$ compared with that of the best matching image point in a local neighborhood
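
A minimal sketch contrasting the two similarity scores. It assumes the per-point orientation differences have already been computed and are given in degrees. The function names and the example differences are illustrative, and the search for the best matching image point in the local neighborhood is not shown.

```python
import math

def line2d_score(delta_thetas_deg):
    """LINE2D-style similarity: sum of cos(delta_theta_i) over model points."""
    return sum(math.cos(math.radians(d)) for d in delta_thetas_deg)

def rline2d_score(delta_thetas_deg):
    """rLINE2D-style similarity: count points whose quantized orientation matches exactly."""
    return sum(1 for d in delta_thetas_deg if d == 0)

# Hypothetical template of four model points: two exact matches, two off by 45 degrees.
deltas = [0, 0, 45, 45]
soft = line2d_score(deltas)     # the 45-degree points still contribute about 0.71 each
robust = rline2d_score(deltas)  # only the exact matches count
```

The cosine score still gives partial credit to points that are a full quantization step away, whereas the robust variant counts only exact orientation matches.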

Message Passing Iterations

209

Probability Calibration

210

F-Measure of Shape Matching

211

Single View

212

Multiple View

213

Detection Rate @ 1.0 FPPI

214

Detection Rate @ 1.0 FPPI

215

False Positives

216

217

BaRT

Grid Optimization

Un-optimized: 57 cells

Optimized: 60 cells

218

HOG Normalization

219

Amplifies noise in uniform region!

HOG Normalization

220

Sensitive to shading effects!

HOG Normalization Pedestrians

221

Average Precision

222

Single View

223

Multiple View

224

False positives

Match both boundary and region

225

BaRT False Positives Insufficient edge evidence

Unlikely occlusion configuration

Region information is only informative after there is a plausible hypothesis based on the boundary

226

227

Occlusion Reasoning

Occlusion Model

228

Occlusion Scoring

Sliding window

Object detector

Occlusion hypothesis (binary)

Score of window

Occlusion model

229


230


Approximation

Analytic Approximate

231

Distribution of Physical Dimensions

Household Objects

232

Occlusion Statistics

233

Validity of Occlusion Model

234

Occlusion Penalty

Occlusion Prior Penalty (OPP)


235

Average Precision

236

Performance vs. Occlusion

237

Learning from Data

238

Parameter Sensitivity

239

240

OESS

Occlusion Upper Bound

241

OESS Algorithm

242

OESS vs. Brute Force

243

Occlusion Prediction

244

Object Detection Performance

245

246

Ambiguous Features

Problem

• Not enough correct matches

Result of our system Difficult to obtain matches

Discriminative hierarchical matching (DHM)

Model features (Level 0)

Quantized features (Level 1)

Quantized features (Level 2)

Image features

discriminative match



Candidate correspondences

aggregate

DHM example

All features

DHM result

DHM – 11 correct matches (soymilk can)

Ratio test – 3 correct matches (soymilk can)

Simulated Affine (SA)

Morel & Yu 2009

Baseline systems

• Gordon & Lowe
  – SIFT + RANSAC
  – Levenberg-Marquardt non-linear optimization
• Enhanced PnP (EPnP)
  – Gordon & Lowe
  – EPnP non-iterative pose estimation algorithm
• Collet et al.
  – Gordon & Lowe
  – Mean-shift spatial clustering of image features

Averaged precision-recall

Average Precision

Object detection results

Failure cases

Pose ambiguity

Repeated patterns

Extreme lighting, occlusion, viewpoint, etc.
