Object Category Detection: Parts-based Models (dhoiem.cs.illinois.edu/courses/vision_spring10...)

Object Category Detection: Parts-based Models

Computer Vision

CS 543 / ECE 549

University of Illinois

Derek Hoiem

03/30/10

Administrative stuff

• Returning homeworks

Administrative stuff

• Projects: next class, each group gives a 2-minute summary of projects
– Goal
– Progress so far
– If you want to show an image or figure, e-mail me one image or PowerPoint slide by Wed 5pm

• Deadlines
– HW 3 due April 6
– HW 4 due May 4 (should be out April 9)
– Projects due in ~5 weeks

• Poster session
– During finals week: May 11 12:30-2:30 (Tues) or May 13 12:30-2:30 (Thurs)

Goal: Detect all instances of objects

Cars

Faces

Cats

Object model: last class

• Statistical Template in Bounding Box
– Object is some (x,y,w,h) in image
– Features defined wrt bounding box coordinates

Image Template Visualization

Images from Felzenszwalb

Last class: sliding window detection

Last class: statistical template

• Object model = log linear model of parts at fixed positions

Window 1: +3 +2 -2 -1 -2.5 = -0.5. Is it > 7.5? No: Non-object

Window 2: +4 +1 +0.5 +3 +0.5 = 9. Is it > 7.5? Yes: Object
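As a rough illustration of the statistical template decision rule, the sketch below sums per-part scores for a window and compares the total with a threshold. The function names are hypothetical; the part scores and the 7.5 threshold are the ones shown in the example above.

```python
def template_score(part_scores):
    """Sum the per-part contributions of a log-linear template for one window."""
    return sum(part_scores)


def classify_window(part_scores, threshold=7.5):
    """Label a window as an object when its summed score exceeds the threshold."""
    return "Object" if template_score(part_scores) > threshold else "Non-object"


# Per-part scores of the two example windows.
window_1 = [3, 2, -2, -1, -2.5]
window_2 = [4, 1, 0.5, 3, 0.5]

for name, scores in (("window 1", window_1), ("window 2", window_2)):
    print(name, template_score(scores), classify_window(scores))
```

In a sliding window detector the same rule would be applied to every candidate box, with the part scores recomputed from the features inside that box.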

When do statistical templates make sense?

Caltech 101 Average Object Images

Object models: this class

• Articulated parts model
– Object is configuration of parts
– Each part is detectable

Images from Felzenszwalb

Deformable objects

Images from Caltech-256

Slide Credit: Duan Tran

Deformable objects

Images from D. Ramanan’s dataset

Slide Credit: Duan Tran

Compositional objects

Parts-based Models

Define object by collection of parts modeled by

1. Appearance

2. Spatial configuration

Slide credit: Rob Fergus

How to model spatial relations?

• One extreme: fixed template


• Another extreme: bag of words

• Star-shaped model

[Figure: star-shaped model, a Root node linked to five Part nodes]

How to model spatial relations?

• Tree-shaped model


Fergus et al. ’03; Fei-Fei et al. ’03

Leibe et al. ’04, ’08; Crandall et al. ’05; Fergus et al. ’05

Crandall et al. ’05; Felzenszwalb & Huttenlocher ’05

Bouchard & Triggs ’05; Carneiro & Lowe ’06; Csurka ’04; Vasconcelos ’00

from [Carneiro & Lowe, ECCV’06]

O(N^6) O(N^2) O(N^3) O(N^2)

• Many others...

Today’s class

1. Star-shaped model
– Example: ISM
• Leibe et al. 2004, 2008

2. Tree-shaped model
– Example: Pictorial structures
• Felzenszwalb & Huttenlocher 2005

[Figure: a Root node linked to five Part nodes]

http://www.cognitivesystems.org/publications/fulltext.pdf

http://www.cs.cornell.edu/~dph/papers/pict-struct-ijcv.pdf

ISM: Implicit Shape Model

Training overview

• Start with bounding boxes and (ideally) segmentations of objects

• Extract local features (e.g., patches or SIFT) at interest points on objects

• Cluster features to create codebook

• Record relative bounding box and segmentation for each codeword

ISM: Implicit Shape Model

Testing overview

• Extract interest points in test image

• Softly match to codebook entries

• Each matched codeword votes for object bounding box

• Compute modes of votes using mean-shift

• Check which codewords voted for modes

• Refine

K. Grauman, B. Leibe

Codebook Representation

• Extraction of local object features
– Interest points (e.g. Harris detector)
– Sparse representation of the object appearance

• Collect features from whole training set

• Example:


Agglomerative Clustering

• Algorithm (Average-Link)
1. Start with each patch as a cluster of its own
2. Repeatedly merge the two most similar clusters X and Y, where the similarity between two clusters is defined as the average similarity between their members
3. Until $\mathrm{sim}(X, Y) < \theta$

• Commonly used similarity measures
– Normalized correlation
– Euclidean distances
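The average-link procedure above can be sketched in a few lines. This is a minimal, unoptimized illustration, assuming patches are small tuples of numbers and that similarity is the negative Euclidean distance (so larger means more similar); the function names and the toy data are hypothetical.

```python
import math


def similarity(a, b):
    """Negative Euclidean distance: an assumed similarity, larger is more similar."""
    return -math.dist(a, b)


def average_link(cluster_x, cluster_y):
    """Average pairwise similarity between the members of two clusters."""
    total = sum(similarity(a, b) for a in cluster_x for b in cluster_y)
    return total / (len(cluster_x) * len(cluster_y))


def agglomerate(patches, theta):
    """Merge the two most similar clusters until their similarity falls below theta."""
    clusters = [[p] for p in patches]  # step 1: each patch is its own cluster
    while len(clusters) > 1:
        best_sim, (i, j) = max(
            (average_link(clusters[i], clusters[j]), (i, j))
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best_sim < theta:  # step 3: stop once sim(X, Y) < theta
            break
        clusters[i] = clusters[i] + clusters[j]  # step 2: merge X and Y
        del clusters[j]
    return clusters


def codebook(clusters):
    """Cluster centers (per-dimension means), kept as the appearance codebook."""
    return [tuple(sum(dim) / len(c) for dim in zip(*c)) for c in clusters]


patches = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
clusters = agglomerate(patches, theta=-1.0)
print(codebook(clusters))
```

This brute-force version recomputes every pairwise cluster similarity on each merge, which is only reasonable for small patch sets; a practical codebook would likely use a more efficient clustering routine.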


Appearance Codebook

• Clustering Results
– Visual similarity preserved
– Wheel parts, window corners, fenders, ...
– Store cluster centers as Appearance Codebook

…


Voting with Local Features

• For every feature, store possible “occurrences”

• For new image, let the matched features vote for possible object positions

Record relative size and scale of object

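Implicit Shape Model - Recognition

[Figure: Interest Points, then Matched Codebook Entries, then Probabilistic Voting in a continuous 3D voting space (x, y, s)]

Notation: object position $o, x$; image feature $f$ at location $\ell$; interpretation (codebook match) $C_i$

Matching gives $p(C_i \mid f)$ and each codeword casts votes $p(o_n, x \mid C_i, \ell)$, so that

$$p(o_n, x \mid f, \ell) = \sum_i p(o_n, x \mid C_i, \ell)\, p(C_i \mid f)$$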

[Leibe04, Leibe08]
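The probabilistic voting step can be sketched as follows. This is an illustrative simplification, not the published implementation: it assumes each codeword stores its training occurrences as hypothetical (dx, dy, ds) offsets measured in units of the feature scale, and that the vote distribution of a codeword is uniform over its stored occurrences.

```python
def cast_votes(features, matches, occurrences):
    """Return weighted votes (x, y, s, w) for the object center and scale.

    features:    list of (x, y, scale) interest points in the test image
    matches:     matches[k] = {codeword_id: p(C_i | f_k)}, the soft match weights
    occurrences: occurrences[codeword_id] = list of (dx, dy, ds) from training
                 (hypothetical layout: offset to the object center and ratio of
                 object scale to feature scale)
    """
    votes = []
    for (x, y, s), match in zip(features, matches):
        for codeword, p_match in match.items():
            occs = occurrences[codeword]
            p_occ = 1.0 / len(occs)  # p(o_n, x | C_i, l), assumed uniform
            for dx, dy, ds in occs:
                # Vote weight is p(o_n, x | C_i, l) * p(C_i | f).
                votes.append((x + s * dx, y + s * dy, s * ds, p_match * p_occ))
    return votes


features = [(10, 10, 1), (30, 10, 2)]
matches = [{0: 1.0}, {0: 0.5, 1: 0.5}]
occurrences = {0: [(10, 5, 4)], 1: [(-5, 2.5, 2), (0, 0, 1)]}

for vote in cast_votes(features, matches, occurrences):
    print(vote)
```

With these weights, the votes cast by a single feature sum to one, so no feature dominates the voting space just because it matched many codewords.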


Scale Voting: Efficient Computation

• Mean-Shift formulation for refinement
– Scale-adaptive balloon density estimator

[Figure: panels over (x, y, s): Scale votes, Binned accum. array, Candidate maxima, Refinement (MSME)]
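A rough sketch of the two-stage search: votes are first binned into a coarse accumulator to find candidate maxima, and each candidate is then refined by mean shift. For brevity it assumes a fixed-bandwidth Gaussian kernel and a single bin size for all three axes, which is simpler than the scale-adaptive balloon density estimator named above; all function names and values are hypothetical.

```python
import math
from collections import defaultdict


def candidate_maxima(votes, bin_size, min_weight):
    """Bin weighted votes (x, y, s, w) and return the centers of heavy bins."""
    bins = defaultdict(float)
    for x, y, s, w in votes:
        key = (int(x // bin_size), int(y // bin_size), int(s // bin_size))
        bins[key] += w
    return [
        tuple((index + 0.5) * bin_size for index in key)
        for key, weight in bins.items()
        if weight >= min_weight
    ]


def mean_shift(votes, start, bandwidth, iterations=20):
    """Refine one candidate by moving it to the kernel-weighted mean of the votes."""
    mode = tuple(start)
    for _ in range(iterations):
        total = 0.0
        acc = [0.0, 0.0, 0.0]
        for x, y, s, w in votes:
            d2 = sum((a - b) ** 2 for a, b in zip((x, y, s), mode))
            k = w * math.exp(-d2 / (2.0 * bandwidth ** 2))
            total += k
            for axis, value in enumerate((x, y, s)):
                acc[axis] += k * value
        if total == 0.0:  # no vote within numerical reach of the current mode
            break
        mode = tuple(a / total for a in acc)
    return mode


votes = [(20, 15, 4, 1.0), (21, 15, 4, 0.5), (19, 15, 4, 0.5), (80, 80, 8, 0.2)]
for candidate in candidate_maxima(votes, bin_size=10, min_weight=1.0):
    print(candidate, "->", mean_shift(votes, candidate, bandwidth=3.0))
```

The coarse bins only need to be good enough to seed the refinement; the mean-shift step is what recovers a mode that straddles a bin boundary.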

Implicit Shape Model - Recognition

[Figure: Interest Points, Matched Codebook Entries, Probabilistic Voting in the (x, y, s) voting space, Backprojection of Maxima, Backprojected Hypotheses]

[Leibe04, Leibe08]


Example: Results on Cows

[Figure sequence: Original image; Interest points; Matched patches; Prob. Votes; 1st hypothesis; 2nd hypothesis; 3rd hypothesis]

ISM: Detection Results

• Qualitative Performance
– Recognizes different kinds of objects
– Robust to clutter, occlusion, noise, low contrast


Beyond bounding boxes

Backprojected codewords can vote:

• Pixel segmentation

• Part layout

• Pose

• Depth values

[Figure: Backprojected Hypotheses from the (x, y, s) voting space]



Segmentation: Probabilistic Formulation

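• Influence of patch on object hypothesis (vote weight)

$$p(f, \ell \mid o_n, x) = \frac{\sum_i p(o_n, x \mid C_i, \ell)\, p(C_i \mid f)\, p(f, \ell)}{p(o_n, x)}$$

• Backprojection to features f and pixels p:

$$p(\mathbf{p} = \text{figure} \mid o_n, x) = \sum_{\mathbf{p} \in (f, \ell)} \underbrace{p(\mathbf{p} = \text{figure} \mid f, \ell, o_n, x)}_{\text{Segmentation information}}\; \underbrace{p(f, \ell \mid o_n, x)}_{\text{Influence on object hypothesis}}$$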

[Leibe04, Leibe08]
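The backprojection sum can be sketched as an accumulation over the patches that support a hypothesis. This is a simplified illustration under assumed data structures: each contribution carries a hypothetical per-pixel figure mask (the segmentation information stored with the matched occurrence) and a scalar weight (the patch's influence on the hypothesis).

```python
def figure_probability(height, width, contributions):
    """Accumulate p(p = figure | o_n, x) for every pixel of an image.

    contributions: list of (top, left, mask, weight), where mask[r][c] is
    p(p = figure | f, l, o_n, x) for the patch placed at (top, left) and
    weight is p(f, l | o_n, x) for that patch.
    """
    prob = [[0.0] * width for _ in range(height)]
    for top, left, mask, weight in contributions:
        for r, row in enumerate(mask):
            for c, p_fig in enumerate(row):
                rr, cc = top + r, left + c
                if 0 <= rr < height and 0 <= cc < width:  # clip to the image
                    prob[rr][cc] += p_fig * weight
    return prob


# Two overlapping 1x2 patches on a 2x3 image (toy values).
contributions = [
    (0, 0, [[1.0, 0.5]], 0.6),
    (0, 1, [[1.0, 1.0]], 0.4),
]
for row in figure_probability(2, 3, contributions):
    print(row)
```

A ground probability map could be accumulated the same way from the complementary masks, and a pixel labeled figure where its figure probability is the larger of the two.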


ISM – Top-Down Segmentation

[Figure: Backprojected Hypotheses from the continuous 3D voting space (x, y, s), giving p(figure) probabilities and the resulting Segmentation]

[Leibe04, Leibe08]

K. Grauman, B. Leibe

Example Results: Motorbikes

B. Leibe

Example Results: Chairs

Office chairs

Dining room chairs


Inferring Other Information: Part Labels

Training

Test Output

[Thomas07]


Inferring Other Information: Part Labels (2)

[Thomas07]


Inferring Other Information: Depth Maps

“Depth from a single image”

[Thomas07]

Tree-shaped model

Pictorial Structures Model

Part = oriented rectangle

Spatial model = relative size/orientation

Felzenszwalb and Huttenlocher 2005

Pictorial Structures Model

Appearance likelihood Geometry likelihood

Pictorial structures model

Optimization is tricky but can be efficient

Maximization

• For each l1, find best l2:

• Remove v2, and repeat with smaller tree, until only a single part

• For n parts, k locations per part, this has complexity of O(nk^2), but can be solved in ~O(nk) using a generalized distance transform
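The leaf-elimination procedure above can be sketched as dynamic programming on the tree. This is the plain O(nk^2) version without the generalized distance transform, written as cost minimization (negative log-likelihood) rather than likelihood maximization; the function name, the cost tables, and the assumption that parents have smaller indices than their children are all illustrative choices.

```python
def best_configuration(parent, unary, pairwise):
    """Minimize the sum of unary and parent-child costs over a tree of parts.

    parent[v]:  index of the parent of part v (None for the root, part 0);
                parents are assumed to have smaller indices than their children
    unary[v][l]: appearance cost of placing part v at location l
    pairwise(v, lp, l): deformation cost of part v at l given its parent at lp
    """
    n, k = len(unary), len(unary[0])
    cost = [list(u) for u in unary]        # best cost of the subtree rooted at v
    argbest = [[0] * k for _ in range(n)]  # best location of v per parent location
    for v in range(n - 1, 0, -1):          # eliminate leaves first
        p = parent[v]
        for lp in range(k):
            best = min(range(k), key=lambda l: cost[v][l] + pairwise(v, lp, l))
            argbest[v][lp] = best
            cost[p][lp] += cost[v][best] + pairwise(v, lp, best)
    locations = [0] * n
    locations[0] = min(range(k), key=lambda l: cost[0][l])
    for v in range(1, n):                  # backtrack from the root
        locations[v] = argbest[v][locations[parent[v]]]
    return locations, cost[0][locations[0]]


# Toy example: a root with two children and three candidate locations per part.
parent = [None, 0, 0]
unary = [[0, 5, 5], [5, 0, 5], [5, 5, 0]]
print(best_configuration(parent, unary, lambda v, lp, l: abs(lp - l)))
```

The inner minimization over child locations is the step that costs O(k) per parent location; replacing it with a distance transform is what brings the total down when the pairwise cost has a suitable form, such as a quadratic in the displacement.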

Pictorial structures model

Optimization is tricky but can be efficient

Sampling

• Sample root node, then each node given parent, until all parts are sampled

Sample poses from likelihood and choose best match with Chamfer distance
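Sampling on the tree can be sketched as ancestral sampling: draw the root location, then draw each part conditioned on its already-sampled parent. The interface below is hypothetical (discrete locations, a caller-supplied conditional table), and the Chamfer-distance scoring of the sampled poses is not shown.

```python
import random


def sample_configuration(parent, root_probs, conditional, rng):
    """Sample one location per part, root first and then each part given its parent.

    parent[v]:   index of the parent of part v (None for the root, part 0);
                 parents are assumed to have smaller indices than their children
    root_probs:  probabilities over the root's candidate locations
    conditional(v, lp): probabilities over locations of part v given parent at lp
    rng:         a random.Random instance, passed in for reproducibility
    """
    locations = [None] * len(parent)
    locations[0] = rng.choices(range(len(root_probs)), weights=root_probs)[0]
    for v in range(1, len(parent)):
        probs = conditional(v, locations[parent[v]])
        locations[v] = rng.choices(range(len(probs)), weights=probs)[0]
    return locations


def near_parent(v, lp, k=5):
    """Toy conditional: a part prefers locations close to its parent's location."""
    weights = [1.0 / (1 + abs(l - lp)) for l in range(k)]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
samples = [sample_configuration([None, 0, 0], [0.2] * 5, near_parent, rng) for _ in range(3)]
print(samples)
```

Drawing several configurations this way and keeping the one with the best image match is one way to use the sampler.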

Results for person matching


Results for person matching


Recently enhanced pictorial structures

BMVC 2009

Things to remember

• Rather than searching for whole object, can locate “parts” that vote for object
– Better encoding of spatial variation

• These parts can vote for other things too

• Models can be broken down into part appearance and spatial configuration
– Wide variety of models

• Efficient optimization is often tricky, but many tricks available

Next class

• Each group gives 2 minute summary of projects

– Goal

– Progress so far

– If you want to show an image or figure, e-mail me one image or powerpoint slide by Wed 5pm

• Review of object recognition
