Simultaneous Segmentation and 3D Pose Estimation of Humans or Detection + Segmentation = Tracking?

Date post: 15-Jan-2016
Transcript
Page 1: Simultaneous Segmentation and 3D Pose Estimation of Humans or Detection + Segmentation = Tracking?

Simultaneous Segmentation and 3D Pose Estimation of Humans

or Detection + Segmentation = Tracking?

Philip H.S. Torr

Pawan Kumar, Pushmeet Kohli, Matt Bray

Oxford Brookes University

Andrew Zisserman

Oxford

Arasanathan Thayananthan, Bjorn Stenger, Roberto Cipolla

Cambridge

Page 2

Algebra

Unifying Conjecture

Tracking = Detection = Recognition

Detection = Segmentation

• therefore Tracking (pose estimation) = Segmentation?

Page 3

Objective

[Figure panels: Image; Segmentation; Pose Estimate??]

Aim to get a clean segmentation of a human…

Page 4

Developments

ICCV 2003, pose estimation as fast nearest neighbour plus dynamics (inspired by Gavrila and by Toyama & Blake)

BMVC 2004, parts-based chamfer to make the space of templates more flexible (à la the pictorial structures of Huttenlocher)

CVPR 2005, ObjCut, combining segmentation and detection.

ECCV 2006, interpolation of poses using the MVRVM (Agarwal and Triggs)

ECCV 2006, combination of pose estimation and segmentation using graph cuts.

Page 5

Tracking as Detection (Stenger et al ICCV 2003)

Detection has become very efficient, e.g. real-time face detection, pedestrian detection

Example: Pedestrian detection [Gavrila & Philomin, 1999]: find a match among a large number of exemplar templates

Issues:
• Number of templates needed
• Efficient search
• Robust cost function

Page 6

Cascaded Classifiers

Page 7

First filter: 19.8% of patches remaining

1280x1024 image, 11 subsampling levels, 80 s

Average number of filters per patch: 6.7

Page 8

Filter 10: 0.74% of patches remaining

Page 9

Filter 20: 0.06% of patches remaining

Page 10

Filter 30: 0.01% of patches remaining

Page 11

Filter 70: 0.007% of patches remaining
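
The rejection behaviour shown on these cascade slides can be sketched in a few lines. This is a minimal illustration, not the detector used in the talk: it assumes each filter is a cheap boolean test and that a patch is discarded at the first test it fails. The names `run_cascade`, `toy_patches` and `toy_filters` are hypothetical, and the toy data are made up. It shows why the average number of filters evaluated per patch stays small even when the cascade is long.

```python
def run_cascade(patches, filters):
    """Return (surviving patches, average number of filters evaluated per patch)."""
    survivors = []
    evaluations = 0
    for patch in patches:
        passed = True
        for accept in filters:
            evaluations += 1
            if not accept(patch):
                # Early rejection: later, more expensive filters are never run.
                passed = False
                break
        if passed:
            survivors.append(patch)
    average = evaluations / len(patches) if patches else 0.0
    return survivors, average


# Hypothetical toy data: patches are integers, filters are increasingly strict.
toy_patches = list(range(100))
toy_filters = [lambda p: p >= 50, lambda p: p >= 80, lambda p: p >= 95]

survivors, average = run_cascade(toy_patches, toy_filters)
print(len(survivors), average)
```

Most patches fail the first test, so the average cost per patch is dominated by the cheapest filter.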

Page 12

Hierarchical Detection

• Efficient template matching (Huttenlocher & Olson, Gavrila)
• Idea: when matching similar objects, speed up by forming a template hierarchy found by clustering
• Match prototypes first, then the sub-tree only if the cost is below a threshold
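
The coarse-to-fine search described above can be sketched as a recursive tree traversal. This is an illustrative sketch under stated assumptions: templates are short tuples, the matching cost is a sum of absolute differences (a stand-in for the chamfer cost), and each tree level has its own threshold. `TemplateNode`, `cost`, `search` and the example tree are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TemplateNode:
    prototype: tuple                      # cluster prototype (or a leaf template)
    children: list = field(default_factory=list)


def cost(template, observation):
    # Stand-in matching cost; a real system would use e.g. a chamfer cost.
    return sum(abs(a - b) for a, b in zip(template, observation))


def search(node, observation, thresholds, stats, depth=0):
    """Return (cost, template) of the best leaf, or None if everything was pruned."""
    stats["evaluated"] += 1
    c = cost(node.prototype, observation)
    if c > thresholds[depth]:
        return None                       # prune the whole sub-tree
    if not node.children:
        return (c, node.prototype)
    found = [search(child, observation, thresholds, stats, depth + 1)
             for child in node.children]
    found = [f for f in found if f is not None]
    return min(found) if found else None


root = TemplateNode((5, 5), [
    TemplateNode((2, 2), [TemplateNode((1, 1)), TemplateNode((2, 3)), TemplateNode((3, 2))]),
    TemplateNode((8, 8), [TemplateNode((7, 8)), TemplateNode((9, 9))]),
])
stats = {"evaluated": 0}
best = search(root, (2, 3), thresholds=[10, 3, 1], stats=stats)
print(best, stats["evaluated"])
```

In this toy tree the sub-tree under prototype `(8, 8)` is rejected after a single evaluation, so its leaves are never matched.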

Page 13

Trees

These search trees are the same as used for efficient nearest neighbour.

Add a dynamic model and

• Detection = Tracking = Recognition

Page 14

Evaluation at Multiple Resolutions

One traversal of tree per time step

Page 15

Evaluation at Multiple Resolutions

Tree: 9000 templates of hand pointing, rigid

Page 16

Templates at Level 1

Page 17

Templates at Level 2

Page 18

Templates at Level 3

Page 19

Comparison with Particle Filters

This method is grid based:
• No need to render the model online
• Like efficient search
• Can always use this as a proposal process for a particle filter if need be.

Page 20

Interpolation, MVRVM, ECCV 2006

Code available.

Page 21

Energy being Optimized, link to graph cuts

Combination of
• Edge term (quickly evaluated using chamfer)
• Interior term (quickly evaluated using integral images)

Note that possible templates are a bit like cuts that we put down, one could think of this whole process as a constrained search for the best graph cut.
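
To make the integral-image remark concrete, here is a minimal sketch of a summed-area table. It assumes the interior term can be approximated by sums of per-pixel scores over axis-aligned rectangles, which is an assumption made for illustration rather than something stated on the slide. `integral_image` and `box_sum` are hypothetical helper names.

```python
def integral_image(image):
    """Summed-area table with one extra row and column of zeros."""
    rows, cols = len(image), len(image[0])
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            table[r + 1][c + 1] = (image[r][c] + table[r][c + 1]
                                   + table[r + 1][c] - table[r][c])
    return table


def box_sum(table, top, left, bottom, right):
    """Sum of image[top:bottom][left:right] using four table look-ups."""
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


# Toy per-pixel scores (e.g. colour log-likelihoods); values are made up.
scores = [[1, 2, 3, 4],
          [5, 6, 7, 8],
          [9, 10, 11, 12]]
table = integral_image(scores)
print(box_sum(table, 1, 1, 3, 3))
```

After the one-off table construction, every rectangle sum costs four look-ups regardless of its size, which is what makes the interior term cheap to evaluate for many templates.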

Page 22

Likelihood : Edges

[Figure: Input Image → Edge Detection; 3D Model → Projected Contours; both feed Robust Edge Matching]

Page 23

Chamfer Matching

[Figure panels: Input image; Canny edges; Distance transform; Projected Contours]
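
The pipeline in this figure (edges, distance transform, contour look-up) can be sketched as below. This is an illustration only: it uses a brute-force distance transform for clarity, whereas practical systems use a fast two-pass or linear-time transform, and a robust variant would additionally truncate large distances. The edge map, the contour and the function names are hypothetical.

```python
import math


def distance_transform(edges):
    """Brute-force Euclidean distance from every pixel to the nearest edge pixel."""
    edge_points = [(r, c) for r, row in enumerate(edges)
                   for c, value in enumerate(row) if value]
    return [[min(math.hypot(r - er, c - ec) for er, ec in edge_points)
             for c in range(len(edges[0]))]
            for r in range(len(edges))]


def chamfer_cost(dt, contour, offset):
    """Mean distance-transform value under the template contour placed at offset."""
    dr, dc = offset
    return sum(dt[r + dr][c + dc] for r, c in contour) / len(contour)


# Toy edge map with a short vertical edge, and a vertical three-point contour.
edges = [[0, 0, 0, 0, 0],
         [0, 0, 0, 1, 0],
         [0, 0, 0, 1, 0],
         [0, 0, 0, 1, 0],
         [0, 0, 0, 0, 0]]
contour = [(0, 0), (1, 0), (2, 0)]

dt = distance_transform(edges)
height = max(r for r, _ in contour) + 1
width = max(c for _, c in contour) + 1
best = min((chamfer_cost(dt, contour, (dr, dc)), (dr, dc))
           for dr in range(len(dt) - height + 1)
           for dc in range(len(dt[0]) - width + 1))
print(best)
```

The distance transform is computed once per image; each template placement then costs only one look-up per contour point.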

Page 24

Likelihood : Colour

[Figure: Input Image → Skin Colour Model; 3D Model → Projected Silhouette; both feed Template Matching]

Page 25

Template Matching = constrained search for a cut/segmentation?

Detection = Segmentation?

Page 26

Objective

[Figure panels: Image; Segmentation; Pose Estimate??]

Aim to get a clean segmentation of a human…

Page 27

MRF for Interactive Image Segmentation, Boykov and Jolly [ICCV 2001]

[Figure: MRF over the image, showing Data (D), unary likelihood, pair-wise terms and the MAP solution]

Energy = Unary likelihood + Contrast Term + Uniform Prior (Potts Model)

Maximum-a-posteriori (MAP) solution: x* = arg min_x E(x)
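
The term labels on this slide match the usual way the interactive-segmentation energy is written. The following is a sketch of that standard form, not a transcription of the slide's equation (which did not survive extraction). The symbols are assumptions: $x_i$ is the binary label of pixel $i$, $D$ is the image data, $\mathcal{N}_i$ is the neighbourhood of pixel $i$, $\phi(D \mid x_i)$ is the unary likelihood, $\phi(D \mid x_i, x_j)$ is the contrast term and $\psi(x_i, x_j)$ is the Potts prior.

```latex
E(\mathbf{x}) = \sum_{i} \Big( \phi(D \mid x_i)
  + \sum_{j \in \mathcal{N}_i} \big( \phi(D \mid x_i, x_j) + \psi(x_i, x_j) \big) \Big),
\qquad
\mathbf{x}^{*} = \arg\min_{\mathbf{x}} E(\mathbf{x})
```

For binary labels with submodular pairwise terms, the minimizer $\mathbf{x}^{*}$ can be found exactly with a single graph cut.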

Page 28

However…

This energy formulation rarely provides realistic (target-like) results.

Page 29

Shape-Priors and Segmentation

Combine object detection with segmentation
• Obj-Cut, Kumar et al., CVPR ’05
• Zhao and Davis, ICCV ’05

Obj-Cut
• Shape-Prior: Layered Pictorial Structure (LPS)
• Learned exemplars for parts of the LPS model
• Obtained impressive results

[Figure: Layer 1 + Layer 2 = LPS model]

Page 30

LPS for Detection

Learning
• Learnt automatically using a set of examples

Detection
• Tree of chamfers to detect parts, assembled with pictorial structure and belief propagation.

Page 31

Solve via Integer Programming

SDP formulation (Torr 2001, AISTATS)

SOCP formulation (Kumar, Torr & Zisserman this conference)

LBP (Huttenlocher, many)

Page 32

Obj-Cut

Image

Likelihood Ratio (Colour)

Shape Prior Distance from

Likelihood + Distance from

Page 33

Integrating Shape-Prior in MRFs

[Figure: MRF for segmentation. Labels defined over pixels, with unary potential, pairwise potential and prior (Potts model)]

Page 34

Integrating Shape-Prior in MRFs

[Figure: Pose-specific MRF. Labels defined over pixels, with pose parameters, unary potential, pairwise potential and prior (Potts model)]

Page 35

[Figure: Cow Instance composed of Layer 1 and Layer 2 under Transformations Θ1, with P(Θ1) = 0.9]

Do we really need accurate models?

Page 36

Do we really need accurate models?

Segmentation boundary can be extracted from edges

Rough 3D Shape-prior enough for region disambiguation

Page 37

Energy of the Pose-specific MRF

[Equation: energy to be minimized, with labelled terms: unary term, shape prior, pairwise potential, Potts model]

But what should be the value of θ?
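
The four labelled terms suggest an energy of the following shape. This is a hedged reconstruction from the term names on the slide, not the slide's own equation. The symbols are assumptions: $x_i$ is the label of pixel $i$, $D$ is the image data, $\theta$ is the pose, $\phi(D \mid x_i)$ is the unary term, $\phi(x_i \mid \theta)$ is the shape prior, $\phi(D \mid x_i, x_j)$ is the contrast-dependent pairwise potential and $\psi(x_i, x_j)$ is the Potts model.

```latex
E(\mathbf{x}, \theta) = \sum_{i} \Big( \phi(D \mid x_i) + \phi(x_i \mid \theta)
  + \sum_{j \in \mathcal{N}_i} \big( \phi(D \mid x_i, x_j) + \psi(x_i, x_j) \big) \Big)

\theta^{*} = \arg\min_{\theta} \; \min_{\mathbf{x}} E(\mathbf{x}, \theta)
```

Under this reading, choosing $\theta$ is an outer optimization in which every evaluation of the objective requires an inner minimization over the labelling $\mathbf{x}$, that is, one graph cut per candidate pose.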

Page 38

The different terms of the MRF

[Figure panels]
• Original image
• Likelihood of being foreground given a foreground histogram
• Grimson-Stauffer segmentation
• Shape prior model
• Shape prior (distance transform)
• Likelihood of being foreground given all the terms
• Resulting Graph-Cuts segmentation

Page 39

Can segment multiple views simultaneously

Page 40

Solve via gradient descent

Comparable to level set methods

Could use other approaches (e.g. Objcut)

Need a graph cut per function evaluation

Page 41

Formulating the Pose Inference Problem

Page 42

But…

… computing the MAP of E(x) w.r.t. the pose means that the unary terms change at EACH iteration and the max-flow must be recomputed!

However…

Kohli and Torr showed how dynamic graph cuts can be used to efficiently find MAP solutions for MRFs that change minimally from one time instant to the next: Dynamic Graph Cuts (ICCV05).

Page 43

Dynamic Graph Cuts

[Figure: Problem PA is solved to give solution SA (computationally expensive operation). When A and B are similar, the differences between A and B turn PB into a simpler problem PB*, which gives SB by a cheaper operation.]

Page 44

Dynamic Image Segmentation

[Figure panels: Image; Flows in n-edges; Segmentation Obtained]

Page 45

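Our Algorithm

[Figure: The first segmentation problem (graph Ga) is solved by maximum flow, giving its MAP solution and the residual graph (Gr). The difference between Ga and Gb is used to update the residual graph to G', from which the second segmentation problem (Gb) is solved.]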
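
The idea of reusing the previous flow when only the unary terms change can be sketched as follows. This is a simplified illustration, not the authors' implementation: it handles changes to terminal (unary) capacities only, uses a plain shortest-augmenting-path max-flow, and does not recycle search trees. When a new terminal capacity falls below the flow already routed through it, the shortfall is added to both terminal arcs of that pixel, which shifts the energy by a constant without changing the minimizer. The class `ReusableFlow`, the helper `build` and the toy numbers are hypothetical.

```python
from collections import defaultdict, deque


class ReusableFlow:
    """Max-flow solver that keeps its flow so a slightly changed problem can start from it."""

    def __init__(self, source="s", sink="t"):
        self.s, self.t = source, sink
        self.cap = defaultdict(lambda: defaultdict(int))
        self.flow = defaultdict(lambda: defaultdict(int))
        self.offset = {}  # per-pixel constant added by reparameterisation

    def add_edge(self, u, v, capacity):
        self.cap[u][v] += capacity
        self.cap[v][u] += 0  # register the reverse arc for the residual graph

    def set_tlinks(self, pixel, from_source, to_sink):
        """Replace both terminal capacities of a pixel, keeping the stored flow feasible."""
        d = max(0,
                self.flow[self.s][pixel] - from_source,
                self.flow[pixel][self.t] - to_sink)
        # Adding d to BOTH terminal arcs raises every cut by d, so the best cut is unchanged.
        self.cap[self.s][pixel] = from_source + d
        self.cap[pixel][self.t] = to_sink + d
        self.cap[pixel][self.s] += 0
        self.cap[self.t][pixel] += 0
        self.offset[pixel] = d

    def solve(self):
        """Augment from the stored flow; return (minimum cut value, augmentations used)."""
        augmentations = 0
        while True:
            parent = {self.s: None}
            queue = deque([self.s])
            while queue and self.t not in parent:
                u = queue.popleft()
                for v in list(self.cap[u]):
                    if v not in parent and self.cap[u][v] - self.flow[u][v] > 0:
                        parent[v] = u
                        queue.append(v)
            if self.t not in parent:
                break
            path, v = [], self.t
            while parent[v] is not None:
                path.append((parent[v], v))
                v = parent[v]
            push = min(self.cap[u][v] - self.flow[u][v] for u, v in path)
            for u, v in path:
                self.flow[u][v] += push
                self.flow[v][u] -= push
            augmentations += 1
        value = sum(self.flow[self.s].values()) - sum(self.offset.values())
        return value, augmentations


def build(unaries, smoothness):
    """Chain of pixels; unaries[i] = (source-link capacity, sink-link capacity)."""
    g = ReusableFlow()
    for i, (a, b) in enumerate(unaries):
        g.set_tlinks(i, a, b)
        if i > 0:
            g.add_edge(i - 1, i, smoothness)
            g.add_edge(i, i - 1, smoothness)
    return g


first = [(9, 1), (8, 2), (2, 8), (1, 9)]
second = [(9, 1), (7, 3), (6, 4), (1, 9)]  # only pixels 1 and 2 change

g = build(first, 3)
value_a, work_a = g.solve()
for i, (a, b) in enumerate(second):
    g.set_tlinks(i, a, b)
value_b, work_b = g.solve()                # continues from the stored flow
scratch_value, scratch_work = build(second, 3).solve()
print(value_a, value_b, scratch_value, work_b, scratch_work)
```

The warm-started solve returns the same minimum as solving the second problem from scratch. How many augmentations it saves depends on how much the problem changed, so the counts are returned for inspection rather than asserted.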

Page 46

Dynamic Graph Cut vs Active Cuts

Our method: flow recycling

AC (Active Cuts): cut recycling

Both methods: tree recycling

Page 47

Experimental Analysis

MRF consisting of 2×10^5 latent variables connected in a 4-neighborhood.

Running time of the dynamic algorithm

Page 48

Segmentation Comparison

[Figure: segmentations compared for Grimson-Stauffer, Bathia04 and our method]

Page 49

Face Detector and ObjCut

Page 50

Segmentation

Page 51

Segmentation

Page 52

Conclusion

Combining pose inference and segmentation is worth investigating.

Tracking = Detection

Detection = Segmentation

Tracking = Segmentation.

Segmentation = SFM ??

