Adapted from Ce Liu's CVPR2009 slides
Nonparametric Scene Parsing via Label Transfer
Authors: Ce Liu, Jenny Yuen, Antonio Torralba
Group 3 Presenter: Hongsheng Yang
The task of object recognition and scene parsing
[Figure: input image and output parsing. Label legend: tree, sky, road, field, car, unlabeled, building, window]
Training-based object recognition and scene parsing
• Sliding window method
  - Train a classifier for a fixed-size window (e.g., car vs. non-car)
  - Try all possible scales and locations, run the classifier
  - Merge multiple detections
• Texton method
  - Extract pixel-wise high-dimensional feature vectors
  - Train a multi-class classifier
  - Spatial regularity: neighboring pixels should agree
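The sliding window steps can be sketched roughly as follows. This is only an illustration: the window size, scales, stride, thresholds, and the `score_fn` classifier are hypothetical placeholders, and greedy non-maximum suppression is assumed as the way to merge detections (the slides do not say which merging rule is used).

```python
import numpy as np

def sliding_windows(height, width, win=(32, 32), scales=(1.0, 2.0), stride=16):
    """Yield (x0, y0, x1, y1) boxes for every scale and location that fits."""
    for s in scales:
        h, w = int(win[0] * s), int(win[1] * s)
        step = max(1, int(stride * s))
        for y in range(0, height - h + 1, step):
            for x in range(0, width - w + 1, step):
                yield (x, y, x + w, y + h)

def iou(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def detect(image, score_fn, threshold=0.5, overlap=0.3):
    """Score every window, keep those above threshold, merge by greedy NMS."""
    h, w = image.shape[:2]
    hits = [(float(score_fn(image[y0:y1, x0:x1])), (x0, y0, x1, y1))
            for (x0, y0, x1, y1) in sliding_windows(h, w)]
    # Highest-scoring windows first.
    hits = sorted((hit for hit in hits if hit[0] >= threshold), reverse=True)
    kept = []
    for score, box in hits:
        # Keep a window only if it does not overlap an already kept one too much.
        if all(iou(box, k) <= overlap for _, k in kept):
            kept.append((score, box))
    return kept
```

Here `score_fn` stands in for the trained classifier (e.g., car vs. non-car); any callable that maps an image patch to a score would do.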
J. Shotton et al. Textonboost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. ECCV, 2006
Label Transfer - Intuition
• I’ve seen and recognized a few similar pictures before.
• If I could match each pixel in the query image to pixels in the previously seen images,
• then I could infer what the new query image looks like based on the database images.
Label Transfer - Pipeline
> Given a query image
• Find an annotated image of a similar scene
• Find dense correspondences between these two images
• Warp the annotation according to the correspondences
> Two key components:
• A large, annotated database
• Good correspondences for label transfer
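The warping step can be illustrated with a minimal sketch. It assumes the query and the database image have the same size, that the correspondences are given as an integer flow field (`flow_u`, `flow_v`) from query pixels to database pixels, and that label 0 means "unlabeled"; all of these names and conventions are assumptions made for the example.

```python
import numpy as np

def warp_annotation(annotation, flow_u, flow_v, unlabeled=0):
    """Transfer labels to the query image.

    Query pixel (y, x) takes the label found at (y + v, x + u) in the
    database annotation; pixels mapped outside the image stay unlabeled.
    """
    h, w = annotation.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = ys + flow_v
    src_x = xs + flow_u
    valid = (src_y >= 0) & (src_y < h) & (src_x >= 0) & (src_x < w)
    warped = np.full((h, w), unlabeled, dtype=annotation.dtype)
    warped[valid] = annotation[src_y[valid], src_x[valid]]
    return warped
```

With a flow of one pixel to the right everywhere, each query pixel takes the label of its right-hand neighbor in the database annotation, and the last column stays unlabeled.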
[Figure: query image and its match from the database, with the user annotation of the match and the warped annotation. Label legend: tree, sky, road, field, car, unlabeled, building, window]
Large image databases
• A subset of the LabelMe database: outdoor scenes including street, beach, mountains, fields, buildings, etc.
• 2688 images in total, 2488 for training, 200 for testing
• 33 object categories + “unlabeled”
B. Russell et al. LabelMe: a database and web-based tool for image annotation. IJCV 2008.
A good correspondence approach
• SIFT flow is analogous to optical flow
• Scene level (SIFT flow) vs. image level (optical flow)
• SIFT flow: dense SIFT, spatial regularization
[Figure: optical flow]
[Figure panels: Input, Support, Optical flow; Dense SIFT image (RGB = first 3 components of the 128-D SIFT descriptor); SIFT flow; Warping of optical flow; Warping of SIFT flow]
The objective energy function is similar to that of optical flow:
• MRF - p, q: grid coordinates; w: flow vector; u, v: x- and y-components of the flow; s1, s2: SIFT descriptors
• The energy has three terms: data term (reconstruction), small displacement bias, smoothness term
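The equation itself did not survive on this slide. As far as I recall, the cited SIFT Flow paper states the energy in the following form, with the three terms in the order listed above. The thresholds t and d, the weights η and α, and the neighborhood set ε (pairs of adjacent grid points) are not defined on the slide and are taken from that paper, so treat this as a reconstruction rather than the slide's exact content.

```latex
E(\mathbf{w}) =
  \sum_{\mathbf{p}} \min\bigl(\lVert s_1(\mathbf{p}) - s_2(\mathbf{p} + \mathbf{w}(\mathbf{p})) \rVert_1,\; t\bigr)
  + \sum_{\mathbf{p}} \eta \bigl(\lvert u(\mathbf{p}) \rvert + \lvert v(\mathbf{p}) \rvert\bigr)
  + \sum_{(\mathbf{p},\mathbf{q}) \in \varepsilon}
    \Bigl[ \min\bigl(\alpha \lvert u(\mathbf{p}) - u(\mathbf{q}) \rvert,\; d\bigr)
         + \min\bigl(\alpha \lvert v(\mathbf{p}) - v(\mathbf{q}) \rvert,\; d\bigr) \Bigr]
```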
C. Liu et al. SIFT Flow: Dense Correspondence across Scenes and its Applications. TPAMI 2011
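To make the three terms concrete, here is a rough sketch that evaluates such an energy for a given integer flow field. It assumes a truncated L1 data term, an L1 small-displacement term, and a truncated L1 smoothness term over 4-connected neighbors, which is my reading of the cited paper; the parameter values `t`, `eta`, `alpha`, `d` are arbitrary placeholders, and clamping out-of-image targets to the border is a simplification of mine.

```python
import numpy as np

def sift_flow_energy(s1, s2, u, v, t=100.0, eta=0.1, alpha=2.0, d=40.0):
    """Evaluate the energy of flow (u, v) between descriptor images s1 and s2.

    s1, s2: (H, W, D) dense descriptor images; u, v: (H, W) integer flow.
    """
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    h, w = u.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Clamp targets so pixels mapped outside the image reuse the border descriptor.
    ty = np.clip(ys + v, 0, h - 1)
    tx = np.clip(xs + u, 0, w - 1)
    # Data term: truncated L1 distance between matched descriptors.
    data = np.minimum(np.abs(s1 - s2[ty, tx]).sum(axis=2), t).sum()
    # Small displacement bias: penalize large flow vectors.
    small = eta * (np.abs(u) + np.abs(v)).sum()
    # Smoothness term: truncated L1 difference between 4-connected neighbors.
    smooth = 0.0
    for f in (u, v):
        smooth += np.minimum(alpha * np.abs(np.diff(f, axis=0)), d).sum()
        smooth += np.minimum(alpha * np.abs(np.diff(f, axis=1)), d).sum()
    return float(data + small + smooth)
```

This only evaluates the energy for a candidate flow; minimizing it over all flows is the hard part and is not attempted here.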
Design of Nonparametric Scene Parsing System
• Scene retrieval: retrieve a set of nearest neighbors in the database for a given query image, using GIST as the matching score. (A single image is not good enough.)
• Compute the SIFT flow from the query to each nearest neighbor, and use the minimum energy achieved to re-rank the nearest neighbors. Then select the top M re-ranked retrievals to form the voting candidate set.
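The retrieval and re-ranking steps can be sketched as below. The sketch assumes GIST descriptors are already computed and compared by Euclidean distance, and that `flow_energy` is a hypothetical callable returning the minimum SIFT flow energy between the query and a database image; the default values of `k` and `m` are placeholders, not the settings used in the paper.

```python
import numpy as np

def build_candidate_set(query_gist, db_gists, flow_energy, k=50, m=5):
    """Return indices of the top-m database images after re-ranking.

    query_gist: (D,) descriptor of the query; db_gists: (N, D) descriptors.
    flow_energy: hypothetical callable, index -> minimum SIFT flow energy
                 between the query and database image `index`.
    """
    dists = np.linalg.norm(db_gists - query_gist, axis=1)
    neighbors = np.argsort(dists)[:k]               # k nearest by GIST distance
    energies = np.array([flow_energy(int(i)) for i in neighbors])
    reranked = neighbors[np.argsort(energies)]      # lower energy ranks first
    return reranked[:m].tolist()
```

The expensive SIFT flow computation runs only on the k retrieved neighbors, not on the whole database, which is the point of retrieving with a cheap global descriptor first.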
[Figure: system pipeline. Panels: Query, SIFT; Candidate set, SIFT, Annotation, SIFT flow, Warped annotations]
• A second multi-label MRF integrates the warped annotations of the candidate images, combining a per-pixel likelihood, a spatial prior, and spatial consistency between neighboring pixels
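As a much simplified stand-in for this integration step, the sketch below fuses the warped annotations by a per-pixel majority vote. This is not the MRF described on the slide: it uses only the votes of the candidates and ignores the spatial prior and the neighborhood consistency term. The function name and label encoding (integers from 0 to `num_labels - 1`) are assumptions for the example.

```python
import numpy as np

def vote_labels(warped_annotations, num_labels):
    """Per-pixel majority vote over M warped annotations.

    warped_annotations: list of (H, W) integer label maps.
    Returns an (H, W) label map; ties go to the lowest label id.
    """
    stack = np.stack(warped_annotations)                      # (M, H, W)
    votes = np.zeros((num_labels,) + stack.shape[1:], dtype=int)
    for label in range(num_labels):
        votes[label] = (stack == label).sum(axis=0)           # count votes per pixel
    return votes.argmax(axis=0)
```

A full implementation would turn these votes into unary costs and add prior and pairwise terms before optimizing the labeling jointly.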
[Figure: system pipeline with result. Panels: Query, SIFT; Candidate set, SIFT, Annotation, SIFT flow, Warped annotations; Parsing, Ground truth]
Scene parsing results (1)
[Figure panels: Query, Best match, Annotation of best match, Warped best match to query, Parsing result of label transfer, Ground truth]
Scene parsing results (2)
[Figure panels: Query, Best match, Annotation of best match, Warped best match to query, Parsing result, Ground truth]
Pixel-wise performance
• Our system, optimized parameters: per-pixel rate 74.75%
[Figure: pixel-wise frequency count of each class, annotated "Stuff" and "Small, discrete objects"]
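For reference, a per-pixel rate of this kind can be computed as in the sketch below. It assumes that pixels marked "unlabeled" in the ground truth (label 0 here, a convention chosen for the example) are excluded from the count; the slides do not spell out the exact evaluation protocol.

```python
import numpy as np

def per_pixel_rate(predicted, ground_truth, unlabeled=0):
    """Fraction of labeled ground-truth pixels whose predicted label is correct."""
    labeled = ground_truth != unlabeled
    if not labeled.any():
        return float("nan")   # nothing to evaluate
    return float((predicted[labeled] == ground_truth[labeled]).mean())
```

Because every pixel counts equally, a rate like this is dominated by frequent classes, which is why the per-class frequency count matters when reading the number.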
The relative importance of different components of the parsing system
Conclusion
• Label transfer provides a novel, data-driven way to understand scenes.
• Several follow-up works build on this line, e.g., SuperParsing.
• A more robust correspondence approach is needed, e.g., a scale- and rotation-invariant dense descriptor? Lower complexity? One recent work: Deformable Spatial Pyramid Matching for Fast Dense Correspondences, Jaechul Kim, Ce Liu, Fei Sha, and Kristen Grauman, CVPR 2013.