Post on 22-Dec-2015
Cue Integration in Figure/Ground Labeling
Xiaofeng Ren, Charless Fowlkes and Jitendra Malik, U.C. Berkeley
We present a model of edge and region grouping using a conditional random field built over a scale-invariant representation of images to integrate multiple cues. Our model includes potentials that capture low-level similarity, mid-level curvilinear continuity and high-level object shape. Maximum likelihood parameters for the model are learned from human-labeled ground truth on a large collection of horse images using belief propagation. Using held-out test data, we quantify the information gained by incorporating generic mid-level cues and high-level shape.
Conditional Random Field
• joint model over contours, regions and objects
• integrate low-, mid- and high-level cues
• easy to train and test on large datasets
Overview
[Figure: overview diagram. Labels: Pb, CDT (bottom-up grouping); contours; regions, objects; output marginals.]
Constrained Delaunay Triangulation (CDT)
Constructing a scale-invariant representation from the bottom-up:
1. Compute low-level edge map
2. Trace contours and recursively split them into piecewise linear segments
3. Use Constrained Delaunay Triangulation to complete gaps and partition the image into dual edges and regions.
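Step 2 above can be illustrated with a minimal sketch. It assumes a split rule that the poster does not spell out: a contour is broken at the point farthest from its chord whenever that distance exceeds a fraction of the chord length, so the tolerance scales with segment size. The function names and the `rel_tol` parameter are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the line segment a-b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def split_contour(points, rel_tol=0.05):
    """Return indices of breakpoints of a piecewise linear approximation.

    Assumed rule (not from the poster): split at the point farthest from
    the chord while that distance exceeds rel_tol times the chord length.
    """
    if len(points) < 2:
        return list(range(len(points)))

    def recurse(lo, hi):
        a, b = points[lo], points[hi]
        chord = math.hypot(b[0] - a[0], b[1] - a[1])
        worst_idx, worst_dist = None, 0.0
        for i in range(lo + 1, hi):
            d = point_segment_distance(points[i], a, b)
            if d > worst_dist:
                worst_idx, worst_dist = i, d
        if worst_idx is None or worst_dist <= rel_tol * chord:
            return [lo, hi]
        left = recurse(lo, worst_idx)
        right = recurse(worst_idx, hi)
        return left[:-1] + right  # drop the duplicated split index

    return recurse(0, len(points) - 1)

# An L-shaped traced contour: the corner at index 3 becomes a breakpoint.
corner = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
breakpoints = split_contour(corner)
```

The resulting segments would then serve as the constraints handed to a Constrained Delaunay Triangulation routine, which this sketch does not implement.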
Using P_human, the soft ground-truth label defined on CDT graphs: precision close to 100%
Pb averaged over CDT edges: no worse than the original Pb
Increase in asymptotic recall rate: completion of gradientless contours
CDT edges capture most of the image boundaries
A Random Field for Cue Integration
We consider a conditional random field (CRF) on top of the CDT graph, with a binary random variable X_e for each edge in the CDT, a binary variable Y_t for every triangle, and a latent node Z which encodes object location.
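$$
\begin{aligned}
E ={}& \sum_{e} L_1(X_e \mid I) + \sum_{\langle s,t \rangle} L_2(Y_s, Y_t \mid I) \\
&+ \sum_{V} M_1(X_V \mid I) + \sum_{\langle s,t \rangle} M_2(Y_s, Y_t, X_e) \\
&+ \sum_{t} H_1(Y_t \mid I) + \sum_{t} H_2(Y_t, Z \mid I) + \sum_{e} H_3(X_e, Z \mid I)
\end{aligned}
$$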
We use a simple linear combination of low-, mid- and high-level cues.
$$
P(X, Y, Z \mid I, \Theta) = \frac{1}{Z(I, \Theta)} \exp\{-E(X, Y, Z \mid I, \Theta)\}
$$
Low-level cues: edge energy (L1) and similarity of brightness/texture (L2).
Mid-level cues: contour continuity and junction frequency (M1) and contour/region labeling consistency (M2).
High-level cues: familiar texture (H1), object region support (H2) and object shape (H3).
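To make the linear combination of cues concrete, here is a minimal sketch on a hypothetical toy graph: a single CDT edge e separating two triangles s and t. It keeps only three potentials (L1, L2, M2) and omits the latent node Z. The functional forms, the weights in `WEIGHTS`, and the inputs `pb` and `sim` are illustrative assumptions, not the poster's definitions. Probabilities follow the usual Gibbs form, proportional to exp(-E), normalized by brute-force enumeration, which is feasible only at this toy size.

```python
import itertools
import math

# Hypothetical weights: a negative weight lowers the energy, so it makes
# the corresponding configuration more probable under exp(-E).
WEIGHTS = {"L1": -2.0, "L2": -1.0, "M2": 3.0}

def energy(x_e, y_s, y_t, pb, sim, w=WEIGHTS):
    """Toy energy for one edge e between triangles s and t (assumed forms)."""
    l1 = w["L1"] * x_e * pb                           # edge energy
    l2 = w["L2"] * float(y_s == y_t) * sim            # region similarity
    # Penalty when the edge label disagrees with the triangle labels:
    # the edge should be on exactly when the two triangles differ.
    m2 = w["M2"] * float((x_e == 1) != (y_s != y_t))
    return l1 + l2 + m2

def joint(pb, sim, w=WEIGHTS):
    """Exact joint distribution over (x_e, y_s, y_t) by enumeration."""
    states = list(itertools.product([0, 1], repeat=3))
    unnorm = {s: math.exp(-energy(*s, pb, sim, w)) for s in states}
    z = sum(unnorm.values())  # partition function
    return {s: v / z for s, v in unnorm.items()}

def edge_marginal(pb, sim, w=WEIGHTS):
    """Marginal probability that the edge is a boundary (x_e = 1)."""
    return sum(p for (x_e, _, _), p in joint(pb, sim, w).items() if x_e == 1)

strong_edge = edge_marginal(pb=0.9, sim=0.1)  # high contrast, dissimilar regions
weak_edge = edge_marginal(pb=0.1, sim=0.9)    # low contrast, similar regions
```

With these assumed weights, the edge marginal is higher for the strong-contrast edge than for the weak one, which is the qualitative behavior the cue combination is meant to produce.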
Maximum likelihood CRF parameters are fit via gradient descent. We use loopy belief propagation to perform inference, in particular estimating the marginals of X, Y and Z.
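The fitting procedure can be sketched on a toy log-linear model. For P proportional to exp(-w·f), the gradient of the negative log-likelihood with respect to each weight is the observed feature value minus its expectation under the model. In the sketch below the expectation is computed by exact enumeration over eight states, standing in for the loopy belief propagation used on the full CDT graph. The three features, the training tuples in `DATA`, the step size, and the iteration count are all hypothetical.

```python
import itertools
import math

STATES = list(itertools.product([0, 1], repeat=3))  # (x_e, y_s, y_t)

def features(state, pb, sim):
    """Assumed toy features: edge energy, region similarity, inconsistency."""
    x_e, y_s, y_t = state
    return (
        x_e * pb,
        float(y_s == y_t) * sim,
        float((x_e == 1) != (y_s != y_t)),
    )

def expected_features(w, pb, sim):
    """Model expectation of each feature, by exact enumeration."""
    feats = [features(s, pb, sim) for s in STATES]
    scores = [math.exp(-sum(wk * fk for wk, fk in zip(w, f))) for f in feats]
    z = sum(scores)
    return [sum(sc * f[k] for sc, f in zip(scores, feats)) / z
            for k in range(len(w))]

def avg_neg_log_likelihood(w, data):
    """Average of E(observed) + log Z over the training examples."""
    total = 0.0
    for state, pb, sim in data:
        e_obs = sum(wk * fk for wk, fk in zip(w, features(state, pb, sim)))
        log_z = math.log(sum(
            math.exp(-sum(wk * fk for wk, fk in zip(w, features(s, pb, sim))))
            for s in STATES))
        total += e_obs + log_z
    return total / len(data)

def fit(data, steps=200, lr=0.5):
    """Gradient descent on the average negative log-likelihood."""
    w = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = [0.0] * len(w)
        for state, pb, sim in data:
            obs = features(state, pb, sim)
            exp_f = expected_features(w, pb, sim)
            for k in range(len(w)):
                grad[k] += (obs[k] - exp_f[k]) / len(data)
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w

# Hypothetical labeled examples: (observed state, pb, sim).
DATA = [
    ((1, 0, 1), 0.9, 0.1),
    ((1, 1, 0), 0.8, 0.2),
    ((0, 0, 0), 0.1, 0.9),
    ((0, 1, 1), 0.2, 0.8),
]
weights = fit(DATA)
```

Because the observed labelings in this toy data are always consistent, the weight on the inconsistency feature keeps growing with more iterations; a real fit would typically stop on held-out performance or add regularization.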
Mid-level features
Junctions are parameterized by the number of gradient and completed edges.
A feature based on angle governs curvilinear continuity for degree 2 junctions.
Maximum-likelihood weights for various junction types.
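The junction parameterization and the angle feature can be sketched as follows. The edge representation, a pair `(is_on, is_gradient)`, and both function names are hypothetical, and counting only edges that are switched on is an assumption. The angle here is the deviation from straight continuation; the poster does not give the exact form of its angle feature.

```python
import math

def junction_type(incident_edges):
    """Count the on-edges at a vertex as (gradient, completed).

    incident_edges: iterable of (is_on, is_gradient) pairs, where an edge
    that is not a gradient edge is a CDT completion edge (assumed encoding).
    """
    deg_gradient = sum(1 for on, grad in incident_edges if on and grad)
    deg_completed = sum(1 for on, grad in incident_edges if on and not grad)
    return deg_gradient, deg_completed

def turning_angle(prev_pt, vertex, next_pt):
    """Deviation from straight continuation at a degree-2 junction.

    Returns radians in [0, pi]: 0 for a straight contour, pi for a reversal.
    """
    ax, ay = vertex[0] - prev_pt[0], vertex[1] - prev_pt[1]
    bx, by = next_pt[0] - vertex[0], next_pt[1] - vertex[1]
    cross = ax * by - ay * bx
    dot = ax * bx + ay * by
    return abs(math.atan2(cross, dot))

# One gradient edge and one completed edge on; a third edge is off.
kind = junction_type([(True, True), (True, False), (False, True)])
bend = turning_angle((0, 0), (1, 0), (1, 1))  # a right-angle turn
```

A learned weight per junction type, plus a weight on the turning angle at degree 2 junctions, would then enter the mid-level potential M1.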
High-level features
A “shapeme” which captures pairs of vertical edges.
Spatial distribution of the shapeme relative to object center.
Average support mask helps group regions with incoherent appearance.
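One way to picture an average support mask is as the fraction of training figures that cover each offset from the object center. The sketch below assumes binary figure masks given as nested lists, uses the rounded centroid of each mask as its object center, and assumes every mask is non-empty. These choices and the function names are hypothetical; the poster does not describe how its mask is computed.

```python
from collections import defaultdict

def centroid(mask):
    """Mean (row, col) of the figure pixels in a non-empty binary mask."""
    pts = [(r, c) for r, row in enumerate(mask)
           for c, v in enumerate(row) if v]
    n = len(pts)
    return (sum(r for r, _ in pts) / n, sum(c for _, c in pts) / n)

def average_support(masks):
    """Map each (d_row, d_col) offset from the object center to the
    fraction of masks that are figure at that offset."""
    counts = defaultdict(int)
    for mask in masks:
        cr, cc = centroid(mask)
        cr, cc = round(cr), round(cc)
        for r, row in enumerate(mask):
            for c, v in enumerate(row):
                if v:
                    counts[(r - cr, c - cc)] += 1
    return {offset: k / len(masks) for offset, k in counts.items()}

mask_a = [[0, 0, 0],
          [0, 1, 0],
          [0, 0, 0]]
mask_b = [[1, 1, 1],
          [0, 0, 0],
          [0, 0, 0]]
support = average_support([mask_a, mask_b])
```

Given a hypothesized object location, a triangle's offset from that location could be looked up in such a map to obtain a prior on its figure label, independent of its appearance.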
Quantitative Analysis of Cue Integration
We train and test our approach on a dataset of 344 grayscale horse images. We evaluate the performance of the grouping algorithm against both contours and regions in the human-marked ground truth. We find that for this dataset with limited pose variation, high-level knowledge greatly boosts grouping performance; nevertheless, mid-level cues still play a significant role.
Performance ranking of cue combinations (L = low-level, M = mid-level, H = high-level): L+M+H > H+L > M+L > L