Cue Integration in Figure/Ground Labeling

Xiaofeng Ren, Charless Fowlkes and Jitendra Malik, U.C. Berkeley

We present a model of edge and region grouping that uses a conditional random field, built over a scale-invariant representation of images, to integrate multiple cues. Our model includes potentials that capture low-level similarity, mid-level curvilinear continuity and high-level object shape. Maximum likelihood parameters for the model are learned from human-labeled ground truth on a large collection of horse images using belief propagation. Using held-out test data, we quantify the information gained by incorporating generic mid-level cues and high-level shape.

Conditional Random Field

• joint model over contours, regions and objects

• integrate low-, mid- and high-level cues

• easy to train and test on large datasets

[Figure: Overview. Labels: Pb, CDT, bottom-up grouping, contours, regions/objects, output marginals.]

Constrained Delaunay Triangulation (CDT)

Constructing a scale-invariant representation from the bottom-up:

1. Compute low-level edge map

2. Trace contours and recursively split them into piecewise linear segments

3. Use Constrained Delaunay Triangulation to complete gaps and partition the image into dual edges and regions.
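Step 2 above can be illustrated with a minimal sketch. The splitting criterion used here (maximum deviation from the chord, relative to chord length) and the threshold `tol` are assumptions made for illustration; the poster does not state the exact rule.

```python
import math

def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dx * (py - ay) - dy * (px - ax)) / length

def split_contour(points, tol=0.05):
    """Return breakpoint indices (endpoints included) of a traced contour.

    `points` is a list of (x, y) tuples with at least two entries.
    `tol` is a hypothetical threshold on deviation / chord length; a ratio
    keeps the test independent of image scale.
    """
    def recurse(lo, hi):
        if hi - lo < 2:
            return []
        a, b = points[lo], points[hi]
        chord = math.hypot(b[0] - a[0], b[1] - a[1])
        # Interior point that deviates most from the straight chord.
        k = max(range(lo + 1, hi),
                key=lambda i: point_line_distance(points[i], a, b))
        if point_line_distance(points[k], a, b) > tol * chord:
            return recurse(lo, k) + [k] + recurse(k, hi)
        return []

    return [0] + recurse(0, len(points) - 1) + [len(points) - 1]

# An L-shaped contour is split once, at its corner.
corner = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
breaks = split_contour(corner)
```

The resulting piecewise linear segments would then serve as the constraints handed to a CDT routine in step 3.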

Using Phuman, the soft ground-truth label defined on CDT graphs: precision close to 100%

Pb averaged over CDT edges: no worse than the original Pb

Increase in asymptotic recall rate: completion of gradientless contours

CDT edges capture most of the image boundaries
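Averaging Pb over a CDT edge can be sketched as below. The sampling scheme (evenly spaced samples with nearest-pixel lookup) and the function name are assumptions for illustration, not the authors' procedure.

```python
def mean_pb_along_edge(pb, p, q, n_samples=20):
    """Average a Pb map along the straight segment from p to q.

    pb is a 2D list indexed as pb[row][col]; p and q are (row, col)
    endpoints inside the map. n_samples must be at least 2.
    """
    total = 0.0
    for i in range(n_samples):
        t = i / (n_samples - 1)
        r = p[0] + t * (q[0] - p[0])
        c = p[1] + t * (q[1] - p[1])
        # Nearest-pixel lookup; a real implementation might interpolate.
        total += pb[int(round(r))][int(round(c))]
    return total / n_samples

# Toy Pb map with a strong horizontal boundary along the middle row.
pb_map = [[0.0, 0.0, 0.0],
          [1.0, 1.0, 1.0],
          [0.0, 0.0, 0.0]]
along = mean_pb_along_edge(pb_map, (1, 0), (1, 2))   # edge lying on the boundary
across = mean_pb_along_edge(pb_map, (0, 0), (2, 0))  # edge crossing the boundary
```

An edge that completes a gradientless gap would receive a low average under this scheme, which is why such edges need the mid- and high-level cues to be labeled as boundaries.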

A Random Field for Cue Integration

We consider a conditional random field (CRF) on top of the CDT triangulation graph, with a binary random variable Xe for each edge in the CDT, a binary variable Yt for every triangle, and a latent node Z which encodes object location.

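$$
\begin{aligned}
E = {} & -\sum_{e} L_1(X_e \mid I) - \sum_{\langle s,t \rangle} L_2(Y_s, Y_t \mid I) \\
& -\sum_{V} M_1(X_V \mid I) - \sum_{\langle s,t \rangle} M_2(X_e, Y_s, Y_t) \\
& -\sum_{t} H_1(Y_t \mid I) - \sum_{t} H_2(Y_t, Z \mid I) - \sum_{e} H_3(X_e, Z \mid I)
\end{aligned}
$$

We use a simple linear combination of low-, mid- and high-level cues.

$$
P(X, Y, Z \mid I, \Theta) = \frac{1}{Z(I, \Theta)} \exp\{-E(X, Y, Z \mid I, \Theta)\}
$$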

Low-level cues: edge energy (L1) and similarity of brightness/texture (L2).

Mid-level cues: contour continuity and junction frequency (M1) and contour/region labeling consistency (M2).

High-level cues: familiar texture (H1), object region support (H2) and object shape (H3).
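To make the linear combination of cues concrete, here is a toy sketch with one CDT edge e separating two triangles s and t. The specific feature definitions, the `pb` and `sim` values and the weights are hypothetical; only the overall form (an energy that is linear in the cue potentials, turned into a probability by exponentiating and normalizing) follows the poster. The high-level terms and the latent node Z are omitted for brevity.

```python
import itertools
import math

def features(x_e, y_s, y_t, pb=0.9, sim=0.2):
    """Hypothetical cue potentials for one edge and its two triangles."""
    return {
        "L1": x_e * (pb - 0.5),                        # edge energy
        "L2": sim if y_s == y_t else 0.0,              # region similarity
        "M2": 1.0 if x_e == int(y_s != y_t) else 0.0,  # contour/region consistency
    }

def energy(state, weights):
    """Negative weighted sum of the cue potentials."""
    f = features(*state)
    return -sum(weights[k] * f[k] for k in weights)

def joint_distribution(weights):
    """Exact P(state) = exp(-E) / partition, by enumerating all 8 states."""
    states = list(itertools.product([0, 1], repeat=3))
    unnorm = {s: math.exp(-energy(s, weights)) for s in states}
    partition = sum(unnorm.values())
    return {s: v / partition for s, v in unnorm.items()}

weights = {"L1": 2.0, "L2": 1.0, "M2": 3.0}  # illustrative values only
dist = joint_distribution(weights)
p_edge_on = sum(p for (x_e, _, _), p in dist.items() if x_e == 1)
best = max(dist, key=dist.get)
```

Enumeration is only feasible for a toy graph like this one; on a full CDT graph the marginals have to be approximated.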

Maximum likelihood CRF parameters are fit via gradient descent. We use loopy belief propagation to perform inference, in particular estimating the marginals of X, Y and Z.
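For a model that is linear in its features, the log-likelihood gradient with respect to each weight is the observed feature value minus its expectation under the model. The sketch below illustrates this on a three-variable toy problem. It computes the expectations exactly by enumeration, whereas the poster uses loopy belief propagation to estimate them; the features, data, step size and iteration count are all assumptions.

```python
import itertools
import math

STATES = list(itertools.product([0, 1], repeat=3))  # (x_e, y_s, y_t)

def feature_vector(state):
    """Hypothetical binary features for one edge and two triangles."""
    x_e, y_s, y_t = state
    return [float(x_e), float(y_s == y_t), float(x_e == int(y_s != y_t))]

def expected_features(theta):
    """Exact model expectation of each feature under P ~ exp(theta . f)."""
    scores = [math.exp(sum(t * f for t, f in zip(theta, feature_vector(s))))
              for s in STATES]
    z = sum(scores)
    expect = [0.0] * len(theta)
    for s, w in zip(STATES, scores):
        for k, f in enumerate(feature_vector(s)):
            expect[k] += (w / z) * f
    return expect

def fit(observations, steps=500, lr=0.5):
    """Gradient ascent on the average log-likelihood, which is the same as
    gradient descent on the negative log-likelihood."""
    theta = [0.0, 0.0, 0.0]
    n = len(observations)
    empirical = [sum(feature_vector(s)[k] for s in observations) / n
                 for k in range(3)]
    for _ in range(steps):
        model = expected_features(theta)
        theta = [t + lr * (e - m) for t, e, m in zip(theta, empirical, model)]
    return theta

# Toy "ground-truth" labelings, made up for illustration.
data = [(1, 0, 1), (1, 1, 0), (0, 0, 0), (0, 1, 1), (1, 0, 0), (0, 0, 1)]
theta_hat = fit(data)
```

At convergence the model's expected features match the empirical ones, which is the maximum likelihood condition for this model family.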

Mid-level features

Junctions are parameterized by the number of gradient and completed edges.

A feature based on angle governs curvilinear continuity for degree 2 junctions.

Maximum-likelihood weights for various junction types.
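One plausible way to measure the angle at a degree 2 junction is the deviation from a straight continuation, sketched below. The function name and the choice of this particular angle are assumptions; the poster does not specify how the angle feature is defined.

```python
import math

def turning_angle(v, a, b):
    """Deviation from straight continuation, in radians, for the two edges
    (v, a) and (v, b) that meet at junction v.

    Returns 0 for a perfectly straight continuation and pi for a full
    reversal. Points are (x, y) tuples.
    """
    ax, ay = a[0] - v[0], a[1] - v[1]
    bx, by = b[0] - v[0], b[1] - v[1]
    dot = ax * bx + ay * by
    cross = ax * by - ay * bx
    between = math.atan2(abs(cross), dot)  # angle between the edges, in [0, pi]
    return math.pi - between

straight = turning_angle((0, 0), (-1, 0), (1, 0))
right_angle = turning_angle((0, 0), (-1, 0), (0, 1))
```

A continuity potential could then penalize large turning angles, with the penalty weight learned alongside the junction-type weights.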

High-level features

A “shapeme” that captures pairs of vertical edges.

Spatial distribution of the shapeme relative to the object center.

Average support mask helps group regions with incoherent appearance.

Quantitative Analysis of Cue Integration

We train and test our approach on a dataset of 344 grayscale horse images. We evaluate the performance of the grouping algorithm against both contours and regions in the human-marked ground truth. We find that for this dataset, which has limited pose variation, high-level knowledge greatly boosts grouping performance; nevertheless, mid-level cues still play a significant role.

Performance ranking (L = low-level, M = mid-level, H = high-level cues): L+M+H > H+L > M+L > L
