Learning to Detect Natural Image Boundaries Using Local Brightness,
Color, and Texture Cues
David R. Martin
Charless C. Fowlkes
Jitendra Malik
Paper Contribution
• Proposed boundary classifier
  – Uses local image features
  – Detects and localizes boundaries
    • Probability that each pixel belongs to a boundary
    • Determines the boundary angle
  – Robust on natural scenes
  – Supervised learning using human-segmented images
Feature Hierarchy
• Cues for perceptual grouping:
  – Low-level: brightness, color, texture, depth, motion
  – Mid-level: continuity, closure, convexity, symmetry, …
  – High-level: familiar objects and configurations
• This paper strictly uses low-level features
• Comparison to other low-level classifiers
  – Canny
  – Edge detector based on eigenvalues of the "second moment matrix"
Boundary Detection
• Other methods detect edges
  – Abrupt changes in a low-level feature
  – Ad-hoc additions introduce errors
• This method detects boundaries
  – Contours separating objects
  – Resembling human abilities
Goal & Outline
• Goal: model the posterior probability of a boundary Pb(x, y, θ) at each pixel and orientation using local cues
• Method: supervised learning using a dataset of 12,000 segmentations of 1,000 images by 30 subjects
• Outline:
  – 3 image-feature cues: brightness, color, texture
  – Cue calibration
  – Cue combination
  – Comparison with other approaches
Image Features
• Look at the region around each pixel for feature discontinuities
  – Range of orientations
  – Range of scales
• Features
  – Oriented energy (not used)
  – Brightness gradient
  – Color gradient
  – Texture gradient
Image Features
• All three features are gradient-based
• The gradient is found by:
  – For every pixel (x, y), draw a disc of radius r
  – Split the disc in half along a diameter at angle θ
  – Compare the contents of the two halves
    • G(x, y, θ, r)
    • A large difference indicates an edge
  – Repeat for every θ and r
• Implementation
  – 8 orientations θ
  – 3 scales r
[Figure: disc of radius r centered at (x, y), split at angle θ]
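The half-disc comparison above can be sketched in a few lines. This is an illustrative numpy toy, not the authors' implementation; the function name, the 16-bin histograms, and the χ² comparison at the end are assumptions made for the sketch:

```python
import numpy as np

def half_disc_gradient(img, x, y, r, theta, n_bins=16):
    """Toy gradient G(x, y, theta, r): split a disc of radius r at
    (x = column, y = row) along a diameter at angle theta, histogram
    the intensities in each half, and return their chi-squared distance."""
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    inside = xs**2 + ys**2 <= r**2
    # Signed distance from the dividing diameter decides the half.
    side = xs * np.sin(theta) - ys * np.cos(theta)
    hists = []
    for mask in (inside & (side > 0), inside & (side < 0)):
        py, px = ys[mask] + y, xs[mask] + x
        ok = (py >= 0) & (py < img.shape[0]) & (px >= 0) & (px < img.shape[1])
        hist, _ = np.histogram(img[py[ok], px[ok]], bins=n_bins, range=(0.0, 1.0))
        hists.append(hist / max(hist.sum(), 1))  # normalize to sum 1
    g, h = hists
    denom = g + h
    m = denom > 0  # skip empty bins (0/0 treated as 0)
    return 0.5 * np.sum((g[m] - h[m])**2 / denom[m])
```

On a vertical step edge, a vertical dividing diameter (θ = π/2) gives a large response, while a horizontal one (θ = 0) gives a response near zero.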
Brightness Features
• Brightness gradient BG(x, y, r, θ)
  – Model the intensity values in each disc half using kernel density estimation
  – Create a histogram of the distribution
  – Compare disc halves by comparing histograms
• χ² difference in the L* distribution:

    χ²(g, h) = (1/2) · Σᵢ (gᵢ − hᵢ)² / (gᵢ + hᵢ)
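The χ² histogram distance above is simple to implement directly; this small helper (the name is illustrative) treats empty bins, where 0/0 would occur, as contributing zero:

```python
import numpy as np

def chi_squared(g, h):
    """Chi-squared distance between two normalized histograms:
    0.5 * sum_i (g_i - h_i)^2 / (g_i + h_i)."""
    g = np.asarray(g, dtype=float)
    h = np.asarray(h, dtype=float)
    denom = g + h
    m = denom > 0  # ignore bins that are empty in both histograms
    return 0.5 * np.sum((g[m] - h[m])**2 / denom[m])
```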
Color Features
• Color gradient CG(x, y, r, θ)
  – Model the color values in each disc half using kernel density estimation
  – Color space
    • Red-green (a*)
    • Yellow-blue (b*)
  – The joint 2D space (a*, b*) greatly increases computation
  – Instead, use the two 1D marginal spaces (a* and b*)
  – Create 2 histograms of kernel densities
  – Compare disc halves by comparing histograms
• χ² difference in the a* distribution plus the χ² difference in the b* distribution
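The marginal trick can be sketched as follows: rather than building one joint 2D a*b* histogram, build two 1D histograms and sum their χ² differences. The function name, bin count, and the rough a*/b* value range are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def color_gradient(a_half1, a_half2, b_half1, b_half2, n_bins=32):
    """Toy CG: sum of chi-squared distances over the a* and b*
    marginal histograms of the two disc halves."""
    def chi2(u, v, lo, hi):
        gu, _ = np.histogram(u, bins=n_bins, range=(lo, hi))
        gv, _ = np.histogram(v, bins=n_bins, range=(lo, hi))
        gu = gu / max(gu.sum(), 1)  # normalize to sum 1
        gv = gv / max(gv.sum(), 1)
        d = gu + gv
        m = d > 0
        return 0.5 * np.sum((gu[m] - gv[m])**2 / d[m])
    # CIELAB a* and b* roughly span [-110, 110] (assumed range)
    return chi2(a_half1, a_half2, -110, 110) + chi2(b_half1, b_half2, -110, 110)
```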
Texture Features
• Texture gradient TG(x, y, r, θ)
  – Per-pixel texture is computed using a bank of 13 filters
  – Each pixel is thus represented by a 13-element feature vector
  – Each disc half is modeled by a point cloud of vectors in 13-dimensional space
  – Problem: how does one compare two such 13-D point clouds?
Texture Features
• Solution: textons estimate the joint distribution using adaptive bins
  – Filter-response vectors are clustered using k-means
  – Cluster centers represent texture primitives (textons)
  – Example texton set
    • k = 64
    • Trained using 200 images
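Texton learning is ordinary k-means on the filter-response vectors. A minimal numpy sketch follows; the deterministic spread-out initialization is a toy stand-in, not the authors' procedure, and the function name is illustrative:

```python
import numpy as np

def learn_textons(responses, k, n_iter=20):
    """Cluster filter-response vectors (n_pixels x 13 in the paper)
    with plain k-means; the k cluster centers are the textons.
    Returns (centers, per-vector cluster labels)."""
    responses = np.asarray(responses, dtype=float)
    # Deterministic evenly-spaced init (a toy stand-in for k-means++).
    centers = responses[np.linspace(0, len(responses) - 1, k).astype(int)].copy()
    for _ in range(n_iter):
        # Assign each response vector to its nearest center.
        d = np.linalg.norm(responses[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its assigned vectors.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = responses[labels == j].mean(axis=0)
    return centers, labels
```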
Texture Features
• Texture gradient TG(x, y, r, θ)
  – Pixels in each disc half are assigned to their nearest texton
  – Disc halves are represented by histograms of textons
  – Compare disc halves by comparing histograms
• χ² difference in the texton distribution
[Figure: texton map]
Feature Localization
• Problem: boundaries can't be localized because the features don't form sharp peaks around them
  – Smooth peaks
  – Double peaks
• Solution: for each pixel
  – Use least squares to fit a cylindrical parabola over the 2D window of radius r
  – The parabola's center gives the localized edge
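In 1D, the parabola-fitting step reduces to a least-squares quadratic fit over a window, with the vertex giving the sub-pixel edge location. A sketch with an illustrative function name:

```python
import numpy as np

def localize_peak(signal, center, r):
    """Least-squares fit a parabola y = a*x^2 + b*x + c to the feature
    values in a window of radius r around `center`; the vertex -b/(2a)
    is the sub-pixel peak (edge) location."""
    xs = np.arange(center - r, center + r + 1)
    ys = signal[xs]
    a, b, c = np.polyfit(xs, ys, 2)
    if a >= 0:
        return float(center)  # no downward-opening peak; keep original
    return -b / (2 * a)  # vertex of the fitted parabola
```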
Evaluation
• Boundary-detector quality is
  – Used for optimizing parameters
  – Used for comparing to other techniques
• Human-marked boundaries as ground truth
  – 1,000 images, 5-10 segmentations each
  – Highly consistent
Evaluation
• Compare to ground truth using precision-recall curves
  – Sensitivity vs. noise
  – The optimal tradeoff point is used for comparison
    Precision = TruePositives / (TruePositives + FalsePositives)
    Recall = TruePositives / (TruePositives + FalseNegatives)
    F = PR / (αR + (1 − α)P)
[Figure: precision-recall plane with iso-F curves; higher precision means fewer false positives, higher recall means fewer misses, and the goal is the top-right corner]
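The three quantities above can be computed directly from raw counts; with α = 0.5, F reduces to the usual harmonic mean of precision and recall. The function name is illustrative:

```python
def pr_f_measure(tp, fp, fn, alpha=0.5):
    """Precision, recall, and F = P*R / (alpha*R + (1 - alpha)*P)
    from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f = precision * recall / (alpha * recall + (1 - alpha) * precision)
    return precision, recall, f
```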
Cue Calibration
All free parameters optimized on training data
• Brightness gradient
  – Disc radius; bin/kernel sizes for KDE
• Color gradient
  – Disc radius; bin/kernel sizes for KDE; joint vs. marginals
• Texture gradient
  – Disc radius
  – Filter bank: scale, multiscale vs. single scale
  – Histogram comparison method: χ², EMD, etc.
  – Number of textons (k in k-means)
  – Image-specific vs. universal textons
• Localization parameters for each cue
Eliminate Redundant Cues
Supervised learning using ground-truth images
• Oriented energy (OE) carries the same information as BG
• Multiple scales don't add accuracy
• Best results
  – BG + TG + CG at a single scale
Classifiers for Cue Combination
Supervised learning using ground-truth images
• Logistic regression
  – Linear and quadratic terms
  – Stable, quick, compact, intuitive
  – Training: minutes. Evaluation: negligible compared to feature detection
• Density estimation
  – Adaptive bins using k-means
  – Training: minutes. Evaluation: negligible compared to feature detection
• Classification trees
  – Top-down splits to maximize entropy, error bounded
  – Training: hours. Evaluation: many times longer than regression
• Hierarchical mixtures of experts
  – 8 experts, initialized top-down, fit with EM
  – Training: hours. Evaluation: 15x longer than regression
• Support vector machines (libsvm)
  – Terrible: the model was large, exceedingly slow, brittle (parameters), opaque
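The winning model, logistic regression over cue values, can be sketched with plain gradient ascent on the log-likelihood. This is a toy, not the authors' fitting procedure; the quadratic terms are omitted and all names are illustrative:

```python
import numpy as np

def fit_logistic(X, y, lr=1.0, n_iter=2000):
    """Fit logistic regression mapping cue values (e.g. columns
    BG, CG, TG) to boundary labels y in {0, 1}."""
    X = np.hstack([np.ones((len(X), 1)), X])  # prepend a bias column
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))      # predicted Pb per sample
        w += lr * X.T @ (y - p) / len(y)      # ascend the log-likelihood
    return w

def predict_pb(w, X):
    """Boundary probability Pb for each row of cue values."""
    X = np.hstack([np.ones((len(X), 1)), X])
    return 1.0 / (1.0 + np.exp(-X @ w))
```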
Summary & Comments
• Edge detection does not optimally segment natural images
  – Texture suppression is not sufficient
• The proposed method offers significant improvements
  – Simple but powerful feature detectors
  – Simple model for cue combination
• Empirical approach to calibration and cue combination
• Surprisingly effective for a low-level approach
• Likely isn't robust to larger textures
  – Could be offset by using multiple scales
• Prohibitively slow for on-line use
  – Minutes per image, even after optimizations
• Normalized Cuts: boundaries → regions