Groups of Adjacent Contour Segments for Object Detection
Vittorio Ferrari, Loic Fevrier
Frederic Jurie, Cordelia Schmid
Problem: object class detection & localization
Training
Testing
Focus: classes with characteristic shape
Features: pairs of adjacent segments (PAS)
Contour segment network [Ferrari et al. ECCV 2006]
1) edgels extracted with Berkeley boundary detector
2) edgel-chains partitioned into straight contour segments
3) segments connected at edgel-chains’ endpoints and junctions
Features: pairs of adjacent segments (PAS)
segments connected in the network
PAS = groups of two connected segments
PAS descriptor:
• encodes geometric properties of the PAS
• scale and translation invariant
• compact, 5D
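The slide only states that the PAS descriptor is 5D and invariant to scale and translation; it does not list the components. The sketch below is one plausible construction under stated assumptions, not necessarily the authors' exact definition: the direction between the two segment midpoints, the two segment orientations, and the two segment lengths divided by the midpoint distance. The function name `pas_descriptor` and the endpoint-pair segment format are hypothetical.

```python
import math

def pas_descriptor(seg1, seg2):
    """Return a 5-value descriptor for two segments given as ((x0, y0), (x1, y1)).

    Assumed layout (not taken from the slides):
    (midpoint direction, orientation 1, orientation 2, length 1 / d, length 2 / d)
    where d is the distance between the two segment midpoints.
    """
    def midpoint(seg):
        (x0, y0), (x1, y1) = seg
        return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)

    def length(seg):
        (x0, y0), (x1, y1) = seg
        return math.hypot(x1 - x0, y1 - y0)

    def orientation(seg):
        # Undirected orientation, folded into [0, pi).
        (x0, y0), (x1, y1) = seg
        return math.atan2(y1 - y0, x1 - x0) % math.pi

    m1, m2 = midpoint(seg1), midpoint(seg2)
    rx, ry = m2[0] - m1[0], m2[1] - m1[1]
    d = math.hypot(rx, ry)  # assumed > 0 (the midpoints do not coincide)
    return (
        math.atan2(ry, rx),   # only differences of positions: translation invariant
        orientation(seg1),
        orientation(seg2),
        length(seg1) / d,     # ratios of lengths: scale invariant
        length(seg2) / d,
    )
```

Because only position differences and length ratios enter the descriptor, shifting or uniformly rescaling both segments leaves all five values unchanged. The order of the two segments is assumed to be fixed by some external convention.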
Features: pairs of adjacent segments (PAS)
Example PAS
Why PAS?
+ intermediate complexity: good repeatability-informativeness trade-off
+ scale-translation invariant
+ connected: natural grouping criterion (need not choose a grouping neighborhood or scale)
+ can cover pure portions of the object boundary
PAS codebook
Based on descriptors, cluster PAS into types
a few of the most frequent types based on 10 outdoor images (5 horses and 5 background).
types based on 15 indoor images (bottles)
• Frequently occurring PAS have intuitive, natural shapes
• As we add images, number of PAS types converges to just ~100
• Very similar codebooks come out, regardless of source images
+ general, simple features. We use a single, universal codebook (1st row) for all classes
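The slides do not name the clustering algorithm used to group PAS into types. As a stand-in, the sketch below uses a simple greedy "leader" clustering: a descriptor joins the nearest existing type if it lies within `radius`, otherwise it founds a new type. The function `build_codebook`, the `radius` parameter, and the use of plain Euclidean distance on descriptors are all assumptions made for illustration.

```python
import math

def build_codebook(descriptors, radius):
    """Greedy leader clustering (a stand-in, not the authors' method).

    Returns (type_centers, assignments), where assignments[i] is the
    index of the type that descriptors[i] was assigned to.
    """
    centers, counts, assignments = [], [], []
    for d in descriptors:
        best, best_dist = None, radius
        for i, c in enumerate(centers):
            dist = math.dist(d, c)
            if dist <= best_dist:
                best, best_dist = i, dist
        if best is None:
            # No existing type is close enough: start a new one.
            centers.append(list(d))
            counts.append(1)
            best = len(centers) - 1
        else:
            # Update the running mean of the matched type.
            counts[best] += 1
            n = counts[best]
            centers[best] = [c + (x - c) / n for c, x in zip(centers[best], d)]
        assignments.append(best)
    return centers, assignments
```

With a scheme of this kind, new types appear only when a descriptor is unlike everything seen so far, which is one way the number of types could level off as images are added.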
Window descriptor
1. Subdivide window into tiles
2. Compute a separate bag of PAS per tile
3. Concatenate these semi-local bags
[Lazebnik et al. CVPR 2006]; [Dalal and Triggs CVPR 2005]
+ distinctive: records which PAS appear where; weight PAS by average edge strength
+ flexible: soft-assign PAS to types; rather coarse tiling
+ fast to compute using Integral Histograms
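The window descriptor steps above (tile the window, one bag of PAS per tile, concatenate) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Gaussian soft-assignment with bandwidth `sigma`, the `(x, y, descriptor, edge_strength)` tuple layout, and the function name `window_descriptor` are assumptions. It computes histograms directly rather than through Integral Histograms.

```python
import math

def window_descriptor(pas_list, window, tiles_x, tiles_y, type_centers, sigma=0.5):
    """Concatenate one soft-assigned, strength-weighted bag of PAS types per tile.

    pas_list: iterable of (x, y, descriptor, edge_strength)
    window:   (x0, y0, width, height)
    Returns a flat list of length tiles_x * tiles_y * len(type_centers).
    """
    x0, y0, w, h = window
    n_types = len(type_centers)
    hist = [0.0] * (tiles_x * tiles_y * n_types)
    for x, y, desc, strength in pas_list:
        if not (x0 <= x < x0 + w and y0 <= y < y0 + h):
            continue  # PAS lies outside the window
        tx = min(tiles_x - 1, int((x - x0) / w * tiles_x))
        ty = min(tiles_y - 1, int((y - y0) / h * tiles_y))
        tile = ty * tiles_x + tx
        # Soft assignment: Gaussian similarity to every type, normalised to sum to 1.
        sims = [math.exp(-math.dist(desc, c) ** 2 / (2.0 * sigma ** 2))
                for c in type_centers]
        total = sum(sims)
        if total == 0.0:
            continue  # descriptor is too far from every type to contribute
        for t, s in enumerate(sims):
            hist[tile * n_types + t] += strength * s / total
    return hist
```

Each dimension of the resulting vector corresponds to one "PAS type + tile" combination, which is what lets a linear classifier weight where each type tends to appear.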
Training
1. Learn mean positive window dimensions
2. Determine number of tiles T
3. Collect positive example descriptors
4. Collect negative example descriptors: slide window over negative training images
Training
5. Train a linear SVM
Here are a few of the top-weighted descriptor vector dimensions (= 'PAS + tile'):
+ lie on object boundary (= local shape structure common to many training examples)
Testing
1. Slide window with the fixed aspect ratio learned in training, at multiple scales
2. SVM classify each window + non-maxima suppression
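The two testing steps can be sketched as below. The scale step, stride, and overlap threshold are illustrative values, not settings from the slides, and the greedy overlap-based suppression is one common way to do non-maxima suppression rather than a confirmed detail of this detector. All function names are hypothetical.

```python
def sliding_windows(img_w, img_h, aspect, min_h, scale_step=1.25, stride_frac=0.25):
    """Yield (x, y, w, h) windows of fixed aspect ratio (w / h) at multiple scales."""
    h = float(min_h)
    while h <= img_h and h * aspect <= img_w:
        w = h * aspect
        stride = max(1, int(stride_frac * h))
        y = 0
        while y + h <= img_h:
            x = 0
            while x + w <= img_w:
                yield (x, y, w, h)
                x += stride
            y += stride
        h *= scale_step

def linear_svm_score(weights, bias, descriptor):
    """Score of a linear SVM: dot(weights, descriptor) + bias."""
    return sum(wi * di for wi, di in zip(weights, descriptor)) + bias

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)

def non_maxima_suppression(scored_windows, max_overlap=0.5):
    """Greedy NMS over (score, window) pairs, highest score first."""
    kept = []
    for score, win in sorted(scored_windows, key=lambda sw: sw[0], reverse=True):
        if all(iou(win, k[1]) <= max_overlap for k in kept):
            kept.append((score, win))
    return kept
```

In use, each window from `sliding_windows` would be turned into a window descriptor, scored with `linear_svm_score`, and the scored windows passed to `non_maxima_suppression` to obtain the final detections.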
detections
Results – INRIA horses
+ tiling brings a substantial improvement; optimum at T=30 -> keep this setting for all other experiments
+ works well: 86% det-rate at 0.3 FPPI (with 50 pos + 50 neg training images)
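Results throughout are reported as detection rate at a given number of false positives per image (FPPI). A minimal sketch of that measurement follows, assuming each detection has already been labelled as a true or false positive and each object is matched at most once; the matching criterion itself is not given on the slides. The function name is hypothetical.

```python
def detection_rate_at_fppi(detections, n_objects, n_images, fppi):
    """Detection rate at a false-positives-per-image budget.

    detections: list of (score, is_true_positive) over the whole test set.
    Lowers the score threshold one detection at a time and returns the
    largest fraction of objects found while false positives per image <= fppi.
    """
    max_fp = fppi * n_images
    tp = fp = 0
    best = 0.0
    for _, is_tp in sorted(detections, key=lambda d: d[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
        if fp > max_fp:
            break
        best = tp / n_objects
    return best
```

For example, a result such as "86% det-rate at 0.3 FPPI" would correspond to this function returning 0.86 when called with `fppi=0.3`.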
Dataset: ~ Jurie and Schmid, CVPR 2004
170 positive + 170 negative images (training = 50 pos + 50 neg)
wide range of scales; clutter
(missed and FP)
Results – INRIA horses
+ PAS better than any IP
all interest point (IP) comparisons with T=10 and 120 feature types (= optimum over INRIA horses and ETHZ Shape Classes; all IP codebooks are class-specific)
(missed and FP)
Results – Weizmann-Shotton horses
Dataset: Shotton et al., ICCV 2005
327 positive + 327 negative images (training = 50 pos + 50 neg)
no scale changes; modest clutter
Shotton’s EER
- exact comparison to Shotton et al.: use their images and search at a single scale
- PAS same performance (~92% precision-recall EER), but:
  + no need for segmented training images (only bounding-boxes)
  + can detect objects at multiple scales (see other experiments)
Results – ETHZ Shape Classes
Dataset: Ferrari et al., ECCV 2006
255 images, over 5 classes
training = half of positive images for a class + same number from the other classes (1/4 from each)
testing = all other images
large scale changes; extensive clutter
Missed
Results – ETHZ Shape Classes
+ mean det-rate at 0.4 FPPI = 79%
+ PAS >> IP for apple logos, bottles, mugs
  PAS ~= IP for giraffes (texture!)
  PAS < IP for swans
+ overall best IP: Harris-Laplace
+ class specific IP codebooks
[Plots, one per class: Apple logos, Bottles, Giraffes, Mugs, Swans]
Results – Caltech 101
Dataset: Fei-Fei et al., GMBV 2004
42 anchor, 62 chair, 67 cup images
train = half + same number of caltech101 background
testing = other half pos + same number of background
scale changes; only little clutter
Results – Caltech 101
On caltech101's anchor, chair, cup:
+ PAS better than Harris-Laplace
+ mean PAS det-rate at 0.4 FPPI: 85%
Comparison to Dalal and Triggs CVPR 2005
[Plots, one per class: Apple logos, Bottles, Giraffes, Mugs, Swans]
Comparison to Dalal and Triggs CVPR 2005
[Plots, one per dataset: Caltech anchors, Caltech chairs, Caltech cups, INRIA horses, Shotton horses]
+ overall mean det-rate at 0.4 FPPI: PAS 82% >> HoG 58%
  PAS >> HoG for 6 datasets
  PAS ~= HoG for 2 datasets
  PAS < HoG for 2 datasets
Generalizing PAS to kAS
kAS: any path of length k through the contour segment network
segments connected in the network 3AS 4AS
• scale+translation invariant descriptor with dimensionality 4k-2
• k = feature complexity; higher k -> more informative, but less repeatable
• overall mean det-rates (%)

kAS         1AS  PAS  3AS  4AS
0.3 FPPI     69   77   64   57
0.4 FPPI     76   82   70   64

PAS do best!
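The kAS definition on this slide (any path of length k through the contour segment network) can be sketched as a path enumeration over a graph of segments. The adjacency-dict representation and the function name `enumerate_kas` are assumptions made for illustration; the sketch treats a kAS as a simple path of k distinct connected segments and reports each path once regardless of direction.

```python
def enumerate_kas(adjacency, k):
    """Return all simple paths of k segments in the segment network.

    adjacency: dict mapping a segment id to the ids of the segments
               it is connected to in the network.
    Each path is returned as a tuple of segment ids, in the direction
    that sorts first, so a path and its reverse count as one kAS.
    """
    found = set()

    def extend(path):
        if len(path) == k:
            found.add(min(tuple(path), tuple(reversed(path))))
            return
        for nxt in adjacency.get(path[-1], ()):
            if nxt not in path:
                extend(path + [nxt])

    for start in adjacency:
        extend([start])
    return sorted(found)
```

With k=1 this returns the individual segments (1AS), and with k=2 the pairs of adjacent segments (PAS).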
Conclusions
Connected local shape features for object class detection
Experiments on 10 diverse classes from 4 datasets show:
+ better suited than interest points for these shape-based classes
- fixed aspect-ratio window: sometimes inaccurate bounding-boxes
+ object detector deals with clutter, scale changes, intra-class variability
- single viewpoint
+ PAS have the best intermediate complexity among kAS
+ object detector compares favorably to HoG-based one
Current work: detecting object outlines
Training: learn the common boundaries from examples
Model
• collection of PAS and their spatial variability
• only common boundary
Current work: detecting object outlines
Detection on a new image
1. detect edges
2. match PAS based on descriptors
3. vote for translation + scale initializations
4. match deformable thin-plate spline based on deterministic annealing
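Step 3 above (voting for translation + scale initializations) can be illustrated with a Hough-style accumulator. This is a sketch under assumptions, since the slides give no details: each PAS is summarised by a position and a characteristic size, each model-to-image match votes for the scale and translation that would map the model PAS onto the image PAS, and the bin sizes are arbitrary. The function name and parameters are hypothetical.

```python
import math
from collections import Counter

def vote_initializations(matches, pos_bin=20.0, scale_base=1.5, top=3):
    """Accumulate translation + scale votes from PAS matches.

    matches: list of ((mx, my, msize), (ix, iy, isize)) pairs, i.e. a
             model PAS and the image PAS it was matched to.
    Returns up to `top` entries of ((tx, ty, scale), votes), where
    (tx, ty, scale) is the centre of a well-supported accumulator bin.
    """
    votes = Counter()
    log_base = math.log(scale_base)
    for (mx, my, msize), (ix, iy, isize) in matches:
        scale = isize / msize
        # Translation that maps the scaled model PAS onto the image PAS.
        tx = ix - scale * mx
        ty = iy - scale * my
        key = (
            math.floor(tx / pos_bin),
            math.floor(ty / pos_bin),
            round(math.log(scale) / log_base),  # logarithmic scale bins
        )
        votes[key] += 1
    results = []
    for (bx, by, bs), count in votes.most_common(top):
        centre = ((bx + 0.5) * pos_bin, (by + 0.5) * pos_bin, scale_base ** bs)
        results.append((centre, count))
    return results
```

The strongest bins would then serve as starting points for the deformable matching in step 4; consistent matches reinforce each other while isolated mismatches end up alone in their bins.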
Outline object in test image, without segmented training images!
A few preliminary results