Composite Statistical Modeling in Segmentation

Fuxin Li

Georgia Institute of Technology

http://www.cc.gatech.edu/~fli/

Collaborators

Joao Carreira, Cristian Sminchisescu, Guy Lebanon, Ahmad Humayun, David Tsai, James M. Rehg, Taeyoung Kim

Outline

• Composite statistical modeling in semantic segmentation
– Learning
– Inference
• Composite statistical modeling in video segmentation
– Learning

Recognizing Objects in a Scene

• Given an image, identify the category and spatial extent of all relevant objects
– a.k.a. semantic segmentation (Shotton et al. 2006, 2008; Csurka and Perronnin 2010; Boix et al. 2010; Ladicky et al. 2010; Bourdev and Malik 2009; Bourdev et al. 2010; Xia et al. 2013; Yadollahpour et al. 2013; Z. Li et al. 2013)

[Figure: an example image shown with its category labels (Horse, Person) and its object labels (Obj 1 to Obj 4)]

Semantic Segmentation

Multiple Segmentation Hypotheses

- First used in the Bonn entry (Carreira, Li, Sminchisescu) that won the PASCAL VOC Segmentation Challenge 2009
- New algorithm RIGOR (CVPR 2014) achieves CPMC accuracy in 2-4 seconds per image (CPU-only)

Semantic Segmentation: Learning

SVRSEGM: Regression on overlap

• Regress on maximal class-specific overlap

Overlap: $\frac{|A \cap GT|}{|A \cup GT|}$ for a segment $A$ and a ground-truth object $GT$

Overlap with the Horse class (maximized over the 2 horses) for three example segments: 80.8%, 36.5%, 4.7%

Li, Carreira, Sminchisescu, CVPR 2010, IJCV 2012
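The regression target described on this slide can be sketched in a few lines. This is only an illustration, assuming segments and ground-truth objects are represented as sets of pixel indices; the function names and the toy data are hypothetical and not taken from the SVRSEGM code.

```python
def overlap(segment, ground_truth):
    """Intersection over union of two pixel sets."""
    union = segment | ground_truth
    if not union:
        return 0.0
    return len(segment & ground_truth) / len(union)


def class_overlap(segment, objects_of_class):
    """Maximal overlap of a segment with any ground-truth object of one class."""
    return max((overlap(segment, gt) for gt in objects_of_class), default=0.0)


# Toy 1-D "image" with 20 pixels and two ground-truth horses (made-up data).
horse_a = set(range(0, 8))
horse_b = set(range(12, 20))
segment = set(range(2, 10))

# Regression target for the Horse class: best overlap over both horses.
target = class_overlap(segment, [horse_a, horse_b])
print(target)
```

Taking the maximum over objects means a segment is scored by the horse it fits best, which is why one regressor per class is enough even when several instances are present.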

SVRSEGM

• 1-vs-all class-specific overlap regression on many segment hypotheses
• Heuristic sequential post-processing

Composite Statistical Modeling

• Composite Statistical Learning: for each image in the training set, generate segments (bottom-up), extract features on the segments, and learn models that predict segment statistics (overlap)
• Composite Statistical Inference: for a testing image, generate segments, predict their statistics (overlap), and recover pixel labels from the predictions

Open Inference Problem

• Resolve noisy predictions on noisy segments
• Identify complicated object interactions, especially occluded/disconnected objects

Goal:

[Figure: the target category labeling and object labeling of an example image]

Semantic Segmentation: Inference

Li, Carreira, Lebanon, Sminchisescu, CVPR 2013

Idea #1: Break and Recombine

• Break the segments apart and recombine them

– Initial enumerations are constrained

• e.g. continuity, boundary adherence

– Interactions among objects
• Create occlusions!

Dissecting Segments

Segment   Chair   Person
Seg #1    0.53    0.29
Seg #2    0.36    0.47
Seg #3    0.34    0.54
Seg #4    0.19    0.43

[Figure: the segments dissected into superpixels 1 to 7]

Generating the Overlap Statistic

• Parametrize on superpixels: $\theta_{ij}|S_i|$ = number of category $c_j$ ground-truth pixels in superpixel $S_i$

$$V_j(A_1;\theta) = \frac{|A_1 \cap GT|}{|A_1 \cup GT|} = \frac{\text{all GT pixels in } A_1}{|A_1| + \text{all GT pixels outside } A_1} = \frac{\sum_{S_i \in A_1} \theta_{ij}|S_i|}{\sum_{S_i \in A_1} |S_i| + \sum_{S_i \notin A_1} \theta_{ij}|S_i|}$$

[Figure: four superpixels $S_1$ to $S_4$, where $\theta_{i1}$ is the % of chair and $\theta_{i2}$ the % of person in $S_i$; segment $A_1$ has predicted overlaps Chair 0.53, Person 0.29]
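As a rough illustration of the formula on this slide, the sketch below evaluates the generated overlap of one segment for one category from per-superpixel fractions. The function name, the argument layout, and the numbers are assumptions made for the example; they do not come from the CSI code.

```python
def generated_overlap(theta_j, sizes, in_segment):
    """V_j(A; theta) for one category j.

    theta_j[i]    fraction of superpixel S_i that belongs to category j
    sizes[i]      |S_i|, the pixel count of superpixel S_i
    in_segment[i] True if S_i is part of the segment A
    """
    rows = list(zip(theta_j, sizes, in_segment))
    inside = sum(t * s for t, s, a in rows if a)        # GT pixels in A
    area = sum(s for _, s, a in rows if a)              # |A|
    outside = sum(t * s for t, s, a in rows if not a)   # GT pixels outside A
    denom = area + outside
    return inside / denom if denom > 0 else 0.0


# Made-up example: four equally sized superpixels, segment A = {S_1, S_2}.
theta_chair = [1.0, 0.5, 0.0, 0.5]
sizes = [100, 100, 100, 100]
in_a = [True, True, False, False]

v_chair = generated_overlap(theta_chair, sizes, in_a)
print(v_chair)
```

Here 150 chair pixels lie inside the segment, the segment covers 200 pixels, and 50 chair pixels lie outside it, so the generated overlap is 150 / 250.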

Idea #2: Composite Statistical Inference

• MCLE (maximum composite likelihood estimation), moment matching:

$$\min_\theta \sum_{j=1}^{c} \sum_{i=1}^{m} \left( V_j(A_i;\theta) - V_j(A_i) \right)^2$$

• $A_i$: segments
• $V_j(A_i;\theta)$: "generated" statistic
• $V_j(A_i)$: overlap with category $c_j$ predicted from the regressor

Jointly over all categories + all segments!
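The moment-matching objective can be made concrete with a small NumPy sketch. It is a simplified stand-in, not the authors' solver: it handles a single category, uses plain projected gradient descent with step halving (an assumption, since the slide does not say which optimizer is used), and the "predictions" are synthetic noise-free overlaps generated from a made-up θ.

```python
import numpy as np


def generated(theta, sizes, member):
    """Numerator and denominator of V(A_i; theta) for every segment A_i."""
    mass = theta * sizes                           # category pixels per superpixel
    num = member @ mass                            # category pixels inside A_i
    den = member @ sizes + (1.0 - member) @ mass   # |A_i| + category pixels outside
    return num, den


def loss(theta, sizes, member, predicted):
    num, den = generated(theta, sizes, member)
    return float(np.sum((num / den - predicted) ** 2))


def fit_theta(sizes, member, predicted, steps=500):
    """Projected gradient descent on the squared mismatch, theta kept in [0, 1]."""
    theta = np.full(sizes.shape, 0.5)
    lr = 1.0
    for _ in range(steps):
        num, den = generated(theta, sizes, member)
        resid = num / den - predicted
        # dV_i/dtheta_k: s_k / den_i inside A_i, -num_i * s_k / den_i^2 outside
        d_v = (member * (sizes / den[:, None])
               - (1.0 - member) * (sizes * (num / den ** 2)[:, None]))
        grad = 2.0 * (resid @ d_v)
        candidate = np.clip(theta - lr * grad, 0.0, 1.0)
        if loss(candidate, sizes, member, predicted) < loss(theta, sizes, member, predicted):
            theta = candidate
        else:
            lr *= 0.5                              # step too long, shrink it
    return theta


# Made-up problem: 4 superpixels, 5 segments given as membership rows.
sizes = np.array([100.0, 100.0, 100.0, 100.0])
member = np.array([[1, 1, 0, 0],
                   [0, 1, 1, 0],
                   [0, 0, 1, 1],
                   [1, 0, 0, 0],
                   [0, 0, 0, 1]], dtype=float)
true_theta = np.array([1.0, 0.5, 0.0, 0.5])
num, den = generated(true_theta, sizes, member)
predicted = num / den                              # stands in for regressor output

initial_loss = loss(np.full(4, 0.5), sizes, member, predicted)
theta_fit = fit_theta(sizes, member, predicted)
final_loss = loss(theta_fit, sizes, member, predicted)
print(initial_loss, final_loss, theta_fit)
```

With several categories the same residuals would simply be summed over j as in the objective above; any coupling between categories is left out of this sketch.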

Joint optimization

• 𝜃 map after joint optimization on all objects:

[Figure: θ maps for the Chair and Person categories]

Idea #3: Separating Multiple Objects from the Same Category

• MAP within each category to determine number of objects
– Geometric prior favors fewer objects

θ map:

$n_{horse} = 1$   Posterior: -40.133
$n_{horse} = 2$   Posterior: -35.889
$n_{horse} = 3$   Posterior: -47.600
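Reading the three posterior values on this slide as scores to be maximized, the MAP choice of the object count is a plain argmax. The snippet below only replays the numbers from the slide; how the posterior itself is computed is not shown here.

```python
# Posterior values for the Horse category, copied from the slide.
posterior = {1: -40.133, 2: -35.889, 3: -47.600}

# MAP estimate: the object count with the highest posterior.
n_horse = max(posterior, key=posterior.get)
print(n_horse)
```

The selected count is two horses.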

Joint optimization

• θ map after joint optimization on all objects:

[Figure: θ maps for the Horse and Person categories]

Final Result:

[Figure: final segmentation into Obj1 to Obj4]

Results: PASCAL 2012

• CSI does especially well on high-interaction objects such as bike, person, chair, sofa, etc.

With only PASCAL training data, only overlap:

SVRSEGM   JSL     CSI
46.8%     47.0%   47.5%

+ mix of models, more data:

Xia et al. 2013   Yadollahpour et al. 2013   Li et al. 2013
48.0%             48.1%                      48.3%

[Figure: example segmentations labeled Person, Horse, Plant, Chair, Table, Dog, Bike]

PASCAL: noise-free case

• Supply ground truth overlap to different algorithms
– Upper bound performance with perfect regressor, noisy segments
• Recombination is important!

SVRSEGM   CPMC Best   CSI     Superpixel Best
79.0%     81.8%       90.2%   95.1%

[Figure: example results labeled Person, Motorbike and Obj 1 to Obj 3]

Composite Statistical Modeling

• Composite Statistical Learning: for each image in the training set, generate segments (bottom-up), extract features on the segments, and perform regression to learn a model of class-specific overlap
• Composite Statistical Inference (top-down): for a testing image, generate segments, predict their overlap, then break and recombine the segments

Video Segmentation

Li, Kim, Humayun, Tsai, Rehg, ICCV 2013

Approach

• Track all segments from each frame
– Long-term appearance model for each track
– Every segment starts a track (1000+ tracks)
– Training: use all segments, regress against overlap with each track (0-1 segment per frame)

[Figure: example tracks 1, 2 and 3]

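Least squares works wonders

$$\min_{\mathbf{W}} \|\mathbf{X}\mathbf{W} - \mathbf{Y}\|_F^2 + \lambda \|\mathbf{W}\|_F^2$$

$$\mathbf{W} = (\mathbf{X}^\top\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^\top\mathbf{Y}$$

(each row of $\mathbf{X}$ holds the features of one segment, each column of $\mathbf{Y}$ the regression targets of one appearance model)

Store one vector per appearance model plus a global covariance matrix $\mathbf{X}^\top\mathbf{X}$

Enables learning and optimal online updating of 1000+ appearance models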
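A minimal NumPy sketch of why the shared matrix helps: all appearance models use the same $\mathbf{X}^\top\mathbf{X}$, so one linear solve serves every track. It assumes the row convention (one segment per row of X, one track per column of Y); the sizes, the random data, and the variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_segments, n_features, n_tracks = 200, 16, 5

X = rng.normal(size=(n_segments, n_features))   # one row of features per segment
Y = rng.uniform(size=(n_segments, n_tracks))    # one column of overlaps per track
lam = 1.0                                       # ridge regularizer (lambda)

gram = X.T @ X          # global matrix, shared by every appearance model
moments = X.T @ Y       # one vector (column) per appearance model

# Closed-form ridge solution for all tracks at once.
W = np.linalg.solve(gram + lam * np.eye(n_features), moments)
print(W.shape)
```

Adding a track only adds one column to `moments`; the `gram` matrix is untouched, which is what keeps 1000+ models affordable.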

How to use that in video?

• Always use the whole segment pool to train
– If we go from the 1st to the 20th frame, our training set is always all the segments in all the frames, for ANY target
– Online update: at each frame, add all segments from the frame to $\mathbf{X}^\top\mathbf{X}$ and $\mathbf{X}^\top\mathbf{y}$
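The online update can be sketched as two running sums. The class below is a hypothetical illustration (its name, its methods, and the random frames are assumptions, not the released code); it checks that accumulating frame by frame gives the same weights as training once on all frames.

```python
import numpy as np


class OnlineLeastSquares:
    """Accumulates X^T X and X^T Y so models can be refit after every frame."""

    def __init__(self, n_features, n_tracks, lam=1.0):
        self.gram = np.zeros((n_features, n_features))   # running X^T X
        self.moments = np.zeros((n_features, n_tracks))  # running X^T Y
        self.lam = lam

    def add_frame(self, x_frame, y_frame):
        """Add all segments of one frame (rows) and their overlap targets."""
        self.gram += x_frame.T @ x_frame
        self.moments += x_frame.T @ y_frame

    def weights(self):
        d = self.gram.shape[0]
        return np.linalg.solve(self.gram + self.lam * np.eye(d), self.moments)


# Made-up data: 20 frames, 30 segments per frame, 8 features, 3 tracks.
rng = np.random.default_rng(0)
frames = [(rng.normal(size=(30, 8)), rng.uniform(size=(30, 3))) for _ in range(20)]

model = OnlineLeastSquares(n_features=8, n_tracks=3, lam=1.0)
for x_frame, y_frame in frames:
    model.add_frame(x_frame, y_frame)
W_online = model.weights()

# Batch solution on the whole segment pool, for comparison.
X_all = np.vstack([f[0] for f in frames])
Y_all = np.vstack([f[1] for f in frames])
W_batch = np.linalg.solve(X_all.T @ X_all + np.eye(8), X_all.T @ Y_all)
print(np.allclose(W_online, W_batch))
```

Because only the sums are stored, memory does not grow with the number of frames, and the result is exact rather than an approximation of the batch fit.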

Greedy Trimming of Tracks

• Test on the next frame
– Obtain the regression result of every segment against every track
– Choose best-scoring segment to match
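The matching step can be illustrated as a score matrix followed by a per-track argmax. This sketch assumes a weight matrix with one column per track and covers only the "choose best-scoring segment" part; how tracks are trimmed afterwards is not specified on the slide and is not modeled here. The function name and the toy numbers are hypothetical.

```python
import numpy as np


def match_segments(x_next, weights):
    """Score every segment of the next frame against every track.

    x_next   (n_segments, n_features) features of the next frame's segments
    weights  (n_features, n_tracks) one appearance model per column
    Returns the best segment index and its score for each track.
    """
    scores = x_next @ weights                     # (n_segments, n_tracks)
    best = scores.argmax(axis=0)                  # best segment per track
    best_scores = scores[best, np.arange(scores.shape[1])]
    return best, best_scores


# Toy example: identity weights, so the scores equal the features themselves.
x_next = np.array([[0.1, 0.6],
                   [0.8, 0.2],
                   [0.3, 0.5]])
weights = np.eye(2)

best, best_scores = match_segments(x_next, weights)
print(best, best_scores)
```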

Results

• Automatically reduce number of tracks from 1200 (CPMC) to 60 per sequence

Numbers

• We beat the closest competitor by 14%
• CSI refinement improves results by 3%
• Purely automatic, no user input

                        SPT    SPT+CSI   Pairwise   Kim et al. 2011   Grundmann et al. 2010   Oracle Segment
Mean per object         62.7   65.9      55.4       45.3              51.8                    78.6
Mean per sequence       68.0   71.2      58.6       57.3              50.8                    81.5
Avg. number of tracks   60.0   60.0      702.8      10.6              336.6                   1219.3

Conclusion

• Composite statistical modeling
– Holistic segments, object-scale models
– Training is a breeze (regression)
– Least squares offers additional benefits
– Breaking down and recombining segments for refinement
– Refinement will be needed when we are going from 85% to 90%
• Or when inferring higher-level semantics, occlusion, etc.

Thanks!

[Figure: example segmentation labeled Sofa, Person, Bottle]

Code available!

Video Segmentation: http://www.cc.gatech.edu/~fli/SegTrack2/

Semantic Segmentation: http://www.cc.gatech.edu/~fli/CSI_tr2.pdf