Learning Spatial Context: Using Stuff to Find Things
Wei-Cheng Su
Motivation
- Leverage contextual information to enhance detection
- Some context objects are non-rigid and are more naturally classified based on texture or color, e.g., sky, trees, road
- Find the relationships between the stuff of the context and the objects
Outline
- Training and inferring
- Preprocessing
- Experimental results
  - Things-and-stuff relationships
  - Performance
  - Effect of parameters
- Conclusion
Training
- Segmentation produces region features & centroids
- Detection produces candidate boxes & scores
- Annotation provides the ground truths
- Learning combines these to produce the things-and-stuff relationships and the model parameters
*Red boxes indicate high scores; blue boxes indicate low scores
Inferring
- Segmentation produces region features & centroids
- Detection produces candidate boxes & prior scores
- Inference produces posterior scores for all candidates
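As a rough illustration of how inference turns the detector's prior scores into posterior scores, here is a toy Gibbs-style sampler. All names are hypothetical: the per-candidate `context_boost` is a fixed stand-in for the learned things-and-stuff contribution, which in the real model depends on the sampled region cluster labels.

```python
import math
import random

def gibbs_posterior_scores(prior_scores, context_boost, iters=100, seed=0):
    """Toy Gibbs-style sampler: each candidate j has a binary 'correct'
    indicator T_j.  Its conditional combines the detector's prior score
    with a context term; `context_boost[j]` is a fixed stand-in for the
    learned things-and-stuff contribution (an assumption of this sketch)."""
    rng = random.Random(seed)
    n = len(prior_scores)
    T = [rng.random() < p for p in prior_scores]  # initialise from the prior
    counts = [0] * n
    for _ in range(iters):
        for j in range(n):
            # log-odds = prior log-odds + context contribution
            logit = math.log(prior_scores[j] / (1.0 - prior_scores[j])) + context_boost[j]
            p = 1.0 / (1.0 + math.exp(-logit))
            T[j] = rng.random() < p
            counts[j] += T[j]
    # posterior score = fraction of sweeps in which the candidate was 'on'
    return [c / iters for c in counts]
```

Candidates whose context agrees with them drift above their prior score; candidates in implausible context drift below it.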
Preprocessing
Segmentation
- Superpixel segmentation
- Pentium-D 2.4 GHz, 4 GB RAM: runs out of memory on a 792x636 image; ~6.4 minutes for a 480x321 image
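The segmentation output reaches TAS as per-region features and centroids. A minimal sketch of that bookkeeping, assuming a precomputed integer label map (function and field names are illustrative, not from the paper's code):

```python
import numpy as np

def region_features(labels, image):
    """Per-region features: centroid (row, col) and mean colour.
    `labels` is an HxW integer array from any segmentation (the talk
    uses Mori's superpixel code); `image` is HxWx3.  Field names are
    illustrative."""
    feats = {}
    for r in np.unique(labels):
        mask = labels == r
        ys, xs = np.nonzero(mask)
        feats[r] = {
            "centroid": (ys.mean(), xs.mean()),   # mean pixel position
            "mean_color": image[mask].mean(axis=0),
        }
    return feats
```

The full feature vector in the talk has 44 entries (color, texture, shape); this sketch only shows the two used downstream by the spatial relationships.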
Detection
- HOG for detecting humans, cars, bicycles, and motorbikes
- Patch-based boosted detector for detecting cars in satellite images
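The base detector for humans, cars, bicycles, and motorbikes is HOG. A stripped-down sketch of the descriptor itself, without the block normalisation or SVM scoring the real detector includes:

```python
import numpy as np

def hog_descriptor(img, cell=8, bins=9):
    """Minimal HOG sketch: gradient orientation histograms over
    `cell` x `cell` cells.  `img` is a 2-D grayscale array.  Omits
    block normalisation, so this is illustrative only."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180          # unsigned orientation
    h, w = img.shape
    ch, cw = h // cell, w // cell
    hist = np.zeros((ch, cw, bins))
    for i in range(ch):
        for j in range(cw):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            a = ang[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            idx = (a / (180 / bins)).astype(int) % bins  # orientation bin
            for b in range(bins):
                hist[i, j, b] = m[idx == b].sum()        # magnitude-weighted vote
    return hist
```

A sliding window of such histograms, fed to a linear classifier, is the shape of the Dalal-Triggs pipeline the talk builds on.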
Segmentation
- This level of segmentation result is used
(Detection result images from the base detectors:)
HOG: Cars
HOG: People
HOG: Motorbikes
HOG: Bicycles
Satellite
Satellite (Th = 0)
Satellite (Th = 0.95)
Satellite (Th = 0.99)
Satellite (Th = 0.995)
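The satellite slides sweep the detection threshold Th over {0, 0.95, 0.99, 0.995}; counting the candidates surviving each setting is straightforward. A hypothetical helper, not from the talk's code:

```python
def sweep_thresholds(candidates, thresholds=(0.0, 0.95, 0.99, 0.995)):
    """Count the detections whose score exceeds each threshold Th.
    `candidates` is a list of dicts with a 'score' entry (an assumed
    representation; the talk's data structures are not shown)."""
    return {th: sum(1 for c in candidates if c["score"] > th)
            for th in thresholds}
```

Raising Th trades recall for precision, which is what the four satellite slides visualise.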
Running TAS
- Run TAS inference on all detected candidates
- False positives detected by the base detector will be filtered out
- Objects missed by the base detector cannot be recovered by TAS
- Data sets: VOC2005 and Google Earth satellite images
Base Detector vs TAS
(Example images. Left: base detector result; right: TAS result.)
Things-and-Stuff Relationships
- Feature description: 44 features, including color, texture, and shape
- The relationships are learnt during training
- The relationships change the score of a candidate
- 25 relationship candidates
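One plausible shape for a candidate relationship is "a region of stuff cluster c lies in position p relative to the candidate box". The discretisation below is an illustrative guess, not the paper's exact 25-candidate pool:

```python
def relationship(region_centroid, region_cluster, box):
    """Describe one candidate things-and-stuff relationship: the stuff
    cluster a region belongs to, plus where its centroid lies relative
    to the candidate box.  Coordinates are (x, y); `box` is
    (x0, y0, x1, y1).  All names are hypothetical."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    rx, ry = region_centroid
    if x0 <= rx <= x1 and y0 <= ry <= y1:
        pos = "inside"
    elif abs(rx - cx) >= abs(ry - cy):
        pos = "left" if rx < cx else "right"
    else:
        pos = "above" if ry < cy else "below"
    return (region_cluster, pos)
```

Training then learns which (cluster, position) pairs are predictive of a correct detection, and those pairs adjust the candidate's score at inference time.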
Relationships
(Example images of the learned relationships.)
- Some regions inside the bounding box have relationships with the candidate
- Viewpoint: different viewpoints generate different relationships, so region features might be misleading
- The backgrounds are diverse; the region features inside the bounding box might be a complementary cue to the features used by the base detector
Performance Analysis
- Training samples: 15; test samples: 15
- Image size: 792x636
- Test machine: Core(TM)2 [email protected], 8 GB RAM
- Implemented in Matlab
- Detection and segmentation are not included in the timings
- Required computing power:
  - Learning: 2141.67 seconds of CPU time
  - Inferring: 63.89 seconds of CPU time
Base Detector vs TAS: Cars and People
- Red: base detector; blue: TAS
Base Detector vs TAS: Motorbikes and Bicycles
- Red: base detector; blue: TAS
Base Detector vs TAS: Satellite
Number of Region Clusters
- Red: 10
- Blue: 3, 5, 20, or 30
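The region-cluster count is the k used to quantise region features into stuff clusters (red curve: k = 10; blue: 3, 5, 20, 30). A minimal pure-NumPy k-means sketch with illustrative names; the talk does not show its clustering code:

```python
import numpy as np

def kmeans(feats, k, iters=20, seed=0):
    """Minimal k-means used to quantise region feature vectors into
    `k` stuff clusters.  `feats` is an (n, d) float array.  Returns
    the cluster assignment of each region and the cluster centers."""
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), k, replace=False)]  # random init
    for _ in range(iters):
        # squared distance from every region to every center
        d = ((feats[:, None, :] - centers[None]) ** 2).sum(-1)
        assign = d.argmin(1)
        for c in range(k):
            if (assign == c).any():
                centers[c] = feats[assign == c].mean(0)        # recenter
    return assign, centers
```

Too few clusters blur distinct stuff types together; too many fragment them, which is the trade-off the red/blue curves on this slide compare.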
Number of Gibbs Iterations
- Red: 10
- Blue: 20 or 100
Conclusion
- Can be easily integrated with existing detectors
- The performance is dependent on the base detector
- The "stuff" can come from the context as well as the object itself
- Especially suitable for datasets with consistent backgrounds and viewpoints, e.g., aerial images
- 3D information could be used to improve the performance
References
- Geremy Heitz and Daphne Koller. Learning Spatial Context: Using Stuff to Find Things. European Conference on Computer Vision (ECCV), 2008.
- TAS: http://ai.stanford.edu/~gaheitz/Research/TAS/
- Superpixel: http://www.cs.sfu.ca/~mori/research/superpixels
- HOG implementation: http://pascal.inrialpes.fr/soft/olt
- PASCAL VOC2005: http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2005/index.html