
Learning Spatial Context: Using Stuff to Find Things

Wei-Cheng Su

Motivation (slide 2)

- Leverage contextual information to enhance detection
- Some context objects are non-rigid and are more naturally classified based on texture or color, e.g. sky, trees, road
- Find the relationships between the stuff of the context and the object

Outline (slide 3)

- Training and inferring
- Preprocessing
- Experimental results
  - Things-and-stuff relationships
  - Performance
  - Effect of parameters
- Conclusion

Training (slide 4)

[Training pipeline diagram] Segmentation yields region features and centroids; detection yields candidate boxes and scores; annotation provides the ground truths. From these, the things-and-stuff relationships and the model parameters are learned.

*Red boxes indicate high scores; blue boxes indicate low scores.
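As a rough illustration of what the learning step produces (not the actual TAS code, which learns the stuff clusters and the relationships jointly), a Laplace-smoothed frequency ratio over annotated true and false candidates turns each discrete relationship into an odds multiplier. The relationship names below are hypothetical:

```python
from collections import Counter

def learn_relationship_lr(pos_rels, neg_rels, smoothing=1.0):
    """Sketch of the learning step: from relationships observed around
    annotated true candidates (pos_rels) and false candidates (neg_rels),
    estimate how much each relationship shifts the odds that a candidate
    is a real object. Laplace smoothing avoids zero counts."""
    pos, neg = Counter(pos_rels), Counter(neg_rels)
    rels = set(pos) | set(neg)
    lr = {}
    for r in rels:
        p = (pos[r] + smoothing) / (len(pos_rels) + smoothing * len(rels))
        q = (neg[r] + smoothing) / (len(neg_rels) + smoothing * len(rels))
        lr[r] = p / q  # >1 means the relationship supports the detection
    return lr
```

With, say, "road below the box" common around true car detections and rare around false ones, the learned multiplier comes out above 1, and vice versa.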

Inferring (slide 5)

[Inference pipeline diagram] Segmentation yields region features and centroids; detection yields candidate boxes with prior scores. The learned model then produces posterior scores for all candidates.
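A minimal sketch of how a context term could turn a detector's prior score into a posterior. The real TAS model instead marginalizes over stuff-cluster assignments with Gibbs sampling; `context_lr`, an aggregate likelihood ratio from the candidate's relationships, is an assumption of this sketch:

```python
def posterior_score(prior, context_lr):
    """Combine the base detector's prior probability (assumed in (0, 1))
    with a likelihood ratio contributed by the candidate's relationships
    to surrounding stuff regions. context_lr > 1 raises the score,
    context_lr < 1 lowers it."""
    prior_odds = prior / (1.0 - prior)
    post_odds = prior_odds * context_lr
    return post_odds / (1.0 + post_odds)
```

A neutral context (`context_lr = 1`) leaves the detector's score unchanged, which matches the slide's point that TAS can only re-rank candidates the base detector proposes.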


Preprocessing (slide 7)

Segmentation
- Superpixel segmentation
- On a Pentium-D 2.4 GHz with 4 GB RAM: runs out of memory with a 792x636 image; ~6.4 minutes for a 480x321 image

Detection
- HOG for detecting humans, cars, bicycles, and motorbikes
- Patch-based boosted detector for detecting cars in satellite images
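The HOG detector itself is external (the reference slide links an implementation), but its core feature, per-cell histograms of gradient orientations, can be sketched in a few lines. This toy version skips block normalization and sliding-window classification:

```python
import numpy as np

def hog_features(img, cell=8, bins=9):
    """Toy HOG-style descriptor for a grayscale image: per-cell
    histograms of unsigned gradient orientation, weighted by gradient
    magnitude. Only an illustration of the feature, not the detector."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    h, w = img.shape
    ch, cw = h // cell, w // cell
    bin_idx = (ang / (180.0 / bins)).astype(int) % bins
    hist = np.zeros((ch, cw, bins))
    for i in range(ch):
        for j in range(cw):
            ys = slice(i * cell, (i + 1) * cell)
            xs = slice(j * cell, (j + 1) * cell)
            for b in range(bins):
                hist[i, j, b] = mag[ys, xs][bin_idx[ys, xs] == b].sum()
    return hist.ravel()
```

On a vertical step edge, all the gradient energy lands in the horizontal-orientation bin, which is the kind of structure the detector's linear classifier picks up on.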

Segmentation (slide 8)

[Superpixel segmentation example] This level of segmentation result is used.
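The deck uses Mori's superpixel code for this step. As a rough stand-in for oversegmentation, clustering pixels on (row, column, intensity) gives compact, appearance-coherent regions in the same spirit; the intensity weighting here is an arbitrary choice of this sketch:

```python
import numpy as np

def oversegment(img, k=4, iters=10, seed=0):
    """Stand-in for superpixel oversegmentation: k-means on
    (row, col, intensity) so clusters are spatially compact and
    appearance-coherent. Intensity is scaled by the image size so
    position and appearance have comparable influence (an assumption)."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    feats = np.stack([ys.ravel(), xs.ravel(),
                      img.ravel() * max(h, w)], axis=1).astype(float)
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), k, replace=False)]
    for _ in range(iters):
        d = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for c in range(k):
            pts = feats[labels == c]
            if len(pts):
                centers[c] = pts.mean(0)
    return labels.reshape(h, w)
```

The resulting regions then feed the feature-and-centroid extraction in the pipeline above.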

HoG Detection Examples (slides 9-12)

[Base HOG detector results for cars, people, motorbikes, and bicycles]

Satellite (slides 13-17)

[Satellite car-detection results at increasing score thresholds: Th = 0, 0.95, 0.99, 0.995]
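The threshold sweep on those slides is plain score filtering over the detector's candidate list; a minimal sketch (the dictionary layout is assumed):

```python
def filter_candidates(candidates, th):
    """Keep only detections whose score exceeds the threshold. Raising
    Th from 0 toward 0.995 discards ever more candidates, trading
    recall for precision as in the satellite slides."""
    return [c for c in candidates if c["score"] > th]
```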


Running TAS (slide 19)

- Run TAS inference on all detected candidates
- False positives from the base detector will be filtered out
- Objects not detected by the base detector cannot be detected by TAS
- Data sets: VOC2005, Google Earth satellite images

Base Detector vs TAS (slides 20-27)

[Example detections] Left: base detector result. Right: TAS result.

Base Detector vs TAS (slides 28-33)

[Alternating full-page examples: base detector result followed by the corresponding TAS result]


Things-and-Stuff Relationships (slide 35)

- Feature description: 44 features, including color, texture, shape
- The relationships are learnt during training
- The relationships change the score of a candidate
- 25 relationship candidates
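One way a candidate box and a stuff region could be quantized into a discrete relationship is by relative position, "inside" or one of several direction bins. This scheme is hypothetical; TAS learns which of its 25 candidate relationships are actually informative:

```python
import math

def spatial_relationship(box, centroid, n_dirs=8):
    """Hypothetical quantization of a (candidate box, region centroid)
    pair into a discrete relationship: 'inside' if the centroid falls
    within the box, otherwise one of n_dirs direction bins measured
    from the box center."""
    x0, y0, x1, y1 = box
    cx, cy = centroid
    if x0 <= cx <= x1 and y0 <= cy <= y1:
        return "inside"
    bx, by = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    ang = math.atan2(cy - by, cx - bx) % (2 * math.pi)
    return f"dir_{int(ang / (2 * math.pi / n_dirs)) % n_dirs}"
```

Each such discrete relationship, paired with the region's stuff cluster, is what shifts a candidate's score during inference.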

Relationships (slides 36-55)

[Example visualizations of learned relationships between detection candidates and stuff regions]

Relationships (slide 56)

Some regions inside the bounding box have relationships with the candidate.

Relationships (slide 57)

- Viewpoint: different viewpoints generate different relationships
- Region features might be misleading

Relationships (slide 58)

- The diversity of the backgrounds
- The region features inside the bounding box might be a complementary cue to the features used by the base detector


Performance Analysis (slide 60)

- Training samples: 15; test samples: 15
- Image size: 792x636
- Test machine: Core(TM)2 Quad @ 2.40 GHz, 8 GB RAM
- Implemented in Matlab
- Detection and segmentation time is not included
- Required computing power:
  - Learning: 2141.67 seconds of CPU time
  - Inferring: 63.89 seconds of CPU time

Base Detector vs TAS (slide 61)

[Curves for Cars and People] Red: base detector. Blue: TAS.

Base Detector vs TAS - Motorbikes and Bicycles (slide 62)

[Curves for Motorbikes and Bicycles] Red: base detector. Blue: TAS.

Base Detector vs TAS - Satellite (slide 63)


Number of Region Clusters (slide 65)

[Curves comparing cluster counts] Red: 10. Blue: 3, 5, 20, and 30 (one per panel).

Number of Gibbs Iterations (slide 66)

[Curves comparing iteration counts] Red: 10. Blue: 20 and 100.
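Why the iteration count matters can be seen on a toy chain: a Gibbs sampler's marginal estimates steady out as iterations grow. This two-variable example is only an illustration of the mechanism, not the TAS model:

```python
import random

def gibbs_marginal(iters, seed=0):
    """Toy Gibbs sampler: two binary variables that prefer to agree
    (each resample copies the other with probability 0.8). Returns the
    Monte Carlo estimate of P(a = 1); by symmetry the true value is 0.5,
    and more iterations give a steadier estimate."""
    random.seed(seed)
    a, b = 0, 0
    ones = 0
    for _ in range(iters):
        a = b if random.random() < 0.8 else 1 - b
        b = a if random.random() < 0.8 else 1 - a
        ones += a
    return ones / iters
```

With few iterations the estimate swings widely; slide 66 shows the analogous effect on detection performance when TAS's own sampler runs for 10, 20, or 100 iterations.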


Conclusion (slide 68)

- Can be easily integrated with detectors
- The performance is dependent on the detector
- The "stuff" can come from the context as well as the object itself
- Especially suitable for datasets with consistent backgrounds and viewpoints, e.g. aerial images
- 3D information could be used to improve the performance

Reference (slide 69)

- Learning Spatial Context: Using Stuff to Find Things. Geremy Heitz and Daphne Koller. European Conference on Computer Vision (ECCV), 2008
- TAS: http://ai.stanford.edu/~gaheitz/Research/TAS/
- Superpixel: http://www.cs.sfu.ca/~mori/research/superpixels
- HOG implementation: http://pascal.inrialpes.fr/soft/olt
- PASCAL VOC2005: http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2005/index.html