Learning Spatial Context: Using Stuff to Find Things
Wei-Cheng Su
Motivation
- Leverage contextual information to enhance detection
- Some context objects are non-rigid and are more naturally classified based on texture or color, e.g., sky, trees, road
- Find the relationships between the stuff of the context and the objects
Outline
- Training and inferring
- Preprocessing
- Experimental results
  - Things-and-stuff relationships
  - Performance
  - Effect of parameters
- Conclusion
Training
- Segmentation produces region features & centroids
- Detection produces candidate boxes & scores
- Annotation provides the ground truths
- Learning combines these to produce the things-and-stuff relationships and the model parameters
*Red boxes indicate high scores; blue boxes indicate low scores
Inferring
- Segmentation produces region features & centroids
- Detection produces candidate boxes & prior scores
- Inference produces posterior scores for all candidates
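As a rough illustration of how inference turns the detector's prior scores into posterior scores, here is a toy Gibbs-style sampler. All names are hypothetical: the per-candidate `context_boost` is a fixed stand-in for the learned things-and-stuff contribution, which in the real model depends on the sampled region cluster labels.

```python
import math
import random

def gibbs_posterior_scores(prior_scores, context_boost, iters=100, seed=0):
    """Toy Gibbs-style sampler: each candidate j has a binary 'correct'
    indicator T_j.  Its conditional combines the detector's prior score
    with a context term; `context_boost[j]` is a fixed stand-in for the
    learned things-and-stuff contribution (an assumption of this sketch)."""
    rng = random.Random(seed)
    n = len(prior_scores)
    T = [rng.random() < p for p in prior_scores]  # initialise from the prior
    counts = [0] * n
    for _ in range(iters):
        for j in range(n):
            # log-odds = prior log-odds + context contribution
            logit = math.log(prior_scores[j] / (1.0 - prior_scores[j])) + context_boost[j]
            p = 1.0 / (1.0 + math.exp(-logit))
            T[j] = rng.random() < p
            counts[j] += T[j]
    # posterior score = fraction of sweeps in which the candidate was 'on'
    return [c / iters for c in counts]
```

Candidates whose context agrees with them drift above their prior score; candidates in implausible context drift below it.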
Preprocessing
Segmentation
- Superpixel segmentation
- Pentium-D 2.4 GHz, 4 GB RAM: runs out of memory on a 792x636 image; ~6.4 minutes for a 480x321 image
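The segmentation output reaches TAS as per-region features and centroids. A minimal sketch of that bookkeeping, assuming a precomputed integer label map (function and field names are illustrative, not from the paper's code):

```python
import numpy as np

def region_features(labels, image):
    """Per-region features: centroid (row, col) and mean colour.
    `labels` is an HxW integer array from any segmentation (the talk
    uses Mori's superpixel code); `image` is HxWx3.  Field names are
    illustrative."""
    feats = {}
    for r in np.unique(labels):
        mask = labels == r
        ys, xs = np.nonzero(mask)
        feats[r] = {
            "centroid": (ys.mean(), xs.mean()),   # mean pixel position
            "mean_color": image[mask].mean(axis=0),
        }
    return feats
```

The full feature vector in the talk has 44 entries (color, texture, shape); this sketch only shows the two used downstream by the spatial relationships.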
Detection
- HOG for detecting humans, cars, bicycles, and motorbikes
- Patch-based boosted detector for detecting cars in satellite images
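The base detector for humans, cars, bicycles, and motorbikes is HOG. A stripped-down sketch of the descriptor itself, without the block normalisation or SVM scoring the real detector includes:

```python
import numpy as np

def hog_descriptor(img, cell=8, bins=9):
    """Minimal HOG sketch: gradient orientation histograms over
    `cell` x `cell` cells.  `img` is a 2-D grayscale array.  Omits
    block normalisation, so this is illustrative only."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180          # unsigned orientation
    h, w = img.shape
    ch, cw = h // cell, w // cell
    hist = np.zeros((ch, cw, bins))
    for i in range(ch):
        for j in range(cw):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            a = ang[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            idx = (a / (180 / bins)).astype(int) % bins  # orientation bin
            for b in range(bins):
                hist[i, j, b] = m[idx == b].sum()        # magnitude-weighted vote
    return hist
```

A sliding window of such histograms, fed to a linear classifier, is the shape of the Dalal-Triggs pipeline the talk builds on.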
Segmentation
- This level of segmentation result is used
(Detection result images from the base detectors:)
HOG: Cars
HOG: People
HOG: Motorbikes
HOG: Bicycles
Satellite
Satellite (Th = 0)
Satellite (Th = 0.95)
Satellite (Th = 0.99)
Satellite (Th = 0.995)
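The satellite slides sweep the detection threshold Th over {0, 0.95, 0.99, 0.995}; counting the candidates surviving each setting is straightforward. A hypothetical helper, not from the talk's code:

```python
def sweep_thresholds(candidates, thresholds=(0.0, 0.95, 0.99, 0.995)):
    """Count the detections whose score exceeds each threshold Th.
    `candidates` is a list of dicts with a 'score' entry (an assumed
    representation; the talk's data structures are not shown)."""
    return {th: sum(1 for c in candidates if c["score"] > th)
            for th in thresholds}
```

Raising Th trades recall for precision, which is what the four satellite slides visualise.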
Running TAS
- Run TAS inference on all detected candidates
- False positives detected by the base detector will be filtered out
- Objects missed by the base detector cannot be recovered by TAS
- Data sets: VOC2005 and Google Earth satellite images
Base Detector vs TAS
(Example images. Left: base detector result; right: TAS result.)
Things-and-Stuff Relationships
- Feature description: 44 features, including color, texture, and shape
- The relationships are learnt during training
- The relationships change the score of a candidate
- 25 relationship candidates
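One plausible shape for a candidate relationship is "a region of stuff cluster c lies in position p relative to the candidate box". The discretisation below is an illustrative guess, not the paper's exact 25-candidate pool:

```python
def relationship(region_centroid, region_cluster, box):
    """Describe one candidate things-and-stuff relationship: the stuff
    cluster a region belongs to, plus where its centroid lies relative
    to the candidate box.  Coordinates are (x, y); `box` is
    (x0, y0, x1, y1).  All names are hypothetical."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    rx, ry = region_centroid
    if x0 <= rx <= x1 and y0 <= ry <= y1:
        pos = "inside"
    elif abs(rx - cx) >= abs(ry - cy):
        pos = "left" if rx < cx else "right"
    else:
        pos = "above" if ry < cy else "below"
    return (region_cluster, pos)
```

Training then learns which (cluster, position) pairs are predictive of a correct detection, and those pairs adjust the candidate's score at inference time.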
Relationships
(Example images of the learned relationships.)
- Some regions inside the bounding box have relationships with the candidate
- Viewpoint: different viewpoints generate different relationships, so region features might be misleading
- The backgrounds are diverse; the region features inside the bounding box might be a complementary cue to the features used by the base detector
Performance Analysis
- Training samples: 15; test samples: 15
- Image size: 792x636
- Test machine: Core(TM)2 [email protected], 8 GB RAM
- Implemented in Matlab
- Detection and segmentation are not included in the timings
- Required computing power:
  - Learning: 2141.67 seconds of CPU time
  - Inferring: 63.89 seconds of CPU time
Base Detector vs TAS: Cars and People
- Red: base detector; blue: TAS
Base Detector vs TAS: Motorbikes and Bicycles
- Red: base detector; blue: TAS
Base Detector vs TAS: Satellite
Number of Region Clusters
- Red: 10
- Blue: 3, 5, 20, or 30
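The region-cluster count is the k used to quantise region features into stuff clusters (red curve: k = 10; blue: 3, 5, 20, 30). A minimal pure-NumPy k-means sketch with illustrative names; the talk does not show its clustering code:

```python
import numpy as np

def kmeans(feats, k, iters=20, seed=0):
    """Minimal k-means used to quantise region feature vectors into
    `k` stuff clusters.  `feats` is an (n, d) float array.  Returns
    the cluster assignment of each region and the cluster centers."""
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), k, replace=False)]  # random init
    for _ in range(iters):
        # squared distance from every region to every center
        d = ((feats[:, None, :] - centers[None]) ** 2).sum(-1)
        assign = d.argmin(1)
        for c in range(k):
            if (assign == c).any():
                centers[c] = feats[assign == c].mean(0)        # recenter
    return assign, centers
```

Too few clusters blur distinct stuff types together; too many fragment them, which is the trade-off the red/blue curves on this slide compare.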
Number of Gibbs Iterations
- Red: 10
- Blue: 20 or 100
Conclusion
- Can be easily integrated with existing detectors
- The performance is dependent on the base detector
- The "stuff" can come from the context as well as the object itself
- Especially suitable for datasets with consistent backgrounds and viewpoints, e.g., aerial images
- 3D information could be used to improve the performance
References
- Geremy Heitz and Daphne Koller. Learning Spatial Context: Using Stuff to Find Things. European Conference on Computer Vision (ECCV), 2008.
- TAS: http://ai.stanford.edu/~gaheitz/Research/TAS/
- Superpixel: http://www.cs.sfu.ca/~mori/research/superpixels
- HOG implementation: http://pascal.inrialpes.fr/soft/olt
- PASCAL VOC2005: http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2005/index.html