WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES
Prasad Gabbur, Kobus Barnard
University of Arizona
Overview
Word-prediction using translation model for object recognition
Feature evaluation
Segmentation evaluation
Modifications to Normalized Cuts segmentation algorithm
Evaluation of color constancy algorithms
Effects of illumination color change on object recognition
Strategies to deal with illumination color change
Low-level computer vision algorithms: segmentation, edge detection, feature extraction, etc.
Building blocks of computer vision systems
Is there a generic task to evaluate these algorithms quantitatively?
Word-prediction using a translation model for object recognition:
• Sufficiently general
• Quantitative evaluation is possible
Motivation
Translation model for object recognition
Translate from visual to semantic description
Approach
Model joint probability distribution of visual representations and associated words using a large, annotated image collection.
Corel database
Image pre-processing
sun sky waves sea
visual features
Segmentation*
* Thanks to N-cuts team [Shi, Tal, Malik] for their segmentation algorithm
[f1 f2 f3 …. fN]
Joint distribution
P(w | b) = Σ_l P(w | l) P(b | l) P(l) / P(b)
word
blob
joint visual/textual concepts *
Learn P(w | l), P(b | l), and P(l) from data using EM
Node l
Frequency table
Gaussian over features
* Barnard et al JMLR 2003
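As a sketch, the word posterior for a single blob can be computed from the learned quantities (a minimal illustration with hypothetical array names; per the model, P(w | l) is a frequency table and P(b | l) a Gaussian over features):

```python
import numpy as np

def word_posterior(b, mu, var, p_w_given_l, p_l):
    """P(w|b) = sum_l P(w|l) P(b|l) P(l) / P(b).

    b            : (D,) feature vector for one blob (region)
    mu, var      : (L, D) per-node Gaussian means and diagonal variances
    p_w_given_l  : (L, W) frequency table of word probabilities per node
    p_l          : (L,) node priors
    """
    # Diagonal-Gaussian likelihood P(b | l) for every node l
    log_pb_l = -0.5 * np.sum((b - mu) ** 2 / var + np.log(2 * np.pi * var), axis=1)
    pb_l = np.exp(log_pb_l)                      # (L,)
    joint = p_w_given_l * (pb_l * p_l)[:, None]  # (L, W): P(w|l) P(b|l) P(l)
    p_w = joint.sum(axis=0)
    return p_w / p_w.sum()                       # normalizing implements / P(b)
```

Normalizing at the end plays the role of dividing by P(b), since P(b) = Σ_w Σ_l P(w|l) P(b|l) P(l).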
Annotating images
Segment image
Compute P(w|b) for each region
Sum over regions
P(w | image) ∝ P(w | b1) + P(w | b2) + . . .   (summed over all regions)
Predicted words: CAT TIGER GRASS FOREST
Actual keywords: CAT HORSE GRASS WATER
Measuring performance
• Use annotation performance as a proxy for recognition
• Record percent correct
• Large region-labeled databases are not available, but large annotated databases are
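The annotate-then-score loop can be sketched as follows (hypothetical helper names; the exact scoring protocol used in the experiments may differ):

```python
import numpy as np

def annotate(region_posteriors, vocab, n=4):
    """Sum per-region word posteriors P(w|b_i) and return the top-n words."""
    p_image = np.sum(region_posteriors, axis=0)  # P(w|image) up to a constant
    p_image = p_image / p_image.sum()
    top = np.argsort(p_image)[::-1][:n]
    return [vocab[i] for i in top]

def percent_correct(predicted, actual):
    """Fraction of the actual keywords recovered among the predicted words."""
    return len(set(predicted) & set(actual)) / len(set(actual))
```

For example, predicting {cat, tiger, grass, forest} against the keywords {cat, horse, grass, water} recovers 2 of 4 keywords, a score of 0.5.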
Experimental protocol
Corel database: 160 CDs; each CD contains 100 images on one specific topic like "aircraft"
Sampling scheme: 80 CDs are sampled, with 75% of their images used for training and 25% held out for testing; the remaining 80 CDs are novel
Average results over 10 different samplings
Semantic evaluation of vision processes
Feature sets: combinations of visual features
Segmentation methods Mean-Shift [Comaniciu, Meer]
Normalized Cuts [Shi, Tal, Malik]
Color constancy algorithms:
• Train with illumination change
• Color constancy processing – Gray-world, Scale-by-max
Feature evaluation
Features
• Size
• Location
• Shape: second moment, compactness, convexity, outer boundary descriptor
• Color (RGB, L*a*b*, rgS): average color, standard deviation
• Texture: responses to a bank of filters (even- and odd-symmetric; rotationally symmetric, DoG)
• Context: average surrounding color
Feature evaluation
Base = Size + Location + Second moment + Compactness
[Chart: annotation performance (bigger is better) for the Base, +Color, +Texture, and +Shape feature sets on the Training, Held-out, and Novel sets]
Segmentation evaluation
Mean Shift
(Comaniciu, Meer)
Normalized Cuts (N-Cuts)
(Shi, Tal, Malik)
Segmentation evaluation
• Performance depends on number of regions used for annotation
• Mean Shift is better than N-Cuts for # regions < 6
[Chart: annotation performance (bigger is better) vs. # regions, for Mean Shift and N-Cuts]
Normalized Cuts
• Graph partitioning technique
• Bi-partitions an edge-weighted graph in an optimal sense
• Normalized cut (Ncut) is the optimizing criterion
Edge weight w_ij => similarity between nodes i and j
Partition the nodes into two sets A and B by minimizing
  Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V)
• Image segmentation: each pixel is a node
• Edge weight is the similarity between pixels
• Similarity is based on color, texture, and contour cues
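The Ncut criterion is simple to evaluate for a candidate bipartition (a sketch; Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V), as in Shi and Malik):

```python
import numpy as np

def ncut(W, mask):
    """Normalized-cut value of a bipartition of an edge-weighted graph.

    W    : (N, N) symmetric edge-weight (similarity) matrix
    mask : (N,) boolean array, True for nodes in A, False for nodes in B
    """
    A, B = mask, ~mask
    cut = W[np.ix_(A, B)].sum()   # total weight crossing the partition
    assoc_A = W[A].sum()          # total weight from A to all nodes V
    assoc_B = W[B].sum()
    return cut / assoc_A + cut / assoc_B
```

A good split (cutting only weak edges) yields a small Ncut value; splitting a tightly connected group yields a large one.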
Normalized Cuts
Original algorithm
pixel–pixel weights
Initial seg → Final seg
Produces splits in homogeneous regions, e.g., “sky”
– Caused by local connectivity between pixels
Preseg → Seg
Meta-segmentation
region–region weights
Preseg → Iteration 1 → … → Iteration n
Region-to-region weights from the pixel-pixel weights W_kl (k ∈ R_i, l ∈ R_j):
  Ŵ_ij = (1/T) Σ_{k ∈ R_i} Σ_{l ∈ R_j} W_kl   (averaged over the T pixel pairs)
  W_ij  =      Σ_{k ∈ R_i} Σ_{l ∈ R_j} W_kl   (summed)
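Collapsing a pixel-pixel weight matrix into region-region weights can be sketched as follows (a hypothetical helper; the `average` flag switches between averaging over the pixel pairs and plain summation):

```python
import numpy as np

def region_weights(W, labels, average=True):
    """Collapse pixel-pixel weights W into region-region weights.

    W      : (N, N) pixel similarity matrix
    labels : (N,) region label per pixel
    For regions R_i, R_j, the output entry is the sum of W_kl over all
    pixel pairs k in R_i, l in R_j, optionally divided by the pair count.
    """
    regions = np.unique(labels)
    out = np.zeros((len(regions), len(regions)))
    for a, ri in enumerate(regions):
        for b, rj in enumerate(regions):
            block = W[np.ix_(labels == ri, labels == rj)]
            out[a, b] = block.mean() if average else block.sum()
    return out
```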
Modifications to Normalized Cuts
[Diagram: pixel-pair weights between k ∈ R_i and l ∈ R_j under the original and modified schemes]
Modifications to Normalized Cuts
[Example segmentations: original vs. modified]
Original vs. Modified
• For # regions < 6, modified out-performs original
• For # regions > 6, original is better
[Chart: annotation performance (bigger is better) vs. # regions, for the original and modified N-Cuts]
Incorporating high-level information into segmentation
algorithms
Low-level segmenters split up objects (e.g., the black and white halves of a penguin)
Using word-prediction gives a way to incorporate high-level semantic information into segmentation algorithms
Propose a merge between regions that have similar posterior distributions over words
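The merge-proposal idea can be sketched as follows (a hypothetical illustration using L1 distance between word posteriors; another divergence measure could be substituted):

```python
import numpy as np

def propose_merges(posteriors, threshold=0.2):
    """Propose merging region pairs with similar posterior word distributions.

    posteriors : (R, W) array, row i is P(w | b_i) and sums to 1
    threshold  : maximum L1 distance for two regions to be merge candidates
    """
    R = len(posteriors)
    pairs = []
    for i in range(R):
        for j in range(i + 1, R):
            d = np.abs(posteriors[i] - posteriors[j]).sum()
            if d < threshold:
                pairs.append((i, j))
    return pairs
```

Two halves of one object (e.g., a penguin) should predict similar words, so their posteriors are close even when their low-level features differ.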
Illumination change
Makes recognition difficult
Illumination color change
Illuminant 1
Illuminant 2
Strategies to deal with illumination change:
• Train for illumination change
• Color constancy pre-processing and normalization
(Image data: http://www.cs.sfu.ca/~colour/data)
Training
Train for illumination change
Variation of color under expected illumination changes
[Matas et al 1994, Matas 1996, Matas et al 2000]
Algorithm
Unknown illuminant → Canonical (reference) illuminant
(Map the image as if it were taken under the reference illuminant.)
Test Input
Recognition system
Training database
Canonical (reference) illuminant
Color constancy pre-processing
[Funt et al 1998]
Algorithm
Unknown illuminant → Canonical (reference) illuminant
(Map the image as if it were taken under the reference illuminant.)
Test Input
Recognition system
Normalized training database
Canonical (reference) illuminant
Training database
Algorithm
Color normalization
[Funt and Finlayson 1995, Finlayson et al 1998]
Unknown illuminant
Simulating illumination change
11 illuminants
(0 is canonical)
[Images: the same scene rendered under illuminants 0–10]
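A common way to simulate such an illumination color change is a diagonal (von Kries-style) transform that scales each channel independently (a sketch; the actual simulation procedure may differ):

```python
import numpy as np

def relight(image, diag):
    """Simulate an illumination color change with a diagonal transform.

    image : (H, W, 3) RGB array with values in [0, 1]
    diag  : per-channel scale factors (d_r, d_g, d_b)
    """
    # Scale each channel independently, then clip back into gamut
    return np.clip(image * np.asarray(diag), 0.0, 1.0)
```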
Train with illumination variation
Experiment B — Training: no illumination change; Testing: illumination change
Experiment C — Training: illumination change; Testing: illumination change
Experiment A — Training: no illumination change; Testing: no illumination change
[Chart: annotation performance (bigger is better) for experiments A, B, and C on the Training, Held-out, and Novel sets]
Color constancy pre-processing
Gray-world
Training: canonical (reference) illuminant.  Test: unknown illuminant → canonical.
Algorithm: scale each channel so that the mean color is constant:
  r_t = (c_r / r̄) r,   g_t = (c_g / ḡ) g,   b_t = (c_b / b̄) b
where (r̄, ḡ, b̄) is the mean color of the input image and (c_r, c_g, c_b) is the mean color under the canonical illuminant.
Color constancy pre-processing: Scale-by-max
Training: canonical (reference) illuminant.  Test: unknown illuminant → canonical.
Algorithm: scale each channel so that the maximum color is constant:
  r_t = (c_r / r_max) r,   g_t = (c_g / g_max) g,   b_t = (c_b / b_max) b
where (r_max, g_max, b_max) are the per-channel maxima of the input image and (c_r, c_g, c_b) are the maxima under the canonical illuminant.
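A corresponding scale-by-max sketch (hypothetical function; `canonical_max` stands in for the per-channel maxima under the reference illuminant):

```python
import numpy as np

def scale_by_max(image, canonical_max=(1.0, 1.0, 1.0)):
    """Scale each channel so the image maximum matches the canonical max.

    image : (H, W, 3) RGB array with values in [0, 1]
    """
    mx = image.reshape(-1, 3).max(axis=0)             # (r_max, g_max, b_max)
    scale = np.asarray(canonical_max) / mx            # (c_r/r_max, ...)
    return np.clip(image * scale, 0.0, 1.0)
```

Unlike gray-world, this estimate depends only on the brightest pixel in each channel, so it is less sensitive to scene content but more sensitive to specularities.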
Color constancy pre-processing
Experiment B — Training: no illumination change; Testing: illumination change
Others — Training: no illumination change; Testing: illumination change + color constancy algorithm
Experiment A — Training: no illumination change; Testing: no illumination change
[Chart: annotation performance (bigger is better) for experiments A and B, Gray-world, and Scale-by-max on the Training, Held-out, and Novel sets]
Color normalization
Gray-world: mean color = constant.  Scale-by-max: max color = constant.
The normalization algorithm is applied to both the training images (canonical illuminant) and the test images (unknown illuminant).
Color normalization
Experiment B — Training: no illumination change; Testing: illumination change
Others — Training: no illumination change + color constancy algorithm; Testing: illumination change + color constancy algorithm
Experiment A — Training: no illumination change; Testing: no illumination change
[Chart: annotation performance (bigger is better) for experiments A and B, Gray-world, and Scale-by-max on the Training, Held-out, and Novel sets]
Conclusions
Translation (visual to semantic) model for object recognition
Identify and evaluate low-level vision processes for recognition
Feature evaluation
Color and texture are the most important, in that order
Shape needs better segmentation methods
Segmentation evaluation
Performance depends on # regions for annotation
Mean Shift and modified NCuts do better than original NCuts for # regions < 6
Color constancy evaluation
Training with illumination change helps
Color constancy processing helps (scale-by-max better than gray-world)
Thank you!