EE 6882 Statistical Methods for Video Indexing and Analysis
Fall 2004, Prof. Shih-Fu Chang
http://www.ee.columbia.edu/~sfchang
Lecture 1 - part B (9/7/04)
2EE6882-Chang
Run-through of a simple image search system
Color, texture, distance metrics, and evaluation issues
References
J. R. Smith and S.-F. Chang, "VisualSEEk: A Fully Automated Content-Based Image Query System," ACM Multimedia Conference, Boston, MA, Nov. 1996.
J. R. Smith and S.-F. Chang, "Visually Searching the Web for Content," IEEE Multimedia Magazine, Summer, Vol. 4, No. 3, pp. 12-20, 1997.
M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker, "Query by Image and Video Content: The QBIC System," IEEE Computer, Vol. 28, pp. 23-31, 1995.
C. Faloutsos, R. Barber, M. Flickner, W. Niblack, D. Petkovic, and W. Equitz, "Efficient and Effective Querying by Image Content," J. of Intelligent Information Systems, 3(3/4):231-262, July 1994. (QBIC System)
T. Sikora, "The MPEG-7 Visual Standard for Content Description - An Overview," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 11, No. 6, pp. 696-702, June 2001.
B. S. Manjunath, J.-R. Ohm, V. V. Vasudevan, and A. Yamada, "Color and Texture Descriptors," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 11, No. 6, pp. 703-715, June 2001.
Y. Rubner, C. Tomasi, and L. J. Guibas, "A Metric for Distributions with Applications to Image Databases," Proceedings of ICCV'98, Bombay, India, January 1998, pp. 59-66.
Thanks to John R. Smith for some slides on color/texture feature extraction
Content-based Image Retrieval System
[Figure: system block diagram with the components User, User interface (image thumbnails; images & videos), Network, Query server, Image/video server, Index, Archive]
What functionalities should each component have?
What are the bottlenecks of the system?

Feature Extraction for Content-Based Image Retrieval (Color & Texture)
Why visual features?
  Manual annotation is tedious and insufficient
  Computers cannot understand images
  Comparison of visual features enables comparison of visual scenes
  Need tools for organizing, filtering, and searching through large amounts of visual data
What visual features?
  What is available in the data?
  What features does the human visual system (HVS) use?
  Color: suitable for color images
  Texture: visual patterns, surface properties, cues for depth
  Shape: boundaries of real world objects, edges
  Motion: camera motion vs. object motion

Visual Features
How to use visual features?
  Extraction
  Representation
  Discrimination
  Indexing
Considerations
  Complexity
  Invariance
    Rotation, scaling, cropping, occlusion, shift, etc.
  Dimension
  Subjective relevance
  Distance Metric

Visual Features (cont.)
Fundamental approach is from pattern recognition work
  Group pixels, process the group and generate a feature vector
  Discrimination via (transform and) feature vector distance
  Multidimensional indexing of the feature vectors
Do this for color and texture
Build a content-based image retrieval system

Color Order Systems
The Munsell System (1905)
  Colors are arranged so that, as nearly as possible, the perceptual distance between adjacent colors is constant.
  The Munsell Book of Color: color chips
The Natural Color System (NCS) (1981)
  Natural Color System Atlas: derived from 60,000 observations
  Colors are described by the relative amounts of basic colors: black, white, yellow, blue, red and green
The DIN system (1981)
The Coloroid system (1980-1987)
Optical Society of America System (OSA) (1981)
Hunter LAB System (1981)

Color Order Systems (cont.)
Advantages of Color Order Systems
  Easy to understand, plus samples are available
  Easy to use and compare colors side-by-side
  Number and spacing of samples can be adapted to application
Disadvantages
  Too many color order systems, can't translate between them
  Color comparison is only valid for required illuminant
  User perception differs
  Application to self-luminous colors (i.e., monitors and computer displays) is not easy

Color Representation
What is COLOR?
  A weighted combination of stimuli at three principal wavelengths in the visible spectrum (from blue = 400 nm to red = 700 nm).
[Figure: curves labeled β, γ, ρ] [Oberle]
Examples:
  λ = 500 nm: (β, γ, ρ) = (20, 40, 20)
  B = 100: (β, γ, ρ) = (100, 5, 4)
  G = 100: (β, γ, ρ) = (0, 100, 75)
  R = 100: (β, γ, ρ) = (0, 0, 100)

Tri-stimulus Representation
Compute the correct α1, α2, α3 such that the responses (β, γ, ρ) are the same as those of the original color.
[Figure: primaries P1(λ), P2(λ), P3(λ), weighted by α1, α2, α3, are presented to the HVS and produce the same response (β, γ, ρ)]
E.g., use R, G, B as the primary colors P1, P2, P3
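As a worked illustration of the matching step, the sketch below solves for α1, α2, α3 using the example responses quoted on the Color Representation slide. It assumes the responses add linearly and that the listed (β, γ, ρ) values are the responses to each primary at strength 100; both are assumptions, and the helper names `det3` and `match_weights` are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def match_weights(primary_responses, target):
    """Solve sum_k alpha_k * primary_responses[k] = target by Cramer's rule."""
    # Each column of the system matrix is the (beta, gamma, rho) response to one unit primary.
    matrix = [[primary_responses[k][row] for k in range(3)] for row in range(3)]
    d = det3(matrix)
    alphas = []
    for k in range(3):
        replaced = [[target[row] if col == k else matrix[row][col] for col in range(3)]
                    for row in range(3)]
        alphas.append(det3(replaced) / d)
    return alphas


# Responses per unit of each primary (slide values divided by 100; linearity is assumed).
P_B = (1.00, 0.05, 0.04)
P_G = (0.00, 1.00, 0.75)
P_R = (0.00, 0.00, 1.00)

# Target: the response quoted for a 500 nm stimulus.
alpha_b, alpha_g, alpha_r = match_weights([P_B, P_G, P_R], (20, 40, 20))
print(alpha_b, alpha_g, alpha_r)
```

Under these assumptions the red weight comes out negative, which would mean the 500 nm stimulus cannot be matched by adding the three primaries alone.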

Color Spaces and Color Order Systems
Color Spaces
  RGB: cube in Euclidean space
    Standard representation used in color displays
    Drawbacks
      RGB basis not related to human color judgments
      Intensity should form one of the dimensions of color
      Important perceptual components of color are hue, brightness and saturation
  Normalized rgb:
$$r = \frac{R}{R+G+B}, \quad g = \frac{G}{R+G+B}, \quad b = \frac{B}{R+G+B}$$
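
Color Spaces and Color Order Systems
HSI: cone (cylindrical coordinates)
$$\begin{bmatrix} I \\ V_1 \\ V_2 \end{bmatrix} = \begin{bmatrix} 1/3 & 1/3 & 1/3 \\ -1/\sqrt{6} & -1/\sqrt{6} & 2/\sqrt{6} \\ 1/\sqrt{6} & -1/\sqrt{6} & 0 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}$$
$$H = \tan^{-1}(V_2 / V_1), \qquad S = (V_1^2 + V_2^2)^{1/2}$$
Opponent: Cartesian
$$\begin{bmatrix} R-G \\ Bl-Y \\ W-Bk \end{bmatrix} = \begin{bmatrix} 1 & -2 & 1 \\ -1 & -1 & 2 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}$$
YIQ: NTSC television standard
$$\begin{bmatrix} I \\ Q \\ Y \end{bmatrix} = \begin{bmatrix} 0.6 & -0.28 & -0.32 \\ 0.21 & -0.52 & 0.31 \\ 0.3 & 0.59 & 0.11 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}$$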
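The three linear color transforms on this slide can be sketched as plain matrix-vector products. The coefficients below are as read from the slide (the signs of the HSI and opponent rows are a best reading of the extracted text), and using `atan2` for the hue instead of a bare arctangent of V2/V1 is an assumption made to avoid division by zero.

```python
import math

# Rows: I, V1, V2 (HSI); R-G, Bl-Y, W-Bk (opponent); I, Q, Y (YIQ).
HSI_MATRIX = [
    [1 / 3, 1 / 3, 1 / 3],
    [-1 / math.sqrt(6), -1 / math.sqrt(6), 2 / math.sqrt(6)],
    [1 / math.sqrt(6), -1 / math.sqrt(6), 0.0],
]
OPPONENT_MATRIX = [
    [1, -2, 1],
    [-1, -1, 2],
    [1, 1, 1],
]
YIQ_MATRIX = [
    [0.6, -0.28, -0.32],
    [0.21, -0.52, 0.31],
    [0.3, 0.59, 0.11],
]


def apply(matrix, rgb):
    """Multiply a 3x3 matrix by an (R, G, B) triple."""
    return tuple(sum(w * c for w, c in zip(row, rgb)) for row in matrix)


def rgb_to_hsi(rgb):
    """Return (hue in radians, saturation, intensity)."""
    intensity, v1, v2 = apply(HSI_MATRIX, rgb)
    hue = math.atan2(v2, v1)      # assumption: atan2 instead of atan(V2 / V1)
    saturation = math.hypot(v1, v2)
    return hue, saturation, intensity


gray_hsi = rgb_to_hsi((0.5, 0.5, 0.5))
white_opponent = apply(OPPONENT_MATRIX, (1, 1, 1))
white_yiq = apply(YIQ_MATRIX, (1.0, 1.0, 1.0))
print(gray_hsi, white_opponent, white_yiq)
```

A gray pixel should come out with zero saturation, zero chromatic opponent channels, and zero I and Q, which is a quick sanity check on the sign pattern.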
Perceptual Representation Of HSI Space
brightness varies along the vertical axis
hue varies along the circumference
saturation varies along the radius
Color Coordinate Systems
From Jain’s DIP book
Color Coordinate Systems (cont.)

Color Space Quantization
How many colors to keep
  IBM QBIC: 16M (RGB) → 4096 (RGB) → 64 (Munsell) colors
  Columbia U. VisualSEEk: 16M (RGB) → 166 (HSV) colors (18 Hue, 3 Sat, 3 Val, 4 Gray)
  Stricker and Orengo (Similarity of Color Images)
    16M (RGB) → 16 hues, 4 val, 4 sat = 128 (HSV) colors
    16M (RGB) → 8 hues, 2 val, 2 sat = 32 (HSV) colors
  Swain and Ballard (Color Indexing)
    16M (RGB) → 8 wb, 16 rg, 16 by = 2048 (OPP) colors
Independent quantization: each color dimension is quantized independently
Joint quantization: color dimensions are quantized jointly
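The two quantization styles can be contrasted in a few lines. This is a minimal sketch: the uniform 16-levels-per-channel split mirrors the 4096-color RGB figure above, while the nearest-codeword rule and the two-entry codebook are illustrative assumptions, not any particular system's method.

```python
def quantize_independent(rgb, levels=16):
    """Quantize each 8-bit channel on its own, then combine into one bin index."""
    r, g, b = (c * levels // 256 for c in rgb)
    return (r * levels + g) * levels + b


def quantize_joint(rgb, codebook):
    """Pick the index of the nearest codeword, treating (R, G, B) as one vector."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(rgb, codebook[k])))


# 16 levels per channel gives 16 * 16 * 16 = 4096 bins.
black_bin = quantize_independent((0, 0, 0))
white_bin = quantize_independent((255, 255, 255))

# Hypothetical two-color codebook: black and white.
nearest = quantize_joint((200, 210, 190), [(0, 0, 0), (255, 255, 255)])
print(black_bin, white_bin, nearest)
```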

Color Histogram
Feature extraction from color images
  Choose GOOD color space
  Quantize color space to reduce number of colors
  Represent image color content using color histogram
  Feature vector IS the color histogram
$$h_{RGB}[r, g, b] = \sum_m \sum_n \begin{cases} 1 & \text{if } I_R[m,n] = r,\; I_G[m,n] = g,\; I_B[m,n] = b \\ 0 & \text{otherwise} \end{cases}$$
A color histogram represents the distribution of colors, where each histogram bin corresponds to a color in the quantized color space
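The histogram definition amounts to counting pixels per quantized color. The sketch below assumes 8-bit RGB pixels and a uniform split into `levels` bins per channel; the tiny 2x2 image is made-up illustration data.

```python
from collections import Counter


def color_histogram(image, levels=4):
    """Count pixels per quantized (r, g, b) bin; image is rows of 8-bit RGB triples."""
    hist = Counter()
    for row in image:
        for r, g, b in row:
            bin_rgb = (r * levels // 256, g * levels // 256, b * levels // 256)
            hist[bin_rgb] += 1
    return hist


# Hypothetical 2x2 image: two reddish pixels, one blue, one green.
image = [
    [(255, 0, 0), (250, 10, 5)],
    [(0, 0, 255), (0, 255, 0)],
]
hist = color_histogram(image)
print(dict(hist))
```

The two reddish pixels fall into the same bin, which is the point of quantizing before counting.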

Color Histogram (cont.)
Advantages of color histograms
  Compact representation of color information
  Global color distribution
  Histogram distance metrics
Disadvantages
  High dimensionality
  No information about spatial positions of colors

Other Histogram Metrics
H_i(j): value of bin j in the histogram of frame i
L1 distance
$$D_1(i, i+1) = \sum_j \left| H_i(j) - H_{i+1}(j) \right|$$
L2 distance
$$D_2(i, i+1) = \sum_j \left( H_i(j) - H_{i+1}(j) \right)^2$$
Histogram Intersection
$$D_I(i, i+1) = 1 - \frac{\sum_j \min\left( H_i(j), H_{i+1}(j) \right)}{\min\left( \sum_j H_i(j), \sum_j H_{i+1}(j) \right)}$$
Quadratic Distance
$$D_Q(i, i+1) = \sum_{j_1} \sum_{j_2} \left( H_i(j_1) - H_{i+1}(j_1) \right) \alpha(j_1, j_2) \left( H_i(j_2) - H_{i+1}(j_2) \right)$$
  α(j1, j2): correlation between colors j1 and j2, e.g., 1 - d
Other histograms
  Edge histogram + total edge count
  Texture
  Issue: quality of edge, texture extraction, lighting (dark frame)
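The four histogram metrics listed on this slide can be sketched directly. The example assumes three-bin histograms and a similarity matrix α = 1 - d, with d taken as the bin-index gap divided by 2; that choice of d and the toy histograms are assumptions for illustration.

```python
def l1_distance(h1, h2):
    """Sum of absolute bin differences."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def l2_distance(h1, h2):
    """Sum of squared bin differences (no square root taken)."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2))


def intersection_distance(h1, h2):
    """One minus the overlap, normalized by the smaller histogram mass."""
    overlap = sum(min(a, b) for a, b in zip(h1, h2))
    return 1 - overlap / min(sum(h1), sum(h2))


def quadratic_distance(h1, h2, alpha):
    """Bin differences weighted by the cross-bin similarity alpha[j1][j2]."""
    diff = [a - b for a, b in zip(h1, h2)]
    n = len(diff)
    return sum(diff[j1] * alpha[j1][j2] * diff[j2] for j1 in range(n) for j2 in range(n))


# Toy histograms: all mass in bin 0, bin 1, or bin 2.
h_a, h_b, h_c = [4, 0, 0], [0, 4, 0], [0, 0, 4]
# alpha = 1 - d, with d = |j1 - j2| / 2 (assumed normalized color distance).
alpha = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]]

print(l1_distance(h_a, h_b), l1_distance(h_a, h_c))
print(quadratic_distance(h_a, h_b, alpha), quadratic_distance(h_a, h_c, alpha))
```

In this toy case L1 rates both pairs as equally far apart, while the quadratic form rates the neighboring-bin pair as closer because it accounts for similarity between bins.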
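
Color Coherence Vector
Not just count of colors, also check adjacency
Color quantization → region segmentation → labeling
Quantized colors (first rows shown):
  2 1 2 2 1 1
  2 2 1 2 1 1
  ...
Region labels:
  B C B B A A
  B B C B A A
  B C D B A A
  B B B A E E
  B B A A E E
  B B A A E E
Regions:
  Region  A   B   C   D   E
  Color   1   2   1   3   1
  Size    12  15  3   1   5
Color coherence vector (α: pixels in coherent regions, β: pixels in incoherent regions):
  Color   1   2   3
  α       17  15  0
  β       3   0   1
Comparing images I and I':
$$G_I = \langle (\alpha_1, \beta_1), \ldots, (\alpha_n, \beta_n) \rangle, \qquad G_{I'} = \langle (\alpha'_1, \beta'_1), \ldots, (\alpha'_n, \beta'_n) \rangle$$
$$\Delta_G = \sum_{i=1}^{n} \left( |\alpha_i - \alpha'_i| + |\beta_i - \beta'_i| \right), \qquad \Delta_H = \sum_{i=1}^{n} \left| (\alpha_i - \alpha'_i) + (\beta_i - \beta'_i) \right|$$
$$\Delta_G \ge \Delta_H \quad \text{by the triangle inequality}$$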
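A color coherence vector can be sketched with a flood fill over same-colored neighbors. The sketch assumes 8-connectivity and treats a region as coherent when its size exceeds a threshold `tau`; both choices, the function name, and the 3x3 grid are illustrative assumptions.

```python
def color_coherence_vector(grid, tau):
    """Return {color: (alpha, beta)}: pixels in regions larger than tau vs. the rest."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    ccv = {}
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0]:
                continue
            color = grid[r0][c0]
            seen[r0][c0] = True
            stack, size = [(r0, c0)], 0
            # Flood fill the connected region of this color (8-connected neighbors).
            while stack:
                r, c = stack.pop()
                size += 1
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < rows and 0 <= cc < cols
                                and not seen[rr][cc] and grid[rr][cc] == color):
                            seen[rr][cc] = True
                            stack.append((rr, cc))
            alpha, beta = ccv.get(color, (0, 0))
            if size > tau:
                alpha += size   # coherent pixels
            else:
                beta += size    # incoherent pixels
            ccv[color] = (alpha, beta)
    return ccv


# Hypothetical quantized image.
grid = [
    [1, 1, 2],
    [1, 2, 2],
    [3, 2, 2],
]
ccv = color_coherence_vector(grid, tau=2)
print(ccv)
```

Summing α and β per color recovers the ordinary color histogram, which is why the CCV difference is never smaller than the histogram difference.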

Consideration of Metrics
Limitations of Euclidean Metric
  Cannot distinguish classes
  Correlation between features
  Curved boundaries
    Change feature
    Use Mahalanobis dist
  Distinctive subclasses
    Use clustering
  Complex features
    Use better features
[Figure: scatter plots of two classes (+ vs. o) illustrating each limitation]

Mahalanobis Metric
$$D^2_{mah}(x_1, x_2) = (x_1 - x_2)^T C_x^{-1} (x_1 - x_2)$$
Covariance matrix:
$$C_x = \begin{bmatrix} c(1,1) & c(1,2) & \cdots & c(1,d) \\ \vdots & \vdots & & \vdots \\ c(d,1) & c(d,2) & \cdots & c(d,d) \end{bmatrix}$$
$$c(i,j) = \frac{1}{N-1} \sum_{k=1}^{N} \left( x_k(i) - m(i) \right) \left( x_k(j) - m(j) \right), \quad N: \text{number of samples}$$
[Figure: five scatter plots over axes x_i, x_j with covariance c = -s_i s_j, c = -(1/2) s_i s_j, c = 0, c = (1/2) s_i s_j, c = s_i s_j; s_i, s_j: std. deviation]
$$C_x = [e_1 | e_2 | \ldots | e_d] \; \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_d) \; [e_1 | e_2 | \ldots | e_d]^T$$
$$C_x^{-1} = [e_1 | e_2 | \ldots | e_d] \; \left( \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_d) \right)^{-1} \; [e_1 | e_2 | \ldots | e_d]^T$$
[Figure: scatter plot with eigenvector directions e1, e2]
Projects the data onto the eigenvectors, divides by the standard deviation along each eigen-dimension, and computes the Euclidean distance
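A small two-dimensional example shows how the covariance rescales distances. This sketch assumes 2-D features so the covariance can be inverted in closed form; the sample points and helper names are made up for illustration.

```python
def mean_and_cov(samples):
    """Sample mean and covariance (divides by N - 1)."""
    n, d = len(samples), len(samples[0])
    m = [sum(x[i] for x in samples) / n for i in range(d)]
    c = [[sum((x[i] - m[i]) * (x[j] - m[j]) for x in samples) / (n - 1)
          for j in range(d)] for i in range(d)]
    return m, c


def inverse_2x2(c):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (cc, d) = c
    det = a * d - b * cc
    return [[d / det, -b / det], [-cc / det, a / det]]


def mahalanobis_sq(x1, x2, c_inv):
    """Squared Mahalanobis distance (x1 - x2)^T C^-1 (x1 - x2)."""
    diff = [a - b for a, b in zip(x1, x2)]
    return sum(diff[i] * c_inv[i][j] * diff[j] for i in range(2) for j in range(2))


# Hypothetical data: the first feature spans 4 units, the second only 1.
samples = [(0, 0), (4, 0), (0, 1), (4, 1)]
_, cov = mean_and_cov(samples)
cov_inv = inverse_2x2(cov)

d_wide = mahalanobis_sq((0, 0), (4, 0), cov_inv)    # Euclidean length 4
d_narrow = mahalanobis_sq((0, 0), (0, 1), cov_inv)  # Euclidean length 1
print(d_wide, d_narrow)
```

Both steps span the full range of their axis, so they come out equally far in Mahalanobis terms even though their Euclidean lengths differ by a factor of four.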

Mahalanobis Metric (cont.)
Advantages of Mahalanobis metric
  Accounts for scaling of coordinate axes
  Invariant under linear transformation
$$\text{If } y = Ax \;\Rightarrow\; C_y = A C_x A^T, \quad D_y^2 = D_x^2$$
  Corrects for correlation
  Provides curved as well as linear decision boundaries
Potential issue
  Need enough training data to estimate the covariance matrix
  Need d(d+1)/2 independent elements
[Figure: minimum-distance classifier. The input x_i goes to one Mahalanobis-distance unit per class (labels m_1, c_1 through m_c, c_c); a minimum selector outputs the selected class]

Earth Mover's Distance (EMD)
Rubner, Tomasi, Guibas '98
Transportation Problem [Dantzig '51]
[Figure: bipartite graph from suppliers I to consumers J with edge costs c_ij]
  I: set of suppliers
  J: set of consumers
  c_ij: cost of shipping a unit of supply from i to j
Problem: find the optimal set of flows f_ij such that
$$\text{minimize } \sum_{i \in I} \sum_{j \in J} c_{ij} f_{ij} \quad \text{s.t.}$$
$$f_{ij} \ge 0, \quad i \in I,\; j \in J \quad \text{(no reverse shipping)}$$
$$\sum_{i \in I} f_{ij} = y_j, \quad j \in J \quad \text{(satisfy each consumer's need/capacity)}$$
$$\sum_{j \in J} f_{ij} \le x_i, \quad i \in I \quad \text{(each supplier's limit)}$$
$$\sum_{j \in J} y_j \le \sum_{i \in I} x_i \quad \text{(feasibility)}$$
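For small problems the transportation problem can be solved without a linear-programming library. The sketch below uses successive shortest paths on a flow network (a swapped-in technique, not the simplex method), and assumes integer supplies and demands; `solve_transport` is a hypothetical name.

```python
def solve_transport(supply, demand, cost):
    """Return (minimum total cost, total flow) for integer supplies and demands."""
    m, n = len(supply), len(demand)
    if sum(demand) > sum(supply):
        raise ValueError("infeasible: total demand exceeds total supply")
    source, sink, size = m + n, m + n + 1, m + n + 2
    graph = [[] for _ in range(size)]  # edge = [to, residual capacity, unit cost, reverse index]

    def add_edge(u, v, cap, w):
        graph[u].append([v, cap, w, len(graph[v])])
        graph[v].append([u, 0, -w, len(graph[u]) - 1])

    for i in range(m):
        add_edge(source, i, supply[i], 0.0)              # supplier limit x_i
        for j in range(n):
            add_edge(i, m + j, float("inf"), cost[i][j])  # shipping cost c_ij
    for j in range(n):
        add_edge(m + j, sink, demand[j], 0.0)            # consumer need y_j

    total_cost, total_flow = 0.0, 0
    while True:
        # Bellman-Ford: cheapest augmenting path in the residual graph.
        dist = [float("inf")] * size
        prev = [None] * size
        dist[source] = 0.0
        for _ in range(size - 1):
            changed = False
            for u in range(size):
                if dist[u] == float("inf"):
                    continue
                for k, (v, cap, w, _rev) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + w < dist[v] - 1e-12:
                        dist[v], prev[v], changed = dist[u] + w, (u, k), True
            if not changed:
                break
        if prev[sink] is None:
            return total_cost, total_flow
        # Find the bottleneck along the path, then push that much flow.
        push, v = float("inf"), sink
        while v != source:
            u, k = prev[v]
            push = min(push, graph[u][k][1])
            v = u
        v = sink
        while v != source:
            u, k = prev[v]
            graph[u][k][1] -= push
            graph[v][graph[u][k][3]][1] += push
            v = u
        total_cost += push * dist[sink]
        total_flow += push


# Two suppliers, two consumers: the cheapest plan ships 0 -> 1 and 1 -> 0.
best_cost, best_flow = solve_transport([1, 1], [1, 1], [[1, 2], [1, 5]])
print(best_cost, best_flow)
```

Dividing the returned cost by the returned flow gives the normalized distance used for EMD.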

Advantages of EMD
  Efficient implementations exist (Simplex Method)
  Also supports partial matching (‖I‖ ≠ ‖J‖, e.g., histograms defined in different color spaces or scales)
  If the masses of the two distributions are equal (and the ground distance is a metric), then EMD is a true metric
  Allows flexible structures, e.g., matching multiple regions in each image
    Multiple regions in one image, each region represented by an individual feature vector
    Region set: {R1, R2, R3}    Region set: {R1', R2', R3', R4'}
    C_ij = dist(R_i, R_j'), which can also be based on EMD

EMD of Color Histogram
$$h = (h(1), h(2), \ldots, h(M)), \quad g = (g(1), g(2), \ldots, g(N)), \quad \text{assume } \sum_j g(j) \le \sum_i h(i)$$
  h: "Earth", g: "Hole"
$$EMD(h, g) = \frac{\sum_{i=1}^{M} \sum_{j=1}^{N} C_{ij} f_{ij}}{\sum_{i=1}^{M} \sum_{j=1}^{N} f_{ij}} = \sum_{i=1}^{M} \sum_{j=1}^{N} C_{ij} f_{ij} \Big/ \sum_{j=1}^{N} g(j) \quad \text{(fill up each hole)}$$
  C_ij: distance between color i in color space h and color j in color space g
  f_ij: move f_ij units of mass from i in h to j in g
Normalization by the denominator term
  Avoids bias toward low-mass distributions
Experiment results [Rubner, Tomasi, Guibas '98]
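
EMD with Pre-filtering
$$d_{EMD} \ge d_{pre}; \quad \text{if } d_{pre} > TH \text{ then reject candidate}$$
For distributions of equal total mass and a norm-based ground distance:
$$\sum_i \sum_j f_{ij}\, d(p_i, q_j) = \sum_i \sum_j \left\| f_{ij} (p_i - q_j) \right\| \ge \Big\| \sum_i \sum_j f_{ij}\, p_i - \sum_i \sum_j f_{ij}\, q_j \Big\| = \Big\| \sum_i x_i p_i - \sum_j y_j q_j \Big\|$$
For color histograms
  Color i means the color of the i-th bin
  x_i, y_j: histogram bin values
EMD ≥ distance between average colors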
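The pre-filter can be illustrated in one dimension, where EMD has a simple closed form. The sketch assumes equal-mass 1-D histograms with ground distance |i - j| between bins, a special case rather than the general color-space setting; the function names and toy histograms are hypothetical.

```python
def emd_1d(h, g):
    """EMD between equal-mass 1-D histograms with ground distance |i - j|."""
    if abs(sum(h) - sum(g)) > 1e-9:
        raise ValueError("this closed form needs equal total mass")
    work, carried = 0.0, 0.0
    for a, b in zip(h, g):
        carried += a - b       # surplus earth that must move past this bin
        work += abs(carried)
    return work / sum(h)


def centroid_distance(h, g):
    """Distance between average bin positions: a cheap lower bound on emd_1d."""
    mean_h = sum(i * a for i, a in enumerate(h)) / sum(h)
    mean_g = sum(j * b for j, b in enumerate(g)) / sum(g)
    return abs(mean_h - mean_g)


def emd_with_prefilter(h, g, threshold):
    """Reject (return None) when the lower bound already exceeds the threshold."""
    if centroid_distance(h, g) > threshold:
        return None
    return emd_1d(h, g)


h1, g1 = [2, 0, 2, 0], [0, 2, 0, 2]   # shifted by one bin: bound is tight
h2, g2 = [2, 0, 0, 2], [0, 2, 2, 0]   # same centroid: bound is 0, EMD is not
print(centroid_distance(h1, g1), emd_1d(h1, g1))
print(centroid_distance(h2, g2), emd_1d(h2, g2))
```

The second pair shows the bound can be loose: the filter only ever rejects candidates early, it never replaces the full EMD computation for those that pass.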

Texture
What is texture?
  Has structure or repetitious pattern, e.g., checkered
  Has statistical pattern, e.g., grass, sand, rocks
Why texture?
  Application to satellite images, medical images
  Describes contents of real world images, e.g., clouds, fabrics, surfaces, wood, stone
Challenging issues
  Rotation and scale invariance (3D)
  Segmentation/extraction of texture regions from images
  Texture in noise
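
Approaches to texture features
Fourier Domain Energy Distribution
Angular features (directionality)
$$V_{\theta_1 \theta_2} = \iint |F(u, v)|^2 \, du \, dv, \quad \text{where } \theta_1 \le \tan^{-1}(v/u) \le \theta_2$$
Radial features (coarseness)
$$V_{r_1 r_2} = \iint |F(u, v)|^2 \, du \, dv, \quad \text{where } r_1 \le u^2 + v^2 < r_2$$
[Figure: frequency plane with axes ω_x, ω_y, showing an angular wedge at angle φ and a ring at radius r]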
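The angular and radial energy features can be approximated on a small image by summing a discrete power spectrum over a wedge or a ring. This sketch uses a naive DFT (fine only for tiny images), skips the DC term in the angular sum, and treats u = 0 as angle π/2; those choices and the striped test image are assumptions.

```python
import cmath
import math


def power_spectrum(img):
    """Naive 2-D DFT power |F(u, v)|^2, keyed by centered integer frequencies (u, v)."""
    rows, cols = len(img), len(img[0])
    power = {}
    for v in range(rows):
        for u in range(cols):
            acc = sum(img[y][x] * cmath.exp(-2j * math.pi * (u * x / cols + v * y / rows))
                      for y in range(rows) for x in range(cols))
            uc = u if u <= cols // 2 else u - cols
            vc = v if v <= rows // 2 else v - rows
            power[(uc, vc)] = abs(acc) ** 2
    return power


def radial_energy(power, r1, r2):
    """Sum |F|^2 over the ring r1 <= u^2 + v^2 < r2 (coarseness)."""
    return sum(p for (u, v), p in power.items() if r1 <= u * u + v * v < r2)


def angular_energy(power, theta1, theta2):
    """Sum |F|^2 over the wedge theta1 <= atan(v/u) <= theta2 (directionality)."""
    total = 0.0
    for (u, v), p in power.items():
        if u == 0 and v == 0:
            continue                      # assumption: leave out the DC term
        theta = math.pi / 2 if u == 0 else math.atan(v / u)
        if theta1 <= theta <= theta2:
            total += p
    return total


# 8x8 image with vertical stripes: intensity varies along x only, 2 cycles per row.
stripes = [[math.cos(2 * math.pi * 2 * x / 8) for x in range(8)] for _ in range(8)]
spectrum = power_spectrum(stripes)

ring = radial_energy(spectrum, 4, 5)               # u^2 + v^2 = 4
along_x = angular_energy(spectrum, -0.1, 0.1)      # wedge around the u axis
along_y = angular_energy(spectrum, 1.0, math.pi / 2)
print(ring, along_x, along_y)
```

Nearly all the energy lands in the wedge around the horizontal frequency axis and in the ring at u² + v² = 4, matching the stripe direction and spacing.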
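
Approaches to texture
Co-occurrence Matrix (image with m levels)
Popular early texture approach
$$Q_{R(d,\theta)}(i, j) = \begin{bmatrix} Q(0,0) & \cdots & Q(0,m) \\ \vdots & & \vdots \\ Q(m,0) & \cdots & Q(m,m) \end{bmatrix}$$
where R(d, θ) is the relation between two pixels, e.g., "top", and entry (i, j) collects the pixel pairs with
$$I[x_0, y_0] = i \text{ and } I[x_1, y_1] = j \text{ and } y_1 = y_0 + d \sin(\theta) \text{ and } x_1 = x_0 + d \cos(\theta)$$
[Figure: pixels P_0 and P_1 separated by distance d at angle θ]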
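Building a co-occurrence matrix is a matter of counting pixel pairs at a fixed offset. The sketch below takes the offset as integers (dx, dy), standing in for (d cos θ, d sin θ), and counts ordered pairs only; both are simplifying assumptions, and the 4x4 image is made-up data.

```python
def cooccurrence(img, dx, dy, levels):
    """Q[i][j] = number of pixel pairs with I[x0, y0] = i and I[x0 + dx, y0 + dy] = j."""
    rows, cols = len(img), len(img[0])
    q = [[0] * levels for _ in range(levels)]
    for y0 in range(rows):
        for x0 in range(cols):
            x1, y1 = x0 + dx, y0 + dy
            if 0 <= x1 < cols and 0 <= y1 < rows:
                q[img[y0][x0]][img[y1][x1]] += 1
    return q


# Hypothetical image with 4 gray levels; rows are indexed by y, columns by x.
img = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
q = cooccurrence(img, dx=1, dy=0, levels=4)   # d = 1, theta = 0 (right neighbor)
for row in q:
    print(row)
```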
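
Co-occurrence Matrix (also called Spatial Grey-Level Dependence, SGLD)
Measures on $Q_{R(d,\theta)}(i, j)$
Energy
$$E(d, \theta) = \sum_i \sum_j \left[ Q_{R(d,\theta)}(i, j) \right]^2$$
Entropy
$$H(d, \theta) = -\sum_i \sum_j Q_{R(d,\theta)}(i, j) \log Q_{R(d,\theta)}(i, j)$$
Correlation
$$C(d, \theta) = \frac{\sum_i \sum_j (i - \mu_x)(j - \mu_y)\, Q_{R(d,\theta)}(i, j)}{\sigma_x \sigma_y}$$
Inertia
$$I(d, \theta) = \sum_i \sum_j (i - j)^2\, Q_{R(d,\theta)}(i, j)$$
Local Homogeneity
$$L(d, \theta) = \sum_i \sum_j \frac{1}{1 + (i - j)^2}\, Q_{R(d,\theta)}(i, j)$$
Statistical Measures
  None corresponds to a visual component.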
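The five measures can be computed from any co-occurrence matrix once it is normalized. This sketch assumes the counts are first divided by their total so the entries behave like probabilities, and uses the usual squared-entry energy and negated entropy; the two 2x2 matrices are toy inputs.

```python
import math


def texture_measures(q):
    """Energy, entropy, correlation, inertia, homogeneity of a co-occurrence matrix."""
    total = sum(sum(row) for row in q)
    p = [[v / total for v in row] for row in q]   # assumption: normalize counts
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_x = sum(i * v for i, j, v in cells)
    mu_y = sum(j * v for i, j, v in cells)
    sd_x = math.sqrt(sum((i - mu_x) ** 2 * v for i, j, v in cells))
    sd_y = math.sqrt(sum((j - mu_y) ** 2 * v for i, j, v in cells))
    return {
        "energy": sum(v * v for _, _, v in cells),
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
        # Undefined (division by zero) when all mass sits in a single row or column.
        "correlation": sum((i - mu_x) * (j - mu_y) * v for i, j, v in cells) / (sd_x * sd_y),
        "inertia": sum((i - j) ** 2 * v for i, j, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
    }


smooth = texture_measures([[2, 0], [0, 2]])   # neighbors always share a level
busy = texture_measures([[0, 2], [2, 0]])     # neighbors always differ
print(smooth)
print(busy)
```

The smooth matrix gives zero inertia and full homogeneity; the busy one gives the opposite, with correlation flipping from +1 to -1.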
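
Laws Filters [1980]
Non-Fourier type bases
Matched better to intuitive texture features
Examples of filters (total 12)
$$\begin{bmatrix} -1 & -4 & -6 & -4 & -1 \\ -2 & -8 & -12 & -8 & -2 \\ 0 & 0 & 0 & 0 & 0 \\ 2 & 8 & 12 & 8 & 2 \\ 1 & 4 & 6 & 4 & 1 \end{bmatrix} \quad \begin{bmatrix} 1 & 0 & -2 & 0 & 1 \\ 2 & 0 & -4 & 0 & 2 \\ 0 & 0 & 0 & 0 & 0 \\ -2 & 0 & 4 & 0 & -2 \\ -1 & 0 & 2 & 0 & -1 \end{bmatrix} \quad \begin{bmatrix} 1 & -4 & 6 & -4 & 1 \\ -4 & 16 & -24 & 16 & -4 \\ 6 & -24 & 36 & -24 & 6 \\ -4 & 16 & -24 & 16 & -4 \\ 1 & -4 & 6 & -4 & 1 \end{bmatrix}$$
Measure energy of output from each filter
[Figure: input image passed through the 12 filters, giving 12 outputs]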
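Laws-style masks are commonly built as outer products of short 1-D kernels, and the texture feature is the energy of each filtered output. The kernel names L5, E5, S5, R5 are the conventional ones and do not appear on the slide; taking energy as the mean squared response over valid positions is an assumption for this sketch.

```python
L5 = [1, 4, 6, 4, 1]       # level (local average)
E5 = [-1, -2, 0, 2, 1]     # edge
S5 = [-1, 0, 2, 0, -1]     # spot
R5 = [1, -4, 6, -4, 1]     # ripple


def outer(col, row):
    """5x5 mask whose entry (a, b) is col[a] * row[b]."""
    return [[c * r for r in row] for c in col]


def filter_energy(img, mask):
    """Mean squared response of a 5x5 mask over all positions where it fits entirely."""
    rows, cols = len(img), len(img[0])
    total, count = 0.0, 0
    for y in range(rows - 4):
        for x in range(cols - 4):
            resp = sum(mask[a][b] * img[y + a][x + b] for a in range(5) for b in range(5))
            total += resp * resp
            count += 1
    return total / count


edge_mask = outer(E5, L5)      # edge detector across rows, smoothing along columns
ripple_mask = outer(R5, R5)

ramp_down = [[y for _ in range(6)] for y in range(6)]    # intensity grows with y
ramp_right = [[x for x in range(6)] for _ in range(6)]   # intensity grows with x
print(filter_energy(ramp_down, edge_mask), filter_energy(ramp_right, edge_mask))
```

The edge mask responds to the ramp that changes across rows and is silent on the ramp that changes along columns, which is the directional selectivity the filter bank relies on.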

Tamura Texture
Methods for approximating intuitive texture features
Example: 'Coarseness'; others: 'contrast', 'directionality'
Step 1: compute averages at different scales (1x1, 2x2, 4x4 pixels)
$$\forall (x, y): \quad A_k(x, y) = \sum_{i = x - 2^{k-1}}^{x + 2^{k-1} - 1} \; \sum_{j = y - 2^{k-1}}^{y + 2^{k-1} - 1} f(i, j) \,/\, 2^{2k}$$
Step 2: compute neighborhood difference at each scale
$$\forall (x, y): \quad E_{k,h}(x, y) = \left| A_k(x + 2^{k-1}, y) - A_k(x - 2^{k-1}, y) \right|$$
Step 3: select the scale with the largest variation
$$\forall (x, y): \quad \text{determine } k \text{ such that } E_k = \max(E_1, E_2, \ldots, E_L), \quad S_{Best}(x, y) = 2^k$$
Step 4: compute the coarseness
$$F_{CRS} = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} S_{Best}(i, j)$$
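The four coarseness steps translate into a short, unoptimized routine. This sketch adds a vertical difference alongside the horizontal one, clamps out-of-range coordinates to the image border, and breaks ties toward the smallest scale; all three are assumptions, as is the function name.

```python
def tamura_coarseness(img, max_k=3):
    """Average best window size 2^k over the image (Steps 1 to 4)."""
    rows, cols = len(img), len(img[0])

    def pixel(x, y):
        # Assumption: coordinates outside the image are clamped to the nearest edge pixel.
        return img[min(max(y, 0), rows - 1)][min(max(x, 0), cols - 1)]

    def average(k, x, y):
        # Step 1: mean over the 2^k x 2^k window around (x, y).
        half = 2 ** (k - 1)
        window = [pixel(i, j) for i in range(x - half, x + half)
                  for j in range(y - half, y + half)]
        return sum(window) / len(window)

    total = 0
    for y in range(rows):
        for x in range(cols):
            best_k, best_e = 1, -1.0
            for k in range(1, max_k + 1):
                half = 2 ** (k - 1)
                # Step 2: differences between windows on opposite sides of the pixel.
                e_h = abs(average(k, x + half, y) - average(k, x - half, y))
                e_v = abs(average(k, x, y + half) - average(k, x, y - half))
                # Step 3: keep the scale with the largest difference.
                if max(e_h, e_v) > best_e:
                    best_k, best_e = k, max(e_h, e_v)
            total += 2 ** best_k
    # Step 4: average the best sizes.
    return total / (rows * cols)


flat = [[5] * 8 for _ in range(8)]
checker = [[(x + y) % 2 for x in range(8)] for y in range(8)]
f_flat = tamura_coarseness(flat)
f_checker = tamura_coarseness(checker)
print(f_flat, f_checker)
```

On a constant image every difference is zero, so the tie-break picks the smallest scale everywhere and the result is 2.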

Content-based Image and Video Retrieval System
[Figure: system block diagram with the components User, User interface (image thumbnails; images & videos), Network, Query server, Image/video server, Index, Archive]
What are the bottlenecks of the system?
What functionalities should each component have?

Evaluation
Precision / Recall
  Precision: C/B
  Recall: C/A
[Figure: Venn diagram. A: ground truth in DB; B: returned result; C: their overlap]
B ↑ ⇒ Precision ↓, Recall ↑
[Figure: Precision plotted against Recall]
Rank similarity
  Simple measure
$$\sum_i \left| R_i - R'_i \right| \cdot \alpha_i$$
  Image ID   Rank   Correct Rank
  1          5      7
  2          20     ...
  ...
  N

Evaluation
N images, M benchmark queries, K returned results
$$V_n = \begin{cases} 1 & \text{"Relevant"} \\ 0 & \text{"Irrelevant"} \end{cases}, \quad n = 0, \ldots, N-1$$
Detections
$$A_k = \sum_{n=0}^{k-1} V_n$$
False Alarms
$$B_k = \sum_{n=0}^{k-1} (1 - V_n)$$
Misses
$$C_k = \sum_{n=0}^{N-1} V_n - A_k$$
Correct Dismissals
$$D_k = \sum_{n=0}^{N-1} (1 - V_n) - B_k$$

Evaluation
Given size of the returned results K
                  "Relevant Ground Truth"   Irrelevant
  "Returned"      A_k                       B_k
  Not returned    C_k                       D_k
Recall
$$R_k = A_k / (A_k + C_k)$$
Precision
$$P_k = A_k / (A_k + B_k)$$
Fallout
$$F_k = B_k / (B_k + D_k)$$
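The four counts and the three ratios follow directly from a ranked list of relevance flags. The sketch assumes the list holds V_n in ranked order for all N images and that the cutoff k does not exceed N; the ten-item ranking is made-up data.

```python
def retrieval_counts(relevance, k):
    """relevance[n] is V_n (1 relevant, 0 irrelevant) in ranked order; k is the cutoff."""
    a = sum(relevance[:k])                           # A_k: detections
    b = k - a                                        # B_k: false alarms
    c = sum(relevance) - a                           # C_k: misses
    d = (len(relevance) - sum(relevance)) - b        # D_k: correct dismissals
    return a, b, c, d


def recall_precision_fallout(relevance, k):
    """Return (R_k, P_k, F_k) at cutoff k."""
    a, b, c, d = retrieval_counts(relevance, k)
    return a / (a + c), a / (a + b), b / (b + d)


# Hypothetical ranking of N = 10 images, 4 of them relevant.
ranking = [1, 0, 1, 1, 0, 0, 0, 1, 0, 0]
counts = retrieval_counts(ranking, k=5)
recall, precision, fallout = recall_precision_fallout(ranking, k=5)
print(counts, recall, precision, fallout)
```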

Evaluation Measures
1. Precision Recall Curve: P_k vs. R_k
2. Receiver Operating Characteristic (ROC Curve): A_k vs. B_k
3. Relative Operating Characteristic: A_k vs. F_k
4. R value: P_k at cutoff $k = \sum_{n=0}^{N-1} V_n$
5. 3-point P_k value: Avg P_k at R_k = 0.2, 0.5, 0.8
[Figures: P_k plotted against R_k; A_k (hit) plotted against B_k (false)]
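The two scalar summaries, the R value and the 3-point average, can be sketched from the same kind of ranked relevance list. Reading "P_k at R_k = 0.2, 0.5, 0.8" as the precision at the first cutoff where recall reaches each level is an assumption (no interpolation is attempted), and the ranking is made-up data.

```python
def precision_at(relevance, k):
    """P_k: fraction of the top k results that are relevant."""
    return sum(relevance[:k]) / k


def r_value(relevance):
    """Precision at the cutoff k equal to the number of relevant items."""
    return precision_at(relevance, sum(relevance))


def three_point_average(relevance, levels=(0.2, 0.5, 0.8)):
    """Average precision at the first cutoffs where recall reaches each level."""
    n_relevant = sum(relevance)
    precisions = []
    for level in levels:
        for k in range(1, len(relevance) + 1):
            if sum(relevance[:k]) / n_relevant >= level:
                precisions.append(precision_at(relevance, k))
                break
    return sum(precisions) / len(precisions)


# Hypothetical ranking of N = 10 images, 4 of them relevant.
ranking = [1, 0, 1, 1, 0, 0, 0, 1, 0, 0]
r_val = r_value(ranking)
three_pt = three_point_average(ranking)
print(r_val, three_pt)
```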