Post on 06-Jul-2020
Lost in Binarization: Query-Adaptive Ranking for Similar Image Search with Compact Codes
Yu-Gang Jiang, Jun Wang, Shih-Fu Chang
Columbia University; IBM T.J. Watson Research
ACM ICMR 2011, Trento, Italy, April 2011
Growth of Visual Data
• Explosive growth of the amount of visual data
• The Internet amplifies information overload
Large Scale Visual Search
• Nearest neighbor (NN) search
• Challenges
– Features must fit in memory (disks are too slow…)
– Matching needs to be fast enough
• Facebook has around 20 billion images (2×10^10); a PC can have 20 Gbytes of memory (2×10^11 bits)
– Budget of 10^1 bits/image
• YouTube has over a trillion video frames (10^12); a good cluster can have 10 Tbytes of memory (10^14 bits)
– Budget of 10^2 bits/frame
Budget numbers from slide of Rob Fergus
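The budget figures are order-of-magnitude estimates. A minimal sketch of the arithmetic, assuming decimal units (1 Gbyte = 10^9 bytes) and a hypothetical helper name `bits_per_item`:

```python
BITS_PER_BYTE = 8


def bits_per_item(memory_bytes: float, num_items: float) -> float:
    """Memory budget per item, in bits, if every item must fit in RAM."""
    return memory_bytes * BITS_PER_BYTE / num_items


# 20 Gbytes of memory shared by 20 billion images
pc_budget = bits_per_item(20e9, 2e10)

# 10 Tbytes of memory shared by a trillion video frames
cluster_budget = bits_per_item(10e12, 1e12)

print(pc_budget, cluster_budget)
```

The exact values are 8 and 80 bits, which the slide rounds to 10^1 and 10^2.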
Scalable Search Methods
• Inverted file
– Indexing structure is expensive; typically still requires hundreds of bytes for each image
• Tree-based approaches
– E.g., kd-tree
– Work well in low dimensions, but cannot handle high-dimensional data well
– Chapter 39: Nearest neighbors in high-dimensional spaces. Handbook of Discrete and Computational Geometry (2nd ed.). CRC Press
• Hashing or binary embedding methods
– Locality sensitive hashing, spectral hashing, deep learning…
– Attracted a lot of attention in recent years
Hashing Based Indexing
• Hyperplane partitioning
• Linear projection based hashing

x      x1     x2     x3     x4     x5
h1     0      1      1      0      1
h2     1      0      1      0      1
h3     1      0      1      1      0
…      …      …      …      …      …
hk     …      …      …      …      …
code   011…   100…   111…   001…   110…

[Figure: points x1 to x5 partitioned by hyperplanes h1, h2, h3; codes compared by Hamming distance]
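A minimal sketch of linear projection based hashing, assuming random Gaussian hyperplanes through the origin (one common choice; the slides do not say how the projections are chosen). The names `make_hyperplanes`, `hash_code` and `hamming_distance` are hypothetical; the `codes` dictionary holds the first three bits from the table.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw one random normal vector per bit (assumption: Gaussian entries)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_code(x, hyperplanes):
    """One bit per hyperplane: 1 if x lies on its positive side, else 0."""
    return [
        1 if sum(w_i * x_i for w_i, x_i in zip(w, x)) > 0 else 0
        for w in hyperplanes
    ]


def hamming_distance(a, b):
    """Number of bit positions where two equal-length codes differ."""
    return sum(bit_a != bit_b for bit_a, bit_b in zip(a, b))


# First three bits (h1, h2, h3) of each point, taken from the table.
codes = {
    "x1": [0, 1, 1],
    "x2": [1, 0, 0],
    "x3": [1, 1, 1],
    "x4": [0, 0, 1],
    "x5": [1, 1, 0],
}

print(hamming_distance(codes["x1"], codes["x3"]))  # codes 011 vs 111
print(hamming_distance(codes["x1"], codes["x2"]))  # codes 011 vs 100
print(hash_code([0.5, -1.0], make_hyperplanes(num_bits=8, dim=2)))
```

On the three table bits, x1 and x3 differ in one position, while x1 and x2 differ in all three.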
Visual Search by Compact Codes
• Visual query code: 101101110101
• Limitation: coarse ranking
– 12 different codes with Hamming distance 1
– 66 different codes with Hamming distance 2
– 220 different codes with Hamming distance 3
Modified from slide of Rob Fergus
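The counts 12, 66 and 220 are the binomial coefficients C(12, r): the number of ways to flip r of the 12 bits. A short check, with a brute-force count over all 12-bit codes for the query code 101101110101 (variable names are illustrative):

```python
from collections import Counter
from math import comb

n_bits = 12
query = 0b101101110101

# Closed form: choose which r of the n bits are flipped.
codes_at_distance = {r: comb(n_bits, r) for r in (1, 2, 3)}

# Brute force: XOR marks the differing bits; count the ones.
brute_force = Counter(bin(code ^ query).count("1") for code in range(2 ** n_bits))

print(codes_at_distance)
print({r: brute_force[r] for r in (1, 2, 3)})
```

Every database image whose code falls in the same distance bucket is tied under plain Hamming ranking, which is the coarseness the slide points at.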
How to produce better ranking?
• Assume we use binary codes with n bits
– There will be n different Hamming distances
– Original # levels of ranking: n
• Bit-wise weights: 0.1 0.3 0.5 0.2 0.6
– Query: 1 0 1 1 0
– Image 1: 1 1 1 1 0 (HD=1)
– Image 2: 1 0 1 1 1 (HD=1)
• # levels of ranking increases from n to 2^n!
• The weights are computed adaptively for each query
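A minimal sketch of how bit-wise weights break the tie in this example, assuming the weighted distance is the sum of the weights at the differing bit positions (the function name `weighted_hamming` is hypothetical):

```python
def weighted_hamming(query, code, weights):
    """Sum the weights of the bit positions where the two codes differ."""
    return sum(w for q, c, w in zip(query, code, weights) if q != c)


query = [1, 0, 1, 1, 0]
image1 = [1, 1, 1, 1, 0]  # differs from the query at bit 2
image2 = [1, 0, 1, 1, 1]  # differs from the query at bit 5
weights = [0.1, 0.3, 0.5, 0.2, 0.6]

d1 = weighted_hamming(query, image1, weights)
d2 = weighted_hamming(query, image2, weights)
print(d1, d2)
```

Both images have Hamming distance 1, but the weighted distances are 0.3 and 0.6, so Image 1 is ranked ahead of Image 2.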
Framework for Query-Adaptive Ranking
[Figure: Query → feature extraction → binary embedding to compact code, e.g. [0 0 1 0 … 0 1 0] → query-adaptive weights, e.g. [0.13 0.05 0.51 … 0.06] → image database (compact codes)]
• Auxiliary database: semantic concept classes (sunset, water, person, cityscape, tree, plane, …), with image compact codes and learned class-specific weights
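The slides do not spell out how the class-specific weights become query-adaptive weights. One plausible reading, sketched here purely as an assumption: find the query's nearest neighbors in the auxiliary database by Hamming distance, take the most frequent concept classes among them, and average those classes' weight vectors. All names, the toy data, and the parameters `k` and `top_classes` are hypothetical.

```python
from collections import Counter


def hamming_distance(a, b):
    """Number of bit positions where two equal-length codes differ."""
    return sum(bit_a != bit_b for bit_a, bit_b in zip(a, b))


def query_adaptive_weights(query_code, aux_images, class_weights, k=3, top_classes=2):
    """Assumed scheme: average the weights of the classes that dominate
    the query's k nearest auxiliary images.

    aux_images: list of (code, class_label) pairs.
    class_weights: dict mapping class_label to a per-bit weight list.
    """
    nearest = sorted(
        aux_images, key=lambda item: hamming_distance(query_code, item[0])
    )[:k]
    votes = Counter(label for _, label in nearest)
    chosen = [label for label, _ in votes.most_common(top_classes)]
    return [
        sum(class_weights[label][i] for label in chosen) / len(chosen)
        for i in range(len(query_code))
    ]


# Toy auxiliary database (made-up codes and weights, for illustration only).
aux_images = [
    ([1, 0, 1, 1, 0], "sunset"),
    ([1, 0, 1, 1, 1], "sunset"),
    ([0, 1, 0, 0, 1], "tree"),
    ([1, 0, 0, 1, 0], "water"),
]
class_weights = {
    "sunset": [0.5, 0.25, 0.0, 0.25, 0.0],
    "water": [0.0, 0.25, 0.5, 0.0, 0.25],
    "tree": [0.25, 0.25, 0.25, 0.25, 0.0],
}

weights = query_adaptive_weights([1, 0, 1, 1, 0], aux_images, class_weights)
print(weights)
```

With this toy data the three nearest auxiliary images are two "sunset" images and one "water" image, so the query weights are the average of those two classes' vectors.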
Learning Concept-Specific Weights
• Intra-class compactness
• Inter-class relationship
• Final objective function
• Notation: binary code of an image; weight vector for concept k; center of binary codes of concept i
• Concept class similarity in raw feature space
• Rewrite the objective function in quadratic form
The framework (Recall)
[Figure: the Framework for Query-Adaptive Ranking diagram, repeated]
Experimental results
• 260,000 Flickr images from NUS
• 81 fully labeled classes
• Randomly sampled 8,000 query images
• Evaluation: normalized (mean) average precision
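A minimal sketch of (mean) average precision over ranked result lists. The slides do not define the normalization they use; this version, as an assumption, divides by the number of relevant items found in the evaluated list. Function names are hypothetical.

```python
def average_precision(relevance):
    """Average of precision@rank taken at each relevant position.

    relevance: ranked list of 0/1 flags (1 = result shares a label with the query).
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(relevance_lists):
    """Mean of the per-query average precision values."""
    return sum(average_precision(r) for r in relevance_lists) / len(relevance_lists)


print(average_precision([1, 0, 1, 0]))
print(mean_average_precision([[1, 1], [0, 1]]))
```

A finer-grained ranking helps this metric when it moves relevant images ahead of irrelevant ones that share the same Hamming distance.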
Two supervised binary coding methods
• Semi-Supervised Hashing
– J. Wang, S. Kumar, S.-F. Chang, CVPR & ICML 2010
– [Figure: pairwise label matrix; 1 = neighbor pair, -1 = non-neighbor pair]
• Deep Belief Network
– Hinton & Salakhutdinov, Science 2006
– [Figure: network from input to output with layer sizes 500, 500, 256, N and weights w1, w2, w3]
Overall performance
Per-category performance
• Divide the queries into 81 groups according to their semantic label(s)
[Figure: per-category bar chart; legend: traditional Hamming distance, query-adaptive Hamming distance; axis label: ΔMAP; axis ticks from 0.00 to 0.35]
[Categories: airport, animal, beach, bear, birds, boats, book, bridge, buildings, cars, castle, cat, cityscape, clouds, computer, coral, cow, dancing, dog, earthquake, elk, fire, fish, flags, flowers, food, fox, frost, garden, glacier, grass, harbor, horses, house, lake, leaf, map, military, moon, mountain, nighttime, ocean, person, plane, plants, police, protest, railroad, rainbow, reflection, road, rocks, running, sand, sign, sky, snow, soccer, sports, statue, street, sun, sunset, surf, swimmers, tattoo, temple, tiger, tower, town, toy, train, tree, valley, vehicle, water, waterfall, wedding, whales, window, zebra]
Result example
[Figure: two example queries, each with retrieval results from Baseline and Ours]
Summary
• A query-adaptive ranking approach for compact code image search
– Finer-grained ranking!
• Future work
– Consider more semantic classes in the auxiliary database
[Figure: Visual Search by Compact Codes, query code 101101110101, with finer-grained ranking]
Thank you!