Post on 19-Dec-2015
WISE: Large Scale Content-Based Web Image Search
Michael Isard
Joint with: Qifa Ke, Jian Sun, Zhong Wu
Microsoft Research Silicon Valley
Query by Images
“A picture is worth a thousand words.”
What leaf?
Artist? Higher resolution?
Bank web-site?
Email?
Ad on eBay?
……
©: Who else is using this?
Partial-Duplicate Image Search
• Given a query image, find its partial duplicates from a database of web-images
Two Major Challenges
• How to represent images
  – No text annotations or labels
  – Noise and modification
• How to efficiently index and query images
  – Large number of images (millions)
Index
?
Image Representation: Bag-of-Words [CVPR’09, ICCV’09]
1: Feature extraction: Bundle Features
  – Detection [Lowe 2004, Matas et al 2002, Winder et al 2007]
  – Description [Lowe 2004, Winder et al 2007]
2: Quantization
3: Representation
  – Bag-of-words [Sivic & Zisserman 2003]
[Figure: descriptor histograms mapped to visual words. Labels: Code-book, Normalization; Descriptor ID: 1, 206, 3506, 999,999, 1,000,000]
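The quantization step can be sketched as a nearest-centre lookup. This is only an illustration under assumptions: the names `nearest_word` and `bag_of_words` are hypothetical, the descriptors are toy 2-D points, and the linear scan stands in for whatever approximate nearest-neighbour search a one-million-word code-book would need in practice.

```python
import math

def nearest_word(descriptor, codebook):
    """Return the index of the code-book centre closest to the descriptor."""
    best_id, best_dist = -1, math.inf
    for word_id, centre in enumerate(codebook):
        # Squared Euclidean distance is enough for picking the nearest centre.
        dist = sum((d - c) ** 2 for d, c in zip(descriptor, centre))
        if dist < best_dist:
            best_id, best_dist = word_id, dist
    return best_id

def bag_of_words(descriptors, codebook):
    """Map an image's descriptors to a sparse {visual word ID: count} dict."""
    counts = {}
    for desc in descriptors:
        word = nearest_word(desc, codebook)
        counts[word] = counts.get(word, 0) + 1
    return counts

# Toy data (hypothetical values): a 3-word code-book and 4 descriptors.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 0.1), (0.8, -0.1), (0.1, 1.2)]
bow = bag_of_words(descriptors, codebook)
print(bow)
```

The sparse dictionary mirrors the idea that an image touches only a handful of the available descriptor IDs.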
Matching query to database
• Use an index
  – Each visual word has a ‘posting list’
  – Lists every image containing the word
• At query time
  – Look up the posting list for each query word
  – Merge lists to find candidate images
• Partial match: don’t need every word to be present
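The posting-list lookup and merge described above might look like the following in-memory sketch. The function names `build_index` and `query` and the toy image IDs are assumptions for illustration; the system in the talk uses a disk-based index, which this does not model.

```python
from collections import defaultdict

def build_index(images):
    """images: {image_id: visual word IDs} -> {word: sorted list of image IDs}."""
    index = defaultdict(set)
    for image_id, words in images.items():
        for word in words:
            index[word].add(image_id)
    return {word: sorted(ids) for word, ids in index.items()}

def query(index, query_words):
    """Merge the posting lists of the query words; rank images by vote count."""
    votes = defaultdict(int)
    for word in set(query_words):
        for image_id in index.get(word, []):
            votes[image_id] += 1
    # Partial match: an image is a candidate even if it lacks some query words.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))

# Toy database (hypothetical image IDs and visual words).
images = {27: [1, 3506, 999999], 42: [3506, 206], 99: [7]}
index = build_index(images)
ranked = query(index, [1, 3506, 206])
print(ranked)
```

Image 99 shares no word with the query, so it never appears in a merged list and costs nothing at query time.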
How much work to query?
• Disk-based index, bottleneck is random reads
  – One seek per posting list
• Also one seek per matching image
  – To fetch thumbnail etc.
• Keep as little information as possible in posting lists, to keep index size small
Index Pipeline
• Implemented on a large computer cluster
  – 256 nodes, using Dryad/DryadLINQ
[Diagram components: Crawler, Image Crawler, Content Chunk, Feature Extractor, Local Features, Bundled Features, Feature Quantizer, Visual Words, Indexer, Visual Word Index, Inverted Index, Media DB (Thumbnail, URL)]
Query Pipeline
[Diagram components: Query Image, Feature Extractor, Bundled Features, Feature Quantizer, Visual Words, Index Server, Inverted Index, Search Results (chunkID, imgID), Media DB (Thumbnail, URL), GUI, Results]
Bag-of-Words: Limitations
• Quantization
  – Lost discriminative power
  – Sensitive to image variations and noise
  – Soft quantization [Philbin et al, CVPR 2008]
  – Hamming embedding [Jegou et al, ECCV 2008]
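Soft quantization can be sketched as assigning a descriptor to several nearby visual words instead of one. This is a minimal sketch with assumptions: `soft_quantize` is a hypothetical name, the default `k=4` assumes that the "soft quantization factor = 4" in the experimental settings means four nearest words, and the distance-based weighting used in the cited work is left out.

```python
def soft_quantize(descriptor, codebook, k=4):
    """Return the IDs of the k code-book centres nearest to the descriptor."""
    def sq_dist(centre):
        return sum((d - c) ** 2 for d, c in zip(descriptor, centre))
    # Rank every centre by distance (linear scan, for illustration only).
    ranked = sorted(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))
    return ranked[:k]

# Toy 2-D code-book (hypothetical values).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
words = soft_quantize((0.6, 0.1), codebook, k=2)
print(words)
```

A descriptor near a cell boundary then still shares at least one word with a slightly perturbed copy of itself, at the cost of longer posting lists.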
[Figure: descriptor histograms and their quantization. Labels: Quantization; Descriptor ID: 1, 3506, 999,999, 1,000,000]
Geometric verification
• In practice, bag of words is too weak
• Does not exploit any geometry
• Post-process to check spatial layout of matching features
• Requires a disk seek per image
  – Only used as a re-ranking step on a shortlist of matched images
Geometric Re-ranking
Re-rank top 300 images
[Plot: mAP (0.25 to 0.75) vs. number of images (50,000; 200,000; 500,000; 1,000,000); curves: baseline (bag-of-words), baseline + reranking]
Geometry in the index
• Previous works:
  – Jegou et al ECCV 2008
    • Try to match similar orientations and scales
  – Perdoch et al CVPR 2009
    • Match oriented features more effectively
• Still feature-by-feature
  – Global geometric consistency applied at the end
Single Feature is Weak
Neighboring Features ?
Define Neighboring Features
• Previous works
  – kNN voting [Sivic & Zisserman 2003]
  – Higher-order spatial features [Liu et al][Yuan et al][Tirilly et al][Quack et al]
  – Post geometric spatial verification [Lowe 2004][Chum et al 2007][Nister 2006][Philbin et al 2007] ……
  – Geometric Min-Hash [Chum et al 2009]
• Challenges
  – Repeatable
  – Partial matching
  – Scalable: simple enough to build into index
Define Neighboring Features
DoG Features [Lowe 2004]
- point features
- repeatable
MSER Features [Matas et al 2002]
- region features
- repeatable
Region groups points?
Bundled Feature: Definition
• Bundled feature = a set of DoG features bundled by an MSER region
[Figure: DoG interest points; MSER region; bundled features]
Matching Bundles: Membership
Query bundle $\mathbf{q} = \{q_j\}$; matched bundle $\mathbf{p} = \{p_i\}$
Membership score:
$M_m(\mathbf{q};\mathbf{p}) = |\mathbf{q} \cap \mathbf{p}| = 4$
Voting weight:
$v(q_j) = M_m(\mathbf{q};\mathbf{p}) = 4$
$Sim(I_1, I_2) = \sum_{q_j \in \mathbf{q}} v(q_j) = \sum_{q_j \in \mathbf{q}} 4 = 16$
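Matching Bundles: Membership
Query bundle $\mathbf{q} = \{q_j\}$; matched bundles $\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3$
Membership score:
$M_m(\mathbf{q};\mathbf{p}_1) = |\mathbf{q} \cap \mathbf{p}_1| = 2$
$M_m(\mathbf{q};\mathbf{p}_2) = |\mathbf{q} \cap \mathbf{p}_2| = 1$
$M_m(\mathbf{q};\mathbf{p}_3) = |\mathbf{q} \cap \mathbf{p}_3| = 2$
Voting weight:
$v(q_j) = \max_{\mathbf{p}_k} \{ M_m(\mathbf{q};\mathbf{p}_k) \mid q_j \in \mathbf{p}_k \}$
$v(q_1) = 2$, $v(q_2) = 2$, $v(q_3) = \max(1, 2) = 2$, $v(q_4) = 2$
$Sim(I_1, I_2) = \sum_{q_j \in \mathbf{q}} v(q_j) = 8$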
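The membership score and voting weight can be sketched with plain set intersection. The toy bundles below are an assumption chosen to reproduce the slide's numbers (membership scores 2, 1, 2 and a similarity of 8); the word labels `w1` … `w9` and the function names are hypothetical.

```python
def membership(q, p):
    """M_m(q; p): number of visual words shared by bundles q and p."""
    return len(set(q) & set(p))

def voting_weights(q, candidates):
    """v(q_j): best membership score among candidate bundles containing q_j."""
    weights = {}
    for qj in q:  # assumes the visual words within q are distinct
        scores = [membership(q, p) for p in candidates if qj in p]
        weights[qj] = max(scores, default=0)
    return weights

def similarity(q, candidates):
    """Sim(I1, I2): sum of the voting weights of the query features."""
    return sum(voting_weights(q, candidates).values())

q = ["w1", "w2", "w3", "w4"]
p1 = ["w1", "w2", "w9"]   # shares 2 words with q
p2 = ["w3", "w8"]         # shares 1 word with q
p3 = ["w3", "w4"]         # shares 2 words with q
weights = voting_weights(q, [p1, p2, p3])
sim = similarity(q, [p1, p2, p3])
print(weights, sim)
```

Feature `w3` appears in both `p2` and `p3`, so it takes the larger score, matching the max(1, 2) = 2 step on the slide.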
Matching Bundles: Geometric Constraint
Penalize inconsistent relative orders:
$M_g(\mathbf{q};\mathbf{p}) = -\sum_i \delta\left(O_{\mathbf{q}}[p_i] > O_{\mathbf{q}}[p_{i+1}]\right)$
[Figure: two query/candidate bundle pairs, with features numbered by their order along y]
Pair 1:
  order in query img: 1 < 2 < 3 < 4
  order in target img: 1 < 3 < 4 < 5
  order inconsistency: 0 + 0 + 0 = 0
Pair 2:
  order in query img: 1 < 2 < 3 < 4
  matching order: 5 > 2 > 1 < 3
  inconsistency: 1 + 1 + 0 = 2
Matching Bundles: Formulation
• Bundle matching score (membership + geometric constraint):
$M(\mathbf{q};\mathbf{p}) = M_m(\mathbf{q};\mathbf{p}) + \lambda M_g(\mathbf{q};\mathbf{p})$
• Image matching score:
$Sim(I_1, I_2) = \sum_{q_j \in \mathbf{q}} v(q_j)$
$v(q_j) = \max_{\mathbf{p}_k} \{ M(\mathbf{q};\mathbf{p}_k) \mid q_j \in \mathbf{p}_k \}$
- Repeatable
- Partial matching
- Scalable?
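The order-inconsistency penalty from the geometric constraint can be sketched by counting adjacent pairs that are out of order. The two sequences below are the ones shown on the slide; the function names and the weight `lam` (with an illustrative default of 1.0) are assumptions, not values taken from the talk.

```python
def order_inconsistency(matched_order):
    """Count adjacent pairs that are out of order.

    matched_order[i] is the rank, in the candidate bundle, of the feature
    matched to the i-th query feature (query features in increasing order).
    """
    return sum(1 for a, b in zip(matched_order, matched_order[1:]) if a > b)

def bundle_score(num_shared, matched_order, lam=1.0):
    """M = M_m + lam * M_g, where M_g is minus the inconsistency count."""
    m_g = -order_inconsistency(matched_order)
    return num_shared + lam * m_g

consistent = order_inconsistency([1, 3, 4, 5])     # 1 < 3 < 4 < 5
inconsistent = order_inconsistency([5, 2, 1, 3])   # 5 > 2 > 1 < 3
score = bundle_score(4, [5, 2, 1, 3], lam=1.0)
print(consistent, inconsistent, score)
```

Because the check needs only the relative order of features inside a bundle, it can run on the few order bits stored in each posting rather than on full feature coordinates.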
Inverted Index (without Bundles)
[Figure: each visual word has a posting list of image IDs; the entry for image ID = 27 holds only the ID]
Inverted Index with Bundles
[Figure: postings for image ID = 27 also carry bundle bits for bundles p1, p2, p3, e.g. 27,[3,1,1]; 27,[1,2,1]; 27,[2,2,2]]
Bundle Bits: Bundle ID (9 bits) | X-Order (5 bits) | Y-Order (5 bits)
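The 9 + 5 + 5 bit layout fits in 19 bits per posting, which can be sketched with shifts and masks. The field order inside the word (bundle ID in the high bits) and the function names are assumptions; the slide gives only the field widths.

```python
BUNDLE_ID_BITS, X_ORDER_BITS, Y_ORDER_BITS = 9, 5, 5

def pack_bundle_bits(bundle_id, x_order, y_order):
    """Pack (bundle ID, X-order, Y-order) into one 19-bit integer."""
    if not 0 <= bundle_id < (1 << BUNDLE_ID_BITS):
        raise ValueError("bundle_id does not fit in 9 bits")
    if not 0 <= x_order < (1 << X_ORDER_BITS):
        raise ValueError("x_order does not fit in 5 bits")
    if not 0 <= y_order < (1 << Y_ORDER_BITS):
        raise ValueError("y_order does not fit in 5 bits")
    return (bundle_id << (X_ORDER_BITS + Y_ORDER_BITS)) | (x_order << Y_ORDER_BITS) | y_order

def unpack_bundle_bits(bits):
    """Inverse of pack_bundle_bits."""
    y_order = bits & ((1 << Y_ORDER_BITS) - 1)
    x_order = (bits >> Y_ORDER_BITS) & ((1 << X_ORDER_BITS) - 1)
    bundle_id = bits >> (X_ORDER_BITS + Y_ORDER_BITS)
    return bundle_id, x_order, y_order

packed = pack_bundle_bits(3, 1, 1)
print(packed, unpack_bundle_bits(packed))
```

With these widths a posting can distinguish up to 512 bundles per image and 32 order positions along each axis.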
Retrieval
[Figure: bundles p1, p2, p3 of query image I_q are looked up in the inverted index with bundle bits; sample postings: 1,1,[2,5,9]; 3,1,[3,4,5]; 9,2,[3,2,5]; 10,1,[1,1,2]; 12,1,[1,1,2]; 10,1,[2,2,1]; output: top candidate images]
Experimental Settings
• Image database:
  – 1M web images from query-click log
• Ground truth partial duplicates
  – 780 known partial duplicate images in 19 groups
• Baseline bag-of-words
  – Visual word vocabulary size = 1M
  – Soft quantization factor = 4
  – 500 features per image
Partial Duplicate Example
Example Query Results
Challenging cases
Query
Evaluation: Precision-Recall
• A query returns N images
  – T: correct matches
  – A: expected matches
Precision = T / N
Recall = T / A
[Plot: precision-recall curve; x axis: Recall, y axis: Precision]
Comparison: Precision-Recall
Baseline bag-of-words (started from 13th)
Bundled features (started from 13th)
Query image:
More Precision-Recall Comparisons
Evaluation: mAP
• Average Precision (AP) for one query:
  – Area under the precision-recall curve
• mAP: mean of the APs over all test queries
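Precision, recall, AP and mAP can be sketched as below. One assumption to flag: the area under the precision-recall curve is approximated here by averaging the precision at each correct hit over the A expected matches, which is a common convention but not necessarily the exact computation behind the plots. The function names and toy image IDs are hypothetical.

```python
def precision_recall(results, relevant):
    """results: ranked returned image IDs (N); relevant: expected matches (A)."""
    t = sum(1 for image_id in results if image_id in relevant)
    precision = t / len(results) if results else 0.0
    recall = t / len(relevant) if relevant else 0.0
    return precision, recall

def average_precision(results, relevant):
    """Approximate area under the precision-recall curve for one query."""
    hits, total = 0, 0.0
    for rank, image_id in enumerate(results, start=1):
        if image_id in relevant:
            hits += 1
            total += hits / rank   # precision at this recall step
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(queries):
    """queries: list of (results, relevant) pairs, one per test query."""
    aps = [average_precision(res, rel) for res, rel in queries]
    return sum(aps) / len(aps) if aps else 0.0

results = ["a", "x", "b", "y"]
relevant = {"a", "b", "c"}
p, r = precision_recall(results, relevant)
ap = average_precision(results, relevant)
m = mean_average_precision([(results, relevant), (["c"], {"c"})])
print(p, r, ap, m)
```

Dividing by A rather than by the number of hits penalizes expected matches that were never returned, such as image "c" in the first toy query.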
mAP: Baseline Bag-of-Words
mAP: Hamming Embedding (HE)
mAP: Bundle (Membership)
mAP: Bundle (both terms) (plot annotations: 26%, 40%)
mAP: Bundle + HE (plot annotation: 49%)
[Plots: mAP (0.35 to 0.7) vs. number of images (50,000; 200,000; 500,000; 1,000,000); curves: baseline, HE, bundled (membership), bundled, bundled + HE]
Bundle vs. Geometric Re-ranking
Re-rank top 300 images
Bundle + Geometric Re-ranking
Re-rank top 300 images (plot annotations: 24%, 77%)
[Plots: mAP (0.25 to 0.75) vs. number of images (50,000; 200,000; 500,000; 1,000,000); curves: baseline (bag-of-words), bundle, baseline + reranking, bundle + reranking]
More Results
Failure Case
Demo Setup
Document Server
Web Server
Index Servers
Client
6 million images
Demo
Query image
Results
Conclusion
• Bundle feature
  – More discriminative
  – Enforce spatial constraints while traversing index
  – Partial match
  – Scalable: built into index
Bundle Bits: Bundle ID (9 bits) | X-Order (5 bits) | Y-Order (5 bits)
Thanks!