
Image Retrieval Discussion

Andrew Chi, Brian Cristante

COMP 790-133: January 27, 2015

[Concept map: IMAGE RETRIEVAL spans an AI / VISION PROBLEM and a SYSTEMS DESIGN / SOFTWARE ENGINEERING PROBLEM. Sensory gap: “What features should we use?” (SIFT, text attributes; query-dependent?). Semantic gap: “How should we index the images and retrieve them?” (centrality measures, PageRank, semantic hierarchies, visual keywords). Intention gap: type of search and user (specific image, set of images, exploratory search; broad vs. narrow domain). Design issues: complex architecture, scalability, integrity of results, efficiency.]

What is “Image Retrieval”?

• Have a specific image stored somewhere and want to find it again
• Dig around in a collection for some image that suits our needs
• Just browsing and want guidance, helpful hints, and logical organization
• Query can be text or another image, or both

First Attempts

• Web searches: use text associated with the image, in context, on its webpage
  • Captions
  • Surrounding text
  • Other metadata
• IBM QBIC (1995)
  • QBIC = Query By Image Content
  • Use low-level features to tag images with “visual keywords”
  • Search “red,” “square shaped,” “metallic”

The Landscape of the Problem

Around the year 2000, researchers began to break down the problem:
• Sensory Gap: the difference between a real-world object and how it looks in an image
• Semantic Gap: the difference between low-level features and the actual content of the image
• Intention Gap: what does the user even want?

The Landscape of the Problem

This leads to our central questions:
• How should we represent our images?
• How should we index and organize our images?
• How should we interpret a natural-language query by the user?
• What are some algorithms we can use to actually retrieve an image in response to a query?

[Concept map repeated; this section: the sensory gap.]

Sensory Gap

Addressing the sensory gap means choosing appropriate features that give us the information we need from an image. Some information is naturally lost in creating an image. This can’t be helped.

(Or can it?)

What’s a Good Feature?

So we have billions of images that don’t necessarily share visual characteristics. How should we represent them to highlight their similarities and differences?

There’s no clear-cut answer to this …


What’s a Good Feature?

[Figure: example features: bag of (visual) words and SIFT (gradient-based) descriptors.]

What’s a Good Feature?

VisualRank paper:
• Search by web text to narrow the number of images under consideration
• Want to find the most “important” image in terms of its similarity to the other images
• Local features can capture more subtle differences
• Choose SIFT features, which are rather robust (scale, rotation, illumination, but not color)

What’s a Good Feature?

VisualRank paper asks: could we make the choice of features adapt to users’ queries?

We’ll save this for discussion.

[Concept map repeated; this section: the semantic gap.]

Semantic Gap

• To cross the semantic gap for retrieval, we have to make links between the features we’ve extracted and what a user would be searching for
• That’s why, in our concept map, we say that the semantic gap makes us think about how to index and retrieve images (represented in whatever way)
• Think of building a data structure and devising an algorithm to traverse that data structure

Attributes

• Elements of semantic significance
• Descriptive (“furry”), subcomponents (“has nose”), or discriminative (something a dog has but a cat does not)

Farhadi et al., 2009

Attributes

Attributes lie inside the semantic gap, between low-level features and the full semantic interpretation of the image.

[Diagram: Image (raw pixels, e.g. (255, 0, 31)) → Features (e.g. [0, -0.5, 1.3, 1.6, 0.1, -0.2, …, 0.3]) → ATTRIBUTES (red, has 4 wheels, has engine) → CATEGORY (“Car”). Attributes span the semantic gap.]

[Five example slides omitted. Slide credit: Behjat Siddiquie.]

Semantic Hierarchies

• Organize images in a tree of increasingly specific categories
  • IS-A relationships
  • Need a large number of images for this to be non-trivial
• This can be used for a variety of vision tasks, including retrieval
  • Exploratory search
  • Finding representatives of some category
  • Building datasets
  • Finding images that contain semantically similar objects, but not necessarily visually similar ones!
• ImageNet (www.image-net.org)
  • Big crossover with NLP (WordNet)

Retrieval with Semantic Hierarchies

• Semantic hierarchies and attributes can be used together for efficient retrieval methods
  • Compute similarity (“image distance”) by comparing attributes
  • Use the hierarchy to weight the co-occurrence of attributes
  • That is, the hierarchy accounts for prior knowledge

For you math nerds:

sim(A, B) = Σ_i Σ_j δ_i(A) S_ij δ_j(B)

where A, B are images; i, j index attributes; δ_i(A) is the indicator function (1 if image A has attribute i, else 0); S_ij is the co-occurrence score.
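To make the computation concrete, here is a minimal sketch in Python, assuming binary attribute indicator vectors and a precomputed, hierarchy-weighted co-occurrence matrix S (the toy attributes and values are ours, not from the paper):

```python
import numpy as np

def attribute_similarity(a, b, S):
    """Bilinear attribute similarity: sum_ij delta_i(A) * S_ij * delta_j(B).

    a, b : binary attribute indicator vectors for images A and B
    S    : attribute co-occurrence matrix, weighted by the semantic hierarchy
    """
    return float(a @ S @ b)

# Toy example with 3 attributes: "red", "has wheels", "has engine".
S = np.array([[1.0, 0.2, 0.1],
              [0.2, 1.0, 0.8],   # "has wheels" and "has engine" co-occur often
              [0.1, 0.8, 1.0]])
car   = np.array([1, 1, 1])      # red, has wheels, has engine
wagon = np.array([1, 1, 0])      # red, has wheels

print(attribute_similarity(car, wagon, S))  # higher than for unrelated images
```

Because related attribute pairs get large S_ij entries, two images can score as similar even when they share no attribute exactly, which is the prior knowledge the hierarchy contributes.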

Retrieval with Semantic Hierarchies

• Use hashing to retrieve images in sub-linear time with respect to the size of the collection (Deng, A. Berg, Fei-Fei, 2011)
• Highly parallelizable

[Concept map repeated; this section: design issues, domain, and scale.]

Narrow domain: medical image search
• http://openi.nlm.nih.gov/
• Simultaneous phrase- and image-based search
• Image retrieval:
  • Extract low-level features (color, texture, shape)
  • Transform features into visual keywords, annotations
  • Compute similarity between query image and database images (a toy sketch follows)
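As a toy illustration of that pipeline, here is a Python sketch that uses a global color histogram as the low-level feature and histogram intersection as the similarity; these particular choices are illustrative assumptions, not Open-i's actual implementation:

```python
import numpy as np

def color_histogram(image, bins=8):
    """Low-level feature: a normalized per-channel color histogram.

    image: H x W x 3 uint8 array.
    """
    hist = np.concatenate([
        np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
        for c in range(3)
    ]).astype(float)
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: how much the two histograms overlap."""
    return float(np.minimum(h1, h2).sum())

# Rank database images by similarity to the query image.
rng = np.random.default_rng(0)
database = [rng.integers(0, 256, (64, 64, 3), dtype=np.uint8) for _ in range(5)]
query = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)

q = color_histogram(query)
scores = [histogram_intersection(q, color_histogram(img)) for img in database]
ranking = np.argsort(scores)[::-1]  # best matches first
print(ranking, [round(scores[i], 3) for i in ranking])
```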

Types of Centrality

• Degree centrality
• Eigenvector centrality
• Katz centrality
• PageRank (sketched below)

From Networks: An Introduction, by M.E.J. Newman, 2010.

[Figure: a directed graph in which nodes A and B have eigenvector centrality 0, but non-zero Katz centrality.]
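For concreteness, here is a minimal power-iteration PageRank sketch in Python. VisualRank runs essentially this computation on a matrix of image similarities rather than hyperlinks; the similarity matrix below is a made-up toy:

```python
import numpy as np

def pagerank(S, d=0.85, iters=100):
    """Power iteration on a column-normalized version of similarity matrix S."""
    n = S.shape[0]
    M = S / S.sum(axis=0, keepdims=True)   # make each column sum to 1
    r = np.full(n, 1.0 / n)                # start from the uniform distribution
    for _ in range(iters):
        r = (1 - d) / n + d * (M @ r)      # damped rank propagation
    return r

# Toy image-similarity matrix: image 0 is similar to everything else.
S = np.array([[0.0, 0.9, 0.8],
              [0.9, 0.0, 0.1],
              [0.8, 0.1, 0.0]])
print(pagerank(S))  # image 0 gets the highest rank ("most central")
```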

Image Rank at Web Scale

• 1.8 billion photos shared per day
• How long would it take to just compute the similarity matrix?
  • N = 1.8 × 10⁹, 100 cycles per similarity, 1000 CPUs @ 3 GHz
  • (N²/2) × 100 / (3 × 10⁹) / 1000 / 86400 / 365 ≈ 1.7 years
• O(N²) is far too slow.

Source: KPCB, Internet Trends 2014

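The same back-of-the-envelope arithmetic, checked in a couple of lines of Python:

```python
N, cycles, cpus, hz = 1.8e9, 100, 1000, 3e9
years = (N ** 2 / 2) * cycles / hz / cpus / 86400 / 365
print(years)  # ~1.7
```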

Locality-Sensitive Hashing (LSH)

• Key idea: avoid computing the entire distance matrix
  • Most pairs of images will be extremely dissimilar
  • Find a way to compare only the images that have a good chance of being similar
• Hashing
  • Normally used to spread data uniformly
  • LSH does the opposite; it is used for dimensionality reduction

LSH on Sets (MinHash)

• Similarity of two sets (of features, n-grams, etc.)
• Jaccard similarity: J(S, T) = |S ∩ T| / |S ∪ T|
• MinHash:
  • Use a normal hash function to hash every element of both sets.
  • Now, assign each set to the bucket denoted by the minimum (numerical) hash of any of its elements.
• What is the probability two sets S and T will be assigned to the same bucket? (See the sketch below.)
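A minimal MinHash sketch in Python; using many independent hash functions instead of one is the standard way to drive down the variance of the estimate, and the salting scheme here is an illustrative choice:

```python
import hashlib

def minhash(elements, salt):
    """Return the minimum hash value of any element, under a salted hash."""
    return min(
        int(hashlib.md5((salt + str(x)).encode()).hexdigest(), 16)
        for x in elements
    )

def estimate_jaccard(S, T, num_hashes=128):
    """Fraction of hash functions under which S and T share the same minimum."""
    matches = sum(
        minhash(S, str(i)) == minhash(T, str(i)) for i in range(num_hashes)
    )
    return matches / num_hashes

S = {"cat", "dog", "fish", "bird"}
T = {"cat", "dog", "fish", "lizard"}
print(estimate_jaccard(S, T))  # true Jaccard similarity is 3/5 = 0.6
```

This also answers the question on the slide: for one random hash function, the minimum over S ∪ T is equally likely to be any element, and the two buckets agree exactly when that minimum lies in S ∩ T, so the collision probability is J(S, T).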

LSH on n-Dimensional Feature Vectors


LSH for VisualRank: Algorithm

1. Extract local (SIFT) features from images A-D.
2. Hash features using many LSH functions of the form h(v) = ⌊(a · v + b) / W⌋, where a is a random Gaussian vector, b is a random offset, and W is the bucket width.
3. Features match if they hash to the same bucket in >3 tables.
4. Images match if they share >3 matching features.
5. Estimate similarity of matching images using #matches / average #features per image (sketched below).
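A sketch of this matching scheme in Python, assuming the standard random-projection (p-stable) hash h(v) = ⌊(a · v + b) / W⌋; the table count, bucket width, thresholds, and toy data below are illustrative choices, not the paper's tuned values:

```python
import numpy as np
from collections import defaultdict, Counter

rng = np.random.default_rng(0)
DIM, W, NUM_TABLES = 128, 4.0, 10  # SIFT dimension, bucket width, hash tables

# One random-projection hash per table: h(v) = floor((a . v + b) / W)
hash_params = [(rng.normal(size=DIM), rng.uniform(0, W))
               for _ in range(NUM_TABLES)]

def count_matches(feats_a, feats_b, min_tables=4):
    """Count feature pairs from A and B that share a bucket in >3 tables."""
    agree = Counter()  # (i, j) -> number of tables where the pair collides
    for a_vec, offset in hash_params:
        buckets = defaultdict(list)  # bucket id -> indices of A's features
        for i, f in enumerate(feats_a):
            buckets[int(np.floor((a_vec @ f + offset) / W))].append(i)
        for j, f in enumerate(feats_b):
            for i in buckets[int(np.floor((a_vec @ f + offset) / W))]:
                agree[(i, j)] += 1
    return sum(1 for hits in agree.values() if hits >= min_tables)

# Toy "images": SIFT-like 128-d descriptors; three are nearly shared.
shared = rng.normal(size=(3, DIM))
img_a = np.vstack([shared + 0.01 * rng.normal(size=(3, DIM)),
                   rng.normal(size=(2, DIM))])
img_b = np.vstack([shared + 0.01 * rng.normal(size=(3, DIM)),
                   rng.normal(size=(2, DIM))])

matches = count_matches(img_a, img_b)
similarity = matches / ((len(img_a) + len(img_b)) / 2)  # #matches / avg total
print(matches, similarity)
```

Because each feature is only compared against the features landing in the same bucket, most of the pairwise comparisons that made the O(N²) estimate above so expensive are never performed.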

LSH for VisualRank: Performance

• Time to compute a single similarity matrix (single CPU):
  • 1,000 images
  • 15 minutes
• Large-scale estimate (if your name is Google):
  • 1,000 CPUs
  • Top 100,000 queries
  • Use top 1,000 images for each query
  • Less than 30 hours
• Specifics not published, but MapReduce is the likely platform.

MapReduce (sort of)

[Figure: a first, simplified picture of MapReduce.]

MapReduce (more accurate)

[Figure: the canonical word-count dataflow. Input (the complete works of Shakespeare) is split into lines such as “All the world's a stage, and all…”, “And all my soul, and all my…”, and “And this the hand that slew…”. Map emits (word, count) pairs per line, e.g. (all, 2), (the, 1), (world, 1), (and, 1), (stage, 1). Shuffle groups the pairs by word. Reduce sums each group, collecting the final counts: all, 4; and, 4; the, 2; world, 1; stage, 1; my, 1; soul, 1; this, 1.]
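A tiny single-process simulation of that dataflow in Python; in a real MapReduce system the map, shuffle, and reduce phases run distributed across many machines:

```python
from collections import defaultdict

def mapper(line):
    """Map: emit a (word, 1) pair for each word in a line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    """Reduce: sum the counts for one word."""
    return key, sum(values)

lines = ["all the world's a stage and all",
         "and all my soul and all my",
         "and this the hand that slew"]

mapped = [pair for line in lines for pair in mapper(line)]
counts = dict(reducer(k, v) for k, v in shuffle(mapped).items())
print(counts)  # e.g. {'all': 4, 'and': 4, 'the': 2, ...}
```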

Questions

1. Suggest a method for implementing the VisualRank LSH algorithm at a large scale.
   a) MapReduce: what are the mappers and reducers?
   b) UNC Kure/Killdevil: what would each 12-core node do?
2. Say you are not Google. How would you approach this problem without knowing the 100,000 most likely queries beforehand?

Questions

3. Why might you wish to use graph centrality as a ranking mechanism for image retrieval? Why might you prefer to use a semantic hierarchy instead?

4. (Open-ended) If you were a large search engine, how might you learn and deploy query-dependent feature representations of images? Could you also leverage the information in a semantic hierarchy?


References

• (Survey paper) Datta, Ritendra, Dhiraj Joshi, Jia Li, and James Z. Wang. “Image Retrieval: Ideas, Influences, and Trends of the New Age.” ACM Comput. Surv. 40, no. 2 (May 2008): 5:1–5:60. doi:10.1145/1348246.1348248.
• Deng, Jia, Wei Dong, R. Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. “ImageNet: A Large-Scale Hierarchical Image Database.” In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), 248–55, 2009. doi:10.1109/CVPR.2009.5206848.
• Ghosh, P., S. Antani, L.R. Long, and G.R. Thoma. “Review of Medical Image Retrieval Systems and Future Directions.” In 2011 24th International Symposium on Computer-Based Medical Systems (CBMS), 1–6, 2011. doi:10.1109/CBMS.2011.5999142.
• Kurtz, Camille, Adrien Depeursinge, Sandy Napel, Christopher F. Beaulieu, and Daniel L. Rubin. “On Combining Image-Based and Ontological Semantic Dissimilarities for Medical Image Retrieval Applications.” Medical Image Analysis 18, no. 7 (October 2014): 1082–1100. doi:10.1016/j.media.2014.06.009.
• Siddiquie, B., R.S. Feris, and L.S. Davis. “Image Ranking and Retrieval Based on Multi-Attribute Queries.” In 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 801–8, 2011. doi:10.1109/CVPR.2011.5995329.
• Zhang, Hanwang, Zheng-Jun Zha, Yang Yang, Shuicheng Yan, Yue Gao, and Tat-Seng Chua. “Attribute-Augmented Semantic Hierarchy: Towards a Unified Framework for Content-Based Image Retrieval.” ACM Trans. Multimedia Comput. Commun. Appl. 11, no. 1s (October 2014): 21:1–21:21. doi:10.1145/2637291.

