Multimedia Information Retrieval
Lecture 10
Lecturer: Theo Gevers
Lab: MMIS
Email: [email protected]
www.science.uva.nl/~gevers
http: www.science.uva.nl/~gevers/master2003
0. Preview
1. Vision retrieval demands general domain
2. Text, colour, shape and texture
3. Searching and finding
4. Modelling
5. Relevance feedback
6. Compression
7. Indexing
8. Object localisation
6 Compression
Documents and images
• Document compression
  - Huffman coding
  - Dictionary coding (Ziv-Lempel codes)
  - Arithmetic coding
• Image compression
  - JPEG coding
  - GIF coding (Ziv-Lempel codes)
Document data: level of compression and data models

Level of compression
• Character or word level
• Words or phrases

Data model
• Static model: based on examining a sample of text and constructing statistical tables representing the sample
• Adaptive model: starts with an a priori statistical distribution for the text symbols but modifies this distribution as each object is encoded
Document data: Huffman coding (example)

Character frequencies and the resulting Huffman code:

Symbol  Frequency  Huffman code
a        7         0110
b        4         0010
c       10         000
d        5         0011
e        2         01110
f       11         010
g       15         10
h        3         01111
i        7         110
j        8         111

[Figure: Huffman tree. The leaves carry the symbol frequencies; the internal nodes have weights 5 (e, h), 9 (b, d), 12, 15 (i, j), 19, 23, 30 and 42, and the root has weight 72. Branches are labelled 0 and 1.]

Huffman code: variable length, prefix property
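The construction of the tree can be sketched in a few lines of Python. This is a minimal sketch, not the lecture's own implementation: the function name `huffman_codes` and the tie-breaking counter are assumptions made for illustration. Ties between equal weights may be resolved differently than on the slide, so individual codewords can differ, but the total encoded length is the same for every valid Huffman tree.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Return {symbol: bitstring} for a dict of symbol frequencies."""
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(w, next(tiebreak), sym) for sym, w in freqs.items()]
    heapq.heapify(heap)
    # Repeatedly merge the two lightest subtrees into one internal node.
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w0 + w1, next(tiebreak), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: a symbol
            codes[node] = prefix or "0"

    walk(heap[0][2], "")
    return codes


# Frequencies from the slide.
freqs = {"a": 7, "b": 4, "c": 10, "d": 5, "e": 2,
         "f": 11, "g": 15, "h": 3, "i": 7, "j": 8}
codes = huffman_codes(freqs)
total_bits = sum(freqs[s] * len(codes[s]) for s in freqs)
print(codes)
print(total_bits, "bits for", sum(freqs.values()), "characters")
```

For the frequencies above the encoded text takes 227 bits for 72 characters, about 3.15 bits per character.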
Document data: Ziv-Lempel, plus variants (LZ77, GZip)

Example of LZ77 compression: abaabab...

Encoder output: <0,0,a> <0,0,b> <2,1,a> <3,2,b> <5,3,b> ...
Decoder output: a | b | aa | bab | aabb | ...

The code consists of triples <a,b,c>: a identifies how far back in the decoded text to look for the upcoming text, b tells how many characters to copy for the upcoming segment, and c is a new character to add to complete the next segment.

Hint: the pointers require less space than the repeated text fragments.
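The decoding rule for the triples can be sketched as follows. This is a minimal sketch: the function name `lz77_decode` is an assumption for illustration, and the window-size limits of a real LZ77 coder are left out.

```python
def lz77_decode(triples):
    """Decode a list of (back, length, char) triples into a string."""
    out = []
    for back, length, char in triples:
        start = len(out) - back          # go `back` characters into the decoded text
        for i in range(length):
            # Copy one character at a time, so a copy may overlap its own output.
            out.append(out[start + i])
        out.append(char)                 # the new character completes the segment
    return "".join(out)


# Encoder output from the slide.
triples = [(0, 0, "a"), (0, 0, "b"), (2, 1, "a"), (3, 2, "b"), (5, 3, "b")]
decoded = lz77_decode(triples)
print(decoded)  # abaababaabb
```

The five triples decode to the segments a, b, aa, bab and aabb.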
Document data: arithmetic coding

Example: let's encode the character string abacus

Symbol      initial  after a  after ab  after aba  after abac  after abacu  after abacus
a           1/5      2/6      2/7       3/8        3/9         3/10         3/11
b           1/5      1/6      2/7       2/8        2/9         2/10         2/11
c           1/5      1/6      1/7       1/8        2/9         2/10         2/11
s           1/5      1/6      1/7       1/8        1/9         1/10         2/11
u           1/5      1/6      1/7       1/8        1/9         2/10         2/11
UpperBound  1.000    0.200    0.1000    0.076190   0.073809    0.073809     0.073795
LowerBound  0.000    0.000    0.0666    0.066666   0.072619    0.073767     0.073781
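How the interval narrows with an adaptive model can be sketched with exact fractions. This is a minimal sketch under two assumptions that the slide does not state explicitly: the symbols are ordered a, b, c, s, u inside the interval, and a symbol's count is increased by one after it has been encoded. The function name `adaptive_interval` is hypothetical.

```python
from fractions import Fraction


def adaptive_interval(text, alphabet):
    """Return the (low, high) interval after encoding `text` adaptively."""
    counts = {s: 1 for s in alphabet}    # a priori distribution: all symbols equal
    low, high = Fraction(0), Fraction(1)
    for ch in text:
        total = sum(counts.values())
        width = high - low
        cum = 0                          # total count of the symbols ordered before ch
        for s in alphabet:
            if s == ch:
                break
            cum += counts[s]
        # Narrow the interval to the slice that belongs to ch.
        high = low + width * Fraction(cum + counts[ch], total)
        low = low + width * Fraction(cum, total)
        counts[ch] += 1                  # adapt the model
    return low, high


low, high = adaptive_interval("abac", "abcsu")
print(float(low), float(high))
```

For the prefix abac this gives the interval from 61/840 to 62/840, which agrees with the bounds in the "after abac" column up to rounding.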
Document data: performance comparison

[Figure: compression (bits per character, axis 1 to 6) plotted against year (1950 to 2000) for huffman, compress, LZ78, LZ77, gzip and ppmz.]
7 Indexing
Documents: inverted files

Example: a sample text and an inverted index built on it. The words are converted to lower case and some are not indexed. The occurrences point to character positions in the text.

Text, cut into four blocks:
block 1: This is a text.
block 2: A text has many
block 3: words. Words are
block 4: made from letters.

Word start positions: This 1, is 6, a 9, text 11, A 17, text 19, has 24, many 28, words 33, Words 40, are 46, made 50, from 55, letters 60.

Vocabulary  Occurrences  Block occurrences
letters     60...        4...
made        50...        4...
many        28...        2...
text        11, 19...    1, 2...
words       33, 40...    3...
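Building the index for the sample text can be sketched as follows. This is a minimal sketch: the stopword list is an assumption, chosen so that exactly the five vocabulary words of the example remain, and `build_inverted_index` is a hypothetical name.

```python
import re
from collections import defaultdict

TEXT = "This is a text. A text has many words. Words are made from letters."
# Assumed stopword list: the slide only says that some words are not indexed.
STOPWORDS = {"this", "is", "a", "has", "are", "from"}


def build_inverted_index(text, stopwords):
    """Map each indexed word to the 1-based character positions where it starts."""
    index = defaultdict(list)
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group().lower()     # words are converted to lower case
        if word not in stopwords:
            index[word].append(match.start() + 1)
    return dict(sorted(index.items()))   # vocabulary in alphabetical order


index = build_inverted_index(TEXT, STOPWORDS)
for word, occurrences in index.items():
    print(word, occurrences)
```

A query for a word then reduces to a lookup in the vocabulary followed by a read of its occurrence list.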
Documents: signature files

Example: a signature file for a sample text cut into blocks.

Block    Text                 Signature
block 1  This is a text.      000101
block 2  A text has many      110101
block 3  words. Words are     100100
block 4  made from letters.   101101

Signature function:
H(text)    = 000101
H(many)    = 110000
H(words)   = 100100
H(made)    = 001100
H(letters) = 100001
Images: tree-based indexing

• Indexing facilitates searching
• Images are too complex for a traditional DBMS
• An image becomes a point in a k-dimensional space
• Indexing allows searching over all dimensions of the data

[Figure: scatter plot with axes feature 1, feature 2 and feature 3. A dot represents an image.]
Images: binary trees

Definition: A tree is a finite set of one or more nodes such that: (i) there is a specially designated node called the root; (ii) the remaining nodes are partitioned into n ≥ 0 disjoint sets T1,…,Tn, where each of these sets is a tree. T1,…,Tn are called the subtrees of the root.

[Figure: example tree with its levels numbered 1 to 3; node 16|30 is drawn above nodes 5|10, 21|28 and 30|35.]
Images: K-d trees

Each internal node stores values that identify a section of the multidimensional data space, together with a set of pointers referencing its children.

[Figure: points A to T in the plane, partitioned into regions numbered 1 to 8, and the corresponding k-d tree with the points stored in its leaves.]
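Searching a k-d tree for the image closest to a query can be sketched as follows. This is a minimal sketch of the classic variant that stores one point in every node, which is a simplification of the tree in the figure, where the points sit in the leaves. The names `Node`, `build`, `sqdist` and `nearest` and the sample feature vectors are assumptions for illustration.

```python
from collections import namedtuple

Node = namedtuple("Node", "point axis left right")


def build(points, depth=0):
    """Build a k-d tree by splitting on the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis,
                build(pts[:mid], depth + 1),
                build(pts[mid + 1:], depth + 1))


def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(node, query, best=None):
    """Return the stored point closest to `query`."""
    if node is None:
        return best
    if best is None or sqdist(node.point, query) < sqdist(best, query):
        best = node.point
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best so far.
    if diff ** 2 < sqdist(best, query):
        best = nearest(far, query, best)
    return best


# Each image is a point in a 3-dimensional feature space (made-up values).
points = [(2, 3, 1), (5, 4, 7), (9, 6, 2), (4, 7, 9), (8, 1, 5), (7, 2, 6)]
tree = build(points)
print(nearest(tree, (9, 2, 4)))  # (8, 1, 5)
```

The pruning test is what makes the index pay off: whole subtrees are skipped when their section of the space cannot contain a closer point.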
Images: R-trees

[Figure: left, the minimum bounding rectangles (MBRs) 1 to 4 for the R-tree, enclosing objects A to I; right, the R-tree itself with nodes 1 to 4 and leaves A to I.]
8 Object localisation
Split and merge
• Split regions until each patch is homogeneous ...
• ... and merge patches which are alike.
Works because of spatial coherence.
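The two phases can be sketched on a small grey-value image. This is a minimal sketch under several assumptions that are not from the slides: the image is square with a power-of-two side, a patch counts as homogeneous when its value range is at most a threshold, and two adjacent patches are alike when their mean values differ by at most the same threshold. All function names are hypothetical.

```python
def split(img, x, y, size, thresh, out):
    """Quadtree split: recurse until the patch is homogeneous."""
    vals = [img[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    if size == 1 or max(vals) - min(vals) <= thresh:
        out.append((x, y, size, sum(vals) / len(vals)))  # (x, y, side, mean)
        return
    half = size // 2
    for dy in (0, half):
        for dx in (0, half):
            split(img, x + dx, y + dy, half, thresh, out)


def adjacent(a, b):
    """True if two square patches share part of an edge."""
    ax, ay, asz, _ = a
    bx, by, bsz, _ = b
    side = (ax + asz == bx or bx + bsz == ax) and ay < by + bsz and by < ay + asz
    stacked = (ay + asz == by or by + bsz == ay) and ax < bx + bsz and bx < ax + asz
    return side or stacked


def split_and_merge(img, thresh):
    """Return the patches and a label image after merging alike neighbours."""
    patches = []
    split(img, 0, 0, len(img), thresh, patches)
    parent = list(range(len(patches)))   # union-find over the patches

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(patches)):
        for j in range(i + 1, len(patches)):
            alike = abs(patches[i][3] - patches[j][3]) <= thresh
            if alike and adjacent(patches[i], patches[j]):
                parent[find(i)] = find(j)

    labels = [[0] * len(img) for _ in img]
    for i, (x, y, size, _) in enumerate(patches):
        for r in range(y, y + size):
            for c in range(x, x + size):
                labels[r][c] = find(i)
    return patches, labels


img = [[10, 10, 10, 10],
       [10, 10, 10, 10],
       [10, 10, 90, 90],
       [10, 10, 90, 90]]
patches, labels = split_and_merge(img, thresh=5)
for row in labels:
    print(row)
```

In this example the split phase yields four patches and the merge phase joins the three dark ones, leaving two regions: the background and the bright object.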
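1. Normalized cross correlation:

\[ D_C(I,Q) = \frac{\sum_{k=1}^{t} w_{kQ} \cdot w_{kI}}{\sum_{k=1}^{t} w_{kQ}^{2}} \]

D_C(I,Q) is dependent on occlusion and object cluttering.

2. Histogram intersection:

\[ D_I(I,Q) = \frac{\sum_{k=1}^{t} \min\{w_{kQ}, w_{kI}\}}{\sum_{k=1}^{t} w_{kQ}} \]

D_I(I,Q) is independent of occlusion and object cluttering.

[Figure: image I and query Q; panel label: hue; axis from 0 to 100.]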
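The two similarity measures, normalized cross correlation (the sum of the products w_kQ · w_kI divided by the sum of the squared w_kQ) and histogram intersection (the sum of min{w_kQ, w_kI} divided by the sum of the w_kQ), can be compared on a toy example. This is a minimal sketch: the weights are read here as histogram bin counts, which is an assumption, and the two histograms are made up for illustration.

```python
def normalized_cross_correlation(w_i, w_q):
    """D_C(I,Q): sum of w_kQ * w_kI divided by the sum of w_kQ squared."""
    return sum(q * i for q, i in zip(w_q, w_i)) / sum(q * q for q in w_q)


def histogram_intersection(w_i, w_q):
    """D_I(I,Q): sum of min(w_kQ, w_kI) divided by the sum of w_kQ."""
    return sum(min(q, i) for q, i in zip(w_q, w_i)) / sum(w_q)


w_q = [4, 2, 0, 2]   # query histogram (made-up counts)
w_i = [6, 2, 5, 3]   # image histogram: the query object plus background clutter

d_c = normalized_cross_correlation(w_i, w_q)
d_i = histogram_intersection(w_i, w_q)
print(d_c, d_i)
```

Here the image contains every count of the query plus extra clutter. The histogram intersection stays at 1.0, while the normalized cross correlation moves away from 1 (34/24, about 1.42), which illustrates the sensitivity to cluttering noted on the slide.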
Homogeneity

Indoor photography: Where is Waldo!??

Where is Charly!?!?

Indoor photography: Where is Waldo!?? Varying imaging conditions
[Figure: 60 degrees rotation, 40 degrees rotation, scaling.]
[Figure: original image, viewpoint, rotation.]

Outdoor photography: data set

Outdoor photography: Where is my favorite bar and where can I buy tickets?

Outdoor photography: results

Outdoor photography: texture images

Split and merge: results
[Figure: result for RGB, result for colour ratios.]
[Figure: original image (texture), result for RGB, result for colour ratios.]
Results: looking for traffic signs, local vs. global

Search results for the parking sign ("Zoekresultaten parkeerbord"):

Item          1   2   3   4   5   6   7   8   9
Local (XOR)   1   2   3   5   6   7  15  19  34
Local (AND)   1   2   3   4   5   6  13  15  31
Global       14  29  47  58  75  77  79  82  86

[Figure: bar chart of these values per item, vertical axis 0 to 100.]
Results: looking for “staatslot” signs, local vs. global

Search results for the Staatslot sign ("Zoekresultaten Staatslot"):

Item          1   2   3   4   5
Local (XOR)   1   5  14  18  34
Local (AND)   1   4  13  16  27
Global        2  12  16  27  37

[Figure: bar chart of these values per item, vertical axis 0 to 40.]
9. Summary and conclusion
Multimedia information
• Features: text, colour, shape and composite
• Modelling: fuzzy-extended Boolean, vector space and probabilistic
• Searching and classification: k-nearest neighbour, clustering
• Interaction: relevance feedback (vector space, probabilistic)
• Compression: Huffman, Ziv-Lempel
• Indexing: inverted files, signature files, K-d trees, R-trees
• Localisation and visualisation: split-and-merge, highlighting
Demo 1: real-time skin detection for human recognition
Demo 2: skin/subtitle/speaker identification
Demo 3: real-time object recognition and tracking *Hieu
Demo 4: real-time object recognition and tracking *Hieu
Demo 5: real-time human recognition and tracking *Hieu
Robust to background clutter and changing object appearance
Demo 6: real-time human recognition and tracking [Hieu, IEEE PAMI, 2003]
Demo 7: real-time background detection and removal *Anuj
Demo 8: real-time object classification
Video classification: material, shadow-shape
Demo 9: real-time object classification
Video classification
Demo 10: real-time object classification
Demo 11: real-time object classification
Demo 12: real-time object classification
Demo 13: real-time object classification
Demo 14: real-time object classification
Techniques:
• Mosaics
• Shot and key-frame detection
• Analysis of camera motion
Demo 15: real-time object classification
Techniques:
• Mosaics
• Shot and key-frame detection
• Analysis of camera motion
Demo 15: real-time object classification
Techniques:
• Genre classification of image and video
• Search and learning strategies in image and video databases
• Interactive methods for image search
Demo 16: real-time object classification: image search engines
Content-based image retrieval
• Fast indexing
• Query: pictorial example, attributes
• Invariance
Prototype: PictureFinder
Multimedia information: trainees at ISIS (internships)
• Peter Vreman, “Lokalisatie van objecten in kleurenbeelden” [Localisation of objects in colour images] (completed)
• Neeltje Blommestein, “The Relevance Pyramid: Combining Browsing and Relevance Feedback in Image Databases” (completed)
• Wilma Tomasouw, “Relevance Feedback Techniques in Color Texture Image Databases” (completed)
• Frank Aldershoff, “Classification of Images on Internet by Visual and Textual Information” (completed)
• Salmon Tetelepta, “Photometric Hashing”
• Arnoud Rob, “Classifying Football Video”
• Simon van der Woude, “Billboard Identification in Video”
• …
Multimedia information: trainees at ISIS (internship topics)
• Morphological algorithms
• Linguistic indexing of image information
• Searching for images on the World Wide Web
• Viewpoint-independent object recognition
• Database research
• Face detection in video
• Hyperdocument generation from training material
• Multimedia information analysis
• Affine-invariant deformation
• Localisation of mobile platforms
• Aggression detection
• Tracking people ….