Source: staff.science.uva.nl/~gevers/pub/part10.pdf

Multimedia Information Retrieval

Lecture 10

Lecturer: Theo Gevers

Lab: MMIS

Email: gevers@science.uva.nl

http://www.science.uva.nl/~gevers

http://www.science.uva.nl/~gevers/master2003

0. Preview

1. Vision retrieval demands general domain

2. Text, colour, shape and texture

3. Searching and finding

4. Modelling

5. Relevance feedback

6. Compression

7. Indexing

8. Object localisation

6 Compression

Documents and images

• Document compression: Huffman coding, dictionary coding (Ziv-Lempel codes), arithmetic coding

• Image compression: JPEG coding, GIF coding (Ziv-Lempel codes)

Document data: Level of compression and data models

Level of compression
• Character or word level
• Words or phrases

Data model
• Static model: based on examining a sample of text and constructing statistical tables representing the sample
• Adaptive model: starts with an a priori statistical distribution for the text symbols but modifies this distribution as each object is encoded

Document data: Huffman coding: Example

Symbol  Frequency  Huffman code
a        7         0110
b        4         0010
c       10         000
d        5         0011
e        2         01110
f       11         010
g       15         10
h        3         01111
i        7         110
j        8         111

Huffman code: variable length, prefix property

[Figure: character frequency, Huffman tree, Huffman code; internal node weights 15, 19 and 23 visible in the tree.]
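The construction behind the example can be sketched as follows. This is a minimal illustration, not the lecture's own code: it repeatedly merges the two least frequent subtrees, using the frequencies from the table. The names `huffman_code`, `FREQS` and `CODES` are hypothetical. Ties between equal weights may be broken differently than on the slide, so individual code words can differ while the total encoded length stays the same.

```python
import heapq
from itertools import count

# Symbol frequencies taken from the example table.
FREQS = {"a": 7, "b": 4, "c": 10, "d": 5, "e": 2,
         "f": 11, "g": 15, "h": 3, "i": 7, "j": 8}

def huffman_code(freqs):
    """Return a dict symbol -> bit string built by Huffman's algorithm."""
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(f, next(tiebreak), {s: ""}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, codes0 = heapq.heappop(heap)  # least frequent subtree
        f1, _, codes1 = heapq.heappop(heap)  # second least frequent subtree
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (f0 + f1, next(tiebreak), merged))
    return heap[0][2]

CODES = huffman_code(FREQS)
TOTAL_BITS = sum(FREQS[s] * len(CODES[s]) for s in FREQS)
```

With the code lengths from the table (for example 2 bits for g and 5 bits for e and h), the 72 symbols cost 227 bits in total, against 288 bits for a fixed 4-bit code.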

Document data: Ziv-Lempel, plus variants (LZ77, GZip)

Example of LZ77 compression: abaabab...

Encoder output: <0,0,a> <0,0,b> <2,1,a> <3,2,b> <5,3,b> ...

Decoder output: a | b | aa | bab | aabb | ...

The code consists of triples <a,b,c>: a identifies how far back in the decoded text to look for the upcoming text, b tells how many characters to copy for the upcoming segment, and c is a new character to add to complete the next segment.

Hint: the pointers require less space than the repeated text fragments.
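A decoder for the triple format described above fits in a few lines. This is a sketch that assumes exactly the <a,b,c> semantics given on the slide (distance back, copy length, next character). The function name `lz77_decode` is hypothetical.

```python
def lz77_decode(triples):
    """Decode a list of (back, length, char) triples into a string."""
    out = []
    for back, length, char in triples:
        start = len(out) - back          # where the copied fragment begins
        for i in range(length):
            out.append(out[start + i])   # copy one character at a time
        out.append(char)                 # then add the new character
    return "".join(out)

# The encoder output from the example.
TRIPLES = [(0, 0, "a"), (0, 0, "b"), (2, 1, "a"), (3, 2, "b"), (5, 3, "b")]
DECODED = lz77_decode(TRIPLES)
```

Copying one character at a time also works when the copy length exceeds the distance back, which is how LZ77 expresses runs.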

Document data: Arithmetic coding

Example: Let’s encode the character string abacus

Symbol      initial  after a  after ab  after aba  after abac  after abacu  after abacus
a           1/5      2/6      2/7       3/8        3/9         3/10         3/11
b           1/5      1/6      2/7       2/8        2/9         2/10         2/11
c           1/5      1/6      1/7       1/8        2/9         2/10         2/11
s           1/5      1/6      1/7       1/8        1/9         1/10         2/11
u           1/5      1/6      1/7       1/8        1/9         2/10         2/11
UpperBound  1.000    0.200    0.1000    0.076190   0.073809    0.073809     0.073795
LowerBound  0.000    0.000    0.0666    0.066666   0.072619    0.073767     0.073781
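The interval narrowing in the example can be sketched with an adaptive model. The assumptions here are mine, not stated on the slide: symbols are ordered a, b, c, s, u, every count starts at 1, and the count of a symbol is incremented by one after it is encoded. The function name `adaptive_arithmetic_intervals` is hypothetical. Under these assumptions the sketch reproduces the tabulated bounds through "abac". It does not reproduce the last two columns exactly, so treat the model details as a guess.

```python
from fractions import Fraction

def adaptive_arithmetic_intervals(text, alphabet):
    """Return the list of (low, high) intervals after each encoded symbol."""
    counts = {s: 1 for s in alphabet}        # a priori: all symbols equally likely
    low, high = Fraction(0), Fraction(1)
    history = [(low, high)]
    for ch in text:
        total = sum(counts.values())
        width = high - low
        cum = 0                              # count mass of symbols before ch
        for s in alphabet:
            if s == ch:
                break
            cum += counts[s]
        # Narrow the interval to the sub-range assigned to ch.
        high = low + width * Fraction(cum + counts[ch], total)
        low = low + width * Fraction(cum, total)
        counts[ch] += 1                      # adapt the model
        history.append((low, high))
    return history

INTERVALS = adaptive_arithmetic_intervals("abac", "abcsu")
for step, (lo, hi) in enumerate(INTERVALS):
    print(step, f"{float(lo):.6f}", f"{float(hi):.6f}")
```

Exact fractions avoid rounding drift; a practical coder would use scaled integer arithmetic instead.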

Document data: Performance comparison

[Figure: performance comparison of compression methods on a time axis from 1950 to 2000; labelled methods include huffman and compress.]


7 Indexing

Documents: Inverted files

Text: This is a text. A text has many words. Words are made from letters.

Word start positions: This 1, is 6, a 9, text 11, A 17, text 19, has 24, many 28, words 33, Words 40, are 46, made 50, from 55, letters 60

Text blocks: block1, block2, block3, block4

Vocabulary  Occurrences  Block occurrences
letters     60...        4...
made        50...        4...
many        28...        2...
text        11, 19...    1, 2...
words       33, 40...    3...

Example: A sample text and an inverted index built on it. The words are converted to lower-case and some are not indexed. The occurrences point to character positions in the text.
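Building such an index for the sample text can be sketched as below. The stop-word set is an assumption: it is simply the set of words that do not appear in the vocabulary of the example. The names `build_inverted_index`, `TEXT`, `STOPWORDS` and `INDEX` are hypothetical.

```python
import re
from collections import defaultdict

TEXT = "This is a text. A text has many words. Words are made from letters."

# Assumption: the words missing from the example vocabulary are stop words.
STOPWORDS = {"this", "is", "a", "has", "are", "from"}

def build_inverted_index(text, stopwords):
    """Map each indexed word to the 1-based character positions where it starts."""
    index = defaultdict(list)
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group().lower()          # words are converted to lower-case
        if word not in stopwords:
            index[word].append(match.start() + 1)
    return dict(sorted(index.items()))        # vocabulary in alphabetical order

INDEX = build_inverted_index(TEXT, STOPWORDS)
```

Replacing the character position with a block number gives the coarser block-occurrence variant, which trades smaller pointers for a scan inside the block at query time.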

Documents: Signature files

Text: This is a text. A text has many words. Words are made from letters.

Block signatures
block1  000101
block2  110101
block3  100100
block4  101101

Signature function
H(text)    = 000101
H(many)    = 110000
H(words)   = 100100
H(made)    = 001100
H(letters) = 100001

Example: A signature file for a sample text cut into blocks

Images: Tree-based indexing

• Indexing facilitates searching
• Images are too complex for a traditional DBMS
• An image becomes a point in a k-dimensional space
• Indexing allows searching across all dimensions of the data

[Figure: images plotted as dots in a space with axes feature 1, feature 2 and feature 3. A dot represents an image.]

Images: Binary trees

Definition: A tree is a finite set of one or more nodes such that: (i) there is a specially designated node called the root; (ii) the remaining nodes are partitioned into n >= 0 disjoint sets T1,…,Tn, where each of these sets is a tree. T1,…,Tn are called the subtrees of the root.

[Figure: example tree with levels; node keys 5|10, 21|28, 30|35.]

Images: K-d Trees

Each internal node stores values that identify a section of the multidimensional data space, together with a set of pointers referencing its children.

[Figure: K-d tree example; a data space with axis ticks 1 to 8 and data items labelled A to T, shown with the corresponding tree.]
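A nearest-neighbour search over image feature vectors with a k-d tree can be sketched as below. This is a common textbook variant, not necessarily the one pictured: each node stores one data point and uses its coordinate on the current axis as the split value. The feature points and all names (`Node`, `build_kdtree`, `nearest`, `FEATURES`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Node:
    point: Tuple[float, ...]          # feature vector stored at this node
    axis: int                         # dimension used to split at this node
    left: Optional["Node"] = None     # points with a smaller coordinate on axis
    right: Optional["Node"] = None    # points with a larger or equal coordinate

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def build_kdtree(points, depth=0):
    """Split on the median, cycling through the dimensions by depth."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis,
                build_kdtree(pts[:mid], depth + 1),
                build_kdtree(pts[mid + 1:], depth + 1))

def nearest(node, query, best=None):
    """Return the stored point closest to query (squared Euclidean distance)."""
    if node is None:
        return best
    if best is None or sq_dist(node.point, query) < sq_dist(best, query):
        best = node.point
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if diff * diff < sq_dist(best, query):
        best = nearest(far, query, best)
    return best

# Hypothetical 2-D feature vectors, one per image.
FEATURES = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
TREE = build_kdtree(FEATURES)
MATCH = nearest(TREE, (9, 2))
```

The pruning test is what makes the index pay off: whole subtrees are skipped when they cannot contain a closer point.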

Images: R-trees

[Figure: minimum bounding rectangles (MBRs) for the R-tree, with entries labelled A to I, next to the corresponding R-tree.]


8 Object localisation

Split and merge

Split regions until each patch is homogeneous, then merge patches which are alike. This works because of spatial coherence.
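The two phases can be sketched on a small grey-value image. This is a simplified illustration under several assumptions of mine: the image is square with a power-of-two side, a patch counts as homogeneous when its value range is within a tolerance, and two edge-adjacent patches are alike when their mean values differ by at most the same tolerance. The lecture's own homogeneity measures are histogram-based; all names here are hypothetical.

```python
def split(img, r, c, size, tol, out):
    """Quadtree split: recurse until the patch is homogeneous (range <= tol)."""
    vals = [img[i][j] for i in range(r, r + size) for j in range(c, c + size)]
    if size == 1 or max(vals) - min(vals) <= tol:
        out.append((r, c, size, sum(vals) / len(vals)))
        return
    half = size // 2
    for dr in (0, half):
        for dc in (0, half):
            split(img, r + dr, c + dc, half, tol, out)

def adjacent(a, b):
    """True if two square patches share part of an edge (corners do not count)."""
    ra, ca, sa, _ = a
    rb, cb, sb, _ = b
    rows_overlap = ra < rb + sb and rb < ra + sa
    cols_overlap = ca < cb + sb and cb < ca + sa
    touch_vertically = (ra + sa == rb or rb + sb == ra) and cols_overlap
    touch_horizontally = (ca + sa == cb or cb + sb == ca) and rows_overlap
    return touch_vertically or touch_horizontally

def split_and_merge(img, tol):
    """Return a label map in which merged patches share one label."""
    patches = []
    split(img, 0, 0, len(img), tol, patches)
    parent = list(range(len(patches)))        # union-find over patches

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(patches)):
        for j in range(i + 1, len(patches)):
            alike = abs(patches[i][3] - patches[j][3]) <= tol
            if alike and adjacent(patches[i], patches[j]):
                parent[find(i)] = find(j)     # merge the two regions

    labels = [[0] * len(img) for _ in img]
    for k, (r, c, s, _) in enumerate(patches):
        root = find(k)
        for i in range(r, r + s):
            for j in range(c, c + s):
                labels[i][j] = root
    return labels

IMG = [[1, 1, 9, 9],
       [1, 1, 9, 9],
       [1, 1, 1, 9],
       [1, 1, 9, 9]]
LABELS = split_and_merge(IMG, tol=1)
```

On this toy image the split phase yields seven patches and the merge phase joins them into two regions, one per grey value. Comparing the original patch means rather than updated region means is a further simplification.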

Homogeneity

1. Normalized cross correlation: D_C(I,Q) = ( ... )
   D_C(I,Q) is dependent on occlusion and object cluttering.

2. Histogram intersection: D_I(I,Q) = ... min{ ... }
   D_I(I,Q) is independent of occlusion and object cluttering.
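The two similarity measures named on this slide can be illustrated with commonly used definitions. These are assumptions, since the exact formulas did not survive in the transcript: histogram intersection is taken as the sum of bin-wise minima divided by the total count of the query histogram, and normalized cross correlation as the dot product divided by the product of the Euclidean norms. The histograms and function names are hypothetical.

```python
from math import sqrt

def histogram_intersection(h_image, h_query):
    """Sum of bin-wise minima, normalised by the query histogram (assumed form)."""
    return sum(min(a, b) for a, b in zip(h_image, h_query)) / sum(h_query)

def normalized_cross_correlation(h_image, h_query):
    """Dot product divided by the product of the Euclidean norms (assumed form)."""
    dot = sum(a * b for a, b in zip(h_image, h_query))
    norm = sqrt(sum(a * a for a in h_image)) * sqrt(sum(b * b for b in h_query))
    return dot / norm

# Hypothetical colour histograms: the cluttered image contains the query
# object plus background pixels that fall into the same and other bins.
QUERY = [4, 0, 2, 2]
CLUTTERED = [6, 5, 2, 3]

HI_SCORE = histogram_intersection(CLUTTERED, QUERY)
NCC_SCORE = normalized_cross_correlation(CLUTTERED, QUERY)
```

In this toy case the intersection stays at 1.0 because background pixels can only add to the image bins, while the cross correlation drops to about 0.81. That matches the slide's remark on cluttering for these assumed definitions.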

Indoor photography: Where is Waldo!??

Where is Charly!?!?

Indoor photography: Where is Waldo!?? Varying imaging conditions

[Images: 60 degrees rotation; 40 degrees rotation; scaling]

[Images: original image; viewpoint; rotation]

Outdoor photography: Data set

Outdoor photography: Where is my favorite bar and where can I buy tickets?

Outdoor photography: Results

Outdoor photography: Texture images

[Images: result with RGB; result with colour ratios]

Split and merge: Results

[Images: original image (texture); result with RGB; result with colour ratios]

Results: Looking for traffic signs, local vs. global

Search results: parking sign

Item        1   2   3   4   5   6   7   8   9
Xor local   1   2   3   5   6   7  15  19  34
And local   1   2   3   4   5   6  13  15  31
Global     14  29  47  58  75  77  79  82  86

Results: Looking for “staatslot” signs, local vs. global

Search results: Staatslot

Item        1   2   3   4   5
Xor local   1   5  14  18  34
And local   1   4  13  16  27
Global      2  12  16  27  37


9 Summary and conclusion

Multimedia information

• Features: text, colour, shape and composite
• Modelling: fuzzy-extended boolean, vector space and probabilistic
• Searching and classification: k-nearest neighbor, clustering
• Interaction: relevance feedback (vector space, probabilistic)
• Compression: Huffman, Ziv-Lempel
• Indexing: inverted files, signature files, K-d trees, R-trees
• Localisation and visualisation: split-and-merge, highlighting

Demo1: real-time skin detection for human recognition

Demo2: skin/subtitle/speaker identification

Demo3: real-time object recognition and tracking *Hieu

Demo4: real-time object recognition and tracking *Hieu

Demo5: real-time human recognition and tracking *Hieu

Robust to background clutter and changing object appearance

Demo6: real-time human recognition and tracking [Hieu, IEEE PAMI, 2003]

Demo7: real-time background detection and removal *Anuj

Demo8: real-time object classification

video classification

material shadow-shape

Techniques:
• Mosaics
• Shot and key-frame detection
• Analysis of camera motion

Techniques:
• Genre classification of image and video
• Search and learning strategies in image and video databases
• Interactive methods for image search

Demo16: real-time object classification: image search engines

Content-based image retrieval

Fast indexing

pictorial example

attributes

Invariance

Prototype: PictureFinder

Peter Vreman “Lokalisatie van objecten in kleurenbeelden” (completed)

Neeltje Blommestein “The Relevance Pyramid: Combining Browsing and Relevance Feedback in Image Databases” (completed)

Wilma Tomasouw “Relevance Feedback Techniques in Color Texture Image Databases” (completed)

Frank Aldershoff “Classification of Images on Internet by Visual and Textual Information” (completed)

Salmon Tetelepta “Photometric Hashing”

Arnoud Rob “Classifying Football Video”

Simon van der Woude “Billboard Identification in Video”

Multimedia information: Trainees at ISIS (internship)

Morphological algorithms

Linguistic indexing of image information

Searching for images on the World Wide Web

Viewpoint-independent object recognition

Database research

Face detection in video

Hyperdocument generation from training material

Multimedia information analysis

Affine-invariant deformation

Localisation of mobile platforms

Aggression detection

Tracking people ...