
Multimedia Information Retrieval

Lecture 10

Lecturer: Theo Gevers
Lab: MMIS

Email: gevers@science.uva.nl
http://www.science.uva.nl/~gevers

http://www.science.uva.nl/~gevers/master2003

0. Preview

1. Vision retrieval demands general domain

2. Text, colour, shape and texture

3. Searching and finding

4. Modelling

5. Relevance feedback

6. Compression

7. Indexing

8. Object localisation

6 Compression

Documents and images

• Document compression: Huffman coding, dictionary coding (Ziv-Lempel codes), arithmetic coding

• Image compression: JPEG coding, GIF coding (Ziv-Lempel codes)

Document data: level of compression and data models

Level of compression
• Character or word level
• Words or phrases

Data model
• Static model: based on examining a sample of text and constructing statistical tables representing the sample
• Adaptive model: starts with an a priori statistical distribution for the text symbols but modifies this distribution as each object is encoded

Document data: Huffman coding example

Symbol  Frequency  Huffman code
a        7         0110
b        4         0010
c       10         000
d        5         0011
e        2         01110
f       11         010
g       15         10
h        3         01111
i        7         110
j        8         111

Huffman code: variable length, prefix property (no codeword is a prefix of another, so an encoded bit stream decodes unambiguously without separators).

[Figure: character frequencies, the Huffman tree built from them (root weight 72, branches labelled 0 and 1) and the resulting Huffman code.]
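The tree can be rebuilt mechanically by repeatedly merging the two lightest subtrees. The Python sketch below is a minimal illustration under our own assumptions: ties are broken by insertion order and the first subtree taken from the heap goes on the 0 branch, so some codewords may differ from the slide while every code length, and hence the total encoded size, is the same.

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """Return {symbol: bitstring} for a dict of symbol frequencies."""
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(w, next(tiebreak), sym) for sym, w in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate alphabet: give the only symbol one bit
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # lightest subtree -> branch 0
        w2, _, right = heapq.heappop(heap)  # next lightest    -> branch 1
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):  # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                        # leaf: a symbol
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes

freqs = {"a": 7, "b": 4, "c": 10, "d": 5, "e": 2,
         "f": 11, "g": 15, "h": 3, "i": 7, "j": 8}
codes = huffman_code(freqs)
total_bits = sum(freqs[s] * len(codes[s]) for s in freqs)
print(codes)
print(total_bits)
```

For these 72 characters the code needs 227 bits, about 3.15 bits per character, against 4 bits per character (288 bits) for a fixed-length code over 10 symbols.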

Document data: Ziv-Lempel, plus variants (LZ77, gzip)

Example of LZ77 compression: abaabab...

Encoder output: <0,0,a> <0,0,b> <2,1,a> <3,2,b> <5,3,b> ...

Decoder output: a | b | aa | bab | aabb | ...

The code consists of triples <a,b,c>: a identifies how far back in the decoded text to look for the upcoming text, b tells how many characters to copy for the upcoming segment, and c is a new character to add to complete the next segment.

Hint: the pointers require less space than the repeated text fragments.
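Decoding the five triples gives abaababaabb, whose first seven characters are the abaabab of the example. Below is a minimal greedy LZ77 sketch, assuming a fixed search window of 4096 characters (our choice) and always taking the longest match, preferring the earliest one on ties; for that string it reproduces the triples above. gzip itself combines LZ77-style matching with Huffman coding of the output, which this sketch leaves out.

```python
def lz77_encode(text, window=4096):
    """Greedy LZ77: emit (offset, length, next_char) triples."""
    triples = []
    i = 0
    while i < len(text):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # Extend the match; it may run into the look-ahead (overlapping copy),
            # but always leave one character to emit as the literal.
            while i + length < len(text) - 1 and text[j + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        triples.append((best_off, best_len, text[i + best_len]))
        i += best_len + 1
    return triples

def lz77_decode(triples):
    """Rebuild the text; copying one character at a time handles overlapping matches."""
    out = []
    for offset, length, char in triples:
        start = len(out) - offset
        for k in range(length):
            out.append(out[start + k])
        out.append(char)
    return "".join(out)

triples = lz77_encode("abaababaabb")
print(triples)
# [(0, 0, 'a'), (0, 0, 'b'), (2, 1, 'a'), (3, 2, 'b'), (5, 3, 'b')]
print(lz77_decode(triples))  # abaababaabb
```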

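Document data: arithmetic coding

Example: Let’s encode the character string abacus. The model is adaptive: every symbol count starts at 1 and is incremented after the symbol has been encoded. Encoding a symbol narrows the current interval [lower bound, upper bound) to that symbol's share of it, taking the symbols in the order a, b, c, s, u. Each column gives the symbol probabilities and the interval after the prefix shown.

Symbol       initial   after a   after ab  after aba  after abac  after abacu  after abacus
a            1/5       2/6       2/7       3/8        3/9         3/10         3/11
b            1/5       1/6       2/7       2/8        2/9         2/10         2/11
c            1/5       1/6       1/7       1/8        2/9         2/10         2/11
s            1/5       1/6       1/7       1/8        1/9         1/10         2/11
u            1/5       1/6       1/7       1/8        1/9         2/10         2/11
Upper bound  1.000000  0.200000  0.100000  0.076190   0.073810    0.073810     0.073783
Lower bound  0.000000  0.000000  0.066667  0.066667   0.072619    0.073677     0.073770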
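A small sketch of the adaptive model in the abacus example, assuming the alphabet order a, b, c, s, u and counts that start at 1. Python's `fractions.Fraction` keeps every interval exact, so the bounds can be checked digit by digit. It only computes the interval; choosing and emitting the shortest bit string inside the final interval is left out.

```python
from fractions import Fraction

def arithmetic_interval(message, alphabet):
    """Return the final [low, high) interval for message under an adaptive model."""
    counts = {s: 1 for s in alphabet}  # every symbol count starts at 1
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        total = sum(counts.values())
        # cumulative count of the symbols that precede sym in alphabet order
        cum = sum(counts[s] for s in alphabet[:alphabet.index(sym)])
        width = high - low
        high = low + width * Fraction(cum + counts[sym], total)
        low = low + width * Fraction(cum, total)
        counts[sym] += 1  # update the model after coding sym
    return low, high

for n in range(1, 7):
    prefix = "abacus"[:n]
    low, high = arithmetic_interval(prefix, "abcsu")
    print(f"{prefix:<7} upper={float(high):.6f} lower={float(low):.6f}")
# last line printed: abacus  upper=0.073783 lower=0.073770
```

A decoder that runs the same model and knows the message length can recover abacus from any number inside the final interval.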

Document data: performance comparison

[Figure: compression in bits per character (axis 1 to 6) against year (1950 to 2000) for huffman, compress, LZ78, LZ77, gzip and ppmz.]


7 Indexing

Documents: inverted files

Text: This is a text. A text has many words. Words are made from letters.
Word start positions: 1 6 9 11 17 19 24 28 33 40 46 50 55 60

Vocabulary   Occurrences   Block occurrences
letters      60...         4...
made         50...         4...
many         28...         2...
text         11, 19...     1, 2...
words        33, 40...     3...

The text is cut into four blocks (block1 to block4); the block occurrences list the blocks in which each word appears.

Example: A sample text and an inverted index built on it. The words are converted to lower case and some are not indexed. The occurrences point to character positions in the text.
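A minimal sketch of building this inverted file in Python. The stopword list is hypothetical, chosen only so that the indexed vocabulary matches the example; positions are 1-based character offsets of each word's first letter.

```python
import re
from collections import defaultdict

# Hypothetical stopword list, chosen so the indexed vocabulary matches the example.
STOPWORDS = {"this", "is", "a", "has", "are", "from"}

def build_inverted_file(text):
    """Map each indexed word to the 1-based character positions where it starts."""
    index = defaultdict(list)
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group().lower()  # words are converted to lower case
        if word not in STOPWORDS:
            index[word].append(match.start() + 1)
    return dict(sorted(index.items()))  # vocabulary in alphabetical order

text = "This is a text. A text has many words. Words are made from letters."
for word, positions in build_inverted_file(text).items():
    print(word, positions)
```

A query for a word then reads its occurrence list directly instead of scanning the text.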

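Documents: signature files

Text: This is a text. A text has many words. Words are made from letters.
Block signatures (block1 to block4): 000101  110101  100100  101101

Signature function:
H(text) = 000101
H(many) = 110000
H(words) = 100100
H(made) = 001100
H(letters) = 100001

Each block signature is the bitwise OR of the signatures of the words in that block; for example, block2 contains text and many, and 000101 OR 110000 = 110101.

Example: A signature file for a sample text cut into blocks.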
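The block signatures can be checked with a short sketch. The slide does not give the hash function H, so a lookup table copied from it stands in; the indexed words per block are inferred from the block occurrences of the inverted-file example.

```python
# Word signatures copied from the example; H itself is unknown, so a table stands in.
H = {"text": 0b000101, "many": 0b110000, "words": 0b100100,
     "made": 0b001100, "letters": 0b100001}

# Indexed words per block, as implied by the block occurrences of the inverted file.
BLOCKS = [["text"], ["text", "many"], ["words", "words"], ["made", "letters"]]

def block_signature(words):
    """Superimpose (bitwise OR) the signatures of all words in a block."""
    sig = 0
    for w in words:
        sig |= H[w]
    return sig

signatures = [block_signature(b) for b in BLOCKS]

def candidate_blocks(word):
    """1-based numbers of the blocks whose signature contains every bit of H(word)."""
    q = H[word]
    return [i + 1 for i, sig in enumerate(signatures) if sig & q == q]

print([format(s, "06b") for s in signatures])  # ['000101', '110101', '100100', '101101']
print(candidate_blocks("words"))               # [2, 3, 4]
```

Only block 3 really contains words; blocks 2 and 4 are false drops whose superimposed bits happen to cover H(words), so every candidate block must still be scanned to confirm a match.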

Images: tree-based indexing

• Indexing facilitates searching.
• Images are too complex for traditional DBMSs.
• An image becomes a point in a k-dimensional space.
• Indexing allows all dimensions of the data to be searched.

[Figure: images plotted as points in a feature space with axes feature 1, feature 2 and feature 3; a dot represents an image.]

Images: binary trees

Definition: A tree is a finite set of one or more nodes such that (i) there is a specially designated node called the root; (ii) the remaining nodes are partitioned into n ≥ 0 disjoint sets T1,…,Tn, where each of these sets is a tree. T1,…,Tn are called the subtrees of the root.

[Figure: a search tree drawn over levels 1 to 3; the root holds the keys 16|30 and its children hold 5|10, 21|28 and 30|35.]

Images: K-d trees

Each internal node stores values that identify a section of the multidimensional data space, together with a set of pointers to its children.

[Figure: a two-dimensional data space partitioned into regions 1 to 8 that contain the data points A to T, and the corresponding K-d tree.]
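A minimal 2-d sketch of a K-d tree and a nearest-neighbour query over image feature vectors. It is the classic variant with one point per node, a median split and axes cycled level by level; the bucketed tree in the figure, whose leaves hold several points, differs in detail. The feature vectors are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]

@dataclass
class KdNode:
    point: Point
    axis: int  # dimension this node splits on
    left: Optional["KdNode"] = None
    right: Optional["KdNode"] = None

def build_kd_tree(points: Sequence[Point], depth: int = 0) -> Optional[KdNode]:
    """Split at the median point, cycling through the dimensions level by level."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return KdNode(ordered[mid], axis,
                  build_kd_tree(ordered[:mid], depth + 1),
                  build_kd_tree(ordered[mid + 1:], depth + 1))

def nearest(node: Optional[KdNode], query: Point, best=None):
    """Return (distance, point) of the stored point closest to query."""
    if node is None:
        return best
    d = math.dist(node.point, query)
    if best is None or d < best[0]:
        best = (d, node.point)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # The far side can only hold a closer point if the splitting plane is nearer
    # than the best match found so far; otherwise the whole subtree is skipped.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best

# Hypothetical 2-d feature vectors, one per image.
features = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kd_tree(features)
print(nearest(tree, (9, 2)))  # nearest image: (8, 1), at distance sqrt(2)
```

The pruning test is what lets the search skip most of the data; in high-dimensional feature spaces it succeeds less often, and the search degrades towards a linear scan.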

Images: R-trees

[Figure: minimum bounding rectangles (MBRs) 1 to 4 around the objects A to I, and the corresponding R-tree.]
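A minimal sketch of how MBRs prune an R-tree search: a subtree is visited only if its MBR intersects the query rectangle. The objects, their rectangles and the two-leaf grouping are hypothetical and built by hand; the insertion and node-splitting algorithms that maintain a real R-tree are not shown.

```python
from dataclasses import dataclass, field

# A rectangle is (xmin, ymin, xmax, ymax).

def mbr(rects):
    """Minimum bounding rectangle of a list of rectangles."""
    xmins, ymins, xmaxs, ymaxs = zip(*rects)
    return (min(xmins), min(ymins), max(xmaxs), max(ymaxs))

def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

@dataclass
class RNode:
    box: tuple
    children: list = field(default_factory=list)  # child RNodes (internal node)
    entries: dict = field(default_factory=dict)   # object name -> rectangle (leaf)

def leaf(entries):
    return RNode(mbr(list(entries.values())), entries=entries)

def internal(children):
    return RNode(mbr([c.box for c in children]), children=children)

def search(node, query):
    """Names of objects whose rectangles intersect query; skips subtrees by MBR."""
    if not intersects(node.box, query):
        return []
    hits = [name for name, rect in node.entries.items() if intersects(rect, query)]
    for child in node.children:
        hits.extend(search(child, query))
    return hits

# Hypothetical objects grouped by hand into two leaves.
group1 = leaf({"A": (0, 0, 1, 1), "B": (2, 0, 3, 1)})
group2 = leaf({"C": (6, 6, 7, 7), "D": (8, 8, 9, 9)})
root = internal([group1, group2])
print(root.box)                        # (0, 0, 9, 9)
print(search(root, (2.5, 0.5, 4, 4)))  # ['B']
```

For the query shown, the MBR of group2 does not intersect the query, so C and D are never examined.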


8 Object localisation

Split and merge

Split regions until each patch is homogeneous ...

... and merge patches which are alike. This works because of spatial coherence.
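A minimal split-and-merge sketch on a small grey-value image. Our assumptions: the image is square with a power-of-two side, a patch counts as homogeneous when its maximum minus minimum is at most `thresh`, and two 4-adjacent patches are alike when their mean values differ by at most `thresh`. A colour version would substitute a colour measure for both tests.

```python
def split(img, x, y, size, thresh, leaves):
    """Recursively split a square patch until max - min <= thresh (homogeneous)."""
    values = [img[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    if size == 1 or max(values) - min(values) <= thresh:
        leaves.append((x, y, size, sum(values) / len(values)))  # keep the patch mean
        return
    half = size // 2
    for dx, dy in ((0, 0), (half, 0), (0, half), (half, half)):
        split(img, x + dx, y + dy, half, thresh, leaves)

def split_and_merge(img, thresh):
    """Quadtree split, then merge 4-adjacent patches whose means differ by <= thresh."""
    n = len(img)  # assumes a square image whose side is a power of two
    leaves = []
    split(img, 0, 0, n, thresh, leaves)
    owner = [[0] * n for _ in range(n)]  # pixel -> index of its patch
    for i, (x, y, size, _) in enumerate(leaves):
        for r in range(y, y + size):
            for c in range(x, x + size):
                owner[r][c] = i
    parent = list(range(len(leaves)))  # union-find over patches

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for r in range(n):
        for c in range(n):
            for rr, cc in ((r + 1, c), (r, c + 1)):  # neighbour below and to the right
                if rr < n and cc < n:
                    a, b = owner[r][c], owner[rr][cc]
                    if a != b and abs(leaves[a][3] - leaves[b][3]) <= thresh:
                        parent[find(a)] = find(b)
    labels = [[find(owner[r][c]) for c in range(n)] for r in range(n)]
    return leaves, labels

img = [[0, 0, 9, 9],
       [0, 0, 9, 9],
       [0, 0, 9, 9],
       [0, 0, 9, 9]]
leaves, labels = split_and_merge(img, thresh=1)
print(len(leaves))                               # 4 patches after splitting
print(len({lab for row in labels for lab in row}))  # 2 regions after merging
```

The quadtree split cuts each uniform half of the test image into two patches; merging reunites them. Because the merge test compares neighbouring patches pairwise, a slow gradient can chain many patches into one region; comparing against a running region mean is one way to avoid that.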

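Homogeneity

Two measures for matching the hue histogram of an image I against that of a query Q, with w_{Ik} and w_{Qk} the values of bin k of the two histograms (t bins in total):

1. Normalized cross correlation (dependent on occlusion and object cluttering):

$$D_C(I,Q) = \frac{\sum_{k=1}^{t} w_{Ik} \cdot w_{Qk}}{\sum_{k=1}^{t} w_{Qk}^{2}}$$

2. Histogram intersection (independent of occlusion and object cluttering):

$$D_I(I,Q) = \frac{\sum_{k=1}^{t} \min\{w_{Ik}, w_{Qk}\}}{\sum_{k=1}^{t} w_{Qk}}$$

[Figure: hue histograms of image I and query Q.]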
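A direct transcription of the two measures, with a hypothetical 4-bin hue histogram to show the contrast. The query object appears in the image together with clutter that falls into the same bins as the object: the intersection stays at 1, while the correlation rises above 1.

```python
def normalized_cross_correlation(hist_i, hist_q):
    """D_C(I,Q) = sum_k w_Ik * w_Qk / sum_k w_Qk^2 (query histogram must be non-empty)."""
    return sum(wi * wq for wi, wq in zip(hist_i, hist_q)) / sum(wq * wq for wq in hist_q)

def histogram_intersection(hist_i, hist_q):
    """D_I(I,Q) = sum_k min(w_Ik, w_Qk) / sum_k w_Qk (query histogram must be non-empty)."""
    return sum(min(wi, wq) for wi, wq in zip(hist_i, hist_q)) / sum(hist_q)

query = [4, 2, 0, 0]  # hypothetical hue histogram of the object
image = [4, 5, 3, 0]  # same object plus clutter falling into bins 2 and 3
print(histogram_intersection(image, query))        # 1.0
print(normalized_cross_correlation(image, query))  # 1.3
```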

Indoor photography: Where is Waldo?

Where is Charly? [Figure: Charly]

Indoor photography: Where is Waldo? Varying imaging conditions

[Figure panels: 60 degrees rotation; 40 degrees rotation; scaling.]

[Figure panels: original image; viewpoint; rotation.]

Outdoor photography: data set

Outdoor photography: Where is my favorite bar, and where can I buy tickets?

Outdoor photography: results

Texture images

Split and merge: results

[Figure panels: result with RGB; result with colour ratios.]

[Figure panels: original texture image; result with RGB; result with colour ratios.]

Results: looking for traffic signs, local vs. global

Search results: parking sign (chart, y-axis 0 to 100)

Item         1   2   3   4   5   6   7   8   9
Xor local    1   2   3   5   6   7  15  19  34
And local    1   2   3   4   5   6  13  15  31
Global      14  29  47  58  75  77  79  82  86

Results: looking for “staatslot” (state lottery ticket) signs, local vs. global

Search results: Staatslot (chart, y-axis 0 to 40)

Item         1   2   3   4   5
Xor local    1   5  14  18  34
And local    1   4  13  16  27
Global       2  12  16  27  37

0. Preview

1. Vision retrieval demands general domain

2. Text, colour, shape and texture

3. Searching and finding

4. Modelling

5. Relevance feedback

6. Compression

7. Indexing

8. Object localisation

9. Summary and conclusion

9 Summary and conclusion

Multimedia information

• Features: text, colour, shape and composite
• Modelling: fuzzy-extended Boolean, vector space and probabilistic
• Searching and classification: k-nearest neighbor, clustering
• Interaction: relevance feedback (vector space, probabilistic)
• Compression: Huffman, Ziv-Lempel
• Indexing: inverted files, signature files, K-d trees, R-trees
• Localization and visualization: split-and-merge, highlighting

Demo1: real-time skin detection for human recognition

Demo2: skin/subtitle/speaker identification

Demo3: real-time object recognition and tracking *Hieu

Demo4: real-time object recognition and tracking *Hieu

Demo5: real-time human recognition and tracking *Hieu

Robust to background clutter and changing object appearance

Demo6: real-time human recognition and tracking [Hieu, IEEE PAMI, 2003]

Demo7: real-time background detection and removal *Anuj

Demo8: real-time object classification

[Figure labels: video classification; material, shadow-shape.]

Demo9: real-time object classification

[Figure label: video classification.]

Demo10: real-time object classification

Demo11: real-time object classification

Demo12: real-time object classification

Demo13: real-time object classification

Demo13: real-time object classification

Demo14: real-time object classification

Techniques:
• Mosaics
• Shot and key-frame detection
• Analysis of camera motion

Demo15: real-time object classification

Techniques:
• Mosaics
• Shot and key-frame detection
• Analysis of camera motion

Demo15: real-time object classification

Techniques:
• Genre classification of image and video
• Search and learning strategies in image and video databases
• Interactive methods for image search

Demo16: real-time object classification: image search engines

Prototype: PictureFinder
• Content-based image retrieval
• Fast indexing
• Query: pictorial example, attributes
• Invariance

Multimedia information: trainees at ISIS (internships)

Peter Vreman, “Lokalisatie van objecten in kleurenbeelden” (Localisation of objects in colour images) (completed)

Neeltje Blommestein, “The Relevance Pyramid: Combining Browsing and Relevance Feedback in Image Databases” (completed)

Wilma Tomasouw, “Relevance Feedback Techniques in Color Texture Image Databases” (completed)

Frank Aldershoff, “Classification of Images on Internet by Visual and Textual Information” (completed)

Salmon Tetelepta, “Photometric Hashing”

Arnoud Rob, “Classifying Football Video”

Simon van der Woude, “Billboard Identification in Video”

Multimedia information: trainees at ISIS (internships)

Morphological algorithms

Linguistic indexing of image information

Searching for images on the World Wide Web

Viewpoint-independent object recognition

Database research

Face detection in video

Hyperdocument generation from training material

Multimedia information analysis

Affine invariant deformation

Localisation of mobile platforms

Aggression detection

Tracking people ….