
Multimedia Information Retrieval

Lecture 10

Lecturer: Theo Gevers

Lab: MMIS

Email: [email protected]

http: www.science.uva.nl/~gevers

http: www.science.uva.nl/~gevers/master2003

0. Preview

1. Vision retrieval demands general domain

2. Text, colour, shape and texture

3. Searching and finding

4. Modelling

5. Relevance feedback

6. Compression

7. Indexing

8. Object localisation

6 Compression

Documents and images

• Document compression
  - Huffman coding
  - Dictionary coding (Ziv-Lempel codes)
  - Arithmetic coding

• Image compression
  - JPEG coding
  - GIF coding (Ziv-Lempel codes)

6 Compression

Document data: Level of compression and data models

Level of compression
  - Character or word level
  - Words or phrases

Data model
  - Static model: based on examining a sample of text and constructing statistical tables representing the sample
  - Adaptive model: starts with an a priori statistical distribution for the text symbols but modifies this distribution as each object is encoded

6 Compression

Document data: Huffman coding (example)

Symbol  Frequency  Huffman code
a       7          0110
b       4          0010
c       10         000
d       5          0011
e       2          01110
f       11         010
g       15         10
h       3          01111
i       7          110
j       8          111

Huffman code: variable length, prefix property

[Figure: Huffman tree built from the character frequencies, with branches labelled 0 and 1. Internal node weights: e+h = 5, b+d = 9, 5+a = 12, i+j = 15, 9+c = 19, 12+f = 23, g+15 = 30, 19+23 = 42, root = 42+30 = 72.]
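The construction behind the table can be sketched in a few lines of Python. This is a minimal illustration, not the lecturer's code: the function name `huffman_codes` and the tie-breaking counter are assumptions, and because ties between equal weights may be resolved differently, the individual bit strings can differ from the slide even though the total coded length (227 bits for the 72 symbols) is the same for every valid Huffman tree.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Return a dict symbol -> bit string for a frequency table."""
    tiebreak = count()  # unique counter so heap entries never compare trees
    heap = [(w, next(tiebreak), sym) for sym, w in freqs.items()]
    heapq.heapify(heap)
    # Repeatedly merge the two lightest subtrees into one.
    while len(heap) > 1:
        w0, _, t0 = heapq.heappop(heap)
        w1, _, t1 = heapq.heappop(heap)
        heapq.heappush(heap, (w0 + w1, next(tiebreak), (t0, t1)))

    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):      # internal node: (left, right)
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                            # leaf: a symbol
            codes[tree] = prefix or "0"

    walk(heap[0][2], "")
    return codes


# Frequencies from the slide.
freqs = {"a": 7, "b": 4, "c": 10, "d": 5, "e": 2,
         "f": 11, "g": 15, "h": 3, "i": 7, "j": 8}
codes = huffman_codes(freqs)
total_bits = sum(freqs[s] * len(codes[s]) for s in freqs)
```

Frequent symbols such as g receive short codes and rare ones such as e and h receive long codes, and no code is a prefix of another, which is what makes the bit stream decodable without separators.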

6 Compression

Document data: Ziv-Lempel, plus variants (LZ77, GZip)

Example of LZ77 compression: abaabab...

Encoder output: <0,0,a> <0,0,b> <2,1,a> <3,2,b> <5,3,b> ...

Decoder output: a | b | aa | bab | aabb ...

The code consists of triples <a,b,c>: a identifies how far back in the decoded text to look for the upcoming text, b tells how many characters to copy for the upcoming segment, and c is a new character to add to complete the next segment.

Hint: the pointers require less space than the repeated text fragments.
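A small decoder makes the meaning of the triples concrete. This is a sketch under the reading of <a,b,c> given on the slide (distance back, copy length, next character); the function name `lz77_decode` is an assumption for illustration.

```python
def lz77_decode(triples):
    """Decode a list of (back, length, char) triples into text."""
    out = []
    for back, length, char in triples:
        start = len(out) - back          # where the copied fragment begins
        for i in range(length):
            out.append(out[start + i])   # copy one character at a time
        out.append(char)                 # then add the new character
    return "".join(out)


# Encoder output from the slide.
triples = [(0, 0, "a"), (0, 0, "b"), (2, 1, "a"), (3, 2, "b"), (5, 3, "b")]
decoded = lz77_decode(triples)
```

Decoding the five triples yields the segments a, b, aa, bab and aabb, whose concatenation starts with the example string abaabab.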

6 Compression

Document data: Arithmetic coding

Example: Let’s encode the character string abacus

Symbol       initial  after a  after ab  after aba  after abac  after abacu  after abacus
a            1/5      2/6      2/7       3/8        3/9         3/10         3/11
b            1/5      1/6      2/7       2/8        2/9         2/10         2/11
c            1/5      1/6      1/7       1/8        2/9         2/10         2/11
s            1/5      1/6      1/7       1/8        1/9         1/10         2/11
u            1/5      1/6      1/7       1/8        1/9         2/10         2/11
Upper bound  1.000    0.200    0.1000    0.076190   0.073809    0.073809     0.073795
Lower bound  0.000    0.000    0.0666    0.066666   0.072619    0.073767     0.073781
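The interval narrowing in the table can be traced with exact fractions. The sketch below assumes an adaptive model that starts every symbol at count 1 and orders the symbols a, b, c, s, u when stacking their sub-intervals; the name `adaptive_intervals` is hypothetical. Under these assumptions it reproduces the table's bounds up to "abac"; the last two columns of the table are not reproduced exactly, so the code should be read as an illustration of the mechanism rather than of the slide's exact figures.

```python
from fractions import Fraction


def adaptive_intervals(message, alphabet):
    """Return the (low, high) interval after each encoded symbol."""
    counts = {s: 1 for s in alphabet}    # a priori: all symbols equally likely
    low, high = Fraction(0), Fraction(1)
    history = []
    for ch in message:
        total = sum(counts.values())
        width = high - low
        cum = 0                          # count mass of symbols before ch
        for s in alphabet:
            if s == ch:
                break
            cum += counts[s]
        # Narrow to the sub-interval assigned to ch (high uses the old low).
        high = low + width * Fraction(cum + counts[ch], total)
        low = low + width * Fraction(cum, total)
        counts[ch] += 1                  # adapt the model after encoding
        history.append((low, high))
    return history


history = adaptive_intervals("abacus", "abcsu")
```

Any number inside the final interval identifies the whole string, and the interval width equals the product of the adaptive probabilities of the encoded symbols.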

6 Compression

Document data: Performance comparison

[Figure: compression (bits per character, axis from 1 to 6) against year (1950 to 2000) for huffman, compress, LZ78, LZ77, gzip and ppmz.]

7 Indexing

Documents: Inverted files

Example: A sample text and an inverted index built on it. The words are converted to lower-case and some are not indexed. The occurrences point to character positions in the text.

Text: This is a text. A text has many words. Words are made from letters.

Word start positions: 1, 6, 9, 11, 17, 19, 24, 28, 33, 40, 46, 50, 55, 60

Vocabulary  Occurrences  Block occurrences
letters     60...        4...
made        50...        4...
many        28...        2...
text        11, 19...    1, 2...
words       33, 40...    3...

The text is cut into four blocks (block1 to block4); block occurrences point to block numbers instead of character positions.
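A minimal sketch of building such an inverted index is given below. The stop list (the words that "are not indexed") is an assumption chosen so that the result matches the slide's vocabulary, and the function name `build_inverted_index` is hypothetical.

```python
import re
from collections import defaultdict

# Assumed stop list: the slide only says that some words are not indexed.
STOPWORDS = {"this", "is", "a", "has", "are", "from"}


def build_inverted_index(text, stopwords=STOPWORDS):
    """Map each indexed word to the 1-based positions where it starts."""
    index = defaultdict(list)
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group().lower()     # words are converted to lower-case
        if word not in stopwords:
            index[word].append(match.start() + 1)
    return dict(sorted(index.items()))   # vocabulary in alphabetical order


text = "This is a text. A text has many words. Words are made from letters."
index = build_inverted_index(text)
```

Looking up a query word is then a dictionary access that returns every position directly, without scanning the text.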

7 Indexing

Documents: Signature files

Example: A signature file for a sample text cut into blocks

Text: This is a text. A text has many words. Words are made from letters.

Block   Signature
block1  000101
block2  110101
block3  100100
block4  101101

Signature function
H(text)    = 000101
H(many)    = 110000
H(words)   = 100100
H(made)    = 001100
H(letters) = 100001
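The block signatures are the bitwise OR of the signatures of the words in each block, which the sketch below reproduces. The word signatures are taken from the slide; the assignment of indexed words to blocks is inferred from the block signatures, and the names `block_signature` and `candidate_blocks` are hypothetical.

```python
# Word signatures from the slide, written as 6-bit integers.
SIGNATURES = {
    "text": 0b000101,
    "many": 0b110000,
    "words": 0b100100,
    "made": 0b001100,
    "letters": 0b100001,
}

# Indexed words per block (assumed split, consistent with the signatures).
BLOCK_WORDS = {
    1: ["text"],
    2: ["text", "many"],
    3: ["words", "words"],
    4: ["made", "letters"],
}


def block_signature(words):
    """Superimpose (bitwise OR) the signatures of all words in a block."""
    sig = 0
    for word in words:
        sig |= SIGNATURES[word]
    return sig


block_sigs = {b: block_signature(ws) for b, ws in BLOCK_WORDS.items()}


def candidate_blocks(word):
    """Blocks whose signature contains every bit of the word signature."""
    q = SIGNATURES[word]
    return [b for b, sig in block_sigs.items() if (sig & q) == q]
```

A query for "text" returns blocks 1, 2 and 4: block 4 does not contain the word, but the bits of "made" and "letters" together cover its signature. Such false drops are why candidate blocks must still be checked against the text itself.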

7 Indexing

Images: Tree-based indexing

• Indexing facilitates searching
• Images are too complex for a traditional DBMS
• An image becomes a point in a k-dimensional space
• Indexing allows searching across all dimensions of the data

[Figure: feature space with axes feature 1, feature 2 and feature 3; a dot represents an image.]

7 Indexing

Images: Binary trees

Definition: A tree is a finite set of one or more nodes such that: (i) there is a specially designated node called the root; (ii) the remaining nodes are partitioned into n >= 0 disjoint sets T1,…,Tn, where each of these sets is a tree. T1,…,Tn are called the subtrees of the root.

[Figure: example tree drawn over levels 1 to 3, with node labels 16|30, 5|10, 21|28 and 30|35.]

7 Indexing

Images: K-d Trees

Each internal node stores values that identify a section of the multidimensional data space, together with a set of pointers referencing its children.

[Figure: a two-dimensional data space partitioned into regions 1 to 8 containing data points A to T, shown with the corresponding tree over regions 1 to 8 and points A to T.]
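As an illustration of how such a tree supports searching, the sketch below builds a k-d tree and answers a nearest-neighbour query. It is a simplified variant and an assumption on my part: every node stores one data point that doubles as the splitting value, whereas the slide describes internal nodes holding only split values with the data in the leaves. The names `Node`, `build`, `sq_dist` and `nearest` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point                  # splitting point stored at this node
    axis: int                     # dimension used to split at this level
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points, depth=0):
    """Split on the median, cycling through the dimensions."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis,
                build(pts[:mid], depth + 1),
                build(pts[mid + 1:], depth + 1))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node, query, best=None):
    """Return the stored point closest to query."""
    if node is None:
        return best
    if best is None or sq_dist(node.point, query) < sq_dist(best, query):
        best = node.point
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Visit the other side only if the splitting plane is closer than best.
    if diff ** 2 < sq_dist(best, query):
        best = nearest(far, query, best)
    return best


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(points)
closest = nearest(tree, (9, 2))
```

The pruning test is what saves work: whole subtrees are skipped when the splitting plane lies farther from the query than the best point found so far.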

7 Indexing

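Images: R-trees

[Figure: two panels. "MBR for the R-tree": minimum bounding rectangles (MBRs) numbered 1 to 4 enclosing objects A to I. "R-tree": the corresponding tree with nodes 1 to 4 and entries A to I.]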

8 Object localisation

Split and merge

Split regions until each patch is homogeneous ...

... and merge patches which are alike. Works because of spatial coherence.
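The split phase can be sketched as a recursive quadtree subdivision. The sketch makes several assumptions that are not on the slide: the image is a square grid of grey values with a power-of-two side, a patch counts as homogeneous when its value range does not exceed a threshold, and the merge phase (joining adjacent patches with similar values) is left out. The function name `split` is hypothetical.

```python
def split(image, x, y, size, threshold, out):
    """Recursively split a square patch until it is homogeneous.

    Appends (x, y, size, mean) for every homogeneous patch to out.
    """
    values = [image[r][c]
              for r in range(y, y + size)
              for c in range(x, x + size)]
    # Assumed homogeneity test: value range within the threshold.
    if size == 1 or max(values) - min(values) <= threshold:
        out.append((x, y, size, sum(values) / len(values)))
        return
    half = size // 2
    for dy in (0, half):
        for dx in (0, half):
            split(image, x + dx, y + dy, half, threshold, out)


image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 8],
]
patches = []
split(image, 0, 0, 4, 0, patches)
```

On this 4x4 example three quadrants are already homogeneous and only the bottom-right one is split further into single pixels. A merge pass would then join the two all-zero quadrants on the left, which shows why both phases are needed.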

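Homogeneity

For an image I and a query Q with weights w_{kI} and w_{kQ}, k = 1, ..., t:

1. Normalized cross correlation:

\[ D_C(I,Q) = \frac{\sum_{k=1}^{t} w_{kQ} \cdot w_{kI}}{\sum_{k=1}^{t} (w_{kQ})^{2}} \]

D_C(I,Q) is dependent on object cluttering and occlusion.

2. Histogram intersection:

\[ D_I(I,Q) = \frac{\sum_{k=1}^{t} \min\{w_{kQ}, w_{kI}\}}{\sum_{k=1}^{t} w_{kQ}} \]

D_I(I,Q) is independent of object cluttering and occlusion.

[Figure: plot labelled hue, I and Q; axis from 0 to 100.]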
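The difference between the two measures shows up in a tiny numeric example. The sketch assumes that both measures are normalised by the query histogram only: histogram intersection as the sum of bin-wise minima divided by the total query weight, and normalized cross correlation as the dot product divided by the sum of squared query weights. The function names and the example histograms are hypothetical.

```python
def histogram_intersection(q, i):
    """Sum of bin-wise minima, normalised by the total query weight."""
    return sum(min(wq, wi) for wq, wi in zip(q, i)) / sum(q)


def normalized_cross_correlation(q, i):
    """Dot product, normalised by the sum of squared query weights."""
    return sum(wq * wi for wq, wi in zip(q, i)) / sum(wq * wq for wq in q)


query = [2, 0, 1, 1]        # histogram of the object we are looking for
cluttered = [4, 3, 1, 1]    # same object plus extra background colours

d_i = histogram_intersection(query, cluttered)
d_c = normalized_cross_correlation(query, cluttered)
```

With the object fully present and clutter added on top, the histogram intersection stays at 1.0, while the cross correlation rises to 10/6 because the extra mass in the first bin inflates the dot product. This is the sense in which the first measure is insensitive to clutter and the second is not.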

Indoor photography: Where is Waldo!??

Where is Charly!?!?

Indoor photography: Where is Waldo!?? Varying imaging conditions

[Figure panels: 60 degrees rotation; 40 degrees rotation; scaling]

[Figure panels: original image; viewpoint; rotation]


Outdoor photography: Data set

Outdoor photography: Where is my favorite bar and where can I buy tickets?

Outdoor photography: Results

Outdoor photography: Texture images

[Figure panels: Result: RGB; Result: colour ratios]

Split and merge: Results

[Figure panels: Original image; Result: RGB; Result: colour ratios; texture]



Results: Looking for traffic signs: local vs. global

[Figure: chart "Search results: parking sign"; horizontal axis: item; vertical axis from 0 to 100; series Xor local, And local and Global.]

Item        1   2   3   4   5   6   7   8   9
Xor local   1   2   3   5   6   7   15  19  34
And local   1   2   3   4   5   6   13  15  31
Global      14  29  47  58  75  77  79  82  86

Results: Looking for “staatslot” signs: local vs. global

[Figure: chart "Search results: Staatslot"; horizontal axis: item; vertical axis from 0 to 40; series Xor local, And local and Global.]

Item        1   2   3   4   5
Xor local   1   5   14  18  34
And local   1   4   13  16  27
Global      2   12  16  27  37

9. Summary and conclusion

Features

text, colour, shape and composite

Modeling

fuzzy-extended boolean, vector space and probabilistic

Searching and classification

k-nearest neighbor, clustering

Interaction

relevance feedback (vector space, probabilistic)

Multimedia information

9 Summary and conclusion

Compression

Huffman, Ziv-Lempel

Indexing

inverted files, signature files, K-d-trees, R-trees

Localization and visualization

Split-and-merge, highlighting

Multimedia information


Demo1: real-time skin detection for human recognition


Demo2: skin/subtitle/speaker identification


Demo3: real-time object recognition and tracking (Hieu)


Demo4: real-time object recognition and tracking (Hieu)


Demo5: real-time human recognition and tracking (Hieu)


Robust to background clutter and changing object appearance


Demo6: real-time human recognition and tracking [Hieu, IEEE PAMI, 2003]

Demo7: real-time background detection and removal (Anuj)


Demo8: real-time object classification


video classification

material shadow-shape


Techniques:
• Mosaics.
• Shot and key-frame detection.
• Analysis of camera-motion.

Techniques:
• Genre classification of image and video
• Search and learning strategies in image and video databases
• Interactive methods for image search


Demo16: real-time object classification: image search engines

Content-based image retrieval

Fast indexing

Query

pictorial example

attributes

Invariance

Prototype: PictureFinder


Multimedia information: Trainees at ISIS (internship)

Peter Vreman “Lokalisatie van objecten in kleurenbeelden” (completed)

Neeltje Blommestein “The Relevance Pyramid: Combining Browsing and Relevance Feedback in Image Databases” (completed)

Wilma Tomasouw “Relevance Feedback Techniques in Color Texture Image Databases” (completed)

Frank Aldershoff “Classification of Images on Internet by Visual and Textual Information” (completed)

Salmon Tetelepta “Photometric Hashing”

Arnoud Rob “Classifying Football Video”

Simon van der Woude “Billboard Identification in Video”

…


Multimedia information: Trainees at ISIS (internship)

Morphological algorithms

Linguistic indexing of image information

Searching for images on the World Wide Web

Viewpoint-independent object recognition

Database research

Face detection in video

Hyperdocument generation from training material

Multimedia information analysis

Affine-invariant deformation

Localisation of mobile platforms

Aggression detection

Tracking people ….

