Chapter 2: Source coding

Vahid Meghdadi

[email protected]
University of Limoges


Outline

Markov source
- Entropy of Markov Source

Source coding
- Definitions
- Shannon-Fano algorithm
- Huffman coding
- Lempel-Ziv coding


Markov model for information sources

Given the present, the future is independent of the past. This fact can be expressed as follows. For a given source producing X1, X2 and X3, we can always write:

P(X1, X2, X3) = P(X1)P(X2|X1)P(X3|X1, X2)

However, for a Markov source, the history of the source is saved in its present state:

P(Xk = sq|X1, X2, ..., Xk−1) = P(Xk = sq|Sk)

It means that the state contains all the past of the source.


Markov source modeling

- The source can be in one of n possible states.
- At each symbol generation, the source changes its state from i to j.
- This state change occurs with probability pij, which depends only on the initial state i and the final state j and remains constant over time.
- The generated symbol depends only on the initial state and the next state (i and j).
- The transition matrix P = [pij] is defined, where pij is the probability of a transition from state i to state j (a small simulation sketch is given below).
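
The following Python sketch (an illustration added here, not part of the original slides) simulates such a source for a hypothetical two-state example; the states, symbols and probabilities are placeholders that mirror the two-state example used later in these slides.

    # Illustrative sketch: simulating a Markov source.
    # Each entry of `transitions` is (next_state, emitted_symbol, probability).
    import random

    transitions = {
        1: [(1, 'A', 0.75), (2, 'C', 0.25)],
        2: [(2, 'B', 0.75), (1, 'C', 0.25)],
    }

    def generate(n_symbols, state=1, seed=0):
        rng = random.Random(seed)
        out = []
        for _ in range(n_symbols):
            next_states, symbols, probs = zip(*transitions[state])
            i = rng.choices(range(len(probs)), weights=probs)[0]
            out.append(symbols[i])
            state = next_states[i]
        return ''.join(out)

    print(generate(20))   # a 20-symbol message over {A, B, C}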


Markov source, example

- A symbol is sent out at each transition.
- What is the probability of message "A"?
- What is the probability of message "B"?
- What is the probability of message "AB"?
- Give the transition matrix.

Figure: a three-state Markov source (states 1, 2, 3); each transition is labelled with the symbol it emits (A, B or C) and its probability (1/4 or 1/2).


Entropy of a Markov source

Each state can be considered as a source, and for this source the entropy can be calculated. We then average these entropies over all the states to find the entropy of the whole source, i.e.:

H = Σ_{i=1}^{n} P_i H_i

where n is the number of states and

H_i = Σ_{j=1}^{n} p_ij log2(1/p_ij)

The information rate is then:

R = r_s H

where r_s is the symbol rate.
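
As a small numerical illustration (added here; the transition probabilities are those of the two-state example used on the following slides), H can be computed as follows.

    # Sketch: entropy of a two-state Markov source
    # (self-loop probability 3/4 in each state, 1/4 across, P1 = P2 = 1/2).
    import numpy as np

    P = np.array([[3/4, 1/4],
                  [1/4, 3/4]])            # transition matrix [p_ij]
    Pi = np.array([1/2, 1/2])             # stationary state probabilities P_i

    H_i = -(P * np.log2(P)).sum(axis=1)   # per-state entropies H_i
    H = Pi @ H_i                          # H = sum_i P_i H_i
    print(H_i, H)                         # [0.811 0.811] 0.811 bits/symbol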


Example

For the following example, calculate:

- the entropy of the source,
- the information per symbol contained in the messages of size 1, 2 and 3 symbols.
- Is it true that the information per symbol is smaller for longer messages?

Figure: two-state Markov source; each state has a self-loop with probability 3/4 (emitting A in state 1 and B in state 2) and a transition to the other state with probability 1/4 (emitting C); stationary probabilities P1 = P2 = 1/2.


Theorem

Consider the messages m_i of size N at the output of a Markov source. Since the information content of the sequence m_i is −log2 p(m_i), the mean information per symbol is:

G_N = −(1/N) Σ_i p(m_i) log2 p(m_i)

It can be shown that G_N is a monotone decreasing function of N, and:

lim_{N→∞} G_N = H bits/symbol
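
The sketch below (an added illustration; the transition structure is the two-state example assumed above) enumerates all messages of length N, computes their probabilities from the Markov model, and evaluates G_N.

    # Sketch: G_N for the two-state example (self-loops 3/4 emitting A/B,
    # cross transitions 1/4 emitting C, initial state probabilities 1/2 each).
    import math

    trans = {1: [('A', 1, 3/4), ('C', 2, 1/4)],
             2: [('B', 2, 3/4), ('C', 1, 1/4)]}
    start = {1: 1/2, 2: 1/2}

    def message_probs(N):
        """Probability of every length-N symbol sequence."""
        probs = {}
        def walk(state, msg, p):
            if len(msg) == N:
                probs[msg] = probs.get(msg, 0.0) + p
                return
            for sym, nxt, q in trans[state]:
                walk(nxt, msg + sym, p * q)
        for s, ps in start.items():
            walk(s, '', ps)
        return probs

    for N in (1, 2, 3):
        G = -sum(p * math.log2(p) for p in message_probs(N).values()) / N
        print(N, round(G, 3))   # G_N decreases with N toward H ≈ 0.811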


Compression

Application to compression:

- Suppose that a source is modeled by a Markov model.
- If we only gather statistics for sequences of one symbol, the correlation between consecutive symbols is not exploited.
- The information content therefore appears larger than it really is.
- If we consider sequences of n symbols, more of this correlation is taken into account, so the measured entropy decreases and gets closer to the real source entropy.

Conclusion: to compress better, we should code longer sequences.


Source coding

Two types of compression can be considered:

- Lossless coding: the information can be reconstructed exactly from the coded sequence (Shannon, Huffman, LZW, PKZIP, GIF, TIFF, ...).
- Lossy coding: information is lost in the coding process (JPEG, MPEG, wavelet, transform coding, sub-band coding, ...).

In this course we only consider the first one. We suppose furthermore that the sequences at the output of the encoder are binary.


Definitions(1)

Definition: A source code C for a random variable X is a mapping from X, the range of X, to D, the set of finite-length strings of symbols from a D-ary alphabet. C(x) denotes the codeword corresponding to x and l(x) denotes the length of C(x).

Example

If you toss a coin, X = {tail, head}, C(head) = 0, C(tail) = 11, l(head) = 1, l(tail) = 2.


Definitions(2)

Definition: The expected (average) length of a code C(x) for a random variable X with probability mass function p(x) is given by

L(C) = Σ_{x∈X} p(x) l(x)

For the coin example above, the expected length of the code is

L(C) = (1/2) · 2 + (1/2) · 1 = 1.5 bits


Definitions(3)

Definition: The code is singular if C(x1) = C(x2) for some x1 ≠ x2.

Definition: The extension of a code C is the code obtained as:

C(x1 x2 ... xn) = C(x1) C(x2) ... C(xn)

It means that a long message can be coded by concatenating the codewords of the shorter messages. For example, if C(x1) = 11 and C(x2) = 00, then C(x1x2) = 1100.


Definitions(4)

Definition: A code is uniquely decodable if its extension is non-singular.

In other words, any encoded string corresponds to only one possible source string; there is no ambiguity.

Definition: A code is called a prefix code or an instantaneous code if no codeword is a prefix of any other codeword.


Example (1)

Example

The following code is a prefix code: C(x1) = 1, C(x2) = 01, C(x3) = 001, C(x4) = 000. Any encoded sequence is uniquely decodable, and the corresponding source word can be obtained as soon as the codeword is received. In other words, an instantaneous code can be decoded without reference to future codewords, since the end of a codeword is immediately recognizable. For example, the sequence 001100000001 is decoded as x3 x1 x4 x4 x2.
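
A minimal decoding sketch (added for illustration) for this prefix code: a source symbol is output as soon as a complete codeword has been received.

    # Sketch: instantaneous decoding of the prefix code of this example.
    code = {'1': 'x1', '01': 'x2', '001': 'x3', '000': 'x4'}

    def decode(bits):
        out, current = [], ''
        for b in bits:
            current += b
            if current in code:      # end of a codeword is immediately recognizable
                out.append(code[current])
                current = ''
        return out

    print(decode('001100000001'))    # ['x3', 'x1', 'x4', 'x4', 'x2']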


Example (2)

Example

The following code is not an instantaneous code but is uniquely decodable: C(x1) = 1, C(x2) = 10, C(x3) = 100, C(x4) = 000. Why? Here you have to wait until the next 1 is received to know where a codeword ends. Note that if we read the encoded sequence from right to left, the code becomes instantaneous.


Kraft-McMillan Inequality

For any uniquely decodable code C:

Σ_{w∈C} D^{−l(w)} ≤ 1

where w is a codeword in C, l(w) is its length, and D is the size of the alphabet (D = 2 for binary sequences).

Lemma: For any message set X with a probability mass function and an associated uniquely decodable code C,

H(X) ≤ L(C)

The proof uses the Kraft-McMillan inequality and Jensen's inequality for the concave logarithm function:

Σ_i p_i f(x_i) ≤ f(Σ_i p_i x_i)


Kraft-McMillan Inequality example

Figure: binary code tree for the codewords A = 1, B = 01, C = 001, D = 000; the branches below each codeword cannot be used for further codewords.

Here D = 2. There are 4 codewords A = 1, B = 01, C = 001, D = 000, with lengths l_i = 1, 2, 3 and 3, so Σ_{w∈C} D^{−l(w)} = 1. If all the branches are used, the code is complete and equality is obtained, as above. But if the tree is not complete, for example if "C" in the figure were not used in the code, then equality in the Kraft inequality cannot be achieved.
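
A small sketch (added for illustration) that evaluates the Kraft-McMillan sum for a set of binary codewords, using the example above.

    # Sketch: Kraft-McMillan sum for a binary code (D = 2).
    def kraft_sum(codewords, D=2):
        return sum(D ** -len(w) for w in codewords)

    complete = ['1', '01', '001', '000']    # the complete tree of the example
    incomplete = ['1', '01', '000']         # drop C = 001: the tree is not complete
    print(kraft_sum(complete))              # 1.0   (equality)
    print(kraft_sum(incomplete))            # 0.875 (strictly less than 1)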


Results

- The average code length is lower bounded by the source entropy.
- It can be shown that there is an entropy-based upper bound for an optimal prefix code:

L(C) ≤ H(X) + 1


Shannon-Fano algorithm(1)

A systematic method to design the code

- The input of the encoder is one of the q possible sequences of N symbols: m_i.
- The corresponding probabilities are p_1, p_2, ..., p_q.
- The encoder transforms m_i into a binary sequence c_i, trying to minimize the average number of output bits L(C).
- The average number of bits per symbol at the output can be calculated as L(C) = (1/N) Σ_{i=1}^{q} n_i p_i bits/symbol, where n_i is the number of bits in the coded sequence c_i.
- For a good encoder, L(C) should be very close to the input entropy G_N = (1/N) Σ_{i=1}^{q} p_i log2(1/p_i).


Shannon-Fano algorithm(2)

The idea is to assign shorter codes to the more probable messages; this is a variable-length code.

Theorem: If C is an optimal prefix code for the probabilities {p_1, p_2, ..., p_n}, then p_i > p_j implies that l(c_i) ≤ l(c_j).


Shannon-Fano algorithm(3)

1. Order the messages from the most probable to the least probable, from m_1 to m_q.
2. Put these messages in the first column of a table.
3. In the second column, put n_i such that log2(1/p_i) ≤ n_i < 1 + log2(1/p_i).
4. In the third column, write F_i = Σ_{k=1}^{i−1} p_k, with F_1 = 0.
5. In the fourth column, write the binary representation of F_i on n_i bits. This column directly gives the code corresponding to the message in the first column (a small sketch of these steps is given below).
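
The following Python sketch (added for illustration) carries out these table steps; the input probabilities must already be sorted in decreasing order, and the test message probabilities are those of the one-symbol messages of the earlier two-state Markov example.

    # Sketch of the Shannon-Fano table construction described above.
    import math

    def shannon_fano(probs):
        """probs: message probabilities, sorted in decreasing order."""
        codes = []
        F = 0.0                                   # cumulative probability F_i
        for p in probs:
            n = math.ceil(math.log2(1 / p))       # log2(1/p_i) <= n_i < 1 + log2(1/p_i)
            # binary representation of F_i on n bits
            word = ''.join(str(int(F * 2 ** (k + 1)) % 2) for k in range(n))
            codes.append(word)
            F += p
        return codes

    # One-symbol messages of the two-state example: p(A) = p(B) = 3/8, p(C) = 1/4
    print(shannon_fano([3/8, 3/8, 1/4]))          # ['00', '01', '11'], 2 bits each,
                                                  # so e1 = H/L(C) = 0.811/2 = 40.56 %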


Shannon-Fano algorithm properties

- This is a uniquely decodable prefix code.
- The more probable messages are coded with shorter codewords.
- The codewords are distinct.
- The average number of bits per symbol at the output satisfies G_N ≤ L(C) < G_N + 1/N.
- When N → ∞, L(C) → G_N and G_N → H, so L(C) → H.
- The code performance is quantified by the efficiency e = H/L(C).


Example

Figure: the two-state Markov source of the earlier example (self-loops with probability 3/4 emitting A and B, transitions between the states with probability 1/4 emitting C, P1 = P2 = 1/2).

For the above Markov source, give the code for

1. the messages of size one symbol,

2. the messages of size two symbols,

3. the messages of size three symbols.

Show that the code performance is:

e1 = 40.56%, e2 = 56.34% and e3 = 62.40%


Huffman algorithm

Huffman coding constructs the coding tree with a systematic method. Suppose that the messages are ordered by their probabilities, m_1 being the most probable and m_q the least probable.

We take the two least probable messages and sum their probabilities to obtain a new message, building the tree at the same time. Then we continue in the same way with the new set of messages.

Figure: the two least probable messages, with probabilities p_{q−1} and p_q, are merged into one node; the two branches are labelled 0 and 1.


Huffman algorithm example

Figure: Huffman tree construction for seven messages with probabilities a: 0.25, b: 0.2, c: 0.15, d: 0.12, e: 0.1, f: 0.1, g: 0.08. Repeatedly merging the two smallest probabilities (0.08 + 0.1 = 0.18, 0.1 + 0.12 = 0.22, 0.15 + 0.18 = 0.33, 0.2 + 0.22 = 0.42, 0.25 + 0.33 = 0.58, 0.42 + 0.58 = 1) and labelling the branches with 0 and 1 gives the code a: 01, b: 11, c: 001, d: 101, e: 100, f: 0001, g: 0000.
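
A compact sketch (added for illustration) of the same construction using a heap; ties between equal probabilities may be broken differently from the figure, so individual codewords can differ while the code remains optimal.

    # Sketch: Huffman coding with a heap.
    import heapq

    def huffman(probs):
        """probs: dict symbol -> probability. Returns dict symbol -> codeword."""
        heap = [(p, i, {s: ''}) for i, (s, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        counter = len(heap)                      # tie-breaker for equal probabilities
        while len(heap) > 1:
            p0, _, c0 = heapq.heappop(heap)      # the two least probable nodes
            p1, _, c1 = heapq.heappop(heap)
            merged = {s: '0' + w for s, w in c0.items()}
            merged.update({s: '1' + w for s, w in c1.items()})
            heapq.heappush(heap, (p0 + p1, counter, merged))
            counter += 1
        return heap[0][2]

    probs = {'a': 0.25, 'b': 0.2, 'c': 0.15, 'd': 0.12, 'e': 0.1, 'f': 0.1, 'g': 0.08}
    print(huffman(probs))   # an optimal code; average length 2.73 bits, the same as
                            # the code in the figure above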


Lempel-Ziv algorithm

- Huffman encoding is optimal (the average block length is minimal).
- Huffman coding requires the probability distribution of the source.
- In Huffman coding, the input sequence is fixed length but the output is variable length.
- In LZ coding, knowledge of the probabilities is not required.
- LZ coding therefore belongs to the class of universal source codes.
- In LZ coding, the input sequence can be variable length but the output is fixed length.


Dictionary construction

In this algorithm we construct a dictionary "on the fly"; that is why this is called dictionary coding. Suppose the binary message 10101101001001110101000011001110101100011011 is to be coded. There is no entry in our table yet. The encoder constructs the first entry of its table from the first letter of the sequence, 1, stored at location 1. The second letter, a 0, does not belong to the table, so it is added at location 2. The next letter is a 1, which is already in the table, so the encoder continues with the next letter. Now 10 is not in the table, so it is added with index 3.


The table

For the example given, the dictionary for the sequence 10101101001001110101000011001110101100011011 will be:

1, 0, 10, 11, 01, 00, 100, 111, 010, 1000, 011, 001, 110, 101, 10001, 1011

Note that each phrase is the concatenation of a previous phrase in the table with one new letter appended. To encode the sequence, each codeword is the position of that previous phrase in the dictionary, followed by the new letter. Initially, location 0000 is used to encode a phrase that has not appeared previously.
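
A short sketch (added for illustration) of this dictionary construction; it reproduces the phrase list above.

    # Sketch: LZ78-style dictionary construction ("parsing").
    def lz_parse(bits):
        phrases = []                 # the dictionary, in order of insertion
        current = ''
        for b in bits:
            current += b
            if current not in phrases:
                phrases.append(current)
                current = ''
        return phrases

    msg = '10101101001001110101000011001110101100011011'
    print(lz_parse(msg))
    # ['1', '0', '10', '11', '01', '00', '100', '111', '010', '1000',
    #  '011', '001', '110', '101', '10001', '1011']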


Encoding

Assuming codewords of length 5, the constructed dictionary together with the codewords is presented in the following table:

location    content   codeword
1  (0001)   1         00001
2  (0010)   0         00000
3  (0011)   10        00010
4  (0100)   11        00011
5  (0101)   01        00101
6  (0110)   00        00100
7  (0111)   100       00110
...         ...       ...
15 (1111)   10001     10101
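
The sketch below (added for illustration) builds these fixed-length codewords: each one is the 4-bit location of the prefix phrase (0000 if there is none) followed by the new bit.

    # Sketch: LZ encoding with fixed-length codewords (4-bit location + 1 new bit).
    def lz_encode(bits, index_bits=4):
        phrases, codewords = [], []
        current = ''
        for b in bits:
            current += b
            if current not in phrases:
                prefix = current[:-1]
                loc = phrases.index(prefix) + 1 if prefix else 0
                codewords.append(format(loc, '0{}b'.format(index_bits)) + b)
                phrases.append(current)
                current = ''
        return codewords

    msg = '10101101001001110101000011001110101100011011'
    print(lz_encode(msg)[:7])   # ['00001', '00000', '00010', '00011', '00101', '00100', '00110']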


Lempel-Ziv decoding

In decoding, the table must also be constructed "on the fly". For the previous sequence, the receiver must know that the codeword length is 5. It receives 00001, so the first entry of the table is 1; this is the decoded phrase, and the receiver stores 1 at the first location of its table. It then receives 00000, which says that the second entry of the table is a 0, so the decoded phrase is 0. The third codeword is 00010, which means that a 0 is appended to the phrase stored at location 0001: the decoded phrase is the phrase of the first row, 1, with a 0 at the end, i.e. 10, and so on.
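
A matching decoding sketch (added for illustration, reusing lz_encode from the previous sketch):

    # Sketch: decoding the fixed-length LZ codewords.
    def lz_decode(codewords, index_bits=4):
        phrases = []
        for cw in codewords:
            loc, new_bit = int(cw[:index_bits], 2), cw[index_bits:]
            phrase = (phrases[loc - 1] if loc else '') + new_bit
            phrases.append(phrase)
        return ''.join(phrases)

    msg = '10101101001001110101000011001110101100011011'
    print(lz_decode(lz_encode(msg)) == msg)   # True: the message is recovered exactly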
