Chapter 2: Source coding

Vahid Meghdadi

[email protected]
University of Limoges


Outline

Markov source
- Entropy of Markov Source

Source coding
- Definitions
- Shannon-Fano algorithm
- Huffman coding
- Lempel-Ziv coding


Markov model for information sources

Given the present, the future is independent of the past. This fact can be expressed as follows. For a given source producing X1, X2 and X3, we can always write:

P(X1, X2, X3) = P(X1)P(X2|X1)P(X3|X1, X2)

However, for a Markov source, the history of the source is saved in its present state:

P(Xk = sq|X1, X2, ..., Xk−1) = P(Xk = sq|Sk)

It means that the state contains all the past of the source.


Markov source modeling

- The source can be in one of n possible states.
- At each symbol generation, the source changes its state from i to j.
- This state change occurs with probability pij, which depends only on the initial state i and the final state j and remains constant over time.
- The generated symbol depends only on the initial state and the next state (i and j).
- The transition matrix P = [pij] is defined, where pij is the probability of a transition from state i to state j (a small simulation sketch is given below).
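
The following Python sketch (an illustration added here, not part of the original slides) simulates such a source for a hypothetical two-state example; the states, symbols and probabilities are placeholders that mirror the two-state example used later in these slides.

    # Illustrative sketch: simulating a Markov source.
    # Each entry of `transitions` is (next_state, emitted_symbol, probability).
    import random

    transitions = {
        1: [(1, 'A', 0.75), (2, 'C', 0.25)],
        2: [(2, 'B', 0.75), (1, 'C', 0.25)],
    }

    def generate(n_symbols, state=1, seed=0):
        rng = random.Random(seed)
        out = []
        for _ in range(n_symbols):
            next_states, symbols, probs = zip(*transitions[state])
            i = rng.choices(range(len(probs)), weights=probs)[0]
            out.append(symbols[i])
            state = next_states[i]
        return ''.join(out)

    print(generate(20))   # a 20-symbol message over {A, B, C}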


Markov source, example

- A symbol is sent out at each transition.
- What is the probability of message "A"?
- What is the probability of message "B"?
- What is the probability of message "AB"?
- Give the transition matrix.

Figure: a three-state Markov source (states 1, 2, 3); each transition is labelled with the symbol it emits (A, B or C) and its probability (1/4 or 1/2).


Entropy of a Markov source

Each state can be considered as a source, and for this source the entropy can be calculated. We then average these entropies over all the states to find the entropy of the whole source, i.e.:

H = Σ_{i=1}^{n} P_i H_i

where n is the number of states and

H_i = Σ_{j=1}^{n} p_ij log2(1/p_ij)

The information rate is then:

R = r_s H

where r_s is the symbol rate.
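
As a small numerical illustration (added here; the transition probabilities are those of the two-state example used on the following slides), H can be computed as follows.

    # Sketch: entropy of a two-state Markov source
    # (self-loop probability 3/4 in each state, 1/4 across, P1 = P2 = 1/2).
    import numpy as np

    P = np.array([[3/4, 1/4],
                  [1/4, 3/4]])            # transition matrix [p_ij]
    Pi = np.array([1/2, 1/2])             # stationary state probabilities P_i

    H_i = -(P * np.log2(P)).sum(axis=1)   # per-state entropies H_i
    H = Pi @ H_i                          # H = sum_i P_i H_i
    print(H_i, H)                         # [0.811 0.811] 0.811 bits/symbol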


Example

For the following example, calculate:

- the entropy of the source,
- the information per symbol contained in the messages of size 1, 2 and 3 symbols.
- Is it true that the information per symbol is smaller for longer messages?

Figure: two-state Markov source; each state has a self-loop with probability 3/4 (emitting A in state 1 and B in state 2) and a transition to the other state with probability 1/4 (emitting C); stationary probabilities P1 = P2 = 1/2.


Theorem

Consider the messages m_i of size N at the output of a Markov source. Since the information content of the sequence m_i is −log2 p(m_i), the mean information per symbol is:

G_N = −(1/N) Σ_i p(m_i) log2 p(m_i)

It can be shown that G_N is a monotone decreasing function of N, and:

lim_{N→∞} G_N = H bits/symbol
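
The sketch below (an added illustration; the transition structure is the two-state example assumed above) enumerates all messages of length N, computes their probabilities from the Markov model, and evaluates G_N.

    # Sketch: G_N for the two-state example (self-loops 3/4 emitting A/B,
    # cross transitions 1/4 emitting C, initial state probabilities 1/2 each).
    import math

    trans = {1: [('A', 1, 3/4), ('C', 2, 1/4)],
             2: [('B', 2, 3/4), ('C', 1, 1/4)]}
    start = {1: 1/2, 2: 1/2}

    def message_probs(N):
        """Probability of every length-N symbol sequence."""
        probs = {}
        def walk(state, msg, p):
            if len(msg) == N:
                probs[msg] = probs.get(msg, 0.0) + p
                return
            for sym, nxt, q in trans[state]:
                walk(nxt, msg + sym, p * q)
        for s, ps in start.items():
            walk(s, '', ps)
        return probs

    for N in (1, 2, 3):
        G = -sum(p * math.log2(p) for p in message_probs(N).values()) / N
        print(N, round(G, 3))   # G_N decreases with N toward H ≈ 0.811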


Compression

Application to compression:

- Suppose that a source is modeled by a Markov model.
- If we only gather statistics for sequences of one symbol, the correlation between consecutive symbols is not exploited.
- The information content therefore appears larger than it really is.
- If we consider sequences of n symbols, more of this correlation is taken into account, so the measured entropy decreases and gets closer to the real source entropy.

Conclusion: to compress better, we should code longer sequences.


Source coding

Two types of compression can be considered:

- Lossless coding: the information can be reconstructed exactly from the coded sequence (Shannon, Huffman, LZW, PKZIP, GIF, TIFF, ...).
- Lossy coding: information is lost in the coding process (JPEG, MPEG, wavelet, transform coding, sub-band coding, ...).

In this course we only consider the first one. We suppose furthermore that the sequences at the output of the encoder are binary.


Definitions(1)

Definition: A source code C for a random variable X is a mapping from X, the range of X, to D, the set of finite-length strings of symbols from a D-ary alphabet. C(x) denotes the codeword corresponding to x and l(x) denotes the length of C(x).

Example

If you toss a coin, X = {tail, head}, C(head) = 0, C(tail) = 11, l(head) = 1, l(tail) = 2.


Definitions(2)

Definition: The expected (average) length of a code C(x) for a random variable X with probability mass function p(x) is given by

L(C) = Σ_{x∈X} p(x) l(x)

For the coin example above, the expected length of the code is

L(C) = (1/2) · 2 + (1/2) · 1 = 1.5 bits


Definitions(3)

Definition: The code is singular if C(x1) = C(x2) for some x1 ≠ x2.

Definition: The extension of a code C is the code obtained as:

C(x1 x2 ... xn) = C(x1) C(x2) ... C(xn)

It means that a long message can be coded by concatenating the codewords of the shorter messages. For example, if C(x1) = 11 and C(x2) = 00, then C(x1x2) = 1100.


Definitions(4)

Definition: A code is uniquely decodable if its extension is non-singular.

In other words, any encoded string corresponds to only one possible source string; there is no ambiguity.

Definition: A code is called a prefix code or an instantaneous code if no codeword is a prefix of any other codeword.


Example (1)

Example

The following code is a prefix code: C(x1) = 1, C(x2) = 01, C(x3) = 001, C(x4) = 000. Any encoded sequence is uniquely decodable, and the corresponding source word can be obtained as soon as the codeword is received. In other words, an instantaneous code can be decoded without reference to future codewords, since the end of a codeword is immediately recognizable. For example, the sequence 001100000001 is decoded as x3 x1 x4 x4 x2.
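
A minimal decoding sketch (added for illustration) for this prefix code: a source symbol is output as soon as a complete codeword has been received.

    # Sketch: instantaneous decoding of the prefix code of this example.
    code = {'1': 'x1', '01': 'x2', '001': 'x3', '000': 'x4'}

    def decode(bits):
        out, current = [], ''
        for b in bits:
            current += b
            if current in code:      # end of a codeword is immediately recognizable
                out.append(code[current])
                current = ''
        return out

    print(decode('001100000001'))    # ['x3', 'x1', 'x4', 'x4', 'x2']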


Example (2)

Example

The following code is not an instantaneous code but is uniquely decodable: C(x1) = 1, C(x2) = 10, C(x3) = 100, C(x4) = 000. Why? Here you have to wait until the next 1 is received to know where a codeword ends. Note that if we read the encoded sequence from right to left, the code becomes instantaneous.


Kraft-McMillan Inequality

For any uniquely decodable code C:

Σ_{w∈C} D^{−l(w)} ≤ 1

where w is a codeword in C, l(w) is its length, and D is the size of the alphabet (D = 2 for binary sequences).

Lemma: For any message set X with a probability mass function and an associated uniquely decodable code C,

H(X) ≤ L(C)

The proof uses the Kraft-McMillan inequality and Jensen's inequality for the concave logarithm function:

Σ_i p_i f(x_i) ≤ f(Σ_i p_i x_i)


Kraft-McMillan Inequality example

Figure: binary code tree for the codewords A = 1, B = 01, C = 001, D = 000; the branches below each codeword cannot be used for further codewords.

Here D = 2. There are 4 codewords A = 1, B = 01, C = 001, D = 000, with lengths l_i = 1, 2, 3 and 3, so Σ_{w∈C} D^{−l(w)} = 1. If all the branches are used, the code is complete and equality is obtained, as above. But if the tree is not complete, for example if "C" in the figure were not used in the code, then equality in the Kraft inequality cannot be achieved.
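
A small sketch (added for illustration) that evaluates the Kraft-McMillan sum for a set of binary codewords, using the example above.

    # Sketch: Kraft-McMillan sum for a binary code (D = 2).
    def kraft_sum(codewords, D=2):
        return sum(D ** -len(w) for w in codewords)

    complete = ['1', '01', '001', '000']    # the complete tree of the example
    incomplete = ['1', '01', '000']         # drop C = 001: the tree is not complete
    print(kraft_sum(complete))              # 1.0   (equality)
    print(kraft_sum(incomplete))            # 0.875 (strictly less than 1)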


Results

- The average code length is lower bounded by the source entropy.
- It can be shown that there is an entropy-based upper bound for an optimal prefix code:

L(C) ≤ H(X) + 1


Shannon-Fano algorithm(1)

A systematic method to design the code

- The input of the encoder is one of the q possible sequences of N symbols: m_i.
- The corresponding probabilities are p_1, p_2, ..., p_q.
- The encoder transforms m_i into a binary sequence c_i, trying to minimize the average number of output bits L(C).
- The average number of bits per symbol at the output can be calculated as L(C) = (1/N) Σ_{i=1}^{q} n_i p_i bits/symbol, where n_i is the number of bits in the coded sequence c_i.
- For a good encoder, L(C) should be very close to the input entropy G_N = (1/N) Σ_{i=1}^{q} p_i log2(1/p_i).


Shannon-Fano algorithm(2)

The idea is to assign shorter codes to the more probable messages; this is a variable-length code.

Theorem: If C is an optimal prefix code for the probabilities {p_1, p_2, ..., p_n}, then p_i > p_j implies that l(c_i) ≤ l(c_j).


Shannon-Fano algorithm(3)

1. Order the messages from the most probable to the least probable, from m_1 to m_q.
2. Put these messages in the first column of a table.
3. In the second column, put n_i such that log2(1/p_i) ≤ n_i < 1 + log2(1/p_i).
4. In the third column, write F_i = Σ_{k=1}^{i−1} p_k, with F_1 = 0.
5. In the fourth column, write the binary representation of F_i on n_i bits. This column directly gives the code corresponding to the message in the first column (a small sketch of these steps is given below).
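
The following Python sketch (added for illustration) carries out these table steps; the input probabilities must already be sorted in decreasing order, and the test message probabilities are those of the one-symbol messages of the earlier two-state Markov example.

    # Sketch of the Shannon-Fano table construction described above.
    import math

    def shannon_fano(probs):
        """probs: message probabilities, sorted in decreasing order."""
        codes = []
        F = 0.0                                   # cumulative probability F_i
        for p in probs:
            n = math.ceil(math.log2(1 / p))       # log2(1/p_i) <= n_i < 1 + log2(1/p_i)
            # binary representation of F_i on n bits
            word = ''.join(str(int(F * 2 ** (k + 1)) % 2) for k in range(n))
            codes.append(word)
            F += p
        return codes

    # One-symbol messages of the two-state example: p(A) = p(B) = 3/8, p(C) = 1/4
    print(shannon_fano([3/8, 3/8, 1/4]))          # ['00', '01', '11'], 2 bits each,
                                                  # so e1 = H/L(C) = 0.811/2 = 40.56 %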


Shannon-Fano algorithm properties

- This is a uniquely decodable prefix code.
- The more probable messages are coded with shorter codewords.
- The codewords are distinct.
- The average number of bits per symbol at the output satisfies G_N ≤ L(C) < G_N + 1/N.
- When N → ∞, L(C) → G_N and G_N → H, so L(C) → H.
- The code performance is quantified by the efficiency e = H/L(C).


Example

Figure: the two-state Markov source of the earlier example (self-loops with probability 3/4 emitting A and B, transitions between the states with probability 1/4 emitting C, P1 = P2 = 1/2).

For the above Markov source, give the code for

1. the messages of size one symbol,

2. the messages of size two symbols,

3. the messages of size three symbols.

Show that the code performance is:

e1 = 40.56%, e2 = 56.34% and e3 = 62.40%


Huffman algorithm

Huffman coding constructs the coding tree with a systematic method. Suppose that the messages are ordered by their probabilities, m_1 being the most probable and m_q the least probable.

We take the two least probable messages and sum their probabilities to obtain a new message, building the tree at the same time. Then we continue in the same way with the new set of messages.

Figure: the two least probable messages, with probabilities p_{q−1} and p_q, are merged into one node; the two branches are labelled 0 and 1.


Huffman algorithm example

Figure: Huffman tree construction for seven messages with probabilities a: 0.25, b: 0.2, c: 0.15, d: 0.12, e: 0.1, f: 0.1, g: 0.08. Repeatedly merging the two smallest probabilities (0.08 + 0.1 = 0.18, 0.1 + 0.12 = 0.22, 0.15 + 0.18 = 0.33, 0.2 + 0.22 = 0.42, 0.25 + 0.33 = 0.58, 0.42 + 0.58 = 1) and labelling the branches with 0 and 1 gives the code a: 01, b: 11, c: 001, d: 101, e: 100, f: 0001, g: 0000.
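
A compact sketch (added for illustration) of the same construction using a heap; ties between equal probabilities may be broken differently from the figure, so individual codewords can differ while the code remains optimal.

    # Sketch: Huffman coding with a heap.
    import heapq

    def huffman(probs):
        """probs: dict symbol -> probability. Returns dict symbol -> codeword."""
        heap = [(p, i, {s: ''}) for i, (s, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        counter = len(heap)                      # tie-breaker for equal probabilities
        while len(heap) > 1:
            p0, _, c0 = heapq.heappop(heap)      # the two least probable nodes
            p1, _, c1 = heapq.heappop(heap)
            merged = {s: '0' + w for s, w in c0.items()}
            merged.update({s: '1' + w for s, w in c1.items()})
            heapq.heappush(heap, (p0 + p1, counter, merged))
            counter += 1
        return heap[0][2]

    probs = {'a': 0.25, 'b': 0.2, 'c': 0.15, 'd': 0.12, 'e': 0.1, 'f': 0.1, 'g': 0.08}
    print(huffman(probs))   # an optimal code; average length 2.73 bits, the same as
                            # the code in the figure above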


Lempel-Ziv algorithm

- Huffman encoding is optimal (the average block length is minimal).
- Huffman coding requires the probability distribution of the source.
- In Huffman coding, the input sequence is fixed length but the output is variable length.
- In LZ coding, knowledge of the probabilities is not required.
- LZ coding therefore belongs to the class of universal source codes.
- In LZ coding, the input sequence can be variable length but the output is fixed length.


Dictionary construction

In this algorithm we construct a dictionary "on the fly"; that is why this is called dictionary coding. Suppose the binary message 10101101001001110101000011001110101100011011 is to be coded. There is no entry in our table yet. The encoder constructs the first entry of its table from the first letter of the sequence, 1, stored at location 1. The second letter, a 0, does not belong to the table, so it is added at location 2. The next letter is a 1, which is already in the table, so the encoder continues with the next letter. Now 10 is not in the table, so it is added with index 3.


The table

For the example given, the dictionary for the sequence 10101101001001110101000011001110101100011011 will be:

1, 0, 10, 11, 01, 00, 100, 111, 010, 1000, 011, 001, 110, 101, 10001, 1011

Note that each phrase is the concatenation of a previous phrase in the table with one new letter appended. To encode the sequence, each codeword is the position of that previous phrase in the dictionary, followed by the new letter. Initially, location 0000 is used to encode a phrase that has not appeared previously.
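
A short sketch (added for illustration) of this dictionary construction; it reproduces the phrase list above.

    # Sketch: LZ78-style dictionary construction ("parsing").
    def lz_parse(bits):
        phrases = []                 # the dictionary, in order of insertion
        current = ''
        for b in bits:
            current += b
            if current not in phrases:
                phrases.append(current)
                current = ''
        return phrases

    msg = '10101101001001110101000011001110101100011011'
    print(lz_parse(msg))
    # ['1', '0', '10', '11', '01', '00', '100', '111', '010', '1000',
    #  '011', '001', '110', '101', '10001', '1011']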


Encoding

Assuming codewords of length 5, the constructed dictionary together with the codewords is presented in the following table:

location    content   codeword
1  (0001)   1         00001
2  (0010)   0         00000
3  (0011)   10        00010
4  (0100)   11        00011
5  (0101)   01        00101
6  (0110)   00        00100
7  (0111)   100       00110
...         ...       ...
15 (1111)   10001     10101
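
The sketch below (added for illustration) builds these fixed-length codewords: each one is the 4-bit location of the prefix phrase (0000 if there is none) followed by the new bit.

    # Sketch: LZ encoding with fixed-length codewords (4-bit location + 1 new bit).
    def lz_encode(bits, index_bits=4):
        phrases, codewords = [], []
        current = ''
        for b in bits:
            current += b
            if current not in phrases:
                prefix = current[:-1]
                loc = phrases.index(prefix) + 1 if prefix else 0
                codewords.append(format(loc, '0{}b'.format(index_bits)) + b)
                phrases.append(current)
                current = ''
        return codewords

    msg = '10101101001001110101000011001110101100011011'
    print(lz_encode(msg)[:7])   # ['00001', '00000', '00010', '00011', '00101', '00100', '00110']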


Lempel-Ziv decoding

In decoding, the table must also be constructed "on the fly". For the previous sequence, the receiver must know that the codeword length is 5. It receives 00001, so the first entry of the table is 1; this is the decoded phrase, and the receiver stores 1 at the first location of its table. It then receives 00000, which says that the second entry of the table is a 0, so the decoded phrase is 0. The third codeword is 00010, which means that a 0 is appended to the phrase stored at location 0001: the decoded phrase is the phrase of the first row, 1, with a 0 at the end, i.e. 10, and so on.
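
A matching decoding sketch (added for illustration, reusing lz_encode from the previous sketch):

    # Sketch: decoding the fixed-length LZ codewords.
    def lz_decode(codewords, index_bits=4):
        phrases = []
        for cw in codewords:
            loc, new_bit = int(cw[:index_bits], 2), cw[index_bits:]
            phrase = (phrases[loc - 1] if loc else '') + new_bit
            phrases.append(phrase)
        return ''.join(phrases)

    msg = '10101101001001110101000011001110101100011011'
    print(lz_decode(lz_encode(msg)) == msg)   # True: the message is recovered exactly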
