Motivations for arithmetic coding:
1) The Huffman coding algorithm generates prefix codes with minimum
average codeword length. But this length is usually strictly
greater than H(X1).
2) To improve the coding efficiency, one can use a block memoryless
code by working with the extended alphabet X^n. But the computational
complexity grows exponentially as n increases.
Thus for small n, Huffman coding is inefficient. On the other hand,
for large n, it is impractical due to its exponential coding complexity.
Solution: Arithmetic coding is one of the algorithms that can address
the above issue. It can achieve the entropy rate of a stationary
source with a linear coding complexity.
Shannon-Fano-Elias Codes
Let (X1, ···, Xn) be a random vector with joint pmf p(u1, u2, ···, un),
ui ∈ X = {x0, ···, xJ-1}. We partition the interval [0,1] into disjoint sub-intervals
I(u1, u2, ···, un), u1u2···un ∈ X^n, such that the following properties hold:
1) The length of the interval I(u1, u2, ···, un) is equal to p(u1, u2, ···, un).
2) The intervals cover [0,1]:
$$\bigcup_{u_1\cdots u_n \in X^n} I(u_1, \cdots, u_n) = [0, 1]$$
3) The intervals I(u1, u2, ···, un) are arranged according to the natural
lexicographic order on the sequences u1, u2, ···, un.
[Figure: for n = 1, [0,1] is partitioned, left to right, into I(x0), I(x1), ···, I(xJ-1);
for n = 2, each I(xi) is further partitioned into I(xix0), I(xix1), ···, I(xixJ-1).]
Shannon-Fano-Elias Codes (continued)
I(x0x0···x0x0) = [0, p(x0x0···x0x0)]
I(x0x0···x0x1) = [p(x0x0···x0x0), p(x0x0···x0x0) + p(x0x0···x0x1)]
⋮
I(xJ-1xJ-1···xJ-1) = [1 − p(xJ-1xJ-1···xJ-1), 1]
To get the codeword corresponding to u1u2···un, let
I(u1u2···un) = [a, b].
Represent the midpoint (a+b)/2 by its binary expansion
$$\frac{a+b}{2} = 0.B_1B_2\cdots B_L\cdots = \sum_{i=1}^{\infty} B_i 2^{-i}, \quad B_i \in \{0, 1\}.$$
Let $L = \lceil -\log p(u_1\cdots u_n) \rceil + 1 = \lceil -\log(b-a) \rceil + 1$.
The binary sequence B1B2···BL is the codeword of u1u2···un. The length of
the codeword assigned to u1u2···un is equal to $\lceil -\log p(u_1\cdots u_n) \rceil + 1$.
Shannon-Fano-Elias Codes: Decoding
Let
$$\Big\lfloor \frac{a+b}{2} \Big\rfloor_L = 0.B_1B_2\cdots B_L$$
be the real number obtained by rounding off (a+b)/2 to its first L bits.
We can prove that $\lfloor (a+b)/2 \rfloor_L$ is inside the interval [a, b]:
$$\frac{a+b}{2} - \Big\lfloor \frac{a+b}{2} \Big\rfloor_L
= 0.\underbrace{0\cdots 0}_{L}B_{L+1}B_{L+2}\cdots
= \sum_{i=L+1}^{\infty} B_i 2^{-i}
\le 2^{-L} = 2^{-\lceil -\log p(u_1\cdots u_n) \rceil - 1}
\le \frac{p(u_1\cdots u_n)}{2} = \frac{b-a}{2}.$$
Furthermore,
$$\Big[\Big\lfloor \frac{a+b}{2} \Big\rfloor_L,\ \Big\lfloor \frac{a+b}{2} \Big\rfloor_L + 2^{-L}\Big] \subset [a, b].$$
After receiving the codeword B1B2···BL, the decoder searches through all
u1u2···un ∈ X^n until the unique u1u2···un is found for which I(u1u2···un)
contains $0.B_1B_2\cdots B_L = \lfloor (a+b)/2 \rfloor_L$, and then decodes
B1B2···BL as the unique u1u2···un.
Shannon-Fano-Elias Codes: Example

x     p(x)    I(x)            L(x) = ⌈−log p(x)⌉ + 1   Midpoint (binary)   C(x)
x0    0.25    [0, 0.25]       3                         0.001···            001
x1    0.5     [0.25, 0.75]    2                         0.10···             10
x2    0.125   [0.75, 0.875]   4                         0.1101···           1101
x3    0.125   [0.875, 1]      4                         0.1111···           1111

The Shannon-Fano-Elias code is a prefix code.
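The construction above is easy to check numerically. Below is a minimal Python sketch (the function name `sfe_code` and the pmf-as-ordered-list interface are our own illustrative choices, not from the notes) that reproduces the table:

```python
from math import ceil, log2

def sfe_code(pmf):
    """Shannon-Fano-Elias code for a pmf given as an ordered list of
    (symbol, probability) pairs; intervals are stacked in the listed order."""
    codes = {}
    a = 0.0
    for sym, p in pmf:
        b = a + p                        # I(sym) = [a, b], length p
        L = ceil(-log2(p)) + 1           # codeword length
        x = (a + b) / 2                  # midpoint of the interval
        bits = ""
        for _ in range(L):               # first L bits of the binary expansion
            x *= 2
            bits += str(int(x))
            x -= int(x)
        codes[sym] = bits
        a = b
    return codes

print(sfe_code([("x0", 0.25), ("x1", 0.5), ("x2", 0.125), ("x3", 0.125)]))
# {'x0': '001', 'x1': '10', 'x2': '1101', 'x3': '1111'}
```

All probabilities in the example are dyadic, so the floating-point midpoint expansion here is exact; for general pmfs an exact rational representation would be safer.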
Arithmetic Coding
• The encoding complexity of the Shannon-Fano-Elias coding
algorithm mainly lies in the process of determining the interval
I(u1u2···un).
• Similarly, given B1B2 ···BL, the decoding complexity of the Shannon-
Fano-Elias coding algorithm mainly lies in the process of finding the
unique interval I(u1u2···un) such that the point 0.B1B2 ···BL is in
I(u1u2···un).
• In arithmetic coding, both of the processes can be realized
sequentially with linear complexity.
• The idea of arithmetic coding originated with Elias and was later made
practical by Rissanen, Pasco, Moffat and Witten.
Arithmetic Coding (Continued)
1) To determine the interval I(u1u2···un), we decompose the joint
probability p(u1u2···un) as
p(u1u2···un) = p(u1 ) p(u2|u1 ) p(u3|u1u2) ···p(un|u1···un-1)
We then construct a sequence of nested intervals
$$I(u_1) \supset I(u_1u_2) \supset \cdots \supset I(u_1u_2\cdots u_n)$$
2) Partition the interval [0, 1] into disjoint subintervals I(xj),
0≤j≤J-1, as shown below.
[Figure: [0, 1] partitioned, left to right, into I(x0), I(x1), ···, I(xJ-1).]
The length of the interval I(xj) is equal to p(xj). Then I(u1)= I(xj) if
u1=xj.
3) If I(u1u2···ui) = [ai, bi], we then partition [ai, bi] into disjoint sub-intervals
I(u1···uixj), 0≤j≤J-1, according to the conditional pmf p(xj|u1···ui),
as shown below.
[Figure: [ai, bi] partitioned, left to right, into I(u1···uix0), ···, I(u1···uixJ-1).]
Arithmetic Coding (Continued)
The length of the interval I(u1···uixj) is equal to
p(u1···uixj) = p(u1···ui) p(xj|u1···ui) = (length of [ai, bi]) × p(xj|u1···ui).
Then I(u1···uiui+1) = I(u1···uixj) if ui+1 = xj.
4) Repeat step 3) until the interval I(u1···un) is determined. The last interval
I(u1···un) is the desired interval.
5) To get the codeword corresponding to u1···un, we apply the same
procedure as in the Shannon-Fano-Elias coding. Let
I(u1u2···un) = [a, b].
Let $L = \lceil -\log p(u_1\cdots u_n) \rceil + 1$. Rounding off the midpoint (a+b)/2 to the first L
bits, we get
$$0.B_1B_2\cdots B_L = \Big\lfloor \frac{a+b}{2} \Big\rfloor_L$$
The sequence B1B2···BL is the codeword corresponding to u1···un.
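Steps 1)–5) can be sketched in Python; `Fraction` keeps the interval endpoints exact. The function name and the callback interface for the conditional pmf are illustrative choices, not part of the original description:

```python
from fractions import Fraction
from math import ceil, log2

def arithmetic_encode(seq, cond_pmf, alphabet):
    """Sequential arithmetic encoder.  cond_pmf(prefix) returns a dict
    mapping each symbol to its conditional probability as a Fraction."""
    a, b = Fraction(0), Fraction(1)
    for i, u in enumerate(seq):
        pmf = cond_pmf(seq[:i])
        width = b - a
        lo = a
        for x in alphabet:               # sub-intervals in alphabet order
            w = width * pmf[x]
            if x == u:
                a, b = lo, lo + w        # descend into I(prefix + u)
                break
            lo += w
    L = ceil(-log2(b - a)) + 1           # codeword length
    mid = (a + b) / 2
    bits = ""
    for _ in range(L):                   # first L bits of the midpoint
        mid *= 2
        bit = int(mid)
        bits += str(bit)
        mid -= bit
    return bits
```

For the memoryless pmf p(0) = 2/5, p(1) = 3/5 used in the worked example later in these notes, `arithmetic_encode("10110", lambda _: {"0": Fraction(2, 5), "1": Fraction(3, 5)}, "01")` returns `"100100"`.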
Arithmetic Coding (Decoding)
The decoding process can be realized sequentially.
1) Partition [0, 1) into disjoint sub-intervals I(xj), 0≤j≤J-1. If
0.B1B2···BL ∈ I(xj), set u1 = xj.
2) Having decoded u1u2···ui, we then partition I(u1u2···ui) into
disjoint subintervals I(u1u2···uixj), 0≤j≤J-1. If 0.B1B2···BL ∈
I(u1u2···uixj), then set ui+1 = xj.
3) Repeat step 2) until the sequence u1u2···un is decoded.
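The three decoding steps can likewise be sketched in Python (the function name and the conditional-pmf callback are illustrative choices; the known sequence length n is passed explicitly, as assumed in these notes):

```python
from fractions import Fraction

def arithmetic_decode(bits, n, cond_pmf, alphabet):
    """Sequential arithmetic decoder; n is the known sequence length
    and bits is the received codeword B1...BL."""
    point = Fraction(int(bits, 2), 2 ** len(bits))   # 0.B1B2...BL
    a, b = Fraction(0), Fraction(1)
    out = ""
    for _ in range(n):
        pmf = cond_pmf(out)
        width = b - a
        lo = a
        for x in alphabet:          # find the sub-interval containing the point
            w = width * pmf[x]
            if lo <= point < lo + w:
                out += x
                a, b = lo, lo + w
                break
            lo += w
    return out
```

With the memoryless pmf p(0) = 2/5, p(1) = 3/5, `arithmetic_decode("100100", 5, lambda _: {"0": Fraction(2, 5), "1": Fraction(3, 5)}, "01")` recovers `"10110"`.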
Arithmetic coding
1) In arithmetic coding, the length n of the sequence
u1u2···un to be compressed is assumed to be known to
both the encoder and the decoder.
2) The length of the codeword assigned to u1u2···un is
$$L = \lceil -\log p(u_1u_2\cdots u_n) \rceil + 1$$
Thus the average codeword length in bits/symbol
converges to the entropy rate of a stationary source as n
approaches infinity.
Arithmetic Coding (Example)
Let {Xi} be a discrete memoryless source with common pmf
p(0) = 2/5, p(1) = 3/5 and alphabet X = {0, 1}.
Let u1u2···u5=10110. We have
I(1)=[2/5, 1]
I(10)=[2/5, 16/25]
I(101)=[62/125, 16/25]
I(1011)=[346/625, 16/25]
I(10110)=[346/625, 1838/3125]
The length of I(10110) is 108/3125.
$$\Rightarrow L = \Big\lceil -\log\frac{108}{3125} \Big\rceil + 1 = 6$$
Midpoint = 1784/3125 = 0.100100··· (binary),
and the codeword = 100100.
Arithmetic Coding (Another Example)
Let the message to be encoded be x0x1x2x2x3
Source symbol Probability Initial Subinterval
x0 0.2 [0.0, 0.2)
x1 0.2 [0.2, 0.4)
x2 0.4 [0.4, 0.8)
x3 0.2 [0.8, 1.0]
Encoding sequence: x0x1x2x2x3. The successive intervals are:
After x0: [0, 0.2)
After x1: [0.04, 0.08)
After x2: [0.056, 0.072)
After x2: [0.0624, 0.0688)
After x3: [0.06752, 0.0688)
From the final interval [0.06752, 0.0688), we can get the
codeword length L and the corresponding codeword.
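The interval chain in this example can be checked with a few lines of exact arithmetic (the pmf values are converted to fractions; this is just a verification script, not part of the original notes):

```python
from fractions import Fraction

# pmf of the example: p(x0) = p(x1) = p(x3) = 0.2, p(x2) = 0.4,
# with sub-intervals stacked in the order x0, x1, x2, x3
pmf = [("x0", Fraction(1, 5)), ("x1", Fraction(1, 5)),
       ("x2", Fraction(2, 5)), ("x3", Fraction(1, 5))]

a, b = Fraction(0), Fraction(1)
for u in ["x0", "x1", "x2", "x2", "x3"]:
    width = b - a
    lo = a
    for x, p in pmf:                 # locate the sub-interval of symbol u
        if x == u:
            a, b = lo, lo + width * p
            break
        lo += width * p
    print(u, float(a), float(b))
```

The last line printed corresponds to the final interval [0.06752, 0.0688).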
Adaptive Arithmetic Coding
In the above description of arithmetic coding, we assume that both the
encoder and decoder know in advance the joint pmf of the random vector
(X1, X2, ···Xn).
In practice, the pmf is often unknown and has to be estimated online or
offline.
For simplicity, let X = {0, 1}. The initial pmf is uniform, i.e.,
p(0) = p(1) = 1/2
After u1u2···ui is processed, the conditional pmf given u1u2···ui is given by
$$p(1|u_1u_2\cdots u_i) = \frac{(\text{number of 1s in } u_1u_2\cdots u_i) + 1}{i + 2}$$
$$p(0|u_1u_2\cdots u_i) = \frac{(\text{number of 0s in } u_1u_2\cdots u_i) + 1}{i + 2}$$
Let u1u2···u8 = 11001010. Then according to the above,
$$p(u_1u_2\cdots u_8) = p(11001010) = \frac{1}{2}\cdot\frac{2}{3}\cdot\frac{1}{4}\cdot\frac{2}{5}\cdot\frac{3}{6}\cdot\frac{3}{7}\cdot\frac{4}{8}\cdot\frac{4}{9}$$
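This sequential product is easy to verify in Python (the helper name `laplace_prob` is our own; the rule is the add-one estimator from the formulas above):

```python
from fractions import Fraction

def laplace_prob(seq):
    """Probability of a binary string under the sequential add-one rule
    p(1 | prefix) = (#1s in prefix + 1) / (len(prefix) + 2)."""
    p = Fraction(1)
    ones = zeros = 0
    for c in seq:
        if c == "1":
            p *= Fraction(ones + 1, ones + zeros + 2)
            ones += 1
        else:
            p *= Fraction(zeros + 1, ones + zeros + 2)
            zeros += 1
    return p

print(laplace_prob("11001010"))   # 1/630
```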
Adaptive Arithmetic Coding (continued)
Another choice for the conditional pmf given u1u2···ui is as follows:
$$p(1|u_1u_2\cdots u_i) = \frac{(\text{number of 1s in } u_1u_2\cdots u_i) + 1/2}{i + 1}$$
$$p(0|u_1u_2\cdots u_i) = \frac{(\text{number of 0s in } u_1u_2\cdots u_i) + 1/2}{i + 1}$$
With this choice,
$$p(11001010) = \frac{1}{2}\cdot\frac{3/2}{2}\cdot\frac{1/2}{3}\cdot\frac{3/2}{4}\cdot\frac{5/2}{5}\cdot\frac{5/2}{6}\cdot\frac{7/2}{7}\cdot\frac{7/2}{8}$$
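The second rule (the Krichevsky–Trofimov estimator; that name is standard in the literature, though the notes do not use it) can be checked the same way:

```python
from fractions import Fraction

def kt_prob(seq):
    """Probability of a binary string under the sequential rule
    p(1 | prefix) = (#1s in prefix + 1/2) / (len(prefix) + 1)."""
    p = Fraction(1)
    ones = zeros = 0
    for c in seq:
        # (count + 1/2)/(i + 1) written as (2*count + 1)/(2*(i + 1))
        if c == "1":
            p *= Fraction(2 * ones + 1, 2 * (ones + zeros + 1))
            ones += 1
        else:
            p *= Fraction(2 * zeros + 1, 2 * (ones + zeros + 1))
            zeros += 1
    return p

print(kt_prob("11001010"))   # 35/32768
```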
Lempel-Ziv Algorithm
• Adaptive arithmetic coding, presented at the end of the last section,
is universal: it does not require knowledge of the source statistics, yet
it can achieve the ultimate compression rate of any discrete memoryless
source.
• Lempel-Ziv is another universal source coding algorithm
developed by Ziv and Lempel.
• One Lempel-Ziv algorithm is LZ77, known as the sliding-window
Lempel-Ziv algorithm, which was published in 1977.
• One year later, they proposed a variant of LZ77, the incremental
parsing Lempel-Ziv algorithm, i.e., LZ78.
• In this course we will look at LZ78.
Lempel-Ziv parsing
• LZ78 adopts an incremental parsing procedure, which parses the source
sequence u1u2···un into non-overlapping variable-length blocks.
• The first substring in the incremental parsing of u1u2···un is u1. Each
subsequent substring is the shortest phrase of the remaining sequence that
has not appeared so far in the parsing.
• Assume that $u_1,\ u_2\cdots u_{n_2},\ u_{n_2+1}\cdots u_{n_3},\ \cdots,\ u_{n_{i-1}+1}\cdots u_{n_i}$ are the substrings created
so far in the parsing process. The next substring, denoted $u_{n_i+1}\cdots u_{n_{i+1}}$,
is the shortest phrase of $u_{n_i+1}\cdots u_n$ that has not appeared in
$\{u_1,\ u_2\cdots u_{n_2},\ \cdots,\ u_{n_{i-1}+1}\cdots u_{n_i}\}$, if such a phrase exists.
• Otherwise $u_{n_i+1}\cdots u_{n_{i+1}} = u_{n_i+1}\cdots u_n$ with $n_{i+1} = n$, and the incremental parsing
procedure terminates.
Lempel-Ziv parsing: Examples

Example 1: For the source sequence
10101110011100111000111001
the incremental parsing procedure yields the following partition:
1, 0, 10, 11, 100, 111, 00, 1110, 001, 110, 01

Example 2: For the source sequence
110110001101
the parsing yields:
1, 10, 11, 0, 00, 110, 1
In this example, the last substring 1 has already appeared.
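The incremental parsing rule can be sketched directly in Python (a set of previously seen phrases suffices, since each candidate phrase is grown one symbol at a time; the function name is our own):

```python
def lz78_parse(seq):
    """Incremental (LZ78) parsing: each phrase is the shortest prefix of
    the remaining input that has not appeared as an earlier phrase; the
    final phrase may repeat an earlier one if the input runs out."""
    phrases, seen = [], set()
    i = 0
    while i < len(seq):
        j = i + 1
        while seq[i:j] in seen and j < len(seq):
            j += 1                      # grow until the phrase is new
        phrases.append(seq[i:j])
        seen.add(seq[i:j])
        i = j
    return phrases

print(lz78_parse("110110001101"))
# ['1', '10', '11', '0', '00', '110', '1']
```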
Lempel-Ziv parsing
• The concatenation of all phrases is equal to the original source sequence.
• All phrases are distinct, except that the last phrase could be equal to one of
the preceding ones. In Example 2, the last phrase is equal to the first one. All
phrases except the last one are distinct.
• Let Λ denote the empty string. Think of Λ as an initial phrase before the first
phrase in the incremental parsing. Each new phrase in the parsing is the
concatenation of a previous phrase with a new output letter from the source
sequence.
For example, the first phrase 1 is the concatenation of the empty string Λ with
the new symbol 1. Similarly, the phrase 110 is the concatenation of the
phrase 11 with the new symbol 0.
Lempel-Ziv Encoding
Let X={x0,··· xJ-1}. The Lempel-Ziv encoding of the sequence u1u2···un can be
implemented sequentially as follows.
1. The first phrase u1 is uniquely determined by (0, u1) where the index 0 is
corresponding to the initial empty phrase . Represent the pair (0,u1)
by the integer 0xJ+index(u1) where the index(u1)=j if u1=xj, 0≤j ≤J-1.
Encode the first phrase into the binary representation of the integer
0xJ+index(u1) = index(u1) padded with possible zeros on the left to
ensure that the total length of the codeword is
2. Having determined the ith phrase, we know that the ith phrase is equal
to the concatenation of the mth phrase with a new symbol xj for some
0≤m≤i-1 and 0≤j ≤J-1. Represent the ith phrase into the binary
representation of the integer mxJ+j padded with some possible zeros on
the left to ensure that the total of the codeword is
3. Repeat step 2 until all phrases are encoded.
Λ
log J
log iJ
Lempel-Ziv Encoding: Example
Partitioned phrases: 1 10 11 0 00 110 1
X = {0, 1}, J = 2.

Phrase   (m, j)   Codeword   Length
1        (0, 1)   1          1
10       (1, 0)   10         2
11       (1, 1)   011        3
0        (0, 0)   000        3
00       (4, 0)   1000       4
110      (3, 0)   0110       4
1        (0, 1)   0001       4

So the Lempel-Ziv coding transforms the original source sequence
1 10 11 0 00 110 1
into
1 10 011 000 1000 0110 0001
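The encoding steps can be sketched as follows (illustrative function name; phrase i is sent in ⌈log(iJ)⌉ bits as described above):

```python
from math import ceil, log2

def lz78_encode(phrases, alphabet):
    """Encode an LZ78 phrase list: the ith phrase (i starting at 1) equals
    a previous phrase (index m, empty phrase = 0) plus one symbol x_j, and
    is sent as the integer m*J + j in ceil(log2(i*J)) bits."""
    J = len(alphabet)
    index = {"": 0}                    # phrase -> index; 0 is the empty phrase
    out = []
    for i, ph in enumerate(phrases, start=1):
        m = index[ph[:-1]]             # the proper prefix is always an earlier phrase
        j = alphabet.index(ph[-1])
        width = ceil(log2(i * J))      # codeword length for the ith phrase
        out.append(format(m * J + j, "b").zfill(width))
        index.setdefault(ph, i)        # the last phrase may repeat an earlier one
    return out

print(lz78_encode(["1", "10", "11", "0", "00", "110", "1"], "01"))
# ['1', '10', '011', '000', '1000', '0110', '0001']
```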
Lempel-Ziv Encoding (continued)
• In the example, instead of compression, we get expansion. The problem is
that the source sequence in the example is too short. In fact, LZ78 can
achieve the entropy rate of any stationary source as the length of the
source sequence grows without bound.
• If there are t phrases in the incremental parsing of u1u2···un, then the length
of the whole Lempel-Ziv codeword for u1u2···un is
$$\sum_{i=1}^{t} \lceil \log(iJ) \rceil$$
Lempel-Ziv Decoding
• The decoding process is easy and can also be done sequentially,
since the decoder knows in advance that the length of the codeword
corresponding to the ith phrase is ⌈log(iJ)⌉.
• After receiving the whole codeword, the decoder parses it into
non-overlapping substrings of lengths ⌈log(iJ)⌉, 1≤i≤t.
From the ith substring, the decoder finds the integer mJ+j and the pair
(m, j). Then the ith phrase is the concatenation of the mth phrase with
the symbol xj.
Lempel-Ziv Decoding: Example

Codewords   1      10     011    000    1000   0110   0001
Integers    1      2      3      0      8      6      1
Pairs       (0,1)  (1,0)  (1,1)  (0,0)  (4,0)  (3,0)  (0,1)
Phrases     1      10     11     0      00     110    1
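The sequential decoder can be sketched in the same style; it only needs the number of phrases t (equivalently, the total codeword length) to know where each codeword ends. The function name is our own:

```python
from math import ceil, log2

def lz78_decode(bits, t, alphabet):
    """Decode a concatenated LZ78 codeword containing t phrases: the ith
    codeword has ceil(log2(i*J)) bits and encodes the integer m*J + j."""
    J = len(alphabet)
    phrases = [""]                     # index 0 is the empty phrase
    pos = 0
    for i in range(1, t + 1):
        width = ceil(log2(i * J))      # known codeword length for phrase i
        m, j = divmod(int(bits[pos:pos + width], 2), J)
        phrases.append(phrases[m] + alphabet[j])
        pos += width
    return "".join(phrases[1:])

print(lz78_decode("110011000100001100001", 7, "01"))
# 110110001101
```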
Performance of Lempel-Ziv Coding
Theorem 2.6.1
Let {Xi} be a discrete stationary source, and let r(X1···Xn), the
ratio between the length of the whole Lempel-Ziv codeword for
X1···Xn and the length n of X1···Xn, be the compression rate in
bits per symbol. Then
$$E[r(X_1\cdots X_n)] \to H_\infty(X) \quad \text{as } n \to \infty,$$
where H∞(X) is the entropy rate of the source.