Motivations for arithmetic coding:
1) The Huffman coding algorithm generates prefix codes with minimum
average codeword length. But this length is usually strictly
greater than H(X1).
2) To improve the coding efficiency, one can use a block memoryless
code by working with the extended alphabet X^n. But the computational
complexity grows exponentially as n increases.
Thus for small n, Huffman coding is inefficient. On the other hand,
for large n, it is impractical due to its exponential coding complexity.
Solution: Arithmetic coding is one of the algorithms that can address
the above issue. It can achieve the entropy rate of a stationary
source with a linear coding complexity.
Shannon-Fano-Elias Codes
Let (X1, ···, Xn) be a random vector with joint pmf p(u1, u2, ···, un),
ui ∈ X = {x0, ···, xJ-1}. We partition the interval [0,1] into disjoint sub-intervals
I(u1, u2, ···, un), u1u2···un ∈ X^n, such that the following properties hold:
1) The length of the interval I(u1, u2, ···, un) is equal to p(u1, u2, ···, un).
2) The intervals cover [0,1]:
$$\bigcup_{u_1\cdots u_n \in X^n} I(u_1, \cdots, u_n) = [0, 1]$$
3) The intervals I(u1, u2, ···, un) are arranged according to the natural
lexicographic order on the sequences u1, u2, ···, un.
[Figure: for n = 1, [0,1] is partitioned, left to right, into I(x0), I(x1), ···, I(xJ-1);
for n = 2, each I(xi) is further partitioned into I(xix0), I(xix1), ···, I(xixJ-1).]
Shannon-Fano-Elias Codes (continued)
I(x0x0···x0x0) = [0, p(x0x0···x0x0)]
I(x0x0···x0x1) = [p(x0x0···x0x0), p(x0x0···x0x0) + p(x0x0···x0x1)]
⋮
I(xJ-1xJ-1···xJ-1) = [1 − p(xJ-1xJ-1···xJ-1), 1]
To get the codeword corresponding to u1u2···un, let
I(u1u2···un) = [a, b].
Represent the midpoint (a+b)/2 by its binary expansion
$$\frac{a+b}{2} = 0.B_1B_2\cdots B_L\cdots = \sum_{i=1}^{\infty} B_i 2^{-i}, \quad B_i \in \{0, 1\}.$$
Let $L = \lceil -\log p(u_1\cdots u_n) \rceil + 1 = \lceil -\log(b-a) \rceil + 1$.
The binary sequence B1B2···BL is the codeword of u1u2···un. The length of
the codeword assigned to u1u2···un is equal to $\lceil -\log p(u_1\cdots u_n) \rceil + 1$.
Shannon-Fano-Elias Codes: Decoding
Let
$$\Big\lfloor \frac{a+b}{2} \Big\rfloor_L = 0.B_1B_2\cdots B_L$$
be the real number obtained by rounding off (a+b)/2 to its first L bits.
We can prove that $\lfloor (a+b)/2 \rfloor_L$ is inside the interval [a, b]:
$$\frac{a+b}{2} - \Big\lfloor \frac{a+b}{2} \Big\rfloor_L
= 0.\underbrace{0\cdots 0}_{L}B_{L+1}B_{L+2}\cdots
= \sum_{i=L+1}^{\infty} B_i 2^{-i}
\le 2^{-L} = 2^{-\lceil -\log p(u_1\cdots u_n) \rceil - 1}
\le \frac{p(u_1\cdots u_n)}{2} = \frac{b-a}{2}.$$
Furthermore,
$$\Big[\Big\lfloor \frac{a+b}{2} \Big\rfloor_L,\ \Big\lfloor \frac{a+b}{2} \Big\rfloor_L + 2^{-L}\Big] \subset [a, b].$$
After receiving the codeword B1B2···BL, the decoder searches through all
u1u2···un ∈ X^n until the unique u1u2···un is found for which I(u1u2···un)
contains $0.B_1B_2\cdots B_L = \lfloor (a+b)/2 \rfloor_L$, and then decodes
B1B2···BL as the unique u1u2···un.
Shannon-Fano-Elias Codes: Example

x     p(x)    I(x)            L(x) = ⌈−log p(x)⌉ + 1   Midpoint (binary)   C(x)
x0    0.25    [0, 0.25]       3                         0.001···            001
x1    0.5     [0.25, 0.75]    2                         0.10···             10
x2    0.125   [0.75, 0.875]   4                         0.1101···           1101
x3    0.125   [0.875, 1]      4                         0.1111···           1111

The Shannon-Fano-Elias code is a prefix code.
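The construction above is easy to check numerically. Below is a minimal Python sketch (the function name `sfe_code` and the pmf-as-ordered-list interface are our own illustrative choices, not from the notes) that reproduces the table:

```python
from math import ceil, log2

def sfe_code(pmf):
    """Shannon-Fano-Elias code for a pmf given as an ordered list of
    (symbol, probability) pairs; intervals are stacked in the listed order."""
    codes = {}
    a = 0.0
    for sym, p in pmf:
        b = a + p                        # I(sym) = [a, b], length p
        L = ceil(-log2(p)) + 1           # codeword length
        x = (a + b) / 2                  # midpoint of the interval
        bits = ""
        for _ in range(L):               # first L bits of the binary expansion
            x *= 2
            bits += str(int(x))
            x -= int(x)
        codes[sym] = bits
        a = b
    return codes

print(sfe_code([("x0", 0.25), ("x1", 0.5), ("x2", 0.125), ("x3", 0.125)]))
# {'x0': '001', 'x1': '10', 'x2': '1101', 'x3': '1111'}
```

All probabilities in the example are dyadic, so the floating-point midpoint expansion here is exact; for general pmfs an exact rational representation would be safer.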
Arithmetic Coding
• The encoding complexity of the Shannon-Fano-Elias coding
algorithm mainly lies in the process of determining the interval
I(u1u2···un).
• Similarly, given B1B2 ···BL, the decoding complexity of the Shannon-
Fano-Elias coding algorithm mainly lies in the process of finding the
unique interval I(u1u2···un) such that the point 0.B1B2 ···BL is in
I(u1u2···un).
• In arithmetic coding, both of the processes can be realized
sequentially with linear complexity.
• The idea of arithmetic coding originated with Elias and was later made
practical by Rissanen, Pasco, Moffat and Witten.
Arithmetic Coding (Continued)
1) To determine the interval I(u1u2···un), we decompose the joint
probability p(u1u2···un) as
p(u1u2···un) = p(u1 ) p(u2|u1 ) p(u3|u1u2) ···p(un|u1···un-1)
We then construct a sequence of nested intervals
$$I(u_1) \supset I(u_1u_2) \supset \cdots \supset I(u_1u_2\cdots u_n)$$
2) Partition the interval [0, 1] into disjoint subintervals I(xj),
0≤j≤J-1, as shown below.
[Figure: [0, 1] partitioned, left to right, into I(x0), I(x1), ···, I(xJ-1).]
The length of the interval I(xj) is equal to p(xj). Then I(u1)= I(xj) if
u1=xj.
3) If I(u1u2···ui) = [ai, bi], we then partition [ai, bi] into disjoint sub-intervals
I(u1···uixj), 0≤j≤J-1, according to the conditional pmf p(xj|u1···ui),
as shown below.
[Figure: [ai, bi] partitioned, left to right, into I(u1···uix0), ···, I(u1···uixJ-1).]
Arithmetic Coding (Continued)
The length of the interval I(u1···uixj) is equal to
p(u1···uixj) = p(u1···ui) p(xj|u1···ui) = (length of [ai, bi]) × p(xj|u1···ui).
Then I(u1···uiui+1) = I(u1···uixj) if ui+1 = xj.
4) Repeat step 3) until the interval I(u1···un) is determined. The last interval
I(u1···un) is the desired interval.
5) To get the codeword corresponding to u1···un, we apply the same
procedure as in the Shannon-Fano-Elias coding. Let
I(u1u2···un) = [a, b].
Let $L = \lceil -\log p(u_1\cdots u_n) \rceil + 1$. Rounding off the midpoint (a+b)/2 to the first L
bits, we get
$$0.B_1B_2\cdots B_L = \Big\lfloor \frac{a+b}{2} \Big\rfloor_L$$
The sequence B1B2···BL is the codeword corresponding to u1···un.
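Steps 1)–5) can be sketched in Python; `Fraction` keeps the interval endpoints exact. The function name and the callback interface for the conditional pmf are illustrative choices, not part of the original description:

```python
from fractions import Fraction
from math import ceil, log2

def arithmetic_encode(seq, cond_pmf, alphabet):
    """Sequential arithmetic encoder.  cond_pmf(prefix) returns a dict
    mapping each symbol to its conditional probability as a Fraction."""
    a, b = Fraction(0), Fraction(1)
    for i, u in enumerate(seq):
        pmf = cond_pmf(seq[:i])
        width = b - a
        lo = a
        for x in alphabet:               # sub-intervals in alphabet order
            w = width * pmf[x]
            if x == u:
                a, b = lo, lo + w        # descend into I(prefix + u)
                break
            lo += w
    L = ceil(-log2(b - a)) + 1           # codeword length
    mid = (a + b) / 2
    bits = ""
    for _ in range(L):                   # first L bits of the midpoint
        mid *= 2
        bit = int(mid)
        bits += str(bit)
        mid -= bit
    return bits
```

For the memoryless pmf p(0) = 2/5, p(1) = 3/5 used in the worked example later in these notes, `arithmetic_encode("10110", lambda _: {"0": Fraction(2, 5), "1": Fraction(3, 5)}, "01")` returns `"100100"`.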
Arithmetic Coding (Decoding)
The decoding process can be realized sequentially.
1) Partition [0, 1) into disjoint sub-intervals I(xj), 0≤j≤J-1. If
0.B1B2···BL ∈ I(xj), set u1 = xj.
2) Having decoded u1u2···ui, we then partition I(u1u2···ui) into
disjoint subintervals I(u1u2···uixj), 0≤j≤J-1. If 0.B1B2···BL ∈
I(u1u2···uixj), then set ui+1 = xj.
3) Repeat step 2) until the sequence u1u2···un is decoded.
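The three decoding steps can likewise be sketched in Python (the function name and the conditional-pmf callback are illustrative choices; the known sequence length n is passed explicitly, as assumed in these notes):

```python
from fractions import Fraction

def arithmetic_decode(bits, n, cond_pmf, alphabet):
    """Sequential arithmetic decoder; n is the known sequence length
    and bits is the received codeword B1...BL."""
    point = Fraction(int(bits, 2), 2 ** len(bits))   # 0.B1B2...BL
    a, b = Fraction(0), Fraction(1)
    out = ""
    for _ in range(n):
        pmf = cond_pmf(out)
        width = b - a
        lo = a
        for x in alphabet:          # find the sub-interval containing the point
            w = width * pmf[x]
            if lo <= point < lo + w:
                out += x
                a, b = lo, lo + w
                break
            lo += w
    return out
```

With the memoryless pmf p(0) = 2/5, p(1) = 3/5, `arithmetic_decode("100100", 5, lambda _: {"0": Fraction(2, 5), "1": Fraction(3, 5)}, "01")` recovers `"10110"`.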
Arithmetic coding
1) In arithmetic coding, the length n of the sequence
u1u2···un to be compressed is assumed to be known to
both the encoder and the decoder.
2) The length of the codeword assigned to u1u2···un is
$$L = \lceil -\log p(u_1u_2\cdots u_n) \rceil + 1$$
Thus the average codeword length in bits/symbol
converges to the entropy rate of a stationary source as n
approaches infinity.
Arithmetic Coding (Example)
Let {Xi} be a discrete memoryless source with common pmf
p(0) = 2/5, p(1) = 3/5 and alphabet X = {0, 1}.
Let u1u2···u5=10110. We have
I(1)=[2/5, 1]
I(10)=[2/5, 16/25]
I(101)=[62/125, 16/25]
I(1011)=[346/625, 16/25]
I(10110)=[346/625, 1838/3125]
The length of I(10110) is 108/3125.
$$\Rightarrow L = \Big\lceil -\log\frac{108}{3125} \Big\rceil + 1 = 6$$
Midpoint = 1784/3125 = 0.100100··· (binary),
and the codeword = 100100.
Arithmetic Coding (Another Example)
Let the message to be encoded be x0x1x2x2x3
Source symbol Probability Initial Subinterval
x0 0.2 [0.0, 0.2)
x1 0.2 [0.2, 0.4)
x2 0.4 [0.4, 0.8)
x3 0.2 [0.8, 1.0]
Encoding sequence: x0x1x2x2x3. The successive intervals are:
After x0: [0, 0.2)
After x1: [0.04, 0.08)
After x2: [0.056, 0.072)
After x2: [0.0624, 0.0688)
After x3: [0.06752, 0.0688)
From the final interval [0.06752, 0.0688), we can get the
codeword length L and the corresponding codeword.
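The interval chain in this example can be checked with a few lines of exact arithmetic (the pmf values are converted to fractions; this is just a verification script, not part of the original notes):

```python
from fractions import Fraction

# pmf of the example: p(x0) = p(x1) = p(x3) = 0.2, p(x2) = 0.4,
# with sub-intervals stacked in the order x0, x1, x2, x3
pmf = [("x0", Fraction(1, 5)), ("x1", Fraction(1, 5)),
       ("x2", Fraction(2, 5)), ("x3", Fraction(1, 5))]

a, b = Fraction(0), Fraction(1)
for u in ["x0", "x1", "x2", "x2", "x3"]:
    width = b - a
    lo = a
    for x, p in pmf:                 # locate the sub-interval of symbol u
        if x == u:
            a, b = lo, lo + width * p
            break
        lo += width * p
    print(u, float(a), float(b))
```

The last line printed corresponds to the final interval [0.06752, 0.0688).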
Adaptive Arithmetic Coding
In the above description of arithmetic coding, we assume that both the
encoder and decoder know in advance the joint pmf of the random vector
(X1, X2, ···Xn).
In practice, the pmf is often unknown and has to be estimated online or
offline.
For simplicity, let X = {0, 1}. The initial pmf is uniform, i.e.,
p(0) = p(1) = 1/2
After u1u2···ui is processed, the conditional pmf given u1u2···ui is given by
$$p(1|u_1u_2\cdots u_i) = \frac{(\text{number of 1s in } u_1u_2\cdots u_i) + 1}{i + 2}$$
$$p(0|u_1u_2\cdots u_i) = \frac{(\text{number of 0s in } u_1u_2\cdots u_i) + 1}{i + 2}$$
Let u1u2···u8 = 11001010. Then according to the above,
$$p(u_1u_2\cdots u_8) = p(11001010) = \frac{1}{2}\cdot\frac{2}{3}\cdot\frac{1}{4}\cdot\frac{2}{5}\cdot\frac{3}{6}\cdot\frac{3}{7}\cdot\frac{4}{8}\cdot\frac{4}{9}$$
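This sequential product is easy to verify in Python (the helper name `laplace_prob` is our own; the rule is the add-one estimator from the formulas above):

```python
from fractions import Fraction

def laplace_prob(seq):
    """Probability of a binary string under the sequential add-one rule
    p(1 | prefix) = (#1s in prefix + 1) / (len(prefix) + 2)."""
    p = Fraction(1)
    ones = zeros = 0
    for c in seq:
        if c == "1":
            p *= Fraction(ones + 1, ones + zeros + 2)
            ones += 1
        else:
            p *= Fraction(zeros + 1, ones + zeros + 2)
            zeros += 1
    return p

print(laplace_prob("11001010"))   # 1/630
```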
Adaptive Arithmetic Coding (continued)
Another choice for the conditional pmf given u1u2···ui is as follows:
$$p(1|u_1u_2\cdots u_i) = \frac{(\text{number of 1s in } u_1u_2\cdots u_i) + 1/2}{i + 1}$$
$$p(0|u_1u_2\cdots u_i) = \frac{(\text{number of 0s in } u_1u_2\cdots u_i) + 1/2}{i + 1}$$
With this choice,
$$p(11001010) = \frac{1}{2}\cdot\frac{3/2}{2}\cdot\frac{1/2}{3}\cdot\frac{3/2}{4}\cdot\frac{5/2}{5}\cdot\frac{5/2}{6}\cdot\frac{7/2}{7}\cdot\frac{7/2}{8}$$
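The second rule (the Krichevsky–Trofimov estimator; that name is standard in the literature, though the notes do not use it) can be checked the same way:

```python
from fractions import Fraction

def kt_prob(seq):
    """Probability of a binary string under the sequential rule
    p(1 | prefix) = (#1s in prefix + 1/2) / (len(prefix) + 1)."""
    p = Fraction(1)
    ones = zeros = 0
    for c in seq:
        # (count + 1/2)/(i + 1) written as (2*count + 1)/(2*(i + 1))
        if c == "1":
            p *= Fraction(2 * ones + 1, 2 * (ones + zeros + 1))
            ones += 1
        else:
            p *= Fraction(2 * zeros + 1, 2 * (ones + zeros + 1))
            zeros += 1
    return p

print(kt_prob("11001010"))   # 35/32768
```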
Lempel-Ziv Algorithm
• Adaptive arithmetic coding, presented at the end of the last section,
is universal: it does not require knowledge of the source statistics, yet
it can achieve the ultimate compression rate of any discrete memoryless
source.
• Lempel-Ziv is another universal source coding algorithm
developed by Ziv and Lempel.
• One Lempel-Ziv algorithm is LZ77, known as the sliding-window
Lempel-Ziv algorithm, which was published in 1977.
• One year later, they proposed a variant of LZ77, the incremental
parsing Lempel-Ziv algorithm, i.e., LZ78.
• In this course we will look at LZ78.
Lempel-Ziv parsing
• LZ78 adopts an incremental parsing procedure, which parses the source
sequence u1u2···un into non-overlapping variable-length blocks.
• The first substring in the incremental parsing of u1u2···un is u1. Each
subsequent substring is the shortest phrase of the remaining sequence that
has not appeared so far in the parsing.
• Assume that $u_1,\ u_2\cdots u_{n_2},\ u_{n_2+1}\cdots u_{n_3},\ \cdots,\ u_{n_{i-1}+1}\cdots u_{n_i}$ are the substrings created
so far in the parsing process. The next substring, denoted $u_{n_i+1}\cdots u_{n_{i+1}}$,
is the shortest phrase of $u_{n_i+1}\cdots u_n$ that has not appeared in
$\{u_1,\ u_2\cdots u_{n_2},\ \cdots,\ u_{n_{i-1}+1}\cdots u_{n_i}\}$, if such a phrase exists.
• Otherwise $u_{n_i+1}\cdots u_{n_{i+1}} = u_{n_i+1}\cdots u_n$ with $n_{i+1} = n$, and the incremental parsing
procedure terminates.
Lempel-Ziv parsing: Examples

Example 1: For the source sequence
10101110011100111000111001
the incremental parsing procedure yields the following partition:
1, 0, 10, 11, 100, 111, 00, 1110, 001, 110, 01

Example 2: For the source sequence
110110001101
the parsing yields:
1, 10, 11, 0, 00, 110, 1
In this example, the last substring 1 has already appeared.
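The incremental parsing rule can be sketched directly in Python (a set of previously seen phrases suffices, since each candidate phrase is grown one symbol at a time; the function name is our own):

```python
def lz78_parse(seq):
    """Incremental (LZ78) parsing: each phrase is the shortest prefix of
    the remaining input that has not appeared as an earlier phrase; the
    final phrase may repeat an earlier one if the input runs out."""
    phrases, seen = [], set()
    i = 0
    while i < len(seq):
        j = i + 1
        while seq[i:j] in seen and j < len(seq):
            j += 1                      # grow until the phrase is new
        phrases.append(seq[i:j])
        seen.add(seq[i:j])
        i = j
    return phrases

print(lz78_parse("110110001101"))
# ['1', '10', '11', '0', '00', '110', '1']
```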
Lempel-Ziv parsing
• The concatenation of all phrases is equal to the original source sequence.
• All phrases are distinct, except that the last phrase could be equal to one of
the preceding ones. In Example 2, the last phrase is equal to the first one. All
phrases except the last one are distinct.
• Let Λ denote the empty string. Think of Λ as an initial phrase before the first
phrase in the incremental parsing. Each new phrase in the parsing is the
concatenation of a previous phrase with a new output letter from the source
sequence.
For example, the first phrase 1 is the concatenation of the empty string Λ with
the new symbol 1. Similarly, the phrase 110 is the concatenation of the
phrase 11 with the new symbol 0.
Lempel-Ziv Encoding
Let X={x0,··· xJ-1}. The Lempel-Ziv encoding of the sequence u1u2···un can be
implemented sequentially as follows.
1. The first phrase u1 is uniquely determined by (0, u1) where the index 0 is
corresponding to the initial empty phrase . Represent the pair (0,u1)
by the integer 0xJ+index(u1) where the index(u1)=j if u1=xj, 0≤j ≤J-1.
Encode the first phrase into the binary representation of the integer
0xJ+index(u1) = index(u1) padded with possible zeros on the left to
ensure that the total length of the codeword is
2. Having determined the ith phrase, we know that the ith phrase is equal
to the concatenation of the mth phrase with a new symbol xj for some
0≤m≤i-1 and 0≤j ≤J-1. Represent the ith phrase into the binary
representation of the integer mxJ+j padded with some possible zeros on
the left to ensure that the total of the codeword is
3. Repeat step 2 until all phrases are encoded.
Λ
log J
log iJ
Lempel-Ziv Encoding: Example
Partitioned phrases: 1 10 11 0 00 110 1
X = {0, 1}, J = 2.

Phrase   (m, j)   Codeword   Length
1        (0, 1)   1          1
10       (1, 0)   10         2
11       (1, 1)   011        3
0        (0, 0)   000        3
00       (4, 0)   1000       4
110      (3, 0)   0110       4
1        (0, 1)   0001       4

So the Lempel-Ziv coding transforms the original source sequence
1 10 11 0 00 110 1
into
1 10 011 000 1000 0110 0001
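The encoding steps can be sketched as follows (illustrative function name; phrase i is sent in ⌈log(iJ)⌉ bits as described above):

```python
from math import ceil, log2

def lz78_encode(phrases, alphabet):
    """Encode an LZ78 phrase list: the ith phrase (i starting at 1) equals
    a previous phrase (index m, empty phrase = 0) plus one symbol x_j, and
    is sent as the integer m*J + j in ceil(log2(i*J)) bits."""
    J = len(alphabet)
    index = {"": 0}                    # phrase -> index; 0 is the empty phrase
    out = []
    for i, ph in enumerate(phrases, start=1):
        m = index[ph[:-1]]             # the proper prefix is always an earlier phrase
        j = alphabet.index(ph[-1])
        width = ceil(log2(i * J))      # codeword length for the ith phrase
        out.append(format(m * J + j, "b").zfill(width))
        index.setdefault(ph, i)        # the last phrase may repeat an earlier one
    return out

print(lz78_encode(["1", "10", "11", "0", "00", "110", "1"], "01"))
# ['1', '10', '011', '000', '1000', '0110', '0001']
```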
Lempel-Ziv Encoding (continued)
• In the example, instead of compression, we get expansion. The problem is
that the source sequence in the example is too short. In fact, LZ78 can
achieve the entropy rate of any stationary source as the length of the
source sequence grows without bound.
• If there are t phrases in the incremental parsing of u1u2···un, then the length
of the whole Lempel-Ziv codeword for u1u2···un is
$$\sum_{i=1}^{t} \lceil \log(iJ) \rceil$$
Lempel-Ziv Decoding
• The decoding process is easy and can also be done sequentially,
since the decoder knows in advance that the length of the codeword
corresponding to the ith phrase is ⌈log(iJ)⌉.
• After receiving the whole codeword, the decoder parses it into
non-overlapping substrings of lengths ⌈log(iJ)⌉, 1≤i≤t.
From the ith substring, the decoder finds the integer mJ+j and the pair
(m, j). Then the ith phrase is the concatenation of the mth phrase with
the symbol xj.
Lempel-Ziv Decoding: Example

Codewords   1      10     011    000    1000   0110   0001
Integers    1      2      3      0      8      6      1
Pairs       (0,1)  (1,0)  (1,1)  (0,0)  (4,0)  (3,0)  (0,1)
Phrases     1      10     11     0      00     110    1
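The sequential decoder can be sketched in the same style; it only needs the number of phrases t (equivalently, the total codeword length) to know where each codeword ends. The function name is our own:

```python
from math import ceil, log2

def lz78_decode(bits, t, alphabet):
    """Decode a concatenated LZ78 codeword containing t phrases: the ith
    codeword has ceil(log2(i*J)) bits and encodes the integer m*J + j."""
    J = len(alphabet)
    phrases = [""]                     # index 0 is the empty phrase
    pos = 0
    for i in range(1, t + 1):
        width = ceil(log2(i * J))      # known codeword length for phrase i
        m, j = divmod(int(bits[pos:pos + width], 2), J)
        phrases.append(phrases[m] + alphabet[j])
        pos += width
    return "".join(phrases[1:])

print(lz78_decode("110011000100001100001", 7, "01"))
# 110110001101
```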
Performance of Lempel-Ziv Coding
Theorem 2.6.1
Let {Xi} be a discrete stationary source, and let r(X1···Xn), the
ratio between the length of the whole Lempel-Ziv codeword for
X1···Xn and the length n of X1···Xn, be the compression rate in
bits per symbol. Then
$$E[r(X_1\cdots X_n)] \to H_\infty(X) \quad \text{as } n \to \infty,$$
where H∞(X) is the entropy rate of the source.