Source Coding Fundamentals
Outline
Introduction
Lossless Coding
Huffman Coding
Elias and Arithmetic Coding
Rate-Distortion Theory
Rate-Distortion Function
Shannon Lower Bound
Quantization
Scalar Quantization
Vector Quantization
Predictive Coding
Linear Prediction
Differential Pulse Code Modulation (DPCM)
Transform Coding
Orthogonal Transforms and Bit Allocation
Karhunen Loève Transform (KLT)
Discrete Cosine Transform (DCT)
Practical Communication Problem
Source codecs are primarily characterized in terms of:
Throughput of the channel, a characteristic influenced by
transmission channel bit rate and
amount of protocol and error-correction coding overhead incurred by the transmission system
Distortion of the decoded signal, which is primarily induced by
the source encoder and
by channel errors introduced on the path to the source decoder
The following additional constraints must also be considered:
Delay (start-up latency and end-to-end delay) including
processing delay, buffering,
structural delays of source and channel codecs, and
speed at which data are conveyed through the transmission channel
Complexity (computation, memory capacity, memory access) of
source codec,
protocol stacks and network
Formulation of the Practical Communication Problem
The practical source coding design problem can be posed as follows:
Given a maximum allowed delay and a maximum allowed complexity, achieve an optimal trade-off between bit rate and distortion for the transmission problem in the targeted applications
Scope of the consideration in this lecture: Source codec
Delay is evaluated only for the source codec
Complexity is also assessed only for the algorithms used in the source codec
Types of Compression
Lossless coding:
Uses redundancy reduction as the only principle and is therefore reversible
Also referred to as noiseless coding, invertible coding, data compaction, or entropy coding
A well-known use of this type of compression for data is Lempel-Ziv coding (e.g., gzip); for picture and video signals, JPEG-LS is well known
Lossy coding:
Uses redundancy reduction and irrelevancy reduction and is therefore not reversible
It is the primary coding type in compression for speech, audio, picture, and video signals
The practically relevant bit rate reduction that is achievable through lossy compression is typically more than an order of magnitude larger than with lossless compression
Well-known examples are MPEG-1 Layer 3 (mp3) for audio coding, JPEG for still picture coding, and H.264/AVC for video coding
Distortion Measures
Usage of distortion measures
The use of lossy compression requires the ability to measure distortion
Often, the distortion that a human perceives in coded content is a very difficult quantity to measure, as the characteristics of human perception are complex
Perceptual models are far more advanced for speech and audio codecs than for picture or video codecs
In picture and video coding,
Perceptual models have limited use to guide encoding decisions (mainly focusing on properties of the human visual system)
Viewing tests are used to determine subjective quality of coding results
For investigating source coding techniques, simple objective distortion measures such as MSE and PSNR are often used
Objective Distortion Measures
Mean Squared Error (MSE)
Pictures (X: picture height, Y: picture width):

    MSE = 1/(X · Y) ∑_{x=0}^{X−1} ∑_{y=0}^{Y−1} ( s′[x, y] − s[x, y] )²    (1)

Videos (N: number of pictures, MSEn: MSE of picture n):

    MSE = (1/N) ∑_{n=0}^{N−1} MSEn    (2)

Peak Signal-to-Noise Ratio (PSNR)
Pictures (2^k − 1: maximum amplitude of the picture samples):

    PSNR = 10 · log10( (2^k − 1)² / MSE )    (3)

Videos (N: number of pictures, PSNRn: PSNR of picture n):

    PSNR = (1/N) ∑_{n=0}^{N−1} PSNRn    (4)
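As a small illustration of Eqs. (1) and (3), the following sketch (added here, not part of the original slides) computes MSE and PSNR for two pictures stored as NumPy arrays; the function names and the assumption of 8-bit samples are choices made for this example.

import numpy as np

def mse(ref, rec):
    """Mean squared error between two pictures, Eq. (1)."""
    ref = ref.astype(np.float64)
    rec = rec.astype(np.float64)
    return np.mean((rec - ref) ** 2)

def psnr(ref, rec, bit_depth=8):
    """Peak signal-to-noise ratio in dB, Eq. (3)."""
    peak = (2 ** bit_depth - 1) ** 2
    return 10.0 * np.log10(peak / mse(ref, rec))

# Hypothetical usage with two pictures of equal size (e.g. loaded via Pillow):
# original = np.asarray(Image.open('original.png'))
# decoded  = np.asarray(Image.open('decoded.png'))
# print(mse(original, decoded), psnr(original, decoded))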
Distortion Measures for Picture and Video Coding
Typical artifacts in compressed videos
Blockiness, motion errors, blur, ringing
Picture and video encoding
Measurements of properties of the human visual system have been used to derive spatial contrast sensitivity functions that are used in the encoding of pictures
Temporal sensitivity functions that have been measured have so far not been included in video encoding algorithms known in the public domain
Mean squared error is a widely used measure
Picture and video quality assessment
Objective quality measures often have very limited correlation with the actual subjective quality of a compressed picture or video signal
Picture and video quality is assessed in viewing tests
Viewing tests with human subjects are costly and time-consuming
Video Quality Measurement
Viewing conditions
The viewing conditions are fixed and described in order to be able to reproduce the subjective test
ITU-R Rec. BT.500-11 specifies viewing conditions including
Low room illumination; the viewing screen is the main light source
Screen size and preferred viewing distance as a ratio between the height of the screen and the distance from the screen
Example for a viewing test method
Double stimulus continuous quality scale (DSCQS) is a test method where the subjects view the stimuli compressed by a codec under test and a reference
Coded sequence and reference are presented alternately
Each sequence is presented twice
Sequences are separated by mid-gray level sequence of 3 sec duration
Subjects have no knowledge of display chronology
Subjects rate the video on a continuous scale that is partitioned into intervals labeled "Excellent", "Good", "Fair", "Poor", or "Bad"
Lossless Coding
Discrete Random Variables and Probability Mass Function
Definitions
A random variable S is a function of the sample space O that assigns a value S(ζ) to each outcome ζ ∈ O of a random experiment
A random variable S is called a discrete random variable if it takes values of a countable alphabet A = {a0, a1, . . .}
Probability mass function (pmf) for discrete random variables

    pS(a) = P(S = a) = P( {ζ : S(ζ) = a} )    (5)
Examples for pmfs
Binary pmf:
    A = {a0, a1},    pS(a0) = p,  pS(a1) = 1 − p    (6)
Uniform pmf:
    A = {a0, a1, · · ·, aM−1},    pS(ai) = 1/M  ∀ ai ∈ A    (7)
Geometric pmf:
    A = {a0, a1, · · · },    pS(ai) = (1 − p) p^i  ∀ ai ∈ A    (8)
Joint and Conditional Pmfs
Joint probability mass function
The N-dimensional pmf or joint pmf for a random vector S = (S0, · · · , SN−1)^T is defined by

    pS(a) = P(S = a) = P(S0 = a0, · · · , SN−1 = aN−1)    (9)
Joint pmf of two random vectors X and Y
pXY (ax,ay) = P (X=ax,Y =ay) (10)
Conditional probability mass functions
The conditional pmf pS|B(a | B) of a random variable S given an event B,with P (B) > 0, is defined by
pS|B(a | B) = P (S = a | B) (11)
Conditional pmf of a random vector X given another random vector Y
    pX|Y(ax | ay) = pXY(ax, ay) / pY(ay)    (12)
Example for Joint Pmf
Example: Joint pmf for neighboring picture samples
Samples in picture and video signals typically show strong statistical dependencies
Below: Histogram of two horizontally adjacent samples for the picture ’Lena’
[Figure: joint histogram — relative frequency of occurrence over the amplitude of the current pixel and the amplitude of the adjacent pixel]
Expectation
Expectation value or expectation
Definition for discrete random variables S
    E{g(S)} = ∑_{a∈A} g(a) pS(a)    (13)

Important expectation values are the mean µS and the variance σS²

    µS = E{S}   and   σS² = E{ (S − µS)² }    (14)
Conditional expectation
The conditional expectation value of function g(S) given an event B,with P (B) > 0, is defined by
    E{g(S) | B} = ∑_{a∈A} g(a) pS(a | B) = ∑_{a∈A} g(a) P(S = a, B) / P(B)    (15)
Discrete Random Processes
Discrete random process
Series of random experiments at time instants tn, with n = 0, 1, 2, . . ., characterized by a series of random variables S = {Sn}
Statistical properties of a discrete-time random process S: N-th order joint pmf

    pSn(a) = P(Sn = a0, · · · , Sn+N−1 = aN−1)    (16)
Stationary random process
Statistical properties are invariant to a shift in time: pSn(a) = pS(a)
Memoryless random process
The random variables Sn are independent of each other
Independent and identically distributed (iid) random process
Stationary and memoryless: pS(a) = pS(a0) · pS(a1) · · · pS(aN−1)
Markov process
Future outcomes do not depend on past outcomes, but only on the present outcome
pSn(an | an−1, · · · ) = pSn(an | an−1) (17)
Lossless Source Coding – Overview
Lossless source coding
Reversible mapping of sequences of discrete source symbols into sequences of codewords
Other names: Noiseless coding, entropy coding
Original source sequence can be exactly reconstructed – not the case in lossy coding
Bit rate reduction is possible if the source data contain statistical properties that are exploitable for data compression
Lossless Source Coding – Terminology
Terminology
Message s(L) = {s0, · · · , sL−1} drawn from stochastic process S = {Sn}
Sequence b(K) = {b0, · · · , bK−1} of K bits (bk ∈ B = {0, 1})
Process of lossless coding: Message s(L) is converted to b(K)
Assume:
Subsequence s(N) = {sn, · · · , sn+N−1} with 1 ≤ N ≤ L and
Bits b(ℓ)(s(N)) = {b0, · · · , bℓ−1} assigned to it
Lossless source code
Encoder mapping:
    b(ℓ) = γ( s(N) )    (18)
Decoder mapping:
    s(N) = γ^{−1}( b(ℓ) ) = γ^{−1}( γ( s(N) ) )    (19)
Classification of Lossless Source Codes
Lossless source code
Encoder mapping:
    b(ℓ) = γ( s(N) )    (20)
Decoder mapping:
    s(N) = γ^{−1}( b(ℓ) ) = γ^{−1}( γ( s(N) ) )    (21)
Classification
Fixed-to-fixed mapping: N and ℓ are both fixed (discussed as special case of fixed-to-variable)
Fixed-to-variable mapping: N fixed and ℓ variable – Huffman algorithm for scalars and vectors (discussed in lecture)
Variable-to-fixed mapping: N variable and ℓ fixed – Tunstall codes (not discussed in lecture)
Variable-to-variable mapping: ℓ and N are both variable – Arithmetic codes (discussed in lecture)
Variable-Length Coding for Scalars
Assigning codewords to scalar symbols
Assign a separate codeword to each scalar symbol sn of a message s(L)
Assume: Message s(L) generated by stationary random process S = {Sn}
Random variables Sn = S with symbol alphabet A = {a0, · · · , aM−1} and marginal pmf p(a) = P(S = a)
Lossless source code: Assign a binary codeword bi = {bi0, · · · , biℓ(ai)−1} to each alphabet letter ai
The length (number of bits) of the codeword bi that is assigned to ai is denoted by ℓ(ai), with ℓ(ai) ≥ 1
Average codeword length

    ℓ̄ = E{ℓ(S)} = ∑_{i=0}^{M−1} p(ai) ℓ(ai)    (22)
Optimization Problem
Average codeword length is given as

    ℓ̄ = ∑_{k=0}^{K−1} p(ak) · ℓ(ak)    (23)

Goal of lossless code design:
Minimize average codeword length ℓ̄ while providing unique decodability

ai     p(ai)     code A    code B    code C    code D    code E
a0     0.5       0         0         0         00        0
a1     0.25      10        01        01        01        10
a2     0.125     11        010       011       10        110
a3     0.125     11        011       111       110       111
ℓ̄                1.5       1.75      1.75      2.125     1.75
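To make Eq. (23) concrete, here is a small sketch (added for illustration, not part of the original slides) that evaluates the average codeword length for two codes from the table above; the dictionary names pmf, code_d, and code_e are choices made for this example.

def avg_codeword_length(pmf, code):
    """Average codeword length per Eq. (23): sum over p(ak) * l(ak)."""
    return sum(p * len(code[a]) for a, p in pmf.items())

pmf    = {'a0': 0.5, 'a1': 0.25, 'a2': 0.125, 'a3': 0.125}
code_d = {'a0': '00', 'a1': '01', 'a2': '10', 'a3': '110'}   # code D from the table
code_e = {'a0': '0', 'a1': '10', 'a2': '110', 'a3': '111'}   # code E from the table

print(avg_codeword_length(pmf, code_d))   # 2.125 bit/symbol
print(avg_codeword_length(pmf, code_e))   # 1.75  bit/symbol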
Unique Decodability and Prefix Codes
Unique decodability
The code γ has to specify a mapping ai → bi such that
    if ak ≠ aj  then  bk ≠ bj    (24)

For sequences of symbols, the above constraint needs to be extended to the concatenation of multiple symbols
→ For a uniquely decodable code, a sequence of codewords can only begenerated by one possible sequence of source symbols.
Prefix codes
One class of codes that satisfies the constraint of unique decodability is the class of prefix codes
A code is called a prefix code if no codeword is a prefix of any other codeword
It is obvious that if condition (24) is satisfied and the code is a prefix code, then any concatenation of symbols is uniquely decodable
Binary Code Trees
[Figure: binary code tree consisting of a root node, interior nodes, and terminal nodes connected by branches labeled '0' and '1'; the terminal nodes carry the codewords '0', '10', '110', and '111']
Representation of prefix codes with binary trees
Prefix codes can be represented by trees
A binary tree contains nodes with two branches (labelled as '0' and '1') leading to other nodes starting from a root node
A node from which branches depart is called an interior node while a node from which no branches depart is called a terminal node
A prefix code can be constructed by assigning letters of the alphabet A to terminal nodes of a binary tree
Parsing of Prefix Codes
Given the codeword assignment to terminal nodes of the binary tree, the parsing rule for this prefix code is given as follows:
1 Set the current node ni equal to the root node.
2 Read the next bit b from the bitstream.
3 Follow the branch labeled with the value of b from the current node ni to thedescendant node nj .
4 If nj is a terminal node, return the associated alphabet letter and proceed with step 1.
Otherwise, set the current node ni equal to nj and repeat the previous two steps.
Prefix codes are not only uniquely decodable, but also instantaneously decodable.
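As an illustration of this parsing rule, the following sketch (added here, not from the original slides) decodes a bit string with the example code tree of the previous slide; the nested-dict tree representation and the name parse_prefix_code are assumptions made for the example.

TREE = {'0': 'a0',
        '1': {'0': 'a1',
              '1': {'0': 'a2', '1': 'a3'}}}   # code tree for '0', '10', '110', '111'

def parse_prefix_code(bitstring, tree):
    """Instantaneous decoding of a prefix code by walking the binary code tree."""
    symbols, node = [], tree
    for bit in bitstring:
        node = node[bit]                 # step 3: follow the branch labeled with the bit
        if not isinstance(node, dict):   # step 4: terminal node reached
            symbols.append(node)
            node = tree                  # continue with the root node (step 1)
    return symbols

print(parse_prefix_code('010110111', TREE))   # ['a0', 'a1', 'a2', 'a3']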
Kraft Inequality for Prefix Codes
Property of codeword length for prefix codes
Assume fully balanced tree with depth ℓmax (= longest codeword)
Codewords assigned to nodes with codeword length ℓ(ak) ≤ ℓmax
Each choice with ℓ(ak) ≤ ℓmax eliminates 2^{ℓmax − ℓ(ak)} possibilities of codeword assignment at level ℓmax, for example:
→ ℓmax − ℓ(ak) = 0: one option is covered
→ ℓmax − ℓ(ak) = 1: two options are covered
Number of terminal nodes is less than or equal to the number of terminal nodes in a balanced tree with depth ℓmax, which is 2^{ℓmax}

    ∑_{k=0}^{K−1} 2^{ℓmax − ℓ(ak)} ≤ 2^{ℓmax}    (25)
Kraft Inequality and Unique Decodability
Kraft inequality for lossless codes
Observation for prefix codes can be generalized
Can be shown that the Kraft inequality

    ζ(γ) = ∑_{k=0}^{K−1} 2^{−ℓ(ak)} ≤ 1    (26)

is a necessary condition for the unique decodability of a code γ
Proof can be found in [Cover and Thomas, 2006, p. 116] or [Wiegand and Schwarz, 2011, p. 25]
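A quick numerical check of the Kraft inequality (26) for two codes from the earlier code table is sketched below; this illustration is an addition, not part of the original slides.

def kraft_sum(codeword_lengths):
    """Left-hand side of the Kraft inequality, Eq. (26)."""
    return sum(2.0 ** (-l) for l in codeword_lengths)

print(kraft_sum([1, 2, 3, 3]))   # code E: 1.0 <= 1, consistent with unique decodability
print(kraft_sum([1, 2, 2, 2]))   # code A: 1.25 > 1, so no uniquely decodable code
                                 # with these codeword lengths can exist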
Bound for Scalar Variable-Length Coding: The Entropy
Lower bound for average codeword length
Based on the Kraft inequality it can be shown that the average codeword length ℓ̄ for uniquely decodable scalar codes is bounded by

    ℓ̄ ≥ H(S) = E{− log2 p(S)} = − ∑_{i=0}^{M−1} p(ai) log2 p(ai)    (27)
An example proof can be found in [Wiegand and Schwarz, 2011, p. 27]
The lower bound H(S) is called the entropy of the source S
Redundancy of a code
The redundancy of a scalar code is given by the difference

    ϱ = ℓ̄ − H(S) ≥ 0    (28)

The redundancy is zero only if for all alphabet letters ai the lengths of the corresponding codewords are ℓ(ai) = − log2 p(ai)
The redundancy can only be zero if all probability masses p(ai) represent negative integer powers of 2
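As a small worked example (added for illustration, not in the original slides), the entropy (27) and redundancy (28) can be evaluated for the pmf and code E of the earlier table; since all probabilities are negative integer powers of 2, the redundancy is exactly zero.

from math import log2

def entropy(pmf):
    """Entropy H(S) in bit/symbol, Eq. (27)."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

pmf    = {'a0': 0.5, 'a1': 0.25, 'a2': 0.125, 'a3': 0.125}
code_e = {'a0': '0', 'a1': '10', 'a2': '110', 'a3': '111'}

H     = entropy(pmf)                                      # 1.75 bit/symbol
l_bar = sum(p * len(code_e[a]) for a, p in pmf.items())   # 1.75 bit/symbol
print(H, l_bar, l_bar - H)                                # redundancy, Eq. (28): 0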
Upper Bound for Minimum Average Codeword Length
Bounds for achievable average codeword length
The fundamental lower bound for ℓ̄ is given by the entropy H(S), but it is not always achievable (codewords must have an integer number of bits)
A code with the codeword lengths ℓ(ai) = ⌈− log2 p(ai)⌉, ∀ ai ∈ A can always be constructed, yielding the upper bound

    ℓ̄ = ∑_{i=0}^{M−1} p(ai) ⌈− log2 p(ai)⌉ < ∑_{i=0}^{M−1} p(ai) (1 − log2 p(ai)) = H(S) + 1    (29)

The minimum achievable average codeword length ℓ̄min is bounded by

    H(S) ≤ ℓ̄min < H(S) + 1    (30)
Entropy of a Binary Source
Binary entropy function
A binary source has probabilities p(0) = p and p(1) = 1 − p
The entropy of the binary source is given as
H(S) = −p log2 p− (1− p) log2(1− p) = Hb(p) (31)
with Hb(x) being the so-called binary entropy function
[Plot: binary entropy function — rate R in bit/symbol versus P(a0), rising from 0 at P(a0) = 0 to 1 bit/symbol at P(a0) = 0.5 and falling back to 0 at P(a0) = 1]
The Huffman Algorithm
Constructing lossless codes with minimum redundancy
Question: How to generate a prefix code with minimum redundancy?
The answer was given by D. A. Huffman in 1952 [Huffman, 1952]
The Huffman algorithm always yields a prefix code with minimum redundancy
For a proof that Huffman codes are optimal instantaneous codes (with minimum expected length), see [Cover and Thomas, 2006, p. 124ff]
The Huffman algorithm
1 Given an ensemble representing a memoryless discrete source
2 Pick the two symbols with lowest probabilities and merge them into a new auxiliary symbol and calculate its probability
3 If more than one symbol remains, repeat the previous step
4 Convert the code tree into a prefix code
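A compact sketch of the Huffman algorithm is given below (added for illustration; the heap-based bookkeeping and the name huffman_code are choices made for this example, not part of the original slides). Each heap entry carries the probability of an original or auxiliary symbol together with the partial codewords of the symbols merged into it.

import heapq

def huffman_code(pmf):
    """Construct a Huffman code for a pmf given as {symbol: probability}."""
    # Heap entries: (probability, tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {a: ''}) for i, (a, p) in enumerate(pmf.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Step 2: merge the two least probable (auxiliary) symbols ...
        p0, _, code0 = heapq.heappop(heap)
        p1, _, code1 = heapq.heappop(heap)
        # ... and prepend the branch labels '0' and '1' to their codewords (step 4)
        merged = {a: '0' + c for a, c in code0.items()}
        merged.update({a: '1' + c for a, c in code1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

print(huffman_code({'a0': 0.5, 'a1': 0.25, 'a2': 0.125, 'a3': 0.125}))
# e.g. {'a0': '0', 'a1': '10', 'a2': '110', 'a3': '111'} (up to swapping '0'/'1' labels)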
Example for the Design of a Huffman Code
[Figure: Huffman code construction by repeatedly merging the two least probable symbols; the merged nodes obtain the probabilities 0.03, 0.06, 0.13, 0.27, 0.43, and 0.57, and the two branches of each merge are labeled '0' and '1']

Symbol probabilities and resulting codewords:

P(7) = 0.29  →  '11'
P(6) = 0.28  →  '10'
P(5) = 0.16  →  '01'
P(4) = 0.14  →  '001'
P(3) = 0.07  →  '0001'
P(2) = 0.03  →  '00001'
P(1) = 0.02  →  '000001'
P(0) = 0.01  →  '000000'
Conditional Huffman Codes
Reduce average codeword length for sources with memory
Random process {Sn} with memory: Design VLC for conditional pmf
Example:
Stationary discrete Markov process, A = {a0, a1, a2}
Conditional pmfs p(a|ak) = P(Sn = a | Sn−1 = ak) with k = 0, 1, 2

a          a0     a1     a2      entropy
p(a|a0)    0.90   0.05   0.05    H(Sn|a0) = 0.5690
p(a|a1)    0.15   0.80   0.05    H(Sn|a1) = 0.8842
p(a|a2)    0.25   0.15   0.60    H(Sn|a2) = 1.3527
p(a)       0.64   0.24   0.1     H(S) = 1.2575
                                 H(Sn|Sn−1) = 0.7331

Design Huffman code for conditional pmfs

ai    Sn−1 = a0    Sn−1 = a1    Sn−1 = a2    Huffman code for marginal pmf
a0    1            00           00           1
a1    00           1            01           00
a2    01           01           1            01
      ℓ̄0 = 1.1     ℓ̄1 = 1.2     ℓ̄2 = 1.4     ℓ̄ = 1.3556
      ℓ̄c = 1.1578
Average Codeword Length of Conditional Huffman Codes
Bounds for the average codeword length
Average codeword length ℓ̄k = ℓ̄(Sn−1 = ak) is bounded by

    H(Sn|ak) ≤ ℓ̄k < H(Sn|ak) + 1    (32)

with conditional entropy of Sn given event {Sn−1 = ak}

    H(Sn|ak) = H(Sn|Sn−1 = ak) = − ∑_{i=0}^{M−1} p(ai|ak) log2 p(ai|ak)    (33)

Taking the expectation yields

    ∑_{k=0}^{M−1} p(ak) H(Sn|ak) ≤ ∑_{k=0}^{M−1} p(ak) ℓ̄k < ∑_{k=0}^{M−1} p(ak) H(Sn|ak) + 1    (34)

where the average codeword length of the conditional code is given by

    ℓ̄ = ∑_{k=0}^{M−1} p(ak) ℓ̄k    (35)
Conditional Entropy
Minimum average codeword length and conditional entropy
The lower bound for the average codeword length of conditional codes is called the conditional entropy of Sn given the random variable Sn−1

    H(Sn|Sn−1) = E{− log2 p(Sn|Sn−1)} = ∑_{k=0}^{M−1} p(ak) H(Sn|Sn−1 = ak)    (36)

The minimum achievable average codeword length ℓ̄min for conditional codes is bounded by

    H(Sn|Sn−1) ≤ ℓ̄min < H(Sn|Sn−1) + 1    (37)

Conditioning may reduce the average codeword length
The conditional entropy is less than or equal to the marginal entropy (with equality if and only if the process {Sn} is i.i.d.)

    H(Sn|Sn−1) ≤ H(Sn)    (38)
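The conditional entropy can be checked numerically for the Markov source of the conditional Huffman code example; the following sketch (an added illustration using NumPy, not part of the original slides) computes the stationary marginal pmf and evaluates H(Sn) and H(Sn|Sn−1).

import numpy as np

# Conditional pmfs p(a|ak) of the example Markov source (row k: conditioning symbol ak)
P = np.array([[0.90, 0.05, 0.05],
              [0.15, 0.80, 0.05],
              [0.25, 0.15, 0.60]])

# Stationary marginal pmf: left eigenvector of P for eigenvalue 1, normalized to sum 1
evals, evecs = np.linalg.eig(P.T)
p = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
p /= p.sum()

def entropy(pmf):
    pmf = pmf[pmf > 0]
    return -np.sum(pmf * np.log2(pmf))

H_marginal    = entropy(p)                                   # H(Sn)       ≈ 1.258
H_conditional = sum(p[k] * entropy(P[k]) for k in range(3))  # H(Sn|Sn−1)  ≈ 0.733
print(H_marginal, H_conditional)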
Huffman Coding of Vectors
Joint coding of blocks of N symbols
Stationary discrete random sources S = {Sn} with an M-ary alphabet A = {a0, · · · , aM−1}
N symbols are coded jointly: Design Huffman code for joint pmf p(a0, · · · , aN−1) = P(Sn = a0, · · · , Sn+N−1 = aN−1)
Minimum average codeword length ℓ̄min per symbol is bounded by

    H(Sn, · · · , Sn+N−1) / N ≤ ℓ̄min < H(Sn, · · · , Sn+N−1) / N + 1/N    (39)

where the block entropy is defined by

    H(Sn, · · · , Sn+N−1) = E{− log2 p(Sn, · · · , Sn+N−1)}    (40)

The following limit is called entropy rate

    H̄(S) = lim_{N→∞} H(S0, · · · , SN−1) / N    (41)
Entropy Rate
Entropy rate as fundamental bound for lossless coding
Entropy rate

    H̄(S) = lim_{N→∞} H(S0, · · · , SN−1) / N    (42)

The limit in (42) always exists for stationary sources [Gallager, 1968]
The entropy rate H̄(S) is the greatest lower bound for the average codeword length ℓ̄ per symbol for all lossless coding techniques

    ℓ̄ ≥ H̄(S)    (43)

Entropy rate for iid processes

    H̄(S) = lim_{N→∞} E{− log2 p(S0, S1, · · · , SN−1)} / N
         = lim_{N→∞} ( ∑_{n=0}^{N−1} E{− log2 p(Sn)} ) / N = H(S)    (44)
Entropy Rate for Markov Processes
Entropy rate for stationary Markov processes
    H̄(S) = lim_{N→∞} E{− log2 p(S0, S1, · · · , SN−1)} / N
         = lim_{N→∞} ( E{− log2 p(S0)} + ∑_{n=1}^{N−1} E{− log2 p(Sn|Sn−1)} ) / N
         = H(Sn|Sn−1)    (45)

Example: Joint Huffman coding of 2 events and ℓ̄ vs. table size NC

aiak    p(ai, ak)    codeword
a0a0    0.58         1
a0a1    0.032        00001
a0a2    0.032        00010
a1a0    0.036        0010
a1a1    0.195        01
a1a2    0.012        000000
a2a0    0.027        00011
a2a1    0.017        000001
a2a2    0.06         0011

N    ℓ̄        NC
1    1.3556   3
2    1.0094   9
3    0.9150   27
4    0.8690   81
5    0.8462   243
6    0.8299   729
7    0.8153   2187
8    0.8027   6561
9    0.7940   19683
Elias Coding and Arithmetic Coding
Motivation
Main drawbacks of block Huffman codes: Large table sizes
Another class of uniquely decodable codes are Elias codes and arithmetic codes
Encoding: Mapping of a string of N symbols s = {s0, s1, ..., sN−1} onto a string of K bits b = {b0, b1, ..., bK−1}

    γ : s ↦ b    (46)

Decoding or parsing: Mapping the bit string onto the string of symbols

    γ^{−1} : b ↦ s    (47)
Complexity of code construction: Linear per symbol
Construction method: Recursive subdivision of the unit interval [0, 1)
Iterative encoding and decoding procedures
Mapping of Symbol Sequences to Numbers
Representing a symbol sequence by a number in the interval [0, 1)
Consider a sequence of N random variables S(N) = {S0, S1, . . . , SN−1}, each Si with alphabet of size Mi
Order the alphabet symbols and let ηi(si) be a function that returns the corresponding symbol index in the range from 0 to Mi − 1
A realization s(N) = {s0, s1, . . . , sN−1} of S(N) can be represented by a unique real number r ∈ [0, 1)

    r = ζ( s(N) ) = ∑_{i=0}^{N−1} ηi(si) · Bi   with   Bi = ∏_{j=0}^{i} Mj^{−1}    (48)

Note that when all Mj = M, the basis simplifies to Bi = M^{−i−1}
Define comparison operators for symbol sequences

    s_a(N) > s_b(N)  ⇐⇒  ζ( s_a(N) ) > ζ( s_b(N) )    (49)
Mapping of Symbol Sequences to Intervals
Representing a symbol sequence by a probability interval
The probability of the symbol sequence s(N) can be written as

    p( s(N) ) = P( S(N) = s(N) ) = P( S(N) ≤ s(N) ) − P( S(N) < s(N) )    (50)

A symbol sequence s(N) can be represented by an interval IN between two successive levels of the cumulative probability mass function

    IN = [ LN, LN + WN ) = [ P( S(N) < s(N) ), P( S(N) ≤ s(N) ) )    (51)

with

    LN = P( S(N) < s(N) )   and   WN = P( S(N) = s(N) )    (52)

The intervals IN for different symbol sequences s(N) are disjoint
A symbol sequence s(N) can be uniquely represented by any value v inside the interval IN
How Many Bits for Identifying an Interval?
Bit sequence b = {b0, b1, · · · , bK−1} of K bits for representing an interval IN
Represent the value v as a binary fraction

    v = ∑_{i=0}^{K−1} bi · 2^{−i−1} = 0.b0 b1 · · · bK−1   with   v ∈ IN( s(N) )    (53)

Size of interval p( s(N) ) governs the number K of required bits
p(s(N)) = 1/2 → B = {.0, .1}
p(s(N)) = 1/4 → B = {.00, .01, .10, .11}
p(s(N)) = 1/8 → B = {.000, .001, .010, .011, .100, .101, .110, .111}
Minimum number of bits is

    K = K( s(N) ) = ⌈ − log2 p( s(N) ) ⌉    (54)

Binary number v that identifies the interval IN and determines the bit string b

    v = ⌈ LN · 2^K ⌉ · 2^{−K}    (55)
Redundancy of Elias Coding
Average codeword length
Consider coding of N symbols
Average codeword length per symbol

    ℓ̄ = (1/N) E{ K( S(N) ) } = (1/N) E{ ⌈ − log2 p( S(N) ) ⌉ }    (56)

Applying inequalities ⌈x⌉ ≥ x and ⌈x⌉ < x + 1 yields

    (1/N) E{ − log2 p( S(N) ) } ≤ ℓ̄ < (1/N) E{ 1 − log2 p( S(N) ) }    (57)

Average codeword length is bounded

    (1/N) H( S(N) ) ≤ ℓ̄ < (1/N) H( S(N) ) + 1/N    (58)

The redundancy approaches zero as the number N of coded symbols approaches infinity (similar to block Huffman coding)
Example: IID Source
Example for an iid source for which an optimum Huffman code exists
symbol ak     pmf p(ak)      Huffman code
a0 = 'A'      0.25 = 2^−2    00
a1 = 'B'      0.25 = 2^−2    01
a2 = 'C'      0.50 = 2^−1    1

Suppose we intend to send the symbol string s = 'CABAC'
Using the Huffman code, the bit string would be b = '10001001'
An alternative to Huffman coding is Elias coding
The probability p(s) is given by

    p(s) = p('C') · p('A') · p('B') · p('A') · p('C') = 1/2 · 1/4 · 1/4 · 1/4 · 1/2 = 1/256

The size of the bit string is ⌈ − log2 p(s) ⌉ = 8 bit
Iterative Algorithm for Elias Coding
Iterative interval subdivision
Consider sub-sequences s(n) = {s0, s1, · · · , sn−1} with 1 ≤ n ≤ N
Iteration rule for the interval width

    Wn+1 = Wn · p( sn | s0, s1, . . . , sn−1 )    (59)

Iteration rule for the lower interval boundary

    Ln+1 = Ln + Wn · c( sn | s0, s1, . . . , sn−1 )    (60)

with the cumulative probability mass function (cmf) c(·) being defined as

    c( sn | s0, s1, . . . , sn−1 ) = ∑_{∀a ∈ An: a < sn} p( a | s0, s1, . . . , sn−1 )    (61)
An interval In+1 is always nested inside the interval In
Iterative Interval Subdivision for Different Sources
Iterative interval subdivision for different sources
Derivation above for the general case of dependent and differently distributed random variables
For i.i.d. sources, interval refinement can be simplified

    Wn = Wn−1 · p(sn−1)    (62)
    Ln = Ln−1 + Wn−1 · c(sn−1)    (63)

For Markov sources with conditional pmf p(sn|sn−1) and conditional cmf c(sn|sn−1)

    Wn = Wn−1 · p(sn−1|sn−2)    (64)
    Ln = Ln−1 + Wn−1 · c(sn−1|sn−2)    (65)

For non-stationary sources, the probabilities p(·) can be adapted during the coding process
Encoding Algorithm for Elias Codes
Encoding algorithm:
1 Given is a sequence {s0, · · · , sN−1} of N symbols
2 Initialization of the iterative process by W0 = 1, L0 = 0
3 For each n = 0, 1, · · · , N − 1, determine the interval In+1 by
      Wn+1 = Wn · p(sn|s0, · · · , sn−1)
      Ln+1 = Ln + Wn · c(sn|s0, · · · , sn−1)
4 Determine the codeword length by K = ⌈ − log2 WN ⌉
5 Transmit the codeword b of K bits that represents the fractional part of v = ⌈ LN 2^K ⌉ 2^{−K}
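A direct translation of these five steps into Python is sketched below for the iid example alphabet of the 'CABAC' slides (the dictionaries PMF and CMF and the name elias_encode are choices made for this illustration; exact rational arithmetic via Fraction sidesteps the precision issue that arithmetic coding addresses later).

from math import ceil, log2
from fractions import Fraction

# iid example alphabet: p('A') = p('B') = 1/4, p('C') = 1/2
PMF = {'A': Fraction(1, 4), 'B': Fraction(1, 4), 'C': Fraction(1, 2)}
# cmf c(a): sum of the probabilities of all letters smaller than a
CMF = {'A': Fraction(0), 'B': Fraction(1, 4), 'C': Fraction(1, 2)}

def elias_encode(symbols):
    """Elias encoding of an iid symbol string with exact interval arithmetic."""
    W, L = Fraction(1), Fraction(0)            # step 2
    for s in symbols:                          # step 3: interval refinement
        L = L + W * CMF[s]
        W = W * PMF[s]
    K = ceil(-log2(W))                         # step 4: codeword length
    v = Fraction(ceil(L * 2**K), 2**K)         # step 5: interval representative
    return format(int(v * 2**K), f'0{K}b')     # K bits of the binary fraction of v

print(elias_encode('CABAC'))   # '10001001', matching the worked example below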
Example for Elias Encoding
s0 = 'C':   W1 = W0 · p('C') = 1 · 2^−1 = 2^−1 = (0.1)b
            L1 = L0 + W0 · c('C') = L0 + 1 · 2^−1 = 2^−1 = (0.1)b
s1 = 'A':   W2 = W1 · p('A') = 2^−1 · 2^−2 = 2^−3 = (0.001)b
            L2 = L1 + W1 · c('A') = L1 + 2^−1 · 0 = 2^−1 = (0.100)b
s2 = 'B':   W3 = W2 · p('B') = 2^−3 · 2^−2 = 2^−5 = (0.00001)b
            L3 = L2 + W2 · c('B') = L2 + 2^−3 · 2^−2 = 2^−1 + 2^−5 = (0.10001)b
s3 = 'A':   W4 = W3 · p('A') = 2^−5 · 2^−2 = 2^−7 = (0.0000001)b
            L4 = L3 + W3 · c('A') = L3 + 2^−5 · 0 = 2^−1 + 2^−5 = (0.1000100)b
s4 = 'C':   W5 = W4 · p('C') = 2^−7 · 2^−1 = 2^−8 = (0.00000001)b
            L5 = L4 + W4 · c('C') = L4 + 2^−7 · 2^−1 = 2^−1 + 2^−5 + 2^−8 = (0.10001001)b
Termination:  K = ⌈ − log2 W5 ⌉ = 8,   v = ⌈ L5 2^K ⌉ 2^−K = 2^−1 + 2^−5 + 2^−8,   b = '10001001'
Illustration of Iterative Coding
[Figure: illustration of the iterative (nested) interval subdivision]
Decoding Algorithm for Elias Codes
Decoding algorithm:
1 Given is the number N of symbols to be decoded and a codeword b = {b0, · · · , bK−1} of K bits
2 Determine the interval representative v according to

      v = ∑_{i=0}^{K−1} bi 2^{−i−1}

3 Initialization of the iterative process by W0 = 1, L0 = 0
4 For each n = 0, 1, · · · , N − 1, do the following:
   1 For each ai ∈ An, determine the interval In+1(ai) by
         Wn+1(ai) = Wn · p(ai|s0, . . . , sn−1)
         Ln+1(ai) = Ln + Wn · c(ai|s0, . . . , sn−1)
   2 Select the letter ai ∈ An for which v ∈ In+1(ai) and set sn = ai, Wn+1 = Wn+1(ai), Ln+1 = Ln+1(ai)
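For completeness, a matching decoder sketch is given below (again an added illustration, not part of the original slides); it reuses the PMF and CMF dictionaries defined with the encoder sketch above.

from fractions import Fraction

def elias_decode(bits, num_symbols):
    """Elias decoding for the iid example alphabet; mirrors elias_encode above."""
    K = len(bits)
    v = Fraction(int(bits, 2), 2**K)           # step 2: v = 0.b0 b1 ... bK−1
    W, L = Fraction(1), Fraction(0)            # step 3
    decoded = []
    for _ in range(num_symbols):               # step 4
        for a in sorted(PMF):                  # 4.1: candidate intervals In+1(ai)
            W_a = W * PMF[a]
            L_a = L + W * CMF[a]
            if L_a <= v < L_a + W_a:           # 4.2: select letter with v in its interval
                decoded.append(a)
                W, L = W_a, L_a
                break
    return ''.join(decoded)

print(elias_decode('10001001', 5))   # 'CABAC'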
Arithmetic Coding
Arithmetic coding as finite precision implementation of Elias coding
Problem with Elias codes: Precision requirement for WN and LN
Arithmetic codes: Variant of Elias codes with fixed-precision arithmetic
Represent pmfs p(a) and cmfs c(a) and width Wn with finite number of bits
Loss in coding efficiency due to rounding is typically negligible
Representation of the lower interval boundary Ln has the structure
    Ln = 0.[ aaaaa···a ][ 0111111···1 ][ xxxxx···x ][ 00000··· ]
            settled bits  outstanding bits  active bits  trailing bits

where
"settled bits" are not modified in following interval updates (can be output)
"outstanding bits" may be modified by a carry from the active bits
"active bits" are directly modified by the following interval update
Most practical variant of arithmetic coding is binary arithmetic coding
Symbols are first binarized using a variable length code
Decoding search is reduced to one comparison
Multiplication-free algorithms (e.g., M coder) for binary arithmetic coding
Comparison of Lossless Coding Techniques
Example: Markov source
Instantaneous entropy rate Hinst(S, L) = (1/L) H(S0, S1, . . . , SL−1)
Conditional and Adaptive Codes
Coding of source with memory and/or varying statistics
One approach would be switched Huffman codes trained on the conditional probabilities
The resulting number of Huffman code tables is often too large in practice
Hence, conditional entropy coding is typically done using arithmetic codes
In adaptive arithmetic coding, probabilities p(ak) are estimated/adapted simultaneously at encoder and decoder
Statistical dependencies can be exploited using so-called context modeling techniques:
Conditional probabilities p(ak|zk) with zk being a context/state that is simultaneously computed at encoder and decoder
Forward and Backward Adaptation
The two basic approaches for adaptation are
Forward adaptation:
Gather statistics for a large enough block of source symbols
Transmit adaptation signal to decoder as side information
Disadvantage: Increased bit rate due to side information
Backward adaptation:
Gather statistics simultaneously at coder and decoder
Drawback: Error resilience
With today's packet-switched transmission systems, an efficient combination of the two adaptation approaches can be achieved:
1 Gather statistics for the entire packet and provide initialization of the entropy code at the beginning of the packet
2 Conduct backward adaptation for each symbol inside the packet in order to minimize the size of the packet
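A minimal sketch of the backward-adaptive part is given below (an illustration only; the class name AdaptiveModel and the count-based estimate are assumptions, not a specific codec's method). Encoder and decoder create the model with the same initial state, query it before coding each symbol, and update it afterwards in the same order, so the probability estimates stay synchronized without side information.

class AdaptiveModel:
    """Backward-adaptive pmf estimate from symbol counts."""
    def __init__(self, alphabet):
        # Initial counts of 1 avoid zero probabilities before any symbol is seen
        self.counts = {a: 1 for a in alphabet}
        self.total = len(alphabet)

    def pmf(self, symbol):
        """Current probability estimate used to drive the arithmetic coder."""
        return self.counts[symbol] / self.total

    def update(self, symbol):
        """Called after coding each symbol, in identical order at encoder and decoder."""
        self.counts[symbol] += 1
        self.total += 1

In the combined scheme described in the two steps above, the statistics gathered for a whole packet would additionally be signaled once as initialization at the start of the packet.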
Illustration of Adaptive Coding
[Figure: block diagrams of forward and backward adaptation. Forward adaptation: an adaptation signal is computed (with delay) from the source symbols and sent over the channel together with the encoded symbols; encoding and decoding of the reconstructed symbols use this signal. Backward adaptation: encoder and decoder each compute the adaptation signal (with delay) from previously coded symbols, so only the encoded symbols are sent over the channel.]
Summary
Entropy is the lower bound for the average number of bits/symbol for uniquely decodable scalar codes
Entropy rate is the lower bound for the average number of bits/symbol for all uniquely decodable lossless codes
Huffman coding
is an efficient and simple entropy coding method
needs a code table
can be inefficient for certain probabilities
is difficult to use for exploiting statistical dependencies and time-varying probabilities
Arithmetic coding
is a universal method for encoding strings of symbols
does not need a code table, but a table for storing probabilities
typically requires serial computation of interval and probability estimation update (in case probabilities are adapted)
approaches entropy for long strings
is well suited for exploiting statistical dependencies and coding with time-varying probabilities