Arithmetic Coding

Source Coding

Wireless Ad Hoc Networks

University of Tehran, Dept. of E&CE, Fall 2007, Farshad Lahouti

Media Basics

Contents: Brief introduction to digital media (Audio/Video)
  Digitization
  Compression
  Representation
  Standards

Signal Digitization

Pulse Code Modulation (PCM)

Sampling

Sampling theory – Nyquist theorem
  The discrete-time sequence of a sampled continuous function { V(tn) } contains enough information to reproduce the function V = V(t) exactly, provided that the sampling rate is at least twice the highest frequency contained in the original signal V(t).

Analog signal sampled at a constant rate
  Telephone: 4 kHz signal BW, 8,000 samples/sec
  CD music: 22 kHz signal BW, 44,100 samples/sec

Quantization

Discretization along the amplitude axis
  At every sampling interval the signal value is converted to a digital equivalent
  Example: with 2 bits, each sample is mapped to one of 4 levels
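A minimal sketch of a uniform quantizer, to make the 2-bit case concrete. The input range [-1, 1] and the function names `quantize` and `dequantize` are assumptions made for illustration; they do not come from the slides.

```python
def quantize(x, bits, lo=-1.0, hi=1.0):
    """Map x in [lo, hi] to one of 2**bits integer levels (uniform quantizer)."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    index = int((x - lo) / step)
    # Clamp so that x == hi (or out-of-range input) still maps to a valid level.
    return min(max(index, 0), levels - 1)


def dequantize(index, bits, lo=-1.0, hi=1.0):
    """Reconstruct the midpoint of the quantization cell (assumed reconstruction rule)."""
    step = (hi - lo) / 2 ** bits
    return lo + (index + 0.5) * step


samples = [-1.0, -0.3, 0.1, 0.8, 1.0]
codes = [quantize(s, bits=2) for s in samples]
print(codes)
print([dequantize(c, bits=2) for c in codes])
```

The gap between each sample and its reconstructed value is the quantization error, which is the "quality reduction" seen at the receiver.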

Digitization Examples

Each sample quantized, i.e., rounded
  e.g., 2^8 = 256 possible quantized values
Each quantized value represented by bits
  8 bits for 256 values
Example: 8,000 samples/sec, 256 quantized values --> 64,000 bps
Receiver converts it back to analog signal:
  some quality reduction
Example rates
  CD: 1.411 Mbps – 16 bits/sample stereo
  Internet telephony: 5.3 - 13 kbps
  MP3: 96, 128, 160 kbps

Approximate size for 1 second audio

File Size   Fs         Resolution   Channels
64 Kb       8 kHz      8 bit        Mono
128 Kb      8 kHz      8 bit        Stereo
128 Kb      8 kHz      16 bit       Mono
512 Kb      16 kHz     16 bit       Stereo
1411 Kb*    44.1 kHz   16 bit       Stereo
2116 Kb     44.1 kHz   24 bit       Stereo

* 1 CD: 700 MB, 70-80 mins
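The sizes for one second of uncompressed audio follow from sample rate x resolution x channels. A small sketch of that arithmetic, where the helper name `pcm_bitrate_kbps` is hypothetical and 1 Kb is taken as 1000 bits:

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM bit rate in kilobits per second (1 Kb = 1000 bits)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


for fs, bits, ch in [(8_000, 8, 1), (8_000, 16, 1), (44_100, 16, 2), (44_100, 24, 2)]:
    print(fs, bits, ch, pcm_bitrate_kbps(fs, bits, ch))
```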

Lossy and lossless Compression

Lossless compression (more later)
  Data compression
  APE (Monkey's Audio)
  Image compression for biomedical applications
  …
Lossy compression
  Hide errors where humans will not see or hear them
  Study the hearing and vision systems to understand how we see/hear
  Perceptual coding

Lossless compression
  Decoded signal is mathematically equivalent to the original one
  Drawback: achieves only a small or modest level of compression
Lossy compression
  Decoded signal is of a lower quality than the original one
  Advantage: achieves a very high degree of compression
  Objective: maximize the degree of compression at a given quality

General compression requirements
  Ensure a good quality of decoded signal
  Achieve high compression ratios
  Minimize the complexity of the encoding and decoding process
  Support multiple channels
  Support various data rates
  Give small delay in processing

Requirements for Compression Algorithms

Compression Tools

Transform Coding
Variable Rate Coding
Entropy Coding
  Huffman Coding
  Run-length Coding
Predictive Coding
  DPCM
  ADPCM

Ignores semantics of input data and compresses media streams by regarding them as sequences of digits or symbols

Examples: run-length encoding, Huffman encoding, ...

Run-length encoding
  A compression technique that replaces consecutive occurrences of a symbol with the symbol followed by the number of times it is repeated
  a a a a a => 5a
  000000000000000000001111111 => 0x20 1x7
  Most useful where symbols appear in long runs: e.g., for images that have areas where the pixels all have the same value, fax and cartoons for example.
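A minimal run-length coder for the bit string above. Representing the output as a list of (symbol, count) pairs is an assumption made for readability; real formats pack the runs into bytes.

```python
from itertools import groupby


def rle_encode(symbols):
    """Return (symbol, run_length) pairs for each run of repeated symbols."""
    return [(sym, len(list(run))) for sym, run in groupby(symbols)]


def rle_decode(pairs):
    """Expand (symbol, run_length) pairs back into the original string."""
    return "".join(sym * count for sym, count in pairs)


bits = "0" * 20 + "1" * 7
pairs = rle_encode(bits)
print(pairs)
print(rle_decode(pairs) == bits)
```

On data without long runs the pair list is longer than the input, which is why the technique suits fax pages and cartoons rather than photographs.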

Variable Length Coding

Entropy coding
A few words about Entropy

Entropy
  A measure of information content
Entropy of the English Language
  How much information does each character in “typical” English text contain?

From a probability view
  If the probability of a binary event is 0.5 (like a coin), then, on average, you need one bit to represent the result of this event.
  As the probability of a binary event moves away from 0.5 in either direction, the number of bits you need, on average, to represent the result decreases.
  The figure expresses that unless an event is totally random, you can convey the information of the event in fewer bits, on average, than it might first appear.

Entropy (Shannon 1948)

For a set of messages S with probability p(s), s ∈ S, the self information of s is:

  i(s) = \log \frac{1}{p(s)} = -\log p(s)

measured in bits if the log is base 2.
The lower the probability, the higher the self-information.
Entropy is the weighted average of self information:

  H(S) = \sum_{s \in S} p(s) \log \frac{1}{p(s)}

Entropy Example

(logs are base 2)

p(S) = {0.25, 0.25, 0.25, 0.125, 0.125}
H(S) = 3 × 0.25 log 4 + 2 × 0.125 log 8 = 2.25

p(S) = {0.5, 0.125, 0.125, 0.125, 0.125}
H(S) = 0.5 log 2 + 4 × 0.125 log 8 = 2

p(S) = {0.75, 0.0625, 0.0625, 0.0625, 0.0625}
H(S) = 0.75 log(4/3) + 4 × 0.0625 log 16 ≈ 1.3
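The three results can be checked numerically. This is only a sketch of the entropy definition; the function name `entropy` is chosen here for illustration.

```python
from math import log2


def entropy(probs):
    """Shannon entropy in bits: sum of p * log2(1/p), skipping zero-probability symbols."""
    return sum(p * log2(1 / p) for p in probs if p > 0)


for dist in (
    [0.25, 0.25, 0.25, 0.125, 0.125],
    [0.5, 0.125, 0.125, 0.125, 0.125],
    [0.75, 0.0625, 0.0625, 0.0625, 0.0625],
):
    print(dist, round(entropy(dist), 2))
```

The more skewed the distribution, the lower the entropy, and so the fewer bits per symbol an ideal entropy coder needs.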

Statistical (Entropy) Coding
Entropy Coding
  • Lossless coding
  • Takes advantage of the probabilistic nature of information
  • Example: Huffman coding, arithmetic coding

Theorem (Shannon) (lower bound): For any probability distribution p(S) with associated uniquely decodable code C,

  H(S) \le l_a(C)

where l_a(C) is the average codeword length of C.

Recall Huffman coding…

A popular compression technique that assigns variable length codes to symbols, so that the most frequently occurring symbols have the shortest codes
Huffman coding is particularly effective where the data are dominated by a small number of symbols
Suppose we want to encode a source of N = 8 symbols: {a,b,c,d,e,f,g,h}
The probabilities of these symbols are: P(a) = 0.01, P(b) = 0.02, P(c) = 0.05, P(d) = 0.09, P(e) = 0.18, P(f) = 0.2, P(g) = 0.2, P(h) = 0.25
If we assign 3 bits per symbol (N = 2^3 = 8), the average length of the symbols is \sum_{i=1}^{N} 3 P(i) = 3 bits/symbol

The theoretical lowest average length – entropy
  H(P) = -\sum_{i=1}^{N} P(i) \log_2 P(i) ≈ 2.58 bits/symbol

If we use Huffman encoding, the average length = 2.63 bits/symbol

Huffman Coding

The Huffman code assignment procedure is based on a binary tree structure. This tree is developed by a sequence of pairing operations in which the two least probable symbols are joined at a node to form two branches of a tree. More precisely:

1. The list of probabilities of the source symbols is associated with the leaves of a binary tree.
2. Take the two smallest probabilities in the list and generate an intermediate node as their parent. Label the branch from the parent to one of the child nodes 1 and the branch from the parent to the other child 0.
3. Replace the two probabilities and associated nodes in the list by the single new intermediate node with the sum of the two probabilities. If the list contains only one element, quit. Otherwise, go to step 2.
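The pairing procedure can be sketched with a priority queue. This is one possible implementation, not necessarily the one used to produce the slides: the function name `huffman_code`, the dictionary representation, and the choice of which child receives 0 or 1 are all assumptions.

```python
import heapq
from itertools import count


def huffman_code(probs):
    """Build a prefix code from {symbol: probability} by repeatedly pairing the two smallest."""
    tie = count()  # unique tie-breaker so the heap never has to compare dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)  # smallest probability
        p1, _, codes1 = heapq.heappop(heap)  # second smallest
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]


probs = {"a": 0.01, "b": 0.02, "c": 0.05, "d": 0.09,
         "e": 0.18, "f": 0.2, "g": 0.2, "h": 0.25}
code = huffman_code(probs)
avg_len = sum(probs[s] * len(code[s]) for s in probs)
print(code)
print(round(avg_len, 2))
```

For the eight-symbol source used in these slides the codeword lengths come out as 6, 6, 5, 4, 3, 2, 2, 2 bits for a through h. The exact bit patterns depend on the 0/1 labelling convention, so they may differ from a hand-drawn tree while the lengths stay the same.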

Huffman Coding (Cont’d)

The new average length of the source is 2.63 bits/symbol

The efficiency of this code is the ratio of the entropy to the average length, H(P)/2.63, about 98%
How do we estimate the P(i)? Relative frequency of the symbols
How to decode the bit stream? Share the same Huffman table
How to decode the variable length codes? Prefix codes have the property that no codeword can be the prefix (i.e., an initial segment) of any other codeword. Huffman codes are prefix codes!

11010000000010001 => ?
Are the best possible codes guaranteed to always reduce the size of sources? No. Worst cases exist. Huffman coding is better on average.

Huffman Coding (Cont’d)

Transform Coding
Frequency analysis?
  Time domain? Not easy!
  Time domain -> Transform domain

Sequence to be coded is converted into a new sequence using a transformation rule.
New sequence - transform coefficients.
Process is reversible - get back to the original sequence using the inverse transformation.

Example - Fourier transform (FT)
  Coefficients represent proportion of energy contributed by different frequencies.

Transform Coding (Cont…)

In transform coding - choose a transformation such that only a subset of coefficients have significant values.
Energy confined to a subset of ‘important’ coefficients.
Known as ‘energy compaction’.
Example - FT of a bandlimited signal
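Energy compaction can be illustrated numerically. The sketch below swaps in an orthonormal DCT-II instead of the Fourier transform named on the slide, because it keeps everything real-valued and is the transform used later for JPEG. The ramp signal and the function name `dct` are assumptions chosen for illustration.

```python
from math import cos, pi, sqrt


def dct(x):
    """Orthonormal DCT-II of a list of samples (preserves total energy)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * cos(pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        out.append(s * sqrt((1 if k == 0 else 2) / n))
    return out


x = [10, 11, 12, 13, 14, 15, 16, 17]   # slowly varying signal
coeffs = dct(x)
energy = [c * c for c in coeffs]
share = sum(energy[:2]) / sum(energy)   # energy fraction in the first two coefficients
print([round(c, 2) for c in coeffs])
print(share)
```

Because the input changes slowly, almost all of the energy lands in the first couple of coefficients, and the remaining ones can be coded coarsely or dropped.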

Based on the fact that neighboring samples … x(n-1), x(n), x(n+1), … in a discrete time sequence change slowly in many applications, e.g., voice, audio, …
A differential PCM coder (DPCM) quantizes and encodes the difference d(n) = x(n) – x(n-1)
Advantage of using the difference d(n) instead of the actual value x(n)
  Reduce the number of bits to represent a sample
General DPCM: d(n) = x(n) – a1x(n-1) - a2x(n-2) - … - akx(n-k)
  a1, a2, …ak are fixed
Adaptive DPCM: a1, a2, …ak are dynamically changed with signal

Differential Coding – DPCM & ADPCM
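A minimal sketch of the first-order case d(n) = x(n) - x(n-1). It deliberately omits the quantizer, so it is lossless; a practical DPCM coder quantizes d(n) and predicts from the reconstructed samples. The function names and the starting value x(-1) = 0 are assumptions.

```python
def dpcm_encode(samples):
    """First-order differences d(n) = x(n) - x(n-1), with x(-1) assumed to be 0."""
    prev, diffs = 0, []
    for x in samples:
        diffs.append(x - prev)
        prev = x
    return diffs


def dpcm_decode(diffs):
    """Rebuild the samples by accumulating the differences."""
    prev, samples = 0, []
    for d in diffs:
        prev += d
        samples.append(prev)
    return samples


x = [100, 102, 105, 104, 104]
d = dpcm_encode(x)
print(d)
print(dpcm_decode(d))
```

After the first sample the differences are small numbers, which is where the saving in bits per sample comes from.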

Psychoacoustic Human aural response

Psychoacoustic Model
  Basically: If you can’t hear the sound, don’t encode it
Natural Bandlimiting
  Audio perception is 20 Hz - 20 kHz but most sounds are in low frequencies (e.g., 2 kHz to 4 kHz)

Human frequency response:
  Frequency masking: If a stronger sound and a weaker sound compete, you can’t hear the weaker sound. Don’t encode it.
  Temporal masking: After a loud sound, there’s a while before we can hear a soft sound.
  Stereo redundancy: At low frequencies, we can’t detect where the sound is coming from. Encode it mono.

Perceptual Coding: Examples
  MP3 = MPEG 1/2 layer 3 audio; achieves CD quality in about 192 kbps (a 3.7:1 compression ratio); higher compression possible
  Sony MiniDisc uses Adaptive Transform Acoustic Coding (ATRAC) to achieve a 5:1 compression ratio (about 141 kbps)

http://www.mpeg.org
http://www.minidisc.org/aes_atrac.html

Artefacts of compression

Some areas of the spectrum are lost in the encoding process

MP3 encoded recordings rarely sound identical to the original uncompressed audio files
On small or PC speakers, however, MP3 compressed audio can be acceptable

Examples

(1.12MB)
128kbps (105KB)
96Kbps (78.9KB)
64kbps (52.6KB)

WAV File (34Mb)
Mp3 file (3Mb)

LPC and Parametric Coding

LPC (Linear Predictive Coding)
  Based on a model of the human speech organs
  s(n) = a1s(n-1) + a2s(n-2) + … + aks(n-k) + e(n)
  Estimate a1, a2, …ak and e(n) for each piece (frame) of speech
  Encode and transmit/store a1, a2, …ak and type of e(n)
  Decoder reproduces speech using a1, a2, …ak and e(n) - very low bit rate but relatively low speech quality

Parametric coding: Only coding parameters of sound generation model
  LPC is an example where parameters are a1, a2, …ak, e(n)
  Music instrument parameters: pitch, loudness, timbre, …

Speech Compression

Handling speech with other media information such as text, images, video, and data is an essential part of multimedia applications
The ideal speech coder has a low bit-rate, high perceived quality, low signal delay, and low complexity.
Delay
  Less than 150 ms one-way end-to-end delay for a conversation
  Processing (coding) delay, network delay
  Over Internet, ISDN, PSTN, ATM, …
Complexity
  Computational complexity of speech coders depends on algorithms
  Contributes to achievable bitrate and processing delay

G.72x Speech Coding Standards

Quality
  “intelligible” -> “natural” or “subjective” quality
  Depending on bitrate

Bit-rate

G.72x Audio Coding Standards

Silence Compression - detect the "silence", similar to run-length coding
Adaptive Differential Pulse Code Modulation (ADPCM)
  e.g., in CCITT G.721 -- 16 or 32 Kb/s.
  (a) Encodes the difference between two or more consecutive signals; the difference is then quantized --> hence the loss
  (b) Adapts the quantization so fewer bits are used when the value is smaller. It is necessary to predict where the waveform is headed --> difficult
Linear Predictive Coding (LPC) fits the signal to a speech model and then transmits the parameters of the model --> sounds like a computer talking, 2.4 Kb/s.

Video Digitization and Compression
Video is a sequence of images (frames) displayed at a constant frame rate
  e.g. 24 images/sec
Digital image is a 2-D array of pixels
Each pixel represented by bits
  R:G:B
  Y:U:V
    Y = 0.299R + 0.587G + 0.114B (Luminance or Brightness)
    U = B - Y (Chrominance 1, color difference)
    V = R - Y (Chrominance 2, color difference)
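The three equations translate directly into code. This sketch uses the unscaled colour differences exactly as written above; deployed standards additionally scale and offset U and V, which is not shown here. The function name `rgb_to_yuv` is hypothetical.

```python
def rgb_to_yuv(r, g, b):
    """Luminance plus two unscaled colour differences, following the equations above."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = b - y
    v = r - y
    return y, u, v


print(rgb_to_yuv(255, 255, 255))  # white: full luminance, near-zero chrominance
print(rgb_to_yuv(255, 0, 0))      # pure red: large positive V
```

Grey pixels (R = G = B) carry essentially no chrominance, which is one reason the colour planes can be stored at lower resolution.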

Redundancy
  Spatial
  Temporal

Intra-frame coding

Transform Quantize Encode

JPEG (Joint Photographic Experts Group)

Original size: 640 x 480 x 3 = 922 KB

JPEG Compression Ratios:
  30:1 to 50:1 compression is possible with small to moderate defects
  100:1 compression is quite feasible for low-quality purposes

JPEG Steps
1 Block Preparation:
    From RGB to YUV (YIQ) planes
    8x8 blocks
2 Transform: 2-D Discrete Cosine Transform (DCT) on blocks (lossy?)
3 Quantization: Quantize DCT Coefficients (lossy)
4 Encoding of Quantized Coefficients (lossless)
    Zigzag Scan
    Differential Pulse Code Modulation (DPCM) on DC component
    Run Length Encoding (RLE) on AC Components
    Entropy Coding: Huffman or Arithmetic

[Figure: JPEG pipeline: Block Preparation -> Transform -> Quantize -> Encode. Decompression: reverse the order.]

[Figure: RGB input data; after block preparation]

(1) Block Preparation

Input image: 640 x 480 RGB (24 bits/pixel) transformed to three planes:

Y: (640 x 480, 8 bits/pixel) Luminance (brightness) plane.
U, V: (320 x 240, 8 bits/pixel) Chrominance (color) planes.

(2) Discrete Cosine Transform (DCT)
A transformation from spatial domain to frequency domain (similar to FFT)
Definition of 8-point DCT:

F[0,0] is the DC component and other F[u,v] define AC components of DCT
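The formula itself did not survive in this transcript. For reference, the usual form of the 8x8 forward DCT used in JPEG is given below; it is assumed, not confirmed, that the slide showed the same normalization. Here f(x,y) denotes the pixel value at position (x,y) of the block.

```latex
F[u,v] = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y)\,
         \cos\frac{(2x+1)u\pi}{16}\, \cos\frac{(2y+1)v\pi}{16},
\qquad
C(k) = \begin{cases} \dfrac{1}{\sqrt{2}} & k = 0 \\[4pt] 1 & k > 0 \end{cases}
```

With u = v = 0 both cosines equal 1, so F[0,0] is 8 times the block average, which is why it is called the DC component.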

The 64 (8 x 8) DCT Basis Functions

[Figure: the 64 basis functions; labels: DC Component, u]

Block-based 2-D DCT
  • Karhunen-Loeve (KL) transform ?

8x8 DCT Example

Original values of an 8x8 block (in spatial domain)

Corresponding DCT coefficients (in frequency domain)

[Figure labels: DC Component, u]

(3) Quantized DCT Coefficients

Uniform quantization:
  Divide by constant N and round the result.

In JPEG, each DCT coefficient F[u,v] is divided by a constant q(u,v).
  - q(u,v) is the quantization table (filter ?)

[Figure: example tables of F[u,v], q(u,v), and the rounded F[u,v]/q(u,v)]
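The divide-and-round step can be sketched as follows. The 2x2 coefficient and table values are made up for illustration and are not from the slides or from a standard JPEG table; the function names are hypothetical. Note that Python's `round` rounds exact halves to the nearest even integer.

```python
def quantize_block(F, q):
    """Divide each coefficient F[u][v] by its table entry q[u][v] and round."""
    return [[round(F[u][v] / q[u][v]) for v in range(len(F[u]))]
            for u in range(len(F))]


def dequantize_block(Fq, q):
    """Decoder side: multiply back by the table entry (the rounding error is lost)."""
    return [[Fq[u][v] * q[u][v] for v in range(len(Fq[u]))]
            for u in range(len(Fq))]


F = [[100, -31], [7, 3]]      # hypothetical DCT coefficients
q = [[16, 11], [12, 12]]      # hypothetical quantization table
Fq = quantize_block(F, q)
print(Fq)
print(dequantize_block(Fq, q))
```

Larger table entries for high-frequency positions push more coefficients to zero, which is what the later run-length stage exploits.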

(4) Zigzag Scan
Maps an 8x8 block into a 1 x 64 vector
The zigzag pattern groups low frequency coefficients at the top of the vector.

(5) Encoding of Quantized DCT Coefficients

DC components:
  The DC component of a block is large and varied, but often close to the DC value of the previous block.
  Encode the difference of the DC component from the previous 8x8 block using Differential Pulse Code Modulation (DPCM).

AC components:
  The 1x64 vector has lots of zeros in it.
  Using RLE, encode as (skip, value) pairs, where skip is the number of zeros and value is the next non-zero component.
  Send (0,0) as end-of-block value.

(6) Runlength Coding

Further compression: statistical (entropy) coding

A typical 8x8 block of quantized DCT coefficients. Most of the higher order coefficients have been quantized to 0.

12 34  0 54  0  0  0  0
87  0  0 12  3  0  0  0
16  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0

Zig-zag scan: the sequence of DCT coefficients to be transmitted:
12 34 87 16 0 0 54 0 0 0 0 0 0 12 0 0 3 0 0 0 .....

DC coefficient (12) is sent via a separate Huffman table.
Runlength coding remaining coefficients:

34 | 87 | 16 | 0 0 54 | 0 0 0 0 0 0 12 | 0 0 3 | 0 0 0 .....
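The scan and the run-length grouping for this block can be reproduced with a short sketch. The helper names `zigzag_order` and `run_length_pairs` are hypothetical, and the output is kept as plain (skip, value) tuples instead of the packed symbols a real JPEG encoder emits.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zigzag order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Walk anti-diagonals; odd diagonals run top to bottom, even ones bottom to top.
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def run_length_pairs(ac):
    """Encode AC coefficients as (skip, value) pairs, closing with the (0, 0) end-of-block."""
    pairs, zeros = [], 0
    for v in ac:
        if v == 0:
            zeros += 1
        else:
            pairs.append((zeros, v))
            zeros = 0
    pairs.append((0, 0))  # trailing zeros are implied by the end-of-block marker
    return pairs


block = [
    [12, 34, 0, 54, 0, 0, 0, 0],
    [87, 0, 0, 12, 3, 0, 0, 0],
    [16, 0, 0, 0, 0, 0, 0, 0],
] + [[0] * 8 for _ in range(5)]

scan = [block[r][c] for r, c in zigzag_order()]
print(scan[:20])
print(run_length_pairs(scan[1:]))  # DC coefficient scan[0] is coded separately
```

The 63 AC coefficients collapse to seven pairs here, which the entropy coder then compresses further.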

Compression Ratio: 12.3

Quantization Table Used

Compressed Image

Original Image

JPEG Example

Blocking artifact

(JPEG 2000 ?)

Intra-coded I-frame

Predicted P-frame

MPEG: Inter-Frame Coding

Motion Estimation + Compensation

Video compression: A big picture

Bi-Directional Prediction

I B B P B B P B B P B B I

Intra-coded I-frame

Bi-directional predicted B-frame

Group of frames (GOF)

Q: 3D Transform Coding ?

VBR vs CBR: Rate Control

[Figure: rate control diagram with labels Video Encoder, Raw VBR, Smoothing Buffer, Rate Controller, CBR]

Variable-Bit-Rate
  Fixed quantizer Qp
  “Constant” quality
  E.g. RMVB

Constant-Bit-Rate
  Adaptive quantizer
  “Constant” rate – easier control
  Difference compared to the target rate can be 0.5% or less
  E.g. RM, MPEG-1

Rate-distortion optimization
Recall that the transport layer also has rate control …

Standardization Organizations

ITU-T VCEG (Video Coding Experts Group)

standards for advanced moving image coding methods appropriate for conversational and non-conversational audio/visual applications.

ISO/IEC MPEG (Moving Picture Experts Group)

standards for compression and coding, decompression, processing, and coded representation of moving pictures, audio, and their combination

Relation
  ITU-T H.262 ~ ISO/IEC 13818-2 (MPEG-2)
    Generic Coding of Moving Pictures and Associated Audio.
  ITU-T H.263 ~ ISO/IEC 14496-2 (MPEG-4)

WG - work group
SG – sub group

ISO/IEC JTC 1/SC 29/WG 1Coding of Still Pictures

ISO/IEC JTC 1/SC 29/WG 11

Coding Rate and Standards

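[Figure: coding rate and standards. Bit-rate axis: 8, 16, 64, 384 kbit/s and 1.5, 5, 20 Mbit/s, divided into very low, low, medium and high bitrate. Applications: mobile videophone, videophone over PSTN, ISDN videophone, Video CD, Digital TV, HDTV. Standards: MPEG-4, H.263, H.261, MPEG-1, MPEG-2.]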

ISO MPEG-1 (Moving Pictures Experts Group).

MPEG-1
  Progressively scanned video for multimedia applications, at a bit rate of 1.5 Mb/s (CD-ROM player access rate).
  Video format: near VHS quality

ISO MPEG-2

MPEG-2

Standard for Digital Television, DVD
4 to 8 Mb/s / 10 to 15 Mb/s >> MPEG-1
Supports various modes of scalability (spatial, temporal, SNR)
There are differences in quantization and better variable length code tables for progressive video sequences.

ISO MPEG-4

A much broader standard.
MPEG-4 was aimed primarily at low bit rate video communication, but is not limited to it
Applications:
  1. Digital television
  2. Interactive graphics applications
  3. Interactive multimedia (World Wide Web)
Two versions: Divx 3 and Divx 4 (Internet world)
Important concept

Video object

A video frame

Background VOP

MPEG-4 Object Video

Instead of ”frames”: Video Object Planes
Shape Adaptive DCT

Alpha map

SA DCT

MPEG-4 Structure

[Figure: the bitstream feeds several A/V object decoders; their outputs (Object 1 to Object 4) are composited into the audio/video scene]

Example

Problems, comments?

Another Example

Status

Microsoft, RealVideo, QuickTime, ...

But only rectangular frame based

H.264 = MPEG-4 part 10 (2003)
Shape coding
Synthetic scene

H.264
H.26x (x=1,2,3)

ITU-T Recommendations Real time video communication applications.

MPEG Standards Video storage, broadcast video, video streaming applications

H.26L = ITU-T + MPEG = JVT coding
Latest project of the Joint Video Team formed by ITU-T SG16 Q6 (VCEG) and ISO/IEC JTC 1/SC 29 WG 11 (MPEG)
Basic configuration similar to H.263 and MPEG-4 Part 2

H.264 Design
Goals
  Enhanced compression performance
  Provision of a network friendly, packet based video representation addressing conversational and non-conversational applications
  Conceptual separation between Video Coding Layer (VCL) and Network Adaptation Layer (NAL)

H.264 Design ( Contd. )

[Figure: H.264 structure with labels Video Coding Layer, Data Partitioning, Network Adaptation Layer, Control Data, Macro-block, Slice/Partition]

H.264 Design ( Contd.)

Video Coding Layer
  Core high compression representation
  Block based motion compensated transform video coder
  New features enabled to achieve significant improvement in coding efficiency.

Network Adaptation Layer
  Provides the ability to customize the format of the VCL data over a variety of networks
  Unique packet based interface
  Packetisation and appropriate signaling is a part of the NAL specification

Video Coding Evolution

Y. Wang, J. Ostermann, Y.-Q. Zhang, Digital Video Processing and Communication. Prentice Hall, 2001.
