Post on 10-May-2015
Source Coding
Wireless Ad Hoc Networks
University of Tehran, Dept. of E&CE, Fall 2007, Farshad Lahouti
Media Basics
Contents: Brief introduction to digital media (audio/video)
Digitization
Compression
Representation
Standards
Signal Digitization
Pulse Code Modulation (PCM)
Sampling
Sampling theory – Nyquist theorem: the discrete-time sequence { V(tn) } of a sampled continuous function contains enough information to reproduce the function V = V(t) exactly, provided that the sampling rate is at least twice the highest frequency contained in the original signal V(t)
Analog signal sampled at a constant rate
telephone: 4 kHz signal BW, 8,000 samples/sec
CD music: 22 kHz signal BW, 44,100 samples/sec
Quantization
Discretization along the energy (amplitude) axis
In every time interval the signal is converted to a digital equivalent
Using 2 bits (4 levels), the signal in the figure can be digitized
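A minimal sketch of 2-bit uniform quantization, to make the idea concrete. The input range [-1, 1], the sample values, and the function names `quantize` and `dequantize` are assumptions for illustration; the slides do not specify them.

```python
def quantize(sample, bits, lo=-1.0, hi=1.0):
    """Map a sample in [lo, hi] to an integer code in [0, 2**bits - 1]."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    code = int((sample - lo) / step)
    return min(max(code, 0), levels - 1)  # clamp so that sample == hi stays in range


def dequantize(code, bits, lo=-1.0, hi=1.0):
    """Return the midpoint of the quantization cell that belongs to this code."""
    step = (hi - lo) / 2 ** bits
    return lo + (code + 0.5) * step


# Hypothetical samples; with 2 bits there are only 4 possible codes.
samples = [-0.9, -0.3, 0.2, 0.8]
codes = [quantize(s, 2) for s in samples]
recon = [dequantize(c, 2) for c in codes]
```

Each reconstructed value differs from the original by at most half a step, which is the quantization error the receiver has to live with.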
Digitization Examples
Each sample is quantized, i.e., rounded
e.g., 2^8 = 256 possible quantized values
Each quantized value is represented by bits
8 bits for 256 values
Example: 8,000 samples/sec, 256 quantized values --> 64,000 bps
Receiver converts it back to an analog signal: some quality reduction
Example rates
CD: 1.411 Mbps (16 bits/sample, stereo)
Internet telephony: 5.3 - 13 kbps
MP3: 96, 128, 160 kbps
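The PCM rates quoted above follow from sample rate × bits per sample × channels. A small sketch of that arithmetic (the function name `pcm_bit_rate` is made up for this example):

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


telephone_bps = pcm_bit_rate(8_000, 8)          # 8 bits = 256 quantized values
cd_bps = pcm_bit_rate(44_100, 16, channels=2)   # 16 bits/sample, stereo
```

Here `telephone_bps` is 64,000 and `cd_bps` is 1,411,200, i.e. the 64 kbps and 1.411 Mbps figures from the slide.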
Approximate size for 1 second of audio

File size   Fs        Resolution  Channels
64 Kb       8 kHz     8 bit       Mono
128 Kb      8 kHz     8 bit       Stereo
128 Kb      8 kHz     16 bit      Mono
512 Kb      16 kHz    16 bit      Stereo
1411 Kb*    44.1 kHz  16 bit      Stereo
2116 Kb     44.1 kHz  24 bit      Stereo

* 1 CD: 700 MB, 70-80 mins
Lossy and Lossless Compression
Lossless compression (more later)
Data compression
APE (Monkey's Audio)
Image compression for biomedical applications
…
Lossy compression
Hide errors where humans will not see or hear them
Study the hearing and vision systems to understand how we see/hear
Perceptual coding
Requirements for Compression Algorithms
Lossless compression
Decoded signal is mathematically equivalent to the original one
Drawback: achieves only a small or modest level of compression
Lossy compression
Decoded signal is of a lower quality than the original one
Advantage: achieves a very high degree of compression
Objective: maximize the degree of compression at a given quality
General compression requirements
Ensure a good quality of the decoded signal
Achieve high compression ratios
Minimize the complexity of the encoding and decoding process
Support multiple channels
Support various data rates
Give small delay in processing
Compression Tools
Transform Coding
Variable Rate Coding
Entropy Coding
Huffman Coding
Run-length Coding
Predictive Coding
DPCM
ADPCM
Entropy coding ignores the semantics of the input data and compresses media streams by regarding them as sequences of digits or symbols
Examples: run-length encoding, Huffman encoding, ...
Run-length encoding
A compression technique that replaces consecutive occurrences of a symbol with the symbol and the number of times it is repeated
a a a a a => 5a
000000000000000000001111111 => 0x20 1x7
Most useful where symbols appear in long runs, e.g., images that have areas where the pixels all have the same value, such as fax and cartoons.
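Run-length encoding as described above can be sketched in a few lines. This is an illustrative version that stores (symbol, count) pairs rather than any particular on-disk format; the names `rle_encode` and `rle_decode` are chosen for this example.

```python
from itertools import groupby


def rle_encode(data):
    """Collapse each run of identical symbols into a (symbol, run length) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(data)]


def rle_decode(pairs):
    """Expand (symbol, run length) pairs back into the original string."""
    return "".join(symbol * count for symbol, count in pairs)


bits = "0" * 20 + "1" * 7          # the 0x20 1x7 example from the slide
encoded = rle_encode(bits)         # [("0", 20), ("1", 7)]
restored = rle_decode(encoded)
```

On data without long runs the pair list is longer than the input, which is why the technique suits fax pages and cartoons rather than noisy images.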
Variable Length Coding
Entropy coding
A few words about entropy
Entropy
A measure of information content
Entropy of the English language
How much information does each character in “typical” English text contain?
From a probability view
If the probability of a binary event is 0.5 (like a coin flip), then, on average, you need one bit to represent the result of this event.
As the probability of a binary event moves away from 0.5, in either direction, the number of bits you need, on average, to represent the result decreases
The figure expresses that unless an event is totally random, you can convey the information of the event in fewer bits, on average, than it might first appear
Entropy (Shannon 1948)
For a set of messages S with probability p(s), s ∈ S, the self-information of s is:
$$i(s) = \log \frac{1}{p(s)} = -\log p(s)$$
measured in bits if the log is base 2.
The lower the probability, the higher the self-information
Entropy is the weighted average of self-information:
$$H(S) = \sum_{s \in S} p(s) \log \frac{1}{p(s)}$$
Entropy Example
$$p(S) = \{0.25, 0.25, 0.25, 0.125, 0.125\}$$
$$H(S) = 3 \times 0.25 \log 4 + 2 \times 0.125 \log 8 = 2.25$$
$$p(S) = \{0.5, 0.125, 0.125, 0.125, 0.125\}$$
$$H(S) = 0.5 \log 2 + 4 \times 0.125 \log 8 = 2$$
$$p(S) = \{0.75, 0.0625, 0.0625, 0.0625, 0.0625\}$$
$$H(S) = 0.75 \log(4/3) + 4 \times 0.0625 \log 16 = 1.3$$
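The three entropy examples can be checked numerically. A short sketch using base-2 logarithms, so the results are in bits; the function name `entropy` is chosen for this example.

```python
from math import log2


def entropy(probabilities):
    """H(S) = sum of p(s) * log2(1 / p(s)); zero-probability symbols contribute nothing."""
    return sum(p * log2(1 / p) for p in probabilities if p > 0)


h1 = entropy([0.25, 0.25, 0.25, 0.125, 0.125])        # 2.25 bits
h2 = entropy([0.5, 0.125, 0.125, 0.125, 0.125])       # 2.0 bits
h3 = entropy([0.75, 0.0625, 0.0625, 0.0625, 0.0625])  # about 1.31 bits
```

The more skewed the distribution, the lower the entropy, which is the room an entropy coder has to work with.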
Statistical (Entropy) Coding
Entropy coding
• Lossless coding
• Takes advantage of the probabilistic nature of information
• Examples: Huffman coding, arithmetic coding
Theorem (Shannon, lower bound): For any probability distribution p(S) with associated uniquely decodable code C,
$$H(S) \le l_a(C)$$
where l_a(C) is the average codeword length of C.
Huffman Coding
Recall Huffman coding…
A popular compression technique that assigns variable-length codes to symbols, so that the most frequently occurring symbols have the shortest codes
Huffman coding is particularly effective where the data are dominated by a small number of symbols
Suppose we want to encode a source of N = 8 symbols: {a,b,c,d,e,f,g,h}
The probabilities of these symbols are: P(a) = 0.01, P(b) = 0.02, P(c) = 0.05, P(d) = 0.09, P(e) = 0.18, P(f) = 0.2, P(g) = 0.2, P(h) = 0.25
If we assign 3 bits per symbol (N = 2^3 = 8), the average length of the symbols is 3 bits/symbol
The theoretical lowest average length is the entropy:
$$H(P) = -\sum_{i=1}^{N} P(i) \log_2 P(i) \approx 2.58 \text{ bits/symbol}$$
If we use Huffman encoding, the average length = 2.63 bits/symbol
The Huffman code assignment procedure is based on a binary tree structure. This tree is developed by a sequence of pairing operations in which the two least probable symbols are joined at a node to form two branches of a tree. More precisely:
1. The probabilities of the source symbols are associated with the leaves of a binary tree.
2. Take the two smallest probabilities in the list and generate an intermediate node as their parent; label the branch from the parent to one of the child nodes 1 and the branch from the parent to the other child 0.
3. Replace the two probabilities and their associated nodes in the list by the single new intermediate node with the sum of the two probabilities. If the list contains only one element, quit. Otherwise, go to step 2.
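The pairing procedure can be sketched with a priority queue. This is one possible implementation, not the one behind the slides; the function name `huffman_code`, the tie-breaking counter, and the choice of which branch gets 0 or 1 are assumptions. Different tie-breaking gives different codewords but the same average length.

```python
import heapq
from itertools import count


def huffman_code(probabilities):
    """Return {symbol: bit string}, built by repeatedly pairing the two least probable nodes."""
    tiebreak = count()  # unique counter so the heap never has to compare dicts
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)   # smallest probability
        p1, _, codes1 = heapq.heappop(heap)   # second smallest
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


# The 8-symbol source used in the slides.
P = {"a": 0.01, "b": 0.02, "c": 0.05, "d": 0.09,
     "e": 0.18, "f": 0.2, "g": 0.2, "h": 0.25}
code = huffman_code(P)
avg_len = sum(P[s] * len(code[s]) for s in P)   # about 2.63 bits/symbol
```

For this source the rare symbols a and b end up with 6-bit codewords while f, g and h get 2 bits each, which is where the saving over the fixed 3 bits/symbol comes from.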
Huffman Coding (Cont’d)
Huffman Coding (Cont’d)
The new average length of the source is 2.63 bits/symbol
The efficiency of this code is the ratio of the entropy to the average length, about 0.98
How do we estimate the P(i)? Relative frequency of the symbols
How to decode the bit stream? Share the same Huffman table
How to decode the variable-length codes? Prefix codes have the property that no codeword can be the prefix (i.e., an initial segment) of any other codeword. Huffman codes are prefix codes!
11010000000010001 => ?
Does the best possible code guarantee to always reduce the size of the source? No. Worst cases exist. Huffman coding is better on average.
Huffman coding is particularly effective where the data are dominated by a small number of symbols
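Because no codeword is a prefix of another, a decoder can read the bit stream left to right and emit a symbol as soon as the bits collected so far match a codeword. The sketch below uses a small made-up table (`toy_table`), not the table from the slides, which did not survive in this transcript.

```python
def decode_prefix(bits, table):
    """Decode a bit string with a prefix code given as {symbol: codeword}."""
    inverse = {codeword: symbol for symbol, codeword in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:        # a complete codeword has been read
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bit stream ends in the middle of a codeword")
    return "".join(out)


# Hypothetical prefix code, for illustration only.
toy_table = {"a": "0", "b": "10", "c": "110", "d": "111"}
decoded = decode_prefix("110100111", toy_table)   # "cbad"
```

Sender and receiver must share the same table, which is the point made above about sharing the Huffman table.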
Transform Coding
Frequency analysis?
Time domain? Not easy!
Time domain -> Transform domain
The sequence to be coded is converted into a new sequence using a transformation rule.
New sequence: transform coefficients.
The process is reversible: get back to the original sequence using the inverse transformation.
Example: Fourier transform (FT)
Coefficients represent the proportion of energy contributed by different frequencies.
Transform Coding (Cont…)
In transform coding, choose the transformation such that only a subset of the coefficients have significant values.
Energy is confined to a subset of ‘important’ coefficients.
Known as ‘energy compaction’.
Example: FT of a bandlimited signal [figure]
Differential Coding – DPCM & ADPCM
Based on the fact that neighboring samples … x(n-1), x(n), x(n+1), … in a discrete-time sequence change slowly in many applications, e.g., voice, audio, …
A differential PCM coder (DPCM) quantizes and encodes the difference d(n) = x(n) – x(n-1)
Advantage of using the difference d(n) instead of the actual value x(n):
Reduces the number of bits needed to represent a sample
General DPCM: d(n) = x(n) – a1x(n-1) - a2x(n-2) -…- akx(n-k)
a1, a2, …ak are fixed
Adaptive DPCM: a1, a2, …ak are dynamically changed with the signal
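A minimal first-order DPCM sketch. One assumption beyond the slide: the encoder takes the difference against the decoder's reconstruction of the previous sample rather than the true x(n-1), a common way to keep quantization errors from accumulating. The function names, the `step` parameter and the sample values are made up for this example.

```python
def dpcm_encode(samples, step=1):
    """Quantize d(n) = x(n) - x_hat(n-1), where x_hat is the decoder's reconstruction."""
    codes, prev = [], 0
    for x in samples:
        q = round((x - prev) / step)   # quantized difference
        codes.append(q)
        prev += q * step               # mirror the decoder to avoid drift
    return codes


def dpcm_decode(codes, step=1):
    """Rebuild the signal by accumulating the dequantized differences."""
    out, prev = [], 0
    for q in codes:
        prev += q * step
        out.append(prev)
    return out


x = [100, 102, 105, 104, 101, 101]      # hypothetical slowly varying samples
d = dpcm_encode(x)                      # [100, 2, 3, -1, -3, 0]
coarse = dpcm_decode(dpcm_encode(x, step=4), step=4)
```

After the first sample the differences are small numbers that need far fewer bits than the raw values. With `step=4` the reconstruction is lossy, but each sample is off by at most half a step.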
Psychoacoustics: Human Aural Response
Psychoacoustic model
Basically: if you can’t hear the sound, don’t encode it
Natural bandlimiting
Audio perception spans 20 Hz to 20 kHz, but most sounds are in the low frequencies (e.g., 2 kHz to 4 kHz)
Human frequency response:
Frequency masking: If a stronger sound and a weaker sound compete, you can’t hear the weaker sound. Don’t encode it.
Temporal masking: After a loud sound, it takes a while before we can hear a soft sound.
Stereo redundancy: At low frequencies, we can’t detect where the sound is coming from. Encode it in mono.
Perceptual Coding: Examples
MP3 = MPEG 1/2 layer 3 audio; achieves CD quality at about 192 kbps (a 3.7:1 compression ratio); higher compression is possible
Sony MiniDisc uses Adaptive Transform Acoustic Coding (ATRAC) to achieve a 5:1 compression ratio (about 141 kbps)
The quoted rates and ratios are consistent with a single CD channel (705.6 kbps)
http://www.mpeg.org
http://www.minidisc.org/aes_atrac.html
Artefacts of compression
Some areas of the spectrum are lost in the encoding process
MP3-encoded recordings rarely sound identical to the original uncompressed audio files
On small or PC speakers, however, MP3-compressed audio can be acceptable
Examples
1.12MB clip encoded at 128 kbps (105KB), 96 kbps (78.9KB), 64 kbps (52.6KB)
WAV file (34Mb) vs. MP3 file (3Mb)
LPC and Parametric Coding
LPC (Linear Predictive Coding)
Based on a model of the human vocal tract
s(n) = a1s(n-1) + a2s(n-2) +…+ aks(n-k) + e(n)
Estimate a1, a2, …ak and e(n) for each piece (frame) of speech
Encode and transmit/store a1, a2, …ak and the type of e(n)
The decoder reproduces speech using a1, a2, …ak and e(n): very low bit rate but relatively low speech quality
Parametric coding: only the parameters of a sound generation model are coded
LPC is an example where the parameters are a1, a2, …ak, e(n)
Musical instrument parameters: pitch, loudness, timbre, …
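The decoder side of LPC is just the recursion s(n) = a1s(n-1) + … + aks(n-k) + e(n) run forward. The sketch below assumes s(n) = 0 for n < 0 and uses a made-up one-coefficient predictor with an impulse excitation; a real coder would estimate the coefficients per frame, which is not shown here.

```python
def lpc_synthesize(coeffs, excitation):
    """Run s(n) = a1*s(n-1) + ... + ak*s(n-k) + e(n), with s(n) = 0 for n < 0."""
    s = []
    for n, e in enumerate(excitation):
        predicted = sum(
            a * s[n - i]
            for i, a in enumerate(coeffs, start=1)
            if n - i >= 0
        )
        s.append(predicted + e)
    return s


# Hypothetical frame: a single coefficient a1 = 0.5 driven by a unit impulse.
frame = lpc_synthesize([0.5], [1.0, 0.0, 0.0, 0.0])   # [1.0, 0.5, 0.25, 0.125]
```

Only the few coefficients and a compact description of e(n) need to be sent per frame, which is why the bit rate is so low.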
Speech Compression
Handling speech with other media information such as text, images, video, and data is an essential part of multimedia applications
The ideal speech coder has a low bit-rate, high perceived quality, low signal delay, and low complexity.
Delay
Less than 150 ms one-way end-to-end delay for a conversation
Processing (coding) delay, network delay
Over Internet, ISDN, PSTN, ATM, …
Complexity
Computational complexity of speech coders depends on algorithms
Contributes to achievable bit-rate and processing delay
Quality
“intelligible” -> “natural” or “subjective” quality
Depending on bit-rate
Bit-rate
G.72x Speech Coding Standards
G.72x Audio Coding Standards
Silence compression: detect the "silence", similar to run-length coding
Adaptive Differential Pulse Code Modulation (ADPCM), e.g., in CCITT G.721 -- 16 or 32 Kb/s.
(a) Encodes the difference between two or more consecutive signals; the difference is then quantized --> hence the loss
(b) Adapts the quantization so fewer bits are used when the value is smaller. It is necessary to predict where the waveform is headed --> difficult
Linear Predictive Coding (LPC) fits the signal to a speech model and then transmits the parameters of the model --> sounds like a computer talking, 2.4 Kb/s.
Video Digitization and Compression
Video is a sequence of images (frames) displayed at a constant frame rate
e.g. 24 images/sec
A digital image is a 2-D array of pixels
Each pixel is represented by bits
R:G:B
Y:U:V
Y = 0.299R + 0.587G + 0.114B (Luminance or Brightness)
U = B - Y (Chrominance 1, color difference)
V = R - Y (Chrominance 2, color difference)
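The three equations above translate directly into code. This sketch uses the unscaled color differences exactly as written on the slide (practical systems usually scale and offset U and V); the function names and the sample pixel are chosen for this example.

```python
def rgb_to_yuv(r, g, b):
    """Y is the weighted luminance; U and V are the unscaled color differences."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return y, b - y, r - y


def yuv_to_rgb(y, u, v):
    """Invert the equations above: recover R and B first, then solve Y for G."""
    r = v + y
    b = u + y
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b


y, u, v = rgb_to_yuv(200, 100, 50)     # hypothetical pixel
r, g, b = yuv_to_rgb(y, u, v)
```

For a gray pixel (R = G = B) both color differences are zero, so all of its information sits in the Y plane.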
Redundancy
Spatial
Temporal
Intra-frame coding
Transform -> Quantize -> Encode
JPEG (Joint Photographic Experts Group)
Original size: 640 x 480 x 3 = 922 KB
JPEG compression ratios:
30:1 to 50:1 compression is possible with small to moderate defects
100:1 compression is quite feasible for low-quality purposes
JPEG steps
1 Block preparation: from RGB to YUV (YIQ) planes, 8x8 blocks
2 Transform: 2-D Discrete Cosine Transform (DCT) on blocks (lossy?)
3 Quantization: quantize DCT coefficients (lossy)
4 Encoding of quantized coefficients (lossless)
Zigzag scan
Differential Pulse Code Modulation (DPCM) on the DC component
Run Length Encoding (RLE) on the AC components
Entropy coding: Huffman or arithmetic
[Figure: JPEG pipeline: Block Preparation -> Transform -> Quantize -> Encode. Decompression: reverse the order.]
(1) Block Preparation
[Figure: RGB input data; after block preparation]
Input image: 640 x 480 RGB (24 bits/pixel) transformed to three planes:
Y: (640 x 480, 8 bits/pixel) Luminance (brightness) plane.
U, V: (320 x 240, 8 bits/pixel) Chrominance (color) planes.
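Block preparation already halves the data, because the two chrominance planes are stored at a quarter of the luminance resolution. A quick check of the plane sizes (the helper name `plane_bytes` is made up for this example):

```python
def plane_bytes(width, height, bits_per_pixel=8):
    """Size in bytes of one uncompressed image plane."""
    return width * height * bits_per_pixel // 8


rgb_bytes = plane_bytes(640, 480, 24)                          # 921,600 bytes
yuv_bytes = plane_bytes(640, 480) + 2 * plane_bytes(320, 240)  # 460,800 bytes
```

So the subsampled Y, U, V planes take half the space of the RGB input before any transform or entropy coding is applied.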
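(2) Discrete Cosine Transform (DCT)
A transformation from the spatial domain to the frequency domain (similar to the FFT)
Definition of the 8-point DCT (applied to an 8x8 block f[x,y]):
$$F[u,v] = \frac{1}{4} C(u) C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f[x,y] \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}, \qquad C(0) = \frac{1}{\sqrt{2}},\ C(k) = 1 \text{ for } k > 0$$
F[0,0] is the DC component and the other F[u,v] define the AC components of the DCT
[Figure: the 64 (8 x 8) DCT basis functions, indexed by u and v, with the DC component marked]
Block-based 2-D DCT
• Karhunen-Loeve (KL) transform?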
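A direct, unoptimized sketch of the 8x8 forward DCT in its usual JPEG normalization (a factor 1/4, with C(0) = 1/√2). Real codecs use fast factorizations; this version only mirrors the double sum. The function name `dct_2d` and the flat test block are assumptions for illustration.

```python
from math import cos, pi, sqrt

N = 8  # block size; the 1/4 factor below is specific to N = 8


def dct_2d(block):
    """Forward 8x8 DCT: F[u][v] = 1/4 C(u) C(v) sum_x sum_y f[x][y] cos(..) cos(..)."""
    def c(k):
        return 1 / sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = sum(
                block[x][y]
                * cos((2 * x + 1) * u * pi / (2 * N))
                * cos((2 * y + 1) * v * pi / (2 * N))
                for x in range(N)
                for y in range(N)
            )
            out[u][v] = 0.25 * c(u) * c(v) * total
    return out


# A flat block: all energy should land in the DC coefficient F[0][0].
flat = [[100] * N for _ in range(N)]
coeffs = dct_2d(flat)
```

For the flat block, `coeffs[0][0]` is 800 (eight times the mean) and every AC coefficient is zero up to rounding, a small demonstration of energy compaction.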
8x8 DCT Example
[Figure: original values of an 8x8 block (in spatial domain) and the corresponding DCT coefficients (in frequency domain), with the DC component and the u and v axes marked]
(3) Quantized DCT Coefficients
Uniform quantization: divide by a constant N and round the result.
In JPEG, each DCT coefficient F[u,v] is divided by a constant q(u,v) taken from the quantization table (filter?)
[Figure: quantization table q(u,v), DCT coefficients F[u,v], and the rounded values F[u,v]/q(u,v)]
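Quantization is an element-wise divide and round, and dequantization multiplies back. The sketch below uses a tiny 2x2 excerpt with made-up coefficients and table entries to keep it short; it is not a standard JPEG table.

```python
def quantize_block(F, q):
    """Divide each coefficient by its table entry and round to the nearest integer."""
    return [[round(F[u][v] / q[u][v]) for v in range(len(F[u]))] for u in range(len(F))]


def dequantize_block(Fq, q):
    """Decoder side: multiply back; the rounding error is not recoverable."""
    return [[Fq[u][v] * q[u][v] for v in range(len(Fq[u]))] for u in range(len(Fq))]


# Hypothetical coefficients and quantization steps.
F = [[800.0, -43.0], [12.0, 3.0]]
q = [[16, 11], [12, 14]]
Fq = quantize_block(F, q)           # [[50, -4], [1, 0]]
F_hat = dequantize_block(Fq, q)     # [[800, -44], [12, 0]]
```

The small coefficient 3 becomes 0 and stays 0 after dequantization. This is the lossy step, and larger table entries discard more.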
(4) Zigzag Scan
Maps an 8x8 block into a 1 x 64 vector
The zigzag pattern groups low-frequency coefficients at the top of the vector.
(5) Encoding of Quantized DCT Coefficients
DC components:
The DC component of a block is large and varied, but often close to the DC value of the previous block.
Encode the difference of the DC component from the previous 8x8 block using Differential Pulse Code Modulation (DPCM).
AC components:
The 1x64 vector has lots of zeros in it.
Using RLE, encode as (skip, value) pairs, where skip is the number of zeros and value is the next non-zero component.
Send (0,0) as the end-of-block value.
(6) Run-length Coding
A typical 8x8 block of quantized DCT coefficients. Most of the higher-order coefficients have been quantized to 0.
12 34  0 54  0  0  0  0
87  0  0 12  3  0  0  0
16  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
Zig-zag scan: the sequence of DCT coefficients to be transmitted:
12 34 87 16 0 0 54 0 0 0 0 0 0 12 0 0 3 0 0 0 .....
The DC coefficient (12) is sent via a separate Huffman table.
Run-length coding of the remaining coefficients:
34 | 87 | 16 | 0 0 54 | 0 0 0 0 0 0 12 | 0 0 3 | 0 0 0 .....
Further compression: statistical (entropy) coding
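The zigzag scan and the (skip, value) run-length step can be reproduced on the block above. This sketch generates the scan order by walking the anti-diagonals in alternating directions; the function names are chosen for this example, and details of the real JPEG bitstream (for example the handling of very long zero runs) are left out.

```python
def zigzag(block):
    """Read an n x n block along anti-diagonals, alternating direction on each one."""
    n = len(block)
    order = sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )
    return [block[r][c] for r, c in order]


def rle_ac(ac):
    """Encode AC coefficients as (skip, value) pairs, ending with (0, 0)."""
    pairs, skip = [], 0
    for value in ac:
        if value == 0:
            skip += 1
        else:
            pairs.append((skip, value))
            skip = 0
    pairs.append((0, 0))   # end-of-block marker; trailing zeros are implied
    return pairs


block = [
    [12, 34, 0, 54, 0, 0, 0, 0],
    [87, 0, 0, 12, 3, 0, 0, 0],
    [16, 0, 0, 0, 0, 0, 0, 0],
] + [[0] * 8 for _ in range(5)]

scan = zigzag(block)
pairs = rle_ac(scan[1:])   # the DC coefficient scan[0] is coded separately
```

Here `pairs` comes out as (0,34), (0,87), (0,16), (2,54), (6,12), (2,3), (0,0), matching the grouping shown above: seven pairs replace 63 AC coefficients.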
JPEG Example
[Figure: original image, the quantization table used, and compressed images at compression ratios 7.7, 12.3, 33.9, and 60.1, with a blocking artifact marked]
(JPEG 2000?)
MPEG: Inter-Frame Coding
Intra-coded I-frame
Predicted P-frame
Motion Estimation + Compensation
Video compression: A big picture
Bi-directional prediction
I B B P B B P B B P B B I
Intra-coded I-frame
Bi-directional predicted B-frame
Group of frames (GOF)
Q: 3D transform coding?
VBR vs CBR: Rate Control
[Figure: CBR rate control; labels: video encoder, raw VBR, smoothing buffer, rate controller, Qp, CBR]
Variable bit rate (VBR)
Fixed quantizer Qp
“Constant” quality
E.g. RMVB
Constant bit rate (CBR)
Adaptive quantizer
“Constant” rate: easier control
Difference compared to the target rate can be 0.5% or less
E.g. RM, MPEG-1
Rate-distortion optimization
Recall that the transport layer also has rate control …
Standardization Organizations
ITU-T VCEG (Video Coding Experts Group)
standards for advanced moving image coding methods appropriate for conversational and non-conversational audio/visual applications.
ISO/IEC MPEG (Moving Picture Experts Group)
standards for compression and coding, decompression, processing, and coded representation of moving pictures, audio, and their combination
Relation
ITU-T H.262 ~ ISO/IEC 13818-2 (MPEG-2): Generic Coding of Moving Pictures and Associated Audio.
ITU-T H.263 ~ ISO/IEC 14496-2 (MPEG-4)
WG: working group
SG: study group
ISO/IEC JTC 1/SC 29/WG 1: Coding of Still Pictures
ISO/IEC JTC 1/SC 29/WG 11
Coding Rate and Standards
[Figure: bit-rate axis marked 8, 16, 64, 384 kbit/s and 1.5, 5, 20 Mbit/s, divided into very low, low, medium, and high bitrate. Applications: mobile videophone, videophone over PSTN, ISDN videophone, Video CD, digital TV, HDTV. Standards: MPEG-4, H.263, H.261, MPEG-1, MPEG-2]
ISO MPEG-1 (Moving Pictures Experts Group)
MPEG-1: progressively scanned video for multimedia applications, at a bit rate of 1.5 Mb/s, matching the access rate of CD-ROM players. Video format: near VHS quality
ISO MPEG-2
Standard for digital television, DVD
4 to 8 Mb/s / 10 to 15 Mb/s >> MPEG-1
Supports various modes of scalability (spatial, temporal, SNR)
There are differences in quantization and better variable-length code tables for progressive video sequences.
ISO MPEG-4
A much broader standard. MPEG-4 was aimed primarily at low bit rate video communication, but is not limited to it
Applications:
1. Digital television
2. Interactive graphics applications
3. Interactive multimedia (World Wide Web)
Two versions: DivX 3 and DivX 4 (Internet world)
Important concept
Video object
MPEG-4 Object Video
Instead of “frames”: Video Object Planes (VOPs)
Shape-adaptive DCT (SA-DCT)
Alpha map
[Figure: a video frame split into a background VOP and two further VOPs]
MPEG-4 Structure
[Figure: labels: bitstream, MUX, three A/V object decoders, compositor, audio/video scene]
Example
[Figure: scene composed of Object 1, Object 2, Object 3, and Object 4]
Problems, comments?
Another Example
Status
Microsoft, RealVideo, QuickTime, ...
But only rectangular frame based
H.264 = MPEG-4 Part 10 (2003)
Shape coding
Synthetic scene
H.264
H.26x (x=1,2,3)
ITU-T Recommendations
Real-time video communication applications.
MPEG standards
Video storage, broadcast video, video streaming applications
H.26L = ITU-T + MPEG = JVT coding
Latest project of the Joint Video Team formed by ITU-T SG16 Q6 (VCEG) and ISO/IEC JTC 1/SC 29 WG 11 (MPEG)
Basic configuration similar to H.263 and MPEG-4 Part 2
H.264 Design
Goals
Enhanced compression performance
Provision of a network-friendly, packet-based video representation addressing conversational and non-conversational applications
Conceptual separation between the Video Coding Layer (VCL) and the Network Adaptation Layer (NAL)
H.264 Design (Contd.)
[Figure: labels: Video Coding Layer, macro-block, slice/partition, data partitioning, control data, Network Adaptation Layer]
H.264 Design (Contd.)
Video Coding Layer
Core high-compression representation
Block-based motion-compensated transform video coder
New features enabled to achieve significant improvement in coding efficiency.
Network Adaptation Layer
Provides the ability to customize the format of the VCL data over a variety of networks
Unique packet-based interface
Packetisation and appropriate signaling are part of the NAL specification
Video Coding Evolution
[Figure: video coding evolution, with H.264 marked]
Y. Wang, J. Ostermann, Y.-Q. Zhang, Digital Video Processing and Communication. Prentice Hall, 2001.