Outline
Image/video compression: what and why
source coding basics
basic idea
symbol codes
stream codes
compression systems and standards
system standards and quality measures
image coding JPEG
video coding and MPEG
Summary
need for compression
Image: 6.0 million pixel camera, 3000x2000
18 MB per image -> 56 pictures / 1GB
Video: DVD Disc 4.7 GB
video 720x480, RGB, 30 f/s -> 31.1MB/sec
audio 16bits x 44.1KHz stereo -> 176.4KB/s
-> about 2.5 min of video per DVD disc
Send video from cellphone:
352*240, RGB, 15 frames / second
3.8 MB/sec ->$38.00/sec levied by AT&T
Data Compression
Wikipedia: “data compression, or source coding, is the process of
encoding information using fewer bits (or other information-bearing
units) than an unencoded representation would use through use of
specific encoding schemes.”
Applications
General data compression: .zip, .gz …
Image over network: telephone/internet/wireless/etc
Slow device:
1xCD-ROM 150KB/s, bluetooth v1.2 up to ~0.25MB/s
Large multimedia databases
Understanding compression:
what are behind jpeg/mpeg/mp4 … formats?
what are the “good/fine/super fine” quality modifiers in my Canon 400D?
why/when do I want to use raw/jpeg format in my digital camera?
why doesn’t “zipping” jpeg files help?
what are the best ways to do compression?
are we doing our best? (yes/no/maybe)
what can we compress?
Goals of compression
Remove redundancy
Reduce irrelevance
irrelevance (perceptual redundancy): not all visual
information is perceived by the eye/brain, so throw
away what is not perceived.
redundant: exceeding what is necessary or normal
symbol redundancy:
the common and uncommon values cost the same
to store.
spatial and temporal redundancy:
Temperatures tend to be similar in adjacent
geographical areas, and also tend to be similar in the
same month over different years …
symbol/inter-symbol redundancy
Letters and words in English
e, a, i, s, t, …
a, the, me, I …
good, magnificent, …
fyi, btw, ttyl …
In the evolution of language we
naturally chose to represent frequent
meanings with shorter representations.
Data and information
Data is not the same thing as information.
Data is the means with which information is
expressed. The amount of data can be much larger
than the amount of information.
Redundant data doesn't provide additional
information.
Image coding or compression aims at reducing the
amount of data while keeping the information by
reducing the amount of redundancy.
Image Compression
Image compression addresses the problem
of reducing the amount of data required to
represent a digital image. The underlying
basis of the reduction process is the removal of
redundant data
Transforming a 2-D pixel array into a
statistically uncorrelated data set
Different Types of Redundancy
Coding Redundancy:
Some gray levels are more common than
others.
Inter-pixel Redundancy:
The same gray level may cover a large
area.
Psycho-Visual Redundancy:
The eye can only resolve about 32 gray
levels locally
M. C. Escher, Drawing Hands, 1948
Redundancy
Spatial redundancy
Similarities between adjacent pixels
250 252 249
250 2 -3
Temporal redundancy
Similarities between pixels in adjacent frames
250 252 249
250 2 -3
Modes of compression
Lossless
preserve all information, perfectly recoverable
examples: Morse code, zip/gz
Lossy
throw away perceptually insignificant information
cannot recover all bits
Image Compression
Image compression can be:
Reversible (lossless), with no loss of information.
– The image after compression and decompression is identical to the
original image. Often necessary in image analysis applications.
– The compression ratio is typically 2 to 10 times.
Non-reversible (lossy), with loss of some information.
– Lossy compression is often used in image communication, compact
cameras, video, www, etc.
– The compression ratio is typically 10 to 30 times.
Image Coding and Compression
Image coding
– How the image data can be represented.
Image compression
– Reducing the amount of data required to represent an image.
– Enabling efficient image storage and transmission.
Lossless compression
Objective Measures of Image Quality:
Rate
Compression ratio serves as the primary measure of a compression
technique’s effectiveness. It is a measure of the number of bits that can be
eliminated from an uncompressed representation of a source image.
Let N1 be the total number of bits required to store an uncompressed (raw)
source image and let N2 be the total number of bits required to store the
compressed data. The compression ratio Cr is then defined as the ratio of
N1 to N2
Larger compression ratios indicate more effective compression
Smaller compression ratios indicate less effective compression
Compression ratios less than one indicate that the compressed
representation is actually larger than the uncompressed representation.
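The definition above is just a ratio of bit counts. A minimal sketch (the 2 MB JPEG size here is a made-up example for illustration):

```python
def compression_ratio(n1_bits, n2_bits):
    """Cr = N1 / N2: bits of the raw image over bits of the compressed data."""
    return n1_bits / n2_bits

# A 3000x2000 RGB image at 24 bits/pixel, stored as a hypothetical 2 MB JPEG:
raw = 3000 * 2000 * 24
jpeg = 2 * 1024 * 1024 * 8
print(round(compression_ratio(raw, jpeg), 1))  # 8.6
```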
Objective Measures of Image Quality:
Compression Ratio
Objective Measures of Image Quality
Objective fidelity criteria
MSE vs PSNR
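The slide's plots are not reproduced here, but the two standard objective fidelity criteria can be computed directly. A minimal sketch (the 2x2 test images are hypothetical):

```python
import math

def mse(f, g):
    """Mean squared error between two equally sized gray-level images (nested lists)."""
    n = sum(len(row) for row in f)
    return sum((a - b) ** 2 for rf, rg in zip(f, g) for a, b in zip(rf, rg)) / n

def psnr(f, g, max_val=255):
    """Peak signal-to-noise ratio in dB; higher means closer to the original."""
    e = mse(f, g)
    return float('inf') if e == 0 else 10 * math.log10(max_val ** 2 / e)

orig = [[100, 100], [100, 100]]
noisy = [[100, 105], [95, 100]]
print(mse(orig, noisy), round(psnr(orig, noisy), 1))  # 12.5 37.2
```

Note that PSNR only rescales MSE against the peak value, which is why the two criteria always rank a pair of images the same way.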
Subjective Measures of Image Quality
The problem
– The objective image quality measures shown previously do not
always fit with our perception of image quality.
One solution
– Let a number of test persons rate the image quality of the images on
a scale. This results in a subjective measure of image quality, or
rather fidelity, based on how we perceive the quality of the
images.
Subjective fidelity criteria
Excellent
Fine
Passable
Marginal
Inferior
Unusable
Information Measure
Using Elements of Information theory
Example
Assume that the grading levels are
A, B, C, D, E, F
These are equally distributed
How much information do you have if you know that you don't have grade F?
(5/6) * -log2(5/6) = 0.22 bits
How much information is needed to know your exact grade?
log2(6) - ((1/6) * -log2(1/6)) = 2.15 bits
Information Measure:
Entropy
Shannon Entropy = average information per
source output
H(z) = - Σ_{k=0}^{L-1} p(r_k) · log2 p(r_k)
where r_k is gray level number k, and L is
the number of gray levels.
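A minimal sketch of the entropy formula applied to a list of gray levels (the two tiny test images are made-up examples):

```python
import math
from collections import Counter

def entropy(pixels):
    """Shannon entropy H = -sum p(r_k) * log2 p(r_k), in bits per pixel."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A uniform two-level histogram gives the maximum 1 bit/pixel ...
print(entropy([0, 255, 0, 255]))          # 1.0
# ... while a skewed histogram needs fewer bits on average.
print(round(entropy([0, 0, 0, 255]), 3))  # 0.811
```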
Measure the Amount of Data
The average length of the code words assigned to various gray-level
values is found by summing the product of the number of bits used to
represent each gray level and the probability that the gray level occurs
Example 3-bit image
The noiseless coding theorem
It is possible to make Lavg/n arbitrarily close to
H(z) by coding infinitely long extensions of
the source:
lim_{n -> ∞} L_{avg,n} / n = H(z)
Definitions (revisited)
Larger compression ratios indicate more effective compression
Smaller compression ratios indicate less effective compression
Compression ratios less than one indicate that the compressed
representation is actually larger than the uncompressed representation.
Dealing with coding redundancy
Basic idea:
Different gray levels occur with different probability (non uniform histogram).
Use shorter code words for the more common gray levels and longer code
words for less common gray levels. This is called Variable Code Length.
Code 1: Lavg = 3
Code 2: Lavg = 2.7
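The slide's code table is not reproduced here, so the sketch below uses the classic 8-level probability table from the Gonzalez & Woods textbook (an assumption) to show how Lavg is computed for a natural 3-bit code versus a variable-length code:

```python
def l_avg(probs, lengths):
    """Average code word length: sum over k of l(r_k) * p(r_k)."""
    return sum(p * l for p, l in zip(probs, lengths))

p = [0.19, 0.25, 0.21, 0.16, 0.08, 0.06, 0.03, 0.02]  # gray-level probabilities
code1 = [3] * 8                                       # fixed-length 3-bit code
code2 = [2, 2, 2, 3, 4, 5, 6, 6]                      # variable-length code
print(round(l_avg(p, code1), 2), round(l_avg(p, code2), 2))  # 3.0 2.7
```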
Revisit:
Desired properties of symbol codes
good codes are not only short but also easy to
encode/decode
Non-singular: every symbol in X maps to a different
code word.
Uniquely decodable: every sequence {x1, … xn} maps
to a different codeword sequence.
Instantaneous: no codeword is a prefix of any other
codeword; a.k.a. prefix code, self-punctuating code,
prefix-free code.
Huffman Coding
First
1. Sort the gray levels by decreasing probability
2. Sum the two smallest probabilities.
3. Sort the new value into the list.
4. Repeat 1 to 3 until only two probabilities remain.
David Albert Huffman
Second
1. Give the code 0 to the highest probability, and the code 1
to the lowest probability in the summed pair.
2. Go backwards through the tree one node and repeat
from 1 until all gray levels have a unique code.
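The two passes above can be sketched in a few lines. This variant builds the codewords bottom-up with a heap instead of walking the tree backwards, but follows the same merge-the-two-smallest rule (the four-symbol alphabet is a made-up example):

```python
import heapq

def huffman_code(probs):
    """Huffman coding: repeatedly merge the two least probable nodes,
    prefixing '1' to the codes of the lower-probability branch and '0'
    to the other. probs: dict symbol -> probability."""
    heap = [(p, i, {sym: ''}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {sym: '0' for sym in probs}
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)  # smallest probability -> prefix '1'
        p2, i, c2 = heapq.heappop(heap)  # next smallest        -> prefix '0'
        merged = {s: '1' + c for s, c in c1.items()}
        merged.update({s: '0' + c for s, c in c2.items()})
        heapq.heappush(heap, (p1 + p2, i, merged))
    return heap[0][2]

code = huffman_code({'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125})
print(sorted(len(code[s]) for s in 'abcd'))  # [1, 2, 3, 3]
```

For dyadic probabilities like these the code lengths match the self-information exactly, so Lavg equals the entropy.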
Example of Huffman coding
Huffman coding….
The Huffman code is completely reversible, i.e.,
lossless.
The table for the translation has to be stored together
with the coded image.
The resulting code is unambiguous. That is, for the
previous example, the encoded string 011011101011
can only be parsed into the code words 0, 110, 1110,
1011 and decoded as 7, 4, 5, 0.
The Huffman code does not take correlation between
adjacent pixels into consideration.
Interpixel Redundancy
Also called spatial or geometric redundancy
Adjacent pixels are often correlated, i.e., the
value of neighboring pixels of an observed
pixel can often be predicted from the value of
the observed pixel.
Coding methods:
Run-length coding
Difference coding
Run-length coding
Every code word is made up of a pair (g,l) where g is the gray
level, and l is the number of pixels with that gray level (length or
“run”).
E.g.,
results in the run-length code (1,3)(3,3)(4,1)(2,4)(1,5)
The code is calculated row by row.
(Newer methods can take advantage of runs of repetitive patterns
like: 8 5 5 8 5 5 8 5 5.)
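A minimal sketch of the (g, l) pairing; the input row is reconstructed from the code given above:

```python
def run_length_encode(row):
    """Encode a sequence of gray levels as (g, l) pairs: level g repeated l times."""
    pairs = []
    for g in row:
        if pairs and pairs[-1][0] == g:
            pairs[-1] = (g, pairs[-1][1] + 1)  # extend the current run
        else:
            pairs.append((g, 1))               # start a new run
    return pairs

row = [1, 1, 1, 3, 3, 3, 4, 2, 2, 2, 2, 1, 1, 1, 1, 1]
print(run_length_encode(row))  # [(1, 3), (3, 3), (4, 1), (2, 4), (1, 5)]
```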
Difference coding
Definition: the first pixel is stored as-is; each subsequent pixel is stored as its difference from the previous pixel.
E.g.,
The code is calculated row by row.
Both run-length and difference coding are reversible and can be
combined with, e.g., Huffman coding.
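A minimal sketch of difference coding and its inverse, using the 250/252/249 pixel values from the earlier redundancy slide:

```python
def difference_encode(row):
    """Keep the first value; replace each later value by its difference
    from the previous one (small numbers when neighbors are similar)."""
    return row[:1] + [b - a for a, b in zip(row, row[1:])]

def difference_decode(code):
    out = code[:1]
    for d in code[1:]:
        out.append(out[-1] + d)
    return out

row = [250, 252, 249]
code = difference_encode(row)
print(code)                           # [250, 2, -3]
assert difference_decode(code) == row  # fully reversible
```

The differences cluster tightly around zero, which is exactly the skewed histogram a follow-up Huffman pass exploits.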
Combining Difference and Huffman Coding
Original image
Difference coding
LZW Coding
LZW, Lempel-Ziv-Welch
In contrast to Huffman with variable code length, LZW uses fixed
lengths of code words which are assigned to variable length sequences
of source symbols.
The coding is done from left-to-right and row-by-row.
Requires no a priori knowledge of the probability of occurrence of the
symbols to be encoded.
Removes some of the inter-pixel redundancy.
During encoding a dictionary or “code-book” with symbol sequences is
created which is recreated when decoding.
(Many modern lossless compression methods use Huffman coding in
combination with methods like LZW.)
What if the symbol probabilities are
unknown?
LZW Coding (Lempel-Ziv-Welch)
Integrated into mainstream image file formats:
Graphics Interchange Format – GIF
Tagged Image File Format – TIFF
Portable Document Format – PDF
Widely used: GIF, TIFF, PDF …
The related royalty-free DEFLATE algorithm (LZ77 + Huffman) is used in PNG, ZIP, …
Unisys U.S. LZW Patent No. 4,558,302 expired on June 20, 2003
http://www.unisys.com/about__unisys/lzw
Input sequence: 39 39 126 126 39 39 126 126 39 39 126 126 39 39 126 126
Encoded output: 39 39 126 126 256 258 260 259 257 126
LZW
Coding dictionary (code book) is created
while data are being encoded
LZW decoder builds an identical
decompression dictionary as it decodes the
data stream
Flush the code book
When the codebook is full
When coding is inefficient
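The dictionary-building described above can be sketched compactly; running it on the slide's 16-pixel sequence reproduces the coded output shown there:

```python
def lzw_encode(data):
    """LZW: fixed-length output codes for variable-length input runs.
    The dictionary starts with all single byte values (0..255) and grows
    with every newly seen sequence; the decoder rebuilds the same
    dictionary on its own, so it is never transmitted."""
    book = {(i,): i for i in range(256)}
    current, out = (), []
    for symbol in data:
        candidate = current + (symbol,)
        if candidate in book:
            current = candidate            # keep extending the run
        else:
            out.append(book[current])      # emit code for the longest match
            book[candidate] = len(book)    # next free code, starting at 256
            current = (symbol,)
    if current:
        out.append(book[current])
    return out

data = [39, 39, 126, 126] * 4
print(lzw_encode(data))  # [39, 39, 126, 126, 256, 258, 260, 259, 257, 126]
```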
Bit Plane coding
Divide the gray level/color image into series of binary
images (with one image per bit). Code each image
separately using the above described methods. An 8-bit
image will be represented by 8 coded binary images.
It is based on the concept of decomposing a multilevel
image into a series of binary images and compressing
each binary image via one of several well-known binary
compression methods
Alternative decomposition approach: start from a Gray
code representation, in which successive gray levels
differ in only one bit.
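A minimal sketch of bit-plane decomposition and of why the Gray code helps: a one-level step like 127 -> 128 disturbs all eight natural-binary planes but only one Gray-code plane.

```python
def to_gray(n):
    """Gray code: successive integers differ in exactly one bit."""
    return n ^ (n >> 1)

def bit_planes(pixels, bits=8):
    """Decompose a list of gray levels into `bits` binary images (plane 0 = LSB)."""
    return [[(p >> b) & 1 for p in pixels] for b in range(bits)]

# Planes that change across the step 127 -> 128:
print(bin(127 ^ 128).count('1'),                    # natural binary: 8 planes
      bin(to_gray(127) ^ to_gray(128)).count('1'))  # Gray code: 1 plane
```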
Constant area coding (CAC)
Image is divided into areas of size p x q
Classify all white, all black, mixed
Example white=0, black=10, mixed = 11+pixel
values
If the image is dominantly white:
Example: white = 0, black or mixed = 1 + pixel
values
Lossless Predictive Coding
Does not require decomposition of an image into a collection of
bitplanes
Based on eliminating the interpixel redundancy of closely spaced
pixels by extracting and coding only the new information in each
pixel
Contains encoder, decoder and predictor
The output of the predictor is rounded to the nearest integer
Lossless Predictive Coding
Prediction error:
e_n = f_n - f̂_n
is coded using a variable-length code.
The decoder reconstructs:
f_n = e_n + f̂_n
The predictor uses m previous pixels:
f̂(x,y) = round[ Σ_{i=1}^{m} α_i · f(x, y-i) ]
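A minimal sketch of the encoder/decoder pair along one image row, with a single-coefficient previous-pixel predictor (α = 1) as a made-up example:

```python
def predictive_encode(row, alphas=(1.0,)):
    """Lossless predictive coding: predict f̂_n = round(sum_i alpha_i * f_{n-i})
    and store the error e_n = f_n - f̂_n. The first m pixels have no full
    neighborhood and are stored as-is."""
    m = len(alphas)
    errors = list(row[:m])
    for n in range(m, len(row)):
        pred = round(sum(a * row[n - 1 - i] for i, a in enumerate(alphas)))
        errors.append(row[n] - pred)
    return errors

def predictive_decode(errors, alphas=(1.0,)):
    """Rebuild the row: f_n = e_n + f̂_n, using already decoded pixels."""
    m = len(alphas)
    row = list(errors[:m])
    for n in range(m, len(errors)):
        pred = round(sum(a * row[n - 1 - i] for i, a in enumerate(alphas)))
        row.append(errors[n] + pred)
    return row

row = [250, 252, 249, 249, 250]
e = predictive_encode(row)
print(e)                            # [250, 2, -3, 0, 1]
assert predictive_decode(e) == row  # perfectly recoverable
```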
Lossy Predictive Compression
A Typical Predictive Coder
Lossy Predictive Compression
Reduce the accuracy of the saved image for increased compression
One of the simplest lossy predictive coding schemes is known as
Delta Modulation. Record the approximate error (difference)
between the predicted and actual samples values.
For a sequence of samples S indexed by k, a fixed delta value generates an
approximate sequence S': at each step, S'_k = S'_{k-1} + delta if
S_k > S'_{k-1}, and S'_k = S'_{k-1} - delta otherwise; only the one-bit
up/down decision is stored.
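The one-bit-per-sample update above can be sketched directly (the sample values are a made-up example):

```python
def delta_modulate(samples, delta, start=0):
    """Delta modulation: one bit per sample says whether the reconstruction
    steps up (+delta) or down (-delta). A large delta causes granular noise
    on flat regions; a small delta causes slope overload on fast transitions,
    so the scheme is lossy either way."""
    recon, bits, current = [], [], start
    for s in samples:
        bit = 1 if s > current else 0
        current += delta if bit else -delta
        bits.append(bit)
        recon.append(current)
    return bits, recon

bits, recon = delta_modulate([14, 15, 15, 14, 20, 26, 27], delta=4, start=14)
print(bits)   # [0, 1, 1, 0, 1, 1, 1]
print(recon)  # [10, 14, 18, 14, 18, 22, 26]
```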
Lossy Predictive Compression:
Effect of delta value
Delta Modulation Example
Psycho-Visual Redundancy
If the only intended use of an image is visual observation, much of the
information can be psycho-visual redundant, i.e., it can be removed
without changing the visual appearance or perceived quality of the
image. Loss of information implies a lossy method.
1721 kB (uncompressed) vs. 78 kB (low-quality JPEG)
Psycho-Visual redundancy
Psycho-visual redundancy is often reduced by quantization.
E.g., uniform quantization of gray levels:
Remove the least significant bits of the data.
Causes edge effects (false contours).
The edge effects can be reduced by Improved Gray Scale (IGS).
Remove the least significant bit, and add a “random number”
based on the sum of the least significant bits of the present,
and the previous pixel.
IGS reduces edge effects, but at the same time blurs
true edges.
Improved Gray Scale (IGS)
(a) Original image. (b) Uniform quantization to 16 levels. (c) IGS quantization to 16
levels.
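The IGS "random number" described above is just the low-order remainder carried over from the previous pixel's sum. A minimal sketch, assuming the usual 8-bit-to-4-bit case (the pixel values are a made-up example):

```python
def igs_quantize(pixels, keep_bits=4):
    """Improved Gray Scale quantization (sketch): before truncating to the
    top `keep_bits`, add the low-order remainder of the previous sum as a
    pseudo-random dither, which breaks up false contours on smooth ramps.
    Pixels whose top bits are already all ones are passed through unchanged
    to avoid overflowing past 255."""
    drop = 8 - keep_bits
    mask = (1 << drop) - 1   # low-order bits carried to the next pixel
    top = 255 - mask         # e.g. 0xF0 when 4 bits are kept
    out, prev_sum = [], 0
    for p in pixels:
        s = p if (p & top) == top else p + (prev_sum & mask)
        out.append(s >> drop)  # keep only the top bits
        prev_sum = s
    return out

print(igs_quantize([108, 139, 135, 244, 172]))
```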
Transform Coding
A compression technique that is based on
modifying the transform of an image
For most natural images a significant number
of the coefficients have small magnitudes and
can be coarsely quantized or discarded with
little image distortion
Transform Coding
Sub image decomposition
Transformation
Quantization
Coding
Adaptive transform coding or nonadaptive transform coding
Transform Coding
1) Divide the image into n × n sub-images.
2) Transform each sub-image using a reversible transform (e.g., the Hotelling
transform, the discrete Fourier transform (DFT) or the discrete cosine
transform (DCT)).
3) Quantize, i.e., truncate the transformed image (e.g., with the DFT and DCT,
frequencies with small amplitude can be removed without much information
loss). The quantization can be either image dependent (IDP) or image
independent (IIP).
4) Code the resulting data, normally using some kind of "variable length
coding", e.g., Huffman code.
The coding is not reversible (unless step 3 is skipped).
g(x,y,u,v) = Forward transformation kernel
h(x,y,u,v) = Inverse transformation kernel
T(u,v) = Σ_{x=0}^{N-1} Σ_{y=0}^{N-1} f(x,y) · g(x,y,u,v)
f(x,y) = Σ_{u=0}^{N-1} Σ_{v=0}^{N-1} T(u,v) · h(x,y,u,v)
Fourier Transform
g(x,y,u,v) = (1/N) · e^{-j2π(ux+vy)/N}
h(x,y,u,v) = (1/N) · e^{j2π(ux+vy)/N}
Walsh-Hadamard Transform -
WHT
g(x,y,u,v) = h(x,y,u,v) = (1/N) · (-1)^{Σ_{i=0}^{m-1} [b_i(x)·p_i(u) + b_i(y)·p_i(v)]},  with N = 2^m
• The Fourier Transform consists of a projection onto a
set of orthogonal sinusoidal waveforms.
• The FT coefficients are called frequency components
and the waveforms are ordered by frequency.
• The Hadamard Transform consists of a projection
onto a set of square waves called Walsh functions.
• The HT coefficients are called sequency components
and the Walsh functions are ordered by the number of
their zero-crossings.
Discrete Cosine Transform - DCT
• The discrete cosine transform (DCT) is used to transform a signal from
the spatial domain into the frequency domain.
• The reverse process, that of transforming a signal from the frequency
domain into the spatial domain, is called the inverse discrete cosine
transform (IDCT).
• A signal in the frequency domain contains the same information as that in
the spatial domain. The order of values obtained by applying the DCT is
conveniently from lowest to highest frequency.
• This feature and the psychological observation that the human eye and
ear are less sensitive to recognizing the higher-order frequencies leads to
the possibility of compressing a spatial signal by transforming it to the
frequency domain and dropping high-order values and keeping low-order
ones.
• When reconstructing the signal, and transforming it back to the spatial
domain, the results are remarkably similar to the original signal.
Discrete Cosine Transform - DCT
h(x,y,u,v) = α(u) · α(v) · cos[(2x+1)uπ / 2N] · cos[(2y+1)vπ / 2N]
α(u) = sqrt(1/N) for u = 0
α(u) = sqrt(2/N) for u = 1, 2, ..., N-1
JPEG - Sequential baseline
system
Limited to 8-bit words
DCT-values restricted to 11 bit
DCT computation, quantization, variable
length coding
Subimages 8 x 8 left to right, top to bottom
Image size selection
JPEG:
example of transform coding
JPEG quality vs. file size (in bytes):
100 %  9486
90 %   3839
80 %   2086
60 %   1711
40 %   1287
20 %   822
10 %   533
5 %    380
Wavelet Coding
The principal difference between wavelet
coding and transform coding is the omission
of the subimage processing stage
JPEG2000
File Formats with Lossy Compression
JPEG, Joint Photographic Experts Group, based on
a cosine transform on 8x8 pixel blocks and run-
length coding. Gives rise to ringing and block
artifacts. (.jpg .jpe .jpeg)
JPEG2000, created by the Joint Photographic
Experts Group in 2000. Based on the wavelet transform
and is superior to JPEG. Gives rise only to ringing
artifacts and allows flexible decompression
(progressive transmission, region of interest, ...)
and reading. (.jp2 .jpx)
JPEG vs JPEG-2000
Typical steps in lossy image compression
How to represent a face image
DCT Approach (block-based)
(figure: a face image expressed as a weighted sum of basis images, = a·(basis 1) + b·(basis 2) + … + x·(basis N))
PCA Approach (full-frame)
JPEG/JFIF overview
Using the 2D FFT for image
compression
• Image = 200x320 matrix of values
• Compress by keeping largest 2.5% of FFT components
• Similar idea used by JPEG
Trade-off
File Formats with Lossless Compression
TIFF, Tagged Image File Format, flexible format often
supporting up to 16 bits/pixel in 4 channels. Can use
several different compression methods, e.g.,
Huffman, LZW.
GIF, Graphics Interchange Format. Supports 8
bits/pixel in one channel, that is only 256 colors.
Uses LZW compression. Supports animations.
PNG, Portable Network Graphics, supports up to 16
bits/pixel in 4 channels (RGB + transparency). Uses
Deflate compression (~LZW and Huffman). Good
when interpixel redundancy is present.
Vector based file formats
PS, PostScript, is a page description language developed in
1982 for sending text documents to printers.
EPS, Encapsulated PostScript, like PS but can embed raster
images internally using the TIFF format.
PDF, Portable Document Format, widely used for documents
and supported by a wide range of platforms. Supports
embedding of fonts and raster/bitmap images. Beware of the
choice of coding: both lossy and lossless compression are
supported.
SVG, Scalable Vector Graphics, based on XML; supports both
static and dynamic content. All major web browsers support it
(Internet Explorer from version 9).
Choosing image file format
Image analysis
– Lossless formats are vital. TIFF supports a wide
range of different bit depths and lossless compression
methods.
Images for use on the web
– JPEG for photos (JPEG2000), PNG for illustrations.
GIF for small animations. Vector format: SVG,
nowadays supported by web browsers.
Line art, illustrations, logotypes, etc.
– Lossless formats such as PNG etc. (or a vector
format)
Video
Compression
Video Compression Standards
Once video is in digital format, it makes
sense to compress it
Similarly to image compression, we want to
store video data as efficiently as possible
Again, we want to both maximize quality and
minimize storage space and processing
resources
This time, we can exploit correlation in both
space and time domains
Video Compression Standards
Unlike image encoding, video encoding is
rarely done in lossless form
No storage medium has enough capacity to
store a practical-sized lossless video file.
Lossless DVD video: 221 Mbps
Compressed DVD video: 4 Mbps
50:1 compression ratio!
Teleconference H.261, H.262, H.263, H.230
Multimedia video MPEG-1 MPEG-2 MPEG-4
Two organizations dominate video
compression standardization:
ITU-T Video Coding Experts Group (VCEG): International Telecommunication Union,
Telecommunication Standardization Sector (ITU-T, a United Nations organization, formerly CCITT)
ISO/IEC Moving Picture Experts Group (MPEG): International Organization for Standardization and
International Electrotechnical Commission, Joint Technical Committee 1, Subcommittee 29, Working Group 11
Definitions
Bitrate: information stored/transmitted per unit time
Usually measured in Mbps (megabits per second)
Ranges from < 1 Mbps to > 40 Mbps
Resolution: number of pixels per frame
Ranges from 160x120 to 1920x1080
FPS (frames per second): usually 24, 25, 30, or 60
More is not needed because of the limitations of the human
eye
Scan types
Interlaced scan: odd and even lines displayed on
alternate frames
Initially used to save bandwidth on TV
transmission
When displaying interlaced video on a
progressive-scan display, you can see a "comb"
effect
Progressive scan: display all lines on each frame
New "fixed-resolution" displays (such as
LCD, plasma) all use progressive scan
Deinterlacing is not a trivial task
MPEG
(Moving Pictures Expert Group)
Committee of experts that develops video
encoding standards
Until recently, was the only game in town (still
the most popular, by far)
Suitable for wide range of videos
Low resolution to high resolution
Slow movement to fast action
Can be implemented either in software or
hardware
MPEG (Moving Pictures Expert Group)
MPEG's main components are:
Block (8×8 pixels)
Macro block (2×2 blocks = 16×16 pixels)
Slice (one row of macro blocks)
Picture (an entire video frame)
Group of pictures (GOP)
Video sequence (one or more GOPs)
MPEG
8×8 Block
16×16 Macro block
Slice
Evolution of MPEG
MPEG-1
Initial audio/video compression standard
Used by VCD’s
MP3 = MPEG-1 audio layer 3
Target of 1.5 Mb/s bitrate at 352x240 resolution
Only supports progressive pictures
MPEG-2
Current de facto standard, widely used in DVD and Digital TV
Ubiquity in hardware implies that it will be here for a long time
Transition to HDTV has taken over 10 years and is not finished yet
Different profiles and levels allow for quality control
Evolution of MPEG
MPEG-3: originally developed for HDTV, but abandoned when MPEG-2
was determined to be sufficient
MPEG-4: includes support for AV "objects", 3D content, low-bitrate
encoding, and DRM
In practice, provides equal quality to MPEG-2 at a lower bitrate, but often fails to deliver outright better quality
MPEG-4 Part 10 is H.264, which is used in HD-DVD and Blu-ray
MPEG-7 (2001): metadata for audio-video streams, Multimedia Content
Description Interface
MPEG-21 (2002): distribution, exchange, user access of multimedia data and
intellectual property management
MPEG Block Diagram
MPEG technical specification
Part 1 - Systems - describes synchronization and multiplexing of video and audio.
Part 2 - Video - compression codec for interlaced and non-interlaced video signals.
Part 3 - Audio - compression codec for perceptual coding of audio signals. A multichannel-enabled extension of MPEG-1 audio.
Part 4 - Describes procedures for testing compliance.
Part 5 - Describes systems for Software simulation.
Part 6 - Describes extensions for DSM-CC (Digital Storage Media Command and Control).
Part 7 - Advanced Audio Coding (AAC)
Part 8 - Deleted
Part 9 - Extension for real time interfaces.
Part 10 - Conformance extensions for DSM-CC.
MPEG
video spatial domain processing
Spatial domain handled very similarly to
JPEG
Convert RGB values to YUV colorspace
Split frame into 8x8 blocks
2-D DCT on each block
Quantization of DCT coefficients
Run length and entropy coding
Zig-zag scan, run-length coding
Quantization: the major reduction step; controls "quality"
The result is an "intra-frame encoded" picture
MPEG – YUV compression
4:2:0 or 4:1:1 common in consumer products (DV)
4:2:2 common on professional products (DVCPro)
4:4:4 is rarely used – gives no visible improvement compared with 4:2:2
4:2:0 – four Y samples share one U and one V sample
4:2:2 – every two Y samples share one U and one V sample
4:4:4 – every Y sample has its own U and V samples
4:1:1 – four horizontally adjacent Y samples share one U and one V sample
MPEG – Block compression
The same as JPEG
(figure: an 8×8 spatial-domain block is transformed with the DCT into the
frequency domain and quantized; after quantization only the low-frequency
corner of the block holds nonzero values, the high frequencies become zero,
and the block is zig-zag scanned into a sequence for RLE and Huffman coding)
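The zig-zag ordering used before RLE can be generated by walking the block's anti-diagonals, alternating direction. A minimal sketch:

```python
def zigzag_order(n=8):
    """Visit block positions along anti-diagonals, alternating direction,
    so coefficients come out roughly lowest to highest frequency."""
    return sorted(((u, v) for u in range(n) for v in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else -p[0]))

def zigzag_scan(block):
    """Flatten a square block into a 1-D sequence in zig-zag order."""
    return [block[u][v] for u, v in zigzag_order(len(block))]

print(zigzag_order(8)[:6])  # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```

Because quantization zeroes the high frequencies, this ordering pushes the zeros to the end of the sequence, where run-length coding collapses them cheaply.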
MPEG
video time domain processing
(Temporal compression)
Adjacent frames share large similarities
Temporal compression can be achieved in
two ways:
Discarding images (reduce the frame rate)
Through motion estimation and motion vectors
MPEG
video time domain processing
Totally new ballgame (this concept doesn’t exist in JPEG)
General idea – Use motion vectors to specify how a 16x16 macroblock translates between reference frames and the current frame, then code the difference between the reference and the actual block
Types of frames
I frame (intra-coded)
Coded without reference to other frames
P frame (predictive-coded)
Coded with reference to a previous reference frame
(either I or P)
Size is usually about 1/3rd of an I frame
B frame (bi-directional predictive-coded)
Coded with reference to both previous and future
reference frames (either I or P)
Size is usually about 1/6th of an I frame
GOP (Group of Pictures)
GOP is a set of consecutive frames that can be decoded without any other reference frames
Usually 12 or 15 frames
Transmitted sequence is not the same as displayed sequence
Random access to middle of stream – Start with I frame
Things about prediction
Only use a motion vector if a "close" match can be found
Evaluate "closeness" with MSE or another metric
Can't search all possible blocks, so a smart algorithm is needed
If no suitable match is found, just code the macroblock as an I-block
If a scene change is detected, start fresh
Don't want too many P or B frames in a row
Prediction error will keep propagating until the next I frame
Delay in decoding
MPEG – Group Of Pictures
(GOP)
MPEG uses three types of frames
Grouped in a Group Of Pictures (GOP)
I-pictures (Intracoded)
P-pictures (Predictive Coded)
B-pictures (Bidirectionally interpolated)
I B B P B B P
Forward Prediction
Bidirectional Prediction
I B B P B B P
Compressed video stream
Temporal Redundancy Reduction
I frames are independently encoded
P frames are based on previous I, P frames
B frames are based on previous and following I and P frames
In case something is uncovered
MPEG – Motion Estimation
Calculate the position for the macro block in
the new image
Store the motion vector and difference in
appearance
Macro block
MPEG – Motion Estimation
Helps in understanding the content of an image sequence
Helps reduce the temporal redundancy of video
For compression
Stabilizes video by detecting and removing small, noisy global
motions
For building the stabilizer in a camcorder
A hard problem in general!
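The slides do not fix a particular search algorithm; a minimal sketch of the brute-force ("exhaustive search") variant, matching blocks by sum of absolute differences on tiny made-up frames:

```python
def block_match(prev, cur, top, left, size=4, search=2):
    """Exhaustive-search motion estimation: slide the current block over the
    previous frame within +/- search pixels and keep the offset with the
    smallest sum of absolute differences (SAD)."""
    def sad(dy, dx):
        return sum(abs(cur[top + y][left + x] - prev[top + dy + y][left + dx + x])
                   for y in range(size) for x in range(size))
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if 0 <= top + dy and top + dy + size <= len(prev) \
               and 0 <= left + dx and left + dx + size <= len(prev[0]):
                cost = sad(dy, dx)
                if best is None or cost < best[0]:
                    best = (cost, (dy, dx))
    return best[1]  # motion vector (dy, dx)

# Previous frame: a bright square at (2, 2); current frame: the same square at (3, 4).
prev = [[0] * 10 for _ in range(10)]
cur = [[0] * 10 for _ in range(10)]
for y in range(4):
    for x in range(4):
        prev[2 + y][2 + x] = 200
        cur[3 + y][4 + x] = 200
print(block_match(prev, cur, top=3, left=4))  # (-1, -2): the block came from (2, 2)
```

Real encoders replace the full search with smarter patterns (logarithmic, diamond search) since the exhaustive version is far too slow at video rates.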
Bitrate allocation
CBR – Constant BitRate
Streaming media uses this
Easier to implement
VBR – Variable BitRate
DVD’s use this
Usually requires 2-pass coding
Allocate more bits for complex scenes
This is worth it, because you assume that you encode
once, decode many times
MPEG – Data stream
Display order:        I  B  B  P  B  B  P
                      1  2  3  4  5  6  7
Order in data stream: I  P  B  B  P  B  B
                      1  4  2  3  7  5  6
(B frames can only be decoded after both of their reference frames, so the references are sent first.)
MPEG - Audio
MPEG-1: 3 layers of increasing quality, layer 3 being the most common (MP3); 16-bit samples
Sampling rate: 32, 44.1, or 48 kHz
Bitrate: 32 to 320 kbps
De facto: 44.1 kHz sample rate, 192 kbps bitrate
MPEG-2: supports > 2 channels, lower sampling frequencies, low-bitrate improvements
AAC (Advanced Audio Coding): more sample frequencies (8 kHz to 96 kHz)
Higher coding efficiency and a simpler filterbank
96 kbps AAC sounds better than 128 kbps MP3
Usually CBR, but can do VBR
MPEG Container Format
A container format is a file format that can contain data compressed by standard codecs
2 types for MPEG
Program Stream (PS) – Designed for reasonably reliable media, such as disks
Transport Stream (TS) – Designed for lossy links, such as networks or broadcast antennas
AV Synchronization
Want audio and video streams to be played back in sync with each other
Video stream contains “presentation timestamps”
MPEG-2 clock runs at 90 kHz
Good for both 25 and 30 fps
PCR (Program Clock Reference) timestamps are sent with data by sender
Receiver uses PLL (Phase Lock Loop) to synchronize clocks
Real time video encoding
Motion estimation will be worse, so a higher
bitrate is needed to compensate
Very hard to do in software; needs dedicated
hardware or hardware assistance
TiVo and ReplayTV do this
Streaming media
Common types include Flash, RealVideo, QuickTime
Usually have low bandwidth available, need to optimize as such
Want dedicated network protocols for this purpose
TCP will wait indefinitely for retransmission, so is often not suitable
MPEG data stream: analysis
Pros
Overall sharp picture
Audio and video stay in sync with each other
Cons
Picture flashes and blurs when there is too much movement on screen
A higher bitrate often does not solve this problem
What if we were transmitting this over a network?
Image/Video Compression
Standards
Bitstream useful only if the recipient knows the code!
Standardization efforts are important
Technology and algorithm benchmark
System definition and development
Patent pool management
Defines the bitstream (decoder), not how you generate it (encoder)!
current industry focus:
H.264 encoding/decoding on mobile devices,
low-latency video transmission over various networks,
low-power video codec …
audio coding versus image
coding
VC demo