Video Coding
C.M. Liu
Perceptual Signal Processing Lab
College of Computer Science
National Chiao-Tung University
Office: EC538
(03)5731877
http://www.csie.nctu.edu.tw/~cmliu/Courses/Compression/
1. Color Fundamentals
Sir Isaac Newton in 1666
A glass prism
Six Broad Regions
Violet, Blue, Green, Yellow, Orange, and Red.
2. Color Fundamentals
Light Properties
Visible light occupies a relatively narrow band of the electromagnetic spectrum,
with wavelengths of roughly 400 nm to 700 nm.
Achromatic light has only one attribute: intensity (amount).
Three basic quantities used to describe the quality of a chromatic light source:
radiance, luminance, and brightness.
2. Color Fundamentals
Radiance
The total amount of energy that flows from the light source.
It is measured in watts (W).
Luminance
A measure of the amount of energy an observer perceives from a light source.
It is measured in lumens (lm).
Brightness
A subjective descriptor that is practically impossible to measure.
It embodies the achromatic notion of intensity and is one of the key factors in describing color sensation.
2. Color Fundamentals
Human Eye
Cones (6-7 millions in an eye)
65% of all cones are sensitive to red light.
33% are sensitive to green light.
2% are sensitive to blue light.
Primary Colors of Light
Primary Colors
Red, Green, Blue.
Secondary Colors
Cyan = G+B
Magenta = B+R
Yellow = R+G
W=R+G+B.
2. Color Fundamentals
Primary Colors of Pigments
Primary Colors: C, M, Y
C absorbs R
M absorbs G
Y absorbs B
Secondary Colors
C+M absorbs R and G: C+M = B
M+Y absorbs G and B: M+Y = R
Y+C absorbs B and R: Y+C = G
C+M+Y = K (black)
2. Color Fundamentals
Color
Specified by brightness and chromaticity (hue and saturation).
Chromaticity can be regarded as brightness-normalized color.
Hue is an attribute associated with the dominant wavelength in a mixture of light waves.
Saturation refers to the relative purity, or the amount of white light mixed with a hue.
The degree of saturation is inversely proportional to the amount of white light added.
Tristimulus Values (R, G, B)
The amounts of red, green, and blue required to form a specific color.
Trichromatic Coefficients
x = R/(R+G+B)
y = G/(R+G+B)
z = B/(R+G+B)
Note that x+y+z = 1, so the trichromatic coefficients can be represented by x and y alone (a 2D space).
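A minimal sketch of the trichromatic coefficients, assuming the tristimulus values are non-negative numbers that are not all zero. The function name is hypothetical and chosen only for illustration.

```python
def trichromatic_coefficients(r, g, b):
    """Return (x, y, z) for tristimulus values (r, g, b)."""
    total = r + g + b
    if total == 0:
        raise ValueError("tristimulus values must not all be zero")
    return r / total, g / total, b / total


x, y, z = trichromatic_coefficients(2.0, 1.0, 1.0)
print(x, y, z)  # z is redundant: it always equals 1 - x - y
```

Because the three coefficients sum to 1, only x and y need to be stored or plotted.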
2. Color Fundamentals-- CIE Chromaticity Diagram
Pure color is at the boundary of the tongue
Suppose that we have two colors (two points within the tongue)
Any mid-point on the line can be obtained by mixing the two colors
2. Color Fundamentals
Color Gamut
Any point within the corresponding triangle can be obtained by mixing the three colors
Not all colors can be reproduced using only three primary colors.
The boundary of a printer's color gamut is irregular because color printing combines additive and subtractive mixing.
The irregular region represents the colors that a typical printing device based on the CMY space can reproduce.
3. Color Models
Definition
A color model is a specification of a coordinate system and a subspace within that system where each color is represented by a single point.
Purpose
Facilitate the specification of colors in some standard way.
RGB (red, green, blue)
Display devices
CMY (cyan, magenta, yellow) and CMYK(+black)
Printing devices
HSI (hue, saturation, intensity)
Intuitive description of colors.
Hue (red, green, … violet)
Saturation
Red: high saturation
Pink: less saturation
3. Color Models-- RGB
Full Color (True Color)
Each of the R, G, B components is represented by 8 bits
An RGB pixel is said to have a depth of 24 bits
There are 2^24 (= 16,777,216) colors
Safe RGB Colors
A subset of colors that are likely to be reproduced faithfully,
reasonably independently of viewer hardware capabilities.
6^3 = 216 safe colors
6 reproduction levels for each of R, G, B:
0, 51, 102, 153, 204, or 255.
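The safe-color set can be enumerated directly from the six reproduction levels. This is a small sketch; the helper `to_hex` is a hypothetical name used only for display.

```python
from itertools import product

LEVELS = (0, 51, 102, 153, 204, 255)  # the six levels per component

# Every combination of one level per R, G, B component.
safe_colors = list(product(LEVELS, repeat=3))


def to_hex(rgb):
    """Format an (R, G, B) triple as a hexadecimal color string."""
    return "#{:02X}{:02X}{:02X}".format(*rgb)


print(len(safe_colors))        # number of safe colors
print(to_hex((255, 102, 0)))   # one safe color in hex notation
print(2 ** 24)                 # number of full (24-bit) colors
```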
3. Color Models– RGB
3. Color Models– RGB Full Color
3. Color Models– RGB Safe Color
3. Color Models-- CMY
CMY and CMYK Models
Color printers and copiers require CMY data input or perform an
RGB to CMY conversion internally.
C+M+Y = K
Black is the color used most frequently in typical printing. Thus,
black ink is added, yielding the CMYK color model.
3. Color Models-- HSI
HSI Model
Intensity axis : the main diagonal from black to white
A plane perpendicular to this axis contains the colors with the same
intensity
Saturation: the distance from the intensity axis
Hue: the angle on the plane with respect to the red color
3. Color Models– RGB to HSI
Hue
Saturation
Intensity
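A minimal sketch of one common RGB-to-HSI conversion, assuming R, G, B are normalized to [0, 1]. The formulas used here (intensity as the mean, saturation from the minimum component, hue from an arccosine) are the usual textbook definitions and are an assumption, since the slide lists only the three quantities. Returning hue 0 for gray pixels is also an assumed convention.

```python
import math


def rgb_to_hsi(r, g, b):
    """Convert normalized RGB to (H in degrees, S, I)."""
    i = (r + g + b) / 3.0
    if i == 0:
        return 0.0, 0.0, 0.0          # black: hue and saturation undefined
    s = 1.0 - min(r, g, b) / i
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        return 0.0, s, i              # gray: hue undefined, 0 by convention
    ratio = max(-1.0, min(1.0, num / den))   # guard against rounding error
    theta = math.degrees(math.acos(ratio))
    h = theta if b <= g else 360.0 - theta
    return h, s, i


print(rgb_to_hsi(0.0, 1.0, 0.0))  # pure green: hue near 120 degrees
```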
3. Color Models– HSI to RGB
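Case 1: RG sector ($0^\circ \le H < 120^\circ$)
$B = I(1 - S)$
$R = I\left[1 + \frac{S\cos H}{\cos(60^\circ - H)}\right]$
$G = 3I - (R + B)$
Case 2: GB sector ($120^\circ \le H < 240^\circ$), with $H$ first replaced by $H - 120^\circ$
$R = I(1 - S)$
$G = I\left[1 + \frac{S\cos H}{\cos(60^\circ - H)}\right]$
$B = 3I - (R + G)$
Case 3: BR sector ($240^\circ \le H \le 360^\circ$), with $H$ first replaced by $H - 240^\circ$
$G = I(1 - S)$
$B = I\left[1 + \frac{S\cos H}{\cos(60^\circ - H)}\right]$
$R = 3I - (G + B)$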
Converting colors from HSI to RGB
There are three sectors of interest, corresponding to the
120° intervals in the separation of primaries.
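The three-sector conversion can be sketched as follows, assuming H in degrees and S, I in [0, 1]. In each sector one component is I(1 - S), one is I[1 + S cos H / cos(60° - H)] with H measured from the start of the sector, and the third is whatever remains of 3I. The function and helper names are hypothetical.

```python
import math


def hsi_to_rgb(h, s, i):
    """Convert (H in degrees, S, I) to normalized (R, G, B)."""
    h = h % 360.0

    def boosted(angle_deg):
        # I * [1 + S*cos(H) / cos(60 deg - H)], H relative to the sector start
        a = math.radians(angle_deg)
        return i * (1.0 + s * math.cos(a) / math.cos(math.radians(60.0) - a))

    low = i * (1.0 - s)
    if h < 120.0:                 # RG sector
        b = low
        r = boosted(h)
        g = 3.0 * i - (r + b)
    elif h < 240.0:               # GB sector
        r = low
        g = boosted(h - 120.0)
        b = 3.0 * i - (r + g)
    else:                         # BR sector
        g = low
        b = boosted(h - 240.0)
        r = 3.0 * i - (g + b)
    return r, g, b


print(hsi_to_rgb(0.0, 1.0, 1.0 / 3.0))  # approximately pure red
```

Within each sector the offset angle stays in [0°, 120°), so cos(60° - H) never drops below 0.5 and the division is safe.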
3. Color Models– HSI
RGB with HSI representation
Discontinuity.
3. Color Models– HSI
Manipulation Example
Analog Video
Video: A sequence of images played back fast enough to create the illusion of motion.
Early movies:
16 frames per second (fps)
Updated to 24 fps
Double-/triple-blade shutter → artificial 48/72 fps
European standards PAL/SECAM
50 Hz electricity → 25 fps
US/Japan/etc.
60 Hz electricity → 30 fps
29.97 fps (1953 w/ color TV)
TV
Television, a medium. So called because it is neither
rare nor well done.
--Anonymous
The CRT
Interlacing
Odd lines
Retrace
(horizontal blanking)
Even lines
The Color CRT
The NTSC Standard
525 scan lines
482/483/487(?) visible
Aspect ratio: 4:3 (by T. Edison in 1930s)
4/3 x 483 ~ 644 pixels
Image is actually continuous horizontally
Other aspect ratios:
PAL/SECAM: 1.33
16/35mm film: 1.33
HDTV: 1.78
Widescreen film: 1.85
70mm film: 2.10
Cinemascope film: 2.35
Pel aspect ratio
A picture element is not a dot; it is more like a rectangle
Resolution vs. Aspect Ratio
Various Resolutions
TV Formats
Video Parameters
Progressive Scan, Frame Rate.
The RGB-to-YUV Conversion
PAL:
Y = 0.299R + 0.587G + 0.114B
U = -0.147R - 0.289G + 0.436B = 0.492 (B-Y)
V = 0.615R - 0.515G - 0.100B = 0.877 (R-Y)
NTSC:
Y = 0.299R + 0.587G + 0.114B
I = 0.596R - 0.274G - 0.322B = -sin33°U + cos33°V
Q = 0.211R - 0.523G + 0.311B = cos33°U + sin33°V
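A short sketch of these conversions, assuming R, G, B in [0, 1] and the luma weights 0.299, 0.587, 0.114 (which sum to 1). U and V are computed from the color differences B-Y and R-Y, and I, Q are obtained by rotating (U, V) by 33°. The function names are hypothetical.

```python
import math


def rgb_to_yuv(r, g, b):
    """PAL-style YUV from normalized RGB."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v


def yuv_to_yiq(y, u, v):
    """NTSC I and Q are the U, V axes rotated by 33 degrees."""
    a = math.radians(33.0)
    i = -math.sin(a) * u + math.cos(a) * v
    q = math.cos(a) * u + math.sin(a) * v
    return y, i, q


print(rgb_to_yuv(1.0, 1.0, 1.0))              # white: Y near 1, U and V near 0
print(yuv_to_yiq(*rgb_to_yuv(1.0, 0.0, 0.0)))  # red: I near 0.596, Q near 0.211
```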
CCIR 601/ITU-R BT.601-2
Standard sampling rates
Multiples of 3.375 MHz
Sampling patterns Y:Cb:Cr
4:4:4: all components sampled at 13.5 MHz
Typical 4:2:2
Luminance sampled at 13.5 MHz
Chrominance components sampled at 6.75 MHz
RGB to CCIR 601
After RGB-to-YCbCr conversion
Y normalized as Ys ∈ [0, 1]
Cb → Cbs ∈ [-½, ½]
Cr → Crs ∈ [-½, ½]
8-bit integer conversion
Y = 219Ys + 16, Y ∈ [16, 235]
U = 224Cbs + 128, U ∈ [16, 240]
V = 224Crs + 128, V ∈ [16, 240]
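The 8-bit mapping can be sketched as below, assuming normalized inputs Ys in [0, 1] and Cbs, Crs in [-0.5, 0.5]. Rounding to the nearest integer is an assumption; the slide gives only the affine mapping.

```python
def to_8bit_601(ys, cbs, crs):
    """Map normalized Y, Cb, Cr to the 8-bit CCIR 601 ranges."""
    y = round(219 * ys + 16)     # 16 .. 235
    cb = round(224 * cbs + 128)  # 16 .. 240
    cr = round(224 * crs + 128)  # 16 .. 240
    return y, cb, cr


print(to_8bit_601(0.0, 0.0, 0.0))   # black with neutral chroma
print(to_8bit_601(1.0, 0.5, -0.5))  # extremes of each range
```

The unused codes below 16 and above 235/240 leave headroom and footroom for filter overshoot.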
Common Interchange Format (CIF)
Teleconferencing standard
PAL/NTSC-based: YUV/30 fps
Multiples of 16 x 16
SQCIF: 128 x 96
QCIF: 176 x 144
CIF: 352 x 288
4CIF: 704 x 576
16CIF: 1408 x 1152
Pixel aspect ratio: 1.222:1
SIF: Source Input Format
MPEG-1's parlance for CIF
625-line (PAL) & 525-line (NTSC) versions
Motion Compensation
The use of previous frames as a prediction of the current
frame
I.e., exploitation of temporal redundancy
Rationale:
Most of the time, frame-to-frame changes will be 'small'
Idea:
Identify 'objects' that have moved and include a motion
compensation vector
Motion Compensation Example
Frame #1
Frame #2
Frames 1 & 2 overlaid
motion vectors
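Motion Compensation Example
Block-based Motion Compensation
'Pixel-splitting' Motion Estimation
Observation:
Best fit may not be pixel aligned
Idea:
"Double" the image size
I.e., introduce intermediate pixels with interpolated values
For neighbouring pixels A, B (upper row) and C, D (lower row):
$h_1 = \lfloor (A+B)/2 + 0.5 \rfloor, \quad h_2 = \lfloor (C+D)/2 + 0.5 \rfloor$
$v_1 = \lfloor (A+C)/2 + 0.5 \rfloor, \quad v_2 = \lfloor (B+D)/2 + 0.5 \rfloor$
$c = \lfloor (A+B+C+D)/4 + 0.5 \rfloor$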
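A minimal sketch of the intermediate (half-pixel) samples, assuming integer pixel values A, B on the upper row and C, D on the lower row, and rounding by adding 0.5 before truncation. The function name is hypothetical.

```python
def half_pel_samples(a, b, c, d):
    """Interpolated samples between the integer pixels
        A B
        C D
    Each value is an average rounded half up, i.e. floor(mean + 0.5)."""
    h1 = (a + b + 1) // 2            # between A and B
    h2 = (c + d + 1) // 2            # between C and D
    v1 = (a + c + 1) // 2            # between A and C
    v2 = (b + d + 1) // 2            # between B and D
    centre = (a + b + c + d + 2) // 4
    return h1, h2, v1, v2, centre


print(half_pel_samples(10, 13, 20, 21))
```

Integer arithmetic such as `(a + b + 1) // 2` gives the same result as `floor((a + b) / 2 + 0.5)` without floating point.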
Motion Estimation Considerations
Observations:
Smaller block → more possibilities to explore
Larger block → higher chance of not finding a match
Note: Numerous methods exist for balancing prediction
accuracy (compression) & computation time
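One of the simplest such methods is an exhaustive (full) search that minimizes the sum of absolute differences (SAD) over a small window. The sketch below assumes frames stored as lists of rows of integers; the block size, the search range, and the choice of SAD as matching criterion are illustrative assumptions, not something the slides prescribe.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of `cur` at (bx, by) and the block of
    `ref` displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def full_search(cur, ref, bx, by, n=4, search=2):
    """Try every displacement in [-search, search]^2 and keep the best."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= bx + dx and bx + dx + n <= width
                      and 0 <= by + dy and by + dy + n <= height)
            if not inside:
                continue  # candidate block would leave the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


# Reference frame with a unique value per pixel; the current frame is the
# same content shifted one pixel to the right.
ref = [[10 * y + x for x in range(8)] for y in range(8)]
cur = [[ref[y][max(x - 1, 0)] for x in range(8)] for y in range(8)]
mv, cost = full_search(cur, ref, bx=2, by=2)
print(mv, cost)
```

The cost grows with the square of the search range, which is why faster, non-exhaustive search patterns are commonly used in practice.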
Motion Estimation Example
Subpixel Estimation
MPEG-1/2
MPEG-1 (ISO/IEC 11172), completed in 1991
digital storage media at bit rates up to about 1.5 Mbps
remove intra- and inter-frame redundancy with block-based DCT and
motion compensation (I-, P- and B-frames)
progressive pictures only, optimized for SIF (352x240) resolution
fixed 4:2:0 color format
MPEG-2 (ISO/IEC 13818), completed in 1994
extensions that allow for greater input format flexibility, higher data
rates and better error resilience
field/frame prediction modes for interlace format support
field/frame DCT coding syntax
downloadable quantization matrix
scalability extensions (spatial, temporal, SNR)
display syntax (e.g., 3:2 pull-down, pan-and-scan, color formats)
MPEG-3, 4, 7
MPEG-3
– Originally intended for HDTV coding; dropped when the MPEG-2 application domain was extended to HDTV
MPEG-4
– Originally intended for very low bit rate audio/visual coding
– It may be extended to both low and high bit rate applications
– Object-oriented coding algorithm
MPEG-7
– There is no particular reason for choosing the number 7 rather than 5, 6, or another number
– Intended to set a standard, the “Multimedia Content Description Interface”, that will specify a standardized description of various types of multimedia information.
Bitrates and Resolutions
[Figure: standards placed by bit rate (64K, 1M, 1.5M, 15M, 300M, over 600M) and resolution (QCIF, CIF, Standard TV, HDTV, 1080P, over 1080P); regions shown: MPEG-2, MPEG-2 4:2:2 Profile, MPEG-4, MPEG-4 Studio Profile]
MPEG1
Introduction
Video Formats
Frame Reorder
Data Hierarchy
Syntax
Compression Ratio
MPEG1-- Introduction
Backgrounds
ISO/IEC Draft Standard CD 11172, Dec., 1991
Compression and Decompression of Video & Audio Signals
Synchronization of Audio and Video
Lossy Coding Techniques
Features
A Toolkit
Supports intra and interframe modes.
Only progressive-format data is supported.
The algorithm specifies the bit stream syntax and semantics and a method for decoding it.
MPEG-1 Video
Overall structure very similar to H.261
… with some non-trivial differences
Focus on stored as opposed to live video
Random access
In H.261, potentially all frames after the first may depend on the previous one
MPEG-1 provides random access by requiring periodic independently encoded frames (I-frames)
Distance between I-frames is a trade-off between convenience & compression
Also
P-frames—predictively coded
B-frames—bidirectionally predictively coded
MPEG1-- Introduction (c.1)
Features
The algorithm does not specify preprocessing of the video, encoding steps (e.g., motion estimation), or postprocessing.
The algorithm does not specify parameters such as coded bit rate, lines per picture (< 4096), pels per line (< 4096), picture rate (24, 15, or 30), and pel aspect ratio (14 choices).
A special subset of the parameter space
pels/line <= 768
lines <= 576
macroblocks per picture <= 396
macroblocks per sec. <= 396x25
picture rate <= 30
bit rate <= 1.86 Mbits/sec.
4:2:0 Format
Chrominance components have 1/2 the resolution of the Y
component (in both directions)
The MPEG1 format
4:2:0 macroblocks
MPEG1-- Video Formats
[Figure: 4:2:0 sampling grid for Y, Cb, Cr]
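A small sketch of producing a 4:2:0 chroma plane from a full-resolution one, assuming even plane dimensions. Averaging each 2x2 neighbourhood is only one possible down-sampling filter and is an assumption here, since the standard leaves preprocessing to the encoder.

```python
def subsample_420(plane):
    """Halve a chroma plane in both directions by averaging 2x2 blocks."""
    height, width = len(plane), len(plane[0])
    return [
        [
            (plane[y][x] + plane[y][x + 1]
             + plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


cb = [[(x + y) % 256 for x in range(16)] for y in range(16)]
cb_420 = subsample_420(cb)
print(len(cb_420), len(cb_420[0]))  # chroma size for one 16x16 macroblock

# Samples per 4:2:0 macroblock: 16x16 luma plus two 8x8 chroma blocks.
print(16 * 16 + 2 * 8 * 8)
```

A 4:2:0 macroblock therefore carries six 8x8 blocks: four luminance and two chrominance.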
MPEG1-- Video Formats(c.1)
4:1:1 Format
Same bits/pel area as 4:2:0 but vertical resolution is
higher
Used in DVI
[Figure: 4:1:1 sampling grid]
MPEG1-- Video Formats(c.2)
4:2:2 Format
4:2:2 Macroblock
[Figure: 4:2:2 sampling grid for Y, Cb, Cr]
MPEG1-- Video Formats(c.3)
4:4:4 Macroblock
[Figure: 4:4:4 sampling grid for Y, Cb, Cr]
MPEG1-- Frame Reorder
Encoder Input
GOP1 is a closed GOP while GOP2 is an open GOP.
Encoder Output
Decoder Output
1(I) 2(B) 3(B) 4(P) 5(B) 6(B) 7(P) 8(B) 9(B) 10(I) 11(B) 12(B) 13(P)
1(I) 4(P) 2(B) 3(B) 7(P) 5(B) 6(B) 10(I) 8(B) 9(B) 13(P) 11(B) 12(B)
1(I) 4(P) 2(B) 3(B) 7(P) 5(B) 6(B) 10(I) 8(B) 9(B) 13(P) 11(B) 12(B)
GOP1 GOP2
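The reordering from display order to bitstream order can be sketched as follows: every B picture is held back until the next I or P picture (its backward reference) has been emitted. The function name and the tuple representation are hypothetical.

```python
def display_to_bitstream(frames):
    """frames: list of (number, type) in display order.
    Returns the same frames in coding (bitstream) order."""
    out, pending_b = [], []
    for frame in frames:
        if frame[1] == "B":
            pending_b.append(frame)    # wait for the next reference picture
        else:
            out.append(frame)          # I or P goes out first ...
            out.extend(pending_b)      # ... then the B pictures that need it
            pending_b = []
    out.extend(pending_b)              # trailing B pictures, if any
    return out


display = list(enumerate("IBBPBBPBBIBBP", start=1))
coded = display_to_bitstream(display)
print(" ".join(f"{n}({t})" for n, t in coded))
```

In the output, pictures 8(B) and 9(B) follow 10(I) even though they also depend on 7(P) from the previous group, which is what makes the second GOP open.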
MPEG1-- Data Hierarchy
Video Sequence
The highest syntactic structure of the coded video bitstream.
Sequence header, sequence extension.
Group of Pictures (GOP)
Picture
Slice
Macroblock
Block
MPEG1-- Data Hierarchy (GOP)
I Pictures
Not dependent on any other picture
P Pictures
Predicted from I or P pictures
B Pictures
Predicted from nearby I and/or P pictures
[Figure: pictures 1 to 8 followed by picture 1 of the next group, with types I B B B P B B B I; arrows mark forward prediction and bidirectional prediction]
MPEG-1 Frame Types
I: All information for frame present.
P: Predictively encoded from previous I or P.
B: Predictively encoded from previous I or P
and next I or P.
I B B P B B P B B P B B P B B I
MPEG-1 Display vs. Bitstream Order
Problem: B-frames depend on the future. How could they be
decoded?
Solution? Reorder frames!
Frame type:       I  B  B  P  B  B  P  B  B  P  B  B  P  B  B  I
Display order:    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
Bitstream order:  1  3  4  2  6  7  5  9 10  8 12 13 11 15 16 14
MPEG1-- Functional Blocks
[Figure: encoder block diagram. Video input, DCT, Q, VLC, buffer, output; a feedback path through Q-1 & DCT-1, the frame store, and ME & MC]
Control signals, changeable per macroblock:
• Macroblock type
• Coded Block Pattern
• Quantizer Scale Factor
MPEG1-- Motion Compensation & Estimation
Objectives
Reduce the temporal redundancy
Motion Compensation
The process of compensating for the displacement of moving objects from one frame
to another.
Motion Estimation
The process of finding corresponding pixels between frames.
[Figure: pictures at t, t+1, t+2 (labels B, P, I) with 16x16 macroblocks and a search area]
Motion/No motion: decide whether a motion vector is transmitted or assumed to be zero.
Intra/Non Intra: check the variance of the estimation errors using the vector in step 1.
Coded/Not coded: check whether the residual is large enough to be coded using DCT.
Quant/No Quant: check whether the quantizer scale is satisfactory or should be changed.
MPEG1-- Encoded Tree for Macroblocks in P-picture
[Figure: decision tree for a macroblock in a P-picture. Successive branches: Motion / No Motion, Intra / Non Intra, Coded / Not Coded, Quant / No Quant (MQuant / No MQuant); one leaf is the skipped MB]
MPEG-1 Rate Control
Sequence level: within a GOP (Group of Pictures)
B-frames are easiest to eliminate
Frame level:
Quantization step adjustment
Dropping of higher-order coefficients
Constrained Parameter Bitstream (CPB)
Horizontal size ≤ 768 pixels
Vertical size ≤ 576 pixels
396 macroblocks/frame @ 25 fps (352 x 288 pixels)
330 macroblocks/frame @ 30 fps (352 x 240 pixels)
Rate is 1-1.5 Mbit/s
CPB is understood to be the MPEG-1 typical setup
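MPEG1-- Tree Decision for Macroblocks in P-picture
VAR: variance of original
VAROR: variance of reconstructed error
[Figure, left: Motion Compensation vs. No Motion Compensation regions; the axes x and y are block differences (BD) divided by 256, one computed with the motion vector and one with a zero vector; marked values 0.5, 1, 1.5, 2.7, 3 and the line y = x/1.1]
[Figure, right: Intra vs. Non Intra regions on the axes VAR and VAROR, with the value 64 marked on both axes]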
MPEG1-- Macroblock Types in B-picture
Macroblock Types
Intracoded
Forward predictive coded
Backward predictive coded
Bidirectional predictive coded
Bidirectional prediction
Yields accurate prediction in the case of covered/uncovered image areas.
Two pictures are needed to decode a B picture.
Significant compression relative to unidirectional prediction (7 kbits/picture versus 100 kbits/picture).
MPEG1-- Encoded Tree for Macroblocks in B-picture
[Figure: decision tree for a macroblock in a B-picture. First-level branches: Forward, Backward, Forward & Backward, Intra; further branches: Coded / Not Coded and MQuant / No MQuant]
MPEG-1 Bitstream Syntax
Macroblock and Slice
MPEG-1 Bitstream Syntax (2)
extension.header 0000 01B5
GOP.start 0000 01B8
picture.start 0000 0100
reserved 0000 01B0
reserved 0000 01B1
reserved 0000 01B6
sequence.end 0000 01B7
sequence.error 0000 01B4
sequence.header 0000 01B3
slice.start.1 0000 0101
…
slice.start 0000 01AF
user.data.start 0000 01B2
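Since every start code consists of the byte prefix 00 00 01 followed by one code byte, a bitstream can be scanned for them with a plain byte search. This is a minimal sketch using the code values from the table above; the function names and the label for unlisted values are hypothetical.

```python
START_CODES = {
    0x00: "picture.start",
    0xB2: "user.data.start",
    0xB3: "sequence.header",
    0xB4: "sequence.error",
    0xB5: "extension.header",
    0xB7: "sequence.end",
    0xB8: "GOP.start",
}


def classify(code_byte):
    """Name the start code identified by the byte after 00 00 01."""
    if 0x01 <= code_byte <= 0xAF:
        return "slice.start"
    return START_CODES.get(code_byte, "reserved or other")


def find_start_codes(data):
    """Return (offset, name) for each 00 00 01 xx pattern in `data`."""
    found, pos = [], 0
    while True:
        pos = data.find(b"\x00\x00\x01", pos)
        if pos < 0 or pos + 3 >= len(data):
            break
        found.append((pos, classify(data[pos + 3])))
        pos += 3
    return found


stream = bytes.fromhex("000001B3" "AABB" "000001B8" "00000100"
                       "00000101" "000001B7")
for offset, name in find_start_codes(stream):
    print(offset, name)
```

The coded data is constructed so that the prefix cannot occur by accident, which is what lets a decoder resynchronize by searching for it.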
MPEG1-- Discrete Cosine Transform
Objectives
Orthogonal transform
Filter-bank-oriented
With the frequency domain interpretation
A fast algorithm and a close approximation to the optimal for a large class of
images
Transform a block in spatial domain into another domain suitable for removing
spatial and psychovisual redundancy
$DCT(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{j=0}^{7} \sum_{k=0}^{7} I(j,k) \cos\left[\frac{(2j+1)u\pi}{16}\right] \cos\left[\frac{(2k+1)v\pi}{16}\right]$
where $C(x) = 1/\sqrt{2}$ for $x = 0$ and $C(x) = 1$ for $x \ne 0$
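A direct (unoptimized) evaluation of the 8x8 DCT definition, with the 1/4 scale factor, C(0) = 1/sqrt(2), and C(x) = 1 otherwise. It is meant only to make the formula concrete; real codecs use fast factorizations instead of the four nested loops shown here.

```python
import math

N = 8


def c(x):
    """Normalization factor of the DCT definition."""
    return 1.0 / math.sqrt(2.0) if x == 0 else 1.0


def dct_8x8(block):
    """2-D DCT of an 8x8 block given as a list of rows."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            acc = 0.0
            for j in range(N):
                for k in range(N):
                    acc += (block[j][k]
                            * math.cos((2 * j + 1) * u * math.pi / 16)
                            * math.cos((2 * k + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * acc
    return out


flat = [[100] * N for _ in range(N)]
coeffs = dct_8x8(flat)
print(round(coeffs[0][0], 6))  # a flat block has only a DC coefficient
```

For a constant block all of the energy lands in the DC term, which is why smooth image regions compress so well after quantization.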
MPEG1-- Quantization
Concepts
Quantization, in combination with run-length coding, contributes
most of the compression
Visual quality is achieved by adaptive
quantization
Coarser quantizer for higher frequencies
Application-specific quantization matrix
8 16 19 22 26 27 29 34
16 16 22 24 27 29 34 37
19 22 26 27 29 34 34 38
22 22 26 27 29 34 37 40
22 26 27 29 32 35 40 48
26 27 29 32 35 40 48 58
26 27 29 34 38 46 56 69
27 29 35 38 46 56 69 83
For intra blocks (both luminance and chrominance)
For non-intra blocks, all weights are 16
$QF(v,u) = \frac{1}{2}\left(\frac{32\, I(v,u)}{W(v,u) \cdot quantizer\_scale} - k\right)$
where $k = 0$ for intra blocks
and $k = \mathrm{sign}(I(v,u))$ for non-intra blocks
Quantizer
MPEG1-- VLC & Runlength Coding
Reduce the coding redundancies
Runlength coding in Zig-Zag Scanning
Variable length coding for the value and runs
3 0 0 0 0 0 0 0
2 -2 0 0 0 0 0 0
4 0 20 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
run-length level run value
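A sketch of zig-zag scanning followed by (run, level) pairing, applied to the 8x8 example block above. The zig-zag order is generated rather than tabulated, and the DC coefficient is returned separately; both are illustrative choices, and the variable-length code assignment itself is not shown.

```python
N = 8

# Zig-zag order: walk the anti-diagonals, alternating direction.
ZIGZAG = sorted(
    ((i, j) for i in range(N) for j in range(N)),
    key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]),
)


def run_level_pairs(block):
    """Return (dc, pairs), where pairs lists (zero_run, level) for each
    non-zero AC coefficient in zig-zag order."""
    scan = [block[i][j] for i, j in ZIGZAG]
    pairs, run = [], 0
    for value in scan[1:]:          # scan[0] is the DC coefficient
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return scan[0], pairs           # trailing zeros become end-of-block


block = [
    [3, 0, 0, 0, 0, 0, 0, 0],
    [2, -2, 0, 0, 0, 0, 0, 0],
    [4, 0, 20, 0, 0, 0, 0, 0],
] + [[0] * N for _ in range(5)]

dc, pairs = run_level_pairs(block)
print(dc, pairs)
```

Each pair would then be mapped to a variable-length code, with short codes for the most frequent (run, level) combinations.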
MPEG1-- Syntax
Sequence Layer
Sequence_header_code
Horizontal_size
Vertical_size
Pel_aspect_ratio
Picture_Rate
Bit_Rate
Quantizer specification
User Data
MPEG1-- Syntax (c.1)
Group of Picture Layer
Group start code
Time code
Open/closed GOP
User data
Picture Layer
Picture header
temporal_reference
picture_coding_type (I, P, B, DC only)
MPEG1-- Syntax (c.2)
Slice (resynchronization)
Slice_start_code
Macroblock Layer(motion compensation)
Macroblock header
Macroblock_address_increment
Macroblock_type
Quantizer scale
Motion_vector
Block Layer-Block Data (DCT Unit)
DCT data
MPEG1-- Compression Ratios at Each Stage
Stage: size after the stage
Single frame at 640x480x24 bpp: 910 KB
Preprocess (filter to reduce noise): ??
YUV (4:2:2) conversion (from RGB): 460 KB
Scaling to CIF: 115 KB
DCT: 115 KB
Quantization: 115 KB
Run-length + Huffman coding: 24 KB
Intraframe compression: 5 KB
Result: a 23:1 compression ratio relative to the 115 KB CIF frame, or a 184:1 compression ratio relative to the original frame
MPEG2-- MPEG Working Group
MPEG
Formed in 1988 to establish standards for coding of moving
pictures and associated audio for various applications such
as storage media, distribution and communication.
ITU-T SGXV
The Experts Group for ATM Video Coding was formed in
1990 to develop video coding standards appropriate for B-
ISDN using ATM transport.
MPEG2 Draft
Draft was prepared by MPEG and ITU-T SG15
MPEG2 Standards
Eight Parts
13818-1 System
13818-2 Video
13818-3 Audio
13818-4 Conformance Testing
13818-5 Simulation Software
13818-6 Digital Storage Media Command and Control (DSM-CC, July, 1996)
13818-7 Nonbackwards Compatible Audio (April 1997)
13818-8 10 bit Video
13818-9 Real-Time Interface (July 1996)
Profiles
4:2:2 Profile ( Jan, 1996)
Multiview Profile (Nov., 1996)
MPEG2 Applications
BSS Broadcasting Satellite Service (to the home)
CATV Cable TV Distribution on Optical Networks, Copper, etc.
CDAD Cable Digital Audio Distribution
DAB Digital Audio Broadcasting (terrestrial and satellite broadcasting)
DTTB Digital Terrestrial Television Broadcasting
EC Electronic Cinema
ENG Electronic News Gathering(including SNG, Satellite News Gathering)
FSS Fixed Satellite Service (e.g. to head end)
MPEG2 Applications (c.1)
HTT Home Television Theatre
IPC International Communications (videoconferencing, videophone, etc)
ISM Interactive Storage Media (optical disks, etc)
MMM Multimedia Mailing
NCA News and Current Affairs
NDB Networked Database Services (via ATM, etc.)
RVS Remote Video Surveillance
SSM Serial Storage Media (digital VTR, etc.)
MPEG2-- System
Video audio synchronization
Multiplexing multiple programs
Transporting over communication channels
Multi-media on CD-ROM
Broadcasting
Digital Storage Media Command and Control (DSM-CC) protocol
MPEG2-- Audio
Three Layers Coding (I, II, III)
3/2 Stereo (3 front/ 2 surround) plus Low
Frequency Enhancement (LFE) Channels
Downwards and Backwards Compatibility
Multi-Lingual Capability
Multi-Channel Audio Coding
MPEG2-- Video
Scalable and Nonscalable Syntax
Profiles and Levels
Progressive and Interlaced Sequences
Frame and Field Picture Processing
Error Concealment
MPEG2-- Video
MPEG-2 Enhancements
– Basic coding mode is interframe DCT with I, P, B pictures
– New field/frame prediction modes for interlace support
– Quantization/coding extensions to MPEG-1 syntax for improved quality
Improved quantization with greater range/adaptivity
New intra-frame VLCs
– Scalability extensions for hierarchical service, robustness, etc.
Spatial scalability modes for compatibility
Temporal scalability
SNR scalability
Data partitioning (frequency scalability)
– New system layer for multiplexing, transport, etc.
MPEG2-- Video
MPEG-2 Prediction Modes
– MB syntax extended to include a number of alternative
prediction modes for better compression of interlaced video
– Frame-based prediction (identical to that of MPEG-1)
I B P
1 or 2 vector
MPEG2-- Video
– Field-based prediction
Each field of a MB is predicted separately in this mode
– Adaptive field/frame selection based on better match (should say better compression performance)
I B P
2 or 4 vectors 2 vectors
MPEG2-- Video
– Special prediction mode - dual prime
Basically a set of field motion vectors with a scaling to near
or far field, plus a transmitted delta
Reference Prediction
MPEG2-- Video
Field/frame DCT coding syntax
Field DCT Coding Luminance Macroblock Frame DCT Coding
Note: Chrominance blocks in 4:2:0 mode are always DCT coded in Frame order
MPEG2-- Video
Alternative Zig-Zag scan
8x8 block of quantized DCT coefficients
Normal Zig-Zag scan.
Mandatory in MPEG-1
Option in MPEG-2
Alternative Zig-Zag scan
Not used in MPEG-1
Option in MPEG-2
For Frame DCT
coding of interlaced
video, more energy
exists here, so run
length coding is more
efficient.
MPEG2-- Profile and Levels
Concepts
MPEG2 is a generic standard and it is not practical to implement the full specification at the early stages of its adoption.
A limited number of subsets have been defined by means of "profile" and "level".
Profile
A subset of the bitstream syntax. Within this subset, it is still possible to have a large variation in encoders and decoders on values taken by parameters in the bitstream.
Levels
Levels are defined within each profile to deal with the variation in a profile.
A level within a profile is a defined set of constraints imposed on parameters in the bitstream.
[Figure: relationship between syntax, profile, and level]
Profiles and levels
Each entry lists: max. resolution (pels/lines/frames per sec) and Y samples/sec; min. resolution and Y samples/sec (scalable profiles); # of layers; bit rates (Mbps)
Simple (4:2:0)
Main level: 720/576/30, 10.4 M; layers 1/1/1; 15
Main (4:2:0)
High level: 1920/1152/60, 62.7 M; layers 1/1/1; 80
High (1440) level: 1440/1152/60, 47.0 M; layers 1/1/1; 60
Main level: 720/576/30, 10.4 M; layers 1/1/1; 15
Low level: 352/288/30, 3.05 M; layers 1/1/1; 4
Main+ (4:2:0, scalable)
High (1440) level: max. 1440/1152/60, 47.0 M; min. 720/576/30, 10.4 M; layers 3/2/2; 60(a), 40(mid+b), 15(b)
Main level: 720/576/30, 10.4 M; layers 2/1/2; 15(a), 10(b)
Low level: 352/288/30, 3.05 M; layers 2/1/2; 4(a), 3(b)
High (scalable)
High level: max. 1920/1152/60, 62.7/83.6 M; min. 960/576/30, 14.8/19.7 M; layers 3/2/2; 100(a), 40(m+b), 15(b)
High (1440) level: max. 1440/1152/60; min. 720/576/30, 11.1/14.8 M; layers 3/2/2; 80(a), 60(m+b), + 20(b)
Main level: max. 720/576/30, 11.1/14.8 M; min. 352/288/30, 3.05 M; layers 3/2/2; 20(a), 15(m+b), 4(b)
MPEG2-- Profile and Level (c.1)
MPEG2-- Compatibility between Different Profiles/Levels
Columns (left to right): NP@HL, NP@H-14, NP@ML, M+@H-14, M+@ML, M+@LL, MP@HL, MP@H-14, MP@ML, MP@LL, SP@ML
NP@HL x
NP@H-14 x x
NP@ML x x x
M+@H-14 x x x
M+@ML x x x x x
M+@LL x x x x x x
MP@HL x x
MP@H-14 x x x x x
MP@ML x x x x x x x x x
MP@LL x x x x x x x x x x
SP@ML x x x x x x x x x
MPEG2-- Scalable Extensions
Motivation
Support applications such as video on ATM,
interworking of video standards, HDTV with embedded
TV, etc.
Four Modes of Scalability
Data Partitioning
SNR Scalability
Spatial Scalability
Temporal Scalability
HDTV Compression
"Grand Alliance"
FCC-encouraged partnership to define HDTV standard
MPEG-2 compression
HDTV == MP@HL
H.263
Based on H.261
Focus on non-interlaced video
GOBs/slices
Strip of pixels w/ multiple of 16
Bottom strip may have fewer
GOB macroblocks
Main upgrades
Works with P & I frames
Motion compensation [-16, 15.5]
Prediction == median of motion vectors of neighbors
Half-pixel motion compensation
H.263: Bitstream Structure
H.263 Optional Modes
Unrestricted motion vector
[-31.5, 31.5] useful for higher resolutions
Motion vector can point outside picture
Syntax-based arithmetic coding
Variable-length codes replaced w/ AC, m = 16
Specifies various CC tables:
MVector, intra-DC, intra-/inter-AC coefficients
Advanced prediction
Four luminance vectors (vs. one for baseline)
Overlapped Block Motion Compensation (OBMC)
Weighted sum of predictions
PB-frames
P + B picture/frame
H.263+ Modes
Advanced intra coding
Prediction-based encoding for coefficients
Deblocking filter
Smoothing of block boundaries for better prediction
Reference picture selection
Selection of reference frame other than the preceding
Temporal, SNR, & spatial scalability
Similar to MPEG-2
Temporal achieved through separate B frames
SNR through layering
Spatial through upsampling
H.263+ Modes (2)
Reference picture resampling
Resizing/warping of reference picture to obtain better prediction
Reduced resolution update
For highly active scenes
Macroblock is assumed twice as high/wide
Alternate VLC
Enables the use of intra frame codes for inter coding
Helps during high-activity periods
Modified quantization
Split luminance/chrominance quantization
Escape sequences for overload situations
Enhanced reference picture selection
H.264/MPEG-4 Part 10
Same baseline macroblock structure, plus
Submacroblocks:
8x4, 4x8, 4x4, 8x8, 16x8, 8x16
Motion compensation
Variable granularity motion tracking at various
levels of detail
Quarter-pixel accuracy
Block-edge filters
Up to 32 possible reference pictures
B pictures—up to two motion vectors (as before)
Pskip—only motion vector is transmitted
H.264 Transform
New 4x4 DCT/DWHT combination
Pros
Simpler implementation
Better for small stationary
Less noise & noise propagation
Cons
Not normalized--compensated through scaling during quantization
H.264 Intra Prediction
H.261-263: no de-correlation for I frames
H.264: prediction for intra-coding (9 modes)
Components in an MPEG-4 Terminal
[Figure: data from the network layer is demultiplexed into elementary streams and decompressed into primitive AV objects, composition information, and a scene description (script or classes); composition and rendering then produce the hierarchical, interactive, audiovisual scene; upstream data (user events, class requests, ...) flows back toward the network]
Basics of MPEG-4
A scene is constructed of multiple independent objects
Audio or visual, natural or synthetic
Objects can be encoded separately, together with scene description information
This makes it possible to combine different object types, e.g., animation with natural
video, 3D meshes, Web pages, ...
Objects are composited into a scene at the decoder side:
MPEG-4 has standardized a binary format for scene description, referred to as BIFS,
which is based on VRML
This makes it possible to multiplex and synchronize the data associated with objects, so
that they can be transported over a network providing a QoS appropriate for
the nature of the specific objects
It also allows interactivity with the audiovisual scene generated at the receiver's side
Parts of the Standard
Part 1: Systems
Part 2: Visual
Part 3: Audio
Part 4: Conformance
Part 5: Software framework
Part 6: DMIF
Part 7: Optimized encoder tools
Part 8: MPEG-4 on IP
Part 9: Reference hardware
Part 10: Advanced Video Coding
Video Object and Video Object Plane
VO3 (Background)
VO2 (Moving Object)
VO1 (Stationary Object)
VOP: instance of a video object at a given time
Syntax Hierarchy
Video Object Sequence: VOS0, VOS1
Visual Object: VO0, VO1
Video Object Layer: VOL0, VOL1
Group of Video Object Plane: GOV0, GOV1
Video Object Plane: VOP0 ..... VOPn, VOPn+1 ..... VOPm
Encoder/Decoder Structure
[Figure: encoder: input, VOP definition, separate coding of VOP 0, VOP 1, VOP 2, MUX, bitstream; decoder: bitstream, DEMUX, separate decoding of VOP 0, VOP 1, VOP 2, composition, output]
VOP Encoder
[Figure: VOP encoder block diagram with shape coding, motion estimation, motion compensation, motion coding, texture coding of the prediction difference, the previous reconstructed VOP, a MUX, and a buffer]
VOP Formation
[Figure: an object enclosed by its tightest rectangle and by an extended bounding box; control MBs and control points define the intelligently generated VOP]
Code the boundary of each block with shape coding
Binary Shape Coding
Context-based arithmetic encoder (CAE)
Basic idea
operates on the macroblock level
compute a context from neighboring pixels, a 9- or 10-bit integer
based on the context, use a LUT to get the probability (pixel is either 0 or 1)
obtain a sequence of probabilities that drive an arithmetic encoder
$C = \sum_k c_k \cdot 2^k$
[Figure: context templates around the current pixel x. (a) Intra: pixels C0 to C9 in the current frame. (b) Inter: pixels C0 to C8 taken from the previous and current frames]
Lossy Shape Coding
CAE is able to achieve a lossless representation
For rate reduction, MPEG-4 allows the encoder to sub-sample the blocks by
a factor of 2 or 4: lossy shape coding
distortion is the difference between the original and the up-sampled block
must also transmit the conversion ratio
[Figure: an M x M block is down-sampled to (M x CR) x (M x CR) and up-sampled back to M x M; the difference from the original is the conversion error]
P-VOP Motion Estimation/Compensation
Basic techniques
motion estimation (ME) modified for arbitrarily shaped VOP
full and half pixel motion vectors, Intra/Inter decisions
Padding Process
pixels outside of VOP boundary must be padded before ME
padded pixels are not included in matching process
Advanced prediction
16x16, 8x8, and field predictions
16x16 mode (1 MV)
8x8 mode (4 MVs)
Differential Coding of MVs for P-VOPs
[Figure: positions of the candidate predictors MV1, MV2, MV3 around the current MV, for the 16x16 mode and for each block of the 8x8 mode]
MVDx = MVx - Px
Px = median(MV1x, MV2x, MV3x)
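A minimal sketch of the median prediction and the resulting differential vector. The slide gives only the x component; treating the y component the same way is an assumption, as are the function names.

```python
def median3(a, b, c):
    """Median of three values."""
    return sorted((a, b, c))[1]


def mv_difference(mv, mv1, mv2, mv3):
    """Differential motion vector (MVDx, MVDy) of `mv` with respect to the
    component-wise median of the three candidate predictors."""
    px = median3(mv1[0], mv2[0], mv3[0])
    py = median3(mv1[1], mv2[1], mv3[1])   # assumed analogous to x
    return mv[0] - px, mv[1] - py


print(mv_difference((3, -1), (2, 0), (4, -2), (1, 5)))
```

Because neighbouring blocks tend to move together, the difference is usually small and therefore cheap to code with a variable-length code.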
Texture Coding
Basic techniques
DCT with motion compensation as in MPEG-2
VLC tables for DC coefficients
run-length coding and VLC tables for AC coefficients
New methods
Intra DC/AC prediction for I- and P-VOPs
texture coding for arbitrarily shaped VOPs
low-pass extrapolation (LPE) technique
shape-adaptive DCT (SA-DCT)
supports both H.263 and MPEG quantization methods
Adaptive DC Prediction
Choose best DC predictor based on gradients of the DC values
if (|QDCA - QDCB| < |QDCB - QDCC|)
QDCX' = QDCC
else
QDCX' = QDCA
Obtain differential DC value from this best predictor
[Figure: the current block X, its neighbouring blocks A, B, C, D, and the macroblock containing X and Y]
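The gradient test can be written out as a small function. The names are hypothetical; `qdc_a`, `qdc_b`, `qdc_c` stand for the quantized DC values of the neighbouring blocks A, B, C, and `qdc_x` for that of the current block.

```python
def predict_dc(qdc_a, qdc_b, qdc_c):
    """Pick the DC predictor for block X from the local DC gradients."""
    if abs(qdc_a - qdc_b) < abs(qdc_b - qdc_c):
        return qdc_c
    return qdc_a


def differential_dc(qdc_x, qdc_a, qdc_b, qdc_c):
    """Value that is actually coded: DC of X minus its predictor."""
    return qdc_x - predict_dc(qdc_a, qdc_b, qdc_c)


print(predict_dc(10, 11, 30))           # small A-B gradient: predict from C
print(differential_dc(28, 10, 11, 30))  # residual relative to that predictor
```

The idea is to predict along the direction in which the DC values change least, so the coded difference stays small.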
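Adaptive AC Prediction
Either the coefficients from the first row or the first column of a previously coded block are used to predict the co-sited coefficients of the current block
The best direction is chosen from the direction of the DC prediction
[Figure: the first row or first column (next to the DC coefficient) of neighbouring block A or B predicts the same positions in the current block X; macroblock with blocks X and Y]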
Q-Step Scaling for AC Prediction
To compensate for differences in the quantization of the previous horizontally or
vertically adjacent blocks used in AC prediction of the current block, scaling
of prediction coefficients becomes necessary
For example, if block A was chosen to predict block X:
$QAC'_{i0X} = \frac{QAC_{i0A} \cdot QP_A}{QP_X}$
Note: complexity increases due to division and storage of the previous row of
AC coefficients
Low-Pass Extrapolation Padding
3 types of MBs in a VOP with arbitrary shape
completely located inside VOP (no special treatment)
completely located outside VOP (skipped, 1 bit)
blocks that lie on the boundary (need padding before DCT)
LPE padding for intra blocks only
Step 1: assign pixels outside the VOP boundary the mean value of pixels inside
the VOP
Step 2: beginning from upper-left, process outside pixels row by row, taking the
average of 4 pixel values
This process is intended to fill in the undefined pixel
values, while not adding significant energy to HF
Shape-Adaptive DCT (SA-DCT)
The SA-DCT algorithm is based on predefined orthonormal sets of DCT basis functions
Apply 1D DCT vertically and horizontally according to the number of active pixels in the row and column of the block
Final number of the SA-DCT coefficients is identical to the number of active pixels of image
Zigzag scan is modified so that non-active coeffs are neglected
[Figure: active image pixels, coefficients after the column DCTs, and the SA-DCT result after the row DCTs]
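A compact sketch of the idea, assuming an orthonormal DCT-II of variable length and the usual shift-then-transform order (columns first, then rows). The function names are hypothetical and no attempt is made to match the normative basis definitions.

```python
import math


def dct_1d(values):
    """Orthonormal DCT-II of a list of any length n >= 1."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i, v in enumerate(values)))
    return out


def sa_dct(block, mask):
    """Shape-adaptive DCT of a square block; returns one coefficient list per row."""
    size = len(block)
    # Column pass: collect the active pixels of each column (shift up) and
    # transform them with a DCT whose length is the number of active pixels.
    columns = []
    for x in range(size):
        active = [block[y][x] for y in range(size) if mask[y][x]]
        columns.append(dct_1d(active) if active else [])
    # Row pass: collect the coefficients present at each depth (shift left)
    # and transform them with a DCT of matching length.
    rows = []
    for y in range(size):
        row = [col[y] for col in columns if len(col) > y]
        rows.append(dct_1d(row) if row else [])
    return rows


if __name__ == "__main__":
    mask = [[0, 1, 1, 0], [1, 1, 1, 0], [1, 1, 0, 0], [0, 1, 0, 0]]
    block = [[5] * 4 for _ in range(4)]
    print([len(r) for r in sa_dct(block, mask)])
```

The number of output coefficients always equals the number of active pixels, which is the property stated on the slide.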
Error Resilience: Resynchronization
Enable resynchronization between the decoder and bitstream after an error has been detected
Packet approach: provide periodic resync markers based on the number of bits within a packet, not the number of MBs in a packet
header information is contained at the start of a packet so that decoding can be restarted
all predictively coded info must be contained in one packet to prevent error propagation
Interval synchronization to avoid start code emulation: start codes appear only at legal fixed interval locations
Error Resilience: Data Recovery
After synchronization has been reestablished, data recovery attempts to recover data that would otherwise be lost
Reversible Variable Length Codes (RVLC)
variable length codes designed so that they can be decoded in both the forward and the backward direction
loss of coding efficiency, but substantial increase in error resilience
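[Figure: video packet layout: Resync Marker, Macroblock_number, quant_scale, HEC, Motion & Header Information, Motion Marker, Texture Information, Resync Marker. Within the texture part (Texture Header, TCOEF), decoding proceeds forward up to the errors and backward from the next resync marker]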
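To make the two-way decoding property concrete, here is a toy example. The four-codeword table is hypothetical and chosen only because palindromic, prefix-free codewords are trivially reversible; it is not the RVLC table of the standard.

```python
# Hypothetical reversible code: every codeword is a palindrome and no codeword
# is a prefix (and therefore no codeword is a suffix) of another.
RVLC_TABLE = {"a": "0", "b": "11", "c": "101", "d": "1001"}


def encode(symbols):
    """Concatenate the codewords of the given symbols into a bit string."""
    return "".join(RVLC_TABLE[s] for s in symbols)


def decode(bits, table):
    """Greedy prefix decoding of a bit string with the given code table."""
    inverse = {code: sym for sym, code in table.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            symbols.append(inverse[current])
            current = ""
    return symbols


def decode_forward(bits):
    return decode(bits, RVLC_TABLE)


def decode_backward(bits):
    """Decode starting from the end of the packet, then restore symbol order."""
    reversed_table = {sym: code[::-1] for sym, code in RVLC_TABLE.items()}
    return decode(bits[::-1], reversed_table)[::-1]


if __name__ == "__main__":
    stream = encode(list("abcdab"))
    print(stream, decode_forward(stream), decode_backward(stream))
```

With an error in the middle of a packet, a decoder could keep the symbols recovered from the front and those recovered from the back and discard only the span in between.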
Error Concealment
Assuming resynchronization can localize errors, attempt to conceal errors by using available info
Data Partitioning: separate motion and texture bits
place resync marker in between
if texture is lost, use motion to conceal the error
Outside of the standard, error concealment can be done in a number of other ways
for example, if a motion vector is lost, try to predict it from ones that have already been decoded
Sprite Coding
A sprite is an image composed of pixels that are visible throughout an entire video segment
e.g., sprite contains all the background pixels in a panning sequence
Initial sprite coded with I-VOP techniques, then updated
Sprites can be used to reconstruct and predict VOPs
Need to estimate warping parameters that define the relation between the sprite and pixels in a VOP (global motion parameters)
Good coding efficiency for scenes with global motion
Object-Based Scalable Encoding
Spatial Scalability
Temporal Scalability
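[Figure: spatial scalability, showing base layer and enhancement layer VOPs of types I, P, and B; temporal scalability, showing base layer VOL0 at frame numbers 0, 6, 12 and enhancement layer VOL1 on a frame number axis labeled 0, 2, 4, 6, 8, 10, 12]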
Wavelet-based Texture Coding
Still Texture Object
Decomposition of image using DWT
high coding efficiency
excellent for spatial and SNR scalability
Quantization of wavelet coefficients
LL band is coded using DPCM
higher-order bands are coded using zero-trees
Entropy coding using adaptive arithmetic encoder
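[Figure: still texture encoder. The input passes through the DWT; the low-low band goes through QUANT and prediction to the AC block; the other bands go through QUANT and zero-tree scanning to the AC block; the output is the bitstream]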
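The decomposition and the DPCM step for the LL band can be sketched as below. Several things are assumptions made for brevity: a one-level Haar filter stands in for whatever wavelet filter is actually used, quantization is omitted, the image has even dimensions, and the DPCM predictor is simply the left neighbor. All function names are hypothetical.

```python
def haar_1d(signal):
    """One Haar analysis step: pairwise averages followed by pairwise differences."""
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low + high


def haar_2d(image):
    """One decomposition level: transform every row, then every column."""
    rows = [haar_1d(row) for row in image]
    cols = [haar_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def ll_band(coeffs):
    """Top-left quadrant of the coefficient array (low-pass in both directions)."""
    half_h, half_w = len(coeffs) // 2, len(coeffs[0]) // 2
    return [row[:half_w] for row in coeffs[:half_h]]


def dpcm_rows(band):
    """Code each value as the difference from its left neighbor (first value as is)."""
    return [[row[0]] + [row[i] - row[i - 1] for i in range(1, len(row))]
            for row in band]


if __name__ == "__main__":
    image = [[10, 10, 20, 20], [10, 10, 20, 20], [30, 30, 40, 40], [30, 30, 40, 40]]
    print(dpcm_rows(ll_band(haar_2d(image))))
```

For smooth images most of the energy ends up in the LL band, which is why it is coded predictively while the sparse higher bands suit zero-tree coding.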
2D Mesh and Face Animation
[Figure: face model annotated with the measurements MW0, MNS0, ENS0, ES0, and IRISD0]
Mesh Objects
tessellation of 2D planar region into polygon patches
vertices of mesh are referred to as node points
node points are warped from one frame to the next; motion information is coded
Face Objects
shape, texture and expressions of the face are controlled by Facial Animation Parameters (FAPs)
FAPs can also be used for accurate speech articulation, where visemes are used to code lip configurations and mood of the speaker
Amendments to Visual Part
2000 Edition issued in Jan '01 (version 1 + version 2)
included in the 2000 edition are:
all tools/techniques discussed so far
additional coding efficiency tools (global motion, 1/4-pel motion, SA-DCT)
increased flexibility in object-based scalable coding
improved error robustness; NEWPRED tool that switches ref frames
dynamic resolution conversion
Amendment 1
tools to support Studio Profile; spec has been frozen Jan '01
Amendment 2
tools to support Streaming Video Profile; spec has been frozen Jan '01
major addition: Fine Granularity Scalability (FGS) where enhancement layer is bit-plane coded
Amendment 1 - Studio Profile
Objectives
Object-based techniques for video creation
Higher coding efficiency for studio storage
Applications
Professional broadcast, studio and post production, inter-studio transmission
Requirements
Formats - 4:2:2, 4:4:4 (YUV and RGB), progressive and interlaced
Resolutions - up to 2048 by 2048 pixels per VOP
Bit-rates - up to 1.2 Gbps with up to 12 bits pixel depth
Lossless coding capability
Support for Binary Shape, Grayscale Shape for alpha transparency, depth, displacement, Sprites
New Tools in Studio Profile
Highly efficient VLC for high bitrates
Grouping of DCT coefficients based on their values
Recursive selection of VLC table for groups of coefficients as a function of the previously coded group
Coded data = group indicator + fixed length code determining the actual value within the group
Flexible access by special slice structure
New tools for lossless coding
Profile Definitions
Simple Studio Profile
To be applied for image acquisition and editing
Only Intra coding for independent processing of frames
Lossless transcoding from MPEG-2 4:2:2 Profile
Support for Arbitrary Shape (binary or grayscale)
Core Studio Profile
To be applied for inter-studio transmission
Inter (P-VOP) coding for more efficient compression
Support for Sprites
Amendment 2 - Streaming Video Profiles
[Figure: traditional model of a communication system (encoder, channel, decoder) compared with Internet streaming applications (encoder, server, and several channels each leading to its own decoder)]
Basic assumptions of traditional model:
Encoder knows channel capacity
Decoder is able to process all received bits
New Objective for Video Coding
[Figure: received quality (bad, moderate, good) plotted against channel bandwidth (low to high), with curves labeled traditional source coding, new objective, and traditional distortion-rate curve]
Fine Granularity Scalability
Motion compensated DCT coding in base layer to reach lower bound of bitrate range
Bitplane coding of DCT coefficients in enhancement layer to cover bitrate range
Enhancement layer bitstream may be truncated into any number of bits per frame
Decoder may ignore some enhancement bits
Reconstructed video quality proportional to number of decoded bits
Basic Encoder Structure
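[Figure: encoder block diagram. Base layer blocks: DCT, Q, Q-1, IDCT, clipping, motion estimation, motion compensation, frame memory, and VLC, taking the input video to the base layer bitstream. Enhancement layer encoding blocks: DCT, bit-plane shift, find maximum, and bit-plane VLC, producing the enhancement bitstream]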
Basic Decoder Structure
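[Figure: decoder block diagram. Base layer blocks: VLD, Q-1, IDCT, motion compensation, frame memory, and clipping, taking the base layer bitstream to the base layer video (optional output). Enhancement layer decoding blocks: bit-plane VLD, bit-plane shift, IDCT, and clipping, taking the enhancement bitstream to the enhancement video]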
Basic Bitplane Coding Technique
[Figure: a block of 8x8 DCT coefficient differences, shown with its sign row and its bit-planes from MSB to LSB, first in block order, then after zigzag ordering, and finally as (RUN, EOP) symbols per bit-plane. Symbols shown for the example block: (0, 1), (28, 1), (6, 0), (0, 0), (0, 0), (26, 1), (2, 0), (31, 1)]
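The conversion from zigzag-ordered magnitudes to (RUN, EOP) symbols can be sketched as follows. The input values are illustrative and are not the ones in the figure; sign bits are left out, and an all-zero bit-plane is returned as an empty list here, whereas a real coder would need a dedicated symbol for it. The function name is hypothetical.

```python
def bitplane_symbols(abs_values):
    """Return one list of (RUN, EOP) symbols per bit-plane, MSB plane first.

    RUN is the number of 0 bits before the next 1 bit in the plane;
    EOP is 1 for the last 1 bit of the plane and 0 otherwise.
    """
    planes = []
    num_planes = max(abs_values).bit_length()
    for plane in range(num_planes - 1, -1, -1):
        bits = [(v >> plane) & 1 for v in abs_values]
        ones = [i for i, b in enumerate(bits) if b]
        symbols, previous = [], -1
        for n, position in enumerate(ones):
            run = position - previous - 1
            eop = 1 if n == len(ones) - 1 else 0
            symbols.append((run, eop))
            previous = position
        planes.append(symbols)
    return planes


if __name__ == "__main__":
    # Illustrative magnitudes of 64 zigzag-ordered coefficient differences.
    zigzag = [10, 0, 6, 0, 0, 3, 0, 2, 2, 0, 0, 2, 0, 0, 1] + [0] * 49
    for symbols in bitplane_symbols(zigzag):
        print(symbols)
```

Because the planes are emitted from MSB to LSB, truncating the symbol stream after any plane still leaves a coarser but usable approximation of every coefficient.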
Application: Varying BW Environment
[Figure: video sources serving several consumers over links of different bandwidth (user variation), and a plot of bandwidth against time (temporal variation)]
User Variation: bandwidth varies from user to user
Temporal Variation: bandwidth varies with time
Application: Statistical Multiplexing
[Figure: videos 1 to N each feed a VBR base layer encoder and an FGS enhancement layer encoder; the resulting streams Base 1, Enh 1, ..., Base N, Enh N, together with data from a data server, enter a multiplexer that feeds a CBR channel]
MPEG-4 System Layer Model
[Figure: MPEG-4 system layer model. Elementary streams cross the Elementary Stream Interface into the Sync Layer (SL), which produces SL-packetized streams. Below the DMIF Application Interface lies the delivery layer: the DMIF layer with FlexMux (FlexMux channels, FlexMux streams) and, below the DMIF Network Interface, the TransMux layer (TransMux channels, TransMux streams; not specified in MPEG-4). TransMux examples shown: (RTP)/UDP/IP, (PES)/MPEG2 TS, AAL2/ATM, H223/PSTN, DAB Mux, for file, broadcast, and interactive delivery]
Profile Definitions of Version 1
Simple Profile
Basic tools: I/P-VOP, AC/DC prediction, 4-MV and unrestricted MV
Short header and error resilience tools
Core Profile
Simple + Binary Shape, Quantization Method 1/2 and B-VOP
Main Profile
Core + Grey Shape, Interlace and Sprite
Simple Scalable Profile
Simple + Spatial and temporal scalability and B-VOP
N-Bit Profile
Core + N-Bit
Animated 2D Mesh
Core + Scalable Still Texture, 2D Dynamic Mesh
Basic Animated Texture
Binary Shape, Scalable Still Texture and 2D Dynamic Mesh
Still Scalable Texture - Scalable Still Texture
Simple Face - Face Animation Parameters
Profile Definitions of Version 2
Advanced Real Time Simple Profile
Simple +
Advanced error resilience with back channel,
improved temporal scalability with low buffering delay
Core Scalable Profile
Simple scalable +
Core +
SNR, Spatial/Temporal Scalability for Region or Object of interest
Advanced Coding Efficiency Profile
Tools for improving coding efficiency for both rectangular and arbitrarily shaped objects
For applications such as mobile broadcast reception
Advanced Scalable Texture Profile
Tools for decoding arbitrarily shaped texture and still images, including scalable shape coding
Advanced Core Profile
Core Profile +
Tools for decoding arbitrarily shaped video objects and arbitrarily shaped scalable still images
Simple Face and Body Animation Profile
Simple face animation + body animation
Profile Definitions in subsequent version
Advanced Simple Profile
Simple Profile +
Several tools to make it more efficient:
B-frames
1/4 pel motion compensation
Extra quantization tables
Global Motion Compensation
Fine Granularity Scalable Profile
Use Advanced Simple Profile as base layer
Fine granularity scalability (FGS)
Fine granularity scalability - temporal (FGST)
Simple Studio Profile
I-frames only
Arbitrary shape
Multiple alpha channels
Up to 2 Gbps
Core Studio Profile
Simple Studio Profile + P-frames
MPEG-4 Video Profiles @ Levels
[Figure: map of MPEG-4 video profiles (Simple, Core, Main, Simple Scalable, Core Scalable, Advanced Simple, Advanced Coding Efficiency, Advanced Realtime Simple, Fine Granularity Scalable, Simple Studio, Core Studio) arranged by rectangular frame versus arbitrary shape, by scalability (none, spatial and temporal, quality and temporal), and by additional tools and higher error resilience; legend: IS, AMD-1, AMD-2]
Profiles are used to limit the set of tools in a decoding device
Levels are used to place limits on complexity
Visual Object Types/Tools of V.2
Visual Tools | Visual Object Types (columns: Advanced Real Time Simple, Advanced Coding Efficiency, Advanced Scalable Texture, Core Scalable, Simple FBA)
Basic (I/P-VOP; AC/DC Prediction; 4-MV, Unrestricted MV) | X X X
Error Resilience (Slice resynchronization; Data partitioning; Reversible VLC) | X X X
Short Header | X X X
B-VOP | X X
P-VOP with OBMC (Texture) | X X
Method 1/Method 2 Quantization | X X
P-VOP based Temporal Scalability (Rectangular; Arbitrary Shape) | X X
Binary Shape | X X
Grey Shape | X
Interlace | X
Sprite |
Visual Object Types/Tools of V.2 (Cont'd)
Visual Tools | Visual Object Types (columns: Advanced Real Time Simple, Advanced Coding Efficiency, Advanced Scalable Texture, Core Scalable, Simple FBA)
Temporal Scalability (Rectangular) | X
Spatial Scalability (Rectangular) | X
N-Bit |
Scalable Still Texture | X
2D Dynamic Mesh with uniform topology |
2D Dynamic Mesh with Delaunay topology |
Facial Animation Parameters | X
Body Animation Parameters | X
Dynamic Resolution Conversion | X
NEWPRED | X
Global Motion Compensation | X
1/4 Pel Motion Compensation | X
SA-DCT | X
Error Resilience for Visual Texture Coding | X
Wavelet Tiling | X
Scalable Shape Coding for Still Texture | X
Object Based Spatial Scalability | X
Visual Profiles of V.2
Profiles | Object Types (columns: Simple, Simple Scalable, Core, Core Scalable, Advanced Real Time Simple, Advanced Coding Efficiency, Advanced Scalable Texture, Simple FBA)
V2-1 Advanced Real Time Simple | X X
V2-2 Core Scalable | X X X X
V2-3 Advanced Coding Efficiency | X X X
V2-4 Advanced Core | X X X
V2-5 Advanced Scalable Texture | X
V2-6 Simple FBA | X
What's New in MPEG-4 Visual
New Video Codec: "MPEG-4 Advanced Video Coding"
It is developed by JVT (Joint Video Team) of ITU-T and MPEG
Major task is the coding performance improvement
It is based on H.26L
It will be Part 10 of MPEG-4 at MPEG
It probably will be H.264 at ITU-T
Animation Framework eXtension - AFX
High level description of animation
Enhanced rendering
Compact representations
Low bit rate animations
Scalability based on terminal capabilities
Interactivity at user level, scene level and client-server session level
Compression of representations for static and dynamic tools
3D Video Coding
Interframe Wavelet Coding
MPEG-4 Advanced Video Coding (1)
Summary of Fairfax meeting
A total of 160 proposals have been submitted to JVT
Working draft WD.1 has been created
Two profiles: baseline profile and main profile have been decided
Baseline profile to be royalty free, main decoding features include:
I, P pictures
In loop deblocking filters
Interlace support (Level dependent, Level 2.1 or above)
1/4 pel motion prediction
Tree-structured motion segmentation down to 4x4
VLC-based entropy coding
Flexible Macroblock ordering
Main profile
Includes all features of the baseline profile and adds
B-pictures,
CABAC (Context-based Adaptive Binary Arithmetic Coding)
Adaptive Block-size Transforms
1/8-sample motion compensation
MPEG-4 Advanced Video Coding (2)
Notes on Profiles and Levels
Motion vector range will be limited
A limit is imposed on extreme aspect ratios
Number of reference pictures increases with picture size, never exceeding 15
TBD's
Exact values of motion vector range limit
Smaller than 8x8 bi-predictive motions in B-pictures for Main profile
Adaptive B-picture interpolation in Main profile
Unfulfilled requirements
No 4:2:2 source format support (pending further study)
Mixing Intra and Inter coding type within macroblocks
Data partitioning
SP & SI ("switching" pictures)
Levels summarized with typical formats as follows:
Level 1 = QCIF @ 15 (Intermediate levels 1.1 = CIF @ 7.5, 1.2 = CIF @ 15)
Level 2 = CIF @ 30, (Intermediate levels 2.1 = HHR and 2.2)
Level 3 = SDTV (Intermediate levels 3.1, 3.2)
Level 4 = HDTV
Level 5 = SHDTV (1920x1088 @ 60p)
MPEG-4 Advanced Video Coding (3)
Technical summary
Order of bitstream within MB: a total of seven modes
[Figure: MB modes 16x16, 16x8, 8x16, and 8x8, and 8x8 sub-modes 8x8, 8x4, 4x8, and 4x4, each with its block numbering; CBPY 8x8 block order (0 to 3); luma residual coding 4x4 block order (0 to 15); chroma residual coding 4x4 block order for U and V (2x2 DC blocks 16 and 17, AC blocks 18 to 25)]
MPEG-4 Advanced Video Coding (4)
Motion compensation
Motion vector data for 1-16 blocks are transmitted
Motion vector prediction
Median prediction is used except for 16x8 or 8x16 blocks
• The prediction of E is formed as median of A, B, and C
Directional segmentation prediction
• Vector block size 8x16:
Left block: A is used if it has the same reference picture as E, otherwise "median prediction" is used
Right block: C is used if it has the same reference picture as E, otherwise "median prediction" is used
• Vector block size 16x8:
Upper block: B is used as prediction if it has the same reference picture as E, otherwise "median prediction" is used
Lower block: A is used as prediction if it has the same reference picture as E, otherwise "median prediction" is used
[Figure: current block E with neighbors A (left), B (above), C (above right), and D (above left); 16x8 and 8x16 partitions]
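The prediction rules above can be sketched as a single function. This is an illustration under assumptions: the median is taken component by component, all three neighbors A, B, and C are available, and the data layout (a dictionary of (motion vector, reference index) pairs) and the function names are hypothetical.

```python
def median3(a, b, c):
    """Median of three numbers."""
    return sorted((a, b, c))[1]


def median_mv(mv_a, mv_b, mv_c):
    """Component-wise median of three motion vectors (assumed interpretation)."""
    return (median3(mv_a[0], mv_b[0], mv_c[0]),
            median3(mv_a[1], mv_b[1], mv_c[1]))


# Directional rule: (partition shape, partition index) -> preferred neighbor.
# Index 0 is the upper (16x8) or left (8x16) partition, index 1 the other one.
DIRECTIONAL = {("16x8", 0): "B", ("16x8", 1): "A",
               ("8x16", 0): "A", ("8x16", 1): "C"}


def predict_mv(partition, part_index, neighbours, ref_e):
    """Predict the motion vector of block E.

    neighbours: {"A": (mv, ref), "B": (mv, ref), "C": (mv, ref)}
    ref_e:      reference picture index used by E.
    """
    name = DIRECTIONAL.get((partition, part_index))
    if name is not None:
        mv, ref = neighbours[name]
        if ref == ref_e:
            return mv
    return median_mv(*(neighbours[k][0] for k in "ABC"))


if __name__ == "__main__":
    nb = {"A": ((4, -2), 0), "B": ((1, 3), 1), "C": ((6, 0), 0)}
    print(predict_mv("16x16", 0, nb, ref_e=0), predict_mv("8x16", 1, nb, ref_e=0))
```

Only the difference between the actual vector and this prediction would then need to be transmitted.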
MPEG-4 Advanced Video Coding (5)
Reference pictures
Default reference field number assignment when the current picture is the first field coded
Default reference field number assignment when the current picture is the second field coded
[Figure: for each of the two cases, the reference frame (field) buffer with fields f1 and f2 of each frame, and the reference field numbers 0 to 11 assigned relative to the current field]
MPEG-4 Advanced Video Coding (6)
Intra prediction: two intra prediction types for luma (4x4 and 16x16)
Intra prediction modes for 4x4 of luma
Mode 0: DC prediction
Mode 1: Vertical Prediction
Mode 2: Horizontal prediction
Mode 3: Diagonal Down/Right prediction
Mode 4: Diagonal Down/Left prediction
Mode 5: Vertical-Left prediction
Mode 6: Vertical-Right prediction
Mode 7: Horizontal-Up prediction
Mode 8: Horizontal-Down prediction
Intra prediction for 16x16 mode for luma
Mode 0: Vertical
Mode 1: Horizontal
Mode 2: DC prediction
Mode 3: Plane prediction
[Figure: prediction directions of the 4x4 luma modes 1 to 8]
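The three simplest 4x4 luma modes can be sketched as below, using the mode numbers listed on this slide (0 = DC, 1 = vertical, 2 = horizontal). The function name is hypothetical, both neighbor sets are assumed to be available, and the DC rounding used here, (sum + 4) // 8, is an assumption; the directional modes 3 to 8 are not covered.

```python
def intra_4x4(mode, above, left):
    """Predict a 4x4 luma block from reconstructed neighboring pixels.

    above: the 4 pixels directly above the block.
    left:  the 4 pixels directly to the left of the block.
    """
    if mode == 0:  # DC: every sample is the rounded mean of the 8 neighbors
        dc = (sum(above) + sum(left) + 4) // 8
        return [[dc] * 4 for _ in range(4)]
    if mode == 1:  # vertical: copy the row above into every row
        return [list(above) for _ in range(4)]
    if mode == 2:  # horizontal: copy each left pixel across its row
        return [[left[y]] * 4 for y in range(4)]
    raise ValueError("only modes 0 to 2 are sketched here")


def residual(block, prediction):
    """Sample-wise difference that would go on to the transform."""
    return [[b - p for b, p in zip(brow, prow)]
            for brow, prow in zip(block, prediction)]


if __name__ == "__main__":
    print(intra_4x4(0, [10, 20, 30, 40], [50, 50, 50, 50]))
```

An encoder would try the candidate modes and keep the one whose residual is cheapest to code.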
MPEG-4 Advanced Video Coding (7)
Adaptive Block size Transforms (ABT)
Use of ABT to increase coding efficiency
ABT is synchronized with Motion Compensation
for frame motion compensation, ABT applied to frame MBs
for field motion compensation, ABT applied to field MBs
ABT transform coefficient decoding
Progressive scan
Interlaced scan
[Figure: coefficient scan orders for the 4x4, 4x8, 8x4, and 8x8 block sizes]
4x4
1 3 9 13
2 6 10 14
4 7 11 15
5 8 12 16
4x8
1 5 13 21
2 6 14 22
3 7 15 23
4 12 20 28
8 16 24 29
9 17 25 30
10 18 26 31
11 19 27 32
8x4
1 3 7 11 15 19 23 27
2 6 10 14 18 22 26 30
4 8 12 16 20 24 28 31
5 9 13 17 21 25 29 32
8x8
1 4 9 16 23 31 39 53
2 5 15 22 30 38 46 54
3 8 17 24 32 40 47 59
6 10 21 29 37 45 52 60
7 14 25 33 41 48 55 61
11 18 26 34 42 49 56 62
12 19 27 35 43 50 57 63
13 20 28 36 44 51 58 64
MPEG-4 Advanced Video Coding (8)
Context-based Adaptive Binary Arithmetic Coding (CABAC)
Context modeling: provides estimates of conditional probabilities of the coding symbols. Utilizing suitable context models, given inter-symbol redundancy can be exploited by switching between different probability models according to already coded symbols
Arithmetic codes: permit a non-integer number of bits to be assigned to each symbol of the alphabet. This is extremely beneficial for symbol probabilities much greater than 0.5, which often occur with efficient context modeling. In this case, a variable length code has to spend at least one bit, in contrast to arithmetic codes, which may use a fraction of one bit
Adaptive arithmetic codes: permit the entropy coder to adapt itself to non-stationary symbol statistics. For instance, the statistics of motion vector magnitudes vary over space and time as well as for different sequences and bit-rates. Hence, an adaptive model taking into account the cumulative probabilities of already coded motion vectors leads to a better fit of the arithmetic codes to the current symbol statistics
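The "fraction of one bit" argument can be checked numerically. The sketch below computes the ideal code length -log2(p) of a symbol and the total ideal cost of a bit sequence under a simple adaptive model. The count-based model is a hypothetical stand-in chosen for clarity; it is not the probability estimator used by CABAC, and no actual arithmetic coder is implemented.

```python
import math


def ideal_bits(probability):
    """Ideal code length in bits for a symbol of the given probability."""
    return -math.log2(probability)


class AdaptiveBinaryModel:
    """Hypothetical adaptive model: probabilities from running symbol counts."""

    def __init__(self):
        self.counts = [1, 1]  # one pseudo-count per binary symbol

    def probability(self, bit):
        return self.counts[bit] / sum(self.counts)

    def update(self, bit):
        self.counts[bit] += 1


def adaptive_cost(bits):
    """Total ideal cost of a bit sequence when the model adapts after each symbol."""
    model, total = AdaptiveBinaryModel(), 0.0
    for bit in bits:
        total += ideal_bits(model.probability(bit))
        model.update(bit)
    return total


if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99):
        print(p, ideal_bits(p))
    print(adaptive_cost([0] * 20))
```

A variable length code would spend at least 20 bits on the 20 symbols in the last line, one per symbol, whereas the adaptive ideal cost stays far below that once the model has learned the skewed statistics.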
MPEG-4 Advanced Video Coding (9)
Other techniques for image quality and encoding performance improvement
In loop deblocking filter
a conditional filtering is applied to boundaries of the 4x4 blocks of a reconstructed MB
in the first step, 16 pel of the 4 vertical edges (horizontal filtering) of the 4x4 raster are filtered
after that, 4 horizontal edges (vertical filtering) follow.
Encoder optimization
Using R-D optimizations
Finding optimum prediction mode
the best reference frame
the best motion vectors
fractional pel accuracy
Macroblock level optimum mode decision
decision between intra and inter
adaptive block size
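A mode decision based on R-D optimization can be sketched with a Lagrangian cost J = D + lambda * R, which is the common formulation and is assumed here; the slide does not state the cost function. The candidate names and the distortion and rate numbers below are made up for illustration.

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R (assumed formulation)."""
    return distortion + lam * rate_bits


def best_mode(candidates, lam):
    """Return the name of the candidate with the lowest R-D cost.

    candidates: list of (name, distortion, rate_bits) tuples.
    """
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))[0]


if __name__ == "__main__":
    # Hypothetical measurements for one macroblock.
    candidates = [("intra", 100, 300), ("inter16x16", 220, 40), ("inter8x8", 150, 120)]
    for lam in (0.1, 0.5, 1.0):
        print(lam, best_mode(candidates, lam))
```

A small lambda favors low distortion regardless of rate, while a large lambda favors cheap modes, so the same candidates can lead to different decisions at different bit-rates.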
Conclusions
MPEG-4 visual standards overviewed
MPEG-4 is the first standard which can be used for object-based coding
Simple profile of MPEG-4 has been used for wireless and internet video transmission
MPEG-4 is expected to be a major coding scheme for multimedia applications
Several parts of MPEG-4 designed for special applications such as FGS for video streaming, AVC for increasing coding efficiency (may be used for HD DVD with red laser)
For Further Information
MPEG-4 Industry Forum
http://www.m4if.org/
MPEG Home Page
http://mpeg.nist.gov/
IEEE Trans. CSVT special issues:
Feb '97: on MPEG-4
Nov '98: on representation/coding of images/video (part I)
Feb '99: on representation/coding of images/video (part II)
Dec '99: on object-based coding
Mar '01: on streaming video