ECE 634 – Digital Video Systems, Spring 2019
Fengqing Maggie Zhu, Assistant Professor of ECE
MSEE [email protected]
Video Coding Standards
Outline
• Video Standardization
• What is standardized?
• How does the standardization process work?
• Why does certain technology get included?
• Profiles and levels
Motivation for Standards
• Goal of standards:
  • Ensuring interoperability: enabling communication between devices made by different manufacturers
  • Promoting a technology or industry
  • Reducing costs
Requirements for Successful Standards
• Interoperability: enable communication between devices made by different manufacturers
• Innovation: perform significantly better than the previous standard
• Competition: allow competition between manufacturers; only standardize bit-stream syntax and reference decoder
• Transmission and storage independent: can be used for a range of applications
• Forward compatibility: decode bit-streams from prior standard
• Backward compatibility: prior generation decoders can partially decode new bit-streams
What do the Standards Specify?
• Not the encoder
• Not the decoder
• Just the bit-stream syntax and the decoding process (e.g., use IDCT, but not how to implement the IDCT)
→ Enables improved encoding and decoding strategies to be employed in a standard-compatible manner
[Figure: Encoder → Bitstream → Decoder. The scope of standardization covers the bitstream and the decoding process.]
Timeline and “Needs” of Standards
• H.261 (1990): video conferencing
• MPEG-1 (1992): non-interactive applications, VCD
• MPEG-2 and H.262 (1996): TV broadcast, DVD
• H.263 (Nov. 1995; Sept. 1997, Nov. 2000): video conferencing
• MPEG-4 video (part 2) (1999): object-oriented coding
• H.264 and MPEG-4 part 10 (AVC) (2003): compression efficiency
• AVS (Audio and Visual Coding Standard) (2006): avoiding high licensing fees – not covered
• HEVC/H.265 (2013): compression efficiency and higher resolutions – more details later
• AV1 (2018): codec for web, open and royalty free – more details later
Image and Video Compression Standards

Standard            Application                                            Bit Rate
JPEG                Continuous-tone still-image compression                Variable
H.261               Video telephony and teleconferencing over ISDN         p x 64 kb/s
MPEG-1              Video on digital storage media (CD-ROM)                1.5 Mb/s
MPEG-2              Digital television                                     2-20 Mb/s
H.263               Video telephony over PSTN                              33.6-? kb/s
MPEG-4              Object-based coding, synthetic content, interactivity  Variable
JPEG-2000           Improved still image compression                       Variable
H.264 / MPEG-4 AVC  Improved video compression                             10's kb/s to Mb/s

MPEG and JPEG: International Organization for Standardization (ISO)
H.26x family: International Telecommunication Union (ITU)
Multimedia Communications Standards and Applications

Standards      Application                                  Video Format    Raw Data Rate  Compressed Data Rate
H.320 (H.261)  Video conferencing over ISDN                 CIF             37 Mbps        >=384 Kbps
                                                            QCIF            9.1 Mbps       >=64 Kbps
H.323 (H.263)  Video conferencing over Internet             4CIF/CIF/QCIF                  >=64 Kbps
H.324 (H.263)  Video over phone lines/wireless              QCIF            9.1 Mbps       >=18 Kbps
MPEG-1         Video distribution on CD/WWW                 CIF             30 Mbps        1.5 Mbps
MPEG-2         Video distribution on DVD / digital TV       CCIR601 4:2:0   128 Mbps       3-10 Mbps
MPEG-4         Multimedia distribution over Inter/Intranet  QCIF/CIF                       28-1024 Kbps
GA-HDTV        HDTV broadcasting                            SMPTE296/295    <=700 Mbps     18-45 Mbps
MPEG-7         Multimedia databases (content description and retrieval)
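The raw data rates in the table can be approximated from the picture size. The sketch below assumes 8 bits per sample, 4:2:0 chroma subsampling, and 30 frames per second; none of these are stated in the table, and `raw_bit_rate` is a hypothetical helper name. Under these assumptions the 30 Mbps MPEG-1 entry matches the 352x240 SIF size rather than 352x288.

```python
def raw_bit_rate(width, height, fps, bits_per_sample=8, chroma_factor=1.5):
    """Uncompressed bit rate in bits per second.

    chroma_factor=1.5 models 4:2:0 sampling: one full-size luma plane plus
    two chroma planes, each subsampled by 2 horizontally and vertically.
    """
    samples_per_frame = width * height * chroma_factor
    return samples_per_frame * bits_per_sample * fps


# Assumed picture sizes and frame rates (illustrative, not from the table).
formats = {
    "CIF": (352, 288, 30),
    "QCIF": (176, 144, 30),
    "SIF": (352, 240, 30),
}

for name, (w, h, fps) in formats.items():
    print(f"{name}: {raw_bit_rate(w, h, fps) / 1e6:.1f} Mbps")
# CIF: 36.5 Mbps
# QCIF: 9.1 Mbps
# SIF: 30.4 Mbps
```

Dividing a raw rate by the compressed rate in the same row gives the compression ratio each application needs, for example roughly 20:1 for MPEG-1 at 1.5 Mbps.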
ITU-T Multimedia Communications Standards
History of Video Coding Standards
[Figure: timeline with time axis marks at 1990, 1996, 2002, 2004, 2007]
ISO: MPEG-1, MPEG-2, MPEG-4, MPEG-4 AVC
ITU: H.261, H.262, H.263, H.263+, H.263++, H.264
Scalable Video Coding (SVC): 2007
Applications: videoconferencing, VCD, digital TV, DVD, videophone, video iPod, digital TV (cable, satellite), Blu-ray, HD DVD, 3G cellular
H.261 Video Coding Standard
• For video conferencing/video phone
  • Video coding standard in H.320
  • Low delay (real-time, interactive)
  • Slow motion in general
• For transmission over ISDN
  • Fixed bandwidth: p x 64 Kbps, p = 1, 2, …, 30
• Video format:
  • CIF (352x288, above 128 Kbps)
  • QCIF (176x144, 64-128 Kbps)
  • 4:2:0 color format, progressive scan
• Work started in 1985; standard published in 1990
MPEG-1 Overview
• Audio/video on CD-ROM (1.5 Mbps, SIF: 352x240)
  • Maximum: 1.856 Mbps, 768x576 pels
• Started late 1988, test in 10/89, Committee Draft 9/90
• ISO/IEC 11172-1~5 (Systems, video, audio, compliance, software)
• Prompted an explosion of digital video applications: MPEG-1 video CD and downloadable video over the Internet
• Software-only decoding, made possible by the introduction of Pentium chips, was key to its success in the commercial market
• MPEG-1 Audio
  • Offers 3 coding options (3 layers); higher layers have higher coding efficiency at the cost of more computation
  • MP3 = MPEG-1 layer 3 audio
MPEG-2/H.262 Overview
• A/V broadcast (TV, HDTV, terrestrial, cable, satellite, high-speed Inter/Intranet) as well as DVD video
• 4~8 Mbps for TV quality, 10-15 Mbps for better quality at SDTV resolutions (BT.601)
• 18-45 Mbps for HDTV applications
  • MPEG-2 video high profile at high level is the video coding standard used in HDTV
• Test in 11/91, Committee Draft 11/93
• ISO/IEC 13818-1~6 (Systems, video, audio, compliance, software, DSM-CC)
• Consists of various profiles and levels
• Backward compatible with MPEG-1
• MPEG-2 Audio
  • Supports 5.1 channels
  • MPEG-2 AAC: requires 30% fewer bits than MPEG-1 layer 3
H.263 Overview
• H.263 is the video coding standard in H.323/H.324, targeted for visual telephony over PSTN or Internet
• Can accommodate computationally more intensive options than H.261
  • Initial version (H.263 baseline): 1995
  • H.263+: 1997
  • H.263++: 2000
• Goal: improved quality at lower rates
• Result: significantly better quality at lower rates
  • Better video at 18-24 Kbps than H.261 at 64 Kbps
  • Enables video phone over regular phone lines (28.8 Kbps) or wireless modem
MPEG-4 Overview
• Video functionalities beyond MPEG-1/2
  • Interaction with individual objects
    • The displayed scene can be composed by the receiver from coded objects
  • Scalability of contents
  • Error resilience
  • Coding of both natural and synthetic audio and video
• Many other sections
  • Digital Rights Management
  • Advanced Audio Coding (AAC)
H.264/AVC Standards
• Finalized March 2003
• Developed by the Joint Video Team (JVT), including video coding experts from the ITU-T and the ISO MPEG
• Also known as MPEG-4 part 10; Advanced Video Coding (AVC)
• Improved video coding efficiency, up to 50% over H.263++/MPEG-4
  • Half the bit rate for similar quality
  • Significantly better quality for the same bit rate
AVS (Audio Visual Coding Standard) Overview
• Chinese standard; 2002-2003 (Video)
• Licensing fees apply to all ISO and ITU standards after (not including) MPEG-1
• China produces more than 30 million set-top boxes
• Interlaced pictures, SDTV and HDTV
• Similar (slightly lower) compression efficiency compared with H.264
  • Intra prediction
  • Variable block-size MC
  • 1/4-pel motion resolution, 4-tap interpolation filter
  • 8x8 integer transform
  • Deblocking
High Efficiency Video Coding (HEVC)
• October 2010: defined new Test Model for HEVC, HM1
• Targeting high efficiency and low complexity applications
• Block sizes from 8x8 to 64x64 in tree structure
• Up to 34 directions for intra-prediction
• 6- or 12-tap interpolation filter, down to 1/4-sample
• Advanced motion vector prediction
• CABAC or Low Complexity Entropy Coding
• Deblocking filter or Adaptive Loop Filter
• Extended precision options
• Goal: 2x better video compression performance compared to H.264/AVC
Video Compression Progress
~50 % reduction(H.264 vs MPEG-2)
Example comparison: results depend strongly on the specific sequence and coding tools employed!
Standardization Process
• Competition phase
• Collaboration phase
  • A Reference Model is defined
  • Companies make contributions
  • A contribution is incorporated into the (new) reference model ONLY if it:
    • Improves performance by a sufficient margin
    • Is replicated by at least one other company
    • Does not achieve its performance improvement at the cost of “too much” complexity
• Verification phase
Video Coding Standards
• Video coding standards define the operation of a decoder given a correct bitstream
  • They do NOT describe an encoder
• Video coding standards typically define a toolkit
  • Not all pieces of the toolkit need to be implemented to create a conforming bitstream
• Decoders must implement some subset of the toolkit to be declared “conforming”
MPEG-2 Profiles and Levels
• Goal: to enable more efficient implementations for different applications (interoperability)
• Profile: subset of the tools applicable for a family of applications
• Level: bounds on the complexity for any profile
[Figure: grid of profiles (Simple, Main, High) against levels (Low, Main, High)]
• DVD and SD digital TV: Main Profile at Main Level (MP@ML)
• HDTV: Main Profile at High Level (MP@HL)
Profiles and Levels in MPEG-2
• Profiles: tools
• Levels: parameter range for a given profile
• Main profile at main level (MP@ML) is the most popular, used for digital TV
• Main profile at high level (MP@HL): HDTV
• 4:2:2 at main level (4:2:2@ML) is used for studio production
Profiles and Levels: Example H.264/AVC
From [Ostermann04]
Evolution of the Standards
• Representations of pictures
• Coding structure (picture types and prediction structures)
• Motion representations
• Loop filters
• Spatial prediction
• Transforms
• Quantizers
• Variable length coding
Representations of Pictures
Representation: Picture Formats
• Progressive pictures
  • H.261: only CIF (352x288) and QCIF (176x144)
  • MPEG-1: progressive, using SIF video format (352x240p30 or 352x288p25)
  • All other standards
• Interlaced pictures
  • MPEG-2 and beyond
  • MPEG-2 introduces the concept of field pictures
  • 4:2:0 format is modified to shift chroma samples
• Video Object Planes (VOP) == objects
  • MPEG-4 only
  • Allows interactive editing of objects
MPEG-2: Chroma Sample Shift
Object Description Hierarchy in MPEG-4
[Figure: tree with a VO at the top, VOL1 and VOL2 below it, and VOP1 to VOP4 at the bottom]
VO: video object
VOL: video object layer (can be different parts of a VO or different rate/resolution representations of a VO)
VOP: video object plane
Example of Scene Composition
[Figure: scene composed from VOP1, VOP2, and VOP3, labeled with VOL1 and VOL2]
The decoder can compose a scene by including different VOPs in a VOL
Object-Based Coding
• The entire scene is decomposed into multiple objects
  • Object segmentation is the most difficult task!
  • But this does not need to be standardized
• Each object is specified by its shape, motion, and texture (color)
  • Shape and texture both change over time (specified by motion)
• MPEG-4 assumes the encoder has a segmentation map available, and specifies how to decode (not encode!) shape, motion, and texture
Coding Structure
Representation: Coding Structure
• Motion-compensated predictive (P) frames
  • H.261 and all later standards
• Bidirectional (B) motion-compensated frames
  • Introduced in MPEG-1 for improved efficiency
  • Not included in H.263 (baseline) due to increased delay
  • Re-included in all subsequent standards (H.263+)
• PB-frames
  • Combination of P and B, appears only in H.263
• Field pictures and field prediction
  • MPEG-2 and beyond
H.263: PB-Picture Mode
PB-picture mode codes two pictures as a group. The second picture (P) is coded first, then the first picture (B) is coded using both the P-picture and the previously coded picture. This avoids the reordering of pictures required in the normal B-mode, but it still incurs more coding delay than using P-frames only.
In a B-block, forward prediction (from the previous frame) can be used for all pixels; backward prediction (from the future frame) is used only for those pels where the backward motion vector aligns with pels of the current MB. Pixels in the “white area” use only forward prediction.
An improved PB-frame mode, defined in H.263+, removes this restriction.
MPEG-2: Frame vs. Field Picture
Motion Representation
Motion Representation and Coding
• Methods for generating the MVs are not specified in the standard
• H.261
  • Forward prediction only
  • Integer accuracy only, range [-16,16]
• H.263
  • Forward and backward prediction (B and PB-frames)
  • Half-pixel motion accuracy; bilinear interpolation filter; range [-31.5,31]
  • Optional: unrestricted motion (across picture boundary)
  • Optional: overlapped block motion estimation
  • Optional: 4 motion vectors per macroblock
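Half-pixel accuracy with a bilinear filter means a reference sample at a half-pel position is the rounded average of its nearest integer-position neighbors. The sketch below is illustrative only: `sample_half_pel` is a hypothetical helper, coordinates are given in half-pel units, and the rounding convention is an assumption rather than a quote from the standard.

```python
def sample_half_pel(ref, x2, y2):
    """Return the reference sample at position (x2 / 2, y2 / 2).

    ref is a list of rows of integer samples; x2 and y2 are non-negative
    coordinates in half-pel units. No bounds checking is done: the caller
    must keep the position inside the picture.
    """
    x0, y0 = x2 // 2, y2 // 2
    x1 = x0 + (x2 % 2)  # equals x0 when x is at an integer position
    y1 = y0 + (y2 % 2)  # equals y0 when y is at an integer position
    total = ref[y0][x0] + ref[y0][x1] + ref[y1][x0] + ref[y1][x1]
    return (total + 2) // 4  # rounded average of the 1, 2, or 4 neighbors


ref = [[10, 20],
       [30, 40]]
print(sample_half_pel(ref, 0, 0))  # 10 (integer position)
print(sample_half_pel(ref, 1, 0))  # 15 (between 10 and 20)
print(sample_half_pel(ref, 1, 1))  # 25 (center of all four samples)
```

The averaging acts as a mild low-pass filter on the prediction, which is one reason half-pel motion compensation reduces prediction error even without an explicit loop filter.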
Overlapped Block Motion Compensation (OBMC)
• Conventional block motion compensation
  • One best matching block is found from a reference frame
  • The current block is replaced by the best matching block
• OBMC
  • Each pixel in the current block is predicted by a weighted average of several corresponding pixels in the reference frame
  • The corresponding pixels are determined by the MVs of the current as well as adjacent MBs
  • The weight for each corresponding pixel depends on the expected accuracy of the associated MV
Motion Representation and Coding
• MPEG-1
  • Using more advanced motion compensation
    • Half-pel accuracy motion estimation, range [-64,64]
  • Using bi-directional temporal prediction
• MPEG-2
  • Field prediction for field pictures
  • Field prediction for frame pictures
  • Dual prime for P-pictures (one MV for two fields)
  • 16x8 MC for field pictures
Field Prediction for Field Pictures
• Each field is predicted individually from the reference fields
  • A P-field is predicted from one previous field
  • A B-field is predicted from two fields chosen from two reference pictures
Field Prediction for Frame Pictures
Useful for rapid motion
MPEG-4 Prediction
• Quarter-pel motion estimation
• Four MVs and unrestricted MVs (range [-2048,2048])
• Optional OBMC
• Sprite coding
  • Code a large background at the beginning of the sequence, plus affine mappings, which map parts of the background to the displayed scene at different time instances
  • Decoder can vary the mapping to zoom in/out, pan left/right
• Global motion compensation
  • Using 8-parameter projective mapping
  • Effective for sequences with large global motion
H.264 Motion Compensation
• Quarter-pel accuracy
• Variable block size
• Multiple reference frames
  • Generalized B-picture
• Weighted prediction
  • Allows the encoder to specify the use of scaling and offset during motion compensation
  • Provides significant benefits for fading sequences such as fade-to-black, fade-in, and cross-fade transitions
H.264 Variable Blocksize Motion Compensation
• Use variable size block-based motion compensation
  • 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4
  • H.263/MPEG-4 use only 16x16 and 8x8
From [Ostermann04]
H.264 Multiple Reference Frames for Motion Compensation
• Can use one or two from several possible reference frames
• When two reference frames are used, arbitrary weights can be used to combine them – Generalized B-picture
From [Ostermann04]
MPEG-1 Group of Pictures
[Figure: 16 frames in display order; the frame-type labels (2 I, 4 P, 10 B) lost their positions in extraction]
Bitstream order: 0 2 3 1 5 6 4 8 9 7 11 12 10 14 15 13
Display order: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
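The bitstream order follows from one rule: a B-frame needs its future reference, so each anchor frame (I or P) is transmitted before the B-frames that precede it in display order. The sketch below assumes the frame-type pattern `IBBPBBPBBPBBPBBI`, which is a reconstruction consistent with the frame counts in the figure, not something the slide states explicitly; `coding_order` is a hypothetical helper.

```python
def coding_order(display_types):
    """Return display indices in the order they appear in the bitstream.

    Each anchor (I or P) is emitted first, followed by the B-frames that
    precede it in display order and were waiting for it as a reference.
    """
    order, pending_b = [], []
    for idx, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(idx)
        else:
            order.append(idx)
            order.extend(pending_b)
            pending_b = []
    order.extend(pending_b)  # trailing B-frames with no later anchor
    return order


display_types = list("IBBPBBPBBPBBPBBI")  # assumed pattern
order = coding_order(display_types)

# Position of each display frame within the bitstream.
position = [order.index(i) for i in range(len(display_types))]
print(position)
# [0, 2, 3, 1, 5, 6, 4, 8, 9, 7, 11, 12, 10, 14, 15, 13]
```

The printed positions reproduce the "Bitstream order" row of the slide, and the gap between a frame's display index and its bitstream position is the source of the extra delay that B-frames introduce.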
H.264 Generalized GOP
[Figure: 16 frames in display order 0 to 15, labeled I and B, with B-frames also serving as references]
In H.264 and beyond, B-frames can be used for prediction
Loop Filter
• In-loop filtering can be applied to suppress temporal propagation of coding noise
• H.261
  • Separable filter
  • Loop filter can be turned on or off
• MPEG-1
  • No loop filter (half-pel motion compensation provides some filtering)
• H.263
  • Optional deblocking filter included in H.263+
  • Overlapped block motion effectively smooths block boundaries
  • Decoder can choose to implement an out-of-loop deblocking filter
• H.264
  • Deblocking filter adapts to the strength of the blocking artifact
H.264 Adaptive Deblocking
• Whether filtering will be turned on depends on the pixel differences involving pixels p0,…, q0,…, and the filter depends on block characteristics and coding mode
• Deblocking results in bit rate savings of 6-9% at medium qualities, and more remarkable subjective improvements
From [Ostermann04]
4 strengths available, based on neighboring block types
Spatial Prediction
• MPEG-1
  • DC coefficients coded predictively
• H.263
  • Optional: intra DC prediction (10-15% improvement)
• MPEG-4
  • DC prediction: can predict the DC coefficient from either the previous block or the block above
  • AC prediction: can predict one column/row of AC coefficients from either the previous block or the block above
H.264 Intra prediction
8 possible directions
Apply prediction to the entire 16*16 block, or apply prediction separately to sixteen 4*4 blocks
Intra-Frame Prediction (H.264)
Prediction across slice boundaries is not allowed.
H.264 Intra Prediction
• Instead of the simple DC coefficient prediction to exploit the correlation between nearby pixels in the same frame, more sophisticated spatial prediction is used, including INTRA_4x4 and INTRA_16x16
From [Ostermann04]
Transform
• 8x8 DCT
  • H.261
  • MPEG-1
  • H.263
  • MPEG-2
  • MPEG-4
• DCT is non-integer; the result depends on the implementation details
H.264 Integer Transform
• Smaller block size (4x4 or 2x2) can better represent boundaries of moving objects, and match prediction errors generated by smaller block size motion compensation
• Integer transform can be implemented more efficiently and causes no mismatch between encoder and decoder
• A few integers, approximates DCT, maintains orthogonality
Primary transform
From [Ostermann04]
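As an illustration of "a few integers, approximates DCT, maintains orthogonality", the sketch below applies the 4x4 core forward transform matrix commonly given for H.264 and checks that its rows are mutually orthogonal. The helper names are hypothetical, and the normalization that a true DCT would include is deliberately left out, since H.264 folds it into quantization.

```python
# 4x4 core forward transform matrix (integer approximation of the DCT).
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def matmul(a, b):
    """Plain integer matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_core_transform(block):
    """Y = CF * X * CF^T using integer arithmetic only."""
    return matmul(matmul(CF, block), transpose(CF))


# Rows are orthogonal (off-diagonal entries are 0) but not unit length.
gram = matmul(CF, transpose(CF))
print(gram)
# [[4, 0, 0, 0], [0, 10, 0, 0], [0, 0, 4, 0], [0, 0, 0, 10]]

flat = [[5] * 4 for _ in range(4)]
print(forward_core_transform(flat)[0][0])  # 80: a flat block has only a DC term
```

Because every operation is an exact integer computation, an encoder and a decoder built by different vendors produce bit-identical results, which is the mismatch problem the floating-point 8x8 DCT could not avoid.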
Quantization
H.261 DCT Coefficient Quantization
• DC coefficient in intra mode: uniform, step size = 8
• All other coefficients: uniform with deadzone, step size = 2~64 (MQUANT)
• Deadzone: avoids coding many small coefficients, which are typically due to noise, by discarding them
• The width of the deadzone is an encoder option
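A generic deadzone quantizer can be sketched as follows. This is not the H.261 reconstruction rule; it only illustrates how an encoder-side rounding offset (the hypothetical `rounding` parameter) widens or narrows the zero bin, which is why the deadzone width can be left as an encoder option.

```python
def quantize_deadzone(coeff, step, rounding=0.0):
    """Map a coefficient to an integer level.

    rounding=0.5 gives a plain uniform quantizer (zero bin of width step);
    rounding=0.0 truncates toward zero, so the zero bin is 2 * step wide.
    Values in between trade rate against distortion.
    """
    sign = -1 if coeff < 0 else 1
    return sign * int(abs(coeff) / step + rounding)


def dequantize(level, step):
    """Simplified reconstruction at the lower edge of each bin (assumption)."""
    return level * step


print(quantize_deadzone(7, 8))         # 0: falls inside the deadzone
print(quantize_deadzone(7, 8, 0.5))    # 1: plain rounding keeps it
print(quantize_deadzone(-20, 8))       # -2
print(dequantize(quantize_deadzone(-20, 8), 8))  # -16
```

Small coefficients near zero cost bits but carry mostly noise, so widening the zero bin typically saves rate with little visible loss.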
More Quantization

• MPEG-1
  • Using perceptual-based quantization matrix for I-blocks (same as JPEG)
  • Same quantization step sizes

 8 16 19 22 26 27 29 34
16 16 22 24 27 29 34 37
19 22 26 27 29 34 34 38
22 22 26 27 29 34 37 40
22 26 27 29 32 35 40 48
26 27 29 32 35 40 48 58
26 27 29 34 38 46 56 69
27 29 35 38 46 56 69 83
H.264 Quantization
• Need to adjust for Integer Transform by scaling the quantization step sizes
• Also, instead of having equally spaced Quantization step sizes, H.264 has logarithmically spaced quantization step sizes
• Increasing the QP by 6 increases the quantization step size by a factor of 2
• Increasing the QP by 1 increases the quantization step size by about 12%
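The two statements above are the same rule: if the step size doubles every 6 QP values, each single QP increment multiplies it by 2^(1/6). A minimal check, with `qstep_ratio` as a hypothetical helper name:

```python
def qstep_ratio(delta_qp):
    """Factor by which the quantization step size grows when QP rises by delta_qp."""
    return 2.0 ** (delta_qp / 6.0)


print(f"{qstep_ratio(6):.3f}")  # 2.000
print(f"{qstep_ratio(1):.3f}")  # 1.122, i.e. about 12% larger
```

This logarithmic spacing gives roughly constant relative control over bit rate across the whole QP range, instead of very coarse control at small step sizes.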
Variable Length Coding
• H.261
  • DCT coefficients are converted into run-length representations and then coded using VLC (Huffman coding for each pair of symbols)
    • Symbol: (zero run-length, non-zero value range)
  • Other information is also coded using VLC (Huffman coding)
• H.263
  • 3-D VLC for DCT coefficients (last, run-length, value)
  • Syntax-based arithmetic coding (option)
    • 4% savings in bit rate for P-mode, 10% savings for I-mode, at 50% more computation
• MPEG-4
  • 3-D VLC similar to H.263
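The difference between the 2-D (run, level) symbols and the 3-D (last, run, level) symbols can be sketched on a toy block. The code below is a simplification: it scans all 64 coefficients in zigzag order, ignores the separate handling of the intra DC term, and stops before the actual VLC table lookup. All function names are hypothetical.

```python
def zigzag_indices(n=8):
    """(row, col) pairs of an n x n block in zigzag scan order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Walk anti-diagonals; alternate direction on odd and even diagonals.
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def run_level_pairs(scanned):
    """2-D symbols: (number of zeros skipped, nonzero level)."""
    pairs, run = [], 0
    for value in scanned:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs  # trailing zeros need a separate end-of-block symbol


def last_run_level(scanned):
    """3-D symbols: the 'last' flag replaces the end-of-block symbol."""
    pairs = run_level_pairs(scanned)
    return [(int(i == len(pairs) - 1), run, level)
            for i, (run, level) in enumerate(pairs)]


block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 12, -3, 2
scanned = [block[r][c] for r, c in zigzag_indices()]

print(run_level_pairs(scanned))  # [(0, 12), (0, -3), (1, 2)]
print(last_run_level(scanned))   # [(0, 0, 12), (0, 0, -3), (1, 1, 2)]
```

Each resulting symbol would then be mapped to a variable-length codeword, with short codewords assigned to the most frequent symbols.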
MPEG-2 DCT Modes
Two types of DCT and two types of scan pattern:
• Frame DCT: divides an MB into 4 blocks, as usual
• Field DCT: reorders pixels in an MB into top and bottom fields
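The two DCT modes differ only in how the 16 luma rows of a macroblock are grouped into 8x8 blocks before the transform. A minimal sketch, assuming even rows belong to the top field and odd rows to the bottom field; `split_luma_blocks` is a hypothetical helper:

```python
def split_luma_blocks(mb, field_dct):
    """Split a 16x16 luma macroblock (list of 16 rows) into four 8x8 blocks.

    Frame DCT: the blocks are the four spatial quadrants.
    Field DCT: rows are regrouped first, so the top-field (even) rows form
    the upper two blocks and the bottom-field (odd) rows the lower two.
    """
    rows = mb[0::2] + mb[1::2] if field_dct else mb
    return [[row[c:c + 8] for row in rows[r:r + 8]]
            for r in (0, 8) for c in (0, 8)]


# Toy macroblock in which every sample stores its own row index.
mb = [[r] * 16 for r in range(16)]

frame_blocks = split_luma_blocks(mb, field_dct=False)
field_blocks = split_luma_blocks(mb, field_dct=True)

print([row[0] for row in frame_blocks[0]])  # [0, 1, 2, 3, 4, 5, 6, 7]
print([row[0] for row in field_blocks[0]])  # [0, 2, 4, 6, 8, 10, 12, 14]
print([row[0] for row in field_blocks[2]])  # [1, 3, 5, 7, 9, 11, 13, 15]
```

When there is motion between the two fields of an interlaced frame, adjacent rows are poorly correlated, so grouping rows by field tends to concentrate energy in fewer DCT coefficients.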
H.264 Entropy Coding
• Baseline technique: CAVLC (context adaptively switched sets of variable length codes)
• A more complex technique called CABAC: context-based adaptive binary arithmetic coding
• Both offer significant improvement over Huffman coding which uses pre-designed coding tables based on some assumed statistics
More Comparisons Between Standards
Performance of H.261 and H.263
[Figure: performance comparison on Foreman, QCIF, 12.5 Hz. Curves: integer MC, +/- 16; integer MC, +/- 16, loop filter; integer MC, +/- 32; half-pel MC, +/- 32; OBMC, 4 MVs, etc.]
MPEG-2 vs. MPEG-1 Video

• MPEG-1 only handles progressive sequences (SIF)
• MPEG-2 is targeted primarily at interlaced sequences and at higher resolution (BT.601 = 4CIF)
• More sophisticated motion estimation methods (frame/field prediction mode) are developed to improve estimation accuracy for interlaced sequences
• Different DCT modes and scanning methods are developed for interlaced sequences
• MPEG-2 has various scalability modes
• MPEG-2 has various profiles and levels, each combination targeted at a different application
MPEG-4 vs. MPEG-1 Coding Efficiency
[Figure: Coding Efficiency, Streaming. From [Ostermann02]]
[Figure: Coding Efficiency, Conferencing. From [Ostermann02]]
H.264 Complexity
• H.264 decoder is about 2 times as complex as an MPEG-4 Visual decoder for the Simple profile
• H.264 encoder is about 10 times as complex as a corresponding MPEG-4 Visual encoder for the Simple profile
• The H.264/AVC main profile decoder suitable for entertainment applications is about 4 times more complex than MPEG-2
Summary
• H.261:
  • First video coding standard, targeted for video conferencing over ISDN
  • Uses block-based hybrid coding framework with integer-pel MC
• H.263:
  • Improved quality at lower bit rate, to enable video conferencing/telephony below 54 kbps (modems or internet access, desktop conferencing)
  • Half-pel MC and other improvements
• MPEG-1 video
  • Video on CD and video on the Internet (good quality at 1.5 Mbps)
  • Half-pel MC and bidirectional MC
• MPEG-2 video
  • TV/HDTV/DVD (4-15 Mbps)
  • Extended from MPEG-1, considering interlaced video
Summary (Cont'd)
• MPEG-4
  • To enable object manipulation and scene composition at the decoder → interactive TV/virtual reality
  • Object-based video coding: shape coding
  • Coding of synthetic video and audio: animation
• H.264:
  • Significant improvement in coding efficiency over H.263/MPEG-4
  • Fundamentally similar ideas but with more adaptive/optimized implementation, feasible only with recent advances in computation power
• Other MPEG standards
  • MPEG-7
    • To enable search and browsing of multimedia documents
  • MPEG-21
    • Beyond MPEG-7, considering intellectual property protection, etc.
Some References
• Wang, Ostermann, Zhang, Chap. 13 (13.2, 13.4-13.6)
• H.264:
  • J. Ostermann et al., "Video coding with H.264/AVC: Tools, performance, and complexity," IEEE Circuits and Systems Magazine, First Quarter, 2004
  • IEEE Trans. Circuits and Systems for Video Technology, special issue on H.264, July 2003