ECE 634 –Digital Video Systems Spring 2019zhu0/ece634_s19/lecture/... · 2019-03-05 ·...

ECE 634 – Digital Video Systems
Spring 2019

Fengqing Maggie Zhu
Assistant Professor of ECE

MSEE [email protected]

Video Coding Standards


Outline

• Video standardization
• What is standardized?
• How does the standardization process work?
• Why does certain technology get included?
• Profiles and levels

Motivation for Standards

• Goal of standards:
  • Ensuring interoperability: enabling communication between devices made by different manufacturers
  • Promoting a technology or industry
  • Reducing costs

Requirements for Successful Standards

• Interoperability: enable communication between devices made by different manufacturers
• Innovation: perform significantly better than the previous standard
• Competition: allow competition between manufacturers; only standardize the bit-stream syntax and the reference decoder
• Transmission and storage independent: can be used for a range of applications
• Forward compatibility: decode bit-streams from the prior standard
• Backward compatibility: prior-generation decoders can partially decode new bit-streams

What do the Standards Specify?

• Not the encoder
• Not the decoder
• Just the bit-stream syntax and the decoding process (e.g., use IDCT, but not how to implement the IDCT)
→ Enables improved encoding & decoding strategies to be employed in a standard-compatible manner

[Figure: Encoder → Bitstream → Decoder, with the scope of standardization marked on the decoding process.]

Timeline and “Needs” of Standards

• H.261 (1990): video conferencing
• MPEG-1 (1992): non-interactive applications, VCD
• MPEG-2 and H.262 (1996): TV broadcast, DVD
• H.263 (Nov. 1995; Sept. 1997; Nov. 2000): video conferencing
• MPEG-4 video (part 2) (1999): object-oriented coding
• H.264 and MPEG-4 part 10 (AVC) (2003): compression efficiency
• AVS (Audio and Visual Coding Standard) (2006): avoiding high licensing fees (not covered)
• HEVC/H.265 (2013): compression efficiency and higher resolutions (more details later)
• AV1 (2018): codec for the web, open and royalty free (more details later)

Image and Video Compression Standards

Standard | Application | Bit Rate
JPEG | Continuous-tone still-image compression | Variable
H.261 | Video telephony and teleconferencing over ISDN | p x 64 kb/s
MPEG-1 | Video on digital storage media (CD-ROM) | 1.5 Mb/s
MPEG-2 | Digital television | 2-20 Mb/s
H.263 | Video telephony over PSTN | 33.6-? kb/s
MPEG-4 | Object-based coding, synthetic content, interactivity | Variable
JPEG-2000 | Improved still-image compression | Variable
H.264 / MPEG-4 AVC | Improved video compression | 10's kb/s to Mb/s

MPEG and JPEG: International Organization for Standardization (ISO)
H.26x family: International Telecommunication Union (ITU)

Multimedia Communications Standards and Applications

Standards | Application | Video Format | Raw Data Rate | Compressed Data Rate
H.320 (H.261) | Video conferencing over ISDN | CIF | 37 Mbps | >=384 Kbps
H.320 (H.261) | Video conferencing over ISDN | QCIF | 9.1 Mbps | >=64 Kbps
H.323 (H.263) | Video conferencing over Internet | 4CIF/CIF/QCIF | - | >=64 Kbps
H.324 (H.263) | Video over phone lines/wireless | QCIF | 9.1 Mbps | >=18 Kbps
MPEG-1 | Video distribution on CD/WWW | CIF | 30 Mbps | 1.5 Mbps
MPEG-2 | Video distribution on DVD/digital TV | CCIR601 4:2:0 | 128 Mbps | 3-10 Mbps
MPEG-4 | Multimedia distribution over Inter/Intranet | QCIF/CIF | - | 28-1024 Kbps
GA-HDTV | HDTV broadcasting | SMPTE296/295 | <=700 Mbps | 18-45 Mbps
MPEG-7 | Multimedia databases (content description and retrieval) | - | - | -

ITU-T Multimedia Communications Standards


History of Video Coding Standards

[Figure: timeline (1990, 1996, 2002, 2004, 2007) of the ITU standards H.261, H.262, H.263, H.263+, H.263++, and H.264, the ISO standards MPEG-1, MPEG-2, MPEG-4, and MPEG-4 AVC, and Scalable Video Coding (SVC). Applications shown: videoconferencing, VCD, digital TV, DVD, videophone, video iPod, 3G cellular, cable, satellite, Blu-ray, HD DVD.]

H.261 Video Coding Standard

• For video conferencing/video phone
  • Video coding standard in H.320
  • Low delay (real-time, interactive)
  • Slow motion in general
• For transmission over ISDN
  • Fixed bandwidth: p x 64 Kbps, p = 1, 2, …, 30
• Video format:
  • CIF (352x288, above 128 Kbps)
  • QCIF (176x144, 64-128 Kbps)
  • 4:2:0 color format, progressive scan
• Work started in 1985; standard published in 1990
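The raw data rates quoted for CIF and QCIF follow from the picture size and the 4:2:0 sampling. A minimal sketch, assuming 8 bits per sample and 30 frames per second (neither is stated on the slide); the function name is illustrative:

```python
def raw_bit_rate(width, height, fps=30, bits_per_sample=8):
    """Raw bit rate in bits/s for 4:2:0 video.

    4:2:0 carries one luma sample per pixel plus two chroma planes at
    quarter resolution, i.e. 1.5 samples per pixel on average.
    """
    samples_per_frame = width * height * 3 // 2
    return samples_per_frame * bits_per_sample * fps


cif = raw_bit_rate(352, 288)    # about 36.5 Mbps
qcif = raw_bit_rate(176, 144)   # about 9.1 Mbps

# Compression ratio needed to fit QCIF into one 64 Kbps ISDN channel (p = 1).
qcif_ratio = qcif / 64_000
```

Under these assumptions, QCIF at p = 1 needs a compression ratio of roughly 140:1, which is why H.261 targets slow-motion, head-and-shoulders content.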

MPEG-1 Overview

• Audio/video on CD-ROM (1.5 Mbps, SIF: 352x240)
  • Maximum: 1.856 Mbps, 768x576 pels
• Started late 1988, test in 10/89, Committee Draft 9/90
• ISO/IEC 11172-1~5 (systems, video, audio, compliance, software)
• Prompted an explosion of digital video applications: MPEG-1 video CD and downloadable video over the Internet
• Software-only decoding, made possible by the introduction of Pentium chips, was key to its success in the commercial market
• MPEG-1 Audio
  • Offers 3 coding options (3 layers); higher layers have higher coding efficiency at the cost of more computation
  • MP3 = MPEG-1 layer 3 audio

MPEG-2/H.262 Overview

• A/V broadcast (TV, HDTV, terrestrial, cable, satellite, high-speed Inter/Intranet) as well as DVD video
• 4~8 Mbps for TV quality, 10-15 Mbps for better quality at SDTV resolutions (BT.601)
• 18-45 Mbps for HDTV applications
  • MPEG-2 video main profile at high level (MP@HL) is the video coding standard used in HDTV
• Test in 11/91, Committee Draft 11/93
• ISO/IEC 13818-1~6 (systems, video, audio, compliance, software, DSM-CC)
• Consists of various profiles and levels
• Backward compatible with MPEG-1
• MPEG-2 Audio
  • Supports 5.1 channels
  • MPEG-2 AAC: requires 30% fewer bits than MPEG-1 layer 3

H.263 Overview

• H.263 is the video coding standard in H.323/H.324, targeted at visual telephony over PSTN or the Internet
• Can accommodate computationally more intensive options than H.261
  • Initial version (H.263 baseline): 1995
  • H.263+: 1997
  • H.263++: 2000
• Goal: improved quality at lower rates
• Result: significantly better quality at lower rates
  • Better video at 18-24 Kbps than H.261 at 64 Kbps
  • Enables video phone over regular phone lines (28.8 Kbps) or wireless modem

MPEG-4 Overview

• Video functionalities beyond MPEG-1/2
  • Interaction with individual objects
    • The displayed scene can be composed by the receiver from coded objects
  • Scalability of contents
  • Error resilience
  • Coding of both natural and synthetic audio and video
• Many other sections
  • Digital Rights Management
  • Advanced Audio Coding (AAC)

H.264/AVC Standards

• Finalized March 2003
• Developed by the Joint Video Team (JVT), including video coding experts from the ITU-T and the ISO MPEG
• Also known as MPEG-4 part 10, Advanced Video Coding (AVC)
• Improved video coding efficiency, up to 50% over H.263++/MPEG-4
  • Half the bit rate for similar quality
  • Significantly better quality for the same bit rate

AVS (Audio Visual Coding Standard) Overview

• Chinese standard; 2002-2003 (video)
• Licensing fees for all ISO and ITU standards after (not including) MPEG-1
• China produces more than 30 million set-top boxes
• Interlaced pictures, SDTV and HDTV
• Similar (slightly lower) compression efficiency compared with H.264
  • Intra prediction
  • Variable block-size MC
  • ¼-pel motion resolution, 4-tap interpolation filter
  • 8x8 integer transform
  • Deblocking

High Efficiency Video Coding (HEVC)

• October 2010: defined new Test Model for HEVC, HM1
• Targeting high-efficiency and low-complexity applications
• Block sizes from 8x8 to 64x64 in a tree structure
• Up to 34 directions for intra prediction
• 6- or 12-tap interpolation filter, down to 1/4-sample
• Advanced motion vector prediction
• CABAC or Low Complexity Entropy Coding
• Deblocking filter or Adaptive Loop Filter
• Extended precision options
• Goal: 2x better video compression performance compared to H.264/AVC

Video Compression Progress

[Figure: video compression progress across standards; ~50% bit-rate reduction (H.264 vs. MPEG-2).]

Example comparison: results depend strongly on the specific sequence and coding tools employed!

Standardization Process

• Competition phase
• Collaboration phase
  • A Reference Model is defined
  • Companies make contributions
  • A contribution is incorporated into the (new) reference model ONLY if:
    • It improves performance by a sufficient margin
    • It is replicated by at least one other company
    • Its performance improvement is not at the cost of “too much” complexity
• Verification phase

Video Coding Standards

• Video coding standards define the operation of a decoder given a correct bitstream
  • They do NOT describe an encoder
• Video coding standards typically define a toolkit
  • Not all pieces of the toolkit need to be implemented to create a conforming bitstream
• Decoders must implement some subset of the toolkit to be declared “conforming”

MPEG-2 Profiles and Levels

• Goal: To enable more efficient implementations for different applications (interoperability)

• Profile: Subset of the tools applicable for a family of applications

• Level: Bounds on the complexity for any profile

[Figure: grid of profiles (Simple, Main, High) versus levels (Low, Main, High).]

DVD & SD digital TV: Main Profile at Main Level (MP@ML)

HDTV: Main Profile at High Level (MP@HL)

Profiles and Levels in MPEG-2

Profiles: tools

Levels: parameter range for a given profile

Main profile at main level (MP@ML) is the most popular, used for digital TV

Main profile at high level (MP@HL): HDTV

4:2:2 at main level (4:2:2@ML) is used for studio production

Profiles and Levels: Example H.264/AVC

From [Ostermann04]

Evolution of the Standards

• Representations of pictures
• Coding structure (picture types and prediction structures)
• Motion representations
• Loop filters
• Spatial prediction
• Transforms
• Quantizers
• Variable length coding

Representations of Pictures

Representation: Picture Formats

• Progressive pictures
  • H.261: only CIF (352x288) and QCIF (176x144)
  • MPEG-1: progressive, using SIF video format (352x240p30 or 352x288p25)
  • All other standards
• Interlaced pictures
  • MPEG-2 and beyond
  • MPEG-2 introduces the concept of field pictures
  • 4:2:0 format is modified to shift chroma samples
• Video Object Planes (VOP) == objects
  • MPEG-4 only
  • Allows interactive editing of objects

MPEG-2: Chroma Sample Shift


Object Description Hierarchy in MPEG-4

[Figure: tree with one VO at the root, the layers VOL1 and VOL2 below it, and the planes VOP1 to VOP4 at the leaves.]

VO: video object
VOL: video object layer (can be different parts of a VO or different rate/resolution representations of a VO)
VOP: video object plane

Example of Scene Composition

[Figure: scene composed from VOP1, VOP2, and VOP3, grouped into VOL1 and VOL2.]

The decoder can compose a scene by including different VOPs in a VOL.

Object-Based Coding

• Entire scene is decomposed into multiple objects
  • Object segmentation is the most difficult task!
  • But this does not need to be standardized
• Each object is specified by its shape, motion, and texture (color)
  • Shape and texture both change in time (specified by motion)
• MPEG-4 assumes the encoder has a segmentation map available, and specifies how to decode (not encode!) shape, motion, and texture

Coding Structure


Representation: Coding Structure

• Motion-compensated predictive (P) frames
  • H.261 and all later standards
• Bidirectional (B) motion-compensated frames
  • Introduced in MPEG-1 for improved efficiency
  • Not included in H.263 (baseline) due to increased delay
  • Re-included in all subsequent standards (H.263+)
• PB-frames
  • Combination of P and B, appears only in H.263
• Field pictures and field prediction
  • MPEG-2 and beyond

H.263: PB-Picture Mode

PB-picture mode codes two pictures as a group. The second picture (P) is coded first; then the first picture (B) is coded using both the P-picture and the previously coded picture. This avoids the reordering of pictures required in the normal B-mode, but it still incurs more coding delay than using P-frames only.

In a B-block, forward prediction (from the previous frame) can be used for all pixels; backward prediction (from the future frame) is used only for those pels where the backward motion vector aligns with pels of the current MB. Pixels in the “white area” use only forward prediction.

An improved PB-frame mode, defined in H.263+, removes this restriction.

MPEG-2: Frame vs. Field Picture

Motion Representation


Motion Representation and Coding

• Methods for generating the MVs are not specified in the standard
• H.261
  • Forward prediction only
  • Integer accuracy only, range [-16,16]
• H.263
  • Forward and backward prediction (B and PB-frames)
  • Half-pixel motion accuracy; bilinear interpolation filter; range [-31.5,31]
  • Optional: unrestricted motion (across picture boundary)
  • Optional: overlapped block motion estimation
  • Optional: 4 motion vectors per macroblock
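Half-pixel accuracy with a bilinear filter means that samples between integer positions are averages of the two or four nearest integer-position pixels. A minimal sketch, assuming coordinates given in half-pel units and round-half-up rounding; the function name and the rounding convention are assumptions of this sketch, not taken from the slides:

```python
def half_pel_sample(frame, x2, y2):
    """Sample `frame` at position (x2 / 2, y2 / 2).

    `frame` is a list of rows of integer pixel values; x2 and y2 are
    non-negative coordinates in half-pel units.
    """
    x, y = x2 // 2, y2 // 2      # integer-pel position of the top-left neighbor
    fx, fy = x2 % 2, y2 % 2      # 1 if there is a half-pel offset on that axis
    a = frame[y][x]
    b = frame[y][x + fx]
    c = frame[y + fy][x]
    d = frame[y + fy][x + fx]
    if fx and fy:
        # center of four pixels
        return (a + b + c + d + 2) // 4
    if fx or fy:
        # midpoint of two pixels along the axis that has the offset
        return (a + (b if fx else c) + 1) // 2
    return a


frame = [[10, 20],
         [30, 40]]
```

Integer positions return the pixel itself, so integer-pel motion compensation as in H.261 is the special case where both offsets are zero.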

Overlapped Block Motion Compensation (OBMC)

• Conventional block motion compensation
  • One best-matching block is found from a reference frame
  • The current block is replaced by the best-matching block
• OBMC
  • Each pixel in the current block is predicted by a weighted average of several corresponding pixels in the reference frame
  • The corresponding pixels are determined by the MVs of the current as well as adjacent MBs
  • The weight for each corresponding pixel depends on the expected accuracy of the associated MV
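The OBMC idea can be sketched for a single pixel as follows. The weights and motion vectors here are made-up illustrative values, not the weighting matrices of any standard, and `obmc_pixel` is a hypothetical helper name:

```python
def obmc_pixel(ref, x, y, mvs, weights):
    """Predict pixel (x, y) as a weighted average of reference pixels.

    `mvs` holds (dx, dy) for the current block and its neighbors;
    `weights` holds one non-negative weight per MV (normalized here).
    """
    total = sum(weights)
    acc = 0.0
    for (dx, dy), w in zip(mvs, weights):
        acc += w * ref[y + dy][x + dx]
    return acc / total


ref = [[0, 10, 20],
       [30, 40, 50],
       [60, 70, 80]]

# Conventional MC: a single MV, so the prediction is just the displaced pixel.
conventional = obmc_pixel(ref, 1, 1, mvs=[(0, 0)], weights=[1])

# OBMC: the current block's MV gets the largest weight, two neighbor MVs less.
overlapped = obmc_pixel(ref, 1, 1, mvs=[(0, 0), (1, 0), (0, -1)], weights=[4, 2, 2])
```

In a real codec the weights vary with the pixel's position inside the block, so that neighbor MVs matter most near the shared block boundary.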

Motion Representation and Coding

• MPEG-1
  • Using more advanced motion compensation
    • Half-pel accuracy motion estimation, range [-64,64]
  • Using bi-directional temporal prediction
• MPEG-2
  • Field prediction for field pictures
  • Field prediction for frame pictures
  • Dual prime for P-pictures (one MV for two fields)
  • 16x8 MC for field pictures

Field Prediction for Field Pictures

• Each field is predicted individually from the reference fields
  • A P-field is predicted from one previous field
  • A B-field is predicted from two fields chosen from two reference pictures

Field Prediction for Frame Pictures

Useful for rapid motion

MPEG-4 Prediction

• Quarter-pel motion estimation
• Four MVs and unrestricted MVs (range [-2048,2048])
• Optional OBMC
• Sprite coding
  • Code a large background at the beginning of the sequence, plus affine mappings, which map parts of the background to the displayed scene at different time instances
  • Decoder can vary the mapping to zoom in/out, pan left/right
• Global motion compensation
  • Using 8-parameter projective mapping
  • Effective for sequences with large global motion
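An 8-parameter projective mapping can be written as a ratio of affine expressions in the pixel coordinates. The sketch below assumes one common parameter ordering (a0 to a7); the ordering and the function name are illustrative and not taken from the MPEG-4 specification:

```python
def projective_map(params, x, y):
    """Map (x, y) with an 8-parameter projective transform.

    x' = (a0*x + a1*y + a2) / (a6*x + a7*y + 1)
    y' = (a3*x + a4*y + a5) / (a6*x + a7*y + 1)
    With a6 = a7 = 0 this reduces to a 6-parameter affine mapping.
    """
    a0, a1, a2, a3, a4, a5, a6, a7 = params
    denom = a6 * x + a7 * y + 1.0
    return ((a0 * x + a1 * y + a2) / denom,
            (a3 * x + a4 * y + a5) / denom)


pan_right = (1, 0, 5, 0, 1, 0, 0, 0)   # translate by 5 pixels horizontally
zoom_2x = (2, 0, 0, 0, 2, 0, 0, 0)     # scale both axes by 2
```

Panning and zooming over a sprite, as described above, correspond to changing only a few of these parameters between time instances.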

H.264 Motion Compensation

• Quarter-pel accuracy
• Variable block size
• Multiple reference frames
  • Generalized B-picture
• Weighted prediction
  • Allows the encoder to specify the use of scaling and offset during motion compensation
  • Provides significant benefits for fading sequences such as fade-to-black, fade-in, and cross-fade transitions
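Why scaling and offset help with fades can be seen in a toy example. This is a simplified floating-point sketch; the actual H.264 syntax uses integer weights, and the names below are illustrative:

```python
def weighted_prediction(ref_block, scale, offset):
    """Apply pred = scale * ref + offset, clipped to the 8-bit range."""
    return [[min(255, max(0, round(scale * p + offset))) for p in row]
            for row in ref_block]


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


ref = [[100, 120], [140, 160]]
cur = [[50, 60], [70, 80]]      # same content, faded to half brightness

plain_residual = sad(cur, ref)
weighted_residual = sad(cur, weighted_prediction(ref, 0.5, 0))
```

Without weighting, the residual carries the whole brightness change; with a scale of 0.5 the prediction matches the faded block exactly in this toy case.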

H.264 Variable Block-Size Motion Compensation

• Use variable-size block-based motion compensation
  • 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4
  • H.263/MPEG-4 use only 16x16 and 8x8

From [Ostermann04]

H.264 Multiple Reference Frames for Motion Compensation

• Can use one or two from several possible reference frames
• When two reference frames are used, arbitrary weights can be used to combine them (generalized B-picture)

From [Ostermann04]

MPEG-1 Group of Pictures

Picture type:     I  B  B  P  B  B  P  B  B  P  B  B  P  B  B  I
Bitstream order:  0  2  3  1  5  6  4  8  9  7 11 12 10 14 15 13
Display order:    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
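The bitstream order above follows from one rule: a B-picture can only be decoded after the anchor (I or P) picture that follows it in display order. A minimal sketch, assuming the GOP pattern I B B P B B P B B P B B P B B I (inferred from the listed bitstream order); the function name is illustrative:

```python
def coding_positions(display_types):
    """Return, for each picture in display order, its position in the bitstream.

    Each anchor (I or P) is transmitted before the B-pictures that
    precede it in display order.
    """
    coding_order = []   # display indices in transmission order
    pending_b = []      # B-pictures waiting for their future anchor
    for idx, ptype in enumerate(display_types):
        if ptype == "B":
            pending_b.append(idx)
        else:
            coding_order.append(idx)
            coding_order.extend(pending_b)
            pending_b = []
    coding_order.extend(pending_b)   # trailing B-pictures, if any
    positions = [0] * len(display_types)
    for pos, idx in enumerate(coding_order):
        positions[idx] = pos
    return positions


positions = coding_positions("IBBPBBPBBPBBPBBI")
```

For this pattern the result reproduces the "Bitstream order" row of the slide, which also shows the decoder-side cost: B-pictures add reordering delay.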

H.264 Generalized GOP

[Figure: generalized GOP of I- and B-pictures, display order 0 to 15.]

In H.264 and beyond, B frames can be used for prediction.

Loop Filter

• In-loop filtering can be applied to suppress temporal propagation of coding noise
• H.261
  • Separable filter
  • Loop filter can be turned on or off
• MPEG-1
  • No loop filter (half-pel motion compensation provides some filtering)
• H.263
  • Optional deblocking filter included in H.263+
  • Overlapped block motion effectively smooths block boundaries
  • Decoder can choose to implement out-of-loop deblocking filter
• H.264
  • Deblocking filter adapts to the strength of the blocking artifact

H.264 Adaptive Deblocking

• Whether filtering is turned on depends on the pixel differences involving pixels p0,…, q0,…, and the filter depends on block characteristics and coding mode
• 4 strengths available, based on neighboring block types
• Deblocking results in bit rate savings of 6-9% at medium qualities, and more remarkable subjective improvements

From [Ostermann04]
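The on/off decision can be sketched as a set of threshold tests on the samples next to the block edge. The comparison pattern below follows the general form of such tests; the threshold values are placeholders (in H.264 they depend on the quantization parameter), and the function name is illustrative:

```python
def should_filter_edge(p1, p0, q0, q1, alpha, beta):
    """Decide whether to smooth the edge between samples p0 and q0.

    p1, p0 lie on one side of the block boundary and q0, q1 on the other.
    A small step across the boundary is treated as a blocking artifact;
    a large step is assumed to be a real image edge and is left alone.
    """
    return (abs(p0 - q0) < alpha and
            abs(p1 - p0) < beta and
            abs(q1 - q0) < beta)


# Small step across the boundary: likely an artifact, so filter.
artifact = should_filter_edge(100, 101, 105, 106, alpha=10, beta=4)

# Large step across the boundary: likely a real edge, so leave it.
real_edge = should_filter_edge(100, 101, 180, 181, alpha=10, beta=4)
```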

Spatial Prediction

• MPEG-1
  • DC coefficients coded predictively
• H.263
  • Optional: intra DC prediction (10-15% improvement)
• MPEG-4
  • DC prediction: can predict the DC coefficient from either the previous block or the block above
  • AC prediction: can predict one column/row of AC coefficients from either the previous block or the block above

H.264 Intra prediction

8 possible directions

Apply prediction to the entire 16x16 block, or apply prediction separately to sixteen 4x4 blocks

Intra-Frame Prediction (H.264)

Intra prediction across slice boundaries is not allowed.

H.264 Intra Prediction

• Instead of the simple DC coefficient prediction to exploit the correlation between nearby pixels in the same frame, more sophisticated spatial prediction is used, including INTRA_4x4 and INTRA_16x16

From [Ostermann04]
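Spatial prediction in the pixel domain can be illustrated with three of the simplest 4x4 modes: vertical, horizontal, and DC. This is a sketch only; it omits the diagonal directions and the availability rules for neighbors, and the mode names are illustrative labels:

```python
def intra_4x4(top, left, mode):
    """Predict a 4x4 block from already reconstructed neighbors.

    `top` holds the 4 pixels above the block, `left` the 4 pixels to its left.
    """
    if mode == "vertical":       # copy the row above downward
        return [list(top) for _ in range(4)]
    if mode == "horizontal":     # copy the left column rightward
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":             # rounded mean of all 8 neighbors
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unsupported mode: {mode}")


top = [10, 20, 30, 40]
left = [50, 50, 50, 50]

vertical = intra_4x4(top, left, "vertical")
horizontal = intra_4x4(top, left, "horizontal")
dc = intra_4x4(top, left, "dc")
```

The encoder would pick the mode whose prediction leaves the smallest residual and signal that choice; only the residual is then transformed and quantized.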

Transform

• 8x8 DCT
  • H.261
  • MPEG-1
  • H.263
  • MPEG-2
  • MPEG-4
• DCT is non-integer; the result depends on the implementation details

H.264 Integer Transform

• Smaller block size (4x4 or 2x2) can better represent boundaries of moving objects, and match prediction errors generated by smaller block-size motion compensation
• Integer transform can be implemented more efficiently, with no mismatch problem between encoder and decoder
• Uses a few integers, approximates the DCT, maintains orthogonality

Primary transform

From [Ostermann04]
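The properties listed above can be checked directly on the 4x4 core matrix commonly given for H.264. A minimal sketch in pure integer arithmetic; the helper names are illustrative, and the scaling that makes the transform orthonormal is left out because it is folded into quantization:

```python
# 4x4 forward core transform matrix (small integers approximating the DCT).
C = [[1,  1,  1,  1],
     [2,  1, -1, -2],
     [1, -1, -1,  1],
     [1, -2,  2, -1]]


def matmul(a, b):
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward_transform(block):
    """Y = C * X * C^T, using integer arithmetic only."""
    return matmul(matmul(C, block), transpose(C))


flat = [[3] * 4 for _ in range(4)]
coeffs = forward_transform(flat)   # a flat block has only a DC coefficient
gram = matmul(C, transpose(C))     # diagonal, so the rows are orthogonal
```

Because every operation is an exact integer operation, encoder and decoder compute identical results, which is the "no mismatch" property.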

Quantization

H.261 DCT Coefficient Quantization

DC coefficient in intra mode: uniform, step size = 8

All other coefficients: uniform with deadzone, step size = 2~64 (MQUANT)

Deadzone: avoids coding too many small coefficients, which are typically due to noise, by discarding them

The width of the deadzone is an encoder option
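A uniform quantizer with a deadzone can be sketched as follows. This is a generic illustration of the idea, not H.261's exact quantization or reconstruction rule; the function names and the midpoint reconstruction are assumptions:

```python
def quantize_deadzone(coef, step):
    """Uniform quantizer with a deadzone around zero.

    Truncating toward zero maps every |coef| < step to level 0, so the
    zero bin is twice as wide as the other bins.
    """
    sign = -1 if coef < 0 else 1
    return sign * (abs(coef) // step)


def dequantize(level, step):
    """Reconstruct at the midpoint of the bin; level 0 stays 0."""
    if level == 0:
        return 0
    sign = -1 if level < 0 else 1
    return sign * (abs(level) * step + step // 2)


levels = [quantize_deadzone(c, 8) for c in (7, -7, 9, -17)]
recon = [dequantize(lv, 8) for lv in levels]
```

Small coefficients on either side of zero are discarded, which is the noise-suppression effect described above. Since only the decoding side is standardized, an encoder is free to widen or narrow this zero bin.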

More Quantization

• MPEG-1
  • Using a perceptual-based quantization matrix for I-blocks (same as JPEG)
  • Same quantization step sizes

 8 16 19 22 26 27 29 34
16 16 22 24 27 29 34 37
19 22 26 27 29 34 34 38
22 22 26 27 29 34 37 40
22 26 27 29 32 35 40 48
26 27 29 32 35 40 48 58
26 27 29 34 39 46 56 69
27 29 35 38 46 56 69 83

H.264 Quantization

• Need to adjust for Integer Transform by scaling the quantization step sizes

• Also, instead of having equally spaced Quantization step sizes, H.264 has logarithmically spaced quantization step sizes

• Increasing the QP by 6 increases the quantization step size by a factor of 2

• Increasing the QP by 1 increases the quantization step size by about 12%
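The two statements about QP are consistent: a step size that doubles every 6 QP values grows by a factor of 2^(1/6) ≈ 1.12 per QP. A minimal sketch, assuming a base step of 0.625 at QP = 0 (an assumption of this sketch; only the ratios matter here):

```python
def qstep(qp, base=0.625):
    """Quantization step size that doubles every 6 QP values."""
    return base * 2 ** (qp / 6)


ratio_per_qp = qstep(29) / qstep(28)    # about 1.12, i.e. about 12% larger
ratio_per_6qp = qstep(34) / qstep(28)   # about 2
```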


Variable Length Coding

• H.261
  • DCT coefficients are converted into run-length representations and then coded using VLC (Huffman coding for each pair of symbols)
    • Symbol: (zero run-length, non-zero value range)
  • Other information is also coded using VLC (Huffman coding)
• H.263
  • 3-D VLC for DCT coefficients (last, run-length, value)
  • Syntax-based arithmetic coding (option)
    • 4% savings in bit rate for P-mode, 10% savings for I-mode, at 50% more computation
• MPEG-4
  • 3-D VLC similar to H.263
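The run-length representations can be sketched as follows, starting from coefficients that are already in scan order. The function names are illustrative, and the actual VLC tables that map these symbols to bits are not shown:

```python
def run_level_pairs(coeffs):
    """Convert scanned coefficients into (run, level) pairs.

    `run` counts the zeros preceding each non-zero `level`.
    Trailing zeros are dropped; they are implied by an end-of-block
    code (2-D case) or by the `last` flag (3-D case).
    """
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    return pairs


def last_run_level(coeffs):
    """3-D events (last, run, level); last = 1 marks the final non-zero coefficient."""
    pairs = run_level_pairs(coeffs)
    return [(int(i == len(pairs) - 1), run, level)
            for i, (run, level) in enumerate(pairs)]


scanned = [12, 0, 0, -3, 1, 0, 0, 0, 2, 0, 0]
two_d = run_level_pairs(scanned)
three_d = last_run_level(scanned)
```

Folding the end-of-block signal into each event is what distinguishes the 3-D scheme from the 2-D (run, level) pairs.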

MPEG-2 DCT Modes

Two types of DCT and two types of scan pattern:
• Frame DCT: divides an MB into 4 blocks, as usual
• Field DCT: reorders pixels in an MB into top and bottom fields
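The difference between the two DCT modes is only in how the 16 luma rows of a macroblock are grouped into 8x8 blocks before the transform. A minimal sketch, assuming even rows belong to the top field and odd rows to the bottom field; the function names are illustrative:

```python
def to_field_order(mb_rows):
    """Reorder macroblock rows: top-field (even) rows first, then bottom-field (odd) rows."""
    return mb_rows[0::2] + mb_rows[1::2]


def split_blocks(mb_rows):
    """Split a 16x16 luma macroblock into four 8x8 blocks (TL, TR, BL, BR)."""
    blocks = []
    for r0 in (0, 8):
        for c0 in (0, 8):
            blocks.append([row[c0:c0 + 8] for row in mb_rows[r0:r0 + 8]])
    return blocks


# Toy macroblock in which every row stores its own row index.
mb = [[r] * 16 for r in range(16)]

frame_blocks = split_blocks(mb)                   # input to frame DCT
field_blocks = split_blocks(to_field_order(mb))   # input to field DCT
```

With field DCT, each 8x8 block contains lines from a single field, which avoids the high vertical frequencies that interlaced motion would otherwise create.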

H.264 Entropy Coding

• Baseline technique: CAVLC (context adaptively switched sets of variable length codes)

• A more complex technique called CABAC: context-based adaptive binary arithmetic coding

• Both offer significant improvement over Huffman coding which uses pre-designed coding tables based on some assumed statistics


More Comparisons Between Standards


Performance of H.261 and H.263

[Figure: performance curves for Foreman, QCIF, 12.5 Hz, comparing: integer MC, +/- 16; integer MC, +/- 16, loop filter; integer MC, +/- 32; half-pel MC, +/- 32; OBMC, 4 MVs, etc.]

MPEG-2 vs. MPEG-1 Video

• MPEG-1 only handles progressive sequences (SIF)
• MPEG-2 is targeted primarily at interlaced sequences and at higher resolution (BT.601 = 4CIF)
• More sophisticated motion estimation methods (frame/field prediction mode) are developed to improve estimation accuracy for interlaced sequences
• Different DCT modes and scanning methods are developed for interlaced sequences
• MPEG-2 has various scalability modes
• MPEG-2 has various profiles and levels, each combination targeted at a different application

MPEG-4 vs. MPEG-1 Coding Efficiency

Coding Efficiency: Streaming

From [Ostermann02]

Coding Efficiency: Conferencing

From [Ostermann02]

H.264 Complexity

• H.264 decoder is about 2 times as complex as an MPEG-4 Visual decoder for the Simple profile

• H.264 encoder is about 10 times as complex as a corresponding MPEG-4 Visual encoder for the Simple profile

• The H.264/AVC main profile decoder suitable for entertainment applications is about 4 times more complex than MPEG-2


Summary

• H.261:
  • First video coding standard, targeted for video conferencing over ISDN
  • Uses block-based hybrid coding framework with integer-pel MC
• H.263:
  • Improved quality at lower bit rate, to enable video conferencing/telephony below 54 Kbps (modems or internet access, desktop conferencing)
  • Half-pel MC and other improvements
• MPEG-1 video
  • Video on CD and video on the Internet (good quality at 1.5 Mbps)
  • Half-pel MC and bidirectional MC
• MPEG-2 video
  • TV/HDTV/DVD (4-15 Mbps)
  • Extended from MPEG-1, considering interlaced video

Summary (Cont’d)

• MPEG-4
  • To enable object manipulation and scene composition at the decoder → interactive TV/virtual reality
  • Object-based video coding: shape coding
  • Coding of synthetic video and audio: animation
• H.264:
  • Significant improvement in coding efficiency over H.263/MPEG-4
  • Fundamentally similar ideas but with more adaptive/optimized implementation, feasible only with recent advances in computation power
• Other MPEG standards
  • MPEG-7
    • To enable search and browsing of multimedia documents
  • MPEG-21
    • Beyond MPEG-7, considering intellectual property protection, etc.

Some References

• Wang, Ostermann, Zhang, Chap. 13 (13.2, 13.4-13.6)
• H.264:
  • J. Ostermann et al., “Video coding with H.264/AVC: Tools, performance, and complexity,” IEEE Circuits and Systems Magazine, First Quarter, 2004
  • IEEE Trans. Circuits and Systems for Video Technology, special issue on H.264, July 2003

