EE5359 Spring 2012

EE5359 Project Proposal on

COMPARATIVE STUDY OF H.264, DIRAC PRO AND AVS PART 2

By Sudeep Gangavati

ID : 1000717165

LIST OF ACRONYMS

AU: Access Unit

AVS: Audio Video Standard

AVS-M: Audio Video Standard for mobile

B-Frame: Bidirectionally Interpolated Frame

BBC: British Broadcasting Corporation

CAVLC: Context Adaptive Variable Length Coding

CBP: Coded Block Pattern

CIF: Common Intermediate Format

DIP: Direct Intra Prediction

DPB: Decoded Picture Buffer

EOB: End of Block

HD: High Definition

ICT: Integer Cosine Transform

IDR: Instantaneous Decoding Refresh

I-Frame: Intra Frame

ITU-T: International Telecommunication Union, Telecommunication Standardization Sector

JPEG: Joint Photographic Experts Group

MSE: Mean Square Error

PSNR: Peak Signal to Noise Ratio

QCIF: Quarter Common Intermediate Format

SMPTE: Society of Motion Picture and Television Engineers

SSIM: Structural Similarity Index

Video Compression Standards for High Resolution Video

Objective:

The objective of this project is to study, implement and compare video coding standards like

H.264/AVC [1], Dirac Pro [3] and AVS China P2 (AVS video) [8]. The analysis will be carried out

and different performance metrics like MSE, PSNR, bitrate, SSIM [13] and video quality will be

evaluated for high resolution videos at various bitrates.
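
As an illustration of the metrics listed above, the following is a minimal sketch of how MSE, PSNR and a simplified SSIM could be computed for one frame. It assumes 8-bit samples stored as lists of rows, and the function names are hypothetical. The SSIM here is evaluated over the whole frame as a single window, which is a simplification of the usual locally windowed index.

```python
import math


def mse(ref, test):
    """Mean square error between two equally sized frames (lists of rows)."""
    total = 0
    count = 0
    for ref_row, test_row in zip(ref, test):
        for r, t in zip(ref_row, test_row):
            total += (r - t) ** 2
            count += 1
    return total / count


def psnr(ref, test, peak=255):
    """PSNR in dB; peak is the maximum sample value (255 for 8-bit video)."""
    err = mse(ref, test)
    if err == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / err)


def global_ssim(ref, test, peak=255):
    """SSIM over the whole frame as one window (assumed simplification)."""
    x = [v for row in ref for v in row]
    y = [v for row in test for v in row]
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((v - mu_x) ** 2 for v in x) / n
    var_y = sum((v - mu_y) ** 2 for v in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * peak) ** 2  # stabilising constants from the SSIM definition
    c2 = (0.03 * peak) ** 2
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den


original = [[10, 20], [30, 40]]
decoded = [[12, 20], [30, 36]]
print(mse(original, decoded), psnr(original, decoded), global_ssim(original, decoded))
```

For a full sequence, these values would typically be averaged over all frames and plotted against bitrate.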

Motivation:

With the ever increasing demand for high definition video, several different video coding

standards have been developed to address the needs of HD Video coding. The project attempts

to implement and evaluate the video coding standards that have been extensively used for HD

video broadcasting, storage and distribution. The video coding standards that will be evaluated

are H.264/AVC (Main Profile) [1], AVS China P2 (Base Profile) [8] and Dirac Pro [5]. Since Dirac

Pro uses intra frame coding only, the analysis of H.264/AVC [1] and AVS China P2 [8] will also be

restricted to intra frame coding.

Introduction:

H.264/AVC

The H.264/AVC is the latest advanced video coding standard developed by ITU-T Video Coding

Experts Group together with the ISO/IEC Moving Picture Experts Group [1]. It is the most widely

used video coding standard [15] [28] for streaming videos, mobile/handheld applications, HDTV

broadcasting etc. The H.264 standard supports three sampling patterns for the luminance component

(Y), red-difference chroma component (Cr) and blue-difference chroma component (Cb) [20].

The 4:4:4 sampling means that the three components (Y: Cr: Cb) have the same resolution and a

sample of each component exists at every pixel position as shown in Figure 1(a). In 4:2:2 sampling

the chroma components have half the horizontal resolution of luma, and in 4:2:0 sampling they have

half the horizontal and half the vertical resolution.

Fig. 1(a) 4:4:4, 4:2:2 and 4:2:0 sampling patterns [20]
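
To make the effect of the three sampling patterns concrete, the sketch below computes the luma and chroma plane sizes and the total number of samples per frame. It assumes the usual meaning of the patterns (4:2:2 halves chroma horizontally, 4:2:0 halves it horizontally and vertically) and even frame dimensions. The function names are illustrative only.

```python
def plane_sizes(width, height, sampling):
    """Return ((luma_w, luma_h), (chroma_w, chroma_h)) for one frame."""
    # (horizontal divisor, vertical divisor) applied to each chroma plane
    divisors = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}
    dx, dy = divisors[sampling]
    return (width, height), (width // dx, height // dy)


def samples_per_frame(width, height, sampling):
    """Total sample count: one luma plane plus two chroma planes (Cb, Cr)."""
    (lw, lh), (cw, ch) = plane_sizes(width, height, sampling)
    return lw * lh + 2 * cw * ch


for pattern in ("4:4:4", "4:2:2", "4:2:0"):
    print(pattern, samples_per_frame(1920, 1080, pattern))
```

For a 1920 x 1080 frame, 4:2:0 carries half as many samples as 4:4:4 before any compression is applied.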

An H.264 encoder converts the raw video into a compressed version and the decoder converts

the compressed video back to its original format.

The H.264 encoder block diagram is shown in Figure 1(b).

Fig.1 (b) Basic H.264 encoder structure [1]

The encoder performs transform, quantization, prediction and encoding to produce compressed

video. The decoder shown in Figure 2 on the other hand does the inverse operations to obtain

the uncompressed video.

Fig.2 H.264 decoder structure [2] [17]

H.264 Profiles

H.264 standard defines a set of profiles and levels to set points of conformance for different

classes of applications and services. For each profile there are specific encoding tools that will be

supported by the decoders conforming to that profile. There are mainly seven profiles [2]:

Baseline, Main, Extended, High, High 10, High 4:2:2, High 4:4:4.

Main profile is designed for digital storage media and television broadcasting. Extended profile

is designed for multimedia services over Internet. Baseline profile is aimed at real time

applications such as video conferencing. High profile mainly aims at applications such as

content distribution, studio editing and high resolution videos. Profiles are shown in Figure 3.

Fig.3 H.264 profiles and levels [2]

Video Coding Algorithm

The encoder block diagram is shown in Figure 1(b) [1]. The encoder selects between intra and inter

coding for the blocks of each picture. Intra coding exploits several spatial prediction modes as

shown in Figure 3(a) and (b) to reduce the redundancies in the signal for a single picture. Inter

coding does the inter prediction of each block of sample values from previously coded pictures.

Inter coding uses motion vectors for block based inter-prediction to reduce temporal

redundancy between different pictures. The deblocking filter is used to reduce the blocking

artifacts at the block boundaries. The prediction residual is further transformed to remove

spatial correlation in the block before it is quantized. Finally, the intra prediction modes or

motion vectors are combined with the quantized transform coefficient information and encoded

using entropy coding (variable length coding or arithmetic coding).
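
The transform and quantization steps can be sketched as follows, using the 4 x 4 forward integer core transform of H.264 applied to a residual block. The quantiser shown is a plain uniform quantiser with a hypothetical step size. It is only an assumption for illustration and does not reproduce the scaling and quantization tables of the standard.

```python
# Forward 4 x 4 integer core transform matrix of H.264
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def core_transform(block):
    """Compute Y = CF * X * CF^T for a 4 x 4 residual block X."""
    return matmul(matmul(CF, block), transpose(CF))


def quantize(coeffs, step):
    """Illustrative uniform quantiser (assumed; truncates toward zero)."""
    return [[int(c / step) for c in row] for row in coeffs]


residual = [[10] * 4 for _ in range(4)]  # a flat residual block
coeffs = core_transform(residual)
print(coeffs[0][0], quantize(coeffs, 16)[0][0])
```

A flat block puts all of its energy into the single DC coefficient at position (0, 0), which is why smooth residuals compress well after the transform.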

Prediction in H.264

The H.264 video coding standard employs two different prediction techniques called intra

prediction and inter prediction in order to predict the current macroblock [20]. In intra

prediction, the prediction for the current macroblock of image samples is created from

previously coded samples in the same frame. For an intra macroblock, there are three choices of

prediction block size for the luma component, namely 16 x 16, 8 x 8 or 4 x 4 [1][20]. A single

prediction block is generated using one of a number of possible prediction modes. For a 4 x 4

block there are nine modes; the 8 x 8 block size, available only in the High profiles, also has nine

modes; and for 16 x 16 there are four modes [20].

During the prediction, one mode is selected and is then used to predict the values. The 4 x 4

and 16 x 16 luma prediction modes are shown in Figure 3 (a) and Figure 3 (b)

respectively.

Fig. 3(a) Intra prediction modes for 4 x 4 block size [20].

Fig. 3 (b) Intra prediction modes for 16 x 16 block size [20].
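
As a rough sketch of how intra prediction works, the code below builds three of the nine 4 x 4 luma prediction modes (vertical, horizontal and DC) from the neighboring samples above and to the left of the block. Choosing the mode with the smallest sum of absolute differences (SAD) is an assumption made here for illustration, since the mode decision is left to the encoder. The function names are hypothetical.

```python
def predict_4x4(above, left, mode):
    """Build a 4 x 4 prediction block from 4 samples above and 4 to the left."""
    if mode == "vertical":      # copy the row above downwards
        return [list(above) for _ in range(4)]
    if mode == "horizontal":    # copy the left column across
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":            # rounded mean of the eight neighbors
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("unsupported mode: " + mode)


def sad(a, b):
    """Sum of absolute differences between two blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def best_mode(block, above, left):
    """Pick the mode whose prediction is closest to the block (assumed SAD cost)."""
    modes = ("vertical", "horizontal", "dc")
    return min(modes, key=lambda m: sad(block, predict_4x4(above, left, m)))


block = [[10, 20, 30, 40] for _ in range(4)]  # vertical stripes
print(best_mode(block, above=[10, 20, 30, 40], left=[10, 10, 10, 10]))
```

Only the difference between the block and the chosen prediction is then transformed and coded.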

In inter prediction, motion estimation and motion compensation techniques are used to predict

the current macroblock [20]. This process involves selecting a prediction region, generating a

prediction block and subtracting this from the original block of samples to form a residual that

is transformed, quantized and then encoded. The macroblock can be partitioned into different sizes as shown

in Figure 3 (c).

Fig.3(c) Inter prediction macroblock sizes [20]

Macroblock partitions: 1 partition of 16 x 16 luma samples, 2 partitions of 16 x 8 luma samples,

2 partitions of 8 x 16 luma samples, or 4 sub-macroblocks of 8 x 8 luma samples, each with

associated chroma samples.

Sub-macroblock partitions include 1 partition of 8 x 8 luma samples and 2 partitions of 8 x 4 luma

samples, each with associated chroma samples.
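
The inter prediction process described above (select a prediction region, form the prediction block, subtract it to get a residual) can be sketched with a full-search block matching routine. The SAD cost, the search range and the function names are assumptions for illustration; practical encoders use faster search strategies and sub-pixel refinement.

```python
def block_at(frame, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(a, b):
    """Sum of absolute differences between two blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def full_search(cur, ref, top, left, size, search_range):
    """Return the motion vector (dx, dy) and residual for one block."""
    target = block_at(cur, top, left, size)
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate falls outside the reference frame
            cost = sad(target, block_at(ref, y, x, size))
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    _, dx, dy = best
    pred = block_at(ref, top + dy, left + dx, size)
    residual = [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(target, pred)]
    return (dx, dy), residual


# Reference frame and a current frame shifted one pixel to the right
ref = [[10 * y + x * x for x in range(8)] for y in range(8)]
cur = [[row[x - 1] if x > 0 else 0 for x in range(8)] for row in ref]
print(full_search(cur, ref, top=2, left=2, size=4, search_range=2)[0])
```

When the match is exact, as in this synthetic shift, the residual is all zeros and only the motion vector needs to be coded.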

Deblocking Filter

A filter is applied to every decoded macroblock as shown in Figure 3(d) to reduce the blocking

distortion [20]. The deblocking filter is applied after the inverse transform in the encoder before

reconstructing and storing the macroblock for future predictions and in the decoder before

reconstructing and displaying the macroblock. The filter smooths the block edges, improving

the appearance of the decoded frames.

Fig. 3 (d) Boundaries in a macroblock to be filtered (luma boundaries shown with

solid lines and chroma boundaries shown with dotted lines) [2]

DIRAC PRO

Dirac is a video codec originally developed by the BBC [3]. The main aim of the Dirac video standard is

to provide high-quality video compression for web streaming and HDTV applications. The BBC used

Dirac to transmit HDTV pictures of the Beijing Olympics in 2008 [3][4].

Dirac Pro is a member of the Dirac family of compression tools, optimized mainly for video

production and archiving applications, with a focus on high quality and low latency. Dirac Pro

is intended for high quality applications with lower compression ratios [4][5].

Dirac Pro supports the following technical aspects [5]:

Intra-frame coding only

10 bit 4:2:2

No Subsampling

Lossless or visually lossless compression

Low latency on encode/decode

Robust over multiple passes

Support for multiple HD image formats and frame rates

Low complexity for decoding

The main difference between Dirac and Dirac Pro lies in the final stage of

compression, the arithmetic coding. Arithmetic coding is processing intensive and introduces

delay. These features are undesirable in high end production work and hence Dirac Pro omits

arithmetic coding.

Fig.4 (a) Dirac encoder [3]

Fig.4 (b) Dirac decoder [3]

The encoder and decoder block diagrams are shown in Figure 4 (a) and (b) respectively.

Architecture

Dirac can compress any size of picture from low resolution QCIF to HDTV. Dirac employs wavelet

compression instead of the discrete cosine transform [6] used in other codecs. Another application

of the wavelet transform is the JPEG 2000 compression standard for still images [7].
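
To illustrate what a wavelet decomposition does to a picture, the following sketch applies one level of a 2D Haar transform, splitting an image into four subbands. Haar is used here only because it is the simplest wavelet; it is an assumption for illustration and is not claimed to be the filter Dirac applies. Even image dimensions are assumed.

```python
def haar_1d(signal):
    """One Haar step: pairwise averages (low band) and differences (high band)."""
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low, high


def filter_columns(rows):
    """Apply haar_1d down every column and return (low, high) as lists of rows."""
    columns = [list(col) for col in zip(*rows)]
    lows, highs = zip(*(haar_1d(col) for col in columns))
    return [list(r) for r in zip(*lows)], [list(r) for r in zip(*highs)]


def haar_2d(image):
    """One-level 2D Haar transform: rows first, then columns."""
    row_low, row_high = [], []
    for row in image:
        lo, hi = haar_1d(row)
        row_low.append(lo)
        row_high.append(hi)
    ll, lh = filter_columns(row_low)    # horizontally low: vertical low / high
    hl, hh = filter_columns(row_high)   # horizontally high: vertical low / high
    return ll, lh, hl, hh


ll, lh, hl, hh = haar_2d([[1, 3], [5, 7]])
print(ll, lh, hl, hh)
```

The LL subband is a half-resolution approximation of the picture, and applying the same step to it again yields the multi-level decomposition typical of wavelet codecs.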

Motion Estimation

In Dirac, frames have two essential properties. Firstly, they are either predicted from other

frames (inter) or not (intra). Secondly, they are either used to predict other frames or not. All combinations of these

properties are possible, and any inter frame can be predicted from up to two reference frames.

In Dirac Pro, however, only intra frame coding is used. Dirac Pro provides spatial and quality

scalabilities, useful to save bandwidth during the transmission of a single bit stream to receivers

with different image resolution and bandwidth requirements [6]. Dirac Pro has been adopted by

SMPTE as VC-2 [29].

AVS CHINA

AVS stands for Audio Video Standard, a compression standard for digital audio and

video developed in China [8]. AVS China was developed to replace the widely used H.264/AVC

standard. AVS China finds applications in high resolution broadcasting, video over wireless

communication media etc.

The AVS China standard is divided into 10 parts, each covering one sub-field of the

overall architecture, as shown in Figure 5 [16].

Fig.5 AVS parts [16]

AVS Part 1 covers the system for broadcast, AVS Part 2 covers video, AVS Part 3

covers audio and AVS Part 6 includes content creation. The AVS Part 2 encoding and

decoding structures [8] are shown in Figure 6 (a) and (b).

Fig.6 (a) AVS China part 2 encoding structure [8].

Fig.6 (b) AVS China part 2 decoding structure [8].

System architecture

AVS Part 2 is a hybrid coding scheme based on spatial and temporal prediction, integer transform and

entropy coding. The system architecture is illustrated in Figure 6 [8].

Intra Prediction

Spatial prediction as shown in Figure 7 is used in intra coding in AVS Part 2 to exploit the spatial

correlations within a picture. Intra prediction is based on 8 x 8 blocks. The prediction

is derived from the neighboring pixels in the left and top blocks. There are five luminance intra

prediction modes, and four chrominance intra prediction modes. The reconstructed pixels of

neighboring blocks before the deblocking filter are used as reference pixels for the current block.

Fig.7 Five different modes for 8 x 8 block intra luminance prediction [8].

Deblocking filter

The deblocking filter is used to reduce/eliminate the block artifacts and enhance both subjective

and objective performance. AVS Part 2 deblocking filter first calculates the boundary strength

(BS) of each block boundary, and then applies different filters for different BS.

AVS Part 2 utilizes an efficient context based 2D-VLC entropy coder for coding 8 x 8 block-size

transform coefficients. 2D-VLC means that a (run, level) pair is regarded as one event and coded

jointly.
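
The idea of treating a run and a level as one event can be sketched as below. The input is assumed to be a list of quantized coefficients that has already been scanned into one dimension; each nonzero coefficient (the level) is paired with the number of zeros preceding it (the run). The function name and the "EOB" marker string are illustrative, and the actual AVS scan order and VLC tables are not reproduced here.

```python
def run_level_pairs(scanned):
    """Convert scanned coefficients into (run, level) events plus an end marker."""
    pairs = []
    run = 0
    for coeff in scanned:
        if coeff == 0:
            run += 1                    # extend the current run of zeros
        else:
            pairs.append((run, coeff))  # one joint (run, level) event
            run = 0
    pairs.append("EOB")                 # trailing zeros are implied by end of block
    return pairs


print(run_level_pairs([5, 0, 0, -2, 1, 0, 0, 0]))
```

Each (run, level) event would then be mapped to a single variable length codeword, so frequent combinations such as short runs with small levels receive short codes.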

CONCLUSION

This project aims at a thorough study, implementation and exhaustive comparison of video

coding standards like H.264, Dirac Pro and AVS Part 2. Analysis will be carried out and different

performance metrics like MSE, PSNR etc. will be evaluated for different high definition video

sequences. Based on the values of these performance metrics, conclusions will be drawn as to

which video coding standard is best suited for high definition video compression.

References:

[1] T. Wiegand, G. Sullivan, G. Bjontegaard and A. Luthra, “Overview of the H.264/AVC video

coding standard,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 13,

pp.560-576, July 2003.

[2] S.K.Kwon, A. Tamhankar and K.R.Rao, “Overview of H.264/MPEG-4 Part 10” J.VCIR, Vol.

17, pp. 186-216, April 2006, Special Issue on “Emerging H.264/AVC video coding

standard”.

[3] “The Dirac web page”: http://www.bbc.co.uk/rd/projects/dirac/intro.shtml.

[4] “Dirac Codec Wiki Page” at http://en.wikipedia.org/wiki/Dirac(codec).

[5] “Dirac Pro web page” at http://www.bbc.co.uk/rd/projects/dirac/diracpro.shtml.

[6] “Video on the web” at http://etill.net/projects/dirac_theora_evaluation/.

[7] T. Borer, and T. Davies, “Dirac video compression using open technology”, BBC EBU

Technical Review, July 2005.

[8] L. Yu et al, “An overview of AVS-Video: tools, performance and complexity”, Visual

Communications and Image Processing, Proc. of SPIE, vol. 5960, pp.679-690, July 2006.

[9] AVS Video Expert Group, “Information technology – Advanced coding of audio and video

– Part 2: Video (AVS1-P2 JQP FCD 1.0)”, Audio Video Coding Standard Group of China

(AVS), Doc. AVS-N1538, Sep. 2008.

[10] Special issue on “AVS and its applications” Signal processing: Image Communication,

vol.24, pp. 245-344, April 2009.

[11] JVT, “Draft ITU-T recommendation and final draft international standard of joint video

specification (ITU-T rec. H.264– ISO/IEC 14496-10 AVC),” March 2003, JVT-G050

available on http://ip.hhi.de/imagecom_G1/assets/pdfs/JVT-G050.pdf .

[12] K. Onthriar, K. K. Loo and Z. Xue, “Performance comparison of emerging Dirac video

codec with H.264/AVC”, IEEE International Conference on Digital Telecommunications,

ICDT '06, p. 22, 29-31 Aug. 2006.

[13] Z. Wang and A.C. Bovik, “A universal image quality index”, IEEE Signal Processing

Letters, Vol.9, pp. 81-84, March 2002.

[14] A. Ravi, “Performance analysis and comparison of the Dirac video codec with

H.264/AVC”, M.S. Thesis, Electrical Engineering Department, University of Texas at

Arlington, August 2009.

[15] “H.264/MPEG-4 AVC web page” http://en.wikipedia.org/wiki/H.264/MPEG-4_AVC.

[16] H.264 AVC JM Software : http://iphome.hhi.de/suehring/tml/.

[17] H.264 decoder: http://www.adalta.it/Pages/407/266881_266881.jpg.

[18] W. Gao et al, “AVS - The Chinese next-generation video coding standard” NAB, Las

Vegas, 2004.

[19] X. Wang et al, “Performance comparison of AVS and H.264/AVC video coding standards” J.

Computer Science and Technology, vol.21, No.3, pp.310-314, May 2006.

[20] Iain Richardson, “ The H.264 advanced video coding standard”, Second Edition, Wiley, 2010

[21] AVS China part 2 video software, password protected : ftp://124.207.250.92/.

[22] S. Swaminathan and K.R. Rao, “Multiplexing and demultiplexing of AVS CHINA video with

AAC audio,” TELSIKS 2011, Nis, Serbia, 5-8 Oct. 2011.

[23] Dirac Pro Software : http://diracvideo.org/download/

[24] M. Tun, K.K. Loo and J. Cosmas, “Semi-hierarchical motion estimation for the Dirac video

codec,” 2008 IEEE International Symposium on Broadband Multimedia Systems and

Broadcasting, pp.1–6, March 31-April 2, 2008.

[25] T. Davies, “The Dirac Algorithm”: http://dirac.sourceforge.net/documentation/algorithm/.

[26] Dirac video codec – A programmer's guide:

http://dirac.sourceforge.net/documentation/code/programmers_guide/toc.htm

[26] B. Tang et al, “AVS encoder performance and complexity analysis based on mobile video

communication”, WRI International conference on Communications and Mobile Computing,

CMC '09, vol. 3, pp. 102-107, 6-8 Jan. 2009.

[27] A. Ravi and K.R. Rao, “Performance analysis and comparison of the Dirac video codec with

H.264 / MPEG-4 Part 10 AVC”, IJWMIP, vol.9, No. 4, pp.635-654, 2011.

[28] T. Wiegand and G.J. Sullivan, “The picturephone is here. Really,” IEEE Spectrum, vol. 48, pp.

50-54, Sept. 2011.

[29] Adoption of Dirac Pro by SMPTE as VC-2, “SMPTE 2042-1-2009”, Sept. 2009.
