EE5359 Spring 2012
EE5359 Project Proposal on
VIDEO COMPRESSION STANDARDS FOR HIGH DEFINITION VIDEO: A
COMPARATIVE STUDY OF H.264, DIRAC PRO AND AVS PART 2
By Sudeep Gangavati
ID: 1000717165
LIST OF ACRONYMS
AU: Access Unit
AVS: Audio Video coding Standard
AVS-M: Audio Video coding Standard for mobile
B-Frame: Bidirectionally Interpolated Frame
BBC: British Broadcasting Corporation
CAVLC: Context Adaptive Variable Length Coding
CBP: Coded Block Pattern
CIF: Common Intermediate Format
DIP: Direct Intra Prediction
DPB: Decoded Picture Buffer
EOB: End of Block
HD: High Definition
ICT: Integer Cosine Transform
IDR: Instantaneous Decoding Refresh
I-Frame: Intra Frame
ITU-T: International Telecommunication Union, Telecommunication Standardization Sector
JPEG: Joint Photographic Experts Group
MSE: Mean Square Error
PSNR: Peak Signal to Noise Ratio
QCIF: Quarter Common Intermediate Format
SMPTE: Society of Motion Picture and Television Engineers
SSIM: Structural Similarity Index
Video Compression Standards for High Resolution Video
Objective:
The objective of this project is to study, implement and compare the video coding standards
H.264/AVC [1], Dirac Pro [3] and AVS China P2 (AVS video) [8]. Performance metrics such as MSE,
PSNR, bitrate, SSIM [13] and video quality will be evaluated for high resolution videos at
various bitrates.
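Of the metrics listed, MSE and PSNR have simple closed forms: MSE is the mean of the squared sample differences between the original and the decoded frame, and PSNR = 10 log10(MAX^2 / MSE), where MAX = 2^b - 1 for b-bit samples. The sketch below assumes each frame (or a single plane such as Y) is given as a list of rows of integer samples; the function names are illustrative and not taken from any codec's reference software.

```python
import math

def mse(ref, test):
    """Mean square error between two equally sized 2-D sample arrays (lists of rows)."""
    if len(ref) != len(test) or any(len(r) != len(t) for r, t in zip(ref, test)):
        raise ValueError("frames must have identical dimensions")
    total, count = 0, 0
    for row_ref, row_test in zip(ref, test):
        for a, b in zip(row_ref, row_test):
            total += (a - b) ** 2
            count += 1
    return total / count

def psnr(ref, test, bit_depth=8):
    """PSNR in dB; the peak value is 2**bit_depth - 1 (255 for 8-bit video)."""
    err = mse(ref, test)
    if err == 0:
        return math.inf  # identical frames: PSNR is unbounded
    peak = (1 << bit_depth) - 1
    return 10 * math.log10(peak * peak / err)
```

For a sequence, PSNR is commonly reported per frame and then averaged; SSIM [13] needs local means, variances and covariances and is not sketched here.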
Motivation:
With the ever-increasing demand for high definition video, several video coding standards have
been developed to address the needs of HD video coding. This project implements and evaluates
video coding standards that have been extensively used for HD video broadcasting, storage and
distribution. The standards to be evaluated are H.264/AVC (Main Profile) [1], AVS China P2 (Base
Profile) [8] and Dirac Pro [5]. Since Dirac Pro uses intra-frame coding only, the analysis will be
restricted to intra-frame coding for H.264/AVC [1] and AVS China P2 [8].
Introduction:
H.264/AVC
H.264/AVC is the latest advanced video coding standard, developed by the ITU-T Video Coding
Experts Group together with the ISO/IEC Moving Picture Experts Group [1]. It is the most widely
used video coding standard [15] [28] for streaming video, mobile/handheld applications, HDTV
broadcasting, etc. The H.264 standard supports three sampling patterns for the luminance
component (Y), the red-difference chroma component (Cr) and the blue-difference chroma
component (Cb) [20]. In 4:4:4 sampling the three components (Y, Cr, Cb) have the same resolution,
so a sample of each component exists at every pixel position, as shown in Figure 1(a). In 4:2:2
sampling the chroma components have half the horizontal resolution of luma, and in 4:2:0
sampling they have half the horizontal and half the vertical resolution.
Fig. 1(a) 4:4:4, 4:2:2 and 4:2:0 sampling patterns [20]
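The sampling pattern determines the size of each chroma plane and hence the raw data rate the encoder must compress. A small sketch, assuming planar storage, even frame dimensions and (a labeled assumption) two bytes per sample for bit depths above 8; the helper names are hypothetical:

```python
# Chroma subsampling factors (horizontal, vertical) relative to luma.
SUBSAMPLING = {
    "4:4:4": (1, 1),  # Cb and Cr at full resolution
    "4:2:2": (2, 1),  # chroma halved horizontally only
    "4:2:0": (2, 2),  # chroma halved horizontally and vertically
}

def plane_sizes(width, height, pattern):
    """Return the (width, height) of the Y, Cb and Cr planes; assumes even dimensions."""
    sx, sy = SUBSAMPLING[pattern]
    chroma = (width // sx, height // sy)
    return (width, height), chroma, chroma

def raw_frame_bytes(width, height, pattern, bit_depth=8):
    """Uncompressed frame size; assumes samples wider than 8 bits occupy 2 bytes."""
    bytes_per_sample = 1 if bit_depth <= 8 else 2
    (lw, lh), (cw, ch), _ = plane_sizes(width, height, pattern)
    return (lw * lh + 2 * cw * ch) * bytes_per_sample
```

For example, an 8-bit 1920 x 1080 frame needs 3,110,400 bytes in 4:2:0 but 4,147,200 bytes in 4:2:2.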
An H.264 encoder converts the raw video into a compressed bitstream, and the decoder converts
the compressed bitstream back into uncompressed video.
The H.264 encoder block diagram is shown in Figure 1(b).
Fig.1 (b) Basic H.264 encoder structure [1]
The encoder performs prediction, transform, quantization and entropy coding to produce the
compressed video. The decoder, shown in Figure 2, performs the inverse operations to obtain
the uncompressed video.
Fig.2 H.264 decoder structure [2] [17]
H.264 Profiles
The H.264 standard defines a set of profiles and levels that set points of conformance for
different classes of applications and services. Each profile specifies the encoding tools that
decoders conforming to that profile must support. There are mainly seven profiles [2]:
Baseline, Main, Extended, High, High 10, High 4:2:2 and High 4:4:4.
The Main profile is designed for digital storage media and television broadcasting. The Extended
profile is designed for multimedia services over the Internet. The Baseline profile is aimed at
real-time applications such as video conferencing. The High profile mainly targets applications
such as content distribution, studio editing and high resolution video. The profiles are shown in Figure 3.
Fig.3 H.264 profiles and levels [2]
Video Coding Algorithm
The encoder block diagram is shown in Figure 1(b) [1]. The encoder selects between intra and
inter coding for the blocks of each picture. Intra coding uses several spatial prediction modes,
shown in Figure 3(a) and (b), to reduce spatial redundancy within a single picture. Inter coding
predicts each block of sample values from previously coded pictures, using motion vectors for
block-based inter prediction to reduce temporal redundancy between pictures. The prediction
residual is transformed to remove spatial correlation in the block before it is quantized. Finally,
the intra prediction modes or motion vectors are combined with the quantized transform
coefficients and entropy coded using either context-adaptive variable length coding (CAVLC) or
context-adaptive binary arithmetic coding (CABAC). The deblocking filter is used to reduce
blocking artifacts at the block boundaries.
Prediction in H.264
The H.264 video coding standard employs two prediction techniques, intra prediction and inter
prediction, to predict the current macroblock [20]. In intra prediction, the prediction for the
current macroblock is created from previously coded samples in the same frame. For the luma
component of an intra macroblock, there are three choices of prediction block size: 16 x 16,
8 x 8 or 4 x 4 [1][20]. A single prediction block is generated using one of a number of possible
prediction modes. For 4 x 4 blocks there are nine modes; 8 x 8 blocks, available only in the High
profiles, also have nine modes; 16 x 16 blocks have four modes [20].
During prediction, one mode is selected and used to predict the sample values. The 4 x 4 and
16 x 16 luma prediction modes are shown in Figure 3 (a) and Figure 3 (b), respectively.
Fig. 3(a) Intra prediction modes for 4 x 4 block size [20].
Fig. 3 (b) Intra prediction modes for 16 x 16 block size [20].
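Three of the nine 4 x 4 modes are easy to sketch: vertical (mode 0), horizontal (mode 1) and DC (mode 2). The sketch assumes all eight neighbours are available, A to D in the row above and I to L in the column to the left, and chooses the mode by sum of absolute differences (SAD). That choice rule is a simplification; practical encoders often use a rate-distortion cost instead.

```python
def predict_4x4(mode, top, left):
    """top = [A, B, C, D] (row above), left = [I, J, K, L] (column to the left)."""
    if mode == 0:  # vertical: copy the row above downward
        return [list(top) for _ in range(4)]
    if mode == 1:  # horizontal: copy the left column across
        return [[left[r]] * 4 for r in range(4)]
    if mode == 2:  # DC: rounded mean of the eight neighbours
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0-2 are sketched here")

def sad(a, b):
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def best_mode(block, top, left):
    """Return the mode with the lowest SAD, plus the SAD of every mode tried."""
    costs = {m: sad(block, predict_4x4(m, top, left)) for m in (0, 1, 2)}
    return min(costs, key=costs.get), costs
```

Only the chosen mode and the residual (block minus prediction) need to be coded; the decoder rebuilds the same prediction from its own reconstructed neighbours.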
In inter prediction, motion estimation and motion compensation techniques are used to predict
the current macroblock [20]. This process involves selecting a prediction region, generating a
prediction block and subtracting it from the original block of samples to form a residual that
is transformed, quantized and then encoded. The macroblock can be partitioned into blocks of
different sizes, as shown in Figure 3 (c).
Fig.3(c) Inter prediction macroblock sizes [20]
Macroblock partitions: one 16 x 16, two 16 x 8, two 8 x 16 or four 8 x 8 partitions of luma
samples and associated chroma samples.
Sub-macroblock partitions (of each 8 x 8 sub-macroblock): one 8 x 8, two 8 x 4, two 4 x 8 or
four 4 x 4 partitions of luma samples and associated chroma samples.
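Motion estimation searches a reference frame for the block that best matches the current block. A minimal integer-pel full-search sketch using SAD as the matching cost; the block size, search range and function names are illustrative assumptions, and H.264 additionally supports quarter-sample motion vectors and the partition sizes listed above.

```python
def sad_block(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in cur and the block displaced by (dx, dy) in ref."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total

def full_search(cur, ref, bx, by, n=4, search_range=2):
    """Integer-pel full search; returns ((dx, dy), sad) of the best candidate."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # skip candidate blocks that fall outside the reference frame
            if not (0 <= by + dy and by + dy + n <= h and 0 <= bx + dx and bx + dx + n <= w):
                continue
            cost = sad_block(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best
```

Full search is exhaustive and therefore expensive; fast search patterns trade a small loss in match quality for far fewer SAD evaluations.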
Deblocking Filter
A filter is applied to every decoded macroblock as shown in Figure 3(d) to reduce the blocking
distortion [20]. The deblocking filter is applied after the inverse transform in the encoder before
reconstructing and storing the macroblock for future predictions and in the decoder before
reconstructing and displaying the macroblock. The filter smooths the block edges, improving
the appearance of the decoded frames.
Fig. 3 (d) Boundaries in a macroblock to be filtered (luma boundaries shown with
solid lines and chroma boundaries shown with dotted lines) [2]
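The filtering described above can be illustrated for a single line of luma samples across one block edge. This sketch follows the H.264 normal-strength (bS < 4) filter for p0 and q0 only. As a simplifying assumption, the thresholds alpha and beta and the clipping bound tc are passed in directly; the standard derives them from QP-indexed tables and the boundary strength, and can also adjust p1 and q1.

```python
def clip3(lo, hi, v):
    return max(lo, min(hi, v))

def filter_edge_line(p, q, alpha, beta, tc, bit_depth=8):
    """p = [p0, p1, ...] and q = [q0, q1, ...], each ordered moving away from the edge.
    Returns the filtered (p0, q0), or the originals if the edge looks like real image content."""
    p0, p1 = p[0], p[1]
    q0, q1 = q[0], q[1]
    # filter only when the step across the edge is small enough to be a coding artifact
    if not (abs(p0 - q0) < alpha and abs(p1 - p0) < beta and abs(q1 - q0) < beta):
        return p0, q0
    delta = clip3(-tc, tc, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3)
    max_val = (1 << bit_depth) - 1
    return clip3(0, max_val, p0 + delta), clip3(0, max_val, q0 - delta)
```

The alpha/beta test is what keeps the filter from blurring genuine edges: a large step across the boundary is assumed to be picture content and is left untouched.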
DIRAC PRO
Dirac is a video codec originally developed by the BBC [3]. The main aim of Dirac is to provide
high-quality video compression for web streaming and HDTV applications. The BBC used Dirac to
transmit HDTV pictures of the 2008 Beijing Olympics [3][4].
Dirac Pro is a member of the Dirac family of compression tools, optimized mainly for video
production and archiving, where the focus is on high quality and low latency. Dirac Pro is
intended for high quality applications with lower compression ratios [4][5].
Dirac Pro supports the following technical aspects [5]:
Intra-frame coding only
10 bit 4:2:2
No Subsampling
Lossless or visually lossless compression
Low latency on encode/decode
Robust over multiple passes
Support for multiple HD image formats and frame rates
Low complexity for decoding
The main difference between Dirac and Dirac Pro lies in the final stage of compression:
arithmetic coding. Arithmetic coding is computationally intensive and introduces delay. These
properties are undesirable in high-end production work, and hence Dirac Pro omits
arithmetic coding.
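Without arithmetic coding, Dirac Pro (VC-2) relies on simple variable-length codes of the exp-Golomb family. The sketch below shows the plain unsigned exp-Golomb code, which is the same form H.264 uses for many syntax elements (ue(v)). VC-2 interleaves the bits of these codes in a different order, which is not reproduced here, so this illustrates the idea rather than implementing a VC-2 bitstream writer.

```python
def exp_golomb_encode(n):
    """Unsigned exp-Golomb codeword for n >= 0: (len - 1) zeros, then binary(n + 1)."""
    if n < 0:
        raise ValueError("unsigned exp-Golomb codes non-negative integers only")
    bits = bin(n + 1)[2:]
    return "0" * (len(bits) - 1) + bits

def exp_golomb_decode(bitstring, pos=0):
    """Decode one codeword starting at pos; returns (value, position after the codeword)."""
    zeros = 0
    while bitstring[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bitstring[pos + zeros:end], 2) - 1, end
```

Each codeword is self-delimiting (the count of leading zeros gives its length), so a decoder needs only bit counting and no probability modelling, which is why such codes suit low-latency production use.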
Fig.4 (a) Dirac encoder [3]
Fig.4 (b) Dirac decoder [3]
The encoder and decoder block diagrams are shown in Figure 4 (a) and (b), respectively.
Architecture
Dirac can compress pictures of any size, from low-resolution QCIF to HDTV. Dirac employs the
wavelet transform instead of the discrete cosine transform (DCT) [6] used in other codecs. The
wavelet transform is also used in the JPEG 2000 compression standard for still images [7].
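A wavelet transform splits a signal into low-pass and high-pass subbands. Dirac offers several wavelet filters; one of them is the LeGall (5,3) filter, sketched here in the reversible integer lifting form used by JPEG 2000, with symmetric extension at the edges. Dirac's own edge handling and scaling may differ, so treat this as an illustration of one decomposition level. A 2-D transform applies it to rows and then to columns.

```python
def lift_53_forward(x):
    """One level of the reversible LeGall 5/3 lifting transform on an even-length list.
    Returns (s, d): the low-pass and high-pass subbands."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("input length must be even and at least 2")
    k = len(x) // 2
    even, odd = x[0::2], x[1::2]
    # predict step: detail = odd sample minus the mean of its even neighbours (mirrored at the end)
    d = [odd[i] - ((even[i] + even[min(i + 1, k - 1)]) >> 1) for i in range(k)]
    # update step: smooth the even samples with neighbouring details (mirrored at the start)
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(k)]
    return s, d

def lift_53_inverse(s, d):
    """Undo the two lifting steps in reverse order; reconstruction is exact for integers."""
    k = len(s)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(k)]
    odd = [d[i] + ((even[i] + even[min(i + 1, k - 1)]) >> 1) for i in range(k)]
    x = []
    for e, o in zip(even, odd):
        x.extend((e, o))
    return x
```

On a smooth ramp the high-pass subband is almost entirely zero, which is where the compression comes from; because each lifting step is inverted exactly, the transform is lossless before quantization.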
Motion Estimation
In Dirac, frames have two essential properties. First, they may or may not be predicted from
other frames (inter or intra coding). Second, they may or may not be used as references to
predict other frames. All combinations of these properties are possible, and any inter frame
can be predicted from up to two reference frames. Dirac Pro, however, uses only intra-frame
coding. Dirac Pro provides spatial and quality scalability, which saves bandwidth when a single
bit stream is transmitted to receivers with different image resolution and bandwidth
requirements [6]. Dirac Pro has been adopted by SMPTE as VC-2 [29].
AVS CHINA
AVS is an acronym for Audio Video coding Standard, a compression codec for digital audio and
video developed in China [8]. AVS China was developed as an alternative to the widely used
H.264/AVC standard. It finds applications in high resolution broadcasting, video over wireless
communication channels, etc.
The AVS China standard is divided into 10 parts, each covering a separate sub-field, as shown
in Figure 5 [16].
Fig.5 AVS parts [16]
AVS Part 1 covers the system layer for broadcasting, AVS Part 2 covers video, AVS Part 3 covers
audio and AVS Part 6 includes content creation. The AVS Part 2 encoding and decoding
structures [8] are shown in Figure 6 (a) and (b).
Fig.6 (a) AVS China part 2 encoding structure [8].
Fig.6 (b) AVS China part 2 decoding structure [8].
System architecture
AVS Part 2 uses hybrid coding based on spatial and temporal prediction, integer transform and
entropy coding. The system architecture is illustrated in Figure 6 [8].
Intra Prediction
Spatial prediction, as shown in Figure 7, is used in intra coding in AVS Part 2 to exploit spatial
correlation within a picture. Intra prediction operates on 8 x 8 blocks, and the prediction is
derived from neighboring pixels in the left and top blocks. There are five luminance intra
prediction modes and four chrominance intra prediction modes. The reconstructed pixels of
neighboring blocks, taken before the deblocking filter, are used as reference pixels for the
current block.
Fig.7 Five different modes for 8 x 8 block intra luminance prediction [8].
Deblocking filter
The deblocking filter is used to reduce blocking artifacts and improve both subjective and
objective performance. The AVS Part 2 deblocking filter first calculates the boundary strength
(BS) of each block boundary and then applies a different filter for each BS value.
VLC
AVS Part 2 uses an efficient context-based 2D-VLC entropy coder for 8 x 8 transform
coefficients. In 2D-VLC, each (Run, Level) pair is regarded as one event and coded jointly, where
Run is the number of zero coefficients preceding a nonzero coefficient and Level is its value.
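The (Run, Level) events can be produced by scanning the quantized 8 x 8 coefficients into one dimension. This sketch assumes a conventional (JPEG-style) zigzag scan and stops at run/level extraction; the context-based VLC tables that AVS uses to map each pair to a codeword are not reproduced.

```python
def zigzag_order(n=8):
    """Conventional zigzag scan of an n x n block as a list of (row, col) positions."""
    order = []
    for s in range(2 * n - 1):  # s indexes the anti-diagonal row + col == s
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # even diagonals run bottom-left to top-right, odd ones top-right to bottom-left
        if s % 2 == 0:
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order

def run_level_pairs(block):
    """Return the (run, level) events of a square block in zigzag order.
    Zeros after the last nonzero coefficient are not emitted; an EOB would signal them."""
    pairs, run = [], 0
    for r, c in zigzag_order(len(block)):
        level = block[r][c]
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs
```

Because quantized high-frequency coefficients are mostly zero, a typical block collapses to a handful of pairs followed by EOB.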
CONCLUSION
This project aims at a thorough study, implementation and comparison of the H.264, Dirac Pro
and AVS Part 2 video coding standards. Performance metrics such as MSE, PSNR, SSIM and bitrate
will be evaluated for different high definition video sequences. Based on these metrics,
conclusions will be drawn as to which video coding standard is best suited for high definition
video compression.
References:
[1] T. Wiegand, G. Sullivan, G. Bjontegaard and A. Luthra, “Overview of the H.264/AVC video
coding standard,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 13,
pp.560-576, July 2003.
[2] S.K. Kwon, A. Tamhankar and K.R. Rao, “Overview of H.264/MPEG-4 Part 10”, J. VCIR, vol.
17, pp. 186-216, April 2006, Special Issue on “Emerging H.264/AVC video coding
standard”.
[3] “The Dirac web page”: http://www.bbc.co.uk/rd/projects/dirac/intro.shtml.
[4] “Dirac Codec Wiki Page” at http://en.wikipedia.org/wiki/Dirac(codec).
[5] “Dirac Pro web page” at http://www.bbc.co.uk/rd/projects/dirac/diracpro.shtml.
[6] “Video on the web” at http://etill.net/projects/dirac_theora_evaluation/.
[7] T. Borer and T. Davies, “Dirac video compression using open technology”, BBC EBU
Technical Review, July 2005.
[8] L. Yu et al, “An overview of AVS-Video: tools, performance and complexity”, Visual
Communications and Image Processing, Proc. of SPIE, vol. 5960, pp.679-690, July 2006.
[9] AVS Video Expert Group, “Information technology – Advanced coding of audio and video
– Part 2: Video (AVS1-P2 JQP FCD 1.0)”, Audio Video Coding Standard Group of China
(AVS), Doc. AVS-N1538, Sep. 2008.
[10] Special issue on “AVS and its applications” Signal processing: Image Communication,
vol.24, pp. 245-344, April 2009.
[11] JVT, “Draft ITU-T recommendation and final draft international standard of joint video
specification (ITU-T rec. H.264 – ISO/IEC 14496-10 AVC),” March 2003, JVT-G050,
available at http://ip.hhi.de/imagecom_G1/assets/pdfs/JVT-G050.pdf.
[12] K. Onthriar, K. K. Loo and Z. Xue, “Performance comparison of emerging Dirac video
codec with H.264/AVC”, IEEE International Conference on Digital Telecommunications
(ICDT '06), p. 22, 29-31 Aug. 2006.
[13] Z. Wang and A.C. Bovik, “A universal image quality index”, IEEE Signal Processing
Letters, Vol.9, pp. 81-84, March 2002.
[14] A. Ravi, “Performance analysis and comparison of the Dirac video codec with
H.264/AVC”, M.S. Thesis, Electrical Engineering Department, University of Texas at
Arlington, August 2009.
[15] “H.264/MPEG-4 AVC web page” http://en.wikipedia.org/wiki/H.264/MPEG-4_AVC.
[16] H.264 AVC JM Software : http://iphome.hhi.de/suehring/tml/.
[17] H.264 decoder: http://www.adalta.it/Pages/407/266881_266881.jpg.
[18] W. Gao et al, “AVS - The Chinese next-generation video coding standard” NAB, Las
Vegas, 2004.
[19] X. Wang et al, “Performance comparison of AVS and H.264/AVC video coding standards” J.
Computer Science and Technology, vol.21, No.3, pp.310-314, May 2006.
[20] Iain Richardson, “The H.264 advanced video coding standard”, Second Edition, Wiley, 2010.
[21] AVS China part 2 video software, password protected : ftp://124.207.250.92/.
[22] S. Swaminathan and K.R. Rao, “Multiplexing and demultiplexing of AVS CHINA video with
AAC audio,” TELSIKS 2011, Nis, Serbia, 5-8 Oct. 2011.
[23] Dirac Pro Software : http://diracvideo.org/download/
[24] M. Tun, K.K. Loo and J. Cosmas, “Semi-hierarchical motion estimation for the Dirac video
codec,” 2008 IEEE International Symposium on Broadband Multimedia Systems and
Broadcasting, pp.1–6, March 31-April 2, 2008.
[25] T. Davies, “The Dirac Algorithm”: http://dirac.sourceforge.net/documentation/algorithm/,
2008.
[26] Dirac video codec – A programmer's guide:
http://dirac.sourceforge.net/documentation/code/programmers_guide/toc.htm
[26] B. Tang et al, “AVS encoder performance and complexity analysis based on mobile video
communication”, WRI International conference on Communications and Mobile Computing,
CMC '09, vol. 3, pp. 102-107, 6-8 Jan. 2009.
[27] A. Ravi and K.R. Rao, “Performance analysis and comparison of the Dirac video codec with
H.264 / MPEG-4 Part 10 AVC”, IJWMIP, vol.9, No. 4, pp.635-654, 2011.
[28] T. Wiegand and G.J. Sullivan, “The picturephone is here. Really,” IEEE Spectrum, vol. 48, pp.
50-54, Sept. 2011.
[29] Adoption of Dirac Pro by SMPTE as VC-2: “SMPTE 2042-1-2009”, Sept. 2009.