THESIS PROPOSAL

STUDY AND IMPLEMENTATION OF UNIFIED LOOP FILTER IN H.264

Under guidance of

DR K R RAO

DEPARTMENT OF ELECTRICAL ENGINEERING

UNIVERSITY OF TEXAS AT ARLINGTON

By

PAVAN KUMAR REDDY GAJJALA

1000769393

ACRONYMS AND ABBREVIATIONS

AVC: Advanced Video Coding

CABAC: Context-based Adaptive Binary Arithmetic Coding

CAVLC: Context-based Adaptive Variable Length Coding

DLF: De-blocking Loop Filter

DPB: Decoded Picture Buffer

DVB: Digital Video Broadcasting

FMO: Flexible Macro block Ordering

GOP: Group of Pictures

HD: High Definition

ISO: International Organization for Standardization

ITU: International Telecommunication Union

JVT: Joint Video Team

LMSE: Least Mean Square Error

MC: Motion Compensation

MDCT: Modified Discrete Cosine Transform

ME: Motion Estimation

MPEG: Moving Picture Experts Group

PIT: Pre-scaled Integer Transform

PSNR: Peak Signal to Noise Ratio

RDO: Rate Distortion Optimization

SD: Standard Definition

SI: Switching I

SP: Switching P

SSIM: Structural Similarity Index Metric

VCEG: Video Coding Experts Group

ABSTRACT: This thesis is based on the study and implementation of a unified loop filter for
video coding in H.264 [1], which suppresses quantization noise optimally and improves the
objective and subjective quality of the reconstructed picture simultaneously. The unified loop
filter unifies a nonlinear enhancement filter and a linear restoration filter within the classical
optimization framework of least mean square error. Experimental results reported in [2] show
that the unified loop filter achieves superior objective coding gain and better visual quality
around edges and textures when compared with the de-blocking loop filter in the H.264/AVC
high profile.

MOTIVATION

The motivation for unifying de-blocking filtering and Wiener filtering into one filtering
framework is that reducing the number of sources that cause information loss can further
improve the capability of picture restoration. [2]

INTRODUCTION

1. OVERVIEW OF H.264

H.264, or MPEG-4 Part 10: AVC [10], is the next-generation video codec developed by the
Moving Picture Experts Group (MPEG) of ISO/IEC and the Video Coding Experts Group (VCEG)
of ITU-T, together known as the Joint Video Team (JVT). Like previous standards, the
H.264/MPEG-4 AVC standard is based on motion-compensated transform coding.

H.264 uses hybrid block-based video compression techniques: transformation to reduce spatial
correlation, quantization for bit-rate control, motion-compensated prediction to reduce temporal
correlation, and entropy coding to reduce statistical redundancy. The important changes in
H.264 occur in the details of each functional element. They include adaptive intra-picture
prediction, a new 4x4 integer transform, multiple reference pictures, variable block sizes,
quarter-pel precision for motion compensation, an in-loop de-blocking filter, and improved
entropy coding.

Fig 1.1 shows the H.264 encoder block diagram and Fig 1.2 shows the H.264 decoder

block diagram.

Fig. 1.1: H.264 encoder block diagram [10]

Fig. 1.2: H.264 decoder block diagram [10]

The functions of different blocks of the H.264 encoder are described below:

Transform: A 4x4 integer transform is used. The transform coefficients are explicitly specified
in AVC, which allows the transform to be perfectly invertible. In AVC, transform coding is
always applied to prediction residuals, even in the case of intra macro blocks. [10]
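As an illustration of the integer transform described above, the following minimal Python sketch applies the 4x4 forward core transform to a residual block. The function names (`matmul`, `transpose`, `forward_core_transform`) are hypothetical helpers, not part of any reference software, and the scaling that the standard folds into quantization is deliberately left out.

```python
# Forward 4x4 core transform matrix of H.264 (integer entries only).
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def matmul(a, b):
    """Multiply two small matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def forward_core_transform(block):
    """Compute Y = CF * X * CF^T for a 4x4 residual block X (no scaling)."""
    return matmul(matmul(CF, block), transpose(CF))
```

For a flat residual block of ones, all energy ends up in the DC coefficient `Y[0][0]`, and because only integer arithmetic is involved, encoder and decoder cannot drift apart through rounding.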

Quantization and scan: The standard specifies the mathematical formulae of the quantization

process. The scale factor for each element in each sub-block varies as a function of the

quantization parameter associated with the macro block that contains the sub block, and as a

function of the position of the element within the sub-block. The rate-control algorithm in the

encoder controls the value of quantization parameter. [10]
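The dependence of the quantizer on the quantization parameter can be sketched as below. This is a simplified illustration that covers only the well-known relation between QP and the quantizer step size (the step size doubles for every increase of 6 in QP); the position-dependent scale factors within a sub-block are not modeled, and the names `QSTEP_BASE` and `qstep` are hypothetical.

```python
# Step sizes for QP = 0..5; larger QPs repeat this pattern, doubled every 6 steps.
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Return the quantizer step size for a QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return QSTEP_BASE[qp % 6] * (2 ** (qp // 6))
```

A rate-control algorithm would raise QP (a coarser step, fewer bits) or lower it (a finer step, more bits) to meet its bit budget.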

CAVLC and CABAC entropy coders: Syntax elements of the compressed stream are coded with
variable-length Exp-Golomb codes. For the quantized transform coefficients, AVC includes two
different entropy coding methods, CAVLC and CABAC [17, 18]. The entropy coding method can
change as often as every picture. [10]
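The Exp-Golomb codes mentioned above have a simple regular structure, which the following Python sketch illustrates. The function names are hypothetical; the bit strings are returned as text purely for readability, whereas a real encoder would write them into a bitstream.

```python
def exp_golomb_ue(code_num):
    """Unsigned Exp-Golomb code: leading zeros, then (code_num + 1) in binary."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    bits = bin(code_num + 1)[2:]
    return "0" * (len(bits) - 1) + bits


def exp_golomb_se(value):
    """Signed Exp-Golomb code: map the value to a code number, then code it."""
    # Positive v maps to 2v - 1, zero and negative v map to -2v.
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return exp_golomb_ue(code_num)
```

Small values receive short codewords (`exp_golomb_ue(0)` is a single bit), which suits syntax elements whose small values are the most frequent.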

De-blocking filter: This filter operates on a macro block after motion compensation and
residual coding, or after intra-prediction and residual coding, depending on whether the
macro block is inter-coded or intra-coded. The result of the loop filtering operation is stored
as a reference picture. The loop filter is adaptive in response to several factors, such as the
quantization parameter of the current and neighboring macro blocks, the magnitude of the
motion vector and the macro block coding type (Fig 1.3). [10]

Fig 1.3 De-blocking filter in the H.264 encoder block diagram [10]

De-blocking filtering is applied to the vertical and horizontal edges of 4 × 4 blocks in a macro
block, excluding edges on slice boundaries, in the following order:

1. Filter 4 vertical boundaries of the luma component in order a, b, c, d in Figure 1.4

2. Filter 4 horizontal boundaries of the luma component in order e, f, g, h, in Figure 1.4

3. Filter 2 vertical boundaries of each chroma component (i, j)

4. Filter 2 horizontal boundaries of each chroma component (k, l)

Fig 1.4 Edge filtering order in a macro block [10]
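The filtering order listed above can be written down as a short enumeration. This sketch assumes 4:2:0 sampling (an 8 × 8 block per chroma component, so two edges per direction), follows the order of the list literally, and uses a hypothetical function name and tuple layout; the offsets are sample positions inside the macro block.

```python
def deblocking_edge_order():
    """Return (plane, edge direction, offset) tuples in filtering order."""
    order = []
    # Step 1: four vertical luma edges (a, b, c, d).
    for x in (0, 4, 8, 12):
        order.append(("luma", "vertical", x))
    # Step 2: four horizontal luma edges (e, f, g, h).
    for y in (0, 4, 8, 12):
        order.append(("luma", "horizontal", y))
    # Step 3: two vertical edges of each chroma component (i, j).
    for plane in ("cb", "cr"):
        for x in (0, 4):
            order.append((plane, "vertical", x))
    # Step 4: two horizontal edges of each chroma component (k, l).
    for plane in ("cb", "cr"):
        for y in (0, 4):
            order.append((plane, "horizontal", y))
    return order
```

Offset 0 denotes the macro block boundary itself, which is skipped when it coincides with a slice boundary as noted above.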

Mode decision: It determines the coding mode for each macro block. Mode decision to achieve

high efficiency may use rate distortion optimization. Mode decision works with rate control

algorithm and the outcome is the best-selected coding mode for a macro block. [10]

Intra prediction: Prediction for intra macro blocks is called intra-prediction and is done in

pixel-domain in this standard. The standard describes intra-prediction as linear interpolations of

pixels from the adjacent edges of neighboring macro blocks that are decoded before the current

macro block. The interpolations are directional in nature, with multiple modes, each implying a

spatial direction of prediction. For luminance pixels with 4x4 partitions, 9 intra-prediction
modes are defined. Four intra-prediction modes are defined when a 16x16 partition is used:
mode 0, mode 1, mode 2 and mode 3 [10]. Table 2.1 shows the prediction block sizes (16×16,
8×8 and 4×4) and their possible prediction modes.

Table 2.1 Different intra prediction block sizes with possible prediction modes [10]

4 × 4 luma prediction modes: Fig 2.1 shows a sample 4×4 luma block (P) to be predicted. The
samples above and to the left, labeled A-M in Fig 2.1, have previously been encoded and
reconstructed and are therefore available in the encoder and decoder to form a prediction
reference. The samples a, b, c, ..., p of the prediction block P (Fig 2.1) are calculated based on
the samples A-M. Table 2.2 shows the possible prediction modes for a 4×4 luma block, and the
arrows in Fig 2.2 indicate the direction of prediction in each mode. For modes 3-8, the predicted
samples are formed from a weighted average of the prediction samples A-M. For example, if
mode 4 is selected, the top-right sample of P, labeled 'd', is predicted by:

d = round (B/4 + C/2 + D/4).
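The weighted average for mode 4 can be illustrated with integer arithmetic, where round(B/4 + C/2 + D/4) becomes (B + 2C + D + 2) >> 2. The sketch below extends the same three-tap rule to the whole block, following the diagonal down-right rule of the standard as recalled here; it should be checked against the standard before use. The function name and argument layout (`top` = A..D, `left` = I..L, `corner` = M) are hypothetical.

```python
def intra4x4_diag_down_right(top, left, corner):
    """Predict a 4x4 block with mode 4 (diagonal down-right).

    top    -- samples A, B, C, D above the block
    left   -- samples I, J, K, L to the left of the block
    corner -- sample M above and to the left of the block
    """
    def p(x, y):
        # Neighbouring sample at column x, row y (-1 means outside the block).
        if x == -1 and y == -1:
            return corner
        if y == -1:
            return top[x]
        return left[y]

    pred = [[0] * 4 for _ in range(4)]
    for y in range(4):
        for x in range(4):
            if x > y:
                a, b, c = p(x - y - 2, -1), p(x - y - 1, -1), p(x - y, -1)
            elif x < y:
                a, b, c = p(-1, y - x - 2), p(-1, y - x - 1), p(-1, y - x)
            else:
                a, b, c = p(0, -1), p(-1, -1), p(-1, 0)
            # Three-tap weighted average with weights 1/4, 1/2, 1/4.
            pred[y][x] = (a + 2 * b + c + 2) >> 2
    return pred
```

With `top = [10, 20, 30, 40]`, the top-right sample `pred[0][3]` is (20 + 2·30 + 40 + 2) >> 2 = 30, which matches the formula for d.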

Fig 2.1 4 × 4 luma block to be predicted [10]

Table 2.2 Labeling of prediction samples, 4 × 4 prediction [10]

Fig 2.2 4 × 4 intra prediction modes [10]

16 × 16 luma prediction modes: As an alternative to the 4×4 luma modes described, the entire

16×16 luma component of a macro block may be predicted in one operation. Four modes are

available, shown in Fig 2.3 and in tabular form (Table 2.3) [10].

Fig 2.3 Intra 16 × 16 prediction modes [10]

Table 2.3 Labeling of prediction samples, 16×16 prediction [10]

Inter prediction: Inter prediction is the process of predicting a block of luma and chroma

samples from a reference picture that has previously been coded and transmitted. This involves

selecting a prediction region, generating a prediction block and subtracting this from the original

block of samples to form a residual that is then coded and transmitted. The block of samples to

be predicted, a macro block partition or sub-macro block partition, can range in size from a

complete macro block, i.e. 16 × 16 luma samples and corresponding chroma samples, down to a

4 × 4 block of luma samples and corresponding chroma samples. The reference picture is chosen

from a list of previously coded pictures, stored in a decoded picture buffer (DPB), which may
include pictures before and after the current picture in display order. The offset between the
position of the current partition and the prediction region in the reference picture is a motion
vector. The motion vector may point to integer, half- or quarter-sample positions in the luma
component of the reference picture. Half- or quarter-sample positions are generated by
interpolating the samples of the reference picture. Each motion vector is differentially coded
from the motion vectors of neighboring blocks. The prediction block may be generated from a
single prediction region in a reference picture (P macro block), or from two prediction regions
in reference pictures (B macro block) [10]. Sample sequences of reference pictures coded using
inter prediction with P and B frames are shown in Fig 2.4.

Fig. 2.4 Sample sequence reference pictures in H.264 [10]

Following are the steps involved in inter prediction.

Interpolate the picture(s) in the decoded picture buffer (DPB), to generate 1/4-sample

positions in the luma component and 1/8-sample positions in the chroma components.

Fig 3.1 explains the interpolation of luma half-pel pixels.

Luma Component: The half-pel samples in the luma component of the reference picture
are generated first (Fig 3.1, grey markers). Each half-pel sample that is adjacent to two integer
samples, e.g. b, h, m, s in Fig 3.1, is interpolated from integer-pel samples using a 6-tap finite
impulse response (FIR) filter with weights (1/32, −5/32, 5/8, 5/8, −5/32, 1/32). For example,
half-pel sample b is calculated from the 6 horizontal integer samples E, F, G, H, I and J using a
process equivalent to:

b = round ((E − 5F + 20G + 20H − 5I + J)/32)

Similarly, h is interpolated by filtering A, C, G, M, R and T. Once all of the samples

adjacent to integer samples have been calculated, the remaining half-pel positions are calculated

by interpolating between six horizontal or vertical half-pel samples from the first set of

operations. For example, j is generated by filtering cc, dd, h, m, ee and ff. Note that the result is

the same whether j is interpolated horizontally or vertically. The 6-tap interpolation filter is

relatively complex but produces an accurate fit to the integer-sample data and hence good motion

compensation performance.
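The 6-tap filter for half-pel samples such as b can be sketched in integer arithmetic as follows. Adding 16 before the shift by 5 implements the rounding in the formula above; clipping to the valid sample range and the two-sample average for quarter-pel positions follow the standard as recalled here. The function names and the 8-bit default are assumptions for illustration.

```python
def clip_pixel(value, bit_depth=8):
    """Clip a value to the valid sample range for the given bit depth."""
    return max(0, min((1 << bit_depth) - 1, value))


def half_pel(e, f, g, h, i, j):
    """Half-pel sample between g and h from six neighbouring integer samples."""
    return clip_pixel((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)


def quarter_pel(a, b):
    """Quarter-pel sample as the rounded average of two adjacent samples."""
    return (a + b + 1) >> 1
```

On a sharp edge the negative taps can push the filtered value outside the sample range, which is why the clipping step is needed: `half_pel(0, 0, 255, 255, 0, 0)` would be 319 without it.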

Fig 3.1 Interpolation of luma half-pel positions [10]

Choose an inter prediction mode from the following options:

(a) Choice of reference picture(s), previously-coded pictures available as sources

for prediction. (Table 3.1)

(b) Choice of macro block partitions and sub-macro block partitions, i.e.

prediction block sizes. (Fig 3.2)

(c) Choice of prediction types:

(i) Prediction from one reference picture in list 0 for P or B macro blocks

or list 1 for B macro blocks only.

(ii) Bi-prediction from two reference pictures, one in list 0 and one in list

1, B macro blocks only, optionally using weighted prediction.

Table 3.1 Reference picture sources [10]

Fig 3.2 Macro block partitions and sub-macro block partitions [10].

Choose motion vector(s) for each macro block partition or sub-macro block partition, one

or two vectors depending on whether one or two reference pictures are used. (Table 3.2)

Table 3.2 Reference frames and motion vectors for P and B macro blocks [10]

Predict the motion vector(s) from previously-transmitted vector(s) and generate motion

vector difference(s). Optionally, use direct mode prediction, B macro blocks only.

Code the macro block type, choice of prediction reference(s), motion vector difference(s)

and residual.

Apply a de-blocking filter prior to storing the reconstructed picture as a prediction

reference for further coded pictures.
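The motion vector prediction step in the list above can be sketched as follows. This is a simplified illustration that assumes the component-wise median of three neighboring vectors as the predictor, which is what H.264 uses for most partitions; the special cases for 16 × 8 and 8 × 16 partitions and for unavailable neighbors are ignored. The function names are hypothetical and vectors are (x, y) tuples in quarter-sample units.

```python
def median3(a, b, c):
    """Return the median of three numbers."""
    return sorted((a, b, c))[1]


def predict_mv(mv_a, mv_b, mv_c):
    """Component-wise median of the motion vectors of three neighbours."""
    return (
        median3(mv_a[0], mv_b[0], mv_c[0]),
        median3(mv_a[1], mv_b[1], mv_c[1]),
    )


def mv_difference(mv, predicted):
    """Motion vector difference that is actually coded in the bitstream."""
    return (mv[0] - predicted[0], mv[1] - predicted[1])
```

Because neighboring blocks tend to move together, the difference is usually small and therefore cheap to code with the entropy coder.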

H.264/AVC profiles

The H.264 standard is defined with a large variety of coding tools, so that the standard caters
to all classes of applications. However, not all tools are required for a particular application,
so the coding tools are segregated into different groups called profiles. The basic profiles
defined in the standard are shown in Fig. 4.1.

Fig. 4.1 Profile structure in H.264 [10]

Some features common to all profiles are:

1. Intra-coded slices (I slice): These slices are coded using prediction only from decoded
samples within the same slice.

2. Predictive-coded slices (P slice): These slices are usually coded using inter prediction
from previously decoded reference pictures, although some macro blocks in P slices may be
intra coded. Sample values of each block are predicted using one motion vector, and the
prediction may also be weighted using multiple frames.

3. 4×4 modified integer DCT.

4. CAVLC for entropy encoding.

5. Exponential Golomb encoding for headers and associated slice data.

Baseline profile: I- and P-slice coding, enhanced error resilience tools (flexible macro block
ordering (FMO), arbitrary slices and redundant slices) and CAVLC. It offers the least coding
efficiency. [10]

Extended profile: A superset of the baseline profile. Besides the tools of the baseline profile, it
includes B-, SP- and SI-slices, data partitioning and interlace coding tools, and it provides
better coding efficiency. [10]

Main profile: I-, P- and B-slices, interlace coding, CAVLC and CABAC. It provides the highest
possible coding efficiency and is designed to best suit digital storage media, television
broadcasting and set-top box applications. [10]

High profile: The high profile is a superset of the main profile and adds the following tools:
8 × 8 transform and 8 × 8 intra prediction for better coding performance, especially at higher
spatial resolutions; quantizer scale matrices, which support frequency-dependent quantizer
weightings; separate quantizer parameters for Cr and Cb; and support for monochrome video.
The high profile makes it possible to use a higher coded data rate for the same level, and it may
be particularly useful for high definition applications. Fig 4.2 shows the four high profiles
together with the main profile for comparison. Each of these profiles adds coding tools that
support higher-quality applications (high definition, extended bit depths, higher color depths)
at the expense of greater decoding complexity. [10]

Fig 4.2 Main and four high profiles [10]

PRINCIPLE OF UNIFIED LOOP FILTER:

The unified loop filter [2] unifies a nonlinear enhancement filter (for removing blocking
and ringing artifacts) and a linear restoration filter (for improving coding efficiency) within the
classical optimization framework of least mean square error (LMSE). The joint use of the
de-blocking loop filter (DLF) and the adaptive loop filter (ALF) [5] can be replaced by the
unified loop filter (ULF) [2]. The unified loop filter unifies the nonlinear bilateral filter and the
linear Wiener filter, which are explained in the following sections.

Suppose that $X = [x_1, x_2, \ldots, x_N]^T$ is a support column vector containing N pixels of
the reconstructed picture, arranged in spatial order surrounding the central pixel $x_c$, as
shown in Fig. 5.1.

Fig. 5.1 Support vector X [2].

1. Nonlinear Bilateral Filter

The nonlinear bilateral filter [13] is designed to address the limitation of low-pass de-blocking
filters, and has been shown to be effective in removing both blocking and ringing artifacts while
retaining the sharpness of real edges. The output of the bilateral filter is given by

$$\hat{x}_c = \frac{\sum_{i=1}^{N} \omega_i \, x_i}{\sum_{i=1}^{N} \omega_i}, \qquad \omega_i = \exp\!\left(-\frac{d(x_i, x_c)^2}{2\sigma_d^2}\right) \exp\!\left(-\frac{(x_i - x_c)^2}{2\sigma_r^2}\right)$$

where $\omega_i$ is the weighting function, designed to smooth in regions of similar intensity
while keeping edges intact by heavily weighting those pixels that are both geometrically close
and photometrically similar to the center pixel $x_c$; $d(x_i, x_c)$ is the Euclidean distance
between $x_i$ and $x_c$; and $\sigma_d$ and $\sigma_r$ denote the geometric spread parameter
and the photometric spread parameter, respectively.
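A minimal Python sketch of a bilateral filter for a single pixel is given below. It assumes Gaussian weights for both the geometric and the photometric term, as in the classical formulation of [13], and a square window centered on the pixel; the function name and window layout are hypothetical.

```python
import math


def bilateral_pixel(window, sigma_d, sigma_r):
    """Filter the central pixel of a square window (odd side length).

    sigma_d -- geometric spread parameter
    sigma_r -- photometric spread parameter
    """
    c = len(window) // 2
    center = window[c][c]
    numerator = 0.0
    denominator = 0.0
    for r, row in enumerate(window):
        for col, value in enumerate(row):
            dist2 = (r - c) ** 2 + (col - c) ** 2
            # Weight is large only for pixels that are both close and similar.
            weight = math.exp(-dist2 / (2 * sigma_d ** 2)) * math.exp(
                -((value - center) ** 2) / (2 * sigma_r ** 2)
            )
            numerator += weight * value
            denominator += weight
    return numerator / denominator
```

With a small photometric spread, pixels across a strong edge receive almost no weight, so the edge is preserved; with a very large photometric spread the filter degenerates into an ordinary Gaussian low-pass filter.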

2. Linear Wiener Filter

The Wiener filter [14] is a well-known optimal linear filter for pictures degraded by
Gaussian noise, blurring and distortion caused by compression. The output of the Wiener
filter is given by

$$\hat{x}_c = \sum_{i=1}^{N} w_i \, x_i = W^T X$$

where $W = [w_1, w_2, \ldots, w_N]^T$ is a column vector of N optimal filter coefficients, which
can be obtained by the least mean square error (LMSE) algorithm [16], and X is the support
vector shown in Fig 5.1.
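One way to obtain least mean square error coefficients is to solve the normal equations built from pairs of support vectors and original pixels. The sketch below is a batch least-squares illustration under that assumption, not the recursive algorithm of the cited reference; all function names are hypothetical, and a plain Gaussian elimination is used to keep the example self-contained.

```python
def solve_linear(a, b):
    """Solve a * w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * w[k] for k in range(i + 1, n))
        w[i] = (m[i][n] - tail) / m[i][i]
    return w


def wiener_coefficients(supports, originals):
    """Coefficients minimizing the squared error between W^T X and the original."""
    n = len(supports[0])
    # Autocorrelation matrix of the support vectors.
    r = [[sum(x[i] * x[j] for x in supports) for j in range(n)] for i in range(n)]
    # Cross-correlation between support vectors and original pixels.
    p = [sum(x[i] * s for x, s in zip(supports, originals)) for i in range(n)]
    return solve_linear(r, p)


def wiener_output(w, x):
    """Filtered pixel as the inner product of coefficients and support vector."""
    return sum(wi * xi for wi, xi in zip(w, x))
```

In an encoder the original pixels are available, so the coefficients can be computed per frame and sent to the decoder as side information.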

3. Unified loop filter

In order to optimize both objective and subjective quality, a nonlinear similarity-ordered
statistics filter is concatenated with a linear spatially ordered statistics filter, a.k.a. the
Wiener filter [14], to form the proposed unified loop filter [2]. Therefore, the output of the
unified loop filter becomes

$$\hat{x}_c = W^T \tilde{X}$$

where W is a column vector of M + N optimal filter coefficients (N coefficients for the
nonlinear part and M coefficients for the linear part), and

$$\tilde{X} = [X_{\mathrm{sim}}^T, X_{\mathrm{spa}}^T]^T = [x_{(1)}, \ldots, x_{(N)}, x_1, \ldots, x_M]^T$$

is used to support W, with $x_{(1)}, \ldots, x_{(N)}$ the similarity-ordered pixels and
$x_1, \ldots, x_M$ the spatially ordered pixels. The advantages of this filter are that, in the
presence of singularities, the weights on the similarity-ordered pixels (nonlinear part) that are
similar to the central pixel value can be increased to better preserve edges; in cases in which
frequency selectivity is most advantageous, the weights on the spatially ordered pixels (linear
part) can be set accordingly [2].

The block diagram of the H.264 encoder with the unified loop filter is shown in Fig 5.2.

Fig 5.2: H.264 encoder block diagram with unified loop filter.[2]
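The concatenation of a similarity-ordered part and a spatially ordered part can be sketched as below. The ordering rule used here (ascending absolute difference from the central pixel) is an assumption made for illustration, since the text does not spell out the similarity measure, and all function names are hypothetical.

```python
def similarity_ordered(pixels, center):
    """Sort pixels by similarity to the central pixel (assumed: |x - center|)."""
    return sorted(pixels, key=lambda x: abs(x - center))


def ulf_support(nonlinear_pixels, linear_pixels, center):
    """Concatenate the similarity-ordered part and the spatially ordered part."""
    return similarity_ordered(nonlinear_pixels, center) + list(linear_pixels)


def ulf_output(weights, nonlinear_pixels, linear_pixels, center):
    """Filtered pixel as the inner product of weights and the extended support."""
    support = ulf_support(nonlinear_pixels, linear_pixels, center)
    if len(weights) != len(support):
        raise ValueError("one weight is required per support element")
    return sum(w * x for w, x in zip(weights, support))
```

Because the first part of the support is sorted by similarity, a fixed set of weights can favor the pixels most like the central one regardless of where they lie, which is how edges can be preserved; the second part keeps its spatial layout and behaves like a Wiener filter.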

4. Unified loop filter Design

For each nonlinear and linear group, the unified loop filter should be carefully designed to
meet different quantization error characteristics. Here, a classification-based filter design
strategy is proposed. For the enhancement pixels of the luma component, the unified loop filter
should not only emphasize de-blocking enhancement, but also take restoration into
consideration [2]. Therefore, the nonlinear part consists of one 12-tap diamond filter, whereas
the linear part consists of four kinds of diamond filters (1-tap, 13-tap, 25-tap and 41-tap) with
quadrant symmetry, as shown in Fig. 5.3(a). For the restoration pixels of the luma component,
the unified loop filter should not only emphasize restoration, but also take de-ringing
enhancement into consideration. Therefore, the nonlinear part consists of one 8-tap cross filter,
whereas the linear part consists of three kinds of diamond filters (13-tap, 25-tap and 41-tap)
with central point symmetry, as shown in Fig. 5.3(b). The reason for using only one filter type
in the nonlinear part is that a shorter filter cannot efficiently remove artifacts, whereas a longer
filter significantly increases the side information. For the enhancement pixels of the chroma
components, the nonlinear part consists of one 4-tap diamond filter, whereas the linear part
consists of two kinds of diamond filters (1-tap and 13-tap) with quadrant symmetry, as shown
in Fig. 5.4(a) and Fig. 5.4(b).

Fig. 5.3: Construction of classification-based unified loop filter for (a) enhancement

pixels and (b) restoration pixels of luma component [2]

Fig. 5.4: Construction of classification-based unified loop filter for (a) enhancement

pixels and (b) restoration pixels of chroma component [2]

Since the unified loop filter for each group has different combinations of nonlinear part
and linear part, the best combination for each group is decided by RDO selection on a frame
basis, by minimizing the cost

$$J = D + \lambda R$$

where D is the distortion between the filtered frame and the original frame, R is the number
of bits for the filter side information, which includes the filter tap type and the filter coefficient
quantization bits, and $\lambda$ is the Lagrangian multiplier. The filter coefficients can be
encoded in two modes: 10-bit fixed length coding or temporal prediction coding with
Exp-Golomb code. [2]
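The frame-level RDO selection can be sketched as picking the candidate with the smallest Lagrangian cost. The candidate names, distortion values and bit counts in the usage note are made up for illustration, and the function names are hypothetical.

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost: distortion plus lambda times rate."""
    return distortion + lam * rate_bits


def select_filter(candidates, lam):
    """Pick the (name, distortion, rate_bits) candidate with the lowest cost."""
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))
```

For example, with candidates `("1-tap", 1000.0, 10)`, `("13-tap", 900.0, 130)` and `("25-tap", 880.0, 250)`, a multiplier of 0.5 selects the 13-tap filter, while a multiplier of 5 makes the side information too expensive and selects the 1-tap filter.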

Goal:

The main aim of the thesis is to take advantage of the order statistics of the filter [4] and unify
the nonlinear bilateral filter [13] and the linear Wiener filter [14] into a unified loop filter, which
improves the reconstructed image (removes ringing and blocking artifacts and reduces
quantization noise) and thus improves the prediction, which in turn increases the overall
compression of the video.

The thesis also compares the performance of H.264 with and without the unified loop filter
(ULF) on various test sequences at different bit rates in terms of MSE, PSNR, SSIM and
complexity [16].
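Two of the objective metrics named above, MSE and PSNR, can be computed as in the following sketch. It assumes 8-bit samples (peak value 255) and frames passed as flat sequences of sample values; the function names are hypothetical.

```python
import math


def mse(reference, test):
    """Mean squared error between two equally sized sample sequences."""
    if len(reference) != len(test) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)


def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio in dB (infinite for identical inputs)."""
    error = mse(reference, test)
    if error == 0:
        return float("inf")
    return 10 * math.log10(peak ** 2 / error)
```

A higher PSNR at the same bit rate, or the same PSNR at a lower bit rate, would indicate a coding gain for the configuration under test.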

References

[1] Kwon Soon-kak, A. Tamhankar and K. R. Rao, “Overview of H.264/MPEG-4 part 10”, 4th EURASIP Conference on Video/Image Processing and Multimedia Communications, vol. 1, pp. 1-51, July 2003.

[2] Y. Liu, “Unified Loop Filter for Video Compression”, IEEE Trans. on Circuits and Systems for Video Technology, vol. 20, no. 10, pp. 1378-1382, Oct. 2010.

[3] T. Wedi, “Adaptive interpolation filter for motion compensated prediction”, IEEE International Conference on Image Processing (ICIP 2002), New York, USA, vol. 2, pp. II-509 - II-512, Sept. 2002.

[4] A. Bovik, T. Huang and D. Munson, “A generalization of median filtering using linear combinations of order statistics”, IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 31, no. 6, pp. 1342-1350, Dec. 1983.

[5] Y. Liu and Y. Huo, “Unified loop filter for high-performance video coding”, IEEE International Conference on Multimedia and Expo (ICME), pp. 1271-1276, July 2010.

[6] S. Wittmann and T. Wedi, “Transmission of post-filter hints for video coding schemes”, IEEE International Conference on Image Processing (ICIP), vol. 1, pp. 81-84, Sept. 16 - Oct. 19, 2007.

[7] P. List, A. Joch, J. Lainema, G. Bjontegaard and M. Karczewicz, “Adaptive deblocking filter”, IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 614-619, July 2003.

[8] C. Qian, Z. Yunfei, Y. Peng, L. Xiaoan, J. Sole, X. Qian, E. Francois and W. Dapeng, “Classified quad tree-based adaptive loop filter”, 2011 IEEE International Conference on Multimedia and Expo (ICME), pp. 1-6, 15 July 2011.

[9] A. Bovik, T. Huang and D. Munson, “Nonlinear filtering using linear combinations of order statistics”, IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 2067-2070, May 1982.

[10] I. E. Richardson, “The H.264 advanced video compression standard”, 2nd Edition, Wiley, 2010.

[11] H.264/AVC JM reference software. Website: http://iphome.hhi.de/suehring/tml/download

[12] T. Wiegand, G. J. Sullivan, G. Bjontegaard and A. Luthra, “Overview of the H.264/AVC video coding standard”, IEEE Trans. CSVT, vol. 13, pp. 560-576, July 2003.

[13] C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images”, Sixth International Conference on Computer Vision, pp. 839-846, Jan. 1998.

[14] Y. Chiu and L. Xu, “Adaptive (Wiener) filter for video compression”, ITU-T SG16 Contribution C437, Geneva, April 2008.

[15] Z. Wang et al., “Image quality assessment: from error visibility to structural similarity”, IEEE Trans. on Image Processing, vol. 13, pp. 600-612, April 2004.

[16] X. Wang, “Recursive algorithms for linear LMSE estimators under uncertain observations”, IEEE Trans. on Automatic Control, vol. 29, pp. 853-854, Sept. 1984.

[17] I. E. Richardson, “White paper: H.264/AVC context adaptive variable length coding”, http://www.vcodex.com/files/H264_cavlc_wp.pdf.

[18] D. Marpe, H. Schwarz and T. Wiegand, “Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard”, IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 620-636, July 2003.

[19] T. Wiegand and G. J. Sullivan, “The picture phone is here. Really”, IEEE Spectrum, vol. 48, pp. 50-54, Sept. 2011.

[20] G. J. Sullivan, P. Topiwala and A. Luthra, “The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions”, SPIE Conf. on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.

[21] T. Wiegand and G. J. Sullivan, “The H.264/AVC video coding standard”, IEEE SP Magazine, vol. 24, pp. 148-153, March 2007.

[22] G. J. Sullivan and T. Wiegand, “Video compression - from concepts to the H.264/AVC standard”, Proc. IEEE, vol. 93, no. 1, pp. 18-31, Jan. 2005.

[23] I. E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next Generation Multimedia, New York: Wiley, 2003.

[24] A. Puri, X. Chen and A. Luthra, “Video coding using the H.264/MPEG-4 AVC compression standard”, Signal Processing: Image Communication, vol. 19, pp. 793-849, Oct. 2004.

[25] G. J. Sullivan, “The H.264/MPEG-4 AVC video coding standard and its deployment status”, Proc. SPIE Conf. Visual Communications and Image Processing (VCIP), Beijing, China, vol. 5960, pp. 709-719, July 2005.

[26] T. Sikora, “Trends and perspectives in image and video coding”, Proc. IEEE, vol. 93, pp. 6-17, Jan. 2005.
