THESIS PROPOSAL
STUDY AND IMPLEMENTATION OF UNIFIED LOOP FILTER IN H.264
Under guidance of
DR K R RAO
DEPARTMENT OF ELECTRICAL ENGINEERING
UNIVERSITY OF TEXAS AT ARLINGTON
By
PAVAN KUMAR REDDY GAJJALA
1000769393
ACRONYMS AND ABBREVIATIONS
AVC: Advanced Video Coding
CABAC: Context-based Adaptive Binary Arithmetic Coding
CAVLC: Context-based Adaptive Variable Length Coding
DLF: De-blocking Loop Filter
DPB: Decoded Picture Buffer
DVB: Digital Video Broadcasting
FMO: Flexible Macro block Ordering
GOP: Group of Pictures
HD: High Definition
ISO: International Organization for Standardization
ITU: International Telecommunication Union
JVT: Joint Video Team
LMSE: Least Mean Square Error
MC: Motion Compensation
MDCT: Modified Discrete Cosine Transform
ME: Motion Estimation
MPEG: Moving Picture Experts Group
PIT: Pre-scaled Integer Transform
PSNR: Peak Signal to Noise Ratio
RDO: Rate Distortion Optimization
SD: Standard Definition
SI: Switching I
SP: Switching P
SSIM: Structural Similarity Index Metric
VCEG: Video Coding Experts Group
ABSTRACT: This thesis studies and implements the unified loop filter for video
coding in H.264 [1], which suppresses the quantization noise optimally and improves the
objective and subjective qualities of the reconstructed picture simultaneously. The unified loop filter
unifies a nonlinear enhancement filter and a linear restoration filter within the classical optimization
framework of least mean square error. Experimental results show that the unified loop filter achieves
superior objective coding gain and better visual quality improvement around edges and textures
when compared with the loop de-blocking filter in the H.264/AVC high profile [2].
MOTIVATION
The motivation to unify de-blocking filtering and Wiener filtering into one filtering
framework is that, if the number of the sources causing information loss is reduced, the capability
of picture restoration can be further improved. [2]
INTRODUCTION
1. OVERVIEW OF H.264
H.264 or MPEG-4 part 10: AVC [10] is the next generation video codec developed by
MPEG (moving picture experts group) of ISO/IEC and VCEG (video coding experts group) of
ITU-T, together known as the JVT (joint video team). The H.264/MPEG-4 AVC standard, like
previous standards, is based on motion compensated transform coding.
H.264 also uses hybrid block-based video compression techniques such as transformation
for reduction of spatial correlation, quantization for bit-rate control, motion compensated
prediction for reduction of temporal correlation and entropy coding for reduction of statistical
redundancy. The important changes in H.264 occur in the details of each functional element. It
includes adaptive intra-picture prediction, a new 4x4 integer transform, multiple reference
pictures, variable block sizes, quarter-pel precision for motion compensation, an in-loop
de-blocking filter, and improved entropy coding.
Fig 1.1 shows the H.264 encoder block diagram and Fig 1.2 shows the H.264 decoder
block diagram.
The functions of different blocks of the H.264 encoder are described below:
Transform: A 4x4 integer transform is used and the transform coefficients are explicitly
specified in AVC and allow it to be perfectly invertible. In AVC, the transform coding always
uses predictions to construct the residuals, even in the case of intra macro blocks. [10]
Quantization and scan: The standard specifies the mathematical formulae of the quantization
process. The scale factor for each element in each sub-block varies as a function of the
quantization parameter associated with the macro block that contains the sub block, and as a
function of the position of the element within the sub-block. The rate-control algorithm in the
encoder controls the value of quantization parameter. [10]
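As a rough illustration of how the quantization parameter drives the step size, the sketch below assumes the commonly tabulated H.264 step sizes for QP 0 to 5 and the rule that the step size doubles for every increase of 6 in QP. It covers only the QP dependence; the per-position scaling within a sub-block is omitted, and the names `BASE_QSTEP` and `qstep` are illustrative, not taken from the standard or the JM software.

```python
# Step sizes for QP 0..5 as commonly tabulated for H.264 (assumed here).
BASE_QSTEP = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp: int) -> float:
    """Approximate quantizer step size for a QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    # The step size doubles each time QP increases by 6.
    return BASE_QSTEP[qp % 6] * (2 ** (qp // 6))
```

Under these assumptions, a rate-control algorithm that raises QP by 6 halves the precision of every coefficient in the macro block.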
CAVLC and CABAC entropy coders: VLC encoding of syntax elements for the compressed
stream is performed using Exp-Golomb codes. For transform coefficient coding AVC includes
two different entropy coding methods (CAVLC and CABAC) [17, 18] for coding the quantized
transform coefficients. The entropy coding method can change as often as every picture. [10]
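The Exp-Golomb codes mentioned above can be sketched as follows. This is a minimal illustration of the unsigned and signed mappings and returns bit strings rather than writing a bitstream; the function names are hypothetical.

```python
def exp_golomb_ue(code_num: int) -> str:
    """Unsigned Exp-Golomb codeword for a non-negative code number."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    info = bin(code_num + 1)[2:]           # binary digits of code_num + 1
    return "0" * (len(info) - 1) + info    # zero prefix, then the binary value


def exp_golomb_se(value: int) -> str:
    """Signed Exp-Golomb codeword: map the value to a code number first."""
    # Positive values map to odd code numbers, zero and negatives to even ones.
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return exp_golomb_ue(code_num)
```

Small values get short codewords, which suits syntax elements whose small values are the most probable.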
De-blocking filter: This filter operates on a macro block after motion compensation and
residual coding, or on a macro block after intra-prediction and residual coding, depending on
whether the macro block is inter-coded or intra-coded. The result of the loop filtering operation
is stored as a reference picture. The loop filter operation is adaptive in response to several factors
such as the quantization parameter of the current and neighboring macro blocks, the magnitude
of the motion vector and the macro block coding type [10] (Fig 1.3).
Fig 1.3 De-blocking filter in the H.264 encoder block diagram [10]
De-blocking filtering is applied to vertical or horizontal edges of 4 × 4 blocks in a macro block
excluding edges on slice boundaries, in the following order:
1. Filter 4 vertical boundaries of the luma component in order a, b, c, d in Figure 1.4
2. Filter 4 horizontal boundaries of the luma component in order e, f, g, h, in Figure 1.4
3. Filter 2 vertical boundaries of each chroma component (i, j)
4. Filter 2 horizontal boundaries of each chroma component (k, l)
Fig 1.4 Edge filtering order in a macro block [10]
Mode decision: It determines the coding mode for each macro block. Mode decision to achieve
high efficiency may use rate distortion optimization. Mode decision works with rate control
algorithm and the outcome is the best-selected coding mode for a macro block. [10]
Intra prediction: Prediction for intra macro blocks is called intra-prediction and is done in
pixel-domain in this standard. The standard describes intra-prediction as linear interpolations of
pixels from the adjacent edges of neighboring macro blocks that are decoded before the current
macro block. The interpolations are directional in nature, with multiple modes, each implying a
spatial direction of prediction. For luminance pixels with 4x4 partitions, 9 intra-prediction modes
are defined. Four intra-prediction modes are defined when a 16x16 partition is used: mode 0,
mode 1, mode 2 and mode 3 [10]. Table 2.1 shows the different prediction block sizes
(16×16, 8×8 and 4×4) and their possible prediction modes.
Table 2.1 Different intra prediction block sizes with possible prediction modes [10]
4 × 4 luma prediction modes: Fig 2.1 shows a sample 4×4 luma block (P) to be predicted. The
samples above and to the left, labeled A-M in Fig 2.1, have previously been encoded and
reconstructed and are therefore available in the encoder and decoder to form a prediction
reference. The samples a, b, c, ..., p of the prediction block P (Fig 2.1) are calculated based on
the samples A-M. Table 2.2 shows the possible prediction modes for a 4×4 luma block; the
arrows in Fig 2.2 indicate the direction of prediction in each mode. For modes 3-8, the predicted
samples are formed from a weighted average of the prediction samples A-M. For example, if
mode 4 is selected, the top-right sample of P, labeled 'd', is predicted by:
d = round (B/4+C/2+D/4).
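The weighted average for sample d can be written in integer arithmetic. The sketch below assumes the usual add-then-shift form of rounding, which is equivalent to round(B/4 + C/2 + D/4) with halves rounded up; the function name is illustrative.

```python
def predict_d_mode4(b: int, c: int, d: int) -> int:
    """Top-right sample 'd' of a 4x4 block in mode 4 from samples B, C, D."""
    # (B + 2C + D) / 4, with +2 before the shift to round to nearest.
    return (b + 2 * c + d + 2) >> 2
```

For example, B = 100, C = 104 and D = 110 give (100 + 208 + 110 + 2) >> 2 = 105.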
Fig 2.1 4 × 4 luma block to be predicted [10]
Table 2.2 Labeling of prediction samples, 4 × 4 prediction [10]
Fig 2.2 4 × 4 intra prediction modes [10]
16 × 16 luma prediction modes: As an alternative to the 4×4 luma modes described, the entire
16×16 luma component of a macro block may be predicted in one operation. Four modes are
available, shown in Fig 2.3 and in tabular form (Table 2.3) [10].
Fig 2.3 Intra 16 × 16 prediction modes [10]
Table 2.3 Labeling of prediction samples, 16×16 prediction [10]
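Three of the four 16 × 16 modes are simple enough to sketch directly. The code below assumes the standard mode numbering (mode 0 vertical, mode 1 horizontal, mode 2 DC), since Table 2.3 is not reproduced here, and it assumes both the upper and left neighbors are available. The plane mode (mode 3) is omitted, and the function names are illustrative.

```python
N = 16  # luma macro block width and height


def predict_vertical(top):
    """Mode 0: copy the row of samples above into every row."""
    return [list(top) for _ in range(N)]


def predict_horizontal(left):
    """Mode 1: copy the column of samples to the left into every column."""
    return [[left[r]] * N for r in range(N)]


def predict_dc(top, left):
    """Mode 2: fill the block with the rounded mean of the 32 neighbors."""
    dc = (sum(top) + sum(left) + N) >> 5   # divide by 32 with rounding
    return [[dc] * N for _ in range(N)]
```

An encoder would typically form all candidate predictions and keep the one that gives the smallest residual cost.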
Inter prediction: Inter prediction is the process of predicting a block of luma and chroma
samples from a reference picture that has previously been coded and transmitted. This involves
selecting a prediction region, generating a prediction block and subtracting this from the original
block of samples to form a residual that is then coded and transmitted. The block of samples to
be predicted, a macro block partition or sub-macro block partition, can range in size from a
complete macro block, i.e. 16 × 16 luma samples and corresponding chroma samples, down to a
4 × 4 block of luma samples and corresponding chroma samples. The reference picture is chosen
from a list of previously coded pictures, stored in a decoded picture buffer (DPB), which may
include pictures before and after the current picture in display order. The offset between the
position of the current partition and the prediction region in the reference picture is a motion
vector. The motion vector may point to integer, half or quarter-sample positions in the luma
component of the reference picture. Half- or quarter-sample positions are generated by
interpolating the samples of the reference picture. Each motion vector is differentially coded
from the motion vectors of neighboring blocks. The prediction block may be generated from a
single prediction region in a reference picture (P block), or from two prediction regions in
reference pictures (B macro block) [10]. A sample sequence of reference pictures coded using
inter prediction with P and B frames is shown in Fig 2.4.
Fig. 2.4 Sample sequence reference pictures in H.264 [10]
Following are the steps involved in inter prediction.
Interpolate the picture(s) in the decoded picture buffer (DPB), to generate 1/4-sample
positions in the luma component and 1/8-sample positions in the chroma components.
Fig 3.1 explains the interpolation of luma half-pel pixels.
Luma Component: The half-pel samples in the luma component of the reference picture
are generated first (grey markers in Fig 3.1). Each half-pel sample that is adjacent to two integer
samples, e.g. b, h, m, s in Fig 3.1, is interpolated from integer-pel samples using a 6-tap finite
impulse response (FIR) filter with weights (1/32, −5/32, 5/8, 5/8, −5/32, 1/32). For example,
half-pel sample b is calculated from the 6 horizontal integer samples E, F, G, H, I and J using a
process equivalent to:
b = round ((E − 5F + 20G + 20H − 5I + J)/32)
Similarly, h is interpolated by filtering A, C, G, M, R and T. Once all of the samples
adjacent to integer samples have been calculated, the remaining half-pel positions are calculated
by interpolating between six horizontal or vertical half-pel samples from the first set of
operations. For example, j is generated by filtering cc, dd, h, m, ee and ff. Note that the result is
the same whether j is interpolated horizontally or vertically. The 6-tap interpolation filter is
relatively complex but produces an accurate fit to the integer-sample data and hence good motion
compensation performance.
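The 6-tap interpolation can be sketched in integer arithmetic as below. The clipping to the 8-bit sample range and the averaging used for quarter-pel positions are assumptions added for completeness (they are not spelled out in the text above), and the function names are illustrative.

```python
def clip_pixel(value: int, bit_depth: int = 8) -> int:
    """Clamp a value to the valid sample range for the given bit depth."""
    return max(0, min((1 << bit_depth) - 1, value))


def half_pel(e: int, f: int, g: int, h: int, i: int, j: int) -> int:
    """Half-pel sample: round((E - 5F + 20G + 20H - 5I + J) / 32), clipped."""
    acc = e - 5 * f + 20 * g + 20 * h - 5 * i + j
    return clip_pixel((acc + 16) >> 5)     # +16 rounds before dividing by 32


def quarter_pel(p: int, q: int) -> int:
    """Quarter-pel sample as the rounded average of two neighboring samples."""
    return (p + q + 1) >> 1
```

Across a sharp edge the negative taps can push the result outside the sample range, which is why the clip is applied after rounding.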
Fig 3.1 Interpolation of luma half-pel positions [10]
Choose an inter prediction mode from the following options:
(a) Choice of reference picture(s), previously-coded pictures available as sources
for prediction. (Table 3.1)
(b) Choice of macro block partitions and sub-macro block partitions, i.e.
prediction block sizes. (Fig 3.2)
(c) Choice of prediction types:
(i) Prediction from one reference picture in list 0 for P or B macro blocks
or list 1 for B macro blocks only.
(ii) Bi-prediction from two reference pictures, one in list 0 and one in list
1, B macro blocks only, optionally using weighted prediction.
Table 3.1 Reference picture sources [10]
Fig 3.2 Macro block partitions and sub-macro block partitions [10].
Choose motion vector(s) for each macro block partition or sub-macro block partition, one
or two vectors depending on whether one or two reference pictures are used. (Table 3.2)
Table 3.2 Reference frames and motion vectors for P and B macro blocks [10]
Predict the motion vector(s) from previously-transmitted vector(s) and generate motion
vector difference(s). Optionally, use direct mode prediction, B macro blocks only.
Code the macro block type, choice of prediction reference(s), motion vector difference(s)
and residual.
Apply a de-blocking filter prior to storing the reconstructed picture as a prediction
reference for further coded pictures.
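The motion vector prediction step in the list above can be illustrated with a component-wise median of three neighboring vectors, which is the common case in H.264. This sketch ignores the special rules for 16 × 8 and 8 × 16 partitions and for unavailable neighbors; the neighbor names and function names are assumptions for illustration.

```python
def median3(a: int, b: int, c: int) -> int:
    """Median of three integers."""
    return sorted((a, b, c))[1]


def predict_mv(mv_a, mv_b, mv_c):
    """Predicted vector: component-wise median of three neighbor vectors."""
    return (median3(mv_a[0], mv_b[0], mv_c[0]),
            median3(mv_a[1], mv_b[1], mv_c[1]))


def mv_difference(mv, mv_pred):
    """Motion vector difference that is actually coded in the bitstream."""
    return (mv[0] - mv_pred[0], mv[1] - mv_pred[1])
```

Because neighboring blocks tend to move together, the difference is usually small and therefore cheap to code.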
H.264/AVC profiles
The H.264 standard is defined with a large variety of coding tools so that the
standard caters to all classes of applications. However, not all tools are required for a particular
application, so the coding tools are segregated into different groups called profiles. The basic
profiles defined in the standard are shown in Fig. 4.1.
Fig. 4.1 Profile structure in H.264 [10]
Some common features to all profiles are:
1. Intra-coded slices (I slice): These slices are coded using prediction only from decoded
samples within the same slice.
2. Predictive-coded slices (P slice): These slices are usually coded using inter prediction
from previously decoded reference pictures, except for some macro blocks in P slices that
are intra coded. Sample values of each block are predicted using one motion vector, and
the prediction may also be weighted using multiple reference frames.
3. 4X4 modified integer DCT.
4. CAVLC for entropy encoding.
5. Exponential Golomb encoding for headers and associated slice data.
Baseline profile: I- and P-slice coding, enhanced error resilience tools (flexible macro block
ordering (FMO), arbitrary slices and redundant slices), and CAVLC, offers the least coding
efficiency. [10]
Extended profile: Superset of the baseline profile, besides tools of the baseline profile it
includes B-, SP- and SI-slices, data partitioning, and interlace coding tools, provides better
coding efficiency [10]
Main profile: I-, P- and B-slices, interlace coding, CAVLC and CABAC, provides highest
possible coding efficiency, designed to best suit the digital storage media, television broadcasting
and set-top box applications. [10]
High profile: The high profile is a superset of the main profile and adds the following tools: 8 ×
8 transform and 8 × 8 intra prediction for better coding performance, especially at higher spatial
resolutions, quantizer scale matrices which support frequency-dependent quantizer weightings,
separate quantizer parameters for Cr and Cb and support for monochrome video. The high
profile makes it possible to use a higher coded data rate for the same level. The high profile may
be particularly useful for high definition applications. Fig 4.2 shows the four high profiles together
with the main profile for comparison. Each of these profiles adds coding tools that support
higher-quality applications (high definition, extended bit depths, higher color depths) at the expense
of greater decoding complexity. [10]
Fig 4.2 Main and four high profiles [10]
PRINCIPLE OF UNIFIED LOOP FILTER:
The unified loop filter [2] unifies a nonlinear enhancement filter (for removing blocking
and ringing artifacts) and a linear restoration filter (for improving coding efficiency) within the
classical optimization framework of least mean square error (LMSE). The joint use of the DLF and
the adaptive loop filter (ALF) [5] can be replaced by the unified loop filter (ULF) [2]. The unified
loop filter unifies the nonlinear bilateral filter and the linear Wiener filter, which are explained
in the following sections.
Suppose that $X = [x_1, x_2, \ldots, x_N]^T$ is a support column vector containing N pixels of the
reconstructed picture arranged in spatial order surrounding the central pixel $x_c$, as shown in
Fig. 5.1.
Fig. 5.1 Support vector X [2].
1. Nonlinear Bilateral Filter
The nonlinear bilateral filter [13] is designed to address the limitation of the low-pass de-blocking
filters, and has been shown to be effective in removing both blocking and ringing artifacts while
retaining the sharpness of real edges. The output of the bilateral filter is given by
$$y = \frac{\sum_{i=1}^{N} w(x_i, x_c)\, x_i}{\sum_{i=1}^{N} w(x_i, x_c)}$$
where $w(x_i, x_c)$ is the weighting function that is designed to smooth in regions of similar
intensity while keeping edges intact, by heavily weighting those pixels that are both
geometrically close and photometrically similar to the center pixel $x_c$:
$$w(x_i, x_c) = \exp\left(-\frac{d(x_i, x_c)^2}{2\sigma_d^2}\right) \exp\left(-\frac{(x_i - x_c)^2}{2\sigma_r^2}\right)$$
Here $d(x_i, x_c)$ is the Euclidean distance between $x_i$ and $x_c$, and $\sigma_d$ and $\sigma_r$ denote the geometric
spread parameter and the photometric spread parameter, respectively.
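The bilateral weighting can be sketched for a single pixel as follows. This is a minimal illustration assuming Gaussian weights on both the geometric distance and the intensity difference, a square window, and a picture stored as a list of rows; the parameter names `radius`, `sigma_d` and `sigma_r` and their default values are assumptions.

```python
import math


def bilateral_pixel(img, r, c, radius=1, sigma_d=1.0, sigma_r=10.0):
    """Bilateral-filtered value of img[r][c] over a (2*radius+1)^2 window."""
    center = img[r][c]
    num = 0.0
    den = 0.0
    for i in range(max(0, r - radius), min(len(img), r + radius + 1)):
        for j in range(max(0, c - radius), min(len(img[0]), c + radius + 1)):
            dist2 = (i - r) ** 2 + (j - c) ** 2      # squared geometric distance
            diff2 = (img[i][j] - center) ** 2        # squared intensity difference
            w = (math.exp(-dist2 / (2 * sigma_d ** 2))
                 * math.exp(-diff2 / (2 * sigma_r ** 2)))
            num += w * img[i][j]
            den += w
    return num / den                                  # normalized weighted mean
```

Pixels on the far side of a strong edge receive a weight close to zero, so the edge is preserved while flat regions are smoothed.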
2. Linear Wiener Filter
The Wiener filter [14] is a well-known optimal linear filter for pictures
degraded by Gaussian noise, blurring and distortion caused by compression. The output of the
Wiener filter is given by
$$y = \sum_{i=1}^{N} w_i x_i = W^T X$$
where $W = [w_1, w_2, \ldots, w_N]^T$ is a column vector of N optimal filter coefficients that can be
obtained by the least mean square error (LMSE) algorithm [16], and X is the support vector
shown in Fig. 5.1.
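One way to obtain LMSE coefficients is to solve the normal equations built from training pairs of support vectors (reconstructed pixels) and target values (original pixels). The sketch below is a plain-Python illustration under that assumption, not the procedure of [14] or [2]; all function names are hypothetical, and the training data must make the autocorrelation matrix non-singular.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                     # back substitution
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def wiener_coefficients(supports, targets):
    """Coefficients W minimizing the sum of (target - W^T X)^2."""
    n = len(supports[0])
    # Autocorrelation matrix of the supports and cross-correlation with targets.
    acorr = [[sum(x[i] * x[j] for x in supports) for j in range(n)]
             for i in range(n)]
    xcorr = [sum(x[i] * t for x, t in zip(supports, targets)) for i in range(n)]
    return solve_linear(acorr, xcorr)


def wiener_output(w, x):
    """Filter output y = W^T X for one support vector."""
    return sum(wi * xi for wi, xi in zip(w, x))
```

In a codec the encoder would estimate the coefficients per frame and transmit them as side information, since the decoder has no access to the original pixels.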
3. Unified loop filter
In order to achieve both objective and subjective quality optimizations, a nonlinear
similarity-ordered statistics filter is concatenated with a linear spatially ordered statistics filter,
a.k.a. the Wiener filter [14], to form the proposed unified loop filter [2]. Therefore, the output of
the unified loop filter becomes
$$y = \sum_{i=1}^{N} w_i x_{(i)} + \sum_{j=1}^{M} w_{N+j} x_j = W^T X'$$
where $W$ is a column vector of M + N optimal filter coefficients (N coefficients for the
nonlinear part and M coefficients for the linear part), and
$$X' = [x_{(1)}, \ldots, x_{(N)}, x_1, \ldots, x_M]^T$$
is the support vector, in which $x_{(1)}, \ldots, x_{(N)}$ are the similarity-ordered pixels and
$x_1, \ldots, x_M$ are the spatially ordered pixels. The advantages of this filter are that, in the presence of singularities, the
weights on the similarity-ordered pixels (nonlinear part) that are similar to the central pixel value
can be increased to better preserve edges; in cases in which frequency selectivity is most
advantageous, the weights on the spatially ordered pixels (linear part) can be set accordingly [2].
The block diagram of the H.264 encoder with the unified loop filter is shown in Fig 5.2.
Fig 5.2: H.264 encoder block diagram with unified loop filter.[2]
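The concatenation of the nonlinear and linear parts can be sketched as a single dot product over a combined support vector. The ordering rule used here (sorting neighbors by absolute difference from the central pixel) is an assumption about what "similarity ordered" means, and the function names are hypothetical; the weights would come from an LMSE fit rather than being chosen by hand.

```python
def similarity_ordered(neighbors, center):
    """Neighbors sorted from most to least similar to the central pixel."""
    return sorted(neighbors, key=lambda v: abs(v - center))


def ulf_output(weights, nonlinear_neighbors, linear_support, center):
    """Output of a unified filter: W^T [similarity-ordered ; spatial]."""
    support = similarity_ordered(nonlinear_neighbors, center) + list(linear_support)
    if len(weights) != len(support):
        raise ValueError("need one weight per support pixel")
    return sum(w * x for w, x in zip(weights, support))
```

Giving large weights to the first few similarity-ordered positions favors edge preservation, while the weights on the spatial positions shape the frequency response.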
4. Unified loop filter Design
For each nonlinear and linear group, the unified loop filter should be carefully designed to
meet different quantization error characteristics. Here, a classification-based filter design strategy
is proposed. For the enhancement pixels of the luma component, the unified loop filter should not
only emphasize de-blocking enhancement, but also take restoration into consideration [2].
Therefore, the nonlinear part consists of one 12-tap diamond filter, whereas the linear part
consists of four kinds of taps (1-tap, 13-tap, 25-tap, and 41-tap) diamond filters with quadrant
symmetry, as shown in Fig. 5.3(a). For the restoration pixels of luma component, the unified
loop filter should not only emphasize restoration, but also take de-ringing enhancement into
consideration. Therefore, the nonlinear part consists of one 8-tap cross filter, whereas the linear
part consists of three kinds of taps (13-tap, 25-tap, and 41-tap) diamond filters with central point
symmetry, as shown in Fig. 5.3(b). The reason for only one filter type in the nonlinear part is that
a shorter filter cannot efficiently remove artifacts, whereas a longer filter will increase side
information significantly. For the enhancement pixels of chroma components, the nonlinear part
consists of one 4-tap diamond filter, whereas the linear part consists of two kinds of taps (1-tap
and 13-tap) diamond filters with quadrant symmetry, as shown in Fig. 5.4(a) and Fig. 5.4(b).
Fig. 5.3: Construction of classification-based unified loop filter for (a) enhancement
pixels and (b) restoration pixels of luma component [2]
Fig. 5.4: Construction of classification-based unified loop filter for (a) enhancement
pixels and (b) restoration pixels of chroma component [2]
Since the unified loop filter for each group has different combinations of nonlinear part
and linear part, the best combination for each group is decided by RDO selection on a frame
basis, minimizing the cost
$$J = D + \lambda R$$
where D is the distortion between the filtered frame and the original frame, λ is the Lagrange
multiplier, and R is the number of bits for the filter side information, which includes filter tap
type and filter coefficient quantization bits. The filter coefficients can be encoded in two modes:
10-bit fixed length coding or temporal prediction coding with exp-Golomb code. [2]
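The frame-level selection can be sketched as picking the candidate with the smallest Lagrangian cost, distortion plus λ times rate. The candidate names, distortion values and bit counts in the usage note are hypothetical numbers for illustration only.

```python
def rd_cost(distortion: float, rate_bits: int, lam: float) -> float:
    """Lagrangian cost: distortion plus lambda times the side-information bits."""
    return distortion + lam * rate_bits


def select_filter(candidates, lam):
    """Return the (name, distortion, rate_bits) tuple with the lowest cost."""
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))
```

With candidates ("1-tap", 1000.0, 10), ("13-tap", 800.0, 130) and ("25-tap", 780.0, 250), λ = 1 selects the 13-tap filter, while λ = 5 penalizes side information enough that the 1-tap filter wins.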
Goal:
The main aim of the thesis is to take advantage of the order statistics of the filter [4] and unify
the nonlinear bilateral filter [13] and the linear Wiener filter [14] into a unified loop filter that
improves the reconstructed image (removes ringing and blocking artifacts and reduces quantization
noise) and thus improves the prediction, which in turn increases the overall compression of video.
The thesis also compares the performance of H.264 with and without the unified loop filter (ULF)
on various test sequences at different bit rates in terms of MSE, PSNR, SSIM and
complexity [16].
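Two of the comparison metrics can be computed as sketched below, assuming 8-bit pictures stored as lists of rows of equal size. SSIM and complexity measurement are not covered, and the function names are illustrative.

```python
import math


def mse(ref, test):
    """Mean squared error between two pictures of identical dimensions."""
    total = 0
    count = 0
    for ref_row, test_row in zip(ref, test):
        for a, b in zip(ref_row, test_row):
            total += (a - b) ** 2
            count += 1
    return total / count


def psnr(ref, test, peak=255):
    """Peak signal to noise ratio in dB; infinite for identical pictures."""
    err = mse(ref, test)
    if err == 0:
        return float("inf")
    return 10 * math.log10(peak ** 2 / err)
```

For 8-bit video an MSE of 1 corresponds to about 48.13 dB, so each halving of the MSE adds roughly 3 dB.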
References
[1] Kwon Soon-kak, A. Tamhankar and K. R. Rao, “Overview of H.264/MPEG-4 part 10”, 4th
EURASIP Conference on Video/Image Processing and Multimedia Communications, vol. 1, pp. 1-51,
July. 2003.
[2] Y. Liu, “Unified Loop Filter for Video Compression”, IEEE Trans. on Circuits and Systems for
Video Technology, vol. 20, no. 10, pp. 1378 – 1382, Oct. 2010.
[3] T. Wedi, “Adaptive interpolation filter for motion compensated prediction”, IEEE International
Conference on Image Processing (ICIP2002), New York, USA, vol. 2, pp. II-509 - II-512, Sept. 2002.
[4] A. Bovik , T. Huang and D. Munson, “A generalization of median filtering using linear
combinations of order statistics”, IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 31,no. 6,
pp. 1342 – 1350, Dec. 1983.
[5] Y. Liu and Y. Huo, “Unified loop filter for high-performance video coding”, IEEE International
Conference on Multimedia and Expo (ICME), pp. 1271-1276, July. 2010.
[6] S. Wittmann and T. Wedi, “Transmission of post-filter hints for video coding schemes”, IEEE
International Conference on Image Processing (ICIP), vol. 1, pp. 81-84, Sept. 16. 2007 - Oct. 19. 2007
[7] P. List, A. Joch, J. Lainema, G. Bjontegaard and M. Karczewicz, “Adaptive de blocking filter”,
IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 614-619, July. 2003.
[8] C. Qian, Z. Yunfei, Y. Peng, L. Xiaoan, J. Sole, X. Qian, E. Francois and W. Dapeng, “Classified
quad tree-based adaptive loop filter”, 2011 IEEE International Conference on Multimedia and Expo
(ICME), pp. 1-6, 15 July. 2011.
[9] A. Bovik, T. Huang and D. Munson , “Nonlinear filtering using linear combinations of order
statistics”, IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 2067-2070,
May. 1982.
[10] I. E. Richardson, “The H.264 advanced video compression standard”, 2nd Edition, Wiley 2010.
[11] H.264/AVC JM reference software. Website: http://iphome.hhi.de/suehring/tml/download
[12] T. Wiegand, G.J. Sullivan, G. Bjontegaard and A. Luthra, “Overview of the H.264/AVC video
coding standard”, IEEE Trans. CSVT, vol. 13, pp. 560-576, July. 2003.
[13] C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images”, Sixth
International Conference on Computer Vision (ICCV), pp. 839-846, Jan. 1998.
[14] Y. Chiu and L. Xu, “Adaptive (Wiener) filter for video compression”, ITU-T SG16 Contribution
C437, Geneva, April. 2008.
[15] Z. Wang et al, “Image quality assessment: from error visibility to structural similarity”, IEEE
Trans. on Image processing, vol. 13, pp. 600-612, April. 2004.
[16] X. Wang, “Recursive algorithms for linear LMSE estimators under uncertain observations”, IEEE
Trans. on Automatic control, vol. 29, pp. 853-854, Sep. 1984.
[17] I.E Richardson, “White paper: H.264/AVC context adaptive variable length coding”, http://www.vcodex.com/files/H264_cavlc_wp.pdf.
[18] D. Marpe, H. Schwarz, and T. Wiegand: Context-Based Adaptive Binary Arithmetic Coding in the
H.264 / AVC Video Compression Standard, IEEE Trans. on Circuits and Systems for Video Technology,
vol. 13, no. 7, pp. 620-636, July. 2003.
[19] T. Wiegand and G.J Sullivan, “The picture phone is here. Really”, IEEE spectrum, vol.48, pp.50-
54, Sept. 2011.
[20] G.J. Sullivan, P. Topiwala and A. Luthra, “The H.264/AVC advanced video coding standard:
Overview and introduction to the fidelity range extensions”, SPIE Conf. on application of Digital Image
Processing XXVII, vol.5558, pp.53-74, Aug. 2004.
[21] T. Wiegand and G.J. Sullivan, “The H.264/AVC Video coding standard”, IEEE SP Magazine, vol.
24, pp. 148-153, March. 2007.
[22] G.J. Sullivan and T. Wiegand, “Video compression –From concepts to the H.264/AVC standard,”
Proc. IEEE, vol. 93, no.1, pp. 18-31, Jan. 2005.
[23] I.E.G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding For Next Generation
Multimedia, New York: Wiley, 2003.
[24] A. Puri, X. Chen and A. Luthra, “Video coding using the H.264/MPEG-4 AVC compression
standard”, Signal processing, image Communication, vol.19, pp. 793-849, Oct. 2004.
[25] G.J. Sullivan, “The H.264/MPEG-4-AVC video coding standard and its deployment status,” Proc.
SPIE Conf. Visual Communications and Image Processing (VCIP), Beijing, China, vol. 5960, pp.709-
719, July. 2005.
[26] T. Sikora, “Trends and perspective in image and video coding”, Proc. IEEE, vol. 93, pp. 6-17,
Jan. 2005.