G.J. Sullivan, J.R. Ohm, W.J. Han, and T. Wiegand


IEEE Trans. Circuits and Systems for Video Technology, vol. 22, no. 12, Dec. 2012

Overview of the High Efficiency Video Coding (HEVC) Standard

Gaewon Kim (Ph.D. course) and Prof. Changhoon Yim, Department of Internet and Multimedia Engineering, Konkuk University

Typical HEVC video encoder

HEVC Video Coding Layer

• Coding tree unit (CTU) and coding tree block (CTB): A CTU consists of one luma CTB and two chroma CTBs. The luma CTB covers L×L samples, where L can be 16, 32, or 64.

• Coding unit (CU) and coding block (CB): The CTU is the root of a quadtree and is recursively partitioned into CUs. A CU consists of one luma CB and two chroma CBs. Each CU has an associated partitioning into prediction units (PUs) and a tree of transform units (TUs).

• Prediction unit (PU) and prediction block (PB): A PU partitioning structure has its root at the CU level. PB sizes range from 64×64 down to 4×4.

• Transform unit (TU) and transform block (TB): A TU tree structure has its root at the CU level. A luma CB may be identical to the luma TB or may be split into smaller luma TBs. TB sizes can be 4×4, 8×8, 16×16, or 32×32.

• Motion compensation: Quarter-sample precision is used for the MVs, and 7-tap or 8-tap filters are used for interpolation of fractional-sample positions.

• Intrapicture prediction: 33 directional modes plus planar (surface fitting) and DC (flat) prediction. Modes are encoded by deriving most probable modes (MPMs) from those of previously decoded neighboring PBs.

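• Quantization control: uniform reconstruction quantization (URQ).

• Entropy coding: context adaptive binary arithmetic coding (CABAC).

• In-loop deblocking filtering: similar to the one in H.264/MPEG-4 AVC, but with simplified decision-making and filtering processes and more friendly to parallel processing.

• Sample adaptive offset (SAO): a nonlinear amplitude mapping, applied after the deblocking filter, that better reconstructs the original signal amplitudes using a look-up table whose parameters can be determined by histogram analysis at the encoder.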

HEVC Video Coding Techniques

• HEVC is based on block-based hybrid video coding, which combines:
① Interpicture prediction, exploiting temporal statistical dependences
② Intrapicture prediction, exploiting spatial statistical dependences
③ Transform coding of the prediction residual, further exploiting spatial statistical dependences

Sampled Representation of Pictures

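• HEVC uses the YCbCr color space with 4:2:0 subsampling, in which each chroma array has half the width and half the height of the luma array.
Y component: luminance (luma), which represents brightness (gray level).
Cb and Cr components: chrominance (chroma), which represent the color difference from gray toward blue and red, respectively.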

Coding Tree Unit (CTU)

• A picture is partitioned into CTUs. The CTU is the basic processing unit; it contains a luma CTB and the corresponding chroma CTBs.

A luma CTB covers L×L samples, and each of the two chroma CTBs covers (L/2)×(L/2) samples.

HEVC supports variable-size CTBs: the value of L may be 16, 32, or 64, selected by the encoder according to its needs in terms of memory and computational requirements. Large CTBs are particularly beneficial when encoding high-resolution video content.
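As a worked example of how a picture maps onto CTUs, the sketch below counts the CTUs covering a picture for a given L. It assumes that partially covered CTUs at the right and bottom picture edges still count as CTUs and that sampling is 4:2:0; the function names are hypothetical.

```python
def ctu_grid(width: int, height: int, ctb_size: int) -> tuple:
    """Return (columns, rows) of CTUs covering a width x height luma picture."""
    if ctb_size not in (16, 32, 64):
        raise ValueError("the luma CTB size L must be 16, 32, or 64")
    # Ceiling division: a partially covered CTU at the picture edge still counts.
    cols = -(-width // ctb_size)
    rows = -(-height // ctb_size)
    return cols, rows


def chroma_ctb_size(ctb_size: int) -> int:
    """In 4:2:0 sampling, each chroma CTB covers (L/2) x (L/2) samples."""
    return ctb_size // 2


cols, rows = ctu_grid(1920, 1080, 64)
print(cols, rows, cols * rows)   # 30 17 510
print(chroma_ctb_size(64))       # 32
```

For 1920×1080 with L = 64, the bottom CTU row covers only 56 of its 64 luma rows (1080 = 16·64 + 56).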

Division of the CTB into CBs.

• CTBs can be used as CBs or can be partitioned into multiple CBs using quadtree structures.

• The quadtree splitting process can be iterated until the size of a luma CB reaches the minimum allowed luma CB size (8×8 or larger).
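The recursive CTB-to-CB division can be sketched as follows. The split decision is passed in as a callback because the standard only defines the signaled quadtree, not how an encoder chooses it; `split_ctb` and `should_split` are hypothetical names.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) in luma samples


def split_ctb(x: int, y: int, size: int, min_cb_size: int,
              should_split: Callable[[int, int, int], bool]) -> List[Block]:
    """Recursively divide a CTB into CBs with a quadtree.

    `should_split` stands in for a hypothetical encoder decision;
    recursion stops once the minimum luma CB size is reached.
    """
    if size > min_cb_size and should_split(x, y, size):
        half = size // 2
        blocks: List[Block] = []
        # Visit the quadrants in z-order: top-left, top-right, bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                blocks += split_ctb(x + dx, y + dy, half, min_cb_size, should_split)
        return blocks
    return [(x, y, size)]  # leaf: this block is used as a CB


# Example: split only inside the top-left 32x32 quadrant of a 64x64 CTB.
cbs = split_ctb(0, 0, 64, 8, lambda x, y, s: x < 32 and y < 32)
print(len(cbs))  # 19: sixteen 8x8 CBs plus three 32x32 CBs
```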

PBs and PUs

• The prediction mode for the CU is signaled as being intra or inter.

• When it is signaled as intra, the PB size is the same as the CB size for all block sizes except the smallest CB size allowed in the bitstream. For a CB of that smallest size, a flag indicates whether the CB is split into four PB quadrants, each with its own intrapicture prediction mode. This allows distinct mode selections for blocks as small as 4×4.

• When the prediction mode is signaled as inter, it is specified whether the luma and chroma CBs are split into one, two, or four PBs. Splitting into four PBs is allowed only when the CB size is equal to the smallest allowed CB size. Each interpicture-predicted PB is assigned one or two motion vectors and reference picture indices.

Figure: PB partitions for intrapicture prediction, interpicture prediction, and asymmetric motion partitioning.

Tree-Structured Partitioning into Transform Blocks and Units

• For residual coding, a CB can be recursively partitioned into transform blocks.

• The partitioning is signaled by a residual quadtree.

Figure: Subdivision of a CTB into CBs and TBs (solid lines: CB boundaries; dotted lines: TB boundaries).

Slices and Tiles

• A slice is a sequence of CTUs processed in raster-scan order.

• The main purpose of slices is resynchronization after data losses.

• Slices are self-contained: a slice can be correctly decoded without using any data from other slices in the same picture, except for the effects of in-loop filtering near the slice edges. This means that prediction within the picture (e.g., intrapicture sample prediction or motion vector prediction) is not performed across slice boundaries.

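• Each slice can be coded using one of the following coding types:

I slice: all CUs are coded using only intrapicture prediction.

P slice: in addition to the coding types of an I slice, some CUs can be coded using interpicture prediction with uniprediction (at most one motion-compensated prediction signal per PB).

B slice: in addition to the coding types of a P slice, some CUs can be coded using interpicture prediction with biprediction (at most two motion-compensated prediction signals per PB).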


• Tiles are self-contained, independently decodable rectangular regions of the picture.

• The main purpose of tiles is to enable the use of parallel processing architectures for encoding and decoding.

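• Wavefront parallel processing (WPP): When WPP is enabled, a slice is divided into rows of CTUs. The first row is processed in the ordinary way, and each subsequent row can begin once two CTUs of the row above have been processed. This supports parallel processing of CTU rows using several processing threads in the encoder or decoder.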
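The two-CTU lag between rows can be illustrated with a small scheduling model. It is a simplification, assumed for illustration only: every CTU takes one time step, enough threads are available, and a CTU may start once its left neighbor and the above-right CTU are finished. `wpp_start_times` is a hypothetical name.

```python
def wpp_start_times(rows: int, cols: int):
    """Earliest start step of each CTU under a simplified WPP model."""
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            t = 0
            if c > 0:                        # wait for the left CTU in the same row
                t = max(t, start[r][c - 1] + 1)
            if r > 0:                        # wait for the above-right CTU (two-CTU lag)
                t = max(t, start[r - 1][min(c + 1, cols - 1)] + 1)
            start[r][c] = t
    return start


times = wpp_start_times(4, 6)
print([row[0] for row in times])  # [0, 2, 4, 6]: each row starts two steps after the one above
print(times[-1][-1] + 1)          # 12 steps in total, versus 24 for sequential processing
```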

Intrapicture Prediction

• Planar prediction (Intra_Planar): an amplitude surface with horizontal and vertical slopes derived from the boundary samples.

• DC prediction (Intra_DC): a flat surface with a value matching the mean value of the boundary samples.

• Directional prediction (Intra_Angular): 33 different directional prediction modes are defined for square TB sizes from 4×4 up to 32×32.
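A minimal sketch of Intra_DC and Intra_Planar for an N×N block, assuming the reconstructed reference samples are already available as integers. The integer rounding is assumed to follow the form used in the H.265 text; the extra DC edge filtering HEVC applies to small luma blocks is left out, and the function names are hypothetical.

```python
def intra_dc(top, left):
    """top = p[0..N][-1], left = p[-1][0..N] (N+1 samples each, N a power of two)."""
    n = len(top) - 1
    shift = n.bit_length()                # log2(N) + 1, i.e. divide by 2N
    dc = (sum(top[:n]) + sum(left[:n]) + n) >> shift
    return [[dc] * n for _ in range(n)]   # flat surface


def intra_planar(top, left):
    """Average of a horizontal and a vertical linear interpolation.

    top[n] is the above-right sample and left[n] the below-left sample.
    """
    n = len(top) - 1
    shift = n.bit_length()                # log2(N) + 1
    tr, bl = top[n], left[n]
    return [[((n - 1 - x) * left[y] + (x + 1) * tr +
              (n - 1 - y) * top[x] + (y + 1) * bl + n) >> shift
             for x in range(n)]
            for y in range(n)]            # indexed pred[y][x]


print(intra_dc([0] * 5, [8] * 5)[0][0])               # 4
print(intra_planar([0] * 5, [0, 0, 0, 0, 64])[3][0])  # 32
```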


Fig. 6. Modes and directional orientations for intrapicture prediction

Intrapicture Prediction: PB Partitioning

• When the CB size is larger than the minimum CB size, the PB size is equal to the CB size.

• When the CB size is equal to the minimum CB size, an intrapicture-predicted CB may use one of two types of PB partitions:
PART_2N×2N: no split
PART_N×N: split into four equal-sized PBs

Intrapicture Prediction: Intra-Angular Prediction

• 33 prediction directions, denoted Intra_Angular[k] with k = 2 to 34.

• Each TB is predicted directionally from reconstructed, spatially neighboring samples. For a TB of size N×N, a total of 4N+1 neighboring samples may be used for prediction: left, above, above-right, lower-left, and the above-left corner sample.

• To improve the intrapicture prediction accuracy, the projected reference sample location is computed with 1/32-sample accuracy.
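The 1/32-sample projection can be sketched for the near-vertical modes with non-negative angles (26 to 34), which only need the above and above-right references. The angle values per mode and the two-tap interpolation are assumed to match the H.265 text; negative angles (which extend the reference using the left column) and the extra edge filter applied to pure vertical prediction are omitted, and the names are hypothetical.

```python
# Displacement per row in 1/32-sample units (26 = vertical, 34 = diagonal).
ANGLE = {26: 0, 27: 2, 28: 5, 29: 9, 30: 13, 31: 17, 32: 21, 33: 26, 34: 32}


def intra_angular_vertical(ref, n, mode):
    """Predict an n x n block from the above reference row.

    ref[0] is the above-left corner, ref[1..2n] the above and above-right samples.
    """
    angle = ANGLE[mode]
    pred = [[0] * n for _ in range(n)]
    for y in range(n):
        pos = (y + 1) * angle            # projected displacement in 1/32-sample units
        idx, frac = pos >> 5, pos & 31   # integer part and 1/32 fraction
        for x in range(n):
            if frac == 0:                # lands exactly on a reference sample
                pred[y][x] = ref[x + idx + 1]
            else:                        # two-tap linear interpolation at 1/32 accuracy
                pred[y][x] = ((32 - frac) * ref[x + idx + 1]
                              + frac * ref[x + idx + 2] + 16) >> 5
    return pred


ref = [10 * i for i in range(9)]                # corner plus 8 references, n = 4
print(intra_angular_vertical(ref, 4, 26)[0])    # [10, 20, 30, 40]
print(intra_angular_vertical(ref, 4, 34)[0])    # [20, 30, 40, 50]
print(intra_angular_vertical(ref, 4, 30)[0])    # [14, 24, 34, 44]
```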

Intrapicture Prediction: Reference Sample Smoothing

• Reference samples used for intrapicture prediction are sometimes filtered by a [1 2 1]/4 smoothing filter, depending on the block size and direction:

• 4×4 blocks: the smoothing filter is not applied.

• 8×8 blocks: only for the diagonal directions, k = 2, 18, and 34.

• 16×16 blocks: most directions, except the near-horizontal and near-vertical ones.

• 32×32 blocks: most directions, except the exactly horizontal and exactly vertical ones.
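The size/direction rule above can be expressed as a distance threshold from the exact horizontal (k = 10) and vertical (k = 26) modes. This formulation and its thresholds are an assumption of the sketch (modeled on how the H.265 text phrases the rule); it only covers the angular modes, and the function names are hypothetical.

```python
def use_smoothing(n: int, mode: int) -> bool:
    """Whether [1 2 1]/4 smoothing applies to the references of Intra_Angular[mode]."""
    if not 2 <= mode <= 34:
        raise ValueError("this sketch covers the angular modes 2..34 only")
    if n == 4:
        return False                               # never applied to 4x4 blocks
    # Distance to the nearest exact horizontal (10) or exact vertical (26) mode.
    dist = min(abs(mode - 10), abs(mode - 26))
    threshold = {8: 7, 16: 1, 32: 0}[n]
    return dist > threshold


def smooth_121(ref):
    """[1 2 1]/4 filter along the reference line; the two end samples are kept."""
    out = list(ref)
    for i in range(1, len(ref) - 1):
        out[i] = (ref[i - 1] + 2 * ref[i] + ref[i + 1] + 2) >> 2
    return out


print([k for k in range(2, 35) if use_smoothing(8, k)])       # [2, 18, 34]
print([k for k in range(2, 35) if not use_smoothing(16, k)])  # [9, 10, 11, 25, 26, 27]
print(smooth_121([0, 0, 8, 0, 0]))                            # [0, 2, 4, 2, 0]
```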

Intrapicture Prediction: Mode Coding

• HEVC considers three most probable modes (MPMs) when coding luma intrapicture prediction modes predictively.
The first two MPMs are initialized by the prediction modes of the above and left PBs; any unavailable prediction mode is considered to be Intra_DC.
When the first two MPMs are not equal, the third MPM is set to Intra_Planar, Intra_DC, or Intra_Angular[26] (vertical), according to which of these modes, in this order, is not a duplicate of one of the first two.

• If the current luma prediction mode is one of the three MPMs, only the MPM index is transmitted. Otherwise, the index of the current luma prediction mode among the remaining 32 modes is transmitted using a 5-bit fixed-length code.
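A sketch of MPM derivation and mode signaling. Mode numbers follow the slides (0 = Intra_Planar, 1 = Intra_DC, 2 to 34 = Intra_Angular[k]). The branch for two equal neighbor modes (Planar/DC/vertical, or the two angularly adjacent modes) is not described above and is an assumption modeled on the H.265 text; the function names are hypothetical. Since 35 − 3 = 32 modes remain, 5 bits suffice for a non-MPM mode.

```python
PLANAR, DC, VERTICAL = 0, 1, 26


def derive_mpms(left, above):
    """Three MPMs from the left and above luma modes (None = unavailable -> Intra_DC)."""
    a = DC if left is None else left
    b = DC if above is None else above
    if a != b:
        # Third MPM: first of Planar, DC, Vertical that duplicates neither neighbor.
        third = next(m for m in (PLANAR, DC, VERTICAL) if m not in (a, b))
        return [a, b, third]
    if a < 2:                                   # both Planar or both DC
        return [PLANAR, DC, VERTICAL]
    # Same angular mode: add its two angular neighbors, wrapping within 2..34.
    return [a, 2 + ((a + 29) % 32), 2 + ((a - 1) % 32)]


def encode_mode(mode, mpms):
    if mode in mpms:
        return ("mpm_idx", mpms.index(mode))
    rem = mode - sum(1 for m in mpms if m < mode)  # rank among the 32 non-MPM modes
    return ("rem_mode", format(rem, "05b"))        # 5-bit fixed-length code


def decode_rem(rem, mpms):
    mode = rem
    for m in sorted(mpms):                         # re-insert the skipped MPM values
        if mode >= m:
            mode += 1
    return mode


mpms = derive_mpms(left=10, above=None)
print(mpms)                     # [10, 1, 0]
print(encode_mode(1, mpms))     # ('mpm_idx', 1)
print(encode_mode(5, mpms))     # ('rem_mode', '00011')
```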

Interpicture Prediction

• Partitioning modes
PART_2N×2N: the CB is not split.
PART_2N×N: the CB is split horizontally into two equal-size PBs (upper and lower).
PART_N×2N: the CB is split vertically into two equal-size PBs (left and right).
PART_N×N: the CB is split into four equal-size PBs.
PART_2N×nU, PART_2N×nD, PART_nL×2N, and PART_nR×2N: the CB is split into two unequal-size PBs at one quarter or three quarters of its height or width. These types are known as asymmetric motion partitions (AMP).
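The partitioning modes can be tabulated as PB rectangles for a CB of 2N×2N luma samples. The mode names are spelled with an ASCII "x" for use as identifiers, and `pb_rects` is a hypothetical helper, not an API from any codec.

```python
def pb_rects(mode: str, cb: int):
    """PBs of an inter CB of cb x cb luma samples (cb = 2N), as (x, y, width, height)."""
    n, q = cb // 2, cb // 4          # N, and the quarter size used by AMP
    table = {
        "PART_2Nx2N": [(0, 0, cb, cb)],
        "PART_2NxN":  [(0, 0, cb, n), (0, n, cb, n)],                   # upper / lower
        "PART_Nx2N":  [(0, 0, n, cb), (n, 0, n, cb)],                   # left / right
        "PART_NxN":   [(0, 0, n, n), (n, 0, n, n), (0, n, n, n), (n, n, n, n)],
        "PART_2NxnU": [(0, 0, cb, q), (0, q, cb, cb - q)],              # split at 1/4 height
        "PART_2NxnD": [(0, 0, cb, cb - q), (0, cb - q, cb, q)],         # split at 3/4 height
        "PART_nLx2N": [(0, 0, q, cb), (q, 0, cb - q, cb)],              # split at 1/4 width
        "PART_nRx2N": [(0, 0, cb - q, cb), (cb - q, 0, q, cb)],         # split at 3/4 width
    }
    return table[mode]


print(pb_rects("PART_2NxnU", 32))  # [(0, 0, 32, 8), (0, 8, 32, 24)]
```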

• HEVC supports motion vectors with units of one quarter of the distance between luma samples.

• Fractional sample interpolation: used to generate the prediction samples for noninteger sampling positions.

HEVC uses an eight-tap filter for the half-sample positions and a seven-tap filter for the quarter-sample positions.

Instead of the two-stage interpolation process of H.264/MPEG-4 AVC, HEVC uses a single separable interpolation process for all fractional positions, without intermediate rounding operations. This improves precision and simplifies the architecture.

Transform, Scaling, and Quantization

• HEVC uses transform coding of the prediction error residual. The residual block is partitioned into multiple square TBs. The supported transform block sizes are 4×4, 8×8, 16×16, and 32×32.

• Core transform: Two-dimensional transforms are computed by applying 1-D transforms in the horizontal and vertical directions.

The elements of the core transform matrices were derived by approximating scaled DCT basis functions.

• Alternative integer transform: It is derived from a discrete sine transform (DST) and is applied only to 4×4 luma residual blocks of intrapicture prediction modes. It is not much more computationally demanding than the 4×4 DCT-style transform and provides approximately 1% bit-rate reduction in intrapicture predictive coding.
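The separable 2-D transform and the choice between the two 4×4 matrices can be sketched as below. The integer matrix entries are assumed to be those of the HEVC 4×4 DCT-style and DST-derived transforms; the scaling shifts a real codec applies between the two 1-D stages are omitted, and the helper names are hypothetical.

```python
# 4x4 integer transform matrices (each row is a basis function).
DCT4 = [[64, 64, 64, 64],
        [83, 36, -36, -83],
        [64, -64, -64, 64],
        [36, -83, 83, -36]]
DST4 = [[29, 55, 74, 84],
        [74, 74, 0, -74],
        [84, -29, -74, 55],
        [55, -84, 74, -29]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward_2d(residual, m):
    """1-D transform of every column (M X), then of every row (... M^T); no scaling."""
    return matmul(matmul(m, residual), transpose(m))


def pick_4x4_matrix(is_luma: bool, is_intra: bool):
    """DST-derived matrix for 4x4 intra luma residuals, DCT-style matrix otherwise."""
    return DST4 if (is_luma and is_intra) else DCT4


coeffs = forward_2d([[1] * 4 for _ in range(4)], DCT4)
print(coeffs[0][0])  # 65536: a flat residual produces only a DC coefficient
```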

• Scaling and quantization: HEVC uses a uniform reconstruction quantization (URQ) scheme controlled by a quantization parameter (QP).

The QP values range from 0 to 51, and an increase of 6 in QP doubles the quantization step size.
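The QP-to-step-size mapping can be illustrated with a floating-point step size and its integer counterpart. The `LEVEL_SCALE` values are assumed, for this sketch, to be the integer scale factors from the H.265 text; the rounding offset in `quantize` is a plain round-to-nearest choice, since encoders are free to pick their own. All names are hypothetical.

```python
LEVEL_SCALE = (40, 45, 51, 57, 64, 72)   # integer scale factors, one per QP % 6


def qstep(qp: int) -> float:
    """Approximate quantization step size: doubles every 6 QP steps, 1.0 at QP 4."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51")
    return 2 ** ((qp - 4) / 6)


def level_scale(qp: int) -> int:
    """Integer form of the same mapping (about 64 * qstep(qp))."""
    return LEVEL_SCALE[qp % 6] << (qp // 6)


def quantize(coeff: float, qp: int) -> int:
    level = int(abs(coeff) / qstep(qp) + 0.5)   # round to nearest; sign restored below
    return level if coeff >= 0 else -level


def dequantize(level: int, qp: int) -> float:
    return level * qstep(qp)                    # uniform reconstruction: level x step


print(qstep(22), quantize(96, 22), dequantize(12, 22))  # 8.0 12 96.0
print(level_scale(22), level_scale(28))                 # 512 1024
```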

Entropy Coding

• HEVC uses only CABAC for entropy coding.

• Context modeling: Although the number of contexts used in HEVC is substantially less than in H.264/MPEG-4 AVC, the entropy coding design actually provides better compression than a straightforward extension of the H.264/MPEG-4 AVC scheme.

• Adaptive coefficient scanning: Coefficient scanning is performed in 4×4 subblocks for all TB sizes. For 4×4 and 8×8 intrapicture-predicted TBs, the selection of the scanning order depends on the directionality of the intrapicture prediction:
The horizontal scan is used when the prediction direction is close to vertical.
The vertical scan is used when the prediction direction is close to horizontal.
For other prediction directions, the diagonal up-right scan is used.
For interpicture-predicted TBs of all sizes and for 16×16 and 32×32 intrapicture-predicted TBs, the diagonal up-right scan is always used.
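A sketch of the three 4×4 scan orders and the mode-dependent choice. The exact mode ranges treated as "close to horizontal" (6 to 14) and "close to vertical" (22 to 30) are an assumption modeled on the H.265 text; this sketch handles luma only, and the names are hypothetical.

```python
def diagonal_up_right(n=4):
    """Positions (row, col); each anti-diagonal runs from bottom-left to top-right."""
    order = []
    for d in range(2 * n - 1):               # d = row + col
        for col in range(n):
            row = d - col
            if 0 <= row < n:
                order.append((row, col))
    return order


def horizontal(n=4):
    return [(r, c) for r in range(n) for c in range(n)]   # row by row


def vertical(n=4):
    return [(r, c) for c in range(n) for r in range(n)]   # column by column


def choose_scan(intra_mode, tb_size, is_intra):
    """Mode-dependent scan choice for a luma TB."""
    if is_intra and tb_size in (4, 8):
        if 6 <= intra_mode <= 14:            # near-horizontal prediction (mode 10)
            return "vertical"
        if 22 <= intra_mode <= 30:           # near-vertical prediction (mode 26)
            return "horizontal"
    return "diagonal"


print(choose_scan(10, 8, True), choose_scan(26, 4, True), choose_scan(26, 16, True))
# vertical horizontal diagonal
print(diagonal_up_right()[:6])  # [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]
```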


• Coefficient coding: HEVC transmits the position of the last nonzero transform coefficient, a significance map, and the sign bits and levels of the transform coefficients.
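The split of a TB into these syntax parts can be illustrated for one 4×4 block. This is a simplified sketch: it walks the coefficients in reverse scan order from the last nonzero one, and it ignores subblock flags, context modeling, and binarization. `coefficient_syntax` is a hypothetical name.

```python
def coefficient_syntax(block, scan):
    """Last position, significance map, signs and levels of a 4x4 block."""
    coeffs = [block[r][c] for r, c in scan]
    nonzero = [i for i, v in enumerate(coeffs) if v != 0]
    if not nonzero:
        return None                       # nothing to code in this sketch
    last = nonzero[-1]
    reverse = range(last, -1, -1)         # from the last coefficient back to DC
    return {
        "last_pos": scan[last],
        # The last coefficient is known to be nonzero, so its flag is not sent.
        "sig_map": [int(coeffs[i] != 0) for i in range(last - 1, -1, -1)],
        "signs": [int(coeffs[i] < 0) for i in reverse if coeffs[i] != 0],
        "levels": [abs(coeffs[i]) for i in reverse if coeffs[i] != 0],
    }


raster = [(r, c) for r in range(4) for c in range(4)]   # horizontal scan for the example
block = [[5, 0, 0, 0],
         [-2, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 0, 0]]
print(coefficient_syntax(block, raster))
# {'last_pos': (2, 1), 'sig_map': [0, 0, 0, 0, 1, 0, 0, 0, 1], 'signs': [0, 1, 0], 'levels': [1, 2, 5]}
```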

In-Loop Filters

• Two processing steps, a deblocking filter (DBF) followed by a sample adaptive offset (SAO) filter, are applied to the reconstructed samples.
The DBF is intended to reduce the blocking artifacts caused by block-based coding, and it is applied only to the samples located at block boundaries.
The SAO filter is applied adaptively to all samples satisfying certain conditions, e.g., based on gradient.

• Deblocking filter

It is applied to all samples adjacent to a PU or TU boundary, except when the boundary is also a picture boundary or when deblocking is disabled across slice or tile boundaries.

HEVC applies the deblocking filter only to edges that are aligned on an 8×8 sample grid. This restriction reduces the worst-case computational complexity without noticeable degradation of the visual quality, and it also improves parallel-processing operation.

The deblocking filter is processed by first applying horizontal filtering to the vertical edges of the entire picture, followed by vertical filtering of the horizontal edges.

• Deblocking filter: boundary strength

The deblocking filter strength is limited to three values: 2, 1, or 0.

Given two adjacent blocks P and Q with a common 8×8 grid boundary:

A filter strength of 2 is assigned when one of the blocks is intrapicture predicted.

Otherwise, a filter strength of 1 is assigned if any of the following conditions is satisfied:
① P or Q has at least one nonzero transform coefficient.
② The reference indices of P and Q are not equal.
③ The motion vectors of P and Q are not equal.
④ The difference between a motion vector component of P and Q is greater than or equal to one integer sample.

A filter strength of 0 means that the deblocking process is not applied.
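A sketch of the boundary-strength decision for two adjacent blocks. Assumptions of this sketch: condition ③ is read as a difference in the number of motion vectors (uni- vs. biprediction), since a literal inequality would make ④ redundant; motion vectors are compared pairwise in list order; and MVs are stored in quarter-sample units, so one integer sample is 4. `PredBlock` and `boundary_strength` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PredBlock:
    intra: bool
    has_coeffs: bool = False                                   # any nonzero transform coefficient
    ref_ids: List[int] = field(default_factory=list)           # one reference picture per MV
    mvs: List[Tuple[int, int]] = field(default_factory=list)   # quarter-sample units


def boundary_strength(p: PredBlock, q: PredBlock) -> int:
    if p.intra or q.intra:
        return 2
    if p.has_coeffs or q.has_coeffs:                           # condition 1
        return 1
    if p.ref_ids != q.ref_ids or len(p.mvs) != len(q.mvs):     # conditions 2 and 3
        return 1
    for (px, py), (qx, qy) in zip(p.mvs, q.mvs):               # condition 4
        if abs(px - qx) >= 4 or abs(py - qy) >= 4:             # 4 quarter samples = 1 sample
            return 1
    return 0


a = PredBlock(intra=False, ref_ids=[0], mvs=[(8, 0)])
b = PredBlock(intra=False, ref_ids=[0], mvs=[(11, 0)])
c = PredBlock(intra=False, ref_ids=[0], mvs=[(12, 0)])
print(boundary_strength(a, b), boundary_strength(a, c), boundary_strength(a, PredBlock(intra=True)))
# 0 1 2
```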


• SAO (sample adaptive offset): a process that modifies the decoded samples by conditionally adding an offset value to each sample after the application of the deblocking filter, based on values in look-up tables transmitted by the encoder.

It is performed on a region basis, based on a filtering type selected per CTB by sao_type_idx:
sao_type_idx = 0: SAO is not applied to the CTB.
sao_type_idx = 1: band offset filtering.
sao_type_idx = 2: edge offset filtering.

• SAO: band offset mode

The selected offset value depends directly on the sample amplitude. The full sample amplitude range is uniformly split into 32 segments called bands. The sample values belonging to four of these bands (which are consecutive within the 32 bands) are modified by adding transmitted values.

The main reason for using four consecutive bands is that in smooth areas, where banding artifacts can appear, the sample amplitudes in a CTB tend to be concentrated in only a few of the bands.
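A minimal band offset sketch. With 32 uniform bands, the band index is the five most significant bits of a sample (band width 8 for 8-bit video). The wraparound of the four bands modulo 32 is an assumption of this sketch, and `sao_band_offset` is a hypothetical name.

```python
def sao_band_offset(samples, band_position, offsets, bit_depth=8):
    """Add offsets[k] to samples lying in band (band_position + k), k = 0..3."""
    shift = bit_depth - 5                          # 32 bands: band = 5 most significant bits
    max_val = (1 << bit_depth) - 1
    out = []
    for s in samples:
        k = ((s >> shift) - band_position) & 31    # band relative to the first one (mod 32)
        if k < 4:
            s = max(0, min(max_val, s + offsets[k]))
        out.append(s)
    return out


print(sao_band_offset([0, 15, 16, 40, 255], band_position=2, offsets=[1, 2, 3, 4]))
# [0, 15, 17, 44, 255]: only bands 2..5 (values 16..47) are modified
```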

• SAO: edge offset mode

A horizontal, vertical, or one of two diagonal gradient directions is used for the edge offset classification in the CTB.

Each sample in the CTB is classified into one of five EdgeIdx categories by comparing it with its two neighbors along the selected direction.

Depending on the EdgeIdx category, an offset value is added to the sample value.

In the edge offset mode, SAO generally has a smoothing effect.
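A sketch of edge offset classification along the horizontal direction. Category 0 (no local minimum, maximum, or edge) receives no offset. The example offsets are positive for categories 1 and 2 (local minima and concave edges) and negative for 3 and 4, which is the sign pattern assumed here to produce the smoothing effect; the function names are hypothetical.

```python
def sign(v):
    return (v > 0) - (v < 0)


def edge_idx(p, n0, n1):
    """EdgeIdx category of sample p given its two neighbors along the chosen direction."""
    raw = 2 + sign(p - n0) + sign(p - n1)     # 0 (local minimum) .. 4 (local maximum)
    return {0: 1, 1: 2, 2: 0}.get(raw, raw)   # remap so that 0 means "no offset"


def sao_edge_offset_row(row, offsets, bit_depth=8):
    """Horizontal edge offset on one row; offsets maps categories 1..4 to values."""
    max_val = (1 << bit_depth) - 1
    out = list(row)                           # end samples lack a neighbor and stay unchanged
    for i in range(1, len(row) - 1):
        cat = edge_idx(row[i], row[i - 1], row[i + 1])   # classify on the unmodified input
        if cat:
            out[i] = max(0, min(max_val, row[i] + offsets[cat]))
    return out


print(sao_edge_offset_row([10, 5, 10, 10, 20, 30, 25], {1: 2, 2: 1, 3: -1, 4: -2}))
# [10, 7, 9, 11, 20, 28, 25]
```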

Special Coding Modes

• I_PCM mode

The prediction, transform, quantization and entropy coding are bypassed.

The samples are directly represented by a pre-defined number of bits.

Its main purpose is to avoid excessive consumption of bits when the signal characteristics are extremely unusual and cannot be properly handled by hybrid coding.

• Lossless mode: The transform, quantization, and other processing that affects the decoded picture are bypassed. The residual signal from inter- or intrapicture prediction is fed directly into the entropy coder, which allows mathematically lossless reconstruction. SAO and deblocking filtering are not applied to these regions.

• Transform skipping mode: Only the transform is bypassed. This improves compression for certain types of video content, such as computer-generated images or graphics mixed with camera-view content.

It can be applied to 4×4 TBs only.

