Date post: | 09-Apr-2018 |
Category: |
Documents |
Upload: | shyam-krishnan |
8/8/2019 H_264 encoder
1/23
H.264 Encoder
(MPEG-4 Part 10)
Definitions
MOTION VECTOR: A two-dimensional vector used for inter prediction that provides an offset from the
coordinates in the decoded picture to the coordinates in a reference picture.
MOTION COMPENSATION (MC): The chosen candidate region becomes the predictor for the current M x N block and is subtracted from the current block to form a residual M x N block.
MOTION ESTIMATION (ME): The process of finding the best-matching region in a reference picture for the current block.
NAL: Network Abstraction Layer.
RESIDUAL: The difference block produced by subtracting the prediction PRED from the current block.
RTP: Real-time Transport Protocol.
VCL: Video Coding Layer.
Blocks
Discrete cosine transform (DCT)
Quantization
Inverse quantization
Inverse DCT
Prediction Mode Select
Prediction Calculation
Of these blocks, the most challenging will probably be the prediction mode select block, which
determines the optimal prediction mode by minimizing a cost such as the sum of absolute transformed differences (SATD) between the
predicted block and the actual image block.
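The cost used by a mode select block can be sketched as follows. This is a minimal illustration, not the author's implementation: it compares the plain sum of absolute differences (SAD) with the sum of absolute transformed differences (SATD), where the residual is passed through a 4x4 Hadamard transform before summing. The function names are hypothetical, and no normalisation factor is applied to the SATD (some encoders halve it).

```python
# 4x4 Hadamard matrix (entries +1/-1). It is symmetric, so H4 equals its transpose.
H4 = [
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
    [1, -1,  1, -1],
]

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def residual(cur, pred):
    """Element-wise difference between the current and the predicted block."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(cur, pred)]

def sad(cur, pred):
    """Sum of absolute differences."""
    return sum(abs(v) for row in residual(cur, pred) for v in row)

def satd(cur, pred):
    """Sum of absolute Hadamard-transformed differences (unnormalised)."""
    transformed = matmul(matmul(H4, residual(cur, pred)), H4)
    return sum(abs(v) for row in transformed for v in row)
```

A mode select block would evaluate such a cost for every candidate prediction and keep the mode with the lowest value.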
H.264 combines several tools, including an integer DCT-like transform, a Hadamard transform and quantization, to achieve
high-quality compression.
H.264 was designed to cope with lossy networks. It includes a Network Abstraction Layer (NAL) to facilitate streaming
video (with lost packets) and to minimize the amount of data that needs to be transferred.
A coded picture consists of a number of macroblocks, each containing 16x16 luma samples
and associated chroma samples (8x8 Cb and 8x8 Cr samples in the current standard).
Encoder
The Encoder includes two dataflow paths, a forward path (left to right) and a reconstruction path (right
to left).
Encoder (Forward Path):
Each macroblock is encoded in intra or inter mode and, for each block in the macroblock, a prediction
PRED (marked P in Figure 6.1) is formed based on reconstructed picture samples.
In Intra mode, PRED is formed from samples in the current slice that have previously been encoded, decoded
and reconstructed.
In Inter mode, PRED is formed by motion-compensated prediction from one or two reference picture(s)
selected from the set of list 0 and/or list 1 reference pictures.
The prediction PRED is subtracted from the current block to produce a residual (difference) block Dn that
is transformed (using a block transform) and quantised to give X, a set of quantised transform coefficients, which are reordered and entropy encoded.
The entropy-encoded coefficients, together with side information required to decode each block within
the macroblock (prediction modes, quantiser parameter, motion vector information, etc.) form the
compressed bitstream which is passed to a Network Abstraction Layer (NAL) for transmission or storage.
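The forward path for one 4x4 block can be sketched as below. This is a simplified illustration under stated assumptions: the matrix CF is the 4x4 integer core transform of H.264, but the quantiser here is a plain uniform one with a hypothetical step size `qstep`, whereas the standard folds transform scaling into its quantisation tables. Entropy coding is omitted and all function names are illustrative.

```python
# H.264 4x4 forward core transform matrix.
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def quantise(value, qstep):
    """Uniform quantiser with round-half-away-from-zero (an assumption)."""
    sign = -1 if value < 0 else 1
    return sign * int(abs(value) / qstep + 0.5)

def forward_block(cur, pred, qstep):
    """Residual Dn -> core transform -> quantised coefficients X."""
    dn = [[c - p for c, p in zip(cur_row, pred_row)]
          for cur_row, pred_row in zip(cur, pred)]
    w = matmul(matmul(CF, dn), transpose(CF))
    return [[quantise(v, qstep) for v in row] for row in w]
```

For a flat residual, all of the energy lands in the top-left (DC) coefficient, which is why well-predicted blocks compress so cheaply.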
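Encoder (Reconstruction Path):
As well as encoding and transmitting each block in a macroblock, the encoder decodes (reconstructs) it to
provide a reference for further predictions.
The coefficients X are rescaled and inverse transformed to produce a difference block, which is added to PRED to form a reconstructed block.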
A filter is applied to reduce the effects of blocking distortion and the reconstructed reference picture is
created from a series of blocks Fn.
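The point of the reconstruction path is that the encoder predicts from the same lossy samples the decoder will have, not from the original picture. The sketch below illustrates this with a deliberately reduced model: the transform is skipped and the residual is quantised directly with a hypothetical step size `qstep`, so it shows the closed-loop idea rather than the real H.264 arithmetic. The deblocking filter is also omitted.

```python
def clip8(value):
    """Clamp to the 8-bit sample range."""
    return max(0, min(255, value))

def quantise(value, qstep):
    """Uniform quantiser with round-half-away-from-zero (an assumption)."""
    sign = -1 if value < 0 else 1
    return sign * int(abs(value) / qstep + 0.5)

def encode_and_reconstruct(cur, pred, qstep):
    """Return (levels, recon): what is transmitted and what the decoder rebuilds."""
    levels = [[quantise(c - p, qstep) for c, p in zip(cur_row, pred_row)]
              for cur_row, pred_row in zip(cur, pred)]
    recon = [[clip8(p + level * qstep) for p, level in zip(pred_row, level_row)]
             for pred_row, level_row in zip(pred, levels)]
    return levels, recon
```

The `recon` block, not `cur`, is what would be stored as reference data for later predictions, so encoder and decoder stay in step.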
Intra prediction
In intra mode a prediction block P is formed based on previously encoded and reconstructed blocks and is
subtracted from the current block prior to encoding.
For the luma samples, P is formed for each 4x4 block or for a 16x16 macroblock.
There are a total of nine optional prediction modes for each 4x4 luma block, four modes for a 16x16
luma block and four modes for the chroma components.
The encoder typically selects the prediction mode for each block that minimises the difference between P
and the block to be encoded.
4x4 Luma Prediction Modes
The samples above and to the left have previously been encoded
and reconstructed and are therefore available in the encoder and
decoder to form a prediction reference.
The samples a, b, c, ..., p of the prediction block P (Figure 6.23)
are calculated based on the samples A-M.
For modes 3-8, the predicted samples are formed from a weighted average of the prediction samples A-M.
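As a rough sketch of how 4x4 prediction and mode choice fit together, the code below builds the three simplest 4x4 luma modes (0 vertical, 1 horizontal, 2 DC) and picks the one with the smallest SAD. It assumes the four samples above (A-D) and the four to the left (I-L) are all available; the directional modes 3-8 and the unavailable-neighbour cases are left out, and the function names are illustrative.

```python
def predict_4x4(mode, above, left):
    """Build a 4x4 prediction block from 4 samples above and 4 samples to the left."""
    if mode == 0:  # vertical: copy the row above downwards
        return [list(above) for _ in range(4)]
    if mode == 1:  # horizontal: copy the left column across
        return [[left[row]] * 4 for row in range(4)]
    if mode == 2:  # DC: rounded mean of the eight neighbouring samples
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0-2 are covered in this sketch")

def sad(block, pred):
    """Sum of absolute differences between two 4x4 blocks."""
    return sum(abs(b - p) for b_row, p_row in zip(block, pred)
               for b, p in zip(b_row, p_row))

def best_mode(block, above, left):
    """Return (mode, costs) where mode has the smallest SAD among modes 0-2."""
    costs = {mode: sad(block, predict_4x4(mode, above, left)) for mode in (0, 1, 2)}
    return min(costs, key=costs.get), costs
```

A block of vertical stripes that continues the row above is predicted exactly by mode 0, so only the mode number needs to be sent.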
16x16 luma prediction modes
As an alternative to the 4x4 luma modes described above, the entire 16x16 luma component of
a macroblock may be predicted.
Four modes are available :
Mode 0 (vertical): extrapolation from upper samples (H).
Mode 1 (horizontal): extrapolation from left samples (V).
Mode 2 (DC): mean of upper and left-hand samples (H+V).
Mode 3 (Plane): a linear plane function is fitted to the
upper and left-hand samples H and V. This works well in
areas of smoothly-varying luminance.
In addition to these two types of luma prediction, a separate chroma prediction is conducted.
As an alternative to Intra_4x4 and Intra_16x16, the I_PCM coding type allows the encoder to simply
bypass the prediction and transform coding processes and instead directly send the values of the encoded samples.
[Figure: block diagram of the intra encoder datapath. Labels: input from ITU 656, 4:4:4 to 4:2:0 scaler (16x16 Y, Cb, Cr in; 16x16 Y, 8x8 Cb, 8x8 Cr out), frame memory, Y/Cb/Cr macroblock fetchers, Y/Cb/Cr reference-pixel fetchers, MUX, P calculator with mode select input, A_MB memory, P_MB memory, DCT & Q, IDCT & Q^-1, reconstructed frame memory, control circuit, output to NAL.]
Prediction of Inter Macroblocks in P-slices
Inter prediction creates a prediction model from one or more previously encoded video frames.
The model is formed by shifting samples in the reference frame(s) (motion compensated prediction).
Important differences from earlier standards include the support for a range of block sizes (down to 4x4) and fine sub-pixel motion vectors (1/4 pixel in the luma component).
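Sub-pixel motion vectors point between integer samples, so the reference picture has to be interpolated. As a hedged sketch: H.264 derives a half-sample luma value lying between two integer samples with a 6-tap filter whose weights are (1, -5, 20, 20, -5, 1)/32, and quarter-sample values by averaging two neighbouring values. The function names below are illustrative; the sketch covers only positions on a row or column of integer samples, and frame-edge handling is omitted.

```python
def clip8(value):
    """Clamp to the 8-bit sample range."""
    return max(0, min(255, value))

def half_pel(e, f, g, h, i, j):
    """Half-sample value midway between integer samples g and h.

    e, f, g, h, i, j are six consecutive integer samples on one row or column.
    """
    return clip8((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)

def quarter_pel(a, b):
    """Quarter-sample value: rounded average of two adjacent sample values."""
    return (a + b + 1) >> 1
```

For example, across a sharp edge (three samples of 0 followed by three of 255) `half_pel` gives 128, and `quarter_pel(100, 128)` gives 114.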
Tree structured motion compensation:
AVC supports motion compensation block sizes ranging from 16x16 to 4x4 luminance samples with many options between the two.
The luminance component of each macroblock (16x16 samples) may be split up in 4 ways as shown in Figure 2-1: 16x16, 16x8, 8x16 or 8x8.
If the 8x8 mode is chosen, each of the four 8x8 macroblock partitions within the macroblock may be split in a further 4 ways as shown in Figure 2-2: 8x8, 8x4, 4x8 or 4x4 (known as macroblock sub-partitions). This is known as the tree structure.
A separate motion vector is required for each partition or sub-partition. Each motion vector must be coded and transmitted; in addition, the choice of partition(s) must be encoded in the compressed bitstream.
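The signalling cost of a partition choice can be made concrete by counting motion vectors. The sketch below assumes a P macroblock with one motion vector per partition or sub-partition; the mode strings and the function name are illustrative, not syntax from the standard.

```python
# Number of partitions produced by each macroblock split.
MB_PARTITIONS = {"16x16": 1, "16x8": 2, "8x16": 2, "8x8": 4}

# Number of sub-partitions produced by each split of one 8x8 partition.
SUB_PARTITIONS = {"8x8": 1, "8x4": 2, "4x8": 2, "4x4": 4}

def motion_vector_count(mb_mode, sub_modes=None):
    """Count the motion vectors needed for one macroblock.

    mb_mode   -- one of the keys of MB_PARTITIONS
    sub_modes -- for mb_mode "8x8", a list of four keys of SUB_PARTITIONS,
                 one per 8x8 partition
    """
    if mb_mode != "8x8":
        return MB_PARTITIONS[mb_mode]
    if sub_modes is None or len(sub_modes) != 4:
        raise ValueError("8x8 mode needs a sub-partition choice for each quadrant")
    return sum(SUB_PARTITIONS[mode] for mode in sub_modes)
```

A macroblock ranges from 1 motion vector (16x16) to 16 motion vectors (8x8 mode with every quadrant split into 4x4).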
Choosing a large partition size (e.g. 16x16, 16x8, 8x16) means that a small number of bits are required to
signal the choice of motion vector(s) and the type of partition; however, the motion compensated residual
may contain a significant amount of energy in areas of the frame with high detail. Choosing a small partition size has the opposite effect: the residual energy may be lower, but more bits are needed to signal the motion vectors and partition choices.
The choice of partition size therefore has a significant impact on compression.
In general, a large partition size is appropriate for homogeneous areas of the frame and a small partition size may be beneficial for detailed areas.
Each chroma block is partitioned in the same way as the luma component, except that the partition sizes
have exactly half the horizontal and vertical resolution (an 8x16 partition in luma corresponds to a 4x8
partition in chroma; an 8x4 partition in luma corresponds to 4x2 in chroma; and so on).
The accompanying figure shows a residual frame. The AVC reference encoder selects the best partition size for each part of the
frame, i.e. the partition size that minimizes the coded residual and motion vectors. The macroblock
partitions chosen for each area are shown superimposed on the residual frame. In areas where there is
little change between the frames (residual appears grey), a 16x16 partition is chosen; in areas of detailed
motion (residual appears black or white), smaller partitions are more efficient.
Instead of directly encoding the raw pixel values for each block, the encoder tries to find a block similar
to the one it is encoding in a previously encoded frame, referred to as the reference frame.
If the search succeeds, the block can be encoded by a vector, known as the MOTION
VECTOR, which points to the position of the matching block in the reference frame.
The encoder compares the block found in the reference frame with the block it is encoding and obtains the
differences between them. Those differences are known as the PREDICTION ERROR and need to be
transformed and sent to the decoder.
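The search described above can be sketched as an exhaustive (full-search) block match using SAD as the matching cost. This is one simple way to do it, not necessarily what any given encoder uses; frames are plain lists of rows, only integer-pixel offsets are tried, and all names are illustrative.

```python
def block_sad(cur, ref, bx, by, rx, ry, n):
    """SAD between the n x n block of cur at (bx, by) and of ref at (rx, ry)."""
    return sum(abs(cur[by + y][bx + x] - ref[ry + y][rx + x])
               for y in range(n) for x in range(n))

def full_search(cur, ref, bx, by, n, search_range):
    """Return ((dx, dy), cost) for the best match within +/-search_range pixels.

    Assumes the block at (bx, by) lies inside both frames, so the zero
    offset is always a valid candidate.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates that fall partly outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = block_sad(cur, ref, bx, by, rx, ry, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    cost, dx, dy = best
    return (dx, dy), cost
```

The returned offset is the motion vector; the sample-by-sample difference between the current block and the matched block is the prediction error.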
If the block matching algorithm fails to find a suitable
match, the prediction error will be considerable. The
overall size of motion vector plus prediction error may then
be greater than that of a raw encoding. In this case the encoder makes an exception and sends a raw
encoding for that specific block.
If the matched block in the reference frame has also been
encoded using inter frame prediction, the errors made
in its encoding will be propagated to the next block.
These drawbacks underline the need for reliable, periodically inserted reference frames for this technique to be
efficient and useful (I, B, P).
[Figure: block diagram of the inter encoder datapath. Labels: input from ITU 656, 4:4:4 to 4:2:0 scaler, frame memory, Y/Cb/Cr macroblock fetchers, reconstructed frame memories (list 0 and list 1), MUX, SAD calculator and block-matching algorithms, result memory, A_MB memory, P_MB memory, DCT & Q, IDCT & Q^-1, control circuit; outputs: MOTION VECTOR and PREDICTION ERROR.]
Slice Modes
Slice type | Description | Profile(s)
I (Intra) | Contains only I macroblocks (each block or macroblock is predicted from previously coded data within the same slice). | All
P (Predicted) | Contains P macroblocks (each macroblock or macroblock partition is predicted from one list 0 reference picture) and/or I macroblocks. | All
B (Bi-predictive) | Contains B macroblocks (each macroblock or macroblock partition is predicted from a list 0 and/or a list 1 reference picture) and/or I macroblocks. | Extended and Main
SP (Switching P) | Facilitates switching between coded streams; contains P and/or I macroblocks. | Extended
SI (Switching I) | Facilitates switching between coded streams; contains SI macroblocks (a special type of intra coded macroblock). | Extended
Most broadcast-quality applications, however, have tended to use 2 consecutive B frames (I, B, B, P, B, B, P, ...)
as the ideal trade-off between compression efficiency and video quality.
The main advantage of B frames is coding efficiency. Backward prediction in this case allows
the encoder to make more intelligent decisions on how to encode the video within these areas. Also, since
B frames are not used to predict future frames, errors generated will not be propagated further within the sequence.
One disadvantage of B frame is that the frame reconstruction memory buffers within the encoder and
decoder must be doubled in size to accommodate the 2 anchor frames.
I B B B P B B B P B B B P B B B P .
I frame
An intra frame (I frame) is essentially the first frame to be encoded, and it is compressed less than the other frame types.
This frame is also known as a key frame because the following frames are encoded using the information
available from this frame.
Intra-prediction utilizes spatial correlation in each frame to reduce the amount of transmission data
necessary to represent the picture.
Intra-frame coding is broadly similar to still-image compression such as JPEG or GIF.
An I frame is coded without any dependency on other frames.
Intra prediction
H.264 performs intra-prediction on two different sized blocks:
16x16 (the entire macroblock) and 4x4.
16x16 prediction is generally chosen for areas of the picture that are
smooth. 4x4 prediction, on the other hand, is useful for predicting
more detailed sections of the frame.
The general idea is to predict a block, whether it be a 4x4 or 16x16
block, based on surrounding pixels using a mode that results in a
prediction that most closely resembles the actual pixels in
that block.
P-frame
P-frames are predicted by using the previous P or I-frame.
This type of frame is responsible for most of the size reduction
of the video stream.
Motion estimation is the process of selecting an offset to
a suitable reference area in a previously coded frame.
Motion estimation is carried out in the video encoder (not in the decoder).
A good choice of prediction reference minimises the energy in the motion-compensated residual which in
turn maximises compression performance.
However, finding the best offset can be a very computationally intensive procedure.
The goal of a practical motion estimation algorithm is to find a vector that minimises the residual energy
after motion compensation while keeping the computational complexity within acceptable limits.
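To give a feel for why complexity matters, the arithmetic below counts the work of an exhaustive integer-pixel search. It is a back-of-the-envelope sketch: it assumes every offset in a square +/-R window is tested for a 16x16 block and ignores clipping at frame edges; the function names are illustrative.

```python
def full_search_candidates(search_range):
    """Number of integer-pixel offsets in a +/-search_range square window."""
    return (2 * search_range + 1) ** 2

def sad_operations_per_macroblock(search_range, block_size=16):
    """Absolute-difference operations needed to test every candidate offset."""
    return full_search_candidates(search_range) * block_size * block_size
```

With a +/-16 pixel window there are 1089 candidate offsets and 278,784 absolute differences per macroblock, before any sub-pixel refinement or smaller partitions are considered. Practical algorithms therefore test only a subset of the candidates.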
B-frame
B-frames are bidirectionally predicted frames, i.e.
B-frames rely on the frames preceding and following
them.
B-frames contain only the data that have changed
from the preceding frame or are different from the
data in the very next frame.
B-frames are interesting for two reasons:
1. They give slightly better prediction.
2. More importantly, they do not affect the quality of following frames, so they can be coded with lower quality
without degrading the whole sequence.
Since B-frames depend on both past and future pictures, the decoder has to be fed the future I or P frames before
it can decode them.
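The reordering this implies can be sketched as follows. The code assumes the classic pattern in which each B-frame depends only on the nearest I or P frame before and after it in display order; H.264 itself permits more flexible referencing, so this is an illustration rather than a rule of the standard. The function name is hypothetical.

```python
def coding_order(display_types):
    """Reorder frames from display order to transmission (coding) order.

    display_types -- string of frame types in display order, e.g. "IBBPBBP"
    Returns a list of (display_index, frame_type) in coding order.
    """
    order = []
    pending_b = []  # B-frames waiting for their future anchor frame
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append((index, frame_type))
        else:
            # An I or P anchor is sent first, then the B-frames that need it.
            order.append((index, frame_type))
            order.extend(pending_b)
            pending_b = []
    # Trailing B-frames have no later anchor in this sequence; they are
    # appended as-is and would wait for the next anchor in a real stream.
    order.extend(pending_b)
    return order
```

For the display sequence I B B P B B P, the frames are sent as I, P, B, B, P, B, B (display indices 0, 3, 1, 2, 6, 4, 5).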
[Figure: relative sizes of I, P and B frames]
The most important improvements of H.264 inter prediction over previous standards are:
More flexible block partition,
Motion compensation with a resolution of up to 1/4 pixel,
Multiple references,
Enhanced Direct/Skip Macroblock.
Macro Blocks
The human eye senses color and brightness
with different sets of sensors.
The compression algorithm first transforms the image
from RGB to the luminance/chrominance (Y-Cb-Cr) color
space.
Here Y, called luma, represents the brightness (grayscale),
and Cb and Cr are the two color components that represent the
extent to which the color deviates from gray toward blue
and red, respectively.
Since the human visual system is more sensitive to luma than to chroma, each chroma component is given one quarter of the number of
samples of the luma component.
This is done by downsampling to half the number of samples in both the horizontal and vertical dimensions.
This is called 4:2:0 sampling, here with 8 bits of precision per sample.
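The colour conversion and the chroma downsampling can be sketched as below. Two assumptions are made: the conversion uses the common full-range 8-bit form Y = 0.299R + 0.587G + 0.114B, Cb = 128 + 0.564(B - Y), Cr = 128 + 0.713(R - Y), and the downsampling averages each 2x2 group of chroma samples, which is one simple choice among several possible filters. Function names are illustrative.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to 8-bit Y, Cb, Cr (full-range assumption)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 + 0.564 * (b - y)
    cr = 128 + 0.713 * (r - y)
    return tuple(max(0, min(255, int(round(v)))) for v in (y, cb, cr))

def subsample_420(plane):
    """Halve a chroma plane in both dimensions by averaging 2x2 groups.

    plane -- list of rows with even width and height
    """
    return [
        [(plane[y][x] + plane[y][x + 1]
          + plane[y + 1][x] + plane[y + 1][x + 1] + 2) >> 2
         for x in range(0, len(plane[0]), 2)]
        for y in range(0, len(plane), 2)
    ]
```

Applied to a 16x16 macroblock, the luma plane keeps all 256 samples while each chroma plane shrinks to 8x8 = 64 samples, one quarter of the luma count.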
[Figure: 4:4:4, 4:2:2 and 4:2:0 sampling patterns]
Picture structure
Processing is done at the macroblock (MB) level.
At the top level, an H.264 sequence consists of a series of packets or Network Abstraction Layer Units
(NAL Units or NALUs).
These can include parameter sets (containing key parameters that are used by the decoder to correctly
decode the video data) and slices (coded video frames or parts of video frames).
At the next level, a slice represents all or part of a coded video frame and consists of a number of coded
macroblocks, each containing compressed data corresponding to a 16x16 block of displayed pixels in a
video frame.
At the lowest level of Figure 16, a macroblock contains type information (describing the particular choice
of methods used to code the macroblock), prediction information (coded motion vectors or intra
prediction mode information) and coded residual data.
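At the top of this hierarchy, each NAL unit begins with a one-byte header that tells the decoder what the unit carries. The sketch below splits that byte into its three fields (1-bit forbidden_zero_bit, 2-bit nal_ref_idc, 5-bit nal_unit_type); the name table lists only a few common types and the function names are illustrative.

```python
# A few common nal_unit_type values; the list is intentionally incomplete.
NAL_TYPE_NAMES = {
    1: "coded slice (non-IDR)",
    5: "coded slice (IDR)",
    7: "sequence parameter set",
    8: "picture parameter set",
}

def parse_nal_header(first_byte):
    """Split the first byte of a NAL unit into its header fields."""
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x01,
        "nal_ref_idc": (first_byte >> 5) & 0x03,
        "nal_unit_type": first_byte & 0x1F,
    }

def describe_nal(first_byte):
    """Return a readable name for the unit type, or 'other' if not listed."""
    unit_type = parse_nal_header(first_byte)["nal_unit_type"]
    return NAL_TYPE_NAMES.get(unit_type, "other")
```

For instance, a header byte of 0x67 decodes to nal_ref_idc 3 and nal_unit_type 7, a sequence parameter set, while 0x65 is an IDR slice.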