Video Compression
Fall 2011, Hongli Luo
Video Compression
Image compression: reduces spatial redundancy.
Video compression:
• Spatial redundancy exists within each frame, as in still images.
• Temporal redundancy exists between frames and can be exploited for compression.
• Video compression therefore reduces both spatial redundancy within a frame and temporal redundancy between frames.
Each video frame can be encoded differently, depending on whether it exploits spatial redundancy or temporal redundancy:
• Intraframe
• Interframe
Intraframe and Interframe
Intraframe: each frame is encoded as an individual image, using image compression techniques, e.g., DCT.
Interframe: predictive encoding between frames in the temporal domain.
• Instead of coding the current frame directly, the encoder codes the difference between the current frame and a prediction based on previous frames.
• Uses motion compensation.
Intraframe Coding
The frames are compressed using:
• Lossy compression, e.g., DCT or subsampling, followed by quantization
• Lossless entropy compression, e.g., Huffman or arithmetic coding
MPEG/ITU standards compress intraframes similarly to the JPEG image standard:
1. Divide the frame into 8 x 8 blocks.
2. Apply the DCT to each block.
3. Quantize the coefficients.
4. Zigzag-scan the AC coefficients.
5. Apply DPCM to the DC coefficients.
6. Apply run-length coding to the AC coefficients.
7. Apply Huffman or arithmetic coding.
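The intraframe steps listed above can be sketched in pure Python for a single 8 x 8 block. This is a minimal illustration, not the standard's exact procedure: the single uniform quantization step, the `(zero_run, value)` pair format and the `"EOB"` marker are our own simplifying assumptions (real codecs use quantization tables and standardized variable-length codes), and the final Huffman/arithmetic stage is omitted.

```python
import math

N = 8  # block size used for intraframe transform coding

def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * s
    return out

def quantize(coeffs, step):
    """Uniform quantization with one step size (an assumption; codecs use per-coefficient tables)."""
    return [[round(c / step) for c in row] for row in coeffs]

def zigzag(block):
    """Read an N x N block in zigzag order: DC first, then AC by increasing frequency."""
    def key(pos):
        r, c = pos
        d = r + c
        # even anti-diagonals run up-right (column ascending), odd ones down-left (row ascending)
        return (d, c if d % 2 == 0 else r)
    order = sorted(((r, c) for r in range(N) for c in range(N)), key=key)
    return [block[r][c] for r, c in order]

def dpcm_dc(dc_values):
    """DPCM on DC coefficients: each block's DC minus the previous block's DC."""
    prev, out = 0, []
    for dc in dc_values:
        out.append(dc - prev)
        prev = dc
    return out

def run_length_ac(ac):
    """(zero_run, value) pairs for the AC coefficients, closed by an 'EOB' marker."""
    pairs, run = [], 0
    for a in ac:
        if a == 0:
            run += 1
        else:
            pairs.append((run, a))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied by end-of-block
    return pairs

# Usage: a flat 8 x 8 block puts all its energy into the DC coefficient.
flat = [[128] * N for _ in range(N)]
zz = zigzag(quantize(dct2(flat), 16))
print(zz[0], run_length_ac(zz[1:]))  # 64 ['EOB']
```

The flat block shows why the zigzag scan and run-length coding pay off: after quantization almost every AC coefficient is zero, so the whole block collapses to one DC value and an end-of-block marker.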
Interframe Coding
How does a pixel value change from one frame to the next?
• No change, e.g., background
• Slight changes due to quantization
• Changes due to motion of the object
• Changes due to motion of the camera
• Changes due to environment and lighting
No change: nothing needs to be coded.
Changes due to motion of the object or camera: predict how the pixel has moved and encode the displacement (motion) vector.
Video Compression with Motion Compensation
Consecutive frames in a video are similar: temporal redundancy exists.
Temporal redundancy is exploited so that not every frame of the video needs to be coded independently as a new image.
The difference between the current frame and other frame(s) in the sequence is coded instead; it has small values and low entropy, which is good for compression.
Steps of video compression based on Motion Compensation (MC):
1. Motion Estimation (motion vector search).
2. Motion-Compensation-based Prediction.
3. Derivation of the prediction error, i.e., the difference.
Video Compression Based on Motion Compensation
Each image is divided into macroblocks of size N x N. By default:
• For luminance images, N = 16.
• For chrominance images, N = 8 if 4:2:0 chroma subsampling is adopted.
Motion compensation is performed at the macroblock level.
• The current image frame is referred to as the Target Frame.
• A match is sought between the macroblock in the Target Frame and the most similar macroblock in previous and/or future frame(s) (referred to as Reference frame(s)).
• The displacement of the reference macroblock to the target macroblock is called a motion vector (MV).
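Assume the color of pixel $(x, y)$ in the Target Frame is the same as, or very similar to, the color of pixel $(x_0, y_0)$ in the Reference Frame.
The displacement, or motion vector, is $d = (d_x, d_y)$, where

$$(x, y) = (x_0 + d_x,\; y_0 + d_y)$$
$$d = (d_x, d_y) = (x - x_0,\; y - y_0) = (x, y) - (x_0, y_0)$$

that is, $d_x = x - x_0$ and $d_y = y - y_0$.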
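A quick numeric check of these definitions, using made-up positions for illustration: suppose the block whose top-left pixel sits at $(x_0, y_0) = (32, 48)$ in the Reference Frame best matches the block at $(x, y) = (35, 46)$ in the Target Frame. Then

```latex
d = (d_x, d_y) = (x - x_0,\; y - y_0) = (35 - 32,\; 46 - 48) = (3, -2)
```

Assuming image coordinates with y increasing downward, the block moved 3 pixels to the right and 2 pixels up.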
Motion Estimation and Compensation
Motion Estimation: for a given macroblock of pixels in the current frame (the Target Frame), find the most similar macroblock in a reference frame (a previous or future frame) within a specified search area.
• Motion vector search is usually limited to a small immediate neighborhood: both horizontal and vertical displacements lie in the range [−p, p], giving (2p + 1) x (2p + 1) candidate positions.
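The motion vector search described above can be sketched as an exhaustive ("full") search that scores every displacement in [−p, p] x [−p, p] with the sum of absolute differences (SAD). SAD as the matching criterion, the `sad`/`full_search` names and the default p = 7 are illustrative assumptions; the slides do not fix a criterion. The code follows the slides' convention d = (x, y) − (x0, y0), so the candidate reference block for MV (dx, dy) lies at (tx − dx, ty − dy).

```python
import random

def sad(target, reference, tx, ty, rx, ry, n):
    """Sum of absolute differences between the n x n target block at (tx, ty)
    and the n x n reference block at (rx, ry). Frames are lists of rows: frame[y][x]."""
    return sum(abs(target[ty + j][tx + i] - reference[ry + j][rx + i])
               for j in range(n) for i in range(n))

def full_search(target, reference, tx, ty, n=16, p=7):
    """Exhaustive motion vector search over dx, dy in [-p, p].

    Uses d = (x, y) - (x0, y0): the candidate reference block for motion
    vector (dx, dy) sits at (tx - dx, ty - dy). Returns ((dx, dy), best_sad)."""
    height, width = len(reference), len(reference[0])
    best_mv, best_cost = None, None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            rx, ry = tx - dx, ty - dy
            if not (0 <= rx <= width - n and 0 <= ry <= height - n):
                continue  # skip candidates that fall outside the reference frame
            cost = sad(target, reference, tx, ty, rx, ry, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost

# Usage: build a target frame whose content moved by d = (3, -2) and recover it.
rng = random.Random(1)
W = H = 48
ref = [[rng.randrange(256) for _ in range(W)] for _ in range(H)]
tgt = [[ref[min(max(y + 2, 0), H - 1)][min(max(x - 3, 0), W - 1)] for x in range(W)]
       for y in range(H)]
print(full_search(tgt, ref, 16, 16, n=16, p=7))
```

Full search is the simplest and most expensive strategy; it is guaranteed to find the minimum-SAD candidate inside the window, which makes it a useful baseline for faster search methods.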
Motion Compensation: the target macroblock is predicted from the reference macroblock, i.e., the motion vectors are used to compensate the picture.
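Once a motion vector is known, the encoder side of motion compensation amounts to copying the displaced reference block as the prediction and subtracting it from the target block. This sketch assumes frames stored as lists of rows and MVs that keep the block inside the frame; `predict_block` and `difference_block` are hypothetical names, and the sign convention is the slides' d = (x, y) − (x0, y0).

```python
def predict_block(reference, tx, ty, mv, n=16):
    """Motion-compensated prediction for the n x n target block at (tx, ty):
    copy the reference block at (tx - dx, ty - dy), per d = (x, y) - (x0, y0)."""
    dx, dy = mv
    rx, ry = tx - dx, ty - dy
    return [row[rx:rx + n] for row in reference[ry:ry + n]]

def difference_block(target, reference, tx, ty, mv, n=16):
    """Prediction error (difference macroblock) = target block - prediction."""
    pred = predict_block(reference, tx, ty, mv, n)
    return [[target[ty + j][tx + i] - pred[j][i] for i in range(n)]
            for j in range(n)]

# Usage with tiny 4 x 4 frames and 2 x 2 blocks.
ref = [[4 * r + c for c in range(4)] for r in range(4)]
tgt = [[0] * 4 for _ in range(4)]
tgt[2][2:4] = [6, 7]
tgt[3][2:4] = [10, 11]
print(predict_block(ref, 2, 2, (1, 1), n=2))          # [[5, 6], [9, 10]]
print(difference_block(tgt, ref, 2, 2, (1, 1), n=2))  # [[1, 1], [1, 1]]
```

Only the motion vector and the small, low-entropy difference block need to be transmitted for this macroblock.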
Simple Motion Example
Consider a simple example: a moving circle. Instead of coding the current frame, code the difference between the two frames. The difference needs fewer bits to encode.
From Multimedia CM0340 David Marshall
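The claim that the frame difference needs fewer bits can be checked on a toy frame. To keep the arithmetic exact, this sketch uses a bright square instead of the circle; the frame size, square size and one-pixel motion are arbitrary illustrative choices, and `draw_square`/`frame_difference` are hypothetical helpers.

```python
def draw_square(width, height, x0, y0, size, value=255):
    """Frame (list of rows) with a bright square on a black background."""
    return [[value if x0 <= x < x0 + size and y0 <= y < y0 + size else 0
             for x in range(width)] for y in range(height)]

def frame_difference(prev, cur):
    """Pixel-wise difference cur - prev."""
    return [[c - p for p, c in zip(prow, crow)] for prow, crow in zip(prev, cur)]

prev = draw_square(16, 16, 4, 4, 6)
cur = draw_square(16, 16, 5, 4, 6)  # the object moved 1 pixel to the right
diff = frame_difference(prev, cur)
nonzero = sum(1 for row in diff for v in row if v != 0)
print(nonzero, "of", 16 * 16, "difference pixels are nonzero")  # 12 of 256
```

Only the leading and trailing edges of the moving object survive in the difference picture, which is why it codes much more cheaply than the frame itself.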
Estimate Motion of Blocks
Estimate the motion of the object, encode the motion vectors and difference picture.
Decode Motion of Blocks
Use the motion vector and difference picture for decoding.
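On the decoder side, each macroblock is rebuilt as the displaced reference block plus the decoded difference block. This minimal sketch assumes frame sizes that are multiples of the block size, one MV per macroblock keyed by its top-left corner, and MVs that stay inside the frame; `decode_p_frame` is a hypothetical name and the sign convention is the slides' d = (x, y) − (x0, y0).

```python
def decode_p_frame(reference, mvs, diff, n=16):
    """Rebuild a P-frame from the reference frame, one motion vector per n x n
    macroblock (keyed by the block's top-left corner), and the difference picture.

    Assumes frame sizes are multiples of n and every MV points inside the frame."""
    height, width = len(reference), len(reference[0])
    out = [[0] * width for _ in range(height)]
    for by in range(0, height, n):
        for bx in range(0, width, n):
            dx, dy = mvs[(bx, by)]
            rx, ry = bx - dx, by - dy  # where this block came from in the reference
            for j in range(n):
                for i in range(n):
                    out[by + j][bx + i] = reference[ry + j][rx + i] + diff[by + j][bx + i]
    return out

# Usage: 4 x 4 frame, 2 x 2 macroblocks; only the bottom-right block moved.
ref = [[4 * r + c for c in range(4)] for r in range(4)]
mvs = {(0, 0): (0, 0), (2, 0): (0, 0), (0, 2): (0, 0), (2, 2): (1, 1)}
diff = [[0] * 4 for _ in range(4)]
diff[3][3] = 1
print(decode_p_frame(ref, mvs, diff, n=2))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 5, 6], [12, 13, 9, 11]]
```

Because the decoder performs no search, decoding is much cheaper than encoding; the expensive motion estimation happens only in the encoder.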
Motion Estimation and Compensation
Advantage:
• Motion estimation and compensation reduce the video bitrate significantly.
• After the first frame, only the motion vectors and difference macroblocks need to be coded.
Disadvantage: extra computational complexity.
• Motion estimation is the most computationally expensive part of a video encoder.
• Reference pictures (previous or future frames) must be buffered.
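A rough count shows why motion estimation dominates encoder cost. With full search, each macroblock is compared against (2p + 1)^2 candidates, each touching N x N pixel pairs. The frame size (352 x 288), p = 15 and 30 fps below are illustrative assumptions, and the count ignores any early termination or fast-search tricks.

```python
def full_search_cost(width, height, n=16, p=15, fps=30):
    """Rough pixel-comparison count for exhaustive (full) motion search.

    Each macroblock is compared with (2p + 1)^2 candidates, and each comparison
    touches n * n pixel pairs (a subtraction, an absolute value and an addition)."""
    macroblocks = (width // n) * (height // n)
    per_macroblock = (2 * p + 1) ** 2 * n * n
    per_second = macroblocks * per_macroblock * fps
    return macroblocks, per_macroblock, per_second

mbs, per_mb, per_s = full_search_cost(352, 288)
print(mbs, per_mb, f"{per_s:,}")  # 396 246016 2,922,670,080
```

Nearly three billion pixel comparisons per second for a modest resolution is the motivation for the faster, suboptimal search methods used in practice.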
Video Compression Standard
Image, Video and Audio compression standards have been specified by two major groups since 1985
ISO (International Organization for Standardization): JPEG, MPEG
• MPEG-1, MPEG-2, MPEG-4, MPEG-7, MPEG-21
ITU (International Telecommunication Union): H.261, H.263, H.264
• H.264 was developed by the Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG.
H.261
H.261 is an early digital video compression standard; its principle of MC-based compression is retained in all later video compression standards.
The standard was designed for videophone, video conferencing and other audiovisual services over ISDN.
The video codec supports bit-rates of p x 64 kbps, where p ranges from 1 to 30 (hence it is also known as p x 64).
The standard requires the delay of the video encoder to be less than 150 msec, so that the video can be used for real-time bidirectional video conferencing.
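The p x 64 rule is simple arithmetic, sketched below; `h261_bitrate_kbps` is a hypothetical helper name. With p from 1 to 30, the channel rate spans 64 to 1,920 kbps, which matches the range of the H.221 frame structure listed with the related ITU recommendations.

```python
def h261_bitrate_kbps(p):
    """Bit-rate of an H.261 channel: p x 64 kbps, with 1 <= p <= 30."""
    if not 1 <= p <= 30:
        raise ValueError("p must be between 1 and 30")
    return p * 64

print(h261_bitrate_kbps(1), h261_bitrate_kbps(2), h261_bitrate_kbps(30))  # 64 128 1920
```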
ITU Recommendations & H.261 Video Formats
H.261 belongs to the following set of ITU recommendations for visual telephony systems:
• H.221: Frame structure for an audiovisual channel supporting 64 to 1,920 kbps.
• H.230: Frame control signals for audiovisual systems.
• H.242: Audiovisual communication protocols.
• H.261: Video encoder/decoder for audiovisual services at p x 64 kbps.
• H.320: Narrow-band audiovisual terminal equipment for p x 64 kbps transmission.
H.261 Frame Sequence
Two types of image frames are defined: Intra-frames (I-frames) and Inter-frames (P-frames).
• I-frames are treated as independent images. A transform coding method similar to JPEG is applied within each I-frame, hence “Intra”.
• P-frames are not independent: they are coded by a forward predictive coding method (prediction from a previous P-frame is allowed, not just from a previous I-frame).
• Temporal redundancy removal is included in P-frame coding, whereas I-frame coding performs only spatial redundancy removal.
• To avoid propagation of coding errors, an I-frame is usually sent a couple of times in each second of the video.
Intra-frame (I-frame) Coding
Macroblocks are of size 16 x 16 pixels for the Y frame, and 8 x 8 for the Cb and Cr frames, since 4:2:0 chroma subsampling is employed.
A macroblock consists of four Y, one Cb, and one Cr 8 x 8 blocks. A DCT is applied to each 8 x 8 block, and the DCT coefficients then go through quantization, zigzag scan and entropy coding.
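The macroblock layout above can be made concrete by splitting one 4:2:0 macroblock into its six 8 x 8 blocks. The raster ordering of the four Y blocks (top-left, top-right, bottom-left, bottom-right) and the name `split_macroblock` are assumptions made for this sketch.

```python
def split_macroblock(y16, cb8, cr8):
    """Split a 4:2:0 macroblock into its six 8 x 8 blocks: four Y blocks
    (raster order: top-left, top-right, bottom-left, bottom-right), then Cb, then Cr."""
    y_blocks = [[row[bx:bx + 8] for row in y16[by:by + 8]]
                for by in (0, 8) for bx in (0, 8)]
    return y_blocks + [cb8, cr8]

y16 = [[16 * r + c for c in range(16)] for r in range(16)]
cb8 = [[0] * 8 for _ in range(8)]
cr8 = [[1] * 8 for _ in range(8)]
blocks = split_macroblock(y16, cb8, cr8)
samples = sum(len(b) * len(b[0]) for b in blocks)
print(len(blocks), samples)  # 6 384
```

The 384 samples per macroblock are half of the 768 that three full-resolution 16 x 16 planes would need, which is the saving 4:2:0 subsampling provides before any transform coding.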
Inter-frame (P-frame) Predictive Coding
Figure 10.6 shows the H.261 P-frame coding scheme based on motion compensation:
• For each macroblock in the Target frame, a motion vector is allocated by one of the search methods discussed earlier.
• After the prediction, a difference macroblock is derived to measure the prediction error.
• Each of its 8 x 8 blocks goes through DCT, quantization, zigzag scan and entropy coding procedures.
Inter-frame (P-frame) Predictive Coding
The P-frame coding encodes the difference macroblock (not the Target macroblock itself).
Sometimes a good match cannot be found, i.e., the prediction error exceeds a certain acceptable level. The MB itself is then encoded (treated as an Intra MB); in this case it is termed a non-motion-compensated MB.
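The intra/inter choice described above can be sketched as a threshold test on the prediction error. The mean-absolute-error criterion, the threshold of 8.0 and the name `choose_mb_mode` are hypothetical illustrations; the slides only say that the error must exceed "a certain acceptable level", and real encoders use their own decision rules.

```python
def choose_mb_mode(target_block, predicted_block, max_mean_abs_error=8.0):
    """Hypothetical mode decision for one macroblock.

    If the motion-compensated prediction is good enough, return the difference
    block (inter mode); otherwise signal that the MB should be coded as an Intra
    (non-motion-compensated) MB. The threshold is an illustrative choice."""
    n_pixels = len(target_block) * len(target_block[0])
    sad = sum(abs(t - p) for trow, prow in zip(target_block, predicted_block)
              for t, p in zip(trow, prow))
    if sad / n_pixels > max_mean_abs_error:
        return "intra", None
    diff = [[t - p for t, p in zip(trow, prow)]
            for trow, prow in zip(target_block, predicted_block)]
    return "inter", diff

blk = [[100, 102], [98, 101]]
print(choose_mb_mode(blk, [[100, 100], [100, 100]]))  # ('inter', [[0, 2], [-2, 1]])
print(choose_mb_mode(blk, [[0, 0], [0, 0]]))          # ('intra', None)
```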
In fact, even the motion vector is not directly coded.
The difference, MVD, between the motion vectors of the preceding macroblock and the current macroblock is sent for entropy coding:

$$\mathrm{MVD} = \mathrm{MV}_{\text{Preceding}} - \mathrm{MV}_{\text{Current}} \qquad (10.3)$$
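Differential MV coding pays off because neighboring macroblocks tend to move together, so most MVDs are zero or small. This sketch uses the slide's sign convention, MVD = MV_Preceding − MV_Current, and assumes the predecessor of the first macroblock is (0, 0); how a real codec resets the predictor (e.g., at row starts) is not covered here. The function names are hypothetical.

```python
def encode_mvds(mvs):
    """MVD = MV_preceding - MV_current, per the slide's convention.
    The first macroblock's predecessor is assumed to be (0, 0)."""
    prev, mvds = (0, 0), []
    for mv in mvs:
        mvds.append((prev[0] - mv[0], prev[1] - mv[1]))
        prev = mv
    return mvds

def decode_mvds(mvds):
    """Invert encode_mvds: MV_current = MV_preceding - MVD."""
    prev, mvs = (0, 0), []
    for d in mvds:
        mv = (prev[0] - d[0], prev[1] - d[1])
        mvs.append(mv)
        prev = mv
    return mvs

mvs = [(3, -2), (3, -2), (4, -2), (4, -1)]
print(encode_mvds(mvs))  # [(-3, 2), (0, 0), (-1, 0), (0, -1)]
```

Four motion vectors with magnitudes up to 4 become mostly zero or unit differences, which a variable-length code can represent with very few bits.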
H.263
H.263 is an improved video coding standard for video conferencing and other audiovisual services transmitted on Public Switched Telephone Networks (PSTN).
It aims at low bit-rate communication at bit-rates of less than 64 kbps.
It uses predictive coding for inter-frames to reduce temporal redundancy, and transform coding for the remaining signal to reduce spatial redundancy (for both intra-frames and inter-frame prediction).
MPEG-1
MPEG: Moving Picture Experts Group, established in 1988 for the development of digital video.
MPEG-1 adopts SIF (Source Input Format), a reduced-resolution format derived from the CCIR 601 digital TV format.
MPEG-1 supports only non-interlaced video. Normally, its picture resolution is:
• 352 x 240 for NTSC video at 30 fps
• 352 x 288 for PAL video at 25 fps
It uses 4:2:0 chroma subsampling.
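The two picture formats listed above work out to the same uncompressed data rate, which a little arithmetic confirms. This sketch assumes 8 bits per sample and 16 x 16 macroblocks; with 4:2:0 subsampling each pixel carries 1.5 samples on average, i.e., 12 bits. `sif_stats` is a hypothetical helper name.

```python
def sif_stats(width, height, fps, n=16):
    """Macroblocks per frame and uncompressed bit-rate for 8-bit 4:2:0 video.

    With 4:2:0 subsampling each pixel carries one Y sample plus a quarter of a Cb
    and a quarter of a Cr sample: 1.5 samples x 8 bits = 12 bits per pixel."""
    macroblocks = (width // n) * (height // n)
    raw_bps = width * height * 12 * fps
    return macroblocks, raw_bps

print(sif_stats(352, 240, 30))  # (330, 30412800)  NTSC
print(sif_stats(352, 288, 25))  # (396, 30412800)  PAL
```

Both formats need about 30.4 Mbps before compression, so the MC-based coding described next has to remove most of that data.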
Motion Compensation in MPEG-1
Motion Compensation (MC) based video encoding in H.261 works as follows:
• In Motion Estimation (ME), each macroblock (MB) of the Target P-frame is assigned a best matching MB from the previously coded I- or P-frame; this is the prediction.
• Prediction error: the difference between the MB and its matching MB, which is sent to the DCT and its subsequent encoding steps.
• The prediction is from a previous frame: forward prediction.
• The MB containing part of a ball in the Target frame cannot find a good matching MB in the previous frame, because half of the ball was occluded by another object.
• A match, however, can readily be obtained from the next frame.
Motion Compensation in MPEG-1 (Cont'd)
MPEG introduces a third frame type, the B-frame, and its accompanying bi-directional motion compensation.
The MC-based B-frame coding idea is illustrated in Fig. 11.2:
• Each MB from a B-frame will have up to two motion vectors (MVs): one from the forward and one from the backward prediction.
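With up to two MVs per macroblock, a B-frame encoder can choose between forward, backward and interpolated prediction. This sketch picks whichever gives the smallest SAD; the SAD criterion, the round-half-up averaging for the interpolated mode and the function names are assumptions for illustration, and the two input blocks stand for the blocks that the forward and backward MVs point to.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def b_frame_prediction(target, forward_pred, backward_pred):
    """Pick the best of forward, backward and interpolated prediction for one MB.

    forward_pred / backward_pred are the blocks the forward and backward motion
    vectors point to; the interpolated mode averages them (rounding half up)."""
    interpolated = [[(f + b + 1) // 2 for f, b in zip(fr, br)]
                    for fr, br in zip(forward_pred, backward_pred)]
    candidates = {"forward": forward_pred, "backward": backward_pred,
                  "interpolated": interpolated}
    mode = min(candidates, key=lambda m: sad(target, candidates[m]))
    return mode, candidates[mode]

target = [[10, 20], [30, 40]]
print(b_frame_prediction(target, [[8, 18], [28, 38]], [[12, 22], [32, 42]])[0])  # interpolated
print(b_frame_prediction(target, [[0, 0], [0, 0]], target)[0])  # backward (e.g., occlusion)
```

The second call mirrors the occluded-ball case: the previous frame offers no good match, but the next frame does, so backward prediction wins.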
Group of Pictures (GOP): starts with an I-frame, followed by B- and P-frames.
This GOP has 9 frames, with the structure IBBPBBPBB; the next GOP again starts with an I-frame.
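Because a B-frame needs the following anchor (I or P) as its backward reference, frames cannot be transmitted in display order. This sketch reorders a display-order sequence so that each anchor is sent before the B-frames that precede it; including the next GOP's I-frame in the input and the name `coding_order` are assumptions for illustration.

```python
def coding_order(display_types):
    """Reorder frames from display order to coding/transmission order.

    A B-frame needs the *next* anchor (I or P) as its backward reference, so each
    anchor is sent before the B-frames that precede it in display order.
    Returns (display_index, frame_type) pairs."""
    order, pending_b = [], []
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append((index, frame_type))
        else:  # I or P anchor
            order.append((index, frame_type))
            order.extend(pending_b)
            pending_b = []
    order.extend(pending_b)  # trailing B-frames with no later anchor in the input
    return order

# Display order of one GOP plus the I-frame that starts the next GOP.
order = coding_order("IBBPBBPBBI")
print("".join(t for _, t in order))  # IPBBPBBIBB
print([i for i, _ in order])         # [0, 3, 1, 2, 6, 4, 5, 9, 7, 8]
```

This reordering is also the source of the extra buffering noted for B-frames: the decoder must hold both anchors before it can reconstruct the B-frames between them.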
MPEG-1 Frames
Coding mechanism similar to H.261. Three types of frames:
• I-frames, coded in intra-frame mode
• P-frames, coded with motion compensation using a previous I- or P-frame as reference
• B-frames, coded with bidirectional motion compensation based on previous and/or future I- or P-frames
B-frames
Advantages:
• Coding efficiency: most B-frames use fewer bits.
• Less error propagation: B-frames are not used to predict future frames, so errors generated in them are not propagated further within the sequence.
Disadvantage:
• Frame reconstruction memory buffers within the encoder and decoder must be doubled in size to accommodate the two anchor frames.
Other MPEG
MPEG-2: for higher quality video at a bit-rate of more than 4 Mbps. Originally designed as a standard for digital broadcast TV; also adopted for DVDs.
MPEG-3: originally intended for HDTV (1920 x 1080), but was folded into MPEG-2.
MPEG-4: originally aimed at very low bit-rate communication. The bit-rate for MPEG-4 video now covers a large range, from 5 kbps to 10 Mbps.
MPEG-7: the main objective is to serve the need for audiovisual content-based retrieval (or audiovisual object retrieval) in applications such as digital libraries.
MPEG-21: a newer standard. The vision for MPEG-21 is to define a multimedia framework to enable transparent and augmented use of multimedia resources across a wide range of networks and devices used by different communities.
MPEG-4 Part 10/H.264
The H.264 video compression standard, formerly known as “H.26L”, was developed by the Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG.
Preliminary studies using software based on this standard suggest that H.264 offers up to 30-50% better compression than MPEG-2, and up to 30% better than H.263+ and the MPEG-4 Advanced Simple Profile.
The outcome of this work is actually two technically identical standards: ISO MPEG-4 Part 10 and ITU-T H.264.
H.264
H.264 is currently one of the leading candidates to carry High Definition TV (HDTV) video content in many potential applications.
H.264 is adopted by Apple QuickTime 7, where it delivers high quality at remarkably low data rates. It generates bit streams across a broad range of bandwidths and applications:
• 3G mobile devices, iPod
• Video on demand, video streaming (MPEG-4 Part 2)
• Video conferencing (H.263)
• HD for broadcast (MPEG-2)
• DVD (MPEG-2)