Outline
Image/video compression: what and why
source coding basics
basic idea
symbol codes
stream codes
compression systems and standards
system standards and quality measures
image coding JPEG
video coding and MPEG
Summary
need for compression
Image: 6.0 million pixel camera, 3000x2000
18 MB per image -> 56 pictures / 1GB
Video: DVD Disc 4.7 GB
video 720x480, RGB, 30 f/s -> 31.1MB/sec
audio 16bits x 44.1KHz stereo -> 176.4KB/s
-> about 2.5 min of video per DVD disc
Send video from cellphone:
352*240, RGB, 15 frames / second
3.8 MB/sec ->$38.00/sec levied by AT&T
Data Compression
Wikipedia: “data compression, or source coding, is the process of
encoding information using fewer bits (or other information-bearing
units) than an unencoded representation would use through use of
specific encoding schemes.”
Applications
General data compression: .zip, .gz …
Image over network: telephone/internet/wireless/etc
Slow device:
1xCD-ROM 150KB/s, bluetooth v1.2 up to ~0.25MB/s
Large multimedia databases
Understanding compression:
what are behind jpeg/mpeg/mp4 … formats?
what are the “good/fine/super fine” quality modifiers in my Canon 400D?
why/when do I want to use raw/jpeg format in my digital camera?
why doesn’t “zipping” jpeg files help?
what are the best ways to do compression?
are we doing our best? (yes/no/maybe)
what can we compress?
Goals of compression
Remove redundancy
Reduce irrelevance
irrelevance (perceptual redundancy): not all visual
information is perceived by the eye/brain, so throw
away what is not perceived.
redundant: exceeding what is necessary or normal
symbol redundancy:
the common and uncommon values cost the same
to store.
spatial and temporal redundancy:
Temperatures tend to be similar in adjacent
geographical areas, and also tend to be similar in the
same month over different years …
symbol/inter-symbol redundancy
Letters and words in English
e, a, i, s, t, …
a, the, me, I …
good, magnificent, …
fyi, btw, ttyl …
In the evolution of language we
naturally chose to represent frequent
meanings with shorter representations.
Data and information
Data is not the same thing as information.
Data is the means with which information is
expressed. The amount of data can be much larger
than the amount of information.
Redundant data doesn't provide additional
information.
Image coding or compression aims at reducing the
amount of data while keeping the information by
reducing the amount of redundancy.
Image Compression
Image compression addresses the problem
of reducing the amount of data required to
represent a digital image. The underlying
basis of the reduction process is the removal of
redundant data
Transforming a 2-D pixel array into a
statistically uncorrelated data set
Different Types of Redundancy
Coding Redundancy:
Some gray levels are more common than
others.
Inter-pixel Redundancy:
The same gray level may cover a large
area.
Psycho-Visual Redundancy:
The eye can only resolve about 32 gray
levels locally
M. C. Escher, Drawing Hands, 1948
Redundancy
Spatial redundancy
Similarities between adjacent pixels
250 252 249
250 2 -3
Temporal redundancy
Similarities between pixels in adjacent frames
250 252 249
250 2 -3
Modes of compression
Lossless
preserve all information, perfectly recoverable
examples: Morse code, zip/gz
Lossy
throw away perceptually insignificant information
cannot recover all bits
Image Compression
Image compression can be:
Reversible (lossless), with no loss of information.
– The image after compression and decompression is identical to the
original image. Often necessary in image analysis applications.
– The compression ratio is typically 2 to 10 times.
Non-reversible (lossy), with loss of some information.
– Lossy compression is often used in image communication, compact
cameras, video, www, etc.
– The compression ratio is typically 10 to 30 times.
Image Coding and Compression
Image coding
– How the image data can be represented.
Image compression
– Reducing the amount of data required to represent an image.
– Enabling efficient image storage and transmission.
Lossless compression
Objective Measures of Image Quality:
Rate
Compression ratio serves as the primary measure of a compression
technique’s effectiveness. It is a measure of the number of bits that can be
eliminated from an uncompressed representation of a source image.
Let N1 be the total number of bits required to store an uncompressed (raw)
source image and let N2 be the total number of bits required to store the
compressed data. The compression ratio Cr is then defined as the ratio of
N1 to N2
Larger compression ratios indicate more effective compression
Smaller compression ratios indicate less effective compression
Compression ratios less than one indicate that the compressed
representation is actually larger than the uncompressed representation.
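The definition above is just a ratio of bit counts. A minimal sketch (the 2 MB JPEG size here is a made-up example for illustration):

```python
def compression_ratio(n1_bits, n2_bits):
    """Cr = N1 / N2: bits of the raw image over bits of the compressed data."""
    return n1_bits / n2_bits

# A 3000x2000 RGB image at 24 bits/pixel, stored as a hypothetical 2 MB JPEG:
raw = 3000 * 2000 * 24
jpeg = 2 * 1024 * 1024 * 8
print(round(compression_ratio(raw, jpeg), 1))  # 8.6
```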
Objective Measures of Image Quality:
Compression Ratio
Objective Measures of Image Quality
Objective fidelity criteria
MSE vs PSNR
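The slide's plots are not reproduced here, but the two standard objective fidelity criteria can be computed directly. A minimal sketch (the 2x2 test images are hypothetical):

```python
import math

def mse(f, g):
    """Mean squared error between two equally sized gray-level images (nested lists)."""
    n = sum(len(row) for row in f)
    return sum((a - b) ** 2 for rf, rg in zip(f, g) for a, b in zip(rf, rg)) / n

def psnr(f, g, max_val=255):
    """Peak signal-to-noise ratio in dB; higher means closer to the original."""
    e = mse(f, g)
    return float('inf') if e == 0 else 10 * math.log10(max_val ** 2 / e)

orig = [[100, 100], [100, 100]]
noisy = [[100, 105], [95, 100]]
print(mse(orig, noisy), round(psnr(orig, noisy), 1))  # 12.5 37.2
```

Note that PSNR only rescales MSE against the peak value, which is why the two criteria always rank a pair of images the same way.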
Subjective Measures of Image Quality
The problem
– The objective image quality measures shown previously do not
always fit with our perception of image quality.
One solution
– Let a number of test persons rate the image quality of the images on
a scale. This results in a subjective measure of image quality, or
rather fidelity, based on how we perceive the quality of the
images.
Subjective fidelity criteria
Excellent
Fine
Passable
Marginal
Inferior
Unusable
Information Measure
Using Elements of Information theory
Example
Assume that the grading levels are
A, B, C, D, E, F
These are equally distributed
How much information do you have if you know that you don't have grade F?
(5/6) * -log2(5/6) = 0.22 bits
How much information is needed to know your exact grade?
log2(6) - ((1/6) * -log2(1/6)) = 2.15 bits
Information Measure:
Entropy
Shannon Entropy = average information per
source output
H(z) = - Σ_{k=0}^{L-1} p(r_k) · log2 p(r_k)
where r_k is gray level number k, and L is
the number of gray levels.
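A minimal sketch of the entropy formula applied to a list of gray levels (the two tiny test images are made-up examples):

```python
import math
from collections import Counter

def entropy(pixels):
    """Shannon entropy H = -sum p(r_k) * log2 p(r_k), in bits per pixel."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A uniform two-level histogram gives the maximum 1 bit/pixel ...
print(entropy([0, 255, 0, 255]))          # 1.0
# ... while a skewed histogram needs fewer bits on average.
print(round(entropy([0, 0, 0, 255]), 3))  # 0.811
```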
Measure the Amount of Data
The average length of the code words assigned to various gray-level
values is found by summing the product of the number of bits used to
represent each gray level and the probability that the gray level occurs
Example 3-bit image
The noiseless coding theorem
It is possible to make Lavg/n arbitrarily close to
H(z) by coding infinitely long extensions of
the source:
lim_{n -> ∞} L_{avg,n} / n = H(z)
Definitions (revisited)
Larger compression ratios indicate more effective compression
Smaller compression ratios indicate less effective compression
Compression ratios less than one indicate that the compressed
representation is actually larger than the uncompressed representation.
Dealing with coding redundancy
Basic idea:
Different gray levels occur with different probability (non uniform histogram).
Use shorter code words for the more common gray levels and longer code
words for less common gray levels. This is called Variable Code Length.
Code 1: Lavg = 3
Code 2: Lavg = 2.7
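The slide's code table is not reproduced here, so the sketch below uses the classic 8-level probability table from the Gonzalez & Woods textbook (an assumption) to show how Lavg is computed for a natural 3-bit code versus a variable-length code:

```python
def l_avg(probs, lengths):
    """Average code word length: sum over k of l(r_k) * p(r_k)."""
    return sum(p * l for p, l in zip(probs, lengths))

p = [0.19, 0.25, 0.21, 0.16, 0.08, 0.06, 0.03, 0.02]  # gray-level probabilities
code1 = [3] * 8                                       # fixed-length 3-bit code
code2 = [2, 2, 2, 3, 4, 5, 6, 6]                      # variable-length code
print(round(l_avg(p, code1), 2), round(l_avg(p, code2), 2))  # 3.0 2.7
```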
Revisit:
Desired properties of symbol codes
good codes are not only short but also easy to
encode/decode
Non-singular: every symbol in X maps to a different
code word.
Uniquely decodable: every sequence {x1, … xn} maps
to a different codeword sequence.
Instantaneous: no codeword is a prefix of any other
codeword; a.k.a. prefix code, self-punctuating code,
prefix-free code.
Huffman Coding
First
1. Sort the gray levels by decreasing probability
2. Sum the two smallest probabilities.
3. Sort the new value into the list.
4. Repeat 1 to 3 until only two probabilities remain.
David Albert Huffman
Second
1. Give the code 0 to the highest probability, and the code 1
to the lowest probability in the summed pair.
2. Go backwards through the tree one node and repeat
from 1 until all gray levels have a unique code.
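The two passes above can be sketched in a few lines. This variant builds the codewords bottom-up with a heap instead of walking the tree backwards, but follows the same merge-the-two-smallest rule (the four-symbol alphabet is a made-up example):

```python
import heapq

def huffman_code(probs):
    """Huffman coding: repeatedly merge the two least probable nodes,
    prefixing '1' to the codes of the lower-probability branch and '0'
    to the other. probs: dict symbol -> probability."""
    heap = [(p, i, {sym: ''}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {sym: '0' for sym in probs}
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)  # smallest probability -> prefix '1'
        p2, i, c2 = heapq.heappop(heap)  # next smallest        -> prefix '0'
        merged = {s: '1' + c for s, c in c1.items()}
        merged.update({s: '0' + c for s, c in c2.items()})
        heapq.heappush(heap, (p1 + p2, i, merged))
    return heap[0][2]

code = huffman_code({'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125})
print(sorted(len(code[s]) for s in 'abcd'))  # [1, 2, 3, 3]
```

For dyadic probabilities like these the code lengths match the self-information exactly, so Lavg equals the entropy.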
Example of Huffman coding
Huffman coding….
The Huffman code is completely reversible, i.e.,
lossless.
The table for the translation has to be stored together
with the coded image.
The resulting code is unambiguous. That is, for the
previous example, the encoded string 011011101011
can only be parsed into the code words 0, 110, 1110,
1011 and decoded as 7, 4, 5, 0.
The Huffman code does not take correlation between
adjacent pixels into consideration.
Interpixel Redundancy
Also called spatial or geometric redundancy
Adjacent pixels are often correlated, i.e., the
value of neighboring pixels of an observed
pixel can often be predicted from the value of
the observed pixel.
Coding methods:
Run-length coding
Difference coding
Run-length coding
Every code word is made up of a pair (g,l) where g is the gray
level, and l is the number of pixels with that gray level (length or
“run”).
E.g.,
results in the run-length code (1,3)(3,3)(4,1)(2,4)(1,5)
The code is calculated row by row.
(Newer methods can take advantage of runs of repetitive patterns
like: 8 5 5 8 5 5 8 5 5.)
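A minimal sketch of the (g, l) pairing; the input row is reconstructed from the code given above:

```python
def run_length_encode(row):
    """Encode a sequence of gray levels as (g, l) pairs: level g repeated l times."""
    pairs = []
    for g in row:
        if pairs and pairs[-1][0] == g:
            pairs[-1] = (g, pairs[-1][1] + 1)  # extend the current run
        else:
            pairs.append((g, 1))               # start a new run
    return pairs

row = [1, 1, 1, 3, 3, 3, 4, 2, 2, 2, 2, 1, 1, 1, 1, 1]
print(run_length_encode(row))  # [(1, 3), (3, 3), (4, 1), (2, 4), (1, 5)]
```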
Difference coding
Definition: the first pixel is stored as-is; each subsequent pixel is stored as its difference from the previous pixel.
E.g.,
The code is calculated row by row.
Both run-length and difference coding are reversible and can be
combined with, e.g., Huffman coding.
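A minimal sketch of difference coding and its inverse, using the 250/252/249 pixel values from the earlier redundancy slide:

```python
def difference_encode(row):
    """Keep the first value; replace each later value by its difference
    from the previous one (small numbers when neighbors are similar)."""
    return row[:1] + [b - a for a, b in zip(row, row[1:])]

def difference_decode(code):
    out = code[:1]
    for d in code[1:]:
        out.append(out[-1] + d)
    return out

row = [250, 252, 249]
code = difference_encode(row)
print(code)                           # [250, 2, -3]
assert difference_decode(code) == row  # fully reversible
```

The differences cluster tightly around zero, which is exactly the skewed histogram a follow-up Huffman pass exploits.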
Combining Difference and Huffman Coding
Original image
Difference coding
LZW Coding
LZW, Lempel-Ziv-Welch
In contrast to Huffman with variable code length, LZW uses fixed
lengths of code words which are assigned to variable length sequences
of source symbols.
The coding is done from left-to-right and row-by-row.
Requires no a priori knowledge of the probability of occurrence of the
symbols to be encoded.
Removes some of the inter-pixel redundancy.
During encoding a dictionary or “code-book” with symbol sequences is
created which is recreated when decoding.
(Many modern lossless compression methods use Huffman coding in
combination with methods like LZW.)
What if the symbol probabilities are
unknown?
LZW Coding (Lempel-Ziv-Welch)
Integrated into mainstream image file formats:
Graphics Interchange Format – GIF
Tagged Image File Format – TIFF
Portable Document Format – PDF
Widely used: GIF, TIFF, PDF …
The related royalty-free DEFLATE algorithm (LZ77 + Huffman) is used in PNG, ZIP, …
Unisys U.S. LZW Patent No. 4,558,302 expired on June 20, 2003
http://www.unisys.com/about__unisys/lzw
Input sequence: 39 39 126 126 39 39 126 126 39 39 126 126 39 39 126 126
Encoded output: 39 39 126 126 256 258 260 259 257 126
LZW
Coding dictionary (code book) is created
while data are being encoded
LZW decoder builds an identical
decompression dictionary as it decodes the
data stream
Flush the code book
When the codebook is full
When coding is inefficient
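The dictionary-building described above can be sketched compactly; running it on the slide's 16-pixel sequence reproduces the coded output shown there:

```python
def lzw_encode(data):
    """LZW: fixed-length output codes for variable-length input runs.
    The dictionary starts with all single byte values (0..255) and grows
    with every newly seen sequence; the decoder rebuilds the same
    dictionary on its own, so it is never transmitted."""
    book = {(i,): i for i in range(256)}
    current, out = (), []
    for symbol in data:
        candidate = current + (symbol,)
        if candidate in book:
            current = candidate            # keep extending the run
        else:
            out.append(book[current])      # emit code for the longest match
            book[candidate] = len(book)    # next free code, starting at 256
            current = (symbol,)
    if current:
        out.append(book[current])
    return out

data = [39, 39, 126, 126] * 4
print(lzw_encode(data))  # [39, 39, 126, 126, 256, 258, 260, 259, 257, 126]
```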
Bit Plane coding
Divide the gray level/color image into series of binary
images (with one image per bit). Code each image
separately using the above described methods. An 8-bit
image will be represented by 8 coded binary images.
It is based on the concept of decomposing a multilevel
image into a series of binary images and compressing
each binary image via one of several well-known binary
compression methods
Alternative decomposition approach: start from a Gray
code representation, in which successive gray levels
differ in only one bit.
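A minimal sketch of bit-plane decomposition and of why the Gray code helps: a one-level step like 127 -> 128 disturbs all eight natural-binary planes but only one Gray-code plane.

```python
def to_gray(n):
    """Gray code: successive integers differ in exactly one bit."""
    return n ^ (n >> 1)

def bit_planes(pixels, bits=8):
    """Decompose a list of gray levels into `bits` binary images (plane 0 = LSB)."""
    return [[(p >> b) & 1 for p in pixels] for b in range(bits)]

# Planes that change across the step 127 -> 128:
print(bin(127 ^ 128).count('1'),                    # natural binary: 8 planes
      bin(to_gray(127) ^ to_gray(128)).count('1'))  # Gray code: 1 plane
```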
Constant area coding (CAC)
Image is divided into areas of size p x q
Classify all white, all black, mixed
Example white=0, black=10, mixed = 11+pixel
values
If the image is dominantly white:
Example: white = 0, black or mixed = 1 + pixel
values
Lossless Predictive Coding
Does not require decomposition of an image into a collection of
bitplanes
Based on eliminating the interpixel redundancy of closely spaced
pixels by extracting and coding only the new information in each
pixel
Contains encoder, decoder and predictor
The output of the predictor is rounded to the nearest integer
Lossless Predictive Coding
Prediction error:
e_n = f_n - f̂_n
is coded using a variable-length code.
The decoder reconstructs:
f_n = e_n + f̂_n
The predictor uses m previous pixels:
f̂(x,y) = round[ Σ_{i=1}^{m} α_i · f(x, y-i) ]
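A minimal sketch of the encoder/decoder pair along one image row, with a single-coefficient previous-pixel predictor (α = 1) as a made-up example:

```python
def predictive_encode(row, alphas=(1.0,)):
    """Lossless predictive coding: predict f̂_n = round(sum_i alpha_i * f_{n-i})
    and store the error e_n = f_n - f̂_n. The first m pixels have no full
    neighborhood and are stored as-is."""
    m = len(alphas)
    errors = list(row[:m])
    for n in range(m, len(row)):
        pred = round(sum(a * row[n - 1 - i] for i, a in enumerate(alphas)))
        errors.append(row[n] - pred)
    return errors

def predictive_decode(errors, alphas=(1.0,)):
    """Rebuild the row: f_n = e_n + f̂_n, using already decoded pixels."""
    m = len(alphas)
    row = list(errors[:m])
    for n in range(m, len(errors)):
        pred = round(sum(a * row[n - 1 - i] for i, a in enumerate(alphas)))
        row.append(errors[n] + pred)
    return row

row = [250, 252, 249, 249, 250]
e = predictive_encode(row)
print(e)                            # [250, 2, -3, 0, 1]
assert predictive_decode(e) == row  # perfectly recoverable
```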
Lossy Predictive Compression
A Typical Predictive Coder
Lossy Predictive Compression
Reduce the accuracy of the saved image for increased compression
One of the simplest lossy predictive coding schemes is known as
Delta Modulation. Record the approximate error (difference)
between the predicted and actual samples values.
For a sequence of samples S indexed by k, a fixed delta value generates an
approximate sequence S': at each step, S'_k = S'_{k-1} + delta if
S_k > S'_{k-1}, and S'_k = S'_{k-1} - delta otherwise; only the one-bit
up/down decision is stored.
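The one-bit-per-sample update above can be sketched directly (the sample values are a made-up example):

```python
def delta_modulate(samples, delta, start=0):
    """Delta modulation: one bit per sample says whether the reconstruction
    steps up (+delta) or down (-delta). A large delta causes granular noise
    on flat regions; a small delta causes slope overload on fast transitions,
    so the scheme is lossy either way."""
    recon, bits, current = [], [], start
    for s in samples:
        bit = 1 if s > current else 0
        current += delta if bit else -delta
        bits.append(bit)
        recon.append(current)
    return bits, recon

bits, recon = delta_modulate([14, 15, 15, 14, 20, 26, 27], delta=4, start=14)
print(bits)   # [0, 1, 1, 0, 1, 1, 1]
print(recon)  # [10, 14, 18, 14, 18, 22, 26]
```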
Lossy Predictive Compression:
Effect of delta value
Delta Modulation Example
Psycho-Visual Redundancy
If the only intended use of an image is visual observation, much of the
information can be psycho-visual redundant, i.e., it can be removed
without changing the visual appearance or perceived quality of the
image. Loss of information implies a lossy method.
1721 kB (uncompressed) vs. 78 kB (low-quality JPEG)
Psycho-Visual redundancy
Psycho-visual redundancy is often reduced by quantization.
E.g., uniform quantization of gray levels:
Remove the least significant bits of the data.
Causes edge effects (false contours).
The edge effects can be reduced by Improved Gray Scale (IGS).
Remove the least significant bit, and add a “random number”
based on the sum of the least significant bits of the present,
and the previous pixel.
IGS reduces edge effects, but at the same time blurs
true edges.
Improved Gray Scale (IGS)
(a) Original image. (b) Uniform quantization to 16 levels. (c) IGS quantization to 16
levels.
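The IGS "random number" described above is just the low-order remainder carried over from the previous pixel's sum. A minimal sketch, assuming the usual 8-bit-to-4-bit case (the pixel values are a made-up example):

```python
def igs_quantize(pixels, keep_bits=4):
    """Improved Gray Scale quantization (sketch): before truncating to the
    top `keep_bits`, add the low-order remainder of the previous sum as a
    pseudo-random dither, which breaks up false contours on smooth ramps.
    Pixels whose top bits are already all ones are passed through unchanged
    to avoid overflowing past 255."""
    drop = 8 - keep_bits
    mask = (1 << drop) - 1   # low-order bits carried to the next pixel
    top = 255 - mask         # e.g. 0xF0 when 4 bits are kept
    out, prev_sum = [], 0
    for p in pixels:
        s = p if (p & top) == top else p + (prev_sum & mask)
        out.append(s >> drop)  # keep only the top bits
        prev_sum = s
    return out

print(igs_quantize([108, 139, 135, 244, 172]))
```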
Transform Coding
A compression technique that is based on
modifying the transform of an image
For most natural images a significant number
of the coefficients have small magnitudes and
can be coarsely quantized or discarded with
little image distortion
Transform Coding
Sub image decomposition
Transformation
Quantization
Coding
Adaptive transform coding or nonadaptive transform coding
Transform Coding
1) Divide the image into n × n sub-images.
2) Transform each sub-image using a reversible transform (e.g., the Hotelling
transform, the discrete Fourier transform (DFT) or the discrete cosine
transform (DCT)).
3) Quantize, i.e., truncate the transformed image (e.g., with the DFT and DCT,
frequencies with small amplitude can be removed without much information
loss). The quantization can be either image dependent (IDP) or image
independent (IIP).
4) Code the resulting data, normally using some kind of "variable length
coding", e.g., Huffman code.
The coding is not reversible (unless step 3 is skipped).
g(x,y,u,v) = Forward transformation kernel
h(x,y,u,v) = Inverse transformation kernel
T(u,v) = Σ_{x=0}^{N-1} Σ_{y=0}^{N-1} f(x,y) · g(x,y,u,v)
f(x,y) = Σ_{u=0}^{N-1} Σ_{v=0}^{N-1} T(u,v) · h(x,y,u,v)
Fourier Transform
g(x,y,u,v) = (1/N) · e^{-j2π(ux+vy)/N}
h(x,y,u,v) = (1/N) · e^{j2π(ux+vy)/N}
Walsh-Hadamard Transform -
WHT
g(x,y,u,v) = h(x,y,u,v) = (1/N) · (-1)^{Σ_{i=0}^{m-1} [b_i(x)·p_i(u) + b_i(y)·p_i(v)]},  with N = 2^m
• The Fourier Transform consists of a projection onto a
set of orthogonal sinusoidal waveforms.
• The FT coefficients are called frequency components
and the waveforms are ordered by frequency.
• The Hadamard Transform consists of a projection
onto a set of square waves called Walsh functions.
• The HT coefficients are called sequency components
and the Walsh functions are ordered by the number of
their zero-crossings.
Discrete Cosine Transform - DCT
• The discrete cosine transform (DCT) is used to transform a signal from
the spatial domain into the frequency domain.
• The reverse process, that of transforming a signal from the frequency
domain into the spatial domain, is called the inverse discrete cosine
transform (IDCT).
• A signal in the frequency domain contains the same information as that in
the spatial domain. The order of values obtained by applying the DCT is
conveniently from lowest to highest frequency.
• This feature and the psychological observation that the human eye and
ear are less sensitive to recognizing the higher-order frequencies leads to
the possibility of compressing a spatial signal by transforming it to the
frequency domain and dropping high-order values and keeping low-order
ones.
• When reconstructing the signal, and transforming it back to the spatial
domain, the results are remarkably similar to the original signal.
Discrete Cosine Transform - DCT
h(x,y,u,v) = α(u) · α(v) · cos[(2x+1)uπ / 2N] · cos[(2y+1)vπ / 2N]
α(u) = sqrt(1/N) for u = 0
α(u) = sqrt(2/N) for u = 1, 2, ..., N-1
JPEG - Sequential baseline
system
Limited to 8-bit words
DCT-values restricted to 11 bit
DCT computation, quantization, variable
length coding
Subimages 8 x 8 left to right, top to bottom
Image size selection
JPEG:
example of transform coding
JPEG quality vs. file size (in bytes):
100 %  9486
90 %   3839
80 %   2086
60 %   1711
40 %   1287
20 %   822
10 %   533
5 %    380
Wavelet Coding
The principal difference between wavelet
coding and transform coding is the omission
of the subimage processing stage
JPEG2000
File Formats with Lossy Compression
JPEG, Joint Photographic Experts Group, based on
a cosine transform on 8x8 pixel blocks and run-
length coding. Gives rise to ringing and block
artifacts. (.jpg .jpe .jpeg)
JPEG2000, created by the Joint Photographic
Experts Group in 2000. Based on the wavelet transform
and is superior to JPEG. Gives rise only to ringing
artifacts and allows flexible decompression
(progressive transmission, region of interest, ...)
and reading. (.jp2 .jpx)
JPEG vs JPEG-2000
Typical steps in lossy image compression
How to represent a face image
DCT Approach (block-based)
(figure: a face image expressed as a weighted sum of basis images, = a·(basis 1) + b·(basis 2) + … + x·(basis N))
PCA Approach (full-frame)
JPEG/JFIF overview
Using the 2D FFT for image
compression
• Image = 200x320 matrix of values
• Compress by keeping largest 2.5% of FFT components
• Similar idea used by JPEG
Trade-off
File Formats with Lossless Compression
TIFF, Tagged Image File Format, flexible format often
supporting up to 16 bits/pixel in 4 channels. Can use
several different compression methods, e.g.,
Huffman, LZW.
GIF, Graphics Interchange Format. Supports 8
bits/pixel in one channel, that is only 256 colors.
Uses LZW compression. Supports animations.
PNG, Portable Network Graphics, supports up to 16
bits/pixel in 4 channels (RGB + transparency). Uses
Deflate compression (~LZW and Huffman). Good
when interpixel redundancy is present.
Vector based file formats
PS, PostScript, is a page description language developed in
1982 for sending text documents to printers.
EPS, Encapsulated PostScript, like PS but can embed raster
images internally using the TIFF format.
PDF, Portable Document Format, widely used for documents
and supported by a wide range of platforms. Supports
embedding of fonts and raster/bitmap images. Beware of the
choice of coding: both lossy and lossless compression are
supported.
SVG, Scalable Vector Graphics, based on XML; supports both
static and dynamic content. All major web browsers support it
(Internet Explorer from version 9).
Choosing image file format
Image analysis
– Lossless formats are vital. TIFF supports a wide
range of different bit depths and lossless compression
methods.
Images for use on the web
– JPEG for photos (JPEG2000), PNG for illustrations.
GIF for small animations. Vector format: SVG,
nowadays supported by web browsers.
Line art, illustrations, logotypes, etc.
– Lossless formats such as PNG etc. (or a vector
format)
Video
Compression
Video Compression Standards
Once video is in digital format, it makes
sense to compress it
Similarly to image compression, we want to
store video data as efficiently as possible
Again, we want to both maximize quality and
minimize storage space and processing
resources
This time, we can exploit correlation in both
space and time domains
Video Compression Standards
Unlike image encoding, video encoding is
rarely done in lossless form
No storage medium has enough capacity to
store a practical-sized lossless video file.
Lossless DVD video: 221 Mbps
Compressed DVD video: 4 Mbps
50:1 compression ratio!
Teleconference H.261, H.262, H.263, H.230
Multimedia video MPEG-1 MPEG-2 MPEG-4
Two organizations dominate video
compression standardization:
ITU-T Video Coding Experts Group (VCEG): International Telecommunication Union,
Telecommunication Standardization Sector (ITU-T, a United Nations organization, formerly CCITT)
ISO/IEC Moving Picture Experts Group (MPEG): International Organization for Standardization and
International Electrotechnical Commission, Joint Technical Committee 1, Subcommittee 29, Working Group 11
Definitions
Bitrate: information stored/transmitted per unit time
Usually measured in Mbps (megabits per second)
Ranges from < 1 Mbps to > 40 Mbps
Resolution: number of pixels per frame
Ranges from 160x120 to 1920x1080
FPS (frames per second): usually 24, 25, 30, or 60
More is not needed because of the limitations of the human
eye
Scan types
Interlaced scan: odd and even lines displayed on
alternate frames
Initially used to save bandwidth on TV
transmission
When displaying interlaced video on a
progressive-scan display, you can see a "comb"
effect
Progressive scan: display all lines on each frame
New "fixed-resolution" displays (such as
LCD, plasma) all use progressive scan
Deinterlacing is not a trivial task
MPEG
(Moving Pictures Expert Group)
Committee of experts that develops video
encoding standards
Until recently, was the only game in town (still
the most popular, by far)
Suitable for wide range of videos
Low resolution to high resolution
Slow movement to fast action
Can be implemented either in software or
hardware
MPEG (Moving Pictures Expert Group)
MPEG's main components are:
Block (8×8 pixels)
Macro block (2×2 blocks = 16×16 pixels)
Slice (one row of macro blocks)
Picture (an entire video frame)
Group of pictures (GOP)
Video sequence (one or more GOPs)
MPEG
8×8 Block
16×16 Macro block
Slice
Evolution of MPEG
MPEG-1
Initial audio/video compression standard
Used by VCD’s
MP3 = MPEG-1 audio layer 3
Target of 1.5 Mb/s bitrate at 352x240 resolution
Only supports progressive pictures
MPEG-2
Current de facto standard, widely used in DVD and Digital TV
Ubiquity in hardware implies that it will be here for a long time
Transition to HDTV has taken over 10 years and is not finished yet
Different profiles and levels allow for quality control
Evolution of MPEG
MPEG-3: originally developed for HDTV, but abandoned when MPEG-2
was determined to be sufficient
MPEG-4: includes support for AV "objects", 3D content, low-bitrate
encoding, and DRM
In practice, provides equal quality to MPEG-2 at a lower bitrate, but often fails to deliver outright better quality
MPEG-4 Part 10 is H.264, which is used in HD-DVD and Blu-ray
MPEG-7 (2001): metadata for audio-video streams, Multimedia Content
Description Interface
MPEG-21 (2002): distribution, exchange, user access of multimedia data and
intellectual property management
MPEG Block Diagram
MPEG technical specification
Part 1 - Systems - describes synchronization and multiplexing of video and audio.
Part 2 - Video - compression codec for interlaced and non-interlaced video signals.
Part 3 - Audio - compression codec for perceptual coding of audio signals. A multichannel-enabled extension of MPEG-1 audio.
Part 4 - Describes procedures for testing compliance.
Part 5 - Describes systems for Software simulation.
Part 6 - Describes extensions for DSM-CC (Digital Storage Media Command and Control).
Part 7 - Advanced Audio Coding (AAC)
Part 8 - Deleted
Part 9 - Extension for real time interfaces.
Part 10 - Conformance extensions for DSM-CC.
MPEG
video spatial domain processing
Spatial domain handled very similarly to
JPEG
Convert RGB values to YUV colorspace
Split frame into 8x8 blocks
2-D DCT on each block
Quantization of DCT coefficients
Run length and entropy coding
Zig-zag scan, run-length coding
Quantization: the major reduction step; controls "quality"
The result is an "intra-frame encoded" picture
MPEG – YUV compression
4:2:0 or 4:1:1 common in consumer products (DV)
4:2:2 common on professional products (DVCPro)
4:4:4 is rarely used – gives no visible improvement compared with 4:2:2
4:2:0 – four Y samples share one U and one V sample
4:2:2 – every two Y samples share one U and one V sample
4:4:4 – every Y sample has its own U and V samples
4:1:1 – four horizontally adjacent Y samples share one U and one V sample
MPEG – Block compression
The same as JPEG
(figure: an 8×8 spatial-domain block is transformed with the DCT into the
frequency domain and quantized; after quantization only the low-frequency
corner of the block holds nonzero values, the high frequencies become zero,
and the block is zig-zag scanned into a sequence for RLE and Huffman coding)
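The zig-zag ordering used before RLE can be generated by walking the block's anti-diagonals, alternating direction. A minimal sketch:

```python
def zigzag_order(n=8):
    """Visit block positions along anti-diagonals, alternating direction,
    so coefficients come out roughly lowest to highest frequency."""
    return sorted(((u, v) for u in range(n) for v in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else -p[0]))

def zigzag_scan(block):
    """Flatten a square block into a 1-D sequence in zig-zag order."""
    return [block[u][v] for u, v in zigzag_order(len(block))]

print(zigzag_order(8)[:6])  # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```

Because quantization zeroes the high frequencies, this ordering pushes the zeros to the end of the sequence, where run-length coding collapses them cheaply.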
MPEG
video time domain processing
(Temporal compression)
Adjacent frames share large similarities
Temporal compression can be achieved in
two ways:
Discarding images (reduce the frame rate)
Through motion estimation and motion vectors
MPEG
video time domain processing
Totally new ballgame (this concept doesn’t exist in JPEG)
General idea – Use motion vectors to specify how a 16x16 macroblock translates between reference frames and the current frame, then code the difference between the reference and the actual block
Types of frames
I frame (intra-coded)
Coded without reference to other frames
P frame (predictive-coded)
Coded with reference to a previous reference frame
(either I or P)
Size is usually about 1/3rd of an I frame
B frame (bi-directional predictive-coded)
Coded with reference to both previous and future
reference frames (either I or P)
Size is usually about 1/6th of an I frame
GOP (Group of Pictures)
GOP is a set of consecutive frames that can be decoded without any other reference frames
Usually 12 or 15 frames
Transmitted sequence is not the same as displayed sequence
Random access to middle of stream – Start with I frame
Things about prediction
Only use a motion vector if a "close" match can be found
Evaluate "closeness" with MSE or another metric
Can't search all possible blocks, so a smart algorithm is needed
If no suitable match is found, just code the macroblock as an I-block
If a scene change is detected, start fresh
Don't want too many P or B frames in a row
Prediction error will keep propagating until the next I frame
Delay in decoding
MPEG – Group Of Pictures
(GOP)
MPEG uses three types of frames
Grouped in a Group Of Pictures (GOP)
I-pictures (Intracoded)
P-pictures (Predictive Coded)
B-pictures (Bidirectionally interpolated)
I B B P B B P
Forward Prediction
Bidirectional Prediction
I B B P B B P
Compressed video stream
Temporal Redundancy Reduction
I frames are independently encoded
P frames are based on previous I, P frames
B frames are based on previous and following I and P frames
In case something is uncovered
MPEG – Motion Estimation
Calculate the position for the macro block in
the new image
Store the motion vector and difference in
appearance
Macro block
MPEG – Motion Estimation
Helps in understanding the content of an image sequence
Helps reduce the temporal redundancy of video
For compression
Stabilizes video by detecting and removing small, noisy global
motions
For building the stabilizer in a camcorder
A hard problem in general!
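The slides do not fix a particular search algorithm; a minimal sketch of the brute-force ("exhaustive search") variant, matching blocks by sum of absolute differences on tiny made-up frames:

```python
def block_match(prev, cur, top, left, size=4, search=2):
    """Exhaustive-search motion estimation: slide the current block over the
    previous frame within +/- search pixels and keep the offset with the
    smallest sum of absolute differences (SAD)."""
    def sad(dy, dx):
        return sum(abs(cur[top + y][left + x] - prev[top + dy + y][left + dx + x])
                   for y in range(size) for x in range(size))
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if 0 <= top + dy and top + dy + size <= len(prev) \
               and 0 <= left + dx and left + dx + size <= len(prev[0]):
                cost = sad(dy, dx)
                if best is None or cost < best[0]:
                    best = (cost, (dy, dx))
    return best[1]  # motion vector (dy, dx)

# Previous frame: a bright square at (2, 2); current frame: the same square at (3, 4).
prev = [[0] * 10 for _ in range(10)]
cur = [[0] * 10 for _ in range(10)]
for y in range(4):
    for x in range(4):
        prev[2 + y][2 + x] = 200
        cur[3 + y][4 + x] = 200
print(block_match(prev, cur, top=3, left=4))  # (-1, -2): the block came from (2, 2)
```

Real encoders replace the full search with smarter patterns (logarithmic, diamond search) since the exhaustive version is far too slow at video rates.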
Bitrate allocation
CBR – Constant BitRate
Streaming media uses this
Easier to implement
VBR – Variable BitRate
DVD’s use this
Usually requires 2-pass coding
Allocate more bits for complex scenes
This is worth it, because you assume that you encode
once, decode many times
MPEG – Data stream
Display order:        I  B  B  P  B  B  P
                      1  2  3  4  5  6  7
Order in data stream: I  P  B  B  P  B  B
                      1  4  2  3  7  5  6
(B frames can only be decoded after both of their reference frames, so the references are sent first.)
MPEG - Audio
MPEG-1: 3 layers of increasing quality, layer 3 being the most common (MP3); 16-bit samples
Sampling rate: 32, 44.1, or 48 kHz
Bitrate: 32 to 320 kbps
De facto: 44.1 kHz sample rate, 192 kbps bitrate
MPEG-2: supports > 2 channels, lower sampling frequencies, low-bitrate improvements
AAC (Advanced Audio Coding): more sample frequencies (8 kHz to 96 kHz)
Higher coding efficiency and a simpler filterbank
96 kbps AAC sounds better than 128 kbps MP3
Usually CBR, but can do VBR
MPEG Container Format
A container format is a file format that can contain data compressed by standard codecs
2 types for MPEG
Program Stream (PS) – Designed for reasonably reliable media, such as disks
Transport Stream (TS) – Designed for lossy links, such as networks or broadcast antennas
AV Synchronization
Want audio and video streams to be played back in sync with each other
Video stream contains “presentation timestamps”
MPEG-2 clock runs at 90 kHz
Good for both 25 and 30 fps
PCR (Program Clock Reference) timestamps are sent with data by sender
Receiver uses PLL (Phase Lock Loop) to synchronize clocks
Real time video encoding
Motion estimation will be worse, so a higher
bitrate is needed to compensate
Very hard to do in software; needs dedicated
hardware or hardware assistance
TiVo and ReplayTV do this
Streaming media
Common types include Flash, RealVideo, QuickTime
Usually have low bandwidth available, need to optimize as such
Want dedicated network protocols for this purpose
TCP will wait indefinitely for retransmission, so is often not suitable
MPEG data stream: analysis
Pros
Overall sharp picture
Audio and video stay in sync with each other
Cons
Picture flashes and blurs when there is too much movement on screen
A higher bitrate often does not solve this problem
What if we were transmitting this over a network?
Image/Video Compression
Standards
Bitstream useful only if the recipient knows the code!
Standardization efforts are important
Technology and algorithm benchmark
System definition and development
Patent pool management
Defines the bitstream (decoder), not how you generate it (encoder)!
current industry focus:
H.264 encoding/decoding on mobile devices,
low-latency video transmission over various networks,
low-power video codec …
audio coding versus image
coding
VC demo