Topic for lecture 2


• Topic: video compression

• The ultimate compression task?

• Color image (300 x 300 x 24 bit):
– 2.16 Mbit/image x 30 images/s = 64.8 Mbps

• Motion picture: 90 min = 64.8 Mbps x 60 x 90 = 349.92 Gbit

• 56.6 kbps modem => raw download time (excl. sound and overhead) ~ 1717 hours or ~ 72 days!!!
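The arithmetic above can be reproduced in a few lines. This is a minimal sketch; the function name `raw_bitrate_bps` is hypothetical, and the 56.6 kbps modem speed is the figure used on the slide:

```python
def raw_bitrate_bps(width, height, bits_per_pixel, fps):
    """Uncompressed video bit rate in bits per second."""
    return width * height * bits_per_pixel * fps

rate = raw_bitrate_bps(300, 300, 24, 30)   # one second of raw color video
movie_bits = rate * 60 * 90                # a 90-minute motion picture
modem_bps = 56.6e3                         # modem speed used on the slide
download_s = movie_bits / modem_bps

print(f"{rate / 1e6:.1f} Mbps")            # 64.8 Mbps
print(f"{movie_bits / 1e9:.2f} Gbit")      # 349.92 Gbit
print(f"{download_s / 3600:.0f} hours, {download_s / 86400:.1f} days")
```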

Agenda for lecture 2

• What makes video compression possible?

• Implementations of motion compensation
– Block matching

• The YCbCr color representation

• MPEG

Video compression

• A sequence of images that needs to be compressed for storage and/or transmission

• Audio is ignored, as the image data >> the audio data

• Straightforward methods:
– Motion JPEG
– 3D DCT

Temporal redundancy

• Less than 10% of the pixels change by more than 1% between frames

• Temporal redundancy or interframe correlation

• Temporal redundancy > spatial redundancy

• Origin: slow camera- and object movements
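The statistic above can be measured with a simple count. A minimal sketch, assuming 8-bit frames stored as lists of rows and reading "1%" as 1% of the full range (255); the function name and the toy frames are hypothetical:

```python
def fraction_changed(prev, cur, rel_threshold=0.01, max_value=255):
    """Fraction of pixels whose value changes by more than rel_threshold * max_value."""
    limit = rel_threshold * max_value
    changed = total = 0
    for row_prev, row_cur in zip(prev, cur):
        for a, b in zip(row_prev, row_cur):
            total += 1
            if abs(b - a) > limit:
                changed += 1
    return changed / total

# Toy 10x10 scene: static background, a 2x2 object moving one pixel to the right
W = H = 10
prev = [[100] * W for _ in range(H)]
cur = [[100] * W for _ in range(H)]
for y in (2, 3):
    for x in (2, 3):
        prev[y][x] = 200        # object at columns 2-3 in frame t-1
    for x in (3, 4):
        cur[y][x] = 200         # object at columns 3-4 in frame t
print(fraction_changed(prev, cur))  # 0.04
```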

Motion compensated coding

• Second generation of temporal compression methods

• More efficient (especially with rapid changes) but also more complex:
– OK, since the cost of computing power is decreasing faster than the cost of bandwidth

• Basic idea: the only difference between two images is the moving objects (draw)

• Estimate the motion and simply code this information

• From the prediction and the initial frame we can encode/decode all other frames

Practical issues

• Due to noise, camera movements, light changes etc., the object and background also change =>
– Calculate the prediction error (difference) and code it

• It is very hard to track and describe a general object (contour and texture); instead a block of pixels is used as the ’object’

• The estimated motion is represented as a pure translation: no rotation and no scaling
– This is justified since we have high frame rates and ’slow’ changes
– Denoted the displacement vector or motion vector

Procedure for motion compensated coding

• Image sequence => image => blocks of pixels

• Step 1: Motion analysis:
– Estimate the motion vector of the current block, i.e. the position of the block in the previous image(s)

• Step 2: Prediction and differentiation:
– Predict what the block found in the previous image(s) will look like in the current image
– Subtract the predicted block from the current block => difference

• Step 3: Entropy encoding of the difference and the motion vector

• Encoded difference and motion vector << raw image => video compression

• Step 3 (entropy coding) is already known
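Step 2 and its inverse at the decoder can be sketched as follows, assuming the motion vector (dx, dy) from step 1 is already known and that the prediction for the block at (x0, y0) is the block at (x0 - dx, y0 - dy) in the previous frame. The function names and the toy frames are hypothetical, and the entropy coding of step 3 is not shown:

```python
def get_block(frame, x0, y0, size):
    """Copy a size x size block whose top-left corner is (x0, y0)."""
    if x0 < 0 or y0 < 0 or y0 + size > len(frame) or x0 + size > len(frame[0]):
        raise ValueError("block outside the frame")
    return [row[x0:x0 + size] for row in frame[y0:y0 + size]]

def residual(cur, prev, x0, y0, mv, size):
    """Encoder, step 2: current block minus the motion-compensated prediction."""
    dx, dy = mv
    pred = get_block(prev, x0 - dx, y0 - dy, size)
    block = get_block(cur, x0, y0, size)
    return [[c - p for c, p in zip(rc, rp)] for rc, rp in zip(block, pred)]

def reconstruct(prev, x0, y0, mv, diff):
    """Decoder: the same prediction plus the decoded difference."""
    dx, dy = mv
    pred = get_block(prev, x0 - dx, y0 - dy, len(diff))
    return [[p + d for p, d in zip(rp, rd)] for rp, rd in zip(pred, diff)]

# Toy frames: the content moves 2 pixels right and 1 down, plus one small local change
prev = [[10 * y + x for x in range(8)] for y in range(8)]
cur = [[prev[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(8)] for y in range(8)]
cur[4][4] += 3                    # e.g. a light change the motion model cannot explain

diff = residual(cur, prev, 4, 4, (2, 1), 2)
print(diff)                       # [[3, 0], [0, 0]]: almost nothing left to code
```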

Motion analysis and prediction

• In general we seek the trajectory of a block so we can predict its current position, e.g. using weights

• In practice this is too complicated, and a 0th order predictor is applied instead:
– Predicted block(x,y,t) = block(a,b,t-1)
– MPEG uses two 0th order predictors (from a previous and from a following frame)

• The only unknown is step 1: how do we find the block in the previous frame that best matches the block in the current frame?

• Three methods:
– Block matching (by far the most applied method)
– Pel-recursion (block = 1 pixel)
– Optical flow (block = 1 pixel)

Block matching (1)

• Principle: all pixels in a block are assumed to have the same motion vector

• Search window:
– Maximum displacement given by the frame rate and the context
– Usually a square region

• Block size p x q; usually p = q => square block

• The smaller the block size, the better the prediction, but the more overhead (motion vectors)

• Usually block size = 16 x 16

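Block matching (2)

• Overlapping blocks improve the reconstructed image quality but increase the bit rate (more motion vectors)
– Usually non-overlapping blocks are applied

• Block matching via a similarity measure, where u is a pixel in the current block and v the corresponding pixel in the candidate block:
– Sum of squared differences (SSD): $S(u,v) = (u-v)^2$, summed over all pixels in the block
– Mean absolute difference (MAD): $S(u,v) = |u-v|$, averaged over all pixels in the block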
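Both similarity measures can be sketched directly from the per-pixel forms, summed (SSD) or averaged (MAD) over the block. The function names are hypothetical and blocks are assumed to be equally sized lists of rows:

```python
def ssd(cur_block, cand_block):
    """Sum of squared differences: (u - v)^2 summed over all pixel pairs."""
    return sum((u - v) ** 2
               for row_u, row_v in zip(cur_block, cand_block)
               for u, v in zip(row_u, row_v))

def mad(cur_block, cand_block):
    """Mean absolute difference: |u - v| averaged over all pixel pairs."""
    n = sum(len(row) for row in cur_block)
    return sum(abs(u - v)
               for row_u, row_v in zip(cur_block, cand_block)
               for u, v in zip(row_u, row_v)) / n

cur_block = [[10, 12], [14, 16]]
cand_block = [[11, 12], [14, 13]]
print(ssd(cur_block, cand_block), mad(cur_block, cand_block))  # 10 1.0
```

SSD penalises a few large errors more heavily, while MAD needs no multiplications, which makes it cheap in hardware.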

Searching strategies

• Full search:
– Finds the global minimum but requires heavy processing!

• Assuming only one minimum in the search region => a less computationally demanding search strategy can be used

• Accept a local minimum =>
– Larger difference but less processing

• Searching strategies with one (local) minimum:
– Coarse-fine three-step search
– 2D logarithmic search
– Conjugate direction search
– Etc.
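A full search simply tests every displacement in the search window, (2 x max_disp + 1)^2 candidates per block, which is why it is so expensive. A minimal sketch using MAD, assuming frames stored as lists of rows and the convention that displacement (dx, dy) refers to the block at (x0 - dx, y0 - dy) in the previous frame; all names and the random test frame are hypothetical:

```python
import random

def mad(a, b):
    """Mean absolute difference between two equally sized blocks."""
    n = len(a) * len(a[0])
    return sum(abs(u - v) for ra, rb in zip(a, b) for u, v in zip(ra, rb)) / n

def block(frame, x0, y0, size):
    return [row[x0:x0 + size] for row in frame[y0:y0 + size]]

def full_search(cur, prev, x0, y0, size, max_disp):
    """Test every (dx, dy) with |dx|, |dy| <= max_disp; return (best vector, its MAD)."""
    target = block(cur, x0, y0, size)
    h, w = len(prev), len(prev[0])
    best = None
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            xs, ys = x0 - dx, y0 - dy          # candidate position in the previous frame
            if xs < 0 or ys < 0 or xs + size > w or ys + size > h:
                continue                       # skip candidates outside the frame
            cost = mad(target, block(prev, xs, ys, size))
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best

# Hypothetical data: a random frame and a copy moved by (dx, dy) = (2, -1)
rng = random.Random(0)
prev = [[rng.randrange(256) for _ in range(16)] for _ in range(16)]
cur = [[prev[y + 1][x - 2] if y + 1 < 16 and x >= 2 else 0 for x in range(16)]
       for y in range(16)]
print(full_search(cur, prev, 6, 6, 4, 3))
```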

Coarse-fine three-step search

• Step 1) Test 9 points within a fixed pattern

• Steps 2+3) Centre the pattern around the best match and reduce the distance within the pattern
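The three steps can be sketched over an abstract cost function `cost(dx, dy)` (for example the MAD of the displaced block). This is a minimal sketch under stated assumptions: the initial distance is 4 and is halved at each step (4, 2, 1), which covers displacements up to ±7; the function name and the test cost are hypothetical:

```python
def three_step_search(cost, step=4):
    """Coarse-fine search: test a 3x3 pattern, re-centre on the best point, halve the distance."""
    cx, cy = 0, 0
    while step >= 1:
        pattern = [(cx + i * step, cy + j * step) for j in (-1, 0, 1) for i in (-1, 0, 1)]
        cx, cy = min(pattern, key=lambda d: cost(*d))
        step //= 2
    return cx, cy

# A hypothetical cost surface with a single minimum at (3, -5)
print(three_step_search(lambda dx, dy: (dx - 3) ** 2 + (dy + 5) ** 2))  # (3, -5)
```

This sketch evaluates at most 27 points (it re-tests the centre at each step) instead of 225 for a full search over ±7, but it only finds the global minimum when the cost surface has a single minimum.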

YCbCr color representation

• A camera captures color in RGB format (show)

• We would like a representation where intensity and color are separated:
– So we can transmit and decode both a color and a gray-scale signal
– In RGB, [50,50,50] has the same color as [100,100,100], only a lower intensity
– HSI (hue-saturation-intensity) separates them
– HSI is complex to calculate, so we seek a simpler representation

• The YUV representation is a simple approximation:
– Y = luminance (intensity) = 0.299 R + 0.587 G + 0.114 B
– The non-uniform weighting comes from the HVS (human visual system)
– U = B – intensity = "pure" blue color = 0.492 (B - Y)
– V = R – intensity = "pure" red color = 0.877 (R - Y)
– Rough approximation, but very simple to compute
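A minimal sketch of the YUV formulas above (the function name is hypothetical). Gray values give U = V = 0 whatever their intensity, which is exactly the separation of intensity and color the slide asks for:

```python
def rgb_to_yuv(r, g, b):
    """YUV from RGB using the weights on the slide."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance
    u = 0.492 * (b - y)                     # "pure" blue
    v = 0.877 * (r - y)                     # "pure" red
    return y, u, v

print(rgb_to_yuv(50, 50, 50))     # Y = 50, U and V (practically) 0
print(rgb_to_yuv(100, 100, 100))  # Y = 100, U and V (practically) 0
print(rgb_to_yuv(0, 0, 255))      # pure blue: large positive U
```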

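YCbCr color representation (3)

• The HVS is more sensitive to intensity (Y) than to color (Cb and Cr), so more bits are used to represent the intensity and the chrominances are subsampled

• Formats (bits per pixel with 8 bits per sample):
– 4:4:4 (24 bits): one Cb and one Cr sample for every Y sample
– 4:2:2 (16 bits): one Cb and one Cr sample for every second Y sample horizontally
– 4:2:0 (12 bits): one Cb and one Cr sample for every 2 x 2 Y samples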
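The bit counts follow from how many chroma samples accompany each luma sample. A minimal sketch, assuming 8 bits per sample and a plain 2 x 2 average as the 4:2:0 downsampling filter (an assumption; the lecture does not specify a filter); the function names are hypothetical:

```python
def bits_per_pixel(fmt, bits_per_sample=8):
    """Average bits per pixel: one Y sample plus the share of Cb and Cr samples."""
    chroma_per_luma = {"4:4:4": 2.0, "4:2:2": 1.0, "4:2:0": 0.5}[fmt]  # Cb + Cr together
    return bits_per_sample * (1 + chroma_per_luma)

def subsample_420(plane):
    """Average each 2 x 2 neighbourhood of a chroma plane (even width and height assumed)."""
    return [[(plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
             for x in range(0, len(plane[0]), 2)]
            for y in range(0, len(plane), 2)]

for fmt in ("4:4:4", "4:2:2", "4:2:0"):
    print(fmt, bits_per_pixel(fmt))        # 24.0, 16.0, 12.0
print(subsample_420([[1, 3, 5, 7], [1, 3, 5, 7]]))  # [[2.0, 6.0]]
```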

MPEG

• MPEG = Moving Picture Experts Group

• International standard for compression of video (image, sound, and system info.), driven by the growth of the digital media market (e.g. CD-ROM, DVD); for both transmission and storage

• MPEG-1: 1991

• MPEG-2: 1994
– MPEG-2 is MPEG-1 compatible, hence only MPEG-2 is used today

• MPEG is NOT an algorithm but rather a framework with several algorithms and MANY user settings
– Fixed protocol, hence fixed decoders (the encoder is not specified!)
– Asymmetrical codec: encoding ~100 times as demanding as decoding, ~100:1 (JPEG ~1:1)

• MPEG is a lossy compression scheme

MPEG-1

• MPEG-2 is an "add-on" to MPEG-1

• Typical bit rate for MPEG-1 = 1.5 Mbps
– Meaning that an MPEG-1 decoder can decode and show real-time video that has been compressed to 1.5 Mbps
– MPEG: trade-off between video quality and bandwidth

• Allows resolutions up to 4095 x 4095 at 60 Hz
– Most used is the CPB (constrained parameter bitstream)

• Fixed resolutions and frame rates => HW implementations

• CPB: max. resolution = 768 x 576 at 30 Hz

• CPB: max. bit rate = 1.856 Mbps

MPEG-1 compression rate

• BT.601 (digital TV signal): 704 x 576 x 24 bit x 25 Hz = 243 Mbps

• Compression factor: 243 Mbps / 1.5 Mbps = 162 (JPEG: 10-20)

• YCbCr 4:2:0 format: 12 bits per pixel

• Basic operation: down-scale to SIF (source input format)
– Fixed resolution => HW solutions
– 352 x 288 (ignore lines and/or interpolate)

• 352 x 288 x 12 bit x 25 Hz = 30.4 Mbps => compression factor ≈ 20
– But it can be higher or lower

• In general: fewer input data => better image quality (for a fixed bit rate)
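The chain of factors above can be checked numerically. A minimal sketch (names hypothetical) that also shows how much of the work the pre-processing alone does: halving both dimensions and going from 24 to 12 bits per pixel is a factor of 2 x 2 x 2 = 8:

```python
def bit_rate(width, height, bits_per_pixel, fps):
    return width * height * bits_per_pixel * fps

MPEG1_RATE = 1.5e6                     # typical MPEG-1 bit rate (bit/s)
bt601 = bit_rate(704, 576, 24, 25)     # full-resolution 24-bit input
sif = bit_rate(352, 288, 12, 25)       # SIF, YCbCr 4:2:0

print(f"BT.601: {bt601 / 1e6:.1f} Mbps, factor {bt601 / MPEG1_RATE:.0f}")  # 243.3 Mbps, 162
print(f"SIF:    {sif / 1e6:.1f} Mbps, factor {sif / MPEG1_RATE:.0f}")      # 30.4 Mbps, 20
print(f"Pre-processing alone: {bt601 / sif:.0f}x")                         # 8x
```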

MPEG-1 principle (1)

• Full-motion-compensated DCT and difference coding

• Frames: 1,2,3,4,5,6,7,8,9, …

• Frame 1: DCT-coded like JPEG

• Frames 2, 3, 4, 5, 6, 7, 8, 9, …: difference coding
– The difference is DCT coded and quantized => lossy compression
– Problems?
– Error propagation
– No random access

MPEG-1 principle (2)

• I-picture: intra-coded

– Similar to JPEG

• P-picture: predictive coded via forward prediction

• B-picture: predictive coded via:

– forward-, backward-, or bi-directional prediction

• Errors in I and P are limited to max. one GOP (group of pictures)

• Errors in B are limited to one picture

• High N (distance between I-pictures) and M (distance between I/P anchor pictures) => good coding but error propagation

– Usually: 13<N<16 and 0<M<4

– Recommended: I each ½ sec. and whenever scene changes

• Coding order vs. visualisation order
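Because a B-picture is also predicted from the following I- or P-picture, that anchor must be transmitted before the B-pictures that depend on it. A minimal sketch of the reordering from display (visualisation) order to coding order; the function name is hypothetical, and trailing B-pictures, which would need the next GOP's I-picture, are simply appended:

```python
def coding_order(display):
    """Reorder picture types from display order to coding order; returns (index, type) pairs."""
    out, pending_b = [], []
    for idx, ptype in enumerate(display):
        if ptype == "B":
            pending_b.append((idx, ptype))   # wait for the next anchor picture
        else:                                # I or P: an anchor picture
            out.append((idx, ptype))
            out.extend(pending_b)            # B-pictures that needed this anchor
            pending_b = []
    out.extend(pending_b)
    return out

order = coding_order("IBBPBBPBBI")
print("".join(t for _, t in order))  # IPBBPBBIBB
print([i for i, _ in order])         # [0, 3, 1, 2, 6, 4, 5, 9, 7, 8]
```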

[Figure: MPEG data hierarchy, from the entire sequence down to pictures (type I, P, B) and macroblocks (MB); in 4:2:0 format one macroblock consists of 6 blocks]

Coding one Block (8x8)

• Similar to JPEG except for adaptive quantization:
– DCT, quantization, zig-zag scan, entropy coding
– Adaptive quantization controls the quality/amount of data
– Intra vs. inter coding:

• I-blocks: intra

• P,B-blocks: depending on the difference (DIFF): 0 (nothing coded), motion vectors only, inter, or intra
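The per-block pipeline can be sketched as: an orthonormal 8x8 DCT-II, uniform quantization with a single scale factor standing in for the adaptive quantization, and a zig-zag scan. This is a simplification under stated assumptions: real MPEG-1 uses quantization matrices plus a per-macroblock scale, and the entropy coding of the scanned values is not shown; all names are hypothetical:

```python
import math

N = 8  # DCT block size

def dct2(block):
    """Orthonormal 2D DCT-II of an N x N block (direct O(N^4) form, fine for a sketch)."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out

def quantize(coeffs, qscale):
    """Uniform quantization: a larger qscale gives fewer non-zero values (less data)."""
    return [[round(c / qscale) for c in row] for row in coeffs]

def zigzag(matrix):
    """Read a square matrix in zig-zag order, from low to high frequency."""
    n = len(matrix)
    order = sorted(((i, j) for i in range(n) for j in range(n)),
                   key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [matrix[i][j] for i, j in order]

flat = [[16] * N for _ in range(N)]        # a flat block: all energy in the DC term
print(zigzag(quantize(dct2(flat), 8))[:4])  # [16, 0, 0, 0]
```

The zig-zag scan puts the (usually non-zero) low-frequency values first, so the many trailing zeros can be coded compactly.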

Coding one Block (8x8)

[Figure: encoding and decoding of one block]

What to remember

• Video compression is done by removing the temporal redundancy

• Principle (at block level):
– Step 1: Motion analysis => motion vector
– Step 2: Calculate the error/difference (subtraction)
– Step 3: Entropy encoding of the motion vector and the difference

• Motion analysis:
– Pel-recursion
– Optical flow
– Block matching (the currently applied method)

• Block matching:
– Block of pixels (16 x 16)
– Similarity measure
– Search region
– Different search strategies to avoid the full search

What to remember

• Video compression is done by removing the temporal redundancy

• Principle (at (macro)block level):
– Step 1: Motion analysis (block matching) => motion vector
– Step 2: Calculate the error/difference (subtraction)
– Step 3: ’JPEG’-coding (DCT, quantization and entropy encoding)

• MPEG-1:
– Bit rate ~1.5 Mbps
– Asymmetrical codec ~ 100:1, encoding vs. decoding (JPEG ~1:1)
– Compression rate < 400 (down-scaling + YCbCr 4:2:0 => ~20)
– Coding pattern: I B B P B B P B B I

• Questions?

• Presentations: email me tbm@cvmt.dk

• The end

Pel-recursion (1)

• The block consists of only one pixel (= pel)

• Problem formulation:
– Displaced frame difference function: $\mathrm{DFD}(x,y,dx,dy) = i(x,y,t) - i(x-dx,y-dy,t-1)$
– Find the (dx,dy) that minimises $\mathrm{DFD}^2$ => most similar pixel => best displacement vector

• Solution:
– Set the partial derivatives to 0
– Non-linear programming problem => iterative algorithm: steepest descent method, Newton-Raphson’s method, others

Pel-recursion (2)

• Algorithm:
– Find the motion vector (dx,dy) for the first pixel
– The motion vectors of neighbouring pixels are correlated => use the ’old’ (dx,dy) as the initial guess for the iterative algorithm => recursion
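A minimal sketch of pel-recursive estimation under stated assumptions: the two frames are a hypothetical smooth test scene (a Gaussian blob moved by (3, 2)) that can be evaluated at non-integer positions, so no interpolation is needed; the iterative update is a Gauss-Newton step, a Newton-type alternative to the steepest descent and Newton-Raphson methods listed above; all names are hypothetical. The ’old’ vector of the previous pixel is the initial guess for the next one, which is the recursion. One pixel gives one equation for two unknowns, so the estimate drives the DFD to zero but need not recover the true (3, 2) exactly.

```python
import math

SIGMA2 = 16.0              # hypothetical blob variance
TRUE_SHIFT = (3.0, 2.0)    # hypothetical motion between t-1 and t

def prev_frame(x, y):
    """i(x, y, t-1): a Gaussian blob centred at (10, 10)."""
    return 100.0 * math.exp(-((x - 10) ** 2 + (y - 10) ** 2) / (2 * SIGMA2))

def prev_grad(x, y):
    """Analytic spatial gradient (d/dx, d/dy) of i(x, y, t-1)."""
    v = prev_frame(x, y)
    return -v * (x - 10) / SIGMA2, -v * (y - 10) / SIGMA2

def cur_frame(x, y):
    """i(x, y, t): the same blob displaced by TRUE_SHIFT."""
    return prev_frame(x - TRUE_SHIFT[0], y - TRUE_SHIFT[1])

def dfd(x, y, dx, dy):
    """Displaced frame difference DFD(x, y, dx, dy)."""
    return cur_frame(x, y) - prev_frame(x - dx, y - dy)

def estimate_pixel(x, y, d0, max_iter=50, tol=1e-9):
    """Iteratively drive the DFD of one pixel towards 0, starting from the guess d0."""
    dx, dy = d0
    for _ in range(max_iter):
        r = dfd(x, y, dx, dy)
        if abs(r) < tol:
            break
        gx, gy = prev_grad(x - dx, y - dy)   # dDFD/d(dx, dy) = +gradient of i at t-1
        norm2 = gx * gx + gy * gy + 1e-12
        dx -= r * gx / norm2                 # Gauss-Newton step on the linearised DFD
        dy -= r * gy / norm2
    return dx, dy

def pel_recursive_row(xs, y, d_init=(0.0, 0.0)):
    """One vector per pixel along a row; each result seeds the next pixel (recursion)."""
    d, vectors = d_init, []
    for x in xs:
        d = estimate_pixel(x, y, d)
        vectors.append(d)
    return vectors

print(pel_recursive_row([15, 16, 17], 13))
```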

Optical flow

• The block consists of only one pixel

• Similar to pel-recursion but calculated in a different manner

Comparing the 3 types of motion analysis

• The three: pel-recursion, optical flow and block matching

• Optical flow and pel-recursion calculate one motion vector for each pixel =>
– More precise => predicted block and current block are more similar => smaller difference => more compact coding of the difference
– More overhead, as more motion vectors are to be coded
– More complex to calculate
– Pixel methods avoid the block artefacts of block matching

• Block matching is (at present) more suitable
– Used in all coding standards

Temporal methods

• Two methods which exploit both the spatial and temporal redundancies:
– Frame replenishment
– Motion compensation

• Both utilise prediction => short summary

Frame replenishment (1)

• Exploits the temporal redundancy

• First generation of temporal compression methods

• If the value changed significantly, $|i(x,y,t) - i(x,y,t-1)| > TH$:
– Then: code the value $i(x,y,t)$ and the position $(x,y)$
– Else: code nothing => re-use $i(x,y,t-1)$

• Enhancements:
– Send differences instead of values
– Remove noise from the images prior to processing
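A minimal sketch of basic frame replenishment (without the enhancements), assuming frames stored as lists of rows; the function names and toy frames are hypothetical. The encoder sends (x, y, value) only where the change exceeds TH; the decoder copies the previous frame and patches those positions. In a real coder the comparison would typically be made against the decoder's reconstructed frame rather than the original previous frame, so that skipped small changes do not accumulate; the sketch ignores this.

```python
def replenish_encode(prev, cur, th):
    """Return (x, y, value) for every pixel with |i(x,y,t) - i(x,y,t-1)| > th."""
    updates = []
    for y, (row_prev, row_cur) in enumerate(zip(prev, cur)):
        for x, (a, b) in enumerate(zip(row_prev, row_cur)):
            if abs(b - a) > th:
                updates.append((x, y, b))
    return updates

def replenish_decode(prev, updates):
    frame = [row[:] for row in prev]   # re-use i(x, y, t-1) everywhere ...
    for x, y, value in updates:
        frame[y][x] = value            # ... except at the coded positions
    return frame

prev = [[10, 10, 10], [10, 10, 10]]
cur = [[10, 12, 50], [10, 10, 11]]
updates = replenish_encode(prev, cur, th=5)
print(updates)                          # [(2, 0, 50)]
print(replenish_decode(prev, updates))  # [[10, 10, 50], [10, 10, 10]]
```

The small changes (12 and 11) are lost at TH = 5; raising TH to meet a bit rate loses more of them, which is the dirty window effect described on the next slide.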

Frame replenishment (2)

• A fixed bit rate of 1 Mbps means that the decoder can only decode and play back real-time video compressed to 1 Mbps

• Many changes between two images => many pixels to be coded

• To keep the same bit rate, TH must be raised
=> only large changes are coded => poorer reconstruction, a.k.a. the dirty window effect

2D logarithmic search

• Test 5 points within a fixed pattern

• Centre the pattern around the best match

• When the best match is in the centre or on the border of the search window: reduce the distance in the pattern
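A minimal sketch of the 2D logarithmic search over an abstract cost function `cost(dx, dy)`, assuming a ±7 search window, an initial distance of 4 that is halved when the rule above applies, and a stop when the best point is the centre at distance 1. The function name and the test cost are hypothetical:

```python
def log_search_2d(cost, max_disp=7, step=4):
    """2D logarithmic search over displacements with |dx|, |dy| <= max_disp."""
    cx, cy = 0, 0
    while True:
        pattern = [(cx, cy), (cx + step, cy), (cx - step, cy), (cx, cy + step), (cx, cy - step)]
        pattern = [(x, y) for x, y in pattern if abs(x) <= max_disp and abs(y) <= max_disp]
        bx, by = min(pattern, key=lambda d: cost(*d))   # on ties the centre (listed first) wins
        if (bx, by) == (cx, cy) and step == 1:
            return bx, by
        if (bx, by) == (cx, cy) or abs(bx) == max_disp or abs(by) == max_disp:
            step = max(1, step // 2)                    # centre or border: reduce the distance
        cx, cy = bx, by

print(log_search_2d(lambda dx, dy: (dx - 3) ** 2 + (dy + 5) ** 2))  # (3, -5)
```

The loop terminates because the centre only moves when a strictly lower cost is found, and otherwise the distance shrinks until it reaches 1.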

Conjugate direction search

• Step 1: Test 3 horizontal points next to each other

• Step 2: Move to the minimum point

• Continue steps 1 and 2 until a minimum is found; then repeat the process in the vertical direction
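A minimal sketch of the two-direction search described above, over an abstract cost function `cost(dx, dy)` and a ±7 search window (both assumptions; names hypothetical). It implements only the horizontal-then-vertical passes on this slide:

```python
def conjugate_direction_search(cost, max_disp=7):
    """Line search along x, then along y, each by testing 3 neighbouring points."""
    def line_search(x, y, ddx, ddy):
        while True:
            points = [(x, y), (x - ddx, y - ddy), (x + ddx, y + ddy)]
            points = [p for p in points if abs(p[0]) <= max_disp and abs(p[1]) <= max_disp]
            best = min(points, key=lambda p: cost(*p))   # ties keep the current point
            if best == (x, y):
                return x, y                              # minimum along this direction
            x, y = best
    x, y = line_search(0, 0, 1, 0)     # horizontal direction first
    return line_search(x, y, 0, 1)     # then the vertical direction

print(conjugate_direction_search(lambda dx, dy: (dx - 3) ** 2 + (dy + 5) ** 2))  # (3, -5)
```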

YCbCr color representation (2)

• The YUV representation can have negative values, so it is scaled and shifted to avoid this => the YCbCr representation

• Cb and Cr are called the chrominances

• YCbCr is the representation used in image/video compression

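$$\begin{bmatrix} Y \\ U \\ V \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.147 & -0.289 & 0.436 \\ 0.615 & -0.515 & -0.100 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}$$

$$\begin{bmatrix} Y \\ C_b \\ C_r \end{bmatrix} = \begin{bmatrix} 0.257 & 0.504 & 0.098 \\ -0.148 & -0.291 & 0.439 \\ 0.439 & -0.368 & -0.071 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} + \begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix}$$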
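The YCbCr matrix form translates directly into code. A minimal sketch, assuming 8-bit RGB input in the range 0–255; these coefficients match the ITU-R BT.601 8-bit form, which maps Y to 16–235, so black and white land at the ends of that range with Cb = Cr = 128. The function name is hypothetical:

```python
M = [[0.257, 0.504, 0.098],      # Y row
     [-0.148, -0.291, 0.439],    # Cb row
     [0.439, -0.368, -0.071]]    # Cr row
OFFSET = [16, 128, 128]

def rgb_to_ycbcr(r, g, b):
    """Matrix multiply plus offset, rounded to integer sample values."""
    return tuple(round(m[0] * r + m[1] * g + m[2] * b + o) for m, o in zip(M, OFFSET))

print(rgb_to_ycbcr(0, 0, 0))        # (16, 128, 128)
print(rgb_to_ycbcr(255, 255, 255))  # (235, 128, 128)
```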

Audio in MPEG-1

• 16-bit samples at 16, 22.05, 24, 32, 44.1 and 48 kHz

• Stereo at 44.1 kHz = 1.4 Mbps

• Compression based on psycho-acoustic redundancy

• Three methods:
– Layer 1: target rate = 384 kbps
– Layer 2: target rate = 256 kbps
– Layer 3: target rate = 128 kbps

• Layer 3 is the most advanced and often applied– It has a nickname, which?
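The stereo figure and the resulting compression ratios of the three layers can be checked quickly (a minimal sketch; names hypothetical, target rates from the slide):

```python
PCM_BPS = 16 * 2 * 44_100                        # 16-bit samples, 2 channels, 44.1 kHz
TARGETS = {1: 384_000, 2: 256_000, 3: 128_000}   # target bit rates per layer

print(f"Uncompressed stereo: {PCM_BPS / 1e6:.2f} Mbps")   # 1.41 Mbps
for layer, target in TARGETS.items():
    print(f"Layer {layer}: {target // 1000} kbps -> about {PCM_BPS / target:.1f}:1")
```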

MPEG-2

• Defined in 1994

• Developed for DTV but has lots of other applications

• Based on MPEG-1 (backward compatible)

• Bit rates: 1.5 Mbps – 60 Mbps. Target: 2-15 Mbps (best: 4)

• Lots of new features, including:

– Support for fields, support for 4:4:4 and 4:2:2

– Alternative zig-zag scan, better motion vectors

– Scalability to allow any subset of a stream to be decoded and visualised, etc.

• MPEG-3: purpose: HDTV
– Merged with MPEG-2 => no MPEG-3 standard

MPEG-4

• Both for real video and synthetic video

• Very low bit rates < 64 kbps => efficient coding

• Content-based coding: code the objects
– Shape, texture and sprite (background objects)

• Interactivity

• Popular coding standards:
