Next Generation Video Coding‐ H.265/HEVC and its extensions · APSIPA Asia-Pacific Signal and...


APSIPA: Asia-Pacific Signal and Information Processing Association

APSIPA Distinguished Lecture Series www.apsipa.org

Next Generation Video Coding: H.265/HEVC and its extensions

Oscar C. Au (PhD, Princeton Univ.)
Dept. of Electronic and Computer Engineering
Hong Kong University of Science and Technology
Clearwater Bay, Hong Kong
Email: eeau@ust.hk

Oscar C. Au
• BS, Toronto. MA/PhD, Princeton. Postdoc, Princeton.
• Professor, HKUST. Director, Multimedia Tech Center.
• Steering Committee, ICME/TMM.
• IEEE/HKIE Fellow. BoG, APSIPA.
• Best Paper Awards: SiPS/PCM/MMSP/ICIP.
• AE of journals: TCSVT, TIP, TCAS1, JVCIR, JSPS, TSIP, JMM, JFI.
• Chair of 3 TCs: CAS MSA TC, SPS MMSP TC, APSIPA IVM TC.
• Member of TCs: CAS VSPS/DSP, SPS IVMSP/IFS, ComSoc MMC.
• 400+ papers. H-index = 29. 100+ patents filed, 20 granted.
• 80+ standard contributions (MPEG/VCEG/JCTVC/AVS).

Outline

• Introduction
• Lossless Compression
• Image Compression
• Video Compression
• Simulation Results
• Conclusion

What is “Image”?
• “An image is worth a thousand words.”
• Camera: “an optical instrument that records images that can be stored directly” (Wikipedia)
• Black & white photo, color photo
• Analog camera (film), DC (card), DSLR, smart phone
• Storage card: SD, xD, SDHC, USB
• Auto-focus (AF), auto-exposure (AE), auto-white-balancing (AWB)
• ISO, shutter speed, aperture, metering
• Lens: wide-angle, zoom, fish-eye, vibration reduction/image stabilizer

What is “Video”?
• Video is a sequence of images
• Transmission: TV, CCTV, broadcasting TV, movie, camcorder
• Storage: VHS, Betamax, SVHS, Video8, LD, VCD, DVD, Blu-ray
• PAL/SECAM (625 lines, 25 fps), NTSC (525 lines, 30 fps), movie
• SDTV: 720x576 (PAL/SECAM), 720x480 (NTSC)
• HDTV: 1920x1080 (1080i/p), 1280x720 (720i/p)
• UHDTV: 7680x4320
• Digital cinema: 2048x1080 (2K), 4096x2160 (4K)
• CCD/CMOS sensor, Bayer color filter array, demosaicking
• AF/AE/AWB

Why video coding? Why possible?

Why?
• Raw video data is huge.
• Channel capacity is limited.
• Video coding reduces the data rate.

Why possible?
• Signal representation is naturally redundant.
• Some signal details are irrelevant to the observer.
• Video coding works by removing redundancy and irrelevancy.
• Redundancy removal is reversible (lossless).
• Irrelevancy removal is irreversible (lossy).

Why Compression? Storage.

a) A 3000-line text file
• 3000 lines x 80 char/line x 1 byte/char = 240 KB

b) A 3-minute song on CD
• 3 x 60 sec x 44100 Hz x 2 byte/sample x 2 channels = 32 MB

c) A 150-minute NTSC movie (3.4 TB for HDTV, 60 fps)
• 150 x 60 sec x 30 fps x 480 x 720 pixels x 3 colors (1 byte each) = 280 GB

• Storage capacity:
• SD card/Flash drive: 1/2/4 GB, 8/64/128 GB
• iPod/hard disk: 8 GB/16 GB, 100 GB/500 GB/1 TB
• CD-R: 650 MB/700 MB
• DVD+-R: ~4.7 GB

Why Compression? Communication.

a) Speech over telephone
• 8000 Hz x 8 bit/sample = 64 kbit/s

b) CD-quality music/audio over network
• 44.1 x 1000 Hz x 16 bit/sample x 2 channels = 1.4 Mbit/s

c) NTSC digital video over network (3.0 Gb/s for HDTV)
• 30 fps x 480 x 720 pixels x 8 bit/pixel x 3 colors = 250 Mbit/s

• Channel capacity
• GPRS: 120/20 kb/s, 3G (SS): 384/160 kb/s
• HSPA: 28/42 Mb/s, 4G (OFDMA/MIMO): 1000/100 Mb/s
• Bluetooth: 1-3 Mb/s
• WiFi: 12/54 Mb/s, WiMax: 1 Gb/s
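The raw sizes and rates on the two "Why Compression?" slides are plain products, so they can be checked with a few lines of arithmetic. The sketch below only restates those products; the variable names are illustrative, and one byte per color sample is assumed, as on the slides.

```python
# Raw storage sizes (bytes), following the products on the storage slide.
text_bytes = 3000 * 80 * 1                        # 3000 lines x 80 chars x 1 byte
song_bytes = 3 * 60 * 44100 * 2 * 2               # 3 min, 44.1 kHz, 2 bytes, stereo
ntsc_movie_bytes = 150 * 60 * 30 * 480 * 720 * 3  # 150 min NTSC, 3 bytes/pixel
hdtv_movie_bytes = 150 * 60 * 60 * 1080 * 1920 * 3  # 150 min HDTV at 60 fps

# Raw bit rates (bit/s), following the products on the communication slide.
speech_bps = 8000 * 8
cd_audio_bps = 44100 * 16 * 2
ntsc_video_bps = 30 * 480 * 720 * 8 * 3
hdtv_video_bps = 60 * 1080 * 1920 * 8 * 3

for name, value, unit in [
    ("text file", text_bytes / 1e3, "KB"),
    ("CD song", song_bytes / 1e6, "MB"),
    ("NTSC movie", ntsc_movie_bytes / 1e9, "GB"),
    ("HDTV movie", hdtv_movie_bytes / 1e12, "TB"),
    ("speech", speech_bps / 1e3, "kbit/s"),
    ("CD audio", cd_audio_bps / 1e6, "Mbit/s"),
    ("NTSC video", ntsc_video_bps / 1e6, "Mbit/s"),
    ("HDTV video", hdtv_video_bps / 1e9, "Gbit/s"),
]:
    print(f"{name:12s} {value:8.2f} {unit}")
```

Comparing the printed video rates with the channel capacities listed above shows why raw video cannot be sent over any of those links without compression.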

Why Possible?

• Compression is performed to remove the redundancy inherent in the image/video: spatial, temporal, statistical and psychovisual redundancies.

• Compression methods
• Lossless: only redundancy is removed, which is reversible;
• Lossy: irrelevant (psychovisually redundant) detail is also removed, which is irreversible.

Lossy coding: Distortion measure

Distortion measures
• Sum of absolute differences (SAD), sum of squared errors (SSE)
• Peak signal-to-noise ratio (PSNR)

Rate-distortion trade-off
• Lower bit rate => higher distortion
• Higher bit rate => lower distortion
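The three distortion measures named above can be written down directly. This is a minimal sketch on flat lists of samples; the function names and the default peak value of 255 (8-bit samples) are assumptions for illustration.

```python
import math

def sad(original, decoded):
    """Sum of absolute differences between two equally long sample lists."""
    return sum(abs(a - b) for a, b in zip(original, decoded))

def sse(original, decoded):
    """Sum of squared errors."""
    return sum((a - b) ** 2 for a, b in zip(original, decoded))

def psnr(original, decoded, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical signals."""
    mse = sse(original, decoded) / len(original)
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)

original = [100, 102, 98, 101]
decoded = [101, 100, 98, 103]
print(sad(original, decoded), sse(original, decoded), round(psnr(original, decoded), 2))
```

A higher PSNR means lower distortion, so on a rate-distortion curve PSNR rises as the bit rate rises.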

Lossy coding issues

• Bit rate (R)
• Distortion (D)
• Coding delay (important for real-time systems)
• Complexity (important for software/hardware cost)
• Sensitivity to model deviation (robustness)
• Sensitivity to channel errors (robustness, error concealment)

Two coding approaches

• Waveform coding
• Represent a signal such that, after source coding, a good approximation of the original waveform remains.
• E.g. DPCM, JPEG, MPEG-1/2/4, H.261/3/4, JPEG2000, GIF, MP3
• Typically distortion is due to quantization

• Parametric coding
• Assume the signal is generated from a model with parameters.
• The encoder extracts and encodes the model parameters from the signal.
• A good approximation of the waveform may not be obtained, but the result can be a sound/image that resembles the original perceptually.
• E.g. fractals, model-based coding, LPC, CELP

Video coding standards?

• What is a standard?
• International standards
• National standards
• Industry standards
• De facto standards

• Mandatory? Optional? MP3? JPEG?
• Why standard?
• The standard war ...

Standard Organizations

• Three main international organizations:
• ITU-T: International Telecommunication Union, Telecommunication Standardization Sector;
• ISO: International Organization for Standardization;
• IEC: International Electrotechnical Commission.

• These organizations form different groups to develop various standards:
• ITU-T forms the VCEG (Video Coding Experts Group);
• ISO and IEC form the MPEG (Moving Picture Experts Group);
• ITU-T and ISO/IEC jointly form the JPEG (Joint Photographic Experts Group).

ISO/IEC with subgroups for video

Standardization History


Entropy coding

• Entropy is a measure of randomness.
• Entropy H is non-negative.
• Theorem: the average codeword length S of a uniquely decodable binary code satisfies S >= H.
• Theorem: a prefix-condition code for groups of M symbols exists with an average number of bits per symbol S < H(q_i) + 1/M,

where $H(q_i) = -\sum_j p_j \log_2 p_j$.

Lossless Compression Techniques

• Huffman Coding
• Arithmetic Coding
• LZW Coding
• Run-length Coding
• DPCM Coding
• …

Huffman Coding

• Properties
• Fixed-length input, variable-length output;
• Sensitive to channel errors;
• Encode/decode using table lookup;
• Sensitive to changes in signal statistics.

• Applications
• Modified Huffman coding in JPEG and H.261;
• 2D and 3D Huffman coding in MPEG and H.263.

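Example (one symbol)

• P(A) = 3/4, P(B) = 3/16, P(C) = 1/16
• Huffman code: A → 0, B → 10, C → 11 (B and C, with probabilities 3/16 and 1/16, merge into a node of probability 1/4)
• Average length per symbol: S = 1 x 3/4 + 2 x 3/16 + 2 x 1/16 = 1.25 bit/symbol
• Entropy: $H = -\sum_i p_i \log_2 p_i = -\left(\tfrac{3}{4}\log_2\tfrac{3}{4} + \tfrac{3}{16}\log_2\tfrac{3}{16} + \tfrac{1}{16}\log_2\tfrac{1}{16}\right) \approx 1.01$ bit/symbol
• Theoretically, S >= H. S can be brought arbitrarily close to H by coding N symbols at a time (a meaningful improvement of about 20% is possible here).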
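The one-symbol example can be checked numerically. The sketch below takes the probabilities and the code A → 0, B → 10, C → 11 from the slide; the variable names are illustrative.

```python
from fractions import Fraction
from math import log2

probs = {"A": Fraction(3, 4), "B": Fraction(3, 16), "C": Fraction(1, 16)}
code = {"A": "0", "B": "10", "C": "11"}

# Average codeword length S in bit/symbol (exact rational arithmetic).
avg_len = sum(p * len(code[s]) for s, p in probs.items())

# Entropy H = -sum p log2 p in bit/symbol.
entropy = -sum(float(p) * log2(float(p)) for p in probs.values())

print(f"S = {float(avg_len):.3f} bit/symbol, H = {entropy:.3f} bit/symbol")
print(f"gap to close by coding several symbols at once: {float(avg_len) - entropy:.3f} bit/symbol")
```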

Huffman Coding Example

Symbol  Probability
A       0.6
B       0.2
C       0.1
D       0.05
E       0.025
F       0.0125
G       0.00625
H       0.00625

• Average codeword length = 1x0.6 + 2x0.2 + 3x0.1 + 4x0.05 + 5x0.025 + 6x0.0125 + 7x0.00625x2 = 1.7875 bit/symbol

• Entropy H = 1.7585 bit/symbol
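The codeword lengths 1, 2, ..., 7, 7 used in the average above come from the usual Huffman construction: repeatedly merge the two least probable nodes. The following is a minimal sketch of that construction for the eight-symbol example; `huffman_lengths` is an illustrative helper name, and exact fractions are used so that ties are handled deterministically.

```python
import heapq
from fractions import Fraction
from itertools import count

def huffman_lengths(probs):
    """Return {symbol: codeword length} for a binary Huffman code."""
    tie = count()  # unique tie-breaker so heap entries never compare symbol lists
    heap = [(p, next(tie), [s]) for s, p in probs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(probs, 0)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)  # two least probable nodes
        p2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:             # every merge adds one bit to each member
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), syms1 + syms2))
    return lengths

probs = {s: Fraction(p) for s, p in [
    ("A", "0.6"), ("B", "0.2"), ("C", "0.1"), ("D", "0.05"),
    ("E", "0.025"), ("F", "0.0125"), ("G", "0.00625"), ("H", "0.00625"),
]}
lengths = huffman_lengths(probs)
avg_len = sum(probs[s] * lengths[s] for s in probs)
print(lengths, float(avg_len))
```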

Example (two symbols)

Pair  Probability
AA    9/16
AB    9/64
AC    3/64
BA    9/64
BB    9/256
BC    3/256
CA    3/64
CB    3/256
CC    1/256

• Average length = 2.09 bit/codeword = 1.045 bit/symbol
• Much closer to H = 1.012!

• Code multiple symbols at a time => lower average length per symbol

Arithmetic Coding

• Properties
• Fixed-length input, variable-length output
• Optimal: minimum average codeword length
• Sensitive to channel errors
• Adapts to changing signal statistics

• Applications
• Image coding standards like JBIG, JPEG, JPEG 2000
• Video coding standards like H.263 and H.264/MPEG-4 AVC

Arithmetic Coding

• Flexible: can use an adaptive model of signal statistics
• Optimal: minimum average codeword length
• Slower than Huffman and Ziv-Lempel
• No random access
• Potentially unbounded output delay
• Need to indicate end-of-file
• Poor error resistance; errors propagate
• Compresses better than Huffman in JPEG, but is more complicated

Basic algorithm of AC

1. Initialize the current interval [L, H) to [0, 1).

2. For each symbol of the file, do:

a) Subdivide the current interval into subintervals, one for each possible alphabet symbol. The size of a symbol's subinterval is proportional to the estimated probability that the symbol will be the next symbol in the file, according to the model of the input.

b) Select the subinterval corresponding to the symbol that actually occurs next in the file, and make it the current interval.

3. Output enough bits to distinguish the final current interval from all other possible final intervals. The width of the final interval is the product of the symbol probabilities $P_i$, so about $\sum_i \log_2 (1/P_i)$ bits are needed.

Example (Arithmetic coding)

P(a) = 0.5, P(b) = 0.4, P(c = EOF) = 0.1

Current interval   Action        Subinterval a     Subinterval b     Subinterval c     Input
[0.0, 1.0)         Subdivide     [0.0, 0.5)        [0.5, 0.9)        [0.9, 1.0)        a
[0.0, 0.5)         Subdivide     [0.0, 0.25)       [0.25, 0.45)      [0.45, 0.5)       a
[0.0, 0.25)        Subdivide     [0.0, 0.125)      [0.125, 0.225)    [0.225, 0.25)     b
[0.125, 0.225)     Subdivide     [0.125, 0.175)    [0.175, 0.215)    [0.215, 0.225)    c
[0.215, 0.225)     End-of-file

• To encode the interval [0.215, 0.225), we need a number with enough precision to specify the interval.
• Width of interval = 0.01.
• Half width = 0.005.
• Need 8 bits of precision to specify the half-width: 2^-8 = 0.0039 < 0.005, 2^-7 = 0.0078 > 0.005

Target interval: [0.215, 0.225)

Bit  Width        Lower half / upper half of the current interval
0    0.5          [0.0, 0.5) / [0.5, 1.0)
0    0.25         [0.0, 0.25) / [0.25, 0.5)
1    0.125        [0.0, 0.125) / [0.125, 0.25)
1    0.0625       [0.125, 0.1875) / [0.1875, 0.25)
1    0.03125      [0.1875, 0.21875) / [0.21875, 0.25)
0    0.015625     [0.21875, 0.234375) / [0.234375, 0.25)
0    0.0078125    [0.21875, 0.2265625) / [0.2265625, 0.234375)
0    0.00390625   [0.21875, 0.22265625) / [0.22265625, 0.2265625)   (completely within => stop)

Codeword = 00111000 = 0.21875
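The interval narrowing and the bit-by-bit halving in this example can be reproduced with exact fractions. This is a sketch under stated assumptions rather than a production coder: the helper names are illustrative, and the bit-selection rule (follow the half that contains the midpoint of the target interval until the dyadic interval fits inside it) is one simple rule that happens to reproduce the table above.

```python
from fractions import Fraction

MODEL = [("a", Fraction("0.5")), ("b", Fraction("0.4")), ("c", Fraction("0.1"))]

def encode_interval(message, model):
    """Narrow [0, 1) once per symbol and return the final interval [low, high)."""
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        width = high - low
        cum = Fraction(0)
        for s, p in model:
            if s == symbol:
                low, high = low + cum * width, low + (cum + p) * width
                break
            cum += p
        else:
            raise ValueError(f"symbol {symbol!r} not in model")
    return low, high

def interval_to_bits(low, high):
    """Halve [0, 1) until the dyadic interval lies entirely inside [low, high)."""
    target_mid = (low + high) / 2
    lo, hi = Fraction(0), Fraction(1)
    bits = []
    while not (low <= lo and hi <= high):
        mid = (lo + hi) / 2
        if target_mid < mid:
            bits.append("0")   # keep the lower half
            hi = mid
        else:
            bits.append("1")   # keep the upper half
            lo = mid
    return "".join(bits)

low, high = encode_interval("aabc", MODEL)
codeword = interval_to_bits(low, high)
print(float(low), float(high), codeword)
```

Exact rational arithmetic is used only to keep the sketch free of rounding issues; practical coders use finite-precision integers with rescaling.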

Modified Arithmetic Coding

a) If the new subinterval is not entirely within one of these intervals: [0, 1/2), [1/4, 3/4), or [1/2, 1), then stop iterating and return.

b) If the new subinterval lies entirely within [0, 1/2), then output 0 and any 1's left over from previous “follow” symbols (one 1 for each “follow” symbol), and double the size of the current interval by expanding it from the left boundary (0) towards the right.

c) If the new subinterval lies entirely within [1/2, 1), then output 1 and any 0's left over from previous “follow” symbols (one 0 for each “follow” symbol), and double the size of the current interval by expanding it from the right boundary (1) towards the left. For example: interval [0.72, 0.8) will become [0.44, 0.6), because 2*dist(0.72, 1) = 0.56, 1 - 0.56 = 0.44 => left point; new width = 2*(0.8 - 0.72) = 0.16; 0.44 + 0.16 = 0.6 => right point.

d) If the new subinterval lies entirely within [1/4, 3/4), record a “follow” symbol and double the size of the current interval by expanding it in both directions away from the midpoint (1/2). For example: interval [0.48, 0.588) will become [0.46, 0.676), because 2*dist(0.48, 0.5) = 0.04, 0.5 - 0.04 = 0.46 => left point; 2*dist(0.588, 0.5) = 0.176, 0.5 + 0.176 = 0.676 => right point.

e) Go to (a).

Example (Modified AC)

P(a) = 0.4, P(b) = 0.2, P(c) = 0.4. Assume a = EOF.

Current interval   Action                    Subinterval a    Subinterval b    Subinterval c    Input
[0.0, 1.0)         Subdivide                 [0.0, 0.4)       [0.4, 0.6)       [0.6, 1.0)       b
[0.4, 0.6)         Expand (Follow)
[0.3, 0.7)         Expand (Follow)
[0.1, 0.9)         Subdivide                 [0.1, 0.42)      [0.42, 0.58)     [0.58, 0.9)      a
[0.1, 0.42)        Expand (Output 0, 1, 1)
[0.2, 0.84)        End (Output 01 or 10)*

• Therefore, codeword = 01110 or 01101.
• cw(01110) = 0.25 + 0.125 + 0.0625 = 0.4375
• cw(01101) = 0.25 + 0.125 + 0.03125 = 0.40625

* Need to output 10 or 01 for input x, where P(x1) = 0.2, P(x2) = 0.64, P(x3) = 0.16
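Rules (a) to (e) and the example above can be sketched as a small encoder. The expansion rules follow the slide; the termination step (one extra follow count, then 0 or 1 depending on whether the lower bound is below 1/4) is an assumption borrowed from common textbook implementations and yields the slide's "01" ending. All names are illustrative.

```python
from fractions import Fraction

HALF, QUARTER, THREE_QUARTERS = Fraction(1, 2), Fraction(1, 4), Fraction(3, 4)
MODEL = [("a", Fraction("0.4")), ("b", Fraction("0.2")), ("c", Fraction("0.4"))]

def encode(message, model):
    low, high = Fraction(0), Fraction(1)
    out, follow = [], 0

    def emit(bit):
        """Output a bit plus the pending opposite 'follow' bits."""
        nonlocal follow
        out.append(bit)
        out.extend([1 - bit] * follow)
        follow = 0

    for symbol in message:
        # Subdivide: select the subinterval of the symbol that occurred.
        width, cum = high - low, Fraction(0)
        for s, p in model:
            if s == symbol:
                low, high = low + cum * width, low + (cum + p) * width
                break
            cum += p
        else:
            raise ValueError(f"symbol {symbol!r} not in model")
        # Expand while the interval fits in one of the three reference intervals.
        while True:
            if high <= HALF:                                    # rule (b)
                emit(0)
                low, high = 2 * low, 2 * high
            elif low >= HALF:                                   # rule (c)
                emit(1)
                low, high = 2 * low - 1, 2 * high - 1
            elif low >= QUARTER and high <= THREE_QUARTERS:     # rule (d)
                follow += 1
                low, high = 2 * low - HALF, 2 * high - HALF
            else:                                               # rule (a)
                break
    # Termination (assumed convention): two more bits pin down the final interval.
    follow += 1
    emit(0 if low < QUARTER else 1)
    return "".join(str(b) for b in out)

print(encode("ba", MODEL))
```

Because the interval is rescaled as soon as its leading bits are known, the encoder never needs more precision than the width of the reference intervals, which is the point of the modification.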

Example (old method, AC)

• P(a) = 0.4, P(b) = 0.2, P(c) = 0.4

Current interval   Action        Subinterval a    Subinterval b    Subinterval c    Input
[0.0, 1.0)         Subdivide     [0.0, 0.4)       [0.4, 0.6)       [0.6, 1.0)       b
[0.4, 0.6)         Subdivide     [0.4, 0.48)      [0.48, 0.52)     [0.52, 0.60)     a
[0.4, 0.48)        End-of-file

Target interval: [0.4, 0.48)

Bit  Width      Lower half / upper half of the current interval
0    0.5        [0.0, 0.5) / [0.5, 1.0)
1    0.25       [0.0, 0.25) / [0.25, 0.5)
1    0.125      [0.25, 0.375) / [0.375, 0.5)
1    0.0625     [0.375, 0.4375) / [0.4375, 0.5)
0    0.03125    [0.4375, 0.46875) / [0.46875, 0.5)   (entirely within [0.4, 0.48))

• Codeword = 01110 (can also be 01101)
• Same as modified AC.

Example (Modified AC Decode)

P(a) = 0.4, P(b) = 0.2, P(c) = 0.4. Assume a = EOF. cw(01110) = 0.4375

Current interval   Action             Subinterval a    Subinterval b    Subinterval c    Output / codeword update
[0.0, 1.0)         Subdivide          [0.0, 0.4)       [0.4, 0.6)       [0.6, 1.0)       b
[0.4, 0.6)         Expand (Follow)                                                       cw = 0.5 - 2*(0.5 - 0.4375) = 0.375
[0.3, 0.7)         Expand (Follow)                                                       cw = 0.5 - 2*(0.5 - 0.375) = 0.25
[0.1, 0.9)         Subdivide          [0.1, 0.42)      [0.42, 0.58)     [0.58, 0.9)      a

Stop decoding because a = end-of-file. (cw = codeword)

Example (Modified AC Decode)

P(a) = 0.4, P(b) = 0.2, P(c) = 0.4. Assume a = EOF. cw(01101) = 0.40625

Current interval   Action             Subinterval a    Subinterval b    Subinterval c    Output / codeword update
[0.0, 1.0)         Subdivide          [0.0, 0.4)       [0.4, 0.6)       [0.6, 1.0)       b
[0.4, 0.6)         Expand (Follow)                                                       cw = 0.5 - 2*(0.5 - 0.40625) = 0.3125
[0.3, 0.7)         Expand (Follow)                                                       cw = 0.5 - 2*(0.5 - 0.3125) = 0.125
[0.1, 0.9)         Subdivide          [0.1, 0.42)      [0.42, 0.58)     [0.58, 0.9)      a

Stop decoding because a = end-of-file. (cw = codeword)
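Decoding mirrors the encoder: find the subinterval that contains the codeword value, output its symbol, then apply the same expansion to both the interval and the codeword value. The sketch below treats the codeword as an exact binary fraction, which is a simplification of a real bit-serial decoder; the names are illustrative.

```python
from fractions import Fraction

HALF, QUARTER, THREE_QUARTERS = Fraction(1, 2), Fraction(1, 4), Fraction(3, 4)
MODEL = [("a", Fraction("0.4")), ("b", Fraction("0.2")), ("c", Fraction("0.4"))]

def decode(bits, model, eof):
    # Interpret the bit string as the binary fraction 0.b1 b2 b3 ...
    value = sum(Fraction(int(b), 2 ** (i + 1)) for i, b in enumerate(bits))
    low, high = Fraction(0), Fraction(1)
    decoded = []
    while True:
        # Subdivide and find the subinterval that contains the codeword value.
        width, cum = high - low, Fraction(0)
        for symbol, p in model:
            s_low = low + cum * width
            s_high = s_low + p * width
            if s_low <= value < s_high:
                break
            cum += p
        else:
            raise ValueError("codeword value outside the current interval")
        decoded.append(symbol)
        if symbol == eof:
            return "".join(decoded)
        low, high = s_low, s_high
        # Apply the encoder's expansions to the interval and to the value.
        while True:
            if high <= HALF:
                low, high, value = 2 * low, 2 * high, 2 * value
            elif low >= HALF:
                low, high, value = 2 * low - 1, 2 * high - 1, 2 * value - 1
            elif low >= QUARTER and high <= THREE_QUARTERS:
                low, high, value = 2 * low - HALF, 2 * high - HALF, 2 * value - HALF
            else:
                break

print(decode("01110", MODEL, eof="a"), decode("01101", MODEL, eof="a"))
```

Both codewords from the encoding example decode to the same message, b followed by the end-of-file symbol a.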


Image Compression

• Applications: Internet, digital photography, medical imaging, remote sensing, surveillance, facsimile, etc.

• Standardization of image compression
• JPEG became an international standard in 1992.
• JPEG 2000 was finalized in 2002.
• General structure of image coding standards:

Image Standards

• JPEG (lossy and lossless): ITU-T T.81, ISO/IEC 10918-1
• JPEG extensions: ITU-T T.84
• JPEG-LS (lossless, improved): ITU-T T.87, ISO/IEC 14495-1
• JBIG (lossless, bi-level pictures, fax): ITU-T T.82, ISO/IEC 11544
• JBIG2 (bi-level pictures): ITU-T T.88, ISO/IEC 14492
• JPEG 2000: ITU-T T.800, ISO/IEC 15444-1
• JPEG 2000 extensions: ITU-T T.801
• JPEG XR (called HD Photo prior to standardization): ITU-T T.832, ISO/IEC 29199-2

• History
• Work started in the mid-1980s;
• Designed for compressing grayscale and color still images;
• Became an international standard in 1992.

• Application
• The JPEG coding standard is still the most widely used compression algorithm today.
• Its applications can be found in diverse storage and transmission domains, such as the Internet, digital professional and consumer photography, and video.

Goals of JPEG

Four modes of operation:

a) Sequential encoding: image encoded in a single left-to-right, top-to-bottom scan (includes the baseline sequential codec)

b) Progressive encoding: image encoded in multiple scans with progressively refined details, for image reconstruction in multiple coarse-to-clear passes

c) Lossless encoding: image encoded to guarantee exact recovery of every source image sample value (though the compression ratio is low compared to the lossy modes)

d) Hierarchical encoding: image encoded at multiple resolutions such that a lower-resolution image is accessible without decompressing the higher or full resolution

Sequential Mode of JPEG

• 3 major steps: DCT, quantization, entropy coding

Encoder: Image source → FDCT → Quantizer → Entropy encoder → Compressed image

Decoder: Compressed image → Entropy decoder → Dequantizer → IDCT → Reconstructed image

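Discrete Cosine Transform (DCT)

Forward DCT of an 8x8 block:

$$F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}$$

Inverse DCT:

$$f(x,y) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u)\, C(v)\, F(u,v) \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}$$

where $C(u), C(v) = 1/\sqrt{2}$ for $u = 0$ and $v = 0$, and $C(u), C(v) = 1$ otherwise.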

Quantization

• Quantization is a many-to-one mapping and thus lossy. It is the principal source of distortion in a DCT-based encoder.

• Quantization is defined as division of each DCT coefficient by its corresponding quantizer step size, followed by rounding to the nearest integer (the result is normalized by the quantizer step size):

$$F^{Q}(u,v) = \mathrm{round}\!\left(\frac{F(u,v)}{Q(u,v)}\right)$$

Entropy Coding

• each quantized DC coefficient encoded as difference from DC term of previous block in encoding order;

• this special treatment is worthwhile as DC coefficients frequently contain a significant fraction of total image energy.

DCT/Quantization Example

a) Input image

139 144 149 153 155 155 155 155
144 151 153 156 159 156 156 156
150 155 160 163 158 156 156 156
159 161 162 160 160 159 159 159
159 160 161 162 162 155 155 155
161 161 161 161 160 157 157 157
162 162 161 163 162 157 157 157
162 162 161 161 163 158 158 158

b) Forward DCT coefficients

1259.6   -1.0  -12.1   -5.2    2.1   -1.7   -2.7    1.3
 -22.6  -17.5   -6.2   -3.2   -2.9   -0.1    0.4   -1.2
 -10.9   -9.3   -1.6    1.5    0.2   -0.9   -0.6   -0.1
  -7.1   -1.9    0.2    1.5    0.9   -0.1    0.0    0.3
  -0.6   -0.8    1.5    1.6   -0.1   -0.7    0.6    1.3
   1.8   -0.2    1.6   -0.3   -0.8    1.5    1.0   -1.0
  -1.3   -0.4   -0.3   -1.5   -0.5    1.7    1.1   -0.8
  -2.6    1.6   -3.8   -1.8    1.9    1.2   -0.6   -0.4

c) Quantization table

16  11  10  16  24  40  51  61
12  12  14  19  26  58  60  55
14  13  16  24  40  57  69  56
14  17  22  29  51  87  80  62
18  22  37  56  68 109 103  77
24  35  55  64  81 104 113  92
49  64  78  87 103 121 120 101
72  92  95  98 112 100 103  99

d) Quantized DCT coefficients

79   0  -1   0   0   0   0   0
-2  -1   0   0   0   0   0   0
-1  -1   0   0   0   0   0   0
-1   0   0   0   0   0   0   0
 0   0   0   0   0   0   0   0
 0   0   0   0   0   0   0   0
 0   0   0   0   0   0   0   0
 0   0   0   0   0   0   0   0

e) Dequantized DCT coefficients

1264    0  -10    0    0    0    0    0
 -24  -12    0    0    0    0    0    0
 -14  -13    0    0    0    0    0    0
 -14    0    0    0    0    0    0    0
   0    0    0    0    0    0    0    0
   0    0    0    0    0    0    0    0
   0    0    0    0    0    0    0    0
   0    0    0    0    0    0    0    0

f) Reconstructed image

142 144 147 150 152 153 154 154
149 150 153 155 156 157 156 156
157 158 159 161 161 160 159 158
162 162 163 163 162 160 158 157
162 162 162 162 161 158 156 155
160 161 161 161 160 158 156 154
160 160 161 162 161 160 158 157
160 161 163 164 164 163 161 160
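The panels of this example follow from the DCT and quantization definitions, so they can be recomputed. The sketch below is a direct, unoptimized implementation of the 8x8 forward and inverse DCT; it uses the input block of panel (a) and the quantization table of panel (c), with no level shift, as the DC value near 1259.6 implies. The function names are illustrative, and Python's `round` is used for the rounding step.

```python
import math

BLOCK = [
    [139, 144, 149, 153, 155, 155, 155, 155],
    [144, 151, 153, 156, 159, 156, 156, 156],
    [150, 155, 160, 163, 158, 156, 156, 156],
    [159, 161, 162, 160, 160, 159, 159, 159],
    [159, 160, 161, 162, 162, 155, 155, 155],
    [161, 161, 161, 161, 160, 157, 157, 157],
    [162, 162, 161, 163, 162, 157, 157, 157],
    [162, 162, 161, 161, 163, 158, 158, 158],
]
QTABLE = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def c(k):
    return 1 / math.sqrt(2) if k == 0 else 1.0

def basis(n, k):
    return math.cos((2 * n + 1) * k * math.pi / 16)

def fdct(f):
    """Forward 8x8 DCT; the first index is the row (vertical) direction."""
    return [[0.25 * c(u) * c(v) * sum(f[x][y] * basis(x, u) * basis(y, v)
                                      for x in range(8) for y in range(8))
             for v in range(8)] for u in range(8)]

def idct(F):
    """Inverse 8x8 DCT."""
    return [[0.25 * sum(c(u) * c(v) * F[u][v] * basis(x, u) * basis(y, v)
                        for u in range(8) for v in range(8))
             for y in range(8)] for x in range(8)]

coeffs = fdct(BLOCK)
quantized = [[round(coeffs[u][v] / QTABLE[u][v]) for v in range(8)] for u in range(8)]
dequantized = [[quantized[u][v] * QTABLE[u][v] for v in range(8)] for u in range(8)]
reconstructed = [[round(val) for val in row] for row in idct(dequantized)]

for row in quantized:
    print(row)
```

Only a handful of quantized coefficients are nonzero, which is what makes the subsequent entropy coding effective.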

JPEG Lossless Mode

1. JPEG discovered that a DCT-based lossless mode was difficult to define as a practical standard against which encoders and decoders could be independently implemented, without placing severe constraints on both encoder and decoder implementations.
• Instead, JPEG has chosen a simple predictive method which is wholly independent of DCT processing.

2. The predictive method produces results which, in light of its simplicity, are surprisingly close to the state of the art for lossless continuous-tone compression.
• Lossless codecs typically produce around 2:1 compression for color images with moderately complex scenes.

3. A predictor combines the values of up to 3 neighbouring samples (A, B, C) to form a prediction of the sample indicated by X, and the difference is encoded losslessly by either Huffman or arithmetic coding.
• The encoder can use any source image precision from 2 to 16 bits per sample, and can use any of the predictors except selection value 0.
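The slide does not list the predictors themselves. The sketch below uses the seven predictors as I recall them from ITU-T T.81 (A = left, B = above, C = above-left neighbour), together with a simplified boundary convention (half the sample range for the first sample, A on the first row, B in the first column); treat both as assumptions. Only the prediction residuals are shown; the Huffman or arithmetic coding stage is omitted.

```python
def predict(selection, a, b, c):
    """Predictors for selection values 1..7 (A = left, B = above, C = above-left)."""
    if selection == 1:
        return a
    if selection == 2:
        return b
    if selection == 3:
        return c
    if selection == 4:
        return a + b - c
    if selection == 5:
        return a + ((b - c) >> 1)
    if selection == 6:
        return b + ((a - c) >> 1)
    if selection == 7:
        return (a + b) >> 1
    raise ValueError("selection must be in 1..7")

def _prediction(img, x, y, selection, precision):
    if x == 0 and y == 0:
        return 1 << (precision - 1)   # assumed start-of-scan value
    if y == 0:
        return img[y][x - 1]          # first row: left neighbour only
    if x == 0:
        return img[y - 1][x]          # first column: neighbour above only
    return predict(selection, img[y][x - 1], img[y - 1][x], img[y - 1][x - 1])

def residuals(img, selection, precision=8):
    """Prediction errors that would be passed to the entropy coder."""
    return [[img[y][x] - _prediction(img, x, y, selection, precision)
             for x in range(len(img[0]))] for y in range(len(img))]

def reconstruct(res, selection, precision=8):
    """Exact inverse: rebuild the samples in raster order from the residuals."""
    img = [[0] * len(res[0]) for _ in res]
    for y in range(len(res)):
        for x in range(len(res[0])):
            img[y][x] = res[y][x] + _prediction(img, x, y, selection, precision)
    return img

image = [[139, 144, 149, 153], [144, 151, 153, 156], [150, 155, 160, 163]]
print(residuals(image, selection=4))
```

The residuals are small for smooth image regions, which is why entropy coding them gives a gain even though nothing is discarded.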

JPEG Progressive Mode

• Each image component is encoded in multiple scans rather than in a single scan.

• The first scan encodes a rough but recognizable version of the image, which can be transmitted quickly in comparison to the total transmission time.

• Subsequent scans refine the image progressively until it reaches the level of picture quality that was established by the quantization tables.

• This requires the addition of an image-sized buffer memory at the output of the quantizer, before the input to the entropy encoder.

• The buffer memory must be large enough to store the image as quantized DCT coefficients, each of which is 3 bits larger than the source image samples.

• There are two complementary methods by which a block of quantized DCT coefficients may be partially encoded.

• In the first method, only N low-frequency DCT coefficients need to be encoded. This is called spectral selection. The other, high-frequency coefficients are sent in succeeding scans.

• In the second method, the coefficients need not be encoded to their full (quantized) accuracy in a given scan.
• Initially the N most significant bits are encoded.
• In subsequent scans, the less significant bits can be encoded.
• This is called successive approximation.

JPEG Hierarchical Mode (pyramidal coding)

a) The image is filtered and down-sampled (decimated) by the desired number of multiples of 2 in each dimension.

b) Encode the reduced-size image using one of the sequential DCT, progressive DCT or lossless encoders.

c) Decode this reduced-size image, then interpolate and up-sample it by a factor of 2 horizontally and vertically, using the identical interpolation filter that the receiver must also use.

d) Use this up-sampled image as a prediction of the original at this resolution and encode the difference image using one of the sequential DCT, progressive DCT or lossless encoders.

e) Repeat until the full resolution of the image is encoded.

• Encoding in steps (b) and (d) may be done using only DCT-based processes, only lossless processes, or DCT-based processes with a final lossless process for each component.

• Useful in applications in which a very high resolution image must be accessed by a lower-resolution device, which does not have the buffer capacity to reconstruct the image at its full resolution and then scale it down for lower-resolution display.
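The hierarchical procedure can be sketched on a tiny image. The filters here are assumptions chosen for brevity (2x2 averaging for down-sampling, pixel replication for up-sampling), and every layer is kept lossless so that the round trip is exact; JPEG itself leaves the encoder's filtering choice open and only fixes what the decoder does.

```python
def downsample(img):
    """Halve each dimension by averaging 2x2 neighbourhoods (assumed filter)."""
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1] + 2) // 4
             for x in range(len(img[0]) // 2)] for y in range(len(img) // 2)]

def upsample(img):
    """Double each dimension by pixel replication (assumed interpolation)."""
    out = []
    for row in img:
        wide = [v for v in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out

def encode_pyramid(img, levels):
    """Return the lowest-resolution image plus one difference image per level."""
    pyramid = [img]
    for _ in range(levels):
        pyramid.append(downsample(pyramid[-1]))
    base, diffs, recon = pyramid[-1], [], pyramid[-1]
    for target in reversed(pyramid[:-1]):
        pred = upsample(recon)  # prediction from the lower resolution
        diffs.append([[t - p for t, p in zip(tr, pr)] for tr, pr in zip(target, pred)])
        recon = target          # lossless layers: decoder reproduces target exactly
    return base, diffs

def decode_pyramid(base, diffs, upto=None):
    """Rebuild the image, optionally stopping after `upto` refinement layers."""
    recon = base
    for diff in diffs[:upto]:
        pred = upsample(recon)
        recon = [[p + d for p, d in zip(pr, dr)] for pr, dr in zip(pred, diff)]
    return recon

image = [[10, 12, 20, 22], [11, 13, 21, 23], [30, 32, 40, 42], [31, 33, 41, 43]]
base, diffs = encode_pyramid(image, levels=2)
print(base, decode_pyramid(base, diffs, upto=1))
```

A low-resolution device can stop after the base image or the first difference layer, which is the use case described in the last bullet above.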

JPEG

• Key technology
• Lossless coding scheme: predictive coding method using neighboring pixel values
• Lossy mode: the well-known DCT, which serves as the baseline of JPEG
• Block artifacts

[Figure: block-based DCT (JPEG) results for Cameraman and Lena at 8, 0.8, 0.5 and 0.15 bits/pixel]

JPEG 2000

• History
• Started in 1998;
• Aimed at improving the quality and capability of JPEG;
• Approved as an international standard in 2002.

• Application
• Remote sensing, color fax, printing, scanning, digital photography, medical imagery, digital libraries/archives, Internet, e-commerce, etc.

JPEG 2000 Tiling

• Each color component is divided into rectangular tiles (e.g. 64x64 non-overlapping blocks).

• Each tile is encoded independently. For each tile, apply the wavelet transform and quantization, form precincts and code-blocks, then apply EBCOT and AC.

• Bad: slightly lower compression efficiency with tiling than without.

• Good: lower memory requirement, random access.
• Arithmetic coding used: MQ-coder (also used in JBIG2, similar to the QM-coder in JPEG)

Block Diagram of JPEG 2000 (Encoder)

• Preprocessing
• Subtract 128 from each (unsigned) input value

• Forward Inter-component Transform
• RGB to YUV (reversible) or YCbCr (irreversible)
• Decorrelation, possible 2:1 subsampling in UV or CbCr
• Subsampling is a useful data reduction method in JPEG, but not so much in JPEG 2000 (which can discard HL, LH, HH instead)

[Figure: reversible and irreversible color transform matrices]
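The reversible RGB-to-YUV option is an integer transform whose inverse recovers the input exactly. The sketch below uses the form of the reversible color transform as I recall it from JPEG 2000 Part 1 (Y = floor((R + 2G + B)/4), U = B - G, V = R - G); the formula and the function names should be read as assumptions to check against the standard.

```python
def rct_forward(r, g, b):
    """Reversible color transform on integer samples."""
    y = (r + 2 * g + b) // 4
    u = b - g
    v = r - g
    return y, u, v

def rct_inverse(y, u, v):
    """Exact inverse: floor((R+2G+B)/4) equals G + floor((U+V)/4)."""
    g = y - (u + v) // 4
    r = v + g
    b = u + g
    return r, g, b

print(rct_forward(200, 100, 50), rct_inverse(*rct_forward(200, 100, 50)))
```

Because only integer additions and floor divisions are involved, no rounding error accumulates, which is what makes this path usable for lossless coding.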

• Forward Intra-component Transform
• Discrete Wavelet Transform (DWT) of non-overlapping tiles
• Daubechies 9-tap/7-tap wavelet filter for the irreversible/lossy transform
• 5-tap/3-tap filter for the reversible/lossless transform (allows repeated encoding/decoding without additional loss)
• Implemented by convolution or lifting, using periodic symmetric extension to handle boundary effects
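Lifting makes the reversible 5/3 transform easy to see: a predict step produces the high-pass samples and an update step produces the low-pass samples, both with integer rounding that the inverse undoes exactly. This is a one-level, one-dimensional sketch for even-length signals, using the 5/3 lifting steps as I recall them from JPEG 2000 and simple mirror extension at the borders; the names are illustrative.

```python
def fwd53(x):
    """One level of the reversible 5/3 lifting transform (even-length input)."""
    n = len(x)
    assert n >= 2 and n % 2 == 0
    half = n // 2
    d = []
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]  # mirror at right edge
        d.append(x[2 * i + 1] - (left + right) // 2)          # predict step
    s = []
    for i in range(half):
        d_left = d[i - 1] if i > 0 else d[0]                  # mirror at left edge
        s.append(x[2 * i] + (d_left + d[i] + 2) // 4)         # update step
    return s, d

def inv53(s, d):
    """Exact inverse: undo the update step, then the predict step."""
    half = len(s)
    n = 2 * half
    x = [0] * n
    for i in range(half):
        d_left = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - (d_left + d[i] + 2) // 4
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]
        x[2 * i + 1] = d[i] + (left + right) // 2
    return x

signal = [10, 12, 15, 20, 18, 11, 9, 9]
low, high = fwd53(signal)
print(low, high, inv53(low, high))
```

For a two-dimensional tile the same step is applied along rows and then columns, and repeated on the low-pass band for further decomposition levels.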

Image wavelet transform

• Quantization
• Step size calculated from rate control

• Precinct partitioning
• 3 spatially consistent rectangles are grouped to form a packet partition location, or precinct
• Each precinct is divided into non-overlapping code-blocks, scanned in a particular order

• Tier-1 Encoder
• Bit planes of the coefficients within a code-block are entropy encoded using embedded block coding with optimal truncation (EBCOT)
• EBCOT: Arithmetic Coding (AC) followed by post-compression rate-distortion (PCRD) optimized truncation
• A certain ROI can be coded at higher quality

• Tier-2 Encoder
• Referred to as packetization.

EBCOT (Embedded Block Coding with Optimal Truncation)

EBC (Embedded Block Coding)

• Each bit plane of a code-block is encoded by 3 coding passes
• Significance propagation pass (AC or direct)
• Magnitude refinement pass (AC or direct)
• Cleanup pass (AC)

• Exception: the highest bit plane is encoded with the cleanup pass only

JPEG 2000

[Figure: JPEG 2000 results for Cameraman and Lena at 8, 0.8, 0.5 and 0.15 bits/pixel]

Comparison of JPEG and JPEG 2000

• JPEG 2000 compared with JPEG

Advantages
• Better compression performance at low bit rate
• Spatial and quality scalability
• No blocking artifacts

Disadvantages
• No substantial improvement at medium and high bit rate
• More complex than JPEG

JPEG Derived Industry Standards

• JFIF (JPEG File Interchange Format, XXX.jpg);
• JTIP (JPEG Tiled, Pyramid Format);
• TIFF (Tagged Image File Format);
• SPIFF (Still Picture Interchange File Format, JPEG Part 3);
• FlashPix
• Developed by Kodak, Hewlett-Packard, Microsoft (1996);
• Widely used in digital still cameras.


Video Compression

• Applications
• DVD, digital TV, HDTV, video telephony, and teleconferencing

• Standardization process

[Figure: standardization timeline of VCEG and MPEG, showing MPEG-1, H.262/MPEG-2, MPEG-4 and H.264/MPEG-4 AVC]

Two types of applications

1. Asymmetric applications: infrequent use of the compressor (complex) but frequent use of the decompressor (simple).
• e.g. electronic publishing, education and training, travel guidance, videotext, point-of-sale, video games, entertainment (movies), video-on-demand (VOD), etc.

2. Symmetric applications: essentially equal use of compressor and decompressor.
• e.g. video mail, videophone, video-conferencing, generation of material for playback-only applications, etc.

* MPEG is for asymmetric applications. H.261/263 are for symmetric applications.

Requirements for compressed video on digital storage media

1. Random access: any frame decodable in less than 0.5 second. Need access points, i.e. segments of information coded only with reference to themselves.

2. Fast forward/reverse searches: possible to scan a compressed bit stream and display selected frames to obtain a fast forward or fast reverse effect (a more demanding form of random access).

3. Reverse playback: impossible without an extreme additional cost in memory.

4. Audio-video synchronization: need a mechanism to resynchronize audio and video should they be derived from slightly different clocks.

5. Robustness to errors: avoid catastrophic behavior in the presence of errors in storage media or transmission channels.

6. Coding/decoding delay: video-conferencing applications need to keep the total system delay under 150 ms in order to maintain conversation. Publishing applications can allow a long encoding delay, but need a short interactive-threshold decoding delay of 1 sec.

7. Editability: editing units of short time duration, coded only with reference to themselves, are needed for editability in compressed form.

8. Format flexibility: allow for a large flexibility of format in terms of raster size (width, height) and frame rate.

9. Cost tradeoff: decoder implementable in a small number of chips.

Video Compression

• General structure of the video codec
• Intra-frame: exploits the spatial correlation to predict the signal
• Inter-frame: exploits the temporal correlation to further reduce the redundancies

Temporal Redundancy Reduction

Three types of pictures:

1) Intra pictures (I): provide access points for random access, but only moderate compression (~10:1)

2) Predictive pictures (P): coded with reference to a past picture (I or P); used as a reference for future P pictures; higher compression (~20:1)

3) Bidirectionally predicted pictures (B): provide the highest amount of compression (~40:1) but require both a past and a future reference for prediction; not used as a reference.

MPEG-1/2/4 Video Compression

• Raw video is huge in size for transmission/storage (e.g. 150 GB for a 120-minute movie). Need compression.

• Compression is achieved by exploiting 4 characteristics:
• statistical redundancy, perceptual irrelevancy, spatial redundancy: also used in image compression
• temporal redundancy (Motion Estimation): unique to, and most important for, video compression

[Figure: frames 20 and 21 of the tennis sequence, and their frame difference without and with motion estimation]

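Temporal Redundancy Reduction

Advantage of DPCM over PCM

Block-based Motion Estimation

Goal: to establish blockwise correspondence between two frames.

$$512\ \frac{\text{ops}}{\text{search}} \times 961\ \frac{\text{searches}}{\text{block}} \times 396\ \frac{\text{blocks}}{\text{frame}} \times 30\ \frac{\text{frames}}{\text{sec}} \approx 5.8 \times 10^{9}\ \text{ops/sec}$$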
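The operation count above corresponds to exhaustive (full-search) block matching: every candidate displacement in the search window is scored, for example with the sum of absolute differences. The sketch below runs a full search on a small synthetic frame pair; the frame contents, block size and search range are arbitrary test values, not figures from the slides, apart from the final product.

```python
import random

def block_sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in cur and its displaced match in ref."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))

def full_search(cur, ref, bx, by, n, search_range):
    """Try every displacement in [-search_range, search_range]^2; keep the lowest SAD."""
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            inside = (0 <= by + dy and by + dy + n <= len(ref)
                      and 0 <= bx + dx and bx + dx + n <= len(ref[0]))
            if not inside:
                continue  # candidate block would leave the reference frame
            cost = block_sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost

# Synthetic test: the current frame is the reference shifted by a known vector.
rng = random.Random(0)
H = W = 32
TRUE_MV = (3, -2)
ref = [[rng.randrange(256) for _ in range(W)] for _ in range(H)]
cur = [[ref[min(max(y + TRUE_MV[1], 0), H - 1)][min(max(x + TRUE_MV[0], 0), W - 1)]
        for x in range(W)] for y in range(H)]

mv, cost = full_search(cur, ref, bx=12, by=12, n=8, search_range=4)
print(mv, cost)

# Cost figure from the slide: +/-15 search range, 396 blocks per frame, 30 fps.
searches_per_block = (2 * 15 + 1) ** 2
ops_per_second = 512 * searches_per_block * 396 * 30
print(searches_per_block, ops_per_second)
```

The quadratic growth of the candidate count with the search range is what motivates the fast search methods discussed next.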

ME is too computation-intensive!

[Figure: computational distribution in an MPEG-4 encoder]

Error Surface

• Multi-modal

• Easily trapped in a local minimum

Fast Motion Estimation

Full Search (41.79 dB, slow) vs. PMVFAST (41.85 dB, speedup = 1125)

“Foreman” test sequence, medium bit rate 512 kbit/s, medium resolution CIF, 15 fps, SA32

Adaptive Quantization/Rate Control

$$c_{ijq} = \mathrm{round}\!\left(\frac{8\, c_{ij}}{Q_{ij}\, Q_p}\right)$$

where
c_ij = ij-th DCT coefficient of an 8x8 block,
c_ijq = quantized c_ij,
Q_ij = ij-th entry in the quantization table,
Q_p = extra quantization step.

• For a given frame, Q_ij (the quantization table) is fixed. But Q_p can be changed on a block-by-block basis.
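The effect of the block-level step Q_p can be seen with a one-line quantizer. The sketch assumes the quantization rule c_q = round(8 c / (Q_ij Q_p)) and the matching reconstruction c ≈ c_q Q_ij Q_p / 8; the reconstruction rule and the function names are assumptions for illustration.

```python
def quantize(c, q_ij, q_p):
    """Quantize one DCT coefficient with table entry q_ij and block-level step q_p."""
    return round(8 * c / (q_ij * q_p))

def dequantize(c_q, q_ij, q_p):
    """Assumed inverse mapping back to the coefficient scale."""
    return c_q * q_ij * q_p / 8

coefficient, table_entry = 100, 16
for q_p in (4, 8, 16, 32):
    level = quantize(coefficient, table_entry, q_p)
    print(q_p, level, dequantize(level, table_entry, q_p))
```

Raising Q_p for a block produces smaller levels and therefore fewer bits, at the price of a coarser reconstruction; rate control uses this knob to keep the output bit rate on target.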

Error concealment

a) Two types of I-frame errors:
• Loss of HP cells, resulting in loss of headers, DC and low-order AC coefficients, which results in serious degradation of the video signal.
• Loss of SP cells, resulting in loss of higher-order AC coefficients, which results in loss of detail in the blocks being reconstructed.

b) Three types of P- and B-frame errors:
• Loss of an HP cell, resulting in total loss of information pertaining to the image region represented by the coded bits
• Loss of an SP cell containing motion information
• Loss of an SP cell containing no motion information (ignore)

c) DC synthesis (I-frames)
• DC values are synthesized by bilinear interpolation from the nearest blocks in the top and bottom macroblocks. Since a cell loss generally causes loss of data in a series of macroblocks, the horizontal neighbors are not used for synthesis.

d) AC synthesis (I-frames)
• In order to reduce the effect of distinct block boundaries, some of the low-order AC coefficients have to be synthesized. The five lowest-order AC coefficients (in zigzag order) are synthesized using some method.

Error concealment

e) Prediction mode (P- and B-frames)
• For a P-frame, if the top or bottom macroblock is coded as forward predicted, the damaged macroblock is assigned forward mode.
• If both neighbors are coded in intra mode, the damaged macroblock is assigned intra mode. Similar for B-frames.

f) Motion vectors (P- and B-frames)
• If both top and bottom vectors are defined, then the average of the motion vectors is used for the synthesized macroblock.
• If only one of the vertical neighbors has valid motion vector(s) defined, then this vector(s) is used.
• If no motion vector is available, the macroblock is synthesized by the intra-frame technique, as in I-frames.
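The motion-vector rule in (f) is a three-way case split, sketched below. Vectors are (dx, dy) tuples and `None` marks a neighbor without a valid vector; the function name and the use of `None` as the fallback signal are illustrative choices.

```python
def conceal_motion_vector(top_mv, bottom_mv):
    """Estimate the motion vector of a lost macroblock from its vertical neighbors.

    Returns None when neither neighbor has a vector, in which case the
    macroblock would be synthesized with the intra-frame technique instead.
    """
    if top_mv is not None and bottom_mv is not None:
        return ((top_mv[0] + bottom_mv[0]) / 2, (top_mv[1] + bottom_mv[1]) / 2)
    if top_mv is not None:
        return top_mv
    if bottom_mv is not None:
        return bottom_mv
    return None

print(conceal_motion_vector((4, -2), (2, 0)))
print(conceal_motion_vector(None, (2, 0)))
print(conceal_motion_vector(None, None))
```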

Error concealment

Video Standards

• CCIR 601 (ITU-R)
• H.261 (ITU-T)
• H.263 (ITU-T)
• H.264/MPEG-4 AVC (ITU-T + ISO)
• H.265
• M-JPEG (ISO)
• MPEG-1 (ISO)
• MPEG-2 (ITU-T + ISO)
• MPEG-4 (ISO)
• VC-1 (SMPTE)
• AVS

H.261

• Target
• International standard for ISDN picture phones and for video conferencing systems (1990)
• Image format: CIF (352x288) or QCIF (176x144), frame rate 7.5 ... 30 fps
• Bit rate: multiple of 64 kbps, typically 128 kbps including audio
• Picture quality: acceptable at 128 kbps with limited motion in the scene

• Basic Properties
• The very first of the H.26x standards in the domain of VCEG
• I-frame + P-frame
• I-frame: basically the same as JPEG
• P-frame: motion estimation/compensation + JPEG
• Motion estimation/compensation
• Loop filter

• Ratified in November 1988
• The first widespread practical success
• Largely overtaken by H.263
• Designed to operate at video bit rates between 40 kbit/s and 2 Mbit/s
• Still used as a backward-compatibility mode in some video conferencing systems and for some types of Internet video

MPEG-1

• Target
• Target bit rate about 1.5 Mbit/s
• Typical image format CIF, no interlace
• Frame rate 24 ... 30 fps
• Main application: video storage for multimedia (e.g., on CD-ROM)

• Basic Properties
• Designed for CD-ROM applications by MPEG
• I-frame + P-frame + B-frame
• P-frame: unidirectional motion compensation
• B-frame: bi-directional motion compensation
• Half-pixel ME

• Final standard was approved in November 1992
• Use was fairly widespread, but mostly overtaken by MPEG-2
• Can provide approximately VHS quality between 1 and 2 Mbps
• Application:
• MP3
• VCD
• DVD
• CD-ROM
• DVB (Digital Video Broadcasting)
• DAB (Digital Audio Broadcasting)

H.262/MPEG-2

• Target
• Extension for interlace, optimized for TV resolution (NTSC: 704 x 480 pixels)
• Image quality similar to NTSC, PAL, SECAM at 4-8 Mbit/s
• HDTV at 20 Mbit/s

• Basic Information
• Meets the need of entertainment TV for transmission media.
• Frame/field pictures;
• Scalability tools were first defined here as functionality tools.

SNR scalability

• EI and EP have the same resolution as I and P

• EI is predicted from I

• EP is predicted from P or from the previous EI or EP

Spatial scalability

• Similar to SNR scalability, except that I and P are half the size of EI and EP

• Prediction from I and P involves enlarging them by a factor of 2

Multilayer scalability

H.262/MPEG-2

• First video compression codec released in 1995
• Now in wide use for the DVD standard and DTV
• The most commonly used video coding standard
• Range of use normally 2-20 Mbps
• Application
• DVD (NTSC & PAL)
• HDV (high-definition video on DV cassette tape)
• MOD and TOD (digital tapeless camcorders)
• VOD (Video On Demand)
• ATSC
• XDCAM
• ISDB-T
• DVB

H.263

• Target
• International standard for picture phones over analog subscriber lines (1995)
• Image format usually CIF, QCIF or Sub-QCIF, frame rate usually below 10 fps
• Bit rate: arbitrary, typically 20 kbps for PSTN
• Picture quality: with the new options, as good as H.261 (at half the rate)
• Widely used as the compression engine for Internet video streaming

• Basic Properties
• Designed for video conferencing at a low bit rate in the mobile wireless communication scenario.
• 8x8 motion compensation;
• 8x8 block DCT

• Ratified in March 1996
• Application
• Widely used as the compression engine for Internet video streaming (YouTube, Google Video, Myspace)
• Also used in H.323 (RTP/IP-based video conferencing), RTSP and SIP (IP-based videoconferencing) solutions
• Low bit-rate compressed format
• MMS (mobile multimedia message)
• Video telephony

MPEG‐4

• Target• Object based coding;• Wide‐range of applications, with choices of interactivity, scalability, error resilience, etc.

• Basic Properties• A lot of new coding tools:

• Interactive graphics;• Object and shape coding;• Scalable video coding.

• Robust transmission;• Can encode mixed media data.

• Introduced in late 1998, still a developing standard• Efficient across a variety of bit‐rates ranging from a few kbps to tens of Mbps.

• Supports a variety of bit‐rates• Application

• Internet video streaming• Wireless video• Studio editing• Video database• Interactive video• Video conferencing/e‐mail• Games• Education

H.264/MPEG‐4 AVC

• Target• Halve the bit‐rate compared to MPEG‐2, H.263 or MPEG‐4 Part 2

• Basic Properties• Finalized in 2003• Variable block‐size motion compensation• Quarter‐pixel precision for motion compensation• Powerful entropy coding techniques:

• Context‐Adaptive Binary Arithmetic Coding (CABAC)• Context‐Adaptive Variable‐Length Coding (CAVLC)

• Scalable Video Coding (SVC)• Multiview Video Coding (MVC)• Integer‐based transform

H.264/MPEG‐4 AVC

• First drafting work was completed in May 2003• Broad Application

• Blu‐ray Discs• Streaming internet source (Vimeo, YouTube, iTunes Store)• Web software (Adobe Flash Player, Microsoft Silverlight)• HDTV broadcasts (ATSC, DVB‐T, DVB‐C, DVB‐S)• CCTV (Closed Circuit TV) and Video surveillance

Advanced Techniques of H.264

• New Intra Prediction Method• Advanced Inter Prediction Method

• Multiple Reference Frames• Multiple Block Size ME (e.g. 16x8, 8x8, 4x4)• Better ME accuracy (1/4 pixel)

• 4x4 integer transform• In‐Loop Deblocking Filter• Better entropy encoding (CABAC)

4x4 Intra‐prediction (luma)

4x4 Intra Prediction Modes

16x16 Intra Prediction (luma)

Mode 0 (Vertical) Mode 1 (Horizontal)

Mode 2 (DC) Mode 3 (Plane)

Complicated!
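The first three 16x16 modes listed above can be sketched as follows. This is a minimal illustration that works for any block size N: `top` is the row of reconstructed pixels above the block and `left` is the column to its left. Mode 3 (Plane) is deliberately omitted, and the SAD‐based mode choice is an assumption about how an encoder might pick a mode, not a rule from the standard.

```python
def intra_predict(top, left, mode):
    """Predict an NxN block from its neighbours (modes 0-2 only)."""
    n = len(top)
    if mode == 0:    # Vertical: copy the row above into every row
        return [list(top) for _ in range(n)]
    if mode == 1:    # Horizontal: copy the left column into every column
        return [[left[y]] * n for y in range(n)]
    if mode == 2:    # DC: rounded mean of the top and left neighbours
        dc = (sum(top) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise NotImplementedError("Mode 3 (Plane) is not covered by this sketch")

def best_mode(block, top, left):
    """Pick the mode with the smallest sum of absolute differences (SAD)."""
    costs = []
    for mode in (0, 1, 2):
        pred = intra_predict(top, left, mode)
        cost = sum(abs(b - p)
                   for brow, prow in zip(block, pred)
                   for b, p in zip(brow, prow))
        costs.append((cost, mode))
    return min(costs)[1]

top = [1, 2, 3, 4]
left = [5, 6, 7, 8]
block = [[1, 2, 3, 4]] * 4        # vertical stripes, so mode 0 should win
print(best_mode(block, top, left))
```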

8x8 intra‐prediction (chroma)

Coding intra‐prediction mode

ME: Multiple reference frames

Frame N, N-1, N-2, N-3, N-4, N-5
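Searching several reference frames can be sketched as a full‐search block matcher that keeps the best match over all frames. This is a simplified illustration: the SAD cost, the integer‐pixel search and the function names are assumptions, and `refs[0]` is taken to be frame N-1, `refs[1]` frame N-2, and so on.

```python
def sad(cur, ref, by, bx, ry, rx, size):
    """Sum of absolute differences between a current block and a reference block."""
    return sum(abs(cur[by + i][bx + j] - ref[ry + i][rx + j])
               for i in range(size) for j in range(size))

def search_multi_ref(cur, refs, by, bx, size, search_range):
    """Return (cost, ref_idx, dy, dx) of the best match over all reference frames."""
    h, w = len(cur), len(cur[0])
    best = None
    for ref_idx, ref in enumerate(refs):
        for dy in range(-search_range, search_range + 1):
            for dx in range(-search_range, search_range + 1):
                ry, rx = by + dy, bx + dx
                if ry < 0 or rx < 0 or ry + size > h or rx + size > w:
                    continue  # candidate block falls outside the frame
                cand = (sad(cur, ref, by, bx, ry, rx, size), ref_idx, dy, dx)
                if best is None or cand < best:
                    best = cand
    return best

# Toy data: the current 2x2 block at (2, 2) is a copy of refs[1] at (3, 1).
ref_a = [[0] * 8 for _ in range(8)]
ref_b = [[y * 8 + x for x in range(8)] for y in range(8)]
cur = [[0] * 8 for _ in range(8)]
for i in range(2):
    for j in range(2):
        cur[2 + i][2 + j] = ref_b[3 + i][1 + j]

print(search_multi_ref(cur, [ref_a, ref_b], 2, 2, 2, 2))
```

The cost of this flexibility is visible in the loop structure: the search work grows linearly with the number of reference frames.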

ME: Multiple block size

• 7 block modes in total• Motion estimation• 4x4 integer transform• Advantage

• Save bits (~ 15%, 7 modes)• Disadvantage

• Computation increase
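The seven block modes and the growth in motion vectors can be enumerated as below. The partition sizes are those of H.264; the count assumes, for simplicity, that the whole macroblock uses a single partition size, whereas in practice the sub‐8x8 sizes are chosen per 8x8 sub‐macroblock.

```python
# The 7 H.264 partition sizes as (width, height) in luma samples.
BLOCK_MODES = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]

def partitions_per_macroblock(width, height):
    """Number of partitions (and motion vectors) in a 16x16 macroblock."""
    return (16 // width) * (16 // height)

for w, h in BLOCK_MODES:
    print(f"{w}x{h}: {partitions_per_macroblock(w, h)} motion vector(s)")
```

Smaller partitions follow motion boundaries more closely, but each one needs its own search and its own motion vector, which is where the extra computation comes from.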

Transform

• 4x4 array for luma DC coefficients

• 2x2 array for chroma DC coefficients

4x4 DCT (with Hadamard transform for DC)

4x4 Integer DCT

Fast 4x4 DCT
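The 4x4 integer transform and its fast form can be sketched as follows. The matrix `CF` is the H.264 forward core transform; the scaling that makes it approximate a true DCT is folded into quantization and is not shown. The butterfly version needs only additions, subtractions and doublings, and the sketch checks that both forms give the same result.

```python
# H.264 4x4 forward core transform matrix.
CF = [[1,  1,  1,  1],
      [2,  1, -1, -2],
      [1, -1, -1,  1],
      [1, -2,  2, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(m):
    return [list(r) for r in zip(*m)]

def forward_transform(x):
    """Direct form: Y = CF * X * CF^T."""
    return matmul(matmul(CF, x), transpose(CF))

def butterfly_1d(v):
    """Fast 1-D transform of 4 samples using adds, subtracts and doublings only."""
    s0, s1 = v[0] + v[3], v[1] + v[2]
    d0, d1 = v[0] - v[3], v[1] - v[2]
    return [s0 + s1, 2 * d0 + d1, s0 - s1, d0 - 2 * d1]

def fast_forward_transform(x):
    """Separable form: transform every row, then every column."""
    rows = [butterfly_1d(r) for r in x]
    cols = [butterfly_1d(c) for c in transpose(rows)]
    return transpose(cols)

x = [[5, 11, 8, 10],
     [9, 8, 4, 12],
     [1, 10, 11, 4],
     [19, 6, 15, 7]]
print(forward_transform(x) == fast_forward_transform(x))
```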

Quantization of DCT coeff.

QP and QStep
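The relation between QP and QStep in H.264 can be sketched as below: QStep doubles for every increase of 6 in QP, starting from six base values. The quantizer shown is a plain floating‐point scalar quantizer for illustration only; it is an assumption and does not reproduce the integer arithmetic or rounding offsets of the reference software.

```python
# QStep for QP = 0..5; the values repeat, doubled, every 6 QP steps.
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]

def qstep(qp):
    """Quantizer step size for an H.264 QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51")
    return QSTEP_BASE[qp % 6] * (2 ** (qp // 6))

def quantize(coeff, qp):
    """Illustrative scalar quantizer: round |coeff| / QStep to the nearest integer."""
    sign = -1 if coeff < 0 else 1
    return sign * int(abs(coeff) / qstep(qp) + 0.5)

for qp in (0, 4, 10, 28, 51):
    print(qp, qstep(qp))
print(quantize(100, 28), quantize(-100, 28))
```

An increase of 1 in QP raises QStep by roughly 12%, which gives the encoder fine control over the rate and quality trade‐off.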

Quantization in H.264 reference software

HEVC video compression

• Target• Targets HDTV and ultra‐HDTV compression, with substantially improved coding efficiency compared to H.264/AVC, i.e. 50% bit rate reduction

• Focus on the increasing need for parallel processing

• Basic Properties• Finalized in 2013• Large Block structure• Quad‐tree based block partition• Asymmetric mode partition• Sample adaptive offset• Tile
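Quad‐tree based block partition can be sketched as a recursive split of a large block into four equal squares. This is a minimal illustration: the 64x64 starting size, the 8x8 minimum and the `should_split` callback are assumptions, and a real encoder would base the split decision on rate‐distortion cost.

```python
def quadtree_partition(x, y, size, min_size, should_split):
    """Return the leaf blocks (x, y, size) of a quad-tree, in z-scan order."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(
                    quadtree_partition(x + dx, y + dy, half, min_size, should_split))
        return leaves
    return [(x, y, size)]

# Hypothetical decision rule: keep splitting the block that touches the
# top-left corner, as if all the detail were concentrated there.
leaves = quadtree_partition(0, 0, 64, 8, lambda x, y, size: x == 0 and y == 0)
for leaf in leaves:
    print(leaf)
```

Flat regions stay as large blocks with little signalling overhead, while detailed regions are refined down to small blocks.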

HEVC Timeline

• 2010.01: Formal joint CfP from VCEG and MPEG

• 2010.04: JCT‐VC team, HEVC joint project, full proposals

• 2010.07: TMuC SW ready, tool experiments (TE)

• 2010.10: HM SW ready, core experiments (CE)

• 2011.02: WD

• 2012.02: CD

• 2012.07: DIS

• 2012.10: SoDIS (Study of Draft International Standard)

• 2013.01: FDIS

• Mid‐2013 – mid 2014: Extensions/amendments, such as Scalable, 3D, 4:x:x, bit‐depth > 10, color…

HEVC Involved Companies

Major Applications Summary

Field | Bandwidth | Video Standards
Digital television broadcasting | 2 ... 6 Mbps (10 ... 20 Mbps for HD) | H.262/MPEG‐2, H.264/MPEG‐4 AVC
Blu‐ray/DVD video | 6 ... 8 Mbps | H.262/MPEG‐2, H.264/MPEG‐4 AVC
Internet video streaming | 20 ... 200 kbps | H.263, H.264/MPEG‐4 AVC
Videoconferencing, videotelephony | 20 ... 320 kbps | H.261, H.263, H.264/MPEG‐4 AVC
Video over 3G wireless | 20 ... 200 kbps | H.263, H.264/MPEG‐4 AVC

AVS video compression

• AVS: Audio Video Standard• Founded by the China Audio Video Coding Standard Working Group in June 2002.

• Aimed at reducing the foreign technology dependence.• Two AVS standards are finalized or to be finalized

• AVS 1• Finalized in 2008• Provides twice the coding efficiency of MPEG‐2, comparable to H.264/MPEG‐4 AVC

• Encoder and decoder complexity are only about 30% and 70%, respectively, of those of H.264/MPEG‐4 AVC.

• AVS 2• To be finalized in Dec 2013• Expected to double the coding efficiency of AVS1 at high definition or higher resolutions


Image Standards Comparison

• JPEG and JPEG 2000 simulations are conducted;• Several images are tested under various compression rates;

• Tools: 1520x1200• Bike: 2048x2560• Cafe: 2048x2560• Woman: 2048x2560

Image Standards Comparison

[Figure: four plots comparing JPEG and JPEG2000; x-axis: Bit Rate (bits/pixel)]

Video Standards Comparison

• Simulations of various video standards are conducted at various quality levels:• H.261, H.263, MPEG‐1, MPEG‐2 and MPEG‐4 are encoded using the FFmpeg software;

• H.264 uses the latest JM 18.4;• HEVC uses the latest HM 11.0;

• Two sequences are tested:• Foreman: 176x144• RaceHorses: 832x480

• All Intra condition is tested:• All the frames are encoded as I frame;• JPEG and JPEG 2000 are also included in the comparison;

• Low‐Delay condition is tested:• IPPP structure is used;

All Intra Comparison

[Figure: "RaceHorses All Intra"; x-axis: Bit-Rate (kbps), 0 to 15000; curves: HEVC, H264, MJPEG2000, MPEG4, MPEG1, MPEG2, MJPEG]

[Figure: "Foreman All Intra"; x-axis: Bit-Rate (kbps), 0 to 1400; curves: HEVC, H264, MPEG4, MPEG1, MPEG2, H263, H261, MJPEG2000, MJPEG]

Low Delay Comparison

[Figure: "Foreman Low Delay"; x-axis: Bit-Rate (kbps), 0 to 450; curves: HEVC, H264, MPEG4, H263, MPEG1, MPEG2, H261]

[Figure: panel titled "RaceHorses All Intra" on the slide; x-axis: Bit-Rate (kbps), 0 to 7000; curves: HEVC, H264, MPEG4, MPEG1, MPEG2]


Conclusion

• Image coding standards: JPEG and JPEG 2000; each has its own advantages and remains popular in certain circumstances.

• Video coding standards: MPEG‐1, MPEG‐2, MPEG‐4, H.261, H.263, H.264/MPEG‐4 AVC and HEVC, key features and applications introduced

• Experimental results: every evolution of the coding algorithms contributes greatly to compression performance

• The field is developing rapidly and its applications are widespread; continued effort on improving coding algorithms promises a bright future for image and video compression.

What Next?

• H.266?• MPEG‐5?• Wavelet transform?• Big Block?• SSIM vs. PSNR?
