IEE 5037 Multimedia Communications Lecture 12: MPEG-4 Dept. Electronics Engineering, National Chiao Tung University Adapted from Prof. Hang’s slides
Page 1: IEE 5037 Multimedia Communications Lecture 12: MPEG-4 (twins.ee.nctu.edu.tw/courses/multimedia_c_05spring/handout/MMC 12 MPEG-4.pdf)

IEE 5037 Multimedia Communications
Lecture 12: MPEG-4

Dept. Electronics Engineering, National Chiao Tung University
Adapted from Prof. Hang’s slides

Page 2:

MPEG-4 Video Coding

Part 2: Object-oriented coding
FGS - Scalable coding

To be dropped from the standard?

Page 3:

Overview of MPEG-4

Page 4:

MPEG-4

ISO/IEC …/WG11: A standard for multimedia applications
History: (Rao & Hwang, Chap. 12)
- Nov. 1992: MPEG started new work item
- Nov. 1994: Call for proposals; many submitted
- Nov. 1995: Subjective testing and tool evaluation
- Jan. 1996: Define Verification Model (VM1) (encoder)
- July 1996: Evaluate SNHC proposals
- Nov. 1996: Working draft (WD)
- Apr. 1997: Video VM 7.0 (WD 3.0)
- Nov. 1997: Committee draft (CD): ISO/IEC 14496
- Apr. 1999: International Standard (IS)
- (Now) Working on newer versions with additional features

Page 5:

MPEG-4 Documents

MPEG-4 1998 Coding of audio-visual objects

Part 1 Systems
Part 2 Visual
Part 3 Audio
Part 4 Conformance
Part 5 Reference Software
Part 6 DMIF - Delivery Multimedia Integration Framework
Part 7 Optimized Software
Part 8 MPEG 4 on IP
Part 9 Reference Hardware
Part 10 Advanced Video Coding (AVC) (JVT, H.264)
Part 11 Scene Description
Part 12 ISO base media file format
Part 13 IPMP extensions
Part 14 MP4 File format
Part 15 AVC File format
Part 16 Multimedia Animation Framework
Part 17 Streaming Text Format
Part 18 Font Compression and Streaming

Page 6:

MPEG-4 Goals

Content-based interactivity: Content-based manipulation and editing
Universal access: Robustness in error-prone environments; content-based scalability
Coding of natural and synthetic data: Merging pixel-based video/audio with synthesized graphics/audio/speech in a highly flexible way
High compression: Improved coding efficiency, particularly for low-rate applications
Flexible syntax and tools
Texture coding based on H.263

Page 7:

MPEG-4 Audio

Three core coders and some additional tools:
- Parametric coder (PARA): 2 to 16 kbit/s
- CELP-based speech coder: 4 to 24 kbit/s
- Time/frequency mapping based coder: 16 to 64 kbit/s
- SNHC audio tools: text-to-speech, structured audio, …

Features:
- Improved coding efficiency
- Time-scale change, pitch change (Karaoke)
- Scalability: bitrate, bandwidth, …
- Error resilience

Page 8:

MPEG-1 and MPEG-2

1992: MPEG-1 Standard
CD-ROM (1.5 Mbit/s)

1994: MPEG-2 Standard
Digital Television (SDTV/HDTV) (4 Mbit/s - 24 Mbit/s)

♦ Video Compression ♦ Audio Compression ♦ Systems (Multiplex)

Page 9:

MPEG-4

1999/2000: MPEG-4 Standard

Flexible Multimedia Communications

(5 kbits/s - 50 Mbit/s)

♦ Video Object Compression ♦ Audio Object Compression ♦ Synthetic Audio/Speech and Video ♦ Systems (Multiplex and flexible Composition)

Page 10:

First Things First

Top-quality MPEG-4 audio and video coders for streaming conventional speech, audio and video

• excellent AV compression
• excellent robustness against packet loss
• scalability of bitrate vs quality
• MPEG-4 File Format

MPEG-4 attempts to become THE standard for streaming AV media on the Internet and via wireless networks

Page 11:

But MPEG-4 Vision Goes MUCH Further

MPEG-4 attempts to provide a bridge between the www and conventional AV media

Page 12:

Interactivities in MPEG-4

Page 13:

Example: MPEG-4 audio-visual Scene

[Figure: example scene composed of a 2D Background, 3D Furniture, Speech, a Video Object and an AV Presentation]

Page 14:

MPEG-4 Systems: BIFS - Composition of Scenes

[Figure: scene tree. Scene: Person (Speech, Video), 2D Background, Furniture (Globe, Table), Audio-visual Presentation]

Page 15:

Integration of Natural and Synthetic Content

Page 16:

Application: Augmented Reality

Page 17:

Application: Telepresence

Page 18:

MPEG-4 New Functionalities

Streaming AV over mobile networks of much interest

More freedom to flexibly interact with what is within scenes

Support integration of natural and synthetic AV media (“Virtual Playground”)

Identification, Protection of intellectual property and rights on content

Page 19:

MPEG-4: Coding of AV Objects

AV scenes consist of ‘objects’
Objects can be natural and/or synthetic (A&V, Text & Graphics, animated faces, arbitrarily shaped or rectangular)
A ‘compositor’ composes objects in a scene (A&V, 2&3D)
Binary Format for Scene Description: ‘BIFS’
Independent of Bitrate!

Page 20:

Object Manipulation

[Figure: three panels: Original, Decoded, Decoded and Manipulated]

Page 21:

MPEG-4 Part II. Visual

Page 22:

MPEG-4 Video

[Figure: MPEG-4 Video functionalities: Compression, Error Resilience, Scalability, Content-based Coding, Still Texture Coding. Baseline: conventional coding. Extended: object coding.]

Page 23:

MPEG-4 Video Standard

MPEG-4 Video Provides Tools for a Number of Functionalities

Many tools are not used

Integrated Approach (Baseline and Extensions)

Based on DCT technology (except for Still Texture Coding, which is DWT-based)

Page 24:

MPEG-4 Baseline and Extensions

Page 25:

Compatibility Issues of MPEG-4 Video Standard

MPEG-4 Video is compatible with Baseline H.263

And almost compatible with MPEG-1

And almost compatible with MPEG-2

Page 26:

Basic Structure for Video Standard

[Figure: hybrid video coder block diagram. Blocks: Input Video Signal, Split into Macroblocks (16x16 pixels), Coder Control, Transform/Scal./Quant., Scaling & Inv. Transform, Motion Estimation, Motion Compensation, Intra/Inter switch, Decoder, Entropy Coding, Output Video Signal. Signals: Control Data, Quant. Transf. coeffs, Motion Data.]

Page 27:

Baseline: Rectangular VOP (Conventional Coding)

[Figure: hybrid video coder block diagram (Input Video Signal, Split into Macroblocks 16x16 pixels, Coder Control, Transform/Scal./Quant., Scaling & Inv. Transform, Motion Estimation, Motion Compensation, Intra/Inter switch, Decoder, Entropy Coding, Output Video Signal), with the annotations listed here]

- 8x8 DCT transform: accuracy problem (MPEG-2/4)
- Q: H.263 or MPEG-2 type
- Intra DC/AC prediction (MPEG-4), using neighbouring blocks A, B, C, D of blocks X and Y in the macroblock
- Motion vector accuracy 1/4 (6-tap filter) (MPEG-4)
- MB types: 16x16 (block 0) or 8x8 (blocks 0, 1, 2, 3)

Page 28:

DCT and Quantization

Scan:
- Alternate-horizontal
- Alternate-vertical
- Zig-zag

Adaptive DC prediction; adaptive AC prediction

Inverse Quantizer:
- Quantization method 1: similar to that of H.263
- Quantization method 2: similar to that of MPEG-2
- Optimized nonlinear quantization for DC coeff. (can be used together with the previous two methods)
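Of the three scans listed, the zig-zag order is simple enough to generate programmatically. The following is a minimal sketch, assuming an 8x8 block stored as a list of rows; the function names `zigzag_order` and `scan_block` are illustrative only. The alternate-horizontal and alternate-vertical scans use fixed tables from the standard and are not reproduced here.

```python
def zigzag_order(n=8):
    """Return the (row, col) visiting order of a zig-zag scan over an n x n block."""
    order = []
    for s in range(2 * n - 1):              # s = row + col selects one anti-diagonal
        lo, hi = max(0, s - n + 1), min(s, n - 1)
        rows = range(lo, hi + 1)
        if s % 2 == 0:
            rows = reversed(rows)           # even diagonals are walked from bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order


def scan_block(block):
    """Flatten a square block of quantized coefficients into a 1-D list in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]
```

The scan places low-frequency coefficients first, so the trailing run of zeros in a typical quantized block is as long as possible before run-level coding.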

Page 29:

Adaptive Intra-DC prediction

[Figure: neighbouring blocks A, B, C, D around the current block X (8x8) and block Y inside a 16x16 macroblock]

Choose the best DC predictor based on gradients of the DC values (side info. not transmitted)

if (|QDC_A - QDC_B| < |QDC_B - QDC_C|) QDC_X' = QDC_C

else QDC_X' = QDC_A
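The gradient rule above can be written as a small function. This is a sketch only: it assumes that A is the block to the left of X, B the block above-left and C the block above, which the transcript does not state, and the names `predict_dc` and `dc_residual` are made up for illustration.

```python
def predict_dc(qdc_a, qdc_b, qdc_c):
    """Select the DC predictor for block X from the quantized DC values of A, B and C.

    Assumed layout (not given in the text): A = left, B = above-left, C = above.
    Returns the predictor value and the name of the block it came from.
    """
    if abs(qdc_a - qdc_b) < abs(qdc_b - qdc_c):
        # A is close to B, so X is expected to be close to C.
        return qdc_c, "C"
    return qdc_a, "A"


def dc_residual(qdc_x, qdc_a, qdc_b, qdc_c):
    """Value that would actually be entropy coded: current DC minus its predictor."""
    predictor, _ = predict_dc(qdc_a, qdc_b, qdc_c)
    return qdc_x - predictor
```

Because the decoder already holds A, B and C, it can repeat the same comparison, which is why no side information needs to be sent.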

Page 30:

Adaptive Intra-AC prediction

[Figure: current block X and block Y in a macroblock with neighbouring blocks A and B; the DC coefficient and the shaded first row or first column of AC coefficients are marked]

Shaded coefficients are predicted from previously coded blocks.
The best direction is chosen based on the DC prediction.
Switched on/off on a macroblock basis (transmitted)

Page 31:

Functionality-Baseline

Similar to MPEG-2/H.263 structure and algorithms
- 8x8 DCT/Q/MC/ME/VLC

50% bit rate reduction compared to MPEG-2
- Intra DC/AC prediction, 8x8 ME, better VLC table

Widely used in current consumer market
- Mobile phone
- DV
- DivX

Page 32:

Syntax

Page 33:

Inside the Bit Stream

[Figure: bitstream hierarchy tree, VS1 containing VO1 and VO2, each VO containing VOLs, each VOL containing GOVs, each GOV containing VOPs]

Video Session (VS): VS1 … VSN
Video Object (VO): VO1 … VON
Video Object Layer (VOL): VOL1 … VOLN
Group Of VOPs (GOV): GOV1 … GOVN
Video Object Plane (VOP): VOP1 … VOPk, VOPk+1 … VOPN, VOP1 … VOPN (Layer 1, Layer 2)

Page 34:

Syntax

Video-object Sequence (VS)
Delivers the complete MPEG-4 visual scene, which may contain 2-D or 3-D natural or synthetic objects.

Video Object (VO)
A particular object in the scene, which can be of arbitrary (non-rectangular) shape corresponding to an object or background of the scene.

Video Object Layer (VOL)
Facilitates a way to support (multi-layered) scalable coding. A VO can have multiple VOLs under scalable coding, or have a single VOL under non-scalable coding.

Group of Video Object Planes (GOV)
Groups Video Object Planes together (optional level).

Video Object Plane (VOP)
A snapshot of a VO at a particular moment.
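The five levels nest inside one another, which can be pictured as a small container hierarchy. The sketch below is purely illustrative: the class and field names (`time`, `coding_type`, `count_vops`) are assumptions for this example and do not correspond to bitstream syntax elements.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VOP:
    """A snapshot of a video object at one time instant."""
    time: float
    coding_type: str = "I"              # hypothetical field: "I", "P" or "B"


@dataclass
class GOV:
    """Optional grouping level for VOPs."""
    vops: List[VOP] = field(default_factory=list)


@dataclass
class VOL:
    """One layer of a video object; several layers mean scalable coding."""
    govs: List[GOV] = field(default_factory=list)


@dataclass
class VO:
    """One object in the scene, possibly of arbitrary shape."""
    vols: List[VOL] = field(default_factory=list)


@dataclass
class VS:
    """The complete visual scene."""
    vos: List[VO] = field(default_factory=list)


def count_vops(vs: VS) -> int:
    """Walk the hierarchy and count every VOP in the scene."""
    return sum(len(gov.vops) for vo in vs.vos for vol in vo.vols for gov in vol.govs)
```

A non-scalable object would hold a single `VOL`; a scalable one would hold a base `VOL` plus one or more enhancement `VOL`s.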

Page 35:

Syntax (1)

Page 36:

Syntax (2)

[Figure: layered bitstream syntax, summarized here level by level]

Video Object Layer, long header (video_object_layer_start_code): Header, User Data, Video Object Plane (optional), Group_of_VideoObjectPlane (optional), Video Object Plane, Video Object Plane, Video Object Plane
Short header (video_plane_with_short_header): short_video_start_marker, Header, Gob_layer, Gob_layer, Gob_layer, short_video_end_marker
Gob_layer: Header (optional), Macroblock, Macroblock, Macroblock
Video Object Plane: vop_start_code, Header, Sprite Data, motion_shape_texture, video_packet_header, motion_shape_texture
Macroblock: Header, Shape Data, Motion Vector, Block
Block: Differential DC Coefficient, Run-Level VLC, Run-Level VLC, End_of_block

Page 37:

Important Header Information (1)

VOL
- video_object_layer_shape
- vol_width
- vol_height
- interlaced
- vol_quant_type
- not_8_bit
- short header
- quarter_sample

VOP
- vop_coding_type (vop_prediction_type)
- vop_coded
- intra_dc_vlc_thr
- vop_quant

Page 38:

Important Header Information (2)

Macroblock
- not_coded
- mcbpc: VLC to derive the macroblock type and coded block pattern for chrominance; Table B-6, -7 (also Table B-1~2)
- mcsel: for S-VOP
- ac_pred_flag: AC prediction
- cbpy: VLC for the pattern of non-transparent Y blocks; Table B-8 ~11

Page 39:

Object based Video Coding

Page 40:

MPEG-4 Visual Standards

Video Object: 2-D representation of natural video (MPEG-1/2, H.263 + shape)
Face Object: 3-D representation of human face; facial animation parameters; model-based coding
Mesh Object: 2-D deformable geometric shape (triangle)
Still-texture: Wavelet-based still image coding using zero-tree technique

Page 41:

MPEG-4 Visual Decoding

Video object decoding

Page 42:

MPEG-4 Video

Based on Verification Model 9 (April 1997)
- Video Object Plane (VOP)
- Motion / texture coding derived from MPEG-1/2 & H.263
- Polygon matching for motion estimation
- Padding for motion estimation / texture coding
- Shape coding: binary and gray-scale
- Sprite coding: extended background scene

Page 43:

Video Object Coding

Page 44:

Video Object Plane (VOP)

- An arbitrarily shaped image region

Page 45:

VOP Codec Structure

Page 46:

VOP Decoder

[Figure: VOP decoder block diagram. The Bitstream enters a Demultiplexer that feeds Shape Decoding, Motion Decoding and Texture Decoding; Shape Information, Motion Compensation and the VOP Memory produce the Reconstructed VOP, which a Compositor, driven by a Compositing script, turns into Video Out.]

Conventional decoding + shape capability

Page 47:

VOP-based vs. Frame-based

Page 48:

VOP-based Coding

MPEG-4 VOP-based coding also employs the Motion Compensation technique:

An Intra-frame coded VOP is called an I-VOP.
The Inter-frame coded VOPs are called P-VOPs if only forward prediction is employed, or B-VOPs if bi-directional prediction is employed.

The new difficulty for VOPs: because they may have arbitrary shapes, shape information must be coded in addition to the texture of the VOP.

Note: texture here actually refers to the visual content, that is, the gray-level (or chroma) values of the pixels in the VOP.

Page 49:

VOP-based Coding

1. Motion compensation coding
MC + shape capability
A padding process converts non-rectangular MBs (boundary MBs) into rectangular MBs, after which conventional ME is applied

2. Texture coding
8x8 DCT with zero padding, or shape-adaptive DCT, followed by Q + VLC

3. Shape coding
MC: binary ME or gray-scale ME
Context-based arithmetic encoding (CAE)

Page 50:

1. VOP-based Motion Compensation

MC-based VOP coding in MPEG-4 again involves three steps:

(a) Motion Estimation.
(b) MC-based Prediction.
(c) Coding of the prediction error.

Only pixels within the current (Target) VOP are considered for matching in MC.

To facilitate MC, each VOP is divided into many macroblocks (MBs). MBs are by default 16x16 in luminance images and 8x8 in chrominance images.

Padding steps for MB processing
To help match every pixel in the target VOP and to meet the mandatory requirement of rectangular blocks in transform coding (e.g., DCT), a pre-processing step of padding is applied to the Reference VOPs prior to motion estimation.

Page 51:

VOP Formulation

- Minimize the number of MBs to be retained

[Figure: a Video Object Plane enclosed in a bounding box that is divided into shape blocks (Binary Alpha Blocks)]

Page 52:

Padding

Page 53:

Motion Compensation Tools

[Figure: I-VOP, P-VOP and B-VOP arranged along a time axis]

-- Motion compensated coding modes (I, B, P) (similar to MPEG-1/2 and H.263)

Page 54:

Motion Computation

[Figure: a reference P-VOP or I-VOP and a current P-VOP or B-VOP, each inside its bounding box. Labels: modified block (polygon) matching; conventional block matching; no matching; reference VOP pixels for block matching; padded reference pixels for block matching; padded reference pixels for unrestricted block matching; advanced prediction mode (four 8x8 blocks).]

Only pixels within the current (Target) VOP are considered for matching in MC

Page 55:

Motion Vector Coding

Let C(x+k, y+l) be pixels of the MB in the Target VOP, and R(x+i+k, y+j+l) be pixels of the MB in the Reference VOP.
A Sum of Absolute Difference (SAD) for measuring the difference between the two MBs can be defined as
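The defining expression did not survive in this transcript. One form consistent with the definitions above, and with the rule that only pixels inside the target VOP take part in matching, is: SAD(i, j) = sum over k, l in [0, N-1] of |C(x+k, y+l) - R(x+i+k, y+j+l)| * Map(x+k, y+l), where N is the MB size and Map is 1 for pixels inside the target VOP and 0 otherwise. The symbol Map and the function names below are assumptions made for this sketch, which uses an exhaustive search over a small window.

```python
def masked_sad(target, reference, mask, x, y, i, j, n=16):
    """SAD between the target MB at (x, y) and the reference MB displaced by (i, j).

    Images are lists of rows indexed as image[row][col]; mask[row][col] is 1 for
    pixels inside the target VOP and 0 otherwise, so outside pixels add nothing.
    """
    total = 0
    for l in range(n):
        for k in range(n):
            if mask[y + l][x + k]:
                total += abs(target[y + l][x + k] - reference[y + j + l][x + i + k])
    return total


def best_motion_vector(target, reference, mask, x, y, search=2, n=16):
    """Full search over displacements in [-search, search]; returns (sad, i, j)."""
    height, width = len(reference), len(reference[0])
    best = None
    for j in range(-search, search + 1):
        for i in range(-search, search + 1):
            # Skip displacements that would read outside the reference image.
            if x + i < 0 or y + j < 0 or x + i + n > width or y + j + n > height:
                continue
            cost = masked_sad(target, reference, mask, x, y, i, j, n)
            if best is None or cost < best[0]:
                best = (cost, i, j)
    return best
```

The displacement (i, j) that minimizes the SAD is taken as the motion vector of the MB.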

Page 56:

2. Texture Coding Tools

[Figure: a VOP overlaid on the macroblock grid, with three kinds of macroblock labelled]

- Macroblock entirely inside the VOP (coded by conventional DCT scheme)
- Macroblock partially outside the VOP (blocks partially outside the VOP are coded by DCT after padding)
- Macroblock entirely outside the VOP (not coded)

Page 57:

Texture Coding Tools (2/2)

[Figure: texture decoding chain. Coded Data -> Variable Length Decoding -> QFS[n] -> Inverse Scan -> SQF[v][u] -> Inverse AC/DC prediction -> QF[v][u] -> Inverse Quantization -> F[v][u] -> Inverse DCT -> f[y][x] -> Motion Compensation -> d[y][x] (Decoded Pels). Motion Compensation also uses the VOP Memory and the Decoded Shape; the output is the Reconstructed VOP.]

Page 58:

Adaptive Intra-DC prediction

[Figure: neighbouring blocks A, B, C, D around the current block X (8x8) and block Y inside a 16x16 macroblock]

Choose the best DC predictor based on gradients of the DC values (side info. not transmitted)

if (|QDC_A - QDC_B| < |QDC_B - QDC_C|) QDC_X' = QDC_C

else QDC_X' = QDC_A

Page 59:

Adaptive Intra-AC prediction

[Figure: current block X and block Y in a macroblock with neighbouring blocks A and B; the DC coefficient and the shaded first row or first column of AC coefficients are marked]

Shaded coefficients are predicted from previously coded blocks.
The best direction is chosen based on the DC prediction.
Switched on/off on a macroblock basis (transmitted)

Page 60:

Texture Coding

Boundary blocks: (DCT based)
- Inter blocks: padded with zeros
- Intra blocks: lowpass extrapolation padding

Step 1: Assign the mean value of the object pels (inside the MB) to the outside pels;

Step 2: f(i,j) = 1/4 [f(i,j-1) + f(i-1,j) + f(i,j+1) + f(i+1,j)]

starting from the top left corner. If any of the 4 reference pels is outside the block, do not include it and adjust the 1/4 factor accordingly.
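The two padding steps for intra boundary blocks can be sketched as follows. This is an illustration under stated assumptions rather than the normative procedure: the block is a list of rows, `inside` is a same-sized 0/1 map of object pels, only outside pels are filtered, and the filter runs in place in raster order so that later pels see already-filtered neighbours. The function name is hypothetical.

```python
def lowpass_extrapolation_pad(block, inside):
    """Pad the pels outside the object in a boundary block.

    Step 1 fills outside pels with the mean of the object pels.
    Step 2 replaces each outside pel by the average of its available
    4-neighbours, scanning from the top-left corner.
    """
    rows, cols = len(block), len(block[0])
    object_pels = [block[r][c] for r in range(rows) for c in range(cols) if inside[r][c]]
    if not object_pels:
        return [row[:] for row in block]        # nothing to extrapolate from

    mean = sum(object_pels) / len(object_pels)
    padded = [[block[r][c] if inside[r][c] else mean for c in range(cols)]
              for r in range(rows)]

    for r in range(rows):
        for c in range(cols):
            if inside[r][c]:
                continue                        # object pels are never modified
            candidates = ((r, c - 1), (r - 1, c), (r, c + 1), (r + 1, c))
            values = [padded[rr][cc] for rr, cc in candidates
                      if 0 <= rr < rows and 0 <= cc < cols]
            if values:
                # Dividing by len(values) adjusts the 1/4 factor at block edges.
                padded[r][c] = sum(values) / len(values)
    return padded
```

The smoothing avoids the sharp edge that plain mean filling would leave at the object boundary, which would otherwise cost bits in the high-frequency DCT coefficients.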

Page 61:

DCT and Quantization

Scan:
- Alternate-horizontal
- Alternate-vertical
- Zig-zag

Adaptive DC prediction; adaptive AC prediction

Inverse Quantizer:
- Quantization method 1: similar to that of H.263
- Quantization method 2: similar to that of MPEG-2
- Optimized nonlinear quantization for DC coeff. (can be used together with the previous two methods)

Page 62:

Shape adaptive DCT for Boundary MB

Shape Adaptive DCT (SA-DCT) is another texture coding method for boundary MBs.
Due to its effectiveness, SA-DCT has been adopted for coding boundary MBs in MPEG-4 Version 2.
It uses the 1D DCT-N transform and its inverse, IDCT-N:
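The transform pair itself is missing from this transcript. A reasonable reading is the orthonormal N-point DCT-II and its inverse, applied to a column or row segment whose length N varies with the object shape; that choice, and the names `dct_n` and `idct_n`, are assumptions of the sketch below.

```python
import math


def dct_n(f):
    """Orthonormal N-point DCT of the 1-D segment f (N = len(f))."""
    n = len(f)
    out = []
    for u in range(n):
        scale = math.sqrt(0.5) if u == 0 else 1.0
        acc = sum(f[i] * math.cos((2 * i + 1) * u * math.pi / (2 * n)) for i in range(n))
        out.append(math.sqrt(2.0 / n) * scale * acc)
    return out


def idct_n(coeffs):
    """Inverse of dct_n: rebuild the N samples from N coefficients."""
    n = len(coeffs)
    out = []
    for i in range(n):
        acc = 0.0
        for u in range(n):
            scale = math.sqrt(0.5) if u == 0 else 1.0
            acc += scale * coeffs[u] * math.cos((2 * i + 1) * u * math.pi / (2 * n))
        out.append(math.sqrt(2.0 / n) * acc)
    return out
```

In SA-DCT the object pels of each column are typically shifted to the top and transformed with the DCT-N matching that column's length, then the same is done along rows, so N changes from segment to segment.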

Page 63:

SA-DCT Flow

Page 64:

3. Shape Coding

The shape information is called alpha planes
Binary alpha plane: Code the boundaries using context-based arithmetic encoding (CAE)
Gray scale alpha plane: Consists of support and alpha values (texture)
- Support is coded using CAE (as binary alpha plane)
- Alpha values (texture) are coded using motion compensated DCT (similar to the texture coding)
Motion compensation for shape: similar to that of texture but simpler

Page 65:

Shape Coding

[Figure: binary shape and arbitrary shape examples]

Page 66:

CAE

Context-based Arithmetic Encoding (CAE): Predict the current pel value (1 or 0) based on the conditional probability (table)
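The idea of conditioning on a context can be illustrated without a full arithmetic coder. The sketch below forms a 10-bit context from already coded neighbouring pels and estimates how many bits an ideal arithmetic coder would spend. Two things are assumptions: the exact template offsets and bit order in `TEMPLATE`, and the use of adaptive counts in place of the fixed probability table that the slide refers to.

```python
import math

# Causal neighbours of the current pel as (dx, dy) offsets. The layout is an
# assumption for illustration; the normative template is defined in the standard.
TEMPLATE = [(-1, 0), (-2, 0),
            (2, -1), (1, -1), (0, -1), (-1, -1), (-2, -1),
            (1, -2), (0, -2), (-1, -2)]


def context_index(alpha, x, y):
    """Pack the template pels around (x, y) into an integer in [0, 1023]."""
    ctx = 0
    for bit, (dx, dy) in enumerate(TEMPLATE):
        xx, yy = x + dx, y + dy
        in_plane = 0 <= yy < len(alpha) and 0 <= xx < len(alpha[0])
        pel = alpha[yy][xx] if in_plane else 0      # pels outside the plane count as 0
        ctx |= pel << bit
    return ctx


def estimated_bits(alpha):
    """Ideal code length of a binary alpha plane under per-context adaptive counts."""
    counts = {}                                     # context -> (zeros seen, ones seen)
    bits = 0.0
    for y in range(len(alpha)):
        for x in range(len(alpha[0])):
            ctx = context_index(alpha, x, y)
            zeros, ones = counts.get(ctx, (1, 1))   # start from a uniform prior
            pel = alpha[y][x]
            probability = (ones if pel else zeros) / (zeros + ones)
            bits -= math.log2(probability)
            counts[ctx] = (zeros + (pel == 0), ones + (pel == 1))
    return bits
```

Smooth shapes produce a few highly predictable contexts, so the estimated cost falls well below one bit per pel.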

Page 67:

Other parts

Page 68:

Others

Scalability:
- Object scalability
- Temporal scalability
- Spatial scalability

Error resilience: H.263 marker, MPEG-4 marker, …
Sprite coding
SNHC visual: Face and body
Dynamic 2-D meshes
Scalable still texture: Wavelet with zero-tree

Page 69:

MPEG-4 Visual Decoding

Video object decoding

Page 70:

Sprite Coding

A sprite is a graphic image that can freely move around within a larger graphic image or a set of images.

To separate the foreground object from the background, we introduce the notion of a sprite panorama: a still image that describes the static background over a sequence of video frames.

The large sprite panoramic image can be encoded and sent to the decoder only once at the beginning of the video sequence.
When the decoder receives separately coded foreground objects and parameters describing the camera movements thus far, it can reconstruct the scene in an efficient manner.
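The decoder-side reconstruction can be sketched in a strongly simplified form. The example assumes the camera movement is a pure pan, so the camera parameters reduce to a window offset into the sprite; real sprite coding uses more general warping parameters. All names and the parameter list are hypothetical.

```python
def reconstruct_frame(sprite, offset_x, offset_y, width, height,
                      foreground, alpha, fg_x, fg_y):
    """Cut the visible background window out of the sprite and paste the foreground.

    sprite      panorama, stored as a list of rows
    offset_*    top-left corner of the camera window inside the sprite
    foreground  decoded foreground texture, alpha its binary shape mask
    fg_*        position of the foreground inside the output frame
    """
    # Slicing copies the rows, so the stored sprite is left untouched.
    frame = [row[offset_x:offset_x + width]
             for row in sprite[offset_y:offset_y + height]]
    for r, (fg_row, alpha_row) in enumerate(zip(foreground, alpha)):
        for c, (pel, opaque) in enumerate(zip(fg_row, alpha_row)):
            inside_frame = 0 <= fg_y + r < height and 0 <= fg_x + c < width
            if opaque and inside_frame:
                frame[fg_y + r][fg_x + c] = pel
    return frame
```

Per frame, only the window offset and the foreground object need to be transmitted; the background pixels come from the sprite that was sent once.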

Page 71:

Sprite Coding

[Figure: Sprite + Foreground Object = Decoded Frame]

Page 72:

2-D Mesh Coding

Objects are represented by 2-D polygons.

Node positions and motion vectors are coded.

Page 73:

3-D Face Animation

A 3-D face model is defined in terms of 68 Face Animation Parameters (FAPs)

Page 74:

FGS

Page 75:

Fine Granularity Scalability (FGS)

Amendment 2 (2001)

Technique: Base layer + Enhancement layer
- Enhancement layer: bit-plane coding
- “Tuned” Huffman coding

Applications
- Internet streaming
- Broadcasting
- Unicast with/without feedback control
- Resource sharing
- Wireless communications

Page 76:

Bandwidth Scalability

[Figure: MPEG-4 base layer (I, P/B, P/B, P/B, P/B) with a fine-granular scalable enhancement layer on top]

Page 77:

Wireless Applications

[Figure: network diagram for wireless applications; the only surviving label is "Ethernet"]

FGS Advantages

[Figure: Received Quality (Bad, Moderate, Good) versus Channel Bandwidth (Low to High). Labelled curves: Traditional Source Coding, Traditional Distortion-Rate Curve, New Objective.]

FGS Principles

Base layer: MPEG-4 motion-compensated DCT coding
Enhancement layer: DCT residuals (the quantization errors of the base layer) are bit-plane coded
Enhancement-layer bitstream can be truncated to any number of bits per frame
Decoder may ignore some enhancement bits
Reconstructed video quality increases with the number of decoded bits
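The effect of truncating the enhancement layer can be illustrated with a small sketch. It makes several simplifying assumptions that are not from the slides: coefficients are plain integers, the base layer is modelled as uniform quantization with truncation toward zero, and the enhancement stream is cut at whole bit-plane boundaries (the real bitstream can be cut at any bit). The names `base_layer` and `fgs_decode` and the sample coefficients are hypothetical.

```python
def base_layer(coeffs, step):
    """Coarsely quantize and reconstruct coefficients (stand-in for the base layer)."""
    return [step * int(c / step) for c in coeffs]


def fgs_decode(coeffs, step, planes_kept):
    """Reconstruct from the base layer plus the top `planes_kept` bit-planes of the residual."""
    base = base_layer(coeffs, step)
    residual = [c - b for c, b in zip(coeffs, base)]
    n_planes = max(abs(r) for r in residual).bit_length()
    dropped = max(n_planes - planes_kept, 0)
    decoded = []
    for b, r in zip(base, residual):
        magnitude = (abs(r) >> dropped) << dropped  # zero the truncated low-order planes
        decoded.append(b + magnitude if r >= 0 else b - magnitude)
    return decoded


coeffs = [83, -37, 12, -5, 3, 0, -1, 0]
errors = [
    sum(abs(c - d) for c, d in zip(coeffs, fgs_decode(coeffs, 16, k)))
    for k in range(5)
]
print(errors)
# prints [29, 21, 9, 5, 0]
```

With zero planes the decoder outputs the base layer alone; each additional bit-plane lowers the total absolute error, down to zero once all four planes of this example are decoded.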

FGS Encoder

[Figure: FGS encoder block diagram. Block labels: Input Video, DCT, Q, Q^-1, IDCT, Clipping, Motion Compensation, Motion Estimation, Frame Memory, VLC, Base Layer Bitstream. Enhancement Layer Encoding: DCT, Bit-plane Shift, Find Maximum, Bit-plane VLC, Enhancement Bitstream.]

FGS Decoder

[Figure: FGS decoder block diagram. Block labels: Base Layer Bitstream, VLD, Q^-1, IDCT, Motion Compensation, Frame Memory, Clipping, Base Layer Video (optional output). Enhancement Layer Decoding: Enhancement Bitstream, Bit-plane VLD, Bit-plane Shift, IDCT, Clipping, Enhancement Video.]

Bitplane Coding

[Figure: bit-plane coding example. Panels: (1) a block of 8x8 DCT coefficient differences; (2) zigzag ordering of the block; (3) the block after zigzag ordering, shown as sign bits and bit-planes from MSB to LSB, with zero runs of 18, 12 and 20 marked; (4) (RUN, EOP) symbols for the block after zigzag ordering. Symbols shown: (0, 1), (28, 1), (6, 0), (0, 0), (0, 0), (26, 1), (2, 0), (31, 1).]
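The (RUN, EOP) idea can be sketched as follows, assuming RUN counts the zeros before each 1 in a bit-plane and EOP is 1 for the last 1 of that plane. The coefficient values are made up for illustration and are not those of the figure; sign bits are not handled, and the `"ALL_ZERO"` marker for an empty plane is a hypothetical placeholder.

```python
def bit_planes(abs_values):
    """Split non-negative integers into bit-planes, most significant plane first."""
    n_planes = max(abs_values).bit_length()
    return [[(v >> p) & 1 for v in abs_values] for p in range(n_planes - 1, -1, -1)]


def run_eop_symbols(plane):
    """Convert one bit-plane into (RUN, EOP) symbols."""
    ones = [i for i, bit in enumerate(plane) if bit]
    if not ones:
        return "ALL_ZERO"  # placeholder for a plane that contains no 1s
    symbols = []
    previous = -1
    for k, position in enumerate(ones):
        run = position - previous - 1            # zeros since the previous 1
        eop = 1 if k == len(ones) - 1 else 0     # 1 marks the last 1 of the plane
        symbols.append((run, eop))
        previous = position
    return symbols


# Absolute values of zigzag-ordered coefficient differences (illustrative only).
abs_values = [10, 0, 6, 0, 0, 3, 0, 2, 2, 0, 0, 2, 0, 0, 1, 0]
for plane in bit_planes(abs_values):
    print(run_eop_symbols(plane))
# prints:
# [(0, 1)]
# [(2, 1)]
# [(0, 0), (1, 0), (2, 0), (1, 0), (0, 0), (2, 1)]
# [(5, 0), (8, 1)]
```

Because EOP ends a plane, the trailing zeros of each plane cost nothing, so appending more zero coefficients to `abs_values` would leave the symbols unchanged.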

Profiles in Version 1

Simple Profile -- Basic tools of I/P VOP, AC/DC Prediction and 4 MV unrestricted
Core Profile -- Simple + Binary Shape, Quantization Method 1/2 and B-VOP
Main Profile -- Core + Grey Shape, Interlace and Sprite
Simple Scalable Profile -- Simple + Spatial and Temporal Scalability and B-VOP
N-Bit Profile -- Core + N-Bit
Animated 2D Mesh -- Core + Scalable Still Texture, 2D Dynamic Mesh
Basic Animated Texture -- Binary Shape, Scalable Still Texture and 2D Dynamic Mesh
Still Scalable Texture -- Scalable Still Texture
Simple Face -- Face Animation Parameters

Profiles in Version 2

Advanced Real Time Simple Profile -- Simple + Advanced error resilience + improved temporal scalability
Core Scalable Profile -- Simple Scalable + Core + SNR, Spatial/Temporal Scalability for Region or Object of interest
Advanced Coding Efficiency Profile -- Tools for improving coding efficiency for both rectangular and arbitrary shaped objects
Advanced Scalable Texture Profile -- Tools for decoding arbitrary shaped texture and still image including scalable shape coding
Advanced Core Profile -- Core Profile + Tools for decoding arbitrary shaped video objects and arbitrary shaped scalable still image
Simple Face and Body Animation Profile -- Simple face animation + body animation

Additional Profiles

Advanced Simple Profile -- Simple Profile + efficient coding tools: B-frames, 1/4 pel MC, …
Fine Granularity Scalable Profile
  Advanced Simple Profile as base layer
  Fine granularity scalability (FGS)
  Fine granularity scalability - temporal (FGST)
Simple Studio Profile
  I-frames only
  Arbitrary shape
  Multiple alpha channels
  Up to 2 Gbps
Core Studio Profile -- Simple Studio Profile + P-frames

MPEG-4 Video Profiles

[Figure: map of MPEG-4 video profiles. Region labels: Rectangular Frame, Arbitrary Shape, No Scalability, Spatial & Temporal Scalability, Quality & Temporal Scalability, Higher Error Resilience, Additional Tools. Profiles shown: Simple, Core, Main, Simple Scalable, Core Scalable, Advanced Simple, Advanced Coding Efficiency, Fine Granularity Scalable, Advanced Realtime Simple, Simple Studio, Core Studio. Legend: IS, AMD-1, AMD-2.]

Profiles limit the set of tools in a decoding device
Levels specify parameter ranges (limit complexity)

Levels

| Visual Profile | Level | Typical Visual Session Size (indicative) | Max. total number of objects | Max. number per type | Max. number of different quantization tables | Max. total reference memory (MB units) | Max. number of MB/sec | Cost function equivalent I-MB/sec | Max. vbv_buffer_size (units of 16384 bits) | Max. video packet length (bits) | Max. sprite size (MB units) | Wavelet restrictions | Max. bitrate | Max. enhancement layers per object |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Main | L4 | 1920 x 1088 | 32 | 32 x Main or Core or Simple | 4 | 16320 | 489600 | 1290100 | 380 | 16384 | 65280 | 1 taps default integer filter | 38.4 Mbit/s | 1 temporal, 2 spatial |
| Main | L3 | CCIR 601 | 32 | 32 x Main or Core or Simple | 4 | 3240 | 97200 | 256200 | 160 | 16384 | 6480 | 1 taps default integer filter | 15 Mbit/s | 1 |
| Main | L2 | CIF | 16 | 16 x Main or Core or Simple | 4 | 792 | 23760 | 62700 | 40 | 8192 | 1584 | 1 taps default integer filter | 2 Mbit/s | 1 |
| Core | L2 | CIF | 16 | 16 x Core or Simple | 4 | 792 | 23760 | 62700 | 40 | 8192 | N. A. | N. A. | 2 Mbit/s | 1 |
| Core | L1 | QCIF | 4 | 4 x Core or Simple | 4 | 198 | 5940 | 15700 | 8 | 4096 | N. A. | N. A. | 384 kbit/s | 1 |
| Simple Scalable | L2 | CIF | 4 | 4 x Simple or Simple Scalable | 1 | 792 | 23760 | N. A. | 20 | 4096 | N. A. | N. A. | 256 kbit/s | 1 spatial or temporal enhancement layer |
| Simple Scalable | L1 | CIF | 4 | 4 x Simple or Simple Scalable | 1 | 495 | 7425 | N. A. | 20 | 2048 | N. A. | N. A. | 128 kbit/s | 1 spatial or temporal enhancement layer |
| Simple | L3 | CIF | 4 | 4 x Simple | 1 | 396 | 11880 | N. A. | 20 | 8192 | N. A. | N. A. | 384 kbit/s | N. A. |
| Simple | L2 | CIF | 4 | 4 x Simple | 1 | 396 | 5940 | N. A. | 20 | 4096 | N. A. | N. A. | 128 kbit/s | N. A. |
| Simple | L1 | QCIF | 4 | 4 x Simple | 1 | 99 | 1485 | N. A. | 5 | 2048 | N. A. | N. A. | 64 kbit/s | N. A. |
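The macroblock figures in the table can be sanity-checked with simple arithmetic. The sketch below assumes 16x16 macroblocks and the usual CIF (352x288) and QCIF (176x144) picture sizes; the frame rates are inferred from the ratio of MB/sec to macroblocks per frame and are not stated in the table. The function name `macroblocks` is hypothetical.

```python
def macroblocks(width, height, mb_size=16):
    """Number of macroblocks needed to cover a picture, rounding partial blocks up."""
    per_row = (width + mb_size - 1) // mb_size
    per_column = (height + mb_size - 1) // mb_size
    return per_row * per_column


qcif = macroblocks(176, 144)    # 99, the Simple L1 reference memory
cif = macroblocks(352, 288)     # 396, the Simple L2/L3 reference memory
hd = macroblocks(1920, 1088)    # 8160, half of the Main L4 reference memory

print(qcif, cif, hd)
# prints 99 396 8160
print(1485 // qcif, 5940 // cif, 11880 // cif, 489600 // hd)
# prints 15 15 30 60
```

Read this way, Simple L1 corresponds to one QCIF picture at 15 frames per second and Simple L3 to one CIF picture at 30 frames per second, while the 792 and 16320 entries equal two pictures of reference memory.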

References

F. Pereira and T. Ebrahimi, The MPEG-4 Book, Prentice-Hall, 2002.
A. Puri and T. Chen, eds., Multimedia Systems, Standards, and Networks, Marcel Dekker, 2000.
ISO/IEC JTC1/SC29/WG11/Doc. N1869: MPEG-4 Video Verification Model Version 9.0, Oct. 1997.
Image Communication: Tutorial Issue on MPEG-4, Jan. 2000.
W. Li, “The Overview of fine granularity scalability in MPEG-4 video standard,” IEEE Trans. on Circuits and Systems for Video Tech., pp. 301-317, March 2001.
W. Li et al., “Fine granularity scalability in MPEG-4 for streaming video,” IEEE ISCAS 2000, pp. 299-302.

H. M. Radha et al., “MPEG-4 fine-grained scalable video coding method for multimedia streaming over IP,” IEEE Trans. on Multimedia, pp. 53-68, March 2001.
F. Wu et al., “A framework for efficient progressive fine granularity scalable video coding,” IEEE Trans. on Circuits and Systems for Video Tech., vol. 11, no. 3, March 2001.
ISO/IEC MPEG and ITU-T VCEG, Joint Committee Draft (CD), JVT-C167, May 2002.
ISO/IEC MPEG and ITU-T VCEG, Low Complexity Transform and Quantization, JVT-B038, Feb. 2002.
ITU-T VCEG, H.26L Test Model Long-Term Number 9 (TML-9) draft 0, VCEG-N83d1, Dec. 2001.
ITU-T VCEG, New Intra Prediction Modes, VCEG-N54, Sept. 2001.

