COMMUNICATIONS STANDARDS REVIEW

Volume 13, Number 25    July 24, 2002

REPORT OF JOINT VIDEO TEAM (JVT) MEETING #3, ISO/IEC JTC1/SC29/WG11 (MPEG) AND ITU-T SG16 Q6 (VCEG)

MAY 6 – 10, 2002, FAIRFAX, VA

The following report represents the view of the reporter and is not the official, authorized minutes of the meeting.

Joint Video Team (JVT) Meeting #3, ISO/IEC JTC1/SC29/WG11 (MPEG) and ITU-T SG16 Q6 (VCEG), May 6 – 10, 2002, Fairfax, VA
    Interim Ad Hoc Reports
    “5% / 10% Rule of Thumb”
    IPR Issues
    Inputs through MPEG
    Inputs through VCEG
Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC) Summary
    Draft Content of the Interface to External Systems-Level Specifications
    Requirements and the Definition of Normative Profiles and Levels
Motion Compensation
    “Improved MB Prediction” Modes
    Motion Interpolation
    Bidirectionally Predictive (B) Frames
    Multiframe Motion Prediction
    Motion Vector Coding
    Global Motion Vector Coding and Global Motion Compensation
Deblocking Filter
HRD/VBV Buffering Model
Transform and Quantization
    Transform coding
    Quantization
    Intra Prediction
    Adaptive Block Transform (ABT)
Source Formats
Interlace Coding
Performance, Implementation, and Complexity Analysis
Carriage and File Format
NAL, High-Level Syntax, and Robustness
Switchable P / Still Image Frames
Entropy Coding
    VLC Approaches
    CABAC
Profiles and Levels
Test Model
JVT Interim Ad Hoc Groups Established
JVT Meeting Roster, May 6 – 10, 2002, Fairfax, VA
Acronym Definitions
Communications Standards Review Copyright Policy

CSR’s Fully Searchable CDs

CSR CDs are indexed for machine searching (Adobe Acrobat). They are very useful for researching technical issues as well as for prior-art searches.

Your company’s patent or legal departments may also find these CDs useful.

Twelve Year CD: all CSR reports from 1990 through 2001 on one CD. $2,400 to non-subscribers; subscribers receive a $200 discount for each year of subscription during 1990 – 2001.

Quarterly CDs: 3 months of CSR reports on each CD, in an annual subscription. $695 as a standalone subscription, $200 as an add-on to current subscriptions.

Annual CDs: 12 months of CSR reports on a CD for each calendar year 1990 to present. $695 to non-subscribers, $200 to current subscribers.


REPORT OF JOINT VIDEO TEAM (JVT) MEETING #3, ISO/IEC JTC1/SC29/WG11 (MPEG) AND ITU-T SG16 Q6 (VCEG),

MAY 6 – 10, 2002, FAIRFAX, VA

The purpose of this meeting was to further the draft Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC). The output of this meeting is the draft document contained in JVT-C167. This meeting was held under the auspices of MPEG.

The Joint Video Team (JVT) is composed of ITU-T VCEG (Q6/SG16) and ISO/IEC MPEG (ISO/IEC JTC1/SC29/WG11). The JVT chair and co-chairs are G. Sullivan (Microsoft), T. Wiegand (Heinrich Hertz Institute), and A. Luthra (Motorola BCS). JVT-C001d1 reports the results of this meeting. JVT-C003 is the list of Participants at this meeting, JVT-C004 is the list of JVT Experts, and JVT-000 is the list of documents. JVT-C002d5 is the Report of the second JVT meeting in Geneva (Jan-Feb. 2002, see CSR 13.03).

The general JVT reflector can be subscribed to by clicking on “join jvt-experts” at <http://mail.imtc.org/cgi-bin/lyris.pl?enter=jvt-experts>. Email for the reflector should be sent to <[email protected]>. The subject line of each email message will automatically be prefixed with “[jvt-experts]”. Project identification and unsubscribe information will be attached to the bottom of each reflected message.

The new ftp site recently established for JVT use is <ftp://ftp.imtc-files.org/jvt-experts/>. The prior site containing the files of the first and second JVT meetings was the VCEG site at <http://standard.pictel.com/ftp/video-site>. The new work will be put on the new site, and the prior files from the old site will be transferred to the new one.

The JVT group thanked:

• The International Multimedia Telecommunications Consortium for their great generosity in hosting both the JVT email reflector and JVT ftp site.

• The US National Institute of Standards and Technology, and C. Fenimore in particular, for aid in providing equipment and support for demonstrations of video technology designs at the meeting.

• The USNB to WG11, and the sponsors of the social event, including Contentguard, INCITS, MPAA, Microsoft Corporation, NIST, RIAA, and Rightscom, for hosting the JVT and providing excellent arrangements for the JVT work.

The next JVT meeting (#4) will be July 22-26, 2002, in Klagenfurt, Austria.

Regarding WG11 National Body comments on technical content of the JVT draft, the JVT thanks the USNB to WG11 for its comments, and reports that the actions taken by the JVT are consistent with all of the USNB requests.

Interim Ad Hoc Reports

The following ad hoc committees presented reports of their interim activities between Meetings #2 and #3 (this meeting):

Previous Interim Ad Hoc                           Chair                                               Ad Hoc Report
JVT Project Management                            G. Sullivan (Microsoft)                             JVT-C005
WD & JM Doc & S/W Editing                         T. Wiegand (HHI)                                    JVT-C006
Deblocking                                        P. List (Deutsche Telekom)                          JVT-C007
VLC                                               G. Bjøntegaard (Telenor)                            JVT-C008
CABAC                                             D. Marpe (HHI)                                      JVT-C009
Multiframe Buffering Syntax                       M. Schlockermann (Panasonic)                        JVT-C010
GMVC / GMC                                        H. Kamata (NTT), J. Lainema (Nokia)                 JVT-C011
Additional Transforms and Quantization Methods    M. Wien (RWTH Aachen)                               JVT-C012
Interlaced Coding                                 P. Borgwardt (VideoTele.com), L. Wang (Motorola)    JVT-C013
Motion Interpolation                              T.-I. Johansen (Tandberg)                           JVT-C014
NAL and High-Level Syntax                         Y-K Lim (NetnTV)                                    JVT-C015
Complexity Minimization                           M. Horowitz (Polycom)                               JVT-C016
Robustness                                        M. Horowitz (Polycom)                               JVT-C017
Film Mode / Timing                                G. Sullivan (Microsoft), S. Chen (Broadcom)         JVT-C018
Profiles, Levels & Applications                   D. Lindbergh (Polycom)                              JVT-C019
Buffering                                         E. Viscito (GlobespanVirata)                        JVT-C020

“5% / 10% Rule of Thumb”

The group adopted the following guideline for coding efficiency proposal evaluations at this meeting:

Proposed changes are not to be adopted unless the change results in BD-Rate savings of at least approximately 5% average across group-agreed common conditions (realistic quantization range and test sequences), and 10% peak.

Very minor changes (like adding one or two sentences to the draft) or (better) bug-fix/clean-up work may also be considered.

Complexity reduction proposals require different consideration. Participants should expect these threshold values to go up at the next meeting. Exceptions are to be made only with very strong group consensus.

IPR Issues

JVT-C110, JVT IPR Status Report (G. Sullivan, Microsoft), contains a detailed company-specific report on the status of known IPR issues in the JVT project. It notes that the JVT project is intended to define a “baseline profile” of the JVT standard that is royalty-free for all implementations.

JVT-C085 (T. Kogure, T. Seno, S. Kadono, Matsushita) requests the establishment of an IPR working group in JVT to conduct expert investigations of the validity and potential infringement of IPR related to the potential royalty-free baseline profile, and to discuss possible solutions to reach that objective.

JVT-C123 (J. van der Meer, Philips) discusses the potential JVT licensing situation in regards to a need to address IPR issues. Doubt was expressed regarding whether the royalty-free baseline profile goal of the JVT project could be realistically achieved. Concern was expressed in particular regarding the potential applicability of patents relating to MPEG-2 video.

JVT-C124, Information on JVT patents (J. van der Meer, Philips), reports that a significant percentage of patents relating to MPEG-2 may be essential to the JVT baseline and that some of the owners of these patents may not actively participate in JVT. It was suggested that the JVT make sure that potential patent holders beyond JVT are approached to provide commitments on licensing conditions of IPR they may hold that may be essential elements of JVT technology, including on the JVT baseline. It was suggested to approach in particular potential patent holders that participated in developments on video coding technology in ITU and MPEG since about 1985.

JVT-C149, Licensing issues on JVT (Y. Yagasaki, Sony), reports that patent rights of some technologies in the JVT codec may belong to companies and organizations that may have declared licensing terms under RAND terms and conditions for other standardization projects, and that these patent holders may not necessarily be members of JVT. The contribution requests establishment of a working group consisting of “IPR experts” to discuss a licensing model for the JVT codec.

JVT-C150, Support for JVT Royalty Free Baseline (D. Lindbergh, Polycom), is from a large number of co-authoring companies (Apple Computer, British Telecom, Broadcom, Cisco Systems, Conexant, Deutsche Telekom, FastVDO, Nokia, Polycom, RADVision, Sand Video, Siemens, Sun Microsystems, Tandberg, Telenor, Teles AG, Texas Instruments, UB Video, VCON, VideoLocus, ViXS, and VWeb). It expresses strong support for the royalty-free baseline profile goal of the JVT project. It opines that the royalty-free baseline profile (RFB) goal is critically important to the success of the JVT standard in the marketplace, and that this goal is practical and workable.

In response to these contributions, it was noted that the Baseline Profile, as defined in the Joint CD Fairfax output document, does not include any currently-reported 2.2 (or 3.2 referring to unknown or 2.2 status) IPR in the adopted list. The editor was asked to double-check the adoption list and identify whether any concerns exist regarding this status assessment.

The JVT supported the concept of a joint MPEG+VCEG call for information on IPR status to address the need for outreach to potential patent holders and to ensure no infringement of their patent rights in the JVT design.

No change to JVT goals or policy was made in response to these contributions. Editor’s note: In this report, we have noted the contributions in which IPR para. 2.2, reasonable and non-discriminatory license, has been checked.

Inputs through MPEG

M8338, USNB Comments, notes that actions taken are consistent with the USNB requests. The group concluded that no informative annexes will be included in the draft at this stage. It will continue to work on these as a longer-term effort.

M8407, Draft CFP for fast motion estimation, was discussed. The JVT prefers to focus on normative text finalization at this time, rather than to start a process for evaluation of fast example encoding methods.

JVT-C154, Summary of JVT-Related Actions 3/02 MPEG Meeting (G. Sullivan, Microsoft), is a report of actions taken at the previous MPEG meeting where JVT was not present. G. Sullivan met with MPEG to discuss a procedure to develop profiles & levels. He replied to MPEG that strong collaborative work took place on the proposed guidelines for carriage (N4714, MPEG), and JVT believes these issues to be essentially resolved by the CD draft.

The JVT committed to providing configuration files for each defined profile to use the JVT reference software.

JVT-C156, MPEG Requirements for AVC Codec (N4672p) (MPEG), presents the philosophy, goals, and requirements for the Advanced Video Coding (AVC) standard to be developed by the Joint Video Team (JVT). The JVT understanding is that this input does not require change to current JVT video-level requirements. The JVT noted to MPEG that the YCbCr 4:2:2 requirement is unfulfilled in the CD.

JVT-C160 (MPEG) provides MPEG Guidelines Workplan for Complexity Analysis of AVC (N4571n). The JVT met with MPEG to discuss complexity analysis and decided to establish an AHG on complexity analysis with co-chairs J. Bormans (IMEC) and M. Horowitz (Polycom), using one reflector for both WG11 and JVT.

Inputs through VCEG

JVT-C153 (G. Sullivan, Microsoft) is a summary of JVT-Related Actions 2/02 from the SG16 Meeting. This input contribution was appreciated. The JVT work ongoing is consistent with the remarks provided by SG16.

VCEG input comments:

• There should be a minimum number of profiles (definitely not more than three).
• The profiles should be designed as something “to be proud of” in terms of coding efficiency.
• Suggestion of “B+2” level of memory capacity in all profiles.

Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC) Summary

The JVT adopted Committee Draft of Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC), the text of its joint draft for technically-aligned standardization approval by the parent bodies, including in particular approval in ISO/IEC as a Committee Draft (JVT-C167 / N4810). This draft will be forwarded to the JVT parent committees. The draft includes the adoption of the changes in Table 1, relative to the editor’s draft text in JVT-C039 (“JWD2r8”). Each of the documents mentioned in the table is described more fully below.

In each entry below, the document and acceptance status is given first, followed by the topic area in parentheses and then the title and description.

• Removal of elements not specifying normative video-layer functionality (bug-fix/clean-up): removal of the draft RTP payload specification (in Annex A of JWD2r8), the reference example encoder (in Annex B), the interim file format (Annex C), and the reference example error concealment description (Annex D).

• Clarification from discussion (bug-fix/clean-up): change “bi-directional” picture to “bi-predictive” picture.

• JVT-C056, conditional, subject to cross-verification and stress testing (quality enhancement): Dithered 5-tap Filter for In-loop Deblocking.

• JVT-C094, conditional, subject to cross-verification (complexity reduction): Complexity Reduction and Analysis for Deblocking Filter. Adopt modification 1 (removal of recursiveness in the “default” filter) and modification 3 (simplified switching for Bs=4, which is an I-macroblock boundary); not modification 2 at this time.

• JVT-C027 (bug-fix/clean-up): Skip Mode Motion Compensation.

• JVT-C028 (quality improvement): Context-adaptive VLC (CVLC) coding of coefficients.

• JVT-C038, conditional, subject to cross-verification (complexity reduction): Bounding the complexity of arithmetic decoding.

• JVT-C060, clearer text description for the CD will be provided (quality improvement): Improved CABAC.

• JVT-C061 (complexity reduction): Fast Arithmetic Coding for CABAC.

• JVT-C064 (definition of system interface): Alteration of byte stream format. Adopted with N=1.

• JVT-C066, JVT-C067, JVT-C103 (very significant quality improvement): Multiframe Interpolative Prediction. Related documents: JVT-C047, JVT-C066, JVT-C067, JVT-C077, JVT-C103. Adopted as in JVT-C067 and JVT-C103 with revised syntax considering JVT-C077 (non-baseline).

• JVT-C069 (clean-up): Levels and HRD.

• JVT-C078 (definition of system interface): Coding of Parameter Sets.

• JVT-C083 (clean-up): Signaling of “Clean” Random Access Points.

• JVT-C089, conditional, with a limit of a maximum of 8 slice groups and subject to further complexity analysis and verification (improved error resilience): FMO: Flexible Macroblock Ordering.

• JVT-C095 (definition of system interface): H.320 NAL for JVT. Non-system-specific start code structure elements only.

• JVT-C107 and JVT-C140 (new feature): Additional Transforms and Quantization. Adoption into non-baseline profile.

• JVT-C114 (very significant improvement): The improved JVT-B097 SP coding scheme.

• Response to JVT-C116 (bug-fix/clean-up): Color Space. Add equivalent support to MPEG-2 and MPEG-4 Video; if the information is not present, it is undefined or defined by the system. “Luma/Chroma” terminology agreed. Sequence level.

• Response to JVT-C120 (bug-fix): Change of MV calculation for direct bi-predictive motion. Send N1, N2, and D at the picture level using UVLC: MVf = N1*MV/D, MVb = N2*MV/D.

• JVT-C127 (bug-fix/clean-up): MV Prediction in B Pictures.

• JVT-C135 (bug-fix/clean-up): Pixel Aspect Ratios. Add equivalent support to H.263+/MPEG-4; if the indication is zero, it is undefined or defined by the system. Add the 1/2, 2/3, and 3/4 variants of the nominal SDTV itemized PARs. Sequence level. 8-bit table.

• JVT-C136 (bug-fix/clean-up): Rounding, QP Origin, Dynamic Range, and |f|. Item 1: adopted. Item 2: agree to initialize QP at the slice level relative to the middle of the range; check the current design at the MB level and use that with mid-range prediction for the slice level (note: could change after consideration of JVT-C079). Item 3: adopt the first of the suggested alternative variants. Item 4: adopted. Item 5: adopted. Item 6: adopt subpoint 2, not the rest. Item 7: not adopted. Item 8: no immediate action requested.

• JVT-C137 (bug-fix/clean-up): Cropping, Generalized Pan-Scan, Source Formats. Cropping and pan-scan concepts adopted; source format modifications not adopted.

• JVT-C138 (bug-fix/clean-up): TRs, PNs, Pictures, Frames, and Fields. 27 MHz support via general time clock support. Dangling field support adopted. Separate timing information for each field adopted. SEI for SMPTE-compatible compressed timing adopted. Picture numbering aspects for further study.

• JVT-C162, subject to cross-verification and some further study (bug-fix/clean-up): Putting a Reasonable Upper Limit on Binarization Expansion.

• JVT-C166 (clean-up): Definition of HRD for VBR and CBR and low delay mode. Includes a method of handling header-level data. CBR can be the same as VBR with average rate equal to peak rate. Explicit support of 3:2 pull-down is included. Relationship to other timing-related issues for further study.

Table 1. Changes to JVT-C039 to create JVT-C167 / N4810.

The document will be edited, and Joint CD text will be provided by the editor, by May 12, 2002.

The JVT agreed to remove elements not specifying normative video-layer functionality. This includes removal of the draft RTP payload specification (in Annex A of JWD2r8), the reference example encoder (in Annex B), the interim file format (Annex C), and the reference example error concealment description (Annex D).

It was agreed to change “bi-directional” picture to “bi-predictive” picture, for clarification.


Draft Content of the Interface to External Systems-Level Specifications

The joint video specification defines a common representation of all parameter set data, slice data, and “SEI” data units that are carried in the same way in bitstreams and in packets. Each atomic chunk of such data is referred to as a NAL unit, where NAL is the “network abstraction layer.” This ensures no emulation of start code prefixes of the bitstream format within a NAL unit.
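As an illustration only (the report does not detail the mechanism; the sketch below follows the escape-byte approach used in the published byte-stream format, and the function name is hypothetical), preventing start-code emulation amounts to inserting an escape byte wherever a start-code-like byte pattern would otherwise appear inside a NAL unit:

    /* Illustrative sketch only (hypothetical name): escape a NAL unit payload so
     * that the byte pattern of a start code prefix cannot appear inside it.  An
     * escape byte 0x03 is inserted whenever two zero bytes would otherwise be
     * followed by a byte in the range 0x00..0x03; the exact CD-stage mechanism
     * may have differed. */
    #include <stddef.h>

    size_t escape_nal_payload(const unsigned char *in, size_t in_len,
                              unsigned char *out, size_t out_cap)
    {
        size_t o = 0, zeros = 0;
        for (size_t i = 0; i < in_len; i++) {
            if (zeros >= 2 && in[i] <= 0x03) {
                if (o >= out_cap) return 0;   /* output buffer too small */
                out[o++] = 0x03;              /* emulation-prevention byte */
                zeros = 0;
            }
            if (o >= out_cap) return 0;
            out[o++] = in[i];
            zeros = (in[i] == 0x00) ? zeros + 1 : 0;
        }
        return o;                             /* length of the escaped payload */
    }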

The joint video specification includes definition of a bitstream format (in a normative annex). This format:

• Is capable of conveying all data “in band.”
• Includes sufficient information to identify conformance of the video bitstream; this requires inclusion of sufficient relative timing information.
• Is capable of conveying temporal reference (relative timing data) within the video bitstream for use in HRD conformance verification.

A packet-oriented interface is also defined within the joint video specification.

To support simple conversion between these two formats, distinct parts of the joint specification describe what data needs to be carried, and

• The bitstream definition section defines how to put all that into a bitstream.
• The packet-oriented definition section defines how to put this data into packets, and what other information (such as timing information) is necessary for using those packets.

Compound packets are not defined within the joint video specification. If used, these are to be defined externally.

In the bitstream format, NAL units are prefixed by unique start code prefixes; temporal reference data is included at the picture or slice level. In the packet format, timing information support is defined externally.

Further information on these aspects is available in JVT-C168, Report of Design Plan for JVT Interface to External Systems (G. Sullivan, Microsoft, JVT Rapporteur).

Requirements and the Definition of Normative Profiles and Levels

The Baseline Profile includes:

• I and P picture types
• In-loop deblocking filter
• Interlace (level-dependent, levels 2.1 and above)
• 1/4-sample motion compensation
• Tree-structured motion segmentation down to 4x4 block size
• VLC-based entropy coding
• Flexible macroblock ordering (maximum 8 groups)

This includes all normative decoding features except:

• B pictures
• CABAC
• Adaptive block-size transforms
• 1/8-sample motion compensation
• Mixing intra and inter coding types within a macroblock
• Data partitioning
• SP & SI “switching” pictures

One additional profile has been defined, the Main Profile. It adds the following differences relative to the Baseline Profile:

• B pictures
• CABAC
• Adaptive block-size transforms

Notes on Profiles and Levels:

• MV range will be limited.
• A limit is imposed on extreme aspect ratios.
• Number of reference pictures at the highest supported picture size is level-dependent.
• Number of reference pictures increases when the picture size is smaller than the highest supported size for the level, never exceeding 15.

Items still TBD (to be determined):

• Exact values of MV range limits
• Whether to allow smaller than 8x8 bi-predictive motion in B pictures for Main profile
• Whether to include adaptive B picture interpolation coefficients for Main profile

Levels (in both profiles) are summarized with example typical format support as follows:

• Level 1=QCIF@15 (Intermediate Levels [email protected] and 1.2=CIF@15)
• Level 2=CIF@30 (Intermediate Levels 2.1=HHR and 2.2)
• Level 3=SDTV (Intermediate Levels 3.1 and 3.2)
• Level 4=HDTV
• Level 5=SHDTV (1920×1088×60p)

Regarding Requirements, YCbCr 4:2:2 source format support is currently defined (pending further study and sufficiently mature contributions to the next meeting).

Motion Compensation

“Improved MB Prediction” Modes

JVT-C119, Complexity Analysis of Improved MB Prediction Modes (S. Adachi, S. Kato, F. Moschetti, M. Etoh, NTT DoCoMo), reports the results of the complexity analysis at the decoder and encoder side, focusing on the impact of different MB coding strategies for motion estimation. In particular it compares the performance and the complexity of VCEG-O17 (Tree-structured macroblock partition, H. Schwarz and T. Wiegand, HHI) and VCEG-O22 (Refined Results Low-overhead Pred. Modes, S. Adachi, S. Sekiguchi, S. Kato, M. Kobayashi and M. Etoh, NTT DoCoMo, Inc., see CSR 12.48). It proposes the adoption of the so-called “improved MB prediction modes” as a feature of higher profiles. IPR para. 2.2 is checked.

An Intel Xeon Processor at 1.7 GHz and 1024 MB of memory were used for the test. Decoder complexity was reported to be similar. Encoder complexity was reported to be between 5% and 14% higher than VCEG-O17.


For encoder operation, scalable mode selection is proposed. The seven shapes of WD1 are proposed for baseline, while the complete set of 19 shapes is proposed for a higher-complexity profile.
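For orientation, a minimal sketch of the seven WD1 block shapes referred to here, assuming they are the tree-structured partitions of the working draft (an assumption; the extended 19-shape set of the improved-MB-prediction proposal is not listed):

    /* Assumed enumeration of the seven WD1 tree-structured macroblock partition
     * shapes, 16x16 down to 4x4 (illustrative only). */
    enum wd1_mb_partition {
        PART_16x16, PART_16x8, PART_8x16,
        PART_8x8,   PART_8x4,  PART_4x8, PART_4x4
    };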

The group commented that an increased number of MB modes causes an increased amount of code, maybe leading to cache misses. It may require 4x4 MC for all MB modes. Some people pointed out that “the above statement about cache misses is true for the WD2 model as well.”

Adding Intra provides 0.8% additional bitrate savings.

Some participants were sympathetically disposed to the concept, but for motion estimation in hardware, complexity may be increased.

It was remarked that no clear judgment of code size / complexity relationship is possible.

Compression quality relative to JWD2 was expected to be 3-4% better.

JVT-C119r1 is the revised version. It was not available during the meeting.

Motion Interpolation

JVT-C014 is the AHG Report on Motion Interpolation (T.-I. Johansen, Tandberg). Significant activity was reported. Three contributions for adaptivity and three for complexity were noted.

JVT-C037, Full 16-bit implementation of 1/4-pel motion compensation (F. Bossen, NTT DoCoMo), provides information showing that quarter-pel motion compensation compliant with JWD2 can be implemented using 16-bit registers only.

It was suggested to turn this into an informative Annex. It was also reported that when using a 2-D kernel, 14-bit registers should suffice. For DSPs, a shift-based implementation often does not make a difference, while for ASICs it could.

The group suggested that F. Bossen draft a short informative annex for consideration. However, no informative annexes were included in the CD output draft due to scheduling issues. The group plans to pursue the drafting of such informative content on a slower schedule than that of the main text.

Using a set of pre-calculated filter coefficients, the proposed encoder in JVT-C040, Adaptive Motion Interpolation on MB-basis (K. Chono, Y. Miyamoto, NEC), selects a filter number and sends the selection to the decoder on a macroblock basis.

• PSNR: Reported up to 0.5 dB, 5% BD-PSNR on Mobile, average 2% overall; additional information provided.
• Complexity: Higher than the current 1/4 pel method, but less than 1/8 pel.
• Common test conditions: yes.
• Perceptual quality: No demo. The proponent’s impression is that there is a very small but visible gain.
• Verification: No.
• Software: Used JM1.4, not yet made available to JVT.

IPR para. 2.2 is checked.


Currently, a 6-tap filter is used for 1/4-pel MC and an 8-tap filter is used for 1/8-pel MC, which requires high decoder computational complexity and memory bandwidth, especially for SDTV and HDTV bitstreams. Various solutions are proposed in JVT-C052, Adaptive MC Interpolation Filter for Complexity Reduction (K. Sato, T. Suzuki, Y. Yagasaki, Sony):

Solution 1: FIR1 for MC blocks larger than 8x8; FIR2 for smaller MC blocks
Solution 2: P-picture: FIR1 for MC blocks larger than 8x8; FIR2 for smaller MC blocks
            B-picture: FIR2 for all MC blocks
Solution 3: P-picture: FIR1 for MC blocks larger than 8x8; FIR2 for smaller MC blocks
            B-picture: FIR2 for MC blocks larger than 8x8; FIR3 for smaller MC blocks

FIR1 {1, -5, 20, 20, -5, 1} // 32
FIR2 {-1, 5, 5, -1} // 8
FIR3 {1, 1} // 2
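As a minimal sketch of how one of these filters is applied (illustration only; the horizontal half-sample case, with rounding and clipping conventions assumed and hypothetical function names):

    /* Illustrative sketch only: apply the 6-tap FIR1 {1, -5, 20, 20, -5, 1} / 32
     * horizontally to produce one half-sample value between p[0] and p[1]. */
    #include <stdint.h>

    static inline uint8_t clip255(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }

    uint8_t half_sample_fir1(const uint8_t *p)   /* p points at the integer sample */
    {
        int acc = 1 * p[-2] - 5 * p[-1] + 20 * p[0]
                + 20 * p[1] - 5 * p[2] + 1 * p[3];
        return clip255((acc + 16) >> 5);         /* divide by 32 with rounding */
    }

The shorter FIR2 and FIR3 kernels reduce the number of taps, and hence the memory bandwidth and multiply count, in the same structure.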

A subset of the test set was used. Very small losses were reported. It was noted that Solution 3 requires the least complexity, but it might cause a severe loss in coding efficiency. Verification: code will be made available.

JVT-C052 proposes Solutions 1 and 3. IPR para. 2.2 is checked.

In regards to Motion Search, currently three different sub-pel filtered areas are used in motion estimation. The impact of using one filter for motion estimation instead of three and then compensating with the right one is not known. No subjective tests were conducted.

There was roughly 1% loss. The contribution reported roughly 40% less complexity for P pictures in the decoder – and a complexity increase in the encoder.

JVT-C057, Evaluation of Adaptive Interpolation Filter for Real-time Applications (A. Fuldseth, Tandberg), evaluates the use of an adaptive interpolation filter with focus on real-time applications. In particular, various methods were investigated to reduce the encoder complexity. The author had not been able to find low-complexity implementations that justify the inclusion of an adaptive interpolation filter in the standard.

• A codebook of candidate filters is used.
• Different filters in the (0,1/2), (1/2,0), and (1/2,1/2) positions respectively.
• The basic algorithm gives about 0%-5%.
• Using a codebook of 128 filters that was generated manually reduced the gain by 0-1%.
• No explicit results provided.

This contribution was presented as an information document. Tandberg tried to achieve complexity reduction, but found loss of performance – the contributor does not think much complexity reduction is likely to be achievable.

JVT-C059, New Results on Adaptive Interpolation Filter (T. Wedi, Hannover U.), presents an adaptive interpolation scheme. This interpolation scheme is based on filter coefficients that are adapted once per frame to the non-stationary statistical properties of the video signal. The filter coefficients are coded with 12 bits (requiring 32-bit registers) and transmitted in the slice header. Due to the adaptive interpolation filter, a coding gain of up to 0.8 dB PSNR was reported.

• BD-Rate savings for a set of CIF sequences are 0.8-8%, and for one sequence (Waterfall) the gains were up to 18%.
• Common conditions were 4.9% average for 1 reference frame, 3.2% for 5 reference frames; 8% peak for 6-tap, 12% for 8-tap. On other tested sequences, the gain is higher.
• Subjective results were OK.
• Verification: software of an older version is available; the presented version will also be made available.
• Results for B frames were not available.
• Group: Gains are impressive. Simple encoder implementation requested in order to show graceful degradation.
• Feature adds some flexibility for encoder optimization.
• Puts a potential burden on hardware decoders.

An idea was suggested for possible adoption at the next meeting in a non-interactive profile, subject to successful tests with 5-bit (instead of 12-bit) quantization and 16-bit implementation, and results for B frames. Further work is planned.

JVT-C104, 4-Tap Motion Interpolation Filter (J. Boyce, C. Gomila, Thomson), proposes inclusion of a 4-tap 1/4 pel motion interpolation filter in the standard, selectable by profile. Experimental results were presented comparing 4-tap and 6-tap performance, with and without the use of B frames and multiple reference frames.

• The average loss is about 6% with a maximum of about 12%.
• Subjective problems are reported with 4-tap filtering.

JVT-C105, Block Boundary Mirroring for Motion Interpolation (U. Benzler, M. Wollborn, Bosch), notes that for motion compensation, the subpel values have to be generated by a FIR interpolation filter. For a 4x4 block, this means accessing a block of 9x9=81 pels instead of the 5x5=25 pels which are needed for bilinear interpolation. An efficient method for reducing the amount of reference memory access is proposed. A similar method is found in the MPEG-4 Visual standard.

It was agreed to continue the Interpolation AHG with a mandate to study implementation/complexity and performance aspects of sub-pel MC.

Bidirectionally Predictive (B) Frames

JVT-C044, Introducing Direct Mode P-picture (DP) to reduce coding complexity (Q. Gu, Q. Wang, W. Qi, S-L. Chen, VWeb), introduces a new Direct Mode P-picture type to reduce coding complexity, in which the motion vector and mode information can be derived directly from the previous normal P-picture without motion estimation and the mode decision process. The decrease in residual coding efficiency is reported to be offset by the savings in motion vector and mode information overhead coding. This contribution proposes a syntax change. It suggests using this on every other picture, with no mode and MV information sent, which accounts for a large share of the bits at low bit rate.

One participant noted that one can get the same complexity reduction without a syntax change. The proponent responded that this has a coding efficiency improvement relative to the no-syntax-change approach (results not provided).

JVT-C076, Coding of Faded Scene Transitions (D. Tian, TICSP; M. Hannuksela, Nokia), presents a so-called “overlay” coding technique for faded scene transition coding. Overlay coding is based on independent coding of the source sequences of the scene transition and run-time composition of the fade. This paper analyzes possibilities to preprocess the source sequences and finds that pixel-amplitude weighting methods can further improve the performance of simple overlay coding. Random access is possible with overlay coding.

Although complexity can be controlled versus level assignment, the increase was considered to be difficult to estimate. The gain of the proposal is significant but its relationship to the complexity increase remains unclear. Gamma correction may impact the generality of the composition approach.

JVT-C120, B picture coding for sequence with repeating scene changes (B-M. Jeon, LG Electronics), notes that in many video applications, a scene or part of a scene may appear repeatedly, such as in news, sports broadcasting, a close-up conversation like an interview, and multi-point video conferencing. Such a change in the video scene will cause the scene cut picture to be encoded in Intra picture mode, or the partial scene-change picture to create a large number of Intra coded macroblocks. Because Intra mode will normally require much higher bit consumption, it can cause a burst of increased bit rate, and this problem will be particularly significant in low bit rate applications. As a way to absorb such a burst of bit rate, rate control mechanisms have been applied, but they can result in a sharp drop in image quality and create unpleasant visual effects. Instead, the current JVT model employing the long-term buffer scheme can significantly improve the coding efficiency when there are multiple repeating scene changes in the sequence. That is, for a scene cut picture or a partial scene-change picture, the long-term picture belonging to the same scene group will be used as a normal reference picture in addition to multiple short-term pictures, so that Inter mode will be selected as the coding mode instead of Intra mode, and this helps avoid a burst of bit rate. Those long-term pictures are pictures that initially turned out to be a scene cut picture or a partial scene-change picture, as determined by a scene-change detector on the encoder side, were encoded in Intra picture mode (for a scene cut picture) or in Intra MB mode (for a partial scene-change region), and finally were stored in the long-term buffer by an MMCO command.

It is known that B pictures generally improve the overall compression efficiency as compared to that of P pictures. In particular, direct mode is dominant at low bit rate because it does not require any side information, whereas bi-directional mode is dominant at high bit rate because the performance at high bit rate depends on the efficiency of the residual coding, and the bi-directional mode can provide the more accurate prediction.

Now, consider that the B picture is inserted when the sequence with repeating scene changes is coded. If the co-located macroblock in the temporally subsequent picture uses the long-term picture for the prediction, the remaining issue is how to get the forward and backward motion vectors of the direct mode, because the temporal distances TRB and TRD in the conventional equations in JWD2 are no longer appropriate. JVT-C120 proposes how to derive the forward and backward motion vectors of the direct mode when there is a scene change between the most recent previous picture and the temporally subsequent picture and the motion vector in the corresponding macroblock in the subsequent picture was obtained from a long-term picture. JVT-C120 notes considerable gains for the affected B pictures. IPR para. 2.2 is checked.

In response to JVT-C120, the group agreed to a change of MV (motion vector) calculation for direct bi-predictive motion. It was agreed to send N1, N2 and D at picture level using UVLC:

MVf = N1*MV/D
MVb = N2*MV/D

Concern was expressed regarding the use of a division operator. Alteration to the (N1*MV+offset)>>K form, possibly with zero offset, is for further study. After the meeting, remarks were made that significant loss of compression quality was reported to result from the adopted change by B-M. Jeon. Further study is expected.
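A minimal sketch of the adopted scaling and of the shift-based alternative mentioned above (hypothetical names; the normative definition is the formula in the draft, not this code):

    /* Illustrative sketch only: direct-mode MV scaling per the adopted change
     * (division form) and the (N*MV + offset) >> K variant left for further study. */
    typedef struct { int x, y; } mv_t;

    mv_t scale_mv_div(mv_t mv, int n, int d)                 /* MVf/MVb = N*MV/D */
    {
        mv_t r = { n * mv.x / d, n * mv.y / d };
        return r;
    }

    mv_t scale_mv_shift(mv_t mv, int n, int offset, int k)   /* alternative form */
    {
        /* note: right-shifting negative values needs care in a real decoder */
        mv_t r = { (n * mv.x + offset) >> k, (n * mv.y + offset) >> k };
        return r;
    }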


JVT-C121, New syntax for Bi-directional mode in MH pictures (B-M. Jeon, LG Electronics), notes that when bi-directional mode in a bi-predictive picture is used, the combination employed has to be specified by two separate index parameters, ref_idx_fwd and ref_idx_bwd. JVT-C121 proposes introduction of a new syntax element “Multi-Hypothesis mode Indicator” into the macroblock layer of bi-predictive pictures to indicate direction information for bi-directional mode. IPR para. 2.2 is checked.

The group agreed to change the nomenclature of “bidirectional picture” in the draft to “bi-predictive” picture for enhanced clarity of the intent to decouple the decoding design from prior assumptions of the temporal direction of bi-predictive motion compensation.

The draft was double-checked to ensure clarity of the relationship between time and motion-compensated prediction operation, and further work on the subject of that clarification is expected.

JVT-C127, Motion Vector Prediction in Bidirectionally Predictive (B) frames with regards to Direct Mode (A. M. Tourapis, J. Xu, F. Wu, S. Li, Microsoft), investigates the performance of the motion vector predictor within B frames with regards to the B Direct mode. It reports that the existing JVT standard does not specify how such prediction should be performed, whereas in the current software Direct Mode motion vectors are not used for the prediction. Instead it proposes that B Direct mode motion vectors also be used within the prediction, thus improving B frame coding efficiency by up to 4.5%. The scheme also increases motion vector correlation for B frames within the encoded stream, which is quite important for error concealment purposes.

The results with CABAC are missing. Software was provided for verification. No subjective results were reported. JVT-C127 was accepted, as the change appeared to be a minor bug-fix/clean-up issue and unlikely to cause problems.

JVT-C128, Direct Prediction for Predictive (P) and Bidirectionally Predictive (B) frames in Video Coding (A. M. Tourapis, J. Xu, F. Wu, S. Li, Microsoft), proposes a new Inter macroblock mode, which is an extension of the Direct Mode used within B frames. This new mode is reported to effectively exploit temporal correlation of motion information within a sequence, thus effectively improving performance. The implementation is reported to be relatively simple, with little complexity added on the encoder/decoder. The combination of this mode with the predictor skip mode (MotionCopy, JVT-C027) is reported to achieve considerable bitrate reduction (up to 10.7%, average 3.85%). Further extensions of this mode, with the addition of submodes and the inclusion of multihypothesis prediction, are also described; the new mode also appears to be very promising for a future implementation within B frames.

The average gain on top of JVT-C027 is 1.65% with a maximum of 5%. A problem in error-prone environments was mentioned, and the gain appeared not to be worth the syntax change.

Multiframe Motion Prediction

JVT-C010 (M. Schlockermann, Panasonic) is the AHG Report on Multiframe Buffering Syntax. Active AHG activity was reported. In the new definition of the direct mode, it was reportedly not clear which motion vector to use. The storing in display order may have an error resilience impact.

JVT-C066, Multiframe interpolative prediction with modified syntax (Y. Kikuchi, T. Chujoh, S. Kadono, Toshiba), notes that the multiframe interpolative prediction (MFIP), which was first proposed in JVT-B075, exploits a temporal interpolation on multiple reference frames. JVT-C066 proposes a modified syntax based on the B-picture in the current JWD. This syntax enables switching between the two default MFIP coefficients with no overhead, and requires only minor changes. IPR para. 2.2 is checked. Simulation results are shown in JVT-C047 (Result on Multiframe interpolative prediction with modified syntax, S. Kadono, M. Hagai, Matsushita), JVT-C067 (Experiment result on multiframe interpolative prediction with default coefficients, Y. Kikuchi, T. Chujoh, Toshiba), and JVT-C103 (Interpolation coefficient adaptation in multiframe interpolative prediction, Y. Kikuchi, T. Chujoh, Toshiba). These three papers verify the effectiveness of the MFIP. SNR improvements of up to 1.7 dB for fading sequences are shown even with simple default coefficients, and extremely high coding gain of up to 5.8 dB by adaptively determining the coefficients. Since the coding gain derived by the MFIP is significant, it was accepted for the JVT CD, as shown in JVT-C067 and JVT-C103, with revised syntax considering JVT-C077 (below), into non-baseline, with a final decision left to the Profiles and Levels discussion.

In JVT-C047, MFIP shows an average bit saving of 14-28% for fading scenes. By downloading weighting coefficients, the average bit savings sometimes exceeded 50%. It verifies the results of JVT-C067 and JVT-C103. It was noted that multiplying by two requires clipping.

JVT-C067 reports experimental results on the multiframe interpolative prediction (MFIP) proposed in JVT-B075. It automatically works with Lagrangian mode selection. PSNR improvements of up to 1.7 dB are reported with simple default coefficients (1/2,1/2,0) and (2,-1,0). For non-fading sequences, the gains with 2 reference pictures are small. Verification is OK.

The coding performance of the multiframe interpolative prediction (MFIP) with interpolation coefficient adaptation was evaluated. Simulation results in JVT-C103 show extremely high improvement of up to 5.8 dB. Subjective results showed a significant gain. Complexity is 2 multiplications and 1 shift, 1 addition, and clipping, instead of 2 shifts. Verification was OK.
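A minimal sketch of the kind of two-reference weighted prediction described here, with the weights expressed in halves so that the default coefficient pairs (1/2, 1/2, 0) and (2, -1, 0) map to integer arithmetic (hypothetical names; not the draft’s exact formula):

    /* Illustrative sketch only: MFIP-style prediction pred = w1*ref1 + w2*ref2 + o,
     * with weights given as multiples of 1/2, e.g. (1, 1, 0) for averaging and
     * (4, -2, 0) for the fade-oriented (2, -1, 0) pair.  Roughly the 2-multiply,
     * 1-shift, clipping complexity noted above. */
    #include <stdint.h>

    static inline uint8_t clip_u8(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }

    uint8_t mfip_predict(uint8_t ref1, uint8_t ref2, int w1_half, int w2_half, int offset)
    {
        int acc = w1_half * ref1 + w2_half * ref2;                 /* scaled by 2 */
        int pred = (acc >= 0) ? ((acc + 1) >> 1) : -((-acc + 1) >> 1);
        return clip_u8(pred + offset);
    }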

JVT-C077, Generalized B/MH-Picture Averaging (M. Hannuksela, Nokia), presents a generalized formula for pixel amplitude scaling according to temporal distance. Under certain constraints, the formula equals weighted averaging of B-pictures (Q15-K44, August 2000, see CSR 11.10) and interpolative motion compensation (JVT-B075). In addition, it proposes explicit signaling of pixel amplitude scaling factors.

JVT-C049, Multi-picture Buffer Semantics for Interlaced Coding (P. Borgwardt, VideoTele.com), provides three proposals related to the Multi-picture Buffer. Semantics are proposed for handling the mix of frames and fields in the Multi-picture Buffer: picture number (PN) is incremented by 1 for each frame. JVT-C049 also proposes to de-couple RPSLI and RPBT in the syntax, and to move Multi-picture Buffer re-ordering and management commands from the slice layer to the picture parameter set. The assumption is that the picture is stored in Adaptive Memory Control. Codewords will be added to indicate reference buffer resampling from the first slice, and a codeword on how to store the picture. Field/frame adaptive operation analysis is required.

Motion Vector Coding

JVT-C041, Core Experiment Results on Adaptive MV Coding (Y. Suzuki, Hitachi), provides a report of core experiment results on adaptive motion vector coding and the revised text for the CD. The proposed scheme provides general improvements of up to 5% for similar complexity to JWD2. Adaptive MV coding sub-samples the MV table to more efficiently transmit MVD. Syntax extension of MB modes as well as the 8x8 block indication are needed to signal accuracy. The proposed scheme is reported to provide general improvements on average between 2-3% BDBR, with similar performance for most cases: 0.27-4.5% for the higher-complexity case with B frames and 0.15-3.2% for the baseline case without additional complexity. IPR para. 2.2 is checked. The group agreed that there is a good trade-off between complexity and gain, but not enough support to adopt the feature. The coding efficiency gain is too small.


JVT-C051, Result of Core Experiment on Adaptive MV Coding (K. Sato, T. Suzuki, Y. Yagasaki, Sony Corp.), uses two coders to evaluate the new entropy coding for motion vector data:

• Anchor TML: the previous or latest TML on and after version 1.7
• AMVC software: the anchor TML with the adaptive MV coding described in JVT-B115 (see CSR 13.03).

It presents the CE (Core Experiment) result proposed in JVT-B061 (see CSR 13.03), and verifies JVT-C041.

JVT-C097, Context-based motion vector coding with 2-Dimensional VLC (A. Nakagawa, Fujitsu; T. Takeshi, Waseda University), notes that the probability density function of differential motion vector values is strongly dependent on the motion of the local area in a frame. It evaluates a context-based motion vector coding scheme. The scheme utilizes statistics of the motion vectors estimated from the motion in the surrounding area and switches the VLCs for encoding the motion vectors adaptively. In addition, a 2-Dimensional vector coding is introduced to improve the coding efficiency of the scheme. There was an average gain of 1.3%, with up to 4% for Foreman. Further work and discussion in relation to entropy coding is encouraged.

Global Motion Vector Coding and Global Motion Compensation

JVT-C011 (H. Kamata, NTT; J. Lainema, Nokia) is the AHG Report on GMVC/GMC. It confirms the desirability of GMVC/GMC and the so-called "Motion Copy" technique provided by Nokia on March 15. Based on this technique, three modified techniques have been proposed: Mcopy_r1, Mcopy_r2, and Mcopy_r1+r2. The subject has been very extensively studied.

In JVT-C021, Experimental results of GMVC and "Motion Copy" (S. Sun, S. Lei, Sharp), the GMVC technique uses a parametric motion model (perspective or affine) transmitted in the slice header. In the MB layer, the conventional copy mode for an MB is replaced by a GMVC_COPY mode when GMVC_flag is ON. An additional GMVC_BLOCK mode is defined as a sub-partition mode for the 8x8 (split) MB mode when GMVC_flag is ON.
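
For readers unfamiliar with parametric motion models, the sketch below shows a six-parameter affine model of the kind mentioned above: the global displacement at a pixel is a linear function of its position. The parameter names are illustrative, and the eight-parameter perspective case is omitted.

    /* Six-parameter affine global-motion model (illustrative names). */
    typedef struct { double a, b, c, d, e, f; } AffineParams;

    void global_motion_vector(const AffineParams *p, int x, int y,
                              double *dx, double *dy)
    {
        /* displacement is a linear function of the pixel position */
        *dx = p->a * x + p->b * y + p->c;
        *dy = p->d * x + p->e * y + p->f;
    }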

The bitrate savings are up to 20% at QP=28 for Foreman, CIF. The average bitrate savings over the test set are 2.05%. For B frame coding, the maximum bitrate saving is 4.8%, with an average of 1.45%.

JVT-C022, Additional "zero-motion" subblock for "Motion Copy" (S. Sun, Sharp), provides results of a study of "Motion Copy" with an additional "zero-motion" sub-block mode. The performance is slightly improved against JVT-C027 with the option to also skip 8x8 blocks.

JVT-C027, Skip Mode Motion Compensation (J. Lainema, M. Karczewicz, Nokia Research Center), proposes a simple modification to the Skip mode motion compensation. A motion vector is generated similarly to the prediction motion vector for the 16x16 macroblock mode and is used in motion compensation. That is, no syntax changes are proposed, but the way Skip mode is interpreted is modified. Compression efficiency improvements are similar to those of significantly more complex GMC/GMVC methods.

The new rule is that zero motion vector is used if:

• A Macroblock immediately above or to the left is not available (that is, it is out of the picture or belongs to a different slice).

• A Macroblock or block immediately above or to the left that is used in motion vector prediction has a zero motion vector and uses the latest picture as reference in motion compensation.

Otherwise the predictor is used for skip-mode motion compensation prediction.
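
A minimal sketch of the rule just described, assuming the availability and zero-MV conditions have already been evaluated by the caller; the function and parameter names are illustrative, not the JVT-C027 software.

    /* Skip-mode motion vector selection per the rule above. */
    void skip_mode_mv(int left_available, int above_available,
                      int left_is_zero_mv_latest_ref, int above_is_zero_mv_latest_ref,
                      int mv_pred_x, int mv_pred_y,
                      int *mv_x, int *mv_y)
    {
        if (!left_available || !above_available ||
            left_is_zero_mv_latest_ref || above_is_zero_mv_latest_ref) {
            *mv_x = 0;                 /* fall back to the zero motion vector */
            *mv_y = 0;
        } else {
            *mv_x = mv_pred_x;         /* otherwise use the 16x16-mode MV predictor */
            *mv_y = mv_pred_y;
        }
    }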

Bitrate reductions occur only at the very low bitrate end, up to 18% at QP=28 for Foreman, CIF. The average bitrate savings over the test set are 2.2%. For B frame coding, the maximum bitrate saving is 5.4%, with an average of 0.35%. The group felt that the performance vs. complexity tradeoff is reasonable, and JVT-C027 was accepted.

JVT-C043, GMVC and GMC with B-picture (H. Kimata, NTT), proposes GMC and GMVC combined with "Motion Copy." Experiments with both P-pictures and B-pictures showed improvement of coding efficiency of up to 9.68% BD-bit-rate savings, especially in a sequence with zoom. The maximum (best setting) average bit rate saving over the test set was 3.36%, and 4.52% over a set of sequences with global motion. The maximum bit rate saving against B was 3.5%, with an average of 2%. IPR para. 2.2 is checked. Comparing JVT-C027 and JVT-C043, the group felt that the performance vs. complexity tradeoff is not reasonable. Further improvements on top of JVT-C027 or reductions in complexity would be required.

Deblocking Filter

JVT-C007 (P. List, Deutsche Telekom) is the AHG Report on Deblocking. Significant activity in the area of deblocking filter work was noted.

JVT-C035, A Low Complexity Deblocking for JVT (J. Jung, Y. Le Maguet, C. Miró, J. Gobert, Philips), is a follow-on to JVT-B037 (CSR 13.03). A low complexity deblocking algorithm is proposed, referred to as FADA (for Fast Deblocking Algorithm). This new algorithm was implemented in the encoding loop and compared to the JM 1.9 deblocking in terms of both quality and complexity. The contribution reports that FADA is equivalent to the JM 1.9 deblocking in terms of perceived visual quality. A complexity gain between 1 and 2.2 was reported, depending on the target platform. IPR para. 2.2 is checked. The presentation included:

1) 4x4 classification based on luma into categories: homogeneous, row, column, texture
2) Filter selection: no filter, or one, two, or three samples on each side
3) Filtering

The same filter was applied for each four-sample edge. The encoder indicates on a picture-by-picture basis whether the "same MV with zero residual" edges are filtered or not. A demo of the artifact on a frame of Tempête was shown. Another example was given verbally: a blocking artifact present in one frame followed by zero motion vectors can produce persistent blocking artifacts. A question was asked whether the encoder could possibly detect and correct this in some other way.
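
Purely as an illustration of classification step 1 above, the sketch below sorts a 4x4 luma block into the four named categories from simple gradient sums. The gradient measure, the threshold, and the mapping of the two directions onto "row" and "column" are assumptions rather than the FADA definition.

    #include <stdlib.h>   /* abs */

    enum BlockClass { HOMOGENEOUS, ROW, COLUMN, TEXTURE };

    enum BlockClass classify_4x4(const unsigned char blk[4][4], int thr)
    {
        int horiz = 0, vert = 0;
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 3; j++) {
                horiz += abs(blk[i][j + 1] - blk[i][j]);   /* activity along rows    */
                vert  += abs(blk[j + 1][i] - blk[j][i]);   /* activity along columns */
            }
        if (horiz < thr && vert < thr) return HOMOGENEOUS;
        if (vert < thr)                return ROW;      /* varies only horizontally (assumed) */
        if (horiz < thr)               return COLUMN;   /* varies only vertically (assumed)   */
        return TEXTURE;
    }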

A complexity analysis was given of 1) CPU cycles, 2) min, max, abs type instructions, and 3) parallelization. The group questioned the worst-case versus average complexity.

Verification: Software was provided to the AHG a month ago. UB Video and Nokia have tried it.

The JVT Group agreed that this proposal needs more analysis. At this stage there was no agreement that this indeed reduces complexity for all architectures.

The following need more work:

• Memory size, ASIC architectures, Pentium-based architectures
• More evidence is needed that complexity is indeed reduced
• The drop in SNR and bit rate is a concern.

After the demo, there still was no consensus to accept the proposal. The group suggested conducting more experiments on a wider class of video sequences.

JVT-C056, Dithering 5-tap Filter for In-loop Deblocking (G. Conklin, N. Gokhale, RealNetworks, Inc.), proposes a change to the 5-tap filter used in in-loop deblocking. The deblocking algorithm used in the current (JM 2.0) test model switches from a 4-tap filter to a stronger 5-tap filter for smooth regions of the frame. Specifically, the 5-tap filter is used on the macroblock edges of INTRA_16x16 macroblocks in smooth parts of the image. By using a pseudo-random addition factor in the filter definition, a small (+/- 3/8 LSB) dither is applied to the filtered image. This dithering effect improves visual quality for two reasons:

• It allows for effective smoothing between blocks that differ by a single luminance level.
• It visually "breaks up" the block structure that can remain even after filtering.

While the proposed scheme improves visual quality, the effects on bitrate and PSNR are negligible, as the results show. To further improve visual quality, additional results will be presented using a recursive version of the 5-tap filter.
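
A sketch of a dithered strong filter in the spirit of JVT-C056: a small pseudo-random term is folded into the rounding offset so the result can move by a fraction of an LSB. The tap weights, the random-number generator, and the exact dither range are assumptions, not the proposal's definition.

    static unsigned char filter5_dither(const unsigned char *p, unsigned int *seed)
    {
        *seed = *seed * 1664525u + 1013904223u;       /* simple LCG */
        int dither = (int)(*seed >> 29) - 3;          /* roughly -3..+4 in the sum domain,
                                                         i.e. a few eighths of an LSB */
        int sum = p[-2] + 2 * p[-1] + 2 * p[0] + 2 * p[1] + p[2];   /* 5 taps, weights sum to 8 */
        int v = (sum + 4 + dither) >> 3;
        if (v < 0)   v = 0;
        if (v > 255) v = 255;
        return (unsigned char)v;
    }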

The current JVT design has a 5-tap filter that is used for smooth intra 16x16 regions. There is still blockiness on some displays, which is particularly annoying in smooth areas.

Verification: Software was provided to the ad hoc group approximately one month before the meeting.

Discussion noted:

• Need to see a demo for moving pictures: for the two sequences shown, quality was a little better (smoother and visually better).

• Improves sequences with large flat areas.
• Need to compare against post filtering.
• May add some burden to low-end processors that exist today.

The JVT Group gave this contribution conditional acceptance, and noted the need for more tests at lower QP values and over a larger number of sequences.

JVT-C094, Complexity Reduction and Analysis for Deblocking Filter (J. Au, B. Lin, A. Joch, F. Kossentini, UB Video), attempts to reduce the complexity of the JM 1.9 deblocking filter described in JVT-B118r7, Working Draft 2 Revision 7, as much as possible while providing equivalent or better subjective quality. JVT-C094 proposes three simplifications to the JM 1.9 deblocking filter that reduce complexity by an average of 23% for media-processor implementations while providing equivalent to slightly improved subjective quality. Hardware implementations will also benefit from the proposed modifications. The original JM 1.9 deblocking filter and the proposed filter are compared in terms of their optimal implementation complexity on the TI C64x DSP platform. The proposed deblocking filter is compared to the latest Philips Deblocking Solution (PDS) deblocking filter software (provided by Philips to the deblocking filter ad hoc group on April 19, 2002), which is a modified version of the filter proposed in JVT-B037 (CSR 13.03, Feb. 2002). While PDS is designed for low complexity deblocking on media processors, this complexity analysis found that the proposed deblocking filter has similar complexity to PDS while maintaining slightly better subjective quality, as indicated by formal subjective test results.

The software was available to the AHG a week before the meeting. Verification: Nokia tried the software and found essential equivalence in visual quality (a very slight preference, if any, for the JM method). There was no clear feeling on the complexity estimates.

The proponent noted that JVT-C094 is similar to the Philips proposal, but the Philips proposal is not as good (particularly for QCIF, especially Foreman). M. Horowitz (Polycom) may be able to comment on complexity. It was agreed that Foreman QCIF is the most complex case, and the most important one for deblocking in the common conditions.

The group agreed to adopt modification 1 (removal of recursiveness in the "default" filter) and modification 3 (simplified switching for Bs=4, which is the I-macroblock boundary). Further work will be conducted to see whether some parts of JVT-C130 can be included. The JVT conditionally accepted the proposal in JVT-C094, subject to cross-verification.

JVT-C111, Revised Downloadable Threshold Tables for Loop-filter (T-W. Foo, S-M. Shen, S. Kadono, Panasonic), divides the loop-filter into two parts: the detection of block edges and the filtering part. The detection of block edges depends on a set of threshold values, in the case of the JVT codec the ALPHA and BETA tables. The quantization parameter (QP) of the macroblock determines which ALPHA and BETA values are used for detecting the block edge. It reportedly improves the threshold values for subjective video quality. IPR para. 2.2 is checked.

Syntax was added to specify threshold values for decoding. The proponent indicated that the current thresholds are optimized for CIF picture size; larger thresholds seem to work better on both smaller and larger picture sizes. Some examples were shown of artifacts alleviated with the downloadable method.
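
For context, the sketch below shows the kind of QP-indexed ALPHA/BETA edge-activation test the description above refers to; the table contents are the part the proposal would make downloadable, and the sample naming (p1, p0 on one side of the edge, q0, q1 on the other) follows common deblocking convention.

    #include <stdlib.h>   /* abs */

    /* Returns non-zero if the block edge between p0 and q0 should be filtered. */
    int edge_is_filtered(int p1, int p0, int q0, int q1,
                         const unsigned char *ALPHA, const unsigned char *BETA,
                         int qp)
    {
        return abs(p0 - q0) < ALPHA[qp] &&   /* step across the edge is small enough */
               abs(p1 - p0) < BETA[qp]  &&   /* each side is locally smooth, so the  */
               abs(q1 - q0) < BETA[qp];      /* step is likely a coding artifact     */
    }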

The JVT Group discussion noted that more evidence is needed. The group liked the concept but had no clear consensus on how often to download (slice-layer downloadable?). P. List (Deutsche Telekom) noted that a fixed number of switchable tables is desirable. JVT-C111 was not accepted at this meeting. If further results look encouraging, the group may accept the concept at the July meeting, but not for baseline text.

JVT-C112, Automatic Selection of Threshold Values for Downloadable Threshold Tables (T-W. Foo, S-M. Shen, S. Kadono, Panasonic), presents a program for automatic selection of threshold values for video sequences, to support the proposals in JVT-C111 and JVT-B084. Different threshold values are reported to optimize the subjective video quality of encoded video. Encoder-only optimization was used for JVT-C111. IPR para. 2.2 is checked. The program:

1. Choose a number of frames (e.g., 5 frames at the beginning of the sequence).
2. Encode them using different threshold values.
3. Choose the set with the highest PSNR.
4. Encode the entire sequence using those thresholds.
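
A minimal sketch of that selection loop; encode_probe_frames() and probe_psnr() are hypothetical stand-ins for the encoder run and the PSNR measurement, not functions from the JVT-C112 program.

    void   encode_probe_frames(int n_frames, int threshold_set);   /* hypothetical helper */
    double probe_psnr(int n_frames);                               /* hypothetical helper */

    int pick_threshold_set(int n_candidate_sets, int n_probe_frames)
    {
        int best_set = 0;
        double best_psnr = -1.0;
        for (int t = 0; t < n_candidate_sets; t++) {
            encode_probe_frames(n_probe_frames, t);     /* step 2: encode the probe frames */
            double psnr = probe_psnr(n_probe_frames);
            if (psnr > best_psnr) {                     /* step 3: keep the best set */
                best_psnr = psnr;
                best_set  = t;
            }
        }
        return best_set;     /* step 4: encode the whole sequence with this set */
    }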

JVT-C130, Simplification of the JVT deblocking filter (C. Gomila, J. Boyce, Thomson Multimedia), proposes a simplification of the deblocking filter currently in the draft standard. IPR para. 2.2 is checked. The aim is to reduce the number of line-based operations, which are reported to be by far the most demanding term when evaluating the computational cost of filtering. The proposed method was reported to use on average 47.81% fewer operations than the JM 1.9 deblocking filter, with minimal impact on visual quality and PSNR. The simplification applies to:

• An HVS argument for reducing processing in very dark and very bright areas (Y below 48 and above 232); see the sketch after this list.

• Linking chroma and luma filtering (reducing the computations needed to determine chroma filtering conditions): lock them together so that chroma is filtered if and only if luma is filtered.

• Avoiding the distinction between stronger and normal filtering: change the stronger F6 filter to F4 and the chroma length-4 filter to F2. (F# indicates the number of samples that change on either side of the edge.)
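
A sketch of the luminance-range skip in the first bullet above. Applying the bound to the average of the two samples straddling the edge is an assumption about how the range test would be evaluated, not the JVT-C130 text.

    /* Returns non-zero when the edge lies in a very dark or very bright region
       and, per the HVS argument, can be left unfiltered. */
    int hvs_skip_filtering(int p0, int q0)
    {
        int avg = (p0 + q0 + 1) >> 1;
        return avg < 48 || avg > 232;
    }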

JVT Group Discussion noted that the proposal in JVT-C130:

• Needs more analysis for wider classes of video.
• Reduces memory requirements.

The group is concerned about the reduction in quality. They liked some of the ideas but are hesitant to accept the whole proposal. Thus, it was not accepted at this meeting. Work will start to combine some of the ideas here with JVT-C094. Participants include C. Gomila (Thomson), M. Zhou (TI), and A. Joch (UB Video).

HRD/VBV Buffering Model

JVT-C020 (E. Viscito, GlobespanVirata) is the AHG Report on Buffering. Issues noted include how to account for header data, timing, and low-delay versus non-low-delay operation.

JVT-C166, Description of Hypothetical Reference Decoder (HRD) and Buffering Verifiers (M. Hannuksela, Nokia; E. Viscito, GlobespanVirata), is a document for consideration on a definition of the HRD for VBR and CBR, and a low-delay mode. It includes a method of handling header-level data. CBR can be the same as VBR with the average rate equal to the peak rate. It includes an explicit film mode state for 3:2 pull-down.

The JVT group discussed incorporating the HRD syntax and text discussed during the plenary into the CD in addition to the prior HRD text. They suggest not including special splicing features. It was agreed to adopt this paper except for the explicit film mode state for 3:2 pull-down; that issue is to be resolved in further study as it relates to JVT-C138.

JVT-C138, Temporal References, Picture Numbers, Pictures, Frames, and Fields (G. Sullivan, Microsoft), contains a design for the representation of time and the management of multi-picture buffer contents for the JVT codec, including discussion of the use of interlaced frames and fields as well as progressive-scan pictures.

• 27 MHz support via general time clock support.
• Dangling field support adopted.
• Separate timing information for each field adopted.
• SEI for SMPTE-compatible compressed timing adopted.
• Picture numbering aspects for further study.

JVT-C046, NAL for H.264 with MPEG-2 Systems (A. MacInnis, S. Chen, Broadcom), proposes the specifics of an MPEG-2 Systems NAL for use with H.264. The proposal is consistent with the current normative agreements of draft H.264, and it attempts to harmonize other H.264 concepts and suggestions with existing video industry practice and the MPEG-2 Systems and Video specifications. Due to the existing very wide deployment of systems, devices, and software that utilize MPEG-2 Systems and MPEG-2 Video, and the resulting opportunity for large-scale deployment of H.264 in such systems, the NAL to package H.264 video in MPEG-2 should conform as closely as possible with the outer layers of MPEG-2 video, subject to the efficiency goals and semantic requirements of H.264, while the MPEG-2 Systems layer should be unmodified from its original specification, except for the unavoidable assignment of stream IDs and similar minor modifications. It was noted that this proposal also contains some HRD content.

Comments from the group included:

• The proposal resembles JVT-C055 (below) to some extent.
• What is the difference between "Type-2" and "Type-1" slices? Why are they both needed?
• The proposal does not follow the spirit of the current JVT high-level syntax based on parameter sets.
• The proposal is not optimal for cross-network gateway design.
• How is the proposal going to affect profiles?
• It was requested to create a table of the critical requirements of the "TELs" being defined (H.320, ISO media file format, RTP, and MPEG-2 Systems).
• Fundamental difference between an RTP-oriented TEL (Transport Encapsulation Layer) and an MPEG-2-oriented TEL: the absence or presence of a picture header. Picture-synchronous data could be repeated to ensure its correct reception in error-prone environments.
• There were concerns about the overhead of carrying the current JVT slice header in each slice in error-free environments. It was agreed to leave the slice header as it is, because the overhead was considered to be moderate.
• A picture header can be useful to find a complete picture. However, a flag at the beginning of the slice header, a picture start code alone, or a bit in the PTIB/NALP_type could indicate similar functionality.
• In MPEG-2, vbv_delay is a part of each picture header. In JVT, vbv_delay could be carried in an SEI message, as it is not crucial that it be received for each picture.
• A fixed-length slice type in a fixed position of the slice header was proposed.
• It also has some slice types in it.
• Further discussion: Type 2 and Type 1 slices are not the same.

Proponents were encouraged to bring in evidence of the benefits of having Type 2 slices. If the evidence is found to be satisfactory during the July meeting, it will be accepted.

JVT-C055, NAL syntax and carriage of JVT stream over MPEG-2 Systems (T. Suzuki, N. Oishi, Y. Yagasaki, Sony Corp.), provides detailed definitions of the NAL for MPEG-2 Systems, based on the study in JVT-B070 (CSR 13.03). IPR para. 2.2 is checked. See also the discussion on NAL, High-Level Syntax, and Robustness, below.

JVT-C069, Levels and HRD (M. Hannuksela, Nokia), proposes processing and storage space level definitions for the JVT codec. It also proposes a Hypothetical Reference Decoder (HRD) including a low-delay mode. The current working draft assumes a clearly-established decoding time.

Proposed:

• A slice arrives as a chunk into the HRD at the time corresponding to the reception of the last bit of its content.

• A post-decoder buffer of uncompressed pictures holds pictures until they are no longer needed for reference or display.

• Each NAL packet has a hypothetical transmission time (the time of the last bit arrival for the packet) at the bit rate.

• Note: There is also some levels definition content.
• The buffer is inspected at the base clock tick rate.
• A NAL packet, including its header, goes into the buffer.
• Low-delay mode: the picture is removed once it has arrived (see the sketch below).
• Normal mode: it is removed at its decoding timestamp.
• Normal mode: space must be available in the post-decoder buffer.
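
A small sketch contrasting the two removal rules in the list above; the structure and field names are illustrative, not the JVT-C069 syntax, and times are abstract clock ticks.

    typedef struct {
        double arrival_time;        /* hypothetical transmission time of the last bit */
        double decoding_timestamp;  /* signalled decoding time                        */
    } HrdPicture;

    double removal_time(const HrdPicture *pic, int low_delay_mode)
    {
        /* low-delay: remove once the picture has fully arrived;
           normal:    remove at its decoding timestamp            */
        return low_delay_mode ? pic->arrival_time : pic->decoding_timestamp;
    }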

Discussion yielded the following comments:

• Low-delay mode (in the proposal): Can this cause a problem of the decoder getting ahead of the content?

• It is proposed to send an initial post-decoder buffering period.
• Must a picture arrive by its presentation time? (e.g., MPEG-2 low-delay mode).
• Some syntax is assigned to the parameter set, some to SEI. Operation points and the post-decoder buffer size are sent at the parameter set level.
• Does the post-decoder buffer size need to be specified?
• Maybe pictures shouldn't be sent in decoding order.

"Low delay" operation means not necessarily completely reconstructing the input relative timing (e.g., MPEG-2 low delay or the H.263 or H.261 HRDs) at the output. "Non-low-delay" operation means the output of the HRD decoding process will preserve the relative timing of all TRs (PTSs).

The distance between PTSs must be at least the equivalent of the MPI.

There is a question whether “low delay” rules out the need for picture reordering.

Requirements for encoder and decoder compliance are specified.

The disposition was as noted above with regard to JVT-C166. It was agreed to provide text regarding the post-decoder buffer for tentative adoption, subject to profile restrictions.

JVT-C163, Concept of average rate (T. Fautier, Philips), introduces the concept of an average bit rate for VBR transmission, which is reported to exist neither in AVC nor in MPEG-2. It discusses a method of defining the average bitrate in order to bound it for each level and thus make sure that software-based decoder implementations can be deployed for AVC. Further study is needed to determine the impact of this contribution on the design.

Transform and Quantization

Transform coding

JVT-C024, Notes on the JVT IDCT (L. Kerofsky, Sharp Labs), cross-verifies the equivalence of IDCT implementations and the dynamic range of calculations. It provides information to help implementers, and puts alternative examples in software.

JVT-C025, Modifications to the JVT IDCT (L. Kerofsky, Sharp Labs), proposes modifications to the normative IDCT and dequantization definitions. Although the modifications were described, the group preferred the stability of the current design. IPR para. 2.2 is checked.

JVT-C102, Threshold Adjustment for Adaptive Use of Double Scan (C-W. Kim, K-S. In, McubeWorks), verifies the threshold adjustment for adaptive use of double scan that was already proposed in JVT-B081. It also includes a slight modification of the syntax to cover the whole dynamic range of additional QP values specified in JWD2. The new algorithm is implemented in the JM 2.0 software; the results are presented in JVT-C102.xls.

A bit of gain was lost going from JVT-B081 (C-W. Kim, McubeWorks; S-W. Rhie, SK Telecom, CSR 13.03), which proposes to adaptively use double / simple scan, to JVT-C102.

There was no consensus to accept this contribution. The JVT group concluded that the gains were only for Intra frames/modes; the overall gain for entire sequences is very small (the reported 4.5% gain is for Intra frames only). Later, double-scan was removed from the draft after consideration of other contributions.

Quantization

JVT-C023, Lossless Coding and QP Range Selection (S. Sun, Sharp Labs), provides results of a lossless coding scheme. Consideration was deferred to the next meeting. The original document was submitted as information only.

JVT-C045, One bit added to indicate universal quant for the whole picture (Q. Gu, Q. Wang, W. Qi, S-L. Chen, VWeb), recommends adding one bit at the picture header to indicate that a universal quant will be used for the whole picture, and that the quant fields in the macroblock layer will not exist. It proposes adding parameter set or slice header syntax to support this (if signaled in the slice header, the quant would be constant per slice instead of constant per picture). It was noted that this would save one bit per coded macroblock with DCT coefficients.

The estimated savings were 1-2%. This contribution was considered in the bug fix/cleanup category. After discussion, and in consideration that the reported bit rate reductions are not for the common test conditions, that no verification was provided as to the quantity of bit rate savings, and that the estimated bit rate savings were small, there was no consensus to accept this proposal.

JVT-C053, Weighting Matrix for JVT codec (T. Suzuki, K. Sato, Y. Yagasaki, Sony), proposes the addition of step size adjustment in a frequency-dependent manner, providing functionality similar to that of the MPEG-2 video design. Improved coding efficiency was reported, particularly in perceptual terms. It was implemented in JM 1.7 software. IPR para. 2.2 is checked.

The complexity impact of table selection was discussed. Several people asserted that smart encoding can avoid the need for this. Further work was requested by the next meeting.

JVT-C136, Rounding, QP Origin, Dynamic Range, and |f| (G. Sullivan, Microsoft), advocates minor changes to rounding behavior and to the nominal origin of the quantization parameter for specification and transmission purposes, and proposes a refinement of the dynamic range spec for intra 16x16 mode luma DC transform coefficients. A remark is also provided on a potential improvement of R-D performance by making |f| larger in the example encoder.

Proposed item 1. Editorial change of QP. This has no effect on the bitstream. Note: there is some concern about the low end of the QP range (QP < -10 causing a problem for transform dynamic range). The proposed change was adopted.

Proposed item 2. QP transmission: parameter level -> delta to the middle, slice -> delta off of that, MB -> delta off of that. It was agreed to initialize at the slice level to the middle of the range. A check is needed of the current design at the MB level; use that, with mid-range prediction for the slice level.

Proposed item 3. Intra Plane Pred rounding: Adopt the first of the suggested alternative variants.

Proposed item 4. Clean up JVT-B038, Transform and Quantizer - part 1: Basics (A. Hallapuro, M. Karczewicz, Nokia; H. Malvar, Microsoft, CSR 13.03), eq. 9-5 to 9-7, with a simpler offset so that one can use a shift instead of a careful divider. There is a difference in the way rounding takes place when the value is exactly in the middle. There is a further shift down by 5 bits at a later stage, making this difference very minor. Adopted.

Proposed item 5. Similar to item 4. Adopted.

Proposed item 6. Luma DC 16x16 data has a dynamic range problem for QP < -6. Before one does the down-shift in the decoder, one might need 17 bits. Draft 8 doesn't limit the range, allowing 17 bits before the down-shift. This is not the common intra case: it is only an issue in intra 16x16.

The proposal suggests requiring the encoder to not send data that would require the decoder to use 17 bits of accuracy (before the down-shift). This way, everything would be 16 bits, even intermediate calculations. Conclusion: adopt subpoint 2 of item 6, not the rest of item 6.

Proposed item 7. B-picture-related divide-by-2 rounding alteration. Not adopted.

Proposed item 8. |f| is the offset added before quantizing. The proponent reports that |f| may currently be too small, and thinks that increasing it could give free coding gain. This idea needs testing. Since item 8 is just an informational suggestion, no action was taken.
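
For context, a sketch of the dead-zone quantizer that the |f| remark refers to: a larger offset f pushes more coefficient magnitudes up to the next quantization level. The variable names follow common JM-style usage but are assumptions here, not the draft text.

    #include <stdlib.h>   /* abs */

    int quantize_coeff(int coeff, int quant_scale, int qbits, int f)
    {
        int sign  = (coeff < 0) ? -1 : 1;
        int level = (abs(coeff) * quant_scale + f) >> qbits;   /* f sets the rounding
                                                                   point inside each step */
        return sign * level;
    }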

Intra Prediction

JVT-C033, Scalable Intra Prediction (M. Zhou, TI), proposes a complexity-scalable intra prediction scheme for H.26L video coding to satisfy the different needs of low- and high-end H.26L devices. The scheme is based on the design of a memory-scalable probability table that allows the scheme to operate in a low-complexity or a high-complexity mode. In the low-complexity mode, the scheme only needs to store a sub-table of the probability table and the entropy table used for encoding the selected prediction modes, while in the high-complexity mode the entire probability table and entropy table have to be stored. The high-complexity mode is fully backward compatible with the low-complexity mode. The experiments reported that the proposed scheme provides coding efficiency comparable with the current nine-mode-based and previous six-mode-based intra prediction schemes when operating in the high- and low-complexity modes, respectively.

Nine intra prediction modes are now in the draft, with a 10x10x9 table and an additional 9x9 table. The previous design had smaller tables (7x7x6 and 36). Theoretically there is a nibble in each entry; in practice it is usually a byte. If the encoder uses a subset of the nine modes, then performance was measured to be worse than with the former 6-mode method.

The proposal is to have both a lower- and a higher-complexity type of operation. The lower-complexity mode would be the pre-Geneva design except for the ordering of the tables. The proposal constrains the high-complexity mode such that the first six entries in each list selectable from the low-complexity values are from the low-complexity table. There was confusion over the correctness of the improvement of the 9-mode method versus the 6-mode method; this should be tested.

The training set used to design the tables was a superset of the test set. Software is available as part of the contribution, in version 1.9. There has not yet been verification.

Concerns about the memory consumption were raised, and the complexity vs. coding efficiency trade-off was questioned. Further work on reducing complexity is requested in an AHG on INTRA frame coding complexity reduction (Chair: M. Karczewicz, Nokia).

This is considered cleanup, and further work is encouraged. No change to JWD2 was adopted at this time.

JVT-C118, Improved Chroma Spatial Prediction for Intra Coding (D. Hoang, T. Horie, E. Viscito, GlobespanVirata), notes that subjective tests are reported to have shown that chroma blockiness can be present in JM 2.0 coded video material, particularly in the presence of highly saturated chroma.

JVT-C118 proposes an improved method of chroma prediction that is reported to increase subjective quality. One variant of the proposal requires a syntax change.

Best example: improved chroma spatial prediction for intra coding in the News sequence reduces the total bit rate by 3% while increasing chroma PSNR by 0.2 dB, keeping luma fidelity constant. It is never worse. Average across sequences: 1.5% reduction with 0.06 dB chroma fidelity gain. The group's impression is favorable, with interest in more test results. This contribution was considered as cleanup. It should be considered within the INTRA AHG.

JVT-C151, New Intra Prediction using Self-Frame MCP (S-L. Yu, C. Chrysafis, Divio), proposes a new approach to improve Intra macroblock coding by using motion compensation techniques. This approach can also be applied to image coding. It uses motion compensation for intra prediction, with a motion search in previously-decoded areas of the current picture (prior macroblocks in raster-scan order). The (0,0) motion vector indicates prediction from the position where the decoded MB sits. IPR para. 2.2 is checked.

The I picture is redefined to be a P picture with a single reference picture consisting of the previously decoded parts of the current picture in raster-scan order (including the MV prediction). The search range is +/-32.

Up to 14% BD-rate savings (or 0.8 dB BD-PSNR) were reported, with no cases of negative impact; the gains are largest for Foreman and also notable for Akiyo. CIF performs better than QCIF. Overall, the BD-rate savings are around 5%. This proposal takes advantage of motion compensation engine power that could otherwise go unused in I pictures.

The results appear promising and further work is highly encouraged. However, this work lacks maturity: it is missing subjective confirmation, there was no support from the group, and various interactions with the loop filter are unclear.

Adaptive Block Transform (ABT)

JVT-C012 is the AHG Report on Additional Transforms and Quantization (M. Wien, RWTH Aachen).

For high quality applications, JVT-C106, Adaptive Block size Transform (U. Benzler, M. Wollborn, Bosch), reports achieving improved subjective quality compared with the JM. In combination with other tools that improve coding efficiency for high quality applications, ABT is proposed for the Broadcast Profile of the JVT standard.

JVT-C107, 16 bit Adaptive Block size Transforms (M. Wien, A. Dahlhoff, RWTH Aachen), presents the revision of the ABT proposal according to the current working draft JWD2. The transform design is modified to fit into 16 bits. The new JM 4x4 transform and an 8x8 transform proposed by G. Bjøntegaard (Telenor) are used. The new design exhibits fewer ringing artifacts and superior subjective quality compared to the TML9 proposal. The CABAC encoding mechanism is redesigned and a VLC solution is proposed. The intra prediction is extended to the nine prediction modes adopted at the last meeting. The accompanying contribution, JVT-C108, provides the integration of the ABT proposal into the JM 2.0 software. In this paper, the coding elements needed for ABT coding are defined. The software is available (an update with bug fixes should improve quality).

This provides somewhat less PSNR gain than the previous Geneva proposal (due to the improved intra prediction performance from the last meeting, the new macroblock segmentation, and the improved CABAC). For the interlace common conditions set with CABAC: CCIR IBBP 1.6% to 9% BD-rate savings, CCIR III 2% to 6%, HHR IBBP 1.5% to 6%, HHR III -0.2% to 4% savings. (CCIR refers to ITU-R BT.601-5, Studio encoding parameters of digital television.) For the progressive common conditions with VLC (results could also be provided with CABAC): IBBP 0.4% to 5% BD-rate savings, III -0.5% to 3%.

JVT-C140, Alternate Coefficient Scanning Patterns for Interlaced ABT Coding (L. Wang, D. Baylon, Motorola), notes that Adaptive Block Transform (ABT) coding for JVT has been proposed as a method that can improve coding performance for interlaced and high-resolution video. In JVT-B053, "traditional" zigzag scans are used to convert two-dimensional coefficient data into one-dimensional data. This contribution looks at the use of alternate coefficient scanning patterns for interlaced material in order to improve ABT coding efficiency. Preliminary results with alternate scanning patterns have shown bitrate savings of up to about 7-8%.

The proposal to add ABT capability appears to be technically sound, and the results are sufficient as of JVT-C140. JVT-C107 and JVT-C140 were adopted into a non-baseline profile, subject to verification. This is also contingent on the results of the Profile discussion.

Source Formats

JVT-C116, Default Color Space for JVT Video (G. Sullivan, Microsoft), states that the current JVT draft does not define an operational color space. JVT-C116 provides a specification of one such color space and proposes it to be the default color space definition for JVT video. It was agreed to add equivalent support to MPEG-2 and MPEG-4 Video. The proposed color space is consistent with ITU-R BT.709-5, Parameter values for the HDTV standards for production and international program exchange.

A terminology change is proposed to use "luma" and "chroma" for non-linearly transformed color. There are some precision issues (3 digits of significance) in the various standards (such as 709). The proposal is to specify the default as ITU-R BT.709 (the 601 variant of 709). Others could be dealt with by signaling in the bitstream, possibly in the parameter set data. The MPEG-4 default spec is in accordance with this proposal except for the matrices (MPEG-4/2 defaults to the HD variant of 709).
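
To make the matrix distinction concrete, the sketch below shows the two luma weightings being contrasted (the BT.601-style versus the BT.709-style matrix coefficients), applied to normalized non-linear R'G'B' values; gamma handling and chroma scaling are omitted, and the function names are illustrative.

    double luma_bt601(double r, double g, double b) { return 0.299  * r + 0.587  * g + 0.114  * b; }
    double luma_bt709(double r, double g, double b) { return 0.2126 * r + 0.7152 * g + 0.0722 * b; }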

The JVT Group agreed to accept JVT-C116, as there is a need for a defined color space. The group decided to change what was proposed so as to provide the same flexibility as in MPEG-2 and MPEG-4, with no default.

JVT-C135, Pixel Aspect Ratio Support (G. Sullivan, Microsoft), proposes support of five special values of pixel aspect ratio (originally defined in Annex C of JWD2) and also custom arbitrary width and height numbers that shall be relatively prime. The five special values are 10:11, 12:11, 40:33, 16:11, and 1:1. This design is the same as found in H.263+ and MPEG-4 Visual. The purpose of having special enumerated values is to discourage strange small variations of similar numbers. JVT-C135 advocates restricting the width and height of the pixel aspect ratio to be relatively prime, and disallowing zero values for these parameters. It provides a list of preferred uses of pixel aspect ratios.
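
A minimal check along the lines advocated in JVT-C135: a custom pixel aspect ratio must have positive, relatively prime width and height. The function names are illustrative, not the draft syntax.

    /* Euclid's algorithm for the greatest common divisor. */
    static int gcd(int a, int b) { while (b != 0) { int t = a % b; a = b; b = t; } return a; }

    int pixel_aspect_ratio_is_valid(int width, int height)
    {
        return width > 0 && height > 0 && gcd(width, height) == 1;
    }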

The JVT Group agreed to accept the information inserted at the Sequence Layer along with the "cropping rectangle" information at the Sequence Layer, and agreed to add 3:4, 2:1, and 2:3 variations of the proposed aspect ratios. It was agreed to use an 8-bit table (instead of 4 bits). It was agreed to add equivalent support to H.263+/MPEG-4.

JVT-C137, Cropping, Generalized Pan-Scan, Source Formats (G. Sullivan, Microsoft), advocates the adoption of generalized pan-scan support, notes that 4:2:2 is a stated but unfulfilled requirement of the JVT video codec project, suggests an initial design for 4:2:2 support, and discusses some thoughts on 4:4:4, 10-bit and 12-bit support, and support of more than three video components.

Most of the comments in this contribution echo those made in D.50 at the November 2000 meeting of ITU-T SG16, a contribution that received a favorable reception at the time.

Proposal subject 1. The proponent supports arbitrary cropping (Annex C) per picture. The cropping rectangle is the basis for the output image.

Proposal subject 2. Let the encoder send an arbitrary number of pan-scan rectangles (each of which is a rectangle relative to the cropping rectangle), not necessarily tied to a display aspect ratio (as in MPEG-2), tagged with IDs that are assigned somewhere up in the system level.

This will not conflict with the system layer. A system-level scene composition will tell the decoder what to do with these, if any. If the system layer has similar information, it can override the information in the video layer. This information will be supplemental enhancement info, and as such can simply be dropped on the floor and the decoder will still be conformant.

On proposal subject 3 (4:2:2): the proposal suggests a simple way to incorporate 4:2:2, not aimed at optimality. Further investigation is recommended.

Proposal subject 4. Potential extensions (4:4:4, 4:4:4:4, 4:4:4:4:4:4, 10-bit, and 12-bit) are not in the requirements document, and were not considered at this time.

The Cropping and Generalized Pan and Scan information at the Picture Layer were adopted, while the source format modifications were not adopted. 4:2:2 is not mature at this stage, and needs further work.

Interlace Coding

JVT-C013 (P. Borgwardt, VideoTele.com; L. Wang, Motorola) is the AHG Report on Interlace.

JVT-C042, Core Experiment Results on Interlace Chroma Phase Shift (Y. Suzuki, Hitachi), provides a report on the results of a core experiment on interlace chroma phase shift. The proposed tool provides improvements of up to 10%, and the JM 1.7 with the proposed tool shows better rate-distortion performance than the original JM 1.7 for all the test cases. Verification of JVT-C050 (below) is indicated, with a final check on BD values.

In frame mode, for certain vertical MVs, a chroma phase shift is motion-compensated. JVT-C050, Result of Core Experiment on 4:2:0 Interlace Chroma Phase Shift (S. Sato, T. Suzuki, Y. Yagasaki, Sony), proposes to spatially interpolate the correct chroma position using the WD2 1/4-pel filter. The problem is somewhat suppressed by RD-opt in the MB mode decision. With RD-opt on, the proposal provides PSNR gains between 0-0.5 dB, corresponding to bitrate reductions of 0-10%. The complete test set was not used. IPR para. 2.2 is checked. The CE results proposed in JVT-B068 are presented. Further investigation is encouraged on this bug-fix.

JVT-C138 (see also under HRD/VBV Buffering Model, above), Temporal References, Picture Numbers, Pictures, Frames, and Fields (G. Sullivan, Microsoft), contains a design for the representation of time and the management of multi-picture buffer contents for the JVT codec, including discussion of the use of interlaced frames and fields as well as progressive-scan pictures.

Temporal references should be based on a general "timescale" parameter as found in Annex C of JWD2r7 and JVT-B109, or should be based on a 27 MHz clock (not a 90 kHz clock). It was asked whether anybody anticipates a problem with using that clock. It was recommended for adoption, and later modified to support a general clock, not just 27 MHz. Dangling field support was adopted.

Each field in a frame may (or shall) have its own temporal reference. This was recommended for adoption. Some details about how it works for a frame need to be worked out. There is a problem with direct mode prediction; a solution will be worked out and presented later. Work will be done on picture numbering for interlaced coding, and the topic will be revisited. SEI to signal SMPTE-compatible time code representations was recommended for adoption.

JVT-C139, Macroblock adaptive frame/field coding for interlace sequences (L. Wang, R. Gandhi, K. Panusopone, Y. Yu and A. Luthra, Motorola), presents the results of MB-level adaptive frame/field coding for interlaced video material, and performance comparisons with picture-level adaptive frame/field coding. The simulation results show that MB-level adaptive coding can provide additional gain over picture-level adaptive coding for sequences that favor frame coding. It also suggests that MB-level adaptive frame/field coding can be integrated into picture-level adaptive coding. MB/picture-level adaptive coding guarantees performance at least as good as both MB-level and picture-level adaptive coding. BDBR savings of up to 14% are reported. Further work is recommended, with possible inclusion in July.

Performance, Implementation, and Complexity Analysis

A joint session was held with MPEG on complexity issues.

M8347, portability of C code, ANSI C compliance (MPEG), provides information that should be conveyed to the JVT software coordinator.

M8378, Complexity analysis of "simple", "medium", and "high" complexity settings (MPEG), covers analysis of memory access, speed, etc. It notes that the memory footprint is big. JVT noted the desire to have prototype control files; the suggestion was to make them part of the software package.

MPEG introduced their AHG on complexity analysis: [email protected], and invited subscriptions: [email protected].

JVT-C148, Demonstration of a Computation-Optimized JVT Codec (A. Joch, F. Kossentini, UB Video), presents results from a computation-optimized JVT codec that was demonstrated at the meeting. For CIF content on an 800 MHz Pentium III PC, the codec achieves speeds of 49, 137 and 36 frames per second for encoding, decoding, and simultaneous encode and decode, respectively. Subjective testing has shown that this reduced-complexity JVT encoder achieves similar subjective quality at a 25% reduced bit rate compared to an RD-optimized H.263+ encoder.

JVT-C016 (M. Horowitz, Polycom) is the AHG Report on Complexity. It contains a review of complexity analysis and reduction efforts. Two trends are emerging in entropy coding: the lower-complexity VLC and adaptive VLC methods have begun to converge on CABAC coding performance, and the complexity of these methods continues to decrease, while the complexity of CABAC-related algorithms also continues to decrease.

The current decoder is reported to be quite demanding in terms of the worst-case bus bandwidth required to be compliant in all cases. JVT-C115, Reduced Decoder Peak Bus Bandwidth (L. Winger, VideoLocus), suggests a solution to reduce the system cost of a compliant decoder by reducing the worst-case bus bandwidth. The proposed solution was reported to have virtually no impact on the rate-distortion performance of the JM 1.9 codec for the common test conditions, while reducing the worst-case total number of block accesses by half.

A verification experiment is needed to verify that the loss in quality is not visible. It is hoped to obtain conclusions at the July meeting.

JVT-C152, JVT/H.26L Decoder Complexity Analysis (M. Horowitz, Polycom; A. Joch, UB Video; F. Kossentini, UB Video; A. Hallapuro, Nokia), studies the complexity of the JVT/H.26L decoder, which is an important part of the overall cost effectiveness of a JVT/H.26L-based video system. It uses a table-driven method to analyze complexity on different platforms and compares them to the H.263 baseline. It evaluates the dependence of decoder complexity on the encoding design. It also notes memory requirements. Results indicate that the JVT/H.26L decoder may be 2-3 times higher in terms of computational complexity than an H.263 baseline decoder. It was presented for information only.

JVT-C159, MPEG Memory Complexity Analysis AVC (N4570n) (MPEG), was presented for information.

Carriage and File Format

The JVT specification will have a common representation of all parameter set data, slice data, and SEIs that are carried in the same way in bitstreams and in packets. Each atomic chunk of such data is referred to as an NAL unit. This includes ensuring no emulation of start code prefixes of the bitstream format (i.e., in the current design, this is the common definition of EBSPs).
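
As an illustration of what emulation prevention involves, the sketch below follows the scheme later standardized for H.264 EBSPs: an emulation-prevention byte is inserted whenever two consecutive zero bytes would otherwise be followed by a byte that completes (or disturbs) a start-code prefix. The exact mechanism under discussion at this meeting may differ, and the output buffer is assumed large enough (worst case about 3/2 of the input length).

    #include <stddef.h>

    size_t payload_to_ebsp(const unsigned char *in, size_t n, unsigned char *out)
    {
        size_t o = 0;
        int zeros = 0;
        for (size_t i = 0; i < n; i++) {
            if (zeros >= 2 && in[i] <= 0x03) {   /* 0x000000..0x000003 would emulate
                                                    or corrupt a start-code prefix */
                out[o++] = 0x03;                 /* emulation-prevention byte */
                zeros = 0;
            }
            out[o++] = in[i];
            zeros = (in[i] == 0x00) ? zeros + 1 : 0;
        }
        return o;   /* length of the emulation-protected payload */
    }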

There will be a bitstream format in the joint video standard (e.g., in an annex). This format will:

• Be capable of conveying temporal reference (relative timing data) in the video bitstream for use in the HRD.

• Be capable of conveying all data in band.
• Include sufficient information to identify conformance of the video bitstream; this requires sufficient relative timing information.

There will be a packet-oriented interface in the joint video specification.

Conversion between the two formats should be simple. In other words, one part of the specification will describe what data needs to be carried, the bitstream definition part will describe how to put all that into a bitstream, and the packet-oriented definition part will describe how to put this data into packets.

If some data is necessary to decode the Y, Cb, Cr sample arrays, it will go in the parameter set or the NALU; if not, then as a general rule it will go in SEI. SEI use is generally intended for things that can get lost. Parameter sets are for things that cannot get lost. Parameter sets are persistent for the sequence unless changed.

The following was agreed:

• The bitstream picture start code suffix will include the picture type and a store/no-store indicator.
• An I picture contains all I slices; a P picture contains either I slices or P slices; a B picture contains I, P, or B slices.

SEI is carried separately (e.g., with a separate start code or in a separate packet).

The NALU is a type code including an error indication (8 bits), with the definition of EBSPs to follow.

Compound packets, if done, will be external to the joint video specification.

Payloads:

• Partition(s) of each type of slice (store/no-store indicator in the slice header)
• Parameter set update(s)
• SEIs

All currently defined parameters nominally reside at the random access point level. A picture type indicator is also at the picture level. If there is a lot of picture-level data, JVT will do picture-level parameter sets; if not, it will be repeated at the slice level. RPSL was designed with the intent of having both picture-level RPSL and slice-level RPSL.

Some concern was expressed as to gateway operation and the support of packet networks that do not have a per-packet timestamp (e.g., H.324/M). These concerns are expected to be the subject of further study.

JVT-C122, JVT Transport over RTP (J. van der Meer, Philips), addresses requirements for the RTP NAL, in particular in relation to the MPEG-2 NAL. An example provides a global description of how these requirements could be implemented.

JVT-C143, File Format for JVT Video based on MP4 (T. Walker, A. Tabatabai, M. Z. Visharam, Sony Electronics; D. Singer, Apple Computer), proposes a file format for JVT video based on the ISO/MPEG-4 MP4 multimedia file format. MP4 is a generic file format for storing streams of timed media data and is independent of any particular compression scheme. As such, it provides most of the facilities needed for the storage of the JVT file format, e.g., data storage, random access, metadata storage, etc. This report was also submitted as input to the file format work of MPEG.

JVT-C147, Towards storing JVT in an MP4 File (D. Singer, Apple), notes that the MP4 file format has been defined for MPEG-4 content, and as it is an extensible file format, it is natural to ask how JVT sequences might be stored in it. Other contributions explore the possibility of adding structures to the file format which, though generic, are intended to solve the specific issues that arise in JVT. This document attempts to derive a different approach: to what extent can JVT sequences be stored in MP4 without introducing new structural concepts at the file format level? This report was also submitted as input to the file format work of MPEG.

JVT-C158, MPEG Proposed Guidelines for the Carriage of AVC Content within MPEG Framework (N4714p) (MPEG), describes the proposed guidelines for the carriage of AVC content. In this document, "carriage" includes delivery of content as well as storage in a file.

NAL, High-Level Syntax, and Robustness

JVT-C055, NAL Syntax and Carriage of JVT Stream over MPEG-2 Systems (T. Suzuki, N. Oishi, Y. Yagasaki, Sony Corp.), notes that a method for carrying JVT video over MPEG-2 Systems was proposed in Geneva (see CSR 13.03) in JVT-B049, JVT-B063, JVT-B070, and JVT-B088. It was agreed that the start code is needed for bit-serial communication environments, like MPEG-2 Systems, and this was adopted in the JM. This document proposes detailed definitions of the NAL for MPEG-2 Systems, based on the study in JVT-B070.

The Chair’s remarks on the presentation:

• MPEG-2 TEL proposed. Sequence header, random access header, picture header, and slice header proposed.
• MPEG-2 definition of an access unit is unchanged: an access unit is a picture.
• Mapping to PES.

• Modifications to MPEG-2 Systems: new stream id and stream type.
• New MPEG-2 Systems descriptors for a parameter set descriptor and an SEI descriptor.

Comments from the JVT group:

• A new amendment is needed for carriage of MPEG-4 Part 10 in MPEG-2 Systems.
• The proposal makes low-delay gateway design very hard, because the sequence, random access, picture, and slice headers have to be converted into parameter sets.
• The proposal does not follow the spirit of the current JVT high-level syntax based on parameter sets.

JVT-C063, NAL Compatible to QoS Controlled Network (D-Y. Suh, Kyunghee University), is an informational contribution. It discusses the use of the JVT codec in networks where QoS can be controlled. The group did not recognize any immediate need to respond to the suggested problems in the JVT coding standard.

JVT-C064, Byte Stream Format with Byte Alignment Recovery (G. Sullivan, Microsoft), presents a modification of the prior JVT-B063 (CSR 13.03) method of formatting JVT data into a byte stream with use of start codes, to address concerns regarding byte alignment loss. The modification preserves the advantages of the prior method (low data expansion overhead for start code emulation prevention, low start code overhead, byte-oriented processing for encoding and decoding, and avoidance of conflict with MPEG-2 Systems start codes) and adds a simple byte alignment recovery capability using only byte-oriented processing. The proposed method is actually simpler than the one from JVT-B063 that is in the current design. This alteration of the byte stream format was adopted with N=1. See also JVT-C095, below.

JVT-C070, Signaling of Timestamps (M. Hannuksela, Nokia), proposes signaling of timestamp clock frequency, temporal references, decoding timestamps, and presentation timestamps.

• Question: Is there a limitation on timestamp SEI message frequency? It is up to a specific TEL to decide.
• The use of timestamp SEI messages is up to a specific TEL.
• The group supports sections 3 and 4 of the proposal (signaling of temporal references).
• Presentation timestamps may not have to be carried in the coded video stream, because all targeted systems have another mechanism to transmit them (MPEG-2 Systems, RTP) or they are not needed (H.320). However, compound packets in RTP might be an exception.
• Decoding timestamps are needed at least in RTP transport when gatewaying to MPEG-2 Systems.
• Agreed: SEI packets are used to convey timestamps if needed in the system.
• The decision on carriage of decoding and presentation timestamps is deferred, to be dealt with in the HRD.

JVT-C078, Coding of Parameter Sets (M. Hannuksela, Nokia), notes that in the JVT NAL design, infrequently changing information is organized into the parameter set structure introduced in VCEG-N52 (CSR 12.37, Sept. 2001). There are typically few distinct parameter sets in a video communication session. The parameter sets are preferably transmitted reliably and out-of-band at session set-up time. However, in some systems, mainly broadcast ones, reliable out-of-band transmission of parameter sets is not feasible; rather, parameter sets are conveyed in-band in Parameter Update NAL Packets (PUPs). Parameter sets are enumerated, and the active parameter set is indicated in the slice header.

JVT-C078 proposes that the JVT coding standard should allow pre-definition of a large number of parameter sets and their IDs in order to avoid an excessive number of PUPs (parameter set update NAL packets) in broadcast applications. While many of these pre-defined parameter sets are application-specific and fall outside the scope of the JVT coding standard, JVT could include a definition of default parameter sets for each profile and level. JVT-C078 uses four parameter set structures: independent GOP, picture, slice, and presentation parameter sets.

The group decided to have only the independent GOP parameter set. Some of the proposedparameters were considered potentially unnecessary. Parameters proposed for the presentationparameter set were decided to be carried with the Supplemental Enhancement Information. Theparameter set identifier signaling was also agreed (section 3.3 of the contribution).

JVT-C080, Signaling of Enhanced GOP Information (M. Hannuksela, Nokia), notes that the enhanced GOP concept organizes pictures in temporal enhancement layers and sub-sequences. The signaling mechanism proposed in the contribution can be used for improved picture identification and computational scalability in decoders, for example. As this was not a requirement, it was not high priority at this meeting.

JVT-C087, Common NAL Packet Structure (Y. Matsushita), proposes a conceptual syntax of the network layer independent NAL packet to be used for all transport layers including file format such as MPEG-2 TS/PS, RTP and MP4. It proposes a flexible NAL packet syntax that can contain anything from a partial slice to multiple slices. IPR para. 2.2 is checked.

Comments from the group:

• The proposal would add 24 bits overhead to a NALP.
• The property of having independently decodable NALPs would be lost (if a NALP carries a fragment of a slice).
• The proposal makes the operation of gateways harder.

JVT-C095, H.320 NAL for JVT (D. Lindbergh, Polycom), proposes start codes and start code emulations based on JVT-C064 (above). The group supported the proposal. Concerns were expressed on the uneven probability of having long zero-runs in the exp-Golomb code and the proportion of start code emulation bytes due to this fact. The proponent promised to bring hard data on the proportion of start code emulation bytes by the next meeting.

Further investigation is encouraged as to whether it is possible to have a clear way to distinguish within a bitstream format between the H.26Xs. Adopted with N=1 pending results coming in at the next meeting.

JVT-C131, NAL Packet Segmentation (T. Stockhammer, TUM; S. Wenger, Teles AG), proposes a new NAL packet type and an additional NAL header flag to allow segmentation of long NAL packets into shorter packets. Syntax and semantics are included in this contribution. It was withdrawn.

JVT-C144, Generic Adaptation Layer for JVT Video (T. Walker, A. Tabatabai, M. Z. Visharam, Sony), proposes division of what used to be called the "NAL" into the network-independent "Generic Adaptation Layer" (GAL) and the network-dependent NAL. IPR para. 2.2 is checked. These issues were covered in Carriage and File Format, above.

JVT-C018 (G. Sullivan, Microsoft; S. Chen, Broadcom) is the AHG Report on Film Mode / Timing. Note: JVT-C046, covered under HRD/VBV Buffering Model, above, also contains film mode.

JVT-C164 (M. Hannuksela, Nokia) provides an output document description of NAL. The Picture and Slice header parts were adopted.

JVT-C030, Additional MMCO command for supporting more flexible bitstream switching (S. Kadono, S. Kondo, M. Schlockermann, Matsushita), notes that the prediction structure of S-pictures needs to be strictly limited to guarantee the uniqueness after bit stream switching. It introduces a new MMCO command to allow a more flexible picture prediction structure. When an S picture is received, the multiframe buffer must be reset. S frames must be treated like I pictures. IPR para. 2.2 is checked.

JVT-C054, Study of Random Access for JVT (T. Suzuki, Y. Yagasaki, Sony), reports further study on random access for the JVT codec. It studies splicing and bitstream editing for the JVT codec. It advocates that the JVT codec should support splicing and bitstream editing at the video layer as is supported by MPEG-2. The syntax and semantics extensions are proposed to support splicing. It defines NALP (NAL Packet) types. IPR para. 2.2 is checked. JVT may consider adopting NALP types as part of JVT-C074 (below) as simplified and revised in conjunction with FMO. The discontinuity flag conflicts with JVT-C117.

JVT-C072 to JVT-C075 were treated as a group:

JVT-C072, Isolated Regions: Motivation, Problems, and Solutions (Y-K. Wang, TUT; M. Hannuksela, Nokia), defines an isolated region as a solid area of macroblocks, defining the shape of the border across which loop filtering is turned off and spatial in-picture prediction is limited. The shape of isolated regions may evolve during a number of consecutive coded pictures. Temporal prediction outside the isolated regions in the reference frames may be disallowed. Over time the isolated region can grow or shrink. The loop filter is turned off at the boundary of the isolated region. Various limitations are applied to motion vectors. JVT-C072 proposes syntax and semantics for a concept of isolated regions, a technique that provides a form of gradual random access, error resiliency/recovery, picture-in-picture functionality, and coding of masked video scene transitions. Software availability is indicated.

JVT-C073 (Y-K. Wang, TUT; M. Hannuksela, Nokia) proposes an error-robust video coding technique using isolated regions. While improving the error resiliency performance of the JVT codec, it also provides a possibility for perfect gradual random access. Experimental results report that methods involving isolated regions provide significant gains in terms of decoder PSNR (mobile test conditions). Subjective results were requested.

JVT-C074, Gradual Decoder Refresh Using Isolated Regions (Y-K. Wang, TUT; M. Hannuksela, Nokia), proposes a gradual decoder refresh technique using isolated regions. It reports that it is advantageous to use isolated regions instead of intra pictures to provide random access points in error-prone environments. The corresponding NALP type must be transmitted. The main reason this method works is the switched-off loop filter. There was no consensus to adopt this at this time; however, the results were encouraging and can be revisited at the next meeting.
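A minimal sketch of the gradual decoder refresh idea, assuming a simple left-to-right growth pattern for the isolated region (the actual evolution patterns and their signaling are what JVT-C072/C074 define): refreshed macroblock columns are added picture by picture, prediction is confined to the refreshed region, and once the region covers the whole frame the decoder state is fully recovered without a costly intra picture.

    MB_COLS, MB_ROWS = 11, 9          # QCIF-sized picture measured in macroblocks
    COLS_PER_PICTURE = 3              # assumed growth rate of the isolated region

    def refreshed_map(picture_index):
        """Per-macroblock flag: True if the MB lies inside the isolated (already
        refreshed) region after `picture_index` pictures.  Prediction is confined
        to this region and loop filtering is off at its boundary (not modeled)."""
        boundary = min(MB_COLS, (picture_index + 1) * COLS_PER_PICTURE)
        return [[col < boundary for col in range(MB_COLS)] for _ in range(MB_ROWS)]

    for k in range(4):
        covered = sum(sum(row) for row in refreshed_map(k))
        print(f"picture {k}: {covered}/{MB_COLS * MB_ROWS} macroblocks refreshed")
    # Once every macroblock is covered, the decoder state matches an intra random
    # access point, but without the bit-rate spike of a full intra picture.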

JVT-C075 (Y-K. Wang, TUT; M. Hannuksela, Nokia) proposes to code masked scene transitions using isolated regions. A large number of different mask patterns can reportedly be realized conveniently. An isolated region defines a boundary around which the loop filter is turned off. Some ways of supporting this functionality:

1. It combines well with the FMO concept.
2. It can also be done by turning off the loop filter at slice boundaries as in JVT-C117.
3. This set of contributions proposes to define a set of evolution patterns in the parameter set and to put the position within the pattern into the slice header.

This functionality supports JVT-B063 gradual decoder refresh design.

Page 35: COMMUNICATIONS STANDARDS REVIEWCOMMUNICATIONS STANDARDS REVIEW Volume 13, Number 25 July 24, 2002 REPORT OF JOINT VIDEO TEAM (JVT) MEETING #3, ISO/IEC JTC1/SC29/WG11 (MPEG) AND ITU-T

COMMUNICATIONS STANDARDS REVIEW

July 24, 2002 Vol. 13.25 Copyright © CSR 2002 35

This functionality has comparable error resilience to the RDO method without its complexity, and better performance if the two techniques are combined.

More discussion is needed. The group considers this to be one possible solution. Further study is recommended.

JVT-C079, Improved Coding of Slice Headers (M. Hannuksela, J. Lainema, Nokia), achieves bit-rate savings in slice headers by fixed-length coding of parameters, granular slice position and size, and differential coding of QP relative to a default value given in the parameter set. There was no consensus to adopt this proposal. It was agreed to treat the QP issue with JVT-C117.

JVT-C081, Sync Pictures (M. Hannuksela, Nokia), refers to coded pictures representing the same picture contents as sync pictures. It proposes slice header syntax and semantics to signal the presence of sync pictures. Members were encouraged to consider this for further study.

JVT-C083, Signaling of "Clean" Random Access Positions (M. Hannuksela, Nokia), notes that instantaneous decoder refresh refers to "clean" random access, where no data prior to the random access position is referred to in the decoding process. JVT-B041 analyzed the requirements for instantaneous decoder refresh, and appropriate syntax and semantics were published in JVT-B109 and reviewed during the Geneva meeting. There was no disagreement on the contents of JVT-B109, but as the review was done in such a late phase of the meeting, the final decision on adoption was deferred to the Fairfax meeting. JVT-C083 recaps the main points of JVT-B041 and JVT-B109 regarding instantaneous decoder refresh and proposes identification of random access positions using a specific NAL packet type and a specific picture number update rule in the random access positions. NALP type, multipicture reset, and picture number = 0 were proposed and adopted.

JVT-C129, Reduced slice headers and bit error resilience (T. Halbach, NTNU), proposes "Subslices" for efficient treatment of residual errors in the code stream after transmission. Employment of these structures and their small-size headers reportedly leads to both bit rate savings and a significant increase of robustness in the decoder. Along with subslices, data partitioning, resynchronization markers, a variable-length non-resynchronizing code, and a simple error concealment technique are utilized. Subslices, DP, and RMs are aimed at the normative section; the variable-length code and EC at the non-normative section.

The group noted that some aspects of this proposal are out of date relative to the current JVT design. They note that different systems will not carry the data "as-is" in the form described in the document. They believe the slice structure design will address the concerns in this contribution. The error conditions and the document's assumptions about how the current design works may not be relevant.

JVT-C141, ERPS Use for Commercials in Broadcast/Streaming (S. Wenger, Teles), notes that during the discussions between the authors of the JVT RTP packetization I-D, the authors came across some properties of the Reference Picture Selection scheme that are probably not on the radar screen of most JVT experts. It seems to be possible to send the Intra information typically used for scene changes, storing it in advance in a reference picture other than the default picture, and to use it at a later point in time as a reference for a commercial break without initial transmission delay. It would also make sense to display it directly without referencing it, and this is the reason for proposing a Display Command in the form of a Supplementary Enhancement Message. This paper was presented for information.

JVT-C142, Proposal for JVT SEI Messages (T. Walker, A. Tabatabai, Sony), proposes a revised syntax for SEI messages and proposes several ways to support the embedding of SEI messages into the JVT video stream. In particular, it proposes using SEI messages to support inclusion of stream metadata, such as MPEG-7 metadata, within JVT streams as discussed in the JVT requirements. It also proposes using an XML-based message syntax with both textual and binary versions based on MPEG-7 Systems. IPR para. 2.2 is checked. Comments noted that the SEI capability already in the design can carry all possible data including metadata. Metadata support should therefore be feasible.

JVT-C017 (M. Horowitz, Polycom) is the AHG Report on Robustness. The Robustness AHG was created in Geneva in January 2002 with the following charter: To consider aspects of the JVT design in regard to robustness to lost data, including particular consideration of scattered slices error concealment and error robust macroblock mode and reference frame selection, and to define and conduct the core experiments in this area.

JVT-C032r1, Experimental Result of Fragile Watermark Based Error Detection Scheme (P. Zhou, Z. Chen, Y. He, Tsinghua University), proposes a set of error detection schemes using fragile watermarks for JM2.0 based video communication. This proposal is aimed at using an embedded watermark in error resilience and concealment. The watermark schemes proposed do not embed extra bits into the video, but constrain a relation between the Q-DCT coefficients. That results in keeping the bit rate and PSNR almost unchanged; the error detection rate is increased compared to the syntax-based error detection method.
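The contribution's exact constraint is not reproduced here, but the general fragile-watermark idea can be sketched as follows: the encoder enforces an agreed relation among the quantized coefficients of a block, for example a fixed parity of their sum (an assumed example constraint, not the one in JVT-C032r1), by nudging one coefficient when needed; a decoder that finds the relation violated flags the block as corrupted without any extra bits having been sent.

    def embed_parity(coeffs):
        """Encoder side: force the sum of the quantized coefficients to be even
        (an assumed example relation) by nudging the last nonzero coefficient."""
        coeffs = list(coeffs)
        if sum(coeffs) % 2 != 0:
            for i in range(len(coeffs) - 1, -1, -1):
                if coeffs[i] != 0:
                    coeffs[i] += 1 if coeffs[i] > 0 else -1
                    break
        return coeffs

    def violates_constraint(coeffs):
        """Decoder side: a violated relation flags the block as corrupted."""
        return sum(coeffs) % 2 != 0

    block = embed_parity([7, -3, 2, 0, 1, 0, 0, 0])
    assert not violates_constraint(block)
    corrupted = list(block); corrupted[0] += 1      # simulate a transmission error
    assert violates_constraint(corrupted)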

Comments: In mobile environments, very low bit rates are used. Therefore header and MV data dominate and not many coefficients are used. No detection on those syntax elements could result in possible degradations in performance. Testing under the mobile-environment test conditions is needed. A comparison to FEC is missing. Network interaction and the particular application should be considered.

JVT-C048, Rectangular Slices to Tradeoff Error Resiliency and Coding Efficiency (P. Borgwardt, VideoTele.com), proposes rectangular slices as more efficient for prediction between macroblocks than the current slice rows. They would allow an encoder to use smaller slices for error resiliency while still having good coding efficiency. No action was taken. See JVT-C089, below. Flexible Macroblock Ordering is more general.

JVT-C071, Core-Experiment Results of Scattered Slices (Y-K. Wang, TUT; M. Hannuksela, Nokia), provides the core experiment results of scattered slices, proposed in JVT-B027, as a response to the core experiment description given in JVT-B111r1. Additional results of the experiments may be found in JVT-C071 and JVT-C090, below.

JVT-C046, covered above under HRD/VBV Buffering Model, also includes a slice type part.

JVT-C082, Spare Reference Pictures (M. Hannuksela, Nokia), proposes signaling of entire and partial spare reference pictures with the Supplemental Enhancement Information mechanism. With the help of the proposed signaling, receivers may avoid unnecessary picture freezing, feedback, and error concealment, which are normally done as a response to missing picture data. This contribution was not considered a high priority at this time, and will be considered within the AHG.

Contributions JVT-C089 through JVT-C093 were treated as a group.

JVT-C089, FMO: Flexible Macroblock Ordering (S. Wenger, Teles AG; M. Horowitz, Polycom), proposes the introduction of a new general-purpose tool into the JVT baseline, which allows the coding of macroblocks in an order other than the typical raster scan order. The key application of the proponents for the tool is the implementation of error resilience mechanisms such as Scattered Slices and Slice Interleaving, as documented in JVT-C090 (Scattered Slices: Simulation Results, below) and JVT-C091 (Slice Interleaving: Simulation Results, below), but due to its flexibility, other applications are certainly possible.

JVT-C090, Scattered Slices: Simulation Results (S. Wenger, Teles AG; M. Horowitz, Polycom), reports findings when applying the error resilience tool "Scattered Slices," which is based on the Flexible Macroblock Ordering tool as proposed in JVT-C089 (above), to the test conditions defined in the current "Common Conditions for Wireline IP/UDP/RTP Conversational Environment" (see JVT-C146 for a discussion of these common conditions). Despite the additional overhead of FMO through the broken in-picture prediction and the additional RTP packets necessary for transmission, small gains in (modified) PSNR can be reported for most sequences when using a low complexity pseudo-random intra placement scheme. When using the loss-aware rate-distortion optimization, on average the tool provides small gains as well. Gains due to better error concealment might be offset somewhat by the loss of available bit rate due to overhead.

JVT-C091, Slice Interleaving: Simulation Results (S. Wenger, Teles AG; M. Horowitz, Polycom), discusses primarily the different properties of implementing Slice Interleaving on the VCL and the NAL layer. The actual simulation results for a VCL-based implementation were rolled into JVT-C090 for practical reasons. Theoretical considerations presented here make clear that VCL/FMO-based Slice Interleaving has advantages in terms of overhead. However, NAL-based slice interleaving is still very useful in heterogeneous networking scenarios, and it "comes free" with the VCL/NAL design of JVT.

JVT-C092 (S. Wenger, Teles AG; M. Horowitz, Polycom) provides JM1.7 software with implemented FMO.

JVT-C093, FMO changes proposed to Working Draft content (S. Wenger, Teles AG; M. Horowitz, Polycom), contains a version of JVT-B118r8 with change bars to reflect necessary modifications of the WD to support FMO, as defined in JVT-C089.

This concept was previously known as "scattered slices," that is, sending a map of indices that assigns each macroblock to a slice group ID. A slice contains a number of MBs in raster-scan order within one slice group. The map is sent at the parameter set level. Low impact on complexity was reported. It was cross-verified (by Nokia) in JM1.7 code. This is proposed primarily as an error resilience tool. Common conditions tests for IP/RTP (using loss patterns from the internet) reported consistent improvements (more for some sequences), although it is not entirely clear how much gain. The visual gain was shown previously in Geneva. It was asserted that further improvement is possible with better encoder optimization.
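A minimal sketch of the map idea, using a two-slice-group checkerboard (the "scattered slices" pattern) as an assumed example; the adopted limit is eight slice groups and the map itself travels at the parameter set level:

    # Checkerboard FMO map for a QCIF picture: each macroblock (in raster-scan
    # order) is assigned a slice group ID; a slice then contains consecutive
    # macroblocks of one group only.  The two-group checkerboard is an example.
    MB_COLS, MB_ROWS = 11, 9

    def checkerboard_map():
        return [(row + col) % 2 for row in range(MB_ROWS) for col in range(MB_COLS)]

    def macroblocks_of_group(mb_to_group, group_id):
        """Raster-scan addresses belonging to one slice group, i.e. the coding
        order of macroblocks within that group."""
        return [addr for addr, g in enumerate(mb_to_group) if g == group_id]

    mb_map = checkerboard_map()
    for g in (0, 1):
        mbs = macroblocks_of_group(mb_map, g)
        print(f"slice group {g}: {len(mbs)} macroblocks, first addresses {mbs[:6]}")
    # If the packet carrying group 1 is lost, every lost macroblock still has all
    # four neighbours available in group 0, which is what makes concealment work.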

The FMO was adopted into the baseline and Main profile with a limit of a maximum of eight slice groups and subject to further complexity analysis and verification experiments to be completed by the July meeting.

JVT-C096, Sub-picture coding with the reduced boundary effect (W-S. Kim, Samsung; Y-K. Wang, TUT; M. Hannuksela, Nokia), proposes an error robust sub-picture coding method with reduced boundary effect. A frame is divided into the foreground and background sub-pictures, and the foreground is coded in a higher picture quality and transmitted with a better error protection than the background. This proposal relates to the foreground/background region proposal from Geneva. Most of it can be done with FMO syntax. It was agreed to incorporate this use within FMO as a non-normative issue.

JVT-C117, Seven Steps Toward a More Robust Codec Design (G. Sullivan, Microsoft), proposes seven ways of making the current JVT video codec design more robust. It proposes to:

1) Avoid worst-case data expansion by adding an intra PCM mode. It was agreed to discuss this in July.

2) Reduce needless design variation by limiting the number of ways of partitioning the VCL data. Because only two ways are in the current design (1 packet per slice or 3 packets per slice), no action is needed.

3) Achieve basic decodability of intra regions by allowing intra prediction from inter-coded regions to be turned off within slices. No action is needed because this is already in the draft.

4) Achieve greater error resilience and random access recovery by allowing deblocking filtering to be turned off at intra/inter boundaries within slices. Further study will be conducted in July.

5) Achieve greater error resilience and random access recovery by allowing deblocking filtering to be turned off at slice boundaries. Further study will be conducted in July.

6) Adopt "gradual decoder refresh" functionality as previously proposed. Further study will be conducted in July.

7) Avoid a needless coupling of codec technology to assumed input characteristics by letting encoder choice rather than the step size selection determine whether double-scan of coefficients is used. This proposal was voided.

The double-scan aspect was discussed in the transform and quantization section and recommended for adoption there. Double-scan was then removed from the design, obviating the need for this proposal to be considered.

JVT-C125, Flexible data partitioning for improved robustness performance (Y. Chen, J-C. Ye, Philips), describes modifications to the VCL data partitioning syntax and the corresponding NAL packetization process to enable flexible data partitioning, which is advocated as essential to supporting video communication applications over diverse packet-lossy networks. The proposed changes are reported to be backward compatible with the current specifications. It also proposes a partitioning framework that eliminates decoding complexity overhead and reduces bit rate overhead. Two partitions are proposed. IPR para. 2.2 is checked.

This paper was well received by the group. More results with common conditions are needed. Refer to JVT-C117 for the restriction on DP flexibility.

JVT-C132, Independent Data Partitions A and B (T. Stockhammer, TUM; S. Wenger, Teles AG), proposes to make partitions A and B independently decodable by duplicating the slice header and the MB type information in partition B. Mandatory decoder actions are specified, which allow the encoder to trade off simplicity against compression efficiency. No action was taken on this proposal.

Switchable P / Still Image Frames

JVT-C114, The improved JVT-B097 SP coding scheme (X. Sun, Harbin Institute; F. Wu, S. Li, Microsoft; R. Kurceren, Nokia), proposes an altered SP coding scheme combining the JVT SP scheme and the JVT-B097 scheme. The reconstructed reference can be improved by 3.64% on average, and the display image can be improved by 10.34% on average. Experimental results and software were provided. This proposal was accepted.

JVT-C126, Core experiment results on SP (S. Sun, F. Wu, Microsoft; R. Kurceren, Nokia), reports the statistical results of the core experiment JVT-B112. An additional Excel file contains all detailed results.

Entropy Coding

VLC Approaches

JVT-C008 (G. Bjøntegaard, RealNetworks) is the AHG Report on VLC. At present the TML has defined two ways of entropy coding:

• A simple VLC based coding (referred to as UVLC)
• A more powerful and more complex method with context adaptivity and use of arithmetic coding (referred to as CABAC)

Most of the bit consumption comes from transform coefficient coding. With this in mind, there were two contributions at the last meeting addressing more efficient VLC coding of transform coefficients. The contributions were from RealNetworks (JVT-B045) and Nokia (JVT-B072); they both reported high gains over UVLC, especially for low QP values. The VLC AHG was established based on this. As a result of the AHG work, two proposals (Nokia and RealNetworks) on adaptive VLC coefficient coding were submitted to the current JVT meeting.
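For reference, the UVLC is a single universal code of the exp-Golomb family: a run of leading zeros, a marker bit, and an equal number of info bits. The sketch below builds such codewords (shown in the non-interleaved exp-Golomb form that was eventually standardized; the TML UVLC arranges the bits slightly differently but has the same code lengths). Because one fixed code must serve every coefficient statistic, it leaves on the table the gains at low QP that the adaptive VLC proposals target.

    def exp_golomb(code_num: int) -> str:
        """Unsigned exp-Golomb codeword: M leading zeros, a 1, then M info bits,
        where M = floor(log2(code_num + 1))."""
        value = code_num + 1
        m = value.bit_length() - 1
        return "0" * m + format(value, "b")   # the leading '1' of value is the marker bit

    for n in range(6):
        print(n, exp_golomb(n))
    # 0 1
    # 1 010
    # 2 011
    # 3 00100
    # 4 00101
    # 5 00110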

JVT-C028, Context-adaptive VLC (CVLC) coding of coefficients (G. Bjøntegaard, K. Lillevold, RealNetworks), notes that CVLC is a method of coding transform coefficients. It replaces the UVLC coding of transform coefficients (Tcoeff) for luma as well as chroma, and uses context adaptive coding to achieve a significant bit rate savings over UVLC. All other elements are identical to the description of UVLC coding.

The group noted that this proposal is a little better than JVT-C088 (below). Some of the concerns are:

• Most of the gain is at the higher than normal operating range of the bit rates for JVT's current profiles.
• 18-bit code length: it would be desirable to limit the code length to 16 bits.
• Cross verification (Intel, RealNetworks).

JVT-C088, Adaptive coding of transform coefficients (M. Karczewicz, R. Kurceren, Nokia), addresses the VLC coding of transform coefficients. The proposed method provides significant improvements, up to 23% for inter and 30% for intra, compared to the current method of coding of transform coefficients. Software is also provided.

Some of the group's concerns are that most of the gain is at the higher than normal operating range of the bit rates of JVT's current profiles. It was cross verified by Matsushita in JVT-C145, below.

JVT-C145, Verification results for adaptive coding of transform coefficients (JVT-C088) (S. Kondo, K. Abe and S. Kadono, Matsushita), reports verification results for JVT-C088. It was implemented in the JM1.7 software independently. It verifies that the method can save a large number of bits compared to UVLC coding. It was presented for information.

JVT-C086, Improving coefficients coding for Intra Picture (N-M. Cheung, Y. Itoh, TI), proposes an improvement on Nokia's VLC proposal JVT-B072-revised, providing additional gain in coding efficiency on top of their results. It provides about 1% improvement over JVT-C088 for Intra only.

Technically, JVT-C028 and JVT-C088 are very close. IPR statements favor JVT-C028. Therefore, based on the technical and IPR situation, the group recommended use of JVT-C028 over JVT-C088. The proponents and the group are requested to work on reducing the number of tables and the code length, and to look into the possible problem of start code emulation and the use of structured code. JVT-C028 was adopted.

CABAC

JVT-C009 (D. Marpe, HHI) is the CABAC Ad Hoc Group Report.

JVT-C029, Low-Complexity Arithmetic Coding for CABAC (R. J. van der Vleuten, Philips), proposes an implementation of arithmetic coding that significantly reduces the arithmetic coding implementation complexity, while it has a negligible influence on the compression efficiency. The proposal satisfies all the requirements that were set by the CABAC AHG. IPR para. 2.2 is checked. JVT-C062, Verification Results for JVT-C029 (D. Marpe, HHI), investigates the proposal for a low-complexity arithmetic coding engine presented in JVT-C029. The algorithms presented were tested using the software of the contributors. Encoder and decoder were tested and no mismatch occurred.
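For orientation, the interval-subdivision principle that these engines implement can be shown with exact arithmetic (a toy model only; practical coders such as the CABAC engine use finite-precision registers, renormalization, and adaptive contexts, which is exactly where the implementation complexity being reduced here lives):

    from fractions import Fraction

    def encode(bits, p1):
        """Subdivide [0,1) once per binary symbol; p1 is the probability of a 1.
        Any number inside the final interval identifies the whole sequence."""
        low, width = Fraction(0), Fraction(1)
        for b in bits:
            if b == 1:                      # a 1 takes the lower part of the interval
                width *= p1
            else:                           # a 0 takes the upper part
                low += width * p1
                width *= 1 - p1
        return low, width

    def decode(value, n, p1):
        low, width, out = Fraction(0), Fraction(1), []
        for _ in range(n):
            split = low + width * p1
            if value < split:
                out.append(1); width *= p1
            else:
                out.append(0); low, width = split, width * (1 - p1)
        return out

    bits = [1, 0, 0, 1, 1, 1, 0, 1]
    low, width = encode(bits, Fraction(3, 4))
    assert decode(low, len(bits), Fraction(3, 4)) == bits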

JVT Group Discussion noted that this proposal was not cross-verified completely. Only coding efficiency was cross-verified but complexity was not. For some architectures, it is expected to be a little better than JVT-C061 (below) and for others a little worse. The group agreed that technically there is no clear winner between JVT-C029 and JVT-C061.

JVT-C061, Fast Arithmetic Coding for CABAC (D. Marpe, G. Heising, G. Blättermann, T. Wiegand, Heinrich-Hertz-Institute), proposes a new arithmetic coding core engine to speed up the arithmetic coding and decoding process. Compared to the arithmetic coder in JWD2, the proposed fast arithmetic coder leads to only minor increases of the bit rate, on average about 0.5%, while a speed-up of about 24% of the CABAC decoder (for bit parsing, context modeling and symbol decoding) and of about 5% of the total decoding time can be achieved.

JVT-C133, Verification of Improved CABAC (T. Stockhammer, TUM), verifies results for "Fast Arithmetic Coding for CABAC" proposed in JVT-C061. As identical results were obtained, the improved performance could be confirmed.

It was agreed to accept JVT-C061. There was no opposition except from the proponent of JVT-C029. Proponents were encouraged to complete the cross verification. JVT-C061 leaves the door open for adoption of CABAC in another profile that can possibly be royalty free.

JVT-C031, Improvements on RUN and LEVEL information coding in CABAC (S. Kondo, S. Kadono, Matsushita Electric Industrial), proposes altered methods of (RUN, LEVEL) information coding in CABAC. The proposed methods are an inverse scan of transform coefficients and a new transition method of contexts in the LEVEL coding. IPR para. 2.2 is checked. The group noted that gains are smaller than those provided by JVT-C060 (below). The proposal was not accepted.

JVT-C060, Improved CABAC (H. Schwarz, D. Marpe, G. Blättermann, T. Wiegand, Heinrich-Hertz-Institute), proposes two modifications to CABAC, which further improve coding efficiency in comparison to UVLC. The first modification concerns the coding of transform coefficients. The experimental results show that this modification increases the bit-rate savings of CABAC in comparison to UVLC by an additional 1-3%. The second modification is a minor change of the macroblock type encoding in B-frames. It introduces the SKIPPED macroblock into the set of B-frame macroblock modes for CABAC, as has already been done for the UVLC. This modification unifies the process of macroblock type encoding for both schemes. Substantial coding gains for B-frames are achieved for sequences with low motion.

JVT-C134, Verification of Fast Arithmetic Coding Engine (T. Stockhammer, TUM), verifies the improved CABAC results proposed in JVT-C060. As identical results were obtained, the improved performance could be confirmed.

JVT-C060 was accepted subject to better text description and software availability. Proponents of JVT-C031 and JVT-C060 were encouraged to do further work in the direction of merging some of the ideas while still keeping the 2.2.1 IPR status (royalty free if everyone else agrees to royalty free) of JVT-C060.

JVT-C038, Bounding the complexity of arithmetic decoding (F. Bossen, NTT DoCoMo), analyzes the worst case complexity of arithmetic decoding and concludes that the worst case complexity can be 40-50 times higher than the average complexity. It proposes a modification of the arithmetic coding core to reduce the worst case complexity. IPR para. 2.2 is checked.

The JVT conditionally accepted the proposal in JVT-C038, subject to cross-verification and to CABAC not being in the Baseline profile.

JVT-C100, Improved Terminology for CABAC Section of WD (B. Haskell, Bosch), proposes definitions of terminology. Although it was presented for information, text editors are encouraged to take this contribution into account during the editing process.

JVT-C162, Putting a Reasonable Upper Limit on Binarization Expansion (L. Winger, VideoLocus), notes that VCEG-O18 proposed an exp-Golomb binarization to limit the maximum number of bits in the binarization to a more reasonable number than the current unary binarization. This proposal greatly reduces the RD losses at higher QP that were experienced with VCEG-O18 (CSR 12.48, December 2001), while still reducing the potential binarization length by many orders of magnitude. In contrast to the VCEG-O18 proposal, JVT-C162 proposes keeping unary binarization for the first number of bits in the new binarization (to allow better context adaptation of CABAC), and only switching to an exp-Golomb binarization suffix for larger symbol levels and larger motion vector residual magnitudes. The group agreed that this proposal should be subject to cross-verification and some further study.
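The idea can be sketched as a hybrid binarization (an illustration of the principle only; the cutoff and exp-Golomb order below are assumed example parameters, not those of JVT-C162): small values keep the unary form so CABAC's context modeling still sees them bin by bin, while the remainder of larger values is binarized with an exp-Golomb-style suffix, which caps the worst-case expansion.

    def unary(n: int) -> str:
        return "1" * n + "0"

    def eg_suffix(n: int, k: int) -> str:
        """Exp-Golomb-style suffix: m ones, a zero, then (k + m) remainder bits,
        where m grows until the remainder fits."""
        m = 0
        while n >= (1 << (k + m)):
            n -= 1 << (k + m)
            m += 1
        bits = format(n, "b").zfill(k + m) if (k + m) > 0 else ""
        return "1" * m + "0" + bits

    def hybrid_binarization(value: int, cutoff: int = 14, k: int = 0) -> str:
        """Unary below the cutoff; a unary prefix of `cutoff` ones plus an
        exp-Golomb suffix above it.  cutoff and k are illustrative values."""
        if value < cutoff:
            return unary(value)
        return "1" * cutoff + eg_suffix(value - cutoff, k)

    for v in (3, 14, 200, 100000):
        print(v, "->", len(hybrid_binarization(v)), "bins")   # 4, 15, 29, 47 bins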

JVT-C098, New Syntax for Error Resilient Transmission of CABAC Stream (S-I. Sekiguchi, Y. Yamada, K. Asai, Mitsubishi), proposes a set of new additional syntax elements for error resilient transmission of the CABAC stream. This proposal will be studied further in the ad hoc group on CABAC.

Profiles and Levels

JVT-C019 (D. Lindbergh, Polycom) is the AHG Report from Profiles, Levels & Applications. It includes a description of remarks and proposals relating to profiles, levels and applications. The AHG members proposed that the Levels should be tweaked to optimally support the following formats:

720x480 at 30 fps
720x576 at 25 fps
1280x720 at 24, 25, 50, and 60 fps (all progressive scan only)
1920x1080i at 30 fps (note that 1088 lines fill out a whole number of macroblocks; see the check below)
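The parenthetical note is easy to verify: coded dimensions must cover a whole number of 16x16 macroblocks, and 1080 is not a multiple of 16, so the coded height rounds up to 1088.

    def mb_aligned(samples: int, mb_size: int = 16) -> int:
        """Round a picture dimension up to a whole number of macroblocks."""
        return ((samples + mb_size - 1) // mb_size) * mb_size

    assert mb_aligned(1080) == 1088   # 68 macroblock rows
    assert mb_aligned(1920) == 1920   # 120 macroblock columns, already aligned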

Based on these requests, JVT-C019 provides a modified set of Levels, with changed values indicated relative to JVT-B108.

JVT-C026, JVT Profiles Proposal (D. Lindbergh, M. Horowitz, Polycom; T-I. Johansen, Tandberg; S. Wenger, Teles AG), proposes specifics for the JVT Baseline and other Profiles.

JVT-C036, TV profile and levels (Y. Le Maguet, J. Gobert, T. Fautier, Philips), aims at defining a TV profile both for standard and high definition. It also defines two levels for the TV profile, one for standard definition and one for high definition, and proposes a list of parameters to be part of the level definition.

JVT-C058, Straw-man proposal for baseline profile (J. McVeigh, B. Iverson, B. Reese, Intel), is in response to the request issued in JVT-B108 for proposals on profile tool set definition. A straw-man proposal for the Baseline Profile is presented, justifications for the recommended tool set are discussed, and open issues are identified to be resolved before final profile definition.

JVT-C068, Definition of Profiles (M. Hannuksela, Nokia), proposes two profiles for the JVT codec and the allocation of the accepted coding tools between these profiles.

JVT-C099, Main Profile for MPEG-4 AVC and H.264 (S. Chen, A. MacInnis, Broadcom), is a proposal to define the Main Profile of MPEG-4 AVC (H.264) for applications requiring entertainment quality video, such as digital TV, HD DVD, high-quality streaming video, video-on-demand and personal video recording, etc. The proposal is based on reported requirements for enhanced TV services. Tools selected for this profile were evaluated by considering the coding performance vs. cost trade-off and also support for the RFB principle.

JVT-C109, Profile/Level restrictions for very low end applications (G. Bäse, Siemens), is targeted at the "very low end" side of applications. Therefore only the baseline profile is taken into account. Most of the intended applications so far assume a certain level of available complexity, which is not the case at the very low end. Some additional restrictions are proposed to permit very low end applications:

The Baseline Profile should not contain: 1/8 pel, CABAC, SP-frames and B-frames.
The number of Multiple Reference Frames should be graded with increasing Levels.
The intended picture format should be the upper limit for any custom picture format.
Additional frame rates should be defined.

JVT-C113, Baseline Profile for JVT Coding Standard (A. Luthra, Motorola), is a proposal to define the Baseline Profile in JVT with the aim of unifying the video conferencing, digital TV (including broadcasting of entertainment video, video-on-demand and personal video recording types of applications) and streaming video worlds under the same profile. In the past, MPEG has focused on defining different profiles for these worlds. However, with the advancement in technology and the cost effectiveness of new technology, it is proposed that there is no need to define separate profiles for the applications in those classes.

JVT-C155, MPEG Procedures to Develop Profiles & Levels (N4671p) (MPEG), is a description of the MPEG process for development of profiles and levels.

JVT-C161, Mobile Profile Proposal (J. Boyce, Thomson; C.J. Tsai, PacketVideo), proposes a profile targeted toward low-power mobile devices. Some opinions and remarks expressed during discussion included:

• Limit the number of levels as well as profiles?
• The concept of levels with sublevels introduces a way of compromising between a minimum number of major configuration points and a desire for flexibility and scalability of implementation.
• Leave some TBDs?
• Does the current draft allow MVs way off the edge of the picture?
• If FLC for QUANT is in the slice header, then set the number of bits at parameter set level?
• Baseline: no 1/8 pel, yes deblocking, no mixing intra/inter within MB, 4:2:0 only, interlace at a level capable of HHR+, maximum of 15 reference pictures, no CABAC, VLC, no SP, all motion prediction block shapes, all intra prediction modes, no data partitioning, all QP values, B pictures? No.
• Max MV range: 256
• # Ref pictures at highest resolution: ?
• Main: same as Baseline, plus CABAC, ABT, B pictures; bi-predictive coding of blocks smaller than 8x8 is not allowed (revisit after JVT-C115, above, under Performance, Implementation, and Complexity Analysis).
• ABT to be discussed.
• Will have a maximum bit rate.
• May want limits on bit rate and number of bits per macroblock.
• The level numbering scheme will have sub-levels using numbers.
• The limit on extreme aspect ratios as defined in Geneva is tentatively accepted and will be the subject of further study.

At this meeting JVT agreed (see above), after considerable discussion, on two Profiles, "Baseline" and "Main," and on a set of Levels, according to the principles previously agreed in JVT-B108. Details are recorded in JVT-C165, JVT Profiles and Levels Agreements (JVT output document). Adoption of ABT is subject to verification experiments and further complexity analysis to be completed by the July meeting.

The FMO is accepted in the baseline and the Main profile with a limit of a maximum of eight slice groups and subject to further complexity analysis and verification experiments to be completed by the July meeting.

Test Model

JVT-C034, A novel fast fractional-pel motion estimation (Z. Chen, Y. He, Tsinghua University; Y. Chen, SVA Group), presents a motion estimation search method. The proposal states that in any case (1/2-pixel, 1/4-pixel or 1/8-pixel resolution) a computation reduction of roughly 62.5% in fractional-pixel motion estimation can be achieved compared with the full fractional-pixel motion search algorithm. Experimental results show that this strategy preserves image quality well and has little influence on the bit rate.

JVT-C065, Further Improvements on Motion Search Range Decision (M-C. Hong, Soongsil University; C-W. Kim, McubeWorks; S-W. Rhie, SK Telecom), is about the optimal decision of the search range for fast encoder implementation using neighboring motion vectors. It improves the performance of JVT-B022, which was already presented at the Geneva meeting. IPR para. 2.2 is checked.

JVT-C084, Lagrange Multiplier and RD-characteristics (K. Takagi, KDDI), verifies the Lagrange multiplier used in the current JM and proposes a better value of L than the conventional one.
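For context, the JM encoder's rate-constrained decisions choose, among candidate modes or motion vectors, the one minimizing a Lagrangian cost of the standard form below; the specific value of the multiplier and its dependence on QP is exactly what JVT-C084 re-examines.

    J(\text{mode} \mid QP) = D(\text{mode} \mid QP) + \lambda \cdot R(\text{mode} \mid QP)

Here D is the distortion of the candidate (e.g., SSD against the original block), R is the number of bits it costs, and the multiplier lambda determines where on the rate-distortion curve the encoder operates.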

JVT-C101, High-Complexity Mode Decision for Error Prone Environment (C-W. Kim, McubeWorks; D-W. Kang, Kookmin University; I-S. Hwang, SK Telecom), proposes an improvement of the high complexity mode decision for error-prone channels presented in VCEG-N43r3. It was stated that drift noise, defined as the encoder-decoder mismatch of the reference frame, is the major factor causing degradation of video quality when transmitting a compressed video bitstream through an error-prone channel. Though intra-refreshment is known to be an effective method of reducing the drift noise, it generally requires a sacrifice of compression efficiency. JVT-C101 proposes an R-D optimization method considering estimated drift noise to perform optimized intra-refreshment.

JVT-C146, Prop. Common Cond. for Wireline IP/UDP/RTP (Wenger), was not available.

Due to a high volume of high-priority work toward definition of the normative standard draft, the group agreed to primarily defer consideration of the test model and other such non-normative issues for further study in an ad hoc group and at future meetings.

JVT Interim Ad Hoc Groups Established

Below is the current list of interim JVT ad hoc groups established at the Fairfax meeting that will report to the Klagenfurt meeting (#4), their chair contact information, and their charters.

JVT Project Management
Chairs: Gary Sullivan ([email protected]), Ajay Luthra ([email protected]), and Thomas Wiegand ([email protected])
Charter: To further the work on the JVT project as a whole, including project planning, work coordination, and status review.

Text Editing and Reference Software Development
Chair: Thomas Wiegand ([email protected])
Charter: To further the work on the draft text and software implementation of the joint design, including incorporation of modifications as approved by the group, production of the joint committee draft text, collection of comments on the text and software, preparations to facilitate necessary future text modification work, and provision of improved software for group use in future experiments and for eventual approval as standardized reference software.

Motion Compensation Interpolation
Chair: Thomas Wedi ([email protected])
Charter: To study the design of the motion compensation interpolation processing in the JVT design, including consideration of the completeness and correctness of the draft specification, the potential for use of adaptive motion interpolation, and consideration of the rate-distortion-complexity tradeoffs in motion interpolation design.

Study of Complexity
Chairs: Michael Horowitz ([email protected]) and Jan Bormans ([email protected])
Charter:
• To study the implementation complexity of the AVC/JVT codec.
• To formulate strategies to minimize the computational and implementation complexity of the AVC/JVT codec.
• To provide guidelines for the complexity studies of the AVC/JVT codec.
• To collect meaningful AVC/JVT codec parameter settings (to be provided by the JVT).

Deblocking Filter Analysis
Chair: Peter List ([email protected])
Charter: To study the completeness and correctness of the deblocking filter specification; to investigate quality and complexity issues for loop filter design in the JVT codec and to assess the potential for improved visual quality, reduced decoder computational complexity, and enhanced design simplicity. In particular this ad hoc group should consider the relationship of the deblocking filter design to interlaced-scan video and the potential usefulness of turning off loop filtering around the boundaries of slices.

High-Level Syntax
Chairs: Miska Hannuksela ([email protected]), Young-Kwon Lim ([email protected]), Thomas Stockhammer ([email protected])
Charter: To study the high-level syntax of JVT content (particularly syntax at the slice header level, picture header level, and parameter set level) to determine whether this syntax is fully and properly specified and supports the carriage of JVT bitstreams for all appropriate transport environments.

Timing and Timing Information
Chairs: Gary Sullivan ([email protected]) and Sherman Chen ([email protected])
Charter: To study issues of timing (e.g., capture, presentation, etc.) and Film mode video in relation to the completeness and correctness of the JVT design.

CABAC
Chair: Detlev Marpe ([email protected])
Charter: To study the CABAC design with regard to completeness and correctness of specification, rate-distortion performance, and complexity.

Joint Model Reference Encoding Method Development
Chair: Chul-Woo Kim ([email protected])
Charter: To further the work on description of example reference encoding methods for the JVT codec design toward eventual adoption as non-normative standard text.

Adaptive Block Transforms
Chair: Mathias Wien ([email protected])
Charter: To study the completeness and correctness of the specification of adaptive block transform use in the JVT draft specification; focusing in particular on the degree of harmonization of the ABT design with other changes adopted into the draft.

Intra Coding
Chair: Marta Karczewicz ([email protected])
Charter: To consider the possible need to add new prediction modes for chroma prediction and to investigate the potential for reducing intra coding memory requirements.

JVT Meeting Roster, May 6 – 10, 2002, Fairfax, VA

Gary Sullivan, Microsoft, JVT Rapporteur | Chair
Thomas Wiegand, Heinrich Hertz Institute, JVT Associate Rapporteur | Co-Chair
Ajay Luthra, Motorola BCS, JVT Associate Rapporteur | Co-Chair

Canada ATI Technologies, Inc. David Strasser [email protected] Comm. Research Centre Demin Wang [email protected] Scientific Atlanta Canada Alex Luccisano [email protected] UB Video Inc. Anthony Joch [email protected] UB Video Inc. Faouzi Kossentini [email protected] VideoLocus Inc. Guy Côté [email protected] VideoLocus Inc. Lowell Winger [email protected] ViXs Systems Patrick Rault [email protected] Microsoft Shipeng Li [email protected] Microsoft Feng Wu [email protected] Tsinghua University Yun He [email protected] Tsinghua University Peng Zhou [email protected] Nokia Corporation Miska Hannuksela [email protected] Nokia Research Center Antti Hallapuro [email protected] France Telecom Frederic Loras [email protected] France Telecom R&D Christophe Daguet [email protected]

m.comFrance Philips Marc Legrand [email protected] Philips Research France Joel Jung [email protected]

France Philips Research France Yann Le Maguet [email protected] Philips semiconductors Thierry Fautier [email protected] Thomson Multimedia Anne Lorette [email protected] Thomson Multimedia R&D France Guillaume Boisson [email protected] Deutsche Telekom Peter List [email protected] Fraunhofer IIS-a Herbert Thoma [email protected] Heinrich-Hertz-Institute (HHI) Detlev Marpe [email protected] Heinrich-Hertz-Institute (HHI) Aljoscha Smolic [email protected] Heinrich Hertz Institute (HHI) Thomas Wiegand [email protected] Panasonic European Labs Martin Schlockermann [email protected] RWTH Aachen Jens-Rainer Ohm [email protected] RWTH Aachen Mathias Wien [email protected] Siemens AG Gero Bäse [email protected] Teles AG / TU Berlin Stephan Wenger [email protected] University of Hannover Thomas Wedi [email protected] Emuzed Inc. Rajesh Rajagopalan [email protected] Harmonic Inc. Natan Peterfreund [email protected] Aethra Roberto Flaiani [email protected] Fujitsu Laboratories Ltd. Akira Nakagawa +81-44-754-2347Japan Hitachi, Ltd. Yoshinori Suzuki [email protected] KDDI Corp. Koichi Takagi [email protected] Matsushita Electric Industrial Co. Shinya KadonoJapan Matsushita Electric Industrial Co. Satoshi KondoJapan Matsushita Electric Industrial Co. Yoshinori Matsui [email protected] Matsushita Electric Industrial Co. Takanori Senoh [email protected] Mitsubishi Electric Kohtaro Asai [email protected] Mitsubishi Electric Tokumichi Murakami [email protected]

elco.co.jpJapan Mitsubishi Electric Hirofumi Nishikawa [email protected] Mitsubishi Electric Shun-ichi Sekiguchi [email protected] NEC Yoshihiro Miyamoto [email protected] NTT Makoto Endoe [email protected] NTT Hideaki Kimata [email protected] NTT DoCoMo, Inc. Satoru Adachi [email protected] NTT DoCoMo, Inc. Sadaatsu Kato [email protected] Sony Corp. Kazushi Sato [email protected] Sony Corp. Teruhiko Suzuki [email protected] Sony Corp. Yoichi Yagasaki +81-3-5435-3891Japan Texas Instruments Japan Ngai-Man Cheung [email protected] Toshiba Takeshi Chujoh [email protected] Toshiba Yoshihiro Kikuchi [email protected] ETRI Bongsue Suh [email protected] K-JIST Yo-Sung Ho [email protected] Kookmin University Dong Wook Kang [email protected] Kyunghee University Doug Young Suh [email protected] LG Electronics Inc. Byeong-Moon Jeon [email protected] McubeWorks Inc. Korea Chul-Woo Kim [email protected] Net&tv Inc. Young-Kwon Lim [email protected] onTimetek Inc. Angelo Yong-Goo KimKorea Samsung AIT Woo-Shik Kim [email protected] Samsung Electronics Cheul-hee Hahm [email protected] Samsung Electronics Jeong-Hoon Park [email protected] Samsung Electronics Byung Cheol Song [email protected] Sejong University Yung Lyul Lee [email protected]

Korea SK Telecom Kyoung Seok In [email protected] Sung Kyun Kwan Univ. Woongil Choi [email protected] Sung Kyun Kwan Univ. Byeungwoo Jeon [email protected] Thin Multimedia, Inc. Sang-hee Lee [email protected] University of Seoul Yong Han Kim [email protected] Philips Consumer Electronics Jan van der Meer [email protected] RealNetworks, Inc Gisle Bjontegaard [email protected] Tandberg Tom-Ivar Johansen [email protected] Warsaw Univ. of Technology Wladyslaw Skarbek [email protected] Panasonic Singapore Labs Teck Wee Foo [email protected] Panasonic Singapore Labs Shen Shengmei [email protected] Ericsson Rickard Sjöberg [email protected] Ericsson AB Per Fröjdh [email protected] BTexact Technologies Mike Nilsson [email protected] Essential Viewing Systems Limited Richard Fryer richard.fryer@essential-

viewing.comUK Mitsubishi Electric ITE-VIL Leszek Cieplinski [email protected] Tandberg Television Ping Wu [email protected] 8x8, Inc. Barry Andrews [email protected] Apple David Singer [email protected] Apple Computer Hsi-Jung Wu [email protected] Broadcom Corp. Sherman Chen [email protected] Broadcom Corp. Sheng Zhong [email protected] Cable TV Labs Mukta Kar [email protected] CableLabs Yasser Syed [email protected] Cirrus Logic Cheng-Tie Chen [email protected] Conexant Systems, Inc. John Nelson [email protected] Demografx, Inc. Tom McMahon [email protected] DoCoMo Communications Labs

USA Inc.Frank Bossen [email protected]

USA Expedient Media Solutions Matthew Goldman [email protected] FastVDO Pankaj Topiwala [email protected] GlobespanVirata Eric Viscito [email protected] Hewlett-Packard Susie Wee [email protected] Hicks & Associates (representing

U.S. D.o.D.)Dr. Guy W. Beakley [email protected]

USA IBM Cesar Gonzales [email protected] IBM Krishna Ratakonda [email protected] Independent Consultant Barry G. Haskell [email protected] Intel Vaughn Iverson [email protected] Intel Jeff McVeigh [email protected] JRI Technology Jordan Isailovic [email protected] LSI Logic Iole Moccagatta [email protected] Microsoft Shankar Regunathan [email protected] Microsoft Sridhar Srinivasan [email protected] Microsoft Gary J. Sullivan [email protected] Mitsubishi Electric Research

LaboratoriesHuifang Sun [email protected]

USA Motorola Faisal Ishtiaq [email protected] Motorola Ajay Luthra [email protected] Motorola Limin Wang [email protected] Nokia Jani Lainema [email protected] Nokia Research Center Marta Karczewicz [email protected] Nokia, USA Ragip Kurceren [email protected]

USA PacketVideo Corp. Chun-Jen Tsai [email protected] Philips Research USA Yingwei Chen [email protected] Polycom Michael Horowitz [email protected] Polycom Dave Lindbergh [email protected] Pulsent Corporation Ron Richter [email protected] Qualcomm Amnon Silberger [email protected] Qualcomm Jay Yun [email protected] RealNetworks Greg Conklin [email protected] RealNetworks Gary S. Greenbaum [email protected] RealNetworks Karl O Lillevold [email protected] Sand Video, Inc. Peter Besen [email protected] Sarnoff Corporation Lulin Chen [email protected] Scientific-Atlanta, Inc. Arturo A. Rodriguez [email protected] Sharp Labs of America Louis Joseph Kerofsky [email protected] Sharp Labs of America Shawmin Lei [email protected] Sharp Labs of America Shijun Sun [email protected] Sony Ali Tabatabai [email protected] Sony Electronics Toby Walker [email protected] Sorceron Sassan Pejhan [email protected] Texas Instruments Minhua Zhou [email protected] Thomson Multimedia Jill Boyce [email protected] Thomson Multimedia Cristina Gomila [email protected] Thomson Multimedia Izzat H. Izzat [email protected] Thomson Multimedia Haoping Yu [email protected] VideoTele.com Peter Borgwardt [email protected] WebCast Technologies Weiping Li [email protected] Xilinx Labs Robert D. Turney [email protected]

Acronym Definitions

ABT Adaptive Block Transform
AHG Ad Hoc Group
AMVC Adaptive Motion Vector Coding
ASIC Application Specific Integrated Circuit
AVC Audio Visual Coding
B Bidirectionally Predictive (frames, pictures)
BD Bjøntegaard Delta
BDBR Bjøntegaard Delta Bit Rate
BD-PSNR Bjøntegaard Delta PSNR
CABAC Context-based Adaptive Binary Arithmetic Coding
CBR Constant Bit Rate
CD Committee Draft
CE Core Experiment
CfP Call for Proposal
CIF Common Intermediate Format
CPU Central Processing Unit
DC Direct Current (steady state)
DCT Discrete Cosine Transform
DP Direct mode P-Picture
DSP Digital Signal Processing
EBSP Emulation of Start Code Prefixes
ERPS Enhanced Reference Picture Selection
FEC Forward Error Correction
FIR Finite Impulse Response
FLC Fixed Length Codeword
FMO Flexible Macroblock Ordering
GMC Global Motion Compensation
GMVC Global Motion Vector Coding
GOP Group of Pictures
HD High Definition
HD DVD High Definition Digital Video Disk
HDTV High Definition Television
HHR Half-Horizontal Resolution (352 by 480 or 576)
HRD Hypothetical Reference Decoder
I Intra (JVT)
ID Identification
IDCT Inverted Discrete Cosine Transform
IEC International Electrotechnical Commission
IP Internet Protocol (IETF)
IPR Intellectual Property Rights
ISO International Organization for Standardization
ITU-R International Telecommunication Union - Radiocommunications Sector
ITU-T International Telecommunication Union - Telecommunications Sector
JM Joint test Model (JVT Group)
JTC Joint Technical Committee
JVT Joint Video Team (ISO/IEC MPEG Video + ITU-T Q6/16)
JWD Joint Working Draft (JVT Group)
LSB Least Significant Bit
MB Macro Block
MC Motion Compensation
MCP Motion Compensated Prediction
MFIP Multi-Frame Interpolative Prediction
MH MultiHypothesis motion pictures (H.26L)
MMCO Memory Management Control Operation
MPEG Motion Picture Experts Group (ISO/IEC)
MPI Minimum Picture Interval
MV Motion Vector
MVD Motion Vector Data
NAL Network Adaptation Layer
NALP NAL Packet
P Predicted (JVT)
PAR Peak to Average Ratio
PCM Pulse Code Modulation
PES Program Elementary Stream
PSNR Peak Signal to Noise Ratio
PTS Presentation Time Stamp
QCIF Quarter CIF
Q-DCT Quantized DCT
QoS Quality of Service
QP Quantization Parameter (H.262)
QUANT Quantization parameter
RAND Random Challenge Memory
RD Rate Distortion
RFB Royalty Free Baseline
RM Resynchronization Marker
RPBT Reference Picture Buffering Type
RPSL Reference Picture Selection Layer
RPSLI Indication of RPSL
RTP Real Time Transport Protocol (IETF)
S Switchable
S/W Software
SDTV Standard Definition Television
SEI Supplemental Enhancement Information
SG Study Group (ITU)
SHDTV Super HDTV
SI Still Image
SMPTE Society of Motion Picture and Television Engineers
SNR Signal to Noise Ratio
SP Switchable-P [frames]
TBD To be Determined
TEL Transport Encapsulation Layer
TML Test Model
TR Temporal Reference
TS/PS Transport Stream / Program Stream
TV Television
UDP User Datagram Protocol (IETF)
USNB U.S. National Body
UVLC Universal Variable Length Codeword
VBR Variable Bit Rate
VBV Video Buffer Verification
VCEG Video Coding Experts Group
VCL Video Coding Layer
VLC Variable Length Coding
WD Working Document
WG Working Group
XML eXtensible Markup Language
YCbCr Color space used in digital video

The CSR Library
Subscribers may order copies of documents shown in boldface type from Communications Standards Review, where not controlled. $50.00 for the first document in any order, $40.00 for the second, and $25.00 for each additional document in any order. Volume discounts available. Please contact CSR.

Documents listed with © are controlled documents. These documents are not for sale, but we can provide you with the author's contact information. ITU and ETSI meeting documents are also not for sale, but we can provide you with the author's contact information.

We have a large library of standards work in process and can help you locate other information you may need.

CSR recommends that you obtain published standards from Global Engineering Documents. Tel: 800 854-7179, +1 303 792-2181, Fax: +1 303 397-7935, http://global.ihs.com

Communications Standards Review Copyright Policy

Copying of individual articles/reports for distribution within an organization is not permitted, unless the user holds a multiple copy license from CSR. The single user electronic version may be mounted on a server whose access is restricted both to a single organization and to one user at a time. You are welcome to forward your single user electronic copy (deleting it on your system) to another user in your organization. CSR offers an Intranet subscription which permits unlimited copies to the subscribing organization.

Year 2002 Standards Committee Meeting Schedules
Please see the updated calendar at http://www.csrstds.com/mtgs.html.

Visit the CSR Web Pages: http://www.csrstds.com
The Web Pages include an on-line store (order subscriptions and reports), an updated Telecom Acronym Definitions list, updated meeting schedules, background material on telecom standards and CSR (the company), data sheets on both CSR technical journals, and more.

Communications Standards Review regularly covers the following committee meetings:

ITU-T SG9 Cable Networks & Transmission
SG15 WP1 Network Access
SG15 WP2 Network Signal Processing
SG16 Multimedia

ETSI AT Access and Terminals
TIPHON Voice over Internet
TM6 Transmission & Multiplexing

Communications Standards Review (ISSN 1064-3907) reports are published within days after the related standards meetings. Publisher: Elaine J. Baskin, Ph.D. Technical Editor: Ken Krechmer. Subscription Manager: Denise Hylen Lai. Copyright © 2002, Communications Standards Review. All rights reserved. Subscriptions: $795.00 per year worldwide, electronic format; $995.00 paper format. Corporate Intranet subscriptions (Corporate license for unlimited copies) are $2,150.00. Submit articles for consideration to: Communications Standards Review, 757 Greer Road, Palo Alto, CA 94303-3024 USA. Tel: +1-650-856-9018. Fax: +1-650-856-6591. e-mail: [email protected]. Web: http://www.csrstds.com. 13677

