Rate-distortion Optimization for MP3 and AAC Audio Coding with Complete Decoder Compatibility


Jingming Xu
Multimedia Communications Lab
University of Waterloo
September 16th, 2005

Outline
- Introduction and motivation
- MP3, AAC, and Two-nested-loop Search
- Rate-distortion optimization for MP3
- Rate-distortion optimization for AAC
- Conclusions and future research

Introduction

Audio coding differs from universal data compression:
- Long-term correlations
- Multi-channel correlations
- Subject to natural noise
- Judged by subjective perceptual quality

Audio coding methods (for both lossy and lossless coding):
- Linear prediction
- Time-frequency mapping (DCT, FFT, MDCT, etc.)
- Parameter coding
- …

Introduction (2)

MPEG - the most successful audio coding standard series so far

MPEG-1 (1992) - T/F mapping based, three layers of increasing complexity

MPEG-2 BC (1994) - backward compatible with MPEG-1, with multi-channel and sampling frequency extensions

MPEG-2 AAC (1997) - introducing more coding tools and giving up backward compatibility to improve quality

MPEG-4 AAC (1999) - inherited from MPEG-2 AAC, with TwinVQ and bitrate scalability extensions

MPEG-1 Layer 3 and MPEG-2 BC Layer 3 define the popular “MP3”

Introduction (3)

Motivations
- The MP3 and AAC standards leave the design of the structured encoding blocks open for performance enhancement.
- The state-of-the-art MP3 and AAC quantization and entropy coding scheme, Two-nested-loop Search (TNLS), is essentially incapable of exploiting the maximal flexibility allowed by the standards to reach the best rate-distortion tradeoff.
- MP3 and AAC have achieved huge success in the digital audio industry.

Introduction (4)

Quality evaluation of compressed audio
- Most widely used objective measure: noise-to-mask ratio (NMR)
- Most widely used subjective measure: ITU listening test (ITU-R Recommendation BS.1116)
  - Double-blind, triple-stimulus (A, B, C) method with hidden reference
  - Five-grade impairment scale
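As a rough illustration of the objective measure, the sketch below computes a per-band noise-to-mask ratio in dB as 10·log10(noise energy / masking threshold) and averages it over bands. The masking thresholds are assumed to come from a psychoacoustic model that is not shown, and the plain arithmetic mean over bands is an assumption made for illustration, not necessarily the averaging used for the ANMR results reported later.

```python
import math
from typing import Sequence


def band_nmr_db(original: Sequence[float], decoded: Sequence[float], mask: float) -> float:
    """NMR of one band in dB: quantization noise energy relative to the masking threshold."""
    noise = sum((x - y) ** 2 for x, y in zip(original, decoded))
    # Guard against log10(0) for a perfectly reconstructed band.
    return 10.0 * math.log10(max(noise, 1e-12) / mask)


def average_nmr_db(bands, decoded_bands, masks) -> float:
    """Arithmetic mean of per-band NMR values (an illustrative averaging choice)."""
    values = [band_nmr_db(o, d, m) for o, d, m in zip(bands, decoded_bands, masks)]
    return sum(values) / len(values)


# Two bands: -10 dB (noise well below the mask) and 0 dB (noise exactly at the mask).
bands = [[1.0, -2.0], [0.5, 0.25]]
decoded = [[1.1, -2.1], [0.5, 0.0]]
masks = [0.2, 0.0625]
print(round(average_nmr_db(bands, decoded, masks), 2))  # -5.0
```

Negative values mean the coding noise stays below the estimated masking threshold, so lower NMR indicates better expected perceptual quality.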

MP3 and AAC audio coding standards

Encoding process:
- Window switching
- Stereo coding
- Pre-processing in AAC: gain control, prediction, noise shaping and substitution, etc.

MP3 and AAC audio coding standards (2)

Quantization and entropy coding in MP3
- Scale factor bands and non-uniform quantization
- scale_factor values are encoded with a fixed number of bits in the side information and a variable number of bits in the main_data stream
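The non-uniform (power-law) quantizer can be sketched as follows. The standards fix only the decoder-side reconstruction, |x̂| = |q|^(4/3) · 2^(step/4); the encoder-side rounding offset 0.4054 is a common encoder choice (equivalent to rounding to the nearest integer after subtracting 0.0946), not a normative value. The single integer `step` is a simplification standing in for the combination of global gain and scale factor, and the function names are illustrative.

```python
import math


def quantize(x: float, step: int, offset: float = 0.4054) -> int:
    """Hard-decision power-law quantization: sign(x) * floor((|x| / 2^(step/4))^(3/4) + offset)."""
    magnitude = (abs(x) * 2.0 ** (-step / 4.0)) ** 0.75
    q = int(math.floor(magnitude + offset))
    return -q if x < 0 else q


def dequantize(q: int, step: int) -> float:
    """Decoder-side reconstruction: sign(q) * |q|^(4/3) * 2^(step/4)."""
    value = abs(q) ** (4.0 / 3.0) * 2.0 ** (step / 4.0)
    return -value if q < 0 else value


for x in (8.0, -8.0, 0.1):
    q = quantize(x, step=0)
    print(x, q, round(dequantize(q, step=0), 3))
```

Raising `step` by 4 doubles the step size, so `quantize(16.0, 4)` gives the same index as `quantize(8.0, 0)`.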

Quantization and entropy coding in MP3: Huffman coding
- 34 fixed Huffman codebooks
- Huffman coding region division: each region is coded with a different codebook that best matches the statistics of that region (big_value, count_1, zero, …)
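The three-way region division can be sketched with a simple backward scan: trailing pairs of zeros form the zero region, a run of quadruples whose magnitudes are all at most 1 before it forms the count1 region, and the remaining values are coded in pairs in the big_values region (which is further split into subregions with their own codebooks, not shown). This greedy scan and the function name `partition_regions` are illustrative assumptions; real encoders add further constraints.

```python
from typing import List, Tuple


def partition_regions(ix: List[int]) -> Tuple[int, int, int]:
    """Return (big_values, count1, zero_count) for a list of quantized values.

    big_values counts pairs, count1 counts quadruples, zero_count counts values.
    """
    n = len(ix)
    i = n
    # Zero region: strip trailing pairs of zeros.
    while i >= 2 and ix[i - 1] == 0 and ix[i - 2] == 0:
        i -= 2
    zero_count = n - i
    # count1 region: strip trailing quadruples whose magnitudes are all <= 1.
    count1_end = i
    while i >= 4 and all(abs(v) <= 1 for v in ix[i - 4:i]):
        i -= 4
    count1 = (count1_end - i) // 4
    # Everything before is coded in pairs in the big_values region.
    big_values = (i + 1) // 2
    return big_values, count1, zero_count


print(partition_regions([5, 3, 1, 0, 1, -1, 0, 0, 0, 0]))  # (1, 1, 4)
```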

Quantization and entropy coding in AAC
- Non-uniform quantizer: same as in MP3
- scale_factor values are differentially encoded relative to that of the preceding band, using a fixed Huffman codebook
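Differential scale factor coding can be sketched as below. In AAC the first scale factor is coded relative to the channel's global_gain, and each difference in the range -60 to +60 is mapped to an index of the 121-entry scale factor Huffman codebook by adding 60; the codeword table itself is omitted, and bands coded with the zero, noise or intensity codebooks (handled differently in the standard) are ignored in this sketch. The function name is illustrative.

```python
from typing import List

SF_DIFF_OFFSET = 60  # codebook index of a zero difference


def scalefactor_indices(scalefactors: List[int], global_gain: int) -> List[int]:
    """Map scale factors to scale factor codebook indices via first-order differences."""
    indices = []
    previous = global_gain
    for sf in scalefactors:
        diff = sf - previous
        if not -60 <= diff <= 60:
            raise ValueError(f"difference {diff} is outside the codable range")
        indices.append(diff + SF_DIFF_OFFSET)
        previous = sf
    return indices


print(scalefactor_indices([100, 102, 101, 101], global_gain=100))  # [60, 62, 59, 60]
```

Because each band's code depends on the previous band's scale factor, the bit cost of one band's choice cannot be evaluated in isolation.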

- Huffman coding: 12 fixed Huffman codebooks
- Huffman coding region division: section boundaries can only be at scale factor band boundaries. For each section, the length of the section in scale factor bands and the index of the codebook used for that section are transmitted with a fixed number of bits.
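The side-information cost of a section layout can be sketched as follows, assuming a long-window frame: each section sends a 4-bit codebook index and its length in scale factor bands as a sequence of 5-bit fields, where the all-ones value 31 means "add 31 and read another field" (short windows use 3-bit fields with escape value 7). Merging runs of equal codebooks, as `sections_from_codebooks` does here, is only one possible layout; an RD-optimized encoder may split or merge sections differently.

```python
from itertools import groupby
from typing import List, Tuple


def sections_from_codebooks(codebooks: List[int]) -> List[Tuple[int, int]]:
    """Merge consecutive scale factor bands with the same codebook into (codebook, length) sections."""
    return [(cb, len(list(run))) for cb, run in groupby(codebooks)]


def section_side_bits(sections: List[Tuple[int, int]], sect_bits: int = 5) -> int:
    """Bits for section data: a 4-bit codebook index plus escaped length fields per section."""
    esc = (1 << sect_bits) - 1
    total = 0
    for _codebook, length in sections:
        # The escape value repeats once per full `esc` bands, then a final field below `esc`.
        length_fields = length // esc + 1
        total += 4 + sect_bits * length_fields
    return total


layout = sections_from_codebooks([1, 1, 1, 5, 5])
print(layout, section_side_bits(layout))  # [(1, 3), (5, 2)] 18
```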

Two-nested-loop Search algorithm

[Figure: flowchart of the TNLS inner loop and outer loop]
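A much-simplified sketch of the two nested loops, following the usual description of reference encoders: the inner (rate) loop coarsens a global step until the frame fits the bit budget, and the outer (distortion) loop amplifies the scale factors of bands whose noise exceeds the allowed level and then reruns the inner loop. The bit counter `count_bits` is a hypothetical proxy rather than real Huffman tables, and the stopping rules are simplified; note that rate and distortion are handled in separate loops rather than through one joint cost.

```python
from typing import List, Sequence, Tuple


def quantize_band(band: Sequence[float], step: int) -> List[int]:
    """Power-law quantizer with step size 2^(step/4) and a 0.4054 rounding offset."""
    scale = 2.0 ** (-step / 4.0)
    return [int((abs(x) * scale) ** 0.75 + 0.4054) * (1 if x >= 0 else -1) for x in band]


def band_noise(band: Sequence[float], q: Sequence[int], step: int) -> float:
    """Squared error between the band and its power-law reconstruction."""
    gain = 2.0 ** (step / 4.0)
    return sum((x - (1 if v >= 0 else -1) * abs(v) ** (4 / 3) * gain) ** 2 for x, v in zip(band, q))


def count_bits(qs: List[List[int]]) -> int:
    """Hypothetical rate proxy (not real Huffman tables): larger magnitudes cost more bits."""
    return sum(2 * abs(v).bit_length() + 1 for q in qs for v in q)


def tnls(bands: List[List[float]], allowed_noise: List[float], bit_budget: int,
         max_outer: int = 30) -> Tuple[int, List[int], List[List[int]]]:
    sf = [0] * len(bands)  # per-band amplification; each unit makes that band's step finer
    result = (0, list(sf), [])
    for _ in range(max_outer):
        # Inner (rate) loop: coarsen the global step until the frame fits the budget.
        global_step = 0
        while True:
            qs = [quantize_band(b, global_step - s) for b, s in zip(bands, sf)]
            if count_bits(qs) <= bit_budget or all(v == 0 for q in qs for v in q):
                break
            global_step += 1
        result = (global_step, list(sf), qs)
        # Outer (distortion) loop: amplify every band whose noise exceeds its allowance.
        noisy = [i for i, (b, q, s) in enumerate(zip(bands, qs, sf))
                 if band_noise(b, q, global_step - s) > allowed_noise[i]]
        if not noisy or all(s > 0 for s in sf):
            break
        for i in noisy:
            sf[i] += 1
    return result
```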

Two-nested-loop Search algorithm (2)

Problems in TNLS
- Quantization, scale factor adaptation and Huffman coding are considered separately
- No convergence guarantee
- Does not aim to minimize the overall distortion
- Disregards the inter-band correlations of scale factors and Huffman codebook selection in AAC

Rate-distortion optimization for MP3

Problem formulation: Lagrangian RD cost minimization over
- quantized coefficients
- scale factors
- Huffman coding region division
- Huffman codebook selection

with the distortion computed through
- the non-uniform de-quantizer defined in MP3
- the noise-to-mask ratio
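With notation chosen here only for illustration, the cost can be written as follows, where q are the quantized coefficients, s the scale factors, ω the Huffman coding region division, h the Huffman codebook selection, Q^{-1} the MP3 de-quantizer, d_NMR an NMR-based distortion, R the total bit count, and λ ≥ 0 the fixed slope:

```latex
\min_{q,\,s,\,\omega,\,h}\; J(q, s, \omega, h)
  \;=\; d_{\mathrm{NMR}}\!\left(x,\; Q^{-1}(q, s)\right) \;+\; \lambda\, R(q, s, \omega, h)
```

Holding λ fixed turns a constrained rate or distortion target into an unconstrained minimization, which is what allows the search procedures that follow.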

Rate-distortion optimization for MP3 (2)

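Problem formulation: soft-decision quantization

In conventional hard-decision quantization, the quantized coefficients are solely determined by the given scale factors (and the input spectrum), i.e., they are a fixed function of them. In the soft-decision quantization scenario, however, the quantized coefficients are treated as a flexible coding parameter and are selected such that the actual RD cost is minimized. Therefore, the quantized coefficients are no longer a deterministic function of the scale factors.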
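A per-coefficient illustration of the difference, under simplifying assumptions: hard decision takes the quantizer's output directly, while soft decision also considers the next smaller magnitude and keeps whichever gives the lower cost d + λ·r. The rate function `bits` is a hypothetical proxy rather than an MP3 Huffman table, and squared error stands in for the NMR-based distortion; in the actual scheme the choice is made jointly over the whole spectrum, not coefficient by coefficient.

```python
import math


def hard_quantize(x: float, step: int) -> int:
    magnitude = (abs(x) * 2.0 ** (-step / 4.0)) ** 0.75
    q = int(math.floor(magnitude + 0.4054))
    return -q if x < 0 else q


def reconstruct(q: int, step: int) -> float:
    value = abs(q) ** (4.0 / 3.0) * 2.0 ** (step / 4.0)
    return -value if q < 0 else value


def bits(q: int) -> int:
    """Hypothetical rate proxy: larger magnitudes cost more bits."""
    return 2 * abs(q).bit_length() + 1


def soft_quantize(x: float, step: int, lam: float) -> int:
    """Choose q among {hard decision, one step toward zero} minimizing squared error + lam * bits."""
    q_hard = hard_quantize(x, step)
    candidates = {q_hard}
    if q_hard != 0:
        candidates.add(q_hard - 1 if q_hard > 0 else q_hard + 1)
    # Sorting by magnitude makes ties resolve toward the cheaper, smaller value.
    return min(sorted(candidates, key=abs),
               key=lambda q: (x - reconstruct(q, step)) ** 2 + lam * bits(q))


print(soft_quantize(1.3, 0, 0.0), soft_quantize(1.3, 0, 1.0))  # 1 0
```

With λ = 0 the soft decision reproduces the hard decision; a larger λ trades a little extra distortion for fewer bits.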

Rate-distortion optimization for MP3 (3)

Fixed-slope graph-based iterative RD optimization

Step 1: Initialize a set of scale factors and a Huffman codebook (HCB) selection for the given frame of spectrum. Set t = 0, and specify a tolerance as the convergence criterion.

Step 2: Given the scale factors and HCB selection at iteration t (t ≥ 0), find the optimal quantized spectrum and HCB region division through a standard-constrained graph, i.e., the pair that achieves the minimum RD cost, and record this minimum cost for iteration t.

[Figure: graph search for MP3 quantized spectrum and region division]
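The graph search of Step 2 can be illustrated by a small shortest-path dynamic program under simplifying assumptions that are not from the slides: coefficients are handled one at a time rather than in pairs or quadruples, region boundaries may fall anywhere instead of only at scale factor band boundaries, each region's codebook (`tables`) is fixed by the previous step, and `candidates`, `dist` and `rate` are hypothetical caller-supplied functions. A node is (coefficient index, region); an edge either stays in the current region or opens the next one, and its weight is the best d + λ·r for that coefficient under that region's codebook.

```python
import math
from typing import Callable, Iterable, List, Sequence, Tuple


def graph_search(
    x: Sequence[float],
    tables: Sequence[str],
    lam: float,
    candidates: Callable[[float], Iterable[int]],
    dist: Callable[[float, int], float],
    rate: Callable[[int, str], float],
) -> Tuple[float, List[int], List[int]]:
    """Jointly choose quantized values and region boundaries minimizing sum(d + lam * r)."""
    n, num_regions = len(x), len(tables)
    best = [[math.inf] * num_regions for _ in range(n + 1)]
    back = [[None] * num_regions for _ in range(n + 1)]
    best[0][0] = 0.0  # before any coefficient, we are in region 0
    for i in range(n):
        for r in range(num_regions):
            if best[i][r] == math.inf:
                continue
            # Coefficient i either stays in region r or starts region r + 1.
            for r_next in (r, r + 1):
                if r_next >= num_regions:
                    continue
                q, cost = min(
                    ((q, dist(x[i], q) + lam * rate(q, tables[r_next])) for q in candidates(x[i])),
                    key=lambda pair: pair[1],
                )
                if best[i][r] + cost < best[i + 1][r_next]:
                    best[i + 1][r_next] = best[i][r] + cost
                    back[i + 1][r_next] = (r, q)
    # Trace the cheapest path back from the last coefficient.
    r = min(range(num_regions), key=lambda k: best[n][k])
    total = best[n][r]
    values, regions = [0] * n, [0] * n
    for i in range(n, 0, -1):
        prev_r, q = back[i][r]
        values[i - 1], regions[i - 1] = q, r
        r = prev_r
    return total, values, regions


# Toy example: a "big" table costs 4 bits per value; a "small" table is cheap only for |q| <= 1.
total, values, regions = graph_search(
    [5.0, 4.0, 0.2, 0.1], tables=["big", "small"], lam=0.1,
    candidates=lambda v: {math.floor(v), math.ceil(v)},
    dist=lambda v, q: (v - q) ** 2,
    rate=lambda q, t: 4 if t == "big" else (1 if abs(q) <= 1 else 100),
)
print(values, regions)  # [5, 4, 0, 0] [0, 0, 1, 1]
```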

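Rate-distortion optimization for MP3 (5)

Fixed-slope graph-based iterative RD optimization (continued)

Step 3: Given the quantized spectrum, HCB region division and HCB selection, update the scale factors from iteration t to t+1 so that the RD cost achieves its minimum.

Step 4: Given the quantized spectrum, HCB region division and updated scale factors, update the HCB selection from iteration t to t+1 so that the RD cost achieves its minimum.

Step 5: Repeat Steps 2, 3 and 4 for t = 0, 1, 2, … until the decrease in RD cost between consecutive iterations falls below the tolerance; then output the quantized spectrum, scale factors, HCB region division and HCB selection.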
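The outer iteration is an alternating minimization: each step minimizes the same Lagrangian cost over one group of variables with the others held fixed, so the cost never increases, and the loop stops once the improvement drops to the tolerance. The skeleton below shows only that control flow; the three update functions and the quadratic `toy_cost` are hypothetical stand-ins for the graph search, the scale factor update and the codebook update, chosen so that each update is an exact minimizer.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]


def iterate_rd(state: State, updates: List[Callable[[State], State]],
               cost: Callable[[State], float], tol: float = 1e-12,
               max_iter: int = 1000) -> Tuple[State, float, int]:
    """Apply the update steps in turn until the cost decrease falls to tol or below."""
    previous = cost(state)
    for t in range(1, max_iter + 1):
        for update in updates:
            state = update(state)
        current = cost(state)
        if previous - current <= tol:
            return state, current, t
        previous = current
    return state, previous, max_iter


def toy_cost(s: State) -> float:
    # Three coupled variable groups standing in for (spectrum, scale factors, codebooks).
    return (s["q"] - s["s"]) ** 2 + (s["s"] - 1.0) ** 2 + (s["h"] - s["q"]) ** 2


updates = [
    lambda s: {**s, "q": (s["s"] + s["h"]) / 2},  # exact minimizer over q
    lambda s: {**s, "s": (s["q"] + 1.0) / 2},     # exact minimizer over s
    lambda s: {**s, "h": s["q"]},                 # exact minimizer over h
]
final, J, iterations = iterate_rd({"q": 0.0, "s": 0.0, "h": 0.0}, updates, toy_cost)
print(round(final["q"], 3), iterations)
```

Because each step only guarantees a non-increasing cost, the result depends on the starting point, which is why the initialization matters.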

Simulation results: ANMR (implementation based on the ISO MP3 reference codec)

[Figure: ANMR results for violin.wav and spme50_1.wav]

Simulation results: ANMR (implementation based on LAME 3.96.1, best-quality mode)

Simulation results: ITU listening test (80 kb/s)

Remarks
- The iteration process may reach only a local optimum, so a well-chosen initial state is preferable when targeting the best possible RD performance.
- The proposed fixed-slope graph-based iterative algorithm provides a feasible solution to the problems in TNLS.
- The value of the Lagrange multiplier λ can be adjusted adaptively to meet rate or distortion constraints in real audio compression applications.
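One simple way to adjust λ for a target bit budget is bisection, assuming the encoded rate is non-increasing in λ (a larger λ penalizes bits more heavily). In the sketch, `encode_rate` is a hypothetical callback that runs the optimization at a given λ and returns the resulting number of bits; the toy rate curve is made up for illustration.

```python
from typing import Callable


def find_lambda(encode_rate: Callable[[float], float], target_bits: float,
                iterations: int = 50) -> float:
    """Smallest lambda found whose rate does not exceed target_bits (rate assumed non-increasing)."""
    lo, hi = 0.0, 1.0
    if encode_rate(lo) <= target_bits:
        return lo  # no rate penalty needed
    # Grow the bracket until the target is met.
    for _ in range(64):
        if encode_rate(hi) <= target_bits:
            break
        hi *= 2.0
    else:
        raise ValueError("target rate not reachable")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if encode_rate(mid) <= target_bits:
            hi = mid
        else:
            lo = mid
    return hi


def toy_rate(lam: float) -> float:
    return 1000.0 / (1.0 + lam)  # hypothetical rate curve


lam = find_lambda(toy_rate, target_bits=200.0)
print(round(lam, 4), toy_rate(lam) <= 200.0)  # 4.0 True
```

Returning the upper end of the bracket keeps the rate constraint satisfied; a distortion target can be handled the same way with the comparison reversed.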

Rate-distortion optimization for AAC

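Problem formulation: Lagrangian RD cost minimization over
- the scale factor sequence
- the Huffman codebook index sequence

First-order inter-band dependency (each scale factor is coded relative to that of the preceding band, and a section continues only while the codebook stays the same) -> dynamic programming (Viterbi algorithm)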

Rate-distortion optimization for AAC (2)

Fixed-slope trellis-based RD optimization

Step 1: Build the trellis structure. For each state in the trellis (one per combination of its three state indices, each running over its full range), find the best quantized coefficients to minimize its decomposed RD cost.

Step 2: Find the optimal path through the trellis with the Viterbi algorithm.

Step 3: Backtrack the optimal scale factors, Huffman codebooks and quantized coefficients as the final output.
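Steps 2 and 3 follow the standard Viterbi recursion. In the sketch, each band has the same list of candidate states, and a state index could encode, for example, a (scale factor, codebook) pair; `node_cost[b][s]` is assumed to hold the Step 1 result for state s in band b, and `trans_cost(p, s)` the λ-weighted side-information bits that depend on the previous band's state (such as the scale factor difference or a new section header). Both inputs are hypothetical here; only the recursion itself is the generic algorithm.

```python
from typing import Callable, List, Sequence, Tuple


def viterbi(node_cost: Sequence[Sequence[float]],
            trans_cost: Callable[[int, int], float]) -> Tuple[float, List[int]]:
    """Minimum-cost state path through a trellis with one column of states per band."""
    num_bands, num_states = len(node_cost), len(node_cost[0])
    acc = list(node_cost[0])  # best cost of any path ending in each state of band 0
    back: List[List[int]] = []  # back[b - 1][s]: best predecessor of state s in band b
    for b in range(1, num_bands):
        new_acc, pointers = [], []
        for s in range(num_states):
            p = min(range(num_states), key=lambda k: acc[k] + trans_cost(k, s))
            new_acc.append(acc[p] + trans_cost(p, s) + node_cost[b][s])
            pointers.append(p)
        acc = new_acc
        back.append(pointers)
    # Backtrack from the cheapest final state.
    state = min(range(num_states), key=lambda k: acc[k])
    total = acc[state]
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return total, path


# Toy example: three bands, three states, transition cost = size of the state change.
total, path = viterbi([[0, 5, 5], [5, 0, 5], [5, 5, 0]], lambda p, s: abs(p - s))
print(total, path)  # 2 [0, 1, 2]
```

The work grows linearly with the number of bands and quadratically with the number of states per band, rather than exponentially with the number of bands.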

[Figure: trellis structure for AAC quantization and entropy coding]

Simulation results: ANMR
- Implementation based on the ISO AAC reference codec
- Also compared with Aggarwal’s approach (Steps 2 and 3 only)

Simulation results: ITU listening test (64 kb/s)

Remarks
- The proposed fixed-slope trellis-based algorithm achieves the globally optimal RD performance within the quantization and entropy coding stage under the AAC standard constraints.
- Jointly designing the pre-processing decisions with the proposed optimization could, in theory, achieve the global optimum over the entire standard-constrained parameter space, but with computational complexity exponential in the number of bands per frame.

Conclusions and Future Research

Conclusions
- The fixed-slope approach converts the encoding problem into a search problem over a constrained space, which permits the use of efficient sequential search algorithms.
- Soft-decision quantization completes our RD optimization frameworks and brings significant performance gains.
- In each case, substantial performance improvement over state-of-the-art encoders is achieved with complete decoder compatibility.

Conclusions and Future Research (2)

Future research
- Real-time implementations
- Extension to scalable AAC
- Joint pre-processing and optimization for AAC
- Optimal lossy audio compression without syntax constraints:
  - Optimal settings for transform (e.g., block lengths), quantization (e.g., step sizes) and prediction
  - Joint design of quantization and entropy coding
  - …

Questions?
