Rate-distortion Optimization for MP3 and AAC Audio Coding with Complete Decoder Compatibility
Jingming Xu
Multimedia Communications Lab
University of Waterloo
September 16th, 2005
Outline
  Introduction and motivation
  MP3, AAC, and Two-nested-loop Search
  Rate-distortion optimization for MP3
  Rate-distortion optimization for AAC
  Conclusions and Future Research
Introduction
Audio coding - different from universal data compression
  Long-term correlations
  Multi-channel correlations
  Subject to natural noise
  Subjective perceptual quality judgement
Audio coding methods - for both lossy and lossless
  Linear prediction
  Time-frequency mapping (DCT, FFT, MDCT, etc.)
  Parameter coding
  ….
Introduction (2)
MPEG - the most successful audio coding standard series so far
MPEG-1 (1992) - T/F mapping based, three layers of increasing complexity
MPEG-2 BC (1994) - backward compatible with MPEG-1, with multi-channel and sampling frequency extensions
MPEG-2 AAC (1997) - introducing more coding tools and giving up backward compatibility to improve quality
MPEG-4 AAC (1999) - builds on MPEG-2 AAC, adding TwinVQ and bitrate scalability extensions
MPEG-1 Layer 3 and MPEG-2 BC Layer 3 define the popular “MP3”
Introduction (3)
Motivations
MP3 and AAC leave the design of the structured encoding blocks open for performance enhancement.
The state-of-the-art MP3 and AAC quantization and entropy coding scheme, Two-nested-loop Search (TNLS), is essentially incapable of exploiting the full standard-constrained flexibility for the best rate-distortion tradeoff.
The huge success of MP3 and AAC in the digital audio industry.
Introduction (4)
Quality evaluation of compressed audio
  Most widely used objective measure - noise-to-mask ratio
  Most widely used subjective measure - ITU listening test (ITU-R Recommendation BS.1116)
    Triple stimulus (A, B, C) with hidden reference, double blind
    5-grade impairment scale
MP3 and AAC audio coding standards
Encoding process
Window switching
Stereo coding
Pre-processing in AAC: gain control, prediction, noise shaping and substitution, etc.
MP3 and AAC audio coding standards (2)
Quantization and entropy coding in MP3
  Scale factor bands and non-uniform quantization
  scale_factor values are encoded with a fixed number of bits in the side information and a variable number of bits in the main_data stream
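As a rough illustration of the non-uniform quantization named above, the sketch below applies a 3/4 power-law compression before rounding and the matching 4/3 power-law expansion on reconstruction. The function names, the single `step` parameter and the omission of per-band scale factors are simplifying assumptions made here; this is not the standard's full formula.

```python
ROUNDING_OFFSET = 0.0946  # rounding offset as used in reference-style encoders (assumed here)


def quantize(x: float, step: float) -> int:
    """Power-law quantizer: compress |x| / step with exponent 3/4, then round."""
    magnitude = (abs(x) / step) ** 0.75
    index = int(magnitude - ROUNDING_OFFSET + 0.5)  # argument is positive, so int() floors
    return -index if x < 0 else index


def dequantize(index: int, step: float) -> float:
    """Inverse mapping: expand |index| with exponent 4/3 and restore the sign."""
    sign = -1.0 if index < 0 else 1.0
    return sign * abs(index) ** (4.0 / 3.0) * step
```

Because of the 4/3 expansion, the spacing between reconstruction levels grows with magnitude, so large coefficients are quantized more coarsely than small ones.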
MP3 and AAC audio coding standards (3)
Quantization and entropy coding in MP3
  Huffman coding
    34 fixed Huffman codebooks
    Huffman coding region division: each region is coded with a different codebook that best matches the statistics of that region (big_value, count_1, zero, ….)
MP3 and AAC audio coding standards (4)
Quantization and entropy coding in AAC
  Non-uniform quantizer: same as in MP3
  scale_factor values are differentially encoded relative to that of the preceding band, using a fixed Huffman codebook
  Huffman coding
    12 fixed Huffman codebooks
    Huffman coding region division: section boundaries can only be at scale factor band boundaries
    For each section, the length of the section in scale factor bands and the index of the codebook used for that section are transmitted with a fixed number of bits.
Two-nested-loop Search algorithm
[Flowcharts: Inner Loop, Outer Loop]
Two-nested-loop Search algorithm (2)
Problems in TNLS
  Quantization, scale factor adaptation and Huffman coding are considered separately.
  Has no convergence guarantee
  Does not aim to minimize the overall distortion
  Disregards the inter-band correlations of scale factors and Huffman codebook selection in AAC
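For orientation, the two nested loops can be sketched roughly as follows. This is a heavily simplified toy, not the reference encoder: each "band" is a single coefficient, a uniform quantizer replaces the power-law one, and `estimate_bits` is a hypothetical rate model standing in for Huffman coding. The iteration cap `max_outer` is there because, as noted above, the search has no convergence guarantee.

```python
import math


def estimate_bits(indices):
    """Hypothetical rate model: zeros are free, larger magnitudes cost more bits."""
    return sum(0.0 if i == 0 else 1.0 + 2.0 * math.log2(1 + abs(i)) for i in indices)


def inner_loop(spectrum, gains, bit_budget, step=1.0):
    """Rate loop: coarsen the global step until the frame fits the bit budget.

    Terminates for any non-negative budget, since all indices eventually become 0.
    """
    while True:
        indices = [round(x * g / step) for x, g in zip(spectrum, gains)]
        if estimate_bits(indices) <= bit_budget:
            return indices, step
        step *= 2 ** 0.25  # one quantizer step-size increment


def two_nested_loop_search(spectrum, masks, bit_budget, max_outer=32):
    """Distortion loop: amplify bands whose noise exceeds the mask, then redo the rate loop."""
    gains = [1.0] * len(spectrum)
    for _ in range(max_outer):
        indices, step = inner_loop(spectrum, gains, bit_budget)
        noise = [(x - i * step / g) ** 2 for x, i, g in zip(spectrum, indices, gains)]
        violated = [n > m for n, m in zip(noise, masks)]
        if not any(violated):
            break
        gains = [g * 2 ** 0.5 if v else g for g, v in zip(gains, violated)]
    return indices, step, gains
```

Note that the rate loop and the distortion loop each adjust one parameter while ignoring the effect on the other, which is the separation the list above criticizes.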
Rate-distortion optimization for MP3
Problem formulation
  Lagrangian RD cost minimization
    - quantized coefficients
    - scale factors
    - Huffman coding region division
    - Huffman codebook selection
    - non-uniform de-quantizer defined in MP3
    - noise-to-mask ratio
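Written out, a cost of this kind has the usual Lagrangian form. The symbols below are chosen here purely for illustration, following the legend on this slide; they are an assumption and not necessarily the notation used in the talk.

```latex
\min_{y,\,s,\,P,\,H} \; J_\lambda(y, s, P, H)
  \;=\; D\bigl(x,\; Q^{-1}(y, s)\bigr) \;+\; \lambda \, R(y, s, P, H)
% x        : input spectrum of the frame
% y        : quantized coefficients
% s        : scale factors
% P        : Huffman coding region division
% H        : Huffman codebook selection
% Q^{-1}   : non-uniform de-quantizer defined in MP3
% D        : distortion, measured via the noise-to-mask ratio
% R        : total number of bits spent on y and s under P and H
% \lambda  : fixed slope (Lagrangian multiplier), \lambda > 0
```

A larger slope weights rate more heavily and so favors cheaper, more distorted encodings; a smaller slope does the opposite.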
Rate-distortion optimization for MP3 (2)
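Problem formulation
  Soft-decision quantization
In conventional hard-decision quantization, the quantized spectrum is solely determined by the given scale factors, i.e., it is simply the quantizer output for the input spectrum. In the soft-decision quantization scenario, however, the quantized spectrum is treated as a free coding parameter and selected such that the actual RD cost is minimized. Therefore, it need not equal the hard-decision output.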
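The difference can be seen on a single coefficient. In the minimal sketch below, a uniform quantizer and a made-up rate model `rate_bits` are assumptions chosen to keep the example short; only the contrast between the two decision rules is the point.

```python
import math


def rate_bits(index: int) -> float:
    """Hypothetical rate model: larger magnitudes cost more bits."""
    return 1.0 + 2.0 * math.log2(1 + abs(index))


def hard_decision(x: float, step: float) -> int:
    """Nearest reconstruction level, chosen without regard to rate."""
    return round(x / step)


def soft_decision(x: float, step: float, lam: float) -> int:
    """Among indices near x / step, pick the one minimizing distortion + lam * rate."""
    center = round(x / step)
    candidates = range(center - 2, center + 3)
    return min(candidates, key=lambda i: (x - i * step) ** 2 + lam * rate_bits(i))
```

With `lam = 0` the two rules agree; as `lam` grows, the soft decision drifts toward indices that are cheaper to code even though they are further from the input.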
Rate-distortion optimization for MP3 (3)
Fixed-slope graph-based iterative RD optimization
  Step 1: Initialize a set of scale factors from the given frame of spectrum, together with an HCB selection. Set t = 0, and specify a tolerance as the convergence criterion.
  Step 2: Given the current scale factors and HCB selection, for any t ≥ 0, find the optimal quantized spectrum and HCB region division by searching a standard-constrained graph, so that together they achieve the minimum RD cost. Record this minimum cost for iteration t.
Rate-distortion optimization for MP3 (4)
Graph Search for MP3 Quantized Spectrum and Region Division
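Rate-distortion optimization for MP3 (5)
Fixed-slope graph-based iterative RD optimization
  Step 3: Holding fixed the quantized spectrum and region division from Step 2, together with one of the two remaining parameter sets (scale factors or HCB selection), update the other so that the RD cost is minimized.
  Step 4: Holding everything else fixed, including the result of Step 3, update the remaining parameter set so that the RD cost is minimized.
  Step 5: Repeat Steps 2, 3 and 4 for t = 0, 1, 2, …. until the reduction in RD cost between successive iterations is within the tolerance, then output the final quantized spectrum, scale factors, region division and HCB selection.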
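The overall control flow of Steps 2 to 5 is an alternating minimization: each step minimizes the cost over one group of parameters while the others are held fixed, so the cost never increases. A generic sketch is given below; `cost`, `update_steps`, the state representation and `max_iters` are placeholders introduced here, and the real updates (graph search, codebook and scale factor selection) are not reproduced.

```python
def iterate_until_converged(cost, update_steps, state, tolerance, max_iters=100):
    """Cycle through the block updates until the cost drops by no more than `tolerance`."""
    previous = cost(state)
    for _ in range(max_iters):
        for update in update_steps:  # each update minimizes over one parameter group
            state = update(state)
        current = cost(state)
        if previous - current <= tolerance:
            return state, current
        previous = current
    return state, previous


# Toy usage: minimize (a - 1)^2 + (b + 2)^2 + a * b by exact coordinate updates.
def toy_cost(state):
    a, b = state
    return (a - 1) ** 2 + (b + 2) ** 2 + a * b


def update_a(state):
    _, b = state
    return (1 - b / 2, b)  # minimizer over a with b fixed


def update_b(state):
    a, _ = state
    return (a, -2 - a / 2)  # minimizer over b with a fixed


final_state, final_cost = iterate_until_converged(
    toy_cost, [update_a, update_b], (0.0, 0.0), tolerance=1e-12
)
```

The toy cost is convex, so the loop reaches its global minimum; in the audio setting the cost is not convex, which is why only local optimality can be expected.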
Rate-distortion optimization for MP3 (6)
Simulation results: ANMR (implementation based on ISO MP3 reference codec)
[Plots: violin.wav, spme50_1.wav]
Rate-distortion optimization for MP3 (7)
Simulation results: ANMR (implementation based on LAME 3.96.1, best-quality mode)
[Plots: violin.wav, spme50_1.wav]
Rate-distortion optimization for MP3 (8)
Simulation results: ITU listening test (80kb/s)
Rate-distortion optimization for MP3 (9)
Remarks
  The iteration process may only achieve local optimality, so a well-chosen initial state is preferable when one aims for the best possible RD performance.
  The fixed-slope graph-based iterative algorithm we proposed provides a feasible solution to the problems in TNLS.
  One can adaptively adjust the value of the Lagrangian multiplier to meet rate or distortion constraints in real audio compression applications.
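One common way to meet a rate constraint with a fixed-slope encoder is to search over the slope itself. The sketch below uses bisection and assumes that the number of bits produced does not increase as the slope grows; `encode_rate` is a hypothetical callback that runs the encoder at a given slope and returns the bits used, and the initial bracket `lo`, `hi` is likewise an assumption (the rate at `hi` must already be within the target).

```python
def find_slope_for_rate(encode_rate, target_bits, lo=0.0, hi=1000.0, iters=50):
    """Bisection on the slope: a larger slope penalizes rate more, so the rate falls."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if encode_rate(mid) > target_bits:
            lo = mid  # too many bits: the slope must increase
        else:
            hi = mid  # within budget: try a smaller slope for lower distortion
    return hi  # smallest slope found whose rate meets the target


# Toy usage with a made-up, monotonically decreasing rate curve.
def toy_rate(lam):
    return 1000.0 / (1.0 + lam)


slope = find_slope_for_rate(toy_rate, target_bits=100.0)
```

The same search can target a distortion constraint instead by swapping the roles of rate and distortion in the comparison.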
Rate-distortion optimization for AAC
Problem formulation
  Lagrangian RD cost minimization
    - scale factor sequence
    - Huffman codebook index sequence
  First-order inter-band dependency -> dynamic programming (Viterbi algorithm)
Rate-distortion optimization for AAC (2)
Fixed-slope trellis-based RD optimization
  Step 1: Build up the trellis structure. For each state in the trellis, indexed by scale factor band, candidate scale factor and candidate Huffman codebook, find the best quantized coefficients to minimize its decomposed RD cost.
  Step 2: Find the optimal path through the trellis with the Viterbi algorithm.
  Step 3: Backtrack the optimal quantized coefficients, scale factors and Huffman codebook indices as the final output.
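Steps 2 and 3 are a standard Viterbi search with backtracking. A generic sketch follows; the cost tables are placeholders introduced here. In the AAC setting one would expect each state to stand for a (scale factor, codebook) pair in one band, the node cost to be that state's decomposed RD cost from Step 1, and the transition cost to cover the inter-band side information, but that mapping is an assumption and is not spelled out in code.

```python
def viterbi(node_cost, transition_cost):
    """Find the minimum-cost path through a trellis.

    node_cost[i][k]          : cost of being in state k at stage i
    transition_cost(i, j, k) : cost of moving from state j at stage i - 1 to state k at stage i
    """
    num_stages = len(node_cost)
    num_states = len(node_cost[0])
    best = list(node_cost[0])  # best cost of any path ending in each state
    back = []                  # back[i - 1][k] = best predecessor of state k at stage i
    for i in range(1, num_stages):
        new_best, pointers = [], []
        for k in range(num_states):
            j = min(range(num_states), key=lambda p: best[p] + transition_cost(i, p, k))
            new_best.append(best[j] + transition_cost(i, j, k) + node_cost[i][k])
            pointers.append(j)
        best = new_best
        back.append(pointers)
    # Backtrack from the cheapest final state.
    state = min(range(num_states), key=lambda k: best[k])
    total = best[state]
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, total


# Toy usage: three stages, two states, switching states costs 2.
toy_nodes = [[1, 5], [4, 1], [1, 3]]
toy_path, toy_total = viterbi(toy_nodes, lambda i, j, k: 2 * abs(j - k))
```

Because each transition cost depends only on the previous and current state, the search visits every stage once and its cost grows linearly with the number of stages rather than exponentially.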
Rate-distortion optimization for AAC (3)
Trellis Structure for AAC Quantization and Entropy Coding
Rate-distortion optimization for AAC (4)
Simulation results: ANMR
  Implementation based on ISO AAC reference codec
  Also compared with Aggarwal’s approach (Steps 2, 3 only)
  [Plots: violin.wav, spme50_1.wav]
Rate-distortion optimization for AAC (5)
Simulation results: ITU listening test (64kb/s)
Rate-distortion optimization for AAC (6)
Remarks
  The fixed-slope trellis-based algorithm we proposed achieves the globally optimal RD performance within the quantization and entropy coding stage under the AAC standard constraints.
  Joint design of the pre-processing decisions with our proposed optimization can theoretically achieve the globally optimal performance over the entire standard-constrained parameter space, but with computational complexity exponential in the number of bands per frame.
Conclusions and Future Research
Conclusions
  The fixed-slope approach converts the encoding problem into a search problem over a constrained space, which permits the use of efficient sequential search algorithms.
  The soft-decision quantization principle completes our RD optimization frameworks and brings a significant performance enhancement.
  Substantial performance improvement over the state-of-the-art encoders is achieved with complete decoder compatibility in each case.
Conclusions and Future Research (2)
Future research
  Real-time implementations
  Extension to scalable AAC
  Joint pre-processing and optimization for AAC
  Optimal lossy audio compression without syntax constraints
  Optimal settings for transform (e.g. block lengths), quantization (e.g. stepsizes) and prediction
  Joint design of quantization and entropy coding
  ….
Questions?