Voice and Audio Compression for Wireless Communications

by

© L. Hanzo, F.C.A. Somerville, J.P. Woodard, H-T. How
School of Electronics and Computer Science, University of Southampton, UK

Contents

Preface and Motivation 1

Acknowledgements 11

I Speech Signals and Waveform Coding 13

1 Speech Signals and Coding 15
1.1 Motivation of Speech Compression 15
1.2 Basic Characterisation of Speech Signals 16
1.3 Classification of Speech Codecs 20

1.3.1 Waveform Coding 20
1.3.1.1 Time-domain Waveform Coding 21
1.3.1.2 Frequency-domain Waveform Coding 21

1.3.2 Vocoders 22
1.3.3 Hybrid Coding 23

1.4 Waveform Coding 23
1.4.1 Digitisation of Speech 23
1.4.2 Quantisation Characteristics 25
1.4.3 Quantisation Noise and Rate-Distortion Theory 25
1.4.4 Non-uniform Quantisation for a Known PDF: Companding 28
1.4.5 PDF-independent Quantisation using Logarithmic Compression 31

1.4.5.1 The µ-Law Compander 32
1.4.5.2 The A-law Compander 33

1.4.6 Optimum Non-uniform Quantisation 35
1.5 Chapter Summary 39

2 Predictive Coding 41
2.1 Forward Predictive Coding 41
2.2 DPCM Codec Schematic 42
2.3 Predictor Design 43


2.3.1 Problem Formulation 43
2.3.2 Covariance Coefficient Computation 45
2.3.3 Predictor Coefficient Computation 46

2.4 Adaptive One-word-memory Quantization 50
2.5 DPCM Performance 53
2.6 Backward-Adaptive Prediction 55

2.6.1 Background 55
2.6.2 Stochastic Model Processes 57

2.7 The 32 kbps G.721 ADPCM Codec 60
2.7.1 Functional Description of the G.721 Codec 60
2.7.2 Adaptive Quantiser 62
2.7.3 G.721 Quantiser Scale Factor Adaptation 62
2.7.4 G.721 Adaptation Speed Control 63
2.7.5 G.721 Adaptive Prediction and Signal Reconstruction 64

2.8 Speech Quality Evaluation 66
2.9 G.726 and G.727 ADPCM Coding 68

2.9.1 Motivation 68
2.9.2 Embedded G.727 ADPCM Coding 68
2.9.3 Performance of the Embedded G.727 ADPCM Codec 70

2.10 Rate-Distortion in Predictive Coding 74
2.11 Chapter Summary 80

II Analysis by Synthesis Coding 83

3 Analysis-by-synthesis Principles 85
3.1 Motivation 85
3.2 Analysis-by-synthesis Codec Structure 86
3.3 The Short-term Synthesis Filter 87
3.4 Long-Term Prediction 90

3.4.1 Open-loop Optimisation of LTP Parameters 90
3.4.2 Closed-loop Optimisation of LTP Parameters 96

3.5 Excitation Models 100
3.6 Adaptive Post-filtering 102
3.7 Lattice-based Linear Prediction 105
3.8 Chapter Summary 111

4 Speech Spectral Quantization 113
4.1 Log-area Ratios 113
4.2 Line Spectral Frequencies 117

4.2.1 Derivation of the Line Spectral Frequencies 117
4.2.2 Computation of the Line Spectral Frequencies 121
4.2.3 Chebyshev-description of Line Spectral Frequencies 123

4.3 Spectral Vector Quantization 125
4.3.1 Background 125
4.3.2 Speaker-adaptive Vector Quantisation of LSFs 129


4.3.3 Stochastic VQ of LPC Parameters 130
4.3.3.1 Background 131
4.3.3.2 The Stochastic VQ Algorithm 132

4.3.4 Robust Vector Quantisation Schemes for LSFs 134
4.3.5 LSF Vector-quantisers in Standard Codecs 136

4.4 Spectral Quantizers for Wideband Speech Coding 137
4.4.1 Introduction to Wideband Spectral Quantisation 137

4.4.1.1 Statistical Properties of Wideband LSFs 139
4.4.1.2 Speech Codec Specifications 139

4.4.2 Wideband LSF Vector Quantizers 142
4.4.2.1 Memoryless Vector Quantization 142
4.4.2.2 Predictive Vector Quantization 145
4.4.2.3 Multimode Vector Quantization 149

4.4.3 Simulation Results and Subjective Evaluations 152
4.4.4 Conclusions on Wideband Spectral Quantisation 153

4.5 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154

5 RPE Coding 155
5.1 Theoretical Background 155
5.2 The 13 kbps RPE-LTP GSM Speech Encoder 162

5.2.1 Pre-processing 162
5.2.2 STP Analysis Filtering 164
5.2.3 LTP Analysis Filtering 165
5.2.4 Regular Excitation Pulse Computation 165

5.3 The 13 kbps RPE-LTP GSM Speech Decoder 166
5.4 Bit-sensitivity of the GSM Codec 170
5.5 A 'Tool-box' Based Speech Transceiver 171
5.6 Chapter Summary 172

6 Forward-Adaptive CELP Coding 175
6.1 Background 175
6.2 The Original CELP Approach 176
6.3 Fixed Codebook Search 179
6.4 CELP Excitation Models 181

6.4.1 Binary Pulse Excitation 181
6.4.2 Transformed Binary Pulse Excitation 182

6.4.2.1 Excitation Generation 182
6.4.2.2 TBPE Bit Sensitivity 184

6.4.3 Dual-rate Algebraic CELP Coding 187
6.4.3.1 ACELP Codebook Structure 187
6.4.3.2 Dual-rate ACELP Bit Allocation 189
6.4.3.3 Dual-rate ACELP Codec Performance 190

6.5 CELP Optimization 191
6.5.1 Introduction 191
6.5.2 Calculation of the Excitation Parameters 192

6.5.2.1 Full Codebook Search Theory 192


6.5.2.2 Sequential Search Procedure 194
6.5.2.3 Full Search Procedure 195
6.5.2.4 Sub-Optimal Search Procedures 197
6.5.2.5 Quantization of the Codebook Gains 198

6.5.3 Calculation of the Synthesis Filter Parameters 200
6.5.3.1 Bandwidth Expansion 201
6.5.3.2 Least Squares Techniques 201
6.5.3.3 Optimization via Powell's Method 204
6.5.3.4 Simulated Annealing and the Effects of Quantization 205

6.6 CELP Error-sensitivity 209
6.6.1 Introduction 209
6.6.2 Improving the Spectral Information Error Sensitivity 209

6.6.2.1 LSF Ordering Policies 209
6.6.2.2 The Effect of FEC on the Spectral Parameters 211
6.6.2.3 The Effect of Interpolation 212

6.6.3 Improving the Error Sensitivity of the Excitation Parameters 213
6.6.3.1 The Fixed Codebook Index 214
6.6.3.2 The Fixed Codebook Gain 214
6.6.3.3 Adaptive Codebook Delay 215
6.6.3.4 Adaptive Codebook Gain 215

6.6.4 Matching Channel Codecs to the Speech Codec 216
6.6.5 Error Resilience Conclusions 220

6.7 Dual-mode Speech Transceiver 221
6.7.1 The Transceiver Scheme 221
6.7.2 Re-configurable Modulation 222
6.7.3 Source-matched Error Protection 224

6.7.3.1 Low-quality 3.1 kBd Mode 224
6.7.3.2 High-quality 3.1 kBd Mode 228

6.7.4 Packet Reservation Multiple Access 229
6.7.5 3.1 kBd System Performance 231
6.7.6 3.1 kBd System Summary 234

6.8 Multi-slot PRMA Transceiver 235
6.8.1 Background and Motivation 235
6.8.2 PRMA-assisted Multi-slot Adaptive Modulation 235
6.8.3 Adaptive GSM-like Schemes 237
6.8.4 Adaptive DECT-like Schemes 238
6.8.5 Summary of Adaptive Multi-slot PRMA 239

6.9 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240

7 Standard Speech Codecs 241
7.1 Background 241
7.2 The US DoD FS-1016 4.8 kbits/s CELP Codec 241

7.2.1 Introduction 241
7.2.2 LPC Analysis and Quantization 243
7.2.3 The Adaptive Codebook 244
7.2.4 The Fixed Codebook 245


7.2.5 Error Concealment Techniques 246
7.2.6 Decoder Post-Filtering 247
7.2.7 Conclusion 247

7.3 The IS-54 DAMPS Speech Codec 247
7.4 The JDC Speech Codec 251
7.5 The Qualcomm Variable Rate CELP Codec 253

7.5.1 Introduction 253
7.5.2 Codec Schematic and Bit Allocation 254
7.5.3 Codec Rate Selection 255
7.5.4 LPC Analysis and Quantization 256
7.5.5 The Pitch Filter 257
7.5.6 The Fixed Codebook 258
7.5.7 Rate 1/8 Filter Excitation 259
7.5.8 Decoder Post-Filtering 260
7.5.9 Error Protection and Concealment Techniques 260
7.5.10 Conclusion 261

7.6 Japanese Half-Rate Speech Codec 261
7.6.1 Introduction 261
7.6.2 Codec Schematic and Bit Allocation 262
7.6.3 Encoder Pre-Processing 264
7.6.4 LPC Analysis and Quantization 264
7.6.5 The Weighting Filter 265
7.6.6 Excitation Vector 1 265
7.6.7 Excitation Vector 2 266
7.6.8 Channel Coding 266
7.6.9 Decoder Post Processing 268

7.7 The Half-rate GSM Codec 269
7.7.1 Half-rate GSM Codec Outline 269
7.7.2 Half-rate GSM Codec's Spectral Quantisation 271
7.7.3 Error Protection 272

7.8 The 8 kbits/s G.729 Codec 273
7.8.1 Introduction 273
7.8.2 Codec Schematic and Bit Allocation 274
7.8.3 Encoder Pre-Processing 275
7.8.4 LPC Analysis and Quantization 276
7.8.5 The Weighting Filter 278
7.8.6 The Adaptive Codebook 279
7.8.7 The Fixed Algebraic Codebook 280
7.8.8 Quantization of the Gains 283
7.8.9 Decoder Post Processing 284
7.8.10 G.729 Error Concealment Techniques 286
7.8.11 G.729 Bit-sensitivity 287
7.8.12 Turbo-coded OFDM G.729 Speech Transceiver 288

7.8.12.1 Background 288
7.8.12.2 System Overview 288
7.8.12.3 Turbo Channel Encoding 289


7.8.12.4 OFDM in the FRAMES Speech/Data Sub-Burst 290
7.8.12.5 Channel Model 290
7.8.12.6 Turbo-coded G.729 OFDM Parameters 291
7.8.12.7 Turbo-coded G.729 OFDM Performance 292
7.8.12.8 Turbo-coded G.729 OFDM Summary 293

7.8.13 G.729 Summary 295
7.9 The Reduced Complexity G.729 Annex A Codec 295

7.9.1 Introduction 295
7.9.2 The Perceptual Weighting Filter 296
7.9.3 The Open Loop Pitch Search 296
7.9.4 The Closed Loop Pitch Search 296
7.9.5 The Algebraic Codebook Search 297
7.9.6 The Decoder Post Processing 298
7.9.7 Conclusions 298

7.10 The Enhanced Full-rate GSM Codec 298
7.10.1 Codec Outline 298
7.10.2 Operation of the EFR-GSM Encoder 300

7.10.2.1 Spectral Quantisation in the EFR-GSM Codec 300
7.10.2.2 Adaptive Codebook Search 302
7.10.2.3 Fixed Codebook Search 303

7.11 The IS-136 Speech Codec 304
7.11.1 IS-136 Codec Outline 304
7.11.2 IS-136 Bit Allocation Scheme 305
7.11.3 Fixed Codebook Search 307
7.11.4 IS-136 Channel Coding 307

7.12 The ITU G.723.1 Dual-Rate Codec 308
7.12.1 Introduction 308
7.12.2 G.723.1 Encoding Principle 309
7.12.3 Vector-Quantisation of the LSPs 312
7.12.4 Formant-based Weighting Filter 312
7.12.5 The 6.3 kbps High-rate G.723.1 Excitation 313
7.12.6 The 5.3 kbps Low-rate G.723.1 Excitation 314
7.12.7 G.723.1 Bit Allocation 315
7.12.8 G.723.1 Error Sensitivity 317

7.13 Advanced Multi-rate JD-CDMA Transceiver 319
7.13.1 Multi-rate Codecs and Systems 319
7.13.2 System Overview 322
7.13.3 The Adaptive Multi-Rate Speech Codec 323

7.13.3.1 AMR Codec Overview 323
7.13.3.2 Linear Prediction Analysis 324
7.13.3.3 LSF Quantization 324
7.13.3.4 Pitch Analysis 325
7.13.3.5 Fixed Codebook With Algebraic Structure 326
7.13.3.6 Post-Processing 327
7.13.3.7 The AMR Codec's Bit Allocation 327
7.13.3.8 Codec Mode Switching Philosophy 328


7.13.4 The AMR Speech Codec's Error Sensitivity 328
7.13.5 Redundant Residue Number System Based Channel Coding 332

7.13.5.1 Redundant Residue Number System Overview 332
7.13.5.2 Source-Matched Error Protection 333

7.13.6 Joint Detection Code Division Multiple Access 335
7.13.6.1 Overview 335
7.13.6.2 Joint Detection Based Adaptive Code Division Multiple Access 336
7.13.7 System Performance 336

7.13.7.1 Subjective Testing 342
7.13.8 Conclusions 345

7.14 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345

8 Backward-Adaptive CELP Coding 349
8.1 Introduction 349
8.2 Motivation and Background 350
8.3 Backward-Adaptive G728 Schematic 352
8.4 Backward-Adaptive G728 Coding 355

8.4.1 G728 Error Weighting 355
8.4.2 G728 Windowing 355
8.4.3 Codebook Gain Adaption 359
8.4.4 G728 Codebook Search 361
8.4.5 G728 Excitation Vector Quantization 364
8.4.6 G728 Adaptive Postfiltering 366

8.4.6.1 Adaptive Long-term Postfiltering 366
8.4.6.2 G728 Adaptive Short-term Postfiltering 369

8.4.7 Complexity and Performance of the G728 Codec 369
8.5 Reduced-Rate 16-8 kbps G728-Like Codec I 370
8.6 The Effects of Long Term Prediction 373
8.7 Closed-Loop Codebook Training 378
8.8 Reduced-Rate 16-8 kbps G728-Like Codec II 383
8.9 Programmable-Rate 8-4 kbps CELP Codecs 383

8.9.1 Motivation 383
8.9.2 8-4 kbps Codec Improvements 384
8.9.3 8-4 kbps Codecs - Forward Adaption of the STP Synthesis Filter 385
8.9.4 8-4 kbps Codecs - Forward Adaption of the LTP 387

8.9.4.1 Initial Experiments 387
8.9.4.2 Quantization of Jointly Optimized Gains 389
8.9.4.3 8-4 kbps Codecs - Voiced/Unvoiced Codebooks 392

8.9.5 Low Delay Codecs at 4-8 kbits/s 393
8.9.6 Low Delay ACELP Codec 397

8.10 Backward-adaptive Error Sensitivity Issues 400
8.10.1 The Error Sensitivity of the G728 Codec 400
8.10.2 The Error Sensitivity of Our 4-8 kbits/s Low Delay Codecs 401
8.10.3 The Error Sensitivity of Our Low Delay ACELP Codec 406

8.11 A Low-Delay Multimode Speech Transceiver 407


8.11.1 Background 407
8.11.2 8-16 kbps Codec Performance 408
8.11.3 Transmission Issues 410

8.11.3.1 Higher-quality Mode 410
8.11.3.2 Lower-quality Mode 411

8.11.4 Speech Transceiver Performance 411
8.12 Chapter Summary 412

III Wideband Coding and Transmission 413

9 Wideband Speech Coding 415
9.1 Subband-ADPCM Wideband Coding 415

9.1.1 Introduction and Specifications 415
9.1.2 G722 Codec Outline 416
9.1.3 Principles of Subband Coding 419
9.1.4 Quadrature Mirror Filtering 421

9.1.4.1 Analysis Filtering 421
9.1.4.2 Synthesis Filtering 423
9.1.4.3 Practical QMF Design Constraints 425

9.1.5 G722 Adaptive Quantisation and Prediction 431
9.1.6 G722 Coding Performance 433

9.2 Wideband Transform-Coding at 32 kbps 433
9.2.1 Background 433
9.2.2 Transform-Coding Algorithm 433

9.3 Subband-Split Wideband CELP Codecs 437
9.3.1 Background 437
9.3.2 Subband-based Wideband CELP Coding 437

9.3.2.1 Motivation 437
9.3.2.2 Low-band Coding 439
9.3.2.3 Highband Coding 439
9.3.2.4 Bit Allocation Scheme 439

9.4 Fullband Wideband ACELP Coding 440
9.4.1 Wideband ACELP Excitation 440
9.4.2 Wideband 32 kbps ACELP Coding 443
9.4.3 Wideband 9.6 kbps ACELP Coding 444

9.5 Turbo-coded Wideband Speech Transceiver 445
9.5.1 Background and Motivation 445
9.5.2 System Overview 448
9.5.3 System Parameters 449
9.5.4 Constant Throughput Adaptive Modulation 450
9.5.5 Adaptive Wideband Transceiver Performance 451
9.5.6 Multi-mode Transceiver Adaptation 452
9.5.7 Transceiver Mode Switching 454
9.5.8 The Wideband G.722.1 Codec 456

9.5.8.1 Audio Codec Overview 456


9.5.9 Detailed Description of the Audio Codec 457
9.5.10 Wideband Adaptive System Performance 459
9.5.11 Audio Frame Error Results 459
9.5.12 Audio Segmental SNR Performance and Discussions 461
9.5.13 G.722.1 Audio Transceiver Summary and Conclusions 462

9.6 Turbo-Detected IRCC AMR-WB Transceivers 463
9.6.1 Introduction 463
9.6.2 The AMR-WB Codec's Error Sensitivity 465
9.6.3 System Model 466
9.6.4 Design of Irregular Convolutional Codes 467
9.6.5 An Example Irregular Convolutional Code 469
9.6.6 UEP AMR IRCC Performance Results 470
9.6.7 UEP AMR Conclusions 472

9.7 The AMR-WB+ Audio Codec 474
9.7.1 Introduction 474
9.7.2 Audio Requirements in Mobile Multimedia Applications 477

9.7.2.1 Summary of Audio-visual Services 478
9.7.2.2 Bit Rates Supported by the Radio Network 478

9.7.3 Overview of the AMR-WB+ Codec 478
9.7.3.1 Encoding the High Frequencies 482
9.7.3.2 Stereo Encoding 482
9.7.3.3 Complexity of AMR-WB+ 483
9.7.3.4 Transport and File Format of AMR-WB+ 483

9.7.4 Performance of AMR-WB+ 484
9.7.5 Summary of the AMR-WB+ Codec 486

9.8 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 487

10 Advanced Multi-Rate Speech Transceivers 489
10.1 Introduction 489
10.2 The Adaptive Multi-Rate Speech Codec 491

10.2.1 Overview 491
10.2.2 Linear Prediction Analysis 493
10.2.3 LSF Quantization 493
10.2.4 Pitch Analysis 493
10.2.5 Fixed Codebook With Algebraic Structure 495
10.2.6 Post-Processing 496
10.2.7 The AMR Codec's Bit Allocation 496
10.2.8 Codec Mode Switching Philosophy 496

10.3 Speech Codec's Error Sensitivity . . . . . . . . . . 497
10.4 System Background . . . . . . . . . . 501
10.5 System Overview . . . . . . . . . . 503
10.6 Redundant Residue Number System (RRNS) Channel Coding . . . . . . . . . . 504

   10.6.1 Overview . . . . . . . . . . 504
   10.6.2 Source-Matched Error Protection . . . . . . . . . . 505

10.7 Joint Detection Code Division Multiple Access . . . . . . . . . . 508
   10.7.1 Overview . . . . . . . . . . 508


   10.7.2 Joint Detection Based Adaptive Code Division Multiple Access . . . . . . . . . . 508
10.8 System Performance . . . . . . . . . . 509

   10.8.1 Subjective Testing . . . . . . . . . . 518
10.9 A Turbo-Detected Irregular Convolutional Coded AMR Transceiver . . . . . . . . . . 519

   10.9.1 Motivation . . . . . . . . . . 519
   10.9.2 The AMR-WB Codec's Error Sensitivity . . . . . . . . . . 520
   10.9.3 System Model . . . . . . . . . . 520
   10.9.4 Design of Irregular Convolutional Codes . . . . . . . . . . 522
   10.9.5 An Example Irregular Convolutional Code . . . . . . . . . . 524
   10.9.6 UEP AMR IRCC Performance Results . . . . . . . . . . 525
   10.9.7 UEP AMR Conclusions . . . . . . . . . . 527

10.10 Chapter Summary . . . . . . . . . . 529

11 MPEG-4 Audio Compression and Transmission 531
11.1 Overview of MPEG-4 Audio . . . . . . . . . . 531
11.2 General Audio Coding . . . . . . . . . . 533

   11.2.1 Advanced Audio Coding . . . . . . . . . . 541
   11.2.2 Gain Control Tool . . . . . . . . . . 544
   11.2.3 Psychoacoustic Model . . . . . . . . . . 545
   11.2.4 Temporal Noise Shaping . . . . . . . . . . 547
   11.2.5 Stereophonic Coding . . . . . . . . . . 549
   11.2.6 AAC Quantization and Coding . . . . . . . . . . 550
   11.2.7 Noiseless Huffman Coding . . . . . . . . . . 552
   11.2.8 Bit-Sliced Arithmetic Coding . . . . . . . . . . 553
   11.2.9 Transform-domain Weighted Interleaved Vector Quantization . . . . . . . . . . 555
   11.2.10 Parametric Audio Coding . . . . . . . . . . 558

11.3 Speech Coding in MPEG-4 Audio . . . . . . . . . . 559
   11.3.1 Harmonic Vector Excitation Coding . . . . . . . . . . 559
   11.3.2 CELP Coding in MPEG-4 . . . . . . . . . . 562
   11.3.3 LPC Analysis and Quantization . . . . . . . . . . 564
   11.3.4 Multi Pulse and Regular Pulse Excitation . . . . . . . . . . 565

11.4 MPEG-4 Codec Performance . . . . . . . . . . 567
11.5 MPEG-4 Space-Time Block Coded OFDM Audio Transceiver . . . . . . . . . . 569

   11.5.1 System Overview . . . . . . . . . . 571
   11.5.2 System parameters . . . . . . . . . . 571
   11.5.3 Frame Dropping Procedure . . . . . . . . . . 572
   11.5.4 Space-Time Coding . . . . . . . . . . 575
   11.5.5 Adaptive Modulation . . . . . . . . . . 576
   11.5.6 System Performance . . . . . . . . . . 579

11.6 Turbo-Detected STTC Aided MPEG-4 Audio Transceivers . . . . . . . . . . 581
   11.6.1 Motivation and Background . . . . . . . . . . 581
   11.6.2 Audio Turbo Transceiver Overview . . . . . . . . . . 583
   11.6.3 The Turbo Transceiver . . . . . . . . . . 584
   11.6.4 Turbo Transceiver Performance Results . . . . . . . . . . 586
   11.6.5 MPEG-4 Turbo Transceiver Summary . . . . . . . . . . 589

11.7 Turbo-Detected STTC Aided MPEG-4 Versus AMR-WB Transceivers . . . . . . . . . . 590


   11.7.1 Motivation and Background . . . . . . . . . . 590
   11.7.2 The AMR-WB Codec's Error Sensitivity . . . . . . . . . . 591
   11.7.3 The MPEG-4 TwinVQ Codec's Error Sensitivity . . . . . . . . . . 593
   11.7.4 The Turbo Transceiver . . . . . . . . . . 594
   11.7.5 Performance Results . . . . . . . . . . 596
   11.7.6 AMR-WB and MPEG-4 TwinVQ Turbo Transceiver Summary . . . . . . . . . . 599

11.8 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 599

IV Very Low Rate Coding and Transmission 601

12 Overview of Low-rate Speech Coding 603
12.1 Low Bit Rate Speech Coding . . . . . . . . . . 603

   12.1.1 Analysis-by-Synthesis Coding . . . . . . . . . . 605
   12.1.2 Speech Coding at 2.4kbps . . . . . . . . . . 607

      12.1.2.1 Background to 2.4kbps Speech Coding . . . . . . . . . . 608
      12.1.2.2 Frequency Selective Harmonic Coder . . . . . . . . . . 609
      12.1.2.3 Sinusoidal Transform Coder . . . . . . . . . . 610
      12.1.2.4 Multiband Excitation Coders . . . . . . . . . . 611
      12.1.2.5 Subband Linear Prediction Coder . . . . . . . . . . 612
      12.1.2.6 Mixed Excitation Linear Prediction Coder . . . . . . . . . . 613
      12.1.2.7 Waveform Interpolation Coder . . . . . . . . . . 615

   12.1.3 Speech Coding Below 2.4kbps . . . . . . . . . . 616
12.2 Linear Predictive Coding model . . . . . . . . . . 617

   12.2.1 Short Term Prediction . . . . . . . . . . 618
   12.2.2 Long Term Prediction . . . . . . . . . . 619
   12.2.3 Final Analysis-by-Synthesis Model . . . . . . . . . . 620

12.3 Speech Quality Measurements . . . . . . . . . . 620
   12.3.1 Objective Speech Quality Measures . . . . . . . . . . 621
   12.3.2 Subjective Speech Quality Measures . . . . . . . . . . 622
   12.3.3 2.4kbps Selection Process . . . . . . . . . . 622

12.4 Speech Database . . . . . . . . . . 624
12.5 Chapter Summary . . . . . . . . . . 625

13 Linear Predictive Vocoder 629
13.1 Overview of a Linear Predictive Vocoder . . . . . . . . . . 629
13.2 Line Spectrum Frequencies Quantization . . . . . . . . . . 630

   13.2.1 Line Spectrum Frequencies Scalar Quantization . . . . . . . . . . 630
   13.2.2 Line Spectrum Frequencies Vector Quantization . . . . . . . . . . 631

13.3 Pitch Detection . . . . . . . . . . 635
   13.3.1 Voiced-Unvoiced Decision . . . . . . . . . . 637
   13.3.2 Oversampled Pitch Detector . . . . . . . . . . 638
   13.3.3 Pitch Tracking . . . . . . . . . . 641

      13.3.3.1 Computational Complexity . . . . . . . . . . 644
   13.3.4 Integer Pitch Detector . . . . . . . . . . 646

13.4 Unvoiced Frames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 647


13.5 Voiced Frames . . . . . . . . . . 648
   13.5.1 Placement of Excitation Pulses . . . . . . . . . . 648
   13.5.2 Pulse Energy . . . . . . . . . . 649

13.6 Adaptive Postfilter . . . . . . . . . . 649
13.7 Pulse Dispersion Filter . . . . . . . . . . 652

   13.7.1 Pulse Dispersion Principles . . . . . . . . . . 652
   13.7.2 Pitch Independent Glottal Pulse Shaping Filter . . . . . . . . . . 653
   13.7.3 Pitch Dependent Glottal Pulse Shaping Filter . . . . . . . . . . 654

13.8 Results for Linear Predictive Vocoder . . . . . . . . . . 655
13.9 Chapter Summary . . . . . . . . . . 660

14 Wavelets and Pitch Detection 661
14.1 Conceptual Introduction to Wavelets . . . . . . . . . . 661

   14.1.1 Fourier Theory . . . . . . . . . . 661
   14.1.2 Wavelet Theory . . . . . . . . . . 662
   14.1.3 Detecting Discontinuities with Wavelets . . . . . . . . . . 663

14.2 Introduction to Wavelet Mathematics . . . . . . . . . . 664
   14.2.1 Multiresolution Analysis . . . . . . . . . . 665
   14.2.2 Polynomial Spline Wavelets . . . . . . . . . . 666
   14.2.3 Pyramidal Algorithm . . . . . . . . . . 668
   14.2.4 Boundary Effects . . . . . . . . . . 668

14.3 Preprocessing the Wavelet Transform Signal . . . . . . . . . . 669
   14.3.1 Spurious Pulses . . . . . . . . . . 669
   14.3.2 Normalization . . . . . . . . . . 672
   14.3.3 Candidate Glottal Pulses . . . . . . . . . . 672

14.4 Voiced-Unvoiced Decision . . . . . . . . . . 673
14.5 Wavelet Based Pitch Detector . . . . . . . . . . 673

   14.5.1 Dynamic Programming . . . . . . . . . . 674
   14.5.2 Autocorrelation Simplification . . . . . . . . . . 677

14.6 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 681

15 Zinc Function Excitation 683
15.1 Introduction . . . . . . . . . . 683
15.2 Overview of Prototype Waveform Interpolation Zinc Function Excitation . . . . . . . . . . 684

   15.2.1 Coding Scenarios . . . . . . . . . . 684
      15.2.1.1 U-U-U Encoder Scenario . . . . . . . . . . 685
      15.2.1.2 U-U-V Encoder Scenario . . . . . . . . . . 685
      15.2.1.3 V-U-U Encoder Scenario . . . . . . . . . . 687
      15.2.1.4 U-V-U Encoder Scenario . . . . . . . . . . 687
      15.2.1.5 V-V-V Encoder Scenario . . . . . . . . . . 687
      15.2.1.6 V-U-V Encoder Scenario . . . . . . . . . . 687
      15.2.1.7 U-V-V Encoder Scenario . . . . . . . . . . 688
      15.2.1.8 V-V-U Encoder Scenario . . . . . . . . . . 688
      15.2.1.9 U-V Decoder Scenario . . . . . . . . . . 688
      15.2.1.10 U-U Decoder Scenario . . . . . . . . . . 689
      15.2.1.11 V-U Decoder Scenario . . . . . . . . . . 689


      15.2.1.12 V-V Decoder Scenario . . . . . . . . . . 689
15.3 Zinc Function Modelling . . . . . . . . . . 689

   15.3.1 Error Minimization . . . . . . . . . . 690
   15.3.2 Computational Complexity . . . . . . . . . . 691
   15.3.3 Reducing the Complexity of Zinc Function Excitation Optimization . . . . . . . . . . 692
   15.3.4 Phases of the Zinc Functions . . . . . . . . . . 693

15.4 Pitch Detection . . . . . . . . . . 693
   15.4.1 Voiced-Unvoiced Boundaries . . . . . . . . . . 693
   15.4.2 Pitch Prototype Selection . . . . . . . . . . 694

15.5 Voiced Speech . . . . . . . . . . 696
   15.5.1 Energy Scaling . . . . . . . . . . 699
   15.5.2 Quantization . . . . . . . . . . 699

15.6 Excitation Interpolation Between Prototype Segments . . . . . . . . . . 701
   15.6.1 ZFE Interpolation Regions . . . . . . . . . . 701
   15.6.2 ZFE Amplitude Parameter Interpolation . . . . . . . . . . 702
   15.6.3 ZFE Position Parameter Interpolation . . . . . . . . . . 702
   15.6.4 Implicit Signalling of Prototype Zero Crossing . . . . . . . . . . 704
   15.6.5 Removal of ZFE Pulse Position Signalling and Interpolation . . . . . . . . . . 704
   15.6.6 Pitch Synchronous Interpolation of Line Spectrum Frequencies . . . . . . . . . . 705
   15.6.7 ZFE Interpolation Example . . . . . . . . . . 705

15.7 Unvoiced Speech . . . . . . . . . . 705
15.8 Adaptive Postfilter . . . . . . . . . . 705
15.9 Results for Single Zinc Function Excitation . . . . . . . . . . 708
15.10 Error Sensitivity of the 1.9kbps PWI-ZFE Coder . . . . . . . . . . 711

   15.10.1 Parameter Sensitivity of the 1.9kbps PWI-ZFE coder . . . . . . . . . . 711
      15.10.1.1 Line Spectrum Frequencies . . . . . . . . . . 711
      15.10.1.2 Voiced-Unvoiced Flag . . . . . . . . . . 712
      15.10.1.3 Pitch Period . . . . . . . . . . 712
      15.10.1.4 Excitation Amplitude Parameters . . . . . . . . . . 712
      15.10.1.5 Root Mean Square Energy Parameter . . . . . . . . . . 712
      15.10.1.6 Boundary Shift Parameter . . . . . . . . . . 713

   15.10.2 Degradation from Bit Corruption . . . . . . . . . . 713
      15.10.2.1 Error Sensitivity Classes . . . . . . . . . . 713

15.11 Multiple Zinc Function Excitation . . . . . . . . . . 715
   15.11.1 Encoding Algorithm . . . . . . . . . . 715
   15.11.2 Performance of Multiple Zinc Function Excitation . . . . . . . . . . 718

15.12 A Sixth-rate, 3.8 kbps GSM-like Speech Transceiver . . . . . . . . . . 722
   15.12.1 Motivation . . . . . . . . . . 722
   15.12.2 The Turbo-coded Sixth-rate 3.8 kbps GSM-like System . . . . . . . . . . 722
   15.12.3 Turbo Channel Coding . . . . . . . . . . 723
   15.12.4 The Turbo-coded GMSK Transceiver . . . . . . . . . . 724
   15.12.5 System Performance Results . . . . . . . . . . 725

15.13 Chapter Summary . . . . . . . . . . 726


16 Mixed-Multiband Excitation 729
16.1 Introduction . . . . . . . . . . 729
16.2 Overview of Mixed-Multiband Excitation . . . . . . . . . . 731
16.3 Finite Impulse Response Filter . . . . . . . . . . 734
16.4 Mixed-Multiband Excitation Encoder . . . . . . . . . . 735

   16.4.1 Voicing Strengths . . . . . . . . . . 737
16.5 Mixed-Multiband Excitation Decoder . . . . . . . . . . 739

   16.5.1 Adaptive Postfilter . . . . . . . . . . 741
   16.5.2 Computational Complexity . . . . . . . . . . 741

16.6 Performance of the Mixed-Multiband Excitation Coder . . . . . . . . . . 743
   16.6.1 Performance of a Mixed-Multiband Excitation Linear Predictive Coder . . . . . . . . . . 743
   16.6.2 Performance of a Mixed-Multiband Excitation and Zinc Function Prototype Excitation Coder . . . . . . . . . . 748
16.7 A Higher Rate 3.85kbps Mixed-Multiband Excitation Scheme . . . . . . . . . . 751
16.8 A 2.35 kbit/s Joint-Detection CDMA Speech Transceiver . . . . . . . . . . 754

   16.8.1 Background . . . . . . . . . . 754
   16.8.2 The Speech Codec's Bit Allocation . . . . . . . . . . 754
   16.8.3 The Speech Codec's Error Sensitivity . . . . . . . . . . 754
   16.8.4 Channel Coding . . . . . . . . . . 755
   16.8.5 The JD-CDMA Speech System . . . . . . . . . . 756
   16.8.6 System performance . . . . . . . . . . 757
   16.8.7 Conclusions on the JD-CDMA Speech Transceiver . . . . . . . . . . 758

16.9 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 758

17 Sinusoidal Transform Coding Below 4kbps 761
17.1 Introduction . . . . . . . . . . 761
17.2 Sinusoidal Analysis of Speech Signals . . . . . . . . . . 762

   17.2.1 Sinusoidal Analysis with Peak Picking . . . . . . . . . . 762
   17.2.2 Sinusoidal Analysis using Analysis-by-Synthesis . . . . . . . . . . 763

17.3 Sinusoidal Synthesis of Speech Signals . . . . . . . . . . 764
   17.3.1 Frequency, Amplitude and Phase Interpolation . . . . . . . . . . 764
   17.3.2 Overlap-Add Interpolation . . . . . . . . . . 765

17.4 Low Bit Rate Sinusoidal Coders . . . . . . . . . . 765
   17.4.1 Increased Frame Length . . . . . . . . . . 768
   17.4.2 Incorporating Linear Prediction Analysis . . . . . . . . . . 768

17.5 Incorporating Prototype Waveform Interpolation . . . . . . . . . . 769
17.6 Encoding the Sinusoidal Frequency Component . . . . . . . . . . 770
17.7 Determining the Excitation Components . . . . . . . . . . 773

   17.7.1 Peak-Picking of the Residual Spectra . . . . . . . . . . 773
   17.7.2 Analysis-by-Synthesis of the Residual Spectrum . . . . . . . . . . 773
   17.7.3 Computational Complexity . . . . . . . . . . 775
   17.7.4 Reducing the Computational Complexity . . . . . . . . . . 775

17.8 Quantizing the Excitation Parameters . . . . . . . . . . 779
   17.8.1 Encoding the Sinusoidal Amplitudes . . . . . . . . . . 779

      17.8.1.1 Vector Quantization of the Amplitudes . . . . . . . . . . 780
      17.8.1.2 Interpolation and Decimation . . . . . . . . . . 780


      17.8.1.3 Vector Quantization . . . . . . . . . . 781
      17.8.1.4 Vector Quantization Performance . . . . . . . . . . 782
      17.8.1.5 Scalar Quantization of the Amplitudes . . . . . . . . . . 783

   17.8.2 Encoding the Sinusoidal Phases . . . . . . . . . . 785
      17.8.2.1 Vector Quantization of the Phases . . . . . . . . . . 785
      17.8.2.2 Encoding the Phases with a Voiced-Unvoiced Switch . . . . . . . . . . 785

   17.8.3 Encoding the Sinusoidal Fourier Coefficients . . . . . . . . . . 786
      17.8.3.1 Equivalent Rectangular Bandwidth Scale . . . . . . . . . . 786

   17.8.4 Voiced-Unvoiced Flag . . . . . . . . . . 787
17.9 Sinusoidal Transform Decoder . . . . . . . . . . 788

   17.9.1 Pitch Synchronous Interpolation . . . . . . . . . . 788
      17.9.1.1 Fourier Coefficient Interpolation . . . . . . . . . . 789

   17.9.2 Frequency Interpolation . . . . . . . . . . 789
   17.9.3 Computational Complexity . . . . . . . . . . 789

17.10 Speech Coder Performance . . . . . . . . . . 790
17.11 Chapter Summary . . . . . . . . . . 796

18 Conclusions on Low Rate Coding 797
18.1 Summary . . . . . . . . . . 797
18.2 Listening Tests . . . . . . . . . . 798
18.3 Summary of Very Low Rate Coding . . . . . . . . . . 799
18.4 Further Research . . . . . . . . . . 801

19 Comparison of Speech Transceivers 803
19.1 Background to Speech Quality Evaluation . . . . . . . . . . 803
19.2 Objective Speech Quality Measures . . . . . . . . . . 804

   19.2.1 Introduction . . . . . . . . . . 804
   19.2.2 Signal to Noise Ratios . . . . . . . . . . 805
   19.2.3 Articulation Index . . . . . . . . . . 805
   19.2.4 Cepstral Distance . . . . . . . . . . 806
   19.2.5 Cepstral Example . . . . . . . . . . 809
   19.2.6 Logarithmic likelihood ratio . . . . . . . . . . 811
   19.2.7 Euclidean Distance . . . . . . . . . . 812

19.3 Subjective Measures . . . . . . . . . . 812
   19.3.1 Quality Tests . . . . . . . . . . 813

19.4 Comparison of Quality Measures . . . . . . . . . . 814
   19.4.1 Background . . . . . . . . . . 814
   19.4.2 Intelligibility tests . . . . . . . . . . 815

19.5 Subjective Speech Quality of Various Codecs . . . . . . . . . . 816
19.6 Speech Codec Bit-sensitivity . . . . . . . . . . 818
19.7 Transceiver Speech Performance . . . . . . . . . . 818
19.8 Chapter Summary . . . . . . . . . . 825

A Constructing the Quadratic Spline Wavelets 827

B Zinc Function Excitation 831


C Probability Density Function for Amplitudes 837

Bibliography 843

Index 887

Author Index 887


Preface and Motivation

The Speech Coding Scene

Despite the emergence of sophisticated high-rate multimedia services, voice communications remain the predominant means of human communications, although the compressed voice signals may be delivered via the Internet. The large-scale, pervasive introduction of wireless Internet services is likely to promote the unified transmission of both voice and data signals using the Voice over Internet Protocol (VoIP) even in third-generation (3G) wireless systems, despite wasting much of the valuable frequency resources on the transmission of packet headers. Even when the predicted surge of wireless data and Internet services becomes a reality, voice will remain the most natural means of human communication, although it may be delivered via the Internet.

This book is dedicated to audio and voice compression issues, although the aspects of error resilience, coding delay, implementational complexity and bitrate are also at the centre of our discussions, characterising many different speech codecs incorporated in source-sensitivity matched wireless transceivers. A unique feature of the book is that it also provides cutting-edge turbo-transceiver-aided research-oriented design examples and a chapter on the VoIP protocol.

Here we attempt a rudimentary comparison of some of the codec schemes treated in the book in terms of their speech quality and bitrate, in order to provide a road map for the reader with reference to Cox's work [1, 2]. The formally evaluated Mean Opinion Score (MOS) values of the various codecs portrayed in the book are shown in Figure 1.

Observe in the figure that over the years a range of speech codecs have emerged which attained the quality of the 64 kbps G.711 PCM speech codec, although at the cost of significantly increased coding delay and implementational complexity. The 8 kbps G.729 codec is the most recent addition to this range of the International Telecommunication Union's (ITU) standard schemes, which significantly outperforms all previous standard ITU codecs in robustness terms. The performance target of the 4 kbps ITU codec (ITU4) is also to maintain this impressive set of specifications. The family of codecs designed for various mobile radio systems - such as the 13 kbps Regular Pulse Excited (RPE) scheme of the Global System of Mobile communications known as GSM, the 7.95 kbps IS-54 and the IS-95 Pan-American schemes, the 6.7 kbps Japanese Digital Cellular (JDC) codec and the 3.45 kbps half-rate JDC arrangement (JDC/2) - exhibits slightly lower MOS values than the ITU codecs. Let us now consider the subjective quality of these schemes in a little more depth.

The 2.4 kbps US Department of Defence Federal Standard codec known as FS-1015 is the only vocoder in this group and it has a rather synthetic speech quality, associated with the lowest subjective assessment in the figure. The 64 kbps G.711 PCM codec and the G.726/G.727 Adaptive Differential PCM (ADPCM) schemes are waveform codecs. They exhibit a low implementational complexity associated with a modest bitrate economy. The remaining codecs belong to the so-called hybrid coding family and achieve significant bitrate economies at the cost of increased complexity and delay.

[Figure: formally evaluated MOS values on a Poor-Fair-Good-Excellent scale plotted against bit rate (2-128 kb/s) for the G.711 PCM, G.726, G.728, G.729, G.723, ITU4, GSM, IS54, IS96, JDC, JDC/2, In-M, MELP, FS1016 and FS1015 codecs, annotated with "New Research", "Complexity" and "Delay" labels.]

Figure 1: Subjective speech quality of various codecs [1] ©IEEE, 1996

Specifically, the 16 kbps G.728 backward-adaptive scheme maintains a similar speech quality to the 32 and 64 kbps waveform codecs, while also maintaining an impressively low 2 ms delay. This scheme was standardised during the early nineties. The similar-quality, but significantly more robust 8 kbps G.729 codec was approved in March 1996 by the ITU. Its standardisation overlapped with the G.723.1 codec developments. The G.723.1 codec's 6.4 kbps mode maintains a speech quality similar to that of the G.711, G.726, G.727 and G.728 codecs, while its 5.3 kbps mode exhibits a speech quality similar to the cellular speech codecs of the late eighties. The standardisation of a 4 kbps ITU scheme, which we refer to here as ITU4, remains a desirable design goal at the time of writing.

In parallel to the ITU's standardisation activities, a range of speech coding standards have been proposed for regional cellular mobile systems. The standardisation of the 13 kbps RPE-LTP full-rate GSM (GSM-FR) codec dates back to the second half of the eighties, representing the first standard hybrid codec. Its complexity is significantly lower than that of the more recent Code Excited Linear Predictive (CELP) based codecs. Observe in the figure that there is also a similar-rate Enhanced Full-Rate GSM codec (GSM-EFR), which matches the speech quality of the G.729 and G.728 schemes. The original GSM-FR codec's development was followed a little later by the release of the 7.95 kbps Vector Sum Excited Linear Predictive (VSELP) IS-54 American cellular standard. Due to advances in the field, the 7.95 kbps IS-54 codec achieved a similar subjective speech quality to the 13 kbps GSM-FR scheme. The definition of the 6.7 kbps Japanese JDC VSELP codec was almost coincident with that of the IS-54 arrangement. This codec development was also followed by a half-rate standardisation process, leading to the 3.2 kbps Pitch-Synchronous Innovation CELP (PSI-CELP) scheme.

The IS-95 Pan-American CDMA system also has its own standardised CELP-based speech codec, which is a variable-rate scheme supporting bitrates between 1.2 and 14.4 kbps, depending on the prevalent voice activity. The perceived speech quality of these cellular speech codecs, contrived mainly during the late eighties, was found to be subjectively similar under the perfect channel conditions of Figure 1. Lastly, the 5.6 kbps half-rate GSM codec (GSM-HR) also met its specification in terms of achieving a similar speech quality to the 13 kbps original GSM-FR arrangement, although at the cost of quadruple complexity and higher latency.

Recently the advantages of intelligent multimode speech terminals (IMT), which can reconfigure themselves in a number of different bitrate, quality and robustness modes, have attracted substantial research attention in the community, which led to the standardisation of the High-Speed Downlink Packet Access (HSDPA) mode of the 3G wireless systems. The HSDPA-style transceivers employ both adaptive modulation and adaptive channel coding, which result in a channel-quality dependent bitrate fluctuation, hence requiring reconfigurable multimode voice and audio codecs, such as the Advanced Multi-Rate codec referred to as the AMR scheme. Following the standardisation of the narrowband AMR codec, the wideband AMR scheme, referred to as the AMR-WB arrangement and encoding the 0-7 kHz band, was also developed, and it will also be characterised in the book. Finally, the most recent AMR codec, namely the so-called AMR-WB+ scheme, will also be the subject of our discussions.

Recent research on sub-2.4 kbps speech codecs, where the aspects of auditory masking become more dominant, is also covered extensively in the book. Finally, since the classic G.722 subband-ADPCM based wideband codec has become obsolete in the light of exciting new developments in compression, the most recent trend is to consider wideband speech and audio codecs, providing substantially enhanced speech quality. Motivated by early seminal work on transform-domain or frequency-domain based compression by Noll and his colleagues, in this field the wideband G.722.1 codec - which can be programmed to operate between 10 kbps and 32 kbps and hence lends itself to employment in HSDPA-style near-instantaneously adaptive wireless communicators - is the most attractive candidate. This codec is portrayed in the book in the context of a sophisticated burst-by-burst adaptive wideband turbo-coded Orthogonal Frequency Division Multiplex (OFDM) IMT. This scheme is also capable of transmitting high-quality audio signals, behaving essentially as a high-quality waveform codec.

Milestones in Speech Coding History

Over the years a range of excellent monographs and textbooks have been published, characterising the state-of-the-art at various stages of its development and constituting significant milestones. The first major development in the history of speech compression can be considered the invention of the vocoder, dating back to as early as 1939. Delta modulation was contrived in 1952 and later became well established following Steele's monograph on the topic in 1975 [3]. Pulse Code Modulation (PCM) was first documented in detail in Cattermole's classic contribution in 1969 [4]. However, it was realised in 1967 that predictive coding provides advantages over memoryless coding techniques, such as PCM. Predictive techniques were analysed in depth by Markel and Gray in their 1976 classic treatise [5]. This was shortly followed by the often-cited reference [6] by Rabiner and Schafer. Lindblom and Öhman also contributed a book in 1979 on speech communication research [7].

The foundations of auditory theory were laid down as early as 1970 by Tobias [8], but these principles were not exploited to their full potential until the invention of analysis-by-synthesis (AbS) codecs, which were heralded by Atal's multi-pulse excited codec in the early eighties [9]. The waveform coding of speech and video signals was comprehensively documented by Jayant and Noll in their 1984 monograph [10]. During the eighties speech codec developments were fuelled by the emergence of mobile radio systems, where spectrum was a scarce resource, potentially doubling the number of subscribers and hence the revenue if the bit rate could be halved.

The RPE principle - as a relatively low-complexity analysis-by-synthesis technique - was proposed by Kroon, Deprettere and Sluyter in 1986 [11], and was followed by further research conducted by Vary [12, 13] and his colleagues at PKI in Germany and IBM in France, leading to the 13 kbps Pan-European GSM codec. This was the first standardised AbS speech codec, which also employed long-term prediction (LTP), recognising the important role that pitch determination plays in efficient speech compression [14, 15]. It was in this era that Atal and Schroeder invented the Code Excited Linear Predictive (CELP) principle [16], leading to perhaps the most productive period in the history of speech coding during the eighties. Some of these developments were also summarised, for example, by O'Shaughnessy [17], Papamichalis [18], and Deller, Proakis and Hansen [19].

It was during this era that the importance of speech perception and acoustic phonetics [20] was duly recognised, for example in the monograph by Lieberman and Blumstein. A range of associated speech quality measures were summarised by Quackenbush, Barnwell III and Clements [21]. Nearly concomitantly, Furui also published a book related to speech processing [22]. This period witnessed the appearance of many of the speech codecs seen in Figure 1, which found applications in the emerging global mobile radio systems, such as IS-54, JDC, etc. These codecs were typically associated with source-sensitivity matched error protection, where for example Steele, Sundberg and Wong [23-26] provided early insights on the topic. Further sophisticated solutions were suggested, for example, by Hagenauer [27].

Both the narrowband and wideband AMR codecs, as well as the AMR-WB+ codec [28, 29], are capable of adaptively adjusting their bit rate. This also allows the user to adjust the ratio between the speech bit rate and the channel coding bit rate constituting the error-protection-oriented redundancy according to the prevalent near-instantaneous channel conditions in HSDPA-style transceivers. When the channel quality is inferior, the speech encoder operates at low bit rates, thus accommodating more powerful forward error control within the total bit rate budget. By contrast, under high-quality channel conditions the speech encoder may benefit from using the total bit rate budget, yielding high speech quality, since in this high-rate case low-redundancy error protection is sufficient. Thus, the AMR concept allows the system to operate in an error-resilient mode under poor channel conditions, while benefitting from a better speech quality under good channel conditions. Hence, the source coding scheme must be designed for seamless switching between the available rates without annoying artifacts.


Overview of MPEG-4 Audio

The Moving Picture Experts Group (MPEG) was first established by the International Standards Organisation (ISO) in 1988 with the aim of developing a full audio-visual coding standard referred to as MPEG-1 [30-32]. The audio-related section of MPEG-1 was designed to encode digital stereo sound at a total bit rate of 1.4 to 1.5 Mbps - depending on the sampling frequency, which was 44.1 kHz or 48 kHz - down to a few hundred kilobits per second [33]. The MPEG-1 standard is structured in layers, from Layer I to III. The higher layers achieve a higher compression ratio, albeit at an increased complexity. Layer I achieves perceptual transparency, i.e. subjective equivalence with the uncompressed original audio signal, at 384 kbit/s, while Layers II and III achieve a similar subjective quality at 256 kbit/s and 192 kbit/s, respectively [34-38].

MPEG-1 was approved in November 1992 and its Layer I and II versions were immediately employed in practical systems. However, MPEG Audio Layer III - MP3 for short - only became a practical reality a few years later, when multimedia PCs having improved processing capabilities were introduced and the emerging Internet sparked off a proliferation of MP3-compressed teletraffic. This changed the face of the music world and the distribution of music. The MPEG-2 backward-compatible audio standard was approved in 1994 [39], providing an improved technology that would allow those who had already launched MPEG-1 stereo audio services to upgrade their system to multichannel mode, optionally also supporting a higher number of channels at a higher compression ratio. Potential applications of the multichannel mode are in the field of quadraphonic music distribution or cinemas. Furthermore, lower sampling frequencies were also incorporated, which include 16, 22.05, 24, 32, 44.1 and 48 kHz [39]. Concurrently, MPEG commenced research into even higher-compression schemes, relinquishing the backward compatibility requirement, which resulted in the MPEG-2 Advanced Audio Coding (AAC) standard in 1997 [40]. This allows those who are not constrained by legacy systems to benefit from an improved multichannel coding scheme. In conjunction with AAC, it is possible to achieve perceptually transparent stereo quality at 128 kbit/s and transparent multichannel quality at 320 kbit/s, for example in cinema-type applications.

The MPEG-4 audio recommendation is the latest standard, completed in 1999 [41-45], which offers, in addition to compression, further unique features that will allow users to interact with the information content at a significantly higher level of sophistication than is possible today. In terms of compression, MPEG-4 supports the encoding of speech signals at bit rates from 2 kbit/s up to 24 kbit/s. For the coding of general audio, ranging from very low bit rates up to high quality, a wide range of bit rates and bandwidths is supported, from a bit rate of 8 kbit/s and a bandwidth below 4 kHz up to broadcast-quality audio, including monaural representations up to multichannel configurations.

The MPEG-4 audio codec includes coding tools from several different encoding families, covering parametric speech coding, CELP-based speech coding and Time/Frequency (T/F) audio coding, which are characterised in Figure 2. It can be observed that a parametric coding scheme, namely Harmonic Vector eXcitation Coding (HVXC), was selected for covering the bit rate range from 2 to 4 kbit/s. For bit rates between 4 and 24 kbit/s, a CELP coding scheme was chosen for encoding narrowband and wideband speech signals. For encoding general audio signals at bit rates between 8 and 64 kbit/s, a time/frequency coding scheme based on the MPEG-2 AAC standard [40], endowed with additional tools, is used. Here, a combination of different techniques was established, because it was found that maintaining the required performance for representing speech and music signals at all desired bit rates cannot be achieved by selecting a single coding architecture. A major objective of the MPEG-4 audio encoder is to reduce the bit rate while maintaining a sufficiently high flexibility in terms of bit rate selection. The MPEG-4 codec also offers other new functionalities, which include bit rate scalability, object-based representation - of a specific audio passage played by a certain instrument, for example - robustness against transmission errors and support for special audio effects.

Figure 2: MPEG-4 framework [41]. (The original chart maps bit rate in kbps - from 2 to 64 kbit/s - and typical audio bandwidths of 4, 8 and 20 kHz to the parametric (HVXC and HILN), CELP, scalable and T/F codecs, alongside the ITU-T codecs and typical applications such as secure communications, satellite, UMTS/cellular, ISDN and the Internet.)
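As a side note, the bit-rate-driven tool selection described above can be sketched in a few lines of code. This is purely illustrative: the function name and the handling of the overlapping range boundaries are our own assumptions, not part of the MPEG-4 standard.

```python
def select_mpeg4_tool(bit_rate_kbps: float, is_speech: bool) -> str:
    """Map a target bit rate to an MPEG-4 coding tool family, following the
    ranges quoted in the text (2-4 kbit/s HVXC, 4-24 kbit/s CELP,
    8-64 kbit/s T/F).  Boundary handling is an assumption of this sketch."""
    if is_speech and 2 <= bit_rate_kbps < 4:
        return "HVXC"   # parametric speech coding
    if is_speech and 4 <= bit_rate_kbps <= 24:
        return "CELP"   # narrowband/wideband speech coding
    if 8 <= bit_rate_kbps <= 64:
        return "T/F"    # time/frequency coding, based on MPEG-2 AAC
    raise ValueError("bit rate outside the ranges discussed in the text")
```

For instance, a 3 kbit/s speech stream maps to HVXC, a 12 kbit/s speech stream to CELP, and a 32 kbit/s music stream to the T/F coder.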

MPEG-4 consists of Versions 1 and 2. Version 1 [41] contains the main body of the standard, while Version 2 [46] provides further enhancement tools and functionalities, which include increased robustness against transmission errors and error protection, low-delay audio coding, finely grained bit rate scalability using the Bit-Sliced Arithmetic Coding (BSAC) tool, the employment of parametric audio coding, the use of the CELP-based silence compression tool and the 4 kbit/s extended variable bit rate mode of the HVXC tool. Due to the vast amount of information contained in the MPEG-4 standard, we will only consider some of its audio compression components, which include the coding of natural speech and audio signals. Readers who are specifically interested in text-to-speech synthesis or synthetic audio issues are referred to the MPEG-4 standard [41] and to the contributions by Scheirer et al. [47, 48] for further information. Most of the material in this chapter will be based on an amalgam of References [34-38, 40, 41, 43, 44, 46, 49]. In the next few sections, the operation of each component of the MPEG-4 audio codec will be highlighted in greater detail. As an application example, we will employ the Transform-domain Weighted Interleaved Vector Quantization (TWINVQ) coding tool, which is one of the MPEG-4 audio codecs, in the context of a wireless audio transceiver in conjunction with space-time coding [50] and various Quadrature Amplitude Modulation (QAM) schemes [51]. The audio transceiver is introduced in Section 11.5 and its performance is discussed in Section 11.5.6.

Motivation and Outline of the Book

During the early 1990s Atal, Cuperman and Gersho [52] edited prestigious contributions on speech compression. Ince [53] also contributed a book related to the topic in 1992. Anderson and Mohan co-authored a monograph on source and channel coding in 1993 [54]. Research-oriented developments were then consolidated in Kondoz's excellent monograph in 1994 [55] and in the multi-authored contribution edited by Kleijn and Paliwal [56] in 1995. The most recent addition to the above range of contributions is the second edition of O'Shaughnessy's well-referenced book cited above. However, at the time of writing no book spans the entire history of speech and audio compression, which is the goal of this volume.

Against this backcloth, this book endeavours to review the recent history of speech compression and communications in the era of wireless turbo-transceivers and joint source/channel coding. We attempt to provide the reader with a historical perspective, commencing with a rudimentary introduction to communications aspects, since throughout the book we illustrate the expected performance of the various speech codecs studied also in the context of jointly optimised wireless transceivers.

The book is constituted by four parts. Parts I and II cover classic background material on speech signals, predictive waveform codecs and analysis-by-synthesis codecs, as well as the entire speech and audio coding standardisation scene. The bulk of the book is constituted by the research-oriented Parts III and IV, covering both standardised and proprietary speech codecs - including the most recent AMR-WB+ and MPEG-4 audio codecs - as well as cutting-edge wireless turbo transceivers.

Specifically, Chapters 1 and 2 of Part I provide a rudimentary introduction to speech signals and classic waveform coding, as well as to predictive coding, respectively, quantifying the overall performance of the various speech codecs, in order to render our treatment of the topics as self-contained and all-encompassing as possible.

Part II of the book is centred around analysis-by-synthesis based coding, reviewing the classic principles in Chapter 3 as well as both narrowband and wideband spectral envelope quantisation in Chapter 4. RPE and CELP coding are the topics of Chapters 5 and 6, which are followed by a detailed chapter on the entire plethora of existing forward-adaptive standardised CELP codecs in Chapter 7 and on their associated source-sensitivity matched channel coding schemes. The subject of Chapter 8 is both proprietary and standard backward-adaptive CELP codecs, and it is concluded with a system design example based on a low-delay, multimode wireless transceiver.

The research-oriented Part III of the book is dedicated to a range of standard and proprietary wideband coding techniques and wireless systems. As an introduction to the wideband coding scene, in Chapter 9 the classic subband-based G.722 wideband codec is reviewed first, leading to the discussion of numerous low-rate wideband voice and audio codecs. Chapter 9


Figure 3: Important milestones in the development of perceptual audio coding. (The original timeline spans 1940 to 1999, juxtaposing algorithms and techniques - from Fletcher's auditory patterns, Zwicker's and Greenwood's critical bands and the masking studies of Scharf, Hellman and Schroeder, through the polyphase and pseudo-quadrature mirror filter banks, time-domain aliasing cancellation and Malvar's modified discrete cosine transform, to perceptual transform coding, M/S and intensity stereo coding, window switching, backward adaptive prediction, temporal noise shaping, TWINVQ, BSAC, parametric audio coding and sinusoidal+transients+noise coding - with standards and commercial codecs, including the CNET codec, Dolby AC-2 and AC-3, MPEG-1 Audio, MPEG-2 backward-compatible audio, MPEG-2 AAC, MPEG-4 Versions 1 and 2, Sony's MiniDisc ATRAC, Philips' DCC, NTT's TWINVQ and AT&T's PAC.)


also contains diverse sophisticated wireless voice- and audio-system design examples, including a turbo-coded Orthogonal Frequency Division Multiplex (OFDM) wideband audio system design study. This is followed by a wideband voice transceiver application example using the AMR-WB codec, a source-sensitivity matched Irregular Convolutional Code (IRCC) and Extrinsic Information Transfer (EXIT) charts for achieving a near-capacity system performance. Chapter 9 is concluded with the portrayal of the AMR-WB+ codec. In Chapter 10 of Part III we detail the principles behind the MPEG-4 codec and comparatively study the performance of the MPEG-4 and AMR-WB audio/speech codecs combined with various sophisticated wireless transceivers. Amongst others, a jointly optimised turbo transceiver is investigated, combining source coding, outer unequal-protection Non-Systematic Convolutional (NSC) channel coding, inner Trellis Coded Modulation (TCM) and spatial-diversity-aided Space-Time Trellis Coding (STTC). The employment of TCM provided further error protection without expanding the bandwidth of the system, and by utilising STTC spatial diversity was attained, which rendered the error statistics experienced pseudo-random, as required by the TCM scheme, since it was designed for Gaussian channels inflicting randomly dispersed channel errors. Finally, the performance of the STTC-TCM-2NSC scheme was enhanced with the advent of an efficient iterative joint decoding structure.

Chapters 11-17 of Part IV are all dedicated to sub-4 kbps codecs and their wireless transceivers, while Chapter 18 is devoted to speech quality evaluation techniques, as well as to a rudimentary comparison of various speech codecs and transceivers. The last chapter of the book is on VoIP.

This book is naturally limited in terms of its coverage of these aspects, simply owing to space limitations. We endeavoured, however, to provide the reader with a broad range of application examples, which are pertinent to a range of typical wireless transmission scenarios.

Our hope is that the book offers you - the reader - a range of interesting topics, portraying the current state-of-the-art in the associated enabling technologies. In simple terms, finding a specific solution to a voice communications problem has to be based on a compromise in terms of the inherently contradictory constraints of speech quality, bit rate, delay, robustness against channel errors, and the associated implementational complexity. Analysing these trade-offs and proposing a range of attractive solutions to various voice communications problems is the basic aim of this book.

Again, it is our hope that the book underlines the range of contradictory system design trade-offs in an unbiased fashion and that you will be able to glean information from it in order to solve your own particular wireless voice communications problem, but most of all that you will find it an enjoyable and relatively effortless read, providing you - the reader - with intellectual stimulation.

Lajos Hanzo
Clare Somerville
Jason Woodard


Acknowledgements

The book was conceived in the Electronics and Computer Science Department at the University of Southampton, although Dr. Somerville and Dr. Woodard have moved on in the meantime. We are indebted to our many colleagues who have enhanced our understanding of the subject, in particular to Prof. Emeritus Raymond Steele. These colleagues and valued friends, too numerous all to be mentioned, have influenced our views concerning various aspects of wireless multimedia communications and we thank them for the enlightenment gained from our collaborations on various projects, papers and books. We are grateful to Jan Brecht, Jon Blogh, Marco Breiling, Marco del Buono, Sheng Chen, Stanley Chia, Byoung Jo Choi, Joseph Cheung, Peter Fortune, Sheyam Domeya, Lim Dongmin, Dirk Didascalou, Stephan Ernst, Eddie Green, David Greenwood, Hee Thong How, Thomas Keller, Ee-Lin Kuan, Joerg Kliewer, W.H. Lam, C.C. Lee, M.A. Nofal, Xiao Lin, Chee Siong Lee, Tong-Hooi Liew, Soon-Xin Ng, Matthias Muenster, Noor Othman, Vincent Roger-Marchart, Redwan Salami, David Stewart, Jeff Torrance, Spiros Vlahoyiannatos, Jin Wang, William Webb, John Williams, Jason Woodard, Choong Hin Wong, Henry Wong, James Wong, Lie-Liang Yang, Bee-Leong Yeap, Mong-Suan Yee, Kai Yen, Andy Yuen and many others with whom we enjoyed an association.

We also acknowledge our valuable associations with the Virtual Centre of Excellence in Mobile Communications, in particular with its Chief Executives, Dr. Tony Warwick and Dr. Walter Tuttlebee, Dr. Keith Baughan and other members of its Executive Committee, and with Professors Hamid Aghvami, Mark Beach, John Dunlop, Barry Evans, Joe McGeehan, Steve MacLaughlin and Rahim Tafazolli. Our sincere thanks are also due to John Hand and Nafeesa Simjee, the EPSRC, UK; Dr. Joao Da Silva, Dr. Jorge Pereira, Bartholome Arroyo, Bernard Barani, Demosthenes Ikonomou and other colleagues from the Commission of the European Communities, Brussels, Belgium; and Andy Wilton, Luis Lopes and Paul Crichton from Motorola ECID, Swindon, UK, for sponsoring some of our recent research.

We feel particularly indebted to Hee Thong How for his invaluable contributions to the book by co-authoring some of the chapters, and to Rita Hanzo as well as Denise Harvey for their skilful assistance in typesetting the manuscript in LaTeX. Similarly, our sincere thanks are due to Mark Hammond, Jennifer Beal, Sarah Hinton and a number of other staff from John Wiley & Sons for their kind assistance throughout the preparation of the camera-ready manuscript. Finally, our sincere gratitude is due to the numerous authors listed in the Author Index - as well as to those whose work was not cited due to space limitations - for their contributions to the state-of-the-art, without whom this book would not have materialised.

Lajos Hanzo
Clare Somerville
Jason Woodard


Part I

Speech Signals and Waveform Coding



Part III

Wideband Coding and Transmission



Chapter 10

Advanced Multi-Rate Speech Transceivers

H-T. How and L. Hanzo

10.1 Introduction

Recent speech coding research efforts have been successful in creating a range of both narrowband and wideband multimode and multirate coding schemes, many of which have found their way into standardised codecs, such as the Adaptive Multi-Rate (AMR) codec and its wideband version, known as the AMR-WB scheme, proposed for employment in the third-generation wireless systems. Other multimode solutions have been used in the MPEG-4 codec, which will be investigated in the next chapter. In multimode coding schemes [335, 336], a mode selection process is invoked and the specific coding mode best suited to the local character of the speech signal is selected from a predetermined set of modes. This technique dynamically tailors the coding scheme to the widely varying local acoustic-phonetic character of the speech signal.

Multi-rate coding, on the other hand, facilitates the assignment of a time-variant number of bits to a frame, adapting the encoding rate on the basis of the local phonetic character of the speech signal or the network conditions. This is particularly useful in digital cellular communications, where one of the major challenges is that of designing an encoder that is capable of providing high-quality speech for a wide variety of channel conditions. Ideally, a good solution must provide the highest possible speech quality under perfect channel conditions, while maintaining an error-resilient behaviour in hostile channel environments. Traditionally, existing digital cellular applications have employed a single coding mode, where a fixed source/channel bit allocation provides a compromise solution between the perfect and hostile channel conditions. Clearly, a coding solution which is well suited to high-quality channels would use most of the available bits for source coding in conjunction with only minimal error protection, while a solution designed for poor channels would use a lower-rate speech encoder along with more powerful forward error protection. Due to the powerful combination of channel equalization, interleaving and channel coding, near-error-free transmission can be achieved down to a certain threshold of the Carrier-to-Interferer ratio (C/I). However, below this threshold the error correction code is likely to fail in removing the transmission errors, with the result that the residual errors may cause annoying artifacts in the reconstructed speech signal.

1 This chapter is based on H.T. How, T.H. Liew, E.L. Kuan, L-L. Yang and L. Hanzo: A Redundant Residue Number System Coded Burst-by-Burst Adaptive Joint-Detection Based CDMA Speech Transceiver, IEEE Transactions on Vehicular Technology, Volume 55, Issue 1, Jan. 2006, pp. 387-397.

Therefore, in existing systems typically a worst-case design is applied, where the channel coding scheme is sufficiently powerful to remove most transmission errors, as long as the system operates within a reasonable C/I range. However, the drawback of this solution is that the speech quality becomes lower than necessary under good channel conditions, since a high proportion of the gross bit rate is dedicated to channel coding.

The Adaptive Multi-Rate (AMR) concept [28] solves this 'resource allocation' problem in a more intelligent way. Specifically, the ratio between the speech bit rate and the error-protection-oriented redundancy is adaptively adjusted according to the prevalent channel conditions. While the channel quality is inferior, the speech encoder operates at low bit rates, thus accommodating powerful forward error control within the total bit rate budget. By contrast, under good channel conditions the speech encoder may benefit from using the total bit rate budget, yielding high speech quality, since in this high-rate case low-redundancy error protection is sufficient. Thus, the AMR concept allows the system to operate in an error-resilient mode under poor channel conditions, while benefitting from a better speech quality under good channel conditions. This is achieved by dynamically splitting the gross bit rate of the transmission system between source and channel coding according to the instantaneous channel conditions. Hence, the source coding scheme must be designed for seamless switching between the available rates without annoying artifacts.
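The channel-quality-driven mode switching described above can be illustrated by a minimal sketch. Only the list of AMR source rates comes from the standard; the C/I switching thresholds used here are hypothetical placeholders, since the actual thresholds are a network-controlled design choice rather than fixed values.

```python
# The eight AMR source rates in kbit/s; the C/I thresholds in dB below
# are hypothetical placeholders used purely for illustration.
AMR_RATES_KBPS = (4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2)

def select_amr_rate(c_over_i_db: float,
                    thresholds_db=(2, 4, 6, 8, 10, 12, 14)) -> float:
    """Pick the highest source rate whose C/I threshold is met.  A poor
    channel selects a low speech rate, leaving more of the fixed gross bit
    rate budget for forward error correction; a good channel does the
    opposite, spending the budget on speech quality instead."""
    mode = 0
    for i, threshold in enumerate(thresholds_db):
        if c_over_i_db >= threshold:
            mode = i + 1
    return AMR_RATES_KBPS[mode]
```

With these placeholder thresholds, a C/I of 1 dB selects the most robust 4.75 kbit/s mode, while 15 dB selects the full 12.2 kbit/s mode.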

In this chapter, we first give an overview of the AMR narrowband codec [29], which has been standardised by ETSI [28, 29]. The AMR codec is capable of operating in both the full-rate and half-rate speech traffic channels of GSM. It is also amenable to adapting the source coding and channel coding bit rates according to the quality of the radio channel. As stated above, most speech codecs employed in communication systems - such as, for example, the existing GSM speech codecs (full-rate [362], half-rate [363] and enhanced full-rate [364]) - operate at a fixed bit rate, with a trade-off between source coding and channel coding. However, estimating the channel quality and adjusting the transceiver's bit rate adaptively according to the channel conditions has the potential of improving the system's error resilience and hence the speech quality experienced over high error-rate wireless channels.

The inclusion of an AMR Wideband (AMR-WB) mode has also been under discussion, with feasibility studies [337, 338] being conducted at the time of writing for applications in GSM networks, as well as for the evolving Third Generation (3G) systems [339]. With the aim of providing a system-design example for such intelligent systems, during our forthcoming discourse in this chapter we will characterise the error sensitivity of the AMR encoder's output bits, so that the matching channel encoder can be carefully designed to provide the required protection for the speech bits which are most sensitive to transmission errors. The proposed intelligent adaptive multirate voice communications system will be described in Section 10.4.


Figure 10.1: Schematic of the ACELP speech encoder. (The block diagram shows the input speech signal s(n) undergoing LP analysis, quantisation and interpolation to yield the LPC information; an adaptive codebook contribution u(n - α) with gain Gp and a fixed codebook contribution ck(n) with gain Gc being summed to form the excitation u(n), which drives the LP synthesis filter; and the error e(n) being passed through the perceptual weighting filter to produce ew(n), whose weighted error energy is minimised.)

10.2 The Adaptive Multi-Rate Speech Codec

10.2.1 Overview

The AMR codec employs the Algebraic Code-Excited Linear Predictive (ACELP) model [365, 366] shown in Figure 10.1. Here we provide a brief overview of the AMR codec, following the approach of [28, 29, 257]. The AMR codec's complexity is relatively low and hence it can be implemented cost-efficiently. The codec operates on a 20 ms frame of 160 speech samples and generates encoded blocks of 95, 103, 118, 134, 148, 159, 204 and 244 bits per 20 ms frame. This leads to bit rates of 4.75, 5.15, 5.9, 6.7, 7.4, 7.95, 10.2 and 12.2 kbit/s, respectively. Explicitly, the AMR speech codec provides eight different modes, whose respective Segmental SNR performance is shown in Figure 10.2.
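The mode bit rates follow directly from the frame sizes quoted above; a quick illustrative check (not part of the standard):

```python
# Encoded block sizes in bits for one 20 ms frame, one entry per AMR mode.
frame_bits = [95, 103, 118, 134, 148, 159, 204, 244]

# 20 ms frames => 50 frames per second; divide by 1000 for kbit/s.
rates_kbps = [bits * 50 / 1000 for bits in frame_bits]
print(rates_kbps)  # [4.75, 5.15, 5.9, 6.7, 7.4, 7.95, 10.2, 12.2]
```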

Multirate coding [335] supports a variable allocation of bits for a speech frame, adapting the rate to the instantaneous local phonetic character of the speech signal, to the channel quality or to network conditions. This is particularly useful in digital cellular communications, where one of the major challenges is that of designing a codec that is capable of providing high quality speech for a wide variety of channel conditions. Ideally, a good solution must provide the highest possible quality under perfect channel conditions, unimpaired by the channel, while also maintaining good quality in hostile high error-rate channel environments. The codec mode adaptation is a key feature of the new AMR standard that has not been used in any prior mobile standard. At a given fixed gross bit rate, this mechanism of adapting the source coding rate has the potential of altering the partitioning between the speech source bit


492 CHAPTER 10. ADVANCED MULTI-RATE SPEECH TRANSCEIVERS

Figure 10.2: Segmental SNR performance of the AMR codec, operating at bit rates in the range between 4.75 kbit/s and 12.2 kbit/s (Segmental SNR in dB versus bit rate in kbit/s).

rate and the redundancy added for error protection. Hence, the AMR codec will be invoked in our Burst-by-Burst Adaptive Quadrature Amplitude Modulation Code Division Multiple Access (BbB-AQAM/CDMA) transceiver.

As shown in Figure 10.1, the Algebraic Code Excited Linear Prediction (ACELP) encoder operates on the sampled input speech signal s(n) and Linear Prediction Coding (LPC) is applied to each speech segment. The coefficients of this predictor are used for constructing an LPC synthesis filter 1/(1 − A(z)), which describes the spectral envelope of the speech segment [335, 367]. An Analysis-by-Synthesis (AbS) procedure is employed, in order to find the particular excitation that minimizes the weighted Minimum Mean Square Error (MMSE) between the reconstructed and original speech signals. The weighting filter is derived from the LPC synthesis filter and takes into account the psychoacoustic quantisation noise masking effect, namely that the quantization noise in the spectral neighbourhood of the spectrally prominent speech formants is less perceptible [335, 367]. In order to reduce the complexity, the adaptive and fixed excitation codebooks are searched sequentially in order to find the perceptually best codebook entry, first for the adaptive codebook contribution, and then for the fixed codebook entry. The adaptive codebook consists of time-shifted versions of past excitation sequences and describes the long-term characteristics of the speech signal [335, 367].

Three of the AMR coding modes correspond to existing standards, which renders communication systems employing the new AMR codec interoperable with other systems. Specifically, the 12.2 kbit/s mode is identical to the GSM Enhanced Full Rate (EFR) standard [339], the 12.2 and 7.4 kbit/s modes [368] correspond to the US1 and EFR (IS-641) codecs of the TDMA (IS-136) system, and the 6.7 kbit/s mode is equivalent to the EFR codec of the Japanese PDC system [335]. For each of the codec modes, there exist corresponding channel codecs, which perform the mapping between the speech source bits and the fixed number of channel coded bits.

In the forthcoming subsections, we will give a functional description of the AMR codec's operation in the 4.75 and 10.2 kbit/s modes. These two bit rates will be used in our investigations, in order to construct a dual-mode speech transceiver in Section 10.4.

10.2.2 Linear Prediction Analysis

A 10th order LPC analysis filter is employed for modelling the short-term correlation of the speech signal s(n). Short-term prediction, or linear predictive analysis, is performed once for each 20 ms speech frame using the Levinson-Durbin algorithm [367]. The LP coefficients are transformed to the Line Spectrum Frequencies (LSFs) for quantization and interpolation. The employment of the LSF [369] representation for the quantization of the LPC coefficients is motivated by their advantageous statistical properties. Specifically, within each speech frame, there is a strong intra-frame correlation due to the ordering property of the neighbouring LSF values [367]. This essentially motivates the employment of vector quantization. The interpolated quantized and unquantized LSFs are converted back to the LP filter coefficients, in order to construct the synthesis and weighting filters at each subframe. The synthesis filter shown in Figure 10.1 is used in the decoder for producing the reconstructed speech signal from the received excitation signal u(n).
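The Levinson-Durbin recursion solves the normal equations for the LP coefficients in O(p²) operations. The following is a textbook sketch of the recursion, not the AMR reference implementation (which additionally applies lag windowing and bandwidth expansion to the autocorrelations):

```python
def levinson_durbin(r, order):
    """Solve for the LP coefficients a[1..order] from the autocorrelation
    sequence r[0..order], so that the predictor is
    s_hat(n) = sum_k a[k] * s(n - k).
    Returns (a[1..order], residual prediction-error energy)."""
    a = [0.0] * (order + 1)
    e = r[0]                      # zeroth-order prediction error energy
    for i in range(1, order + 1):
        # reflection coefficient k_i
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / e
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):     # update the lower-order coefficients
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        e *= (1.0 - k * k)        # error energy shrinks at every order
    return a[1:], e

# Toy check: r[k] = 0.5**k is the autocorrelation of a first-order
# process, so the order-2 solution collapses to a single tap of 0.5.
coeffs, err = levinson_durbin([1.0, 0.5, 0.25], 2)
```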

10.2.3 LSF Quantization

In the AMR codec, the LSFs are quantized using interframe LSF prediction and Split Vector Quantization (SVQ) [28]. The SVQ splits the 10-dimensional LSF vector into a number of reduced-dimension LSF subvectors, which simplifies the associated codebook entry matching and search complexity. Specifically, the proposed configuration minimizes the average Spectral Distortion (SD) [370] achievable at a given total complexity. Predictive vector quantization is used [28] and the 10-component LSF vectors are split into three LSF subvectors of dimension 3, 3 and 4. The bit allocations for the three subvectors will be described in Section 10.2.7 for the 4.75 and 10.2 kbit/s speech coding modes.
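The split-VQ principle can be illustrated as follows. This is a toy sketch only - the interframe prediction and the weighted distortion measure of the real codec are omitted, and the codebooks here are placeholders; three codebooks of 2^8, 2^8 and 2^7 entries would match the 8+8+7-bit allocation of the 4.75 kbit/s mode:

```python
def svq_quantize(lsf, codebooks):
    """Quantize a 10-dimensional LSF vector by splitting it into 3-, 3-
    and 4-element subvectors and picking, for each, the nearest entry
    (in squared error) from its own codebook. Returns the three indices."""
    splits = (lsf[0:3], lsf[3:6], lsf[6:10])
    indices = []
    for sub, cb in zip(splits, codebooks):
        def err(cand):
            return sum((a - b) ** 2 for a, b in zip(sub, cand))
        indices.append(min(range(len(cb)), key=lambda i: err(cb[i])))
    return indices
```

Splitting trades a little quantizer efficiency for an exponentially smaller search: three small codebooks replace one unmanageable 23-bit codebook over 10 dimensions.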

10.2.4 Pitch Analysis

Pitch analysis using the adaptive codebook approach models the long-term periodicity, i.e. the pitch, of the speech signal. It produces an output which is an amplitude-scaled version of the adaptive codebook of Figure 10.1, based on previous excitations. The excitation signal u(n) = Gp·u(n − α) + Gc·ck(n) seen in Figure 10.1 is determined from its Gp-scaled history after adding the Gc-scaled fixed algebraic codebook vector ck for every 5 ms subframe. The optimum excitation is chosen on the basis of minimising the weighted mean squared error Ew over the subframe.
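The excitation construction above can be sketched as follows; this is a simplified illustration assuming an integer pitch lag α and given gains, whereas the real codec also supports fractional lags and interpolates the past excitation:

```python
def build_excitation(past_exc, alpha, gp, fixed_vec, gc):
    """Form one subframe of excitation u(n) = Gp*u(n - alpha) + Gc*c_k(n).
    past_exc holds the excitation history (most recent sample last);
    alpha is the integer pitch lag in samples; fixed_vec is the algebraic
    codebook vector c_k for the subframe."""
    u = []
    for n in range(len(fixed_vec)):
        hist = past_exc + u            # history plus samples built so far
        # adaptive-codebook sample u(n - alpha); zero if the lag reaches
        # back beyond the available history
        adaptive = hist[len(hist) - alpha] if alpha <= len(hist) else 0.0
        u.append(gp * adaptive + gc * fixed_vec[n])
    return u
```

Note that for lags shorter than the subframe length the adaptive codebook recursively reuses samples generated within the current subframe, which is why `hist` is rebuilt at every step.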


Subframe 1, Subset 1:  i0: 0,5,10,15,20,25,30,35   i1: 2,7,12,17,22,27,32,37
Subframe 1, Subset 2:  i0: 1,6,11,16,21,26,31,36   i1: 3,8,13,18,23,28,33,38

Subframe 2, Subset 1:  i0: 0,5,10,15,20,25,30,35   i1: 3,8,13,18,23,28,33,38
Subframe 2, Subset 2:  i0: 2,7,12,17,22,27,32,37   i1: 4,9,14,19,24,29,34,39

Subframe 3, Subset 1:  i0: 0,5,10,15,20,25,30,35   i1: 2,7,12,17,22,27,32,37
Subframe 3, Subset 2:  i0: 1,6,11,16,21,26,31,36   i1: 4,9,14,19,24,29,34,39

Subframe 4, Subset 1:  i0: 0,5,10,15,20,25,30,35   i1: 3,8,13,18,23,28,33,38
Subframe 4, Subset 2:  i0: 1,6,11,16,21,26,31,36   i1: 4,9,14,19,24,29,34,39

Table 10.1: Pulse amplitudes and positions for the 4.75 kbit/s AMR codec mode [28].

Track 1:  pulses i0, i4   positions 0,4,8,12,16,20,24,28,32,36
Track 2:  pulses i1, i5   positions 1,5,9,13,17,21,25,29,33,37
Track 3:  pulses i2, i6   positions 2,6,10,14,18,22,26,30,34,38
Track 4:  pulses i3, i7   positions 3,7,11,15,19,23,27,31,35,39

Table 10.2: Pulse amplitudes and positions for the 10.2 kbit/s AMR codec mode [28].

In an optimal codec, the fixed codebook index and codebook gain, as well as the adaptive codebook parameters, would all be jointly optimized in order to minimize Ew [371]. However, in practice this is unfeasible due to the associated excessive complexity. Hence, a sequential sub-optimal approach is applied in the AMR codec, where the adaptive codebook parameters are determined first under the assumption of a zero fixed codebook excitation component, i.e. Gc = 0, since at this optimisation stage no fixed codebook entry has been determined. Then, given that the adaptive codebook parameters, which consist of the delay and gain of the pitch filter, have been found, the fixed codebook parameters are determined.

Most CELP codecs employ both so-called open-loop and closed-loop estimation of the adaptive codebook delay parameters, as is the case in the AMR codec. The open-loop estimate of the pitch period is used to narrow down the range of the possible adaptive codebook delay values, and then the full closed-loop analysis-by-synthesis procedure is used for finding a high-resolution delay around the approximate open-loop position [366].


4.75 kbit/s mode:
Parameter        1st subframe   2nd subframe   3rd subframe   4th subframe   Total per frame
LSFs             8+8+7=23 (bits 1-23)                                        23
Pitch Delay      8 (24-31)      4 (49-52)      4 (62-65)      4 (83-86)      20
Fixed CB Index   9 (32-40)      9 (53-61)      9 (66-74)      9 (87-95)      36
Codebook Gains   8 (41-48)                     8 (75-82)                     16
Total                                                         95/20 ms = 4.75 kbit/s

10.2 kbit/s mode:
Parameter        1st subframe   2nd subframe   3rd subframe   4th subframe   Total per frame
LSFs             8+9+9=26                                                    26
Pitch Delay      8              5              8              5              26
Fixed CB Index   31             31             31             31             124
Codebook Gains   7              7              7              7              28
Total                                                         204/20 ms = 10.2 kbit/s

Table 10.3: Bit allocation of the AMR speech codec at 4.75 kbit/s and 10.2 kbit/s [28]. The bit positions for the 4.75 kbit/s mode, shown in round brackets, assist in identifying the corresponding bits in Figure 10.5.

10.2.5 Fixed Codebook With Algebraic Structure

Once the adaptive codebook parameters are found, the fixed codebook is searched by taking into account the now known adaptive codebook vector. This sequential approach constitutes a trade-off between the best possible performance and the affordable computational complexity. The fixed codebook is searched by using an efficient non-exhaustive analysis-by-synthesis technique [372], minimizing the mean square error between the weighted input speech and the weighted synthesized speech.

The fixed, or algebraic, codebook structure is specified in Table 10.1 and Table 10.2 for the 4.75 kbit/s and 10.2 kbit/s codec modes, respectively [28]. The algebraic fixed codebook structure is based on the so-called Interleaved Single-Pulse Permutation (ISPP) code design [371]. The computational complexity of the fixed codebook search is substantially reduced when the codebook entries ck(n) used are mostly zeros. The algebraic structure of the excitation, having only a few non-zero pulses, allows for a fast search procedure. The non-zero elements of the codebook are equal to either +1 or -1, and their positions are restricted to the limited number of excitation pulse positions portrayed in Table 10.1 and 10.2 for the speech coding modes of 4.75 and 10.2 kbit/s, respectively.

More explicitly, in the 4.75 kbit/s codec mode, the excitation codebook contains two non-zero pulses, denoted by i0 and i1 in Table 10.1. Again, all pulses can have the amplitudes +1 or -1. The 40 positions in a subframe are divided into four so-called tracks. Two subsets of two tracks each are used for each subframe, with one pulse in each track. Different subsets of tracks are used for each subframe, as shown in Table 10.1, and hence one bit is needed for encoding the subset used. The two pulse positions, i0 and i1, are encoded with the aid of 3 bits each, since both have eight legitimate positions in Table 10.1. Furthermore, the sign of each pulse is encoded using 1 bit. This gives a total of 1 + 2(3) + 2(1) = 9 bits for the algebraic excitation encoding in a subframe.
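The 9-bit budget can be visualised by packing the five fields into a single codeword. The sketch below is hypothetical: the field ordering is illustrative only and does not reproduce the standard's exact bit layout:

```python
def encode_475_subframe(subset, pos0, sign0, pos1, sign1):
    """Pack one subframe of 4.75 kbit/s algebraic excitation into 9 bits:
    1 subset bit, then 3 position bits + 1 sign bit for each of the two
    pulses. Positions index the eight legal slots of the pulse's track."""
    assert subset in (0, 1) and sign0 in (0, 1) and sign1 in (0, 1)
    assert 0 <= pos0 < 8 and 0 <= pos1 < 8
    word = subset
    word = (word << 3) | pos0
    word = (word << 1) | sign0
    word = (word << 3) | pos1
    word = (word << 1) | sign1
    return word                      # always fits in 9 bits

print(encode_475_subframe(1, 7, 1, 7, 1))  # 511, the largest 9-bit codeword
```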

In the 10.2 kbit/s codec mode of Table 10.2 there are four tracks, each containing two pulses. Hence, the excitation vector contains a total of 4 × 2 = 8 non-zero pulses. All the pulses


can have the amplitudes of +1 or -1, and the excitation pulses are encoded using a total of 31 bits.

For the quantization of the fixed codebook gain, a gain predictor is used, in order to exploit the correlation between the fixed codebook gains in adjacent frames [28]. The fixed codebook gain is expressed as the product of the predicted gain, based on previous fixed codebook energies, and a correction factor. The correction factor is the parameter which is coded together with the adaptive codebook gain for transmission over the channel. In the 4.75 kbit/s mode the adaptive codebook gains and the correction factors are jointly vector quantized for every 10 ms, while this process occurs for every 5 ms subframe in the 10.2 kbit/s mode.

10.2.6 Post-Processing

At the decoder, an adaptive postfilter [373] is used for improving the subjective quality of the reconstructed speech. The adaptive postfilter consists of a formant-based postfilter and a spectral tilt-compensation filter [373]. Adaptive Gain Control (AGC) is also used, in order to compensate for the energy difference between the synthesized speech signal, which is the output of the synthesis filter, and the postfiltered speech signal.

10.2.7 The AMR Codec’s Bit Allocation

The AMR speech codec's bit allocation is shown in Table 10.3 for the speech modes of 4.75 kbit/s and 10.2 kbit/s. For the 4.75 kbit/s speech mode, 23 bits are used for encoding the LSFs by employing split vector quantization. As stated before, the LSF vector is split into three subvectors of dimension 3, 3 and 4, and each subvector is quantized using 8, 8 and 7 bits, respectively. This gives a total of 23 bits for the LSF quantization of the 4.75 kbit/s codec mode.

The pitch delay is encoded using 8 bits in the first subframe and the relative delays of the other subframes are encoded using 4 bits each. The adaptive codebook gain is quantized together with the above-mentioned correction factor of the fixed codebook gain for every 10 ms using 8 bits. As a result, a total of 16 bits are used for encoding both the adaptive and fixed codebook gains. As described in Section 10.2.5, 9 bits are used to encode the fixed codebook indices for every subframe, which results in a total of 36 bits per 20 ms frame for the fixed codebook.

For the 10.2 kbit/s mode, the three LSF subvectors are quantized using 8, 9 and 9 bits, respectively. This implies that 26 bits are used for quantizing the LSF vectors at 10.2 kbit/s, as shown in Table 10.3. The pitch delay is encoded using 8 bits in the first and third subframes and the relative delay of the other subframes is encoded using 5 bits each. The adaptive codebook gain is quantized together with the correction factor of the fixed codebook gain using a 7-bit non-uniform vector quantization scheme for every 5 ms subframe. The fixed codebook indices are encoded using 31 bits in each subframe, giving a total of 124 bits for a 20 ms speech frame.
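The per-parameter allocations of Table 10.3 can be cross-checked against the frame totals; a small illustrative tally:

```python
# Bits per 20 ms frame for the two modes discussed (Table 10.3).
mode_475 = {"LSFs": 8 + 8 + 7,
            "pitch_delay": 8 + 4 + 4 + 4,
            "fixed_cb_index": 4 * 9,
            "codebook_gains": 2 * 8}
mode_102 = {"LSFs": 8 + 9 + 9,
            "pitch_delay": 8 + 5 + 8 + 5,
            "fixed_cb_index": 4 * 31,
            "codebook_gains": 4 * 7}

assert sum(mode_475.values()) == 95    # 95 bits / 20 ms = 4.75 kbit/s
assert sum(mode_102.values()) == 204   # 204 bits / 20 ms = 10.2 kbit/s
```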

10.2.8 Codec Mode Switching Philosophy

In the AMR codec, the mode adaptation allows us to invoke a subset of at most four modes out of the eight available modes [258]. This subset is referred to as the Active Codec Set (ACS). In the proposed BbB-AQAM/CDMA system the codec mode adaptation is based on the channel


quality, which is expressed as the MSE at the output of the multi-user CDMA detector [374]. The probability of switching from one mode to another is typically lower than the probability of sustaining a specific mode.

Intuitively, frequent mode switching is undesirable due to the associated perceptual speech quality fluctuations. It is more desirable to have a mode selection mechanism that is primarily source-controlled, assisted by a channel-quality-controlled override. During good channel conditions, the mode switching process is governed by the local phonetic character of the speech signal and the codec will adapt itself to the speech signal characteristics in an attempt to deliver the highest possible speech quality. When the channel is hostile or the network is congested, transceiver control or external network control can take over the mode selection and allocate fewer bits to source coding, in order to increase the system's robustness or user capacity. By amalgamating the channel-quality motivated or network- and source-controlled processes, we arrive at a robust, high-quality system. Surprisingly, we found from our informal listening tests that the perceptual speech quality was not affected by the rate of codec mode switching, as will be demonstrated in Section 10.8. This is due to the robust ACELP structure, whereby the main bit rate reduction is related to the fixed codebook indices, as shown in Table 10.3 for the codec modes of 4.75 kbit/s and 10.2 kbit/s.

As expected, the performance of the AMR speech codec is sensitive to transmission errors of the codec mode information. The corruption of the codec mode information, which describes which codec mode has to be used for decoding, leads to complete speech frame losses, since the decoder is unable to apply the correct mode for decoding the received bit stream. Hence, robust channel coding is required in order to protect the codec mode information, and the recommended transmission procedures were discussed, for example, by Bruhn et al. [257]. Furthermore, in transceiver-controlled scenarios the prompt transmission of the codec mode information is required for reacting to sudden changes of the channel conditions. In our investigations we assume that the signalling of the codec mode information is free from corruption, so that we can concentrate on other important aspects of the system.

Let us now briefly focus our attention on the robustness of the AMR codec against channel errors.

10.3 Speech Codec’s Error Sensitivity

In this section, we will demonstrate that some bits are significantly more sensitive to channel errors than others, and hence these sensitive bits have to be better protected by the channel codec [371]. A commonly used approach to quantifying the sensitivity of a given bit is to invert this bit consistently in every speech frame and evaluate the associated Segmental SNR (SEGSNR) degradation. The error sensitivity of the various bits of the AMR codec determined in this way is shown in Figure 10.3 for the bit rate of 4.75 kbit/s. Figure 10.3 also shows, more explicitly, the bit sensitivities in each speech subframe for the bit rate of 4.75 kbit/s, with the corresponding bit allocations shown in Table 10.3. For the sake of visual clarity, Subframe 4 (bits 83-95) is not shown explicitly, since it exhibited an identical SEGSNR degradation to Subframe 2.
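The bit-inversion procedure can be sketched as follows. This is a toy illustration: `encode`, `decode` and `segsnr` stand in for a real speech codec and a segmental-SNR routine, each coded frame is modelled as a single integer bit field, and none of these names belong to any actual AMR API.

```python
def bit_sensitivity(frames, encode, decode, segsnr, n_bits):
    """For each bit position, flip that bit in every coded frame, decode,
    and record the drop in average SEGSNR relative to error-free decoding."""
    coded = [encode(f) for f in frames]
    baseline = sum(segsnr(f, decode(c))
                   for f, c in zip(frames, coded)) / len(frames)
    degradation = []
    for i in range(n_bits):
        corrupted = sum(segsnr(f, decode(c ^ (1 << i)))   # flip bit i
                        for f, c in zip(frames, coded)) / len(frames)
        degradation.append(baseline - corrupted)
    return degradation
```

With a real codec and SEGSNR routine in place of the mocks, plotting `degradation` against the bit index yields the kind of per-bit sensitivity profile discussed in this section.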

It can be observed from Figure 10.3 that the most sensitive bits are those of the LSF subvectors, seen at positions 1-23. The error sensitivity of the adaptive codebook delay is the highest in the first subframe, commencing at bit 24, as shown in Figure 10.3, which was


Figure 10.3: The SEGSNR degradations due to 100% bit error rate in the 95-bit, 20 ms AMR speech frame of the 4.75 kbit/s mode. The top panel spans the 23 LSF bits and Subframes 1-4; the lower panels detail the three LSF subvectors and, for each subframe, the (relative) pitch delay, fixed codebook index and codebook gain bits. The associated bit allocation can be seen in Table 10.3.


Figure 10.4: The SEGSNR degradation versus speech frame index for various bits: the first bit of the first LSF subvector (Bit 1), the adaptive codebook delay (Bit 24), the fixed codebook index (Bit 33), the fixed codebook sign (Bit 39) and the codebook gains (Bit 41).

encoded using 8 bits in Table 10.3. By contrast, the relative adaptive codebook delays of the next three subframes are encoded using 4 bits each, and a graceful degradation of the SEGSNR is observed in Figure 10.3. The next group of bits is constituted by the 8 codebook gain bits, seen in decreasing order of bit sensitivity in Figure 10.3 at bit positions 41-48 of Subframe 1 and 75-82 of Subframe 3. The least sensitive bits are related to the fixed codebook pulse positions, shown for example at bit positions 54-61 in Figure 10.3. This is because, if one of the fixed codebook index bits is corrupted, the codebook entry selected at the decoder will differ from that used in the encoder only in the position of one of the non-zero excitation pulses. Therefore the corrupted codebook entry will be similar to the original one. Hence, the algebraic codebook structure used in the AMR codec is inherently quite robust to channel errors. The information obtained here will be used in Section 10.6.2 for designing the bit mapping procedure, in order to assign the channel encoders according to the bit error sensitivities.

Although appealing in terms of its conceptual simplicity, the above approach used for quantifying the error sensitivity of the various coded bits does not take into account the error propagation properties of different bits over consecutive speech frames. In order to obtain a better picture of the error propagation effects, we also employed a more elaborate error sensitivity measure [371]. Here, for each bit we find the average SEGSNR degradation due to a


Figure 10.5: Average SEGSNR degradation due to single bit errors in the various speech coded bits of the 4.75 kbit/s mode.

single bit error both in the specific frame in which the error occurs and in consecutive frames. These effects are exemplified in Figure 10.4 for five different bits, where each of the bits belongs to a different speech codec parameter. More explicitly, Bit 1 represents the first bit of the first LSF subvector, which shows some error propagation effects due to the interpolation between the LSFs over consecutive frames. The associated SEGSNR degradation dies away over six frames. Bit 24, characterised in Figure 10.4, is one of the adaptive codebook delay bits, and the corruption of this bit has the effect of a more prolonged SEGSNR degradation over 10 frames. The fixed codebook index bits of Table 10.3 are more robust and are observed to be the least sensitive bits, as was shown in Figure 10.3 earlier. This argument is supported by the example of Bit 33 in Figure 10.4, where a smaller degradation is observed over consecutive frames. A similar observation also applies to Bit 39 in Figure 10.4, which is the sign bit of the fixed codebook. By contrast, Bit 41 of the codebook gains produced a high and prolonged SEGSNR degradation profile.

We recomputed our bit-sensitivity results of Figure 10.3 using this second approach, in order to obtain Figure 10.5, taking into account the error propagation effects. More explicitly, these results were calculated by summing the SEGSNR degradations over all the frames which were affected by the error. Again, these results are shown in Figure 10.5 and the associated bit positions can be identified with the aid of Table 10.3. The importance of the


Figure 10.6: Schematic of the adaptive dual-mode JD-CDMA system, comprising the AMR encoder and decoder, the RRNS channel encoder and decoder, the spreader, the modulator and demodulator, the MMSE-BDFE multi-user detector, as well as the channel estimation, mode selection and modulation adaptation functions.

adaptive codebook delay bits became more explicit. By contrast, the significance of the LSFs was reduced, although still requiring strong error protection using channel coding.

Having characterised the error sensitivity of the various speech bits, we will capitalise on this knowledge, in order to assign the speech bits to various bit protection classes, as will be discussed in Section 10.6.2. In the next section, let us consider the various components of our transceiver, which utilises the AMR codec. We will first discuss the motivation for employing multirate speech encoding in conjunction with a near-instantaneously adaptive transceiver, with a detailed background description of earlier contributions from various researchers.

10.4 System Background

The AMR concept is amenable to a range of intelligent configurations. When the instantaneous channel quality is low, the speech encoder operates at low bit rates, thus facilitating the employment of powerful forward error control within a fixed bit rate budget. By contrast, under favourable channel conditions the speech encoder may use its highest bit rate, implying high speech quality, since in this case weaker error protection is sufficient, or a less robust, but higher bit rate transceiver mode can be invoked. However, the system must be designed for seamless switching between its operating rates without objectionable perceptual artifacts.

Das et al. provided an extensive review of multimode and multirate speech coding in [375]. Some of the earlier contributors in multimode speech coding included Taniguchi et al. [376], Kroon and Atal [377], Yong and Gersho [378], DeJaco et al. [379], Paksoy et al. [380] and Cellario et al. [381]. Further recent work on incorporating multirate speech coding into wireless systems was covered in a range of contributions [382]- [383]. Specifically, Yuen et al. [382] employed embedded and multimode speech codecs based on the Code Excited Linear Prediction (CELP) technique in combination with channel codecs using Rate Compatible Punctured Convolutional codes (RCPC) [384]. The combined speech and channel coding resulted in gross bit rates of 12.8 kbit/s and 9.6 kbit/s, supported by either


TDMA or CDMA multiple access techniques. The investigations showed that multimode CELP codecs performed better than their embedded counterparts, and that adaptive schemes were superior to fixed-rate schemes.

LeBlanc et al. in [385] developed a low power, low delay, multirate codec suitable for indoor wireless communications. The speech codec was a modified version of the G.728 LD-CELP standard scheme [386], employing a multi-stage excitation configuration together with an adaptive codebook. A lower LPC predictor order of 10 was used, rather than 50 as in G.728, and a higher bandwidth expansion factor of 0.95, rather than 0.9883, was employed, which resulted in a more robust performance over hostile channels. This algorithm was investigated over indoor wireless channels assisted by 2-branch diversity, using QPSK modulation and wideband TDMA transmission. No channel coding was employed and the system's performance was not explicitly characterised in the paper. In [241], Kleider et al. proposed an adaptive speech transmission system utilising the Multi-Rate Sinusoidal Transform Codec (MRSTC), in conjunction with convolutional channel coding and Pulse Position Modulation (PPM). The MRSTC is based on the sinusoidal transform coding scheme proposed by McAulay [387]. The MRSTC was investigated further by the same authors for wireless and internet applications in [388], using a range of bit rates between 1.2 kbit/s and 9.6 kbit/s. The MRSTC was incorporated into a communication system employing convolutional channel coding and a fixed BPSK modulation scheme, and it was reported to achieve a reduction of nearly 9 dB in average spectral distortion over the fixed-rate 9.6 kbit/s benchmarker.

In a contribution from the speech coding team at Qualcomm, Das et al. [389] illustrated, using a multimode codec having four modes (full-rate, half-rate, quarter-rate and eighth-rate), that the diverse characteristics of the speech segments can be adequately captured using variable rate codecs. It was shown that a reduced average rate can be obtained, achieving speech quality equivalent to that of a fixed full-rate codec. Specifically, a multimode codec with an average rate of 4 kbit/s achieved significantly higher speech quality than that of the equivalent fixed-rate codec. An excellent example of a recent standard variable-rate codec is the Enhanced Variable Rate Codec (EVRC), standardized by the Telecommunications Industry Association (TIA) as IS-127 [245]. This codec operates at a maximum rate of 8.5 kbit/s and at an average rate of about 4.1 kbit/s. The EVRC consists of three coding modes that are all based on the CELP model. The activation of one of the three modes is source-controlled, based on the estimation of the input signal state.

Multimode speech coding was also evaluated in an ATM-based environment by Beritelli et al. in [390]. The speech codec possessed seven coding rates, ranging from 0.4 to 16 kbit/s. Five different bit rates were allocated for voiced/unvoiced speech encoding, while two lower bit rates were generated for inactive speech periods, depending on the stationarity of the background noise. The variable-rate voice source was modelled using a Markov-model-based process. The multimode coding scheme was compared to the 12 kbit/s CS-ACELP standard codec using the traditional ON-OFF voice generation model. It was found that the multimode codec performed better than the CS-ACELP ON-OFF scheme, succeeding in minimizing the required transmission bandwidth by exploiting the near-instantaneous local characteristics of the speech waveform, and it was also capable of synthesizing the background noise realistically.

Our discussion so far has focused on source-controlled multirate codecs, where the coding algorithm responds to the time-varying local character of the speech signal in order to determine the required speech rate. An additional capacity enhancement can be achieved by introducing network control, which implies that the speech codec has to respond to a network-originated control signal switching the speech rate to one of a predetermined set of possible rates. Network control procedures were addressed, for example, by Hanzo et al. [371] and Kawashima et al. [383]. Specifically, in [371] a novel high-quality, low-complexity dual-rate 4.7 kbit/s and 6.5 kbit/s ACELP codec was proposed for indoor communications, which was capable of dropping the associated source rate and speech quality under network control, in order to invoke a more resilient modem mode under less favourable channel conditions. Source-matched binary BCH channel codecs combined with unequal-protection diversity- and pilot-assisted 16QAM and 64QAM were employed, in order to accommodate both the 4.7 and the 6.5 kbit/s coded speech bits at a fixed signalling rate of 3.1 kBd. Good communications-quality speech was reported in an equivalent speech channel bandwidth of 4 kHz, provided that the channel Signal-to-Noise Ratio (SNR) and Signal-to-Interference Ratio (SIR) of the benign indoor cordless channels were in excess of about 15 and 25 dB for the lower and higher speech quality 16QAM and 64QAM systems, respectively. In [383], Kawashima et al. proposed network control procedures for CDMA systems, focusing only on the downlink from the base to the mobile station, where the base station can readily coordinate the coding rate of all users without any significant delay. This network control scheme was based on the so-called M/M/∞/M queueing model applied to a cell under heavy traffic conditions. A modified version of the QCELP codec [379] was used, employing fixed rates of 9.6 kbit/s and 4.8 kbit/s.

Focusing our attention on the associated transmission aspects, in recent years significant research interest has also been devoted to Burst-by-Burst Adaptive Quadrature Amplitude Modulation (BbB-AQAM) transceivers [51]- [391]. The transceiver reconfigures itself on a burst-by-burst basis, depending on the instantaneous perceived wireless channel quality. More explicitly, the channel quality of the next transmission burst is estimated, and the specific modulation mode that is expected to achieve the required performance target at the receiver is then selected for the transmission of the current burst. Modulation schemes of different robustness and of different data throughput have also been investigated [392]- [393]. The BbB-AQAM principles have also been applied to Joint Detection Code Division Multiple Access (JD-CDMA) [374, 394] and to OFDM [395, 396].

Against the above background, in this section we introduce a novel dual-mode burst-by-burst adaptive speech transceiver scheme, based on the AMR speech codec, Redundant Residue Number System (RRNS) assisted channel coding [397] and Joint Detection aided Code-Division Multiple Access (JD-CDMA) [374]. The mode switching is controlled by the channel quality fluctuations imposed by the time-variant channel, which is not necessarily a desirable scenario. However, we will endeavour to contrive measures in order to mitigate the associated perceptual speech quality fluctuations. The underlying trade-offs associated with employing two speech modes of the AMR standard speech codec in conjunction with a reconfigurable, unequal error protection BPSK/4QAM modem are investigated.

10.5 System Overview

The schematic of the proposed adaptive JD-CDMA speech transceiver is depicted in Figure 10.6. The encoded speech bits generated by the AMR codec at the bit rate of 4.75 or 10.2 kbit/s are first mapped according to their error sensitivities into three protection classes,


                    Class   RRNS code   Codewords   Data bits   Total data bits   Total coded bits

4.75 kbit/s / BPSK    I     RRNS(8,4)       2          40
                     II     RRNS(8,5)       1          25              95                160
                    III     RRNS(8,6)       1          30

10.2 kbit/s / 4QAM    I     RRNS(8,4)       3          60
                     II     RRNS(8,5)       1          25             205                320
                    III     RRNS(8,6)       4         120

Table 10.4: RRNS codes designed for the two different modulation modes.

although for simplicity this is not shown explicitly in the figure. The sensitivity-ordered speech bits are then channel encoded using the RRNS encoder [397] and modulated using a reconfigurable BPSK or 4QAM based JD-CDMA scheme [51]. We assigned the 4.75 kbit/s speech codec mode to the BPSK modulation mode and the 10.2 kbit/s speech codec mode to the 4QAM mode. Therefore, this transmission scheme delivers a higher speech quality at 10.2 kbit/s, provided that sufficiently high channel SNRs and SIRs prevail. Furthermore, it can be reconfigured under transceiver control in order to provide an inherently lower, but unimpaired, speech quality under lower SNR and SIR conditions at the speech rate of 4.75 kbit/s.

Subsequently, the modulated symbols are spread in Figure 10.6 by the CDMA spreading sequence assigned to the user, where a random spreading sequence is used. The Minimum Mean Squared Error Block Decision Feedback Equaliser (MMSE-BDFE) is used as the multiuser detector [374], where perfect Channel Impulse Response (CIR) estimation and perfect decision feedback are assumed. The soft outputs for each user are obtained from the MMSE-BDFE and passed to the RRNS channel decoder. Finally, the decoded bits are mapped back to their original bit protection classes by a bit-mapper (not shown in Figure 10.6), and the speech decoder reconstructs the original speech information.
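The bit-mapping step just described can be sketched as follows; the class sizes (40/25/30 bits) are taken from Table 10.4, but the function names and the explicit sensitivity-ordering permutation are our own illustrative assumptions:

```python
# Sketch of the error-sensitivity bit mapper for the 4.75 kbit/s mode. The
# class sizes (40/25/30) come from Table 10.4; the helper names and the use of
# an explicit sensitivity-ordering permutation are illustrative assumptions.

CLASS_SIZES = (40, 25, 30)  # Class I, II, III

def map_to_classes(bits, sensitivity_order):
    """Reorder the 95 speech bits by decreasing error sensitivity and split
    them into the three protection classes."""
    assert len(bits) == sum(CLASS_SIZES) == len(sensitivity_order)
    ordered = [bits[i] for i in sensitivity_order]
    classes, start = [], 0
    for size in CLASS_SIZES:
        classes.append(ordered[start:start + size])
        start += size
    return classes

def map_from_classes(classes, sensitivity_order):
    """Inverse mapping applied after channel decoding (the 'bit-mapper')."""
    ordered = [b for cls in classes for b in cls]
    bits = [0] * len(ordered)
    for pos, src in enumerate(sensitivity_order):
        bits[src] = ordered[pos]
    return bits
```

The round trip through the two functions is lossless, so the mapper only reorders bits for unequal error protection and adds no redundancy of its own.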

In BbB-AQAM/CDMA, in order to determine the best choice of modulation mode in terms of the required trade-off between the BER and the throughput, the near-instantaneous quality of the channel has to be estimated. The channel quality is estimated at receiver A, and the chosen modulation mode and its corresponding speech mode are then communicated using explicit signalling to transmitter B in a closed-loop scheme, as depicted in Figure 10.6. Specifically, the channel quality estimate is obtained by using the Signal to residual Interference plus Noise Ratio (SINR) metric, which can be calculated at the output of the MMSE-BDFE [374].


10.6 Redundant Residue Number System (RRNS) Channel Coding

10.6.1 Overview

In order to improve the performance of the system, we employ the novel family of so-called Redundant Residue Number System (RRNS) codes for protecting the speech bits, depending on their respective error sensitivities.

Since their introduction, RRNSs have been used for constructing fast arithmetics [398, 399]. In this section, we exploit the error control properties of the non-binary systematic RRNS codes, which - similarly to Reed-Solomon codes - exhibit maximum minimum-distance properties [400, 401]. Hence, RRNS codes are similar to Reed-Solomon (RS) codes [339]. However, the RRNS codes chosen in our design are more amenable to designing short codes. More explicitly, in the context of RS codes, short codes are derived by inserting dummy symbols into full-length codes; this, however, requires the decoding of the full-length RS code. By contrast, RRNS codes simply add the required number of redundant symbols. Furthermore, RRNS codes allow us to use the low-complexity technique of residue dropping [401]. Both of these advantages will be elaborated on during our further discourse.

An RRNS(n, k) code has k so-called residues, which host the original data bits, while the additional (n − k) redundant residues can be employed for error correction at the decoder. The coding rate of the code is k/n and the associated error correction capability of the code is t = ⌊(n − k)/2⌋ non-binary residues [400, 401]. At the receiver, both soft decision [397] and residue dropping [402] decoding techniques are employed.

The advantages of the RRNS codes are simply stated here without proof, due to lack of space [397, 402]. Since the so-called residues of the RRNS [398, 399] can be computed independently of each other, additional residues can be added at any stage of processing or transmission [403]. This has the advantage that the required coding power can be adjusted according to the prevalent BER of the transmission medium. For example, when the protected speech bits enter the wireless section of the network - where higher BERs prevail than in the fixed network - a number of additional redundant residues are simply computed and concatenated to the message for providing extra protection.
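As a toy illustration of these properties, the following sketch implements a small RRNS(5, 3) with single-residue-dropping decoding; the moduli are illustrative pairwise-coprime choices, not the 5-bit-per-residue RRNS(8, k) designs of Table 10.4:

```python
from math import prod

# Toy RRNS(5, 3) with single-residue-dropping decoding. The moduli below are
# illustrative pairwise-coprime choices, NOT the RRNS(8, k) designs of
# Table 10.4; they merely demonstrate the same principle at a small scale.
MODULI = [7, 9, 11, 13, 16]          # first K are the information moduli
K = 3
LEGITIMATE_RANGE = prod(MODULI[:K])  # 7 * 9 * 11 = 693

def rrns_encode(x):
    """Each residue is computed independently, so redundant residues can be
    appended at any stage of processing or transmission."""
    assert 0 <= x < LEGITIMATE_RANGE
    return [x % m for m in MODULI]

def crt(residues, moduli):
    """Chinese Remainder Theorem reconstruction modulo prod(moduli)."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)  # pow(., -1, m) is the modular inverse
    return x % M

def rrns_decode(residues):
    """If the full reconstruction falls outside the legitimate range, an error
    is detected; dropping each residue in turn locates and corrects a single
    residue error (here t = floor((n - k)/2) = 1)."""
    x = crt(residues, MODULI)
    if x < LEGITIMATE_RANGE:
        return x
    for drop in range(len(MODULI)):
        keep = [i for i in range(len(MODULI)) if i != drop]
        x = crt([residues[i] for i in keep], [MODULI[i] for i in keep])
        if x < LEGITIMATE_RANGE:
            return x
    raise ValueError("uncorrectable error pattern")
```

Because each residue depends only on the data value, the redundant residues could equally well be computed and appended later in the transmission chain, which is precisely the incremental-redundancy property noted above.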

In our design, RRNS codes employing 5 bits per residue have been chosen. Three different RRNS codes having different code rates are used for protecting the three different classes of speech bits. In addition, the RRNS codes employed are also switched in accordance with the modulation modes and speech rates used in our system. In Table 10.4, we have two sets of RRNS codes, for the BPSK and 4QAM modulation modes. For the most sensitive Class I speech bits, we used an RRNS(8,4) code, which has a minimum free distance of dmin = 5 [397] and a code rate of 1/2. At the receiver, the soft metric of each received bit was calculated and soft decoding was applied. An extra information residue was added to the RRNS(8,4) code for generating the RRNS(8,5) code used for speech bit protection Class II. The extra residue enables us to apply one residue dropping [402], as well as soft decision decoding. The Class III bits are least protected, using the RRNS(8,6) code, which has a minimum free distance of dmin = 3 and a code rate of 2/3. Only soft decision decoding is applied to this code.


Figure 10.7: SEGSNR degradation versus average BER for the 4.75 kbit/s AMR codec for full-class and triple-class protection systems. When the bits of a specific class were corrupted, bits of the other classes were kept intact. (Curves: Class One (40 bits), Class Two (25 bits), Class Three (30 bits), Full Class (95 bits) and Full Class with weighting.)

10.6.2 Source-Matched Error Protection

The error sensitivity of the 4.75 kbit/s AMR codec's source bits was evaluated in Figures 10.3 and 10.5. The same procedures were applied in order to obtain the error sensitivity of the source bits of the 10.2 kbit/s AMR codec. Again, we employed RRNS channel coding in our system, and three protection classes were deemed to constitute a suitable trade-off between a moderate system complexity and a high performance. As shown in Table 10.4, three different RRNS codes having different code rates are used for protecting the three different classes of speech bits in a speech frame.
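The evaluation procedure used throughout this section - corrupting the bits of one class at a fixed BER while keeping the others intact, then measuring the SEGSNR degradation - can be sketched as follows; the codec itself is abstracted away, and the per-frame SNR clamping range is a common convention we assume rather than one stated in the text:

```python
import math

def segsnr(reference, degraded, frame_len=160, floor_db=-10.0, ceil_db=35.0):
    """Segmental SNR in dB averaged over 20 ms frames (160 samples at 8 kHz).
    Clamping each per-frame SNR to [-10, 35] dB is a common convention,
    assumed here rather than taken from the text."""
    n_frames = len(reference) // frame_len
    vals = []
    for f in range(n_frames):
        lo = f * frame_len
        num = sum(r * r for r in reference[lo:lo + frame_len])
        den = sum((r - d) ** 2 for r, d in
                  zip(reference[lo:lo + frame_len], degraded[lo:lo + frame_len]))
        if den == 0.0:
            snr = ceil_db   # error-free frame
        elif num == 0.0:
            snr = floor_db  # silent reference frame
        else:
            snr = min(max(10.0 * math.log10(num / den), floor_db), ceil_db)
        vals.append(snr)
    return sum(vals) / len(vals)

def corrupt_class(frame_bits, class_slice, ber, rng):
    """Flip the bits of one protection class at a fixed BER while leaving the
    other classes intact, as in the experiments behind Figure 10.7."""
    bits = list(frame_bits)
    for i in range(*class_slice.indices(len(bits))):
        if rng.random() < ber:
            bits[i] ^= 1
    return bits
```

The SEGSNR degradation plotted in Figures 10.7 and 10.8 is then the difference between the error-free SEGSNR and the SEGSNR measured after decoding the corrupted bit stream.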

For the 4.75 kbit/s AMR speech codec, we divided the 95 speech bits into three sensitivity classes, Class I, II and III. Class I consists of 40 bits, while Classes II and III were allocated 25 and 30 bits, respectively. We then evaluated the associated SEGSNR degradation inflicted by certain fixed channel BERs maintained in each of the classes using randomly distributed errors, while keeping the bits of the other classes intact. The resulting SEGSNR degradations are portrayed in Figure 10.7 for both the full-class and the triple-class system. It can be seen that Class I, which consists of the 40 most sensitive bits, suffers the highest SEGSNR degradation. Class II and Class III - which are populated mainly with the fixed codebook index bits - are inherently more robust to errors. Note that in the full-class scenario the associated SEGSNR degradation is higher than that of the individual protection classes. This is due to having more errors in the entire 95-bit frame at a fixed BER, compared to the individual protection classes assigned 40, 25 and 30 bits, respectively, since upon corrupting a specific class at a fixed BER the remaining classes were kept intact. Hence the BER of the individual protection classes, averaged over all 95 bits, was lower than that of the full-class scenario. For the sake of completeness, we decreased the BER of the full-class scheme so that on average the same number of errors was introduced into the individual classes as in the full-class scheme. In this scenario, it can be seen from Figure 10.7 that, as expected, the Class I scheme exhibits the highest SEGSNR degradation, while the sensitivity of the full-class scheme is mediocre.

Figure 10.8: SEGSNR degradation versus average BER for the 10.2 kbit/s AMR codec for full-class and triple-class protection systems. When the bits of a specific class were corrupted, bits of the other classes were kept intact. (Curves: Class One (60 bits), Class Two (25 bits), Class Three (119 bits), Full Class (204 bits) and Full Class with weighting.)

Similarly, the 204 bits of a speech frame in the 10.2 kbit/s AMR speech codec mode are divided into three protection classes. Class I is allocated the 60 most sensitive bits, while 25 and 119 bits are assigned to Class II and Class III, in decreasing order of error sensitivity. Their respective SEGSNR degradation results against the BER are presented in Figure 10.8. Since the number of bits in Class III is almost five times higher than in Class II, the error sensitivity of Class III appears higher than that of Class II; hence the SEGSNR degradation is higher for Class III than for Class II, as observed in Figure 10.8. This occurs due to the non-trivial task of finding appropriate channel codes to match the source sensitivities, as a result of which almost 60% of the bits are allocated to Class III. Note that after the RRNS channel coding stage an additional dummy bit is introduced in Class III, which contains 119 useful speech bits, as shown in Table 10.4. The extra bit can be used as a Cyclic Redundancy Check (CRC) bit for the purpose of error detection. Having considered the source and channel coding aspects, let us now focus our attention on the transmission issues.

10.7 Joint Detection Code Division Multiple Access

10.7.1 Overview

Joint detection receivers [404] constitute a class of multiuser receivers that were developed based on the conventional channel equalization techniques [51] used for mitigating the effects of Inter-Symbol Interference (ISI). These receivers utilize the Channel Impulse Response (CIR) estimates and the knowledge of the spreading sequences of all the users in order to reduce the level of Multiple Access Interference (MAI) in the received signal.

By concatenating the data symbols of all CDMA users successively, as though they were transmitted by one user, we can apply the principles of conventional single-user channel equalization [51] to multiuser detection. In our investigations, we have used the MMSE-BDFE proposed by Klein et al. [404], where the multiuser receiver aims to minimize the mean square error between the data estimates and the transmitted data. A feedback process is incorporated, where the previous data estimates are fed back into the receiver in order to remove the residual interference and to assist in improving the BER performance.

10.7.2 Joint Detection Based Adaptive Code Division Multiple Access

In QAM [51], n bits are grouped to form a signalling symbol and m = 2^n different symbols convey all combinations of the n bits. These m symbols are arranged in a modulation constellation to form the m-QAM scheme. In the proposed system we used the BbB-AQAM/CDMA modes of BPSK (2-QAM) and 4QAM, conveying 1 and 2 bits per symbol, respectively. However, for a given channel SNR, the BER performance degrades upon switching from BPSK to 4QAM, whilst the throughput is doubled.
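The relationship m = 2^n and the two modes used here can be illustrated with a minimal symbol mapper; the Gray mapping and the unit-energy constellation points are standard choices assumed for illustration, not taken from the text:

```python
import math

# BPSK (2-QAM) and Gray-mapped 4QAM constellations with unit average symbol
# energy. The specific point labelling is an illustrative, standard choice.
CONSTELLATIONS = {
    "BPSK": {(0,): -1 + 0j, (1,): 1 + 0j},
    "4QAM": {(0, 0): (1 + 1j) / math.sqrt(2), (0, 1): (-1 + 1j) / math.sqrt(2),
             (1, 1): (-1 - 1j) / math.sqrt(2), (1, 0): (1 - 1j) / math.sqrt(2)},
}

def modulate(bits, mode):
    """Group n bits per symbol (n = log2 m) and map them to constellation
    points, so the same bit stream yields half as many symbols in 4QAM."""
    table = CONSTELLATIONS[mode]
    n = len(next(iter(table)))  # bits per symbol
    assert len(bits) % n == 0
    return [table[tuple(bits[i:i + n])] for i in range(0, len(bits), n)]
```

For a fixed burst of bits, the 4QAM mode produces half as many symbols as BPSK, which is exactly the throughput doubling mentioned above.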

Previous research on BbB-AQAM schemes designed for TDMA transmissions has been carried out by Webb and Steele [405]; Sampei, Komaki and Morinaga [391]; Goldsmith and Chua [406]; as well as by Torrance et al. [407]. This work has been extended to wideband channels, where the received signal also suffers from ISI in addition to the amplitude and phase distortions imposed by the fading channel. The received signal strength is not a good indicator of the wideband channel's instantaneous quality, since the signal is also contaminated by ISI and co-channel interference. Wong et al. [408] proposed a wideband BbB-AQAM scheme, where a channel equalizer was used for mitigating the effects of ISI on the CIR estimate.


Parameter               Value

Channel type            COST 207 Bad Urban (BU)
Paths in channel        7
Doppler frequency       80 Hz
Spreading factor        16
Chip rate               2.167 MBaud
JD block size           26 symbols
Receiver type           MMSE-BDFE
AQAM type               Dual-mode (BPSK, 4QAM)
Channel-coded rate      8/16 kbit/s
Channel codec           Triple-class RRNS
Speech codec            AMR (ACELP)
Speech rate             4.75/10.2 kbit/s
Speech frame length     20 ms

Table 10.5: Transceiver parameters.

Here we propose to combine joint detection CDMA [404] with AQAM, by modifying the approach used by Wong et al. [408]. Joint detection is particularly suitable for combining with AQAM, since the implementation of the joint detection algorithms does not require any knowledge of the modulation mode used [374]. Hence the associated complexity is independent of the modulation mode used.

In order to choose the most appropriate BbB-AQAM/CDMA mode for transmission, the SINR at the output of the MMSE-BDFE was estimated by modifying the SINR expression given in [404], exploiting the knowledge of the transmitted signal amplitude, g, the spreading sequence and the CIR. The data bits and noise values were assumed to be uncorrelated. The average output SINR was calculated for each transmission burst of each user. The conditions used for switching between the two AQAM/JD-CDMA modes were set according to their target BER requirements as:

Mode = { BPSK, if SINR < t1
       { 4QAM, if t1 ≤ SINR,        (10.1)

where t1 represents the switching threshold between the two modes. With the system elements described, we now focus our attention on the overall performance of the proposed adaptive transceiver.
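Equation (10.1), together with the speech-rate pairing described in Section 10.5, amounts to the following simple rule; the default threshold of 10.5 dB anticipates the value used in Section 10.8:

```python
def select_mode(sinr_db, t1=10.5):
    """Burst-by-burst mode selection of Equation (10.1): the estimated SINR at
    the MMSE-BDFE output is compared against the switching threshold t1, and
    the matching AMR speech rate (in kbit/s) is chosen with the modem mode."""
    if sinr_db < t1:
        return "BPSK", 4.75
    return "4QAM", 10.2
```

In the closed-loop scheme of Figure 10.6 this decision is made at receiver A for each burst and signalled back to transmitter B.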

10.8 System Performance

The simulation parameters used in our AQAM/JD-CDMA system are listed in Table 10.5. The channel profile used was the COST 207 Bad Urban (BU) channel [409], consisting of seven paths, where each path was faded independently at a Doppler frequency of 80 Hz.

The BER performance of the proposed system is presented in Figures 10.9, 10.10 and 10.11. Specifically, Figure 10.9 portrays the BER performance using the 4QAM modulation mode and the RRNS codes of Table 10.4 for a two-user JD-CDMA speech transceiver. As seen in Table 10.4 of Section 10.6.2, three different RRNS codes having different code rates are used for protecting the three different classes of speech bits in the speech codec. The BER of the three protection classes is shown together with the average BER of the channel-coded bits versus the channel SNR. The number of bits in these protection classes was 60, 25 and 120, respectively. As expected, the Class I subchannel, being the most strongly protected, exhibits the best BER performance, followed by the Class II and Class III subchannels in decreasing order of protection. The corresponding BER results for the BPSK/JD-CDMA mode are shown in Figure 10.10.

Figure 10.9: BER performance of 4QAM/JD-CDMA over the COST 207 BU channel of Table 10.5 using the RRNS codes of Table 10.4.

In Figure 10.11, the average BER performance of the coded fixed-mode BPSK/JD-CDMA and 4QAM/JD-CDMA systems is presented, along with that of the twin-mode AQAM/JD-CDMA system supporting two users and assuming zero-latency modem mode signalling. The performance of the AQAM scheme was evaluated by analyzing the BER and the throughput, expressed in terms of the average number of Bits Per Symbol (BPS) transmitted.

Figure 10.10: BER performance of BPSK/JD-CDMA over the COST 207 BU channel of Table 10.5 using the RRNS codes of Table 10.4.

Figure 10.11: BER and BPS comparisons for fixed-mode BPSK and 4QAM, as well as for the AQAM/JD-CDMA system, using the RRNS codes of Table 10.4. The switching threshold for AQAM was set to 10.5 dB and the simulation parameters are listed in Table 10.5.

The BER curve has to be read by referring to the vertical axis at the left of the figure, while the BPS throughput curve is interpreted by referring to the vertical axis at the right, labelled BPS. At low channel SNRs the BER of the AQAM/JD-CDMA scheme mirrored that of BPSK/JD-CDMA, which can be explained using Figure 10.12, where the Probability Density Functions (PDF) of the AQAM/JD-CDMA modes are plotted versus the channel SNR. As mentioned earlier, the results were obtained using a switching threshold of 10.5 dB. We can see from the figure that at low average channel SNRs (< 6 dB) the mode switching threshold of 10.5 dB instantaneous SNR was seldom reached, and therefore BPSK/JD-CDMA was the predominant mode. Hence, the performance of the AQAM/JD-CDMA scheme was similar to that of BPSK/JD-CDMA. However, as the channel SNR increased, the BER performance of AQAM/JD-CDMA became better than that of BPSK/JD-CDMA, as shown in Figure 10.11. This is because the 4QAM mode is employed more often, reducing the probability of using BPSK, as shown in Figure 10.12. Since the mean BER of the system is the ratio of the total number of bit errors to the total number of bits transmitted, the mean BER will decrease with a decreasing number of bit errors or with an increasing number of transmitted bits. For a fixed number of transmitted symbols, the total number of transmitted bits in a frame is constant for fixed-mode BPSK/JD-CDMA, while for AQAM/JD-CDMA the total number of transmitted bits increased whenever the 4QAM/JD-CDMA mode was used. Consequently, the average BER of the AQAM/JD-CDMA system was lower than that of the fixed-mode BPSK/JD-CDMA scheme.
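The mean-BER argument above can be made concrete with a short calculation; the per-burst error counts below are illustrative numbers, not simulation results:

```python
def mean_ber(bursts):
    """Mean BER as the ratio of total bit errors to total transmitted bits.
    Each burst is a (bits_transmitted, bit_errors) pair."""
    total_bits = sum(b for b, _ in bursts)
    total_errs = sum(e for _, e in bursts)
    return total_errs / total_bits

# Illustrative: with the same error count per burst, 4QAM bursts carry twice
# the bits, so mixing in 4QAM bursts lowers the mean BER.
bpsk_only = mean_ber([(100, 1)] * 10)              # 10 errors / 1000 bits = 0.01
mixed = mean_ber([(100, 1)] * 5 + [(200, 1)] * 5)  # 10 errors / 1500 bits ~ 0.0067
assert mixed < bpsk_only
```

This is the mechanism by which the AQAM scheme's average BER drops below that of fixed-mode BPSK at medium-to-high channel SNRs.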

The BPS throughput performance curve is also plotted in Figure 10.11. As expected, the BPS throughput of both BPSK and 4QAM is constant for all channel SNR values, being limited by the modulation scheme used and by the coding rate of the RRNS codes seen in Table 10.4. For example, 4QAM conveys 2 BPS, but the associated channel code rate is 205/320, as shown in Table 10.4; hence the effective throughput of the system is 2 × 205/320 ≈ 1.28 BPS. For AQAM/JD-CDMA, we can see from Figure 10.11 that the throughput is similar to that of BPSK/JD-CDMA at low channel SNRs.

Figure 10.12: The probability of each modulation mode being chosen for transmission in a twin-mode (BPSK, 4QAM), two-user AQAM/JD-CDMA system using the parameters of Table 10.5.

However, as the average channel SNR increased, more and more frames were transmitted using 4QAM/JD-CDMA and the average throughput increased gradually. At high average SNRs, the throughput of AQAM/JD-CDMA became similar to that of the 4QAM/JD-CDMA scheme.
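The effective throughput figures quoted above follow directly from the modulation order and the overall RRNS code rates of Table 10.4, as this short calculation confirms:

```python
def effective_bps(bits_per_symbol, data_bits, coded_bits):
    """Effective throughput in bits per symbol after channel coding."""
    return bits_per_symbol * data_bits / coded_bits

# 4QAM carries 2 bits/symbol with the 205/320 overall RRNS code rate, and
# BPSK carries 1 bit/symbol with the 95/160 rate (both from Table 10.4).
bps_4qam = effective_bps(2, 205, 320)  # = 1.28125, the ~1.28 BPS quoted above
bps_bpsk = effective_bps(1, 95, 160)   # = 0.59375
```

These two constants are the horizontal BPS asymptotes of the fixed-mode curves in Figure 10.11, between which the AQAM throughput evolves with the channel SNR.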

The overall SEGSNR versus channel SNR performance of the proposed speech transceiver is displayed in Figure 10.13. Observe that the source sensitivity-matched triple-class 4.75 kbit/s BPSK/JD-CDMA system requires a channel SNR in excess of about 8 dB for nearly unimpaired speech quality over the COST 207 BU channel of Table 10.5. When the channel SNR was in excess of about 12 dB, the 10.2 kbit/s 4QAM/JD-CDMA system outperformed the 4.75 kbit/s BPSK/JD-CDMA scheme in terms of both objective and subjective speech quality. Furthermore, at channel SNRs around 10 dB, where the BPSK and 4QAM SEGSNR curves cross each other in Figure 10.13, it was preferable to use the inherently lower-quality but unimpaired mode of operation. In the light of these findings, the application of the AMR speech codec in conjunction with AQAM constitutes an attractive trade-off in terms of providing users with the best possible speech quality under arbitrary channel conditions. Specifically, the 10.2 kbit/s 4QAM/JD-CDMA scheme has a higher source bit rate and thus exhibits a higher SEGSNR under error-free conditions. The 4.75 kbit/s BPSK/JD-CDMA scheme exhibits a lower source bit rate and a correspondingly lower speech quality under error-free conditions. However, due to its less robust 4QAM modulation mode, the 10.2 kbit/s 4QAM/JD-CDMA scheme is sensitive to channel errors and breaks down under hostile channel conditions, where the 4.75 kbit/s BPSK/JD-CDMA scheme still exhibits robust operation, as illustrated in Figure 10.13.

Figure 10.13: SEGSNR versus channel SNR for the 4.75 kbit/s BPSK/JD-CDMA, the 10.2 kbit/s 4QAM/JD-CDMA and the AQAM/JD-CDMA schemes.

In the context of Figure 10.13, ideally a system is sought that achieves a SEGSNR performance following the envelope of the SEGSNR curves of the individual BPSK/JD-CDMA and 4QAM/JD-CDMA modes. The SEGSNR performance of the AQAM system is also displayed in Figure 10.13. We observe that AQAM provides a smooth evolution across the range of channel SNRs. At high channel SNRs, in excess of 12-14 dB, the system operates predominantly in the 4QAM/JD-CDMA mode. As the channel SNR degrades below 12 dB, some of the speech frames are transmitted in the BPSK/JD-CDMA mode, which implies that the lower-quality speech rate of 4.75 kbit/s is employed. This results in a slightly degraded average speech quality, while still offering a substantial SEGSNR gain compared to the fixed-mode 4.75 kbit/s BPSK/JD-CDMA scheme. At channel SNRs below 10 dB, the performance of the 10.2 kbit/s 4QAM/JD-CDMA mode deteriorates due to the occurrence of a high number of channel errors, inflicting severe SEGSNR degradations. Under these hostile conditions, the 4.75 kbit/s BPSK/JD-CDMA mode provides a more robust performance associated with a better speech quality. With the advent of the AQAM/JD-CDMA mode switching regime, the transceiver exhibits a less bursty error distribution than that of the conventional fixed-mode 4QAM modem, as can be seen in Figure 10.14, where the error events of the BPSK/JD-CDMA scheme are also displayed.

Figure 10.14: Comparison of the number of errors per frame versus the 20 ms frame index for the (a) 4QAM, (b) BPSK and (c) AQAM/JD-CDMA systems with a switching threshold of 10.5 dB, at a channel SNR of 10 dB, for 1000 frames over the COST 207 BU channel of Table 10.5.

The benefits of the proposed dual-mode transceiver are further demonstrated by Figure 10.15, consisting of three graphs plotted against the speech frame index, which give an insightful characterisation of the adaptive speech transceiver. Figure 10.15(a) shows a speech segment of 30 frames; in the AMR codec, a speech frame corresponds to a duration of 20 ms. In Figure 10.15(b), the SEGSNR versus frame index performance curves of the BPSK, 4QAM and AQAM/JD-CDMA schemes are shown, in both error-free and channel-impaired scenarios. The SINR at the output of the MMSE-BDFE is displayed in Figure 10.15(c). The adaptation of the modulation mode is also shown in Figure 10.15(c), where the transceiver switches to the BPSK or 4QAM mode according to the estimated SINR, using the switching threshold of 10.5 dB.

When transmitting in the less robust 4QAM mode using the higher-rate speech mode of 10.2 kbit/s, a sudden steep drop in the channel conditions - as portrayed at Frame 1 in Figure 10.15 - results in a high number of transmission errors, as also illustrated in Figure 10.14(a).


Figure 10.15: Characteristic waveforms of the adaptive system: (a) time-domain speech signal for frame indices between 0 and 30; (b) SEGSNR in the various transceiver modes (AQAM, 4QAM, BPSK), for the perfect channel and over the COST 207 BU channel of Table 10.5; (c) SINR versus time and the AQAM/JD-CDMA transceiver mode (BPSK or 4QAM) versus time.

This happens to occur during the period of voice onset in Figure 10.15, resulting in the corruption of the speech frame, which has the effect of inflicting impairments on subsequent frames due to the error propagation effects of various speech bits, as alluded to in Section 10.3. It can be seen in Figure 10.15 that the high number of errors inflicted in the 4QAM mode during voiced speech segments caused a severe SEGSNR degradation at frame index 10, and the 10.2 kbit/s speech codec never fully recovered until the channel conditions, expressed in terms of the SINR in Figure 10.15(c), improved. On the other hand, the significantly more robust 4.75 kbit/s BPSK/JD-CDMA scheme performed well under these hostile channel conditions, encountering a low number of errors in Figure 10.14(b), while transmitting at a lower speech rate, hence at an inherently lower speech quality. For the sake of visual clarity, the performance curves of BPSK/JD-CDMA and AQAM/JD-CDMA are not displayed in Figure 10.15(b) for the channel-impaired scenarios, since their respective graphs are almost identical to the error-free speech SEGSNR curves.

The benefits of the proposed dual-mode transceiver are also demonstrated by Figure 10.16, which shares the same arrangement of graphs as described earlier for Figure 10.15, but for a different frame index range, between 300 and 330. It can be seen in Figure 10.16 that a

Figure 10.16: Characteristic waveforms of the adaptive system. (a) Time-domain speech signal for frame indices between 300 and 330; (b) SEGSNR in the various transceiver modes; (c) SINR versus time and transceiver mode versus time over the COST207 BU channel of Table 10.5.

sudden steep drop in the channel conditions at frame index 300 during the 4QAM mode caused a severe SEGSNR degradation for the voiced speech segments, and the 10.2 kbit/s speech codec never recovered until the channel conditions improved.

10.8.1 Subjective Testing

Informal listening tests were conducted in order to assess the performance of the AQAM/JD-CDMA scheme in comparison to the fixed-mode BPSK/JD-CDMA and 4QAM/JD-CDMA schemes. It is particularly revealing to investigate how the AQAM/JD-CDMA scheme performs in the intermediate channel SNR region between 7 dB and 11 dB. The speech quality was assessed using pairwise comparison tests. The listeners were asked to express a preference between two speech files, A or B, or neither. A total of 12 listeners took part in the pairwise comparison tests. Four different utterances were employed during the listening tests, where the utterances were due to a mixture of male and female speakers having American accents. Table 10.6 details some of the results of the listening tests.

Through the listening tests we found that for the fixed-mode BPSK/JD-CDMA scheme unimpaired perceptual speech quality was achieved for channel SNRs in excess of 7 dB.

                                                           Preference
Speech Material A           Speech Material B           A (%)    B (%)    Neither (%)
4.75 kbit/s (Error free)    10.2 kbit/s (Error free)    4.15     66.65    29.2
AQAM (9 dB)                 4QAM (9 dB)                 100      0.00     0.00
AQAM (9 dB)                 4QAM (11 dB)                8.3      50.0     41.7
AQAM (9 dB)                 BPSK (9 dB)                 37.5     16.65    45.85
AQAM (12 dB)                4QAM (12 dB)                4.15     20.85    75.0
AQAM (12 dB)                4QAM (13 dB)                8.3      25.0     66.7
AQAM (12 dB)                BPSK (12 dB)                41.65    8.3      50.05

Table 10.6: Details of the listening tests conducted using the pairwise comparison method, where the listeners were given a choice of preference between two speech files coded in different transmission scenarios.

With reference to Figure 10.13, when the channel conditions degraded below 7 dB, the speech quality became objectionable due to the preponderance of channel errors. For the fixed-mode 4QAM/JD-CDMA scheme, the channel SNR threshold was 11 dB, below which the speech quality started to degrade. The perceptual performance of AQAM/JD-CDMA was found to be superior to that of 4QAM/JD-CDMA at channel SNRs below 11 dB. Specifically, it can be observed from Table 10.6 that all the listeners preferred the AQAM/JD-CDMA scheme at a channel SNR of 9 dB due to the associated high concentration of channel errors in the less robust 4QAM/JD-CDMA scheme at the same channel SNR, resulting in a perceptually degraded reconstructed speech quality.

More explicitly, we opted for investigating the AQAM/JD-CDMA scheme at a channel SNR of 9 dB, since - as shown in Figure 10.12 - this SNR value falls in the transitory region between BPSK/JD-CDMA and 4QAM/JD-CDMA. As the channel conditions improved to an SNR in excess of 11 dB, the 4QAM/JD-CDMA scheme performed slightly better than AQAM/JD-CDMA due to its inherently higher SEGSNR performance under error-free conditions. Nonetheless, the AQAM/JD-CDMA scheme provided a good perceptual performance, as exemplified in Table 10.6 at a channel SNR of 12 dB, in comparison to the 4QAM/JD-CDMA scheme at channel SNRs of both 12 dB and 13 dB. Here, only about twenty percent of the listeners preferred the 4QAM/JD-CDMA scheme to the AQAM/JD-CDMA scheme, while the rest suggested that both sounded very similar. It can also be observed from Table 10.6 that the AQAM/JD-CDMA scheme performed better than BPSK/JD-CDMA for channel SNRs of 7 dB and above, while in the region below 7 dB AQAM/JD-CDMA had a perceptual performance similar to that of BPSK/JD-CDMA. As shown in Table 10.7, we found that changing the mode switching frequency to every 1, 10 or 100 frames does not impair the speech quality, either in objective SEGSNR terms or in terms of informal listening tests.

10.9 A Turbo-Detected Unequal Error Protection Irregular Convolutional Coded AMR Transceiver

J. Wang, N. S. Othman, J. Kliewer, L. L. Yang and L. Hanzo

Frame Switching Frequency    SEGSNR (dB)
1                            11.38
10                           11.66
100                          11.68

Table 10.7: Frame switching frequency versus SEGSNR.

10.9.1 Motivation

Since the different bits of multimedia information, such as speech and video, have different error sensitivities, efficient unequal-protection channel coding schemes have to be used to ensure that the perceptually more important bits benefit from more powerful protection. Furthermore, in the context of turbo detection the channel codes should also match the characteristics of the channel for the sake of attaining a good convergence performance. In this section we address this design dilemma by using irregular convolutional codes (IRCCs), which constitute a family of different-rate subcodes. We benefit from the high design flexibility of IRCCs, and hence excellent convergence properties are maintained while providing unequal error protection capabilities matched to the requirements of the source. An EXIT chart based design procedure is proposed and used in the context of protecting the different-sensitivity speech bits of the wideband AMR speech codec. As a benefit, the unequal-protection system using IRCCs exhibits an SNR advantage of about 0.4 dB over the equal-protection system employing regular convolutional codes, when communicating over a Gaussian channel. We will also demonstrate that IRCCs exhibit excellent convergence properties in the context of iterative decoding, whilst having an unequal error protection capability, which is exploited in this contribution to protect the different-sensitivity speech bits of the wideband AMR speech codec. In terms of the attainable SegSNR, the unequal-protection system exhibits an advantage of about 0.3 dB over the equal-protection system, when communicating over a Gaussian channel.

Source encoded information sources, such as speech, audio or video, typically exhibit a non-uniform error sensitivity, where the effect of a channel error may vary significantly from one bit to another [410, 411]. Hence unequal error protection (UEP) is applied to ensure that the perceptually more important bits benefit from more powerful protection. In [340], the speech bits were protected by a family of Rate-Compatible Punctured Convolutional (RCPC) codes [341] whose error protection capabilities had been matched to the bit-sensitivity of the speech codec. Different-rate RCPC codes were obtained by puncturing the same mother code, while satisfying the rate-compatibility restriction. However, they were not designed in the context of turbo detection. Other schemes using a serially concatenated system and turbo processing were proposed in [342, 343], where the UEP was provided by two different-rate convolutional codes.

Recently, Tüchler et al. [344, 345] studied the construction of irregular convolutional codes (IRCCs) and proposed several design criteria. These IRCCs consisted of a family of convolutional codes having different code rates and were specifically designed with the aid of extrinsic information transfer (EXIT) charts [346], for the sake of improving the convergence behaviour of iteratively decoded serially concatenated systems. In general, EXIT chart analysis assumes a long interleaver block length. However, it was shown in [345]

that by using an appropriate optimization criterion, the concatenated system is capable of performing well even for short interleaver block lengths. Since the constituent codes have different coding rates, the resultant IRCC is capable of providing unequal error protection (UEP).

A novel element of this section is that UEP and EXIT chart based code optimization are jointly carried out and successfully applied for improving the achievable robustness of speech transmission. We propose a serially concatenated turbo transceiver using an IRCC as the outer code for the transmission of Adaptive Multi-Rate Wideband (AMR-WB) coded speech. Rather than being decoded separately, the constituent codes of the IRCC are decoded jointly and iteratively by exchanging extrinsic information with the inner code. The IRCC is optimized to match the characteristics of both the speech source codec and those of the channel, so that UEP is achieved while maximizing the attainable iteration gain.

In contrast to the error sensitivity of the narrowband AMR codec characterized in Section 10.3, that of the AMR-WB speech codec will be characterized in Section 10.9.2, while our system model will be introduced in Section 10.9.3, followed by Section 10.9.4, which describes the design procedure of IRCCs. An IRCC design example is provided in Section 10.9.5. Our performance results are presented in Section 10.9.6, while Section 10.9.7 concludes the section.

10.9.2 The AMR-WB Codec’s Error Sensitivity

The AMR-WB speech codec is capable of supporting bit rates varying from 6.6 to 23.85 kbit/s and it has become a 3GPP and ITU-T standard, which provides a superior speech quality in comparison to conventional telephone-bandwidth voice codecs [347]. Each AMR-WB frame represents 20 ms of speech, producing 317 bits at a bit rate of 15.85 kbit/s plus 23 bits of header information per frame. The codec parameters in each frame include the so-called immittance spectrum pairs (ISPs), the adaptive codebook delay (pitch delay), the algebraic codebook excitation index and the jointly vector quantized pitch gains as well as algebraic codebook gains.

Most source coded bitstreams contain certain bits that are more sensitive to transmission errors than others. A common approach to quantifying the sensitivity of a given bit is to consistently invert this bit in every speech frame and evaluate the associated segmental SNR (SegSNR) degradation [410]. The error sensitivity of the various encoded bits of the AMR-WB codec determined in this way is shown in Fig. 10.17. The results are based on speech samples taken from the EBU SQAM (Sound Quality Assessment Material) CD, sampled at 16 kHz and encoded at 15.85 kbit/s. It can be observed that the bits representing the ISPs, the adaptive codebook delay, the algebraic codebook index and the vector quantized gain are fairly error sensitive. By contrast, the least sensitive bits are related to the fixed codebook's excitation pulse positions. Statistically, about 10% (35/340) of the bits in a speech frame will cause a SegSNR degradation in excess of 10 dB, and about 8% (28/340) of the bits will inflict a degradation between 5 and 10 dB. Furthermore, the error-free reception of the 7% (23/340) of header information bits is in general crucial for the adequate detection of speech.
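The bit-inversion procedure used to obtain Figure 10.17 can be expressed as a short loop. In the sketch below, the `encode`, `decode` and `segsnr` callables are hypothetical stand-ins for a real AMR-WB implementation; only the evaluation method itself - invert one bit position consistently in every frame and measure the SegSNR loss - follows the text.

```python
def bit_sensitivity(frames, encode, decode, segsnr, bits_per_frame):
    """SegSNR degradation (dB) caused by consistently corrupting each bit index.

    `encode` maps a speech frame to a mutable list of bits, `decode` maps the
    bits back to a frame, and `segsnr(reference_frames, decoded_frames)`
    returns the segmental SNR in dB; all three are assumed interfaces.
    """
    clean = [decode(encode(f)) for f in frames]
    baseline = segsnr(frames, clean)
    degradation = []
    for pos in range(bits_per_frame):
        corrupted = []
        for f in frames:
            bits = encode(f)
            bits[pos] ^= 1                 # invert this bit in every frame
            corrupted.append(decode(bits))
        degradation.append(baseline - segsnr(frames, corrupted))
    return degradation
```

Sorting the returned list in descending order then yields the sensitivity ordering used later for rearranging the bits before channel encoding.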

Figure 10.17: SegSNR degradations versus bit index due to inflicting 100% BER in the 317-bit, 20 ms AMR-WB frame. (The panels cover the ISP1-ISP7 bits, followed in each subframe by the adaptive codebook index, the fixed codebook index and the VQ gain bits.)

10.9.3 System Model

Fig. 10.18 shows the system's schematic diagram. At the transmitter, each K-bit speech frame is protected by a serially concatenated channel code consisting of an outer code (Encoder I) and an inner code (Encoder II) before transmission over the channel, resulting in an overall coding rate of R. At the receiver, iterative decoding is performed by means of extrinsic information exchange between the inner code (Decoder II) and the outer code (Decoder I). Both decoders employ the a posteriori probability (APP) decoding algorithm, e.g., the BCJR algorithm [348]. After F iterations, the speech decoder is invoked in order to reconstruct the speech frame.

According to the design rules of [349], the inner code of a serially concatenated system should be recursive to enable an interleaver gain. Furthermore, it has been shown in [350] that for binary erasure channels (BECs) and block lengths tending to infinity, the inner code should have rate 1 in order to achieve capacity. Experiments have shown that this approximately holds also for AWGN channels [344, 345]. For the sake of simplicity, we opted for employing a memory-1 recursive convolutional code having a generator polynomial of 1/(1 + D), which is actually

Figure 10.18: System model. (Transmitter: speech encoder, outer Encoder I, interleaver Π, inner Encoder II, with channel noise n; receiver: inner Decoder II and outer Decoder I exchanging extrinsic information via Π and Π−1, followed by the speech decoder.)

a simple accumulator. Hence the decoding complexity of the inner code is extremely low. In the proposed system, we use an IRCC as the outer code, while in the benchmarker system, we use a regular non-systematic convolutional (NSC) code as the outer code. BPSK modulation and an AWGN channel are assumed.
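The 1/(1 + D) inner code mentioned above really is just a running XOR, which a few lines make concrete; the function names below are our own.

```python
def accumulate(bits):
    """Rate-1 recursive encoding y[n] = x[n] XOR y[n-1], i.e. the 1/(1+D) code."""
    out, state = [], 0
    for b in bits:
        state ^= b
        out.append(state)
    return out

def deaccumulate(bits):
    """The inverse 1+D operation: x[n] = y[n] XOR y[n-1]."""
    out, prev = [], 0
    for b in bits:
        out.append(b ^ prev)
        prev = b
    return out

msg = [1, 0, 1, 1, 0, 1]
assert deaccumulate(accumulate(msg)) == msg
print(accumulate(msg))  # -> [1, 1, 0, 1, 1, 0]
```

The decoding complexity is low precisely because the corresponding trellis has only two states.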

10.9.4 Design of Irregular Convolutional Codes

An IRCC is constructed from a family of P subcodes. First, a rate-r convolutional mother code C1 is selected, and the (P − 1) other subcodes Ck of rate rk > r are obtained by puncturing. Let L denote the total number of encoded bits generated from the K input information bits. Each subcode encodes a fraction of αk rk L information bits and generates αk L encoded bits. Given the target code rate of R ∈ [0, 1], the weighting coefficients αk have to satisfy:

1 = ∑_{k=1}^{P} αk,    R = ∑_{k=1}^{P} αk rk,    and    αk ∈ [0, 1], ∀k.        (10.2)

For example, in [345] a family of P = 17 subcodes was constructed from a systematic, rate-1/2, memory-4 mother code defined by the generator polynomial (1, g1/g0), where g0 = 1 + D + D^4 is the feedback polynomial and g1 = 1 + D^2 + D^3 + D^4 is the feedforward one. Higher code rates may be obtained by puncturing, while lower rates are created by adding more generators and by puncturing under the constraint of maximizing the achievable free distance. The two additional generators used are g2 = 1 + D + D^2 + D^4 and g3 = 1 + D + D^3 + D^4. The resultant 17 subcodes have coding rates spanning 0.1, 0.15, 0.2, · · · , 0.9.
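The constraints of eq. (10.2) are easy to sanity-check numerically. The sketch below is illustrative: the helper name and the example weight vector are ours; only the 0.1, 0.15, ..., 0.9 rate family and the constraint set come from the text.

```python
def check_ircc_weights(alpha, rates, R, tol=1e-9):
    """Verify the eq. (10.2) constraints on an IRCC weight vector."""
    assert len(alpha) == len(rates)
    assert all(0.0 <= a <= 1.0 for a in alpha)            # alpha_k in [0, 1]
    assert abs(sum(alpha) - 1.0) <= tol                   # weights sum to one
    coded_rate = sum(a * r for a, r in zip(alpha, rates))
    assert abs(coded_rate - R) <= tol                     # rate-weighted sum is R

# The 17 subcode rates of [345]: 0.1, 0.15, ..., 0.9.
rates = [0.1 + 0.05 * k for k in range(17)]
# A hypothetical mixture: half the coded bits at rate 0.3, half at rate 0.5.
alpha = [0.0] * 17
alpha[4], alpha[8] = 0.5, 0.5
check_ircc_weights(alpha, rates, R=0.4)   # 0.5*0.3 + 0.5*0.5 = 0.4
```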

The IRCC thus constructed has the advantage that the decoding of all subcodes may be performed using the same mother code trellis, except that at the beginning of each block of αk rk L trellis sections corresponding to the subcode Ck, the puncturing pattern has to be restarted. Trellis termination is necessary only after all of the K information bits have been

encoded.

We now optimize the iterative receiver by means of EXIT charts [346], which are capable of predicting the performance of an iterative receiver by examining the extrinsic information transfer function of each of the component devices independently.

For the outer decoder (Decoder I), denote the mutual information between the a priori input A and the transmitted code bits C as IA1 = I(C; A), while the mutual information between the extrinsic output E and the transmitted code bits C is denoted as IE1 = I(C; E). Then the transfer function of Decoder I can be defined as:

IE1 = TI(IA1), (10.3)

which maps the input variable IA1 to the output variable IE1. Similarly, for the inner decoder (Decoder II), we denote the mutual information between the a priori input A and the transmitted information bits X as IA2 = I(X; A). Furthermore, we denote the mutual information between the extrinsic output E and the transmitted information bits X as IE2 = I(X; E). Note that the extrinsic output of the inner code also depends on the channel SNR or Eb/N0. Hence the transfer function of the inner code is defined as

IE2 = TII(IA2, Eb/N0). (10.4)

The transfer functions can be obtained by using the histogram-based LLR measurements proposed in [346] or the simplified method proposed in [351].

When using IRCCs, the transfer function of an IRCC can be obtained from those of its subcodes. Denote the transfer function of subcode k as TI,k(i). Assuming that the trellis fractions of the subcodes do not significantly interfere with each other, which might change the associated transfer characteristics, the transfer function TI(i) of the target IRCC is the weighted superposition of the transfer functions TI,k(i) [345], yielding:

TI(i) = ∑_{k=1}^{P} αk TI,k(i).        (10.5)

Note that in iterative decoding, the extrinsic output E2 of Decoder II becomes the a priori input A1 of Decoder I and vice versa. Given the transfer function TII(i, Eb/N0) of the inner code and that of the outer code, TI(i), the extrinsic information IE1 at the output of Decoder I after the ith iteration can be calculated using the recursion of:

µi = TI(TII(µi−1, Eb/N0)), i = 1, 2, . . . , (10.6)

with µ0 = 0, i.e., assuming the absence of apriori input for Decoder II at the commencementof iterations.

Generally, interactive speech communication systems require a low delay, and hence a short interleaver block length; the number of iterations of the iterative decoder is also limited owing to complexity constraints. It has been found [345] that EXIT charts may provide a reasonable convergence prediction for the first couple of iterations, even in the case of short block lengths. Hence, we fixed the transfer function of the inner code for a given Eb/N0

value, yielding TII(i) = TII(i, Eb/N0), and optimized the weighting coefficients αk of the outer IRCC for the sake of obtaining a transfer function TI(i) that specifically maximizes the

extrinsic output after exactly F iterations [345], which is formulated as:

maximize µi = TI(TII(µi−1)), i = 1, 2, . . . , F, (10.7)

with µ0 = 0.

Additionally, considering the non-uniform error sensitivity of the speech source bits characterized in Figure 10.17, we may intentionally enhance the protection of the more sensitive source data bits by using strong subcodes, thus imposing the source constraints of:

∑_{k=k1}^{k2} αk rk / R ≥ x%,    1 ≤ k1 ≤ k2 ≤ P,    0 ≤ x ≤ 100,        (10.8)

which implies that the percentage of the speech source bits protected by the subcodes k1 to k2 is at least x%.

Finally, our task is to find a weight vector α = [α1, α2, · · · , αP ]^T, so that eq. (10.7) is maximized, while satisfying the constraints of eq. (10.2) and eq. (10.8). This optimization problem can be solved by slightly modifying the procedure proposed in [345], as will be illustrated by the following example.
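To make the nature of this optimization concrete, the toy below searches only over mixtures of two subcodes - for which the constraints of eq. (10.2) fix the weights in closed form - and keeps the mixture maximizing the µF of eq. (10.7). This is a deliberately simplified stand-in: the actual procedure of [345] optimizes arbitrary mixtures, and the subcode transfer functions used here are invented.

```python
import itertools

def mu_after_F(mix, T_inner, F):
    """Extrinsic output after F iterations for a weighted subcode mixture."""
    mu = 0.0
    for _ in range(F):
        i = T_inner(mu)
        mu = sum(w * T(i) for w, T in mix)   # T_I is the weighted superposition
    return mu

def best_two_code_mix(rates, T_subs, T_inner, R, F=6):
    best = (-1.0, None)
    for a, b in itertools.combinations(range(len(rates)), 2):
        wa = (R - rates[b]) / (rates[a] - rates[b])  # solves eq. (10.2) for 2 codes
        if not 0.0 <= wa <= 1.0:
            continue
        mix = [(wa, T_subs[a]), (1.0 - wa, T_subs[b])]
        best = max(best, (mu_after_F(mix, T_inner, F), (a, b, wa)))
    return best

# Invented transfer functions: stronger (lower-rate) subcodes rise faster.
rates = [0.25, 0.5, 0.75]
T_subs = [lambda i, r=r: min(1.0, i ** r) for r in rates]
T_inner = lambda i: 0.4 + 0.55 * i

mu6, (a, b, wa) = best_two_code_mix(rates, T_subs, T_inner, R=0.5)
# The half-and-half mixture of the 0.25- and 0.75-rate codes beats the pure
# rate-0.5 code, mirroring how an IRCC mixture can outperform a single code.
```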

10.9.5 An Example Irregular Convolutional Code

We assume the overall system coding rate to be R = 0.5. As stated in Section 10.9.3, the inner code has a unitary code rate, hence all the redundancy is assigned to the outer code. We use a half-rate, memory-4, maximum free distance NSC code having the generator polynomials g0 = 1 + D + D^2 + D^4 and g1 = 1 + D^3 + D^4. The extrinsic information transfer functions of the inner code and the outer NSC code are shown in Fig. 10.19. It can be seen that the minimum convergence SNR threshold for the benchmarker system using the NSC outer code is about 1.2 dB, although we note that these curves are based on the assumption of having an infinite interleaver length and a Gaussian Log Likelihood Ratio (LLR) distribution. In the case of short block lengths, the actual SNR convergence threshold might be higher.
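A compact encoder for this benchmarker NSC code can be written as follows; the bit-mask representation of the generators (bit k of the mask selects the input delayed by k) is our own convention, and the code is non-systematic, emitting two parity bits per input bit.

```python
G0 = 0b10111  # taps of g0 = 1 + D + D^2 + D^4
G1 = 0b11001  # taps of g1 = 1 + D^3 + D^4

def nsc_encode(bits):
    """Rate-1/2 memory-4 NSC encoding: two output bits per input bit."""
    state, out = 0, []
    for b in bits:
        state = ((state << 1) | b) & 0b11111   # current input plus 4 past inputs
        out.append(bin(state & G0).count("1") % 2)
        out.append(bin(state & G1).count("1") % 2)
    return out

print(nsc_encode([1, 0, 0, 0, 0]))  # -> [1, 1, 1, 0, 1, 0, 0, 1, 1, 1]
```

Feeding in a single 1 followed by zeros, as above, exposes the impulse responses of the two generators interleaved in the output stream.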

Hence, when constructing the IRCC, we chose the target inner code transfer function TII(i) at Eb/N0 = 1.5 dB and the number of iterations F = 6. For the constituent subcodes, we used those proposed in [345], except that code rates of rk > 0.75 were excluded from our design for the sake of avoiding significant error floors. The resultant code rates of the subcodes span the range r1 = 0.1, r2 = 0.15, · · · , r14 = 0.75.

Initially the source constraint of eq. (10.8) was not imposed. By using the optimization procedure of [345], we arrive at the weight vector α0 = [0 0 0 0 0.01 0.13 0.18 0.19 0.14 0.12 0.10 0.01 0.03 0.10]^T, and the percentage of the input speech data bits protected by the different subcodes becomes [0, 0, 0, 0, 0.6%, 9.0%, 14.4%, 16.7%, 14.0%, 13.0%, 11.5%, 1.6%, 4.2%, 15.0%]^T. The extrinsic output of Decoder I after 6 iterations becomes µ6 = 0.98.
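The quoted bit fractions follow directly from the weights: subcode k protects the fraction αk rk / R of the source bits. The check below uses the α0 of the text; since its entries are printed rounded to two decimals, the recomputed percentages only approximately reproduce the quoted ones.

```python
alpha0 = [0, 0, 0, 0, 0.01, 0.13, 0.18, 0.19, 0.14,
          0.12, 0.10, 0.01, 0.03, 0.10]
rates = [0.1 + 0.05 * k for k in range(14)]   # r_1 = 0.1, ..., r_14 = 0.75
R = 0.5

# Fraction of source (speech) bits handled by each subcode: alpha_k * r_k / R.
fractions = [a * r / R for a, r in zip(alpha0, rates)]
print([round(100 * f, 1) for f in fractions])
# e.g. the r_5 = 0.3 subcode protects 0.01 * 0.3 / 0.5 = 0.6% of the bits and
# the r_7 = 0.4 subcode protects 0.18 * 0.4 / 0.5 = 14.4%, as quoted above.
```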

Observe in the context of the vector containing the corresponding speech bit fractions that only 0.6% of the source bits are protected by the r5 = 0.3-rate subcode, whereas a total of 23.4% of the speech bits is protected by the r6 = 0.35 and r7 = 0.4-rate subcodes. In order to enhance the protection of the more sensitive speech bits, we now impose the source constraint of eq. (10.8) by requiring all the header information bits in a

Figure 10.19: Extrinsic information transfer functions of the outer NSC code and the designed IRCC, as well as those of the inner code at Eb/N0 = 1.2, 1.5 and 2 dB.

speech frame to be protected by the relatively strong r5 = 0.3-rate subcode. More explicitly, we impose the constraint of α5 r5/0.5 ≥ 7%, resulting in a new weight vector of α1 = [0 0 0 0 0.12 0.06 0.14 0.16 0.13 0.12 0.10 0.02 0.04 0.11]^T, and the new vector of speech bit fractions becomes [0, 0, 0, 0, 7.1%, 4.0%, 10.9%, 14.8%, 13.5%, 13.3%, 12.2%, 2.7%, 5.5%, 16%]^T. The extrinsic output after 6 iterations is now slightly reduced to µ6 = 0.97, which is close to the maximum value of 0.98. Furthermore, now 14.9% of the speech bits is protected by the r6 = 0.35 and r7 = 0.4-rate subcodes.

The extrinsic information transfer function of this IRCC is also shown in Fig. 10.19. As seen from the EXIT chart, the convergence SNR threshold of the system using the IRCC is lower than 1.2 dB, and there is a wider EXIT chart tunnel between the inner code's curve and the outer code's curve, particularly so at the low IA values routinely encountered during the first couple of iterations. Hence, given a limited number of iterations, the system using the IRCC may be expected to perform better than that using the NSC outer code in the range of Eb/N0 = 1.5∼2 dB.

10.9.6 UEP AMR IRCC Performance Results

Finally, the achievable system performance was evaluated for K = 340 speech bits per 20 ms transmission frame, resulting in an interleaver length of L = 688 bits, including 8 tail bits. This wideband AMR speech coded frame [347] was generated at a bit rate of

15.85 kbit/s in the codec's mode 4. Before channel encoding, each frame of speech bits is rearranged according to the descending order of the error sensitivity of the bits by considering Figure 10.17, so that the more important data bits are protected by stronger IRCC subcodes. An S-random interleaver [352] was employed with S = 15, where all of the subcodes' bits are interleaved together, and 10 iterations were performed by the iterative decoder.
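An S-random interleaver spreads adjacent bits: any two indices appearing within S positions of each other in the permuted order differ by more than S. A simple rejection-based construction - one common way of building such interleavers, with the details being our own - is sketched below for the L = 688, S = 15 parameters of the text.

```python
import random

def s_random_interleaver(length, S, max_attempts=1000, seed=0):
    """Build a length-`length` permutation with spreading factor S."""
    rng = random.Random(seed)
    for _ in range(max_attempts):
        remaining = list(range(length))
        rng.shuffle(remaining)
        perm = []
        while remaining:
            for j, cand in enumerate(remaining):
                # Accept only if the candidate is more than S away from every
                # index placed in the previous S positions.
                if all(abs(cand - p) > S for p in perm[-S:]):
                    perm.append(remaining.pop(j))
                    break
            else:
                break                      # dead end: restart with a new shuffle
        if not remaining:
            return perm
    raise RuntimeError("no S-random interleaver found; try a smaller S")

perm = s_random_interleaver(688, 15)
assert sorted(perm) == list(range(688))
```

Spreading factors up to roughly the square root of half the block length are usually achievable with such rejection sampling, which is comfortably satisfied by S = 15 for L = 688.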

The BER performance of the UEP system using IRCCs and that of the Equal Error Protection (EEP) benchmarker system using the NSC code are depicted in Fig. 10.20. It can be seen that the UEP system outperforms the EEP system in the range of Eb/N0 = 1.5 ∼ 2.5 dB, which matches our performance prediction inferred from the EXIT chart analysis of Section 10.9.4.

Figure 10.20: BER performance of both the UEP system employing the IRCC and the EEP system using the NSC code, recorded after iterations 1, 6 and 10 over the range of Eb/N0 = 1.5 to 4.5 dB.

The actual decoding trajectories of both the UEP system and the EEP system recorded at Eb/N0 = 1.5 and 2 dB are shown in Fig. 10.21 and Fig. 10.22, respectively. These were obtained by measuring the evolution of the mutual information at the input and output of both the inner decoder and the outer decoder as the iterative decoding algorithm was simulated. Due to the relatively short interleaver block length of 688 bits, the actual decoding trajectories do not closely follow the transfer functions, especially when increasing the number of iterations. Nonetheless, the UEP system does benefit from having a wider open tunnel during the first couple of iterations, and hence it is capable of reaching a higher extrinsic output in the end, resulting in a lower BER.

The BER profiles of the UEP system at Eb/N0 = 1.5, 2 and 2.5 dB are plotted in Fig. 10.23. As intended, different fractions of the speech frame benefited from different degrees of IRCC-aided protection. The first 60 bits represent the header information bits and the most sensitive speech bits, which require the lowest BER.

The SegSNR performances of both the UEP and the EEP system are depicted in Fig. 10.24. The UEP system is seen to outperform the EEP system at Eb/N0 ≤ 2.5 dB. Above this Eb/N0 point, the two systems attain almost the same SegSNRs. To achieve a good speech

Figure 10.21: The EXIT chart and the simulated decoding trajectories of the UEP system using our IRCC as the outer code and a rate-1 recursive code as the inner code at both Eb/N0 = 1.5 and 2 dB.

quality associated with a SegSNR > 9 dB, the UEP system requires Eb/N0 ≥ 2 dB, about 0.3 dB less than the EEP system.
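The SegSNR figure of merit quoted here is the per-frame SNR averaged in the dB domain, which weights quiet segments as heavily as loud ones. A sketch follows; the clamping limits and the frame length are common conventions, assumed here rather than taken from the text.

```python
import math

def segsnr_db(reference, decoded, frame_len=320, lo=-10.0, hi=35.0):
    """Average per-frame SNR in dB; 320 samples correspond to 20 ms at 16 kHz."""
    snrs = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        sig = sum(x * x for x in ref)
        err = sum((x - y) ** 2 for x, y in zip(ref, dec))
        if sig <= 0.0:
            continue                       # skip silent frames
        snr = hi if err == 0.0 else 10.0 * math.log10(sig / err)
        snrs.append(min(hi, max(lo, snr)))   # clamp each frame's SNR
    return sum(snrs) / len(snrs) if snrs else 0.0
```

Scaling the decoded signal by 0.9, for instance, leaves a residual of one tenth of the amplitude in every frame and hence a SegSNR of 20 dB.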

10.9.7 UEP AMR Conclusions

In Figure 10.17 of Section 10.9.2 we briefly exemplified the error sensitivity of the AMR-WB codec and then investigated the application of IRCCs for the sake of providing UEP for the AMR-WB speech codec. The IRCCs were optimized with the aid of EXIT charts and the design procedure used was illustrated with the aid of an example.

In the design of IRCCs, we aimed for matching the extrinsic information transfer function of the outer IRCC to that of the inner code, where that of the latter is largely determined by the channel SNR. At the same time, we imposed certain source constraints determined by the error sensitivity of the AMR-WB source bits. Hence the design method proposed here may be viewed as an attractive joint source/channel codec optimization.

The concatenated system using an IRCC benefits from having a low convergence SNR threshold. Owing to its design flexibility, various transfer functions can be obtained for an IRCC. We have shown that our IRCC was capable of achieving a better convergence than a regular NSC code having the same constraint length and code rate. Hence the system using IRCCs has the potential of outperforming the corresponding arrangement using regular NSC


Figure 10.22: The EXIT chart and the simulated decoding trajectories of the EEP system using our NSC code as the outer code and a rate-1 recursive code as the inner code at both Eb/N0 = 1.5 and 2 dB.


Figure 10.23: Bit error rate of the different speech bits after ten iterations at Eb/N0 = 1.5, 2 and 2.5 dB, recorded by transmitting 10^5 speech frames.



Figure 10.24: Comparison of SegSNRs of the AMR-WB speech codec using both EEP and UEP

codes in the low SNR region.

Furthermore, IRCCs are capable of providing UEP, since an IRCC is constituted by various subcodes having different code rates and hence different error protection capabilities. Multimedia source information, such as speech, audio and video, can benefit from this property when the IRCC is carefully designed to match the source's bit sensitivity. Our future research aims for exchanging soft speech bits between the speech and channel decoders.

It is worth noting that an ISI channel can also be viewed as a rate-1 convolutional code, and the transfer function of an equalizer for a precoded ISI channel [353] is similar to that of the inner code here. Hence the proposed design method can be easily extended to ISI channels.

10.10 Chapter Summary

In Section 10.2 the various components of the narrowband AMR codec were discussed. The error sensitivity of the narrowband AMR speech codec was characterised in Section 10.3, in order to match various channel codecs to the different-sensitivity bits of the speech codec. Specifically, we have shown that some bits in both the narrowband and wideband AMR codecs are more sensitive to channel errors than others and hence require different grades of protection by channel coding. The error propagation properties of different bits over consecutive speech frames have also been characterized. We have shown how the degradations produced by errors propagate from one speech frame to the next and hence may persist over consecutive speech frames, especially when the LSFs or the adaptive codebook delay bits were corrupted.

In Section 10.4, a joint-detection assisted near-instantaneously adaptive CDMA speech transceiver was designed, which allows us to switch between a set of different source and channel codec modes as well as transmission parameters, depending on the overall instantaneous channel quality. The 4.75 kbit/s and 10.2 kbit/s speech modes of the AMR codec have been employed in conjunction with the novel family of RRNS based channel coding, using


the reconfigurable BPSK or 4QAM based JD-CDMA scheme. In Section 10.6.2, the speech bits were mapped into three different protection classes according to their respective error sensitivities. In Section 10.8 the benefits of the multimode speech transceiver clearly manifested themselves in terms of supporting unimpaired speech quality under time-variant channel conditions, where a fixed-mode transceiver's quality would become severely degraded by the channel effects. The benefits of our dual-mode transceiver were further demonstrated with the aid of the characteristic waveforms displayed in Figures 10.15 and 10.16. Our AQAM/JD-CDMA scheme achieved the best compromise between unimpaired error-free speech quality and robustness, which has been verified by our informal listening tests shown in Table 10.6.

In Section 10.9 the wideband AMR codec was investigated and in Figure 10.17 we briefly exemplified the error sensitivity of the AMR-WB codec. Then IRCCs were invoked for the sake of providing UEP for the AMR-WB speech codec, and these were optimized with the aid of the novel tool of EXIT charts. More specifically, we aimed for matching the EXIT transfer function of the outer IRCC to that of the inner code and we additionally imposed certain source constraints determined by the error sensitivity of the AMR-WB source bits. This design procedure may be readily extended to other joint source and channel coding schemes for the sake of attaining a near-capacity performance.


Chapter 11

MPEG-4 Audio Compression and Transmission

H-T. How and L. Hanzo

11.1 Overview of MPEG-4 Audio

The Moving Picture Experts Group (MPEG) was first established by the International Organization for Standardization (ISO) in 1988 with the aim of developing a full audio-visual coding standard referred to as MPEG-1 [30–32]. The audio-related section of MPEG-1 was designed to encode digital stereo sound at a total bit rate of 1.4 to 1.5 Mbps - depending on the sampling frequency, which was 44.1 kHz or 48 kHz - down to a few hundred kilobits per second [33]. The MPEG-1 standard is structured in layers, from Layer I to III. The higher layers achieve a higher compression ratio, albeit at an increased complexity. Layer I achieves perceptual transparency, i.e. subjective equivalence with the uncompressed original audio signal, at 384 kbit/s, while Layers II and III achieve a similar subjective quality at 256 kbit/s and 192 kbit/s, respectively [34–38].
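For reference, the compression ratios implied by these figures relative to CD-quality stereo PCM (44.1 kHz, 16 bits per sample, two channels) are easily computed:

```python
# Compression ratios implied by the Layer I-III bit rates quoted above,
# relative to CD-quality stereo PCM.
fs, bits, channels = 44_100, 16, 2
pcm_rate = fs * bits * channels          # 1_411_200 bit/s, i.e. ~1.4 Mbps

for layer, rate in (("Layer I", 384_000),
                    ("Layer II", 256_000),
                    ("Layer III", 192_000)):
    print(f"{layer}: {rate // 1000} kbit/s, ratio {pcm_rate / rate:.2f}:1")
```

This is why Layer III (MP3) is commonly described as offering roughly 7:1 compression at near-transparent quality.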

MPEG-1 was approved in November 1992 and its Layer I and II versions were immediately employed in practical systems. However, the MPEG Audio Layer III, or MP3 for short, only became a practical reality a few years later, when multimedia PCs were introduced having improved processing capabilities and the emerging Internet sparked off a proliferation of MP3-compressed teletraffic. This changed the face of the music world and the distribution of music. The MPEG-2 backward compatible audio standard was approved in 1994 [39], providing an improved technology that would allow those who had already launched MPEG-1 stereo audio services to upgrade their system to multichannel mode, optionally also supporting a higher number of channels at a higher compression ratio. Potential applications of the multichannel mode are in the field of quadraphonic music distribution or cinemas. Furthermore, lower sampling frequencies were also incorporated, namely 16, 22.05, 24, 32, 44.1 and 48 kHz [39]. Concurrently, MPEG commenced research into even higher-compression schemes,



Figure 11.1: MPEG-4 framework [41]. [The figure arranges the parametric codecs (HVXC and HILN), the CELP codec, the T/F codec and the scalable codec along a bit rate axis spanning 2–64 kbit/s, against typical audio bandwidths of 4, 8 and 20 kHz and application areas such as secure communications, UMTS/cellular, ISDN, the Internet and satellite links.]

relinquishing the backward compatibility requirement, which resulted in the MPEG-2 Advanced Audio Coding (AAC) standard in 1997 [40]. This provides those who are not constrained by legacy systems with an improved multichannel coding scheme. In conjunction with AAC, it is possible to achieve perceptually transparent stereo quality at 128 kbit/s and transparent multichannel quality at 320 kbit/s, for example in cinema-type applications.

The MPEG-4 audio recommendation is the latest standard, completed in 1999 [41–45], which offers, in addition to compression, further unique features that allow users to interact with the information content at a significantly higher level of sophistication than is possible today. In terms of compression, MPEG-4 supports the encoding of speech signals at bit rates from 2 kbit/s up to 24 kbit/s. For the coding of general audio, ranging from very low bit rates up to high quality, a wide range of bit rates and bandwidths is supported, extending from a bit rate of 8 kbit/s and a bandwidth below 4 kHz to broadcast-quality audio, including monaural representations up to multichannel configurations.

The MPEG-4 audio codec includes coding tools from several different encoding families, covering parametric speech coding, CELP-based speech coding and Time/Frequency (T/F) audio coding, which are characterised in Figure 11.1. It can be observed that a parametric coding scheme, namely Harmonic Vector eXcitation Coding (HVXC), was selected for covering the bit rate range from 2 to 4 kbit/s. For bit rates between 4 and 24 kbit/s, a CELP-coding scheme was chosen for encoding narrowband and wideband speech signals. For encoding


general audio signals at bit rates between 8 and 64 kbit/s, a time/frequency coding scheme based on the MPEG-2 AAC standard [40] endowed with additional tools is used. Here, a combination of different techniques was established, because it was found that maintaining the required performance for representing speech and music signals at all desired bit rates cannot be achieved by selecting a single coding architecture. A major objective of the MPEG-4 audio encoder is to reduce the bit rate, while maintaining a sufficiently high flexibility in terms of bit rate selection. The MPEG-4 codec also offers other new functionalities, which include bit rate scalability, the object-based representation of a specific audio passage played, for example, by a certain instrument, robustness against transmission errors and the support of special audio effects.

MPEG-4 consists of Versions 1 and 2. Version 1 [41] contains the main body of the standard, while Version 2 [46] provides further enhancement tools and functionalities, which include increased robustness against transmission errors and error protection, low-delay audio coding, finely grained bit rate scalability using the Bit-Sliced Arithmetic Coding (BSAC) tool, the employment of parametric audio coding, the CELP-based silence compression tool and the 4 kbit/s extended variable bit rate mode of the HVXC tool. Due to the vast amount of information contained in the MPEG-4 standard, we will only consider some of its audio compression components, which include the coding of natural speech and audio signals. Readers who are specifically interested in text-to-speech synthesis or synthetic audio issues are referred to the MPEG-4 standard [41] and to the contributions by Scheirer et al. [47, 48] for further information. Most of the material in this chapter is based on an amalgam of References [34–38, 40, 41, 43, 44, 46, 49]. In the next few sections, the operation of each MPEG-4 audio component will be highlighted in greater detail. As an application example, we will employ the Transform-domain Weighted Interleaved Vector Quantization (TWINVQ) coding tool, which is one of the MPEG-4 audio codecs, in the context of a wireless audio transceiver in conjunction with space-time coding [50] and various Quadrature Amplitude Modulation (QAM) schemes [51]. The audio transceiver is introduced in Section 11.5 and its performance is discussed in Section 11.5.6.

11.2 General Audio Coding

The MPEG-4 General Audio (GA) coding scheme employs the Time/Frequency (T/F) coding algorithm, which is capable of encoding music signals at bit rates from 8 kbit/s per channel and stereo audio signals at rates from 16 kbit/s per stereo channel up to broadcast-quality audio at 64 kbit/s per channel and higher. This coding scheme is based on the MPEG-2 Advanced Audio Coding (AAC) standard [40], enriched by the addition of further tools and functionalities. The MPEG-4 GA coding incorporates a range of state-of-the-art coding techniques and, in addition to supporting fixed bit rates, it also accommodates a wide range of bit rates and variable-rate coding arrangements. This was facilitated by the continuous development of the key audio technologies throughout the past decades. Figure 11.2 shows, in a non-exhaustive fashion, some of the important milestones in the history of perceptual audio coding, with emphasis on the MPEG standardization activities. These important developments and contributions, which will be highlighted in more depth during our further discourse throughout this chapter, have also resulted in several well-known commercial audio coding standards, such as the Dolby AC-2/AC-3 [412], the Sony Adaptive Transform Acoustic Coding (ATRAC) scheme of the MiniDisc [413], the Lucent Perceptual Audio Coder (PAC) [414] and the Philips Digital Compact Cassette (DCC) [415] algorithms. Advances in audio bit rate compression techniques can be attributed to four key technologies:

Figure 11.2: Important milestones in the development of perceptual audio coding. [The figure charts, along a timeline from 1940 to 1999, algorithms and techniques on one side - from Fletcher's auditory patterns, the critical-band work of Zwicker and Greenwood, masking studies by Scharf, Hellman and Schroeder, through polyphase and pseudo-QMF filterbanks, Time Domain Aliasing Cancellation and the MDCT, to perceptual transform coding, window switching, M/S and intensity stereo coding, temporal noise shaping, TWINVQ, BSAC and parametric audio coding - and standards and commercial codecs on the other, including Dolby AC-2/AC-3, MPEG-1, MPEG-2, MPEG-2 AAC, MPEG-4 Versions 1 and 2, ATRAC, PAC and DCC.]

Figure 11.3: Threshold in quiet and masking threshold [416]. [The figure plots sound pressure level (dB) against frequency (0.02–20 kHz), showing the threshold in quiet together with a masker, its masking threshold and three masked sounds.]

A) Perceptual Coding

Audio coders reduce the required bit rate by exploiting the masking characteristics of the human auditory system, so that the effects of quantization errors in both the frequency and time domains are rendered perceptually inaudible [417–420]. The foundations of modern auditory masking theory were laid down by Fletcher's seminal paper in 1940 [421]. Fletcher [421] suggested that the auditory system behaves like a bank of bandpass filters having continuously overlapping passbands. Research has shown that the ear appears to perceive sounds in a number of critical frequency bands, as shown by Zwicker [418] and Greenwood [422]. This model of the ear can be roughly described as a bandpass filterbank, consisting of overlapping bandpass filters having bandwidths on the order of 100 Hz for signal frequencies below 500 Hz. By contrast, the bandpass filter bandwidths of this model may be as high as 5000 Hz at high frequencies. There exist up to twenty-five such critical bands in the frequency range up to 20 kHz [418]. Auditory masking refers to the mechanism by which a fainter, but distinctly audible signal becomes inaudible when a louder signal occurs simultaneously (simultaneous masking), or within a very short time (forward or backward masking) [423]. More specifically, in the case of simultaneous masking the two sounds occur at the same time, for example in a scenario when a conversation (the masked signal) is rendered


inaudible by a passing train (the masker). Forward masking is encountered when the masked signal remains inaudible for a time after the masker has ended, while backward masking takes place when the masked signal becomes inaudible even before the masker begins. An example is the scenario during abrupt audio signal attacks or transients, which create pre- and post-masking regions in time during which a listener will not be able to perceive signals beneath the audibility thresholds produced by a masker. Hence, the specific manifestation of masking depends on the spectral composition of both the masker and the masked signal, and on their variations as a function of time [424]. Important conclusions which can be drawn from all three masking scenarios [424, 425] are, firstly, that simultaneous masking is more effective when the frequency of the masked signal is equal to or higher than that of the masker. This result is demonstrated in Figure 11.3, where a masker rendered three masked signals inaudible, which occurred at both lower and higher frequencies than the masker. Secondly, while forward masking is effective for a considerable time after the masker has decayed, backward masking may only be effective for less than 2 or 3 ms before the onset of the masker [424].

A masking threshold can be determined, whereby signals below this threshold will be inaudible. Again, Figure 11.3 depicts an example of the masking threshold of a narrowband masker, having three masked signals in its neighbourhood. As long as the sound pressure levels of the three maskees are below the masking threshold, the corresponding signals will be masked. Observe that the slope of the masking threshold is steeper towards lower frequencies, which implies that higher frequencies are easier to mask. When no masker is present, a signal will be inaudible if its sound pressure level is below the threshold in quiet, as displayed in Figure 11.3. The threshold in quiet characterizes the amount of energy required for a pure tone to be detectable by a listener in a noiseless environment. The situation discussed here involved only one masker, but in real life the source signal may consist of many simultaneous maskers, each having its own masking threshold. Thus, a global masking threshold has to be computed, which describes the threshold of just noticeable distortions as a function of frequency [424].
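The computation of such a global masking threshold can be sketched numerically. Terhardt's well-known approximation is used below for the threshold in quiet and Zwicker's arctangent approximation for the Bark (critical-band) scale; the per-masker triangular spreading slopes and the 14 dB masking offset are simplified illustrative values, not the psychoacoustic model of any MPEG standard:

```python
import numpy as np

def threshold_in_quiet_db(f_hz):
    """Terhardt's approximation of the absolute hearing threshold (dB SPL)."""
    f = np.asarray(f_hz, dtype=float) / 1000.0  # kHz
    return 3.64 * f**-0.8 - 6.5 * np.exp(-0.6 * (f - 3.3)**2) + 1e-3 * f**4

def bark(f_hz):
    """Zwicker's critical-band (Bark) scale approximation."""
    f = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(7.6e-4 * f) + 3.5 * np.arctan((f / 7500.0)**2)

def global_masking_threshold_db(freqs_hz, maskers):
    """Simplified global threshold: the maximum of the quiet threshold and
    triangular per-masker thresholds with illustrative +25/-10 dB-per-Bark
    slopes (real codecs use tonality-dependent spreading functions and a
    different combination rule)."""
    z = bark(freqs_hz)
    thr = threshold_in_quiet_db(freqs_hz)
    for f_m, spl_m in maskers:
        dz = z - bark(f_m)
        # steeper slope towards lower frequencies -> higher ones mask easier
        slope = np.where(dz < 0, 25.0 * dz, -10.0 * dz)
        thr = np.maximum(thr, spl_m - 14.0 + slope)   # illustrative offset
    return thr

freqs = np.array([100.0, 500.0, 1000.0, 2000.0, 8000.0])
print(global_masking_threshold_db(freqs, maskers=[(1000.0, 60.0)]))
```

Any spectral component falling below the returned threshold at its frequency could, under this simplified model, be quantized away without audible consequence.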

B) Frequency Domain Coding

The evolution of time/frequency mapping or filterbank-based techniques has contributed to the rapid development in the area of perceptual audio coding. Some of the earliest frequency-domain audio coders include contributions from Brandenburg [426] and Johnston [427], although subband-based narrow- and wideband speech codecs were already developed during the late 1970s and early 1980s [428–430]. Frequency-domain encoders [431, 432], which are employed in all MPEG codecs, offer a convenient way of controlling the frequency-domain distribution of the quantization noise, in conjunction with dynamic bit allocation applied to the quantization of subband signals or transform coefficients. Essentially, the filterbank divides the spectrum of the input signal into frequency subbands, which host the contributions of the fullband signal in the subband concerned. Given the knowledge of an explicit perceptual model, the filterbank facilitates the task of perceptually motivated noise shaping and that of identifying the perceptually unimportant subbands. It is important to choose the appropriate filterbank for bandsplitting. An adaptive filterbank exhibiting time-varying resolution in both the time and frequency domains is highly desirable. This issue has motivated intensive research, experimenting with various switched or hybrid filterbank structures, where the switching decisions were based on the time-variant input signal characteristics [433].

Depending on the frequency domain resolution, we can categorize frequency domain coders


Figure 11.4: Uniform M-band analysis-synthesis filterbank [420]. [The figure shows the input s(n) filtered by the analysis filters H_0(z) ... H_{M-1}(z) and decimated by M to give the subband signals x_0(n) ... x_{M-1}(n), which are then upsampled by M, filtered by the synthesis filters G_0(z) ... G_{M-1}(z) and summed to form the reconstructed output.]

as either transform coders [426, 427] or subband coders [434–436]. The basic principle of transform coders is the multiplication of overlapping blocks of audio samples by a smooth time-domain window function, followed by either the Discrete Fourier Transform (DFT) or the Discrete Cosine Transform (DCT) [437], which transforms the input time-domain signal into a high-resolution frequency-domain representation, consisting of nearly uncorrelated spectral lines or transform coefficients. The transform coefficients are subsequently quantized and transmitted over the channel. At the decoder, the inverse transformation is applied. By contrast, in subband codecs the input signal is split into several uniform or non-uniform width subbands using critically sampled [435], Perfect Reconstruction (PR) [438] or non-PR [439] filterbanks. For example, as shown in Figure 11.4, when an input signal is split into M bandpass signals, critical decimation by a factor of M is applied. This means that every Mth sample of each bandpass signal is retained, which ensures that the total number of samples across the subbands equals the number of samples in the original input signal. At the synthesis stage, a summation of the M bandpass signals is performed, which entails interpolation between samples at the output.
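The critical-sampling bookkeeping of Figure 11.4 can be illustrated numerically. Ideal DFT "brickwall" bandsplitting is used here purely for illustration (the MPEG schemes use pseudo-QMF banks), but the sample-count accounting is the same:

```python
import numpy as np

# Critical sampling as in Figure 11.4: an N-sample signal split into M
# subbands, each decimated by M, yields exactly N samples in total.
rng = np.random.default_rng(0)
M, N = 4, 64
x = rng.standard_normal(N)

X = np.fft.rfft(x)
edges = np.linspace(0, len(X), M + 1).astype(int)   # band boundaries
subbands = []
for k in range(M):
    Xk = np.zeros_like(X)
    Xk[edges[k]:edges[k + 1]] = X[edges[k]:edges[k + 1]]  # isolate band k
    band = np.fft.irfft(Xk, N)                            # bandpass signal
    subbands.append(band[::M])                            # keep every Mth sample

total = sum(len(b) for b in subbands)
print(total, N)
```

The decimation discards no information provided each band occupies no more than 1/M of the spectrum, which is exactly the condition a critically sampled filterbank is designed to approximate.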

The traditional categorization into the families of subband and transform coders has been blurred by the emerging trend of combining both techniques in the codec design, as exemplified by the MPEG codecs, which employ both techniques. In the contribution by Temerinac [440] it was shown mathematically that all transforms used today in audio coding systems can be viewed as filterbanks, and all uniform-width subband filterbanks can be viewed as transforms splitting a full-band signal into n components [440]. One of the first filterbank structures, proposed in the early 1980s, was based on Quadrature Mirror Filters (QMF) [428]. Specifically, a near-PR QMF filter was proposed by Nussbaumer [441] and Rothweiler [439]. In order to derive the pseudo-QMF structure, the analysis and synthesis filters first have to meet the mirror image condition of [439]:

$$g_k(n) = h_k(L - 1 - n). \qquad (11.1)$$


Additionally, the precise relationships between the analysis and synthesis filters $h_k$ and $g_k$ have to be established in order to eliminate aliasing. With reference to Figure 11.4, the analysis and synthesis filters which eliminate both aliasing and phase distortions are given by [435]:

$$h_k(n) = 2w(n)\cos\left[\frac{\pi}{M}(k + 0.5)\left(n - \frac{L - 1}{2}\right) + \theta_k\right] \qquad (11.2)$$

and

$$g_k(n) = 2w(n)\cos\left[\frac{\pi}{M}(k + 0.5)\left(n - \frac{L - 1}{2}\right) - \theta_k\right], \qquad (11.3)$$

respectively, where

$$\theta_k = (-1)^k\,\frac{\pi}{4}. \qquad (11.4)$$

The filterbank design is thereby reduced to the design of the time-domain window function w(n). The principles of pseudo-QMFs have been applied in both the MPEG-1 and MPEG-2 schemes, which employ a 32-channel pseudo-QMF for implementing the spectral decomposition in both the Layer I and II schemes. The same pseudo-QMF filter was used in conjunction with a PR cosine-modulated filterbank in Layer III in order to form a hybrid filterbank [35]. This hybrid combination provides a high frequency resolution by employing a cascade of a filterbank and a Modified Discrete Cosine Transform (MDCT) that splits each subband further in the frequency domain [37].

The MDCT [437], which has been adopted in the current MPEG-2 and MPEG-4 codecs, was first proposed under the name of Time Domain Aliasing Cancellation (TDAC) by Princen and Bradley [442] in 1986. It is essentially a PR cosine-modulated filterbank satisfying the constraint of L = 2M, where L is the window size while M is the transform length. In conventional block transforms, such as the DFT or DCT, blocks of samples are processed independently and hence, owing to the quantization errors, the decoded signal will exhibit discontinuities at the block boundaries. This is because in conventional block-based transforms the time-domain signal is effectively multiplied by a rectangular time-domain window, whose sinc-shaped frequency-domain representation is convolved with the spectrum of the audio signal, resulting in the well-known Gibbs phenomenon. This problem is mitigated by applying the MDCT, using a specific window function in combination with overlapping the consecutive time-domain blocks. As shown in Figure 11.5, a window of 2M samples collected from two consecutive time-domain blocks undergoes cosine transformation, which produces M frequency-domain transform coefficients. The time-domain window is then shifted by M samples for computing the next M transform coefficients. Hence, there is a 50% overlap between consecutive DCT transform coefficient computations. This overlap ensures a smoother evolution of the reconstructed time-domain samples, even though there will be some residual blocking artifacts due to the quantization of the transform coefficients. Nonetheless, the MDCT virtually eliminates the problem of blocking artifacts that plague the reconstructed signal produced by non-overlapped transform coders. This problem often manifested itself as a periodic clicking in the reconstructed audio signals. Again, the processes associated with the MDCT-based overlapped analysis and the corresponding overlap-add synthesis are


Figure 11.5: (a) MDCT analysis process: 2M samples are mapped into M spectral coefficients. (b) MDCT synthesis process: M spectral coefficients are mapped to a vector of 2M samples, which is overlapped by M samples with the vector of 2M samples from the previous frame and then added to it in order to obtain the reconstructed output of M samples [420].

illustrated in Figure 11.5. At the analysis stage, the forward MDCT is defined as [443]:

$$X(k) = \sum_{n=0}^{2M-1} x(n)\,h_k(n), \qquad k = 0 \ldots M - 1, \qquad (11.5)$$

where the M MDCT coefficients X(k), k = 0 ... M - 1, are generated by computing a series of inner products between the 2M samples x(n) of the input signal and the corresponding analysis filter impulse response h_k(n). The analysis filter impulse response h_k(n) is given by [443]:

$$h_k(n) = w(n)\,\sqrt{\frac{2}{M}}\,\cos\left[\frac{(2n + M + 1)(2k + 1)\pi}{4M}\right], \qquad (11.6)$$

where w(n) is a window function; the specific window function used in the MPEG standard is the sine window, given by [443]:

$$w(n) = \sin\left[\left(n + \frac{1}{2}\right)\frac{\pi}{2M}\right]. \qquad (11.7)$$

At the synthesis stage, the inverse MDCT is defined by [443]:

$$x(n) = \sum_{k=0}^{M-1}\left[X(k)\,h_k(n) + X^P(k)\,h_k(n + M)\right]. \qquad (11.8)$$

In Equation 11.8 we observe that the time-domain reconstructed sample x(n) is obtained by computing a sum of the basis vectors h_k(n) and h_k(n + M), weighted by the transform coefficients X(k) and X^P(k) of the current and previous blocks respectively, as was also illustrated in Figure 11.5. More specifically, the first M-sample block of the kth basis vector h_k(n), for 0 ≤ n ≤ M - 1, is weighted by the kth MDCT coefficient of the current block. By contrast, the second M-sample block of the kth basis vector h_k(n), for M ≤ n ≤ 2M - 1, is weighted by the kth MDCT coefficient of the previous block, namely by X^P(k).
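A minimal numerical sketch of Equations 11.5–11.8, assuming the sqrt(2/M) normalisation of Equation 11.6 and the sine window of Equation 11.7, confirms the time-domain aliasing cancellation: overlap-adding the current block (weighted by h_k(n)) and the previous block (weighted by h_k(n + M)) reconstructs the interior samples exactly:

```python
import numpy as np

M = 8                                            # transform length, window 2M
n = np.arange(2 * M)
w = np.sin((n + 0.5) * np.pi / (2 * M))          # sine window, Eq. (11.7)
k = np.arange(M)[:, None]
h = w * np.sqrt(2.0 / M) * np.cos(
    (2 * n + M + 1) * (2 * k + 1) * np.pi / (4 * M))   # Eq. (11.6), shape (M, 2M)

rng = np.random.default_rng(1)
x = rng.standard_normal(5 * M)

# Analysis, Eq. (11.5): 50%-overlapped 2M-sample blocks -> M coefficients each.
X = [h @ x[t * M : t * M + 2 * M] for t in range(4)]

# Synthesis, Eq. (11.8): current block via h_k(n), previous via h_k(n + M).
y = np.zeros_like(x)
for t in range(1, 4):
    y[t * M : (t + 1) * M] = X[t] @ h[:, :M] + X[t - 1] @ h[:, M:]

# Interior samples are reconstructed perfectly (aliasing cancels).
print(np.allclose(y[M : 4 * M], x[M : 4 * M]))
```

The first and last M samples are not recovered because they lack an overlapping partner block, which is why practical codecs pad the signal at both ends.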

C) Window Switching

The window switching strategy was first proposed in 1989 by Edler [444], who advocated a bit rate reduction method for audio signals based on overlapping transforms. More specifically, Edler proposed adapting the window functions and the transform lengths to the nature of the input signal. This improved the performance of the transform codec in the presence of impulses and rapid energy onsets in the input signal. The notion of applying different windows according to the input signal's properties has subsequently been incorporated in the MPEG codecs employing the MDCT, for example the MPEG-1 Layer III and MPEG-2 AAC codecs [40].

Typically, a long time-domain window is employed for encoding the identifiable stationary signal segments, while primarily a short window is used for localizing the pre-echo effects due to the occurrence of sudden signal onsets, as experienced during transient signal periods, for example [40]. In order to ensure that the conditions of PR-based analysis and synthesis filtering are properly preserved, transitional windows are needed for switching between the long and short windows [443]. These transitional windows are depicted graphically in Figure 11.6, utilizing four window functions, namely the long, short, start and stop windows, which are also used in the MPEG-4 General Audio coding standard.
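The window shapes of Figure 11.6 can be sketched as follows; the half-lengths and the flat/zero padding below are illustrative rather than the exact MPEG-4 GA block sizes. A start window rises like a long window but falls like a short one so that the following short block overlaps it consistently, and the sine halves satisfy the power-complementarity (Princen-Bradley) condition w_rise(n)^2 + w_fall(n)^2 = 1, which the PR property requires at every window junction:

```python
import numpy as np

def sine_half(N, rising=True):
    """One N-sample half of a 2N-point sine window."""
    n = np.arange(N)
    w = np.sin((n + 0.5) * np.pi / (2 * N))
    return w if rising else w[::-1]

N_long, N_short = 1024, 128       # illustrative half-lengths

long_w  = np.concatenate([sine_half(N_long), sine_half(N_long, rising=False)])
short_w = np.concatenate([sine_half(N_short), sine_half(N_short, rising=False)])

# Start window: long rise, flat top, short fall, then zeros; the stop
# window is its mirror image (cf. Figure 11.6).
pad = (N_long - N_short) // 2
start_w = np.concatenate([sine_half(N_long), np.ones(pad),
                          sine_half(N_short, rising=False), np.zeros(pad)])
stop_w = start_w[::-1]

print(len(long_w), len(start_w), len(stop_w))
```

Because the rising and falling sine halves are power-complementary, any legal sequence of long, start, short and stop windows overlap-adds to unity gain.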

D) Dynamic Bit Allocation

Dynamic bit allocation aims for assigning bits to each of the quantizers of the transform coefficients or subband samples in such a way that the overall perceptual quality is maximized [445]. This is an iterative process, where in each iteration the number of quantization levels is increased, while satisfying the constraint that the number of bits used must not exceed the number of bits available for that frame.
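This iterative loop can be sketched as a greedy noise-to-mask-ratio (NMR) descent: in each pass, one more bit (roughly 6 dB less quantization noise) is granted to the band whose noise currently exceeds its masking threshold by the largest margin. The band levels, the 6 dB-per-bit rule of thumb and the budget are illustrative values, not figures from the standard:

```python
import numpy as np

signal_db = np.array([60.0, 55.0, 40.0, 30.0])   # subband signal levels (SPL)
mask_db   = np.array([35.0, 30.0, 28.0, 25.0])   # per-band masking thresholds
bits      = np.zeros(4, dtype=int)
budget    = 16                                    # bits available this frame

for _ in range(budget):
    noise_db = signal_db - 6.02 * bits            # ~6 dB noise drop per bit
    nmr = noise_db - mask_db                      # noise-to-mask ratio
    bits[np.argmax(nmr)] += 1                     # spend where NMR is worst

print(bits, int(bits.sum()))
```

Once every band's NMR is negative, the quantization noise is (under this model) fully masked, and any remaining bits merely add safety margin.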

Furthermore, another novel bit allocation technique, referred to as the 'bit reservoir' scheme, was proposed for accommodating sharp signal on-sets, which result in an increased number of required bits during the encoding of transient signals [445]. This is due to the fact that the window switching strategy does not succeed in avoiding all audible pre-echos, in particular when a sudden signal on-set occurs near the end of a transform


Figure 11.6: Window transition during (a) steady state using long windows and (b) transient conditions employing start, short, and stop windows [40].

block [420]. In block-based schemes like conventional transform codecs, the inverse transform spreads the quantization errors evenly in time over the duration of the reconstruction block. This results in audible unmasked distortion throughout the low-energy signal segment preceding the instant of the signal attack [420]. Hence, the 'bit reservoir' technique was introduced for allocating more bits to those frames which invoked pre-echo control. This 'bit reservoir' technique was employed in the MPEG Layer III and MPEG-2 AAC codecs [40].

11.2.1 Advanced Audio Coding

The MPEG-2 Advanced Audio Coding (AAC) scheme was declared an international standard by MPEG at the end of April 1997 [40]. The main driving factor behind the MPEG-2 AAC initiative was the quest for an efficient coding method for multichannel surround-sound signals, such as the 5-channel (left, right, centre, left-surround and right-surround) system designed for cinemas. The main block diagram of the MPEG-4 Time/Frequency (T/F) codec is shown in Figure 11.7, which was defined to be backward compatible with the MPEG-2 AAC scheme [40].

In this section we commence with an overview of the AAC profiles based on Figure 11.7


[The diagram comprises the AAC gain control tool, the filterbank with its window length decision, the spectral processing stage (temporal noise shaping, prediction, intensity stereo, M/S stereo and spectral normalization), the psychoacoustic model, and the quantization and coding stage (AAC, BSAC or Twin-VQ), all feeding the bitstream formatter, which outputs the coded audio bitstream.]

Figure 11.7: Block diagram of MPEG-4 T/F-based encoder [41].


and each block will be discussed in more depth in Sections 11.2.2 to 11.2.10. Following the diagram shown in Figure 11.7, the T/F coder first decomposes the input signal into a T/F representation by means of an analysis filterbank prior to subsequent quantization and coding. The filterbank is based on the Modified Discrete Cosine Transform (MDCT) [442], which is also known as the Modulated Lapped Transform (MLT) [446]. When the Scalable Sampling Rate (SSR) mode is invoked, the MDCT is preceded by a Polyphase Quadrature Filter (PQF) [439] and a gain control module, which are not explicitly shown in Figure 11.7 but will be described in Section 11.2.2. In the encoding process, the filterbank takes in a block of samples, applies the appropriate windowing function and performs the MDCT. The MDCT block length can be either 2048 or 256 samples, switched dynamically depending on the input signal's characteristics. This window switching mechanism was first introduced by Edler in [444]. Long-block transform processing (2048 samples) improves the coding efficiency of stationary signals, but problems might be incurred when coding transient signals. Specifically, this gives rise to the problem of pre-echos, which occur when a sudden, sharp rise of the signal envelope begins near the end of a transform block [420]. In block-based schemes, such as transform codecs, the inverse transform spreads the quantization error evenly in time over the reconstructed block. This may result in audible unmasked quantization distortion throughout the low-energy section preceding the instant of the signal attack [420]. By contrast, a shorter block length (256 samples) is optimum for coding transient signals, although it suffers from inefficient coding of steady-state signals due to the associated poorer frequency resolution.

Figure 11.6 shows the philosophy of the block switching mechanism during both steady-state and transient conditions. Specifically, two different window functions, the Kaiser-Bessel derived (KBD) window [412] and the sine window, can be used for windowing the incoming input signal, for the sake of attaining an improved frequency selectivity and for mitigating the Gibbs oscillation, before the signal is transformed by the MDCT [412]. The potential problem of appropriate block alignment due to window switching is solved as follows. Two extra window shapes, the so-called start and stop windows, are introduced together with the long and short windows depicted in Figure 11.6. The long window consists of 2048 samples, while a short window is composed of eight short blocks arranged to overlap by 50% with each other. At the boundaries between long and short blocks, half of the transform blocks overlap with the start and stop windows. Specifically, the start window enables the transition between the long and short window types. The left half of a start window, seen at the bottom of Figure 11.6, has the same shape as the left half of the long window depicted at the top of Figure 11.6. The right half of the start window has the value of unity for one-third of its length and the shape of the right half of a short window for the central one-third of its length, with the remaining one-third set to zero. Figure 11.6 (a) shows the steady-state condition, where only long transform blocks are employed. By contrast, Figure 11.6 (b) displays the block switching mechanism, where the start (#1) and stop (#10) window sequences ensure a smooth transition between long and short transforms. The start window can be either the KBD or the sine window, in order to match the previous long window type, while the stop window is the time-reversed version of the start window.
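The constraint that every window sequence must preserve perfect reconstruction can be checked numerically. The snippet below is a simple sanity check of the Princen-Bradley condition for the sine window, not part of any standard implementation:

```python
import math

M = 1024  # half of the long-block length of 2048 samples
# sine window over the full 2M-sample block
w = [math.sin(math.pi / (2 * M) * (n + 0.5)) for n in range(2 * M)]
# Princen-Bradley condition required for perfect reconstruction with
# 50%-overlapped blocks: w(n)^2 + w(n + M)^2 = 1 for 0 <= n < M
assert all(abs(w[n] ** 2 + w[n + M] ** 2 - 1.0) < 1e-12 for n in range(M))
```

The start and stop windows are shaped precisely so that this overlap condition continues to hold at every long/short boundary.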

Like all other perceptually motivated coding schemes, the MPEG-4 AAC-based codec makes use of the signal masking properties of the human ear, in order to reduce the required bit rate. By doing so, the quantization noise is distributed to frequency bands in such a


way that it is masked by the total signal and hence remains inaudible. The input audio signal simultaneously passes through a psychoacoustic model, as shown in Figure 11.7, which determines the ratio of the signal energy to the masking threshold. An estimate of the masking threshold is computed using the rules of psychoacoustics [34]. Here, a perceptual model similar to the MPEG-1 psychoacoustic model II [40] is used, which will be described in Section 11.2.3. A signal-to-mask ratio is computed from the masking threshold, which is used to decide on the bit allocation, in an effort to minimize the audibility of the quantization noise.

After the MDCT carried out in the filterbank block of Figure 11.7, the spectral coefficients are passed to the Spectral Normalization 'toolbox', if the TWINVQ mode is used. The Spectral Normalization tool will be described in Section 11.2.9. For AAC-based coding, the spectral coefficients are processed further by the Temporal Noise Shaping (TNS) 'toolbox' of Figure 11.7, where TNS uses a prediction approach in the frequency domain for shaping and distributing the quantization noise over time.

The time-domain 'Prediction' block of Figure 11.7, or Long-Term Prediction (LTP), is an important tool, which improves the redundancy reduction of stationary signals. It utilises a second-order backward-adaptive predictor, which is similar to the scheme proposed by Mahieux [447]. In the case of multichannel input signals, 'Intensity Stereo' coding is also applied, as seen in Figure 11.7, which is a method of replacing the left and right stereo signals by a single signal having embedded directional information. Mid/Side (M/S) stereo coding, as described by Johnston [448], can also be used, as seen in Figure 11.7, where instead of transmitting the left and right signals, the sum and difference signals are transmitted.

The data-compression based bit rate reduction occurs in the quantization and coding stage, where the spectral values can be coded using either the AAC, the Bit-Sliced Arithmetic Coding (BSAC) [449] or the TWINVQ [450] technique, as seen in Figure 11.7. The AAC quantization scheme will be highlighted in Section 11.2.6, while the BSAC- and TWINVQ-based techniques will be detailed in Sections 11.2.8 and 11.2.9, respectively. The AAC technique invokes an adaptive non-linear quantizer, and a further noise shaping mechanism employing scalefactors is implemented. The allocation of bits to the spectral values is carried out according to the psychoacoustic model, with the aim of suppressing the quantization noise below the masking threshold. Finally, the quantized and coded spectral coefficients and control parameters are packed into a bitstream format ready for transmission. In the following sections, the individual components of Figure 11.7 will be discussed in further detail.

11.2.2 Gain Control Tool

When the Scalable Sampling Rate (SSR) mode is activated, which facilitates the employment of different sampling rates, the MDCT transformation taking place in the Filterbank block of Figure 11.7 is preceded by a uniformly-spaced 4-band Polyphase Quadrature Filter (PQF) [441] and a gain control module [41]. The PQF splits the input signal into four frequency bands of equal width. When the SSR mode is invoked, lower-bandwidth output signals, and hence lower sampling rate signals, can be obtained by neglecting the signals residing in the lower-energy upper bands of the PQF. For example, when the bandwidth of the input signal is 24 kHz, equivalent to a 48 kHz sampling rate, output bandwidths of 18, 12 and 6 kHz can be obtained when one, two or three PQF outputs are ignored, respectively [40].
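The band-discarding arithmetic can be stated in a couple of lines; the helper below is a hypothetical illustration (the function name is ours) of the relationship between the number of dropped PQF bands and the resulting output bandwidth:

```python
def ssr_bandwidth(sampling_rate_hz, dropped_bands):
    """Output bandwidth when the upper PQF bands are discarded (SSR mode).
    The 4-band PQF splits the full audio bandwidth (= fs/2) into four
    equal-width slices, so each dropped band removes a quarter of it."""
    full_bw = sampling_rate_hz / 2
    return full_bw * (4 - dropped_bands) / 4

# the 48 kHz example of the text: 18, 12 and 6 kHz for 1, 2 or 3 dropped bands
assert [ssr_bandwidth(48000, k) for k in (1, 2, 3)] == [18000.0, 12000.0, 6000.0]
```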

The purpose of the gain control module is to appropriately attenuate or amplify the output


[Figure 11.8 comprises the following stages: windowing of the input samples, FFT, frequency-to-critical-band mapping, spreading function, tonality index computation, masking threshold computation, combination with the absolute threshold and signal-to-mask ratio (SMR) calculation.]

Figure 11.8: Flow diagram of the psychoacoustic model II in MPEG-4 AAC coding.

of each PQF band, in order to reduce the potential pre-echo effects [420]. The gain control module, which estimates and adjusts the gain factor of the subbands according to the psychoacoustic requirements, can be applied independently to each subband. At the encoder, the gain control 'toolbox' receives the time-domain signal as its input and outputs the gain control data and the appropriately scaled signal, whose length is equal to the length of the MDCT window. The 'gain control data' consists of the number of bands which experienced gain modification, the number of modified segments and the indices indicating the location and level of gain modification for each segment. Meanwhile, the 'gain modifier' associated with each PQF band controls the gain of that band. This effectively smoothes the transient peaks in the time domain prior to MDCT spectral analysis. Subsequently, the normal procedure of coding stationary signals using long blocks can be applied.

11.2.3 Psychoacoustic Model

As argued in Section 11.2, the MPEG-4 audio codec and other perceptually optimized codecs reduce the required bit rate by taking advantage of the human auditory system's inability to perceive quantization noise satisfying the conditions of auditory masking. Again, perceptual masking occurs when the presence of a strong signal renders the weaker signals surrounding it in the frequency domain imperceptible [424]. The psychoacoustic model used in the MPEG-4 audio codec is similar to the MPEG-1 psychoacoustic model II [34].

Figure 11.8 shows the flow chart of the psychoacoustic model II. First a Hann window [41]


is applied to the input signal, and then the Fast Fourier Transform (FFT) provides the necessary time-frequency mapping. The Hann window is defined as [41]:

w(n) = (1/2) [1 - cos(2*pi*n / N)],    (11.9)

where N is the FFT length. This windowing procedure is applied for the sake of reducing the frequency-domain Gibbs oscillation potentially imposed by a rectangular transform window. Depending on whether the signal's characteristics are of a stationary or transient nature, FFT sizes of either 1024 or 128 samples can be applied. The FFT-based spectral coefficient values are then grouped according to the corresponding critical frequency band widths. This is achieved by transforming the spectral coefficient values into the 'partition index' domain, where the partition indices are related near-linearly to the critical bands, as summarised in Figure 11.9 (a), recorded at the sampling rate of 44.1 kHz. At low frequencies, a single spectral line constitutes a partition, while at high frequencies many lines are combined in order to form a partition, as displayed in Figure 11.9 (b). This facilitates the appropriate representation of the critical bands of the human auditory system [36]. Tables of the mapping functions between the spectral and partition domains and their respective values of the threshold in quiet are supplied in the MPEG-4 standard for all available sampling rates [41].
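Equation 11.9 is straightforward to reproduce; a short sketch (the function name is ours):

```python
import math

def hann(N):
    # Equation 11.9: w(n) = 0.5 * (1 - cos(2*pi*n / N)), n = 0 .. N-1
    return [0.5 * (1 - math.cos(2 * math.pi * n / N)) for n in range(N)]

w = hann(1024)              # long-block analysis window for the FFT
assert w[0] == 0.0          # tapers to zero at the block edge
assert abs(w[512] - 1.0) < 1e-12   # reaches unity at the block centre
```

The taper to zero at both block edges is what suppresses the Gibbs oscillation that a rectangular window would impose on the spectrum.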

During the FFT process of Figure 11.8, the polar representation of the transform-domain coefficients is also calculated. Both the magnitude and the phase of this polar representation will be used for the calculation of the 'predictability measure', which quantifies the predictability of the signal as an indicator of the grade of tonality. The psychoacoustic model identifies the tonal and noise-like components of the audio signal, because the masking abilities of the two types of signals differ. In this psychoacoustic model, the masking ability of a tone masking noise, which is denoted by TMN(b), is fixed at 18 dB in all partitions, which implies that any noise within the critical band more than 18 dB below the masker will be masked by the tonal component. The masking ability of noise masking a tone, which is denoted by NMT(b), is set to 6 dB for all partitions. The two previous frequency-domain blocks are used for predicting the magnitude and phase of each spectral line of the current frequency-domain block, via linear extrapolation, in order to obtain the 'predictability' values for the current block. Tonal components are more predictable and hence will have higher tonality indices. Furthermore, a spreading function [41] is applied in order to take into consideration the masking ability of a given spectral component, which could spread across its surrounding critical band.

The masking threshold is calculated in Figure 11.8 by using the tonality index and the threshold in quiet, Tq, which is the lower threshold bound above which a sound is audible. The masking threshold in each frequency-domain partition corresponds to the power spectrum multiplied by an attenuation factor given by [41]:

Attenuation Factor = 10^(-SNR(b)/10),    (11.10)

implying that the higher the SNR, the lower the attenuation factor and hence the masking threshold, where the Signal-to-Noise Ratio (SNR) is derived as:

SNR(b) = tb(b) * TMN(b) + (1 - tb(b)) * NMT(b),    (11.11)



Figure 11.9: (a) Relationship between the partition index and the critical bands. (b) The conversion from the FFT spectral lines to the partition index domain at the sampling rate of 44.1 kHz for a total of 1024 spectral lines per time-domain audio frame [34].

where the masking abilities of tone-masking-noise and noise-masking-tone are considered by exploiting the tonality index in each partition.
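Equations 11.10 and 11.11 combine into a few lines of code. This sketch assumes, as our reading of Equation 11.10, that SNR(b) is expressed in dB and converted to a linear power ratio in the usual way; the function names are ours:

```python
def snr_db(tonality_index, tmn_db=18.0, nmt_db=6.0):
    # Equation 11.11: blend the tone-masking-noise and noise-masking-tone
    # abilities according to the tonality index tb(b) in [0, 1]
    return tonality_index * tmn_db + (1.0 - tonality_index) * nmt_db

def attenuation_factor(snr):
    # Equation 11.10 with the usual dB-to-linear power conversion
    return 10.0 ** (-snr / 10.0)

assert snr_db(1.0) == 18.0     # purely tonal partition
assert snr_db(0.0) == 6.0      # purely noise-like partition
assert abs(attenuation_factor(18.0) - 10 ** -1.8) < 1e-15
```

A tonal partition (tb = 1) therefore gets an attenuation of 10^-1.8, i.e. its masking threshold sits 18 dB below the partition's signal power.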

The masking threshold is transformed back to the linear frequency scale by spreading it evenly over all spectral lines corresponding to the partitions, as seen in Figure 11.10, in preparation for the calculation of the Signal-to-Mask Ratio (SMR) of each subband. The minimum masking threshold, as shown in Figure 11.10, takes into account the value of the threshold in quiet, Tq, raising the masking threshold to the value of Tq if the masking threshold is lower than Tq. Finally, the SMR is computed for each scalefactor band as the ratio of the signal energy within a frequency-domain scalefactor band to the minimum masking threshold of that particular band, as depicted graphically in Figure 11.10. The SMR values are then used for the subsequent allocation of bits in each frequency band.

11.2.4 Temporal Noise Shaping

Temporal Noise Shaping (TNS) in audio coding was first introduced by Herre et al. in [451]. The TNS tool seen in Figure 11.7 is a frequency-domain technique, which operates on the spectral coefficients generated by the analysis filterbank. The idea is to employ linear predictive coding across the frequency range, rather than in the time domain. TNS is particularly important when coding signals that vary dynamically over time, such as transient signals. Transform codecs often encounter problems when coding such signals, since the distribution of the quantization noise can be controlled over the frequency range, but this spectral noise shaping is typically time-invariant over a complete transform block. When a signal changes drastically within a time-domain transform block without activating a switch to shorter time-domain transform lengths, the associated time-invariant distribution of quantization noise may lead to audible artifacts.

[Figure 11.10 sketches a masker and its masking threshold within a critical band, together with the minimum masking threshold, the noise level of an m-bit quantizer and the resulting SMR, SNR(m) and Mask-to-Noise Ratio MNR(m) quantities.]

Figure 11.10: Masking effects and masking threshold calculation.

The concept of TNS is based upon the time- and frequency-domain duality of the LPC analysis paradigm [433], since it is widely recognized that signals exhibiting a non-uniform spectrum can be efficiently coded either by directly encoding the spectral-domain transform coefficients using transform coding, or by applying linear predictive coding methods to the time-domain input signal. The corresponding 'duality statement' relates to the encoding of audio signals exhibiting a time-variant time-domain behaviour, such as in the case of transient signals. Thus, efficient encoding of transient signals can be achieved either by directly encoding their time-domain representation or by employing predictive coding methods across the frequency domain.



Figure 11.11: The TNS processing block also seen in Figure 11.7.

Figure 11.11 shows in more detail the TNS filtering process seen in the centre of Figure 11.7. The TNS tool is applied to the spectral-domain transform coefficients after the filterbank stage of Figure 11.7. The TNS filtering operation replaces the spectral-domain coefficients with the prediction residual between the actual and predicted coefficient values, thereby increasing their representation accuracy. Similarly, at the decoder an inverse TNS filtering operation is performed on the transform coefficient prediction residual in order to obtain the decoded spectral coefficients. TNS can be applied either to the entire frequency spectrum, or only to a part of the spectrum, such that the frequency-domain quantization can be controlled in a time-variant fashion [40], again with the objective of achieving an agile and responsive adjustment of the frequency-domain quantization scheme for sudden time-domain transients. In combination with further techniques, such as window switching and gain control, the pre-echo problem can be further mitigated. In addition, the TNS technique enables the peak bit rate demand of encoding transient signals to be reduced. Effectively, this implies that an encoder may stay longer in the conventional and more bit-rate efficient long encoding block mode.
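The analysis/synthesis pair of Figure 11.11 can be mimicked with a toy open-loop predictor run across the frequency index. The coefficient values and the two-tap predictor below are purely illustrative, and the quantization of the residual is omitted, so the round trip is exact:

```python
def tns_analysis(coeffs, lpc):
    """Replace spectral coefficients with the prediction residual of an
    FIR predictor run across frequency (open-loop TNS sketch)."""
    out = list(coeffs)
    for k in range(len(coeffs)):
        pred = sum(a * coeffs[k - i - 1] for i, a in enumerate(lpc) if k - i - 1 >= 0)
        out[k] = coeffs[k] - pred
    return out

def tns_synthesis(residual, lpc):
    """Inverse (all-pole) filtering across frequency restores the coefficients."""
    out = []
    for k in range(len(residual)):
        pred = sum(a * out[k - i - 1] for i, a in enumerate(lpc) if k - i - 1 >= 0)
        out.append(residual[k] + pred)
    return out

X = [3.0, 2.5, 2.0, 1.0, 0.5, 0.25, 0.1, 0.05]   # smooth spectral envelope
a = [0.9, -0.2]                                   # illustrative 2-tap predictor
Y = tns_synthesis(tns_analysis(X, a), a)
assert all(abs(x - y) < 1e-12 for x, y in zip(X, Y))
```

Because the residual of a smooth spectrum is small, quantizing it (rather than the coefficients themselves) shapes the quantization noise in time under the signal's temporal envelope, which is the point of TNS.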

Additionally, the long-term time-domain redundancy of the input signal may be exploited using the well-documented Long-Term Prediction (LTP) technique, which is frequently used in speech coding [41, 335, 367].

11.2.5 Stereophonic Coding

The MPEG-4 scheme includes two specific techniques for encoding stereo signals, namely intensity-based stereo coding [452] and Mid/Side (M/S) stereo coding [453], both


of which will be described in this section. These coding strategies can be combined by selectively applying them to different frequency regions.

Intensity-based stereophonic coding is based on an analysis of high-frequency audio perception, as outlined by Herre et al. in [452]. Specifically, high-frequency audio perception is mainly based on the energy-time envelope of this region of the audio spectrum. This allows a stereophonic channel pair to share a single set of spectral intensity values for the high-frequency components with little or no loss in sound quality. Effectively, the intensity signal's spectral components are used to replace the corresponding left-channel spectral coefficients, while the corresponding spectral coefficients of the right channel are set to zero. Intensity-based stereophonic coding can also be interpreted as a simplified approximation of the idea of directional coding. Thus, only the information of one of the two stereo channels is retained, while the directional information is conveyed with the aid of two scalefactor values assigned to the left and right channels [454].

On the other hand, M/S stereo coding allows the pair of stereo channels to be conveyed either as left/right (L/R) or as mid/side (M/S) signals on a block-by-block basis [453], where M = (L+R)/2 and S = (L-R)/2. At the decoder, the M/S matrix reconstructs the left channel as the sum M + S and the right channel as the difference M - S; adding the two reconstructed channels, (M + S) + (M - S) = 2M, recovers the sum information only. The number of bits actually required to encode the M/S information and the L/R information is then calculated. In cases where the M/S channel pair can be represented with the aid of fewer bits, while maintaining a certain maximum level of quantization distortion, the corresponding spectral coefficients are encoded and a flag bit is set for signalling that the block has utilized M/S stereo coding. During decoding, the decoded M/S channel pair is converted back to its original left/right format.
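The L/R to M/S conversion and its inverse are simple matrixing operations. The round trip below (with arbitrary sample values) also illustrates why strongly correlated channels need fewer bits in the M/S domain:

```python
def ms_encode(left, right):
    # M = (L + R) / 2,  S = (L - R) / 2   (the 'Stereo M/S' block of Fig. 11.7)
    return ([(l + r) / 2 for l, r in zip(left, right)],
            [(l - r) / 2 for l, r in zip(left, right)])

def ms_decode(mid, side):
    # L = M + S,  R = M - S
    return ([m + s for m, s in zip(mid, side)],
            [m - s for m, s in zip(mid, side)])

L, R = [1.0, 2.0, 3.0], [1.0, 1.5, -3.0]
mid, side = ms_encode(L, R)
L2, R2 = ms_decode(mid, side)
assert L2 == L and R2 == R          # lossless round trip
# identical (fully correlated) samples concentrate all energy in M
assert abs(side[0]) < 1e-12
```

When the two channels are nearly identical, the side signal is close to zero and can be quantized with very few bits, which is exactly the case in which the encoder raises the M/S flag.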

11.2.6 AAC Quantization and Coding

After all the pre-processing stages of Figure 11.7 using the various coding tools explained in earlier sections, all parameters to be transmitted now have to be quantized. The quantization procedure follows an analysis-by-synthesis process, consisting of two nested iteration loops, which are depicted in Figure 11.12. This involves the non-uniform quantization of the spectral-domain transform coefficients [40]. Transform-domain non-linear quantizers have the inherent advantage of facilitating spectral-domain noise shaping in comparison to conventional linear quantizers [431]. The quantized spectral-domain transform coefficients are then coded using Huffman coding. In order to improve the achievable subjective audio quality, the quantization noise is further shaped using scalefactors [455], as highlighted below.

Specifically, the spectrum is divided into several groups of spectral-domain transform coefficients, which are referred to as scalefactor bands (SFB). Each frequency-domain scalefactor band has its individual scalefactor, which is used to scale the amplitude of all spectral-domain transform coefficients in that scalefactor band. This process shapes the spectrum of the quantization noise according to the masking threshold portrayed in Figure 11.10, as estimated on the basis of the psychoacoustic model. The width of the frequency-domain scalefactor bands is adjusted according to the critical bands of the human auditory system [423], seen in Figure 11.9. The number of frequency-domain scalefactor bands and their width depend on the transform length and the sampling frequency. The spectral-domain noise shaping is achieved


[Figure 11.12 flow: the inner loop quantizes the transform coefficients non-linearly, Huffman codes them while counting the bits, and decreases the quantizer step size as long as the number of bits does not exceed the available bits; the outer loop computes the MSE in all SFBs, amplifies the SFBs having more than the allowed distortion, stores the best transform coefficient quantizer found so far, and terminates when all SFBs have been amplified or no band's MSE exceeds the allowed MSE, finally restoring the best transform coefficient quantizer.]

Figure 11.12: AAC inner and outer quantization loops designed for encoding the frequency-domain transform coefficients.

by adjusting the scalefactors using a step size of 1.5 dB. The decision as to which scalefactor bands should be amplified or attenuated relies on the threshold computed from the psychoacoustic model and also on the number of bits available. The amplified spectral coefficients have higher amplitudes, which results in a higher SNR after quantization in the corresponding scalefactor bands. This also implies that more bits are needed for encoding the transform coefficients of the amplified scalefactor bands, and hence the distribution of bits across the scalefactor bands is altered. Naturally, the scalefactor information is needed at the decoder, hence the scalefactors have to be encoded as efficiently as possible. This is achieved by first exploiting the fact that the scalefactors usually do not change dramatically from one scalefactor band to the next, and thus differential encoding proves useful. Secondly, Huffman coding is applied in order to further reduce the redundancy associated with the encoding of the scalefactors [40].
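The 1.5 dB step quoted above corresponds to a gain ratio of 2^(1/4) per scalefactor increment, as specified for AAC; the sketch below merely verifies this correspondence (the helper name is ours, not the standard's):

```python
import math

# One scalefactor step scales the (inverse-)quantizer gain by 2**(1/4),
# i.e. by 20*log10(2**0.25) ~= 1.505 dB -- the "1.5 dB step" of the text.
step_db = 20 * math.log10(2 ** 0.25)
assert abs(step_db - 1.505) < 0.01

def sfb_gain_db(amplification_steps):
    """Cumulative amplification of a scalefactor band after repeated
    outer-loop amplifications (hypothetical helper)."""
    return amplification_steps * step_db

assert abs(sfb_gain_db(4) - 6.02) < 0.05   # four steps ~ one extra bit of SNR
```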

Again, the AAC quantization and coding process consists of two iteration loops, the inner and outer loops. The inner iteration loop shown in Figure 11.12 consists of a non-linear frequency-domain transform coefficient quantizer and the noiseless Huffman coding module. The frequency-domain transform coefficient values are first quantized using a non-uniform quantizer, and further processing using the noiseless Huffman coding tool is applied for achieving a high coding efficiency. The quantizer step size is decreased until the number of bits generated exceeds the available bit rate budget of the particular scalefactor band considered. Once the inner iteration process is completed, the outer loop evaluates the Mean Square Error (MSE) associated with all transform coefficients for all scalefactor bands. The task of the outer iteration loop is to amplify the transform coefficients of the scalefactor bands, in order to satisfy the requirements of the psychoacoustic model. The computed MSE is compared to the masking threshold value obtained from the associated psychoacoustic analysis. When the best result, i.e. the lowest MSE, is achieved, the corresponding quantization scheme is stored in memory. Subsequently, the scalefactor bands having a higher MSE than the acceptable threshold are amplified, using a step size of 1.5 dB. The iteration process is curtailed when all scalefactor bands have been amplified or when the MSE of no scalefactor band exceeds the permitted threshold. Otherwise, the whole process is repeated using new SFB amplification values, as seen in Figure 11.12.

11.2.7 Noiseless Huffman Coding

The noiseless Huffman coding tool of Figure 11.12 is used for further reducing the redundancy inherent in the quantized frequency-domain transform coefficients of the audio signal. One frequency-domain transform coefficient quantizer per scalefactor band is used. The step size of each of these quantizers is specified in conjunction with a global gain factor that normalizes the individual scalefactors. The global gain factor is coded as an 8-bit unsigned integer. The first scalefactor associated with the quantized spectrum is differentially encoded relative to the global gain value and then Huffman coded using the scalefactor codebook. The remaining scalefactors are differentially encoded relative to the previous scalefactor and then Huffman coded using the scalefactor codebook.
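The differential scalefactor encoding can be sketched as follows; the numeric values are illustrative and the subsequent Huffman coding of the deltas is omitted:

```python
def encode_scalefactors(global_gain, scalefactors):
    """Differential encoding sketch: the first scalefactor is sent relative
    to the global gain, each further one relative to its predecessor."""
    deltas, prev = [], global_gain
    for sf in scalefactors:
        deltas.append(sf - prev)
        prev = sf
    return deltas

def decode_scalefactors(global_gain, deltas):
    """Undo the differential encoding by accumulating the deltas."""
    sfs, prev = [], global_gain
    for d in deltas:
        prev += d
        sfs.append(prev)
    return sfs

gg, sfs = 100, [98, 99, 99, 97, 96]
deltas = encode_scalefactors(gg, sfs)
assert deltas == [-2, 1, 0, -2, -1]          # small values -> short codewords
assert decode_scalefactors(gg, deltas) == sfs
```

Because neighbouring scalefactors rarely differ by much, the deltas cluster around zero, which is precisely the skewed distribution that the scalefactor Huffman codebook exploits.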

Noiseless coding of the quantized spectrum relies on partitioning the spectral coefficients into sets. The first partitioning divides the spectrum into scalefactor bands that contain an integer multiple of four quantized spectral coefficients. The second partitioning divides the quantized frequency-domain transform coefficients into sections constituted by several scalefactor bands. The quantized spectrum within such a section is represented using a single Huffman codebook chosen from a set of twelve possible codebooks. This includes a particular codebook that is used for signalling that all the coefficients within that section are zero, in which case no spectral coefficients or scalefactors are transmitted for that particular band, and thus an increased compression ratio is achieved. This is a dynamic process, which varies from block to block, such that the number of bits needed for representing the full set of quantized spectral coefficients is minimized. The bandwidth of the section and its associated Huffman codebook indices must be transmitted as side information, in addition to the section's Huffman coded spectrum.

Huffman coding creates variable-length codes [431, 456], where higher probability symbols are encoded by shorter codes. The Huffman coding principles are highlighted in Figure 11.13. Specifically, Column 0 in Figure 11.13 shows the set of symbols A, B, C and D, which are Huffman coded in the successive columns. At first, the symbols are sorted from


11.2. GENERAL AUDIO CODING 553

[Figure shows the Huffman coding of the symbols A (p = 0.4), B (p = 0.2), C (p = 0.3) and D (p = 0.1), merged step by step via the intermediate nodes I, II and III, with the branches of each node labelled 0 and 1.]

Figure 11.13: Huffman coding

top to bottom with decreasing probability. In every following step, the two lowest probability symbols at the bottom are combined into one symbol, which is assigned the sum of the individual probabilities. The new symbol is then fitted into the list at the correct position according to its new probability of occurrence. This procedure is continued until all codewords are merged, which leads to a coding tree structure, as seen in Figure 11.13. The assignment of Huffman coded bits is carried out as follows. At every node, the upper branch is associated with a binary '1', and the lower branch with a binary '0', or the other way round. The complete binary tree can be generated by recursively reading out the symbol list, starting with symbol 'III'. As a result, symbol A is coded as '0', B as '111', C as '10' and D as '110'. Since none of the symbols constitutes a prefix of the other symbols, their decoding is unambiguous.
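The merging procedure above can be sketched with a heap-based Huffman code constructor; the tie-breaking order among equal probabilities is our own choice, so the exact bit patterns may differ from Figure 11.13, but the code lengths must agree:

```python
import heapq

def huffman_codes(probs):
    """Build a Huffman code for a {symbol: probability} map by
    repeatedly merging the two least probable nodes, mirroring the
    column-by-column merging of Figure 11.13."""
    # Heap entries: (probability, tie-breaker, node); a node is either
    # a symbol or a (left, right) pair of child nodes.
    heap = [(p, i, sym) for i, (sym, p) in enumerate(sorted(probs.items()))]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, n1 = heapq.heappop(heap)
        p2, _, n2 = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, count, (n1, n2)))
        count += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

codes = huffman_codes({"A": 0.4, "B": 0.2, "C": 0.3, "D": 0.1})
# The most probable symbol A receives a 1-bit code, C a 2-bit code,
# and the least probable symbols B and D 3-bit codes, giving an
# average length of 0.4*1 + 0.3*2 + 0.2*3 + 0.1*3 = 1.9 bits/symbol.
```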

11.2.8 Bit-Sliced Arithmetic Coding

The Bit-Sliced Arithmetic Coding (BSAC) tool, advocated by Park et al. [449], is an alternative to the AAC noiseless Huffman coding module of Section 11.2.7, while all other modules of the AAC-based codec remain unchanged, as shown earlier in Figure 11.7. BSAC is included in MPEG-4 Audio Version 2 for supporting finely-grained bitstream scalability, and for further reducing the redundancy inherent in the scalefactors and in the quantized spectrum of the MPEG-4 T/F codec [457].

In MPEG-4 Audio Version 1, the General Audio (GA) codec supports coarse scalability, where a base layer bitstream can be combined with one or more enhancement layer bitstreams in order to achieve a higher bit rate and thus an improved audio quality. For example, in a typical scenario we may utilise a 24 kbit/s base layer together with two 16 kbit/s enhancement layers. This gives us the flexibility of decoding in three modes, namely the 24 kbit/s, 24+16=40 kbit/s or 24+16+16=56 kbit/s modes. Each layer carries a significant amount of side information and hence finely-grained scalability was not supported efficiently in Version 1.

The BSAC tool provides scalability in steps of 1 kbit/s per channel. In order to achieve finely-grained scalability, a 'bit-slicing' scheme is applied to the quantized spectral coefficients [449]. A simple illustration assisting us in understanding the operation of this BSAC algorithm is shown in Figure 11.14.


554 CHAPTER 11. MPEG-4 AUDIO COMPRESSION AND TRANSMISSION

[Figure shows the bit-slicing of x[0]=5, x[1]=1, x[2]=7 and x[3]=2 into the most significant vector 0000, the 1st significant vector 1010, the 2nd significant vector 0011 and the least significant vector 1110, together with the decomposition of each vector into Subvector 0 and Subvector 1 and the associated previous/next significance state updates.]

Figure 11.14: BSAC bit-sliced operations, where four quantized bit-sliced sequences are mapped into four 4-bit vectors.


Let us consider a quantized transform coefficient sequence x[n], each coefficient quantized with the aid of four bits, assuming the values of x[0]=5, x[1]=1, x[2]=7 and x[3]=2. Firstly, the bits of this group of sequences are processed in slices according to their significance, commencing with the MSB or LSB. Thus, the Most Significant Bits (MSB) of the quantized vectors are grouped together, yielding the bit-sliced vector 0000, followed by the 1st significant vector (1010), the 2nd significant vector (0011) and the least significant vector (1110), as displayed in the top half of Figure 11.14.
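The bit-slicing step can be illustrated in a few lines of Python; the function name and the string representation of the bit-planes are our own conveniences, not part of the standard:

```python
def bit_slice(x, bits=4):
    """Return the bit-planes of the sequence x, from the most
    significant vector down to the least significant one, each plane
    written as a string with one bit per coefficient."""
    return ["".join(str((v >> b) & 1) for v in x)
            for b in range(bits - 1, -1, -1)]

# The example sequence of Figure 11.14: x[0]=5, x[1]=1, x[2]=7, x[3]=2.
print(bit_slice([5, 1, 7, 2]))  # → ['0000', '1010', '0011', '1110']
```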

The next step is to process the four bit-sliced vectors, exploiting their previous state values, which are first initialized to zero. The MSB vector (0000) is first decomposed into two subvectors. Subvector 0 is composed of the bit values of the current vector whose previous state is 0, while Subvector 1 consists of the bit values of the current vector whose previous state is 1. Note that when a specific previous state bit is zero, the next state bit will remain zero if the corresponding bit value of the current vector is zero, and it is set to 1 when either the previous state bit or the current vector's bit value, or both, is 1.
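The subvector decomposition and state update described above can be sketched as follows, using the bit-sliced vectors of Figure 11.14; representing vectors and states as bit strings is an illustrative convenience:

```python
def decompose(vector, state):
    """Split one bit-sliced vector into Subvector 0 (positions whose
    significance state is still 0) and Subvector 1 (positions that are
    already significant), then update the state: a state bit becomes 1
    as soon as the current vector carries a 1 at that position."""
    sub0 = "".join(b for b, s in zip(vector, state) if s == "0")
    sub1 = "".join(b for b, s in zip(vector, state) if s == "1")
    next_state = "".join("1" if "1" in (b, s) else "0"
                         for b, s in zip(vector, state))
    return sub0, sub1, next_state

state = "0000"
for vector in ["0000", "1010", "0011", "1110"]:
    sub0, sub1, state = decompose(vector, state)
    print(vector, sub0, sub1, state)
# The last two slices split exactly as in Figure 11.14:
# "0011" -> Subvector 0 = "01", Subvector 1 = "01",  next state "1011"
# "1110" -> Subvector 0 = "1",  Subvector 1 = "110", next state "1111"
```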

By utilising this BSAC scheme, finely-grained bit rate scalability can be achieved by employing first the most significant bits. An increasing number of enhancement layers can be utilised by using more of the less significant bits obtained through the bit-slicing procedure. The actively encoded bandwidth can also be increased by providing bit slices of the transform coefficients in the higher frequency bands.

11.2.9 Transform-domain Weighted Interleaved Vector Quantization

As shown in Figure 11.7, the third quantization and coding tool employed for compressing the spectral components is the so-called Transform-domain Weighted Interleaved Vector Quantization (TWINVQ) [41] scheme. It is based on an interleaved vector quantization and LPC spectral estimation technique, and its performance was superior in comparison to AAC coding at bit rates below 32 kbit/s per channel [450, 458–460]. TWINVQ invokes some of the compression tools employed by the G.729 8 kbit/s standard codec [372], such as LPC analysis and LSF parameter quantization employing conjugate-structure VQ [461]. The operation of the TWINVQ encoder is shown in Figure 11.15. Each block will be described during our further discourse in a little more depth. Suffice to say that TWINVQ was found to be superior for encoding audio signals at extremely low bit rates, since the AAC codec performs poorly at low bit rates, while the CELP mode of MPEG-4 is unable to encode music signals [462]. The TWINVQ scheme has also been used as a general coding paradigm for representing both speech and music signals at a rate of 1 bit per sample [463].

More specifically, the input signal, as shown in Figure 11.15, is first transformed into the frequency domain using the MDCT. Before the transformation, the input signal is classified into one of three modes, each associated with a different transform window size, namely a long, medium or short window. In the long-frame mode, the transform size is equal to the frame size of 1024 samples. The transform operations are carried out twice per 1024-sample frame with half the transform size in the medium-frame mode, and eight times with one-eighth of the transform size in the short-frame mode. These different window sizes cater for different input signal characteristics. For example, transient signals are best encoded using a small transform size, while stationary signals can be windowed employing the normal long-frame mode.

As shown in Figure 11.15, the spectral envelope of the MDCT coefficients is approximated with the aid of LPC analysis applied to the time-domain signal. The LPC coefficients are then transformed to the Line Spectrum Pair (LSP) parameters. A two-stage split vector quantizer


[Figure shows the encoder blocks: window switch, MDCT, LPC analysis, LPC envelope, pitch component and Bark envelope extraction, followed by interleaving and weighted VQ stages, each emitting quantization indices, together with the power/gain index.]

Figure 11.15: TWINVQ encoder [458].

with inter-frame moving-average prediction was used for quantizing the LSPs, which was also employed in the G.729 8 kbit/s standard codec [372]. The MDCT coefficients are then smoothed in the frequency domain using this LPC spectral envelope. After the smoothing by the LPC envelope, the resultant MDCT coefficients still retain their spectral fine structure. In this case, the MDCT coefficients would still exhibit a high dynamic range, which is not amenable to vector quantization. Pitch analysis is also employed, in order to obtain the basic harmonic of the MDCT coefficients, although this is only applied in the long-frame mode. The periodic MDCT peak components correspond to the pitch period of the speech or audio signal. The extracted pitch parameters are quantized by the interleaved weighted vector quantization scheme [464], as will be explained later in this section.

As seen in Figure 11.15, the Bark-envelope is then determined from the MDCT coefficients, which are smoothed by the LPC spectrum. This is achieved by first calculating the square-rooted power of the smoothed MDCT coefficients corresponding to each Bark-scale subband. Subsequently, the average MDCT coefficient magnitudes of the Bark-scale subbands are normalized by their overall average value in order to create the Bark-scale envelope.


[Figure shows the input vector being interleaved, divided into subvectors, and each subvector quantized by a weighted VQ emitting its own index, with the weights applied to all subvector quantizers.]

Figure 11.16: TWINVQ interleaved weighted vector quantization process [41].

Before quantizing the Bark-scale envelope, further redundancy reduction is achieved by employing interframe backward prediction, whereby the correlation between the Bark-scale envelope of the current 23.22 ms frame and that of the previous frame is exploited. If the correlation is higher than 0.5, the prediction is activated. Hence, an extra flag bit has to be transmitted. The Bark-scale envelope is then vector quantized using the technique of interleaved weighted vector quantization, as seen at the bottom of Figure 11.15 [463] and augmented below.
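A minimal sketch of the Bark-envelope computation and the prediction-flag decision might look as follows; the band edges, the use of a normalized correlation and the function names are illustrative assumptions rather than the standard's exact definitions:

```python
import math

def bark_envelope(smoothed_mdct, band_edges):
    """Per-subband square-rooted power of the LPC-smoothed MDCT
    coefficients, normalized by the overall average value."""
    env = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        power = sum(c * c for c in smoothed_mdct[lo:hi]) / (hi - lo)
        env.append(math.sqrt(power))
    mean = sum(env) / len(env)
    return [e / mean for e in env]

def use_interframe_prediction(env, prev_env, threshold=0.5):
    """Activate backward prediction (and hence transmit the flag bit as 1)
    when the normalized correlation between the current and previous
    Bark-scale envelopes exceeds the threshold."""
    num = sum(a * b for a, b in zip(env, prev_env))
    den = math.sqrt(sum(a * a for a in env) * sum(b * b for b in prev_env))
    return den > 0 and num / den > threshold
```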

At the final audio coding stage, the smoothed MDCT coefficients are normalized by a global frequency-domain gain value, which is then scalar quantized in the logarithmic domain; this takes place in the 'Weighted VQ' block of Figure 11.15. Finally, the MDCT coefficients are interleaved, divided into subvectors for the sake of reducing the associated matching complexity, and vector quantized using a weighted distortion measure derived from the LPC spectral envelope [464]. The role of the weighting is that of reducing the spectral-domain quantization errors in the perceptually most vulnerable frequency regions. Moriya et al. [464] proposed this vector quantizer, since it constitutes a promising way of reducing the computational complexity incurred by vector quantization [461], as will be highlighted below. Specifically, this two-stage MDCT VQ-scheme uses two sets of trained codebooks for vector quantizing the MDCT coefficients of a subvector, and the MDCT subvector is reconstructed by superimposing the two codebook vectors. In the encoder, a full search is invoked for finding the combination of the code vector indices that minimizes the distortion between the input and the reconstructed MDCT subvector. This two-stage MDCT VQ-scheme constitutes a sub-optimal arrangement in comparison to a single-stage VQ; however, it significantly reduces the memory and the computational complexity required. The employment of a fixed frame rate combined with the above vector quantizer improves its robustness against


Parameters           No. of Bits
Window mode          4
MDCT coefficients    295
Bark-envelope VQ     44
Prediction switch    1
Gain factor          9
LSF VQ               19
Total bits           372

Table 11.1: MPEG-4 TWINVQ bit allocation scheme designed for a rate of 16 kbit/s, which corresponds to 372 bits per 23.22 ms frame.

errors, since it does not use any error-sensitive compression techniques, such as adaptive bit allocation or variable-length codes [458].
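The two-stage weighted full search of the TWINVQ subvector quantizer can be sketched as follows; the tiny codebooks and the weight vector are illustrative assumptions, the real codebooks being trained offline:

```python
def two_stage_weighted_vq(subvector, codebook0, codebook1, weights):
    """Exhaustively search all index pairs of the two trained codebooks.
    The subvector is reconstructed as the superposition of one entry
    from each codebook, and the perceptually weighted squared error
    is minimized over all pairs."""
    best = None
    for i, c0 in enumerate(codebook0):
        for j, c1 in enumerate(codebook1):
            err = sum(w * (x - (a + b)) ** 2
                      for w, x, a, b in zip(weights, subvector, c0, c1))
            if best is None or err < best[0]:
                best = (err, i, j)
    return best[1], best[2]

cb0 = [[0.0, 0.0], [1.0, 1.0]]
cb1 = [[0.0, 0.5], [0.5, 0.0]]
# The target (1.5, 1.0) is matched exactly by cb0[1] + cb1[1].
print(two_stage_weighted_vq([1.5, 1.0], cb0, cb1, [1.0, 1.0]))  # → (1, 1)
```

The memory saving of the split is apparent even in this toy: two codebooks of size N each span N*N reconstructions while storing only 2N vectors.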

The MPEG-4 TWINVQ bitstream structure is shown in Table 11.1 for its 16 kbit/s mode, which will be used in our investigations in order to construct a multi-mode speech transceiver, as detailed in Section 11.5. A substantial fraction of the bits was allocated for encoding the MDCT coefficients, which were smoothed by the LPC and Bark-scale spectra. Specifically, a total of 44 bits was allocated for vector quantizing the Bark-scale envelope, while one bit is used for the interframe prediction flag. Nine bits were used for encoding the global spectral-domain gain value obtained from the MDCT coefficients, and the LSF VQ requires 19 bits per 23.22 ms audio frame.
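As a quick sanity check, the entries of Table 11.1 add up to the 372-bit frame budget, which at one frame per 23.22 ms corresponds to roughly 16 kbit/s:

```python
# Bit allocation of Table 11.1 (16 kbit/s TWINVQ mode).
allocation = {
    "Window mode": 4,
    "MDCT coefficients": 295,
    "Bark-envelope VQ": 44,
    "Prediction switch": 1,
    "Gain factor": 9,
    "LSF VQ": 19,
}
total_bits = sum(allocation.values())
frame_ms = 23.22
bit_rate = total_bits / (frame_ms / 1000.0)  # bits per second
print(total_bits)       # → 372
print(round(bit_rate))  # → 16021, i.e. approximately 16 kbit/s
```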

11.2.10 Parametric Audio Coding

An enhanced functionality provided by the MPEG-4 Audio Version 2 scheme is parametric audio coding, with substantial contributions from Purnhagen et al. [465–467], Edler [468], Levine [469] and Verma [470]. This compression tool facilitates the encoding of audio signals at the very low bit rate of 4 kbit/s, using a parametric representation of the audio signal. Similarly to the philosophy of parametric speech coding, here, instead of waveform coding, the audio signal is decomposed into audio objects, which are described by appropriate source models, and the quantized model parameters are transmitted. This coding scheme is referred to as the Harmonic and Individual Lines plus Noise (HILN) technique, which includes object models for sinusoids, harmonic tones and noise components [466].

Due to the limited bit rate budget at low target bit rates, only the specific parameters that are most important for maintaining an adequate perceptual quality of the signal are transmitted. More specifically, in the context of the HILN technique, the frequency and amplitude parameters are quantized using existing masking rules from psychoacoustics [424]. The spectral envelope of the noise and harmonic tones is described using LPC techniques. Parameter prediction is employed in order to exploit the correlation between the parameters across consecutive 23.22 ms frames. The quantized parameters are finally encoded using high-efficiency, but error-sensitive Huffman coding. Using a speech/music classification tool in the encoder, it is possible to automatically activate the coding of speech signals using the HVXC parametric encoder or the HILN encoder contrived for music signals.

The operating bit rate of the HILN scheme is fixed at 6 kbit/s in the mono, 8 kHz


sampling rate mode and at 16 kbit/s in the mono, 16 kHz sampling rate mode, respectively. In an alternative proposal by Levine et al. [471], an audio codec employing switching between parametric and transform coding based representations was advocated. Sinusoidal signals and noise are modelled using multiresolution sinusoidal modelling [469] and Bark-scale based noise modelling, respectively, while the transients are represented by short-window based transform coding. Verma et al. [470] extended the work in [469] by proposing an explicit transient model for sinusoidal-like signals and for noise. A slowly varying sinusoidal signal is impulse-like in the frequency domain. By contrast, transients are impulse-like in the time domain and cannot be readily represented with the aid of Short-Time Fourier Transform (STFT) based analysis. However, due to the duality between time and frequency, transients which are impulse-like in the time domain appear to be oscillatory in the frequency domain. Hence, sinusoidal modelling can be applied after the transformation of the transient time-domain signals to sinusoidal-like signals in the frequency domain by quantizing their DCT [470] coefficients.

11.3 Speech Coding in MPEG-4 Audio

While the employment of transform coding is dominant in coding music, audio and speech signals at rates above 24 kbit/s, its performance deteriorates as the bit rate decreases. Hence, in the MPEG-4 audio scheme, dedicated speech coding tools are included, operating at bit rates in the range between 2 and 24 kbit/s [44, 49]. Variants of the Code Excited Linear Prediction (CELP) technique [365] are used for the encoding of speech signals at bit rates between 4 and 24 kbit/s, incorporating the additional flexibility of encoding speech represented at both 8 and 16 kHz sampling rates. Below 4 kbit/s, a sinusoidal technique, namely the so-called Harmonic Vector eXcitation Coding (HVXC) scheme, was selected for encoding speech signals at rates down to a bit rate of 2 kbit/s. The HVXC technique will be described in the next section, while CELP schemes will be discussed in Section 11.3.2.

11.3.1 Harmonic Vector Excitation Coding

Harmonic Vector Excitation Coding (HVXC) is based on the signal classification of voiced and unvoiced speech segments, facilitating the encoding of speech signals at 2 kbit/s and 4 kbit/s [472, 473]. Additionally, it also supports variable rate encoding by including specific coding modes for both background noise and mixed voice generation in order to achieve an average bit rate as low as 1.2 - 1.7 kbit/s.

The basic structure of an HVXC encoder is shown in Figure 11.17, which first performs LPC analysis for obtaining the LPC coefficients. The LPC coefficients are then quantized and used in the inverse LPC filtering block in order to obtain the prediction residual signal. The prediction residual signal is then transformed into the frequency domain using the Discrete Fourier Transform (DFT) and pitch analysis is invoked, in order to assist in the V/UV classification process. Furthermore, the frequency-domain spectral envelope of the prediction residual is quantized by using a combination of a two-stage shape vector quantizer and a scalar gain quantizer. For unvoiced segments, a closed-loop codebook search is carried out in order to find the best excitation vector.

Specifically, the HVXC codec operates on the basis of a 20 ms frame length for speech


[Figure shows the encoder blocks: LPC analysis and quantization, inverse LPC filtering yielding the prediction residual, DFT, pitch analysis and pitch/envelope analysis feeding the V/UV decision, envelope quantization and codebook search, all feeding the bitstream formatting stage.]

Figure 11.17: Harmonic Vector Excitation Coding

signals represented at an 8 kHz sampling rate. Table 11.2 shows the bit allocation schemes of the HVXC codec at rates of 2 and 4 kbit/s [474]. Both voiced and unvoiced speech segments use the LSF parameters and the voiced/unvoiced indicator flag. For 2 kbit/s transmission of voiced speech, the parameters include 18 bits for LSF quantization using a two-stage split vector quantizer, which facilitates the reduction of the codebook search complexity by mitigating the VQ matching complexity. Furthermore, 2 bits are used for the V/UV mode indication, where the extra bit is used to indicate the background noise interval and mixed speech modes for variable rate coding, as will be explained later. Furthermore, 7 bits are dedicated to pitch encoding, while 8 and 5 bits are used for encoding the harmonic shape and gain of the prediction residual in Figure 11.17, respectively. Explicitly, for the quantization of the harmonic spectral magnitudes/shapes of the prediction residual in Figure 11.17, a two-stage shape vector quantizer is used, where the size of both shape codebooks is 16, both requiring a four-bit index. The codebook gains are quantized using three and two bits, respectively. In the case of unvoiced speech transmission at 2 kbit/s, besides the LSF quantization indices and the V/UV indication bits, the shape and gain codebook indices of the Vector eXCitation (VXC) require 6 and 4 bits, respectively, for a 10 ms frame length.

For 4 kbit/s transmission, a coding enhancement layer is added to the 2 kbit/s base layer. In the case of LSF quantization, a 10-dimensional vector quantizer using an 8-bit codebook is added to the 18 bits/20 ms LSF quantizer scheme of the 2 kbit/s codec mode seen at the top of Table 11.2. This results in an increased bit rate requirement for LSF quantization, namely from 18 bits/20 ms to 26 bits/20 ms. A split VQ scheme, composed of four vector quantizers having addresses of 7, 10, 9 and 6 bits, respectively, is added to the two-stage vector quantizer required for the quantization of the harmonic shapes of the prediction residual in Figure 11.17. This results in a total bit rate budget increase of 32 bits/20 ms, as seen in Table 11.2. For unvoiced speech segment encoding at 4 kbit/s, the excitation vectors of the


Parameter                             Voiced           Common          Unvoiced
LSF1 (2-stage split VQ, at 2 kbit/s)                   18 bits/20 ms
LSF2 (at 4 kbit/s)                                     8 bits/20 ms
V/UV                                                   2 bits/20 ms
Pitch                                 7 bits/20 ms
Harmonic1 Shape (at 2 kbit/s)         4+4 bits/20 ms
Harmonic1 Gain (at 2 kbit/s)          5 bits/20 ms
Harmonic2 Split (at 4 kbit/s)         32 bits/20 ms
VXC1 Shape (at 2 kbit/s)                                               6 bits/10 ms
VXC1 Gain (at 2 kbit/s)                                                4 bits/10 ms
VXC2 Shape (at 4 kbit/s)                                               5 bits/5 ms
VXC2 Gain (at 4 kbit/s)                                                3 bits/5 ms
2 kbit/s mode                         40 bits/20 ms                    40 bits/20 ms
4 kbit/s mode                         80 bits/20 ms                    80 bits/20 ms

Table 11.2: MPEG-4 bit allocations at the fixed rates of 2.0 and 4.0 kbit/s using the HVXC coding mode [41].

Mode         Background Noise   Unvoiced          Mixed Voiced/Voiced
V/UV         2 bits/20 ms       2 bits/20 ms      2 bits/20 ms
LSF          0 bits/20 ms       18 bits/20 ms     18 bits/20 ms
Excitation   0 bits/20 ms       8 bits/20 ms      20 bits/20 ms
                                (gain only)       (pitch & harmonic spectral parameters)
Total        2 bits/20 ms       28 bits/20 ms     40 bits/20 ms
             = 0.1 kbit/s       = 1.4 kbit/s      = 2.0 kbit/s

Table 11.3: Bit allocations for variable rate HVXC coding [41].

enhancement layer are obtained by utilising a codebook search, and the gain/shape codebook indices which minimize the weighted distortion are transmitted. Specifically, a 5-bit shape codebook as well as a 3-bit gain codebook are used, and this procedure is updated every 5 ms. For the unvoiced speech segments, the LPC coefficients of only the current 20 ms frame are used for the two 10 ms subframes, without any interpolation procedure using the LPC coefficients of the previous frame. Again, the codec's performance was summarised in Table 11.2.

Optional variable rate coding can be applied to the HVXC codec, incorporating background noise detection, where only the mode bits are received during the "background noise mode". When the HVXC codec is in the "background noise mode", the decoding is similar to the manner applied in a UV frame, but in this scenario no LSF parameters are transmitted, while only the mode bits are transmitted. Instead, two sets of LSF parameters generated during the previous two UV frames will be used for the LPC synthesis process. During the background noise mode, fully encoded unvoiced (UV) frames are inserted every nine 20 ms frames, in order to transmit the background noise parameters. This means only eight consecutive "background noise" frames are allowed to use the same two sets of LSF parameters


                     MPEG-4 CELP
Mode                 Narrowband        Wideband
Sampling Rate (kHz)  8                 16
Bandwidth (Hz)       300 - 3400        50 - 7000
Bit Rate (kbit/s)    3.85 - 12.2       10.9 - 24.0
Excitation Scheme    MPE/RPE           RPE
Frame Size (ms)      10 - 40           10 - 20
Delay (ms)           15 - 85           18.75 - 41.75
Features             Multi Bit Rate Coding
                     Bit Rate Scalability
                     Bandwidth Scalability
                     Complexity Scalability

Table 11.4: Summary of various features of the MPEG-4 CELP codec [41].

from the previous UV frames. Thereafter a new UV frame will be transmitted. This UV frame may or may not be a real UV frame indicating the beginning of active speech bursts. This is signalled by the transmitted gain factor. If the gain factor is smaller than or equal to the previous two gain values, then this UV frame is regarded as background noise. In this case the most recent previously transmitted LSF parameters are used for maintaining the smooth variation of the LSF parameters. Otherwise, the currently transmitted LSFs are used, since the frame is deemed a real UV frame. During background noise periods, a gain-normalised Gaussian noise vector is used instead of the stochastic prediction residual shape codebook entry employed during UV frame decoding. The prediction residual gain value is encoded using an 8-bit codebook entry, as displayed in Table 11.3.
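The gain-based decision described above can be sketched as follows, assuming that "smaller than or equal to the previous two gain values" means not exceeding either of them; the function name and the mode labels are ours:

```python
def classify_uv_frame(gain, prev_gains):
    """Decide whether a newly received UV frame is background noise or
    the beginning of an active speech burst, based on the transmitted
    gain factor compared against the previous two gain values."""
    if all(gain <= g for g in prev_gains):
        return "background_noise"   # keep the most recent LSF parameters
    return "active_uv"              # adopt the currently transmitted LSFs

print(classify_uv_frame(0.2, [0.3, 0.25]))  # → background_noise
print(classify_uv_frame(0.9, [0.3, 0.25]))  # → active_uv
```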

Table 11.3 shows the bit allocation scheme of variable rate HVXC coding for four different encoding modes, which are the modes dedicated to background noise, unvoiced, mixed voiced and voiced segments. The mixed voiced and voiced modes share the same bit allocation at 2 kbit/s. The unvoiced mode operates at 1.4 kbit/s, where only the gain parameter of the vector excitation is transmitted. Finally, for the background noise mode only the two voiced/unvoiced/noise signalling bits are transmitted.

11.3.2 CELP Coding in MPEG-4

While the HVXC mode of MPEG-4 supports the very low bit rate encoding of speech signals for rates below 4 kbit/s, the CELP compression tool is used for bit rates in excess of 4 kbit/s, as illustrated in the summary of Table 11.4. The MPEG-4 CELP tool enables the encoding of speech signals at two different sampling rates, namely at 8 and 16 kHz [475]. For narrowband speech coding, the operating bit rates are between 3.85 and 12.2 kbit/s. Higher bit rates between 10.9 and 24 kbit/s are allocated for wideband speech coding, which cater for a higher speech quality due to their extended bandwidth of about 7 kHz. The MPEG-4 CELP codec supports a range of further functionalities, which include the possibility of supporting multiple bit rates, bit rate scalability, bandwidth scalability and complexity scalability. Additionally, the MPEG-4 CELP mode supports both fixed and variable bit rate transmission. The


[Figure shows the encoder blocks: preprocessing of the input speech, LPC analysis with LAR/LSF quantization and interpolation, the adaptive codebook with gain Gp and the fixed (MPE/RPE) codebook with gain Gc driving the LPC synthesis filter, and the perceptually weighted error minimization loop.]

Figure 11.18: CELP encoder.

bit rate is specified by the user's requirements, taking account of the sampling rate chosen and also of the type of LPC quantizer (scalar quantizer or vector quantizer) selected. The default CELP codec operating at the 16 kHz sampling rate employs a scalar quantizer and in this mode also the Fine Rate Control (FRC) switch is turned on. The FRC mode allows the codec to change the bit rate by skipping the transmission of the LPC coefficients, by utilising the Interpolation and the LPC Present flags [476], as will be discussed in Section 11.3.3. By contrast, at the 8 kHz sampling rate, the default MPEG-4 CELP mode utilises a vector quantizer and the FRC switch is turned off.

As shown in Figure 11.18, first the LPC coefficients of the input speech are determined and converted to Log Area Ratios (LAR) or LSFs. The LARs or LSFs are then quantized and also inverse quantized, in order to obtain the quantized LPC coefficients. These coefficients are used by the LPC synthesis filter. The excitation signal consists of the superposition of contributions by the adaptive codebook and one or more fixed codebooks. The adaptive codebook represents the periodic speech components, while the fixed codebooks are used for encoding the random speech components. The transmitted parameters include the LAR/LSF codebook indices, the pitch lag for the adaptive codebook, the shape codebook indices of the fixed codebook and the gain codebook indices of the adaptive as well as fixed codebook gains. Multi-Pulse Excitation (MPE) [477] or Regular Pulse Excitation (RPE) [478] can also be used for the fixed codebooks. The difference between the two lies in the degree of freedom of the pulse positions. MPE allows more freedom in the choice of the inter-pulse distance than RPE, which has a fixed inter-pulse distance. As a result, MPE typically achieves a better speech coding quality than RPE at a given bit rate. On the other hand, the RPE scheme imposes a lower computational complexity than MPE, which renders RPE a useful tool for wideband speech coding, where the computational complexity is naturally higher than in


[Figure shows the two quantization paths: without prediction, the first-stage VQ error of the input LSFs is quantized by a second-stage VQ; with prediction, the difference between the input LSFs and the predicted LSFs held in the predictor buffer is quantized by a second-stage VQ, yielding the quantized LSFs in either case.]

Figure 11.19: LSF VQ operating in two modes, with or without LSF prediction at the second-stage VQ.

narrowband speech coding due to the doubled sampling rate used.

11.3.3 LPC Analysis and Quantization

Depending on the tolerable complexity, the LPC coefficients can be quantized using either a scalar or a vector quantization scheme. When a scalar quantizer is used, the LPC coefficients have to be transformed to the LAR parameters. In order to obtain the LAR parameters, the LPC coefficients are first transformed to the reflection coefficients [339]. The reflection coefficients are then quantized using a look-up table. The relationship between the LARs and the reflection coefficients is described by:

LAR[i] = log((1 − q_rfc[i]) / (1 + q_rfc[i]))    (11.12)

where q_rfc represents the quantized reflection coefficients. The necessity to transmit the LARs depends on the amount of change between the current audio/speech spectrum and the spectrum described by the LARs obtained by interpolation from the LARs of the adjacent frames. If the spectral change is higher than a pre-determined threshold, then the current LAR coefficients are transmitted to the decoder. The threshold is adaptive, depending on the desired bit rate. If the resultant bit rate is higher than the desired bit rate, the threshold is raised; otherwise, it is lowered. In order to reduce the bit rate further, the LAR coefficients can be losslessly Huffman coded. We note, however, that lossless coding will only be applied to the LARs but not to the LSFs, since only the LARs are scalar quantized and there is no LAR VQ in the standard.
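Equation (11.12) and its inverse can be written directly in code; the function names are ours:

```python
import math

def reflection_to_lar(q_rfc):
    """Map quantized reflection coefficients to Log Area Ratios
    according to Equation (11.12)."""
    return [math.log((1.0 - k) / (1.0 + k)) for k in q_rfc]

def lar_to_reflection(lar):
    """Inverse mapping, as needed at the decoder: k = (1 - e^g)/(1 + e^g)."""
    return [(1.0 - math.exp(g)) / (1.0 + math.exp(g)) for g in lar]

lars = reflection_to_lar([0.5, -0.25, 0.0])
# A reflection coefficient of 0 maps to a LAR of 0; the logarithmic
# mapping expands values near |k| = 1, where the synthesis filter
# approaches instability and uniform quantization of k would be poor.
```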

If vector quantization of the LPC coefficients is used, the LPC coefficients are transformed into the LSF domain. There are two methods of quantizing the LSFs in the MPEG-4 CELP mode. We can either employ a two-stage vector quantizer without interframe LSF


11.3. SPEECH CODING IN MPEG-4 AUDIO 565

Interpolation   LPC Present   Description
      1              1        LPC_cur = interpolate(LPC_prev, LPC_next)
      0              0        LPC_cur = LPC_prev
      0              1        LPC_cur = LPC received in current frame

Table 11.5: Fine Rate Control utilising the Interpolation and LPC Present flags [41].

prediction, or in combination with interframe LSF prediction, as shown in Figure 11.19. In the case of using a two-stage vector quantizer without interframe LSF prediction, the second-stage VQ quantizes the LSF quantization error of the first stage. When interframe LSF prediction is employed, the difference between the input LSFs and the predicted LSFs is quantized. At the encoder, both methods are applied and the better method is selected by comparing the LSF quantization errors, obtained by calculating the weighted mean squared LSF error. In narrowband speech coding, the number of LSF parameters is 10, while it is 20 in the wideband MPEG-4 CELP speech encoding mode. The number of bits used for LSF quantization is 22 in the narrowband case and 46 bits in the wideband scenario, where 25 bits are used for quantizing the first ten LSF coefficients and 21 bits for the remaining ten LSFs [41].
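The two-mode selection described above can be sketched as follows; the codebooks, weights and helper names are hypothetical placeholders, and the real MPEG-4 quantizer structure is more elaborate:

```python
import numpy as np

def nearest(codebook, target):
    """Index and entry of the codebook vector closest to target (squared error)."""
    idx = int(np.argmin(np.sum((codebook - target) ** 2, axis=1)))
    return idx, codebook[idx]

def quantize_lsf(lsf, cb1, cb2_intra, cb2_pred, lsf_pred, w):
    """Try both LSF VQ modes and keep the one with the smaller weighted MSE."""
    # Mode 0: two-stage VQ without prediction; stage 2 quantizes the stage-1 error
    i1, c1 = nearest(cb1, lsf)
    i2, c2 = nearest(cb2_intra, lsf - c1)
    cand0 = c1 + c2
    # Mode 1: quantize the difference between the input and the predicted LSFs
    j, d = nearest(cb2_pred, lsf - lsf_pred)
    cand1 = lsf_pred + d
    err0 = float(np.sum(w * (lsf - cand0) ** 2))
    err1 = float(np.sum(w * (lsf - cand1) ** 2))
    return (0, (i1, i2), cand0) if err0 <= err1 else (1, (j,), cand1)
```

The mode decision itself costs one extra bit per frame, which is recouped whenever the interframe prediction tracks a slowly varying spectrum.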

The procedure of spectral envelope interpolation can be employed for interpolating both the LARs and the LSFs. The Interpolation flag, together with the LPC Present flag, unambiguously describes how the LPC coefficients of the current frame are derived. The associated functionalities are summarised in Table 11.5. Specifically, if the Interpolation flag is set to one, the LPC coefficients of the current 20 ms frame are calculated using the LPC coefficients of the previous and next frames. In general this would mean that the decoding of the current frame must be delayed by one frame. In order to avoid this latency of one frame at the decoder, the LPC coefficients of the next frame are enclosed in the current frame [41]. In this case, the LPC Present flag is set. Since the LPC coefficients of the next frame are already present in the current frame, the next frame will contain no LPC information. When the Interpolation flag is zero and the LPC Present flag is zero, the LPC parameters of the current frame are those received in the previous frame. When the Interpolation flag is zero and the LPC Present flag is one, the current frame is a complete frame and the LPC parameters received in the current frame belong to the current frame. Note that in order to maintain good subjective speech quality, consecutive frames without LPC information are not allowed. This means the Interpolation flag may not have a value of 1 in two successive frames.
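The decoder-side rule of Table 11.5 can be captured in a few lines; the plain averaging below stands in for the standard's actual interpolation, and all names are illustrative:

```python
def current_lpc(interp_flag, lpc_present, lpc_prev, lpc_next, lpc_rx):
    """Derive the current frame's LPC coefficients per Table 11.5 (sketch;
    plain averaging stands in for the standard's interpolation)."""
    if interp_flag == 1:
        # The next frame's LPC was embedded in the current frame to avoid delay
        return [(a + b) / 2.0 for a, b in zip(lpc_prev, lpc_next)]
    if lpc_present == 0:
        return lpc_prev   # reuse the previous frame's LPC
    return lpc_rx         # complete frame: LPC received in this frame
```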

11.3.4 Multi Pulse and Regular Pulse Excitation

In MPEG-4 CELP coding, the excitation vectors can be encoded using either the Multi-Pulse Excitation (MPE) [477] or the Regular-Pulse Excitation (RPE) [478] technique. MPE is the default mode used for narrowband speech coding, while RPE is the default mode for wideband speech coding, owing to its lower complexity in comparison to the MPE technique.

In Analysis-by-Synthesis (AbS) based speech codecs, the excitation signal is represented by a linear combination of the adaptive codevector and the fixed codevector, scaled by their respective gains. Each component of the excitation signal is chosen by an analysis-by-synthesis search procedure, in order to ensure that the perceptually weighted error between the input signal and the reconstructed signal is minimized [367]. The adaptive codebook parameters are constituted by the closed-loop delay and gain. The closed-loop delay is selected with the aid of a focussed search in the range around the estimated open-loop delay. The adaptive codevector is generated from a block of past excitation signal samples associated with the selected closed-loop delay. The fixed codevector contains several non-zero excitation pulses, whose positions obey an algebraic structure [367, 479]. In order to improve the achievable performance, after determining several sets of excitation pulse position candidates, a combined search based on the amalgamation of the excitation pulse position candidates and the pulse amplitudes is carried out.

Bit Rate Range   Frame Length   No. subframes   No. pulses
(kbit/s)         (ms)           per frame       per subframe
3.85 - 4.65      40             4               3...5
4.90 - 5.50      30             3               5...7
5.70 - 7.30      20             2               6...12
7.70 - 10.70     20             4               4...12
11.00 - 12.20    10             2               8...12

Table 11.6: Excitation configurations for narrowband MPE.

Bit Rate Range   Frame Length   No. subframes   No. pulses
(kbit/s)         (ms)           per frame       per subframe
10.9 - 13.6      20             4               5...11
13.7 - 14.1      20             8               3...10
14.1 - 17.0      10             2               5...11
21.1 - 23.8      10             4               3...10

Table 11.7: Excitation configurations for wideband MPE.

For narrowband speech coding utilising MPE [477], the bit rate can vary from 3.85 to 12.2 kbit/s, using different configurations based on varying the frame length, the number of subframes per frame and the number of pulses per subframe. These configurations are shown in Table 11.6 and Table 11.7 for narrowband MPE and wideband MPE, respectively.

On the other hand, Regular Pulse Excitation (RPE) [478, 480] enables implementations having a significantly lower encoder complexity at only slightly reduced compression efficiency. The RPE principle is used in wideband speech encoding, replacing MPE as the default mode and supporting bit rates between 13 and 24 kbit/s. RPE employs fixed pulse spacing, which implies that the distance between subsequent excitation pulses in the fixed codebook is fixed. This reduces the codebook search complexity required for obtaining the best indices during the analysis-by-synthesis procedure.
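The complexity advantage of the fixed pulse spacing can be seen from a small sketch: for a subframe of length N and spacing D there are only D candidate position grids, one per phase offset, rather than the combinatorial set of pulse-position patterns an MPE search must consider. The helper below is an illustrative assumption, not standard code:

```python
def rpe_grids(subframe_len, spacing):
    """All candidate regular-pulse position grids: one per phase offset (sketch)."""
    return [list(range(phase, subframe_len, spacing))
            for phase in range(spacing)]
```

For example, an 8-sample subframe with spacing 4 yields only four candidate grids, each of which is evaluated in the analysis-by-synthesis loop.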

Having introduced the most important speech and audio coding modes of the MPEG-4 codec, let us now characterize its performance in the next section.

Page 114: Voice and Audio Compression for Wireless Communicationsfiles.wireless.ecs.soton.ac.uk/newcomms/files/u1/Voice...VOICE-BOOK-2E-SAMPLE-CHAPS 2007/8/20 page 1 Voice and Audio Compression

VOICE-BOOK-2E-SAMPLE-CHAPS2007/8/20page 567

11.4. MPEG-4 CODEC PERFORMANCE 567

Figure 11.20: Segmental SNR performance for the encoding of speech signals using three different MPEG-4 coding modes, where the MPE codec and the RPE codec are employed at sampling rates of 8 kHz and 16 kHz, respectively, while the TWINVQ audio codec of Section 11.2.9 operates at a 16 kHz sampling rate.

11.4 MPEG-4 Codec Performance

Figure 11.20 shows the achievable Segmental SNR performance of the MPEG-4 codec at various bit rates, applying various speech and audio coding modes. The MPE speech codec mode was applied at bit rates between 3.85 kbit/s and 12.2 kbit/s for encoding narrowband speech, while the RPE codec of the CELP 'toolbox' is employed for wideband speech encoding, spanning from 13 kbit/s to 24 kbit/s. The TWINVQ audio codec of Section 11.2.9 was utilised for encoding music signals at bit rates of 16 kbit/s and beyond. In Figure 11.20, the codecs were characterized in terms of their performance when encoding speech signals. As expected, the Segmental SNR increases upon increasing the bit rate. When the RPE codec mode is used, the wideband speech quality is improved in terms of both the objective Segmental SNR measure and the subjective quality. In the case of the TWINVQ codec mode of Section 11.2.9, the Segmental SNR increases near-linearly with the bit rate. It is worth noting in Figure 11.20 that the RPE codec mode outperformed the TWINVQ codec mode over its entire bit rate range in the context of wideband speech encoding. This is because the RPE scheme is a dedicated speech codec, while the TWINVQ codec is a more general audio codec, which is nonetheless capable of also encoding speech signals.

Figure 11.21: Segmental SNR performance versus frame index for three different bit rates of 3.85, 6.0 and 12.0 kbit/s, using the CELP tool in MPEG-4 Audio at a sampling rate of 8 kHz for the speech file five.bin.

Figure 11.21 displays the achievable Segmental SNR performance versus frame index for the three different narrowband speech coding bit rates of 3.85, 6.0 and 12.0 kbit/s, using the MPE tool of the MPEG-4 Audio standard. The MPE tool offers the option of multi-rate coding, which is very useful in adaptive transmission schemes that adapt the source bit rate according to the near-instantaneous channel conditions.
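The Segmental SNR figures quoted in these comparisons follow the usual definition: the per-frame SNR in dB, averaged over full frames, with clamping commonly applied to limit the influence of silent or saturated frames. A minimal sketch, with illustrative frame length and clamping limits:

```python
import numpy as np

def segmental_snr(ref, rec, frame_len=160, floor_db=-10.0, ceil_db=35.0):
    """Per-frame SNR in dB, clamped and averaged over full frames (sketch)."""
    snrs = []
    for i in range(0, len(ref) - frame_len + 1, frame_len):
        r = ref[i:i + frame_len]
        e = r - rec[i:i + frame_len]
        snr = 10.0 * np.log10(np.dot(r, r) / (np.dot(e, e) + 1e-12) + 1e-12)
        snrs.append(float(np.clip(snr, floor_db, ceil_db)))
    return float(np.mean(snrs))
```

Because every frame contributes equally, a single badly reconstructed frame depresses the average far more than it would a conventional long-term SNR, which is why the measure is sensitive to frame dropping.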

The performance of the various codecs of the MPEG-4 toolbox, when used for encoding music signals, is shown in Figure 11.22 at a sampling rate of 16 kHz. We observe that, as expected, the TWINVQ codec of Section 11.2.9 performed better than the CELP codec when encoding music signals. The difference in Segmental SNR performance can be as high as 2 dB at the same bit rate.

Figure 11.22: Comparison of the Segmental SNR performance of the CELP and TWINVQ codecs at a 16 kHz sampling rate, for coding of the music file moza.bin.

11.5 MPEG-4 Space-Time Block Coded OFDM Audio Transceiver¹

The Third Generation (3G) mobile communications standards [481] are expected to provide a wide range of bearer services, spanning from voice to high-rate data services, supporting rates of at least 144 kbit/s in vehicular, 384 kbit/s in outdoor-to-indoor and 2 Mbit/s in indoor as well as in picocellular applications.

In an effort to support such high rates, the bit/symbol capacity of band-limited wireless channels can be increased by employing multiple antennas [482]. The concept of Space-Time

¹This section is based on How, Liew and Hanzo: An MPEG-4 Space-Time OFDM Audio Transceiver, submitted to IEEE Proceedings of VTC, New Jersey, USA, 2001, and it was based on collaborative research with the co-authors.


Trellis Codes (STTCs) was proposed by Tarokh, Seshadri and Calderbank [483] in 1998. By jointly designing the FEC, modulation, transmit diversity and optional receive diversity scheme, they increased the effective bits/symbol (BPS) throughput of band-limited wireless channels, given a certain channel quality. A few months later, Alamouti [50] invented a low-complexity Space-Time Block Code (STBC), which imposes a significantly lower complexity at the cost of a slight performance degradation. Alamouti's invention motivated Tarokh et al. [484, 485] to generalise Alamouti's scheme to an arbitrary number of transmitter antennas. Then, Tarokh et al., Bauch et al. [486], Agrawal [487], Li et al. [488] and Naguib et al. [489] extended the research of space-time codes from narrowband channels to dispersive channels [490]. The benefits of space-time coding in terms of mitigating the effects of channel fading are substantial, and hence space-time codes were optionally adopted in the forthcoming 3G cellular standards [491].

In recent years substantial advances have been made in the field of Orthogonal Frequency Division Multiplexing (OFDM), which was first proposed by Chang in his 1966 paper [492]. Research in OFDM was revived, amongst others, by Cimini in his often-cited paper [493], and the field was further advanced during the nineties, with a host of contributions documented for example in [494]. In Europe, OFDM has been favoured for both Digital Audio Broadcasting (DAB) and Digital Video Broadcasting (DVB) [495, 496], as well as for high-rate Wireless Asynchronous Transfer Mode (WATM) systems, due to its ability to combat the effects of highly dispersive channels [497]. Most recently, OFDM has also been proposed for the downlink of high-rate wireless Internet access [498].

At the time of writing we are witnessing the rapid emergence of intelligent multi-mode High-Speed Downlink Packet Access (HSDPA) style mobile speech and audio communicators [339, 371, 499] that can adapt their parameters in response to rapidly changing propagation environments. Simultaneously, significant efforts have been dedicated to researching the multi-rate source codecs required by near-instantaneously adaptive transceivers [500]. The recent GSM Adaptive Multi-Rate (AMR) standardization activities have prompted significant research interest in invoking the AMR mechanism in half-rate and full-rate channels [28]. Recently, ETSI also standardized the wideband AMR (AMR-WB) speech codec [337] for the GSM system, which provides a high speech quality by representing an extended audio bandwidth of 7 kHz, instead of the conventional 3.1 kHz bandwidth. Finally, the further enhanced AMR-WB+ audio and speech codec was detailed in Section 9.7.

The standardization activities within the framework of the MPEG-4 audio coding initiative [501] have also reached fruition, supporting the transmission of natural audio signals as well as the representation of synthetic audio, such as the Musical Instrument Digital Interface (MIDI) [48] and Text-to-Speech (TTS) systems [42]. A wide-ranging set of bit rates, spanning from 2 kbit/s per channel up to 64 kbit/s per channel, is supported by the MPEG-4 audio codec.

Against this backcloth, in this section the underlying trade-offs of using the multi-rate MPEG-4 TWINVQ audio encoder of Section 11.2.9, in conjunction with a turbo-coded [502] and space-time coded [483], reconfigurable BPSK/QPSK/16QAM OFDM system [51], are investigated, in order to provide an attractive system design example.



Figure 11.23: Schematic overview of the turbo-coded and space-time coded OFDM system.

11.5.1 System Overview

Figure 11.23 shows the schematic of the turbo-coded and space-time coded OFDM system. The source bits generated by the MPEG-4 TWINVQ encoder [41] are passed to the half-rate, constraint-length three turbo convolutional encoder TC(2,1,3), employing the octal generator polynomial (7,5). The encoded bits were channel interleaved and passed to the modulator. The choice of the modulation scheme to be used by the transmitter for its next OFDM symbol is determined by the channel quality estimate of the receiver based on the current OFDM symbol. Here, perfect channel quality estimation and perfect signalling of the required modem modes were assumed. In order to simplify the task of signalling the required modulation modes from receiver A to transmitter B, we employed the subband-adaptive OFDM transmission scheme proposed by Keller et al. [51]. More specifically, the total OFDM symbol bandwidth was divided into equi-width subbands having a similar channel quality, to which the same modem mode was assigned. The modulated signals were then passed to the encoder of the space-time block code G2 [50], which employs two transmitters and one receiver. The space-time encoded signals were OFDM modulated and transmitted by the corresponding antennas.
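The subband-adaptive mode assignment can be sketched as follows; splitting the subcarriers into equi-width subbands and deciding on the worst-case subcarrier SNR is one plausible policy, and the SNR thresholds are borrowed from Table 11.9 purely for illustration:

```python
import numpy as np

def assign_subband_modes(subcarrier_snr_db, n_subbands,
                         thresholds=(4.3, 7.2, 12.4)):
    """One modem mode per equi-width subband, chosen from the subband's
    worst-case subcarrier SNR (thresholds illustrative, from Table 11.9)."""
    modes = []
    for band in np.array_split(np.asarray(subcarrier_snr_db), n_subbands):
        q = float(band.min())
        if q >= thresholds[2]:
            modes.append("16QAM")
        elif q >= thresholds[1]:
            modes.append("QPSK")
        elif q >= thresholds[0]:
            modes.append("BPSK")
        else:
            modes.append("No TX")
    return modes
```

Signalling one mode per subband, rather than per subcarrier, is what keeps the mode-signalling overhead between receiver A and transmitter B manageable.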

The received signals were OFDM demodulated and passed to the space-time decoders. Logarithmic Maximum A Posteriori (Log-MAP) decoding [503] of the received space-time signals was performed, in order to provide soft outputs for the TC(2,1,3) turbo decoder. The received bits were then channel deinterleaved and passed to the TC decoder, which again employs the Log-MAP decoding algorithm. The decoded bits were finally passed to the MPEG-4 TWINVQ decoder for obtaining the reconstructed audio signal.

11.5.2 System Parameters

Tables 11.8 and 11.9 give an overview of the proposed system's parameters. The transmission parameters have been partially harmonised with those of the TDD mode of the Pan-European UMTS system [491]. The carrier frequency is 1.9 GHz and the sampling rate is 3.78 MHz, leading to a 1024-subcarrier OFDM symbol. The channel model used was the four-path COST 207 Typical Urban (TU) Channel Impulse Response (CIR) [409], where each impulse was subjected to independent Rayleigh fading having a normalised Doppler frequency of 2.25 x 10^-6, corresponding to a pedestrian scenario at a walking speed of 3 mph. The channel impulse response is shown in Figure 11.24.

The channel encoder is a convolutional constituent coding based turbo encoder [502], employing block turbo interleavers and a pseudo-random channel interleaver. Again, the constituent Recursive Systematic Convolutional (RSC) encoder employs a constraint length of 3 and the octal generator polynomial (7,5). Eight iterations are performed at the decoder, utilising the MAP algorithm and the Log-Likelihood Ratio (LLR) soft inputs provided by the demodulator.

System Parameters                 Value
Carrier Frequency                 1.9 GHz
Sampling Rate                     3.78 MHz
Channel Impulse Response          COST 207
Normalised Doppler Frequency      2.25 x 10^-6
OFDM
  Number of Subcarriers           1024
  OFDM Symbols/Packet             1
  OFDM Symbol Duration            (1024+64) x 1/(3.78 x 10^6) s
  Guard Period                    64 samples
  Modulation Scheme               Fixed Modulations
Space-Time Coding
  Number of transmitters          2
  Number of receivers             1
Channel Coding                    Turbo Convolutional
  Constraint Length               3
  Code Rate                       0.5
  Generator Polynomials           7, 5
  Turbo Interleaver Length        464/928/1856/2784
  Decoding Algorithm              Log-MAP
  Number of Iterations            8
Source Coding                     MPEG-4 TWINVQ
  Bit Rates (kbit/s)              16 - 64
  Audio Frame Length (ms)         23.22
  Sampling Rate (kHz)             44.1

Table 11.8: System Parameters

The MPEG-4 TWINVQ audio coder has been chosen for this system, since it can be programmed to operate at bit rates between 16 and 64 kbit/s. It provides a high audio quality at an adjustable bit rate and will be described in more depth in the next section.

11.5.3 Frame Dropping Procedure

For completeness, we investigated the bit sensitivity of the TWINVQ codec. A high robustness against the bit errors inflicted by wireless channels is an important criterion in the design of a communication system. A commonly used approach to quantifying the sensitivity of a


Data + Parity Bits                       928     1856    3712
Source Coded Bits/Packet                 372     743     1486
Source Coding Bit Rate (kbit/s)          16      32      64
Modulation Mode                          BPSK    QPSK    16QAM
Minimum Channel SNR for 1% FER (dB)      4.3     7.2     12.4
Minimum Channel SNR for 5% FER (dB)      2.7     5.8     10.6

Table 11.9: System parameters


Figure 11.24: COST207 channel impulse response [409].


Figure 11.25: SEGSNR degradation against bit index using MPEG-4 TWINVQ at 16 kbit/s. The corresponding bit allocation scheme was given in Table 11.1.

given bit is to invert it consistently in every audio frame and to evaluate the associated Segmental SNR (SEGSNR) degradation [371]. Figure 11.25 shows the bit error sensitivity of the MPEG-4 TWINVQ encoder of Section 11.2.9 at 16 kbit/s. The figure shows that the bits representing the gain factors (bits 345-353), the LSF parameters (bits 354-372) and the Bark-envelope (bits 302-343) are more sensitive to channel errors than the bits representing the MDCT coefficients (bits 7-301). The bits signalling the window mode used are also very sensitive to transmission errors and hence have to be well protected. The window modes were defined in Section 11.2.
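The consistent bit inversion procedure can be sketched as below; `decode` and `segsnr` are placeholders for the TWINVQ decoder and the segmental SNR measure, and frames are modelled as plain integers for brevity:

```python
def bit_sensitivity(frames, decode, segsnr, n_bits):
    """SEGSNR degradation per bit index: flip bit b in every frame (sketch)."""
    clean = segsnr(decode(frames))
    degradation = []
    for b in range(n_bits):
        corrupted = [f ^ (1 << b) for f in frames]  # frames modelled as integers
        degradation.append(clean - segsnr(decode(corrupted)))
    return degradation
```

Plotting the returned list against the bit index yields a sensitivity profile of the kind shown in Figure 11.25, which can then guide unequal error protection.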

In Section 7.13.5.2 we studied the benefits of invoking multi-class embedded error correction coding for the narrowband AMR speech codec, while Section 10 did so in the context of the AMR-WB codec. By contrast, in the wideband MPEG-4 TWINVQ system studied here, erroneously received audio frames are dropped and replaced by the previous audio frame, since the system aims to maintain a high audio quality and error-infested audio frames would result in catastrophic inter-frame error propagation. Hence the system's audio quality is determined by the tolerable transmission Frame Error Rate (FER), rather than by the BER. In order to determine the highest FER that can be tolerated by the MPEG-4 TWINVQ codec, it was exposed to random frame dropping, and the associated SEGSNR degradation, as well as the informally assessed perceptual audio degradation, was evaluated. The corresponding SEGSNR degradation is plotted in Figure 11.26. Observe in the figure that at a given FER the higher-rate modes suffer from a higher SEGSNR degradation. This is because their audio SEGSNR is inherently higher, and hence, for example, obliterating one frame in every 100 inevitably reduces the average SEGSNR more dramatically. We found that the associated audio quality, expressed in terms of the SEGSNR degradation, was deemed perceptually objectionable for frame error rates in excess of 1%. Again, frame dropping was preferred, which was found to be more beneficial


Figure 11.26: SEGSNR degradation against FER for the MPEG-4 TWINVQ codec of Section 11.2.9, at bit rates of 16, 32 and 64 kbit/s. The SEGSNR degradation values were obtained in conjunction with the employment of frame dropping.

in audio quality terms, than retaining corrupted audio frames.

For the sake of completeness, Figure 11.27 shows the SEGSNR degradation when inflicting random bit errors but retaining the corrupted audio frames. As expected, the highest bit rate mode of 64 kbit/s suffered the highest SEGSNR degradation upon increasing the BER, since a higher number of bits per frame was corrupted by errors, which degraded the audio quality more considerably.
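The frame-dropping concealment described above, where an errored frame is replaced by the previous audio frame, reduces to a few lines; here frames are opaque objects and `frame_ok` is the per-frame error-check outcome (illustrative names):

```python
def conceal(frames, frame_ok):
    """Replace each errored frame by the most recent good frame (sketch)."""
    out, last_good = [], None
    for frame, ok in zip(frames, frame_ok):
        if ok:
            last_good = frame
        out.append(frame if ok else last_good)
    return out
```

Repeating the previous frame avoids feeding corrupted parameters into the decoder's prediction memories, which is what would otherwise trigger the inter-frame error propagation mentioned above.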

11.5.4 Space-Time Coding

Traditionally, the most effective technique of combating fading has been the exploitation of diversity [483]. Diversity techniques can be divided into three broad categories, namely temporal diversity, frequency diversity and spatial diversity. Temporal and frequency diversity schemes [50] introduce redundancy in the time and/or frequency domain, which results in a loss of bandwidth efficiency. Examples of spatial diversity are constituted by multiple transmit- and/or receive-antenna based systems [483]. Transmit-antenna diversity relies on employing multiple antennas at the transmitter and hence is more suitable for downlink transmissions, since having multiple transmit antennas at the base station is certainly feasible. By contrast, receive-antenna diversity employs multiple antennas at the receiver for acquiring multiple copies of the transmitted signals, which are then combined in order to mitigate the channel-induced fading.

Space-time coding [50, 483] is a specific form of transmit-antenna diversity, which aims to usefully exploit the multipath phenomenon experienced by signals propagating through the dispersive mobile channel. This is achieved by combining multiple transmission antennas with appropriate signal processing at the receiver, in order to provide diversity


Figure 11.27: SEGSNR degradation against BER for the MPEG-4 TWINVQ codec of Section 11.2.9, at bit rates of 16, 32 and 64 kbit/s.

and coding gain in comparison to uncoded single-antenna scenarios [489]. In the system investigated, we employ a two-transmitter and one-receiver configuration, in

conjunction with turbo channel coding [502]. In Figure 11.28, we show the instantaneous channel SNR experienced by the 512-subcarrier OFDM modem for a one-transmitter, one-receiver scheme and for the space-time block code G2 [50] using two transmitters and one receiver, for transmission over the COST 207 channel. The average channel SNR was 10 dB. We can see in Figure 11.28 that the variation of the instantaneous channel SNR for the one-transmitter, one-receiver scheme is severe: the instantaneous channel SNR may become as low as 4 dB due to the deep fades inflicted by the channel. On the other hand, for the space-time block code G2 using one receiver, the variation of the instantaneous channel SNR is far less severe. Explicitly, by employing multiple transmit antennas, we have significantly reduced the depth of the channel fades, as seen in Figure 11.28. Whilst space-time coding endeavours to mitigate the fading-related time- and frequency-domain channel-quality fluctuations at the cost of an increased transmitter complexity, adaptive modulation attempts to accommodate these channel-quality fluctuations, as will be outlined in the next section.
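The G2 space-time block code referred to above is Alamouti's scheme: two symbols are transmitted from two antennas over two slots, and a simple linear combiner at the single receive antenna recovers them with diversity gain. A minimal sketch, assuming flat fading that stays constant over the two slots (noise omitted for clarity):

```python
import numpy as np

def g2_encode(symbols):
    """Alamouti G2: map each symbol pair onto two antennas over two slots."""
    out = []
    for s1, s2 in zip(symbols[0::2], symbols[1::2]):
        out.append((s1, s2))                     # slot 1: antennas send (s1, s2)
        out.append((-np.conj(s2), np.conj(s1)))  # slot 2: antennas send (-s2*, s1*)
    return out

def g2_combine(r1, r2, h1, h2):
    """Linear combining at the single receive antenna; yields the symbols
    scaled by (|h1|^2 + |h2|^2), i.e. with two-branch diversity gain."""
    return (np.conj(h1) * r1 + h2 * np.conj(r2),
            np.conj(h2) * r1 - h1 * np.conj(r2))
```

The combiner output scales each symbol by the sum of the two branch gains, which is precisely why the deep fades of the 1Tx 1Rx curve in Figure 11.28 are smoothed out in the 2Tx 1Rx case.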

11.5.5 Adaptive Modulation

In order to accommodate the time- and frequency-domain channel quality variations seen in the 1Tx 1Rx scenario of Figure 11.28, the employment of a multi-mode system is desirable, which allows us to switch between a set of different source and channel encoders, as well as various transmission parameters, depending on the instantaneous channel quality [51].

In the proposed system, we have defined three operating modes, which correspond to the uncoded audio bit rates of 16, 32 and 64 kbit/s. This corresponds to 372, 743 and 1486 bits


Figure 11.28: Instantaneous channel SNR of 512-subcarrier OFDM symbols for the one-transmitter, one-receiver (1Tx 1Rx) scheme and for the space-time block code G2 using two transmitters and one receiver (2Tx 1Rx).


Figure 11.29: FER against channel BER performance of the adaptive OFDM modem conveying 512, 1024 and 2048 BPS for transmission over the channel model of Figure 11.24.

per 23.22 ms audio frame. In conjunction with half-rate channel coding, and also allowing for checksums and signalling overheads, the number of transmitted turbo-coded bits per OFDM symbol is 928, 1856 and 3712 for the three source-coded modes, respectively. Again, these bit rates are summarised in Table 11.9. Each transmission mode uses a different modulation scheme, depending on the instantaneous channel conditions. It is beneficial if the transceiver can drop its source rate, for example from 64 kbit/s to 32 kbit/s, and invoke QPSK modulation instead of 16QAM, while maintaining the same bandwidth. Hence, during good channel conditions the higher-throughput, higher audio quality but less robust modes of operation can be invoked, while the more robust but lower audio quality BPSK/16 kbit/s mode can be applied during degrading channel conditions.
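The mode-switching rule can be sketched using the 1% FER SNR thresholds of Table 11.9; treating the channel-quality estimate as a single SNR value is a simplification of the subband-based estimate actually used:

```python
# Minimum channel SNR (dB) for 1% FER, taken from Table 11.9
MODES = [(12.4, "16QAM", 64), (7.2, "QPSK", 32), (4.3, "BPSK", 16)]

def select_mode(est_snr_db):
    """Pick the highest-rate mode whose 1% FER SNR threshold is satisfied."""
    for threshold, modulation, audio_kbps in MODES:
        if est_snr_db >= threshold:
            return modulation, audio_kbps
    return None  # channel too poor to meet the target FER in any mode
```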

Figure 11.29 shows the FER observed for all three modes of operation, namely the 512, 1024 and 2048 BPS modes, versus the channel BER predicted by the OFDM receiver during the channel quality estimation process. Again, the rationale behind using the FER, rather than the BER, for estimating the expected channel quality of the next transmitted OFDM symbol is that the MPEG-4 audio codec has to drop any turbo-decoded received OFDM symbols that contained transmission errors, since corrupted audio packets would result in detrimental MPEG-4 decoding error propagation and audio artifacts. A FER of 1% was observed for an estimated input bit error rate of about 4% for the 16 and 32 kbit/s modes, while a BER of over 5% was tolerable for the 64 kbit/s mode. This was because the number of bits per OFDM symbol, over which turbo interleaving was invoked, was quadrupled in the 16QAM mode compared to the BPSK mode; the quadrupled interleaving length substantially increased the turbo codec's performance.

11.5. MPEG-4 SPACE-TIME BLOCK CODED OFDM AUDIO TRANSCEIVER 579

Figure 11.30: BPS performance comparison between the adaptive and fixed-mode OFDM modulation schemes, when using space-time coding, for transmission over the channel model of Figure 11.24.

In Figure 11.30, we show our Bits Per Symbol (BPS) throughput performance comparison between the subband-adaptive and fixed-mode OFDM modulation schemes. From the figure we can see that at a low BPS throughput the adaptive OFDM modulation scheme outperforms the fixed OFDM modulation scheme. However, as the BPS throughput of the system increases, the fixed modulation schemes become preferable. This is because adaptive modulation is advantageous when there are substantial channel quality variations in the one-transmitter, one-receiver scheme. However, we have shown in Figure 11.28 that the channel quality variations are significantly reduced by employing two G2 space-time transmitters. Therefore, the advantages of adaptive modulation erode owing to the reduced channel quality variations of the space-time coded system. As a consequence, two system design principles of differing complexity can be proposed. The first is the lower-complexity one-transmitter, one-receiver scheme, which mitigates the severe variation of the channel quality by employing subband-adaptive OFDM modulation. By contrast, we can design a more complex G2 space-time coded system, which employs fixed modulation schemes, since no substantial benefits accrue from employing adaptive modulation, once the fading-induced channel quality fluctuations have been sufficiently mitigated by the G2 space-time code. In the remainder of this section, we have opted for investigating the performance of the more powerful space-time coded system, requiring an increased complexity.

11.5.6 System Performance

580 CHAPTER 11. MPEG-4 AUDIO COMPRESSION AND TRANSMISSION

As mentioned before, the detailed subsystem parameters used in our space-time coded OFDM system are listed in Table 11.8. Again, the channel impulse response profile used was the COST 207 Typical Urban (TU) channel [409] having four paths and a maximum dispersion of 4.5 µs, where each path was faded independently at a Doppler frequency of 2.25 · 10⁻⁶ Hz. The BER is plotted versus the channel SNR in Figure 11.31 for the three different fixed modes of operation conveying 512, 1024 or 2048 bits per OFDM symbol, both with and without space-time coding. The employment of space-time coding improved the system's performance significantly, giving an approximately 3 dB channel SNR improvement at a BER of 1%. As expected, the lowest-throughput BPSK/16 kbit/s mode was more robust in BER terms than the QPSK/32 kbit/s and the 16QAM/64 kbit/s configurations, albeit delivering a lower audio quality. Similar results were obtained in terms of the FER versus the channel SNR, which are displayed in Figure 11.32, indicating that the most robust BPSK/16 kbit/s scheme performed better than the QPSK/32 kbit/s and 16QAM/64 kbit/s configurations, albeit at a lower audio quality.

Figure 11.31: BER against channel SNR performance of the fixed-mode OFDM transceiver of Table 11.8 in conjunction with and without space-time coding, in comparison to the conventional one-transmitter, one-receiver benchmarker for transmission over the channel model of Figure 11.24.

Figure 11.32: FER against channel SNR performance of the fixed-mode OFDM transceiver of Table 11.8 in conjunction with and without space-time coding, in comparison with the conventional one-transmitter, one-receiver benchmarker for transmission over the channel model of Figure 11.24.

The overall SegSNR versus channel SNR performance of the proposed audio transceiver is displayed in Figure 11.33, again employing G2 space-time coding using two transmitters and one receiver. The lower-complexity benchmarker using the conventional one-transmitter, one-receiver scheme was also characterized in the figure. We observe again that the employment of space-time coding provides a substantial improvement in terms of maintaining an error-free audio performance. Specifically, an SNR advantage of 4 dB was recorded compared to the conventional lower-complexity one-transmitter, one-receiver benchmarker for all three modulation modes. Furthermore, focussing on the three different operating modes using space-time coding, namely on the curves drawn in continuous lines, the 16QAM/64 kbit/s mode was shown to outperform the QPSK/32 kbit/s scheme in terms of both objective and subjective audio quality for channel SNRs in excess of about 10 dB. At a channel SNR of about 9 dB, where the 16QAM and QPSK SegSNR curves cross each other in Figure 11.33, it is preferable to invoke the inherently lower audio quality, but unimpaired QPSK mode of operation. Similarly, at a channel SNR around 5 dB, when the QPSK/32 kbit/s scheme's performance starts to degrade, it is better to invoke the unimpaired BPSK/16 kbit/s mode of operation, in order to avoid the channel-induced audio artifacts.

11.6 Turbo-Detected Space-Time Trellis Coded MPEG-4 Audio Transceivers

N. S. Othman, S. X. Ng and L. Hanzo

11.6.1 Motivation and Background

In this section a jointly optimised turbo transceiver capable of providing unequal error protection is proposed for employment in an MPEG-4 coded audio transceiver. The transceiver advocated consists of Space-Time Trellis Coding (STTC), Trellis Coded Modulation (TCM) and two different-rate Non-Systematic Convolutional codes (NSCs) used for unequal error protection. A benchmarker scheme combining STTC and a single-class protection NSC is used for comparison with the proposed scheme. The audio performance of both schemes will be evaluated when communicating over uncorrelated Rayleigh fading channels. We will demonstrate that the proposed unequal-protection turbo-transceiver scheme requires about 2 dB lower transmit power than the single-class turbo benchmarker scheme in the context of the MPEG-4 audio transceiver, when aiming for an effective throughput of 2 bits/symbol, while exhibiting a similar decoding complexity.


Figure 11.33: SegSNR against channel SNR of the MPEG-4 TwinVQ based fixed-mode OFDM transceiver in conjunction with and without space-time coding, in comparison to the conventional one-transmitter, one-receiver benchmarker.

The previously characterized MPEG-4 standard [504, 505] defines a comprehensive multimedia content representation scheme that is capable of supporting numerous applications, such as streaming multimedia signals over the Internet/intranet, content-based storage and retrieval, digital multimedia broadcasting or mobile communications. The audio-related section of the MPEG-4 standard [506] defines audio codecs covering a wide variety of applications, ranging from narrowband low-rate speech to high-quality multichannel audio, and from natural sound to synthesized sound effects, as a benefit of its object-based approach used for representing the audio signals.

The MPEG-4 General Audio (GA) encoder is capable of compressing arbitrary natural audio signals. One of the key components of the MPEG-4 GA encoder is the Time/Frequency (T/F) compression scheme constituted by the Advanced Audio Coding (AAC) and the Transform-based Weighted Vector Quantization (TwinVQ), which is capable of operating at bit rates ranging from 6 kbit/s to broadcast-quality audio at 64 kbit/s [504].

The MPEG-4 T/F codec is based on the MPEG-2 AAC standard, extended by a number of additional functionalities, such as Perceptual Noise Substitution (PNS) and Long Term Prediction (LTP) for enhancing the achievable compression performance, and combined with the TwinVQ for operation at extremely low bit rates. Another important feature of this codec is its robustness against transmission errors in error-prone propagation channels [507]. The error resilience of the MPEG-4 T/F codec is mainly attributed to the so-called Virtual Codebook tool (VCB11), the Reversible Variable Length Coding tool (RVLC) and the Huffman Codeword Reordering tool (HCR) [507, 508], which facilitate the integration of the MPEG-4 T/F codec into wireless systems.

Figure 11.34: Block diagram of the serially concatenated STTC-TCM-2NSC assisted MPEG-4 audio scheme. The notations s, ŝ, b_i, b̂_i, u_i, c, x_j and y_k denote the vector of the audio source symbols, the estimate of the audio source symbols, the class-i audio bits, the estimates of the class-i audio bits, the encoded bits of the class-i NSC encoders, the TCM coded symbols, the STTC coded symbols for transmitter j and the received symbols at receiver k, respectively. Furthermore, N_t and N_r denote the number of transmitters and receivers, respectively. The symbol-based channel interleaver between the STTC and TCM schemes as well as the two bit-based interleavers at the output of the NSC encoders are not shown for simplicity. The iterative decoder seen at the right is detailed in Figure 11.35.

In this study the MPEG-4 audio codec was incorporated in a sophisticated unequal-protection turbo transceiver using joint coding and modulation as inner coding, twin-class convolutional outer coding, as well as space-time coding based spatial diversity. Specifically, maximal minimum distance Non-Systematic Convolutional codes (NSCs) [509, p. 331] having two different code-rates were used as outer encoders for providing unequal audio protection. On one hand, Trellis Coded Modulation (TCM) [510–512] constitutes a bandwidth-efficient joint channel coding and modulation scheme, which was originally designed for transmission over Additive White Gaussian Noise (AWGN) channels. On the other hand, Space-Time Trellis Coding (STTC) [511, 513] employing multiple transmit and receive antennas is capable of providing a spatial diversity gain. When the spatial diversity order is sufficiently high, the channel's Rayleigh fading envelope is transformed into a Gaussian-like near-constant envelope. Hence, the benefits of a TCM scheme designed for AWGN channels will be efficiently exploited, when TCM is concatenated with STTC.

We will demonstrate that significant iteration gains are attained with the aid of the proposed turbo transceiver. The section is structured as follows. In Section 11.6.2 we describe the MPEG-4 audio codec, while in Section 11.6.3 the architecture of the turbo transceiver is described. We elaborate further by characterising the achievable system performance in Section 11.6.4 and conclude in Section 11.6.5.

11.6.2 Audio Turbo Transceiver Overview

As mentioned above, the MPEG-4 AAC is based on time/frequency audio coding, which provides redundancy reduction by exploiting the correlation between subsequent audio samples of the input signal. Furthermore, the codec uses perceptual modelling of the human auditory system for masking the quantisation distortion of the encoded audio signals by allowing more distortion in those frequency bands where the signal exhibits higher energy peaks and vice versa [507, 508].

The MPEG-4 AAC is capable of providing an attractive audio quality versus bit rate performance, yielding high-fidelity audio reconstruction for bit rates in excess of 32 kbit/s per channel. In the proposed wireless system the MPEG-4 AAC is used for encoding a stereo audio file at a bit rate of 48 kbit/s. The audio input signal was sampled at 44.1 kHz, and hence an audio frame length of 23.22 ms corresponds to 1024 audio input samples. The compressed audio information is formatted into a packetized bitstream, where each packet conveys one audio frame. In our system, the average transmission frame size is approximately 1116 bits per frame. The audio Segmental Signal to Noise Ratio (SegSNR) of this configuration was found to be S_0 = 16.28 dB, which corresponds to a transparent audio quality.
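The frame-level figures quoted above follow directly from the sampling rate, the frame size and the encoding bit rate:

```python
# Frame-level arithmetic behind the figures above: 1024 samples per AAC frame
# at a 44.1 kHz sampling rate, encoded at 48 kbit/s.
fs_hz = 44100
frame_samples = 1024

frame_ms = 1000 * frame_samples / fs_hz          # duration of one audio frame
bits_per_frame = 48000 * frame_samples / fs_hz   # encoded bits per frame

print(round(frame_ms, 2))        # 23.22 ms per audio frame
print(round(bits_per_frame, 1))  # ~1114.6 bits, close to the ~1116 quoted
```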

It is well recognised that in highly compressed audio bitstreams even a low bit error ratio (BER) may lead to perceptually unacceptable distortion. In order to prevent the complete loss of transmitted audio frames owing to catastrophic error propagation, the most sensitive bits have to be well protected from channel errors. Hence, in the advocated system Unequal Error Protection (UEP) is employed, where the compressed audio bitstream was partitioned into two sensitivity classes. More explicitly, an audio bit which resulted in a SegSNR degradation above 16 dB upon its corruption was classified into protection class 1. A range of different audio files were used in our work and the results provided are related to a 60-second excerpt of Mozart's "Clarinet Concerto (2nd movement, Adagio)". From the bit-sensitivity studies using this audio file as the source, we found that approximately 50% of the total number of MPEG-4 encoded bits falls into class 1.
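The class partitioning described above can be sketched as follows; the per-bit degradation values in the example are purely illustrative, not measured figures from the book.

```python
# Sketch of the bit-sensitivity classification described above: a bit whose
# corruption degrades the SegSNR by more than 16 dB is placed in protection
# class 1 (degradation values below are illustrative only).
def classify_bits(degradation_db, threshold_db=16.0):
    """degradation_db[i]: SegSNR loss (dB) measured when bit i is corrupted."""
    class1 = [i for i, d in enumerate(degradation_db) if d > threshold_db]
    class2 = [i for i, d in enumerate(degradation_db) if d <= threshold_db]
    return class1, class2

c1, c2 = classify_bits([20.0, 3.0, 17.5, 1.0])
print(c1, c2)  # [0, 2] [1, 3]
```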

At the receiver, the output of the turbo transceiver is decoded using the MPEG-4 AAC decoder. During the decoding process, the erroneously received audio frames were dropped and replaced by the previous error-free audio frame for the sake of avoiding an even more dramatic error-infested audio-quality degradation [514, 515].
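The frame-replacement concealment described above can be sketched as follows; the function and variable names are illustrative rather than taken from the MPEG-4 reference implementation.

```python
# Minimal sketch of frame-replacement concealment: an erroneous frame is
# dropped and the previous error-free frame is repeated in its place.
def conceal(frames, frame_ok):
    out, last_good = [], None
    for frame, ok in zip(frames, frame_ok):
        if ok:
            last_good = frame
        # on error, fall back to the last good frame (silence at start-up)
        out.append(frame if ok else (last_good or [0] * len(frame)))
    return out

print(conceal([[1, 1], [2, 2], [3, 3]], [True, False, True]))
# [[1, 1], [1, 1], [3, 3]]
```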

11.6.3 The Turbo Transceiver

The block diagram of the serially concatenated STTC-TCM-2NSC turbo scheme using a STTC, a TCM and two different-rate NSCs as its constituent codes is depicted in Figure 11.40. Since the number of class-1 audio bits is approximately the same as that of the class-2 audio bits and there are approximately 1116 bits per audio frame, we protect the 558-bit class-1 audio sequence using a rate-R1 NSC encoder and the 558-bit class-2 sequence using a rate-R2 NSC encoder. Let us denote the turbo scheme as STTC-TCM-2NSC-1 when the NSC coding rates of R1 = k1/n1 = 1/2 and R2 = k2/n2 = 3/4 are used. Furthermore, when the NSC coding rates of R1 = 2/3 and R2 = 3/4 are used, we denote the turbo scheme as STTC-TCM-2NSC-2. The code memory of the class-1 and class-2 NSC encoders is L1 = 3 and L2 = 3, respectively. The class-1 and class-2 NSC coded bit sequences are interleaved by two separate bit interleavers, before they are fed to the rate-R3 = 3/4 TCM [510–512] scheme having a code memory of L3 = 3. Code termination was employed for the NSCs, the TCM [510–512] and the STTC codecs [511, 513]. The TCM symbol sequence is then symbol-interleaved and fed to the STTC encoder. We invoke a 16-state STTC scheme having a code memory of L4 = 4 and Nt = 2 transmit antennas, employing M = 16-level Quadrature Amplitude Modulation (16QAM) [512]. The STTC employing Nt = 2 requires one 16QAM-based termination symbol. The overall coding rate is given by Rs1 = 1116/2520 ≈ 0.4429 and Rs2 = 1116/2152 ≈ 0.5186 for the STTC-TCM-2NSC-1 and STTC-TCM-2NSC-2 schemes, respectively. The effective throughput of the STTC-TCM-2NSC-1 and STTC-TCM-2NSC-2 schemes is log2(M) · Rs1 ≈ 1.77 Bits Per Symbol (BPS) and log2(M) · Rs2 ≈ 2.07 BPS, respectively.

Figure 11.35: Block diagram of the STTC-TCM-2NSC turbo detection scheme seen at the right of Figure 11.40. The notations π_(s,bi) and π⁻¹_(s,bi) denote the interleaver and deinterleaver, where the subscript s denotes the symbol-based interleaver of the TCM and the subscript bi denotes the bit-based interleaver for the class-i NSC. Furthermore, Ψ and Ψ⁻¹ denote the LLR-to-symbol-probability and symbol-probability-to-LLR conversion, while Ω and Ω⁻¹ denote the parallel-to-serial and serial-to-parallel converter, respectively. The notation m denotes the number of information bits per TCM coded symbol. The thickness of the connecting lines indicates the number of non-binary symbol probabilities, spanning from a single LLR per bit to 2^m and 2^(m+1) probabilities [516]. ©IEE, 2004, Ng, Chung and Hanzo.
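The overall coding rates quoted above can be reproduced by the following bookkeeping, under the assumption that each rate-k/n constituent code appends k · L termination bits and that the STTC adds a single 16QAM termination symbol; this accounting is our reconstruction rather than a calculation given in the text.

```python
from math import log2

# Reconstructed rate accounting for the two STTC-TCM-2NSC schemes
# (assumption: k*L termination bits per constituent code, plus one
# 16QAM termination symbol for the STTC).
def coded_bits(info_bits, k, n, L):
    """Output bits of a rate-k/n code with memory L, including termination."""
    return (info_bits + k * L) * n // k

def overall_rate(k1, n1):
    class_bits = 558                            # audio bits per class
    nsc1 = coded_bits(class_bits, k1, n1, L=3)  # class-1 NSC
    nsc2 = coded_bits(class_bits, 3, 4, L=3)    # class-2 NSC, rate 3/4
    tcm = coded_bits(nsc1 + nsc2, 3, 4, L=3)    # rate-3/4 TCM
    symbols = tcm // 4 + 1                      # 16QAM symbols + STTC termination
    return 2 * class_bits / (4 * symbols)

Rs1 = overall_rate(1, 2)   # STTC-TCM-2NSC-1
Rs2 = overall_rate(2, 3)   # STTC-TCM-2NSC-2
print(round(Rs1, 4), round(Rs2, 4))                        # 0.4429 0.5186
print(round(log2(16) * Rs1, 2), round(log2(16) * Rs2, 2))  # 1.77 2.07
```

Under these assumptions the scheme-1 frame comprises 2520 coded bits (630 16QAM symbols) and the scheme-2 frame 2152 bits (538 symbols), matching the quoted rates.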

At the receiver, we employ Nr = 2 receive antennas and the received signals are fed to the iterative decoders for the sake of estimating the audio bit sequences of both class 1 and class 2, as seen in Figure 11.40. The STTC-TCM-2NSC scheme's turbo decoder structure is illustrated in Figure 11.35, where there are four constituent decoders, each labelled with a round-bracketed index. The Maximum A-Posteriori (MAP) algorithm [511] operating in the logarithmic domain is employed by the STTC, the TCM and the two NSC decoders. The notations P(.) and L(.) in Figure 11.35 denote the logarithmic-domain symbol probabilities and the Logarithmic-Likelihood Ratios (LLRs) of the bit probabilities, respectively. The notations c, u and bi in the round brackets (.) in Figure 11.35 denote the TCM coded symbols, the TCM information symbols and the class-i audio bits, respectively. The specific nature of the probabilities and LLRs is represented by the subscripts a, p, e and i, which denote a priori, a posteriori, extrinsic and intrinsic information, respectively. The probabilities and LLRs associated with one of the four constituent decoders having a label of 1, 2, 3a or 3b are differentiated by the identical superscripts of 1, 2, 3a or 3b. Note that the superscript 3 is used for representing the two NSC decoders 3a and 3b. The iterative turbo-detection scheme shown in Figure 11.35 enables an efficient information exchange between the STTC, TCM and NSC constituent codes for the sake of achieving a spatial diversity gain, a coding gain, unequal error protection and a near-channel-capacity performance. The information exchange mechanism between the constituent decoders is detailed in [516].

For the sake of benchmarking the scheme advocated, we created a powerful benchmarker scheme by replacing the TCM and NSC encoders of Figure 11.40 by a single NSC code having a coding rate of R0 = k0/n0 = 1/2 and a code memory of L0 = 6. We will refer to this benchmarker scheme as the STTC-NSC arrangement. All audio bits are equally protected in the benchmarker scheme by a single NSC encoder and a STTC encoder. A bit-based channel interleaver is inserted between the NSC encoder and the STTC encoder. Taking into account the bits required for code termination, the number of output bits of the NSC encoder is (1116 + k0 · L0)/R0 = 2244, which corresponds to 561 16QAM symbols. Again, a 16-state STTC scheme having Nt = 2 transmit antennas is employed. After code termination, we have 561 + 1 = 562 16QAM symbols or 4 · 562 = 2248 bits in a transmission frame at each transmit antenna. The overall coding rate is given by R = 1116/2248 ≈ 0.4964 and the effective throughput is log2(16) · R ≈ 1.99 BPS, both of which are very close to the corresponding values of the STTC-TCM-2NSC-2 scheme. A decoding iteration of the STTC-NSC benchmarker scheme comprises one STTC decoding and one NSC decoding step.
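The benchmarker's frame accounting above can be verified directly:

```python
from math import log2

# Frame accounting for the STTC-NSC benchmarker: rate-1/2 NSC with code
# memory L0 = 6, 16QAM modulation and one STTC termination symbol.
k0, n0, L0 = 1, 2, 6
audio_bits = 1116

nsc_out = (audio_bits + k0 * L0) * n0 // k0  # coded bits incl. termination
symbols = nsc_out // 4 + 1                   # 16QAM symbols + STTC termination
bits_per_frame = 4 * symbols                 # bits per transmit antenna
R = audio_bits / bits_per_frame

print(nsc_out, symbols, bits_per_frame)      # 2244 562 2248
print(round(R, 4), round(log2(16) * R, 2))   # 0.4964 1.99
```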

We will quantify the decoding complexity of the proposed STTC-TCM-2NSC scheme and that of the benchmarker scheme using the number of decoding trellis states. The total number of decoding trellis states per iteration for the proposed scheme employing two NSC decoders having a code memory of L1 = L2 = 3, a TCM having L3 = 3 and a STTC having L4 = 4 is given by S = 2^L1 + 2^L2 + 2^L3 + 2^L4 = 40. By contrast, the total number of decoding trellis states per iteration for the benchmarker scheme having a code memory of L0 = 6 and a STTC having L4 = 4 is given by S = 2^L0 + 2^L4 = 80. Therefore, the complexity of the proposed STTC-TCM-2NSC scheme using two iterations is equivalent to that of the benchmarker scheme using a single iteration, which corresponds to 80 decoding states.
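This trellis-state accounting can be expressed compactly as:

```python
# Decoding complexity measured as trellis states per iteration: each
# constituent decoder with code memory L contributes 2**L states.
def trellis_states(code_memories):
    return sum(2 ** L for L in code_memories)

proposed = trellis_states([3, 3, 3, 4])  # NSC1, NSC2, TCM, STTC
benchmark = trellis_states([6, 4])       # single NSC, STTC
print(proposed, benchmark)  # 40 80
# two iterations of the proposed scheme match one benchmarker iteration
assert 2 * proposed == benchmark
```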

11.6.4 Turbo Transceiver Performance Results

In this section we evaluate the performance of the proposed MPEG-4 based audio transceiver schemes using both the achievable Bit Error Ratio (BER) and the attainable Segmental Signal to Noise Ratio (SegSNR).
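For reference, the SegSNR metric can be computed as the per-frame SNR in dB averaged over all frames; the sketch below assumes 1024-sample frames and skips frames with zero signal or zero error energy, conventions which are not specified explicitly in the text.

```python
import math

# Segmental SNR: average over frames of the per-frame SNR in dB
# (frame length and skipping conventions are assumptions, see above).
def segsnr_db(reference, reconstructed, frame_len=1024):
    snrs = []
    for i in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[i:i + frame_len]
        err = [a - b for a, b in zip(ref, reconstructed[i:i + frame_len])]
        signal = sum(a * a for a in ref)
        noise = sum(e * e for e in err)
        if signal > 0 and noise > 0:  # skip silent / error-free frames
            snrs.append(10 * math.log10(signal / noise))
    return sum(snrs) / len(snrs) if snrs else float("inf")

# a constant signal reconstructed with a 10% amplitude error gives 20 dB
print(round(segsnr_db([1.0] * 2048, [0.9] * 2048), 2))  # 20.0
```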

Figures 11.36 and 11.37 depict the BER versus the Signal to Noise Ratio (SNR) per bit, namely Eb/N0, performance of the 16QAM-based STTC-TCM-2NSC-1 and STTC-TCM-2NSC-2 schemes, respectively, when communicating over uncorrelated Rayleigh fading channels. As we can observe from Figures 11.36 and 11.37, the gap between the BER performance of the class-1 and class-2 audio bits is wider for STTC-TCM-2NSC-1 than for the STTC-TCM-2NSC-2 scheme. More explicitly, the class-1 audio bits of STTC-TCM-2NSC-1 enjoy a higher protection at the cost of a lower throughput compared to the STTC-TCM-2NSC-2 scheme. However, the BER performance of the class-2 audio bits of the STTC-TCM-2NSC-1 arrangement is approximately 0.5 dB poorer than that of STTC-TCM-2NSC-2 at a BER of 10⁻⁵.

Figure 11.36: BER versus Eb/N0 performance of the 16QAM-based STTC-TCM-2NSC-1 assisted MPEG-4 audio scheme, when communicating over uncorrelated Rayleigh fading channels. The effective throughput was 1.77 BPS.

Figure 11.37: BER versus Eb/N0 performance of the 16QAM-based STTC-TCM-2NSC-2 assisted MPEG-4 audio scheme, when communicating over uncorrelated Rayleigh fading channels. The effective throughput was 2.07 BPS.

Let us now study the audio SegSNR performance of the schemes in Figures 11.38 and 11.39. As we can see from Figure 11.38, the SegSNR performance of STTC-TCM-2NSC-1 is inferior to that of STTC-TCM-2NSC-2, despite providing a higher protection for the class-1 audio bits. More explicitly, STTC-TCM-2NSC-2 requires Eb/N0 = 2.5 dB, while STTC-TCM-2NSC-1 requires Eb/N0 = 3 dB for attaining an audio SegSNR in excess of 16 dB after the fourth turbo iteration. Hence the audio SegSNR performance of STTC-TCM-2NSC-1 is 0.5 dB poorer than that of STTC-TCM-2NSC-2 after the fourth iteration. Note that the BER of the class-1 and class-2 audio bits for the corresponding values of Eb/N0, SegSNR and iteration index is less than 10⁻⁷ and 10⁻⁴, respectively, for the two different turbo schemes. After the sixth iteration, the SegSNR performance of both turbo schemes becomes quite similar, since the corresponding BER is low. These results demonstrate that the MPEG-4 audio decoder requires a very low BER for both the class-1 and class-2 audio bits, when aiming for a SegSNR above 16 dB. In this context it is worth mentioning that Recursive Systematic Convolutional codes (RSCs) [509–511] are capable of achieving a higher iteration gain, but suffer from an error floor. Owing to this, the SegSNR performance of the schemes employing RSCs instead of NSCs was found to be poorer. The SegSNR results of the turbo schemes employing RSCs instead of NSCs as the outer code are not shown here for reasons of space economy.

Figure 11.38: Average SegSNR versus Eb/N0 performance of the 16QAM-based STTC-TCM-2NSC assisted MPEG-4 audio schemes, when communicating over uncorrelated Rayleigh fading channels. The effective throughput of STTC-TCM-2NSC-1 and STTC-TCM-2NSC-2 was 1.77 and 2.07 BPS, respectively.

Figure 11.39: Average SegSNR versus Eb/N0 performance of the 16QAM-based STTC-NSC assisted MPEG-4 audio benchmarker scheme, when communicating over uncorrelated Rayleigh fading channels. The effective throughput was 1.99 BPS.

Figure 11.39 portrays the SegSNR versus Eb/N0 performance of the STTC-NSC audio benchmarker scheme, when communicating over uncorrelated Rayleigh fading channels. Note that if we reduce the code memory of the NSC constituent code of the STTC-NSC benchmarker arrangement from L0 = 6 to L0 = 3, the achievable performance becomes poorer, as expected. If we increased L0 from 6 to 7 (or higher), the decoding complexity would increase significantly, while the attainable best-case performance would only improve marginally. Hence, the STTC-NSC scheme having L0 = 6 constitutes a good benchmarker in terms of its performance versus complexity tradeoff. It is shown in Figures 11.38 and 11.39 that the first-iteration performance of the STTC-NSC benchmarker scheme is better than that of the proposed STTC-TCM-2NSC arrangements. However, at the same decoding complexity of 160 (240) trellis decoding states, STTC-TCM-2NSC-2 using 4 (6) iterations performs approximately 2 (1.5) dB better than the STTC-NSC arrangement using 2 (3) iterations.

It is worth mentioning that other joint coding and modulation schemes designed directly for fading channels, such as for example Bit-Interleaved Coded Modulation (BICM) [511, 512, 517], were outperformed by the TCM-based scheme, since the STTC arrangement rendered the error statistics more Gaussian-like [518].

11.6.5 MPEG-4 Turbo Transceiver Summary

In conclusion, a jointly optimised turbo transceiver consisting of audio source coding, outer unequal-protection NSC channel coding, inner TCM and spatial-diversity aided STTC was proposed for employment in an MPEG-4 wireless audio transceiver. With the aid of two different-rate NSCs the audio bits were protected differently, according to their error sensitivity. The employment of TCM improved the bandwidth efficiency of the system, and by utilising STTC spatial diversity was attained. The performance of the proposed STTC-TCM-2NSC scheme was enhanced with the advent of an efficient iterative joint decoding structure. The high-compression MPEG-4 audio decoder is sensitive to transmission errors and hence it was found to require a low BER for both classes of audio bits in order to attain a perceptually pleasing, artefact-free audio quality. The proposed twin-class STTC-TCM-2NSC scheme performs approximately 2 dB better in terms of the required Eb/N0 than the single-class STTC-NSC audio benchmarker.


11.7 Turbo-Detected Space-Time Trellis Coded MPEG-4 Versus AMR-WB Speech Transceivers

N. S. Othman, S. X. Ng and L. Hanzo

11.7.1 Motivation and Background

The MPEG-4 TwinVQ audio codec and the AMR-WB speech codec are investigated in the context of a jointly optimised turbo transceiver capable of providing unequal error protection. The transceiver advocated consists of serially concatenated Space-Time Trellis Coding (STTC), Trellis Coded Modulation (TCM) and two different-rate Non-Systematic Convolutional codes (NSCs) used for unequal error protection. A benchmarker scheme combining STTC and a single-class protection NSC is used for comparison with the proposed scheme. The audio and speech performance of both schemes is evaluated when communicating over uncorrelated Rayleigh fading channels. We will demonstrate that an Eb/N0 value of about 2.5 (3.5) dB is required for near-unimpaired audio (speech) transmission, which is about 3.07 (4.2) dB from the capacity of the system.

In recent years, joint source-channel coding (JSCC) has been receiving significant research attention in the context of both delay- and complexity-constrained transmission scenarios. JSCC aims at designing the source codec and channel codec jointly for the sake of achieving the highest possible system performance. As argued in [518], this design philosophy does not contradict the classic Shannonian source and channel coding separation theorem. This is because instead of considering perfectly lossless Shannonian entropy coders for source coding and transmitting their bitstreams over Gaussian channels, we consider low-bitrate lossy audio and speech codecs, as well as Rayleigh-fading channels. Since the bitstreams of the speech and audio encoders are subjected to errors during wireless transmission, it is desirable to provide stronger error protection for those audio bits which have a substantial effect on the objective or subjective quality of the reconstructed speech or audio signals. Unequal error protection (UEP) is a particular manifestation of JSCC, which offers a mechanism for matching channel coding schemes of different error correction capabilities to the differing bit-error sensitivities of the speech or audio bits [519].

Speech services are likely to remain the most important ones in wireless systems. However, there is an increasing demand for high-quality speech transmission in multimedia applications, such as video-conferencing [514]. Therefore, an expansion of the speech bandwidth from the 300-3400 Hz range to the wider bandwidth of 50-7000 Hz is a key factor in meeting this demand. This is because the low-frequency enhancement ranging from 50 to 200 Hz contributes to increased naturalness, presence and comfort, whilst the higher-frequency extension spanning from 3400 to 7000 Hz provides a better fricative differentiation and therefore a higher intelligibility. A bandwidth of 50 to 7000 Hz not only improves the intelligibility and naturalness of speech, but also adds an impression of transparent communication and eases speaker recognition. The Adaptive Multi-Rate Wideband (AMR-WB) voice codec has become a 3GPP standard, which provides a superior speech quality [520].

Supporting high-quality multimedia services over wireless communication channels requires the development of techniques for transmitting not only speech, but also video, music and data. Therefore, in the field of audio coding, high-quality, high-compression and highly error-resilient audio-coding algorithms are required. The MPEG-4 Transform-domain Weighted Interleaved Vector Quantization (TwinVQ) scheme is a low-bitrate audio-coding technique that achieves a high audio quality under error-free transmission conditions at bitrates below 40 kbps [506]. In order to render this codec applicable to wireless systems, which typically exhibit a high bit-error ratio (BER), powerful turbo transceivers are required.

Figure 11.40: Block diagram of the serially concatenated STTC-TCM-2NSC assisted audio/speech scheme, in which the audio/speech encoder's output is split into two bit classes encoded by the NSC1 and NSC2 encoders, TCM encoded, STTC encoded, transmitted over the fading channels and recovered by the iterative decoder feeding the audio/speech decoder. The notations s, ŝ, bi, b̂i, ui, c, xj and yk denote the vector of the audio/speech source symbols, the estimate of the audio/speech source symbols, the class-i audio/speech bits, the estimates of the class-i audio/speech bits, the encoded bits of the class-i NSC encoders, the TCM coded symbols, the STTC coded symbols of transmitter j and the received symbols at receiver k, respectively. Furthermore, Nt and Nr denote the number of transmitters and receivers, respectively. The symbol-based channel interleaver between the STTC and TCM schemes, as well as the two bit-based interleavers at the output of the NSC encoders, are not shown for simplicity. The iterative decoder seen at the right is detailed in Figure 11.43.

Trellis Coded Modulation (TCM) [510-512] constitutes a bandwidth-efficient joint channel coding and modulation scheme, which was originally designed for transmission over Additive White Gaussian Noise (AWGN) channels. Space-Time Trellis Coding (STTC) [511, 513] is a joint spatial diversity and channel coding technique. STTC may be efficiently employed in an effort to mitigate the effects of Rayleigh fading channels and render them Gaussian-like for the sake of supporting the operation of a TCM code. Recently, a sophisticated unequal-protection turbo transceiver using twin-class convolutional outer coding, as well as joint coding and modulation as inner coding combined with an STTC-based spatial diversity scheme, was designed for MPEG-4 video telephony in [516, 518]. Specifically, maximal minimum distance Non-Systematic Convolutional codes (NSCs) [509, p. 331] having two different code-rates were used as outer encoders for providing unequal MPEG-4 video protection. Good video quality was attained at a low SNR and medium complexity by the proposed transceiver. By contrast, in this section we study the achievable performance of the AMR-WB and the MPEG-4 TwinVQ speech and audio codecs in conjunction with the sophisticated unequal-protection turbo transceiver of [516, 518].

11.7.2 The AMR-WB Codec's Error Sensitivity

The synthesis filter's excitation signal in the AMR-WB codec is based on the Algebraic Code-Excited Linear Prediction (ACELP) algorithm, supporting nine different speech codec modes having bitrates of 23.85, 23.05, 19.85, 18.25, 15.85, 14.25, 12.65, 8.85 and 6.6 kbps [520]. Like most ACELP-based algorithms, the AMR-WB codec interprets 20 ms segments of speech as the output of a linear synthesis filter driven by an appropriate excitation signal. The task of the encoder is to optimise both the filter and the excitation signal and then to represent them as efficiently as possible by a frame of binary bits. At the decoder, the encoded bit-based speech description is used to synthesize the speech signal by inputting the excitation signal to the synthesis filter, thereby generating the speech segment. Again, each AMR-WB frame represents 20 ms of speech, producing 317 bits at a bitrate of 15.85 kbps. The codec parameters that are transmitted over the noisy channel include the so-called immittance spectrum pairs (ISPs), the adaptive codebook delay (pitch delay), the algebraic codebook excitation index, and the jointly vector quantized pitch gains as well as algebraic codebook gains.

Figure 11.41: SegSNR degradations versus bit index due to inflicting a 100% Bit Error Rate (BER) in the 317-bit, 20 ms AMR-WB frame. The panels indicate the bit ranges occupied by the seven ISP sub-vectors (ISP1-ISP7), the adaptive codebook index, the fixed codebook index and the VQ gain.

Most source coded bitstreams contain certain bits that are more sensitive to transmission errors than others. A common approach to quantifying the sensitivity of a given bit is to consistently invert this bit in every speech or audio frame and to evaluate the associated Segmental SNR (SegSNR) degradation [515]. The SegSNR degradation is computed by subtracting the SegSNR recorded in the presence of the channel-induced bit-errors from the SegSNR recorded under error-free conditions.
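The sensitivity-scan procedure can be sketched as follows. A toy 4-bit uniform quantizer stands in for the codec here (the actual experiments use the AMR-WB and TwinVQ coders and a frame-based SegSNR); all function names are illustrative.

```python
import math

def encode(x, n_bits=4):
    """Toy 'codec': quantize x in [0, 1) to an n-bit natural binary code."""
    q = min(int(x * 2 ** n_bits), 2 ** n_bits - 1)
    return [(q >> (n_bits - 1 - i)) & 1 for i in range(n_bits)]

def decode(bits):
    q = 0
    for b in bits:
        q = (q << 1) | b
    return (q + 0.5) / 2 ** len(bits)

def snr_db(ref, rec):
    noise = sum((r - y) ** 2 for r, y in zip(ref, rec)) / len(ref)
    signal = sum(r * r for r in ref) / len(ref)
    return 10 * math.log10(signal / noise)

def sensitivity(samples, n_bits=4):
    """SNR degradation per bit position under a 100% BER for that position."""
    clean = [decode(encode(x, n_bits)) for x in samples]
    base = snr_db(samples, clean)
    degradation = []
    for pos in range(n_bits):
        corrupted = []
        for x in samples:
            bits = encode(x, n_bits)
            bits[pos] ^= 1                 # invert this bit in *every* frame
            corrupted.append(decode(bits))
        degradation.append(base - snr_db(samples, corrupted))
    return degradation

deg = sensitivity([0.1, 0.33, 0.5, 0.72, 0.9])
# The MSB (bit 0) causes by far the largest degradation.
assert deg[0] == max(deg)
```

For a natural binary quantizer the ranking is predictable (MSB worst, LSB mildest); for a real codec the profile has to be measured, which is exactly what Figures 11.41 and 11.42 report.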

The error sensitivity of the various encoded bits of the AMR-WB codec determined in this way is shown in Figure 11.41. The results are based on samples taken from the EBU SQAM (Sound Quality Assessment Material) CD, sampled at 16 kHz and encoded at 15.85 kbps. It can be observed that the bits representing the ISPs, the adaptive codebook delay, the algebraic codebook index and the vector quantized gains are fairly error sensitive. The least sensitive bits are related to the fixed codebook's excitation pulse positions, as shown in Figure 11.41. This is because, when one of the fixed codebook index bits is corrupted, the codebook entry selected at the decoder will differ from that used in the encoder only in the position of one of the non-zero excitation pulses. Therefore the corrupted excitation codebook entry will be similar to the original one. Hence, the algebraic codebook structure used in the AMR-WB codec is quite robust to channel errors.

11.7.3 The MPEG-4 TwinVQ Codec's Error Sensitivity

The MPEG-4 TwinVQ scheme is a transform coder that uses the modified discrete cosine transform (MDCT) [506] for transforming the input signal into frequency-domain transform coefficients. The input signal is classified into one of three modes, each associated with a different transform window size, namely a long, medium or short window, catering for different input signal characteristics. The MDCT coefficients are normalized by the spectral envelope information obtained through the Linear Predictive Coding (LPC) analysis of the signal. Then the normalized coefficients are interleaved and divided into sub-vectors by using the so-called interleave and division technique of [506], and all sub-vectors are encoded separately by the VQ modules.
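One plausible reading of the interleave-and-division step can be sketched as below: sub-vector k collects every k-th coefficient in a stride-wise fashion, so each sub-vector samples the whole spectrum rather than one narrow band. The exact MPEG-4 rule is defined in the standard [506]; this function is only an illustrative stand-in.

```python
def interleave_and_divide(coeffs, n_subvectors):
    """Split flattened (normalized MDCT) coefficients into interleaved
    sub-vectors: sub-vector k takes coefficients k, k+n, k+2n, ...  This
    spreads spectrally adjacent coefficients across different VQ modules."""
    return [coeffs[k::n_subvectors] for k in range(n_subvectors)]

subs = interleave_and_divide(list(range(12)), 3)
print(subs)  # [[0, 3, 6, 9], [1, 4, 7, 10], [2, 5, 8, 11]]
```

A side benefit of such interleaving is error resilience: a corrupted sub-vector perturbs isolated coefficients spread over the spectrum instead of wiping out a contiguous band.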

Figure 11.42: SegSNR degradations due to inflicting a 100% BER in the 743-bit, 23.22 ms MPEG-4 TwinVQ frame. The plot shows the SegSNR degradation (dB, 0-35) versus bit index (0-700), indicating the bit ranges occupied by the window mode, the gain, the LSP parameters, the Bark envelope and the MDCT coefficients.

Similarly, bit error sensitivity investigations were performed in the same way as described in the previous section. Figure 11.42 shows the error sensitivity of the various bits of the MPEG-4 TwinVQ codec at a bitrate of 32 kbps. The results provided are based on a 60-second excerpt of Mozart's "Clarinet Concerto (2nd movement - Adagio)". This stereo audio file was sampled at 44.1 kHz and, again, encoded at 32 kbps. Since the analysis frame length is 23.22 ms, which corresponds to 1024 audio input samples, there are 743 encoded bits in each frame. The figure shows that the bits representing the gain factors, the Line Spectral Frequency (LSF) parameters and the Bark-envelope are more sensitive to channel errors than the bits representing the MDCT coefficients. The bits signalling the window mode used are also very sensitive to transmission errors and hence have to be well protected. The proportion of sensitive bits was only about 10%. This robustness is deemed to be a benefit of the weighted vector-quantization procedure, which uses a fixed-length coding structure, as opposed to an error-sensitive variable-length structure, where transmission errors would result in a loss of synchronisation.

Figure 11.43: Block diagram of the STTC-TCM-2NSC turbo detection scheme seen at the right of Figure 11.40. The four constituent MAP decoders, namely (1) the STTC decoder, (2) the TCM decoder as well as the (3a) class-1 and (3b) class-2 NSC decoders, exchange a-priori, a-posteriori and extrinsic probabilities and LLRs. The notations πs and πbi and their inverses denote the interleavers and deinterleavers, where the subscript s indicates the symbol-based interleaver of the TCM scheme and the subscript bi indicates the bit-based interleaver of the class-i NSC. Furthermore, Ψ and Ψ⁻¹ denote the LLR-to-symbol-probability and symbol-probability-to-LLR conversions, while Ω and Ω⁻¹ denote the parallel-to-serial and serial-to-parallel converters, respectively. The notation m denotes the number of information bits per TCM coded symbol [516]. ©IEE, 2004, Hanzo.

11.7.4 The Turbo Transceiver

Once the bit error sensitivity of the audio/speech codecs had been determined, the bits of the AMR-WB and the MPEG-4 TwinVQ codecs were protected according to their relative importance. Figure 11.40 shows the schematic of the serially concatenated STTC-TCM-2NSC turbo scheme using an STTC and a TCM scheme as well as two different-rate NSCs as its constituent codes. Let us denote the turbo scheme using the AMR-WB codec as STTC-TCM-2NSC-AMR-WB, whilst STTC-TCM-2NSC-TVQ refers to the turbo scheme using the MPEG-4 TwinVQ as the source codec. For comparison, both schemes protect the 25% most sensitive bits in class-1 using an NSC code rate of R1 = k1/n1 = 1/2. By contrast, the remaining 75% of the bits in class-2 are protected by an NSC scheme having a rate of R2 = k2/n2 = 3/4. The code memory of the class-1 and class-2 encoders is L1 = 3 and L2 = 3, respectively. The class-1 and class-2 NSC coded bit sequences are interleaved by two separate bit interleavers before they are fed to the rate-R3 = 3/4 TCM scheme [510-512] having a code memory of L3 = 3. Code termination was employed for the NSCs, as well as for the TCM [510-512] and STTC codecs [511, 513]. The TCM symbol sequence is then symbol-interleaved and fed to the STTC encoder, as seen in Figure 11.43. We invoke a 16-state STTC scheme having a code memory of L4 = 4 and Nt = 2 transmit antennas, employing M = 16-level Quadrature Amplitude Modulation (16QAM) [512]. The STTC scheme employing Nt = 2 requires a single 16QAM-based termination symbol. In the STTC-TCM-2NSC-AMR-WB scheme the 25% of the bits classified into class-1 includes 23 header bits, which gives a total of 340 NSC1-encoded bits. In the ITU stream format [521], the header bits of each frame include the frame type and the window mode used. Hence, the overall coding rate of the STTC-TCM-2NSC-AMR-WB scheme becomes RAMRWB = 340/720 ≈ 0.4722. By contrast, the overall coding rate of the STTC-TCM-2NSC-TVQ scheme is RTVQ = 744/1528 ≈ 0.4869. The effective throughput of the STTC-TCM-2NSC-AMR-WB and STTC-TCM-2NSC-TVQ schemes is log2(M) · RAMRWB ≈ 1.89 Bits Per Symbol (BPS) and log2(M) · RTVQ ≈ 1.95 BPS, respectively.
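The quoted rates and throughputs follow directly from the bit counts stated above; a quick check:

```python
from math import log2

M = 16  # 16QAM
# (source bits per frame, channel-coded bits per frame), as stated in the text.
schemes = {
    "STTC-TCM-2NSC-AMR-WB": (340, 720),
    "STTC-TCM-2NSC-TVQ": (744, 1528),
}
for name, (k, n) in schemes.items():
    R = k / n             # overall coding rate
    bps = log2(M) * R     # effective throughput in bits per 16QAM symbol
    print(f"{name}: R = {R:.4f}, throughput = {bps:.2f} BPS")
# STTC-TCM-2NSC-AMR-WB: R = 0.4722, throughput = 1.89 BPS
# STTC-TCM-2NSC-TVQ: R = 0.4869, throughput = 1.95 BPS
```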

At the receiver, we employ Nr = 2 receive antennas and the received signals are fed to the iterative decoders for the sake of estimating the audio bit sequences of both class-1 and class-2, as seen in Figure 11.40. The STTC-TCM-2NSC scheme's turbo decoder structure is illustrated in Figure 11.43, where there are four constituent decoders, each labelled with a round-bracketed index. The Maximum A-Posteriori (MAP) algorithm [511] operating in the logarithmic domain is employed by the STTC and TCM schemes, as well as by the two NSC decoders. The iterative turbo-detection scheme shown in Figure 11.43 enables an efficient information exchange between the STTC, TCM and NSC constituent codes for the sake of achieving spatial diversity gain, coding gain, unequal error protection and a near-channel-capacity performance. The information exchange mechanism between the constituent decoders is detailed in [516].

For the sake of benchmarking both audio schemes advocated, we created a powerful benchmarker scheme for each of them by replacing the TCM and NSC encoders of Figure 11.40 by a single-class NSC codec having a coding rate of R0 = k0/n0 = 1/2 and a code memory of L0 = 6. Note that if we reduce the code memory of the NSC constituent code of the STTC-NSC benchmarker arrangement from L0 = 6 to 3, the achievable performance becomes poorer, as expected. If we increased L0 from 6 to 7 (or higher), the decoding complexity would double, while the attainable performance would only be marginally improved. Hence, the STTC-NSC scheme having L0 = 6 constitutes a good benchmarker in terms of its performance versus complexity tradeoff. We will refer to the benchmarker schemes designed for the audio and the speech transceivers as the STTC-NSC-TVQ and the STTC-NSC-AMR-WB arrangements, respectively. Again, all audio and speech bits are equally protected in the benchmarker scheme by a single NSC encoder and an STTC encoder. A bit-based channel interleaver is inserted between the NSC encoder and the STTC encoder. Taking into account the bits required for code termination, the number of output bits of the NSC encoder of the STTC-NSC-TVQ benchmarker scheme is (744 + k0·L0)/R0 = 1500, which corresponds to 375 16QAM symbols. By contrast, in the STTC-NSC-AMR-WB scheme the number of output bits, after taking into account the bits required for code termination, becomes (340 + k0·L0)/R0 = 692, which corresponds to 173 16QAM symbols. Again, a 16-state STTC scheme having Nt = 2 transmit antennas is employed. After code termination, we have 375 + 1 = 376 16QAM symbols, or 4 · 376 = 1504 bits, in a transmission frame at each transmit antenna for the STTC-NSC-TVQ scheme. The overall coding rate is given by RTVQ-b = 744/1504 ≈ 0.4947 and the effective throughput is log2(16) · RTVQ-b ≈ 1.98 BPS, both of which are very close to the corresponding values of the STTC-TCM-2NSC-TVQ scheme. Similarly, for the STTC-NSC-AMR-WB scheme, after code termination, we have 173 + 1 = 174 16QAM symbols, or 4 · 174 = 696 bits, in a transmission frame at each transmit antenna. This gives an overall coding rate of RAMRWB-b = 340/696 ≈ 0.4885 and an effective throughput of log2(16) · RAMRWB-b ≈ 1.95 BPS. Again, both values are close to the corresponding values of the STTC-TCM-2NSC-AMR-WB scheme. A decoding iteration of each of the STTC-NSC benchmarker schemes is comprised of an STTC
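The benchmarker's frame sizing can be reproduced from the stated parameters (k0 = 1 is assumed here, consistent with the rate-1/2 code R0 = k0/n0 and the stated totals):

```python
def benchmark_frame(source_bits, k0=1, L0=6, R0=0.5, bits_per_symbol=4):
    """Frame sizing for the single-class STTC-NSC benchmarker: append the
    k0*L0 NSC termination bits, apply the rate-R0 NSC, map the coded bits
    to 16QAM symbols and append one STTC termination symbol."""
    coded_bits = int((source_bits + k0 * L0) / R0)
    symbols = coded_bits // bits_per_symbol + 1  # +1 STTC termination symbol
    frame_bits = symbols * bits_per_symbol
    rate = source_bits / frame_bits              # overall coding rate
    return coded_bits, symbols, rate, bits_per_symbol * rate

print(benchmark_frame(744))  # TVQ:    1500 coded bits, 376 symbols, R ~ 0.4947, ~1.98 BPS
print(benchmark_frame(340))  # AMR-WB:  692 coded bits, 174 symbols, R ~ 0.4885, ~1.95 BPS
```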


decoding step and an NSC decoding step.

We will quantify the decoding complexity of the proposed STTC-TCM-2NSC schemes and that of the corresponding benchmarker schemes using the number of decoding trellis states. The total number of decoding trellis states per iteration of the proposed scheme, which employs two NSC decoders having a code memory of L1 = L2 = 3, the TCM scheme having L3 = 3 and the STTC arrangement having L4 = 4, becomes S = 2^L1 + 2^L2 + 2^L3 + 2^L4 = 40. By contrast, the total number of decoding trellis states per iteration of the benchmarker scheme, having a code memory of L0 = 6 and an STTC code memory of L4 = 4, is given by S = 2^L0 + 2^L4 = 80. Therefore, the complexity of the proposed STTC-TCM-2NSC scheme invoking two iterations is equivalent to that of the benchmarker scheme invoking a single iteration, corresponding to 80 decoding trellis states.
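The complexity bookkeeping above is a plain sum of per-decoder trellis sizes:

```python
def trellis_states(*memories):
    """Total decoding trellis states per iteration: the sum of 2**L over
    the constituent decoders with code memories L."""
    return sum(2 ** L for L in memories)

proposed = trellis_states(3, 3, 3, 4)  # NSC1, NSC2, TCM, STTC
benchmark = trellis_states(6, 4)       # single-class NSC, STTC
print(proposed, benchmark)  # 40 80
# Two iterations of the proposed scheme match one benchmarker iteration:
assert 2 * proposed == benchmark
```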

11.7.5 Performance Results

In this section we comparatively study the performance of the audio and speech transceivers using the Segmental Signal-to-Noise Ratio (SegSNR) metric.

Figure 11.44: Average SegSNR versus Eb/N0 performance of the 16QAM-based STTC-TCM-2NSC assisted MPEG-4 TwinVQ audio scheme, when communicating over uncorrelated Rayleigh fading channels, for one to six decoding iterations. The effective throughput was 1.95 BPS.

Figures 11.44 and 11.45 depict the audio SegSNR performance of the STTC-TCM-2NSC-TVQ scheme and that of its corresponding STTC-NSC-TVQ benchmarker scheme, respectively, when communicating over uncorrelated Rayleigh fading channels. It can be seen from Figures 11.44 and 11.45 that the non-iterative single-detection performance of the STTC-NSC-TVQ benchmarker scheme is better than that of the STTC-TCM-2NSC assisted MPEG-4 TwinVQ audio scheme. However, at the same decoding complexity, quantified in terms of the number of trellis decoding states, the STTC-TCM-2NSC-TVQ arrangement performs approximately 0.5 dB better in terms of the required channel Eb/N0 value than the STTC-NSC-TVQ benchmarker scheme, both exhibiting a SegSNR of 13.8 dB. For example, a decoding complexity of 160 trellis decoding states corresponds to the STTC-TCM-2NSC-TVQ scheme's 4th iteration, whilst in the STTC-NSC-TVQ scheme it corresponds to the 2nd iteration. Therefore, we observe in Figures 11.44 and 11.45 that the STTC-TCM-2NSC-TVQ arrangement performs 0.5 dB better in terms of the required channel Eb/N0 value than its corresponding benchmarker scheme.

Figure 11.45: Average SegSNR versus Eb/N0 performance of the 16QAM-based STTC-NSC assisted MPEG-4 TwinVQ audio benchmarker scheme, when communicating over uncorrelated Rayleigh fading channels, for one to four decoding iterations. The effective throughput was 1.98 BPS.

Figure 11.46: Average SegSNR versus Eb/N0 performance of the 16QAM-based STTC-TCM-2NSC assisted AMR-WB speech scheme, when communicating over uncorrelated Rayleigh fading channels, for one to six decoding iterations. The effective throughput was 1.89 BPS.

Figure 11.47: Average SegSNR versus Eb/N0 performance of the 16QAM-based STTC-NSC assisted AMR-WB speech benchmarker scheme, when communicating over uncorrelated Rayleigh fading channels, for one to four decoding iterations. The effective throughput was 1.95 BPS.

Similarly, it can be observed from Figures 11.46 and 11.47 that at a decoding complexity of 160 trellis decoding states the STTC-TCM-2NSC-AMR-WB arrangement performs 0.5 dB better in terms of the required channel Eb/N0 value than the STTC-NSC-AMR-WB scheme, when targeting a SegSNR of 10.6 dB. By comparing Figures 11.44 and 11.46, we observe that the SegSNR performance of the STTC-TCM-2NSC-AMR-WB scheme is inferior to that of the STTC-TCM-2NSC-TVQ scheme.

More explicitly, the STTC-TCM-2NSC-TVQ system requires an Eb/N0 value of 2.5 dB, while the STTC-TCM-2NSC-AMR-WB arrangement necessitates Eb/N0 = 3.0 dB for attaining their respective maximum average SegSNRs. The maximum attainable average SegSNRs of the STTC-TCM-2NSC-TVQ and STTC-TCM-2NSC-AMR-WB schemes are 13.8 dB and 10.6 dB, respectively.

This discrepancy arises because both schemes map the most sensitive 25% of the encoded bits to class-1. However, based on the bit error sensitivity study of the MPEG-4 TwinVQ codec outlined in Section 11.7.3, only 10% of the MPEG-4 TwinVQ encoded bits were found to be gravely error sensitive. Therefore, the 25% of class-1 bits of the MPEG-4 TwinVQ scheme also includes some bits that were found to be only moderately sensitive to channel errors. By contrast, in the case of the AMR-WB codec all the bits of the 25% partition were found to be quite sensitive to channel errors. Furthermore, the frame length of the STTC-TCM-2NSC-TVQ scheme is longer than that of the STTC-TCM-2NSC-AMR-WB arrangement and hence it benefits from a higher coding gain.

It is worth mentioning that the channel capacity of the system employing the full-diversity STTC scheme with the aid of Nt = 2 transmit antennas and Nr = 2 receive antennas is -0.57 dB and -0.70 dB for the throughputs of 1.95 BPS and 1.89 BPS, respectively, when communicating over uncorrelated Rayleigh fading channels [522].


11.7.6 AMR-WB and MPEG-4 TwinVQ Turbo Transceiver Summary

In this section we comparatively studied the performance of the MPEG-4 TwinVQ and AMR-WB audio/speech codecs combined with a jointly optimised turbo transceiver comprising source coding, outer unequal-protection NSC channel coding, inner TCM and spatial-diversity aided STTC. The audio bits were protected differently according to their error sensitivity with the aid of two different-rate NSCs. The employment of TCM improved the bandwidth efficiency of the system and, by utilising STTC, spatial diversity was attained. The performance of the STTC-TCM-2NSC scheme was enhanced with the advent of an efficient iterative joint decoding structure. Both proposed twin-class STTC-TCM-2NSC schemes perform approximately 0.5 dB better in terms of the required Eb/N0 than the corresponding single-class STTC-NSC audio benchmarker schemes. This relatively modest advantage of the twin-class protected transceiver was a consequence of having a rather limited turbo-interleaver length. With the longer interleaver of the videophone system of [516, 518] an approximately 2 dB Eb/N0 gain was achieved. For a longer-delay non-realtime audio streaming scheme a similar performance to that of [516] would be achieved. Our future work will further improve the achievable audio performance using the soft speech-bit decoding technique of [523].

11.8 Chapter Summary

In this chapter the MPEG-4 Audio standard was discussed in detail. The MPEG-4 audio standard is constituted by a toolbox of different coding algorithms, designed for coding both speech and music signals at rates ranging from as low as 2 kbit/s to as high as 64 kbit/s. In Section 11.2, the important milestones in the field of audio coding were described and summarised in Figure 11.2. Specifically, four key technologies, namely perceptual coding, frequency-domain coding, the window switching strategy and the dynamic bit allocation technique, were fundamentally important in the advancement of audio coding. The MPEG-2 AAC codec [40], as described in Section 11.2.1, forms a core part of the MPEG-4 audio codec. Various tools that can be used for processing the transform coefficients in order to achieve an improved coding efficiency were highlighted in Sections 11.2.2 to 11.2.5. The AAC quantization procedure was discussed in Section 11.2.6, while two other tools provided for encoding the transform coefficients, namely the BSAC and TwinVQ techniques, were detailed in Sections 11.2.8 and 11.2.9, respectively. More specifically, the BSAC coding technique provides finely-grained bitstream scalability, in order to further reduce the redundancy inherent in the quantized spectrum of the audio signal generated by the MPEG-4 codec. The TwinVQ codec [463] described in Section 11.2.9 was found to be capable of encoding both speech and music signals, which renders it an attractive option for low bitrate audio coding.

In Section 11.3, which was dedicated to speech coding tools, the HVXC and CELP codecs were discussed. The HVXC codec is employed for encoding speech signals in the bitrate range spanning from 2 to 4 kbit/s, while the CELP codec is used at bitrates between 4 and 24 kbit/s, with the additional capability of encoding speech signals at the sampling rates of 8 and 16 kHz.

In Section 11.5 turbo coded and space-time coded adaptive as well as fixed modulation based OFDM assisted MPEG-4 audio systems have been investigated. The transmission parameters have been partially harmonised with the UMTS TDD mode [491], which provides an attractive system design framework. More specifically, we employed the MPEG-4 TwinVQ codec at the bitrates of 16, 32 and 64 kbit/s. We found that by employing space-time coding the channel quality variations were significantly reduced and no additional benefits could be gained by employing adaptive modulation. However, adaptive modulation was found beneficial when it was employed in a low-complexity one-transmitter, one-receiver scenario, where high channel quality variations were observed. The space-time coded two-transmitter, one-receiver configuration was shown to outperform the conventional one-transmitter, one-receiver scheme by about 4 dB in terms of channel SNR over the highly dispersive COST207 TU channel.

In Section 11.7.6 we comparatively studied the performance of the MPEG-4 TwinVQ and AMR-WB audio/speech codecs combined with a jointly optimised turbo transceiver comprising source coding, outer unequal-protection NSC channel coding, inner TCM and spatial-diversity aided STTC. The employment of TCM provided further error protection without expanding the bandwidth of the system and, by utilising STTC, spatial diversity was attained, which rendered the error statistics experienced pseudo-random, as required by the TCM scheme, since it was designed for Gaussian channels inflicting randomly dispersed channel errors. Finally, the performance of the STTC-TCM-2NSC scheme was enhanced with the advent of an efficient iterative joint decoding structure. Both proposed twin-class STTC-TCM-2NSC schemes perform approximately 0.5 dB better in terms of the required Eb/N0 than the corresponding single-class STTC-NSC audio benchmarker schemes. This relatively modest advantage of the twin-class protected transceiver was a consequence of the rather limited turbo-interleaver length imposed by the limited tolerable audio delay. With the longer interleaver of the less delay-limited videophone system of [516, 518] an approximately 2 dB Eb/N0 gain was achieved. For a longer-delay non-realtime audio streaming scheme a similar performance to that of [516] would be achieved.


Part IV

Very Low Rate Coding and Transmission

601


Bibliography

[1] R. Cox and P. Kroon, “Low bit-rate speech coders for multimedia communications,” IEEE Communications Magazine, pp. 34–41, December 1996.

[2] R. Cox, “Speech coding standards,” in Speech Coding and Synthesis (W. Kleijn and K. Paliwal, eds.), ch. 2, pp. 49–78, The Netherlands: Elsevier, 1995.

[3] R. Steele, Delta Modulation Systems. London, UK: Pentech Press, 1975.

[4] K. Cattermole, Principles of Pulse Code Modulation. London, UK: Iliffe Books, 1969.

[5] J. Markel and A. Gray Jr., Linear Prediction of Speech. New York, USA: Springer-Verlag, 1976.

[6] L. Rabiner and R. Schafer, Digital Processing of Speech Signals. Englewood Cliffs, NJ, USA: Prentice-Hall, 1978.

[7] B. Lindblom and S. Ohman, Frontiers of Speech Communication Research. New York, USA: Academic Press, 1979.

[8] J. Tobias, ed., Foundations of Modern Auditory Theory. New York, USA: Academic Press, 1970. ISBN: 0126919011.

[9] B. Atal and J. Remde, “A new model of LPC excitation for producing natural-sounding speech at low bit rates,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’82 [636], pp. 614–617.

[10] N. Jayant and P. Noll, Digital Coding of Waveforms, Principles and Applications to Speech and Video. Englewood Cliffs, NJ, USA: Prentice-Hall, 1984.

[11] P. Kroon, E. Deprettere, and R. Sluyter, “Regular-pulse excitation — a novel approach to effective and efficient multipulse coding of speech,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 34, pp. 1054–1063, October 1986.

[12] P. Vary and R. Sluyter, “MATS-D speech codec: Regular-pulse excitation LPC,” in Proceedings of the Nordic Seminar on Digital Land Mobile Radio Communications (DMR II), (Stockholm, Sweden), pp. 257–261, October 1986.


[13] P. Vary and R. Hoffmann, “Sprachcodec für das europäische Funkfernsprechnetz,” Frequenz, vol. 42, no. 2/3, pp. 85–93, 1988. (in German).

[14] W. Hess, Pitch determination of speech signals: algorithms and devices. Berlin, Germany: Springer-Verlag, 1983.

[15] G. Gordos and G. Takacs, Digital Speech Processing (Digitalis Beszed Feldolgozas). Budapest, Hungary: Technical Publishers (Muszaki Kiado), 1983. (in Hungarian).

[16] M. Schroeder and B. Atal, “Code excited linear prediction (CELP): High-quality speech at very low bit rates,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’85, (Tampa, Florida, USA), pp. 937–940, IEEE, 26–29 March 1985.

[17] D. O’Shaughnessy, Speech Communication: Human and Machine. Addison-Wesley, 1987. ISBN: 0780334493.

[18] P. Papamichalis, Practical Approaches to Speech Coding. Englewood Cliffs, NJ, USA: Prentice-Hall, 1987.

[19] J. Deller, J. Proakis, and J. Hansen, Discrete-time processing of speech signals. Englewood Cliffs, NJ, USA: Prentice-Hall, 1987.

[20] P. Lieberman and S. Blumstein, Speech physiology, speech perception, and acoustic phonetics. Cambridge: Cambridge University Press, 1988.

[21] S. Quackenbush, T. Barnwell III, and M. Clements, Objective measures of speech quality. Englewood Cliffs, NJ, USA: Prentice-Hall, 1988.

[22] S. Furui, Digital Speech Processing, Synthesis and Recognition. Marcel Dekker, 1989.

[23] R. Steele, C.-E. Sundberg, and W. Wong, “Transmission of log-PCM via QAM over Gaussian and Rayleigh fading channels,” IEE Proceedings, vol. 134, Pt. F, pp. 539–556, October 1987.

[24] R. Steele, C.-E. Sundberg, and W. Wong, “Transmission errors in companded PCM over Gaussian and Rayleigh fading channels,” AT&T Bell Laboratories Technical Journal, pp. 995–990, July–August 1984.

[25] C.-E. Sundberg, W. Wong, and R. Steele, “Weighting strategies for companded PCM transmitted over Rayleigh fading and Gaussian channels,” AT&T Bell Laboratories Technical Journal, vol. 63, pp. 587–626, April 1984.

[26] W. Wong, R. Steele, and C.-E. Sundberg, “Soft decision demodulation to reduce the effect of transmission errors in logarithmic PCM transmitted over Rayleigh fading channels,” AT&T Bell Laboratories Technical Journal, vol. 63, pp. 2193–2213, December 1984.

[27] J. Hagenauer, “Source-controlled channel decoding,” IEEE Transactions on Communications, vol. 43, pp. 2449–2457, September 1995.


[28] “GSM 06.90: Digital cellular telecommunications system (Phase 2+).” Adaptive Multi-Rate (AMR) speech transcoding, version 7.0.0, Release 1998.

[29] S. Bruhn, E. Ekudden, and K. Hellwig, “Adaptive Multi-Rate: A new speech service for GSM and beyond,” in Proceedings of 3rd ITG Conference on Source and Channel Coding, (Technical Univ. Munich, Germany), pp. 319–324, 17–19 January 2000.

[30] L. Chiariglione, “MPEG: a technological basis for multimedia applications,” IEEE Multimedia, vol. 2, pp. 85–89, Spring 1995.

[31] L. Chiariglione, “The development of an integrated audiovisual coding standard: MPEG,” Proceedings of the IEEE, vol. 83, pp. 151–157, Feb 1995.

[32] L. Chiariglione, “MPEG and multimedia communications,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 7, pp. 5–18, Feb 1997.

[33] “Information technology - Coding of moving pictures and associated audio for digital storage media up to about 1.5 Mbit/s - Part 3: Audio, IS 11172-3.” ISO/IEC JTC1/SC29/WG11 MPEG-1, 1992.

[34] D. Pan, “A Tutorial on MPEG/Audio Compression,” IEEE Multimedia, vol. 2, no. 2, pp. 60–74, Summer 1995.

[35] K. Brandenburg and G. Stoll, “ISO-MPEG-1 Audio: A Generic Standard for Coding of High-Quality Digital Audio,” Journal of Audio Engineering Society, vol. 42, pp. 780–792, Oct 1994.

[36] S. Shlien, “Guide to MPEG-1 Audio Standard,” IEEE Transactions on Broadcasting, vol. 40, pp. 206–218, Dec 1994.

[37] P. Noll, “MPEG Digital Audio Coding,” IEEE Signal Processing Magazine, vol. 14, pp. 59–81, Sept 1997.

[38] K. Brandenburg and M. Bosi, “Overview of MPEG Audio: Current and Future Standards for Low-Bit-Rate Audio Coding,” Journal of Audio Engineering Society, vol. 45, pp. 4–21, Jan/Feb 1997.

[39] “Information technology - Generic coding of moving pictures and associated audio - Part 3: Audio, IS 13818-3.” ISO/IEC JTC1/SC29/WG11 MPEG-2, 1994.

[40] M. Bosi, K. Brandenburg, S. Quackenbush, L. Fielder, K. Akagiri, H. Fuchs, M. Dietz, J. Herre, G. Davidson, and Y. Oikawa, “ISO/IEC MPEG-2 Advanced Audio Coding,” Journal of Audio Engineering Society, vol. 45, pp. 789–814, Oct 1997.

[41] ISO/IEC JTC1/SC29/WG11/N2203, MPEG-4 Audio Version 1 Final Committee Draft 14496-3, http://www.tnt.uni-hannover.de/project/mpeg/audio/documents/, March 1998.

[42] R. Koenen, “MPEG-4 Overview.” http://www.cselt.it/mpeg/standards/mpeg-4/mpeg-4.htm.


[43] S. R. Quackenbush, “Coding of Natural Audio in MPEG-4,” in Proceedings of ICASSP, Seattle, Washington, USA, vol. 6, pp. 3797–3800, 12–15 May 1998.

[44] K. Brandenburg, O. Kunz, and A. Sugiyama, “MPEG-4 Natural Audio Coding,” Signal Processing: Image Communication, vol. 15, no. 4, pp. 423–444, 2000.

[45] L. Contin, B. Edler, D. Meares, and P. Schreiner, “Tests on MPEG-4 audio codec proposals,” Signal Processing: Image Communication, vol. 9, pp. 327–342, May 1997.

[46] ISO/IEC JTC1/SC29/WG11/N2203, MPEG-4 Audio Version 2 Final Committee Draft 14496-3 AMD1, http://www.tnt.uni-hannover.de/project/mpeg/audio/documents/w2803.html, July 1999.

[47] E. D. Scheirer, “The MPEG-4 Structured Audio Standard,” in Proceedings of ICASSP, Seattle, Washington, USA, vol. 6, pp. 3801–3804, 12–15 May 1998.

[48] B. Vercoe, W. Gardner, and E. Scheirer, “Structured Audio: Creation, transmission and rendering of parametric sound representations,” Proceedings of the IEEE, vol. 86, pp. 922–940, May 1998.

[49] B. Edler, “Speech Coding in MPEG-4,” International Journal of Speech Technology, vol. 2, pp. 289–303, May 1999.

[50] S. Alamouti, “A simple transmit diversity technique for wireless communications,” IEEE Journal on Selected Areas in Communications, vol. 16, pp. 1451–1458, Oct 1998.

[51] L. Hanzo, W. T. Webb, and T. Keller, Single and Multicarrier Quadrature Amplitude Modulation: Principles and Applications for Personal Communications, WLANs and Broadcasting (2nd Ed.). John Wiley-IEEE Press, April 2000.

[52] B. Atal, V. Cuperman, and A. Gersho, eds., Advances in Speech Coding. Dordrecht: Kluwer Academic Publishers, January 1991. ISBN: 0792390911.

[53] A. Ince, ed., Digital Speech Processing: Speech Coding, Synthesis and Recognition. Dordrecht: Kluwer Academic Publishers, 1992.

[54] J. Anderson and S. Mohan, Source and Channel Coding — An Algorithmic Approach. Dordrecht: Kluwer Academic Publishers, 1993.

[55] A. Kondoz, Digital Speech: Coding for low bit rate communications systems. New York, USA: John Wiley, 1994.

[56] W. Kleijn and K. Paliwal, eds., Speech Coding and Synthesis. The Netherlands: Elsevier Science, 1995.

[57] C. Shannon, Mathematical Theory of Communication. University of Illinois Press, 1963.

[58] J. Hagenauer, “Quellengesteuerte Kanalcodierung für Sprach- und Tonübertragung im Mobilfunk,” Aachener Kolloquium Signaltheorie, pp. 67–76, 23–25 March 1994. (in German).


[59] A. Viterbi, “Wireless digital communications: A view based on three lessons learned,” IEEE Communications Magazine, pp. 33–36, September 1991.

[60] S. Lloyd, “Least squares quantisation in PCM,” Institute of Mathematical Statistics Meeting, Atlantic City, NJ, USA, September 1957.

[61] S. Lloyd, “Least squares quantisation in PCM,” IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 129–136, 1982.

[62] J. Max, “Quantising for minimum distortion,” IRE Transactions on Information Theory, vol. 6, pp. 7–12, 1960.

[63] W. Bennett, “Spectra of quantised signals,” Bell System Technical Journal, pp. 446–472, July 1946.

[64] H. Holtzwarth, “Pulse code modulation und ihre Verzerrung bei logarithmischer Quantelung,” Archiv der Elektrischen Übertragung, pp. 227–285, January 1949. (in German).

[65] P. Panter and W. Dite, “Quantisation distortion in pulse code modulation with non-uniform spacing of levels,” Proceedings of the IRE, pp. 44–48, January 1951.

[66] B. Smith, “Instantaneous companding of quantised signals,” Bell System Technical Journal, pp. 653–709, 1957.

[67] P. Noll and R. Zelinski, “A contribution to the quantisation of memoryless model sources,” Technical Report, Heinrich Hertz Institute, Berlin, 1974. (in German).

[68] M. Paez and T. Glisson, “Minimum mean squared error quantisation in speech PCM and DPCM systems,” IEEE Transactions on Communications, pp. 225–230, April 1972.

[69] A. Jain, Fundamentals of Digital Image Processing. Englewood Cliffs, NJ, USA: Prentice-Hall, 1989.

[70] R. Salami, Robust Low Bit Rate Analysis-by-Synthesis Predictive Speech Coding. PhD thesis, University of Southampton, UK, 1990.

[71] R. Salami, L. Hanzo, R. Steele, K. Wong, and I. Wassell, “Speech coding,” in Steele and Hanzo [575], ch. 3, pp. 186–346.

[72] S. Haykin, Adaptive Filter Theory. Englewood Cliffs, NJ, USA: Prentice-Hall, 1996.

[73] W. Webb, “Sizing up the microcell for mobile radio communications,” IEE Electronics and Communications Journal, vol. 5, pp. 133–140, June 1993.

[74] K. Wong and L. Hanzo, “Channel coding,” in Steele and Hanzo [575], ch. 4, pp. 347–488.

[75] A. Jennings, Matrix Computation for Engineers and Scientists. New York, USA: John Wiley and Sons Ltd., 1977.

[76] J. Makhoul, “Stable and efficient lattice methods for linear prediction,” IEEE Transactions on Acoustic Speech Signal Processing, vol. 25, pp. 423–428, October 1977.


[77] J. Makhoul, “Linear prediction: A tutorial review,” Proceedings of the IEEE, vol. 63, pp. 561–580, April 1975.

[78] N. Jayant, “Adaptive quantization with a one-word memory,” Bell System Technical Journal, vol. 52, pp. 1119–1144, September 1973.

[79] R. Steedman, “The common air interface MPT 1375,” in Tuttlebee [626]. ISBN 3540196331.

[80] L. Hanzo, “The British cordless telephone system: CT2,” in Gibson [622], ch. 29, pp. 462–477.

[81] H. Ochsner, “The digital European cordless telecommunications specification, DECT,” in Tuttlebee [626], pp. 273–285. ISBN 3540196331.

[82] S. Asghar, “Digital European Cordless Telephone,” in Gibson [622], ch. 30, pp. 478–499.

[83] “Personal handy phone (PHP) system.” RCR Standard, STD-28, Japan.

[84] “CCITT recommendation G.721.”

[85] N. Kitawaki, M. Honda, and K. Itoh, “Speech-quality assessment methods for speech coding systems,” IEEE Communications Magazine, vol. 22, pp. 26–33, October 1984.

[86] A. Gray and J. Markel, “Distance measures for speech processing,” IEEE Transactions on Acoustic Speech Signal Processing, vol. 24, no. 5, pp. 380–391, 1976.

[87] N. Kitawaki, H. Nagabucki, and K. Itoh, “Objective quality evaluation for low-bit-rate speech coding systems,” IEEE Journal on Selected Areas in Communications, vol. 6, pp. 242–249, February 1988.

[88] P. Noll and R. Zelinski, “Bounds on quantizer performance in the low bit-rate region,” IEEE Transactions on Communications, pp. 300–304, February 1978.

[89] T. Thorpe, “The mean squared error criterion: Its effect on the performance of speech coders,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’89 [629], pp. 77–80.

[90] J. O’Neal, “Bounds on subjective performance measures for source encoding systems,” IEEE Transactions on Information Theory, pp. 224–231, May 1971.

[91] J. Makhoul, S. Roucos, and H. Gish, “Vector quantization in speech coding,” Proceedings of the IEEE, pp. 1551–1588, November 1985.

[92] B. Atal and M. Schroeder, “Predictive coding of speech signals and subjective error criteria,” IEEE Transactions on Acoustics, Speech and Signal Processing, pp. 247–254, June 1979.

[93] R. Steele, “Deploying personal communications networks,” IEEE Communications Magazine, pp. 12–15, September 1990.


[94] J.-H. Chen, R. Cox, Y. Lin, N. Jayant, and M. Melchner, “A low-delay CELP coder for the CCITT 16 kb/s speech coding standard,” IEEE Journal on Selected Areas in Communications, vol. 10, pp. 830–849, June 1992.

[95] D. Sen and W. Holmes, “PERCELP - perceptually enhanced random codebook excited linear prediction,” in Proceedings of IEEE Workshop on Speech Coding for Telecommunications, pp. 101–102, 1993.

[96] S. Singhal and B. Atal, “Improving performance of multi-pulse LPC coders at low bit rates,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’84 [628], pp. 1.3.1–1.3.4.

[97] “Group speciale mobile (GSM) recommendation,” April 1988.

[98] L. Hanzo and J. Stefanov, “The Pan-European Digital Cellular Mobile Radio System — known as GSM,” in Steele and Hanzo [575], ch. 8, pp. 677–765.

[99] S. Singhal and B. Atal, “Amplitude optimization and pitch prediction in multipulse coders,” IEEE Transactions on Acoustics, Speech and Signal Processing, pp. 317–327, March 1989.

[100] “Federal standard 1016 – telecommunications: Analog to digital conversion of radio voice by 4,800 bits/second code excited linear prediction (CELP),” February 14 1991.

[101] S. Wang and A. Gersho, “Phonetic segmentation for low rate speech coding,” in Atal et al. [52], pp. 257–266. ISBN: 0792390911.

[102] P. Lupini, H. Hassanein, and V. Cuperman, “A 2.4 kbit/s CELP speech codec with class-dependent structure,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’93) [633], pp. 143–146.

[103] D. Griffin and J. Lim, “Multiband excitation vocoder,” IEEE Transactions on Acoustics, Speech and Signal Processing, pp. 1223–1235, August 1988.

[104] M. Nishiguchi, J. Matsumoto, R. Wakatsuki, and S. Ono, “Vector quantized MBE with simplified v/uv division at 3.0 kbps,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’93) [633], pp. 151–154.

[105] W. Kleijn, “Encoding speech using prototype waveforms,” IEEE Transactions on Speech and Audio Processing, vol. 1, pp. 386–399, October 1993.

[106] V. Ramamoorthy and N. Jayant, “Enhancement of ADPCM speech by adaptive postfiltering,” Bell System Technical Journal, vol. 63, pp. 1465–1475, October 1984.

[107] N. Jayant and V. Ramamoorthy, “Adaptive postfiltering of 16 kb/s-ADPCM speech,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’86, (Tokyo, Japan), pp. 829–832, IEEE, 7–11 April 1986.

[108] J.-H. Chen and A. Gersho, “Real-time vector APC speech coding at 4800 bps with adaptive postfiltering,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’87 [631], pp. 2185–2188.


[109] ITU-T, CCITT Recommendation G.728: Coding of Speech at 16 kbit/s Using Low-Delay Code Excited Linear Prediction, 1992.

[110] J.-H. Chen and A. Gersho, “Adaptive postfiltering for quality enhancement of coded speech,” IEEE Transactions on Speech and Audio Processing, vol. 3, pp. 59–71, January 1995.

[111] F. Itakura and S. Saito, “Analysis-synthesis telephony based upon the maximum likelihood method,” in Proceedings of the 6th International Congress on Acoustics, (Tokyo, Japan), pp. C17–20, 1968.

[112] F. Itakura and S. Saito, “A statistical method for estimation of speech spectral density and formant frequencies,” Electronics and Communications in Japan, vol. 53-A, pp. 36–43, 1970.

[113] N. Kitawaki, K. Itoh, and F. Itakura, “PARCOR speech analysis synthesis system,” Review of the Electronic Communication Lab., Nippon TTPC, vol. 26, pp. 1439–1455, November–December 1978.

[114] R. Viswanathan and J. Makhoul, “Quantization properties of transmission parameters in linear predictive systems,” IEEE Transactions on Acoustic Speech Signal Processing, pp. 309–321, 1975.

[115] N. Sugamura and N. Farvardin, “Quantizer design in LSP analysis-synthesis,” IEEE Journal on Selected Areas in Communications, vol. 6, pp. 432–440, February 1988.

[116] K. Paliwal and B. Atal, “Efficient vector quantization of LPC parameters at 24 bits/frame,” IEEE Transactions on Speech and Audio Processing, vol. 1, pp. 3–14, January 1993.

[117] F. Soong and B.-H. Juang, “Line spectrum pair (LSP) and speech data compression,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’84 [628], pp. 1.10.1–1.10.4.

[118] G. Kang and L. Fransen, “Low-bit rate speech encoders based on line-spectrum frequencies (LSFs),” Tech. Rep. 8857, NRL, November 1984.

[119] P. Kabal and R. Ramachandran, “The computation of line spectral frequencies using Chebyshev polynomials,” IEEE Transactions on Acoustic Speech Signal Processing, vol. 34, pp. 1419–1426, December 1986.

[120] M. Omologo, “The computation and some spectral considerations on line spectrum pairs (LSP),” in Proceedings EUROSPEECH, pp. 352–355, 1989.

[121] B. Cheetham, “Adaptive LSP filter,” Electronics Letters, vol. 23, pp. 89–90, January 1987.

[122] K. Geher, Linear Circuits. Budapest, Hungary: Technical Publishers, 1972. (in Hungarian).


[123] N. Sugamura and F. Itakura, “Speech analysis and synthesis methods developed at ECL in NTT – from LPC to LSP,” Speech Communications, vol. 5, pp. 199–215, June 1986.

[124] A. Lepschy, G. Mian, and U. Viaro, “A note on line spectral frequencies,” IEEE Transactions on Acoustic Speech Signal Processing, vol. 36, pp. 1355–1357, August 1988.

[125] B. Cheetham and P. Hughes, “Formant estimation from LSP coefficients,” in Proceedings IERE 5th International Conference on Digital Processing of Signals in Communications, pp. 183–189, 20–23 September 1988.

[126] A. Gersho and R. Gray, Vector Quantization and Signal Compression. Dordrecht: Kluwer Academic Publishers, 1992.

[127] Y. Shoham, “Vector predictive quantization of the spectral parameters for low rate speech coding,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’87 [631], pp. 2181–2184.

[128] R. Ramachandran, M. Sondhi, N. Seshadri, and B. Atal, “A two codebook format for robust quantisation of line spectral frequencies,” IEEE Transactions on Speech and Audio Processing, vol. 3, pp. 157–168, May 1995.

[129] C. Xydeas and K. So, “Improving the performance of the long history scalar and vector quantisers,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’93) [633], pp. 1–4.

[130] K. Lee, A. Kondoz, and B. Evans, “Speaker adaptive vector quantisation of LPC parameters of speech,” Electronics Letters, vol. 24, pp. 1392–1393, October 1988.

[131] B. Atal, “Stochastic Gaussian model for low-bit rate coding of LPC area parameters,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’87 [631], pp. 2404–2407.

[132] R. Salami, L. Hanzo, and D. Appleby, “A fully vector quantised self-excited vocoder,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’89 [629], pp. 124–128.

[133] M. Yong, G. Davidson, and A. Gersho, “Encoding of LPC spectral parameters using switched-adaptive interframe vector prediction,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’88 [632], pp. 402–405.

[134] J. Huang and P. Schultheis, “Block quantization of correlated Gaussian random variables,” IEEE Transactions on Communication Systems, vol. 11, pp. 289–296, September 1963.

[135] R. Salami, L. Hanzo, and D. Appleby, “A computationally efficient CELP codec with stochastic vector quantization of LPC parameters,” in URSI International Symposium on Signals, Systems and Electronics, (Erlangen, West Germany), pp. 140–143, 18–20 September 1989.


[136] B. Atal, R. Cox, and P. Kroon, “Spectral quantization and interpolation for CELP coders,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’89 [629], pp. 69–72.

[137] R. Laroia, N. Phamdo, and N. Farvardin, “Robust and efficient quantisation of speech LSP parameters using structured vector quantisers,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’91 [630], pp. 641–644.

[138] H. Harborg, J. Knudson, A. Fudseth, and F. Johansen, “A real time wideband CELP coder for a videophone application,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’94) [615], pp. II121–II124.

[139] R. Lefebvre, R. Salami, C. Laflamme, and J. Adoul, “High quality coding of wideband audio signals using transform coded excitation (TCX),” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’94) [615], pp. I193–I196.

[140] J. Paulus and J. Schnitzler, “16 kbit/s wideband speech coding based on unequal sub-bands,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’96) [616], pp. 255–258.

[141] J. Chen and D. Wang, “Transform predictive coding of wideband speech signals,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’96) [616], pp. 275–278.

[142] A. Ubale and A. Gersho, “A multi-band CELP wideband speech coder,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’97) [617], pp. 1367–1370.

[143] P. Combescure, J. Schnitzler, K. Fischer, R. Kirchherr, C. Lamblin, A. L. Guyader, D. Massaloux, C. Quinquis, J. Stegmann, and P. Vary, “A 16, 24, 32 Kbit/s wideband speech codec based on ATCELP,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’99, IEEE, 1999.

[144] F. Itakura, “Line spectrum representation of linear predictive coefficients of speech signals,” Journal of the Acoustical Society of America, vol. 57, p. S35, 1975.

[145] L. Rabiner, M. Sondhi, and S. Levinson, “Note on the properties of a vector quantizer for LPC coefficients,” The Bell System Technical Journal, vol. 62, pp. 2603–2616, October 1983.

[146] “7 kHz audio coding within 64 kbit/s.” CCITT Recommendation G.722, 1988.

[147] “Recommendation G.729: Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear-prediction (CS-ACELP).” CCITT Study Group XVIII, 30 June 1995. Version 6.31.

[148] T. Eriksson, J. Linden, and J. Skoglund, “A safety-net approach for improved exploitation of speech correlation,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’95) [614], pp. 96–101.


[149] T. Eriksson, J. Linden, and J. Skoglund, “Exploiting interframe correlation in spectral quantization — a study of different memory VQ schemes,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’96) [616], pp. 765–768.

[150] H. Zarrinkoub and P. Mermelstein, “Switched prediction and quantization of LSP frequencies,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’96) [616], pp. 757–764.

[151] J. Natvig, “Evaluation of six medium bit-rate coders for the Pan-European digital mobile radio system,” IEEE Journal on Selected Areas in Communications, pp. 324–331, February 1988.

[152] J. Schur, “Über Potenzreihen, die im Innern des Einheitskreises beschränkt sind,” Journal für die reine und angewandte Mathematik, Bd. 147, pp. 205–232, 1917. (in German).

[153] W. Webb, L. Hanzo, R. Salami, and R. Steele, “Does 16-QAM provide an alternative to a half-rate GSM speech codec?,” in Proceedings of IEEE Vehicular Technology Conference (VTC’91), (St. Louis, MO, USA), pp. 511–516, IEEE, 19–22 May 1991.

[154] L. Hanzo, W. Webb, R. Salami, and R. Steele, “On QAM speech transmission schemes for microcellular mobile PCNs,” European Transactions on Communications, pp. 495–510, September/October 1993.

[155] J. Williams, L. Hanzo, R. Steele, and J. Cheung, “A comparative study of microcellular speech transmission schemes,” IEEE Transactions on Vehicular Technology, vol. 43, pp. 909–925, November 1994.

[156] “Cellular system dual-mode mobile station-base station compatibility standard IS-54B.” Telecommunications Industry Association, Washington, DC, 1992. EIA/TIA Interim Standard.

[157] Research and Development Centre for Radio Systems, Japan, Public Digital Cellular (PDC) Standard, RCR STD-27.

[158] R. Steele and L. Hanzo, eds., Mobile Radio Communications. New York, USA: IEEE Press-John Wiley, 2nd ed., 1999.

[159] L. Hanzo, W. Webb, and T. Keller, Single- and Multi-carrier Quadrature Amplitude Modulation. New York, USA: IEEE Press-John Wiley, April 2000.

[160] R. Salami, C. Laflamme, J.-P. Adoul, and D. Massaloux, “A toll quality 8 kb/s speech codec for the personal communications system (PCS),” IEEE Transactions on Vehicular Technology, pp. 808–816, August 1994.

[161] A. Black, A. Kondoz, and B. Evans, “High quality low delay wideband speech coding at 16 kbit/sec,” in Proceedings of 2nd International Workshop on Mobile Multimedia Communications, 11–14 April 1995. Bristol University, UK.


[162] C. Laflamme, J.-P. Adoul, R. Salami, S. Morissette, and P. Mabilleau, “16 Kbps wideband speech coding technique based on algebraic CELP,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’91 [630], pp. 13–16.

[163] R. Salami, C. Laflamme, and J.-P. Adoul, “Real-time implementation of a 9.6 kbit/s ACELP wideband speech coder,” in Proceedings of GLOBECOM ’92, 1992.

[164] I. Gerson and M. Jasiuk, “Vector sum excited linear prediction (VSELP),” in Atal et al. [52], pp. 69–80. ISBN: 0792390911.

[165] M. Ireton and C. Xydeas, “On improving vector excitation coders through the use of spherical lattice codebooks (SLC’s),” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’89 [629], pp. 57–60.

[166] C. Lamblin, J. Adoul, D. Massaloux, and S. Morissette, “Fast CELP coding based on the Barnes-Wall lattice in 16 dimensions,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’89 [629], pp. 61–64.

[167] C. Xydeas, M. Ireton, and D. Baghbadrani, “Theory and real time implementation of a CELP coder at 4.8 and 6.0 kbit/s using ternary code excitation,” in Proceedings of IERE 5th International Conference on Digital Processing of Signals in Communications, pp. 167–174, September 1988.

[168] J. Adoul, P. Mabilleau, M. Delprat, and S. Morissette, “Fast CELP coding based on algebraic codes,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’87 [631], pp. 1957–1960.

[169] L. Hanzo and J. Woodard, “An intelligent multimode voice communications system for indoor communications,” IEEE Transactions on Vehicular Technology, vol. 44, pp. 735–748, November 1995. ISSN 0018-9545.

[170] A. Kataoka, J.-P. Adoul, P. Combescure, and P. Kroon, “ITU-T 8-kbits/s standard speech codec for personal communication services,” in Proceedings of International Conference on Universal Personal Communications 1995, (Tokyo, Japan), pp. 818–822, November 1995.

[171] H. Law and R. Seymour, “A reference distortion system using modulated noise,” IEE Paper, pp. 484–485, November 1962.

[172] P. Kabal, J. Moncet, and C. Chu, “Synthesis filter optimization and coding: Applications to CELP,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’88 [632], pp. 147–150.

[173] Y. Tohkura, F. Itakura, and S. Hashimoto, “Spectral smoothing technique in PARCOR speech analysis-synthesis,” IEEE Transactions on Acoustics, Speech and Signal Processing, pp. 587–596, 1978.

[174] J.-H. Chen and R. Cox, “Convergence and numerical stability of backward-adaptive LPC predictor,” in Proceedings of IEEE Workshop on Speech Coding for Telecommunications, pp. 83–84, 1993.


[175] S. Singhal and B. Atal, “Optimizing LPC filter parameters for multi-pulse excitation,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’83 [634], pp. 781–784.

[176] M. Fratti, G. Miani, and G. Riccardi, “On the effectiveness of parameter reoptimization in multipulse based coders,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’92 [635], pp. 73–76.

[177] W. Press, S. Teukolsky, W. Vetterling, and B. Flannery, Numerical Recipes in C. Cambridge: Cambridge University Press, 1992.

[178] G. Golub and C. V. Loan, “An analysis of the total least squares problem,” SIAM Journal on Numerical Analysis, vol. 17, no. 6, pp. 883–890, 1980.

[179] M. A. Rahman and K.-B. Yu, “Total least squares approach for frequency estimation using linear prediction,” IEEE Transactions on Acoustics, Speech and Signal Processing, pp. 1440–1454, 1987.

[180] R. Degroat and E. Dowling, “The data least squares problem and channel equalization,” IEEE Transactions on Signal Processing, pp. 407–411, 1993.

[181] F. Tzeng, “Near-optimum linear predictive speech coding,” in IEEE Global Telecommunications Conference, pp. 508.1.1–508.1.5, 1990.

[182] M. Niranjan, “CELP coding with adaptive output-error model identification,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’90 [623], pp. 225–228.

[183] J. Woodard and L. Hanzo, “Improvements to the analysis-by-synthesis loop in CELP codecs,” in Proceedings of IEE Conference on Radio Receivers and Associated Systems (RRAS’95) [627], pp. 114–118.

[184] L. Hanzo, R. Salami, R. Steele, and P. Fortune, “Transmission of digitally encoded speech at 1.2 Kbaud for PCN,” IEE Proceedings, Part I, vol. 139, pp. 437–447, August 1992.

[185] R. Cox, W. Kleijn, and P. Kroon, “Robust CELP coders for noisy backgrounds and noisy channels,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’89 [629], pp. 739–742.

[186] J. Campbell, V. Welch, and T. Tremain, “An expandable error-protected 4800 bps CELP coder (U.S. federal standard 4800 bps voice coder),” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’89 [629], pp. 735–738.

[187] S. Atungsiri, A. Kondoz, and B. Evans, “Error control for low-bit-rate speech communication systems,” IEE Proceedings-I, vol. 140, pp. 97–104, April 1993.

[188] L. Ong, A. Kondoz, and B. Evans, “Enhanced channel coding using source criteria in speech coders,” IEE Proceedings-I, vol. 141, pp. 191–196, June 1994.


[189] W. Kleijn, “Source-dependent channel coding and its application to CELP,” in Atal et al. [52], pp. 257–266. ISBN: 0792390911.

[190] J. Woodard and L. Hanzo, “A dual-rate algebraic CELP-based speech transceiver,” in Proceedings of IEEE VTC ’94 [619], pp. 1690–1694.

[191] C. Laflamme, J.-P. Adoul, H. Su, and S. Morissette, “On reducing the complexity of codebook search in CELP through the use of algebraic codes,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’90 [623], pp. 177–180.

[192] S. Nanda, D. Goodman, and U. Timor, “Performance of PRMA: A packet voice protocol for cellular systems,” IEEE Transactions on Vehicular Technology, vol. 40, pp. 584–598, August 1991.

[193] M. Frullone, G. Riva, P. Grazioso, and C. Carciofi, “Investigation on dynamic channel allocation strategies suitable for PRMA schemes,” IEEE International Symposium on Circuits and Systems, pp. 2216–2219, May 1993.

[194] J. Williams, L. Hanzo, and R. Steele, “Channel-adaptive voice communications,” in Proceedings of IEE Conference on Radio Receivers and Associated Systems (RRAS’95) [627], pp. 144–147.

[195] W. Lee, “Estimate of channel capacity in Rayleigh fading environment,” IEEE Transactions on Vehicular Technology, vol. 39, pp. 187–189, August 1990.

[196] T. Tremain, “The government standard linear predictive coding algorithm: LPC-10,” Speech Technology, vol. 1, pp. 40–49, April 1982.

[197] J. Campbell, T. Tremain, and V. Welch, “The DoD 4.8 kbps standard (proposed federal standard 1016),” in Atal et al. [52], pp. 121–133. ISBN: 0792390911.

[198] J. Marques, I. Trancoso, J. Tribolet, and L. Almeida, “Improved pitch prediction with fractional delays in CELP coding,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’90 [623], pp. 665–668.

[199] W. Kleijn, D. Krasinski, and R. Ketchum, “An efficient stochastically excited linear predictive coding algorithm for high quality low bit rate transmission of speech,” Speech Communication, pp. 145–156, October 1988.

[200] Y. Shoham, “Constrained-stochastic excitation coding of speech at 4.8 kb/s,” in Atal et al. [52], pp. 339–348. ISBN: 0792390911.

[201] A. Suen, J. Wand, and T. Yao, “Dynamic partial search scheme for stochastic codebook of FS1016 CELP coder,” IEE Proceedings, vol. 142, no. 1, pp. 52–58, 1995.

[202] I. Gerson and M. Jasiuk, “Vector sum excited linear prediction (VSELP) speech coding at 8 kbps,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’90 [623], pp. 461–464.


[203] I. Gerson and M. Jasiuk, “Techniques for improving the performance of CELP-type speech codecs,” IEEE Journal on Selected Areas in Communications, vol. 10, pp. 858–865, June 1992.

[204] I. Gerson, “Method and means of determining coefficients for linear predictive coding.” US Patent No 544,919, October 1985.

[205] A. Cumani, “On a covariance-lattice algorithm for linear prediction,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’82 [636], pp. 651–654.

[206] W. Gardner, P. Jacobs, and C. Lee, “QCELP: a variable rate speech coder for CDMA digital cellular,” in Speech and Audio Coding for Wireless and Network Applications (B. Atal, V. Cuperman, and A. Gersho, eds.), pp. 85–92, Dordrecht: Kluwer Academic Publishers, 1993.

[207] Telecommunications Industry Association (TIA), Washington, DC, USA, Mobile station — Base station compatibility standard for dual-mode wideband spread spectrum cellular system, EIA/TIA Interim Standard IS-95, 1993.

[208] T. Ohya, H. Suda, and T. Miki, “5.6 kbits/s PSI-CELP of the half-rate PDC speech coding standard,” in Proceedings of Vehicular Technology Conference, vol. 3, (Stockholm), pp. 1680–1684, IEEE, May 1994.

[209] T. Ohya, T. Miki, and H. Suda, “JDC half-rate speech coding standard and a real time operating prototype,” NTT Review, vol. 6, pp. 61–67, November 1994.

[210] K. Mano, T. Moriya, S. Miki, H. Ohmuro, K. Ikeda, and J. Ikedo, “Design of a pitch synchronous innovation CELP coder for mobile communications,” IEEE Journal on Selected Areas in Communications, vol. 13, no. 1, pp. 31–41, 1995.

[211] I. Gerson, M. Jasiuk, J.-M. Muller, J. Nowack, and E. Winter, “Speech and channel coding for the half-rate GSM channel,” Proceedings ITG-Fachbericht, vol. 130, pp. 225–233, November 1994.

[212] A. Kataoka, T. Moriya, and S. Hayashi, “Implementation and performance of an 8-kbits/s conjugate structured CELP speech codec,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’94) [615], pp. 93–96.

[213] R. Salami, C. Laflamme, and J.-P. Adoul, “8 kbits/s ACELP coding of speech with 10 ms speech frame: A candidate for CCITT standardization,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’94) [615], pp. 97–100.

[214] J. Woodard, T. Keller, and L. Hanzo, “Turbo-coded orthogonal frequency division multiplex transmission of 8 kbps encoded speech,” in Proceeding of ACTS Mobile Communication Summit ’97 [621], pp. 894–899.

[215] T. Ojanperä et al., “FRAMES multiple access technology,” in Proceedings of IEEE ISSSTA’96, vol. 1, (Mainz, Germany), pp. 334–338, IEEE, September 1996.


[216] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limit error-correcting coding and decoding: Turbo codes,” in Proceedings of the International Conference on Communications, (Geneva, Switzerland), pp. 1064–1070, May 1993.

[217] C. Berrou and A. Glavieux, “Near optimum error correcting coding and decoding: Turbo codes,” IEEE Transactions on Communications, vol. 44, pp. 1261–1271, October 1996.

[218] J. Hagenauer, E. Offer, and L. Papke, “Iterative decoding of binary block and convolutional codes,” IEEE Transactions on Information Theory, vol. 42, pp. 429–445, March 1996.

[219] P. Jung and M. Nasshan, “Performance evaluation of turbo codes for short frame transmission systems,” IEE Electronics Letters, pp. 111–112, January 1994.

[220] A. Barbulescu and S. Pietrobon, “Interleaver design for turbo codes,” IEE Electronics Letters, pp. 2107–2108, December 1994.

[221] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear codes for minimising symbol error rate,” IEEE Transactions on Information Theory, vol. 20, pp. 284–287, March 1974.

[222] “COST 207: Digital land mobile radio communications, final report.” Office for Official Publications of the European Communities, 1989. Luxembourg.

[223] R. Salami, C. Laflamme, B. Bessette, and J.-P. Adoul, “Description of ITU-T recommendation G.729 annex A: Reduced complexity 8 kbits/s CS-ACELP codec,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’97) [617], pp. 775–778.

[224] R. Salami, C. Laflamme, B. Bessette, and J.-P. Adoul, “ITU-T recommendation G.729 annex A: Reduced complexity 8 kbits/s CS-ACELP codec for digital simultaneous voice and data (DSVD),” IEEE Communications Magazine, vol. 35, pp. 56–63, September 1997.

[225] R. Salami, C. Laflamme, B. Bessette, J.-P. Adoul, K. Jarvinen, J. Vainio, P. Kapanen, T. Honkanen, and P. Haavisto, “Description of the GSM enhanced full rate speech codec,” in Proceedings of ICC’97, 1997.

[226] “PCS1900 enhanced full rate codec US1.” SP-3612.

[227] “IS-136.1A TDMA cellular/PCS — radio interface — mobile station — base station compatibility digital control channel,” August 1996. Revision A.

[228] T. Honkanen, J. Vainio, K. Jarvinen, P. Haavisto, R. Salami, C. Laflamme, and J. Adoul, “Enhanced full rate speech codec for IS-136 digital cellular system,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’97) [617], pp. 731–734.

[229] “TIA/EIA/IS641, interim standard, TDMA cellular/PCS radio interface — enhanced full-rate speech codec,” May 1996.


[230] “Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s.” CCITT Recommendation G.723.1, March 1996.

[231] S. Bruhn, E. Ekudden, and K. Hellwig, “Adaptive Multi-Rate: A new speech service for GSM and beyond,” in Proceedings of 3rd ITG Conference on Source and Channel Coding, (Technical Univ. Munich, Germany), pp. 319–324, 17th-19th, January 2000.

[232] A. Das, E. Paksoy, and A. Gersho, “Multimode and Variable Rate Coding of Speech,” in Speech Coding and Synthesis (W. Kleijn and K. Paliwal, eds.), ch. 7, pp. 257–288, Elsevier, 1995.

[233] T. Taniguchi, S. Unagami, and R. Gray, “Multimode coding: a novel approach to narrow- and medium-band coding,” Journal of the Acoustical Society of America, vol. 84, p. S12, 1988.

[234] P. Kroon and B. Atal, “Strategies for improving CELP coders,” in Proceedings of ICASSP, vol. 1, pp. 151–154, 1988.

[235] A. DeJaco, W. Gardner, P. Jacobs, and C. Lee, “QCELP: the North American CDMA digital cellular variable rate speech coding standard,” in Proceedings of IEEE Workshop on Speech Coding for Telecommunications, pp. 5–6, 1993.

[236] E. Paksoy, K. Srinivasan, and A. Gersho, “Variable bit rate CELP coding of speech with phonetic classification,” European Transactions on Telecommunications, vol. 5, pp. 591–602, October 1994.

[237] L. Cellario and D. Sereno, “CELP coding at variable rate,” European Transactions on Telecommunications, vol. 5, pp. 603–613, October 1994.

[238] E. Yuen, P. Ho, and V. Cuperman, “Variable Rate Speech and Channel Coding for Mobile Communications,” in Proceedings of VTC, Stockholm, Sweden, pp. 1709–1712, 8-10 June 1994.

[239] T. Kawashima, V. Sharma, and A. Gersho, “Capacity enhancement of cellular CDMA by traffic-based control of speech bit rate,” IEEE Transactions on Vehicular Technology, vol. 45, pp. 543–550, August 1996.

[240] W. P. LeBlanc and S. A. Mahmoud, “Low Complexity, Low Delay Speech Coding for Indoor Wireless Communications,” in Proceedings of VTC, Stockholm, Sweden, pp. 1695–1698, 8-10 June 1994.

[241] J. E. Kleider and W. M. Campbell, “An Adaptive Rate Digital Communication System For Speech,” in Proceedings of ICASSP, Munich, Germany, vol. 3, pp. 1695–1698, 21-24th, April 1997.

[242] R. J. McAulay and T. F. Quatieri, “Low-rate Speech Coding Based on the Sinusoidal Model,” in Advances in Speech Signal Processing (S. Furui and M. Sondhi, eds.), ch. 6, Marcel Dekker, New York, 1992.

[243] J. E. Kleider and R. J. Pattison, “Multi-Rate Speech Coding For Wireless and Internet Applications,” in Proceedings of ICASSP, pp. 2379–2382, 15-19 March 1999.


[244] A. Das, A. DeJaco, S. Manjunath, A. Ananthapadmanabhan, J. Juang, and E. Choy, “Multimode Variable Bit Rate Speech Coding: An Efficient Paradigm For High-Quality Low-Rate Representation Of Speech Signal,” in Proceedings of ICASSP, 1999.

[245] “TIA/EIA/IS-127.” Enhanced Variable Rate Codec, Speech Service Option 3 for Wideband Spread Spectrum Digital Systems.

[246] F. Beritelli, A. Lombardo, S. Palazzo, and G. Schembra, “Performance Analysis of an ATM Multiplexer Loaded with VBR Traffic Generated by Multimode Speech Coders,” IEEE Journal On Selected Areas In Communications, vol. 17, pp. 63–81, January 1999.

[247] “Description of a 12 kbps G.729 higher rate extension.” ITU-T, Q19/16, September 1997.

[248] L. Hanzo, W. Webb, and T. Keller, Single- and Multi-carrier Quadrature Amplitude Modulation. New York: John Wiley-IEEE Press, April 2000.

[249] S. Sampei, S. Komaki, and N. Morinaga, “Adaptive modulation/TDMA scheme for large capacity personal multi-media communication systems,” IEICE Transactions on Communications (Japan), vol. E77-B, pp. 1096–1103, September 1994.

[250] C. Wong and L. Hanzo, “Upper-bound performance of a wideband burst-by-burst adaptive modem,” IEEE Transactions on Communications, vol. 48, pp. 367–369, March 2000.

[251] M. Yee, T. Liew, and L. Hanzo, “Radial basis function decision feedback equalisation assisted block turbo burst-by-burst adaptive modems,” in Proceeding of VTC’99 (Fall) [611], pp. 1600–1604.

[252] E. Kuan and L. Hanzo, “Comparative study of adaptive-rate CDMA transmission employing joint-detection and interference cancellation receivers,” accepted for the IEEE Vehicular Technology Conference, Tokyo, Japan, 2000.

[253] E. L. Kuan, C. H. Wong, and L. Hanzo, “Burst-by-burst adaptive joint-detection CDMA,” in Proc. of IEEE VTC’99, (Houston, USA), pp. 1628–1632, May 1999.

[254] T. Keller and L. Hanzo, “Sub-band adaptive pre-equalised OFDM transmission,” in Proceeding of VTC’99 (Fall) [611], pp. 334–338.

[255] M. Munster, T. Keller, and L. Hanzo, “Co-channel interference suppression assisted adaptive OFDM in interference limited environments,” in Proceeding of VTC’99 (Fall) [611], pp. 284–288.

[256] T. H. Liew, L. L. Yang, and L. Hanzo, “Soft-decision Redundant Residue Number System Based Error Correction Coding,” in Proc. of IEEE VTC’99, (Amsterdam, The Netherlands), pp. 2546–2550, September 1999.

[257] S. Bruhn, P. Blocher, K. Hellwig, and J. Sjoberg, “Concepts and Solutions for Link Adaptation and Inband Signalling for the GSM AMR Speech Coding Standard,” in Proceedings of VTC, Houston, Texas, USA, vol. 3, pp. 2451–2455, 16-19 May 1999.

[258] “GSM 05.09: Digital cellular telecommunications system (Phase 2+).” Link Adaptation, version 7.0.0, Release 1998.


[259] N. Szabo and R. Tanaka, Residue Arithmetic and Its Applications to Computer Technology. New York, USA: McGraw-Hill, 1967.

[260] F. Taylor, “Residue arithmetic: A tutorial with examples,” IEEE Computer Magazine, vol. 17, pp. 50–62, May 1984.

[261] H. Krishna, K.-Y. Lin, and J.-D. Sun, “A coding theory approach to error control in redundant residue number systems - part I: Theory and single error correction,” IEEE Transactions on Circuits and Systems-II: Analog and Digital Signal Processing, vol. 39, pp. 8–17, January 1992.

[262] J.-D. Sun and H. Krishna, “A coding theory approach to error control in redundant residue number systems - part II: Multiple error detection and correction,” IEEE Transactions on Circuits and Systems-II: Analog and Digital Signal Processing, vol. 39, pp. 18–34, January 1992.

[263] L.-L. Yang and L. Hanzo, “Performance of residue number system based DS-CDMA over multipath fading channels using orthogonal sequences,” ETT, vol. 9, pp. 525–536, November–December 1998.

[264] A. Klein, G. Kaleh, and P. Baier, “Zero forcing and minimum mean square error equalization for multiuser detection in code division multiple access channels,” IEEE Transactions on Vehicular Technology, vol. 45, pp. 276–287, May 1996.

[265] W. Webb and R. Steele, “Variable rate QAM for mobile radio,” IEEE Transactions on Communications, vol. 43, pp. 2223–2230, July 1995.

[266] A. Goldsmith and S. Chua, “Variable-rate variable-power MQAM for fading channels,” IEEE Transactions on Communications, vol. 45, pp. 1218–1230, October 1997.

[267] J. Torrance and L. Hanzo, “Upper bound performance of adaptive modulation in a slow Rayleigh fading channel,” Electronics Letters, vol. 32, pp. 718–719, 11 April 1996.

[268] C. Wong and L. Hanzo, “Upper-bound of a wideband burst-by-burst adaptive modem,” in Proceeding of VTC’99 (Spring) [613], pp. 1851–1855.

[269] M. Failli, “Digital land mobile radio communications COST 207,” tech. rep., European Commission, 1989.

[270] C. Hong, Low Delay Switched Hybrid Vector Excited Linear Predictive Coding of Speech. PhD thesis, National University of Singapore, 1994.

[271] J. Zhang and H.-S. Wang, “A low delay speech coding system at 4.8 kb/s,” in Proceedings of the IEEE International Conference on Communications Systems, vol. 3, pp. 880–883, November 1994.

[272] J.-H. Chen, N. Jayant, and R. Cox, “Improving the performance of the 16 kb/s LD-CELP speech coder,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’92 [635].


[273] J.-H. Chen and A. Gersho, “Gain-adaptive vector quantization with application to speech coding,” IEEE Transactions on Communications, vol. 35, pp. 918–930, September 1987.

[274] J.-H. Chen and A. Gersho, “Gain-adaptive vector quantization for medium rate speech coding,” in Proceedings of IEEE International Conference on Communications 1985, (Chicago, IL, USA), pp. 1456–1460, IEEE, 23–26 June 1985.

[275] J.-H. Chen, Y.-C. Lin, and R. Cox, “A fixed-point 16 kb/s LD-CELP algorithm,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’91 [630], pp. 21–24.

[276] J.-H. Chen, “High-quality 16 kb/s speech coding with a one-way delay less than 2 ms,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’90 [623], pp. 453–456.

[277] J. D. Marca and N. Jayant, “An algorithm for assigning binary indices to the codevectors of a multi-dimensional quantizer,” in Proceedings of IEEE International Conference on Communications 1987, (Seattle, WA, USA), pp. 1128–1132, IEEE, 7–10 June 1987.

[278] K. Zeger and A. Gersho, “Zero-redundancy channel coding in vector quantization,” Electronics Letters, vol. 23, pp. 654–656, June 1987.

[279] J. Woodard and L. Hanzo, “A low delay multimode speech terminal,” in Proceedings of IEEE VTC’96, vol. 1, (Atlanta, GA, USA), pp. 213–217, IEEE, 28 April–1 May 1996.

[280] Y. Linde, A. Buzo, and R. Gray, “An algorithm for vector quantiser design,” IEEE Transactions on Communications, vol. COM-28, January 1980.

[281] W. Kleijn, D. Krasinski, and R. Ketchum, “Fast methods for the CELP speech coding algorithm,” IEEE Transactions on Acoustics, Speech and Signal Processing, pp. 1330–1342, August 1990.

[282] S. D’Agnoli, J. D. Marca, and A. Alcaim, “On the use of simulated annealing for error protection of CELP coders employing LSF vector quantizers,” in Proceedings of IEEE VTC ’94 [619], pp. 1699–1703.

[283] X. Maitre, “7 kHz audio coding within 64 kbit/s,” IEEE Journal on Selected Areas in Communications, vol. 6, pp. 283–298, February 1988.

[284] R. Crochiere, S. Webber, and J. Flanagan, “Digital coding of speech in sub-bands,” Bell System Technical Journal, pp. 1069–1085, October 1976.

[285] R. Crochiere, “An analysis of 16 Kbit/s sub-band coder performance: dynamic range, tandem connections and channel errors,” Bell System Technical Journal, vol. 57, pp. 2927–2952, October 1978.

[286] D. Esteban and C. Galand, “Application of quadrature mirror filters to split band voice coding scheme,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’77 [618], pp. 191–195.


[287] J. Johnston, “A filter family designed for use in quadrature mirror filter banks,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’80 [625], pp. 291–294.

[288] H. Nussbaumer, “Complex quadrature mirror filters,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’83 [634], pp. 221–223.

[289] C. Galand and H. Nussbaumer, “New quadrature mirror filter structures,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. ASSP-32, pp. 522–531, June 1984.

[290] S. Quackenbush, “A 7 kHz bandwidth, 32 kbps speech coder for ISDN,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’91 [630], pp. 1–4.

[291] J. Johnston, “Transform coding of audio signals using perceptual noise criteria,” IEEE Journal on Selected Areas in Communications, vol. 6, no. 2, pp. 314–323, 1988.

[292] E. Ordentlich and Y. Shoham, “Low-delay code-excited linear-predictive coding of wideband speech at 32 kbps,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’91 [630], pp. 9–12.

[293] R. Soheili, A. Kondoz, and B. Evans, “New innovations in multi-pulse speech coding for bit rates below 8 kb/s,” in Proc. of Eurospeech, pp. 298–301, 1989.

[294] V. Sanchez-Calle, C. Laflamme, R. Salami, and J.-P. Adoul, “Low-delay algebraic CELP coding of wideband speech,” in Signal Processing VI: Theories and Applications (J. Vandewalle, R. Boite, M. Moonen, and A. Oosterlink, eds.), pp. 495–498, Netherlands: Elsevier Science Publishers, 1992.

[295] G. Roy and P. Kabal, “Wideband CELP speech coding at 16 kbit/sec,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’91 [630], pp. 17–20.

[296] R. Steele and W. Webb, “Variable rate QAM for data transmission over Rayleigh fading channels,” in Proceedings of Wireless ’91, (Calgary, Alberta), pp. 1–14, IEEE, 1991.

[297] Y. Kamio, S. Sampei, H. Sasaoka, and N. Morinaga, “Performance of modulation-level-control adaptive-modulation under limited transmission delay time for land mobile communications,” in Proceedings of IEEE Vehicular Technology Conference (VTC’95), (Chicago, USA), pp. 221–225, IEEE, 15–28 July 1995.

[298] K. Arimochi, S. Sampei, and N. Morinaga, “Adaptive modulation system with discrete power control and predistortion-type non-linear compensation for high spectral efficient and high power efficient wireless communication systems,” in Proceedings of IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, PIMRC’97 [624], pp. 472–477.


[299] M. Naijoh, S. Sampei, N. Morinaga, and Y. Kamio, “ARQ schemes with adaptive modulation/TDMA/TDD systems for wireless multimedia communication systems,” in Proceedings of IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, PIMRC’97 [624], pp. 709–713.

[300] A. Goldsmith, “The capacity of downlink fading channels with variable rate and power,” IEEE Transactions on Vehicular Technology, vol. 46, pp. 569–580, August 1997.

[301] M.-S. Alouini and A. Goldsmith, “Area spectral efficiency of cellular mobile radio systems,” to appear in IEEE Transactions on Vehicular Technology, 1999. http://www.systems.caltech.edu.

[302] A. Goldsmith and P. Varaiya, “Capacity of fading channels with channel side information,” IEEE Transactions on Information Theory, vol. 43, pp. 1986–1992, November 1997.

[303] T. Liew, C. Wong, and L. Hanzo, “Block turbo coded burst-by-burst adaptive modems,” in Proceedings of Microcoll’99, Budapest, Hungary, pp. 59–62, 21–24 March 1999.

[304] C. Wong, T. Liew, and L. Hanzo, “Blind-detection assisted, block turbo coded, decision-feedback equalised burst-by-burst adaptive modulation.” Submitted to IEEE JSAC, 1999.

[305] H. Matsuoka, S. Sampei, N. Morinaga, and Y. Kamio, “Adaptive modulation system with variable coding rate concatenated code for high quality multi-media communications systems,” in Proceedings of IEEE VTC’96, vol. 1, (Atlanta, GA, USA), pp. 487–491, IEEE, 28 April–1 May 1996.

[306] V. Lau and M. Macleod, “Variable rate adaptive trellis coded QAM for high bandwidth efficiency applications in Rayleigh fading channels,” in Proceedings of IEEE Vehicular Technology Conference (VTC’98) [620], pp. 348–352.

[307] A. Goldsmith and S. Chua, “Adaptive coded modulation for fading channels,” IEEE Transactions on Communications, vol. 46, pp. 595–602, May 1998.

[308] T. Keller and L. Hanzo, “Adaptive orthogonal frequency division multiplexing schemes,” in Proceeding of ACTS Mobile Communication Summit ’98 [612], pp. 794–799.

[309] E. Kuan, C. Wong, and L. Hanzo, “Burst-by-burst adaptive joint detection CDMA,” in Proceeding of VTC’99 (Spring) [613].

[310] R. Chang, “Synthesis of band-limited orthogonal signals for multichannel data transmission,” Bell System Technical Journal, vol. 46, pp. 1775–1796, December 1966.

[311] L. Cimini, “Analysis and simulation of a digital mobile channel using orthogonal frequency division multiplexing,” IEEE Transactions on Communications, vol. 33, pp. 665–675, July 1985.


[312] K. Fazel and G. Fettweis, eds., Multi-Carrier Spread-Spectrum. Dordrecht: Kluwer, 1997. ISBN 0-7923-9973-0.

[313] T. May and H. Rohling, “Reduktion von Nachbarkanalstörungen in OFDM–Funkübertragungssystemen,” in 2. OFDM–Fachgespräch in Braunschweig, 1997.

[314] S. Müller and J. Huber, “Vergleich von OFDM–Verfahren mit reduzierter Spitzenleistung,” in 2. OFDM–Fachgespräch in Braunschweig, 1997.

[315] F. Classen and H. Meyr, “Synchronisation algorithms for an OFDM system for mobile communications,” in Codierung für Quelle, Kanal und Übertragung, no. 130 in ITG Fachbericht, (Berlin), pp. 105–113, VDE–Verlag, 1994.

[316] F. Classen and H. Meyr, “Frequency synchronisation algorithms for OFDM systems suitable for communication over frequency selective fading channels,” in Proceedings of IEEE VTC ’94 [619], pp. 1655–1659.

[317] S. Shepherd, P. van Eetvelt, C. Wyatt-Millington, and S. Barton, “Simple coding scheme to reduce peak factor in QPSK multicarrier modulation,” Electronics Letters, vol. 31, pp. 1131–1132, July 1995.

[318] A. Jones, T. Wilkinson, and S. Barton, “Block coding scheme for reduction of peak to mean envelope power ratio of multicarrier transmission schemes,” Electronics Letters, vol. 30, pp. 2098–2099, 1994.

[319] M. D. Benedetto and P. Mandarini, “An application of MMSE predistortion to OFDM systems,” IEEE Transactions on Communications, vol. 44, pp. 1417–1420, November 1996.

[320] P. Chow, J. Cioffi, and J. Bingham, “A practical discrete multitone transceiver loading algorithm for data transmission over spectrally shaped channels,” IEEE Transactions on Communications, vol. 48, pp. 772–775, 1995.

[321] K. Fazel, S. Kaiser, P. Robertson, and M. Ruf, “A concept of digital terrestrial television broadcasting,” Wireless Personal Communications, vol. 2, pp. 9–27, 1995.

[322] H. Sari, G. Karam, and I. Jeanclaude, “Transmission techniques for digital terrestrial TV broadcasting,” IEEE Communications Magazine, pp. 100–109, February 1995.

[323] J. Borowski, S. Zeisberg, J. Hubner, K. Koora, E. Bogenfeld, and B. Kull, “Performance of OFDM and comparable single carrier system in MEDIAN demonstrator 60 GHz channel,” in Proceeding of ACTS Mobile Communication Summit ’97 [621], pp. 653–658.

[324] I. Kalet, “The multitone channel,” IEEE Transactions on Communications, vol. 37, pp. 119–124, February 1989.

[325] Y. Li and N. Sollenberger, “Interference suppression in OFDM systems using adaptive antenna arrays,” in Proceeding of Globecom’98, (Sydney, Australia), pp. 213–218, IEEE, 8–12 November 1998.


[326] F. Vook and K. Baum, “Adaptive antennas for OFDM,” in Proceedings of IEEE Vehicular Technology Conference (VTC’98) [620], pp. 608–610.

[327] T. Keller, J. Woodard, and L. Hanzo, “Turbo-coded parallel modem techniques for personal communications,” in Proceedings of IEEE VTC’97, (Phoenix, AZ, USA), pp. 2158–2162, IEEE, 4–7 May 1997.

[328] T. Keller and L. Hanzo, “Blind-detection assisted sub-band adaptive turbo-coded OFDM schemes,” in Proceeding of VTC’99 (Spring) [613], pp. 489–493.

[329] “Universal mobile telecommunications system (UMTS); UMTS terrestrial radio access (UTRA); concept evaluation,” tech. rep., ETSI, 1997. TR 101 146.

[330] tech. rep. http://standards.pictel.com/ptelcont.htm#Audio or ftp://standard.pictel.com/sg16q20/199909 Geneva/.

[331] M. Failli, “Digital land mobile radio communications COST 207,” tech. rep., European Commission, 1989.

[332] J. Proakis, Digital Communications. New York, USA: McGraw-Hill, 3rd ed., 1995.

[333] H. Malvar, Signal Processing with Lapped Transforms. London, UK: Artech House, 1992.

[334] K. Rao and P. Yip, Discrete Cosine Transform: Algorithms, Advantages and Applications. New York, USA: Academic Press Ltd., 1990.

[335] W. B. Kleijn and K. K. Paliwal, Speech Coding and Synthesis. Elsevier, 1995.

[336] P. Combescure, J. Schnitzler, K. Fischer, R. Kirchherr, C. Lamblin, A. L. Guyader, D. Massaloux, C. Quinquis, J. Stegmann, and P. Vary, “A 16, 24, 32 Kbit/s Wideband Speech Codec Based on ATCELP,” in Proceedings of ICASSP, Phoenix, Arizona, vol. 1, pp. 5–8, 14-18th, March 1999.

[337] J. Schnitzler, C. Erdmann, P. Vary, K. Fischer, J. Stegmann, C. Quinquis, D. Massaloux, and C. Lamblin, “Wideband Speech Coding For the GSM Adaptive Multi-Rate System,” in Proceedings of 3rd ITG Conference on Source and Channel Coding, (Technical Univ. Munich, Germany), pp. 325–330, 17th-19th, January 2000.

[338] A. Murashima, M. Serizawa, and K. Ozawa, “A Multi-Rate Wideband Speech Codec Robust to Background Noise,” in Proceedings of ICASSP, Istanbul, Turkey, vol. 2, pp. 1165–1168, 5-9th, June 2000.

[339] R. Steele and L. Hanzo (Editors), Mobile Radio Communications. IEEE Press-John Wiley, 2nd Edition, 1999.

[340] R. V. Cox, J. Hagenauer, N. Seshadri, and C.-E. Sundberg, “Subband speech coding and matched convolutional channel coding for mobile radio channels,” IEEE Transactions on Signal Processing, vol. 39, pp. 1717–1731, August 1991.


[341] J. Hagenauer, "Rate-compatible punctured convolutional codes (RCPC codes) and their applications," IEEE Transactions on Communications, vol. 36, pp. 389–400, April 1988.

[342] N. S. Othman, S. X. Ng, and L. Hanzo, "Turbo-detected unequal protection audio and speech transceivers using serially concatenated convolutional codes, trellis coded modulation and space-time trellis coding," in Proceedings of the IEEE Vehicular Technology Conference (to appear), (Dallas, USA), 25-28 September 2005.

[343] S. X. Ng, J. Y. Chung, and L. Hanzo, "Turbo-detected unequal protection MPEG-4 telephony using trellis coded modulation and space-time trellis coding," in Proceedings of IEE International Conference on 3G Mobile Communication Technologies (3G 2004), (London, UK), pp. 416–420, 18-20 October 2004.

[344] M. Tuchler and J. Hagenauer, "EXIT charts of irregular codes," in Proceedings of Conference on Information Science and Systems [CDROM], (Princeton University), 20-22 March 2002.

[345] M. Tuchler, "Design of serially concatenated systems depending on the block length," IEEE Transactions on Communications, vol. 52, pp. 209–218, February 2004.

[346] S. ten Brink, "Convergence behavior of iteratively decoded parallel concatenated codes," IEEE Transactions on Communications, vol. 49, pp. 1727–1737, October 2001.

[347] B. Bessette, R. Salami, R. Lefebvre, M. Jelinek, J. Rotola-Pukkila, J. Vainio, H. Mikkola, and K. Jarvinen, "The adaptive multirate wideband speech codec," IEEE Transactions on Speech and Audio Processing, vol. 10, pp. 620–636, November 2002.

[348] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Transactions on Information Theory, vol. 20, pp. 284–287, March 1974.

[349] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, "Serial concatenation of interleaved codes: Performance analysis, design, and iterative decoding," IEEE Transactions on Information Theory, vol. 44, pp. 909–926, May 1998.

[350] A. Ashikhmin, G. Kramer, and S. ten Brink, "Extrinsic information transfer functions: model and erasure channel properties," IEEE Transactions on Information Theory, vol. 50, pp. 2657–2673, November 2004.

[351] I. Land, P. Hoeher, and S. Gligorevic, "Computation of symbol-wise mutual information in transmission systems with LogAPP decoders and application to EXIT charts," in Proceedings of International ITG Conference on Source and Channel Coding (SCC), (Erlangen, Germany), pp. 195–202, January 2004.

[352] S. Dolinar and D. Divsalar, "Weight distributions for turbo codes using random and nonrandom permutations," JPL-TDA Progress Report 42-122, pp. 56–65, August 1995.


[353] A. Lillie, A. Nix, and J. McGeehan, "Performance and design of a reduced complexity iterative equalizer for precoded ISI channels," in Proceedings of IEEE Vehicular Technology Conference (VTC 2003-Fall), (Orlando, USA), 6-9 October 2003.

[354] "Information technology - Coding of audio-visual objects - Part 3: Audio," ISO/IEC 14496-3:2001.

[355] "Specification for the use of video and audio coding in DVB services delivered directly over IP," DVB Document A-84 Rev.1, November 2005.

[356] "IP Datacast over DVB-H: Architecture," DVB Document A098, November 2005.

[357] "Extended AMR Wideband codec; Transcoding functions," 3GPP TS 26.290.

[358] S. Ragot, B. Bessette, and R. Lefebvre, "Low-complexity multi-rate lattice vector quantization with application to wideband speech coding at 32 kb/s," Proceedings of ICASSP-2004, May 2004.

[359] B. Bessette, R. Lefebvre, and R. Salami, "Universal speech/audio coding using hybrid ACELP/TCX techniques," Proceedings of ICASSP-2005, March 2005.

[360] J. S. et al., "RTP payload format for the extended adaptive multi-rate wideband (AMR-WB+) audio codec," IETF RFC 4352, January 2006.

[361] "Method for the subjective assessment of intermediate quality level of coding systems," Recommendation ITU-R BS.1534.

[362] "ETSI GSM 06.10." Full Rate (FR) speech transcoding (RPE-LTP: Regular Pulse Excitation - Long Term Prediction).

[363] "ETSI GSM 06.20." Half Rate (HR) speech transcoding (VSELP: Vector Sum Excited Linear Prediction).

[364] K. Jarvinen, J. Vainio, P. Kapanen, T. Honkanen, and P. Haavisto, "GSM Enhanced Full Rate Speech Codec," in Proceedings of ICASSP, Munich, Germany, vol. 2, pp. 771–774, 21-24th, April 1997.

[365] M. R. Schroeder and B. S. Atal, "Code Excited Linear Prediction (CELP): High Quality Speech at Very Low Bit Rates," in Proceedings of ICASSP, (Tampa, Florida), pp. 937–940, March 1985.

[366] R. A. Salami, C. Laflamme, J. P. Adoul, and D. Massaloux, "A Toll Quality 8 kbit/s Speech Codec for the Personal Communications System (PCS)," IEEE Transactions on Vehicular Technology, vol. 43, pp. 808–816, Aug 1994.

[367] R. A. Salami and L. Hanzo, "Speech Coding," in Mobile Radio Communications (R. Steele and L. Hanzo, eds.), ch. 3, pp. 187–335, IEEE Press-John Wiley, 1999.

[368] L. Hanzo, F. C. A. Somerville, and J. Woodard, Modern Speech Communications: Principles and Applications for Fixed and Wireless Channels. John Wiley, 2001.


[369] F. Itakura, "Line Spectrum Representation of Linear Predictor Coefficients of Speech Signals," Journal of the Acoustical Society of America, vol. 57, no. S35(A), 1975.

[370] K. K. Paliwal and B. S. Atal, "Efficient Vector Quantization of LPC Parameters at 24 bits/frame," IEEE Transactions on Speech and Audio Processing, vol. 1, pp. 3–14, Jan 1993.

[371] L. Hanzo and J. P. Woodard, "An Intelligent Multimode Voice Communications System For Indoors Communications," IEEE Transactions on Vehicular Technology, vol. 44, pp. 735–749, Nov 1995.

[372] "Coding of Speech at 8 kbit/s using Conjugate-Structure Algebraic Code-Excited Linear Prediction (CS-ACELP)." ITU Recommendation G.729, 1995.

[373] J. H. Chen and A. Gersho, "Adaptive Postfiltering for Quality Enhancement of Coded Speech," IEEE Transactions on Speech and Audio Processing, vol. 3, pp. 59–71, Jan 1995.

[374] E. L. Kuan, C. H. Wong, and L. Hanzo, "Burst-by-burst adaptive joint-detection CDMA," in Proc. of IEEE VTC'99, vol. 2, (Houston, USA), pp. 1628–1632, May 1999.

[375] A. Das, E. Paksoy, and A. Gersho, "Multimode and Variable Rate Coding of Speech," in Speech Coding and Synthesis (W. Kleijn and K. Paliwal, eds.), ch. 7, pp. 257–288, Elsevier, 1995.

[376] T. Taniguchi, S. Unagami, and R. Gray, "Multimode coding: a novel approach to narrow- and medium-band coding," Journal of the Acoustical Society of America, vol. 84, p. S12, 1988.

[377] P. Kroon and B. Atal, "Strategies for improving CELP coders," in Proceedings of ICASSP, New York, NY, USA, vol. 1, pp. 151–154, 11-14th, April 1988.

[378] M. Yong and A. Gersho, "Vector excitation coding with dynamic bit allocation," in Proceedings of IEEE Globecom, Hollywood, FL, USA, pp. 290–294, Nov 28-Dec 1, 1988.

[379] A. DeJaco, W. Gardner, P. Jacobs, and C. Lee, "QCELP: the North American CDMA digital cellular variable rate speech coding standard," in Proceedings of IEEE Workshop on Speech Coding for Telecommunications, pp. 5–6, 1993.

[380] E. Paksoy, K. Srinivasan, and A. Gersho, "Variable bit rate CELP coding of speech with phonetic classification," European Transactions on Telecommunications, vol. 5, pp. 591–602, Oct 1994.

[381] L. Cellario and D. Sereno, "CELP coding at variable rate," European Transactions on Telecommunications, vol. 5, pp. 603–613, Oct 1994.

[382] E. Yuen, P. Ho, and V. Cuperman, "Variable Rate Speech and Channel Coding for Mobile Communications," in Proceedings of VTC, Stockholm, Sweden, pp. 1709–1712, 8-10 June 1994.


[383] T. Kawashima, V. Sharma, and A. Gersho, "Capacity enhancement of cellular CDMA by traffic-based control of speech bit rate," IEEE Transactions on Vehicular Technology, vol. 45, pp. 543–550, Aug 1996.

[384] J. Hagenauer, "Rate-Compatible Punctured Convolutional Codes (RCPC Codes) and their applications," IEEE Transactions on Communications, vol. 36, pp. 389–400, Apr 1988.

[385] W. P. LeBlanc and S. A. Mahmoud, "Low Complexity, Low Delay Speech Coding for Indoor Wireless Communications," in Proceedings of VTC, Stockholm, Sweden, pp. 1695–1698, 8-10 June 1994.

[386] "Coding of Speech at 16 kbit/s using Low-Delay Code Excited Linear Prediction (LD-CELP)." ITU Recommendation G.728, 1992.

[387] R. J. McAulay and T. F. Quatieri, "Low-rate Speech Coding Based on the Sinusoidal Model," in Advances in Speech Signal Processing (S. Furui and M. Sondhi, eds.), ch. 6, Marcel Dekker, New York, 1992.

[388] J. E. Kleider and R. J. Pattison, "Multi-Rate Speech Coding For Wireless and Internet Applications," in Proceedings of ICASSP, Phoenix, Arizona, pp. 1193–1196, 14-18th, March 1999.

[389] A. Das, A. Dejaco, S. Manjunath, A. Ananthapadmanabhan, J. Juang, and E. Choy, "Multimode Variable Bit Rate Speech Coding: An Efficient Paradigm For High-Quality Low-Rate Representation Of Speech Signal," in Proceedings of ICASSP, Phoenix, Arizona, pp. 3014–3017, 14-18th, March 1999.

[390] F. Beritelli, A. Lombardo, S. Palazzo, and G. Schembra, "Performance Analysis of an ATM Multiplexer Loaded with VBR Traffic Generated by Multimode Speech Coders," IEEE Journal On Selected Areas In Communications, vol. 17, pp. 63–81, Jan 1999.

[391] S. Sampei, S. Komaki, and N. Morinaga, "Adaptive Modulation/TDMA scheme for large capacity personal multimedia communications systems," IEICE Transactions on Communications, vol. E77-B, pp. 1096–1103, September 1994.

[392] C. H. Wong and L. Hanzo, "Upper-bound performance of a wideband adaptive modem," IEEE Transactions on Communications, vol. 48, no. 3, pp. 367–369, 2000.

[393] M. S. Yee and L. Hanzo, "Block Turbo Coded Burst-By-Burst Adaptive Radial Basis Function Decision Feedback Equaliser Assisted Modems," in Proc. of VTC'99 Fall, Amsterdam, Netherlands, pp. 1600–1604, 19-22 September 1999.

[394] E. Kuan and L. Hanzo, "Comparative study of adaptive-rate CDMA transmission employing joint-detection and interference cancellation receivers," in Proceedings of IEEE Vehicular Technology Conference, Tokyo, Japan, vol. 1, pp. 71–75, 2000.

[395] T. Keller and L. Hanzo, "Sub-band Adaptive Pre-Equalised OFDM Schemes," in Proceedings of the IEEE VTC'99 Fall, Amsterdam, pp. 334–338, 1999.


[396] M. Muenster, T. Keller, and L. Hanzo, "Co-Channel Interference Suppression Assisted Adaptive OFDM in Interference-limited Environments," in Proceedings of the IEEE VTC'99 Fall, Amsterdam, pp. 284–288, 1999.

[397] T. H. Liew, L. L. Yang, and L. Hanzo, "Soft-decision Redundant Residue Number System Based Error Correction Coding," in Proc. of IEEE VTC'99, (Amsterdam, The Netherlands), pp. 2546–2550, Sept 1999.

[398] N. S. Szabo and R. I. Tanaka, Residue Arithmetic and Its Applications to Computer Technology. McGraw-Hill Book Company, 1967.

[399] F. J. Taylor, "Residue Arithmetic: A Tutorial with Examples," IEEE Computer Magazine, pp. 50–62, May 1984.

[400] H. Krishna, K. Y. Lin, and J. D. Sun, "A Coding Theory Approach to Error Control in Redundant Residue Number Systems - Part I: Theory and Single Error Correction," IEEE Transactions on Circuits and Systems-II: Analog and Digital Signal Processing, vol. 39, pp. 8–17, Jan 1992.

[401] J. D. Sun and H. Krishna, "A Coding Theory Approach to Error Control in Redundant Residue Number Systems - Part II: Multiple Error Detection and Correction," IEEE Transactions on Circuits and Systems-II: Analog and Digital Signal Processing, vol. 39, pp. 18–34, Jan 1992.

[402] L. L. Yang and L. Hanzo, "Performance of Residue Number System Based DS-CDMA over Multipath Channels Using Orthogonal Sequences," European Transactions on Communications, vol. 9, pp. 525–535, Nov.-Dec 1998.

[403] L. L. Yang and L. Hanzo, "Unified Error-Control Procedure for Global Telecommunication Systems Using Redundant Residue Number System Codes," in Proc. of the WWRF (Wireless World Research Forum) Kick-off Meeting, (Munich, Germany), 6-7th, March 2001.

[404] A. Klein, G. K. Kaleh, and P. W. Baier, "Zero forcing and minimum mean square error equalization for multiuser detection in code division multiple access channels," IEEE Transactions on Vehicular Technology, vol. 45, pp. 276–287, May 1996.

[405] W. Webb and R. Steele, "Variable rate QAM for mobile radio," IEEE Transactions on Communications, vol. 43, pp. 2223–2230, July 1995.

[406] A. J. Goldsmith and S. G. Chua, "Variable rate variable power MQAM for fading channels," IEEE Transactions on Communications, vol. 45, pp. 1218–1230, October 1997.

[407] J. M. Torrance and L. Hanzo, "On the Upper bound performance of adaptive QAM in slow Rayleigh fading channel," IEE Electronics Letters, pp. 169–171, April 1996.

[408] C. H. Wong and L. Hanzo, "Upper-bound performance of a wideband burst-by-burst adaptive modem," in Proc. of IEEE VTC'99, (Houston, USA), pp. 1851–1855, May 1999.


[409] M. Failli, "Digital land mobile radio communications COST 207," tech. rep., European Commission, Luxembourg, 1989.

[410] L. Hanzo, F. C. A. Somerville, and J. P. Woodard, Voice compression and communications: principles and applications for fixed and wireless channels. Chichester, UK: John Wiley-IEEE Press, 2001.

[411] L. Hanzo, P. J. Cherriman, and J. Streit, Wireless Video Communications: Second to Third Generation System and Beyond. Piscataway, NJ: IEEE Press, 2001.

[412] L. Fielder, M. Bosi, G. Davidson, M. Davis, C. Todd, and S. Vernon, "AC-2 and AC-3: Low complexity transform-based audio coding," in Collected Papers on Digital Audio Bit-Rate Reduction (N. Gilchrist and C. Grewin, eds.), pp. 54–72, Audio Engineering Society, 1996.

[413] K. Tsutsui, H. Suzuki, O. Shimoyoshi, M. Sonohara, K. Akagiri, and R. Heddle, "ATRAC: adaptive transform acoustic coding for MiniDisc," in Collected Papers on Digital Audio Bit-Rate Reduction (N. Gilchrist and C. Grewin, eds.), pp. 95–101, Audio Engineering Society, 1996.

[414] J. Johnston, D. Sinha, S. Dorward, and S. Quackenbush, "AT&T perceptual audio coding (PAC)," in Collected Papers on Digital Audio Bit-Rate Reduction (N. Gilchrist and C. Grewin, eds.), pp. 73–81, Audio Engineering Society, 1996.

[415] G. C. P. Lokhoff, "Precision Adaptive Subband Coding (PASC) for the Digital Compact Cassette (DCC)," IEEE Transactions on Consumer Electronics, vol. 38, pp. 784–789, Nov 1992.

[416] P. Noll, "Digital audio coding for visual communications," Proceedings of the IEEE, vol. 83, pp. 925–943, June 1995.

[417] N. S. Jayant, J. Johnston, and R. Safranek, "Signal Compression Based on Models of Human Perception," Proceedings of the IEEE, vol. 81, pp. 1385–1422, Oct 1993.

[418] E. Zwicker, "Subdivision of the Audible Frequency Range Into Critical Bands (Frequenzgruppen)," The Journal of the Acoustical Society of America, vol. 33, p. 248, Feb 1961.

[419] K. Brandenburg, "Introduction to perceptual coding," in Collected Papers on Digital Audio Bit-Rate Reduction (N. Gilchrist and C. Grewin, eds.), pp. 23–30, Audio Engineering Society, 1996.

[420] T. Painter and A. Spanias, "Perceptual Coding of Digital Audio," Proceedings of the IEEE, vol. 88, pp. 451–513, Apr 2000.

[421] H. Fletcher, "Auditory patterns," Rev. Mod. Phys., pp. 47–65, Jan 1940.

[422] D. D. Greenwood, "Critical bandwidth and the frequency coordinates of the basilar membrane," The Journal of the Acoustical Society of America, pp. 1344–1356, Oct 1961.


[423] B. Scharf, "Critical bands," in Foundations of Modern Auditory Theory, New York, Academic Press, 1970.

[424] E. Zwicker and H. Fastl, Psychoacoustics - Facts and Models. Springer-Verlag, Berlin, Germany, 1990.

[425] R. Hellman, "Asymmetry of masking between noise and tone," Perception and Psychophysics, vol. 11, pp. 241–246, 1972.

[426] K. Brandenburg, "OCF - A new coding algorithm for high quality sound signals," in Proceedings of ICASSP, Dallas, Texas, USA, pp. 141–144, April 1987.

[427] J. Johnston, "Transform Coding of Audio Signals Using Perceptual Noise Criteria," IEEE Journal on Selected Areas in Communications, vol. 6, pp. 314–323, Feb 1988.

[428] D. Esteban and C. Galand, "Application of Quadrature Mirror Filters to Split Band Voice Coding Scheme," in Proceedings of ICASSP, pp. 191–195, 9-11th, May 1977.

[429] R. E. Crochiere, S. A. Webber, and J. L. Flanagan, "Digital Coding of Speech in Sub-bands," The Bell System Technical Journal, vol. 55, pp. 1069–1085, Oct 1976.

[430] J. M. Tribolet and R. E. Crochiere, "Frequency Domain Coding of Speech," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 27, pp. 512–530, Oct 1979.

[431] N. S. Jayant and P. Noll, Digital Coding of Waveforms: Principles and Applications to Speech and Video Coding. Prentice Hall, Inc., Englewood Cliffs, NJ, 1984.

[432] R. Zelinski and P. Noll, "Adaptive Transform Coding of Speech Signals," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 25, pp. 299–309, Aug 1977.

[433] J. Herre and J. Johnston, "Continuously signal-adaptive filterbank for high-quality perceptual audio coding," in Proceedings of IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 19-22nd, Oct 1997.

[434] P. P. Vaidyanathan, "Multirate digital filters, filter banks, polyphase networks and applications: a tutorial," Proceedings of the IEEE, vol. 78, pp. 56–93, Jan 1990.

[435] P. P. Vaidyanathan, Multirate Systems and Filter Banks. Englewood Cliffs, NJ: Prentice Hall, 1993.

[436] A. Akansu and M. T. J. S. (Eds), Subband and Wavelet Transforms, Designs and Applications. Norwell, MA: Kluwer Academic, 1996.

[437] H. S. Malvar, Signal Processing with Lapped Transforms. Boston, MA: Artech House, 1992.

[438] H. S. Malvar, "Modulated QMF filter banks with perfect reconstruction," Electronics Letters, vol. 26, pp. 906–907, June 1990.

[439] J. H. Rothweiler, "Polyphase quadrature filters: A new subband coding technique," in Proceedings of ICASSP, Boston, Massachusetts, USA, pp. 1280–1283, April 1983.


[440] M. Temerinac and B. Edler, "LINC: A Common Theory of Transform and Subband Coding," IEEE Transactions on Communications, vol. 41, no. 2, pp. 266–274, 1993.

[441] H. J. Nussbaumer, "Pseudo QMF filter bank," IBM Tech. Disclosure Bull., vol. 24, pp. 3081–3087, Nov 1981.

[442] J. P. Princen and A. B. Bradley, "Analysis/Synthesis Filter Bank Design Based on Time Domain Aliasing Cancellation," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. ASSP-34, pp. 1153–1161, Oct 1986.

[443] S. Shlien, "The Modulated Lapped Transform, Its Time-Varying Forms, and Its Applications to Audio Coding Standards," IEEE Transactions on Speech and Audio Processing, vol. 5, pp. 359–366, July 1997.

[444] B. Edler, "Coding of Audio Signals with Overlapping Block Transform and Adaptive Window Functions," (in German), Frequenz, vol. 43, pp. 252–256, Sept 1989.

[445] K. Brandenburg and G. Stoll, "ISO-MPEG-1 Audio: A Generic Standard for Coding of High-Quality Digital Audio," in Collected Papers on Digital Audio Bit-Rate Reduction (N. Gilchrist and C. Grewin, eds.), pp. 31–42, Audio Engineering Society, 1996.

[446] H. S. Malvar, "Lapped Transform for Efficient Transform/Subband Coding," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 38, pp. 969–978, June 1990.

[447] Y. Mahieux, J. Petit, and A. Charbonnier, "Transform coding of audio signals using correlation between successive transform blocks," in Proceedings of ICASSP, Glasgow, Scotland, pp. 2021–2024, May 1989.

[448] J. Johnston and A. Ferreira, "Sum-difference stereo transform coding," in Proceedings of ICASSP, San Francisco, CA, USA, pp. II-569–II-572, March 1992.

[449] S. Park, Y. Kim, and Y. Seo, "Multi-Layer Bit-Sliced Bit-Rate Scalable Audio Coding," in 103rd Convention of the Audio Engineering Society, preprint 4520, Sep 1997.

[450] N. Iwakami, T. Moriya, and S. Miki, "High-quality audio coding at less than 64 kbps by using transform-domain weighted interleave vector quantization (TwinVQ)," in Proceedings of ICASSP, Detroit, MI, USA, pp. 3095–3098, 9-12th, May 1995.

[451] J. Herre and J. Johnston, "Enhancing the performance of perceptual audio coders by using temporal noise shaping (TNS)," in 101st Convention of the Audio Engineering Society, preprint 4384, Dec 1996.

[452] J. Herre, K. Brandenburg, and D. Lederer, "Intensity Stereo Coding," in 96th Convention of the Audio Engineering Society, preprint 3799, May 1994.

[453] J. Johnston, "Perceptual transform coding of wideband stereo signals," in Proceedings of ICASSP, Glasgow, Scotland, pp. 1993–1996, May 1989.

[454] R. G. van der Waal and R. N. J. Veldhuis, "Subband coding of stereophonic digital audio signals," in Proceedings of ICASSP, Toronto, Canada, vol. 5, pp. 3601–3604, 14-17th, May 1991.


[455] T. V. Sreenivas and M. Dietz, "Vector Quantization of Scale Factors in Advanced Audio Coder (AAC)," in Proceedings of ICASSP, Seattle, Washington, USA, vol. 6, pp. 3641–3644, 12-15th, May 1998.

[456] D. A. Huffman, "A Method for the Construction of Minimum-Redundancy Codes," Proceedings of the IRE, vol. 40, pp. 1098–1101, Sept 1952.

[457] ISO/IEC JTC1/SC29/WG11/N2203TF, MPEG-4 Audio Version 1 Final Committee Draft 14496-3 Subpart 4: TF, http://www.tnt.uni-hannover.de/project/mpeg/audio/documents/, May 1998.

[458] K. Ikeda, T. Moriya, N. Iwakami, A. Jin, and S. Miki, "A design of TwinVQ audio codec for personal communication systems," in Fourth IEEE International Conference on Universal Personal Communications, pp. 803–807, 1995.

[459] K. Ikeda, T. Moriya, and N. Iwakami, "Error protected TwinVQ audio coding at less than 64 kbit/s," in Proceedings of IEEE Speech Coding Workshop, pp. 33–34, 1995.

[460] T. Moriya, N. Iwakami, K. Ikeda, and S. Miki, "Extension and complexity reduction of TwinVQ audio coder," in Proceedings of ICASSP, Atlanta, Georgia, pp. 1029–1032, 7-10th, May 1996.

[461] T. Moriya, "Two-channel conjugate vector quantizer for noisy channel speech coding," IEEE Journal on Selected Areas in Communications, vol. 10, pp. 866–874, 1992.

[462] N. Kitawaki, T. Moriya, T. Kaneko, and N. Iwakami, "Comparison of two speech and audio coders at 8 kbit/s from the viewpoints of coding scheme and quality," IEICE Transactions on Communications, vol. E81-B, pp. 2007–2012, Nov 1998.

[463] T. Moriya, N. Iwakami, A. Jin, K. Ikeda, and S. Miki, "A design of transform coder for both speech and audio signals at 1 bit/sample," in Proceedings of ICASSP, Munich, Germany, pp. 1371–1374, 21-24th, April 1997.

[464] T. Moriya and M. Honda, "Transform coding of speech using a weighted vector quantizer," IEEE Journal on Selected Areas in Communications, vol. 6, pp. 425–431, Feb 1988.

[465] H. Purnhagen, "An Overview of MPEG-4 Audio Version 2," in AES 17th International Conference on High-Quality Audio Coding, Sept 1999.

[466] H. Purnhagen, "Advances in parametric audio coding," in Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New York, pp. W99-1–W99-4, 17-20th, Oct 1999.

[467] H. Purnhagen and N. Meine, "HILN - The MPEG-4 Parametric Audio Coding Tools," in Proceedings of the IEEE International Symposium on Circuits and Systems, Geneva, Switzerland, pp. III-201–III-204, 29-31st, May 2000.

[468] B. Edler and H. Purnhagen, "Concepts for Hybrid Audio Coding Schemes Based on Parametric Techniques," in AES 105th Convention, Preprint 4808, Sept 1998.


[469] S. Levine, T. Verma, and J. O. Smith, "Multiresolution sinusoidal modelling for wideband audio with modifications," in Proceedings of ICASSP, Seattle, Washington, USA, vol. 6, pp. 3585–3588, 12-15th, May 1998.

[470] T. S. Verma and T. H. Y. Meng, "Analysis/synthesis tool for transient signals that allows a flexible sines+transients+noise model for audio," in Proceedings of ICASSP, Seattle, Washington, USA, vol. 6, pp. 3573–3576, 12-15th, May 1998.

[471] S. Levine and J. O. Smith III, "A switched parametric & transform audio coder," in Proceedings of ICASSP, Phoenix, Arizona, pp. 985–988, 14-18th, March 1999.

[472] M. Nishiguchi and J. Matsumoto, "Harmonic and Noise Coding of LPC Residuals with Classified Vector Quantization," in Proceedings of ICASSP, Detroit, MI, USA, vol. 1, pp. 484–487, 9-12th, May 1995.

[473] M. Nishiguchi, K. Iijima, and J. Matsumoto, "Harmonic vector excitation coding of speech at 2 kbit/s," in Proceedings of IEEE Workshop on Speech Coding for Telecommunications, pp. 39–40, 1997.

[474] ISO/IEC JTC1/SC29/WG11/N2203PAR, MPEG-4 Audio Version 1 Final Committee Draft 14496-3 Subpart 2: Parametric Coding, http://www.tnt.uni-hannover.de/project/mpeg/audio/documents/, March 1998.

[475] ISO/IEC JTC1/SC29/WG11/N2203CELP, MPEG-4 Audio Version 1 Final Committee Draft 14496-3 Subpart 3: CELP, http://www.tnt.uni-hannover.de/project/mpeg/audio/documents/, May 1998.

[476] R. Taori, R. J. Sluijter, and A. J. Gerrits, "On scalability in CELP coding systems," in Proceedings of IEEE Workshop on Speech Coding for Telecommunications, pp. 67–68, Sep 1997.

[477] K. Ozawa, M. Serizawa, and T. Nomura, "High quality MP-CELP speech coding at 12 kb/s and 6.4 kb/s," in Proceedings of IEEE Workshop on Speech Coding for Telecommunications, pp. 71–72, Sep 1997.

[478] E. F. Deprettere and P. Kroon, "Regular excitation reduction for effective and efficient LP-coding of speech," in Proceedings of ICASSP, (Tampa, Florida), pp. 965–968, March 1985.

[479] C. Laflamme, J. P. Adoul, R. A. Salami, S. Morissette, and P. Mabilleau, "16 kbit/s Wideband Speech Coding Technique Based On Algebraic CELP," in Proceedings of ICASSP, Toronto, Canada, vol. 1, pp. 13–16, 14-17th, May 1991.

[480] P. Kroon, R. J. Sluyter, and E. F. Deprettere, "Regular-pulse excitation - a novel approach to effective and efficient multipulse coding of speech," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-34, pp. 1054–1063, Oct 1986.

[481] P. Chaudhury, "The 3GPP proposal for IMT-2000," IEEE Communications Magazine, vol. 37, pp. 72–81, Dec 1999.


[482] G. Foschini Jr. and M. Gans, "On limits of wireless communication in a fading environment when using multiple antennas," Wireless Personal Communications, vol. 6, pp. 311–335, March 1998.

[483] V. Tarokh, N. Seshadri, and A. Calderbank, "Space-time codes for high data rate wireless communication: Performance criterion and code construction," IEEE Transactions on Information Theory, vol. 44, pp. 744–765, March 1998.

[484] V. Tarokh, H. Jafarkhani, and A. Calderbank, "Space-time block codes from orthogonal designs," IEEE Transactions on Information Theory, vol. 45, pp. 1456–1467, July 1999.

[485] V. Tarokh, H. Jafarkhani, and A. Calderbank, "Space-time block coding for wireless communications: Performance results," IEEE Journal on Selected Areas in Communications, vol. 17, pp. 451–460, March 1999.

[486] G. Bauch, "Concatenation of space-time block codes and Turbo-TCM," in Proceedings of IEEE International Conference on Communications, Vancouver, Canada, pp. 1202–1206, June 1999.

[487] D. Agrawal, V. Tarokh, A. Naguib, and N. Seshadri, "Space-time coded OFDM for high data-rate wireless communication over wideband channels," in Proceedings of IEEE Vehicular Technology Conference, (Ottawa, Canada), pp. 2232–2236, May 1998.

[488] Y. Li, J. Chuang, and N. Sollenberger, "Transmitter diversity for OFDM systems and its impact on high-rate data wireless networks," IEEE Journal on Selected Areas in Communications, vol. 17, pp. 1233–1243, July 1999.

[489] A. Naguib, N. Seshadri, and A. Calderbank, "Increasing data rate over wireless channels," IEEE Signal Processing Magazine, vol. 17, pp. 76–92, May 2000.

[490] A. Naguib, V. Tarokh, N. Seshadri, and A. Calderbank, "A space-time coding modem for high-data-rate wireless communications," IEEE Journal on Selected Areas in Communications, vol. 16, pp. 1459–1478, October 1998.

[491] H. Holma and A. Toskala, WCDMA for UMTS. John Wiley-IEEE Press, April 2000.

[492] R. W. Chang, "Synthesis of Band-Limited Orthogonal Signals for Multichannel Data Transmission," BSTJ, vol. 46, pp. 1775–1796, Dec 1966.

[493] L. Cimini, "Analysis and Simulation of a Digital Mobile Channel Using Orthogonal Frequency Division Multiplexing," IEEE Transactions on Communications, vol. COM-33, pp. 665–675, July 1985.

[494] K. Fazel and G. Fettweis, Multi-carrier spread spectrum. Kluwer, 1997.

[495] K. Fazel, S. Kaiser, P. Robertson, and M. Ruf, "A concept of digital terrestrial television broadcasting," Wireless Personal Communications, vol. 2, pp. 9–27, 1995.

[496] H. Sari, G. Karam, and I. Jeanclaude, "Transmission techniques for digital terrestrial TV broadcasting," IEEE Communications Magazine, vol. 33, pp. 100–109, Feb 1995.

[497] J. Borowski, S. Zeiberg, J. Hübner, K. Koora, E. Bogenfeld, and B. Kull, “Performance of OFDM and comparable single carrier system in MEDIAN demonstration 60 GHz channel,” in Proceedings of ACTS Summit, (Aalborg, Denmark), pp. 653–658, October 1997.

[498] J. C. I. Chuang, Y. G. Li, and N. R. Sollenberger, “OFDM based High-speed Wireless Access for Internet Applications,” in Proceedings of PIMRC Fall, (London, UK), 18–21 September 2000.

[499] T. Keller, M. Muenster, and L. Hanzo, “A Turbo-Coded Burst-By-Burst Adaptive Wideband Speech Transceiver,” IEEE Journal on Selected Areas in Communications, vol. 18, pp. 2363–2372, November 2000.

[500] L. Hanzo, C. H. Wong, and P. Cherriman, “Channel-adaptive wideband wireless video telephony,” IEEE Signal Processing Magazine, vol. 17, pp. 10–30, July 2000.

[501] MPEG Audio Web Page, http://www.tnt.uni-hannover.de/project/mpeg/audio/.

[502] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limit error-correcting coding and decoding: Turbo codes,” in IEEE International Conference on Communications, pp. 1064–1070, 23–26 May 1993.

[503] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear codes for minimizing symbol error rate,” IEEE Transactions on Information Theory, vol. 20, pp. 284–287, March 1974.

[504] R. Koenen, “MPEG-4 Overview,” in ISO/IEC JTC1/SC29/WG11 N4668, version 21 - Jeju Version, ISO/IEC, (http://www.chiariglione.org/mpeg/standards/mpeg-4/mpeg-4.htm), March 2002.

[505] R. Koenen, “MPEG-4 Multimedia for Our Time,” IEEE Spectrum, vol. 36, pp. 26–33, February 1999.

[506] ISO/IEC JTC1/SC29/WG11 N2503, “Information Technology - Very Low Bitrate Audio-Visual Coding,” in ISO/IEC 14496-3. Final Draft International Standard. Part 3: Audio, 1998.

[507] J. Herre and B. Grill, “Overview of MPEG-4 Audio and its Applications in Mobile Communications,” vol. 1, pp. 11–20, August 2000.

[508] F. Pereira and T. Ebrahimi, The MPEG-4 Book. New Jersey, USA: Prentice Hall PTR / IMSC Press, 2002.

[509] S. Lin and D. J. Costello Jr., Error Control Coding: Fundamentals and Applications. Englewood Cliffs, NJ, USA: Prentice-Hall, 1983.

[510] G. Ungerboeck, “Channel Coding with Multilevel/Phase Signals,” IEEE Transactions on Information Theory, vol. 28, pp. 55–67, January 1982.

[511] L. Hanzo, T. H. Liew, and B. L. Yeap, Turbo Coding, Turbo Equalisation and Space-Time Coding for Transmission over Wireless Channels. New York, USA: John Wiley-IEEE Press, 2002.

[512] L. Hanzo, S. X. Ng, W. T. Webb, and T. Keller, Quadrature Amplitude Modulation: From Basics to Adaptive Trellis-Coded, Turbo-Equalised and Space-Time Coded OFDM, CDMA and MC-CDMA Systems. New York, USA: John Wiley-IEEE Press, 2004.

[513] V. Tarokh, N. Seshadri, and A. R. Calderbank, “Space-time Codes for High Rate Wireless Communication: Performance analysis and code construction,” IEEE Transactions on Information Theory, vol. 44, pp. 744–765, March 1998.

[514] L. Hanzo, P. J. Cherriman, and J. Street, Wireless Video Communications: Second to Third Generation Systems and Beyond. NJ, USA: IEEE Press, 2001.

[515] L. Hanzo, F. C. A. Somerville, and J. P. Woodard, Voice Compression and Communications: Principles and Applications for Fixed and Wireless Channels. Chichester, UK: John Wiley-IEEE Press, 2001.

[516] S. X. Ng, J. Y. Chung, and L. Hanzo, “Turbo-Detected Unequal Protection MPEG-4 Wireless Video Telephony using Trellis Coded Modulation and Space-Time Trellis Coding,” in IEE International Conference on 3G Mobile Communication Technologies (3G 2004), (London, UK), 18–20 October 2004.

[517] E. Zehavi, “8-PSK trellis codes for a Rayleigh fading channel,” IEEE Transactions on Communications, vol. 40, pp. 873–883, May 1992.

[518] S. X. Ng, J. Y. Chung, and L. Hanzo, “Integrated wireless multimedia turbo-transceiver design - Interpreting Shannon’s lessons in the turbo-era,” in IEE Sparse-Graph Codes Seminar, (The IEE, Savoy Place, London), October 2004.

[519] R. V. Cox, J. Hagenauer, N. Seshadri, and C-E. W. Sundberg, “Subband Speech Coding and Matched Convolutional Coding for Mobile Radio Channels,” IEEE Transactions on Signal Processing, vol. 39, pp. 1717–1731, August 1991.

[520] B. Bessette, R. Salami, R. Lefebvre, M. Jelinek, J. Rotola-Pukkila, J. Vainio, H. Mikkola, and K. Jarvinen, “The Adaptive Multirate Wideband Speech Codec (AMR-WB),” IEEE Transactions on Speech and Audio Processing, vol. 10, pp. 620–636, November 2002.

[521] 3GPP TS 26.173, “Adaptive Multi-Rate Wideband Speech ANSI-C Code,” in 3GPP Technical Specification, 2003.

[522] S. X. Ng and L. Hanzo, “On the MIMO Channel Capacity of Multi-Dimensional Signal Sets,” in IEEE Vehicular Technology Conference, (Los Angeles, USA), 26–29 September 2004.

[523] T. Fingscheidt and P. Vary, “Softbit Speech Decoding: A New Approach to Error Concealment,” IEEE Transactions on Speech and Audio Processing, vol. 9, pp. 240–251, March 2001.

[524] B. Atal and M. Schroeder, “Predictive coding of speech signals,” Bell System Technical Journal, pp. 1973–1986, October 1970.

[525] I. Wassel, D. Goodman, and R. Steele, “Embedded delta modulation,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 36, pp. 1236–1243, August 1988.

[526] B. Atal and S. Hanauer, “Speech analysis and synthesis by linear prediction of the speech wave,” The Journal of the Acoustical Society of America, vol. 50, no. 2, pp. 637–655, 1971.

[527] M. Kohler, L. Supplee, and T. Tremain, “Progress towards a new government standard 2400bps voice coder,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’95) [614], pp. 488–491.

[528] K. Teague, B. Leach, and W. Andrews, “Development of a high-quality MBE based vocoder for implementation at 2400bps,” in Proceedings of the IEEE Wichita Conference on Communications, Networking and Signal Processing, pp. 129–133, April 1994.

[529] H. Hassanein, A. Brind’Amour, S. Dery, and K. Bryden, “Frequency selective harmonic coding at 2400bps,” in Proceedings of the 37th Midwest Symposium on Circuits and Systems, vol. 2, pp. 1436–1439, 1995.

[530] R. McAulay and T. Quatieri, “The application of subband coding to improve quality and robustness of the sinusoidal transform coder,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’93) [633], pp. 439–442.

[531] A. McCree and T. Barnwell III, “A mixed excitation LPC vocoder model for low bit rate speech coding,” IEEE Transactions on Speech and Audio Processing, vol. 3, no. 4, pp. 242–250, 1995.

[532] P. Laurent and P. L. Noue, “A robust 2400bps subband LPC vocoder,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’95) [614], pp. 500–503.

[533] W. Kleijn and J. Haagen, “A speech coder based on decomposition of characteristic waveforms,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’95) [614], pp. 508–511.

[534] R. McAulay and T. Champion, “Improved interoperable 2.4 kb/s LPC using sinusoidal transform coder techniques,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’90 [623], pp. 641–643.

[535] K. Teague, W. Andrews, and B. Walls, “Harmonic speech coding at 2400 bps,” in Proceedings of 10th Annual Mid-America Symposium on Emerging Computer Technology, (Norman, Oklahoma, USA), 1996.

[536] J. Makhoul, R. Viswanathan, R. Schwartz, and A. Huggins, “A mixed-source model for speech compression and synthesis,” The Journal of the Acoustical Society of America, vol. 64, no. 4, pp. 1577–1581, 1978.

[537] A. McCree, K. Truong, E. George, T. Barnwell, and V. Viswanathan, “A 2.4kbit/s coder candidate for the new U.S. federal standard,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’96) [616], pp. 200–203.

[538] A. McCree and T. Barnwell III, “Improving the performance of a mixed excitation LPC vocoder in acoustic noise,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’92 [635], pp. 137–140.

[539] J. Holmes, “The influence of glottal waveform on the naturalness of speech from a parallel formant synthesizer,” IEEE Transactions on Audio and Electroacoustics, vol. 21, pp. 298–305, June 1973.

[540] W. Kleijn, Y. Shoham, D. Sen, and R. Hagen, “A low-complexity waveform interpolation coder,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’96) [616], pp. 212–215.

[541] D. Hiotakakos and C. Xydeas, “Low bit rate coding using an interpolated zinc excitation model,” in Proceedings of the ICCS 94, pp. 865–869, 1994.

[542] R. Sukkar, J. LoCicero, and J. Picone, “Decomposition of the LPC excitation using the zinc basis functions,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 37, no. 9, pp. 1329–1341, 1989.

[543] M. Schroeder, B. Atal, and J. Hall, “Optimizing digital speech coders by exploiting masking properties of the human ear,” Journal of the Acoustical Society of America, vol. 66, pp. 1647–1652, December 1979.

[544] W. Voiers, “Diagnostic acceptability measure for speech communication systems,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’77 [618], pp. 204–207.

[545] W. Voiers, “Evaluating processed speech using the diagnostic rhyme test,” Speech Technology, January/February 1983.

[546] T. Tremain, M. Kohler, and T. Champion, “Philosophy and goals of the D.O.D 2400bps vocoder selection process,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’96) [616], pp. 1137–1140.

[547] M. Bielefeld and L. Supplee, “Developing a test program for the DoD 2400bps vocoder selection process,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’96) [616], pp. 1141–1144.

[548] J. Tardelli and E. Kreamer, “Vocoder intelligibility and quality test methods,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’96) [616], pp. 1145–1148.

[549] A. Schmidt-Nielsen and D. Brock, “Speaker recognizability testing for voice coders,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’96) [616], pp. 1149–1152.

[550] E. Kreamer and J. Tardelli, “Communicability testing for voice coders,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’96) [616], pp. 1153–1156.

[551] B. Atal and L. Rabiner, “A pattern recognition approach to voiced-unvoiced-silence classification with applications to speech recognition,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 24, pp. 201–212, June 1976.

[552] T. Ghiselli-Crippa and A. El-Jaroudi, “A fast neural net training algorithm and its application to speech classification,” Engineering Applications of Artificial Intelligence, vol. 6, no. 6, pp. 549–557, 1993.

[553] A. Noll, “Cepstrum pitch determination,” Journal of the Acoustical Society of America, vol. 41, pp. 293–309, February 1967.

[554] S. Kadambe and G. Boudreaux-Bartels, “Application of the wavelet transform for pitch detection of speech signals,” IEEE Transactions on Information Theory, vol. 38, pp. 917–924, March 1992.

[555] L. Rabiner, M. Cheng, A. Rosenberg, and C. McGonegal, “A comparative performance study of several pitch detection algorithms,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 24, no. 5, pp. 399–418, 1976.

[556] DVSI, Inmarsat-M Voice Codec, Issue 3.0 ed., August 1991.

[557] M. Sambur, A. Rosenberg, L. Rabiner, and C. McGonegal, “On reducing the buzz in LPC synthesis,” Journal of the Acoustical Society of America, vol. 63, pp. 918–924, March 1978.

[558] A. Rosenberg, “Effect of glottal pulse shape on the quality of natural vowels,” Journal of the Acoustical Society of America, vol. 49, no. 2 pt. 2, pp. 583–590, 1971.

[559] T. Koornwinder, Wavelets: An Elementary Treatment of Theory and Applications. World Scientific, 1993.

[560] C. Chui, Wavelet Analysis and its Applications, vol. I: An Introduction to Wavelets. New York, USA: Academic Press, 1992.

[561] C. Chui, Wavelet Analysis and its Applications, vol. II: Wavelets: A Tutorial in Theory and Applications. New York, USA: Academic Press, 1992.

[562] O. Rioul and M. Vetterli, “Wavelets and signal processing,” IEEE Signal Processing Magazine, pp. 14–38, October 1991.

[563] A. Graps, “An introduction to wavelets,” IEEE Computational Science & Engineering, pp. 50–61, Summer 1995.

[564] A. Cohen and J. Kovacevic, “Wavelets: The mathematical background,” Proceedings of the IEEE, vol. 84, pp. 514–522, April 1996.

[565] I. Daubechies, “The wavelet transform, time-frequency localization and signal analysis,” IEEE Transactions on Information Theory, vol. 36, pp. 961–1005, September 1990.

[566] S. Mallat, “A theory for multiresolution signal decomposition: the wavelet representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, pp. 674–693, July 1989.

[567] H. Baher, Analog & Digital Signal Processing. New York, USA: John Wiley and Sons, 1990.

[568] J. Stegmann, G. Schroder, and K. Fischer, “Robust classification of speech based on the dyadic wavelet transform with application to CELP coding,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’96) [616], pp. 546–549.

[569] S. Mallat and S. Zhong, “Characterization of signals from multiscale edges,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, pp. 710–732, July 1992.

[570] M. Unser and A. Aldroubi, “A review of wavelets in biomedical applications,” Proceedings of the IEEE, vol. 84, pp. 626–638, April 1996.

[571] C. Li, C. Zheng, and C. Tai, “Detection of ECG characteristic points using wavelet transforms,” IEEE Transactions on Biomedical Engineering, vol. 42, pp. 21–28, January 1995.

[572] S. Mallat and W. Hwang, “Singularity detection and processing with wavelets,” IEEE Transactions on Information Theory, vol. 38, pp. 617–643, March 1992.

[573] M. Vetterli and J. Kovacevic, Wavelets and Subband Coding. Englewood Cliffs, NJ, USA: Prentice-Hall, 1995.

[574] R. Sukkar, J. LoCicero, and J. Picone, “Design and implementation of a robust pitch detector based on a parallel processing technique,” IEEE Journal on Selected Areas in Communications, vol. 6, pp. 441–451, February 1988.

[575] R. Steele and L. Hanzo, eds., Mobile Radio Communications. Piscataway, NJ, USA: IEEE Press, 1999.

[576] F. Brooks, B. Yeap, J. Woodard, and L. Hanzo, “A sixth-rate, 3.8kbps GSM-like speech transceiver,” in Proceeding of ACTS Mobile Communication Summit ’98 [612], pp. 647–652.

[577] F. Brooks, E. Kuan, and L. Hanzo, “A 2.35kbps joint-detection CDMA speech transceiver,” in Proceeding of VTC’99 (Spring) [613], pp. 2403–2407.

[578] P. Robertson, E. Villebrun, and P. Hoeher, “A comparison of optimal and sub-optimal MAP decoding algorithms operating in the log domain,” in Proceedings of the International Conference on Communications, pp. 1009–1013, June 1995.

[579] P. Robertson, “Illuminating the structure of code and decoder of parallel concatenated recursive systematic (turbo) codes,” IEEE Globecom, pp. 1298–1303, 1994.

[580] W. Koch and A. Baier, “Optimum and sub-optimum detection of coded data disturbed by time-varying inter-symbol interference,” IEEE Globecom, pp. 1679–1684, December 1990.

[581] J. Erfanian, S. Pasupathy, and G. Gulak, “Reduced complexity symbol detectors with parallel structures for ISI channels,” IEEE Transactions on Communications, vol. 42, pp. 1661–1671, 1994.

[582] J. Hagenauer and P. Hoeher, “A Viterbi algorithm with soft-decision outputs and its applications,” in IEEE Globecom, pp. 1680–1686, 1989.

[583] C. Berrou, P. Adde, E. Angui, and S. Faudeil, “A low complexity soft-output Viterbi decoder architecture,” in Proceedings of the International Conference on Communications, pp. 737–740, May 1993.

[584] L. Rabiner, C. McGonegal, and D. Paul, FIR Windowed Filter Design Program - WINDOW, ch. 5.2. IEEE Press, 1979.

[585] S. Yeldner, A. Kondoz, and B. Evans, “Multiband linear predictive speech coding at very low bit rates,” IEE Proceedings in Vision, Image and Signal Processing, vol. 141, pp. 284–296, October 1994.

[586] A. Klein, R. Pirhonen, J. Skoeld, and R. Suoranta, “FRAMES multiple access mode 1 — wideband TDMA with and without spreading,” in Proceedings of IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, PIMRC’97 [624], pp. 37–41.

[587] J. Flanagan and R. Golden, “Phase vocoder,” The Bell System Technical Journal, pp. 1493–1509, November 1966.

[588] R. McAulay and T. Quatieri, “Speech analysis/synthesis based on sinusoidal representation,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 34, pp. 744–754, August 1986.

[589] L. Almeida and J. Tribolet, “Nonstationary spectral modelling of voiced speech,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 31, pp. 664–677, June 1983.

[590] E. George and M. Smith, “Analysis-by-synthesis/overlap-add sinusoidal modelling applied to the analysis and synthesis of musical tones,” Journal of the Audio Engineering Society, vol. 40, pp. 497–515, June 1992.

[591] E. George and M. Smith, “Speech analysis/synthesis and modification using an analysis-by-synthesis/overlap-add sinusoidal model,” IEEE Transactions on Speech and Audio Processing, vol. 5, pp. 389–406, September 1997.

[592] R. McAulay and T. Quatieri, “Pitch estimation and voicing detection based on a sinusoidal speech model,” in Proceedings of ICASSP’90, pp. 249–252, 1990.

[593] R. McAulay and T. Quatieri, “Sinusoidal coding,” in Speech Coding and Synthesis (W. B. Kleijn and K. K. Paliwal, eds.), ch. 4, Netherlands: Elsevier Science, 1995.

[594] R. McAulay, T. Parks, T. Quatieri, and M. Sabin, “Sine-wave amplitude coding at low data rates,” in Advances in Speech Coding (B. S. Atal, V. Cuperman, and A. Gersho, eds.), pp. 203–214, Dordrecht: Kluwer Academic Publishers, 1991.

[595] M. Nishiguchi and J. Matsumoto, “Harmonic and noise coding of LPC residuals with classified vector quantization,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’95) [614], pp. 484–487.

[596] V. Cuperman, P. Lupini, and B. Bhattacharya, “Spectral excitation coding of speech at 2.4kb/s,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’95) [614], pp. 496–499.

[597] S. Yeldner, A. Kondoz, and B. Evans, “High quality multiband LPC coding of speech at 2.4kbit/s,” Electronics Letters, vol. 27, no. 14, pp. 1287–1289, 1991.

[598] H. Yang, S.-N. Koh, and P. Sivaprakasapillai, “Pitch synchronous multi-band (PSMB) speech coding,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’95) [614], pp. 516–518.

[599] E. Erzin, A. Kumar, and A. Gersho, “Natural quality variable-rate spectral speech coding below 3.0kbps,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’97) [617], pp. 1579–1582.

[600] C. Papanastasiou and C. Xydeas, “Efficient mixed excitation models in LPC based prototype interpolation speech coders,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’97) [617], pp. 1555–1558.

[601] O. Ghitza, “Auditory models and human performance in tasks related to speech coding and speech recognition,” IEEE Transactions on Speech and Audio Processing, vol. 2, pp. 115–132, January 1994.

[602] K. Kryter, “Methods for the calculation of the articulation index,” tech. rep., American National Standards Institute, 1965.

[603] U. Halka and U. Heute, “A new approach to objective quality-measures based on attribute matching,” Speech Communication, vol. 11, pp. 15–30, 1992.

[604] S. Wang, A. Sekey, and A. Gersho, “An objective measure for predicting subjective quality of speech coders,” IEEE Journal on Selected Areas in Communications, vol. 10, pp. 819–829, June 1992.

[605] T. Barnwell III and A. Bush, “Statistical correlation between objective and subjective measures for speech quality,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’78, (Tulsa, Okla, USA), pp. 595–598, IEEE, 10–12 April 1978.

[606] T. Barnwell III, “Correlation analysis of subjective and objective measures for speech quality,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’80 [625], pp. 706–709.

[607] P. Breitkopf and T. Barnwell III, “Segmental preclassification for improved objective speech quality measures,” in IEEE Proceedings of International Conference on Acoustics, Speech and Signal Processing, pp. 1101–1104, 1981.

[608] L. Hanzo and L. Hinsenkamp, “On the subjective and objective evaluation of speech codecs,” Budavox Telecommunications Review, no. 2, pp. 6–9, 1987.

[609] K. Kryter, “Masking and speech communications in noise,” in The Effects of Noise on Man, ch. 2, New York, USA: Academic Press, 1970. ISBN: 9994669966.

[610] A. House, C. Williams, M. Hecker, and K. Kryter, “Articulation testing methods: Consonantal differentiation with a closed-response set,” Journal of the Acoustical Society of America, pp. 158–166, January 1965.

[611] IEEE, Proceeding of VTC’99 (Fall), (Amsterdam, Netherlands), 19–22 September 1999.

[612] ACTS, Proceeding of ACTS Mobile Communication Summit ’98, (Rhodes, Greece), 8–11 June 1998.

[613] IEEE, Proceeding of VTC’99 (Spring), (Houston, TX, USA), 16–20 May 1999.

[614] IEEE, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’95), (Detroit, MI, USA), 9–12 May 1995.

[615] IEEE, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’94), (Adelaide, Australia), 19–22 April 1994.

[616] IEEE, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’96), (Atlanta, GA, USA), 7–10 May 1996.

[617] IEEE, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’97), (Munich, Germany), 21–24 April 1997.

[618] IEEE, Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’77, (Hartford, CT, USA), 9–11 May 1977.

[619] IEEE, Proceedings of IEEE VTC ’94, (Stockholm, Sweden), 8–10 June 1994.

[620] IEEE, Proceedings of IEEE Vehicular Technology Conference (VTC’98), (Ottawa, Canada), 18–21 May 1998.

[621] ACTS, Proceeding of ACTS Mobile Communication Summit ’97, (Aalborg, Denmark), 7–10 October 1997.

[622] J. Gibson, ed., The Mobile Communications Handbook. Boca Raton, FL, USA: CRC Press and IEEE Press, 1996.

[623] IEEE, Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’90, (Albuquerque, New Mexico, USA), 3–6 April 1990.

[624] IEEE, Proceedings of IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, PIMRC’97, (Marina Congress Centre, Helsinki, Finland), 1–4 September 1997.

[625] IEEE, Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’80, (Denver, CO, USA), 9–11 April 1980.

[626] W. Tuttlebee, ed., Cordless Telecommunications in Europe: The Evolution of Personal Communications. London: Springer-Verlag, 1990. ISBN 3540196331.

[627] IEE, Proceedings of IEE Conference on Radio Receivers and Associated Systems (RRAS’95), (Bath, UK), 26–28 September 1995.

[628] IEEE, Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’84, (San Diego, CA, USA), 19–21 March 1984.

[629] IEEE, Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’89, (Glasgow, Scotland, UK), 23–26 May 1989.

[630] IEEE, Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’91, (Toronto, Ontario, Canada), 14–17 May 1991.

[631] IEEE, Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’87, (Dallas, TX, USA), 6–9 April 1987.

[632] IEEE, Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’88, (New York, NY, USA), 11–14 April 1988.

[633] IEEE, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’93), (Minneapolis, MN, USA), 27–30 April 1993.

[634] IEEE, Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’83, (Boston, MA, USA), 14–16 April 1983.

[635] IEEE, Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’92, March 1992.

[636] IEEE, Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP’82, May 1982.

Author Index

Symbols, Jr [509] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5473GPP TS 26.173, [521] . . . . . . . . . . . . . . . . . 552

AAdde, P. [583] . . . . . . . . . . . . . . . . . . . . . 683, 714Adoul, J-P. [170] . . . . . . . . . . . . . . . . . . . . . . . 189Adoul, J-P. [191] . . . . . . . . . . . . . . . . . . . . . . . 221Adoul, J-P. [162] 176, 282, 398, 440–444, 446Adoul, J-P. [225] . . . . . . . . . 298, 302, 303, 307Adoul, J-P. [163] . . . . . 176, 442, 444–446, 487Adoul, J-P. [213] . . . . . . . . . . . . . . . . . . 274, 386Adoul, J-P. [160]175, 189, 274, 307, 323, 350,

386, 394Adoul, J-P. [224] . . . . . . . . . . . . . . . . . . 295, 297Adoul, J-P. [223] . . . . . . . . . . . . . . . . . . 295, 297Adoul, J-P. [294] . . . . . . . . . . . . . . . . . . 443, 446Adoul, J.P. [168]181, 188, 298, 303, 304, 307,

441Adoul, J.P. [479] . . . . . . . . . . . . . . . . . . . . . . . 524Adoul, J.P. [166] . . . . . . . . . . . . . . . . . . . . . . . 181Adoul, J.P. [139] . . . . . . . . . . . . . . . . . . . . . . . 138Adoul, J.P. [228] . . . . . . . . . . . . . . . . . . 304, 307Agrawal, D. [487] . . . . . . . . . . . . . . . . . . . . . . 528Akagiri, K. [40] 5, 6, 490, 491, 498, 499, 502,

507–509, 557Akagiri, K. [413] . . . . . . . . . . . . . . . . . . . . . . .493Akansu, A. [436] . . . . . . . . . . . . . . . . . . . . . . .495Alamouti, S. [50] . . 7, 491, 528, 529, 533, 534Alcaim, A. [282] . . . . . . . . . . . . . . . . . . . . . . . 404Aldroubi, A. [570] . . . . . . . . . . . . . . . . . . . . . 622Almeida, L.B. [589] . . . . . . . . . . . . . . . . . . . . 719Almeida, L.B. [198] . . . . . . . . . . . . . . . . . . . . 244Alouini, M-S. [301] . . . . . . . . . . . . . . . . . . . . 445Ananthapadmanabhan, A. [244] . . . . . . . . . 320Andrews, W. [528] . . . . . . . . . . . . 566, 569, 570

Andrews, W. [535] . . . . . . . . . . . . . . . . .569, 570Angui, E. [583] . . . . . . . . . . . . . . . . . . . .683, 714Anisur Rahham, M.D. [179] . . . . . . . . . . . . .203Antti Toskala, [491] . . . . . . . . . . . 528, 529, 557Appleby, D.G. [132] . . . . . . . . . . . . . . . 129, 131Appleby, D.G. [135] . . . . . . . . . . . . . . . 134, 181Arimochi, K. [298] . . . . . . . . . . . . . . . . . . . . . 445Asghar, S. [82] . . . . . . . . . . . . . . . . . . . . . . . . . . 60Atal, B. [234] . . . . . . . . . . . . . . . . . . . . . . . . . . 320Atal, B.S. [526] . . . . . . . . . . . . . . . 562, 563, 687Atal, B.S. [551] . . . . . . . . . . . . . . . . . . . 594, 621Atal, B.S. [92] . . . . . . . . . . . . . . . . . . . . . . . . . . 94Atal, B.S. [9] . . . . . . . . . . . . . . . . . 100, 155, 563Atal, B.S. [131] . . . . . . . . . . . . . . . . . . . 129, 131Atal, B.S. [136] . . . . . . . . . . . . . . . . . . . . . . . . 134Atal, B.S. [52] . . . . . . . . . . . . . . . . . . . . . . . . . 561Atal, B.S. [116] .117, 129, 134, 135, 139, 209,

325, 589Atal, B.S. [128] . . . . . . . . . . . . . . . 129, 134–136Atal, B.S. [543] . . . . . . . . . . . . . . . . . . . . . . . . 578Atal, B.S. [16] . . 101, 175–177, 323, 370, 564Atal, B.S. [365] . . . . . . . . . . . . . . . . . . . . . . . . 517Atal, B.S. [175] . . . . . . . . . . . . . . . . . . . . . . . . 202Atal, B.S. [96] . . . . . . . . . . . . . . . . . . . . . . . . . . 96Atal, B.S. [99] . . . . . . . . . . . . . . . . . . . . . . . . . 100Atungsiri, S.A. [187] . . . . . . . . . . . . . . .210, 211

B
Baghbadrani, D.K. [167] . . . 181, 441
Baher, H. [567] . . . 619
Bahl, L. [503] . . . 529
Bahl, L.R. [221] . . . 289, 449, 683
Baier, A. [580] . . . 683
Baier, P.W. [264] . . . 335, 336
Barbulescu, A.S. [220] . . . 289
Barnwell, T.P. [537] . . . 571

AUTHOR INDEX 889

Barnwell, T.P. III [538] . . . 571
Barnwell, T.P. III [531] . . . 567, 571, 572, 610–614, 663, 695, 696, 759
Barton, S.K. [318] . . . 447
Barton, S.K. [317] . . . 447
Bauch, G. [486] . . . 528
Baum, K.L. [326] . . . 447
Bennett, W.R. [63] . . . 29
Beritelli, F. [246] . . . 321
Berrou, C. [583] . . . 683, 714
Berrou, C. [216] . . . 289, 447, 449, 680, 681, 713
Berrou, C. [502] . . . 528, 529, 534
Berrou, C. [217] . . . 289, 447, 449, 680, 681, 713
Bessette, B. [225] . . . 298, 302, 303, 307
Bessette, B. [520] . . . 548, 549
Bessette, B. [358] . . . 480
Bessette, B. [224] . . . 295, 297
Bessette, B. [223] . . . 295, 297
Bhattacharya, B. [596] . . . 723, 724
Bielefeld, M.R. [547] . . . 581
Bingham, J.A.C. [320] . . . 447
Black, A.W. [161] . . . 176, 437, 438, 440, 445, 446, 487
Blocher, P. [257] . . . 323, 328
Blumstein, S.E. [20] . . . 562
Bogenfeld, E. [497] . . . 528
Bogenfeld, E. [323] . . . 447
Borowski, J. [497] . . . 528
Borowski, J. [323] . . . 447
Bosi, M. [40] . . . 5, 6, 490, 491, 498, 499, 502, 507–509, 557
Bosi, M. [38] . . . 5, 6, 489, 491
Bosi, M. [412] . . . 491, 501
Boudreaux-Bartels, G.F. [554] . . . 594, 621, 627, 631
Bradley, A.B. [442] . . . 496, 501
Brandenburg, K. [40] . . . 5, 6, 490, 491, 498, 499, 502, 507–509, 557
Brandenburg, K. [35] . . . 5, 6, 489, 491, 496
Brandenburg, K. [419] . . . 493
Brandenburg, K. [426] . . . 494, 495
Brandenburg, K. [44] . . . 5, 6, 490, 491, 517
Brandenburg, K. [445] . . . 498
Brandenburg, K. [38] . . . 5, 6, 489, 491
Brandenburg, K. [452] . . . 507, 508
Brind’Amour, A. [529] . . . 566–568, 723
Brock, D.P. [549] . . . 581
Bruhn, S. [231] . . . 319, 323
Bruhn, S. [29] . . . 4
Bruhn, S. [257] . . . 323, 328
Bryden, K. [529] . . . 566–568, 723
Buzo, A. [280] . . . 378, 440

C
Calderbank, A. [490] . . . 528
Calderbank, A. [489] . . . 528, 534
Calderbank, A. [485] . . . 528
Calderbank, A. [483] . . . 528, 533
Calderbank, A. [484] . . . 528
Calderbank, A.R. [513] . . . 541, 542, 549, 552
Campbell, J.P. [186] . . . 209, 210, 214, 242
Campbell, J.P. [197] . . . 242, 245, 596
Campbell, W.M. [241] . . . 320
Carciofy, C. [193] . . . 230
Cattermole, K.W. [4] . . . 32, 33, 561
Cellario, L. [237] . . . 320
Champion, T. [534] . . . 568, 569, 723, 724, 728
Champion, T.G. [546] . . . 580
Chang, R.W. [310] . . . 447
Chang, R.W. [492] . . . 528
Charbonnier, A. [447] . . . 502
Chaudhury, P. [481] . . . 527
Cheetham, B.M.G. [121] . . . 121
Cheetham, B.M.G. [125] . . . 125
Chen, J-H. [275] . . . 356
Chen, J-H. [272] . . . 352
Chen, J-H. [274] . . . 352, 354
Chen, J-H. [108] . . . 102, 247, 352, 355, 366
Chen, J-H. [273] . . . 352, 378
Chen, J-H. [276] . . . 364, 378, 408, 440
Chen, J-H. [94] . . . 96, 101, 102, 104, 349–352, 355, 357, 363, 401
Chen, J-H. [174] . . . 201
Chen, J-H. [110] . . . 103, 264, 268, 284, 327, 607, 609
Chen, J.H. [141] . . . 138
Cheng, M.J. [555] . . . 594
Cherriman, P. [500] . . . 528
Cherriman, P.J. [514] . . . 542, 548
Cheung, J.C.S. [155] . . . 171, 173, 222
Chiariglione, L. [30] . . . 5, 489
Chiariglione, L. [31] . . . 5, 489
Chiariglione, L. [32] . . . 5, 489
Chow, P.S. [320] . . . 447
Choy, E. [244] . . . 320
Chu, C.C. [172] . . . 197
Chua, S. [266] . . . 336, 445
Chua, S. [307] . . . 447
Chuang, J. [488] . . . 528
Chuang, J.C.I. [498] . . . 528


Chui, C.K. [560] . . . 622, 623
Chui, C.K. [561] . . . 619, 622
Chung, J.Y. [518] . . . 547–549, 557, 558
Chung, J.Y. [516] . . . 543, 549, 552, 553, 557, 558
Cimini, J. [493] . . . 528
Cimini, L.J. [311] . . . 447
Cioffi, J.M. [320] . . . 447
Classen, F. [315] . . . 447
Classen, F. [316] . . . 447
Cocke, J. [221] . . . 289, 449, 683
Cocke, J. [503] . . . 529
Cohen, A. [564] . . . 619
Combescure, P. [143] . . . 138, 456, 457
Combescure, P. [336] . . . 463
Combescure, P. [170] . . . 189
Contin, L. [45] . . . 5, 490
Costello, D.J. [509] . . . 547
Cox, R.V. [275] . . . 356
Cox, R.V. [272] . . . 352
Cox, R.V. [136] . . . 134
Cox, R.V. [94] . . . 96, 101, 102, 104, 349–352, 355, 357, 363, 401
Cox, R.V. [174] . . . 201
Cox, R.V. [2] . . . 345
Cox, R.V. [185] . . . 209, 214–216
Cox, R.V. [1] . . . 345
Cox, R.V. [519] . . . 548
Crochiere, R.E. [284] . . . 419, 561, 687
Crochiere, R.E. [285] . . . 419
Crochiere, R.E. [429] . . . 494
Crochiere, R.E. [430] . . . 494
Cuperman, V. [52] . . . 561
Cuperman, V. [596] . . . 723, 724
Cuperman, V. [102] . . . 101
Cuperman, V. [238] . . . 320

D
D'Agnoli, S.L.Q. [282] . . . 404
Das, A. [232] . . . 320
Das, A. [244] . . . 320
Daubechies, I. [565] . . . 619, 620
Davidson, G. [40] . . . 5, 6, 490, 491, 498, 499, 502, 507–509, 557
Davidson, G. [412] . . . 491, 501
Davidson, G. [133] . . . 131, 134, 320
Davis, M. [412] . . . 491, 501
De Jaco, A. [244] . . . 320
De Jaco, A. [235] . . . 320, 321
de La Noue, P. [532] . . . 567, 570, 571
De Marca, J.R.B. [277] . . . 364, 409
De Marca, J.R.B. [282] . . . 404
Deep Sen, [540] . . . 573
Degroat, R.D. [180] . . . 204
Deller, J.R. [19] . . . 562
Delprat, M. [168] . . . 181, 188, 298, 303, 304, 307, 441
Deprettere, E.F. [478] . . . 521, 523, 524
Deprettere, E.F. [11] . . . 157, 158, 160, 162, 564
Deprettere, E.F. [480] . . . 524
Dery, S. [529] . . . 566–568, 723
Di Benedetto, M.G. [319] . . . 447
Dietz, M. [40] . . . 5, 6, 490, 491, 498, 499, 502, 507–509, 557
Dietz, M. [455] . . . 508
Dite, W. [65] . . . 32
Doward, S. [414] . . . 493
Dowling, E.M. [180] . . . 204

E
Ebrahimi, T. [508] . . . 541, 542
Edler, B. [45] . . . 5, 490
Edler, B. [468] . . . 516
Edler, B. [444] . . . 498, 501
Edler, B. [49] . . . 6, 491, 517
Edler, B. [440] . . . 495
Ekudden, E. [231] . . . 319, 323
Ekudden, E. [29] . . . 4
El-Jaroudi, A. [552] . . . 594, 621
Erdmann, C. [337] . . . 464, 528
Erfanian, J.A. [581] . . . 683
Eriksson, T. [148] . . . 149
Eriksson, T. [149] . . . 149, 150
Erzin, E. [599] . . . 723
Esteban, D. [286] . . . 421, 422, 424, 425, 692
Esteban, D. [428] . . . 494, 495
Evans, B.G. [187] . . . 210, 211
Evans, B.G. [161] . . . 176, 437, 438, 440, 445, 446, 487
Evans, B.G. [130] . . . 129
Evans, B.G. [188] . . . 212
Evans, B.G. [293] . . . 439
Evans, B.G. [597] . . . 723
Evans, B.G. [585] . . . 693, 758

F
Faili, M. [409] . . . 529, 531, 537
Failli, M. [331] . . . 449, 681, 683, 684, 714, 715
Failli, M. [269] . . . 336
Farvardin, N. [137] . . . 135
Farvardin, N. [115] . . . 117, 134


Fastl, H. [424] . . . 494, 503, 516
Faudeil, S. [583] . . . 683, 714
Fazel, K. [321] . . . 447
Fazel, K. [312] . . . 447
Fazel, K. [494] . . . 528
Fazel, P.R.K. [495] . . . 528
Ferreira, A. [448] . . . 502
Fettweis, G. [312] . . . 447
Fettweis, G. [494] . . . 528
Fielder, L. [40] . . . 5, 6, 490, 491, 498, 499, 502, 507–509, 557
Fielder, L. [412] . . . 491, 501
Fingscheidt, T. [523] . . . 557
Fischer, K. [143] . . . 138, 456, 457
Fischer, K. [336] . . . 463
Fischer, K. [337] . . . 464, 528
Fischer, K.A. [568] . . . 621, 627, 631
Flanagan, J.L. [284] . . . 419, 561, 687
Flanagan, J.L. [429] . . . 494
Flanagan, J.L. [587] . . . 719
Flannery, B.P. [177] . . . 203–206, 404, 605
Fletcher, H. [421] . . . 493
Fortune, P.M. [184] . . . 209, 216, 217, 220
Foschini, G. Jr [482] . . . 527
Fransen, L.J. [118] . . . 118–121, 124, 577
Fratti, M. [176] . . . 202, 207
Frullone, M. [193] . . . 230
Fuchs, H. [40] . . . 5, 6, 490, 491, 498, 499, 502, 507–509, 557
Fudseth, A. [138] . . . 137, 138
Furui, S. [22] . . . 16

G
Galand, C. [286] . . . 421, 422, 424, 425, 692
Galand, C. [428] . . . 494, 495
Galand, C.R. [289] . . . 427
Gans, M. [482] . . . 527
Gardner, W. [235] . . . 320, 321
Gardner, W. [206] . . . 253, 255, 261
Gardner, W. [48] . . . 6, 491, 528
Geher, K. [122] . . . 123
George, E.B. [590] . . . 720, 721, 723, 732–734, 754, 758
George, E.B. [591] . . . 720, 721, 733, 754, 758
George, E.B. [537] . . . 571
Gerrits, A.J. [476] . . . 521
Gersho, A. [274] . . . 352, 354
Gersho, A. [52] . . . 561
Gersho, A. [108] . . . 102, 247, 352, 355, 366
Gersho, A. [273] . . . 352, 378
Gersho, A. [110] . . . 103, 264, 268, 284, 327, 607, 609
Gersho, A. [232] . . . 320
Gersho, A. [599] . . . 723
Gersho, A. [236] . . . 320
Gersho, A. [142] . . . 138
Gersho, A. [126] . . . 129, 142, 143, 364, 738–740
Gersho, A. [239] . . . 320, 321
Gersho, A. [101] . . . 101
Gersho, A. [133] . . . 131, 134, 320
Gersho, A. [278] . . . 364
Gerson, I.A. [204] . . . 249, 251, 271, 272
Gerson, I.A. [202] . . . 247, 249, 251, 269
Gerson, I.A. [164] . . . 181, 187
Gerson, I.A. [203] . . . 247, 249, 251, 269
Gerson, I.A. [211] . . . 269, 272
Ghiselli-Crippa, T. [552] . . . 594, 621
Ghitza, O. [601] . . . 744, 745
Gish, H. [91] . . . 78, 129, 132
Glavieux, A. [216] . . . 289, 447, 449, 680, 681, 713
Glavieux, A. [502] . . . 528, 529, 534
Glavieux, A. [217] . . . 289, 447, 449, 680, 681, 713
Glisson, T.H. [68] . . . 38
Golden, R.M. [587] . . . 719
Goldsmith, A.J. [300] . . . 445
Goldsmith, A.J. [266] . . . 336, 445
Goldsmith, A.J. [307] . . . 447
Goldsmith, A.J. [301] . . . 445
Goldsmith, A.J. [302] . . . 445
Golub, G.H. [178] . . . 203, 204
Goodman, D.J. [192] . . . 221, 226, 229, 235
Gordos, G. [15] . . . 105, 107, 110, 111
Gray, A.H. [86] . . . 67, 134
Gray, A.H. Jr [5] . . . 16, 87
Gray, R. [233] . . . 320
Gray, R.M. [280] . . . 378, 440
Gray, R.M. [126] . . . 129, 142, 143, 364, 738–740
Grazioso, P. [193] . . . 230
Greenwood, D.D. [422] . . . 493
Griffin, D.W. [103] . . . 101, 566, 567, 569, 570, 687
Grill, B. [507] . . . 541, 542
Gulak, G. [581] . . . 683
Guyader, A.L. [336] . . . 463

H
Haagen, J. [533] . . . 567, 573, 653
Haavisto, P. [225] . . . 298, 302, 303, 307


Haavisto, P. [228] . . . 304, 307
Hagen, R. [540] . . . 573
Hagenauer, J. [582] . . . 683, 714
Hagenauer, J. [58] . . . 15
Hagenauer, J. [27] . . . 683
Hagenauer, J. [218] . . . 289
Hagenauer, J. [519] . . . 548
Hall, J.L. [543] . . . 578
Hanauer, S.L. [526] . . . 562, 563, 687
Hankanen, T. [225] . . . 298, 302, 303, 307
Hansen, J.H.L. [19] . . . 562
Harborg, H. [138] . . . 137, 138
Harri Holma, [491] . . . 528, 529, 557
Hashimoto, S. [173] . . . 200, 201
Hassanein, H. [529] . . . 566–568, 723
Hassanein, H. [102] . . . 101
Hayashi, S. [212] . . . 274
Haykin, S. [72] . . . 45, 46
Hübner, J. [497] . . . 528
Heddle, R. [413] . . . 493
Hellman, R. [425] . . . 494
Hellwig, K. [231] . . . 319, 323
Hellwig, K. [29] . . . 4
Hellwig, K. [257] . . . 323, 328
Herre, J. [40] . . . 5, 6, 490, 491, 498, 499, 502, 507–509, 557
Herre, J. [451] . . . 505
Herre, J. [452] . . . 507, 508
Herre, J. [507] . . . 541, 542
Herre, J. [433] . . . 494, 506
Hess, W. [14] . . . 593
Hikmet Sari, I.J. [496] . . . 528
Hiotakakos, D.J. [541] . . . 574, 575, 641, 646, 648, 652, 653, 659, 660, 684, 758, 807
Ho, P. [238] . . . 320
Hoeher, P. [582] . . . 683, 714
Hoeher, P. [578] . . . 681, 683
Hoffmann, R. [13] . . . 162
Holmes, J.N. [539] . . . 572, 610, 612
Holmes, W.H. [95] . . . 96
Holtzwarth, H. [64] . . . 32
Honda, M. [85] . . . 67, 210, 578
Honda, M. [464] . . . 514, 515
Hong, C. [270] . . . 351, 371, 382
Honkanen, T. [228] . . . 304, 307
Huang, J.J.Y. [134] . . . 131
Huber, J.B. [314] . . . 447
Hubner, J. [323] . . . 447
Huffman, D.A. [456] . . . 510
Huges, P.M. [125] . . . 125
Huggins, A.W.F. [536] . . . 571, 743
Hwang, W.L. [572] . . . 622

I
Iijima, K. [473] . . . 517
Ikeda, K. [458] . . . 513, 514, 516
Ikeda, K. [459] . . . 513
Ikeda, K. [210] . . . 261, 565
Ikeda, K. [460] . . . 513
Ikeda, K. [463] . . . 513, 515, 557
Ikedo, J. [210] . . . 261, 565
Ireton, M.A. [165] . . . 181
Ireton, M.A. [167] . . . 181, 441
ISO/IEC JTC1/SC29/WG11 N2503, [506] . . . 540, 549, 551
Itakura, F. [111] . . . 105
Itakura, F. [112] . . . 105
Itakura, F. [144] . . . 139, 324, 577
Itakura, F. [113] . . . 105
Itakura, F. [123] . . . 125
Itakura, F. [173] . . . 200, 201
Itoh, K. [113] . . . 105
Itoh, K. [85] . . . 67, 210, 578
Itoh, K. [87] . . . 67
Iwakami, N. [458] . . . 513, 514, 516
Iwakami, N. [459] . . . 513
Iwakami, N. [450] . . . 502, 513
Iwakami, N. [462] . . . 513
Iwakami, N. [460] . . . 513
Iwakami, N. [463] . . . 513, 515, 557

J
Jacobs, P. [235] . . . 320, 321
Jacobs, P. [206] . . . 253, 255, 261
Jafarkhani, H. [485] . . . 528
Jafarkhani, H. [484] . . . 528
Jain, A.K. [69] . . . 38, 53–55, 131, 132
Jarvinen, K. [520] . . . 548, 549
Jarvinen, K. [225] . . . 298, 302, 303, 307
Jarvinen, K. [228] . . . 304, 307
Jasiuk, M.A. [202] . . . 247, 249, 251, 269
Jasiuk, M.A. [164] . . . 181, 187
Jasiuk, M.A. [203] . . . 247, 249, 251, 269
Jasiuk, M.A. [211] . . . 269, 272
Jayant, N. [94] . . . 96, 101, 102, 104, 349–352, 355, 357, 363, 401
Jayant, N.S. [277] . . . 364, 409
Jayant, N.S. [106] . . . 102, 352, 366
Jayant, N.S. [107] . . . 102, 352, 366


Jayant, N.S. [272] . . . 352
Jayant, N.S. [78] . . . 50, 51, 53
Jayant, N.S. [417] . . . 493
Jayant, N.S. [431] . . . 494, 508, 510
Jayant, N.S. [10] . . . 20, 23, 29, 30, 32, 36, 38, 39, 55, 56, 199, 435, 561, 605
Jeanclaude, I. [322] . . . 447
Jelinek, F. [221] . . . 289, 449, 683
Jelinek, F. [503] . . . 529
Jelinek, M. [520] . . . 548, 549
Jennings, A. [75] . . . 45, 131, 132
Jin, A. [458] . . . 513, 514, 516
Jin, A. [463] . . . 513, 515, 557
Johansen, F. [138] . . . 137, 138
Johnston, J. [451] . . . 505
Johnston, J. [433] . . . 494, 506
Johnston, J. [417] . . . 493
Johnston, J. [414] . . . 493
Johnston, J. [453] . . . 507, 508
Johnston, J. [448] . . . 502
Johnston, J. [427] . . . 494, 495
Johnston, J.D. [287] . . . 421, 427
Johnston, J.D. [291] . . . 433
Jones, A.E. [318] . . . 447
Juang, B-H. [117] . . . 117, 120–122, 124, 135, 139, 209
Juang, J. [244] . . . 320
Jung, P. [219] . . . 289, 290

K
Kabal, P. [119] . . . 121, 123, 124
Kabal, P. [172] . . . 197
Kabal, P. [295] . . . 445
Kadambe, S. [554] . . . 594, 621, 627, 631
Kai-Bor Yu, [179] . . . 203
Kaiser, S. [321] . . . 447
Kaiser, S. [495] . . . 528
Kaleh, G.K. [264] . . . 335, 336
Kalet, I. [324] . . . 447
Kamio, Y. [297] . . . 445
Kamio, Y. [305] . . . 446
Kamio, Y. [299] . . . 445
Kaneko, T. [462] . . . 513
Kang, G.S. [118] . . . 118–121, 124, 577
Kapanen, P. [225] . . . 298, 302, 303, 307
Karam, G. [496] . . . 528
Karam, G. [322] . . . 447
Kataoka, A. [212] . . . 274
Kataoka, A. [170] . . . 189
Kawashima, T. [239] . . . 320, 321
Keller, T. [214] . . . 288, 292–294, 447
Keller, T. [308] . . . 447, 450
Keller, T. [51] . . . 7, 491, 528, 529, 534
Keller, T. [327] . . . 447
Keller, T. [499] . . . 528
Keller, T. [328] . . . 447, 451
Keller, T. [248] . . . 321, 322, 335, 336
Keller, T. [255] . . . 322
Keller, T. [159] . . . 171, 221, 222, 235, 288, 410, 445
Keller, T. [512] . . . 541, 542, 547, 549, 552
Keller, T. [254] . . . 322
Ketchum, R.H. [199] . . . 245
Ketchum, R.H. [281] . . . 395
Kim, Y. [449] . . . 502, 511
Kirchherr, R. [143] . . . 138, 456, 457
Kirchherr, R. [336] . . . 463
Kitawaki, N. [113] . . . 105
Kitawaki, N. [85] . . . 67, 210, 578
Kitawaki, N. [87] . . . 67
Kitawaki, N. [462] . . . 513
Kleider, J.E. [243] . . . 320
Kleider, J.E. [241] . . . 320
Kleijn, W.B. [185] . . . 209, 214–216
Kleijn, W.B. [199] . . . 245
Kleijn, W.B. [281] . . . 395
Kleijn, W.B. [189] . . . 215
Kleijn, W.B. [105] . . . 102, 573, 646, 758
Kleijn, W.B. [533] . . . 567, 573, 653
Kleijn, W.B. [540] . . . 573
Kleijn, W.B. [335] . . . 463, 507
Kleijn, W.B. [56] . . . 323, 324, 561
Klein, A. [586] . . . 713, 714
Klein, A. [264] . . . 335, 336
Knudson, J. [138] . . . 137, 138
Koch, W. [580] . . . 683
Koenen, R. [504] . . . 540
Koenen, R. [505] . . . 540
Koenen, R. [42] . . . 5, 490, 528
Koh, S-N [598] . . . 723
Kohler, M.A. [527] . . . 566
Kohler, M.A. [546] . . . 580
Komaki, S. [249] . . . 321, 336
Kondoz, A.M. [187] . . . 210, 211
Kondoz, A.M. [161] . . . 176, 437, 438, 440, 445, 446, 487
Kondoz, A.M. [55] . . . 143, 149, 176, 561
Kondoz, A.M. [130] . . . 129
Kondoz, A.M. [188] . . . 212
Kondoz, A.M. [293] . . . 439

Page 206: Voice and Audio Compression for Wireless Communicationsfiles.wireless.ecs.soton.ac.uk/newcomms/files/u1/Voice...VOICE-BOOK-2E-SAMPLE-CHAPS 2007/8/20 page 1 Voice and Audio Compression

VOICE-BOOK-2E-SAMPLE-CHAPS2007/8/20page 894

894 AUTHOR INDEX

Kondoz, A.M. [597] . . . . . . . . . . . . . . . . . . . . 723Kondoz, A.M. [585] . . . . . . . . . . . . . . . 693, 758Koora, K. [497] . . . . . . . . . . . . . . . . . . . . . . . . 528Koora, K. [323] . . . . . . . . . . . . . . . . . . . . . . . . 447Koornwinder, T.H. [559] . . . . . . . 619, 622, 623Kovac, J. [564] . . . . . . . . . . . . . . . . . . . . . . . . .619Kovac, J. [573] . . . . . . . . . . . . . . . . . . . . . . . . .622Kraisinsky, D.J. [199] . . . . . . . . . . . . . . . . . . 245Krasinski, D.J. [281] . . . . . . . . . . . . . . . . . . . 395Kreamer, E.W. [550] . . . . . . . . . . . . . . . . . . . 581Kreamer, E.W. [548] . . . . . . . . . . . . . . . . . . . 581Krishna, H. [261] . . . . . . . . . . . . . . . . . . 332, 333Krishna, H. [262] . . . . . . . . . . . . . . . . . . 332, 333Kroon, P. [136] . . . . . . . . . . . . . . . . . . . . . . . . 134Kroon, P. [185] . . . . . . . . . . . . . . . 209, 214–216Kroon, P. [1] . . . . . . . . . . . . . . . . . . . . . . . . . . .345Kroon, P. [478] . . . . . . . . . . . . . . . 521, 523, 524Kroon, P. [170] . . . . . . . . . . . . . . . . . . . . . . . . 189Kroon, P. [11] . . . . . . . 157, 158, 160, 162, 564Kroon, P. [234] . . . . . . . . . . . . . . . . . . . . . . . . 320Kroon, P. [480] . . . . . . . . . . . . . . . . . . . . . . . . 524Kuan, E.L. [577] . . . . . . . . . . . . . . . . . . . . . . . 673Kuan, E.L. [309] . . . . . . . . . . . . . . . . . . 447, 714Kuan, E.L. [253] . . . . . . . . . 322, 323, 328, 336Kuan, E.L. [252] . . . . . . . . . . . . . . . . . . . . . . . 322Kull, B. [497] . . . . . . . . . . . . . . . . . . . . . . . . . . 528Kull, B. [323] . . . . . . . . . . . . . . . . . . . . . . . . . . 447Kumar, A. [599] . . . . . . . . . . . . . . . . . . . . . . . 723Kunz, O. [44] . . . . . . . . . . . . 5, 6, 490, 491, 517Kwan Truong, [537] . . . . . . . . . . . . . . . . . . . . 571

L
Laflamme, C. [191] 221
Laflamme, C. [162] 176, 282, 398, 440–444, 446
Laflamme, C. [479] 524
Laflamme, C. [139] 138
Laflamme, C. [225] 298, 302, 303, 307
Laflamme, C. [228] 304, 307
Laflamme, C. [163] 176, 442, 444–446, 487
Laflamme, C. [213] 274, 386
Laflamme, C. [160] 175, 189, 274, 307, 323, 350, 386, 394
Laflamme, C. [224] 295, 297
Laflamme, C. [223] 295, 297
Laflamme, C. [294] 443, 446
Lamblin, C. [143] 138, 456, 457
Lamblin, C. [336] 463
Lamblin, C. [166] 181
Lamblin, C. [337] 464, 528

Laroia, R. [137] 135
Lau, V.K.N. [306] 446
Laurent, P.A. [532] 567, 570, 571
Law, H.B. [171] 191
Le Guyader, A. [143] 138, 456, 457
Leach, B. [528] 566, 569, 570
LeBlanc, W.P. [240] 320
Lederer, D. [452] 507, 508
Lee, C. [235] 320, 321
Lee, C. [206] 253, 255, 261
Lee, K.Y. [130] 129
Lee, W.C.Y. [195] 238
Lefebvre, R. [520] 548, 549
Lefebvre, R. [139] 138
Lefebvre, R. [358] 480
Lepschy, A. [124] 125
Levine, S. [469] 516, 517
Levine, S. [471] 517
Levinson, S. [145] 139
Li, Y. [488] 528
Li, Y. [325] 447
Li, Y.G. [498] 528
Lieberman, P. [20] 562
Liew, T.H. [303] 446
Liew, T.H. [511] 541–543, 547, 549, 552, 553
Liew, T.H. [256] 322, 333
Liew, T.H. [304] 446
Liew, T.H. [251] 322, 446
Lim, J.S. [103] 101, 566, 567, 569, 570, 687
Lin, K-Y. [261] 332, 333
Lin, S. [509] 547
Lin, Y.-C. [275] 356
Lin, Y.C. [94] 96, 101, 102, 104, 349–352, 355, 357, 363, 401
Linde, Y. [280] 378, 440
Linden, J. [148] 149
Linden, J. [149] 149, 150
Lloyd, S.P. [60] 29, 36
Lloyd, S.P. [61] 29, 36
LoCicero, J.L. [574] 632
LoCicero, J.L. [542] 574, 641, 647, 673, 758, 807
Lokhoff, G.C.P. [415] 493
Lombardo, A. [246] 321
Lupini, P. [596] 723, 724
Lupini, P. [102] 101

M
Mabilleau, P. [168] 181, 188, 298, 303, 304, 307, 441


Mabilleau, P. [162] 176, 282, 398, 440–444, 446
Mabilleau, P. [479] 524
Macleod, M.D. [306] 446
Mahieux, Y. [447] 502
Mahmoud, S.A. [240] 320
Maitre, X. [283] 415, 429
Makhoul, J. [77] 46, 47
Makhoul, J. [76] 46, 105
Makhoul, J. [536] 571, 743
Makhoul, J. [91] 78, 129, 132
Makhoul, J. [114] 113
Mallat, S. [566] 619, 622, 626
Mallat, S. [569] 621, 622, 624, 626, 758, 804, 805
Mallat, S. [572] 622
Malvar, H.S. [333] 457, 459
Malvar, H.S. [437] 495, 496
Malvar, H.S. [438] 495
Malvar, H.S. [446] 501
Mandarini, P. [319] 447
Manjunath, S. [244] 320
Mano, K. [210] 261, 565
Markel, J.D. [86] 67, 134
Markel, J.D. [5] 16, 87
Marques, J.S. [198] 244
Massaloux, D. [143] 138, 456, 457
Massaloux, D. [336] 463
Massaloux, D. [166] 181
Massaloux, D. [160] 175, 189, 274, 307, 323, 350, 386, 394
Massaloux, D. [337] 464, 528
Matsumoto, J. [104] 102
Matsumoto, J. [595] 723
Matsumoto, J. [472] 517
Matsumoto, J. [473] 517
Matsuoka, H. [305] 446
Max, J. [62] 29, 36, 38
May, T. [313] 447
McAulay, R.J. [588] 719–722, 754
McAulay, R.J. [592] 720
McAulay, R.J. [534] 568, 569, 723, 724, 728
McAulay, R.J. [594] 723, 725, 743, 744
McAulay, R.J. [530] 566, 568, 569, 723, 724
McAulay, R.J. [593] 720, 723, 743, 744, 754
McAulay, R.J. [242] 320
McCree, A. [537] 571
McCree, A.V. [538] 571
McCree, A.V. [531] 567, 571, 572, 610–614, 663, 695, 696, 759
McGonegal, C.A. [584] 693
McGonegal, C.A. [557] 610, 612
Meares, D. [45] 5, 490
Meine, N. [467] 516
Melchner, M.J. [94] 96, 101, 102, 104, 349–352, 355, 357, 363, 401
Meng, T.H.Y. [470] 516, 517
Mermelstein, P. [150] 149
Meyr, H. [315] 447
Meyr, H. [316] 447
McGonegal, C.A. [555] 594
Mian, G.A. [124] 125
Miani, G.A. [176] 202, 207
Miki, S. [458] 513, 514, 516
Miki, S. [450] 502, 513
Miki, S. [210] 261, 565
Miki, S. [460] 513
Miki, S. [463] 513, 515, 557
Miki, T. [209] 261
Miki, T. [208] 261
Mikkola, H. [520] 548, 549
Moncet, J.L. [172] 197
Morinaga, N. [249] 321, 336
Morinaga, N. [297] 445
Morinaga, N. [305] 446
Morinaga, N. [298] 445
Morinaga, N. [299] 445
Morissette, S. [168] 181, 188, 298, 303, 304, 307, 441
Morissette, S. [191] 221
Morissette, S. [162] 176, 282, 398, 440–444, 446
Morissette, S. [479] 524
Morissette, S. [166] 181
Moriya, T. [458] 513, 514, 516
Moriya, T. [459] 513
Moriya, T. [450] 502, 513
Moriya, T. [212] 274
Moriya, T. [462] 513
Moriya, T. [210] 261, 565
Moriya, T. [460] 513
Moriya, T. [463] 513, 515, 557
Moriya, T. [464] 514, 515
Moriya, T. [461] 513, 515
Muenster, M. [499] 528
Muller, J-M. [211] 269, 272
Muller, S.H. [314] 447
Munster, M. [255] 322
Murashima, A. [338] 464


N
Nagabucki, H. [87] 67
Naguib, A. [487] 528
Naguib, A. [490] 528
Naguib, A. [489] 528, 534
Naijoh, M. [299] 445
Nanda, S. [192] 221, 226, 229, 235
Nasshan, M. [219] 289, 290
Natvig, J.E. [151] 154, 162
Ng, S.X. [518] 547–549, 557, 558
Ng, S.X. [522] 556
Ng, S.X. [516] 543, 549, 552, 553, 557, 558
Ng, S.X. [512] 541, 542, 547, 549, 552
Niranjan, M. [182] 207
Nishiguchi, M. [104] 102
Nishiguchi, M. [595] 723
Nishiguchi, M. [472] 517
Nishiguchi, M. [473] 517
Noll, A.M. [553] 594
Noll, P. [431] 494, 508, 510
Noll, P. [10] 20, 23, 29, 30, 32, 36, 38, 39, 55, 56, 199, 435, 561, 605
Noll, P. [67] 38
Noll, P. [88] 76
Noll, P. [416] 493
Noll, P. [37] 5, 6, 489, 491, 496
Noll, P. [432] 494
Nomura, T. [477] 521, 523, 524
Nowack, J.M. [211] 269, 272
Nussbaumer, H.J. [289] 427
Nussbaumer, H.J. [288] 427
Nussbaumer, H.J. [441] 495, 502

O
O'Neal, J. [90] 78
O'Shaughnessy, D. [17] 16, 562
Ochsner, H. [81] 60
Offer, E. [218] 289
Ohmuro, H. [210] 261, 565
Ohya, T. [209] 261
Ohya, T. [208] 261
Oikawa, Y. [40] 5, 6, 490, 491, 498, 499, 502, 507–509, 557
Ojanpare, T. [215] 288, 290
Omologo, M. [120] 121
Ong, L.K. [188] 212
Ordentlich, E. [292] 437
Ozawa, K. [338] 464
Ozawa, K. [477] 521, 523, 524

P
Paez, M.D. [68] 38
Painter, T. [420] 493, 495, 497, 499, 501, 503
Paksoy, E. [232] 320
Paksoy, E. [236] 320
Palazzo, S. [246] 321
Paliwal, K.K. [335] 463, 507
Paliwal, K.K. [56] 323, 324, 561
Paliwal, K.K. [116] 117, 129, 134, 135, 139, 209, 325, 589
Pan, D. [34] 5, 6, 489, 491, 502, 503, 505
Panter, P.F. [65] 32
Papanastasiou, C. [600] 724
Papke, L. [218] 289
Park, S. [449] 502, 511
Parks, T. [594] 723, 725, 743, 744
Pasupathy, S. [581] 683
Pattison, R.J. [243] 320
Paul, D. [584] 693
Paulus, J.W. [140] 138, 456
Pereira, F. [508] 541, 542
Petit, J. [447] 502
Phamdo, N. [137] 135
Picone, J.W. [574] 632
Picone, J.W. [542] 574, 641, 647, 673, 758, 807
Pietrobon, S.S. [220] 289
Pirhonen, R. [586] 713, 714
Press, W.H. [177] 203–206, 404, 605
Princen, J.P. [442] 496, 501
Proakis, J.G. [19] 562
Proakis, J.G. [332] 451
Purnhagen, H. [468] 516
Purnhagen, H. [465] 516
Purnhagen, H. [467] 516
Purnhagen, H. [466] 516

Q
Quackenbush, S. [40] 5, 6, 490, 491, 498, 499, 502, 507–509, 557
Quackenbush, S. [414] 493
Quackenbush, S.R. [290] 433, 434, 446
Quackenbush, S.R. [43] 5, 6, 490, 491
Quatieri, T.F. [588] 719–722, 754
Quatieri, T.F. [592] 720
Quatieri, T.F. [594] 723, 725, 743, 744
Quatieri, T.F. [530] 566, 568, 569, 723, 724
Quatieri, T.F. [593] 720, 723, 743, 744, 754
Quatieri, T.F. [242] 320
Quinquis, C. [143] 138, 456, 457


Quinquis, C. [337] 464, 528
Quinquis, C. [336] 463

R
Rabiner, L. [145] 139
Rabiner, L.R. [551] 594, 621
Rabiner, L.R. [584] 693
Rabiner, L.R. [555] 594
Rabiner, L.R. [6] 45–47, 50, 87, 105, 111, 113, 575–577
Rabiner, L.R. [557] 610, 612
Ragot, S. [358] 480
Ramachandran, R.P. [119] 121, 123, 124
Ramachandran, R.P. [128] 129, 134–136
Ramamoorthy, V. [106] 102, 352, 366
Ramamoorthy, V. [107] 102, 352, 366
Rao, K.R. [334] 459
Raviv, J. [221] 289, 449, 683
Raviv, J. [503] 529
Remde, J.R. [9] 100, 155, 563
Riccardi, G. [176] 202, 207
Rioul, O. [562] 619–621
Riva, G. [193] 230
Robertson, P. [321] 447
Robertson, P. [579] 682
Robertson, P. [578] 681, 683
Rohling, H. [313] 447
Rosenberg, A.E. [555] 594
Rosenberg, A.E. [558] 610–612
Rosenberg, A.E. [557] 610, 612
Rothweiler, J.H. [439] 495, 501
Rotola-Pukkila, J. [520] 548, 549
Roucos, S. [91] 78, 129, 132
Roy, G. [295] 445
Ruf, M. [495] 528
Ruf, M.J. [321] 447

S
Sabin, M. [594] 723, 725, 743, 744
Saito, S. [111] 105
Saito, S. [112] 105
Salami, R. [520] 548, 549
Salami, R.A. [154] 168
Salami, R.A. [184] 209, 216, 217, 220
Salami, R.A. [162] 176, 282, 398, 440–444, 446
Salami, R.A. [479] 524
Salami, R.A. [139] 138
Salami, R.A. [225] 298, 302, 303, 307
Salami, R.A. [228] 304, 307
Salami, R.A. [132] 129, 131
Salami, R.A. [135] 134, 181
Salami, R.A. [163] 176, 442, 444–446, 487
Salami, R.A. [213] 274, 386
Salami, R.A. [160] 175, 189, 274, 307, 323, 350, 386, 394
Salami, R.A. [224] 295, 297
Salami, R.A. [223] 295, 297
Salami, R.A. [367] 507, 524
Salami, R.A. [70] 45, 94, 96–99, 156–158, 160, 162, 176, 179, 182, 210, 214
Salami, R.A. [294] 443, 446
Salami, R.A. [71] 45–47, 50, 96–99, 113, 155–160, 162, 176, 179, 181, 182, 191, 421, 442
Salami, R.A. [153] 168
Sambur, M.R. [557] 610, 612
Sampei, S. [249] 321, 336
Sampei, S. [297] 445
Sampei, S. [305] 446
Sampei, S. [298] 445
Sampei, S. [299] 445
Sanchez-Calle, V.E. [294] 443, 446
Sari, H. [322] 447
Sasaoka, H. [297] 445
Schafer, R.W. [6] 45–47, 50, 87, 105, 111, 113, 575–577
Scharf, B. [423] 493, 508
Scheirer, E. [48] 6, 491, 528
Scheirer, E.D. [47] 6, 491
Schembra, G. [246] 321
Schmidt-Nielsen, A. [549] 581
Schnitzler, J. [143] 138, 456, 457
Schnitzler, J. [336] 463
Schnitzler, J. [140] 138, 456
Schnitzler, J. [337] 464, 528
Schreiner, P. [45] 5, 490
Schroder, G. [568] 621, 627, 631
Schroeder, M.R. [92] 94
Schroeder, M.R. [543] 578
Schroeder, M.R. [16] 101, 175–177, 323, 370, 564
Schroeder, M.R. [365] 517
Schultheis, P.M. [134] 131
Schur, J. [152] 164
Schwartz, R. [536] 571, 743
Sen, D. [95] 96
Seo, Y. [449] 502, 511
Sereno, D. [237] 320
Serizawa, M. [338] 464


Serizawa, M. [477] 521, 523, 524
Seshadri, N. [487] 528
Seshadri, N. [519] 548
Seshadri, N. [490] 528
Seshadri, N. [489] 528, 534
Seshadri, N. [128] 129, 134–136
Seshadri, N. [513] 541, 542, 549, 552
Seshadri, N. [483] 528, 533
Seymour, R.A. [171] 191
Shannon, C.E. [57] 15
Sharma, V. [239] 320, 321
Shepherd, S.J. [317] 447
Shimoyoshi, O. [413] 493
Shinobu Ono [104] 102
Shlien, S. [36] 5, 6, 489, 491, 504
Shlien, S. [443] 497, 498
Shoham, Y. [292] 437
Shoham, Y. [127] 129
Shoham, Y. [200] 245
Singhal, S. [175] 202
Singhal, S. [96] 96
Singhal, S. [99] 100
Sinha, D. [414] 493
Sivaprakasapillai, P. [598] 723
Sjoberg, J. [257] 323, 328
Skoeld, J. [586] 713, 714
Skoglung, J. [148] 149
Skoglung, J. [149] 149, 150
Sluijter, R.J. [476] 521
Sluyter, R.J. [11] 157, 158, 160, 162, 564
Sluyter, R.J. [480] 524
Sluyter, R.J. [12] 162
Smith (Eds), M.T.J. [436] 495
Smith, B. [66] 32
Smith, J.O. [469] 516, 517
Smith, J.O. III [471] 517
Smith, M.J.T. [590] 720, 721, 723, 732–734, 754, 758
Smith, M.J.T. [591] 720, 721, 733, 754, 758
So, K.K.M. [129] 129
Sofranek, R. [417] 493
Soheili, R. [293] 439
Sollenberger, N. [488] 528
Sollenberger, N.R. [498] 528
Sollenberger, N.R. [325] 447
Somerville, F.C.A. [515] 542, 550
Sondhi, M. [145] 139
Sondhi, M.M. [128] 129, 134–136
Sonohara, M. [413] 493
Soong, F.K. [117] 117, 120–122, 124, 135, 139, 209
Spanias, A. [420] 493, 495, 497, 499, 501, 503
Sreenivas, T.V. [455] 508
Srinivasan, K. [236] 320
Steedman, R.A.J. [79] 60
Steele, R. [265] 336, 445
Steele, R. [158] 171, 221, 224, 324, 332, 410, 680–683
Steele, R. [575] 648
Steele, R. [154] 168
Steele, R. [184] 209, 216, 217, 220
Steele, R. [296] 445
Steele, R. [71] 45–47, 50, 96–99, 113, 155–160, 162, 176, 179, 181, 182, 191, 421, 442
Steele, R. [3] 561
Steele, R. [339] 464, 522, 528
Steele, R. [153] 168
Steele, R. [155] 171, 173, 222
Steele, R. [194] 235–240
Stefanov, J. [98] 100, 117, 154, 162, 187, 221, 223, 224, 228, 229, 231, 235
Stegmann, J. [143] 138, 456, 457
Stegmann, J. [336] 463
Stegmann, J. [337] 464, 528
Stegmann, J. [568] 621, 627, 631
Stoll, G. [35] 5, 6, 489, 491, 496
Stoll, G. [445] 498
Street, J. [514] 542, 548
Su, H.Y. [191] 221
Suda, H. [209] 261
Suda, H. [208] 261
Suen, A.N. [201] 247
Sugamura, N. [123] 125
Sugamura, N. [115] 117, 134
Sugiyama, A. [44] 5, 6, 490, 491, 517
Sukkar, R.A. [574] 632
Sukkar, R.A. [542] 574, 641, 647, 673, 758, 807
Sun, J-D. [261] 332, 333
Sun, J-D. [262] 332, 333
Sundberg, C-E.W. [519] 548
Suoranta, R. [586] 713, 714
Supplee, L.M. [547] 581
Supplee, L.M. [527] 566
Suzuki, H. [413] 493
Szabo, N.S. [259] 332, 333


TTakacs, GY. [15] . . . . . . . . . 105, 107, 110, 111Tanaka, R.I. [259] . . . . . . . . . . . . . . . . . 332, 333Taniguchi, T. [233] . . . . . . . . . . . . . . . . . . . . . 320Taori, R. [476] . . . . . . . . . . . . . . . . . . . . . . . . . 521Tardelli, J.D. [550] . . . . . . . . . . . . . . . . . . . . . 581Tardelli, J.D. [548] . . . . . . . . . . . . . . . . . . . . . 581Tarokh, V. [487] . . . . . . . . . . . . . . . . . . . . . . . 528Tarokh, V. [490] . . . . . . . . . . . . . . . . . . . . . . . 528Tarokh, V. [485] . . . . . . . . . . . . . . . . . . . . . . . 528Tarokh, V. [513] . . . . . . . . . . 541, 542, 549, 552Tarokh, V. [483] . . . . . . . . . . . . . . . . . . . 528, 533Tarokh, V. [484] . . . . . . . . . . . . . . . . . . . . . . . 528Taylor, F.J. [260] . . . . . . . . . . . . . . . . . . . . . . . 332Teague, K.A. [528] . . . . . . . . . . . . 566, 569, 570Teague, K.A. [535] . . . . . . . . . . . . . . . . 569, 570Temerinac, M. [440] . . . . . . . . . . . . . . . . . . . .495Teukolsky, S.A. [177] . . . . . 203–206, 404, 605Thitimajshima, P. [216] . . . 289, 447, 449, 680,

681, 713Thitimajshima, P. [502] . . . . . . . . 528, 529, 534Thorpe, T. [89] . . . . . . . . . . . . . . . . . . . . . . . . . . 77Timor, U. [192] . . . . . . . . . . 221, 226, 229, 235Tobias, J.V. [8] . . . . . . . . . . . . . . . . . . . . . . . . . 434Todd, C. [412]. . . . . . . . . . . . . . . . . . . . .491, 501Tohkura, Y. [173] . . . . . . . . . . . . . . . . . . 200, 201Torrance, J.M. [267] . . . . . . . . . . . . . . . . . . . . 336Trancoso, I.M. [198] . . . . . . . . . . . . . . . . . . . 244Tremain, T. [186] . . . . . . . . . 209, 210, 214, 242Tremain, T.E. [197] . . . . . . . . . . . 242, 245, 596Tremain, T.E. [527] . . . . . . . . . . . . . . . . . . . . 566Tremain, T.E. [196] . . . . . . . 242, 562, 564, 566Tremain, T.E. [546] . . . . . . . . . . . . . . . . . . . . 580Tribolet, J.M. [589] . . . . . . . . . . . . . . . . . . . . 719Tribolet, J.M. [198] . . . . . . . . . . . . . . . . . . . . 244Tribolet, J.M. [430] . . . . . . . . . . . . . . . . . . . . 494Tsutsui, K. [413] . . . . . . . . . . . . . . . . . . . . . . . 493Tzeng, F.F. [181] . . . . . . . . . . . . . . . . . . . . . . . 207

UUbale, A. [142] . . . . . . . . . . . . . . . . . . . . . . . . 138Unagami, S. [233] . . . . . . . . . . . . . . . . . . . . . .320Ungerb, G.1001[510] 541, 542, 547, 549, 552Unser, M. [570] . . . . . . . . . . . . . . . . . . . . . . . . 622

VVaidyanathan, P.P. [435] . . . . . . . . . . . . 495, 496Vaidyanathan, P.P. [434] . . . . . . . . . . . . . . . . 495Vainio, J. [520] . . . . . . . . . . . . . . . . . . . . 548, 549Vainio, J. [225] . . . . . . . . . . . 298, 302, 303, 307

Vainio, J. [228] . . . . . . . . . . . . . . . . . . . . 304, 307van der Waal, R.G. [454] . . . . . . . . . . . . . . . .508van Eetvelt, P.W.J. [317] . . . . . . . . . . . . . . . . 447Van Loan, C.F. [178] . . . . . . . . . . . . . . . 203, 204Varaiya, P.P. [302] . . . . . . . . . . . . . . . . . . . . . . 445Vary, P. [143] . . . . . . . . . . . . . . . . . 138, 456, 457Vary, P. [336] . . . . . . . . . . . . . . . . . . . . . . . . . . 463Vary, P. [337] . . . . . . . . . . . . . . . . . . . . . 464, 528Vary, P. [523] . . . . . . . . . . . . . . . . . . . . . . . . . . 557Vary, P. [12] . . . . . . . . . . . . . . . . . . . . . . . . . . . 162Vary, P. [13] . . . . . . . . . . . . . . . . . . . . . . . . . . . 162Veldhuis, R.N.J. [454] . . . . . . . . . . . . . . . . . . 508Vercoe, B. [48] . . . . . . . . . . . . . . . . . . 6, 491, 528Verma, T. [469] . . . . . . . . . . . . . . . . . . . 516, 517Verma, T.S. [470] . . . . . . . . . . . . . . . . . . 516, 517Vernon, S. [412] . . . . . . . . . . . . . . . . . . . 491, 501Vetterli, M. [562] . . . . . . . . . . . . . . . . . . 619–621Vetterli, M. [573] . . . . . . . . . . . . . . . . . . . . . . 622Vetterling, W.T. [177] . . . . . 203–206, 404, 605Viaro, U. [124] . . . . . . . . . . . . . . . . . . . . . . . . 125Villebrun, E. [578] . . . . . . . . . . . . . . . . . 681, 683Viswanathan, R. [536] . . . . . . . . . . . . . 571, 743Viswanathan, R. [114] . . . . . . . . . . . . . . . . . . 113Viswanathan, V. [537] . . . . . . . . . . . . . . . . . . 571Viterbi, A.J. [59] . . . . . . . . . . . . . . . . . . . . . . . . 15Voiers, W.D. [544] . . . . . . . . . . . . . . . . . . . . . 580Voiers, W.D. [545] . . . . . . . . . . . . . . . . . . . . . 580Vook, F.W. [326] . . . . . . . . . . . . . . . . . . . . . . . 447

W
Wakatsuki, R. [104] . . . . 102
Walls, B. [535] . . . . 569, 570
Wand, J.F. [201] . . . . 247
Wang, D. [141] . . . . 138
Wang, H-S. [271] . . . . 351
Wang, S. [101] . . . . 101
Wassell, I. [71] . . . . 45–47, 50, 96–99, 113, 155–160, 162, 176, 179, 181, 182, 191, 421, 442

Webb, W. [154] . . . . 168
Webb, W. [153] . . . . 168
Webb, W.T. [265] . . . . 336, 445
Webb, W.T. [51] . . . . 7, 491, 528, 529, 534
Webb, W.T. [248] . . . . 321, 322, 335, 336
Webb, W.T. [296] . . . . 445
Webb, W.T. [159] . . . . 171, 221, 222, 235, 288, 410, 445
Webb, W.T. [512] . . . . 541, 542, 547, 549, 552
Webb, W.T. [73] . . . . 45, 172, 228, 235, 288, 410
Webber, S.A. [284] . . . . 419, 561, 687



AUTHOR INDEX

Webber, S.A. [429] . . . . 494
Welch, V. [186] . . . . 209, 210, 214, 242
Welch, V.C. [197] . . . . 242, 245, 596
Wilkinson, T.A. [318] . . . . 447
Williams, J.E.B. [155] . . . . 171, 173, 222
Williams, J.E.B. [194] . . . . 235–240
Winter, E.H. [211] . . . . 269, 272
Wong, C.H. [250] . . . . 322
Wong, C.H. [268] . . . . 336, 446
Wong, C.H. [309] . . . . 447, 714
Wong, C.H. [253] . . . . 322, 323, 328, 336
Wong, C.H. [500] . . . . 528
Wong, C.H. [303] . . . . 446
Wong, C.H. [304] . . . . 446
Wong, K.H.H. [74] . . . . 45, 47
Wong, K.H.J. [71] . . . . 45–47, 50, 96–99, 113, 155–160, 162, 176, 179, 181, 182, 191, 421, 442

Wyatt-Millington, C.W. [317] . . . . 447

X
Xydeas, C.S. [541] . . . . 574, 575, 641, 646, 648, 652, 653, 659, 660, 684, 758, 807
Xydeas, C.S. [165] . . . . 181
Xydeas, C.S. [600] . . . . 724
Xydeas, C.S. [167] . . . . 181, 441
Xydeas, C.S. [129] . . . . 129

Y
Shoham, Y. [540] . . . . 573
Yang, H. [598] . . . . 723
Yang, L-L. [263] . . . . 333
Yang, L.L. [256] . . . . 322, 333
Yao, T.C. [201] . . . . 247
Yeap, B.L. [576] . . . . 673
Yeap, B.L. [511] . . . . 541–543, 547, 549, 552, 553
Yee, M.S. [251] . . . . 322, 446
Yeldner, S. [597] . . . . 723
Yeldner, S. [585] . . . . 693, 758
Yip, P. [334] . . . . 459
Yong, M. [133] . . . . 131, 134, 320
Yuen, E. [238] . . . . 320

Z
Zarrinkoub, H. [150] . . . . 149
Zeger, K.A. [278] . . . . 364
Zehavi, E. [517] . . . . 547
Zeiberg, S. [497] . . . . 528
Zeisberg, S. [323] . . . . 447
Zelinski, R. [67] . . . . 38
Zelinski, R. [88] . . . . 76
Zelinski, R. [432] . . . . 494
Zhang, J. [271] . . . . 351
Zhong, S. [569] . . . . 621, 622, 624, 626, 758, 804, 805
Zwicker, E. [418] . . . . 493
Zwicker, E. [424] . . . . 494, 503, 516

