Performance of LSF vector quantizers for VSELP coders in noisy channels


Sonia L. Q. Dall’Agnol, Abraham Alcaim, Jose Roberto B. de Marca
CETUC-PUC/Rio, 22453-900, Rio de Janeiro - RJ - Brasil

Abstract. Efficient quantization of synthesis filter coefficients for CELP (Code Excited Linear Prediction) coders is essential to achieve high quality speech at low rates. Three vector quantizers with good potential for use in low rate coders are studied. Each of them is implemented inside the structure of the VSELP (Vector Sum Excited Linear Prediction) coder, an important member of the class of CELP coders. These three vector quantizers are used to encode LSF (Line Spectral Frequency) parameters and are compared in terms of robustness to channel errors, complexity and quality of synthesized speech. The quality of the synthesized speech is evaluated using the objective measure of frequency weighted signal to noise ratio and subjective results obtained from listening tests. To improve the robustness to channel errors, the application of simulated annealing to assign binary indices to the output levels of the quantizer is also investigated. A split vector quantization scheme which employs interframe prediction is shown to be an attractive approach to encode the synthesis filter parameters. It provides a performance comparable to the IS-54 scheme and uses 10 fewer bits for each LSF frame.

1. INTRODUCTION

Code Excited Linear Prediction (CELP) [1] is a class of speech coders which presently represents the best strategy for high-quality digital transmission of speech at low rates. An important technique which falls into this class of coders, and will be used throughout this paper, is the Vector Sum Excited Linear Prediction (VSELP) [2]. The VSELP scheme was standardized in the North American Recommendation EIA/TIA-IS-54 for digital cellular telephony. In this Recommendation the codec operates at a total rate of 13 kbit/s, of which 5 kbit/s are assigned to error detection and correction. The next generation of speech coders to be used in this cellular system is required to operate at a final rate of 6.5 kbit/s (half-rate), be robust to the effects of channel errors and perform as well as the present standard.

To achieve high-quality speech at low rates, special attention has to be placed on the efficient quantization of spectral or Linear Predictive Coding (LPC) parameters. In order to reduce spectral distortion at low rates (less than 30 bit/frame), vector quantization (VQ) has been considered an attractive approach. In addition, the use of line spectral frequencies (LSF) for LPC parameter representation has proven to be a good alternative at low rates. Recently [3, 5], several LSF quantizers have been proposed and analysed in an isolated fashion, using as performance measure the spectral distortion between the quantized and the original unquantized filters. In this paper we examine several efficient LSF vector quantizers when inserted in a complete VSELP speech coder, operating in a noisy channel.

Section 2 of this paper describes the several quantization strategies. An analysis of these quantizers and a comparison with the one used in the North American Recommendation EIA/TIA-IS-54 for digital cellular telephony are presented in section 3. Average spectral distortion and percentage of outliers are used as performance criteria. Section 4 then provides a brief description of the VSELP coding structure. Complexity issues are discussed in section 5. A comparative performance analysis of the several quantizers in a VSELP scheme is presented in section 6 for ideal channels and in section 7 for noisy channels. Section 7 also provides performance results relating to the application of simulated annealing to assign binary codewords to index the output of the quantizer codebook. Concluding remarks are then presented in section 8.

ETT, Vol. 5, No. 5, September-October 1994


2. QUANTIZER DESCRIPTION


Three vector quantizers were considered in this work, representing three different quantizer design philosophies. None of these techniques employs a single quantizer. They use several reduced size codebooks in order to avoid the excessive complexity of a very large VQ.

Before quantization the set of short-term predictor coefficients is converted to the LSF representation, which has been shown to possess very good quantization and interpolation properties [6]. Throughout this paper it is assumed that the linear prediction filter is of order 10.

The vector quantizers were designed using the well known LBG algorithm [7] or a variation thereof. The training set consisted of 10,340 vectors of LSF coefficients, corresponding to 207 s of speech material spoken by 13 male and 12 female subjects. The quantizer family to be described in section 2.1 also requires the use of a bank of scalar quantizers. These one-dimensional quantizers were obtained from a smaller set of 1470 vectors derived from 29.4 s of speech, again using both male and female speakers.

The metric adopted for both codebook search and actual implementation of all the quantizers was the weighted Euclidean distortion measure defined by:

$$d_w(f, \hat{f}) = \sum_{j=1}^{10} w_j \,(f_j - \hat{f}_j)^2 \qquad (1)$$

where $\hat{f}$ is the quantized version of the LSF vector $f = (f_1, f_2, ..., f_{10})$. The weights ($w_j$, j = 1, 2, ..., 10) are defined to provide a better quantization of the LSF parameters in the formant region of the spectrum, especially around the lower frequency ones, which are more important for auditory perception. The mathematical definition of the weights was first proposed in [8] and is inspired by the property that whenever the speech spectrum has a peak there are at least two LSF's close together. The weights for frame i are given by:

[Equation (2) is not recoverable from the source.]

with $f_0 = 0$ and $f_{11} = \pi$. Note that the set of weights has to be computed for each LSF vector to be quantized.
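As a rough illustration of the distortion measure, the sketch below computes a weighted Euclidean distance between an LSF vector and its quantized version. The weight formula used here (inverse distance to the two neighbouring LSFs, with the boundary values 0 and π) is only an assumption in the spirit of the description above, not a transcription of equation (2); the function names are hypothetical.

```python
import math

def lsf_weights(f):
    """Assumed weights: large when an LSF is close to a neighbour (spectral peak)."""
    ext = [0.0] + list(f) + [math.pi]          # boundary values f_0 = 0 and f_11 = pi
    return [1.0 / (ext[j] - ext[j - 1]) + 1.0 / (ext[j + 1] - ext[j])
            for j in range(1, len(f) + 1)]

def weighted_distortion(f, f_hat, w=None):
    """Weighted Euclidean distortion: sum_j w_j * (f_j - f_hat_j)^2."""
    if w is None:
        w = lsf_weights(f)                     # weights depend on the input vector
    return sum(wj * (a - b) ** 2 for wj, a, b in zip(w, f, f_hat))

f = [0.25 * (k + 1) for k in range(10)]        # evenly spaced LSFs in (0, pi), radians
f_hat = [x + 0.01 for x in f]                  # a slightly perturbed "quantized" vector
d = weighted_distortion(f, f_hat)
```

Because the weights are recomputed from each input vector, a codebook search would evaluate `lsf_weights` once per frame and reuse the result for every candidate codevector.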

The principal features of the three classes of quantizers are as follows.

2.1. Vector-Scalar Quantizer (VSQ)

The concept of vector-scalar quantization was explored by Grass and Kabal in [3]. It consists of a two step quantization procedure. In the first stage a moderately complex vector quantizer is applied to the input LSF vector. The quantization error resulting from this first step is then processed by a bank of scalar quantizers. If the two stages are operated independently the performance of this technique is not very impressive. However, the reproduction can be significantly improved if, instead of selecting only the nearest codevector in the vector stage, a set of N best vectors is saved. All N first step quantization errors are then discretized by the scalar quantizers. The combination of VQ codevector and scalar output levels which produces the lowest distortion for the particular input vector is selected to represent it. More formally, let $c_i$, i = 1, ..., N, be the N codevectors closest to the input vector $f_{in}$. The input to the bank of scalar quantizers will be $e_i = f_{in} - c_i$, producing at the output the quantized version $\hat{e}_i = (\hat{e}_{i1}, ..., \hat{e}_{i10})$. The procedure selects as best output of the VQ the codevector $c_i$ which minimizes $d_w(f_{in}, c_i + \hat{e}_i)$, as illustrated in Fig. 1.

In order to allow for a fair comparison, all the LSF quantizers addressed in this paper operate at a rate of 28 bits per LSF vector. For the VSQ the total number of bits has to be apportioned between the two stages. The VQ portion was implemented with 8 bits, leaving 20 bits for the scalar part. Limited tests indicated that a good distribution of these 20 bits among the 10 scalar quantizers is (3, 3, 2, 2, 2, 2, 2, 2, 1, 1). The search breadth N was taken to be 5.
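The two-stage search with several retained candidates can be sketched as follows. This is a minimal illustration, not the authors' implementation: the uniform scalar quantizer, its step size, the random codebook and the flat weights are all assumptions made so that the example is self-contained; only the bit allocation and the breadth of 5 come from the text.

```python
import random

BITS = (3, 3, 2, 2, 2, 2, 2, 2, 1, 1)   # scalar bit allocation quoted in the text

def dist(x, y, w):
    """Weighted Euclidean distortion between two vectors."""
    return sum(wj * (a - b) ** 2 for wj, a, b in zip(w, x, y))

def scalar_quantize(e, bits, step=0.02):
    """Hypothetical uniform mid-rise quantizer with 2**bits levels."""
    half = 2 ** bits // 2
    idx = min(max(int(e // step), -half), half - 1)   # clamp to the available levels
    return (idx + 0.5) * step

def vsq_encode(f_in, codebook, w, n_best=5):
    """Return (distortion, VQ index, reconstruction) for the best candidate."""
    # Stage 1: keep the n_best codevectors closest to the input.
    candidates = sorted(range(len(codebook)),
                        key=lambda i: dist(f_in, codebook[i], w))[:n_best]
    best = None
    for i in candidates:
        c = codebook[i]
        # Stage 2: scalar-quantize each component of the stage-1 error.
        e_hat = [scalar_quantize(a - b, nb) for a, b, nb in zip(f_in, c, BITS)]
        recon = [b + e for b, e in zip(c, e_hat)]
        d = dist(f_in, recon, w)
        if best is None or d < best[0]:
            best = (d, i, recon)
    return best

random.seed(0)
codebook = [sorted(random.uniform(0.1, 3.0) for _ in range(10)) for _ in range(256)]
f_in = sorted(random.uniform(0.1, 3.0) for _ in range(10))
w = [1.0] * 10
d5, idx5, _ = vsq_encode(f_in, codebook, w, n_best=5)
d1, idx1, _ = vsq_encode(f_in, codebook, w, n_best=1)
```

Since the five retained candidates always include the nearest codevector, the joint search can never do worse than operating the two stages independently (`d5 <= d1`).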

This quantizer, which employs a fixed codebook for the first stage, will from now on be referred to as Fixed VSQ (FVSQ).

Further increase in performance can be obtained if the correlation between successive LSF vectors is exploited. Grass and Kabal suggest that a small fraction of the VQ codebook should change so as to include the most recent R quantized values of f. In order to have room for this adaptive set, the R vectors least selected during the last iteration of the training procedure should be removed from the codebook. In our case, R was taken to be 6 and the removed vectors had a combined selection rate of 0.69%.

One drawback of this adaptive codebook vector-scalar quantizer (AVSQ) is its vulnerability to channel error propagation. Each incorrect vector added to the codebook may generate future incorrect decodings if it is again selected before being dropped from the codebook. In order to minimize this effect the adaptive portion of the codebook should be reset periodically. Here, after 30 LSF frames, the codebook is reinitialized with the R vectors originally removed after training.


2.2. Tree-searched multi-stage Vector Quantizer (TS-MSVQ)

In a multi-stage vector quantizer the input vector $f_{in}$ is approximated by a sum:

$$\hat{f} = \sum_{j=1}^{N} c^{(j)} \qquad (3)$$

where $c^{(j)}$ is the reproduction vector selected at the j-th quantization stage and N is the number of stages.

In general, the input to the j-th stage is the quantization error resulting from stage (j - 1). For example, the input to stage 2 is the error vector $e^{(1)} = f_{in} - c^{(1)}$. This sequential search procedure is not very efficient and usually the performance of the technique saturates after only two or three stages. Ideally, the quantizer should search all possible combinations of stage output levels and select the combination yielding the smallest error. However, this optimal approach is far too complex. An intermediate solution was proposed in [4], where the authors save from one stage to the next the M closest reproduction levels. Thus, following stage j, the system computes M error vectors between $e^{(j-1)}$ and $c_k^{(j)}$, k = 1, 2, ..., M. For each of these error vectors, the M best output levels of quantizer (j + 1) are selected. Out of the $M^2$ choices resulting from the processing of all error vectors, the M alternatives producing the smallest errors are considered as inputs for stage (j + 2). After the last stage the set of quantization outputs which produces the smallest value of weighted distortion is selected to reproduce $f_{in}$.

In this study a search breadth value of 4 was adopted and four stages of 128 levels were used, yielding the desired rate of 28 bits per LSF frame.
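The tree search can be sketched as below. This is an illustrative reading of the procedure, under stated assumptions: the codebooks are random, the weights are flat, and each surviving path stores its partial reconstruction rather than the error vector (the two views are equivalent, since the error is the input minus the partial reconstruction). The function name `msvq_search` is hypothetical.

```python
import random

def dist(x, y, w):
    """Weighted Euclidean distortion between two vectors."""
    return sum(wj * (a - b) ** 2 for wj, a, b in zip(w, x, y))

def msvq_search(f_in, stages, w, m=4):
    """M-best tree search over a list of stage codebooks."""
    zero = [0.0] * len(f_in)
    # Each path is (distortion, chosen indices, partial reconstruction).
    paths = [(dist(f_in, zero, w), [], zero)]
    for cb in stages:
        extended = []
        for _, idxs, recon in paths:
            # Score every codevector of this stage for the current survivor
            # and keep its m best extensions.
            scored = []
            for k, c in enumerate(cb):
                new = [r + ck for r, ck in zip(recon, c)]
                scored.append((dist(f_in, new, w), idxs + [k], new))
            scored.sort(key=lambda p: p[0])
            extended.extend(scored[:m])
        # Out of up to m*m candidates, keep the m with the smallest error.
        extended.sort(key=lambda p: p[0])
        paths = extended[:m]
    return paths[0]

random.seed(1)
dim, levels = 10, 128                      # four stages of 128 levels: 4 x 7 = 28 bits
stages = [[[random.gauss(0, s) for _ in range(dim)] for _ in range(levels)]
          for s in (1.0, 0.5, 0.25, 0.12)]
f_in = [random.gauss(0, 1) for _ in range(dim)]
w = [1.0] * dim
d4, idx4, recon4 = msvq_search(f_in, stages, w, m=4)
```

Setting `m=1` reduces the search to the purely sequential procedure described at the start of this subsection.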

Training of the four codebooks was done sequentially. The codebook for the first stage was designed first; the error vectors resulting from the use of this first quantizer (on the speech data base already described) were then used to train the second stage VQ, and so forth.

The tree search procedure does not guarantee that the quantized LSF vector satisfies the ordering property required for prediction filter stability. Hence a test should be included in the final step of the quantization algorithm to guarantee that the chosen value of $\hat{f}$ has the necessary property.
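A minimal sketch of such an ordering test is given below. The text does not say what is done when the best candidate fails the test; falling back to the next-best surviving candidate, as `first_valid` does, is an assumption made here for illustration, and both function names are hypothetical.

```python
import math

def is_ordered(lsf, min_gap=0.0):
    """True if 0 < f_1 < f_2 < ... < f_p < pi, with more than min_gap between neighbours."""
    ext = [0.0] + list(lsf) + [math.pi]
    return all(b - a > min_gap for a, b in zip(ext, ext[1:]))

def first_valid(candidates):
    """Return the first ordered candidate; candidates are assumed sorted by distortion."""
    for cand in candidates:
        if is_ordered(cand):
            return cand
    return None   # no stable candidate among the survivors

survivors = [[0.5, 0.4, 1.2], [0.4, 0.5, 1.2]]   # first one violates the ordering
chosen = first_valid(survivors)
```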

2.3. Interframe predictive split Vector Quantizer (SVQ-IP)

This quantizer is basically a Split VQ [9] coupled with interframe first-order prediction. Let us recall that in split vector quantization the vector of LSF's is partitioned into M subsets and each subset is then coded using a different codebook. As M grows there is a reduction both in performance and in complexity of implementation. In order to avoid this performance loss when a moderately complex split VQ is desired, it was proposed in [5] that advantage be taken of the high interframe correlation presented by the LSF coefficients. The scheme introduced in [5] actually applies prediction on every other frame so as to avoid error propagation and slope overload. Specifically, let us assume that frame f(i) is not quantized differentially (i.e., the actual LSF's are quantized); then in the next frame the quantity to be discretized would be:

(4)

where u is the vector of prediction coefficients. The following LSF vector is again absolutely (not differentially) quantized.
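The alternation between absolutely quantized and predicted frames can be sketched as follows. Several details are assumptions, since they are not recoverable from the text: the residual is taken as the current LSF vector minus an element-wise scaled copy of the previous quantized vector, and the partition `SPLIT` of the ten LSFs into four subsets is hypothetical, as are the function names.

```python
SPLIT = (3, 3, 2, 2)   # assumed partition of the 10 LSFs into M = 4 subsets

def nearest(x, codebook):
    """Codevector with the smallest squared error to x."""
    return min(codebook, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, c)))

def split_vq(x, codebooks):
    """Quantize each subvector with its own codebook and concatenate the results."""
    out, start = [], 0
    for size, cb in zip(SPLIT, codebooks):
        out.extend(nearest(x[start:start + size], cb))
        start += size
    return out

def encode_sequence(frames, cbs_a, cbs_b, coeffs):
    """Even frames (A): absolute quantization. Odd frames (B): quantized residual."""
    decoded, prev = [], None
    for i, f in enumerate(frames):
        if i % 2 == 0:                                        # frame A
            f_hat = split_vq(f, cbs_a)
        else:                                                 # frame B
            pred = [cj * pj for cj, pj in zip(coeffs, prev)]  # assumed first-order predictor
            resid = [fj - pj for fj, pj in zip(f, pred)]
            r_hat = split_vq(resid, cbs_b)
            f_hat = [pj + rj for pj, rj in zip(pred, r_hat)]
        decoded.append(f_hat)
        prev = f_hat
    return decoded
```

Because every second frame is coded without reference to the past, a channel error in a predicted frame cannot propagate beyond that frame.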

Following the suggestion in [5], a value of M = 4 was adopted in this work and the bit distribution for the frame which is absolutely quantized (frame A) and the frame which is predicted (frame B) is as shown in Table 1.

Codebook design was performed following the procedure outlined in [5], using the previously mentioned speech data base. Both during training and during actual operation a test is implemented to avoid unstable sets of quantized coefficients [5].


Table 1 - Bit assignment for M = 4, 28 bit/frame SVQ-IP.

Subsets    Frame A    Frame B

[Table body not recoverable from the source; only the fragment "7 6" survives.]

3. PERFORMANCE OF DIFFERENT QUANTIZERS

Initially the stand-alone performance of the quantizers described in the previous section, plus that of the 38 bit scalar quantizer employed in the standard IS-54 [2], is evaluated in terms of the following filtered spectral distortion measure:

$$SD_f = \left[ \frac{1}{\ldots} \int_{\ldots}^{3450} \left[ 10 \log S(f) - 10 \log \hat{S}(f) \right]^2 df \right]^{1/2} \qquad (5)$$

where $S$ and $\hat{S}$ are the original and reconstructed spectra associated with a given LSF frame. (The lower integration limit and the normalization factor are illegible in the source.)

In a later section the objective and subjective performance of these quantizers when they are used in a VSELP scheme will be described.

Table 2 contains the values of average spectral distortion for four vector quantizers operating at 28 bit/frame and for the above mentioned scalar quantizer. The percentage of outlier frames (frames with distortion higher than 2 dB) is also given in Table 2 for all the quantizers.
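For illustration, the sketch below evaluates a band-limited spectral distortion between two all-pole spectra and the two outlier percentages used in Table 2. It is a sketch under assumptions: the spectra are computed from prediction coefficients rather than LSFs, the integral is approximated by an average over the band, the 8 kHz sampling rate is inferred from the 160-sample, 20 ms frames, and the lower band edge `f_lo` is a free parameter because its value is not legible in the source. Function names are hypothetical.

```python
import cmath
import math

FS = 8000.0   # sampling rate in Hz (160 samples per 20 ms frame)

def power_spectrum_db(a, freq_hz):
    """10*log10 of 1/|A(e^{jw})|^2 for A(z) = 1 - sum_i a_i z^-i."""
    w = 2.0 * math.pi * freq_hz / FS
    A = 1.0 - sum(ai * cmath.exp(-1j * w * (i + 1)) for i, ai in enumerate(a))
    return -20.0 * math.log10(abs(A))

def spectral_distortion(a, a_hat, f_lo=0.0, f_hi=3450.0, n=256):
    """RMS log-spectral difference in dB over [f_lo, f_hi] (f_lo is an assumption)."""
    acc = 0.0
    for k in range(n):
        fr = f_lo + (f_hi - f_lo) * (k + 0.5) / n      # midpoint rule
        acc += (power_spectrum_db(a, fr) - power_spectrum_db(a_hat, fr)) ** 2
    return math.sqrt(acc / n)

def outlier_stats(sd_values):
    """Percentage of frames with SD in (2, 4] dB and with SD above 4 dB."""
    n = len(sd_values)
    mid = 100.0 * sum(1 for s in sd_values if 2.0 < s <= 4.0) / n
    high = 100.0 * sum(1 for s in sd_values if s > 4.0) / n
    return mid, high

sd = spectral_distortion([0.9], [0.8])   # toy first-order filters
```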

Table 2 - Values of average spectral distortion and percentage of outlier frames for four 28 bit/frame vector quantizers and a scalar quantizer operating at 38 bit/frame.

Quantizer    SD_f (dB)    Outliers (%) between 2 and 4 dB    Outliers (%) > 4 dB
IS-54        0.7782       0.118                              0
SVQ-IP       1.0488       3.371                              0.335
AVSQ         1.1215       …                                  …
TS-MSVQ      1.0255       …                                  …
FVSQ         …            …                                  …

[Entries marked … are not legible in the source. One further pair of outlier values, 2.898 and 0, survives but cannot be assigned to a row.]

As can be seen, the scalar quantizer presents the lowest distortion but it also uses 10 additional bits for each LSF frame, when compared with the vector quantizers. Among the VQ's the TS-MSVQ and the SVQ-IP have comparable behavior. The worst performance in terms of average distortion is provided by the family of vector-scalar quantizers. As expected, the FVSQ cannot match the distortion yielded by its adaptive version. On the other hand the VSQ produces the smallest percentage of outlier frames among the VQ's, although the other two classes also appear to provide acceptable numbers.

4. VSELP CODER

This section presents a brief overview of the Vector Sum Excited Linear Predictive (VSELP) speech coder. VSELP was selected by the Telecommunications Industry Association (TIA) and Electronic Industry Association (EIA) in 1990 as the standard for North American and Japanese digital cellular telephone systems [2, 10, 11]. This work used the basic structure of this standard (EIA/TIA-IS-54).

Speech coding for digital cellular systems presumes real time operation and, therefore, reduced computational requirements. The VSELP speech coder achieves this goal through efficient utilization of structured excitation codebooks. The structured codebooks, besides reducing computational complexity, also reduce the amount of storage capacity and increase robustness to channel errors. Two VSELP excitation codebooks are used to improve speech quality and a new vector quantizer for the excitation gains is also employed to achieve high coding efficiency and robustness to channel errors.

4.1. Basic coder/decoder parameters

Fig. 2 a) shows a block diagram of the speech coder. The VSELP employs voice frames of 20 ms (160 samples) generating the following output parameters:

r_i (i = 1, 2, ..., 10): reflection coefficients of synthesis filter;
R0: frame energy;
L: long term predictor lag;
I, H: index of vector excitation from codebooks 1 and 2, respectively;
G: index of vector gain quantizer for excitation gains.

Parameters r_i and R0 are computed once per frame and the others are computed in every subframe of 5 ms. The basic data rate of the speech coder is 8 kbit/s (without error protection bits). There are 160 bits per frame (20 ms), allocated as shown in Table 3.

Table 3 - Bit Allocation for 8 kbit/s Coder.

Parameter                   Bits per subframe (5 ms)    Bits per frame (20 ms)
Filter coefficients: r_i    -                           38
Energy: R0                  -                           5
Excitation index: I, H      7 + 7                       56
Lag: L                      7                           28
Gains: GS, P0, P1           8                           32
Unused                      -                           1

Fig. 2 - The VSELP Codec: a) Coder; b) Decoder.

The analog speech signal is converted to a digital signal in a uniform PCM format (with a minimum resolution of 13 bits) and it may be desirable, in some instances, to provide additional high pass filtering to the digital signal. A fourth order high pass filter with a response which is 3 dB down at 120 Hz and 40 dB down at 60 Hz is recommended. The ten reflection coefficients of the weighted synthesis filter are computed from the high-pass filtered input speech through LPC analysis using the fixed point covariance lattice algorithm (FLAT). The reflection coefficients are then converted into the prediction coefficients $a_i$, which are used on the fourth subframe of each frame. Before their use on the first, second and third subframes, the prediction coefficients are linearly interpolated. An overall frame energy is also computed and coded once per frame. This energy value R0 reflects the average signal power in the input speech over a 20 ms interval which is centered with respect to the middle of the fourth subframe.

The high-pass filtered input speech is processed by the weighting filter W(z) and its output sequence is compared with the output of the weighted synthesis filter $H_w(z)$, which is excited by each codevector of each codebook. The indices of the codevectors that generate the minimum weighted error power are chosen as the indices for that subframe. Thus the minimization of the weighted error power is done for each excitation codebook and also for the excitation gain codebook GS-P0-P1.

VSELP uses three excitation sequences: two of them coming from fixed and structured codebooks and the third one coming from an adaptive codebook that represents the long-term filter state. Each of the structured codebooks is constructed from a set of M basis vectors (M = 7 in the IS-54 standard) that are linearly combined to produce $2^M$ excitation sequences. Using these properties of the VSELP codebook construction, the computation required for evaluation of the minimum weighted error power is greatly simplified. The adaptive codebook is updated in each subframe with the combined excitation ex(n), which is computed as the sum of the three excitation sequences $b_L(n)$, $u_{1,I}(n)$ and $u_{2,H}(n)$ multiplied by their respective gains $\beta$, $\gamma_1$ and $\gamma_2$.
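The structured codebook can be illustrated with the sketch below. The text only says that the basis vectors are linearly combined; the specific rule used here, in which each bit of the codeword index selects a sign of +1 or -1 for the corresponding basis vector, is the usual VSELP construction and is stated as an assumption. The basis vectors are random placeholders, and the subframe length of 40 samples follows from 160 samples per frame and four subframes.

```python
import random

def vselp_codevector(index, basis):
    """Sum of basis vectors with signs taken from the bits of index (+1 for 1, -1 for 0)."""
    n = len(basis[0])
    out = [0.0] * n
    for m, v in enumerate(basis):
        theta = 1.0 if (index >> m) & 1 else -1.0
        for k in range(n):
            out[k] += theta * v[k]
    return out

M, N = 7, 40                      # 7 basis vectors, 40 samples per 5 ms subframe
random.seed(2)
basis = [[random.gauss(0, 1) for _ in range(N)] for _ in range(M)]
codebook = [vselp_codevector(i, basis) for i in range(2 ** M)]
```

With this construction only the M basis vectors need to be stored and filtered, and complementing all bits of an index simply negates the codevector, which is one reason the codebook search is cheap.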

The gains $\beta$, $\gamma_1$, $\gamma_2$ are jointly optimized to minimize the weighted error power. In fact, the vector quantization of $\beta$, $\gamma_1$, $\gamma_2$ is replaced by the joint vector quantization of the mathematically equivalent parameters GS, P0 and P1. Coding the {GS, P0, P1} vector is advantageous over coding $\beta$, $\gamma_1$, $\gamma_2$ because {GS, P0, P1} is independent of the input signal level (due to the normalization of the absolute signal energy when R0 is quantized) and because {GS, P0, P1} is bounded. The codebook for {GS, P0, P1} comprises 256 codevectors.

The block diagram of the VSELP speech decoder is shown in Fig. 2 b). In practice, the decoder is a subset of the encoder due to the analysis-by-synthesis procedure used in this type of speech coder. The only stage in the decoder that is not present in the coder is the adaptive spectral post filter that is used to enhance the perceptual quality of the synthetic speech.

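The synthesis filter $H(z)$ and the weighted synthesis filter $H_w(z)$ are given by

$$H(z) = \frac{1}{1 - \sum_{i=1}^{10} a_i z^{-i}} \qquad (6)$$

$$H_w(z) = \frac{1}{1 - \sum_{i=1}^{10} a_i \lambda^i z^{-i}}, \quad \text{where } \lambda = 0.8 \qquad (7)$$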
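A direct-form sketch of these two filters is shown below, assuming the all-pole form 1 / (1 - sum_i a_i z^-i) for the synthesis filter and the same form with each a_i scaled by lambda^i for the weighted synthesis filter. The function name and the toy coefficients are hypothetical.

```python
def synthesize(excitation, a, lam=1.0):
    """All-pole filtering; lam = 1 gives H(z), lam = 0.8 gives the weighted H_w(z)."""
    coeffs = [ai * lam ** (i + 1) for i, ai in enumerate(a)]
    out = []
    for n, x in enumerate(excitation):
        y = x
        for i, c in enumerate(coeffs):
            if n - 1 - i >= 0:                 # use only past output samples
                y += c * out[n - 1 - i]
        out.append(y)
    return out

impulse = [1.0, 0.0, 0.0]
h = synthesize(impulse, [0.5])                 # impulse response of H(z)
h_w = synthesize(impulse, [0.5], lam=0.8)      # impulse response of H_w(z)
```

Scaling the coefficients by powers of 0.8 moves the poles toward the origin, so the weighted filter has a shorter impulse response and broader formant peaks.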

4.2. Channel error control

The VSELP codec was designed to be used in digital cellular systems, where the radio channels used for transmission can be very noisy. It is therefore very important to provide additional protection for the data stream produced by the speech coder. This data stream is composed of the binary codes representing the 27 different parameters computed every 20 ms. The degradation of speech quality due to channel transmission errors depends on which parameter, and which bit representing that parameter, was affected by the error. As a consequence, the protection mechanism for the speech coder data stream must provide a non-uniform amount of error correction capability over the data stream.

The channel error control for the speech coder data employs three mechanisms for the mitigation of channel errors. The first is to use a rate one-half convolutional code to protect the more vulnerable bits. The second technique interleaves the transmitted data for each speech coder frame over two time slots to mitigate the effects of Rayleigh fading. The third technique employs a cyclic redundancy check over some of the most perceptually significant bits. After the error correction is applied, the total rate of VSELP is increased to 13 kbit/s.

5. COMPLEXITY ISSUES

The selection of a given vector quantizer scheme for real time implementation can only be accomplished after careful consideration of the complexity involved.


The first step in evaluating the complexity of an algorithm is to define an adequate measure of complexity. Such a measure should reflect both the cost and power requirements of a real time implementation. In this work the measure described in [5] was adopted. This measure takes into account the number of operations, with weight factors associated with each type of operation, and also the data memory requirements, in terms of both static and dynamic memory. The amount of program memory needed is not included since it depends on the actual hardware platform being used. Formally this complexity measure can be expressed as:

$$C = 0.2\,DM + 0.05\,SM + O \qquad (8)$$

where DM is the number of dynamic memory positions, SM the number of positions in ROM and O is a weighted sum of the required arithmetic operations. The weight associated with each type of operation is given in Table 4.

The measure just defined was used to evaluate the complexity of the quantizers described in section 2. For some steps of the algorithms the number of operations depends on the actual vector being quantized. In these situations the number entered in the computation of O was half of the number of operations that would be necessary in the worst case scenario.

Table 4 - Weights associated with arithmetic operations.

Operation             Example
Additions
Multiplications
Multiply-add          o = o * o + o
Data moves            float, int
Add-compare-select    e.g. Viterbi decoding

[The weight column of Table 4 is not legible in the source.]
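As a quick check of the complexity measure, the sketch below evaluates C = 0.2 DM + 0.05 SM + O for the TS-MSVQ figures that survive in Table 5 of the source (O = 155016, DM = 188, SM = 45056). The function name is hypothetical.

```python
import math

def complexity(ops, dm, sm):
    """C = 0.2*DM + 0.05*SM + O, with O the weighted operation count."""
    return 0.2 * dm + 0.05 * sm + ops

c_tsmsvq = complexity(ops=155016, dm=188, sm=45056)
c_rounded_up = math.ceil(c_tsmsvq)
```

The result is about 157306.4, which matches the tabulated value of 157307 once rounded up; the operation count clearly dominates the memory terms for this quantizer.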

Table 5 - Complexity (C) of the vector quantizers defined in section 2.

Quantizer    O         DM (byte)    SM (byte)    C
TS-MSVQ      155016    188          45056        157307

[The rows for FVSQ, AVSQ and SVQ-IP are garbled in the source; the surviving values are 22256, 325, 25806, 21874, 23415 and 14758.]

The values obtained for O, DM and SM for each quantizer are given in Table 5. As shown in this table, the SVQ-IP is the least complex quantizer. Its complexity is about one order of magnitude lower than that of the TS-MSVQ and roughly 60% of the complexity of the fixed vector-scalar quantizer. The AVSQ is necessarily more complex to implement than its fixed version.

6. PERFORMANCE IN IDEAL CHANNEL

This section presents the ideal channel performance of the LSF quantizers when used in a VSELP coding structure.

6.1. Performance Measures

The speech material used in our simulations comprises 8 utterances in Brazilian Portuguese taken from FM radio and from lists of phonetically balanced sentences [12]. Each utterance was spoken by one of 8 speakers (4 male and 4 female). They will be referred to as M1, M2, M3, M4, F1, F2, F3 and F4. It should be noted that none of these utterances or speakers was used for training the quantizer codebooks.

As objective performance measure we have used the average value of the weighted signal to noise ratios of all frames in a test sequence. The weighted SNR for frame j is defined as

$$SNR_w(j) = 10 \log_{10} \frac{\sum_{n=1}^{N} s^2(n)}{\sum_{n=1}^{N} \left[ (s(n) - \hat{s}(n))_w \right]^2} \qquad (9)$$

where $s(n)$ and $\hat{s}(n)$ are the original and synthesized speech, $(s(n) - \hat{s}(n))_w$ is the weighted (by W(z)) synthesis error and N is the number of samples (N = 160) in the j-th CELP frame. The average value $\overline{SNR}_w$ for K frames of the test sequence is given by

$$\overline{SNR}_w = \frac{1}{K} \sum_{j=1}^{K} SNR_w(j) \qquad (10)$$

This measure presents some correlation with subjective performance because the error is enhanced in frequencies where the human ear is more sensitive. It should be remarked, however, that $SNR_w$ alone cannot predict all kinds of distortions present in the reconstructed speech and it is not completely adequate to assess particular impairments of the spectrum.
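The per-frame and average measures can be sketched as follows. The weighting filter W(z) is not defined in this section, so the sketch takes the already weighted error signal as input; using the unweighted signal energy in the numerator is an assumption, and the function names are hypothetical. Frames with zero error energy are not handled.

```python
import math

def snr_w_frame(s, e_w):
    """10*log10(sum s(n)^2 / sum e_w(n)^2) for one frame; e_w is the weighted error."""
    num = sum(x * x for x in s)
    den = sum(x * x for x in e_w)
    return 10.0 * math.log10(num / den)

def average_snr_w(frames, n=160):
    """Mean of the per-frame values over K frames of n samples each."""
    vals = [snr_w_frame(s[:n], e[:n]) for s, e in frames]
    return sum(vals) / len(vals)

# Two toy frames: error amplitude 0.1 (about 20 dB) and error equal to the signal (0 dB).
frames = [([1.0] * 160, [0.1] * 160), ([1.0] * 160, [1.0] * 160)]
avg = average_snr_w(frames)
```

Note that the average is taken over per-frame values in dB, so low-energy frames count as much as loud ones.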

Subjective performance results were obtained for the

Vol. 5 , No. 5 September - October 1994 25m9

Sonia L. Q. Dall'Agnol, Abraham Alcaim, Jose Roberto B. de Marca

M2

F2

M3

F3

M4

F4

IS-54 VSELP standard (spectral parameters quantized with 38 bits/ frame) and the VSELP using the LSF vec- tor quantizers (with 28 bitdframe) investigated in this work, both in ideal and noisy channels, and three MN- RU reference conditions with Q = 5, 15 and 25 dB. We have also included the FVSQ quantizer with simulated annealing in noisy channels. The test samples were ran- domly presented to 18 listeners who were asked to rank each one with scores ranging from 1 (unacceptable qua- lity) to 5 (excellent quality). Results are given in terms of mean score, standard deviation, and 95% confidence interval.

12.05 12.16 12.05 12.07 12.12

10.46 10.41 10.44 10.38 10.40

12.06 11.85 11.94 11.95 11.96 j I

14.12 13.94 13.94 13.94 14.19 ' 13.20 13.13 13.06 13.06 13.09

15.45 15.35 15.29 , 15.36 , 15.28 ~

6.2. Simulation results

Objective performance results in ideal channel conditions are presented in Tables 6 and 7. The SNRw values are shown in Table 6 for each utterance and in Table 7 for the whole test sequence. Comparing these results with the analysis of the quantizers alone presented in section 3, we can see that the significant superiority of the IS-54 quantizer is no longer observed when it is used in the VSELP coder (even though it uses a higher rate). The weighted signal to noise ratios are similar for all systems.

Table 6 - SNRw for each utterance in ideal channel.

Table 8 - Statistics of the subjective quality tests in ideal channel (mean score, standard deviation and 95% confidence interval CI).

Quantizer    All Speakers        Female Speakers     Male Speakers
             Mean  S.D.  CI      Mean  S.D.  CI      Mean  S.D.  CI
IS-54        4.19  0.82  0.11    4.15  0.81  0.15    4.24  0.82  0.16
TS-MSVQ      4.21  0.75  0.09    4.24  0.75  0.13    4.18  0.75  0.13
AVSQ         4.12  0.92  0.14    4.18  0.86  0.17    4.06  0.97  0.22
SVQ-IP       4.02  0.83  0.11    4.15  0.79  0.15    3.89  0.84  0.17
Q = 25 dB    4.18  0.86  0.12    4.04  0.82  0.16    4.32  0.88  0.18

Fig. 3 - Confidence intervals in ideal channel (IS-54, TS-MSVQ, AVSQ, SVQ-IP and Q = 25 dB).

Statistics of the subjective quality tests are given in Table 8. Using these results we have plotted the confidence intervals shown in Fig. 3. It can be seen that the subjective results of the IS-54, TS-MSVQ and the Q = 25 dB reference condition are comparable. The AVSQ scheme is a little inferior and the SVQ-IP technique yields the worst performance. It should be noted, however, that this ranking cannot be taken in absolute terms because the confidence intervals overlap with each other.
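The overlap argument can be checked directly from the "All Speakers" figures of Table 8. The sketch below is only illustrative: it assumes that the CI column is the half-width of a symmetric 95% interval around the mean score (the table itself does not say so), and the names `TABLE_8_ALL_SPEAKERS`, `interval`, `overlaps` and `non_overlapping_pairs` are hypothetical.

```python
from itertools import combinations

# (mean score, CI) for all speakers, as read from Table 8.
# Assumption: CI is the half-width of a symmetric 95% interval.
TABLE_8_ALL_SPEAKERS = {
    "IS-54": (4.19, 0.11),
    "TS-MSVQ": (4.21, 0.09),
    "AVSQ": (4.12, 0.14),
    "SVQ-IP": (4.02, 0.11),
    "Q = 25 dB": (4.18, 0.12),
}


def interval(mean, half_width):
    """Return the (lower, upper) bounds of the confidence interval."""
    return (mean - half_width, mean + half_width)


def overlaps(a, b):
    """True when two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def non_overlapping_pairs(table):
    """List the pairs of systems whose intervals do not overlap."""
    intervals = {name: interval(*stats) for name, stats in table.items()}
    return [(x, y) for x, y in combinations(intervals, 2)
            if not overlaps(intervals[x], intervals[y])]


if __name__ == "__main__":
    print(non_overlapping_pairs(TABLE_8_ALL_SPEAKERS))
```

Under that assumption no pair of systems is separated, which is consistent with the remark that the ranking cannot be taken in absolute terms.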

Table 7 - SNRw for the whole test sequence in ideal channel.

i IS-54 j S V Q m G l

SNRw (dB)   12.32   12.27   12.24   12.22   12.29

7. PERFORMANCE IN NOISY CHANNELS

In this section we present the error sensitivity of the bits representing the filter coefficients and a comparative performance analysis of the several quantizers in a VSELP coder operating in noisy channels. To determine the error rate associated with each bit, the classes of protection must first be established. In this work we considered three classes of protection, 1A, 1 and 2, with error rates 5 x 10^-3, 4 x 10^-2 and 10^-1, respectively.
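A noisy channel with class-dependent error rates can be simulated along the following lines. This is a minimal sketch, not the simulation used in the paper: it assumes independent (memoryless) bit errors, and the six-bit frame layout, `corrupt_frame` and `measured_error_rates` are hypothetical names introduced for illustration. Only the three error rates come from the text.

```python
import random

# Bit error rate of each protection class, as quoted in the text.
CLASS_ERROR_RATE = {"1A": 5e-3, "1": 4e-2, "2": 1e-1}


def corrupt_frame(bits, bit_classes, rng):
    """Flip every bit independently with the error rate of its class."""
    return [bit ^ int(rng.random() < CLASS_ERROR_RATE[cls])
            for bit, cls in zip(bits, bit_classes)]


def measured_error_rates(bit_classes, n_frames, seed=0):
    """Send all-zero frames and count the errors observed per class."""
    rng = random.Random(seed)
    errors = {cls: 0 for cls in CLASS_ERROR_RATE}
    counts = {cls: 0 for cls in CLASS_ERROR_RATE}
    clean = [0] * len(bit_classes)
    for _ in range(n_frames):
        received = corrupt_frame(clean, bit_classes, rng)
        for bit, cls in zip(received, bit_classes):
            errors[cls] += bit  # a received 1 is an error on an all-zero frame
            counts[cls] += 1
    return {cls: errors[cls] / counts[cls] for cls in counts if counts[cls]}


if __name__ == "__main__":
    # Hypothetical frame layout: two class 1A bits, one class 1, three class 2.
    layout = ["1A", "1A", "1", "2", "2", "2"]
    print(measured_error_rates(layout, n_frames=20000))
```

With enough frames the measured rates approach the nominal rate of each class, so the most sensitive bits (class 1A) see roughly twenty times fewer errors than the class 2 bits.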


7.1. Bit error sensitivity

The classes of protection for the bits generated by the IS-54 quantizer and by the vector quantizers are given in Tables 9 and 10, respectively. To assess the sensitivity of each bit to channel errors, the bits were systematically changed in the output bit stream, one at a time. We then measured the weighted signal to noise ratio SNRw. The results are shown in Fig. 4.

Table 9 - Classes of protection for the 38 bits generated by the IS-54 quantizer.

Class   Bit Number                                          Number of bits per class   Error Rate
1A      4, 5, 6, 10, 11, 15, 16, 20, 24                     9 bits/frame               5 x 10^-3
1       3, 9, 14, 19                                        4 bits/frame               4 x 10^-2
2       1, 2, 7, 8, 12, 13, 17, 18, 21, 22, 23, 25 to 38    25 bits/frame              10^-1

Table 10 - Classes of protection for the bits generated by the vector quantizers.

VQ   Class   Bit Number   Number of Bits per Class

1A   1, 2, 3, 4, 5, 6, 9, 10, 16, 17 (even frame); 1, 2 (odd frame)                             12 bits/(two frames)
1    7, 8, 11, 12, 18, 23 (even frame); 3, 4, 5, 6 (odd frame)                                  10 bits/(two frames)
2    13, 14, 15, 19 to 22, 24 to 30 (even frame); 7, 9 to 14, 16 to 21, 23 to 29 (odd frame)    34 bits/(two frames)

1A   1 to 7, 9           8 bits/frame
1    8, 10, 12           3 bits/frame
2    11, 13, 14 to 28    17 bits/frame

1A   1 to 9              9 bits/frame
1    10, 12, 13, 15      4 bits/frame
2    11, 14, 16 to 28    15 bits/frame

1A   1 to 7, 9           8 bits/frame
1    8, 10, 11, 12       4 bits/frame
2    13 to 28            16 bits/frame

AVSQ   TS-MSVQ   SVQ-IP   FVSQ
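The one-bit-at-a-time sensitivity test of Section 7.1 can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: bit k is inverted in every frame, a plain (unweighted) SNR replaces the weighted SNRw, and a toy natural-binary decoder stands in for the VSELP decoder. The names `snr_db`, `bit_sensitivity` and `decode_natural_binary` are hypothetical.

```python
import math


def snr_db(reference, test):
    """Plain signal to noise ratio in dB (not the weighted SNRw of the paper)."""
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, test))
    return float("inf") if noise == 0 else 10.0 * math.log10(signal / noise)


def bit_sensitivity(frames, decode):
    """SNR obtained when bit k is inverted in every frame, for each k."""
    reference = [decode(frame) for frame in frames]
    n_bits = len(frames[0])
    result = []
    for k in range(n_bits):
        # Invert only bit k; all other bits are received correctly.
        corrupted = [frame[:k] + [frame[k] ^ 1] + frame[k + 1:] for frame in frames]
        result.append(snr_db(reference, [decode(frame) for frame in corrupted]))
    return result


def decode_natural_binary(frame):
    """Toy decoder: the frame is an unsigned integer, most significant bit first."""
    value = 0
    for bit in frame:
        value = (value << 1) | bit
    return value


if __name__ == "__main__":
    frames = [[0, 1, 0, 1], [1, 0, 1, 1], [0, 0, 1, 0]]
    print(bit_sensitivity(frames, decode_natural_binary))
```

A low SNR for a given bit marks it as sensitive, which is the kind of evidence used to place bits in the better protected classes. In the toy decoder the most significant bit is the most sensitive one.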

7.2. Performance in noisy channels

Table 11 shows the weighted signal to noise ratios of the VSELP codec in noisy channels according to the classes of protection previously defined. It can be seen that, in terms of SNRw, the system which yields the best performance is the SVQ-IP, which is about 0.7 dB better than the IS-54 recommendation. The FVSQ also provided a better performance than the IS-54 Recommendation. The worst performance was obtained with the TS-MSVQ approach.

Table 11 - SNRw of the VSELP codec in noisy channels.

Quantizer    IS-54   SVQ-IP   FVSQ   AVSQ   TS-MSVQ
SNRw (dB)    9.23    9.98     9.67   9.23   9.08

Fig. 4 - Bit error sensitivity.

Quantizer: IS-54, SVQ-IP, FVSQ

All speakers, Female speakers, Male speakers (mean, standard deviation, CI)

2.53 0.82 0.14   2.22 0.75 0.13   2.85 0.97 0.22
2.49 0.87 0.12   2.50 08-16k.47 0.90 0.19
2.53 0.87 0.12   2.58 0.85 0.17   2.47 0.88 0.18

Subjective test results are given in Table 12 and in Fig. 5. From these results we can see that the IS-54, SVQ-IP and FVSQ quantizers are comparable, with a slight disadvantage for the FVSQ. The worst result is obtained with the AVSQ system, probably due to the effects of error propagation associated with the adaptivity of the codebook. It is important to recall that, although subjectively the three coders have similar performance, the IS-54 operates at a higher rate than the FVSQ and the SVQ-IP. In addition, in terms of computational complexity, the SVQ-IP quantizer is about 10 times simpler than the TS-MSVQ, and the complexity of the FVSQ is 60% higher than that of the SVQ-IP system.

7.3. Zero redundancy coding

In order to improve the performance in noisy channels, an attempt was made using a simulated annealing approach [13, 14] to properly assign the binary indices to the quantizers found in the IS-54 standard and also when the bank of scalar quantizers is replaced by the FVSQ.

In the case of the standard coder, both the reflection coefficient quantizer and the quantizer for the gain vector GS-P0-P1 had their indices reassigned. On the other hand, the FVSQ had the indices of its first stage VQ redesigned according to the results of the annealing technique. Furthermore, after the index permutation another error sensitivity study was made for the coder with FVSQ. The bits were then reapportioned among the protection classes in order to reflect the new situation.
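The index assignment idea can be illustrated with a small simulated annealing sketch. It is not the algorithm of [13, 14] nor the authors' implementation: it assumes equiprobable codevectors, single-bit index errors only, and an unweighted squared Euclidean distortion (the paper used spectral distortion and a weighted Euclidean measure). The cooling schedule and the names `channel_distortion` and `anneal_index_assignment` are hypothetical.

```python
import math
import random


def channel_distortion(codebook, assignment, n_bits):
    """Mean squared error caused by all single-bit errors in the index.

    assignment[pos] is the binary index transmitted for codevector pos.
    """
    inverse = [0] * len(assignment)  # inverse[index] = codevector position
    for pos, index in enumerate(assignment):
        inverse[index] = pos
    total = 0.0
    for pos, index in enumerate(assignment):
        for bit in range(n_bits):
            received = inverse[index ^ (1 << bit)]
            total += sum((a - b) ** 2
                         for a, b in zip(codebook[pos], codebook[received]))
    return total / (len(assignment) * n_bits)


def anneal_index_assignment(codebook, n_bits, t_start=1.0, t_end=1e-3,
                            cooling=0.9, moves_per_temp=50, seed=0):
    """Search for an index permutation with low channel distortion."""
    rng = random.Random(seed)
    size = 1 << n_bits
    assert len(codebook) == size
    current = list(range(size))
    current_cost = channel_distortion(codebook, current, n_bits)
    best, best_cost = current[:], current_cost
    temperature = t_start
    while temperature > t_end:
        for _ in range(moves_per_temp):
            i, j = rng.sample(range(size), 2)
            current[i], current[j] = current[j], current[i]  # trial swap
            cost = channel_distortion(codebook, current, n_bits)
            delta = cost - current_cost
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current_cost = cost
                if cost < best_cost:
                    best, best_cost = current[:], cost
            else:
                current[i], current[j] = current[j], current[i]  # undo swap
        temperature *= cooling
    return best, best_cost


if __name__ == "__main__":
    demo_rng = random.Random(42)
    demo_codebook = [(demo_rng.random(), demo_rng.random()) for _ in range(16)]
    print(anneal_index_assignment(demo_codebook, n_bits=4))
```

No redundancy is added by this procedure: only the mapping between codevectors and binary indices changes, so that a single bit error tends to select a codevector close to the transmitted one.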

No significant improvement was verified either in the IS-54 performance, which is probably already very well engineered, or in that of the system employing the more sophisticated FVSQ.

It is worth pointing out that the simulated annealing algorithm was run using distortion metrics, namely spectral distortion and a weighted Euclidean distortion measure. It is conceivable, however, that the VSELP performance in hostile channels could be significantly improved if the simulated annealing had employed the overall coder distortion instead [15]. Due to its complexity, this solution was not attempted in this work.

Q = 5 dB (All / Female / Male speakers: mean, standard deviation, CI): 1.45 0.65 0.07 / 1.38 0.68 0.11 / 1.53 0.62 0.09

Fig. 5 - Confidence intervals in noisy channels (IS-54, SVQ-IP, FVSQ, TS-MSVQ, AVSQ and Q = 15 dB).

8. CONCLUSIONS

In this paper we have examined the performance of three LSF vector quantizers for VSELP coders operating in ideal and noisy channels. They were compared to the bank of scalar quantizers described in the IS-54 North American Recommendation for digital cellular telephony, which was referred to as the IS-54 quantizer. The latter operates at 38 bits/frame, while the vector quantizers operate at 28 bits/frame and present the highest spectral distortions. The significant superiority of the IS-54 quantizer is no longer observed when it is used in the VSELP coder. For instance, in ideal channel the subjective performances of the IS-54 and the TS-MSVQ schemes were comparable. It seems that the analysis-by-synthesis process of the VSELP coder compensates to a certain extent for the deficiencies introduced by the quantization of the filter parameters. Hence, if the goal is the quality of synthesized speech, the isolated evaluation of the quantizer should be regarded with caution. Another example, not discussed earlier, concerns the SVQ-IP performance. When its final rate is lowered to 26 bits/frame, the spectral distortion increases from 1.0488 dB to 1.1868 dB and the percentage of "outliers" between 2 and 4 dB increases from 3.371% to 5.381%. On the other hand, the weighted signal-to-noise ratio of synthesized speech when the quantizer is used in a VSELP structure degrades by only 0.08 dB.
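For reference, the two quantizer-level figures quoted above (spectral distortion and percentage of outliers between 2 and 4 dB) can be computed along the following lines. This is a sketch of the log spectral distortion as commonly defined (root-mean-square difference, in dB, between two log power spectra), not necessarily the exact variant used in this paper: the frequency band (full band, uniform grid), the sign convention A(z) = 1 + sum of a_k z^-k, and the function names are assumptions.

```python
import cmath
import math


def lpc_log_spectrum_db(lpc, n_points=256):
    """10*log10(1/|A(e^jw)|^2) on a uniform grid over [0, pi).

    Assumed convention: A(z) = 1 + sum_k lpc[k-1] * z^-k, unit gain.
    """
    spectrum = []
    for n in range(n_points):
        w = math.pi * n / n_points
        a = 1.0 + sum(c * cmath.exp(-1j * w * (k + 1)) for k, c in enumerate(lpc))
        spectrum.append(-20.0 * math.log10(abs(a)))
    return spectrum


def spectral_distortion_db(lpc_ref, lpc_quant, n_points=256):
    """RMS difference in dB between the two LPC log power spectra."""
    ref = lpc_log_spectrum_db(lpc_ref, n_points)
    quant = lpc_log_spectrum_db(lpc_quant, n_points)
    return math.sqrt(sum((r - q) ** 2 for r, q in zip(ref, quant)) / n_points)


def outlier_percentage(distortions, low=2.0, high=4.0):
    """Percentage of frames whose distortion lies in [low, high) dB."""
    hits = sum(1 for d in distortions if low <= d < high)
    return 100.0 * hits / len(distortions)


if __name__ == "__main__":
    # Hypothetical first-order filters, used only to exercise the functions.
    print(spectral_distortion_db([-0.9], [-0.8]))
    print(outlier_percentage([1.0, 2.5, 3.0, 5.0]))
```

The average of the per-frame distortion and the outlier percentage describe the quantizer in isolation, which is precisely the kind of evaluation that should be read with caution when the final goal is the quality of the synthesized speech.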

Noisy channel performance was evaluated at bit error rates of 5 x 10^-3, 4 x 10^-2 and 10^-1, depending on the selected class of bit protection. The IS-54, SVQ-IP and FVSQ quantizers have shown a similar performance, while the AVSQ scheme provided the worst quality of synthesized speech. It should be remarked that the IS-54 scheme uses 10 additional bits for each LSF frame.

In terms of computational complexity, the SVQ-IP quantizer is about 10 times simpler than the TS-MSVQ. In addition, as compared to the SVQ-IP system, the complexity of the FVSQ is 60% higher. We therefore conclude that the SVQ-IP quantizer may be an attractive approach to encode the spectral parameters for low bit rate speech coders using the CELP approach.

Manuscript received on May 24, 1994.

REFERENCES

[1] M. R. Schroeder, B. S. Atal: Code-Excited Linear Prediction (CELP): high-quality speech at very low bit rates. Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, Tampa, USA, 1985, p. 937-940.

[2] Electronic Industries Association (EIA): Cellular systems. Report IS-54, December 1989.

[3] J. Grass, P. Kabal: Methods of improving Vector-Scalar Quantization of LPC coefficients. Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, Toronto, Canada, May 1991, p. 657-660.

[4] B. Bhattacharya, W. LeBlanc, S. Mahmoud, V. Cuperman: Tree searched multi-stage Vector Quantization of LPC parameters for 4 kb/s speech coding. Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, San Francisco, USA, March 1992, p. I-105-I-108.

[5] J. R. B. de Marca: An LSF quantizer for the North-American half rate speech coder. "IEEE Trans. Vehicular Technology", August 1994.

[6] N. Sugamura, F. Itakura: Speech analysis and synthesis methods developed at ECL in NTT - from LPC to LSP. "Speech Communication", Vol. 5, 1986, p. 199-215.

[7] Y. Linde, A. Buzo, R. M. Gray: An algorithm for Vector Quantizer design. "IEEE Trans. on Communications", Vol. COM-28, January 1980, p. 84-95.

[8] R. Laroia, N. Phamdo, N. Farvardin: Robust and efficient quantization of speech LSP parameters using structured Vector Quantizers. Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, Toronto, Canada, May 1991, p. 641-644.

[9] K. K. Paliwal, B. S. Atal: Efficient Vector Quantization of LPC parameters at 24 bits/frame. Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, Toronto, Canada, May 1991, p. 661-664.

[10] EIA/TIA Interim Standard: Cellular system dual-mode mobile station-base station compatibility standard IS-54. May 1990.

[11] I. A. Gerson, M. A. Jasiuk: Vector Sum Excited Linear Prediction (VSELP) speech coding at 8 kb/s. Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, April 1990, p. 461-462.

[12] A. Alcaim, J. A. Solewicz, J. A. Moraes: Frequency of occurrence of phones and lists of phonetically balanced sentences of the Portuguese spoken in Rio de Janeiro (in Portuguese). "Revista da Sociedade Brasileira de Telecomunicações", Vol. 7, No. 1, December 1992, p. 23-41.

[13] J. R. B. de Marca, N. Farvardin, N. S. Jayant, Y. Shoham: Robust Vector Quantization for noisy channels. Proc. Mobile Satellite Conference, Pasadena, USA, May 1988, p. 515-520.

[14] N. Farvardin: A Study of Vector Quantization for noisy channels. "IEEE Trans. on Information Theory", Vol. 36, No. 4, July 1990, p. 799-809.

[15] W. B. Kleijn: Source dependent channel coding and its application in CELP. AT&T Bell Laboratories Technical Memorandum, January 1991.

S. L. Q. Dall'Agnol, A. Alcaim, J. R. B. de Marca: Performance of LSF Vector Quantizers for VSELP Coders in Noisy Channels. ETT, Vol. 5, No. 5, September - October 1994, p. 553-563.

