
Performance of LSF Vector Quantizers for VSELP Coders in Noisy Channels

Sonia L. Q. Dall’Agnol, Abraham Alcaim, Jose Roberto B. de Marca
CETUC-PUC/Rio, 22453-900, Rio de Janeiro - RJ, Brasil

Abstract. Efficient quantization of the synthesis filter coefficients of CELP (Code Excited Linear Prediction) coders is essential to achieve high-quality speech at low rates. Three vector quantizers with good potential for use in low-rate coders are studied. Each of them is implemented inside the structure of the VSELP (Vector Sum Excited Linear Prediction) coder, an important member of the class of CELP coders. These three vector quantizers are used to encode LSF (Line Spectral Frequency) parameters and are compared in terms of robustness to channel errors, complexity and quality of the synthesized speech. The quality of the synthesized speech is evaluated using the objective measure of frequency-weighted signal-to-noise ratio and subjective results obtained from listening tests. With the purpose of improving robustness to channel errors, the application of simulated annealing to assign binary indices to the output levels of the quantizer is also investigated. A split vector quantization scheme which employs interframe prediction has been shown to be an attractive approach to encoding the synthesis filter parameters. It provides performance comparable to that of the IS-54 scheme while using 10 fewer bits for each LSF frame.

1. INTRODUCTION

Code Excited Linear Prediction (CELP) [1] is a class of speech coders which presently represents the best strategy for high-quality digital transmission of speech at low rates. An important technique which falls into this class of coders, and which is used throughout this paper, is Vector Sum Excited Linear Prediction (VSELP) [2]. The VSELP scheme was standardized in the North American Recommendation EIA/TIA-IS-54 for digital cellular telephony. In this Recommendation the codec operates at a total rate of 13 kbit/s, of which 5 kbit/s are assigned to error detection and correction. The next generation of speech coders for this cellular system is required to operate at a final rate of 6.5 kbit/s (half rate), to be robust to the effects of channel errors and to perform as well as the present standard.

To achieve high-quality speech at low rates, special attention has to be paid to the efficient quantization of the spectral, or Linear Predictive Coding (LPC), parameters. In order to reduce spectral distortion at low rates (less than 30 bits/frame), vector quantization (VQ) has been considered an attractive approach. In addition, the use of line spectral frequencies (LSF) for LPC parameter representation has proven to be a good alternative at low rates. Recently [3, 5], several LSF quantizers have been proposed and analysed in isolation, using as performance measure the spectral distortion between the quantized and the original unquantized filters. In this paper we examine several efficient LSF vector quantizers when inserted in a complete VSELP speech coder operating over a noisy channel.

Section 2 of this paper describes the quantization strategies. An analysis of these quantizers and a comparison with the one used in the North American Recommendation EIA/TIA-IS-54 for digital cellular telephony are presented in section 3, with average spectral distortion and percentage of outliers as performance criteria. Section 4 provides a brief description of the VSELP coding structure, and complexity issues are discussed in section 5. A comparative performance analysis of the quantizers in a VSELP scheme is presented in section 6 for ideal channels and in section 7 for noisy channels. Section 7 also reports results for the application of simulated annealing to assign binary codewords to the output levels of the quantizer codebook. Concluding remarks are presented in section 8.

Vol. 5, No. 5, September-October 1994



2. QUANTIZER DESCRIPTION


Three vector quantizers were considered in this work, representing three different quantizer design philosophies. None of these techniques employs a single quantizer; instead, they use several reduced-size codebooks in order to avoid the excessive complexity of a very large VQ.

Before quantization, the set of short-term predictor coefficients is converted to the LSF representation, which has been shown to possess very good quantization and interpolation properties [6]. Throughout this paper it is assumed that the linear prediction filter is of order 10.

The vector quantizers were designed using the well-known LBG algorithm [7] or a variation thereof. The training set consisted of 10,340 LSF vectors, corresponding to 207 s of speech material spoken by 13 male and 12 female subjects. The quantizer family described in section 2.1 also requires a bank of scalar quantizers. These one-dimensional quantizers were obtained from a smaller set of 1470 vectors derived from 29.4 s of speech, again using both male and female speakers.

The distortion measure adopted both for the design and for the actual implementation of all the quantizers was the weighted Euclidean distance defined by:

$$ d_w(\mathbf{f}, \hat{\mathbf{f}}) = \sum_{j=1}^{10} w_j \left( f_j - \hat{f}_j \right)^2 \qquad (1) $$

where $\hat{\mathbf{f}}$ is the quantized version of the LSF vector $\mathbf{f} = (f_1, f_2, \ldots, f_{10})$. The weights $w_j$, $j = 1, 2, \ldots, 10$, are defined to provide a better quantization of the LSF parameters in the formant regions of the spectrum, especially around the lower-frequency formants, which are more important for auditory perception. The weights were first proposed in [8] and are inspired by the property that whenever the speech spectrum has a peak there are at least two LSFs close together. The weights for each frame are given by:

$$ w_j = \frac{1}{f_j - f_{j-1}} + \frac{1}{f_{j+1} - f_j}, \qquad j = 1, 2, \ldots, 10 \qquad (2) $$

with $f_0 = 0$ and $f_{11} = \pi$. Note that the set of weights has to be computed for each LSF vector to be quantized.
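The weighting and the distance above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, assuming LSFs in radians and inverse-harmonic-mean weights w_j = 1/(f_j - f_{j-1}) + 1/(f_{j+1} - f_j) with f_0 = 0 and f_11 = π; the function names are hypothetical.

```python
import numpy as np

def lsf_weights(f):
    """Weights w_j = 1/(f_j - f_{j-1}) + 1/(f_{j+1} - f_j), with f_0 = 0 and f_11 = pi."""
    padded = np.concatenate(([0.0], np.asarray(f, dtype=float), [np.pi]))
    gaps = np.diff(padded)                 # gaps[k] = padded[k+1] - padded[k]
    return 1.0 / gaps[:-1] + 1.0 / gaps[1:]

def weighted_distortion(f, f_hat, w):
    """Weighted squared Euclidean distance sum_j w_j (f_j - f_hat_j)^2."""
    diff = np.asarray(f, dtype=float) - np.asarray(f_hat, dtype=float)
    return float(np.sum(np.asarray(w) * diff ** 2))

# Example: two closely spaced LSFs (a spectral peak) receive large weights.
f = np.array([0.20, 0.25, 0.60, 1.00, 1.30, 1.70, 2.00, 2.30, 2.60, 2.90])
w = lsf_weights(f)
print(w[:2])  # roughly 25.0 and 22.9
```

Because the weights depend on the input vector, they are recomputed for every frame before the codebook search.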

The principal features of the three classes of quantizers are as follows.

2.1. Vector-Scalar Quantizer (VSQ)

The concept of vector-scalar quantization was explored by Grass and Kabal in [3]. It consists of a two-step quantization procedure. In the first stage a moderately complex vector quantizer is applied to the input LSF vector. The quantization error resulting from this first step is then processed by a bank of scalar quantizers. If the two stages are operated independently, the performance of this technique is not very impressive. However, the reproduction can be significantly improved if, instead of selecting only the nearest codevector in the vector stage, a set of N best codevectors is saved. All N first-step quantization errors are then discretized by the bank of scalar quantizers, and the combination of VQ codevector and scalar output levels which produces the lowest distortion for the particular input vector is selected to represent it. More formally, let $\mathbf{c}_i$, $i = 1, \ldots, N$, be the N codevectors closest to the input vector $\mathbf{f}_{in}$. The input to the bank of scalar quantizers will be $\mathbf{e}_i = \mathbf{f}_{in} - \mathbf{c}_i$, producing at the output the quantized version $\hat{\mathbf{e}}_i = (\hat{e}_{i1}, \ldots, \hat{e}_{i10})$. The procedure selects as the best output of the VQ the codevector $\mathbf{c}_i$ which minimizes $d_w(\mathbf{f}_{in}, \mathbf{c}_i + \hat{\mathbf{e}}_i)$, as illustrated in Fig. 1.

To allow a fair comparison, all the LSF quantizers addressed in this paper operate at a rate of 28 bits per LSF vector. For the VSQ the total number of bits has to be apportioned between the two stages. The VQ portion was implemented with 8 bits, leaving 20 bits for the scalar part. Limited tests indicated that a good distribution of these 20 bits among the 10 scalar quantizers is (3, 3, 2, 2, 2, 2, 2, 2, 1, 1). The search breadth N was taken to be 5.
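The N-best vector-scalar search can be sketched as follows. The 8-bit first-stage codebook, the 20-bit scalar allocation and N = 5 follow the text; the uniform scalar quantizers over a fixed range [-0.1, 0.1] are a hypothetical stand-in for the trained scalar quantizers, and all names are illustrative.

```python
import numpy as np

BIT_ALLOC = (3, 3, 2, 2, 2, 2, 2, 2, 1, 1)  # scalar bits per LSF residual (20 bits total)

def scalar_quantize(x, bits, lo=-0.1, hi=0.1):
    """Hypothetical uniform quantizer on [lo, hi] with 2**bits levels (trained in the paper)."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    idx = int(np.clip(np.floor((x - lo) / step), 0, levels - 1))
    return idx, lo + (idx + 0.5) * step

def vsq_encode(f_in, vq_codebook, weights, n_best=5, bit_alloc=BIT_ALLOC):
    """Keep the n_best first-stage candidates, scalar-quantize each residual and return
    (distortion, vq_index, scalar_indices, f_hat) with the lowest weighted distortion."""
    d_vq = np.sum(weights * (vq_codebook - f_in) ** 2, axis=1)
    best = None
    for i in np.argsort(d_vq)[:n_best]:
        residual = f_in - vq_codebook[i]
        pairs = [scalar_quantize(r, b) for r, b in zip(residual, bit_alloc)]
        idx = [p[0] for p in pairs]
        f_hat = vq_codebook[i] + np.array([p[1] for p in pairs])
        d = float(np.sum(weights * (f_in - f_hat) ** 2))
        if best is None or d < best[0]:
            best = (d, int(i), idx, f_hat)
    return best

rng = np.random.default_rng(0)
cb = np.sort(rng.uniform(0.1, 3.0, (256, 10)), axis=1)  # placeholder 8-bit codebook
w = np.ones(10)
f_in = cb[17] + 0.01
d, i, idx, f_hat = vsq_encode(f_in, cb, w)
```

In a real coder the weights would be recomputed for each input vector and the scalar levels would come from training rather than a uniform grid.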

This quantizer, which employs a fixed codebook for the first stage, will from now on be referred to as Fixed VSQ (FVSQ).

A further increase in performance can be obtained if the correlation between successive LSF vectors is exploited. Grass and Kabal suggest that a small fraction of the VQ codebook should change so as to include the R most recent quantized values of $\mathbf{f}$. To make room for this adaptive set, the R codevectors least selected during the last iteration of the training procedure are removed from the codebook. In our case, R was taken to be 6 and the removed codevectors had a combined selection rate of 0.69%.

One drawback of this adaptive-codebook vector-scalar quantizer (AVSQ) is its vulnerability to channel error propagation. Each incorrect vector added to the codebook may generate future incorrect decodings if it is selected again before being dropped from the codebook. To minimize this effect, the adaptive portion of the codebook should be reset periodically. Here, every 30 LSF frames the codebook is reinitialized with the R vectors originally removed after training.
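A possible sketch of the adaptive portion of the AVSQ codebook is given below, with R = 6 adaptive slots in a 256-entry codebook and a reset every 30 frames. Whether the vector decoded on a reset frame is also inserted is not specified in the text; this sketch assumes it is not. The class name and its interface are hypothetical.

```python
import numpy as np
from collections import deque

class AdaptiveVQCodebook:
    """Fixed codevectors plus R adaptive slots holding the most recent decoded LSF
    vectors; the adaptive part is reset every `reset_period` frames."""

    def __init__(self, fixed_part, removed_vectors, reset_period=30):
        self.fixed = np.asarray(fixed_part, dtype=float)
        self.initial = [np.asarray(v, dtype=float) for v in removed_vectors]
        self.reset_period = reset_period
        self.frame = 0
        self.recent = deque(self.initial, maxlen=len(self.initial))

    def codebook(self):
        """Current first-stage codebook: fixed rows followed by the R adaptive rows."""
        return np.vstack([self.fixed, np.array(self.recent)])

    def update(self, f_hat):
        """Run once per frame, identically at coder and decoder, with the decoded vector."""
        self.frame += 1
        if self.frame % self.reset_period == 0:
            # Periodic reset limits the propagation of channel errors.
            self.recent = deque(self.initial, maxlen=len(self.initial))
        else:
            self.recent.append(np.asarray(f_hat, dtype=float))  # oldest entry drops out

fixed = np.zeros((250, 10))
removed = [np.full(10, float(k)) for k in range(6)]
book = AdaptiveVQCodebook(fixed, removed)
```

Because coder and decoder update the codebook from decoded vectors only, they stay synchronized in the absence of channel errors; a single corrupted index desynchronizes them until the next reset.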


2.2. Tree-searched multi-stage Vector Quantizer (TS-MSVQ)

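In a multi-stage vector quantizer the input vector $\mathbf{f}_{in}$ is approximated by a sum:

$$ \mathbf{f}_{in} \approx \sum_{j=1}^{N} \mathbf{c}^{(j)} \qquad (3) $$

where $\mathbf{c}^{(j)}$ is the reproduction vector selected at the j-th quantization stage and N is the number of stages.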

In general, the input to the j-th stage is the quantization error resulting from stage (j - 1). For example, the input to stage 2 is the error vector $\mathbf{e}^{(1)} = \mathbf{f}_{in} - \mathbf{c}^{(1)}$. This sequential search procedure is not very efficient, and the performance of the technique usually saturates after only two or three stages. Ideally, the quantizer should search all possible combinations of stage output levels and select the combination yielding the smallest error. However, this optimal approach is far too complex. An intermediate solution was proposed in [4], where the authors suggest saving the M closest reproduction levels from one stage to the next. Thus, following stage j, the system computes M error vectors between $\mathbf{e}^{(j-1)}$ and $\mathbf{c}_k^{(j)}$, k = 1, 2, ..., M. For each of these error vectors, the M best output levels of quantizer (j + 1) are selected. Out of the $M^2$ choices resulting from the processing of all error vectors, the M alternatives producing the smallest errors are considered as inputs for stage (j + 2). After the last stage, the set of quantization outputs which produces the smallest weighted distortion is selected to reproduce $\mathbf{f}_{in}$.

In this study a search breadth value of 4 was adopted and four stages of 128 levels were used, yielding the desired rate of 28 bits per LSF frame.
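The M-L tree search described above can be sketched as follows, with M = 4 and four 128-level stages giving 4 × 7 = 28 bits. The codebooks here are random placeholders rather than LBG-trained ones, and the weights are passed in rather than recomputed per vector; the function name is hypothetical.

```python
import numpy as np

def msvq_tree_search(f_in, codebooks, weights, m_best=4):
    """M-L tree search for a multi-stage VQ: after each stage keep the m_best
    partial paths with the lowest weighted error of the reconstruction so far."""
    paths = [((), np.zeros_like(f_in))]        # (stage indices, reconstruction)
    for cb in codebooks:
        expanded = []
        for indices, recon in paths:
            residual = f_in - recon            # input to this stage along this path
            d = np.sum(weights * (residual - cb) ** 2, axis=1)
            for k in np.argsort(d)[:m_best]:   # M best levels for this path
                expanded.append((d[k], indices + (int(k),), recon + cb[k]))
        expanded.sort(key=lambda t: t[0])      # M*M candidates ranked by distortion
        paths = [(idx, rec) for _, idx, rec in expanded[:m_best]]
    return paths[0]                            # lowest-distortion complete path

rng = np.random.default_rng(1)
codebooks = [s * rng.standard_normal((128, 10)) for s in (1.0, 0.1, 0.01, 0.001)]
f_in = codebooks[0][5] + codebooks[1][7] + codebooks[2][9] + codebooks[3][11]
idx, rec = msvq_tree_search(f_in, codebooks, np.ones(10))
```

With m_best = 1 the function reduces to the plain sequential search; larger values trade search effort for a better chance of finding a good combination of stage outputs.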

Training of the four codebooks was done sequentially. The codebook for the first stage was designed first; the error vectors resulting from the use of this first quantizer (on the speech database already described) were then used to train the second-stage VQ, and so forth.

The tree search procedure does not guarantee that the quantized LSF vector satisfies the ordering property required for prediction filter stability. Hence a test should be included in the final step of the quantization algorithm to guarantee that the chosen value of $\hat{\mathbf{f}}$ has the necessary property.
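A minimal form of such a final-step test is sketched below: an LSF vector is accepted only if 0 < f_1 < f_2 < ... < f_10 < π. Scanning the surviving tree-search candidates in order of increasing distortion is an assumption about how the check could be combined with the search; the helper names are hypothetical.

```python
import numpy as np

def is_ordered(f_hat):
    """True when 0 < f_1 < f_2 < ... < f_10 < pi (the LSF ordering property)."""
    f = np.concatenate(([0.0], np.asarray(f_hat, dtype=float), [np.pi]))
    return bool(np.all(np.diff(f) > 0))

def first_ordered(candidates):
    """Return the first candidate satisfying the ordering property, assuming
    `candidates` is sorted by increasing distortion; None if none qualifies."""
    for f_hat in candidates:
        if is_ordered(f_hat):
            return f_hat
    return None

good = np.linspace(0.2, 2.9, 10)
bad = good.copy()
bad[[3, 4]] = bad[[4, 3]]  # swap two LSFs to break the ordering
```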

2.3. Interframe predictive split Vector Quantizer (SVQ-IP)

This quantizer is basically a split VQ [9] coupled with first-order interframe prediction. Recall that in split vector quantization the vector of LSFs is partitioned into M subsets and each subset is then coded using a different codebook. As M grows there is a reduction both in performance and in complexity of implementation. To avoid this performance loss when a split VQ of moderate complexity is desired, it was proposed in [5] to take advantage of the high interframe correlation exhibited by the LSF coefficients. The scheme introduced in [5] applies prediction only on every other frame so as to avoid error propagation and slope overload. Specifically, let us assume that frame $\mathbf{f}(i)$ is not quantized differentially (i.e., the actual LSFs are quantized); then in the next frame the quantity to be discretized is:

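$$ \mathbf{e}(i+1) = \mathbf{f}(i+1) - \mathrm{diag}(\boldsymbol{\alpha})\, \hat{\mathbf{f}}(i) \qquad (4) $$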

where $\boldsymbol{\alpha}$ is the vector of prediction coefficients. The following LSF vector is again quantized absolutely (not differentially).

Following the suggestion in [5], a value of M = 4 was adopted in this work. The bit distribution for the frame which is quantized absolutely (frame A) and the frame which is predicted (frame B) is as shown in Table 1.

Codebook design was performed following the procedure outlined in [5], using the previously mentioned speech database. Both during training and during actual operation, a test is implemented to avoid unstable sets of quantized coefficients [5].
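The alternation between absolutely quantized frames (A) and predicted frames (B) can be sketched as follows. The 4-way split of the 10 LSFs, the codebooks, and the element-wise predictor e(i+1) = f(i+1) - alpha * f_hat(i) (using the previous decoded frame, with no mean removal) are assumptions made for illustration; the actual subset sizes and bit allocation are those of Table 1.

```python
import numpy as np

SPLITS = [(0, 3), (3, 6), (6, 8), (8, 10)]  # hypothetical 4-way split of the 10 LSFs

def split_vq(x, codebooks, weights):
    """Quantize each sub-vector independently with its own codebook."""
    out = np.empty_like(x)
    indices = []
    for (a, b), cb in zip(SPLITS, codebooks):
        d = np.sum(weights[a:b] * (cb - x[a:b]) ** 2, axis=1)
        k = int(np.argmin(d))
        indices.append(k)
        out[a:b] = cb[k]
    return indices, out

def svq_ip_encode(frames, cb_abs, cb_pred, alpha, weights):
    """Even frames (A) are quantized absolutely; odd frames (B) quantize the residual
    of a first-order prediction from the previous decoded frame."""
    decoded = []
    for i, f in enumerate(frames):
        if i % 2 == 0:
            _, f_hat = split_vq(f, cb_abs, weights)
        else:
            pred = alpha * decoded[-1]         # element-wise prediction
            _, e_hat = split_vq(f - pred, cb_pred, weights)
            f_hat = pred + e_hat
        decoded.append(f_hat)
    return decoded

rng = np.random.default_rng(2)
f0 = np.linspace(0.2, 2.9, 10)
cb_abs = [np.vstack([f0[a:b], rng.uniform(0, 3, (15, b - a))]) for a, b in SPLITS]
cb_pred = [np.vstack([np.zeros(b - a), rng.uniform(-0.3, 0.3, (15, b - a))]) for a, b in SPLITS]
alpha = np.full(10, 0.7)
dec = svq_ip_encode([f0, alpha * f0], cb_abs, cb_pred, alpha, np.ones(10))
```

Because every A frame is coded without prediction, a channel error in a B frame cannot propagate beyond that frame.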


Table 1 - Bit assignment for M = 4, 28 bit/frame SVQ-IP.

Subsets    Frame A    Frame B
           7          6

3. PERFORMANCE OF DIFFERENT QUANTIZERS

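Initially, the stand-alone performance of the quantizers described in the previous section, plus that of the 38-bit scalar quantizer employed in the IS-54 standard [2], is evaluated in terms of the following filtered spectral distortion measure:

$$ SD_f = \left[ \frac{1}{F_2 - F_1} \int_{F_1}^{F_2} \left[ 10 \log_{10} S(f) - 10 \log_{10} \hat{S}(f) \right]^2 df \right]^{1/2} \qquad (5) $$

where $S(f)$ and $\hat{S}(f)$ are the original and reconstructed spectra associated with a given LSF frame, and $F_1$ and $F_2$ delimit the frequency band over which the distortion is computed.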
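The distortion measure and the outlier statistics reported in Table 2 can be approximated numerically as sketched below, by sampling the two all-pole spectra on a uniform frequency grid. The band edges f_lo and f_hi, the 8 kHz sampling rate and unit filter gains are assumptions, and the integral is replaced by a simple average over grid points.

```python
import numpy as np

def log_spectrum_db(a, freqs_hz, fs=8000.0):
    """10 log10 |1/A(e^{jw})|^2 for A(z) = 1 - sum_i a_i z^{-i} (unit gain assumed)."""
    w = 2 * np.pi * np.asarray(freqs_hz, dtype=float) / fs
    i = np.arange(1, len(a) + 1)
    A = 1 - np.exp(-1j * np.outer(w, i)) @ np.asarray(a, dtype=float)
    return -10 * np.log10(np.abs(A) ** 2)

def spectral_distortion(a, a_hat, f_lo, f_hi, n_points=256, fs=8000.0):
    """RMS difference (dB) between the two log spectra over the band [f_lo, f_hi] Hz."""
    freqs = np.linspace(f_lo, f_hi, n_points)
    diff = log_spectrum_db(a, freqs, fs) - log_spectrum_db(a_hat, freqs, fs)
    return float(np.sqrt(np.mean(diff ** 2)))

def outlier_stats(sd_values):
    """Average SD and percentages of frames between 2 and 4 dB and above 4 dB."""
    sd = np.asarray(sd_values, dtype=float)
    return sd.mean(), 100 * np.mean((sd > 2) & (sd <= 4)), 100 * np.mean(sd > 4)
```

Averaging over an evenly spaced grid is equivalent to the normalized integral up to discretization error; a finer grid reduces that error.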

In a later section the objective and subjective performance of these quantizers when they are used in a VSELP scheme will be described.

Table 2 contains the values of average spectral distortion for the four vector quantizers operating at 28 bits/frame and for the above-mentioned scalar quantizer. The percentage of outlier frames (frames with distortion higher than 2 dB) is also given in Table 2 for all the quantizers.

Table 2 - Values of average spectral distortion and percentage of outlier frames for four 28 bit/frame vector quantizers and a scalar quantizer operating at 38 bit/frame.

Quantizer    SD_f (dB)    Outliers (%), 2 to 4 dB    Outliers (%), > 4 dB
IS-54        0.7782       0.118                      0
SVQ-IP       1.0488       3.371                      0.335
AVSQ         1.1215
TS-MSVQ      1.0255

As can be seen, the scalar quantizer presents the lowest distortion, but it also uses 10 additional bits for each LSF frame when compared with the vector quantizers. Among the VQs, the TS-MSVQ and the SVQ-IP have comparable behavior. The worst performance in terms of average distortion is provided by the family of vector-scalar quantizers. As expected, the FVSQ cannot match the distortion yielded by its adaptive version. On the other hand, the vector-scalar quantizers produce the smallest percentage of outlier frames among the VQs, although the other two classes also appear to provide acceptable numbers.

4. VSELP CODER

This section presents a brief overview of the Vector Sum Excited Linear Predictive (VSELP) speech coder. VSELP was selected in 1990 by the Telecommunications Industry Association (TIA) and the Electronic Industries Association (EIA) as the standard for North American and Japanese digital cellular telephone systems [2, 10, 11]. This work used the basic structure of this standard (EIA/TIA-IS-54).

Speech coding for digital cellular systems requires real-time operation and, therefore, reduced computational requirements. The VSELP speech coder achieves this goal through efficient use of structured excitation codebooks. Besides reducing computational complexity, the structured codebooks also reduce storage requirements and increase robustness to channel errors. Two VSELP excitation codebooks are used to improve speech quality, and a new vector quantizer for the excitation gains is also employed to achieve high coding efficiency and robustness to channel errors.

4.1. Basic coder/decoder parameters

Fig. 2 a) shows a block diagram of the speech coder. VSELP employs speech frames of 20 ms (160 samples), generating the following output parameters:

$r_i$ (i = 1, 2, ..., 10): reflection coefficients of the synthesis filter;
R0: frame energy;
L: long-term predictor lag;
I, H: indices of the excitation vectors from codebooks 1 and 2, respectively;
GSP0: index of the vector gain quantizer for the excitation gains.

The parameters $r_i$ and R0 are computed once per frame, and the others are computed in every 5 ms subframe. The basic data rate of the speech coder is 8 kbit/s (without error protection bits). There are 160 bits per frame (20 ms), allocated as shown in Table 3.
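The frame-level bit budget can be checked with simple arithmetic. This sketch assumes the IS-54 allocation of Table 3: 38 bits for the r_i, 5 bits for R0 (the remainder of the 160-bit budget), and 7, 7, 7 and 8 bits per subframe for L, I, H and the gain index; the dictionary keys are illustrative labels.

```python
# Bit budget of the 8 kbit/s VSELP coder: 20 ms frames with four 5 ms subframes.
FRAME_MS = 20
SUBFRAMES = 4

per_frame = {"r_i (LPC)": 38, "R0 (energy)": 5}            # sent once per frame
per_subframe = {"L (lag)": 7, "I (codebook 1)": 7,
                "H (codebook 2)": 7, "GSP0 (gains)": 8}       # sent every subframe
UNUSED = 1

total = sum(per_frame.values()) + SUBFRAMES * sum(per_subframe.values()) + UNUSED
rate_bps = total * 1000 // FRAME_MS
print(total, rate_bps)  # 160 8000

# A 28-bit LSF vector quantizer instead of the 38-bit scalar one saves 10 bits/frame.
saved = per_frame["r_i (LPC)"] - 28
print(saved * 1000 // FRAME_MS)  # 500 (bit/s)
```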

The analog speech signal is converted to a digital signal in uniform PCM format (with a minimum resolution of 13 bits), and it may be desirable in some instances to provide additional high-pass filtering of the digital signal. A fourth-order high-pass filter whose response is 3 dB down at 120 Hz and 40 dB down at 60 Hz is recommended. The ten reflection coefficients of the weighted synthesis filter are computed from the high-pass filtered input speech through LPC analysis using the fixed-point covariance lattice algorithm (FLAT). The reflection coefficients are then converted into the prediction coefficients $\alpha_i$, which are used in the fourth subframe of each frame. Before their use in the first, second and third subframes, the prediction coefficients are linearly interpolated. An overall frame energy is also computed and coded once per frame. This energy value R0 reflects the average signal power of the input speech over a 20 ms interval centered on the middle of the fourth subframe.

Fig. 2 - The VSELP Codec: a) Coder; b) Decoder.

Table 3 - Bit Allocation for 8 kbit/s Coder.

Parameter                    Bits/subframe    Bits/frame (20 ms)
Filter coefficients: r_i     -                38
Energy: R0                   -                5
Excitation indices: I, H     7 + 7            56
Lag: L                       7                28
Gain index: GSP0             8                32
Unused                       -                1
Total                                         160

The high-pass filtered input speech is processed by the weighting filter $W(z)$, and its output sequence is compared with the output of the weighted synthesis filter $H_w(z)$, which is excited by each codevector of each codebook. The indices of the codevectors that generate the minimum weighted error power are chosen as the indices for that subframe. The minimization of the weighted error power is thus carried out for each excitation codebook and also for the excitation gain codebook GS-P0-P1.

VSELP uses three excitation sequences: two of them come from fixed, structured codebooks and the third comes from an adaptive codebook that represents the long-term filter state. Each of the structured codebooks is constructed from a set of M basis vectors (M = 7 in the IS-54 standard) that are linearly combined to produce $2^M$ excitation sequences. These properties of the VSELP codebook construction greatly simplify the computation required to find the minimum weighted error power. The adaptive codebook is updated in each subframe with the combined excitation $e_x(n)$, computed as the sum of the three excitation sequences $b_L(n)$, $v_{1,I}(n)$ and $v_{2,H}(n)$ multiplied by their respective gains $\beta$, $\gamma_1$ and $\gamma_2$.
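The basis-vector construction of a VSELP codebook can be sketched as follows, using the usual sign convention in which bit m of the codeword index selects +1 or -1 for basis vector m. The 40-sample subframe (5 ms at 8 kHz) and the random basis vectors are placeholders; only the M basis vectors need to be stored, not the 2^M codevectors.

```python
import numpy as np

def vselp_codebook(basis):
    """Build the 2**M codevectors u_i = sum_m theta_im v_m, where theta_im = +1
    if bit m of i is set and -1 otherwise."""
    basis = np.asarray(basis, dtype=float)            # shape (M, subframe_length)
    M = basis.shape[0]
    i = np.arange(2 ** M)[:, None]
    theta = np.where((i >> np.arange(M)) & 1, 1.0, -1.0)  # shape (2**M, M)
    return theta @ basis

rng = np.random.default_rng(4)
basis = rng.standard_normal((7, 40))  # M = 7 placeholder basis vectors
cb = vselp_codebook(basis)
```

Complementary indices (all bits flipped) give codevectors of opposite sign, so a search only needs to examine half of the codebook and pick the sign through the gain.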

The gains $\beta$, $\gamma_1$ and $\gamma_2$ are jointly optimized to minimize the weighted error power. In fact, the vector quantization of $\beta$, $\gamma_1$, $\gamma_2$ is replaced by the joint vector quantization of the mathematically equivalent parameters GS, P0 and P1. Coding the {GS, P0, P1} vector is advantageous over coding $\beta$, $\gamma_1$, $\gamma_2$ because {GS, P0, P1} is independent of the input signal level (owing to the normalization by the absolute signal energy when R0 is quantized) and because {GS, P0, P1} is bounded. The codebook for {GS, P0, P1} comprises 256 codevectors.

The block diagram of the VSELP speech decoder is shown in Fig. 2 b). In practice, the decoder is a subset of the encoder, owing to the analysis-by-synthesis procedure used in this type of speech coder. The only stage in the decoder that is not present in the coder is the adaptive spectral post-filter, which is used to enhance the perceptual quality of the synthetic speech.

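The synthesis filter $H(z)$ and the weighted synthesis filter $H_w(z)$ are given by

$$ H(z) = \frac{1}{1 - \sum_{i=1}^{10} \alpha_i z^{-i}} \qquad (6) $$

$$ H_w(z) = \frac{1}{1 - \sum_{i=1}^{10} \alpha_i \lambda^{i} z^{-i}}, \quad \text{where } \lambda = 0.8 \qquad (7) $$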
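The relation between H(z) and H_w(z) amounts to scaling each predictor coefficient by λ^i. The sketch below computes those coefficients and runs a plain direct-form all-pole filter with zero initial state; it is an illustration in floating point, not the fixed-point implementation of the standard.

```python
import numpy as np

def bandwidth_expand(alpha, lam=0.8):
    """Coefficients alpha_i * lam**i of the weighted synthesis filter H_w(z)."""
    alpha = np.asarray(alpha, dtype=float)
    return alpha * lam ** np.arange(1, len(alpha) + 1)

def all_pole_filter(coeffs, x):
    """y(n) = x(n) + sum_i coeffs[i-1] * y(n-i), i.e. 1 / (1 - sum_i c_i z^-i)."""
    p = len(coeffs)
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for i in range(1, min(p, n) + 1):
            acc += coeffs[i - 1] * y[n - i]
        y[n] = acc
    return y

impulse = np.array([1.0, 0.0, 0.0, 0.0])
h = all_pole_filter([0.5], impulse)  # impulse response of 1 / (1 - 0.5 z^-1)
```

Scaling by λ^i moves the poles radially toward the origin, which widens the formant bandwidths of H_w(z) relative to H(z).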

4.2. Channel error control

The VSELP codec was designed for digital cellular systems, where the radio channels used for transmission can be very noisy. It is therefore very important to provide additional protection for the data stream produced by the speech coder. This data stream consists of the binary codes representing the 27 different parameters computed every 20 ms. The degradation of speech quality due to channel transmission errors depends on which parameter, and which bit representing that parameter, is affected by the error. As a consequence, the protection mechanism for the speech coder data stream must provide a non-uniform amount of error correction capability over the data stream.

The channel error control for the speech coder data employs three mechanisms for the mitigation of channel errors. The first is a rate one-half convolutional code that protects the more vulnerable bits. The second interleaves the transmitted data for each speech coder frame over two time slots to mitigate the effects of Rayleigh fading. The third employs a cyclic redundancy check over some of the most perceptually significant bits. With error protection included, the total rate of VSELP increases to 13 kbit/s.

5. COMPLEXITY ISSUES

The selection of a given vector quantizer scheme for real time implementation can only be accomplished after careful consideration of the complexity involved.



The first step in evaluating the complexity of an algorithm is to define an adequate measure of complexity. Such a measure should reflect both the cost and the power requirements of a real-time implementation. In this work we adopted the measure described in [5]. This measure takes into account the number of operations, with weight factors associated with each type of operation, and also the data memory requirements, in terms of both static and dynamic memory. The amount of program memory needed is not included, since it depends on the actual hardware platform being used. Formally, this complexity measure can be expressed as:

$$ C = 0.2\, DM + 0.05\, SM + O \qquad (8) $$

where DM is the number of dynamic memory positions, SM the number of positions in ROM and O a weighted sum of the required arithmetic operations. The weight associated with each type of operation is given in Table 4.

The measure just defined was used to evaluate the complexity of the quantizers described in section 2. For some steps of the algorithms the number of operations depends on the actual vector being quantized. In these situations the number entered in the computation of O was half the number of operations that would be necessary in the worst-case scenario.

Table 4 - Weights associated with arithmetic operations.

Operation                  Example
Additions
Multiplications
Multiply-add               o = o * o + o
Data moves (float, int)
Add-compare-select         e.g. Viterbi decoding
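As a quick check of the complexity measure, the following sketch evaluates C = 0.2 DM + 0.05 SM + O for the TS-MSVQ entries of Table 5 (O = 155016, DM = 188, SM = 45056). The function name is illustrative.

```python
def complexity(dm, sm, ops):
    """C = 0.2*DM + 0.05*SM + O: DM dynamic memory positions, SM ROM positions,
    O the weighted count of arithmetic operations."""
    return 0.2 * dm + 0.05 * sm + ops

c_tsmsvq = complexity(dm=188, sm=45056, ops=155016)
print(round(c_tsmsvq))  # 157306
```

The result, about 157,306, agrees with the 157,307 listed in Table 5 to within rounding; the operation count O clearly dominates the memory terms for this quantizer.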

Table 5 - Complexity (C) of the vector quantizers defined in section 2.

Quantizer    O         DM (byte)    SM (byte)    C
FVSQ         22256     325          21874        23415
AVSQ                                             25806
SVQ-IP                                           14758
TS-MSVQ      155016    188          45056        157307

The values obtained for O, DM and SM for each quantizer are given in Table 5. As shown in this table, the SVQ-IP is the least complex quantizer. Its complexity is about one order of magnitude lower than that of the TS-MSVQ and roughly 60% of that of the fixed vector-scalar quantizer. The AVSQ is necessarily more complex to implement than its fixed version.

6. PERFORMANCE IN IDEAL CHANNEL

This section presents the ideal-channel performance of the LSF quantizers when used in a VSELP coding structure.

6.1. Performance Measures

The speech material used in our simulations comprises 8 utterances in Brazilian Portuguese taken from FM radio and from lists of phonetically balanced sentences [12]. Each utterance was spoken by one of 8 speakers (4 male and 4 female), referred to as M1, M2, M3, M4, F1, F2, F3 and F4. None of these utterances or speakers was used for training the quantizer codebooks.

As the objective performance measure we used the average value of the weighted signal-to-noise ratios of all frames in a test sequence. The weighted SNR for frame j is defined as

$$ SNR_w(j) = 10 \log_{10} \frac{\sum_{n=1}^{N} s^2(n)}{\sum_{n=1}^{N} \left[ s(n) - \hat{s}(n) \right]_w^2} \qquad (9) $$

where $s(n)$ and $\hat{s}(n)$ are the original and synthesized speech, $[s(n) - \hat{s}(n)]_w$ is the synthesis error weighted by $W(z)$, and N is the number of samples (N = 160) in the j-th CELP frame. The average value $\overline{SNR}_w$ for the K frames of the test sequence is given by

$$ \overline{SNR}_w = \frac{1}{K} \sum_{j=1}^{K} SNR_w(j) \qquad (10) $$
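The averaged weighted SNR can be sketched as below. This version assumes that the numerator uses the unweighted speech energy, that W(z) = A(z)/A(z/λ) with λ = 0.8, and, for brevity, that a single fixed set of predictor coefficients alpha is used for the whole signal; a real evaluation would update W(z) every frame. The function names are hypothetical.

```python
import numpy as np

def pole_zero_filter(b, a, x):
    """Direct form: y(n) = sum_k b[k] x(n-k) - sum_{k>=1} a[k] y(n-k), with a[0] = 1."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n >= k)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n >= k)
        y[n] = acc
    return y

def average_weighted_snr(s, s_hat, alpha, lam=0.8, frame_len=160):
    """Mean over frames of 10 log10(sum s^2 / sum e_w^2), where e_w is the synthesis
    error filtered by W(z) = A(z) / A(z/lam) and A(z) = 1 - sum_i alpha_i z^-i."""
    s = np.asarray(s, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    powers = np.arange(1, len(alpha) + 1)
    b = np.concatenate(([1.0], -alpha))                  # numerator A(z)
    a = np.concatenate(([1.0], -alpha * lam ** powers))  # denominator A(z/lam)
    e_w = pole_zero_filter(b, a, s - np.asarray(s_hat, dtype=float))
    snrs = []
    for start in range(0, len(s) - frame_len + 1, frame_len):
        frame = slice(start, start + frame_len)
        snrs.append(10 * np.log10(np.sum(s[frame] ** 2) / np.sum(e_w[frame] ** 2)))
    return float(np.mean(snrs))

n = np.arange(480)
s = np.sin(0.05 * n) + 0.5 * np.cos(0.31 * n)
snr_flat = average_weighted_snr(s, 0.9 * s, alpha=[0.0, 0.0])  # W(z) = 1
```

With alpha set to zero the weighting filter is the identity, so an error equal to 10% of the signal gives 20 dB in every frame.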

This measure presents some correlation with subjective performance because the error is emphasized at frequencies where the human ear is more sensitive. It should be remarked, however, that $SNR_w$ alone cannot predict all kinds of distortion present in the reconstructed speech, and it is not completely adequate for assessing particular impairments of the spectrum.

Subjective performance results were obtained for the IS-54 VSELP standard (spectral parameters quantized with 38 bits/frame) and for the VSELP using the LSF vector quantizers (with 28 bits/frame) investigated in this work, both in ideal and in noisy channels, and for three MNRU reference conditions with Q = 5, 15 and 25 dB. We also included the FVSQ quantizer with simulated annealing in noisy channels. The test samples were presented in random order to 18 listeners, who were asked to rate each one with scores ranging from 1 (unacceptable quality) to 5 (excellent quality). Results are given in terms of mean score, standard deviation and 95% confidence interval.

6.2. Simulation results

Objective performance results in ideal channel conditions are presented in Tables 6 and 7. The $SNR_w$ values are shown in Table 6 for each utterance and in Table 7 for the whole test sequence. Comparing these results with the analysis of the quantizers alone presented in section 3, we can see that the significant superiority of the IS-54 quantizer is no longer observed when it is used in the VSELP coder (even though it uses a higher rate). The weighted signal-to-noise ratios are similar for all systems.

Table 6 - $SNR_w$ (dB) for each utterance in ideal channel.

Utterance    IS-54    SVQ-IP    FVSQ     AVSQ     TS-MSVQ
M2           12.05    12.16     12.05    12.07    12.12
F2           10.46    10.41     10.44    10.38    10.40
M3           12.06    11.85     11.94    11.95    11.96
F3           14.12    13.94     13.94    13.94    14.19
M4           13.20    13.13     13.06    13.06    13.09
F4           15.45    15.35     15.29    15.36    15.28

Table 7 - $SNR_w$ for the whole test sequence in ideal channel.

              IS-54    SVQ-IP    FVSQ     AVSQ     TS-MSVQ
SNR_w (dB)    12.32    12.27     12.24    12.22    12.29

Statistics of the subjective quality tests are given in Table 8. Using these results we have plotted the confidence intervals shown in Fig. 3. It can be seen that the subjective results of the IS-54, the TS-MSVQ and the Q = 25 dB reference condition are comparable. The AVSQ scheme is slightly inferior, and the SVQ-IP technique yields the worst performance. It should be noted, however, that this ranking cannot be taken in absolute terms, because the confidence intervals overlap with each other.

Table 8 - Subjective test results in ideal channel.

              All speakers          Female speakers       Male speakers
Quantizer     Mean   S.D.   CI      Mean   S.D.   CI      Mean   S.D.   CI
IS-54         4.19   0.82   0.11    4.15   0.81   0.15    4.24   0.82   0.16
TS-MSVQ       4.21   0.75   0.09    4.24   0.75   0.13    4.18   0.75   0.13
AVSQ          4.12   0.92   0.14    4.18   0.86   0.17    4.06   0.97   0.22
SVQ-IP        4.02   0.83   0.11    4.15   0.79   0.15    3.89   0.84   0.17
Q = 25 dB     4.18   0.86   0.12    4.04   0.82   0.16    4.32   0.88   0.18

Fig. 3 - Confidence intervals in ideal channel (IS-54, TS-MSVQ, AVSQ, SVQ-IP and Q = 25 dB).

7. PERFORMANCE IN NOISY CHANNELS

In this section we present the error sensitivity of the bits representing the filter coefficients and a comparative performance analysis of the quantizers in a VSELP coder operating over noisy channels. To determine the error rate associated with each bit, the classes of protection must first be established. In this work we considered three classes of protection, 1A, 1 and 2, with error rates $5 \times 10^{-3}$, $4 \times 10^{-2}$ and $10^{-1}$, respectively.
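A simple model of the residual errors in each protection class is a memoryless binary symmetric channel per bit, as sketched below. Treating the residual errors as independent is an assumption (real channels with Rayleigh fading and interleaving are bursty), and the 8/4/16 class split in the example is hypothetical.

```python
import numpy as np

# Residual bit error rates per protection class (classes 1A, 1 and 2).
ERROR_RATES = {"1A": 5e-3, "1": 4e-2, "2": 1e-1}

def transmit(bits, classes, rng):
    """Flip each bit independently with the error rate of its protection class."""
    bits = np.asarray(bits, dtype=np.uint8)
    p = np.array([ERROR_RATES[c] for c in classes])
    flips = rng.random(len(bits)) < p
    return bits ^ flips.astype(np.uint8)

rng = np.random.default_rng(3)
frame_bits = rng.integers(0, 2, size=28, dtype=np.uint8)
received = transmit(frame_bits, ["1A"] * 8 + ["1"] * 4 + ["2"] * 16, rng)
```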


7.1. Bit error sensitivity

The classes of protection for the bits generated by the IS-54 quantizer and by the vector quantizers are given in Tables 9 and 10, respectively. To assess the sensitivity of each bit to channel errors, the bits were systematically changed in the output bit stream, one at a time, and the weighted signal-to-noise ratio $SNR_w$ was measured. The results are shown in Fig. 4.

Table 9 - Classes of protection for the 38 bits generated by the IS-54 quantizer.

Class   Bit number                                          Number of bits per class   Error rate
1A      4, 5, 6, 10, 11, 15, 16, 20, 24                      9 bits/frame               5 x 10^-3
1       3, 9, 14, 19                                         4 bits/frame               4 x 10^-2
2       1, 2, 7, 8, 12, 13, 17, 18, 21, 22, 23, 25 to 38     25 bits/frame              10^-1

Table 10 - Classes of protection for the 28 bits generated by the vector quantizers.

VQ        Class   Bit number                                                             Number of bits per class
AVSQ      1A      1 to 7, 9                                                              8 bits/frame
          1       8, 10, 12                                                              3 bits/frame
          2       11, 13, 14 to 28                                                       17 bits/frame
TS-MSVQ   1A      1 to 9                                                                 9 bits/frame
          1       10, 12, 13, 15                                                         4 bits/frame
          2       11, 14, 16 to 28                                                       15 bits/frame
SVQ-IP    1A      1 to 6, 9, 10, 16, 17 (even frame); 1, 2 (odd frame)                   12 bits/(two frames)
          1       7, 8, 11, 12, 18, 23 (even frame); 3 to 6 (odd frame)                  10 bits/(two frames)
          2       13, 14, 15, 19 to 22, 24 to 30 (even frame);                           34 bits/(two frames)
                  7, 9 to 14, 16 to 21, 23 to 29 (odd frame)
FVSQ      1A      1 to 7, 9                                                              8 bits/frame
          1       8, 10, 11, 12                                                          4 bits/frame
          2       13 to 28                                                               16 bits/frame

-

7.2. Performance in noisy channels

Table 11 shows the weighted signal-to-noise ratios of the VSELP codec in noisy channels, according to the classes of protection previously defined. In terms of $SNR_w$, the system which yields the best performance is the SVQ-IP, which is about 0.7 dB better than the IS-54 Recommendation. The FVSQ also provides better performance than the IS-54 Recommendation. The worst performance is obtained with the TS-MSVQ approach.

Fig. 4 - Bit error sensitivity (panels include IS-54 and AVSQ).

Table 11 - $SNR_w$ of the VSELP codec in noisy channels.

Quantizer     IS-54    SVQ-IP    FVSQ    AVSQ    TS-MSVQ
SNR_w (dB)    9.23     9.98      9.67    9.23    9.08

              All speakers         Female speakers            Male speakers
Quantizer     μ      σ      CI     μ      σ           CI     μ      σ      CI
IS-54         2.53   0.82   0.14   2.22   0.75        0.13   2.85   0.97   0.22
SVQ-IP        2.49   0.87   0.12   2.50   illegible   0.16   2.47   0.90   0.19
FVSQ          2.53   0.87   0.12   2.58   0.85        0.17   2.47   0.88   0.18
Q = 5 dB      1.45   0.65   0.07   1.38   0.68        0.11   1.53   0.62   0.09

Table 12 - Subjective test results in noisy channels.

Fig. 5 - Confidence intervals in noisy channels (IS-54, SVQ-IP, FVSQ, TS-MSVQ and AVSQ).

Subjective test results are given in Table 12 and in Fig. 5. From these results we can see that the IS-54, SVQ-IP and FVSQ quantizers are comparable, with a slight disadvantage for the FVSQ. The worst result is obtained with the AVSQ system, probably due to the error propagation associated with the adaptivity of its codebook. It is important to recall that, although the three coders have similar subjective performance, the IS-54 operates at a higher rate than the FVSQ and the SVQ-IP. In addition, in terms of computational complexity, the SVQ-IP quantizer is about 10 times simpler than the TS-MSVQ, and the complexity of the FVSQ is 60% higher than that of the SVQ-IP system.

7.3. Zero redundancy coding

In order to improve the performance in noisy channels, a simulated annealing approach [13, 14] was used to assign the binary indices of the quantizers found in the IS-54 standard, and also of the quantizer used when the bank of scalar quantizers is replaced by the FVSQ.

In the case of the standard coder, both the reflection coefficient quantizer and the quantizer for the gain vector GS-P0-P1 had their indices reassigned. The FVSQ, on the other hand, had the indices of its first-stage VQ redesigned according to the results of the annealing technique. Furthermore, after the index permutation, a new error sensitivity study was made for the coder with the FVSQ, and the bits were reapportioned among the protection classes to reflect the new situation.

No significant improvement was observed, either in the IS-54 performance, which is probably already very well engineered, or in that of the system employing the more sophisticated FVSQ.

It is worth pointing out that the simulated annealing algorithm was run using two distortion metrics, namely spectral distortion and a weighted Euclidean distortion measure. It is conceivable, however, that the VSELP performance in hostile channels could be significantly improved if the simulated annealing had employed the overall coder distortion instead [15]. Due to its complexity, this solution was not attempted in this work.
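The index assignment step described above can be illustrated with a small sketch. It is not the procedure actually used with [13, 14]: it assumes a memoryless binary symmetric channel with bit error probability eps, equally likely codevectors, plain squared Euclidean distance in place of the spectral and weighted Euclidean measures used here, and a swap-based annealing schedule whose parameters (t0, alpha, steps_per_t, t_min) are arbitrary illustrative choices. All function names are hypothetical.

```python
import math
import random


def transition_prob(sent, received, n_bits, eps):
    """P(received index | sent index) on a memoryless binary symmetric channel."""
    h = bin(sent ^ received).count("1")  # number of bits flipped by the channel
    return (eps ** h) * ((1.0 - eps) ** (n_bits - h))


def sq_dist(x, y):
    """Squared Euclidean distance (a stand-in for spectral/weighted measures)."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def expected_distortion(codebook, assign, n_bits, eps):
    """Average distortion caused by channel errors alone.

    assign[i] is the binary index transmitted for codevector i; all
    codevectors are assumed equally likely (illustrative assumption).
    """
    n = len(codebook)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += (transition_prob(assign[i], assign[j], n_bits, eps)
                      * sq_dist(codebook[i], codebook[j]))
    return total / n


def anneal_assignment(codebook, eps, t0=1.0, alpha=0.9, steps_per_t=50,
                      t_min=1e-3, seed=0):
    """Hypothetical swap-based simulated annealing over index assignments."""
    n = len(codebook)
    if n < 2 or n & (n - 1):
        raise ValueError("codebook size must be a power of two >= 2")
    n_bits = n.bit_length() - 1
    rng = random.Random(seed)
    assign = list(range(n))  # start from the natural binary order
    cost = expected_distortion(codebook, assign, n_bits, eps)
    best, best_cost = assign[:], cost
    t = t0
    while t > t_min:
        for _ in range(steps_per_t):
            i, j = rng.sample(range(n), 2)
            assign[i], assign[j] = assign[j], assign[i]  # propose: swap two indices
            new_cost = expected_distortion(codebook, assign, n_bits, eps)
            delta = new_cost - cost
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost  # accept the move
                if cost < best_cost:
                    best, best_cost = assign[:], cost
            else:
                assign[i], assign[j] = assign[j], assign[i]  # reject: undo swap
        t *= alpha  # geometric cooling
    return best, best_cost


if __name__ == "__main__":
    levels = [(float(k),) for k in range(8)]  # toy 3-bit scalar codebook
    best, cost = anneal_assignment(levels, eps=0.05)
    print(best, cost)
```

Only the binary labels are permuted; the codevectors themselves are untouched, which is why this kind of assignment adds no redundancy to the bit stream.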

8. CONCLUSIONS

In this paper we have examined the performance of three LSF vector quantizers for VSELP coders operating in ideal and noisy channels. They were compared to the bank of scalar quantizers described in the IS-54 North American Recommendation for digital cellular telephony, referred to here as the IS-54 quantizer. The latter operates at 38 bits/frame, whereas the vector quantizers operate at 28 bits/frame and exhibit higher spectral distortions. However, the marked superiority of the IS-54 quantizer evaluated in isolation is no longer observed when it is used in the VSELP coder. For instance, in an ideal channel the subjective performance of the IS-54 and TS-MSVQ schemes was comparable. It seems that the analysis-by-synthesis process of the VSELP coder compensates, to a certain extent, for the deficiencies introduced by the quantization of the filter parameters. Hence, if the goal is the quality of synthesized speech, an isolated evaluation of the quantizer should be regarded with caution. Another example, not reported here, concerns the SVQ-IP performance. When its rate is lowered to 26 bits/frame, the spectral distortion increases from 1.0488 dB to 1.1868 dB and the percentage of "outliers" between 2 and 4 dB increases from 3.371% to 5.381%. On the other hand, when the quantizer is used in a VSELP structure, the weighted signal-to-noise ratio of the synthesized speech degrades by only 0.08 dB.
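The spectral distortion and outlier percentages quoted above can be computed along the following lines. This is a hedged sketch: it assumes SD is the RMS difference between the LPC log power spectra on a uniform grid over the full band 0 to π, whereas the band limits and grid size used in this work are not stated in this section. It also takes direct-form predictor coefficients, so quantized LSFs would first have to be converted back to A(z). The function names are hypothetical.

```python
import cmath
import math


def lpc_power_db(a, n_points=256):
    """10*log10 of the LPC model spectrum 1/|A(e^jw)|^2 on a grid over [0, pi).

    a = [1, a1, ..., ap] are the coefficients of A(z) = sum_k a[k] z^-k;
    A(z) is assumed minimum phase, so it never vanishes on the unit circle.
    """
    out = []
    for m in range(n_points):
        w = math.pi * m / n_points
        A = sum(ak * cmath.exp(-1j * w * k) for k, ak in enumerate(a))
        out.append(-10.0 * math.log10(abs(A) ** 2))
    return out


def spectral_distortion(a_ref, a_q, n_points=256):
    """RMS difference (dB) between the unquantized and quantized log spectra."""
    p = lpc_power_db(a_ref, n_points)
    q = lpc_power_db(a_q, n_points)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)) / n_points)


def sd_statistics(frames):
    """frames: list of (a_ref, a_q) pairs, one pair per speech frame.

    Returns (mean SD in dB, % of frames with 2 < SD <= 4 dB, % with SD > 4 dB).
    """
    if not frames:
        raise ValueError("need at least one frame")
    sds = [spectral_distortion(r, q) for r, q in frames]
    n = len(sds)
    mid = sum(1 for d in sds if 2.0 < d <= 4.0)
    high = sum(1 for d in sds if d > 4.0)
    return sum(sds) / n, 100.0 * mid / n, 100.0 * high / n
```

Averaging per-frame SD values and counting outliers separately is what lets two quantizers with similar mean SD be told apart by their tails, as in the 26 bits/frame example.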

Noisy channel performance was evaluated at bit error rates of 5 x 10^-3, 4 x 10^-2 and 10^-1, depending on the protection class to which each bit was assigned. The IS-54, SVQ-IP and FVSQ quantizers showed similar performance, while the AVSQ scheme provided the worst quality of synthesized speech. It should be remarked that the IS-54 scheme uses 10 additional bits for each LSF frame.
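A class-dependent channel of this kind can be simulated by flipping each coded bit independently with the error rate of its protection class. The sketch below assumes independent (memoryless) errors within each class; the class labels, the order in which the three quoted rates map to classes, and the function names are all hypothetical.

```python
import random

# Hypothetical labels: which class receives which of the quoted BERs is assumed.
CLASS_BER = {"class_1": 5e-3, "class_2": 4e-2, "class_3": 1e-1}


def corrupt_frame(bits, bit_class, class_ber, rng):
    """Flip each bit independently with the BER of its protection class.

    bits: list of 0/1 values for one coded frame.
    bit_class: class label of each bit (same length as bits).
    """
    out = []
    for b, c in zip(bits, bit_class):
        flip = rng.random() < class_ber[c]
        out.append(b ^ 1 if flip else b)
    return out


def empirical_ber(bit_class, class_ber, n_frames=10000, seed=0):
    """Measure the observed flip rate per class over many all-zero frames."""
    rng = random.Random(seed)
    flips = {c: 0 for c in class_ber}
    counts = {c: 0 for c in class_ber}
    zeros = [0] * len(bit_class)
    for _ in range(n_frames):
        rx = corrupt_frame(zeros, bit_class, class_ber, rng)
        for r, c in zip(rx, bit_class):
            flips[c] += r
            counts[c] += 1
    return {c: flips[c] / counts[c] for c in class_ber if counts[c]}


if __name__ == "__main__":
    # Hypothetical 28-bit spectral frame split across the three classes.
    layout = ["class_1"] * 12 + ["class_2"] * 8 + ["class_3"] * 8
    print(empirical_ber(layout, CLASS_BER))
```

In a full coder simulation the corrupted bits would then be decoded as quantizer indices, so errors in the weakly protected class map directly to the index assignment issues discussed in Section 7.3.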

In terms of computational complexity, the SVQ-IP quantizer is about 10 times simpler than the TS-MSVQ, and the complexity of the FVSQ is 60% higher than that of the SVQ-IP system. We therefore conclude that the SVQ-IP quantizer may be an attractive approach to encode the spectral parameters of low bit rate speech coders based on the CELP approach.

Manuscript received on May 24, 1994.

REFERENCES

[1] M. R. Schroeder, B. S. Atal: Code-Excited Linear Prediction (CELP): high-quality speech at very low bit rates. Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, Tampa, USA, 1985, p. 937-940.

[2] Electronic Industries Association (EIA): Cellular systems. Report IS-54, December 1989.

[3] J. Grass, P. Kabal: Methods of improving Vector-Scalar Quantization of LPC coefficients. Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, Toronto, Canada, May 1991, p. 657-660.

[4] B. Bhattacharya, W. LeBlanc, S. Mahmoud, V. Cuperman: Tree searched multi-stage Vector Quantization of LPC parameters for 4 kb/s speech coding. Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, San Francisco, USA, March 1992, p. I-105-I-108.

[5] J. R. B. de Marca: An LSF quantizer for the North-American half rate speech coder. "IEEE Trans. Vehicular Technology", August 1994.

[6] N. Sugamura, F. Itakura: Speech analysis and synthesis methods developed at ECL in NTT - from LPC to LSP. "Speech Communication", Vol. 5, 1986, p. 199-215.

[7] Y. Linde, A. Buzo, R. M. Gray: An algorithm for Vector Quantizer design. "IEEE Trans. on Communications", Vol. COM-28, January 1980, p. 84-95.

[8] R. Laroia, N. Phamdo, N. Farvardin: Robust and efficient quantization of speech LSP parameters using structured Vector Quantizers. Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, Toronto, Canada, May 1991, p. 641-644.

[9] K. K. Paliwal, B. S. Atal: Efficient Vector Quantization of LPC parameters at 24 bits/frame. Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, Toronto, Canada, May 1991, p. 661-664.

[10] EIA/TIA Interim Standard: Cellular system dual-mode mobile station-base station compatibility standard IS-54. May 1990.

[11] I. A. Gerson, M. A. Jasiuk: Vector Sum Excited Linear Prediction (VSELP) speech coding at 8 kbps. Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, April 1990, p. 461-462.

[12] A. Alcaim, J. A. Solewicz, J. A. Moraes: Frequency of occurrence of phones and lists of phonetically balanced sentences of the Portuguese spoken in Rio de Janeiro (in Portuguese). "Revista da Sociedade Brasileira de Telecomunicações", Vol. 7, No. 1, December 1992, p. 23-41.

[13] J. R. B. de Marca, N. Farvardin, N. S. Jayant, Y. Shoham: Robust Vector Quantization for noisy channels. Proc. Mobile Satellite Conference, Pasadena, USA, May 1988, p. 515-520.

[14] N. Farvardin: A study of Vector Quantization for noisy channels. "IEEE Trans. on Information Theory", Vol. 36, No. 4, July 1990, p. 799-809.

[15] W. B. Kleijn: Source dependent channel coding and its application in CELP. AT&T Bell Laboratories Technical Memorandum, January 1991.


S. L. Q. Dall’Agnol, A. Alcaim, J. R. B. de Marca: Performance of LSF Vector Quantizers for VSELP Coders in Noisy Channels

ETT, Vol. 5, No. 5, September-October 1994, p. 553-563
