ROBUST ENCODING OF THE FS1016 LSF PARAMETERS :

APPLICATION OF THE CHANNEL OPTIMIZED TRELLIS CODED VECTOR QUANTIZATION

BOUZID Merouane Speech Communication and Signal Processing Laboratory,

Electronics Faculty, University of Sciences and Technology Houari Boumediene (USTHB), P.O. Box 32, El-Alia, Bab-Ezzouar, Algiers, 16111, ALGERIA

[email protected], [email protected]

ABSTRACT

Speech coders operating at low bit rates require efficient encoding of the linear predictive coding (LPC) coefficients. Line spectral frequency (LSF) parameters are currently one of the most efficient choices of transmission parameters for the LPC coefficients. In this paper, we present an optimized trellis coded vector quantization (OTCVQ) scheme designed for robust encoding of the LSF parameters. The objective of this system, initially called the "LSF-OTCVQ encoder", is to achieve low bit-rate quantization of the FS1016 LSF parameters. The efficiency of the LSF-OTCVQ encoder (with weighted distance) was first demonstrated in the ideal case of transmission over a noiseless channel. We then turned to improving its robustness for real transmission over a noisy channel. To implicitly protect the transmission parameters of the LSF-OTCVQ encoder incorporated in the FS1016, we used joint source-channel coding carried out by the channel optimized vector quantization (COVQ) method. For transmission over a noisy channel, we show that the new encoding system, called the "COVQ-LSF-OTCVQ encoder", contributes significantly to improving the performance of the FS1016 by ensuring robust coding of its LSF spectral parameters.

Keywords: source-channel coding, robust speech coding, LSF parameters.

1 INTRODUCTION

In speech coding systems, the short-term spectral information of the speech signal is often modelled by the frequency response of an all-pole filter whose transfer function is $H(z) = 1/A(z)$, where $A(z) = 1 + a_1 z^{-1} + \dots + a_p z^{-p}$ [1]. In telephone-band speech coding (300-3400 Hz, sampling frequency $f_e$ = 8 kHz), the parameters of this filter are derived from the input signal through linear prediction (LP) analysis of order p = 10. The 10 parameters $\{a_i\}_{i=1,2,\dots,10}$, known as the Linear Predictive Coding (LPC) coefficients [1], play a major role in the overall bandwidth and in preserving the quality of the encoded speech. The challenge in quantizing the LPC parameters is therefore to achieve transparent quantization quality [2] at the minimum bit rate while keeping the memory and computational complexity low.

In practice, the LPC coefficients are not quantized directly because they have poor quantization properties. Equivalent parametric representations have therefore been formulated that convert them into parameters much better suited to quantization. One of the most efficient representations of the LPC coefficients is the Line Spectral Frequency (LSF) representation [3]. The LSF parameters (LSFs), which are related to the zeros of polynomials derived from A(z) [1], exhibit a number of interesting properties [2] that make them a very attractive set of transmission parameters for the LPC coefficients. Exploiting these properties, various coding schemes based on scalar and vector quantization have been developed for the efficient quantization of the LSF parameters. Several works showed that vector quantizer (VQ) schemes, such as multistage VQ [4] and split VQ [2], can achieve transparent quantization of the LSFs at lower bit rates than schemes based on a scalar quantizer (SQ).

In this paper, we present an optimized trellis coded vector quantization (OTCVQ) scheme designed for the efficient and robust coding of LSF parameters. The aim of this system, initially called the "LSF-OTCVQ encoder", is to achieve low bit-rate transparent quantization of the LSFs by exploiting the intra-frame dependence between the closest pairs of LSF parameters. For ideal transmission over a noiseless channel, we have already shown in [5] that the LSF-OTCVQ encoder (with weighted distance) achieves good performance when applied to the LSF parameters of the US Federal Standard FS1016. In particular, we showed that the 27 bits/frame LSF-OTCVQ encoder produces a perceptual quality equivalent to that obtained with unquantized LSF parameters. Our interest then turned to improving the robustness of the LSF-OTCVQ encoder for real transmission over a noisy channel.

In low bit-rate speech coding, the essential objective is to reduce the bit rate of speech coders while maintaining good transmission quality. During the design of speech coding systems, the effects of transmission noise are often neglected. Redundant channel coding [6] is conventionally used to provide "explicit" protection of the sensitive parameters of speech coders against channel errors. According to the separate design approach, suggested by Shannon in his classical source/channel coding theorems [7], the channel encoder can be designed separately from the source encoder by adding redundant bits (error-detecting and error-correcting codes) to the source data. Robust encoding systems can indeed be designed with this separation approach, but at the cost of an increased bit rate, transmission delay and coding/decoding complexity. At low bit rates, where the constraints on complexity and delay are very severe, such channel coding is therefore not particularly recommended. These disadvantages of separate design have motivated some researchers to investigate a joint solution to the source and channel coding optimization problem, so as to reduce the complexity on both sides while providing performance close to the optimum.
For these purposes, Joint Source-Channel Coding (JSCC) was introduced, in which the overall distortion is minimized by simultaneously considering the impact of transmission errors and the distortion due to source coding [8], [9], [10]. Most of these works have demonstrated the effectiveness of JSCC in protecting source data implicitly (i.e., without redundancy) while maintaining a constant bit rate and a reduced complexity. To implicitly protect the transmission indices of our LSF-OTCVQ encoder incorporated in the FS1016, we used a JSCC method carried out by Channel Optimized Vector Quantization (COVQ). We first show how to adapt and apply the COVQ technique to the robust design of a new encoding system (called the "COVQ-LSF-OTCVQ encoder") in order to implicitly protect some of its indices. We then generalize the study to the complete protection of all the indices of the COVQ-LSF-OTCVQ encoder.

The paper is organized as follows. In section 2, we briefly review the basics of vector quantization. In section 3, we describe the design steps of the OTCVQ encoding system; examples of comparative results for TCVQ/OTCVQ encoders are reported in that section. Section 4 presents the joint coding method based on the COVQ technique, and the performance of the COVQ system applied to a memoryless source is presented at the end of that section. The application of the OTCVQ scheme to the encoding of the LSF parameters is described in section 5, where simulation results are provided for two different distance measures (unweighted and weighted) used in the design and the operation of the LSF-OTCVQ encoder. In section 6, we present the application of the LSF-OTCVQ encoder to the quantization of the LSF parameters of the FS1016 speech coder. A JSCC-COVQ method is then used to implicitly protect the LSF-OTCVQ indices for transmission over a noisy channel. Conclusions are given in section 7.

2 VECTOR QUANTIZATION

A k-dimensional vector quantizer (VQ) of size L is a mapping Q of the k-dimensional Euclidean space $\Re^k$ into a finite subset (codebook) $Y = \{y_0, \dots, y_{L-1}\}$ composed of L codevectors [11]. The design principle of a VQ consists of partitioning the k-dimensional space of source vectors into L non-overlapping cells $\{R_0, \dots, R_{L-1}\}$ and associating with each cell $R_i$ a unique codevector $y_i$. Coding a sequence of input source vectors with a VQ thus consists of associating with each source vector x the binary index $i \in \{0, \dots, L-1\}$ of the closest codevector $y_i$, i.e. the one whose distance from the input vector is minimal. In general, vector quantization involves an irreversible loss of information, which results in a quality degradation commonly evaluated by a distortion measure. For a given VQ, the average distortion is defined by [11]:

$$D = \frac{1}{k}\sum_{i=0}^{L-1}\int_{x\in R_i} d(x, y_i)\,p(x)\,dx, \tag{1}$$

where p(x) is the k-fold probability density function of the source and $d(x, y_i)$ is the widely used squared Euclidean distance.

The optimal design of a VQ is based on simultaneously searching for the partition $\{R_0, \dots, R_{L-1}\}$ and the representative codevectors $\{y_0, \dots, y_{L-1}\}$ that minimize the average distortion D. To solve this problem, two necessary conditions of optimality must be satisfied in turn during the VQ design process [11]:

1. For a given codebook $Y = \{y_0, y_1, \dots, y_{L-1}\}$, the optimal partition satisfies:

$$R_i = \{\,x : d(x, y_i) \le d(x, y_j),\ \forall j \ne i\,\} \tag{2}$$

This is the nearest neighbor optimality condition.

2. Given an encoder partition $\{R_i,\ i = 0, \dots, L-1\}$, the optimal codevectors $y_i$ are the centroids of the partition cells $R_i$ (centroid condition):

$$y_i = \mathrm{Cent}(R_i) = E(X / X \in R_i) \tag{3}$$
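As an illustration, the two conditions (2) and (3) can be alternated on a training set. The following is a minimal sketch only, assuming NumPy, a squared Euclidean distance and a simple random initialisation; the function name `lbg_vq`, the number of iterations and the synthetic Gaussian data are illustrative assumptions, not the design used for the results of this paper.

```python
import numpy as np

def lbg_vq(train, L, n_iter=50, seed=0):
    """Alternate the nearest-neighbour (Eq. 2) and centroid (Eq. 3) conditions."""
    rng = np.random.default_rng(seed)
    # Assumed initialisation: L distinct training vectors picked at random.
    codebook = train[rng.choice(len(train), size=L, replace=False)].copy()
    for _ in range(n_iter):
        # Eq. (2): assign every vector to its closest codevector.
        d = ((train[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        idx = d.argmin(axis=1)
        # Eq. (3): replace each codevector by the mean of its cell.
        for i in range(L):
            members = train[idx == i]
            if len(members) > 0:          # empty cells keep their codevector
                codebook[i] = members.mean(axis=0)
    # Final partition and empirical per-sample distortion (counterpart of Eq. 1).
    d = ((train[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    idx = d.argmin(axis=1)
    distortion = d[np.arange(len(train)), idx].mean() / train.shape[1]
    return codebook, idx, distortion

# Usage on a synthetic unit-variance Gaussian source (k = 2, L = 16, i.e. R = 2 bps).
rng = np.random.default_rng(1)
x = rng.standard_normal((4000, 2))
cb, idx, D = lbg_vq(x, L=16)
snr_db = 10 * np.log10(x.var() / D)
```

The SNR computed in the last line is the ratio of source variance to per-sample distortion, which is the figure of merit used in the tables of this paper; its value depends on the training set and on the initialisation.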

Various algorithms for VQ design have been developed in the past. The most popular one is certainly the LBG algorithm [12]. This algorithm (LBG-VQ) is an iterative application of the two optimality conditions, in which the partition and the codebook are updated in turn.

3 OPTIMIZED ENCODING SYSTEM BASED ON THE TRELLIS CODED VECTOR QUANTIZATION

The scalar trellis coded quantization (TCQ) [13] and its generalization to the vector case (TCVQ) [14], [15] improve upon traditional trellis encoders [16] by labelling the trellis branches with entire subsets rather than with individual reproduction levels. This approach, which was motivated by Ungerboeck's formulation of Trellis Coded Modulation (TCM) [17], uses a structured alphabet with an extended set of quantization levels. In this work, we were particularly interested in the TCVQ encoder, whose structure is quite similar to that of TCQ, with an increase in complexity due to the vector codebook search [14].

The design of a TCVQ encoder consists of several interrelated steps: selection of the trellis, construction of the extended initial codebook, partitioning of the codevectors into subcodebooks (subsets), and labelling of the trellis branches with these subsets. Consider the design of a k-dimensional TCVQ encoder of rate R bits per sample (bps) used to encode a sequence of source vectors. The S-state trellis used in TCVQ can be any one of Ungerboeck's amplitude modulation trellises [17]. The extended initial TCVQ codebook is generally designed by the LBG algorithm. It contains $2^{kR+1}$ codevectors (twice as many as the VQ). However, during the TCVQ encoding process, only a subset of $2^{kR}$ of these codevectors may be used to represent a source vector at any instant. Following Ungerboeck's set partitioning method, the codevectors are then partitioned into four subsets D0, D1, D2 and D3, each of size $2^{kR-1}$. In our TCVQ encoder designs, we used the heuristic algorithm described in [15] to partition the extended TCVQ codebook. The subsets are then assigned to the trellis branches according to Ungerboeck's TCM rules [17]. These rules are meant to ensure that the distortion between the original and the reconstructed source sequences (under the clear channel assumption) is close to the minimum.

To encode the sequence of source vectors, the well-known Viterbi algorithm [16] is used to find the legitimate path through the trellis that results in the minimum distortion. The TCVQ encoder transmits a bit sequence specifying this optimal path (sequence of subsets), in addition to a sequence of (kR − 1)-bit codewords that specify the codevectors within the chosen subsets. At the TCVQ decoder, the bit sequence that specifies the selected trellis path is used as the input to the convolutional coder of the TCVQ system. The output of this coder selects the proper subset Di. The codewords of the second binary sequence are used to select the correct codevector from each subset. An example of a 4-state scalar TCQ encoder of rate R = 2 bps, used to encode a memoryless source uniformly distributed on the interval [−A, A], is illustrated in Fig. 1.
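The encoding and decoding steps described above can be sketched for the scalar case (k = 1, R = 2 bps). This is a minimal illustration only: the eight levels and their assignment to D0-D3 follow Fig. 1(b), but the state transition table `NEXT`, the branch labelling `LABEL` and the function names are assumptions chosen to respect Ungerboeck's rules, not necessarily the exact trellis of Fig. 1(a). The decoder here walks the trellis tables directly instead of running a convolutional coder.

```python
import numpy as np

A = 1.0
LEVELS = A * (2 * np.arange(8) - 7) / 8        # -7A/8, -5A/8, ..., 7A/8
SUBSETS = [LEVELS[m::4] for m in range(4)]     # D0..D3, two levels each
# Assumed 4-state trellis: NEXT[s][b] = next state, LABEL[s][b] = subset index.
NEXT = [(0, 1), (2, 3), (0, 1), (2, 3)]
LABEL = [(0, 2), (1, 3), (2, 0), (3, 1)]

def tcq_encode(x, start_state=0):
    """Viterbi search for the minimum squared-error path through the trellis."""
    n = len(x)
    cost = np.full(4, np.inf)
    cost[start_state] = 0.0
    back = np.zeros((n, 4, 3), dtype=int)      # (previous state, path bit, level bit)
    for t, sample in enumerate(x):
        new_cost = np.full(4, np.inf)
        for s in range(4):
            if not np.isfinite(cost[s]):
                continue                       # state not reachable yet
            for b in (0, 1):
                levels = SUBSETS[LABEL[s][b]]
                c = int(np.argmin((sample - levels) ** 2))
                total = cost[s] + (sample - levels[c]) ** 2
                ns = NEXT[s][b]
                if total < new_cost[ns]:       # keep the survivor path
                    new_cost[ns] = total
                    back[t, ns] = (s, b, c)
        cost = new_cost
    # Trace back from the best final state.
    s = int(np.argmin(cost))
    path_bits, level_bits = [], []
    for t in range(n - 1, -1, -1):
        ps, b, c = back[t, s]
        path_bits.append(int(b))
        level_bits.append(int(c))
        s = int(ps)
    return path_bits[::-1], level_bits[::-1]

def tcq_decode(path_bits, level_bits, start_state=0):
    """Follow the path bits through the trellis and pick the level in each subset."""
    s, out = start_state, []
    for b, c in zip(path_bits, level_bits):
        out.append(SUBSETS[LABEL[s][b]][c])
        s = NEXT[s][b]
    return np.array(out)
```

Each sample costs one path bit plus one bit (kR − 1 = 1) for the level inside the subset, i.e. 2 bits per sample, while the reproduction alphabet contains 8 levels instead of the 4 of a conventional 2-bit scalar quantizer.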

Figure 1: TCQ encoder of rate R = 2 bps: (a) section of the labelled 4-state trellis, with branch labels (input bit/subset) 0/D0, 1/D2, 0/D1, 1/D3, 1/D2, 0/D0, 1/D3, 0/D1; (b) output alphabet levels and partition, the eight levels −7A/8, −5A/8, −3A/8, −A/8, A/8, 3A/8, 5A/8, 7A/8 being assigned in this order to the subsets D0, D1, D2, D3, D0, D1, D2, D3; (c) TCQ convolutional coder, with input bit x0, elements s2, s1, s0 and output bits y1, y0.

Examples of simulation results for encoding unit-variance memoryless Gaussian sources using integer-rate and fractional-rate TCVQ encoders are given in tables 1 and 2, respectively. For different rates, results are given in terms of signal-to-noise ratio (SNR) in dB, along with the corresponding LBG-VQ performance and the distortion-rate function D(R). Notice that when the rate is fractional, the dimension k has to be such that kR is an integer.

Table 1: Performances of TCVQ encoding with integer rates for the Gaussian source.

Table 2: Performances of TCVQ encoding with fractional rates for the Gaussian source.

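SNR in dB; the columns 4 to 256 give the TCVQ trellis size (number of states).

| Rate (bps) | Dim. k | 4 | 8 | 16 | 32 | 64 | 128 | 256 | LBG-VQ | D(R) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.66 | 6 | 3.34 | 3.39 | 3.41 | 3.42 | 3.45 | 3.48 | 3.49 | 3.05 | 4.01 |
| 0.75 | 4 | 3.72 | 3.78 | 3.80 | 3.82 | 3.87 | 3.90 | 3.93 | 3.36 | 4.51 |
| 0.80 | 5 | 3.96 | 4.04 | 4.07 | 4.08 | 4.14 | 4.18 | 4.20 | 3.69 | 4.82 |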

At the same encoding rate, these results show that the TCVQ outperforms the TCQ (k = 1). Moreover, the TCVQ allows fractional rates, as shown by the simulation results listed in table 2. We can also see that, for a given rate, the TCQ/TCVQ performance is higher than that of the conventional SQ/VQ. To further improve the TCVQ performance, a training optimization procedure for the extended TCVQ codebook design was developed [5]. For a given set of training source vectors, this procedure updates the TCVQ codebook by replacing each codevector with the average of all the source vectors mapped to it. This leads to an iterative design algorithm for the overall TCVQ encoder. With this optimization, the algorithm is called the OTCVQ (Optimized Trellis Coded Vector Quantization) algorithm. Examples of simulation results for encoding memoryless Gaussian sources using fractional-rate OTCVQ encoders are listed in table 3.

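Table 3: Performances of the OTCVQ with fractional rates for the Gaussian source (SNR in dB; the columns 4 to 256 give the trellis size in number of states).

| Rate (bps) | Dim. k | 4 | 8 | 16 | 32 | 64 | 128 | 256 | LBG-VQ | D(R) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.66 | 6 | 3.41 | 3.45 | 3.47 | 3.48 | 3.49 | 3.52 | 3.53 | 3.05 | 4.01 |
| 0.75 | 4 | 3.81 | 3.85 | 3.87 | 3.89 | 3.93 | 3.96 | 3.97 | 3.36 | 4.51 |
| 0.80 | 5 | 4.08 | 4.14 | 4.16 | 4.17 | 4.21 | 4.23 | 4.25 | 3.69 | 4.82 |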

Comparing these results with those given in table 2, we clearly notice the performance improvements brought by the optimization of the TCVQ codebooks.

4 JOINT CODING BY THE CHANNEL OPTIMIZED VECTOR QUANTIZATION

Vector quantization is currently used in various practical applications. Since some type of channel noise is present in any practical communication system, the analysis and design of VQs for noisy channels is receiving increasing attention. In this work, we considered joint source-channel coding (JSCC) associated specifically with the use of VQ, in order to provide implicit protection to our quantizers. In particular, we were interested in the category of JSCC in which the quantizers are optimized by taking the channel error probability into account, namely channel optimized vector quantization [8], [18].

4.1 COVQ system principle: Modified optimality conditions

A channel optimized vector quantizer (COVQ) is a coding scheme that generalizes the VQ by taking into account the noise present on the transmission channel. The idea is to exploit knowledge about the channel in both the codebook design process and the encoding algorithm. The operations of source and channel coding are thus integrated into a single entity by incorporating the channel characteristics in the design procedure. The LBG-VQ lends itself well to such a modification. The purpose is then to minimize a modified total average distortion between the reconstructed signal and the original signal, given the channel noise. The design of a COVQ encoder is carried out by a version of the VQ extended to the noisy case [8], [18]. The COVQ scheme keeps the same block structure as the VQ (encoder/decoder, dimension, bit rate). The difference lies in the formulation of the necessary conditions of optimality, which minimize a modified expression of the total average distortion. This new distortion accounts simultaneously for the distortion due to vector quantization and for the channel errors [18], [19]:

$$D = \frac{1}{k}\sum_{i=0}^{L-1}\int_{R_i} p(x)\left[\sum_{j=0}^{L-1} p(j/i)\,d(x, y_j)\right]dx, \tag{4}$$

where p(j/i) is the channel transition probability, i.e. the probability that the index j is received given that the index i was transmitted. Comparing Eq. (4) with Eq. (1), one easily notices that the two expressions have the same form, except that Eq. (4) uses a modified distance measure (the bracketed term): the same distance d, but weighted by the channel transition probabilities p(j/i), i, j = 0,..., L−1.

Table 1 (integer-rate TCVQ, Gaussian source; SNR in dB; the columns 4 to 64 give the trellis size in number of states):

| Rate (bps) | Vector dim. | 4 | 8 | 16 | 32 | 64 | LBG-VQ | D(R) |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 4.64 | 4.77 | 4.85 | 4.93 | 4.98 | 4.40 | 6.02 |
| 1 | 2 | 4.85 | 5.03 | 5.09 | 5.10 | 5.21 | 4.42 | 6.02 |
| 1 | 3 | 4.98 | 5.08 | 5.12 | 5.15 | 5.22 | 4.49 | 6.02 |
| 1 | 4 | 5.05 | 5.14 | 5.16 | 5.18 | 5.23 | 4.69 | 6.02 |
| 2 | 1 | 10.18 | 10.31 | 10.38 | 10.46 | 10.51 | 9.31 | 12.04 |
| 2 | 2 | 10.36 | 10.50 | 10.57 | 10.60 | 10.69 | 9.70 | 12.04 |
| 2 | 3 | 10.59 | 10.69 | 10.72 | 10.75 | 10.81 | 10.00 | 12.04 |
| 2 | 4 | 10.93 | 11.02 | 11.05 | 11.07 | 11.12 | 10.41 | 12.04 |

The necessary optimality conditions of the COVQ are likewise derived in two steps, by minimizing the modified total average distortion [8], [18], [19].

For a given codebook Y = {y0,..., yL−1} and by using a squared Euclidean distance measure, the optimal partition Ri (i= 0,..., L−1) for a noisy channel is such that :

$$R_i = \left\{\,x \in \Re^k : \sum_{j=0}^{L-1} p(j/i)\,\|x - y_j\|^2 \le \sum_{j=0}^{L-1} p(j/l)\,\|x - y_j\|^2,\ \forall l \ne i \right\} \tag{5}$$

Similarly, the optimum codebook for a fixed partition is given by:

$$y_j = \frac{\displaystyle\sum_{i=0}^{L-1} p(j/i)\int_{R_i} x\,p(x)\,dx}{\displaystyle\sum_{i=0}^{L-1} p(j/i)\int_{R_i} p(x)\,dx}, \qquad j = 0, \dots, L-1. \tag{6}$$

The codevector $y_j$ now represents the centroid of all the input vectors that are decoded as index j, even if the transmitted index i differs from j. Equations (5) and (6) are referred to, respectively, as the generalized nearest neighbour and centroid conditions with a modified distortion measure. The optimal codevectors for the noisy channel are thus linear combinations of the cell centroids of the noiseless case, weighted by the a posteriori channel transition probabilities. In our applications, the communication channel considered is a discrete memoryless channel with finite input and output alphabets. More precisely, we assumed a memoryless binary symmetric channel (BSC) model with bit error (crossover) probability p [6], [16]. For codewords (VQ indices) of n bits, the BSC transition probabilities are given by [9], [19]:

$$p(j/i) = (1 - p)^{\,n - d_H(i,j)} \cdot p^{\,d_H(i,j)}, \tag{7}$$

where $d_H(i, j)$ ($0 \le d_H(i, j) \le n$) is the Hamming distance between the n-bit binary codewords represented by the integers i and j.
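Eq. (7) can be tabulated once for all index pairs. The sketch below assumes NumPy and the natural binary representation of the indices; the function name `bsc_transition_matrix` is illustrative.

```python
import numpy as np

def bsc_transition_matrix(n_bits, p):
    """P[i, j] = p(j/i) of Eq. (7) for n-bit indices over a BSC with crossover probability p."""
    L = 1 << n_bits
    idx = np.arange(L)
    # Hamming distance d_H(i, j) = number of ones in the binary form of i XOR j.
    xor = idx[:, None] ^ idx[None, :]
    d_h = np.array([[bin(v).count("1") for v in row] for row in xor])
    return (1 - p) ** (n_bits - d_h) * p ** d_h

# Example: 4-bit indices (L = 16) and p = 0.05.
P = bsc_transition_matrix(4, 0.05)
```

Each row of the matrix sums to one, and for p = 0 it reduces to the identity, which is the noiseless case of Eq. (1).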

When the channel bit error probability p is sufficiently small, the probability of multiple bit errors in an index is very small relative to the probability of zero or one bit error [9], [18], [19]. To simplify the numerical computations, it is often adequate to consider only the effects of single bit errors on the channel codewords. The BSC model can then be approximated by [9]:

$$p(j/i) = \begin{cases} 1 - np, & j = i,\\ p, & j \in \xi_i,\\ 0, & \text{otherwise,} \end{cases} \tag{8}$$

where $\xi_i$ is the set of all integers j ($0 \le j \le L-1$) whose binary representation is at Hamming distance one from the binary representation of i. When the source distribution is unknown, a long training database of k-dimensional vectors can be used for the quantizer design. With the approximation given in Eq. (8), equations (4) and (6) become, respectively:

$$D = \frac{1}{kN}\sum_{t=0}^{N-1}\ \sum_{j \in \xi_i \cup \{i\}} p(j/i)\,d(x_t, y_j), \tag{9}$$

where i denotes the index of the cell $R_i$ containing the training vector $x_t$, and:

$$y_j = \frac{\displaystyle\sum_{i \in \xi_j \cup \{j\}} p(j/i)\,\frac{1}{N}\sum_{l:\,x_l \in R_i} x_l}{\displaystyle\sum_{i \in \xi_j \cup \{j\}} p(j/i)\,\frac{|R_i|}{N}}, \tag{10}$$

where N is the size of the training base and $|R_i|$ denotes the number of training vectors belonging to the cell $R_i$. In both sums, the term with equal indices carries the weight 1 − np of Eq. (8).

4.2 COVQ encoder design algorithm

The design procedure of the COVQ encoding system is a straightforward extension of the LBG-VQ algorithm. The two modified optimality conditions are applied iteratively, so that the partition and the codevectors are updated using the modified distortion that includes the channel probability [8], [18]. The steps of our version of the COVQ algorithm are detailed in [20]. We suppose that a set of input vectors (training base) is available and that the BSC error probability ε is given. This channel probability, often called the design error probability of the COVQ codebook, is an input parameter of the optimization process. At the beginning, this design parameter is temporarily set to a low value; it is then gradually increased until it matches the desired design error probability. The choice of the initial codebook is very important since it can significantly affect the final results. In our design, the initial codebook is designed for ε = 0 (i.e., for a noiseless channel). This amounts to a simple run of the conventional LBG-VQ algorithm, which converges to a locally optimal codebook. This codebook is used as the initial codebook of the COVQ algorithm. Then, for each stage of ε, the algorithm converges to an intermediate codebook, which is used as the initial codebook of the next stage of the COVQ design process.
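The staged design described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of [20]: it uses the full BSC transition matrix of Eq. (7) rather than the single-error approximation, a fixed number of iterations per stage, an illustrative schedule of ε values, and synthetic Gaussian training data. The names `bsc_matrix`, `covq_train` and `channel_distortion` are hypothetical.

```python
import numpy as np

def bsc_matrix(n_bits, p):
    """P[i, j] = p(j/i) for n-bit indices over a BSC (Eq. 7)."""
    idx = np.arange(1 << n_bits)
    xor = idx[:, None] ^ idx[None, :]
    d_h = np.array([[bin(v).count("1") for v in row] for row in xor])
    return (1 - p) ** (n_bits - d_h) * p ** d_h

def covq_train(train, codebook, n_bits, eps, n_iter=30):
    """Iterate the generalized nearest-neighbour and centroid conditions for design probability eps."""
    P = bsc_matrix(n_bits, eps)
    codebook = codebook.copy()
    L = len(codebook)
    for _ in range(n_iter):
        # d[t, j] = squared distance between x_t and y_j.
        d = ((train[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        # Eq. (5): choose the index i minimizing sum_j p(j/i) d(x_t, y_j).
        idx = (d @ P.T).argmin(axis=1)
        # Empirical form of Eq. (6): channel-weighted centroids.
        sums = np.zeros_like(codebook)
        counts = np.zeros(L)
        np.add.at(sums, idx, train)
        np.add.at(counts, idx, 1)
        num = P.T @ sums                 # num[j] = sum_i p(j/i) * (sum of x in R_i)
        den = P.T @ counts               # den[j] = sum_i p(j/i) * |R_i|
        nz = den > 0
        codebook[nz] = num[nz] / den[nz][:, None]
    return codebook

def channel_distortion(train, codebook, n_bits, eps_design, p_channel):
    """Expected per-sample distortion when the encoder assumes eps_design and the BSC has p_channel."""
    d = ((train[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    idx = (d @ bsc_matrix(n_bits, eps_design).T).argmin(axis=1)
    P = bsc_matrix(n_bits, p_channel)
    return (P[idx] * d).sum(axis=1).mean() / train.shape[1]

# Staged design: start from the noiseless case (eps = 0, plain LBG-VQ), then raise eps.
rng = np.random.default_rng(0)
train = rng.standard_normal((3000, 2))
n_bits = 4
init = train[rng.choice(len(train), 2 ** n_bits, replace=False)]
cb_vq = covq_train(train, init, n_bits, 0.0)
cb = cb_vq
for eps in (0.001, 0.005, 0.01, 0.05):   # assumed schedule
    cb = covq_train(train, cb, n_bits, eps)
```

With ε = 0 the transition matrix is the identity, so the first stage is an ordinary LBG run; each later stage starts from the codebook of the previous one, as in the procedure described above.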

The greatest difficulty in COVQ system design is that the channel error probability is a parameter of the optimization process. In a real transmission situation, this parameter is difficult to estimate. It may even vary in time, which makes a design for one specific value rather academic. Thus, depending on the practical situation and on the estimates of the real channel characteristics, the COVQ encoder offering the highest degree of robustness can be selected.

4.3 COVQ encoder performances

We now present numerical results on the performance of the COVQ encoding system operating over a BSC with variable bit error probability p. Examples of simulation results for COVQ encoders trained for various values of the design probability parameter ε (ε = 0.001, 0.005, 0.010 and 0.050) are given in table 4. These encoders, with k = 2, R = 2 bps and L = 16, were applied to a memoryless Gaussian source. For comparison with the conventional VQ, the performance of the LBG-VQ (designed for a noiseless channel, ε = 0.000) is also included in the table.

Table 4: SNR performance comparison (in dB) between COVQ and VQ over a BSC, as a function of the channel bit error probability p (rows) and of the design probability ε (columns).

| p \ ε | 0.000 | 0.001 | 0.005 | 0.01 | 0.05 |
|---|---|---|---|---|---|
| 0.000 | 9.686 | 9.685 | 9.624 | 9.537 | 8.664 |
| 0.001 | 9.584 | 9.604 | 9.565 | 9.481 | 8.643 |
| 0.005 | 9.292 | 9.314 | 9.357 | 9.332 | 8.571 |
| 0.01 | 8.927 | 8.965 | 9.034 | 9.179 | 8.477 |
| 0.05 | 6.824 | 6.918 | 7.351 | 7.608 | 7.800 |
| 0.1 | 4.650 | 5.292 | 5.875 | 6.801 | 7.043 |
| 0.2 | 2.518 | 3.109 | 3.876 | 4.752 | 5.886 |

For transmission over noisier channels (higher values of p), the results indicate that the COVQ performs better than the LBG-VQ. For example, for a BSC with p = 0.2, a considerable SNR gain of 3.36 dB is obtained by the COVQ (trained for ε = 0.05) compared with the LBG-VQ. One notices that, whether or not the channel probability p matches the design probability ε exactly, the COVQ encoders trained for ε equal or close to p are those that yield the best performance. However, when the channel is noiseless (p = 0.000), the SNR performance of the COVQ encoders becomes increasingly suboptimal as the design parameter ε increases. In this case, the LBG-VQ performs as well as or better than the COVQ. The same holds when the channel error probability is low (p < 0.005), with a slight performance improvement obtained by COVQ encoders trained for a low value of the design parameter (for example, ε = 0.001).

5 OPTIMIZED-TCVQ FOR LOW-BIT RATE ENCODING OF LSF PARAMETERS

Using the OTCVQ encoding technique, an encoding scheme for the LSF parameters is presented in this section. The aim of this encoding system, called the "LSF-OTCVQ encoder" [5], is to efficiently quantize the LSF parameters of one frame using only the dependencies among the parameters of that frame. For speech coding applications, the OTCVQ is used in block mode, where each block corresponds to an LSF vector of size 10. In this work, two-dimensional (2-D) codebooks (k = 2) are used to encode the LSF vectors. Each stage of the trellis diagram is thus associated with a 2-D sub-vector of the LSF vector. Hence, there are five stages in the LSF-OTCVQ trellis, with two branches entering and leaving each state. Since the LSF parameters have different means and variances, five extended codebooks are needed to encode an LSF vector. Since the choice of an appropriate distance measure is an important issue in the design of any VQ system, we also used a second distance measure, the weighted Euclidean distance, in the design and operation of the LSF-OTCVQ encoder. Based on the properties of the LSF parameters, several weighted distance measures have been proposed for LSF encoding [2], [4], [21]. In our applications, we used the weighted squared Euclidean distance given by:

$$d(f, \hat{f}) = \sum_{i=1}^{10} c_i\,w_i\,(f_i - \hat{f}_i)^2, \tag{11}$$

where $f_i$ and $\hat{f}_i$ are respectively the ith coefficients of the original LSF vector $f$ and of the quantized LSF vector $\hat{f}$; $c_i$ and $w_i$ are respectively the constant and variable weights assigned to the ith LSF coefficient. These weights are meant to provide a better quantization of the LSF parameters in the formant regions. Many weighting functions have been defined to calculate the variable weight vector $w = [w_1, \dots, w_{10}]$. In particular, we used the weighting function known as the inverse harmonic mean (IHM) [21]:

$$w_i = \frac{1}{f_i - f_{i-1}} + \frac{1}{f_{i+1} - f_i}, \tag{12}$$

where $f_0 = 0$ and $f_{11} = 0.5$. The constant weight vector $c = [c_1, \dots, c_{10}]$ is determined experimentally [2]:

$$c_i = \begin{cases} 1.0, & \text{for } 1 \le i \le 8,\\ 0.8, & \text{for } i = 9,\\ 0.4, & \text{for } i = 10. \end{cases} \tag{13}$$
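Equations (11) to (13) translate directly into a few lines of code. The sketch below assumes NumPy and LSFs expressed in normalized frequency (between 0 and 0.5), and it assumes that the IHM weights are computed from the original (unquantized) vector; the function names are illustrative.

```python
import numpy as np

# Constant weights of Eq. (13): 1.0 for i = 1..8, 0.8 for i = 9, 0.4 for i = 10.
C = np.array([1.0] * 8 + [0.8, 0.4])

def ihm_weights(f):
    """Inverse harmonic mean weights of Eq. (12) for 10 ordered LSFs."""
    ext = np.concatenate(([0.0], f, [0.5]))          # f_0 = 0 and f_11 = 0.5
    return 1.0 / (ext[1:-1] - ext[:-2]) + 1.0 / (ext[2:] - ext[1:-1])

def weighted_distance(f, f_hat):
    """Weighted squared Euclidean distance of Eq. (11)."""
    w = ihm_weights(f)                               # assumed: weights from the original vector
    return float(np.sum(C * w * (f - f_hat) ** 2))

# Example with uniformly spaced LSFs (spacing 1/22), for which every w_i equals 44.
f = np.arange(1, 11) / 22
d = weighted_distance(f, f + 0.01)
```

Closely spaced LSFs, which typically mark a formant, receive large weights, so quantization errors there are penalized more heavily than in spectral valleys.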

The performance of the LSF quantizer is evaluated by the average spectral distortion (SD), which is often used as an objective measure of LSF encoding performance. This measure correlates well with the human perception of distortion. When calculated discretely over a limited bandwidth, the spectral distortion for frame i is given, in decibels, by [4]:

$$SD_i = \sqrt{\frac{1}{n_1 - n_0}\sum_{n=n_0}^{n_1 - 1}\left[10\log_{10}\frac{S(e^{j2\pi n/N})}{\hat{S}(e^{j2\pi n/N})}\right]^2}. \tag{14}$$
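Eq. (14) can be evaluated from the LPC coefficients of the original and quantized synthesis filters. The sketch below assumes NumPy, a 256-point FFT and coefficient vectors of the form [1, a_1, ..., a_p]; the bin range is passed explicitly, and the default values n0 = 4 and n1 = 100 are an assumption obtained by dividing 125 Hz and 3.125 kHz by a 31.25 Hz resolution (96 points). The function names are illustrative.

```python
import numpy as np

def lpc_power_spectrum(a, n_fft=256):
    """Power spectrum 1/|A(e^{jw})|^2 of the synthesis filter 1/A(z), a = [1, a_1, ..., a_p]."""
    A = np.fft.rfft(a, n_fft)            # zero-padded FFT of the polynomial coefficients
    return 1.0 / np.abs(A) ** 2

def spectral_distortion(a, a_hat, n0=4, n1=100, n_fft=256):
    """Spectral distortion of Eq. (14), in dB, over the bins n0 .. n1 - 1."""
    S = lpc_power_spectrum(a, n_fft)
    S_hat = lpc_power_spectrum(a_hat, n_fft)
    diff_db = 10.0 * np.log10(S[n0:n1] / S_hat[n0:n1])
    return float(np.sqrt(np.mean(diff_db ** 2)))

# Example: a first-order filter and a slightly perturbed version of it.
a = np.array([1.0, -0.9])
sd = spectral_distortion(a, np.array([1.0, -0.88]))
```

Averaging this per-frame value over a test database gives the average SD, and counting the frames whose SD falls between 2 and 4 dB or above 4 dB gives the outlier percentages.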

For a speech signal sampled at 8 kHz with a 3 kHz bandwidth, an N = 256 point FFT is used to compute the original $S(e^{j2\pi n/N})$ and quantized $\hat{S}(e^{j2\pi n/N})$ power spectra of the LPC synthesis filter associated with the ith frame of speech. The spectral distortion is thus computed discretely with a resolution of 31.25 Hz per sample over 96 uniformly spaced points from 125 Hz to 3.125 kHz. The constants n0 and n1 in Eq. (14) correspond to 1 and 96 respectively. It is generally accepted that an average SD of about 1 dB indicates that negligible audible distortion has been incurred during quantization. This value has been suggested in the past for transparent quantization quality and used as a goal in the design of many LPC quantization schemes. In [2], Paliwal and Atal established that the average SD alone is not sufficient to measure the perceived quality, and introduced the notion of spectral outlier frames. Consequently, transparent quality is obtained if the following three conditions hold:

− 1) The average SD is about 1 dB,

− 2) The percentage of outlier frames having an SD between 2 and 4 dB is less than 2%,

− 3) No frame has an SD greater than 4 dB.

We now evaluate the performance of our LSF-OTCVQ encoder operating at different bit rates. All the simulation results reported in this section were obtained with a four-state trellis and 2-D codebooks. For each encoding rate, 2 bits are thus assigned to represent the initial state. When the remaining bits cannot be assigned equally to the five 2-D codebooks, fewer bits are used in the last codebooks, since human resolution is known to be lower in the higher frequency bands than in the lower ones. We investigated the optimum bit allocations for the LSF-OTCVQ encoder and found that the allocations given in table 5 yield the best results. In each row, the five stage allocations plus the 2 initial-state bits add up to the number of bits per LSF vector (for example, 6 + 6 + 5 + 5 + 3 + 2 = 27).

Table 5: Bit allocations of each LSF-OTCVQ trellis stage codebook as a function of bit rate (bits per stage codebook).

| Bits / LSF vector | Stage 1 | Stage 2 | Stage 3 | Stage 4 | Stage 5 |
|---|---|---|---|---|---|
| 24 | 5 | 5 | 5 | 4 | 3 |
| 25 | 5 | 5 | 5 | 5 | 3 |
| 26 | 6 | 5 | 5 | 5 | 3 |
| 27 | 6 | 6 | 5 | 5 | 3 |
| 28 | 6 | 6 | 6 | 5 | 3 |

The speech data used in the experiments of this section consists of approximately 43 min of speech taken from the TIMIT speech database [22]. To construct the LSF database, we used the same LPC analysis function as the FS1016 speech coder [23]. A 10th-order LPC analysis, based on the autocorrelation method, is performed on every 30 ms analysis frame using a Hamming window. One part of the LSF database, consisting of 75000 LSF vectors, is used for training, and the remaining part, of 11262 LSF vectors (different from the training set), is used for testing. The performance of the LSF-OTCVQ encoder at different bit rates is shown in table 6. These results were obtained by using, separately, two different distortion measures (unweighted and weighted distances) in both the design and the operation of the LSF-OTCVQ encoder.

Table 6: Performances of the LSF-OTCVQ encoder as a function of bit rate.

| Bits/frame | Unweighted distance: average SD (dB) | Unweighted: outliers 2-4 dB (%) | Unweighted: outliers > 4 dB (%) | Weighted distance: average SD (dB) | Weighted: outliers 2-4 dB (%) | Weighted: outliers > 4 dB (%) |
|---|---|---|---|---|---|---|
| 24 | 1.34 | 7.04 | 0.03 | 1.29 | 5.26 | 0.02 |
| 25 | 1.24 | 3.97 | 0.03 | 1.19 | 2.99 | 0.00 |
| 26 | 1.18 | 3.01 | 0.02 | 1.15 | 2.72 | 0.00 |
| 27 | 1.14 | 2.95 | 0.02 | 1.07 | 1.90 | 0.00 |
| 28 | 1.04 | 1.60 | 0.01 | 0.98 | 1.10 | 0.00 |

These comparative results clearly show the improvement in LSF-OTCVQ performance obtained by using the weighted distance. The LSF-OTCVQ encoder designed with the weighted distance needs 27 bits/frame to reach transparent quantization quality. Compared to the encoder designed with the unweighted distance, it saves about 1-2 bits/frame while maintaining comparable performance.

6 EFFICIENT AND ROBUST CODING OF THE FS1016 LSF PARAMETERS: APPLICATION OF THE LSF-OTCVQ

In this section we use the LSF-OTCVQ encoder (with weighted distance) to quantize the LSF parameters of the FS1016. For the moment, we suppose that transmission takes place over an ideal noiseless channel. Recall that the US Federal Standard FS1016 is a 4.8 kbits/s Code Excited Linear Prediction (CELP) speech coder [23]. In the FS1016 standard, the LSF parameters are originally encoded by a 34 bits/frame SQ. For the same test database (11262 LSF vectors), this 34 bits/frame LSF SQ results in an average SD of 1.72 dB, 25.99 % of outliers in the range 2-4 dB, and 0.46 % of outliers with an SD greater than 4 dB. Comparing these results with those given in table 6, we can see that the LSF-OTCVQ encoder (at all the lower rates studied) performs better than the 34 bits/frame SQ originally used in the FS1016. Thus, several bits per frame can be saved by applying the LSF-OTCVQ in the LSF encoding process of the FS1016.

Subjective listening tests of the 27 bits/frame LSF-OTCVQ encoder were also performed. With this encoder incorporated in the FS1016, the bit rate for the quantization of the LSF parameters decreases to 900 bits/s and, consequently, the FS1016 operates at a bit rate of 4.57 kbits/s. To carry out these tests, we generated three versions of synthetic speech for the same original speech signal: one with unquantized LSFs, and two with LSFs quantized respectively by the 27 bits/frame LSF-OTCVQ encoder and by the 34 bits/frame SQ. Subjective quality was evaluated through A-B comparison and MOS (Mean Opinion Score) tests using 8 listeners. Six sentences from the TIMIT database (spoken by three male and three female speakers) were used for the subjective evaluations. The A-B comparison test presents listeners with a sequence of two speech test signals (A and B). For each sentence, the comparison is made between two synthetic signals: one, A (or B), with unquantized LSFs and the other, B (or A), with LSFs quantized by the LSF-OTCVQ encoder. The A-B signal pairs are presented in a randomized order. The listeners choose one of the two synthesized versions or indicate no preference. For the MOS tests, the listeners were asked to rate each synthetic speech sentence (with LSF-OTCVQ quantized LSFs) on a scale from 1 (bad) to 5 (excellent); the mean opinion score is then computed.

Results from the A-B comparison tests show that the majority of the listeners (58.84 %) have no preference. The mean preference for the speech signal coded with LSF-OTCVQ quantized LSFs (20.83 %) is identical to that obtained for the speech signal coded with unquantized LSFs. We can therefore conclude that the two versions of coded speech are practically indistinguishable, i.e., there are no perceptible differences and the quantization does not contribute audible distortion. In terms of MOS, the coded version of speech obtains a good score of 3.89. This implies that good communications quality and high levels of intelligibility [2] are obtained using the 27 bits/frame LSF-OTCVQ encoder in the FS1016. In addition, in terms of average segmental signal-to-noise ratio (SSNR), the synthetic speech signals with unquantized LSF parameters gave an average SSNR of 11.05 dB; with LSF-OTCVQ encoding of the LSF parameters, the average SSNR is 10.31 dB; and with LSF parameters quantized by the 34 bits/frame SQ, the average SSNR is 9.59 dB. Thus, applying the LSF-OTCVQ encoding system yields both a reduction in coding rate and an improvement of the SSNR performance of the FS1016 with respect to the original SQ.

6.1 Robustness of the COVQ-OTCVQ encoder: Transmission over a noisy channel

In a practical communication system, the robustness of the LSF-OTCVQ encoder must be reinforced so that the encoder can cope with channel errors. In this part, we are interested in the implicit protection of the encoder by application of the JSCC-COVQ technique. We first show how to apply the COVQ to the robust design of the LSF-OTCVQ encoder in order to provide implicit protection to some of its indices. We then generalize the study to the full protection of all the indices of the new encoder.

6.1.1 Design of the LSF-OTCVQ encoder with JSCC-COVQ technique

The design of the LSF-OTCVQ encoder optimized for a noisy channel is based mainly on the LSF-OTCVQ design algorithm, modified according to the basic concept of the COVQ. In our application, the five extended codebooks of the new encoding system, denoted "COVQ-LSF-OTCVQ encoder", were optimized for a design error probability ε = 0.05. The basic steps of our design algorithm for the 27 bits/frame COVQ-LSF-OTCVQ encoder are summarized below. Note that the number of trellis states of the encoder is always S = 4; consequently, 2 bits/frame are necessary to represent the initial state. The remaining 25 bits are assigned to the 5 codebooks according to the bit allocation given in table 5. The 5 initial extended codebooks are first designed by the LBG-VQ algorithm (ε = 0.000) using the weighted Euclidean distance. The codebooks of the COVQ-LSF-OTCVQ encoder are designed using the same training database (75000 LSF vectors), which is divided into 5 training subsets of 2-D LSF vector pairs (LSF 1-2, LSF 3-4, LSF 5-6, LSF 7-8 and LSF 9-10).
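As a rough illustration of the partitioning of the training base described above, the following sketch splits 10-dimensional LSF vectors into the five 2-D pair subsets. The function name `split_into_pairs` and the random placeholder data are assumptions made for illustration only; they are not taken from the original design.

```python
import numpy as np


def split_into_pairs(lsf_vectors):
    """Split an (N, 10) array of LSF vectors into five (N, 2) subsets:
    LSF 1-2, LSF 3-4, LSF 5-6, LSF 7-8 and LSF 9-10."""
    lsf = np.asarray(lsf_vectors, dtype=float)
    if lsf.ndim != 2 or lsf.shape[1] != 10:
        raise ValueError("expected an array of shape (N, 10)")
    return [lsf[:, 2 * s:2 * s + 2] for s in range(5)]


# Placeholder data standing in for the training base (hypothetical values):
# sorted random numbers mimic the ascending order of LSF parameters.
rng = np.random.default_rng(0)
training_base = np.sort(rng.random((6, 10)), axis=1)
subsets = split_into_pairs(training_base)
print([s.shape for s in subsets])
```

Each subset is then used to train the codebook of the corresponding trellis stage.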

Design steps of the COVQ-LSF-OTCVQ encoder:

Step 1: Initial design

− Based on the 5 training subsets, use the COVQ algorithm (ε = 0.05) to design the five (2-D) extended initial codebooks of the encoder.

− Partition each initial codebook into 4 sub-codebooks using the set partitioning algorithm. Then, label the transitions of each trellis stage with the corresponding partitioned COVQ-codebook (i.e., COVQ-codebook LSF1-2 for stage 1, …).

− Set a stop threshold α to a very small value.

Step 2: TCVQ coding/decoding process

− For the given training base of LSF vectors, find the best possible reproduction LSF vectors through the trellis by using a modified version of the Viterbi procedure.

− Calculate the average SD between the original and quantized LSF vectors.

Step 3: Termination test

− If the relative decrease of the average SD is below the threshold α, save the 5 optimized codebooks of the COVQ-LSF-OTCVQ encoder and stop.

− Otherwise, update the 5 COVQ-codebooks using a modified version of the optimization procedure and go to step 2.

In step 2, the TCVQ encoding of the input LSF vectors consists in finding the best possible sequence of codevectors (optimal path) through the trellis. This search is carried out by the Viterbi algorithm with a slight modification of the distance computation. The distance to be minimized during the TCVQ search for the optimal codevector is formulated as follows:

$$ d(f, \hat{f}_i) = \sum_{j \in \xi_i} p(j/i) \, \frac{1}{k} \sum_{m=1}^{k} c_m \, w_m \left( f(m) - \hat{f}_j(m) \right)^2 \qquad (15) $$

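where k is the dimension of the LSF vectors (k = 2 for LSF pairs), ξi is the set of neighbors j of index i such that dH(i, j) = 1 (dH being the Hamming distance between the binary indices), p(j/i) is the channel transition probability from index i to index j, and cm and wm are the weights of the weighted Euclidean distance. Recall that after the encoding process, the COVQ-LSF-OTCVQ encoder transmits two binary sequences (the codevector indices and the optimal path) in addition to the two bits representing the initial state of the trellis.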
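A minimal sketch of how a distance of the form (15) might be evaluated for one candidate index is given below. Several points are assumptions rather than details taken from the paper: the transition probability of one specific single-bit error is modelled as ε(1 − ε)^(n−1) for n-bit indices over a BSC, the weights c and w are set to placeholder values of 1, the toy codebook is arbitrary, and the function names are hypothetical. Following the definition of ξi above, only neighbors at Hamming distance 1 enter the sum.

```python
import numpy as np


def hamming_neighbors(i, n_bits):
    """Indices whose n_bits-bit label differs from i in exactly one bit."""
    return [i ^ (1 << b) for b in range(n_bits)]


def covq_distance(f, codebook, i, eps, c, w):
    """Distance in the style of Eq. (15) for candidate index i.

    f        : input LSF sub-vector, shape (k,)
    codebook : array of codevectors, shape (2**n_bits, k)
    eps      : design bit error probability of the channel
    c, w     : per-component weights (placeholders in this sketch)
    """
    codebook = np.asarray(codebook, dtype=float)
    f = np.asarray(f, dtype=float)
    n_bits = (len(codebook) - 1).bit_length()
    k = f.shape[0]
    # Assumed BSC probability of one specific single-bit error pattern.
    p_ji = eps * (1.0 - eps) ** (n_bits - 1)
    total = 0.0
    for j in hamming_neighbors(i, n_bits):
        total += p_ji * np.sum(c * w * (f - codebook[j]) ** 2) / k
    return total


# Toy example with a 2-bit, 2-D codebook (arbitrary values).
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
ones = np.ones(2)
d0 = covq_distance(np.array([0.0, 0.0]), codebook, 0, 0.05, ones, ones)
print(d0)
```

In a trellis search, this quantity would replace the ordinary weighted distance when comparing the candidate codevectors allowed on each branch.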

Note that, in this part, only the sequence of COVQ-LSF-OTCVQ codevector indices (a sequence of 20 bits for the 5 indices) is implicitly protected by COVQ. This sequence results directly from the COVQ search procedure through the 5 codebooks of the encoder. The other binary sequences (initial state, optimal path) are not delivered by the VQ search process and, consequently, are not implicitly protected against channel errors.

6.1.2 Performances of the COVQ-LSF-OTCVQ system: Encoding of the FS1016 LSF parameters

We now present the performances of the 27 bits/frame COVQ-LSF-OTCVQ encoder (ε = 0.05) applied to the efficient and robust coding of the FS1016 LSF parameters. In these simulations, the channel errors affect only the transmission of the LSF parameters. For the moment, only the sequences of 20 bits/frame specifying the COVQ-LSF-OTCVQ codevector indices are transmitted over a BSC with bit error probability p varying between 0 and 0.5.
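The binary symmetric channel used in such simulations can be sketched as follows. This is only an illustrative model, not the simulation code of the paper: the function name `bsc`, the random seed and the random 20-bit frame are assumptions.

```python
import numpy as np


def bsc(bits, p, rng):
    """Pass a bit sequence through a binary symmetric channel:
    each bit is flipped independently with probability p."""
    bits = np.asarray(bits, dtype=np.uint8)
    flips = (rng.random(bits.shape) < p).astype(np.uint8)
    return bits ^ flips


rng = np.random.default_rng(0)
# Placeholder for the 20 index bits of one frame (random values).
frame_bits = rng.integers(0, 2, size=20, dtype=np.uint8)
received = bsc(frame_bits, p=0.05, rng=rng)
print(int(np.sum(frame_bits != received)), "bit errors in this frame")
```

Sweeping p over the values of interest and decoding the received indices would then give the SD statistics for each channel condition.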

The database used in the following evaluations is composed of 13.69 s of speech sequences extracted from the test database. The synthesized speech signals of this base were generated by the FS1016, and the objective evaluations are given in terms of average SD for the LSF encoders and average SSNR for the synthetic speech signals. The SD performances of the 27 bits/frame systems, LSF-OTCVQ (without protection) and COVQ-LSF-OTCVQ (ε = 0.05), are reported in table 7. These results show that when the channel error probability becomes rather high (p > ε = 0.05), the COVQ significantly improves the performances of the LSF-OTCVQ encoder. Without protection, the LSF-OTCVQ suffers a more severe degradation than the protected LSF encoder. This degradation appears as a sharp increase in the average SD of the LSF-OTCVQ as well as in the percentage of outlier frames having SD > 4 dB. Under these conditions, the COVQ (ε = 0.05) gives the LSF-OTCVQ good robustness against channel errors by keeping the increase of the average SD and of the number of outlier frames (SD > 4 dB) small and slow.
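The three statistics reported for each channel condition (average SD, percentage of outliers in the 2-4 dB range, percentage of outliers above 4 dB) can be obtained from per-frame SD values as sketched below. The function name and the handling of the boundary values (2 dB and 4 dB counted in the 2-4 dB range) are assumptions, and the sample values are purely illustrative.

```python
import numpy as np


def sd_summary(sd_db):
    """Summarize per-frame spectral distortion values (in dB)."""
    sd = np.asarray(sd_db, dtype=float)
    return {
        "average_sd_db": float(sd.mean()),
        # Assumed convention: 2 dB <= SD <= 4 dB counts as a 2-4 dB outlier.
        "outliers_2_4_pct": float(100.0 * np.mean((sd >= 2.0) & (sd <= 4.0))),
        "outliers_gt4_pct": float(100.0 * np.mean(sd > 4.0)),
    }


# Illustrative per-frame SD values (not taken from the paper).
summary = sd_summary([1.0, 1.5, 2.5, 5.0])
print(summary)
```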

However, when the transmissions are done over a noiseless channel (p = 0.000) or a slightly disturbed one (p ≤ ε), the performances of the COVQ-LSF-OTCVQ become suboptimal, compromising the transparent quantization quality. On the other hand, important observations were made concerning the objective SSNR performances of the global FS1016 coder. Indeed, contrary to some of the preceding conclusions, the FS1016 SSNR performances (with LSF parameters coded by COVQ-LSF-OTCVQ) remain remarkable when the channel is slightly disturbed. The comparative evaluation of the FS1016 objective performances, with LSFs coded by the LSF-OTCVQ and COVQ-LSF-OTCVQ encoders, is presented in Fig. 2.

[Figure 2 plot: average SSNR (dB) versus error probability p, from 0.001 to 0.5; curves: FS1016 with LSF-OTCVQ, FS1016 with COVQ-LSF-OTCVQ]

Figure 2: Average SSNR performances of the FS1016 speech coder.

For error probabilities p ≤ 0.01, these results show that the distortions are negligible for the two LSF encoding systems. We can conclude that the encoding system COVQ-LSF-OTCVQ (ε = 0.05) can provide a good implicit protection to the FS1016 LSF parameters, with suboptimal SD performances when the channel is slightly disturbed.
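For reference, an average segmental SNR of the kind plotted in Fig. 2 can be computed along the following lines. This is a generic sketch, not the exact measure used in the paper: the segment length of 240 samples, the absence of any per-segment clipping and the function name are assumptions.

```python
import numpy as np


def segmental_snr_db(reference, synthesized, frame_len=240):
    """Average of per-segment SNRs (dB) between two aligned signals.

    frame_len is an assumed segment length; segments with zero signal
    or zero error energy are skipped in this sketch.
    """
    ref = np.asarray(reference, dtype=float)
    syn = np.asarray(synthesized, dtype=float)
    n_frames = min(len(ref), len(syn)) // frame_len
    snrs = []
    for t in range(n_frames):
        s = ref[t * frame_len:(t + 1) * frame_len]
        e = s - syn[t * frame_len:(t + 1) * frame_len]
        signal_energy = np.sum(s ** 2)
        noise_energy = np.sum(e ** 2)
        if signal_energy > 0 and noise_energy > 0:
            snrs.append(10.0 * np.log10(signal_energy / noise_energy))
    return float(np.mean(snrs)) if snrs else float("nan")


# Toy check: a 10 % amplitude error gives a 20 dB SNR in every segment.
reference = np.ones(480)
ssnr = segmental_snr_db(reference, 0.9 * reference)
print(round(ssnr, 2))
```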

Table 7: Performance comparisons between COVQ-LSF-OTCVQ/LSF-OTCVQ encoders of 27 bits/frame: Application to the FS1016 LSF parameters encoding

                 COVQ-LSF-OTCVQ encoder               LSF-OTCVQ encoder
BSC              Average   Outliers (in %)            Average   Outliers (in %)
probability p    SD (dB)   2-4 dB     > 4 dB          SD (dB)   2-4 dB     > 4 dB
0.000            1.690     25.607     0.441           1.073     0.440      0.000
0.001            1.693     25.827     0.441           1.099     0.441      0.442
0.005            1.710     26.710     0.662           1.148     0.883      1.544
0.010            1.712     26.931     0.441           1.224     2.649      1.986
0.050            1.800     32.671     0.883           1.707     10.596     7.505
0.100            1.924     38.852     0.883           2.696     15.010     21.192
0.200            2.130     46.799     3.532           4.251     17.439     43.929
0.500            2.696     67.911     7.726           6.649     13.245     80.573

6.2 COVQ-LSF-OTCVQ encoder with redundant channel coding

We now generalize the study to the full protection of all transmission indices of the 27 bits/frame COVQ-LSF-OTCVQ encoder (ε = 0.05). By exploiting the bits gained by this encoder, a redundant channel code is used to explicitly protect the 7 bits/frame that remain unprotected (the 5 bits of the optimal path and the 2 bits of the initial state). Since in our simulations the transmissions are done over a BSC under the assumption that at most one bit error occurs per corrupted index (single error), a simple single error-correcting code is sufficient to correct all possible single errors affecting these transmitted sequences. Recall that the 20 bits/frame representing the codevector indices of the optimal path are already protected by COVQ.

To carry out the channel coding of the 7 unprotected bits/frame, we used two Hamming (7, 4, 3) error-correcting codes, which belong to the category of systematic linear block codes. We do not review here the design and operation theory of Hamming codes, which is well documented [6]. These codes were conceived to correct a single error per transmission block (single error-correcting codes). In our design, the two Hamming (7, 4, 3) codes can protect 8 bits by generating together 14 bits. The 27 bits/frame COVQ-LSF-OTCVQ encoder with the two Hamming (7, 4, 3) codes thus operates at a rate of 34 bits/frame (the 20 index bits plus the 14 coded bits), which is the number of bits allocated to the original coding of the FS1016's LSF parameters. Thus, the global design of the FS1016 with COVQ-LSF-OTCVQ (plus the 2 Hamming codes) encoding of the LSF parameters keeps the speech coder rate at its original value of 4.8 kbits/s.

The performances of the non-protected LSF-OTCVQ encoder compared with those of the COVQ-LSF-OTCVQ (ε = 0.05) encoder with Hamming (7, 4, 3) codes are given in table 8. Over the whole range of error probabilities, the results show that the channel coding by Hamming (7, 4, 3) codes clearly improves the performances of the 27 bits/frame COVQ-LSF-OTCVQ encoding system. The global system thus has a good robustness against the errors of the noisy channel. On the other hand, comparing these results with those given in table 7, the LSF-OTCVQ encoder incurs a larger degradation in terms of average SD and outliers. This is due mainly to the effects of random noise on the binary sequences specifying the initial state or the optimal path. Concerning the SSNR performances of the global FS1016 (with LSFs coded by COVQ-LSF-OTCVQ + 2 Hamming (7, 4, 3) codes), the degradations are very low and even negligible for error probabilities p < 0.01. The SSNR performances of the FS1016, with and without LSF protection, are presented in Fig. 3.
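To make the explicit protection of the 7 side-information bits concrete, the sketch below encodes them with two systematic Hamming (7, 4) codewords and corrects one bit error per codeword. The particular generator matrix, the zero padding of the 7 bits to 8 bits and the function names are assumptions chosen for illustration; the paper does not specify these details.

```python
import numpy as np

# One possible systematic Hamming (7, 4) code: codeword = [data | parity].
P = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]], dtype=np.uint8)
G = np.hstack([np.eye(4, dtype=np.uint8), P])    # 4 x 7 generator matrix
H = np.hstack([P.T, np.eye(3, dtype=np.uint8)])  # 3 x 7 parity-check matrix


def encode(data4):
    """Encode 4 data bits into a 7-bit codeword."""
    return (np.asarray(data4, dtype=np.uint8) @ G) % 2


def decode(word7):
    """Correct at most one bit error and return the 4 data bits."""
    word = np.array(word7, dtype=np.uint8)
    syndrome = (H @ word) % 2
    if syndrome.any():
        # The erroneous position is the column of H equal to the syndrome.
        pos = int(np.where((H.T == syndrome).all(axis=1))[0][0])
        word[pos] ^= 1
    return word[:4]


def protect_side_info(bits7):
    """Map 7 bits (2 state + 5 path) to 14 coded bits (assumed zero padding)."""
    padded = np.append(np.asarray(bits7, dtype=np.uint8), 0).astype(np.uint8)
    return np.concatenate([encode(padded[:4]), encode(padded[4:])])


def recover_side_info(coded14):
    """Decode 14 received bits back to the 7 side-information bits."""
    coded = np.asarray(coded14, dtype=np.uint8)
    return np.concatenate([decode(coded[:7]), decode(coded[7:])])[:7]


side_info = np.array([1, 0, 1, 1, 0, 0, 1], dtype=np.uint8)
coded = protect_side_info(side_info)
coded[2] ^= 1    # one error in the first codeword
coded[10] ^= 1   # one error in the second codeword
print(np.array_equal(recover_side_info(coded), side_info))
```

With this arrangement the frame carries 20 index bits plus 14 coded bits, i.e. 34 bits in total.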

[Figure 3 plot: average SSNR (dB) versus error probability p, from 0.001 to 0.5; curves: FS1016 with non-protected LSF-OTCVQ, FS1016 with COVQ-LSF-OTCVQ + 2 Ham(7,4)]

Figure 3: Average SSNR performances of the global FS1016.

7 CONCLUSION

In this work, an optimized trellis coded vector quantization scheme has been developed and successfully applied to the efficient and robust encoding of the FS1016 LSF spectral parameters. In the case of ideal transmissions over a noiseless channel, objective and subjective evaluation results revealed that the 27 bits/frame LSF-OTCVQ encoder (with weighted distance) produces a perceptual quality equivalent to that obtained with unquantized LSF parameters. We then used a JSCC-COVQ technique to implicitly protect the transmission indices of the LSF-OTCVQ encoder incorporated in the FS1016. The simulation results showed that the new COVQ-LSF-OTCVQ encoding system gives the basic LSF-OTCVQ encoder good robustness against BSC channel errors, especially when the transmission error probability is high. Finally, it was necessary to protect all the transmission indices of the COVQ-LSF-OTCVQ encoder, since only a part of its indices was implicitly protected by JSCC-COVQ. By exploiting the bits per frame gained by this encoder, a redundant channel coding by Hamming codes was used to explicitly protect the remaining unprotected bits. We showed that the COVQ-LSF-OTCVQ encoder, using the Hamming (7, 4, 3) codes, contributes significantly to the improvement of the encoding performances of the FS1016's LSF parameters. We can conclude that our global COVQ-LSF-OTCVQ encoding system with Hamming channel codes can ensure an effective and robust coding of the LSF parameters of the FS1016 operating over a noisy channel.

Table 8: Performance comparison between the LSF-OTCVQ encoder and the COVQ-LSF-OTCVQ (ε = 0.05) + Hamming (7, 4, 3) codes

                 COVQ-LSF-OTCVQ encoder               LSF-OTCVQ encoder
                 + 2 Hamming (7, 4, 3) codes          without protection
BSC              Average   Outliers (in %)            Average   Outliers (in %)
probability p    SD (dB)   2-4 dB     > 4 dB          SD (dB)   2-4 dB     > 4 dB
0.000            1.690     25.607     0.441           1.073     0.440      0.000
0.001            1.689     25.607     0.441           1.665     4.635      9.933
0.005            1.701     26.048     0.441           1.993     5.077      14.790
0.010            1.725     26.490     0.662           2.030     5.960      14.128
0.050            1.802     28.697     1.545           2.896     13.907     26.931
0.100            1.948     32.229     3.532           3.825     17.880     41.721
0.200            2.226     34.878     7.505           5.070     23.620     54.304
0.500            2.389     35.982     10.596          7.057     12.362     86.754

8 REFERENCES

[1] W. B. Kleijn and K. K. Paliwal: Speech coding and synthesis, Elsevier Science B.V. (1995).

[2] K. K. Paliwal and B. S. Atal: Efficient vector quantization of LPC parameters at 24 bits/frame, IEEE Transactions on Speech and Audio Processing, vol. 1, no. 1, pp. 3-14 (1993).

[3] F. Itakura: Line spectrum representation of linear predictive coefficients of speech signals, Journal of Acoustical Society of America, vol. 57, p. 535 (1975).

[4] W. F. LeBlanc, B. Bhattacharya, S. A. Mahmoud and V. Cuperman: Efficient search and design procedures for robust multi-stage VQ of LPC parameters for 4 kb/s speech coding, IEEE Transactions on Speech and Audio Processing, vol. 1, no. 4, pp. 373-385 (1993).

[5] M. Bouzid, A. Djeradi and B. Boudraa: Optimized Trellis Coded Vector Quantization of LSF Parameters: Application to the 4.8 Kbps FS1016 Speech Coder, Signal Processing, vol. 85, issue 9, pp. 1675-1694 (2005).

[6] S. Lin: An Introduction to Error-Correcting Codes, Prentice-Hall, Inc., Englewood Cliffs, New Jersey, USA (1970).

[7] C. E. Shannon: A Mathematical Theory of Communication, Bell System Technical Journal, vol. 27, no. 3 and 4, pp. 379-423 and 623-656 (1948).

[8] K. A. Zeger and A. Gersho: Vector quantizer design for memoryless noisy channels, in Proceedings of the International Conference on Communications (ICC'88), Philadelphia, pp. 1593-1597 (1988).

[9] N. Farvardin: A Study of vector quantisation for Noisy Channels, IEEE Transactions on Information Theory, vol. 36, no. 4, pp. 799-809 (1990).

[10] S. B. Z. Azami, P. Duhamel and O. Rioul: Combined source-channel coding: Panorama of methods, CNES Workshop on Data Compression, Toulouse, France (1996).

[11] A. Gersho and R. M. Gray: Vector quantization and Signal compression, Kluwer Academic Publishers, USA (1992).

[12] Y. Linde, A. Buzo and R. M. Gray: An Algorithm for Vector Quantization Design, IEEE Transactions on Communications, COM-28, pp. 84-95 (1980).

[13] M. W. Marcellin and T. R. Fischer: Trellis coded quantization of memoryless and Gauss-Markov sources, IEEE Trans. on Communications, vol. 38, pp. 83-93 (1990).

[14] T. R. Fischer, M. W. Marcellin and M. Wang: Trellis coded vector quantization, IEEE Transactions on Information Theory, vol. 37, pp. 1551-1566 (1991).

[15] H. S. Wang and N. Moayeri: Trellis coded vector quantization, IEEE Trans. on Communications, vol. 40, pp. 1273-1276 (1992).

[16] A. J. Viterbi and J. K. Omura: Principles of Digital Communication and Coding, McGraw-Hill Kogakusha (1979).

[17] G. Ungerboeck: Trellis-coded modulation with redundant signal sets, Part I and II, IEEE Commun. Magazine, vol. 25, pp. 5-21 (1987).

[18] N. Farvardin and V. Vaishampayan: On the performance and Complexity of Channel-Optimized Vector Quantizers, IEEE Transactions on Information Theory, vol. 37, no. 1, pp. 155-159 (1991).

[19] D. M. Chiang and L. C. Potter: Vector Quantisation For Noisy Channels: A guide To performance And Computation, IEEE Trans. on Circuits and Systems for Video Technology, vol. 7, no. 1, pp. 604-612 (1997).

[20] M. Bouzid: Codage conjoint de source et de canal pour des transmissions par canaux bruités, Doctorate Thesis, Speech Communication, USTHB university, Alger (2006).

[21] R. Laroia, N. Phamdo and N. Farvardin: Robust and efficient quantization of speech LSP parameters using structured vector quantizers, Proc. IEEE Int. Conf. Acoust., Speech and Signal Processing, pp. 641-644 (1991).

[22] J. S. Garofolo et al.: DARPA TIMIT Acoustic-phonetic Continuous Speech Database, Technology Building, National Institute of Standards and Technology (NIST), Gaithersburg (1988).

[23] J. P. Campbell, T. E. Tremain and V. C. Welch: The Proposed Federal Standard 1016 4800 bps Voice Coder: CELP, Speech Technology Magazine, pp. 58-64 (1990).
