
1020 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 35, NO. 5, SEPTEMBER 1989

High-Resolution Quantization Theory and the Vector Quantizer Advantage

Abstract-How much performance advantage can a fixed dimension vector quantizer gain over a scalar quantizer? We collect several results from high-resolution or asymptotic (in rate) quantization theory and use them to identify source and system characteristics that contribute to the vector quantizer advantage. One well-known advantage is due to improvement in the space-filling properties of polytopes as the dimension increases. Others depend on the source's memory and marginal density shape. The advantages are used to gain insight into product, transform, lattice, predictive, pyramid, and universal quantizers. Although numerical predictions consistently overestimated gains in low rate (1 bit/sample) experiments, theoretical insights may be useful even at these rates.

I. INTRODUCTION

THE USE of vector quantizers in practical systems for data compression is increasing, largely because of the existence of design algorithms producing code structures amenable to microcircuit implementation. While it has long been known from Shannon's source coding with a fidelity criterion theorem that better performance can be achieved by coding vectors rather than scalars, practical use of vector codes has been slow to develop because of the simplicity of scalar codes and the lack, until recently, of good design methods for implementable vector codes. While vector quantizers now appear attractive for several specific applications, a general question remains as to how to judge the potential gains from vector quantization and how to weigh these gains against the inevitable increase in complexity over scalar systems.

A scalar quantizer maps an input value into one of a finite number, say N, of reproduction values, collectively called the code book. Typically, the encoder portion of a scalar quantizer examines the input and determines the reproduction that is "closest" (i.e., the one that will produce the minimum distortion when used to reproduce the input). The coder dispatches the index of this reproduction over the channel, and the decoder uses the index to look up the reproduction, which it then delivers to the output of the system. The transmission rate of the system is given by log_2 N bits/sample, which may be significantly smaller than the transmission rate required without quantization.

Manuscript received June 6, 1987; revised October 31, 1988. This work was supported in part by ESL, a subsidiary of TRW, and in part by the National Science Foundation under Grant ECS83-17981.

T. Lookabaugh was with the Information Systems Laboratory, Stanford University, Stanford, CA. He is now with Compression Labs., Inc., 2860 Junction Avenue, San Jose, CA 95131.

R. M. Gray is with the Information Systems Laboratory, Department of Electrical Engineering, Stanford University, Stanford, CA 94305-4055.

IEEE Log Number 8930401.

For instance, an input sample that is a real number would require an infinite number of bits to transmit its value if it is not quantized! Vector quantization generalizes scalar quantization to the simultaneous encoding of a vector of input values to one of a finite number of reproduction vectors. The encoding process leads to a useful visualization of vector quantization as a set of reproductions and a partition of the input space into sets or bins, one for each reproduction, which determine the reproduction that will be assigned to any input. Thorough reviews of code structures and design algorithms for vector quantization can be found in [1]-[3].
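In code, the encoder is a nearest-neighbor search and the decoder a table lookup. The sketch below is ours (not from the paper), with an arbitrary hypothetical two-dimensional codebook and squared-error distortion:

```python
import numpy as np

def vq_encode(x, codebook):
    """Return the index of the codebook vector closest to x (squared error)."""
    # Full search: compute the distortion to every reproduction vector.
    dists = np.sum((codebook - x) ** 2, axis=1)
    return int(np.argmin(dists))

def vq_decode(index, codebook):
    """Look up the reproduction vector; this is all the decoder does."""
    return codebook[index]

# Hypothetical 2-dimensional codebook with N = 4 reproduction vectors,
# i.e., a rate of log2(4)/2 = 1 bit/sample.
codebook = np.array([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
x = np.array([0.3, -0.8])
i = vq_encode(x, codebook)          # index sent over the channel
print(i, vq_decode(i, codebook))    # -> 2 [ 1. -1.]
```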

The most frequently cited theoretical rationale for vector quantization is rate-distortion theory, in which the actual proof of achievability (to within an arbitrarily small constant) of the rate-distortion bound for a given source uses vector quantizers. Hence one would suppose (correctly) that given vector quantizers of arbitrarily large dimension, one could approach the rate-distortion function. However, a vector quantizer which explicitly calculates the distortion between the input vector and each possible reproduction vector exhibits both search complexity and memory requirements that grow exponentially with dimension. Exceptions occur when the source has a particularly simple structure so that fast algorithms may exist for vector quantizing the input (see the discussion of lattice and pyramid vector quantizers in Section V). Alternatively, one can impose a suboptimal structure on the code book to make the code book search or storage or both simpler at the cost of performance.

Rate-distortion theory provides results on vector quantizer performance for fixed rate as the vector dimension becomes large; a contrasting approach is to fix the dimension and assume the rate (and hence the code book size) is large. This is the subject of high-resolution or asymptotic quantization theory. The essence of high-resolution quantization theory is to assume that there are so many output points that the probability density of the input is approximately constant across any particular input bin. Bennett [4] first applied this approximation in developing a system performance formula for scalar quantizers that has come to be known as Bennett's integral. The vector quantizer version of the problem was first studied by Schutzenberger [5] and Zador [6]. Generalizations were provided by Gish and Pierce [7], Yamada et al. [9], Bucklew and Wise [10], and Bucklew [11], [12].


High-resolution quantization theory provides tractable equations for the performance of vector quantizers for any vector dimension. True quantizer performance will approach that predicted by high-resolution quantization theory as the number of output points gets very large. Some practical applications require very low distortion so that high rates are necessary and high-resolution quantization theory will be quite accurate; other applications require rates that are so low that the approximations of the high-resolution theory are questionable. Nevertheless, the tractability of the high-resolution results makes them useful for understanding performance gains even in low rate cases.

When considering vector quantizers of finite dimension, two different design criteria can be formulated. First, one can find the vector quantizer which minimizes distortion for a fixed number of output points ("constrained resolution"). Alternatively, one can find the vector quantizer which minimizes distortion with the constraint that its output entropy does not exceed some constant ("constrained entropy"). The latter approach has the potential for lower output rates, but only if some kind of noiseless entropy coding (such as Huffman coding) follows the vector quantizer. Analytical expressions for optimum designs and performances are available for both problems in the high-resolution case; iterative design algorithms exist for the low-resolution case [13]-[15]. We will consider both design criteria, although emphasis will be placed on the constrained resolution problem. Finally, it is worth noting that high-resolution theory shows that, in the limit of large dimension, both criteria converge to the results generated from rate-distortion theory [11].

In their survey paper, Makhoul et al. [3] present properties of the components of a vector, which "when utilized appropriately in codebook design, result in optimal performance." These properties are 1) linear dependency, 2) nonlinear dependency, 3) probability density function shape, and 4) dimensionality. To demonstrate how vector quantizer design uses dimensionality and dependency (linear dependency is correlation; nonlinear dependency is dependency that remains after the components have been decorrelated), they provide some easily visualized two-dimensional examples. The dependence of quantizer design on the probability density function shape is demonstrated through its effect on differential entropy (which is a factor in the Shannon lower bound on the rate-distortion function) and on the design of low-rate optimal scalar quantizers.

In this paper, we address both why and by how much a vector quantizer out-performs a scalar quantizer for the same source, and, in particular, which attributes of a source will make vector quantization most effective. The paper is primarily interpretive: few new theoretical results are provided. Rather, we have collected high-resolution quantization theory formulas for both scalar and vector quantizers; after dividing the two we can identify factors that correspond to the properties of Makhoul et al. We call these factors "vector quantizer advantages": they are the


space-filling advantage (corresponding to dimensionality), the shape advantage (corresponding to probability density function shape), and the memory advantage (corresponding to both linear and nonlinear dependencies between vector components). An attribute of our approach is that we not only establish the nature of the vector quantizer advantages, but also estimate their importance for a given source and quantizer system.

In the next section we derive the vector quantizer advantages for constrained resolution systems under the rth power distortion measure, and justify the names we have chosen for them. In Sections III and IV we provide generalizations to entropy-constrained systems and a wider class of distortion measures. Finally, in Section V we apply the insights we have gained to evaluating a number of quantization systems, including product vector quantizers, transform and predictive coders, lattice and pyramid vector quantizers, universal quantization, and the results of low-rate quantizer design experiments.

II. THE VECTOR QUANTIZER ADVANTAGES

To begin our discussion of vector quantizer performance, we first need an expression for the expected distortion produced by a vector quantizer of specified dimension and rate. We consider initially (until Section IV) the rth power distortion measure

d(x, y) = \frac{1}{k} \|x - y\|^r,

where k is the vector dimension, x and y are k-dimensional vectors, and r > 0 (r = 2 corresponds to mean squared error). Zador [6] showed that the expected distortion D produced by a k-dimensional vector quantizer with N vectors in its code book is (for N large)

D(N; k, r) = C(k, r) N^{-r/k} \|p(x)\|_{k/(k+r)}   (1)

where C(k, r) is the coefficient of quantization, p(x) is the probability density function of the vector x = (x_0, x_1, \cdots, x_{k-1}) (assumed continuous and smooth), and the functional \|\cdot\|_\nu is given by

\|f(x)\|_\nu = \left( \int f(x)^\nu \, dx \right)^{1/\nu},

which is defined if f(x)^\nu is integrable (for \nu \ge 1 this is a seminorm, but not for \nu < 1; see Ash [16, sect. 2.4] for a discussion). Zador showed that the coefficient of quantization depends only on k and r and provided upper and lower bounds.
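As a concrete check on (1) in the scalar case, the sketch below (ours, not from the paper; a unit-variance Gaussian is assumed) evaluates the norm numerically with C(1,2) = 1/12 and recovers the classical Panter-Dite constant, D close to 2.721 N^{-2}:

```python
import numpy as np

def seminorm(p, x, nu):
    """||p||_nu = (integral of p(x)**nu dx)**(1/nu), evaluated on a grid."""
    return np.trapz(p(x) ** nu, x) ** (1.0 / nu)

p = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # unit-variance Gaussian
x = np.linspace(-30, 30, 200001)

k, r, N = 1, 2, 256
C = 1.0 / 12.0                                  # C(1,2), the interval
D = C * N ** (-r / k) * seminorm(p, x, k / (k + r))
print(D * N**2)   # -> 2.7207..., i.e., D = (sqrt(3)*pi/2) * N**-2
```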

Gersho [8] conjectured that the coefficient of quantization is determined by the optimal regular cell shape for high-rate vector quantization matched to a uniform probability density function. To be precise, Gersho considers the class of convex polytopes (a region of k-dimensional space bounded by (k - 1)-dimensional hyperplanes so that any point lying on a line segment joining points on the boundaries is within the region) which can be used to cover space by translating and rotating copies. If a uniform probability


density is placed on the space contained by the polytope, then its centroid is the point f which minimizes the expected distortion incurred by replacing a randomly chosen point within the polytope by f. A Dirichlet or Voronoi partition with respect to a set of points X = {x_0, x_1, \cdots, x_{N-1}} is a partition whose regions are nearest-neighbor regions, i.e., a point is in the region belonging to x_i if x_i is the closest element in X (in the minimum distortion sense). Let the set of admissible polytopes for k-dimensional space H_k consist of those space-covering convex polytopes which produce a Dirichlet partition with respect to their centroids. By observing the expected distortion that would result if a high-rate quantizer for the uniform probability density function was based on a polytope H_k in H_k, Gersho conjectured that

C(k, r) = \inf_{H_k \in \mathcal{H}_k} \frac{(1/k) \int_{H_k} \|x - f\|^r \, dx}{V(H_k)^{1 + r/k}}   (2)

where f is the centroid of the polytope H_k, V(H_k) is its volume (Lebesgue integral), and \|\cdot\| is the Euclidean norm. For two dimensions we can enumerate H_2 as the rectangle, equilateral triangle, and hexagon; the infimum in (2) is achieved when the hexagon is used.

We will adopt Gersho's conjecture for the remainder of this paper. This allows us to apply results and bounds based both on Zador's work and resulting from the calculation (2).

To determine the performance gain of vector over scalar quantizers, we wish to compare vector and scalar quantizers with an equal number of output points per vector input. A scalar quantizer with N output points used repeatedly over k dimensions can produce N^k different outputs; thus it effectively has N^k output points in the k-dimensional space. Hence, it is appropriate to compare a vector quantizer with N^k output points against repeated use (over k dimensions) of a scalar quantizer with N output points. Note that indexing the output of either the vector or repeated scalar quantizer will require k log_2 N bits per k-dimensional input vector (without entropy coding).

We define the vector quantizer advantage A(k, r) as the ratio of the distortion due to repeated scalar quantization to that due to vector quantization:

A(k, r) = \frac{D(N; 1, r)}{D(N^k; k, r)}   (3)

where we have assumed that the source is stationary so that its marginal density \bar{p}(x) does not depend on the coordinate. Later (Section V) we will consider an example where we relax the stationarity assumption. Define

p^*(x) = \prod_{i=0}^{k-1} \bar{p}(x_i),

the distribution that would result if the vector coordinates were in fact independent. With this definition in hand, we may rewrite (3) as

A(k, r) = F(k, r) S(k, r) M(k, r)   (4)

where the space-filling advantage is defined by

F(k, r) = \frac{C(1, r)}{C(k, r)},   (5)

the shape advantage is defined by

S(k, r) = \frac{\|\bar{p}(x)\|_{1/(1+r)}}{\|p^*(x)\|_{k/(k+r)}},   (6)

and the memory advantage is defined by

M(k, r) = \frac{\|p^*(x)\|_{k/(k+r)}}{\|p(x)\|_{k/(k+r)}}.   (7)

We can also express (4) in dB, so that 10 \log_{10} A = 10 \log_{10} F + 10 \log_{10} M + 10 \log_{10} S dB.

We shall see that, defined in this way, the advantages capture in a quantitative manner the attributes described by Makhoul et al. We now examine each of the advantages in detail.

A. The Space-Filling Advantage

The definition of the space-filling advantage in (5) shows that it depends only on the coefficient of quantization and hence, applying Gersho's conjecture (2), only on the efficiency with which polytopes can fill space. In particular, it does not depend on the probability distribution of the source (either its shape or memory). This particular advantage has long been recognized by researchers as a fundamental advantage of vector quantizers over scalar quantizers. As we will see, this advantage by itself establishes that vector quantizers will always outperform scalar quantizers (in cases where the high-resolution approximations are accurate).

We can evaluate C(k, r) explicitly in only a few cases. For k = 1 and all r, the optimal polytope is trivially the interval (there are no other convex polytopes); for k = 2 and all r the optimal polytope is the hexagon [8]; and for k = 3 and r = 2 the optimal polytope is the regular truncated octahedron [17]. For other values of k and r, we must rely on bounds. Partitions based on Gersho's admissible polytopes are equivalent to lattices, where the points of a lattice are the centroids of the polytopes. Consequently, known lattices for various dimensions provide an upper bound on C(k, r) and hence a lower bound on F(k, r). Among these is the lattice formed by concatenating replicas of a uniform scalar quantizer, which, from (2), would yield a space-filling advantage of one. Since this is a lower bound, we clearly have that F(k, r) >= 1 for k >= 1 and all r > 0.

Since the sphere in k dimensions has the smallest moment of inertia with respect to its centroid of any k-dimensional body, if it were admissible it would be the


polytope used in the definition of C(k, r). Unfortunately, we cannot cover space with spheres. Hence a result based on approximating a partition of the input space using spheres (such that their total volume equals the volume of the space to be covered) provides a lower bound on C(k, r) (and an upper bound on F(k, r)):

C(k, r) \ge \frac{1}{k+r} V_k(2)^{-r/k}   (8)

where V_k(\zeta) is the volume of the sphere of unit radius in k-dimensional space with the l_\zeta norm (\zeta \ge 1):

V_k(\zeta) = \frac{[2 \Gamma(1 + 1/\zeta)]^k}{\Gamma(1 + k/\zeta)}   (9)

(so V_k(2) is the usual volume of a unit sphere in k-dimensional Euclidean space).

Conway and Sloane [18] have studied lattices of various dimensionalities and have conjectured a tighter lower bound than the sphere bound of (8) for the case r = 2 [19]. Their work provides a conjectured upper bound on F(k, 2) for all k and a known lower bound for dimensions with lattices that have been studied. We can also use an upper bound on C(k, r) developed by Zador [6] using random coding arguments,

C(k, r) \le \frac{1}{k} \Gamma(1 + r/k) V_k(2)^{-r/k},   (10)

to lower bound F(k, 2). The various bounds on F(k, r) are found in Table I and Fig. 1 for r = 2. The approximate gain in Table I is calculated using Conway and Sloane's conjectured lower bound, since it seems quite accurate with respect to known lattices in the range of dimensions considered. We note that as k -> infinity, the Zador upper bound, Conway and Sloane's conjectured lower bound, and the sphere lower bound on C(k, 2) all have the same limiting value [19],

\lim_{k \to \infty} C(k, 2) = (2\pi e)^{-1},   (11)

which gives the limiting value for F(k, 2) of the table.

TABLE I
BOUNDS ON THE SPACE-FILLING ADVANTAGE F(k, 2)

  k     Zador lower   Lattice lower   Conway and Sloane   Sphere upper   Approximate
        bound         bound           conjectured upper   bound          gain (dB)
  1     0.167         1               1                   1              0
  2     0.524         1.039           1.039               1.047          0.17
  3     0.720         1.061           1.070               1.083          0.29
  4     0.835         1.088           1.095               1.111          0.39
  5     0.913         1.102           1.116               1.133          0.47
  6     0.968         1.122           1.134               1.153          0.54
  7     1.010         1.139           1.149               1.168          0.60
  8     1.043         1.162           1.163               1.183          0.66
  9     1.071         1.115           1.175               1.195          0.70
 10     1.094         1.115           1.186               1.205          0.74
 12     1.131         1.188           1.204               1.224          0.81
 16     1.181         1.220           1.232               1.251          0.91
 24     1.239         1.266           1.270               1.286          1.04
 100    1.359                                             1.370          1.35
 inf    1.423                         1.423               1.423          1.53

Fig. 1. Bounds on the space-filling advantage F(k, 2) versus dimension (sphere upper bound; Conway and Sloane conjectured upper bound; lattice lower bound; Zador lower bound).
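Bounds (8) and (10) translate directly into bounds on F(k, 2). The sketch below (ours, not from the paper) evaluates the sphere upper bound and the Zador lower bound of Table I; both tend to 2*pi*e/12, about 1.423:

```python
from math import gamma, pi

def V(k):                      # volume of the unit Euclidean ball, eq. (9)
    return pi ** (k / 2) / gamma(k / 2 + 1)

def F_sphere_upper(k, r=2):    # C(1,r)/C_sphere(k,r), from eq. (8)
    C1 = 1.0 / 12.0            # C(1,2), the interval
    C_sphere = V(k) ** (-r / k) / (k + r)
    return C1 / C_sphere

def F_zador_lower(k, r=2):     # C(1,r)/C_zador(k,r), from eq. (10)
    C1 = 1.0 / 12.0
    C_zador = gamma(1 + r / k) * V(k) ** (-r / k) / k
    return C1 / C_zador

for k in (1, 2, 4, 8, 16, 100):
    print(k, round(F_zador_lower(k), 3), round(F_sphere_upper(k), 3))
# k = 8 gives roughly 1.043 <= F(8,2) <= 1.183, as in Table I;
# both bounds approach 2*pi*e/12 = 1.4233... as k grows.
```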

B. The Shape Advantage

The factor S(k, r) of (6) can be written

S(k, r) = \frac{\|\bar{p}(x)\|_{1/(1+r)}}{\|\bar{p}(x)\|_{k/(k+r)}^{k}},   (12)

where the norms are now one-dimensional norms of the marginal density alone.

Hence S(k, r) depends only on the shape of the marginal probability density function. An exercise with Hölder's inequality [20] shows that S(k, r) >= 1 for all k >= 1 and r > 0. To develop a feel for the gain that S(k, r) contributes, we have derived its value for several typical probability density functions.

S(k, r) for the Uniform Density: Let \bar{p}(x) be uniform, say,

\bar{p}(x) = \begin{cases} 1/(b-a), & \text{if } a < x < b; \\ 0, & \text{otherwise.} \end{cases}   (13)

Then substituting into (12) provides the result

S(k, r) = 1 \quad \text{for all } k \ge 1 \text{ and } r > 0.

Hence the uniform probability density function provides no shape advantage.

S(k, r) for the Gaussian Density: Let \bar{p}(x) be zero mean Gaussian with variance \sigma^2, i.e.,

\bar{p}(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp(-x^2 / 2\sigma^2).   (14)

Then (12) and some algebra yield

S(k, r) = (1+r)^{(1+r)/2} \left( \frac{k}{k+r} \right)^{(k+r)/2},   (15)

which is independent of \sigma^2, so that the fact that \bar{p}(x) is a Gaussian density gives the value of S(k, r) regardless of the actual variance.

S(k, r) for the Gamma Density: The gamma density with parameters s, \lambda > 0 is given by

\bar{p}(x) = \begin{cases} \dfrac{\lambda^s}{\Gamma(s)} x^{s-1} e^{-\lambda x}, & x > 0; \\ 0, & \text{otherwise.} \end{cases}   (16)

Thus (12) yields

S(k, r) = \Gamma(s)^{k-1} \frac{(1+r)^{s+r} \, \Gamma\!\left( \frac{s+r}{1+r} \right)^{1+r}}{\Gamma\!\left( \frac{ks+r}{k+r} \right)^{k+r} \left( \frac{k+r}{k} \right)^{ks+r}}.   (17)

The mean and variance of a random variable with such a density are, respectively, s/\lambda and s/\lambda^2. With the parameter s fixed, S(k, r) is independent of \lambda (or, equivalently, the mean and variance), so that it is determined only by k, r, and the fact that the density is gamma with parameter s. When s = 1, the gamma density reduces to the exponential density function.
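The closed forms above are easy to tabulate. The following sketch is ours (not from the paper); S_gauss implements (15) and S_gamma our reconstruction of (17), and the output reproduces the Gaussian, Laplacian (s = 1, via the symmetrization remark below), and gamma (s = 1/2) columns of Table II:

```python
from math import gamma

def S_gauss(k, r=2):
    """Shape advantage (15) for a Gaussian marginal."""
    return (1 + r) ** ((1 + r) / 2) * (k / (k + r)) ** ((k + r) / 2)

def S_gamma(k, s, r=2):
    """Shape advantage, our reconstruction of (17); s=1 gives the Laplacian."""
    num = gamma(s) ** (k - 1) * (1 + r) ** (s + r) * gamma((s + r) / (1 + r)) ** (1 + r)
    den = gamma((k * s + r) / (k + r)) ** (k + r) * ((k + r) / k) ** (k * s + r)
    return num / den

for k in (2, 4, 8, 100):
    print(k, round(S_gauss(k), 3), round(S_gamma(k, 1.0), 3), round(S_gamma(k, 0.5), 3))
# -> matches Table II; e.g., k = 4 gives 1.540, 2.370, 4.000
```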

Symmetrized Densities: Frequently, we take probability densities defined on the nonnegative real line and extend them by symmetry to ones defined on the entire real line. Using (12), we can show that the shape advantage of the

TABLE II
THE SHAPE ADVANTAGE S(k, 2) FOR GAUSSIAN, LAPLACIAN, AND GAMMA (s = 1/2) DENSITIES

  k     Gaussian  (dB)    Laplacian  (dB)    Gamma    (dB)
  1     1         0       1          0       1        0
  2     1.299     1.14    1.688      2.27    2.203    3.43
  3     1.449     1.61    2.100      3.22    3.198    5.05
  4     1.540     1.87    2.370      3.75    4.000    6.02
  5     1.600     2.04    2.561      4.09    4.652    6.68
  6     1.644     2.16    2.703      4.32    5.190    7.15
  7     1.677     2.25    2.812      4.49    5.640    7.51
  8     1.703     2.31    2.899      4.62    6.021    7.80
  9     1.723     2.36    2.970      4.73    6.349    8.03
 10     1.740     2.41    3.028      4.81    6.632    8.22
 12     1.766     2.47    3.120      4.94    7.099    8.51
 16     1.800     2.55    3.241      5.11    7.764    8.90
 24     1.836     2.64    3.369      5.28    8.540    9.32
 100    1.893     2.77    3.582      5.54    10.001   10.00
 inf    1.912     2.81    3.654      5.63    14.619   11.65

Fig. 2. Shape advantage S(k, 2) for uniform, Gaussian, Laplacian, and symmetrized gamma (s = 1/2) densities.


symmetrized probability is identical to that of its parent asymmetric density. Special cases include the Laplacian density, which has the same shape advantage as the gamma density for s = 1, and the symmetrized gamma density with s = 1/2, a popular model for the marginal probability density function of speech [21].

These examples and the fact that S(k, r) depends only on the marginal density justify calling it the shape advantage. From (12), S(k, r) does not depend on how the marginal density is scaled. Some numerical values for S(k, 2) are given in Table II and plotted in Fig. 2.

Note that, for the Gaussian density,

S(k, 2) = (5.1962) \left( \frac{k}{k+2} \right)^{(k+2)/2},   (18)

and that

\lim_{k \to \infty} \left( \frac{k}{k+2} \right)^{(k+2)/2} = e^{-1},

so that

\lim_{k \to \infty} S(k, 2) = \frac{5.1962}{e}.

Similarly, for the Laplacian density,

\lim_{k \to \infty} S(k, 2) = \frac{27}{e^2},

and for the symmetrized gamma density (s = 1/2),

\lim_{k \to \infty} S(k, 2) = \frac{39.738}{e}.

C. The Memory Advantage

The factor M(k, r) of (7) is the ratio of the k/(k+r)th norms of the product of the vector's marginal densities and its joint probability density. Clearly, M(k, r) is 1 (leading to a 0-dB gain due to memory) if the components are independent and identically distributed (i.i.d.). In such a case, the entire gain over scalar quantization can be attributed to the space-filling and shape advantages. In other cases, M(k, r) can be hard to compute analytically (since it requires knowledge of the true k-dimensional source probability density function and computation of k-dimensional integrals). The approach followed here will be to solve the problem analytically for a jointly Gaussian random vector (a case in which the integrals can be calculated in a straightforward manner). Unfortunately, since uncorrelatedness implies independence for components of Gaussian random vectors, it is not possible to explore "nonlinear" dependence with these results. We must refer the reader to Makhoul et al. [3, example 2, p. 1559] for an example that demonstrates an achievable vector quantizer advantage due to nonlinear dependence (i.e., even when the vector components are uncorrelated).

M(k, r) for a Jointly Gaussian Random Vector: Let p(x) be zero mean Gaussian with positive definite covariance matrix R, i.e.,

p(x) = (2\pi)^{-k/2} (\det R)^{-1/2} \exp\left( -\tfrac{1}{2} x^T R^{-1} x \right).

By stationarity, the marginal densities are identically distributed, i.e., they have the same variance \sigma^2, although the coordinates may not be independent. In this case, the product of the marginals is also jointly Gaussian with zero mean and covariance matrix R^* = \sigma^2 I, I being the identity matrix. We can explicitly compute M(k, r) to obtain (see the Appendix)

M(k, r) = \left[ \frac{\sigma^2}{(\det R)^{1/k}} \right]^{r/2}.   (19)

Several observations can be made based on this formula. First, note that the diagonal elements of R are all the same under our identical distribution assumption and are given by \sigma^2. From matrix theory [22, p. 126],

\det R \le \prod_{i=0}^{k-1} R_{ii},

so that

(\det R)^{1/k} \le \left[ \prod_{i=0}^{k-1} \sigma^2 \right]^{1/k} = \sigma^2,

and hence M(k, r) is always greater than or equal to one. Secondly, note that as \det R decreases, M(k, r) gets large, so that for \det R very small, the memory advantage can swamp the shape and space-filling advantages. Consider, for example, M(2, 2) for a jointly Gaussian random vector input with covariance matrix

R = \sigma^2 \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}.

In this form, \rho is known as the correlation coefficient and has the property that -1 \le \rho \le 1, with \rho = 0 signifying independence between the two coordinates and |\rho| = 1 signifying complete dependence (i.e., one component is completely determined by the other). For this example, we have

M(2, 2) = \frac{1}{\sqrt{1 - \rho^2}},

so that, as the vector approaches complete dependence, M(2, 2) becomes arbitrarily large. This suggests an interpretation for M(k, r): the more dependent the coordinates of the vector are, the larger M(k, r) will become. It achieves its minimum, 1, when the coordinates are independent.

M(k, r) for a First-Order Gauss-Markov Source: Gauss-Markov or Gaussian autoregressive processes are quite popular as models due to their tractability. For this very reason, we will derive M(k, r) for a first-order Gauss-Markov source and show that it is solely a function


of vector dimension, distortion power, and the regression coefficient.

We define a first-order Gauss-Markov source by

x_n = a x_{n-1} + \epsilon_n, \quad \text{for all } n > n_0,

where |a| < 1 is the regression coefficient, the \epsilon_n are i.i.d. zero mean normal random variables with variance \sigma_\epsilon^2, and x_{n_0} is a zero mean finite variance random variable. Such a source is known to be asymptotically stationary [23]. Hence, as the starting point for the process recedes into the past, the process becomes stationary. Its marginal will approach a zero mean normal density with variance

\sigma^2 = \frac{\sigma_\epsilon^2}{1 - a^2}.

A vector of samples taken from this source will form a zero mean Gaussian random vector. In the stationary regime, the covariance matrix of the vector is independent of n and is described by

R_{ij} = \sigma^2 a^{|i-j|}.

To evaluate M(k, r) from (19), we must find \det R. By expanding the determinant by minors, we can easily show that this is

\det R = \frac{\sigma_\epsilon^{2k}}{1 - a^2}.

Substituting into (19) yields

M(k, r) = (1 - a^2)^{(1-k) r / 2k}.

Since k >= 1 and r > 0, the exponent is always nonpositive, and with |a| < 1, M(k, r) >= 1 as expected. Also, M(k, r) is independent of \sigma_\epsilon^2; it depends only on a (the same parameter that determines how "dependent" the source is).

In Table III and Fig. 3, we present M(k, 2) for several values of a. (For the sake of comparison, a = 0.86 for 8-kHz sampled speech, low-pass filtered to 3400 Hz [24, p. 37].) Note that

\lim_{k \to \infty} M(k, r) = (1 - a^2)^{-r/2}.   (20)

TABLE III
THE MEMORY ADVANTAGE M(k, 2) FOR A FIRST-ORDER GAUSS-MARKOV PROCESS (dB)

        Regression coefficient
  k     a = 0.5   a = 0.9   a = 0.95
  1     0         0         0
  2     0.62      3.61      5.05
  3     0.83      4.81      6.74
  4     0.94      5.41      7.58
  5     1.00      5.77      8.09
  6     1.04      6.01      8.42
  7     1.07      6.18      8.67
  8     1.09      6.31      8.85
  9     1.11      6.41      8.99
 10     1.12      6.49      9.10
 12     1.15      6.61      9.27
 16     1.17      6.76      9.48
 24     1.20      6.91      9.69
 100    1.24      7.14      10.01
 inf    1.25      7.21      10.11

Fig. 3. Memory advantage M(k, 2) for a Gauss-Markov source with regression coefficient a (a = 0.0, 0.5, 0.9, 0.95).
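Equation (19) and the closed form above are simple to evaluate; the sketch below (ours, not from the paper) reproduces Table III both from the closed form and directly from the covariance matrix:

```python
import numpy as np

def M_ar1(k, a, r=2):
    """Closed-form memory advantage for a first-order Gauss-Markov source."""
    return (1 - a**2) ** ((1 - k) * r / (2.0 * k))

def M_from_R(k, a, r=2):
    """Same quantity via (19), building the covariance matrix explicitly."""
    i = np.arange(k)
    R = a ** np.abs(i[:, None] - i[None, :])     # unit marginal variance
    detR = np.linalg.det(R)
    return (1.0 / detR ** (1.0 / k)) ** (r / 2.0)

for a in (0.5, 0.9, 0.95):
    print([round(10 * np.log10(M_ar1(k, a)), 2) for k in (2, 4, 8)])
    # -> [0.62, 0.94, 1.09], [3.61, 5.41, 6.31], [5.05, 7.58, 8.85], as in Table III
print(np.allclose(M_from_R(8, 0.9), M_ar1(8, 0.9)))  # True
```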


One would expect that, in spite of the high-resolution approximations, the expressions for rate versus distortion of high-resolution quantization theory approach those of rate-distortion theory as the dimension becomes arbitrarily large, and indeed, several authors have shown that this is so for many sources and distortion measures (see [6], [8], [9], [11]). Since our discussion has been based on high-resolution quantization approximations, the combination of the vector quantizer advantages for a particular source should also yield a total advantage for large dimension consistent with rate-distortion theory. We now demonstrate that this is true for a Gauss-Markov source.

Berger [25, p. 113] gives the rate-distortion function for a first-order Gauss-Markov source with variance 1 (for convenience) and small distortion (the same condition for validity of the high-resolution quantization expressions):

R(D) = \frac{1}{2} \log_2 \frac{1 - a^2}{D} \text{ bits},

which leads to

D(R) = (1 - a^2) 2^{-2R}.

The asymptotic scalar quantizer expression for a Gaussian random variable with variance 1 is given by [26]

R(D_s) = \frac{1}{2} \log_2 \frac{1}{D_s} + 0.722 \text{ bits},

leading to

D_s(R) = (2.721) 2^{-2R}.

Hence the ratio of the asymptotic scalar distortion to the distortion-rate function is given by

\bar{A} = \frac{D_s(R)}{D(R)} = \frac{2.721}{1 - a^2}.

Now from (11), (18), and (20), we can calculate

\lim_{k \to \infty} A(k, 2) = (1.423)(1.912) \frac{1}{1 - a^2} = \frac{2.721}{1 - a^2},

which gives the expected agreement with rate-distortion theory. Recall that agreement with rate-distortion theory is expected in both the resolution-constrained and entropy-constrained problems.

III. ENTROPY-CONSTRAINED CRITERION

Until now, we have been concerned with the constrained resolution problem. We can solve the entropy-constrained problem in an analogous manner. Zador's equation (with Gersho's conjecture) for the optimum entropy-constrained quantizer's performance under the average rth power distortion measure is

D_e(N; k, r) = C(k, r) 2^{-(r/k)[H_Q - H(p)]}   (21)

where H_Q is the entropy of the quantizer output, i.e., if p_i is the probability of the ith output, then H_Q = -\sum_i p_i \log_2 p_i; and H(p) is the differential entropy of the k-dimensional probability density function, i.e., H(p) = -\int p(x) \log_2 p(x) \, dx. We apply the same logic as in (3) to obtain the entropy-constrained vector quantizer advantage:

A_e(k, r) = \frac{C(1, r)}{C(k, r)} 2^{r[H(\bar{p}) - (1/k) H(p)]}   (22)

where, as before, \bar{p}(x) is the marginal probability density function. Factoring leads to the entropy-constrained space-filling advantage

F_e(k, r) = \frac{C(1, r)}{C(k, r)},

i.e., the same as the resolution-constrained space-filling advantage. The entropy-constrained shape advantage is given by

S_e(k, r) = 2^{r[H(\bar{p}) - (1/k) H(p^*)]} = 1,

since the entropy of the product distribution of independent random variables is just the sum of their entropies. Hence, for the entropy-constrained problem there is no shape advantage. The absence of a shape advantage is responsible for the observation, first made by Gish and Pierce [7], that for mean squared error, a high-resolution scalar quantizer followed by an entropy coder can get within 0.254 bit of the rate-distortion curve for an i.i.d. source, regardless of the probability density function shape. This is true because the lack of a memory advantage (due to the independence) and the inherent lack of a shape advantage for entropy-constrained quantizers leaves only the space-filling advantage. Recall from Section II-A that the space-filling advantage is bounded above by the sphere upper bound and approaches a limit of 1.423 as the dimension gets very large. We can combine this with the expression for high-resolution scalar quantizer performance ((1) with k = 1 and r = 2) to show that the number of bits saved by a vector quantizer over a scalar quantizer with the same distortion when only the space-filling advantage is present is bounded above by and approaches 0.254 bit as the dimension gets arbitrarily large.

When memory is present, the entropy-constrained memory advantage is given by

M_e(k, r) = 2^{r[H(\bar{p}) - (1/k) H(p)]}.

We can invoke properties of differential entropy to evaluate M_e for a Gaussian random vector (see the Appendix):

M_e(k, r) = 2^{(r/2)[\log_2 \sigma^2 - (1/k) \log_2 (\det R)]} = \left[ \frac{\sigma^2}{(\det R)^{1/k}} \right]^{r/2},   (23)

which is the same as the memory advantage for the resolution-constrained Gaussian random vector case.
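The 0.254-bit figure follows from the limiting space-filling advantage alone: for r = 2, distortion at high rate scales as 2^{-2R}, so a fixed-rate distortion ratio F corresponds to (1/2) log_2 F bits. A two-line numerical check (ours, not from the paper):

```python
from math import pi, e, log2

# Limiting space-filling advantage: C(1,2)/C(inf,2) = (1/12)/(1/(2*pi*e)).
F_inf = (2 * pi * e) / 12.0
# With no shape or memory advantage, the rate penalty of entropy-coded
# scalar quantization over the best vector quantizer is (1/2)*log2(F_inf).
print(F_inf, 0.5 * log2(F_inf))   # -> 1.4233..., 0.2546... bits
```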

IV. MORE GENERAL DISTORTION MEASURES

We can find vector quantizer advantage expressions for a wider class of distortion measures than the rth power distortion used so far; in fact, we can consider any distortion d(x, y) given as the rth power of a seminorm of the difference x - y. However, we will no longer be able to identify the factor F(k, r) as due to the space-filling properties of polytopes.

A seminorm on R^k is a function |\cdot| from R^k to R with the properties that, for all x, y in R^k and a in R: 1) |x| >= 0; 2) |ax| = |a| |x|; and 3) |x + y| <= |x| + |y|. Examples include the l_\nu norms for \nu \ge 1, defined by

|x| = \left( \sum_{i} |x_i|^\nu \right)^{1/\nu},

the l_\infty or sup norm, given by

|x| = \sup_i |x_i|,

and the quadratic norms |x| = (x^T B x)^{1/2} with B positive semidefinite. The volume of a unit sphere in k dimensions under a particular seminorm is given by

V_k = \int_{|u| \le 1} du.   (24)

For the l_\nu norms, V_k is given by (9); for the l_\infty norm, V_k = 2^k; and for the quadratic norms,

V_k = \frac{V_k(2)}{\sqrt{\det B}}.
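The volumes in (9) and (24) are easy to sanity-check numerically. The sketch below is ours (not from the paper); it Monte Carlo integrates (24) over the enclosing cube and compares against the closed forms for a few norms:

```python
import numpy as np
from math import gamma

def V_lp(k, zeta):             # eq. (9): unit-ball volume under the l_zeta norm
    return (2 * gamma(1 + 1 / zeta)) ** k / gamma(1 + k / zeta)

def V_mc(k, norm, n=200000, seed=0):
    """Monte Carlo estimate of (24): fraction of the cube [-1,1]^k with |u| <= 1."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(-1, 1, size=(n, k))
    return 2.0**k * np.mean(norm(u) <= 1.0)

k = 3
print(V_lp(k, 1), V_lp(k, 2), 2.0**k)                      # l1, l2, l-infinity
print(V_mc(k, lambda u: np.abs(u).sum(axis=1)),
      V_mc(k, lambda u: np.sqrt((u**2).sum(axis=1))))
# -> 4/3, 4*pi/3 = 4.18879..., 8; the Monte Carlo values agree to about 1%
```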

We consider a distortion measure defined by

d(x, y) = |x - y|^r, \quad \text{for all } r > 0.

For this class of distortion measures, Yamada et al. [9] have derived a lower bound on the distortion of a high-resolution optimal resolution-constrained quantizer:

D(N; k, r) \ge D_L(N; k, r) = \frac{V_k^{-r/k}}{k+r} N^{-r/k} \|p(x)\|_{k/(k+r)},   (25)

and Bucklew [11] has provided an upper bound

D(N; k, r) \le D_U(N; k, r) = \Gamma(1 + r/k) \left( \frac{k+r}{k} \right) \frac{V_k^{-r/k}}{k} N^{-r/k} \|p(x)\|_{k/(k+r)}.   (26)

As Bucklew notes, his bound is slightly weaker than Zador's (10) in cases where Zador's applies (when the seminorm is the Euclidean norm). Since

\frac{D_L(N; 1, r)}{D_U(N^k; k, r)} \le \frac{D(N; 1, r)}{D(N^k; k, r)} = A(k, r) \le \frac{D_U(N; 1, r)}{D_L(N^k; k, r)},

and noting that the factors N^{-r/k} \|p(x)\|_{k/(k+r)} are present in both D_L and D_U, we can again divide and identify three vector quantizer advantages; the latter two are identical to the shape and memory advantages already defined in (6) and (7), while the first, which we will denote by \tilde{F}(k, r), is bounded above and below by the corresponding ratios of the constants in (25) and (26).

Note that, although the bounds on \tilde{F}(k, r) are independent of the probability density function of the source, the results we have quoted for seminorm-based distortion measures do not prove that \tilde{F}(k, r) itself is independent of the probability density function. Nevertheless, when the bounds tightly control \tilde{F}(k, r), it appears that memory and shape advantages do not depend on the particular seminorm used, but only on the power to which it is raised in the distortion measure. Although Yamada et al. have a lower bound similar to (25) for the entropy-constrained case, the authors are not aware of an upper bound similar to Bucklew's (26); hence the same procedure as for the resolution-constrained case cannot be applied.

V. APPLICATIONS TO QUANTIZER SYSTEMS

The development so far allows us to gain some insight into the performance of a variety of specific quantizer systems. In this section we analyze several of these systems and make some comparisons with experimental low-rate vector quantizers.

A. Product Vector Quantizers

In some cases, the natural vector quantizer dimension may be too large for implementation. For example, in transform vector quantization [27], it would be desirable to have a vector quantizer with the same dimension as the transform, but for a transform of dimension 256 and rate 1 bit/sample, this would require a code book of size 2^{256}. Hence we are led to use product structured code books [28] in which the overall code book is made of a concatenation of smaller ones. This leads to some decrease in performance; the vector quantizer advantage results give us an idea how much. For instance, replacing a 256-dimensional vector quantizer with 32 eight-dimensional vector quantizers, operating on a Gaussian source and assuming that the samples are uncorrelated (e.g., a transform coder with a transform that approximately decorrelates the coefficients), leads to a predicted performance loss of 1.22 dB, where we have subtracted the shape and space-filling advantages for dimension 8 from those for dimension 256.

B. Scalar versus Vector Transform Coders

If the process under consideration is nonstationary, the analysis is greatly complicated. However, one example of practical interest can be easily treated. In scalar transform coders, the transform domain coefficients have different variances. Typically, different scalar quantizers are applied to each coordinate (matched to the coordinate’s variance) and an optimal allocation of bits among the quantizers is made. To use our previous results, we assume a jointly Gaussian random vector. Hence we are led to evaluate the vector quantizer advantage for a jointly Gaussian random vector with the optimal bit allocation among a family of scalar quantizers.


We apply (1) to the scalar quantizers and take the average over the k dimensions to obtain

\bar{D}_s = \frac{1}{k} \sum_{i=1}^{k} C(1, r) N_i^{-r} \|p_i(x)\|_{1/(1+r)},

where N_i is the number of output points allocated to coordinate i and p_i is that coordinate's Gaussian marginal density. We can apply the same method as Huang and Schultheiss [29], who originally solved the bit allocation problem for the case r = 2, to give the optimal allocation of reproduction levels for arbitrary r as

N_i = N \left( \frac{\sigma_i^2}{\bar{V}} \right)^{1/2},

with \bar{V} the geometric mean of the variances, \bar{V} = \left( \prod_{i=1}^{k} \sigma_i^2 \right)^{1/k}, so that \prod_i N_i = N^k. Under the optimal bit allocation,

\bar{D}_s = C(1, r) (2\pi)^{r/2} (1 + r)^{(1+r)/2} N^{-r} \bar{V}^{r/2}.

The distortion for the vector quantizer follows from (1) and the result for the norm of a jointly Gaussian random vector given in the Appendix:

\bar{D}_v = C(k, r) (2\pi)^{r/2} \left( \frac{k+r}{k} \right)^{(k+r)/2} N^{-r} (\det R)^{r/2k}.

Dividing \bar{D}_s by \bar{D}_v gives us the vector quantizer advantage. We note that the space-filling and shape advantages are precisely the same as those given before in Sections II-A and II-B, but the memory advantage is

M(k, r) = \left[ \frac{\bar{V}}{(\det R)^{1/k}} \right]^{r/2}.

Since (\det R)^{1/k} \le \left( \prod_i R_{ii} \right)^{1/k} = \bar{V}, the memory advantage is minimized and equal to one if and only if the components are uncorrelated. If we use these results to compare scalar and vector transform coders, we note that if the transform fails to decorrelate the coefficients or if the process is non-Gaussian (so that there is nonlinear dependence among coefficients) the vector quantizer will gain some memory advantage. The vector quantizer will always achieve space-filling and shape advantages. Also, the bit allocation implicitly assumes we can make nonintegral bit assignments. While scalar quantizers cannot make fractional bit assignments, vector quantizers can have effectively fractional bit assignments. This factor is quite important in the superior performance of transform vector quantizers [27].
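The allocation rule and the gain of optimal over equal allocation are easy to compute; the sketch below is ours (the variances are arbitrary illustrative values) and implements the formulas above for r = 2:

```python
import numpy as np

def allocate_and_compare(variances, rate_per_sample=4.0, r=2):
    """Optimal scalar bit allocation vs. equal allocation for Gaussian coordinates."""
    var = np.asarray(variances, dtype=float)
    k = var.size
    Vbar = np.exp(np.log(var).mean())                 # geometric mean of variances
    N = 2.0 ** rate_per_sample                        # levels per sample before skewing
    Ni = N * np.sqrt(var / Vbar)                      # optimal (fractional) allocation
    const = (1.0 / 12.0) * 2 * np.pi * 3 ** 1.5       # 2.721..., from (1) with k=1, r=2
    D_opt = const * np.mean(var / Ni ** r)            # = const * Vbar * N**(-r)
    D_equal = const * var.mean() * N ** (-r)          # N_i = N for every coordinate
    return D_opt, D_equal

D_opt, D_eq = allocate_and_compare([4.0, 1.0, 0.25, 0.0625])
print(10 * np.log10(D_eq / D_opt))                    # -> about 4.24 dB gain here
```

The gain in dB is just the ratio of the arithmetic to the geometric mean of the variances; note that the allocation is allowed to be fractional, which is exactly the nonintegral bit assignment issue mentioned above.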

C. Linear Predictive Quantizers

Linear prediction is commonly used to remove redundancy due to statistical correlation between samples. In this respect, it is much like transform coding, in which removal of correlation may significantly reduce the memory advantage and hence make vector quantization (of the decorrelated sequence) less attractive. Again, however, a memory advantage (due to failure to decorrelate the source completely and to nonlinear dependency between samples), a shape advantage, and a space-filling advantage will result in some vector quantizer advantage over scalar quantization. Also, the ability to quantize at rates less than 1 bit/sample may be desirable.

D. Lattice and Pyramid Vector Quantizers

As we noted in our discussion of the space-filling advantage, some lattices are superior to others in terms of their use in specifying output points for vector quantizers; in fact, C(k, r) is defined in terms of the optimum space-filling polytope. For the i.i.d. uniform random process, there is no shape or memory advantage to be gained, and a vector quantizer based on a pure lattice may be the best implementable quantizer. Such quantizers will be limited in the gain that they can achieve over repeated scalar quantization to the space-filling advantage. However, several lattices have fast encoding algorithms [30]-[32] that allow the use of much larger dimensions (and higher rates) than would be possible using a standard full search vector quantizer implementation. Hence, lattice quantizers are well matched to uniform memoryless sources, or to general memoryless sources when entropy coding is to be applied to the quantizer output.

Shannon theory includes the notion of a "typical set": roughly, if one looks at a long vector x from an i.i.d. process, then with high probability it is in the set for which (1/n) \log p(x) \approx -H(\bar{p}), where H(\bar{p}) is the differential entropy of the marginal density of the process (see Section III). The particular geometry of the typical set depends on the type of process. Fischer [33] has found a fast algorithm for searching a lattice restricted to the typical set for an i.i.d. process with Laplacian marginal density. This set has the shape of a pyramid; hence the name "pyramid quantization." Such a quantizer has the potential of exploiting both the shape advantage for Laplacian sources and the space-filling advantage (depending on the lattice chosen) but not the memory advantage. Again, Fischer's fast algorithm allows consideration of longer dimensions and higher rates than a full search implementation; in fact, an implementation with current technology should be able to quantize 64-dimensional vectors at a sample rate of 16 kHz [34]. A Hadamard unscrambling/permutation code due to Schroeder and Sloane [35] and several algebraic codes explored by Adoul and Lamblin [36] achieve the same low complexity and high-performance coding of the typical set for a Gaussian i.i.d. source (a spherical shell). These quantizers should work particularly well in a system in which some kind of preprocessing has removed much of the memory from the process (e.g., linear prediction), leaving a residual process to be quantized which is roughly memoryless with a Laplacian or Gaussian marginal.
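As a concrete example of the kind of fast lattice encoding algorithm cited above [30]-[32], the sketch below (our illustration, not code from the paper) implements the well-known Conway-Sloane nearest-point procedure for the D_n lattices: D_n is the set of integer vectors with even coordinate sum, and its nearest point is found by rounding with at most one correction.

```python
import numpy as np

def nearest_Dn(x):
    """Nearest point of D_n = {z in Z^n : sum(z) even} to x.

    Round every coordinate; if the coordinate sum is odd, re-round the
    single worst-rounded coordinate in the other direction.
    """
    f = np.rint(x)
    if int(round(f.sum())) % 2 != 0:
        i = int(np.argmax(np.abs(x - f)))       # largest rounding error
        f[i] += 1.0 if x[i] > f[i] else -1.0    # push it to the next integer
    return f

x = np.array([0.9, 0.1, -0.4, 1.6])
print(nearest_Dn(x))    # -> [ 1.  0. -1.  2.] (coordinate sum made even)
```

The cost is a rounding pass plus one argmax, independent of the rate, which is what makes dimensions far beyond full-search feasibility practical.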


E. Universal Quantization

Ziv [37] has explored a system composed of a dithered uniform scalar quantizer followed by an entropy coder (e.g., a Huffman coder) whose input is a vector of k of the scalar quantizer's outputs. He shows that such a system, which he calls a universal quantizer, achieves an output rate that is no more than 0.754 bit/sample higher than the optimum achievable by a k-dimensional quantizer. Gutman [38] extends Ziv's result for mean square error to a variety of other distortions. A particularly nice aspect is that the results do not depend on high rate, as opposed to the theory behind this paper. On the other hand, as Gutman notes, the performance bound may become too loose to be useful as we consider rates around 1 bit/sample or below.

If we assume high rate so as to provide some comparison with high-resolution quantization theory, we recall that, for the constrained-entropy problem, there is no shape advantage. Since the universal quantizer employs a vector-alphabet entropy coder, much of the memory in a k-dimensional input can still be extracted even after scalar quantization. In fact, as the quantizer becomes increasingly fine, the bit savings provided by the memory advantage are exactly the same as those provided by entropy coding blocks of scalar quantizer outputs rather than single outputs. To see this, note that by solving (21) for H_Q, using (22), and assuming that all entropy coders actually achieve a rate equal to the quantizer output entropy, we can show that the number of bits per sample saved due to the memory advantage is precisely H(\bar{p}) - (1/k) H(p). The number of bits per sample saved by using block entropy coding of successive scalar outputs rather than entropy coding them individually is H(\bar{p}_Q) - (1/k) H(p_Q), where p_Q is the probability mass function of a block of k scalar quantizer outputs, and \bar{p}_Q is the probability mass function for a single output. We can identify

H(\bar{p}_Q) - (1/k) H(p_Q) = (1/k) I(p_Q^*; p_Q),

where I(p_Q^*; p_Q) is the mutual information between the product of the marginals and p_Q. Likewise, H(\bar{p}) - (1/k) H(p) = (1/k) I(p^*; p), so that using the definition of the mutual information for probability densities as the supremum over the mutual information of the corresponding quantizer probability mass functions [39, p. 34], it follows that in the high-resolution approximation, these two numbers are the same.

F. Low Rate Experiments

While we require high rate for the validity of the theory presented here, many practical vector quantization schemes operate at low rates, a frequent rate for system comparisons being 1 bit/sample. Several studies have been done to evaluate vector quantizers designed using the generalized Lloyd algorithm for different statistical sources [14],

Fig. 4. Low rate VQ advantage: memoryless Gaussian source. Solid lines are predicted space-filling and shape advantages. Dashed line [40] and dotted line [14] are total realized advantages at 1 bit/sample.


Fig. 5. Low rate VQ advantage: memoryless Laplacian source. Solid lines are predicted space-filling and shape advantages; dashed line [40] is total realized advantage at 1 bit/sample. (Horizontal axis: dimension.)

Fig. 6. Low rate VQ advantage: Gauss-Markov source with regression coefficient 0.9. Solid lines are predicted space-filling, shape, and memory advantages; dashed line [41] is total realized advantage at 1 bit/sample. (Horizontal axis: dimension.)

We compare the results of actual vector quantizer designs at 1 bit/sample against our theoretical vector quantizer advantages for the memoryless Gaussian source (Fig. 4), the memoryless Laplacian source (Fig. 5), and the Gauss-Markov source with a regression coefficient of 0.9 (Fig. 6). The total expected vector quantizer advantage is the sum of the solid lines representing the appropriate space-filling, shape, and memory advantages for the particular case. The dashed lines represent the realized vector quantizer advantage.

It is immediately apparent that actual low rate quantizers realize only a small portion of the predicted gain. The memoryless Gaussian source in particular behaves as if little of the shape and space-filling advantages is actually realized. In fact, in [14] and [41], experimental constrained-resolution vector quantizer performances were compared against the constrained-entropy high-resolution expressions (we recall that in the latter case a shape advantage is never present).

Several factors help explain the discrepancy. Most important is the low rate for the designed quantizers. Recall that the distortion of a high-resolution scalar quantizer is an integral part of our vector quantizer advantage derivation. However, a 1-bit/sample scalar quantizer (two levels)




in no way resembles a high-resolution quantizer with enough output levels so that the density is approximately constant across all bins. Second, practical quantizers must deal with overload distortion, in which no reproduction points are available for large-valued (unlikely) source outputs. For instance, a typical rule is to limit the quantization region to four times the standard deviation of the input; values that fall outside this range result in large distortions. Finally, there are the quirks of the generalized Lloyd algorithm itself, such as the possibility of converging to a bad local optimum [42].
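The first factor can be made concrete. As an illustration (ours, using the standard Panter-Dite constant for the Gaussian, not a computation from the paper), the high-resolution distortion formula at 1 bit/sample predicts nearly twice the distortion actually achieved by the optimal two-level Gaussian quantizer:

```python
import math

R = 1  # rate in bits/sample

# Optimal two-level quantizer for a unit-variance Gaussian: levels at
# +/- E|X| = sqrt(2/pi), giving MSE 1 - 2/pi.
mse_actual = 1.0 - 2.0 / math.pi

# Panter-Dite high-resolution approximation for the Gaussian:
# D ~ (sqrt(3) * pi / 2) * 2**(-2R).
mse_highres = (math.sqrt(3.0) * math.pi / 2.0) * 2.0 ** (-2 * R)

print(f"actual 1-bit MSE:         {mse_actual:.3f}")   # ~0.363
print(f"high-resolution estimate: {mse_highres:.3f}")  # ~0.680
```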

Hence the vector quantizer advantage predictions become less reliable as we consider low-rate coders. However, since the only reliable alternative is actually implementing designs and testing them, the vector quantizer advantage theory may be the best available tool for helping a designer decide whether to attempt a vector quantizer design. All the experiments suggest that the vector quantizer advantage at least provides an effective upper bound on the available gain.

VI. CONCLUSION

Formulas from high-resolution quantization theory can be applied to both scalar and vector quantizers. By factoring the ratio of the distortions of optimal high-resolution scalar and vector quantizers, we can identify terms that contribute to the improvement in performance achievable by vector quantization. For the resolution-constrained problem, these terms are the space-filling advantage, which is independent of source characteristics; the shape advantage, which depends on the marginal density shape; and the memory advantage, which depends on the dependence between samples in the source. In the entropy-constrained problem, the shape advantage is absent. The relative magnitudes of these advantages were demonstrated through tables and figures.

We have applied the insights gained from this factorization to a number of quantizer systems. We were able to analyze performance losses when a product-structured code book is substituted for a full-search code book. In transform coding applications, vector quantizers will always outperform scalar quantizers, although the memory advantage may be reduced or eliminated by using a decorrelating transform. In a similar manner, linear predictive quantizers can reduce the memory advantage of vector quantization. Lattice vector quantizers benefit from the space-filling advantage, while pyramid and similar restricted lattice vector quantizers can use both space-filling and shape advantages for Laplacian and Gaussian sources; hence these quantizers are well matched to sources for which the memory advantage is absent. We showed that Ziv's universal quantization results are not so surprising at high rates, although his results are valid in low-rate regions where high-resolution quantization theory breaks down. Finally, we showed that the numerical predictions of the theory overestimated the performance gains actually realized by replacing scalar by vector quantizers at low rates

(1 bit/sample), particularly for the memoryless Gaussian source.

ACKNOWLEDGMENT

The authors are indebted to Philip A. Chou and Ping Wah Wong for helpful comments and discussion.

APPENDIX

Calculation of the Resolution-Constrained Memory Advantage for a Jointly Gaussian Random Vector

We first evaluate the $k/(k+r)$ norm of the jointly Gaussian random vector probability density:

$$\|p(x)\|_{k/(k+r)} = \left[ \int \left[ \frac{1}{(2\pi)^{k/2} (\det R)^{1/2}} \exp\!\left( -\frac{x' R^{-1} x}{2} \right) \right]^{k/(k+r)} dx \right]^{(k+r)/k}.$$

Letting $S = [(k+r)/k] R$, we have

$$\det(S) = \left( \frac{k+r}{k} \right)^{k} \det(R),$$

which, in turn, gives (after some algebra)

$$\|p(x)\|_{k/(k+r)} = (2\pi)^{r/2} \left( \frac{k+r}{k} \right)^{(k+r)/2} (\det R)^{r/(2k)}. \qquad (27)$$

For $p^*(x)$ we have $R = \sigma^2 I$ with $I$ the identity matrix; thus

$$\|p^*(x)\|_{k/(k+r)} = (2\pi)^{r/2} \left( \frac{k+r}{k} \right)^{(k+r)/2} \sigma^{r}. \qquad (28)$$

Dividing (28) by (27) gives the desired result (equation (19) in the text).
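To make the result concrete, the following minimal sketch (ours, not part of the paper) evaluates the ratio obtained by dividing (28) by (27), $[\sigma^2/(\det R)^{1/k}]^{r/2}$, for a hypothetical Gauss-Markov covariance with coefficient 0.9 and squared-error distortion ($r = 2$):

```python
import numpy as np

a, k, r = 0.9, 8, 2  # regression coefficient, dimension, distortion power

# Gauss-Markov (AR(1)) covariance with unit marginal variance: R[i,j] = a**|i-j|.
idx = np.arange(k)
R = a ** np.abs(np.subtract.outer(idx, idx))

_, logdet = np.linalg.slogdet(R)
geo_mean_eig = np.exp(logdet / k)            # (det R)**(1/k)
advantage = (1.0 / geo_mean_eig) ** (r / 2)  # sigma^2 = 1 here

print(f"k={k}: advantage {advantage:.2f} "
      f"= {10 * np.log10(advantage):.2f} dB "
      f"= {0.5 * np.log2(advantage):.2f} bits/sample")
# Approaches 1/(1 - a**2) ~ 5.26 (about 7.2 dB) as k grows.
```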

Calculation of the Entropy-Constrained Memory Advantage for a Jointly Gaussian Random Vector

Let $x \sim N(0, R)$ and $y \sim N(0, \sigma^2 I)$. The heart of our problem is to find $H(y) - H(x)$. From the theory of jointly Gaussian random vectors [43], there exist matrices $A$ and $B$ such that

$$AA^T = R, \qquad BB^T = \sigma^2 I$$

and

$$z_x = A^{-1} x \sim N(0, I), \qquad z_y = B^{-1} y \sim N(0, I).$$

Trivially, $B = \sigma I$. Since the transformation $A^{-1}$ is linear, its Jacobian $J_{A^{-1}}$ is simply itself, as is the Jacobian $J_{B^{-1}}$. Berger [25, p. 87] shows that the differential entropies of two random variables related by a transformation differ by the expected value of the logarithm of the Jacobian, so that

$$H(z_x) = H(x) + E[\log_2 \det(J_{A^{-1}})] = H(x) + E[\log_2 \det(A^{-1})] = H(x) - \log_2 \det(R)^{1/2}.$$

Similarly,

$$H(z_y) = H(y) - \log_2(\sigma^k).$$

Since $z_x$ and $z_y$ have the same distribution, $H(z_x) = H(z_y)$, so that

$$H(y) - H(x) = \frac{1}{2} \log_2 \frac{\sigma^{2k}}{\det(R)}$$





and

$$M(k, r) = 2^{(r/k)[H(y) - H(x)]} = \left[ \frac{\sigma^2}{(\det R)^{1/k}} \right]^{r/2},$$

as stated in (23).
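As a cross-check (ours, not in the original appendix), the entropy difference can be evaluated directly from the closed-form Gaussian differential entropy $h = \frac{1}{2}\log_2((2\pi e)^k \det R)$, confirming numerically that the entropy-constrained memory advantage coincides with the resolution-constrained one:

```python
import numpy as np

a, k, r, sigma2 = 0.9, 8, 2, 1.0  # same hypothetical Gauss-Markov example as above

idx = np.arange(k)
R = a ** np.abs(np.subtract.outer(idx, idx))
_, logdetR = np.linalg.slogdet(R)  # natural log of det(R)

# Closed-form Gaussian differential entropies (in bits).
h_x = 0.5 * (k * np.log2(2 * np.pi * np.e) + logdetR / np.log(2))
h_y = 0.5 * (k * np.log2(2 * np.pi * np.e) + k * np.log2(sigma2))

advantage = 2.0 ** ((r / k) * (h_y - h_x))
print(f"entropy-constrained memory advantage at k={k}: {advantage:.2f}")  # ~4.28, as in (19)
```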

REFERENCES

[1] A. Gersho and V. Cuperman, "Vector quantization: A pattern matching technique for speech coding," IEEE Commun. Mag., vol. 21, pp. 15-21, Dec. 1983.
[2] R. M. Gray, "Vector quantization," IEEE ASSP Mag., vol. 1, pp. 4-29, Apr. 1984.
[3] J. Makhoul, S. Roucos, and H. Gish, "Vector quantization in speech coding," Proc. IEEE, vol. 73, pp. 1551-1588, 1985.
[4] W. R. Bennett, "Spectra of quantized signals," Bell Syst. Tech. J., vol. 27, pp. 446-472, July 1948.
[5] M. P. Schutzenberger, "On the quantization of finite dimensional messages," Inform. Contr., vol. 1, pp. 153-158, 1958.
[6] P. Zador, "Asymptotic quantization error of continuous signals and the quantization dimension," IEEE Trans. Inform. Theory, vol. IT-28, pp. 139-149, Mar. 1982 (previously an unpublished Bell Laboratories memorandum, "Topics in the asymptotic quantization of continuous random variables," 1966).
[7] H. Gish and J. N. Pierce, "Asymptotically efficient quantizing," IEEE Trans. Inform. Theory, vol. IT-14, pp. 676-683, Sept. 1968.
[8] A. Gersho, "Asymptotically optimal block quantization," IEEE Trans. Inform. Theory, vol. IT-25, pp. 373-380, July 1979.
[9] Y. Yamada, S. Tazaki, and R. M. Gray, "Asymptotic performance of block quantizers with difference distortion measures," IEEE Trans. Inform. Theory, vol. IT-26, pp. 6-14, Jan. 1980.
[10] J. A. Bucklew and G. L. Wise, "Multidimensional asymptotic quantization theory with rth power distortion measures," IEEE Trans. Inform. Theory, vol. IT-28, pp. 239-247, Mar. 1982.
[11] J. A. Bucklew, "Upper bounds to the asymptotic performance of block quantizers," IEEE Trans. Inform. Theory, vol. IT-27, pp. 577-581, Sept. 1981.
[12] J. A. Bucklew, "Two results on the asymptotic performance of quantizers," IEEE Trans. Inform. Theory, vol. IT-30, pp. 341-348, Mar. 1984.
[13] S. P. Lloyd, "Least squares quantization in PCM," IEEE Trans. Inform. Theory, vol. IT-28, pp. 129-136, Mar. 1982 (previously an unpublished Bell Laboratories Technical Note, 1957).
[14] Y. Linde, A. Buzo, and R. M. Gray, "An algorithm for vector quantizer design," IEEE Trans. Commun., vol. COM-28, pp. 84-95, Jan. 1980.
[15] P. A. Chou, T. Lookabaugh, and R. M. Gray, "Entropy-constrained vector quantization," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-36, pp. 31-42, 1988.
[16] R. B. Ash, Real Analysis and Probability. New York: Academic, 1972.
[17] E. S. Barnes and N. J. A. Sloane, "The optimal lattice quantizer in three dimensions," SIAM J. Algebraic Discrete Methods, vol. 4, pp. 30-41, Mar. 1983.
[18] J. H. Conway and N. J. A. Sloane, "Voronoi regions of lattices, second moments of polytopes, and quantization," IEEE Trans. Inform. Theory, vol. IT-28, pp. 211-226, Mar. 1982.
[19] J. H. Conway and N. J. A. Sloane, "A lower bound on the average error of vector quantizers," IEEE Trans. Inform. Theory, vol. IT-31, pp. 106-109, Jan. 1985.
[20] H. L. Royden, Real Analysis, 2nd ed. New York: Collier Macmillan, 1968.
[21] M. D. Paez and T. H. Glisson, "Minimum mean-squared error quantization in speech PCM and DPCM systems," IEEE Trans. Commun., vol. COM-20, pp. 225-230, Apr. 1972.
[22] R. Bellman, Introduction to Matrix Analysis. New York: McGraw-Hill, 1960.
[23] M. B. Priestley, Spectral Analysis and Time Series. New York: Academic, 1981.
[24] N. S. Jayant and P. Noll, Digital Coding of Waveforms. Englewood Cliffs, NJ: Prentice-Hall, 1984.
[25] T. Berger, Rate Distortion Theory: A Mathematical Basis for Data Compression. Englewood Cliffs, NJ: Prentice-Hall, 1971.
[26] P. Noll and R. Zelinski, "Bounds on quantizer performance in the low bit-rate region," IEEE Trans. Commun., vol. COM-26, pp. 300-304, Feb. 1978.
[27] P. C. Chang, R. M. Gray, and J. May, "Fourier transform vector quantization for speech coding," IEEE Trans. Commun., vol. 35, pp. 1059-1068, Oct. 1987.
[28] M. J. Sabin and R. M. Gray, "Product code vector quantizers for waveform and voice coding," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32, pp. 474-488, June 1984.
[29] J. J. Huang and P. M. Schultheiss, "Block quantization of correlated Gaussian random variables," IEEE Trans. Commun., vol. COM-11, pp. 289-296, Sept. 1963.
[30] J. H. Conway and N. J. A. Sloane, "Fast quantizing and decoding algorithms for lattice quantizers and codes," IEEE Trans. Inform. Theory, vol. IT-28, pp. 227-231, Mar. 1982.
[31] A. Gersho, "On the structure of vector quantizers," IEEE Trans. Inform. Theory, vol. IT-28, pp. 157-166, Mar. 1982.
[32] K. Sayood, J. D. Gibson, and M. C. Rost, "An algorithm for uniform vector quantizer design," IEEE Trans. Inform. Theory, vol. IT-30, pp. 805-814, Nov. 1984.
[33] T. R. Fischer, "A pyramid vector quantizer," IEEE Trans. Inform. Theory, vol. IT-32, pp. 568-583, July 1986.
[34] Q. Quresh and T. Fischer, "A hardware pyramid vector quantizer," in Proc. ICASSP, Dallas, TX, 1987, pp. 1402-1405.
[35] M. R. Schroeder and N. J. A. Sloane, "New permutation codes using Hadamard unscrambling," IEEE Trans. Inform. Theory, vol. IT-33, pp. 144-146, Jan. 1987.
[36] J. P. Adoul and C. Lamblin, "A comparison of some algebraic structures for CELP coding of speech," in Proc. ICASSP, Dallas, TX, Apr. 1987, pp. 45.8.1-45.8.4.
[37] J. Ziv, "On universal quantization," IEEE Trans. Inform. Theory, vol. IT-31, pp. 344-347, May 1985.
[38] M. Gutman, "On uniform quantization with various distortion measures," IEEE Trans. Inform. Theory, vol. IT-33, pp. 169-171, Jan. 1987.
[39] R. J. McEliece, The Theory of Information and Coding, vol. 3 of Encyclopedia of Mathematics and Its Applications. Reading, MA: Addison-Wesley, 1977.
[40] T. R. Fischer and R. M. Dicharry, "Vector quantizer design for memoryless Gaussian, gamma, and Laplacian sources," IEEE Trans. Commun., vol. COM-32, pp. 1065-1069, Sept. 1984.
[41] R. M. Gray and Y. Linde, "Vector quantizers and predictive vector quantizers for Gauss-Markov sources," IEEE Trans. Commun., vol. COM-30, pp. 381-389, Feb. 1982.
[42] R. M. Gray and E. D. Karnin, "Multiple local optima in vector quantizers," IEEE Trans. Inform. Theory, vol. IT-28, pp. 256-261, Mar. 1982.
[43] B. W. Lindgren, Statistical Theory. New York: Macmillan, 1976.


