
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 5, NO. 2, FEBRUARY 1996

A Very Low Bit Rate Video Coder Based on Vector Quantization

Luis Corte-Real, Associate Member, IEEE, and Artur Pimenta Alves, Member, IEEE

Abstract-This work describes a video coder based on a hybrid DPCM-vector quantization algorithm that is suited for bit rates ranging from 8 to 16 kb/s. The proposed approach involves segmenting difference images into variable-size and variable-shape blocks and performing segmentation and motion compensation simultaneously. The purpose of obtaining motion vectors for variable-size and variable-shape blocks is to improve the quality of motion estimation, particularly in those areas where the edges of moving objects are situated. For the larger blocks, decimation takes place in order to simplify vector quantization. For very active blocks, which are always of small dimension, a specific vector quantizer has been applied, the fuzzy classified vector quantizer (FCVQ). The coding algorithm described displays good performance in the compression of test sequences at the rates of 8 and 16 kb/s; the signal-to-noise ratios obtained are good in both cases. The complexity of the coder implementation is comparable to that of conventional hybrid coders, while the decoder is much simpler in this proposal.

I. INTRODUCTION

In the past few years there has been significant progress in video coding, both in research and standardization. CCITT and ISO standards have been established specifying high-efficiency coders for a wide range of bit rates, such as the CCITT H.261 recommendation and the ISO MPEG1 and MPEG2. Major research efforts are concentrated at present in two video coding areas, namely very low bit rate video coding, the area of this work, and high definition video coding.

Very low bit rate video coding is still an open research area where useful contributions are expected from such distinct areas as computer vision, fractal geometry, and pattern recognition. There are many practical applications for this kind of coder, ranging from video transmission over narrow-band channels, such as conventional telephone lines or mobile telephone networks, to multimedia applications used in databases and electronic mail.

Very low bit rate video coding requires a very high compression rate. In what follows, very low bit rate means a value less than or equal to 16 kb/s. Table I presents the average bits per pixel required to transmit video with a spatial resolution of 144 x 176 pixels through channels of 16 and 8 kb/s.

This paper describes a hybrid video compression algorithm for very low bit rates that uses DPCM and vector quantization [1]. DPCM is intended to reduce temporal redundancy, whereas spatial redundancy is reduced by vector quantization.

Manuscript received June 17, 1994; revised June 6, 1995.
The authors are with the Departamento de Engenharia Electrotécnica e de Computadores, Faculdade de Engenharia da Universidade do Porto/INESC, Porto, Portugal (e-mail: [email protected]).

Publisher Item Identifier S 1057-7149(96)01313-9.

TABLE I
AVERAGE BITS PER PIXEL FOR IMAGE SEQUENCES WITH 144 x 176 RESOLUTION

Images per second | 8 kbit/s | 16 kbit/s
5                 | 0.0632   | 0.1264
10                | 0.0316   | 0.0632

The vector quantizer plays the role attributed to the transform in conventional algorithms and is applied to the difference image segmented into blocks of variable size and shape.

The algorithm has been designed in order to obtain a coder with a low implementation complexity and a high degree of adaptivity. Adaptivity is a very important feature in very low bit rate video coding, as it allows an efficient management of the limited number of available bits. When the coder is adapted to the spatial and temporal variations of the image features, it is possible to improve the quality of the coded images for a constant channel bit rate.

In Section II, the basic algorithm is introduced. Section III describes image pre- and post-processing. Some complementary techniques of bit rate reduction are mentioned in Section IV. Color sequence coding is presented in Section V. Section VI discusses the problem of adaptation to a constant bit rate channel, and Section VII presents some simulation results.

II. BASIC ALGORITHM

The fundamental characteristics of the algorithm are the use of vector quantization and the segmentation of the difference images into blocks of variable size and shape. This segmentation leads to a more natural division of the difference image, allowing a careful treatment of the high detail areas.

As the implementation complexity of vector quantization increases exponentially with block size, decimation and interpolation techniques were applied to large-size blocks. In this way, it is possible to vector quantize large blocks with reduced complexity. The resulting distortion of the image quality is minimal because these blocks correspond to areas with low activity.

Fig. 1 describes the coder structure schematically.

A. Segmentation

The classical video and image compression algorithms are based on a prior division of the image into square blocks of fixed size and the independent coding of each block. The choice of block size constitutes one of the problems of this kind of algorithm.



Fig. 1. Coder (segmentation, decimation, vector quantization, and memory units).

TABLE II
TYPES OF BLOCKS

Type | Dimension
0    | 4 x 4
1    | 4 x 4
...  | ...
9    | 32 x 16
10   | 32 x 32

Large-size blocks allow a higher compression rate. This is achieved at the expense of an increase in implementation complexity and a degradation of the subjective quality of the coded image in higher activity regions. Small blocks, on the other hand, allow a better adaptation to the most active image areas at the expense of compression efficiency. Besides this tradeoff in the choice of block sizes, the division into blocks of fixed size fixes the location of the borders between blocks and may lead to an unnatural division of the image: the transitions between areas with different characteristics may not correspond to the block borders.

All the problems raised by the division of images into blocks of fixed size suggest the segmentation of the image into blocks of variable size and shape, providing a new degree of adaptivity to the coder. Images will be divided into large-size blocks in areas of low activity and into small-size blocks in high activity areas. It is possible in this way to give special treatment to high-detail areas, assigning them a larger number of bits per pixel and making the division more natural. In short, coders using variable-size and variable-shape blocks adjust themselves better to the nonstationary character of images.

The segmentation into variable-size and variable-shape blocks requires the generation of overhead bits in order to allow the decoder to recognize and locate each block. The more structured the segmentation is, i.e., the more constraints are imposed on the position and shape of blocks, the smaller the number of overhead bits will be. Another important issue in the choice of the segmentation algorithm is the complexity of implementation. A decrease in structuring entails an increase in the complexity of the algorithm. The acceptable complexity is application-dependent; it is relatively low in video coding due to the requirement of real-time processing.

Fig. 2. Segmented difference image (type 0 blocks are shaded).

1) Multiple-Step Nonstructured Segmentation: In order to adapt the segmentation process to the features of the image, several algorithms have been presented [1], [2] based on the division of the image into blocks of variable size and shape. These algorithms, unlike algorithms based on quadtrees, do not impose a priori constraints on the position and shape of the blocks, allowing the use of a larger set of block types [1]. Nevertheless, in the simulations of Section VII, only square and rectangular blocks have been used in order to reduce implementation complexity. Table II presents the types of blocks used.

The image is initially divided into blocks. These blocks are taken as the elementary unit. For each 4 x 4 block an activity measure is calculated. In this coder, the activity measure is the block variance, estimated as follows:

$$ a(i, r, c) = \frac{1}{R_i C_i} \sum_{k=0}^{R_i - 1} \sum_{l=0}^{C_i - 1} \left[ x(r + k, c + l) - \mu(i, r, c) \right]^2 $$

where a(i, r, c) is the variance of a block of type i whose top left pixel is placed at (r, c), μ(i, r, c) is the mean intensity of that block, R_i is the number of rows of the blocks of type i, C_i is the number of columns of the blocks of type i, and x(k, l) is the pixel intensity at position (k, l). This activity measure has been judged adequate for the task of identifying the higher detail areas, allowing the segmentation of the image into blocks with uniform features. The 4 x 4 blocks whose activity surpasses a pre-established threshold are classified as type 0, and all other blocks are classified as type 1. The latter can be aggregated in order to create larger blocks, as long as their activity measure remains below the threshold. The aggregation procedure is first applied to create as many blocks as possible of the largest size, decreasing the block size at each step.
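As a concrete illustration, the activity measure can be computed as a plain block variance; in the following NumPy sketch the threshold value and the function names are our own illustrative choices, not taken from the paper.

    import numpy as np

    def block_variance(image, r, c, rows, cols):
        # a(i, r, c): variance of the block whose top-left pixel is at (r, c).
        block = image[r:r + rows, c:c + cols].astype(np.float64)
        return float(block.var())

    def classify_elementary_blocks(diff_image, threshold=100.0):
        # True marks a type 0 (high activity) 4 x 4 block, False a type 1 block.
        H, W = diff_image.shape
        active = np.zeros((H // 4, W // 4), dtype=bool)
        for r in range(0, (H // 4) * 4, 4):
            for c in range(0, (W // 4) * 4, 4):
                active[r // 4, c // 4] = block_variance(diff_image, r, c, 4, 4) > threshold
        return active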


Fig. 3. Decimation (initial and final block for each block type).

The algorithm can be described as follows:

    initialize segmentation
    for type := N downto 1 do
        for row := 1 to row dimension do
            for column := 1 to column dimension do
                if f(type, row, column) = 1 then
                    assign block(type, row, column);

where

$$ f(i, r, c) = \begin{cases} 1 & \text{if it is possible to place a block of type } i \text{ at } (r, c) \\ 0 & \text{otherwise.} \end{cases} $$

Fig. 2 presents the result of a difference image segmentation of the “Miss America” sequence.
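A minimal sketch of the greedy placement loop just described is given below. It reuses the block_variance and classify_elementary_blocks helpers from the previous sketch and, to stay short, uses only square block types measured in 4 x 4 units; the full coder also uses the rectangular shapes of Table II.

    import numpy as np

    # Illustrative square types only, in 4 x 4 units, largest first: 16x16, 8x8, 4x4.
    TYPES = {3: (4, 4), 2: (2, 2), 1: (1, 1)}

    def segment(diff_image, threshold=100.0):
        active = classify_elementary_blocks(diff_image, threshold)
        taken = active.copy()                      # type 0 cells are coded on their own
        segments = [(0, int(r), int(c)) for r, c in zip(*np.nonzero(active))]
        for t in sorted(TYPES, reverse=True):      # try the largest blocks first
            br, bc = TYPES[t]
            for r in range(active.shape[0] - br + 1):
                for c in range(active.shape[1] - bc + 1):
                    region = taken[r:r + br, c:c + bc]
                    free = not region.any()        # is f(type, row, column) = 1?
                    quiet = block_variance(diff_image, 4 * r, 4 * c, 4 * br, 4 * bc) <= threshold
                    if free and quiet:
                        region[:] = True           # assign block(type, row, column)
                        segments.append((t, r, c))
        return segments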

2) Techniques for Reducing Overhead Bits: While segmentation based on quadtrees allows a simple and compact description of the image division, nonstructured segmentation requires the use of variable-length codes in order to reduce the overhead bits. Coding of the segmentation description is performed by going through the image in a preestablished order, such as from top to bottom and from left to right, and sending a code whenever the beginning of a block is found (by convention, the upper left corner). To code each type of block it is necessary to define a model of the segmentation unit as a “source” of blocks.

Fig. 4. Motion compensation with variable-size and variable-shape blocks (type 0 blocks are shaded).

The simplest way to model the segmentation unit is to view it as a stationary source of independent blocks, characterized by the probabilities of occurrence of each type of block. The variable-length coder will be designed based on these probabilities. In practice, the probabilities of occurrence for each type of block have to be estimated from the relative frequency of occurrence of each type of block in a training sequence. The quality of the probability estimation strongly influences the performance of the variable-length coder.

The described model represents reality imperfectly as it ignores the dependencies among blocks. This model can be significantly improved by using higher order statistics, namely conditional probabilities of order n. In this case, the source would be modeled by the probability P(x_i | y), where y = m(x_{i-1}, ..., x_{i-n}). This way it is possible to account for the influence of a certain context on future events. The function m only considers blocks that occupy the area of the block to be coded in the previous image. In order to code, at instant t, a block placed in position (x, y), the function m calculates the integer that is closest to the average of the block types, weighted by the occupied areas. To each possible value of m (for the types in Table II, m varies between zero and ten) corresponds a table of probabilities of occurrence for each block type, allowing the design of the respective variable-length code. This coder can be seen as a state machine with a variable-length coder associated with each state.
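One way to realize the context function m and the per-context code selection is sketched below; the per-pixel type map, the table container, and all names are illustrative assumptions rather than the authors' implementation.

    import numpy as np

    def context_m(prev_type_map, r, c, rows, cols):
        # Area-weighted average of the block types that covered this block's area
        # in the previous image, rounded to the nearest integer (a state in 0..10).
        region = prev_type_map[r:r + rows, c:c + cols]    # per-pixel type labels
        return int(round(float(region.mean())))

    def code_block_type(block_type, state, vlc_tables):
        # vlc_tables[state] maps a block type to its variable-length codeword,
        # e.g. a Huffman code designed from the training statistics of that state.
        return vlc_tables[state][block_type]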

B. Block Coding

1) Decimation and Interpolation of Blocks: Decimation is applied to the block to be coded to reduce the complexity of vector quantization for large-size blocks. All square blocks are transformed into 4 x 4 blocks and all rectangular blocks are transformed into 4 x 8 blocks, as illustrated in Fig. 3. For blocks of types 3, 6, and 9 (see Table II), where the number of rows is greater than the number of columns, a rotation of 90° is executed prior to coding. This procedure allows uniform processing of rectangular blocks. The decimation operation is preceded by a lowpass filtering to avoid aliasing. After coding, a linear interpolation is executed in order to recover the original size of the blocks.



Fig. 5. Activity threshold versus buffer level.

As the decoder knows the type of each block, interpolation is performed without transmitting additional information. The decimation and the interpolation are executed hierarchically, i.e., block size is repeatedly divided by two in the decimation case, or multiplied by two in the interpolation case, until the correct value is reached. This approach standardizes the decimation and interpolation procedures.
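A rough sketch of the hierarchical decimation and interpolation follows; the 2 x 2 averaging lowpass and the replication upsampler are simplified stand-ins for the anti-aliasing filter and the linear interpolation referred to above.

    import numpy as np

    def decimate_once(block):
        # Halve both dimensions; the 2 x 2 mean acts as a crude anti-alias lowpass.
        h, w = block.shape
        return block.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

    def decimate_to(block, target_shape):
        # Types 3, 6, and 9 (more rows than columns) are rotated before coding;
        # a transpose stands in for the 90 degree rotation here. The target shape
        # must be reachable by repeated halving (e.g. 32 x 32 -> 4 x 4).
        block = block.astype(np.float64)
        if block.shape[0] > block.shape[1]:
            block = block.T
        while block.shape != tuple(target_shape):
            block = decimate_once(block)
        return block

    def interpolate_once(block):
        # Double both dimensions by replication; the coder uses linear interpolation.
        return np.repeat(np.repeat(block, 2, axis=0), 2, axis=1)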

2) Vector Quantization of Blocks: After decimation, blocks are reduced to three classes: class 0 comprises the blocks of type 0, and classes 1 and 2 include the remaining square and rectangular blocks, respectively. Each class is associated with a vector quantizer and two codebooks, one used to code original blocks and the other for difference blocks. The blocks of class 0 are coded using the FCVQ (fuzzy classified vector quantizer) coder [3], while blocks of the other classes are coded by conventional vector quantizers. The FCVQ is based on fuzzy set theory and consists basically of a method of extracting a subcodebook from the original codebook, biased by the features of the block to be coded. The incidence of each feature on the blocks is represented by a fuzzy set that captures its (possibly subjective) nature. Unlike the classified vector quantizer (CVQ), in the FCVQ a specific subcodebook is extracted for each block to be coded, allowing a better adaptation to the block. CVQ may be regarded as a special case of FCVQ.

The overhead information provided by the segmentation unit is used to select the correct vector quantizer. The same information allows the choice of the codebook in the decoding process, both in the coder and in the decoder.
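For the conventional quantizers of classes 1 and 2, a plain full search over the selected codebook suffices. The sketch below shows such a search; the FCVQ used for class 0 additionally extracts a block-dependent subcodebook, which is not reproduced here.

    import numpy as np

    def vq_encode(block, codebook):
        # codebook: (N, dim) array of codewords; returns the index of the
        # codeword with the smallest mean squared error to the block.
        v = block.reshape(-1).astype(np.float64)
        errors = ((codebook - v) ** 2).mean(axis=1)
        return int(np.argmin(errors))

    def vq_decode(index, codebook, shape):
        # Decoding is a single table lookup, which is one reason the decoder
        # is much simpler than the coder.
        return codebook[index].reshape(shape)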

3) Intra- and Inter-Image Modes: Coding difference images allows an increase in the compression rate as it exploits the existing redundancy in consecutive images. Nevertheless, in areas of high motion the correlation between images tends to be reduced. There are two ways to overcome this problem: (a) using motion compensation; and (b) direct coding of the image, i.e., coding without calculating the difference image. The latter will be called intraimage mode, as opposed to interimage mode, which corresponds to the basic functioning of the coder. The two techniques are complementary, motion compensation being applied in areas of moderate motion and intraimage mode to the regions of high motion and to scene cuts. As the high-motion areas are automatically identified by the segmentation unit as blocks of class 0, the intraimage mode is only applied to such blocks.

Fig. 6. (a) Peak SNR and (b) bit rate for the monochrome “Miss America” sequence at 16 and 8 kb/s (thin lines).

After blocks of class 0 are identified during the segmentation of the difference image, the original blocks are coded directly instead of the difference blocks. To reduce the dynamic range of the class 0 blocks when they are coded, the coder uses the difference between the block and an estimate of its average. The estimate of the average is the average of the already coded neighbor blocks in the difference image. The same estimate can be obtained independently in the decoder.

C. Motion Compensation

Motion estimation techniques allow good quality predictions of the new image in areas of moderate motion. Motion compensation results are only relevant to areas coded using interimage mode.

The block matching technique was selected for robustness and stability reasons. The implementation of the motion estimator implies the choice of two parameters: block size and search window size.

In the block matching technique, the same displacement vector is assigned to all the pixels of each block, and therefore the quality of the estimation is conditioned by block size. Large blocks may include pixels whose real displacement is very different from the one represented by the calculated displacement vector.


It is advisable for this reason to use blocks of reduced size. The use of small blocks, however, generates a higher number of displacement vectors. As this technique requires the transmission of overhead bits to code the displacement vectors, very small blocks should not be used. If the number of displacement vectors is too high, the gain obtained with motion compensation can be nullified by the need to transmit overhead bits.

The size of the search window determines the capacity of the motion estimator to deal with large displacements, and influences the number of overhead bits. The search window must be large enough to include the larger displacements and small enough to allow the coding of the displacement vectors with a reduced number of bits.
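A full-search block-matching sketch is given below for reference; the search range and the sum-of-absolute-differences criterion are common choices assumed here, since the paper does not fix them.

    import numpy as np

    def block_match(prev, curr, r, c, rows, cols, search=7):
        # Find the displacement (dy, dx) within +/- search pixels that minimizes
        # the sum of absolute differences between the current block and the
        # displaced block in the previous image.
        H, W = prev.shape
        target = curr[r:r + rows, c:c + cols].astype(np.int32)
        best, best_vec = None, (0, 0)
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                rr, cc = r + dy, c + dx
                if rr < 0 or cc < 0 or rr + rows > H or cc + cols > W:
                    continue                      # candidate falls outside the image
                cand = prev[rr:rr + rows, cc:cc + cols].astype(np.int32)
                sad = int(np.abs(target - cand).sum())
                if best is None or sad < best:
                    best, best_vec = sad, (dy, dx)
        return best_vec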

1) Motion Compensation Using Variable-Size and Variable-Shape Blocks: As the use of fixed-size blocks does not assure the homogeneity of the areas from the motion point of view, motion compensation with variable size and variable shape appears as an interesting alternative [4]. This approach increases the adaptivity of the coder and allows motion estimates much closer to reality. In the coder under consideration, motion compensation with blocks of variable size and shape is associated with the segmentation operation, as illustrated in Fig. 1. The segmentation unit receives as inputs the most recently coded image and the new image. It tries to use large blocks in the segmentation of the difference image. If it is unable to assign a block to a certain position due to an excess of activity, it uses motion compensation to reduce the activity. In case the motion compensator is not capable of reducing the activity below the threshold, smaller blocks must be used.

As segmentation is related to image content, the displacement vectors tend to give a better approximation to the real displacements (see Fig. 4). The joint operation of segmentation and motion compensation significantly reduces the number of blocks. This reduction entails a reduction in the number of coding bits and in the number of segmentation description bits.

In the simulations that were carried out, this method generated a larger number of displacement vectors, which involves a larger number of overhead bits. Nevertheless, the total number of bits is less than that obtained for motion compensation with fixed-size blocks. For an equal number of coding bits, motion compensation using variable-size and variable-shape blocks results in a slight advantage in SNR.

2) Displacement Vector Coding: Motion compensation by block matching involves the transmission of the displacement vectors to the decoder. This is required to maintain coherence between the coding and decoding operations. The efficiency in coding the displacement vectors determines the effective gain obtained with motion compensation. As some blocks have zero displacement, blocks with or without displacement are distinguished a priori. The coding of displacement vectors is, therefore, structured as follows: a) distinction between blocks with or without displacement; b) coding of the nonzero displacement vectors. Each of these operations generates overhead bits. The problem of reducing the number of bits for distinguishing between blocks with and without displacement is similar to the coding of a binary image.

Fig. 7. (a) Peak SNR and (b) bit rate for the monochrome “Claire” sequence at 16 and 8 kb/s (thin lines).

In this case, we used a run-length coder. The length of each run is coded using an arithmetic coder.

In the transmission of the displacement vectors, the difference between their components and the estimate of the components is coded. The displacement vector of the previous block is taken as the estimate of the displacement vector, taking advantage of the correlation between the displacement of one block and the displacement of the neighboring blocks.
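The two-step structure just described can be sketched as follows; the run-length and arithmetic coding stages are only hinted at, and the scan order and names are illustrative.

    def code_displacement_vectors(vectors):
        # vectors: list of (dy, dx) per block, in coding order; (0, 0) means no motion.
        flags = [1 if v != (0, 0) else 0 for v in vectors]   # run-length + arithmetic coded
        diffs, prev = [], (0, 0)
        for v in vectors:
            if v == (0, 0):
                continue
            # Send the difference to the previous block's vector, which serves as
            # the estimate, exploiting the correlation between neighboring blocks.
            diffs.append((v[0] - prev[0], v[1] - prev[1]))
            prev = v
        return flags, diffs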

III. PRE- AND POSTPROCESSING

When coding with high compression rates, distortion is frequently visible in the reconstructed images. Certain kinds of defects can be attenuated by the use of filters at the coder input, in the coder (and in the decoder), or at the decoder output. This filtering, in certain situations, can also improve coder performance. It is possible, for example, to limit the temporal or spatial activity of the images to reduce the number of coding bits.

A. Preprocessing

The use of preprocessing (filter A of Fig. 1) has the goal of reducing the temporal activity of the image sequences in order to limit the number of coding bits.


Fig. 8. Number of coding bits for (a) “Miss America” and (b) “Claire” sequences (from bottom to top: vector quantization bits, auxiliary coding bits, segmentation bits, and motion compensation bits).

The use of this kind of processing has been referred to in the literature as temporal recursive filtering [5]. Chen and Hein propose the following filter:

$$ g_n(i, j) = a f_n(i, j) + (1 - a) g_{n-1}(i, j) $$

where g_n(i, j) is the pixel in position (i, j) of the filtered image at instant n, f_n(i, j) is the corresponding pixel of the original image at instant n, and a is a filter coefficient with a value in the range 0 to 1.

The application of the filter to the image sequences obviously introduces some distortion. In terms of statistical parameters, this filter has the following effects [5]:

• variance reduction, visible in the reduction of details in motion areas;
• increase of the correlation coefficient, allowing a reduction in the number of coding bits;
• reduction of the square error between consecutive images.

If the coefficient a is not too low, the distortion introduced does not entail a great loss in the subjective quality of the image. In fact, this filter only affects the motion areas, decreasing the detail and introducing the “track” effect. However, as human vision is less sensitive to details in motion areas, the introduced filter distortion is partially masked.
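With the filter written in the form given above, the preprocessing stage can be sketched as follows; the value of a is illustrative.

    import numpy as np

    def temporal_recursive_filter(frames, a=0.75):
        # g_n = a * f_n + (1 - a) * g_{n-1}; a larger a means weaker filtering.
        filtered, prev = [], None
        for f in frames:
            f = f.astype(np.float64)
            g = f if prev is None else a * f + (1.0 - a) * prev
            filtered.append(g)
            prev = g
        return filtered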

B. Postprocessing

The visible defects in rebuilt images can be attenuated by filtering. For the coder being described, the most common defects are
• visibility of the block division;
• bad definition of the edges;
• high-frequency noise.
A great variety of filters intended to attenuate image noise without degrading the edges has been referenced in the literature [6]. Simulations with several filters in several positions were carried out in order to select the postprocessing filter. The selected filter is the one proposed by Lee [7], placed in the position of filter B in Fig. 1. Strictly speaking, a postprocessing filter should be placed at the output of the decoder, but filter B simultaneously improves the quality of the decoded images and reduces the noise in the coder loop. The chosen filter is applied to an M x N window of pixels centered on the pixel to be filtered, and is defined in the following manner:

$$ \hat{f}_n(k, l) = m(k, l) + \frac{v(k, l) - \sigma_p^2}{v(k, l)} \left[ f_n(k, l) - m(k, l) \right] $$

where f_n(i, j) is the pixel at position (i, j) of the original image at instant n, σ_p is the noise standard deviation,

$$ m(k, l) = \frac{1}{(N + 1)(M + 1)} \sum_{i = k - N/2}^{k + N/2} \sum_{j = l - M/2}^{l + M/2} f_n(i, j) $$

and

$$ v(k, l) = \frac{1}{(N + 1)(M + 1)} \sum_{i = k - N/2}^{k + N/2} \sum_{j = l - M/2}^{l + M/2} \left[ f_n(i, j) - m(k, l) \right]^2. $$

In the implementation of this filter, σ_p has been estimated for each type of block using a training sequence.
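A direct, unoptimized sketch of this local-statistics filter is given below; the window size and the clamping of the gain to the range [0, 1] are our own choices.

    import numpy as np

    def lee_filter(image, sigma_p, N=2, M=2):
        # (N + 1) x (M + 1) window centered on each pixel, as in the formulas above.
        img = image.astype(np.float64)
        H, W = img.shape
        out = img.copy()
        for k in range(N // 2, H - N // 2):
            for l in range(M // 2, W - M // 2):
                win = img[k - N // 2:k + N // 2 + 1, l - M // 2:l + M // 2 + 1]
                m = win.mean()
                v = win.var()
                gain = 0.0 if v == 0 else max(0.0, min(1.0, (v - sigma_p ** 2) / v))
                out[k, l] = m + gain * (img[k, l] - m)
        return out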

IV. COMPLEMENTARY TECHNIQUES FOR BIT RATE REDUCTION

A. “Code-Not Code” Commutation

The areas in the difference image with reduced spatial activity correspond to areas of low temporal activity. For these areas, the difference image carries little information and, considering the coder characteristics, part of the bit rate will be used for coding noise. If these areas are identified, it is possible to reduce this bit rate without a significant increase in the degradation of the coded image. The identification and classification of the areas with low motion can be regarded as an upper level of segmentation, whereby the image is distinguished from the background. In this context, the background is an image area that stays unchanged for a certain time period, and not just what is usually considered background.



Fig. 9. Peak SNR for the color (a) “Miss America” and (b) “Claire” sequences at 16 kb/s (U thin line and V dash line).

This option of not coding areas with reduced or null motion allows a reduction in the number of coding bits and also increases the stability of the background. In fact, coding noise and small variations in scene illumination can produce alterations in the background areas of the coded image. These alterations harm the subjective quality of the coded sequence.

Based on the segmented difference image, the blocks are classified as “to be coded” or “not to be coded” according to their activity. Whenever the activity of a block is less than a predefined threshold, the coder treats the corresponding image area as unchanged and does not code it. The threshold should be low enough not to give rise to an excessive degradation of the image quality and high enough to allow a significant reduction in the number of coding bits.

Simulations carried out using the thresholds 10, 20, and 30 have shown that a threshold of 30 causes a high degradation in the coded image quality. The ideal value should be situated in the range 10-20 according to the desired quality and available channel.

Block classification as “to be coded” or “not to be coded” implies the transmission of overhead bits to the decoder. The problem is similar to the distinction between blocks with and without motion compensation mentioned in Section III-B.


Fig. 10. Peak SNR for the color (a) “Miss America” and (b) “Claire” sequences at 8 kb/s (U thin line and V dash line).

Run-length coding is also used in this case, and the runs are coded by an arithmetic coder.

B. Reduction of the Average Size of Codebook Addresses

The bit rate generated by a vector quantizer depends directly on the average size of the codebook addresses. In most vector quantizer implementations, the codebook address has a fixed size equal to log_2 N, where N is the codebook size. The transmitted address is simply the order number, in binary, of the chosen block in the codebook.

In [3] a parameter estimator was presented, allowing a reduction of the number of bits generated by the FCVQ. The estimate of the block parameters is based on the already coded neighbor blocks and provides a sorting criterion for the codebook blocks according to their similarity with respect to the estimate. The transmitted address is no longer the absolute address of the chosen block in the codebook, but rather its order number in the list of blocks similar to the estimate. This order number is coded using variable-length codes. The average size of the codebook address depends, in this case, on estimate quality. This technique has been adopted to reduce the bit rate due to the blocks of type 0.


As the blocks of classes 1 and 2 have reduced activity, it is possible to use a simpler implementation technique based on an estimate with a single parameter without compromising the results. As the average is a parameter that is easy to obtain and has demonstrated a good discriminative capacity in the simulations, we have implemented an average estimator aimed at reducing the bit rate generated in the coding of blocks of classes 1 and 2. When coding a block of class 1 or 2, an estimate of the average is computed based on the neighboring areas, which are already coded. This estimate is simply the average intensity of the pixels in the area considered. The corresponding codebooks are previously sorted by block average. The search in the codebook starts at the block whose average is nearest to the estimate, and the codebook is searched in increasing order of the absolute value of the difference between the estimate and the block average. The search terminates when all blocks in the codebook have been tested. The decoder receives the “distance,” in number of blocks, between the block with the average closest to the estimate and the chosen block. This “distance” is coded by a variable-length coder. If a stopping criterion is introduced in the codebook search, allowing, in certain cases, the search of the entire codebook to be avoided, then it is possible to reduce the coding complexity and increase the efficiency of the variable-length codes. A simple criterion consists in admitting that, if the mean square error of a certain block is below a predefined threshold, then the block is an acceptable choice. This procedure restricts the search area in the codebook, and small “distances” become more probable. Here, the square error threshold plays a role similar to that of parameter a of the FCVQ with estimator. The estimate of the average is computed using the neighboring area of the block to be coded in the current coded image.
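The mean-sorted search and the rank ("distance") that is actually transmitted can be sketched as follows; the optional early-stop MSE threshold corresponds to the stopping criterion discussed above, and all names are illustrative.

    import numpy as np

    def code_with_mean_estimate(block, codebook, codeword_means, mean_estimate,
                                stop_mse=None):
        # Codewords are visited in increasing |codeword mean - estimate|; the
        # transmitted "distance" is the visiting rank of the chosen codeword.
        v = block.reshape(-1).astype(np.float64)
        order = np.argsort(np.abs(codeword_means - mean_estimate))
        best_rank, best_mse = 0, None
        for rank, idx in enumerate(order):
            mse = float(((codebook[idx] - v) ** 2).mean())
            if best_mse is None or mse < best_mse:
                best_rank, best_mse = rank, mse
            if stop_mse is not None and mse <= stop_mse:
                break          # early stop keeps small ranks more probable
        return best_rank       # best_rank is then coded with a variable-length code

The decoder, given the same estimate, rebuilds the same visiting order and recovers the chosen codeword as the one at position best_rank.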

V. COLOR SEQUENCE CODING

A simple way to code color sequences is to implement three identical coders and code each one of the three components separately. The color sequence would be decomposed into three independent monochromatic sequences and a coder applied to each one. According to information theory, the compression rate increases with vector size for a constant distortion, so it is advisable from this point of view to code the three components together. Each vector to be coded would include elements from each of the components, and each codebook would be used to code a color block. This approach forces the codebooks to include vectors capable of approximating all the colors present in the sequence to be coded.

When coding for low bit rates, the codebooks have small size, and therefore the robustness of the joint coding of the three components is low [8]. If we have available an optimum vector quantizer based on a codebook of size N and replace it with three optimum vector quantizers based on three codebooks of size N_1, N_2, and N_3, where N = N_1 + N_2 + N_3, we obtain a coder that is equivalent to a suboptimum vector quantizer based on a codebook of size M = N_1 x N_2 x N_3. The separate coding allows the reproduction of a larger number

of colors and is not so sensitive to the quality of the training sequence used to generate the codebooks.

The color representation by one luminance and two chrominance components (YUV) weakens the correlation between components and concentrates most of the energy in Y. This approach reduces the disadvantage of separate coding and allows subsampling of the U and V components due to their limited bandwidth. In the extension of the coder just described to color sequences, the components Y, U, and V were used. Components U and V were sampled with half the sampling frequency used for Y. For a 144 x 176 image, the components U and V are represented by 72 x 88 images. Under these circumstances, introducing color corresponds to a 50% increase in the area to be coded.

Although we have coded luminance and chrominance separately, it is possible to take advantage of the results of the previous coding of the luminance. The results from the segmentation and motion compensation obtained in luminance coding can be used in chrominance coding.

Chrominance images can be independently segmented or can be subject to luminance segmentation with the required scale adjustments. In the second case, it is not necessary to transmit overhead bits for the segmentation description; however, as the number of blocks is the same as for luminance and the area of each chrominance image is one-fourth that of luminance, an excessive number of coding bits is generated. Although the size of the codebooks for luminance can be reduced, the gain in overhead bits is largely compensated by the loss in coding bits. The loss in block coding is due to reduction in efficiency of vector quantization as a result of the decrease in vector size. For the reasons indicated, the option has been to segment separately the two chrominance images.

Since the displacement vectors calculated for the luminance faithfully reflect the effect of motion on the chrominance, it is unnecessary to apply motion compensation to the chrominance images. The construction of the chrominance difference images is based on the displacement vectors obtained for the luminance, taking into account the necessary scale adjustment. This reduces the implementation complexity of the coder and avoids the transmission of supplementary displacement vectors. Because the chrominance and luminance images are segmented separately, it is not possible to simultaneously perform motion compensation and segmentation for the chrominance image.

VI. ADAPTATION TO A CONSTANT BIT RATE CHANNEL

Video coders, due to image features, act as variable bit rate sources. By design, coders try to keep the image quality constant at the expense of adjusting the bit rate. Since the available channels are, in general, fixed bit rate channels, it is necessary to introduce adjustment mechanisms.

The bit rate control mechanism is based on two distinct but complementary capacities of the coder: the capacity to temporarily store bits in excess and the capacity to control the number of generated bits. The coder must generate an average bit rate less than or equal to the bit rate of the available channel.


Fig. 11. (a) and (b) Image number 72 from the “Miss America” and “Claire” sequences coded for 16 kb/s, and (c) and (d) for 8 kb/s.

If this is not the case, either indispensable information is lost during the transmission or the required storage capacity tends to infinity. The possibility of temporarily storing the bits in excess is not intended to simulate, with respect to the coder, a channel with a higher bit rate; its role is to adjust the bit rate generated by the coder, absorbing the peaks and using the periods of low bit rate. The size of the buffer decisively influences the delay between the occurrence of the original images and their presentation by the decoder. For these reasons, the buffer must be large enough to smooth the bit rate peaks and small enough not to introduce an excessive delay.

Having introduced a storage capacity in the coder, it is necessary to complement it with the ability to control the generated bit rate. At a specific instant, the degree of buffer occupation is used to compute an estimate of the bit rate that can be generated in the near future. It is therefore possible to “authorize” the coder to generate more bits when transmission limitations are small or nonexistent, and to impose additional restrictions when the number of stored bits is high. The management of this feedback system is complex due to the nonlinear character of the coder. In practice, heuristics were used, tuned by simulation and by real-world experience.

The main parameter for bit rate control is the activity threshold used by the segmentation unit. The variation of this threshold in the same image changes the number of blocks generated by the segmentation, which directly influences the number of bits used in coding, in segmentation description, and in coding the displacement vectors. The bit rate generated by

the coder decreases with an increase in the activity threshold. The bit rate control mechanism based on the activity threshold varies its value between a lower bound and an upper bound, according to the buffer occupation level. The lower bound is intended to prevent the activity threshold from assuming excessively low values that would induce abrupt increases in the buffer level. The upper bound avoids excessive degradation of the image quality. If the buffer occupation is less than M_a%, the minimum value of the threshold, l_min, is used, and if the buffer occupation is more than M_b% (M_a < M_b), the maximum value, l_max, is used. For intermediate occupation rates, the threshold is interpolated between l_min and l_max.

Fig. 5 illustrates the variation of the activity threshold with the occupation of the buffer.
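A sketch of this control law, assuming a simple linear transition between the two bounds (one natural reading of Fig. 5), is given below.

    def activity_threshold(buffer_pct, Ma, Mb, l_min, l_max):
        # Below Ma% occupation use l_min, above Mb% use l_max; in between,
        # a linear transition is assumed here.
        if buffer_pct <= Ma:
            return l_min
        if buffer_pct >= Mb:
            return l_max
        frac = (buffer_pct - Ma) / float(Mb - Ma)
        return l_min + frac * (l_max - l_min)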

The bit rate generated by the coder can also be influenced by methods other than the activity threshold control. These methods are particularly useful in situations of high buffer occupancy. The following complementary techniques to control the bit rate have been used in the implementation of the adjustment mechanisms for a constant bit rate channel:

• variation of the a parameter of the preprocessing filter;
• application of segmentation and motion compensation to the difference image without coding noise;
• variation of the size of the codebooks.


Two levels of buffer occupation were defined to control the a parameter of the preprocessing filter. These levels define three areas to which three values of a are associated. Such values should decrease with the increase in buffer occupation.

The second complementary bit rate control technique consists of applying segmentation and motion compensation to the difference between uncoded images instead of using the rebuilt image. When motion compensation is performed using the previously coded image, not only are the effects of movement reduced but some coding noise is also canceled. If the uncoded image is used, the total number of blocks is reduced at the cost of degradation in the coded image. The coder switches to this operating mode whenever the buffer occupation level goes beyond a predefined threshold.

After segmentation, it is necessary to select the codebooks for vector quantization. The selection is based on the buffer occupancy level after segmentation, taking into account all the overhead bits and the number of blocks of each type to transmit. Codebooks of several sizes are assigned to blocks of type 0, to blocks of type 1, and to blocks of types 2 and 3. For all other blocks the codebooks have a fixed size.

When the smaller codebook is used and the free buffer memory does not allow storage of the coding bits originated by the blocks of type 0, the image will not be transmitted. In this case, only a small header is transmitted to inform the decoder. If it is possible to transmit the coding bits for blocks of type 0, the coder checks whether it can also transmit the bits corresponding to other blocks, always using the smaller codebooks. If it is not possible to transmit the bits corresponding to all other blocks, the image is partially coded. In this situation, image update is only performed in areas of higher motion.

Whenever free buffer memory allows the transmission of the whole image, the coder enters a second codebook selection phase, maximizing codebook size. Codebook size for type 0 blocks is maximized in the first place. Codebook selection terminates with the joint maximization of the size of the codebooks for blocks of types 1, 2, and 3.

VII. SIMULATION RESULTS

The coder has been used for simulations with the “Miss America” and “Claire” sequences and channels of 16 and 8 kb/s. The sequences have a spatial resolution of 144 x 176 and a temporal resolution of ten images per second. The “Miss America” sequence used has 32 images and was run four times in alternate ways to artificially turn it into a 128-image sequence. The 64 images of the “Claire” sequence were run twice for similar reasons.

The codebooks were generated in two phases. Initially, codebooks were generated with the LBG algorithm for all types of blocks using a set of images taken from training sequences. In a second phase, using the initial tables, the coder has been applied to the training sequences, and each block to be coded was stored with a pointer to the vector used to code it. In the following step, each vector in each codebook has been replaced by the centroid of the blocks that point to it. With the new codebooks, the process has been

repeated iteratively until the reduction of the total square error goes below a predefined threshold. This process accounts for an adjustment of the codebooks to the operating conditions of the coder. This algorithm is an extension of the LBG algorithm to predictive vector quantizers. In applications like video surveillance, where frequent periods of reduced activity occur, this algorithm can be used to update the codebooks in real time, using the available bandwidth to transmit codebook innovations to the decoder.
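The second, coder-in-the-loop phase of this codebook training can be sketched as follows; the convergence test, the iteration cap, and the run_coder_on_training_set callback are illustrative assumptions.

    import numpy as np

    def refine_codebook(codebook, run_coder_on_training_set, tol=1e-3, max_iter=20):
        # run_coder_on_training_set(codebook) is assumed to run the full coder on
        # the training sequences and return, for every coded block, the block and
        # the index of the codeword that was used for it.
        codebook = codebook.astype(np.float64)
        prev_error = None
        for _ in range(max_iter):
            pairs = run_coder_on_training_set(codebook)
            total_error = 0.0
            sums = np.zeros_like(codebook)
            counts = np.zeros(len(codebook), dtype=np.int64)
            for block, idx in pairs:
                v = block.reshape(-1).astype(np.float64)
                total_error += float(((codebook[idx] - v) ** 2).sum())
                sums[idx] += v
                counts[idx] += 1
            for i in range(len(codebook)):
                if counts[i] > 0:                 # replace the codeword by the centroid
                    codebook[i] = sums[i] / counts[i]
            if prev_error is not None and prev_error - total_error < tol * prev_error:
                break                             # total square error no longer drops much
            prev_error = total_error
        return codebook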

Figs. 6 and 7 present the peak signal-to-noise ratio and the number of bits per pixel obtained for the two coded sequences (monochrome) at the constant rates of 16 and 8 kb/s (thin line). Codebooks with 1024, 512, and 256 elements have been used for type 0 blocks. Blocks of types 1, 2, and 3 have been coded using codebooks with 512, 256, and 128 elements. All other blocks have been coded with 64-element codebooks. The coder displays a stable behavior, although, as would be expected, it presents more noticeable fluctuations in image quality when coding for 8 kb/s. The relative weight of the bits generated in block coding, segmentation, and motion compensation is shown in Fig. 8. The bits generated in block coding account for about half of the total number of bits. The numbers of bits used for segmentation description and for motion compensation are approximately the same.

Applying the coder to the color versions of the “Miss America” and “Claire” sequences, the peak SNR of the luminance decreases by about 1 dB. The results presented in Figs. 9 and 10 have been obtained using a codebook of 128 elements for type 0 blocks of chrominance and a codebook of 64 elements for the other blocks of chrominance. Due to the masking effect introduced by color, the subjective quality of the coded images is not significantly affected. Since the coder gives higher priority to luminance, at 8 kb/s the chrominance peak SNR is clearly affected.

As published results are scarce in this area, and those that are widely known correspond to sequences with a temporal resolution different from ours, comparisons are hard to make. The signal-to-noise ratio curves obtained allow us to claim that this coder produces good-quality results. The coded images, although presenting some visible defects (particularly for 8 kb/s), have a quality that is satisfactory for a large number of applications. The images of Fig. 11 illustrate the results of the four simulations mentioned for monochrome sequences.

Finally, in order to evaluate the performance of the bit rate control mechanism, Fig. 12 presents the buffer level during the coding of the two monochrome sequences for the two bit rates considered. It can be verified that the level of buffer occupation is high, especially in the 8 kb/s case, which may be intolerable for some applications. In the presented simulations, the average level of buffer occupation is strongly influenced by the number of bits generated when coding the first image. This problem can be minimized by accepting a lower quality for the first image or by reducing the temporal resolution at the beginning of the sequence.

VIII. CONCLUSIONS

A video coder for very low bit rates based on a hybrid DPCM-vector quantization algorithm was presented.


Fig. 12. Buffer level for (a) “Miss America” and (b) “Claire” sequences at 16 and 8 kb/s (thin line).

The difference images are previously segmented into variable-shape and variable-size blocks, and the segmentation operation is interleaved with motion compensation. The computation of displacement vectors for variable-shape and variable-size blocks is intended to improve the quality of motion estimation. The large-size blocks are decimated in order to simplify their vector quantization. Very active blocks are treated by a specific vector quantizer, the fuzzy classified vector quantizer (FCVQ), reducing the complexity of the codebook search and improving the subjective quality of the coded blocks.

The coding algorithm performed well in the compression of the test sequences at 8 and 16 kb/s, obtaining good signal-to-noise ratio values in both cases. The coded images present a subjective quality that is satisfactory for a large number of applications. On the other hand, the algorithm has a relatively low implementation complexity, allowing implementation with simple hardware [9]. The coder implementation complexity is comparable to that of conventional hybrid coders, and the decoder is simpler due to the use of vector quantization.

In the future, it will be possible to evolve an algorithm that will incorporate “object” concepts while maintaining a general-purpose nature.

REFERENCES

[1] L. Corte-Real and A. P. Alves, “Vector quantisation of image sequences using variable size and variable-shape blocks,” Elec. Lett., vol. 26, pp. 1483-1484, 1990.
[2] J. L. Boxerman and H. J. Lee, “Variable block-sized vector quantization of grayscale images with unconstrained tiling,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 1990, pp. 2277-2280.
[3] L. Corte-Real and A. P. Alves, “A fuzzy classified vector quantizer for image coding,” IEEE Trans. Commun., vol. 43, pp. 207-215, 1995.
[4] M. Chan, Y. Yu, and A. Constantinides, “Variable size block matching motion compensation with applications to video coding,” Proc. Inst. Elec. Eng., vol. 137, pt. I, pp. 205-212, 1990.
[5] W.-H. Chen and D. Hein, “Recursive temporal filtering and frame rate reduction for image coding,” IEEE J. Select. Areas Commun., vol. SAC-5, pp. 1155-1165, 1987.
[6] Y.-S. Fong, C. Pomalaza-Raez, and X.-H. Wang, “Comparison study on nonlinear filters in image processing applications,” Opt. Eng., vol. 28, pp. 749-760, 1989.
[7] J.-S. Lee, “Digital image enhancement and noise filtering by use of local statistics,” IEEE Trans. Patt. Anal. Machine Intell., vol. PAMI-2, pp. 165-168, 1980.
[8] H.-M. Hang and B. Haskell, “Interpolative vector quantization of color images,” IEEE Trans. Commun., vol. COM-36, pp. 465-470, 1988.
[9] I. Martins, L. Corte-Real, N. Vasconcelos, and A. P. Alves, “Low bit rate vector quantization of image sequences with reduced complexity,” in RecPad '92, pp. 179-186.

Luis Corte-Real (A'93) was born in Vila do Conde, Portugal, in 1958. He received his undergraduate degree in electrical engineering from the University of Porto, Porto, Portugal, in 1981. He received the M.Sc. degree in electrical engineering and computers in 1986 from Instituto Superior Técnico, Lisbon, Portugal, and the Ph.D. degree from the University of Porto in 1994.

Since 1984, he has been a professor of telecommunications with the University of Porto. He is currently a professor with the Department of Electrical Engineering and Computers, Faculty of Engineering of the University of Porto. He has been the leader of the INESC Group of Audio and Video Coding since 1992. His current research interest is very low bit rate video coding.

Artur Pimenta Alves (M'90) was born in Vila das Aves, Portugal, in 1947. He received his undergraduate degree in electrical engineering from the University of Porto, Porto, Portugal, in 1970. In 1981, he received the Ph.D. degree in electrical engineering from the University of Bradford, UK.

In 1970, he joined the University of Porto as a teacher of telecommunications. He is currently an associate professor at the Department of Electrical Engineering and Computers, Faculty of Engineering at the University of Porto. His current areas of research interest are digital video coding and image communications. He has been responsible for several national and European projects in the areas of broadband networks and services.

Dr. Alves is a member of the board of Directors of INESC and FUNDETEC.

