The Discrete Cosine Transform and its Application to Image Compression


Jason Creighton

May 14, 2010

Abstract

This paper discusses the discrete cosine transform and its role in the JPEG image file format, and then presents "MiniJPEG", a trimmed-down implementation of the core of the JPEG algorithm written for MATLAB/GNU Octave.

1 The Discrete Cosine Transform

1.1 Introduction

The Discrete Cosine Transform, commonly called the DCT, represents a set of data points as a sum of sinusoidal waves with varying frequencies and magnitudes. To see how this is possible, consider four numbers:

$$x_0 = 10, \quad x_1 = 3, \quad x_2 = 5, \quad x_3 = 1$$

Now consider functions of the form:

$$f_i(x) = a_i \cos(b_i x + c_i)$$

The crucial question is: can we select four such functions $f_0, f_1, f_2, f_3$, each with their own values of $a$, $b$ and $c$, such that

$$\sum_{i=0}^{3} f_i(n) = x_n \quad \text{for } n = 0, \ldots, 3?$$

The answer is yes.

The easiest way to understand this is visually. Figure 1 shows the scatter plot of $x_n$. Now we need to find our four sinusoidal functions whose sum fits the data. We do not give their coefficients explicitly, but Figure 2 gives their plots. Finally, in Figure 3, we sum the functions and confirm that they fit our original data.

Although in practice it is rare to explicitly find functions in this form, it is instructive to do so, and we will see shortly how those graphs were produced.

Figure 1: A scatterplot of some example data

Figure 2: The separate waves that represent the example data

Figure 3: Sum of waves along with the original points

1.2 DCT-II

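The variant of the DCT used by JPEG is called the DCT-II. The transformation of $N$ numbers $x_0, \ldots, x_{N-1}$ to $X_0, \ldots, X_{N-1}$ is defined as:

$$X_k = w(k) \sum_{n=0}^{N-1} x_n \cos\left[\frac{(2n+1)\pi k}{2N}\right], \quad k = 0, \ldots, N-1$$

where $w(k)$ is a scaling factor defined by:

$$w(k) = \begin{cases} \sqrt{\frac{1}{N}} & k = 0 \\ \sqrt{\frac{2}{N}} & 1 \le k \le N-1 \end{cases}$$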
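To make the definition concrete, here is a minimal sketch in Python (used only for illustration; MiniJPEG itself is MATLAB/Octave code) that evaluates the DCT-II sum term by term. The function name `dct_ii` is hypothetical, and the direct double loop is meant for checking small examples, not for speed.

```python
import math


def dct_ii(x):
    """Orthonormal DCT-II of a sequence, evaluated directly from the definition."""
    n_pts = len(x)
    coeffs = []
    for k in range(n_pts):
        # Scaling factor w(k): sqrt(1/N) for k = 0, sqrt(2/N) otherwise
        w = math.sqrt(1.0 / n_pts) if k == 0 else math.sqrt(2.0 / n_pts)
        total = sum(
            x[n] * math.cos((2 * n + 1) * math.pi * k / (2 * n_pts))
            for n in range(n_pts)
        )
        coeffs.append(w * total)
    return coeffs


if __name__ == "__main__":
    # The four example values from Section 1.1; X_0 = 19 / 2 = 9.5
    print(dct_ii([10, 3, 5, 1]))
```

Because of the chosen $w(k)$, the sum of the squared outputs equals the sum of the squared inputs (135 for the example data), which is a handy sanity check.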

1.3 DCT-III

The inverse of the DCT-II is the DCT-III, often just called the "inverse DCT", and it is defined by:

$$x_k = \sum_{n=0}^{N-1} w(n) X_n \cos\left[\frac{(2k+1)\pi n}{2N}\right], \quad k = 0, \ldots, N-1$$
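A matching sketch of the DCT-III follows, again in illustrative Python with hypothetical names. Here `wave(n, coeffs, t)` evaluates a single term $w(n)X_n\cos[(2t+1)\pi n/(2N)]$ at a real-valued position `t`, which is one way curves like those in Figure 2 could be plotted; summing all terms at integer positions gives the inverse transform.

```python
import math


def w(n, n_pts):
    """Scaling factor shared by the DCT-II and the DCT-III."""
    return math.sqrt(1.0 / n_pts) if n == 0 else math.sqrt(2.0 / n_pts)


def dct_ii(x):
    """Forward transform (DCT-II), included so the round trip can be checked."""
    n_pts = len(x)
    return [
        w(k, n_pts) * sum(x[n] * math.cos((2 * n + 1) * math.pi * k / (2 * n_pts))
                          for n in range(n_pts))
        for k in range(n_pts)
    ]


def wave(n, coeffs, t):
    """Term n of the inverse sum, w(n) X_n cos[(2t + 1) pi n / (2N)], at position t."""
    n_pts = len(coeffs)
    return w(n, n_pts) * coeffs[n] * math.cos((2 * t + 1) * math.pi * n / (2 * n_pts))


def dct_iii(coeffs):
    """Inverse DCT: x_k is the sum of all N waves evaluated at the integer k."""
    n_pts = len(coeffs)
    return [sum(wave(n, coeffs, k) for n in range(n_pts)) for k in range(n_pts)]


if __name__ == "__main__":
    coeffs = dct_ii([10, 3, 5, 1])
    print(dct_iii(coeffs))  # approximately [10, 3, 5, 1]
```

For the example data, `wave(0, coeffs, t)` returns 4.75 for every `t`: the $n = 0$ term has zero frequency.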

$w(n)$ is the scaling factor, as defined previously. Notice that we could trivially restate the definition as:

$$x_k = \sum_{n=0}^{N-1} f_n(k), \quad k = 0, \ldots, N-1$$

where:

$$f_n(k) = w(n) X_n \cos\left[\frac{(2k+1)\pi n}{2N}\right]$$

Since $\frac{(2k+1)\pi n}{2N} = \frac{\pi n}{N}k + \frac{\pi n}{2N}$, each $f_n$ has exactly the form $a\cos(bx + c)$ from Section 1.1, with $a = w(n)X_n$, $b = \frac{\pi n}{N}$ and $c = \frac{\pi n}{2N}$. The graphs given earlier for $f_0, f_1, f_2, f_3$ were obtained by taking the DCT (and thereby calculating $X_n$ for each function) and then plotting the resulting functions. Also note what happens when $n = 0$:

$$f_0(k) = w(0) X_0 \cos\left[\frac{(2k+1)\pi (0)}{2N}\right] = w(0) X_0 \cos[0] = w(0) X_0$$

This is why one of the graphs was of a constant function.

1.4 Two dimensional DCT

We have seen how the DCT can be applied to "one dimensional" data. But we want to compress images, a decidedly two-dimensional enterprise. The solution is to represent our data as a matrix, and then apply the DCT along the rows and then along the columns.

As before, first we will give a graphical example. Suppose we want to apply the two-dimensional DCT to a 4x4 matrix. We have to represent 16 data points, and we will do it with the magnitudes of 16 waves. Figure 4 shows the wave represented by each coefficient. Just as in the one-dimensional DCT, each value represents the magnitude of a specific frequency of wave, except that now they are two-dimensional waves!

Figure 4: The waves produced by the 2-dimensional DCT
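The row-then-column procedure can be sketched in Python as follows. This is an illustrative, unoptimized version with hypothetical helper names (`dct_1d`, `dct_2d`), not MiniJPEG's code.

```python
import math


def dct_1d(x):
    """Orthonormal 1-D DCT-II, as defined in Section 1.2."""
    n_pts = len(x)
    out = []
    for k in range(n_pts):
        w = math.sqrt((1.0 if k == 0 else 2.0) / n_pts)
        out.append(w * sum(x[n] * math.cos((2 * n + 1) * math.pi * k / (2 * n_pts))
                           for n in range(n_pts)))
    return out


def dct_2d(block):
    """2-D DCT: transform every row, then every column of the result."""
    n_rows, n_cols = len(block), len(block[0])
    rows_done = [dct_1d(row) for row in block]
    # Transform each column (column j gathered from every row) ...
    cols_done = [dct_1d([rows_done[i][j] for i in range(n_rows)]) for j in range(n_cols)]
    # ... and transpose back, so result[i][j] is row i, column j
    return [[cols_done[j][i] for j in range(n_cols)] for i in range(n_rows)]


if __name__ == "__main__":
    flat = [[3.0] * 4 for _ in range(4)]
    for row in dct_2d(flat):
        print([round(c, 6) for c in row])  # only the top-left entry is non-zero (12.0)
```

For a constant block, every coefficient except the upper-left one is zero (up to rounding), so a single two-dimensional "wave" of zero frequency describes the whole block.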

1.5 The DCT as a matrix operation

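Recall that the DCT is defined as:

$$X_k = w(k) \sum_{n=0}^{N-1} x_n \cos\left[\frac{(2n+1)\pi k}{2N}\right], \quad k = 0, \ldots, N-1$$

Now let's define an $N \times N$ matrix $D$:

$$D_{(i,j)} = \begin{cases} \sqrt{\frac{1}{N}} & i = 1 \\ \sqrt{\frac{2}{N}} \cos\left[\frac{(2j-1)\pi(i-1)}{2N}\right] & i \neq 1 \end{cases}$$

Now we can restate the DCT as:

$$X_k = \sum_{n=0}^{N-1} D_{(k+1,n+1)}\, x_n, \quad k = 0, \ldots, N-1$$

(The factor $w(k)$ has been absorbed into the rows of $D$, and substituting the 1-based indices $i = k+1$, $j = n+1$ turns $(2j-1)\pi(i-1)$ into $(2n+1)\pi k$.) This is the multiplication of a matrix by a vector! We can now write the DCT entirely in terms of matrix operations. The DCT is:

$$\vec{X} = D\vec{x}$$

We can multiply both sides by $D^{-1}$ to find the IDCT:

$$\begin{aligned} D^{-1}\vec{X} &= D^{-1}D\vec{x} \\ D^{-1}\vec{X} &= I\vec{x} \\ \vec{x} &= D^{-1}\vec{X} \end{aligned}$$

But $D$ is an orthogonal matrix, so $D^{-1} = D^T$. So

$$\vec{x} = D^T\vec{X}$$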

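The proof that $D$ is orthogonal is not simple, so we will not present it here, but the fact that it is orthogonal helps us explain some aspects of the DCT.

First, why does the DCT work for all inputs? Because $D$ is invertible, the $N$ columns of $D^{-1}$ are linearly independent, so they form a basis for $\mathbb{R}^N$ and every input $\vec{x}$ can be written as a weighted sum of them. To apply the DCT, we need only solve for those weights:

$$D^{-1}\vec{X} = \vec{x}$$

which of course is just:

$$\vec{X} = D\vec{x}$$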

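Secondly, the scaling factor we introduced seems somewhat arbitrary. For example, why do we multiply the first element by $\sqrt{1/N}$?

The first row of $D$ consists of $\sqrt{1/N}$ repeated $N$ times, and so the first column of $D^T$ must consist of $\sqrt{1/N}$ repeated $N$ times. Since an orthogonal matrix $A$ must satisfy $AA^T = I$, the first row of $D$ multiplied by the first column of $D^T$ must equal 1:

$$\sum_{i=1}^{N} \sqrt{\frac{1}{N}}\sqrt{\frac{1}{N}} = \sum_{i=1}^{N} \frac{1}{N} = N\left(\frac{1}{N}\right) = 1$$

The same argument explains the $\sqrt{2/N}$ used for the other rows: for $1 \le k \le N-1$, $\sum_{n=0}^{N-1} \cos^2\left[\frac{(2n+1)\pi k}{2N}\right] = \frac{N}{2}$, so each of those rows multiplied by itself gives $\frac{2}{N} \cdot \frac{N}{2} = 1$. The scaling factor is specifically chosen to make the DCT matrix orthogonal.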
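Although the orthogonality proof is omitted, the claim is easy to check numerically (a check, not a proof). The following illustrative Python sketch builds $D$ from the definition above, converted to 0-based indices, and measures how far $DD^T$ is from the identity; `dct_matrix`, `matvec` and `deviation_from_identity` are hypothetical helper names.

```python
import math


def dct_matrix(n):
    """N x N DCT matrix D from Section 1.5, using 0-based indices i (row) and j (column)."""
    # With 0-based indices, (2j - 1) pi (i - 1) from the 1-based definition
    # becomes (2j + 1) pi i; row 0 is the constant sqrt(1/N).
    return [
        [math.sqrt(1.0 / n) if i == 0
         else math.sqrt(2.0 / n) * math.cos((2 * j + 1) * math.pi * i / (2 * n))
         for j in range(n)]
        for i in range(n)
    ]


def matvec(a, x):
    """Matrix-vector product a * x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def deviation_from_identity(d):
    """Largest absolute entry of D D^T - I (zero for an exactly orthogonal D)."""
    n = len(d)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            dot = sum(d[i][k] * d[j][k] for k in range(n))  # (D D^T)[i][j]
            worst = max(worst, abs(dot - (1.0 if i == j else 0.0)))
    return worst


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, deviation_from_identity(dct_matrix(n)))
    print(matvec(dct_matrix(4), [10, 3, 5, 1]))  # first entry is 9.5
```

The deviation should be at the level of floating-point rounding, and $D\vec{x}$ for the Section 1.1 data reproduces the DCT-II coefficients computed from the sum formula.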

2 JPEG Overview

2.1 File Formats

The JPEG image compression technique is actually used in a number of different container formats. By far the most common container is the JFIF format: files with a .jpg or .jpeg extension are typically JFIF files. It should be understood that when we say "JPEG" uses a certain technique or encoding scheme, we mean "JPEG as implemented in the JFIF format, in its most common use case."

2.2 Color space transformation

Colors are generally represented in computers by using the "RGB" colorspace. This means that every color is represented as a certain combination of red, green and blue. JPEG stores images in the "YCbCr" colorspace. As the name suggests, the YCbCr colorspace has three components: Y (luminance), Cb ("blue difference") and Cr ("red difference").

Converting from RGB to YCbCr is straightforward:

$$\begin{aligned} Y &= 0.299R + 0.587G + 0.114B \\ Cb &= -0.1687R - 0.3313G + 0.5B + 128 \\ Cr &= 0.5R - 0.4187G - 0.0813B + 128 \end{aligned}$$

So is the inverse, converting from YCbCr back to RGB:

$$\begin{aligned} R &= Y + 1.402(Cr - 128) \\ G &= Y - 0.34414(Cb - 128) - 0.71414(Cr - 128) \\ B &= Y + 1.772(Cb - 128) \end{aligned}$$
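Both sets of formulas can be transcribed directly. The following Python sketch (hypothetical function names) converts one pixel each way and returns unrounded floating-point values; a real encoder would also round and clamp the results to the 0 to 255 range, which this sketch leaves out.

```python
def rgb_to_ycbcr(r, g, b):
    """RGB -> YCbCr with the coefficients above (unrounded, unclamped floats)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.1687 * r - 0.3313 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.4187 * g - 0.0813 * b + 128
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """YCbCr -> RGB with the inverse coefficients above."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.34414 * (cb - 128) - 0.71414 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return r, g, b


if __name__ == "__main__":
    for rgb in [(255, 255, 255), (255, 0, 0), (0, 255, 0), (0, 0, 255)]:
        ycc = rgb_to_ycbcr(*rgb)
        print(rgb, [round(v, 2) for v in ycc], [round(v, 2) for v in ycbcr_to_rgb(*ycc)])
```

Because the published coefficients are rounded to a few decimal places, converting to YCbCr and back returns the original RGB values only approximately (within a few hundredths for these colors). White maps to Y = 255 with both chroma components at 128, i.e. no color difference at all.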

The exact details are unimportant: the key point is that brightness (or "luminance") is separated from color, which is critical for the next step.

2.3 Downsampling

The human eye is more sensitive to changes in brightness than to changes in color. Most JPEG files therefore halve both the horizontal and the vertical resolution of the chroma (Cb and Cr) data, cutting the storage requirements for chroma data by a factor of four.
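MiniJPEG does not implement this step (see Section 3.1), but the idea can be sketched in Python. The sketch assumes the simple choice of averaging each 2x2 group of chroma samples into one sample; real encoders may filter differently, and the function name is hypothetical.

```python
def downsample_2x2(channel):
    """Halve both dimensions of a chroma channel by averaging each 2x2 group.

    Assumes an even height and width; a real encoder must also handle odd sizes.
    """
    height, width = len(channel), len(channel[0])
    return [
        [
            (channel[i][j] + channel[i][j + 1]
             + channel[i + 1][j] + channel[i + 1][j + 1]) / 4.0
            for j in range(0, width, 2)
        ]
        for i in range(0, height, 2)
    ]


if __name__ == "__main__":
    cb = [[0, 2, 4, 6],
          [2, 4, 6, 8],
          [10, 10, 20, 20],
          [10, 10, 20, 20]]
    print(downsample_2x2(cb))  # [[2.0, 6.0], [10.0, 20.0]]: 16 samples become 4
```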

2.4 Block splitting

The image is then split into 8x8 blocks, each of which will be encoded individually. While the DCT can be applied to data sets of any size, encoding in small, fixed-size blocks reduces the difficulty of implementation and the memory requirements.
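Block splitting can be sketched in Python as follows. Like the minijpeg_encode function in Section 4.5, this sketch pads partial blocks at the right and bottom edges with zeros; other encoders may handle edges differently (for example by repeating edge pixels). The function name is hypothetical.

```python
def split_into_blocks(image, size=8):
    """Split a 2-D list of pixels into size x size blocks.

    Partial blocks at the right and bottom edges are padded with zeros,
    as minijpeg_encode does.
    """
    height, width = len(image), len(image[0])
    y_blocks = -(-height // size)  # ceiling division
    x_blocks = -(-width // size)
    blocks = []
    for yb in range(y_blocks):
        block_row = []
        for xb in range(x_blocks):
            block = [[0] * size for _ in range(size)]
            for i in range(size):
                for j in range(size):
                    y, x = yb * size + i, xb * size + j
                    if y < height and x < width:
                        block[i][j] = image[y][x]
            block_row.append(block)
        blocks.append(block_row)
    return blocks


if __name__ == "__main__":
    img = [[255] * 10 for _ in range(10)]  # 10x10 image -> 2x2 grid of blocks
    grid = split_into_blocks(img)
    print(len(grid), len(grid[0]))  # 2 2
    print(grid[1][1][0][:3])        # [255, 255, 0]: only 2 real columns remain
```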

2.5 Discrete cosine transform

Now we have reached the heart of the matter. Let's say that we have some matrix $A$, where each entry in the matrix represents a "brightness" on a scale of 0 to 255, with 0 being completely black and 255 being completely white. This makes $A$ a (very small) grayscale image:

$$A = \begin{bmatrix}
95 & 41 & 54 & 92 & 152 & 141 & 84 & 54 \\
53 & 22 & 15 & 34 & 38 & 136 & 141 & 38 \\
43 & 17 & 1 & 5 & 0 & 0 & 34 & 35 \\
28 & 19 & 4 & 0 & 0 & 6 & 26 & 38 \\
41 & 37 & 21 & 8 & 10 & 25 & 40 & 46 \\
38 & 38 & 32 & 26 & 24 & 42 & 51 & 59 \\
54 & 49 & 46 & 39 & 44 & 56 & 76 & 67 \\
64 & 65 & 59 & 62 & 60 & 50 & 72 & 112
\end{bmatrix}$$

Figure 5: The matrix A as an image

Figure 5 shows $A$ as an image. We subtract 128 from each element to rescale the values around 0:

$$A' = \begin{bmatrix}
-33 & -87 & -74 & -36 & 24 & 13 & -44 & -74 \\
-75 & -106 & -113 & -94 & -90 & 8 & 13 & -90 \\
-85 & -111 & -127 & -123 & -128 & -128 & -94 & -93 \\
-100 & -109 & -124 & -128 & -128 & -122 & -102 & -90 \\
-87 & -91 & -107 & -120 & -118 & -103 & -88 & -82 \\
-90 & -90 & -96 & -102 & -104 & -86 & -77 & -69 \\
-74 & -79 & -82 & -89 & -84 & -72 & -52 & -61 \\
-64 & -63 & -69 & -66 & -68 & -78 & -56 & -16
\end{bmatrix}$$

Now let's apply the DCT:

$$D = \begin{bmatrix}
-654.12 & -64.55 & 52.65 & 48.21 & -5.13 & 20.84 & -1.92 & -0.14 \\
15.19 & -20.86 & -42.34 & 83.90 & -9.56 & 24.79 & 5.47 & -1.13 \\
179.62 & -32.24 & -53.77 & 42.49 & 8.77 & 4.22 & 7.83 & 2.59 \\
64.10 & -3.15 & -44.35 & 34.13 & -3.50 & -7.63 & 12.99 & 2.02 \\
31.63 & 18.64 & -18.70 & -9.63 & 34.63 & -27.79 & 6.80 & 6.52 \\
-13.14 & 26.13 & -17.43 & -8.29 & 30.14 & -16.95 & 1.85 & 12.43 \\
-14.19 & 27.95 & -3.42 & -26.68 & 39.80 & -21.25 & 2.27 & 6.05 \\
-5.86 & 23.22 & 2.56 & -6.38 & 15.18 & -6.75 & -2.23 & 8.69
\end{bmatrix}$$

What do these numbers mean? The large value in the upper-left corner is called the "DC coefficient".¹ This value is much larger than the others because that position corresponds to the "zero frequency" wave, and thus represents the average brightness of the whole block: for an 8x8 block it equals exactly 8 times the mean of $A'$ (the entries of $A'$ sum to $-5233$, and $8 \times (-5233/64) \approx -654.12$).

More generally, each number can be thought of as the magnitude of a two-dimensional wave, where the height of the wave corresponds to the brightness of a pixel. Figure 6 shows the wave formed by each coefficient. Every 8x8 grayscale image can be written in terms of those waves.

¹ The other coefficients are called the "AC coefficients". These terms come from electrical engineering (direct current and alternating current), and are used in JPEG mainly for historical reasons.

Figure 6: The "brightness" waves formed by each DCT coefficient
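The coefficients above can be reproduced with a short Python sketch that writes the two-dimensional DCT in matrix form, $DA'D^T$, where $D$ here is the DCT matrix of Section 1.5. It is an illustrative reimplementation, not MiniJPEG's code, and the helper names are hypothetical. It also confirms the relationship between the DC coefficient and the block's mean.

```python
import math

# The 8x8 example block A from this section
A = [
    [95, 41, 54, 92, 152, 141, 84, 54],
    [53, 22, 15, 34, 38, 136, 141, 38],
    [43, 17, 1, 5, 0, 0, 34, 35],
    [28, 19, 4, 0, 0, 6, 26, 38],
    [41, 37, 21, 8, 10, 25, 40, 46],
    [38, 38, 32, 26, 24, 42, 51, 59],
    [54, 49, 46, 39, 44, 56, 76, 67],
    [64, 65, 59, 62, 60, 50, 72, 112],
]


def dct_matrix(n):
    """DCT matrix D (0-based indices); row 0 is constant because cos(0) = 1."""
    return [[math.sqrt((1.0 if i == 0 else 2.0) / n)
             * math.cos((2 * j + 1) * math.pi * i / (2 * n))
             for j in range(n)] for i in range(n)]


def dct2(block):
    """2-D DCT written as D * block * D^T."""
    n = len(block)
    d = dct_matrix(n)
    # D * block transforms every column ...
    tmp = [[sum(d[i][k] * block[k][j] for k in range(n)) for j in range(n)]
           for i in range(n)]
    # ... and multiplying by D^T on the right transforms every row
    return [[sum(tmp[i][k] * d[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


shifted = [[v - 128 for v in row] for row in A]  # the matrix A' above
coeffs = dct2(shifted)
mean = sum(map(sum, shifted)) / 64.0

if __name__ == "__main__":
    print(round(coeffs[0][0], 2), round(8 * mean, 2))
    print(round(coeffs[0][1], 2), round(coeffs[1][0], 2))
```

Both values on the first printed line should be about −654.12, and the second line should match the −64.55 and 15.19 shown in the coefficient matrix.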

2.6 Quantization

We have transformed the image to a set of frequencies, but we still have to store exactly as much information as we did before. What is the point of all of this?

One of the key insights in JPEG is that most of the "important" information is contained in the low-frequency coefficients. Quantization works by doing an element-wise division of the DCT coefficients by a "quantization matrix", followed by rounding to the nearest integer.

We use the quantization matrix recommended for luminance data (since we are using a grayscale image) by the JPEG standard.[1]

$$Q = \begin{bmatrix}
16 & 11 & 10 & 16 & 24 & 40 & 51 & 61 \\
12 & 12 & 14 & 19 & 26 & 58 & 60 & 55 \\
14 & 13 & 16 & 24 & 40 & 57 & 69 & 56 \\
14 & 17 & 22 & 29 & 51 & 87 & 80 & 62 \\
18 & 22 & 37 & 56 & 68 & 109 & 103 & 77 \\
24 & 35 & 55 & 64 & 81 & 104 & 113 & 92 \\
49 & 64 & 78 & 87 & 103 & 121 & 120 & 101 \\
72 & 92 & 95 & 98 & 112 & 100 & 103 & 99
\end{bmatrix}$$

The exact numbers are not important, but notice the general trend: lower numbers in the upper left (low frequencies), higher numbers in the lower right (high frequencies). Since we are going to divide by these numbers and then round to the nearest integer, higher numbers mean more information will be discarded.

Figure 7: The image after encoding and decoding

After quantization:

$$D' = \begin{bmatrix}
-41 & -6 & 5 & 3 & 0 & 1 & 0 & 0 \\
1 & -2 & -3 & 4 & 0 & 0 & 0 & 0 \\
13 & -2 & -3 & 2 & 0 & 0 & 0 & 0 \\
5 & 0 & -2 & 1 & 0 & 0 & 0 & 0 \\
2 & 1 & -1 & 0 & 1 & 0 & 0 & 0 \\
-1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}$$

All of the coefficients are now integers, which can be stored more compactly than floating point numbers. But more importantly, most of the coefficients have been rounded to zero. In effect, we barely need to store those at all, since long runs of zeros can be encoded very compactly. The vast majority of JPEG's compression comes from the quantization step.
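Quantization and its lossy inverse amount to element-wise division with rounding, and element-wise multiplication. In this illustrative Python sketch the rounding helper mimics MATLAB's `round`, which rounds halves away from zero, because Python's built-in `round` rounds halves to even. The function names are hypothetical.

```python
import math

# Luminance quantization matrix Q from this section
Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def round_half_away(v):
    """Round to the nearest integer, halves away from zero (like MATLAB's round)."""
    return int(math.copysign(math.floor(abs(v) + 0.5), v))


def quantize(coeffs, q):
    """Element-wise divide by the quantization matrix, then round."""
    return [[round_half_away(c / s) for c, s in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, q)]


def dequantize(levels, q):
    """Decoder side: multiply back. The rounding error cannot be recovered."""
    return [[lv * s for lv, s in zip(l_row, q_row)]
            for l_row, q_row in zip(levels, q)]


if __name__ == "__main__":
    # First two rows of the DCT coefficients of A' (from the matrix D above)
    top = [[-654.12, -64.55, 52.65, 48.21, -5.13, 20.84, -1.92, -0.14],
           [15.19, -20.86, -42.34, 83.90, -9.56, 24.79, 5.47, -1.13]]
    levels = quantize(top, Q[:2])
    print(levels)
    print(dequantize(levels, Q[:2])[0][0])  # -656, not -654.12
```

Quantizing the first two rows of $D$ reproduces the first two rows of $D'$, and dequantizing shows where the loss comes from: the DC coefficient returns as $-41 \times 16 = -656$ instead of $-654.12$.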

The selection of a quantization matrix is somewhat arbitrary: in practice, JPEG encoders vary in the particular quantization matrix they use, a fact which has been used by forensic investigators to determine whether a digital image was modified after it was taken.[2]

Figure 7 shows the result of reversing the above process. The differences between the original and decoded blocks may seem prohibitively large, but they are nearly undetectable when the entire image is viewed. Figure 8 shows the original test image[3], and Figure 9 shows the same image after JPEG compression.

Figure 8: Original Image

Figure 9: Image after compression

2.7 Entropy coding

Entropy coding is a form of lossless data compression that replaces fixed-length symbols with a variable-length encoding, in which the most frequently used symbols are assigned the shortest codes. JPEG uses a technique known as Huffman coding[4], which will not be described here as it is not relevant to the DCT.

3 MiniJPEG

3.1 Limitations

MiniJPEG does not implement a file format, can only encode grayscale images, and does not implement downsampling or entropy coding.

3.2 Support

MiniJPEG has been tested with GNU Octave 3.2.2 and MATLAB 7.8.0 (R2009a). Octave is an open source numerical computation language, designed to be compatible with MATLAB. MiniJPEG will work in either MATLAB or Octave without additional packages installed, but it will be much faster if built-in "dct2" and "idct2" functions are available, as the functions presented here are written with clarity in mind rather than performance.

3.3 Quality Setting

As we have seen, the quantization matrix determines the amount of lossy compression that occurs. It is desirable to allow the user of a graphics program some control over the quality/size tradeoff being made, but users can hardly be expected to craft a new quantization matrix by hand. A common solution (used by the popular libjpeg library[5]) is to rescale the quantization matrix based on a "quality" setting, which ranges from 1 to 100, with higher values indicating higher quality.

The scaling factor by which the quantization matrix is multiplied is given by:

$$f(q) = \begin{cases} \frac{50}{q} & q \le 50 \\ 2 - \frac{q}{50} & q > 50 \end{cases}$$

After rescaling, the quantization matrix is rounded to the nearest integer, and any entries that would have been zero are replaced by ones to avoid dividing by zero.

Notice that $f(50) = 1$, which means that a quality of 50 will use the recommended quantization matrix without any rescaling. Also, $f(100) = 0$, which might seem to be a problem at first, but this case is handled by the rounding rules mentioned earlier: every entry rounds to 0 and is replaced by 1, so the coefficients are only rounded, never divided down.
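The rescaling rule can be sketched in Python as follows. It mirrors the minijpeg_quantization_matrix function in Section 4.8, but it is an illustrative reimplementation with a hypothetical name, and it assumes a quality between 1 and 100 without checking.

```python
import math

# Recommended luminance quantization matrix (used unchanged at quality 50)
BASE_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def scaled_quant_matrix(quality, base=BASE_Q):
    """Rescale `base` by f(q); quality is assumed to lie between 1 and 100."""
    factor = 50.0 / quality if quality <= 50 else 2.0 - quality / 50.0
    # floor(x + 0.5) rounds halves up, like MATLAB's round for non-negative x;
    # entries that round to zero become 1 to avoid dividing by zero.
    return [[max(1, math.floor(v * factor + 0.5)) for v in row] for row in base]


if __name__ == "__main__":
    for q in (10, 50, 75, 100):
        print(q, scaled_quant_matrix(q)[0])
```

Quality 50 returns the base matrix, quality 25 doubles every entry, and quality 100 yields a matrix of ones.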

3.4 Encoding

Encoding is done by the minijpeg_encode function. The first argument is a grayscale image (or an RGB image, which will be converted to grayscale), and the second (optional) argument is the "quality" to use, which defaults to 75.

The return value is a structure containing the DCT coefficient matrices, the quantization matrix that was used, and some statistics about the encoding.

3.5 Decoding

Decoding is done by the minijpeg_decode function. Its first (and only) argument is the structure that was returned by minijpeg_encode. The return value is a grayscale image.

3.6 Results

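Since MiniJPEG only implements a portion of JPEG, it's hard to do an apples-to-apples comparison of it with other implementations of JPEG. So I chose to estimate the amount of compression based on how many coefficients in the image were quantized to 0. The following is for the example image shown in Figure 9:

Total coefficients       393216
Non-zero coefficients     56443
Compression ratio        0.1435

Compression ratio is defined as:

$$\text{Compression Ratio} = \frac{\text{Compressed Size}}{\text{Uncompressed Size}}$$

Here the compressed size is estimated by the number of non-zero coefficients and the uncompressed size by the total number of coefficients ($56443 / 393216 \approx 0.1435$), so lower values indicate stronger compression.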
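The estimate reported by minijpeg_encode (non-zero quantized coefficients divided by total coefficients) amounts to the following computation, sketched in Python with a hypothetical function name.

```python
# The quantized block D' from Section 2.6
D_PRIME = [
    [-41, -6, 5, 3, 0, 1, 0, 0],
    [1, -2, -3, 4, 0, 0, 0, 0],
    [13, -2, -3, 2, 0, 0, 0, 0],
    [5, 0, -2, 1, 0, 0, 0, 0],
    [2, 1, -1, 0, 1, 0, 0, 0],
    [-1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]


def estimated_compression_ratio(blocks):
    """Fraction of quantized coefficients that are non-zero, over all blocks."""
    total = 0
    nonzero = 0
    for block in blocks:
        for row in block:
            total += len(row)
            nonzero += sum(1 for c in row if c != 0)
    return nonzero / total


if __name__ == "__main__":
    print(estimated_compression_ratio([D_PRIME]))  # 22 / 64 = 0.34375
```

For the single example block this gives 22/64, noticeably worse than the whole-image figure above, which is plausible since that block contains more detail than the smooth regions that dominate a typical photograph.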

4 MiniJPEG Full Source Code

4.1 minijpeg_dct2.m

function result = minijpeg_dct2(x)
    % Use faster built-in function if available
    if exist('dct2')
        result = dct2(x);
        return
    end

    [rows, cols] = size(x);
    result = zeros(rows, cols);
    for i = 1:rows
        result(i, :) = minijpeg_dct(x(i, :));
    end
    for j = 1:cols
        result(:, j) = minijpeg_dct(result(:, j)')';
    end
end

4.2 minijpeg_dct.m

function result = minijpeg_dct(x)
    [rows, cols] = size(x);
    if rows ~= 1
        error('x can only have one row');
    end
    dct_mat = minijpeg_dct_matrix(cols);
    result = (dct_mat * x')';
end

4.3 minijpeg_dct_matrix.m

function [ dct_mat ] = minijpeg_dct_matrix(n)
    dct_mat = zeros(n,n);

    for i = 1:n
        for j = 1:n
            if i == 1
                dct_mat(i, j) = sqrt(1/n);
            else
                dct_mat(i, j) = sqrt(2/n) * cos(((2*j - 1) * pi * (i - 1)) / (2*n));
            end
        end
    end
end

4.4 minijpeg_decode.m

function image = minijpeg_decode(mj)
    [y_blocks, x_blocks, dummy, dummy2] = size(mj.dct_matrices);

    image = zeros(y_blocks*8, x_blocks*8, 'uint8');

    for xb = 1:x_blocks
        for yb = 1:y_blocks
            y_index = ((yb-1)*8)+1;
            x_index = ((xb-1)*8)+1;
            dct_matrix = squeeze(mj.dct_matrices(yb, xb, :, :));
            image_block = round(minijpeg_idct2(double(dct_matrix) .* double(mj.q_matrix)) + 128);
            image(y_index:(y_index+7), x_index:(x_index+7)) = image_block;
        end
    end

    image = image(1:mj.height, 1:mj.width);
end

4.5 minijpeg_encode.m

function mj = minijpeg_encode(image, quality)
    if nargin < 2
        quality = 75;
    end

    [height, width, channels] = size(image);

    if channels == 3
        % Assume input is RGB and convert to grayscale
        image = minijpeg_rgb2gray(image);
    elseif channels == 1
        % Assume it is already grayscale and do nothing
    else
        error('image matrix must be either RGB or grayscale data');
    end

    mj.height = height;
    mj.width = width;

    x_blocks = idivide(int32(width), 8, 'ceil');
    y_blocks = idivide(int32(height), 8, 'ceil');

    mj.total_coefficients = x_blocks * y_blocks * 8 * 8;
    mj.nonzero_coefficients = 0;

    zero_padded = zeros(y_blocks*8, x_blocks*8, 'uint8');
    zero_padded(1:height, 1:width) = image;

    mj.q_matrix = minijpeg_quantization_matrix(quality);

    q_matrix_double = double(mj.q_matrix);

    mj.dct_matrices = zeros(y_blocks, x_blocks, 8, 8, 'int16');

    for xb = 1:x_blocks
        for yb = 1:y_blocks
            y_index = ((yb-1)*8)+1;
            x_index = ((xb-1)*8)+1;
            image_block = zero_padded(y_index:(y_index+7), x_index:(x_index+7));

            dct_matrix = round(minijpeg_dct2(double(image_block) - 128) ./ q_matrix_double);
            mj.nonzero_coefficients = mj.nonzero_coefficients + sum(sum(dct_matrix ~= 0));
            mj.dct_matrices(yb, xb, :, :) = dct_matrix;
        end
    end
    mj.compression_ratio = double(mj.nonzero_coefficients) / double(mj.total_coefficients);
end

4.6 minijpeg_idct2.m

function result = minijpeg_idct2(x)
    % Use faster built-in function if available
    if exist('idct2')
        result = idct2(x);
        return
    end

    [rows, cols] = size(x);
    result = zeros(rows, cols);
    for j = 1:cols
        result(:, j) = minijpeg_idct(x(:, j)')';
    end
    for i = 1:rows
        result(i, :) = minijpeg_idct(result(i, :));
    end
end

4.7 minijpeg_idct.m

function result = minijpeg_idct(x)
    [rows, cols] = size(x);
    if rows ~= 1
        error('x can only have one row');
    end
    dct_mat = minijpeg_dct_matrix(cols);
    result = (dct_mat' * x')';
end

4.8 minijpeg_quantization_matrix.m

function qm = minijpeg_quantization_matrix(quality)
    base_q_matrix = [
        16 11 10 16 24 40 51 61;
        12 12 14 19 26 58 60 55;
        14 13 16 24 40 57 69 56;
        14 17 22 29 51 87 80 62;
        18 22 37 56 68 109 103 77;
        24 35 55 64 81 104 113 92;
        49 64 78 87 103 121 120 101;
        72 92 95 98 112 100 103 99;
    ];

    if quality <= 50
        scaling_factor = 50 / quality;
    else
        scaling_factor = 2 - (quality/50);
    end

    qm = uint16(round(base_q_matrix * scaling_factor));
    % In case anything has been rounded to zero, we need to set those elements to 1
    qm(qm==0) = 1;
end

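4.9 minijpeg_rgb2gray.m

function [ gray ] = minijpeg_rgb2gray(rgb)
    % Work in double precision so each weighted channel is not rounded
    % separately by uint8 arithmetic, then round once at the end
    rgb = double(rgb);
    red = rgb(:, :, 1);
    green = rgb(:, :, 2);
    blue = rgb(:, :, 3);
    gray = uint8((red * 0.299) + (green * 0.587) + (blue * 0.114));
end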

References

[1] Gregory K. Wallace, "The JPEG still picture compression standard", Communications of the ACM, April 1991.

[2] Jesse D. Kornblum, "Using JPEG quantization tables to identify imagery processed by software", Digital Investigation, September 2008.

[3] Kodak Lossless True Color Image Suite, http://r0k.us/graphics/kodak/

[4] David A. Huffman, "A Method for the Construction of Minimum-Redundancy Codes", Proceedings of the I.R.E., September 1952.

[5] Independent JPEG Group, http://www.ijg.org/
