DISCRETE COSINE TRANSFORMS
Jennie G. Abraham
Fall 2009, EE5355
Reference Book: THE TRANSFORM AND DATA COMPRESSION HANDBOOK,
edited by K.R. Rao and P.C. Yip
4.0 Transform Introduction
In general, there are several characteristics that are desirable for the purpose of data
compression. Transforms are useful entities that encapsulate some or all of these characteristics:
Data decorrelation: The ideal transform completely decorrelates the data in a sequence/block;
i.e., it packs the most amount of energy in the fewest number of coefficients. In this way, many
coefficients can be discarded after quantization and prior to encoding. It is important to note
that the transform operation itself does not achieve any compression. It aims at decorrelating
the original data and compacting a large fraction of the signal energy into relatively few
transform coefficients.
Data-independent basis functions: Owing to the large statistical variations among data, the
optimum transform usually depends on the data, and finding the basis functions of such
transform is a computationally intensive task. This is particularly a problem if the data blocks
are highly nonstationary, which necessitates the use of more than one set of basis functions to
achieve high decorrelation. Therefore, it is desirable to trade optimum performance for a
transform whose basis functions are data-independent.
Fast implementation: The number of operations required for an n-point transform is generally
of the order O(n2). Some transforms have fast implementations, which reduce the number of
operations to O(n log n). For a separable n × n 2-D transform, performing the row and column
1-D transforms successively reduces the number of operations from O(n4) to O(2n2 log n).
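The separability claim above can be checked numerically. A minimal numpy sketch (the function names and the orthonormal DCT-II kernel used here are illustrative assumptions, not from the text), showing that applying 1-D transforms along one axis and then the other reproduces the direct 2-D transform:

```python
import numpy as np

def dct2_matrix(N):
    # Orthonormal DCT-II matrix: entry (k, n) = c(k) * sqrt(2/N) * cos(pi*(2n+1)*k / (2N)),
    # with c(0) = 1/sqrt(2) and c(k) = 1 otherwise.
    n = np.arange(N)
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
    C[0, :] /= np.sqrt(2.0)
    return C

def dct2d_separable(block):
    # 2-D DCT as successive 1-D transforms: one axis, then the other.
    C = dct2_matrix(block.shape[0])
    return C @ block @ C.T

def dct2d_direct(block):
    # Direct 2-D transform via the full quadruple sum, for comparison only.
    N = block.shape[0]
    C = dct2_matrix(N)
    out = np.zeros((N, N))
    for k in range(N):
        for l in range(N):
            out[k, l] = sum(C[k, i] * C[l, j] * block[i, j]
                            for i in range(N) for j in range(N))
    return out
```

The separable version replaces the quadruple loop with two matrix products, which is the source of the complexity reduction described above.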
4.1 DCT Introduction
The discrete cosine transform (DCT) and discrete sine transform (DST) are members of a family
of sinusoidal unitary transforms. They are real, orthogonal, and separable, with fast algorithms
for their computation, and they are highly relevant to data compression.
Sinusoidal unitary transform: an invertible linear transform whose kernel describes a set of
complete, orthogonal discrete cosine and/or sine basis functions.
E.g.: KLT, generalized DFT, generalized discrete Hartley transform, and various types of
the DCT and DST are members of this class of unitary transforms.
The family of discrete trigonometric transforms consists of 8 versions of DCT.
Each transform is identified as EVEN or ODD and of type I, II, III, and IV.
All present digital signal and image processing applications (mainly transform coding and
digital filtering of signals) involve only even types of the DCT and DST.
Therefore, we consider these four even types of DCT.
DCT-I    Wang and Hunt                Defined for order N + 1
DCT-II   Ahmed, Natarajan, and Rao    Excellent energy compaction property; best approximation to the optimal KLT
DCT-III  Ahmed, Natarajan, and Rao    Inverse of DCT-II
DCT-IV   Jain                         Fast implementation of the lapped orthogonal transform for efficient transform/subband coding
4.1.2 Definitions of DCTs
Note:
For the normalized even types of DCT in matrix form, the entry at position (n, k) is
obtained by evaluating the right-hand side for each n and k.
N is assumed to be an integer power of 2, i.e., N = 2^m.
The subscript of a matrix denotes its order.
The superscript denotes the version number.
4.1.3 Mathematical Properties
DCT matrices are real and orthogonal
Unitary Property
Since each DCT matrix M is real and orthogonal, its inverse is its transpose: M^(-1) = M^T.
Linearity Property
M(αg + βf) = αMg + βMf, for a transform matrix M, constants α and β, and vectors
g and f; that is, all DCTs are linear transforms.
The Convolution-Multiplication Property
Convolution in the spatial domain is equivalent to taking an inverse transform of the
product of forward transforms of two data sequences.
The convolution-multiplication property is a powerful tool for performing
digital filtering in the transform domain.
All DCTs are separable transforms: a multidimensional transform can be decomposed
into successive applications of one-dimensional (1-D) transforms in the appropriate
directions.
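The linearity and unitary properties listed above can be verified numerically. A small sketch, taking the orthonormal DCT-II as a representative member of the family (the function name and test signals are illustrative):

```python
import numpy as np

def dct_ii(x):
    # Orthonormal 1-D DCT-II via its transform matrix (direct O(N^2) evaluation).
    N = len(x)
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    C[0, :] /= np.sqrt(2.0)
    return C @ x

rng = np.random.default_rng(1)
f, g = rng.standard_normal(16), rng.standard_normal(16)
alpha, beta = 2.5, -0.75

# Linearity: T(alpha*f + beta*g) == alpha*T(f) + beta*T(g)
assert np.allclose(dct_ii(alpha * f + beta * g),
                   alpha * dct_ii(f) + beta * dct_ii(g))

# Unitary property: an orthonormal transform preserves signal energy (Parseval).
assert np.allclose(np.linalg.norm(dct_ii(f)), np.linalg.norm(f))
```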
4.3 Relations to the KLT
KLT is the optimal transform for data compression in a statistical sense because it
decorrelates a signal in the transform domain,
packs the most information into a few coefficients, and
minimizes the mean-square error between the reconstructed and original signal compared to
any other transform.
However, KLT is constructed from the eigenvalues and the corresponding eigenvectors of a
covariance matrix of the data to be transformed; it is signal-dependent, and there is no
general algorithm for its fast computation.
The family of DCTs is asymptotically equivalent to the KLT for a first-order stationary
Markov process, in terms of the transform size and the adjacent (interelement) correlation
coefficient ρ.
The performance of DCTs, particularly important in transform coding, is associated with the KLT.
For finite length data, DCTs and DSTs provide different approximations to KLT, and the best
approximating transform varies with the value of correlation coefficient ρ.
E.g.:
 ρ     KLT is reduced to
 1     DCT-II (DCT-III)
 0     DST-I
-1     DST-II (DST-III)
For infinite-length data, i.e., as the transform size N increases (N tends to infinity),
the KLT is reduced to DCT-I or DCT-IV.
This asymptotic behavior implies that DCTs and DSTs can be used as substitutes for KLT of
certain random processes.
4.4 Relation to DFT
DCT is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using only
real numbers. DCTs are equivalent to DFTs of roughly twice the length, operating on real data
with even symmetry. The obvious distinction between a DCT and a DFT is that the former uses
only cosine functions, while the latter uses both cosines and sines (in the form of complex
exponentials).
Compared with DFT, DCT has two main advantages:
It is a real transform, with better computational efficiency than the DFT, which by
definition is a complex transform.
It does not introduce discontinuities when imposing periodicity on the time signal. In the
DFT, the time signal is truncated and assumed periodic, so a discontinuity is introduced
in the time domain and corresponding artifacts are introduced in the frequency domain.
In the DCT, even symmetry is assumed when truncating the time signal, so no discontinuity
or related artifacts are introduced.
4.5 Relevance to Data Compression: DCT-II
Performance of DCT-II is closest to the statistically optimal KLT based on a number of
performance criteria.
variance distribution,
energy packing efficiency,
residual correlation,
rate distortion,
maximum reducible bits …
It exhibits the desirable characteristics for data compression, namely:
o Data decorrelation
o Data-independent basis functions
o Fast implementation
The importance of DCT-II is further accentuated by its
Superiority in bandwidth compression (redundancy reduction) of a wide range of signals.
Powerful performance in bit-rate reduction.
Existence of fast algorithms for its implementation.
DCT-II and its inverse, DCT-III, have been employed in the international image/video coding
standards, e.g., JPEG, MPEG, H.261, H.263, H.264…
4.6 DCT Computation
4.6.1 DCT Definition
The N-point DCT (type II) of a sequence x(n), n = 0, 1, …, N − 1, is
    X(k) = c(k) √(2/N) Σ_{n=0}^{N−1} x(n) cos[(2n + 1)kπ / 2N],  k = 0, 1, …, N − 1,
where c(0) = 1/√2 and c(k) = 1 for k ≠ 0.
4.6.2 DCT – Matrix Form:
Collecting the kernel values c(k) √(2/N) cos[(2n + 1)kπ / 2N] into a matrix C_N (kth row,
nth column) gives the matrix form of the transform. A 4x4 DCT matrix, for example, is
generated by evaluating this kernel for N = 4, and the corresponding 4x4 IDCT matrix is its
transpose. If the signal is the vector x, then its DCT transform is X = C_N x, and the
inverse transform is x = C_N^T X.
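A short numerical sketch of the matrix form (assuming the standard orthonormal DCT-II kernel; the signal values are arbitrary examples):

```python
import numpy as np

def dct_matrix(N):
    # N-point orthonormal DCT-II matrix: row k, column n holds
    # c(k) * sqrt(2/N) * cos((2n+1)*k*pi / (2N)), c(0) = 1/sqrt(2), c(k) = 1 otherwise.
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    C[0, :] /= np.sqrt(2.0)
    return C

C4 = dct_matrix(4)                    # a 4x4 DCT matrix
x = np.array([1.0, 2.0, 3.0, 4.0])    # an arbitrary example signal
X = C4 @ x                            # forward DCT
x_back = C4.T @ X                     # inverse DCT: the IDCT matrix is the transpose
```

Because the matrix is orthonormal, the product of the DCT matrix and its transpose is the identity, so the round trip recovers x exactly.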
4.6.3 Computation of DCT from DFT (using 2N point FFT):
To derive the DCT of an N-point real signal sequence x(n), n = 0, 1, …, N − 1, we
first construct a new sequence of 2N points by mirroring:
    x'(n) = x(n) for 0 ≤ n ≤ N − 1, and x'(n) = x(2N − 1 − n) for N ≤ n ≤ 2N − 1.
This 2N-point sequence is assumed to repeat itself outside the range 0 ≤ n ≤ 2N − 1,
i.e., it is periodic with period 2N, and it is even symmetric with respect
to the point n = −1/2.
If we shift the signal to the right by 1/2, or, equivalently, shift the index to the left by 1/2 by
defining another index n' = n + 1/2, then the sequence is even symmetric with
respect to n' = 0. In the following we simply represent this new function by x'(n).
The DFT of this 2N-point even symmetric sequence can be found as:
Since is even and is odd with respect to , all terms
in the second summation are odd and the summation is zero (while all terms in the first summation
are even). It can also be seen that all is real and even . Next, we replace
by and get
Note that since all terms in the summation are even symmetric, only the first half of the data
points need to be used. Moreover, as the cosine function is even and the sequence is
periodic with period 2N, a point in the second half of the transform is the same as its
corresponding point in the first half; i.e., the second half is redundant and therefore
can be dropped.
Now we have the discrete cosine transform (DCT):
where the nth row and mth column of the DCT matrix:
All row vectors of this DCT matrix are orthogonal and normalized except the first one (k = 0):
It is straightforward to show that a DCT matrix is orthonormal for N even, since the norm of
each row is unity and the dot product of any pair of rows is zero (the product terms may be
expressed as the sum of a pair of cosine functions, which are each zero mean).
To make the DCT an orthonormal transform, we define a coefficient
    c(k) = 1/√2 for k = 0, and c(k) = 1 otherwise,
so that the DCT now becomes orthonormal, where the kernel modified with c(k) is also the
component in the nth row and mth column of the N by N cosine transform matrix.
Here c_i denotes the ith row of the DCT transform matrix C. As these
row vectors are orthonormal (c_i · c_j = δ_ij), the DCT matrix is orthogonal:
C C^T = I, i.e., C^(-1) = C^T.
The inverse DCT is therefore x = C^T X; in matrix form, the IDCT matrix is simply the
transpose of the DCT matrix.
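The 2N-point construction of this section can be sketched directly in code. A hedged numpy version (the half-sample phase factor follows from the index shift described in the derivation; function names are illustrative):

```python
import numpy as np

def dct_via_2n_fft(x):
    # Unnormalized DCT-II computed from the 2N-point FFT of the
    # even-symmetric extension [x0 .. x_{N-1}, x_{N-1} .. x0].
    N = len(x)
    y = np.concatenate([x, x[::-1]])       # periodic with period 2N, even symmetric
    Y = np.fft.fft(y)                      # 2N-point DFT
    k = np.arange(N)
    # Half-sample phase shift realigns the symmetry point; the result is real,
    # and the second half of Y is redundant and dropped.
    return 0.5 * (np.exp(-1j * np.pi * k / (2 * N)) * Y[:N]).real

def dct_direct(x):
    # Reference: S(k) = sum_n x(n) * cos((2n+1)*k*pi / (2N)), evaluated directly.
    N = len(x)
    n = np.arange(N)
    return np.array([np.sum(x * np.cos(np.pi * (2 * n + 1) * k / (2 * N)))
                     for k in range(N)])
```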
4.6.4 DCT Fast Algorithms:
1. N – point DCT via 2N point FFT
2. N – point DCT via N point FFT
3. Recursive Fast Algorithm
4. Sparse Matrix Factors
5. Prime Factor Algorithm for DCT
6. DIT & DIF Algorithms for DCT
Fast DCT algorithm
Forward DCT
The DCT of a sequence can be implemented by FFT. First
we define a new sequence :
Then the DCT of can be written as the following (the coefficient is dropped for now for
simplicity):
where the first summation is for all even terms and second all odd terms. We define for the second
summation , then the limits of the summation and for
becomes and for , and the second summation can be written as
where the equal sign is due to the trigonometric identity:
Now the two summations in the expression of can be combined
Next, consider the DFT of :
If we multiply both sides by
and take the real part of the result (and keep in mind that both and are real), we get:
The last equal sign is due to the trigonometric identity:
This expression for is identical to that for above, therefore we get
where Y(k) is the DFT of the reordered sequence y(n) (defined from x(n)), which can be
computed using the FFT algorithm with O(N log N) time complexity.
In summary, the fast forward DCT can be implemented in 3 steps:
Step 1: Generate a sequence y(n) from the given sequence x(n) by reordering: even-indexed
samples in order, followed by odd-indexed samples in reverse order.
Step 2: Obtain the DFT Y(k) of y(n) using the FFT. (As y(n) is real, Y(k) is conjugate
symmetric and only half of the data points need be computed.)
Step 3: Obtain the DCT C(k) from Y(k).
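The three steps can be sketched as one function. A hedged numpy version of the N-point-FFT approach (it assumes N is even, and drops the normalization coefficient, as in the derivation above):

```python
import numpy as np

def fast_dct(x):
    # Unnormalized DCT-II via a single N-point FFT (N even).
    N = len(x)
    # Step 1: reorder -- even-indexed samples in order, odd-indexed in reverse.
    v = np.empty_like(x, dtype=float)
    v[:N // 2] = x[::2]
    v[N // 2:] = x[::-2]
    # Step 2: N-point DFT of the reordered (real) sequence.
    V = np.fft.fft(v)
    # Step 3: twiddle by exp(-j*pi*k/(2N)) and keep the real part.
    k = np.arange(N)
    return (np.exp(-1j * np.pi * k / (2 * N)) * V).real
```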
Inverse DCT
The most obvious way to do inverse DCT is to reverse the order and the mathematical operations
of the three steps for the forward DCT:
Step 1: Obtain Y(k) from C(k). In Step 3 above there are N equations but 2N variables
(both the real and imaginary parts of Y(k)). However, note that as the y(n) are real, the
real part of the spectrum Y(k) is even (N/2 + 1 independent variables) and the imaginary
part is odd (N/2 − 1 independent variables). So there are only N unknowns, which can be
obtained by solving the N equations.
Step 2: Obtain y(n) from Y(k) by inverse DFT, also using the FFT with O(N log N) complexity.
Step 3: Obtain x(n) from y(n) by undoing the reordering.
However, there is a more efficient way to do the inverse DCT. Consider first the real part of
the inverse DFT of the reconstructed sequence Y(k): this equation gives the inverse DCT at
all even-indexed data points x(2n) directly. To obtain the odd-indexed data points, recall
the reordering used in the forward transform; all odd data points x(2n + 1) can be obtained
from the second half of the previous equation in reverse order.
In summary, we have these steps to compute the IDCT:
Step 1: Generate a sequence Y(k) from the given DCT sequence C(k).
Step 2: Obtain y(n) from Y(k) by inverse DFT, also using the FFT. (Only the real part need
be computed.)
Step 3: Obtain x(n) from y(n) by undoing the even/odd reordering.
These three steps are mathematically equivalent to the steps of the first method.
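A hedged numpy sketch of this more efficient inverse. It inverts the unnormalized DCT-II produced by the even/odd-reordering forward algorithm; the spectrum reconstruction in step 1 uses the even/odd symmetry of the real and imaginary parts noted above (names are illustrative):

```python
import numpy as np

def fast_idct(S):
    # Inverse of the unnormalized DCT-II, via one N-point inverse FFT (N even).
    N = len(S)
    # Step 1: rebuild the complex DFT values from the real DCT sequence, using
    # W(k) = S(k) - j*S(N-k) (conjugate symmetry of a real signal's DFT),
    # then undo the forward twiddle factor exp(-j*pi*k/(2N)).
    W = S.astype(complex)
    W[1:] -= 1j * S[:0:-1]
    V = np.exp(1j * np.pi * np.arange(N) / (2 * N)) * W
    # Step 2: inverse DFT; only the real part is meaningful.
    v = np.fft.ifft(V).real
    # Step 3: undo the even/odd reordering of the forward transform.
    x = np.empty(N)
    x[::2] = v[:N // 2]
    x[1::2] = v[:N // 2 - 1:-1]   # second half of v, in reverse order
    return x
```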
Data Compression
Although representing images in digital form allows visual information to be easily manipulated in
useful and novel ways, there is one potential problem with digital images—the large number of
bits required to represent even a single digital image directly. The need for image compression
becomes apparent when we compute the number of bits per image resulting from typical sampling
and quantization schemes. We consider the amount of storage for the “Lena” digital image shown
in Fig. 4.7.
The monochrome (grayscale) version of this image with a resolution 512 × 512 × 8 bits/pixel
requires a total of 2,097,152 bits, or equivalently 262,144 bytes. The color version of the same
image in RGB format (red, green, and blue color bands) with a resolution of 8 bits/color requires a
total of 6,291,456 bits (=512 × 512 ×3 × 8 bits/pixel), or 786,432 bytes. Such an image should be
compressed for efficient storage or transmission.
In order to utilize digital images effectively, specific techniques are needed to reduce the number
of bits required for their representation. Fortunately, digital images generally contain a significant
amount of redundancy (spatial, spectral, or temporal redundancy). Image data compression (the
art/science of efficient coding of the picture data) aims at taking advantage of this redundancy to
reduce the number of bits required to represent an image. This can result in significantly reducing
the memory needed for image storage and channel capacity for image transmission.
Image compression methods can be classified into two fundamental groups: lossless and lossy.
Lossless compression -
Reconstructed image after compression identical to the original image.
Modest 1:2 or 1:3 compression ratios are achieved.
Lossy compression -
Reconstructed image contains degradations relative to the original.
Generally, more compression is obtained at the expense of more distortion.
Transform Coding Compression Scheme:
The most widely used lossy compression technique is transform coding.
A general transform coding scheme involves subdividing an N ×N image into smaller
nonoverlapping n × n sub-image blocks and performing a unitary transform on each block. The
transform operation itself does not achieve any compression. It aims at decorrelating the original
data and compacting a large fraction of the signal energy into a relatively small set of transform
coefficients (energy packing property). In this way, many coefficients can be discarded after
quantization and prior to encoding.
In principle, the DCT introduces no loss to the source samples; it merely transforms them to a
domain in which they can be more efficiently encoded.
Most practical transform coding systems are based on DCTs of types II and III, which:
Provide a good compromise between energy packing ability and computational complexity.
The energy packing property of DCT is superior to that of any other unitary transform.
Transforms that redistribute or pack the most information into the fewest coefficients
provide the best sub-image approximations and, consequently, the smallest reconstruction
errors.
DCT basis images are fixed (image independent) as opposed to the optimal KLT which is
data dependent.
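The scheme described above can be caricatured in a few lines. A toy numpy sketch, in which thresholding to the largest-magnitude coefficients per block stands in for quantization and entropy coding (all names and parameters are illustrative, not from any standard):

```python
import numpy as np

def dct_matrix(N):
    # N-point orthonormal DCT-II matrix.
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    C[0, :] /= np.sqrt(2.0)
    return C

def toy_transform_coder(image, bs=8, keep=10):
    # Tile the image into bs x bs blocks, 2-D DCT each block, keep only the
    # `keep` largest-magnitude coefficients (a crude stand-in for quantization
    # plus entropy coding), then inverse transform each block.
    C = dct_matrix(bs)
    H, W = image.shape
    out = np.zeros_like(image, dtype=float)
    for i in range(0, H, bs):
        for j in range(0, W, bs):
            coeff = C @ image[i:i+bs, j:j+bs] @ C.T        # forward 2-D DCT
            thresh = np.sort(np.abs(coeff).ravel())[-keep]  # keep-th largest magnitude
            coeff[np.abs(coeff) < thresh] = 0.0             # discard small coefficients
            out[i:i+bs, j:j+bs] = C.T @ coeff @ C           # inverse 2-D DCT
    return out
```

Because the transform is orthonormal, keeping more of the largest-magnitude coefficients can only lower the squared reconstruction error, which is the energy-packing argument in miniature.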
E.g.: DCT-Based Image Compression/Decompression
Block diagram of encoder and decoder for JPEG DCT-based image compression and
decompression.