7/31/2019 Final Seminar Raju
Chapter 1
INTRODUCTION
Uncompressed multimedia (graphics, audio and video) data requires considerable
storage capacity and transmission bandwidth. Despite rapid progress in mass-storage density,
processor speeds, and digital communication system performance, demand for data storage
capacity and data-transmission bandwidth continues to outstrip the capabilities of available
technologies. The recent growth of data-intensive multimedia-based web applications has not only sustained the need for more efficient ways to encode signals and images but has also made compression of such signals central to storage and communication technology.
In the field of image processing, image compression is an active topic of research. Image compression plays a crucial role in many important and diverse applications, including video teleconferencing, remote sensing, document and medical imaging, and facsimile transmission.
1.1. Need For Compression
Image data is by its nature multidimensional and tends to take up a lot of space.
Pictures take up a lot of storage space (either disk or memory). A 1000×1000 picture with 24 bits per pixel takes up 3 megabytes. The Encyclopedia Britannica, scanned at 300 pixels per inch and 1 bit per pixel, requires 25,000 pages × 1,000,000 bytes per page = 25 gigabytes.
Video is even bulkier: a 90-minute movie at 640×480 spatial resolution, 24 bits per pixel and 24 frames per second requires 90×60×24×640×480×3 bytes ≈ 120 gigabytes.
Applications: HDTV, film, remote sensing and satellite image transmission, network communication, image storage, medical image processing, fax.
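The storage figures above follow from simple arithmetic; a quick sketch (decimal units, 3 bytes per 24-bit pixel):

```python
# Raw storage requirements from the examples above, in bytes (decimal units).
picture = 1000 * 1000 * 24 // 8        # 1000x1000 at 24 bpp -> 3,000,000 bytes (3 MB)
britannica = 25_000 * 1_000_000        # 25,000 pages at 1 MB/page -> 25 GB
movie = 90 * 60 * 24 * 640 * 480 * 3   # 90 min at 24 fps, 640x480, 3 bytes/pixel
print(picture, britannica, movie)      # movie is ~1.19e11 bytes, i.e. roughly 120 GB
```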
1.2. Principles Behind Compression
A common characteristic of most images is that the neighboring pixels are correlated
and therefore contain redundant information. The foremost task then is to find less correlated
representation of the image. Two fundamental components of compression are redundancy
and irrelevancy reduction. Redundancy reduction aims at removing duplication from the
signal source (image/video). Irrelevancy reduction omits parts of the signal that will not be
noticed by the signal receiver, namely the Human Visual System (HVS). In general, three
types of redundancy can be identified:
Spatial redundancy, or correlation between neighboring pixel values.
Spectral redundancy, or correlation between different color planes or spectral bands.
Temporal redundancy, or correlation between adjacent frames in a sequence of images (in video applications).
Image compression research aims at reducing the number of bits needed to represent an
image by removing the spatial and spectral redundancies as much as possible. Since we will
focus only on still image compression, we will not worry about temporal redundancy.
Different methods for redundancy reduction are:
Spatial redundancy: DCT, DWT, DPCM
Statistical redundancy: run-length coding, variable-length coding
1.3. Image Compression Model
A typical image compression model consists of a source encoder, which is responsible for reducing or eliminating any coding, interpixel or psychovisual redundancies in the input image; a channel, which is the transmission path; and a source decoder, which reconstructs the original image and whose function is the opposite of the source encoder's. Figure.1 shows the block diagram of the image compression model [1].
Figure.1 Image compression model [1]
Figure.2 (a) Source encoder (b) Source decoder[1]
The source encoder consists of three blocks. The first stage of the source encoding process, the mapper, transforms the input data into a format designed to reduce interpixel redundancies in the input image. This operation is generally reversible and may or may not directly reduce the amount of data required to represent the image.
The second stage, the quantizer block in figure.2(a), reduces the accuracy of the mapper's output in accordance with some pre-established fidelity criterion. This stage reduces the psychovisual redundancies of the input image. The operation is irreversible, so it must be omitted when error-free compression is desired.
In the third and final stage of the source encoding process, the symbol coder block in figure.2(a) creates a fixed- or variable-length code to represent the quantizer output and maps the output in accordance with the code.
The source decoder shown in figure.2(b) contains only two components, a symbol decoder and an inverse mapper. These blocks perform, in reverse order, the inverse operations of the source encoder's symbol encoder and mapper blocks.
The lossless and lossy methods are discussed separately in Chapters 2 and 3, respectively.
Chapter 2
LOSSLESS COMPRESSION METHODS
In numerous applications error-free compression is the only acceptable means of data
compression. One such application is the archival of medical or business documents, where
lossy compression usually is prohibited for legal reasons. Another is the processing of
satellite imagery, where both the use and cost of collecting the data make any loss undesirable.
Yet another is digital radiography, where the loss of information can compromise diagnostic
accuracy. In these and other cases, the need for error-free compression is motivated by the intended use or nature of the image under consideration. Lossless methods normally provide compression ratios of 2 to 10.
2.1. Run-Length Encoding
This method reduces only interpixel redundancy. The following example illustrates the run-length coding method [2].
Original Image
63 63 63 63 64 64 64 78 89 89 89 89
Compressed Image
63,4, 64,3, 78,1, 89,4
Code the number of pixels taking the same value along a given scan line.
Works particularly well on binary images, since only the length of each run needs to be encoded.
Works by exploiting scan-line coherence.
Bit-plane run-length encoding is used on non-binary images by considering each bit of the (say 8-bit) image one at a time.
Typical compression ratios are 1.5:1 (grey-scale/color images), 4:1 (binary images) and 2:1 (bit-plane compression on grey-scale/color images).
May cause a data explosion: the final file may be larger than the original one.
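The scan-line coding described above can be sketched in a few lines (the function names are illustrative, not from any standard library):

```python
def rle_encode(pixels):
    """Code each run along a scan line as a (value, run length) pair."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1            # extend the current run
        else:
            runs.append([p, 1])         # start a new run
    return [(value, length) for value, length in runs]

def rle_decode(runs):
    """Expand (value, run length) pairs back into the original scan line."""
    return [value for value, length in runs for _ in range(length)]

row = [63, 63, 63, 63, 64, 64, 64, 78, 89, 89, 89, 89]
encoded = rle_encode(row)
print(encoded)                          # [(63, 4), (64, 3), (78, 1), (89, 4)]
assert rle_decode(encoded) == row       # lossless round trip
```

Note the data-explosion risk mentioned above: a scan line with no repeated pixels doubles in size under this scheme.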
2.2.Huffman Coding [2],[3]
This is the most popular technique for removing coding redundancy.
Huffman coding works on the image brightness histogram: it finds the most commonly occurring brightness patterns and uses the shortest codes to represent these.
Typical compression ratios are 1.5:1 to 2:1. Huffman coding may also be used after run-length coding to give further compression.
An Example of Huffman coding:
Figure 2.1 illustrates the principles of Huffman coding. Assume that we wish to transmit the set of 28 data points [3].
{1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 7}
The set consists of seven distinct quantized levels, or symbols. For each symbol Si, we calculate its probability of occurrence Pi by dividing its frequency of occurrence by 28, the total number of data points. Consequently, the construction of a Huffman code for this set begins with seven nodes, one associated with each Pi. At each step we sort the Pi list in descending order, breaking ties arbitrarily. The two nodes with the smallest probabilities, Pi and Pj, are merged into a new node with probability Pi + Pj. This process continues until the probability list contains a single value, 1.0, as shown in Figure 2.1(a).
The process of merging nodes produces a binary tree as in Figure 2.1(b). The root of the
tree has probability 1.0. We obtain the Huffman code of the symbols by traversing down the
tree, assigning 1 to the left child and 0 to the right child. The resulting code words have the
prefix property. This property ensures that a coded message is uniquely decodable without
the need for look-ahead. Figure 2.1(c) summarizes the results and shows the Huffman codes for the seven symbols. We enter these code word mappings into a translation table and use the table to place the appropriate code word into the output bit stream during the reduction process.
The reduction ratio of Huffman coding depends on the distribution of the source symbols. In
our example, the original data requires three bits to represent the seven quantized levels.
After Huffman coding, we can calculate the expected code word length
    E[l] = Σ_{i=1}^{7} l_i P_i

where l_i represents the length of the Huffman code for symbol i. This value is 2.65 in our example, resulting in an expected reduction ratio of 3:2.65. The reconstruction process begins
at the root of the tree. If bit 1 is received, we traverse down the left branch, otherwise the
right branch. We continue traversing until we reach a node with no child. We then output the
symbol corresponding to this node and begin traversal from the root again. The
reconstruction process of Huffman coding perfectly recovers the original data. Therefore it is a lossless algorithm.
Figure 2.1. Illustration of Huffman coding. (a) At each step, the Pi are sorted in descending order and the two lowest Pi are merged. (b) Merging operation depicted in a binary tree. (c) Summary of Huffman coding for the data set [2].
However, a transmission error of a single bit may result in more than one decoding
error. This propagation of transmission error is a consequence of all algorithms that produce variable-length code words.
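The merge-and-sort procedure above can be sketched with a priority queue. The 0/1 branch labelling below is arbitrary, so the individual code words may differ from those in Figure 2.1, but any optimal Huffman code yields the same expected length, about 2.64 bits per symbol for this data set, in line with the roughly 2.65 quoted above:

```python
import heapq
from collections import Counter

def huffman_code(data):
    """Build a Huffman code (symbol -> bit string) by repeatedly merging
    the two lowest-probability nodes, as described above."""
    freq = Counter(data)
    # Heap entries: (frequency, tie-breaker, {symbol: code bits so far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # The later a merge happens, the closer to the root its bit sits,
        # so prepend one bit to every code in each merged subtree.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2], freq

data = [1]*7 + [2]*6 + [3]*5 + [4]*4 + [5]*3 + [6]*2 + [7]*1
code, freq = huffman_code(data)
expected_len = sum(freq[s] * len(code[s]) for s in freq) / len(data)
print(expected_len)  # ~2.64 bits/symbol, versus 3 bits for a fixed-length code
```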
2.3. Arithmetic Coding
Arithmetic coding is a lossless coding method which does not suffer from the aforementioned drawbacks and which tends to achieve a higher compression ratio than Huffman
coding. However, Huffman coding can generally be realized with simpler software and
hardware.
The basic idea behind arithmetic coding is to map the input sequence of symbols into
one single codeword. Symbol blocking is not needed since the codeword can be determined
and updated incrementally as each new symbol is input (symbol-by-symbol coding). At any
time, the determined codeword uniquely represents all the past occurring symbols. Although
the final codeword is represented using an integral number of bits, the resulting average
number of bits per symbol is obtained by dividing the length of the codeword by the number
of encoded symbols [2],[3].
Arithmetic Coding Algorithm:
1) Divide the interval [0,1] into segments corresponding to the M symbols; the segment of
each symbol has a length proportional to its probability.
2) Choose the segment of the first symbol in the string message.
3) Divide the segment of this symbol again into M new segments with lengths proportional to the symbols' probabilities.
4) From these new segments, choose the one corresponding to the next symbol in the
message.
5) Continue steps 3) and 4) until the whole message is coded.
6) Represent the segment's value by a binary fraction.
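Steps 1–5 can be sketched by narrowing [0, 1) symbol by symbol; the alphabet, probabilities and function name here are illustrative assumptions:

```python
def arithmetic_interval(message, probs):
    """Apply steps 1-5 above: repeatedly subdivide the current segment in
    proportion to the symbol probabilities, keeping the chosen symbol's part."""
    symbols = sorted(probs)             # fixed segment order within [0, 1)
    low, high = 0.0, 1.0
    for s in message:
        span = high - low
        start = low
        for sym in symbols:             # walk the segments until we reach s
            width = span * probs[sym]
            if sym == s:
                low, high = start, start + width
                break
            start += width
    return low, high

probs = {"a": 0.5, "b": 0.3, "c": 0.2}
low, high = arithmetic_interval("abc", probs)
print(low, high)    # ~[0.37, 0.40): any binary fraction in this range codes "abc"
```

Step 6 would then emit the shortest binary fraction lying inside the final interval.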
2.4. Lempel-Ziv Coding
Huffman coding and arithmetic coding require a priori knowledge of the source symbol
probabilities or of the source statistical model. In some cases, a sufficiently accurate source
model is difficult to obtain, especially when several types of data (such as text, graphics, and
natural pictures) are intermixed [4], [5].
Dictionary-based coders dynamically build a coding table (called dictionary) of variable-
length symbol strings as they occur in the input data. As the coding table is constructed,
fixed-length binary code words are assigned to the variable length input symbol strings by
indexing into the coding table. In Lempel-Ziv (LZ) coding, the decoder can also dynamically
reconstruct the coding table and the input sequence as the code bits are received without any
significant decoding delays. Although LZ codes do not explicitly make use of the source
probability distribution, they asymptotically approach the source entropy rate for very long
sequences. Because of their adaptive nature, dictionary-based codes are ineffective for short
input sequences since these codes initially result in a lot of bits being output. So, short input
sequences can result in data expansion instead of compression.
Let S be the source alphabet consisting of N symbols Sk (1 ≤ k ≤ N). The basic steps of
the LZW algorithm can be stated as follows [4]:
1. Initialize the first N entries of the dictionary with the individual source symbols of S,
2. Parse the input sequence and find the longest input string of successive symbols w (including the first still unencoded symbol s in the sequence) that has a matching entry in the dictionary.
3. Encode w by outputting the index (address) of the matching entry as the codeword for w.
4. Add to the dictionary the string ws formed by concatenating w and the next input symbol s.
5. Repeat from step 2 for the remaining input symbols starting with the symbol s, until the
entire input sequence is encoded.
Consider the source alphabet S = {S1, S2, S3, S4}. The encoding procedure is illustrated for the input sequence S1 S2 S1 S2 S3 S2 S1 S2. The constructed dictionary is shown in Table 2.1.
Table.2.1: Dictionary constructed while encoding the sequence S1 S2 S1 S2 S3 S2 S1 S2,
which is emitted by a source with alphabet S = {S1, S2, S3, S4}[4].
The resulting code is given by the fixed-length binary representation of the following
sequence of dictionary addresses: 1 2 5 3 6 2. The length of the generated binary code words depends on the maximum allowed dictionary size. If the maximum dictionary size is M entries, the length of the code words is log2(M) rounded up to the nearest integer.
The decoder constructs the same dictionary as the code words are received. The basic
decoding steps can be described as follows:
1. Start with the same initial dictionary as the encoder. Also, initialize w to be the empty
string.
2. Get the next "codeword" and decode it by outputting the symbol string m stored at address "codeword" in the dictionary.
3. Add to the dictionary the string ws formed by concatenating the previous decoded string w
(if any) and the first symbol s of the current decoded string.
4. Set w = m and repeat from step 2 until all the code words are decoded.
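The encoding and decoding steps above can be sketched as follows; replaying the worked example reproduces the address sequence 1 2 5 3 6 2 from Table 2.1 (symbols are represented as strings, and dictionary addresses are 1-based as in the text):

```python
def lzw_encode(seq, alphabet):
    """LZW encoding (steps 1-5 above); returns 1-based dictionary addresses."""
    dictionary = {(s,): i + 1 for i, s in enumerate(alphabet)}
    out, w = [], ()
    for s in seq:
        if w + (s,) in dictionary:
            w = w + (s,)                      # keep growing the matched string
        else:
            out.append(dictionary[w])         # emit the address of w
            dictionary[w + (s,)] = len(dictionary) + 1
            w = (s,)
    if w:
        out.append(dictionary[w])
    return out

def lzw_decode(codes, alphabet):
    """Rebuild the same dictionary while decoding the address stream."""
    dictionary = {i + 1: (s,) for i, s in enumerate(alphabet)}
    out, w = [], ()
    for c in codes:
        m = dictionary.get(c, w + w[:1])      # c may be the entry just being built
        out.extend(m)
        if w:
            dictionary[len(dictionary) + 1] = w + m[:1]
        w = m
    return out

seq = ["S1", "S2", "S1", "S2", "S3", "S2", "S1", "S2"]
codes = lzw_encode(seq, ["S1", "S2", "S3", "S4"])
print(codes)                                  # [1, 2, 5, 3, 6, 2]
assert lzw_decode(codes, ["S1", "S2", "S3", "S4"]) == seq
```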
2.5. Predictive Coding [1],[5]
Original Image
63 63 63 63 64 64 64 78 89 89 89 89
Compressed Image
63,0,0,0,1,0,0,14,11,0,0,0
Stores the difference between successive pixels' brightness values in fewer bits.
Relies on the image having smooth changes in brightness: at sharp changes in the image we need overload patterns.
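A sketch of this differencing scheme on the row above (ignoring the packing of the small differences into fewer bits):

```python
def predictive_encode(pixels):
    """Keep the first pixel, then the difference from each previous pixel."""
    return [pixels[0]] + [b - a for a, b in zip(pixels, pixels[1:])]

def predictive_decode(diffs):
    """Undo the differencing by a running sum."""
    out = [diffs[0]]
    for d in diffs[1:]:
        out.append(out[-1] + d)
    return out

row = [63, 63, 63, 63, 64, 64, 64, 78, 89, 89, 89, 89]
print(predictive_encode(row))   # [63, 0, 0, 0, 1, 0, 0, 14, 11, 0, 0, 0]
assert predictive_decode(predictive_encode(row)) == row
```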
2.6. FELICS (Fast, Efficient, and Lossless Image Compression System)
It is a special-purpose compression method designed for greyscale images and it
competes with the lossless mode of JPEG. It is fast and it generally produces good
compression. However, it cannot compress an image to below one bit per pixel, so it is not a
good choice for bi-level or for highly redundant images [2].
The principle of FELICS is to code each pixel with a variable-size code based on the
values of two of its previously seen neighbour pixels. Figure 2.2(a) shows the two known neighbours A and B of some pixel P. For a general pixel, these are the neighbours above it and to its left. For a pixel in the top row, these are its two left neighbours (except for the first two pixels of the image). For a pixel in the leftmost column, these are the first two pixels of the line above it. Notice that the first two pixels of the image don't have any previously seen
neighbours, but since there are only two of them, they can be output without any encoding,
causing just a slight degradation in the overall compression.
Consider the two neighbours A and B of a pixel P. We use A, B, and P to denote both
the three pixels and their intensities (greyscale values). We denote by L and H the neighbours
with the smaller and the larger intensities, respectively. Pixel P should be assigned a variable-
size code depending on where the intensity P is located relative to L and H. There are three
cases:
1. The intensity of pixel P is between L and H (it is located in the central region of
Figure.2.2.(b)). This case is known experimentally to occur in about half the pixels, and P is
assigned, in this case, a code that starts with 0. The probability that P will be in this central
region is almost, but not completely, flat, so P should be assigned a binary code that has
about the same size in the entire region but is slightly shorter at the centre of the region.
2. The intensity of P is lower than L (P is in the left region). The code assigned to P in this
case starts with 10.
Figure 2.2 (a) The two neighbours. (b) The three regions [2].
Table 2.2. The codes for the central region [2].

Pixel P    Region code    Pixel code
L = 15     0              0000
16         0              0010
17         0              010
18         0              011
19         0              100
20         0              101
21         0              110
22         0              111
23         0              0001
H = 24     0              0011
3. P's intensity is greater than H. P is assigned a code that starts with 11.
When pixel P is in one of the outer regions, the probability that its intensity will differ from L or H by much is small, so P can be assigned a long code in these cases. The code assigned to P should therefore depend heavily on whether P is in the central region or in one of the outer regions.
Here is how the code is assigned when P is in the central region. We need H − L + 1 variable-size codes that will not differ much in size and will, of course, satisfy the prefix property. We set k = ⌊log2(H − L + 1)⌋ and compute integers a and b by

    a = 2^(k+1) − (H − L + 1),    b = 2(H − L + 1 − 2^k).

Example: If H − L = 9, then k = 3, a = 2^(3+1) − (9 + 1) = 6, and b = 2(9 + 1 − 2^3) = 4. We now select the a codes 2^k − 1, 2^k − 2, . . . expressed as k-bit numbers, and the b codes 0, 1, 2, . . . expressed as (k + 1)-bit numbers. In the example above, the a codes are 8 − 1 = 111, 8 − 2 = 110, through 8 − 6 = 010, and the b codes are 0000, 0001, 0010, and 0011. Table 2.2 shows how ten such codes can be assigned in the case L = 15, H = 24.
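The k, a, b computation above is easy to check in code; for L = 15, H = 24 it reproduces the six 3-bit and four 4-bit codes of Table 2.2 (the function name is illustrative):

```python
import math

def central_region_params(L, H):
    """Number of k-bit codes (a) and (k+1)-bit codes (b) for the central region."""
    d = H - L + 1                       # number of intensity values in [L, H]
    k = int(math.floor(math.log2(d)))
    a = 2 ** (k + 1) - d                # count of the shorter, k-bit codes
    b = 2 * (d - 2 ** k)                # count of the longer, (k+1)-bit codes
    return k, a, b

k, a, b = central_region_params(15, 24)
print(k, a, b)              # 3 6 4
assert a + b == 24 - 15 + 1  # every value in [L, H] gets exactly one code
```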
Chapter 3
LOSSY COMPRESSION
The compression achieved via lossless schemes is often inadequate to cope with the
volume of image data involved. Thus, lossy schemes (also called irreversible) have to be
employed, which aim at obtaining a more compact representation of the image at the cost of
some data loss, which however might not correspond to an equal amount of information loss.
In other words, although the original image cannot be fully reconstructed, the degradation that it has undergone is not visible to a human observer for the purposes of the specific task.
Compression ratios achieved through lossy compression range from 4:1 to 100:1 or even
higher.
3.1. Performance Evaluation Parameters
To compare different lossy compression algorithms, several approaches to measuring the loss of quality have been devised. In the medical imaging (MI) context, where the ultimate use of an image is its visual assessment and interpretation, subjective and diagnostic evaluation approaches are
the most appropriate. However, these are largely dependent on the specific task at hand and
moreover they entail costly and time-consuming procedures. In spite of the fact that they are
often inadequate in predicting the visual (perceptual) quality of the decompressed image,
objective measures are often used since they are easy to compute and are applicable to all
kinds of images regardless of the application.
Compression ratio is defined as the nominal bit depth of the original image in bits per
pixel (bpp) divided by the bpp necessary to store the compressed image. For each compressed
and reconstructed image, an error image is calculated. From the error data, the maximum absolute error (MAE), mean square error (MSE), root mean square error (RMSE), signal-to-noise ratio (SNR), and peak signal-to-noise ratio (PSNR) are calculated [7],[8].
The maximum absolute error (MAE) is calculated as

    MAE = max |f(x, y) − f*(x, y)|

where f(x, y) is the original image data and f*(x, y) is the compressed image value. The formulae for the calculated image metrics are:

    MSE = (1 / NM) Σ_{x=0}^{N−1} Σ_{y=0}^{M−1} [f(x, y) − f*(x, y)]²

    RMSE = √MSE

where M and N are the matrix dimensions in x and y, respectively.

    SNR = 10 log10 [ Σ_{x=0}^{N−1} Σ_{y=0}^{M−1} f(x, y)² / Σ_{x=0}^{N−1} Σ_{y=0}^{M−1} (f(x, y) − f*(x, y))² ]

    PSNR = 20 log10 (255 / RMSE)
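These definitions translate directly into code. The sketch below flattens the images to plain lists and assumes 8-bit data (hence the 255 in the PSNR); it does not guard against identical images, where the RMSE is zero:

```python
import math

def quality_metrics(f, g):
    """MAE, MSE, RMSE, SNR and PSNR between original f and reconstruction g,
    both given as flat lists of pixel values."""
    err2 = sum((a - b) ** 2 for a, b in zip(f, g))
    mae = max(abs(a - b) for a, b in zip(f, g))
    mse = err2 / len(f)
    rmse = math.sqrt(mse)
    snr = 10 * math.log10(sum(a * a for a in f) / err2)
    psnr = 20 * math.log10(255 / rmse)   # 255 = peak value for 8-bit images
    return mae, mse, rmse, snr, psnr

mae, mse, rmse, snr, psnr = quality_metrics([52, 55, 61, 66], [52, 54, 61, 67])
print(mae, mse, round(psnr, 2))          # 1 0.5 51.14
```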
3.2. Transform Coding
In transform coding, a block of correlated pixels is transformed into a set of less
correlated coefficients. The transform to be used for data compression should satisfy two
objectives. Firstly, it should provide energy compaction: the energy in the transform coefficients should be concentrated in as few coefficients as possible. Secondly, it should minimize the statistical correlation between the transform coefficients. As a consequence, transform coding has a good capability for data compression, because not all transform coefficients need to be transmitted in order to obtain good image quality, and even those that are transmitted need not be represented with full accuracy. In addition, the
transform domain coefficients are generally related to the spatial frequencies in the image and
hence the compression techniques can exploit the psycho-visual properties of the HVS, by
quantizing the higher frequency coefficients more coarsely, as the HVS is more sensitive to
the lower frequency coefficients [2].
3.2.1. The Discrete Cosine Transform
The important feature of the DCT is that it takes correlated input data and concentrates
its energy in just the first few transform coefficients. If the input data consists of correlated
quantities, then most of the n transform coefficients produced by the DCT are zeros or small
numbers, and only a few are large (normally the first ones). The early coefficients contain the
important (low-frequency) image information and the later coefficients contain the less-
important (high-frequency) image information. Compressing data with the DCT is therefore
done by quantizing the coefficients. The small ones are quantized coarsely (possibly all the
way to zero), and the large ones can be quantized finely to the nearest integer. After quantization, the coefficients (or the variable-size codes assigned to them) are written on the compressed stream. Decompression is done by performing the inverse DCT on the quantized coefficients. This results in data items that are not identical to the original ones but are not much different.
The DCT is applied to small parts (data blocks) of the image. It is computed by applying
the DCT in one dimension to each row of a data block, then to each column of the result
[2],[7],[8],[9],[11],[12]. Because of the special way the DCT in two dimensions is computed,
we say that it is separable in the two dimensions. Because it is applied to blocks of an image,
we term it a blocked transform. It is defined by

    G_ij = (2 / √(mn)) C_i C_j Σ_{x=0}^{n−1} Σ_{y=0}^{m−1} p_xy cos[(2y + 1)jπ / 2m] cos[(2x + 1)iπ / 2n]

for 0 ≤ i ≤ n − 1 and 0 ≤ j ≤ m − 1, and for C_i and C_j defined below. The first coefficient G_00 is termed the DC coefficient, and the remaining coefficients are called the AC coefficients. The image is broken up into blocks of n×m pixels p_xy (with n = m = 8 typically), and the equation above is used to produce a block of n×m DCT coefficients G_ij for each block of pixels. The coefficients are then quantized, which results in lossy but highly efficient compression. The decoder reconstructs a block of quantized data values by computing the IDCT, whose definition is

    p_xy = (2 / √(mn)) Σ_{i=0}^{n−1} Σ_{j=0}^{m−1} C_i C_j G_ij cos[(2x + 1)iπ / 2n] cos[(2y + 1)jπ / 2m]

where

    C_f = 1/√2 for f = 0,    C_f = 1 for f > 0,

for 0 ≤ x ≤ n − 1 and 0 ≤ y ≤ m − 1.
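A direct (unoptimized) implementation of the G_ij definition above; feeding it a uniform block shows the energy-compaction property, with all the energy landing in the DC coefficient. A separable row/column implementation would be faster but gives the same result:

```python
import math

def dct2(block):
    """2-D DCT of an n x m block, computed straight from the definition above."""
    n, m = len(block), len(block[0])
    C = lambda f: 1 / math.sqrt(2) if f == 0 else 1.0
    G = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * i * math.pi / (2 * n))
                    * math.cos((2 * y + 1) * j * math.pi / (2 * m))
                    for x in range(n) for y in range(m))
            G[i][j] = (2 / math.sqrt(n * m)) * C(i) * C(j) * s
    return G

# A flat 8x8 block: all of its energy compacts into the DC coefficient G[0][0].
G = dct2([[100] * 8 for _ in range(8)])
print(round(G[0][0]))   # 800; every AC coefficient is ~0
```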
Steps involved in the DCT image compression technique:
1. The image is divided into k blocks of 8×8 pixels each. The pixels are denoted by p_xy. If the number of image rows (columns) is not divisible by 8, the bottom row (rightmost column) is duplicated as many times as needed.
2. The DCT in two dimensions is applied to each block B_i. The result is a block (we'll call it a vector) W^(i) of 64 transform coefficients w_j^(i) (where j = 0, 1, . . . , 63). The k vectors W^(i) become the rows of matrix W:

        | w_0^(1)  w_1^(1)  . . .  w_63^(1) |
        | w_0^(2)  w_1^(2)  . . .  w_63^(2) |
    W = |    .        .                .    |
        |    .        .                .    |
        | w_0^(k)  w_1^(k)  . . .  w_63^(k) |

3. The 64 columns of W are denoted by C^(0), C^(1), . . . , C^(63). The k elements of C^(j) are w_j^(1), w_j^(2), . . . , w_j^(k). The first coefficient vector C^(0) consists of the k DC coefficients.
4. Each vector C^(j) is quantized separately to produce a vector Q^(j) of quantized coefficients. The elements of Q^(j) are then written on the compressed stream. In practice, variable-size codes are assigned to the elements, and the codes, rather than the elements themselves, are written on the compressed stream.
3.2.1.1 JPEG
Most high-quality algorithms today use some form of transform coder. One widely used
standard is the JPEG compression algorithm, based on the discrete cosine transform (DCT).
The image is partitioned into 8×8 blocks, each of which is then transformed via a tensor product of two 8-point DCTs. The transform coefficients are then arranged into 64 subbands,
scalar-quantized, and adaptively Huffman coded [7]. The JPEG algorithm is discussed in
detail in the next chapter.
3.2.2. Wavelets
Wavelets are functions defined over a finite interval and having an average value of
zero. The basic idea of the wavelet transform is to represent any arbitrary function as a
superposition of a set of such wavelets or basis functions. These basis functions or baby
wavelets are obtained from a single prototype wavelet called the mother wavelet, by dilations
or contractions (scaling) and translations (shifts).
Wavelet methods involve overlapping transforms with varying-length basis functions.
The overlapping nature of the transform (each pixel contributes to several output points)
alleviates blocking artifacts, while the multiresolution character of the wavelet decomposition
leads to superior energy compaction and perceptual quality of the decompressed image.
Furthermore, the multiresolution transform domain means that wavelet compression methods
degrade much more gracefully than block-DCT methods as the compression ratio increases.
One wavelet algorithm, the embedded zerotree wavelet (EZW) coder, yields acceptable
compression at a ratio of 100:1. Wavelet coding schemes are especially suitable for
applications where scalability and tolerable degradation are important[7],[8],[9],[12],[13].
The wavelet transform by itself achieves no compression: it decomposes the image into different frequency bands, and the actual compression is done by quantisation and entropy coding. There are many ways of decomposing an image, depending on the wavelet method used, each involving a different algorithm and resulting in subbands with different energy compactions. Some methods are given below [2].
1. Line: This technique is a simpler version of the standard wavelet decomposition. The
wavelet transform is applied to each row of the image, resulting in smooth coefficients on the
left (subband L1) and detail coefficients on the right (subband H1). Subband L1 is then partitioned into L2 and H2, and the process is repeated until the entire coefficient matrix is
turned into detail coefficients, except the leftmost column, which contains smooth
coefficients. The wavelet transform is then applied recursively to the leftmost column,
resulting in one smooth coefficient at the top-left corner of the coefficient matrix. This last
step may be omitted if the compression method being used requires that image rows be
individually compressed.
This technique exploits correlations only within an image row to calculate the
transform coefficients. Also, discarding a coefficient that is located on the leftmost column
may affect just a particular group of rows and may this way introduce artifacts into the
reconstructed image. Implementation of this method is simple, and execution is fast, about
twice that of the standard decomposition. This type of decomposition is illustrated in Fig.3.1.
It is possible to apply this decomposition to the columns of the image instead of the rows. Ideally, the transform should be applied in the direction of highest image redundancy, and
experience suggests that for natural images this is the horizontal direction. Thus, in practice,
line decomposition is applied to the image rows.
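One level of this row-wise decomposition can be sketched with the simple Haar filter (pairwise averages and differences); practical coders use longer wavelet filters, but the L1 | H1 layout is the same:

```python
def haar_row_step(row):
    """One level of line decomposition of a single row: smooth (average)
    coefficients on the left (L1), detail (difference) coefficients on the
    right (H1). Assumes an even-length row."""
    smooth = [(row[i] + row[i + 1]) / 2 for i in range(0, len(row), 2)]
    detail = [(row[i] - row[i + 1]) / 2 for i in range(0, len(row), 2)]
    return smooth + detail

row = [63, 63, 64, 64, 78, 78, 89, 89]
print(haar_row_step(row))   # [63.0, 64.0, 78.0, 89.0, 0.0, 0.0, 0.0, 0.0]
```

Repeating the step on the smooth part (L1 → L2, H2, and so on) yields the full line decomposition.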
2. Quincunx: Somewhat similar to the Laplacian pyramid, quincunx decomposition proceeds level by level and decomposes subband Li of level i into subbands Hi+1 and Li+1 of level i + 1. Figure 3.2 illustrates this type of decomposition. It is efficient and computationally simple. On
average, it achieves more than four times the energy compaction of the line method.
Quincunx decomposition results in fewer subbands than most other wavelet
decompositions, a feature that may lead to reconstructed images with slightly lower visual
quality. The method is not used much in practice.
3. Pyramid: The pyramid decomposition is by far the most common method used to
decompose images that are wavelet transformed. It results in subbands with horizontal,
vertical, and diagonal image details, as illustrated by Figure 3.3. The three subbands at each
level contain horizontal, vertical, and diagonal image features at a particular scale, and each
scale is divided by an octave in spatial frequency (division of the frequency by two).
Figure 3.1: Line wavelet decomposition[2]
Figure 3.2: Quincunx wavelet decomposition[2]
Figure 3.3: Pyramid wavelet decomposition[2].
Pyramid decomposition turns out to be a very efficient way of transferring significant visual
data to the detail coefficients. Its computational complexity is about 30% higher than that of the quincunx method, but its image reconstruction abilities are higher. The reasons for the popularity of the pyramid method may be that (1) it is symmetrical, and (2) its mathematical description is simple.
The quincunx method leaves the high-frequency subband untouched, whereas the pyramid method resolves it into two bands. On the other hand, pyramid decomposition involves more
computations in order to spatially resolve the asymmetric high-frequency band into two
symmetric high frequency and low-frequency bands.
4. Standard: The first step in the standard decomposition is to apply whatever discrete wavelet filter is being used to all the rows of the image, obtaining subbands L1 and H1. This is repeated on L1 to obtain L2 and H2, and so on, k times. This is followed by a second step where a similar calculation is applied k times to the columns. If k = 1, the decomposition alternates between rows and columns, but k may be greater than 1. The end result is one smooth coefficient at the top-left corner of the coefficient matrix. This method is somewhat similar to line decomposition.
Standard decomposition has the second-highest reconstruction quality of all the methods
described here. The reason for the improvement compared to the pyramid decomposition may
be that the higher directional resolution gives thresholding a better chance to cover larger
uniform areas. On the other hand, standard decomposition is computationally more expensive
than pyramid decomposition.
5. Adaptive Wavelet Packet Decomposition: The idea is to skip those subbands splits that do
not contribute significantly to energy compaction. The result is a coefficient matrix with
subbands of different (possibly even many) sizes. The justification for this complex
decomposition method is the prevalence of continuous tone (natural) images. These images
are mostly smooth but normally also have some regions with high frequency data. Such
regions should end up as many small subbands (to better enable an accurate spatial frequency
representation of the image), with the rest of the image giving rise to a few large subbands.
The difficulty with this type of decomposition lies in finding an algorithm that determines
which subband splits can be skipped. Such an algorithm uses entropy calculations and should
be efficient: it should identify as many of the unnecessary splits as possible.
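A minimal sketch of such a split decision, assuming a simple normalized-energy entropy as the cost measure (real coders use a variety of cost functions); the split is kept only when it lowers the total cost:

```python
import math

def cost(band):
    """Energy-entropy cost of a subband: lower means more compact."""
    e = sum(x * x for x in band)
    if e == 0:
        return 0.0
    return -sum((x * x / e) * math.log2(x * x / e) for x in band if x != 0)

def should_split(parent, children):
    """Split a subband only if the children are cheaper in total."""
    return sum(cost(c) for c in children) < cost(parent)

# A smooth band gains nothing from splitting; a band mixing a few large
# coefficients with zeros does.
print(should_split([1.0, 1.0, 1.0, 1.0], [[1.0, 1.0], [1.0, 1.0]]))   # False
print(should_split([5.0, 0.0, 0.0, 3.0], [[5.0, 0.0], [0.0, 3.0]]))   # True
```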
This type of decomposition has the highest reproduction quality of all the methods
discussed here, a feature that may justify the high computational costs in certain special
applications. This quality, however, is not much higher than what is achieved with simpler
decomposition methods, such as standard, pyramid, or quincunx.
Figure 3.4: Standard wavelet decomposition[2].
The quantization and encoding steps are the crucial aspects of wavelet transform
compression, because they are where the actual compression takes place. Each of the
algorithms described below takes a different approach to these steps[2].
1. SPIHT (set partitioning in hierarchical trees) algorithm
2. EZW (embedded zerotree wavelet) algorithm
3.2.2.1 SPIHT (Set Partitioning In Hierarchical Trees) Algorithm
Regardless of the particular filter used, the image is decomposed into subbands, such
that lower subbands correspond to higher image frequencies (they are the highpass levels)
and higher subbands correspond to lower image frequencies (lowpass levels), where most of
the image energy is concentrated (Figure 3.5). This is why we can expect the detail
coefficients to get smaller as we move from high to low levels. Also, there are spatial
similarities among the subbands. An image part, such as an edge, occupies the same spatial
position in each subband. These features of the wavelet decomposition are exploited by the
SPIHT method.
SPIHT was designed for optimal progressive transmission, as well as for compression.
One of the important features of SPIHT (perhaps a unique feature) is that at any point during
the decoding of an image, the quality of the displayed image is the best that can be achieved
for the number of bits input by the decoder up to that moment.
Another important SPIHT feature is its use of embedded coding. This feature is defined
as follows: If an encoder produces two files, a large one of size M and a small one of size m,
then the smaller file is identical to the first m bits of the larger file. The following example
aptly illustrates the meaning of this definition. Suppose that three users wait for you to send
them a certain compressed image, but they need different image qualities. The first one needs
the quality contained in a 10 Kb file. The image qualities required by the second and third
users are contained in files of sizes 20 Kb and 50 Kb, respectively. Most lossy image
compression methods would have to compress the same image three times, at different
qualities, to generate three files with the right sizes. SPIHT, on the other hand, produces one
file, and then three chunks of lengths 10 Kb, 20 Kb, and 50 Kb, all starting at the beginning
of that file can be sent to the three users, thereby satisfying their needs.
Another principle is based on the observation that the most significant bits of a binary
integer whose value is close to maximum tend to be ones. This suggests that the most
significant bits contain the most important image information, and that they should be sent to
the decoder first (or written first on the compressed stream). The progressive transmission
method used by SPIHT incorporates these two principles.
Figure 3.5: Subbands and levels in wavelet decomposition[13]
The main steps of the SPIHT encoder are as follows [13],[15],[16]:
Step 1: Given an image to be compressed, perform its wavelet transform using any suitable
wavelet filter, decompose it into transform coefficients ci,j, and represent the resulting
coefficients with a fixed number of bits. Set n = ⌊log2 max(i,j) |ci,j|⌋.
Step 2: Sorting pass: Transmit the number l of coefficients ci,j that satisfy
2^n ≤ |ci,j| < 2^(n+1). Follow with the l pairs of coordinates and the l sign bits of those
coefficients.
Step 3: Refinement pass: Transmit the nth most significant bit of all the coefficients
satisfying |ci,j| ≥ 2^(n+1). These are the coefficients that were selected in previous sorting
passes.
Step 4: Iterate: Decrement n by 1. If more iterations are needed go back to Step 2.
The last iteration is normally performed for n = 0, but the encoder can stop earlier, in
which case the least important image information (some of the least significant bits of all the
wavelet coefficients) will not be transmitted. This is the natural lossy option of SPIHT. It is
equivalent to scalar quantization, but it produces better results than what is usually achieved
with scalar quantization, since the coefficients are transmitted in sorted order.
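The choice of n in Step 1 and the per-pass selection of Step 2 can be sketched as follows, using a few illustrative coefficient values keyed by coordinate:

```python
import math

# Illustrative wavelet coefficients keyed by (row, col).
coeffs = {(0, 0): 63, (0, 1): -34, (1, 0): 49, (1, 1): 10}

# Step 1: n = floor(log2 of the largest magnitude).
n = int(math.floor(math.log2(max(abs(c) for c in coeffs.values()))))
print(n)   # 5, since 2^5 <= 63 < 2^6

# One sorting pass per bit plane: pick coefficients with 2^n <= |c| < 2^(n+1).
passes = []
while n >= 0:
    sig = sorted(k for k, c in coeffs.items()
                 if 2 ** n <= abs(c) < 2 ** (n + 1))
    passes.append((n, sig))
    n -= 1
```

The first pass (n = 5) selects the coefficients 63, −34, and 49; the coefficient 10 is not selected until the pass with n = 3.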
Partitioning Sorting Algorithm:
The algorithm used by SPIHT is based on the realization that there is really no need to
sort all the coefficients. The main task of the sorting pass in each iteration is to select those
coefficients that satisfy 2^n ≤ |ci,j| < 2^(n+1). This task is divided into two parts. For a given
value of n, if a coefficient ci,j satisfies |ci,j| ≥ 2^n, then we say that it is significant;
otherwise, it is called insignificant. In the first iteration, relatively few coefficients will be
significant, but their number increases from iteration to iteration, because n keeps getting
decremented. The sorting pass has to determine which of the significant coefficients satisfies
|ci,j| < 2^(n+1) and transmit their coordinates to the decoder. This is an important part of the
algorithm used by SPIHT. The encoder partitions all the coefficients into a number of sets Tk
and performs the
significance test

    max(i,j)∈Tk |ci,j| ≥ 2^n

on each set Tk. The result may be either no (all the coefficients in Tk are insignificant, so Tk
itself is considered insignificant) or yes (some coefficients in Tk are significant, so Tk itself
is significant). This result is transmitted to the decoder. If the result is yes, then Tk is
partitioned by both encoder and decoder, using the same rule, into subsets and the same
significance test is performed on all the subsets. This partitioning is repeated until all the
significant sets are reduced to size 1 (i.e., they contain one coefficient each, and that
coefficient is significant). This is how the significant coefficients are identified by the sorting
pass in each iteration. The significance test performed on a set T can be summarized by

    Sn(T) = 1, if max(i,j)∈T |ci,j| ≥ 2^n
            0, otherwise

Spatial Orientation Trees:
The sets Tk are created and partitioned using a special data structure called a spatial
orientation tree. The spatial orientation trees are illustrated in Figure 3.6(a),(b) for a 16×16
image. The figure shows two levels, level 1 (the high pass) and level 2 (the low pass). Each
level is divided into four subbands. Subband LL2 (the low pass subband) is divided into four
groups of 2×2 coefficients each. Figure 3.6(a) shows the top-left group, and Figure 3.6(b)
shows the bottom-right group. In each group, each of the four coefficients (except the top-left
one, marked in gray) becomes the root of a spatial orientation tree. The arrows show
examples of how the various levels of these trees are related. The thick arrows indicate how
each group of 4×4 coefficients in level 2 is the parent of four such groups in level 1. In
general, a coefficient at location (i, j) in the image is the parent of the four coefficients at
locations (2i, 2j), (2i + 1, 2j), (2i, 2j + 1), and (2i + 1, 2j + 1).
The set partitioning sorting algorithm uses the following four sets of coordinates:
1. O(i, j): the set of coordinates of the four offspring of node (i, j). If node (i, j) is a leaf of a
spatial orientation tree, then O(i, j) is empty.
2. D(i, j): the set of coordinates of the descendants of node (i, j).
3. H: the set of coordinates of the roots of all the spatial orientation trees (3/4 of the
wavelet coefficients in the highest LL subband).
4. L(i, j): the difference set D(i, j) − O(i, j). This set contains all the descendants of tree node
(i, j), except its four offspring.
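The parent-offspring relation and the sets O, D, and L can be sketched directly from the (2i, 2j) rule above; `size` is the image side length, and a leaf is detected when its children would fall outside the image:

```python
def offspring(i, j, size):
    """O(i, j): the four children of (i, j), or [] for a leaf."""
    if 2 * i >= size or 2 * j >= size:
        return []
    return [(2 * i, 2 * j), (2 * i + 1, 2 * j),
            (2 * i, 2 * j + 1), (2 * i + 1, 2 * j + 1)]

def descendants(i, j, size):
    """D(i, j): all descendants of (i, j), gathered level by level."""
    out, frontier = [], offspring(i, j, size)
    while frontier:
        out.extend(frontier)
        frontier = [c for p in frontier for c in offspring(p[0], p[1], size)]
    return out

def l_set(i, j, size):
    """L(i, j) = D(i, j) - O(i, j): descendants minus the four offspring."""
    return descendants(i, j, size)[4:]

print(offspring(1, 1, 8))          # the four children of node (1, 1)
print(len(descendants(1, 1, 8)))   # 4 children + 16 grandchildren = 20
```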
Figure 3.6: Spatial orientation trees in SPIHT[13]
The spatial orientation trees are used to create and partition the sets Tk. The set partitioning
rules are as follows:
1. The initial sets are {(i, j)} and D(i, j), for all (i, j) ∈ H.
2. If set D(i, j) is significant, then it is partitioned into L(i, j) plus the four single-element sets
with the four offspring of (i, j). In other words, if any of the descendants of node (i, j) is
significant, then its four offspring become four new sets and all its other descendants become
another set (to be significance tested in rule 3).
3. If L(i, j) is significant, then it is partitioned into the four sets D(k, l), where (k, l) are the
offspring of (i, j).
Once the spatial orientation trees and the set partitioning rules are understood, the coding
algorithm can be described.
SPIHT Coding [2]:
It is important to have the encoder and decoder test sets for significance in the same
way. The coding algorithm therefore uses three lists called list of significant pixels (LSP), list
of insignificant pixels (LIP), and list of insignificant sets (LIS). These are lists of coordinates
(i, j) that in the LIP and LSP represent individual coefficients, and in the LIS represent either
the set D(i, j) (a type A entry) or the set L(i, j) (a type B entry). The LIP contains coordinates
of coefficients that were insignificant in the previous sorting pass. In the current pass they are
tested, and those that test significant are moved to the LSP. In a similar way, sets in the LIS
are tested in sequential order, and when a set is found to be significant, it is removed from the
LIS and is partitioned. The new subsets with more than one coefficient are placed back in the
LIS, to be tested later, and the subsets with one element are tested and appended to the LIP or
the LSP, depending on the results of the test. The refinement pass transmits the nth most
significant bit of the entries in the LSP. The algorithm is given below [2].
1. Set the threshold. Set the LIP to the coefficients of all root nodes. Set the LIS to all trees
(assigning them type D). Set the LSP to the empty set.
2. Sorting pass: Check the significance of all coefficients in LIP:
2.1 If significant, output 1, output a sign bit, and move the coefficient to the LSP.
2.2 If not significant, output 0.
3. Check the significance of all trees in the LIS according to the type of tree:
3.1 For a tree of type D:
3.1.1 If it is significant, output 1, and code its children:
3.1.1.1 If a child is significant, output 1, then a sign bit, add it to the LSP
3.1.1.2 If a child is insignificant, output 0 and add the child to the end of LIP.
3.1.1.3 If the children have descendants, move the tree to the end of LIS as type L, otherwise
remove it from LIS.
3.1.2 If it is insignificant, output 0.
3.2 For a tree of type L:
3.2.1 If it is significant, output 1, add each of the children to the end of LIS as an entry of
type D and remove the parent tree from the LIS.
3.2.2 If it is insignificant, output 0.
4. Loop: Decrement the threshold and go to step 2 if needed.
3.2.2.2 EZW (Embedded Coding Using Zerotrees of Wavelet Coefficients)
The EZW method, as implemented in practice, starts by performing the 9-tap
symmetric quadrature mirror filter (QMF) wavelet transform. The main loop is then repeated
for values of the threshold that are halved at the end of each iteration. The threshold is used to
calculate a significance map of significant and insignificant wavelet coefficients. Zerotrees
are used to represent the significance map in an efficient way. The main steps are as
follows [2],[13],[17],[18],[19]:
1. Initialization: Set the threshold T to the smallest power of 2 that is greater than
max(i,j) |ci,j|/2, where ci,j are the wavelet coefficients.
2. Significance map coding: Scan all the coefficients in a predefined way and output a symbol
when |ci,j| > T. When the decoder inputs this symbol, it sets the reconstructed value of ci,j to
±1.5T (the sign is carried by the symbol).
3. Refinement: Refine each significant coefficient by sending one more bit of its binary
representation. When the decoder receives this, it increments the current coefficient
value by 0.25T.
4. Set T= T/2, and go to step 2 if more iterations are needed.
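A toy version of this loop, which ignores the zerotree structure and scan order and simply applies the 1.5T significance and 0.25T refinement rules to a flat list of coefficients, reproduces the refined values 56, 40, 56, and 40 of the worked example later in this section:

```python
def ezw_toy(coeffs, iterations):
    """Simplified EZW main loop on a flat coefficient list (no zerotrees)."""
    T = 1
    m = max(abs(c) for c in coeffs)
    while T <= m / 2:                  # smallest power of 2 greater than max/2
        T *= 2
    recon = [0.0] * len(coeffs)
    for _ in range(iterations):
        for k, c in enumerate(coeffs):          # significance map pass
            if recon[k] == 0 and abs(c) > T:
                recon[k] = 1.5 * T if c > 0 else -1.5 * T
        for k, c in enumerate(coeffs):          # refinement: one more bit each
            if recon[k] != 0:
                sgn = 1 if recon[k] > 0 else -1
                step = 0.25 * T
                recon[k] += sgn * (step if abs(c) > abs(recon[k]) else -step)
        T /= 2
    return recon

print(ezw_toy([63, -34, 49, 47], 1))   # [56.0, -40.0, 56.0, 40.0]
```

A second iteration narrows the uncertainty intervals further, giving 60, −36, 52, and 44.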
A wavelet coefficient ci,j is considered insignificant with respect to the current threshold
T if |ci,j| ≤ T. The zerotree data structure is based on the following well-known experimental
result: If a wavelet coefficient at a coarse scale (i.e., high in the image pyramid) is
insignificant with respect to a given threshold T, then all of the coefficients of the same
orientation in the same spatial location at finer scales (i.e., located lower in the pyramid) are
very likely to be insignificant with respect to T.
In each iteration, all the coefficients are scanned in the order shown in Figure 3.7(a).
This guarantees that when a node is visited, all its parents will already have been scanned.
The scan starts at the lowest frequency subband LLn, continues with subbands HLn, LHn, and
HHn, and drops to level n−1, where it scans HLn−1, LHn−1, and HHn−1. Each subband is
fully scanned before the algorithm proceeds to the next subband.
Each coefficient visited in the scan is classified as a zerotree root (ZTR), an isolated
zero (IZ), positive significant (POS), or negative significant (NEG). A zerotree root is a
coefficient that is insignificant and all its descendants (in the same spatial orientation tree) are
also insignificant. Such a coefficient becomes the root of a zerotree. It is encoded with a
special symbol (denoted by ZTR), and the important point is that its descendants don't
to be encoded in the current iteration. When the decoder inputs a ZTR symbol, it assigns a
zero value to the coefficient and to all its descendants in the spatial orientation tree. Their
values get improved (refined) in subsequent iterations. An isolated zero is a coefficient that is
insignificant but has some significant descendants. Such a coefficient is encoded with the
special IZ symbol. The other two classes are coefficients that are significant and are positive
or negative. The flowchart of Figure 3.7(b) illustrates this classification. Notice that a coefficient
is classified into one of five classes, but the fifth class (a zerotree node) is not encoded.
Coefficients in the lowest pyramid level don't have any children, so they cannot be the
roots of zerotrees. Thus, they are classified into isolated zero, positive significant, or negative
significant. The zerotree can be viewed as a structure that helps to find insignificance. Most
methods that try to find structure in an image try to find significance.
Figure 3.7: (a) Scanning a zerotree. (b) Classifying a coefficient[18]
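The four-way classification can be sketched as a small function; the descendant lists passed in below are illustrative values taken from the example that follows:

```python
def classify(c, descendants, T):
    """Classify one coefficient against threshold T (cf. Figure 3.7(b))."""
    if abs(c) > T:
        return 'POS' if c > 0 else 'NEG'
    if all(abs(d) <= T for d in descendants):
        return 'ZTR'        # insignificant, and so is its whole tree
    return 'IZ'             # insignificant, but a descendant is significant

print(classify(63, [], 32))                      # POS
print(classify(-31, [15, 14, -9, -7, 47], 32))   # IZ: the 47 is significant
print(classify(23, [3, -12, -14, 8], 32))        # ZTR
```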
Two lists are used by the encoder (and also by the decoder, which works in lockstep) in
the scanning process. The dominant list contains the coordinates of the coefficients that have
not been found to be significant. They are stored in scan order, by pyramid levels, and
within each level by subbands. The subordinate list contains the magnitudes (not coordinates)
of the coefficients that have been found to be significant. Each list is scanned once per
iteration. An iteration consists of a dominant pass followed by a subordinate pass. In the
dominant pass, coefficients from the dominant list are tested for significance. If a coefficient
is found significant, then (1) its sign is determined, (2) it is classified as either POS or NEG,
(3) its magnitude is appended to the subordinate list, and (4) it is set to zero in memory (in the
array containing all the wavelet coefficients). The last step is done so that the coefficient does
not prevent the occurrence of a zerotree in subsequent dominant passes at smaller thresholds.
Example [2]:
This example follows the one in [2]. Figure 3.8(a) shows three levels of the wavelet
transform of an 8×8 image. The largest value is 63, so the initial threshold can be anywhere
in the range (31, 64]. We set it to 32. Figure 3.8(b) lists the results of the first dominant pass.
1. The top-left coefficient is 63. It is greater than the threshold, and it is positive, so a POS
symbol is generated and is transmitted by the encoder (and the 63 is changed to 0). The
decoder assigns this POS symbol the value 48, the midpoint of the interval [32, 64).
2. The coefficient −31 is insignificant with respect to 32, but it is not a zerotree root, since one
of its descendants (the 47 in LH1) is significant. The −31 is therefore an isolated zero (IZ).
3. The 23 is less than 32. Also, all its descendants (the 3, −12, −14, and 8 in HH2, and all of
HH1) are insignificant. The 23 is therefore a zerotree root (ZTR). As a result, no symbols will
be generated by the encoder in the dominant pass for its descendants (this is why none of the
HH2 and HH1 coefficients appear in the table).
4. The 10 is less than 32, and all its descendants (the −12, 7, 6, and −1 in HL1) are also
smaller than 32 in absolute value. Thus, the 10 becomes a zerotree root (ZTR). Notice that the
−12 is greater, in absolute value, than the 10, but is still less than the threshold.
5. The 14 is insignificant with respect to the threshold, but one of its children (they are −1,
47, −3, and 2) is significant. Thus, the 14 becomes an IZ.
6. The 47 in subband LH1 is significant with respect to the threshold, so it is coded as POS. It
is then changed to zero, so that a future pass (with a threshold of 16) will code its parent, 14,
as a zerotree root.
Four significant coefficients were transmitted during the first dominant pass. All that
the decoder knows about them is that they are in the interval [32, 64). They will be refined
during the first subordinate pass, so the decoder will be able to place them either in [32, 48)
(if it receives a 0) or in [48, 64) (if it receives a 1). The encoder generates and transmits the
bits 1010 for the four significant coefficients 63, 34, 49, and 47. Thus, the decoder refines
them to 56, 40, 56, and 40, respectively.
In the second dominant pass, only those coefficients not yet found to be significant
are scanned and tested. The ones found significant are treated as zero when the encoder
checks for zerotree roots. This second pass ends up identifying the −31 in LH3 as NEG, the
23 in HH3 as POS, the 10, 14, and −13 in HL2 as zerotree roots, and also all four coefficients
in LH2 and all four in HH2 as zerotree roots. The second dominant pass stops at this point,
since all other coefficients are known to be insignificant from the first dominant pass.
Figure 3.8: An EZW example: three levels of an 8×8 image[2].
The subordinate list contains, at this point, the six magnitudes 63, 49, 34, 47, 31, and 23.
They represent the three 16-wide intervals [48, 64), [32, 48), and [16, 32). The encoder
outputs bits that define a new subinterval for each of the three. At the end of the second
subordinate pass, the decoder could have identified the 34 and 47 as being in different
intervals, so the six magnitudes are ordered as 63, 49, 47, 34, 31, and 23. The decoder
assigns them the refined values 60, 52, 44, 36, 28, and 20.
3.5. Quadtrees
Quadtree compression partitions the visual data into a structural part (the quadtree
structure) and colour information (the leaf values). The quadtree structure shows the location
and size of each homogeneous region; the colour information represents the intensity of the
corresponding region. The generation of the quadtree follows the splitting strategy well known
from the area of image segmentation. Quadtree image compression comes in lossless as well as
lossy flavours; the lossy variant is obtained when the homogeneity criterion is relaxed. This
technique is not competitive from the rate-distortion efficiency viewpoint, but it is much faster
than any transform-based compression technique[2].
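A minimal sketch of the splitting strategy, with a tolerance parameter standing in for the homogeneity criterion (tol = 0 gives the lossless variant, tol > 0 a lossy one):

```python
def quadtree(img, x, y, size, tol):
    """Split a square block until each piece is homogeneous within `tol`.
    A leaf is represented by one colour value; an internal node by a
    four-element list [NW, NE, SW, SE]."""
    block = [img[y + j][x + i] for j in range(size) for i in range(size)]
    if max(block) - min(block) <= tol or size == 1:
        return sum(block) / len(block)            # leaf: one colour value
    h = size // 2
    return [quadtree(img, x,     y,     h, tol),  # NW
            quadtree(img, x + h, y,     h, tol),  # NE
            quadtree(img, x,     y + h, h, tol),  # SW
            quadtree(img, x + h, y + h, h, tol)]  # SE

img = [[1, 1, 2, 2],
       [1, 1, 2, 2],
       [1, 1, 2, 2],
       [1, 1, 2, 2]]
print(quadtree(img, 0, 0, 4, 0))   # [1.0, 2.0, 1.0, 2.0]: one split, four leaves
```

Relaxing the criterion (tol = 1) collapses the whole image into a single leaf with the mean value 1.5, illustrating how the lossy variant trades fidelity for a smaller tree.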
3.6. Fractal Coding
A fractal, in simplest terms, is an image of a texture or shape expressed as one or
more mathematical formulas. In terms of fractal geometry, a fractal is a geometric form
whose irregular details recur at different scales and angles and can be described by affine or
fractal transformations (formulas). Fractals have historically been used to generate images in
applications such as flight simulator scenes and special effects in motion pictures. Fractal
formulas can also be used to approximate real-world pictures [7].
Fractal image compression is the inverse of fractal image generation, i.e., instead of
generating an image or figure from a given formula, fractal image compression searches for
sets of fractals in a digitized image which describe and represent the entire image. Once the
appropriate sets of fractals are determined, they are reduced (compressed) to very compact
fractal transform codes or formulas. The codes are 'rules' for reproducing the various sets of
fractals which, in turn, regenerate the entire image. Because fractal transform codes require
very small amounts of data to be expressed and stored as formulas, fractal compression
results in very high compression ratios. Although fractal compression exhibits promising
properties (e.g., fractal interpolation and resolution-independent decoding), the encoding
complexity turned out to be prohibitive for successful deployment of the technique.
Additionally, fractal coding has never reached the
rate distortion performance of second generation wavelet codecs. Fractal coding is highly
asymmetric in that significantly more processing is required for searching/encoding than for
decoding. This is because the encoding process involves many transformations and
comparisons to search for sets of fractals, while the decoder simply generates images
according to the fractal formulas received.
3.7. Vector Quantization
Vector quantization exploits similarities between image blocks and an external
codebook. The image to be encoded is tiled into smaller image blocks which are compared
against equally sized blocks in an external codebook. For each image block the most similar
codebook block is identified and the corresponding index is recorded. From the algorithmic
viewpoint, the process is similar to fractal coding, therefore fractal coding is sometimes
referred to as vector quantization with internal codebook. Similar to fractal coding, the
encoding process involves a search for an optimal block match and is rather costly, whereas
the decoding process in the case of vector quantization is even faster since it is a simple lookup
table operation. If the properties of the human visual system are used, the size of the
codebook can be reduced further, and fewer bits are used to represent the index of codebook
entries[7],[20].
Two major problems with VQ are, first, how to design a good codebook that is
representative of all the possible occurrences of pixel combinations in a block, and second,
how to find the best match efficiently in the codebook during the coding process.
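The encode/decode asymmetry is easy to see in a sketch; the codebook and image blocks below are illustrative 2×2 blocks stored as flat tuples:

```python
def encode_vq(blocks, codebook):
    """Record, for every image block, the index of the closest codebook block
    under squared-error distance (the costly search)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: dist(b, codebook[k]))
            for b in blocks]

def decode_vq(indices, codebook):
    """Decoding is a pure table lookup."""
    return [codebook[k] for k in indices]

codebook = [(0, 0, 0, 0), (10, 10, 10, 10), (0, 10, 0, 10)]
idx = encode_vq([(1, 0, 2, 0), (9, 11, 10, 10)], codebook)
print(idx)   # [0, 1]
```

Only the indexes are transmitted, so the rate is determined by the codebook size rather than by the pixel precision.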
Chapter 4
STANDARD METHODS FOR IMAGE COMPRESSION
With the rapid development of imaging technology and of image compression and coding
tools and techniques, it is necessary to evolve coding standards so that there is compatibility
and interoperability between the image communication and storage products manufactured by
different vendors. Without standards, encoders and decoders cannot communicate with each
other; service providers would have to support a variety of formats to meet the needs of their
customers, and customers would have to install a number of decoders to handle a large
number of data formats. Towards the objective of setting up
coding standards, international standardization agencies such as the International
Organization for Standardization (ISO), the International Telecommunication Union (ITU),
and the International Electrotechnical Commission (IEC) have formed expert groups and
solicited proposals from industry, universities, and research laboratories. This has resulted in
established standards
for bi-level (facsimile) images and continuous-tone (grayscale) images. Basic concepts of
the JPEG and JPEG 2000 image compression standards are explained below.
4.1 JPEG
JPEG is a sophisticated lossy/lossless compression method for color or grayscale still
images (not videos). It does not handle bi-level (black and white) images very well. It also
works best on continuous-tone images, where adjacent pixels have similar colors. An
important feature of JPEG is its use of many parameters, allowing the user to adjust the
amount of the data lost (and thus also the compression ratio) over a very wide range. Often,
the eye cannot see any image degradation even at compression factors of 10 or 20. There are
two operating modes, lossy (also called baseline) and lossless (which typically produces
compression ratios of around 0.5, i.e., the compressed file is about half the original size).
Most implementations support just the lossy mode. This mode includes progressive and
hierarchical coding. The JPEG standard has proved successful
and has become widely used for image compression, especially in Web pages.
JPEG has been designed as a compression method for continuous-tone images. The main
goals of JPEG compression are the following [21],[22],[28]:
1. High compression ratios, especially in cases where image quality is judged as very good to
excellent.
2. The use of many parameters, allowing knowledgeable users to experiment and achieve the
desired compression/quality trade-off.
3. Obtaining good results with any kind of continuous-tone image, regardless of image
dimensions, color spaces, pixel aspect ratios, or other image features.
4. A sophisticated, but not too complex compression method, allowing software and
hardware implementations on many platforms.
5. The availability of the following modes of operation:
Sequential encoding: each image component is encoded in a single left-to-right, top-to-
bottom scan.
Progressive encoding: the image is encoded in multiple scans, for applications in which
transmission time is long and the viewer prefers to watch the image build up in multiple
coarse-to-clear passes.
Figure 4.1: Progressive versus sequential presentation[22]
Figure 4.2: Hierarchical multi-resolution encoding[22]
Lossless encoding: the image is encoded to guarantee exact recovery of every source image
sample value (even though the result is low compression compared to the lossy modes);
Hierarchical encoding: the image is encoded at multiple resolutions so that lower-resolution
versions may be accessed without first having to decompress the image at its full resolution.
The typical sequence of image presentation at the output of the decoder for the sequential
versus progressive modes of operation is shown in Figure 4.1.
4.1.1. Lossy and Lossless Compression
To meet the differing needs of many applications, the JPEG standard includes two basic
compression methods, each with various modes of operation. The standard specifies two
classes of encoding and decoding processes: lossy and lossless. Those based on the
discrete cosine transform (DCT) are lossy, thereby allowing substantial compression to be
achieved while producing a reconstructed image with high visual fidelity to the encoder's
source image.
The simplest DCT-based coding process is referred to as the baseline sequential process.
It provides a capability which is sufficient for many applications. There are additional DCT-
based processes which extend the baseline sequential process to a broader range of
applications. In any decoder using extended DCT-based decoding processes, the baseline
decoding process is required to be present in order to provide a default decoding capability.
The second class of coding processes is not based upon the DCT and is provided to meet
the needs of applications requiring lossless compression. These lossless encoding and
decoding processes are used independently of any of the DCT-based processes.
The amount of compression provided by any of the various processes is dependent on the
characteristics of the particular image being compressed, as well as on the picture quality
desired by the application and the desired speed of compression and decompression.
4.1.2 Sequential DCT-based Coding
Figures 4.3 and 4.5 show the key processing steps which are the heart of the DCT-based modes of operation. These figures illustrate the special case of single-component
(grayscale) image compression. We can grasp the essentials of DCT-based compression by
thinking of it as the compression of a stream of 8×8 blocks of grayscale image
samples. Color image compression can then be approximately regarded as compression of
multiple grayscale images, which are either compressed entirely one at a time, or are
compressed by alternately interleaving 8x8 sample blocks from each in turn.
In the encoding process the input component's samples are grouped into 8×8 blocks,
and each block is transformed by the forward DCT (FDCT) into a set of 64 values referred to
as DCT coefficients. One of these values is referred to as the DC coefficient and the other 63
as the AC coefficients. Each of the 64 coefficients is then quantized using one of 64
corresponding values from a quantization table (determined by one of the table specifications
shown in Figure 4.3). No default values for quantization tables are specified in the standard;
applications may specify values which customize picture quality for their particular image
characteristics, display devices, and viewing conditions.
Figure 4.3: DCT-based encoder simplified diagram[22]
Figure 4.4: Preparation of quantized coefficients for entropy encoding[22]
After quantization, the DC coefficient and the 63 AC coefficients are prepared for
entropy encoding, as shown in Figure 4.4. The previous quantized DC coefficient is used to
predict the current quantized DC coefficient, and the difference is encoded. The 63 quantized
AC coefficients undergo no such differential encoding, but are converted into a one-
dimensional zig-zag sequence, as shown in Figure 4.4. The quantized coefficients are then
passed to an entropy encoding procedure which compresses the data further. If Huffman
encoding is used, Huffman table specifications must be provided to the encoder. If arithmetic
encoding is used, arithmetic coding conditioning table specifications may be provided;
otherwise the default conditioning table specifications shall be used. Figure 4.5 shows the
main procedures for all DCT-based decoding processes. Each step shown performs
essentially the
inverse of its corresponding main procedure within the encoder. The entropy decoder decodes
the zig-zag sequence of quantized DCT coefficients. After dequantization, the DCT
coefficients are transformed to an 8×8 block of samples by the inverse DCT (IDCT).
Figure 4.5: DCT-based decoder simplified diagram[22]
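The FDCT and quantization steps can be sketched in a few lines; for brevity a single uniform step size stands in for the 64-entry quantization table:

```python
import math

def fdct_8x8(block):
    """Forward 2-D DCT of one 8x8 sample block (the FDCT of the encoder)."""
    def c(u):
        return math.sqrt(0.5) if u == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for v in range(8):
        for u in range(8):
            s = sum(block[y][x]
                    * math.cos((2 * x + 1) * u * math.pi / 16)
                    * math.cos((2 * y + 1) * v * math.pi / 16)
                    for y in range(8) for x in range(8))
            out[v][u] = 0.25 * c(u) * c(v) * s
    return out

def quantize(coef, q):
    """Quantize every coefficient with one uniform step q; a real JPEG
    quantization table holds 64 separate values."""
    return [[round(x / q) for x in row] for row in coef]

flat = [[8] * 8 for _ in range(8)]   # a uniform block
coef = fdct_8x8(flat)
# Only the DC coefficient survives (64 here); the AC coefficients are ~0,
# which is why flat regions compress so well after quantization.
```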
4.1.3 Lossless Coding
Figure 4.6 shows the main procedures for the lossless encoding processes. A predictor
combines the reconstructed values of up to three neighbourhood samples at positions a, b, and
c to form a prediction of the sample at position x as shown in Figure 4.7. This prediction is
then subtracted from the actual value of the sample at position x, and the difference is
losslessly entropy-coded by either Huffman or arithmetic coding. Any one of the eight
predictors listed in Table 4.1 (under selection-value) can be used. Selections 1, 2, and 3 are
one-dimensional predictors and selections 4, 5, 6, and 7 are two-dimensional predictors.
Selection-value 0 can only be used for differential coding in the hierarchical mode of
operation.
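The predictors of Table 4.1 are simple integer formulas over the three neighbours; a minimal sketch (selection values follow the lossless JPEG definitions, with integer division assumed for the averaging predictors):

```python
def predict(a, b, c, selection):
    """Lossless JPEG prediction of sample x from its neighbours:
    a = left, b = above, c = above-left (Table 4.1 selection values).
    Selection 0 (no prediction) is reserved for hierarchical mode."""
    return {1: a,
            2: b,
            3: c,
            4: a + b - c,           # two-dimensional "plane" predictor
            5: a + (b - c) // 2,
            6: b + (a - c) // 2,
            7: (a + b) // 2}[selection]
```

The encoder subtracts the prediction from the actual sample value and entropy-codes only the difference, which is typically small for smooth image regions.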
For the lossless mode of operation, two different codecs are specified, one for each
entropy coding method. The encoders can use any source image precision from 2 to 16
bits/sample, and can use any of the predictors except selection-value 0. The decoders must
handle any of the sample precisions and any of the predictors. Lossless codecs typically
produce around 2:1 compression for color images with moderately complex scenes.
This encoding process may also be used in a slightly modified way, whereby the
precision of the input samples is reduced by one or more bits prior to the lossless coding.
This achieves higher compression than the lossless process (but lower compression than the
DCT-based processes for equivalent visual fidelity), and limits the reconstructed image's
worst-case sample error to the amount of input precision reduction.
Figure 4.6: Lossless encoder simplified diagram [22]
Figure 4.7: 3-sample prediction neighbourhood [22]
Table 4.1: Predictors for lossless coding [22]
The JPEG algorithm yields good results for compression ratios of 10:1 and below (on
8-bit gray-scale images), but at higher compression ratios the underlying block nature of the
transform begins to show through the compressed image. By the time compression ratios
have reached 24:1, only the DC (lowest frequency) coefficient is getting any bits allocated to
it, and the input image has been approximated by a set of 8×8 blocks. Consequently, the
decompressed image has substantial blocking artifacts for medium and high compression
ratios.
4.2 JPEG 2000
The data compression field is very active, with new approaches, ideas, and techniques
being developed and implemented all the time. JPEG is widely used for image compression
but is not perfect. The use of the DCT on 8×8 blocks of pixels sometimes results in a
reconstructed image that has a blocky appearance (especially when the JPEG parameters are
set for much loss of information). This is why the JPEG committee has developed a new,
wavelet-based standard for the compression of still images, to be known as JPEG 2000. JPEG
2000 has many advantages over JPEG, such as better image quality at the same file size, 25-
35% smaller file sizes at comparable image quality, good image quality even at very high
compression ratios (over 80:1), low complexity option for devices with limited resources,
scalable image files, and progressive rendering and transmission through a layered image file
structure.
JPEG 2000 is not only intended to provide rate-distortion and subjective image quality
performance superior to existing standards, but also to provide features and functionalities
that current standards can either not address efficiently or in many cases cannot address at all.
Lossless and lossy compression, embedded lossy to lossless coding, progressive transmission
by pixel accuracy and by resolution, robustness to the presence of bit-errors and region-of-
interest coding, are some representative features. It is interesting to note that JPEG2000 is
designed to address the requirements of a diversity of applications, e.g. Internet, color
facsimile, printing, scanning, digital photography, remote sensing, mobile applications,
medical imagery, digital library and E-commerce.
JPEG 2000 has a long list of features, a subset of which are [24], [25], [26], [27]:
- High compression efficiency. Bitrates of less than 0.25 bpp are expected for highly
detailed greyscale images.
- The ability to handle large images, up to 2^32 × 2^32 pixels (the original JPEG can
handle images of up to 2^16 × 2^16).
- Progressive image transmission. The proposed standard can decompress an image
progressively by SNR, resolution, colour component, or region of interest.
- Easy, fast access to various points in the compressed stream. The decoder can
pan/zoom the image while decompressing only parts of it. The decoder can rotate and
crop the image while decompressing it.
- Error resilience. Error-correcting codes can be included in the compressed stream, to
improve transmission reliability in noisy environments.
One of the new, important approaches to compression introduced by JPEG 2000 is the
"compress once, decompress many ways" paradigm. The JPEG 2000 encoder selects a
maximum image quality Q and maximum resolution R, and it compresses an image using
these parameters. The decoder can decompress the image at any image quality up to and
including Q and at any resolution less than or equal to R. Suppose that an image I was
compressed into B bits. The decoder can extract A bits from the compressed stream (where A
< B) and produce a lossy decompressed image that will be identical to the image obtained if I
was originally compressed lossily to A bits.
In general, the decoder can decompress the entire image in lower quality and/or lower
resolution. It can also decompress parts of the image (regions of interest) at either maximum
or lower quality or resolution. Even more, the decoder can extract parts of the compressed
stream and assemble them to create a new compressed stream without having to do any
decompression. Thus, a lower-resolution and/or lower-quality image can be created without
the decoder having to decompress anything. The advantages of this approach are (1) it saves
time and space and (2) it prevents the buildup of image noise, common in cases where an
image is lossily compressed and decompressed several times.
Figure 4.8 shows the steps in the JPEG 2000 compression of an image. Function of each
block is explained below[26].
Figure 4.8: Steps in the JPEG 2000 compression of an image [26]: Tiling → Component
transform → Wavelet transform → Quantizer → Entropy coder → Packet ordering.
Fig. 4.8: Tiling, DC level shifting and DWT of each image tile component [26].
1. Tiling
The first thing that happens when an image is JPEG 2000 compressed is that it is split
into rectangular tiles. Since each tile is compressed independently of every other tile, the
usual rationale for tiling is to limit the amount of memory needed to implement JPEG 2000
and to provide independent access to regions in an image. Some implementations are
designed for tiling and perform best with tiled images; other implementations can compress
megabyte and gigabyte images without tiling.
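The tiling step amounts to covering the image with a rectangular grid; a minimal sketch (the function name and rectangle convention are ours):

```python
def tile_grid(width, height, tile_w, tile_h):
    """Split an image into the rectangular tiles that JPEG 2000
    compresses independently; edge tiles are clipped to the image
    boundary. Returns (x, y, w, h) rectangles in raster order."""
    return [(x, y, min(tile_w, width - x), min(tile_h, height - y))
            for y in range(0, height, tile_h)
            for x in range(0, width, tile_w)]
```

For example, a 1000×600 image with 512×512 tiles yields four tiles, the last of which is clipped to 488×88.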
Prior to computation of the forward discrete wavelet transform (DWT) on each image
tile, all samples of the image tile component are DC level shifted by subtracting the same
quantity (determined by the component depth). DC level shifting is performed only on
samples of components that are unsigned. If a colour transformation is used, DC level
shifting is performed prior to computation of the forward component transform; otherwise it
is performed prior to the wavelet transform, as shown in Figure 4.8. This process translates
all pixel values from their original, unsigned interval [0, 2^s − 1] (where s is the pixel's
depth) to the signed interval [−2^(s−1), 2^(s−1) − 1] by subtracting 2^(s−1) from each value.
For s = 4, e.g., the 2^4 = 16 possible pixel values are transformed from the interval [0, 15]
to the interval [−8, +7] by subtracting 2^(4−1) = 8 from each value.
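The level shift is a one-line operation; a sketch (function name is ours):

```python
def dc_level_shift(samples, depth):
    """Shift unsigned samples from [0, 2^depth - 1] to the signed
    interval [-2^(depth-1), 2^(depth-1) - 1] by subtracting 2^(depth-1)."""
    offset = 1 << (depth - 1)
    return [s - offset for s in samples]
```

Centring the samples around zero means the wavelet coefficients are also roughly zero-centred, which suits the quantizer and entropy coder that follow.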
2. Component transform
If the components in a multi-component image are red, green and blue, then an optional
component transform is available to convert them to luminance and chrominance. The
purpose of these transforms is to decorrelate the red, green and blue image components,
which improves compression performance by redistributing the energy across the image
components. There are two such transforms: the irreversible colour transform (ICT), used
with lossy coding, and the reversible colour transform (RCT), used with lossless coding. The
ICT does a better job at decorrelating the red, green and blue values than the RCT, which
leads to better compression. Whichever transform is used before compression, the inverse
transform is applied after decompression to restore the red, green and blue values.
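The RCT uses only integer additions and shifts, so it can be inverted exactly. A sketch of the forward and inverse transforms as given in JPEG 2000 Part 1 (function names are ours):

```python
def rct_forward(r, g, b):
    """Reversible colour transform of JPEG 2000 (integer, lossless)."""
    y  = (r + 2 * g + b) // 4   # floor division, as in the standard
    cb = b - g
    cr = r - g
    return y, cb, cr

def rct_inverse(y, cb, cr):
    """Exact inverse: recovers the original integer r, g, b."""
    g = y - (cb + cr) // 4
    return cr + g, g, cb + g    # r, g, b
```

Although the luminance value uses a floor division, the discarded remainder can be recovered from cb and cr, which is what makes the transform lossless.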
3. Wavelet Transform
The wavelet transform is applied on each tile. The tile is decomposed in different
resolution levels. These decomposition levels are made up of subbands of coefficients that
describe the frequency characteristics of local areas (rather than across the entire tile-
component) of the tile component.
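One decomposition level can be sketched in one dimension with the reversible LeGall 5/3 lifting filter used by JPEG 2000's lossless path. This is a simplified sketch assuming an even-length signal and whole-sample symmetric extension at the borders:

```python
def dwt53_level(x):
    """One level of the reversible 5/3 DWT via lifting.
    Returns (lowpass, highpass) halves; len(x) must be even."""
    n = len(x)
    def at(i):  # whole-sample symmetric extension at the borders
        if i < 0:
            i = -i
        if i >= n:
            i = 2 * (n - 1) - i
        return x[i]
    # Predict step: highpass (detail) coefficients from odd samples
    d = [at(2*i + 1) - (at(2*i) + at(2*i + 2)) // 2 for i in range(n // 2)]
    # Update step: lowpass (approximation) coefficients from even samples
    s = [at(2*i) + (d[max(i - 1, 0)] + d[i] + 2) // 4 for i in range(n // 2)]
    return s, d
```

A constant signal produces zero detail coefficients, and a linear ramp produces zeros except at the border, illustrating why smooth regions compress well after the transform.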
4. Quantizer
The next step after the wavelet transform is the quantization of the subband images,
which is a lossy step that reduces their precision in order to improve their compressibility in
the following step, which is the arithmetic coder. In lossless compression, the subband
images are passed unchanged to the arithmetic coder.
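The quantizer is, in essence, a deadzone scalar quantizer; a sketch of quantization and a common midpoint reconstruction (function names are ours):

```python
import math

def deadzone_quantize(coeffs, step):
    """Deadzone scalar quantization: q = sign(c) * floor(|c| / step).
    With step == 1, integer coefficients pass through unchanged,
    which corresponds to the lossless path."""
    return [int(math.copysign(math.floor(abs(c) / step), c)) for c in coeffs]

def dequantize(q, step):
    """Midpoint reconstruction of each quantization interval."""
    return [0.0 if v == 0 else math.copysign((abs(v) + 0.5) * step, v)
            for v in q]
```

The "deadzone" is the doubled interval around zero, which maps small coefficients to exactly zero and so feeds long zero runs to the entropy coder.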
5. Entropy coder
After quantization comes the entropy coder, which takes advantage of the statistical
properties of the quantized subband images to reduce the number of bits used to represent
them. This is the stage where the actual compression occurs. While baseline JPEG uses
Huffman coding, JPEG 2000 uses a more sophisticated and computationally expensive
method known as adaptive arithmetic coding. The subband images are partitioned into
fixed-size codeblocks, and the arithmetic coder is applied independently to each bitplane of each
subband image within a codeblock. Because arithmetic coding can become less effective for
lower bitplanes, JPEG 2000 has an optional Bypass mode that skips the coding of the lower
bitplanes, which saves time with little reduction in compression efficiency.
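Bitplane decomposition itself is straightforward; a sketch of splitting coefficient magnitudes into planes (function name is ours):

```python
def magnitude_bitplanes(values, nbits):
    """Split non-negative coefficient magnitudes into bitplanes,
    most significant plane first (the order in which they are coded)."""
    return [[(v >> b) & 1 for v in values]
            for b in range(nbits - 1, -1, -1)]
```

Coding the most significant planes first is what lets a JPEG 2000 stream be truncated at any point while still yielding the best approximation of every coefficient for the bits received.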
6. Packet ordering
Packets are the fundamental building blocks of a JPEG 2000 codestream. While a layer is an
increment in quality for the entire image, a packet is an increment in quality for a specific
position of a given resolution of a component of a tile. The interleaving of packets in a
codestream determines the progression order in which compressed data is received and
decompressed. JPEG 2000 defines five progression orders or packet orderings. In resolution-
major progression orders, the packets for all layers, positions and components of the lowest
resolution come in the codestream before all those for the next higher resolution level.
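A resolution-major ordering can be sketched as nested loops over the four packet coordinates. The function name and argument order here are illustrative, not taken from the standard's syntax:

```python
from itertools import product

def resolution_major_order(n_res, n_layers, n_comps, n_pos):
    """Resolution-major packet ordering: every packet of resolution r
    precedes every packet of resolution r + 1 in the codestream."""
    return list(product(range(n_res), range(n_layers),
                        range(n_comps), range(n_pos)))
```

A decoder that stops reading after the resolution-0 packets can still reconstruct a complete low-resolution image, which is the basis of progressive-by-resolution transmission.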
4.3. JPEG 2000 Applications in Access and Preservation
JPEG 2000 is being used for geospatial imaging, medical imaging and by the cultural
heritage and digital preservation communities. Many digital collection and library systems
support JPEG 2000, and several institutions use it in their collections. This section will
discuss the experiences of a few of those institutions, chosen because they highlight the issues [26].
An institution that has done much work in the use of JPEG 2000 and is now one of the
leaders in its adoption is the Harvard University Library (HUL). The move to JPEG 2000
was driven in part by institutional clients who wanted features such as interactive zoom,
pan, and rotate. These requirements are not easily implemented with TIFF, GIF, or JPEG,
but are easily enabled by JPEG 2000. In 2006, Harvard reported the successful test
migration of more than 10,000 TIFF, GIF, and JPEG images to equivalent lossless and
lossy JPEG 2000 form.
Over the past several years, the rate of acquisition of new JPEG 2000 images into the
HUL Digital Repository Service (DRS) has steadily increased, while that for JPEG and
TIFF has decreased. The DRS now manages about two million JPEG 2000 images, and
JPEG 2000 is becoming the default format for image conversion and acquisition. A single
JPEG 2000 master image in the repository enables the dynamic delivery of arbitrarily-
sized use images (transcoded to JPEG for rendering by the client browser), all computed
on demand from the master, thereby eliminating the need to maintain multiple variants in
anticipation of client requests. In addition, JPEG 2000 enables an interactive interface that
lets users perform the zoom, pan, and rotation operations that now form the common user
expectation for web-based image delivery.
Library and Archives Canada ran a year-long JPEG 2000 pilot project over 2006 and
2007, the results of which were described at the Museums and Web 2007 conference [35].
This pilot was undertaken to address many of the questions that cultural institutions have
regarding JPEG 2000. One of their main results was to show that the use of JPEG 2000
could reduce file sizes significantly without loss to image quality. In the case of lossless
archival masters, the compression ratio was typically around 2:1. For production or access
masters, they specified a recommended compression ratio of 24:1 for colour images,
which included photographs, prints, drawings and maps, and 8:1 for greyscale images,
which included newspapers, microfilm and textual materials. They found that the JPEG
2000 codec they used performed best when images were tiled, and they recommended
tile sizes of 512 by 512 and 1024 by 1024. They also observed that the use of JPEG 2000
meant that derivative files were no longer required. The JP2 files they created in this pilot
contained XML boxes with MODS-based metadata records.
The Library of Congress already makes use of JPEG 2000. For example, Civil War maps
in the American Memory collection are compressed using JPEG 2000. A client's pan and
zoom requests are served with reference to a JPEG 2000 image; the resulting views are
transcoded to JPEG for delivery to a standard web browser. The site also offers the option
of downloading the JPEG 2000 image of the map. The Library's collection still has some
maps compressed using MrSID, a proprietary wavelet-based compression method that
predates JPEG 2000.
Chapter 5
COMPARATIVE STUDY
The goal of image compression is to save storage space and to reduce transmission
time for image data. It aims at achieving a high compression ratio (CR) while preserving
good fidelity of decoded images. The techniques used to compress/decompress a single gray
level image are expected to be easily modified to encode/decode color images and image
sequences. There is always a compromise between image quality and compression ratio.
There are so many methods available for image compression. The choice of a particular
method depends on application. In this chapter we review comparative studies made by
different authors.
5.1. Comparative results obtained by DELGORGE [6]
The survey was performed on 10 ultrasound images of size 768×576. The images were
acquired by an AU3 ultrasound scanner (ESAOTE) at a rate of 15 images per second, then
digitised using a Matrox Meteor board. The computing was performed on a 450 MHz
Pentium III under Windows NT.
The following results represent an average measure of the MSE, PSNR, the coding
computation times (tcc, tenc, tdec) and the compression rate CRt, calculated on the ten rebuilt
and original images of the database. The results concerning tcc, tenc and tdec have to be
compared with each other to appreciate the performance of each of the studied techniques.
Table 5.1: Comparison results for lossless methods
Table 5.2: Comparison results for low and high compression
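The MSE and PSNR figures reported in these comparisons follow the standard definitions; a minimal sketch (maxval is the peak sample value, e.g. 255 for 8-bit images):

```python
import math

def mse_psnr(original, reconstructed, maxval=255):
    """Mean squared error and peak signal-to-noise ratio (in dB)
    between two equal-length sequences of pixel values."""
    mse = sum((a - b) ** 2
              for a, b in zip(original, reconstructed)) / len(original)
    psnr = float('inf') if mse == 0 else 10 * math.log10(maxval ** 2 / mse)
    return mse, psnr
```

A perfect (lossless) reconstruction gives MSE 0 and an infinite PSNR; for lossy codecs, higher PSNR indicates closer fidelity to the original.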
It can be concluded that RLE coding is not suited to ultrasound images, as its CRt
is the largest. The Fano and Huffman algorithms give comparable results in terms of tenc, tdec
and CRt, with poor performance. The adaptive Huffman method achieves a compression rate of
54.57% (the final image size is about half of the original one). The last method, based on
arithmetic coding, gives the best compression rate, but is associated with larger compression
and decompression times. In conclusion, the adaptive Huffman method gives the best
compromise between compression rate and computing times.
Experimental results performed on ten ultrasound images establish that the JPEG-LS
technique seems to be the best lossless method for tele-medicine application. In the lossy
case, JPEG-LS is the best method when the compression rate expected is greater than 5%.
And for very high compression, JPEG 2000 becomes the optimal technique.
5.2. The comparative results reported by Chaur-Chin Chen [7] for various lossy methods
Table 5.3: Performance of different methods

Method  | Advantages                                   | Disadvantages                                  | Compression ratio
Wavelet | high compression ratio                       | coefficient quantization; bit allocation       | >> 32
JPEG    | state-of-the-art; current standard           | coefficient (DCT) quantization; bit allocation | 50
VQ      | simple decoder; no coefficient quantization  | slow codebook generation; small bpp            | < 32
Fractal | good mathematical encoding frame; resolution-free decoding | slow encoding                    | 16
Image compression algorithms based on the EZW, JPEG/DCT, VQ, and Fractal methods
were tested on four 256×256 real images: Jet, Lenna, Mandrill, Peppers, and one 400×400
fingerprint image. The original images of Lenna and fingerprint are shown in Figure 5.1. The
performance results are reported in Table 5.3. The decoded images based on the four
approaches are shown in Figures 5.2 and 5.3. The associated PSNR values and
encoding/decoding times shown in Table 5.4 for the test images indicate that all four
approaches are satisfactory at the 0.5 bpp request (CR = 16). However, EZW has significantly
larger PSNR values and a better visual quality of decoded images compared with the other
approaches. At a desired compression of 0.25 bpp (CR = 32) for the fingerprint image, the
commonly used VQ cannot be tested, and fractal coding cannot be achieved unless its
resolution-free decoding property is utilized, which is not useful for the current purpose; both
the EZW and JPEG approaches perform well, and the results of EZW have significantly larger
PSNR values than those of JPEG.
Table 5.4: Performance of coding algorithms on various 256×256 images.
Algorithm PSNR values (in dB)