7/31/2019 Final Seminar Raju
Chapter 1
INTRODUCTION
Uncompressed multimedia (graphics, audio and video) data requires considerable
storage capacity and transmission bandwidth. Despite rapid progress in mass-storage density,
processor speeds, and digital communication system performance, demand for data storage
capacity and data-transmission bandwidth continues to outstrip the capabilities of available
technologies. The recent growth of data-intensive multimedia-based web applications has not only sustained the need for more efficient ways to encode signals and images but has also made compression of such signals central to storage and communication technology.
In the field of image processing, image compression is an active topic of research. Image compression plays a crucial role in many important and diverse applications, including video teleconferencing, remote sensing, document and medical imaging, and facsimile transmission.
1.1. Need For Compression
Image data is by its nature multidimensional and tends to take up a lot of space.
Pictures take up a lot of storage space (either disk or memory). A 1000×1000 picture with 24 bits per pixel takes up 3 megabytes. The Encyclopedia Britannica, scanned at 300 pixels per inch and 1 bit per pixel, requires 25,000 pages × 1,000,000 bytes per page = 25 gigabytes.
Video is even bulkier: a 90-minute movie at 640×480 spatial resolution, 24 bits per pixel and 24 frames per second requires 90×60×24×640×480×3 bytes ≈ 120 gigabytes.
Applications: HDTV, film, remote sensing and satellite image transmission, network communication, image storage, medical image processing, fax.
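The storage figures above follow from simple arithmetic; a quick sketch (decimal units, 3 bytes per 24-bit pixel):

```python
# Raw storage requirements from the examples above, in bytes (decimal units).
picture = 1000 * 1000 * 24 // 8        # 1000x1000 at 24 bpp -> 3,000,000 bytes (3 MB)
britannica = 25_000 * 1_000_000        # 25,000 pages at 1 MB/page -> 25 GB
movie = 90 * 60 * 24 * 640 * 480 * 3   # 90 min at 24 fps, 640x480, 3 bytes/pixel
print(picture, britannica, movie)      # movie is ~1.19e11 bytes, i.e. roughly 120 GB
```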
1.2. Principles Behind Compression
A common characteristic of most images is that the neighboring pixels are correlated
and therefore contain redundant information. The foremost task then is to find less correlated
representation of the image. Two fundamental components of compression are redundancy
and irrelevancy reduction. Redundancy reduction aims at removing duplication from the
signal source (image/video). Irrelevancy reduction omits parts of the signal that will not be
noticed by the signal receiver, namely the Human Visual System (HVS). In general, three
types of redundancy can be identified:
Spatial redundancy, or correlation between neighboring pixel values.
Spectral redundancy, or correlation between different color planes or spectral bands.
Temporal redundancy, or correlation between adjacent frames in a sequence of images (in video applications).
Image compression research aims at reducing the number of bits needed to represent an
image by removing the spatial and spectral redundancies as much as possible. Since we will
focus only on still image compression, we will not worry about temporal redundancy.
Different methods for redundancy reduction are:
Spatial redundancy: DCT, DWT, DPCM
Statistical redundancy: run-length coding, variable-length coding
1.3. Image Compression Model
A typical image compression model consists of a source encoder, which is responsible for reducing or eliminating any coding, interpixel or psychovisual redundancies in the input image; a channel, which is the transmission path; and a source decoder, which reconstructs the original image and whose function is the opposite of the source encoder's. Figure.1 shows the block diagram of the image compression model [1].
Figure.1 Image compression model [1]
Figure.2 (a) Source encoder (b) Source decoder[1]
The source encoder consists of three blocks. The first stage of the source encoding process, the mapper, transforms the input data into a format designed to reduce interpixel redundancies in the input image. This operation is generally reversible and may or may not directly reduce the amount of data required to represent the image.
The second stage, the quantizer block in figure.2(a), reduces the accuracy of the mapper's output in accordance with some pre-established fidelity criterion. This stage reduces the psychovisual redundancies of the input image. The operation is irreversible, so it must be omitted when error-free compression is desired.
In the third and final stage of the source encoding process, the symbol coder block in figure.2(a) creates a fixed- or variable-length code to represent the quantizer output and maps the output in accordance with the code.
The source decoder shown in figure.2(b) contains only two components, a symbol decoder and an inverse mapper. These blocks perform, in reverse order, the inverse operations of the source encoder's symbol encoder and mapper blocks.
The lossless and lossy methods are discussed separately in Chapters 2 and 3, respectively.
Chapter 2
LOSSLESS COMPRESSION METHODS
In numerous applications error-free compression is the only acceptable means of data
compression. One such application is the archival of medical or business documents, where
lossy compression usually is prohibited for legal reasons. Another is the processing of
satellite imagery, where both the use and cost of collecting the data make any loss undesirable.
Yet another is digital radiography, where the loss of information can compromise diagnostic
accuracy. In these and other cases, the need for error-free compression is motivated by the intended use or nature of the image under consideration. Lossless methods normally provide compression ratios of 2 to 10.
2.1. Run-Length Encoding
This method reduces only interpixel redundancy. The following example illustrates the run-length coding method [2].
Original Image
63 63 63 63 64 64 64 78 89 89 89 89
Compressed Image
63,4, 64,3, 78,1, 89,4
Code the number of pixels taking the same value along a given scan line.
Works particularly well on binary images, since only the length of each run needs to be encoded.
Works by exploiting scan-line coherence.
Bit-plane run-length encoding is used on non-binary images by considering each bit of the (say 8-bit) image one at a time.
Typical compression ratios are 1.5:1 (grey-scale/color images), 4:1 (binary images) and 2:1 (bit-plane compression on grey-scale/color images).
May cause a data explosion: the final file may be larger than the original one.
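The scan-line coding described above can be sketched in a few lines (the function names are illustrative, not from any standard library):

```python
def rle_encode(pixels):
    """Code each run along a scan line as a (value, run length) pair."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1            # extend the current run
        else:
            runs.append([p, 1])         # start a new run
    return [(value, length) for value, length in runs]

def rle_decode(runs):
    """Expand (value, run length) pairs back into the original scan line."""
    return [value for value, length in runs for _ in range(length)]

row = [63, 63, 63, 63, 64, 64, 64, 78, 89, 89, 89, 89]
encoded = rle_encode(row)
print(encoded)                          # [(63, 4), (64, 3), (78, 1), (89, 4)]
assert rle_decode(encoded) == row       # lossless round trip
```

Note the data-explosion risk mentioned above: a scan line with no repeated pixels doubles in size under this scheme.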
2.2.Huffman Coding [2],[3]
This is the most popular technique for removing coding redundancy.
Huffman coding works on the image brightness histogram: it finds the most commonly occurring brightness patterns and uses the shortest codes to represent these.
Typical compression ratios are 1.5:1 to 2:1. Huffman coding may also be used after run-length coding to give further compression.
An Example of Huffman coding:
Figure 2.1 illustrates the principles of Huffman coding. Assume that we wish to transmit the set of 28 data points [3].
{1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 7}
The set consists of seven distinct quantized levels, or symbols. For each symbol Si, we calculate its probability of occurrence Pi by dividing its frequency of occurrence by 28, the total number of data points. Consequently, the construction of a Huffman code for this set begins with seven nodes, one associated with each Pi. At each step we sort the Pi list in descending order, breaking ties arbitrarily. The two nodes with the smallest probabilities, Pi and Pj, are merged into a new node with probability Pi + Pj. This process continues until the probability list contains a single value, 1.0, as shown in Figure 2.1(a).
The process of merging nodes produces a binary tree as in Figure 2.1(b). The root of the
tree has probability 1.0. We obtain the Huffman code of the symbols by traversing down the
tree, assigning 1 to the left child and 0 to the right child. The resulting code words have the
prefix property. This property ensures that a coded message is uniquely decodable without
the need for look-ahead. Figure 2.1(c) summarizes the results and shows the Huffman codes for the seven symbols. We enter these code word mappings into a translation table and use the table to place the appropriate code word into the output bit stream during the reduction process.
The reduction ratio of Huffman coding depends on the distribution of the source symbols. In
our example, the original data requires three bits to represent the seven quantized levels.
After Huffman coding, we can calculate the expected code word length
    E[l] = Σ_{i=1}^{7} l_i P_i

where l_i represents the length of the Huffman code for symbol i. This value is 2.65 in our example, resulting in an expected reduction ratio of 3:2.65. The reconstruction process begins
at the root of the tree. If bit 1 is received, we traverse down the left branch, otherwise the
right branch. We continue traversing until we reach a node with no child. We then output the
symbol corresponding to this node and begin traversal from the root again. The
reconstruction process of Huffman coding perfectly recovers the original data. Therefore it is a lossless algorithm.
Figure 2.1. Illustration of Huffman coding. (a) At each step, the Pi are sorted in descending order and the two lowest Pi are merged. (b) Merging operation depicted in a binary tree. (c) Summary of Huffman coding for the data set [2].
However, a transmission error of a single bit may result in more than one decoding
error. This propagation of transmission error is a consequence of all algorithms that produce variable-length code words.
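The merge-and-sort procedure above can be sketched with a priority queue. The 0/1 branch labelling below is arbitrary, so the individual code words may differ from those in Figure 2.1, but any optimal Huffman code yields the same expected length, about 2.64 bits per symbol for this data set, in line with the roughly 2.65 quoted above:

```python
import heapq
from collections import Counter

def huffman_code(data):
    """Build a Huffman code (symbol -> bit string) by repeatedly merging
    the two lowest-probability nodes, as described above."""
    freq = Counter(data)
    # Heap entries: (frequency, tie-breaker, {symbol: code bits so far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # The later a merge happens, the closer to the root its bit sits,
        # so prepend one bit to every code in each merged subtree.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2], freq

data = [1]*7 + [2]*6 + [3]*5 + [4]*4 + [5]*3 + [6]*2 + [7]*1
code, freq = huffman_code(data)
expected_len = sum(freq[s] * len(code[s]) for s in freq) / len(data)
print(expected_len)  # ~2.64 bits/symbol, versus 3 bits for a fixed-length code
```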
2.3. Arithmetic Coding
Arithmetic coding is a lossless coding method which does not suffer from the aforementioned drawbacks and which tends to achieve a higher compression ratio than Huffman
coding. However, Huffman coding can generally be realized with simpler software and
hardware.
The basic idea behind arithmetic coding is to map the input sequence of symbols into
one single codeword. Symbol blocking is not needed since the codeword can be determined
and updated incrementally as each new symbol is input (symbol-by-symbol coding). At any
time, the determined codeword uniquely represents all the past occurring symbols. Although
the final codeword is represented using an integral number of bits, the resulting average
number of bits per symbol is obtained by dividing the length of the codeword by the number
of encoded symbols [2],[3].
Arithmetic Coding Algorithm:
1) Divide the interval [0,1] into segments corresponding to the M symbols; the segment of
each symbol has a length proportional to its probability.
2) Choose the segment of the first symbol in the string message.
3) Divide the segment of this symbol again into M new segments with lengths proportional to the symbols' probabilities.
4) From these new segments, choose the one corresponding to the next symbol in the
message.
5) Continue steps 3) and 4) until the whole message is coded.
6) Represent the segment's value by a binary fraction.
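Steps 1–5 can be sketched by narrowing [0, 1) symbol by symbol; the alphabet, probabilities and function name here are illustrative assumptions:

```python
def arithmetic_interval(message, probs):
    """Apply steps 1-5 above: repeatedly subdivide the current segment in
    proportion to the symbol probabilities, keeping the chosen symbol's part."""
    symbols = sorted(probs)             # fixed segment order within [0, 1)
    low, high = 0.0, 1.0
    for s in message:
        span = high - low
        start = low
        for sym in symbols:             # walk the segments until we reach s
            width = span * probs[sym]
            if sym == s:
                low, high = start, start + width
                break
            start += width
    return low, high

probs = {"a": 0.5, "b": 0.3, "c": 0.2}
low, high = arithmetic_interval("abc", probs)
print(low, high)    # ~[0.37, 0.40): any binary fraction in this range codes "abc"
```

Step 6 would then emit the shortest binary fraction lying inside the final interval.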
2.4. Lempel-Ziv Coding
Huffman coding and arithmetic coding require a priori knowledge of the source symbol
probabilities or of the source statistical model. In some cases, a sufficiently accurate source
model is difficult to obtain, especially when several types of data (such as text, graphics, and
natural pictures) are intermixed [4], [5].
Dictionary-based coders dynamically build a coding table (called dictionary) of variable-
length symbol strings as they occur in the input data. As the coding table is constructed,
fixed-length binary code words are assigned to the variable length input symbol strings by
indexing into the coding table. In Lempel-Ziv (LZ) coding, the decoder can also dynamically
reconstruct the coding table and the input sequence as the code bits are received without any
significant decoding delays. Although LZ codes do not explicitly make use of the source
probability distribution, they asymptotically approach the source entropy rate for very long
sequences. Because of their adaptive nature, dictionary-based codes are ineffective for short
input sequences since these codes initially result in a lot of bits being output. So, short input
sequences can result in data expansion instead of compression.
Let S be the source alphabet consisting of N symbols Sk (1 ≤ k ≤ N). The basic steps of
the LZW algorithm can be stated as follows [4]:
1. Initialize the first N entries of the dictionary with the individual source symbols of S,
2. Parse the input sequence and find the longest input string of successive symbols w (including the first still unencoded symbol s in the sequence) that has a matching entry in the dictionary.
3. Encode w by outputting the index (address) of the matching entry as the codeword for w.
4. Add to the dictionary the string ws formed by concatenating w and the next input symbol s.
5. Repeat from step 2 for the remaining input symbols starting with the symbol s, until the
entire input sequence is encoded.
Consider the source alphabet S = {S1, S2, S3, S4}. The encoding procedure is illustrated for the input sequence S1 S2 S1 S2 S3 S2 S1 S2. The constructed dictionary is shown in Table 2.1.
Table.2.1: Dictionary constructed while encoding the sequence S1 S2 S1 S2 S3 S2 S1 S2,
which is emitted by a source with alphabet S = {S1, S2, S3, S4}[4].
The resulting code is given by the fixed-length binary representation of the following
sequence of dictionary addresses: 1 2 5 3 6 2. The length of the generated binary code words depends on the maximum allowed dictionary size. If the maximum dictionary size is M entries, the length of the code words is log2(M) rounded up to the nearest integer.
The decoder constructs the same dictionary as the code words are received. The basic
decoding steps can be described as follows:
1. Start with the same initial dictionary as the encoder. Also, initialize w to be the empty
string.
2. Get the next "codeword" and decode it by outputting the symbol string m stored at address "codeword" in the dictionary.
3. Add to the dictionary the string ws formed by concatenating the previous decoded string w
(if any) and the first symbol s of the current decoded string.
4. Set w = m and repeat from step 2 until all the code words are decoded.
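The encoding and decoding steps above can be sketched as follows; replaying the worked example reproduces the address sequence 1 2 5 3 6 2 from Table 2.1 (symbols are represented as strings, and dictionary addresses are 1-based as in the text):

```python
def lzw_encode(seq, alphabet):
    """LZW encoding (steps 1-5 above); returns 1-based dictionary addresses."""
    dictionary = {(s,): i + 1 for i, s in enumerate(alphabet)}
    out, w = [], ()
    for s in seq:
        if w + (s,) in dictionary:
            w = w + (s,)                      # keep growing the matched string
        else:
            out.append(dictionary[w])         # emit the address of w
            dictionary[w + (s,)] = len(dictionary) + 1
            w = (s,)
    if w:
        out.append(dictionary[w])
    return out

def lzw_decode(codes, alphabet):
    """Rebuild the same dictionary while decoding the address stream."""
    dictionary = {i + 1: (s,) for i, s in enumerate(alphabet)}
    out, w = [], ()
    for c in codes:
        m = dictionary.get(c, w + w[:1])      # c may be the entry just being built
        out.extend(m)
        if w:
            dictionary[len(dictionary) + 1] = w + m[:1]
        w = m
    return out

seq = ["S1", "S2", "S1", "S2", "S3", "S2", "S1", "S2"]
codes = lzw_encode(seq, ["S1", "S2", "S3", "S4"])
print(codes)                                  # [1, 2, 5, 3, 6, 2]
assert lzw_decode(codes, ["S1", "S2", "S3", "S4"]) == seq
```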
2.5. Predictive Coding [1],[5]
Original Image
63 63 63 63 64 64 64 78 89 89 89 89
Compressed Image
63,0,0,0,1,0,0,14,11,0,0,0
Stores the difference between successive pixels' brightness values in fewer bits.
Relies on the image having smooth changes in brightness: at sharp changes in the image we need overload patterns.
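A sketch of this differencing scheme on the row above (ignoring the packing of the small differences into fewer bits):

```python
def predictive_encode(pixels):
    """Keep the first pixel, then the difference from each previous pixel."""
    return [pixels[0]] + [b - a for a, b in zip(pixels, pixels[1:])]

def predictive_decode(diffs):
    """Undo the differencing by a running sum."""
    out = [diffs[0]]
    for d in diffs[1:]:
        out.append(out[-1] + d)
    return out

row = [63, 63, 63, 63, 64, 64, 64, 78, 89, 89, 89, 89]
print(predictive_encode(row))   # [63, 0, 0, 0, 1, 0, 0, 14, 11, 0, 0, 0]
assert predictive_decode(predictive_encode(row)) == row
```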
2.6. FELICS (Fast, Efficient, and Lossless Image Compression System)
It is a special-purpose compression method designed for greyscale images and it
competes with the lossless mode of JPEG. It is fast and it generally produces good
compression. However, it cannot compress an image to below one bit per pixel, so it is not a
good choice for bi-level or for highly redundant images [2].
The principle of FELICS is to code each pixel with a variable-size code based on the
values of two of its previously seen neighbour pixels. Figure 2.2(a) shows the two known neighbours A and B of some pixel P. For a general pixel, these are the neighbours above it and to its left. For a pixel in the top row, these are its two left neighbours (except for the first two pixels of the image). For a pixel in the leftmost column, these are the first two pixels of the line above it. Notice that the first two pixels of the image don't have any previously seen
neighbours, but since there are only two of them, they can be output without any encoding,
causing just a slight degradation in the overall compression.
Consider the two neighbours A and B of a pixel P. We use A, B, and P to denote both
the three pixels and their intensities (greyscale values). We denote by L and H the neighbours
with the smaller and the larger intensities, respectively. Pixel P should be assigned a variable-
size code depending on where the intensity P is located relative to L and H. There are three
cases:
1. The intensity of pixel P is between L and H (it is located in the central region of
Figure.2.2.(b)). This case is known experimentally to occur in about half the pixels, and P is
assigned, in this case, a code that starts with 0. The probability that P will be in this central
region is almost, but not completely, flat, so P should be assigned a binary code that has
about the same size in the entire region but is slightly shorter at the centre of the region.
2. The intensity of P is lower than L (P is in the left region). The code assigned to P in this
case starts with 10.
Figure 2.2 (a) The two neighbours. (b) The three regions [2].
Table 2.2. The codes for the central region [2].

Pixel P    Region code    Pixel code
L = 15     0              0000
16         0              0010
17         0              010
18         0              011
19         0              100
20         0              101
21         0              110
22         0              111
23         0              0001
H = 24     0              0011
3. P's intensity is greater than H. P is assigned a code that starts with 11.
When pixel P is in one of the outer regions, the probability that its intensity will differ from L or H by much is small, so P can be assigned a long code in these cases. The code assigned to P should therefore depend heavily on whether P is in the central region or in one of the outer regions.
Here is how the code is assigned when P is in the central region. We need H − L + 1 variable-size codes that will not differ much in size and will, of course, satisfy the prefix property. We set k = ⌊log2(H − L + 1)⌋ and compute integers a and b by

    a = 2^(k+1) − (H − L + 1),    b = 2(H − L + 1 − 2^k).

Example: If H − L = 9, then k = 3, a = 2^(3+1) − (9 + 1) = 6, and b = 2(9 + 1 − 2^3) = 4. We now select the a codes 2^k − 1, 2^k − 2, . . . expressed as k-bit numbers, and the b codes 0, 1, 2, . . . expressed as (k + 1)-bit numbers. In the example above, the a codes are 8 − 1 = 111, 8 − 2 = 110, through 8 − 6 = 010, and the b codes are 0000, 0001, 0010, and 0011. Table 2.2 shows how ten such codes can be assigned in the case L = 15, H = 24.
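The k, a, b computation above is easy to check in code; for L = 15, H = 24 it reproduces the six 3-bit and four 4-bit codes of Table 2.2 (the function name is illustrative):

```python
import math

def central_region_params(L, H):
    """Number of k-bit codes (a) and (k+1)-bit codes (b) for the central region."""
    d = H - L + 1                       # number of intensity values in [L, H]
    k = int(math.floor(math.log2(d)))
    a = 2 ** (k + 1) - d                # count of the shorter, k-bit codes
    b = 2 * (d - 2 ** k)                # count of the longer, (k+1)-bit codes
    return k, a, b

k, a, b = central_region_params(15, 24)
print(k, a, b)              # 3 6 4
assert a + b == 24 - 15 + 1  # every value in [L, H] gets exactly one code
```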
Chapter 3
LOSSY COMPRESSION
The compression achieved via lossless schemes is often inadequate to cope with the
volume of image data involved. Thus, lossy schemes (also called irreversible) have to be
employed, which aim at obtaining a more compact representation of the image at the cost of
some data loss, which however might not correspond to an equal amount of information loss.
In other words, although the original image cannot be fully reconstructed, the degradation that it has undergone is not visible to a human observer for the purposes of the specific task.
Compression ratios achieved through lossy compression range from 4:1 to 100:1 or even
higher.
3.1. Performance Evaluation Parameters
To compare different lossy compression algorithms, several approaches to measuring the loss of quality have been devised. In the medical imaging (MI) context, where the ultimate use of an image is its visual assessment and interpretation, subjective and diagnostic evaluation approaches are
the most appropriate. However, these are largely dependent on the specific task at hand and
moreover they entail costly and time-consuming procedures. In spite of the fact that they are
often inadequate in predicting the visual (perceptual) quality of the decompressed image,
objective measures are often used since they are easy to compute and are applicable to all
kinds of images regardless of the application.
Compression ratio is defined as the nominal bit depth of the original image in bits per
pixel (bpp) divided by the bpp necessary to store the compressed image. For each compressed
and reconstructed image, an error image is calculated. From the error data, the maximum absolute error (MAE), mean square error (MSE), root mean square error (RMSE), signal-to-noise ratio (SNR), and peak signal-to-noise ratio (PSNR) are calculated [7],[8].
The maximum absolute error (MAE) is calculated as

    MAE = max |f(x, y) − f*(x, y)|

where f(x, y) is the original image data and f*(x, y) is the compressed image value. The formulae for the calculated image metrics are:

    MSE = (1 / NM) Σ_{x=0}^{N−1} Σ_{y=0}^{M−1} [f(x, y) − f*(x, y)]²

    RMSE = √MSE

where M and N are the matrix dimensions in x and y, respectively.

    SNR = 10 log10 [ Σ_{x=0}^{N−1} Σ_{y=0}^{M−1} f(x, y)² / Σ_{x=0}^{N−1} Σ_{y=0}^{M−1} (f(x, y) − f*(x, y))² ]

    PSNR = 20 log10 (255 / RMSE)
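These definitions translate directly into code. The sketch below flattens the images to plain lists and assumes 8-bit data (hence the 255 in the PSNR); it does not guard against identical images, where the RMSE is zero:

```python
import math

def quality_metrics(f, g):
    """MAE, MSE, RMSE, SNR and PSNR between original f and reconstruction g,
    both given as flat lists of pixel values."""
    err2 = sum((a - b) ** 2 for a, b in zip(f, g))
    mae = max(abs(a - b) for a, b in zip(f, g))
    mse = err2 / len(f)
    rmse = math.sqrt(mse)
    snr = 10 * math.log10(sum(a * a for a in f) / err2)
    psnr = 20 * math.log10(255 / rmse)   # 255 = peak value for 8-bit images
    return mae, mse, rmse, snr, psnr

mae, mse, rmse, snr, psnr = quality_metrics([52, 55, 61, 66], [52, 54, 61, 67])
print(mae, mse, round(psnr, 2))          # 1 0.5 51.14
```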
3.2. Transform Coding
In transform coding, a block of correlated pixels is transformed into a set of less
correlated coefficients. The transform to be used for data compression should satisfy two
objectives. Firstly, it should provide energy compaction: the energy in the transform coefficients should be concentrated in as few coefficients as possible. Secondly, it should minimize the statistical correlation between the transform coefficients. As a consequence, transform coding has a good capability for data compression, because not all transform coefficients need to be transmitted in order to obtain good image quality, and even those that are transmitted need not be represented with full accuracy. In addition, the
transform domain coefficients are generally related to the spatial frequencies in the image and
hence the compression techniques can exploit the psycho-visual properties of the HVS, by
quantizing the higher frequency coefficients more coarsely, as the HVS is more sensitive to
the lower frequency coefficients [2].
3.2.1. The Discrete Cosine Transform
The important feature of the DCT is that it takes correlated input data and concentrates
its energy in just the first few transform coefficients. If the input data consists of correlated
quantities, then most of the n transform coefficients produced by the DCT are zeros or small
numbers, and only a few are large (normally the first ones). The early coefficients contain the
important (low-frequency) image information and the later coefficients contain the less-
important (high-frequency) image information. Compressing data with the DCT is therefore
done by quantizing the coefficients. The small ones are quantized coarsely (possibly all the
way to zero), and the large ones can be quantized finely to the nearest integer. After quantization, the coefficients (or the variable-size codes assigned to them) are written on the compressed stream. Decompression is done by performing the inverse DCT on the quantized coefficients. This results in data items that are not identical to the original ones but are not much different.
The DCT is applied to small parts (data blocks) of the image. It is computed by applying
the DCT in one dimension to each row of a data block, then to each column of the result
[2],[7],[8],[9],[11],[12]. Because of the special way the DCT in two dimensions is computed,
we say that it is separable in the two dimensions. Because it is applied to blocks of an image,
we term it a blocked transform. It is defined by

    G_ij = (2 / √(mn)) C_i C_j Σ_{x=0}^{n−1} Σ_{y=0}^{m−1} p_xy cos[(2y + 1)jπ / 2m] cos[(2x + 1)iπ / 2n]

for 0 ≤ i ≤ n − 1 and 0 ≤ j ≤ m − 1, and for C_i and C_j defined below. The first coefficient G_00 is termed the DC coefficient, and the remaining coefficients are called the AC coefficients. The image is broken up into blocks of n×m pixels p_xy (with n = m = 8 typically), and the equation above is used to produce a block of n×m DCT coefficients G_ij for each block of pixels. The coefficients are then quantized, which results in lossy but highly efficient compression. The decoder reconstructs a block of quantized data values by computing the IDCT, whose definition is

    p_xy = (2 / √(mn)) Σ_{i=0}^{n−1} Σ_{j=0}^{m−1} C_i C_j G_ij cos[(2x + 1)iπ / 2n] cos[(2y + 1)jπ / 2m]

where

    C_f = 1/√2 for f = 0,    C_f = 1 for f > 0,

for 0 ≤ x ≤ n − 1 and 0 ≤ y ≤ m − 1.
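A direct (unoptimized) implementation of the G_ij definition above; feeding it a uniform block shows the energy-compaction property, with all the energy landing in the DC coefficient. A separable row/column implementation would be faster but gives the same result:

```python
import math

def dct2(block):
    """2-D DCT of an n x m block, computed straight from the definition above."""
    n, m = len(block), len(block[0])
    C = lambda f: 1 / math.sqrt(2) if f == 0 else 1.0
    G = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * i * math.pi / (2 * n))
                    * math.cos((2 * y + 1) * j * math.pi / (2 * m))
                    for x in range(n) for y in range(m))
            G[i][j] = (2 / math.sqrt(n * m)) * C(i) * C(j) * s
    return G

# A flat 8x8 block: all of its energy compacts into the DC coefficient G[0][0].
G = dct2([[100] * 8 for _ in range(8)])
print(round(G[0][0]))   # 800; every AC coefficient is ~0
```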
Steps involved in the DCT image compression technique:
1. The image is divided into k blocks of 8×8 pixels each. The pixels are denoted by p_xy. If the number of image rows (columns) is not divisible by 8, the bottom row (rightmost column) is duplicated as many times as needed.
2. The DCT in two dimensions is applied to each block B_i. The result is a block (we'll call it a vector) W^(i) of 64 transform coefficients w_j^(i) (where j = 0, 1, . . . , 63). The k vectors W^(i) become the rows of matrix W:

        | w_0^(1)  w_1^(1)  . . .  w_63^(1) |
        | w_0^(2)  w_1^(2)  . . .  w_63^(2) |
    W = |    .        .                .    |
        |    .        .                .    |
        | w_0^(k)  w_1^(k)  . . .  w_63^(k) |

3. The 64 columns of W are denoted by C^(0), C^(1), . . . , C^(63). The k elements of C^(j) are w_j^(1), w_j^(2), . . . , w_j^(k). The first coefficient vector C^(0) consists of the k DC coefficients.
4. Each vector C^(j) is quantized separately to produce a vector Q^(j) of quantized coefficients. The elements of Q^(j) are then written on the compressed stream. In practice, variable-size codes are assigned to the elements, and the codes, rather than the elements themselves, are written on the compressed stream.
3.2.1.1 JPEG
Most high-quality algorithms today use some form of transform coder. One widely used
standard is the JPEG compression algorithm, based on the discrete cosine transform (DCT).
The image is partitioned into 8×8 blocks, each of which is then transformed via a tensor product of two 8-point DCTs. The transform coefficients are then arranged into 64 subbands,
scalar-quantized, and adaptively Huffman coded [7]. The JPEG algorithm is discussed in
detail in the next chapter.
3.2.2. Wavelets
Wavelets are functions defined over a finite interval and having an average value of
zero. The basic idea of the wavelet transform is to represent any arbitrary function as a
superposition of a set of such wavelets or basis functions. These basis functions or baby
wavelets are obtained from a single prototype wavelet called the mother wavelet, by dilations
or contractions (scaling) and translations (shifts).
Wavelet methods involve overlapping transforms with varying-length basis functions.
The overlapping nature of the transform (each pixel contributes to several output points)
alleviates blocking artifacts, while the multiresolution character of the wavelet decomposition
leads to superior energy compaction and perceptual quality of the decompressed image.
Furthermore, the multiresolution transform domain means that wavelet compression methods
degrade much more gracefully than block-DCT methods as the compression ratio increases.
One wavelet algorithm, the embedded zerotree wavelet (EZW) coder, yields acceptable
compression at a ratio of 100:1. Wavelet coding schemes are especially suitable for
applications where scalability and tolerable degradation are important[7],[8],[9],[12],[13].
The wavelet transform by itself achieves no compression: it decomposes the image into different frequency bands, and the actual compression is done by quantisation and entropy coding. There are many ways of decomposing an image, depending on the wavelet method used, each involving a different algorithm and resulting in subbands with different energy compactions. Some methods are given below [2].
1. Line: This technique is a simpler version of the standard wavelet decomposition. The
wavelet transform is applied to each row of the image, resulting in smooth coefficients on the
left (subband L1) and detail coefficients on the right (subband H1). Subband L1 is then partitioned into L2 and H2, and the process is repeated until the entire coefficient matrix is
turned into detail coefficients, except the leftmost column, which contains smooth
coefficients. The wavelet transform is then applied recursively to the leftmost column,
resulting in one smooth coefficient at the top-left corner of the coefficient matrix. This last
step may be omitted if the compression method being used requires that image rows be
individually compressed.
This technique exploits correlations only within an image row to calculate the
transform coefficients. Also, discarding a coefficient that is located on the leftmost column
may affect just a particular group of rows and may this way introduce artifacts into the
reconstructed image. Implementation of this method is simple, and execution is fast, about
twice that of the standard decomposition. This type of decomposition is illustrated in Fig.3.1.
It is possible to apply this decomposition to the columns of the image instead of the rows. Ideally, the transform should be applied in the direction of highest image redundancy, and
experience suggests that for natural images this is the horizontal direction. Thus, in practice,
line decomposition is applied to the image rows.
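One level of this row-wise decomposition can be sketched with the simple Haar filter (pairwise averages and differences); practical coders use longer wavelet filters, but the L1 | H1 layout is the same:

```python
def haar_row_step(row):
    """One level of line decomposition of a single row: smooth (average)
    coefficients on the left (L1), detail (difference) coefficients on the
    right (H1). Assumes an even-length row."""
    smooth = [(row[i] + row[i + 1]) / 2 for i in range(0, len(row), 2)]
    detail = [(row[i] - row[i + 1]) / 2 for i in range(0, len(row), 2)]
    return smooth + detail

row = [63, 63, 64, 64, 78, 78, 89, 89]
print(haar_row_step(row))   # [63.0, 64.0, 78.0, 89.0, 0.0, 0.0, 0.0, 0.0]
```

Repeating the step on the smooth part (L1 → L2, H2, and so on) yields the full line decomposition.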
2. Quincunx: Somewhat similar to the Laplacian pyramid, quincunx decomposition proceeds level by level and decomposes subband Li of level i into subbands Hi+1 and Li+1 of level i + 1. Figure 3.2 illustrates this type of decomposition. It is efficient and computationally simple. On
average, it achieves more than four times the energy compaction of the line method.
Quincunx decomposition results in fewer subbands than most other wavelet
decompositions, a feature that may lead to reconstructed images with slightly lower visual
quality. The method is not used much in practice.
3. Pyramid: The pyramid decomposition is by far the most common method used to
decompose images that are wavelet transformed. It results in subbands with horizontal,
vertical, and diagonal image details, as illustrated by Figure 3.3. The three subbands at each
level contain horizontal, vertical, and diagonal image features at a particular scale, and each
scale is divided by an octave in spatial frequency (division of the frequency by two).
Figure 3.1: Line wavelet decomposition[2]
Figure 3.2: Quincunx wavelet decomposition[2]
Figure 3.3: Pyramid wavelet decomposition[2].
Pyramid decomposition turns out to be a very efficient way of transferring significant visual
data to the detail coefficients. Its computational complexity is about 30% higher than that of the quincunx method, but its image reconstruction abilities are higher. The reasons for the popularity of the pyramid method may be that (1) it is symmetrical, and (2) its mathematical description is simple.
The quincunx method leaves the high-frequency subband untouched, whereas the pyramid method resolves it into two bands. On the other hand, pyramid decomposition involves more
computations in order to spatially resolve the asymmetric high-frequency band into two
symmetric high frequency and low-frequency bands.
4. Standard: The first step in the standard decomposition is to apply whatever discrete wavelet filter is being used to all the rows of the image, obtaining subbands L1 and H1. This is repeated on L1 to obtain L2 and H2, and so on, k times. This is followed by a second step where a similar calculation is applied k times to the columns. If k = 1, the decomposition alternates between rows and columns, but k may be greater than 1. The end result is one smooth coefficient at the top-left corner of the coefficient matrix. This method is somewhat similar to line decomposition.
Standard decomposition has the second-highest reconstruction quality of all the methods
described here. The reason for the improvement compared to the pyramid decomposition may
be that the higher directional resolution gives thresholding a better chance to cover larger
uniform areas. On the other hand, standard decomposition is computationally more expensive
than pyramid decomposition.
5. Adaptive Wavelet Packet Decomposition: The idea is to skip those subbands splits that do
not contribute significantly to energy compaction. The result is a coefficient matrix with
subbands of different (possibly even many) sizes. The justification for this complex
decomposition method is the prevalence of continuous tone (natural) images. These images
are mostly smooth but normally also have some regions with high frequency data. Such
regions should end up as many small subbands (to better enable an accurate spatial frequency
representation of the image), with the rest of the image giving rise to a few large subbands.
The difficulty with this type of decomposition lies in finding an algorithm that determines
which subband splits can be skipped. Such an algorithm uses entropy calculations and should
be efficient: it should identify as many of the unnecessary splits as possible.
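A minimal sketch of such a split decision, assuming a simple normalized-energy entropy as the cost measure (real coders use a variety of cost functions); the split is kept only when it lowers the total cost:

```python
import math

def cost(band):
    """Energy-entropy cost of a subband: lower means more compact."""
    e = sum(x * x for x in band)
    if e == 0:
        return 0.0
    return -sum((x * x / e) * math.log2(x * x / e) for x in band if x != 0)

def should_split(parent, children):
    """Split a subband only if the children are cheaper in total."""
    return sum(cost(c) for c in children) < cost(parent)

# A smooth band gains nothing from splitting; a band mixing a few large
# coefficients with zeros does.
print(should_split([1.0, 1.0, 1.0, 1.0], [[1.0, 1.0], [1.0, 1.0]]))   # False
print(should_split([5.0, 0.0, 0.0, 3.0], [[5.0, 0.0], [0.0, 3.0]]))   # True
```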
This type of decomposition has the highest reproduction quality of all the methods
discussed here, a feature that may justify the high computational costs in certain special
applications. This quality, however, is not much higher than what is achieved with simpler
decomposition methods, such as standard, pyramid, or quincunx.
Figure 3.4: Standard wavelet decomposition[2].
The quantization and encoding steps are the crucial aspects of wavelet transform
compression, because they are where the actual compression takes place. Each of the
algorithms described below takes a different approach to these steps[2].
1. SPIHT (set partitioning in hierarchical trees) algorithm
2. EZW (embedded zerotree wavelet) algorithm
3.2.2.1 SPIHT (Set Partitioning In Hierarchical Trees) Algorithm
Regardless of the particular filter used, the image is decomposed into subbands, such
that lower subbands correspond to higher image frequencies (they are the highpass levels)
and higher subbands correspond to lower image frequencies (lowpass levels), where most of
the image energy is concentrated (Figure 3.5). This is why we can expect the detail
coefficients to get smaller as we move from high to low levels. Also, there are spatial
similarities among the subbands. An image part, such as an edge, occupies the same spatial
position in each subband. These features of the wavelet decomposition are exploited by the
SPIHT method.
SPIHT was designed for optimal progressive transmission, as well as for compression.
One of the important features of SPIHT (perhaps a unique feature) is that at any point during
the decoding of an image, the quality of the displayed image is the best that can be achieved
for the number of bits input by the decoder up to that moment.
Another important SPIHT feature is its use of embedded coding. This feature is defined
as follows: If an encoder produces two files, a large one of size M and a small one of size m,
then the smaller file is identical to the first m bits of the larger file. The following example
aptly illustrates the meaning of this definition. Suppose that three users wait for you to send
them a certain compressed image, but they need different image qualities. The first one needs
the quality contained in a 10 Kb file. The image qualities required by the second and third
users are contained in files of sizes 20 Kb and 50 Kb, respectively. Most lossy image
compression methods would have to compress the same image three times, at different
qualities, to generate three files with the right sizes. SPIHT, on the other hand, produces one
file, and then three chunks of lengths 10 Kb, 20 Kb, and 50 Kb, all starting at the beginning
of that file can be sent to the three users, thereby satisfying their needs.
Another principle is based on the observation that the most significant bits of a binary
integer whose value is close to maximum tend to be ones. This suggests that the most
significant bits contain the most important image information, and that they should be sent to
the decoder first (or written first on the compressed stream). The progressive transmission
method used by SPIHT incorporates these two principles.
Figure 3.5: Subbands and levels in wavelet decomposition[13]
The main steps of the SPIHT encoder are as follows [13],[15],[16]:
Step 1: Given an image to be compressed, perform its wavelet transform using any suitable
wavelet filter, decompose it into transform coefficients ci,j, and represent the resulting
coefficients with a fixed number of bits. Set n = ⌊log2 max(i,j) |ci,j|⌋.
Step 2: Sorting pass: Transmit the number l of coefficients ci,j that satisfy
2^n ≤ |ci,j| < 2^(n+1). Follow with the l pairs of coordinates and the l sign bits of those
coefficients.
Step 3: Refinement pass: Transmit the nth most significant bit of all the coefficients
satisfying |ci,j| ≥ 2^(n+1). These are the coefficients that were selected in previous sorting
passes.
Step 4: Iterate: Decrement n by 1. If more iterations are needed go back to Step 2.
The last iteration is normally performed for n = 0, but the encoder can stop earlier, in
which case the least important image information (some of the least significant bits of all the
wavelet coefficients) will not be transmitted. This is the natural lossy option of SPIHT. It is
equivalent to scalar quantization, but it produces better results than what is usually achieved
with scalar quantization, since the coefficients are transmitted in sorted order.
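The choice of n in Step 1 and the per-pass selection of Step 2 can be sketched as follows, using a few illustrative coefficient values keyed by coordinate:

```python
import math

# Illustrative wavelet coefficients keyed by (row, col).
coeffs = {(0, 0): 63, (0, 1): -34, (1, 0): 49, (1, 1): 10}

# Step 1: n = floor(log2 of the largest magnitude).
n = int(math.floor(math.log2(max(abs(c) for c in coeffs.values()))))
print(n)   # 5, since 2^5 <= 63 < 2^6

# One sorting pass per bit plane: pick coefficients with 2^n <= |c| < 2^(n+1).
passes = []
while n >= 0:
    sig = sorted(k for k, c in coeffs.items()
                 if 2 ** n <= abs(c) < 2 ** (n + 1))
    passes.append((n, sig))
    n -= 1
```

The first pass (n = 5) selects the coefficients 63, −34, and 49; the coefficient 10 is not selected until the pass with n = 3.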
Partitioning Sorting Algorithm:
The algorithm used by SPIHT is based on the realization that there is really no need to
sort all the coefficients. The main task of the sorting pass in each iteration is to select those
coefficients that satisfy 2^n ≤ |ci,j| < 2^(n+1). This task is divided into two parts. For a given
value of n, if a coefficient ci,j satisfies |ci,j| ≥ 2^n, then we say that it is significant;
otherwise, it is called insignificant. In the first iteration, relatively few coefficients will be
significant, but their number increases from iteration to iteration, because n keeps getting
decremented. The sorting pass has to determine which of the significant coefficients satisfies
|ci,j| < 2^(n+1) and transmit their coordinates to the decoder. This is an important part of the
algorithm used by SPIHT. The encoder partitions all the coefficients into a number of sets Tk
and performs the
significance test

    max(i,j)∈Tk |ci,j| ≥ 2^n

on each set Tk. The result may be either no (all the coefficients in Tk are insignificant, so Tk
itself is considered insignificant) or yes (some coefficients in Tk are significant, so Tk itself
is significant). This result is transmitted to the decoder. If the result is yes, then Tk is
partitioned by both encoder and decoder, using the same rule, into subsets and the same
significance test is performed on all the subsets. This partitioning is repeated until all the
significant sets are reduced to size 1 (i.e., they contain one coefficient each, and that
coefficient is significant). This is how the significant coefficients are identified by the sorting
pass in each iteration. The significance test performed on a set T can be summarized by

    Sn(T) = 1, if max(i,j)∈T |ci,j| ≥ 2^n
            0, otherwise

Spatial Orientation Trees:
The sets Tk are created and partitioned using a special data structure called a spatial
orientation tree. The spatial orientation trees are illustrated in Figure 3.6(a),(b) for a 16×16
image. The figure shows two levels, level 1 (the high pass) and level 2 (the low pass). Each
level is divided into four subbands. Subband LL2 (the low pass subband) is divided into four
groups of 2×2 coefficients each. Figure 3.6(a) shows the top-left group, and Figure 3.6(b)
shows the bottom-right group. In each group, each of the four coefficients (except the top-left
one, marked in gray) becomes the root of a spatial orientation tree. The arrows show
examples of how the various levels of these trees are related. The thick arrows indicate how
each group of 4×4 coefficients in level 2 is the parent of four such groups in level 1. In
general, a coefficient at location (i, j) in the image is the parent of the four coefficients at
locations (2i, 2j), (2i + 1, 2j), (2i, 2j + 1), and (2i + 1, 2j + 1).
The set partitioning sorting algorithm uses the following four sets of coordinates:
1. O(i, j): the set of coordinates of the four offspring of node (i, j). If node (i, j) is a leaf of a
spatial orientation tree, then O(i, j) is empty.
2. D(i, j): the set of coordinates of the descendants of node (i, j).
3. H: the set of coordinates of the roots of all the spatial orientation trees (3/4 of the
wavelet coefficients in the highest LL subband).
4. L(i, j): the difference set D(i, j) − O(i, j). This set contains all the descendants of tree node
(i, j), except its four offspring.
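The parent-offspring relation and the sets O, D, and L can be sketched directly from the (2i, 2j) rule above; `size` is the image side length, and a leaf is detected when its children would fall outside the image:

```python
def offspring(i, j, size):
    """O(i, j): the four children of (i, j), or [] for a leaf."""
    if 2 * i >= size or 2 * j >= size:
        return []
    return [(2 * i, 2 * j), (2 * i + 1, 2 * j),
            (2 * i, 2 * j + 1), (2 * i + 1, 2 * j + 1)]

def descendants(i, j, size):
    """D(i, j): all descendants of (i, j), gathered level by level."""
    out, frontier = [], offspring(i, j, size)
    while frontier:
        out.extend(frontier)
        frontier = [c for p in frontier for c in offspring(p[0], p[1], size)]
    return out

def l_set(i, j, size):
    """L(i, j) = D(i, j) - O(i, j): descendants minus the four offspring."""
    return descendants(i, j, size)[4:]

print(offspring(1, 1, 8))          # the four children of node (1, 1)
print(len(descendants(1, 1, 8)))   # 4 children + 16 grandchildren = 20
```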
Figure 3.6: Spatial orientation trees in SPIHT[13]
The spatial orientation trees are used to create and partition the sets Tk. The set partitioning
rules are as follows:
1. The initial sets are {(i, j)} and D(i, j), for all (i, j) ∈ H.
2. If set D(i, j) is significant, then it is partitioned into L(i, j) plus the four single-element sets
with the four offspring of (i, j). In other words, if any of the descendants of node (i, j) is
significant, then its four offspring become four new sets and all its other descendants become
another set (to be significance tested in rule 3).
3. If L(i, j) is significant, then it is partitioned into the four sets D(k, l), where (k, l) are the
offspring of (i, j).
Once the spatial orientation trees and the set partitioning rules are understood, the coding
algorithm can be described.
SPIHT Coding [2]:
It is important to have the encoder and decoder test sets for significance in the same
way. The coding algorithm therefore uses three lists called list of significant pixels (LSP), list
of insignificant pixels (LIP), and list of insignificant sets (LIS). These are lists of coordinates
(i, j) that in the LIP and LSP represent individual coefficients, and in the LIS represent either
the set D(i, j) (a type A entry) or the set L(i, j) (a type B entry). The LIP contains coordinates
of coefficients that were insignificant in the previous sorting pass. In the current pass they are
tested, and those that test significant are moved to the LSP. In a similar way, sets in the LIS
are tested in sequential order, and when a set is found to be significant, it is removed from the
LIS and is partitioned. The new subsets with more than one coefficient are placed back in the
LIS, to be tested later, and the subsets with one element are tested and appended to the LIP or
the LSP, depending on the results of the test. The refinement pass transmits the nth most
significant bit of the entries in the LSP. The algorithm is given below [2].
1. Set the threshold. Set the LIP to the coefficients of all root nodes. Set the LIS to all trees
(assigning them type D). Set the LSP to the empty set.
2. Sorting pass: Check the significance of all coefficients in LIP:
2.1 If significant, output 1, output a sign bit, and move the coefficient to the LSP.
2.2 If not significant, output 0.
3. Check the significance of all trees in the LIS according to the type of tree:
3.1 For a tree of type D:
3.1.1 If it is significant, output 1, and code its children:
3.1.1.1 If a child is significant, output 1, then a sign bit, add it to the LSP
3.1.1.2 If a child is insignificant, output 0 and add the child to the end of LIP.
3.1.1.3 If the children have descendants, move the tree to the end of LIS as type L, otherwise
remove it from LIS.
3.1.2 If it is insignificant, output 0.
3.2 For a tree of type L:
3.2.1 If it is significant, output 1, add each of the children to the end of LIS as an entry of
type D and remove the parent tree from the LIS.
3.2.2 If it is insignificant, output 0.
4. Loop: Decrement the threshold and go to step 2 if needed.
3.2.2.2 EZW (Embedded Coding Using Zerotrees of Wavelet Coefficients)
The EZW method, as implemented in practice, starts by performing the 9-tap
symmetric quadrature mirror filter (QMF) wavelet transform. The main loop is then repeated
for values of the threshold that are halved at the end of each iteration. The threshold is used to
calculate a significance map of significant and insignificant wavelet coefficients. Zerotrees
are used to represent the significance map in an efficient way. The main steps are as
follows [2],[13],[17],[18],[19]:
1. Initialization: Set the threshold T to the smallest power of 2 that is greater than
max(i,j) |ci,j|/2, where ci,j are the wavelet coefficients.
2. Significance map coding: Scan all the coefficients in a predefined way and output a symbol
when |ci,j| > T. When the decoder inputs this symbol, it sets the reconstructed value of ci,j to
±1.5T (the sign is carried by the symbol).
3. Refinement: Refine each significant coefficient by sending one more bit of its binary
representation. When the decoder receives this, it increments the current coefficient
value by 0.25T.
4. Set T= T/2, and go to step 2 if more iterations are needed.
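A toy version of this loop, which ignores the zerotree structure and scan order and simply applies the 1.5T significance and 0.25T refinement rules to a flat list of coefficients, reproduces the refined values 56, 40, 56, and 40 of the worked example later in this section:

```python
def ezw_toy(coeffs, iterations):
    """Simplified EZW main loop on a flat coefficient list (no zerotrees)."""
    T = 1
    m = max(abs(c) for c in coeffs)
    while T <= m / 2:                  # smallest power of 2 greater than max/2
        T *= 2
    recon = [0.0] * len(coeffs)
    for _ in range(iterations):
        for k, c in enumerate(coeffs):          # significance map pass
            if recon[k] == 0 and abs(c) > T:
                recon[k] = 1.5 * T if c > 0 else -1.5 * T
        for k, c in enumerate(coeffs):          # refinement: one more bit each
            if recon[k] != 0:
                sgn = 1 if recon[k] > 0 else -1
                step = 0.25 * T
                recon[k] += sgn * (step if abs(c) > abs(recon[k]) else -step)
        T /= 2
    return recon

print(ezw_toy([63, -34, 49, 47], 1))   # [56.0, -40.0, 56.0, 40.0]
```

A second iteration narrows the uncertainty intervals further, giving 60, −36, 52, and 44.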
A wavelet coefficient ci,j is considered insignificant with respect to the current threshold
T if |ci,j| ≤ T. The zerotree data structure is based on the following well-known experimental
result: If a wavelet coefficient at a coarse scale (i.e., high in the image pyramid) is
insignificant with respect to a given threshold T, then all of the coefficients of the same
orientation in the same spatial location at finer scales (i.e., located lower in the pyramid) are
very likely to be insignificant with respect to T.
In each iteration, all the coefficients are scanned in the order shown in Figure 3.7(a).
This guarantees that when a node is visited, all its parents will already have been scanned.
The scan starts at the lowest frequency subband LLn, continues with subbands HLn, LHn, and
HHn, and drops to level n−1, where it scans HLn−1, LHn−1, and HHn−1. Each subband is
fully scanned before the algorithm proceeds to the next subband.
Each coefficient visited in the scan is classified as a zerotree root (ZTR), an isolated
zero (IZ), positive significant (POS), or negative significant (NEG). A zerotree root is a
coefficient that is insignificant and all its descendants (in the same spatial orientation tree) are
also insignificant. Such a coefficient becomes the root of a zerotree. It is encoded with a
special symbol (denoted by ZTR), and the important point is that its descendants don't
to be encoded in the current iteration. When the decoder inputs a ZTR symbol, it assigns a
zero value to the coefficient and to all its descendants in the spatial orientation tree. Their
values get improved (refined) in subsequent iterations. An isolated zero is a coefficient that is
insignificant but has some significant descendants. Such a coefficient is encoded with the
special IZ symbol. The other two classes are coefficients that are significant and are positive
or negative. The flowchart of Figure 3.7(b) illustrates this classification. Notice that a coefficient
is classified into one of five classes, but the fifth class (a zerotree node) is not encoded.
Coefficients in the lowest pyramid level don't have any children, so they cannot be the
roots of zerotrees. Thus, they are classified into isolated zero, positive significant, or negative
significant. The zerotree can be viewed as a structure that helps to find insignificance. Most
methods that try to find structure in an image try to find significance.
Figure 3.7: (a) Scanning a zerotree. (b) Classifying a coefficient[18]
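The four-way classification can be sketched as a small function; the descendant lists passed in below are illustrative values taken from the example that follows:

```python
def classify(c, descendants, T):
    """Classify one coefficient against threshold T (cf. Figure 3.7(b))."""
    if abs(c) > T:
        return 'POS' if c > 0 else 'NEG'
    if all(abs(d) <= T for d in descendants):
        return 'ZTR'        # insignificant, and so is its whole tree
    return 'IZ'             # insignificant, but a descendant is significant

print(classify(63, [], 32))                      # POS
print(classify(-31, [15, 14, -9, -7, 47], 32))   # IZ: the 47 is significant
print(classify(23, [3, -12, -14, 8], 32))        # ZTR
```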
Two lists are used by the encoder (and also by the decoder, which works in lockstep) in
the scanning process. The dominant list contains the coordinates of the coefficients that have
not been found to be significant. They are stored in scan order, by pyramid levels, and
within each level by subbands. The subordinate list contains the magnitudes (not coordinates)
of the coefficients that have been found to be significant. Each list is scanned once per
iteration. An iteration consists of a dominant pass followed by a subordinate pass. In the
dominant pass, coefficients from the dominant list are tested for significance. If a coefficient
is found significant, then (1) its sign is determined, (2) it is classified as either POS or NEG,
(3) its magnitude is appended to the subordinate list, and (4) it is set to zero in memory (in the
array containing all the wavelet coefficients). The last step is done so that the coefficient does
not prevent the occurrence of a zerotree in subsequent dominant passes at smaller thresholds.
Example [2]:
This example follows the one in [2]. Figure 3.8(a) shows three levels of the wavelet
transform of an 8×8 image. The largest value is 63, so the initial threshold can be anywhere
in the range (31, 64]. We set it to 32. Figure 3.8(b) lists the results of the first dominant pass.
1. The top-left coefficient is 63. It is greater than the threshold, and it is positive, so a POS
symbol is generated and is transmitted by the encoder (and the 63 is changed to 0). The
decoder assigns this POS symbol the value 48, the midpoint of the interval [32, 64).
2. The coefficient −31 is insignificant with respect to 32, but it is not a zerotree root, since one
of its descendants (the 47 in LH1) is significant. The −31 is therefore an isolated zero (IZ).
3. The 23 is less than 32. Also, all its descendants (the 3, −12, −14, and 8 in HH2, and all of
HH1) are insignificant. The 23 is therefore a zerotree root (ZTR). As a result, no symbols will
be generated by the encoder in the dominant pass for its descendants (this is why none of the
HH2 and HH1 coefficients appear in the table).
4. The 10 is less than 32, and all its descendants (the −12, 7, 6, and −1 in HL1) are also
smaller than 32 in absolute value. Thus, the 10 becomes a zerotree root (ZTR). Notice that the
−12 is greater, in absolute value, than the 10, but is still less than the threshold.
5. The 14 is insignificant with respect to the threshold, but one of its children (they are −1,
47, −3, and 2) is significant. Thus, the 14 becomes an IZ.
6. The 47 in subband LH1 is significant with respect to the threshold, so it is coded as POS. It
is then changed to zero, so that a future pass (with a threshold of 16) will code its parent, 14,
as a zerotree root.
Four significant coefficients were transmitted during the first dominant pass. All that
the decoder knows about them is that they are in the interval [32, 64). They will be refined
during the first subordinate pass, so the decoder will be able to place them either in [32, 48)
(if it receives a 0) or in [48, 64) (if it receives a 1). The encoder generates and transmits the
bits 1010 for the four significant coefficients 63, 34, 49, and 47. Thus, the decoder refines
them to 56, 40, 56, and 40, respectively.
In the second dominant pass, only those coefficients not yet found to be significant
are scanned and tested. The ones found significant are treated as zero when the encoder
checks for zerotree roots. This second pass ends up identifying the −31 in LH3 as NEG, the
23 in HH3 as POS, the 10, 14, and −13 in HL2 as zerotree roots, and also all four coefficients
in LH2 and all four in HH2 as zerotree roots. The second dominant pass stops at this point,
since all other coefficients are known to be insignificant from the first dominant pass.
Figure 3.8: An EZW example: three levels of an 8×8 image[2].
The subordinate list contains, at this point, the six magnitudes 63, 49, 34, 47, 31, and 23.
They represent the three 16-wide intervals [48, 64), [32, 48), and [16, 32). The encoder
outputs bits that define a new subinterval for each of the three. At the end of the second
subordinate pass, the decoder could have identified the 34 and 47 as being in different
intervals, so the six magnitudes are ordered as 63, 49, 47, 34, 31, and 23. The decoder
assigns them the refined values 60, 52, 44, 36, 28, and 20.
3.5. Quadtrees
Quadtree compression partitions the visual data into a structural part (the quadtree
structure) and colour information (the leaf values). The quadtree structure shows the location
and size of each homogeneous region; the colour information represents the intensity of the
corresponding region. The generation of the quadtree follows the splitting strategy well known
from the area of image segmentation. Quadtree image compression comes in lossless as well as
lossy flavours; the lossy variant is obtained when the homogeneity criterion is relaxed. This
technique is not competitive from the rate-distortion efficiency viewpoint, but it is much faster
than any transform-based compression technique[2].
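A minimal sketch of the splitting strategy, with a tolerance parameter standing in for the homogeneity criterion (tol = 0 gives the lossless variant, tol > 0 a lossy one):

```python
def quadtree(img, x, y, size, tol):
    """Split a square block until each piece is homogeneous within `tol`.
    A leaf is represented by one colour value; an internal node by a
    four-element list [NW, NE, SW, SE]."""
    block = [img[y + j][x + i] for j in range(size) for i in range(size)]
    if max(block) - min(block) <= tol or size == 1:
        return sum(block) / len(block)            # leaf: one colour value
    h = size // 2
    return [quadtree(img, x,     y,     h, tol),  # NW
            quadtree(img, x + h, y,     h, tol),  # NE
            quadtree(img, x,     y + h, h, tol),  # SW
            quadtree(img, x + h, y + h, h, tol)]  # SE

img = [[1, 1, 2, 2],
       [1, 1, 2, 2],
       [1, 1, 2, 2],
       [1, 1, 2, 2]]
print(quadtree(img, 0, 0, 4, 0))   # [1.0, 2.0, 1.0, 2.0]: one split, four leaves
```

Relaxing the criterion (tol = 1) collapses the whole image into a single leaf with the mean value 1.5, illustrating how the lossy variant trades fidelity for a smaller tree.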
3.6. Fractal Coding
A fractal, in simplest terms, is an image of a texture or shape expressed as one or
more mathematical formulas. In terms of fractal geometry, a fractal is a geometric form
whose irregular details recur at different scales and angles and can be described by affine or
fractal transformations (formulas). Fractals have historically been used to generate images in
applications such as flight simulator scenes and special effects in motion pictures. Fractal
formulas can also be used to approximate real-world pictures [7].
Fractal image compression is the inverse of fractal image generation, i.e., instead of
generating an image or figure from a given formula, fractal image compression searches for
sets of fractals in a digitized image which describe and represent the entire image. Once the
appropriate sets of fractals are determined, they are reduced (compressed) to very compact
fractal transform codes or formulas. The codes are 'rules' for reproducing the various sets of
fractals which, in turn, regenerate the entire image. Because fractal transform codes require
very small amounts of data to be expressed and stored as formulas, fractal compression
results in very high compression ratios. Although fractal compression exhibits promising
properties (e.g., fractal interpolation and resolution-independent decoding), the encoding
complexity turned out to be prohibitive for successful deployment of the technique.
Additionally, fractal coding has never reached the
rate distortion performance of second generation wavelet codecs. Fractal coding is highly
asymmetric in that significantly more processing is required for searching/encoding than for
decoding. This is because the encoding process involves many transformations and
comparisons to search for sets of fractals, while the decoder simply generates images
according to the fractal formulas received.
3.7. Vector Quantization
Vector quantization exploits similarities between image blocks and an external
codebook. The image to be encoded is tiled into smaller image blocks which are compared
against equally sized blocks in an external codebook. For each image block the most similar
codebook block is identified and the corresponding index is recorded. From the algorithmic
viewpoint, the process is similar to fractal coding, therefore fractal coding is sometimes
referred to as vector quantization with internal codebook. Similar to fractal coding, the
encoding process involves a search for an optimal block match and is rather costly, whereas
the decoding process in the case of vector quantization is even faster since it is a simple lookup
table operation. If the properties of the human visual system are used, the size of the
codebook can be reduced further, and fewer bits are used to represent the index of codebook
entries[7],[20].
Two major problems with VQ are, first, how to design a good codebook that is
representative of all the possible occurrences of pixel combinations in a block, and second,
how to find the best match efficiently in the codebook during the coding process.
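The encode/decode asymmetry is easy to see in a sketch; the codebook and image blocks below are illustrative 2×2 blocks stored as flat tuples:

```python
def encode_vq(blocks, codebook):
    """Record, for every image block, the index of the closest codebook block
    under squared-error distance (the costly search)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: dist(b, codebook[k]))
            for b in blocks]

def decode_vq(indices, codebook):
    """Decoding is a pure table lookup."""
    return [codebook[k] for k in indices]

codebook = [(0, 0, 0, 0), (10, 10, 10, 10), (0, 10, 0, 10)]
idx = encode_vq([(1, 0, 2, 0), (9, 11, 10, 10)], codebook)
print(idx)   # [0, 1]
```

Only the indexes are transmitted, so the rate is determined by the codebook size rather than by the pixel precision.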
Chapter 4
STANDARD METHODS FOR IMAGE COMPRESSION
With the rapid development of imaging technology and of image compression and coding
tools and techniques, it is necessary to evolve coding standards so that there is compatibility
and interoperability between the image communication and storage products manufactured by
different vendors. Without standards, encoders and decoders cannot communicate with each
other; service providers would have to support a variety of formats to meet the needs of their
customers, and customers would have to install a number of decoders to handle a large
number of data formats. Towards the objective of setting up
coding standards, international standardization agencies such as the International
Organization for Standardization (ISO), the International Telecommunication Union (ITU),
and the International Electrotechnical Commission (IEC) have formed expert groups and
solicited proposals from industry, universities, and research laboratories. This has resulted in
established standards
for bi-level (facsimile) images and continuous-tone (grayscale) images. Basic concepts of
the JPEG and JPEG 2000 image compression standards are explained below.
4.1 JPEG
JPEG is a sophisticated lossy/lossless compression method for color or grayscale still
images (not videos). It does not handle bi-level (black and white) images very well. It also
works best on continuous-tone images, where adjacent pixels have similar colors. An
important feature of JPEG is its use of many parameters, allowing the user to adjust the
amount of the data lost (and thus also the compression ratio) over a very wide range. Often,
the eye cannot see any image degradation even at compression factors of 10 or 20. There are
two operating modes, lossy (also called baseline) and lossless (which typically produces
compression ratios of around 0.5, i.e., the compressed file is about half the original size).
Most implementations support just the lossy mode. This mode includes progressive and
hierarchical coding. The JPEG standard has proved successful
and has become widely used for image compression, especially in Web pages.
JPEG has been designed as a compression method for continuous-tone images. The main
goals of JPEG compression are the following [21],[22],[28]:
1. High compression ratios, especially in cases where image quality is judged as very good to
excellent.
2. The use of many parameters, allowing knowledgeable users to experiment and achieve the
desired compression/quality trade-off.
3. Obtaining good results with any kind of continuous-tone image, regardless of image
dimensions, color spaces, pixel aspect ratios, or other image features.
4. A sophisticated, but not too complex compression method, allowing software and
hardware implementations on many platforms.
5. The availability of the following modes of operation:
Sequential encoding: each image component is encoded in a single left-to-right, top-to-
bottom scan.
Progressive encoding: the image is encoded in multiple scans, for applications in which
transmission time is long and the viewer prefers to watch the image build up in multiple
coarse-to-clear passes.
Figure 4.1: Progressive versus sequential presentation[22]
Figure 4.2: Hierarchical multi-resolution encoding[22]
Lossless encoding: the image is encoded to guarantee exact recovery of every source image
sample value (even though the result is low compression compared to the lossy modes);
Hierarchical encoding: the image is encoded at multiple resolutions so that lower-resolution
versions may be accessed without first having to decompress the image at its full resolution.
The typical sequence of image presentation at the output of the decoder for the sequential
versus progressive modes of operation is shown in Figure 4.1.
4.1.1. Lossy and Lossless Compression
To meet the differing needs of many applications, the JPEG standard includes two basic
compression methods, each with various modes of operation. The standard specifies two
classes of encoding and decoding processes: lossy and lossless. Those based on the
discrete cosine transform (DCT) are lossy, thereby allowing substantial compression to be
achieved while producing a reconstructed image with high visual fidelity to the encoder's
source image.
The simplest DCT-based coding process is referred to as the baseline sequential process.
It provides a capability which is sufficient for many applications. There are additional DCT-
based processes which extend the baseline sequential process to a broader range of
applications. In any decoder using extended DCT-based decoding processes, the baseline
decoding process is required to be present in order to provide a default decoding capability.
The second class of coding processes is not based upon the DCT and is provided to meet
the needs of applications requiring lossless compression. These lossless encoding and
decoding processes are used independently of any of the DCT-based processes.
The amount of compression provided by any of the various processes is dependent on the
characteristics of the particular image being compressed, as well as on the picture quality
desired by the application and the desired speed of compression and decompression.
4.1.2 Sequential DCT-based Coding
Figures 4.3 and 4.5 show the key processing steps which are the heart of the DCT-based modes of operation. These figures illustrate the special case of single-component
(grayscale) image compression. We can grasp the essentials of DCT-based compression by
thinking of it as the compression of a stream of 8×8 blocks of grayscale image
samples. Color image compression can then be approximately regarded as compression of
multiple grayscale images, which are either compressed entirely one at a time, or are
compressed by alternately interleaving 8x8 sample blocks from each in turn.
In the encoding process the input component's samples are grouped into 8×8 blocks,
and each block is transformed by the forward DCT (FDCT) into a set of 64 values referred to
as DCT coefficients. One of these values is referred to as the DC coefficient and the other 63
as the AC coefficients. Each of the 64 coefficients is then quantized using one of 64
corresponding values from a quantization table (determined by one of the table specifications
shown in Figure 4.3). No default values for quantization tables are specified in the standard;
applications may specify values which customize picture quality for their particular image
characteristics, display devices, and viewing conditions.
Figure 4.3: DCT-based encoder simplified diagram[22]
Figure 4.4: Preparation of quantized coefficients for entropy encoding[22]
After quantization, the DC coefficient and the 63 AC coefficients are prepared for
entropy encoding, as shown in Figure 4.4. The previous quantized DC coefficient is used to
predict the current quantized DC coefficient, and the difference is encoded. The 63 quantized
AC coefficients undergo no such differential encoding, but are converted into a one-
dimensional zig-zag sequence, as shown in Figure 4.4. The quantized coefficients are then
passed to an entropy encoding procedure which compresses the data further. If Huffman
encoding is used, Huffman table specifications must be provided to the encoder. If arithmetic
encoding is used, arithmetic coding conditioning table specifications may be provided;
otherwise the default conditioning table specifications shall be used. Figure 4.5 shows the
main procedures for all DCT-based decoding processes. Each step shown performs
essentially the
inverse of its corresponding main procedure within the encoder. The entropy decoder decodes
the zig-zag sequence of quantized DCT coefficients. After dequantization, the DCT
coefficients are transformed to an 8×8 block of samples by the inverse DCT (IDCT).
Figure 4.5: DCT-based decoder simplified diagram[22]
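The FDCT and quantization steps can be sketched in a few lines; for brevity a single uniform step size stands in for the 64-entry quantization table:

```python
import math

def fdct_8x8(block):
    """Forward 2-D DCT of one 8x8 sample block (the FDCT of the encoder)."""
    def c(u):
        return math.sqrt(0.5) if u == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for v in range(8):
        for u in range(8):
            s = sum(block[y][x]
                    * math.cos((2 * x + 1) * u * math.pi / 16)
                    * math.cos((2 * y + 1) * v * math.pi / 16)
                    for y in range(8) for x in range(8))
            out[v][u] = 0.25 * c(u) * c(v) * s
    return out

def quantize(coef, q):
    """Quantize every coefficient with one uniform step q; a real JPEG
    quantization table holds 64 separate values."""
    return [[round(x / q) for x in row] for row in coef]

flat = [[8] * 8 for _ in range(8)]   # a uniform block
coef = fdct_8x8(flat)
# Only the DC coefficient survives (64 here); the AC coefficients are ~0,
# which is why flat regions compress so well after quantization.
```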
4.1.3 Lossless Coding
Figure 4.6 shows the main procedures for the lossless encoding processes. A predictor
combines the reconstructed values of up to three neighbourhood samples at positions a, b, and
c to form a prediction of the sample at position x as shown in Figure 4.7. This prediction is
then subtracted from the actual value of the sample at position x, and the difference is
losslessly entropy-coded by either Huffman or arithmetic coding. Any one of the eight
predictors listed in Table 4.1 (under selection-value) can be used. Selections 1, 2, and 3 are
one-dimensional predictors and selections 4, 5, 6, and 7 are two-dimensional predictors.
Selection-value 0 can only be used for differential coding in the hierarchical mode of
operation.
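The predictors of Table 4.1 are simple integer formulas over the three neighbours; a minimal sketch (selection values follow the lossless JPEG definitions, with integer division assumed for the averaging predictors):

```python
def predict(a, b, c, selection):
    """Lossless JPEG prediction of sample x from its neighbours:
    a = left, b = above, c = above-left (Table 4.1 selection values).
    Selection 0 (no prediction) is reserved for hierarchical mode."""
    return {1: a,
            2: b,
            3: c,
            4: a + b - c,           # two-dimensional "plane" predictor
            5: a + (b - c) // 2,
            6: b + (a - c) // 2,
            7: (a + b) // 2}[selection]
```

The encoder subtracts the prediction from the actual sample value and entropy-codes only the difference, which is typically small for smooth image regions.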
For the lossless mode of operation, two different codecs are specified, one for each
entropy coding method. The encoders can use any source image precision from 2 to 16
bits/sample, and can use any of the predictors except selection-value 0. The decoders must
handle any of the sample precisions and any of the predictors. Lossless codecs typically
produce around 2:1 compression for color images with moderately complex scenes.
This encoding process may also be used in a slightly modified way, whereby the
precision of the input samples is reduced by one or more bits prior to the lossless coding.
This achieves higher compression than the lossless process (but lower compression than the
DCT-based processes for equivalent visual fidelity), and limits the reconstructed image's
worst-case sample error to the amount of input precision reduction.
Figure 4.6: Lossless encoder simplified diagram [22]
Figure 4.7: 3-sample prediction neighbourhood [22]
Table 4.1: Predictors for lossless coding [22]
The JPEG algorithm yields good results for compression ratios of 10:1 and below (on
8-bit gray-scale images), but at higher compression ratios the underlying block nature of the
transform begins to show through the compressed image. By the time compression ratios
have reached 24:1, only the DC (lowest frequency) coefficient is getting any bits allocated to
it, and the input image has been approximated by a set of 8×8 blocks. Consequently, the
decompressed image has substantial blocking artifacts for medium and high compression
ratios.
4.2 JPEG 2000
The data compression field is very active, with new approaches, ideas, and techniques
being developed and implemented all the time. JPEG is widely used for image compression
but is not perfect. The use of the DCT on 8×8 blocks of pixels sometimes results in a
reconstructed image that has a blocky appearance (especially when the JPEG parameters are
set for much loss of information). This is why the JPEG committee has developed a new,
wavelet-based standard for the compression of still images, to be known as JPEG 2000. JPEG
2000 has many advantages over JPEG, such as better image quality at the same file size, 25-
35% smaller file sizes at comparable image quality, good image quality even at very high
compression ratios (over 80:1), low complexity option for devices with limited resources,
scalable image files, and progressive rendering and transmission through a layered image file
structure.
JPEG 2000 is not only intended to provide rate-distortion and subjective image quality
performance superior to existing standards, but also to provide features and functionalities
that current standards can either not address efficiently or in many cases cannot address at all.
Lossless and lossy compression, embedded lossy to lossless coding, progressive transmission
by pixel accuracy and by resolution, robustness to the presence of bit-errors and region-of-
interest coding, are some representative features. It is interesting to note that JPEG2000 is
designed to address the requirements of a diversity of applications, e.g. Internet, color
facsimile, printing, scanning, digital photography, remote sensing, mobile applications,
medical imagery, digital library and E-commerce.
JPEG 2000 has a long list of features, a subset of which are [24], [25], [26], [27]:
- High compression efficiency. Bitrates of less than 0.25 bpp are expected for highly
detailed greyscale images.
- The ability to handle large images, up to 2^32 × 2^32 pixels (the original JPEG can
handle images of up to 2^16 × 2^16).
- Progressive image transmission. The proposed standard can decompress an image
progressively by SNR, resolution, colour component, or region of interest.
- Easy, fast access to various points in the compressed stream. The decoder can
pan/zoom the image while decompressing only parts of it. The decoder can rotate and
crop the image while decompressing it.
- Error resilience. Error-correcting codes can be included in the compressed stream, to
improve transmission reliability in noisy environments.
One of the new, important approaches to compression introduced by JPEG 2000 is the
"compress once, decompress many ways" paradigm. The JPEG 2000 encoder selects a
maximum image quality Q and maximum resolution R, and it compresses an image using
these parameters. The decoder can decompress the image at any image quality up to and
including Q and at any resolution less than or equal to R. Suppose that an image I was
compressed into B bits. The decoder can extract A bits from the compressed stream (where A
< B) and produce a lossy decompressed image that will be identical to the image obtained if I
was originally compressed lossily to A bits.
In general, the decoder can decompress the entire image in lower quality and/or lower
resolution. It can also decompress parts of the image (regions of interest) at either maximum
or lower quality or resolution. Even more, the decoder can extract parts of the compressed
stream and assemble them to create a new compressed stream without having to do any
decompression. Thus, a lower-resolution and/or lower-quality image can be created without
the decoder having to decompress anything. The advantages of this approach are (1) it saves
time and space and (2) it prevents the buildup of image noise, common in cases where an
image is lossily compressed and decompressed several times.
Figure 4.8 shows the steps in the JPEG 2000 compression of an image. Function of each
block is explained below[26].
Figure 4.8: Steps in the JPEG 2000 compression of an image [26]: Tiling → Component
transform → Wavelet transform → Quantizer → Entropy coder → Packet ordering.
Fig. 4.8: Tiling, DC level shifting and DWT of each image tile component [26].
1. Tiling
The first thing that happens when an image is JPEG 2000 compressed is that it is split
into rectangular tiles. Since each tile is compressed independently of every other tile, the
usual rationale for tiling is to limit the amount of memory needed to implement JPEG 2000
and to provide independent access to regions in an image. Some implementations are
designed for tiling and perform best with tiled images; other implementations can compress
megabyte and gigabyte images without tiling.
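The tiling step amounts to covering the image with a rectangular grid; a minimal sketch (the function name and rectangle convention are ours):

```python
def tile_grid(width, height, tile_w, tile_h):
    """Split an image into the rectangular tiles that JPEG 2000
    compresses independently; edge tiles are clipped to the image
    boundary. Returns (x, y, w, h) rectangles in raster order."""
    return [(x, y, min(tile_w, width - x), min(tile_h, height - y))
            for y in range(0, height, tile_h)
            for x in range(0, width, tile_w)]
```

For example, a 1000×600 image with 512×512 tiles yields four tiles, the last of which is clipped to 488×88.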
Prior to computation of the forward discrete wavelet transform (DWT) on each image
tile, all samples of the image tile component are DC level shifted by subtracting the same
quantity (determined by the component depth). DC level shifting is performed only on
samples of components that are unsigned. If a colour transformation is used, DC level
shifting is performed prior to computation of the forward component transform; otherwise it
is performed prior to the wavelet transform, as shown in Figure 4.8. This process translates
all pixel values from their original, unsigned interval [0, 2^s − 1] (where s is the pixel's
depth) to the signed interval [−2^(s−1), 2^(s−1) − 1] by subtracting 2^(s−1) from each value.
For s = 4, e.g., the 2^4 = 16 possible pixel values are transformed from the interval [0, 15]
to the interval [−8, +7] by subtracting 2^(4−1) = 8 from each value.
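The level shift is a one-line operation; a sketch (function name is ours):

```python
def dc_level_shift(samples, depth):
    """Shift unsigned samples from [0, 2^depth - 1] to the signed
    interval [-2^(depth-1), 2^(depth-1) - 1] by subtracting 2^(depth-1)."""
    offset = 1 << (depth - 1)
    return [s - offset for s in samples]
```

Centring the samples around zero means the wavelet coefficients are also roughly zero-centred, which suits the quantizer and entropy coder that follow.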
2. Component transform
If the components in a multi-component image are red, green and blue, then an optional
component transform is available to convert them to luminance and chrominance. The
purpose of these transforms is to decorrelate the red, green and blue image components,
which improves compression performance by redistributing the energy across the image
components. There are two such transforms: the irreversible colour transform (ICT), used
with lossy coding, and the reversible colour transform (RCT), used with lossless coding. The
ICT does a better job at decorrelating the red, green and blue values than the RCT, which
leads to better compression. Whichever transform is used before compression, the inverse
transform is applied after decompression to restore the red, green and blue values.
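The RCT uses only integer additions and shifts, so it can be inverted exactly. A sketch of the forward and inverse transforms as given in JPEG 2000 Part 1 (function names are ours):

```python
def rct_forward(r, g, b):
    """Reversible colour transform of JPEG 2000 (integer, lossless)."""
    y  = (r + 2 * g + b) // 4   # floor division, as in the standard
    cb = b - g
    cr = r - g
    return y, cb, cr

def rct_inverse(y, cb, cr):
    """Exact inverse: recovers the original integer r, g, b."""
    g = y - (cb + cr) // 4
    return cr + g, g, cb + g    # r, g, b
```

Although the luminance value uses a floor division, the discarded remainder can be recovered from cb and cr, which is what makes the transform lossless.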
3. Wavelet Transform
The wavelet transform is applied on each tile. The tile is decomposed in different
resolution levels. These decomposition levels are made up of subbands of coefficients that
describe the frequency characteristics of local areas (rather than across the entire tile-
component) of the tile component.
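One decomposition level can be sketched in one dimension with the reversible LeGall 5/3 lifting filter used by JPEG 2000's lossless path. This is a simplified sketch assuming an even-length signal and whole-sample symmetric extension at the borders:

```python
def dwt53_level(x):
    """One level of the reversible 5/3 DWT via lifting.
    Returns (lowpass, highpass) halves; len(x) must be even."""
    n = len(x)
    def at(i):  # whole-sample symmetric extension at the borders
        if i < 0:
            i = -i
        if i >= n:
            i = 2 * (n - 1) - i
        return x[i]
    # Predict step: highpass (detail) coefficients from odd samples
    d = [at(2*i + 1) - (at(2*i) + at(2*i + 2)) // 2 for i in range(n // 2)]
    # Update step: lowpass (approximation) coefficients from even samples
    s = [at(2*i) + (d[max(i - 1, 0)] + d[i] + 2) // 4 for i in range(n // 2)]
    return s, d
```

A constant signal produces zero detail coefficients, and a linear ramp produces zeros except at the border, illustrating why smooth regions compress well after the transform.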
4. Quantizer
The next step after the wavelet transform is the quantization of the subband images,
which is a lossy step that reduces their precision in order to improve their compressibility in
the following step, which is the arithmetic coder. In lossless compression, the subband
images are passed unchanged to the arithmetic coder.
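The quantizer is, in essence, a deadzone scalar quantizer; a sketch of quantization and a common midpoint reconstruction (function names are ours):

```python
import math

def deadzone_quantize(coeffs, step):
    """Deadzone scalar quantization: q = sign(c) * floor(|c| / step).
    With step == 1, integer coefficients pass through unchanged,
    which corresponds to the lossless path."""
    return [int(math.copysign(math.floor(abs(c) / step), c)) for c in coeffs]

def dequantize(q, step):
    """Midpoint reconstruction of each quantization interval."""
    return [0.0 if v == 0 else math.copysign((abs(v) + 0.5) * step, v)
            for v in q]
```

The "deadzone" is the doubled interval around zero, which maps small coefficients to exactly zero and so feeds long zero runs to the entropy coder.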
5. Entropy coder
After quantization comes the entropy coder, which takes advantage of the statistical
properties of the quantized subband images to reduce the number of bits used to represent
them. This is the stage where the actual compression occurs. While baseline JPEG uses
Huffman coding, JPEG 2000 uses a more sophisticated and computationally expensive
method known as adaptive arithmetic coding. The subband images are partitioned into
fixed-size codeblocks, and the arithmetic coder is applied independently to each bitplane of each
subband image within a codeblock. Because arithmetic coding can become less effective for
lower bitplanes, JPEG 2000 has an optional Bypass mode that skips the coding of the lower
bitplanes, which saves time with little reduction in compression efficiency.
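Bitplane decomposition itself is straightforward; a sketch of splitting coefficient magnitudes into planes (function name is ours):

```python
def magnitude_bitplanes(values, nbits):
    """Split non-negative coefficient magnitudes into bitplanes,
    most significant plane first (the order in which they are coded)."""
    return [[(v >> b) & 1 for v in values]
            for b in range(nbits - 1, -1, -1)]
```

Coding the most significant planes first is what lets a JPEG 2000 stream be truncated at any point while still yielding the best approximation of every coefficient for the bits received.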
6. Packet ordering
Packets are the fundamental building blocks of a JPEG 2000 codestream. While a layer is an
increment in quality for the entire image, a packet is an increment in quality for a specific
position of a given resolution of a component of a tile. The interleaving of packets in a
codestream determines the progression order in which compressed data is received and
decompressed. JPEG 2000 defines five progression orders or packet orderings. In resolution-
major progression orders, the packets for all layers, positions and components of the lowest
resolution come in the codestream before all those for the next higher resolution level.
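A resolution-major ordering can be sketched as nested loops over the four packet coordinates. The function name and argument order here are illustrative, not taken from the standard's syntax:

```python
from itertools import product

def resolution_major_order(n_res, n_layers, n_comps, n_pos):
    """Resolution-major packet ordering: every packet of resolution r
    precedes every packet of resolution r + 1 in the codestream."""
    return list(product(range(n_res), range(n_layers),
                        range(n_comps), range(n_pos)))
```

A decoder that stops reading after the resolution-0 packets can still reconstruct a complete low-resolution image, which is the basis of progressive-by-resolution transmission.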
4.3. JPEG 2000 Applications in Access and Preservation
JPEG 2000 is being used for geospatial imaging, medical imaging and by the cultural
heritage and digital preservation communities. Many digital collection and library systems
support JPEG 2000, and several institutions use it in their collections. This section will
discuss the experiences of a few of those institutions, chosen because they highlight the issues [26].
An institution that has done much work in the use of JPEG 2000 and is now one of the
leaders in its adoption is the Harvard University Library (HUL). The move to JPEG 2000
was driven in part by institutional clients who wanted features such as interactive zoom,
pan, and rotate. These requirements are not easily implemented with TIFF, GIF, or JPEG,
but are easily enabled by JPEG 2000. In 2006, Harvard reported the successful test
migration of more than 10,000 TIFF, GIF, and JPEG images to equivalent lossless and
lossy JPEG 2000 form.
Over the past several years, the rate of acquisition of new JPEG 2000 images into the
HUL Digital Repository Service (DRS) has steadily increased, while that for JPEG and
TIFF has decreased. The DRS now manages about two million JPEG 2000 images, and
JPEG 2000 is becoming the default format for image conversion and acquisition. A single
JPEG 2000 master image in the repository enables the dynamic delivery of arbitrarily-
sized use images (transcoded to JPEG for rendering by the client browser), all computed
on demand from the master, thereby eliminating the need to maintain multiple variants in
anticipation of client requests. In addition, JPEG 2000 enables an interactive interface that
lets users perform the zoom, pan, and rotation operations that now form the common user
expectation for web-based image delivery.
Library and Archives Canada ran a year-long JPEG 2000 pilot project over 2006 and
2007, the results of which were described at the Museums and Web 2007 conference [35].
This pilot was undertaken to address many of the questions that cultural institutions have
regarding JPEG 2000. One of their main results was to show that the use of JPEG 2000
could reduce file sizes significantly without loss to image quality. In the case of lossless
archival masters, the compression ratio was typically around 2:1. For production or access
masters, they specified a recommended compression ratio of 24:1 for colour images,
which included photographs, prints, drawings and maps, and 8:1 for greyscale images,
which included newspapers, microfilm and textual materials. They found that the JPEG
2000 codec they used performed best when images were tiled, and they recommended
tile sizes of 512 by 512 and 1024 by 1024. They also observed that the use of JPEG 2000
meant that derivative files were no longer required. The JP2 files they created in this pilot
contained XML boxes with MODS-based metadata records.
The Library of Congress already makes use of JPEG 2000. For example, Civil War maps
in the American Memory collection are compressed using JPEG 2000. A client's pan and
zoom requests are served with reference to a JPEG 2000 image; the resulting views are
transcoded to JPEG for delivery to a standard web browser. The site also offers the option
of downloading the JPEG 2000 image of the map. The Library's collection still has some
maps compressed using MrSID, a proprietary wavelet-based compression method that
predates JPEG 2000.
Chapter 5
COMPARATIVE STUDY
The goal of image compression is to save storage space and to reduce transmission
time for image data. It aims at achieving a high compression ratio (CR) while preserving
good fidelity of decoded images. The techniques used to compress/decompress a single gray
level image are expected to be easily modified to encode/decode color images and image
sequences. There is always a compromise between image quality and compression ratio.
There are so many methods available for image compression. The choice of a particular
method depends on application. In this chapter we review comparative studies made by
different authors.
5.1. Comparative results obtained by DELGORGE [6]
The survey was performed on 10 ultrasound images of size 768×576. The images were
acquired by an AU3 ultrasound scanner (ESAOTE) at a rate of 15 images per second, then
digitised using a Matrox Meteor board. The computing was performed on a 450 MHz
Pentium III under Windows NT.
The following results represent an average measure of the MSE, PSNR, the coding
computation times (tcc, tenc, tdec) and the compression rate CRt, calculated on the ten rebuilt
and original images of the database. The results concerning tcc, tenc and tdec have to be
compared with each other to appreciate the performance of each of the studied techniques.
Table 5.1: Comparison results for lossless methods
Table 5.2: Comparison results for low and high compression
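The MSE and PSNR figures reported in these comparisons follow the standard definitions; a minimal sketch (maxval is the peak sample value, e.g. 255 for 8-bit images):

```python
import math

def mse_psnr(original, reconstructed, maxval=255):
    """Mean squared error and peak signal-to-noise ratio (in dB)
    between two equal-length sequences of pixel values."""
    mse = sum((a - b) ** 2
              for a, b in zip(original, reconstructed)) / len(original)
    psnr = float('inf') if mse == 0 else 10 * math.log10(maxval ** 2 / mse)
    return mse, psnr
```

A perfect (lossless) reconstruction gives MSE 0 and an infinite PSNR; for lossy codecs, higher PSNR indicates closer fidelity to the original.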
It can be concluded that RLE coding is not suited to ultrasound images, as its CRt
is the largest. The Fano and Huffman algorithms give comparable results in terms of tenc, tdec
and CRt, with poor performance. The adaptive Huffman method achieves a compression rate of
54.57% (the final image size is about half of the original one). The last method, based on
arithmetic coding, gives the best compression rate, but is associated with larger compression
and decompression times. In conclusion, the adaptive Huffman method gives the best
compromise between compression rate and computing times.
Experimental results performed on ten ultrasound images establish that the JPEG-LS
technique seems to be the best lossless method for tele-medicine application. In the lossy
case, JPEG-LS is the best method when the compression rate expected is greater than 5%.
And for very high compression, JPEG 2000 becomes the optimal technique.
5.2. The comparative results reported by Chaur-Chin Chen [7] for various lossy methods
Table 5.3: Performance of different methods

Method  | Advantages                                   | Disadvantages                                  | Compression ratio
Wavelet | high compression ratio                       | coefficient quantization; bit allocation       | >> 32
JPEG    | state-of-the-art; current standard           | coefficient (DCT) quantization; bit allocation | 50
VQ      | simple decoder; no coefficient quantization  | slow codebook generation; small bpp            | < 32
Fractal | good mathematical encoding frame; resolution-free decoding | slow encoding                    | 16
Image compression algorithms based on the EZW, JPEG/DCT, VQ, and Fractal methods
were tested on four 256×256 real images: Jet, Lenna, Mandrill, Peppers, and one 400×400
fingerprint image. The original images of Lenna and fingerprint are shown in Figure 5.1. The
performance results are reported in Table 5.3. The decoded images based on the four
approaches are shown in Figures 5.2 and 5.3. The associated PSNR values and
encoding/decoding times shown in Table 5.4 for the test images indicate that all four
approaches are satisfactory at the 0.5 bpp request (CR = 16). However, EZW has significantly
larger PSNR values and a better visual quality of decoded images compared with the other
approaches. At a desired compression of 0.25 bpp (CR = 32) for the fingerprint image, the
commonly used VQ cannot be tested, and fractal coding cannot be achieved unless its
resolution-free decoding property is utilized, which is not useful for the current purpose; both
the EZW and JPEG approaches perform well, and the results of EZW have significantly larger
PSNR values than those of JPEG.
Table 5.4: Performance of coding algorithms on various 256×256 images.
Algorithm PSNR values (in dB)