Final Seminar Raju

    Chapter 1

    INTRODUCTION

    Uncompressed multimedia (graphics, audio and video) data requires considerable

    storage capacity and transmission bandwidth. Despite rapid progress in mass-storage density,

    processor speeds, and digital communication system performance, demand for data storage

    capacity and data-transmission bandwidth continues to outstrip the capabilities of available

    technologies. The recent growth of data intensive multimedia-based web applications have

    not only sustained the need for more efficient ways to encode signals and images but have

    made compression of such signals central to storage and communication technology.

    In the field of image processing, image compression is the current topic of research.

    Image compression plays a crucial role in many important and diverse applications, including

    televideo conferencing, remote sensing, document & medical and facsimile transmission.

    1.1. Need For Compression

Image data is by its nature multidimensional and tends to take up a lot of space.

Pictures take up a lot of storage space (either disk or memory). A 1000x1000 picture with 24 bits per pixel takes up 3 megabytes. The Encyclopedia Britannica scanned at 300 pixels per inch and 1 bit per pixel requires 25,000 pages x 1,000,000 bytes per page = 25 gigabytes.

Video is even bulkier: a 90-minute movie at 640x480 spatial resolution, 24 bits per pixel and 24 frames per second requires 90 x 60 x 24 x 640 x 480 x 3 bytes, or about 120 gigabytes.

Applications: HDTV, film, remote sensing and satellite image transmission, network communication, image storage, medical image processing, fax.
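To make the storage figures above concrete, the short Python sketch below simply reproduces the arithmetic (it is purely illustrative; 1 megabyte is taken as 10^6 bytes, as in the text):

```python
# Raw storage requirements quoted in Section 1.1 (1 MB taken as 10**6 bytes)
still_image = 1000 * 1000 * 3                 # 1000x1000 pixels, 24 bpp -> 3 MB
britannica  = 25_000 * 1_000_000              # 25,000 pages x 1,000,000 bytes/page -> 25 GB
movie       = 90 * 60 * 24 * 640 * 480 * 3    # 90 min, 24 fps, 640x480, 24 bpp -> ~120 GB
print(still_image, britannica, movie)         # 3000000 25000000000 119439360000
```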

    1.2. Principles Behind Compression

    A common characteristic of most images is that the neighboring pixels are correlated

and therefore contain redundant information. The foremost task then is to find a less correlated representation of the image. Two fundamental components of compression are redundancy


    and irrelevancy reduction. Redundancy reduction aims at removing duplication from the

    signal source (image/video). Irrelevancy reduction omits parts of the signal that will not be

    noticed by the signal receiver, namely the Human Visual System (HVS). In general, three

    types of redundancy can be identified:

Spatial redundancy, or correlation between neighboring pixel values.
Spectral redundancy, or correlation between different color planes or spectral bands.
Temporal redundancy, or correlation between adjacent frames in a sequence of images (in video applications).

    Image compression research aims at reducing the number of bits needed to represent an

    image by removing the spatial and spectral redundancies as much as possible. Since we will

    focus only on still image compression, we will not worry about temporal redundancy.

Different methods for redundancy reduction are:

Spatial redundancy: DCT, DWT, DPCM
Statistical redundancy: run-length coding, variable-length coding

    1.3. Image Compression Model

A typical image compression model consists of a source encoder, which is responsible for reducing or eliminating any coding, interpixel or psychovisual redundancies in the input image; a channel, which is the transmission path; and a source decoder, whose function is the opposite of the source encoder's and which reconstructs the original image. Figure 1 shows the block diagram of the image compression model [1].

Figure 1: Image compression model [1] (image → source encoder → channel → source decoder → reconstructed image).


Figure 2: (a) Source encoder (mapper → quantizer → symbol encoder); (b) source decoder (symbol decoder → inverse mapper) [1].

    The source encoder consists of three blocks. The first stage of the source encoding

    process, the mapper transforms the input data into a format designed to reduce inter pixel

    redundancies in the input image. This operation is generally reversible and may or may not

    reduce directly the amount of data required to represent the image.

The second stage, the quantizer block in Figure 2(a), reduces the accuracy of the mapper's output in accordance with some pre-established fidelity criterion. This stage reduces the psychovisual redundancies of the input image. The operation is irreversible, so it must be omitted when error-free compression is desired.

In the third and final stage of the source encoding process, the symbol coder block in Figure 2(a) creates a fixed- or variable-length code to represent the quantizer output and maps the output in accordance with the code.

The source decoder shown in Figure 2(b) contains only two components: a symbol decoder and an inverse mapper. These blocks perform, in reverse order, the inverse operations of the source encoder's symbol encoder and mapper blocks.

The lossless and lossy methods are discussed separately in the 2nd and 3rd chapters, respectively.



    Chapter 2

LOSSLESS COMPRESSION METHODS

    In numerous applications error-free compression is the only acceptable means of data

    compression. One such application is the archival of medical or business documents, where

    lossy compression usually is prohibited for legal reasons. Another is the processing of

satellite imagery, where both the use and the cost of collecting the data make any loss undesirable. Yet another is digital radiography, where the loss of information can compromise diagnostic accuracy. In these and other cases, the need for error-free compression is motivated by the intended use or nature of the image under consideration. Lossless methods normally provide compression ratios of 2 to 10.

2.1. Run-Length Encoding

This method reduces only interpixel redundancy. The following example illustrates the run-length coding method [2].

    Original Image

    63 63 63 63 64 64 64 78 89 89 89 89

    Compressed Image

(63,4) (64,3) (78,1) (89,4)

Code the number of pixels taking the same value along a given scan line.
Works particularly well on binary images, since only the length of each run needs to be encoded.
Works by utilizing scan-line coherence.
Bit-plane run-length encoding is used on non-binary images by considering each bit of the, say, 8-bit image one at a time.
Typical compression rates are 1.5:1 (gray-scale/color images), 4:1 (binary images) and 2:1 (bit-plane compression on gray-scale/color images).
May cause a data explosion: the final file may be larger than the original one.
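As a quick illustration of the method, here is a minimal run-length coder in Python (the function names are illustrative only); it reproduces the example pairs given above:

```python
from itertools import groupby

def rle_encode(pixels):
    """Return (value, run length) pairs along one scan line."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]

def rle_decode(pairs):
    """Expand the (value, run length) pairs back into the pixel sequence."""
    return [value for value, length in pairs for _ in range(length)]

line = [63, 63, 63, 63, 64, 64, 64, 78, 89, 89, 89, 89]
print(rle_encode(line))   # [(63, 4), (64, 3), (78, 1), (89, 4)]
```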


2.2. Huffman Coding [2],[3]

This is the most popular technique for removing coding redundancy.

Huffman coding works on the image brightness histogram. It finds the most commonly occurring brightness patterns and uses the shortest codes to represent them.

Compression rates are typically 1.5 to 2:1. Huffman coding may also be used after run-length coding to give further compression.

    An Example of Huffman coding:

Figure 2.1 illustrates the principles of Huffman coding. Assume that we wish to transmit the set

    of 28 data points[3].

    {1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 7}

The set consists of seven distinct quantized levels, or symbols. For each symbol, Si, we calculate its probability of occurrence Pi by dividing its frequency of occurrence by 28, the total number of data points. Consequently, the construction of a Huffman code for this set begins with seven nodes, one associated with each Pi. At each step we sort the Pi list in descending order, breaking the ties arbitrarily. The two nodes with the smallest probabilities, Pi and Pj, are merged into a new node with probability Pi + Pj. This process continues until the probability list contains a single value, 1.0, as shown in Figure 2.1(a).

    The process of merging nodes produces a binary tree as in Figure 2.1(b). The root of the

    tree has probability 1.0. We obtain the Huffman code of the symbols by traversing down the

    tree, assigning 1 to the left child and 0 to the right child. The resulting code words have the

    prefix property. This property ensures that a coded message is uniquely decodable without

    the need for look ahead. Figure 2.1(c) summarizes the results and shows the Huffman codes

    for the seven symbols. We enter these code word mappings into a translation table and use

the table to place the appropriate code word into the output bit stream during the reduction (encoding) process.

    The reduction ratio of Huffman coding depends on the distribution of the source symbols. In

    our example, the original data requires three bits to represent the seven quantized levels.

After Huffman coding, we can calculate the expected code word length

E[l] = \sum_{i=1}^{7} l_i p_i

where l_i represents the length of the Huffman code for symbol S_i. This value is 2.65 in our

    example, resulting in an expected reduction ratio of 3:2.65. The reconstruction process begins


    at the root of the tree. If bit 1 is received, we traverse down the left branch, otherwise the

    right branch. We continue traversing until we reach a node with no child. We then output the

    symbol corresponding to this node and begin traversal from the root again. The

reconstruction process of Huffman coding perfectly recovers the original data; therefore it is a lossless algorithm.

Figure 2.1: Illustration of Huffman coding. (a) At each step, the Pi are sorted in descending order and the two lowest Pi are merged. (b) The merging operation depicted in a binary tree. (c) Summary of Huffman coding for the data set [2].

However, a transmission error of a single bit may result in more than one decoding error. This propagation of transmission errors is a consequence common to all algorithms that produce variable-length code words.
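A small Python sketch of the merging procedure described above is given below. It builds a prefix code for the 28-point data set by repeatedly merging the two least probable nodes; tie-breaking is arbitrary, so the individual code words may differ from Figure 2.1, but the code lengths are optimal (the function name is illustrative only):

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Repeatedly merge the two least frequent nodes and prefix their codes."""
    counts = Counter(data)
    # heap entries: (frequency, unique tie-break, {symbol: code-so-far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(counts.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "1" + c for s, c in c1.items()}
        merged.update({s: "0" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

data = [1]*7 + [2]*6 + [3]*5 + [4]*4 + [5]*3 + [6]*2 + [7]*1
codes = huffman_codes(data)
avg = sum(len(codes[s]) for s in data) / len(data)
print(codes, avg)   # about 2.64 bits/symbol (the text's 2.65 uses
                    # probabilities rounded to two decimals)
```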


    2.3. Arithmetic Coding

Arithmetic coding is a lossless coding method which does not suffer from the aforementioned drawbacks and which tends to achieve a higher compression ratio than Huffman

    coding. However, Huffman coding can generally be realized with simpler software and

    hardware.

    The basic idea behind arithmetic coding is to map the input sequence of symbols into

    one single codeword. Symbol blocking is not needed since the codeword can be determined

    and updated incrementally as each new symbol is input (symbol-by-symbol coding). At any

    time, the determined codeword uniquely represents all the past occurring symbols. Although

    the final codeword is represented using an integral number of bits, the resulting average

    number of bits per symbol is obtained by dividing the length of the codeword by the number

    of encoded symbols [2],[3].

    Arithmetic Coding Algorithm:

    1) Divide the interval [0,1] into segments corresponding to the M symbols; the segment of

    each symbol has a length proportional to its probability.

    2) Choose the segment of the first symbol in the string message.

3) Divide the segment of this symbol again into M new segments with lengths proportional to the symbols' probabilities.

    4) From these new segments, choose the one corresponding to the next symbol in the

    message.

    5) Continue steps 3) and 4) until the whole message is coded.

    6) Represent the segment's value by a binary fraction.
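The following Python sketch follows steps 1-6 directly for a small illustrative alphabet (the symbols and probabilities are made up for the example, and exact Fractions are used to avoid rounding issues; a practical coder would work with renormalized integer arithmetic):

```python
from fractions import Fraction

def arithmetic_encode(message, probs):
    """Steps 1-5: repeatedly narrow [0, 1) to the segment of each symbol.
    Returns the final interval (low, high)."""
    low, high = Fraction(0), Fraction(1)
    for s in message:
        width = high - low
        cum = Fraction(0)
        for sym, p in probs.items():      # sub-segments taken in a fixed order
            if sym == s:
                low, high = low + cum * width, low + (cum + p) * width
                break
            cum += p
    return low, high

def to_binary_fraction(low, high, max_bits=64):
    """Step 6: a binary fraction that lies inside [low, high)."""
    code, value, step = "", Fraction(0), Fraction(1, 2)
    for _ in range(max_bits):
        if value >= low:                  # value is now inside [low, high)
            break
        if value + step < high:
            value += step
            code += "1"
        else:
            code += "0"
        step /= 2
    return code

probs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
low, high = arithmetic_encode("abac", probs)
print(low, high, to_binary_fraction(low, high))
```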

    2.4. Lempel-Ziv Coding

Huffman coding and arithmetic coding require a priori knowledge of the source symbol

    probabilities or of the source statistical model. In some cases, a sufficiently accurate source

    model is difficult to obtain, especially when several types of data (such as text, graphics, and

    natural pictures) are intermixed [4], [5].

    Dictionary-based coders dynamically build a coding table (called dictionary) of variable-

    length symbol strings as they occur in the input data. As the coding table is constructed,

fixed-length binary code words are assigned to the variable-length input symbol strings by


    indexing into the coding table. In Lempel-Ziv (LZ) coding, the decoder can also dynamically

    reconstruct the coding table and the input sequence as the code bits are received without any

    significant decoding delays. Although LZ codes do not explicitly make use of the source

    probability distribution, they asymptotically approach the source entropy rate for very long

    sequences. Because of their adaptive nature, dictionary-based codes are ineffective for short

    input sequences since these codes initially result in a lot of bits being output. So, short input

    sequences can result in data expansion instead of compression.

Let S be the source alphabet consisting of N symbols Sk (1 ≤ k ≤ N). The basic steps of the LZW algorithm can be stated as follows [4]:

1. Initialize the first N entries of the dictionary with the individual source symbols of S.

    2. Parse the input sequence and find the longest input string of successive symbols w

(including the first still unencoded symbol s in the sequence) that has a matching entry in the

    dictionary.

    3. Encode w by outputting the index (address) of the matching entry as the codeword for w.

    4. Add to the dictionary the string ws formed by concatenating w and the next input symbol s.

    5. Repeat from step 2 for the remaining input symbols starting with the symbol s, until the

    entire input sequence is encoded.

    Consider the source alphabet S = {S1, S2, S3, S4}. The encoding procedure is illustrated for

the input sequence S1 S2 S1 S2 S3 S2 S1 S2. The constructed dictionary is shown in Table 2.1.

Table 2.1: Dictionary constructed while encoding the sequence S1 S2 S1 S2 S3 S2 S1 S2, which is emitted by a source with alphabet S = {S1, S2, S3, S4} [4].

    The resulting code is given by the fixed-length binary representation of the following

    sequence of dictionary addresses: 1 2 5 3 6 2. The length of the generated binary code words

    depends on the maximum allowed dictionary size. If the maximum dictionary size is M

entries, the length of the code words would be log2(M) rounded up to the nearest integer.
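A compact Python sketch of steps 1-5, with dictionary indices starting at 1 as in Table 2.1, is shown below; run on the example sequence it reproduces the address sequence 1 2 5 3 6 2 (the function name is illustrative):

```python
def lzw_encode(sequence, alphabet):
    """LZW encoding following steps 1-5 above; dictionary indices start at 1."""
    dictionary = {(s,): i + 1 for i, s in enumerate(alphabet)}
    next_index = len(alphabet) + 1
    w = ()
    out = []
    for s in sequence:
        if w + (s,) in dictionary:          # extend the current match
            w = w + (s,)
        else:                               # emit the match, add ws to the dictionary
            out.append(dictionary[w])
            dictionary[w + (s,)] = next_index
            next_index += 1
            w = (s,)
    if w:
        out.append(dictionary[w])
    return out

# reproduces the address sequence 1 2 5 3 6 2 from the text
print(lzw_encode("S1 S2 S1 S2 S3 S2 S1 S2".split(), ["S1", "S2", "S3", "S4"]))
```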


    The decoder constructs the same dictionary as the code words are received. The basic

    decoding steps can be described as follows:

    1. Start with the same initial dictionary as the encoder. Also, initialize w to be the empty

    string.

2. Get the next "codeword", and decode it by outputting the symbol string m stored at address "codeword" in the dictionary.

    3. Add to the dictionary the string ws formed by concatenating the previous decoded string w

    (if any) and the first symbol s of the current decoded string.

    4. Set w = m and repeat from step 2 until all the code words are decoded.
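For completeness, here is a matching decoder sketch in Python. It follows the steps above and additionally handles the standard special case in which a received code word is not yet in the dictionary (that case does not arise in the example, whose codes 1 2 5 3 6 2 decode back to S1 S2 S1 S2 S3 S2 S1 S2):

```python
def lzw_decode(codes, alphabet):
    """Rebuild the dictionary on the fly and recover the symbol sequence."""
    dictionary = {i + 1: (s,) for i, s in enumerate(alphabet)}
    next_index = len(alphabet) + 1
    w = dictionary[codes[0]]
    output = list(w)
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:                                      # code not yet in the dictionary
            entry = w + (w[0],)
        output.extend(entry)
        dictionary[next_index] = w + (entry[0],)   # add ws to the dictionary
        next_index += 1
        w = entry
    return output

print(lzw_decode([1, 2, 5, 3, 6, 2], ["S1", "S2", "S3", "S4"]))
```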

    2.5. Predictive Coding [1],[5]

    Original Image

    63 63 63 63 64 64 64 78 89 89 89 89

    Compressed Image

    63,0,0,0,1,0,0,14,11,0,0,0

Stores the difference between successive pixels' brightness values in fewer bits. Relies on the image having smooth changes in brightness: at sharp changes in the image we need overload patterns.
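A minimal Python sketch of this idea (store the first pixel, then the successive differences) reproduces the compressed sequence shown above; the function names are illustrative:

```python
def dpcm_encode(pixels):
    """Store the first pixel, then differences between successive pixels."""
    return [pixels[0]] + [b - a for a, b in zip(pixels, pixels[1:])]

def dpcm_decode(codes):
    """Accumulate the differences to recover the original pixels."""
    out = [codes[0]]
    for d in codes[1:]:
        out.append(out[-1] + d)
    return out

# reproduces the example: [63, 0, 0, 0, 1, 0, 0, 14, 11, 0, 0, 0]
print(dpcm_encode([63, 63, 63, 63, 64, 64, 64, 78, 89, 89, 89, 89]))
```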

    2.7. FELICS (Fast, Efficient, and Lossless Image Compression System)

    It is a special-purpose compression method designed for greyscale images and it

    competes with the lossless mode of JPEG. It is fast and it generally produces good

    compression. However, it cannot compress an image to below one bit per pixel, so it is not a

    good choice for bi-level or for highly redundant images [2].

    The principle of FELICS is to code each pixel with a variable-size code based on the

values of two of its previously seen neighbour pixels. Figure 2.2(a) shows the two known neighbours A and B of some pixel P. For a general pixel, these are the neighbours above it and to its left. For a pixel in the top row, these are its two left neighbours (except for the first two pixels of the image). For a pixel in the leftmost column, these are the first two pixels of the line above it. Notice that the first two pixels of the image don't have any previously seen


    neighbours, but since there are only two of them, they can be output without any encoding,

    causing just a slight degradation in the overall compression.

    Consider the two neighbours A and B of a pixel P. We use A, B, and P to denote both

    the three pixels and their intensities (greyscale values). We denote by L and H the neighbours

with the smaller and the larger intensities, respectively. Pixel P should be assigned a variable-size code depending on where the intensity of P is located relative to L and H. There are three

    cases:

1. The intensity of pixel P is between L and H (it is located in the central region of Figure 2.2(b)). This case is known experimentally to occur in about half the pixels, and P is

    assigned, in this case, a code that starts with 0. The probability that P will be in this central

    region is almost, but not completely, flat, so P should be assigned a binary code that has

    about the same size in the entire region but is slightly shorter at the centre of the region.

    2. The intensity of P is lower than L (P is in the left region). The code assigned to P in this

    case starts with 10.

Figure 2.2: (a) The two neighbours. (b) The three regions [2].

Table 2.2: The codes for the central region [2].

Pixel P   Region code   Pixel code
L = 15    0             0000
16        0             0010
17        0             010
18        0             011
19        0             100
20        0             101
21        0             110
22        0             111
23        0             0001
H = 24    0             0011


3. P's intensity is greater than H (P is in the right region). P is assigned a code that starts with 11. When pixel P is in one of the outer regions, the probability that its intensity will differ from L or H by much is small, so P can be assigned a long code in these cases. The code assigned to P should

    therefore depend heavily on whether P is in the central region or in one of the outer regions.

Here is how the code is assigned when P is in the central region. We need H - L + 1 variable-size codes that will not differ much in size and will, of course, satisfy the prefix property. We set k = \lfloor \log_2(H - L + 1) \rfloor and compute integers a and b by

a = 2^{k+1} - (H - L + 1),    b = 2(H - L + 1 - 2^k).

Example: If H - L = 9, then k = 3, a = 2^{3+1} - (9 + 1) = 6, and b = 2(9 + 1 - 2^3) = 4. We now select the a codes 2^k - 1, 2^k - 2, . . . expressed as k-bit numbers, and the b codes 0, 1, 2, . . . expressed as (k+1)-bit numbers. In the example above, the a codes are 8 - 1 = 111, 8 - 2 = 110, through 8 - 6 = 010, and the b codes are 0000, 0001, 0010, and 0011. Table 2.2 shows how ten such codes can be assigned in the case L = 15, H = 24.
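The following Python sketch computes k, a and b and lists the two code pools for the central region; for L = 15 and H = 24 it gives k = 3, a = 6, b = 4 and the codes quoted above. How these codes are then mapped onto the individual pixel values follows Table 2.2 (the function name is illustrative):

```python
import math

def central_region_codes(L, H):
    """k, a, b and the two code pools (a codes of k bits, b codes of k+1 bits)."""
    delta = H - L + 1
    k = int(math.floor(math.log2(delta)))
    a = 2 ** (k + 1) - delta
    b = 2 * (delta - 2 ** k)
    a_codes = [format(2 ** k - 1 - i, f"0{k}b") for i in range(a)]
    b_codes = [format(i, f"0{k + 1}b") for i in range(b)]
    return k, a, b, a_codes, b_codes

# L = 15, H = 24 gives k = 3, a = 6, b = 4,
# a codes 111, 110, ..., 010 and b codes 0000, 0001, 0010, 0011
print(central_region_codes(15, 24))
```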


    Chapter 3

    LOSSY COMPRESSION

    The compression achieved via lossless schemes is often inadequate to cope with the

    volume of image data involved. Thus, lossy schemes (also called irreversible) have to be

    employed, which aim at obtaining a more compact representation of the image at the cost of

    some data loss, which however might not correspond to an equal amount of information loss.

In other words, although the original image cannot be fully reconstructed, the degradation that it has undergone is not visible to a human observer for the purposes of the specific task.

    Compression ratios achieved through lossy compression range from 4:1 to 100:1 or even

    higher.

    3.1. Performance Evaluation Parameters

    To compare different algorithms of lossy compression several approaches of measuring

the loss of quality have been devised. In the medical imaging context, where the ultimate use of an image is

    its visual assessment and interpretation, subjective and diagnostic evaluation approaches are

    the most appropriate. However, these are largely dependent on the specific task at hand and

    moreover they entail costly and time-consuming procedures. In spite of the fact that they are

    often inadequate in predicting the visual (perceptual) quality of the decompressed image,

    objective measures are often used since they are easy to compute and are applicable to all

    kinds of images regardless of the application.

    Compression ratio is defined as the nominal bit depth of the original image in bits per

pixel (bpp) divided by the bpp necessary to store the compressed image. For each compressed and reconstructed image, an error image is calculated. From the error data, the maximum absolute error (MAE), mean square error (MSE), root mean square error (RMSE), signal to noise ratio (SNR), and peak signal to noise ratio (PSNR) are calculated [7],[8].

The maximum absolute error (MAE) is calculated as

MAE = \max_{x,y} \left| f(x,y) - f^*(x,y) \right|

where f(x, y) is the original image data and f^*(x, y) is the compressed (reconstructed) image value. The formulae for the calculated image metrics are:

MSE = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \left[ f(x,y) - f^*(x,y) \right]^2


RMSE = \sqrt{MSE}

where M and N are the matrix dimensions in x and y, respectively.

SNR = 10 \log_{10} \frac{\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y)^2}{\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \left[ f(x,y) - f^*(x,y) \right]^2}

PSNR = 20 \log_{10} \frac{255}{RMSE}
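These measures translate directly into a few lines of NumPy; the sketch below assumes 8-bit images (peak value 255) and is only illustrative:

```python
import numpy as np

def quality_metrics(f, f_star):
    """MAE, MSE, RMSE, SNR and PSNR between an original f and a reconstruction f_star.
    Assumes f and f_star differ somewhere; otherwise RMSE is 0 and PSNR is infinite."""
    f, f_star = f.astype(float), f_star.astype(float)
    err = f - f_star
    mae = np.max(np.abs(err))
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    snr = 10 * np.log10(np.sum(f ** 2) / np.sum(err ** 2))
    psnr = 20 * np.log10(255.0 / rmse)
    return mae, mse, rmse, snr, psnr
```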

    3.2. Transform Coding

    In transform coding, a block of correlated pixels is transformed into a set of less

    correlated coefficients. The transform to be used for data compression should satisfy two

    objectives. Firstly, it should provide energy compaction: i.e. the energy in the transform

    coefficients should be concentrated to as few coefficients as possible. This is referred to as

    the energy compaction property of the transform. Secondly, it should minimize the statistical

correlation between the transform coefficients. As a consequence, transform coding has a good

capability of data compression, because not all transform coefficients need to be transmitted in order to obtain good image quality, and even those that are transmitted need not be represented with full accuracy. In addition, the

    transform domain coefficients are generally related to the spatial frequencies in the image and

    hence the compression techniques can exploit the psycho-visual properties of the HVS, by

    quantizing the higher frequency coefficients more coarsely, as the HVS is more sensitive to

    the lower frequency coefficients [2].


    3.2.1. The Discrete Cosine Transform

    The important feature of the DCT is that it takes correlated input data and concentrates

    its energy in just the first few transform coefficients. If the input data consists of correlated

    quantities, then most of the n transform coefficients produced by the DCT are zeros or small

    numbers, and only a few are large (normally the first ones). The early coefficients contain the

    important (low-frequency) image information and the later coefficients contain the less-

    important (high-frequency) image information. Compressing data with the DCT is therefore

    done by quantizing the coefficients. The small ones are quantized coarsely (possibly all the

way to zero), and the large ones can be quantized finely to the nearest integer. After quantization, the coefficients (or variable-size codes assigned to the coefficients) are written


    on the compressed stream. Decompression is done by performing the inverse DCT on the

    quantized coefficients. This results in data items that are not identical to the original ones but

    are not much different.

    The DCT is applied to small parts (data blocks) of the image. It is computed by applying

    the DCT in one dimension to each row of a data block, then to each column of the result

    [2],[7],[8],[9],[11],[12]. Because of the special way the DCT in two dimensions is computed,

    we say that it is separable in the two dimensions. Because it is applied to blocks of an image,

    we term it a blocked transform. It is defined by

G_{ij} = \frac{2}{\sqrt{mn}} C_i C_j \sum_{x=0}^{n-1} \sum_{y=0}^{m-1} p_{xy} \cos\frac{(2y+1)j\pi}{2m} \cos\frac{(2x+1)i\pi}{2n}

for 0 \le i \le n-1 and 0 \le j \le m-1, and for C_i and C_j defined by the equation for C_f given below. The first coefficient G_{00} is termed the DC coefficient, and the remaining coefficients are called the AC coefficients. The image is broken up into blocks of n \times m pixels p_{xy} (with n = m = 8 typically), and the equation for G_{ij} is used to produce a block of n \times m DCT coefficients for each block of pixels. The coefficients are then quantized, which results in lossy but highly efficient compression. The decoder reconstructs a block of quantized data values by computing the IDCT, whose definition is

p_{xy} = \frac{2}{\sqrt{mn}} \sum_{i=0}^{n-1} \sum_{j=0}^{m-1} C_i C_j G_{ij} \cos\frac{(2x+1)i\pi}{2n} \cos\frac{(2y+1)j\pi}{2m}

where

C_f = \begin{cases} \frac{1}{\sqrt{2}}, & f = 0 \\ 1, & f > 0 \end{cases}

for 0 \le x \le n-1 and 0 \le y \le m-1.

Steps involved in the DCT image compression technique:

1. The image is divided into k blocks of 8×8 pixels each. The pixels are denoted by p_{xy}. If the number of image rows (columns) is not divisible by 8, the bottom row (rightmost column) is duplicated as many times as needed.

2. The DCT in two dimensions is applied to each block B_i. The result is a block (we'll call it a vector) W^{(i)} of 64 transform coefficients w_j^{(i)} (where j = 0, 1, . . . , 63). The k vectors W^{(i)} become the rows of the matrix W:

W = \begin{pmatrix} w_0^{(1)} & w_1^{(1)} & \cdots & w_{63}^{(1)} \\ w_0^{(2)} & w_1^{(2)} & \cdots & w_{63}^{(2)} \\ \vdots & \vdots & & \vdots \\ w_0^{(k)} & w_1^{(k)} & \cdots & w_{63}^{(k)} \end{pmatrix}


3. The 64 columns of W are denoted by C^{(0)}, C^{(1)}, . . . , C^{(63)}. The k elements of C^{(j)} are w_j^{(1)}, w_j^{(2)}, . . . , w_j^{(k)}. The first coefficient vector C^{(0)} consists of the k DC coefficients.

4. Each vector C^{(j)} is quantized separately to produce a vector Q^{(j)} of quantized coefficients. The elements of Q^{(j)} are then written on the compressed stream. In practice, variable-size codes are assigned to the elements, and the codes, rather than the elements themselves, are written on the compressed stream.
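To make the procedure concrete, here is a NumPy sketch of the G_ij and p_xy equations above, applied to one 8×8 block with a simple illustrative quantization step that grows with frequency (this is not the JPEG quantization table):

```python
import numpy as np

def scale(k):
    """C_f from the text: 1/sqrt(2) for f = 0, otherwise 1."""
    return np.where(np.arange(k) == 0, 1 / np.sqrt(2), 1.0)

def dct2(p):
    """2-D DCT of one n x m block, following the G_ij equation above."""
    n, m = p.shape
    cos_x = np.cos((2 * np.arange(n)[None, :] + 1) * np.arange(n)[:, None] * np.pi / (2 * n))
    cos_y = np.cos((2 * np.arange(m)[None, :] + 1) * np.arange(m)[:, None] * np.pi / (2 * m))
    return (2 / np.sqrt(n * m)) * np.outer(scale(n), scale(m)) * (cos_x @ p @ cos_y.T)

def idct2(G):
    """Inverse transform (the IDCT equation above)."""
    n, m = G.shape
    cos_x = np.cos((2 * np.arange(n)[None, :] + 1) * np.arange(n)[:, None] * np.pi / (2 * n))
    cos_y = np.cos((2 * np.arange(m)[None, :] + 1) * np.arange(m)[:, None] * np.pi / (2 * m))
    return (2 / np.sqrt(n * m)) * (cos_x.T @ (np.outer(scale(n), scale(m)) * G) @ cos_y)

# quantize one 8x8 block: higher-frequency coefficients get a coarser step
block = np.random.randint(0, 256, (8, 8)).astype(float)
step = 1.0 + 4.0 * np.add.outer(np.arange(8), np.arange(8))
G_quantized = np.round(dct2(block) / step)
reconstructed = idct2(G_quantized * step)   # close to, but not identical to, block
```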

    3.2.1.1 JPEG

    Most high-quality algorithms today use some form of transform coder. One widely used

    standard is the JPEG compression algorithm, based on the discrete cosine transform (DCT).

The image is partitioned into 8×8 blocks, each of which is then transformed via a tensor product of two 8-point DCTs. The transform coefficients are then arranged into 64 subbands, scalar-quantized, and adaptively Huffman coded [7]. The JPEG algorithm is discussed in detail in the next chapter.

    3.2.2. Wavelets

    Wavelets are functions defined over a finite interval and having an average value of

    zero. The basic idea of the wavelet transform is to represent any arbitrary function as a

    superposition of a set of such wavelets or basis functions. These basis functions or baby

    wavelets are obtained from a single prototype wavelet called the mother wavelet, by dilations

    or contractions (scaling) and translations (shifts).

    Wavelet methods involve overlapping transforms with varying-length basis functions.

The overlapping nature of the transform (each pixel contributes to several output points)

    alleviates blocking artifacts, while the multiresolution character of the wavelet decomposition

    leads to superior energy compaction and perceptual quality of the decompressed image.

    Furthermore, the multiresolution transform domain means that wavelet compression methods

    degrade much more gracefully than block-DCT methods as the compression ratio increases.

    One wavelet algorithm, the embedded zerotree wavelet (EZW) coder, yields acceptable

    compression at a ratio of 100:1. Wavelet coding schemes are especially suitable for

    applications where scalability and tolerable degradation are important[7],[8],[9],[12],[13].

Actually, no compression is achieved by the wavelet transform itself. It decomposes the image into different frequency bands, and the actual compression is done by quantization and entropy coding. There


are many ways of decomposing an image based on the wavelet method used, each involving a

    different algorithm and resulting in subbands with different energy compactions. Some

    methods are given below[2].

    1. Line: This technique is a simpler version of the standard wavelet decomposition. The

    wavelet transform is applied to each row of the image, resulting in smooth coefficients on the

left (subband L1) and detail coefficients on the right (subband H1). Subband L1 is then

    partitioned into L2 and H2, and the process is repeated until the entire coefficient matrix is

    turned into detail coefficients, except the leftmost column, which contains smooth

    coefficients. The wavelet transform is then applied recursively to the leftmost column,

    resulting in one smooth coefficient at the top-left corner of the coefficient matrix. This last

    step may be omitted if the compression method being used requires that image rows be

    individually compressed.

    This technique exploits correlations only within an image row to calculate the

    transform coefficients. Also, discarding a coefficient that is located on the leftmost column

may affect just a particular group of rows and may in this way introduce artifacts into the

reconstructed image. Implementation of this method is simple, and execution is fast, about twice as fast as the standard decomposition. This type of decomposition is illustrated in Figure 3.1.

It is possible to apply this decomposition to the columns of the image instead of to the rows. Ideally, the transform should be applied in the direction of highest image redundancy, and

    experience suggests that for natural images this is the horizontal direction. Thus, in practice,

    line decomposition is applied to the image rows.

    2. Quincunx: Somewhat similar to the Laplacian pyramid, quincunx decomposition proceeds

level by level and decomposes subband Li of level i into subbands Hi+1 and Li+1 of level i+1. Figure 3.2 illustrates this type of decomposition. It is efficient and computationally simple. On

    average, it achieves more than four times the energy compaction of the line method.

    Quincunx decomposition results in fewer subbands than most other wavelet

    decompositions, a feature that may lead to reconstructed images with slightly lower visual

    quality. The method is not used much in practice.

    3. Pyramid: The pyramid decomposition is by far the most common method used to

    decompose images that are wavelet transformed. It results in subbands with horizontal,

vertical, and diagonal image details, as illustrated by Figure 3.3. The three subbands at each

    level contain horizontal, vertical, and diagonal image features at a particular scale, and each

    scale is divided by an octave in spatial frequency (division of the frequency by two).


Figure 3.1: Line wavelet decomposition [2].

Figure 3.2: Quincunx wavelet decomposition [2].

Figure 3.3: Pyramid wavelet decomposition [2].

Figure 3.3 illustrates pyramid decomposition; its first step is identical to that of the other decompositions. Pyramid decomposition turns out to be a very efficient way of transferring significant visual


data to the detail coefficients. Its computational complexity is about 30% higher than that of

    the quincunx method, but its image reconstruction abilities are higher. The reasons for the

    popularity of the pyramid method may be that (1) it is symmetrical; (2) its mathematical

    description is simple.

The quincunx method leaves the high-frequency subband untouched, whereas the pyramid

    method resolves it into two bands. On the other hand, pyramid decomposition involves more

    computations in order to spatially resolve the asymmetric high-frequency band into two

    symmetric high frequency and low-frequency bands.

    4. Standard: The first step in the standard decomposition is to apply whatever discrete

wavelet filter is being used to all the rows of the image, obtaining subbands L1 and H1. This is repeated on L1 to obtain L2 and H2, and so on k times. This is followed by a second step where a similar calculation is applied k times to the columns. If k = 1, the decomposition alternates between rows and columns, but k may be greater than 1. The end result is to have

    one smooth coefficient at the top-left corner of the coefficient matrix. This method is

    somewhat similar to line decomposition.

    Standard decomposition has the second-highest reconstruction quality of all the methods

    described here. The reason for the improvement compared to the pyramid decomposition may

    be that the higher directional resolution gives thresholding a better chance to cover larger

    uniform areas. On the other hand, standard decomposition is computationally more expensive

    than pyramid decomposition.

5. Adaptive Wavelet Packet Decomposition: The idea is to skip those subband splits that do

    not contribute significantly to energy compaction. The result is a coefficient matrix with

    subbands of different (possibly even many) sizes. The justification for this complex

    decomposition method is the prevalence of continuous tone (natural) images. These images

    are mostly smooth but normally also have some regions with high frequency data. Such

    regions should end up as many small subbands (to better enable an accurate spatial frequency

    representation of the image), with the rest of the image giving rise to a few large subbands.

    The downside of this type of decomposition is finding an algorithm that will determine

    which subband splits can be skipped. Such an algorithm uses entropy calculations and should

    be efficient. It should identify all the splits that do not have to be performed, and it should

    identify as many of them as possible.

    This type of decomposition has the highest reproduction quality of all the methods

    discussed here, a feature that may justify the high computational costs in certain special


    applications. This quality, however, is not much higher than what is achieved with simpler

    decomposition methods, such as standard, pyramid, or quincunx.

Figure 3.4: Standard wavelet decomposition [2].
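As an illustration of one pyramid level, the NumPy sketch below applies the (orthonormal) Haar filter pair to the rows and then to the columns, producing the four subbands of Figure 3.3; image dimensions are assumed even, and further levels would be obtained by repeating the step on the LL subband (the function names are illustrative):

```python
import numpy as np

def haar_split(a, axis):
    """Split one axis into smooth (low-pass) and detail (high-pass) halves."""
    even = np.take(a, np.arange(0, a.shape[axis], 2), axis=axis)
    odd = np.take(a, np.arange(1, a.shape[axis], 2), axis=axis)
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def pyramid_level(image):
    """One level of pyramid decomposition: returns the LL, HL, LH, HH subbands."""
    low, high = haar_split(image, axis=1)   # filter the rows
    ll, lh = haar_split(low, axis=0)        # then the columns of each half
    hl, hh = haar_split(high, axis=0)
    return ll, hl, lh, hh

img = np.random.rand(16, 16)
ll, hl, lh, hh = pyramid_level(img)         # repeat on ll for further levels
```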

The quantization and encoding steps are the crucial aspects of wavelet transform compression, because they are where the actual compression takes place. Each of the algorithms described below takes a different approach to these aspects [2].

    1. SPIHT (set partitioning in hierarchical trees) algorithm

    2. EZW (embedded zerotree wavelet) algorithm

    3.2.2.1 SPIHT (Set Partitioning In Hierarchical Trees) Algorithm

    Regardless of the particular filter used, the image is decomposed into subbands, such

    that lower subbands correspond to higher image frequencies (they are the highpass levels)

    and higher subbands correspond to lower image frequencies (lowpass levels), where most of

    the image energy is concentrated (Figure 3.5). This is why we can expect the detail

    coefficients to get smaller as we move from high to low levels. Also, there are spatial

similarities among the subbands. An image part, such as an edge, occupies the same spatial position in each subband. These features of the wavelet decomposition are exploited by the SPIHT method.

    SPIHT was designed for optimal progressive transmission, as well as for compression.

    One of the important features of SPIHT (perhaps a unique feature) is that at any point during

    the decoding of an image, the quality of the displayed image is the best that can be achieved

    for the number of bits input by the decoder up to that moment.

    Another important SPIHT feature is its use of embedded coding. This feature is defined

as follows: If an encoder produces two files, a large one of size M and a small one of size m,

    then the smaller file is identical to the first m bits of the larger file. The following example


    aptly illustrates the meaning of this definition. Suppose that three users wait for you to send

    them a certain compressed image, but they need different image qualities. The first one needs

    the quality contained in a 10 Kb file. The image qualities required by the second and third

    users are contained in files of sizes 20 Kb and 50 Kb, respectively. Most lossy image

    compression methods would have to compress the same image three times, at different

    qualities, to generate three files with the right sizes. SPIHT, on the other hand, produces one

    file, and then three chunks of lengths 10 Kb, 20 Kb, and 50 Kb, all starting at the beginning

    of that file can be sent to the three users, thereby satisfying their needs.

    Another principle is based on the observation that the most significant bits of a binary

    integer whose value is close to maximum tend to be ones. This suggests that the most

    significant bits contain the most important image information, and that they should be sent to

    the decoder first (or written first on the compressed stream). The progressive transmission

    method used by SPIHT incorporates these two principles.

    Figure 3.5: Subbands and levels in wavelet decomposition[13]

The main steps of the SPIHT encoder are as follows [13],[15],[16]:

Step 1: Given an image to be compressed, perform its wavelet transform using any suitable wavelet filter, decompose it into transform coefficients c_{i,j}, and represent the resulting coefficients with a fixed number of bits. Set n to ⌊log2 max_{i,j}(|c_{i,j}|)⌋.

Step 2: Sorting pass: Transmit the number l of coefficients c_{i,j} that satisfy 2^n ≤ |c_{i,j}| < 2^{n+1}. Follow with the l pairs of coordinates and the l sign bits of those coefficients.

Step 3: Refinement pass: Transmit the nth most significant bit of all the coefficients satisfying |c_{i,j}| ≥ 2^{n+1}. These are the coefficients that were selected in previous sorting passes.

Step 4: Iterate: Decrement n by 1. If more iterations are needed, go back to Step 2.


    The last iteration is normally performed for n = 0, but the encoder can stop earlier, in

    which case the least important image information (some of the least significant bits of all the

    wavelet coefficients) will not be transmitted. This is the natural lossy option of SPIHT. It is

    equivalent to scalar quantization, but it produces better results than what is usually achieved

    with scalar quantization, since the coefficients are transmitted in sorted order.

    Partitioning Sorting Algorithm:

    The algorithm used by SPIHT is based on the realization that there is really no need to

    sort all the coefficients. The main task of the sorting pass in each iteration is to select those

coefficients that satisfy 2^n ≤ |c_{i,j}| < 2^{n+1}. This task is divided into two parts. For a given value of n, if a coefficient c_{i,j} satisfies |c_{i,j}| ≥ 2^n, then we say that it is significant; otherwise, it is

    called insignificant. In the first iteration, relatively few coefficients will be significant, but

    their number increases from iteration to iteration, because n keeps getting decremented. The

sorting pass has to determine which of the significant coefficients satisfy |c_{i,j}| < 2^{n+1} and transmit their coordinates to the decoder. This is an important part of the algorithm used by SPIHT. The encoder partitions all the coefficients into a number of sets T_k and performs the significance test

\max_{(i,j) \in T_k} |c_{i,j}| \ge 2^n

on each set T_k. The result may be either no (all the coefficients in T_k are insignificant, so T_k itself is considered insignificant) or yes (some coefficients in T_k are significant, so T_k itself is significant). This result is transmitted to the decoder. If the result is yes, then T_k is

    partitioned by both encoder and decoder, using the same rule, into subsets and the same

    significance test is performed on all the subsets. This partitioning is repeated until all the

    significant sets are reduced to size 1 (i.e., they contain one coefficient each, and that

    coefficient is significant). This is how the significant coefficients are identified by the sorting

pass in each iteration. The significance test performed on a set T can be summarized by

S_n(T) = \begin{cases} 1, & \max_{(i,j) \in T} |c_{i,j}| \ge 2^n \\ 0, & \text{otherwise} \end{cases}
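The significance test itself is tiny; the Python sketch below implements S_n(T) for a set T of coordinates, together with the initial bit-plane number n from Step 1 of the encoder (the function names are illustrative):

```python
import math

def initial_n(coeffs):
    """n = floor(log2 of the largest coefficient magnitude), as in Step 1."""
    return int(math.floor(math.log2(max(abs(c) for row in coeffs for c in row))))

def significance(coeffs, T, n):
    """S_n(T): 1 if max over (i, j) in T of |c_ij| >= 2**n, otherwise 0."""
    return 1 if max(abs(coeffs[i][j]) for (i, j) in T) >= 2 ** n else 0
```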

    Spatial Orientation Trees:

The sets T_k are created and partitioned using a special data structure called a spatial orientation tree. The spatial orientation trees are illustrated in Figure 3.6(a),(b) for a 16×16 image. The figure shows two levels, level 1 (the high pass) and level 2 (the low pass). Each level is divided into four subbands. Subband LL2 (the low pass subband) is divided into four


groups of 2×2 coefficients each. Figure 3.6(a) shows the top-left group, and Figure 3.6(b)

    shows the bottom-right group. In each group, each of the four coefficients (except the top-left

    one, marked in gray) becomes the root of a spatial orientation tree. The arrows show

    examples of how the various levels of these trees are related. The thick arrows indicate how

each group of 4×4 coefficients in level 2 is the parent of four such groups in level 1. In

    general, a coefficient at location (i, j) in the image is the parent of the four coefficients at

    locations (2i, 2j), (2i + 1, 2j), (2i, 2j + 1), and (2i + 1, 2j + 1).

    The set partitioning sorting algorithm uses the following four sets of coordinates:

1. O(i, j): the set of coordinates of the four offspring of node (i, j). If node (i, j) is a leaf of a spatial orientation tree, then O(i, j) is empty.

2. D(i, j): the set of coordinates of the descendants of node (i, j).

3. H(i, j): the set of coordinates of the roots of all the spatial orientation trees (3/4 of the wavelet coefficients in the highest LL subband).

4. L(i, j): the difference set D(i, j) − O(i, j). This set contains all the descendants of tree node (i, j), except its four offspring.

Figure 3.6: Spatial orientation trees in SPIHT [13].

The spatial orientation trees are used to create and partition the sets T_k. The set partitioning

    rules are as follows:

1. The initial sets are {(i, j)} and D(i, j), for all (i, j) ∈ H.

2. If set D(i, j) is significant, then it is partitioned into L(i, j) plus the four single-element sets with the four offspring of (i, j). In other words, if any of the descendants of node (i, j) is significant, then its four offspring become four new sets and all its other descendants become another set (to be significance tested in rule 3).

3. If L(i, j) is significant, then it is partitioned into the four sets D(k, l), where (k, l) are the

    offspring of (i, j).


    Once the spatial orientation trees and the set partitioning rules are understood, the coding

    algorithm can be described.

    SPIHT Coding [2]:

    It is important to have the encoder and decoder test sets for significance in the same

    way. The coding algorithm therefore uses three lists called list of significant pixels (LSP), list

    of insignificant pixels (LIP), and list of insignificant sets (LIS). These are lists of coordinates

    (i, j) that in the LIP and LSP represent individual coefficients, and in the LIS represent either

the set D(i, j) (a type A entry) or the set L(i, j) (a type B entry). The LIP contains coordinates

    of coefficients that were insignificant in the previous sorting pass. In the current pass they are

    tested, and those that test significant are moved to the LSP. In a similar way, sets in the LIS

    are tested in sequential order, and when a set is found to be significant, it is removed from the

    LIS and is partitioned. The new subsets with more than one coefficient are placed back in the

    LIS, to be tested later, and the subsets with one element are tested and appended to the LIP or

the LSP, depending on the results of the test. The refinement pass transmits the nth most significant bit of the entries in the LSP. This algorithm is given below [2].

1. Set the threshold. Set LIP to all root-node coefficients. Set LIS to all trees (assign type D

    to them). Set LSP to an empty set.

    2. Sorting pass: Check the significance of all coefficients in LIP:

    2.1 If significant, output 1, output a sign bit, and move the coefficient to the LSP.

    2.2 If not significant, output 0.

    3. Check the significance of all trees in the LIS according to the type of tree:

    3.1 For a tree of type D:

    3.1.1 If it is significant, output 1, and code its children:

    3.1.1.1 If a child is significant, output 1, then a sign bit, add it to the LSP

    3.1.1.2 If a child is insignificant, output 0 and add the child to the end of LIP.

    3.1.1.3 If the children have descendants, move the tree to the end of LIS as type L, otherwise

    remove it from LIS.

    3.1.2 If it is insignificant, output 0.

    3.2 For a tree of type L:

    3.2.1 If it is significant, output 1, add each of the children to the end of LIS as an entry of

    type D and remove the parent tree from the LIS.

    3.2.2 If it is insignificant, output 0.


    4. Loop: Decrement the threshold and go to step 2 if needed.

    3.2.2.2. EZW (embedded coding using zerotree of wavelet coefficients)

    The EZW method, as implemented in practice, starts by performing the 9-tap

    symmetric quadrature mirror filter (QMF) wavelet transform. The main loop is then repeated

    for values of the threshold that are halved at the end of each iteration. The threshold is used to

    calculate a significance map of significant and insignificant wavelet coefficients. Zerotrees

    are used to represent the significance map in an efficient way. The main steps are as

    follows[2],[13][17],[18],[19]:

1. Initialization: Set the threshold T to the smallest power of 2 that is greater than max_{(i,j)} |c_{i,j}|/2, where the c_{i,j} are the wavelet coefficients.

    2. Significance map coding: Scan all the coefficients in a predefined way and output a symbol

when |c_{i,j}| > T. When the decoder inputs this symbol, it sets c_{i,j} = 1.5T.

    3. Refinement: Refine each significant coefficient by sending one more bit of its binary

    representation. When the decoder receives this, it increments the current coefficient

    value by 0.25T.

4. Set T = T/2, and go to step 2 if more iterations are needed.
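The numeric conventions in these steps are easy to check with a small Python sketch: the initial threshold for the largest coefficient 63 comes out as 32, significant coefficients are reconstructed at 1.5T, and a refinement bit moves them by ±0.25T, matching the values 56 and 40 that appear in the example below (the function names are illustrative):

```python
import math

def initial_threshold(coeffs):
    """Smallest power of 2 greater than max|c|/2 (step 1)."""
    cmax = max(abs(c) for c in coeffs)
    t = 1
    while t <= cmax / 2:
        t *= 2
    return t

def first_pass_decoder_values(coeffs, T):
    """Decoder-side values after one dominant + one subordinate pass: a significant
    coefficient is set to 1.5T (with its sign) and then moved by +0.25T or -0.25T
    according to its refinement bit."""
    values = []
    for c in coeffs:
        if abs(c) > T:
            v = 1.5 * T + (0.25 * T if abs(c) >= 1.5 * T else -0.25 * T)
            values.append(math.copysign(v, c))
        else:
            values.append(0.0)
    return values

coeffs = [63, 34, 49, 47]
T = initial_threshold(coeffs)                   # 32
print(T, first_pass_decoder_values(coeffs, T))  # 32 [56.0, 40.0, 56.0, 40.0]
```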

A wavelet coefficient c_{i,j} is considered insignificant with respect to the current threshold T if |c_{i,j}| ≤ T. The zerotree data structure is based on the following well-known experimental

    result: If a wavelet coefficient at a coarse scale (i.e., high in the image pyramid) is

    insignificant with respect to a given threshold T, then all of the coefficients of the same

    orientation in the same spatial location at finer scales (i.e., located lower in the pyramid) are

    very likely to be insignificant with respect to T.

    In each iteration, all the coefficients are scanned in the order shown in Figure 3.7(a).

    This guarantees that when a node is visited, all its parents will already have been scanned.

The scan starts at the lowest frequency subband LLn, continues with subbands HLn, LHn, and HHn, and drops to level n−1, where it scans HLn−1, LHn−1, and HHn−1. Each subband is

    fully scanned before the algorithm proceeds to the next subband.

    Each coefficient visited in the scan is classified as a zerotree root (ZTR), an isolated

    zero (IZ), positive significant (POS), or negative significant (NEG). A zerotree root is a

    coefficient that is insignificant and all its descendants (in the same spatial orientation tree) are

    also insignificant. Such a coefficient becomes the root of a zerotree. It is encoded with a

special symbol (denoted by ZTR), and the important point is that its descendants don't have


to be encoded in the current iteration. When the decoder inputs a ZTR symbol, it assigns a zero value to the coefficient and to all its descendants in the spatial orientation tree. Their

    values get improved (refined) in subsequent iterations. An isolated zero is a coefficient that is

    insignificant but has some significant descendants. Such a coefficient is encoded with the

    special IZ symbol. The other two classes are coefficients that are significant and are positive

or negative. The flowchart of Figure 3.7(b) illustrates this classification. Notice that a coefficient

    is classified into one of five classes, but the fifth class (a zerotree node) is not encoded.

Coefficients in the lowest pyramid level don't have any children, so they cannot be the

    roots of zerotrees. Thus, they are classified into isolated zero, positive significant or negative

    significant. The zerotree can be viewed as a structure that helps to find insignificance. Most

    methods that try to find structure in an image try to find significance.

    Figure 3.7: (a) Scanning a zerotree. (b) Classifying a coefficient[18]

    Two lists are used by the encoder (and also by the decoder, which works in lockstep) in

the scanning process. The dominant list contains the coordinates of the coefficients that have not been found to be significant. They are stored in scan order, by pyramid levels, and within each level by subbands. The subordinate list contains the magnitudes (not the coordinates)

    of the coefficients that have been found to be significant. Each list is scanned once per

iteration. An iteration consists of a dominant pass followed by a subordinate pass. In the

    dominant pass, coefficients from the dominant list are tested for significance. If a coefficient

    is found significant, then (1) its sign is determined, (2) it is classified as either POS or NEG,

    (3) its magnitude is appended to the subordinate list, and (4) it is set to zero in memory (in the


    array containing all the wavelet coefficients). The last step is done so that the coefficient does

    not prevent the occurrence of a zerotree in subsequent dominant passes at smaller thresholds.

    Example [2]:

This example follows the one in [2]. Figure 3.8(a) shows three levels of the wavelet transform of an 8×8 image. The largest value is 63, so the initial threshold can be anywhere in the range (31, 64]. We set it to 32. Figure 3.8(b) lists the results of the first dominant pass.

    1. The top-left coefficient is 63. It is greater than the threshold, and it is positive, so a POS

    symbol is generated and is transmitted by the encoder (and the 63 is changed to 0). The

    decoder assigns this POS symbol the value 48, the midpoint of the interval [32, 64).

    2. The coefficient 31 is insignificant with respect to 32, but it is not a zerotree root, since one

    of its descendants (the 47 in LH1) is significant. The 31 is therefore an isolated zero (IZ).

    3. The 23 is less than 32. Also, all its descendants (the 3, 12, 14, and 8 in HH2, and all of

    HH1) are insignificant. The 23 is therefore a zerotree root (ZTR). As a result, no symbols will

    be generated by the encoder in the dominant pass for its descendants (this is why none of the

    HH2 and HH1 coefficients appear in the table).

    4. The 10 is less than 32, and all its descendants (the 12, 7, 6, and 1 in HL1) are also less

    than 32. Thus, the 10 becomes a zerotree root (ZTR). Notice that the 12 is greater, in

    absolute value, than the 10, but is still less than the threshold.

    5. The 14 is insignificant with respect to the threshold, but one of its children (they are 1,

    47, 3, and 2) is significant. Thus, the 14 becomes an IZ.

    6. The 47 in subband LH1 is significant with respect to the threshold, so it is coded as POS. It

    is then changed to zero, so that a future pass (with a threshold of 16) will code its parent, 14,

    as a zerotree root.

    Four significant coefficients were transmitted during the first dominant pass. All that

    the decoder knows about them is that they are in the interval [32, 64). They will be refined

    during the first subordinate pass, so the decoder will be able to place them either in [32, 48)

    (if it receives a 0) or in [48, 64) (if it receives a 1). The encoder generates and transmits the

    bits 1010 for the four significant coefficients 63, 34, 49, and 47. Thus, the decoder refines

    them to 56, 40, 56, and 40, respectively.

    In the second dominant pass, only those coefficients not yet found to be significant

    are scanned and tested. The ones found significant are treated as zero when the encoder

    checks for zerotree roots. This second pass ends up identifying the 31 in LH3 as NEG, the

    23 in HH3 as POS, the 10, 14, and 3 in LH2 as zerotree roots, and also all four coefficients


    in LH2 and all four in HH2 as zerotree roots. The second dominant pass stops at this point,

since all other coefficients are known to be insignificant from the first dominant pass.

Figure 3.8: An EZW example: Three levels of an 8×8 image [2].

    The subordinate list contains, at this point, the six magnitudes 63, 49, 34, 47, 31, and 23.

They represent the three 16-unit-wide intervals [48, 64), [32, 48), and [16, 32). The encoder outputs

    bits that define a new subinterval for each of the three. At the end of the second subordinate

    pass, the decoder could have identified the 34 and 47 as being in different intervals, so the six

    magnitudes are ordered as 63, 49, 47, 34, 31, and 23. The decoder assigns them the refined

    values 60, 52, 44, 36, 28, and 20.

    3.5. Quadtrees

Quadtree compression partitions the visual data into a structural part (the quadtree structure) and colour information (the leaf values). The quadtree structure shows the location and size of each homogeneous region, while the colour information represents the intensity of the corresponding region. The generation of the quadtree follows the splitting strategy well known from the area of image segmentation. Quadtree image compression comes in a lossless as well as a lossy flavor; the lossy variant is obtained when the homogeneity criterion is relaxed. This


    technique is not competitive from the rate distortion efficiency viewpoint, but it is much faster

than any transform-based compression technique [2].
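A minimal Python sketch of the splitting strategy is shown below: a square block is split into four quadrants until each leaf satisfies the homogeneity criterion. With tolerance 0 every leaf is exactly constant (the lossless flavour); a larger tolerance gives the lossy variant. Power-of-two square images are assumed, and the function name is illustrative:

```python
import numpy as np

def quadtree(block, tol=0):
    """Return a leaf value if the block is homogeneous (max - min <= tol),
    otherwise a list with the four recursively decomposed quadrants."""
    if block.max() - block.min() <= tol or block.shape[0] == 1:
        return float(block.mean())
    h, w = block.shape[0] // 2, block.shape[1] // 2
    return [quadtree(block[:h, :w], tol), quadtree(block[:h, w:], tol),
            quadtree(block[h:, :w], tol), quadtree(block[h:, w:], tol)]
```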

3.6. Fractal Coding

    A fractal, in simplest terms, is an image of a texture or shape expressed as one or

    more mathematical formulas. In terms of fractal geometry, a fractal is a geometric form

    whose irregular details recur at different scale and angle which can be described by affine or

    fractal transformations (formulas). Fractals have historically been used to generate images in

    applications such as flight simulator scenes and special effects in motion pictures. Fractal

    formulas can now be used to describe all real world pictures [7].

    Fractal image compression is the inverse of fractal image generation, i.e. Instead of

    generating an image or figure from a given formula, fractal image compression searches for

    sets of fractals in a digitized image which describe and represent the entire image. Once the

    appropriate sets of fractals are determined, they are reduced (compressed) to very compact

    fractal transform codes or formulas. The codes are 'rules' for reproducing the various sets of

    fractals which, in turn, regenerate the entire image. Because fractal transform codes require

    very small amounts of data to be expressed and stored as formulas, fractal compression

results in very high compression ratios.

Although fractal compression exhibits promising properties (e.g. fractal interpolation and
resolution-independent decoding), the encoding complexity turned out to be prohibitive for the
successful employment of the technique. Additionally, fractal coding has never reached the
rate-distortion performance of second-generation wavelet codecs. Fractal coding is highly

    asymmetric in that significantly more processing is required for searching/encoding than for

    decoding. This is because the encoding process involves many transformations and

    comparisons to search for sets of fractals, while the decoder simply generates images

    according to the fractal formulas received.
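
The encoder/decoder asymmetry can be illustrated with a toy sketch of the search loop below. It is a simplified illustration only (exhaustive search, no rotations or flips, image dimensions assumed to be multiples of the block size); real fractal coders are considerably more elaborate.

```python
import numpy as np

def encode_fractal(img, range_size=4):
    """Toy illustration of why fractal encoding is slow: for every range
    block, an exhaustive search over downsampled domain blocks (twice the
    range size, 2:1 averaged) with contrast s and brightness o fitted by
    least squares."""
    h, w = img.shape
    dom_size = 2 * range_size
    domains = []
    for dy in range(0, h - dom_size + 1, range_size):
        for dx in range(0, w - dom_size + 1, range_size):
            d = img[dy:dy + dom_size, dx:dx + dom_size]
            d = d.reshape(range_size, 2, range_size, 2).mean(axis=(1, 3))  # 2:1 average
            domains.append(((dy, dx), d.ravel()))
    codes = []
    for ry in range(0, h, range_size):
        for rx in range(0, w, range_size):
            r = img[ry:ry + range_size, rx:rx + range_size].ravel()
            best = None
            for pos, d in domains:                    # exhaustive search = the bottleneck
                A = np.vstack([d, np.ones_like(d)]).T
                sol, *_ = np.linalg.lstsq(A, r, rcond=None)
                s, o = sol
                err = ((s * d + o - r) ** 2).sum()
                if best is None or err < best[0]:
                    best = (err, pos, s, o)
            codes.append((ry, rx, *best[1:]))         # (range pos, domain pos, s, o)
    return codes

codes = encode_fractal(np.random.randint(0, 256, (16, 16)).astype(float))
```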

3.7. Vector Quantization

    Vector quantization exploits similarities between image blocks and an external

    codebook. The image to be encoded is tiled into smaller image blocks which are compared

    against equally sized blocks in an external codebook. For each image block the most similar

    codebook block is identified and the corresponding index is recorded. From the algorithmic

    viewpoint, the process is similar to fractal coding, therefore fractal coding is sometimes

    referred to as vector quantization with internal codebook. Similar to fractal coding, the


    encoding process involves a search for an optimal block match and is rather costly, whereas

    the decoding process in the case of vector quantization is even faster since it is a simple lookup

    table operation. If the properties of the human visual system are used, the size of the

    codebook can be reduced further, and fewer bits are used to represent the index of codebook

    entries[7],[20].

    Two major problems with VQ are, first, how to design a good codebook that is

    representative of all the possible occurrences of pixel combinations in a block, and second,

    how to find a best match efficiently in the codebook during the coding process.
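
A minimal sketch of the two sides of the process is given below; the codebook is assumed to be available already (for example trained offline with the LBG/k-means algorithm, which is not shown), and the block size is assumed to divide the image dimensions evenly.

```python
import numpy as np

def vq_encode(img, codebook, block=4):
    """Map every block x block tile to the index of its nearest codebook
    vector (minimum Euclidean distance). `codebook` has shape (K, block*block)."""
    h, w = img.shape
    indices = []
    for y in range(0, h, block):
        for x in range(0, w, block):
            v = img[y:y + block, x:x + block].reshape(-1).astype(float)
            d = ((codebook - v) ** 2).sum(axis=1)   # distance to every codeword
            indices.append(int(d.argmin()))         # only the index is transmitted
    return indices

def vq_decode(indices, codebook, shape, block=4):
    """Decoding is a simple table lookup: paste the indexed codewords back."""
    out = np.empty(shape)
    it = iter(indices)
    for y in range(0, shape[0], block):
        for x in range(0, shape[1], block):
            out[y:y + block, x:x + block] = codebook[next(it)].reshape(block, block)
    return out

codebook = np.random.randint(0, 256, (64, 16)).astype(float)  # hypothetical 64-entry codebook
img = np.random.randint(0, 256, (16, 16)).astype(float)
approx = vq_decode(vq_encode(img, codebook), codebook, img.shape)
```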


    Chapter 4

    STANDARD METHODS FOR IMAGE COMPRESSION

    With the rapid developments of imaging technology, image compression and coding

    tools and techniques, it is necessary to evolve coding standards so that there is compatibility

    and interoperability between the image communication and storage products manufactured by

different vendors. Without the availability of standards, encoders and decoders cannot

    communicate with each other; the service providers will have to support a variety of formats

    to meet the needs of the customers and the customers will have to install a number of

    decoders to handle a large number of data formats. Towards the objective of setting up

    coding standards, the international standardization agencies, such as International Standards

    Organization (ISO), International Telecommunications Union (ITU), International Electro-

    technical Commission (IEC) etc. have formed expert groups and solicited proposals from

    industries, universities and research laboratories. This has resulted in establishing standards

for bi-level (facsimile) images and continuous-tone (grayscale) images. Basic concepts of

    JPEG and JPEG2000 image compression standards are explained below.

    4.1 JPEG

    JPEG is a sophisticated lossy/lossless compression method for color or grayscale still

    images (not videos). It does not handle bi-level (black and white) images very well. It also

    works best on continuous-tone images, where adjacent pixels have similar colors. An

    important feature of JPEG is its use of many parameters, allowing the user to adjust the

    amount of the data lost (and thus also the compression ratio) over a very wide range. Often,

    the eye cannot see any image degradation even at compression factors of 10 or 20. There are

    two operating modes, lossy (also called baseline) and lossless (which typically produces

    compression ratios of around 0.5). Most implementations support just the lossy mode. This

mode includes progressive and hierarchical coding. The JPEG standard has proved successful

    and has become widely used for image compression, especially in Web pages.

    JPEG has been designed as a compression method for continuous-tone images. The main

    goals of JPEG compression are the following [21],[22],[28]:

    1. High compression ratios, especially in cases where image quality is judged as very good to

    excellent.


    2. The use of many parameters, allowing knowledgeable users to experiment and achieve the

    desired compression/quality trade-off.

    3. Obtaining good results with any kind of continuous-tone image, regardless of image

    dimensions, color spaces, pixel aspect ratios, or other image features.

    4. A sophisticated, but not too complex compression method, allowing software and

    hardware implementations on many platforms.

5. To have the following modes of operation:

    Sequential encoding: each image component is encoded in a single left-to-right, top-to-

    bottom scan

    Progressive encoding: the image is encoded in multiple scans for applications in which

    transmission time is long, and the viewer prefers to watch the image build up in multiple

    coarse-to-clear passes

Figure 4.1: Progressive versus sequential presentation [22]

Figure 4.2: Hierarchical multi-resolution encoding [22]

    Lossless encoding: the image is encoded to guarantee exact recovery of every source image

    sample value (even though the result is low compression compared to the lossy modes);


    Hierarchical encoding: the image is encoded at multiple resolutions so that lower-resolution

    versions may be accessed without first having to decompress the image at its full resolution.

    The typical sequence of image presentation at the output of the decoder for sequential

versus progressive modes of operation is shown in Figure 4.1.

    4.1.1. Lossy and Lossless Compression

    To meet the differing needs of many applications, the JPEG standard includes two basic

    compression methods, each with various modes of operation. This Specification specifies two

    classes of encoding and decoding processes, lossy and lossless processes. Those based on the

    discrete cosine transform (DCT) are lossy, thereby allowing substantial compression to be

achieved while producing a reconstructed image with high visual fidelity to the encoder's

    source image.

    The simplest DCT-based coding process is referred to as the baseline sequential process.

    It provides a capability which is sufficient for many applications. There are additional DCT-

    based processes which extend the baseline sequential process to a broader range of

    applications. In any decoder using extended DCT-based decoding processes, the baseline

    decoding process is required to be present in order to provide a default decoding capability.

    The second class of coding processes is not based upon the DCT and is provided to meet

    the needs of applications requiring lossless compression. These lossless encoding and

    decoding processes are used independently of any of the DCT-based processes.

    The amount of compression provided by any of the various processes is dependent on the

    characteristics of the particular image being compressed, as well as on the picture quality

    desired by the application and the desired speed of compression and decompression.

    4.1.2 Sequential DCT-based Coding

    Figures 4.3 and 4.5 show the key processing steps which are the heart of the DCT-based modes of operation. These figures illustrate the special case of single-component

    (grayscale) image compression. We can grasp the essentials of DCT-based compression by

thinking of it as compression of a stream of 8×8 blocks of grayscale image

    samples. Color image compression can then be approximately regarded as compression of

    multiple grayscale images, which are either compressed entirely one at a time, or are

    compressed by alternately interleaving 8x8 sample blocks from each in turn.

In the encoding process the input component's samples are grouped into 8×8 blocks,
and each block is transformed by the forward DCT (FDCT) into a set of 64 values referred to


as DCT coefficients. One of these values is referred to as the DC coefficient and the other 63
as the AC coefficients. Each of the 64 coefficients is then quantized using one of 64
corresponding values from a quantization table (determined by one of the table specifications
shown in Figure 4.3). No default values for quantization tables are specified in the standard;
applications may specify values which customize picture quality for their particular image
characteristics, display devices, and viewing conditions.

Figure 4.3: DCT-based encoder simplified diagram [22]

Figure 4.4: Preparation of quantized coefficients for entropy encoding [22]

After quantization, the DC coefficient and the 63 AC coefficients are prepared for
entropy encoding, as shown in Figure 4.4. The previous quantized DC coefficient is used to
predict the current quantized DC coefficient, and the difference is encoded. The 63 quantized
AC coefficients undergo no such differential encoding, but are converted into a one-dimensional
zig-zag sequence, as shown in Figure 4.4. The quantized coefficients are then passed to an
entropy encoding procedure which compresses the data further. If Huffman encoding is used,
Huffman table specifications must be provided to the encoder. If arithmetic encoding is used,
arithmetic coding conditioning table specifications may be provided; otherwise the default
conditioning table specifications shall be used. Figure 4.5 shows the main procedures for all
DCT-based decoding processes. Each step shown performs essentially the


    inverse of its corresponding main procedure within the encoder. The entropy decoder decodes

    the zig-zag sequence of quantized DCT coefficients. After dequantization the DCT

coefficients are transformed back to an 8×8 block of samples by the inverse DCT (IDCT).

Figure 4.5: DCT-based decoder simplified diagram [22]
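
The encoder steps just described (level shift, 8×8 FDCT, uniform quantization, zig-zag ordering and differential coding of the DC coefficient) can be sketched as below. This is an illustrative sketch that stops before entropy coding; `qtable` stands for whatever 8×8 quantization table the application supplies, since the standard defines no default.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix used for 8x8 JPEG blocks."""
    k = np.arange(n).reshape(-1, 1)
    x = np.arange(n).reshape(1, -1)
    T = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    T[0, :] = np.sqrt(1.0 / n)
    return T

# zig-zag scan order for an 8x8 block, as (row, col) pairs
ZIGZAG = sorted(((r, c) for r in range(8) for c in range(8)),
                key=lambda rc: (rc[0] + rc[1],
                                -rc[1] if (rc[0] + rc[1]) % 2 else rc[1]))

def encode_block(block, qtable, prev_dc, T=dct_matrix()):
    """FDCT -> quantize -> zig-zag -> differential DC, for one 8x8 block."""
    shifted = block.astype(float) - 128            # level shift for 8-bit samples
    coeffs = T @ shifted @ T.T                     # forward 2-D DCT
    quant = np.round(coeffs / qtable).astype(int)  # uniform quantization
    zz = [quant[r, c] for r, c in ZIGZAG]          # 1-D zig-zag sequence
    dc_diff = zz[0] - prev_dc                      # DC coded as difference from previous block
    return dc_diff, zz[1:], zz[0]                  # (DC diff, 63 AC coeffs, new predictor)

block = np.tile(np.arange(8), (8, 1)) * 16         # synthetic 8x8 test block
dc_diff, ac, prev_dc = encode_block(block, np.full((8, 8), 16.0), prev_dc=0)
```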

    4.1.3 Lossless Coding

    Figure 4.6 shows the main procedures for the lossless encoding processes. A predictor

    combines the reconstructed values of up to three neighbourhood samples at positions a, b, and

    c to form a prediction of the sample at position x as shown in Figure 4.7. This prediction is

    then subtracted from the actual value of the sample at position x, and the difference is

losslessly entropy-coded by either Huffman or arithmetic coding. Any one of the eight

    predictors listed in Table 4.1 (under selection-value) can be used. Selections 1, 2, and 3 are

    one-dimensional predictors and selections 4, 5, 6 and 7 are two-dimensional predictors.

    Selection-value 0 can only be used for differential coding in the hierarchical mode of

    operation.

    For the lossless mode of operation, two different codecs are specified - one for each

entropy coding method. The encoders can use any source image precision from 2 to 16

    bits/sample, and can use any of the predictors except selection-value 0. The decoders must

    handle any of the sample precisions and any of the predictors. Lossless codecs typically

    produce around 2:1 compression for color images with moderately complex scenes.

    This encoding process may also be used in a slightly modified way, whereby the

    precision of the input samples is reduced by one or more bits prior to the lossless coding.

    This achieves higher compression than the lossless process (but lower compression than the

DCT-based processes for equivalent visual fidelity), and limits the reconstructed image's

    worst-case sample error to the amount of input precision reduction.


Figure 4.6: Lossless encoder simplified diagram [22]

Figure 4.7: 3-sample prediction neighbourhood [22]

Table 4.1: Predictors for lossless coding [22]
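
A sketch of the predictors of Table 4.1 and of the residual computation is shown below; border handling and the exact integer arithmetic of the standard are simplified here, so treat it as an illustration rather than a conformant implementation.

```python
def predict(a, b, c, selection):
    """JPEG lossless predictors (Table 4.1): a = left, b = above,
    c = above-left neighbour of the current sample x."""
    return {
        1: a,
        2: b,
        3: c,
        4: a + b - c,
        5: a + ((b - c) >> 1),
        6: b + ((a - c) >> 1),
        7: (a + b) >> 1,
    }[selection]

def lossless_residuals(img, selection=7):
    """Residuals (x - prediction) that are passed to the entropy coder.
    Borders are zero-padded here; the standard defines specific defaults
    for the first row and column."""
    h, w = len(img), len(img[0])
    res = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            a = img[y][x - 1] if x > 0 else 0
            b = img[y - 1][x] if y > 0 else 0
            c = img[y - 1][x - 1] if x > 0 and y > 0 else 0
            res[y][x] = img[y][x] - predict(a, b, c, selection)
    return res

img = [[52, 55, 61], [59, 62, 68], [63, 65, 70]]
print(lossless_residuals(img, selection=4))
```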

    The JPEG algorithm yields good results for compression ratios of 10:1 and below (on

    8-bit gray-scale images), but at higher compression ratios the underlying block nature of the

    transform begins to show through the compressed image. By the time compression ratios

    have reached 24:1, only the DC (lowest frequency) coefficient is getting any bits allocated to

it, and the input image has been approximated by a set of 8×8 blocks. Consequently, the

    decompressed image has substantial blocking artifacts for medium and high compression

    ratios.


    4.2 JPEG 2000

    The data compression field is very active, with new approaches, ideas, and techniques

    being developed and implemented all the time. JPEG is widely used for image compression

    but is not perfect. The use of the DCT on 88 blocks of pixels results sometimes in a

    reconstructed image that has a blocky appearance (especially when the JPEG parameters are

    set for much loss of information). This is why the JPEG committee has developed a new,

    wavelet-based standard for the compression of still images, to be known as JPEG 2000. JPEG

    2000 has many advantages over JPEG, such as better image quality at the same file size, 25-

    35% smaller file sizes at comparable image quality, good image quality even at very high

    compression ratios (over 80:1), low complexity option for devices with limited resources,

    scalable image files, and progressive rendering and transmission through a layered image file

    structure.

    JPEG 2000 is not only intended to provide rate-distortion and subjective image quality

    performance superior to existing standards, but also to provide features and functionalities

    that current standards can either not address efficiently or in many cases cannot address at all.

    Lossless and lossy compression, embedded lossy to lossless coding, progressive transmission

    by pixel accuracy and by resolution, robustness to the presence of bit-errors and region-of-

    interest coding, are some representative features. It is interesting to note that JPEG2000 is

    designed to address the requirements of a diversity of applications, e.g. Internet, color

    facsimile, printing, scanning, digital photography, remote sensing, mobile applications,

    medical imagery, digital library and E-commerce.

JPEG 2000 has a long list of features, a subset of which are [24], [25], [26], [27]:

• High compression efficiency. Bitrates of less than 0.25 bpp are expected for highly detailed greyscale images.

• The ability to handle large images, up to 2^32 × 2^32 pixels (the original JPEG can handle images of up to 2^16 × 2^16).

• Progressive image transmission. The standard can decompress an image progressively by SNR, resolution, colour component, or region of interest.

• Easy, fast access to various points in the compressed stream. The decoder can pan/zoom the image while decompressing only parts of it, and can rotate and crop the image while decompressing it.


• Error resilience. Error-correcting codes can be included in the compressed stream, to improve transmission reliability in noisy environments.

    One of the new, important approaches to compression introduced by JPEG 2000 is the

"compress once, decompress many ways" paradigm. The JPEG 2000 encoder selects a

    maximum image quality Q and maximum resolution R, and it compresses an image using

    these parameters. The decoder can decompress the image at any image quality up to and

    including Q and at any resolution less than or equal to R. Suppose that an image I was

compressed into B bits. The decoder can extract A bits from the compressed stream (where
A < B) and produce a lossy decompressed image that will be identical to the image obtained
if I was originally compressed lossily to A bits.

    In general, the decoder can decompress the entire image in lower quality and/or lower

    resolution. It can also decompress parts of the image (regions of interest) at either maximum

or lower quality or resolution. Moreover, the decoder can extract parts of the compressed

    stream and assemble them to create a new compressed stream without having to do any

    decompression. Thus, a lower-resolution and/or lower-quality image can be created without

    the decoder having to decompress anything. The advantages of this approach are (1) it saves

    time and space and (2) it prevents the buildup of image noise, common in cases where an

    image is lossily compressed and decompressed several times.

Figure 4.8 shows the steps in the JPEG 2000 compression of an image. The function of each
block is explained below [26].

Figure 4.8: Steps in the JPEG 2000 compression of an image (tiling → component transform → wavelet transform → quantizer → entropy coder → packet ordering) [26].

Figure 4.9: Tiling, DC level shifting and DWT of each image tile component [26].


1. Tiling

The first thing that happens when an image is JPEG 2000 compressed is that it is split

    into rectangular tiles. Since each tile is compressed independently of every other tile, the

    usual rationale for tiling is to limit the amount of memory needed to implement JPEG 2000

    and to provide independent access to regions in an image. Some implementations are

    designed for tiling and perform best with tiled images; other implementations can compress

    megabyte and gigabyte images without tiling.

Prior to computation of the forward discrete wavelet transform (DWT) on each image
tile, all samples of the image tile component are DC level shifted by subtracting the same
quantity, 2^(s-1), where s is the component bit depth. DC level shifting is performed only on
samples of components that are unsigned. If a colour transformation is used, DC level shifting
is performed prior to computation of the forward component transform; otherwise it is
performed prior to the wavelet transform, as shown in Figure 4.9. This process translates all
pixel values from their original, unsigned interval [0, 2^s - 1] to the signed interval
[-2^(s-1), 2^(s-1) - 1] by subtracting 2^(s-1) from each value. For s = 4, for example, the
2^4 = 16 possible pixel values are translated from the interval [0, 15] to the interval [-8, +7]
by subtracting 2^(4-1) = 8 from each value.
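
A minimal sketch of the level shift, assuming unsigned integer samples of the stated bit depth:

```python
import numpy as np

def dc_level_shift(samples, bit_depth):
    """Shift unsigned samples [0, 2^s - 1] to the signed range
    [-2^(s-1), 2^(s-1) - 1] by subtracting 2^(s-1)."""
    return samples.astype(np.int32) - (1 << (bit_depth - 1))

# s = 4: the values 0..15 become -8..+7
print(dc_level_shift(np.arange(16), 4))
```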

    2. Component transform

    If the components in a multi-component image are red, green and blue, then an optional

    component transform is available to convert them to luminance and chrominance. The

    purpose of these transforms is to decorrelate the red, green and blue image components,

    which improves compression performance by redistributing the energy across the image

components. Two such transforms are defined: the irreversible colour transform (ICT), used
for lossy coding, and the reversible colour transform (RCT), an integer transform used when
lossless coding is required. In this respect, the ICT does a better job at decorrelating the red,
green and blue values than the RCT, which leads to better compression. Whichever transform
is used before compression, the inverse transform is applied after decompression to restore the red, green

    and blue values.
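
As an illustration, the sketch below implements the reversible colour transform (RCT) and its inverse in the usual integer form; the ICT is a conventional floating-point RGB-to-luminance/chrominance transform and is omitted here. Treat the snippet as a sketch rather than a normative implementation.

```python
import numpy as np

def rct_forward(r, g, b):
    """Reversible colour transform (integer, exactly invertible)."""
    y = (r + 2 * g + b) >> 2      # floor((R + 2G + B) / 4)
    u = b - g
    v = r - g
    return y, u, v

def rct_inverse(y, u, v):
    g = y - ((u + v) >> 2)
    r = v + g
    b = u + g
    return r, g, b

# round-trip check on random 8-bit samples
rgb = np.random.randint(0, 256, size=(3, 5))
assert np.array_equal(np.array(rct_inverse(*rct_forward(*rgb))), rgb)
```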

    3. Wavelet Transform

The wavelet transform is applied to each tile. The tile is decomposed into different

    resolution levels. These decomposition levels are made up of subbands of coefficients that

    describe the frequency characteristics of local areas (rather than across the entire tile-

    component) of the tile component.


    4. Quantizer

    The next step after the wavelet transform is the quantization of the subband images,

    which is a lossy step that reduces their precision in order to improve their compressibility in

    the following step, which is the arithmetic coder. In lossless compression, the subband

    images are passed unchanged to the arithmetic coder.
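
A sketch of the deadzone uniform scalar quantizer commonly described for JPEG 2000 is shown below; `delta` stands for the per-subband step size and `r` for a decoder-chosen reconstruction offset (0.5 gives mid-point reconstruction). It is an illustration, not the normative procedure.

```python
import numpy as np

def quantize(coeffs, delta):
    """Deadzone uniform scalar quantizer: q = sign(y) * floor(|y| / delta)."""
    return np.sign(coeffs) * np.floor(np.abs(coeffs) / delta)

def dequantize(q, delta, r=0.5):
    """Reconstruction with offset r inside the decoded bin."""
    return np.where(q == 0, 0.0, np.sign(q) * (np.abs(q) + r) * delta)

y = np.array([-7.3, -0.4, 0.0, 2.6, 11.9])
q = quantize(y, delta=2.0)        # -> [-3., -0., 0., 1., 5.]
print(dequantize(q, delta=2.0))   # -> [-7., 0., 0., 3., 11.]
```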

    5. Entropy coder

    After quantization comes the entropy coder, which takes advantage of the statistical

    properties of the quantized subband images to reduce the number of bits used to represent

them. This is the stage where the actual compression occurs. While baseline JPEG uses

    Huffman coding, JPEG 2000 uses a more sophisticated and computationally expensive

    method known as adaptive arithmetic coding. The subband images are partitioned into fixed-

size codeblocks and the arithmetic coder is applied independently to each bitplane of each

    subband image within a codeblock. Because arithmetic coding can become less effective for

    lower bitplanes, JPEG 2000 has an optional Bypass mode that skips the coding of the lower

    bitplanes, which saves time with little reduction in compression efficiency.
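
The bitplane view of a codeblock can be sketched as follows; the three coding passes per bitplane and the arithmetic coder itself are omitted, and sign handling is reduced to a comment.

```python
import numpy as np

def bitplanes(codeblock):
    """Split the magnitudes of a quantized codeblock into bitplanes,
    most significant first; signs are coded separately."""
    mags = np.abs(codeblock).astype(np.int64)
    nplanes = int(mags.max()).bit_length() if mags.max() > 0 else 1
    return [(mags >> p) & 1 for p in range(nplanes - 1, -1, -1)]

cb = np.array([[5, -3], [0, 12]])
for p, plane in enumerate(bitplanes(cb)):
    print(f"plane {p}:\n{plane}")
```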

    6. Packet ordering

    Packets are the fundamental building blocks of a JPEG 2000 codestream. While a layer is an

    increment in quality for the entire image, a packet is an increment in quality for a specific

    position of a given resolution of a component of a tile. The interleaving of packets in a

    codestream determines the progression order in which compressed data is received and

    decompressed. JPEG 2000 defines five progression orders or packet orderings. In resolution-

    major progression orders, the packets for all layers, positions and components of the lowest

    resolution come in the codestream before all those for the next higher resolution level.
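
The progression orders amount to different nestings of four loops over layer (L), resolution (R), component (C) and position (P); the five orders are commonly listed as LRCP, RLCP, RPCL, PCRL and CPRL. The generic sketch below is illustrative only; the function name and parameters are hypothetical.

```python
from itertools import product

def packet_order(progression, layers, resolutions, components, positions):
    """Enumerate packets for a given progression order. The order name
    gives the loop nesting from outermost to innermost, e.g. 'RLCP' is
    resolution-major: all packets of resolution 0 precede those of
    resolution 1, and so on."""
    axes = {'L': range(layers), 'R': range(resolutions),
            'C': range(components), 'P': range(positions)}
    for combo in product(*(axes[k] for k in progression)):
        yield dict(zip(progression, combo))

# resolution-major ordering for a tiny codestream
for pkt in packet_order('RLCP', layers=2, resolutions=2, components=1, positions=1):
    print(pkt)
```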

    4.3. JPEG 2000 Applications in Access and Preservation

    JPEG 2000 is being used for geospatial imaging, medical imaging and by the cultural

    heritage and digital preservation communities. Many digital collection and library systems

    support JPEG 2000, and several institutions use it in their collections. This section will

discuss the experiences of a few of those institutions, chosen as they highlight the issues [26].

An institution that has done much work in the use of JPEG 2000 and is now one of the leaders in its adoption is the Harvard University Library (HUL). The move to JPEG 2000

    was driven in part by institutional clients who wanted features such as interactive zoom,

    pan, and rotate. These requirements are not easily implemented with TIFF, GIF, or JPEG,

    but are easily enabled by JPEG 2000. In 2006, Harvard reported the successful test


    migration of more than 10,000 TIFF, GIF, and JPEG images to equivalent lossless and

    lossy JPEG 2000 form.

Over the past several years, the rate of acquisition of new JPEG 2000 images into the HUL Digital Repository Service (DRS) has steadily increased, while that for JPEG and

    TIFF has decreased. The DRS now manages about two million JPEG 2000 images, and

    JPEG 2000 is becoming the default format for image conversion and acquisition. A single

    JPEG 2000 master image in the repository enables the dynamic delivery of arbitrarily-

    sized use images (transcoded to JPEG for rendering by the client browser), all computed

    on demand from the master, thereby eliminating the need to maintain multiple variants in

    anticipation of client requests. In addition, JPEG 2000 enables an interactive interface that

    lets users perform the zoom, pan, and rotation operations that now form the common user

    expectation for web-based image delivery.

Library and Archives Canada ran a year-long JPEG 2000 pilot project over 2006 and 2007, the results of which were described at the Museums and the Web 2007 conference.

    This pilot was undertaken to address many of the questions that cultural institutions have

    regarding JPEG 2000. One of their main results was to show that the use of JPEG 2000

    could reduce file sizes significantly without loss to image quality. In the case of lossless

    archival masters, the compression ratio was typically around 2:1. For production or access

    masters, they specified a recommended compression ratio of 24:1 for colour images,

    which included photographs, prints, drawings and maps, and 8:1 for greyscale images,

    which included newspapers, microfilm and textual materials. They found that the JPEG

    2000 codec they used performed best when images were tiled, and they recommended

tile sizes of 512 by 512 and 1024 by 1024. They also observed that the use of JPEG 2000

    meant that derivative files were no longer required. The JP2 files they created in this pilot

    contained XML boxes with MODS based metadata records.

The Library of Congress already makes use of JPEG 2000. For example, Civil War maps in the American Memory collection are compressed using JPEG 2000. A client's pan and

    zoom requests are served with reference to a JPEG 2000 image; the resulting views are

    transcoded to JPEG for delivery to a standard web browser. The site also offers the option

of downloading the JPEG 2000 image of the map. The Library's collection still has some

    maps compressed using MrSID, a proprietary wavelet-based compression method that

    predates JPEG 2000.


    Chapter 5

    COMPARATIVE STUDY

    The goal of image compression is to save storage space and to reduce transmission

    time for image data. It aims at achieving a high compression ratio (CR) while preserving

    good fidelity of decoded images. The techniques used to compress/decompress a single gray

    level image are expected to be easily modified to encode/decode color images and image

    sequences. There is always a compromise between image quality and compression ratio.

Many methods are available for image compression, and the choice of a particular
method depends on the application. In this chapter we review comparative studies made by

    different authors.

    5.1. Comparative results obtained by DELGORGE [6]

The survey was performed on 10 ultrasound images of size 768×576. The images were
acquired by an AU3 ultrasound scanner (ESAOTE) at a rate of 15 images per second and then
digitised with a Matrox Meteor board. The computations were performed on a 450 MHz
Pentium III under Windows NT.

The following results represent average measures of the MSE, PSNR, the coding times
tcc, tenc and tdec, and the compression rate CRt, calculated over the ten reconstructed and
original images of the database. The results for tcc, tenc and tdec have to be compared with
one another to appreciate the performance of each of the studied techniques.
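
For reference, the fidelity measures quoted in this chapter are the mean squared error and the peak signal-to-noise ratio, PSNR = 10 log10(255^2 / MSE) for 8-bit images; a minimal sketch:

```python
import numpy as np

def mse(original, decoded):
    """Mean squared error between two images of the same size."""
    return np.mean((original.astype(float) - decoded.astype(float)) ** 2)

def psnr(original, decoded, peak=255.0):
    """PSNR in dB: 10 * log10(peak^2 / MSE); infinite for identical images."""
    e = mse(original, decoded)
    return float('inf') if e == 0 else 10 * np.log10(peak ** 2 / e)
```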

Table 5.1: Comparison results for lossless methods

Table 5.2: Comparison results for low and high compression

It can be concluded that RLE coding is not suited to ultrasound images, as its CRt
is the largest. The Fano and Huffman algorithms give comparable results in terms of tenc, tdec
and CRt, with poor performance. The adaptive Huffman method presents a compression rate of
54.57% (the final image size is about half of the original one). The last method, based on
arithmetic coding, gives the best compression rate, but is associated with larger compression


and decompression times. In conclusion, the adaptive Huffman method gives the best

    compromise between compression rate and computing times.

Experimental results on the ten ultrasound images establish that the JPEG-LS
technique seems to be the best lossless method for telemedicine applications. In the lossy
case, JPEG-LS is the best method when the expected compression rate is greater than 5%,
and for very high compression, JPEG 2000 becomes the optimal technique.

5.2. Comparative results reported by Chaur-Chin Chen [7] for various lossy methods

Table 5.3: Performance of different methods

Method  | Advantages                                                 | Disadvantages                                  | Compression ratio
--------|------------------------------------------------------------|------------------------------------------------|------------------
Wavelet | high compression ratio                                     | coefficient quantization; bit allocation       | >> 32
JPEG    | state-of-the-art, current standard                         | coefficient (DCT) quantization; bit allocation | 50
VQ      | simple decoder; no coefficient quantization                | slow codebook generation; small bpp            | < 32
Fractal | good mathematical encoding frame; resolution-free decoding | slow encoding                                  | 16

Image compression algorithms based on EZW, JPEG/DCT, VQ, and fractal methods
were tested on four 256×256 real images (Jet, Lenna, Mandrill, Peppers) and one 400×400
fingerprint image. The original images of Lenna and the fingerprint are shown in Figure 5.1.
The performance results are reported in Table 5.4. The decoded images based on the four
approaches are shown in Figures 5.2 and 5.3. The associated PSNR values and
encoding/decoding times shown in Table 5.4 for the test images indicate that all four
approaches are satisfactory at the 0.5 bpp request (CR = 16). However, EZW has significantly
larger PSNR values and a better visual quality of decoded images compared with the other
approaches. At a desired compression of 0.25 bpp (CR = 32) for the fingerprint image, the


commonly used VQ cannot be tested, and fractal coding cannot be achieved unless the
resolution-free decoding property is utilized, which is not useful for the current purpose; both
the EZW and JPEG approaches perform well, and the results of EZW have significantly larger
PSNR values than those of JPEG.

Table 5.4: Performance of coding algorithms on various 256×256 images.

    Algorithm PSNR values (in dB)


Recommended