WSEAS TRANSACTIONS on CIRCUITS and SYSTEMS, Issue 4, Volume 12, April 2013, E-ISSN: 2224-266X

JPEG2000 Hardware Implementation Procedures and Issues

ALI M. REZA
U.S. Coast Guard Academy
Department of Engineering, Electrical Engineering Section
15 Mohegan Ave., New London, CT 06320
U.S.A.

Abstract: Hardware implementation of the JPEG 2000 compression/decompression standard, applicable to real-time image sequences, is discussed in this article. The general system-level design is presented along with its details. In this work, for clarity and ease of understanding, the implementations of lossless and lossy compression are treated independently. Depending on the application, one implementation is usually preferred over the other, and in general there is no need to include both methods on the same hardware platform. For multi-level decomposition, it is assumed that the LL component of the previous stage, i.e., the scale coefficients, is fed back to the same hardware for further decomposition. The lifting approach, along with the convolution method and polyphase decomposition, is discussed for implementation of the forward and inverse discrete wavelet transform (DWT).

Key–Words: JPEG 2000, Hardware Implementation, Discrete Wavelet Transform, Lossy and Lossless Compression, Lifting Algorithm

1 Introduction

Hardware implementation of the JPEG 2000 compression standard for lossy and lossless compression depends on several different operations. The original image frame, which is assumed to have three color components, is divided into tiles, usually 64 × 64, and a given tile component is then delivered to the discrete wavelet transform (DWT) block. Depending on the number of stages in the DWT, which corresponds to the number of decomposition levels, the LL component, i.e., the scale coefficients, of the previous level is fed back to the DWT block for the next level of decomposition. The quantization block is used only during lossy compression and is bypassed in lossless compression. Each tile is independently quantized and passed to the code-block formation stage to produce fixed-size rectangular blocks for coding. Each code-block is decomposed into its bit-planes and passed through the entropy-coding block to produce the compressed bit-stream.

The reverse operation is carried out to recover the original image from the compressed one. In the reverse process, the quantization block simply behaves as a scaling operation to produce the proper scale for a given code. In this case, the inverse DWT (IDWT) feeds back its reconstructed image, at a given resolution, in order to reconstruct the image at the next higher resolution. The number of feedback operations in this case depends on the number of decomposition levels used in the compressed image.

Coding in the JPEG 2000 standard is based on adaptive arithmetic coding [1]. Encoding is implemented during the compression stage and its inverse operation, referred to as decoding, is implemented during the decompression stage. We have already presented a system-level design for implementation of the 2D wavelet transform [2] and the coefficient bit modeling used in the JPEG 2000 standard [3]. Our system-level design for the adaptive arithmetic encoding is discussed in [4] and the corresponding discussion for the decoding process is given in [5]. Our main reference in this work is the JPEG 2000 Part I Final Committee Draft Version 1.0 dated 16 March 2000 [6].

There are several real-time hardware implementations that deal with video compression. A programmable and dedicated approach to real-time video processing applications is proposed in [7]. This work is not based on JPEG 2000. A hardware architecture dedicated to the macroblock engine of the H.264/AVC video codec standard is presented in [8]. This work is based on the H.264 standard and is useful for video processing. A hardware architecture for a motion-compensated video frame rate up-conversion application is proposed in [9]. This work is also useful only for video processing. Our implementation is mainly applicable to image sequence processing where correlation between frames is not considered for compression.

The main procedure, in block diagram form, is shown in Fig. 1. The original image, which may have color components, is divided into tiles, usually 64 × 64. A given tile is used as input to the discrete wavelet transform (DWT) block. Depending on the number of stages in the DWT, which corresponds to the number of decomposition levels, the LL component, i.e., the scale coefficients, of the previous level is fed back from memory to the DWT block for the next level of decomposition.
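The feedback of the low-pass output into the same transform block can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `decompose` and `toy_stage` are hypothetical, the signal is one-dimensional, and `toy_stage` is a simple pairwise average/difference stand-in rather than the wavelet filters used in this paper.

```python
def toy_stage(x):
    """Hypothetical stand-in for one DWT stage (NOT the 5/3 or 9/7 filter):
    pairwise floor-average as the low band, pairwise difference as the high band."""
    low = [(a + b) // 2 for a, b in zip(x[0::2], x[1::2])]
    high = [a - b for a, b in zip(x[0::2], x[1::2])]
    return low, high


def decompose(x, levels, stage=toy_stage):
    """Run `stage` repeatedly, feeding the low band of each level back in."""
    details = []          # high band of every level, finest level first
    low = list(x)
    for _ in range(levels):
        low, high = stage(low)   # the same "hardware" block is reused per level
        details.append(high)
    return low, details


if __name__ == "__main__":
    print(decompose([1, 2, 3, 4, 5, 6, 7, 8], 2))
```

Only the low band shrinks and re-enters the block, so each additional level processes half as many samples as the previous one.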

The quantization block is used only for lossy compression and is bypassed in lossless compression. Each sub-image of the decomposition is independently quantized and passed on to the code-block formation stage to produce fixed-size rectangular blocks for coding. Each code-block is decomposed into its bit-planes and passed through the entropy-coding block to produce the compressed bit-stream.
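As a rough illustration of the bit-plane decomposition step, the sketch below splits integer coefficients into a sign list and magnitude bit-planes, most significant plane first. The function name `bit_planes` and the sign-magnitude layout are assumptions made for this example; the actual code-block scanning order and context modeling are outside its scope.

```python
def bit_planes(coeffs, nbits):
    """Return (signs, planes): sign bit per coefficient and the magnitude
    bit-planes from the most significant (index 0) to the least significant."""
    signs = [1 if c < 0 else 0 for c in coeffs]
    mags = [abs(c) for c in coeffs]
    planes = [[(m >> b) & 1 for m in mags] for b in range(nbits - 1, -1, -1)]
    return signs, planes


if __name__ == "__main__":
    print(bit_planes([5, -3, 0], 3))
```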

The reverse operation is carried out to recover the original image from the compressed one. In the reverse operation, the quantization block simply behaves as a scaling operator to produce the proper scale for a given code. In this case, the inverse DWT (IDWT) recursively feeds back its reconstructed image at a given resolution in order to reconstruct the image at the next higher resolution. The number of feedback operations in this case depends on the number of decomposition levels used in the compressed image.
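The asymmetry between the two directions (quantize on the way in, plain scaling on the way out) can be sketched as follows. This is a minimal example that assumes a dead-zone scalar quantizer with a single step size; the names `quantize` and `dequantize` and the step value are illustrative, and no reconstruction offset is applied.

```python
import math


def quantize(y, step):
    """Dead-zone scalar quantizer: sign(y) * floor(|y| / step)."""
    q = math.floor(abs(y) / step)
    return -q if y < 0 else q


def dequantize(q, step):
    """In the reverse direction the block is only a scaling by the step size."""
    return q * step


if __name__ == "__main__":
    q = quantize(-7.3, 2.0)
    print(q, dequantize(q, 2.0))
```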

2 Discrete Wavelet Transform (DWT) for Lossless JPEG 2000

In this part, we present the design for implementation of a wavelet transform that accommodates the JPEG 2000 requirements [10]. The forward discrete wavelet transform is referred to as DWT and its corresponding inverse is denoted by IDWT. The number of stages used for image decomposition is not fixed and is treated as a parameter. The wavelet considered here is the one that maps integers to integers, using the 5/3 biorthogonal filters given in (1).

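$$
\begin{aligned}
h_0 &= \left\{-\tfrac{1}{8},\ \tfrac{1}{4},\ \tfrac{3}{4},\ \tfrac{1}{4},\ -\tfrac{1}{8}\right\}, &
h_1 &= \left\{-\tfrac{1}{2},\ 1,\ -\tfrac{1}{2}\right\},\\
g_0 &= \left\{\tfrac{1}{2},\ 1,\ \tfrac{1}{2}\right\}, &
g_1 &= \left\{-\tfrac{1}{8},\ -\tfrac{1}{4},\ \tfrac{3}{4},\ -\tfrac{1}{4},\ -\tfrac{1}{8}\right\}
\end{aligned}
\tag{1}
$$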

In this case, based on the lifting algorithm [11][12][13][14], the filters for the forward transform can be implemented by using the equations in (2).

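$$
\begin{aligned}
y(2n+1) &= x(2n+1) - \left\lfloor \frac{x(2n) + x(2n+2)}{2} \right\rfloor, && \left\lceil \tfrac{i_0}{2} \right\rceil - 1 \le n < \left\lceil \tfrac{i_1}{2} \right\rceil\\
y(2n) &= x(2n) + \left\lfloor \frac{y(2n-1) + y(2n+1) + 2}{4} \right\rfloor, && \left\lceil \tfrac{i_0}{2} \right\rceil \le n < \left\lceil \tfrac{i_1}{2} \right\rceil
\end{aligned}
\tag{2}
$$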

In this case, x(n) is the input to the single-stage DWT, and yH(m) and yL(m) are respectively the high and low components of the DWT decomposition. Note that the rate of the input is twice that of the outputs. Similar relations can be derived for the inverse DWT as shown in (3).
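The forward lifting steps in (2) translate almost directly into software. The sketch below is a minimal Python reference model, not the hardware design: it assumes i0 = 0 and i1 = len(x), uses whole-sample mirror extension at both ends, and recomputes the odd (high) sample whenever an even (low) sample needs it. The names `mirror` and `fdwt53` are hypothetical.

```python
def mirror(i, n):
    """Whole-sample symmetric (mirror) index into a signal of length n."""
    if n == 1:
        return 0
    period = 2 * (n - 1)
    i %= period                      # Python's % is non-negative here
    return i if i < n else period - i


def fdwt53(x):
    """One stage of the integer 5/3 lifting transform of (2).
    Returns (low, high) = (y(2n) samples, y(2n+1) samples)."""
    n = len(x)

    def xe(i):                       # mirror-extended input
        return x[mirror(i, n)]

    def high(k):                     # y(2k+1), first line of (2)
        return xe(2 * k + 1) - (xe(2 * k) + xe(2 * k + 2)) // 2

    def low(k):                      # y(2k), second line of (2)
        return xe(2 * k) + (high(k - 1) + high(k) + 2) // 4

    return ([low(k) for k in range((n + 1) // 2)],
            [high(k) for k in range(n // 2)])


if __name__ == "__main__":
    print(fdwt53([1, 2, 3, 4, 5, 6, 7, 8]))
```

Python's `//` is a floor division for negative operands as well, so it matches the floor operator in (2) without special handling.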

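$$
\begin{aligned}
x(2n) &= y(2n) - \left\lfloor \frac{y(2n-1) + y(2n+1) + 2}{4} \right\rfloor, && \left\lfloor \tfrac{i_0}{2} \right\rfloor \le n < \left\lfloor \tfrac{i_1}{2} \right\rfloor + 1\\
x(2n+1) &= y(2n+1) + \left\lfloor \frac{x(2n) + x(2n+2)}{2} \right\rfloor, && \left\lfloor \tfrac{i_0}{2} \right\rfloor \le n < \left\lfloor \tfrac{i_1}{2} \right\rfloor
\end{aligned}
\tag{3}
$$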

In this case, the higher-rate signal x(n) is made of its even and odd samples obtained in (3). Note that ⌊·⌋ denotes the floor operation, whose output is the largest integer that is smaller than or equal to the real number that it operates on. In the case of 2's complement binary numbers, the floor operation corresponds to truncation of the least significant bits. In these equations, i0 is the index of the first sample and i1 − 1 is the index of the last sample of the input x(·). In terms of the lowpass and highpass components, we also have the following relationships:

$$
\begin{cases}
y_H(m) = y(2n+1)\\
y_L(m) = y(2n)
\end{cases}
\tag{4}
$$

The signal is extended in both directions using mirror imaging in both the forward and inverse transforms, as needed. This is better explained by the signal flow graphs shown in Fig. 2.
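The inverse transform follows from (2) by undoing the two lifting steps in reverse order: the even samples are recovered first, then the odd ones. The Python sketch below is a software model under the same assumptions as before (i0 = 0, whole-sample mirror extension of the interleaved coefficients); `mirror` and `idwt53` are hypothetical names.

```python
def mirror(i, n):
    """Whole-sample symmetric (mirror) index into a signal of length n."""
    if n == 1:
        return 0
    period = 2 * (n - 1)
    i %= period
    return i if i < n else period - i


def idwt53(low, high):
    """Invert one stage of the integer 5/3 lifting transform."""
    n = len(low) + len(high)
    y = [0] * n
    y[0::2] = low                    # y(2n)   = yL
    y[1::2] = high                   # y(2n+1) = yH

    def ye(i):                       # mirror-extended coefficients
        return y[mirror(i, n)]

    def even(k):                     # x(2k): undo the update step
        return ye(2 * k) - (ye(2 * k - 1) + ye(2 * k + 1) + 2) // 4

    x = [0] * n
    for i in range(0, n, 2):
        x[i] = even(i // 2)
    for i in range(1, n, 2):         # x(2k+1): undo the prediction step
        x[i] = y[i] + (even((i - 1) // 2) + even((i + 1) // 2)) // 2
    return x


if __name__ == "__main__":
    print(idwt53([1, 3, 5, 7], [0, 0, 0, 1]))
```

Because each step subtracts exactly the integer that the forward step added, the reconstruction is bit-exact, which is the property the lossless mode relies on.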

A block diagram of this realization, based on storage of intermediate results for both forward and inverse transforms, is shown in Fig. 3. This implementation is sequential. Another method for this operation is based on a parallel realization in which intermediate results are recalculated as needed. The parallel version of this realization is shown in Fig. 4.


Figure 1: Top-level block diagram for JPEG2000 operation

Figure 3: Implementation of the (5/3) integer-to-integer wavelet transform based on the sequential lifting algorithm.

Figure 4: Realization of the (5/3) integer-to-integer wavelet transform based on the parallel lifting algorithm. In the inverse operation, the output should be arranged so that the samples are properly ordered. That may require repositioning the unit delay block to the upper path.

3 Hardware Implementation of the Lossless Forward DWT

The hardware implementation that we propose in this part is based on the parallel lifting scheme presented in Fig. 4. This realization is more suitable for hardware implementation, mainly due to its timesharing for multiple decomposition levels. Since the lossless implementation should correspond exactly to its software counterpart, we have no choice other than implementing the column processing before the row processing.

When row processing is carried out first and the resulting compressed image is reconstructed by standard software, the result is not perfectly lossless: there is some rounding error (only in the least significant bit). To avoid this and other similar errors, we have decided to use the same implementation approach as the standard software.
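A software model of the column-first ordering helps when checking a hardware design against a reference decoder. The sketch below is an assumption-laden illustration: it applies a one-stage integer 5/3 transform to every column and then to every row of the two resulting sub-images. The names `mirror`, `fdwt53` and `fdwt53_2d` are hypothetical, and the two-letter keys are a convention chosen for this example (first letter: column result, second letter: row result).

```python
def mirror(i, n):
    """Whole-sample symmetric (mirror) index into a signal of length n."""
    if n == 1:
        return 0
    period = 2 * (n - 1)
    i %= period
    return i if i < n else period - i


def fdwt53(x):
    """One-stage integer 5/3 lifting transform with mirror extension."""
    n = len(x)
    xe = lambda i: x[mirror(i, n)]
    high = lambda k: xe(2 * k + 1) - (xe(2 * k) + xe(2 * k + 2)) // 2
    low = lambda k: xe(2 * k) + (high(k - 1) + high(k) + 2) // 4
    return ([low(k) for k in range((n + 1) // 2)],
            [high(k) for k in range(n // 2)])


def fdwt53_2d(img):
    """One 2-D stage: every column is filtered first, then every row."""
    col_out = [fdwt53(list(c)) for c in zip(*img)]           # column pass
    low_v = [list(r) for r in zip(*[lo for lo, _ in col_out])]
    high_v = [list(r) for r in zip(*[hi for _, hi in col_out])]
    bands = {}
    for name, band in (("L", low_v), ("H", high_v)):         # row pass
        rows = [fdwt53(r) for r in band]
        bands[name + "L"] = [lo for lo, _ in rows]
        bands[name + "H"] = [hi for _, hi in rows]
    return bands


if __name__ == "__main__":
    print(fdwt53_2d([[7] * 4 for _ in range(4)]))
```

Since the floor operations make the transform nonlinear, swapping the two passes is not guaranteed to give identical integers, which is why the order has to match the reference software.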

In this case, we assume that the input image is read, or arrives, in raster form, one pixel at a time and row by row. The first operation is to properly collect the input data for the first processing unit. This is done in the FDWT Input Interface block shown in Fig. 5.

The size of each buffer is exactly equal to the row size of the corresponding image. The set of five row buffers is treated as a macro block in which vertical mirror imaging is also incorporated as needed. The mirror imaging is carried out at the top and the bottom of the input sub-image. The timing and control of this block are discussed in the section related to control signals.
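The effect of the five row buffers with vertical mirror imaging can be modeled by listing, for each output row pair, which five source rows must be present. This is a sketch under the assumption of whole-sample mirroring and a window centered on each even row; the function names `mirror` and `row_windows` are hypothetical, and the real MUX addressing is not modeled.

```python
def mirror(i, n):
    """Whole-sample symmetric (mirror) index into n rows."""
    if n == 1:
        return 0
    period = 2 * (n - 1)
    i %= period
    return i if i < n else period - i


def row_windows(num_rows):
    """For every even row c, the indices of the five rows c-2..c+2 after
    mirroring at the top and bottom of the sub-image."""
    return [[mirror(r, num_rows) for r in range(c - 2, c + 3)]
            for c in range(0, num_rows, 2)]


if __name__ == "__main__":
    for window in row_windows(4):
        print(window)
```

For a four-row sub-image the first window reuses rows 2 and 1 above the top edge, and the last window reuses row 2 below the bottom edge.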

The lifting algorithm used in the lossless implementation results in a hardware realization with no multipliers. In this case, the required divisions by 2 and 4 can be realized by binary shifts to the right. With reference to Fig. 4, this implementation requires the processing core shown in Fig. 6.
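A multiplier-free core of this kind can be modeled with additions and arithmetic right shifts only. The sketch below assumes a five-sample window x(2n−2)…x(2n+2) and recomputes both neighboring high samples, in the spirit of the parallel realization; the name `core53` and the port ordering are illustrative and are not taken from Fig. 6.

```python
def core53(x0, x1, x2, x3, x4):
    """Shift-and-add 5/3 core on the window x(2n-2) .. x(2n+2).
    Returns (y(2n), y(2n+1)) = (low, high)."""
    h_prev = x1 - ((x0 + x2) >> 1)            # y(2n-1): divide by 2 as >> 1
    h_next = x3 - ((x2 + x4) >> 1)            # y(2n+1)
    low = x2 + ((h_prev + h_next + 2) >> 2)   # y(2n):   divide by 4 as >> 2
    return low, h_next


if __name__ == "__main__":
    print(core53(1, 2, 3, 4, 5))
    print(core53(0, 0, -1, 0, 0))
```

Python's `>>` on negative integers is an arithmetic shift, so it rounds toward minus infinity just like truncating the low bits of a 2's complement word.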

This core, which can be used for all decomposition levels, always works at the input rate as long as data are coming through it. The way that this core serves all decomposition levels is discussed in later sections. It should be noted that the same core is used for column processing, as discussed later.

In the next stage, the pixels from the rows of the two sub-images are independently collected, in a set of five registers for each sub-image, in order to enable the row processing along with the corresponding mirror imaging. This process is carried out in the row interface represented in Fig. 7. The mirror imaging and data flow are controlled by the addressing of the MUXs and the enable signals of the MUXs and registers.

Figure 2: Signal flow graph for four different possible cases shown for both forward and inverse DWT transforms based on the lifting implementation of the 5/3 biorthogonal filters. In the forward transform all multipliers in the first set are 1/2 and in the second set are 1/4. The reverse is true in the inverse transform.

Figure 5: Column Interface block for FDWT with mirror imaging capabilities.

Figure 6: Processing core for the forward DWT based on the parallel lifting shown in Fig. 4.

Figure 7: Macro block for the Row Interface. This implementation includes the horizontal mirror imaging of the input as needed.

For each decomposition level, two such interfaces are needed, one for the Low sub-image and another for the High sub-image. Two similar processing units, as shown in Fig. 6, are used to carry out the row processing independently for the two sub-images.

In order to combine these macro blocks for multi-level decomposition, the inputs and outputs must be properly routed. These operations need input and output interface macro blocks as well as two Row/Column Interface macro blocks. The interface macro blocks are shown in Fig. 8.

In the case of the Input Interface, the rate of incoming data to the Column Interface blocks and the corresponding MUX depends on the row size of the corresponding image. Similarly, in the Column/Row Interface, the data rate to the Row Interface blocks and the corresponding MUX depends on the row size of each corresponding sub-image. Addressing and control signals for these interface macro blocks are discussed in the section on signal flow and control signals.

Figure 8: (a) Input Interface, (b) Row/Column Interface, and (c) Output Interface macro blocks used for multi-level forward DWT decomposition. The FDWT Row and Column Interface blocks are defined as shown in Fig. 5 and Fig. 7.

Figure 10: Initial interface before row processing in the inverse DWT.

The overall forward DWT implementation, in terms of the defined macro blocks, is shown in Fig. 9. In this implementation, the size of the memory used as buffers in all Column Interface blocks is about 10 times the row size of the input image for any number of decomposition levels. Proper memory design may allow further optimization of the Column Interface by eliminating the need for the MUX in that macro block.
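The factor of about 10 can be checked with a short calculation. Assume, as a simplification, that every decomposition level uses five row buffers and that the row length halves from one level to the next, so that level ℓ (ℓ = 0, …, L − 1) needs buffers of length W/2^ℓ, where W is the row size of the input image and L is the number of levels (W, L and M_L are symbols introduced here for illustration). The total buffer memory is then a geometric sum:

$$
M_L \;=\; \sum_{\ell=0}^{L-1} \frac{5W}{2^{\ell}} \;=\; 10\,W\left(1 - 2^{-L}\right) \;<\; 10\,W .
$$

For example, L = 3 gives M_3 = 8.75 W, and the bound 10 W is approached but never exceeded as L grows.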

4 Hardware Implementation of the Lossless Inverse DWT

The inverse operation proceeds in the reverse order of the forward transform. The procedure begins with two sets of row processing, initially going through the row interface for mirror imaging as well as proper data flow alignment. This interface is shown in Fig. 10.

After completion of the row processing, the column processing starts. The first block in this stage is the Column Interface block. The structure of the row buffers in the Column Interface is slightly different from that of the forward DWT. This structure, for a given decomposition level, is shown in Fig. 11.

Figure 9: Overall block diagram of the forward DWT based on realization of the parallel lifting algorithm (integer-to-integer 5/3 biorthogonal wavelet transform).

Similar to the forward transform, the total memory used to handle all decomposition levels is about 10 times the row size of the original image. The vertical mirror imaging is carried out at the top and the bottom of the input sub-images. The timing and control of this block are discussed in the section related to control signals.

The core for the lossless column or row processing of the inverse transform is designed to carry out the inverse operation shown in Fig. 4. This core works at full rate to service all decomposition levels. The way that data are provided and aligned for processing by this core is controlled by the corresponding interfaces designed for column processing and row processing. The structure of this implementation is shown in Fig. 12.

In the inverse transform, the input interface is slightly more involved. In this case, there are two sets of DWT coefficients that are almost simultaneously synthesized during the row processing. The result of this processing is synthesized to generate the higher-resolution image. This input interface is shown in Fig. 13.

In this case, the top sub-images consist of low and high DWT coefficients and need to be synthesized in the row direction. It is also assumed that the input sequence is loaded in such a way that for any low coefficient the corresponding high coefficient comes into the interface and is properly sent to its corresponding register. Therefore, the DWT coefficients are not read from memory in raster order; they are read in the order needed for their synthesis.

The interface between row processing and column processing consists of a set of Column Interfaces. This interface is referred to as the Row/Column Interface and is shown in Fig. 14. The overall inverse transform, in terms of the defined macro blocks, is shown in Fig. 15.

Figure 11: Column Interface macro block used for the inverse DWT. In this case, two row buffers are assigned to the Lowpass component of the image and three row buffers are assigned to the corresponding Highpass component of the image.

Figure 12: Macro block for column or row processing in the case of the inverse DWT. In this case the final output sequence is prepared by the final switch to properly order the odd and even samples.

Figure 13: Input interface macro block for the inverse DWT. The input properly loads two different sets of IDWT row interfaces. The number of row interfaces in each segment is the same as the number of decomposition levels.

Figure 14: The inverse DWT Row/Column Interface consists of a set of IDWT Column Interfaces. The number of Column Interfaces is equal to the number of decomposition levels.

Figure 15: Overall block diagram of the inverse DWT based on realization of the parallel lifting algorithm (integer-to-integer 5/3 biorthogonal wavelet transform).

5 Discrete Wavelet Transform (DWT) for Lossy JPEG 2000

In this part, we present a system-level design for implementation of the lossy wavelet transform that accommodates the JPEG 2000 requirements. As before, the number of stages used for image decomposition is not fixed and is treated as a parameter. The wavelet is restricted to a single type, the 9/7 biorthogonal wavelet used as the standard for the lossy DWT in JPEG 2000. In this work we present and use two different formulations of the discrete wavelet transform. One formulation is based on the conventional FIR implementation, or convolution, and the other is based on the lifting algorithm. The conventional approach is based on the filter coefficients provided in Table 1. The lifting implementation is based on the equations provided in (5).

Table 1: Standard filter coefficients for the (9/7) biorthogonal DWT.

hc0              hc1              gc0              gc1
 0.0378284555    -0.0645388826    -0.0645388826    -0.0378284555
-0.023849465      0.0406894176    -0.0406894176    -0.023849465
-0.1106244044     0.418092273      0.418092273      0.1106244044
 0.3774028556    -0.7884856164     0.7884856164     0.3774028556
 0.852698679      0.418092273      0.418092273     -0.852698679
 0.3774028556     0.0406894176    -0.0406894176     0.3774028556
-0.1106244044    -0.0645388826    -0.0645388826     0.1106244044
-0.023849465                                       -0.023849465
 0.0378284555                                      -0.0378284555

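$$
\begin{aligned}
\text{Step 1:}\quad & y_{2n+1} \leftarrow x_{2n+1} + \alpha\,(x_{2n} + x_{2n+2}), && \left\lceil \tfrac{i_0}{2} \right\rceil - 2 \le n < \left\lceil \tfrac{i_1}{2} \right\rceil + 1\\
\text{Step 2:}\quad & y_{2n} \leftarrow x_{2n} + \beta\,(y_{2n-1} + y_{2n+1}), && \left\lceil \tfrac{i_0}{2} \right\rceil - 1 \le n < \left\lceil \tfrac{i_1}{2} \right\rceil + 1\\
\text{Step 3:}\quad & y_{2n+1} \leftarrow y_{2n+1} + \gamma\,(y_{2n} + y_{2n+2}), && \left\lceil \tfrac{i_0}{2} \right\rceil - 1 \le n < \left\lceil \tfrac{i_1}{2} \right\rceil\\
\text{Step 4:}\quad & y_{2n} \leftarrow y_{2n} + \delta\,(y_{2n-1} + y_{2n+1}), && \left\lceil \tfrac{i_0}{2} \right\rceil \le n < \left\lceil \tfrac{i_1}{2} \right\rceil\\
\text{Step 5:}\quad & y_{2n+1} \leftarrow -K\,y_{2n+1}, && \left\lceil \tfrac{i_0}{2} \right\rceil - 1 \le n < \left\lceil \tfrac{i_1}{2} \right\rceil\\
\text{Step 6:}\quad & y_{2n} \leftarrow (1/K)\,y_{2n}, && \left\lceil \tfrac{i_0}{2} \right\rceil \le n < \left\lceil \tfrac{i_1}{2} \right\rceil
\end{aligned}
\tag{5}
$$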

Approximate values of the parameters used in (5) are as follows:

$$
\begin{aligned}
\alpha &= -1.586134342059924\\
\beta &= -0.052980118572961\\
\gamma &= 0.882911075530934\\
\delta &= 0.443506852043971\\
K &= 1.230174104914001
\end{aligned}
\tag{6}
$$

Index i0 represents the first sample and index i1 − 1 denotes the last sample of the input xn.
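The six steps of (5) can be modeled in floating point as shown below. This is a sketch rather than the hardware design: it assumes i0 = 0 and i1 = len(x), extends the input by four mirrored samples on each side (so that every retained output is computed from valid neighbors), and performs the lifting steps in place. The names `mirror`, `_lift`, `fdwt97` and the constant `PAD` are hypothetical.

```python
ALPHA = -1.586134342059924
BETA = -0.052980118572961
GAMMA = 0.882911075530934
DELTA = 0.443506852043971
K = 1.230174104914001
PAD = 4  # mirrored samples per side; even, so sample parity is preserved


def mirror(i, n):
    """Whole-sample symmetric (mirror) index into a signal of length n."""
    if n == 1:
        return 0
    period = 2 * (n - 1)
    i %= period
    return i if i < n else period - i


def _lift(e, start, coeff):
    """Add coeff * (left + right neighbor) to every second sample from `start`."""
    for j in range(start, len(e) - 1, 2):
        e[j] += coeff * (e[j - 1] + e[j + 1])


def fdwt97(x):
    """One stage of the 9/7 lifting transform of (5). Returns (low, high)."""
    n = len(x)
    e = [float(x[mirror(i, n)]) for i in range(-PAD, n + PAD)]
    _lift(e, 1, ALPHA)   # Step 1: odd samples
    _lift(e, 2, BETA)    # Step 2: even samples
    _lift(e, 1, GAMMA)   # Step 3: odd samples
    _lift(e, 2, DELTA)   # Step 4: even samples
    y = e[PAD:PAD + n]   # discard the extension
    low = [v / K for v in y[0::2]]     # Step 6
    high = [-K * v for v in y[1::2]]   # Step 5
    return low, high


if __name__ == "__main__":
    print(fdwt97([3.0] * 8))
```

A constant input is a convenient sanity check: the low band should reproduce the constant and the high band should vanish up to rounding.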

If there is a need to implement the DWT filters in conventional form (using convolution or FIR tap-delays), the filter coefficients are modified to correspond exactly to what the lifting approach provides. This conversion is given in (7) and the resulting coefficients are listed in Table 2.

Table 2: Filter coefficients for the (9/7) biorthogonal DWT used in JPEG2000.

h0                h1                g0                g1
 0.02674875741     0.09127176311    -0.09127176311     0.02674875741
-0.01686411844    -0.05754352622    -0.05754352622     0.01686411844
-0.07822326652    -0.59127176311     0.59127176311    -0.07822326652
 0.26686411844     1.11508705245     1.11508705245    -0.26686411844
 0.60294901823    -0.59127176311     0.59127176311     0.60294901823
 0.26686411844    -0.05754352622    -0.05754352622    -0.26686411844
-0.07822326652     0.09127176311    -0.09127176311    -0.07822326652
-0.01686411844                                         0.01686411844
 0.02674875741                                         0.02674875741

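$$
\begin{aligned}
h_0 &= h_{c0} / \sqrt{2}\\
h_1 &= -h_{c1} \cdot \sqrt{2}\\
g_0 &= g_{c0} \cdot \sqrt{2}\\
g_1 &= -g_{c1} / \sqrt{2}
\end{aligned}
\tag{7}
$$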
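The conversion in (7) is easy to check numerically against the two tables. The sketch below stores only the first half of each symmetric filter (outer tap first, center tap last), with values copied from Tables 1 and 2; the list names and the helper `convert` are hypothetical, and the tolerance reflects the number of digits printed in the tables.

```python
import math

# First half of each symmetric filter from Table 1 (outer tap first).
HC0 = [0.0378284555, -0.023849465, -0.1106244044, 0.3774028556, 0.852698679]
HC1 = [-0.0645388826, 0.0406894176, 0.418092273, -0.7884856164]
GC0 = [-0.0645388826, -0.0406894176, 0.418092273, 0.7884856164]
GC1 = [-0.0378284555, -0.023849465, 0.1106244044, 0.3774028556, -0.852698679]

# Corresponding entries of Table 2.
H0 = [0.02674875741, -0.01686411844, -0.07822326652, 0.26686411844, 0.60294901823]
H1 = [0.09127176311, -0.05754352622, -0.59127176311, 1.11508705245]
G0 = [-0.09127176311, -0.05754352622, 0.59127176311, 1.11508705245]
G1 = [0.02674875741, 0.01686411844, -0.07822326652, -0.26686411844, 0.60294901823]


def convert(hc0, hc1, gc0, gc1):
    """Apply the scaling and sign changes of (7)."""
    r2 = math.sqrt(2.0)
    return ([c / r2 for c in hc0], [-c * r2 for c in hc1],
            [c * r2 for c in gc0], [-c / r2 for c in gc1])


def max_error(a, b):
    return max(abs(u - v) for u, v in zip(a, b))


if __name__ == "__main__":
    for got, want in zip(convert(HC0, HC1, GC0, GC1), (H0, H1, G0, G1)):
        print(max_error(got, want))
```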

In this case, xn is the input to the single-stage DWT. The high and low components of the DWT decomposition, represented by yH(m) and yL(m) respectively, are given by

$$
\begin{cases}
y_H(m) = y_{2n+1}\\
y_L(m) = y_{2n}
\end{cases}
\tag{8}
$$

Note that the rate of the input is twice that of the outputs. Similar relations can be derived for the inverse DWT as shown in (9). In this case the higher-rate signal xn is made of its even and odd samples obtained in (9).

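$$
\begin{aligned}
\text{Step 1:}\quad & x_{2n} \leftarrow K\,y_{2n}, && \left\lfloor \tfrac{i_0}{2} \right\rfloor - 1 \le n < \left\lfloor \tfrac{i_1}{2} \right\rfloor + 2\\
\text{Step 2:}\quad & x_{2n+1} \leftarrow -(1/K)\,y_{2n+1}, && \left\lfloor \tfrac{i_0}{2} \right\rfloor - 1 \le n < \left\lfloor \tfrac{i_1}{2} \right\rfloor + 1\\
\text{Step 3:}\quad & x_{2n} \leftarrow x_{2n} - \delta\,(x_{2n-1} + x_{2n+1}), && \left\lfloor \tfrac{i_0}{2} \right\rfloor - 1 \le n < \left\lfloor \tfrac{i_1}{2} \right\rfloor + 2\\
\text{Step 4:}\quad & x_{2n+1} \leftarrow x_{2n+1} - \gamma\,(x_{2n} + x_{2n+2}), && \left\lfloor \tfrac{i_0}{2} \right\rfloor - 1 \le n < \left\lfloor \tfrac{i_1}{2} \right\rfloor + 1\\
\text{Step 5:}\quad & x_{2n} \leftarrow x_{2n} - \beta\,(x_{2n-1} + x_{2n+1}), && \left\lfloor \tfrac{i_0}{2} \right\rfloor \le n < \left\lfloor \tfrac{i_1}{2} \right\rfloor + 1\\
\text{Step 6:}\quad & x_{2n+1} \leftarrow x_{2n+1} - \alpha\,(x_{2n} + x_{2n+2}), && \left\lfloor \tfrac{i_0}{2} \right\rfloor \le n < \left\lfloor \tfrac{i_1}{2} \right\rfloor
\end{aligned}
\tag{9}
$$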
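The inverse steps in (9) can be modeled in the same way as the forward ones. The Python sketch below is a floating-point illustration under stated assumptions: i0 = 0, the interleaved coefficients are extended by four mirrored samples per side, and the lifting steps are undone in place. The names `mirror`, `_unlift`, `idwt97` and `PAD` are hypothetical.

```python
ALPHA = -1.586134342059924
BETA = -0.052980118572961
GAMMA = 0.882911075530934
DELTA = 0.443506852043971
K = 1.230174104914001
PAD = 4  # mirrored samples per side; even, so sample parity is preserved


def mirror(i, n):
    """Whole-sample symmetric (mirror) index into a signal of length n."""
    if n == 1:
        return 0
    period = 2 * (n - 1)
    i %= period
    return i if i < n else period - i


def _unlift(e, start, coeff):
    """Subtract coeff * (left + right neighbor) from every second sample."""
    for j in range(start, len(e) - 1, 2):
        e[j] -= coeff * (e[j - 1] + e[j + 1])


def idwt97(low, high):
    """Invert one stage of the 9/7 lifting transform, following (9)."""
    n = len(low) + len(high)
    y = [0.0] * n
    y[0::2] = [K * v for v in low]      # Step 1
    y[1::2] = [-v / K for v in high]    # Step 2
    e = [y[mirror(i, n)] for i in range(-PAD, n + PAD)]
    _unlift(e, 2, DELTA)   # Step 3: even samples
    _unlift(e, 1, GAMMA)   # Step 4: odd samples
    _unlift(e, 2, BETA)    # Step 5: even samples
    _unlift(e, 1, ALPHA)   # Step 6: odd samples
    return e[PAD:PAD + n]


if __name__ == "__main__":
    print(idwt97([3.0] * 4, [0.0] * 4))
```

Feeding a constant low band with an all-zero high band should return the constant signal, up to floating-point rounding.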

Block diagrams of the conventional FIR filtering realization for both the forward and inverse DWT are shown in Fig. 16. The same operation is realized through the lifting algorithm as shown in Fig. 17. This figure depicts the realization for both the forward and inverse DWT. In this paper, the conventional FIR realization is referred to as the convolution approach and the implementation based on the lifting algorithm is referred to as the lifting approach.

Efficient implementation of the lossy wavelet transform, along with its corresponding macro blocks, is discussed in the following sections. First, we present the forward transform and then we discuss the inverse transform. The control part of this implementation is presented at the end.

6 Hardware Implementation of the Lossy Forward DWT

Hardware realization of the two-dimensional wavelet transform requires enough memory to store the intermediate results right after row processing and before column processing. However, if a proper row-buffering scheme is used, column processing can be carried out first, and row processing can follow it without the need to hold the intermediate results. An advantage of this approach is that the input row buffer can be designed independently and placed in a separate VLSI chip if needed. This is more desirable for low-density FPGAs. When using high-density FPGAs, the same design, along with the input buffer, can be put into a single chip, and there is no need to modify the design presented in this work. In terms of memory requirement, both approaches require the same amount of memory.

Figure 16: Conventional realization using polyphase implementation.

Figure 17: Implementation of the (9/7) biorthogonal wavelet transform based on the lifting algorithm.

For efficient implementation of the lossy forward transform, we recommend applying the column transform first and then transforming each row, using an input row buffer. In this case the buffer holds the original data and does not encounter bit growth as drastically as it would if the rows were processed first. For mirror imaging of the data in all directions, we have found that the approach best suited to column processing is the conventional approach, since it keeps the number of buffers at its minimum. The lifting approach can be efficient only if mirror imaging is done outside the chip. In this report we present only the more efficient conventional approach, for both column processing and row processing.

Hardware implementation based on the conventional approach requires a set of nine input buffers properly configured for convolution filtering. The structure of the input buffer is shown in Fig. 18. In this structure, mirror imaging is carried out with proper control of the MUX units in the design. Mirror imaging is applied at both the top and the bottom of the input image. Depending on the indexes of the first row (top) and the last row (bottom), mirror imaging is carried out for four or three rows. The details of this structure and the way its control is organized are discussed in the section on the control unit.

The processing core that comes right after the row buffers consists of multipliers and adders, as shown in Fig. 19. This core, which can be used for all decomposition levels, always works at the input rate as long as data are coming through it in real time. The way that this core serves all decomposition levels is discussed in the later section on control.
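A software counterpart of the convolution approach is a pair of FIR filters evaluated only at the retained output positions, which is what a polyphase structure achieves. The sketch below uses the Table 2 analysis filters and makes two assumptions that are not stated in the text: the 9-tap low-pass filter is centered on even samples and the 7-tap high-pass filter on odd samples, with whole-sample mirror extension. The names `mirror`, `fir_at` and `fdwt97_conv` are hypothetical.

```python
# Analysis filters from Table 2 (full symmetric tap lists).
H0 = [0.02674875741, -0.01686411844, -0.07822326652, 0.26686411844,
      0.60294901823,
      0.26686411844, -0.07822326652, -0.01686411844, 0.02674875741]
H1 = [0.09127176311, -0.05754352622, -0.59127176311,
      1.11508705245,
      -0.59127176311, -0.05754352622, 0.09127176311]


def mirror(i, n):
    """Whole-sample symmetric (mirror) index into a signal of length n."""
    if n == 1:
        return 0
    period = 2 * (n - 1)
    i %= period
    return i if i < n else period - i


def fir_at(x, center, taps):
    """Multiply-accumulate of `taps` centered on sample `center` of x."""
    half = len(taps) // 2
    n = len(x)
    return sum(t * x[mirror(center + k - half, n)] for k, t in enumerate(taps))


def fdwt97_conv(x):
    """Decimated convolution: low band at even centers, high band at odd ones
    (assumed alignment)."""
    n = len(x)
    low = [fir_at(x, 2 * m, H0) for m in range((n + 1) // 2)]
    high = [fir_at(x, 2 * m + 1, H1) for m in range(n // 2)]
    return low, high


if __name__ == "__main__":
    print(fdwt97_conv([3.0] * 8))
```

Each low output needs nine input samples, which is consistent with the nine input buffers used for column filtering in the conventional approach.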

Row processing can start immediately on the output of the processed columns. The processing core in this case is exactly the same as that of column processing. The interface between the output of the processed columns and the input of the row processing is shown in Fig. 20. In this case, the realization is similar to that of the row buffers, but with a register in place of each buffer.

In order to combine these macro blocks for multi-level decomposition, there is a need for proper routing of the inputs and outputs. These operations are carried out by input and output interface macro blocks as shown in Fig. 21.
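To illustrate what has to be routed at each level, the short sketch below lists the size of the LL component that is passed on to the next decomposition level. It assumes a tile whose origin has even coordinates, so that the lowpass branch receives the larger half when a dimension is odd; the function name is hypothetical.

```python
def ll_sizes(width, height, levels):
    """Width and height of the LL component after each decomposition level."""
    sizes = []
    w, h = width, height
    for _ in range(levels):
        w = (w + 1) // 2   # lowpass keeps the even-indexed columns
        h = (h + 1) // 2   # lowpass keeps the even-indexed rows
        sizes.append((w, h))
    return sizes


tile_levels = ll_sizes(64, 64, 3)
```

For a 64 × 64 tile, each level hands a component with a quarter of the samples to the next one, which indicates how much data the input interface has to route per level.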

The overall forward DWT implementation, in terms of the defined macro blocks, is shown in Fig. 22.

Figure 18: Detailed block diagram of the input buffer for forward DWT. The corresponding equivalent macro block is also shown. With proper control of MUX units, the mirror imaging is carried out.

WSEAS TRANSACTIONS on CIRCUITS and SYSTEMS Ali M. Reza

E-ISSN: 2224-266X 112 Issue 4, Volume 12, April 2013


Figure 22: Overall block diagram of the forward DWT based on realization of the conventional convolution algorithm for (9/7) biorthogonal wavelet transform.

Figure 19: The main processing macro block for the lossy forward DWT.

7 Hardware Implementation of the Lossy Inverse DWT

The inverse operation starts with a set of row buffers. The structure of the row buffers in this case is a bit different from that of the forward DWT. This structure, for a given number of decomposition levels, is shown in Fig. 23.

Similarly, the core for the lossy column or row processing in the inverse transform is designed to carry out the corresponding operation shown in Fig. 16. The structure of this implementation is shown in Fig. 24. The interface between column and row processing is similarly designed for the inverse transform, as shown in Fig. 25.
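A software view of the inverse column or row core is sketched below. It is an illustration under assumptions rather than the circuit of Fig. 24: the tap values are the commonly tabulated 9/7 synthesis coefficients, the lowpass and highpass inputs are interleaved at even and odd positions, and the interleaved sequence is extended by whole-sample symmetric reflection before filtering. Producing the outputs position by position yields the even and odd samples in their final order. All names are hypothetical.

```python
# Assumed 9/7 synthesis taps: center tap first, then one side of the symmetric filter.
SYN_LOW = [1.115087052456994, 0.5912717631142470, -0.05754352622849957,
           -0.09127176311424948]
SYN_HIGH = [0.6029490182363579, -0.2668641184428723, -0.07822326652898785,
            0.01686411844287495, 0.02674875741080976]


def reflect(i, n):
    """Whole-sample symmetric reflection of index i onto 0..n-1."""
    if n == 1:
        return 0
    period = 2 * (n - 1)
    i %= period
    return i if i < n else period - i


def tap(half_taps, offset):
    """Tap of a symmetric filter at a signed offset; zero outside its support."""
    k = abs(offset)
    return half_taps[k] if k < len(half_taps) else 0.0


def idwt97_1d(low, high):
    """Single-level 1D synthesis from lowpass (even) and highpass (odd) samples."""
    n = len(low) + len(high)
    if len(low) != (n + 1) // 2:
        raise ValueError("lowpass branch must hold the even positions")
    y = [0.0] * n
    y[0::2] = low    # interleave: lowpass samples at even positions
    y[1::2] = high   # highpass samples at odd positions
    out = []
    for pos in range(n):
        acc = 0.0
        for m in range(pos - 4, pos + 5):
            taps = SYN_LOW if m % 2 == 0 else SYN_HIGH   # parity selects the filter
            acc += tap(taps, pos - m) * y[reflect(m, n)]
        out.append(acc)
    return out


flat = idwt97_1d([3.0] * 8, [0.0] * 8)   # pure lowpass input rebuilds a constant signal
```

With these taps, a constant lowpass input and a zero highpass input reconstruct a constant signal, which gives a quick sanity check on the coefficient scaling.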

In the inverse transform, the input interface is somewhat more involved. In this case, there are two sets of DWT coefficients that are almost simultaneously synthesized when doing the column processing. The result of this processing is synthesized to generate the higher resolution image. This input interface is shown in Fig. 26. The overall inverse transform, in terms of the defined macro blocks, is shown in Fig. 27.

8 Conclusion

We have presented a system level design for hardware realization of the JPEG 2000 compression and decompression standard. We have considered both lossless and lossy compression and dealt with each case independently. In the implementation of the discrete wavelet transform, we have considered realization based on the lifting algorithm as well as the convolution and polyphase methods. It was shown that implementation based on


Figure 26: Input interface macro block for inverse DWT. The input raster is properly loaded into two different sets of IDWT row buffers. The number of row buffers in each segment is the same as the decomposition level.

Figure 27: Overall block diagram of the inverse DWT based on realization of the convolution algorithm (9/7 biorthogonal wavelet transform).


Figure 20: The interface macro block between column and row processing. Horizontal mirror imaging is conducted on this core as needed. For each decomposition level, two of these blocks are needed, one for the Lowpass and the other for the Highpass outputs.

Figure 21: (a) Input and (b) output interface macro blocks used for multi-level forward DWT decomposition. In the FDWT input interface the row buffers are defined as shown in Fig. 18.

lifting algorithm is not always advantageous in terms of hardware. We have presented each system level block and provided enough details for hardware implementation.

This implementation is mainly intended for video processing where each frame is required to be processed independently. This might have applications in medical imaging systems as well as digital cinema.

References:

[1] Marpe, D.; Schwarz, H.; and Wiegand, T., “Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 7, 2003, pp. 620-636.

[2] Reza, Ali M. and Turney, Robert D., “FPGA Implementation of 2D Wavelet Transform,” Proceedings of the Thirty Third Annual Asilomar Conference on Signals, Systems, and Computers. October 1999, Pacific Grove, California.

[3] Reza, Ali M., “System level design of the coding and modeling of the adaptive arithmetic coding used in the JPEG 2000,” International Journal


Figure 23: Row buffer macro block used for inverse DWT. In this case, two row buffers are assigned to the Lowpass component of the image and three row buffers are assigned to the Highpass component.

Figure 24: Macro block for column or row processing in the case of the inverse DWT. In this case the final output sequence is prepared by the final switch to properly order the odd and even samples.

of Information Engineering (IJIE), Vol. 2, No. 3, September 2012, pp. 86-96.

[4] Reza, Ali M., “System Level Design of the Adaptive Arithmetic Encoding Used in the JPEG 2000 Standard,” International Journal on Electrical Engineering and Informatics, Vol. 5, No. 1, March 2013.

[5] Reza, Ali M., “System Level Design of the Adaptive Arithmetic Decoding Used in the JPEG 2000,” Submitted to International Journal on Electrical Engineering and Informatics.

[6] Coding of Still Pictures: JBIG & JPEG, “JPEG 2000 image coding system,” JPEG 2000 Final Committee Draft Version 1.0, ISO/IEC JTC1/SC29 WG1, JPEG 2000 Editor Martin Boliek, Co-editors Charilaos Christopoulos and Eric Majani, March 16, 2000.

[7] Ahmad, A.; Loo, K. K.; and Cosmas, J., “VLSI Architecture Design Approaches for Real-Time Video Processing.” WSEAS Transactions on Circuits and Systems, Vol. 7, No. 8, August 2008, pp. 855-868.

[8] Atitallah, A. Ben; Loukil, H.; Kadionik, P.; and Masmoudi, N., “Advanced Design of TQ/IQT Component for H.264/AVC Based on SoPC Validation.” WSEAS Transactions on Circuits and Systems, Vol. 11, No. 7, July 2012, pp. 211-223.


Figure 25: Interface between column and row processing in the inverse DWT.

[9] Ho, Huong, “A Hardware Architecture for Motion Compensated Video Frame Rate Up-Conversion.” WSEAS Transactions on Circuits and Systems, Vol. 11, No. 2, February 2012, pp. 43-55.

[10] ISO/IEC JTC 1/SC 29/WG 1 N1890, Date: 25 September 2000, ISO/IEC JTC 1/SC 29/WG 1 (ITU-T SG8) Coding of Still Pictures, JBIG (Joint Bi-level Image Experts Group), JPEG (Joint Photographic Experts Group), TITLE: JPEG 2000 Part I Final Draft International Standard (corrected and formatted), SOURCE: ISO/IEC JTC1/SC29 WG1, JPEG 2000 Editor Martin Boliek, Co-editors Charilaos Christopoulos and Eric Majani, PROJECT: 1.29.15444 (JPEG 2000).

[11] Sweldens, W., Construction and Applications of Wavelets in Numerical Analysis. Ph.D. Thesis, Department of Computer Science, Katholieke Universiteit Leuven, Belgium, 1994.

[12] Sweldens, W., “The lifting scheme: A custom-design construction of biorthogonal wavelets,” Appl. Comput. Harmon. Anal., 3(2):186-200, 1996.

[13] Daubechies, I. and Sweldens, W., “Factoring wavelet transforms into lifting steps,” Technical report, Bell Laboratories, Lucent Technologies, 1996.

[14] Calderbank, R., Daubechies, I., Sweldens, W., and Yeo, B.-L., “Wavelet transforms that map integers to integers,” Appl. Comput. Harmon. Anal., 3:127-153, 1996.

DISCLAIMER AND NOTE

The views expressed herein are those of the author and are not to be construed as official or reflecting the views of the Commandant, the U.S. Coast Guard, the Department of Homeland Security, or any agency of the U.S. Government.
