New algorithm for r-dimensional DCT-II

Y.Zeng, G.Bi and A.R.Leyman

Abstract: The paper shows that the type-II r-dimensional discrete cosine transform (rD-DCT-II) of size $q^{l_1} \times q^{l_2} \times \cdots \times q^{l_r}$, where $r > 1$ and $q$ is an odd prime number, can be converted into a series of one-dimensional reduced DCT-IIs by using the polynomial transform. The number of multiplications for computing an rD-DCT-II is significantly reduced compared to that needed by the row-column method. The total number of arithmetic operations (additions plus multiplications) needed by the proposed algorithm is also reduced substantially. In addition to the capability of dealing with different dimensional sizes, the proposed algorithm also has a simple computational structure because it requires only the 1D-DCT-II and the polynomial transform.

1 Introduction

The discrete cosine transform has a wide range of applications such as data compression, feature extraction, image reconstruction and multi-frame detection. For example, 3D-DCT coding is an alternative approach to the motion compensation transform coding technique used in video coding standards [1, 2]. Multi-dimensional transforms are also used in the areas of computer vision, high definition television (HDTV) and vision telephones to process or analyse motion images (i.e. multi-frame detection) [1, 2]. For example, the four-dimensional DCT is generally required to process three-dimensional motion images. Although modern technologies have increased computing speed dramatically over recent years, it remains difficult to process multi-dimensional signals at the throughput required by most practical applications. A good fast algorithm is extremely important to cope with the prohibitive computational complexity of the multi-dimensional transform.

The multi-dimensional DCT (MD-DCT) can be computed by directly using the well known row-column method. Although it is far from the best in terms of computational efficiency, the row-column method is widely used for processing signals of more than two dimensions. Recently, several fast algorithms other than the row-column method have been proposed for 2D-DCT or MD-DCT [3-9]. Among them, the algorithms in [3, 4, 6-8], which were reported to offer savings on the required number of operations, can only deal with 2D or MD DCT-II of the same dimensional sizes, i.e. $N \times N \times \cdots \times N$, where $N$ is a power of two. The particular requirements on dimensional sizes may not be able to support applications ([2], for example) that require various sizes for different dimensions. The fast algorithm in [5] uses a recursive technique to decompose the 2D-DCT of size $p^l \times p^l$, where $p$ is a prime number, into cyclic convolution (CC), skew-cyclic convolution (SCC) and matrix multiplications. This algorithm greatly reduces the number of multiplications compared to those needed by the row-column method. Because of the irregular computational structure, however, the computation of CC and SCC generally needs a large number of additions. Furthermore, such fast algorithms are impractical when $p$ is a small odd number and $l$ is large.

This paper presents a polynomial transform algorithm for rD-DCT-II whose size is $q^{l_1} \times q^{l_2} \times \cdots \times q^{l_r}$, where $r > 1$ and $q$ is an odd prime number. Advantages such as reduced computational and structural complexities can be achieved.

2 Computation of 2D-DCT-II

The 2D-DCT-II of the sequence $x(n, m)$ ($n = 0, 1, \ldots, N-1$; $m = 0, 1, \ldots, M-1$) is defined by

$$X(k, l) = \sum_{n=0}^{N-1} \sum_{m=0}^{M-1} x(n, m) \cos\frac{\pi(2n+1)k}{2N} \cos\frac{\pi(2m+1)l}{2M} \tag{1}$$

$$k = 0, 1, \ldots, N-1; \quad l = 0, 1, \ldots, M-1$$

where the constant scaling factors in the DCT definition are ignored for simplicity. It is assumed that the sizes of the two dimensions are $M = q^t$ and $N = M/q^J$, where $t > 0$, $J \ge 0$ and $q$ is an odd prime number. Similarly to the methods used in [4, 8], the input sequence $x(n, m)$ is converted into

$$\begin{aligned}
y(n, m) &= x(2n, 2m), & n &= 0, 1, \ldots, \tfrac{N-1}{2}; & m &= 0, 1, \ldots, \tfrac{M-1}{2} \\
y(N-1-n, m) &= x(2n+1, 2m), & n &= 0, 1, \ldots, \tfrac{N-1}{2}-1; & m &= 0, 1, \ldots, \tfrac{M-1}{2} \\
y(n, M-1-m) &= x(2n, 2m+1), & n &= 0, 1, \ldots, \tfrac{N-1}{2}; & m &= 0, 1, \ldots, \tfrac{M-1}{2}-1 \\
y(N-1-n, M-1-m) &= x(2n+1, 2m+1), & n &= 0, 1, \ldots, \tfrac{N-1}{2}-1; & m &= 0, 1, \ldots, \tfrac{M-1}{2}-1
\end{aligned} \tag{2}$$

so that eqn. 1 can be expressed by

$$X(k, l) = \sum_{n=0}^{N-1} \sum_{m=0}^{M-1} y(n, m) \cos\frac{\pi(4n+1)k}{2N} \cos\frac{\pi(4m+1)l}{2M} \tag{3}$$

$$k = 0, 1, \ldots, N-1; \quad l = 0, 1, \ldots, M-1$$

© IEE, 2001. IEE Proceedings online no. 20010239. DOI: 10.1049/ip-vis:20010239. Paper first received 18th February and in revised form 4th September 2000. The authors are with the School of Electrical and Electronic Engineering, Nanyang Technological University, Nanyang Avenue, Singapore 639798. Y. Zeng is also with the National University of Defense Technology, People's Republic of China.

IEE Proc.-Vis. Image Signal Process., Vol. 148, No. 1, February 2001

Let $\lambda(q)$ represent the unique non-negative integer which is smaller than $q$ and satisfies $4\lambda(q) + 1 \equiv 0 \bmod q$. The 2D-DCT-II defined in eqn. 3 can be decomposed into two parts, that is

$$X(k, l) = X^0(k, l) + X^1(k, l) \tag{4}$$

where

$$X^0(k, l) = \sum_{n=0}^{N-1} \sum_{\substack{m=0 \\ m \equiv \lambda(q) \bmod q}}^{M-1} y(n, m) \cos\frac{\pi(4n+1)k}{2N} \cos\frac{\pi(4m+1)l}{2M} \tag{5}$$

and

$$X^1(k, l) = \sum_{n=0}^{N-1} \sum_{\substack{m=0 \\ m \not\equiv \lambda(q) \bmod q}}^{M-1} y(n, m) \cos\frac{\pi(4n+1)k}{2N} \cos\frac{\pi(4m+1)l}{2M} \tag{6}$$

with $k = 0, 1, \ldots, N-1$ and $l = 0, 1, \ldots, M-1$ in both eqns. 5 and 6.

The computation defined in eqn. 5 can be expressed as a 2D-DCT-II of size $N \times M/q$. In fact, if $4\lambda(q) + 1 = q$ we have

$$X^0(k, l) = \sum_{n=0}^{N-1} \sum_{m=0}^{M/q-1} y(n, qm + \lambda(q)) \cos\frac{\pi(4n+1)k}{2N} \cos\frac{\pi(4m+1)l}{2M/q} \tag{7}$$

$$k = 0, 1, \ldots, N-1; \quad l = 0, 1, \ldots, M/q - 1$$

If $4\lambda(q) + 1 = 3q$, eqn. 5 can also be turned into a 2D-DCT-II by reordering the input. It can be proven easily that $X^0(k, l + u(2M/q)) = (-1)^u X^0(k, l)$ and $X^0(k, (2M/q) - l) = -X^0(k, l)$ for an arbitrary integer $u$. If $M/q$ is a multiple of $q$, eqn. 7 can be further decomposed in the same way.
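The reindexing of eqn. 2 and the split of eqns. 4 to 6 can be checked numerically on a small example. The sketch below is only an illustration: it evaluates every sum directly (nothing in it is fast), and the helper names `lam`, `reorder` and `partial_sum` as well as the test signal are assumptions introduced here, not part of the paper.

```python
import math

def lam(q):
    """Smallest non-negative integer below q with 4*lam + 1 divisible by q."""
    return next(a for a in range(q) if (4 * a + 1) % q == 0)

def dct2_eqn1(x, N, M):
    """Unnormalised 2D-DCT-II of eqn. 1, evaluated directly."""
    return [[sum(x[n][m]
                 * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                 * math.cos(math.pi * (2 * m + 1) * l / (2 * M))
                 for n in range(N) for m in range(M))
             for l in range(M)] for k in range(N)]

def reorder(x, N, M):
    """Index mapping of eqn. 2: even-indexed samples first, odd-indexed reversed."""
    def new_index(i, size):
        return i // 2 if i % 2 == 0 else size - 1 - i // 2
    y = [[0.0] * M for _ in range(N)]
    for n in range(N):
        for m in range(M):
            y[new_index(n, N)][new_index(m, M)] = x[n][m]
    return y

def partial_sum(y, N, M, keep):
    """Sum of eqn. 3 restricted to the columns m for which keep(m) is true."""
    return [[sum(y[n][m]
                 * math.cos(math.pi * (4 * n + 1) * k / (2 * N))
                 * math.cos(math.pi * (4 * m + 1) * l / (2 * M))
                 for n in range(N) for m in range(M) if keep(m))
             for l in range(M)] for k in range(N)]

q, N, M = 3, 3, 9                      # M = q^2, N = M / q (arbitrary small sizes)
x = [[math.sin(1.0 + 3 * n + 0.7 * m) for m in range(M)] for n in range(N)]
y = reorder(x, N, M)
X = dct2_eqn1(x, N, M)
X0 = partial_sum(y, N, M, lambda m: m % q == lam(q))   # eqn. 5
X1 = partial_sum(y, N, M, lambda m: m % q != lam(q))   # eqn. 6

# Largest deviation between eqn. 1 and X0 + X1 (eqn. 4).
err = max(abs(X[k][l] - X0[k][l] - X1[k][l]) for k in range(N) for l in range(M))
# Largest deviation from the antisymmetry X0(k, 2M/q - l) = -X0(k, l).
sym = max(abs(X0[k][2 * M // q - l] + X0[k][l])
          for k in range(N) for l in range(M // q + 1))
print(lam(q), err, sym)
```

For q = 3 the value λ(3) = 2 gives 4λ(q) + 1 = 3q, so this run exercises the second of the two cases mentioned above.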

We now consider the computation of eqn. 6. If we define $p(m) \equiv (4p+1)m + p \bmod N$, then lemma 1 holds.

Lemma 1: Let $S_1 = \{(n, m) \mid 0 \le n \le N-1;\ 0 \le m \le M-1;\ m \not\equiv \lambda(q) \bmod q\}$ and $S_2 = \{(p(m), m) \mid 0 \le p \le N-1;\ 0 \le m \le M-1;\ m \not\equiv \lambda(q) \bmod q\}$. Then $S_1 = S_2$.

Proof: It is obvious that $S_2 \subseteq S_1$ and the number of elements in $S_1$ is the same as that in $S_2$. Therefore, it is sufficient to prove that the elements in $S_2$ are different from each other. Let $(p(m), m)$ and $(p'(m'), m')$ be two elements in $S_2$. If they are equal, then $p(m) = p'(m')$ and $m = m'$. From the definition of $p(m)$, we have $(4p+1)m + p \equiv (4p'+1)m + p' \bmod N$. Hence $(4m+1)(p - p') \equiv 0 \bmod N$. Since $m \not\equiv \lambda(q) \bmod q$, that is, $4m+1$ and $N$ are relatively prime, we have $p \equiv p' \bmod N$. Therefore, $p = p'$.
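Lemma 1 is easy to confirm by brute force for small sizes. The following sketch builds both index sets explicitly; the function names and the chosen parameter triples (q, t, J) are illustrative assumptions.

```python
def lam(q):
    """Smallest non-negative integer below q with 4*lam + 1 divisible by q."""
    return next(a for a in range(q) if (4 * a + 1) % q == 0)

def index_sets(q, t, J):
    """Return the sets S1 and S2 of lemma 1 for M = q**t and N = M / q**J."""
    M = q ** t
    N = M // q ** J
    cols = [m for m in range(M) if m % q != lam(q)]
    S1 = {(n, m) for n in range(N) for m in cols}
    S2 = {(((4 * p + 1) * m + p) % N, m) for p in range(N) for m in cols}
    return S1, S2

def column_is_permuted(q, t, J, m):
    """True if p -> p(m) is a permutation of 0..N-1 for this fixed column m."""
    N = q ** t // q ** J
    return sorted(((4 * p + 1) * m + p) % N for p in range(N)) == list(range(N))

for q, t, J in [(3, 2, 0), (3, 3, 1), (5, 2, 1), (7, 1, 0)]:
    S1, S2 = index_sets(q, t, J)
    print(q, t, J, S1 == S2)
```

The excluded columns are exactly those where the argument of the proof breaks down: for m ≡ λ(q) mod q the factor 4m + 1 shares the divisor q with N, and `column_is_permuted` returns `False` for such a column whenever N > 1.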

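Based on lemma 1 and the properties of trigonometric identities, eqn. 6 can be computed by

$$X^1(k, l) = \frac{1}{2}[A(k, l) + B(k, l)] \tag{8}$$

where

$$A(k, l) = \sum_{p=0}^{N-1} \sum_{\substack{m=0 \\ m \not\equiv \lambda(q) \bmod q}}^{M-1} y(p(m), m) \cos\left[\frac{\pi(4p(m)+1)k}{2N} + \frac{\pi(4m+1)l}{2M}\right] \tag{9}$$

and

$$B(k, l) = \sum_{p=0}^{N-1} \sum_{\substack{m=0 \\ m \not\equiv \lambda(q) \bmod q}}^{M-1} y(p(m), m) \cos\left[\frac{\pi(4p(m)+1)k}{2N} - \frac{\pi(4m+1)l}{2M}\right] \tag{10}$$

with $k = 0, 1, \ldots, N-1$ and $l = 0, 1, \ldots, M-1$. From the definition of $p(m)$, it can easily be proved that $4p(m) + 1 \equiv (4m+1)(4p+1) \bmod 4N$. After applying this equation in eqns. 9 and 10, and using $N = M/q^J$, we obtain

$$\begin{aligned}
A(k, l) &= \sum_{p=0}^{N-1} \sum_{\substack{m=0 \\ m \not\equiv \lambda(q) \bmod q}}^{M-1} y(p(m), m) \cos\left[\frac{\pi(4m+1)(4p+1)k}{2N} + \frac{\pi(4m+1)l}{2M}\right] \\
&= \sum_{p=0}^{N-1} \sum_{\substack{m=0 \\ m \not\equiv \lambda(q) \bmod q}}^{M-1} y(p(m), m) \cos\left[\frac{\pi(4m+1)(q^J(4p+1)k + l)}{2M}\right]
\end{aligned} \tag{11}$$

Similarly,

$$B(k, l) = \sum_{p=0}^{N-1} \sum_{\substack{m=0 \\ m \not\equiv \lambda(q) \bmod q}}^{M-1} y(p(m), m) \cos\left[\frac{\pi(4m+1)(q^J(4p+1)k - l)}{2M}\right] \tag{12}$$

By defining

$$V_p(l) = \sum_{\substack{m=0 \\ m \not\equiv \lambda(q) \bmod q}}^{M-1} y(p(m), m) \cos\frac{\pi(4m+1)l}{2M}, \quad p = 0, 1, \ldots, N-1;\ l = 0, 1, \ldots, M-1 \tag{13}$$

we can express eqns. 11 and 12 by

$$A(k, l) = \sum_{p=0}^{N-1} V_p(q^J(4p+1)k + l) \tag{14}$$

and

$$B(k, l) = \sum_{p=0}^{N-1} V_p(q^J(4p+1)k - l) \tag{15}$$

$$k = 0, 1, \ldots, N-1; \quad l = 0, 1, \ldots, M-1$$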

For convenience of presentation, we define a new sequence

$$W(l) = \sum_{\substack{m=0 \\ m \not\equiv \lambda(q) \bmod q}}^{M-1} E(m) \cos\frac{\pi(4m+1)l}{2M}, \quad l = 0, 1, \ldots, M-1 \tag{16}$$

which can also be written as

$$W(l) = \sum_{\substack{m=0 \\ m \not\equiv (q-1)/2 \bmod q}}^{M-1} F(m) \cos\frac{\pi(2m+1)l}{2M}, \quad l = 0, 1, \ldots, M-1 \tag{17}$$

In comparison between eqns. 16 and 17, we have the relations $F(2m) = E(m)$, $m = 0, 1, \ldots, (M-1)/2$ and $F(2m+1) = E(M-1-m)$, $m = 0, 1, \ldots, (M-1)/2 - 1$.

In fact, eqn. 17 is the same as the 1D-DCT-II except that those terms satisfying $m \equiv (q-1)/2 \bmod q$ are ignored in the summation. Accordingly, eqn. 17 is known as a reduced 1D-DCT-II. For different $p$ values, eqn. 13 specifies $N$ reduced 1D-DCT-IIs of length $M$. The computation of the reduced 1D-DCT-II is discussed in detail in Section 4.

Both eqns. 14 and 15 are a summation of $V_p(l)$, which has the properties

$$V_p(l + u2M) = (-1)^u V_p(l), \quad V_p(2M - l) = -V_p(l), \quad V_p(M) = 0$$

By using these properties, $A(k, l)$ and $B(k, l)$ can be computed from $V_p(l)$, where $l = 0, 1, \ldots, M-1$. In general, a direct computation of eqns. 14 and 15 needs $2NM(N-1)$ additions. However, eqns. 14 and 15 can be equivalently computed by using a polynomial transform, which substantially reduces the computational complexity.

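Based on the properties of $V_p(l)$, it can be proved that

$$A(k, 2M - l) = \sum_{p=0}^{N-1} V_p[q^J(4p+1)k + 2M - l] = -\sum_{p=0}^{N-1} V_p[q^J(4p+1)k - l] = -B(k, l) \tag{18}$$

and

$$A(k, 0) = B(k, 0), \quad A(0, l) = B(0, l) \tag{19}$$

Now let us generate the polynomial

$$B_k(z) = \sum_{l=0}^{M-1} B(k, l) z^l - \sum_{l=M}^{2M-1} A(k, 2M - l) z^l \tag{20}$$

By using the property in eqn. 18, eqn. 20 becomes

$$B_k(z) = \sum_{l=0}^{2M-1} B(k, l) z^l$$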

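from which $A(k, l)$ and $B(k, l)$ can be derived. $B_k(z)$ can be expressed by

$$\begin{aligned}
B_k(z) &\equiv \sum_{p=0}^{N-1} \sum_{l=0}^{2M-1} V_p(q^J(4p+1)k - l) z^l \bmod z^{2M} + 1 \\
&\equiv \sum_{p=0}^{N-1} \sum_{l=0}^{2M-1} V_p(l - q^J(4p+1)k) z^l \bmod z^{2M} + 1 \\
&\equiv \sum_{p=0}^{N-1} \sum_{l=0}^{2M-1} V_p(l) z^{l + q^J(4p+1)k} \bmod z^{2M} + 1 \\
&\equiv \left[\sum_{p=0}^{N-1} U_p(z) \hat{z}^{pk}\right] z^{q^J k} \bmod z^{2M} + 1 \\
&\equiv C_k(z) z^{q^J k} \bmod z^{2M} + 1
\end{aligned} \tag{21}$$

where

$$U_p(z) = \sum_{l=0}^{2M-1} V_p(l) z^l$$

and

$$C_k(z) \equiv \sum_{p=0}^{N-1} U_p(z) \hat{z}^{pk} \bmod z^{2M} + 1, \quad k = 0, 1, \ldots, N-1, \quad \hat{z} \equiv z^{4q^J} \bmod z^{2M} + 1 \tag{22}$$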

Since $\hat{z}^N \equiv 1 \bmod z^{2M} + 1$, eqn. 22 is a polynomial transform which can be computed by a fast algorithm, to be discussed in the next Section.
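Because multiplying by a power of z modulo z^{2M} + 1 only moves coefficients and flips signs, the transform of eqn. 22 needs additions but no multiplications. The sketch below illustrates this with a direct (not fast) evaluation; `shift`, `poly_transform` and the integer test data are names and values assumed here for illustration.

```python
def shift(poly, c):
    """Multiply poly (a list of 2M coefficients) by z**c modulo z**(2M) + 1."""
    size = len(poly)                  # size = 2M
    out = [0] * size
    for i, a in enumerate(poly):
        e = (i + c) % (2 * size)      # z has multiplicative order 4M
        if e >= size:                 # z**(2M) is congruent to -1
            e -= size
            a = -a
        out[e] = a
    return out

def poly_transform(U, q, J):
    """C_k(z) = sum_p U_p(z) * zhat**(p*k), zhat = z**(4*q**J), for k = 0..N-1."""
    N = len(U)
    size = len(U[0])
    C = []
    for k in range(N):
        acc = [0] * size
        for p in range(N):
            term = shift(U[p], 4 * q ** J * p * k)   # sign flips and moves only
            acc = [s + v for s, v in zip(acc, term)]
        C.append(acc)
    return C

q, J, M = 3, 1, 9
N = M // q ** J
poly = list(range(1, 2 * M + 1))
print(shift(poly, 4 * q ** J * N) == poly)           # zhat**N acts as the identity

U = [[(5 * p + l * l) % 7 - 3 for l in range(2 * M)] for p in range(N)]
C = poly_transform(U, q, J)
print(C[0] == [sum(col) for col in zip(*U)])         # k = 0 is a plain sum
```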

The main steps of the algorithm are summarised as follows.

Step 1. Compute $N$ reduced 1D-DCT-IIs of length $M$ according to eqn. 13.

Step 2. Compute a polynomial transform according to eqn. 22 and then $B_k(z)$ according to eqn. 21.

Step 3. Add $A(k, l)$ and $B(k, l)$, which can be obtained from the definition of $B_k(z)$ in eqn. 20, to form $X^1(k, l)$ according to eqn. 8.

Step 4. Compute a 2D-DCT-II with size $N \times M/q$ in eqn. 7 by recursively using Steps 1 to 3.

Step 5. Compute $X(k, l)$ according to eqn. 4.
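Steps 1 to 3 can be sketched end to end for a small case. The code below is a correctness check rather than a fast implementation: the reduced DCTs of eqn. 13 are evaluated directly, and A(k, l) and B(k, l) are formed from eqns. 14 and 15 instead of through the polynomial transform. The helper `extend`, which evaluates V_p at an arbitrary integer argument using the properties of V_p, and the test signal are assumptions made for this illustration.

```python
import math

def lam(q):
    """Smallest non-negative integer below q with 4*lam + 1 divisible by q."""
    return next(a for a in range(q) if (4 * a + 1) % q == 0)

def extend(V, l, M):
    """V_p(l) for any integer l, from the stored values V[0..M-1]."""
    l %= 4 * M                         # V_p(l + 4M) = V_p(l)
    sign = 1
    if l >= 2 * M:                     # V_p(l + 2M) = -V_p(l)
        l -= 2 * M
        sign = -1
    if l == M:                         # V_p(M) = 0
        return 0.0
    if l > M:                          # V_p(2M - l) = -V_p(l)
        l = 2 * M - l
        sign = -sign
    return sign * V[l]

def x1_direct(y, q, t, J):
    """X^1(k, l) evaluated directly from eqn. 6."""
    M = q ** t
    N = M // q ** J
    cols = [m for m in range(M) if m % q != lam(q)]
    return [[sum(y[n][m]
                 * math.cos(math.pi * (4 * n + 1) * k / (2 * N))
                 * math.cos(math.pi * (4 * m + 1) * l / (2 * M))
                 for n in range(N) for m in cols)
             for l in range(M)] for k in range(N)]

def x1_via_reduced_dcts(y, q, t, J):
    """X^1(k, l) from N reduced 1D-DCT-IIs (eqn. 13) and eqns. 14, 15 and 8."""
    M = q ** t
    N = M // q ** J
    cols = [m for m in range(M) if m % q != lam(q)]
    V = [[sum(y[((4 * p + 1) * m + p) % N][m]
              * math.cos(math.pi * (4 * m + 1) * l / (2 * M)) for m in cols)
          for l in range(M)] for p in range(N)]
    X1 = [[0.0] * M for _ in range(N)]
    for k in range(N):
        for l in range(M):
            a = sum(extend(V[p], q ** J * (4 * p + 1) * k + l, M) for p in range(N))
            b = sum(extend(V[p], q ** J * (4 * p + 1) * k - l, M) for p in range(N))
            X1[k][l] = (a + b) / 2
    return X1

def demo(q, t, J):
    """Largest difference between the two evaluations for an arbitrary input."""
    M = q ** t
    N = M // q ** J
    y = [[math.cos(0.3 + 1.7 * n + 0.9 * m * m) for m in range(M)] for n in range(N)]
    P, Q = x1_direct(y, q, t, J), x1_via_reduced_dcts(y, q, t, J)
    return max(abs(a - b) for rp, rq in zip(P, Q) for a, b in zip(rp, rq))

print(demo(3, 2, 1), demo(3, 2, 0), demo(5, 1, 0))
```

The differences should be at the level of floating-point rounding, which confirms that only the values V_p(0), ..., V_p(M-1) are needed to form X^1.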

3 Fast polynomial transform

The polynomial transform in eqn. 22 can be computed by the fast polynomial transform (FPT) algorithm reported in [10-12], which generally requires $2MN \log_q N$ additions. However, the number of additions can be further reduced by using the symmetric property of the input polynomial sequence $U_p(z)$. It is noted that the coefficients of $U_p(z)$ are $V_p(l)$, $l = 0, 1, \ldots, 2M-1$, which satisfy $V_p(2M - l) = -V_p(l)$ and $V_p(M) = 0$. This indicates that only one-half of the coefficients are independent. This property can be expressed as

$$U_p(z) \equiv U_p(z^{-1}) \bmod z^{2M} + 1 \tag{23}$$

Based on this property, it can be proved that the polynomial sequence $C_k(z)$ in eqn. 22 also has a symmetric property expressed as

$$C_{N-k}(z) \equiv C_k(z^{-1}) \bmod z^{2M} + 1 \tag{24}$$

Therefore, only half of the $C_k(z)$ need to be computed. The fast polynomial transform algorithm computes a polynomial transform in $s = \log_q N$ stages. The computation of each stage produces temporary outputs which also have a symmetric property that provides further savings on the number of additions. In the rest of this section, some details of the fast polynomial transform algorithm are introduced to show the utilisation of the symmetric property. More information on the fast polynomial transform algorithm can be found in [10-12].
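Both symmetric properties can be confirmed with exact integer arithmetic. In the sketch below, `reflect` performs the substitution z → z^{-1} modulo z^{2M} + 1, and the input polynomials are built from arbitrary integers that satisfy V_p(2M - l) = -V_p(l) and V_p(M) = 0. All function names and test values are assumptions for illustration, and the transform is evaluated directly rather than by the FPT.

```python
def shift(poly, c):
    """Multiply poly (2M coefficients) by z**c modulo z**(2M) + 1."""
    size = len(poly)
    out = [0] * size
    for i, a in enumerate(poly):
        e = (i + c) % (2 * size)
        if e >= size:                 # z**(2M) is congruent to -1
            e -= size
            a = -a
        out[e] = a
    return out

def reflect(poly):
    """Substitute z -> z**(-1) modulo z**(2M) + 1, using z**(-l) = -z**(2M - l)."""
    size = len(poly)
    out = [0] * size
    out[0] = poly[0]
    for l in range(1, size):
        out[size - l] = -poly[l]
    return out

def poly_transform(U, q, J):
    """Direct evaluation of C_k(z) = sum_p U_p(z) * z**(4 * q**J * p * k)."""
    N = len(U)
    C = []
    for k in range(N):
        acc = [0] * len(U[0])
        for p in range(N):
            acc = [s + v for s, v in zip(acc, shift(U[p], 4 * q ** J * p * k))]
        C.append(acc)
    return C

def symmetric_input(p, M):
    """Coefficients V(0..2M-1) with V(M) = 0 and V(2M - l) = -V(l)."""
    base = [(7 * p + 3 * l * l + 1) % 11 - 5 for l in range(M)]
    return base + [0] + [-base[2 * M - l] for l in range(M + 1, 2 * M)]

q, J, M = 3, 1, 9
N = M // q ** J
U = [symmetric_input(p, M) for p in range(N)]
C = poly_transform(U, q, J)
print(all(reflect(u) == u for u in U))                          # input symmetry
print(all(C[(N - k) % N] == reflect(C[k]) for k in range(N)))   # output symmetry
```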

If we express $n$ and $k$ by

$$n = n_{s-1}q^{s-1} + n_{s-2}q^{s-2} + \cdots + n_0; \quad k = k_{s-1}q^{s-1} + k_{s-2}q^{s-2} + \cdots + k_0$$

and simply denote $n = (n_{s-1} \ldots n_0)$ and $k = (k_{s-1} \ldots k_0)$

a-l U - 1

x mod z2M + 1 (25)

Eqn. 25 is the $j$th stage of the FPT algorithm ($j = 1, 2, \ldots, s$). If the symmetric property is not considered, then eqn. 25 needs $2(q-1)MN$ additions. However, the number of additions can be reduced by one-half if the symmetric property is fully used. From eqn. 23 we have

0-1 0-1

n,,=O n,-l=O

x mod gM + 1 (26)

where $\bar{k} = k_{j-1}q^{j-1} + k_{j-2}q^{j-2} + \cdots + k_0$. For $\bar{k} = 0$, we have

c&, (2-I) = C& (z) mod 2M + 1 (27)

and for k # 0 we have

This means that the stage-$j$ output polynomials need to be computed for $\bar{k} = 1$ to $(q^j - 1)/2$ only, and those for $\bar{k} = (q^j + 1)/2$ to $q^j - 1$ can be obtained by using the symmetric property. Furthermore, the property in eqn. 27 shows that it is enough to compute only half of the polynomial coefficients for the polynomials with $\bar{k} = 0$. The rest of the coefficients can be obtained from the properties without requiring any computation. Therefore, eqn. 25 needs $(q-1)NM$ additions only. It is noted from eqn. 20 that the $M$th coefficient of each output polynomial is $B(k, M)$, which is not required by our computation. Therefore, a further saving of $(N-1)/2$ additions can be made. In total, the number of additions required by eqn. 22 for the fast polynomial transform is $(q-1)NM \log_q N - (N-1)/2$.

4 Radix-q algorithm for reduced DCT-II

In general, many fast algorithms [13-19] for 1D-DCT-II can be amended for the reduced 1D-DCT-II. This section presents a fast algorithm for the reduced 1D-DCT-II based on the fast algorithm reported in [19].

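We divide $W(l)$, $l = 0, 1, \ldots, M-1$, as defined in eqn. 17, into $C_j(l) = W(ql + j)$, $j = 0, 1, \ldots, q-1$; $l = 0, 1, \ldots, M/q - 1$, and show that $C_j(l)$ can be expressed in terms of reduced 1D-DCT-IIs of length $M/q$. For example, we have

$$C_0(l) = \sum_{\substack{m=0 \\ m \not\equiv (q-1)/2 \bmod q}}^{M/q-1} d_0(m) \cos\frac{\pi(2m+1)l}{2M/q}, \quad l = 0, 1, \ldots, M/q - 1 \tag{29}$$

which is a reduced 1D-DCT-II of length $M/q$ whose input sequence is

$$d_0(m) = F(m) + \sum_{i=1}^{(q-1)/2} [F(2iM/q + m) + F(2iM/q - 1 - m)] \tag{30}$$

$$m = 0, 1, \ldots, M/q - 1 \ \text{and}\ m \not\equiv (q-1)/2 \bmod q$$

Now let us consider $D_j(l) = C_j(l) + C_{q-j}(l-1)$, which can be expressed as

$$D_j(l) = \sum_{\substack{m=0 \\ m \not\equiv (q-1)/2 \bmod q}}^{M/q-1} d_j(m) \cos\frac{\pi(2m+1)l}{2M/q}, \quad l = 0, 1, \ldots, M/q - 1 \tag{31}$$

which is again a reduced 1D-DCT-II of input sequence

$$d_j(m) = 2F(m) \cos\frac{\pi(2m+1)j}{2M} + \sum_{i=1}^{(q-1)/2} \left\{ 2F(2iM/q + m) \cos\left[\frac{\pi(4iM/q + 2m + 1)j}{2M}\right] + 2F(2iM/q - 1 - m) \cos\left[\frac{\pi(4iM/q - 2m - 1)j}{2M}\right] \right\} \tag{32}$$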

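By using the trigonometric properties, the computation of $d_j(m)$ can be simplified. If we define

$$f_0(m) = F(m), \quad f_i(m) = F(2iM/q - 1 - m) + F(2iM/q + m), \quad g_i(m) = F(2iM/q - 1 - m) - F(2iM/q + m) \tag{33}$$

$$i = 1, 2, \ldots, (q-1)/2; \quad 0 \le m \le M/q - 1 \ \text{and}\ m \not\equiv (q-1)/2 \bmod q$$

then eqns. 30 and 32 can be expressed by

$$d_0(m) = c_0(m), \quad d_j(m) = 2c_j(m) \cos\frac{\pi(2m+1)j}{2M} + 2s_j(m) \sin\frac{\pi(2m+1)j}{2M} \tag{34}$$

$$j = 1, 2, \ldots, q-1; \quad 0 \le m \le M/q - 1 \ \text{and}\ m \not\equiv (q-1)/2 \bmod q$$

where

$$c_j(m) = \sum_{i=0}^{(q-1)/2} f_i(m) \cos\frac{2\pi ij}{q}, \quad s_j(m) = \sum_{i=1}^{(q-1)/2} g_i(m) \sin\frac{2\pi ij}{q}, \quad j = 0, 1, \ldots, q-1 \tag{35}$$

$$0 \le m \le M/q - 1 \ \text{and}\ m \not\equiv (q-1)/2 \bmod q$$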

The reduced 1D-DCT-II $W(l)$ can be obtained from $C_0(l)$ and $D_j(l)$ according to

$$\begin{aligned}
W(ql) &= C_0(l), & l &= 0, 1, \ldots, M/q - 1 \\
C_j(0) &= D_j(0)/2; \quad C_j(l) = D_j(l) - C_{q-j}(l-1), & l &= 1, 2, \ldots, M/q - 1;\ j = 1, 2, \ldots, q-1
\end{aligned} \tag{36}$$

Therefore, the reduced 1D-DCT-II of length $M$ is decomposed into $q$ reduced 1D-DCT-IIs of length $M/q$ at the decomposition cost required by eqns. 33-36. The main steps of the decomposition are summarised as follows:

Step 1. Compute eqn. 33 to get $f_i(m)$ and $g_i(m)$.

Step 2. Compute eqn. 35 to get $c_j(m)$ and $s_j(m)$.

Step 3. Compute eqn. 34 to get $d_j(m)$.

Step 4. Compute the reduced DCT-IIs of $d_j(m)$ with length $M/q$ to get $C_0(l)$ and $D_j(l)$.

Step 5. Get $C_j(l)$ according to eqn. 36.
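One level of this decomposition can be checked against a direct evaluation of eqn. 17. The sketch below follows Steps 1 to 5 but computes the length-M/q reduced DCT-IIs directly instead of recursing, so it demonstrates correctness only. It assumes that M is a multiple of q² so that the skipped indices fold consistently; the function names and test data are illustrative.

```python
import math

def reduced_dct(F, q):
    """Reduced 1D-DCT-II of eqn. 17: terms with m = (q-1)/2 mod q are skipped."""
    M = len(F)
    return [sum(F[m] * math.cos(math.pi * (2 * m + 1) * l / (2 * M))
                for m in range(M) if m % q != (q - 1) // 2)
            for l in range(M)]

def radix_q_step(F, q):
    """Length-M reduced DCT-II built from q reduced DCT-IIs of length M/q.

    Assumes len(F) is a multiple of q**2.
    """
    M = len(F)
    Mq = M // q
    h = (q - 1) // 2
    d = [[0.0] * Mq for _ in range(q)]
    for m in range(Mq):
        if m % q == h:
            continue                      # ignored by the reduced DCT-II anyway
        # Step 1 (eqn. 33): fold the input into sums f_i and differences g_i.
        f = [F[m]] + [F[2 * i * Mq - 1 - m] + F[2 * i * Mq + m] for i in range(1, h + 1)]
        g = [0.0] + [F[2 * i * Mq - 1 - m] - F[2 * i * Mq + m] for i in range(1, h + 1)]
        for j in range(q):
            # Step 2 (eqn. 35).
            c = sum(f[i] * math.cos(2 * math.pi * i * j / q) for i in range(h + 1))
            s = sum(g[i] * math.sin(2 * math.pi * i * j / q) for i in range(1, h + 1))
            # Step 3 (eqn. 34).
            theta = math.pi * (2 * m + 1) * j / (2 * M)
            d[j][m] = c if j == 0 else 2 * c * math.cos(theta) + 2 * s * math.sin(theta)
    # Step 4: q reduced DCT-IIs of length M/q (computed directly in this sketch).
    D = [reduced_dct(d[j], q) for j in range(q)]
    # Step 5 (eqn. 36): recover C_j(l) = W(q*l + j).
    C = [[0.0] * Mq for _ in range(q)]
    C[0] = D[0]
    for j in range(1, q):
        C[j][0] = D[j][0] / 2
    for l in range(1, Mq):
        for j in range(1, q):
            C[j][l] = D[j][l] - C[q - j][l - 1]
    return [C[j][l] for l in range(Mq) for j in range(q)]

for q, M in [(3, 9), (3, 27), (5, 25)]:
    F = [math.sin(0.5 + 1.3 * m) for m in range(M)]
    diff = max(abs(a - b) for a, b in zip(reduced_dct(F, q), radix_q_step(F, q)))
    print(q, M, diff)
```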

With the assumption that the computation in Step 2 for a fixed $m$ needs $\alpha(q)$ multiplications and $\beta(q)$ additions, the total number of operations for Steps 1 to 3 and 5 is $(\alpha(q) + 2q - 2)(M/q - M/q^2)$ multiplications and $(\beta(q) + 2q - 2)(M/q - M/q^2) + (q-1)(M/q - 1)$ additions. We can use the same technique to further decompose the reduced 1D-DCT-II in Step 4 if the subsequence's length $M/q$ is divisible by $q$. Let $\varphi(M)$ and $\psi(M)$ represent the number of multiplications and additions, respectively, for the reduced 1D-DCT-II of length $M$. If the recursive decomposition process stops when the sequence length becomes $q$, the number of arithmetic operations becomes

$$\begin{aligned}
\varphi(q^t) &= (\alpha(q) + 2q - 2)(t-1)(q^{t-1} - q^{t-2}) + q^{t-1}\varphi(q) \\
\psi(q^t) &= (\beta(q) + 2q - 2)(t-1)(q^{t-1} - q^{t-2}) + (q-1)(t-1)q^{t-1} - q^{t-1} + 1 + q^{t-1}\psi(q)
\end{aligned} \tag{37}$$

The proposed decomposition procedures are valid for an arbitrary odd number $q$. However, if $q$ is small (such as 3 or 5) some arithmetic operations can be saved by an optimised implementation. The fast algorithms for these two cases are given as examples.

4.1 Algorithm for q = 3

Step 1. Compute

$$\begin{aligned}
T_1(m) &= F(2 \times 3^{t-1} - 1 - m) + F(2 \times 3^{t-1} + m) \\
T_2(m) &= F(2 \times 3^{t-1} - 1 - m) - F(2 \times 3^{t-1} + m) \\
T_3(m) &= 2F(m) - T_1(m) \\
d_0(m) &= F(m) + T_1(m) \\
d_1(m) &= T_3(m) \cos\frac{(2m+1)\pi}{2M} + T_2(m)\sqrt{3} \sin\frac{(2m+1)\pi}{2M} \\
d_2(m) &= T_3(m) \cos\frac{2(2m+1)\pi}{2M} - T_2(m)\sqrt{3} \sin\frac{2(2m+1)\pi}{2M}
\end{aligned}$$

$$m = 0, 1, \ldots, 3^{t-1} - 1 \ \text{and}\ m \not\equiv 1 \bmod 3$$

Step 2. Compute the reduced 1D-DCT-II of $d_j(m)$ to get $D_j(l)$.

Step 3. Compute $C_j(l) = W(3l + j)$ from $D_j(l)$ ($l = 0, 1, \ldots, 3^{t-1} - 1$; $j = 0, 1, 2$):

$$\begin{aligned}
C_0(l) &= D_0(l), \quad l = 0, 1, \ldots, 3^{t-1} - 1 \\
C_1(0) &= D_1(0)/2, \quad C_2(0) = D_2(0)/2 \\
C_1(l) &= D_1(l) - C_2(l-1); \quad C_2(l) = D_2(l) - C_1(l-1), \quad l = 1, 2, \ldots, 3^{t-1} - 1
\end{aligned}$$

If the decomposition process terminates when the subsequence length is 3 and the 3-point reduced 1D-DCT-II needs one multiplication and two additions plus one shift, then we have

$$\varphi(3^t) = \frac{2}{3}t3^t - \frac{1}{3}3^t; \quad \psi(3^t) = \frac{20}{9}t3^t - \frac{17}{9}3^t + 1$$

4.2 Algorithm when q = 5

The procedures are similar to those needed for $q = 3$. However, the computation defined in Step 2 (or eqn. 35) for $q = 5$ can be expressed explicitly by

$$\begin{aligned}
c_0(m) &= f_0(m) + f_1(m) + f_2(m) \\
c_1(m) &= (f_1(m) - f_2(m)) \cos\frac{2\pi}{5} - f_2(m)/2 + f_0(m) \\
c_2(m) &= -(f_1(m) - f_2(m)) \cos\frac{2\pi}{5} - f_1(m)/2 + f_0(m)
\end{aligned}$$

It is shown in [13] that the reduced length-5 DCT-II needs four multiplications and 11 additions. Therefore, the computational complexity of the reduced 1D-DCT-II of length $5^t$ is

$$\varphi(5^t) = \frac{44}{25}t5^t - \frac{24}{25}5^t; \quad \psi(5^t) = \frac{88}{25}t5^t - \frac{38}{25}5^t + 1$$
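The closed-form counts for q = 3 and q = 5 can be tabulated with exact rational arithmetic. The short sketch below uses the coefficients 2/3, -1/3, 20/9, -17/9 and 44/25, -24/25, 88/25, -38/25 (each addition count carrying the extra +1) and checks that t = 1 reproduces the stated costs of the 3-point and 5-point reduced DCT-IIs. The function name is an assumption made here.

```python
from fractions import Fraction

def reduced_dct_ops(q, t):
    """(multiplications, additions) for the reduced 1D-DCT-II of length q**t."""
    n = q ** t
    if q == 3:
        mults = Fraction(2, 3) * t * n - Fraction(1, 3) * n
        adds = Fraction(20, 9) * t * n - Fraction(17, 9) * n + 1
    elif q == 5:
        mults = Fraction(44, 25) * t * n - Fraction(24, 25) * n
        adds = Fraction(88, 25) * t * n - Fraction(38, 25) * n + 1
    else:
        raise ValueError("closed forms are only given for q = 3 and q = 5")
    return mults, adds

for q in (3, 5):
    for t in (1, 2, 3):
        mults, adds = reduced_dct_ops(q, t)
        print(q, t, mults, adds)
```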


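5 Multi-dimensional DCT-II

This Section generalises the method used in Section 2 for the computation of rD-DCT-II defined by

$$X(k_1, k_2, \ldots, k_r) = \sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} \cdots \sum_{n_r=0}^{N_r-1} x(n_1, n_2, \ldots, n_r) \cos\frac{\pi(2n_1+1)k_1}{2N_1} \cdots \cos\frac{\pi(2n_r+1)k_r}{2N_r}$$

$$k_i = 0, 1, \ldots, N_i - 1; \quad i = 1, 2, \ldots, r$$

where $N_r = q^t$ ($q$ is an odd prime number) and $N_r/N_i = q^{l_i}$, $i = 1, 2, \ldots, r-1$, $l_i \ge 0$. If we define a sequence $y(n_1, \ldots, n_r)$ by

$$\begin{aligned}
y(n_1, n_2, \ldots, n_r) &= x(2n_1, 2n_2, \ldots, 2n_r) \\
y(N_1 - 1 - n_1, n_2, \ldots, n_r) &= x(2n_1 + 1, 2n_2, \ldots, 2n_r) \\
&\ \,\vdots \\
y(N_1 - 1 - n_1, \ldots, N_r - 1 - n_r) &= x(2n_1 + 1, \ldots, 2n_r + 1)
\end{aligned}$$

$$0 \le 2n_i \le N_i - 1 \ \text{or}\ 0 \le 2n_i + 1 \le N_i - 1; \quad i = 1, 2, \ldots, r$$

then the rD-DCT-II becomes

$$X(k_1, k_2, \ldots, k_r) = \sum_{n_1=0}^{N_1-1} \cdots \sum_{n_r=0}^{N_r-1} y(n_1, \ldots, n_r) \cos\frac{\pi(4n_1+1)k_1}{2N_1} \cdots \cos\frac{\pi(4n_r+1)k_r}{2N_r} \tag{38}$$

$$k_i = 0, 1, \ldots, N_i - 1; \quad i = 1, 2, \ldots, r$$

Let $\lambda(q)$ represent the unique non-negative integer smaller than $q$ and satisfying $4\lambda(q) + 1 \equiv 0 \bmod q$. Then, eqn. 38 can be decomposed into two parts as

$$X(k_1, k_2, \ldots, k_r) = X^0(k_1, k_2, \ldots, k_r) + X^1(k_1, k_2, \ldots, k_r) \tag{39}$$

where

$$X^0(k_1, k_2, \ldots, k_r) = \sum_{n_1=0}^{N_1-1} \cdots \sum_{n_{r-1}=0}^{N_{r-1}-1} \sum_{n_r=0}^{N_r/q-1} y(n_1, \ldots, n_{r-1}, qn_r + \lambda(q)) \cos\left[\frac{\pi(4n_1+1)k_1}{2N_1}\right] \cdots \cos\left[\frac{\pi(4n_{r-1}+1)k_{r-1}}{2N_{r-1}}\right] \cos\left[\frac{\pi(4n_r+\delta)k_r}{2N_r/q}\right] \tag{40}$$

$$\delta = 1 \ \text{or}\ 3; \quad k_i = 0, 1, \ldots, N_i - 1; \quad i = 1, 2, \ldots, r$$

and

$$X^1(k_1, k_2, \ldots, k_r) = \sum_{n_1=0}^{N_1-1} \cdots \sum_{n_{r-1}=0}^{N_{r-1}-1} \sum_{\substack{n_r=0 \\ n_r \not\equiv \lambda(q) \bmod q}}^{N_r-1} y(n_1, \ldots, n_r) \cos\left[\frac{\pi(4n_1+1)k_1}{2N_1}\right] \cdots \cos\left[\frac{\pi(4n_r+1)k_r}{2N_r}\right] \tag{41}$$

$$k_i = 0, 1, \ldots, N_i - 1; \quad i = 1, 2, \ldots, r$$

Noting that $X^0(k_1, \ldots, k_{r-1}, k_r + u(2N_r/q)) = (-1)^u X^0(k_1, \ldots, k_{r-1}, k_r)$ and $X^0(k_1, \ldots, k_{r-1}, (2N_r/q) - k_r) = -X^0(k_1, \ldots, k_{r-1}, k_r)$, we know that eqn. 40 can be computed by an rD-DCT-II of size $N_1 \times \cdots \times N_{r-1} \times N_r/q$.

Now we focus our attention on eqn. 41. If we define

$$A(k_1, k_2, \ldots, k_r) = \sum_{n_1=0}^{N_1-1} \cdots \sum_{n_{r-1}=0}^{N_{r-1}-1} \sum_{\substack{n_r=0 \\ n_r \not\equiv \lambda(q) \bmod q}}^{N_r-1} y(n_1, \ldots, n_r) \cos\left[\frac{\pi(4n_1+1)k_1}{2N_1}\right] \cdots \cos\left[\frac{\pi(4n_{r-2}+1)k_{r-2}}{2N_{r-2}}\right] \cos\left[\frac{\pi(4n_r+1)k_r}{2N_r} + \frac{\pi(4n_{r-1}+1)k_{r-1}}{2N_{r-1}}\right]$$

and $B(k_1, k_2, \ldots, k_r)$ by the same expression with the last factor replaced by

$$\cos\left[\frac{\pi(4n_r+1)k_r}{2N_r} - \frac{\pi(4n_{r-1}+1)k_{r-1}}{2N_{r-1}}\right]$$

then eqn. 41 can be computed by

$$X^1(k_1, k_2, \ldots, k_r) = \frac{1}{2}[A(k_1, k_2, \ldots, k_r) + B(k_1, k_2, \ldots, k_r)]$$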
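The r-dimensional reindexing and the product-to-sum step can be checked for a small 3D array. The sketch below evaluates all sums directly for an arbitrary shape with odd sizes; the dictionary-based storage, the function names and the test signal are assumptions chosen for brevity, not the paper's implementation.

```python
import math
from itertools import product

def lam(q):
    """Smallest non-negative integer below q with 4*lam + 1 divisible by q."""
    return next(a for a in range(q) if (4 * a + 1) % q == 0)

def new_index(i, size):
    """Per-axis index mapping: x(2n) -> y(n), x(2n+1) -> y(size-1-n)."""
    return i // 2 if i % 2 == 0 else size - 1 - i // 2

def cos_term(v, k, size, a):
    """cos(pi * (a*v + 1) * k / (2 * size)); a = 2 for x, a = 4 for y."""
    return math.cos(math.pi * (a * v + 1) * k / (2 * size))

def check(shape, q):
    """Return (error of eqn. 38, error of the product-to-sum step) for one shape."""
    idx = list(product(*[range(s) for s in shape]))
    x = {n: math.sin(1.0 + sum((i + 2) * v for i, v in enumerate(n))) for n in idx}
    y = {tuple(new_index(v, s) for v, s in zip(n, shape)): x[n] for n in idx}
    err38 = err42 = 0.0
    for k in idx:
        X_def = sum(x[n] * math.prod(cos_term(v, kk, s, 2)
                                     for v, kk, s in zip(n, k, shape)) for n in idx)
        X_38 = sum(y[n] * math.prod(cos_term(v, kk, s, 4)
                                    for v, kk, s in zip(n, k, shape)) for n in idx)
        err38 = max(err38, abs(X_def - X_38))
        X1 = A = B = 0.0
        for n in idx:
            if n[-1] % q == lam(q):
                continue                      # these terms belong to X^0
            head = math.prod(cos_term(v, kk, s, 4)
                             for v, kk, s in zip(n[:-2], k[:-2], shape[:-2]))
            u = math.pi * (4 * n[-1] + 1) * k[-1] / (2 * shape[-1])
            w = math.pi * (4 * n[-2] + 1) * k[-2] / (2 * shape[-2])
            X1 += y[n] * head * math.cos(u) * math.cos(w)
            A += y[n] * head * math.cos(u + w)
            B += y[n] * head * math.cos(u - w)
        err42 = max(err42, abs(X1 - (A + B) / 2))
    return err38, err42

print(check((3, 3, 3), 3))
print(check((3, 9), 3))
```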

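(42)

Let $p_{r-1}(n_r)$ be the least non-negative remainder of $(4p_{r-1}+1)n_r + p_{r-1}$ modulo $N_{r-1}$. Similar to the two-dimensional case, we have

$$A(k_1, k_2, \ldots, k_r) = \sum_{n_1=0}^{N_1-1} \cdots \sum_{n_{r-2}=0}^{N_{r-2}-1} \sum_{p_{r-1}=0}^{N_{r-1}-1} \sum_{\substack{n_r=0 \\ n_r \not\equiv \lambda(q) \bmod q}}^{N_r-1} y(n_1, \ldots, n_{r-2}, p_{r-1}(n_r), n_r) \cos\left[\frac{\pi(4n_1+1)k_1}{2N_1}\right] \cdots \cos\left[\frac{\pi(4n_{r-2}+1)k_{r-2}}{2N_{r-2}}\right] \cos\left[\frac{\pi(4n_r+1)(q^{l_{r-1}}(4p_{r-1}+1)k_{r-1} + k_r)}{2N_r}\right] \tag{43}$$

and

$$B(k_1, k_2, \ldots, k_r) = \sum_{n_1=0}^{N_1-1} \cdots \sum_{n_{r-2}=0}^{N_{r-2}-1} \sum_{p_{r-1}=0}^{N_{r-1}-1} \sum_{\substack{n_r=0 \\ n_r \not\equiv \lambda(q) \bmod q}}^{N_r-1} y(n_1, \ldots, n_{r-2}, p_{r-1}(n_r), n_r) \cos\left[\frac{\pi(4n_1+1)k_1}{2N_1}\right] \cdots \cos\left[\frac{\pi(4n_{r-2}+1)k_{r-2}}{2N_{r-2}}\right] \cos\left[\frac{\pi(4n_r+1)(q^{l_{r-1}}(4p_{r-1}+1)k_{r-1} - k_r)}{2N_r}\right] \tag{44}$$

If we define

$$V_{k_1, \ldots, k_{r-2}, p_{r-1}}(l) = \sum_{n_1=0}^{N_1-1} \cdots \sum_{n_{r-2}=0}^{N_{r-2}-1} \sum_{\substack{n_r=0 \\ n_r \not\equiv \lambda(q) \bmod q}}^{N_r-1} y(n_1, \ldots, n_{r-2}, p_{r-1}(n_r), n_r) \cos\left[\frac{\pi(4n_1+1)k_1}{2N_1}\right] \cdots \cos\left[\frac{\pi(4n_{r-2}+1)k_{r-2}}{2N_{r-2}}\right] \cos\left[\frac{\pi(4n_r+1)l}{2N_r}\right] \tag{45}$$

to be the $(r-1)$-D reduced DCT-II of size $N_1 \times \cdots \times N_{r-2} \times N_r$, then eqns. 43 and 44 become

$$A(k_1, \ldots, k_r) = \sum_{p_{r-1}=0}^{N_{r-1}-1} V_{k_1, \ldots, k_{r-2}, p_{r-1}}(q^{l_{r-1}}(4p_{r-1}+1)k_{r-1} + k_r), \quad B(k_1, \ldots, k_r) = \sum_{p_{r-1}=0}^{N_{r-1}-1} V_{k_1, \ldots, k_{r-2}, p_{r-1}}(q^{l_{r-1}}(4p_{r-1}+1)k_{r-1} - k_r)$$

We form a polynomial

$$B_{k_1, k_2, \ldots, k_{r-1}}(z) = \sum_{k_r=0}^{N_r-1} B(k_1, k_2, \ldots, k_r) z^{k_r} - \sum_{k_r=N_r}^{2N_r-1} A(k_1, k_2, \ldots, 2N_r - k_r) z^{k_r} = \sum_{k_r=0}^{2N_r-1} B(k_1, k_2, \ldots, k_r) z^{k_r} \tag{46}$$

which can be expressed as

$$B_{k_1, k_2, \ldots, k_{r-1}}(z) \equiv C_{k_1, k_2, \ldots, k_{r-1}}(z) z^{q^{l_{r-1}} k_{r-1}} \bmod z^{2N_r} + 1 \tag{47}$$

where

$$C_{k_1, k_2, \ldots, k_{r-1}}(z) \equiv \sum_{p_{r-1}=0}^{N_{r-1}-1} U_{k_1, \ldots, k_{r-2}, p_{r-1}}(z) \hat{z}^{p_{r-1}k_{r-1}} \bmod z^{2N_r} + 1, \quad \hat{z} \equiv z^{4q^{l_{r-1}}} \bmod z^{2N_r} + 1 \tag{48}$$

and

$$U_{k_1, \ldots, k_{r-2}, p_{r-1}}(z) = \sum_{l=0}^{2N_r-1} V_{k_1, \ldots, k_{r-2}, p_{r-1}}(l) z^l$$

which is the generating polynomial of $V_{k_1, \ldots, k_{r-2}, p_{r-1}}(l)$. The algorithm can be summarised as follows.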

Step 1. Compute $N_{r-1}$ $(r-1)$D reduced DCT-IIs of size $N_1 \times \cdots \times N_{r-2} \times N_r$ according to eqn. 45.

Step 2. Compute $N_1 N_2 \cdots N_{r-2}$ polynomial transforms according to eqn. 48 and then compute $B_{k_1, k_2, \ldots, k_{r-1}}(z)$ according to eqn. 47.

Step 3. Based on the definition of $B_{k_1, k_2, \ldots, k_{r-1}}(z)$ in eqn. 46, $A(k_1, k_2, \ldots, k_r)$ and $B(k_1, k_2, \ldots, k_r)$ are derived to form $X^1(k_1, k_2, \ldots, k_r)$ according to eqn. 42.

Step 4. Compute an rD-DCT-II of size $N_1 \times \cdots \times N_{r-1} \times N_r/q$ according to eqn. 40.

Step 5. Compute $X(k_1, k_2, \ldots, k_r)$ according to eqn. 39.

The flow-chart of the algorithm is given in Fig. 1.

Fig. 1 Flow-chart of rD-DCT-II algorithm

6 Analysis and comparison of computational complexity

The proposed fast algorithm needs to compute eqn. 40, an rD-DCT-II of size $N_1 \times \cdots \times N_{r-1} \times N_r/q$, and eqn. 41, an rD reduced DCT-II of size $N_1 \times \cdots \times N_{r-1} \times N_r$. Let $\varphi(N_1, \ldots, N_{r-1}, N_r)$ and $\psi(N_1, \ldots, N_{r-1}, N_r)$ represent the number of multiplications and additions, respectively, for the rD reduced DCT-II of size $N_1 \times \cdots \times N_{r-1} \times N_r$; then we have

$$\varphi(N_1, \ldots, N_{r-1}, N_r) = N_{r-1}\varphi(N_1, \ldots, N_{r-2}, N_r) \tag{49}$$

$$\psi(N_1, \ldots, N_{r-1}, N_r) = \frac{N}{N_r}\psi(N_r) + (r-1)N + (q-1)N \log_q \frac{N}{N_r} \tag{52}$$

where $N = N_1 N_2 \cdots N_r$. Therefore, the numbers of multiplications and additions for the rD-DCT-II of size $N_1 \times \cdots \times N_{r-1} \times N_r$ are respectively as follows:

$$M_u(N_1, \ldots, N_{r-1}, N_r) = \varphi(N_1, \ldots, N_{r-1}, N_r) + M_u(N_1, \ldots, N_{r-1}, N_r/q) \tag{53}$$

It is difficult to derive a closed-form expression of the number of operations for arbitrary $N_i$. When $N_1 = N_2 = \cdots = N_r = q^t$, however, the closed-form expression can be obtained. For this case, let us define $M_u(q^t) = M_u(q^t, \ldots, q^t, q^t)$ and $A_u(q^t) = A_u(q^t, \ldots, q^t, q^t)$ for simplicity and, without loss of generality, assume that $\varphi(q^t) = a_1 tq^t + a_2 q^t + a_3$ and $\psi(q^t) = b_1 tq^t + b_2 q^t + b_3$. From eqns. 53 and 54 we can deduce that

( 5 5 )

Let $\tau(q^t)$ and $\mu(q^t)$ represent the number of multiplications and additions, respectively, required by a 1D-DCT-II of length $q^t$. Without loss of generality, we assume that $\tau(q^t) = c_1 tq^t + c_2 q^t + c_3$ and $\mu(q^t) = d_1 tq^t + d_2 q^t + d_3$.

When $N_1 = N_2 = \cdots = N_r = q^t$, the computational complexity of the commonly used row-column method is

$$M_d(q^t) = c_1 rtq^{rt} + c_2 rq^{rt} + c_3 rq^{(r-1)t}; \quad A_d(q^t) = d_1 rtq^{rt} + d_2 rq^{rt} + d_3 rq^{(r-1)t}$$

Compared to the above computational complexity, the proposed fast algorithm achieves remarkable reductions in the number of multiplications. Based on the approximations $a_1 \approx [(q-1)/q]c_1$ and $a_2 \approx [(q-1)/q]c_2$, which can be seen from Section 4, we have

$$M_u \approx \frac{1}{r}M_d$$

However, the comparison of the number of additions needed by the two algorithms is difficult. Let us consider two special cases for $q = 3$ and $q = 5$. When $q = 3$, Section 4 shows that $a_1 = 2/3$, $a_2 = -1/3$, $a_3 = 0$ and $b_1 = 20/9$, $b_2 = -17/9$, $b_3 = 1$. Therefore, we get

$$\begin{aligned}
M_u(3^t) &= t3^{rt} - \frac{3^r + 1}{2(3^r - 1)}(3^{rt} - 1) \\
A_u(3^t) &= \left(3r + \frac{1}{3}\right)t3^{rt} - \frac{17 \times 3^r - 15}{6(3^r - 1)}(3^{rt} - 1)
\end{aligned} \tag{57}$$

When $q = 5$, Section 4 shows that $a_1 = 44/25$, $a_2 = -24/25$, $a_3 = 0$ and $b_1 = 88/25$, $b_2 = -38/25$, $b_3 = 1$. We have

$$M_u(5^t) = \frac{11}{5}t5^{rt} - \left(\frac{6}{5} + \frac{11}{5(5^r - 1)}\right)(5^{rt} - 1)$$

~ ~ ( 5 ' ) = 5r - - t5"' - ( :)
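The multiplication counts can be cross-checked by unrolling the recursion of eqn. 53 with exact rational arithmetic. The sketch below rests on two stated assumptions: after each reduction the dimensions are reordered so that the largest one is last (as required by $N_r/N_i = q^{l_i}$ with $l_i \ge 0$), and the closed forms are taken as $M_u(3^t) = t3^{rt} - \frac{3^r+1}{2(3^r-1)}(3^{rt}-1)$ and $M_u(5^t) = \frac{11}{5}t5^{rt} - (\frac{6}{5} + \frac{11}{5(5^r-1)})(5^{rt}-1)$. The function names are illustrative.

```python
from fractions import Fraction

# Coefficients (a1, a2) of phi(q**t) = a1*t*q**t + a2*q**t from Section 4 (a3 = 0).
PHI = {3: (Fraction(2, 3), Fraction(-1, 3)),
       5: (Fraction(44, 25), Fraction(-24, 25))}

def phi(q, t):
    """Multiplications of the reduced 1D-DCT-II of length q**t."""
    a1, a2 = PHI[q]
    return a1 * t * q ** t + a2 * q ** t

def mults_recursive(q, exps):
    """Eqn. 53 unrolled for size q**exps[0] x ... x q**exps[-1]."""
    exps = sorted(exps)               # ASSUMPTION: largest dimension placed last
    if exps[-1] == 0:
        return Fraction(0)            # every dimension has length 1
    total = q ** sum(exps)
    # Eqn. 49 applied r-1 times: phi(N_1..N_r) = (N / N_r) * phi(N_r).
    cost = Fraction(total, q ** exps[-1]) * phi(q, exps[-1])
    return cost + mults_recursive(q, exps[:-1] + [exps[-1] - 1])

def mults_closed(q, r, t):
    """Closed forms for equal sizes q**t in all r dimensions (q = 3 or 5)."""
    a = q ** r
    if q == 3:
        return t * a ** t - Fraction(a + 1, 2 * (a - 1)) * (a ** t - 1)
    if q == 5:
        return (Fraction(11, 5) * t * a ** t
                - (Fraction(6, 5) + Fraction(11, 5 * (a - 1))) * (a ** t - 1))
    raise ValueError("closed forms are only given for q = 3 and q = 5")

for q in (3, 5):
    for r in (2, 3):
        for t in (1, 2, 3):
            print(q, r, t, mults_recursive(q, [t] * r) == mults_closed(q, r, t))
```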

If the best known algorithm for DCT-I1 in [ 131 is used, we have

As seen tions needed by the proposed fast algorithm is approxi- mately l l r times that needed by the row-column method. However, the number of additions required by the proposed algorithm is slightly more than that needed by the row- column method. Finally, the total number of arithmetic operations (multiplications plus additions) is also reduced when t is larger, as seen from eqns. 57-60. When r 2 3, the proposed fast algorithm substantially achieves savings on the number of arithmetic operations.

7 Conclusions

The polynomial transform algorithm for rD-DCT-II presented in this paper can handle transforms of size $q^{l_1} \times q^{l_2} \times \cdots \times q^{l_r}$, where $q$ is an odd prime number. Considerable savings on the number of multiplications are achieved. The structure of the algorithm is also simple because it uses only reduced 1D-DCT-IIs and polynomial transforms, which can be easily implemented. Although the number of additions required by the proposed algorithm increases slightly compared to the row-column method, the proposed algorithm achieves savings in terms of the total number of arithmetic operations. We use the simplest FPT algorithm to minimise the additive complexity, which is mainly for computing the $(r-1)$-dimensional polynomial transform. Since the polynomial transform can be viewed as a kind of discrete Fourier transform in the polynomial residue rings, further improvement on the existing FPT algorithm is possible, similar to the improvement achieved on the Cooley-Tukey FFT algorithm. Extensive research is needed to find out how to achieve the improvement. The presented fast algorithm can also be easily extended for computing rD-DCT-II of size $2^{k_1}q^{l_1} \times 2^{k_2}q^{l_2} \times \cdots \times 2^{k_r}q^{l_r}$. However, it seems that it is much more difficult to use the same idea for rD-DCT-II of size $q_1^{l_1} \times q_2^{l_2} \times \cdots \times q_r^{l_r}$, where the $q_i$ are different prime numbers.

References

1 ABOUSLEMAN, G.P., MARCELLIN, M.W., and HUNT, B.R.: 'Compression of hyperspectral imagery using the 3-D DCT and hybrid DPCM/DCT', IEEE Trans. Geosci. Remote Sens., 1995, 33, (1), pp. 26-34

2 SIU, Y.L., and SIU, W.C.: 'Variable temporal-length 3-D discrete cosine transform coding', IEEE Trans. Image Process., 1997, 6, (5), pp. 758-763

3 CHO, N.I., and LEE, S.U.: 'Fast algorithm and implementation for 2-D discrete cosine transform', IEEE Trans. Circuits Syst., 1991, 38, (3), pp. 297-305

4 DUHAMEL, P., and GUILLEMOT, C.: 'Polynomial transform computation of the 2-D DCT'. Proc. ICASSP, 1990, pp. 1515-1518

5 FANG, W.H., HU, N.C., and SHIH, S.K.: 'Recursive fast computations of the two-dimensional discrete cosine transform', IEE Proc., Vis. Image Signal Process., 1999, 146, (1), pp. 25-33

6 FEIG, E., and WINOGRAD, S.: 'Fast algorithm for the discrete cosine transform', IEEE Trans. Signal Process., 1992, 40, (9), pp. 2174-2193

7 HUANG, Y.M., and WU, J.L.: 'A refined fast 2-D discrete cosine transform algorithm', IEEE Trans. Signal Process., 1999, 47, (3), pp. 904-907

8 PRADO, J., and DUHAMEL, P.: 'A polynomial transform based computation of the 2-D DCT with minimum multiplicative complexity'. Proceedings of ICASSP, 1996, 3, pp. 1347-1350

9 WANG, Z.S., HE, Z.Y., ZOU, C.R., and CHEN, J.D.Z.: 'A generalized fast algorithm for n-D discrete cosine transform and its application to motion picture coding', IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., 1999, 46, (5), pp. 617-627

10 BLAHUT, R.E.: 'Fast algorithms for digital signal processing' (Addison-Wesley, Reading, MA, 1984)

11 JIANG, Z.R., and ZENG, Y.H.: 'Polynomial transform and its applications' (National University of Defense Technology Press, P.R. China, 1989) (in Chinese)

12 JIANG, Z.R., ZENG, Y.H., and YU, P.N.: 'Fast algorithms' (National University of Defense Technology Press, Changsha, P.R. China, 1994) (in Chinese)

13 BI, G., and YU, L.W.: 'DCT algorithms for composite sequence lengths', IEEE Trans. Signal Process., 1998, 46, (3), pp. 554-562

14 CHAN, S.C., and HO, K.L.: 'Fast algorithm for computing the discrete cosine transform', IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., 1992, 39, (3), pp. 185-190

15 CHAN, Y.H., and SIU, W.C.: 'Mixed-radix discrete cosine transform', IEEE Trans. Signal Process., 1993, 41, (11), pp. 3157-3161

16 KAR, D.C., and RAO, V.V.: 'On the prime factor decomposition algorithm for the discrete sine transform', IEEE Trans. Signal Process., 1994, 42, (11), pp. 3258-3260

17 LEE, B.G.: 'A new algorithm to compute the discrete cosine transform', IEEE Trans. Acoust. Speech Signal Process., 1984, 32, pp. 1243-1245

18 LEE, P., and HUANG, F.Y.: 'An efficient prime-factor algorithm for the discrete cosine transform and its hardware implementations', IEEE Trans. Signal Process., 1994, 42, pp. 2041-2058

19 ZENG, Y.H.: 'Fast algorithm for discrete cosine transform of arbitrary length', Math. Numer. Sin. (in Chinese), 1993, 15, (3), pp. 295-302
