[IEEE 2007 14th IEEE International Conference on Electronics, Circuits and Systems (ICECS '07) -...

Low Power Implementation of Decimation Filters in Multistandard Radio Receiver Using Optimized Multiplication-Accumulation Unit

Nadia Khouja, Khaled Grati, Adel Ghazel

CIRTA’COM Laboratory, 2088 Technologic Parc El Ghazala, Ariana, Tunisia

Abstract—In this work, the implementation of decimation filters for multistandard wireless transceivers is optimized to reduce its power consumption. The power reduction is achieved by using, inside the filters, a MAC unit that reduces the total switching activity and therefore the dynamic power. The multiplication function of the MAC unit is based on the radix-4 modified Booth algorithm for the generation of the partial products. Carry-save addition is used for the partial-product summation. A pipeline stage is then introduced to allow the accumulation of the result of the partial-product summation step with the results of the previous multiplications. The power consumption of the different filters in the multistandard radio receiver had already been optimized using the clock-gating technique at the circuit level.

Experimental results show a power reduction of 9.48% when using the optimized MAC unit, compared with the same filter architecture where only the clock-gating technique is used, and a total reduction of 61.98% compared with the implementation without any optimization.

I. INTRODUCTION

The design of a multistandard receiver requires power reduction techniques to be considered at every step of its implementation. Power consumption has become an important issue in the design of digital systems. Power is increasing not only because more functions are integrated into new digital devices, but also because operating frequencies are rising and because the technology itself keeps shrinking.

In the present work, a multiplication-accumulation (MAC) unit with a low activity rate, and therefore reduced dynamic power consumption, is used to reduce the total power of a multistandard channel-selection decimating filter. The majority of the power consumption in a digital circuit comes from switching activity, as shown in equation (1).

$P = \alpha \cdot f \cdot C_{TOT} \cdot V_{DD}^{2}$ (1)

where $\alpha$ is the switching activity parameter, $C_{TOT}$ is the load capacitance, $V_{DD}$ is the operating voltage and $f$ is the operating frequency [1].
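As a rough illustration of equation (1), the short Python sketch below evaluates the dynamic power for two activity factors. The operating point (activity, capacitance and voltage values) is hypothetical and chosen only to show that, with frequency, capacitance and voltage held constant, the relative saving equals the relative reduction of the switching activity.

```python
def dynamic_power(alpha, f_hz, c_tot_farad, v_dd):
    """Dynamic power of equation (1): P = alpha * f * C_TOT * V_DD^2 (watts)."""
    return alpha * f_hz * c_tot_farad * v_dd ** 2


# Hypothetical operating point: these numbers are illustrative assumptions,
# not measurements from the paper.
baseline = dynamic_power(alpha=0.20, f_hz=80e6, c_tot_farad=1e-9, v_dd=1.2)
reduced = dynamic_power(alpha=0.15, f_hz=80e6, c_tot_farad=1e-9, v_dd=1.2)

# With f, C_TOT and V_DD unchanged, the saving is the relative drop in alpha.
saving = 1 - reduced / baseline
print(f"baseline = {baseline:.4f} W, reduced = {reduced:.4f} W, saving = {saving:.0%}")
```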

This filtering chain was already optimized in terms of power consumption using the clock-gating technique [11]. The principle of this technique is to reduce the power consumed by the clock by disconnecting it from parts of the design during inactive periods.

This paper is organized as follows: Section 2 briefly reviews the sources of power dissipation and solutions for reducing power at different levels. We present the multiplication-accumulation unit in Section 3 and review the decimation filtering chain for a multistandard wireless receiver in Section 4. In Section 5, we present the hardware optimization of the filtering chain using the optimized MAC unit. Results for area and power are presented and compared with previous results in Section 6. Conclusions are presented in Section 7.

II. POWER-REDUCTION TECHNIQUES

In CMOS technologies, the major part of the dissipated power is dynamic power, whose expression is given in equation (1). Dynamic power dissipation is about 50% of the total power consumed in a digital circuit [4]. Reducing the dynamic consumption can therefore lead to significant power savings.

Equation (1) shows that the dynamic power consumption is proportional to the switching activity. Therefore, minimizing switching activity can effectively reduce the power dissipation without impacting the circuit performance. The activity can be reduced with different methods and at different levels. The simplest solution is to reduce the total number of operations. Other solutions are pipelining [9], operation transformations [7], arithmetic operation optimization [8], such as the use of the Booth algorithm to reduce the partial-product operations in multiplication [3], the Wallace multiplier [10], and the clock-gating method [11].

1-4244-1378-8/07/$25.00 ©2007 IEEE

III. MULTIPLICATION-ACCUMULATION UNIT

A multiplication-accumulation unit is responsible for performing the function given by equation (2).

$X \times Y + Z$ (2)

where X and Y are two k-bit input data and Z is a 2k-bit input datum. A $k \times k$-bit multiplier and a $2k$-bit carry-propagate adder (CPA) could be used to perform these operations. Many researchers have attempted to design a MAC unit that is optimized in terms of power and also has high computational performance. In [2], an optimized low-power MAC unit was proposed and showed a power reduction between 21.09% and 43.74%.

In this work, to reduce the switching activity and consequently the power consumption of the multiplication operation, the $k \times k$-bit multiplier is broken into three parts: partial-product generation, partial-product summation and final CPA addition. The partial-product generation step generates the partial products of the multiplication using the radix-4 modified Booth algorithm. For the partial-product summation, a carry-save addition tree based on (m, 2) compressors is used. The result of the partial-product summation is two vectors, Sum and Carry. The accumulation of the Z datum is merged with the addition of the Sum and Carry vectors using a carry-save adder. Finally, a carry-propagate adder performs the final addition. Figure 1 shows the architecture of the multiplication-accumulation unit.

Figure 1. Architecture of the MAC unit used in the filters design

A. Partial Product Generation

Applying the modified Booth algorithm reduces the activity of the multiplication circuit since it halves the number of partial products. This algorithm is widely used in low-power multiplication. It has been shown in [4] that a 31% reduction in power dissipation can be achieved just by applying the radix-4 modified Booth algorithm.
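The halving of the number of partial products can be seen in a small behavioural model. The Python sketch below is an illustration under stated assumptions, not the authors' hardware description: the function names `booth_radix4_digits` and `booth_partial_products` are hypothetical, operands are treated as k-bit two's-complement integers, and k is assumed even. Each overlapping group of three multiplier bits is recoded into a digit in {-2, -1, 0, 1, 2}, so a k-bit multiplier yields k/2 partial products instead of k.

```python
def booth_radix4_digits(y, k):
    """Recode a k-bit two's-complement multiplier y (k even) into k/2
    radix-4 Booth digits, each in {-2, -1, 0, 1, 2}, least significant first."""
    if k % 2 != 0:
        raise ValueError("k must be even for radix-4 recoding")
    y_bits = y & ((1 << k) - 1)      # two's-complement bit pattern of y
    digits = []
    prev = 0                         # implicit bit y[-1] = 0
    for i in range(0, k, 2):
        b0 = (y_bits >> i) & 1
        b1 = (y_bits >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)   # digit = y[i] + y[i-1] - 2*y[i+1]
        prev = b1
    return digits


def booth_partial_products(x, y, k):
    """Return the k/2 weighted partial products whose sum equals x * y."""
    return [d * x * 4 ** j for j, d in enumerate(booth_radix4_digits(y, k))]


# Usage: an 8-bit multiplication needs only 4 partial products.
pps = booth_partial_products(23, -45, 8)
print(len(pps), sum(pps) == 23 * -45)
```

In hardware, the digits 0, ±1 and ±2 only require selecting zero, X or a shifted X, with an optional negation, which is why the recoding is attractive for low-activity multipliers.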

B. Partial Products Summation Step

Several techniques can be applied for the addition of the partial products, such as carry-save array addition and carry-save tree addition. In this work, we used an addition architecture based on a carry-save addition tree. The outputs of this summation step are two vectors, Sum and Carry. These two vectors are then stored in registers so that they can be added to the previous results, which form the Z vector in equation (2).
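The principle of carry-save tree addition can be sketched as follows. This is a simplified bit-level model, assuming plain 3:2 compressors (full adders) rather than the (m, 2) compressors used in the paper; the function names and the fixed `width` parameter are hypothetical. The tree reduces any number of operands to two vectors without ever propagating a carry along the word.

```python
def csa_3to2(a, b, c, width):
    """3:2 compressor applied bitwise: three operands in, Sum and Carry out.
    No carry ripples between bit positions; results are modulo 2**width."""
    mask = (1 << width) - 1
    sum_vec = (a ^ b ^ c) & mask
    carry_vec = (((a & b) | (a & c) | (b & c)) << 1) & mask
    return sum_vec, carry_vec


def carry_save_tree(operands, width):
    """Reduce a list of operands to two vectors (Sum, Carry) such that
    (Sum + Carry) mod 2**width equals the sum of the operands mod 2**width."""
    mask = (1 << width) - 1
    ops = [v & mask for v in operands]
    while len(ops) > 2:
        nxt = []
        # Compress complete groups of three operands into two.
        for i in range(0, len(ops) - 2, 3):
            nxt.extend(csa_3to2(ops[i], ops[i + 1], ops[i + 2], width))
        # Pass the one or two leftover operands to the next level unchanged.
        nxt.extend(ops[len(ops) - len(ops) % 3:])
        ops = nxt
    while len(ops) < 2:
        ops.append(0)
    return ops[0], ops[1]


# Usage: five operands reduced to two vectors in a 16-bit datapath.
s, c = carry_save_tree([3, 5, 7, 11, 13], 16)
print((s + c) & 0xFFFF)
```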

C. Final Additions

The final addition is composed of two steps. At the output of the registers, three vectors have to be added: the Sum and Carry vectors, which represent the result of the partial-product summation, and the vector Z, which represents the previous results. This first addition is performed by a carry-save adder. A carry-propagate adder then computes the final result.

The implementation of the MAC unit is detailed in Figure 2.
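The two-step final addition can be modelled in a few lines. The sketch below assumes a ripple-carry structure for the carry-propagate adder, which the paper does not specify, and uses hypothetical function names; all values are unsigned bit patterns of a fixed `width`, and results are taken modulo 2**width.

```python
def csa(a, b, c, width):
    """Carry-save adder: merges three vectors into two without carry propagation."""
    mask = (1 << width) - 1
    return (a ^ b ^ c) & mask, (((a & b) | (a & c) | (b & c)) << 1) & mask


def cpa(a, b, width):
    """Carry-propagate adder, modelled here as a ripple-carry chain (assumption).
    Returns (a + b) modulo 2**width."""
    result, carry = 0, 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        result |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return result


def final_additions(sum_vec, carry_vec, z, width):
    """Step 1: CSA merges Sum, Carry and the accumulated value Z.
    Step 2: CPA resolves the two remaining vectors into the final result."""
    s, c = csa(sum_vec, carry_vec, z, width)
    return cpa(s, c, width)


# Usage: accumulate 1000 with a product held in carry-save form (100 + 23).
print(final_additions(100, 23, 1000, 16))
```

Merging Z in the carry-save stage means only one carry-propagating addition is needed per accumulation, instead of one for the product and a second one for the accumulation.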

Figure 2. Implementation details of the MAC unit

[Figure 1 block labels: PPG with inputs X and Y (k bits each) producing P0, P1, ..., Pf; PPS (P0 + P1 + P2 + ... + Pf); three registers (Reg); 2k-bit CSA with input Z (2k bits); 2k-bit CPA.]

[Figure 2 block labels: Booth decoders; compressors; registers (clk); (n+m)-bit CSA adder; (n+m)-bit CPA adder; output; inputs X (n bits) and Y (m bits).]

IV. PREVIOUS WORK

The purpose of this work is to provide a low-power solution for the implementation of a decimation filtering chain for multistandard reception. The design and the structure of such a receiver are discussed in [2] and [4]. The power consumption of the decimation chain was studied and optimized in [11]. The power reduction technique used in that previous work was clock gating at the circuit level. The principle of this technique is to reduce the consumption of the clock, which accounts for a large share of the total power. Experimental results show an average power reduction of 58% when this method is applied.

The structure of the decimating filtering chain for multistandard receiver is given in figure 3.

Figure 3. Decimation filters cascade structures

A. Comb filter

The comb filter is specified by the number of stages required to prevent noise aliasing for a given decimation ratio. To meet the radio standard, Nyquist frequency and dynamic range requirements, the chosen oversampling ratios (OSR) were 64, 32 and 16 for the GSM, DECT and UMTS standards respectively. Considering the blocker and interferer profiles, we find that 5 stages are required for both the GSM and DECT standards, whereas 6 stages are required for the UMTS standard.
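A recursive comb decimator with a configurable number of stages can be sketched as below. This is a generic cascaded integrator-comb model given for illustration; the function name `cic_decimate`, the decimation ratio of 4 in the usage lines and the unit differential delay are assumptions, not parameters taken from the paper. Only the stage count of 5 matches the value quoted above for GSM and DECT.

```python
def cic_decimate(samples, stages, ratio):
    """Recursive comb (CIC) decimator: `stages` integrators at the input rate,
    downsampling by `ratio`, then `stages` differentiators at the output rate."""
    integrators = [0] * stages
    comb_delays = [0] * stages
    out = []
    for n, x in enumerate(samples):
        v = x
        for i in range(stages):              # integrator chain, input rate
            integrators[i] += v
            v = integrators[i]
        if (n + 1) % ratio == 0:             # keep one sample out of `ratio`
            for i in range(stages):          # comb chain, output rate
                v, comb_delays[i] = v - comb_delays[i], v
            out.append(v)
    return out


# Usage: 5 stages, hypothetical decimation ratio of 4, constant input of 1.
# Once the transient has passed, the output settles at the DC gain ratio**stages.
y = cic_decimate([1] * 40, stages=5, ratio=4)
print(y[-1], 4 ** 5)
```

The structure needs only adders, subtractors and registers, which is consistent with a comb stage that uses no multipliers.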

B. Halfband filter

The half-band filter is used to reduce the sampling rate before channel selection in order to reduce the computational complexity. This filter can be reused for all the chosen standards. Considering the GSM, DECT and UMTS standards, the required attenuation is 56 dB and the normalized passband and stopband frequencies are $f_p = 0.32$ and $f_s = 0.67$.

C. FIR selector filter

The last stage must eliminate the residual noise, reduce the sampling rate to the Nyquist rate and select the channel. For the DECT selector filter, the required attenuation is 35 dB and the transition band is [576 kHz, 700 kHz]. Finally, for the GSM selector filter, the required attenuation is 41 dB and the transition band is [82 kHz, 100 kHz].

V. HARDWARE OPTIMIZATION USING THE OPTIMIZED MAC UNIT

The multiplication-accumulation unit was used inside the half-band filter and the selector FIR filter. These two filters were implemented in their polyphase form using a structure of generic multipliers. In this work, the generic multipliers as well as the accumulation unit were removed and replaced by the designed multiplication-accumulation unit. The previous implementation of each sub-filter of the polyphase decomposition of the two filters is given in Figure 4. The new implementation is given in Figure 5. To deliver the result of the multiplication in one clock cycle, as was the case with the old implementation, the MAC unit has to run at double the frequency of the filter. Consequently, during the active periods of the design, the power dissipated by the clock driving the registers of the MAC unit is higher because the frequency is doubled. However, the activity of the multiplication operation itself is reduced thanks to the Booth algorithm and the compressor tree. In conclusion, the overall activity is reduced, but the clock consumes more power than in the previous work because the MAC unit runs at double the frequency of the filter.
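The role of the MAC unit inside a polyphase decimate-by-2 filter can be illustrated with a behavioural sketch. The code below is not the authors' implementation: the function names are hypothetical, the coefficients and input samples are arbitrary integers, and the clocking and multiplexing of Figures 4 and 5 are not modelled. It only shows that splitting the coefficients into even and odd sub-filters, each evaluated as a chain of multiply-accumulate operations at the output rate, gives the same result as filtering at the input rate and discarding every second output.

```python
def mac(x, y, z):
    """Multiply-accumulate of equation (2): x * y + z."""
    return x * y + z


def fir_decimate_direct(x, h):
    """Reference: FIR filtering at the input rate, then keep every second output."""
    y = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
         for n in range(len(x))]
    return y[::2]


def fir_decimate_polyphase(x, h):
    """Decimation by 2 with two polyphase sub-filters sharing one accumulator."""
    h_even, h_odd = h[0::2], h[1::2]
    x_even = x[0::2]               # x[2m]
    x_odd = [0] + x[1::2]          # x[2m - 1], taking x[-1] = 0
    out = []
    for m in range(len(x_even)):
        acc = 0
        for k, coeff in enumerate(h_even):
            if m - k >= 0:
                acc = mac(coeff, x_even[m - k], acc)
        for k, coeff in enumerate(h_odd):
            if m - k >= 0:
                acc = mac(coeff, x_odd[m - k], acc)
        out.append(acc)
    return out


# Usage with arbitrary integer data (illustrative values only).
x = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3, 5]
h = [1, 2, 3, 2, 1]
print(fir_decimate_polyphase(x, h) == fir_decimate_direct(x, h))
```

In the polyphase form every multiplication contributes to a retained output sample, so none of the arithmetic is spent on outputs that are later discarded.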

[Figure 4 labels: input x(n); delay line of z^-1 elements; two multiplexers (Mux); coefficients h0 to h4; output register (Reg); output y(n); control signals NRST1, NRST2, CLK1, CLK3.]

Figure 4. Implementation of a filter stage using the clock-gating technique

[Figure 5 labels: input x(n); delay line of z^-1 elements; two multiplexers (Mux); coefficients h0 to h4; MAC unit; output; control signals NRST, CLK, CLK_m.]

Figure 5. Implementation of a filter stage using the MAC unit implementation

[Figure 3 block labels: comb filter (decimation by M), half-band filter (decimation by 2), FIR selector filter (decimation by 2).]

VI. FPGA IMPLEMENTATION RESULT

The results of the implementation of the digital filtering chain on a Xilinx Virtex-4 FPGA are given in Table I. The frequency was fixed at 80 MHz for the power measurement. The power results were obtained using the XPower tool. They show a power reduction of 9.48% compared with the same design where only the clock-gating technique is used [11]. This comes, however, at the price of a larger area on the FPGA. The additional area is mainly due to the fact that 8 DSP blocks were used in the old implementation for the multiplication operations required by the filtering. In the new implementation, no DSP blocks are used, since the multiplication blocks are implemented directly in the FPGA logic.

TABLE I. RESULTS OF IMPLEMENTATION OF THE DECIMATING FILTERS

VII. CONCLUSION

In this paper, we have described an optimized implementation of a digital channel-selection filter for a multistandard transceiver. This work focuses on optimizing the power consumption, which is a very important step in reducing the overall complexity of a digital circuit. Because a filtering operation is mainly composed of multiplication-accumulation operations, we explored the possibility of reducing the power consumption by optimizing the activity of such a unit. However, even though the activity in the multiplication-accumulation unit was reduced, the power dissipated by the clock when the circuit is active increases, because the frequency of the MAC unit is double the filter frequency. The implementation of the filtering chain on a Virtex-4 FPGA shows an overall power reduction of 9.48% compared with the implementation where only clock gating is used. This result demonstrates that the multiplication-accumulation unit contributes effectively to the reduction of the total activity of the system.

REFERENCES

[1] A. P. Chandrakasan and R. W. Brodersen, “Minimizing power consumption in digital CMOS circuits,” Proc. IEEE, vol. 83, pp. 498-523, Apr. 1995.

[2] A. Ghazel, L. Naviner and K. Grati, “On design and implementation of a decimation filter for multi-standards wireless transceivers,” IEEE Transactions on Wireless Communications, vol. 1, no. 4, pp. 558-562, 2002.

[3] H. Lee, “A power-aware scalable pipelined Booth multiplier,” in Proc. IEEE International Systems-on-Chip Conference, pp. 123-126, 2004.

[4] K. Grati, A. Ghazel and L. Naviner, “Design and implementation of digital channel selection decimating filter for multistandard receiver,” ICECS 2005.

[5] V. G. Moshnyaga and K. Tamaru, “A comparative study of switching activity reduction techniques for design of low-power multipliers,” in Proc. IEEE International Symposium on Circuits and Systems, pp. 1560-1563, 1995.

[6] Z. Huang and M. D. Ercegovac, “On signal-gating schemes for low-power adders,” in Conference Record of the Thirty-Fifth Asilomar Conference on Signals, Systems and Computers, vol. 1, pp. 867-871, 2001.

[7] A. P. Chandrakasan, M. Potkonjak, R. Mehra, J. Rabaey and R. Brodersen, “Optimizing power using transformations,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 14, no. 1, Jan. 1995.

[8] T. Callaway and E. Swartzlander, “Optimizing arithmetic elements for signal processing,” VLSI Signal Process. Wkshp., pp. 91-100, 1992.

[9] V. G. Moshnyaga and K. Tamaru, “A comparative study of switching activity reduction techniques for design of low-power multipliers,” in Proc. IEEE International Symposium on Circuits and Systems, pp. 1560-1563, 1995.

[10] Lakshmanan, M. Othman and M. Ali, “High performance parallel multiplier using Wallace-Booth algorithm,” IEEE International Conference on Semiconductor Electronics, pp. 433-436, Dec. 2002.

[11] N. Khouja, K. Grati and A. Ghazel, “Low power FPGA-based implementation of decimating filters for multistandard receiver,” IEEE Design and Test of Integrated Systems in Nanoscale Technology, pp. 10-14, Sep. 2006.

Data for Table I:

Reduction technique                   Area (slices)   DSP48   Power (mW)
Optimization based on clock gating    2,126           8       385
Optimization without clock gating     2,208           8       915
Clock gating + optimized MAC unit     3,170           0       348.42

Filter architecture for the first two rows: comb filter, recursive architecture; half-band + FIR selector filters, architecture based on generic multipliers.
