

Optimized Design Platform for High Speed Digital Filter using Folding Technique

Shreyas Patel (1), Prof. J. S. Rani Alex (2)

(1) Department of SENSE, VIT University, Chennai, India
(2) Department of SENSE, VIT University, Chennai, India

ABSTRACT

An implementation of a DSP system must satisfy the sampling rate constraint while requiring little area and power. A systematic way to optimize the design platform using different algorithms is therefore needed. In this paper an optimized platform is designed using lifetime analysis, one of the techniques of the folding algorithm, to minimize the number of registers such that synthesizable RTL is obtained. Folding techniques can be used to synthesize DSP architectures that operate with single or multiple clocks and use fewer registers and functional units, resulting in an integrated circuit that occupies a small silicon area. A technique is presented for computing the minimum number of registers, allocating the data to these registers, and obtaining synthesizable RTL code for the folded architecture.

Keywords: Folding Architecture, RTL (Register Transfer Level), Register Minimization, Lifetime Analysis.

I. INTRODUCTION

In today's VLSI world, designers must deliver circuits with high performance and small area within a short design time. CAD tools play a very important role in meeting this requirement. The ASIC design process starts from a given specification, from which high-level functional blocks are obtained. These blocks are later used to obtain the circuit-level design. In the present work a 3-tap IIR filter model is designed in MATLAB Simulink using the Xilinx block set. System Generator automatically generates synthesizable RTL code and a design report covering speed, area, power and registers. The folding technique provides a means for trading area for time in a DSP architecture. A DSP architecture consists of adders and multipliers. In CMOS technology multipliers consume more power, so the structure is implemented with one adder and one multiplier using the folding technique with a minimum number of registers.

The work carried out in the previous paper aims at reducing the clock period using the retiming method [1]. That paper reports that the clock period is minimized but the number of registers increases. In this paper the lifetime analysis technique is applied to the folded retimed filter to reduce the number of registers. First, a 3-tap IIR folded retimed filter is designed in MATLAB Simulink using the Xilinx block set and synthesizable RTL code is obtained automatically, which saves design time; the number of registers used is read from the synthesis report. Next, the iteration bound is found using the longest path matrix (LPM) and minimum cycle mean (MCM) algorithms in MATLAB. Then the folded retimed architecture of the 3-tap IIR filter is obtained manually and the iteration bound is checked again with the LPM and MCM algorithms in MATLAB; the iteration bound and loop bound must remain the same. The folded structure requires more registers, so the lifetime analysis technique, which is part of the folding technique, is applied manually to minimize the number of registers. Finally, a folded structure is designed according to the lifetime analysis, HDL code is written for it, and the synthesis report of the folded structure obtained in Xilinx is compared with the previous synthesis result.

International Journal of Research in Electronics & Communication Technology

Volume-2, Issue-1, January-February, 2014, pp. 19-30, © IASTER 2013

www.iaster.com, ISSN Online: 2347-6109, Print: 2348-0017


II. FOLDING TECHNIQUE

Folding can be used to reduce the number of hardware functional units by a factor of N at the expense of increasing the computation time by a factor of N. While the folding transformation reduces the number of functional units in the architecture, it may also lead to an architecture that uses a large number of registers. To avoid an architecture with an excessive number of registers, the lifetime analysis technique can be used to compute the minimum number of registers required to implement a folded DSP architecture. Using register minimization along with the folding transformation not only reduces the number of functional units but also keeps the area as small as possible [8]. Fig-1 shows an example of a DSP program whose 2 addition operations can be time-multiplexed on a single pipelined hardware adder [9].

Fig-1 DSP program with 2 addition operations [9]

y(n) = a(n) + b(n) + c(n)    (1) [8]

In Fig-2, the 2 addition operations are time-multiplexed on a single pipelined adder.

Fig-2 A folded architecture in which 2 addition operations are folded to a single hardware adder with 1 stage of pipelining [9]

Table-1 Operation of the first six cycles of the folded hardware [8][9]

As shown in Table-1, in cycle 0 the samples a(0) and b(0) are switched into the adder [8]. In cycle 1 the sum a(0)+b(0) is switched into the adder along with c(0). In cycle 2 the sum a(0)+b(0)+c(0) is output while the intermediate result a(1)+b(1) is computed by the adder [8]. This process continues as shown in Table-1 [8].

The systematic folding technique is explained by folding the 2-tap retimed IIR filter shown in Fig-4. Assume that addition and multiplication require 1 and 2 time units, respectively, and that the filter is folded with folding factor N=4 [8]. A folding factor of N=4 means that the iteration period of the folded hardware is 4 time units, i.e. each node of the filter is executed exactly once every 4 time units in the folded architecture [8].
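The cycle-by-cycle behaviour described for Table-1 can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the authors' hardware: the function name `folded_adder` and the variable `pipe` (standing for the single pipeline register at the adder output) are hypothetical, and the folding factor is 2 because two additions share one adder.

```python
def folded_adder(a, b, c):
    """Model y(n) = a(n) + b(n) + c(n) on ONE adder with one pipeline stage.

    Even cycles load (a(n), b(n)); odd cycles load (previous sum, c(n)).
    """
    pipe = None              # hypothetical pipeline register at the adder output
    outputs = []
    n_samples = len(a)
    for cycle in range(2 * n_samples + 1):
        result = pipe        # value that leaves the pipeline in this cycle
        n, phase = divmod(cycle, 2)
        if phase == 0:
            # On even cycles the pipeline holds the finished sum of sample n-1.
            if result is not None:
                outputs.append(result)
            pipe = a[n] + b[n] if n < n_samples else None
        else:
            # On odd cycles the partial sum is fed back together with c(n).
            pipe = result + c[n]
    return outputs


print(folded_adder([1, 2], [10, 20], [100, 200]))  # [111, 222]
```

The first complete sum appears in cycle 2, which matches the description of Table-1: one adder does the work of two at the cost of two cycles per sample.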


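For a folded system to be realizable, D_F(U→V) >= 0 must hold for every edge U→V in the DFG (data flow graph), which implies

$$ N\,w_r(e) - P_U + v - u \ge 0 \qquad (2) $$

where N is the folding factor, P_U is the processing time (number of pipeline stages) of the functional unit that executes node U, w_r(e) is the number of delays on the edge e, and u and v are the folding orders of nodes U and V. Consider the edge from node 1, at instance (S1/3), to node 2, at instance (S1/1), with N = 4 and P_U = 1:

4(0) - 1 + 1 - 3 = -3 (w_r(e) = 0, before the delay is added)

4(1) - 1 + 1 - 3 = 1 (w_r(e) = 1, after the delay is added)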
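The arithmetic of equation (2) is easy to check mechanically. The following is a minimal sketch; the function name `folded_delay` and its parameter names are chosen here for illustration and do not come from the paper.

```python
def folded_delay(n, w, p_u, u, v):
    """Folded delay of an edge U -> V: N*w(e) - P_U + v - u (equation 2)."""
    return n * w - p_u + v - u


# Edge from node 1 (folding order 3) to node 2 (folding order 1), N = 4, P_U = 1.
print(folded_delay(4, 0, 1, 3, 1))  # -3: not realizable without a delay
print(folded_delay(4, 1, 1, 3, 1))  # 1: realizable with one delay on the edge
```

A negative result means the value would be needed before it is produced, which is why every edge must satisfy the inequality before the folded hardware can be built.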

Fig-2(a) Retimed Biquad Filter with Valid Folding Structure

Fig-2(b) The Folded Biquad Filter using 1 Adder and 1 Multiplier [8]

As shown in Fig-2(b), the number of adders and multipliers is reduced. Consider node 1 in Fig-2(a): its input enters the adder at time instances 4l+3 and its result is used at time instances 4l+1 as the output of the filter. Compare this operation with Fig-2(b). According to equation (2) the folded delay is 1 unit, so in Fig-2(b) the sample at IN enters the adder at switch instance {3} and, after 1 delay, is fed to the adder again at instance {1}. This structure gives the same functionality as Fig-2(a), but its drawback is that it requires a larger number of delays (registers).

III. LIFETIME ANALYSIS

Lifetime analysis is one of the folding techniques, used to compute the minimum number of registers required to implement a DSP algorithm in hardware [8]. A data sample is live from the time it is produced through the time it is consumed. After the variable is consumed it is dead [10]. A variable occupies one register during each time unit in which it is live [10]. In lifetime analysis the number of live variables at each time unit is determined [10]. The maximum number of live variables at any time unit is the minimum number of registers required to implement the DSP program [8].

The folded architecture without lifetime analysis, shown in Fig-2(b), requires 6 registers, 1 adder and 1 multiplier. Since retiming for folding has already been performed, the next step is to construct the lifetime table shown in Table-2. The lifetime table has one entry for each node in the DFG, which specifies the lifetime (Tinput → Toutput) of the data produced by that node.


$$ T_{input} = u + P_U \qquad (3) $$

$$ T_{output} = u + P_U + \max_{V}\{D_F(U \to V)\} \qquad (4) $$

The time Tinput for node U is u+P_U, where u is the folding order of U and P_U is the number of pipeline stages in the functional unit that executes U [9]. This value of Tinput is the time unit in which the node produces its data in hardware for the 0-th iteration of the DSP program [8]. For example, Tinput for node 1 in Fig-3 is 3+1=4. The time Toutput for node U is u+P_U+max_V{D_F(U→V)}, where max_V{D_F(U→V)} represents the longest folded path delay among all edges that begin at node U [9]. Using equations (3) and (4), the lifetimes shown in Table-2 are obtained.
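Equations (3) and (4) can be evaluated with a short helper. This is a sketch only: the function name `lifetime` is hypothetical, and the list of folded delays for node 1 is an assumption. The value 1 is the folded delay worked out for the edge from node 1 to node 2 in Section II, while the value 5 is inferred from the Table-2 entry for node 1 (4 to 9) rather than stated in the text.

```python
def lifetime(u, p_u, folded_delays):
    """Return (Tinput, Toutput) for a node with folding order u.

    Tinput  = u + P_U                        (equation 3)
    Toutput = u + P_U + max_V D_F(U -> V)    (equation 4)
    """
    t_input = u + p_u
    t_output = t_input + max(folded_delays)
    return t_input, t_output


# Node 1 of the biquad: folding order 3, one pipeline stage.
# Folded delays [1, 5] are assumed for illustration (see the note above).
print(lifetime(3, 1, [1, 5]))  # (4, 9)
```

Only the longest folded delay matters for Toutput, because the data must stay in a register until its last consumer has read it.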

Table-2 Lifetime for the Retimed Biquad Filter

NODE   Tinput → Toutput
1      4 → 9
2      -----
3      3 → 3
4      1 → 1
5      2 → 2
6      4 → 4
7      5 → 6
8      3 → 4

Fig-3 Lifetime Chart[8]

Table-3 The Allocation Table for the Folded Biquad Filter [8]

The linear lifetime chart shown in Fig-3 can be drawn from Table-2, and the resulting allocation of data variables to registers is shown in Table-3. With lifetime analysis fewer registers are needed than in the directly folded architecture: the same folded architecture is obtained with 2 registers, as shown in Fig-4 [8].
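The register count can be reproduced from the lifetimes by counting live variables per time slot of the periodic schedule. The sketch below reads the Table-2 entries as (Tinput, Toutput) pairs and assumes a common convention that is not spelled out in the text: a value produced at Tinput occupies a register during cycles Tinput+1 through Toutput, so a value consumed in the cycle it is produced needs no register. The function name `min_registers` is hypothetical.

```python
from collections import Counter


def min_registers(lifetimes, n):
    """Peak number of live variables over one period of n time units."""
    live = Counter()
    for t_in, t_out in lifetimes:
        # Assumed convention: live during cycles t_in+1 .. t_out (inclusive).
        for cycle in range(t_in + 1, t_out + 1):
            live[cycle % n] += 1   # the schedule repeats every n cycles
    return max(live.values(), default=0)


# (Tinput, Toutput) pairs read from Table-2; node 2 has no entry.
biquad_lifetimes = [(4, 9), (3, 3), (1, 1), (2, 2), (4, 4), (5, 6), (3, 4)]
print(min_registers(biquad_lifetimes, 4))  # 2
```

Under this convention the count agrees with the 2 registers reported for Fig-4. The wrap-around with `cycle % n` matters: the variable of node 1 lives longer than one period, so it overlaps with its own next iteration.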


Fig-4 A Folded Biquad Filter Architecture Implementing the DFG Using Minimum Number of Registers [8][9]

As shown in Fig-4, the same biquad filter is implemented using 1 adder, 1 multiplier and two registers, with the data allocated to the registers by means of switches.

IV. DESIGN AND ANALYSIS

In this paper the main goal is to reduce the number of registers used in the retimed folded 3-tap IIR filter using HDL. As a reference for the hand-written HDL code, the retimed folded 3-tap IIR filter is first designed in MATLAB Simulink using Xilinx System Generator, as shown in Fig-5(a); its output for 5 discrete samples is shown in Fig-5(b). System Generator is a system-level modeling tool that facilitates FPGA hardware design. It extends Simulink in many ways to provide a modeling environment that is well suited to hardware design.

Fig-5(a) Implementation of Retimed Folded 3-TAP IIR Filter in Matlab Simulink using System Generator

Fig-5(b) Output of 3-TAP IIR Filter with 5 Sample


System Generator automatically compiles the design into a low-level representation. The design is compiled and simulated using System Generator. The code is generated automatically and synthesized with the Xilinx tools to find the number of registers used in the retimed folded 3-tap IIR filter. The synthesis report is shown in Fig-5(c).

Fig-5(c) Automatic Synthesis Report generated by System Generator

From the synthesis report, the number of slice registers generated by System Generator is 48, so our aim is to reduce the number of registers by writing HDL for the folded structure. For the folded structure the calculations are done analytically and using MATLAB. The 3-tap IIR filter is described using a dataflow graph. A dataflow graph (DFG) gives detailed information without a hardware implementation and can represent any algorithm. A DFG is a directed graph G(V,E) with a set of nodes V and a set of edges E. The set of nodes V is subdivided into computational nodes, input nodes and output nodes [1].

Fig-6 (a) 3-TAP IIR filter (b) Dataflow graph of 3-TAP IIR filter

In the dataflow graph representation, the nodes represent computations and the directed edges represent data paths; each edge has a non-negative number of delays associated with it. The dataflow graph as implemented in MATLAB is shown in Fig-7. The filter is folded with folding factor N=6, which means that the iteration period of the folded hardware is 6 u.t. and each node of the 3-tap IIR filter is executed exactly once every 6 u.t. The iteration bound can be found using the LPM (longest path matrix) and MCM (minimum cycle mean) algorithms. Both algorithms are implemented in MATLAB to check the iteration bound before and after folding. A property of the folding transformation is that the loop bound and the iteration bound should not change after delays are added to a path.
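The paper runs the LPM algorithm in MATLAB; the idea can be sketched in Python as follows. The sketch takes the first longest path matrix L(1) as input, where entry (i, j) is the longest computation time of a zero-delay path from delay i to delay j and -1 means no such path exists. Higher-order matrices are built by a max-plus product, and the iteration bound is the largest diagonal entry of L(m) divided by m. The 4-delay matrix used here is a made-up illustration, not the 3-tap IIR filter of this paper, and the function names are hypothetical.

```python
from fractions import Fraction

NO_PATH = -1  # marks "no path" in a longest path matrix


def lpm_product(first, other):
    """Max-plus product: longest path obtained by chaining two path segments."""
    size = len(first)
    result = [[NO_PATH] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            for k in range(size):
                if first[i][k] != NO_PATH and other[k][j] != NO_PATH:
                    result[i][j] = max(result[i][j], first[i][k] + other[k][j])
    return result


def iteration_bound(l1):
    """Iteration bound = max over m and i of L(m)[i][i] / m, for m = 1..d."""
    bound = Fraction(0)
    current = l1
    for m in range(1, len(l1) + 1):
        for i in range(len(l1)):
            if current[i][i] != NO_PATH:
                bound = max(bound, Fraction(current[i][i], m))
        current = lpm_product(l1, current)
    return bound


# Hypothetical L(1) for a DFG with four delays (illustration only).
example_l1 = [
    [-1, 0, -1, -1],
    [4, -1, 0, -1],
    [5, -1, -1, 0],
    [5, -1, -1, -1],
]
print(iteration_bound(example_l1))  # 2
```

Running the same check on the DFG before and after the delays are added is how the unchanged iteration bound would be confirmed; `Fraction` keeps the ratios exact so that two bounds can be compared without rounding error.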


Fig-7 Dataflow Graph of 3-Tap IIR Filter Using Matlab

In the present paper the folded delay D_F(U→V) of each edge is calculated according to equation (2). Some edges get a negative value, as shown in Table-4. An edge with negative D_F(U→V) can be made non-negative by increasing the number of delays on the edge: adding w delays increases D_F(U→V) by N·w. Adding delays must not affect the properties of the filter.

Table-4 Folding Equation for Folding Constraint for DFG

Edge U→V   D_F(U→V)
1→12       -2
1→4        1
1→5        3
1→6        9
1→7        5
1→8        4
1→9        8
2→1        1
3→2        1
4→2        0
5→3        -1
6→3        -4
7→11       -5
8→10       -6
9→10       -4
10→11      1
11→12      2

In Table-4 some edges have a negative value. To make these values non-negative, a delay (register) is added to each such edge. The resulting retimed 3-tap IIR filter with a valid folding structure is shown in Fig-8. Adding delays increases the latency, but the functionality is preserved and the following properties must hold:

1. The loop bound remains the same.
2. The iteration bound remains the same.

The iteration bound and loop bound of the folded architecture can be checked using the LPM and MCM algorithms, as shown in Fig-8(a) and Fig-8(b).
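The number of delays needed on each negative edge follows directly from equation (2), since every added delay raises the folded delay by N. The sketch below shows only this arithmetic, with N = 6 and the negative entries of Table-4 read as edges U→V (that reading of the table is an assumption, as is the helper name `delays_to_add`). It does not decide where the delays may legally come from; that is the job of retiming.

```python
def delays_to_add(folded_delay, n):
    """Smallest k >= 0 such that folded_delay + n*k >= 0."""
    if folded_delay >= 0:
        return 0
    return (-folded_delay + n - 1) // n   # integer ceiling of -folded_delay / n


N = 6
# Negative folded delays from Table-4, keyed by (U, V).
negative_edges = {(1, 12): -2, (5, 3): -1, (6, 3): -4,
                  (7, 11): -5, (8, 10): -6, (9, 10): -4}

for (u, v), d in negative_edges.items():
    k = delays_to_add(d, N)
    print(f"{u}->{v}: add {k} delay(s), folded delay becomes {d + N * k}")
```

With N = 6, one extra delay is enough for every negative entry in the table, which is consistent with the statement that a single delay was added to each negative edge.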


Fig-8 Retimed 3-TAP IIR Filter with Valid Folding Architecture

Fig-8(a) Verified Iteration Bound using LPM after Adding Delay

Fig-8(b) Verified Iteration Bound using MCM after Adding Delay


In the folded structure, {(s1/0),(s1/1),(s1/2),(s1/3),(s1/4),(s1/5),(s2/0),(s2/1),(s2/2),(s2/3),(s2/4),(s2/5)} are the assumed instances, each denoting a functional unit and the time partition in which it is used. For the folded structure the delay of each edge is calculated again. As described above for lifetime analysis, a linear lifetime chart is used to represent the lifetimes of the variables graphically in a linear fashion. The lifetimes are calculated according to equation (3) and equation (4) and are shown in Table-5.

Table-5 Lifetime Chart

NODE   Tinput   Toutput
1      6        15
2      4        5
3      2        3
4      3        3
5      2        7
6      5        7
7      7        8
8      6        6
9      4        6
10     1        2
11     3        5

Fig-9 Life Time Chart

The vertical lines in Fig-9 represent the clock cycles and the horizontal lines represent the activation of a node at a particular clock cycle. For example, the sample leaving node 1 (Fig-8) is activated at the 6th clock cycle and must reach node 6 after 9 delays. Table-6 gives the allocation of data to registers, which is used when writing the HDL code.

Table-6 Data Allocation in Register for Every Clock Cycle


Fig-10 Folded Architecture of 3-TAP IIR Filter Using Lifetime Chart

Fig-10 shows the folded structure of the 3-tap IIR filter. To represent this structure as a digital design for writing HDL, the switches are replaced by multiplexers, and RAM is needed to store the filter coefficients and the multiplier outputs for later use. Fig-11 shows the implementation of Fig-10 (folded architecture of 3-TAP IIR filter) as a digital design.

Fig-11 3-TAP IIR Filter Folded Digital Design

The folded 3-tap IIR filter, using 4 registers, 1 adder and 1 multiplier, is described in HDL and synthesized in Xilinx. The synthesis and design summary report is then compared with the report generated by System Generator.

V. SIMULATION RESULT

Fig-12 3-tap IIR Folded Filter using Xilinx Simulation Tool


Fig-13 RTL Schematic View of Folded 3-TAP IIR Filter

Fig-14 Synthesis Report of Folded Digital Design in Xilinx

The synthesis report in Fig-14 shows that the number of registers is reduced, with 5 look-up tables used. The previous work [1] optimized the clock period using the retiming technique. Its disadvantage is that, although retiming reduces the clock period, the reported number of registers increases, as shown in Fig-15. Our design reduces the number of registers, as can be seen from the synthesis report.

Fig-15 Previous Work Simulation Result [1]


VI. CONCLUSION

In this work an optimized design platform is developed for digital filters. Optimization is performed in two ways: first by folding and second by lifetime analysis. Folding reduces the number of functional units and the critical path but increases the number of registers. Lifetime analysis is therefore applied to the folded structure, so that the critical path, the number of functional units and the number of registers are all reduced and synthesizable HDL is generated. The overall process thus reduces the area occupied by registers.

VII. REFERENCES

[1] Deepa Yagain, Dr. Vijaya Krishna A, "Design Optimization Platform for Synthesizable High Speed Digital Filters Using Retiming Technique", IEEE-ICSE2012 Proc., 2012, Kuala Lumpur, Malaysia.

[2] Daniel D. Gajski, Lognath Ramachandran, "IEEE Design & Test," volume 11, issue 4 (Oct 1994), IEEE Computer Society Press, Los Alamitos, CA, USA, ISSN: 0740-7475, pp. 44-54.

[3] Zahra Jeddi and Esmail Amini, "Power optimization of Sequential Circuits by Retiming and Rewiring", IEEE, 2006.

[4] Ozgur Sinanoglu and Vishwani D. Agrawal, "Retiming Scan Circuit to Eliminate Timing Penalty", IEEE, 2010.

[5] A. Chandrakasan, S. Sheng, and R. Brodersen, "Low-power CMOS digital design", IEEE J. Solid-State Circuits, vol. 27, pp. 473-484, Apr. 1992.

[6] Zahra Jeddi and Esmail Amini, "Power optimization of Sequential Circuits by Retiming and Rewiring", IEEE, 2006.

[7] K. K. Parhi, "Synthesis of Control Circuits in Folded Pipelined DSP Architectures", IEEE J. of Solid-State Circuits, vol. SC-27, no. 1, pp. 29-43, 1992.

[8] Keshab K. Parhi, "VLSI Digital Signal Processing Systems: Design and Implementation", ISBN: 978-81-265-1098-6, 2012.

[9] Pierre Coulon, "Postgraduate Course on Signal Processing in Communications", Fall 99.

[10] S. Srinivasan, "A novel architecture for lifting-based discrete wavelet transform for JPEG2000 standard suitable for VLSI implementation", 16th International Conference on VLSI Design, 2003, Proceedings, ICVD-03, 2003.

