Final.ppt LUT Mul


Design and Implementation of LUT Multiplier for DSP Based Applications

By

Dinesh Alapati (08J41A0483)

S. M. Himanshu (08J41A0488)

P. Vishnu (08J41A04D1)

Outline

o Objective
o Introduction
o Digital Signal Processing (DSP)
o Field Programmable Gate Array (FPGA)
o Distributed Arithmetic (DA) Architecture
o New approach to LUT design of FIR digital filter
o Simulation Results
o Advantages
o Applications
o Conclusion
o Future scope
o References

Objective

Digital filters are a very important part of DSP.

Advances in implementing digital signal processing functions on FPGAs have driven considerable effort toward designing efficient architectures for DSP functions.

A conventional digital FIR filter design, based on the direct implementation of a K-tap FIR filter, requires K multiply-and-accumulate (MAC) blocks.

We first present DA, which is a multiplier-less architecture. Its drawback is that the size of the memory increases exponentially with the order of the filter.

The proposed architecture uses a new approach to the LUT-based multiplier in which the memory is reduced to half.

Motivation

Introduction

Digital Signal Processing (DSP) is one of the most active areas in VLSI applications.

Traditionally, DSP algorithms are implemented either using general purpose

DSP processors (Low speed, less expensive, flexible) or using ASICs (High

speed, expensive, less flexible)

FPGAs provide solutions that combine the advantages of both the DSP-processor approach and the ASIC approach.

DSP applications rely on multiply-and-accumulate (MAC) blocks, which require an efficient architecture.

Building multipliers from the logic fabric of the FPGA is costly. An alternative method for computing multiplication is to decompose the MAC operations into a series of LUT accesses.

Distributed Arithmetic (DA) FIR filter using FPGA architecture.

We propose a new approach to LUT design and a memory-based realization of the FIR digital filter.

Introduction (cont'd)

Comparison of Different Programmable Logic Devices

FIR Filters

FPGA – Generic Structure

A Field-Programmable Gate Array (FPGA) is a semiconductor device that can be configured by the designer after manufacturing, hence the name "field-programmable".

FPGA Building Blocks :

o Programmable logic blocks

Implement combinatorial and sequential logic

o Programmable interconnect

Wires to connect inputs and outputs to logic blocks

o Programmable I/O blocks

Special logic blocks at the periphery of device for

external connections

Fig : FPGA generic structure (labels: Logic Block, Interconnection Switch)

FPGA – Basic Logic Element

LUT to implement combinatorial logic

Register for sequential circuits

Additional logic (not shown):

o Carry logic for arithmetic functions

o Expansion logic for functions requiring more than 4 inputs

Fig : Basic logic element (labels: LUT, Select)

Look Up Table (LUT)

A look-up table stores binary data within a solid-state memory device. The storage "cells" within the device are addressed by driving its "address" lines with the proper binary value(s).

A look-up table with N inputs can be used to implement any combinatorial function of N inputs.

The basic features of a LUT are:

o Complete times table of all possible input combinations
o One address bit for each bit in each input
o Table size grows exponentially with the number of address bits
o Very limited use
o Fast: the result is just a memory access away
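The idea that an N-input LUT can realize any function of N inputs can be sketched in a few lines. This is only an illustration in Python (the slides contain no code); the helper names `build_lut` and `read_lut` and the example function are hypothetical.

```python
from itertools import product

def build_lut(func, n_inputs):
    """Precompute func for every combination of n_inputs bits.

    The table has 2**n_inputs one-bit entries, one per address.
    """
    table = []
    for bits in product((0, 1), repeat=n_inputs):
        table.append(func(*bits) & 1)
    return table

def read_lut(table, *bits):
    """Form the address from the input bits (first bit is the MSB) and read it."""
    address = 0
    for b in bits:
        address = (address << 1) | (b & 1)
    return table[address]

# Example (assumption): a 4-input function, (a AND b) XOR (c OR d).
f = lambda a, b, c, d: (a & b) ^ (c | d)
lut4 = build_lut(f, 4)
```

Once the table is built, evaluating the function is a single indexed read, which mirrors the "one memory access" property listed above. The table length of 16 for 4 inputs also shows the exponential growth.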

LUT Based Multiplier

In many DSP circuits, multipliers always have one constant input.

Fig : Multiplier with input x[n], constant coefficient Ci, and output y[n]

For the above multiplier, y[n] purely depends on x[n]. Thus, a look-up table (LUT) can be used to implement the multiplier.

For example, a 256 × 16-bit memory can be used to implement an 8-bit multiplier if one of its inputs is always constant.
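The 256 × 16-bit example can be modeled as follows. This is a minimal sketch assuming unsigned 8-bit operands; the constant `A` and the names `PRODUCT_LUT` and `lut_multiply` are made up for illustration.

```python
A = 0x5B  # hypothetical 8-bit constant coefficient (assumption)

# 256 words, one per possible 8-bit input; each product fits in 16 bits.
PRODUCT_LUT = [(A * x) & 0xFFFF for x in range(256)]

def lut_multiply(x):
    """Return A * x for an unsigned 8-bit x using only a table read."""
    return PRODUCT_LUT[x & 0xFF]
```

The input word acts directly as the memory address, so no multiplier hardware is involved at run time.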

Fig : LUT-based multiplier with input x[n] and output y[n]

Distributed Arithmetic Architecture

LUT Technique for Distributed Arithmetic :

Distributed Arithmetic (DA) is a well-known method of implementing FIR filters. DA computes the inner product when the coefficients are known in advance, as happens in FIR filters. A K-tap FIR filter is described as:

y[n] = \sum_{k=0}^{K-1} h_k \, x[n-k]

In this equation, the h_k are the fixed coefficients, K is the number of filter taps, and the x[n-k] are the input data words, which are in a standard fixed-point number format.

Digital filters using this arithmetic are implemented with registers, memory resources, and a scaling accumulator.
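A short derivation shows why the inner product can be read from a LUT. It assumes unsigned B-bit input words (the slides say only "fixed-point"); B and the bit symbol x_{k,b} are notation introduced here for illustration.

```latex
\begin{aligned}
y[n] &= \sum_{k=0}^{K-1} h_k \, x[n-k], \qquad
x[n-k] = \sum_{b=0}^{B-1} x_{k,b} \, 2^{b}, \quad x_{k,b} \in \{0,1\} \\
y[n] &= \sum_{b=0}^{B-1} 2^{b} \left( \sum_{k=0}^{K-1} h_k \, x_{k,b} \right)
\end{aligned}
```

The bracketed term takes only 2^K distinct values, one per combination of the bits x_{0,b}, ..., x_{K-1,b}, so it can be precomputed and stored in memory. The outer sum over b is the job of the scaling accumulator.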

Distributed Arithmetic (cont’d)

Original LUT-based DA implementation of a 4-tap (K=4) FIR filter is shown in Figure. The DA architecture includes three units: the shift register unit, the DA-LUT unit, and the adder/shifter unit.
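The three units can be modeled behaviourally as below. This is a sketch under stated assumptions, not the circuit in the figure: inputs are unsigned 8-bit words, the coefficients in `H` are made-up integers, and all names are hypothetical.

```python
K = 4                    # number of taps, as in the 4-tap example
B = 8                    # input word length in bits (assumption)
H = [3, -5, 7, 2]        # hypothetical integer coefficients h_0..h_3

# DA-LUT unit: entry `addr` holds the sum of h_k for every bit k set in addr.
DA_LUT = [sum(H[k] for k in range(K) if (addr >> k) & 1)
          for addr in range(2 ** K)]

def da_fir_output(samples):
    """Compute y[n] from the K most recent unsigned samples.

    samples[k] is x[n-k]. Bit-serial: one LUT read per bit position.
    """
    acc = 0
    for b in range(B):
        # Shift register unit: bit b of each sample forms the LUT address.
        addr = 0
        for k in range(K):
            addr |= ((samples[k] >> b) & 1) << k
        # Adder/shifter unit: weight the LUT word by 2**b and accumulate.
        acc += DA_LUT[addr] << b
    return acc
```

The LUT has 2^K = 16 entries here, which illustrates why the memory grows exponentially with the number of taps.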

New Approach to LUT Design of FIR Digital Filter

FIR digital filter is widely used in various signal processing applications.

The order of the filter depends on the width of the transition band: a narrower transition band requires a higher order.

Hence, the number of MAC operations required increases with the filter order.

In the DA architecture, the memory elements store all the possible precomputed values involving the filter coefficients, which can make for an area-efficient implementation of the FIR filter.

A basic variant of the memory-based technique computes multiplication by LUT.

In this work, the LUT for the LUT-based multiplier is designed so that the memory size is reduced to nearly half of that of the conventional approach.

LUT Design for Memory-Based Multiplication

The conventional memory-based multiplier is depicted in the following figure:

Fig : Conventional Memory-Based Multiplier

o Let ‘A’ be a fixed coefficient and ‘X’ be an input word to be multiplied with ‘A’.

o If ‘X’ is an unsigned binary number of word length L, there can be 2^L possible values of X and accordingly 2^L possible values of the product C = A.X.

o Therefore, the conventional implementation of memory-based multiplication requires a memory unit of 2^L words as the look-up table, holding the pre-computed product values for all possible values of X.

LUT Design for Memory-Based Multiplication (cont’d)

In this work, the basic principle of memory-based multiplication is as follows:

o In the proposed memory-based multiplication, the memory used is reduced to exactly half of that used in conventional memory-based multiplication.

o Although 2^L possible values of X correspond to 2^L possible values of C = A.X, we have shown that only 2^L/2 words, corresponding to the odd multiples of A, need to be stored in the LUT.

o The remaining (2^L/2) - 1 nonzero values are even multiples of A, which can be derived by left-shift operations on one of the odd multiples of A.

o We illustrate this in the following table for L = 4.

Table: LUT words and product values for input word length L=4


o At 8 memory locations, the 8 odd multiples A × (2i + 1) are stored as P_i for i = 0, 1, 2, ..., 7.

o The even multiples 2A, 4A, and 8A are derived by left-shifting operations of A.

o Similarly, 6A and 12A are derived by left shifting 3A, while 10A and 14A are derived by left shifting 5A and 7A, respectively.

o The address X = (0000) corresponds to A.X = 0, which can be obtained by resetting the LUT output.

o Therefore, for an input multiplicand of word size L, only 2^L/2 odd multiple values need to be stored in the memory core of the LUT, while the other (2^L/2) - 1 nonzero values can be derived by left-shift operations on the stored values.
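The odd-multiple scheme described above can be checked with a small behavioural model. This is a sketch, not the authors' hardware; the coefficient `A` and the names `ODD_LUT` and `half_lut_multiply` are hypothetical.

```python
L = 4          # input word length
A = 13         # hypothetical fixed coefficient (assumption)

# Memory core: 2**L / 2 words, P_i = A * (2i + 1).
ODD_LUT = [A * (2 * i + 1) for i in range(2 ** L // 2)]

def half_lut_multiply(x):
    """Return A * x for 0 <= x < 2**L using only the odd-multiple table."""
    if x == 0:
        return 0                   # corresponds to resetting the LUT output
    shifts = 0
    while (x & 1) == 0:            # strip trailing zeros: x = odd * 2**shifts
        x >>= 1
        shifts += 1
    i = (x - 1) // 2               # address of the stored odd multiple
    return ODD_LUT[i] << shifts    # left shifts restore the even multiple
```

For example, X = (1100) = 12 reduces to the odd value 3 with two shifts, so the stored word P_1 = 3A is read and shifted left twice to give 12A.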

Proposed LUT-Based Multiplier for 4-Bit Input

The proposed LUT-based multiplier for input word size L = 4 is shown in the following figure:

Fig : Proposed LUT design for multiplier

LUT-Based Multiplier for 4-Bit Input (cont’d)

The various modules included in the above block diagram are :

o 4 to 3 bit Address Encoder :

The 4-to-3 bit input encoder is shown in Fig. 3(b). It receives a four-bit input word (x3x2x1x0) and maps it onto the three-bit address word according to the logical relations.

o 3 to 8 Line Address Decoder :

The decoder takes the 3-bit address from the input encoder and generates 8 word-select signals to select the referenced word from the memory array.

Fig: 3 to 8 decoder
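The encoder and decoder can be modeled behaviourally as below. The gate-level logical relations of Fig. 3(b) are not reproduced in this transcript, so the encoder here is a functional stand-in that yields the address of the odd factor of the input; the function names are hypothetical.

```python
def encode_address(x):
    """Map a nonzero 4-bit input to the 3-bit address of its odd factor."""
    if not 1 <= x <= 15:
        raise ValueError("x must be a nonzero 4-bit value; X = 0 uses RESET")
    while (x & 1) == 0:
        x >>= 1          # drop trailing zeros to reach the odd factor
    return x >> 1        # odd value 2i + 1 -> address i

def decode_3_to_8(address):
    """Return 8 word-select lines with exactly one line asserted."""
    return [1 if line == address else 0 for line in range(8)]
```

For instance, the input (1010) = 10 has odd factor 5, so the encoder returns address 2 and the decoder asserts word-select line 2.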

o Control Circuit :

• The number of shifts required to be performed on the output of the LUT and the control bits for different values of X are shown in the table.

• The control circuit accordingly generates the control-bits given by,

o Barrel Shifter :

• The LUT output must be shifted one location to the left when the input operand X is one of the values (0 0 1 0), (0 1 1 0), (1 0 1 0), or (1 1 1 0).

• Two left-shifts are required if X is either (0 1 0 0) or (1 1 0 0).

• Only when the input word is (1 0 0 0) are three shifts required.

• For all other possible input operands, no shifts are required.

• Since the maximum number of left-shifts required on the stored word is three, a two-stage logarithmic barrel-shifter is adequate to perform the necessary left-shift operations.
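The shift rules above fit a two-bit shift count, which is why two shifter stages are enough. The sketch below assumes control bits named `s1` and `s0` that encode the shift count in binary; these names and the function names are illustrative, and the control-bit equations of the original slide are not reproduced here.

```python
def control_bits(x):
    """Return (s1, s0), the binary shift count for a nonzero 4-bit x."""
    if not 1 <= x <= 15:
        raise ValueError("x must be a nonzero 4-bit value")
    shifts = 0
    while (x & 1) == 0:      # shift count = number of trailing zeros
        x >>= 1
        shifts += 1
    return (shifts >> 1) & 1, shifts & 1

def barrel_shift(word, s1, s0):
    """Two-stage logarithmic left shifter: stage 0 shifts by 1, stage 1 by 2."""
    if s0:
        word <<= 1
    if s1:
        word <<= 2
    return word
```

With both stages active the word moves three places, which covers the single three-shift case X = (1 0 0 0).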

NOR cell :

o The RESET bit is fed to one input of every NOR gate, and the other inputs of the 8 NOR gates of the NOR cell are fed with the 8 bits of the LUT output in parallel.

o When RESET = 1, the output is 0.

o When RESET = 0, the outputs of the NOR gates are just the complement of the LUT output bits.
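The NOR cell behaviour stated above can be expressed as a bitwise operation. This is a small sketch assuming an 8-bit LUT word; `nor_cell` and `WIDTH` are hypothetical names.

```python
WIDTH = 8  # number of NOR gates, one per LUT output bit

def nor_cell(lut_word, reset):
    """Bitwise NOR of each LUT output bit with the RESET bit."""
    mask = (1 << WIDTH) - 1
    reset_word = mask if reset else 0   # RESET fans out to every gate
    return ~(lut_word | reset_word) & mask
```

With RESET asserted every gate sees a 1 and outputs 0, which gives the zero product needed for X = (0000).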

Applications of LUT multiplier

The applications of the LUT multiplier are :

o Digital filters (such as FIR and IIR). Finite Impulse Response (FIR) filters using the LUT multiplier approach are widely used to implement pulse-shaping filters.

o Digital phase-locked loop (PLL) frequency synchronizers.

o Discrete cosine transform (DCT) cores.

o Digital communication :

• Channel equalization.

• Frequency channelization.

o Speech processing (adaptive noise cancellation).

Conclusion

Traditionally, direct implementation of a K-tap FIR filter requires K multiply-

and-accumulate (MAC) blocks, which are expensive to implement in FPGA

due to logic complexity and resource usage.

An alternative to computing the multiplication is to decompose the MAC

operations into a series of lookup table (LUT) accesses and summations.

The advantage of this method is that the LUTs readily available in FPGAs can be utilized efficiently.

This work presents DA architectures for FIR filters, i.e., a multiplier-less architecture. Because no multipliers are needed, the complexity is reduced, which lowers power consumption and increases speed.

Future Scope

Future scope of this project is to improve the architecture of the Distributed

arithmetic FIR filter such that it uses the hardware resources of the latest FPGA

families.

In Virtex-5 and Virtex-6 family FPGAs, 6-input LUTs were introduced.

Future work includes changing the architecture to use 6-input LUTs for storing coefficient sums and SRL (shift register logic) macros to implement shift operations, so that the total number of slices used is reduced.

