Post on 20-Feb-2015
Design and Implementation of LUT Multiplier for DSP Based Applications
By
Dinesh Alapati (08J41A0483)
S. M. Himanshu (08J41A0488)
P. Vishnu (08J41A04D1)
Outline
o Objective
o Introduction
o Digital Signal Processing (DSP)
o Field Programmable Gate Array (FPGA)
o Distributed Arithmetic (DA) Architecture
o New approach to LUT design of FIR digital filter
o Simulation Results
o Advantages
o Applications
o Conclusion
o Future scope
o References
Objective
Digital filters are a very important part of DSP.
Advances in implementing digital signal processing functions on FPGAs have
driven considerable effort into designing efficient architectures for DSP functions.
The conventional design of a digital FIR filter, based on the direct implementation
of a K-tap FIR filter, requires K multiply-and-accumulate (MAC) blocks.
We first present DA, which is a multiplier-less architecture. Its drawback is an
exponential increase in the size of the memory with respect to the order of the
filter.
The proposed architecture is designed with a new approach to the LUT-based multiplier,
whose memory is reduced to half.
Motivation
Introduction
Digital Signal Processing (DSP) is one of the most active areas in VLSI
applications.
Traditionally, DSP algorithms are implemented either using general-purpose
DSP processors (low speed, less expensive, flexible) or using ASICs (high
speed, expensive, less flexible).
FPGAs provide solutions that combine the advantages of the approach
based on DSP processors and the approach based on ASICs.
DSP applications rely on multiply-and-accumulate (MAC) blocks, which
require an efficient architecture.
Building multipliers from the logic fabric of the FPGA is costly. An alternative
method for computing multiplication is to decompose the MAC
operations into a series of LUT accesses.
One such method is the Distributed Arithmetic (DA) FIR filter on an FPGA architecture.
We propose a new approach to LUT design and memory-based realization
of FIR digital filters.
Introduction…
Comparison of Different Programmable Logics
FIR Filters
FPGA – Generic Structure
A Field-Programmable Gate Array (FPGA) is a
semiconductor device that can be configured by the
designer after manufacturing, hence the name
"field-programmable".
FPGA Building Blocks :
o Programmable logic blocks
Implement combinatorial and sequential logic
o Programmable interconnect
Wires to connect inputs and outputs to logic blocks
o Programmable I/O blocks
Special logic blocks at the periphery of device for
external connections
[Figure: FPGA generic structure, showing I/O blocks, logic blocks and interconnection switches]
FPGA – Basic Logic Element
LUT to implement combinatorial logic
Register for sequential circuits
Additional logic (not shown):
o Carry logic for arithmetic functions
o Expansion logic for functions requiring more than 4 inputs
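[Figure: basic logic element, showing a LUT with inputs A, B, C, D, a D flip-flop (D, Q) driven by Clock, and a Select signal choosing the output Out]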
Look Up Table (LUT)
A look-up table stores binary data within a solid-state memory device. The
storage "cells" within the device are addressed by
driving its "address" lines with the proper binary value(s).
A look-up table with N inputs can be used to implement any combinatorial
function of N inputs.
The basic features of a LUT are:
o Complete times table of all possible input combinations
o One address bit for each bit in each input
o Table size grows exponentially
o Very limited use
o Fast - result is just a memory access away
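As a minimal sketch of the statement that an N-input LUT can realize any combinatorial function of N inputs, the Python below precomputes a truth table and then evaluates the function with a single indexed read. The names `build_lut`, `lut_read` and the example function `majority4` are illustrative assumptions, not part of the slides.

```python
from itertools import product


def build_lut(func, n_inputs):
    """Precompute func for every combination of n_inputs bits.

    The table has 2**n_inputs entries, which is why the table size
    grows exponentially with the number of inputs.
    """
    table = []
    for bits in product((0, 1), repeat=n_inputs):
        table.append(func(*bits) & 1)
    return table


def lut_read(table, *bits):
    """Form the address from the input bits (first bit is the MSB) and read."""
    address = 0
    for b in bits:
        address = (address << 1) | (b & 1)
    return table[address]


# Hypothetical example function: 1 when at least three of four inputs are 1.
def majority4(a, b, c, d):
    return int(a + b + c + d >= 3)


MAJ_LUT = build_lut(majority4, 4)
```

Swapping in any other 4-input function changes only the stored table, not the read logic, which is the property an FPGA logic element relies on.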
LUT Based Multiplier
In many DSP circuits, multipliers always have one constant input.
[Figure: multiplier with input x[n], constant input Ci and output y[n]]
For the above multiplier, y[n] depends purely on x[n]. Thus, a look-up table (LUT) can be used to implement the multiplier.
For example, a 256 x 16-bit memory can be used to implement an 8-bit multiplier
if one of its inputs is always constant.
[Figure: LUT with input x[n] and output y[n]]
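The 256 x 16-bit example can be sketched in a few lines of Python. The coefficient value `COEFF` and the function name `lut_multiply` are assumptions chosen for illustration; the slides do not give a specific constant.

```python
COEFF = 23  # hypothetical constant coefficient Ci (unsigned, 8 bits)

# 256 words, one per possible 8-bit input; each product fits in 16 bits.
PRODUCT_LUT = [COEFF * x for x in range(256)]


def lut_multiply(x):
    """Return COEFF * x by a single table read instead of a multiplication."""
    if not 0 <= x <= 255:
        raise ValueError("x must be an unsigned 8-bit value")
    return PRODUCT_LUT[x]
```

The input word acts directly as the memory address, so the run-time cost is one read regardless of the coefficient.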
Distributed Arithmetic Architecture
LUT Technique for Distributed Arithmetic:
Distributed Arithmetic (DA) is a well-known method of
implementing FIR filters. DA solves the computation of the inner product equation when the
coefficients are known in advance, as happens in FIR filters. A K-tap FIR filter is described as:
$$y = \sum_{k=0}^{K-1} x[n-k]\, h_k$$
In this equation, the h_k are the fixed coefficients, K is the number of filter taps and the x[n-k] are the input data words, which are represented in a standard fixed-point format.
Digital filters based on this arithmetic are implemented using registers, memory resources and a scaling accumulator.
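For reference, the inner-product equation can be evaluated directly, with one multiply-and-accumulate per tap. The following Python is a minimal sketch of that direct form, assuming samples before the start of the input are zero; the name `fir_direct` is illustrative.

```python
def fir_direct(x, h):
    """Direct-form FIR: each output is the sum over k of h[k] * x[n - k].

    Samples before the start of x are taken as zero. Each output needs
    K = len(h) multiply-and-accumulate operations.
    """
    y = []
    for n in range(len(x)):
        acc = 0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]  # one MAC operation
        y.append(acc)
    return y
```

Feeding a unit impulse returns the coefficients themselves, which is a quick sanity check for any FIR implementation.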
Distributed Arithmetic (cont'd)
The original LUT-based DA implementation of a 4-tap (K=4) FIR filter is shown in the figure. The DA architecture includes three units: the shift register unit, the DA-LUT unit, and the adder/shifter unit.
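The interplay of the three units can be sketched in Python. This is a simplified model under stated assumptions, not the circuit from the figure: samples are unsigned, bits are processed LSB first, and the accumulator weights each LUT word by a left shift instead of the right-shifting scaling accumulator used in hardware. The names `build_da_lut` and `da_fir_output` are illustrative.

```python
def build_da_lut(h):
    """DA-LUT: entry a holds the sum of the coefficients h[k] whose bit k
    is set in the address a. It has 2**K words for K taps."""
    K = len(h)
    return [sum(h[k] for k in range(K) if (a >> k) & 1) for a in range(2 ** K)]


def da_fir_output(window, da_lut, bits):
    """One filter output from the K most recent samples.

    window[k] is x[n - k], an unsigned sample of `bits` bits (an assumption
    of this sketch; signed data needs an extra sign-bit correction step).
    """
    acc = 0
    for b in range(bits):  # one bit position per step, LSB first
        address = 0
        for k, sample in enumerate(window):
            address |= ((sample >> b) & 1) << k  # shift register unit
        acc += da_lut[address] << b  # adder/shifter unit
    return acc
```

For K = 4 the table holds 16 words; each extra tap doubles it, which is the exponential memory growth noted in the objective.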
New Approach to LUT Design of FIR Digital Filter
The FIR digital filter is widely used in various signal processing applications.
The order of the filter depends directly on the width of the transition band: the narrower the band, the higher the order.
Hence, the number of MAC operations required increases accordingly.
The DA architecture, in which memory elements store all possible values of the products of the filter coefficients, can be an area-efficient option for implementing an FIR filter.
A basic variant of the memory-based technique computes the multiplication itself by LUT.
This work designs the LUT for a LUT-based multiplier implementation in which the memory size is reduced to nearly half that of the conventional approach.
LUT Design for Memory-Based Multiplication
The conventional memory-based multiplier is depicted as follows:
Fig : Conventional Memory-Based Multiplier
o Let ‘A’ be a fixed coefficient and ‘X’ be an input word to be multiplied with ‘A’.
o If 'X' is an unsigned binary number of word length L, there can be 2^L possible values of X and accordingly 2^L possible values of the product C = A·X.
o Therefore, the conventional implementation of memory-based multiplication requires a memory unit of 2^L words as a look-up table, holding the pre-computed product values for all possible values of X.
LUT Design for Memory-Based Multiplication (cont'd)
In this work, the basic principle of memory-based multiplication is as follows:
o In the proposed memory-based multiplication, the memory used is reduced to exactly half of that used in conventional memory-based multiplication.
o Although 2^L possible values of X correspond to 2^L possible values of C = A·X, we have recently shown that only 2^L/2 words, corresponding to the odd multiples of A, need to be stored in the LUT.
o The remaining (2^L/2) - 1 non-zero values are even multiples of A, which can be derived by left-shift operations on one of the odd multiples of A.
o We illustrate this in the following table for L=4.
LUT Design for Memory-Based Multiplication (cont'd)
Table: LUT words and product values for input word length L=4
LUT Design for Memory-Based Multiplication (cont'd)
We illustrate this approach in the table for L=4.
o At 8 memory locations, the 8 odd multiples A × (2i + 1) are stored as P_i for i = 0, 1, 2, …, 7.
o The even multiples 2A, 4A, and 8A are derived by left-shift operations on A.
o Similarly, 6A and 12A are derived by left-shifting 3A, while 10A and 14A are derived by left-shifting 5A and 7A, respectively.
o The address X = (0000) corresponds to A·X = 0, which can be obtained by resetting the LUT output.
o Therefore, for an input multiplicand of word size L, only 2^L/2 odd multiple values need to be stored in the memory core of the LUT, while the other (2^L/2) - 1 non-zero values can be derived by left-shift operations on the stored values.
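The odd-multiple scheme can be checked with a short Python sketch. It stores only the odd multiples and recovers every other product by left shifts; the function names `build_odd_lut` and `odd_lut_multiply`, and the software loop that finds the odd part of X, are illustrative assumptions standing in for the hardware address logic.

```python
def build_odd_lut(a, word_length):
    """Store only the odd multiples a*(2i + 1), i = 0 .. 2**word_length/2 - 1."""
    return [a * (2 * i + 1) for i in range(2 ** word_length // 2)]


def odd_lut_multiply(x, odd_lut):
    """Return a * x using the half-size LUT plus left shifts."""
    if x == 0:
        return 0  # corresponds to resetting the LUT output
    shifts = 0
    while x % 2 == 0:  # strip factors of two to reach the odd part of x
        x //= 2
        shifts += 1
    return odd_lut[(x - 1) // 2] << shifts
```

For L = 4 the table holds 8 words instead of 16, and for example 12A is read as 3A (location 1) shifted left by two.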
Proposed LUT-Based Multiplier for 4-Bit Input
The proposed LUT-based multiplier for input word size L = 4 is shown in the following figure:
Fig : Proposed LUT design for multiplier
LUT-Based Multiplier for 4-Bit Input (cont'd)
The various modules included in the above block diagram are:
o 4 to 3 bit Address Encoder :
The 4-to-3 bit input encoder is shown in Fig. 3(b). It receives a four-bit input word (x3x2x1x0) and maps it onto the three-bit address word according to the logical relations.
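The transcript does not reproduce the encoder's logical relations. The Python below is a sketch of one set of Boolean relations that produces the required mapping, derived here from the rule that the address must be the index of the odd part of the input; the relations and the names `encode_address`, `d2`, `d1`, `d0` are assumptions and may differ in form from those on the slide.

```python
def encode_address(x3, x2, x1, x0):
    """Map a 4-bit input to the 3-bit address (d2, d1, d0) of its odd part.

    If the input is (2i + 1) * 2**s, the returned address is i in binary.
    """
    n0 = 1 - x0  # complement of x0
    d2 = x0 & x3
    d1 = (x0 & x2) | (n0 & x1 & x3)
    d0 = (x0 & x1) | (x1 & x2) | (n0 & x2 & x3)
    return d2, d1, d0
```

For example, the input (1 1 0 0) is 12 = 3 × 4, so it maps to address (0 0 1), the location holding 3A.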
LUT-Based Multiplier for 4-Bit Input (cont'd)
o 3 to 8 Line Address Decoder :
The decoder takes the 3-bit address from the input encoder and generates 8 word-select signals to select the referenced word from the memory array.
Fig: 3 to 8 decoder
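The decoder's behaviour is a one-hot expansion of the address, as in this minimal Python sketch (the name `decode_3_to_8` and the bit ordering with d2 as MSB are assumptions):

```python
def decode_3_to_8(d2, d1, d0):
    """Return 8 word-select signals; exactly one is 1, at index 4*d2 + 2*d1 + d0."""
    index = (d2 << 2) | (d1 << 1) | d0
    return [1 if i == index else 0 for i in range(8)]
```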
LUT-Based Multiplier for 4-Bit Input (cont'd)
o Control Circuit :
• The number of shifts required to be performed on the output of the LUT, and the control bits for different values of X, are shown in the table.
• The control circuit generates the control bits accordingly.
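The control-bit expressions are not reproduced in the transcript. As a hedged sketch, the Python below uses relations derived here from the shift counts (0 for odd inputs, 1, 2 or 3 for inputs with one, two or three trailing zeros); the names `s1`, `s0`, `reset` and the exact gate form are assumptions.

```python
def control_bits(x3, x2, x1, x0):
    """Return (s1, s0, reset) for a 4-bit input.

    The shift count is 2*s1 + s0; reset is 1 only for the all-zero input,
    whose product is obtained by clearing the output rather than shifting.
    """
    s0 = (1 - x0) & (x1 | (1 - x2))
    s1 = (1 - x0) & (1 - x1)
    reset = int((x3 | x2 | x1 | x0) == 0)
    return s1, s0, reset
```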
LUT-Based Multiplier for 4-Bit Input (cont'd)
o Barrel Shifter :
• The LUT output must be shifted one location to the left when the input operand X is one of the values (0 0 1 0), (0 1 1 0), (1 0 1 0) or (1 1 1 0).
• Two left-shifts are required if X is either (0 1 0 0) or (1 1 0 0).
• Only when the input word is (1 0 0 0) are three shifts required.
• For all other possible input operands, no shifts are required.
• Since the maximum number of left-shifts required on the stored word is three, a two-stage logarithmic barrel shifter is adequate to perform the necessary left-shift operations.
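A two-stage logarithmic shifter covers shift counts 0 to 3 because one stage shifts by 1 and the other by 2. A minimal Python sketch, assuming two hypothetical control bits `s0` (shift by one) and `s1` (shift by two):

```python
def barrel_shift_left(word, s1, s0):
    """Two-stage logarithmic shifter: stage 1 shifts by 1 if s0 is set,
    stage 2 shifts by 2 if s1 is set, giving a total of 2*s1 + s0."""
    stage1 = word << 1 if s0 else word
    stage2 = stage1 << 2 if s1 else stage1
    return stage2
```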
LUT-Based Multiplier for 4-Bit Input (cont'd)
NOR cell :
o The RESET bit is fed to one input of every NOR gate, and the other input lines of the 8 NOR gates of the NOR cell are fed with the 8 bits of the LUT output in parallel.
o When RESET = 1, the output is 0.
o When RESET = 0, the outputs of the NOR gates are just the complement of the LUT output bits.
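The NOR cell's two cases follow directly from the NOR truth table, as this small Python sketch shows (the name `nor_cell` and the list-of-bits representation are assumptions):

```python
def nor_cell(lut_bits, reset):
    """NOR each LUT output bit with RESET.

    RESET = 1 forces every output to 0; RESET = 0 yields the bitwise
    complement of the LUT output.
    """
    return [1 - (bit | reset) for bit in lut_bits]
```

Because the cell inverts its input when RESET = 0, the rest of the datapath has to account for that inversion; the slides do not say how, and storing the words in complemented form would be one possibility.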
Applications of LUT multiplier
The applications of implementing a LUT multiplier are:
In digital filters (such as FIR and IIR).
Finite Impulse Response (FIR) filters using the LUT multiplier approach are widely used to implement pulse-shaping filters.
Digital phase-locked loop (PLL) frequency synchronizers.
Discrete cosine transform (DCT) cores.
In digital communication :
o Channel equalization.
o Frequency channelization.
Speech processing (adaptive noise cancellation).
Conclusion
Traditionally, direct implementation of a K-tap FIR filter requires K multiply-
and-accumulate (MAC) blocks, which are expensive to implement in FPGA
due to logic complexity and resource usage.
An alternative to computing the multiplication is to decompose the MAC
operations into a series of lookup table (LUT) accesses and summations.
The advantage of this method is that the LUTs readily available in FPGAs can be
utilized efficiently.
This work presents DA architectures for FIR filters, i.e., multiplier-less
architectures. Because the complexity is reduced, power consumption is lower,
and both performance and speed increase.
Future Scope
The future scope of this project is to improve the architecture of the distributed
arithmetic FIR filter so that it uses the hardware resources of the latest FPGA
families.
In the Virtex-5 and Virtex-6 FPGA families, 6-input LUTs were introduced.
Future work includes changing the architecture to use 6-input LUTs for
storing coefficient sums and SRL (shift register logic) macros to implement
shift operations, so that the total number of slices used is reduced.