Digital signal processors and applications
By Prof. Sridhar Ranganathan
VIT, Chennai
Why DSP?
• DSP algorithms need a lot of mathematical operations on every sample of data
• They need to be done quickly [before the next sample of data arrives]
• Deferred processing is NOT possible
• General purpose processors provide add, subtract and shift operations
• They provide multiply and divide, but these typically take many cycles
Need for DSP
• General purpose processors also suffer from space constraints
• They also consume a lot of power
Hardware features visible through DSP
• Hardware modulo addressing
• Memory architecture designed for streaming data – they may support several memory accesses per cycle
• DMA
• Multiple arithmetic units
• Harvard architecture
Features of DSP contd..
• Special SIMD instructions
• Some processors use VLIW [very long instruction word] techniques
• Specialized DSP instructions [like MAC – multiply and accumulate] work quickly
• Special algorithms are packaged as libraries for quick functionality
• Bit reversed addressing that helps in calculating the FFT
• Deliberate exclusion of a memory management unit – they do not support virtual memory
Data operations
• Saturation arithmetic – operations that overflow do not wrap around; the result is clamped to the maximum (or minimum) representable value
• Fixed point arithmetic
• Single cycle operations
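The difference between saturating and wrapping arithmetic can be illustrated with a minimal sketch. It assumes 16-bit signed words and the Q15 fixed point format (1 sign bit, 15 fractional bits); the function names are hypothetical and do not correspond to any particular DSP instruction set.

```python
INT16_MAX = 2**15 - 1   # largest 16-bit signed value
INT16_MIN = -2**15      # smallest 16-bit signed value


def saturating_add(a: int, b: int) -> int:
    """Add two 16-bit signed values, clamping the result instead of wrapping."""
    return max(INT16_MIN, min(INT16_MAX, a + b))


def wrapping_add(a: int, b: int) -> int:
    """Add two 16-bit signed values with two's-complement wraparound."""
    return (a + b + 2**15) % 2**16 - 2**15


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 fixed point values and saturate the Q15 result."""
    return max(INT16_MIN, min(INT16_MAX, (a * b) >> 15))


if __name__ == "__main__":
    print(saturating_add(30000, 10000))  # clamped at the maximum
    print(wrapping_add(30000, 10000))    # wraps to a negative value
    print(q15_mul(16384, 16384))         # 0.5 * 0.5 in Q15
```

With wraparound, a loud signal that overflows flips sign, which is usually far more audible than the clipping produced by saturation.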
History of DSP
• Originally, bit slice processors were used for implementing DSPs
• Example – AMD2901 4 bit processors
• By connecting several AMD20
Multiplier and Multiplier Accumulator[MAC]
• Array multiplication is one of the common operations required in DSP
• Example operations that require array multiplication are
– Convolution
– Correlation
• One of the important requirements of an array multiplier is that the signals must be processed in real time
• Operations related to one sample need to be completed before the next sample arrives
• If the sampling frequency is 100 Hz, the operations needed by the present sample must be completed within 0.01 s
• The higher the sampling frequency, the less time is available for computation on the present sample
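The time budget per sample can be worked out with a short sketch. The 40 MHz clock and the 8 kHz sampling rate below are hypothetical figures chosen only for illustration; the 100 Hz case is the one from the slide.

```python
def sample_period(sampling_hz: float) -> float:
    """Time available for processing one sample, in seconds."""
    return 1.0 / sampling_hz


def cycles_per_sample(clock_hz: float, sampling_hz: float) -> float:
    """Number of processor cycles available between two samples."""
    return clock_hz / sampling_hz


if __name__ == "__main__":
    print(sample_period(100))             # budget at 100 Hz sampling
    print(cycles_per_sample(40e6, 8000))  # assumed 40 MHz clock, 8 kHz sampling
```

If each multiply-accumulate takes one cycle, the cycle count per sample is an upper bound on the number of products that can be summed for each output.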
How to construct a real time array multiplier
• Two approaches
– A dedicated MAC unit may be implemented in hardware, which integrates the multiplier and accumulator in a single unit. Example: Motorola DSP5600X
– Have a separate multiplier and accumulator. Example: TI TMS320C5X. Here the output of the multiplier is stored in the product register, and the content of the product register is added to the accumulator register in the central ALU
• In both approaches the MAC operation can be completed in one cycle
• Thus the presence of a H/W multiplier and multiplier accumulator is one of the mandatory requirements of a P-DSP [Programmable DSP]
How the array multiplier operates
• Let the input signal [present sample and previous M-1 samples] be
Array $x_n = [x(n), x(n-1), x(n-2), \ldots, x(n-M+3), x(n-M+2), x(n-M+1)]$
and let the array corresponding to the impulse response sequence be
Array $h = [h(0), h(1), h(2), \ldots, h(M-3), h(M-2), h(M-1)]$
• The output at the nth sampling instant, $y(n)$, is obtained by multiplying $x_n$ with the array $h$ element by element and summing the products:
$y(n) = \sum_{k=0}^{M-1} h(k)\, x(n-k)$
Array multiplier operation - II
• $x_{n+1}$ is obtained by shifting $x_n$: the [n+1]th sample becomes the first element and all other elements are shifted right, so that the ith element of $x_n$ becomes the [i+1]th element of $x_{n+1}$
• The content of the product register is added to the accumulator before the new product is stored
• Further, the content of the data memory address 'dma' is copied to the next location, whose address is 'dma+1'
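The multiply, accumulate and data move steps above can be sketched in software. This is only an illustration of the idea, not the behaviour of any specific instruction; `fir_macd`, `preg` and `acc` are hypothetical names standing in for the operation, the product register and the accumulator.

```python
def fir_macd(x, h, new_sample):
    """Compute one output sample and shift the delay line x in place.

    x[0] holds the newest sample and x[M-1] the oldest, where M = len(h).
    """
    x[0] = new_sample                  # the new sample becomes the first element
    acc = 0                            # accumulator
    preg = 0                           # product register
    for i in range(len(h) - 1, -1, -1):    # walk from the oldest to the newest sample
        acc += preg                    # add the previous product to the accumulator
        preg = h[i] * x[i]             # store the new product
        if i + 1 < len(x):
            x[i + 1] = x[i]            # data move: copy location i to location i+1
    acc += preg                        # add the last product
    return acc


if __name__ == "__main__":
    h = [1, 2, 3]
    x = [0, 0, 0]
    # Feeding a unit impulse returns the impulse response h sample by sample.
    print([fir_macd(x, h, s) for s in [1, 0, 0, 0]])
```

Walking from the oldest sample to the newest lets each value be copied one place to the right immediately after it has been used, so no separate shifting pass is needed.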
Harvard architecture
Harvard architecture explained
• It employs entirely separate memory systems to store instructions and data
• The CPU fetches the next instruction
• It also fetches data simultaneously
• Its unique feature is that the instruction address space and the data address space are separate
• The same address can exist in each address space
• So an address does NOT uniquely specify a memory location
• You also need to specify which address space you are referring to
• It uses two buses – one for accessing instructions and one for accessing data
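The point that an address alone does not identify a location can be shown with a toy model. The memory contents and the names `PROGRAM`, `DATA` and `read` are made up for illustration only.

```python
PROGRAM = "program"
DATA = "data"

# Two separate memories; both contain address 0x0000.
memories = {
    PROGRAM: {0x0000: "MAC", 0x0001: "NOP"},
    DATA: {0x0000: 42, 0x0001: 7},
}


def read(space, address):
    """Read a location; the address space must be named along with the address."""
    return memories[space][address]


if __name__ == "__main__":
    print(read(PROGRAM, 0x0000))  # an instruction
    print(read(DATA, 0x0000))     # a data word at the same numeric address
```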
Von Neumann architecture
Von Neumann architecture explained
• It employs one address space
• Instructions and data are stored in the same address space
• The PC refers to the next instruction
• The CPU fetches the instruction and examines it; the instruction holds pointers to its operands
• If a pointer gets corrupted, the program may terminate abnormally
• As it fetches the instruction first and then the data, this architecture is slow
• So P-DSPs rarely use this architecture
Modified Harvard architecture
• In a pure Harvard architecture, mechanisms need to be provided to load programs into program memory and initial data into data memory
• Modern machines use multiple buses
– One will access both program memory and data memory
– One will access only data memory
– Data can also be transferred from one memory to another memory
• This feature is used in modern day P-DSPs
• This is helpful at start time too, as constant data can be transferred from program memory to data memory
Advantage of having multiple buses
• Number of accesses/memory cycle can be increased
• Motorola DSP5600X, DSP96002 have three memory buses and three memory accesses/cycle
• TMS320C54X has four memory buses and four memory accesses/cycle
Multiple access memory
• Memory that permits more than one memory access per cycle is called Multiple access memory
• Dual access RAM technology permits two memory accesses per clock cycle
• Four memory accesses are also possible if Dual access RAM memory is connected to P-DSP with two independent address and data buses
Multiported memory
• The number of accesses can be increased using multiport memory
• Typical 2 port memories have two memory address buses and two data buses
• Thus two different chips need not be used in a Harvard architecture
• Disadvantages
– Increased complexity
– More pins, more area and increased cost
VLIW architecture
• VLIW – Very long instruction word
• Transmeta Crusoe is a chip that uses this technique
• TMS320C6X also uses a similar technique
• The processor reads a relatively large group of instructions
• It executes them at the same time
• For this purpose it has
– Many ALUs
– Many multipliers
– Many shifters, etc.
• The VLIW is fetched from memory and specifies the operations and operands for the different data paths
• It simply increases the number of instructions executed per cycle
• The performance gain with VLIW depends on the parallelism achievable with the algorithm
Instruction pipelining
• An instruction may have many phases
– Fetch
– Decode
– Execute
– Write
• Throughput is low if all these phases are executed serially, because when one stage is busy the others are idle
• In the pipelining technique all these stages operate in parallel on different instructions, which improves throughput
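The throughput gain can be estimated with a small sketch. It assumes an idealised pipeline in which every stage takes exactly one cycle and there are no stalls or branches; the function names are hypothetical.

```python
def serial_cycles(n_instructions: int, n_stages: int = 4) -> int:
    """Cycles needed when each instruction finishes all stages before the next starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int = 4) -> int:
    """Cycles needed when a new instruction enters the pipeline every cycle."""
    if n_instructions == 0:
        return 0
    # The first instruction needs n_stages cycles; each later one completes
    # one cycle after its predecessor.
    return n_stages + n_instructions - 1


if __name__ == "__main__":
    print(serial_cycles(10), pipelined_cycles(10))
```

For a long instruction stream the ratio approaches the number of stages, which is why the four phase pipeline above can approach a fourfold throughput improvement in the ideal case.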
Pipelining diagram
Special addressing modes in P-DSPs
• Short immediate addressing
• Short direct addressing
• Memory mapped addressing
• Indirect addressing
• Bit reversed addressing
• Circular addressing
Special addressing modes explained - 1
• Short immediate addressing
– The operand is specified as a short constant
– This forms part of the instruction
– The length depends on the P-DSP
– Example – TMS320C5X – an 8 bit constant could be used
• Short direct addressing
– The lower order bits of the operand address are specified as part of the instruction
– The higher order bits are stored elsewhere, for example in a page pointer
– Example
• TI TMS320 DSP – lower 7 bits are specified in the instruction
• Motorola DSP5600X – lower 6 bits are specified in the instruction
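How a page pointer and a short offset combine into a full address can be sketched as follows. The function name and the example page numbers are hypothetical; only the 7 bit and 6 bit offset widths come from the slide.

```python
def direct_address(page_pointer: int, offset: int, offset_bits: int = 7) -> int:
    """Form a full address from a page pointer (high bits) and an offset (low bits)."""
    if not 0 <= offset < (1 << offset_bits):
        raise ValueError("offset does not fit in the instruction field")
    return (page_pointer << offset_bits) | offset


if __name__ == "__main__":
    print(hex(direct_address(4, 0x10)))                # 7 bit offset within page 4
    print(hex(direct_address(1, 5, offset_bits=6)))    # 6 bit offset within page 1
```

Keeping only the low order bits in the instruction keeps the instruction word short, at the cost of having to reload the page pointer when the program moves to another page.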
Special addressing modes explained - 2
• Memory mapped addressing
– CPU registers and I/O registers are accessed as memory locations
– This is done by mapping them to the initial or final page
– Example
• TMS320C5x – page 0 corresponds to CPU registers and I/O registers
• Motorola DSP5600X – the last page is used
• Indirect addressing
– Addresses of operands can be stored in registers called indirect access registers
– When operands are fetched from the addresses specified in these registers, the registers are updated
– This is done by having another special CPU or ALU for updating these addresses
– The increment can be 1 or an offset held in a special register
Special addressing modes explained - 3
• Bit reversed addressing
– The binary pattern corresponding to a particular decimal number is obtained by writing its natural binary equivalent in reverse order
– Therefore the LSB becomes the MSB and the MSB becomes the LSB
– The address is incremented or decremented in bit reversed form
• Circular addressing mode
– In real time, data arrives continuously
– If it is stored in a linear buffer, the buffer would be exhausted
– If it is stored in a circular buffer, new data overwrites the oldest data
– There is no need to check whether the end of the buffer has been reached
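Bit reversed addressing, as described above, can be sketched in software. On a P-DSP the address generation hardware does this without extra instructions; the function names here are hypothetical.

```python
def bit_reverse(index: int, n_bits: int) -> int:
    """Reverse the lowest n_bits of index, so the LSB becomes the MSB."""
    result = 0
    for _ in range(n_bits):
        result = (result << 1) | (index & 1)   # move the current LSB into the result
        index >>= 1
    return result


def bit_reversed_order(n_bits: int) -> list:
    """Order in which the 2**n_bits points of an FFT buffer are visited."""
    return [bit_reverse(i, n_bits) for i in range(1 << n_bits)]


if __name__ == "__main__":
    print(bit_reversed_order(3))   # access order for an 8 point FFT
```

For an 8 point FFT, index 1 (binary 001) maps to 4 (binary 100) and index 3 (binary 011) maps to 6 (binary 110).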
Use of linear buffer
Use of circular buffer
Example of circular addressing
Limitations of circular buffering
Methodology for a circular buffer
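A circular buffer can be sketched in software as below. This is a minimal illustration, assuming a fixed size buffer and modulo arithmetic on the write index; the class and method names are hypothetical, and on a P-DSP the wraparound is performed by the addressing hardware rather than by an explicit modulo operation.

```python
class CircularBuffer:
    """Fixed size buffer in which new samples overwrite the oldest ones."""

    def __init__(self, size: int):
        self.data = [0] * size
        self.index = 0                 # position of the next write

    def write(self, sample):
        self.data[self.index] = sample
        # Wrap around to the start instead of checking for the end of the buffer.
        self.index = (self.index + 1) % len(self.data)

    def newest_first(self):
        """Return the stored samples ordered from newest to oldest."""
        n = len(self.data)
        return [self.data[(self.index - 1 - k) % n] for k in range(n)]


if __name__ == "__main__":
    buf = CircularBuffer(4)
    for s in [1, 2, 3, 4, 5, 6]:
        buf.write(s)
    print(buf.data)            # samples 5 and 6 have overwritten 1 and 2
    print(buf.newest_first())
```

Unlike the shifting delay line, no sample is ever moved; only the index changes.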
On Chip peripherals
• On chip timer
– It generates periodic interrupts to the DSP
– It also generates sampling clocks for A/D converters
• Serial port
– It enables data communication between the P-DSP and peripherals such as an ADC, a DAC or an RS-232C device
– These ports have buffers: the DSP writes data to and reads data from these ports in parallel form, while the ports send and receive the data in serial form
On Chip peripherals contd..
• TDM serial port
– A special serial port which permits the P-DSP to communicate with other devices or other P-DSPs using time division multiplexing format
• Parallel ports
– They are faster than serial ports
• Bit I/O port
– These are only a single bit wide
– They can be individually set, reset or read
– These bits are used for control purposes or for data transfer
On Chip peripherals contd..
• Host port
– A special type of parallel port that P-DSPs have
– It enables the P-DSP to communicate with a processor or a PC, which is called the host
– Data can be communicated through this port
– It can generate interrupts
– It also helps the P-DSP to load a program from ROM to RAM
• Communication (comm) ports
– They are used for communication between many P-DSPs in a multiprocessor system
• On chip ADCs and DACs
– They enable the P-DSP to communicate with the analog world
– They are used in cellular phones and tapeless answering machines
TMS320C50
Complex DSP operations
• Sum of products is the key element in most DSP algorithms
Algorithm: Equation
Finite Impulse Response Filter: $y(n) = \sum_{k=0}^{M} a_k\, x(n-k)$
Infinite Impulse Response Filter: $y(n) = \sum_{k=0}^{M} a_k\, x(n-k) + \sum_{k=1}^{N} b_k\, y(n-k)$
Convolution: $y(n) = \sum_{k=0}^{N} x(k)\, h(n-k)$
Discrete Fourier Transform: $X(k) = \sum_{n=0}^{N-1} x(n)\, \exp[-j(2\pi/N)nk]$
Discrete Cosine Transform: $F(u) = \sum_{x=0}^{N-1} c(u)\, f(x)\, \cos\left[\frac{\pi}{2N}\, u\, (2x+1)\right]$
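As an illustration of how these algorithms reduce to sums of products, here is a minimal sketch of the Discrete Fourier Transform evaluated directly from its definition, $X(k) = \sum_{n=0}^{N-1} x(n)\, \exp[-j(2\pi/N)nk]$. It uses floating point complex numbers for clarity, whereas a P-DSP would typically use fixed point MAC operations and an FFT; the function name `dft` is a hypothetical choice.

```python
import cmath


def dft(x):
    """Direct DFT: each output bin is a sum of products of samples and twiddle factors."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
        for k in range(N)
    ]


if __name__ == "__main__":
    # A unit impulse has a flat spectrum; a constant signal has energy only in bin 0.
    print([round(abs(v), 6) for v in dft([1, 0, 0, 0])])
    print([round(abs(v), 6) for v in dft([1, 1, 1, 1])])
```

Each output bin needs N multiply-accumulate operations, so the direct form needs N squared of them in total, which is why fast single cycle MAC hardware matters.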