DSP architecture - part 1


Digital signal processors and applications

By Prof. Sridhar Ranganathan

VIT, Chennai

Why DSP?

• DSP algorithms need a lot of mathematical operations on every sample of data

• They need to be done quickly [before the next sample of data arrives]

• Deferred processing is NOT possible

• General purpose processors provide add, subtract and shift operations

• They provide multiply and divide, but these typically take many cycles

Need for DSP

• General purpose processors also suffer from space constraints

• They also consume a lot of power

Hardware features visible through DSP

• Hardware modulo addressing

• Memory architecture designed for streaming data – they may support several memory accesses per cycle

• DMA

• Multiple arithmetic units

• Harvard architecture

Features of DSP contd..

• Special SIMD instructions

• Some processors use VLIW [very long instruction word] techniques

• Specialized DSP instructions [like MAC – multiply and accumulate] work quickly

• Special algorithms are packaged as libraries for quick functionality

• Bit reversed addressing that helps in calculating the FFT

• Deliberate exclusion of a memory management unit – they do not support virtual memory

Data operations

• Saturation arithmetic – an operation that overflows returns the maximum [or minimum] representable value instead of wrapping around

• Fixed point arithmetic

• Single cycle operations
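The saturation behaviour described above can be sketched in a few lines. This is only an illustration, assuming a 16 bit signed word; the function names `saturating_add` and `wrapping_add` are hypothetical and do not correspond to any particular DSP instruction.

```python
INT16_MAX = 2**15 - 1   # largest 16 bit signed value, 32767
INT16_MIN = -2**15      # smallest 16 bit signed value, -32768

def saturating_add(a: int, b: int) -> int:
    """Add two 16 bit signed values and clamp the result on overflow."""
    total = a + b
    return max(INT16_MIN, min(INT16_MAX, total))

def wrapping_add(a: int, b: int) -> int:
    """Two's complement wraparound, shown only for contrast."""
    return (a + b + 2**15) % 2**16 - 2**15

print(saturating_add(30000, 10000))  # clamps to the maximum
print(wrapping_add(30000, 10000))    # wraps to a negative value
```

For audio or control signals, clamping usually distorts far less than a wrapped result that suddenly changes sign.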

History of DSP

• Originally, bit slice processors were used for implementing DSPs

• Example – AMD2901 4 bit processors

• By connecting several AMD20

Multiplier and Multiplier Accumulator [MAC]

• Array multiplication is one of the common operations required in DSP

• Example operations that require array multiplication are
  – Convolution
  – Correlation

• An important requirement of an array multiplier is that it must process the signals in real time

• Operations related to one sample need to be completed before the next sample arrives

• If the sampling frequency is 100 Hz, the operations needed by the present sample need to be completed within 0.01 s

• The higher the sampling frequency, the less time is available for computation on the present sample
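The real time budget above is simple arithmetic, sketched here. The 40 MHz clock and 8 kHz sampling rate in the usage lines are assumed figures chosen only for illustration; they do not describe any specific processor.

```python
def sample_period_s(sampling_hz: float) -> float:
    """Time available for processing one sample, in seconds."""
    return 1.0 / sampling_hz

def cycles_per_sample(clock_hz: float, sampling_hz: float) -> int:
    """Whole processor cycles available before the next sample arrives."""
    return int(clock_hz // sampling_hz)

print(sample_period_s(100))             # the 100 Hz case from the slide
print(cycles_per_sample(40e6, 8000))    # assumed 40 MHz clock, 8 kHz sampling
```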

How to construct a real time array multiplier

• Two approaches
  – A dedicated MAC unit may be implemented in hardware, integrating the multiplier and the accumulator in a single unit. Example: Motorola DSP5600X
  – Have a separate multiplier and accumulator. Example: TI TMS320C5X. Here the output of the multiplier is stored in the product register, and the content of the product register is added to the accumulator register in the central ALU

• In both approaches the MAC operation can be completed in one cycle

• Thus the presence of hardware multipliers and multiplier accumulators is one of the mandatory requirements of a P-DSP [Programmable DSP]

How the array multiplier operates

• Let the input signal [the present sample and the previous M-1 samples] be

Array $x = [x_n, x_{n-1}, x_{n-2}, \ldots, x_{n-M+3}, x_{n-M+2}, x_{n-M+1}]$

and let the array corresponding to the impulse response sequence be

Array $h = [h_0, h_1, h_2, \ldots, h_{M-3}, h_{M-2}, h_{M-1}]$

• The output at the nth sampling instant, $y_n$, is obtained by multiplying the array x element by element with the array h and summing the products

$y_n = \sum_{k=0}^{M-1} h_k x_{n-k}$
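The array multiplication above amounts to one multiply and accumulate per element. A minimal sketch, assuming plain Python lists where `x[k]` holds the sample from k instants ago and `h[k]` holds the kth impulse response value; the function name `fir_output` is hypothetical.

```python
def fir_output(x, h):
    """Return the sum of h[k] * x[k] over all k, one MAC per element."""
    if len(x) != len(h):
        raise ValueError("x and h must have the same length M")
    acc = 0
    for xk, hk in zip(x, h):
        acc += hk * xk   # multiply and accumulate
    return acc

print(fir_output([1, 2, 3], [4, 5, 6]))
```

On a P-DSP each pass of this loop would map to a single MAC instruction, which is why a one cycle MAC matters so much.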

Array multiplier operation - II

• The x array for instant n+1 is obtained by shifting the x array for instant n: the [n+1]th sample becomes the first element, and all the elements are shifted right so that the ith element of the old array becomes the [i+1]th element of the new array

• The content of the product register is added to the accumulator before the new product is stored

• Further, the content of the data memory address ‘dma’ is copied to the next location, whose address is ‘dma+1’.
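The shift and the product register behaviour described above can be modelled as follows. This is a software sketch only, assuming Python lists for the sample array; `shift_in`, `fir_step` and the variable `preg` are hypothetical names, and real hardware performs the data move as part of the MAC instruction rather than in a separate loop.

```python
def shift_in(x, new_sample):
    """Shift every element one place right, in place, and insert the new sample."""
    for i in range(len(x) - 1, 0, -1):
        x[i] = x[i - 1]          # content of location i-1 is copied to location i
    x[0] = new_sample
    return x

def fir_step(x, h, new_sample):
    """Accept one new sample and return the output for this instant."""
    shift_in(x, new_sample)
    acc = 0
    preg = 0                     # models the product register
    for xk, hk in zip(x, h):
        acc += preg              # previous product is added to the accumulator
        preg = hk * xk           # then the new product is stored
    acc += preg                  # add the final product
    return acc

x = [0, 0, 0]
h = [1, 2, 3]
print([fir_step(x, h, s) for s in (5, 7, 1)])
```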

Harvard architecture

Harvard architecture explained

• This employs entirely separate memory systems to store instructions and data

• The CPU fetches the next instruction

• It also fetches data simultaneously

• Its unique feature is that the instruction address space and the data address space are separate

• The same address can exist in each address space

• So an address does NOT uniquely specify a memory location

• You also need to specify which address space you are referring to

• This uses two buses – one for accessing instructions and one for accessing data
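The point that an address alone does not identify a location can be shown with a toy model. This sketch assumes two Python lists standing in for the two memories; the class `HarvardMemory` and the instruction strings are made up for illustration.

```python
class HarvardMemory:
    """Toy model with separate program and data address spaces."""

    def __init__(self, program_words, data_words):
        self.program = list(program_words)
        self.data = list(data_words)

    def read(self, space, address):
        """A location is identified by the address space AND the address."""
        if space == "program":
            return self.program[address]
        if space == "data":
            return self.data[address]
        raise ValueError("space must be 'program' or 'data'")

    def fetch_cycle(self, pc, data_address):
        """One cycle: instruction fetch and data read happen together."""
        return self.program[pc], self.data[data_address]

mem = HarvardMemory(["MAC", "NOP"], [10, 20])
print(mem.read("program", 0), mem.read("data", 0))  # same address, different content
print(mem.fetch_cycle(1, 1))
```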

Von Neumann architecture

Von Neumann architecture explained

• It employs one address space

• Instructions and data are stored in the same address space

• The PC refers to the next instruction

• The CPU fetches the instruction and examines it; the instruction holds pointers to its operands

• If a pointer gets corrupted, the program may terminate abnormally

• As it fetches the instruction first and then the data, this architecture is slow

• So P-DSPs rarely use this architecture

Modified Harvard architecture

• In a pure Harvard architecture, mechanisms need to be provided to load programs into program memory and initial data into data memory

• Modern machines use multiple buses
  – One will access both program memory and data memory
  – One will access only data memory
  – Data can also be transferred from one memory to the other

• This feature is used in modern day P-DSPs

• This is helpful at start time too, as constant data can be transferred from program memory to data memory

Advantage of having multiple buses

• Number of accesses/memory cycle can be increased

• Motorola DSP5600X, DSP96002 have three memory buses and three memory accesses/cycle

• TMS320C54X has four memory buses and four memory accesses/cycle
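A rough way to see why more accesses per cycle help: assume one filter tap needs three memory accesses (an instruction fetch, a sample read and a coefficient read). That count is an assumption made for this sketch, and it ignores caches and repeat buffers that real devices may use.

```python
import math

def cycles_per_tap(accesses_needed: int, accesses_per_cycle: int) -> int:
    """Memory cycles needed when accesses are the only bottleneck."""
    return math.ceil(accesses_needed / accesses_per_cycle)

for buses in (1, 2, 3, 4):
    print(buses, "accesses/cycle ->", cycles_per_tap(3, buses), "cycle(s) per tap")
```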

Multiple access memory

• Memory that permits more than one memory access per cycle is called Multiple access memory

• Dual access RAM technology permits two memory accesses per clock cycle

• Four memory accesses are also possible if Dual access RAM memory is connected to P-DSP with two independent address and data buses

Multiported memory

• The number of accesses can be increased using multiport memory

• Typical 2 port memories will have two memory address buses and two data buses

• Thus two different chips need not be used in Harvard architecture

• Disadvantage
  – Increased complexity
  – More pins, more area and increased cost

VLIW architecture

• VLIW – Very long instruction word

• Transmeta Crusoe is a chip that uses this technique

• TMS320C6X also uses a similar technique

• These processors read a relatively large group of instructions

• They execute them at the same time

• For this purpose they have
  – Many ALUs
  – Many multipliers
  – Many shifters etc.

• A very long instruction word is fetched from memory; it specifies the operations and operands for the different data paths

• It simply increases the number of instructions executed per cycle

• The performance gain with VLIW depends on the parallelism achievable with the algorithm

Instruction pipelining

• An instruction may have many phases
  – Fetch
  – Decode
  – Execute
  – Write

• Throughput will be low if all these are executed serially, because when one stage is busy the others are idle

• With pipelining, all these stages operate in parallel on different instructions, which improves throughput
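The throughput gain can be estimated with a small calculation. This sketch assumes an idealized pipeline in which every stage takes one cycle and there are no stalls or branches; the function names are hypothetical.

```python
def serial_cycles(n_instructions: int, n_stages: int = 4) -> int:
    """Each instruction passes through all stages before the next one starts."""
    return n_instructions * n_stages

def pipelined_cycles(n_instructions: int, n_stages: int = 4) -> int:
    """The first instruction fills the pipeline, then one completes per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1

print(serial_cycles(100), pipelined_cycles(100))
```

For long instruction sequences the ratio approaches the number of stages, four in this case.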

Pipelining diagram

Special addressing modes in P-DSPs

• Short immediate addressing

• Short direct addressing

• Memory mapped addressing

• Indirect addressing

• Bit reversed addressing

• Circular addressing

Special addressing modes explained - 1

• Short immediate addressing
  – The operand is specified as a short constant
  – This forms part of the instruction
  – The length depends on the P-DSP
  – Example – TMS320C5X – an 8 bit constant can be used

• Short direct addressing
  – The lower order bits of the operand address are specified as part of the instruction
  – The higher order bits are stored elsewhere – for example in a page pointer
  – Example
    • TI TMS320 DSP – the lower 7 bits are specified in the instruction
    • Motorola DSP5600X – the lower 6 bits are specified in the instruction
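Short direct addressing can be pictured as concatenating a page pointer with the short offset from the instruction. The sketch below assumes that simple concatenation and defaults to the 7 bit offset mentioned above; the function name `direct_address` and the example page values are illustrative only.

```python
def direct_address(page_pointer: int, offset: int, offset_bits: int = 7) -> int:
    """Form the full address from the page pointer and the in-instruction offset."""
    if not 0 <= offset < (1 << offset_bits):
        raise ValueError("offset does not fit in the instruction field")
    return (page_pointer << offset_bits) | offset

print(direct_address(2, 5))                  # 7 bit offset: page 2, word 5
print(direct_address(1, 3, offset_bits=6))   # 6 bit offset: page 1, word 3
```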


• Memory mapped addressing
  – CPU registers and I/O registers are accessed as memory locations
  – This is done by mapping them to the initial or final page
  – Example
    • TMS320C5x – page 0 corresponds to CPU registers and I/O registers
    • Motorola DSP5600X – the last page is used

• Indirect addressing
  – The addresses of operands can be stored in registers called indirect access registers
  – When operands are fetched from the addresses held in these registers, the registers are updated
  – This is done by having another special ALU for updating these addresses
  – The increment can be 1 or an offset held in a special register


• Bit reversed addressing
  – The binary pattern corresponding to a particular decimal number is obtained by writing the natural binary equivalent in reverse order
  – Therefore the LSB becomes the MSB and the MSB becomes the LSB
  – The address is incremented or decremented in bit reversed form

• Circular addressing mode
  – In real time, data arrives continuously
  – If it is stored in a linear buffer, the buffer is eventually exhausted
  – If it is stored in a circular buffer, new data overwrites older data
  – No need to check whether we have reached the end of the buffer
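The bit reversed addressing described above can be reproduced in software to see the access order an FFT uses. A minimal sketch, assuming an 8 point transform [3 address bits]; the function names are hypothetical, and hardware generates these addresses without any loop.

```python
def bit_reverse(index: int, n_bits: int) -> int:
    """Reverse the lowest n_bits of index, so the LSB becomes the MSB."""
    result = 0
    for _ in range(n_bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result

def bit_reversed_order(n_bits: int):
    """Addresses visited when counting 0, 1, 2, ... in bit reversed form."""
    return [bit_reverse(i, n_bits) for i in range(1 << n_bits)]

print(bit_reversed_order(3))  # [0, 4, 2, 6, 1, 5, 3, 7]
```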

Use of linear buffer

Use of circular buffer

Example of circular addressing

Limitations of circular buffering

Methodology for a circular buffer
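One possible software methodology is sketched here, assuming a fixed size Python list and a write index that wraps with the modulo operator. The class `CircularBuffer` is hypothetical; on a P-DSP the wraparound is done by the address generation hardware, so no modulo instruction is executed.

```python
class CircularBuffer:
    """Fixed size buffer in which new data overwrites the oldest data."""

    def __init__(self, size: int):
        self.data = [0] * size
        self.index = 0                      # next position to write

    def write(self, sample):
        self.data[self.index] = sample
        # Wrap to the start instead of checking for the end of the buffer
        self.index = (self.index + 1) % len(self.data)

    def newest_first(self):
        """Samples ordered from most recent to oldest."""
        n = len(self.data)
        return [self.data[(self.index - 1 - k) % n] for k in range(n)]

buf = CircularBuffer(3)
for s in (1, 2, 3, 4):
    buf.write(s)
print(buf.data)            # [4, 2, 3]: sample 4 overwrote sample 1
print(buf.newest_first())  # [4, 3, 2]
```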

On Chip peripherals

• On chip timer
  – They generate periodic interrupts to the DSP
  – They also generate sampling clocks for A/D converters

• Serial port
  – They enable data communication between the P-DSP and peripherals such as an ADC, a DAC or an RS-232C device
  – These ports have buffers: the DSP writes data to and reads data from the port in parallel form, while the port sends and receives the data on the external line in serial form

On Chip peripherals contd..

• TDM serial port
  – A special serial port which permits the P-DSP to communicate with other devices or other P-DSPs using the time division multiplexing format

• Parallel ports
  – They are faster than serial ports

• Bit I-O port
  – These are only a single bit wide
  – They can be individually set, reset or read
  – These bits are used for control purposes and can also be used for data transfer

On Chip peripherals contd..

• Host port
  – A special type of parallel port that P-DSPs have
  – This enables the P-DSP to communicate with a processor or a PC, which is called the host
  – They can communicate data through this
  – They can generate interrupts
  – They also help the P-DSP to load a program from ROM to RAM

• Common ports
  – They are used for communication between many P-DSPs in a multiprocessor system

• On chip ADCs and DACs
  – They enable the P-DSP to communicate with the analog world
  – They are used in cellular phones and tapeless answering machines

TMS320C50

Complex DSP operations

• Sum of products is the key element in most DSP algorithms

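Algorithm | Equation

Finite Impulse Response Filter | $y(n) = \sum_{k=0}^{M} a_k x(n-k)$

Infinite Impulse Response Filter | $y(n) = \sum_{k=0}^{M} a_k x(n-k) + \sum_{k=1}^{N} b_k y(n-k)$

Convolution | $y(n) = \sum_{k=0}^{N} x(k) h(n-k)$

Discrete Fourier Transform | $X(k) = \sum_{n=0}^{N-1} x(n) \exp[-j(2\pi/N)nk]$

Discrete Cosine Transform | $F(u) = \sum_{x=0}^{N-1} c(u) f(x) \cos\left[\frac{\pi}{2N} u(2x+1)\right]$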
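To make the sum of products structure concrete, here is the Discrete Fourier Transform written as a direct double loop. This is a plain O(N^2) sketch using Python's standard `cmath` module, assuming the usual negative exponent convention; it is not how a P-DSP library would compute it, since those typically use an FFT.

```python
import cmath

def dft(x):
    """Direct DFT: each output X(k) is a sum of N products."""
    N = len(x)
    result = []
    for k in range(N):
        acc = 0j
        for n in range(N):
            twiddle = cmath.exp(-2j * cmath.pi * n * k / N)
            acc += x[n] * twiddle        # one complex MAC per term
        result.append(acc)
    return result

# A constant input has all of its energy in X(0)
print([round(abs(v), 6) for v in dft([1, 1, 1, 1])])
```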