
BEngDSP Notes

Date post: 04-Jun-2018
  • 8/13/2019 BEngDSP Notes

    1/181

    1

    U H

    BEng

    School of Engineering & Technology, University of Hertfordshire

    Prof. Talib Alukaidey

    Digital Signal Processing


    Table of Contents

    Outline of Digital Signal Processors

    Digital vs. Analogue Signal Processing
    Why process signals digitally?
    What is Digital Signal Processing?
    What are Digital Signal Processors?
    What are the typical Applications for DSP?
    What do you need to produce a Functional DSP Device?
    The Efficiency of the Assemblers & the Goodies of the Simulators
    High Level Languages and Their Advantages
    Binary Notation in DSP's
    Features Of ADSP-2100 Base Architecture
    ADSP-2100 Family Base Internal Architecture
    ALU
    MAC
    Shifter
    Data Address Generator (DAG) Operations
    Program Sequencer Operations
    ADSP-2100 Family Peripherals
    The Base Architecture of Floating-Point DSP Processor
    The System Architecture
    The Complete Architecture
    What is a Real Time Application?
    Real Time Operating Systems as an Ideal Environment for Embedded Applications
    Compression Techniques and a Compressor and De-Compressor Generator
    Performance Measures
    Data Flow Bottle-necks & Solutions; Pipeline & Parallel Architectures With Examples
    High Performance System Classification Scheme
    SIMD Matrix Multiplication & SIMD FFT
    How To Design SIMD DSP System From The Off-Shelf Fixed-Point DS Processors?
    Multiprocessing With The SHARC
    VLIW Compiler and the DSP Super Computer Architecture Goes Hand in Hand


    Digital vs. Analogue Signal Processing

    Filters are typically used to pick out signals of interest from noise by making use of their differing frequency characteristics.

    Filters can be built from analogue components or from digital components. The following figure shows simple analogue filters:

    [Figure: simple analogue filters. An input x(t) with a broad range of spectral content, X(f), is applied to three passive networks built from R, C and L, giving low-pass (LP), band-pass (BP) and high-pass (HP) outputs YLP, YBP and YHP, shown against time t and as Y(f) against frequency f.]


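    The following figure shows the required components for digital filters:

    [Figure: digital filter chain. A noisy analogue signal passes through a sample-and-hold (S/H) and an A/D converter clocked at the sampling frequency fs, giving a discrete-time, discrete-value signal. A digital processor performs the filter processing, and a D/A converter outputs the clean analogue signal.]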


    Why process signals digitally?

    [Figure: bar chart comparing analogue and digital processing on bandwidth, aging, temperature drift, accuracy, upgrade and prediction; vertical scale 0 to 90.]


    What is Digital Signal Processing?

    For reasons of simplicity and flexibility associated with the binary nature of the electronics, processing of signals is most conveniently done digitally. This major area of electronics, information technology and control engineering is known as Digital Signal Processing.

    Digital Signal Processing is a set of numerical techniques to extract information from discrete-time, discrete-valued signals.


    What are Digital Signal Processors?

    The rapid advances being made in the field of digital component technology are having profound effects on all aspects of digital systems design. Nowhere are these effects being felt more strongly than in the design of high-performance systems for such applications as digital signal processing.

    This part of the DSP2 course brings together a wide variety of logical concepts that impact the design of such systems, which acknowledge and take advantage of modern component technology.

    The term Digital Signal Processors may be interpreted as:

    1- The design of VLSI components intended for use in digital signal processing applications, &
    2- The design of digital signal processing systems that utilise VLSI components.


    [Figure: FIR filtering for speech recognition. Successive input samples 0, 1 and 2 each carry the label "< 300 s"; the filter output is $y(n) = \sum_{i} a(i)\,x(n-i)$, summed over the N filter taps.]


    What are the typical Applications for DSP?

    Communications (Echo Cancellation, Scrambler-Descrambler, etc.)
    Radar
    Imaging
    Speech
    Control
    Geology
    Medical
    and more and more


    SPEECH

    Among The Applications of DSP to Speech are:

    . VOCODERS . SYNTHESIS . ANALYSIS . RECOGNITION

    One of the Largest Applications is in Voice Synthesis:

    [Figure: voice synthesis model. An impulse train generator (set by the pitch period) and a random number generator provide the excitation, which is scaled by an amplitude multiplier and passed through a time-varying digital filter whose coefficients are the vocal tract parameters, producing speech samples.]


    CONTROL

    Control Systems are Finding Applications for DSP

    . Lead/Lag Compensators . Transducer Linearisation . Large Multivariate Systems

    For Example: Feedback Control

    [Figure: feedback control loop with the blocks digital command, digital filter, D/A, dynamic system and A/D feedback.]


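    COMMUNICATION

    Communications Applications of DSP Include:

    . PCM Generation . Tone Detection . Adaptive Echo Cancellers . SSB Generation

    For Example: SSB Via Hilbert Filters

    [Figure: SSB generation. X(t) is digitised by an A/D converter and split into a delay path and a Hilbert filter path; the two paths are multiplied by SIN and COS carriers and combined into Y(n). The spectra X(f) and Y(f) are shown against frequency f.]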


    IMAGING

    Image Processing Applications Include:

    . Deblurring . Data Compression . Scene Analysis . 3-D Reconstruction

    For Example: A Moving Camera Blurs a Picture and can be Modelled as a Low Pass Filter. Deblurring Requires the Inverse Linear Operation.

    [Figure: a scene captured by a moving camera (point spread function) gives a blurred picture; a 2-D inverse filter produces the deblurred picture.]


    MEDICAL

    DSP is Finding New Applications in the Medical Field:

    . Patient Monitoring . Tomography . Blood Flow Velocimeters . EKG Pattern Analysis . X-Ray Enhancement

    For Example: Micro Based Monitor

    [Figure: a commercial fetal monitor feeds a MUX, S/H and A/D converter into a microprocessor with a DSP, which drives a display and a data recorder.]


    What do you need to produce a Functional DSP device?

    Answer: HARDWARE & SOFTWARE

    Real Time DSP applications require choices in both Hardware & Software to produce a functional device.


    [Figure: an APPLICATION drives HARDWARE choices (array processor, microprocessor, DSP chip, special device) and SOFTWARE choices (high level, assembly code, microcode), which combine through ADVANCED CAD TOOLS into a FUNCTIONAL DEVICE.]


    [Figure: CODE GENERATOR. Design Capture: Draw and Specify, then Translator, then Analog Devices Design Implementation.]


    Library of DSP Primitive Functions

    [Figure: library of DSP primitive function blocks: EQ, EXT_IN (IN), GP, AEXPAND, ACOMPRES, AFIR (LMSE FIR with inputs Xn and Dn and output Yn), NOISE, DELAY1 (z^-1), DELAY2 (z^-2), DELAYN (z^-n), MULT, DFT, MINUS and AMP.]


    Proportional Integral Derivative (PID) Compensation Filter

    $$U(t) = K_p\,e(t) + K_d\,\frac{de(t)}{dt} + K_i \int e(t)\,dt$$

    [Figure: block diagram of the PID filter built from library primitives: SOURCE (PROFILE GEN, rate=10000.0), PAR_IN (ENCODER), MINUS1, AMP1 to AMP4 (gain settings gdn=1.0, .7, .5 and 1.19), INT1, DFF1 (d/dt), SUM1, SUM3 and SER OUT1 (des=port2).]
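The PID law on this slide can be sketched in discrete time. This is only an illustration: it assumes a fixed sample period `dt`, rectangular integration and a backward-difference derivative, and the class name `PID` and its parameters are hypothetical, not taken from the notes or the block library.

```python
class PID:
    """Discrete approximation of U = Kp*e + Kd*de/dt + Ki*integral(e dt)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0    # running approximation of the integral of e(t)
        self.prev_error = 0.0  # error at the previous sample

    def step(self, error):
        self.integral += error * self.dt                  # rectangular rule
        derivative = (error - self.prev_error) / self.dt  # backward difference
        self.prev_error = error
        return (self.kp * error
                + self.kd * derivative
                + self.ki * self.integral)


# Constant unit error: the integral term grows by Ki*dt each sample.
pid = PID(kp=1.0, ki=0.5, kd=0.0, dt=0.1)
outputs = [pid.step(1.0) for _ in range(3)]
print(outputs)
```

With the derivative gain set to zero, the output rises by `ki * dt` per sample, which is the expected integral wind-up for a constant error.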


    The Assembler

    The Assembler translates source code, written with an algebraic syntax, into object code. Variables, data buffers, and symbolic constants are defined with the Assembler directives.

    FFT Butterfly Core Example

        LCNTR=r15, Do end_bfly until LCE;
            f8=f1*f6, f14=f11-f14, dm(i2,m0)=f10, f9=pm(i11,m8);
            f11=f1*f7, f3=f9+f14, f9=f9-f14, dm(i2,m0)=f13, f7=pm(i8,m8);
            f14=f0*f6, f13=f8+f12, f8=dm(i0,m0), pm(i10,m10)=f9;
        end_bfly: f12=f0*f7, f13=f8+f12, f10=f8-f12, f6=dm(i0,m0), pm(i10,m10)=f3;


    Due to the following characteristics, highly efficient code can be achieved if an assembler is used:

    Dedicated Purpose
    Assembler is Hardware Slave
    Moderate Data Size
    Instruction Mnemonics, Address Labels
    Simple Arithmetic Operations
    High Speed
    Moderate Ease of Writing and Development
    Moderate Ease of Documentation


    DSP Processor Development Cycle

    [Figure: development-cycle flowchart. START, Define Target Hardware (System Builder), Assemble Module, Link, SIMULATE / EMULATE (repeat as necessary), PROM Splitter, Burn PROMs, Prototype Test (repeat as necessary), END. The cross-software programs exchange .sys, .dsp, .ach, .obj, .cde, .int and .exe files.]


    The Simulator

    Performs interactive, instruction-level simulation of the DSP processor code within the hardware configuration
    Simulates interrupt and I/O handling
    Flags illegal operations
    Supports full symbolic assembly and disassembly
    Displays the internal operations and status of the processor
    Provides an easy-to-use, window oriented, and graphical user interface with commands accessed from pull-down menus with a mouse


    High Level Languages and Their Advantages

    High-Level Languages are:

    C (Compiler)
    C++
    DSP/C (Numerical C) (Compiler)
    ADA


    C Compiler and Runtime Library

    Complies with ANSI Specification
    Incorporates Optimizing Algorithms to Speed Up the Execution of Code
    Includes an Extensive Runtime Library with Typically 100 Standard and DSP-Specific Functions
    Outputs DSP Processor Assembly Language Source Code


    DSP/C Compiler

    Supports ANSI Standard (X3J11.1) Numerical C as Defined by the Numerical C Extensions Group (NCEG)
    Accepts C Source Input Containing Numerical C Extensions for:
        Array Selection
        Vector Math Operations
        Complex Data Types
        Circular Pointers
        Variably Dimensioned Arrays
    Outputs DSP Processor Assembly Language Source Code


    DSP HLLs Advantages are:

    Hardware Transparent (Portability)

    High Level Arithmetic Operations (Complex Math) or Use Library Routines e.g. sin(), fir(), fft()

    Loops, Arrays, Labels, I/O Format

    Searching and Sorting

    Peripheral Intensive System

    Relatively Fast Writing & Development

    Ease of Documentation


    Binary - Hexadecimal - Decimal Number Conversion Table

    Decimal   Hexadecimal   Binary
    0         0             0000
    1         1             0001
    2         2             0010
    3         3             0011
    4         4             0100
    5         5             0101
    6         6             0110
    7         7             0111
    8         8             1000
    9         9             1001
    10        A             1010
    11        B             1011
    12        C             1100
    13        D             1101
    14        E             1110
    15        F             1111


    Signed / Unsigned

    Bit layout (16 bits): S/U U U U U U U U U U U U U U U U

    Unsigned
        0000    0V   - FULL SCALE
        FFFF    5V   + FULL SCALE

    Signed
        8000   -5V   - FULL SCALE
        0000    0V
        7FFF    5V   + FULL SCALE


    2's Complement Representation

    For 2's complement representation, the scale factor for the sign bit of a number is -(2^(M-1)), where M is the number of bits to the left of the binary point. For a 4.2 number, the sign scale is -(2^3) = -8.

    Bit weights for a 4.2 number (sign bit first, binary point after the 2^0 bit):

        -(2^3)  2^2  2^1  2^0  .  2^-1  2^-2

    Example: 0101.01 = 0 * (-8) + 1 * (4) + 0 * (2) + 1 * (1) + 0 * (1/2) + 1 * (1/4)
                     = 5.25

             1101.01 = 1 * (-8) + 1 * (4) + 0 * (2) + 1 * (1) + 0 * (1/2) + 1 * (1/4)
                     = -2.75
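The weighting rule above can be checked with a few lines of Python. This is a minimal sketch; the function name `decode_fixed` and the string input format (bits with a literal binary point) are illustrative assumptions, not part of the notes.

```python
def decode_fixed(bits: str) -> float:
    """Decode a two's-complement fixed-point string such as '1101.01'."""
    int_part, _, frac_part = bits.partition(".")
    digits = int_part + frac_part
    m = len(int_part)                 # M: bits to the left of the binary point
    value = 0.0
    for k, bit in enumerate(digits):
        weight = 2.0 ** (m - 1 - k)   # 2^(M-1), ..., 2^0, 2^-1, ...
        if k == 0:
            weight = -weight          # the sign bit carries -(2^(M-1))
        value += int(bit) * weight
    return value


print(decode_fixed("0101.01"))  # 5.25
print(decode_fixed("1101.01"))  # -2.75
```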


    Fractional versus Integer Notation

    Fractional format is 1.15 notation (radix point after the sign bit):

        S . F F F F F F F F F F F F F F F

    Integer format is 16.0 notation (radix point after the least significant bit):

        S I I I I I I I I I I I I I I I .


    DSP is optimized for fractional notation

    DSP supports integer notation


    Ranges for 16 bit Formats

    All values in decimal. 1.15 is the fractional format and 16.0 is the integer format.

    Format   Largest Positive Value (0x7FFF)   Largest Negative Value (0x8000)   Value of 1 LSB (0x0001)
    1.15     0.999969482421875                 -1.0                              0.000030517578125
    2.14     1.999938964843750                 -2.0                              0.000061035156250
    3.13     3.999877929687500                 -4.0                              0.000122070312500
    4.12     7.999755859375000                 -8.0                              0.000244140625000
    5.11     15.999511718750000                -16.0                             0.000488281250000
    6.10     31.999023437500000                -32.0                             0.000976562500000
    7.9      63.998046875000000                -64.0                             0.001953125000000
    8.8      127.996093750000000               -128.0                            0.003906250000000
    9.7      255.992187500000000               -256.0                            0.007812500000000
    10.6     511.984375000000000               -512.0                            0.015625000000000
    11.5     1023.968750000000000              -1024.0                           0.031250000000000
    12.4     2047.937500000000000              -2048.0                           0.062500000000000
    13.3     4095.875000000000000              -4096.0                           0.125000000000000
    14.2     8191.750000000000000              -8192.0                           0.250000000000000
    15.1     16383.500000000000000             -16384.0                          0.500000000000000
    16.0     32767.000000000000000             -32768.0                          1.000000000000000


    Format Example

    Signal range: +5 V = 0x7FFF, 0 V = 0x0000, -5 V = 0x8000

          Hex       16.0 value   1.15 value         Voltage
    1)    0x7FFF     32767        0.999969482...     5 V
    2)    0x3FFF     16383        0.499969482...     2.5 V
    3)    0x0000     0            0.0000000...       0 V
    4)    0xCCCD    -13107       -0.399993896...    -2.0 V
    5)    0x8000    -32768       -1.0000000...      -5.0 V


    Hexadecimal to Decimal Conversion

    There are two methods for converting Hexadecimal Numbers to Decimal Numbers. One is easy and one is hard.

    HARD WAY: Convert the hexadecimal number to binary. Place the binary point. Multiply each bit of the binary number by its associated scale factor.

    Example: Convert 0x2A00 to a 1.15 twos-complement decimal value

        0x2A00 = 0.010 1010 0000 0000
               = 2^-2 + 2^-4 + 2^-6
               = 0.25 + 0.0625 + 0.015625
               = 0.328125 (approximately 0.33, or 1/3)

    EASY WAY: Use a calculator to convert the hexadecimal number to decimal. Divide the decimal number by 2^N where N is the number of bits to the right of the binary point.

    Example: Convert 0x2A00 to a 1.15 twos-complement decimal value

        0x2A00 = 10752
        10752 / 2^15 = 10752 / 32768 = 0.328125
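The "easy way" maps directly onto code. The sketch below assumes a 16-bit word and takes the number of fractional bits as a parameter; the function name `hex_to_decimal` is hypothetical.

```python
def hex_to_decimal(word: int, frac_bits: int, total_bits: int = 16) -> float:
    """Interpret a two's-complement word with frac_bits bits right of the point."""
    word &= (1 << total_bits) - 1          # keep only the low total_bits bits
    if word & (1 << (total_bits - 1)):     # sign bit set: value is negative
        word -= 1 << total_bits
    return word / (1 << frac_bits)         # divide by 2^N


print(hex_to_decimal(0x2A00, 15))  # 0.328125
```

Passing `frac_bits=0` gives the 16.0 integer interpretation of the same word.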


    Decimal to Hexadecimal Conversion

    There are two methods for converting Decimal Numbers to Hexadecimal numbers. One is easy, and one is hard.

    HARD WAY: Break the decimal number into its 2^N components.

    Example: Convert 0.8125 to a 1.15 twos-complement hexadecimal format

        0.8125 = 1/2 + 1/4 + 1/16

        Weight:  2^0   2^-1  2^-2  2^-3  2^-4  2^-5  2^-6  2^-7
        Value:   1     1/2   1/4   1/8   1/16  1/32  1/64  1/128
        Bit:     0     1     1     0     1     0     0     0      => 0x6800

    EASY WAY: Multiply the decimal number by 2^N where N is the number of bits to the right of the binary point. Then use a calculator to convert to hex.

    Example: Convert 0.8125 to a 1.15 twos-complement hexadecimal format

        0.8125 * 2^15 = 0.8125 * 32768 = 26624 = 0x6800
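The reverse conversion can be sketched the same way. The function name `decimal_to_hex`, the range check and the use of Python's built-in `round` are illustrative choices, not something the notes specify.

```python
def decimal_to_hex(value: float, frac_bits: int, total_bits: int = 16) -> str:
    """Scale by 2^N, round to an integer and show the two's-complement word."""
    scaled = round(value * (1 << frac_bits))
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    if not lowest <= scaled <= highest:
        raise ValueError("value does not fit in the requested format")
    word = scaled & ((1 << total_bits) - 1)   # wrap negatives into the word
    return f"0x{word:0{total_bits // 4}X}"


print(decimal_to_hex(0.8125, 15))  # 0x6800
print(decimal_to_hex(-0.875, 15))  # 0x9000
```

Note that +1.0 is rejected in 1.15 format, since the largest positive word is 0x7FFF.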


    Binary Notation Mini-Quiz


    1) What is 0x4000 (1.15 format) in signed decimal notation?

    2) What is 0x4000 (16.0 format) in signed decimal notation?

    3) What is 0x4000 (0.16 format) in unsigned decimal notation?

    4) What is .875 in hex 1.15 Format?

    5) What is -.875 in hex 1.15 Format?


    Binary Notation Mini-Quiz Answer

    1) What is 0x4000 in 1.15 signed notation? 0.5

    2) What is 0x4000 in 16.0 signed notation? 16384

    3) What is 0x4000 in 0.16 unsigned notation? 0.25

    4) What is .875 in 1.15 Format? 0x7000

    5) What is -.875 in 1.15 Format? 0x9000


    Features Of ADSP-2100 Base Architecture

    Modified Harvard Architecture

    2 Data Address Generators

    Advanced Program Sequencer

    3 Arithmetic Units (ALU/MAC/Shifter)

    Result Bus

    ADSP-2100 Family Base Internal Architecture


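    [Figure: ADSP-2100 family base internal architecture. Data Address Generators #1 and #2 and the Program Sequencer drive the 14-bit PMA and DMA address buses. The 24-bit PMD bus and the 16-bit DMD bus connect to the input and output registers of the ALU, MAC and Shifter, which also share the 16-bit R bus.]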


    ALU

    ALU Block Diagram


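    [Figure: ALU block diagram. The AX registers (2 x 16) and AY registers (2 x 16) are loaded from the DMD bus (16 bits) and the PMD bus (24 bits). Multiplexers select the X and Y inputs of the ALU, with feedback from the R bus and the AF register. The ALU takes a carry-in CI, produces the 16-bit result R for the AF and AR registers, and sets the status flags AZ, AN, AC, AV, AS and AQ.]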


    ALU Features

    4 Input Registers ( AX0, AX1, AY0, AY1 )

    Feedback Paths ( AF, AR, MR0, MR1, MR2, SR0, SR1 )

    Six Status Flags

    Saturation

    Provisions For Double Precision

    Background Registers


    ALU Instruction Examples

    (Programmer's Quick Reference pgs 4-5)

    AR = AX0 + AY0;

    AF = MR1 XOR AY1;

    AR = AX0 + AF;

    IF GE AR = -AR;

    IF AV AR = AY1 + 1;


    ALU Instructions

    [IF Condition] dest = xop + yop ;

    [IF Condition] dest = xop + C ;

    [IF Condition] dest = xop + yop + C ;

    [IF Condition] dest = xop - yop ;

    [IF Condition] dest = xop - yop + C - 1 ;

    [IF Condition] dest = yop - xop ;

    [IF Condition] dest = yop - xop + C - 1;

    [IF Condition] dest = xop AND yop;

    [IF Condition] dest = xop OR yop;

    [IF Condition] dest = xop XOR yop;

    [IF Condition] dest = PASS xop ;

    [IF Condition] dest = PASS yop ;

    [IF Condition] dest = PASS 0;

    [IF Condition] dest = PASS 1;


    [IF Condition] dest = - xop ;

    [IF Condition] dest = - yop ;

    [IF Condition] dest = NOT xop ;

    [IF Condition] dest = NOT yop ;

    [IF Condition] dest = ABS xop ;

    [IF Condition] dest = yop +/-1 ;

    DIVS yop , xop ;

    DIVQ xop ;

    XOP = [AR, MR0, MR1, MR2, SR0, SR1, AX0, AX1]

    YOP = [AY0, AY1, AF]

    dest = [AR, AF]

    Examples: AR = AX0 + AY0;

    AF = NOT AR;

    AF = AX1 + AY0 + C;


    ALU Status Flags

    Flag   Name       Definition
    AZ     Zero       Logical NOR of all bits in ALU result reg. True if ALU output equals 0
    AN     Negative   Sign bit of ALU result. True if ALU output negative
    AV     Overflow   X-OR of carry outputs of 2 most significant adder stages. True if ALU overflows
    AC     Carry      Carry output from most significant adder stage
    AS     Sign       Sign of ALU X input port. Affected only by ABS instruction
    AQ     Quotient   Quotient bit generated only by DIVS and DIVQ
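The definitions of AZ, AN, AV and AC can be modelled for a 16-bit addition. This is a sketch of the flag definitions in the table only, not a model of the real ALU; the function name `alu_add` and the dictionary return format are assumptions, and AS and AQ are left out because they belong to ABS and the divide primitives.

```python
def alu_add(x: int, y: int, carry_in: int = 0):
    """16-bit add returning (result, flags) per the flag definitions above."""
    x &= 0xFFFF
    y &= 0xFFFF
    total = x + y + carry_in
    result = total & 0xFFFF
    carry_out_msb = (total >> 16) & 1   # carry out of the top adder stage
    # Carry out of the second most significant stage (into bit 15).
    carry_into_msb = (((x & 0x7FFF) + (y & 0x7FFF) + carry_in) >> 15) & 1
    flags = {
        "AZ": int(result == 0),                 # NOR of all result bits
        "AN": (result >> 15) & 1,               # sign bit of the result
        "AV": carry_out_msb ^ carry_into_msb,   # XOR of the two top carries
        "AC": carry_out_msb,
    }
    return result, flags


result, flags = alu_add(0x7FFF, 0x0001)
print(hex(result), flags)
```

Adding 1 to 0x7FFF wraps to 0x8000, so AN and AV are set while AC stays clear.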


    Arithmetic Conditions

    EQ:      ALU Result = 0
    NE:      ALU Result ≠ 0
    GT:      ALU Result > 0
    GE:      ALU Result ≥ 0
    LT:      ALU Result < 0
    LE:      ALU Result ≤ 0
    NEG:     XOP Input Negative (Absolute Value Instruction Only)
    POS:     XOP Input Positive (Absolute Value Instruction Only)
    AV:      ALU Overflow Bit Set
    Not AV:  ALU Overflow Bit Not Set
    AC:      ALU Carry Bit Set
    Not AC:  ALU Carry Bit Not Set
    MV:      MAC Overflow Bit Set
    Not MV:  MAC Overflow Bit Not Set
    Not CE:  Not Counter Expired

    ALU Saturation


    Sets ALU result to full scale positive or full scale negative if overflow or underflow occurs
    Feature enabled by executing ena ar_sat (bit 3 of MSTAT)
    Once enabled, affects every ALU operation
    Only affects results sent to AR (AF - flags still get set)
    Overflow or underflow determined by the following conditions:

    Overflow (AV)   Carry (AC)   AR Contents
    0               0            ALU Output
    0               1            ALU Output
    1               0            0x7FFF (full-scale positive)
    1               1            0x8000 (full-scale negative)

    ALU Overflow Latch Mode

    Causes AV status flag to become sticky. Need to explicitly clear.
    Feature enabled by executing ena av_latch (bit 2 of MSTAT)
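The saturation table can be exercised with a small model of a 16-bit add. This is a sketch under the assumption that saturation is always enabled; the function name `add16_saturating` is hypothetical and the latch mode is not modelled.

```python
def add16_saturating(x: int, y: int) -> int:
    """16-bit two's-complement add with the AR saturation rule from the table."""
    x &= 0xFFFF
    y &= 0xFFFF
    total = x + y
    result = total & 0xFFFF
    ac = total >> 16                                   # carry out of bit 15
    same_sign_inputs = ((x ^ y) & 0x8000) == 0
    sign_changed = ((x ^ result) & 0x8000) != 0
    av = same_sign_inputs and sign_changed             # signed overflow
    if av:
        # AV=1, AC=0 -> full-scale positive; AV=1, AC=1 -> full-scale negative
        result = 0x8000 if ac else 0x7FFF
    return result


print(hex(add16_saturating(0x7FFF, 0x0001)))  # 0x7fff
print(hex(add16_saturating(0x8000, 0xFFFF)))  # 0x8000
```

Without saturation the first sum would wrap to 0x8000 (a large negative value) and the second to 0x7FFF.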


    ALU Mini-Quiz

    Write The ADSP-2100 Code To Perform The Following Operations:

    1) Add 0x0030 to 0x0070 And Store Result in AF.

    Hint:

    = 0x0070 ;

    = 0x0030 ;

    AF = + ;

    2) Find The Logical AND Of 0x1234 And 0xF00F.

    Store The Result In AR.


    ALU Mini-Quiz

    Write The ADSP-2100 Code To Perform The Following Operations:

    1) Add 0x0030 to 0x0070 And Store Result in AF.

    Hint:

    AX0 (or AX1) = 0x0070 ;

    AY0 (or AY1)= 0x0030 ;

    AF = AX0 + AY0 ;

    2) Find The Logical AND Of 0x1234 And 0xF00F.

    Store The Result In AR.

    AY1 = 0x1234;

    AR = 0xF00F;

    AR = AR AND AY1;


    MAC

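    MAC Block Diagram

    [Figure: MAC block diagram. The MX registers (2 x 16) and MY registers (2 x 16) are loaded from the DMD bus (16 bits) and the PMD bus (24 bits). Multiplexers select the X and Y inputs of the multiplier, with feedback from the R bus and the MF register. The 32-bit product P feeds a 40-bit add/subtract stage whose result is held in MR2 (8 bits), MR1 (16 bits) and MR0 (16 bits), together with the overflow flag MV.]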


    MAC Features

    40 Bit Accumulator

    Saturation

    Complete Set of Background Registers

    Mixed Mode Input Operands for Multiprecision

    Feedback Paths

    Access to R-Bus, DM and PM


    MAC Instruction Examples

    (Programmer's Quick Reference pgs 6-7)

    MR = MX1 * MY0 (SS);

    MF = AR * MY1 (SS);

    MR = MR + AR * MY1 (SS);

    MR = 0;

    IF MV SAT MR;

    IF EQ MR = MX0 * MY0 (UU);


    MAC Instructions

    [IF condition] dest = xop * yop (format);

    [IF condition] dest = MR + xop * yop (format);

    [IF condition] dest = MR - xop * yop (format);

    [IF condition] dest = 0;

    [IF condition] dest = MR [ (RND)];

    Where:

    condition = arithmetic conditions

    dest = {MR, MF}

    format = {SS, US, SU, UU, RND}

    XOP = {MX0, MX1, MR2, MR1, MR0, AR, SR0, SR1}

    YOP = {MY0, MY1, MF}


    Placement of Binary Point in Multiplication

    Binary Integer Multiplication

        M Bits x P Bits => M+P Bits

        Example: 16.0 x 16.0 => 32.0

    Mixed/Fractional Multiplication

        M.N Bits x P.Q Bits => (M+P).(N+Q) Bits

        Example: 1.15 x 1.15 => 2.30
                 4.12 x 1.15 => 5.27
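The format rule is simple enough to state as code. In this sketch the formats are passed as strings such as "4.12"; the function name `product_format` is an illustrative assumption.

```python
def product_format(a: str, b: str) -> str:
    """Format of the product of an M.N number and a P.Q number."""
    m, n = (int(part) for part in a.split("."))
    p, q = (int(part) for part in b.split("."))
    return f"{m + p}.{n + q}"   # (M+P).(N+Q)


print(product_format("1.15", "1.15"))  # 2.30
print(product_format("4.12", "1.15"))  # 5.27
```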


    Multiplication Modes on the ADSP-21xx

    Mode 1: Fractional Mode

    Multiplier Assumes all numbers in a 1.15 Format
    Multiplier Automatically 1-bit Left Shifts Product Before Accumulation (Result Forced to 1.31 Format)

    Example: MR = MX0 * MY1 (SS);

        MX0 = 0x4000, MY1 = 0x4000
        MR2 MR1 MR0 = 0x00 2000 0000
        MR1 = 0x2000 is the 16-bit result; MR2 holds overflow and MR0 holds underflow.


    Mode 2: Integer Mode

    Multiplier Assumes all numbers in a 16.0 Format
    No automatic left-shift necessary

    Example: MR = MX0 * MY1 (SS);

        MX0 = 0x4000, MY1 = 0x4000
        MR2 MR1 MR0 = 0x00 1000 0000
        MR0 = 0x0000 is the 16-bit result; MR1 and MR2 hold overflow.
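The two modes differ only in the 1-bit left shift, which the following sketch makes explicit. It models a signed-by-signed (SS) multiply into a 40-bit result; the names `to_signed16` and `mac_multiply` are hypothetical and nothing else about the MAC is modelled.

```python
def to_signed16(word: int) -> int:
    """Interpret a 16-bit word as a two's-complement integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def mac_multiply(mx: int, my: int, fractional: bool = True):
    """Signed x signed multiply; returns (MR2, MR1, MR0)."""
    product = to_signed16(mx) * to_signed16(my)
    if fractional:
        product <<= 1                  # fractional mode: 1-bit left shift
    mr = product & 0xFFFFFFFFFF        # 40-bit two's-complement result
    return (mr >> 32) & 0xFF, (mr >> 16) & 0xFFFF, mr & 0xFFFF


print([hex(v) for v in mac_multiply(0x4000, 0x4000, fractional=True)])
print([hex(v) for v in mac_multiply(0x4000, 0x4000, fractional=False)])
```

In fractional mode 0.5 x 0.5 leaves 0x2000 (0.25 in 1.15 format) in MR1, while integer mode leaves the 32-bit product 0x1000 0000 across MR1 and MR0.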

    Multiplication on the ADSP-21xx


    To Switch Modes:

        ENA M_MODE; {Select Integer Mode} *
        DIS M_MODE; {Select Fractional Mode}

    MSTAT Register holds value

    Fractional Mode the Default on Reset/Power-up

    * Integer Mode Not Available on ADSP-2100A

    Rounding in the MAC


    Rounding can be specified as part of multiply instruction (RND)

    Rounding only applies to fixed point fractional results

    40-bit results "rounded to nearest" 16 bit value.

    Rounded result can be placed in MR1 or MF register

    Input: MX0 = 0x7FF9, MY0 = 0xEEEE

    Command                  MR2   MR1    MR0
    MR = MX0 * MY0 (SS);     FF    EEEE   EEFC
    MR = MX0 * MY0 (RND);    FF    EEEF   6EFC
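The two rows of the table can be reproduced with a short model. This sketch assumes that rounding is implemented by adding 0x8000 (half of one MR1 LSB) to the fractional-mode product, and it ignores any special treatment of exact midpoints; the function names are hypothetical.

```python
def to_signed16(word: int) -> int:
    """Interpret a 16-bit word as a two's-complement integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def multiply_fractional(mx: int, my: int, rnd: bool = False):
    """Fractional-mode multiply; returns (MR2, MR1, MR0)."""
    product = (to_signed16(mx) * to_signed16(my)) << 1
    if rnd:
        product += 0x8000              # add half an LSB of MR1
    mr = product & 0xFFFFFFFFFF        # 40-bit two's-complement result
    return (mr >> 32) & 0xFF, (mr >> 16) & 0xFFFF, mr & 0xFFFF


for rnd in (False, True):
    mr2, mr1, mr0 = multiply_fractional(0x7FF9, 0xEEEE, rnd)
    print(f"{mr2:02X} {mr1:04X} {mr0:04X}")
```

The rounded 16-bit answer is then read from MR1 (0xEEEF instead of 0xEEEE).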

    Saturation and Overflow


    Overflow occurs when sign bit is corrupted during accumulation

    Overflow Status signal (MV) is updated every time a MAC operation is

    executed

    MV is set when MSB of MR2 does not equal MSB of MR1

    Saturation is performed by the following instruction:

        IF MV SAT MR;

    Input: MX0 = 0x7FFF, MY0 = 0x7FFF, MR = 00 7FFE 0002

    Command                       MR2   MR1    MR0
    MR = MR + MX0 * MY0 (SS);     00    FFFC   0004
    IF MV SAT MR;                 00    7FFF   FFFF
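The accumulate-then-saturate sequence can be traced in a few lines. The sketch uses the MV rule exactly as the slide states it (MSB of MR2 differs from MSB of MR1) and assumes fractional mode; all function names are hypothetical.

```python
MASK40 = (1 << 40) - 1


def to_signed16(word: int) -> int:
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def mac_accumulate(mr: int, mx: int, my: int) -> int:
    """MR = MR + MX * MY (SS) in fractional mode, kept to 40 bits."""
    return (mr + ((to_signed16(mx) * to_signed16(my)) << 1)) & MASK40


def mv_flag(mr: int) -> bool:
    """MV as described on the slide: MSB of MR2 differs from MSB of MR1."""
    return ((mr >> 39) & 1) != ((mr >> 31) & 1)


def saturate(mr: int) -> int:
    """IF MV SAT MR: clamp to the largest 32-bit value of the correct sign."""
    if not mv_flag(mr):
        return mr
    return 0xFF80000000 if (mr >> 39) & 1 else 0x007FFFFFFF


def split(mr: int):
    return (mr >> 32) & 0xFF, (mr >> 16) & 0xFFFF, mr & 0xFFFF


mr = mac_accumulate(0x007FFE0002, 0x7FFF, 0x7FFF)
print(split(mr), mv_flag(mr))
print(split(saturate(mr)))
```

The sum is still correct in the 40-bit accumulator; saturation only matters when the result has to be read back as a 32-bit or 16-bit value.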

    MAC Mini-Quiz


    Write an ADSP-2101 Program to add the values in AX0 and AY0 and to multiply

    the result by 0x20.

    AX0 = 0x0020;

    AY0 = 0x0010;

    AR = _______________

    ___ = _______________

    ____=_______ * _________

    Binary Multiply Mini-Quiz


    What is the ADSP-21xx Multiplier Output? (Hint: The Output is 32 Bits Wide)

                         Fractional Mode    Integer Mode
    0x1240 * 0x0001
    0x4000 * 0x4000
    0x4000 * 0x0002

    MAC Mini-Quiz


    Write an ADSP-2101 Program to add the values in AX0 and AY0 and to multiply

    the result by 0x20.

    AX0 = 0x0020;

    AY0 = 0x0010;

    AR = AX0 + AY0;

    MY0 = 0x20;

    MR = AR * MY0 (SS);

    Binary Multiply Mini-Quiz


    What is the ADSP-21xx Multiplier Output? (Hint: The Output is 32 Bits Wide)

                         Fractional Mode    Integer Mode
    0x1240 * 0x0001      0x0000 2480        0x0000 1240
    0x4000 * 0x4000      0x2000 0000        0x1000 0000
    0x4000 * 0x0002      0x0001 0000        0x0000 8000


    Shifter

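    Shifter Block Diagram

    [Figure: shifter block diagram. The SI register (16 bits) is loaded from the DMD bus, and multiplexers select the shifter array input from SI or the R bus. The shifter array is controlled by the HI/LO reference and by a shift value taken from the SE register (8 bits, optionally negated) or from the instruction. Its 32-bit output passes through OR/PASS logic into the SR1 and SR0 registers (16 bits each). An exponent detector and block exponent logic are also shown.]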

    Shifter Features


    16 Bit Input Value Gets Stored Anywhere in a 32 Bit Output Field

    All Shift Instructions Execute in a Single Instruction Cycle

    Specify Immediate Shift Value within Instruction or indirectly in

    the SE register

    Normalize, Denormalize, and Exponent Detect Instructions Used

    For Block Floating Point and Floating Point Operations

    Shifter Instruction Examples


    (Programmer's Quick Reference pgs 8-9)

    SR = ASHIFT SI BY -3 (LO);

    SR = LSHIFT AR BY 6 (HI);

    SR = SR OR LSHIFT SR1 (LO);

    Shifter Instructions

    Shift Immediate Instructions


    SR = [SR OR] ASHIFT xop BY data (alignment);

    SR = [SR OR] LSHIFT xop BY data (alignment);

    Shift By Value in SE Register

    [IF condition] SR = [SR OR] ASHIFT xop (alignment);

    [IF condition] SR = [SR OR] LSHIFT xop (alignment);

    Where:

    condition = Arithmetic Condition

    xop = {SI, SR0, SR1, MR2, MR1, MR0, AR}

    alignment = {HI, LO}

    data = -32 ... 32

    Arithmetic Shift Sign Extends Right Shifts

    Logical Shift Zero fills Right Shifts

    Left Shifts Are Always Zero Filled

    Positive SE or data Values Shift Left

    Negative SE or data Values Shift Right

    NO "+" for Positive Shifts

    Using the Shift Instructions


    Placement of Output Depends on HI/LO Modifier, SE Register and Value

    Refer to Table 2.4 In ADSP-21xx Users Manual

    Example 1: SR = LSHIFT SI BY -12 (LO);

    Before:
        SI  = 1110 1010 0011 0101
        SE  = xxxx xxxx
        SR1 = xxxx xxxx xxxx xxxx
        SR0 = xxxx xxxx xxxx xxxx

    After:
        SI  = 1110 1010 0011 0101
        SE  = xxxx xxxx
        SR1 = 0000 0000 0000 0000
        SR0 = 0000 0000 0000 1110

    Immediate Shift Instructions


    Example 2: SR = LSHIFT SI BY -12 (HI);

    Before:
        SI  = 1110 1010 0011 0101
        SE  = xxxx xxxx
        SR1 = xxxx xxxx xxxx xxxx
        SR0 = xxxx xxxx xxxx xxxx

    After:
        SI  = 1110 1010 0011 0101
        SE  = xxxx xxxx
        SR1 = 0000 0000 0000 1110
        SR0 = 1010 0011 0101 0000

    Shift Instructions with SE Register


    Example 3: SR = LSHIFT SI (HI);

    Before:
        SI  = 1110 1010 0011 0101
        SE  = 1111 0100 (-12)
        SR1 = xxxx xxxx xxxx xxxx
        SR0 = xxxx xxxx xxxx xxxx

    After:
        SI  = 1110 1010 0011 0101
        SE  = 1111 0100 (-12)
        SR1 = 0000 0000 0000 1110
        SR0 = 1010 0011 0101 0000

    Shift Instructions with OR Functionality


    Example 4: SR = SR OR LSHIFT SI (HI);

    Before:
        SI  = 1110 1010 0011 0101
        SE  = 1111 0100 (-12)
        SR1 = 0000 0000 0000 0000
        SR0 = 0000 0000 0000 0101

    After:
        SI  = 1110 1010 0011 0101
        SE  = 1111 0100 (-12)
        SR1 = 0000 0000 0000 1110
        SR0 = 1010 0011 0101 0101
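The four examples can be checked with a small behavioural model. The sketch below is an illustration in Python, not vendor code: the function name `shifter` and its parameters are hypothetical, and it assumes that (HI) places the 16-bit input in the upper half of the 32-bit field before shifting and (LO) places it in the lower half.

```python
def shifter(xop, exp, align, arithmetic=False, sr_or=0):
    """Return the 32-bit SR value (SR1 in the upper half, SR0 in the lower half).

    xop        16-bit input value
    exp        shift amount: positive shifts left, negative shifts right
    align      "HI" or "LO" reference point for the shift
    arithmetic True for ASHIFT (sign-extend), False for LSHIFT (zero-fill)
    sr_or      previous SR contents when the [SR OR] option is used
    """
    xop &= 0xFFFF
    value = xop - 0x10000 if (arithmetic and xop & 0x8000) else xop
    if align == "HI":
        value <<= 16
    if exp >= 0:
        value <<= exp
    else:
        value >>= -exp  # Python's >> on a negative int sign-extends
    return (value & 0xFFFFFFFF) | sr_or


si = 0b1110101000110101  # 0xEA35, the SI value used in the examples
print(f"Example 1: {shifter(si, -12, 'LO'):#010x}")
print(f"Example 2: {shifter(si, -12, 'HI'):#010x}")
print(f"Example 4: {shifter(si, -12, 'HI', sr_or=0x00000005):#010x}")
```

Example 3 gives the same result as Example 2, since the only difference is that the shift value comes from SE instead of the instruction.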

    Shifter Mini-Quiz


    Write ADSP-2101 Code to:

    Write 0x0034 into the AR register

    Write 0x0012 into the SI register

    Shift AR into the MS bits of SR0 (SR0 = 0x3400)

    Shift SI into the LS bits of SR0

    Hint: 4 Instructions. SR1 = 0x0000, SR0 = 0x3412 When Done

    Shifter Mini-Quiz Answers


    Solution 1:

    AR = 0x0034;

    SI = 0x0012;

    SR = ASHIFT AR BY 8 (LO);

    SR = SR OR ASHIFT SI BY 0 (LO);

    Solution 2:

    AR = 0x0034;

    SI = 0x0012;

    SR = LSHIFT AR BY -8 (HI);

    SR = SR OR ASHIFT SI BY -16 (HI);

    ADSP-2100 Family Base Internal Architecture

    [Block diagram: Program Sequencer, Data Address Generator #1, Data Address Generator #2, ALU, MAC and Shifter (each computational unit with input and output registers), connected by the PMA bus (14 bits), DMA bus (14 bits), PMD bus (24 bits), DMD bus (16 bits) and R bus (16 bits).]

    Data Address Generator (DAG) Operations


    Registered Indirect Addressing

    Automatic Post-Modify of Address

    Circular Buffering

    DAG 1 Fetches/Stores to Data Memory

    DAG 2 Fetches/Stores to Data or Program Memory

    Bit-Reverser For FFT Support (DAG 1 Only)

    Data Address Generator Block Diagram

    [Block diagram. Labelled elements: L registers (4 x 14), I registers (4 x 14), M registers (4 x 14), modulus logic, adder, multiplexer, bit reverse (DAG1 only), DMD bus, 2-bit register selects from the instruction, and a 14-bit address output.]

    DAG Features


    Data Fetch/Store Executes Simultaneously With Arithmetic Instruction

    2 DAGS In Processor

    4 Index Address Registers Per DAG

    4 Modify Registers Per DAG

    4 Length Registers Per DAG

    Any Modifier Register in DAG can be Used With Any

    Index Register in DAG

    Example DAG Instructions


    (Programmer's Quick Reference pg 10)

    AX0 = DM(0X3800);

    AX0 = DM(I0, M3);

    MODIFY (I4, M5);

    AX1 = DM(I2,M3), AY0 = PM(I4,M7);

    MR=MR+MX0 * MY0 (SS), MX0 = DM(I2,M2), MY0 = PM(I6,M6);

    Note: L Registers Must Be 0 If Circular Buffers Are Not Used


    Modulo Addressing Example

    Circular buffer occupying addresses H#0030 to H#0037

    I0 = Current Address
    M0 = Modify Value (3)
    L0 = Buffer Length (8)
    Base Address = H#0030

    Address Sequence: 30, 33, 36, 31, 34, 37, 32, 35

    Modulo Addressing Code Example


    .VAR/DM/CIRC/ABS=0X30 Buff [8];   /*Define Buffer */
    I0 = ^Buff;                       /*I0 = Start address of Buff */
    L0 = %Buff;                       /*L0 = Length of Buff */
    M0 = 3;                           /*Modify value = 3 */
    AX0 = DM (I0, M0);                /*Fetch data at address 30 */
    AY0 = DM (I0, M0);                /*Fetch data at address 33 */
    AX1 = DM (I0, M0);                /*Fetch data at address 36 */
    AY1 = DM (I0, M0);                /*Fetch data at address 31 */

    Bit Reversal with the ADSP-2100 Family


    Only available with DAG1

    Enabled by setting bit 1 of MSTAT register or using the instruction ENA BIT_REV

    Reverses all 14 bits of address

    Normal order: 13 12 11 10 09 08 07 06 05 04 03 02 01 00
    Bit-reversed: 00 01 02 03 04 05 06 07 08 09 10 11 12 13

    For an FFT of size 2^N, set M register to 2*2^(14-N) *

    * x2 because FFT output has real and imaginary data interleaved

    i.e. 256 FFT = 2^8 FFT, M = 2*2^(14-8) = 2*2^6 = 128
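As an illustration of the two rules above, here is a short Python sketch of a 14-bit address reversal and of the modifier formula. The function names are hypothetical; the code only mirrors the arithmetic on the slide.

```python
def bit_reverse_14(addr):
    """Reverse the order of the 14 address bits (bit 13 <-> bit 0, and so on)."""
    result = 0
    for _ in range(14):
        result = (result << 1) | (addr & 1)
        addr >>= 1
    return result


def fft_modifier(log2_size):
    """M register value for an FFT of size 2**log2_size with interleaved data."""
    return 2 * 2 ** (14 - log2_size)


print(f"{bit_reverse_14(0x0001):#06x}")  # lowest bit moves to bit 13
print(fft_modifier(8))                   # 256-point FFT
```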

    DAGS Mini-Quiz


    Data Memory, starting at DM(0x3800): 0x1234, 0x1234, 0x1234, 0x1234, 0x1234

    Write the ADSP-2101 Instructions to Find the Sum of the N=5 Numbers Stored in Data Memory

    Hint:
    Use Multifunction Instructions
    Nine Instructions Total
    3 Instructions are Repeated

    Questions:
    1) How Many Instruction Cycles Are Required?
    2) How Many Instruction Cycles are Required if N=100?
    3) Is this an Efficient Use of the Processor?

    DAGS Mini-Quiz Answer

    .module/boot = 0 dags_mini_quiz;
    .var/dm/circ data_buf [5];

    start:
        i0 = ^data_buf;                      /*Load DAG Registers */
        l0 = %data_buf;
        m3 = 1;
        ar = dm (i0, m3);                    /*Load first data value */
        ay0 = dm (i0, m3);                   /*Load second data value */
        ar = ar + ay0, ay0 = dm (i0, m3);    /*Add and load third value */
        ar = ar + ay0, ay0 = dm (i0, m3);    /*Add and load fourth value */
        ar = ar + ay0, ay0 = dm (i0, m3);    /*Add and load fifth value */
        ar = ar + ay0;                       /*Last addition */

    .endmod;

    1) 9 Cycles

    2) 104 Cycles

    3) No, it would waste program memory
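The cycle counts in answers 1 and 2 follow from counting instructions in the straight-line program, assuming one cycle per instruction. A small Python sketch of that count (the function name is hypothetical):

```python
def straight_line_cycles(n):
    """Cycles for the unrolled sum of n values, one cycle per instruction."""
    setup = 3             # i0, l0 and m3 loads
    first_loads = 2       # load first value into ar, second into ay0
    add_and_load = n - 2  # one multifunction instruction per remaining value
    last_add = 1          # final addition with no further load
    return setup + first_loads + add_and_load + last_add


print(straight_line_cycles(5), straight_line_cycles(100))
```

The count grows as N + 4, and so does the number of program memory locations, which is the point of answer 3.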

    Program Sequencer Block Diagram

    [Block diagram. Labelled elements: DMD bus (16 bits), PMA bus (14 bits), instruction register input, interrupt controller (IRQ inputs), condition logic, status logic, counter logic (CE), loop stack (4 x 18), loop comparator, PC stack (16 x 14), program counter, incrementer, next address source select and next address MUX.]

    Program Sequencer Operations

    Zero Overhead Looping


    Conditional/Unconditional Branches

    Interrupt Handling

    Counter and Status Stacks

    Next Instruction Address Generation

    Program Sequencer Features

    Automatic Operation, Transparent to User

    Single Cycle Conditional Branches

    4-Deep Loop, Counter Stack

    16-Deep PC Stack

    Sequencer Instructions

    (Programmer's Quick Reference pgs 12)


    [ IF condition ] JUMP address;
    [ IF condition ] CALL address;
    [ IF condition ] RTS;
    [ IF condition ] RTI;
    IF flag_condition CALL address;
    IF flag_condition JUMP address;
    SET / TOGGLE / RESET FLAG_OUT;

    Where:
    condition = Branch Condition
    address = {(I4), (I5), (I6), (I7), immediate address}
    flag_condition = {FLAG_IN, NOT FLAG_IN}

    Program Loop Example


    General Form:

    DO LABEL UNTIL CONDITION

    Example:

    CNTR=10;
    DO ENDLOOP UNTIL CE;
        { First Loop Instruction } ;             (Address Pushed On PC Stack)
        { Next Loop Instruction } ;
    ENDLOOP: { Last Loop Instruction } ;         (Address Pushed On LOOP Stack)
    { First Instruction Outside Loop } ;

    Interrupt Handling


    Interrupts Can Be Generated By An External Interrupt Signal Or 2100 Family Peripherals (Timer, SPORT, HIP, etc.)

    External Interrupts (IRQx) Can Be Level Or Edge Sensitive (ICNTL)

    Interrupts Have Priority And Can Be Nested

    Interrupts Can Be Masked (IMASK)

    Interrupts Can Be Forced Or Cleared Under Software Control (IFC) *

    Different Family Members Have Different Interrupt Vector Tables

    Interrupt Vector Table Always Begins At PM Address 0x0000

    * Except ADSP-2100A


    Interrupts & Interrupt Vector Addresses

    ADSP-2101

    Interrupt Source            Interrupt Vector Address
    Program startup at RESET    0x0000
    IRQ2                        0x0004 (highest priority)
    SPORT0 Transmit             0x0008
    SPORT0 Receive              0x000C
    SPORT1 Transmit / IRQ1      0x0010
    SPORT1 Receive / IRQ0       0x0014
    Timer                       0x0018 (lowest priority)

    ADSP-2105

    Interrupt Source            Interrupt Vector Address
    Program startup at RESET    0x0000
    IRQ2                        0x0004 (highest priority)
    SPORT1 Transmit / IRQ1      0x0010
    SPORT1 Receive / IRQ0       0x0014
    Timer                       0x0018 (lowest priority)

    0x00140x0018

    0x001C

    0x001C

    Sequencer Mini-Quiz


    Modify the answer of the DAGS Mini-Quiz to use a zero-overhead loop.

    Assume N=100. Your program should require 9 Instruction Locations

    Data Memory, starting at DM(0x3800): 0x1234, 0x1234, 0x1234, 0x1234, 0x1234, 0x1234, ...

    Write the ADSP-2101 Instructions to Find the Sum of the N=100 Numbers Stored in Data Memory

    Sequencer Mini-Quiz Answer


    .module/boot = 0 sequencer_mini_quiz;
    .const buf_len = 100;
    .var/dm/circ/abs=0x3800 data_buf [buf_len];

    start:
        i0 = ^data_buf;                      /*Load address of data buf */
        l0 = %data_buf;                      /*Load length of data buf */
        m3 = 1;
        cntr = buf_len - 2;                  /*Load counter */
        ar = dm (i0, m3);                    /*Load first data value */
        ay0 = dm (i0, m3);                   /*Load second data value */
        do add_loop until ce;
    add_loop:
            ar = ar + ay0, ay0 = dm (i0, m3);    /*Add and load next value */
        ar = ar + ay0;                       /*Last addition */
        rts;

    .endmod;
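To see why the counter is loaded with buf_len - 2, the loop can be emulated step by step. The Python sketch below is only an illustration: `sum_with_loop` is a hypothetical name, the list index stands in for I0 with M3 = 1, and the 16-bit wrap of AR on overflow is an assumption about the ALU mode.

```python
def sum_with_loop(data):
    """Emulate the zero-overhead-loop summation for a buffer of 2 or more values."""
    n = len(data)
    idx = 0
    cntr = n - 2            # two values are loaded before the loop starts

    ar = data[idx]          # load first data value
    idx = (idx + 1) % n     # circular post-modify
    ay0 = data[idx]         # load second data value
    idx = (idx + 1) % n

    for _ in range(cntr):
        # multifunction instruction: the add uses the old ay0 while the
        # data fetch loads the next value into ay0
        next_value = data[idx]
        idx = (idx + 1) % n
        ar = (ar + ay0) & 0xFFFF
        ay0 = next_value

    ar = (ar + ay0) & 0xFFFF  # last addition
    return ar


print(sum_with_loop(list(range(1, 101))))
```

The loop body runs 98 times for N = 100, yet occupies a single program memory location.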

    ADSP-2100 Family Peripherals


    Memory Interfacing

    Timer

    Serial Ports


    ADSP-21xx Family Memory Interface

    ADSP-2101 Basic System Configuration

    [Figure: ADSP-2101 basic system configuration. The ADSP-2101 (CLKIN and XTAL driven by a clock or crystal, CLKOUT, RESET, IRQ2, BR, BG, MMAP, VDD, GND) drives a 14-bit ADDRESS bus and a 24-bit DATA bus with the PMS, DMS, BMS, RD and WR strobes. These connect to an optional BOOT EPROM (27C64, 27C128, 27C256 or 27C512, 150 ns), optional PROGRAM MEMORY, and optional DATA MEMORY & PERIPHERALS, each with A, D, CS, OE and WE pins as applicable. SERIAL PORT 0 (SCLK, RFS, TFS, DT, DR) and SERIAL PORT 1 (SCLK, RFS or IRQ0, TFS or IRQ1, DT or FO, DR or FI) each connect to an optional serial device.]

    ADSP-21xx Family Memory Architecture


    Varied Memory Configurations Across Family Members*

    Core Can Access PM Twice and DM Once Per Instruction

    PM and DM Buses Multiplexed Off Chip*

    Can Perform One Off-Chip Access with No Cycle Penalty

    On Chip PM Can Be Initialized Through Boot EPROM or

    Host Interface Port

    External EPROM Can Store 8 Pages of Bootable Code.

    Software Programmable Wait States

    * Does not apply to ADSP-2100A

    On Chip Memory Configurations For

    ADSP-21xx Processors

    Processor       Program Memory RAM   Program Memory ROM   Data Memory RAM
    ADSP-2100A      -                    -                    -
    ADSP-2101       2k                   -                    1k
    ADSP-2103       2k                   -                    1k
    ADSP-2105       1k                   -                    1/2k
    ADSP-2111       2k                   -                    1k
    ADSP-2115       1k                   -                    1/2k
    ADSP-21msp5x    2k                   2k                   1k
    ADSP-2161/63    -                    8k/4k                1/2k
    ADSP-2171/73    2k                   8k                   2k
    ADSP-2181       16k                  -                    16k

    ADSP-2101 Program Memory Architecture

    MMAP = 0 (Boot)
    0x0000 - 0x07FF   Internal PM RAM, Booted From External Boot Memory (Reset Vector at 0x0000)
    0x0800 - 0x3FFF   External Program Memory

    MMAP = 1 (No Boot)
    0x0000 - 0x37FF   External Program Memory (Reset Vector at 0x0000)
    0x3800 - 0x3FFF   Internal PM RAM, Not Booted

    ADSP-21xx Data Memory Architecture


    Address Range     Region
    0x0000 - 0x03FF   1K External, DWAIT0
    0x0400 - 0x07FF   1K External, DWAIT1
    0x0800 - 0x2FFF   10K External, DWAIT2
    0x3000 - 0x33FF   1K External, DWAIT3
    0x3400 - 0x37FF   1K External, DWAIT4
    0x3800 - 0x3BFF   Internal Data Memory RAM
    0x3C00 - 0x3FFF   Memory Mapped and Reserved Registers

    ADSP-2171

    Internal Data

    Memory

    RAM

    ADSP-21xx Memory Control Registers


    Data Memory Wait State Control Register DM(0x3FFE)

    Fields: DWAIT4, DWAIT3, DWAIT2, DWAIT1, DWAIT0
    Bit values shown: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

    System Control Register DM(0x3FFF)

    PWAIT: Program Memory Wait States
    BWAIT: Boot Memory Wait States*
    BPAGE: Boot Page Select
    BFORCE: Boot Force Bit

    * 7 wait states for Boot Memory on ADSP-2171

    Memory Mapped Control Registers vs. Status Registers

    Memory Mapped Control Registers

    > Physical locations in Data Memory


    > Accessed by address

    > Addresses 0x3C00 thru 0x3FFF (All Processors)

    Status Registers (or Non-Memory Mapped Registers)

    > Physical registers in the DSP

    > Accessed by name

    Memory Mapped Control Registers

    > Mainly to set up the peripherals (i.e., mode of operation)

    Status Registers

    > Set up the operation of the DSP core (i.e., MAC, interrupts)

    > Provide information about the DSP core (i.e., stacks, status flags)

    Initialize Memory Mapped Registers before running (i.e., not on the fly)

    Status Registers are meant to be used on the fly

    Memory Mapped Control Registers

    0x3FFF          System Control Register - Wait states, Enable SPORTs
    0x3FFE          Data Memory Waitstate Control Register
    0x3FFD-0x3FFB   Timer Control Registers - Set Timer values
    0x3FFA-0x3FF7   SPORT0 Multichannel Word Enable Register
    0x3FF6          SPORT0 Control Register - clock, frame and data modes
    0x3FF5          SPORT0 SCLKDIV - Divide down register for SCLK
    0x3FF4          SPORT0 RFSDIV - Divide down register for internal RFS
    0x3FF3          SPORT0 Autobuffer Control Register
    0x3FF2-0x3FEF   SPORT1 Control and Setup (same as SPORT0)
    0x3FEF-0x3FEC   Analog Control Registers (No SPORT1 autobuffer on msp5x parts)
    0x3FEB-0x3FE9   NO REGISTERS
    0x3FE8          HMASK Register - HIP mask for interrupts
    0x3FE7-0x3FE6   HIP Status Registers - HSR7 and HSR6
    0x3FE5-0x3FE0   HIP Data Registers

    Status Registers

    ASTAT ALU Status Flags, MAC Overflow Flag, Shifter Input Flag

    SSTAT Stacks Overflow and Empty (Read Only)


    MSTAT Computation Modes, Miscellaneous Functions

    ICNTL External Interrupt Sensitivity (edge/level) and Nesting
        Bit 4: Interrupt Nesting (1 = Enable, 0 = Disable)
        Bit 3: 0
        Bit 2: IRQ2 Sensitivity (1 = Edge, 0 = Level)
        Bit 1: IRQ1 Sensitivity (1 = Edge, 0 = Level)
        Bit 0: IRQ0 Sensitivity (1 = Edge, 0 = Level)

    IMASK Interrupt Enables - Masks the servicing of interrupts (1 = Enable, 0 = Disable)
        Bit 5: IRQ2
        Bit 4: SPORT0 Transmit
        Bit 3: SPORT0 Receive
        Bit 2: SPORT1 Transmit or IRQ1
        Bit 1: SPORT1 Receive or IRQ0
        Bit 0: Timer

    IFC Interrupt Force/Clear (Write-Only)

    Boot EPROM to Internal PM RAM

    [Figure: the BOOT EPROM is 8 bits wide. Boot Page 0 occupies 8k x 8 (0x0000 to 0x1FFF), with additional boot pages from 0x2000. A page is loaded into Internal PM RAM, 2k x 24 (0x0000 to 0x07FF), with three EPROM bytes A, B and C forming each 24-bit word. The figure also marks a page length byte and the booting order.]

    ADSP-2101 Timer Block Diagram

    [Block diagram. Labelled elements: DMD bus (16 bits), TSCALE (8 bits), TPERIOD (16 bits), TCOUNT (16 bits), CLKOUT, Timer Enable & Prescale Logic, Decrement, Zero detect, Count Register Load Logic, Timer Interrupt.]

    ADSP-2100 Family Timer Features

    The ADSP-21xx programmable interval timer can generate periodic interrupts


    based on multiples of the processor's cycle time. The timer is not available on the ADSP-2100.

    TCOUNT = dedicated count-down register

    TPERIOD = reloads TCOUNT at interrupt

    TSCALE = # of Clock ticks before TCOUNT decrements - 1

    TCOUNT is decremented every TSCALE+1 cycles. After TCOUNT

    expires, it is reloaded with the value in TPERIOD. One interrupt

    occurs every (TPERIOD + 1) * (TSCALE + 1) cycles.
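The timing rule above can be checked numerically. The Python sketch below (hypothetical function name) returns the number of cycles until the first interrupt, which depends on the initial TCOUNT, and the number of cycles between later interrupts, which depends on TPERIOD.

```python
def timer_cycles(tscale, tcount, tperiod):
    """Return (cycles to first interrupt, cycles between later interrupts)."""
    first = (tcount + 1) * (tscale + 1)
    period = (tperiod + 1) * (tscale + 1)
    return first, period


first, period = timer_cycles(tscale=0, tcount=49, tperiod=99)
print(first, period)

# Longest possible period: 8-bit TSCALE and 16-bit TPERIOD both at maximum,
# converted to seconds for an 80 ns instruction cycle
_, longest = timer_cycles(0xFF, 0xFFFF, 0xFFFF)
print(longest, longest * 80e-9)
```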

    ADSP-2101 Timer Registers


    Address   Register                     Bits
    0x3FFD    TPERIOD Period Register      15-0
    0x3FFC    TCOUNT Counter Register      15-0
    0x3FFB    TSCALE Scaling Register      7-0 (bits 15-8 are 00000000)

    ENABLING THE TIMER


    1. Set values for TCOUNT, TPERIOD, and TSCALE.

    2. Set bit 0 in IMASK to 1 to enable interrupt.

    3. Execute "ena timer" instruction to start counting down.

    (Bit 5 in MSTAT register)

    Example Setup Code for Timer

    i0 = 0x3ffb; /*i0 points to TSCALE*/


    m0 = 1; /*modify value is 1 */

    l0 = 0; /*not a circular buffer */

    dm(i0,m0) = 0; /*set TSCALE to decrement every cycle*/

    dm(i0,m0) = 49; /*to generate first interrupt at 50 cycles*/

    dm(i0,m0) = 99; /*to reload TCOUNT with 99 at interrupt*/

    IMASK = 0x1; /*enables the timer interrupt*/

    ena timer; /*starts the count down after executing this*/

    TIMER MINI-QUIZ


    1. Write code to generate a timer interrupt after 50 cycles the first time, and every 75 cycles thereafter (any decrement that works).

    2. Write code to generate a timer interrupt every 300 ms. Assume the clock is 16.67 MHz.

    3. What is the longest time you can set the timer for if you have a 12.5 MHz clock? What would the values of TSCALE, TCOUNT, and TPERIOD be?

    TIMER MINI-QUIZ ANSWER

    1.
    i0 = 0x3ffb;
    m0 = 1;
    l0 = 0;
    dm(i0,m0) = 0;         /*TSCALE = 0: decrement every cycle*/
    dm(i0,m0) = 49;        /*TCOUNT = 49: first interrupt after 50 cycles*/
    dm(i0,m0) = 74;        /*TPERIOD = 74: interrupt every 75 cycles thereafter*/
    imask = 0x1;
    ena timer;

    2. /*same first 3 lines*/
    /*300 ms = 5,000,000 cycles*/
    dm(i0,m0) = 0xF9;      /*TSCALE = 249: decrement every 250 cycles*/
    dm(i0,m0) = 0x4E1F;    /*TCOUNT = 19,999: 20,000 decrements*/
    dm(i0,m0) = 0x4E1F;    /*TPERIOD = 19,999: 20,000 decrements*/
    imask = 0x1;
    ena timer;

    3. /*same first 3 lines*/
    /*A 12.5 MHz processor yields an 80 ns instruction cycle time. TCOUNT and TPERIOD are 16 bit
    registers - the largest number they can represent is 65535. TSCALE is an 8 bit register, so
    the largest number it can represent is 255. Following the equation
    (TSCALE+1)*(TPERIOD+1) gives us 0x100 0000 cycles per timer interrupt.
    This number multiplied by 80 ns is 1.3422 seconds*/
    dm(i0,m0) = 0xff;
    dm(i0,m0) = 0xffff;
    dm(i0,m0) = 0xffff;
    imask = 0x1;
    ena timer;


    ADSP-21xx Serial Port


    ADSP-21xx Serial Port UART

    ADSP-2101 Serial Port Block Diagram

    [Block diagram. Labelled elements: DMD bus (16 bits), TXn Transmit Data Register (16 bits), Transmit Shift Register, RXn Receive Data Register (16 bits), Receive Shift Register, Companding Hardware, Serial Control, Internal Serial Clock Generator, and the pins DT, DR, SCLK, TFS and RFS.]

    ADSP-21xx SPORT Features


    ADSP-21xx SPORTs Are Used For Synchronous Communication

    Full Duplex

    Fully Programmable

    Autobuffer Capability

    Multi-Channel Capability

    Data Rates Up To 13 Mbits/sec

    2171 Data Rates Up To 20 Mbits/sec

    Examples of Serial Port Implementation

    Connecting a CODEC to the Serial Port


    Connecting Two 2101's Together

    Using the Serial Port as a UART

    [Figure: a 2101 connected to a TP3053 CODEC; two 2101s connected to each other; a 2101 (with software UART) connected to a PC through an AD233 (RS-232 Driver).]

    ADSP-21xx SPORT Hardware

    SPORT Has 5 Wires

    SCLK: Serial Clock
    RX: Data Receive
    TX: Data Transmit
    TFS: Transmit Frame Sync
    RFS: Receive Frame Sync

    [Figure: the ADSP-21xx pins SCLK, TFS1, RFS1, RX and TX connected to a Serial Device through the Serial Clock, Transmit Frame Sync, Receive Frame Sync, Receive Data and Transmit Data lines.]

    ADSP-21xx SPORT Software

    Access Serial Port Data By Accessing SPORT Data Registers:


    TX0, TX1, RX0, RX1

    Configure Serial Port Through Memory Mapped Control Registers:

    System Control Register **

    SPORT Control Register **

    SPORT SCLKDIV Register

    SPORT RFSDIV Register

    SPORT Autobuffer Control Register

    SPORT0 Multichannel Enable Registers

    ** Required to Configure SPORTs

    Synchronize SPORT Transfers and Processor Operation With Interrupts

    Each SPORT is Allocated a Transmit and Receive Interrupt

    The Base Architecture of Floating-Point DSP Processor

    [Block diagram. Labelled elements: DAG 1 (8 x 4 x 32), DAG 2 (8 x 4 x 24), Program Sequencer, Cache (32 x 48), Timer, JTAG Test & Emulation, Bus Connect, Register File (16 x 40), Floating-Point/Fixed-Point ALU, Floating-Point/Fixed-Point Multiplier with Fixed-Point MAC, 32-Bit Barrel Shifter, PMA bus (24 bits), DMA bus (32 bits), PMD bus (48 bits) and DMD bus (40 bits).]

    IEEE Compatibility (IEEE Floating Point Standard 754/854)

    Data Formats
        32-Bit Single-Precision IEEE Floating Point (23-Bit Data or Mantissa, 8-Bit Exponent, & Sign Bit)
        40-Bit Extended Single-Precision IEEE Floating Point (31-Bit Data or Mantissa, 8-Bit Exponent, & Sign Bit)
        32-Bit Fixed Point (Integer and Fractional) With 80-Bit Accumulation

    Rounding
        Rounding-to-Nearest (Unbiased Rounding)
        Round-Toward-Zero (Truncation)

    IEEE Exception Handling
        Overflow
        Underflow
        Equals Zero
        Divide-by-Zero
        Interrupt on Exception or Latched Status
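The 32-bit single-precision layout (sign bit, 8-bit exponent, 23-bit mantissa) can be inspected on any host machine. The Python sketch below is host-side illustration only, not DSP code; `ieee754_fields` is a hypothetical helper name.

```python
import struct


def ieee754_fields(value):
    """Split a number, stored as IEEE single precision, into its three fields."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, biased by 127
    mantissa = bits & 0x7FFFFF       # 23 bits, implicit leading 1 not stored
    return sign, exponent, mantissa


for v in (1.0, -2.5):
    sign, exponent, mantissa = ieee754_fields(v)
    print(f"{v}: sign={sign} exponent={exponent} mantissa={mantissa:#08x}")
```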

    Floating-Point Multiplier/MAC

    [Figure: Register File (16 x 40) feeding the Floating-Point/Fixed-Point ALU, the Floating-Point/Fixed-Point Multiplier with Fixed-Point MAC, and the 32-Bit Barrel Shifter.]

    Example Multiplier/MAC Instructions

    F1 = F5 * F7

    R2 = R3 * R8 (SSF)

    MRF = MRF + R5 * R0 (UUIR)


    Example Multi-Function Instructions


    IF EQ F1 = ABS F8, F9 = DM (I0,M4)

    F8 =F1*F6, F3=F9+F14, F9=F9-F14,DM(I2,M0)=F10, PM(I10,M10)=F3

    The System Architecture

    [Figure: the DSP Processor (1x clock on CLKIN, RESET, IRQ3-0) connects to Program Memory through PMS1-0, PMRD, PMWR, PMA (24 bits), PMD (48 bits), PMPAGE, PMACK and PMTS, and to Data Memory and Peripherals through DMS3-0, DMRD, DMWR, DMA (32 bits), DMD (40 bits), DMPAGE, DMACK and DMTS. Each memory has Selects, OE, WE, ADDR and DATA pins; the peripherals also return ACK.]

    The Complete Architecture

    [Figure: NOISY SIGNAL (analogue) -> S/H -> A/D -> DIGITAL PROCESSOR (discrete filter processing) -> D/A -> CLEAN SIGNAL (analogue). The S/H and A/D stages run at the sampling frequency fs and produce a discrete-time, discrete-value signal.]

    What is a Real Time Application?

    "Real time" is a misleading expression. However, it means that the DSP system can process the required algorithm within a specified time.

    [Figure: RADAR SIGNAL x(t) -> S/H -> A/D (sampled at fs) -> DIGITAL PROCESSOR (Fourier Transform) -> DISPLAY of x(f), showing components at f1 and f2.]

    Real Time Operating Systems as an Ideal Environment for Embedded Applications

    The current DSP processors:


    Are more than high-performance signal-processing engines

    Provide a more regular instruction set, with plenty of address space to run large programs

    Come with efficient C compilers that rival those of general-purpose microprocessors

    DSP Embedded Applications

    [Figure: a DSP running an RTOS that hosts Fax Tasks, Telephone Tasks, Speech Recognition Tasks, Sound Generation Tasks and Answering Machine Tasks.]

    DSP RTOS ARCHITECTURE

    [Figure: RTOS services (DSP Memory Management, Real-Time Multi-Tasking, DSP Stream I/O, DSP Event Handling) layered over Memory Segments, Processor Segments and Peripheral Devices.]

    Features for Real Time Operating Systems

    Operating System Features     BOS      Nucleus   RXTC      SPOX       Helios
    Preemptive Task Scheduling    Yes      Yes       Yes       Yes        Yes
    Time-Sliced Scheduling        Yes      Yes       Yes       No         Yes
    Round-Robin Scheduling        ?        Yes       Yes       No         Yes
    Parallel Processing           No       No        No        Optional   Yes
    Inter-Task Messages           Yes      Yes       Yes       Yes        Yes
    Memory Management             Yes      Yes       Yes       Yes        Yes
    Interrupt Management          Yes      No        Yes       Yes        Yes
    Timer Management              Yes      Yes       Yes       No         Yes
    Device-Independent I/O        No       No        No        Yes        Yes
    Stream I/O                    $495*    No        No        Yes        Yes
    OS RAM/ROM Size (Bytes)       5K-40K   4K-20K    12K-16K   44K+       80K-200K

    Please contact the vendors listed above for the best and most up-to-date information

    Compression Techniques and a Compressor and De-Compressor Generator

    The CCITT/ISO Joint Photographic Experts Group (JPEG) and (MPEG) digital image compression processing algorithms are seriously required for:

    Multimedia
    Video Editing
    Colour Publishing and Graphics Arts
    Image-Processing, Storage and Retrieval
    Colour Printers, Scanners and Copiers
    High-Speed Image Transmission Systems for LANs, Modem and Colour Facsimile
    Digital Cameras

    These algorithms may be implemented in real time as:

    A) A dedicated Chip (Compressor)

    Company        Product name            Compression ratio   DCT Table   Huffman Table   Quantisation Table   Price in
    Fast Forward   Outlaw Digital Video    from 4:1 to 10:1    Board: Disc 0.5 GByte                            4700 950
    C-Cube         CL 550 En- / Decoder    from 8:1 to 100:1   static      program         program              80
    C-Cube         CL 650 En- / Decoder    from 1:1 to 50:1    static      program         program              200
    Winbond        W9930 En- / Decoder     from 8:1 to 100:1   static      static          program              29
    LSI Logic      L64702                  *                   program     program         program              60

    B) DSP Processor + Compressor


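    [Figure: uncompressed DATA passes through the DSP Processor and the DCT block to give compressed DATA; compressed DATA passes back through the IDCT block and the DSP Processor to give uncompressed DATA.]

    DCT: Discrete Cosine Transform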

    C) Software Solution (DSP C / Assembler code)

    Company                    Processor type                 Data Bits   Operation frequency   Benchmarks
    Optibase                   Motorola 56002                 24          40 MHz                *
    Atlanta Signal Processor   Texas Instruments TMS320C31    32          16 MHz                64 KB Grey scale: 700 ms
    Sonitech International     Texas Instruments TMS320C3x    32          16 MHz                400 Kbytes/s B & W, 540 Kbytes/s Colour
    Atlanta Signal Processor   Analog Devices 21020           32          33 MHz                500 Kbytes/s B & W, 600 Kbytes/s Colour
    Zoran Corp                 Zoran ZR38000                  16          25 MHz                440 Kbytes/s B & W, 500 Kbytes/s Colour

    Compressor-De-Compressor Generator

    [Figure: Comp/Decomp Generator (Code & Model Generator). Inputs: JPEG Parameters, MPEG Parameters, Quantizer & Huffman Tables, Compression Rate (1:1 to 80:1), Processing Rate (n Million Pixels/Second), and n Bit Gray Scale, RGB, CMYK, 4:4:4:4, YUV Colour Space I/O. Outputs: C and Assembly code, and Models for Comdisco: SPW, Hyperception: HW and Momentum: FDAS.]

    Performance Measures

    Two measures are used commonly:

    MIPS: Millions of Instructions Per Second

    This is a measure of raw instruction execution rate without specifying the nature of the computations.

    MFLOPS: Millions of Floating Point Operations Per Second

    This is a measure useful in assessing computations in floating point format.

    The difference between MIPS and MFLOPS can be appreciated by

    considering a simple DO LOOP high level language construction:

    DO I = 1 TO 1000000 STEP 1

    BEGIN

    Z(I) = X(I) * Y(I) + C(I);

    END

    Each iteration accomplishes two floating point operations, yet depending on the host computer the compiled assembly language code could occupy many bytes. The speed of execution of the two floating point operations therefore depends on the MIPS of the processor: provided that each iteration could be completed in, say, a microsecond, the processor would then execute at the rate of two MFLOPS.

    A system of one million such processors, one per iteration, could conceivably do all the iterations at once and attain a performance of two million MFLOPS.

    Despite its widespread use, MIPS is perhaps the poorest measure of performance, since it contains no quantifiable attributes for assessing useful processing.

    The term FLOPS is widely used in signal processing applications and is a common measure of performance in comparing processors.
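    The relation between iteration time and floating point rate can be checked with a short sketch. The function name and its parameters are illustrative only; it assumes two floating point operations per iteration, as in the DO loop, and expresses the iteration time in microseconds so that the result comes out directly in MFLOPS.

```python
def mflops(flops_per_iteration, iteration_time_us, processors=1):
    """Return the floating point rate in MFLOPS.

    One operation per microsecond equals one million operations per
    second, so dividing by a time in microseconds yields MFLOPS.
    """
    if iteration_time_us <= 0:
        raise ValueError("iteration time must be positive")
    return processors * flops_per_iteration / iteration_time_us


# One multiply and one add per iteration, one iteration per microsecond.
single_processor = mflops(2, 1.0)

# One million processors, each handling one iteration of the loop.
processor_array = mflops(2, 1.0, processors=1_000_000)

# The same loop body completed in one nanosecond (0.001 microseconds).
nanosecond_iteration = mflops(2, 0.001)
```

    Under these assumptions a one microsecond iteration gives 2 MFLOPS, while a one nanosecond iteration would give roughly 2000 MFLOPS.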

    Data Flow Bottle-necks & Solutions: Pipeline & Parallel Architectures With Examples

    [Figure: Bottle-neck of a Shared Instruction/Data Bus in a Von Neumann Machine. Labels: data in, memory, bus, instruction, data, output.]

    [Figure: The First Generation Microprocessor Architecture. Labels: ALU, TMP, ACCUM, general purpose registers, PROG CNTR, ADDR REG, control & timing, memory (instructions and data), instruction, data, data bus, address, address bus.]

    Each instruction is a new event; it is fetched, decoded, and executed.

    The Assembly Language Commands Help To Execute Lengthy Manipulations

    On Designated Strings Of Data.

    The Programmer Must Code Iterative Loops Or Use Other Mechanisms To Enhance Performance While Constrained By The Basic Limitations.

    At The Algorithmic Level, Many Sequences Of Operations Have Little Or No Precedence Relationships.

    Overview of the Pipeline Approach

    The simplest view of a pipeline is that each stage consists of combinational logic driven by an input register. The output from a stage is captured by the input register of the following stage. Each stage has a delay for the initial data capture and subsequent processing.

    It is possible to construct two types of pipeline system:

    i) Synchronous Pipeline

    If all stages have an equal delay, then a synchronous clock can transfer

    results into each input register. This is the simplest control problem.

    ii) Asynchronous Pipeline

    If there is a large discrepancy between the various delays in each stage, then an asynchronous data transfer might be in order. Here the intermediate registers are omitted. The design of such pipes requires careful timing of data input/output.

    The following figure shows a simple Pipeline DSP System.

    [Figure: Simple Pipeline DSP System. An AD converter feeds a chain of ADSP-2181 DSP stages (Stage j-1, Stage j, Stage j+1), each shown as an input register followed by combinatorial logic, ending in a DA converter.]

    When can the Pipeline Approach be considered?

    In general a pipeline can be considered if:

    i) The procedure can be broken into a sequence of discrete steps,

    ii) The steady state data flow matches the remainder of the system, &

    iii) Components can be found which implement the steps with the

    desired response.


    How can the performance of the pipeline be measured?

    A synchronous pipeline produces a result every clock period t, i.e. a data-flow rate of 1/t outputs per second. An N-stage pipeline gives an apparent N-fold increase in performance. If the input to the pipeline is intermittent, however, then some stages will not be processing valid data, and this must be accounted for by the control mechanism. If, on the average, only a fraction P of the total stages are occupied, then the data flow falls to P/t outputs per second.
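    The two rates quoted above can be written down as a small sketch. The function names are illustrative, and the latency helper assumes the simple synchronous case in which a datum spends exactly one clock period in each stage.

```python
def pipeline_rate(clock_period, occupancy=1.0):
    """Outputs per second of a synchronous pipeline.

    clock_period is t in seconds; occupancy is the fraction P of
    stages holding valid data, so the rate is P / t.
    """
    if clock_period <= 0:
        raise ValueError("clock period must be positive")
    if not 0.0 <= occupancy <= 1.0:
        raise ValueError("occupancy must lie between 0 and 1")
    return occupancy / clock_period


def pipeline_latency(stages, clock_period):
    """Time for one datum to traverse all N stages (assumed N * t)."""
    return stages * clock_period


full_rate = pipeline_rate(0.5)          # every stage busy
half_rate = pipeline_rate(0.5, 0.5)     # half of the stages busy
latency = pipeline_latency(4, 0.5)      # four stages deep
```

    Note that the occupancy only scales the output rate; the time a single datum needs to pass through the pipe is unchanged.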

    Question:

    In the following figure, a sequence of procedures is assumed each to process data in time t, except for the FFT procedure, which consumes 8t. Given that all the mechanisms for increasing throughput (i.e. for decreasing t) have been exhausted, what are the alternatives to enhance DSP performance?

    [Figure: Sequential Data Flow. P1 (time t), P2 = FFT (time 8t), P3 (time t).]


    Overview of the Parallel Approach

    The simplest view of a parallel approach is that the input data are fed to the units sequentially via the input commutator, and the output commutator collects the result data after the processors have executed simultaneously.


    The following figure shows a simple Parallel DSP System.

    [Figure: Simple Parallel DSP System. An AD converter feeds an input commutator, which distributes data over several ADSP-2181 DSPs in parallel; an output commutator collects the results for a DA converter.]

    When can the Parallel Approach be considered?

    In general a parallel approach can be considered if:

    i) The procedure cannot be broken into a sequence of discrete steps, &

    ii) The steady state data flow does not need to be constrained.

    Note: The input/output commutation is usually difficult to implement and consumes

    some overhead which lowers the effective throughput.


    How can the performance of the parallel approach be measured?

    A parallel array need not have an identical delay in each path, though this complicates the control problem. If each of N units has a delay ti, then the average delay could be used to compute data-flow. For N parallel paths the response will be shown to be the same as an N-stage pipeline. If a proportion P of units is unused then the output rate drops.

    The overall behaviour is identical therefore with a pipeline although

    implementation issues are widely different.

    Answer (continued):

    The final resort to enhance DSP performance is in the form of multiplicity:

    b) Parallel Array of Processing Units

    In this case the individual processors still operate with a response time of 8t. The input commutator sequentially allocates input data, which is collected 8t later by the output commutator.

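    [Figure: Parallel Data-Flow Solution. P1 (time t) feeds an input commutator that distributes data over parallel FFT units numbered 1 to 8 (time 8t each); an output commutator collects the results for P3 (time t). Bandwidth in = 1/t, bandwidth out = 1/t.]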
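    The effect of the commutators can be illustrated with a small timing model. This is only a sketch under stated assumptions: inputs arrive once per period t, the input commutator hands them out in strict rotation, and each unit is busy for a fixed service time (8t for the FFT). The function name and arguments are hypothetical, and time is counted in integer multiples of t.

```python
def completion_times(n_inputs, n_units, service_time, period=1):
    """Return the time at which each input leaves its processing unit."""
    unit_free_at = [0] * n_units      # when each unit can accept new data
    finished = []
    for k in range(n_inputs):
        unit = k % n_units            # rotating input commutator
        arrival = k * period
        start = max(arrival, unit_free_at[unit])
        unit_free_at[unit] = start + service_time
        finished.append(unit_free_at[unit])
    return finished


def output_spacing(times):
    """Gaps between successive outputs seen by the output commutator."""
    return [b - a for a, b in zip(times, times[1:])]


# Eight units of 8t each keep pace with one input per t.
parallel_gaps = output_spacing(completion_times(24, 8, 8))

# A single 8t unit can only deliver one output every 8t.
serial_gaps = output_spacing(completion_times(24, 1, 8))
```

    In this model every result still appears 8t after its input arrives, but with eight units the outputs are spaced t apart, which matches the bandwidth of 1/t shown at both ends of the figure.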

    Example: FFT with Serial, Pipelining and Parallel Butterflies

    The FFT provides a good example of the use of alternative

    signal-processing architecture to improve throughput.

    The key comparison is:

    i) That of butterfly time, tB, &

    ii) The time, (N/2) tB log2N, to cycle through all butterflies of an FFT.

    The interval, tB, includes the butterfly computation time and any overhead in address generation or looping.

    Realistic alternatives to consider are:

    Serial (direct)

    Pipeline log2N stages deep, with N/2 steps

    Parallel N/2 butterfly processors, iterate log2N times

    Serial (direct)

    A single processor computes each butterfly, one step at a time.

          DO 20 J = 1, log2 N
          DO 10 I = 1, N/2
       10 CONTINUE
       20 CONTINUE

    [Figure: The Serial (Classic) Approach. One butterfly processor works through the butterflies one after another at times t1 to t12.]
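    The nested loop of the serial approach can be made concrete with a minimal radix-2 decimation-in-time FFT. This is a sketch rather than the code used in the notes: the function name, the bit-reversal step and the butterfly counter are additions for illustration. The outer loop plays the role of DO 20 J and the inner loop the role of DO 10 I.

```python
import cmath


def fft_serial(x):
    """Radix-2 FFT computed one butterfly at a time.

    Returns the spectrum and the number of butterflies executed,
    which is (N/2) * log2(N) for a power-of-two length N.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    passes = n.bit_length() - 1                 # log2(N)

    # Reorder the input into bit-reversed index order.
    a = [0j] * n
    for i in range(n):
        rev = 0
        for bit in range(passes):
            if i & (1 << bit):
                rev |= 1 << (passes - 1 - bit)
        a[rev] = complex(x[i])

    butterflies = 0
    for j in range(1, passes + 1):              # DO 20 J = 1, log2 N
        span = 1 << j                           # points per group in this pass
        half = span >> 1
        for i in range(n // 2):                 # DO 10 I = 1, N/2
            group, k = divmod(i, half)
            top = group * span + k
            bottom = top + half
            twiddle = cmath.exp(-2j * cmath.pi * k / span)
            t = twiddle * a[bottom]
            a[top], a[bottom] = a[top] + t, a[top] - t
            butterflies += 1
    return a, butterflies


spectrum, count = fft_serial([1, 2, 3, 4, 0, -1, -2, -3])
```

    For 8 data points the counter reaches 12, i.e. three passes of four butterflies each.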

    Pipeline: log2N stages deep, with N/2 steps

    Here there are log2N butterfly processors, corresponding to the number of passes (3 in the case of 8 data points: B1, B2, B3); each is used to compute the butterflies pertinent to its pass in series. As each pass is computed, the processors are ready to accept a pair of inputs for the next pass, and when the pipeline is full (steady state), a set of outputs will be produced by each pass (N/2 computations).

    [Figure: The Pipeline Approach. Log2N butterfly processors (B1 - B3) in a pipeline. The computation flow shows each processor running DO I = 1, N/2 over its own pass: B1t1 to B1t4, B2t1 to B2t4, B3t1 to B3t4.]

    Parallel: N/2 butterfly processors, iterate log2N times

    Here there is one processor for each of the N/2 steps per pass; all butterflies for that pass are computed at the same time. As soon as one pass is completed, all are ready for the next pass; in the steady state, there will be an output for every computation cycle.

    [Figure: The Parallel Approach. N/2 butterfly processors (B1 - B4) in parallel, each running DO J = 1, log2N: B1t1 to B4t1, B1t2 to B4t2, B1t3 to B4t3.]

    Q. Summarize the differences between the serial, pipeline, and parallel architectures for the FFT example in terms of the computation time and the number of butterfly processors. Consider a 1024-point FFT: what are the time and hardware costs for the three architectures?

    A.

    Architecture    Computation Time    Number of Butterfly Processors
    Serial          (N/2) log2N         1
    Pipeline        N/2                 log2N
    Parallel        log2N               N/2

    The 1024-point FFT costing:

    Serial          5,120               1
    Pipeline        512                 10
    Parallel        10                  512
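    The costing can be reproduced for any power-of-two length with a few lines. The function name and the dictionary layout are illustrative; times are counted in butterfly intervals, as in the table.

```python
def fft_costs(n):
    """Return {architecture: (computation time, butterfly processors)}."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 2")
    passes = n.bit_length() - 1       # log2(N)
    per_pass = n // 2                 # butterflies in one pass
    return {
        "serial": (per_pass * passes, 1),
        "pipeline": (per_pass, passes),
        "parallel": (passes, per_pass),
    }


costs_1024 = fft_costs(1024)
for name, (time_units, processors) in costs_1024.items():
    print(f"{name:<10}{time_units:>8}{processors:>8}")
```

    In every row the product of time and processor count is the same, (N/2) log2N, so the pipeline and parallel schemes trade hardware for time without changing the total number of butterflies.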

    High Performance System Classification Scheme

    There have been many attempts to classify processor architectures. A standard classification scheme would be exceedingly useful both for discussion purposes and as a guide to processor designs. The requirements for such a scheme are at least that:

    It be complete (i.e., include all architectures) and

    Orthogonal (i.e., differentiate the key attributes).

    Unfortunately, despite the attractiveness of the concept, no such scheme exists. Of the many proposals, one forms the basis of many others. It is neither complete nor orthogonal, yet its elegance and intrinsic simplicity are attractive and it does concentrate on data flow and control in a general way.

    The basis of the scheme is that a processor processes data by a sequence of instructions, regardless of the format and mechanisms whereby each arrives at the point of action. Based on the concept of a data stream and an instruction stream, four possibilities exist:

    SISD - Single Instruction Single Data Stream

    SIMD - Single Instruction Multiple Data Stream

    MISD - Multiple Instruction Single Data Stream

    MIMD - Multiple Instruction Multiple Data Stream

    Examples are shown in the following figures. Note that both the Babbage and Von Neumann architectures are SISD, although they differ greatly in implementation. The performance of such a configuration can be thought of as unity for purposes of comparison.

    Q. With the aid of appropriate diagram(s), show how the four categories in Flynn's taxonomy can be emulated on a dual processor shared-memory system. Your diagrams must clearly show the IS and DS from and to the various units.

    Answer:

    [Figure: block diagrams of SISD, SIMD, MISD (Version 1 and Version 2) and MIMD, showing the instruction streams (I, I1 to I4) and data streams (D, D1 to D4) between data in and data out.]

    Discussion on the classification scheme

    The SIMD architecture is an example of a parallel array in which each processing unit executes the same instruction. It can achieve an n-fold increase in data flow bandwidth for each instruction, provided that the units can be continuously utilized.

    The original motivation for developing SIMD array processors was to perform parallel computations on vector or matrix types of data. Parallel processing algorithms have been developed by many computer scientists for SIMD computers. Important SIMD algorithms can be used to perform matrix multiplication, Fast Fourier Transform (FFT), matrix transposition, summation of vector elements, matrix inversion, parallel sorting, linear recurrence, Boolean matrix operations, and to solve partial differential equations.

    The MIMD architecture is implemented by a multiple processor system. Clearly implied is some form of cooperative network to share a computational task (completely autonomous units being of little interest). This is an example of a parallel array in which the task assigned to each processor can be different. The performance enhancement potential is equal to the number of processors.

    The MISD architecture is not widely implemented in practice and substantial disagreement exists on its exact structure. It is considered here as a pipeline in which a single data stream is modified at successive stages, and its performance enhancement potential equals the number of stages as shown in the previous section.
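    How the instruction and data streams are paired in each category can be sketched in a few lines. This is a sequential model for illustration only: it shows which instruction acts on which data, not any real concurrency, and all function names are hypothetical. MISD is modelled as a pipeline, following the interpretation given above.

```python
def sisd(instruction, data):
    """One instruction stream acting on one data stream."""
    return instruction(data)


def simd(instruction, data_streams):
    """The same instruction applied to every data stream."""
    return [instruction(d) for d in data_streams]


def misd(instructions, data):
    """A single data stream modified at successive stages."""
    for instruction in instructions:
        data = instruction(data)
    return data


def mimd(instructions, data_streams):
    """Each unit runs its own instruction on its own data stream."""
    return [ins(d) for ins, d in zip(instructions, data_streams)]


def double(v):
    return 2 * v


def increment(v):
    return v + 1


def square(v):
    return v * v


simd_result = simd(double, [1, 2, 3, 4])
misd_result = misd([double, increment, square], 3)
mimd_result = mimd([double, increment, square], [5, 6, 7])
```

    On real hardware the list comprehensions in simd and mimd would correspond to units working at the same time, which is where the n-fold gain comes from.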

    There is a relationship between these classifications and the structure of processing algorithms. An

    algorithm may contain a collection of processing tasks which could optimally be assigned to different

    processing configurations to achieve an overall higher performance. If components were of sufficiently

    low cost, a solution might be to build a conglomerate of different processing architectures and utilize the

    optimum one at appropriate points in the algorithm. The task assignment problem here is formidable, and the physical complexity and lowered reliability of such a conglomerate of components are major limiting factors of such a scheme. This will be discussed in more detail later.

    SIMD Matrix Multiplication & SIMD FFT

    To be found in the following references:

    *) G. Barnes et al., "The ILLIAC-IV Computer," IEEE Trans. on Computers, Aug. 1968, pp. 746-756.

    **) K. Hwang & F. Briggs, "Computer Architecture and Parallel Processing," McGraw-Hill Book Company, 1985.

    ***) B. Wilkinson, "Computer Architecture: Design and Performance," Prentice-Hall Int. Ltd, 1991.

    How To Design A SIMD DSP System From Off-The-Shelf Fixed-Point DS Processors?

    Here we will develop a SIMD DSP system with a processor-pair architecture, based on a dual-port RAM. The design is easy to implement and provides a significant computational boost over a single processor.


    The off-the-shelf Fixed-Point DS Processors are two ADSP-2101s, each with its own private memories. The following figure shows a block diagram of the system hardware architecture.

    A processor pair almost doubles the speed of a single processor while keeping the architecture and the inter-processor co-ordination as simple as possible.

    Hardware Architecture

    [Figure: Processor Pair Block Diagram. Two ADSP-2101 processors, each with its own program memory and private data memory, share a common data memory (dual-port RAM). Signal labels: DMA, PMD, DMACK, BUSYL, BUSYR.]

    Private memories are accessible to one processor only.

    Common memory is accessed by both.

    Each processor has a private memory of 32K of 24-bit program memory and 14K of 16-bit data memory.

    In addition, 2K of 16-bit dual-port RAM is shared by both processors. This area of memory allows inter-processor communication and data transfers.

    Software Architecture

    To complement the hardware design, a hypothetical application is presented. Data is input and low-pass filtered by one processor, then the second processor determines the peak location within a filtered window.

    Although the software implementation is simplistic, it shows a technique for programming in a multiprocessing environment: alternating buffers and flags.

    The alternating buffers in this application are two identical buffers located in dual-port RAM so both processors can access them:

    The first processor fills buffer 1 with information, while the second processor operates on the information in buffer 2.

    Each buffer has a flag that indicates completion of operations on that buffer.

    When processor 1 has finished its operations on the buffer data, it sets the flag, signalling processor 2 to begin operations on that buffer.

    The sequence of operations is shown in the following table:

    Processor 1 (Filter)                Processor 2 (Peak Locator)

    Initialise flags, coefficients,     Initialise pointers
    delay line, pointers

    Perform low pass filter             Check flag 1; wait if not set
    operation on data in buffer 1

    Set flag 1

    Perform low pass filter             Check flag 1; if set, perform
    operation on data in buffer 2       peak locating operation on
                                        data in buffer 1

                                        Clear flag 1

    Set flag 2

    Perform low pass filter             Check flag 2; if set, perform
    operation on data in buffer 1       peak locating operation on
                                        data in buffer 2

                                        Clear flag 2

    Set flag 1; etc.                    Check flag 1; etc.

    The alternating buffer scheme is easier to implement than a single buffer scheme. If only one buffer were used, careful timing analysis or extensive

    handshaking would be required to ensure that the processors did not use old or invalid data.
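    The alternating buffer and flag technique can be tried out on a host computer with two threads standing in for the two processors. This is a sketch under several assumptions, not ADSP-2101 code: the two-point averaging filter, the polling interval and all names are hypothetical, and processor 1 additionally waits for a flag to be cleared before reusing a buffer, a check the sequence table leaves to timing.

```python
import threading
import time


def run_processor_pair(blocks):
    """Filter blocks in one thread and locate their peaks in another."""
    buffers = [None, None]        # stands in for the dual-port RAM
    flags = [False, False]        # True: buffer holds fresh filtered data
    peaks = []

    def wait_for(index, value):
        # Poll the flag, as a processor would poll shared memory.
        while flags[index] != value:
            time.sleep(0.0001)

    def processor_1():
        previous = 0.0            # one-sample delay line
        for n, block in enumerate(blocks):
            i = n % 2             # alternate between buffer 1 and buffer 2
            wait_for(i, False)    # do not overwrite unread data
            filtered = []
            for sample in block:
                filtered.append((sample + previous) / 2.0)
                previous = sample
            buffers[i] = filtered
            flags[i] = True       # set flag: buffer ready for processor 2

    def processor_2():
        for n in range(len(blocks)):
            i = n % 2
            wait_for(i, True)     # wait if the flag is not set
            data = buffers[i]
            peaks.append(max(range(len(data)), key=lambda k: data[k]))
            flags[i] = False      # clear flag: buffer may be refilled

    workers = [threading.Thread(target=processor_1),
               threading.Thread(target=processor_2)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return peaks


peak_positions = run_processor_pair([[0, 4, 0, 0], [0, 0, 0, 8], [2, 0, 0, 0]])
```

    While processor 2 searches one buffer, processor 1 is free to filter into the other, so neither has to stall except when it gets a full buffer ahead or behind.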

    Multiprocessing With The SHARC

    The Modified Harvard Architecture

    [Figure: The Modified Harvard Architecture. A DSP processor with cache and I/O, connected through PM address and data buses to program & data storage and through DM address and data buses to data storage. Bus width labels: 24, 32, 32/40, 48.]

    Harvard Architecture: Simultaneous Access of Data and Instruction

    Modified Harvard Architecture: Simultaneous Access of 2 Data Memories and Instruction from Cache Gives Three Bus Performance with only 2 Busses

    SHARC: Complete Signal Computer On A Chip

    ADSP-21000 Family High Performance Processor Core
    - 25 ns = 40 MIPS / 120 MFLOPS

    Large Efficient On-Chip Memory System
    - 4 Megabits on ADSP-21060
    - 2 Megabits on ADSP-21062

    DMA Controller and I/O Processor
    - Allows Flexible, Zero-Overhead, High-Speed Data Transfers
    - 240 Mbytes/s

    Host Interface
    - Efficient Interface to 16- & 32-Bit Microprocessors

    Two Serial Ports
    - 40 Mbit/s Multichannel Serial Ports

    Two Integrated Multiprocessing Interfaces
    - Glueless Cluster Interface Tran

