
    Eidgenössische Technische Hochschule Zürich
    Swiss Federal Institute of Technology Zurich
    Politecnico federale di Zurigo
    Ecole polytechnique fédérale de Zurich

    Institut für Integrierte Systeme
    Integrated Systems Laboratory

    Lecture notes on

    Computer Arithmetic: Principles, Architectures, and VLSI Design

    March 16, 1999

    Reto Zimmermann

    Integrated Systems Laboratory
    Swiss Federal Institute of Technology (ETH)
    CH-8092 Zürich, Switzerland
    [email protected]

    Copyright © 1999 by Integrated Systems Laboratory, ETH Zürich
    http://www.iis.ee.ethz.ch/ zimmi/publications/comp arith notes.ps.gz

    Contents

    1 Introduction and Conventions
        1.1 Outline
        1.2 Motivation
        1.3 Conventions
        1.4 Recursive Function Evaluation

    2 Arithmetic Operations
        2.1 Overview
        2.2 Implementation Techniques

    3 Number Representations
        3.1 Binary Number Systems (BNS)
        3.2 Gray Numbers
        3.3 Redundant Number Systems
        3.4 Residue Number Systems (RNS)
        3.5 Floating-Point Numbers
        3.6 Logarithmic Number System
        3.7 Antitetrational Number System
        3.8 Composite Arithmetic
        3.9 Round-Off Schemes

    4 Addition
        4.1 Overview
        4.2 1-Bit Adders, (m, k)-Counters
        4.3 Carry-Propagate Adders (CPA)
        4.4 Carry-Save Adder (CSA)
        4.5 Multi-Operand Adders
        4.6 Sequential Adders

    5 Simple / Addition-Based Operations
        5.1 Complement and Subtraction
        5.2 Increment / Decrement
        5.3 Counting
        5.4 Comparison, Coding, Detection
        5.5 Shift, Extension, Saturation
        5.6 Addition Flags
        5.7 Arithmetic Logic Unit (ALU)

    6 Multiplication
        6.1 Multiplication Basics
        6.2 Unsigned Array Multiplier
        6.3 Signed Array Multipliers
        6.4 Booth Recoding
        6.5 Wallace Tree Addition
        6.6 Multiplier Implementations
        6.7 Composition from Smaller Multipliers
        6.8 Squaring

    7 Division / Square Root Extraction
        7.1 Division Basics
        7.2 Restoring Division
        7.3 Non-Restoring Division
        7.4 Signed Division
        7.5 SRT Division
        7.6 High-Radix Division
        7.7 Division by Multiplication
        7.8 Remainder / Modulus
        7.9 Divider Implementations
        7.10 Square Root Extraction

    8 Elementary Functions
        8.1 Algorithms
        8.2 Integer Exponentiation
        8.3 Integer Logarithm

    9 VLSI Design Aspects
        9.1 Design Levels
        9.2 Synthesis
        9.3 VHDL
        9.4 Performance
        9.5 Testability

    Bibliography

    1 Introduction and Conventions

    1.1 Outline

    • Basic principles of computer arithmetic [1, 2, 3, 4, 5, 6, 7]
    • Circuit architectures and implementations of the main arithmetic operations
    • Aspects regarding the VLSI design of arithmetic units

    1.2 Motivation

    • Arithmetic units are, among others, the core of every data path and addressing unit
    • The data path is the core of:
        • microprocessors (CPU)
        • signal processors (DSP)
        • data-processing application-specific ICs (ASIC) and programmable ICs (e.g. FPGA)
    • Standard arithmetic units are available from libraries
    • Design of arithmetic units is necessary for:
        • non-standard operations
        • high-performance components
        • library development

    1.3 Conventions

    Naming conventions

    • Signal buses: $A$ (1-D), $A_i$ (2-D), $a_{i:k}$ (subbus, 1-D)
    • Signals: $a$, $a_i$ (1-D), $a_{i,k}$ (2-D), $A_{i:k}$ (group signal)
    • Circuit complexity measures: $A$ (area), $T$ (cycle time, delay), $AT$ (area-time product), $L$ (latency, # cycles)
    • Arithmetic operators: $+$, $-$, $\cdot$, $/$, $\log$ ($= \log_2$)
    • Logic operators: $\lor$ (or), $\land$ (and), $\oplus$ (xor), $\odot$ (xnor), $\bar{a}$ (not)

    Circuit complexity measures

    • Unit-gate model ($\approx$ gate-equivalents (GE) model):
        • Inverter, buffer: $A = 0$, $T = 0$ (i.e. ignored)
        • Simple monotonic 2-input gates (AND, NAND, OR, NOR): $A = 1$, $T = 1$
        • Simple non-monotonic 2-input gates (XOR, XNOR): $A = 2$, $T = 2$
        • Complex gates: composed from simple gates
        • Simple $m$-input gates: $A = m - 1$, $T = \lceil \log m \rceil$
    • Wiring not considered (acceptable for comparison purposes, local wiring, multilevel metallization)
    • Only estimations given for complex circuits

    1.4 Recursive Function Evaluation

    • Given: inputs $a_i$, outputs $z_i$, function $f$ (drawn as a node in the graphs)

    • Non-recursive functions (n.)
        • Output $z_i$ is a function of input $a_i$ (or of $a_{i+k}$, $k$ const.):
          $z_i = f(a_i)$ ; $i = 0, \ldots, n-1$
        • parallel structure: $A = O(n)$, $T = O(1)$
          [Figure: parallel structure, inputs $a_0 \ldots a_3$, outputs $z_0 \ldots z_3$]

    • Recursive functions (r.)
        • Output $z_i$ is a function of all inputs $a_k$, $k \le i$

      a) with single output $z = z_{n-1}$ (r.s.):
          $t_i = f(a_i, t_{i-1})$ ; $i = 0, \ldots, n-1$ ; $t_{-1} = 0/1$, $z = t_{n-1}$

          1. $f$ is non-associative (r.s.n.)
             serial structure: $A = O(n)$, $T = O(n)$
             [Figure: serial structure, inputs $a_0 \ldots a_3$, single output $z$]

          2. $f$ is associative (r.s.a.)
             serial or single-tree structure: $A = O(n)$, $T = O(\log n)$
             [Figure: single-tree structure, inputs $a_0 \ldots a_3$, single output $z$]

      b) with multiple outputs $z_i$ (r.m.) (= prefix problem):
          $z_i = f(a_i, z_{i-1})$ ; $i = 0, \ldots, n-1$ ; $z_{-1} = 0/1$

          1. $f$ is non-associative (r.m.n.)
             serial structure: $A = O(n)$, $T = O(n)$
             [Figure: serial structure, inputs $a_0 \ldots a_3$, outputs $z_0 \ldots z_3$]

          2. $f$ is associative (r.m.a.)
             serial or multi-tree structure: $A = O(n^2)$, $T = O(\log n)$
             [Figure: multi-tree structure, inputs $a_0 \ldots a_3$, outputs $z_0 \ldots z_3$]
             or shared-tree structure: $A = O(n \log n)$, $T = O(\log n)$
             [Figure: shared-tree structure, inputs $a_0 \ldots a_3$, outputs $z_0 \ldots z_3$]
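The difference between the serial and the tree-based evaluation of a prefix problem (case r.m.a.) can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this sketch, the log-depth variant uses recursive doubling (Kogge-Stone style), which is one possible structure with $O(n \log n)$ operations and $O(\log n)$ depth, and a non-empty input list is assumed.

```python
def prefix_serial(a, f):
    """Serial structure: z[i] = f(a[i], z[i-1]); n-1 operations, depth n-1."""
    z = [a[0]]
    for x in a[1:]:
        z.append(f(x, z[-1]))
    return z


def prefix_log_depth(a, f):
    """Log-depth structure for associative f (recursive doubling).

    After the step with distance d, z[i] combines a[i] ... a[max(0, i-2d+1)].
    """
    z = list(a)
    d = 1
    while d < len(z):
        z = [f(z[i], z[i - d]) if i >= d else z[i] for i in range(len(z))]
        d *= 2
    return z


# f(hi, lo) = lo + hi: string concatenation is associative but not commutative,
# so the argument order matters and is exercised here.
concat = lambda hi, lo: lo + hi
print(prefix_serial(list("abcd"), concat))
print(prefix_log_depth(list("abcd"), concat))
```

Both calls produce the same prefixes; the second one only works because `concat` is associative.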

    2 Arithmetic Operations

    2.1 Overview

    [Figure: arithmetic operations ordered by complexity for fixed-point numbers (the same applies to floating-point numbers), with "based on operation" and "related operation" dependencies]

     1  shift/extension
     2  comparison
     3  increment/decrement
     4  complement
     5  addition/subtraction
     6  multiplication
     7  division
     8  square root extraction
     9  exponential function
    10  logarithm function
    11  trigonometric functions
    12  hyperbolic functions

    2.2 Implementation Techniques

    • Direct implementation of dedicated units:
        • always: 1–5
        • in most cases: 6
        • sometimes: 7, 8
    • Sequential implementation using simpler units and several clock cycles ($\approx$ decomposition):
        • sometimes: 6
        • in most cases: 7, 8, 9
    • Table look-up techniques using ROMs:
        • universal: simple application to all operations
        • efficient only for single-operand operations of high complexity (8–12) and small word length (note: ROM size $= 2^n \times n$)
    • Approximation techniques using simpler units: 7–12
        • Taylor series expansion
        • polynomial and rational approximations
        • convergence of recursive equation systems
        • CORDIC (COordinate Rotation DIgital Computer)

    3 Number Representations

    3.1 Binary Number Systems (BNS)

    • Radix-2, binary number system (BNS): irredundant, weighted, positional, monotonic [1, 2]
    • An $n$-bit number is an ordered sequence of bits (binary digits):
      $A = (a_{n-1}, a_{n-2}, \ldots, a_0)_2$, $a_i \in \{0, 1\}$
    • Simple and efficient implementation in digital circuits
    • MSB/LSB (most-/least-significant bit): $a_{n-1}$ / $a_0$
    • Represents an integer or fixed-point number, exact
    • Fixed-point numbers: $A = (a_{m-1} \ldots a_0 \,.\, a_{-1} \ldots a_{m-n})$ with $m$-bit integer part $a_{m-1} \ldots a_0$ and $(n-m)$-bit fraction $a_{-1} \ldots a_{m-n}$

    Unsigned: positive or natural numbers

    • Value: $A = a_{n-1} 2^{n-1} + \cdots + a_1 2 + a_0 = \sum_{i=0}^{n-1} a_i 2^i$
    • Range: $[0,\ 2^n - 1]$

    Two's (2's) complement: standard representation of signed or integer numbers

    • Value: $A = -a_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} a_i 2^i$
    • Range: $[-2^{n-1},\ 2^{n-1} - 1]$
    • Complement: $-A = 2^n - A = \bar{A} + 1$, where $\bar{A} = (\bar{a}_{n-1}, \bar{a}_{n-2}, \ldots, \bar{a}_0)$
    • Sign: $a_{n-1}$
    • Properties: asymmetric range, compatible with unsigned numbers in many arithmetic operations (i.e. same treatment of positive and negative numbers)

    One's (1's) complement: similar to 2's complement

    • Value: $A = -a_{n-1} (2^{n-1} - 1) + \sum_{i=0}^{n-2} a_i 2^i$
    • Range: $[-(2^{n-1} - 1),\ 2^{n-1} - 1]$
    • Complement: $-A = 2^n - A - 1 = \bar{A}$
    • Sign: $a_{n-1}$
    • Properties: double representation of zero, symmetric range, modulo $(2^n - 1)$ number system

    Sign-magnitude: alternative representation of signed numbers

    • Value: $A = (-1)^{a_{n-1}} \cdot \sum_{i=0}^{n-2} a_i 2^i$
    • Range: $[-(2^{n-1} - 1),\ 2^{n-1} - 1]$
    • Complement: $-A = (\bar{a}_{n-1}, a_{n-2}, \ldots, a_0)$
    • Sign: $a_{n-1}$
    • Properties: double representation of zero, symmetric range, different treatment of positive and negative numbers in arithmetic operations, no MSB toggles at sign changes around 0 ($\Rightarrow$ low power)

    Graphical representation

    [Figure: value versus binary number representation (000...0, 011...1, 100...0, 111...1) for unsigned, 2's complement, 1's complement, and sign-magnitude numbers]

    Conventions

    • 2's complement is used for signed numbers in these notes
    • Unsigned and signed numbers can be treated equally in most cases; exceptions are mentioned
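The four value definitions above can be checked against each other with a short Python sketch. The function names and the MSB-first list convention are assumptions made for this illustration only.

```python
def unsigned(bits):
    """Value of an MSB-first bit list (a_{n-1}, ..., a_0) as unsigned number."""
    return sum(b << i for i, b in enumerate(reversed(bits)))


def twos_complement(bits):
    """The MSB has weight -2^(n-1) instead of +2^(n-1)."""
    return unsigned(bits) - (bits[0] << len(bits))


def ones_complement(bits):
    """The MSB has weight -(2^(n-1) - 1)."""
    return unsigned(bits) - bits[0] * ((1 << len(bits)) - 1)


def sign_magnitude(bits):
    """The MSB is the sign, the remaining bits are the magnitude."""
    return (-1) ** bits[0] * unsigned(bits[1:])


word = [1, 0, 1, 1]
for decode in (unsigned, twos_complement, ones_complement, sign_magnitude):
    print(decode.__name__, decode(word))
```

For the bit pattern 1011 this prints 11, -5, -4, and -3, which illustrates how the same word is read differently by each representation.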

    3.2 Gray Numbers

    • Gray numbers (code): binary, irredundant, non-weighted, non-monotonic
    + Property: unit-distance coding (i.e. exactly one bit toggles between adjacent numbers)
    • Applications: counters with low output toggle rate (low-power signal buses), representation of continuous signals for low-error sampling (no false numbers due to switching of different bits at different times)
    − Non-monotonic numbers: difficult arithmetic operations, e.g. addition, comparison: $00 < 01$ and $0 < 1$, $11 < 10$ but $1 > 0$
    • binary → Gray: $g_i = b_{i+1} \oplus b_i$ ; $b_n = 0$ ; $i = 0, \ldots, n-1$ (n.)
    • Gray → binary: $b_i = b_{i+1} \oplus g_i$ ; $b_n = 0$ ; $i = n-1, \ldots, 0$ (r.m.a.)

    decimal   binary             Gray
              b3 b2 b1 b0        g3 g2 g1 g0
       0       0  0  0  0         0  0  0  0
       1       0  0  0  1         0  0  0  1
       2       0  0  1  0         0  0  1  1
       3       0  0  1  1         0  0  1  0
       4       0  1  0  0         0  1  1  0
       5       0  1  0  1         0  1  1  1
       6       0  1  1  0         0  1  0  1
       7       0  1  1  1         0  1  0  0
       8       1  0  0  0         1  1  0  0
       9       1  0  0  1         1  1  0  1
      10       1  0  1  0         1  1  1  1
      11       1  0  1  1         1  1  1  0
      12       1  1  0  0         1  0  1  0
      13       1  1  0  1         1  0  1  1
      14       1  1  1  0         1  0  0  1
      15       1  1  1  1         1  0  0  0
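The two conversion rules can be tried out on integers in Python. This is a minimal sketch with hypothetical function names; the Gray-to-binary loop evaluates the recursive rule $b_i = b_{i+1} \oplus g_i$ serially, whereas hardware would typically use a prefix structure.

```python
def binary_to_gray(b):
    """g_i = b_{i+1} XOR b_i for all bit positions at once (non-recursive)."""
    return b ^ (b >> 1)


def gray_to_binary(g):
    """b_i is the XOR of g_i and all higher Gray bits (prefix XOR from the MSB)."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


for i in range(16):
    g = binary_to_gray(i)
    print(f"{i:2d}  {i:04b}  {g:04b}")
```

The printed columns reproduce the 4-bit table above.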

    3.3 Redundant Number Systems

    • Non-binary, redundant, weighted number systems [1, 2]
    • Digit set larger than radix (typically radix 2) $\Rightarrow$ multiple representations of the same number $\Rightarrow$ redundancy
    + No carry-propagation in adders $\Rightarrow$ more efficient implementation of adder-based units (e.g. multipliers and dividers)
    − Redundancy $\Rightarrow$ no direct implementation of relational operators $\Rightarrow$ conversion to irredundant numbers
    − Several bits used to represent one digit $\Rightarrow$ higher storage requirements
    − Expensive conversion into irredundant numbers (not necessary if redundant input operands are allowed)

    Delayed-carry or half-adder number representation:

    • digits in $\{0, 1, 2\}$, carry and sum bits $c_i, s_i \in \{0, 1\}$, each digit is $(c_{i+1}, s_i) = 2c_{i+1} + s_i$ with $c_{i+1} s_i = 0$
    • $A = \sum_{i=0}^{n-1} (2c_{i+1} + s_i)\, 2^i = (C, S)$
    • 1 digit holds the sum of 2 bits (no carry-out digit)
    • example: $(00, 10) = 00 + 10 = 01 + 01 = (10, 00)$
    • irredundant representation of $-1$ [8], since $c_{i+1} s_i = 0$

    Carry-save number representation:

    • digits in $\{0, 1, 2, 3\}$, carry and sum bits $c_i, s_i \in \{0, 1\}$, each digit is $(c_{i+1}, s_i) = 2c_{i+1} + s_i$
    • $A = \sum_{i=0}^{n-1} (2c_{i+1} + s_i)\, 2^i = (C, S)$
    • 1 digit holds the sum of 3 bits or of 1 digit + 1 bit (no carry-out digit, i.e. the carry is saved)
    • standard redundant number system for fast addition

    Signed-digit (SD) or redundant digit (RD) number representation:

    • $s_i \in \{-1, 0, 1\} = \{\bar{1}, 0, 1\}$, $S = \sum_{i=0}^{n-1} s_i 2^i$
    • no carry-propagation in $S = A + B$: the interim carry/sum digit pair of each position is redundant (e.g. $0 + 1 = 01 = 1\bar{1}$) and is chosen such that adding the incoming carry digit again yields a digit in $\{\bar{1}, 0, 1\}$
    • 1 digit holds the sum of 2 digits (no carry-out digit)
    • minimal SD representation: minimal number of non-zero digits, $0\,1\,1 \ldots 1\,1\,0 \rightarrow 1\,0\,0 \ldots 0\,\bar{1}\,0$
    • applications: sequential multiplication (fewer cycles), filters with constant coefficients (less hardware)
    • example: $7 = (0111 \mid 1\bar{1}11 \mid 10\bar{1}1 \mid 100\bar{1} \text{ (minimal)} \mid 1\bar{1}\bar{1}11 \mid \ldots)$
    • canonical SD representation: minimal SD with no two non-zero digits in sequence
    • SD → binary: carry-propagation necessary ($\Rightarrow$ adder)
    • other applications: high-speed multipliers [9]
    • similar to carry-save, simple use for signed numbers
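As an illustration of the canonical signed-digit form, the following Python sketch recodes an integer into digits from $\{-1, 0, 1\}$ with no two adjacent non-zero digits. The function name and the LSB-first digit list are choices made for this sketch; the recoding rule used (look at the two lowest bits) is the usual non-adjacent-form algorithm, not necessarily the procedure used in the lecture.

```python
def to_csd(x):
    """Return the canonical signed-digit (non-adjacent) form of x, LSB first."""
    digits = []
    while x != 0:
        if x & 1:
            # x mod 4 == 1 -> digit +1, x mod 4 == 3 -> digit -1;
            # either way the remaining x becomes divisible by 4,
            # so the next digit is guaranteed to be 0.
            d = 2 - (x & 3)
            x -= d
        else:
            d = 0
        digits.append(d)
        x >>= 1
    return digits


digits = to_csd(7)
print(digits[::-1])                                  # MSB first
print(sum(d * 2**i for i, d in enumerate(digits)))   # value check
```

For 7 the MSB-first result is `[1, 0, 0, -1]`, i.e. the minimal form $100\bar{1}$ from the example above.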

    3.4 Residue Number Systems (RNS)

    • Non-binary, irredundant, non-weighted number system [1]
    + Carry-free and fast additions and multiplications
    − Complex and slow other arithmetic operations (e.g. comparison, sign and overflow detection), because digits are not weighted; conversion to a weighted mixed-radix or binary system required
    • Codes for error detection and correction [1]
    • Possible applications (but hardly used):
        • digital filters: fast additions and multiplications
        • error detection and correction for arithmetic operations in conventional and residue number systems
    • Base is an $n$-tuple of integers $(m_{n-1}, m_{n-2}, \ldots, m_0)$, residues (or moduli) $m_i$ pairwise relatively prime
    • $A = (a_{n-1}, a_{n-2}, \ldots, a_0)_{m_{n-1}, \ldots, m_0}$, $a_i \in \{0, 1, \ldots, m_i - 1\}$
    • Range: $M = \prod_{i=0}^{n-1} m_i$, anywhere in $\mathbb{Z}$
    • $a_i = A \bmod m_i = |A|_{m_i}$

    • Arithmetic operations (each digit computed separately):
      $z_i = |Z|_{m_i} = |f(A, B)|_{m_i} = |f(|A|_{m_i}, |B|_{m_i})|_{m_i} = |f(a_i, b_i)|_{m_i}$
        • $|A + B|_m = |\,|A|_m + |B|_m\,|_m$
        • $|A \cdot B|_m = |\,|A|_m \cdot |B|_m\,|_m$
        • $|A^{-1}|_m = |A^{m-2}|_m$ (Fermat's theorem)

    • Best moduli $m_i$ are $2^k$ and $(2^k - 1)$:
        • high storage efficiency with $k$ bits
        • simple modular addition: $2^k$: $k$-bit adder without $c_{out}$; $2^k - 1$: $k$-bit adder with end-around carry ($c_{in} = c_{out}$)

    • Example: $(m_1, m_0) = (3, 2)$, $M = 6$

      A     -4 -3 -2 -1  0  1  2  3  4  5  6  7  8
      a_1    2  0  1  2  0  1  2  0  1  2  0  1  2
      a_0    0  1  0  1  0  1  0  1  0  1  0  1  0

      (any 6 consecutive values of $A$ form a possible range)

        • $|5|_6$: $(a_1, a_0) = (|5|_3, |5|_2) = (2, 1)$
        • $|4 + 5|_6 = (1, 0) + (2, 1) = (|1 + 2|_3, |0 + 1|_2) = (0, 1) = |3|_6$
        • $|4 \cdot 5|_6 = (1, 0) \cdot (2, 1) = (|1 \cdot 2|_3, |0 \cdot 1|_2) = (2, 0) = |2|_6$
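The example with moduli (3, 2) can be replayed in Python. The helper names below are hypothetical, and the back-conversion simply searches the range $0 \ldots M-1$, which is adequate for a toy example but not how a real converter would be built.

```python
from math import prod

MODULI = (3, 2)  # (m_1, m_0) from the example, pairwise relatively prime


def to_rns(x, moduli=MODULI):
    """Residue digits a_i = |x|_{m_i}."""
    return tuple(x % m for m in moduli)


def rns_add(a, b, moduli=MODULI):
    """Digit-wise modular addition, no carries between digits."""
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))


def rns_mul(a, b, moduli=MODULI):
    """Digit-wise modular multiplication."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))


def from_rns(r, moduli=MODULI):
    """Back-conversion by exhaustive search over the range 0 .. M-1."""
    return next(x for x in range(prod(moduli)) if to_rns(x, moduli) == tuple(r))


print(to_rns(5))                                  # (2, 1)
print(from_rns(rns_add(to_rns(4), to_rns(5))))    # |4 + 5|_6 = 3
print(from_rns(rns_mul(to_rns(4), to_rns(5))))    # |4 * 5|_6 = 2
```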

    3.5 Floating-Point Numbers

    • Larger range, smaller precision than fixed-point representation, inexact, real numbers [1, 2]
    • Double-number form $\Rightarrow$ discontinuous precision

      | S | biased exponent E | unsigned norm. mantissa M |

      $F = (-1)^S \cdot 1.M \cdot 2^{E - \mathrm{bias}}$

    • Basic arithmetic operations:
        • $A \cdot B = (-1)^{S_A \oplus S_B} \cdot (M_A \cdot M_B) \cdot 2^{E_A + E_B}$
        • $A \pm B = \left((-1)^{S_A} M_A \pm (-1)^{S_B} M_B \cdot 2^{E_B - E_A}\right) \cdot 2^{E_A}$
        • based on fixed-point add, multiply, and shift operations
        • postnormalization required (to keep the mantissa normalized)
    • Applications:
        • processors: real floating-point formats (e.g. IEEE standard), large range due to universal use
        • ASICs: usually simplified floating-point formats with small exponents, smaller range, used for range extension of normal fixed-point numbers
    • IEEE floating-point format:

      precision   n    n_M   n_E   bias   range              precision
      single      32   23     8     127   ≈ 3.4 · 10^38      ≈ 10^-7
      double      64   52    11    1023   ≈ 1.8 · 10^308     ≈ 10^-15

    3.6 Logarithmic Number System

    • Alternative representation to floating-point (i.e. mantissa + integer exponent → only fixed-point exponent) [1]
    • Single-number form $\Rightarrow$ continuous precision $\Rightarrow$ higher accuracy, more reliable

      | S | biased fixed-point exponent E |

      $L = (-1)^S \cdot 2^{E - \mathrm{bias}}$ (signed-logarithmic)

    • Basic arithmetic operations:
        • multiplication, division: $E_A \pm E_B$ (additionally consider sign)
        • addition, subtraction: by approximation or by addition in a conventional number system and double conversion
    + Simpler multiplication/exponentiation, more complex addition
    − Expensive conversion: (anti)logarithms (table look-up)
    • Applications: real-time digital filters

    3.7 Antitetrational Number System

    • Tetration (t. $x = 2^{2^{\cdot^{\cdot^{2}}}}$, a tower of $x$ twos) and antitetration (a.t. $x$) [10]
    • Larger range, smaller precision than logarithmic representation, otherwise analogous (i.e. $2^x$ → t. $x$, $\log x$ → a.t. $x$)

    3.8 Composite Arithmetic

    • Proposal for a new standard of number representations [10]
    • Scheme for storage and display of exact (primary: integer, secondary: rational) and inexact (primary: logarithmic, secondary: antitetrational) numbers
    • Secondary forms are used for numbers not representable by primary ones ($\Rightarrow$ no over-/underflow handling necessary)
    • Choice of number representation is hidden from the user, i.e. software/compiler selects the format for highest accuracy
    • Number representations:

      form              tag   value
      integer           00    2's complement integer
      rational          01    slash | denominator | numerator
      logarithmic       10    log integer | log fraction
      antitetrational   11    a.t. integer | a.t. fraction

    • Rational numbers: slash position (i.e. size of numerator/denominator) is variable and stored (floating slash)
    • Storage form sizes: 32-bit (short), 64-bit (normal), 128-bit (long), 256-bit (extended)
    • Implementation: mixed hardware/software solutions
    • Hardware proposal: long accumulator (4096 bits) holds any floating-point number in fixed-point format
    + higher accuracy
    − large hardware/software overhead

    3.9 Round-Off Schemes

    • Intermediate results with $d$ additional lower bits ($\Rightarrow$ higher accuracy):
      $A = (a_{n-1}, \ldots, a_0, a_{-1}, \ldots, a_{-d})$
    • Rounding: keeping the error $\epsilon$ small during final word length reduction:
      $R = (r_{n-1}, \ldots, r_0)$ ; $\epsilon = R - A$
    • Trade-off: numerical accuracy vs. implementation cost

    Truncation:

    • $R = (a_{n-1}, \ldots, a_0)$
    • bias $= -\frac{1}{2}(1 - 2^{-d})$ (= average error $\epsilon$)

    Round-to-nearest (i.e. normal rounding):

    • $R = $ truncation of $(A + 0.1_2)$
    • bias $= 2^{-(d+1)}$ (nearly symmetric)
    • $+0.1_2$ can often be included in the previous operation

    Round-to-nearest-even/-odd:

    • as round-to-nearest, except for the tie case $(a_{-1}, \ldots, a_{-d}) = (1, 0, \ldots, 0)$, where the result is forced to the nearest even (or odd) number; for round-to-nearest-even this means setting the LSB of the round-to-nearest result to 0
    • bias $= 0$ (symmetric)
    • mandatory in IEEE floating-point standard

    • 3 guard bits for rounding after floating-point operations: guard bit (postnormalization), round bit (round-to-nearest), sticky bit (round-to-nearest-even)
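The three round-off schemes can be compared on fixed-point values stored as Python integers with $d$ fractional bits. This is a sketch under the assumptions that $d \ge 1$ and that the stored integer equals the value times $2^d$; the function names are made up for the illustration.

```python
def truncate(a, d):
    """Drop the d lower bits (rounds toward minus infinity)."""
    return a >> d


def round_nearest(a, d):
    """Add 0.1 (binary), i.e. one half, then truncate."""
    return (a + (1 << (d - 1))) >> d


def round_nearest_even(a, d):
    """Like round_nearest, but a tie (fraction exactly 0.10...0) goes to the even neighbour."""
    r = round_nearest(a, d)
    tie = (a & ((1 << d) - 1)) == (1 << (d - 1))
    return r & ~1 if tie else r


d = 2
for a in (9, 10, 11, 14):  # 2.25, 2.5, 2.75, 3.5 with two fractional bits
    print(a / 2**d, truncate(a, d), round_nearest(a, d), round_nearest_even(a, d))
```

The value 2.5 shows the difference between the last two schemes: plain round-to-nearest gives 3, round-to-nearest-even gives 2.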

    4 Addition

    4.1 Overview

    [Figure: overview of adder types with "based on component" and "related component" dependencies: 1-bit adders (HA, FA, (m,k), (m,2)); carry-propagate adders (CPA: RCA, CSKA, CSLA, CIA, CLA, PPA, COSA); carry-save adders (CSA, 3-operand); multi-operand adders (adder array, adder tree, array adder, tree adder)]

    Legend:

    • HA: half-adder
    • FA: full-adder
    • (m,k): (m,k)-counter
    • (m,2): (m,2)-compressor
    • CPA: carry-propagate adder
    • RCA: ripple-carry adder
    • CSKA: carry-skip adder
    • CSLA: carry-select adder
    • CIA: carry-increment adder
    • CLA: carry-lookahead adder
    • PPA: parallel-prefix adder
    • COSA: conditional-sum adder
    • CSA: carry-save adder

    4.2 1-Bit Adders, (m, k)-Counters

    • Add up $m$ bits of the same magnitude (i.e. 1-bit numbers)
    • Output the sum as a $k$-bit number ($k = \lfloor \log m \rfloor + 1$)
    • or: count the 1's at the inputs $\Rightarrow$ (m, k)-counter [3] (combinational counters)

    Half-adder (HA), (2, 2)-counter

    • $(c_{out}, s) = 2c_{out} + s = a + b$
    • $A = 3$, $T = 2\ (1)$
    • $s = a \oplus b$ (sum)
    • $c_{out} = ab$ (carry-out)

    [Figure: half-adder symbol (inputs a, b; outputs c_out, s) and two gate-level schematics, one of them the reference]

    Full-adder (FA), (3, 2)-counter

    • $(c_{out}, s) = 2c_{out} + s = a + b + c_{in}$
    • $A = 7$, $T = 4\ (2)$
    • $g = ab$ (generate), $c^0 = ab$
    • $p = a \oplus b$ (propagate), $c^1 = a + b$
    • $s = a \oplus b \oplus c_{in} = p \oplus c_{in}$
    • $c_{out} = ab + ac_{in} + bc_{in} = ab + (a + b)c_{in} = ab + (a \oplus b)c_{in} = g + pc_{in} = \bar{p}a + pc_{in} = \bar{c}_{in}c^0 + c_{in}c^1$

    [Figure: full-adder symbol (inputs a, b, c_in; outputs c_out, s) and five gate-level schematics: built from two half-adders (signals g, p), the reference schematic, and multiplexer-based versions controlled by p or selecting between c^0 and c^1]

    (m, k)-counters

    • $(s_{k-1}, \ldots, s_0) = \sum_{j=0}^{k-1} s_j 2^j = \sum_{i=0}^{m-1} a_i$

    [Figure: (m,k)-counter symbol, inputs $a_{m-1} \ldots a_0$, outputs $s_{k-1} \ldots s_0$]

    • Usually built from full-adders
    • Associativity of addition allows conversion from a linear to a tree structure $\Rightarrow$ faster at the same number of FAs
    • $A = 7(m - \lfloor \log m \rfloor - 1)$

    Example: (7, 3)-counter

    • linear structure: $A = 28$, $T = 14$
      [Figure: (7,3)-counter built from four full-adders in a linear arrangement, inputs $a_0 \ldots a_6$, outputs $s_0, s_1, s_2$]
    • tree structure: $A = 28$, $T = 10$
      [Figure: (7,3)-counter built from four full-adders in a tree arrangement, inputs $a_0 \ldots a_6$, outputs $s_0, s_1, s_2$]
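The half-adder and full-adder equations, and the composition of a (7, 3)-counter from four full-adders, can be modelled bit by bit in Python. The wiring of `counter_7_3` below is one tree-like arrangement chosen for this sketch and is not taken from the original figure.

```python
from itertools import product


def half_adder(a, b):
    """Return (s, c_out) with s = a XOR b and c_out = a AND b."""
    return a ^ b, a & b


def full_adder(a, b, c_in):
    """Return (s, c_out) using generate g and propagate p."""
    g, p = a & b, a ^ b
    return p ^ c_in, g | (p & c_in)


def counter_7_3(a):
    """Count the ones among 7 input bits with 4 full-adders; returns (s2, s1, s0)."""
    s_x, c_x = full_adder(a[0], a[1], a[2])   # weight-1 bits
    s_y, c_y = full_adder(a[3], a[4], a[5])   # weight-1 bits
    s0, c_z = full_adder(s_x, s_y, a[6])      # remaining weight-1 bits
    s1, s2 = full_adder(c_x, c_y, c_z)        # three carries of weight 2
    return s2, s1, s0


# exhaustive check against the arithmetic definition
for bits in product((0, 1), repeat=7):
    s2, s1, s0 = counter_7_3(bits)
    assert 4 * s2 + 2 * s1 + s0 == sum(bits)
print("(7,3)-counter verified for all 128 input combinations")
```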

    4.3 Carry-Propagate Adders (CPA)

    • Add two $n$-bit operands $A$ and $B$ and an optional carry-in $c_{in}$ by performing carry-propagation [1, 2, 11]
    • Sum $(c_{out}, S)$ is an irredundant $(n+1)$-bit number
      $(c_{out}, S) = c_{out} 2^n + S = A + B + c_{in}$
      $2c_{i+1} + s_i = a_i + b_i + c_i$ ; $i = 0, 1, \ldots, n-1$ ; $c_0 = c_{in}$, $c_{out} = c_n$ (r.m.a.)

    [Figure: CPA symbol, inputs A, B, c_in; outputs S, c_out]

    Ripple-carry adder (RCA)

    • Serial arrangement of $n$ full-adders
    • Simplest, smallest, and slowest CPA structure
    • $A = 7n$, $T = 2n$, $AT = 14n^2$

    [Figure: ripple-carry adder, chain of full-adders with inputs $a_i, b_i$, outputs $s_i$, and carries $c_{in}, c_1, c_2, \ldots, c_{n-1}, c_{out}$]
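A bit-level model of the ripple-carry adder follows directly from the recursion $2c_{i+1} + s_i = a_i + b_i + c_i$. The sketch below is a software illustration only; the LSB-first bit lists and the helper names are assumptions of this example.

```python
def to_bits(x, n):
    """LSB-first list of the n lowest bits of x."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(b << i for i, b in enumerate(bits))


def ripple_carry_add(a_bits, b_bits, c_in=0):
    """Chain of full-adders; the carry ripples from bit 0 to bit n-1."""
    s_bits, c = [], c_in
    for a, b in zip(a_bits, b_bits):
        p = a ^ b                  # propagate
        s_bits.append(p ^ c)       # s_i = p_i XOR c_i
        c = (a & b) | (p & c)      # c_{i+1} = g_i OR p_i c_i
    return s_bits, c


s, c_out = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(s), c_out)   # 11 + 6 = 17 = 16 * 1 + 1
```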

    Carry-propagation speed-up techniques

    a) Concatenation of partial CPAs with fast $c_{in} \rightarrow c_{out}$

    [Figure: chain of partial CPAs for the bit groups $k-1{:}0$, $i-1{:}k$, ..., $n-1{:}j$, connected through the carries $c_{in}, c_k, c_i, c_j, c_{out}$]

    b) Fast carry look-ahead logic for the entire range of bits

    [Figure: adder with preprocessing, carry propagation, and postprocessing stages spanning all bits $a_i, b_i \rightarrow s_i$, with $c_{in}$ and $c_{out}$]

    Carry-skip adder (CSKA)

    • Type a): partial CPA with fast $c_k \rightarrow c_i$
      $c_i = \bar{P}_{i-1:k}\, c'_i + P_{i-1:k}\, c_k$ (bit group $(a_{i-1}, \ldots, a_k)$)
      $P_{i-1:k} = p_{i-1}\, p_{i-2} \cdots p_k$ (group propagate)
        1) $P_{i-1:k} = 0$: $c_k \nrightarrow c'_i$ and $c'_i$ selected ($c'_i \rightarrow c_i$)
        2) $P_{i-1:k} = 1$: $c_k \rightarrow c'_i$ but $c'_i$ skipped ($c_k \rightarrow c_i$)
      $\Rightarrow$ path $c_k \rightarrow c'_i \rightarrow c_i$ is never sensitized $\Rightarrow$ fast
      $\Rightarrow$ false path $\Rightarrow$ inherent logic redundancy $\Rightarrow$ problems in circuit optimization, timing analysis, and testing
    • Variable group sizes (faster): larger groups in the middle (minimize delays $a_0 \rightarrow c_k \rightarrow s_{i-1}$ and $a_k \rightarrow c_i \rightarrow s_{n-1}$)
    • Partial CPA typically is RCA or CSKA ($\Rightarrow$ multilevel CSKA)
    • Medium speed-up at small hardware overhead (+ AND/bit + MUX/group)
    • $A = 8n$, $T = 4n^{1/2}$, $AT = 32n^{3/2}$

    [Figure: carry-skip adder, partial CPAs for the bit groups $k-1{:}0$, $i-1{:}k$, ..., $n-1{:}j$; the group carry-out $c'_i$ and the group carry-in $c_k$ feed a multiplexer controlled by $P_{i-1:k}$ that produces $c_i$]

    Carry-select adder (CSLA)

    • Type a): partial CPA with fast $c_k \rightarrow c_i$ and $c_k \rightarrow s_{i-1:k}$
      $s_{i-1:k} = \bar{c}_k\, s^0_{i-1:k} + c_k\, s^1_{i-1:k}$
      $c_i = \bar{c}_k\, c^0_i + c_k\, c^1_i$
    • Two CPAs compute the two possible results ($c_{in} = 0/1$), the group carry-in selects the correct one afterwards
    • Variable group sizes (faster): larger groups at the end (MSB) (balance delays $a_0 \rightarrow c_k$ and $a_k \rightarrow c^0_i$)
    • Partial CPA typically is RCA, CSLA ($\Rightarrow$ multilevel CSLA), or CLA
    • High speed-up at high hardware overhead (+ MUX/bit + (CPA + MUX)/group)
    • $A = 14n$, $T = 2.8n^{1/2}$, $AT = 39n^{3/2}$

    [Figure: carry-select adder, first group with a single CPA ($c_{in} \rightarrow c_k$); each further group $i-1{:}k$ has two CPAs with carry-in 0 and 1 producing $s^0_{i-1:k}, c^0_i$ and $s^1_{i-1:k}, c^1_i$, and multiplexers controlled by $c_k$ selecting $s_{i-1:k}$ and $c_i$]
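The selection principle of the carry-select adder can be sketched in Python as follows. Each group computes both candidate results with a small ripple-carry model and the incoming group carry picks one. The group sizes `(2, 3, 3)` and all function names are arbitrary choices for this illustration, and the first group is also duplicated here for simplicity, which a real design would avoid.

```python
def rca(a_bits, b_bits, c_in):
    """Ripple-carry model of a partial CPA on LSB-first bit lists."""
    s_bits, c = [], c_in
    for a, b in zip(a_bits, b_bits):
        s_bits.append(a ^ b ^ c)
        c = (a & b) | ((a ^ b) & c)
    return s_bits, c


def carry_select_add(a_bits, b_bits, c_in=0, group_sizes=(2, 3, 3)):
    """Per group: compute results for carry-in 0 and 1, then select with c_k."""
    s_bits, c, k = [], c_in, 0
    for size in group_sizes:
        ga, gb = a_bits[k:k + size], b_bits[k:k + size]
        s0, c0 = rca(ga, gb, 0)      # result assuming group carry-in 0
        s1, c1 = rca(ga, gb, 1)      # result assuming group carry-in 1
        s_bits += s1 if c else s0    # multiplexers controlled by c_k
        c = c1 if c else c0
        k += size
    return s_bits, c


def bits(x, n=8):
    return [(x >> i) & 1 for i in range(n)]


s, c_out = carry_select_add(bits(200), bits(100))
print(sum(b << i for i, b in enumerate(s)), c_out)   # 300 = 256 * 1 + 44
```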

    Carry-increment adder (CIA)

    • Type a): partial CPA with fast $c_k \rightarrow c_i$ and $c_k \rightarrow s_{i-1:k}$
      $s_{i-1:k} = s'_{i-1:k} + c_k$
      $c_i = c'_i + P_{i-1:k}\, c_k$
      $P_{i-1:k} = p_{i-1}\, p_{i-2} \cdots p_k$ (group propagate)
    • Result is incremented after addition, if $c_k = 1$ [12, 11]
    • Variable group sizes (faster): larger groups at the end (MSB) (balance delays $a_0 \rightarrow c_k$ and $a_k \rightarrow c'_i$)
    • Partial CPA typically is RCA, CIA ($\Rightarrow$ multilevel CIA), or CLA
    • High speed-up at medium hardware overhead (+ AND/bit + (incrementer + AND-OR)/group)
    • Logic of CPA and incrementer can be merged [11]
    • $A = 10n$, $T = 2.8n^{1/2}$, $AT = 28n^{3/2}$

    [Figure: carry-increment adder, first group with a single CPA ($c_{in} \rightarrow c_k$); each further group $i-1{:}k$ has a CPA with carry-in 0 producing $s'_{i-1:k}$, $c'_i$, and $P_{i-1:k}$, followed by an incrementer (+1) controlled by $c_k$]

    Example: gate-level schematic of carry-increment adder (CIA)

    • only 2 different logic cells (bit-slices): IHA and IFA

      T             4   6  10  12  14  16  18  20  22  24  26  28  ...   38
      max. group            2   3   4   5   6   7   8   9  10  11  ...   16
      n             1   2   4   7  11  16  22  29  37  46  56  67  ...  137

    [Figure: gate-level CIA schematic built from IFA and IHA cells; groups: bit 0 (IHA), bit 1 (IHA), bits 3,2 (IFA + IHA), bits 6...4 (2 IFA + IHA), ..., bits i-1...k ((i-k-1) IFA + IHA)]

    Conditional-sum adder (COSA)

    • Type a): optimized multilevel CSLA with $\lceil \log n \rceil$ levels (i.e. double CPAs are merged at higher levels)
    • Correct sum bits ($s^0_{i-1:k}$ or $s^1_{i-1:k}$) are (conditionally) selected through $\lceil \log n \rceil$ levels of multiplexers
    • Bit groups of size $2^l$ at level $l$
    • Higher parallelism, more balanced signal paths
    • Highest speed-up at highest hardware overhead (2 RCA + more than $\lceil \log n \rceil$ MUX/bit)
    • $A = 3n \log n$, $T = 2 \log n$, $AT = 6n \log^2 n$

    [Figure: 4-bit conditional-sum adder, two full-adders per bit (carry-in 0 and 1) at level 0, multiplexer levels 1 and 2 selecting the sum bits $s_0 \ldots s_3$ and $c_{out}$]

    4 Addition 4.3 Carry-Propagate Adders (CPA)

    Carry-lookahead adder (CLA), traditional

    Type b) : carries looked ahead before sum bits computed
    Typically 4-bit blocks used (e.g. standard IC SN74181)

      $c_0 = c_{in}$
      $c_1 = g_0 + p_0 c_0$
      $c_2 = g_1 + p_1 g_0 + p_1 p_0 c_0$
      $c_3 = g_2 + p_2 g_1 + p_2 p_1 g_0 + p_2 p_1 p_0 c_0$
      $G'_3 = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0$
      $P'_3 = p_3 p_2 p_1 p_0$

    [Figure: carry-lookahead block (CLB) symbol. Inputs (g_0,p_0) ... (g_3,p_3) and c_0; outputs c_0 ... c_3 and the block signals (G'_3,P'_3).]

    Hierarchical arrangement using $\lceil \frac{1}{2} \log n \rceil$ levels : $(G'_3, P'_3)$ passed up, $c_0$ passed down between levels
    High speed-up at medium hardware overhead

      $A = 14n$ ,  $T = 4 \log n$ ,  $AT = 56n \log n$

    [Figure: 16-bit CLA. Four first-level CLBs for bits 3...0, 7...4, 11...8 and 15...12 pass their block (g,p) signals up to one second-level CLB, which returns c_0, c_4, c_8 and c_12.]

    + preprocessing :  $g_i = a_i b_i$ ,  $p_i = a_i \oplus b_i$
    + postprocessing :  $s_i = p_i \oplus c_i$
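    The lookahead equations of one 4-bit block can be checked with a short bit-level model. This is only a sketch: the function names `cla4`, `to_bits` and `from_bits` are illustrative and not part of the notes, and the generate/propagate definitions are the usual ones (g = a AND b, p = a XOR b).

```python
def to_bits(x, n):
    """Integer -> list of n bits, LSB first."""
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):
    """List of bits (LSB first) -> integer."""
    return sum(b << i for i, b in enumerate(bits))

def cla4(a, b, c_in):
    """One 4-bit carry-lookahead block; a and b are bit lists, LSB first."""
    g = [x & y for x, y in zip(a, b)]   # preprocessing: generate
    p = [x ^ y for x, y in zip(a, b)]   # preprocessing: propagate
    c = [c_in, 0, 0, 0]
    # All carries are computed directly from g, p and c_0 (no ripple).
    c[1] = g[0] | p[0] & c[0]
    c[2] = g[1] | p[1] & g[0] | p[1] & p[0] & c[0]
    c[3] = g[2] | p[2] & g[1] | p[2] & p[1] & g[0] | p[2] & p[1] & p[0] & c[0]
    # Block generate/propagate, passed up to the next hierarchy level.
    G = g[3] | p[3] & g[2] | p[3] & p[2] & g[1] | p[3] & p[2] & p[1] & g[0]
    P = p[3] & p[2] & p[1] & p[0]
    s = [p[i] ^ c[i] for i in range(4)]  # postprocessing: sum bits
    c_out = G | P & c[0]
    return s, c_out
```

    In hardware the four carries are evaluated in parallel; the sequential Python statements only mirror the formulas.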


    Parallel-prefix adders (PPA)

    Type b) : universal adder architecture comprising RCA, CIA, CLA, and more (i.e. entire range of area-delay trade-offs from slowest RCA to fastest CLA)
    Preprocessing, carry-lookahead, and postprocessing step
    Carries calculated using parallel-prefix algorithms
    + High regularity : suitable for synthesis and layout
    + High flexibility : special adders, other arithmetic operations, exchangeable prefix algorithms (i.e. speeds)
    + High performance : smallest and fastest adders

      $A = 5n + 3A_\bullet$ ,  $T = 4 + 2T_\bullet$   ($A_\bullet$, $T_\bullet$ : size and depth of the prefix graph)

    [Figure: general prefix adder structure. Preprocessing turns a_i, b_i into (g_i, p_i); the carry-lookahead stage (prefix algorithm) computes c_1 ... c_n from (g_0, p_0) ... (g_{n-1}, p_{n-1}) and c_in; postprocessing forms s_i from p_i and c_i.]


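    Prefix problem

    Inputs $(x_{n-1}, \ldots, x_0)$, outputs $(y_{n-1}, \ldots, y_0)$, associative binary operator $\bullet$ [11, 13] :

      $(y_{n-1}, \ldots, y_0) = (x_{n-1} \bullet \cdots \bullet x_0, \; \ldots, \; x_1 \bullet x_0, \; x_0)$
      or  $y_0 = x_0$ ,  $y_i = x_i \bullet y_{i-1}$ ;  $i = 1, \ldots, n-1$   (r.m.a.)

    Associativity of $\bullet$ allows tree structures for evaluation :

      serial :  $x_3 \bullet (x_2 \bullet (x_1 \bullet x_0))$  with  $y_1 = Y^1_{1:0}$ ,  $y_2 = Y^2_{2:0}$ ,  $y_3 = Y^3_{3:0}$
      tree :    $(x_3 \bullet x_2) \bullet (x_1 \bullet x_0) = Y^1_{3:2} \bullet Y^1_{1:0}$  with  $y_1 = Y^1_{1:0}$ ,  $y_3 = Y^2_{3:0}$ ,  but $y_2$ ?

    Group variables $Y^l_{i:k}$ : covers bits $(x_k, \ldots, x_i)$ at level $l$

    Carry-propagation is prefix problem :  $Y^l_{i:k} = (G^l_{i:k}, P^l_{i:k})$

      $(G^0_{i:i}, P^0_{i:i}) = (g_i, p_i)$
      $(G^l_{i:k}, P^l_{i:k}) = (G^{l-1}_{i:j+1}, P^{l-1}_{i:j+1}) \bullet (G^{l-1}_{j:k}, P^{l-1}_{j:k})$ ;  $k \le j < i$
                            $= (G^{l-1}_{i:j+1} + P^{l-1}_{i:j+1} G^{l-1}_{j:k}, \; P^{l-1}_{i:j+1} P^{l-1}_{j:k})$
      $c_{i+1} = G^m_{i:0}$ ;  $i = 0, \ldots, n-1$ ;  $l = 1, \ldots, m$

    Parallel-prefix algorithms [11] :
      multi-tree structures (delay $O(n)$ reduced to $O(\log n)$)
      sharing subtrees (area $O(n^2)$ reduced to $O(n \log n)$)
      different algorithms trading area vs. delay (influences also from wiring and maximum fan-out)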
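    The prefix formulation of carry propagation can be modelled in a few lines. The sketch below evaluates the prefix problem serially (the ripple-carry case); `dot` and `prefix_add` are hypothetical helper names, and the carry-in is applied as $c_{i+1} = G_{i:0} + P_{i:0} c_{in}$, which is one common way to handle it and is an assumption here.

```python
def dot(hi, lo):
    """Prefix operator on (G, P) pairs: upper group 'hi' combined with lower group 'lo'."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)

def prefix_add(a, b, n, c_in=0):
    """n-bit addition via the prefix problem, evaluated serially."""
    # Preprocessing: x_i = (g_i, p_i)
    x = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]
    # Prefix computation: y_0 = x_0, y_i = x_i . y_{i-1}
    y = [x[0]]
    for i in range(1, n):
        y.append(dot(x[i], y[i - 1]))
    # Carries: c_0 = c_in, c_{i+1} = G_{i:0} + P_{i:0} c_in
    c = [c_in] + [g | (p & c_in) for g, p in y]
    # Postprocessing: s_i = p_i xor c_i
    s = sum((x[i][1] ^ c[i]) << i for i in range(n))
    return s, c[n]
```

    Because `dot` is associative, the loop can be replaced by any tree-shaped evaluation order without changing the result; this is what the parallel-prefix algorithms exploit.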


    Prefix algorithms

    Algorithms visualized by directed acyclic graphs (DAG) with array structure ($n$ bits $\times$ $m$ levels)
    Graph vertex symbols :
      black node : combines $Y^{l-1}_{i:j+1}$ and $Y^{l-1}_{j:k}$ into $Y^l_{i:k}$ (contains logic for $\bullet$)
      white node : passes $Y^{l-1}_{i:k}$ on as $Y^l_{i:k}$ (contains no logic)
    Performance measures :
      $A_\bullet$ : graph size (number of black nodes)
      $T_\bullet$ : graph depth (number of black nodes on critical path)

    Serial-prefix algorithm (corresponds to RCA)

      $A_\bullet = n - 1$ ,  $T_\bullet = n - 1$ ,  $FO_{max} = 2$

    [Figure: serial-prefix graph for 16 bits (columns 15 ... 0, levels 0 ... 15).]


    Sklansky parallel-prefix algorithm (PPA-SK)

    Tree-like collection, parallel redistribution of carries

      $A_\bullet = \frac{1}{2} n \log n$ ,  $T_\bullet = \lceil \log n \rceil$ ,  $FO_{max} = \frac{1}{2} n$

    [Figure: Sklansky prefix graph for 16 bits (columns 15 ... 0, levels 0 ... 4).]

    Brent-Kung parallel-prefix algorithm (PPA-BK)

    Traditional CLA is PPA-BK with 4-bit groups
    Tree-like redistribution of carries (fan-out tree)

      $A_\bullet = 2n - \lceil \log n \rceil - 2$ ,  $T_\bullet = 2 \lceil \log n \rceil - 2$ ,  $FO_{max} = \log n$

    [Figure: Brent-Kung prefix graph for 16 bits (columns 15 ... 0, levels 0 ... 6).]


    Kogge-Stone parallel-prefix algorithm (PPA-KS)

    Very high wiring requirements

      $A_\bullet = n \log n - n + 1$ ,  $T_\bullet = \lceil \log n \rceil$ ,  $FO_{max} = 2$

    [Figure: Kogge-Stone prefix graph for 16 bits (columns 15 ... 0, levels 0 ... 4).]

    Carry-increment parallel-prefix algorithm (CIA)

      $A_\bullet = 2n - 1.4\, n^{1/2}$ ,  $T_\bullet = 1.4\, n^{1/2}$ ,  $FO_{max} = 1.4\, n^{1/2}$

    [Figure: carry-increment prefix graph for 16 bits (columns 15 ... 0, levels 0 ... 5).]
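    The size and depth figures of the Sklansky and Kogge-Stone graphs can be reproduced by generating the black nodes level by level. The sketch assumes that n is a power of two; the names `sklansky`, `kogge_stone` and `covers_all_prefixes` are illustrative only.

```python
def sklansky(n):
    """Black nodes per level as (column i, partner column j); n must be a power of two."""
    return [[(i, ((i >> l) << l) - 1) for i in range(n) if (i >> l) & 1]
            for l in range(n.bit_length() - 1)]

def kogge_stone(n):
    """Black nodes per level as (column i, partner column j); n must be a power of two."""
    return [[(i, i - (1 << l)) for i in range(1 << l, n)]
            for l in range(n.bit_length() - 1)]

def covers_all_prefixes(levels, n):
    """Check that every column i ends up covering bits i ... 0."""
    span = [(i, i) for i in range(n)]            # (highest bit, lowest bit) per column
    for level in levels:
        new = list(span)
        for i, j in level:
            assert span[i][1] == span[j][0] + 1  # the two groups must be adjacent
            new[i] = (span[i][0], span[j][1])
        span = new
    return all(span[i] == (i, 0) for i in range(n))

def size(levels):
    """Graph size: total number of black nodes."""
    return sum(len(level) for level in levels)
```

    For n = 16 the Sklansky graph has 32 black nodes and the Kogge-Stone graph has 49, both with depth 4, in line with the formulas for $A_\bullet$ and $T_\bullet$.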


    Mixed serial/parallel-prefix algorithm (RCA + PPA)

    Linear size-depth trade-off using parameter $k$ :  $0 \le k \le n - 2 \lceil \log n \rceil + 2$
      $k = 0$ : serial-prefix graph
      $k = n - 2 \lceil \log n \rceil + 2$ : Brent-Kung parallel-prefix graph
    Fills gap between RCA and PPA-BK (i.e. CLA) in steps of single $\bullet$-operations

      $A_\bullet = n - 1 + k$ ,  $T_\bullet = n - 1 - k$ ,  $FO_{max}$ = var.

    [Figure: mixed serial/parallel prefix graph for 16 bits (columns 15 ... 0, levels 0 ... 10).]


    Example : 4-bit parallel-prefix adder (PPA-SK)
      efficient AND-OR-prefix circuit for the generate and AND-prefix circuit for the propagate signals
      optimization : alternatingly AOI-/OAI- resp. NAND-/NOR-gates (inverting gates are smaller and faster)
      can also be realized using two MUX-prefix circuits

    [Figure: gate-level schematic of the 4-bit PPA-SK. Inputs a_3...a_0, b_3...b_0, c_in; outputs s_3...s_0, c_out and P_{n-1:0}.]


    Prefix adder synthesis

    Local prefix graph transformation :

    [Figure: two 4-bit prefix graphs (columns 3 ... 0). The depth-decreasing transform turns the graph with size 3 and depth 3 into the graph with size 4 and depth 2; the size-decreasing transform is the reverse step.]

    Repeated (local) prefix transformations result in overall minimization of graph depth or size
    Which sequence ? Goal: minimal size (area) at given depth (delay)
    Simple algorithm for sequence of applied transforms :
      Step 1 : prefix graph compression (depth minimization) : depth-decreasing transforms in right-to-left bottom-up order
      Step 2 : prefix graph expansion (size minimization) : size-decreasing transforms in left-to-right top-down order, if allowed depth not exceeded
    Prefix adder synthesis : 1) generate serial-prefix graph, 2) graph compression, 3) depth-controlled graph expansion, 4) generate pre-/postprocessing and prefix logic
    + Generates all previous prefix graphs (except PPA-KS)
    + Universal adder synthesis algorithm : generates area-optimal adders for any given timing constraints [11] (including non-uniform signal arrival times)


    Multilevel adders

    Multilevel versions of adders of type a) possible (CSKA, CSLA, and CIA; notation: 2-level CIA = CIA-2L)
    + Delay is $O(n^{1/(m+1)})$ for $m$ levels
    Area increase small for CSKA and CIA, high for CSLA (leads to COSA)
    Difficult computation of optimal group sizes

    Hybrid adders

    Arbitrary combinations of speed-up techniques possible, giving hybrid/mixed adder architectures
    Often used combinations : CLA and CSLA [14]
    Pure architectures usually perform best (at gate-level)

    Transistor-level adders

    Influence of logic styles (e.g. dynamic logic, pass-transistor logic : faster)
    + Efficient transistor-level implementation of ripple-carry chains (Manchester chain) [14]
    + Combinations of speed-up techniques make sense
    Much higher design effort
    Many efficient implementations exist and have been published


    Self-timed adders

    Average carry-propagation length : $\log n$
    + RCA is fast in average case ($O(\log n)$), slow in worst case, hence suitable for self-timed asynchronous designs [15]
    Completion detection is not trivial

    Adder performance comparisons

    Standard-cell implementations, 0.8 µm process

    [Figure: area [lambda^2] versus delay [ns] on logarithmic axes for 8-, 16-, 32-, 64- and 128-bit adders: RCA, CSKA-2L, CIA-1L, CIA-2L, PPA-SK, PPA-BK, CLA and COSA, with a line of constant AT for reference.]


    Complexity comparison under the unit-gate model

    adder      A                      T                      AT                      opt. 1)   syn. 2)
    RCA        $7n$                   $2n$                   $14n^2$                 aaa       yes
    CSKA-1L    $8n$                   $4n^{1/2}$             $32n^{3/2}$             aat       3)
    CSKA-2L    $8n$                   $\propto n^{1/3}$ 4)   $\propto n^{4/3}$ 4)    -         -
    CSLA-1L    $14n$                  $2.8n^{1/2}$           $39n^{3/2}$             -         -
    CIA-1L     $10n$                  $2.8n^{1/2}$           $28n^{3/2}$             att       yes
    CIA-2L     $10n$                  $3.6n^{1/3}$           $36n^{4/3}$             att       yes
    CIA-3L     $10n$                  $4.4n^{1/4}$           $44n^{5/4}$             -         yes
    PPA-SK     $\frac{3}{2}n\log n$   $2\log n$              $3n\log^2 n$            ttt       yes
    PPA-BK     $10n$                  $4\log n$              $40n\log n$             att       yes
    PPA-KS     $3n\log n$             $2\log n$              $6n\log^2 n$            -         -
    CLA 5)     $14n$                  $4\log n$              $56n\log n$             -         (yes)
    COSA       $3n\log n$             $2\log n$              $6n\log^2 n$            -         -

    1) optimality regarding area and delay
         aaa : smallest area, longest delay
         aat : small area, medium delay
         att : medium area, short delay
         ttt : large area, shortest delay
         -   : not optimal
    2) obtained from prefix adder synthesis
    3) automatic logic optimization not possible (redundancy)
    4) exact factors not calculated
    5) corresponds to 4-bit PPA-BK


    4.4 Carry-Save Adder (CSA)

    a) Adds three $n$-bit operands $A_0$, $A_1$, $A_2$ performing no carry-propagation (i.e. carries are saved) [1]

      $(C, S) = A_0 + A_1 + A_2$
      $2 c_{i+1} + s_i = a_{0,i} + a_{1,i} + a_{2,i}$ ;  $i = 0, \ldots, n-1$   (n.)

    [Figure: CSA symbol with inputs A0, A1, A2 and outputs C, S.]

    b) Adds one $n$-bit operand $A$ to an $n$-digit carry-save operand $(C, S)$, again yielding a carry-save result

    Result is in redundant carry-save format ($n$ digits), represented by two $n$-bit numbers $S$ (sum bits) and $C$ (carry bits)
    + Parallel arrangement of $n$ full-adders, constant delay

      $A = 7n$ ,  $T = 4$

    [Figure: carry-save adder as a row of n full-adders; bit slice i has inputs a_{0,i}, a_{1,i}, a_{2,i} and outputs s_i and c_{i+1}.]

    Multi-operand carry-save adders ($m > 3$) : adder array (linear arrangement), adder tree (tree arr.)
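    A word-level model makes the carry-save idea concrete: each bit position is an independent full-adder, so the sum and carry words are pure bitwise functions of the three operands. The function name `carry_save_add` is illustrative and not taken from the notes.

```python
def carry_save_add(a0, a1, a2, n):
    """Add three n-bit operands without carry propagation.

    Returns (c, s) with c + s == a0 + a1 + a2. The carry word is already
    shifted left by one position, i.e. bit i+1 of c is the carry out of slice i.
    """
    mask = (1 << n) - 1
    s = (a0 ^ a1 ^ a2) & mask                       # sum bit of every full-adder
    c = ((a0 & a1) | (a0 & a2) | (a1 & a2)) << 1    # majority = carry bit, weight 2
    return c, s
```

    A final carry-propagate addition `c + s` converts the redundant result back to an ordinary binary number.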


    4.5 Multi-Operand Adders

    Add three or more ($m > 2$) $n$-bit operands, yield $(n + \lceil \log m \rceil)$-bit result in irredundant number rep. [1, 2]

    Array adders

    Realization by array adders :
      a) linear arrangement of CPAs
      b) linear arr. of CSAs (adder array) and final CPA
    a) and b) differ in bit arrival times at final CPA :
      if CPA = RCA : a) and b) have same overall delay
      if fast final CPA : uniform bit arrival times required, i.e. CSA array (b)
    Fast implementation : CSA array + fast final CPA (note: array of fast CPAs not efficient/necessary)

      $A = (m-2) A_{CSA} + A_{CPA}$ ,  $T = (m-2) T_{CSA} + T_{CPA}$
      CPA = RCA :  $A = O(mn)$ ,  $T = O(m + n)$
      Fast CPA :   $A = O(mn + n \log n)$ ,  $T = O(m + \log n)$

    [Figure: m-operand adder as a linear chain of CSAs adding A_0 ... A_{m-1}, followed by a final CPA producing S.]


    a) 4-operand CPA (RCA) array :

    [Figure: three ripple-carry CPA rows built from FA and HA cells; inputs a_{0,i}, a_{1,i}, a_{2,i}, a_{3,i} for i = 0 ... n-1; outputs s_n, s_{n-1}, ..., s_0.]

    b) 4-operand CSA array with final CPA (RCA) :

    [Figure: two CSA rows built from FA and HA cells, followed by a final ripple-carry CPA; inputs a_{0,i}, a_{1,i}, a_{2,i}, a_{3,i} for i = 0 ... n-1; outputs s_n, s_{n-1}, ..., s_0.]


    (m, 2)-compressors

      $2 \left( c + \sum_{l=0}^{m-4} c_{out}^l \right) + s = \sum_{i=0}^{m-1} a_i + \sum_{l=0}^{m-4} c_{in}^l$

    [Figure: (m,2)-compressor symbol with inputs a_{m-1} ... a_0 and c_in^0 ... c_in^{m-4}, outputs c, s and c_out^0 ... c_out^{m-4}.]

    1-bit adders (similar to (m,k)-counters) [16]
    Compresses $m$ bits down to 2 by forwarding $(m - 3)$ intermediate carries to next higher bit position
    Is bit-slice of multi-operand CSA array (see the array figures above)
    + No horizontal carry-propagation (i.e. $c_{in}^l$ influences $c_{out}^k$ only for $k > l$)
    Built from full-adders (= (3,2)-compressor) or (4, 2)-compressors arranged in linear or tree structures
    Example : 4-operand adder using (4, 2)-compressors

    [Figure: 4-operand adder. One row of (4,2)-compressors (CSA stage) with inputs a_{0,i} ... a_{3,i}, followed by a ripple-carry CPA of FA and HA cells; outputs s_{n+1}, s_n, ..., s_0.]


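      $A = 7(m - 2)$
      $T = 4(m - 2)$  (linear arrangement of full-adders)  or  $T = 6(\lceil \log m \rceil - 1)$  (tree of optimized (4, 2)-compressors)

    Optimized (4, 2)-compressor :
      2 full-adders merged and optimized (i.e. XORs arranged in tree structure)

      with full-adders :  $A = 14$ ,  $T = 8$

    [Figure: (4,2)-compressor built from two cascaded FAs; inputs a_0 ... a_3 and c_in, outputs c_out, c and s.]

      optimized :  $A = 14$ ,  $T = 6$

    [Figure: optimized (4,2)-compressor with XOR tree and two multiplexers; inputs a_0 ... a_3 and c_in, outputs c_out, c and s.]

    + same area, 25% shorter delay
    SD-FA (signed-digit full-adder) is similar to (4, 2)-compressor regarding structure and complexity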
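    The version built from two full-adders is easy to model at the bit level. The sketch below uses hypothetical names (`full_adder`, `compressor42`) and shows the two properties that matter: the weighted sum is preserved, and `c_out` does not depend on `c_in`, so no carry ripples horizontally.

```python
def full_adder(x, y, z):
    """Return (carry, sum) of three bits."""
    return (x & y) | (x & z) | (y & z), x ^ y ^ z

def compressor42(a0, a1, a2, a3, c_in):
    """(4,2)-compressor from two cascaded full-adders.

    Returns (c_out, c, s) with 2*(c_out + c) + s == a0 + a1 + a2 + a3 + c_in.
    """
    c_out, t = full_adder(a0, a1, a2)   # first FA: c_out depends on a0..a2 only
    c, s = full_adder(t, a3, c_in)      # second FA takes the intermediate sum
    return c_out, c, s
```

    In a multi-operand adder, `c_out` of slice i feeds `c_in` of slice i+1, while `c` and `s` form the carry-save output.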


    Advantages of (4, 2)-compressors over FAs for realizing (m, 2)-compressors :
      higher compression rate (4:2 instead of 3:2)
      less deep and more regular trees

    tree depth              0   1   2   3   4   5    6    7   8   9  10
    # operands, FA          2   3   4   6   9  13   19   28  42  63  94
    # operands, (4,2)       2   4   8  16  32  64  128

    Example : (8, 2)-compressor

      full-adder tree :  $A = 42$ ,  $T = 16$

    [Figure: (8,2)-compressor as a tree of six FAs; inputs a_0 ... a_7 and c_in^0 ... c_in^4, outputs c, s and c_out^0 ... c_out^4.]

      (4, 2)-compressor tree :  $A = 42$ ,  $T = 12$

    [Figure: (8,2)-compressor as a tree of three (4,2)-compressors; inputs a_0 ... a_7 and c_in^0 ... c_in^4, outputs c, s and c_out^0 ... c_out^4.]
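    The FA row of the operand-count table follows a simple recurrence: one more level of (3,2) compression lets every group of three operands be reduced to two, so the operand count grows by a factor of 3/2, rounded down. The sketch below reproduces the row under that assumption; the function names are illustrative.

```python
def max_operands_fa(depth):
    """Largest number of operands a full-adder tree of the given depth can reduce to 2."""
    n = 2
    for _ in range(depth):
        n = 3 * n // 2      # each extra FA level turns 3 operands into 2
    return n

def max_operands_42(depth):
    """Same for a tree of (4,2)-compressors: every level halves the operand count."""
    return 2 << depth
```

    `[max_operands_fa(d) for d in range(11)]` yields the sequence 2, 3, 4, 6, 9, 13, 19, 28, 42, 63, 94 listed in the table.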


    Tree adders (Wallace tree)

    Adder tree : $n$-bit $m$-operand carry-save adder composed of $n$ tree-structured (m, 2)-compressors [1, 17]
    Tree adders : fastest multi-operand adders using an adder tree and a fast final CPA

      $A = n A_{(m,2)} + A_{CPA} = O(mn + n \log n)$
      $T = T_{(m,2)} + T_{CPA} = O(\log m + \log n)$

    Adder arrays and adder trees revisited

    Some FA can often be replaced by HA or eliminated (i.e. redundant due to constant inputs)
    Number of (irredundant) FA does not depend on adder structure, but number of HA does
    An $m$-operand adder accommodates $(m - 1)$ carry inputs
    Adder trees ($T = O(\log m)$) are faster than adder arrays ($T = O(m)$) at same amount of gates ($A = O(mn)$)
    Adder trees are less regular and have more complex routing than adder arrays : larger area, difficult layout (i.e. limited use in layout generators)


    4.6 Sequential Adders

    Bit-serial adder : sequential $n$-bit adder ($n$ clock cycles per addition)

    [Figure: bit-serial adder built from one FA with inputs a_i, b_i and output s_i.]

    Accumulators : sequential $m$-operand adders

    With CPA

    [Figure: accumulator with CPA; operand A is added to the stored result S.]

    With CSA and final CPA
      Allows higher clock rates
      Final CPA too slow : pipelining or multiple cycles for evaluation

    [Figure: accumulator with CSA in the loop and a final CPA producing S.]

    Mixed CSA/CPA : CSA with partial CPAs (i.e. fewer carries saved), trade-off between speed and register size


    5 Simple / Addition-Based Operations

    5.1 Complement and Subtraction

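    2's complementer (negation)

      $Z = -A = \overline{A} + 1$

    [Figure: negation; A is inverted and incremented (+1) to give Z.]

    2's complement subtractor

      $S = A - B = A + \overline{B} + 1$

    [Figure: subtractor; CPA with inputs A and inverted B, carry-in 1, outputs S and c_out.]

    2's complement adder/subtractor

      $S = A \pm B = A + (B \oplus sub) + sub$

    [Figure: adder/subtractor; CPA with inputs A and B XOR sub, carry-in sub, outputs S and c_out.]

    1's complement adder

      $S = A + B \pmod{2^n - 1} = A + B + c_{out}$   (end-around carry)

    [Figure: 1's complement adder; CPA with inputs A, B and c_out fed back to c_in.]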
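    The adder/subtractor and the end-around carry can be sketched at word level. Both functions are illustrative models with assumed names; `sub` is the control bit that both inverts B and supplies the carry-in.

```python
def add_sub(a, b, sub, n):
    """2's complement adder/subtractor: S = A + (B xor sub) + sub on n bits.

    Returns (S, c_out). For sub = 1 this computes A - B modulo 2**n.
    """
    mask = (1 << n) - 1
    b_x = b ^ (mask if sub else 0)      # bitwise XOR of every b_i with sub
    total = (a & mask) + (b_x & mask) + sub
    return total & mask, total >> n

def ones_complement_add(a, b, n):
    """1's complement addition: the carry-out is added back in (end-around carry)."""
    mask = (1 << n) - 1
    total = a + b
    total = (total & mask) + (total >> n)
    return total & mask
```

    For unsigned operands, `c_out` of a subtraction is 1 exactly when A >= B, which is what the comparators later in the notes rely on.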


    5.2 Increment / Decrement

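    Incrementer

    Adds a single bit $c_{in}$ to an $n$-bit operand $A$ :

      $c_{out} 2^n + Z = A + c_{in}$
      $z_i = a_i \oplus c_i$ ,  $c_{i+1} = a_i c_i$ ;  $i = 0, \ldots, n-1$
      $c_0 = c_{in}$ ,  $c_{out} = c_n$   (r.m.a.)

    [Figure: incrementer symbol (+1) with input A, carry-in c_in, output Z and carry-out c_out.]

    Corresponds to addition with $B = 0$ (FA becomes HA)
    Example : Ripple-carry incrementer using half-adders

      $A = 3n$ ,  $T = n + 1$ ,  $AT \approx 3n^2$

    [Figure: ripple-carry incrementer as a chain of n half-adders; inputs a_{n-1} ... a_0 and c_in, internal carries c_1 ... c_{n-1}, outputs z_{n-1} ... z_0 and c_out.]

    or using incrementer slices (= half-adder)

    [Figure: gate-level ripple-carry incrementer built from incrementer slices; inputs a_{n-1} ... a_0 and c_in, outputs z_{n-1} ... z_0 and c_out.]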


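    Prefix problem :  $C_{i:k} = C_{i:j+1} \, C_{j:k}$ , i.e. AND-prefix structure

      $A = \frac{1}{2} n \log n + 2n$ ,  $T = \lceil \log n \rceil + 2$ ,  $AT \approx \frac{1}{2} n \log^2 n$

    Decrementer

      $(c_{out}, Z) = A - c_{in}$

    [Figure: gate-level ripple-carry decrementer; inputs a_{n-1} ... a_0 and c_in, outputs z_{n-1} ... z_0 and c_out.]

    Incrementer-decrementer

      $(c_{out}, Z) = A \pm c_{in} = A + (-1)^{dec} c_{in}$

    [Figure: gate-level ripple-carry incrementer-decrementer with control input dec; inputs a_{n-1} ... a_0 and c_in, outputs z_{n-1} ... z_0 and c_out.]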


    Fast incrementers

    4-bit incrementer using multi-input gates :

    [Figure: 4-bit incrementer with multi-input AND gates; inputs a_3 ... a_0 and c_in, outputs z_3 ... z_0 and c_out.]

    8-bit parallel-prefix incrementer (Sklansky AND-prefix structure) :

    [Figure: 8-bit parallel-prefix incrementer; inputs a_7 ... a_0 and c_in, outputs z_7 ... z_0 and c_out.]


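    Gray incrementer

    Increments in Gray number system

      $c_0 = a_{n-1} \oplus a_{n-2} \oplus \cdots \oplus a_0$   (parity)
      $c_{i+1} = \overline{a_i} \, c_i$ ;  $i = 0, \ldots, n-3$   (r.m.a.)
      $z_0 = a_0 \oplus \overline{c_0}$
      $z_i = a_i \oplus a_{i-1} c_{i-1}$ ;  $i = 1, \ldots, n-2$
      $z_{n-1} = a_{n-1} \oplus c_{n-2}$

    Prefix problem, i.e. AND-prefix structure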
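    The Gray incrementer can be checked against the binary-reflected Gray code. The sketch below assumes the usual formulation (parity as initial carry, carry chain through zero bits, most significant bit toggled by the second-to-last carry); `gray_increment` and `gray` are illustrative names, and n >= 2 is assumed.

```python
def gray(k):
    """Binary-reflected Gray code of the integer k."""
    return k ^ (k >> 1)

def gray_increment(a, n):
    """Increment an n-bit Gray code word a (n >= 2), wrapping around at the end."""
    bit = lambda i: (a >> i) & 1
    c = [bin(a).count("1") & 1]              # c_0 = parity of a
    for i in range(n - 2):                   # c_{i+1} = not(a_i) and c_i
        c.append((1 - bit(i)) & c[i])
    z = bit(0) ^ (1 - c[0])                  # z_0 = a_0 xor not(c_0)
    for i in range(1, n - 1):                # z_i = a_i xor (a_{i-1} and c_{i-1})
        z |= (bit(i) ^ (bit(i - 1) & c[i - 1])) << i
    z |= (bit(n - 1) ^ c[n - 2]) << (n - 1)  # z_{n-1} = a_{n-1} xor c_{n-2}
    return z
```

    The carry chain is a pure AND chain, so it can be replaced by any of the AND-prefix structures used for the binary incrementer.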


    5.3 Counting

    Count clock cycles : counter ; divide clock frequency : frequency divider

    Binary counter
      Sequential in-/decrementer
      Incrementer speed-up techniques applicable
      Down- and up-down-counters using decrementers / incrementer-decrementers

    [Figure: counter block; incrementer (+1) with register Q clocked by clk, carry-in c_in and carry-out c_out.]

    Example : Ripple-carry up-counter using counter slices (= HA + FF), $c_{in}$ is count enable

    [Figure: ripple-carry up-counter built from counter slices; outputs q_{n-1} ... q_0, signals c_in and c_out.]

    Asynchronous counter using toggle-flip-flops (lower toggle rate, hence lower power)

    [Figure: asynchronous counter as a chain of toggle flip-flops (T) driven by clk; outputs q_{n-1} ... q_0.]


    Fast divider ($T = O(1)$) using delayed-carry numbers (irredundant carry-save representation of 1 allows using fast carry-save incrementer) [8]

    Gray counter

    Counter using Gray incrementer

    Ring counters

    Shift register connected to ring :

    [Figure: ring counter; flip-flops q_0, q_1, q_2, ..., q_{n-1} connected in a ring.]

    State is not encoded : $n$ FF for counting $n$ states
    Must be initialized correctly (e.g. 00...01)
    Applications:
      fast dividers (no logic between FF)
      state counter for one-hot coded FSMs

    Johnson / twisted-ring counter (inverted feed-back) :

    [Figure: Johnson counter; flip-flops q_0, q_1, q_2, ..., q_{n-1} connected in a ring with inverted feedback.]

    $n$ FF for counting $2n$ states
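    The state counts of both ring structures can be confirmed by simulating the shift registers. This is a minimal sketch with assumed names; the register is held in an integer and shifted towards the most significant bit.

```python
def ring_step(q, n):
    """One clock of an n-bit ring counter: rotate left by one position."""
    return ((q << 1) | (q >> (n - 1))) & ((1 << n) - 1)

def johnson_step(q, n):
    """One clock of an n-bit Johnson counter: the fed-back bit is inverted."""
    return ((q << 1) | (1 ^ (q >> (n - 1)))) & ((1 << n) - 1)

def period(step, start, n):
    """Number of clocks until the start state recurs."""
    q, count = step(start, n), 1
    while q != start:
        q, count = step(q, n), count + 1
    return count
```

    Starting the ring counter from the one-hot state 00...01 gives a period of n, while the Johnson counter started from all zeroes runs through 2n states.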


    5.4 Comparison, Coding, Detection

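    Comparison operations

      $EQ = (A = B)$                                   (equal)
      $NE = (A \ne B) = \overline{EQ}$                 (not equal)
      $GE = (A \ge B)$                                 (greater or equal)
      $LT = (A < B) = \overline{GE}$                   (less than)
      $GT = (A > B) = GE \cdot \overline{EQ}$          (greater than)
      $LE = (A \le B) = \overline{GE} + EQ$            (less or equal)

    Equality comparison

      $EQ = (A = B)$
      $eq_{i+1} = (a_i = b_i) \, eq_i = \overline{(a_i \oplus b_i)} \, eq_i$ ;  $i = 0, \ldots, n-1$
      $eq_0 = 1$ ,  $EQ = eq_n$   (r.s.a.)

    [Figure: equality comparator; bitwise comparison of a_{n-1} ... a_0 with b_{n-1} ... b_0 combined into EQ.]

    Magnitude comparison

      $GE = (A \ge B)$
      $ge_{i+1} = (a_i > b_i) + (a_i = b_i) \, ge_i = a_i \overline{b_i} + \overline{(a_i \oplus b_i)} \, ge_i$ ;  $i = 0, \ldots, n-1$
      $ge_0 = 1$ ,  $GE = ge_n$   (r.s.a.)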


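    Comparators

    Subtractor ($A - B$) :  $GE = c_{out}$ ,  $EQ = P_{n-1:0}$ (for free in PPA)

      $A = 7n$ ,  $T = 2n$   or   $A \approx \frac{3}{2} n \log n$ ,  $T \approx 2 \log n$

    [Figure: comparator from subtractor; CPA with inputs A and inverted B, carry-in 1; c_out = GE, P_{n-1:0} = EQ.]

    Optimized comparator :
      removing redundancies in subtractor (unused $S$)
      single-tree structure : speed-up at no cost

      $A = 6n$ ,  $T = 2n$ (ripple)  or  $T = 2 \log n$ (tree)

      example : ripple comparator using comparator slices

    [Figure: ripple comparator built from comparator slices (equality, magnitude, and combined equality & magnitude); inputs a_{n-1} ... a_0 and b_{n-1} ... b_0, outputs EQ and GE.]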
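    A ripple comparator built from comparator slices can be modelled by iterating from the least to the most significant bit, so that higher bits override the decision taken at lower ones. The recurrences and the name `compare` are assumptions made for this sketch.

```python
def compare(a, b, n):
    """Ripple comparator for unsigned n-bit operands; returns (EQ, GE)."""
    eq, ge = 1, 1                        # empty prefix: equal, hence also >=
    for i in range(n):                   # LSB first; the MSB slice decides last
        ai, bi = (a >> i) & 1, (b >> i) & 1
        same = 1 ^ ai ^ bi               # a_i == b_i
        ge = (ai & (1 - bi)) | (same & ge)
        eq = same & eq
    return eq, ge
```

    The remaining relations follow from the two outputs, e.g. LT as the complement of GE and GT as GE and not EQ.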


    Decoder

    Decodes binary number $A_{n-1:0}$ to vector $Z_{m-1:0}$ ($m = 2^n$)

      $z_i = 1$ if $A = i$ , $z_i = 0$ else ;  $i = 0, \ldots, m-1$

      $A = (n - 1) 2^n$ ,  $T = \lceil \log n \rceil$

    [Figure: decoder symbol (input A, output Z) and 3-to-8 decoder schematic with inputs a_2, a_1, a_0 and outputs z_7 ... z_0.]

    Encoder

    Encodes vector $A_{m-1:0}$ to binary number $Z_{n-1:0}$ ($m = 2^n$)
    (condition: exactly one bit of $A$ is set, i.e. $a_i = 1$ for $i = k$ and $a_i = 0$ else)

      $Z = i$ if $a_i = 1$ ;  $i = 0, \ldots, m-1$

      $A = n (2^{n-1} - 1)$ ,  $T = n - 1$

    [Figure: encoder symbol (input A, output Z) and 8-to-3 encoder schematic with inputs a_7 ... a_0 and outputs z_2, z_1, z_0 (note: connections according to PPA-SK).]


    Detection operations

    All-zeroes detection :  $z = \overline{a_{n-1} + a_{n-2} + \cdots + a_0}$
    All-ones detection :    $z = a_{n-1} a_{n-2} \cdots a_0$   (r.s.a.)

      $A = n$ ,  $T = \lceil \log n \rceil$

    Leading-zeroes detection (LZD) :
      for scaling, normalization, priority encoding
      a) non-encoded output :

         $z_i = \overline{a_{n-1}} \; \overline{a_{n-2}} \cdots \overline{a_{i+1}} \, a_i$   (e.g. 000101... gives 000100...)

         $A = 2n$ ,  $T = n$

    [Figure: ripple leading-zeroes detector; inputs a_{n-1} ... a_0, outputs z_{n-1} ... z_0.]

         prefix problem (r.m.a.), i.e. AND-prefix structure

      b) encoded output : + encoder
      signed numbers : + leading-ones detector (LOD)
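    The non-encoded leading-zeroes detector marks the position of the most significant 1. A minimal sketch, with the hypothetical name `lzd`, scans from the top bit downwards and keeps a flag that records whether all higher bits were zero:

```python
def lzd(a, n):
    """Non-encoded leading-zeroes detection: one-hot mark of the leading 1 of a."""
    z, all_zero_above = 0, 1
    for i in reversed(range(n)):         # MSB first
        ai = (a >> i) & 1
        z |= (all_zero_above & ai) << i  # z_i = (all higher bits zero) and a_i
        all_zero_above &= 1 - ai
    return z
```

    The running flag is an AND-prefix over the inverted input bits, which is why the ripple loop can be replaced by a parallel-prefix structure.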


    5.5 Shift, Extension, Saturation

    Shift :  a) shift $n$-bit vector by $k$ bit positions
             b) select $n$ out of more bits at position $k$
             also: logical (= unsigned), arithmetic (= signed)
    Rotation by $k$ bit positions, $n$ constant (logic operation)
    Extension of word lengths by $k$ bits ($n \to n + k$) (i.e. sign-extension for signed numbers)
    Saturation to highest/lowest value after over-/underflow

    shift a) un- l. 7 8 2 1 1 1 0 0 sllsigned r. 0 7 8 1 1 1 1 1 srlsigned l. 7 8 1 7 8 3 1 1 1 0 0 sla

    r. 7 8 1 7 8 1 7 8 2 1 1 1 1 sra

    shift b) unsigned 7 %

    8 1 1 1 1

    signed 2 7 8 1 7 % 8 2 1 1 1

    rotate l. 7 8 2 1 1 1 0 7 8 1 rol

    r. 0

    7

    8

    1 1 1 1

    1 rorextend un- l. 0 7 8 1 1 1 1 0

    signed r. 7 8 1 1 1 1 0 0signed l. 7 8 1 7 8 1 7 8 2 1 1 1 0

    r. 7 8 1 7 8 2 1 1 1 0 0

    saturate unsigned 7 8 1 1 1 1 7 8 1signed 7 8 1 7 8 1 1 1 1 7 8 1


    Applications :
      adaption of magnitude (shift a)) or word length (extension) of operands (e.g. for addition)
      multiplication/division by powers of 2 (shift)
      logic bit/byte operations (shift, rotation)
      scaling of numbers for word-length reduction (i.e. ignore leading zeroes, shift b)) or normalization (e.g. of floating-point numbers, shift a)) using LZD
      reducing error after over-/underflow (saturation)
    Implementation of shift/extension/rotation by
      constant values : hard-wired
      variable values : multiplexers
      $n$ possible values : $n$-by-$n$ barrel-shifter/rotator
    Example : 4-by-4 barrel-rotator

      $A = O(n^2)$ ,  $T = O(\log n)$

    [Figure: 4-by-4 barrel-rotator using multiplexers; inputs a_3 ... a_0, select signals s_1, s_0, outputs z_3 ... z_0.]

    [Figure: 4-by-4 barrel-rotator using tristate buffers; inputs a_3 ... a_0, decoded select signals s_1, s_0, outputs z_3 ... z_0.]
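    A multiplexer-based rotator is commonly organised as a logarithmic cascade: stage j rotates by $2^j$ positions if bit j of the rotation amount is set. The sketch below models that arrangement; the name `rotate_left` is illustrative, and n is assumed to be a power of two with 0 <= k < n.

```python
def rotate_left(a, k, n):
    """Rotate the n-bit word a left by k positions using log2(n) multiplexer stages."""
    mask = (1 << n) - 1
    stage = 0
    while (1 << stage) < n:
        if (k >> stage) & 1:             # select signal s_stage
            d = 1 << stage               # this stage rotates by 2**stage
            a = ((a << d) | (a >> (n - d))) & mask
        stage += 1
    return a
```

    Each stage corresponds to one row of 2-input multiplexers, which gives the logarithmic delay stated for the barrel-rotator.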


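    5.6 Addition Flags

    flag   formula                        description
    C      $c_n$                          carry flag
    V      $c_n \oplus c_{n-1}$           signed overflow flag
    Z      $s_i = 0$ for all $i$          zero flag
    N      $s_{n-1}$                      negative flag, sign

    Implementation of adder with flags

    C, N : for free
    V : fast $c_n$, $c_{n-1}$ computed by e.g. PPA, hence very cheap
    Z : a) $c_{in} = 1$ (subtract.) :  $Z = (A = B) = P_{n-1:0}$ (of PPA)
        b) $c_{in} = 0/1$ :
           1) $Z = \overline{s_{n-1} + s_{n-2} + \cdots + s_0}$   (r.s.a.)

              $A = n$ ,  $T = \lceil \log n \rceil$

           2) faster without final sum (i.e. carry prop.) [18]
              example :  01001 + 10110 + 1 = 00000

              $z_0 = \overline{(a_0 \oplus b_0) \oplus c_{in}}$
              $z_i = \overline{(a_i \oplus b_i) \oplus (a_{i-1} + b_{i-1})}$ ;  $i = 1, \ldots, n-1$
              $Z = z_{n-1} z_{n-2} \cdots z_0$   (r.s.a.)

              $A = 3n$ ,  $T = 4 + \lceil \log n \rceil$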
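    The flag definitions and the carry-free zero detection can be cross-checked with a small model. All names below are illustrative; the fast zero test assumes the formulation in which each sum bit is predicted to be zero from the bit pair at that position and the OR of the pair below it.

```python
def add_with_flags(a, b, c_in, n):
    """n-bit addition returning the sum and the flags C, V, Z, N."""
    mask = (1 << n) - 1
    low = (a & (mask >> 1)) + (b & (mask >> 1)) + c_in
    c_nm1 = low >> (n - 1)                       # carry into the MSB position
    total = a + b + c_in
    s, c_n = total & mask, total >> n
    flags = {"C": c_n, "V": c_n ^ c_nm1, "Z": int(s == 0), "N": s >> (n - 1)}
    return s, flags

def fast_zero(a, b, c_in, n):
    """Zero flag without computing the sum (no carry propagation)."""
    z = 1 ^ ((a ^ b) & 1) ^ c_in                 # z_0
    for i in range(1, n):
        p_i = ((a ^ b) >> i) & 1
        or_prev = ((a | b) >> (i - 1)) & 1       # a_{i-1} or b_{i-1}
        z &= 1 ^ p_i ^ or_prev                   # z_i, ANDed into Z
    return z
```

    For the example operands 01001 and 10110 with carry-in 1, both functions report a zero result.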


    Basic and derived condition flags

    [Table: formulas of the basic and derived condition flags (zero, negative, positive, overflow, underflow, and the comparison conditions) for unsigned and signed operands.]

    Unsigned and signed addition/subtraction only differ with respect to the condition flags


    5.7 Arithmetic Logic Unit (ALU)

    [Figure: ALU symbol; inputs A, B, c_in and op, outputs Z, c_out and flags.]

    ALU operations

      arithmetic :    add, sub, inc ($A + 1$), dec ($A - 1$), pass, neg
      logic :         and, nand, or, nor, xor, xnor, pass, not
      shift/rotate :  sll, srl, sla, sra, rol, ror (by 1 bit position)

      s/ro : shift/rotate ;  l/r : left/right ;  l/a : logic (unsigned) / arithmetic (signed)

    Logic of adder/subtractor can partly be shared with logic operations


    6 Multiplication

    6.1 Multiplication Basics

    Multiplies two $n$-bit operands $A$ and $B$ [1, 2]

    Product $P$ is a $(2n)$-bit unsigned number or a $(2n-1)$-bit signed number

    Example : unsigned multiplication

    $P = A \cdot B = \sum_{i=0}^{n-1} a_i 2^i \cdot \sum_{j=0}^{n-1} b_j 2^j = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} a_i b_j 2^{i+j}$

    or

    $P = \sum_{i=0}^{n-1} a_i B\, 2^i$ ; $i = 0, \ldots, n-1$ (r.s.a.)

    Algorithm

    1) Generation of $n$ partial products

    2) Adding up partial products :

    a) sequentially (sequential shift-and-add),
    b) serially (combinational shift-and-add), or
    c) in parallel

    Speed-up techniques

    Reduce number of partial products
    Accelerate addition of partial products
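The two algorithm steps (generate the partial products, then add them up) can be sketched in software as a sequential shift-and-add loop. The function name is hypothetical, and plain integer addition stands in for the carry-propagate adder.

```python
def shift_and_add_multiply(a: int, b: int, n: int) -> int:
    """Unsigned n-bit multiplication by accumulating n partial products."""
    assert 0 <= a < (1 << n) and 0 <= b < (1 << n)
    p = 0
    for i in range(n):
        a_i = (a >> i) & 1
        partial_product = (b << i) if a_i else 0   # a_i * B * 2^i
        p += partial_product                       # one addition per step
    return p
```

Each loop iteration corresponds to one cycle of a sequential multiplier; an array or parallel multiplier performs the same additions in space instead of time.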


    Sequential multipliers : partial products generated and added sequentially (using accumulator)

    $A = O(n)$ , $T = O(n \log n)$

    [Figure mulseq: sequential multiplier with CPA]

    Array multipliers : partial products generated and added simultaneously in linear array (using array adder)

    $A = O(n^2)$ , $T = O(n)$

    [Figure mularr: linear array of CSAs followed by CPA]

    Parallel multipliers : partial products generated in parallel and added subsequently in multi-operand adder (using tree adder)

    $A = O(n^2)$ , $T = O(\log n)$

    [Figure mulpar: CSA tree followed by CPA]

    Signed multipliers :

    a) complement operands before and result after multiplication → unsigned multiplication
    b) direct implementation (dedicated multiplier structure)


    6.2 Unsigned Array Multiplier

    Braun multiplier : array multiplier for unsigned numbers

    $P = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} a_i b_j 2^{i+j}$

    $A = 8n^2 - 11n$ , $T = 6n - 9$

                            a0b3  a0b2  a0b1  a0b0
                      a1b3  a1b2  a1b1  a1b0
                a2b3  a2b2  a2b1  a2b0
          a3b3  a3b2  a3b1  a3b0
    p7    p6    p5    p4    p3    p2    p1    p0

    [Figure mulbraun: 4×4 Braun array multiplier with inputs a_3 ... a_0 and b_3 ... b_0 and outputs p_7 ... p_0, built from FA and HA cells (CSA rows followed by a CPA row); regions 1, 2, and 3 are marked]


    6.3 Signed Array Multipliers

    Modified Braun multiplier

    Subtract bits with negative weight → special FAs [1] (with 1 neg. bit or with 2 neg. bits)

    Replace FAs in regions 1, 2, and 3 by these special FAs (negative input at mark)

    Otherwise exactly same structure and complexity as Braun multiplier → efficient and flexible

    Baugh-Wooley multiplier

    Arithmetic transformations yield a modified set of partial products (two additional ones) :

    [Partial-product matrix of the 4×4 Baugh-Wooley multiplier for product bits p_7 ... p_0]

    Less efficient and regular than modified Braun multiplier


    6.4 Booth Recoding

    Speed-up technique : reduction of partial products

    Sequential multiplication

    Minimal (or canonical) signed-digit (SD) representation of $A$

    + One cycle per non-zero partial product (i.e. $a_i \neq 0$)

    − Negative partial products

    Data-dependent reduction of partial products and latency

    Combinational multiplication

    Only fixed reduction of partial products possible

    Radix-4 modified Booth recoding : 2 bits recoded to one multiplier digit → $n/2$ partial products

    $A = \sum_{i=0}^{n/2-1} (a_{2i-1} + a_{2i} - 2a_{2i+1})\, 2^{2i}$ ; $a_{-1} = 0$

    a_{2i+1}  a_{2i}  a_{2i-1}  |  digit
       0        0        0      |    0
       0        0        1      |   +1
       0        1        0      |   +1
       0        1        1      |   +2
       1        0        0      |   −2
       1        0        1      |   −1
       1        1        0      |   −1
       1        1        1      |    0

    [Figure mulbooth: Booth recoding logic feeding a CSA array/tree followed by CPA]


    Applicable to sequential, array, and parallel multipliers

    − additional recoding logic and more complex partial product generation (MUX for shift, XOR for negation)

    + adder array/tree cut in half → considerably smaller (array and tree)

    + much faster for adder arrays

    slightly or not faster for adder trees

    Negative partial products (avoid sign-extension) :

    [Partial-product matrices for product bits 6 ... 0: rows with extended sign bits, and the equivalent rows without sign extension plus a constant 1]

    Suited for signed multiplication (incl. Booth recoding)

    Extend $A$ for unsigned multiplication : $a_n = 0$

    Radix-8 (3-bit recoding) and higher radices : precomputing $3B, \ldots$ → larger overhead
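Radix-4 modified Booth recoding can be sketched as follows. This is a minimal software model under the assumption that the multiplier is an n-bit two's complement word with n even; the function names are hypothetical and the partial products are summed with ordinary integer arithmetic rather than a CSA array or tree.

```python
def booth_radix4_digits(a: int, n: int) -> list[int]:
    """Recode an n-bit two's complement word (n even) into n/2 digits
    from {-2, -1, 0, +1, +2}, least significant digit first."""
    assert n % 2 == 0
    bits = [(a >> k) & 1 for k in range(n)]
    digits = []
    prev = 0                                      # a_{-1} = 0
    for i in range(0, n, 2):
        digits.append(prev + bits[i] - 2 * bits[i + 1])
        prev = bits[i + 1]
    return digits


def booth_multiply(a: int, b: int, n: int) -> int:
    """Signed multiplication using n/2 Booth-recoded partial products."""
    digits = booth_radix4_digits(a & ((1 << n) - 1), n)
    # each digit selects 0, +-B or +-2B (shift and/or negation), weighted by 4^i
    return sum(d * b * 4**i for i, d in enumerate(digits))
```

An unsigned multiplier can reuse the same recoder by extending the multiplier with zero bits at the top, for example by calling `booth_multiply(a, b, n + 2)` for an n-bit unsigned `a`.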


    6.5 Wallace Tree Addition

    Speed-up technique : fast partial product addition

    $A = O(n^2)$ , $T = O(\log n)$

    Applicable to parallel multipliers : parallel partial product generation (normal or Booth recoded)

    Irregular adder tree (Wallace tree) due to different number of bits per column

    − irregular wiring and/or layout

    − non-uniform bit arrival times at final adder
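The logarithmic depth of tree addition can be illustrated with a word-level sketch: groups of three operands are compressed to two by carry-save adders until only two operands remain for the final adder. This is a simplification, since a real Wallace tree works per bit column; the function names are hypothetical.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 compression on whole words: x + y + z == s + c, no carry propagation."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def tree_multiply(a: int, b: int, n: int) -> tuple[int, int]:
    """Return (product, number of CSA levels) for unsigned n-bit operands."""
    operands = [(b << i) if (a >> i) & 1 else 0 for i in range(n)]
    levels = 0
    while len(operands) > 2:
        reduced = []
        # compress every full group of three operands within the same level
        for k in range(0, len(operands) - 2, 3):
            reduced.extend(carry_save_add(*operands[k:k + 3]))
        # operands that did not fit into a group pass through unchanged
        reduced.extend(operands[len(operands) - len(operands) % 3:])
        operands = reduced
        levels += 1
    return sum(operands), levels                  # final carry-propagate addition
```

The number of levels grows with the logarithm of the number of partial products, which is where the delay advantage over a linear CSA array comes from.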

    6.6 Multiplier Implementations

    Sequential multipliers :

    low performance, small area, resource sharing (adder)

    Braun or Baugh-Wooley multiplier (array multiplier) :

    medium performance, high area, high regularity
    layout generators → data paths and macro-cells
    simple pipelining, faster CPA → higher speed

    Booth-Wallace multiplier (parallel multiplier) [9] :

    high performance, high area, low regularity
    custom multipliers, netlist generators
    often pipelined (e.g. register between CSA-tree and CPA)

    Signed-unsigned multiplier : signed multiplier with operands extended by 1 bit ($a_n = a_{n-1}$ or 0, $b_n = b_{n-1}$ or 0)


    6.7 Composition from Smaller Multipliers

    A $(2n \times 2n)$-bit multiplier can be composed from 4 $(n \times n)$-bit multipliers (can be repeated recursively)

    $A \cdot B = (A_H 2^n + A_L)(B_H 2^n + B_L) = A_H B_H 2^{2n} + (A_H B_L + A_L B_H)\, 2^n + A_L B_L$

    4 $(n \times n)$-bit multipliers + $(2n)$-bit CSA + $(3n)$-bit CPA

    less efficient (area and speed)
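The composition can be checked with a short sketch in which each of the four products stands for one (n × n)-bit multiplier. The function and variable names are hypothetical, and the final additions are written as ordinary integer additions instead of a CSA followed by a CPA.

```python
def compose_multiply(a: int, b: int, n: int) -> int:
    """(2n x 2n)-bit unsigned product built from four (n x n)-bit products."""
    mask = (1 << n) - 1
    a_h, a_l = a >> n, a & mask       # upper and lower operand halves
    b_h, b_l = b >> n, b & mask
    hh = a_h * b_h                    # each product is one (n x n)-bit multiplier
    hl = a_h * b_l
    lh = a_l * b_h
    ll = a_l * b_l
    return (hh << (2 * n)) + ((hl + lh) << n) + ll
```

Applying the same split again inside each of the four products gives the recursive variant mentioned in the text.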

    6.8 Squaring

    $P = A^2$ : multiplier optimizations possible

    [Partial-product matrix of a 4-bit squarer before and after optimization, product bits p_7 ... p_0]

    + $n/2 + 1$ partial products (if no Booth recoding used) → optimized squarer more efficient than multiplier

    Table look-up (ROM) less efficient for every $n$


    7 Division / Square Root Extraction

    7.1 Division Basics

    $Q = A / B$ ; $R = A \ \mathrm{rem}\ B$ (remainder)

    $A \in [0, 2^{2n} - 1]$ ; $B \in [0, 2^n - 1]$ , $B \neq 0$

    $A < 2^n B$ (i.e. $Q < 2^n$) , otherwise overflow → normalize $B$ before division ($B \in [2^{n-1}, 2^n - 1]$)

    Algorithms (radix-2)

    Subtract-and-shift : partial remainders $R_i$ [1, 2]

    Sequential algorithm : recursive, non-associative

    $R_i = R_{i+1} - q_i B\, 2^i$ ; $R_n = A$ , $R = R_0$ ; $i = n-1, \ldots, 0$ (r.m.n.)

    Basic algorithm : compare and conditionally subtract → expensive comparison and CPA

    Restoring division : subtract and conditionally restore (adder or multiplexer) → expensive CPA and restoring

    Non-restoring division : detect sign, subtract/add, and correct by next steps → expensive CPA

    SRT division : estimate range, subtract/add (CSA), and correct by next steps → inexpensive CSA


    7.2 Restoring Division

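    $q_i = 1$ if $R_{i+1} - B\, 2^i \ge 0$ , $q_i = 0$ if $R_{i+1} - B\, 2^i < 0$

    $R'_i = R_{i+1} - B\, 2^i$

    $R'_i \ge 0$ : $q_i = 1$ , $R_i = R'_i$

    $R'_i < 0$ : $q_i = 0$ , $R_i = R_{i+1}$ (restored)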
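Restoring division can be sketched as a loop that tries one subtraction per quotient bit and keeps the old partial remainder when the difference is negative. This is a minimal software model for unsigned operands; the function name is hypothetical, and keeping the old value of `r` plays the role of the restoring adder or multiplexer.

```python
def restoring_divide(a: int, b: int, n: int) -> tuple[int, int]:
    """Unsigned radix-2 restoring division: returns (Q, R) with A = Q*B + R."""
    assert 0 < b < (1 << n) and 0 <= a < (b << n)   # otherwise Q overflows n bits
    r = a                                   # first partial remainder is A
    q = 0
    for i in range(n - 1, -1, -1):
        trial = r - (b << i)                # subtract B * 2^i
        if trial >= 0:
            r = trial                       # keep the difference, q_i = 1
            q |= 1 << i
        # else: q_i = 0 and r keeps its previous value (restored)
    return q, r
```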

    7.3 Non-Restoring Division

    $q_i = 1$ if $R_{i+1} \ge 0$ , $q_i = \bar{1}$ if $R_{i+1} < 0$

    $R_i = R_{i+1} - q_i B\, 2^i$ (subtract if $R_{i+1} \ge 0$ , add if $R_{i+1} < 0$)

    One subtraction/addition (CPA) per step

    Final correction step for $R$ (additional CPA)

    Simple quotient digit conversion (note: $q_i$ irredundant) : $q_i \in \{\bar{1}, 1\} \rightarrow \{0, 1\}$

    $A = O(n^2)$ or $O(n^2 \log n)$ ; $T = O(n^2)$ or $O(n \log n)$ ($n + 1$ CPAs, depending on the CPA type)

    [Figure divnr: non-restoring divider built from rows of +/− CPA, inputs A and B, outputs Q and R]
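Non-restoring division can be modeled with the sketch below, assuming unsigned operands and quotient digits from {−1, +1}. The function name and the helper variable `p` are hypothetical; the conversion line relies on the identity that a digit string with digits 2·p_i − 1 has the value 2·P − (2^n − 1).

```python
def nonrestoring_divide(a: int, b: int, n: int) -> tuple[int, int]:
    """Unsigned radix-2 non-restoring division: returns (Q, R), A = Q*B + R."""
    assert 0 < b < (1 << n) and 0 <= a < (b << n)
    r = a
    p = 0                                   # bit i is 1 for q_i = +1, 0 for q_i = -1
    for i in range(n - 1, -1, -1):
        if r >= 0:
            r -= b << i                     # q_i = +1: subtract
            p |= 1 << i
        else:
            r += b << i                     # q_i = -1: add
    q = 2 * p - ((1 << n) - 1)              # quotient digit conversion
    if r < 0:                               # final correction step
        r += b
        q -= 1
    return q, r
```

Only the sign of the partial remainder is inspected in each step, so no comparison against the divisor and no restoring addition is needed.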


    7.4 Signed Division

    $q_i = 1$ if $R_{i+1}$ , $B$ same sign ; $q_i = \bar{1}$ if $R_{i+1}$ , $B$ opposite sign

    Example : signed non-restoring array divider (simplifications: $B > 0$ , final correction of $R$ omitted)

    $A = 9n^2$ , $T = 2n^2 + 4n$

    [Figure divarray: 4×4 array of FA cells with inputs a_6 ... a_0 and b_3 ... b_0 and outputs q_3 ... q_0 and r_3 ... r_0]


    7.5 SRT Division (Sweeney, Robertson, Tocher)

    $q_i = 1$ if $B\, 2^i \le R_{i+1}$

    $q_i = 0$ if $-B\, 2^i \le R_{i+1} < B\, 2^i$

    $q_i = \bar{1}$ if $R_{i+1} < -B\, 2^i$

    $Q$ is SD number

    If $2^{n-1} \le B < 2^n$ , i.e. $B$ is normalized :

    $-B\, 2^i \le -2^{n+i-1} \le R_{i+1} < 2^{n+i-1} \le B\, 2^i$

    $q_i = 1$ if $2^{n+i-1} \le R_{i+1}$

    $q_i = 0$ if $-2^{n+i-1} \le R_{i+1} < 2^{n+i-1}$

    $q_i = \bar{1}$ if $R_{i+1} < -2^{n+i-1}$

    + Only 3 MSB are compared → $q_i$ are estimated → CSA instead of CPA can be used (precise enough) [19]

    − Correction in following steps (+ final correction step)

    − Redundant representation of $q_i$ (SD representation) → final conversion necessary (CPA)

    + Highly regular and fast ($O(n)$) SRT array dividers → only slightly slower/larger than array multipliers

    $A = O(n^2)$ , $T = O(n)$

    [Figure divsrt: SRT array divider built from rows of +/− CSA, a final +/− CPA, and a CPA for quotient conversion; inputs A and B, outputs Q and R]
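The digit selection against constants that do not depend on the divisor can be sketched as follows, assuming a normalized divisor and a selection threshold of 2^(n+i−1) in step i. The function name is hypothetical, and the model uses exact comparisons on a non-redundant remainder, so it does not reproduce the 3-MSB estimate on a carry-save remainder that makes the hardware fast.

```python
def srt_divide(a: int, b: int, n: int) -> tuple[int, int]:
    """Radix-2 SRT division for a normalized divisor 2^(n-1) <= B < 2^n."""
    assert (1 << (n - 1)) <= b < (1 << n) and 0 <= a < (b << n)
    r = a
    q = 0                                   # value of the signed-digit quotient
    for i in range(n - 1, -1, -1):
        threshold = 1 << (n + i - 1)        # constant, independent of B
        if r >= threshold:
            r -= b << i                     # q_i = +1
            q += 1 << i
        elif r < -threshold:
            r += b << i                     # q_i = -1
            q -= 1 << i
        # else q_i = 0: the partial remainder is left unchanged
    if r < 0:                               # final correction step
        r += b
        q -= 1
    return q, r
```

Accumulating `q` as an ordinary integer hides the final conversion from the signed-digit representation, which in hardware costs an additional CPA.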


    7.6 High-Radix Division

    Radix $2^m$ , quotient digits $q_i$ from a signed-digit set : $m$ quotient bits per step → fewer, but more complex steps

    + Suitable for SRT algorithm → faster

    − Complex comparisons (more bits) and decisions → table look-up (→ Pentium bug!)

    7.7 Division by Multiplication

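    Division by convergence

    $Q = \frac{A}{B} = \frac{A \cdot R_0 R_1 \cdots R_{m-1}}{B \cdot R_0 R_1 \cdots R_{m-1}} \rightarrow \frac{Q}{1}$

    $B_{i+1} = B_i R_i$ with $R_i = 2 - B_i$ : for $B_i = 1 - y$ , $B_{i+1} = (1 - y)(1 + y) = 1 - y^2$

    Algorithm :

    $A_{i+1} = A_i R_i$ , $B_{i+1} = B_i R_i$ , $R_i = 2 - B_i$ ; $i = 0, \ldots, m-1$ ; $A_0 = A$ , $B_0 = B$ , $Q = A_m$ (r.s.n.)

    Quadratic convergence : $m = O(\log n)$ steps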
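Division by convergence can be sketched in floating point, assuming the divisor has been normalized to the interval [0.5, 1). The function name, the use of Python floats, and the default step count are assumptions for illustration; a hardware version would work on fixed-point words and reuse one multiplier.

```python
def divide_by_convergence(a: float, b: float, steps: int = 6) -> float:
    """Approximate a / b for 0.5 <= b < 1 by driving the denominator to 1."""
    assert 0.5 <= b < 1.0
    for _ in range(steps):
        factor = 2.0 - b        # multiplying factor for this step
        a *= factor             # numerator converges towards the quotient
        b *= factor             # denominator converges towards 1
    return a
```

With b = 1 − y, each step replaces y by y², so the number of correct bits roughly doubles per iteration.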


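    Division by reciprocation

    $Q = A \cdot \frac{1}{B}$

    Newton-Raphson iteration method : find $f(X) = 0$ by recursion

    $X_{i+1} = X_i - \frac{f(X_i)}{f'(X_i)}$

    $f(X) = \frac{1}{X} - B$ , $f'(X) = -\frac{1}{X^2}$ , $f\left(\frac{1}{B}\right) = 0$

    Algorithm :

    $X_{i+1} = X_i (2 - B X_i)$ ; $i = 0, \ldots, m-1$ ; $Q = A \cdot X_m$ (r.s.n.)

    Quadratic convergence : $T = O(\log n)$

    Speed-up : first approximation $X_0$ from table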
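The Newton-Raphson reciprocal iteration can be sketched as follows, again assuming a divisor normalized to [0.5, 1). The function name and the constant starting value 1.5 are assumptions for illustration; a table look-up for the first approximation would reduce the number of steps.

```python
def reciprocal_newton(b: float, steps: int = 6) -> float:
    """Approximate 1 / b for 0.5 <= b < 1 with x <- x * (2 - b * x)."""
    assert 0.5 <= b < 1.0
    x = 1.5                      # crude first approximation (assumed, not from a table)
    for _ in range(steps):
        x = x * (2.0 - b * x)    # the error 1 - b*x is squared in every step
    return x


def divide_by_reciprocation(a: float, b: float) -> float:
    """Quotient as dividend times the reciprocal of the divisor."""
    return a * reciprocal_newton(b)
```

With the assumed starting value the initial error |1 − b·x| is at most 0.5, so six steps are more than enough for double precision.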

    7.8 Remainder / Modulus

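    Remainder (rem) : signed remainder of a division

    $R = A \ \mathrm{rem}\ B$ , $\mathrm{sign}(R) = \mathrm{sign}(A)$

    Modulus (mod) : positive remainder of a division

    $M = A \bmod B$ , $M \ge 0$ ; $M = R$ if $R \ge 0$ , $M = R + B$ else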
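The difference between the two operations shows up only for negative dividends. The sketch below uses hypothetical function names and assumes a positive divisor for the modulus; note that Python's built-in `%` operator already returns the non-negative result in that case.

```python
def rem(a: int, b: int) -> int:
    """Remainder with the sign of the dividend (truncating division)."""
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q                   # quotient rounded towards zero
    return a - q * b


def mod(a: int, b: int) -> int:
    """Non-negative remainder for a positive divisor b."""
    assert b > 0
    r = rem(a, b)
    return r if r >= 0 else r + b
```

For example, dividing −7 by 3 gives a remainder of −1 but a modulus of 2.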


    7.9 Divider Implementations

    Iterative dividers (through multiplication) :

    resource sharing of existing components (multiplier)
    medium performance, medium area
    high efficiency if components are shared

    Sequential dividers (restoring, non-restoring, SRT) :

    resource sharing of existing components (e.g. adder)
    low performance, low area

    Array dividers (restoring, non-restoring, SRT) :

    dedicated hardware component
    high performance, high area
    high regularity → layout generators, pipelining
    square root extraction possible by minor changes
    combination with multiplication or/and square root

    No parallel dividers exist, as compared to parallel multipliers (sequential nature of division)


    7.10 Square Root Extraction

    $Q = \sqrt{A}$ , $A = Q^2 + R$

    $A \in [0, 2^{2n} - 1]$ ; $Q \in [0, 2^n - 1]$

    Algorithm

    Subtract-and-shift : partial remainders $R_i$ and quotients

    %

    1

    2

    '

    7

    8 1 1 1 1

