Eidgenössische Technische Hochschule Zürich
Swiss Federal Institute of Technology Zurich
Politecnico federale di Zurigo
Ecole polytechnique fédérale de Zurich
Institut für Integrierte Systeme
Integrated Systems Laboratory

Lecture notes on
Computer Arithmetic: Principles, Architectures, and VLSI Design
March 16, 1999
Reto Zimmermann
Integrated Systems Laboratory
Swiss Federal Institute of Technology (ETH)
CH-8092 Zürich, Switzerland
[email protected]
Copyright © 1999 by Integrated Systems Laboratory, ETH Zürich
http://www.iis.ee.ethz.ch/~zimmi/publications/comp_arith_notes.ps.gz
Contents
1 Introduction and Conventions
  1.1 Outline
  1.2 Motivation
  1.3 Conventions
  1.4 Recursive Function Evaluation
2 Arithmetic Operations
  2.1 Overview
  2.2 Implementation Techniques
3 Number Representations
  3.1 Binary Number Systems (BNS)
  3.2 Gray Numbers
  3.3 Redundant Number Systems
  3.4 Residue Number Systems (RNS)
  3.5 Floating-Point Numbers
  3.6 Logarithmic Number System
  3.7 Antitetrational Number System
  3.8 Composite Arithmetic
  3.9 Round-Off Schemes
4 Addition
  4.1 Overview
  4.2 1-Bit Adders, (m, k)-Counters
  4.3 Carry-Propagate Adders (CPA)
  4.4 Carry-Save Adder (CSA)
  4.5 Multi-Operand Adders
  4.6 Sequential Adders
5 Simple / Addition-Based Operations
  5.1 Complement and Subtraction
  5.2 Increment / Decrement
  5.3 Counting
  5.4 Comparison, Coding, Detection
  5.5 Shift, Extension, Saturation
  5.6 Addition Flags
  5.7 Arithmetic Logic Unit (ALU)
6 Multiplication
  6.1 Multiplication Basics
  6.2 Unsigned Array Multiplier
  6.3 Signed Array Multipliers
  6.4 Booth Recoding
  6.5 Wallace Tree Addition
  6.6 Multiplier Implementations
  6.7 Composition from Smaller Multipliers
  6.8 Squaring
7 Division / Square Root Extraction
  7.1 Division Basics
  7.2 Restoring Division
  7.3 Non-Restoring Division
  7.4 Signed Division
  7.5 SRT Division
  7.6 High-Radix Division
  7.7 Division by Multiplication
  7.8 Remainder / Modulus
  7.9 Divider Implementations
  7.10 Square Root Extraction
8 Elementary Functions
  8.1 Algorithms
  8.2 Integer Exponentiation
  8.3 Integer Logarithm
9 VLSI Design Aspects
  9.1 Design Levels
  9.2 Synthesis
  9.3 VHDL
  9.4 Performance
  9.5 Testability
Bibliography
1 Introduction and Conventions
1.1 Outline
Basic principles of computer arithmetic [1, 2, 3, 4, 5, 6, 7]
Circuit architectures and implementations of main arithmetic operations
Aspects regarding VLSI design of arithmetic units
1.2 Motivation
Arithmetic units are, among others, the core of every data path and addressing unit
Data path is the core of :
  microprocessors (CPU)
  signal processors (DSP)
  data-processing application-specific ICs (ASIC) and programmable ICs (e.g. FPGA)
Standard arithmetic units are available from libraries
Design of arithmetic units is necessary for :
  non-standard operations
  high-performance components
  library development
1.3 Conventions
Naming conventions
  Signal buses : $A$ (1-D), $A_i$ (2-D), $a_{i:k}$ (subbus, 1-D)
  Signals : $a$, $a_i$ (1-D), $a_{i,k}$ (2-D), $A_{i:k}$ (group signal)
  Circuit complexity measures : $A$ (area), $T$ (cycle time, delay), $AT$ (area-time product), $L$ (latency, # cycles)
  Arithmetic operators : $+$, $-$, $\cdot$, $/$, $\log$ ($= \log_2$)
  Logic operators : $\lor$ (or), $\land$ (and), $\oplus$ (xor), $\odot$ (xnor), $\overline{\phantom{a}}$ (not)
Circuit complexity measures
  Unit-gate model (gate-equivalents (GE) model) :
    Inverter, buffer : $A = 0$, $T = 0$ (i.e. ignored)
    Simple monotonic 2-input gates (AND, NAND, OR, NOR) : $A = 1$, $T = 1$
    Simple non-monotonic 2-input gates (XOR, XNOR) : $A = 2$, $T = 2$
    Complex gates : composed from simple gates
    Simple $m$-input gates : $A = m - 1$, $T = \lceil \log m \rceil$
  Wiring not considered (acceptable for comparison purposes, local wiring, multilevel metallization)
  Only estimations given for complex circuits
1.4 Recursive Function Evaluation
Given : inputs $a_i$, outputs $z_i$, function $f$ (graph symbol : •)

Non-recursive functions (n.)
  Output $z_i$ is a function of input $a_i$ (or of a constant-size group of inputs)
  $z_i = f(a_i)$ ; $i = 0, \ldots, n-1$
  parallel structure : $A = O(n)$, $T = O(1)$
  [Figure funn.epsi: parallel structure, inputs a_0 ... a_3, outputs z_0 ... z_3]

Recursive functions (r.)
  Output $z_i$ is a function of all inputs $a_k$, $k \le i$

  a) with single output $z = z_{n-1}$ (r.s.) :
     $t_i = f(a_i, t_{i-1})$ ; $i = 0, \ldots, n-1$ ; $t_{-1} = 0/1$, $z = t_{n-1}$
     1. $f$ is non-associative (r.s.n.)
        serial structure : $A = O(n)$, $T = O(n)$
        [Figure funrsn.epsi: serial structure, inputs a_0 ... a_3, output z]
     2. $f$ is associative (r.s.a.)
        serial or single-tree structure : $A = O(n)$, $T = O(\log n)$
        [Figure funrsa.epsi: single-tree structure, inputs a_0 ... a_3, output z]

  b) with multiple outputs $z_i$ (r.m.) (prefix problem) :
     $z_i = f(a_i, z_{i-1})$ ; $i = 0, \ldots, n-1$ ; $z_{-1} = 0/1$
     1. $f$ is non-associative (r.m.n.)
        serial structure : $A = O(n)$, $T = O(n)$
        [Figure funrmn.epsi: serial structure, inputs a_0 ... a_3, outputs z_0 ... z_3]
     2. $f$ is associative (r.m.a.)
        serial or multi-tree structure : $A = O(n^2)$, $T = O(\log n)$
        [Figure funrma1.epsi: multi-tree structure, inputs a_0 ... a_3, outputs z_0 ... z_3]
        or shared-tree structure : $A = O(n \log n)$, $T = O(\log n)$
        [Figure funrma2.epsi: shared-tree structure, inputs a_0 ... a_3, outputs z_0 ... z_3]
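The difference between the serial and the single-tree structure for a single-output recursive function can be sketched in a few lines of Python. This is only an illustration under simple assumptions and not part of the original notes; the helper names `serial_eval` and `tree_eval` are hypothetical.

```python
from functools import reduce
from operator import sub, xor


def serial_eval(f, inputs):
    """Serial structure: chain of n-1 operator nodes, depth n-1.

    Works for any binary function f, associative or not.
    """
    return reduce(f, inputs)


def tree_eval(f, inputs):
    """Single-tree structure: n-1 operator nodes, depth ceil(log2 n).

    Matches serial_eval only if f is associative. Returns (result, depth).
    """
    level = list(inputs)
    depth = 0
    while len(level) > 1:
        # combine neighbouring pairs, keep an unpaired last element as is
        nxt = [f(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
        depth += 1
    return level[0], depth


bits = [1, 0, 1, 1, 0, 1, 1, 1]
parity_serial = serial_eval(xor, bits)            # XOR is associative
parity_tree, parity_depth = tree_eval(xor, bits)

nums = [8, 4, 2, 1]
diff_serial = serial_eval(sub, nums)              # ((8 - 4) - 2) - 1
diff_tree, _ = tree_eval(sub, nums)               # (8 - 4) - (2 - 1)
```

For the associative XOR both structures agree, while for the non-associative subtraction the tree gives a different value, which is why the tree structure is restricted to associative functions.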
2 Arithmetic Operations
2.1 Overview
[Figure arithops.epsi: dependency graph of the arithmetic operations below, ordered by complexity, for fixed-point numbers (the same applies to floating-point numbers); links mark "based on operation" and "related operation"]
  1 shift/extension
  2 comparison
  3 increment/decrement
  4 complement
  5 addition/subtraction
  6 multiplication
  7 division
  8 square root extraction
  9 exponential function
  10 logarithm function
  11 trigonometric functions
  12 hyperbolic functions
2.2 Implementation Techniques
Direct implementation of dedicated units :
  always : 1-5
  in most cases : 6
  sometimes : 7, 8
Sequential implementation using simpler units and several clock cycles (decomposition) :
  sometimes : 6
  in most cases : 7, 8, 9
Table look-up techniques using ROMs :
  universal : simple application to all operations
  efficient only for single-operand operations of high complexity (8-12) and small word length (note: ROM size $= 2^n \cdot n$)
Approximation techniques using simpler units : 7-12
  Taylor series expansion
  polynomial and rational approximations
  convergence of recursive equation systems
  CORDIC (COordinate Rotation DIgital Computer)
3 Number Representations
3.1 Binary Number Systems (BNS)
Radix-2, binary number system (BNS) : irredundant, weighted, positional, monotonic [1, 2]
$n$-bit number is ordered sequence of bits (binary digits) : $A = (a_{n-1}, a_{n-2}, \ldots, a_0)_2$, $a_i \in \{0, 1\}$
Simple and efficient implementation in digital circuits
MSB/LSB (most-/least-significant bit) : $a_{n-1}$ / $a_0$
Represents an integer or fixed-point number, exact
Fixed-point numbers : $A = (\underbrace{a_{m-1} \ldots a_0}_{m\text{-bit integer}} \, . \, \underbrace{a_{-1} \ldots a_{m-n}}_{(n-m)\text{-bit fraction}})$

Unsigned : positive or natural numbers
  Value : $A = a_{n-1} 2^{n-1} + \cdots + a_1 2 + a_0 = \sum_{i=0}^{n-1} a_i 2^i$
  Range : $[0, 2^n - 1]$

Two's (2's) complement : standard representation of signed or integer numbers
  Value : $A = -a_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} a_i 2^i$
  Range : $[-2^{n-1}, 2^{n-1} - 1]$
  Complement : $-A = 2^n - A = \overline{A} + 1$, where $\overline{A} = (\overline{a}_{n-1}, \overline{a}_{n-2}, \ldots, \overline{a}_0)$
  Sign : $a_{n-1}$
  Properties : asymmetric range, compatible with unsigned numbers in many arithmetic operations (i.e. same treatment of positive and negative numbers)

One's (1's) complement : similar to 2's complement
  Value : $A = -a_{n-1} (2^{n-1} - 1) + \sum_{i=0}^{n-2} a_i 2^i$
  Range : $[-(2^{n-1} - 1), 2^{n-1} - 1]$
  Complement : $-A = 2^n - A - 1 = \overline{A}$
  Sign : $a_{n-1}$
  Properties : double representation of zero, symmetric range, modulo $(2^n - 1)$ number system

Sign-magnitude : alternative representation of signed numbers
  Value : $A = (-1)^{a_{n-1}} \cdot \sum_{i=0}^{n-2} a_i 2^i$
  Range : $[-(2^{n-1} - 1), 2^{n-1} - 1]$
  Complement : $-A = (\overline{a}_{n-1}, a_{n-2}, \ldots, a_0)$
  Sign : $a_{n-1}$
  Properties : double representation of zero, symmetric range, different treatment of positive and negative numbers in arithmetic operations, no MSB toggles at sign changes around 0 (⇒ low power)

Graphical representation
  [Figure numrep.epsi: values of unsigned, 2's complement, 1's complement, and sign-magnitude numbers plotted over the binary number representation from 00...0 to 11...1]

Conventions
  2's complement used for signed numbers in these notes
  Unsigned and signed numbers can be treated equally in most cases, exceptions are mentioned
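The four value definitions above can be checked with a small Python sketch. It is an illustration only, not part of the original notes; the function names are hypothetical, and bit strings are written MSB first.

```python
def unsigned(bits: str) -> int:
    """Value of an unsigned n-bit number: sum of a_i * 2^i."""
    return int(bits, 2)


def twos_complement(bits: str) -> int:
    """MSB has weight -2^(n-1); equals the unsigned value minus 2^n if MSB = 1."""
    n = len(bits)
    return int(bits, 2) - (int(bits[0]) << n)


def ones_complement(bits: str) -> int:
    """MSB has weight -(2^(n-1) - 1); unsigned value minus (2^n - 1) if MSB = 1."""
    n = len(bits)
    return int(bits, 2) - int(bits[0]) * ((1 << n) - 1)


def sign_magnitude(bits: str) -> int:
    """MSB is the sign, the remaining bits are the magnitude."""
    magnitude = int(bits[1:], 2) if len(bits) > 1 else 0
    return -magnitude if bits[0] == "1" else magnitude


for pattern in ("0111", "1000", "1010", "1111"):
    print(pattern, unsigned(pattern), twos_complement(pattern),
          ones_complement(pattern), sign_magnitude(pattern))
```

The patterns `1111` (1's complement) and `1000` (sign-magnitude) both evaluate to zero, which illustrates the double representation of zero in these two systems.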
3.2 Gray Numbers
Gray numbers (code) : binary, irredundant, non-weighted, non-monotonic
+ Property : unit-distance coding (i.e. exactly one bit toggles between adjacent numbers)
Applications : counters with low output toggle rate (low-power signal buses), representation of continuous signals for low-error sampling (no false numbers due to switching of different bits at different times)
− Non-monotonic numbers : difficult arithmetic operations, e.g. addition, comparison : the 2-bit Gray numbers $(0,0)$, $(0,1)$, $(1,1)$, $(1,0)$ are in ascending order, so $(1,1) < (1,0)$ holds although the LSBs alone suggest $1 > 0$
binary → Gray : $g_i = a_{i+1} \oplus a_i$, $a_n = 0$ ; $i = 0, \ldots, n-1$ (n.)
Gray → binary : $a_i = a_{i+1} \oplus g_i$, $a_n = 0$ ; $i = n-1, \ldots, 0$ (r.m.a.)

        binary          Gray
        a3 a2 a1 a0     g3 g2 g1 g0
   0    0  0  0  0      0  0  0  0
   1    0  0  0  1      0  0  0  1
   2    0  0  1  0      0  0  1  1
   3    0  0  1  1      0  0  1  0
   4    0  1  0  0      0  1  1  0
   5    0  1  0  1      0  1  1  1
   6    0  1  1  0      0  1  0  1
   7    0  1  1  1      0  1  0  0
   8    1  0  0  0      1  1  0  0
   9    1  0  0  1      1  1  0  1
  10    1  0  1  0      1  1  1  1
  11    1  0  1  1      1  1  1  0
  12    1  1  0  0      1  0  1  0
  13    1  1  0  1      1  0  1  1
  14    1  1  1  0      1  0  0  1
  15    1  1  1  1      1  0  0  0
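The two conversion rules can be written compactly on integers. The following Python sketch is illustrative and not taken from the notes; the function names are hypothetical. The binary → Gray direction is non-recursive (one XOR per bit), while Gray → binary accumulates the XOR of all higher Gray bits, which is the recursive (prefix) form.

```python
def binary_to_gray(a: int) -> int:
    """g_i = a_(i+1) XOR a_i for all bit positions at once."""
    return a ^ (a >> 1)


def gray_to_binary(g: int) -> int:
    """a_i = g_(n-1) XOR ... XOR g_i, accumulated by repeated shifting."""
    a = 0
    while g:
        a ^= g
        g >>= 1
    return a


table = [(a, format(a, "04b"), format(binary_to_gray(a), "04b")) for a in range(16)]
```

`table` reproduces the 4-bit table above, e.g. the entry for 5 is `(5, "0101", "0111")`.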
3.3 Redundant Number Systems
Non-binary, redundant, weighted number systems [1, 2]
Digit set larger than radix (typically radix 2) ⇒ multiple representations of same number ⇒ redundancy
+ No carry-propagation in adders ⇒ more efficient implementation of adder-based units (e.g. multipliers and dividers)
− Redundancy ⇒ no direct implementation of relational operators ⇒ conversion to irredundant numbers
− Several bits used to represent one digit ⇒ higher storage requirements
− Expensive conversion into irredundant numbers (not necessary if redundant input operands are allowed)
Delayed-carry or half-adder number representation :
  $r_i \in \{0, 1, 2\}$, $c_i, s_i \in \{0, 1\}$, $r_i = (c_{i+1}, s_i) = 2 c_{i+1} + s_i$, $c_{i+1} s_i = 0$
  $R = \sum_{i=0}^{n-1} 2^i (c_i + s_i)$
  1 digit holds sum of 2 bits (no carry-out digit)
  example :
'
00 10)
00 10 01 01 '
10 00)
irredundant representation of 1 [8], since
%
1
0 &
1
1
0
Carry-save number representation :
  $r_i \in \{0, 1, 2, 3\}$, $c_i, s_i \in \{0, 1\}$, $r_i = (c_{i+1}, s_i) = 2 c_{i+1} + s_i$
  $R = \sum_{i=0}^{n-1} 2^i (c_i + s_i)$
  1 digit holds sum of 3 bits or 1 digit + 1 bit (no carry-out digit, i.e. carry is saved)
  standard redundant number system for fast addition
Signed-digit (SD) or redundant digit (RD) number representation :
  $s_i \in \{-1, 0, 1\} \equiv \{\bar{1}, 0, 1\}$, $S = \sum_{i=0}^{n-1} s_i 2^i$
  no carry-propagation in addition : each digit position forms an intermediate sum digit and a transfer digit, both in $\{\bar{1}, 0, 1\}$; this pair is redundant (e.g. $0 + 1 = 01 = 1\bar{1}$) and is chosen such that the final sum digit stays in $\{\bar{1}, 0, 1\}$
  1 digit holds sum of 2 digits (no carry-out digit)
  minimal SD representation : minimal number of non-zero digits, $01 \ldots 11 \to 10 \ldots 0\bar{1}$
  applications : sequential multiplication (fewer cycles), filters with constant coefficients (less hardware)
  example : $7 = (0111 \mid 1\bar{1}11 \mid 10\bar{1}1 \mid \underbrace{100\bar{1}}_{\text{minimal}} \mid 1\bar{1}\bar{1}11 \mid \ldots)$
  canonical SD representation : minimal SD + not two non-zero digits in sequence, e.g. $011 \to 10\bar{1}$
  SD → binary : carry-propagation necessary (⇒ adder)
  other applications : high-speed multipliers [9]
  similar to carry-save, simple use for signed numbers
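The canonical SD form can be obtained with a short recoding loop. The sketch below is one common way to do this (it is not necessarily the procedure intended in the notes); the function name `canonical_sd` is hypothetical, and digits are returned LSB first.

```python
def canonical_sd(value: int) -> list:
    """Recode an integer into canonical signed digits (LSB first).

    Each digit is -1, 0, or +1 and no two adjacent digits are non-zero.
    """
    digits = []
    while value != 0:
        if value & 1:
            # value mod 4 == 1 -> digit +1, value mod 4 == 3 -> digit -1,
            # so that the remaining value becomes divisible by 4
            d = 2 - (value & 3)
            value -= d
        else:
            d = 0
        digits.append(d)
        value >>= 1
    return digits


seven = canonical_sd(7)  # LSB first: -1, 0, 0, +1, i.e. 8 - 1
```

For 7 this yields the minimal representation with two non-zero digits from the example above.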
3.4 Residue Number Systems (RNS)
Non-binary, irredundant, non-weighted number system [1]
+ Carry-free and fast additions and multiplications
− Complex and slow other arithmetic operations (e.g. comparison, sign and overflow detection) because digits are not weighted, conversion to weighted mixed-radix or binary system required
Codes for error detection and correction [1]
Possible applications (but hardly used) :
  digital filters : fast additions and multiplications
  error detection and correction for arithmetic operations in conventional and residue number systems
Base is $n$-tuple of integers $(m_{n-1}, m_{n-2}, \ldots, m_0)$, residues (or moduli) pairwise relatively prime
  $A = (a_{n-1}, a_{n-2}, \ldots, a_0)_{m_{n-1}, m_{n-2}, \ldots, m_0}$, $a_i \in \{0, 1, \ldots, m_i - 1\}$
Range : $M = \prod_{i=0}^{n-1} m_i$, anywhere in $\mathbb{Z}$
  $a_i = A \bmod m_i = |A|_{m_i}$
  $A = \left| \sum_{i=0}^{n-1} a_i w_i \right|_M$, where the weight $w_i$ has the residue representation $(0, \ldots, 0, 1, 0, \ldots, 0)$ with the 1 at position $i$
Arithmetic operations : (each digit computed separately)
  $z_i = |Z|_{m_i} = |f(A)|_{m_i} = |f(|A|_{m_i})|_{m_i}$
  $|A \pm B|_{m_i} = \left| |A|_{m_i} \pm |B|_{m_i} \right|_{m_i}$
  $|A \cdot B|_{m_i} = \left| |A|_{m_i} \cdot |B|_{m_i} \right|_{m_i}$
  $|A^{-1}|_{m_i} = |A^{m_i - 2}|_{m_i}$ (Fermat's theorem, $m_i$ prime)
Best moduli are $2^k$ and $(2^k - 1)$ :
  high storage efficiency with $k$ bits
  simple modular addition : $2^k$ : $k$-bit adder without $c_{out}$, $2^k - 1$ : $k$-bit adder with end-around carry ($c_{in} = c_{out}$)
Example : $(m_1, m_0) = (3, 2)$, $M = 6$

  A    :  -4  -3  -2  -1   0   1   2   3   4   5   6   7   8
  a_1  :   2   0   1   2   0   1   2   0   1   2   0   1   2
  a_0  :   0   1   0   1   0   1   0   1   0   1   0   1   0
  (one possible range : $A = 0, \ldots, 5$)

  $|5|_6 = (a_1, a_0) = (|5|_3, |5|_2) = (2, 1)$
  $|4 + 5|_6 = (1, 0) + (2, 1) = (|1 + 2|_3, |0 + 1|_2) = (0, 1) = |3|_6$
  $|4 \cdot 5|_6 = (1, 0) \cdot (2, 1) = (|1 \cdot 2|_3, |0 \cdot 1|_2) = (2, 0) = |2|_6$
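The example with the moduli (3, 2) can be replayed in Python. This is a minimal sketch, not part of the original notes; all function names are hypothetical, the back-conversion uses the Chinese remainder theorem, and `math.prod` and the three-argument `pow` with exponent -1 assume Python 3.8 or later.

```python
from math import prod

MODULI = (3, 2)  # (m_1, m_0) as in the example, M = 6


def to_rns(a, moduli=MODULI):
    """Residue digits a_i = A mod m_i."""
    return tuple(a % m for m in moduli)


def rns_add(x, y, moduli=MODULI):
    """Digit-wise modular addition, no carries between digits."""
    return tuple((xi + yi) % m for xi, yi, m in zip(x, y, moduli))


def rns_mul(x, y, moduli=MODULI):
    """Digit-wise modular multiplication."""
    return tuple((xi * yi) % m for xi, yi, m in zip(x, y, moduli))


def from_rns(residues, moduli=MODULI):
    """Back-conversion via the Chinese remainder theorem, result in [0, M-1]."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)  # modular inverse of Mi modulo m
    return total % M


sum_digits = rns_add(to_rns(4), to_rns(5))    # residues of |4 + 5|_6
prod_digits = rns_mul(to_rns(4), to_rns(5))   # residues of |4 * 5|_6
```

Addition and multiplication touch each digit independently, whereas the back-conversion needs all digits at once, which mirrors the trade-off listed above.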
3.5 Floating-Point Numbers
Larger range, smaller precision than fixed-point representation, inexact, real numbers [1, 2]
Double-number form ⇒ discontinuous precision
Format : sign $S$ | biased exponent $E$ | unsigned normalized mantissa $M$
Value : $(-1)^S \cdot 1.M \cdot 2^{E - \text{bias}}$
Basic arithmetic operations :
'
1)
!
%
"
'
1)
'
1)
"
! #
'
$
$
!
)
%
&
  based on fixed-point add, multiply, and shift operations
  postnormalization required (mantissa must stay normalized)
Applications :
  processors : real floating-point formats (e.g. IEEE standard), large range due to universal use
  ASICs : usually simplified floating-point formats with small exponents, smaller range, used for range extension of normal fixed-point numbers
IEEE floating-point format :

  format   total bits   mantissa bits   exponent bits   bias   range            precision
  single   32           23              8               127    ≈ 3.4 · 10^38    ≈ 10^-7
  double   64           52              11              1023   ≈ 1.8 · 10^308   ≈ 10^-15
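The single-precision field layout from the table can be inspected with Python's standard `struct` module. The sketch is illustrative only; the function names are hypothetical, and `value_from_fields` handles normalized numbers only (no zeros, denormals, infinities, or NaNs).

```python
import struct


def decode_single(x: float):
    """Split an IEEE single-precision number into (sign, exponent, mantissa)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
    mantissa = bits & 0x7FFFFF        # 23 bits, hidden leading 1 not stored
    return sign, exponent, mantissa


def value_from_fields(sign, exponent, mantissa):
    """(-1)^S * 1.M * 2^(E - bias) for normalized numbers."""
    return (-1) ** sign * (1 + mantissa / 2 ** 23) * 2.0 ** (exponent - 127)


fields = decode_single(-6.5)          # -6.5 = -1.625 * 2^2
restored = value_from_fields(*fields)
```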
3.6 Logarithmic Number System
Alternative representation to floating-point (i.e. mantissa + integer exponent → only fixed-point exponent) [1]
Single-number form ⇒ continuous precision ⇒ higher accuracy, more reliable
Format : sign $S$ | biased fixed-point exponent $E$
Value : $(-1)^S \cdot 2^{E - \text{bias}}$ (signed-logarithmic)
Basic arithmetic operations :
'
)
'
$
$
!
)
(additionally consider sign)
Addition/subtraction : by approximation, or by addition in a conventional number system with double conversion
'
1)
%
'
'
1)
'
(
)
0
'
1)
1
'
+ Simpler multiplication/exponentiation, more complex addition
− Expensive conversion : (anti)logarithms (table look-up)
Applications : real-time digital filters
3.7 Antitetrational Number System
Tetration (t.$(x) = 2^{2^{\cdot^{\cdot^{\cdot^{2}}}}}$, a tower of $x$ twos) and antitetration (a.t.$(x)$) [10]
Larger range, smaller precision than logarithmic representation, otherwise analogous (i.e. $2^x$ → t.$(x)$, $\log(x)$ → a.t.$(x)$)
3.8 Composite Arithmetic
Proposal for a new standard of number representations [10]
Scheme for storage and display of exact (primary: integer, secondary: rational) and inexact (primary: logarithmic, secondary: antitetrational) numbers
Secondary forms used for numbers not representable by primary ones (⇒ no over-/underflow handling necessary)
Choice of number representation hidden from user, i.e. software/compiler selects format for highest accuracy
Number representations :

  form              tag   value
  integer           00    2's complement integer
  rational          01    slash | denominator | numerator
  logarithmic       10    log integer | log fraction
  antitetrational   11    a.t. integer | a.t. fraction

Rational numbers : slash position (i.e. size of numerator/denominator) is variable and stored (floating slash)
Storage form sizes : 32-bit (short), 64-bit (normal), 128-bit (long), 256-bit (extended)
Implementation : mixed hardware/software solutions
Hardware proposal : long accumulator (4096 bits) holds any floating-point number in fixed-point format ⇒ higher accuracy
− Large hardware/software overhead
3.9 Round-Off Schemes
Intermediate results with $d$ additional lower bits (⇒ higher accuracy) : $A = (a_{n-1}, \ldots, a_0, a_{-1}, \ldots, a_{-d})$
Rounding : keeping error $\epsilon$ small during final word length reduction : $R = (r_{n-1}, \ldots, r_0) = A + \epsilon$
Trade-off : numerical accuracy vs. implementation cost
Truncation :
  $R = (a_{n-1}, \ldots, a_0)$
  bias $= -\frac{1}{2} + \frac{1}{2^{d+1}}$ (= average error $\epsilon$)
Round-to-nearest (i.e. normal rounding) :
  $R$ = truncation of $A + 0.1_2$
  bias $= \frac{1}{2^{d+1}}$ (nearly symmetric)
  $+ 0.1_2$ can often be included in previous operation
Round-to-nearest-even/-odd :
  as round-to-nearest, except in the tie case $(a_{-1}, \ldots, a_{-d}) = (1, 0, \ldots, 0)$, where round-to-nearest-even forces the LSB of the rounded result to 0 : $R = (r_{n-1}, \ldots, r_1, 0)$
  bias $= 0$ (symmetric)
  mandatory in IEEE floating-point standard
3 guard bits for rounding after floating-point operations : guard bit (postnormalization), round bit (round-to-nearest), sticky bit (round-to-nearest-even)
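The three schemes and their bias can be compared on integers that carry `d` fractional bits. The following Python sketch is an illustration under that assumption, not part of the original notes; the function names are hypothetical.

```python
def truncate(a: int, d: int) -> int:
    """Drop the d lower bits (rounds towards minus infinity)."""
    return a >> d


def round_nearest(a: int, d: int) -> int:
    """Add one half (0.1 binary) before truncation; ties round up."""
    return (a + (1 << (d - 1))) >> d


def round_nearest_even(a: int, d: int) -> int:
    """Round to nearest; a tie goes to the neighbour with LSB = 0."""
    q = a >> d
    rem = a & ((1 << d) - 1)
    half = 1 << (d - 1)
    if rem > half or (rem == half and (q & 1)):
        q += 1
    return q


def average_error(rounding, d: int, count: int) -> float:
    """Mean of (rounded - exact) over the inputs 0 .. count-1."""
    return sum(rounding(a, d) - a / 2 ** d for a in range(count)) / count


D = 3
bias_trunc = average_error(truncate, D, 64)
bias_nearest = average_error(round_nearest, D, 64)
bias_even = average_error(round_nearest_even, D, 64)
```

With `d = 3` the measured averages are -1/2 + 1/16, +1/16, and 0, in line with the bias values stated above.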
4 Addition
4.1 Overview
[Figure adders.epsi: overview of adder components and architectures, grouped into 1-bit adders (HA, FA, (m,k), (m,2)), carry-propagate adders (RCA, CSKA, CSLA, CIA, CLA, PPA, COSA), carry-save adders (CSA), and multi-operand adders (adder array, adder tree, array adder, tree adder); links mark "based on component" and "related component"]
Legend :
  HA : half-adder
  FA : full-adder
  (m,k) : (m,k)-counter
  (m,2) : (m,2)-compressor
  CPA : carry-propagate adder
  RCA : ripple-carry adder
  CSKA : carry-skip adder
  CSLA : carry-select adder
  CIA : carry-increment adder
  CLA : carry-lookahead adder
  PPA : parallel-prefix adder
  COSA : conditional-sum adder
  CSA : carry-save adder
4.2 1-Bit Adders, (m, k)-Counters
Add up $m$ bits of same magnitude (i.e. 1-bit numbers)
Output sum as $k$-bit number ($k = \lfloor \log m \rfloor + 1$)
or : count 1's at inputs ⇒ (m, k)-counter [3] (combinational counters)

Half-adder (HA), (2, 2)-counter
  $(c_{out}, s) = 2 c_{out} + s = a + b$
  $A = 3$, $T = 2$
  $s = a \oplus b$ (sum)
  $c_{out} = a \land b$ (carry-out)
  [Figures hasym.epsi, haschema1.epsi (reference), haschema2.epsi: half-adder symbol and gate-level schematics with inputs a, b and outputs c_out, s]
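Full-adder (FA), (3, 2)-counter
  $(c_{out}, s) = 2 c_{out} + s = a + b + c_{in}$
  $A = 7$, $T = 4$ ($2$ from $c_{in}$)
  $g = a \land b$ (generate), $p = a \oplus b$ (propagate)
  $c^0 = a \land b$, $c^1 = a \lor b$
  $s = a \oplus b \oplus c_{in} = p \oplus c_{in}$
  $c_{out} = a b \lor a c_{in} \lor b c_{in} = g \lor p \, c_{in} = \overline{c}_{in} c^0 \lor c_{in} c^1$
  [Figures fasymbol.epsi, faschematic1.epsi (reference) to faschematic5.epsi: full-adder symbol and alternative schematics with inputs a, b, c_in and outputs c_out, s, built from two half-adders, from generate/propagate signals g and p, and from multiplexers selecting between c^0 and c^1]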
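The half-adder and the full-adder built from two half-adders can be modelled bit by bit. This is an illustrative Python sketch, not part of the original notes; it follows the generate/propagate form of the equations above, and the function names are hypothetical.

```python
from itertools import product


def half_adder(a: int, b: int):
    """(2, 2)-counter: returns (c_out, s) with 2*c_out + s = a + b."""
    return a & b, a ^ b


def full_adder(a: int, b: int, c_in: int):
    """(3, 2)-counter from two half-adders and one OR gate."""
    g, p = half_adder(a, b)       # g = a AND b (generate), p = a XOR b (propagate)
    t, s = half_adder(p, c_in)    # t = p AND c_in, s = p XOR c_in
    return g | t, s               # c_out = g OR (p AND c_in)


truth_table = {
    (a, b, c): full_adder(a, b, c) for a, b, c in product((0, 1), repeat=3)
}
```

Counting unit gates in this structure gives two XORs (2 each), two ANDs, and one OR, i.e. the area of 7 stated above.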
(m, k)-counters
  $(s_{k-1}, \ldots, s_0) = \sum_{j=0}^{k-1} s_j 2^j = \sum_{i=0}^{m-1} a_i$
  [Figure cntsymbol.epsi: (m,k)-counter symbol with inputs a_{m-1} ... a_0 and outputs s_{k-1} ... s_0]
  Usually built from full-adders
  Associativity of addition allows conversion from linear to tree structure ⇒ faster at same number of FAs
7 log &
1
28
7'
log )
4 2
log
4
log3 !
2
log
Example : (7, 3)-counter
  linear structure : $A = 28$, $T = 14$
  [Figure count73ser.epsi: four full-adders in a linear arrangement, inputs a_0 ... a_6, outputs s_2, s_1, s_0]
  tree structure : $A = 28$, $T = 10$
  [Figure count73par.epsi: four full-adders in a tree arrangement, inputs a_0 ... a_6, outputs s_2, s_1, s_0]
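A (7, 3)-counter from four full-adders can be sketched as follows. The exact wiring of the figure is not recoverable from the text, so the arrangement below is one possible tree structure chosen for illustration; the function names are hypothetical.

```python
from itertools import product


def full_adder(a: int, b: int, c_in: int):
    """Returns (c_out, s) with 2*c_out + s = a + b + c_in."""
    return (a & b) | ((a ^ b) & c_in), a ^ b ^ c_in


def counter_7_3(a):
    """Count the ones among 7 input bits using 4 full-adders in a tree."""
    c1, s1 = full_adder(a[0], a[1], a[2])   # first level, weight-1 bits
    c2, s2 = full_adder(a[3], a[4], a[5])   # first level, in parallel
    c3, s0 = full_adder(s1, s2, a[6])       # remaining weight-1 bits
    s2_out, s1_out = full_adder(c1, c2, c3)  # all weight-2 carries
    return s2_out, s1_out, s0


results = {bits: counter_7_3(bits) for bits in product((0, 1), repeat=7)}
```

The late carry `c3` enters the last full-adder on its fast carry input, so the critical path under the unit-gate model is 4 + 4 + 2 = 10, which agrees with the tree-structure delay given above.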
4.3 Carry-Propagate Adders (CPA)
Add two $n$-bit operands $A$ and $B$ and an optional carry-in $c_{in}$ by performing carry-propagation [1, 2, 11]
Sum $(c_{out}, S)$ is irredundant $(n+1)$-bit number
  $(c_{out}, S) = c_{out} 2^n + S = A + B + c_{in}$
  $2 c_{i+1} + s_i = a_i + b_i + c_i$ ; $i = 0, 1, \ldots, n-1$ ; $c_0 = c_{in}$, $c_{out} = c_n$ (r.m.a.)
  [Figure cpasymbol.epsi: CPA symbol with inputs A, B, c_in and outputs S, c_out]

Ripple-carry adder (RCA)
  Serial arrangement of $n$ full-adders
  Simplest, smallest, and slowest CPA structure
  $A = 7n$, $T = 2n$, $AT = 14 n^2$
  [Figure rca.epsi: chain of full-adders for bits 0 ... n-1, linked by the carries c_1, c_2, ..., c_{n-1} between c_in and c_out]
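The carry recursion of the ripple-carry adder translates directly into a loop over the bit positions. The Python sketch below is illustrative and not part of the original notes; bit lists are LSB first and all names are hypothetical.

```python
def to_bits(x: int, n: int):
    """n-bit list, LSB first."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(b << i for i, b in enumerate(bits))


def ripple_carry_add(a_bits, b_bits, c_in: int = 0):
    """Serial chain of full-adders; returns (c_out, sum bits)."""
    carry = c_in
    s_bits = []
    for a, b in zip(a_bits, b_bits):
        s_bits.append(a ^ b ^ carry)             # s_i = a_i XOR b_i XOR c_i
        carry = (a & b) | ((a ^ b) & carry)      # c_(i+1) = g_i OR p_i c_i
    return carry, s_bits


c_out, s = ripple_carry_add(to_bits(11, 4), to_bits(6, 4), 1)   # 11 + 6 + 1 = 18
```

Each loop iteration depends on the carry of the previous one, which is the source of the linear delay stated above.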
Carry-propagation speed-up techniques
a) Concatenation of partial CPAs with fast $c_{in} \to c_{out}$
   [Figure speedup1.epsi: chain of partial CPAs for the bit groups k-1:0, ..., i-1:k, ..., n-1:j, linked by the group carries c_k, c_i, c_j]
b) Fast carry look-ahead logic for entire range of bits
   [Figure speedup2.epsi: preprocessing, carry propagation, and postprocessing stages spanning all bit positions 0 ... n-1]
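Carry-skip adder (CSKA)
Type a) : partial CPA with fast $c_k \to c_i$
  $c_i = \overline{P}_{i-1:k} \, c'_i \lor P_{i-1:k} \, c_k$ (bit group $(a_{i-1}, \ldots, a_k)$)
  $P_{i-1:k} = p_{i-1} \, p_{i-2} \cdots p_k$ (group propagate)
  1) $P_{i-1:k} = 0$ : carry-out $c'_i$ of the partial CPA does not depend on $c_k$ and is selected ($c'_i \to c_i$)
  2) $P_{i-1:k} = 1$ : $c_k$ propagates through the partial CPA, but this path is skipped ($c_k \to c_i$)
  ⇒ path $c_k \to c'_i \to c_i$ never sensitized ⇒ fast $c_k \to c_i$
  ⇒ false path ⇒ inherent logic redundancy ⇒ problems in circuit optimization, timing analysis, and testing
Variable group sizes (faster) : larger groups in the middle (minimizes the delays of the longest carry paths)
Partial CPA typ. is RCA or CSKA (⇒ multilevel CSKA)
Medium speed-up at small hardware overhead (+ AND/bit + MUX/group)
  $A = 8n$, $T = 4 n^{1/2}$, $AT = 32 n^{3/2}$
  [Figure cska.epsi: partial CPAs for the bit groups k-1:0, ..., i-1:k, ..., n-1:j; the carry-out of each group passes a multiplexer controlled by the group propagate P_{i-1:k}, which selects between c'_i and the skipped carry c_k]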
Carry-select adder (CSLA)
Type a) : partial CPA with fast $c_k \to c_i$ and $c_k \to s_{i-1:k}$
  $s_{i-1:k} = \overline{c}_k \, s^0_{i-1:k} \lor c_k \, s^1_{i-1:k}$
  $c_i = \overline{c}_k \, c^0_i \lor c_k \, c^1_i$
Two CPAs compute two possible results ($c_{in} = 0/1$), group carry-in selects correct one afterwards
Variable group sizes (faster) : larger groups at end (MSB) (balance delays $a_0 \to c_k$ and $a_k \to c^0_i$)
Partial CPA typ. is RCA, CSLA (⇒ multilevel CSLA), or CLA
High speed-up at high hardware overhead (+ MUX/bit + (CPA + MUX)/group)
  $A = 14n$, $T = 2.8 n^{1/2}$, $AT = 39 n^{3/2}$
  [Figure csla.epsi: for each bit group i-1:k two partial CPAs with carry-in 0 and 1 compute (c^0_i, s^0_{i-1:k}) and (c^1_i, s^1_{i-1:k}); multiplexers controlled by c_k select s_{i-1:k} and c_i]
Carry-increment adder (CIA)
Type a) : partial CPA with fast $c_k \to c_i$ and $c_k \to s_{i-1:k}$
  $s_{i-1:k} = s'_{i-1:k} + c_k$
  $c_i = c'_i \lor P_{i-1:k} \, c_k$
  $P_{i-1:k} = p_{i-1} \, p_{i-2} \cdots p_k$ (group propagate)
Result is incremented after addition, if $c_k = 1$ [12, 11]
Variable group sizes (faster) : larger groups at end (MSB) (balance delays $a_0 \to c_k$ and $a_k \to c'_i$)
Partial CPA typ. is RCA, CIA (⇒ multilevel CIA), or CLA
High speed-up at medium hardware overhead (+ AND/bit + (incrementer + AND-OR)/group)
Logic of CPA and incrementer can be merged [11]
  $A = 10n$, $T = 2.8 n^{1/2}$, $AT = 28 n^{3/2}$
  [Figure cia.epsi: for each bit group i-1:k one partial CPA with carry-in 0 computes (c'_i, s'_{i-1:k}); an incrementer (+1) controlled by c_k produces s_{i-1:k}, and c_i is formed from c'_i, P_{i-1:k}, and c_k]
Example : gate-level schematic of carry-increment adder (CIA)
  only 2 different logic cells (bit-slices) : IHA and IFA

  T               :  4   6  10  12  14  16  18  20  22  24  26  28  ...   38
  max. group size :          2   3   4   5   6   7   8   9  10  11  ...   16
  n               :  1   2   4   7  11  16  22  29  37  46  56  67  ...  137

  [Figure ciagate.epsi: gate-level schematic of the carry-increment adder built from IHA and IFA bit-slices; bit 0 : IHA, bit 1 : IHA, bits 3..2 : IFA + IHA, bits 6..4 : 2 IFA + IHA, bits i-1..k : (i-k-1) IFA + IHA]
Conditional-sum adder (COSA)
Type a) : optimized multilevel CSLA with $\lceil \log n \rceil$ levels (i.e. double CPAs are merged at higher levels)
Correct sum bits ($s^0_{i-1:k}$ or $s^1_{i-1:k}$) are (conditionally) selected through $\lceil \log n \rceil$ levels of multiplexers
Bit groups of size $2^l$ at level $l$
Higher parallelism, more balanced signal paths
Highest speed-up at highest hardware overhead (2 RCA + more than $\lceil \log n \rceil$ MUX/bit)
  $A = 3 n \log n$, $T = 2 \log n$, $AT = 6 n \log^2 n$
  [Figure cosa.epsi: 4-bit conditional-sum adder; two full-adders per bit (carry-in 0 and 1) at level 0, followed by multiplexer levels 1 and 2 selecting the sum bits s_0 ... s_3 and c_out]
Carry-lookahead adder (CLA), traditional
Type b) : carries looked ahead before sum bits computed
Typically 4-bit blocks used (e.g. standard IC SN74181)
  $c_1 = g_0 \lor p_0 c_0$
  $c_2 = g_1 \lor p_1 g_0 \lor p_1 p_0 c_0$
  $c_3 = g_2 \lor p_2 g_1 \lor p_2 p_1 g_0 \lor p_2 p_1 p_0 c_0$
  $G_3 = g_3 \lor p_3 g_2 \lor p_3 p_2 g_1 \lor p_3 p_2 p_1 g_0$
  $P_3 = p_3 p_2 p_1 p_0$
  [Figure clbsymbol.epsi: carry-lookahead block (CLB) with inputs (g_0, p_0) ... (g_3, p_3) and c_0, and outputs c_1 ... c_3 and the block signals (G_3, P_3)]
Hierarchical arrangement using $\lceil \frac{1}{2} \log n \rceil$ levels : $(G_3, P_3)$ passed up, $c_0$ passed down between levels
High speed-up at medium hardware overhead
  $A = 14n$, $T = 4 \log n$, $AT = 56 n \log n$
  [Figure cla.epsi: 16-bit CLA with four first-level CLBs for bits 3..0, 7..4, 11..8, 15..12 and one second-level CLB that receives their block signals and returns the carries c_4, c_8, c_12]
  + preprocessing : $g_i = a_i b_i$, $p_i = a_i \oplus b_i$
  + postprocessing : $s_i = p_i \oplus c_i$
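The lookahead equations of a 4-bit block can be checked against ordinary integer addition. This Python sketch is an illustration, not part of the original notes; `cla_block` and `cla_add4` are hypothetical names, and the block signals are called `G` and `P` here.

```python
def cla_block(g, p, c0):
    """4-bit carry-lookahead block; g and p are lists of 4 bits, LSB first."""
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    return c1, c2, c3, G, P


def cla_add4(a: int, b: int, c_in: int = 0):
    """4-bit addition: preprocessing, lookahead block, postprocessing."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]       # g_i = a_i AND b_i
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]     # p_i = a_i XOR b_i
    c1, c2, c3, G, P = cla_block(g, p, c_in)
    carries = [c_in, c1, c2, c3]
    s = sum((p[i] ^ carries[i]) << i for i in range(4))   # s_i = p_i XOR c_i
    c_out = G | (P & c_in)                                # carry out of the block
    return c_out, s


example = cla_add4(9, 7, 1)   # 9 + 7 + 1 = 17
```

All carries are computed from the `g`, `p`, and `c_in` inputs without waiting for a neighbouring carry, in contrast to the ripple-carry chain.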
Parallel-prefix adders (PPA)
Type b) : universal adder architecture comprising RCA, CIA, CLA, and more (i.e. entire range of area-delay trade-offs from slowest RCA to fastest CLA)
Preprocessing, carry-lookahead, and postprocessing step
Carries calculated using parallel-prefix algorithms
+ High regularity : suitable for synthesis and layout
+ High flexibility : special adders, other arithmetic operations, exchangeable prefix algorithms (i.e. speeds)
+ High performance : smallest and fastest adders
  $A = 5n + 3 A_\bullet$, $T = 4 + 2 T_\bullet$
  [Figure add.epsi: parallel-prefix adder structure with a preprocessing stage producing $(g_i, p_i)$ from $a_i$, $b_i$, a carry-lookahead stage (prefix algorithm) producing the carries $c_0 \ldots c_n$, and a postprocessing stage producing the sum bits $s_i$]
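Prefix problem
Inputs $(x_{n-1}, \ldots, x_0)$, outputs $(y_{n-1}, \ldots, y_0)$, associative binary operator $\bullet$ [11, 13]
  $(y_{n-1}, \ldots, y_0) = (x_{n-1} \bullet \cdots \bullet x_0, \; \ldots, \; x_1 \bullet x_0, \; x_0)$
  or $y_0 = x_0$, $y_i = x_i \bullet y_{i-1}$ ; $i = 1, \ldots, n-1$ (r.m.a.)
Associativity of $\bullet$ ⇒ tree structures for evaluation :
  $\underbrace{x_3 \bullet \underbrace{(x_2 \bullet \underbrace{(x_1 \bullet x_0)}_{y_1 = Y^1_{1:0}})}_{y_2 = Y^2_{2:0}}}_{y_3 = Y^3_{3:0}} = \underbrace{\underbrace{(x_3 \bullet x_2)}_{Y^1_{3:2}} \bullet \underbrace{(x_1 \bullet x_0)}_{y_1 = Y^1_{1:0}}}_{y_3 = Y^2_{3:0}}$, but $y_2$ ?
Group variables $Y^l_{i:k}$ : covers bits $(x_k, \ldots, x_i)$ at level $l$
Carry-propagation is prefix problem : $Y^l_{i:k} = (G^l_{i:k}, P^l_{i:k})$
  $(G^0_{i:i}, P^0_{i:i}) = (g_i, p_i)$
  $(G^l_{i:k}, P^l_{i:k}) = (G^{l-1}_{i:j+1}, P^{l-1}_{i:j+1}) \bullet (G^{l-1}_{j:k}, P^{l-1}_{j:k})$ ; $k \le j \le i$
  $\phantom{(G^l_{i:k}, P^l_{i:k})} = (G^{l-1}_{i:j+1} \lor P^{l-1}_{i:j+1} G^{l-1}_{j:k}, \; P^{l-1}_{i:j+1} P^{l-1}_{j:k})$
  $c_{i+1} = G^m_{i:0}$ ; $i = 0, \ldots, n-1$ ; $l = 1, \ldots, m$
Parallel-prefix algorithms [11] :
  multi-tree structures ($T = O(n) \to O(\log n)$)
  sharing subtrees ($A = O(n^2) \to O(n \log n)$)
  different algorithms trading area vs. delay (influences also from wiring and maximum fan-out $FO_{max}$)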
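The prefix operator on generate/propagate pairs and its use for addition can be sketched in Python. The example below evaluates the prefix problem serially (the form $y_i = x_i \bullet y_{i-1}$) and is an illustration only; the names `prefix_op` and `prefix_add` are hypothetical, and folding the carry-in into the generate signal of bit 0 is an assumption made here to obtain $c_{i+1} = G_{i:0}$.

```python
from itertools import product


def prefix_op(hi, lo):
    """(G, P) of an upper group combined with (G, P) of the adjacent lower group."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def prefix_add(a: int, b: int, n: int, c_in: int = 0):
    """n-bit addition with carries taken from a serial prefix computation."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    # assumption: carry-in is merged into the generate signal of bit 0
    y = [(g[0] | (p[0] & c_in), p[0])]
    for i in range(1, n):
        y.append(prefix_op((g[i], p[i]), y[i - 1]))      # y_i = x_i . y_(i-1)
    carries = [c_in] + [group_g for group_g, _ in y]     # c_(i+1) = G_(i:0)
    s = sum((p[i] ^ carries[i]) << i for i in range(n))
    return carries[n], s


pairs = list(product((0, 1), repeat=2))
is_associative = all(
    prefix_op(prefix_op(x, y), z) == prefix_op(x, prefix_op(y, z))
    for x in pairs for y in pairs for z in pairs
)
```

Because `prefix_op` is associative, the serial loop can be replaced by any of the tree-shaped evaluation orders without changing the carries.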
Prefix algorithms
Algorithms visualized by directed acyclic graphs (DAG) with array structure ($n$ bits × $m$ levels)
Graph vertex symbols :
  black node • : $(G^l_{i:k}, P^l_{i:k}) = (G^{l-1}_{i:j+1}, P^{l-1}_{i:j+1}) \bullet (G^{l-1}_{j:k}, P^{l-1}_{j:k})$ (contains logic for $\bullet$)
  white node ◦ : $(G^l_{i:k}, P^l_{i:k}) = (G^{l-1}_{i:k}, P^{l-1}_{i:k})$ (contains no logic)
Performance measures :
  $A_\bullet$ : graph size (number of black nodes)
  $T_\bullet$ : graph depth (number of black nodes on critical path)

Serial-prefix algorithm (≡ RCA)
  $A_\bullet = n - 1$, $T_\bullet = n - 1$, $FO_{max} = 2$
  [Figure ser.epsi: 16-bit serial-prefix graph, bit columns 15 ... 0, levels 0 ... 15]

Sklansky parallel-prefix algorithm (≡ PPA-SK)
  Tree-like collection, parallel redistribution of carries
  $A_\bullet = \frac{1}{2} n \log n$, $T_\bullet = \lceil \log n \rceil$, $FO_{max} = \frac{1}{2} n$
  [Figure sk.epsi: 16-bit Sklansky prefix graph, bit columns 15 ... 0, levels 0 ... 4]

Brent-Kung parallel-prefix algorithm (≡ PPA-BK)
  Traditional CLA is PPA-BK with 4-bit groups
  Tree-like redistribution of carries (fan-out tree)
  $A_\bullet = 2n - \lceil \log n \rceil - 2$, $T_\bullet = 2 \lceil \log n \rceil - 2$, $FO_{max} = \log n$
  [Figure bk.epsi: 16-bit Brent-Kung prefix graph, bit columns 15 ... 0, levels 0 ... 6]

Kogge-Stone parallel-prefix algorithm (≡ PPA-KS)
  very high wiring requirements
  $A_\bullet = n \log n - n + 1$, $T_\bullet = \lceil \log n \rceil$, $FO_{max} = 2$
  [Figure ks.epsi: 16-bit Kogge-Stone prefix graph, bit columns 15 ... 0, levels 0 ... 4]

Carry-increment parallel-prefix algorithm (≡ CIA)
  $A_\bullet = 2n - 1.4 n^{1/2}$, $T_\bullet = 1.4 n^{1/2}$, $FO_{max} = 1.4 n^{1/2}$
  [Figure cia.epsi: 16-bit carry-increment prefix graph, bit columns 15 ... 0, levels 0 ... 5]

Mixed serial/parallel-prefix algorithm (≡ RCA + PPA)
  linear size-depth trade-off using parameter $k$ : $0 \le k \le n - 2 \lceil \log n \rceil + 2$
    $k = 0$ : serial-prefix graph
    $k = n - 2 \lceil \log n \rceil + 1$ : Brent-Kung parallel-prefix graph
  fills gap between RCA and PPA-BK (i.e. CLA) in steps of single $\bullet$-operations
  $A_\bullet = n - 1 + k$, $T_\bullet = n - 1 - k$, $FO_{max}$ = var.
  [Figure var.epsi: 16-bit mixed serial/parallel-prefix graph, bit columns 15 ... 0, levels 0 ... 10]
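The size and depth figures of the serial, Sklansky, Brent-Kung, and Kogge-Stone graphs can be reproduced by generating the black nodes and analysing the resulting graph. The Python sketch below is an illustration, not the synthesis procedure of the notes; it assumes that `n` is a power of two, represents each black node as a pair `(i, j)` meaning "column `i` is combined with column `j` of the previous level", and all function names are hypothetical.

```python
def serial(n):
    return [[(i, i - 1)] for i in range(1, n)]


def sklansky(n):
    k = n.bit_length() - 1
    return [[(i, ((i >> l) << l) - 1) for i in range(n) if (i >> l) & 1]
            for l in range(k)]


def brent_kung(n):
    k = n.bit_length() - 1
    up = [[(i, i - 2 ** l) for i in range(2 ** (l + 1) - 1, n, 2 ** (l + 1))]
          for l in range(k)]                      # tree-like collection
    down = [[(i, i - 2 ** l) for i in range(3 * 2 ** l - 1, n, 2 ** (l + 1))]
            for l in range(k - 2, -1, -1)]        # tree-like redistribution
    return up + down


def kogge_stone(n):
    k = n.bit_length() - 1
    return [[(i, i - 2 ** l) for i in range(2 ** l, n)] for l in range(k)]


def analyse(levels, n):
    """Check that all prefixes are computed; return (size, depth) in black nodes."""
    span = [(i, i) for i in range(n)]   # column i currently covers bits hi..lo
    depth = [0] * n
    size = 0
    for level in levels:
        old_span, old_depth = span[:], depth[:]
        for i, j in level:
            assert old_span[i][1] == old_span[j][0] + 1, "groups must be adjacent"
            span[i] = (i, old_span[j][1])
            depth[i] = max(old_depth[i], old_depth[j]) + 1
            size += 1
    assert all(span[i] == (i, 0) for i in range(n)), "not all prefixes computed"
    return size, max(depth)


N = 16
report = {name: analyse(make(N), N) for name, make in
          [("serial", serial), ("SK", sklansky), ("BK", brent_kung), ("KS", kogge_stone)]}
```

For `N = 16` the report gives (15, 15), (32, 4), (26, 6), and (49, 4), which matches the size and depth expressions listed above. The depth is measured along the longest dependency path, so Brent-Kung nodes that are drawn on separate levels but can operate in parallel are not counted twice.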
Example : 4-bit parallel-prefix adder (PPA-SK)
  efficient AND-OR-prefix circuit for the generate and AND-prefix circuit for the propagate signals
  optimization : alternatingly AOI-/OAI- resp. NAND-/NOR-gates (inverting gates are smaller and faster)
  can also be realized using two MUX-prefix circuits
  [Figure askgate.epsi: gate-level schematic of the 4-bit Sklansky parallel-prefix adder with inputs a_3 ... a_0, b_3 ... b_0, c_in and outputs s_3 ... s_0, c_out, and the group propagate P_{n-1:0}]
Prefix adder synthesis
Local prefix graph transformation :
  [Figures unfact.epsi, fact.epsi: two equivalent 4-bit prefix graphs, one with $A_\bullet = 3$, $T_\bullet = 3$ and one with $A_\bullet = 4$, $T_\bullet = 2$; the depth-decreasing transform leads from the former to the latter, the size-decreasing transform back]
Repeated (local) prefix transformations result in overall minimization of graph depth or size ⇒ which sequence ?
Goal : minimal size (area) at given depth (delay)
Simple algorithm for sequence of applied transforms :
  Step 1 : prefix graph compression (depth minimization) : depth-decreasing transforms in right-to-left bottom-up order
  Step 2 : prefix graph expansion (size minimization) : size-decreasing transforms in left-to-right top-down order, if allowed depth not exceeded
Prefix adder synthesis : 1) generate serial-prefix graph, 2) graph compression, 3) depth-controlled graph expansion, 4) generate pre-/postprocessing and prefix logic
+ Generates all previous prefix graphs (except PPA-KS)
+ Universal adder synthesis algorithm : generates area-optimal adders for any given timing constraints [11] (including non-uniform signal arrival times)
Multilevel adders
  Multilevel versions of adders of type a) possible (CSKA, CSLA, and CIA; notation: 2-level CIA = CIA-2L)
  + Delay is $O(n^{1/(m+1)})$ for $m$ levels
  − Area increase small for CSKA and CIA, high for CSLA (⇒ COSA)
  − Difficult computation of optimal group sizes

Hybrid adders
  Arbitrary combinations of speed-up techniques possible ⇒ hybrid/mixed adder architectures
  Often used combinations : CLA and CSLA [14]
  Pure architectures usually perform best (at gate-level)

Transistor-level adders
  Influence of logic styles (e.g. dynamic logic, pass-transistor logic ⇒ faster)
  + Efficient transistor-level implementation of ripple-carry chains (Manchester chain) [14]
  + Combinations of speed-up techniques make sense
  − Much higher design effort
  Many efficient implementations exist and have been published
Self-timed adders
Average carry-propagation length: $\log_2 n$
+ RCA is fast in average case ($O(\log n)$), slow in worst case ⇒ suitable for self-timed asynchronous designs [15]
- Completion detection is not trivial
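The average-case claim can be checked empirically. The following is a minimal sketch, not taken from the notes: the helper names `longest_carry_chain` and `average_chain` are hypothetical, and "chain length" is assumed to mean the number of bit positions one carry travels, counting the generating position.

```python
import random

def longest_carry_chain(a, b, n):
    """Longest run of bit positions that a single carry travels through in a + b."""
    longest = run = carry = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        generate, propagate = ai & bi, ai ^ bi
        if generate:
            run = 1            # a new carry starts at this position
        elif propagate and carry:
            run += 1           # the incoming carry ripples one position further
        else:
            run = 0            # chain ends here, or no carry is active
        carry = generate | (propagate & carry)
        longest = max(longest, run)
    return longest

def average_chain(n, trials=2000, seed=1):
    """Mean longest carry chain over random n-bit operand pairs."""
    rng = random.Random(seed)
    total = sum(longest_carry_chain(rng.getrandbits(n), rng.getrandbits(n), n)
                for _ in range(trials))
    return total / trials
```

For random 64-bit operands the average stays close to $\log_2 64 = 6$, while the operand pair $(2^n - 1, 1)$ forces the worst case of $n$ positions.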
Adder performance comparisons
Standard-cell implementations, 0.8 µm process
[Figure: area [lambda^2] versus delay [ns] on logarithmic axes for RCA, CSKA-2L, CIA-1L, CIA-2L, PPA-SK, PPA-BK, CLA, and COSA at word lengths of 8, 16, 32, 64, and 128 bits, with a line of constant AT]
Complexity comparison under the unit-gate model
adder      A                        T                 AT                      opt. (1)   syn. (2)
RCA        $7n$                     $2n$              $14n^2$                 aaa        ✓
CSKA-1L    $8n$                     $4n^{1/2}$        $32n^{3/2}$             aat        (3)
CSKA-2L    $8n$                     $x\,n^{1/3}$ (4)  $x\,n^{4/3}$ (4)        -
CSLA-1L    $14n$                    $2.8n^{1/2}$      $39n^{3/2}$             -
CIA-1L     $10n$                    $2.8n^{1/2}$      $28n^{3/2}$             att        ✓
CIA-2L     $10n$                    $3.6n^{1/3}$      $36n^{4/3}$             att        ✓
CIA-3L     $10n$                    $4.4n^{1/4}$      $44n^{5/4}$             -          ✓
PPA-SK     $\frac{3}{2} n \log n$   $2 \log n$        $3n \log^2 n$           ttt        ✓
PPA-BK     $10n$                    $4 \log n$        $40n \log n$            att        ✓
PPA-KS     $3n \log n$              $2 \log n$        $6n \log^2 n$           -
CLA (5)    $14n$                    $4 \log n$        $56n \log n$            -          (✓)
COSA       $3n \log n$              $2 \log n$        $6n \log^2 n$           -
(1) optimality regarding area and delay: aaa: smallest area, longest delay; aat: small area, medium delay; att: medium area, short delay; ttt: large area, shortest delay; -: not optimal
(2) obtained from prefix adder synthesis
(3) automatic logic optimization not possible (redundancy)
(4) exact factors not calculated
(5) corresponds to 4-bit PPA-BK
4.4 Carry-Save Adder (CSA)
a) Adds three $n$-bit operands $A_0$, $A_1$, $A_2$ performing no carry-propagation (i.e. carries are saved) [1]
$(C, S) = A_0 + A_1 + A_2$
$2 c_{i+1} + s_i = a_{0,i} + a_{1,i} + a_{2,i}$ ; $i = 0, 1, \ldots, n-1$ (n.)
[Figure: CSA symbol with inputs A0, A1, A2 and outputs C, S]
b) Adds one $n$-bit operand to an $n$-digit carry-save operand
$(C, S) = (C, S) + A$
Result is in redundant carry-save format ($n$ digits), represented by two $n$-bit numbers $S$ (sum bits) and $C$ (carry bits)
+ Parallel arrangement of $n$ full-adders, constant delay
$A = 7n$, $T = 4$
[Figure: CSA as a row of $n$ independent full-adders; slice $i$ takes $a_{0,i}$, $a_{1,i}$, $a_{2,i}$ and produces $s_i$ and $c_{i+1}$]
Multi-operand carry-save adders ($m > 3$) ⇒ adder array (linear arrangement), adder tree (tree arr.)
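The defining relation of a carry-save adder, $2c_{i+1} + s_i = a_{0,i} + a_{1,i} + a_{2,i}$, can be written down directly with bitwise operations. This is a minimal word-level sketch; the function name `carry_save_add` is an assumption for illustration.

```python
def carry_save_add(a0, a1, a2, n):
    """Reduce three n-bit operands to a carry word C and a sum word S, with C + S = a0 + a1 + a2."""
    mask = (1 << n) - 1
    a0, a1, a2 = a0 & mask, a1 & mask, a2 & mask
    s = a0 ^ a1 ^ a2                              # sum bit of each full-adder
    c = ((a0 & a1) | (a0 & a2) | (a1 & a2)) << 1  # carry bit, weighted one position higher
    return c, s
```

Every bit position is computed independently, which is why the delay does not depend on $n$.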
4.5 Multi-Operand Adders
Add three or more ($m > 2$) $n$-bit operands, yield $(n + \lceil \log_2 m \rceil)$-bit result in irredundant number rep. [1, 2]
Array adders
Realization by array adders (see figures below):
a) linear arrangement of CPAs
b) linear arr. of CSAs (adder array) and final CPA
a) and b) differ in bit arrival times at final CPA:
if CPA = RCA: a) and b) have same overall delay
if fast final CPA: uniform bit arrival times required ⇒ CSA array (b)
Fast implementation: CSA array + fast final CPA (note: array of fast CPAs not efficient/necessary)
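$A = (m-2)\,A_{CSA} + A_{CPA}$, $T = (m-2)\,T_{CSA} + T_{CPA}$
CPA = RCA: $A = O(mn)$, $T = O(m + n)$
Fast CPA: $A = O(mn + n \log n)$, $T = O(m + \log n)$
[Figure: linear array of CSAs reducing operands A0, A1, A2, A3, ..., Am-1, followed by a final CPA producing S]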
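The array arrangement b) can be sketched at word level as follows. The helper names `csa` and `add_operands` are hypothetical, and Python's built-in `+` stands in for the final CPA.

```python
def csa(x, y, z):
    """One carry-save step: returns (carry word, sum word)."""
    return ((x & y) | (x & z) | (y & z)) << 1, x ^ y ^ z

def add_operands(operands):
    """Sum m >= 3 non-negative integers with m - 2 CSA steps and one final CPA."""
    c, s = csa(operands[0], operands[1], operands[2])
    csa_steps = 1
    for a in operands[3:]:
        c, s = csa(c, s, a)    # fold the next operand into the carry-save pair
        csa_steps += 1
    return c + s, csa_steps    # "+" plays the role of the final CPA
```

Carry propagation happens exactly once, in the last line, regardless of the number of operands.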
a) 4-operand CPA (RCA) array:
[Figure: three cascaded ripple-carry CPAs built from FA and HA cells, adding the bits $a_{0,i}$, $a_{1,i}$, $a_{2,i}$, $a_{3,i}$ of each column]
b) 4-operand CSA array with final CPA (RCA):
[Figure: two CSA rows built from FA and HA cells, followed by a final ripple-carry CPA, for the same four operands]
(m, 2)-compressors
$2\left(c + \sum_{l=0}^{m-4} c_{out}^{l}\right) + s = \sum_{i=0}^{m-1} a_i + \sum_{l=0}^{m-4} c_{in}^{l}$
[Figure: (m,2)-compressor symbol with inputs $a_0 \ldots a_{m-1}$ and $c_{in}^{0} \ldots c_{in}^{m-4}$, outputs $c$, $s$, and $c_{out}^{0} \ldots c_{out}^{m-4}$]
1-bit adders (similar to (m,k)-counters) [16]
Compresses $m$ bits down to 2 by forwarding $(m-3)$ intermediate carries to next higher bit position
Is bit-slice of multi-operand CSA array (see above)
+ No horizontal carry-propagation (i.e. $c_{in}^{k}$ only influences $c_{out}^{l}$ with $l > k$)
Built from full-adders (= (3,2)-compressor) or (4, 2)-compressors arranged in linear or tree structures
Example: 4-operand adder using (4, 2)-compressors
[Figure: row of (4,2)-compressors forming the CSA stage, one per bit position with inputs $a_{0,i} \ldots a_{3,i}$, followed by a CPA of FA and HA cells producing $s_{n+1} \ldots s_0$]
$A = 7(m-2)$, $T = 4(m-2)$ (linear structure), $T = 6(\lceil \log_2 m \rceil - 1)$ (tree structure)
Optimized (4, 2)-compressor:
2 full-adders merged and optimized (i.e. XORs arranged in tree structure)
with full-adders: $A = 14$, $T = 8$
[Figure: (4,2)-compressor from two cascaded full-adders with inputs a0, a1, a2, a3, c_in and outputs c_out, c, s]
optimized: $A = 14$, $T = 6$
[Figure: optimized (4,2)-compressor with XOR tree and two multiplexers, same inputs and outputs]
+ same area, 25% shorter delay
SD-FA (signed-digit full-adder) is similar to (4, 2)-compressor regarding structure and complexity
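A bit-level sketch of the unoptimized (4, 2)-compressor built from two full-adders is given below. The function names are assumptions; the point is to show the arithmetic identity $2(c + c_{out}) + s = a_0 + a_1 + a_2 + a_3 + c_{in}$ and that $c_{out}$ does not depend on $c_{in}$, so no carry ripples horizontally.

```python
def full_adder(x, y, z):
    """Returns (carry, sum) of three input bits."""
    return (x & y) | (x & z) | (y & z), x ^ y ^ z

def compressor_4_2(a0, a1, a2, a3, c_in):
    """(4,2)-compressor from two cascaded full-adders; returns (c_out, c, s)."""
    c_out, t = full_adder(a0, a1, a2)   # first FA: does not see c_in
    c, s = full_adder(t, a3, c_in)      # second FA absorbs c_in from the neighbour slice
    return c_out, c, s
```

The optimized variant described above computes the same function with a shorter critical path; this sketch models only the logical behaviour, not the delay.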
Advantages of (4, 2)-compressors over FAs for realizing (m, 2)-compressors:
higher compression rate (4:2 instead of 3:2)
less deep and more regular trees
# operands versus tree depth:
tree depth (FA levels)      0 1 2 3  4  5  6   7  8  9  10
FA                          2 3 4 6  9  13 19  28 42 63 94
tree depth ((4,2) levels)   0 1 2 3  4  5  6
(4,2)                       2 4 8 16 32 64 128
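The operand counts in the table follow simple recurrences: each full-adder level turns 3 rows into 2, each (4, 2)-compressor level turns 4 rows into 2. A small sketch, assuming the depth of the (4, 2) row is counted in compressor levels (function names are hypothetical):

```python
def max_operands_fa(depth):
    """Largest operand count that a full-adder tree of the given depth reduces to 2 rows."""
    m = 2
    for _ in range(depth):
        m = 3 * m // 2      # one more FA level: every 2 rows can come from 3 rows
    return m

def max_operands_42(depth):
    """Same for a tree of (4,2)-compressors: the row count doubles per level."""
    return 2 ** (depth + 1)
```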
Example: (8, 2)-compressor
full-adder tree: $A = 42$, $T = 16$
[Figure: (8,2)-compressor as a tree of six full-adders with inputs a0..a7, carry inputs c_in 0..4, carry outputs c_out 0..4, and outputs c, s]
(4, 2)-compressor tree: $A = 42$, $T = 12$
[Figure: (8,2)-compressor as a tree of three (4,2)-compressors with the same inputs and outputs]
Tree adders (Wallace tree)
Adder tree: $n$-bit $m$-operand carry-save adder composed of $n$ tree-structured (m, 2)-compressors [1, 17]
Tree adders: fastest multi-operand adders using an adder tree and a fast final CPA
$A = O(mn + n \log n)$, $T = O(\log m + \log n)$
Adder arrays and adder trees revisited
Some FA can often be replaced by HA or eliminated (i.e. redundant due to constant inputs)
Number of (irredundant) FA does not depend on adder structure, but number of HA does
An $m$-operand adder accommodates $(m-1)$ carry inputs
Adder trees ($T = O(\log n)$) are faster than adder arrays ($T = O(n)$) at same amount of gates ($A = O(mn)$)
- Adder trees are less regular and have more complex routing than adder arrays ⇒ larger area, difficult layout (i.e. limited use in layout generators)
4.6 Sequential Adders
Bit-serial adder: sequential $n$-bit adder, one bit per clock cycle ($n$ cycles)
[Figure: bit-serial adder with a single FA, inputs $a_i$, $b_i$, output $s_i$]
Accumulators: sequential $m$-operand adders
With CPA
[Figure: accumulator with input A, CPA, and output S]
With CSA and final CPA
Allows higher clock rates
Final CPA too slow: pipelining or multiple cycles for evaluation
[Figure: accumulator with input A, CSA, final CPA, and output S]
Mixed CSA/CPA: CSA with partial CPAs (i.e. fewer carries saved), trade-off between speed and register size
5 Simple / Addition-Based Operations
5.1 Complement and Subtraction
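2's complementer (negation)
$-A = \bar{A} + 1$
[Figure: A is inverted and incremented (+1) to give Z]
2's complement subtractor
$A - B = A + \bar{B} + 1$
[Figure: CPA with inputs A and inverted B, carry-in 1, outputs S and c_out]
2's complement adder/subtractor
$A \pm B = A + (B \oplus sub) + sub$
[Figure: CPA with inputs A and B XOR sub, carry-in sub, outputs S and c_out]
1's complement adder
$(A + B) \bmod (2^n - 1)$ (end-around carry)
[Figure: CPA with inputs A and B, c_out fed back to c_in, output S]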
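Both the adder/subtractor and the end-around-carry adder are easy to model at word level. The following sketch uses hypothetical function names and plain integers in place of the CPA.

```python
def add_sub(a, b, sub, n):
    """n-bit 2's complement adder/subtractor: returns (c_out, S) for A + B or A - B."""
    mask = (1 << n) - 1
    b_eff = (b ^ (mask if sub else 0)) & mask   # every bit of B is XORed with sub
    total = (a & mask) + b_eff + sub            # sub doubles as the carry-in
    return total >> n, total & mask

def ones_complement_add(a, b, n):
    """n-bit 1's complement addition: the carry-out is added back in (end-around carry)."""
    mask = (1 << n) - 1
    total = (a & mask) + (b & mask)
    total = (total & mask) + (total >> n)
    return total & mask
```

Note that the 1's complement result may be the all-ones word, which represents zero modulo $2^n - 1$.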
5.2 Increment / Decrement
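Incrementer
Adds a single bit $c_{in}$ to an $n$-bit operand $A$
$(c_{out}, Z) = c_{out} 2^n + Z = A + c_{in}$
$z_i = a_i \oplus c_i$, $c_{i+1} = a_i c_i$ ; $i = 0, 1, \ldots, n-1$ ; $c_0 = c_{in}$, $c_{out} = c_n$ (r.m.a.)
[Figure: incrementer symbol (+1) with input A, c_in and outputs Z, c_out]
Corresponds to addition with $B = 0$ (⇒ FA → HA)
Example: ripple-carry incrementer using half-adders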
3 2 2 1
3 2 2
[Figure: chain of half-adders (HA), slice $i$ with input $a_i$, output $z_i$, and carries $c_1, c_2, \ldots, c_{n-1}$ rippling from c_in to c_out]
or using incrementer slices (= half-adder)
[Figure: chain of gate-level incrementer slices with inputs $a_{n-1} \ldots a_0$, outputs $z_{n-1} \ldots z_0$, c_in and c_out]
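Prefix problem: $P_{i:k} = P_{i:j+1} \cdot P_{j:k}$ ⇒ AND-prefix struct.
$A = \frac{1}{2} n \log n + 2n$, $T = \lceil \log n \rceil + 2$, $AT \approx \frac{1}{2} n \log^2 n$
Decrementer
$(c_{out}, Z) = A - c_{in}$
[Figure: ripple-carry decrementer with inputs $a_{n-1} \ldots a_0$, outputs $z_{n-1} \ldots z_0$, c_in and c_out]
Incrementer-decrementer
$(c_{out}, Z) = A \pm c_{in} = A + (-1)^{dec}\, c_{in}$
[Figure: ripple-carry incrementer-decrementer with control input dec, inputs $a_{n-1} \ldots a_0$, outputs $z_{n-1} \ldots z_0$, c_in and c_out]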
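Because the incrementer carries are plain AND-prefixes, $c_i = c_{in} \cdot a_0 \cdot a_1 \cdots a_{i-1}$, any prefix structure can compute them. Below is a minimal sketch using a Sklansky-style structure; the function names and the choice to treat $c_{in}$ as an extra prefix input are assumptions for illustration.

```python
def and_prefix_sklansky(bits):
    """Returns p with p[i] = bits[0] & ... & bits[i], using a Sklansky prefix structure."""
    p = list(bits)
    d = 1
    while d < len(p):
        for i in range(len(p)):
            if i & d:                                # i lies in the upper half of its 2d-block
                j = (i & ~(2 * d - 1)) + d - 1       # last index of the lower half
                p[i] &= p[j]
        d *= 2                                       # log2 levels in total
    return p

def increment(a, c_in, n):
    """n-bit incrementer: returns (c_out, Z) with c_out * 2**n + Z = a + c_in."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    c = and_prefix_sklansky([c_in] + a_bits)         # c[i] = c_in & a_0 & ... & a_{i-1}
    z = sum((a_bits[i] ^ c[i]) << i for i in range(n))
    return c[n], z
```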
Fast incrementers
4-bit incrementer using multi-input gates:
[Figure: 4-bit incrementer with inputs a3..a0, c_in and outputs z3..z0, c_out]
8-bit parallel-prefix incrementer (Sklansky AND-prefix structure):
[Figure: 8-bit incrementer with inputs a7..a0, c_in and outputs z7..z0, c_out]
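Gray incrementer
Increments in Gray number system
$c_0 = a_{n-1} \oplus a_{n-2} \oplus \cdots \oplus a_0$ (parity)
$c_{i+1} = \bar{a}_i c_i$ ; $i = 0, 1, \ldots, n-3$ (r.m.a.)
$z_0 = a_0 \oplus \bar{c}_0$
$z_i = a_i \oplus a_{i-1} c_{i-1}$ ; $i = 1, \ldots, n-2$
$z_{n-1} = a_{n-1} \oplus c_{n-2}$
Prefix problem ⇒ AND-prefix structure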
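The Gray incrementer can be modelled directly on the code word: with even parity the lowest bit toggles, with odd parity the bit above the lowest 1 toggles (the top bit when the lowest 1 is already the top bit). The sketch below is one way to express this with a parity bit and an AND chain over inverted bits; the function names and exact index conventions are assumptions.

```python
def gray_increment(a, n):
    """Increment an n-bit Gray-coded value (n >= 2) without converting to binary."""
    bits = [(a >> i) & 1 for i in range(n)]
    c = [0] * (n - 1)
    c[0] = sum(bits) & 1                       # parity of the code word
    for i in range(n - 2):
        c[i + 1] = (1 - bits[i]) & c[i]        # chain continues only across 0 bits
    z = [0] * n
    z[0] = bits[0] ^ (1 - c[0])                # even parity: toggle bit 0
    for i in range(1, n - 1):
        z[i] = bits[i] ^ (bits[i - 1] & c[i - 1])
    z[n - 1] = bits[n - 1] ^ c[n - 2]
    return sum(bit << i for i, bit in enumerate(z))

def to_gray(b):
    """Binary-to-Gray conversion, used here only to check the incrementer."""
    return b ^ (b >> 1)
```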
5.3 Counting
Count clock cycles ⇒ counter, divide clock frequency ⇒ frequency divider
Binary counter
Sequential in-/decrementer
Incrementer speed-up techniques applicable
Down- and up-down-counters using decrementers / incrementer-decrementers
[Figure: counter block with incrementer (+1), register output Q, c_in, c_out, and clk]
Example: ripple-carry up-counter using counter slices (= HA + FF), $c_{in}$ is count enable
[Figure: chain of counter slices with outputs $q_{n-1} \ldots q_0$, c_in and c_out]
Asynchronous counter using toggle-flip-flops (lower toggle rate ⇒ lower power)
[Figure: chain of toggle-flip-flops (T) clocked by clk, with outputs $q_{n-1} \ldots q_0$]
Fast divider ($T = O(1)$) using delayed-carry numbers (irredundant carry-save representation of $-1$ allows using fast carry-save incrementer) [8]
Gray counter
Counter using Gray incrementer
Ring counters
Shift register connected to ring:
[Figure: ring of flip-flops with outputs $q_{n-1}, q_0, q_1, q_2$]
State is not encoded ⇒ $n$ FF for counting $n$ states
Must be initialized correctly (e.g. 00...01)
Applications: fast dividers (no logic between FF), state counter for one-hot coded FSMs
Johnson / twisted-ring counter (inverted feed-back):
[Figure: ring of flip-flops with inverted feedback and outputs $q_{n-1}, q_0, q_1, q_2$]
$n$ FF for counting $2n$ states
5.4 Comparison, Coding, Detection
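Comparison operations
$EQ = (A = B)$ (equal)
$NE = (A \ne B) = \overline{EQ}$ (not equal)
$GE = (A \ge B)$ (greater or equal)
$LT = (A < B) = \overline{GE}$ (less than)
$GT = (A > B) = GE \cdot \overline{EQ}$ (greater than)
$LE = (A \le B) = \overline{GE} + EQ$ (less or equal)
Equality comparison
$EQ = (A = B)$
$eq_{i+1} = (a_i = b_i)\, eq_i$ ; $i = 0, 1, \ldots, n-1$ ; $eq_0 = 1$, $EQ = eq_n$ (r.s.a.)
[Figure: equality comparator with inputs $a_{n-1} \ldots a_0$, $b_{n-1} \ldots b_0$ and output EQ]
Magnitude comparison
$GE = (A \ge B)$
$ge_{i+1} = (a_i > b_i) + (a_i = b_i)\, ge_i$ ; $i = 0, 1, \ldots, n-1$ ; $ge_0 = 1$, $GE = ge_n$ (r.s.a.)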
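Equality and magnitude comparison can both be evaluated by one pass from the least to the most significant bit, where a more significant bit pair overrides the decision made so far. A minimal sketch follows; the function name `compare` and the signal names are assumptions.

```python
def compare(a, b, n):
    """Ripple comparator for unsigned n-bit operands: returns (EQ, GE)."""
    eq, ge = 1, 1                       # empty prefix: equal, hence also greater-or-equal
    for i in range(n):                  # from LSB to MSB
        ai, bi = (a >> i) & 1, (b >> i) & 1
        bit_eq = 1 - (ai ^ bi)          # a_i = b_i
        bit_gt = ai & (1 - bi)          # a_i > b_i
        ge = bit_gt | (bit_eq & ge)     # a higher bit decides unless it is equal
        eq = bit_eq & eq
    return eq, ge
```

The remaining four relations follow from EQ and GE by the formulas above, e.g. GT = GE and not EQ.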
Comparators
Subtractor ($A - B$): $GE = c_{out}$, $EQ = P_{n-1:0}$ (for free in PPA)
$A = 7n$, $T = 2n$ or $A \approx \frac{3}{2} n \log n$, $T \approx 2 \log n$
[Figure: CPA with inputs A and inverted B, carry-in 1; GE = c_out, EQ = P_{n-1:0}]
Optimized comparator: removing redundancies in subtractor (unused $S$) ⇒ single-tree structure
⇒ speed-up at no cost:
6 2 2 2
2log 2
Example: ripple comparator using comparator slices
[Figure: chain of comparator slices with inputs $a_{n-1} \ldots a_0$, $b_{n-1} \ldots b_0$ and outputs EQ, GE; slice variants for equality, magnitude, and equality & magnitude]
Decoder
Decodes binary number $A_{n-1:0}$ to vector $Z_{m-1:0}$ ($m = 2^n$)
$z_i = 1$ if $A = i$, $z_i = 0$ else ; $i = 0, 1, \ldots, 2^n - 1$
[Figure: decoder symbol with input A and output Z]
[Figure: 3-to-8 decoder with inputs a2, a1, a0 and outputs z7..z0]
'
2
1)
27
log2
!
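Encoder
Encodes vector $A_{m-1:0}$ to binary number $Z_{n-1:0}$ ($m = 2^n$)
(condition: exactly one bit set, i.e. there is a $k$ such that $a_i = 1$ if $i = k$ and $a_i = 0$ else)
$Z = i$ if $a_i = 1$ ; $i = 0, 1, \ldots, m-1$ ; $n = \log m$
[Figure: encoder symbol with input A and output Z]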
2
'
27
8 1 1
)
2
1
[Figure: 8-to-3 encoder with inputs a0..a7 and outputs z0, z1, z2 (note: connections according to PPA-SK)]
Detection operations
All-zeroes detection: $z = \overline{a_{n-1} + a_{n-2} + \cdots + a_0}$
All-ones detection: $z = a_{n-1} a_{n-2} \cdots a_0$ (r.s.a.)
$A = n$, $T = \log n$
Leading-zeroes detection (LZD): for scaling, normalization, priority encoding
a) non-encoded output: $z_i = a_i \cdot \overline{a_{i+1}} \cdots \overline{a_{n-1}}$ (e.g. 000101 → 000100)
$A = 2n$, $T = n$
[Figure: ripple structure with inputs $a_{n-1}, a_{n-2}, \ldots, a_1, a_0$ and outputs $z_{n-1}, z_{n-2}, \ldots, z_1, z_0$]
prefix problem (r.m.a.) ⇒ AND-prefix structure
b) encoded output: + encoder
signed numbers: + leading-ones detector (LOZ)
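The two output forms of leading-zeroes detection can be sketched as follows. The non-encoded output marks the position of the most significant 1 (as in the example 000101 → 000100), and the encoded output is the count of leading zeroes. Function names are hypothetical.

```python
def leading_one_mask(a, n):
    """Non-encoded LZD output: only the most significant 1 of the n-bit word a is kept."""
    seen = 0                      # becomes 1 once a higher 1 has been found
    z = 0
    for i in range(n - 1, -1, -1):
        ai = (a >> i) & 1
        if ai and not seen:
            z |= 1 << i
        seen |= ai
    return z

def leading_zeroes(a, n):
    """Encoded LZD output: number of zeroes above the most significant 1."""
    mask = leading_one_mask(a, n)
    return n if mask == 0 else n - mask.bit_length()
```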
5.5 Shift, Extension, Saturation
Shift: a) shift $n$-bit vector by $k$ bit positions
b) select $n$ out of more bits at position $k$
also: logical (= unsigned), arithmetic (= signed)
Rotation by $k$ bit positions, $n$ constant (logic operation)
Extension of word lengths by $k$ bits ($n \rightarrow n + k$) (i.e. sign-extension for signed numbers)
Saturation to highest/lowest value after over-/underflow
operation   type       dir.   result (shown for one bit position)                       name
shift a)    unsigned   l.     $a_{n-2}, \ldots, a_0, 0$                                 sll
                       r.     $0, a_{n-1}, \ldots, a_1$                                 srl
            signed     l.     $a_{n-1}, a_{n-3}, \ldots, a_0, 0$                        sla
                       r.     $a_{n-1}, a_{n-1}, a_{n-2}, \ldots, a_1$                  sra
shift b)    unsigned          $a_{n+k-1}, \ldots, a_k$
            signed            $a_{2n-1}, a_{n+k-2}, \ldots, a_k$
rotate                 l.     $a_{n-2}, \ldots, a_0, a_{n-1}$                           rol
                       r.     $a_0, a_{n-1}, \ldots, a_1$                               ror
extend      unsigned   l.     $0, a_{n-1}, \ldots, a_0$
                       r.     $a_{n-1}, \ldots, a_0, 0$
            signed     l.     $a_{n-1}, a_{n-1}, a_{n-2}, \ldots, a_0$
                       r.     $a_{n-1}, a_{n-2}, \ldots, a_0, 0$
saturate    unsigned          $a_{n-1}, \ldots, a_{n-1}$
            signed            $a_{n-1}, \bar{a}_{n-1}, \ldots, \bar{a}_{n-1}$
Applications:
adaption of magnitude (shift a)) or word length (extension) of operands (e.g. for addition)
multiplication/division by powers of 2 (shift)
logic bit/byte operations (shift, rotation)
scaling of numbers for word-length reduction (i.e. ignore leading zeroes, shift b)) or normalization (e.g. of floating-point numbers, shift a)) using LZD
reducing error after over-/underflow (saturation)
Implementation of shift/extension/rotation by
constant values: hard-wired
variable values: multiplexers
$n$ possible values: $n$-by-$n$ barrel-shifter/rotator
Example: 4-by-4 barrel-rotator
$A = O(n^2)$, $T = O(\log n)$
[Figure: 4-by-4 barrel-rotator built from multiplexers, inputs a3..a0, select signals s0, s1, outputs z3..z0]
[Figure: 4-by-4 barrel-rotator built from tristate buffers driven by decoded s0, s1, same inputs and outputs]
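The multiplexer-based variant can be sketched as a logarithmic rotator: stage $j$ either passes the word through or rotates it by $2^j$, selected by bit $j$ of the rotation amount. This is an illustrative model with a hypothetical function name, assuming $n$ is a power of two.

```python
def rotate_left_log(a, k, n):
    """Rotate the n-bit word a left by k using log2(n) multiplexer stages."""
    mask = (1 << n) - 1
    a &= mask
    stage = 0
    while (1 << stage) < n:
        if (k >> stage) & 1:                 # select input s_stage of this mux row
            shift = 1 << stage
            a = ((a << shift) | (a >> (n - shift))) & mask
        stage += 1
    return a
```

With $n = 4$ the two stages are controlled by $s_0$ and $s_1$, matching the select signals named in the figures.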
5.6 Addition Flags
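flag   formula                      description
$C$    $c_{out}$                    carry flag
$V$    $c_{out} \oplus c_{n-1}$     signed overflow flag
$Z$    $\forall i : s_i = 0$        zero flag
$N$    $s_{n-1}$                    negative flag, sign
Implementation of adder with flags
$C$, $N$: for free
$V$: fast $c_{out}$, $c_{n-1}$ computed by e.g. PPA ⇒ very cheap
$Z$: a) $c_{in} = 1$ (subtract.): $Z = (A = B) = P_{n-1:0}$ (of PPA)
b) $c_{in} = 0/1$:
1) $Z = \overline{s_{n-1} + s_{n-2} + \cdots + s_0}$ (r.s.a.), $A = n$, $T = \lceil \log n \rceil$
2) faster without final sum (i.e. carry prop.) [18]
example: 01001 + 10110 + 1 = 00000
$z_0 = \overline{(a_0 \oplus b_0) \oplus c_{in}}$
$z_i = \overline{(a_i \oplus b_i) \oplus (a_{i-1} + b_{i-1})}$
$Z = z_{n-1} z_{n-2} \cdots z_0$ ; $i = 1, \ldots, n-1$ (r.s.a.)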
3 2 4
log 2 !
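Zero detection without carry propagation rests on a local condition: if sum bit $i-1$ is 0, the carry into position $i$ equals $a_{i-1} + b_{i-1}$ (OR), so sum bit $i$ is 0 exactly when $a_i \oplus b_i$ equals that OR. The sketch below checks this condition bit by bit; the function name is hypothetical and the loop stands in for what hardware would evaluate in parallel and combine with an AND tree.

```python
def zero_flag(a, b, c_in, n):
    """Returns 1 iff (a + b + c_in) mod 2**n == 0, without computing the sum."""
    z = 1
    prev_or = c_in                          # plays the role of a_{i-1} OR b_{i-1} for i = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        z &= 1 - ((ai ^ bi) ^ prev_or)      # z_i = NOT((a_i XOR b_i) XOR (a_{i-1} OR b_{i-1}))
        prev_or = ai | bi
    return z
```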
Basic and derived condition flags
[Table: condition flags with their formulas for unsigned and signed operands, in two groups of operations; the rows of the first group are zero, negative, positive, overflow, and underflow. The formulas are not legible in this copy.]
Unsigned and signed addition/subtraction only differ with respect to the condition flags
5.7 Arithmetic Logic Unit (ALU)
[Figure: ALU symbol with inputs A, B, c_in, op and outputs Z, c_out, flags]
ALU operations
arithmetic: add, sub (both with $c_{in}$), inc ($A + 1$), dec ($A - 1$), pass, neg
logic: and, nand, or, nor, xor, xnor, pass, not
shift/rotate (by 1 position): sll, srl, sla, sra, rol, ror
s/ro: shift/rotate ; l/r: left/right ; l/a: logic (unsigned) / arithmetic (signed)
Logic of adder/subtractor can partly be shared with logic operations
6 Multiplication
6.1 Multiplication Basics
Multiplies two $n$-bit operands $A$ and $B$ [1, 2]
Product $P$ is $(2n)$-bit unsigned number or $(2n-1)$-bit signed number
Example: unsigned multiplication
$P = A \cdot B = \sum_{i=0}^{n-1} a_i 2^i \cdot \sum_{j=0}^{n-1} b_j 2^j = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} a_i b_j 2^{i+j}$
or $P_i = a_i B$, $P = \sum_{i=0}^{n-1} P_i 2^i$ ; $i = 0, 1, \ldots, n-1$ (r.s.a.)
Algorithm
1) Generation of $n$ partial products
2) Adding up partial products:
a) sequentially (sequential shift-and-add),
b) serially (combinational shift-and-add), or
c) in parallel
Speed-up techniques
Reduce number of partial products
Accelerate addition of partial products
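The two algorithm steps (generate one partial product per multiplier bit, then add them with the proper weights) correspond to the following shift-and-add sketch. The function name is an assumption, and $A$ is taken as the operand whose bits select the partial products.

```python
def multiply_shift_add(a, b, n):
    """Unsigned n-bit multiplication by sequential shift-and-add."""
    product = 0
    for i in range(n):
        a_i = (a >> i) & 1
        partial = b if a_i else 0     # partial product P_i = a_i * B
        product += partial << i       # accumulate with weight 2**i
    return product
```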
Sequential multipliers: partial products generated and added sequentially (using accumulator)
$A = O(n)$, $T = O(\log n)$ per cycle, $n$ cycles
[Figure: sequential multiplier with partial product generation and accumulating CPA]
Array multipliers: partial products generated and added simultaneously in linear array (using array adder)
$A = O(n^2)$, $T = O(n)$
[Figure: array multiplier with four CSA rows and a final CPA]
Parallel multipliers: partial products generated in parallel and added subsequently in multi-operand adder (using tree adder)
$A = O(n^2)$, $T = O(\log n)$
[Figure: parallel multiplier with CSA tree and final CPA]
Signed multipliers:
a) complement operands before and result after multiplication ⇒ unsigned multiplication
b) direct implementation (dedicated multiplier structure)
6.2 Unsigned Array Multiplier
Braun multiplier: array multiplier for unsigned numbers
$P = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} a_i b_j 2^{i+j}$
$A = 8n^2 - 11n$, $T = 6n - 9$
Partial products for $n = 4$:
                     a3b0 a2b0 a1b0 a0b0
                a3b1 a2b1 a1b1 a0b1
           a3b2 a2b2 a1b2 a0b2
      a3b3 a2b3 a1b3 a0b3
p7    p6   p5   p4   p3   p2   p1   p0
[Figure: 4-bit Braun multiplier: CSA array of FA and HA cells fed by a3..a0 and b3..b0, final CPA, outputs p7..p0; regions marked 1, 2, 3]
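The area figure of the Braun multiplier can be cross-checked against a cell count. The derivation below is a sketch under stated assumptions: unit-gate costs of 1 for an AND gate, 7 for a full-adder and 3 for a half-adder, and cell counts of $n(n-2)$ full-adders and $n$ half-adders, generalized from the 8 FA and 4 HA cells of the 4-bit structure.

```latex
\begin{aligned}
A &= n^2 \cdot A_{AND} + n(n-2) \cdot A_{FA} + n \cdot A_{HA} \\
  &= n^2 + 7n(n-2) + 3n \\
  &= 8n^2 - 11n \\
n = 4:\quad A &= 16 + 8 \cdot 7 + 4 \cdot 3 = 84 = 8 \cdot 16 - 11 \cdot 4
\end{aligned}
```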
6.3 Signed Array Multipliers
Modified Braun multiplier
Subtract bits with negative weight ⇒ special FAs [1]
1 neg. bit :
7
2
2 neg. bits :
7
2
Replace FAs in regions
1 ,
2 , and
3 by :(input at mark )
7
7
7
Otherwise exactly same structure and complexity as Braun multiplier ⇒ efficient and flexible
Baugh-Wooley multiplier
Arithmetic transformations yield the following partial products (two additional ones):
[Partial-product array for $n = 4$: the sixteen products $a_i b_j$ arranged as in the Braun multiplier, plus two additional rows containing $a_3$, $b_3$ and a constant 1, summing to p7..p0; the complement bars are not legible in this copy]
- Less efficient and regular than modified Braun multiplier
6.4 Booth Recoding
Speed-up technique: reduction of partial products
Sequential multiplication
Minimal (or canonical) signed-digit (SD) represent. of $A$
+ One cycle per non-zero partial product (i.e. $\forall\, a_i \ne 0$)
- Negative partial products
- Data-dependent reduction of partial products and latency
Combinational multiplication
Only fixed reduction of partial products possible
Radix-4 modified Booth recoding: 2 bits recoded to one multiplier digit ⇒ $n/2$ partial products
$A = \sum_{i=0}^{n/2-1} (a_{2i-1} + a_{2i} - 2 a_{2i+1})\, 2^{2i}$ ; $a_{-1} = 0$
$P = \sum_{i=0}^{n/2-1} (a_{2i-1} + a_{2i} - 2 a_{2i+1})\, B\, 2^{2i}$
$a_{2i+1}\, a_{2i}\, a_{2i-1}$   digit
0 0 0    0
0 0 1   +1
0 1 0   +1
0 1 1   +2
1 0 0   -2
1 0 1   -1
1 1 0   -1
1 1 1    0
[Figure: multiplier with Booth recoding block, CSA array/tree, and final CPA]
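Radix-4 modified Booth recoding can be sketched in a few lines. The sketch assumes that $A$ is the recoded operand, that $n$ is even, and that $A$ is an $n$-bit two's complement value; the function names are hypothetical.

```python
def booth_radix4_digits(a, n):
    """Recode an n-bit two's complement integer (n even) into n/2 digits in {-2, ..., 2}."""
    digits = []
    prev = 0                                   # a_{-1} = 0
    for i in range(0, n, 2):
        lo = (a >> i) & 1                      # a_{2k}
        hi = (a >> (i + 1)) & 1                # a_{2k+1}
        digits.append(prev + lo - 2 * hi)
        prev = hi
    return digits                              # digits[k] has weight 4**k

def booth_multiply(a, b, n):
    """Multiply using one (possibly negative or doubled) partial product per digit."""
    return sum(d * b * 4 ** k for k, d in enumerate(booth_radix4_digits(a, n)))
```

For example, 7 = 0111 is recoded into the digits 2 and -1 (weights 4 and 1), so only two partial products are needed for a 4-bit multiplier operand.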
Applicable to sequential, array, and parallel multipliers
- additional recoding logic and more complex partial product generation (MUX for shift, XOR for negation): $A$: $+8n$, $T$: $+7$
+ adder array/tree cut in half ⇒ considerably smaller (array and tree): $A$: $/2$
⇒ much faster for adder arrays: $T$: $/2$
⇒ slightly or not faster for adder trees: $T$: $\approx 0$
Negative partial products (avoid sign-extension):
[Partial-product arrays for $n = 4$: the extended sign bits of the partial products are replaced by constant 1s and complemented sign bits, so that the rows need not be sign-extended to the full product width p6..p0]
Suited for signed multiplication (incl. Booth recod.)
Extend $A$ for unsigned multiplication: $a_n = 0$
Radix-8 (3-bit recoding) and higher radices: precomputing $3B$, ... ⇒ larger overhead
6.5 Wallace Tree Addition
Speed-up technique: fast partial product addition
$A = O(n^2)$, $T = O(\log n)$
Applicable to parallel multipliers: parallel partial product generation (normal or Booth recoded)
Irregular adder tree (Wallace tree) due to different number of bits per column
⇒ irregular wiring and/or layout
⇒ non-uniform bit arrival times at final adder
6.6 Multiplier Implementations
Sequential multipliers: low performance, small area, resource sharing (adder)
Braun or Baugh-Wooley multiplier (array multiplier): medium performance, high area, high regularity
layout generators ⇒ data paths and macro-cells
simple pipelining, faster CPA ⇒ higher speed
Booth-Wallace multiplier (parallel multiplier) [9]: high performance, high area, low regularity
custom multipliers, netlist generators
often pipelined (e.g. register between CSA-tree and CPA)
Signed-unsigned multiplier: signed multiplier with operands extended by 1 bit ($a_n = a_{n-1} / 0$, $b_n = b_{n-1} / 0$)
6.7 Composition from Smaller Multipliers
$(2n \times 2n)$-bit multiplier can be composed from 4 $(n \times n)$-bit multipliers (can be repeated recursively)
$A \cdot B = (A_H 2^n + A_L)(B_H 2^n + B_L) = A_H B_H\, 2^{2n} + (A_H B_L + A_L B_H)\, 2^n + A_L B_L$
4 $(n \times n)$-bit multipliers + $(2n)$-bit CSA + $(3n)$-bit CPA
- less efficient (area and speed)
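The composition follows from splitting each operand into a high and a low half. In the sketch below `mul_n` is a hypothetical stand-in for an $(n \times n)$-bit multiplier block, and the three additions model the CSA and CPA.

```python
def mul_n(x, y):
    """Stand-in for one (n x n)-bit multiplier block."""
    return x * y

def mul_2n(a, b, n):
    """(2n x 2n)-bit unsigned product composed from four (n x n)-bit products."""
    mask = (1 << n) - 1
    a_h, a_l = a >> n, a & mask
    b_h, b_l = b >> n, b & mask
    return ((mul_n(a_h, b_h) << (2 * n))
            + ((mul_n(a_h, b_l) + mul_n(a_l, b_h)) << n)
            + mul_n(a_l, b_l))
```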
6.8 Squaring
$P = A^2$: multiplier optimizations possible
[Partial-product array for $n = 4$: the symmetric products $a_i a_j$ and $a_j a_i$ are merged into single terms of doubled weight and the diagonal terms $a_i a_i$ reduce to $a_i$, leaving fewer rows that sum to p7..p0]
+ $n/2 + 1$ partial products (if no Booth recoding used) ⇒ optimized squarer more efficient than multiplier
- Table look-up (ROM) less efficient for every $n$
7 Division / Square Root Extraction
7.1 Division Basics
;
rem
(remainder)
0 227
1
0 27
1
0
27
27
, otherwise overow
normalize
before division (
27
8 1 2
7
1 )
Algorithms (radix-2)
Subtract-and-shift: partial remainders $R_i$ [1, 2]
Sequential algorithm: recursive, non-associative
$R_i = R_{i+1} - q_i B\, 2^i$ ; $R_n = A$, $R = R_0$ ; $i = n-1, \ldots, 0$ (r.m.n.)
Basic algorithm: compare and conditionally subtract ⇒ expensive comparison and CPA
Restoring division: subtract and conditionally restore (adder or multiplexer) ⇒ expensive CPA and restoring
Non-restoring division: detect sign, subtract/add, and correct by next steps ⇒ expensive CPA
SRT division: estimate range, subtract/add (CSA), and correct by next steps ⇒ inexpensive CSA
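7.2 Restoring Division
$q_i = 1$ if $R_{i+1} - B\, 2^i \ge 0$, $q_i = 0$ if $R_{i+1} - B\, 2^i < 0$
$R_i' = R_{i+1} - B\, 2^i$
$R_i' < 0$ : $q_i = 0$, $R_i = R_{i+1}$ (restored)
$R_i' \ge 0$ : $q_i = 1$, $R_i = R_i'$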
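Restoring division can be sketched as a trial subtraction per quotient bit. The function name is hypothetical, and the sketch assumes unsigned operands with $0 \le A < B \cdot 2^n$ so that the quotient fits into $n$ bits.

```python
def restoring_divide(a, b, n):
    """Radix-2 restoring division: returns (Q, R) with a = Q * b + R and 0 <= R < b."""
    r, q = a, 0
    for i in range(n - 1, -1, -1):
        trial = r - (b << i)       # subtract B * 2**i
        if trial >= 0:
            r = trial              # keep the difference, q_i = 1
            q |= 1 << i
        # otherwise r is left unchanged, i.e. "restored", and q_i = 0
    return q, r
```

Here the restore step is modelled by simply not committing the trial value, which corresponds to the multiplexer option mentioned above.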
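7.3 Non-Restoring Division
$q_i = 1$ if $R_{i+1} \ge 0$, $q_i = \bar{1}$ if $R_{i+1} < 0$
$R_i = R_{i+1} - q_i B\, 2^i$, i.e. $R_i = R_{i+1} - B\, 2^i$ if $R_{i+1} \ge 0$ and $R_i = R_{i+1} + B\, 2^i$ if $R_{i+1} < 0$
One subtraction/addition (CPA) per step
Final correction step for $R$ (additional CPA)
Simple quotient digit conversion (note: irredundant):
$q_i \in \{\bar{1}, 1\} \rightarrow p_i \in \{0, 1\}$ : $p_i = \frac{1}{2}(q_i + 1)$ ; $Q = (\bar{p}_{n-1}, p_{n-2}, p_{n-3}, \ldots, p_0, 1)$
$A = (n+1)\, A_{CPA} = O(n^2)$ or $O(n^2 \log n)$
$T = (n+1)\, T_{CPA} = O(n^2)$ or $O(n \log n)$
[Figure: non-restoring divider as a cascade of add/subtract CPAs (+/-) with inputs A, B and outputs Q, R]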
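Non-restoring division never undoes a subtraction; a negative partial remainder is compensated by adding in the next step, and one final correction fixes the remainder. The sketch below assumes unsigned operands with $0 \le A < B \cdot 2^n$; the function name and the bit vector `p` (with $q_i = 2p_i - 1$) are illustrative assumptions.

```python
def nonrestoring_divide(a, b, n):
    """Radix-2 non-restoring division with quotient digits in {-1, +1}."""
    r = a
    p = 0                              # p_i = 1 encodes q_i = +1, p_i = 0 encodes q_i = -1
    for i in range(n - 1, -1, -1):
        if r >= 0:
            r -= b << i                # q_i = +1: subtract
            p |= 1 << i
        else:
            r += b << i                # q_i = -1: add
    q = 2 * p + 1 - (1 << n)           # value of sum(q_i * 2**i) with q_i = 2 * p_i - 1
    if r < 0:                          # final correction step
        r += b
        q -= 1
    return q, r
```

The conversion line shows why the digit set is easy to convert: the quotient is the bit vector `p` shifted left by one with a 1 appended, minus $2^n$.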
7.4 Signed Division
$q_i = 1$ if $R_{i+1}$ and $B$ have the same sign, $q_i = \bar{1}$ if $R_{i+1}$ and $B$ have opposite signs
Example: signed non-restoring array divider
(simplications:
0, nal correction of
omitted)
$A = 9n^2$, $T = 2n^2 + 4n$
[Figure: 4-by-4 array of controlled add/subtract cells (FA), with dividend inputs a6..a0, divisor inputs b3..b0, quotient outputs q3..q0, and remainder outputs r3..r0]
7.5 SRT Division (Sweeney, Robertson, Tocher)
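$q_i = 1$ if $B\, 2^i \le R_{i+1}$
$q_i = 0$ if $-B\, 2^i \le R_{i+1} < B\, 2^i$
$q_i = \bar{1}$ if $R_{i+1} < -B\, 2^i$
⇒ $Q$ is SD number
If $2^{n-1} \le B < 2^n$, i.e. $B$ is normalized: $-B\, 2^i \le -2^{n+i-1}$ and $2^{n+i-1} \le B\, 2^i$, so the comparison can use constants:
$q_i = 1$ if $2^{n+i-1} \le R_{i+1}$
$q_i = 0$ if $-2^{n+i-1} \le R_{i+1} < 2^{n+i-1}$
$q_i = \bar{1}$ if $R_{i+1} < -2^{n+i-1}$
+ Only 3 MSB are compared ⇒ $R_i$ are estimated ⇒ CSA instead of CPA can be used (precise enough) [19]
- Correction in following steps (+ final correction step)
- Redundant representation of $Q$ (SD representation) ⇒ final conversion necessary (CPA)
+ Highly regular and fast ($O(n)$) SRT array dividers ⇒ only slightly slower/larger than array multipliers
$A = O(n^2)$, $T = O(n)$
[Figure: SRT divider as a cascade of add/subtract CSAs (+/-) with inputs A, B, a final add/subtract CPA for R, and a CPA converting Q]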
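The SRT digit selection against constant thresholds can be sketched as follows. This is a simplified model under stated assumptions: the divisor is normalized ($2^{n-1} \le B < 2^n$), $0 \le A < B \cdot 2^n$, and the partial remainder is compared exactly, whereas a real SRT divider only inspects a few most significant bits of a carry-save remainder. The function name is hypothetical.

```python
def srt_divide(a, b, n):
    """Radix-2 SRT-style division with quotient digits in {-1, 0, +1}."""
    r, q = a, 0
    for i in range(n - 1, -1, -1):
        threshold = 1 << (n + i - 1)   # constant, independent of the divisor value
        if r >= threshold:
            r -= b << i                # q_i = +1
            q += 1 << i
        elif r < -threshold:
            r += b << i                # q_i = -1
            q -= 1 << i
        # otherwise q_i = 0: the partial remainder is passed on unchanged
    if r < 0:                          # final correction step
        r += b
        q -= 1
    return q, r
```

Accumulating `q` with additions and subtractions stands in for the final conversion of the signed-digit quotient.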
7.6 High-Radix Division
Radix $2^m$, $q_i \in \{\ldots, \bar{1}, 0, 1, \ldots\}$
$m$ quotient bits per step ⇒ fewer, but more complex steps
+ Suitable for SRT algorithm ⇒ faster
- Complex comparisons (more bits) and decisions ⇒ table look-up (⇒ Pentium bug!)
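7.7 Division by Multiplication
Division by convergence
$Q = \frac{A}{B} = \frac{A \cdot R_0 R_1 \cdots R_{m-1}}{B \cdot R_0 R_1 \cdots R_{m-1}}$ with $B \cdot R_0 R_1 \cdots R_{m-1} \rightarrow 1$ and $A \cdot R_0 R_1 \cdots R_{m-1} \rightarrow Q$, resp.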
27
%
1
27
'
1 )
'
1 )
27
'
1 2)
27
1
28
7
2
28
7
1 (signed)
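Algorithm:
$A_{i+1} = A_i R_i$, $B_{i+1} = B_i R_i$, $R_i = 2 - B_i$ ; $i = 0, 1, \ldots, m-1$
$A_0 = A$, $B_0 = B$, $Q = A_m$ (r.s.n.)
Quadratic convergence: $m = \lceil \log n \rceil$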
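Division by convergence multiplies numerator and denominator by the same factors until the denominator is close to 1. A minimal sketch, assuming the common choice of factor $R_i = 2 - B_i$ and a divisor scaled into $[1/2, 1)$; exact rationals are used so that the error can be checked, and the function name is hypothetical.

```python
from fractions import Fraction

def divide_by_convergence(a, b, steps):
    """Iterates a, b -> a * r, b * r with r = 2 - b; returns the numerator as quotient estimate."""
    a, b = Fraction(a), Fraction(b)
    for _ in range(steps):
        r = 2 - b              # factor that pushes b towards 1
        a, b = a * r, b * r
    return a
```

With $B = 1 - y$ the denominator after one step is $1 - y^2$, after two steps $1 - y^4$, and so on, which is the quadratic convergence stated above.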
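Division by reciprocation
$Q = \frac{A}{B} = A \cdot \frac{1}{B}$
Newton-Raphson iteration method: find $f(X) = 0$ by recursion
$X_{i+1} = X_i - \frac{f(X_i)}{f'(X_i)}$
$f(X) = \frac{1}{X} - B$, $f'(X) = -\frac{1}{X^2}$, $f\!\left(\frac{1}{B}\right) = 0$
Algorithm:
$X_{i+1} = X_i (2 - B X_i)$ ; $i = 0, 1, \ldots, m-1$ ; $Q = A \cdot X_m$ (r.s.n.)
Quadratic convergence: $m = O(\log n)$
Speed-up: first approximation $X_0$ from table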
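The Newton-Raphson recursion for the reciprocal needs only multiplications and subtractions. A floating-point sketch follows, with a hypothetical function name; the start value `x0` is passed in explicitly and stands for the table look-up mentioned above.

```python
def reciprocal_newton(b, x0, steps):
    """Approximates 1 / b by the iteration x -> x * (2 - b * x), starting from x0."""
    x = x0
    for _ in range(steps):
        x = x * (2 - b * x)    # the relative error 1 - b * x is squared in every step
    return x
```

A division $A / B$ is then one further multiplication, e.g. `0.6 * reciprocal_newton(0.75, 1.0, 6)` for $0.6 / 0.75$.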
7.8 Remainder / Modulus
Remainder (rem): signed remainder of a division
$R = A \;\mathrm{rem}\; B$ ; $\mathrm{sign}(R) = \mathrm{sign}(A)$
Modulus (mod): positive remainder of a division
$M = A \bmod B$ ; $M \ge 0$
$M = R$ if $R \ge 0$, $M = R + B$ else
7.9 Divider Implementations
Iterative dividers (through multiplication):
resource sharing of existing components (multiplier)
medium performance, medium area
high efficiency if components are shared
Sequential dividers (restoring, non-restoring, SRT):
resource sharing of existing components (e.g. adder)
low performance, low area
Array dividers (restoring, non-restoring, SRT):
dedicated hardware component
high performance, high area
high regularity ⇒ layout generators, pipelining
square root extraction possible by minor changes
combination with multiplication or/and square root
No parallel dividers exist, as compared to parallel multipliers (sequential nature of division)
7.10 Square Root Extraction
0
2
0 227
1
0 27
1
Algorithm Subtract-and-shift : partial remainders
and quotients
%
1
2
'
7
8 1 1 1 1