Date post: | 03-Apr-2018 |
Category: |
Documents |
Upload: | kaylaroberts |
7/28/2019 2.2ArithmeticFull.pdf
Chapter 6. Arithmetic
Outline
A basic operation in all digital computers is
the addition or subtraction of two numbers.
ALU: AND, OR, NOT, XOR
Unsigned/signed numbers
Addition/subtraction
Multiplication
Division
Floating-point number operations
Adders
Addition of Unsigned Numbers
Half Adder
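(a) The four possible cases (x + y = carry c, sum s):
  0 + 0 = 0 0
  0 + 1 = 0 1
  1 + 0 = 0 1
  1 + 1 = 1 0

(b) Truth table
  x  y   Carry c   Sum s
  0  0      0        0
  0  1      0        1
  1  0      0        1
  1  1      1        0

(c) Circuit (d) Graphical symbol
[Figure: half-adder circuit and the HA block symbol, each with inputs x, y and outputs s, c.]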
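The truth table above can be checked with a minimal sketch. It assumes the standard half-adder equations (sum = x XOR y, carry = x AND y); the function name `half_adder` is illustrative and does not come from the slides.

```python
def half_adder(x, y):
    """Return (sum, carry) for two one-bit inputs."""
    s = x ^ y  # sum is 1 when exactly one input is 1
    c = x & y  # carry is 1 only when both inputs are 1
    return s, c


# Print the four possible cases.
for x in (0, 1):
    for y in (0, 1):
        s, c = half_adder(x, y)
        print(x, "+", y, "-> carry", c, "sum", s)
```

In every case the two outputs, read as the binary number "c s", equal x + y.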
Addition and Subtraction of
Signed Numbers
  xi  yi  Carry-in ci   Sum si  Carry-out ci+1
  0   0       0           0          0
  0   0       1           1          0
  0   1       0           1          0
  0   1       1           0          1
  1   0       0           1          0
  1   0       1           0          1
  1   1       0           0          1
  1   1       1           1          1

$s_i = \bar{x}_i \bar{y}_i c_i + \bar{x}_i y_i \bar{c}_i + x_i \bar{y}_i \bar{c}_i + x_i y_i c_i = x_i \oplus y_i \oplus c_i$
$c_{i+1} = y_i c_i + x_i c_i + x_i y_i$

Example:
    X      7      0 1 1 1
  + Y    + 6    + 0 1 1 0
    Z     13      1 1 0 1

Legend for stage i: inputs xi, yi, and carry-in ci; outputs sum si and carry-out ci+1.
Figure 6.1. Logic specification for a stage of binary addition.
Addition and Subtraction of
Signed Numbers
A full adder (FA)
[Figure: gate-level circuit and FA block symbol, each with inputs xi, yi, ci and outputs si, ci+1.]
(a) Logic for a single stage
Addition and Subtraction of
Signed Numbers
n-bit ripple-carry adder
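Overflow? Overflow = cn xor cn-1
Subtraction?
[Figure: n full adders (FA) in a chain. Stage i takes xi, yi, and ci and produces si and ci+1. The carry-in c0 enters at the least significant bit (LSB) position and cn leaves the most significant bit (MSB) position.]
(b) An n-bit ripple-carry adder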
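A ripple-carry adder can be sketched by chaining the full-adder equations stage by stage. This is an illustrative model, not a circuit description from the slides: bit lists are stored LSB first, and the helper names (`full_adder`, `ripple_carry_add`, `to_bits`, `from_bits`) are assumptions. The signed overflow flag is taken as cn XOR cn-1.

```python
def full_adder(x, y, c):
    """One stage: return (sum bit, carry-out)."""
    s = x ^ y ^ c
    c_out = (x & y) | (x & c) | (y & c)
    return s, c_out


def ripple_carry_add(x_bits, y_bits, c0=0):
    """Add two equal-length bit lists (LSB first).

    Returns (sum_bits, carry_out, signed_overflow).
    """
    carries = [c0]
    s_bits = []
    for x, y in zip(x_bits, y_bits):
        s, c = full_adder(x, y, carries[-1])  # carry ripples to the next stage
        s_bits.append(s)
        carries.append(c)
    overflow = carries[-1] ^ carries[-2]      # cn xor cn-1
    return s_bits, carries[-1], overflow


def to_bits(value, n):
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))


s, c4, ovf = ripple_carry_add(to_bits(7, 4), to_bits(6, 4))
print(from_bits(s), c4, ovf)
```

With 4-bit operands, 7 + 6 gives the bit pattern 1101 (13 as an unsigned value) with no carry-out, but the overflow flag is set because 13 does not fit in a 4-bit signed number.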
Addition and Subtraction of
Signed Numbers
kn-bit ripple-carry adder
[Figure: k n-bit adder blocks in a chain. The first block adds xn-1 ... x0 and yn-1 ... y0 with carry-in c0 and produces sn-1 ... s0 and cn. Each later block takes the previous block's carry-out, and the last block produces skn-1 ... s(k-1)n and ckn.]
(c) Cascade of k n-bit adders
Figure 6.2. Logic for addition of binary vectors.
Addition and Subtraction of
Signed Numbers
Addition/subtraction logic unit
[Figure: an n-bit adder with inputs xn-1 ... x1 x0 and yn-1 ... y1 y0, carry-in c0, sum outputs sn-1 ... s1 s0, carry-out cn, and an Add/Sub control input.]
Figure 6.3. Binary addition-subtraction logic network.
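The following sketch models an addition/subtraction unit under a common assumption that the extracted slide does not spell out: the Add/Sub control line is XORed with every y bit and also drives c0, so that subtraction computes X + (complement of Y) + 1. The function names are illustrative.

```python
def add_sub(x, y, n, subtract):
    """n-bit 2's-complement X + Y (subtract=False) or X - Y (subtract=True).

    Returns (result bit pattern, signed overflow flag).
    """
    mask = (1 << n) - 1
    ctrl = 1 if subtract else 0
    y_in = (y ^ (mask if ctrl else 0)) & mask  # XOR with control complements Y
    total = (x & mask) + y_in + ctrl           # c0 = control supplies the +1
    carry_out = total >> n                     # cn
    low = mask >> 1                            # bits below the sign position
    c_msb = ((x & low) + (y_in & low) + ctrl) >> (n - 1)  # cn-1
    return total & mask, carry_out ^ c_msb


def to_signed(pattern, n):
    """Interpret an n-bit pattern as a 2's-complement value."""
    return pattern - (1 << n) if pattern >> (n - 1) else pattern


result, overflow = add_sub(5, 3, 4, subtract=True)
print(to_signed(result, 4), overflow)
```

The same adder hardware serves both operations; only the control line changes.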
Make Addition Faster
Ripple-Carry Adder (RCA)
Straightforward design
Simple circuit structure
Easy to understand
Most power efficient
Slowest (critical path too long: 2n gate delays)
Adders
We can view addition in terms of generate,
G[i], and propagate, P[i].
Carry-lookahead Logic
Sum and carry can be re-expressed in terms of generate, propagate, and Ci:
Carry generate: Gi = Ai Bi (must generate a carry when Ai = Bi = 1)
Carry propagate: Pi = Ai xor Bi (carry-out will equal carry-in here)
Si = Ai xor Bi xor Ci = Pi xor Ci
Ci+1 = Ai Bi + Ai Ci + Bi Ci
     = Ai Bi + Ci (Ai + Bi)
     = Ai Bi + Ci (Ai xor Bi)
     = Gi + Ci Pi
Carry-lookahead Logic
Reexpress the carry logic as follows:
C1 = G0 + P0 C0
C2 = G1 + P1 C1 = G1 + P1 G0 + P1 P0 C0
C3 = G2 + P2 C2 = G2 + P2 G1 + P2 P1 G0 + P2 P1 P0 C0
C4 = G3 + P3 C3 = G3 + P3 G2 + P3 P2 G1 + P3 P2 P1 G0 + P3 P2 P1 P0 C0
Each of the carry equations can be implemented in a two-level logic network
Variables are the adder inputs and carry in to stage 0!
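The expanded carry equations can be exercised with a short sketch. Each carry is formed directly from the G and P signals and C0, without waiting for the previous carry. The function names `cla_carries` and `cla_add` are illustrative assumptions, and the loops stand in for what would be parallel AND-OR gates in hardware.

```python
def cla_carries(g, p, c0):
    """Return [C1, ..., Cn], each formed directly from G, P, and C0."""
    carries = []
    for i in range(1, len(g) + 1):
        # Product term P(i-1) ... P0 C0
        term = c0
        for k in range(i):
            term &= p[k]
        c = term
        # Product terms P(i-1) ... P(j+1) Gj for j = 0 .. i-1
        for j in range(i):
            term = g[j]
            for k in range(j + 1, i):
                term &= p[k]
            c |= term
        carries.append(c)
    return carries


def cla_add(a, b, c0=0, n=4):
    """Add two n-bit unsigned values; return (sum, carry_out)."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    g = [x & y for x, y in zip(a_bits, b_bits)]    # Gi = Ai Bi
    p = [x ^ y for x, y in zip(a_bits, b_bits)]    # Pi = Ai xor Bi
    c = [c0] + cla_carries(g, p, c0)
    s = sum((p[i] ^ c[i]) << i for i in range(n))  # Si = Pi xor Ci
    return s, c[n]


print(cla_add(0b0111, 0b0110))  # 7 + 6 -> (13, 0)
```

An exhaustive comparison against ordinary integer addition for all 4-bit inputs is a quick way to confirm that the expanded equations match the ripple-carry result.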
Carry-lookahead
Implementation
Adder with propagate and generate outputs
Pi @ 1 gate delay
Gi @ 1 gate delay
Ci to Si @ 2 gate delays
[Figure, panels (a)-(d): an adder stage with inputs Ai, Bi, Ci and outputs Pi, Gi, Si, and the AND-OR networks that form C1, C2, C3, and C4 from P0-P3, G0-G3, and C0.]
Increasingly complex logic: higher fan-in for more complex logic
Pi and Gi are obtained in 1 gate delay (fig. a).
Ci needs 2 more gate delays (fig. b), for a total of 3 gate delays for ci.
Si needs one more gate delay, so the sum bits take four gate delays.
Carry-lookahead Logic
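Cascaded carry-lookahead 4-bit adder
Carry-lookahead logic generates the individual carries, so the sums are computed much faster.
[Figure: four adder stages with inputs A0-A3, B0-B3, and C0. Signal arrival times in gate delays: S0 @2; C1 @3, S1 @4; C2 @3, S2 @4; C3 @3, S3 @4; C4 @3.]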
Carry-lookahead Logic
Extension
Figure 6.5. 16-bit carry-lookahead adder built from 4-bit adders (see Figure 6.4b).
[Figure: four 4-bit adders take x15-12/y15-12, x11-8/y11-8, x7-4/y7-4, and x3-0/y3-0 and produce s15-12, s11-8, s7-4, and s3-0. Each block k sends its group signals GkI and PkI to the carry-lookahead logic, which supplies c4, c8, c12, and c16 and also produces G0II and P0II.]
c4 is available after 3 gate delays, c8 after 2 more gate delays, c12 after 2 more, and c16 after 2 more. The sum needs 1 more gate delay, for a total of 10 delays compared to 32 for the RCA.
Carry-lookahead Logic
4-bit adders with internal carry lookahead
A second-level carry-lookahead unit extends lookahead to 16 bits
Group propagate: P = P3 P2 P1 P0
Group generate: G = G3 + G2 P3 + G1 P3 P2 + G0 P3 P2 P1
[Figure: four 4-bit adders on A[15-12]/B[15-12], A[11-8]/B[11-8], A[7-4]/B[7-4], and A[3-0]/B[3-0] produce S[15-12], S[11-8], S[7-4], and S[3-0] along with group P and G outputs. A lookahead carry unit takes P0 G0 ... P3 G3 and C0, returns the block carries C4, C8, C12, and C16, and produces P3-0 and G3-0. Signal arrival times are annotated in gate delays, from @0 at the inputs to @8 at the last sum bits.]
Unsigned Multiplication
Manual Multiplication
Algorithm
            1 1 0 1    (13)  Multiplicand M
          x 1 0 1 1    (11)  Multiplier Q
          ---------
            1 1 0 1
          1 1 0 1
        0 0 0 0
      1 1 0 1
    ---------------
    1 0 0 0 1 1 1 1    (143) Product P
(a) Manual multiplication algorithm
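Array Multiplication
[Figure: 4 x 4 array multiplier. The multiplicand bits m3 m2 m1 m0 pass through four rows of cells, one row per multiplier bit q0 to q3. Partial product PP0 = 0 0 0 0 enters the first row; the rows produce PP1, PP2, PP3, and PP4 = p7, p6, ..., p0 = Product. Typical cell: qi and mj are ANDed and fed to a full adder (FA) together with the bit of incoming partial product (PPi) and the carry-in; the cell outputs the bit of outgoing partial product [PP(i+1)] and the carry-out.]
(b) Array implementation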
Another Version of 4 x 4 Array
Multiplier
Array Multiplication
What is the critical path (worst case signal
propagation delay path)?
Assuming that there are two gate delays from
the inputs to the outputs of a full adder block,
the path has a total of 6(n-1)-1 gate delays,
including the initial AND gate delay in all
cells, for the n x n array.
Any advantages/disadvantages?
Sequential Circuit Binary
Multiplier
(a) Register configuration
[Figure: register A (initially 0) with carry flip-flop C, multiplier register Q, and multiplicand register M. A MUX selects either M or 0 as the second input of the n-bit adder under the Add/Noadd control. A control sequencer examines q0 and issues the add and shift-right signals; C, A, and Q shift right together.]

(b) Multiplication example (M = 1101, Q = 1011)

                           C   A      Q
  Initial configuration    0   0000   1011
  First cycle    Add       0   1101   1011
                 Shift     0   0110   1101
  Second cycle   Add       1   0011   1101
                 Shift     0   1001   1110
  Third cycle    No add    0   1001   1110
                 Shift     0   0100   1111
  Fourth cycle   Add       1   0001   1111
                 Shift     0   1000   1111

  Product (A, Q) = 1000 1111
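The add-and-shift cycle of the sequential multiplier can be simulated with a small sketch. It models registers C, A, and Q as Python integers; the function name `sequential_multiply` and the returned trace format are illustrative assumptions.

```python
def sequential_multiply(m, q, n):
    """Unsigned n-bit multiply with registers C, A, Q.

    Returns (product, trace), where trace holds (A, Q) after each shift.
    """
    mask = (1 << n) - 1
    c, a = 0, 0
    trace = []
    for _ in range(n):
        if q & 1:                          # q0 = 1: add M to A
            total = a + m
            c, a = total >> n, total & mask
        # Shift C, A, Q right one position as a single long register.
        q = ((a & 1) << (n - 1)) | (q >> 1)
        a = (c << (n - 1)) | (a >> 1)
        c = 0
        trace.append((a, q))
    return (a << n) | q, trace


product, trace = sequential_multiply(0b1101, 0b1011, 4)
print(product)
for a, q in trace:
    print(format(a, "04b"), format(q, "04b"))
```

For M = 1101 and Q = 1011 the trace reproduces the register contents after each shift in the example, and the final A, Q pair holds the 8-bit product 10001111 (143).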
Signed Multiplication
Signed Multiplication
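Considering 2's-complement signed operands, what will happen to (-13) x (+11) if we follow the same method as unsigned multiplication?

            1 0 0 1 1    (-13)
          x 0 1 0 1 1    (+11)
  -------------------
  1 1 1 1 1 1 0 0 1 1
  1 1 1 1 1 0 0 1 1
  0 0 0 0 0 0 0 0
  1 1 1 0 0 1 1
  0 0 0 0 0 0
  -------------------
  1 1 0 1 1 1 0 0 0 1    (-143)

The leading bits of each summand are the sign extension of the multiplicand (shown in blue on the original slide).
Figure 6.8. Sign extension of negative multiplicand.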
Signed Multiplication
For a negative multiplier, a straightforward solution is to form the 2's-complement of both the multiplier and the multiplicand and proceed as in the case of a positive multiplier.
This is possible because complementation of both operands does not change the value or the sign of the product.
A technique that works equally well for both negative and positive multipliers: the Booth algorithm.
Booth Algorithm
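Consider a multiplication in which the multiplier is the positive number 0011110. How many appropriately shifted versions of the multiplicand are added in the standard procedure?

  Multiplicand:  0  1  0  1  1  0  1
  Multiplier:    0  0 +1 +1 +1 +1  0

  Summands, one per multiplier bit from the LSB up, each shifted one more position to the left:
    0000000
    0101101
    0101101
    0101101
    0101101
    0000000
    0000000
  Product: 00010101000110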
Booth Algorithm
Since 0011110 = 0100000 - 0000010 (that is, 2^5 - 2^1), what will happen if we use the expression on the right?

  Multiplicand:        0  1  0  1  1  0  1
  Recoded multiplier:  0 +1  0  0  0 -1  0

  Summands, sign-extended to 14 bits:
    00000000000000
    11111110100110    2's complement of the multiplicand, shifted left 1
    00000000000000
    00000000000000
    00000000000000
    00010110100000    multiplicand, shifted left 5
    00000000000000
  Product: 00010101000110
Booth Algorithm
In general, in the Booth scheme, -1 times the shifted multiplicand is selected when moving from 0 to 1, and +1 times the shifted multiplicand is selected when moving from 1 to 0, as the multiplier is scanned from right to left.

  Multiplier:  0  0  1  0  1  1  0  0  1  1  1  0  1  0  1  1  0  0
  Recoded:     0 +1 -1 +1  0 -1  0 +1  0  0 -1 +1 -1 +1  0 -1  0  0
Figure 6.10. Booth recoding of a multiplier.
Booth Algorithm
            0 1 1 0 1    (+13)
          x 1 1 0 1 0    (-6)      recoded as  0 -1 +1 -1  0
  -------------------
  0 0 0 0 0 0 0 0 0 0
  1 1 1 1 1 0 0 1 1
  0 0 0 0 1 1 0 1
  1 1 1 0 0 1 1
  0 0 0 0 0 0
  -------------------
  1 1 1 0 1 1 0 0 1 0    (-78)
Figure 6.11. Booth multiplication with a negative multiplier.
Booth Algorithm
  Multiplier            Version of multiplicand
  Bit i    Bit i-1      selected by bit i
    0        0             0 x M
    0        1            +1 x M
    1        0            -1 x M
    1        1             0 x M
Figure 6.12. Booth multiplier recoding table.
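The recoding table translates into a few lines of code. The sketch below is illustrative: the function names are assumptions, digits are returned least significant first, and Python integers stand in for the sign-extended hardware summands.

```python
def booth_recode(q, n):
    """Booth digits (LSB first) of the n-bit 2's-complement multiplier q."""
    digits = []
    prev = 0  # implied 0 to the right of the LSB
    for i in range(n):
        bit = (q >> i) & 1
        # (bit i, bit i-1): 00 -> 0, 01 -> +1, 10 -> -1, 11 -> 0
        digits.append(prev - bit)
        prev = bit
    return digits


def booth_multiply(m, q, n):
    """Multiply signed m by the n-bit signed multiplier q."""
    digits = booth_recode(q & ((1 << n) - 1), n)
    # Each nonzero digit selects +M or -M, shifted to its bit position.
    return sum(d * (m << i) for i, d in enumerate(digits))


print(booth_recode(0b11010, 5))   # multiplier -6 in 5 bits, LSB first
print(booth_multiply(13, -6, 5))  # -78
```

Because the recoded digits sum to the signed value of the multiplier, the same routine handles positive and negative multipliers without a separate correction step.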
Booth Algorithm
Best case: a long string of 1s (skipping over 1s)
Worst case: 0s and 1s are alternating
[Figure: Booth recoding of three 16-bit multipliers. The worst-case multiplier alternates 0s and 1s, so every recoded digit is +1 or -1. The ordinary multiplier recodes to a mix of 0, +1, and -1 digits. The good multiplier has long runs of 1s and 0s, so most recoded digits are 0.]
Booth Algorithm - Advantages
Handles both positive and negative
multipliers uniformly
Efficient if large blocks of ones exist
On the average, speed same as that of
normal algorithm
Fast Multiplication
Bit-Pair Recoding of
Multipliers
Bit-pair recoding halves the maximum number of
summands (versions of the multiplicand).
  Multiplier (sign extension, then 11010, then implied 0 to right of LSB):
                        1   1  1  0  1  0   0
  Booth recoding:       0   0 -1 +1 -1  0
  Bit-pair recoding:      0     -1     -2
(a) Example of bit-pair recoding derived from Booth recoding
Bit-Pair Recoding of
Multipliers
  Multiplier bit-pair    Multiplier bit on the right    Multiplicand selected
   i+1      i                     i-1                   at position i
    0       0                      0                       0 x M
    0       0                      1                      +1 x M
    0       1                      0                      +1 x M
    0       1                      1                      +2 x M
    1       0                      0                      -2 x M
    1       0                      1                      -1 x M
    1       1                      0                      -1 x M
    1       1                      1                       0 x M
(b) Table of multiplicand selection decisions
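The selection table is equivalent to computing the digit -2 x b(i+1) + b(i) + b(i-1) for each bit pair. The sketch below relies on that identity; the function names are illustrative, digits are returned least significant first, and the multiplier is passed as a signed Python integer so that the right shift supplies the sign extension.

```python
def bit_pair_recode(q, n):
    """Radix-4 digits (least significant first) of the n-bit signed multiplier q."""
    digits = []
    prev = 0  # implied 0 to the right of the LSB
    for i in range(0, n, 2):
        b_i = (q >> i) & 1
        b_i1 = (q >> (i + 1)) & 1  # Python's >> sign-extends negative ints
        digits.append(-2 * b_i1 + b_i + prev)
        prev = b_i1
    return digits


def bit_pair_multiply(m, q, n):
    """Multiply m by q using about n/2 summands."""
    return sum(d * m * 4 ** k for k, d in enumerate(bit_pair_recode(q, n)))


print(bit_pair_recode(-6, 5))        # digits for multiplier 11010, LSB pair first
print(bit_pair_multiply(13, -6, 5))  # -78
```

For the multiplier -6 the digits read 0, -1, -2 from the most significant pair down, so only two nonzero summands are needed.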
Bit-Pair Recoding of
Multipliers
            0 1 1 0 1    (+13)
          x 1 1 0 1 0    (-6)      Booth recoding:     0 -1 +1 -1  0
                                   Bit-pair recoding:    0   -1   -2
  -------------------
  1 1 1 1 1 0 0 1 1 0
  1 1 1 1 0 0 1 1
  0 0 0 0 0 0
  -------------------
  1 1 1 0 1 1 0 0 1 0    (-78)
Figure 6.15. Multiplication requiring only n/2 summands.
Carry-Save Addition of
Summands
            1 1 0 1    (13)  Multiplicand M
          x 1 0 1 1    (11)  Multiplier Q
          ---------
            1 1 0 1
          1 1 0 1
        0 0 0 0
      1 1 0 1
    ---------------
    1 0 0 0 1 1 1 1    (143) Product P
(a) Manual multiplication algorithm
Carry-Save Addition of
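Summands
[Figure: the 4 x 4 array multiplier again. Multiplicand bits m3 m2 m1 m0 and multiplier bits q0 to q3 drive rows of cells. Partial product PP0 = 0 0 0 0 enters the first row; the rows produce PP1, PP2, PP3, and PP4 = p7, p6, ..., p0 = Product. Typical cell: qi and mj are ANDed and fed to a full adder (FA) with the bit of incoming partial product (PPi) and the carry-in; the outputs are the bit of outgoing partial product [PP(i+1)] and the carry-out.]
(b) Array implementation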
Carry-Save Addition of
Summands
CSA speeds up the addition process.
[Figure: three rows of four full adders (FA). The inputs are the bit products m3q0 ... m0q0, m3q1 ... m0q1, m3q2 ... m0q2, and m3q3 ... m0q3, with 0 on the unused inputs. Carries ripple along each row, and the outputs are p7 ... p0.]
(a) Ripple-carry array (Figure 6.6 structure)
Carry-Save Addition of
Summands
[Figure: three rows of four full adders (FA) with the same bit-product inputs m3q0 ... m0q3. Carries are passed down to the next row instead of rippling along the row, and the last row ripples the remaining carries to produce p7 ... p0.]
(b) Carry-save array
Figure 6.16. Ripple-carry and carry-save arrays for the multiplication operation M x Q = P for 4-bit operands.
Carry-Save Addition of
Summands
The delay through the carry-save array is somewhat less than the delay through the ripple-carry array. This is because the S and C vector outputs from each row are produced in parallel in one full-adder delay.
Consider the addition of many summands. We can:
  Group the summands in threes and perform carry-save addition on each of these groups in parallel to generate a set of S and C vectors in one full-adder delay
  Group all of the S and C vectors into threes, and perform carry-save addition on them, generating a further set of S and C vectors in one more full-adder delay
  Continue with this process until there are only two vectors remaining
  They can be added in a RCA or CLA to produce the desired product
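The grouping procedure above can be sketched on non-negative integers, where each integer stands for one bit vector. The function names are illustrative assumptions, and the final `sum` stands in for the RCA or CLA that adds the last two vectors.

```python
def carry_save_add(a, b, c):
    """Reduce three vectors to a sum vector S and a carry vector C."""
    s = a ^ b ^ c                                  # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1     # carries move one position left
    return s, carry


def csa_tree_sum(summands):
    """Add non-negative summands with levels of carry-save adders.

    Returns (total, number of CSA levels used).
    """
    vectors = list(summands)
    levels = 0
    while len(vectors) > 2:
        grouped = len(vectors) - len(vectors) % 3
        nxt = []
        for i in range(0, grouped, 3):             # groups of three, in parallel
            nxt.extend(carry_save_add(*vectors[i:i + 3]))
        nxt.extend(vectors[grouped:])              # leftovers wait for the next level
        vectors = nxt
        levels += 1
    return sum(vectors), levels                    # final carry-propagate addition


summands = [45 << i for i in range(6)]             # 45 x 63: six shifted copies of M
print(csa_tree_sum(summands))                      # (2835, 3)
```

Six summands need three CSA levels before the final two-operand addition, which matches the reduction pattern 6 -> 4 -> 3 -> 2.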
Carry-Save Addition of
Summands
                1 0 1 1 0 1    (45)     M
              x 1 1 1 1 1 1    (63)     Q
              -------------
                1 0 1 1 0 1    A
              1 0 1 1 0 1      B
            1 0 1 1 0 1        C
          1 0 1 1 0 1          D
        1 0 1 1 0 1            E
      1 0 1 1 0 1              F
    -----------------------
    1 0 1 1 0 0 0 1 0 0 1 1    (2,835)  Product
Figure 6.17. A multiplication example used to illustrate carry-save addition as shown in Figure 6.18.
[Figure: M x Q with M = 101101 and Q = 111111. Summands A, B, C are reduced to S1 and C1, and summands D, E, F to S2 and C2. S1, C1, and S2 are then reduced to S3 and C3; S3, C3, and C2 are reduced to S4 and C4; the final addition of S4 and C4 gives the Product.]
Figure 6.18. The multiplication example from Figure 6.17 performed using carry-save addition.
Carry-Save Addition of
Summands
  Level 1 CSA:     A, B, C -> S1, C1        D, E, F -> S2, C2
  Level 2 CSA:     S1, C1, S2 -> S3, C3
  Level 3 CSA:     S3, C3, C2 -> S4, C4
  Final addition:  S4 + C4 -> Product
Figure 6.19. Schematic representation of the carry-save addition operations in Figure 6.18.
Carry-Save Addition of
Summands
When the number of summands is large, the
time saved is proportionally much greater.
Some omitted issues:
  Sign-extension
  Computation width of the final CLA/RCA
  Bit-pair recoding
Integer Division
Manual Division
         21                     10101
  13 ) 274          1101 ) 100010010
       26                   1101
       --                   ----
        14                    10000
        13                     1101
        --                    -----
         1                     1110
                               1101
                               ----
                                  1
Figure 6.20. Longhand division examples.
Longhand Division Steps
Position the divisor appropriately with respect to the dividend and perform a subtraction.
If the remainder is zero or positive, a quotient bit of 1 is determined, the remainder is extended by another bit of the dividend, the divisor is repositioned, and another subtraction is performed.
If the remainder is negative, a quotient bit of 0 is determined, the dividend is restored by adding back the divisor, and the divisor is repositioned for another subtraction.
Circuit Arrangement
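[Figure: register A (an ... a0, n+1 bits) and dividend register Q (qn-1 ... q0) shift left together. Divisor register M (a leading 0 followed by mn-1 ... m0) feeds an (n+1)-bit adder with A. A control sequencer issues the Add/Subtract and shift signals and sets the quotient bit q0.]
Figure 6.21. Circuit arrangement for binary division.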
Restoring Division
Shift A and Q left one binary position
Subtract M from A, and place the answer back in A
If the sign of A is 1, set q0 to 0 and add M back to A (restore A); otherwise, set q0 to 1
Repeat these steps n times
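The restoring-division steps map directly onto a loop. This is a sketch under stated assumptions: unsigned operands, a nonzero divisor, and Python integers in place of the (n+1)-bit A register, so that "the sign of A is 1" becomes `a < 0`. The function name is illustrative.

```python
def restoring_divide(dividend, divisor, n):
    """Divide two unsigned n-bit values; return (quotient, remainder)."""
    a, q, m = 0, dividend, divisor
    mask = (1 << n) - 1
    for _ in range(n):
        # Shift A and Q left one position; the MSB of Q moves into A.
        a = (a << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & mask
        a -= m               # subtract M from A
        if a < 0:
            a += m           # restore A; q0 stays 0
        else:
            q |= 1           # set q0 to 1
    return q, a


print(restoring_divide(0b1000, 0b11, 4))  # 8 / 3 -> (2, 2)
```

After n cycles Q holds the quotient and A holds the remainder.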
Examples
Longhand: 1000 divided by 11 gives quotient 10 and remainder 10.

M = 00011
                               A        Q
  Initially                    00000    1000
  First cycle    Shift         00001    000_
                 Subtract      11110
                 Set q0                 0000
                 Restore       00001
  Second cycle   Shift         00010    000_
                 Subtract      11111
                 Set q0                 0000
                 Restore       00010
  Third cycle    Shift         00100    000_
                 Subtract      00001
                 Set q0                 0001
  Fourth cycle   Shift         00010    001_
                 Subtract      11111
                 Set q0                 0010
                 Restore       00010

  Remainder = 00010, Quotient = 0010
Figure 6.22. A restoring-division example.
Nonrestoring Division
Avoid the need for restoring A after an unsuccessful subtraction.
Any idea?
If A is positive: shift left and subtract, giving 2A - M.
If A is negative: restore, shift, and subtract give A + M, then 2(A + M), then 2(A + M) - M = 2A + M. So shifting left and adding M gives the same result without the restore.
Step 1 (repeat n times):
  If the sign of A is 0, shift A and Q left one bit position and subtract M from A; otherwise, shift A and Q left and add M to A.
  Now, if the sign of A is 0, set q0 to 1; otherwise, set q0 to 0.
Step 2: If the sign of A is 1, add M to A.
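The two steps of nonrestoring division can be sketched the same way as the restoring version. The assumptions are unsigned operands, a nonzero divisor, and a Python integer for A so that its sign can be tested directly; the function name is illustrative.

```python
def nonrestoring_divide(dividend, divisor, n):
    """Divide two unsigned n-bit values; return (quotient, remainder)."""
    a, q, m = 0, dividend, divisor
    mask = (1 << n) - 1
    for _ in range(n):                     # Step 1, repeated n times
        msb = (q >> (n - 1)) & 1
        shifted = (a << 1) | msb           # shift A and Q left one position
        a = shifted - m if a >= 0 else shifted + m
        q = (q << 1) & mask
        if a >= 0:
            q |= 1                         # sign of A is 0: set q0 to 1
    if a < 0:                              # Step 2: final remainder correction
        a += m
    return q, a


print(nonrestoring_divide(0b1000, 0b11, 4))  # 8 / 3 -> (2, 2)
```

Only one addition or subtraction is performed per cycle, plus at most one correction at the end, instead of a possible restore in every cycle.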
Examples
M = 00011
                               A        Q
  Initially                    00000    1000
  First cycle    Shift         00001    000_
                 Subtract      11110
                 Set q0                 0000
  Second cycle   Shift         11100    000_
                 Add           11111
                 Set q0                 0000
  Third cycle    Shift         11110    000_
                 Add           00001
                 Set q0                 0001
  Fourth cycle   Shift         00010    001_
                 Subtract      11111
                 Set q0                 0010
  Restore remainder: Add       00010

  Remainder = 00010, Quotient = 0010
Figure 6.23. A nonrestoring-division example.
Floating-Point Numbers
and Operations
Floating-Point Numbers
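So far we have dealt with fixed-point numbers (what are they?), and have considered them as integers.
Alternatively, the binary point can be placed just to the right of the sign bit, so that an n-bit 2's-complement number represents a fraction:
$B = b_0 . b_{-1} b_{-2} \cdots b_{-(n-1)}$
$F(B) = -b_0 \times 2^0 + b_{-1} \times 2^{-1} + b_{-2} \times 2^{-2} + \cdots + b_{-(n-1)} \times 2^{-(n-1)}$
where the range of F is:
$-1 \le F \le 1 - 2^{-(n-1)}$
Floating-point numbers: the position of the binary point is variable and is automatically adjusted as computation proceeds.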
Floating-Point Numbers
What is needed to represent a floating-point decimal number?
Sign
Mantissa (the significant digits)
Exponent to an implied base (scale factor)
Normalized: the decimal point is placed to the right of the first (nonzero) significant digit.
IEEE Standard for Floating-
Point Numbers
Think about this number (all digits are decimal):
$\pm X_1.X_2X_3X_4X_5X_6X_7 \times 10^{\pm Y_1Y_2}$
It is possible to approximate this mantissa precision and scale factor range in a binary representation that occupies 32 bits: a 24-bit mantissa (1 sign bit for a signed number) and an 8-bit exponent.
Instead of the signed exponent, E, the value actually stored in the exponent field is an unsigned integer E' = E + 127, the so-called excess-127 format.
IEEE Standard
(a) Single precision (32 bits)
  S    sign of number: 0 signifies +, 1 signifies -
  E'   8-bit signed exponent in excess-127 representation
  M    23-bit mantissa fraction
  Value represented = $\pm 1.M \times 2^{E'-127}$

(b) Example of a single-precision number
  S = 0, E' = 00101000, M = 001010...0
  Value represented = $+1.001010 \ldots 0 \times 2^{-87}$
  (00101000)2 = (40)10, and 40 - 127 = -87

(c) Double precision (64 bits)
  S    sign
  E'   11-bit excess-1023 exponent
  M    52-bit mantissa fraction
  Value represented = $\pm 1.M \times 2^{E'-1023}$

Figure 6.24. IEEE standard floating-point formats.
IEEE Standard
For excess-127 format, 0 <= E' <= 255.
However, 0 and 255 are used to represent special values. So actually 1 <= E' <= 254. That means -126 <= E <= 127.
Single precision uses 32 bits. The scale factor ranges from 2^-126 to 2^+127.
Double precision uses 64 bits. The scale factor ranges from 2^-1022 to 2^+1023.
Two Aspects
If a number is not normalized, it can always be put in normalized form by shifting the fraction and adjusting the exponent.

(a) Unnormalized value
  S = 0, excess-127 exponent = 10001000, fraction = 0010110...
  (There is no implicit 1 to the left of the binary point.)
  Value represented = $+0.0010110 \ldots \times 2^{9}$
  (10001000)2 = (136)10, and 136 - 127 = 9

(b) Normalized version
  S = 0, excess-127 exponent = 10000101, fraction = 0110...
  Value represented = $+1.0110 \ldots \times 2^{6}$
  6 + 127 = 133, and (133)10 = (10000101)2

Figure 6.25. Floating-point normalization in IEEE single-precision format.
Two Aspects
As computations proceed, a number that
does not fall in the representable range of
normal numbers might be generated.
It requires an exponent less than -126
(underflow) or greater than +127 (overflow).
Both are exceptions that need to be
considered.
Special Values
The end values 0 and 255 of the exponent field E' are used to represent special values.
When E' = 0 and M = 0, the value exact 0 is represented. (±0)
When E' = 255 and M = 0, the value infinity is represented. (±∞)
When E' = 0 and M ≠ 0, denormal numbers are represented. The value is ±0.M x 2^-126.
When E' = 255 and M ≠ 0, the value is Not a Number (NaN).
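The single-precision rules, including the special values, can be collected in one decoding sketch. The function names are illustrative assumptions; `struct` from the Python standard library is used only to obtain the bit pattern of an existing float for comparison.

```python
import struct


def decode_single(bits):
    """Decode a 32-bit pattern using the excess-127 rules and special values."""
    s = bits >> 31
    e = (bits >> 23) & 0xFF          # stored exponent E'
    m = bits & 0x7FFFFF              # 23-bit mantissa fraction M
    sign = -1.0 if s else 1.0
    if e == 255:
        return sign * float("inf") if m == 0 else float("nan")
    if e == 0:
        # Exact zero when M = 0, otherwise a denormal: 0.M x 2^-126
        return sign * (m / 2 ** 23) * 2.0 ** -126
    return sign * (1 + m / 2 ** 23) * 2.0 ** (e - 127)   # 1.M x 2^(E'-127)


def float_to_bits(x):
    """Bit pattern of x when stored as an IEEE single-precision number."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


print(decode_single(float_to_bits(-6.5)))
print(decode_single((40 << 23) | (0b001010 << 17)))  # E' = 40, M = 001010...0
```

The second call decodes a pattern with E' = 40 and M = 001010...0, which represents 1.001010 (binary) x 2^-87.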
Exceptions
A processor must set exception flags if any of
the following occur in performing operations:
underflow, overflow, divide by zero, inexact
(requires rounding), invalid (0/0).
When an exception occurs, the results are set to special values.
Arithmetic Operations on
Floating-Point Numbers
Add/Subtract rule
  Choose the number with the smaller exponent and shift its mantissa right a number of steps equal to the difference in exponents.
  Set the exponent of the result equal to the larger exponent.
  Perform addition/subtraction on the mantissas and determine the sign of the result.
  Normalize the resulting value, if necessary.
Multiply rule
  Add the exponents and subtract 127.
  Multiply the mantissas and determine the sign of the result.
  Normalize the resulting value, if necessary.
Divide rule
  Subtract the exponents and add 127.
  Divide the mantissas and determine the sign of the result.
  Normalize the resulting value, if necessary.
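The add and multiply rules can be illustrated with a deliberately simplified model. The assumptions are loud ones: only positive, normalized operands are handled; each number is a pair (E', M24) where M24 is the 24-bit mantissa including the leading 1; extra bits are chopped; and overflow, underflow, and special values are ignored. All names are illustrative.

```python
HIDDEN = 1 << 23  # weight of the leading 1 in a 24-bit mantissa


def normalize(e, m):
    """Shift the mantissa until its leading 1 is at bit 23, adjusting E'."""
    while m >= (1 << 24):
        m >>= 1
        e += 1
    while m and m < HIDDEN:
        m <<= 1
        e -= 1
    return e, m


def fp_add(a, b):
    """Add rule: align to the larger exponent, add mantissas, normalize."""
    (ea, ma), (eb, mb) = a, b
    if ea < eb:
        (ea, ma), (eb, mb) = (eb, mb), (ea, ma)
    mb >>= ea - eb                     # shift the smaller operand right
    return normalize(ea, ma + mb)


def fp_mul(a, b):
    """Multiply rule: add exponents, subtract 127, multiply mantissas."""
    (ea, ma), (eb, mb) = a, b
    return normalize(ea + eb - 127, (ma * mb) >> 23)


def value(x):
    e, m = x
    return m / HIDDEN * 2.0 ** (e - 127)


one_and_half = (127, 0xC00000)   # 1.1 (binary) x 2^0
quarter = (125, 0x800000)        # 1.0 (binary) x 2^-2
print(value(fp_add(one_and_half, quarter)))       # 1.75
print(value(fp_mul(one_and_half, one_and_half)))  # 2.25
```

The subtraction of 127 in `fp_mul` removes the bias that would otherwise be counted twice when the two stored exponents are added.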
Guard Bits and Truncation
During the intermediate steps, it is important
to retain extra bits, often called guard bits, to
yield the maximum accuracy in the final
results.
Removing the guard bits in generating a final result requires truncation of the extended mantissa. How?
Guard Bits and Truncation
Chopping: biased, error 0 to 1 at LSB.
  0.b-1b-2b-3000 through 0.b-1b-2b-3111 are all truncated to 0.b-1b-2b-3
Von Neumann rounding (if any of the bits to be removed are 1, the LSB of the retained bits is set to 1): unbiased, error -1 to +1 at LSB.
  All 6-bit fractions with b-4b-5b-6 not equal to 000 are truncated to 0.b-1b-21
Why is unbiased rounding better when many operands are involved?
Rounding (a 1 is added to the LSB position of the bits to be retained if there is a 1 in the MSB position of the bits being removed): unbiased, error -1/2 to +1/2 at LSB.
  Round to the nearest number, or to the nearest even number in case of a tie
  (0.b-1b-20100 -> 0.b-1b-20, 0.b-1b-21100 -> 0.b-1b-21 + 0.001)
  Best accuracy
  Most difficult to implement
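The three truncation methods can be compared on small integers that stand for extended mantissas. This is an illustrative sketch: `x` is a non-negative integer holding the retained bits followed by `drop` guard bits (drop >= 1), and the function names are assumptions.

```python
def chop(x, drop):
    """Chopping: simply discard the guard bits."""
    return x >> drop


def von_neumann(x, drop):
    """Set the retained LSB to 1 if any removed bit is 1."""
    kept = x >> drop
    removed = x & ((1 << drop) - 1)
    return kept | 1 if removed else kept


def round_nearest_even(x, drop):
    """Round to nearest; on a tie, round to the even retained value."""
    kept = x >> drop
    removed = x & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    if removed > half or (removed == half and kept & 1):
        kept += 1
    return kept


# 6-bit fractions reduced to 3 bits (drop = 3).
for x in (0b100011, 0b100100, 0b100101, 0b101100):
    print(format(x, "06b"),
          format(chop(x, 3), "03b"),
          format(von_neumann(x, 3), "03b"),
          format(round_nearest_even(x, 3), "03b"))
```

Note that rounding up can carry out of the retained bits (for example 111100 becomes 1000), in which case the mantissa has to be renormalized.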
Implementing Floating Point
Operations
Hardware/software
In most general-purpose processors, floating-
point operations are available at the machine-
instruction level, implemented in hardware.
In high-performance processors, a significant
portion of the chip area is assigned to
floating-point operations.
Addition/subtraction circuitry
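[Figure: floating-point addition-subtraction unit for 32-bit operands A: SA, E'A, MA and B: SB, E'B, MB. An 8-bit subtractor forms n = E'A - E'B and its sign. A SWAP network routes the M of the number with smaller E' to a SHIFTER, which shifts it n bits to the right, and the M of the number with larger E' directly to the mantissa adder/subtractor. A combinational CONTROL network takes SA, SB, and Add/Sub and produces the Add/Subtract and Sign signals. A MUX selects the larger exponent E'. The magnitude M from the adder/subtractor goes to a leading zeros detector that produces X, and to the normalize and round stage. A second 8-bit subtractor forms E' - X. The outputs form the 32-bit result R: SR, E'R, MR.]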