2.2ArithmeticFull.pdf

7/28/2019 2.2ArithmeticFull.pdf


Chapter 6. Arithmetic


Outline

A basic operation in all digital computers is the addition or subtraction of two numbers.

ALU: AND, OR, NOT, XOR

Unsigned/signed numbers

Addition/subtraction

Multiplication

Division

Floating-point number operations



Adders


Addition of Unsigned Numbers

Half Adder

(a) The four possible cases (x + y gives carry c, sum s)

0 + 0 = 0 0
0 + 1 = 0 1
1 + 0 = 0 1
1 + 1 = 1 0

(b) Truth table

x  y | Carry c  Sum s
0  0 |    0       0
0  1 |    0       1
1  0 |    0       1
1  1 |    1       0

(c) Circuit and (d) graphical symbol: half adder (HA) with inputs x, y and outputs s, c (figures not recovered).
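The truth table can be checked with a minimal Python sketch. It assumes the usual gate realization (XOR for the sum, AND for the carry); the function name `half_adder` is illustrative only.

```python
def half_adder(x: int, y: int) -> tuple[int, int]:
    """Return (sum, carry) for two 1-bit inputs."""
    s = x ^ y  # sum is 1 when exactly one input is 1
    c = x & y  # carry is 1 only when both inputs are 1
    return s, c


# Print the four possible cases.
for x in (0, 1):
    for y in (0, 1):
        s, c = half_adder(x, y)
        print(f"{x} + {y} -> carry {c}, sum {s}")
```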


Addition and Subtraction of Signed Numbers

xi  yi  Carry-in ci | Sum si  Carry-out ci+1
0   0   0           |   0       0
0   0   1           |   1       0
0   1   0           |   1       0
0   1   1           |   0       1
1   0   0           |   1       0
1   0   1           |   0       1
1   1   0           |   0       1
1   1   1           |   1       1

$s_i = \bar{x}_i \bar{y}_i c_i + \bar{x}_i y_i \bar{c}_i + x_i \bar{y}_i \bar{c}_i + x_i y_i c_i = x_i \oplus y_i \oplus c_i$

$c_{i+1} = y_i c_i + x_i c_i + x_i y_i$

Example:

  X     7      0 1 1 1
+ Y   + 6    + 0 1 1 0
  Z    13      1 1 0 1

Carry-in ci for stages 3 to 0: 1 1 0 0. Carry-out ci+1 for stages 3 to 0: 0 1 1 0.

Legend for stage i: inputs xi, yi and carry-in ci; outputs sum si and carry-out ci+1.

Figure 6.1. Logic specification for a stage of binary addition.
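As a quick check of the stage logic in Figure 6.1, the sketch below evaluates the two expressions for every input combination. The name `full_adder` is illustrative, not taken from the slides.

```python
def full_adder(x: int, y: int, c_in: int) -> tuple[int, int]:
    """One stage of binary addition: return (s_i, c_{i+1})."""
    s = x ^ y ^ c_in                           # s_i = x_i xor y_i xor c_i
    c_out = (y & c_in) | (x & c_in) | (x & y)  # c_{i+1} = y_i c_i + x_i c_i + x_i y_i
    return s, c_out


# Reproduce the truth table row by row.
for x in (0, 1):
    for y in (0, 1):
        for c in (0, 1):
            print(x, y, c, "->", full_adder(x, y, c))
```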


Addition and Subtraction of Signed Numbers

A full adder (FA)

(a) Logic for a single stage: gate network and block symbol of a full adder (FA) with inputs xi, yi, ci and outputs si, ci+1 (figure not recovered).


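Addition and Subtraction of Signed Numbers

n-bit ripple-carry adder

Overflow? Detected as cn xor cn-1.

Subtraction?

(b) n-bit ripple-carry adder: n full adders (FA) chained from the least significant bit (LSB) position (inputs x0, y0, c0; outputs s0, c1) to the most significant bit (MSB) position (inputs xn-1, yn-1, cn-1; outputs sn-1, cn) (figure not recovered).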
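The chain of full adders can be simulated in a few lines. This is a minimal sketch, assuming bit vectors are given LSB first; `ripple_carry_add` and its return layout are illustrative choices, and the overflow test is the 2's-complement check cn xor cn-1.

```python
def full_adder(x: int, y: int, c_in: int) -> tuple[int, int]:
    """Return (sum, carry-out) for one stage."""
    return x ^ y ^ c_in, (x & y) | (x & c_in) | (y & c_in)


def ripple_carry_add(x_bits, y_bits, c0=0):
    """Add two equal-length bit lists (LSB first).

    Returns (sum_bits, c_n, overflow), where overflow = c_n xor c_{n-1}
    is meaningful for 2's-complement operands.
    """
    carries = [c0]
    sum_bits = []
    for x, y in zip(x_bits, y_bits):
        s, c = full_adder(x, y, carries[-1])  # carry ripples to the next stage
        sum_bits.append(s)
        carries.append(c)
    overflow = carries[-1] ^ carries[-2]
    return sum_bits, carries[-1], overflow


# 7 + 6 in 4 bits: the sum pattern is 1101 and signed overflow is flagged.
print(ripple_carry_add([1, 1, 1, 0], [0, 1, 1, 0]))
```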


Addition and Subtraction of Signed Numbers

kn-bit ripple-carry adder

(c) Cascade of k n-bit adders: the first n-bit adder takes x0 ... xn-1, y0 ... yn-1 and c0 and produces s0 ... sn-1 and cn; the last adder takes x(k-1)n ... xkn-1 and y(k-1)n ... ykn-1 and produces s(k-1)n ... skn-1 and ckn (figure not recovered).

Figure 6.2. Logic for addition of binary vectors.


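Addition and Subtraction of Signed Numbers

Addition/subtraction logic unit

An n-bit adder with inputs xn-1 ... x1 x0 and yn-1 ... y1 y0, carry-in c0, outputs sn-1 ... s1 s0 and carry-out cn, and an Add/Sub control line (figure not recovered).

Figure 6.3. Binary addition-subtraction logic network.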
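A sketch of how such a unit behaves, under the assumption (the figure itself was not recovered) that the Add/Sub control line is XORed with every y bit and also drives c0, so that subtraction becomes x + (not y) + 1. The name `add_sub` and the integer bit-pattern interface are illustrative.

```python
def add_sub(x: int, y: int, sub: int, n: int = 4) -> tuple[int, int]:
    """n-bit add (sub=0) or subtract (sub=1) on bit patterns.

    Returns (n-bit result pattern, carry-out c_n).
    """
    mask = (1 << n) - 1
    y_in = (y & mask) ^ (mask if sub else 0)  # XOR gates: complement y when subtracting
    total = (x & mask) + y_in + sub           # control line also feeds c0
    return total & mask, (total >> n) & 1


print(add_sub(7, 6, 0))  # 7 + 6
print(add_sub(7, 6, 1))  # 7 - 6
print(add_sub(3, 5, 1))  # 3 - 5, result pattern is 2's-complement -2
```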



Make Addition Faster


Ripple-Carry Adder (RCA)

Straightforward design

Simple circuit structure

Easy to understand

Most power efficient

Slowest (critical path too long: 2n gate delays)


Adders

We can view addition in terms of generate, G[i], and propagate, P[i].


Carry-lookahead Logic

Carry generate: Gi = Ai Bi (must generate a carry when Ai = Bi = 1)

Carry propagate: Pi = Ai xor Bi (carry-in will equal carry-out here)

Sum and carry can be re-expressed in terms of generate/propagate/Ci:

Si = Ai xor Bi xor Ci = Pi xor Ci

Ci+1 = Ai Bi + Ai Ci + Bi Ci

= Ai Bi + Ci (Ai + Bi)

= Ai Bi + Ci (Ai xor Bi)   (the case Ai = Bi = 1 is already covered by Ai Bi)

= Gi + Ci Pi


Re-express the carry logic as follows:

C1 = G0 + P0 C0

C2 = G1 + P1 C1 = G1 + P1 G0 + P1 P0 C0

C3 = G2 + P2 C2 = G2 + P2 G1 + P2 P1 G0 + P2 P1 P0 C0

C4 = G3 + P3 C3 = G3 + P3 G2 + P3 P2 G1 + P3 P2 P1 G0 + P3 P2 P1 P0 C0

Each of the carry equations can be implemented in a two-level logic network.

The variables are the adder inputs and the carry-in to stage 0.
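The expanded carry equations can be written out directly, which makes it clear that every carry depends only on the adder inputs and C0. This is a sketch for a 4-bit adder with bits given LSB first; `cla_add` is an illustrative name.

```python
def cla_add(a_bits, b_bits, c0=0):
    """4-bit carry-lookahead addition (bit lists are LSB first).

    Returns (sum_bits, [C1, C2, C3, C4]).
    """
    g = [a & b for a, b in zip(a_bits, b_bits)]  # generate
    p = [a ^ b for a, b in zip(a_bits, b_bits)]  # propagate

    # Each carry is a two-level sum of products of G, P and C0 only.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))

    carries_in = [c0, c1, c2, c3]
    sum_bits = [p[i] ^ carries_in[i] for i in range(4)]  # Si = Pi xor Ci
    return sum_bits, [c1, c2, c3, c4]


print(cla_add([1, 1, 1, 0], [0, 1, 1, 0]))  # 7 + 6
```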


Carry-lookahead Implementation

Adder with Propagate and Generate Outputs

Inputs Ai, Bi, Ci: Pi @ 1 gate delay, Gi @ 1 gate delay, Si @ 2 gate delays.

Increasingly complex logic: higher fan-in for more complex logic.

Pi and Gi are obtained in 1 gate delay, fig. (a).

Ci needs 2 more gate delays, fig. (b). Total 3 gate delays for Ci.

Si needs one more gate delay. Four gate delays for the sum bits.

Figure panels (a) to (d): the adder stage with P and G outputs and the two-level networks producing C1, C2, C3 and C4 from P0 ... P3, G0 ... G3 and C0 (figures not recovered).


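Cascaded Carry Lookahead 4-bit adder

Carry lookahead logic generates individual carries; sums are computed much faster.

Signal timing in gate delays (inputs A0 ... A3, B0 ... B3, C0):

Stage 0: S0 @ 2
Stage 1: C1 @ 3, S1 @ 4
Stage 2: C2 @ 3, S2 @ 4
Stage 3: C3 @ 3, S3 @ 4
Carry-out: C4 @ 3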


Extension

Four 4-bit adders (inputs x15-12, y15-12 down to x3-0, y3-0; sums s15-12 down to s3-0; carries c0, c4, c8, c12, c16) pass their group signals G0I ... G3I and P0I ... P3I to a shared carry-lookahead logic block, which also produces G0II and P0II (figure not recovered).

Figure 6.5. 16-bit carry-lookahead adder built from 4-bit adders (see Figure 6.4b).

Delay count: c4 after 3 gate delays; c8 after 2 more gate delays; c12 after 2 more gate delays; c16 after 2 more gate delays; the sum needs 1 more gate delay. Total 10 delays, compared to 32 for the RCA.


4-bit adders with internal carry lookahead

A second-level carry lookahead unit extends lookahead to 16 bits.

Group Propagate P = P3 P2 P1 P0

Group Generate G = G3 + G2 P3 + G1 P3 P2 + G0 P3 P2 P1

Four 4-bit adders (A[15-12], B[15-12], C12 giving S[15-12]; A[11-8], B[11-8], C8 giving S[11-8]; A[7-4], B[7-4], C4 giving S[7-4]; A[3-0], B[3-0], C0 giving S[3-0]) each pass P and G to the lookahead carry unit, which takes C0, returns the block carries and C16, and also produces P3-0 and G3-0 (figure not recovered; the per-signal timing annotations could not be matched to signals).
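The group signals can be tried out with a short sketch. It computes P and G for each 4-bit block and then forms the block carries as G + P·(carry in), which is the same Boolean function the lookahead carry unit realizes; here it is evaluated iteratively for brevity rather than as expanded two-level logic. `group_pg` and `block_carries` are illustrative names.

```python
def group_pg(a: int, b: int) -> tuple[int, int]:
    """Group propagate and generate for one 4-bit block (a, b are 4-bit ints)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]
    P = p[3] & p[2] & p[1] & p[0]
    G = g[3] | (g[2] & p[3]) | (g[1] & p[3] & p[2]) | (g[0] & p[3] & p[2] & p[1])
    return P, G


def block_carries(a16: int, b16: int, c0: int = 0) -> list[int]:
    """Return [C0, C4, C8, C12, C16] for 16-bit operands."""
    carries = [c0]
    for k in range(4):
        P, G = group_pg((a16 >> (4 * k)) & 0xF, (b16 >> (4 * k)) & 0xF)
        carries.append(G | (P & carries[-1]))  # block carry-out = G + P * carry-in
    return carries


print(block_carries(0x0FFF, 0x0001))  # carry propagates through three blocks
```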


Unsigned Multiplication


Manual Multiplication Algorithm

            1 1 0 1    (13) Multiplicand M
          × 1 0 1 1    (11) Multiplier Q
            -------
            1 1 0 1
          1 1 0 1
        0 0 0 0
      1 1 0 1
    ---------------
    1 0 0 0 1 1 1 1    (143) Product P

(a) Manual multiplication algorithm


Array Multiplication

(b) Array implementation: multiplicand bits m3 m2 m1 m0 and multiplier bits q3 q2 q1 q0 feed a 4 × 4 array of cells; partial product PP0 is all zeros, each row produces the next partial product (PP1, PP2, PP3), and PP4 = p7, p6, ... p0 = Product (figure not recovered).

Typical cell: a full adder (FA) that combines qi, mj, a bit of the incoming partial product (PPi) and a carry-in, and produces a bit of the outgoing partial product [PP(i+1)] and a carry-out.


Another Version of 4 × 4 Array Multiplier


Array Multiplication

What is the critical path (worst-case signal propagation delay path)?

Assuming that there are two gate delays from the inputs to the outputs of a full adder block, the path has a total of 6(n-1)-1 gate delays, including the initial AND gate delay in all cells, for the n × n array.

Any advantages/disadvantages?


Sequential Circuit Binary Multiplier

(a) Register configuration: register A (initially 0, bits an-1 ... a0) and multiplier register Q (bits qn-1 ... q0) shift right together with the carry bit C; the n-bit multiplicand M (bits mn-1 ... m0) passes through a MUX that selects M or 0 as input to the n-bit adder; a control sequencer drives the Add/Noadd control from q0 (figure not recovered).

(b) Multiplication example, M = 1 1 0 1

                 C   A         Q
Initial          0   0 0 0 0   1 0 1 1
First cycle
  Add            0   1 1 0 1   1 0 1 1
  Shift          0   0 1 1 0   1 1 0 1
Second cycle
  Add            1   0 0 1 1   1 1 0 1
  Shift          0   1 0 0 1   1 1 1 0
Third cycle
  No add         0   1 0 0 1   1 1 1 0
  Shift          0   0 1 0 0   1 1 1 1
Fourth cycle
  Add            1   0 0 0 1   1 1 1 1
  Shift          0   1 0 0 0   1 1 1 1

Product: A and Q together hold 1 0 0 0 1 1 1 1.
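The register-level procedure can be simulated as follows. This is a sketch, assuming unsigned n-bit operands; `sequential_multiply` and the trace format are illustrative. Each cycle adds M to A when q0 is 1 and then shifts C, A and Q right by one position.

```python
def sequential_multiply(m: int, q: int, n: int = 4):
    """Shift-and-add multiplication with registers C, A, Q.

    Returns (product, trace), where trace lists (A, Q) after each cycle's shift.
    """
    mask = (1 << n) - 1
    a, c = 0, 0
    trace = []
    for _ in range(n):
        if q & 1:                       # q0 = 1: add the multiplicand to A
            total = a + m
            c, a = total >> n, total & mask
        # Shift C, A, Q right one position: a0 moves into Q, C moves into A.
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (c << (n - 1))
        c = 0
        trace.append((a, q))
    return (a << n) | q, trace


product, trace = sequential_multiply(0b1101, 0b1011)
print(product)
for a, q in trace:
    print(f"A={a:04b} Q={q:04b}")
```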



Signed Multiplication


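Considering 2's-complement signed operands, what will happen to (-13) × (+11) if we follow the same method as unsigned multiplication?

            1 0 0 1 1    (-13)
          × 0 1 0 1 1    (+11)
  -------------------
  1 1 1 1 1 1 0 0 1 1
  1 1 1 1 1 0 0 1 1
  0 0 0 0 0 0 0 0
  1 1 1 0 0 1 1
  0 0 0 0 0 0
  -------------------
  1 1 0 1 1 1 0 0 0 1    (-143)

Each summand is sign-extended to the left as far as the product extends (the sign extension was shown in blue in the original figure).

Figure 6.8. Sign extension of negative multiplicand.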


For a negative multiplier, a straightforward solution is to form the 2's-complement of both the multiplier and the multiplicand and proceed as in the case of a positive multiplier.

This is possible because complementation of both operands does not change the value or the sign of the product.

A technique that works equally well for both negative and positive multipliers is the Booth algorithm.


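Booth Algorithm

Consider a multiplication in which the multiplier is the positive number 0011110. How many appropriately shifted versions of the multiplicand are added in a standard procedure?

Multiplicand 0 1 0 1 1 0 1; multiplier 0 0 1 1 1 1 0 (each 1 selects +1 times the shifted multiplicand).

                0 0 0 0 0 0 0
              0 1 0 1 1 0 1
            0 1 0 1 1 0 1
          0 1 0 1 1 0 1
        0 1 0 1 1 0 1
      0 0 0 0 0 0 0
    0 0 0 0 0 0 0
  ---------------------------
  0 0 0 1 0 1 0 1 0 0 0 1 1 0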


Booth Algorithm

Since 0011110 = 0100000 - 0000010 (that is, 2^5 - 2^1), what happens if we use the expression on the right?

Multiplicand 0 1 0 1 1 0 1; recoded multiplier 0 +1 0 0 0 -1 0.

  0 0 0 0 0 0 0 0 0 0 0 0 0 0
  1 1 1 1 1 1 1 0 1 0 0 1 1      (2's complement of the multiplicand)
  0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0
  0 0 0 1 0 1 1 0 1
  0 0 0 0 0 0 0 0
  ---------------------------
  0 0 0 1 0 1 0 1 0 0 0 1 1 0


Booth Algorithm

In general, in the Booth scheme, -1 times the shifted multiplicand is selected when moving from 0 to 1, and +1 times the shifted multiplicand is selected when moving from 1 to 0, as the multiplier is scanned from right to left.

Multiplier:        0  0  1  0  1  1  0  0  1  1  1  0  1  0  1  1  0  0
Booth recoding:    0 +1 -1 +1  0 -1  0 +1  0  0 -1 +1 -1 +1  0 -1  0  0

Figure 6.10. Booth recoding of a multiplier.
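The recoding rule can be sketched in a few lines: each recoded digit is the bit to the right minus the bit itself, with an implied 0 to the right of the LSB. The function names and the string interface (2's-complement bits, MSB first) are illustrative choices.

```python
def booth_recode(bits: str) -> list[int]:
    """Booth-recode a 2's-complement bit string (MSB first) into digits in {-1, 0, +1}."""
    padded = bits + "0"  # implied 0 to the right of the LSB
    # Scanning right to left: 0 -> 1 gives -1, 1 -> 0 gives +1, no change gives 0.
    return [int(padded[i + 1]) - int(padded[i]) for i in range(len(bits))]


def booth_multiply(multiplicand: int, multiplier_bits: str) -> int:
    """Sum the selected (+1, -1, 0) shifted versions of the multiplicand."""
    digits = booth_recode(multiplier_bits)
    n = len(digits)
    return sum(d * (multiplicand << (n - 1 - i)) for i, d in enumerate(digits))


print(booth_recode("0011110"))
print(booth_multiply(45, "0011110"))
print(booth_multiply(13, "11010"))  # multiplier is -6 in 5-bit 2's complement
```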


Booth Algorithm

Multiplicand 0 1 1 0 1 (+13); multiplier 1 1 0 1 0 (-6), Booth-recoded as 0 -1 +1 -1 0.

  0 0 0 0 0 0 0 0 0 0
  1 1 1 1 1 0 0 1 1
  0 0 0 0 1 1 0 1
  1 1 1 0 0 1 1
  0 0 0 0 0 0
  -------------------
  1 1 1 0 1 1 0 0 1 0    (-78)

Figure 6.11. Booth multiplication with a negative multiplier.


Booth Algorithm

Multiplier          | Version of multiplicand
Bit i    Bit i-1    | selected by bit i
0        0          |  0 × M
0        1          | +1 × M
1        0          | -1 × M
1        1          |  0 × M

Figure 6.12. Booth multiplier recoding table.


Booth Algorithm

Best case: a long string of 1s (skipping over 1s).

Worst case: 0s and 1s are alternating.

Worst-case multiplier:   0  1  0  1  0  1  0  1  0  1  0  1  0  1  0  1
Booth recoding:         +1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1 +1 -1

Ordinary multiplier:     1  1  0  0  0  1  0  1  1  0  1  1  1  1  0  0
Booth recoding:          0 -1  0  0 +1 -1 +1  0 -1 +1  0  0  0 -1  0  0

Good multiplier:         0  0  0  0  1  1  1  1  1  0  0  0  0  1  1  1
Booth recoding:          0  0  0 +1  0  0  0  0 -1  0  0  0 +1  0  0 -1


Booth Algorithm - Advantages

Handles both positive and negative multipliers uniformly

Efficient if large blocks of ones exist

On average, speed is the same as that of the normal algorithm



Fast Multiplication


Bit-Pair Recoding of Multipliers

Bit-pair recoding halves the maximum number of summands (versions of the multiplicand).

Multiplier 1 1 0 1 0, with sign extension 1 on the left and an implied 0 to the right of the LSB:

Sign-extended multiplier:    1  1  1  0  1  0
Booth recoding:              0  0 -1 +1 -1  0
Bit-pair recoding:              0    -1    -2

(a) Example of bit-pair recoding derived from Booth recoding


Bit-Pair Recoding of Multipliers

Multiplier bit-pair | Multiplier bit on the right | Multiplicand selected
i+1    i            | i-1                         | at position i
0      0            | 0                           |  0 × M
0      0            | 1                           | +1 × M
0      1            | 0                           | +1 × M
0      1            | 1                           | +2 × M
1      0            | 0                           | -2 × M
1      0            | 1                           | -1 × M
1      1            | 0                           | -1 × M
1      1            | 1                           |  0 × M

(b) Table of multiplicand selection decisions
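The selection table follows from giving the bit pair (i+1, i) and the bit on its right the weights -2, +1 and +1. A minimal sketch, assuming a 2's-complement bit string with the MSB first; `bit_pair_recode` is an illustrative name, and an odd-length multiplier is sign-extended by one bit first.

```python
def bit_pair_recode(bits: str) -> list[int]:
    """Recode a 2's-complement bit string (MSB first) into radix-4 digits in {-2..+2}."""
    if len(bits) % 2:
        bits = bits[0] + bits  # sign extension to an even number of bits
    padded = bits + "0"        # implied 0 to the right of the LSB
    digits = []
    for i in range(0, len(bits), 2):
        hi, lo, right = int(padded[i]), int(padded[i + 1]), int(padded[i + 2])
        digits.append(-2 * hi + lo + right)  # matches the selection table
    return digits


def recoded_value(digits: list[int]) -> int:
    """Value of the radix-4 digit list (most significant digit first)."""
    value = 0
    for d in digits:
        value = 4 * value + d
    return value


print(bit_pair_recode("11010"))  # the multiplier -6
```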


Bit-Pair Recoding of Multipliers

Multiplicand 0 1 1 0 1 (+13); multiplier 1 1 0 1 0 (-6), Booth-recoded as 0 -1 +1 -1 0 and bit-pair recoded as 0 -1 -2.

  1 1 1 1 1 0 0 1 1 0      (-2 × M)
  1 1 1 1 0 0 1 1          (-1 × M)
  0 0 0 0 0 0              ( 0 × M)
  -------------------
  1 1 1 0 1 1 0 0 1 0      (-78)

Figure 6.15. Multiplication requiring only n/2 summands.


Carry-Save Addition of Summands

(a) Manual multiplication algorithm: (13) multiplicand M = 1 1 0 1 times (11) multiplier Q = 1 0 1 1 gives (143) product P = 1 0 0 0 1 1 1 1 as the sum of four shifted summands.


Carry-Save Addition of Summands

(b) Array implementation: the 4 × 4 array of full-adder (FA) cells with multiplicand bits m3 ... m0, multiplier bits q3 ... q0, partial products PP0 to PP4, and PP4 = p7, p6, ... p0 = Product (figure not recovered).


Carry-Save Addition of Summands

CSA speeds up the addition process.

(a) Ripple-carry array (Figure 6.6 structure): three rows of four full adders (FA) add the bit products m3q0 ... m0q0, m3q1 ... m0q1, m3q2 ... m0q2 and m3q3 ... m0q3, with carries rippling along each row, to form p7 ... p0 (figure not recovered).



Carry-Save Addition of Summands

(b) Carry-save array: the same bit products feed three rows of four full adders (FA), but carries are saved and passed to the next row rather than rippling along the row; the outputs are p7 ... p0 (figure not recovered).

Figure 6.16. Ripple-carry and carry-save arrays for the multiplication operation M × Q = P for 4-bit operands.


Carry-Save Addition of Summands

The delay through the carry-save array is somewhat less than the delay through the ripple-carry array. This is because the S and C vector outputs from each row are produced in parallel in one full-adder delay.

Consider the addition of many summands. We can:

Group the summands in threes and perform carry-save addition on each of these groups in parallel to generate a set of S and C vectors in one full-adder delay.

Group all of the S and C vectors into threes, and perform carry-save addition on them, generating a further set of S and C vectors in one more full-adder delay.

Continue with this process until there are only two vectors remaining.

They can be added in an RCA or CLA to produce the desired product.
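The grouping procedure can be sketched with Python integers standing in for bit vectors. `carry_save_add` and `csa_reduce` are illustrative names; the final two-vector addition is done here with ordinary integer addition in place of an RCA or CLA.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """One carry-save level: three vectors in, sum vector S and carry vector C out."""
    s = x ^ y ^ z                               # bitwise sums, no carry propagation
    c = ((x & y) | (x & z) | (y & z)) << 1      # carries, shifted to the next position
    return s, c


def csa_reduce(summands):
    """Reduce summands to two vectors in groups of three; return (total, CSA levels)."""
    vectors = list(summands)
    levels = 0
    while len(vectors) > 2:
        full = len(vectors) - len(vectors) % 3
        reduced = []
        for i in range(0, full, 3):
            reduced.extend(carry_save_add(*vectors[i:i + 3]))
        reduced.extend(vectors[full:])          # leftovers wait for the next level
        vectors = reduced
        levels += 1
    return sum(vectors), levels                 # final addition of the last two vectors


# Six summands of 45 x 63: one shifted copy of M for each 1 in Q.
summands = [45 << i for i in range(6)]
print(csa_reduce(summands))
```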


Carry-Save Addition of Summands

                1 0 1 1 0 1    (45)  M
              × 1 1 1 1 1 1    (63)  Q
                -----------
                1 0 1 1 0 1    A
              1 0 1 1 0 1      B
            1 0 1 1 0 1        C
          1 0 1 1 0 1          D
        1 0 1 1 0 1            E
      1 0 1 1 0 1              F
    -----------------------
    1 0 1 1 0 0 0 1 0 0 1 1    (2,835)  Product

Figure 6.17. A multiplication example used to illustrate carry-save addition as shown in Figure 6.18.


Carry-Save Addition of Summands

Worked example with M = 1 0 1 1 0 1 and Q = 1 1 1 1 1 1: the six summands A to F are reduced through the vectors S1, C1, S2, C2, then S3, C3, then S4, C4, and a final addition gives the product (bit-level rows not recovered).

Figure 6.18. The multiplication example from Figure 6.17 performed using carry-save addition.


Carry-Save Addition of Summands

Level 1 CSA: A, B, C produce S1, C1; D, E, F produce S2, C2.

Level 2 CSA: S1, C1, S2 produce S3, C3.

Level 3 CSA: S3, C3, C2 produce S4, C4.

Final addition: S4 + C4 = Product.

Figure 6.19. Schematic representation of the carry-save addition operations in Figure 6.18.


Carry-Save Addition of Summands

When the number of summands is large, the time saved is proportionally much greater.

Some omitted issues:

Sign-extension

Computation width of the final CLA/RCA

Bit-pair recoding



Integer Division


Manual Division

Decimal:

      21
13 ) 274
     26
      14
      13
       1

Binary:

           10101
1101 ) 100010010
        1101
         10000
          1101
            1110
            1101
               1

Figure 6.20. Longhand division examples.


Longhand Division Steps

Position the divisor appropriately with respect to the dividend and perform a subtraction.

If the remainder is zero or positive, a quotient bit of 1 is determined, the remainder is extended by another bit of the dividend, the divisor is repositioned, and another subtraction is performed.

If the remainder is negative, a quotient bit of 0 is determined, the dividend is restored by adding back the divisor, and the divisor is repositioned for another subtraction.


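Circuit Arrangement

Register A (bits an ... a0) and dividend register Q (bits qn-1 ... q0) shift left together; the divisor M (bits mn-1 ... m0, with a leading 0) feeds an (n + 1)-bit adder; a control sequencer drives the Add/Subtract line and the quotient setting of q0 (figure not recovered).

Figure 6.21. Circuit arrangement for binary division.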


Restoring Division

Shift A and Q left one binary position.

Subtract M from A, and place the answer back in A.

If the sign of A is 1, set q0 to 0 and add M back to A (restore A); otherwise, set q0 to 1.

Repeat these steps n times.
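The steps translate almost line for line into code. This sketch assumes unsigned n-bit operands and uses a signed Python integer for A instead of an (n + 1)-bit 2's-complement register, so "the sign of A is 1" becomes `a < 0`; `restoring_divide` is an illustrative name.

```python
def restoring_divide(dividend: int, divisor: int, n: int = 4) -> tuple[int, int]:
    """Restoring division; returns (quotient, remainder)."""
    mask = (1 << n) - 1
    a, q, m = 0, dividend & mask, divisor
    for _ in range(n):
        # Shift A and Q left one binary position (MSB of Q moves into A).
        a = (a << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & mask
        a -= m                 # subtract M from A
        if a < 0:
            a += m             # restore A; q0 stays 0
        else:
            q |= 1             # set q0 to 1
    return q, a


print(restoring_divide(0b1000, 0b0011))  # 8 / 3
```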


Examples

Dividend 1 0 0 0, divisor M = 0 0 0 1 1.

                       A            Q
Initially              0 0 0 0 0    1 0 0 0
First cycle
  Shift                0 0 0 0 1    0 0 0 _
  Subtract             1 1 1 1 0
  Set q0, restore      0 0 0 0 1    0 0 0 0
Second cycle
  Shift                0 0 0 1 0    0 0 0 _
  Subtract             1 1 1 1 1
  Set q0, restore      0 0 0 1 0    0 0 0 0
Third cycle
  Shift                0 0 1 0 0    0 0 0 _
  Subtract             0 0 0 0 1
  Set q0               0 0 0 0 1    0 0 0 1
Fourth cycle
  Shift                0 0 0 1 0    0 0 1 _
  Subtract             1 1 1 1 1
  Set q0, restore      0 0 0 1 0    0 0 1 0

Remainder: 0 0 0 1 0. Quotient: 0 0 1 0.

Figure 6.22. A restoring-division example.


Nonrestoring Division

Avoid the need for restoring A after an unsuccessful subtraction.

Any idea?

If A is positive, shift left and subtract: 2A - M.

If A is negative, restore, shift, and subtract: A + M, then 2(A + M), then 2(A + M) - M = 2A + M. Shifting left and adding M gives the same result without the restore.

Step 1 (repeat n times): If the sign of A is 0, shift A and Q left one bit position and subtract M from A; otherwise, shift A and Q left and add M to A. Now, if the sign of A is 0, set q0 to 1; otherwise, set q0 to 0.

Step 2: If the sign of A is 1, add M to A.
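Steps 1 and 2 can be sketched the same way. As an assumption for brevity, A is kept as a signed Python integer rather than an (n + 1)-bit register, so the sign test is `a >= 0`; `nonrestoring_divide` is an illustrative name.

```python
def nonrestoring_divide(dividend: int, divisor: int, n: int = 4) -> tuple[int, int]:
    """Nonrestoring division; returns (quotient, remainder)."""
    mask = (1 << n) - 1
    a, q, m = 0, dividend & mask, divisor
    for _ in range(n):                      # Step 1, repeated n times
        msb = (q >> (n - 1)) & 1
        q = (q << 1) & mask
        if a >= 0:
            a = 2 * a + msb - m             # shift left, then subtract M
        else:
            a = 2 * a + msb + m             # shift left, then add M
        if a >= 0:
            q |= 1                          # set q0 to 1, otherwise q0 stays 0
    if a < 0:                               # Step 2: final remainder correction
        a += m
    return q, a


print(nonrestoring_divide(0b1000, 0b0011))  # 8 / 3
```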


Examples

Dividend 1 0 0 0, divisor M = 0 0 0 1 1.

                       A            Q
Initially              0 0 0 0 0    1 0 0 0
First cycle
  Shift                0 0 0 0 1    0 0 0 _
  Subtract             1 1 1 1 0
  Set q0               1 1 1 1 0    0 0 0 0
Second cycle
  Shift                1 1 1 0 0    0 0 0 _
  Add                  1 1 1 1 1
  Set q0               1 1 1 1 1    0 0 0 0
Third cycle
  Shift                1 1 1 1 0    0 0 0 _
  Add                  0 0 0 0 1
  Set q0               0 0 0 0 1    0 0 0 1
Fourth cycle
  Shift                0 0 0 1 0    0 0 1 _
  Subtract             1 1 1 1 1
  Set q0               1 1 1 1 1    0 0 1 0

Quotient: 0 0 1 0.

Restore remainder: add M to A, 1 1 1 1 1 + 0 0 0 1 1 = 0 0 0 1 0. Remainder: 0 0 0 1 0.

Figure 6.23. A nonrestoring-division example.


Floating-Point Numbers and Operations


Floating-Point Numbers

So far we have dealt with fixed-point numbers (what are they?), and have considered them as integers.

Considered as fractions instead, the binary point is just to the right of the sign bit:

$B = b_0 . b_{-1} b_{-2} \ldots b_{-(n-1)}$

$F(B) = -b_0 \times 2^{0} + b_{-1} \times 2^{-1} + b_{-2} \times 2^{-2} + \cdots + b_{-(n-1)} \times 2^{-(n-1)}$

where the range of F is:

$-1 \le F \le 1 - 2^{-(n-1)}$

Floating-point numbers: the position of the binary point is variable and is automatically adjusted as computation proceeds.


What is needed to represent a floating-point decimal number?

Sign

Mantissa (the significant digits)

Exponent to an implied base (scale factor)

Normalized: the decimal point is placed to the right of the first (nonzero) significant digit.


IEEE Standard for Floating-Point Numbers

Think about this number (all digits are decimal):

$\pm X_1.X_2X_3X_4X_5X_6X_7 \times 10^{\pm Y_1Y_2}$

It is possible to approximate this mantissa precision and scale factor range in a binary representation that occupies 32 bits: 24-bit mantissa (1 sign bit for signed number), 8-bit exponent.

Instead of the signed exponent, E, the value actually stored in the exponent field is an unsigned integer E' = E + 127, the so-called excess-127 format.


IEEE Standard

(a) Single precision: 32 bits

S (1 bit): sign of number, 0 signifies +, 1 signifies -
E' (8 bits): 8-bit signed exponent in excess-127 representation
M (23 bits): 23-bit mantissa fraction

Value represented = ±1.M × 2^(E' - 127)

(b) Example of a single-precision number

S = 0, E' = 0 0 1 0 1 0 0 0, M = 0 0 1 0 1 0 ... 0

Value represented = +1.001010...0 × 2^-87, since (101000)2 = (40)10 and 40 - 127 = -87.

(c) Double precision: 64 bits

S (1 bit): sign
E' (11 bits): 11-bit excess-1023 exponent
M (52 bits): 52-bit mantissa fraction

Value represented = ±1.M × 2^(E' - 1023)

Figure 6.24. IEEE standard floating-point formats.
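The single-precision example can be checked by splitting a 32-bit pattern into its fields. This sketch covers normal numbers only; `decode_single` and `normal_value` are illustrative names, and the standard-library `struct` module is used only to cross-check against the machine's own float32 interpretation.

```python
import struct


def decode_single(bits: int) -> tuple[int, int, int]:
    """Split a 32-bit pattern into (S, E', M)."""
    sign = bits >> 31
    e_stored = (bits >> 23) & 0xFF   # excess-127 exponent field
    m = bits & 0x7FFFFF              # 23-bit mantissa fraction
    return sign, e_stored, m


def normal_value(bits: int) -> float:
    """Value of a normal number: +/- 1.M x 2^(E' - 127)."""
    sign, e_stored, m = decode_single(bits)
    return (-1) ** sign * (1 + m / 2**23) * 2.0 ** (e_stored - 127)


# The example pattern: S = 0, E' = 00101000, M = 001010...0
example = (0 << 31) | (0b00101000 << 23) | (0b001010 << 17)
print(decode_single(example))
print(normal_value(example))
print(struct.unpack(">f", struct.pack(">I", example))[0])  # cross-check
```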


IEEE Standard

For excess-127 format, 0 ≤ E' ≤ 255.

However, 0 and 255 are used to represent special values. So actually 1 ≤ E' ≤ 254. That means -126 ≤ E ≤ 127.

Single precision uses 32 bits. The value range is from 2^-126 to 2^+127.

Double precision uses 64 bits. The value range is from 2^-1022 to 2^+1023.


Two Aspects

If a number is not normalized, it can always be put in normalized form by shifting the fraction and adjusting the exponent.

(a) Unnormalized value

S = 0, excess-127 exponent = 1 0 0 0 1 0 0 0, fraction = 0 0 1 0 1 1 0 ...

(There is no implicit 1 to the left of the binary point.)

Value represented = +0.0010110... × 2^9, since (10001000)2 = (136)10 and 136 - 127 = 9.

(b) Normalized version

S = 0, excess-127 exponent = 1 0 0 0 0 1 0 1, fraction = 0 1 1 0 ...

Value represented = +1.0110... × 2^6, since 6 + 127 = 133 and (133)10 = (10000101)2.

Figure 6.25. Floating-point normalization in IEEE single-precision format.


Two Aspects

As computations proceed, a number that does not fall in the representable range of normal numbers might be generated.

It requires an exponent less than -126 (underflow) or greater than +127 (overflow).

Both are exceptions that need to be considered.


Special Values

The end values 0 and 255 of E' are used to represent special values.

When E' = 0 and M = 0, the value exact 0 is represented (±0).

When E' = 255 and M = 0, the value ∞ is represented (±∞).

When E' = 0 and M ≠ 0, denormal numbers are represented. The value is ±0.M × 2^-126.

When E' = 255 and M ≠ 0, the value is Not a Number (NaN).
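The four special cases amount to a small decision on the stored exponent and mantissa fields. A minimal sketch for single precision; `classify_single` and `denormal_value` are illustrative names.

```python
def classify_single(bits: int) -> str:
    """Classify a 32-bit single-precision pattern by its E' and M fields."""
    e_stored = (bits >> 23) & 0xFF
    m = bits & 0x7FFFFF
    if e_stored == 0:
        return "zero" if m == 0 else "denormal"
    if e_stored == 255:
        return "infinity" if m == 0 else "NaN"
    return "normal"


def denormal_value(bits: int) -> float:
    """Value of a denormal number: +/- 0.M x 2^-126 (no implicit leading 1)."""
    sign = bits >> 31
    m = bits & 0x7FFFFF
    return (-1) ** sign * (m / 2**23) * 2.0 ** -126


for pattern in (0x00000000, 0x80000000, 0x7F800000, 0x7FC00000, 0x00000001, 0x3F800000):
    print(f"{pattern:08X}: {classify_single(pattern)}")
```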


Exceptions

A processor must set exception flags if any of the following occur in performing operations: underflow, overflow, divide by zero, inexact (requires rounding), invalid (0/0).

When an exception occurs, the results are set to special values.


Arithmetic Operations on Floating-Point Numbers

Add/Subtract rule

Choose the number with the smaller exponent and shift its mantissa right a number of steps equal to the difference in exponents.

Set the exponent of the result equal to the larger exponent.

Perform addition/subtraction on the mantissas and determine the sign of the result.

Normalize the resulting value, if necessary.

Multiply rule

Add the exponents and subtract 127.

Multiply the mantissas and determine the sign of the result.

Normalize the resulting value, if necessary.

Divide rule

Subtract the exponents and add 127.

Divide the mantissas and determine the sign of the result.

Normalize the resulting value, if necessary.
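The add and multiply rules can be sketched on a simplified number format. The representation below is an assumption made for illustration: each number is a tuple (sign, excess-127 exponent, 24-bit significand with the leading 1 stored explicitly). Extra bits are simply chopped, and special values, overflow and underflow are not handled.

```python
HIDDEN = 1 << 23  # position of the leading 1 in the 24-bit significand


def normalize(sign: int, exp: int, sig: int) -> tuple[int, int, int]:
    """Shift the significand until its leading 1 sits at the HIDDEN position."""
    if sig == 0:
        return 0, 0, 0
    while sig >= (HIDDEN << 1):
        sig >>= 1
        exp += 1
    while sig < HIDDEN:
        sig <<= 1
        exp -= 1
    return sign, exp, sig


def fp_add(a, b):
    """Add rule: align to the larger exponent, add signed mantissas, normalize."""
    (sa, ea, ma), (sb, eb, mb) = a, b
    if ea < eb:  # make a the operand with the larger exponent
        (sa, ea, ma), (sb, eb, mb) = (sb, eb, mb), (sa, ea, ma)
    mb >>= ea - eb  # shift the smaller operand's mantissa right
    total = (-ma if sa else ma) + (-mb if sb else mb)
    return normalize(1 if total < 0 else 0, ea, abs(total))


def fp_mul(a, b):
    """Multiply rule: add exponents and subtract 127, multiply mantissas, normalize."""
    (sa, ea, ma), (sb, eb, mb) = a, b
    return normalize(sa ^ sb, ea + eb - 127, (ma * mb) >> 23)


def to_float(x) -> float:
    sign, exp, sig = x
    return (-1) ** sign * (sig / HIDDEN) * 2.0 ** (exp - 127)


one_and_half = (0, 127, 0xC00000)    # 1.5
three_quarters = (0, 126, 0xC00000)  # 0.75
print(to_float(fp_add(one_and_half, three_quarters)))
print(to_float(fp_mul(one_and_half, one_and_half)))
```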


Guard Bits and Truncation

During the intermediate steps, it is important to retain extra bits, often called guard bits, to yield the maximum accuracy in the final results.

Removing the guard bits in generating a final result requires truncation of the extended mantissa. How?


Guard Bits and Truncation

Chopping: biased, error from 0 to 1 at the LSB.

0.b-1b-2b-3000 through 0.b-1b-2b-3111 are all truncated to 0.b-1b-2b-3.

Von Neumann rounding (if any of the bits to be removed are 1, the LSB of the retained bits is set to 1): unbiased, error from -1 to +1 at the LSB.

All 6-bit fractions with b-4b-5b-6 not equal to 000 are truncated to 0.b-1b-21.

Why is unbiased rounding better when many operands are involved?

Rounding (a 1 is added to the LSB position of the bits to be retained if there is a 1 in the MSB position of the bits being removed): unbiased, error from -1/2 to +1/2 at the LSB.

Round to the nearest number, or to the nearest even number in case of a tie (0.b-1b-20100 rounds to 0.b-1b-20, and 0.b-1b-21100 rounds to 0.b-1b-21 + 0.001).

Best accuracy.

Most difficult to implement.
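The three truncation methods can be compared on 6-bit fractions reduced to 3 bits. In this sketch a fraction is held as an integer bit pattern and `drop` is the number of guard bits removed; the function names are illustrative.

```python
def chop(x: int, drop: int) -> int:
    """Chopping: simply discard the low-order bits."""
    return x >> drop


def von_neumann(x: int, drop: int) -> int:
    """Von Neumann rounding: set the retained LSB to 1 if any removed bit is 1."""
    kept = x >> drop
    if x & ((1 << drop) - 1):
        kept |= 1
    return kept


def round_nearest_even(x: int, drop: int) -> int:
    """Round to nearest; on a tie, round to the value with an even (0) LSB."""
    kept = x >> drop
    removed = x & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    if removed > half or (removed == half and kept & 1):
        kept += 1
    return kept


for x in (0b010100, 0b011100, 0b010001, 0b011111):
    print(f"{x:06b}: chop {chop(x, 3):03b}, "
          f"von Neumann {von_neumann(x, 3):03b}, "
          f"round {round_nearest_even(x, 3):03b}")
```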


Implementing Floating-Point Operations

Hardware/software

In most general-purpose processors, floating-point operations are available at the machine-instruction level, implemented in hardware.

In high-performance processors, a significant portion of the chip area is assigned to floating-point operations.

Addition/subtraction circuitry

Block diagram of a floating-point addition-subtraction unit for 32-bit operands A: SA, EA, MA and B: SB, EB, MB (figure not recovered). Recoverable blocks: an 8-bit subtractor forming n = EA - EB and its sign; a SWAP stage that sends the mantissa M of the number with the smaller E to a SHIFTER (n bits to the right) and the M of the number with the larger E to the mantissa adder/subtractor; a combinational CONTROL network taking SA, SB, Add/Sub and the sign, and producing Add/Subtract and the result sign; a MUX selecting the exponent E; a leading zeros detector producing X; a normalize-and-round stage; a second 8-bit subtractor forming E - X; and the 32-bit result R: S, E, M.

Date post:	03-Apr-2018
Upload:	kaylaroberts