Date post: | 17-Jan-2016 |
Upload: | claire-osborne |
arithmetic.12/15
Computer Arithmetic
ALU performance is critical (App. C5, C6, 4th ed.)
Requirements: CPU needs a 32-bit ALU
(1) Functional Specification
inputs: 2 x 32-bit operands A, B, 4-bit mode
outputs: 32-bit result S, 1-bit carry, 1-bit overflow
operations: add, addu, sub, subu, and, or, xor, nor, slt, sltu
(2) Block Diagram (schematic symbol / Verilog description)
[Figure: ALU block symbol with 32-bit inputs A and B, 4-bit mode input m, 32-bit output S, and 1-bit outputs c and ovf]
1-bit adder Review (Appendix B.5, B.6)
A B C Co Sum
0 0 0 0 0
0 0 1 0 1
0 1 0 0 1
0 1 1 1 0
1 0 0 0 1
1 0 1 1 0
1 1 0 1 0
1 1 1 1 1
Sum = a!bc! + ab!c! + a!b!c + abc
= a XOR b XOR c    (x! denotes NOT x)
Carryout = a!bc + ab!c + abc! + abc
[Figure: 1-bit full adder symbol and sum circuit; inputs A, B, Cin (CarryIn); outputs Sum, Co (CarryOut)]
2 units of delay from A/B to sum
1 unit of delay from Cin to sum
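The truth table and the two equations above can be checked with a short sketch. The function name `full_adder` and the list `truth_table` are illustrative, not from the slides. The carry is written in the equivalent majority form ab + ac + bc, which covers the same four minterms as the Carryout expression.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (carry_out, sum) for three 1-bit inputs."""
    s = a ^ b ^ cin                          # Sum = a XOR b XOR c
    cout = (a & b) | (a & cin) | (b & cin)   # 1 when at least two inputs are 1
    return cout, s


# Rows in the same column order as the slide: A B C Co Sum
truth_table = [(a, b, c, *full_adder(a, b, c))
               for a in (0, 1) for b in (0, 1) for c in (0, 1)]

for row in truth_table:
    print(*row)
```

Each row satisfies 2*Co + Sum = A + B + C, which is why the cell is called a 3->2 element later in the deck.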
Carry Out circuit
[Figure: carry-out circuit with inputs a, b, CarryIn (Cin) and output CarryOut (Cout)]
2 units of delay from Cin to Cout
1-bit ALU cell: ADD, AND, OR
[Figure: 1-bit ALU cell; inputs A and B, a 1-bit full adder with CarryIn and CarryOut, and a mux controlled by S-select that chooses among add, and, or to produce Result]
A B C Co Sum
0 0 0 0 0
0 0 1 0 1
0 1 0 0 1
0 1 1 1 0
1 0 0 0 1
1 0 1 1 0
1 1 0 1 0
1 1 1 1 1
Full Adder (3->2 element)
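Additional operations: Subtract, AND, OR
• A - B = A + (–B) = A + B! + 1
– form the two's complement by inverting and adding one
[Figure: the 1-bit ALU cell from the previous slide with an added invert control that selects B or its complement as the adder input; the mux still chooses among add, and, or via S-select]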
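A minimal sketch of the invert-and-add-one trick, assuming a 32-bit datapath as in the requirements slide. The names `add_sub`, `to_signed`, and `MASK32` are illustrative. The "+ 1" is supplied through the carry input of the adder, so subtraction needs no second adder.

```python
MASK32 = 0xFFFFFFFF  # keep every value to 32 bits


def add_sub(a: int, b: int, invert: int) -> int:
    """Return a + b when invert is 0, or a - b when invert is 1 (32-bit result)."""
    operand = (~b & MASK32) if invert else (b & MASK32)  # invert B for subtract
    carry_in = invert                                    # the "+ 1" enters as CarryIn
    return (a + operand + carry_in) & MASK32


def to_signed(x: int) -> int:
    """Read a 32-bit pattern as a two's complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


print(to_signed(add_sub(7, 3, 1)))   # 7 - 3
print(to_signed(add_sub(3, 7, 1)))   # 3 - 7
```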
1-bit ALU: AND, OR, a+b, a+b!
[Figure: a. 1-bit ALU with inputs a, b, Less, CarryIn and Binvert; a 4-way mux (inputs 0 to 3) selected by Operation produces Result, and the adder produces CarryOut. b. The same cell for the most significant bit, which adds a Set output and overflow detection producing Overflow.]
ALU Delays
Result = 1 gate delay
From a to Result = 2
From b to Result = 2 (ignore b invert)
Final 32-bit ALU, including zero detect
[Figure: 32 1-bit ALUs (ALU0 to ALU31) with inputs a0 to a31 and b0 to b31, chained through CarryIn and CarryOut and sharing Bnegate and Operation; Set from ALU31 feeds the Less input of ALU0, the other Less inputs are 0; outputs are Result0 to Result31, Zero and Overflow]
Behavioral Representation: Verilog, RTL (FYI)
module ALU(A, B, m, S, c, ovf);
  input [0:31] A, B;
  input [0:3] m;
  output [0:31] S;
  output c, ovf;
  reg [0:31] S;
  reg c, ovf;
  always @(A, B, m) begin
    case (m)
      0: S = A + B;
      . . .
    endcase
  end
endmodule
• Code written, simulated & verified
• translated into hardware (mapped)
• How complex digital design is done
Overflow ?? - 4-bit example
• Examples: 7 + 3 = 10 but ...
• - 4 - 5 = - 9 but ...
Decimal  Binary     Decimal  2’s Complement
0        0000        0       0000
1        0001       -1       1111
2        0010       -2       1110
3        0011       -3       1101
4        0100       -4       1100
5        0101       -5       1011
6        0110       -6       1010
7        0111       -7       1001
                    -8       1000

  0111    7
+ 0011    3
------
  1010   –6

  1100   –4
+ 1011   –5
------
  0111    7
Overflow Detection
• Overflow: arithmetic result too large (or too small) to represent properly
– Example: –8 ≤ 4-bit binary number ≤ 7
• When adding operands with different signs, overflow cannot occur!
• Overflow occurs when adding:
– 2 positive numbers and sum is negative
– 2 negative numbers and the sum is positive
• On your own: Prove you can detect overflow by:
– Carry into MSB ≠ Carry out of MSB

  0111    7
+ 0011    3
------
  1010   –6    (carry into MSB = 1, carry out of MSB = 0)

  1100   –4
+ 1011   –5
------
  0111    7    (carry into MSB = 0, carry out of MSB = 1)
Overflow Detection Logic
• Carry into MSB ≠ Carry out of MSB
– For an N-bit ALU: Overflow = CarryIn[N - 1] XOR CarryOut[N - 1]
[Figure: four 1-bit ALUs (inputs A0 to A3, B0 to B3, outputs Result0 to Result3) chained through their carries; an XOR gate on CarryIn3 and CarryOut3 produces Overflow]
X Y X XOR Y
0 0 0
0 1 1
1 0 1
1 1 0
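The overflow rule can be tried on the two 4-bit examples from the earlier slides. This is a sketch under the assumption of a plain ripple-carry chain; `add_with_overflow` is an illustrative name, not part of the deck.

```python
def add_with_overflow(a: int, b: int, n: int = 4) -> tuple[int, int]:
    """Ripple-add two n-bit patterns and return (result, overflow)."""
    carry = 0
    result = 0
    carry_into_msb = 0
    for i in range(n):
        if i == n - 1:
            carry_into_msb = carry                 # CarryIn[N-1]
        ai, bi = (a >> i) & 1, (b >> i) & 1
        result |= (ai ^ bi ^ carry) << i           # sum bit i
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    overflow = carry_into_msb ^ carry              # CarryIn[N-1] XOR CarryOut[N-1]
    return result, overflow


for a, b in ((0b0111, 0b0011), (0b1100, 0b1011), (0b0011, 0b0010)):
    result, ovf = add_with_overflow(a, b)
    print(f"{a:04b} + {b:04b} = {result:04b}  overflow={ovf}")
```

Adding operands of different signs, for example 7 + (–1), leaves both carries equal, so no overflow is flagged.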
MIPS ALU requirements
• Add, AddU, Sub, SubU, AddI, AddIU
– => 2’s complement adder/subtractor with overflow detection
• And, Or, AndI, OrI, Xor, XorI, Nor
– => logical AND, OR, XOR, NOR
• SLTI, SLTIU (set less than)
– => 2’s complement adder with inverter, check sign bit of result
• ALU must support these ops
MIPS arithmetic instruction format - Review
• Signed arithmetic generates overflow, no carry
Bit positions: 31 25 20 15 5 0
R-type: op | Rs | Rt | Rd | funct
I-type: op | Rs | Rt | Immed 16
Opcode tables (op and funct values are in octal):
Type op funct
ADDI 10 xx
ADDIU 11 xx
SLTI 12 xx
SLTIU 13 xx
ANDI 14 xx
ORI 15 xx
XORI 16 xx
LUI 17 xx
Type op funct
ADD 00 40
ADDU 00 41
SUB 00 42
SUBU 00 43
AND 00 44
OR 00 45
XOR 00 46
NOR 00 47
Type op funct
00 50
00 51
SLT 00 52
SLTU 00 53
Ripple Adder Performance?
• Critical path of an n-bit ripple-carry adder is n*CP
[Figure: four 1-bit ALUs (inputs A0 to A3, B0 to B3, outputs Result0 to Result3) chained so that each CarryOut feeds the next CarryIn]
Very slow: must improve
Assume t = carry delay per bit
32-bit ALU needs 32 * t units of delay
64-bit ALU needs 64 * t units of delay
[Figure: 1-bit adder sum and carry-out circuits with inputs A, B, Cin]
2 units of delay from A/B to sum
1 unit of delay from Cin to sum
Fast Addition: Carry Lookahead
• Carry inputs can be precomputed by logic
p0 = a0 + b0, g0 = a0 b0
c1 = g0 + c0 p0 = a0 b0 + c0 (a0 + b0)
p1 = a1 + b1, g1 = a1 b1
c2 = g1 + p1 c1 = a1 b1 + c1 (a1 + b1) = g1 + p1 g0 + p1 p0 c0
c3 = g2 + p2 g1 + p2 p1 g0 + p2 p1 p0 c0
c4 = g3 + p3 g2 + p3 p2 g1 + p3 p2 p1 g0 + p3 p2 p1 p0 c0
c4 = func(a3, b3, a2, b2, a1, b1, a0, b0, c0)
1 unit delay each p, g
1 unit delay
3 units of delay
3 units of delay
3 units of delay
Fast Addition: Carry Look Ahead – 4 bits
A B C-out
0 0 0     “kill”
0 1 C-in  “propagate”
1 0 C-in  “propagate”
1 1 1     “generate”
g = a and b, p = a or b (1 unit of delay each)
C0 = Cin
c1 = g0 + c0 p0
c2 = g1 + g0 p1 + c0 p0 p1
c3 = g2 + g1 p2 + g0 p1 p2 + c0 p0 p1 p2
[Figure: four adder cells (inputs a0 b0 to a3 b3), each producing a sum S and its g, p signals for the lookahead logic; delay labels on the figure]
G0 = g3 + p3 g2 + p3 p2 g1 + p3 p2 p1 g0
C4 = . . .
P0 = p3 p2 p1 p0
3 units of delay for G0
3 units of delay for c1, c2, c3, (c4)
4 units of delay for S1, S2, S3
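The 4-bit lookahead equations on this slide can be exercised directly. The sketch below is illustrative: `cla4` is a made-up name, and c4 is formed as G0 + c0 P0, the same expression the cascaded slide uses for the first group carry.

```python
def cla4(a: int, b: int, c0: int = 0) -> tuple[int, int, int, int]:
    """4-bit carry-lookahead add; returns (sum, c4, G0, P0)."""
    abit = [(a >> i) & 1 for i in range(4)]
    bbit = [(b >> i) & 1 for i in range(4)]
    g = [x & y for x, y in zip(abit, bbit)]   # generate: a and b
    p = [x | y for x, y in zip(abit, bbit)]   # propagate: a or b

    # Every carry depends only on g, p and c0, never on another carry.
    c1 = g[0] | (c0 & p[0])
    c2 = g[1] | (g[0] & p[1]) | (c0 & p[0] & p[1])
    c3 = g[2] | (g[1] & p[2]) | (g[0] & p[1] & p[2]) | (c0 & p[0] & p[1] & p[2])

    G0 = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P0 = p[3] & p[2] & p[1] & p[0]
    c4 = G0 | (c0 & P0)

    s = 0
    for i, cin in enumerate((c0, c1, c2, c3)):
        s |= (abit[i] ^ bbit[i] ^ cin) << i   # sum bit i
    return s, c4, G0, P0


print(cla4(0b0111, 0b0011))
```

Because no carry waits for the previous one, the delay of c1 to c4 stays constant instead of growing with the bit position.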
Carry Lookahead – 2nd level – 16 bits
Add 2nd level abstraction for more practical 4-bit units
Each Pi, Gi handles 4 bits at a time (0-3, 4-7, 8-11, ...)
P0 = p3 p2 p1 p0 ; G0 = g3 + p3 g2 + p3 p2 g1 + p3 p2 p1 g0
P1 = p7 p6 p5 p4 ;G1 = g7 + p7 g6 + p7 p6 g5 + p7 p6 p5 g4
P2 = p11 p10 p9 p8 ;G2 =g11 + p11 g10 + p11 p10 g9 + p11 p10 p9 g8
P3 = p15 p14 p13 p12;G3 = …….
3 units of delay for G0, G1, G2, G3
2 units of delay for P0, P1, P2, P3
Fast Addition: Cascaded Carry Look-ahead (16-bit)
[Figure: 4-bit adders, each sending its group G and P (G0, P0, ...) to a shared CLA unit that takes C0 and returns c4, c8, c12]
c4 = G0 + C0 P0
c8 = G1 + G0 P1 + C0 P0 P1
c12 = G2 + G1 P2 + G0 P1 P2 + C0 P0 P1 P2
c16 = . . .
c4 has 4 units of delay
5 units of delay for c8, c12, c16
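A sketch of the two-level scheme for 16 bits, assuming the group equations for G and P given on the previous slide. The names `group_pg` and `cla16` are illustrative. Only c4, c8 and c12 are computed here because the slide elides c16, and Python's `+` stands in for the internals of each 4-bit adder.

```python
def group_pg(a4: int, b4: int) -> tuple[int, int]:
    """Return (G, P) for one 4-bit group."""
    g = [((a4 >> i) & 1) & ((b4 >> i) & 1) for i in range(4)]
    p = [((a4 >> i) & 1) | ((b4 >> i) & 1) for i in range(4)]
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    return G, P


def cla16(a: int, b: int, c0: int = 0) -> int:
    """16-bit sum (modulo 2**16) using second-level lookahead for c4, c8, c12."""
    offsets = (0, 4, 8, 12)
    G, P = zip(*(group_pg((a >> k) & 0xF, (b >> k) & 0xF) for k in offsets))

    c4 = G[0] | (c0 & P[0])
    c8 = G[1] | (G[0] & P[1]) | (c0 & P[0] & P[1])
    c12 = G[2] | (G[1] & P[2]) | (G[0] & P[1] & P[2]) | (c0 & P[0] & P[1] & P[2])

    total = 0
    for k, cin in zip(offsets, (c0, c4, c8, c12)):
        block = ((a >> k) & 0xF) + ((b >> k) & 0xF) + cin   # one 4-bit adder
        total |= (block & 0xF) << k
    return total


print(hex(cla16(0x0FFF, 0x0001)))
```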
Carry Lookahead Homework
You are required to calculate the performance of a 16-bit carry lookahead adder similar to the one discussed in class. The design has 2 options:
1. Ripple carry is used inside each 4-bit cell
2. Carry lookahead is used inside each 4-bit cell
• Both cases use carry lookahead to predict the 4-bit boundary carries [c4, c8, c12]
• Draw a table showing the delay of each adder bit, i.e. Sum0 - Sum15, as well as the carry at each stage of the design, for the 2 designs
8-bit carry lookahead adder (4-bit block is also CLA)
c5 = g4 + c4.p4 (delays: g4 = 1, c4 = 4, p4 = 1)
c4 = G0 + c0 P0
[Figure: two 4-bit CLA blocks (inputs a0b0 to a3b3 and a4b4 to a7b7, outputs S0 to S7) feeding G0, P0, G1, P1 to a 2nd level carry lookahead unit; delay labels in the figure: 3, 3, 3, 4 units of delay, 6, 6, 6, 5, 6]
8-bit CLA – uses ripple carry inside 4-bit block
[Figure: two 4-bit ripple-carry blocks (inputs a0b0 to a3b3 and a4b4 to a7b7) producing Result0 to Result7, with c4 supplied by the 2nd level carry lookahead]
Delay labels in the figure, for bit positions 0 to 7:
0  2  4  6  4  6  8  10
2  3  5  7  5  7  9  11
Additional MIPS ALU requirements
• Mult, MultU, Div, DivU => Need 32-bit multiply and divide, signed and unsigned
• Sll, Srl, Sra => Need left shift, right shift, right shift arithmetic by 0 to 31 bits
• Nor (left as an exercise!) => logical NOR, or use 2 steps: (A OR B) XOR 1111....1111
Multiply, Divide & Shift
MIPS arithmetic instructions
Instruction         Example          Meaning                       Comments
add                 add $1,$2,$3     $1 = $2 + $3                  3 operands; exception possible
subtract            sub $1,$2,$3     $1 = $2 – $3                  3 operands; exception possible
add immediate       addi $1,$2,100   $1 = $2 + 100                 + constant; exception possible
add unsigned        addu $1,$2,$3    $1 = $2 + $3                  3 operands; no exceptions
subtract unsigned   subu $1,$2,$3    $1 = $2 – $3                  3 operands; no exceptions
add imm. unsign.    addiu $1,$2,100  $1 = $2 + 100                 + constant; no exceptions
multiply            mult $2,$3       Hi, Lo = $2 x $3              64-bit signed product
multiply unsigned   multu $2,$3      Hi, Lo = $2 x $3              64-bit unsigned product
divide              div $2,$3        Lo = $2 ÷ $3, Hi = $2 mod $3  Lo = quotient, Hi = remainder
divide unsigned     divu $2,$3       Lo = $2 ÷ $3, Hi = $2 mod $3  Unsigned quotient & remainder
move from Hi        mfhi $1          $1 = Hi                       Used to get copy of Hi
move from Lo        mflo $1          $1 = Lo                       Used to get copy of Lo
MULTIPLY (unsigned)
• Paper and pencil example (unsigned):
Multiplicand      1000   (A)
Multiplier      x 1001   (B)
                  1000
                 0000
                0000
               1000
Product       01001000
• m bits x n bits = m+n bit product
• Binary makes it easy:
– 0 => place 0 (0 x multiplicand)
– 1 => place a copy (1 x multiplicand)
• 4 versions of multiply hardware & algorithm:
– successive refinement
Fast Multiply == Array Multiplier
• Stage i accumulates A * 2^i if Bi == 1
• Q: How much hardware for 32 bit multiplier? Critical path?
[Figure: 4 x 4 array multiplier for Multiplicand A, Multiplier B, Product P; bits A3 to A0 enter each of four rows gated by B0 to B3, the top row's sum inputs are 0, and product bits P7 to P0 come out; each cell is a full adder (FA) with inputs ai, bj, sum in, carry in and outputs sum out, carry out]
Cell delays?
Multiplier operation
• At each stage shift multiplicand left (x 2)
• Multiplier bit Bi determines whether to add in the shifted multiplicand
• Accumulate 2n-bit partial product at each stage
[Figure: the 4 x 4 array multiplier again, with rows of A3 to A0 gated by B0 to B3, zero inputs, and product bits P7 to P0]
Multiplication, using shift & Add
• long-multiplication approach
      1000    multiplicand
    x 1001    multiplier
      1000
     0000
    0000
   1000
   1001000    product
Length of product is the sum of operand lengths
Multiplication Hardware using shift & Add
[Figure: multiplication hardware; the Product register is initially 0]
Optimized Multiplier using shift & Add
• Perform steps in parallel: add/shift
One cycle per partial-product addition is OK if the frequency of multiplications is low
[Figure: optimized multiplier hardware with a 32-bit ALU and the multiplicand register]
Multiply Algorithm
Start
1. Test Product0
   Product0 = 1: 1a. Add multiplicand to the left half of the product & place the result in the left half of the Product register
   Product0 = 0: skip the add
2. Shift the Product register right 1 bit.
32nd repetition?
   No: < 32 repetitions, go back to step 1
   Yes: 32 repetitions, Done
Step   Product      Multiplicand
init   0000 0011    0010
1:     0010 0011    0010
2:     0001 0001    0010
1:     0011 0001    0010
2:     0001 1000    0010
1:     0001 1000    0010
2:     0000 1100    0010
1:     0000 1100    0010
2:     0000 0110    0010
final  0000 0110    0010
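The flowchart can be followed step by step in a short sketch. It assumes the multiplier starts in the right half of a 2n-bit Product register, as in the 4-bit trace above (0010 x 0011). The name `multiply` and the `trace` list are illustrative. A Python integer keeps the carry out of the left-half addition, which real hardware would hold in an extra bit.

```python
def multiply(multiplicand: int, multiplier: int, n: int = 4) -> tuple[int, list[int]]:
    """Unsigned n-bit multiply; returns (product, register value after each step)."""
    product = multiplier                      # multiplier sits in the right half
    trace = [product]
    for _ in range(n):                        # n repetitions (32 for MIPS)
        if product & 1:                       # 1. test Product0
            product += multiplicand << n      # 1a. add to the left half
        trace.append(product)
        product >>= 1                         # 2. shift Product right 1 bit
        trace.append(product)
    return product, trace


product, trace = multiply(0b0010, 0b0011)
for value in trace:
    print(f"{value >> 4:04b} {value & 0xF:04b}")
print(product)
```

Sharing one register between the product and the multiplier is what lets the design get by with a single n-bit adder.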
MIPS logical instructions
Instruction          Example          Meaning           Comment
and                  and $1,$2,$3     $1 = $2 & $3      3 reg. operands; Logical AND
or                   or $1,$2,$3      $1 = $2 | $3      3 reg. operands; Logical OR
xor                  xor $1,$2,$3     $1 = $2 ^ $3      3 reg. operands; Logical XOR
nor                  nor $1,$2,$3     $1 = ~($2 | $3)   3 reg. operands; Logical NOR
and immediate        andi $1,$2,10    $1 = $2 & 10      Logical AND reg, constant
or immediate         ori $1,$2,10     $1 = $2 | 10      Logical OR reg, constant
xor immediate        xori $1,$2,10    $1 = $2 ^ 10      Logical XOR reg, constant
shift left logical   sll $1,$2,10     $1 = $2 << 10     Shift left by constant
shift right logical  srl $1,$2,10     $1 = $2 >> 10     Shift right by constant
shift right arithm.  sra $1,$2,10     $1 = $2 >> 10     Shift right (sign extend)
shift left logical   sllv $1,$2,$3    $1 = $2 << $3     Shift left by variable
shift right logical  srlv $1,$2,$3    $1 = $2 >> $3     Shift right by variable
shift right arithm.  srav $1,$2,$3    $1 = $2 >> $3     Shift right arith. by variable
How shift instructions are implemented
Two kinds:
logical -- value shifted in is always "0"
arithmetic -- on right shifts, sign extend
[Figure: logical shifts bring "0" in at the msb (right shift) or at the lsb (left shift); an arithmetic right shift replicates the msb]
An instruction can request 0 to 31 bits to be shifted!
1011 1110: shift right arithmetic by 2
1100 1011: shift right logical by 2
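The difference between the two right shifts shows up on the two bit patterns from this slide. The sketch assumes an 8-bit width to match those patterns; `srl` and `sra` here are plain Python helpers named after the MIPS mnemonics, not a model of the hardware.

```python
def srl(x: int, n: int, width: int = 8) -> int:
    """Shift right logical: zeros enter at the msb."""
    mask = (1 << width) - 1
    return (x & mask) >> n


def sra(x: int, n: int, width: int = 8) -> int:
    """Shift right arithmetic: copies of the sign bit enter at the msb."""
    mask = (1 << width) - 1
    x &= mask
    sign = x >> (width - 1)
    fill = (mask << (width - n)) & mask if sign else 0   # n ones on top if negative
    return (x >> n) | fill


print(f"{sra(0b10111110, 2):08b}")   # arithmetic shift of 1011 1110 by 2
print(f"{srl(0b11001011, 2):08b}")   # logical shift of 1100 1011 by 2
```

With the sign bit replicated, an arithmetic right shift by n divides a two's complement value by 2^n, rounding toward minus infinity.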
ARM :: Barrel Shifter:
– Shift value can be either:
• 5-bit unsigned integer
• Specified in bottom byte of another register.
Example: ADD r0, r1, r2, LSL#7
• Semantics: r2 is shifted left by 7 & then added to r1
[Figure: Operand 1 goes straight to the ALU; Operand 2 passes through the Barrel Shifter before the ALU; the ALU produces Result]
Barrel Shifter, used in ICs
Shift right using one transistor per switch
[Figure: switch matrix connecting inputs A6 to A0 to outputs D3 to D0 under control lines SR0 to SR3]
Barrel Shifter, used in ICs
Shift left & right
[Figure: switch matrix connecting inputs A5 to A0 to outputs D3 to D0 under control lines SR0 to SR2 and SL1 to SL3]
Summary: Multiply & Shift
• Multiply: successive refinement to see final design
– 32-bit Adder, 64-bit shift register, 32-bit Multiplicand Register
• Fast multiply: array multiplier
• Shifter: successive refinement from a 1-bit-at-a-time shift register to a barrel shifter
Floating Point Arithmetic
• How to represent
– numbers with fractions, e.g., 3.1416
– very small numbers, e.g., .000000001
– very large numbers, e.g., 3.15576 x 10^9
• Fixed point
• Floating point: a number system with floating decimal point
• Normalized numbers: no leading 0’s, single digit before decimal point
1.0 x 10^-9    3.1557 x 10^9    350.03
Floating Point Notation – IEEE 754 FP
6.02 x 10^23      1.673 x 10^-24
[Figure: labels on the numbers above: mantissa (sign, magnitude), decimal point, radix (base), exponent (sign, magnitude)]
IEEE F.P.: ± 1.M x 2^(e - 127)
• Issues:
– Arithmetic (+, -, *, /)
– Representation, Normal form
– Range and Precision, Single, Double
– Rounding
– Exceptions (e.g., divide by zero, overflow, underflow)
Floating-Point Arithmetic
Floating point numbers in IEEE 754 standard:
single precision: S (1 bit) | E (8 bits) | M (23 bits)
sign: S
exponent: E, excess 127 binary integer; actual exponent is e = E - 127
mantissa: sign + magnitude, normalized binary significand w/ hidden integer bit: 1.M
N = (-1)^S x 2^(E-127) x (1.M), 0 < E < 255
0 = 0 00000000 0 . . . 0
-1.5 = 1 01111111 10 . . . 0
Numbers that can be represented are in the range:
2^-126 (1.0) to 2^127 (2 - 2^-23)
Double Precision IEEE 754 [64 bits]
Exponent = 11 bits, Bias = 1023, Mantissa = 52 bits, Sign = 1 bit
Exponent Bias used to simplify comparisons
• If we use 2’s complement, not good for sorting and comparison
• With a bias: 0000 0000 = most negative exponent, 1111 1111 = most positive exponent
Floating Point – Example review
• Fields: S | exponent | significand
• Represents (-1)^S x (1 + significand) x 2^(exp. - bias)
– bias = 127 for 32-bit word
– S = 1: negative, 0: positive or zero
• Example (from fraction to floating point representation): -0.75
Floating-Point Example - review
• Represent –0.75
– –0.75 = (–1)^1 × 1.1 (binary) × 2^–1
– S = 1
– Fraction = 1000…00 (binary)
– Exponent = –1 + Bias
• Single: –1 + 127 = 126 = 01111110 (binary)
• Double: –1 + 1023 = 1022 = 01111111110 (binary)
• Single: 1 01111110 1000…00
• Double: 1 01111111110 1000…00
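The field values worked out above can be cross-checked against the encoding Python itself uses, via the standard `struct` module. The helper names `single_fields` and `double_fields` are illustrative; each returns the tuple (S, E, M) with E still biased.

```python
import struct


def single_fields(x: float) -> tuple[int, int, int]:
    """Return (S, E, M) of x encoded as IEEE 754 single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def double_fields(x: float) -> tuple[int, int, int]:
    """Return (S, E, M) of x encoded as IEEE 754 double precision."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


s, e, m = single_fields(-0.75)
print(s, f"{e:08b}", f"{m:023b}")
s, e, m = double_fields(-0.75)
print(s, f"{e:011b}", f"{m:052b}")
```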
Addition – Multiply Algorithm issues
For addition (or subtraction):
(1) compute Ye - Xe (getting ready to align binary point)
(2) right shift Xm that many positions to form Xm x 2^(Xe-Ye)
(3) compute Xm x 2^(Xe-Ye) + Ym
(4) for multiply, doubly biased exponent must be corrected (extra subtraction step of the bias amount):
Excess 8 example:
Xe =  7:    1111 = 15 =  7 + 8
Ye = -3:  + 0101 =  5 = -3 + 8
           10100 = 20 =  4 + 8 + 8
Floating Point Addition
• Step 1: align, round
• Step 2: add
• Step 3: normalize, check overflow or underflow
• Step 4: round
• Example: 9.999 (ten) x 10^1 + 1.610 (ten) x 10^-1
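A sketch of the align, add, normalize and round steps in decimal. It assumes the operands are 9.999 x 10^1 and 1.610 x 10^-1 with four significant digits, which is how the garbled example line reads. The names `fp_add` and `DIGITS` are illustrative. Significands are held as 4-digit integers, so 9.999 is passed as 9999. The sketch only covers positive operands and results that need at most a right shift to normalize.

```python
DIGITS = 4  # significant decimal digits kept (an assumption of this sketch)


def fp_add(sig_x: int, exp_x: int, sig_y: int, exp_y: int) -> tuple[int, int]:
    """Add two positive decimal floats given as (DIGITS-digit integer, exponent)."""
    # Step 1: align the operand with the smaller exponent; digits shifted out are dropped.
    if exp_x < exp_y:
        sig_x, exp_x, sig_y, exp_y = sig_y, exp_y, sig_x, exp_x
    sig_y //= 10 ** (exp_x - exp_y)

    # Step 2: add the significands.
    sig, exp = sig_x + sig_y, exp_x

    # Steps 3 and 4: normalize back to DIGITS digits, rounding half up on the dropped digit.
    while sig >= 10 ** DIGITS:
        sig = (sig + 5) // 10
        exp += 1
    return sig, exp


sig, exp = fp_add(9999, 1, 1610, -1)
print(f"{sig / 10 ** (DIGITS - 1):.3f} x 10^{exp}")
```

A full adder would also check the exponent for overflow or underflow after normalizing, as Step 3 on the slide says.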
Floating Point Multiplication
• Step 1: add exponents, subtract bias, Mpy mantissas
• Step 2: normalize and check over/underflow
• Step 3: round
• Step 4: check sign
• Example: 0.5 x (–0.4375)
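The four multiplication steps can be sketched on IEEE 754 single-precision fields. This assumes the example is 0.5 x (–0.4375), which is how the garbled example line reads. `fields` and `fp_mul` are illustrative names. The sketch handles normalized inputs only and truncates instead of rounding, so it matches the hardware result only when the product is exact.

```python
import struct

BIAS = 127


def fields(x: float) -> tuple[int, int, int]:
    """Return (S, E, M) of x encoded as IEEE 754 single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def fp_mul(x: float, y: float) -> tuple[int, int, int]:
    """Multiply two normalized singles field by field; returns (S, E, M)."""
    sx, ex, mx = fields(x)
    sy, ey, my = fields(y)
    e = ex + ey - BIAS                           # Step 1: add exponents, subtract bias
    sig = ((1 << 23) | mx) * ((1 << 23) | my)    # multiply the 1.M significands
    if sig >> 47:                                # Step 2: product in [2, 4), normalize
        sig >>= 1
        e += 1
    if not 0 < e < 255:
        raise OverflowError("exponent overflow or underflow")
    m = (sig >> 23) & 0x7FFFFF                   # Step 3: low bits dropped, no rounding
    return sx ^ sy, e, m                         # Step 4: sign of the result


print(fp_mul(0.5, -0.4375), fields(0.5 * -0.4375))
```

Subtracting the bias once in Step 1 is the correction for the doubly biased exponent described on the algorithm issues slide.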
FP Adder Hardware
• More complex than integer adder
• Doing it in one clock cycle takes too long
– Much longer than integer operations
– Slower clock would penalize all instructions
• FP adder usually takes several cycles
– Can be pipelined
FP Adder Hardware
[Figure: FP adder datapath, with regions labeled Step 1 to Step 4]
Floating Point: Overflow & Underflow
• Overflow: exponent too large to be represented
• Underflow: negative exponent too small to fit in exponent field
Summary of Floating Point Arithmetic
• IEEE floating point standard 32 bit and 64 bit
• Converting decimal numbers to floating point and vice versa
• Overflow and underflow
• Floating point add and multiply