331 W07.1 Fall 2003
14:332:331 Computer Architecture and Assembly Language
Fall 2003
Week 7
[Adapted from Dave Patterson’s UCB CS152 slides and
Mary Jane Irwin’s PSU CSE331 slides]
331 W07.2 Fall 2003
331 W07.3 Fall 2003
Heads Up
This week's material
- MIPS logic and multiply instructions - Reading assignment – PH 4.4
- MIPS ALU design - Reading assignment – PH 4.5
Next week's material
- Building a MIPS datapath - Reading assignment – PH 5.1-5.2
331 W07.4 Fall 2003
Review: MIPS Arithmetic Instructions
R-type: op [31:26] | Rs [25:21] | Rt [20:16] | Rd [15:11] | shamt [10:6] | funct [5:0]
I-type: op [31:26] | Rs [25:21] | Rt [20:16] | Immed 16 [15:0]
Expand immediates to 32 bits before the ALU
10 operations, so the ALU operation can be encoded in 4 bits
[Figure: ALU with 32-bit inputs A and B, a 4-bit operation input m, a 32-bit result, and 1-bit zero and ovf outputs]

m (hex)  operation
0        add
1        addu
2        sub
3        subu
4        and
5        or
6        xor
7        nor
a        slt
b        sltu

Type    op   funct
ADD     00   100000
ADDU    00   100001
SUB     00   100010
SUBU    00   100011
AND     00   100100
OR      00   100101
XOR     00   100110
NOR     00   100111
        00   101000
        00   101001
SLT     00   101010
SLTU    00   101011
        00   101100
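To make the control-code table concrete, here is a minimal Python sketch of what each 4-bit code computes on 32-bit operands. It models behavior only, not gates, and assumes (the slide does not say) that ovf is raised only by the signed add and sub codes, while addu and subu never flag overflow. The names `alu` and `to_signed` are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def alu(m, a, b):
    """Return (result, zero, ovf) for 4-bit ALU control code m and 32-bit a, b."""
    a &= MASK32
    b &= MASK32
    ovf = 0
    if m in (0x0, 0x1):                                # add / addu
        result = (a + b) & MASK32
        ovf = int(m == 0x0 and to_signed(a) + to_signed(b) != to_signed(result))
    elif m in (0x2, 0x3):                              # sub / subu
        result = (a - b) & MASK32
        ovf = int(m == 0x2 and to_signed(a) - to_signed(b) != to_signed(result))
    elif m == 0x4:                                     # and
        result = a & b
    elif m == 0x5:                                     # or
        result = a | b
    elif m == 0x6:                                     # xor
        result = a ^ b
    elif m == 0x7:                                     # nor
        result = ~(a | b) & MASK32
    elif m == 0xA:                                     # slt: signed compare
        result = int(to_signed(a) < to_signed(b))
    elif m == 0xB:                                     # sltu: unsigned compare
        result = int(a < b)
    else:
        raise ValueError(f"unused ALU control code {m:#x}")
    return result, int(result == 0), ovf

# 0x7FFFFFFF + 1 overflows as a signed add but not as addu.
print(alu(0x0, 0x7FFFFFFF, 1))   # (2147483648, 0, 1)
print(alu(0x1, 0x7FFFFFFF, 1))   # (2147483648, 0, 0)
```

The same bit pattern 0x80000000 comes out of both codes; only the interpretation (signed or unsigned) decides whether ovf is set.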
331 W07.5 Fall 2003
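Review: A 32-bit Adder/Subtractor
Built out of 32 full adders (FAs)
[Figure: 32 1-bit FAs chained in ripple-carry fashion, with an add/subt control input; stage i takes Ai, Bi and carry ci and produces Si and ci+1; c0 = carry_in, c32 = carry_out]
1-bit FA: inputs A, B, carry_in; outputs S, carry_out
$$S = A \oplus B \oplus \text{carry\_in}$$
$$\text{carry\_out} = (A \wedge B) \vee (A \wedge \text{carry\_in}) \vee (B \wedge \text{carry\_in}) \quad \text{(majority function)}$$
Small but slow!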
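The ripple structure can be sketched in Python by chaining the 1-bit full-adder equations. The slide does not show how add/subt is wired; this sketch assumes the common arrangement in which add/subt is XORed into every Bi and also drives c0, so subtraction computes A + ~B + 1. `full_adder` and `ripple_add_sub` are illustrative names.

```python
def full_adder(a, b, cin):
    """One 1-bit FA: sum is the 3-input XOR, carry_out is the majority function."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def ripple_add_sub(a, b, subtract, n=32):
    """n chained FAs. Assumed wiring: add/subt inverts B and sets c0 = 1."""
    carry = subtract                      # c0 = carry_in
    result = 0
    for i in range(n):                    # bit i cannot finish until c_i arrives
        bi = ((b >> i) & 1) ^ subtract    # XOR with add/subt inverts B
        s, carry = full_adder((a >> i) & 1, bi, carry)
        result |= s << i
    return result, carry                  # carry is now c_n = carry_out

print(ripple_add_sub(5, 3, 1, n=4))       # (2, 1): 5 - 3 = 2 with carry_out 1
```

The loop makes the "slow" part visible: each iteration needs the carry produced by the previous one, so the delay grows linearly with n.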
331 W07.6 Fall 2003
Minimal Implementation of a Full Adder
Gate library: inverters, 2-input nands, or-and-inverters

library ieee;
use ieee.std_logic_1164.all;

entity full_adder is
  port (A, B, cin : in std_logic;
        S, cout   : out std_logic);
end full_adder;

architecture concurrent_behavior of full_adder is
  signal t1, t2, t3, t4, t5 : std_logic;
begin
  t1 <= not A after 1 ns;
  t2 <= not cin after 1 ns;
  t4 <= not((A or cin) and B) after 2 ns;
  t3 <= not((t1 or t2) and (A or cin)) after 2 ns;
  t5 <= t3 nand B after 2 ns;
  S <= not((B or t3) and t5) after 2 ns;
  cout <= not((t1 or t2) and t4) after 2 ns;
end concurrent_behavior;

Can you create the equivalent schematic? Can you determine the worst-case delay (the worst-case timing path through the circuit)?
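One way to check this netlist, and to estimate the worst-case path the question asks about, is a small Python model of the gates. It assumes every primary input arrives at t = 0 and uses the delays from the after clauses; the arrival times are a static longest-path (topological) estimate that ignores false paths and input-dependent settling. The names `NETLIST`, `evaluate` and `arrival_times` are hypothetical.

```python
from itertools import product

# (output net, delay in ns, input nets, gate function), in topological order,
# copied from the concurrent signal assignments of the full adder.
NETLIST = [
    ("t1", 1, ("A",), lambda a: 1 - a),
    ("t2", 1, ("cin",), lambda c: 1 - c),
    ("t4", 2, ("A", "cin", "B"), lambda a, c, b: 1 - ((a | c) & b)),
    ("t3", 2, ("t1", "t2", "A", "cin"), lambda x, y, a, c: 1 - ((x | y) & (a | c))),
    ("t5", 2, ("t3", "B"), lambda x, b: 1 - (x & b)),
    ("S", 2, ("B", "t3", "t5"), lambda b, x, y: 1 - ((b | x) & y)),
    ("cout", 2, ("t1", "t2", "t4"), lambda x, y, z: 1 - ((x | y) & z)),
]

def evaluate(a, b, cin):
    """Logic value of every net for one input combination."""
    nets = {"A": a, "B": b, "cin": cin}
    for out, _, ins, fn in NETLIST:
        nets[out] = fn(*(nets[i] for i in ins))
    return nets

def arrival_times():
    """Static worst-case arrival time of each net (ns), inputs at t = 0."""
    t = {"A": 0, "B": 0, "cin": 0}
    for out, delay, ins, _ in NETLIST:
        t[out] = max(t[i] for i in ins) + delay
    return t

# The netlist really is a full adder: check all 8 input combinations.
for a, b, c in product((0, 1), repeat=3):
    nets = evaluate(a, b, c)
    assert nets["S"] == a ^ b ^ c
    assert nets["cout"] == (a & b) | (a & c) | (b & c)

times = arrival_times()
print("S settles by", times["S"], "ns; cout by", times["cout"], "ns")
```

Under these assumptions the longest path runs A → t1 → t3 → t5 → S, giving 7 ns for S and 4 ns for cout.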
331 W07.7 Fall 2003
Logic Operations
Logic operations operate on individual bits of the operands.
$t1 = 0…0 0011 1100 0000
$t2 = 0…0 0000 1101 0000
and $t0, $t1, $t2    $t0 =
or  $t0, $t1, $t2    $t0 =
xor $t0, $t1, $t2    $t0 =
nor $t0, $t1, $t2    $t0 =
How do we expand our FA design to handle the logic operations and, or, xor, and nor?
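As a quick check of what the bitwise operations produce, this Python sketch applies the four operations to the slide's $t1 and $t2 values (all higher bits zero). Python integers are unbounded, so the nor result is masked to 32 bits; the variable names are illustrative.

```python
MASK32 = 0xFFFFFFFF
t1 = 0b0011_1100_0000   # $t1, upper 20 bits zero
t2 = 0b0000_1101_0000   # $t2, upper 20 bits zero

results = {
    "and": t1 & t2,
    "or": t1 | t2,
    "xor": t1 ^ t2,
    "nor": ~(t1 | t2) & MASK32,   # invert, then keep only 32 bits
}
for name, value in results.items():
    print(f"{name:>3} $t0 = {value:032b}")
```

Each output bit depends only on the two input bits in the same position, which is why these operations need no carry chain.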
331 W07.8 Fall 2003
A Simple ALU Cell
[Figure: simple ALU cell built around a 1-bit FA; inputs A, B and carry_in, controls add/subt and op; outputs result and carry_out]
331 W07.9 Fall 2003
An Alternative ALU Cell
[Figure: alternative ALU cell built around a 1-bit FA; inputs A, B and carry_in, select lines s2, s1, s0; outputs result and carry_out]
331 W07.10 Fall 2003
The Alternative ALU Cell’s Control Codes
function            result       c_in  s0  s1  s2
transfer A          A            0     0   0   0
increment A         A + 1        1     0   0   0
add                 A + B        0     1   0   0
add with carry      A + B + 1    1     1   0   0
subt with borrow    A – B – 1    0     0   1   0
subtract            A – B        1     0   1   0
decrement A         A – 1        0     1   1   0
transfer A          A            1     1   1   0
or                  A or B       x     0   0   1
xor                 A xor B      x     1   0   1
and                 A and B      x     0   1   1
complement A        !A           x     1   1   1
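Reading the table as an arithmetic unit, s0 appears to route B into the FA, s1 appears to route ~B, and s2 switches to the logic functions. That wiring is an inference from the codes, not something the slide states. The Python sketch below models a 4-bit word of such cells under that assumption (`alt_alu` is a hypothetical name) and checks every row of the table for A = 0110 and B = 0011.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1

def alt_alu(a, b, c_in, s0, s1, s2):
    """Hypothetical word-level model of WIDTH copies of the alternative ALU cell."""
    if s2:
        # Logic functions: (s0, s1) selects the operation; c_in is a don't-care.
        return {(0, 0): a | b, (1, 0): a ^ b,
                (0, 1): a & b, (1, 1): ~a & MASK}[(s0, s1)]
    # Arithmetic: FA's second input is B if s0, ~B if s1, all ones if both.
    y = (b if s0 else 0) | ((~b & MASK) if s1 else 0)
    return (a + y + c_in) & MASK

A, B = 0b0110, 0b0011
ROWS = [  # (function, expected result, c_in, s0, s1, s2); x is taken as 0
    ("transfer A",       A,                  0, 0, 0, 0),
    ("increment A",      (A + 1) & MASK,     1, 0, 0, 0),
    ("add",              (A + B) & MASK,     0, 1, 0, 0),
    ("add with carry",   (A + B + 1) & MASK, 1, 1, 0, 0),
    ("subt with borrow", (A - B - 1) & MASK, 0, 0, 1, 0),
    ("subtract",         (A - B) & MASK,     1, 0, 1, 0),
    ("decrement A",      (A - 1) & MASK,     0, 1, 1, 0),
    ("transfer A",       A,                  1, 1, 1, 0),
    ("or",               A | B,              0, 0, 0, 1),
    ("xor",              A ^ B,              0, 1, 0, 1),
    ("and",              A & B,              0, 0, 1, 1),
    ("complement A",     ~A & MASK,          0, 1, 1, 1),
]
for name, expected, c_in, s0, s1, s2 in ROWS:
    assert alt_alu(A, B, c_in, s0, s1, s2) == expected, name
```

Under this reading the arithmetic rows all reduce to A + Y + c_in with Y in {0, B, ~B, all ones}, which explains why one adder covers eight functions.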
331 W07.11 Fall 2003
Tailoring the ALU to the MIPS ISA
Need to support the set-on-less-than instruction (slt)
remember: slt is an arithmetic instruction
produces a 1 if rs < rt and 0 otherwise
use subtraction: (a - b) < 0 implies a < b, provided the subtraction does not overflow
Need to support test for equality (beq)
use subtraction: (a - b) = 0 implies a = b
Need to add the overflow detection hardware
331 W07.12 Fall 2003
Modifying the ALU Cell for slt
[Figure: the ALU cell extended with a less input; inputs A, B, carry_in and less, controls add/subt and op; outputs result and carry_out]
331 W07.13 Fall 2003
Modifying the ALU for slt
First perform a subtraction
Make the result 1 if the subtraction yields a negative result
Make the result 0 if the subtraction yields a zero or positive result
[Figure: 32 chained ALU cells (A0/B0 → result0, A1/B1 → result1, …, A31/B31 → result31), each with a less input]
331 W07.14 Fall 2003
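Modifying the ALU for Zero
First perform subtraction
Insert additional logic to detect when all result bits are zero
[Figure: the chained ALU cells (A0/B0 → result0, A1/B1 → result1, …, A31/B31 → result31) with controls add/subt and op, a set output from the most significant cell, a constant 0 on the upper less inputs, and logic that detects when all result bits are zero]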
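The slt and zero modifications can be sketched at the bit level in Python. This is a hypothetical model, not the slide's exact circuit: it assumes a 2-bit op encoding (0 = and, 1 = or, 2 = add/sub, 3 = less), feeds the most significant cell's adder output (set) into the less input of cell 0, ties the other less inputs to 0, and computes zero as the NOR of all result bits.

```python
AND, OR, ADD, SLT = 0, 1, 2, 3            # hypothetical op encoding

def alu_bits(a, b, binvert, op, n=32):
    """Bit-level model of n chained ALU cells with less/set wiring and zero detect."""
    carry = binvert                        # subtracting: invert B and set c0 = 1
    sums = []
    for i in range(n):                     # ripple the carry through every cell
        ai, bi = (a >> i) & 1, ((b >> i) & 1) ^ binvert
        sums.append(ai ^ bi ^ carry)
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    set_bit = sums[n - 1]                  # "set": adder output of the MSB cell
    results = []
    for i in range(n):                     # each cell's op multiplexer
        ai, bi = (a >> i) & 1, ((b >> i) & 1) ^ binvert
        less = set_bit if i == 0 else 0    # only cell 0 sees set
        results.append((ai & bi, ai | bi, sums[i], less)[op])
    result = sum(bit << i for i, bit in enumerate(results))
    zero = int(not any(results))           # NOR of all result bits
    return result, zero

print(alu_bits(3, 5, 1, SLT, n=4))             # 3 < 5 -> (1, 0)
print(alu_bits(7, 7, 1, ADD, n=4))             # 7 - 7 = 0 -> (0, 1), as beq uses
print(alu_bits(0b0111, 0b1000, 1, SLT, n=4))   # 7 < -8? overflow -> (1, 0)
```

The last call shows why overflow matters for slt: 7 - (-8) does not fit in 4 bits, the sign of the difference is wrong, and the sketch reports 7 < -8.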
331 W07.15 Fall 2003
Review: Overflow Detection
Overflow: the result is too large in magnitude to represent in the number of bits allocated
Overflow occurs when
- adding two positives yields a negative,
- or adding two negatives yields a positive,
- or subtracting a negative from a positive yields a negative,
- or subtracting a positive from a negative yields a positive
On your own: prove that you can detect overflow by computing carry into MSB xor carry out of MSB
Example (4 bits):
  0111 + 0011 = 1010, i.e. 7 + 3 = –6: carry into MSB = 1, carry out of MSB = 0, so overflow
  1100 + 1011 = 0111, i.e. –4 + –5 = 7: carry into MSB = 0, carry out of MSB = 1, so overflow
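The exercise can at least be checked exhaustively for 4-bit words. The Python sketch below (hypothetical names `add_with_carries`, `to_signed`) compares the definition of overflow, namely that the true signed sum is not representable, with carry into MSB xor carry out of MSB, for every pair of 4-bit operands and both carry-in values (carry-in 1 with an inverted operand is how subtraction is done). An exhaustive check of one width is evidence, not the requested proof.

```python
from itertools import product

N = 4                                        # word width used in the examples

def add_with_carries(x, y, c0=0, n=N):
    """Add two n-bit patterns plus carry-in c0.
    Returns (sum, carry into the MSB, carry out of the MSB)."""
    low = (1 << (n - 1)) - 1                 # mask for bits 0 .. n-2
    carry_into_msb = ((x & low) + (y & low) + c0) >> (n - 1)
    total = x + y + c0
    return total & ((1 << n) - 1), carry_into_msb, total >> n

def to_signed(v, n=N):
    """Two's-complement value of an n-bit pattern."""
    return v - (1 << n) if v >> (n - 1) else v

for x, y, c0 in product(range(1 << N), range(1 << N), (0, 1)):
    s, c_in_msb, c_out = add_with_carries(x, y, c0)
    overflow = to_signed(x) + to_signed(y) + c0 != to_signed(s)
    assert overflow == bool(c_in_msb ^ c_out)

print(add_with_carries(0b0111, 0b0011))   # (10, 1, 0): 1010 = -6, overflow
print(add_with_carries(0b1100, 0b1011))   # (7, 0, 1): 0111 = 7, overflow
```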
331 W07.16 Fall 2003
Modifying the ALU for Overflow
Modify the most significant cell to also produce the overflow output
Disable overflow bit setting for unsigned arithmetic
[Figure: the chained ALU cells (A0/B0 → result0, A1/B1 → result1, …, A31/B31 → result31) with less/set wiring, zero detection, controls add/subt and op, and an overflow output from the most significant cell]
331 W07.17 Fall 2003
Example: When do the result outputs settle at their final values for the inputs:
add/subt = 0, op = 000, A = 1111, B = 0001
[Figure, shown four times: 4-bit ALU built from four cells (A0/B0 → result0 through A3/B3 → result3) with less inputs, set, zero and overflow outputs, and op and add/subt controls; the copies are annotated with component delays along a 0–5 time axis]
331 W07.18 Fall 2003
Example: cont'd
When do the result outputs settle at their final values for the inputs:
add/subt = 0, op = 100, A = 1111, B = 0001
[Figure, shown four times: the same 4-bit ALU (A0/B0 → result0 through A3/B3 → result3; less, set, zero, overflow; op and add/subt controls) annotated with component delays along a 0–5 time axis]
331 W07.19 Fall 2003
Example: cont'd
When do the result outputs settle at their final values for the inputs:
add/subt = 1, op = 101, A = 1111, B = 0001
What is the zero output for these inputs?
[Figure, shown four times: the same 4-bit ALU (A0/B0 → result0 through A3/B3 → result3; less, set, zero, overflow; op and add/subt controls) annotated with component delays along a 0–5 time axis]
331 W07.20 Fall 2003
Example: cont'd
With the ALU design described in class, we assumed that a subtraction operation had to be performed as part of the beq instruction. When do the outputs settle?
Is there a faster alternative?
[Figure, shown four times: the same 4-bit ALU (A0/B0 → result0 through A3/B3 → result3; less, set, zero, overflow; op and add/subt controls) annotated with component delays along a 0–5 time axis]
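The slide leaves the question open. One commonly used alternative, offered here as an assumption rather than the course's answer, is to compare bit by bit: A = B exactly when every Ai xor Bi is 0, so equality needs one level of XOR gates plus a wide NOR and no carry propagation. The Python sketch below contrasts the two formulations; both function names are hypothetical.

```python
def equal_via_subtraction(a, b, n=32):
    """beq as assumed on the slide: compute a - b, then test the zero output.
    In hardware the subtraction's carry must ripple through all n cells first."""
    mask = (1 << n) - 1
    return int(((a - b) & mask) == 0)

def equal_via_xor(a, b, n=32):
    """Alternative: XOR each bit pair in parallel, then NOR the n XOR outputs.
    No carry passes between bit positions."""
    diff = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]   # one XOR per bit
    return int(not any(diff))                               # n-input NOR
```

Both give the same answer; the difference is only in how deep the hardware path is.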
331 W07.21 Fall 2003
But What about Performance?
The critical path of an n-bit ripple-carry adder is n*CP, where CP is the carry delay through one 1-bit ALU
Design trick – throw hardware at it (Carry Lookahead)
[Figure: four 1-bit ALUs in a ripple chain; ALU i takes Ai, Bi and CarryIn i and produces Result i and CarryOut i, with CarryOut i driving CarryIn i+1]
331 W07.22 Fall 2003
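Fast carry using "infinite" hardware (parallel)
$$c_{out} = b \cdot c_{in} + a \cdot c_{in} + a \cdot b$$
$$c_1 = (b_0 + a_0) \cdot c_0 + a_0 \cdot b_0 = a_0 \cdot b_0 + a_0 \cdot c_0 + b_0 \cdot c_0$$
$$c_2 = (b_1 + a_1) \cdot c_1 + a_1 \cdot b_1 = (b_1 + a_1) \cdot \big((b_0 + a_0) \cdot c_0 + a_0 \cdot b_0\big) + a_1 \cdot b_1$$
$$\quad = a_1 \cdot a_0 \cdot b_0 + a_1 \cdot a_0 \cdot c_0 + b_1 \cdot a_0 \cdot c_0 + b_1 \cdot a_0 \cdot b_0 + a_1 \cdot b_0 \cdot c_0 + b_1 \cdot b_0 \cdot c_0 + b_1 \cdot a_1$$
$$c_3 = a_2 \cdot a_1 \cdot a_0 \cdot b_0 + a_2 \cdot a_1 \cdot a_0 \cdot c_0 + a_2 \cdot b_1 \cdot a_0 \cdot c_0 + a_2 \cdot b_1 \cdot a_0 \cdot b_0 + a_2 \cdot a_1 \cdot b_0 \cdot c_0 + a_2 \cdot b_1 \cdot b_0 \cdot c_0 + a_2 \cdot b_1 \cdot a_1 + \dots$$
Outputs settle much faster:
$$D_{c_3} = 2\,D_{\text{and}} + D_{\text{or}} \ \text{(best case)} \qquad \dots \qquad D_{c_{31}} = 5\,D_{\text{and}} + D_{\text{or}} \ \text{(best case)}$$
Problem: prohibitively expensive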
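To see why the flattened equations blow up, this Python sketch expands c_n symbolically using cout = a·b + a·cin + b·cin with every lower carry substituted, as in the c2 and c3 expansions, and without any simplification. Product terms are frozensets of literal names; `expand_carry`, `eval_sop` and `check` are illustrative helpers.

```python
from itertools import product

def expand_carry(n):
    """Flattened sum-of-products for c_n (unsimplified).
    Each product term is a frozenset of literal names such as 'a0' or 'c0'."""
    terms = [frozenset({"c0"})]
    for i in range(n):
        a, b = f"a{i}", f"b{i}"
        # c_{i+1} = a_i b_i + a_i c_i + b_i c_i, with c_i already expanded
        terms = ([frozenset({a, b})]
                 + [t | {a} for t in terms]
                 + [t | {b} for t in terms])
    return terms

def eval_sop(terms, values):
    """OR over the terms of the AND of each term's literals."""
    return int(any(all(values[v] for v in t) for t in terms))

def check(n):
    """Compare the expansion against ordinary addition for every input; return term count."""
    terms = expand_carry(n)
    names = [f"a{i}" for i in range(n)] + [f"b{i}" for i in range(n)] + ["c0"]
    for bits in product((0, 1), repeat=len(names)):
        v = dict(zip(names, bits))
        a = sum(v[f"a{i}"] << i for i in range(n))
        b = sum(v[f"b{i}"] << i for i in range(n))
        assert eval_sop(terms, v) == (a + b + v["c0"]) >> n   # c_n
    return len(terms)

for n in range(1, 9):
    print(f"c{n}: {len(expand_carry(n))} product terms")
```

Each step doubles the previous term list and adds one a·b term, so c_n has 2^(n+1) − 1 product terms in this unsimplified form (7 for c2, 15 for c3), which is the expense the slide refers to.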
331 W07.23 Fall 2003
Hierarchical Solution I
Group 32 bits into 8 4-bit groups
Within each group, use carry lookahead
Use the 4-bit group as a building block, and connect the groups in ripple-carry fashion
331 W07.24 Fall 2003
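First Level: Propagate and Generate
$$c_{i+1} = (a_i \cdot b_i) + (a_i + b_i) \cdot c_i$$
$$g_i = a_i \cdot b_i \qquad p_i = a_i + b_i$$
$c_{i+1} = 1$ if $g_i = 1$, or if $p_i = 1$ and $c_i = 1$:
$$c_{i+1} = g_i + p_i \cdot c_i$$
$$c_1 = g_0 + (p_0 \cdot c_0)$$
$$c_2 = g_1 + (p_1 \cdot g_0) + (p_1 \cdot p_0 \cdot c_0)$$
$$c_3 = g_2 + (p_2 \cdot g_1) + (p_2 \cdot p_1 \cdot g_0) + (p_2 \cdot p_1 \cdot p_0 \cdot c_0)$$
$$c_4 = g_3 + (p_3 \cdot g_2) + (p_3 \cdot p_2 \cdot g_1) + (p_3 \cdot p_2 \cdot p_1 \cdot g_0) + (p_3 \cdot p_2 \cdot p_1 \cdot p_0 \cdot c_0)$$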
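The first-level equations translate directly into a 4-bit lookahead block. In the Python sketch below (`cla4` is a hypothetical name) every carry is computed from g, p and c0 in one flat AND-OR expression rather than from the previous carry, and each sum bit is then a_i xor b_i xor c_i.

```python
def cla4(a, b, c0):
    """4-bit carry lookahead adder: returns (4-bit sum, c4)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]   # g_i = a_i b_i
    p = [((a >> i) & 1) | ((b >> i) & 1) for i in range(4)]   # p_i = a_i + b_i
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    s = sum((((a >> i) ^ (b >> i) ^ carries[i]) & 1) << i for i in range(4))
    return s, c4
```

None of c1 to c4 waits on another carry, so all four are available after one AND level and one OR level (with wide enough gates).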
331 W07.25 Fall 2003
Hierarchical Solution I (16 bit)
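[Figure: four 4-bit carry-lookahead ALUs connected in ripple fashion; ALU0 takes A0–A3, B0–B3 and c0 = carry_in and produces result 0–3; ALU1 takes A4–A7, B4–B7 and c4 = carry_in and produces result 4–7; and so on]
Delay = 4 * Delay(4-bit carry-lookahead ALU)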
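Solution I can be sketched by wrapping the lookahead logic in a group function and chaining four groups, so the carry still ripples, but one 4-bit group at a time. The helpers `cla_group` and `add16_solution1` are hypothetical; the group carries use the general form c_j = g_{j-1} + p_{j-1}·g_{j-2} + … + p_{j-1}…p_0·c_in.

```python
def cla_group(a, b, c_in, width=4):
    """One carry-lookahead group; every internal carry is a flat AND-OR of g, p, c_in."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]      # generate
    p = [((a >> i) | (b >> i)) & 1 for i in range(width)]    # propagate
    carries = []
    for j in range(width + 1):
        terms = [g[k] and all(p[k + 1:j]) for k in range(j)]  # g_k p_{k+1}...p_{j-1}
        terms.append(c_in and all(p[:j]))                     # p_{j-1}...p_0 c_in
        carries.append(int(any(terms)))
    s = 0
    for i in range(width):
        s |= ((((a ^ b) >> i) & 1) ^ carries[i]) << i
    return s, carries[width]

def add16_solution1(a, b, c0=0):
    """Hierarchical solution I: four 4-bit lookahead groups connected in ripple fashion."""
    carry, result = c0, 0
    for grp in range(4):                  # group carry-out feeds the next group
        s, carry = cla_group((a >> 4 * grp) & 0xF, (b >> 4 * grp) & 0xF, carry)
        result |= s << (4 * grp)
    return result, carry
```

The outer loop is the remaining ripple: the fourth group cannot finish until the first three have, which is where Delay = 4 * Delay(4-bit carry-lookahead ALU) comes from.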
331 W07.26 Fall 2003
Hierarchical Solution II
Hierarchical solution I
- Group 32 bits into 8 4-bit groups
- Within each group, use carry lookahead
- Use the 4-bit group as a building block, and connect the groups in ripple-carry fashion
Hierarchical solution II
- Group 32 bits into 8 4-bit groups
- Within each group, use carry lookahead
- Use another level of carry lookahead to connect these 4-bit groups
331 W07.27 Fall 2003
Hierarchical Solution II
[Figure: four 4-bit ALUs (A0–A3/B0–B3 → result 0–3, A4–A7/B4–B7 → result 4–7, A8–A11/B8–B11 → result 8–11, A12–A15/B12–B15 → result 12–15) pass P0G0, P1G1, P2G2 and P3G3 to a carry-lookahead unit; the unit takes cin, returns C1, C2 and C3 to the upper three ALUs, and produces cout; internally it forms each next carry from p, g and the previous carry]
• input a0-a15, b0-b15
• calculate P0-P3, G0-G3
• calculate C1-C4
• each 4-bit ALU calculates its results
331 W07.28 Fall 2003
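Fast Carry using the Second-Level Abstraction
$$P_0 = p_3 \cdot p_2 \cdot p_1 \cdot p_0 \qquad P_1 = p_7 \cdot p_6 \cdot p_5 \cdot p_4 \qquad P_2 = p_{11} \cdot p_{10} \cdot p_9 \cdot p_8 \qquad P_3 = p_{15} \cdot p_{14} \cdot p_{13} \cdot p_{12}$$
$$G_0 = g_3 + (p_3 \cdot g_2) + (p_3 \cdot p_2 \cdot g_1) + (p_3 \cdot p_2 \cdot p_1 \cdot g_0)$$
$$G_1 = g_7 + (p_7 \cdot g_6) + (p_7 \cdot p_6 \cdot g_5) + (p_7 \cdot p_6 \cdot p_5 \cdot g_4)$$
$$G_2 = g_{11} + (p_{11} \cdot g_{10}) + (p_{11} \cdot p_{10} \cdot g_9) + (p_{11} \cdot p_{10} \cdot p_9 \cdot g_8)$$
$$G_3 = g_{15} + (p_{15} \cdot g_{14}) + (p_{15} \cdot p_{14} \cdot g_{13}) + (p_{15} \cdot p_{14} \cdot p_{13} \cdot g_{12})$$
$$C_1 = G_0 + (P_0 \cdot c_0)$$
$$C_2 = G_1 + (P_1 \cdot G_0) + (P_1 \cdot P_0 \cdot c_0)$$
$$C_3 = G_2 + (P_2 \cdot G_1) + (P_2 \cdot P_1 \cdot G_0) + (P_2 \cdot P_1 \cdot P_0 \cdot c_0)$$
$$C_4 = G_3 + (P_3 \cdot G_2) + (P_3 \cdot P_2 \cdot G_1) + (P_3 \cdot P_2 \cdot P_1 \cdot G_0) + (P_3 \cdot P_2 \cdot P_1 \cdot P_0 \cdot c_0)$$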
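The two-level scheme can be sketched in Python as follows. One generic 4-input lookahead function does three jobs: forming each group's G from its g and p, forming C1 to C4 from the G's and P's, and forming the carries inside each group once its carry-in is known. `lookahead4`, `group_bits` and `cla16` are hypothetical names, and only the adder part of the ALU is modeled.

```python
def lookahead4(g, p, c0):
    """4-input lookahead unit: returns [c0, c1, c2, c3, c4] where
    c_j = g_{j-1} + p_{j-1} g_{j-2} + ... + p_{j-1}...p_0 c0."""
    out = [c0]
    for j in range(1, 5):
        terms = [g[k] and all(p[k + 1:j]) for k in range(j)]
        terms.append(c0 and all(p[:j]))
        out.append(int(any(terms)))
    return out

def group_bits(v, k):
    """The four entries of list v that belong to 4-bit group k."""
    return v[4 * k:4 * k + 4]

def cla16(a, b, c0=0):
    """Hierarchical solution II: 16-bit adder with two levels of lookahead."""
    g = [(a >> i) & (b >> i) & 1 for i in range(16)]        # g_i = a_i b_i
    p = [((a >> i) | (b >> i)) & 1 for i in range(16)]      # p_i = a_i + b_i
    P = [int(all(group_bits(p, k))) for k in range(4)]      # P_k: all four p's
    G = [lookahead4(group_bits(g, k), group_bits(p, k), 0)[4] for k in range(4)]
    C = lookahead4(G, P, c0)                                # C_1..C_4 in parallel
    total = 0
    for k in range(4):                                      # group k starts from C_k
        cs = lookahead4(group_bits(g, k), group_bits(p, k), C[k])
        for i in range(4):
            bit = 4 * k + i
            total |= ((((a ^ b) >> bit) & 1) ^ cs[i]) << bit
    return total, C[4]
```

Setting the carry-in to 0 when computing G_k reproduces the G_0 to G_3 equations exactly, since the c0 term drops out.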
331 W07.29 Fall 2003
Shift Operations
Also need operations to pack and unpack 8-bit characters into 32-bit words
Shifts move all the bits in a word left or right
sll $t2, $s0, 8     #$t2 = $s0 << 8 bits
srl $t2, $s0, 8     #$t2 = $s0 >> 8 bits
Such shifts are logical because they fill with zeros
        op     rs    rt    rd    shamt funct
sll     000000 00000 10000 01010 01000 000000
srl     000000 00000 10000 01010 01000 000010
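The two encodings can be reproduced by packing the six R-type fields. This Python sketch assumes the standard MIPS register numbers $s0 = 16 and $t2 = 10, which match the rt and rd fields shown; `encode_r` and `fields` are illustrative helpers.

```python
def encode_r(rs, rt, rd, shamt, funct, op=0):
    """Pack the R-type fields (6/5/5/5/5/6 bits) into one 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def fields(word):
    """Show a 32-bit word split into its op rs rt rd shamt funct fields."""
    b = f"{word:032b}"
    return " ".join([b[0:6], b[6:11], b[11:16], b[16:21], b[21:26], b[26:32]])

S0, T2 = 16, 10                                 # $s0 and $t2 register numbers
print(fields(encode_r(0, S0, T2, 8, 0b000000)))  # sll $t2, $s0, 8
print(fields(encode_r(0, S0, T2, 8, 0b000010)))  # srl $t2, $s0, 8
```

For shifts the rs field is unused (zero) and the shift amount travels in shamt.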
331 W07.30 Fall 2003
Shift Operations, cont'd
An arithmetic shift (sra) maintains the arithmetic correctness of the shifted value (i.e., a number shifted right one bit should be ½ of its original value, rounded toward negative infinity when the value is odd; a number shifted left should be 2 times its original value)
So sra uses the most significant bit (sign bit) as the bit shifted in
Note that there is no need for an sla when using two's complement number representation, since sll already doubles the value
sra $t2, $s0, 8     #$t2 = $s0 >> 8 bits (sign bit shifted in)
The shift operation is implemented by hardware (usually a barrel shifter) outside the ALU
        op     rs    rt    rd    shamt funct
sra     000000 00000 10000 01010 01000 000011
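The difference between the three shifts is easiest to see on a negative value. The Python sketch below models sll, srl and sra on 32-bit patterns (the function names mirror the mnemonics but are only illustrations); it relies on Python's >> being arithmetic on negative integers, which rounds toward negative infinity.

```python
MASK32 = 0xFFFFFFFF

def sll(x, n):
    """Logical left shift: zeros enter at the right."""
    return (x << n) & MASK32

def srl(x, n):
    """Logical right shift: zeros enter at the left."""
    return (x & MASK32) >> n

def sra(x, n):
    """Arithmetic right shift: copies of the sign bit enter at the left."""
    x &= MASK32
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> n) & MASK32

print(hex(srl(0xFFFFFF00, 8)))   # -256 as a pattern, logical: 0xffffff
print(hex(sra(0xFFFFFF00, 8)))   # -256 >> 8 = -1: 0xffffffff
print(hex(sll(0xFFFFFFFF, 1)))   # -1 * 2 = -2: 0xfffffffe
```

The last line illustrates why no sla is needed: the ordinary logical left shift already doubles a two's complement value (as long as it does not overflow).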
331 W07.31 Fall 2003
Multiplication
More complicated than addition; accomplished via shifting and addition
        0010       (multiplicand)
      x 1011       (multiplier)
    --------
        0010
       0010        (partial product array)
      0000
     0010
    --------
    00010110       (product)
A double-precision product is produced (two n-bit operands give a 2n-bit product)
More time and more area to compute
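The pencil-and-paper procedure maps onto a simple shift-and-add loop. This Python sketch handles unsigned operands only and also returns the partial products so they can be compared with the array above; `shift_add_multiply` is a hypothetical name.

```python
def shift_add_multiply(multiplicand, multiplier, n=4):
    """Unsigned n x n -> 2n-bit multiply by shifting and adding.
    Returns (product, partial products), one partial product per multiplier bit."""
    partials = []
    product = 0
    for i in range(n):
        bit = (multiplier >> i) & 1
        pp = (multiplicand << i) if bit else 0   # multiplicand or 0, shifted i places
        partials.append(pp)
        product += pp
    return product, partials

product, partials = shift_add_multiply(0b0010, 0b1011)
print(f"{product:08b}")   # 00010110
```

Each multiplier bit costs one addition, which is why multiplication takes more time (or, if the additions are done in parallel, more area) than a single add.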
331 W07.32 Fall 2003
MIPS Multiply Instruction
mult $s0, $s1     # hi||lo = $s0 * $s1
Low-order word of the product is left in processor register lo and the high-order word is left in register hi
Instructions mfhi rd and mflo rd are provided to move the product to (user accessible) registers in the register file
        op     rs    rt    rd    shamt funct
mult    000000 10000 10001 00000 00000 011000
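A short Python sketch of how the 64-bit product is split between hi and lo. It assumes mult treats its operands as signed and multu as unsigned, as the instruction names indicate; the mfhi/mflo copies into the register file are not modeled, and the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF

def signed32(x):
    """Two's-complement value of a 32-bit pattern."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mult(rs, rt):
    """Signed multiply: returns (hi, lo) of the 64-bit two's-complement product."""
    product = (signed32(rs) * signed32(rt)) & ((1 << 64) - 1)
    return product >> 32, product & MASK32

def multu(rs, rt):
    """Unsigned multiply: operands treated as 0 .. 2^32 - 1."""
    product = (rs & MASK32) * (rt & MASK32)
    return product >> 32, product & MASK32

hi, lo = mult(0xFFFFFFFF, 2)     # -1 * 2 = -2
print(hex(hi), hex(lo))          # 0xffffffff 0xfffffffe
```

The same operands give hi = 1 under multu, since 0xFFFFFFFF is then 2^32 − 1 rather than −1.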
331 W07.33 Fall 2003
Review: MIPS ISA, so far
Category        Instr               Op Code    Example               Meaning
Arithmetic      add                 0 and 32   add $s1, $s2, $s3     $s1 = $s2 + $s3
(R & I format)  add unsigned        0 and 33   addu $s1, $s2, $s3    $s1 = $s2 + $s3
                subtract            0 and 34   sub $s1, $s2, $s3     $s1 = $s2 - $s3
                subt unsigned       0 and 35   subu $s1, $s2, $s3    $s1 = $s2 - $s3
                add immediate       8          addi $s1, $s2, 6      $s1 = $s2 + 6
                add imm. unsigned   9          addiu $s1, $s2, 6     $s1 = $s2 + 6
Logical         and                 0 and 36   and $s1, $s2, $s3     $s1 = $s2 & $s3
(R & I format)  and immediate       12         andi $s1, $s2, 6      $s1 = $s2 & 6
                or                  0 and 37   or $s1, $s2, $s3      $s1 = $s2 | $s3
                or immediate        13         ori $s1, $s2, 6       $s1 = $s2 | 6
                xor                 0 and 38   xor $s1, $s2, $s3     $s1 = $s2 xor $s3
                nor                 0 and 39   nor $s1, $s2, $s3     $s1 = !($s2 | $s3)
                xor immediate       14         xori $s1, $s2, 6      $s1 = $s2 xor 6

                multiply            0 and 24   mult $s1, $s2         hi || lo = $s1 * $s2
                multiply unsigned   0 and 25   multu $s1, $s2        hi || lo = $s1 * $s2
                divide              0 and 26   div $s1, $s2          lo = $s1/$s2, rem. in hi
                divide unsigned     0 and 27   divu $s1, $s2         lo = $s1/$s2, rem. in hi
331 W07.34 Fall 2003
Review: MIPS ISA, so far cont'd
Category        Instr               Op Code    Example               Meaning
Shift           sll                 0 and 0    sll $s1, $s2, 4       $s1 = $s2 << 4
(R format)      srl                 0 and 2    srl $s1, $s2, 4       $s1 = $s2 >> 4
                sra                 0 and 3    sra $s1, $s2, 4       $s1 = $s2 >> 4 (sign bit shifted in)

                move from hi        0 and 16   mfhi $s1              $s1 = hi
                move to hi          0 and 17   mthi $s1              hi = $s1
                move from lo        0 and 18   mflo $s1              $s1 = lo
                move to lo          0 and 19   mtlo $s1              lo = $s1
Data Transfer   load word           35         lw $s1, 24($s2)       $s1 = Memory($s2+24)
(I format)      store word          43         sw $s1, 24($s2)       Memory($s2+24) = $s1
                load byte           32         lb $s1, 25($s2)       $s1 = Memory($s2+25)
                load byte unsigned  36         lbu $s1, 25($s2)      $s1 = Memory($s2+25)
                store byte          40         sb $s1, 25($s2)       Memory($s2+25) = $s1
                load upper imm      15         lui $s1, 6            $s1 = 6 * 2^16
331 W07.35 Fall 2003
Review: MIPS ISA, so far cont'd
Category          Instr                           Op Code    Example               Meaning
Cond. Branch      br on equal                     4          beq $s1, $s2, L       if ($s1==$s2) go to L
(I & R format)    br on not equal                 5          bne $s1, $s2, L       if ($s1 !=$s2) go to L
                  set on less than                0 and 42   slt $s1, $s2, $s3     if ($s2<$s3) $s1=1 else $s1=0
                  set on less than unsigned       0 and 43   sltu $s1, $s2, $s3    if ($s2<$s3) $s1=1 else $s1=0
                  set on less than immediate      10         slti $s1, $s2, 6      if ($s2<6) $s1=1 else $s1=0
                  set on less than imm. unsigned  11         sltiu $s1, $s2, 6     if ($s2<6) $s1=1 else $s1=0
Uncond. Jump      jump                            2          j 2500                go to 10000
(J & R format)    jump register                   0 and 8    jr $s1                go to $s1
                  jump and link                   3          jal 2500              go to 10000; $ra=PC+4
                  jump and link reg               0 and 9    jalr $s1, $s2         go to $s1, $s2=PC+4