ECE 15B Computer Organization, Spring 2010
Dmitri Strukov
Lecture 4: Arithmetic / Data Transfer Instructions
Partially adapted from Computer Organization and Design, 4th edition, Patterson and Hennessy, and classes taught by Ryan Kastner at UCSB
ECE 15B Spring 2010
Agenda
• Review of last lecture
• Load/store operations
• Multiply and divide instructions
Last Lecture
Assembly Language
• Basic job of a CPU: execute lots of instructions
• Instructions are the primitive operations that the CPU may execute
• Different CPUs implement different sets of instructions
• An Instruction Set Architecture (ISA) is the set of instructions a particular CPU implements
• Examples: Intel 80x86 (Pentium 4), IBM/Motorola PowerPC (Macintosh), MIPS, Intel IA64, ARM
Assembly Variables: Registers
• Unlike HLLs such as C or Java, assembly cannot use variables
– Why not? Keep hardware simple
• Assembly operands are registers
– Limited number of special locations built directly into the hardware
– Operations can only be performed on these
– Benefit: since the register file is small, it is very fast
Assembly Variables: Registers
• By convention, each register also has a name to make it easier to code
• For now:
$16 - $23 → $s0 - $s7 (correspond to C variables)
$8 - $15 → $t0 - $t7 (correspond to temporary variables)
Will explain other 16 register names later
• In general, use names to make your code more readable
MIPS Syntax
• Instruction syntax:
[Label:] Op-code [oper. 1], [oper. 2], [oper. 3], [#comment]
  (0)      (1)      (2)        (3)        (4)        (5)
– Where
1) operation name
2, 3, 4) operands
5) comments
0) label field is optional, will discuss later
– For arithmetic and logic instructions
2) operand getting result (“destination”)
3) 1st operand for operation (“source 1”)
4) 2nd operand for operation (“source 2”)
• Syntax is rigid
– 1 operator, 3 operands
– Why? Keep hardware simple via regularity
Addition and Subtraction of Integers
• Addition in assembly
– Example: add $s0, $s1, $s2 (in MIPS)
• Equivalent to: a = b + c (in C)
• Where MIPS registers $s0, $s1, $s2 are associated with C variables a, b, c
• Subtraction in assembly
– Example: sub $s3, $s4, $s5 (in MIPS)
• Equivalent to: d = e - f (in C)
• Where MIPS registers $s3, $s4, $s5 are associated with C variables d, e, f
Addition and Subtraction of Integers
• How do we do this? f = (g + h) - (i + j)
• Use intermediate temporary registers (f in $s0; g, h, i, j in $s1 to $s4)
add $t0, $s1, $s2 # temp0 = g + h
add $t1, $s3, $s4 # temp1 = i + j
sub $s0, $t0, $t1 # f = (g + h) - (i + j)
Immediates
• Immediates are numerical constants
• They appear often in code, so there are special instructions for them
• Add immediate:
addi $s0, $s1, 10 # f = g + 10 (in C)
– Where MIPS registers $s0 and $s1 are associated with C variables f and g
– Syntax similar to add instruction, except that the last argument is a number instead of a register
Load and Store Instructions
CPU Overview
… with muxes
• Can’t just join wires together
• Use multiplexers
Memory Operands
• Main memory used for composite data
– Arrays, structures, dynamic data
• To apply arithmetic operations
– Load values from memory into registers
– Store result from register to memory
• Memory is byte addressed
– Each address identifies an 8-bit byte
• Words are aligned in memory
– Address must be a multiple of 4
• MIPS is Big Endian
– Most-significant byte at the lowest address of a word
– cf. Little Endian: least-significant byte at the lowest address
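To make the byte-order and alignment bullets concrete, here is a minimal C sketch (not from the original slides) that assembles a 32-bit word from four consecutive bytes in both byte orders. The function names and the `mem` array standing in for main memory are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* A word address is aligned when it is a multiple of 4. */
int is_word_aligned(uint32_t addr)
{
    return (addr % 4u) == 0u;
}

/* Big-endian: the byte at the lowest address is the most significant. */
uint32_t load_word_big_endian(const uint8_t *mem, uint32_t addr)
{
    assert(is_word_aligned(addr));
    return ((uint32_t)mem[addr] << 24) |
           ((uint32_t)mem[addr + 1] << 16) |
           ((uint32_t)mem[addr + 2] << 8) |
           (uint32_t)mem[addr + 3];
}

/* Little-endian, for comparison: the byte at the lowest address
   is the least significant. */
uint32_t load_word_little_endian(const uint8_t *mem, uint32_t addr)
{
    assert(is_word_aligned(addr));
    return (uint32_t)mem[addr] |
           ((uint32_t)mem[addr + 1] << 8) |
           ((uint32_t)mem[addr + 2] << 16) |
           ((uint32_t)mem[addr + 3] << 24);
}
```

With the bytes 0x12, 0x34, 0x56, 0x78 stored at addresses 0 to 3, the big-endian load gives 0x12345678 and the little-endian load gives 0x78563412.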
Data Transfer: Memory to Register
• MIPS load instruction syntax:
lw register#, offset(register#)
(1)   (2)      (3)      (4)
Where
1) operation name
2) register that will receive value
3) numerical offset in bytes
4) register containing pointer to memory
• lw means Load Word: 32 bits (one word) are loaded at a time
Data Transfer: Register to Memory
• MIPS store instruction syntax:
sw register#, offset(register#)
(1)   (2)      (3)      (4)
Where
1) operation name
2) register whose value will be written to memory
3) numerical offset in bytes
4) register containing pointer to memory
• sw means Store Word: 32 bits (one word) are stored at a time
Memory Operand Example 1
• C code: g = h + A[8];
– g in $s1, h in $s2, base address of A in $s3
• Compiled MIPS code:
– Index 8 requires offset of 32
• 4 bytes per word
lw $t0, 32($s3) # load word: offset 32, base register $s3
add $s1, $s2, $t0
Memory Operand Example 2
• C code: A[12] = h + A[8];
– h in $s2, base address of A in $s3
• Compiled MIPS code:
– Index 8 requires offset of 32, index 12 requires offset of 48
lw $t0, 32($s3) # load word
add $t0, $s2, $t0
sw $t0, 48($s3) # store word
Registers vs. Memory
• Registers are faster to access than memory
• Operating on memory data requires loads and stores
– More instructions to be executed
• Compiler must use registers for variables as much as possible
– Only spill to memory for less frequently used variables
– Register optimization is important!
Byte/Halfword Operations
• MIPS byte/halfword load/store
– String processing is a common case
lb rt, offset(rs) lh rt, offset(rs)
– Sign extend to 32 bits in rt
lbu rt, offset(rs) lhu rt, offset(rs)
– Zero extend to 32 bits in rt
sb rt, offset(rs) sh rt, offset(rs)
– Store just rightmost byte/halfword
• Why do we need them? Characters and multimedia data are expressed in fewer than 32 bits; dedicated 8-bit and 16-bit load and store instructions make operating on such data faster
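The difference between the sign-extending and zero-extending byte loads can be sketched in C. The functions below are hypothetical models named after the instructions, with `mem` standing in for byte-addressed memory; they are not part of the original slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* lb: load one byte and sign-extend it to 32 bits. */
int32_t lb(const uint8_t *mem, uint32_t addr)
{
    assert(mem != NULL);
    int32_t value = mem[addr];       /* 0..255 */
    if (value & 0x80)
        value -= 256;                /* bit 7 set: value is negative */
    return value;
}

/* lbu: load one byte and zero-extend it to 32 bits. */
uint32_t lbu(const uint8_t *mem, uint32_t addr)
{
    assert(mem != NULL);
    return (uint32_t)mem[addr];
}

/* sb: store only the rightmost byte of the register value. */
void sb(uint8_t *mem, uint32_t addr, uint32_t reg)
{
    assert(mem != NULL);
    mem[addr] = (uint8_t)(reg & 0xFFu);
}
```

For the byte 0xFF, `lb` yields -1 while `lbu` yields 255; storing 0x12345678 with `sb` writes only 0x78.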
Two’s Complement Representation, Multiply and Divide
Unsigned Binary Integers
• Given an n-bit number
$x = x_{n-1} 2^{n-1} + x_{n-2} 2^{n-2} + \cdots + x_1 2^1 + x_0 2^0$
• Range: 0 to $+2^n - 1$
• Example
0000 0000 0000 0000 0000 0000 0000 1011₂
$= 0 + \cdots + 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0$
$= 0 + \cdots + 8 + 0 + 2 + 1 = 11_{10}$
• Using 32 bits: 0 to +4,294,967,295
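As an illustration of the positional sum, the following C sketch evaluates a bit string written most-significant bit first. The function name `unsigned_value` is hypothetical, and the sketch assumes at most 32 bits; spaces in the string are skipped.

```c
#include <assert.h>
#include <stdint.h>

/* Value of an unsigned binary number given as a string of '0'/'1'
   characters, most-significant bit first. Other characters (such as
   spaces) are ignored. Assumes at most 32 bits. */
uint32_t unsigned_value(const char *bits)
{
    uint32_t x = 0;
    int n = 0;
    for (const char *p = bits; *p != '\0'; p++) {
        if (*p == '0' || *p == '1') {
            assert(n < 32);
            /* Doubling and adding the next bit is the same sum of
               x_i * 2^i, evaluated from the left. */
            x = x * 2u + (uint32_t)(*p - '0');
            n++;
        }
    }
    return x;
}
```

Applied to the slide's example, `unsigned_value("0000 0000 0000 0000 0000 0000 0000 1011")` evaluates to 11.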
2s-Complement Signed Integers
• Given an n-bit number
$x = -x_{n-1} 2^{n-1} + x_{n-2} 2^{n-2} + \cdots + x_1 2^1 + x_0 2^0$
• Range: $-2^{n-1}$ to $+2^{n-1} - 1$
• Example
1111 1111 1111 1111 1111 1111 1111 1100₂
$= -1 \times 2^{31} + 1 \times 2^{30} + \cdots + 1 \times 2^2 + 0 \times 2^1 + 0 \times 2^0$
$= -2{,}147{,}483{,}648 + 2{,}147{,}483{,}644 = -4_{10}$
• Using 32 bits: –2,147,483,648 to +2,147,483,647
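The only change from the unsigned formula is the negative weight on the top bit. A minimal C sketch of that formula, with a hypothetical function name and a 64-bit result so every intermediate sum fits:

```c
#include <assert.h>
#include <stdint.h>

/* Interpret the low n bits of `bits` as an n-bit two's-complement
   number, for 1 <= n <= 32. */
int64_t twos_complement_value(uint32_t bits, int n)
{
    assert(n >= 1 && n <= 32);
    int64_t x = 0;
    for (int i = 0; i < n - 1; i++) {
        if ((bits >> i) & 1u)
            x += (int64_t)1 << i;        /* + x_i * 2^i           */
    }
    if ((bits >> (n - 1)) & 1u)
        x -= (int64_t)1 << (n - 1);      /* - x_{n-1} * 2^{n-1}   */
    return x;
}
```

For the slide's example, `twos_complement_value(0xFFFFFFFC, 32)` evaluates to -4.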
2s-Complement Signed Integers
• Bit 31 is sign bit
– 1 for negative numbers
– 0 for non-negative numbers
• $-(-2^{n-1})$ can’t be represented
• Non-negative numbers have the same unsigned and 2s-complement representation
• Some specific numbers
– 0: 0000 0000 … 0000
– –1: 1111 1111 … 1111
– Most-negative: 1000 0000 … 0000
– Most-positive: 0111 1111 … 1111
Signed Negation
• Complement and add 1
– Complement means 1 → 0, 0 → 1
$x + \bar{x} = 1111\ldots111_2 = -1$
$\bar{x} + 1 = -x$
• Example: negate +2
– +2 = 0000 0000 … 0010₂
– –2 = 1111 1111 … 1101₂ + 1 = 1111 1111 … 1110₂
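The complement-and-add-1 rule maps directly onto C's bitwise operators. The sketch below works on the raw 32-bit pattern; the function name `negate` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Negate a 32-bit two's-complement bit pattern:
   complement every bit, then add 1. */
uint32_t negate(uint32_t x)
{
    uint32_t complement = ~x;
    /* x plus its complement is all ones, i.e. the pattern for -1. */
    assert((uint32_t)(x + complement) == UINT32_MAX);
    return complement + 1u;
}
```

Negating the pattern for +2 gives 0xFFFFFFFE, the pattern for -2. Negating the most-negative pattern 0x80000000 returns 0x80000000 again, since its positive counterpart cannot be represented in 32 bits.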
Sign Extension
• Representing a number using more bits
– Preserve the numeric value
• In MIPS instruction set
– addi: extend immediate value
– lb, lh: extend loaded byte/halfword
– beq, bne: extend the displacement
• Replicate the sign bit to the left
– cf. unsigned values: extend with 0s
• Examples: 8-bit to 16-bit
– +2: 0000 0010 => 0000 0000 0000 0010
– –2: 1111 1110 => 1111 1111 1111 1110
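Replicating the sign bit can be sketched in C for an arbitrary source width. The function name `sign_extend` and its `from_bits` parameter are hypothetical; the result is extended to 32 bits, so the 16-bit patterns in the examples are its low half.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `from_bits` bits of `value` to 32 bits by
   copying the sign bit into every higher bit position. */
uint32_t sign_extend(uint32_t value, int from_bits)
{
    assert(from_bits >= 1 && from_bits <= 32);
    if (from_bits == 32)
        return value;
    uint32_t sign = 1u << (from_bits - 1);   /* position of sign bit  */
    uint32_t low_mask = (sign << 1) - 1u;    /* the low from_bits bits */
    value &= low_mask;
    if (value & sign)
        value |= ~low_mask;                  /* fill upper bits with 1s */
    return value;
}
```

For example, `sign_extend(0xFE, 8)` gives 0xFFFFFFFE (low 16 bits 0xFFFE, matching the –2 example), while `sign_extend(0x02, 8)` stays 2.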
Integer Addition
• Example: 7 + 6
Integer Subtraction
• Add negation of second operand
• Example: 7 – 6 = 7 + (–6)
+7: 0000 0000 … 0000 0111
–6: 1111 1111 … 1111 1010
+1: 0000 0000 … 0000 0001
Multiplication
• Start with long-multiplication approach
      1000    multiplicand
   ×  1001    multiplier
      1000
     0000
    0000
   1000
   1001000    product
• Length of product is the sum of operand lengths
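The long-multiplication approach can be sketched in C as a shift-and-add loop: for each multiplier bit, add the shifted multiplicand when the bit is 1. This is a software illustration with a hypothetical function name, not a description of any particular hardware design.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-add multiplication of two 32-bit unsigned operands.
   The product needs up to 64 bits (sum of the operand lengths). */
uint64_t shift_add_multiply(uint32_t multiplicand, uint32_t multiplier)
{
    uint64_t product = 0;                 /* initially 0              */
    uint64_t shifted = multiplicand;      /* multiplicand << i        */
    uint32_t bits = multiplier;
    for (int i = 0; i < 32; i++) {
        if (bits & 1u)
            product += shifted;           /* add this partial product */
        shifted <<= 1;                    /* next partial product     */
        bits >>= 1;                       /* next multiplier bit      */
    }
    /* Sanity check against the built-in multiply. */
    assert(product == (uint64_t)multiplicand * multiplier);
    return product;
}
```

For the slide's operands, `shift_add_multiply(8, 9)` (1000₂ × 1001₂) gives 72, which is 1001000₂.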
Multiplication Hardware
(Figure: multiplication hardware; the product register is initially 0)
Stopped here… will start next lecture from here
Optimized Multiplier
• Perform steps in parallel: add/shift
• One cycle per partial-product addition
– That’s ok, if frequency of multiplications is low
Faster Multiplier
• Uses multiple adders
– Cost/performance tradeoff
• Can be pipelined
– Several multiplications performed in parallel
MIPS Multiplication
• Two 32-bit registers for product
– HI: most-significant 32 bits
– LO: least-significant 32 bits
• Instructions
– mult rs, rt / multu rs, rt
• 64-bit product in HI/LO
– mfhi rd / mflo rd
• Move from HI/LO to rd
• Can test HI value to see if product overflows 32 bits
– mul rd, rs, rt
• Least-significant 32 bits of product → rd
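The HI/LO split can be sketched in C for the unsigned case. The `hi_lo` struct and the function names are hypothetical; the overflow test shown applies to unsigned products only, where the result fits in 32 bits exactly when HI is 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the HI/LO register pair. */
typedef struct {
    uint32_t hi;   /* most-significant 32 bits of the product  */
    uint32_t lo;   /* least-significant 32 bits of the product */
} hi_lo;

/* Model of multu: unsigned 32 x 32 -> 64-bit product in HI/LO. */
hi_lo multu(uint32_t rs, uint32_t rt)
{
    uint64_t product = (uint64_t)rs * rt;
    hi_lo r;
    r.hi = (uint32_t)(product >> 32);
    r.lo = (uint32_t)(product & 0xFFFFFFFFu);
    /* The two halves together must reproduce the full product. */
    assert((((uint64_t)r.hi << 32) | r.lo) == product);
    return r;
}

/* For unsigned operands, the product overflows 32 bits when HI != 0. */
int product_overflows_32(hi_lo r)
{
    return r.hi != 0u;
}
```

For example, 0x10000 × 0x10000 leaves HI = 1 and LO = 0, so the product does not fit in a single 32-bit register.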
Division
• Check for 0 divisor
• Long division approach
– If divisor ≤ dividend bits
• 1 bit in quotient, subtract
– Otherwise
• 0 bit in quotient, bring down next dividend bit
• Restoring division
– Do the subtract, and if remainder goes < 0, add divisor back
• Signed division
– Divide using absolute values
– Adjust sign of quotient and remainder as required
• Example (divisor 1000 at left):
          1001      quotient
1000 ) 1001010      dividend
      -1000
          10
          101
          1010
         -1000
            10      remainder
• n-bit operands yield n-bit quotient and remainder
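The restoring-division bullets can be sketched in C for unsigned operands. The struct and function names are hypothetical, and the zero-divisor check is left to the caller, as in the first bullet.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t quotient;
    uint32_t remainder;
} div_result;

/* Restoring division of unsigned 32-bit operands, one quotient bit
   per step. The caller must check for a zero divisor first. */
div_result restoring_divide(uint32_t dividend, uint32_t divisor)
{
    assert(divisor != 0u);
    div_result r = { 0u, 0u };
    int64_t rem = 0;
    for (int i = 31; i >= 0; i--) {
        /* Bring down the next dividend bit. */
        rem = (rem << 1) | (int64_t)((dividend >> i) & 1u);
        rem -= divisor;                        /* trial subtraction  */
        if (rem < 0)
            rem += divisor;                    /* restore; bit is 0  */
        else
            r.quotient |= (uint32_t)1 << i;    /* quotient bit is 1  */
    }
    r.remainder = (uint32_t)rem;
    return r;
}
```

For the slide's example, `restoring_divide(74, 8)` (1001010₂ ÷ 1000₂) yields quotient 9 (1001₂) and remainder 2 (10₂).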
Division Hardware
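(Figure: division hardware; the remainder register initially holds the dividend, and the divisor register initially holds the divisor in its left half)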
Optimized Divider
• One cycle per partial-remainder subtraction
• Looks a lot like a multiplier!
– Same hardware can be used for both
Faster Division
• Can’t use parallel hardware as in multiplier
– Subtraction is conditional on sign of remainder
• Faster dividers (e.g. SRT division) generate multiple quotient bits per step
– Still require multiple steps
MIPS Division
• Use HI/LO registers for result
– HI: 32-bit remainder
– LO: 32-bit quotient
• Instructions
– div rs, rt / divu rs, rt
– No overflow or divide-by-0 checking
• Software must perform checks if required
– Use mfhi, mflo to access result
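Since the hardware performs no checks, software has to test the operands itself. The C sketch below shows one possible wrapper for signed division; the function name `checked_div` and its return convention are hypothetical. It relies on C99 semantics, where `/` truncates toward zero and `%` gives a remainder with the sign of the dividend.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical checked signed division. On success returns 0 and
   writes the quotient to *lo and the remainder to *hi, mirroring
   the LO/HI registers. Returns -1 if the division is not defined. */
int checked_div(int32_t rs, int32_t rt, int32_t *hi, int32_t *lo)
{
    assert(hi != NULL && lo != NULL);
    if (rt == 0)
        return -1;                 /* divide by zero                 */
    if (rs == INT32_MIN && rt == -1)
        return -1;                 /* quotient +2^31 does not fit    */
    *lo = rs / rt;                 /* quotient, as read with mflo    */
    *hi = rs % rt;                 /* remainder, as read with mfhi   */
    return 0;
}
```

For example, dividing -7 by 2 gives quotient -3 and remainder -1, while a zero divisor is rejected before any division is attempted.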
Conclusions
• In MIPS assembly language
– Registers replace C variables
– One instruction (simple operation) per line
– Simpler is faster
• Memory is byte-addressable, but lw and sw access one word at a time
• A pointer (used by lw and sw) is just a memory address, so we can add to it or subtract from it (using offset)
Review
• Instructions so far: add, addi, sub, mult, div, mfhi, mflo, lw, sw, lb, lbu, lh, lhu
• Registers so far
– C variables: $s0 - $s7
– Temporary variables: $t0 - $t9
– Zero: $zero