Computer Arithmetic
SC414: Introduction to Computer Architecture
Dr. Dhammika Elkaduwe
Department of Computer Engineering, Faculty of Engineering, University of Peradeniya
Recap
- ISA
- Instruction encoding
- Simple programs in assembly
- Function calling methods
- Compilation, linking and loading process
- Memory hierarchy
- How cache memories work
- Virtual memory
Today's plan
- Look at computer arithmetic
- Design tradeoffs
- Floating point numbers
Reading: Chapter 3 of Computer Organization and Design: The Hardware/Software Interface
How to represent numbers?
- Computers use binary
- Convert base 10 (or any other base) into binary
- Conversion can be done by repeated division by 2
- Example: 10_10 = 1010_2
What about negative numbers?
Simple solution: add a sign bit
- Use one bit (MSB) to indicate the sign
- MSB 0 → positive, 1 → negative
- Called the "sign and magnitude" method
Issues with the approach:
- Wastes a bit
- Arithmetic operations are difficult: the sign bit needs separate handling
- Has both a negative and a positive zero (may confuse some applications)
What about negative numbers?
Two's complement
- Positive numbers are represented as before (up to 2^(n-1) - 1, where n is the number of bits used)
- Negative numbers are represented as a complement:
  - Take the corresponding positive value
  - Flip all the bits
  - Add one
Two's complement
- With n bits we can represent -2^(n-1) up to 2^(n-1) - 1
- Note that it is not symmetric around zero
- Example: suppose we use 4 bits, then
  - Most negative number is: 1000 (-8)
  - Next most negative number is: 1001 (-7)
  - Most positive number is: 0111 (+7)
Exercises
Find the decimal value of the following 8-bit two's complement numbers, written in hexadecimal format:
AB, 6A, FF, EF
Exercises
Find the 8-bit two's complement representations of the negative numbers below, do the arithmetic operations, and find the corresponding decimal values:
1 - 9, 8 - 16, -1 + 4
Advantages of two's complement
- Main advantage: simplifies the hardware
- Asymmetric around 0
- Better range than the sign and magnitude method
- Used in almost all computers
Subtraction
SUB $r1, $r1, $r2 → $r1 = $r1 - $r2
- Find the two's complement of the value in $r2
- Use the addition circuit
- The end result is the subtraction
- The result is in two's complement representation
- Works because the carry out of the sign bit is discarded
Watch out for overflows
Should you come out of the following loops? The value changes sign when it reaches the maximum.

    int main() {
        int i = 0;
        while (i < 1);
        printf("Should not be here!\n");
    }

    int main() {
        int i = 1;
        while (i++ > 0);
        printf("Should not be here\n");
    }
Loading signed values
- Some ISAs provide both signed and unsigned loads
- In MIPS, loads are sign-extended
  - You load the sign bit as well
  - Load byte (LB) and load half (LH) extend the sign bit to cover the remaining 24 and 16 MSBs respectively
Summary of signed numbers
Sign and magnitude
- MSB gives the sign
Two's complement
- Negative numbers are represented as 2^n - ABS(x), where n is the number of bits used for the representation
One's complement
- Negative numbers are represented as (2^n - 1) - ABS(x)
Note on the overflow
Overflow can occur when:
- Adding large positive numbers
- Subtracting a large negative number from a positive one
In both cases the result carries into the sign bit, which is how we get a negative result from positive operands.
Overflow condition
- At times we want to ignore the overflow
  - Example: dealing with memory pointers
- MIPS provides two types of instructions for this:
  - add, addi, sub cause an exception on overflow
  - addu, addiu, subu do not cause overflow exceptions
- The compiler selects the correct instruction based on the data type (for example, unsigned int or int)
- Note that C ignores the overflow exception
Floating point
- Integers are limited: we need a way to support fractions, the so-called "real" numbers
- Things to ask ourselves:
  - How do we convert a fraction into binary?
  - How can the numbers be represented?
  - How can we encode them?
Recall: scientific notation
- Example: 1.2 x 10^3
- Idea: one digit leading the decimal point
- Convert the following into scientific notation: 10.1223, 0.00012, 12.34
Binary numbers
- 11011011 → in scientific notation: 1.1011011 x 2^7
- The point is called the binary point (not the decimal point we know)
- How to encode such numbers? We need to track:
  - The fraction
  - The exponent
Observation
- 1.1001011 x 2^4
- The first digit cannot be anything other than one
- We can absorb that into the representation: drop it from the stored bits and add it back when we work with the number
- We call the stored part the significand: 1.fraction = significand
- Also called the 'hidden one' method
Designing time
- We need some number of bits for the significand and some number of bits for the exponent
- It is a tradeoff between accuracy and range:
  - More bits for the exponent → higher range
  - More bits for the significand → more accuracy
MIPS floating point
- MSB → sign of the number
- Next 11 bits → exponent, using sign and magnitude representation
- Remaining 20 bits → fraction
IEEE 754 floating point numbers
- Similar idea to the MIPS floating point layout; however, the representation is different
- 32-bit format: 1 sign bit, 8 exponent bits, 23 fraction bits
- Value = (-1)^s x (1 + fraction) x 2^(exponent - 127)
Example
Represent 0.5 using the IEEE 754 standard:
0.5_10 = 0.1_2 = 1.0 x 2^(-1)
S = 0, exponent = 126 (that is, -1 + 127), fraction = 0
Example:
What is the number given below in IEEE 754 format?
X = 1.01_2 x 2^(129 - 127) = 1.25 x 4 = 5_10
IEEE 754 format
- Has representations to denote exceptions (such as 0/0)
- They are typically shown as NaN (Not a Number)
- The representation is selected so that comparisons can be fast:
  - First the sign
  - Then the exponent
Thinking time....
- What is the maximum number that can be represented using the IEEE 754 format?
- What is the minimum number that can be represented using the IEEE 754 format?
- Can you represent 0?
- Represent 0.6 using IEEE 754
Issue:
- We cannot represent some numbers accurately: loss of precision
- We get some strange results when doing some computations!
- On the other hand, we might not need this much precision
Double precision
- Tries to address the underflow condition
- 64-bit floating point numbers
- The additional 32 bits go mostly to the significand (52 fraction bits, 11 exponent bits)
Exceptions
Overflow:
- The number is too big to be represented using the exponent bits (unlike with integers, we have another issue)
Underflow:
- The fraction bits are not enough to represent the number
- You end up with zero
- A common mistake when doing arithmetic
Both overflow and underflow raise exceptions
Note
- Some programming languages support arbitrary precision numbers
  - Either via libraries (for example, C)
  - Or via the language (for example, Haskell)
- So that you will not lose accuracy or get wrong results
Floating point addition
- We cannot add them as they are:
  - First align the binary points
  - Then add the fractions
  - Then adjust the exponent and normalize the result
MIPS instructions for floating point numbers
- Provides instructions for IEEE 754 single and double precision numbers
- Instructions include: addition, subtraction, multiplication, division, comparison
- Examples:
  - add.s → single precision addition
  - mul.d → double precision multiplication
IA32 floating point unit
- Coprocessor to handle floating point arithmetic
- Separate registers
- About 100 instructions for dealing with floating point: data movement, comparisons, addition, subtraction, square root, sine, ...
- Stack architecture to deal with floating point numbers:
  - Loads push arguments onto the stack
  - Operations pop them from there
  - Operands are converted into a different format before being pushed onto the stack