Copyright 2005 Curt Hill
Data Representation
How are data represented?
Introduction to data coding
• In order to manipulate data on a computer we need to be able to represent it
• The purpose of the binary/octal/hexadecimal talk is that everything in a machine’s memory is in binary, though we can display it in many forms
• The expression "digital computer" comes from the fact that all data is quantized into numbers
• By contrast, analog computers represent numbers by voltages that can vary continuously
What needs representation?
• We need to be able to represent a variety of different types of data
  – Such as C’s simple types:
    • Integers (int)
    • Reals (float and double)
    • Characters (char)
    • Booleans (bool)
Boolean
• Easiest to represent
• Booleans only need one bit to represent a true/false value
• However, a bit is an inconvenient amount since bytes (8 bits) are the ordinary unit of storage
• We end up with a two-pronged approach
• Usually store a boolean in one byte
  – E.g. 0 = false, 1 = true
  – or perhaps one word, same values
• Collect 1-8 booleans in a single byte and manipulate as a group
  – Here a boolean only occupies one bit
  – Some machine instructions will have a boolean embedded in them as an instruction switch
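The second prong, several booleans sharing one byte, can be sketched in a few lines of Python. The helper names `pack_flags` and `get_flag` are made up for this illustration, and putting the first flag in bit 0 is an assumption; real machines and languages choose their own layout.

```python
def pack_flags(flags):
    """Pack up to 8 booleans into one byte; flags[0] lands in bit 0 (assumed layout)."""
    if len(flags) > 8:
        raise ValueError("a byte holds at most 8 flags")
    byte = 0
    for position, flag in enumerate(flags):
        if flag:
            byte |= 1 << position   # switch on the bit for this flag
    return byte


def get_flag(byte, position):
    """Test a single bit of the packed byte."""
    return ((byte >> position) & 1) == 1


packed = pack_flags([True, False, True])
print(format(packed, "08b"))   # 00000101
```

Each flag here costs one bit instead of one byte, at the price of a shift and a mask on every access.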
Character
• Character is almost as easy
• We essentially map one byte to one character
• The mapping is just a convention between the sending and receiving device
• Several standard conventions: ASCII, EBCDIC, BCD, CDC display code, Unicode
• Each is a different mapping and length
ASCII
• American Standard Code for Information Interchange
• ASCII is a 7 bit code 0-127
• See next for ordering
ASCII has the following ordering:
• control characters (0-31)
• blank (32)
• punctuation and math (33-47)
  – $ 36
  – & 38
  – ) 41
  – . 46
  – / 47
• digits (48-57)
• more punctuation (58-64)
  – < 60
  – ? 63
• Upper case (65-90)
• more punctuation (91-96)
  – [ 91
  – _ 95
• Lower case (97-122)
• more punctuation (123-126)
  – { 123
  – ~ 126
• delete, one more control character (127)
EBCDIC
• Extended Binary Coded Decimal Interchange Code
• 8 bit code 0 – 255
• IBM and other mainframes
• See next for ordering
EBCDIC ordering:
• control characters (0-63)
• blank (64)
• punctuation and math (65-127)
• Lower case (129-169, with gaps)
• Upper case (193-233, with gaps)
• digits (240-249)
• The gaps are sometimes unassigned, sometimes occupied
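The two orderings can be compared directly. This is a small sketch that assumes Python's built-in `cp037` codec, which is one common EBCDIC code page (others differ in some punctuation); the helper name `codes` is made up for the illustration.

```python
def codes(ch):
    """Return (ASCII code, EBCDIC code) for one character.

    Assumes code page 037 as the EBCDIC variant.
    """
    return ch.encode("ascii")[0], ch.encode("cp037")[0]


for ch in " aA0":
    ascii_code, ebcdic_code = codes(ch)
    print(repr(ch), ascii_code, ebcdic_code)
```

Note that digits sort below letters in ASCII but above them in EBCDIC, so the same text can sort differently on the two kinds of machine.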
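Others
• Most of the others are seldom used anymore
• The other important one is Unicode
• Unicode began as a 16 bit code and has since grown to code points up to hexadecimal 10FFFF
  – Code points are grouped into blocks by script rather than selected by a language byte
  – Encodings such as UTF-8 and UTF-16 store a code point in one or more bytes
• The first 128 code points are the same as ASCII
• Accented Latin letters, such as those Italian needs, sit in blocks just above ASCII
• East Asian scripts usually need several blocks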
What is desirable for integers?
• The desirable characteristics include:
  – Easy to use
  – Compact
  – Accurate
  – Standardized
• Some of these are contradictory
Easy to use
• The form has to be such that the CPU can perform operations on it efficiently
• Consider integers
• Arithmetic needs to be easy to implement using digital logic
• Conversion from an internal representation to an external representation should also be easy
Compact
• It is a maxim that no computer has too much memory
• We consider the compactness issue in many contexts
• We want our data to be compact, whether it is integer, real, etc.
• We also want our instructions compact
  – Which tends to make us want our addresses compact
  – Which will bring out the whole issue of addressing schemes, but later
Accurate
• We could represent all integers as signed characters
• However, that would give us an arithmetic range of ±127
  – This does not prove to be a useful range for integer arithmetic
• Generally speaking, compactness and accuracy are opposites
• Remember the three digit arithmetic game
Standardized for interchange
• We would like to transfer data around between different models of computers
  – Recall big-endian and little-endian
• Hence if we can settle on one or just a few data transfer formats then this problem will be much easier
• Generally, transporting large amounts of data in external format is a bad idea
Representing signed numbers
• The binary presentation showed how to store positive integers
• How do we represent negatives?
• Three general strategies:
  – Sign magnitude
  – Excess notation (aka bias)
  – Complement
• All have been used
Issues
• How many zeros?
  – One is desirable
• How many singularities?
  – A singularity is where we get an unexpected value when counting by one in binary
  – Overflow is the result
  – We will always have one in a finite number of bits
• How do we change signs?
  – How easy is it to tell whether a value is positive, zero or negative?
• How easy is it to add two numbers, especially of opposite sign?
Sign magnitude
• Partition the bit string into two pieces
• One bit for sign, usually the leftmost bit
• The rest for the magnitude of the number
• Example: 4 bit number:
  – 1000 = 0000 = zero
  – 0010 = 2
  – 1010 = -2
  – ±7 is the range
• Notice the two representations for zero, which make the circuitry somewhat more exciting, that is, complicated
• Flipping the sign is easy
• There are also two singularities
Sign Magnitude Singularities
• If we count up from positive zero
  – 0000 = 0
  – 0001 = 1
  – …
  – 0111 = 7
  – 1000 = -0
• If we count down from negative zero
  – 1000 = -0
  – 1001 = -1
  – …
  – 1111 = -7
  – 0000 = +0
Sign Magnitude Conversion
• Converting to or from sign magnitude is easy
• Take the absolute value of the number
• Convert decimal to binary
• Set the sign bit
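The three conversion steps translate almost directly into code. The following is a minimal sketch that uses bit strings for readability; the function names and the default width of 4 bits are choices made for this illustration only.

```python
def to_sign_magnitude(value, bits=4):
    """Encode an integer as a sign-magnitude bit string."""
    magnitude = abs(value)                       # step 1: absolute value
    if magnitude >= 1 << (bits - 1):
        raise OverflowError("magnitude does not fit")
    body = format(magnitude, "0{}b".format(bits - 1))   # step 2: to binary
    sign = "1" if value < 0 else "0"             # step 3: set the sign bit
    return sign + body


def from_sign_magnitude(pattern):
    """Decode a sign-magnitude bit string; both zeros decode to 0."""
    magnitude = int(pattern[1:], 2)
    return -magnitude if pattern[0] == "1" else magnitude


print(to_sign_magnitude(-2))        # 1010
print(from_sign_magnitude("1111"))  # -7
```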
Excess notation
• Also known as biased notation
• In unsigned binary we count from zero to some power of two minus one
  – In four bit: 0 – 15
• The basic idea of excess notation is that we use the same range but do not start at zero
Bias
• Use all the range of your bit string and then subtract a fixed amount
  – This is the excess or bias amount
• For example consider our 4 bit string
  – 0000 - 1111 is 0-15
  – Choose a number, usually near the range divided by 2, and subtract
• In this case consider 8
  – 0000 becomes (0 – 8) or -8
  – 1000 becomes (8 – 8) or zero
  – 1111 becomes (15 - 8) or 7
  – Range is -8 to 7
• This is excess 8
Excess identification
• We may choose any number within the range to subtract from the straight binary value
  – This identifies the notation
• Usually near the midpoint
  – For four bit it is usually 7 or 8
  – 4 bit excess 7 gives -7 … +8
  – 4 bit excess 8 gives -8 … +7
• There is no need for symmetry in the range
  – 4 bit excess 4 gives -4 … +11
Bias: the good and the bad
• There is only one representation of zero so that simplifies things
• There is only one singularity
• Counting is simple
• In general there is no easy way to identify the sign
  – In four bit excess 8 the first bit acts like a sign bit
  – In all other forms that is not the case
• No easy negation either
Conversion
• To Excess notation
  – Add bias value to the decimal number
  – If the sum is negative, the number cannot be represented
  – Convert to binary
• From Excess
  – Convert to decimal
  – Subtract bias
Examples
• Convert -4 to excess 8
  – -4 + 8 = 4
  – 0100
• Convert 0110 (excess 8) to decimal
  – 0110 = 6
  – Subtract 8 = -2
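The two worked examples can be checked with a short sketch. The function names and the defaults (bias 8, 4 bits) are assumptions chosen to match the slides, not part of any standard library.

```python
def to_excess(value, bias=8, bits=4):
    """Encode value in excess-`bias` notation as a bit string."""
    biased = value + bias                 # add the bias
    if not 0 <= biased < (1 << bits):     # negative or too large: no code for it
        raise OverflowError("value cannot be represented")
    return format(biased, "0{}b".format(bits))


def from_excess(pattern, bias=8):
    """Decode an excess-`bias` bit string."""
    return int(pattern, 2) - bias         # straight binary, then subtract the bias


print(to_excess(-4))        # 0100
print(from_excess("0110"))  # -2
```

Changing `bias` to 7 or 4 reproduces the asymmetric ranges listed on the excess identification slide.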
Complement
• A complement is a negation of each bit in a bit string
• 1001 complemented is 0110
• The representation for positives is the same as with sign magnitude
• In complement arithmetic what we do is add a large bias to all negative numbers so that the negatives appear to be larger than the positives
• There are two frequently used complements:
  – Ones complement
  – Twos complement
Ones complement
• Reverses the bit pattern of the bit string, so 0101 becomes 1010
• This is trivially easy to do
• The left digit now functions as a sign
• The problem is that, like sign magnitude, we end up with a positive and a negative zero
• Somewhat more common is twos complement
Conversion
• To ones complement
  – Take absolute value
  – Convert to binary
  – If original was negative, complement
• To decimal
  – If negative, complement
  – Convert to decimal
  – Recall the sign and apply
Twos complement
• The negative is obtained by:
  – Taking complement
  – Add 1 (ignoring overflow)
• 0110 = 6
  – Complement to 1001
  – Add 1 to get 1010, which is -6
• This is nicely reversible
  – Complement to 0101
  – Add 1 to get 0110
Singularities
• Just one
• Consider 4 bit
  – 0101 = +5
  – 0110 = +6
  – 0111 = +7
  – 1000 = -8
  – 1001 = -7
  – …
  – 1111 = -1
  – 0000 = 0
Two weird states
• What happens when zero is complemented?
  – Start with 0000
  – Complement to 1111
  – Add 1 (ignoring carry) to get 0000
• What happens when the largest negative is complemented?
  – Recall there is no positive counterpart
  – Start with -8 as 1000
  – Complement to 0111
  – Add 1 to get 1000
Conversion
• To twos complement
• Take absolute value
• Convert to binary
• If negative
  – Complement
  – Add 1 ignoring overflow
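Negation and conversion in twos complement can be sketched as follows, again on bit strings. The names `negate_twos` and `to_twos` are invented for this example; the mask plays the role of "ignoring overflow".

```python
def negate_twos(pattern):
    """Negate a twos complement bit string: complement, then add 1."""
    bits = len(pattern)
    mask = (1 << bits) - 1
    inverted = int(pattern, 2) ^ mask      # complement every bit
    result = (inverted + 1) & mask         # add 1, dropping any carry out
    return format(result, "0{}b".format(bits))


def to_twos(value, bits=4):
    """Encode an integer following the conversion steps on the slide."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise OverflowError("value cannot be represented")
    pattern = format(abs(value), "0{}b".format(bits))
    return negate_twos(pattern) if value < 0 else pattern


print(to_twos(-6))            # 1010
print(negate_twos("0000"))    # 0000, zero is its own negative
print(negate_twos("1000"))    # 1000, so is the largest negative
```

The last two lines reproduce the two weird states: zero and -8 both negate to themselves.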
Good Features
• One singularity
• One weird state: -8 is its own negative
• One zero
• Easy to count and do other arithmetic
• Twos complement is the most common coding for integers, almost universal for various reasons
Integer data type
• With twos complement the circuitry needed to perform arithmetic is very straightforward
• An adder, an inverter and the ability to shift left or right give us the four basic functions
• We will consider these later
• One other problem: overflow
Overflow
• Overflow is only a problem in the addition of two operands of the same sign
• Merely check that the result has the same sign as both of the operands
• If so then no overflow occurred
• Otherwise you have detected overflow
  – Then post an interrupt
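The sign check described above can be written out as a small sketch. Operands are passed as raw twos complement bit patterns held in Python integers; the function name and the returned flag (standing in for the interrupt) are assumptions of this illustration.

```python
def add_with_overflow(a, b, bits=8):
    """Add two twos complement bit patterns; return (result, overflowed)."""
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    total = (a + b) & mask                           # keep only `bits` bits
    same_sign_inputs = (a & sign_bit) == (b & sign_bit)
    # Overflow only if both operands share a sign and the result does not.
    overflow = same_sign_inputs and (total & sign_bit) != (a & sign_bit)
    return total, overflow


print(add_with_overflow(0b0111, 0b0001, bits=4))   # (8, True): 7 + 1 wraps to -8
print(add_with_overflow(0b0101, 0b1111, bits=4))   # (4, False): 5 + (-1) = 4
```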
Integer Examples
• Suppose that we have 01001101
• What is that in signed magnitude?
  – 77
• What is that in excess 100?
  – -23
• What is that in ones complement?
  – 77
• What is that in twos complement?
  – 77
Integer Examples
• Suppose that we have 10011010
• What is that in unsigned binary?
  – 154
• What is that in signed magnitude?
  – -26
• What is that in excess 100?
  – 54
• What is that in ones complement?
  – -101
• What is that in twos complement?
  – -102
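Both sets of answers can be checked mechanically. The sketch below reads one bit string under all five schemes; `interpret` is a hypothetical helper written for this purpose and the bias of 100 matches the examples.

```python
def interpret(pattern, bias=100):
    """Read one bit string under each integer notation from the slides."""
    bits = len(pattern)
    mask = (1 << bits) - 1
    unsigned = int(pattern, 2)
    negative = pattern[0] == "1"
    magnitude = int(pattern[1:], 2)          # everything after the sign bit
    return {
        "unsigned": unsigned,
        "sign magnitude": -magnitude if negative else magnitude,
        "excess": unsigned - bias,
        "ones complement": -(unsigned ^ mask) if negative else unsigned,
        "twos complement": unsigned - (1 << bits) if negative else unsigned,
    }


for name, value in interpret("10011010").items():
    print(name, value)
```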
Real numbers
• The hardest and most interesting
• Some interesting numbers that we need to represent in order to be generally useful:
  – π 3.141592654
  – Speed of light 3.0 x 10^8 meters per second
  – Avogadro’s number 6.022 x 10^23 atoms per mole
  – Mass of earth 5.98 x 10^24 kilograms
  – Electron charge -1.60 x 10^-19 Coulombs
  – Planck’s constant 6.63 x 10^-27 erg seconds
• Without the ability to represent these, the machine will be of dubious usefulness
Real Numbers
• For the most part there are just two approaches
  – Rationals
  – Floating points
Rationals
• Just integer fractions
• They can exactly represent any number that is the ratio of two integers
• Store a numerator and denominator
• Our calculations with these are accurate, provided overflow does not occur
• The disadvantage of these is that they have problems with some of the extreme numbers given before
  – But not necessarily the ones you might think
Rational Problems
• PI is irrational
  – But we can approximate it
  – The rational expression is no worse than floating point
• The extreme magnitudes are more problematic
  – Such as Planck’s constant or Avogadro’s number
• An 8 byte integer can represent a decimal exponent of 18
  – To have two of those in a rational is very bulky
  – Still cannot represent either of the above
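Python's standard `fractions.Fraction` type makes a convenient stand-in for a rational representation. One caveat is an assumption worth stating loudly: Python integers grow without limit, so this sketch only measures how many bits a fixed-size format would need; it does not overflow the way an 8 byte field would. The fraction 355/113 is a well-known approximation of π chosen for the example.

```python
import math
from fractions import Fraction

# Rational arithmetic is exact: one third added three times is exactly one.
third = Fraction(1, 3)
print(third + third + third == 1)

# PI can only be approximated, but a small ratio is already quite close.
pi_approx = Fraction(355, 113)
print(abs(float(pi_approx) - math.pi))

# Avogadro's number as an exact rational: the numerator needs more bits
# than a signed 8 byte integer (63 value bits) provides.
avogadro = Fraction(6022, 1000) * 10**23
print(avogadro.numerator.bit_length())
```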
Floating point notation
• Similar to scientific notation
• Mantissa, base, exponent
• There is considerable variation as to how this is done; many different schemes have been used successfully
• There are typically four pieces
  – Sign bit
  – Mantissa
  – Exponent
  – Base
The pieces
• Sign bit determines the sign of the entire number; it is usually separate from the sign of the exponent and physically separated from the mantissa
• Mantissa is the digits of precision
• Exponent is a number that shifts the radix point to right or left
• The base is the number that is raised to the exponent
  – Often 2 or 16
The IBM 370 example
• These may occupy 32, 64 (double) or 128 (extended) bits
• The first bit (bit position zero) is the sign bit (0=positive)
• Bits 1-7 are called the characteristic, which is the exponent
• Bits 8-31 (63 or 127) are called the fraction (the mantissa)
IBM 370 continued
• The fraction is always positive or zero
• The radix point is to the left of the first significant digit, that is bit 8
• Hence 1/2 is a 1 in bit position 8 and zeros to the right of that, 1/4 is a 1 in bit position 9 ...
• A number is represented as the fraction multiplied by 16 raised to the characteristic power, with the sign attached to result
• The characteristic is in excess 64 notation
IBM 370 Example
• 0 1000000 10000000 ...
• Here the sign is 0, that is positive
• The exponent is 64, which in excess 64 notation represents an exponent of zero
• The leading bit of the mantissa is 1 so this number is:
• ½ x 16^0 = 0.5
Second example
• 1 1000001 01010100 0...
• The sign is 1, that is negative
• The exponent is 65
  – In excess 64 notation represents an exponent of one
• The mantissa is
• 1/4 + 1/16 + 1/64
• (1/4 + 1/16 + 1/64) x 16^1
  – 16/4 = 4
  – 16/16 = 1
  – 16/64 = 1/4
  – -5.25
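Both examples can be decoded by a few shifts and masks. This is a sketch of the 32 bit layout as the slides describe it (sign, 7 bit excess 64 characteristic, 24 bit fraction, base 16); `decode_ibm_short` is a made-up name and the two hexadecimal words are simply the slide bit patterns padded with zeros.

```python
def decode_ibm_short(word):
    """Decode a 32 bit IBM 370 style hexadecimal floating point word."""
    sign = (word >> 31) & 1
    characteristic = (word >> 24) & 0x7F          # bits 1-7, excess 64
    fraction = (word & 0xFFFFFF) / float(1 << 24) # radix point left of bit 8
    value = fraction * 16.0 ** (characteristic - 64)
    return -value if sign else value


print(decode_ibm_short(0x40800000))   # 0.5
print(decode_ibm_short(0xC1540000))   # -5.25
```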
Some Notes
• A 32 bit floating point number in this format has approximately 6.5 decimal digits of precision (significant digits)
• A 64 bit floating point number is just a 32 bit number with 32 extra bits of precision
• This was convenient: a single precision operation on a double precision number gave a valid number with reduced precision
• An extended was a double with an additional 64 bits of precision
• However there was no way to extend the exponent range, which was 16^-64 to 16^63; this gives a decimal exponent of -78 to +75, which is adequate for the vast majority of calculations
Zeros
• A true zero has a zero fraction and a zero sign and exponent
• If a zero fraction occurs, then the hardware usually forces zeros elsewhere
• Why do we not want a number like
• 1 000010 00000101 0100...
• This reduces the legitimate digits we can store later
Normalization
• The process of normalization means to shift the mantissa left as far as we can while adjusting the exponent so that it means the same thing
• Since the base is sixteen we shift left in units of four bits
• We may end up with leading zeros, but in the first four bits there should always be a one bit
• In a machine where the base is 2 instead of 16, we can always shift to left until the first bit is one
IEEE floating point standard
• Sometime after the IBM 370 arrived, the IEEE defined a standard for floating point numbers (IEEE 754)
• This was adopted for the numeric co-processor of Intel CPUs
• Has some desirable characteristics
• There are two flavors
  – Short (32 bit)
  – Long (64 bit)
IEEE short
• First bit is a sign
• Next eight bits are an exponent
• Almost excess 127 format
• Almost excess 127?
  – Range is 0-255
  – The 0 and 255 values are reserved
  – Thus range is 1-254
  – Legitimate exponents then are in range -126 to 127
• Next 23 bits are mantissa
IEEE Oddities
• A true zero has all bits zero in both mantissa and exponent, regardless of sign
• There are three other special values:
• Positive infinity, negative infinity and NaN
The weird states
• Positive infinity, which is sign 0, exponent 255, mantissa 0
  – This is the result of overflow such as division by near zero
• Negative infinity, which is sign 1, exponent 255, mantissa 0
  – This is the result of overflow such as division by near zero, where the sign would have been negative
• Not a number (NaN) where exponent is 255, and the mantissa is not zero with either sign
  – This results from an undefined operation: zero divided by zero, square root of a negative, etc.
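The bit patterns of these special values can be inspected with Python's standard `struct` module, which packs a float into the 32 bit short format. The helper `fields` is a name invented for this sketch. Note that plain Python raises `ZeroDivisionError` for float division by zero rather than returning a special value, so the values are created with `float("inf")` and `float("nan")` instead.

```python
import struct


def fields(x):
    """Split a number into IEEE short (sign, exponent, mantissa) fields."""
    word = struct.unpack(">I", struct.pack(">f", x))[0]
    return (word >> 31) & 1, (word >> 23) & 0xFF, word & 0x7FFFFF


print(fields(0.0))             # (0, 0, 0)
print(fields(-0.0))            # (1, 0, 0)
print(fields(float("inf")))    # (0, 255, 0)
print(fields(float("-inf")))   # (1, 255, 0)

sign, exponent, mantissa = fields(float("nan"))
print(exponent, mantissa != 0)  # exponent 255 with a nonzero mantissa
```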
One More Trick
• Three zero mantissas
  – True zero
  – Positive infinity
  – Negative infinity
• All other numbers are normalized such that the first mantissa bit is one
• Since it is always one – leave it out
• We effectively get 24 bits of precision out of 23 actual bits
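The hidden bit is easiest to see by decoding a short float by hand. The sketch below handles only ordinary normalized numbers and rejects the reserved exponents 0 and 255; `decode_ieee_short` and `bits_of` are illustrative names, and `struct` is used only to obtain the real bit pattern for comparison.

```python
import struct


def bits_of(x):
    """Return the 32 bit IEEE short pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def decode_ieee_short(word):
    """Decode a normalized IEEE short float from its bit pattern."""
    sign = (word >> 31) & 1
    exponent = (word >> 23) & 0xFF
    mantissa = word & 0x7FFFFF
    if exponent == 0 or exponent == 255:
        raise ValueError("reserved exponent: not an ordinary normalized number")
    significand = 1 + mantissa / float(1 << 23)   # put the omitted leading 1 back
    value = significand * 2.0 ** (exponent - 127) # exponent is excess 127
    return -value if sign else value


print(hex(bits_of(-5.25)))              # 0xc0a80000
print(decode_ieee_short(0xC0A80000))    # -5.25
```

Compare this with the IBM 370 pattern for the same value: the base 2 format stores 1.0101 x 2^2 and drops the leading 1.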
IEEE long
• First bit is a sign
• Next eleven bits are an exponent
  – Almost excess 1023
• Next 52 bits are mantissa
  – Same scheme as short with just more bits in both mantissa and exponent
  – Intel processors use a similar scheme in their internal floating point numbers but use 80 bits
DEC PDP11 floating
• Bit 0 is sign
• Next 8 bits are the exponent in excess 128 format
• -128 is reserved for true zero
• Next 23 bits are the fraction
• Base is 2
• A double precision version adds 32 more bits of mantissa
• The leftmost bit of the normalized mantissa is not stored
Repeating and Terminating
• Explanation of terminating and repeating fractions
• Decimal .01 is a repeating fraction in binary
• Hence, it is no more exactly representable than an irrational such as PI
• This really aggravates accountants
  – They do not like to lose pennies to roundoff error
• Some machines then use decimal arithmetic to alleviate these problems
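The penny problem is easy to demonstrate. The sketch below uses Python's standard `decimal` module as a software stand-in for decimal arithmetic; it is not the hardware feature the slides go on to describe.

```python
from decimal import Decimal

# The binary float nearest to 0.01 is not exactly one hundredth.
stored = Decimal(0.01)               # exact value of the binary float
print(stored == Decimal("0.01"))     # False

# The classic symptom of binary roundoff.
print(0.1 + 0.2 == 0.3)              # False

# One hundred decimal pennies add up to exactly one dollar.
pennies = sum([Decimal("0.01")] * 100)
print(pennies == Decimal("1"))       # True
```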
Binary Coded Decimal
• Sometimes an alternative integer representation is desirable and available
• An IBM 360 may have the decimal arithmetic feature
  – Costs extra money
• This feature allows three kinds of representations of integers
  – External decimal (printable digit string)
  – Internal decimal (two digits per byte)
  – Binary
Three Forms of Integers
• Binary – twos complement
• Internal or packed decimal
• External or zoned decimal
• The form of this has much to do with punch cards and other esoteric details lost in antiquity
• There is also a conversion path between the three
External decimal
• AKA Zoned decimal
• A punch card just needs a single punch for a digit
  – Digit 5 is a five punch
• When it gets into EBCDIC it is F5 in hex
• The F is known as a zone
• How is sign handled?
Signs on External Decimal
• On input an eleven punch was overstruck on the last digit
• Thus -139 needed four punches
• It came out to F1F3D9
• On output (such as a report) this must be edited
Internal decimal
• Each byte may hold two decimal digits
• A pack instruction strips the zones and makes the F1F3D9 into 139D
  – That is, -139
• This is why it is known as packed decimal
• It may vary from 1 to 16 bytes
Internal decimal format
• All bytes but the last have two decimal digits
• The last byte
  – One decimal digit
  – One sign
• The positive signs are C (preferred) or A, E, F
• The negative signs are D (preferred) or B
• Notice there is a clear test for validity
  – Unlike binary numbers
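The packed format, its validity test, and the effect of packing a zoned number can be imitated in software. This is only a sketch of the data layout, not of the 360 instructions themselves; all three function names are invented for the illustration.

```python
def pack_decimal(value):
    """Encode an integer as packed decimal: two digits per byte, sign last."""
    nibbles = str(abs(value)) + ("D" if value < 0 else "C")  # preferred signs
    if len(nibbles) % 2:
        nibbles = "0" + nibbles          # pad to a whole number of bytes
    return bytes.fromhex(nibbles)


def unpack_decimal(data):
    """Decode packed decimal, rejecting patterns that are not valid."""
    nibbles = data.hex().upper()
    digits, sign = nibbles[:-1], nibbles[-1]
    if not digits.isdigit() or sign not in "ABCDEF":
        raise ValueError("not a valid packed decimal")
    magnitude = int(digits)
    return -magnitude if sign in "BD" else magnitude


def pack_zoned(zoned):
    """Strip the zones; the zone of the last byte becomes the sign."""
    digits = "".join(format(byte & 0x0F, "X") for byte in zoned)
    sign = format(zoned[-1] >> 4, "X")
    nibbles = digits + sign
    if len(nibbles) % 2:
        nibbles = "0" + nibbles
    return bytes.fromhex(nibbles)


print(pack_zoned(bytes.fromhex("F1F3D9")).hex().upper())  # 139D
print(unpack_decimal(bytes.fromhex("139D")))              # -139
```

The validity test is the point of the last bullet: any byte string is some binary integer, but only strings with digit nibbles and a proper sign nibble are packed decimals.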
Conversion Path
• External to binary
• Numbers were read off of a card in zoned decimal
• A Pack instruction stripped the zones and rearranged the last byte
  – This instruction could not fail
• A Convert to Binary instruction deposited the result in a register
  – This could fail if the number was invalid
Conversion Path
• Binary to external
• Binary value in register
• Use Convert to Decimal instruction
  – Creates internal decimal
• Use an Unpack
  – Gives something to punch on a card
• Use an Edit
  – Gives something for a report
Decimal Arithmetic on 360
• Arithmetic can be performed in binary or internal decimal
• Binary arithmetic is always much faster
• Why use decimal?
• Binary has faster arithmetic, but the conversion path is E=>I=>B and the reverse
• Converting from decimal to binary, adding, and then converting back to decimal is slower than just doing it in decimal
  – Business applications typically have very simple arithmetic
• The decimal is variable length from 1 to 16 bytes (1-31 digits), where binary can be 2, 4, or 8 bytes only