Copyright 2005 Curt Hill

Data Representation

How are data represented?


Introduction to data coding

• In order to manipulate data on a computer we need to be able to represent it

• The point of the earlier binary/octal/hexadecimal discussion is that everything in a machine’s memory is stored in binary, though we can display it in many forms

• The term digital computer comes from the fact that all data are quantized into discrete numbers

• By contrast, analog computers represent numbers by voltages that can vary continuously


What needs representation?

• We need to be able to represent a variety of different types of data
  – Such as C’s simple types:
    • Integers (int)
    • Reals (float and double)
    • Characters (char)
    • Booleans (bool)


Boolean

• Easiest to represent
• Booleans only need one bit to represent a true/false value
• However, a bit is an inconvenient amount since bytes (8 bits) are the ordinary unit of storage
• We end up with a two-pronged approach
• Usually store a boolean in one byte
  – E.g. 0 = false, 1 = true
  – Or perhaps one word, same values
• Collect 1-8 booleans in a single byte and manipulate them as a group
  – Here a boolean only occupies one bit
  – Some machine instructions will have a boolean embedded in them as an instruction switch


Character

• Character is almost as easy
• We essentially map one byte to one character
• The mapping is just a convention between the sending and receiving devices
• Several standard conventions: ASCII, EBCDIC, BCD, CDC display code, Unicode
• Each is a different mapping and length


ASCII

• American Standard Code for Information Interchange

• ASCII is a 7 bit code: 0-127
• See next for ordering


ASCII has the following ordering:

• Control characters (0-31)
• Blank (32)
• Punctuation and math (33-47)
  – $ 36
  – & 38
  – ) 41
  – . 46
  – / 47
• Digits (48-57)
• More punctuation (58-64)
  – < 60
  – ? 63
• Upper case (65-90)
• More punctuation (91-96)
  – [ 91
  – _ 95
• Lower case (97-122)
• More punctuation (123-126), then DEL (127), which is a control character
  – { 123
  – ~ 126


EBCDIC

• Extended Binary Coded Decimal Interchange Code

• 8 bit code: 0-255
• Used by IBM and other mainframes
• See next for ordering


EBCDIC ordering:

• Control characters (0-63)
• Blank (64)
• Punctuation and math (65-127)
• Lower case (129-169, with gaps)
• Upper case (193-233, with gaps)
• Digits (240-249)
• The gaps are sometimes unassigned, sometimes occupied


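Others

• Most of the others are seldom used anymore
• The other important one is Unicode
• Unicode was originally a 16 bit code
  – The first byte selects a row of 256 characters; rows are grouped by script (Latin, Greek, Cyrillic and so on), not by language
  – The second byte selects the character within the row
• Row 0 holds ASCII (0-127) plus the Latin-1 characters (128-255)
  – Languages such as Italian are covered by this row
• East Asian languages need many rows
• Modern Unicode has outgrown 16 bits: code points run up to 10FFFF hex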


What is desirable for integers?

• The desirable characteristics include:
  – Easy to use
  – Compact
  – Accurate
  – Standardized

• Some of these are contradictory


Easy to use

• The form has to be such that the CPU can perform operations on it efficiently

• Consider integers
• Arithmetic needs to be easy to implement using digital logic
• Conversion from an internal representation to an external representation should also be easy


Compact

• It is a maxim that no computer has too much memory
• We consider the compactness issue in many contexts
• We want our data to be compact, whether it is integer, real, etc.
• We also want our instructions compact
  – Which tends to make us want our addresses compact
  – Which will bring out the whole issue of addressing schemes, but later


Accurate

• We could represent all integers as signed characters
• However, that would give us an arithmetic range of only about ±127
  – This is not a useful range for integer arithmetic
• Generally speaking, compactness and accuracy are opposites
• Remember the three digit arithmetic game


Standardized for interchange

• We would like to transfer data between different models of computers
  – Recall big-endian and little-endian

• If we can settle on one or just a few data transfer formats, this problem becomes much easier

• Transporting large amounts of data in external (printable) format is generally a bad idea


Representing signed numbers

• The binary presentation showed how to store positive integers

• How do we represent negatives?
• Three general strategies:
  – Sign magnitude
  – Excess notation (aka bias)
  – Complement

• All have been used


Issues

• How many zeros?
  – One is desirable
• How many singularities?
  – A singularity is where we get an unexpected value when counting up by one in binary
  – Overflow is the result
  – With a finite number of bits we will always have at least one
• How do we change signs?
  – How easy is it to tell whether a number is positive, zero or negative?
• How easy is it to add two numbers, especially of opposite sign?


Sign magnitude

• Partition the bit string into two pieces
• One bit for the sign, usually the leftmost bit
• The rest for the magnitude of the number
• Example, 4 bit numbers:
  – 1000 = 0000 = zero
  – 0010 = 2
  – 1010 = -2
  – The range is ±7
• Notice the two representations for zero, which make the circuitry more "exciting", that is, more complicated
• Flipping the sign is easy: just invert the sign bit
• There are also two singularities


Sign Magnitude Singularities

• If we count up from positive zero
  – 0000 = 0
  – 0001 = 1
  – …
  – 0111 = 7
  – 1000 = -0
• If we count down from negative zero
  – 1000 = -0
  – 1001 = -1
  – …
  – 1111 = -7
  – 0000 = +0


Sign Magnitude Conversion

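• Converting to or from sign magnitude is easy

• To sign magnitude:
  – Take the absolute value of the number
  – Convert decimal to binary
  – Set the sign bit if the number was negative

• From sign magnitude: convert the magnitude bits to decimal and apply the sign bit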


Excess notation

• Also known as biased notation
• In unsigned binary we count from zero to some power of two minus one
  – In four bit: 0-15
• The basic idea of excess notation is that we use the same range but do not start at zero


Bias

• Use the full range of your bit string and then subtract a fixed amount
  – This is the excess or bias amount
• For example, consider our 4 bit string
  – 0000-1111 is 0-15
  – Choose a number, usually near the range divided by 2, and subtract it
• In this case consider 8
  – 0000 becomes (0 - 8) or -8
  – 1000 becomes (8 - 8) or zero
  – 1111 becomes (15 - 8) or 7
  – Range is -8 to 7
• This is excess 8


Excess identification

• We may choose any number within the range to subtract from the straight binary value
  – This identifies the notation
• Usually near the midpoint
  – For four bit it is usually 7 or 8
  – 4 bit excess 7 gives -7 … +8
  – 4 bit excess 8 gives -8 … +7
• There is no need for symmetry in the range
  – 4 bit excess 4 gives -4 … +11


Bias: the good and the bad

• There is only one representation of zero, which simplifies things

• There is only one singularity
• Counting is simple
• In general there is no easy way to identify the sign
  – In four bit excess 8 the first bit acts like an inverted sign bit (1 means zero or positive)
  – With other biases that is not the case

• No easy negation either


Conversion

• To excess notation
  – Add the bias value to the decimal number
  – If the result is negative, or too large for the bit string, it cannot be represented
  – Convert to binary

• From excess
  – Convert to decimal
  – Subtract the bias

Examples

• Convert -4 to excess 8
  – -4 + 8 = 4
  – 0100

• Convert 0110 (excess 8) to decimal
  – 0110 = 6
  – Subtract 8 = -2



Complement

• A complement is a negation of each bit in a bit string
• 1001 complemented is 0110
• The representation for positives is the same as with sign magnitude
• In complement arithmetic what we do is add a large bias to all negative numbers so that the negatives appear to be larger than the positives
  – In n bits the bias is 2^n - 1 for ones complement and 2^n for twos complement

• There are two frequently used complements:
  – Ones complement
  – Twos complement


Ones complement

• Reverses the bit pattern of the bit string, so 0101 becomes 1010
• This is trivially easy to do
• The left digit now functions as a sign
• The problem is that, as with sign magnitude, we end up with a positive and a negative zero
  – 0000 and 1111

• Twos complement is more common


Conversion

• To ones complement
  – Take the absolute value
  – Convert to binary
  – If the original was negative, complement

• To decimal
  – If negative (leftmost bit 1), complement
  – Convert to decimal
  – Reapply the sign


Twos complement

• The negative is obtained by:
  – Taking the complement
  – Adding 1 (ignoring overflow)

• 0110 = 6
  – Complement to 1001
  – Add 1 to get 1010

• This is nicely reversible
  – Complement to 0101
  – Add 1 to get 0110


Singularities

• Just one
• Consider 4 bit:
  – 0101 = +5
  – 0110 = +6
  – 0111 = +7
  – 1000 = -8
  – 1001 = -7
  – …
  – 1111 = -1
  – 0000 = 0


Two weird states

• What happens when zero is complemented?
  – Start with 0000
  – Complement to 1111
  – Add 1 (ignoring carry) to get 0000

• What happens when the largest negative is complemented?
  – Recall there is no positive counterpart
  – Start with -8 as 1000
  – Complement to 0111
  – Add 1 to get 1000


Conversion

• To twos complement
  – Take the absolute value
  – Convert to binary
  – If negative:
    – Complement
    – Add 1, ignoring overflow


Good Features

• One singularity
• One weird state: -8 is its own negative
• One zero
• Easy to count and do other arithmetic
• Twos complement is the most common coding for integers, almost universal for various reasons


Integer data type

• With twos complement the circuitry needed to perform arithmetic is very straightforward

• An adder, an inverter and the ability to shift left or right give us the four basic functions

• We will consider these later
• One other problem: overflow


Overflow

• Overflow can only occur when adding two operands of the same sign

• Merely check that the result has the same sign as both of the operands

• If so, no overflow occurred
• Otherwise you have detected overflow
  – Then post an interrupt


Integer Examples

• Suppose that we have 01001101
• What is that in signed magnitude?
  – 77
• What is that in excess 100?
  – -23
• What is that in ones complement?
  – 77
• What is that in twos complement?
  – 77


Integer Examples

• Suppose that we have 10011010
• What is that in unsigned binary?
  – 154
• What is that in signed magnitude?
  – -26
• What is that in excess 100?
  – 54
• What is that in ones complement?
  – -101
• What is that in twos complement?
  – -102


Real numbers

• The hardest and most interesting
• Some interesting numbers that we need to represent in order to be generally useful:
  – π 3.141592654
  – Speed of light 3.0 x 10^8 meters per second
  – Avogadro’s number 6.022 x 10^23 atoms per mole
  – Mass of earth 5.98 x 10^24 kilograms
  – Electron charge -1.60 x 10^-19 Coulombs
  – Planck’s constant 6.63 x 10^-27 erg seconds
• Without the ability to represent these, the machine will be of dubious usefulness


Real Numbers

• For the most part there are just two approaches
  – Rationals
  – Floating point


Rationals

• Just integer fractions
• They can exactly represent any number that is the ratio of two integers, within the range of those integers
• Store a numerator and denominator
• Our calculations with these are exact, provided overflow does not occur

• The disadvantage is that they have problems with some of the extreme numbers given before
  – But not necessarily the ones you might think


Rational Problems

• PI is irrational
  – But we can approximate it
  – The rational approximation is no worse than floating point

• The extreme magnitudes are more problematic
  – Such as Planck’s constant or Avogadro’s number

• An 8 byte integer holds only about 18 decimal digits (up to roughly 9.2 x 10^18)
  – Having two of those in a rational is very bulky
  – It still cannot represent either of the above


Floating point notation

• Similar to scientific notation: mantissa, base, exponent
• There is considerable variation in how this is done; many different schemes have been used successfully

• There are typically four pieces
  – Sign bit
  – Mantissa
  – Exponent
  – Base, which is usually implied by the format rather than stored


The pieces

• The sign bit determines the sign of the entire number; it is separate from the sign of the exponent and is usually stored apart from the mantissa
• The mantissa is the digits of precision
• The exponent is a number that shifts the radix point to the right or left
• The base is the number that is raised to the power of the exponent
  – Often 2 or 16


The IBM 370 example

• These may occupy 32, 64 (double) or 128 (extended) bits

• The first bit (bit position zero) is the sign bit (0=positive)

• Bits 1-7 are called the characteristic, which is the exponent

• Bits 8-31 (63 or 127) are called the fraction (the mantissa)


IBM 370 continued

• The fraction is always positive or zero
• The radix point is immediately to the left of the fraction's first bit, bit 8
• Hence 1/2 is a 1 in bit position 8 with zeros to the right of it, 1/4 is a 1 in bit position 9, and so on

• A number is represented as the fraction multiplied by 16 raised to the exponent given by the characteristic, with the sign attached to the result

• The characteristic is in excess 64 notation


IBM 370 Example

• 0 1000000 10000000 ...
• Here the sign is 0, that is positive
• The characteristic is 64, which in excess 64 notation represents an exponent of zero
• The leading bit of the fraction is 1, so this number is:
  – ½ x 16^0 = 0.5


Second example

• 1 1000001 01010100 0...
• The sign is 1, that is negative
• The characteristic is 65
  – In excess 64 notation this represents an exponent of one
• The fraction is 1/4 + 1/16 + 1/64
• (1/4 + 1/16 + 1/64) x 16^1
  – 16/4 = 4
  – 16/16 = 1
  – 16/64 = 1/4
  – With the sign applied: -5.25


Some Notes

• A 32 bit floating point number in this format has approximately 6.5 decimal digits of precision (significant digits)

• A 64 bit floating point number is just a 32 bit number with 32 extra bits of fraction

• This was convenient: a single precision operation on a double precision number gave a valid number with reduced precision

• An extended was a double with an additional 64 bits of precision

• However, there was no way to extend the exponent range, which was 16^-64 to 16^63
  – This gives a decimal exponent range of about -78 to +75, which is adequate for the vast majority of calculations


Zeros

• A true zero has a zero fraction, a zero sign and a zero characteristic

• If a zero fraction occurs, the hardware usually forces zeros elsewhere

• Why do we not want a number like
  – 1 000010 00000101 0100...
  – Its leading hex digit is 0000; those wasted bits reduce the legitimate digits we can store at the right end


Normalization

• The process of normalization means shifting the mantissa left as far as we can while adjusting the exponent so that the number keeps the same value

• Since the base is sixteen we shift left in units of four bits

• We may end up with leading zero bits, but the first four bits should always contain a one bit

• In a machine where the base is 2 instead of 16, we can always shift left until the first bit is one


IEEE floating point standard

• Some time after the IBM 370 arrived, the IEEE defined a standard for floating point numbers (IEEE 754)

• This was adopted for the numeric co-processor of Intel CPUs

• Has some desirable characteristics
• There are two flavors
  – Short (32 bit)
  – Long (64 bit)


IEEE short

• First bit is a sign
• Next eight bits are an exponent
• Almost excess 127 format
• Almost excess 127?
  – Range is 0-255
  – The 0 and 255 values are reserved
• Thus the usable range is 1-254
  – Legitimate exponents then are in the range -126 to 127
• Next 23 bits are the mantissa


IEEE Oddities

• A true zero has all bits zero in both the mantissa and the exponent, regardless of sign (so there is a +0 and a -0)

• There are three other special values:
  – Positive infinity, negative infinity and NaN


The weird states

• Positive infinity, which is sign 0, exponent 255, mantissa 0
  – This is the result of overflow, such as division by a number near zero, or of dividing a nonzero number by true zero

• Negative infinity, which is sign 1, exponent 255, mantissa 0
  – The same causes, where the sign of the result would have been negative

• Not a number (NaN), where the exponent is 255 and the mantissa is not zero, with either sign
  – This results from an undefined operation: zero divided by zero, infinity minus infinity, the square root of a negative, etc.


One More Trick

• Three zero mantissas
  – True zero
  – Positive infinity
  – Negative infinity

• All other numbers are normalized such that the first mantissa bit is one
  – The exception is the tiny denormalized numbers, which have an exponent field of 0 and no implied one

• Since it is always one, leave it out

• We effectively get 24 bits of precision out of 23 actual bits


IEEE long

• First bit is a sign
• Next eleven bits are an exponent
  – Almost excess 1023
• Next 52 bits are the mantissa
  – Same scheme as short, with just more bits in both mantissa and exponent
  – Intel processors use a similar scheme in their internal floating point numbers but use 80 bits, with the leading mantissa bit stored explicitly


DEC PDP11 floating

• Bit 0 is the sign
• Next 8 bits are the exponent in excess 128 format
• -128 (an exponent field of all zeros) is reserved for true zero
• Next 23 bits are the fraction
• Base is 2
• A double precision version adds 32 more bits of mantissa
• The leftmost bit of the normalized mantissa is not stored


Repeating and Terminating

• Some fractions terminate in a given base and others repeat forever
  – 1/4 is .01 in binary and terminates
  – 1/10 is .1 in decimal but .000110011… in binary

• .01 (one cent) is a repeating fraction in binary
• Hence it can no more be stored exactly than an irrational such as PI
• This really aggravates accountants
  – They do not like to lose pennies to roundoff error

• Some machines then use decimal arithmetic to alleviate these problems


Binary Coded Decimal

• Sometimes an alternative integer representation is desirable and available

• An IBM 360 may have the decimal arithmetic feature
  – Costs extra money

• This feature allows three kinds of representations of integers
  – External decimal (printable digit string)
  – Internal decimal (two digits per byte)
  – Binary


Three Forms of Integers

• Binary: twos complement
• Internal or packed decimal
• External or zoned decimal
  – The form of this has much to do with punch cards and other esoteric details lost in antiquity

• There is also a conversion path between the three


External decimal

• AKA zoned decimal
• A punch card just needs a single punch for a digit
  – Digit 5 is a five punch

• When it gets into EBCDIC it is F5 in hex

• The F is known as a zone
• How is sign handled?


Signs on External Decimal

• On input an eleven punch was overstruck on the last digit

• Thus -139 needed four punches
• It came out as F1F3D9 in hex
• On output (such as a report) this must be edited


Internal decimal

• Each byte may hold two decimal digits

• A pack instruction strips the zones and makes F1F3D9 into 139D, that is -139

• This is why it is known as packed decimal

• It may vary from 1 to 16 bytes


Internal decimal format

• All bytes but the last have two decimal digits
• The last byte has
  – One decimal digit
  – One sign

• The positive signs are C (preferred) or A, E, F

• The negative signs are D (preferred) or B

• Notice there is a clear test for validity
  – Unlike binary numbers, where every bit pattern is a valid number


Conversion Path

• External to binary
• Numbers were read from a card in zoned decimal
• A Pack instruction stripped the zones and rearranged the last byte
  – This instruction could not fail

• A Convert to Binary instruction deposited the result in a register
  – This could fail if the number was invalid


Conversion Path

• Binary to external
• Binary value in a register
• Use a Convert to Decimal instruction
  – Creates internal decimal
• Use an Unpack
  – Gives something to punch on a card
• Use an Edit
  – Gives something for a report


Decimal Arithmetic on 360

• Arithmetic can be performed in binary or internal decimal

• Binary arithmetic is always much faster
• Why use decimal?
  – Binary has faster arithmetic, but the conversion path is E=>I=>B and the reverse
  – Converting from decimal to binary, adding, and then converting back to decimal is slower than just doing it in decimal
  – Business applications typically have very simple arithmetic
• Decimal is variable length, from 1 to 16 bytes (1-31 digits), whereas binary can be 2, 4, or 8 bytes only
