

Copyright 2005 Curt Hill

Data Representation

How are data represented?

Introduction to data coding

• In order to manipulate data on a computer we need to be able to represent it

• The purpose of the binary/octal/hexadecimal talk is that everything in the machine’s memory is in binary, though we can display it in many forms

• The expression digital computers comes from the fact that all data is quantized into numbers

• By contrast we also have analog computers, where numbers are represented by voltages that can vary continuously

What needs representation?

• We need to be able to represent a variety of different types of data
  – Such as C’s simple types:
    • Integers (int)
    • Reals (float and double)
    • Characters (char)
    • Booleans (bool)

Boolean

• Easiest to represent
• Booleans only need one bit to represent a true/false value
• However, a bit is an inconvenient amount since bytes (8 bits) are the ordinary unit of storage
• We end up with a two pronged approach
• Usually store a boolean in one byte
  – EG 0 = false, 1 = true
  – or perhaps one word, same values
• Collect 1-8 booleans in a single byte and manipulate as a group
  – Here a boolean only occupies one bit
  – Some machine instructions will have a boolean embedded in them as an instruction switch

Character

• Character is almost as easy
• We essentially map one byte to one character
• The mapping is just a convention between the sending and receiving device
• Several standard conventions: ASCII, EBCDIC, BCD, CDC display code, Unicode
• Each is a different mapping and length

ASCII

• American Standard Code for Information Interchange

• ASCII is a 7 bit code 0-127
• See next for ordering

ASCII has the following ordering:

• control characters (0-31)
• blank (32)
• punctuation and math (33-47)
  – $ 36
  – & 38
  – ) 41
  – . 46
  – / 47
• digits (48-57)
• more punctuation (58-64)
  – < 60
  – ? 63
• Upper case (65-90)
• more punctuation (91-96)
  – [ 91
  – _ 95
• Lower case (97-122)
• more punctuation (123-126)
  – { 123
  – ~ 126
• delete, a control character (127)
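
The ordering can be checked with Python's built-in ord, which returns a character's code. The helper name ascii_class is hypothetical, and this sketch treats code 127 (delete) as a control character.

```python
def ascii_class(ch):
    """Classify one character by its ASCII code, following the ranges above."""
    code = ord(ch)
    if code < 32 or code == 127:
        return "control"
    if code == 32:
        return "blank"
    if 48 <= code <= 57:
        return "digit"
    if 65 <= code <= 90:
        return "upper"
    if 97 <= code <= 122:
        return "lower"
    if code < 128:
        return "punctuation"
    return "not ASCII"

for ch in "A", "a", "7", "$", "~", " ":
    print(repr(ch), ord(ch), ascii_class(ch))

# Upper and lower case letters differ by exactly 32, that is, by one bit.
print(ord("a") - ord("A"))
```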

EBCDIC

• Extended Binary Coded Decimal Interchange Code

• 8 bit code 0-255
• IBM and other mainframes
• See next for ordering

EBCDIC ordering:

• control characters (0-63)
• blank (64)
• punctuation and math (65-127)
• Lower case (129-169, with gaps)
• Upper case (193-233, with gaps)
• digits (240-249)
• The gaps are sometimes unassigned, sometimes occupied

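Others

• Most of the others are seldom used anymore
• The one other important code is Unicode
• Unicode began as a 16 bit code and has since grown to code points 0 through 10FFFF hex
  – Code points are stored using an encoding such as UTF-8, UTF-16 or UTF-32
  – In the 16 bit view the first byte selects a block of 256 characters and the second byte selects the character within it
• Block 0 begins with ASCII, so the first 128 code points are the ASCII characters
• The accented letters of Western European languages follow in the rest of block 0
  – So these languages look similar to ASCII but with a few additional characters
• East Asian languages usually need many blocks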

What is desirable for integers?

• The desirable characteristics include:
  – Easy to use
  – Compact
  – Accurate
  – Standardized

• Some of these are contradictory

Easy to use

• The form has to be such that the CPU can perform operations on it efficiently

• Consider integers
• Arithmetic needs to be easy to implement using digital logic
• Conversion from an internal representation to an external representation should also be easy

Compact

• It is a maxim that no computer has too much memory
• We consider the compactness issue in many contexts
• We want our data to be compact, whether it is integer, real, etc.
• We also want our instructions compact
  – Which tends to make us want our addresses compact
  – Which will bring out the whole issue of addressing schemes, but later

Accurate

• We could represent all integers as signed characters
• However, that would give us an arithmetic range of ±127
  – This does not prove to be a valuable range for integer arithmetic
• Generally speaking compactness and accuracy are opposites
• Remember the three digit arithmetic game

Standardized for interchange

• We would like to transfer data around between different models of computers
  – Recall big-endian and little-endian
• Hence if we can settle on one or just a few data transfer formats then this problem will be much easier
• Generally transporting large amounts of data in external format is a bad idea

Representing signed numbers

• The binary presentation showed how to store positive integers

• How do we represent negatives?
• Three general strategies:
  – Sign magnitude
  – Excess notation (aka bias)
  – Complement

• All have been used

Issues

• How many zeros?
  – One is desirable
• How many singularities?
  – A singularity is where we get an unexpected value when counting by one in binary
  – Overflow is the result
  – We will always have at least one in a finite number of bits
• How do we change signs?
  – How easy is it to say whether a value is positive/zero/negative?
• How easy is it to add two numbers, especially of opposite sign?

Sign magnitude

• Partition the bit string into two pieces
• One bit for sign, usually the left most bit
• The rest for the magnitude of the number
• Example: 4 bit number:
  – 1000 = 0000 = zero
  – 0010 = 2
  – 1010 = -2
  – ±7 is the range
• Notice the two representations for zero, which makes circuitry somewhat more exciting, that is complicated
• Flipping the sign is easy
• There are also two singularities

Sign Magnitude Singularities

• If we count up from positive zero
  • 0000 = 0
  • 0001 = 1
  • …
  • 0111 = 7
  • 1000 = -0
• If we count down from negative zero
  • 1000 = -0
  • 1001 = -1
  • …
  • 1111 = -7
  • 0000 = +0

Sign Magnitude Conversion

• Converting to or from signed magnitude is easy

• Take absolute value of number
• Convert decimal to binary
• Set the sign bit if the number was negative
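
A minimal sketch of these conversion steps for a bit string of a chosen width. The function names are hypothetical, and patterns are held as ordinary Python integers.

```python
def to_sign_magnitude(value, bits=4):
    """Encode value as a sign bit followed by bits-1 magnitude bits."""
    limit = (1 << (bits - 1)) - 1          # 7 for a 4 bit string
    if abs(value) > limit:
        raise ValueError("magnitude does not fit")
    sign = 1 if value < 0 else 0
    return (sign << (bits - 1)) | abs(value)

def from_sign_magnitude(pattern, bits=4):
    """Decode a sign magnitude bit pattern back to an integer."""
    sign_mask = 1 << (bits - 1)
    magnitude = pattern & (sign_mask - 1)
    return -magnitude if pattern & sign_mask else magnitude

print(format(to_sign_magnitude(-2), "04b"))
print(from_sign_magnitude(0b1111))
# Both zeros decode to the same integer, which is the two zero problem.
print(from_sign_magnitude(0b0000), from_sign_magnitude(0b1000))
```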

Excess notation

• Also known as biased notation
• In unsigned binary we count from zero to some power of two minus one
  – In four bit: 0-15

• The basic idea of excess notation is that we use the same range but do not start at zero

Bias

• Use all the range of your bit string and then subtract a fixed amount
  – This is the excess or bias amount
• For example consider our 4 bit string
  – 0000 - 1111 is 0-15
  – Choose a number, usually near the range divided by 2, and subtract
• In this case consider 8
  – 0000 becomes (0 - 8) or -8
  – 1000 becomes (8 - 8) or zero
  – 1111 becomes (15 - 8) or 7
  – Range is -8 to 7
• This is excess 8

Excess identification

• We may choose any number within the range to subtract from the straight binary value
  – This identifies the notation
• Usually near the midpoint
  – For four bit it is usually 7 or 8
  – 4 bit excess 7 gives -7 … +8
  – 4 bit excess 8 gives -8 … +7
• There is no need for symmetry in the range
  – 4 bit excess 4 gives -4 … +11

Bias: the good and the bad

• There is only one representation of zero so that simplifies things

• There is only one singularity
• Counting is simple
• In general there is no easy way to identify the sign
  – In four bit excess 8 the first bit acts like a sign bit (1 means zero or positive)
  – In all other forms that is not the case
• No easy negation either

Conversion

• To Excess notation
  – Add bias value to the decimal number
  – If the sum is negative, or too large for the bit string, the number cannot be represented
  – Convert to binary
• From Excess
  – Convert to decimal
  – Subtract bias

Examples

• Convert -4 to excess 8
  – -4 + 8 = 4
  – 0100
• Convert 0110 (excess 8) to decimal
  – 0110 = 6
  – Subtract 8 = -2
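
The two conversions can be sketched in a few lines of Python. The function names and the default of 4 bit excess 8 are assumptions made for illustration.

```python
def to_excess(value, bias=8, bits=4):
    """Add the bias; the sum must fit in an unsigned bit string."""
    stored = value + bias
    if not 0 <= stored < (1 << bits):
        raise ValueError("value cannot be represented with this bias")
    return stored

def from_excess(pattern, bias=8):
    """Read the pattern as unsigned binary, then subtract the bias."""
    return pattern - bias

print(format(to_excess(-4), "04b"))     # the first example above
print(from_excess(0b0110))              # the second example above
print(from_excess(0b1111, bias=4))      # excess 4 has a lopsided range
```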

Complement

• A complement is a negation of each bit in a bit string
• 1001 complemented is 0110
• The representation for positives is the same as with sign magnitude
• In complement arithmetic what we do is add a large bias to all negative numbers so that the negatives appear to be larger than the positives
• There are two frequently used complements:
  – Ones complement
  – Twos complement

Ones complement

• Reverses the bit pattern of the bit string, so 0101 becomes 1010
• This is trivially easy to do
• The left digit now functions as a sign
• The problem is that like sign magnitude we end up with a positive and negative zero

• Somewhat more common is twos complement

Conversion

• To ones complement
  – Take absolute value
  – Convert to binary
  – If original was negative, complement
• To decimal
  – If negative, complement
  – Convert to decimal
  – Recall the sign and apply
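
These steps can be sketched as follows, with the complement done by XOR against an all-ones mask. The function names are hypothetical.

```python
def to_ones_complement(value, bits=4):
    """Encode value; negatives are the bitwise complement of the magnitude."""
    mask = (1 << bits) - 1
    if abs(value) > (mask >> 1):
        raise ValueError("magnitude does not fit")
    pattern = abs(value)
    return pattern ^ mask if value < 0 else pattern

def from_ones_complement(pattern, bits=4):
    """Decode; a leading 1 means complement first, then negate."""
    mask = (1 << bits) - 1
    if pattern >> (bits - 1):
        return -(pattern ^ mask)
    return pattern

print(format(to_ones_complement(-5), "04b"))
print(from_ones_complement(0b1010))
# 1111 is the negative zero.
print(from_ones_complement(0b1111))
```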

Twos complement

• The negative is obtained by:
  – Taking complement
  – Add 1 (ignoring overflow)
• 0110 = 6
  – Complement to 1001
  – Add 1 to 1010
• This is nicely reversible
  – Complement to 0101
  – Add 1 to 0110

Singularities

• Just one
• Consider 4 bit
  – 0101 = +5
  – 0110 = +6
  – 0111 = +7
  – 1000 = -8
  – 1001 = -7
  – …
  – 1111 = -1
  – 0000 = 0

Two weird states

• What happens when zero is complemented?
  – Start with 0000
  – Complement to 1111
  – Add 1 (ignoring carry) to 0000
• What happens when the largest negative is complemented?
  – Recall there is no positive counterpart
  – Start with -8 as 1000
  – Complement to 0111
  – Add 1 to get 1000

Conversion

• To twos complement
• Take absolute value
• Convert to binary
• If negative
  – Complement
  – Add 1 ignoring overflow
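
A minimal sketch of the negate and convert steps, with "ignoring overflow" done by masking the result to the bit width. The function names are hypothetical.

```python
def negate(pattern, bits=4):
    """Complement every bit, add 1, and drop any carry out of the top."""
    mask = (1 << bits) - 1
    return ((pattern ^ mask) + 1) & mask

def to_twos_complement(value, bits=4):
    """Encode value following the steps above."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value does not fit")
    pattern = abs(value)
    return negate(pattern, bits) if value < 0 else pattern

def from_twos_complement(pattern, bits=4):
    """Decode; a leading 1 means the value is negative."""
    if pattern >> (bits - 1):
        return -negate(pattern, bits)
    return pattern

print(format(negate(0b0110), "04b"))   # 6 negated
print(format(negate(0b0000), "04b"))   # zero is its own negative
print(format(negate(0b1000), "04b"))   # so is -8, the weird state
print(from_twos_complement(0b1111))
```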

Good Features

• One singularity
• One weird state: -8 is its own negative
• One zero
• Easy to count and do other arithmetic
• Twos complement is the most common coding for integers, almost universal for various reasons

Integer data type

• With twos complement the circuitry needed to perform arithmetic is very straightforward

• An adder, an inverter and the ability to shift left or right give us the four basic functions
• We will consider these later
• One other problem: overflow

Overflow

• Overflow is only a problem in the addition of two operands of the same sign
• Merely check that the result has the same sign as both of the operands
• If so then no overflow occurred
• Otherwise you have detected overflow
  – Then post an interrupt
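
The sign check can be sketched on 4 bit twos complement patterns. The function name is hypothetical, and a returned flag stands in for the interrupt.

```python
def add_with_overflow(a, b, bits=4):
    """Add two bit patterns; return (result pattern, overflow flag)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    result = (a + b) & mask                      # drop the carry out
    same_sign_in = (a & sign) == (b & sign)
    # Overflow: operands agree in sign but the result does not.
    overflow = same_sign_in and (result & sign) != (a & sign)
    return result, overflow

print(add_with_overflow(0b0011, 0b0010))   # 3 + 2 fits
print(add_with_overflow(0b0101, 0b0100))   # 5 + 4 does not fit in 4 bits
print(add_with_overflow(0b0101, 0b1110))   # 5 + (-2), opposite signs
```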

Integer Examples

• Suppose that we have 01001101
• What is that in signed magnitude?
• 77
• What is that in excess 100?
• -23
• What is that in ones complement?
• 77
• What is that in twos complement?
• 77

Integer Examples

• Suppose that we have 10011010
• What is that in unsigned binary?
• 154
• What is that in signed magnitude?
• -26
• What is that in excess 100?
• 54
• What is that in ones complement?
• -101
• What is that in twos complement?
• -102
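
These answers can be checked with a short sketch that reads one bit string under every scheme. The function name and the dictionary keys are hypothetical.

```python
def interpretations(bit_string, bias=100):
    """Decode one bit string under each integer representation."""
    bits = len(bit_string)
    raw = int(bit_string, 2)
    mask = (1 << bits) - 1
    negative = bit_string[0] == "1"
    magnitude = raw & (mask >> 1)          # everything but the sign bit
    return {
        "unsigned": raw,
        "sign magnitude": -magnitude if negative else magnitude,
        "excess": raw - bias,
        "ones complement": -(raw ^ mask) if negative else raw,
        "twos complement": raw - (1 << bits) if negative else raw,
    }

for pattern in ("01001101", "10011010"):
    print(pattern, interpretations(pattern))
```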

Real numbers

• The hardest and most interesting
• Some interesting numbers that we need to represent in order to be generally useful:
• π 3.141592654
• Speed of light 3.0 x 10^8 meters per second
• Avogadro’s number 6.022 x 10^23 atoms per mole
• Mass of earth 5.98 x 10^24 kilograms
• Electron charge -1.60 x 10^-19 Coulombs
• Planck’s constant 6.63 x 10^-27 erg seconds
• Without the ability to represent these, the machine will be of dubious usefulness

Real Numbers

• For the most part there are just two approaches
  – Rationals
  – Floating points

Rationals

• Just integer fractions
• They can exactly represent any number that is the ratio of two integers
• Store a numerator and denominator
• Our calculations with these are accurate, provided overflow does not occur
• The disadvantage of these is that they have problems with some of the extreme numbers that are given before
  – But not necessarily the ones you might think

Rational Problems

• PI is irrational
  – But we can represent it
  – The rational expression is no worse than floating point
• The extreme magnitudes are more problematic
  – Such as Planck’s constant or Avogadro’s number
• An 8 byte integer can only reach a decimal exponent of about 18
  – To have two of those in a rational is very bulky
  – Still cannot represent either of the above
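
A rough illustration using Python's fractions module. The 355/113 approximation of pi and the helper fits_in_8_byte_pair are assumptions added for this sketch; the limit is that of a signed 8 byte integer.

```python
import math
from fractions import Fraction

# 355/113 is a classic rational approximation of pi.
approx = Fraction(355, 113)
print(float(approx), abs(float(approx) - math.pi))

LIMIT = 2**63 - 1   # largest signed 8 byte integer, about 9.2 x 10^18

def fits_in_8_byte_pair(q):
    """True if numerator and denominator each fit in 8 bytes."""
    return abs(q.numerator) <= LIMIT and q.denominator <= LIMIT

avogadro = Fraction(6022, 1000) * 10**23
planck = Fraction(663, 100) / 10**27

print(fits_in_8_byte_pair(approx))     # pi is not the problem
print(fits_in_8_byte_pair(avogadro))   # numerator is too large
print(fits_in_8_byte_pair(planck))     # denominator is too large
```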

Floating point notation

• Similar to scientific notation
• Mantissa, base, exponent
• There is considerable variation as to how this is done, that is, there are a lot of different schemes that have been used successfully
• There are typically four pieces
  – Sign bit
  – Mantissa
  – Exponent
  – Base

The pieces

• Sign bit determines the sign of the entire number and is usually separate from that of the exponent and separated (physically) from the mantissa
• Mantissa is the digits of precision
• Exponent is a number that shifts the radix point to right or left
• The base is the number that the exponent is raised to
  – Often 2 or 16

The IBM 370 example

• These may occupy 32, 64 (double) or 128 (extended) bits

• The first bit (bit position zero) is the sign bit (0=positive)

• Bits 1-7 are called the characteristic, which is the exponent

• Bits 8-31 (63 or 127) are called the fraction (the mantissa)

IBM 370 continued

• The fraction is always positive or zero
• The radix point is to the left of the first fraction digit, that is bit 8
• Hence 1/2 is a 1 in bit position 8 and zeros to the right of that, 1/4 is a 1 in bit position 9 ...
• A number is represented as the fraction multiplied by 16 raised to the power given by the characteristic, with the sign attached to the result
• The characteristic is in excess 64 notation

IBM 370 Example

• 0 1000000 10000000 ...
• Here the sign is 0, that is positive
• The exponent is 64, which in excess 64 notation represents an exponent of zero
• The leading bit of the mantissa is 1 so this number is:
• ½ x 16^0 = 0.5

Second example

• 1 1000001 01010100 0...
• The sign is 1, that is negative
• The exponent is 65
  – In excess 64 notation represents an exponent of one
• The mantissa is
• 1/4 + 1/16 + 1/64
• (1/4 + 1/16 + 1/64) x 16^1
  – 16/4 = 4
  – 16/16 = 1
  – 16/64 = 1/4
  – -5.25
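
Both worked examples can be checked with a small decoder for the 32 bit form. This is a sketch that assumes the word arrives as a Python integer; decode_ibm_short is a hypothetical name.

```python
def decode_ibm_short(word):
    """Decode a 32 bit word: sign, 7 bit characteristic, 24 bit fraction."""
    sign = word >> 31
    characteristic = (word >> 24) & 0x7F
    # The radix point sits to the left of the 24 fraction bits.
    fraction = (word & 0xFFFFFF) / float(1 << 24)
    value = fraction * 16.0 ** (characteristic - 64)   # excess 64
    return -value if sign else value

# 0 1000000 10000000 0...  is hex 40800000
print(decode_ibm_short(0x40800000))
# 1 1000001 01010100 0...  is hex C1540000
print(decode_ibm_short(0xC1540000))
```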

Some Notes

• A 32 bit floating point number in this format has approximately 6.5 decimal digits of precision (significant digits)
• A 64 bit floating point number is just a 32 bit number with 32 extra bits of precision
• This was convenient: if you did a single precision operation on a double precision number you got a valid number with reduced precision
• An extended was a double with an additional 64 bits of precision
• However there was no way to extend the exponent range, which was 16^-64 to 16^63. This gives a decimal exponent of about -78 to +75, which is adequate for the vast majority of calculations

Zeros

• A true zero has a zero fraction and a zero sign and exponent

• If a zero fraction occurs, then the hardware usually forces zeros elsewhere

• Why do we not want a number like
• 1 000010 00000101 0100...
• The leading zeros in the fraction reduce the legitimate digits we can store later

Normalization

• The process of normalization means to shift the mantissa left as far as we can while adjusting the exponent so that it means the same thing
• Since the base is sixteen we shift left in units of four bits
• We may end up with leading zero bits, but in the first four bits there should always be a one bit
• In a machine where the base is 2 instead of 16, we can always shift to the left until the first bit is one
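
A minimal sketch of base 16 normalization on a 24 bit fraction and an excess 64 characteristic, both held as Python integers. The function names are hypothetical, and real hardware would also have to handle exponent underflow.

```python
def normalize(fraction, characteristic):
    """Shift left one hex digit at a time until the top digit is nonzero."""
    if fraction == 0:
        return 0, 0                      # force a true zero
    while fraction & 0xF00000 == 0 and characteristic > 0:
        fraction = (fraction << 4) & 0xFFFFFF
        characteristic -= 1              # each 4 bit shift is one power of 16
    return fraction, characteristic

def value(fraction, characteristic):
    """Numeric value of the pair, ignoring the sign bit."""
    return fraction / float(1 << 24) * 16.0 ** (characteristic - 64)

before = (0x054000, 66)
after = normalize(*before)
print([hex(n) for n in after])
print(value(*before), value(*after))     # same number either way
```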

IEEE floating point standard

• Sometime after the IBM 370 arrived the IEEE defined a standard for floating point numbers

• This was adopted for the numeric co-processor of Intel CPUs

• Has some desirable characteristics
• There are two flavors
  – Short (32 bit)
  – Long (64 bit)

IEEE short

• First bit is a sign
• Next eight bits are an exponent
• Almost excess 127 format
• Almost excess 127?
  – Range is 0-255
  – The 0 and 255 values are reserved
    • Thus range is 1-254
  – Legitimate exponents then are in range -126 to 127
• Next 23 bits are mantissa

IEEE Oddities

• A true zero has all bits zero in both mantissa and exponent, regardless of sign

• There are three other special values:

• Positive infinity, negative infinity and NaN

The weird states

• Positive infinity, which is sign 0, exponent 255, mantissa 0
  – This is the result of overflow, or of dividing a positive number by zero
• Negative infinity, which is sign 1, exponent 255, mantissa 0
  – This is the result of overflow, or of division by zero, where the sign would have been negative
• Not a number (NaN), where exponent is 255 and the mantissa is not zero, with either sign
  – This results from an undefined operation: zero divided by zero, square root of a negative, etc.
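
The special bit patterns can be inspected with Python's struct module, which packs a value into the 32 bit IEEE format. The helper name float_bits is hypothetical.

```python
import math
import struct

def float_bits(x):
    """Return (sign, exponent, mantissa) of x stored as a 32 bit float."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    return word >> 31, (word >> 23) & 0xFF, word & 0x7FFFFF

print("+inf", float_bits(math.inf))
print("-inf", float_bits(-math.inf))
print("NaN ", float_bits(math.nan))     # exponent 255, mantissa not zero
print("+0  ", float_bits(0.0))
print("-0  ", float_bits(-0.0))
```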

One More Trick

• Three zero mantissas
  – True zero
  – Positive infinity
  – Negative infinity
• All other numbers are normalized such that the first mantissa bit is one
• Since it is always one, leave it out
• We effectively get 24 bits of precision out of 23 actual bits
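
A minimal sketch of decoding a normalized short value by restoring the hidden leading one. The function name is hypothetical, and the reserved exponents 0 and 255 are rejected rather than handled.

```python
import struct

def decode_ieee_short(x):
    """Rebuild x from its 32 bit fields, restoring the hidden one bit."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31
    exponent = (word >> 23) & 0xFF
    mantissa = word & 0x7FFFFF
    if not 1 <= exponent <= 254:
        raise ValueError("reserved exponent, not a normalized number")
    significand = 1 + mantissa / 2**23       # the bit that was left out
    value = significand * 2.0 ** (exponent - 127)
    return -value if sign else value

print(decode_ieee_short(0.5))
print(decode_ieee_short(-5.25))
```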

IEEE long

• First bit is a sign
• Next eleven bits are an exponent
  – Almost excess 1023
• Next 52 bits are mantissa
  – Same scheme as short with just more bits in both mantissa and exponent
  – Intel processors use the same scheme in their internal floating point numbers but use 80 bits

DEC PDP11 floating

• Bit 0 is sign
• Next 8 bits are the exponent in excess 128 format
• -128 is reserved for true zero
• Next 23 bits are the fraction
• Base is 2
• A double precision version adds 32 more bits of mantissa
• The leftmost bit of the normalized mantissa is not stored

Repeating and Terminating

• Explanation of terminating and repeating fractions

• .01 is a repeating fraction in binary
• Hence, it is no more exactly representable than an irrational such as PI
• This really aggravates accountants
  – They do not like to lose pennies to roundoff error
• Some machines then use decimal arithmetic to alleviate these problems
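
The effect is easy to see in Python, whose float is binary floating point and whose decimal module does the arithmetic in base ten. This is only an illustration of the idea, not of any particular machine's decimal hardware.

```python
from decimal import Decimal
from fractions import Fraction

# The binary float nearest to .01 is not exactly 1/100.
print(Fraction(0.01) == Fraction(1, 100))

# Roundoff shows up in ordinary sums.
print(0.1 + 0.2 == 0.3)
print(sum([0.01] * 100))

# Decimal arithmetic keeps the pennies exact.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
pennies = sum([Decimal("0.01")] * 100)
print(pennies)
```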

Binary Coded Decimal

• Sometimes an alternative integer representation is desirable and available
• An IBM 360 may have the decimal arithmetic feature
  – Costs extra money
• This feature allows three kinds of representations of integers
  – External decimal (printable digit string)
  – Internal decimal (two digits per byte)
  – Binary

Three Forms of Integers

• Binary (twos complement)
• Internal or packed decimal
• External or zoned decimal
• The form of this has much to do with punch cards and other esoteric details lost in antiquity
• There is also a conversion path between the three

External decimal

• AKA Zoned decimal
• A punch card just needs a single punch for a digit
  – Digit 5 is a five punch
• When it gets into EBCDIC it is F5 in hex
• The F is known as a zone
• How is sign handled?

Signs on External Decimal

• On input an eleven punch was overstruck on the last digit

• Thus -139 needed four punches
• It came out to F1F3D9
• On output (such as a report) this must be edited

Internal decimal

• Each byte may hold two decimal digits

• A pack instruction strips the zone and makes the F1F3D9 into 139D
  – -139

• This is why it is known as packed decimal

• It may vary from 1 to 16 bytes

Internal decimal format

• All bytes but the last have two decimal digits
• The last byte
  – One decimal digit
  – One sign
• The positive signs are C (preferred) or A, E, F
• The negative signs are D (preferred) or B
• Notice there is a clear test for validity
  – Unlike binary numbers

Conversion Path

• External to binary
• Numbers were read off of a card in zoned decimal
• A Pack instruction stripped the zones and rearranged the last byte
  – This instruction could not fail
• A Convert to Binary instruction deposited the result in a register
  – This could fail if the number was invalid
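
The external to binary path can be imitated in Python. This is a sketch of the idea only, not of the actual machine instructions; the function names pack and packed_to_int are hypothetical stand-ins for Pack and Convert to Binary.

```python
def pack(zoned):
    """Strip the zones and move the sign to the last half byte."""
    digits = [b & 0x0F for b in zoned]       # low half of every byte
    sign = zoned[-1] >> 4                    # zone of the last byte
    nibbles = digits + [sign]
    if len(nibbles) % 2:
        nibbles.insert(0, 0)                 # pad to whole bytes
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))

def packed_to_int(packed):
    """Convert packed decimal to an integer, rejecting invalid data."""
    nibbles = []
    for b in packed:
        nibbles += [b >> 4, b & 0x0F]
    *digits, sign = nibbles
    if any(d > 9 for d in digits) or sign < 0xA:
        raise ValueError("invalid packed decimal")
    value = int("".join(map(str, digits)))
    return -value if sign in (0xB, 0xD) else value

packed = pack(bytes.fromhex("F1F3D9"))
print(packed.hex().upper())
print(packed_to_int(packed))
```

As on the slides, only the second step checks validity; pack simply rearranges half bytes.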

Conversion Path

• Binary to external
• Binary value in register
• Use Convert to Decimal instruction
  – Creates internal decimal
• Use an Unpack
  – Gives something to punch on a card
• Use an Edit
  – Gives something for a report

Decimal Arithmetic on 360

• Arithmetic can be performed in binary or internal decimal

• Binary arithmetic is always much faster
• Why use decimal?
• Binary has faster arithmetic, but the conversion path is external => internal => binary and the reverse
• Converting from decimal to binary, adding, and then converting back to decimal is slower than just doing it in decimal
  – Business applications typically have very simple arithmetic
• The decimal is variable length from 1 to 16 bytes (1-31 digits), where binary can be 2, 4, or 8 bytes only

