
Data Encoding COSC 1301

Date post: 17-Jan-2018
Upload: owen-waters
Page 1: Data Encoding COSC 1301. Computers and Data Computers store information as sequences of bits Computers store many types of data: numbers text audio images.

Data Encoding
COSC 1301

Page 2:

Computers and Data
• Computers store information as sequences of bits
• Computers store many types of data:
  • numbers
  • text
  • audio
  • images
  • video

Page 3:

Standards
• Look around: how many items do you see that are based on a standard?
• Standards: make our lives simpler, more efficient
• Sometimes there aren't any.

Page 4:

Not Much of a Standard

Page 5:

A Small Number of Standards

Page 6:

A Small Number of Standards

Page 7:

A Small Number of Standards

Page 8:

Bitten by Lack of a Single Standard

Page 9:

Bitten by Lack of a Single Standard

Page 10:

Wishing for Standards

http://www.sheldonbrown.com/tire-sizing.html

Page 11:

A General Trend Toward Standards
Word Sizes of Early Computers

Machine         Word size    Year
EDVAC           44 bits      1947
MARK 1          40 bits      1948
EDSAC           17 bits      1949
CSIRAC          20 bits      1949
UNIVAC I        12 digits    1951
IBM 701         36 bits      1952
CDC 1604        48 bits      1959
CDC 6600        60 bits      1964
IBM 360         32 bits      1965
x86             16 bits      1978
x86 (32-bit)    32 bits      1986
x86-64          64 bits      2004

Page 12:

Standard: Integer Representation

• Representing integers in base 2:

93 = 0 1 0 1 1 1 0 1
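The base-2 digits on this slide can be reproduced by repeated division by 2. The sketch below is only an illustration; the function name `to_binary` and the default width of 8 bits are assumptions chosen to match the slide.

```python
def to_binary(n: int, width: int = 8) -> str:
    """Return the base-2 digits of a non-negative integer, padded to `width` bits."""
    if n < 0 or n >= 2 ** width:
        raise ValueError("value does not fit in the requested width")
    bits = []
    for _ in range(width):
        bits.append(str(n % 2))  # the remainder is the next bit, least significant first
        n //= 2
    return "".join(reversed(bits))  # most significant bit goes on the left


print(to_binary(93))  # 01011101
```

Reading the result back: 64 + 16 + 8 + 4 + 1 = 93.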

Page 13:

Integers
• Representing integers in base 2:

93 = 0 1 0 1 1 1 0 1

But what about: -93

1 1 0 1 1 1 0 1   (the leftmost bit is the sign bit)

Page 14:

Integers

But what about: -93

1 1 0 1 1 1 0 1   (the leftmost bit is the sign bit)

Problem: two representations of zero (positive zero and negative zero)
Unnecessary complexity

Better representations make it easier for the computer.
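A small sketch of the sign-bit scheme makes the "two zeros" problem concrete. The helper names `sign_magnitude` and `decode_sign_magnitude` are made up for this example, and 8 bits is assumed as on the slide.

```python
def sign_magnitude(n: int, width: int = 8) -> str:
    """Encode n with one sign bit followed by the magnitude in base 2."""
    if abs(n) >= 2 ** (width - 1):
        raise ValueError("magnitude does not fit in the requested width")
    sign = "1" if n < 0 else "0"
    return sign + format(abs(n), f"0{width - 1}b")


def decode_sign_magnitude(bits: str) -> int:
    """Decode a sign-and-magnitude bit string back into an integer."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude


print(sign_magnitude(93))   # 01011101
print(sign_magnitude(-93))  # 11011101

# Two different bit patterns decode to the same value, zero:
print(decode_sign_magnitude("00000000"))  # 0
print(decode_sign_magnitude("10000000"))  # 0
```

Hardware that compares or adds such numbers has to treat both patterns as equal, which is the unnecessary complexity the slide mentions.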

Page 15:

Two's Complement: Negative Integers

93:              0 1 0 1 1 1 0 1
Flip the bits:   1 0 1 0 0 0 1 0
Then add 1:      1 0 1 0 0 0 1 1   (this is -93)

A good explanation of why it works: http://www.cs.cornell.edu/~tomf/notes/cps104/twoscomp.html
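The flip-and-add-one rule can be sketched in a few lines. This is an illustration only: the names `twos_complement` and `decode_twos_complement` are hypothetical, and an 8-bit width is assumed to match the slide.

```python
def twos_complement(n: int, width: int = 8) -> str:
    """Encode n in `width`-bit two's complement using the flip-and-add-one rule."""
    if not -(2 ** (width - 1)) <= n < 2 ** (width - 1):
        raise ValueError("value does not fit in the requested width")
    if n >= 0:
        return format(n, f"0{width}b")
    mask = 2 ** width - 1          # all ones, e.g. 11111111 for 8 bits
    flipped = abs(n) ^ mask        # flip every bit of the positive value
    return format((flipped + 1) & mask, f"0{width}b")  # then add 1


def decode_twos_complement(bits: str) -> int:
    """Decode a two's complement bit string back into an integer."""
    value = int(bits, 2)
    if bits[0] == "1":             # leading 1 means the value is negative
        value -= 2 ** len(bits)
    return value


print(twos_complement(93))    # 01011101
print(twos_complement(-93))   # 10100011

# Adding the two patterns and keeping only 8 bits gives zero,
# so ordinary binary addition works for negative numbers too.
print((0b01011101 + 0b10100011) & 0xFF)  # 0
```

There is also only one pattern for zero in this scheme, which removes the problem of positive and negative zero.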

Page 16:

A Problem

What should we do about:

104.23

If we always want two places after the decimal point, then we could write:

10423

And then always treat it as though the decimal point were there.
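This idea is usually called fixed-point representation. A minimal sketch, assuming exactly two decimal places; the names `SCALE`, `to_fixed`, and `format_fixed` are invented for the example.

```python
from decimal import Decimal

SCALE = 100  # two places after the decimal point


def to_fixed(text: str) -> int:
    """Store a decimal string such as "104.23" as a whole number of hundredths."""
    return int(Decimal(text) * SCALE)


def format_fixed(value: int) -> str:
    """Put the decimal point back in its agreed position."""
    sign = "-" if value < 0 else ""
    whole, hundredths = divmod(abs(value), SCALE)
    return f"{sign}{whole}.{hundredths:02d}"


stored = to_fixed("104.23")
print(stored)                # 10423
print(format_fixed(stored))  # 104.23

# Arithmetic is plain integer arithmetic on the stored values.
print(format_fixed(stored + to_fixed("0.77")))  # 105.00
```

The limitation is that the number of decimal places is fixed in advance, which motivates the floating-point approach.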

Page 17:

Floating Point Numbers
• Floating point representation: exponential/scientific notation

Example: 123.45 can be represented as a decimal floating-point number with the integer 12345 as the significand and -2 as the exponent (and 10 as the base). Its value is given by the following:

123.45 = 12345 × 10^-2

See the following slide to see how a computer stores this.
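Python's standard `decimal` module stores numbers in this decimal significand-and-exponent form, so it can be used to check the example. The helper name `significand_and_exponent` is an assumption made for this sketch.

```python
from decimal import Decimal


def significand_and_exponent(text: str) -> tuple[int, int]:
    """Split a decimal string into an integer significand and a base-10 exponent."""
    sign, digits, exponent = Decimal(text).as_tuple()
    significand = int("".join(str(d) for d in digits))
    return (-significand if sign else significand), exponent


print(significand_and_exponent("123.45"))  # (12345, -2)

# Rebuilding the value: significand × 10^exponent
print(Decimal(12345) * Decimal(10) ** -2)  # 123.45
```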

Page 18:

IEEE Standard - Floating Point

Single Format: 32 bits (4 bytes) to store a floating point number:
• 1 bit for the sign
• 8 bits for the exponent
• 23 bits for the mantissa or significand

Double Format: 64 bits (8 bytes) to store a floating point number:
• 1 bit for the sign
• 11 bits for the exponent
• 52 bits for the mantissa or significand
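The three fields can be inspected with Python's standard `struct` module, which packs a float into its IEEE bytes. This is a sketch for exploration; the function name `ieee_fields` and its `double` flag are assumptions, not part of any standard API.

```python
import struct


def ieee_fields(x: float, double: bool = False) -> tuple[str, str, str]:
    """Return the (sign, exponent, mantissa) bit strings of x in single or double format."""
    if double:
        (raw,) = struct.unpack(">Q", struct.pack(">d", x))  # 8 bytes as one integer
        width, exponent_bits = 64, 11
    else:
        (raw,) = struct.unpack(">I", struct.pack(">f", x))  # 4 bytes as one integer
        width, exponent_bits = 32, 8
    bits = format(raw, f"0{width}b")
    return bits[0], bits[1:1 + exponent_bits], bits[1 + exponent_bits:]


sign, exponent, mantissa = ieee_fields(-2.0)
print(sign, exponent, mantissa)  # 1 10000000 00000000000000000000000

sign, exponent, mantissa = ieee_fields(1.0, double=True)
print(len(sign), len(exponent), len(mantissa))  # 1 11 52
```

Unlike the decimal example on the previous slide, the stored exponent here is a power of 2, and it is stored with an offset (127 for single format), which is why 1.0 shows an exponent field of 01111111 rather than all zeros.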

Page 19:

Text

To represent text digitally, we need to be able to represent every possible character that may appear:

Computers have revolutionized our world.

コンピュータは私たちの世界に革命をもたらしました。

Les ordinateurs ont révolutionné notre monde.

Page 20:

Text
• Decide how many characters we need to represent.
• Then: determine the required number of bits.
• English: 26 letters, 52 for upper and lower case. Plus punctuation...
• And other languages?
• character set: a list of characters and the codes used to represent each
• Several character sets have been used over the years; a standard makes processing text easier
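The step from "how many characters" to "how many bits" is a small calculation: n bits can distinguish 2^n symbols, so we need the smallest n with 2^n at least the number of characters. A sketch, with the function name `bits_needed` chosen for illustration:

```python
def bits_needed(symbols: int) -> int:
    """Smallest number of bits that can give each of `symbols` characters its own code."""
    if symbols < 1:
        raise ValueError("need at least one symbol")
    return max(1, (symbols - 1).bit_length())


print(bits_needed(26))   # 5  (lower-case letters only)
print(bits_needed(52))   # 6  (upper and lower case)
print(bits_needed(128))  # 7
print(bits_needed(256))  # 8
```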

Page 21:

ASCII
• ASCII: American Standard Code for Information Interchange
• 1963: 7 bits per character = 128 different symbols
• Thought to be enough at the time
• 8th bit in each character byte: used as a check bit or parity bit
  • check for errors in transmission of data
• Later: Latin-1 Extended ASCII character set
  • All 8 bits used to represent character
  • Represent 256 characters: includes accented characters, other special characters
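A parity bit can be sketched as follows. The slide does not say whether even or odd parity was used, so even parity is an assumption here, as are the function names `add_even_parity` and `parity_ok`.

```python
def add_even_parity(code: int) -> int:
    """Set the 8th bit of a 7-bit ASCII code so the total number of 1 bits is even."""
    if not 0 <= code < 128:
        raise ValueError("not a 7-bit ASCII code")
    parity = bin(code).count("1") % 2  # 1 if the code has an odd number of 1 bits
    return code | (parity << 7)


def parity_ok(byte: int) -> bool:
    """A received byte passes the check if it still has an even number of 1 bits."""
    return bin(byte).count("1") % 2 == 0


sent = add_even_parity(ord("C"))   # 'C' is 1000011, three 1 bits
print(format(sent, "08b"))         # 11000011
print(parity_ok(sent))             # True

damaged = sent ^ 0b00000100        # one bit flipped during transmission
print(parity_ok(damaged))          # False
```

A single parity bit detects any one flipped bit but cannot say which bit changed, and it misses errors that flip two bits.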

Page 22:

ASCII

http://www.krisl.net/cgi-bin/ascbin.pl

Page 23:

Representing Text

Fourscore and seven …

F        o        u        r
01000110 01101111 01110101 01110010
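The bit patterns for "Four" can be checked directly. A minimal sketch; `to_bit_strings` is a name made up for this example.

```python
def to_bit_strings(text: str) -> list[str]:
    """Return the 8-bit pattern for each character of an ASCII string."""
    return [format(byte, "08b") for byte in text.encode("ascii")]


print(to_bit_strings("Four"))
# ['01000110', '01101111', '01110101', '01110010']
```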

Page 24:

Representing Text

T  h  e     n  u  m  b  e  r     i  s     1  7  .
54 68 65 20 6E 75 6D 62 65 72 20 69 73 20 31 37 2E

(codes are in hexadecimal; 20 is the space character)
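One detail worth noticing in this example is that the digits "1" and "7" are stored as character codes (31 and 37), not as the number seventeen. The sketch below shows the difference; `to_hex` is a hypothetical helper name.

```python
def to_hex(data: bytes) -> str:
    """Format bytes as two-digit upper-case hexadecimal values separated by spaces."""
    return " ".join(f"{byte:02X}" for byte in data)


print(to_hex("The number is 17.".encode("ascii")))
# 54 68 65 20 6E 75 6D 62 65 72 20 69 73 20 31 37 2E

print(to_hex("17".encode("ascii")))   # 31 37  (two characters)
print(to_hex((17).to_bytes(1, "big")))  # 11     (the integer 17 in one byte)
```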

Page 25:

Computing with Text

Computers have revolutionized our world. They have changed the course of our daily lives, the way we do science, the way we entertain ourselves, the way that business is conducted, and the way we protect our security.

Suppose we want to capitalize this entire paragraph:

Let’s go back and look at the ASCII table to see how to do that.
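In the ASCII table each lower-case letter sits exactly 32 positions after its upper-case partner ("a" is 97, "A" is 65), so capitalizing is a subtraction. A sketch of that idea, with `ascii_upper` as an illustrative name; in practice Python's built-in `str.upper()` does this and also handles non-ASCII letters.

```python
def ascii_upper(text: str) -> str:
    """Capitalize ASCII letters by subtracting 32 from each lower-case code."""
    result = []
    for ch in text:
        code = ord(ch)
        if ord("a") <= code <= ord("z"):
            code -= 32  # distance between 'a' (97) and 'A' (65)
        result.append(chr(code))
    return "".join(result)


print(ascii_upper("Computers have revolutionized our world."))
# COMPUTERS HAVE REVOLUTIONIZED OUR WORLD.
```

Characters outside a-z, such as spaces and punctuation, are left unchanged.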

Page 26:

When We Need More Characters

What about things like:

简体字

Page 27:

When We Need More Characters

What about things like:

简体字

Answer: Unicode

A conversion applet: http://www.pinyin.info/tools/converter/chars2uninumbers.html

Page 28:

Unicode
• Previously, a letter maps to some bits:

A encoded as 0100 0001

• In Unicode, a letter maps to a code point, a number like U+0639
  • U+ means Unicode
  • numbers are hexadecimal
• Every character has a Unicode code point
• This doesn't indicate how the code point is encoded as a sequence of bits, though
• U+0041: English letter A
• U+0639: Arabic letter Ain
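Python strings are sequences of Unicode code points, so the built-in `ord` and `chr` functions convert between a character and its number. The helper `code_point` below is a made-up name that only adds the U+ formatting.

```python
def code_point(ch: str) -> str:
    """Return the Unicode code point of one character in U+XXXX notation."""
    return f"U+{ord(ch):04X}"


print(code_point("A"))       # U+0041
print(code_point("\u0639"))  # U+0639  (Arabic letter Ain)

# chr goes the other way, from number to character.
print(chr(0x0041))           # A
```

Note that nothing here says how many bytes the character occupies in memory or in a file; that is decided by an encoding.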

Page 29:

Unicode
• Example: Hello
  • 5 code points, one code point (i.e., number) per letter
  • U+0048 U+0065 U+006C U+006C U+006F
• How is this stored in memory? Different standards for this.
• One standard: UTF-8
  • Standard system for storing strings of Unicode code points in binary (i.e., U+DDDD stored in some number of bytes)

Page 30:

UTF-8
• Code points 0-127 stored in one byte
  • So English text looks same in UTF-8 as ASCII (backwards compatible)
• Code points 128 and higher: 2, 3, or 4 bytes (the original design allowed up to 6)
• Hello: U+0048 U+0065 U+006C U+006C U+006F
  • Stored as: 48 65 6C 6C 6F (same as ASCII)
• For Hebrew characters, accented letters, etc.: you may need more bytes
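The variable length of UTF-8 is easy to observe with Python's built-in `str.encode`. A small sketch; `utf8_hex` is an illustrative helper name, and the sample characters are chosen only as examples.

```python
def utf8_hex(text: str) -> str:
    """Encode text as UTF-8 and show the bytes as hexadecimal."""
    return " ".join(f"{byte:02X}" for byte in text.encode("utf-8"))


print(utf8_hex("Hello"))   # 48 65 6C 6C 6F  (identical to ASCII)
print(utf8_hex("\u00e9"))  # C3 A9           (accented letter e: 2 bytes)

# Number of bytes used for one character from each range:
for ch in ["A", "\u00e9", "\u0639", "\u5b57", "\U0001F600"]:
    print(f"U+{ord(ch):04X} takes {len(ch.encode('utf-8'))} byte(s)")
```

The last loop prints byte counts of 1, 2, 2, 3, and 4 for a Latin letter, an accented letter, an Arabic letter, a Chinese character, and an emoji.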

