I308 Information Representation - IU South Bend: …hhakimza/I308/Notes/Part1.pdf · I308...


I308 Information Representation

Dr. Hossein Hakimzadeh Computer Science and Informatics

IU South Bend

What is a Computer?

An electronic digital device that can store and process data.

A fast and accurate electronic symbol (or data) manipulating system that is designed to accept and store input data, process it, and produce output results.

A programmable, multi-use machine that accepts data (raw facts) and processes, or manipulates, it into information we can use.

How does a Computer work?

We must answer the following two questions:

1) How is data represented inside a computer? (Encoding)

2) How is data manipulated inside a computer? (Algorithms)

How is Data represented?

Encoding is the process of transforming information from one format into another. The opposite operation is called decoding. This is often used in many digital devices. (http://en.wikipedia.org/wiki/Encoding)

UPC (Universal Product Code)

Chinese Calligraphy

Encoding

http://www.unicode.org/charts/PDF/U2800.pdf

Encoding

http://en.wikipedia.org/wiki/DNA

http://en.wikipedia.org/wiki/DNA_sequencing

Encoding

http://en.wikipedia.org/wiki/Musical_notation

How is Data represented inside the Computer?

Remember, a computer is an electronic digital device that can store and process data.

Digital vs. Analog?

Analog systems have a continuous range of values. Examples: vinyl records, analog clocks, the set of real numbers.

Digital systems have a set of discrete values. Examples: CDs and DVDs, digital clocks, the set of integers.

How is information represented inside the Computer?

Binary digits or BITs

(0’s and 1’s)

Why Binary Digits?

How is information represented inside the Computer?

Digital computers are designed to process data in numerical form. They can store and manipulate information such as numbers, characters, images, and sound using numbers.

The information inside the computer is expressed in the binary system.

Binary numbers are made up of binary digits (bits), which are 0's and 1's (e.g. 0, 1, 110, 11, 1010, and 1011 are all binary numbers).

Binary digits are easily expressed in the computer circuitry by the presence or absence of voltage. For example 1 may mean 5 volts and 0 may mean 0 volts.

How is Data represented inside the Computer?

Bit (Binary digIT)

A bit is a single binary digit (0 or 1). It is the basic unit of storage in a computer.

A Byte is 8 Bits

KiloByte (KB) = 2^10 or 1024 bytes (Approximately 1,000 bytes)

MegaByte (MB) = 2^20 bytes (Approximately 1,000,000 bytes)

GigaByte (GB) = 2^30 bytes (Approximately 1,000,000,000 bytes)

TeraByte (TB) = 2^40 bytes (Approximately 1,000,000,000,000 bytes)

PetaByte (PB) = 2^50 bytes

Problem 1:

You just bought a 60 gigabyte drive. After formatting the drive, you found out that it is only 58.6 gigabytes. What should you do?

Solution 1:


Nothing!

Most drive manufacturers use a gigabyte to mean one billion bytes!

60,000,000,000 / (1024 * 1000 * 1000) = 58.59
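The arithmetic above can be checked with a short sketch. The variable names are illustrative only. Note that the slide divides by 1024 * 1000 * 1000; dividing by 2^30, the gigabyte as defined in the unit list of these notes, gives a smaller figure of about 55.9.

```python
# A minimal sketch of the drive-size arithmetic; all names are hypothetical.
ADVERTISED_BYTES = 60 * 10**9        # "60 GB" counted as 60 billion bytes

slide_divisor = 1024 * 1000 * 1000   # divisor used on the slide
binary_gigabyte = 2**30              # gigabyte as defined in the unit list

size_per_slide = ADVERTISED_BYTES / slide_divisor
size_in_binary_gb = ADVERTISED_BYTES / binary_gigabyte

print(f"slide calculation: {size_per_slide:.2f}")
print(f"using 2^30 bytes:  {size_in_binary_gb:.2f}")
```

Either way, the drive holds the advertised number of bytes; only the unit used to report it differs.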

Encoding

Given that computers only understand binary numbers, in order to store and manipulate information inside a computer, we must find a way to encode information in binary.

This information may be NUMBERS, TEXT, or other types of data such as AUDIO, IMAGE or VIDEO.

Encoding Text

Imagine our language was restricted to the following symbols (letters):

Source alphabet: {n, k, b, e, r, d, i}
Target alphabet: {0, 1}

Encoding:
n = 000
k = 001
b = 010
e = 011
r = 100
d = 101
i = 110

Decode: 101100110000001010011011100 = _________________
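The fixed-width code above can be sketched as a lookup table. The function names `encode` and `decode` are illustrative, not part of the notes. Because every letter uses exactly 3 bits, decoding only needs to cut the bit string into groups of 3.

```python
# A minimal sketch of the 3-bit code from the slide; function names are hypothetical.
CODE = {"n": "000", "k": "001", "b": "010", "e": "011",
        "r": "100", "d": "101", "i": "110"}
DECODE = {bits: letter for letter, bits in CODE.items()}

def encode(text):
    """Replace each source letter with its 3-bit code word."""
    return "".join(CODE[ch] for ch in text)

def decode(bits, width=3):
    """Cut the bit string into fixed-width groups and look each one up."""
    if len(bits) % width != 0:
        raise ValueError("bit string length must be a multiple of the code width")
    return "".join(DECODE[bits[i:i + width]] for i in range(0, len(bits), width))

print(encode("bird"))
print(decode(encode("bird")))
```

Calling `decode` on the bit string from the exercise gives the answer to the slide's question.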

Problem 2:

How many bits do we need in order to represent all the 26 upper case English letters?

How many bits do we need to represent the upper and lower case letters, plus all the digits, and the symbols (@, $, & and #)?
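One way to reason about Problem 2: n bits give 2^n distinct patterns, so we need the smallest n with 2^n at least as large as the number of symbols. A small sketch, with the hypothetical helper name `bits_needed`:

```python
def bits_needed(symbol_count):
    """Smallest number of bits n such that 2**n >= symbol_count."""
    if symbol_count < 1:
        raise ValueError("need at least one symbol")
    # (symbol_count - 1).bit_length() equals ceil(log2(symbol_count)) for counts >= 2
    return max(1, (symbol_count - 1).bit_length())

upper_case = 26
full_set = 26 + 26 + 10 + 4   # upper case, lower case, digits, and @ $ & #

print(bits_needed(upper_case))
print(bits_needed(full_set))
```

With 26 symbols, 4 bits (16 patterns) are too few and 5 bits (32 patterns) are enough; the 66-symbol set needs 7 bits.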

ASCII Code

American Standard Code for Information Interchange

Why? …. Standardization between computers

7 or 8 bits are used to represent all the letters, numbers, and symbols, that appear on the English language keyboard.

A = 01000001 = 65
B = 01000010 = 66
C = 01000011 = 67

http://www.asciitable.com/
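The ASCII values above can be reproduced with Python's built-in `ord` and `chr`. The helper name `ascii_bits` is illustrative.

```python
def ascii_bits(ch):
    """Return the 8-bit binary string for a single ASCII character."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not an ASCII character")
    return format(code, "08b")

for ch in "ABC":
    print(ch, "=", ascii_bits(ch), "=", ord(ch))

# Going the other way: bit pattern -> number -> character
print(chr(int("01000010", 2)))
```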

UNICODE

16 to 32 bits vs. the 8-bit ASCII code.

Why? … Internationalization of computers and applications.

Hello = U+0048 U+0065 U+006C U+006C U+006F

There are many UNICODE encoding standards. These include:

UTF-8 (encodes ASCII characters in 1 byte exactly as ASCII would, then accommodates other characters as sequences of 2 to 4 bytes)

UTF-16 (2 bytes for most characters and 4 bytes for the rest; its predecessor UCS-2 used a fixed 2-byte code)

UTF-32 or UCS-4 (4-byte code to store each code point or Unicode character)

It is important to know what encoding standard is being used before attempting to decode a string!
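A short sketch of why the encoding matters, using Python's built-in `str.encode` and `bytes.decode`. The same text produces different byte sequences under UTF-8, UTF-16, and UTF-32, and decoding bytes with the wrong standard yields garbage.

```python
text = "Hello"

# Code points, written the way the slide writes them
code_points = [f"U+{ord(ch):04X}" for ch in text]

# The same five characters under three encodings (big-endian, no byte-order mark)
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-be")
utf32 = text.encode("utf-32-be")

print(code_points)
print(len(utf8), len(utf16), len(utf32))   # bytes used by each encoding

# Decoding with the wrong standard: UTF-8 bytes read as Latin-1
original = "é"
garbled = original.encode("utf-8").decode("latin-1")
print(original, "->", garbled)
```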

Unicode Example:

Insert the following Unicode text into an HTML file and try to view it using a browser:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Unicode Characters</title>
</head>
<body>
hello. <BR>
Chinese: (循環效率) <BR>
Persian: (الفبای فارسی) <BR>
done.
</body>
</html>


Encoding Numbers:

How do we represent numbers?

Character “1” in the ASCII table is encoded as 00110001 (decimal 49)
Character “2” is 00110010 = 50
Character “3” is 00110011 = 51

Can we use the ASCII representation of numbers for the purpose of calculations? Can we add Character “1” and Character “2” to get “3”?
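A quick experiment answers the question. Adding the ASCII codes of "1" and "2" gives 99, which is the code for the letter "c", not the character "3". The variable names are illustrative.

```python
# Adding the character codes directly
code_sum = ord("1") + ord("2")      # 49 + 50
as_character = chr(code_sum)        # the character whose code is 99

# Converting the characters to numeric values first
numeric_sum = int("1") + int("2")

print(code_sum, repr(as_character))
print(numeric_sum)
```

The character codes must first be converted into a numeric encoding before arithmetic gives the expected result.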

Encoding Numbers:

If the ASCII representation of numbers cannot be used, then we need a different encoding to be able to represent numbers and perform calculations.

What is a suitable encoding?

Hold on to this idea. We’ll come back to it……

Number Systems

Decimal (Base 10)

Binary (Base 2)

Octal (Base 8)

Hexadecimal (Base 16)

Decimal (Base 10)

Used by humans (probably because we have 10 fingers!)

Digits in base 10 are (0, 1, 2, 3, 4, ..., 9) (always from 0 to Base - 1)

Example: (254)

What is 254?

2 * 10^2
5 * 10^1
4 * 10^0

254 = 200 + 50 + 4

Binary (Base 2)

Used by digital computers (remember the ON / OFF states)

Digits in base 2 are (0, 1) (always from 0 to Base - 1)

Example: (Binary 110)

What is Binary 110?

1 * 2^2
1 * 2^1
0 * 2^0

Binary 110 = 4 + 2 + 0 = 6
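The positional expansion works the same way in every base: each digit is multiplied by the base raised to the power of its position, counting from 0 at the right. A minimal sketch, with the hypothetical function name `positional_value`:

```python
def positional_value(digits, base):
    """Value of a digit list (most significant digit first) in the given base."""
    for d in digits:
        if not 0 <= d < base:
            raise ValueError(f"digit {d} is not valid in base {base}")
    # The rightmost digit has power 0, the next one power 1, and so on
    return sum(d * base**power for power, d in enumerate(reversed(digits)))

print(positional_value([2, 5, 4], 10))   # decimal 254
print(positional_value([1, 1, 0], 2))    # binary 110
```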

Octal (Base 8)

Used by people when they want to represent large binary numbers. It's easier to deal with.

Digits in base 8 are (0, 1, 2, ..., 7) (always from 0 to Base - 1)


Octal (Base 8)

Example: (Octal 251)

2 * 8^2
5 * 8^1
1 * 8^0

Octal 251 = 128 + 40 + 1 = 169

Octal 251 can be converted to binary very easily. Each octal digit is represented by 3 binary digits (bits).

010 101 001

Hexadecimal (Base 16)

Similar to Octal, Hex numbers are used by people when they want to represent larger binary numbers. It's easier to deal with.

Digits in base 16 are (0, 1, 2, ..., 9, A, B, C, D, E, F) (always from 0 to Base - 1)

To keep each digit as one character we use the letters “A” through “F” for the values 10 to 15. (A = 10, B = 11, C = 12, D = 13, E = 14, F = 15)


Hexadecimal (Base 16)

Example:

What is HEX 25A?

2 * 16^2
5 * 16^1
A * 16^0

HEX 25A = 512 + 80 + 10 = 602

HEX 25A can be converted to binary very easily. Each hex digit is represented by 4 binary digits (bits).

0010 0101 1010
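The digit-by-digit conversion for octal (3 bits per digit) and hex (4 bits per digit) can be sketched with one helper. The name `to_binary_groups` is illustrative.

```python
DIGITS = "0123456789ABCDEF"

def to_binary_groups(number_text, bits_per_digit):
    """Convert each octal (3 bits) or hex (4 bits) digit to its own bit group."""
    groups = []
    for ch in number_text.upper():
        value = DIGITS.index(ch)             # raises ValueError for unknown characters
        if value >= 2**bits_per_digit:
            raise ValueError(f"digit {ch} does not fit in {bits_per_digit} bits")
        groups.append(format(value, f"0{bits_per_digit}b"))
    return " ".join(groups)

print(to_binary_groups("251", 3))   # octal 251
print(to_binary_groups("25A", 4))   # hex 25A
```

This works because 8 = 2^3 and 16 = 2^4, so each octal or hex digit lines up exactly with a group of 3 or 4 bits.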

Different Ways of Representing Binary Numbers:

Unsigned Integers

Signed Magnitude

1's Complement

2's Complement

Unsigned Integers (non-negative numbers)

With k bits, we can represent 2^k non-negative integers, ranging from 0 to 2^k - 1.

Unsigned Integers

Representation

Signed Magnitude

With k bits, we can represent integers ranging from -(2^(k-1) - 1) to +(2^(k-1) - 1). The leftmost bit is a sign bit (0 = positive, 1 = negative). Because zero has two bit patterns (+0 and -0), the 2^k patterns encode 2^k - 1 distinct integers.

Signed Magnitude

Representation

1's Complement

With k bits, we can represent integers ranging from -(2^(k-1) - 1) to +(2^(k-1) - 1) (-15 to +15 for k = 5). Negative numbers are represented by taking the positive number and flipping all of its bits. Zero again has two bit patterns.

1's Complement

Representation

2's Complement

With k bits, we can represent 2^k integers ranging from -2^(k-1) to +(2^(k-1) - 1) (-16 to +15 for k = 5). Negative numbers are represented by taking the positive number, flipping all of its bits, and then adding 1.

2's Complement

Representation
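The three signed representations can be compared side by side with a small sketch. The function names are illustrative. The 2's complement helper uses the remainder modulo 2^k, which produces the same bit pattern as flipping the bits and adding 1.

```python
def sign_magnitude(n, k):
    """Sign bit followed by the magnitude in k-1 bits."""
    if abs(n) > 2**(k - 1) - 1:
        raise ValueError("out of range")
    return ("1" if n < 0 else "0") + format(abs(n), f"0{k - 1}b")

def ones_complement(n, k):
    """Positive pattern with every bit flipped for negative numbers."""
    if abs(n) > 2**(k - 1) - 1:
        raise ValueError("out of range")
    bits = format(abs(n), f"0{k}b")
    if n < 0:
        bits = "".join("1" if b == "0" else "0" for b in bits)
    return bits

def twos_complement(n, k):
    """Same pattern as flipping the bits of |n| and adding 1."""
    if not -(2**(k - 1)) <= n <= 2**(k - 1) - 1:
        raise ValueError("out of range")
    return format(n % 2**k, f"0{k}b")

for n in (5, -5, -15):
    print(n, sign_magnitude(n, 5), ones_complement(n, 5), twos_complement(n, 5))
```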

What are the advantages of one number system vs. another?

It is a lot easier to implement computer hardware that is able to calculate numbers in 2's complement.

Virtually all computers use the 2's complement number system to do binary arithmetic.

Binary Arithmetic (Using 2's complement)

Two binary numbers can be added, starting at the rightmost bit and adding the corresponding bits of the first number (addend) and the second number (augend).

If a carry is generated, it is carried one position to the left, just as in decimal arithmetic.

Examples:

5 + 4 = 9

  00101
+ 00100
-------
  01001

7 + 4 = 11

  00111
+ 00100
-------
  01011

7 + 7 = 14

  00111
+ 00111
-------
  01110

Overflow in 2's Complement

If the sum of two positive numbers carries into the leftmost bit (the sign bit), then an overflow has occurred and the sum becomes a negative number (incorrect).

15 + 15 = 30 (Note that this sum produces a negative number)

  01111
+ 01111
-------
  11110
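Overflow can be detected from the signs alone: if both operands have the same sign and the result has the other sign, the true sum did not fit. A sketch for 5-bit numbers, with the hypothetical function name `add_checked`:

```python
def add_checked(a, b, k=5):
    """Add two k-bit 2's complement values; return (bit pattern, value, overflow flag)."""
    raw = (a + b) % 2**k                             # keep only the low k bits
    value = raw - 2**k if raw >= 2**(k - 1) else raw  # reinterpret as signed
    same_sign_inputs = (a >= 0) == (b >= 0)
    sign_changed = (value >= 0) != (a >= 0)
    overflow = same_sign_inputs and sign_changed
    return format(raw, f"0{k}b"), value, overflow

print(add_checked(7, 7))     # fits in 5 bits
print(add_checked(15, 15))   # does not fit: the pattern reads as a negative number
```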

Adding Negative Numbers

In 2's complement arithmetic, a carry generated by the addition of the leftmost bits is simply thrown away.

7 + (-7) = 0 (2's complement)

  000111
+ 111001
--------
  000000

7 + (-6) = 1 (2's complement)

  000111
+ 111010
--------
  000001

(-6) + (-6) = (-12) (2's complement)

  111010
+ 111010
--------
  110100
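The three examples above can be reproduced with 6-bit arithmetic, where a bit mask plays the role of throwing away the carry out of the leftmost bit. The names `negate`, `add`, and `to_int` are illustrative.

```python
K = 6
MASK = 2**K - 1          # 111111: keeps only the low 6 bits

def negate(pattern):
    """2's complement negation: flip all bits, then add 1."""
    return ((pattern ^ MASK) + 1) & MASK

def add(x, y):
    """Add two 6-bit patterns; the mask discards any carry out of the leftmost bit."""
    return (x + y) & MASK

def to_int(pattern):
    """Interpret a 6-bit pattern as a signed 2's complement value."""
    return pattern - 2**K if pattern & (1 << (K - 1)) else pattern

seven, minus_seven, minus_six = 0b000111, negate(0b000111), negate(0b000110)

for x, y in ((seven, minus_seven), (seven, minus_six), (minus_six, minus_six)):
    total = add(x, y)
    print(format(x, "06b"), "+", format(y, "06b"), "=", format(total, "06b"), "=", to_int(total))
```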

Hardware Circuitry for Representing and Manipulating Information

Gates and Circuits:

The NOT circuit

The AND Gate

The OR Gate

The XOR Gate

Gates and Circuits:

The NOT Gate

Gates and Circuits:

The AND Gate

Gates and Circuits:

The OR Gate

Gates and Circuits:

The XOR Gate

Other Circuits:

Gates and Circuits:

The OR Gate

The OR gate can also be implemented as an AND gate with a few NOT gates:

A OR B = NOT( (NOT A) AND (NOT B) )
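The identity can be checked by listing all four input combinations. The gate functions below are a simple model using the integers 0 and 1; the names are illustrative.

```python
from itertools import product

def NOT(a):
    return 1 - a

def AND(a, b):
    return a & b

def OR(a, b):
    return a | b

def or_from_and_not(a, b):
    """A OR B built only from AND and NOT gates."""
    return NOT(AND(NOT(a), NOT(b)))

print("A B | OR | NOT((NOT A) AND (NOT B))")
for a, b in product((0, 1), repeat=2):
    print(a, b, "|", OR(a, b), " |", or_from_and_not(a, b))
```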

Other Circuits:

Gates and Circuits: The NOR Gate

Other Circuits:

Gates and Circuits:

The XNOR Gate

Other Circuits:

Gates and Circuits:

The XOR Gate

XOR gate made with 4 NAND gates
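A sketch of one common way to wire XOR from four NAND gates. The slide's diagram is not reproduced in this transcript, so the wiring below is an assumption (the standard textbook arrangement), and the function names are illustrative.

```python
def NAND(a, b):
    return 1 - (a & b)

def xor_from_nand(a, b):
    """XOR built from exactly four NAND gates."""
    n1 = NAND(a, b)      # shared first gate
    n2 = NAND(a, n1)
    n3 = NAND(b, n1)
    return NAND(n2, n3)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_from_nand(a, b))
```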

Building an ADDer Circuit

We proceed from the rightmost (least significant) bit position to the leftmost (most significant) bit position. In each position, we add three binary digits A, B, and Cin, and as a result we get two binary digits S (Sum) and Cout.

A and B are the bits from the two numbers we want to add. Cin is the "carry-in" from the previous bit position, and Cout is the "carry-out" to the next bit position.

  1 1 1 1 0 0 0 0  <--- Cin
  0 1 0 1 1 1 0 1  <--- A
+ 0 0 1 1 1 0 1 0  <--- B
------------------------
  1 0 0 1 0 1 1 1  <--- S
  0 1 1 1 1 0 0 0  <--- Cout

Building an ADDer Circuit using an XOR and an AND Circuit

Half Adder

A full adder
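A half adder and a full adder can be modeled with the XOR, AND, and OR operations, and chaining full adders from right to left gives a ripple-carry adder. This is a software sketch of the idea, not the circuit from the slides; the function names are illustrative.

```python
def half_adder(a, b):
    """Sum is XOR of the inputs, carry is AND of the inputs."""
    return a ^ b, a & b

def full_adder(a, b, cin):
    """Two half adders plus an OR gate to combine the two carries."""
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, cin)
    return s, c1 | c2

def ripple_add(a_bits, b_bits):
    """Add two equal-length bit lists (most significant bit first)."""
    carry = 0
    sum_bits = []
    for a, b in zip(reversed(a_bits), reversed(b_bits)):   # rightmost bit first
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    sum_bits.reverse()
    return sum_bits, carry

A = [0, 1, 0, 1, 1, 1, 0, 1]
B = [0, 0, 1, 1, 1, 0, 1, 0]
S, carry_out = ripple_add(A, B)
print(S, carry_out)
```

With the A and B rows from the worked example, the result matches the S row.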

Representing Real Numbers

How do we represent real numbers such as 2/3 or PI, which may have infinite repeating or non-repeating digit sequences?

We approximate using Floating Point representation. Conceptually this representation is very similar to scientific notation. For example 3456 = 3.456 * 10^3

Floating point numbers are generally allocated as either 32 or 64 bits. These bits are divided into 3 parts:

Sign bit

Exponent

Fraction

Floating point number = (+/-) (1 + Fraction) x 2^(Exponent - Bias)

Representing Real Numbers

Sign Bit:

The sign bit is the very first bit of the floating point number and determines whether the number is positive or negative. 0 = positive, 1 = negative

Exponent:

The exponent consists of the next 8 (32bFP) or 11 (64bFP) bits. The bias is a fixed number:

127 for 32bFP 1023 for 64bFP

The Fraction:

The last 23 (32bFP) or 52 (64bFP) bits are the fraction. This is an unsigned binary string that represents the binary places to the right of the binary point and is therefore a value between 0 and 1.

Sign (1 bit)    Exponent (8 bits)    Fraction (23 bits)
0 or 1          0000 0000            000 00000 00000 00000 00000

Example: Interpreting Floating Point Numbers

0 0000 0000 11000000000000000000000

This fraction represents 1 * 2^-1 + 1 * 2^-2

Therefore, our fraction value is 1/2 + 1/4 or .75

All floating point fractions are expressed in powers of 2

(+/-) (1 + Fraction) x 2^(Exponent - Bias)


0 10000001 101 00000 00000 00000 00000

Sign bit: 0 = positive
Exponent: 129
Bias: 127
Fraction: 1 * 2^-1 + 0 * 2^-2 + 1 * 2^-3 = 1/2 + 1/8 = .5 + .125 = .625


(+/-) (1 + Fraction) x 2^(Exponent - Bias)

+ (1.625) x 2^(129 - 127) = 1.625 x 2^2 = 1.625 x 4 = 6.5
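The interpretation steps can be sketched as a small decoder for 32-bit patterns. The function name `decode_float32` is illustrative. The formula applies to normalized values, that is, when the exponent field is neither all 0's nor all 1's; the sketch rejects the other cases rather than handle them. The standard library's `struct` module is used as a cross-check.

```python
import struct

def decode_float32(bit_text):
    """Apply (+/-) (1 + Fraction) x 2^(Exponent - 127) to a 32-bit pattern."""
    bits = bit_text.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("expected 32 bits")
    sign = -1 if bits[0] == "1" else 1
    exponent = int(bits[1:9], 2)
    if exponent in (0, 255):
        raise ValueError("formula only covers normalized numbers")
    # Fraction bit i (counting from 1) contributes 2^-i
    fraction = sum(int(b) * 2**-(i + 1) for i, b in enumerate(bits[9:]))
    return sign * (1 + fraction) * 2**(exponent - 127)

pattern = "0 10000001 101 00000 00000 00000 00000"
value = decode_float32(pattern)

# Cross-check against the machine's own interpretation of the same 4 bytes
raw = int(pattern.replace(" ", ""), 2).to_bytes(4, "big")
check = struct.unpack(">f", raw)[0]

print(value, check)
```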


Encoding Images

The image is represented as a 2D array and each color is represented as a number.
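A tiny illustration of the idea, assuming a hypothetical two-color palette where 0 means black and 1 means white. Real image formats use more bits per pixel, but the structure is the same: rows of numbers.

```python
# A 4x4 image as a 2D array; the palette (0 = black, 1 = white) is an assumption.
image = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]

height = len(image)
width = len(image[0])
bits_per_pixel = 1                      # two colors need only one bit each
size_in_bits = width * height * bits_per_pixel

print(width, "x", height, "pixels,", size_in_bits, "bits")
print("pixel at row 0, column 1:", image[0][1])
```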

Encoding Sound http://www.school-for-champions.com/science/sound.htm

Encoding Sound

In order to encode sound, we have to sample the wave.

Encoding of the above sound: 1,3,3,1,-2,-4,-2,1, 3,3,1, -2,-4,-2,1
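Sampling can be sketched by measuring a wave at regular intervals and rounding each measurement to an integer. The wave below is a plain sine wave chosen for illustration, not the wave from the slide, and the function name `sample_wave` and its parameters are assumptions.

```python
import math

def sample_wave(frequency_hz, sample_rate_hz, sample_count, amplitude):
    """Measure a sine wave at regular intervals and round each reading to an integer."""
    samples = []
    for i in range(sample_count):
        t = i / sample_rate_hz                                   # time of this sample
        reading = amplitude * math.sin(2 * math.pi * frequency_hz * t)
        samples.append(round(reading))
    return samples

# One full cycle of a 1 Hz wave, sampled 8 times per second
samples = sample_wave(frequency_hz=1, sample_rate_hz=8, sample_count=8, amplitude=4)
print(samples)
```

A higher sample rate and more levels per sample follow the original wave more closely, at the cost of more data.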

Encoding Video

Video can be encoded by combining images (typically 30 frames per second) plus one or more channels of sound.

Problem 1: (Video recording)

We want to record a 4 minute 400x400 video. Assume each frame is stored as a .bmp image with a 54 byte header, each pixel is 16 bits (64K colors), and the frame rate is 10 frames/sec. Audio is stereo, with 16 bit samples taken at 40 kHz.

What is the total size of the file?


Solution:


Step 1: Calculate Frame size
Frame size = Header + Image resolution * Size of each pixel
Header: 54 B
Pixels: 400 * 400 = 160,000 pixels
Each pixel (color): 2 B
Pixel data per frame = 160,000 * 2 B = 320,000 B
Total size for each image: 54 B + 320,000 B = 320,054 B
Approx: 320 KB

Step 2: Calculate Video size
Video size = Frame size * Frame rate * Duration
Video size = 320 KB * 10/sec * 60 sec/min * 4 min = 320 KB * 2400 = 768,000 KB

Step 3: Calculate Audio size
Audio size = #Channels * Sample size * Sample rate * Duration
Audio size = 2 * 16 b * 40,000/sec * (60 sec/min * 4 min) = 307,200,000 b
307,200,000 b * (1 B / 8 b) = 38,400,000 B
38,400,000 B * (1 KB / 1024 B) = 37,500 KB

Step 4: Video + Audio
File size = 768,000 KB + 37,500 KB = 805,500 KB = 805 MB (approx)
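The same calculation can be carried out in exact bytes. The constant names are illustrative. The slide rounds each frame to 320 KB and mixes factors of 1000 and 1024, so the exact byte count below differs slightly from its 805 MB approximation.

```python
# Values taken from the problem statement
HEADER_BYTES = 54
WIDTH, HEIGHT = 400, 400
BYTES_PER_PIXEL = 2            # 16 bits per pixel
FRAME_RATE = 10                # frames per second
DURATION_S = 4 * 60            # 4 minutes
CHANNELS = 2                   # stereo
BITS_PER_SAMPLE = 16
SAMPLE_RATE = 40_000           # samples per second

frame_bytes = HEADER_BYTES + WIDTH * HEIGHT * BYTES_PER_PIXEL
video_bytes = frame_bytes * FRAME_RATE * DURATION_S
audio_bytes = CHANNELS * BITS_PER_SAMPLE * SAMPLE_RATE * DURATION_S // 8
total_bytes = video_bytes + audio_bytes

print("frame:", frame_bytes, "B")
print("video:", video_bytes, "B")
print("audio:", audio_bytes, "B")
print("total:", total_bytes, "B =", round(total_bytes / 10**6, 1), "million bytes")
```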
