Date post: 05-Jul-2019
Upload: lephuc
Summary of Computer Architecture

CHAP 1: INTRODUCTION

Summary

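Structure – Top Level

[Figure: top-level structure of a computer. The computer consists of the Central Processing Unit, Main Memory, Input/Output and the Systems Interconnection; peripherals and communication lines attach from outside.]

Structure - CPU

[Figure: structure of the CPU. The CPU consists of the Arithmetic and Logic Unit, the Control Unit, the Registers and the Internal CPU Interconnection; it connects to Memory and I/O through the System Bus.]

CPU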

• CPU – controls the operation of the computer

• Components of CPU

– Control Unit – controls the operation of the CPU

– Arithmetic Logic Unit (ALU) – performs data processing functions, e.g. calculation

– Registers – provide storage internal to the CPU

– Internal CPU Interconnection – provides communication between the control unit, registers and ALU

Structure - Control Unit

[Figure: structure of the control unit. Inside the CPU (ALU, Registers, Internal Bus, Control Unit), the Control Unit consists of Sequencing Logic, Control Unit Registers and Decoders, and Control Memory.]

CHAP 2: BUS

Summary

• Bus system

• Expansion slots (PCI, PCIe, …)

Fakulti Sains Komputer dan Technology Maklumat (FSKTM), UTHM

Function of Control Unit

• For each operation a unique code is provided

—e.g. ADD, MOVE

• A hardware segment accepts the code and issues the control signals

• We have a computer!

9 BIT20303-Computer Architecture


Components

• The Control Unit and the Arithmetic and Logic Unit (ALU) constitute the Central Processing Unit (CPU)

• Data and instructions need to get into the system and results out

—Input/output

• Temporary storage of code and results is needed

—Main memory


Computer Components: Top Level View

How Is an Instruction Executed?

• What is an instruction?

– An instruction specifies the action that the processor is supposed to take.

• The processing required for a single instruction is called an instruction cycle.

• An instruction cycle is made up of two steps:

– Fetch (the processor reads the instruction from memory; also referred to as the fetch cycle)

– Execute (also referred to as the execute cycle)


Fetch Cycle

• Program Counter (PC) holds address of next instruction to fetch

• Processor fetches instruction from memory location pointed to by PC

• Increment PC

—Unless told otherwise

• Instruction loaded into Instruction Register (IR)

• Processor interprets instruction and performs required actions


Execute Cycle

• An instruction’s execution (execute cycle) may involve one or a combination of these actions

—Processor-memory

– Data transfer between CPU and main memory

—Processor I/O

– Data transfer between CPU and I/O module

—Data processing

– Some arithmetic or logical operation on data

—Control

– Alteration of the sequence of operations


Instruction Format

• Assume both instructions and data are 16 bits (2 bytes) long.

• The instruction format provides 4 bits for the opcode, so there can be as many as 2^4 = 16 different opcodes. The remaining 12 bits hold the address, so up to 2^12 = 4096 words of memory can be directly addressed.
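The split between opcode and address can be sketched in a few lines of Python. This is only an illustration of the 4-bit opcode / 12-bit address layout described above; the names OPCODE_BITS, ADDRESS_BITS and decode are assumptions made for the example.

```python
OPCODE_BITS = 4    # upper 4 bits of the 16-bit instruction word
ADDRESS_BITS = 12  # lower 12 bits of the 16-bit instruction word


def decode(word):
    """Split a 16-bit instruction word into (opcode, address)."""
    opcode = (word >> ADDRESS_BITS) & ((1 << OPCODE_BITS) - 1)
    address = word & ((1 << ADDRESS_BITS) - 1)
    return opcode, address


print(2 ** OPCODE_BITS)   # number of distinct opcodes
print(2 ** ADDRESS_BITS)  # number of directly addressable words

opcode, address = decode(0x1940)
print(hex(opcode), hex(address))
```

Each hexadecimal digit is 4 bits, so the first hex digit of an instruction word is the opcode and the remaining three digits are the address.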

[Figure: instruction format and integer format]

What Are Word, Half-Word and Double Word?

• A "word," in computing, is a standard memory size used for data storage. The most popular word sizes for modern computers are 16, 32, and 64 bits.

• Some systems or programming languages do not declare specific sizes for variables and use "word," "half-word" and "double word" to describe how much storage space you are allocating.

• This means that if you have a system with a 32 bit word size, and you declare a double word integer, you have declared a 64 bit integer.


Example of Program Execution

Internal CPU Registers

• PC (Program Counter)

• AC (Accumulator) – a data register

• IR (Instruction Register)

Program to be executed: adds the content of the memory word at address 940 to the content of the memory word at address 941 and stores the result in the latter location. (Assume a word = 16 bits / 2 bytes.)

Example of Program Execution (cont.)

Requires 3 fetch and 3 execute cycles.

NOTE: The numbers used in this example are in hexadecimal, e.g. 0x1940.

1. {1st Fetch cycle} The PC contains 300, the address of the first instruction. This instruction (the value 1940 in hexadecimal) is loaded into the instruction register (IR) and the PC is incremented. Note that this process involves the use of a memory address register (MAR) and a memory buffer register (MBR). For simplicity, these intermediate registers are ignored.

2. {1st Execute cycle} The first 4 bits (first hexadecimal digit) in the IR indicate that the AC is to be loaded. The remaining 12 bits (3 hexadecimal digits) specify the address (940) from which the data are to be loaded.

3. {2nd Fetch cycle} The next instruction (5941) is fetched from location 301 and the PC is incremented.

4. {2nd Execute cycle} The old content of the AC and the content of location 941 are added and the result is stored in the AC.

5. {3rd Fetch cycle} The next instruction (2941) is fetched from location 302 and the PC is incremented.

6. {3rd Execute cycle} The content of the AC is stored in location 941.
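The six steps above can be replayed with a small simulation. This is a minimal sketch of the hypothetical machine, not a real instruction set: the opcode meanings (1 = load AC, 5 = add to AC, 2 = store AC) are taken from the walkthrough, while the data values placed at addresses 940 and 941 (3 and 2) and the function name run are assumptions made for the example.

```python
LOAD_AC, STORE_AC, ADD_TO_AC = 0x1, 0x2, 0x5  # opcodes used in the walkthrough


def run(memory, pc, instruction_count):
    """Run fetch/execute cycles; return the final PC and AC."""
    ac = 0
    for _ in range(instruction_count):
        # Fetch cycle: read the instruction at PC into IR, then increment PC.
        ir = memory[pc]
        pc += 1
        # Execute cycle: the top 4 bits are the opcode, the low 12 bits the address.
        opcode, address = ir >> 12, ir & 0x0FFF
        if opcode == LOAD_AC:
            ac = memory[address]
        elif opcode == ADD_TO_AC:
            ac = (ac + memory[address]) & 0xFFFF  # keep the result in 16 bits
        elif opcode == STORE_AC:
            memory[address] = ac
        else:
            raise ValueError(f"unknown opcode {opcode:#x}")
    return pc, ac


memory = {
    0x300: 0x1940,  # load AC from 940
    0x301: 0x5941,  # add the content of 941 to AC
    0x302: 0x2941,  # store AC to 941
    0x940: 0x0003,  # assumed data value
    0x941: 0x0002,  # assumed data value
}
pc, ac = run(memory, pc=0x300, instruction_count=3)
print(hex(pc), hex(ac), hex(memory[0x941]))
```

With the assumed data, location 941 ends up holding 3 + 2 = 5 and the PC points at 303, the address after the last instruction.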


CHAP 3: MEMORY

Summary

Location

• Inside CPU (e.g. Registers)

• Internal (inside the computer e.g. RAM, Level 1 or L1 cache, L2 cache, L3 cache)

• External (outside of the computer e.g. Hard disks, SSD, removable drives)

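A Modern Memory Hierarchy

• Register File: 32 words, sub-nsec

• L1 cache: ~32 KB, ~nsec

• L2 cache: 512 KB ~ 1 MB, many nsec

• L3 cache, ...

• Main memory (DRAM): GB, ~100 nsec

• Swap Disk: 100 GB, ~10 msec

How data moves between the levels:

• Between the register file and the caches: manual/compiler register spilling

• Between the caches and main memory: automatic HW cache management

• Between main memory and the swap disk: automatic demand paging

Together, the levels implement the memory abstraction.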

How to access memory location?

• Random (e.g. RAM) – individual addresses identify locations exactly

• Direct (e.g. hard disk) – each block has a unique address; access is by jumping to a specific block plus a sequential search within it

• Associative (e.g. cache) – data is retrieved based on a portion of its contents rather than its address

• Sequential (e.g. tape) – start from the beginning of the tape; access time depends on the location of the data and the previous location

RAM

• Two types

– Static RAM (SRAM)

– Dynamic RAM (DRAM)

Memory Technology: DRAM

• Dynamic random access memory

• Capacitor charge state indicates stored value

– Whether the capacitor is charged or discharged indicates storage of 1 or 0

– 1 capacitor

– 1 access transistor

• Capacitor leaks through the RC path

– DRAM cell loses charge over time

– DRAM cell needs to be refreshed

[Figure: DRAM cell, with row enable and _bitline]

Memory Technology: SRAM

• Static random access memory

• Two cross coupled inverters store a single bit

– Feedback path enables the stored value to persist in the “cell”

– 4 transistors for storage

– 2 transistors for access

[Figure: SRAM cell, with row select, bitline and _bitline]

Memory Hierarchy

• Fundamental tradeoff

– Fast memory: small

– Large memory: slow

• Idea: Memory hierarchy

• Latency, cost, size, bandwidth

[Figure: CPU with register file (RF), Cache, Main Memory (DRAM), Hard Disk]

Caching Basics: Exploit Temporal Locality

• Idea: Store recently accessed data in automatically managed fast memory (called cache)

• Anticipation: the data will be accessed again soon

• Temporal locality principle

– Recently accessed data will be accessed again in the near future

– This is what Maurice Wilkes had in mind:

• Wilkes, “Slave Memories and Dynamic Storage Allocation,” IEEE Trans. On Electronic Computers, 1965.

• “The use is discussed of a fast core memory of, say 32000 words as a slave to a slower core memory of, say, one million words in such a way that in practical cases the effective access time is nearer that of the fast memory than that of the slow memory.”
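The idea of an effective access time that sits close to the fast memory can be illustrated with a weighted average. This is a minimal sketch under a simplifying assumption: a hit costs the fast-memory time and a miss costs the slow-memory time. The function name, the hit rates and the 1 ns / 100 ns timings are illustrative assumptions, not figures from the paper.

```python
def effective_access_time(hit_rate, fast_ns, slow_ns):
    """Average access time when a fraction hit_rate of accesses hit the fast memory."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * fast_ns + (1.0 - hit_rate) * slow_ns


for rate in (0.50, 0.95, 0.99):
    time_ns = effective_access_time(rate, fast_ns=1.0, slow_ns=100.0)
    print(f"hit rate {rate:.2f}: {time_ns:.2f} ns")
```

The higher the hit rate, the closer the average gets to the fast memory's access time, which is why temporal locality matters.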

Caching Basics: Exploit Spatial Locality

• Idea: Store addresses adjacent to the recently accessed one in automatically managed fast memory

– Logically divide memory into equal size blocks

– Fetch to cache the accessed block in its entirety

• Anticipation: nearby data will be accessed soon

• Spatial locality principle

– Nearby data in memory will be accessed in the near future

• E.g., sequential instruction access, array traversal

– This is what IBM 360/85 implemented

• 16 Kbyte cache with 64 byte blocks

• Liptay, “Structural aspects of the System/360 Model 85 II: the cache,” IBM Systems Journal, 1968.

The Bookshelf Analogy

• Book in your hand

• Desk

• Bookshelf

• Boxes at home

• Boxes in storage

• Recently-used books tend to stay on desk

– Comp Arch books, books for classes you are currently taking

– Until the desk gets full

• Adjacent books in the shelf needed around the same time

– If I have organized/categorized my books well in the shelf

Cache

• Cache hits vs. Cache misses

• Cache types

– Direct-mapped cache

– Set-associative cache
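Cache hits and misses in a direct-mapped cache can be counted with a short simulation. This is a minimal sketch that tracks only tags, not data; the function name and the sizes (4 lines, 4 addresses per block) are assumptions chosen to keep the example small.

```python
def simulate_direct_mapped(addresses, num_lines=4, block_size=4):
    """Count hits and misses for a sequence of addresses in a direct-mapped cache."""
    lines = [None] * num_lines  # each line remembers the tag of the block it holds
    hits = misses = 0
    for address in addresses:
        block = address // block_size  # which memory block the address belongs to
        index = block % num_lines      # the single line this block may occupy
        tag = block // num_lines       # distinguishes blocks that share a line
        if lines[index] == tag:
            hits += 1
        else:
            misses += 1
            lines[index] = tag         # fetch the whole block into the line
    return hits, misses


# Sequential access: one miss per block, then hits on its neighbours.
print(simulate_direct_mapped(range(16)))

# Addresses 0 and 16 map to the same line and keep evicting each other.
print(simulate_direct_mapped([0, 16, 0, 16]))
```

The second access pattern shows the weakness of direct mapping: two blocks that share a line cannot be cached at the same time. A set-associative cache reduces this by allowing several blocks per index.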

CHAP 4: INPUT OUTPUT

Summary

Input/Output Problems

• Wide variety of peripherals

– Delivering different amounts of data

– At different speeds

– In different formats

• All slower than CPU and RAM

• Need I/O modules


Input/Output Module

• Interface to CPU and Memory

• Interface to one or more peripherals


Generic Model of I/O Module


External Devices

• Human readable

– Screen, printer, keyboard

• Machine readable

– Monitoring and control

• Communication

– Modem

– Network Interface Card (NIC)


External Device Block Diagram

• Control signals determine the function that the device will perform, such as sending data to the I/O module (INPUT or READ) or accepting data from the I/O module (OUTPUT or WRITE).

• Status signals indicate the state of the device, e.g. busy or idle.

• Data are transferred to or from the I/O module, in the direction set by the control signal (READ or WRITE).

• The buffer temporarily holds the data being transferred between the I/O module and the external environment.

I/O Module Functions

• Control & Timing

• CPU Communication

• Device Communication

• Data Buffering

• Error Detection


Three Techniques for Input of a Block of Data


What are the differences between these techniques?


Programmed I/O

• CPU has direct control over I/O

– Sensing status

– Read/write commands

– Transferring data

• CPU waits for I/O module to complete operation

• Wastes CPU time


Programmed I/O - detail

• CPU requests I/O operation

• I/O module performs operation

• I/O module sets status bits

• CPU checks status bits periodically

• I/O module does not inform CPU directly

• I/O module does not interrupt CPU

• CPU may wait or come back later
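The polling behaviour described above can be sketched as a busy-wait loop. This is only an illustration: FakeDevice is a hypothetical stand-in for an I/O module, and its method names are assumptions made for the example rather than any real device interface.

```python
class FakeDevice:
    """Hypothetical I/O module that becomes ready after a number of status checks."""

    def __init__(self, data, delay):
        self._data = data
        self._remaining = delay

    def status_ready(self):
        # The module only sets its status bit; it never notifies the CPU.
        if self._remaining > 0:
            self._remaining -= 1
            return False
        return True

    def read(self):
        return self._data


def programmed_read(device):
    """Poll the status bit until the device is ready, then read the data."""
    wasted_polls = 0
    while not device.status_ready():
        wasted_polls += 1  # CPU time spent checking instead of doing useful work
    return device.read(), wasted_polls


value, wasted_polls = programmed_read(FakeDevice(data=0x41, delay=5))
print(hex(value), wasted_polls)
```

The count of wasted polls is the cost that interrupt-driven I/O and DMA are designed to avoid.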


Interrupt-Driven I/O


Interrupt Driven I/O Basic Operation

• CPU issues read command

• I/O module gets data from peripheral whilst CPU does other work

• I/O module interrupts CPU

• CPU requests data

• I/O module transfers data


Simple Interrupt Processing


Direct Memory Access (DMA)


DMA

• Interrupt driven and programmed I/O require active CPU intervention

– Transfer rate is limited

– CPU is tied up

• DMA is the answer


DMA Operation

• CPU tells the DMA controller:

– Read/Write

– Device address

– Starting address of memory block for data

– Amount of data to be transferred

• CPU carries on with other work

• DMA controller deals with transfer

• DMA controller sends interrupt when finished
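The hand-off from CPU to DMA controller can be modelled roughly as follows. This is a sketch of a device-to-memory read only; DMAController, its method and its interrupt flag are hypothetical names for the example. Python runs the transfer sequentially, so the model shows what information is handed over and what the controller does, not the CPU working in parallel.

```python
class DMAController:
    """Hypothetical DMA controller model; all names are illustrative."""

    def __init__(self, memory):
        self.memory = memory
        self.interrupt_raised = False

    def transfer(self, device_data, start_address, count):
        # The arguments mirror what the CPU tells the controller: the device,
        # the starting address of the memory block, and the amount of data.
        for offset in range(count):
            self.memory[start_address + offset] = device_data[offset]  # one word at a time
        self.interrupt_raised = True  # tell the CPU that the transfer has finished


memory = [0] * 16
dma = DMAController(memory)
dma.transfer(device_data=[10, 20, 30], start_address=4, count=3)
print(memory[4:7], dma.interrupt_raised)
```

The CPU is involved only at the start, to set up the transfer, and at the end, when the interrupt arrives.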


DMA Transfer Cycle Stealing

• DMA controller takes over bus for a cycle

• Transfer of one word of data

• Not an interrupt – CPU does not switch context

• CPU suspended just before it accesses bus

– i.e. before an operand or data fetch or a data write

• Slows down CPU but not as much as CPU doing transfer


CHAP 5: COMPUTER ARITHMETIC

Summary

Unsigned Integer

• 0101 + 0010 = (4 + 1) + 2 = 7

    0101
  + 0010
  ------
    0111

• 0101 1010 + 0001 0001 = 90 + 17 = 107

    0101 1010
  + 0001 0001
  -----------
    0110 1011

Unsigned Integer

• 0101 x 0110 = 5 x 6 = 30

     0101
   x 0110
  -------
     0000
    0101
   0101
+ 0000
  -------
  0011110
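The unsigned examples can be checked in Python, which accepts binary literals directly. The multiplication below is a minimal sketch of the shift-and-add method used in long multiplication; the function name shift_and_add is an assumption made for the example.

```python
def shift_and_add(multiplicand, multiplier):
    """Multiply two unsigned integers by adding shifted copies of the multiplicand."""
    product = 0
    shift = 0
    while multiplier:
        if multiplier & 1:                    # this multiplier bit is 1,
            product += multiplicand << shift  # so add a shifted partial product
        multiplier >>= 1
        shift += 1
    return product


print(format(0b0101 + 0b0010, "04b"))
print(format(0b01011010 + 0b00010001, "08b"))
print(format(shift_and_add(0b0101, 0b0110), "07b"))
```

Each 1 bit in the multiplier contributes one copy of the multiplicand, shifted left by that bit's position; each 0 bit contributes a row of zeros.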

Signed Integers (2’s Complement)

• For 7 bits: minimum value = 1000000 = -64; maximum value = 0111111 = 63

• To negate a number: reverse every bit (REVERSE BIT), then add 1 (PLUS 1)

OVERFLOW RULE: If two numbers are added, and they are both positive or both negative, then OVERFLOW occurs if and only if the result has the opposite sign.
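The negation procedure and the overflow rule can be expressed with bit masks. This is a minimal sketch for a fixed bit width (7 bits, matching the minimum and maximum values above); the function names are assumptions made for the example.

```python
def to_signed(value, bits):
    """Interpret a bit pattern as a 2's complement signed integer."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def negate(value, bits):
    """Reverse every bit, then add 1, keeping the result within the bit width."""
    mask = (1 << bits) - 1
    return ((value ^ mask) + 1) & mask


def add_with_overflow(a, b, bits):
    """Add two bit patterns; overflow if equal operand signs give a different result sign."""
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    result = (a + b) & mask
    same_sign = (a & sign_bit) == (b & sign_bit)
    overflow = same_sign and (result & sign_bit) != (a & sign_bit)
    return result, overflow


print(to_signed(0b1000000, 7), to_signed(0b0111111, 7))
print(format(negate(0b0000101, 7), "07b"))
print(add_with_overflow(0b0111111, 0b0000001, 7))
```

Adding 1 to the maximum value 63 wraps around to the pattern 1000000, whose sign differs from both operands, so the rule reports overflow.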

Fixed Point

0010.1010 = 2^1 + 2^-1 + 2^-3

= 2 + (1/2) + (1/8) = 2 + 0.5 + 0.125 = 2.625
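The positional weights can be checked with a short Python function. This is a minimal sketch that assumes an unsigned value written as a string with a binary point; the function name fixed_point_value is an assumption made for the example.

```python
def fixed_point_value(bits):
    """Evaluate an unsigned binary string with a binary point, e.g. '0010.1010'."""
    integer_part, fraction_part = bits.split(".")
    value = float(int(integer_part, 2))
    for position, bit in enumerate(fraction_part, start=1):
        if bit == "1":
            value += 2.0 ** -position  # weights 1/2, 1/4, 1/8, ...
    return value


print(fixed_point_value("0010.1010"))
```

The bits after the binary point carry the weights 1/2, 1/4, 1/8 and 1/16, so the pattern .1010 contributes 1/2 + 1/8 = 0.625.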

Single-Precision Floating Point

FORMULA: Sign (1 bit) . Exponent (3 bits) . Significand (4 bits)

ANSWER: 1.125 x 0.5 = 0.5625

Note: Bias = 3, thus exponent = -1 (010 is 2, and 2 – 3 = -1); 1.001 = 1 + (1/8) = 1 + 0.125 = 1.125

Single Precision Floating Point

• 0 010 0010 (8 bit)

• Sign = 0

• Exponent = 010 – bias = 2 – 3 = -1

• Significand = 0010 = 2^-3 = (1/8) = 0.125

• (-1)^sign x 1.significand x 2^(exponent – bias)

= (-1)^0 x 1.0010 x 2^-1

= 1 x (1 + 0.125) x (1/2) = 1.125 x 0.5 = 0.5625

• 1 01111110 00100000000 000000000000 (32 bit)

• (-1)^sign x 1.significand x 2^(exponent – bias)

= (-1)^1 x 1.0010 x 2^(126 – 127)

= -1 x (1 + 0.125) x 2^-1

= -1.125 x 0.5 = -0.5625

NOTE: For 8 bit, bias = 3 (exponents -3 to 4); for 32 bit, bias = 127 (exponents -127 to 128)

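• 3-bit exponent

• Bias = 011 = 3

• Stored 000 means 0 – 3 = -3; stored 111 means 7 – 3 = 4

• 8-bit exponent

• Bias = 0111 1111 = 127

• Stored 0000 0000 means 0 – 127 = -127; stored 1111 1111 means 255 – 127 = 128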
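The formula (-1)^sign x 1.significand x 2^(exponent – bias) can be applied mechanically to both bit patterns used in this chapter. This is a minimal sketch that uses the biases given in the note (3 for a 3-bit exponent, 127 for an 8-bit exponent) and ignores special cases such as zero, denormals, infinity and NaN; the function name decode_float is an assumption made for the example.

```python
def decode_float(bits, exponent_bits, significand_bits):
    """Decode a sign/exponent/significand bit string using a biased exponent."""
    bits = bits.replace(" ", "")
    if len(bits) != 1 + exponent_bits + significand_bits:
        raise ValueError("bit string does not match the field widths")
    sign = int(bits[0])
    stored_exponent = int(bits[1:1 + exponent_bits], 2)
    fraction = int(bits[1 + exponent_bits:], 2) / 2 ** significand_bits
    bias = 2 ** (exponent_bits - 1) - 1  # 3 for 3 bits, 127 for 8 bits
    return (-1) ** sign * (1 + fraction) * 2.0 ** (stored_exponent - bias)


print(decode_float("0 010 0010", exponent_bits=3, significand_bits=4))
print(decode_float("1 01111110 00100000000 000000000000",
                   exponent_bits=8, significand_bits=23))
```

Both patterns have the significand 1.001 (binary) = 1.125 and a true exponent of -1, so they differ only in sign.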

CPU Structure

• CPU must:

– Fetch instructions

– Interpret instructions

– Fetch data

– Process data

– Write data


CHAP 7: CPU

Summary

CPU With Systems Bus


CPU Internal Structure


Registers

• Small storage locations inside the CPU

• Faster than main memory


Types of Registers

• General Purpose

• Data

• Address – hold addresses that are used by instructions to access main memory (RAM)

• Control and Status


How to Increase the Speed of the CPU?

• Improving organization – e.g. locate cache nearer to CPU, increase bus bandwidth

• Increase clock frequency – e.g. from 1 GHz to 5 GHz

• Increase parallelism e.g. pipelining, superscalar, Simultaneous Multithreading (SMT)

Thank You

