Computer Architecture - OS3

Post on 12-May-2020


Computer Architecture

Sebastian Altmeyer(altmeyer@uva.nl)

What is computer architecture?

(a) How to build a processor: common understanding by computer architects

(b) How to build a computer: common understanding by many others …

(c) How a computer system works: including all system layers

Aim of this lecture

● How a processor works
● How instructions are processed
● What the following items mean:
  ➢ Registers and Gates
  ➢ ALU
  ➢ Program Counter
  ➢ Instructions, Machine Code, Assembler
  ➢ Pipelining
  ➢ Caches

Aim of the next lecture

● How the peripherals of a processor work
● What the following items mean:
  ➢ Interrupts?
  ➢ Memory Management?
  ➢ Virtual Memory?
  ➢ Runtime systems?

(Actual content not yet fixed ...)

Unfortunately ...

that is way too much

Hence:

– rough overview and basic concepts

– highly simplified explanation

– focus on the necessary abstractions

– and at some points you simply have to believe me ;)

– Feedback!

Intel Core i7

Quadcore with 731,000,000 transistors

… and its natural habitat

Highly complex, but simple components

● registers (to store information)
● gates (to process information)
● buses (to connect registers and gates)
● a clock

A processor can only flip 0 to 1 and vice versa

… but that is all we need

Gottfried Leibniz: Binary Numbers

George Boole: Boolean Algebra

John von Neumann: First Processor

Claude Shannon: Expressiveness of Relays

Leibniz: Binary Numbers
Or, how to represent any natural number just using bits

$b_{n-1} b_{n-2} b_{n-3} \dots b_2 b_1 b_0 \equiv \sum_{i=0}^{n-1} b_i 2^i$

n denotes the register/bus width, i.e. a 32-bit architecture means n = 32

Example (8 bits): 00101010 = 42

Range: $[0; 2^n - 1]$
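A minimal Python sketch of the sum above, assuming the bit string is written with the most significant bit first (as in the 8-bit example). The function name `bits_to_unsigned` is made up for illustration.

```python
def bits_to_unsigned(bits: str) -> int:
    """Evaluate sum(b_i * 2**i), where b_0 is the rightmost bit."""
    value = 0
    for i, bit in enumerate(reversed(bits)):
        value += int(bit) * 2**i
    return value


print(bits_to_unsigned("00101010"))  # 42
print(bits_to_unsigned("11111111"))  # 255, the largest 8-bit value
```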

Two's Complement
How to represent a negative number

$b_{n-1} b_{n-2} b_{n-3} \dots b_2 b_1 b_0 \equiv -b_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} b_i 2^i$

Example (8 bits):
11010110 = -42
00101010 = 42

Range: $[-2^{n-1}; 2^{n-1}-1]$
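The two's complement formula can be checked with a short Python sketch. The helper names `bits_to_signed` and `negate` are hypothetical; the bit strings are assumed to be written most significant bit first.

```python
def bits_to_signed(bits: str) -> int:
    """The top bit counts as -2**(n-1); the remaining bits count as usual."""
    n = len(bits)
    value = -int(bits[0]) * 2 ** (n - 1)
    for i, bit in enumerate(reversed(bits[1:])):
        value += int(bit) * 2**i
    return value


def negate(bits: str) -> str:
    """Negate an n-bit number: invert every bit, add 1, keep only n bits."""
    n = len(bits)
    inverted = int(bits, 2) ^ (2**n - 1)
    return format((inverted + 1) % 2**n, f"0{n}b")


print(bits_to_signed("11010110"))  # -42
print(negate("00101010"))          # 11010110
```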

Floating Point (IEEE 754 single-precision 32-bit)

b31  | b30 b29 ... b23 | b22 b21 ... b0
Sign | Exponent        | Fraction

$(-1)^{\text{Sign}} \cdot (1.\text{Fraction})_2 \cdot 2^{(\text{Exponent})_2 - 127}$ (for normalized numbers)
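As a rough illustration, the following Python sketch splits a 32-bit pattern into the three fields and evaluates the value. It assumes a normalized number, where IEEE 754 puts an implicit leading 1 in front of the fraction bits; zero, subnormals, infinities and NaN (exponent field 0 or 255) are deliberately not handled. The function names are made up; `struct` is only used to obtain the real bit pattern of a float.

```python
import struct


def float32_bits(x: float) -> str:
    """Return the 32-bit IEEE 754 single-precision pattern of x as a string."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    return format(raw, "032b")


def decode_float32(bits: str) -> float:
    """Decode a normalized single-precision number (exponent field 1..254)."""
    sign = int(bits[0])
    exponent = int(bits[1:9], 2)
    fraction = int(bits[9:], 2)
    mantissa = 1 + fraction / 2**23  # implicit leading 1
    return (-1) ** sign * mantissa * 2.0 ** (exponent - 127)


pattern = float32_bits(-6.5)
print(pattern[0], pattern[1:9], pattern[9:])
print(decode_float32(pattern))  # -6.5
```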

Boole: Boolean Algebra

Variables:

z, v, w ∈ {0,1}

Operators:

z = ¬v (not)

z = v ∧ w (and)

z = v ∨ w (or)

Claude Shannon

Showed in his Master's thesis, "A Symbolic Analysis of Relay and Switching Circuits", how to implement Boolean logic using electrical circuits

Boole and Shannon: Gates

z=¬v

z=v∧w

z=v∨w

Half Adder/Full Adder

Gates: xor, or, and

Adder
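A sketch of how an adder can be composed from the gates named above, using Python's bit operators `^` (xor), `&` (and) and `|` (or) on single bits. The ripple-carry structure and all function names are illustrative assumptions; the circuit drawn on the slide may be wired differently.

```python
def half_adder(a, b):
    """Add two bits: returns (sum, carry)."""
    return a ^ b, a & b


def full_adder(a, b, carry_in):
    """Add three bits using two half adders and one or gate."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, c1 | c2


def ripple_carry_add(a_bits, b_bits, carry_in=0):
    """Add two equally long bit lists, least significant bit first."""
    result = []
    carry = carry_in
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        result.append(s)
    return result, carry


def to_bits(value, n):
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


total, carry_out = ripple_carry_add(to_bits(42, 8), to_bits(1, 8))
print(from_bits(total), carry_out)  # 43 0
```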

How to subtract?

$a - b = a + \bar{b} + 1$

Add/Subtract

$a - b = a + \bar{b} + 1$
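The identity says that subtraction needs no separate circuit: invert every bit of b and feed a 1 into the adder's carry input. A minimal sketch, assuming an 8-bit width by default (the function name `add_sub` and the `subtract` flag are made up):

```python
def add_sub(a: int, b: int, subtract: bool, n: int = 8) -> int:
    """n-bit add/subtract: for subtraction, invert b and set the carry-in to 1."""
    mask = 2**n - 1
    if subtract:
        b = ~b & mask  # bitwise complement, limited to n bits
    carry_in = int(subtract)
    return (a + b + carry_in) & mask  # drop the carry out of bit n-1


print(add_sub(42, 1, subtract=True))                  # 41
print(format(add_sub(0, 42, subtract=True), "08b"))   # 11010110, i.e. -42
```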

Arithmetic Logic Unit (ALU)

ALU and Register

How to control ALU/Registers? Machine Code

s2 s1 s0 | rs3 rs2 rs1 rs0 | rt3 rt2 rt1 rt0 | rd3 rd2 rd1 rd0
OP       | regS            | regT            | regD

regD = regS [OP] regT
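To make the layout concrete, here is a small Python sketch that packs and unpacks such a 15-bit instruction word (3 opcode bits, then three 4-bit register numbers). The opcode numbers in `OPCODES` are invented for illustration; the slides do not say which bit pattern selects which ALU operation.

```python
# Hypothetical opcode assignment, NOT taken from the lecture.
OPCODES = {"add": 0b000, "sub": 0b001, "and": 0b010, "or": 0b011}


def encode(op: str, reg_s: int, reg_t: int, reg_d: int) -> int:
    """Pack OP | regS | regT | regD into one 15-bit word."""
    return (OPCODES[op] << 12) | (reg_s << 8) | (reg_t << 4) | reg_d


def decode(word: int):
    """Split a 15-bit word back into (op, regS, regT, regD)."""
    return (word >> 12) & 0b111, (word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF


word = encode("sub", 1, 2, 3)    # r3 = r1 - r2
print(format(word, "015b"))      # 001000100100011
print(decode(word))              # (1, 1, 2, 3)
```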

PC + Instruction Memory

Calculator Instructions

Calculator

Clock

One cycle: increase PC, write result in register

Here comes the example (SIM-PL)

Calculator with Immediates

Instructions

Machine Code

Instructions with 2 registers

t0   | s2 s1 s0 | rs3 rs2 rs1 rs0 | rt3 rt2 rt1 rt0 | rd3 rd2 rd1 rd0
Type | OP       | regS            | regT            | regD

regD = regS [OP] regT if Type = 1

Instructions with immediates

t0   | s2 s1 s0 | rs3 rs2 rs1 rs0 | x x x x  | rd3 rd2 rd1 rd0 | i15 i14 ... i0
Type | OP       | regS            | (unused) | regD            | IMM

regD = regS [OP] IMM if Type = 0
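The two rules (register operand if Type = 1, immediate operand if Type = 0) can be sketched as a tiny interpreter. This is not SIM-PL and not the lecture's machine: instructions are written as Python tuples instead of bit patterns, and the operation names, the 32-bit register width and the use of r0 as a register that still holds 0 are assumptions for illustration. The 16 registers follow from the 4-bit register fields.

```python
import operator

# Hypothetical operation set.
OPS = {"add": operator.add, "sub": operator.sub,
       "and": operator.and_, "or": operator.or_}


def run(program, num_registers=16, width=32):
    """Execute (type, op, regS, regT_or_imm, regD) tuples, one per cycle."""
    regs = [0] * num_registers
    mask = 2**width - 1
    pc = 0
    while pc < len(program):
        type_bit, op, reg_s, operand, reg_d = program[pc]
        # Type = 1: second operand comes from a register; Type = 0: immediate.
        second = regs[operand] if type_bit == 1 else operand
        regs[reg_d] = OPS[op](regs[reg_s], second) & mask
        pc += 1  # the program counter advances once per instruction
    return regs


program = [
    (0, "add", 0, 40, 1),  # r1 = r0 + 40   (immediate)
    (0, "add", 0, 2, 2),   # r2 = r0 + 2    (immediate)
    (1, "add", 1, 2, 3),   # r3 = r1 + r2   (two registers)
]
print(run(program)[3])  # 42
```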

Loop Instructions

Calculator with Loops

condition

Registers are very limited ...

Little Endian/Big Endian

describes the order in which the bytes of a multi-byte word are written to memory ...
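A short Python illustration of the two byte orders for one 32-bit value, using the built-in `int.to_bytes` and `int.from_bytes`. The example value and the helper `swap_endianness` are made up.

```python
value = 0x0A0B0C0D

# Little endian: least significant byte at the lowest address.
little = value.to_bytes(4, byteorder="little")
# Big endian: most significant byte at the lowest address.
big = value.to_bytes(4, byteorder="big")

print(little.hex())  # 0d0c0b0a
print(big.hex())     # 0a0b0c0d


def swap_endianness(word: int, num_bytes: int = 4) -> int:
    """Reinterpret the bytes of a word in the opposite order."""
    return int.from_bytes(word.to_bytes(num_bytes, "little"), "big")


print(hex(swap_endianness(value)))  # 0xd0c0b0a
```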

Harvard Machine

Load Store machine

Complete Instruction Set

Turing Complete

We can now compute whatever we want …

Further improvements target:
- increasing usability
- speed

Usability

von Neumann Architecture

Just like Harvard Architecture, with one little difference

Instruction Memory and Data Memory are the same

(predates Harvard Architecture)

Procedure Call

$ra ← PC + 1 ; store PC+1 in register $ra

PC ← PC + Offset

Harvard Machine: Return
Load PC from register $ra

more details (recursive functions, stack, heap) in next lecture

Interrupts

ways to interrupt current execution for other stuff

Examples:
● pressing a key on the keyboard
● network data available
● shutting down processes
● moving the mouse

more details in the next lecture

Speed

Pipelining

currently:
● multiple cycles for one instruction
● large parts of the processor are idle

with pipelining:
● one instruction completes per cycle (in the best case)
● nearly all parts are nearly always busy

(similar to conveyor belt)
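The conveyor-belt effect can be put into numbers with a back-of-the-envelope sketch. It assumes a pipeline of five equally long stages (an illustrative value, not taken from the slides) and no stalls at all, so it shows the best case only.

```python
def cycles_without_pipelining(num_instructions: int, num_stages: int) -> int:
    """Each instruction passes through all stages before the next one starts."""
    return num_instructions * num_stages


def cycles_with_pipelining(num_instructions: int, num_stages: int) -> int:
    """The first instruction fills the pipeline; afterwards one finishes per cycle."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


print(cycles_without_pipelining(100, 5))  # 500
print(cycles_with_pipelining(100, 5))     # 104
```

For long instruction sequences the ratio approaches the number of stages, which is why deeper pipelines promise more speed as long as they stay full.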

Harvard Pipelining

Pipeline Hazards/Multicycle Instructions

One instruction per cycle is the best case … but that's not always possible:

● Memory accesses
● Branches
● Dependencies

ADD $r1, $r2, $r3
ADD $r4, $r1, $r2
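In the two ADD instructions, the second one reads $r1 before the first one has written it back. A minimal Python sketch that detects this kind of dependency between neighboring instructions, assuming the first operand is the destination register (an assumption about the assembler syntax; the function names are made up):

```python
def parse(instruction: str):
    """Split 'ADD $r1, $r2, $r3' into (op, destination, [sources])."""
    op, operands = instruction.split(maxsplit=1)
    dest, *sources = [reg.strip() for reg in operands.split(",")]
    return op, dest, sources


def raw_hazards(program):
    """Find adjacent pairs where the second instruction reads what the first writes."""
    hazards = []
    for i in range(len(program) - 1):
        _, dest, _ = parse(program[i])
        _, _, sources = parse(program[i + 1])
        if dest in sources:
            hazards.append((i, i + 1, dest))
    return hazards


program = ["ADD $r1, $r2, $r3", "ADD $r4, $r1, $r2"]
print(raw_hazards(program))  # [(0, 1, '$r1')]
```

Without further measures the pipeline has to stall the second instruction until the first result is available.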

Forwarding

Data dependency

Harvard Machine with Forwarding

Branch Prediction

Clock rate scaling

easiest way to improve performance … but it mostly raises processor speed, not memory speed

Load $r2, _a
Load $r1, _b
Add $r3, $r2, $r1

MPC 5xx

Load $r2, _a
Load $r1, _b
Add $r3, $r2, $r1

MPC 755

Memory Hierarchy

emulates a fast and large memory

● on top: small and fast
● on bottom: large and slow

each level contains a subset of the data below

rough idea:

books on your desk, books in your shelf, books in the library

Principle of Locality

● Spatial Locality: neighboring memory blocks are likely to be accessed at around the same time

● Temporal Locality: recently accessed memory blocks are likely to be accessed again in the near future

Harvard Architecture with cache

direct-mapped cache

Internal cache organization

2 way set-associative cache

Concept can be extended to fully-associative caches

Different types of cache misses

● Compulsory (cold) misses: caches are initially empty, first access is always a miss.

● Capacity misses due to the limited cache capacity (i.e. cache is full)

● Conflict misses due to an unbalanced cache usage (eviction in one cache set, while other lines are still empty)
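The difference between the miss types shows up in a toy cache simulator. This sketch works on block addresses only, maps a block to set `block % num_sets`, and evicts the least recently used line of a set; the LRU policy and all names are assumptions for illustration, since the slides do not fix a replacement policy.

```python
from collections import OrderedDict


def simulate_cache(block_addresses, num_sets, ways):
    """Return (hits, misses) for a cache with num_sets sets of `ways` lines each."""
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = misses = 0
    for block in block_addresses:
        lines = sets[block % num_sets]
        if block in lines:
            hits += 1
            lines.move_to_end(block)       # mark as most recently used
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)  # evict the least recently used line
            lines[block] = True
    return hits, misses


trace = [0, 4, 0, 4, 0, 4]
print(simulate_cache(trace, num_sets=4, ways=1))  # direct-mapped:   (0, 6)
print(simulate_cache(trace, num_sets=2, ways=2))  # 2-way set-assoc: (4, 2)
```

Both configurations hold four lines in total. In the direct-mapped cache, blocks 0 and 4 keep evicting each other from the same set while three sets stay empty (conflict misses); the 2-way cache only suffers the two compulsory misses.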

See you in the lab session!

Exercise 1 (Boolean Algebra)

a) What is the minimal subset of the set of basic operations {and, or, not} sufficient to derive all logical operations? Justify your answer.

b) What is the minimal subset if we add the nand operation (i.e. not and) to the set {and, or, not}? Justify your answer.

Exercise 2 (Digital Circuit)

a) Draw a digital circuit of a 4-bit incrementer, i.e., a circuit that satisfies the equation

$b = (a + 1) \bmod 2^4$

You can use the operations from the set {and, or, nand, nor, not}.

b) Try to find an incrementer with the minimal number of operations needed.

c) Try to minimize the depth of the circuit, i.e. the maximal length of any path from input to output.

Exercise 3 (Assembler)

Write an assembler program (for the Harvard machine) that converts memory data from little endian to big endian. Assume that the address of the memory data that should be converted is stored in register 1.

Test your assembly code using the SIM-PL.

Exercise 4 (Loops)

Extend your assembly code from Exercise 3 so that a complete block of data is converted.

Assume that the initial memory address is stored in register 1 and the number of blocks that should be converted in register 2. Your code shall convert each memory block in the address range [r1; r1 + r2].