
Computer = ALU + Memory

Registers ALU

Let’s try to compute 3 + 2 = 5

3 2

Go to jail and do not collect £200

Registers ALU

GPR Architecture (General Purpose Register)

Let’s compute 3 + 2 = 5 again !

Bus X Bus Y

Put 3 on bus X

Put 2 on bus Y

Stuff X and Y into ALU

ALU adds X and Y

ALU sends result to bus W

Put bus W into Mem

Our programmer needs to do this !

GPR Machine Details

Memory

Registers

Load from Memory

Store to Memory

Load reg from mem

Add reg to reg into reg

Store reg in mem

Our programmer

needs to do this !

Accumulator Architecture

Memory

Get 3 from Memory and ADD !

1. Assume 8 is already in the accumulator. The programmer writes

Add 3

2. The ALU does 3 + 8 = 11 and writes the result back into the accumulator

Let’s build a Computer

Let’s take a RISC. What do we need ?
• Memory
• Registers
• ALU
• Control Circuits
• A programming language
• A good Name - Simple Although Meaningful

What’s needed to build Sam-4 ?

Code Memory

Code Memory – to store the program

Arithmetic-Logic Unit – to do the maths business

Registers – to hold results of computations

Data Memory

Data memory – to hold source and results of our work

Program Memory

PC = 4

Code Memory

halt store

Memory stores program instructions at a sequence of byte addresses. Each instruction is 32 bits, so the addresses increment by 4 bytes.

Here the Program Counter inputs address 4 to the memory, which reads out the data word (32 bits) at address 4. This is the instruction ‘add’.

Address in

Data out

Registers, Registers

1. Registers store data at addresses. Yep, that’s Memory !

2. There are TWO read ports (X and Y) where data can be read out of the reg file simultaneously.

3. Multiport registers have an input port (W) where data is sent to be written into the register file.

4. The addresses for the read ports (X and Y) and the write port (W) come in here.

Data Memory

Here’s the memory

The Memory Data Register (MDR) is a parking place for data coming and going from the memory.

The Memory Address Register (MAR) holds the address of the data location selected for read or write, e.g. 7

Here’s Sam

Data Memory

Instruction reg

Code Memory

Fetch-Execute Cycle

(Much more Clever and Useful)

Example: add r3,r2,r1

1. Fetch instruction from memory: get contents of address 1

2. Decode the opcode and read any registers: ALU <- r1, ALU <- r2

3. Do any ALU operations: ALU add

4. Do any Memory Access: none needed

5. Write back results to registers: r3 <- ALU

First Example

ld r0 , [1]       Load r0 with data at address 1

ld r1 , [2]       Load r1 with data at address 2

add r2,r1,r0      Add r0 and r1. Put result in r2

st r2 , [7]       Store r2 in memory address 7

Note each of these instructions runs through 5 steps of its own F-E Cycle
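The four-instruction program can be traced with a very small simulator. This is only a sketch: the values stored at addresses 1 and 2 (3 and 2) and the sizes of the memory and register file are assumptions made for illustration, and the five F-E steps are collapsed into comments rather than modelled as separate hardware stages.

```python
# Minimal sketch of Sam running the First Example program.
data_mem = [0] * 8
data_mem[1] = 3            # assumed value at address 1
data_mem[2] = 2            # assumed value at address 2
regs = [0] * 4             # r0..r3 (register count is an assumption)

program = [
    ("ld", 0, 1),          # ld  r0,[1]
    ("ld", 1, 2),          # ld  r1,[2]
    ("add", 2, 1, 0),      # add r2,r1,r0
    ("st", 2, 7),          # st  r2,[7]
]

def run(program, regs, data_mem):
    pc = 0
    while pc < 4 * len(program):
        instr = program[pc // 4]       # 1. fetch (each instruction is 4 bytes)
        pc += 4
        op = instr[0]                  # 2. decode
        if op == "ld":
            regs[instr[1]] = data_mem[instr[2]]          # 4. memory access, 5. register write
        elif op == "add":
            regs[instr[1]] = regs[instr[2]] + regs[instr[3]]   # 3. ALU, 5. register write
        elif op == "st":
            data_mem[instr[2]] = regs[instr[1]]          # 4. memory access
    return pc

final_pc = run(program, regs, data_mem)
print(data_mem[7])   # 5 with the assumed inputs
```

Note how the PC advances by 4 for every instruction, matching the 32-bit instruction width.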

Stepping Ld r0,[1] through the cycle

[Diagram sequence: Code Memory, Data Memory and ALU, with Ld r0,[1] held as the fields Ld 0 1. PC = 0 at the fetch, PC = 4 for the remaining steps.]

1. Instruction Fetch

2. Decode, Reg Ops

3. ALU Operation

4. Memory Access (address 1)

5. Register Write (r0)

Stepping add r2,r0,r1 through the cycle

[Diagram sequence: Code Memory and Data Memory, with add r2,r0,r1 held as the fields add 2 0 1. PC = 4 at the fetch, PC = 8 for the remaining steps.]

1. Instruction Fetch

2. Decode, Reg Ops

3. ALU Operation

4. Memory Access

5. Register Write

Instruction Encoding Example

add rd rs rt unused

rd <- rs + rt

e.g. add r3, r1, r2 means r3 = r1 + r2

All Sam’s instructions take up 32 bits.

Sam’s instructions start with the opcode, then the destination register, then the source registers. The first 6 bits are for the opcode.

Field         opcode   rd (destination)   rs (source)   rt (source)   unused
Nr of Bits    6        5                  5             5             11

Encoded example, add r3, r2, r1:

010110 00011 00010 00001 unused
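The 6/5/5/5/11 field layout can be checked by packing the fields with shifts. The sketch below assumes the opcode value 010110 shown in the bit pattern above and the field order opcode, rd, rs, rt; the function names are made up for illustration.

```python
OPCODE_ADD = 0b010110   # opcode bits copied from the slide's add example

def encode_r(opcode, rd, rs, rt):
    """Pack opcode(6) rd(5) rs(5) rt(5) unused(11) into one 32-bit word."""
    for reg in (rd, rs, rt):
        if not 0 <= reg < 32:
            raise ValueError("register number must fit in 5 bits")
    return (opcode << 26) | (rd << 21) | (rs << 16) | (rt << 11)

def fields(word):
    """Show a 32-bit word split into the five fields."""
    bits = format(word, "032b")
    return " ".join([bits[0:6], bits[6:11], bits[11:16], bits[16:21], bits[21:]])

word = encode_r(OPCODE_ADD, 3, 2, 1)    # rd = 3, rs = 2, rt = 1
print(fields(word))   # 010110 00011 00010 00001 00000000000
```

The unused 11 bits are simply left as zeros here; the slides do not say what Sam does with them.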

The Instruction Register

Code Memory

Add r2,r1,r3

add 2 1 3

010110 00010 00001 00011 unused

Loaded with the instruction, the IR decodes this into bits which drive the CPU digital logic circuits

Electronic Wires

Control Path

add r2, r1, r3      001010 00010 00001 00011 unused

sub r2, r1, r3      000101 00010 00001 00011 unused

The add instruction is decoded and produces digital signals which select the + function in the ALU

Subtract !

The sub instruction, when decoded, produces different digital signals
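In software terms, decoding amounts to pulling the opcode out of the word and using it to pick an ALU function. A rough sketch follows, assuming the two opcode values on this slide (001010 for add, 000101 for sub) and made-up register contents; real control logic is combinational circuitry, not a dictionary lookup.

```python
import operator

# Opcode values copied from the two bit patterns on this slide.
ALU_FUNCTIONS = {
    0b001010: ("add", operator.add),
    0b000101: ("sub", operator.sub),
}

def decode(word):
    """Split a 32-bit word into opcode, rd, rs, rt and look up the ALU function."""
    opcode = (word >> 26) & 0x3F
    rd = (word >> 21) & 0x1F
    rs = (word >> 16) & 0x1F
    rt = (word >> 11) & 0x1F
    name, func = ALU_FUNCTIONS[opcode]
    return name, func, rd, rs, rt

def execute(word, regs):
    name, func, rd, rs, rt = decode(word)
    regs[rd] = func(regs[rs], regs[rt])
    return name

add_word = int("001010" "00010" "00001" "00011" + "0" * 11, 2)   # add r2,r1,r3
sub_word = int("000101" "00010" "00001" "00011" + "0" * 11, 2)   # sub r2,r1,r3

regs = [0, 9, 0, 4]                    # assumed contents: r1 = 9, r3 = 4
print(execute(add_word, regs), regs[2])
print(execute(sub_word, regs), regs[2])
```

Only the opcode field differs between the two words, which is why the same register fields can feed either ALU function.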

Sam and MIPS are 32 bit

Every format is 32 bits wide:

add rd,rs,rt      opcode | rd | rs | rt | unused        001010 00110 01001 00011 unused

ldr rd,[rs+c]     opcode | rd | rs | 16-bit address     001010 00010 00001 0101001111111011

ldr rd,[c]        opcode | 26-bit address               001010 101001111110010101011011111

Other Arithmetic Instructions

sub rd rs rt unused

rd <- rs - rt

Field         opcode   rd (destination)   rs, rt (source regs)   unused
Nr of Bits    6        5                  5 + 5

Same coding applies to other arithmetic instructions:

sub r3,r2,r1
and r2,r1,r0
or r5,r1,r2

A simple ‘Load’ instruction

‘Load into rd the contents of memory at the address which is in reg rs.’ Simple!

ld | rd | rs

opcode | destination | single source reg

ldr r9 , [r1]

1. Let’s say we have already loaded r1 with 3

2. Get data from mem at the address in r1 (address 3)

3. Load the data into r9

[Diagram: memory contents]

A more complex ‘Load’

ldr | rd | rs | constant c

opcode | destination | source | constant

Load register rd with the contents of memory which you find at address rs + c.

ldr r9 , [r1 + 2]

The mem address is formed as a sum: 3 + 2

[Diagram: memory contents]

… and a ‘Store’ instruction

str | rd | rs | constant c

opcode | destination | source | constant

Note here the data is moved from the ‘destination’ register to the store (memory). Confusing? Mm.

str r9 , [ r1 + 2 ]

1. Get data from r9

2. Write it to memory at address r1 + 2

[Diagram: memory contents]

What’s this?

‘Load Immediate’

ldi | rd | Constant C

opcode | destination | constant

In load immediate, the constant C that immediately follows the opcode and destination is put straight into the reg.

ldi r9 , 5

All reference to memory has gone!

Load ‘5’ straight into r9

A Summary So Far …

Instruction          Example

add rd,rs,rt         add r3,r1,r1

str rd, [rs + c]     str r6,[r1 + 1]

st rd, [rs]          str r0, [r1]

ldr rd, [rs + c]     ldr r2,[r3 + 4]

ld rd, [rs]          ldr r2,[4]

ldi rd,C             ldi r0,3

Now it’s time to move on and look in detail at the hierarchy of computer languages, to see its influence on the ISA.

Assembling a Spreadsheet

The layers, top to bottom:

Excel Application
HLL Implementation
ISA Assembler
Electronics

HLL implementation:

main() {
    int f, g, h;
    f = g + h;
}

ISA assembler:

ld r0, [ g ]
ld r1, [ h ]
add r2,r0,r1
st r2, [ f ]

The Great Idea here is that the ISA we need at the bottom must serve the grand master at the top, the Application.

The ISA must support the HLL implementation

Arrays (= Tables)

How do we sum the array of numbers in column B?

1. We would use the instruction ld r1,[r0 + B] where B=3, the start address of the array

2. Then we load r0 with 0, then 1, then 2, … to scan down the array

ldi r0 , 0
ldi r3 , 0
ld r1, [r0 + 3]       r0 (=0) + 3 = 3

Arrays (= Tables)

Get the next cell, load its value and add it to the sum, in r3:

inc r0
ld r1, [r0 + 3]
add r3,r3,r1

1. Increment r0 to get the next data value: inc r0 (0 + 1 = 1)

2. ld r1,[r0 + B] where B=3, the start address of the array, but now r0 contains 1
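The base-plus-index scan can be written out as a loop. In this sketch the array contents and its length (four cells) are assumptions; only the base address B = 3 and the register roles (r0 index, r1 loaded cell, r3 running sum) come from the slides.

```python
B = 3                                   # start address of the array, as on the slide
memory = [0, 0, 0, 10, 20, 30, 40]      # assumed contents: four cells at addresses 3..6
N = 4                                   # assumed number of cells in column B

r0 = 0                      # ldi r0,0     index into the array
r3 = 0                      # ldi r3,0     running sum
for _ in range(N):
    r1 = memory[r0 + B]     # ld  r1,[r0 + 3]   address formed as r0 + B
    r3 = r3 + r1            # add r3,r3,r1
    r0 = r0 + 1             # inc r0
print(r3)   # 100 for the assumed data
```

The point is that one instruction form, ld rd,[rs + c], serves every element: the constant stays fixed while the register walks down the column.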

Making Decisions

Let’s say we want to add 2 to a number B if another number C is equal to 10

You mean, ‘If C = 10, then add 2 to B’

Here’s how we would do it in C…

if (c == 10) b = b + 2;

What about SAM? (Here r2 holds C and r3 holds B.)

ldi r1,10          First load the test number

bne r2,r1,36       Branch if r1 and r2 are not equal to addr 36, i.e. branch around the add

addi r3,r3,2       Add 2 to B

Let’s say we want to make the sequence 0,3,6,9,12 and stop.

We take 4 steps and at each step add 3: x = x + 3

So we need a register to keep track of the number of steps (r0), and a register to hold the sum at each step (r2).

ldi r0 , 0
ldi r1 , 4
ldi r2 , 0
addi r2 , r2 , 3
addi r0 , r0 , 1
bne r0,r1,12          Branch unless r0 = r1 = 4
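The loop can be traced with a tiny interpreter. This sketch assumes the three ldi instructions sit at byte addresses 0, 4 and 8, so that the branch target 12 is the first addi and the bne is the last instruction of the loop; the tuple encoding of instructions is purely illustrative.

```python
program = [
    ("ldi", 0, 0),          # addr 0:  ldi  r0,0       step counter
    ("ldi", 1, 4),          # addr 4:  ldi  r1,4       number of steps
    ("ldi", 2, 0),          # addr 8:  ldi  r2,0       running value
    ("addi", 2, 2, 3),      # addr 12: addi r2,r2,3
    ("addi", 0, 0, 1),      # addr 16: addi r0,r0,1
    ("bne", 0, 1, 12),      # addr 20: bne  r0,r1,12   loop back unless r0 = r1
]

def run(program):
    regs = [0] * 4
    pc = 0
    sequence = []                       # every value written to r2
    while pc < 4 * len(program):
        op, a, b, *rest = program[pc // 4]
        pc += 4
        if op == "ldi":
            regs[a] = b
        elif op == "addi":
            regs[a] = regs[b] + rest[0]
        elif op == "bne":
            if regs[a] != regs[b]:
                pc = rest[0]            # taken branch overwrites the PC
        if op != "bne" and a == 2:
            sequence.append(regs[2])
    return sequence

print(run(program))   # [0, 3, 6, 9, 12]
```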

CBP 2005 Comp3070 Computer Architecture

Some x86 instructions

mov ax , [bx + c]
mov [ax] , bx
add ax , bx
add [bx] , ax

These look rather like Sam’s RISC ops

But the last one, add [bx] , ax, is not. Here the contents of ax are being added straight into memory ! The x86 is a register-memory ISA and Sam is a register-register ISA

RR (Sam):

ldi r1 , a
ldi r2 , b
add r3,r1,r2
st r3 , b

RM (x86):

mov ax, a
add b,ax

Let’s compare the RR and RM ISAs. Clearly RR needs more instructions, and so more code memory (four against two here), while RM uses stronger operations

Intel x86

Intel Instruction Format

IA-32 Format

Variable Length Instructions

All Sam’s instructions had the same length, 32 bits. This is also true for other RISC ISAs such as SPARC and MIPS. Compare this with the x86, where instructions vary from 1 to 17 bytes. Here are some stats.

[Chart: frequency of use (0% to 30%) for the Espresso, Gcc, Spice and Nasa benchmarks]

Clearly long complex instructions are used infrequently

But the use does depend on the app.

Instruction Timing

T1   Fetch
T2   Decode, Reg Op
T3   ALU Op
T4   Mem Access
T5   Reg Write

Each of T1 to T5 is One Clock Cycle

All Sam’s instructions occur in 5 clock cycles

• A 1 Gigahertz SPARC runs 1 GigaClockCycle in 1 second
• That’s 10^9 cycles
• That’s 1,000,000,000 cycles
• That’s 200,000,000 add ops !
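The last figure is just the cycle count divided by the cycles per instruction. A quick check, assuming one instruction is processed at a time (no pipelining) and every instruction takes the full 5 cycles:

```python
CLOCK_HZ = 1_000_000_000        # 1 GHz = 10**9 cycles per second
CYCLES_PER_INSTRUCTION = 5      # every instruction takes T1..T5

adds_per_second = CLOCK_HZ // CYCLES_PER_INSTRUCTION
print(f"{adds_per_second:,}")   # 200,000,000
```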

Variable Time Instructions

Here’s a timing diagram for an Intel add

add ax , [bx + c]

ax = ax + mem[bx + c]

[Timing diagram: two passes through T1 to T5 (Fetch; Decode, Reg Op; ALU; Mem Access; Reg Write)]

We need two adds. The first to get the address summed up …

… and the second to actually add memory to register ax

Potent x86 Instructions

mov x,2         Immediate to memory           6
xlat x          Translate al via table        1
imul x          Multiply memory with
inc x           Increment memory by 1
repne scasb     Scan string for match !       various

1. Application: Greenspan

2. High-Level Language (‘C’): strcmp(str, Greenspan);

3. Intel ISA code

Top 10 Intel x86 Instructions

Rank   Instruction            Usage
1      load                   22%
2      conditional branch     20%
3      arithmetic / logic     19%
4      compare                16%
5      store                  12%
6      move reg - reg         4%
7      call - return          2%

We see that most instructions are simple: load, store, calculate, branch. None of Intel’s potent stuff figures here. So why did Intel design instructions no-one uses ?

ISA R&D into the 80’s

1980 Berkeley Patterson RISC (SPARC)
1981 Stanford Hennessy MIPS

- Easy to Decode Ops
- Fast Issue Rate
- Only load and store reference memory
- Lots of registers

Emerging Design Guidelines

Let’s downshift and make things simpler …
• Use simple instructions: load, store, add
• Many of these will do one x86 potent op
• Need more memory, but memory is cheap
• More CPU cycles, but can still be faster

Intel Architecture

Looks Great from the outside …

… but is a golden mishmash with a history of add-ons

RISC Architecture

Minimalist Functional

Summary … so far

RISC
Minimalist
Something like Zen
All instructions the same length in memory
Small number of instructions
Small number of addressing modes
Simple instructions
5 clock cycles
SPARC, MIPS

CISC
Different length in memory
Large number of instructions
Huge number of addressing modes
Complex instructions
Variable number of clock cycles


Today the consequences of …

Intel (CISC) MIPS (RISC)

Laundry Model

Washer Drier Store Basket Wardrobe

Process Steps

A. Wash then Dry

[Timeline 9.00 to 11.00 showing when the washer and drier are idle or running]

1. Load the washer at 9.00

2. Done at 10, load the drier

3. Drier done at 11

Sequential Process

3 loads take 6 hours

[Timeline 9.00 to 15.00]

1. Load washer at 9.00
2. Done at 10, load drier
3. Drier done at 11
4. Reload washer at 11
5. Done at 12, load drier
6. Drier done at 13
7. Reload washer at 13
8. Done at 14, load drier
9. Done at 15

Overlapping Process

3 loads take 4 hours

[Timeline starting at 9.00]

1. Load washer at 9.00
2. Done at 10, load drier, reload washer
3. Both done at 11. Reload drier, reload washer
4. Both done at 12. Reload drier
5. Drier done at 13

From 10.00 till 11.00 both washer and drier are running concurrently

Washing Pipeline Filling

[Timeline: 9.00 11.00 13.00 15.00 17.00]

5 loads in 9 hours

5 Cycles !!!
1. Get washing
2. Wash
3. Dry
4. Store
5. Put away
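The laundry timings follow a simple pattern: done one after another, n loads through k one-hour stages take n × k hours; overlapped, the first load takes k hours and each later load finishes one hour behind the previous one. A sketch of that arithmetic, assuming every stage takes exactly one hour and nothing ever waits (function names are illustrative):

```python
def sequential_hours(loads, stages):
    """Each load finishes all its stages before the next load starts."""
    return loads * stages

def pipelined_hours(loads, stages):
    """First load takes `stages` hours; every further load adds one hour."""
    if loads == 0:
        return 0
    return stages + (loads - 1)

print(sequential_hours(3, 2), pipelined_hours(3, 2))   # wash + dry, 3 loads: 6 4
print(pipelined_hours(5, 5))                           # 5-step laundry, 5 loads: 9
```

The same formula applies to a 5-stage instruction pipeline with clock cycles in place of hours.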

Can we Pipeline SAM ?

Data Memory

Instruction reg

Code Memory

1.Fetch 2.Dec/Reg 3.ALU 4.Mem

Pipelined Sam4

Data Memory

Code Memory

1.Fetch 2.Dec/Reg

3.ALU 4.Mem 5.RW

Buffer

5 Stages in Pipeline

Let’s take the instruction add r3,r1,r2 and show which stage is needed for each part of the instruction.

Stage          1.Fetch   2.Dec/Reg   3.ALU   4.Mem   5.RW
Unit           Mem       Reg         ALU     Mem     Reg
add r3,r1,r2   add       r1,r2       add             r3

Two Instructions

Two instructions into the pipeline

Stage          1.Fetch   2.Dec/Reg   3.ALU   4.Mem   5.RW
ld r3,[r0+2]   ld        r0                  Mem     r3
add r4,r1,r2   add       r1,r2       ALU             r4

Structural Hazard

Mem Reg ALU Mem Reg

st r0,[5]          Write (store)

add r4,r1,r2       Read (fetch)

Here we are being asked to read from memory and write to it simultaneously. Impossible!

Solution – Use separate code and data memories

Hazardous Washing

[Timeline: 9.00 11.00 13.00 15.00 17.00]

Washing basket contains both clean and dirty washing!

Code and Data Memories

Mem Reg ALU Mem Reg

add r3,r1,r2: reads r1,r2, writes r3

Data Hazard

Mem Reg ALU Mem Reg

add r3,r1,r2       r3 set here (final Reg stage)

add r4,r1,r3       but need r3 here (first Reg stage), EARLIER in time !

Need value of r3 for second instruction before the first is complete.

Pipeline Stalls

add r3,r1,r2

add r4,r1,r3

[Diagram: the second instruction’s Reg, ALU, Mem, Reg stages are held back by two Stall cycles]

Resolve Hazard – Insert delay into second instruction stream. ‘Stall’ Cycles.

But this needs extra electronics on the chip. Complex and Costly.

Forwarding

Mem Reg ALU Mem Reg

add r3,r1,r2

add r4,r1,r3

Need value of r3 for second instruction before the first is complete.

So build in extra circuits to get the data as soon as it is available from the ALU

Compiler resolves Hazard

add r3,r1,r2
nop
nop
add r4,r1,r3

The compiler can detect a possible hazard and insert 2 nops (‘no ops’)

[Diagram: one Mem Reg ALU Mem Reg row per instruction, with the nop rows in between]
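The compiler’s job here can be sketched as a scan over the instruction list. The version below is only an illustration: it assumes a dependent instruction must sit at least three slots after the instruction that writes its source register (which yields the 2 nops of the slide for back-to-back instructions), and the tuple layout and names are invented for the example.

```python
NOP = ("nop", None, ())
MIN_DISTANCE = 3   # assumed: a reader must issue at least 3 slots after the writer

def insert_nops(program):
    """program: list of (text, dest_register, source_registers) tuples."""
    out = []
    last_write = {}                 # register -> position in `out` of its latest writer
    for text, dest, sources in program:
        needed = 0
        for src in sources:
            if src in last_write:
                gap = len(out) - last_write[src]
                needed = max(needed, MIN_DISTANCE - gap)
        out.extend([NOP] * needed)  # pad until the source value has been written back
        out.append((text, dest, sources))
        if dest is not None:
            last_write[dest] = len(out) - 1
    return out

program = [
    ("add r3,r1,r2", "r3", ("r1", "r2")),
    ("add r4,r1,r3", "r4", ("r1", "r3")),
]
for text, _, _ in insert_nops(program):
    print(text)
```

Instructions that do not depend on a recent result pass through unchanged, so the cost is paid only where a hazard actually exists.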

Example

Instruction      op code   regs      alu   mem   reg write
ld r1,[7]        ld                        [7]   r1
ld r2,[8]        ld                        [8]   r2
add r3,r1,r2     add       r1, r2                r3
