
Posted on 17-Jan-2018


Computer = ALU + Memory

[Figure: registers and ALU, with the values 3 and 2 and the result 5]

Let’s try to compute 3 + 2 = 5

Go to jail and do not collect £200

GPR Architecture (General Purpose Register)

Let’s compute 3 + 2 = 5 again!

[Figure: registers and ALU connected by buses X, Y and W, carrying the values 3, 2 and 5]

1. Put 3 on bus X
2. Put 2 on bus Y
3. Stuff X and Y into the ALU
4. The ALU adds X and Y
5. The ALU sends the result to bus W
6. Put bus W into memory

Our programmer needs to do this!

GPR Machine Details

[Figure: memory at byte addresses 0, 8, 16, 24, 32, …; registers r0, r1, r2, r3, r4, …, r10, r11; ALU; arrows labelled ‘Load from Memory’ and ‘Store to Memory’]

1. Load reg from mem
2. Load reg from mem
3. Add reg to reg into reg
4. Store reg in mem

Our programmer needs to do this!

Accumulator Architecture

[Figure: memory at byte addresses 0, 8, 16, 24, 32, …, an ALU and a single accumulator]

Get 3 from memory and ADD!

1. Assume 8 is already in the accumulator. The programmer writes:

   Add 3

2. The ALU does 3 + 8 = 11 and writes the result back into the accumulator.
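A minimal Python sketch of the accumulator idea, assuming a single accumulator register and an add operation that adds one memory word into it. The class name, the `load` helper and the memory layout (8 at address 16, 3 at address 24) are hypothetical choices for illustration, not part of the slides.

```python
class AccumulatorMachine:
    """Toy accumulator architecture: every ALU result lands in one register."""

    def __init__(self, memory):
        self.memory = memory  # maps byte address -> stored value
        self.acc = 0          # the single accumulator register

    def load(self, address):
        # Hypothetical helper: copy a memory word into the accumulator.
        self.acc = self.memory[address]

    def add(self, address):
        # Get the operand from memory and ADD it to the accumulator;
        # the result overwrites the accumulator.
        self.acc = self.acc + self.memory[address]


m = AccumulatorMachine({16: 8, 24: 3})  # hypothetical layout
m.load(16)     # 1. 8 is now in the accumulator
m.add(24)      # 2. "Add 3": the ALU does 3 + 8 = 11
print(m.acc)   # 11
```

No destination is ever named: the accumulator is always both an operand and the destination, which is what keeps accumulator instructions short.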

Let’s build a Computer

Let’s take a RISC. What do we need?
• Memory
• Registers
• ALU
• Control circuits
• A programming language
• A good name: Simple Although Meaningful (SAM)

What’s needed to build Sam-4?

[Figure: PC and code memory, ALU, register file (r0, r1, r2) with ports X, Y and W, and data memory (addresses 0, 1, …, 7) with mar and mdr]

• Code memory to store the program
• An arithmetic-logic unit (ALU) to do the maths business
• Registers to hold the results of computations
• Data memory to hold the sources and results of our work

Program Memory

[Figure: code memory holding instructions such as load, add, store and halt at byte addresses 0, 4, 8, 12; the PC (= 4) drives ‘Address in’ and the instruction appears on ‘Data out’]

Memory stores program instructions at a sequence of byte addresses. Each instruction is 32 bits (4 bytes), so the addresses increment by 4.

Here the Program Counter inputs address 4 to the memory, which reads out the data word (32 bits) at address 4. This is the instruction ‘add’.
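A small Python sketch of code memory as a map from byte address to instruction word, assuming a hypothetical four-instruction program. Sending the PC in as the address returns the instruction, and the next PC is 4 bytes further on.

```python
WORD_BYTES = 4  # each instruction is 32 bits = 4 bytes

# Hypothetical program, one instruction per 4-byte word.
program = ["load", "add", "store", "halt"]
code_memory = {i * WORD_BYTES: instr for i, instr in enumerate(program)}


def fetch(pc):
    """Use the PC as the address; return the instruction and the next PC."""
    return code_memory[pc], pc + WORD_BYTES


instruction, next_pc = fetch(4)
print(instruction, next_pc)  # add 8
```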

Registers, Registers

1. Registers store data at addresses. Yep, that’s memory!
2. There are TWO read ports (X and Y), where data can be read out of the register file simultaneously.
3. The multiport register file also has an input port (W), where data is sent to be written into the register file.
4. The addresses for the read ports (X and Y) and the write port (W) come in here.
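A minimal Python sketch of a multiport register file, assuming eight registers (a hypothetical size): `read` serves both read ports X and Y in the same step, and `write` serves port W. It reruns 3 + 2 = 5 entirely inside the registers.

```python
class RegisterFile:
    """Register file with two read ports (X, Y) and one write port (W)."""

    def __init__(self, size=8):
        self.regs = [0] * size

    def read(self, addr_x, addr_y):
        # Both read ports deliver their data in the same step.
        return self.regs[addr_x], self.regs[addr_y]

    def write(self, addr_w, value):
        # Data arriving on port W is written into the selected register.
        self.regs[addr_w] = value


rf = RegisterFile()
rf.write(1, 3)         # r1 <- 3
rf.write(2, 2)         # r2 <- 2
x, y = rf.read(1, 2)   # r1 on port X and r2 on port Y together
rf.write(0, x + y)     # the sum goes back in through port W to r0
print(rf.regs[0])      # 5
```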

[Figure: register file (r0, r1, r2) with ports X, Y and W beside data memory (addresses 0, 1, …, 7) with mar and mdr]

Here’s the memory

The Memory Data Register (MDR) is a parking place for data coming and going from the memory.

The Memory Address Register (MAR) holds the address of the data location selected for read or write, e.g. 7.
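The MAR/MDR discipline can be sketched in Python: the memory is only ever touched through the two registers. The class name and the 8-word size are assumptions for illustration.

```python
class DataMemory:
    """Data memory accessed only through the MAR and MDR."""

    def __init__(self, size=8):
        self.cells = [0] * size
        self.mar = 0  # Memory Address Register: which location
        self.mdr = 0  # Memory Data Register: data coming or going

    def read(self):
        # Data coming out of memory parks in the MDR.
        self.mdr = self.cells[self.mar]

    def write(self):
        # Data going into memory is taken from the MDR.
        self.cells[self.mar] = self.mdr


mem = DataMemory()
mem.mar, mem.mdr = 7, 5  # select address 7 and park the value 5
mem.write()
mem.mdr = 0              # clear the MDR so the read is visible
mem.read()
print(mem.mdr)           # 5
```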

Here’s Sam

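[Figure: Sam’s datapath: code memory feeding the instruction register, register file (r0, r1, r2) with ports X, Y and W, ALU, and data memory (addresses 0, 1, …, 7) with mar and mdr]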

Fetch-Execute Cycle (much more clever and useful)

Step                                           add r3,r2,r1
1. Fetch instruction from memory               Get contents of address 1
2. Decode the opcode and read any registers    ALU <- r1, ALU <- r2
3. Do any ALU operations                       ALU add
4. Do any memory access                        None needed
5. Write back results to registers             r3 <- ALU

First Example

ld r0 , [1]      Load r0 with data at address 1
ld r1 , [2]      Load r1 with data at address 2
add r2,r1,r0     Add r0 and r1. Put result in r2
st r2 , [7]      Store r2 in memory address 7

Note each of these instructions runs through 5 steps of its own F-E cycle.
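A Python sketch that runs the First Example through the five fetch-execute steps, one instruction at a time. Instructions are stored as tuples rather than bit patterns, and the data values (3 at address 1, 2 at address 2) are hypothetical, chosen to redo 3 + 2 = 5.

```python
data_memory = [0, 3, 2, 0, 0, 0, 0, 0]  # hypothetical contents
regs = [0, 0, 0, 0]

# The First Example, one tuple per instruction at byte addresses 0, 4, 8, 12.
code_memory = {
    0: ("ld", 0, 1),       # ld r0,[1]
    4: ("ld", 1, 2),       # ld r1,[2]
    8: ("add", 2, 1, 0),   # add r2,r1,r0
    12: ("st", 2, 7),      # st r2,[7]
}


def step(pc):
    """Run the instruction at pc through the five F-E steps; return the next pc."""
    # 1. Fetch the instruction.
    instr = code_memory[pc]
    op = instr[0]
    # 2. Decode the opcode and read any registers.
    x = y = None
    if op == "add":
        x, y = regs[instr[2]], regs[instr[3]]
    elif op == "st":
        x = regs[instr[1]]
    # 3. ALU operation (only add produces a result here).
    result = x + y if op == "add" else None
    # 4. Memory access: ld reads, st writes, add does nothing.
    if op == "ld":
        result = data_memory[instr[2]]
    elif op == "st":
        data_memory[instr[2]] = x
    # 5. Write the result back to the destination register.
    if op in ("ld", "add"):
        regs[instr[1]] = result
    return pc + 4  # the next instruction is 4 bytes on


pc = 0
while pc in code_memory:
    pc = step(pc)
print(regs[2], data_memory[7])  # 5 5
```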

1. Instruction Fetch

[Figure: ld r0,[1] is read from code memory at PC = 0 into the instruction register as Ld 0 1]

2. Decode, Reg Ops

[Figure: Ld 0 1 is decoded and its address field 1 is read out; the PC has advanced to 4]

3. ALU Operation

[Figure: the address 1 passes through the ALU; PC = 4]

4. Memory Access

[Figure: the address 1 goes to mar and data memory location 1 is read into mdr; PC = 4]

5. Register Write

[Figure: the data in mdr is written into r0 through port W; PC = 4]

1. Instruction Fetch

[Figure: add r2,r0,r1 is read from code memory at PC = 4 into the instruction register as add 2 0 1; the PC then becomes 8]

2. Decode, Reg Ops

[Figure: add 2 0 1 is decoded; r0 and r1 are read out on ports X and Y]

3. ALU Operation

[Figure: the ALU adds the values from ports X and Y; PC = 8]

4. Memory Access

[Figure: data memory is not used by add; PC = 8]

5. Register Write

[Figure: the ALU result is written into r2 through port W; PC = 8]

Instruction Encoding Example

add rd rs rt unused      rd <- rs + rt

e.g. add r3, r2, r1 means r3 = r2 + r1

Field          opcode   rd (destination)   rs (source)   rt (source)   unused
Nr of bits     6        5                  5             5             11
add r3,r2,r1   010110   00011              00010         00001         unused

All Sam’s instructions take up 32 bits. The first 6 bits are the opcode, then comes the destination register, then the source registers.
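A Python sketch of packing the 6/5/5/5/11 fields into one 32-bit word, reproducing the bit pattern 010110 00011 00010 00001 from the encoding table. Filling the unused bits with zeros is an assumption; the slide does not say what they hold.

```python
FIELD_BITS = {"opcode": 6, "rd": 5, "rs": 5, "rt": 5, "unused": 11}  # 32 in total
ADD_OPCODE = 0b010110  # opcode bits from the encoding table


def encode_r(opcode, rd, rs, rt):
    """Pack opcode | rd | rs | rt into 32 bits; the low 11 bits stay 0."""
    return (opcode << 26) | (rd << 21) | (rs << 16) | (rt << 11)


word = encode_r(ADD_OPCODE, 3, 2, 1)  # add r3,r2,r1
bits = f"{word:032b}"
print(bits[:6], bits[6:11], bits[11:16], bits[16:21], bits[21:])
# 010110 00011 00010 00001 00000000000
```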

The Instruction Register

Code memory -> IR:   Add r2,r1,r3   (add 2 1 3)   010110 00010 00001 00011 unused

Loaded with the instruction, the IR decodes it into bits which drive the CPU’s digital logic circuits.

Electronic Wires: the Control Path

add r2, r1, r3    001010 00010 00001 00011 unused
sub r2, r1, r3    000101 00010 00001 00011 unused

[Figure: r1 and r3 feed the ALU in both cases; the add opcode selects + (‘Add!’) and the sub opcode selects - (‘Subtract!’)]

The add instruction is decoded and produces digital signals which select the + function in the ALU.

The sub instruction, when decoded, produces different digital signals, which select the - function.
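A Python sketch of the control path, assuming the decoded opcode simply selects an ALU function from a lookup table. The opcodes 001010 (add) and 000101 (sub) are the ones on the control-path slide; the register values r1 = 7 and r3 = 4 are hypothetical. Real control logic is gates and wires, not a dictionary, so this models the behaviour only.

```python
import operator

# Opcode bits from the control-path slide -> the ALU function they select.
ALU_SELECT = {
    0b001010: operator.add,  # add -> "+" function
    0b000101: operator.sub,  # sub -> "-" function
}


def execute(word, regs):
    """Decode a 32-bit word and let its opcode choose the ALU function."""
    opcode = word >> 26
    rd, rs, rt = (word >> 21) & 31, (word >> 16) & 31, (word >> 11) & 31
    alu_function = ALU_SELECT[opcode]
    regs[rd] = alu_function(regs[rs], regs[rt])


regs = [0, 7, 0, 4]  # hypothetical: r1 = 7, r3 = 4
add_word = int("001010" "00010" "00001" "00011" + "0" * 11, 2)
sub_word = int("000101" "00010" "00001" "00011" + "0" * 11, 2)
execute(add_word, regs)
print(regs[2])  # 11
execute(sub_word, regs)
print(regs[2])  # 3
```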

Sam and MIPS are 32 bit

All three instruction formats are 32 bits wide:

add rd,rs,rt     opcode | rd | rs | rt | unused      e.g. 001010 00110 01001 00011 unused
ldr rd,[rs+c]    opcode | rd | rs | 16-bit address   e.g. 001010 00010 00001 0101001111111011
ldr rd,[c]       opcode | 26-bit address             e.g. 001010 101001111110010101011011111
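A Python sketch that checks each field layout adds up to 32 bits and cuts a word into its fields, most significant first. It decodes the 16-bit-address example 001010 00010 00001 0101001111111011 as ldr r2,[r1+c]. The `FORMATS` table and `split_fields` name are ours.

```python
FORMATS = {
    "add rd,rs,rt":  [("opcode", 6), ("rd", 5), ("rs", 5), ("rt", 5), ("unused", 11)],
    "ldr rd,[rs+c]": [("opcode", 6), ("rd", 5), ("rs", 5), ("c", 16)],
    "ldr rd,[c]":    [("opcode", 6), ("address", 26)],
}


def split_fields(word, layout):
    """Cut a 32-bit word into named fields, most significant field first."""
    fields, shift = {}, 32
    for name, width in layout:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


for name, layout in FORMATS.items():
    assert sum(width for _, width in layout) == 32, name  # all 32 bits wide

word = int("001010" "00010" "00001" "0101001111111011", 2)
print(split_fields(word, FORMATS["ldr rd,[rs+c]"]))
# {'opcode': 10, 'rd': 2, 'rs': 1, 'c': 21499}
```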

Other Arithmetic Instructions

sub rd rs rt unused      rd <- rs - rt

Field        opcode   rd (destination)   rs (source)   rt (source)   unused
Nr of bits   6        5                  5             5             11

The same coding applies to other arithmetic instructions, e.g.

sub r3,r2,r1
and r2,r1,r0
or r5,r1,r2
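A Python sketch of the ‘same coding’ idea: every instruction has the form op rd,rs,rt, and only the operation applied to rs and rt changes. The `run` helper, the operation table and the starting register values are hypothetical.

```python
import operator

# Only the operation differs; the rd <- rs OP rt shape is shared.
OPS = {"add": operator.add, "sub": operator.sub, "and": operator.and_, "or": operator.or_}


def run(instr, regs):
    """Execute one 'op rd,rs,rt' instruction written as a string."""
    op, operands = instr.split(maxsplit=1)
    rd, rs, rt = (int(r.strip().lstrip("r")) for r in operands.split(","))
    regs[rd] = OPS[op](regs[rs], regs[rt])


regs = [12, 10, 15, 0, 0, 0]  # hypothetical: r0 = 0b1100, r1 = 0b1010, r2 = 15
for instr in ("sub r3,r2,r1", "and r2,r1,r0", "or r5,r1,r2"):
    run(instr, regs)
print(regs)  # [12, 10, 8, 5, 0, 10]
```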

A simple ‘Load’ instruction

‘Load into rd the contents of memory at the address which is in register rs.’ Simple!

ld | rd (destination) | rs (single source reg)

ldr r9 , [r1]

[Figure: memory in which address 3 holds 145]

1. Let’s say we have already loaded r1 with 3.
2. Get the data from memory at address r1 (= 3), which is 145.
3. Load the data into r9.

A more complex ‘Load’

ldr | rd (destination) | rs (source) | constant c

Load register rd with the contents of memory which you find at address rs + c.

ldr r9 , [r1 + 2]

The memory address is formed as a sum: r1 + 2 = 3 + 2 = 5. Address 5 holds 231, so r9 <- 231.

[Figure: memory in which address 5 holds 231]

… and a ‘Store’ instruction

str | rd (destination) | rs (source) | constant c

str r9 , [ r1 + 2 ]

Note that here the data moves from the register in the ‘destination’ field into memory. Confusing? Mm.

1. Get the data from r9 (196); the address is r1 + 2 = 3 + 2 = 5.
2. Write it to memory at address 5, which now holds 196.

What’s this? ‘Load Immediate’

ldi | rd (destination) | constant C

In load immediate, the constant C immediately following the opcode goes into the register.

ldi r9 , 5

Load ‘5’ straight into r9. All reference to memory has gone!

A Summary So Far …

Example            Form
add r3,r1,r1       add rd,rs,rt
str r6,[r1 + 1]    str rd,[rs + c]
str r0,[r1]        str rd,[rs]
ldr r2,[r3 + 4]    ldr rd,[rs + c]
ldr r2,[4]         ldr rd,[c]
ldi r0,3           ldi rd,C
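The forms in the summary can be tried out with a Python sketch over a small data memory. The values 145 (address 3), 231 (address 5) and 196 come from the load and store slides; the function names are hypothetical, and `str_` carries an underscore only to avoid shadowing Python’s built-in `str`.

```python
mem = [0, 0, 0, 145, 0, 231, 0, 0]  # 8-word data memory
regs = [0] * 10


def ldr(rd, rs=None, c=0):
    """ldr rd,[rs + c]; with rs=None this is ldr rd,[c] (address c alone)."""
    base = regs[rs] if rs is not None else 0
    regs[rd] = mem[base + c]


def str_(rd, rs, c=0):
    """str rd,[rs + c]: the data moves FROM register rd INTO memory."""
    mem[regs[rs] + c] = regs[rd]


def ldi(rd, constant):
    """ldi rd,C: the constant goes straight into rd; no memory access."""
    regs[rd] = constant


ldi(1, 3)        # ldi r1,3
ldr(9, 1)        # ldr r9,[r1]      r9 <- mem[3]
ldr(8, 1, 2)     # ldr r8,[r1 + 2]  r8 <- mem[5]
ldi(7, 196)      # ldi r7,196
str_(7, 1, 2)    # str r7,[r1 + 2]  mem[5] <- 196
ldr(6, c=5)      # ldr r6,[5]       r6 <- mem[5]
print(regs[9], regs[8], regs[6])  # 145 231 196
```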

Now it’s time to move on and look in detail at the hierarchy of computer languages, to see their influence on the ISA.

Assembling a Spreadsheet

[Figure: the hierarchy, top to bottom: Excel Application, HLL Implementation, ISA Assembler, Electronics]

HLL implementation (C):

int main() {
    int f,g,h;
    f = g + h;
}

ISA assembler:

ld r0, [ g ]
ld r1, [ h ]
add r2,r0,r1
st r2, [ f ]

The Great Idea here is that the ISA we need at the bottom must serve the grand master at the top, the Application. The ISA must support the HLL implementation.

Arrays (= Tables)

How do we sum the array of numbers in column B?

1. We would use the instruction ld r1,[r0 + B], where B = 3 is the start address of the array.
2. Then we load r0 with 0, then 1, then 2, … to scan down the array.

Ld r0 , 0
Ld r3 , 0
Ld r1, [r0 + 3]      r0 (= 0) + 3 = 3

Arrays (= Tables)

How do we sum the array of numbers in column B?

Inc r0
Ld r1, [r0 + 3]
add r3,r3,r1         Get the next cell, load its value and add it to the sum, in r3

1. Increment r0 to get the next data value: inc r0 (0 + 1 = 1).
2. ld r1,[r0 + B], where B = 3 is the start address of the array, but now r0 contains 1.
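The array scan can be sketched in Python with one variable per Sam register. The array contents (4, 7, 1, 5 at addresses 3 to 6) are hypothetical, and the `while` test stands in for a loop-control branch, which the slides have not introduced at this point.

```python
B = 3                          # start address of the array
mem = [0, 0, 0, 4, 7, 1, 5]    # hypothetical values at addresses 3..6
n = len(mem) - B               # number of cells to scan

r0 = 0          # Ld r0,0 : index into the array
r3 = 0          # Ld r3,0 : running sum
while r0 < n:
    r1 = mem[r0 + B]   # Ld r1,[r0 + 3] : fetch the current cell
    r3 = r3 + r1       # add r3,r3,r1   : add it to the sum in r3
    r0 = r0 + 1        # Inc r0         : move down to the next cell
print(r3)  # 17
```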

Making Decisions

Let’s say we want to add 2 to a number B if another number C is equal to 10.

You mean, ‘If C = 10, then add 2 to B’?

Yep. Here’s how we would do it in C:

if(c == 10) b = b + 2;

What about Sam? With c in r2 and b in r3:

ldi r1,10        First load the test number 10
bne r2,r1,36     Branch if r1 and r2 are not equal, to address 36: branch around the add
addi r3,r3,2

[Figure: the instructions stored at code addresses 16, 20, 24, 28, 32 and 36]
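A tiny Python interpreter sketch for the branch-around-the-add idea. Only the branch target 36 comes from the slide; placing the three instructions at addresses 24, 28 and 32 is an assumption, as are the register values used in the two runs.

```python
def run(program, regs):
    """Tiny interpreter: program maps byte address -> instruction tuple."""
    pc = min(program)
    while pc in program:
        op, *args = program[pc]
        pc += 4                      # default: fall through to the next word
        if op == "ldi":
            rd, c = args
            regs[rd] = c
        elif op == "addi":
            rd, rs, c = args
            regs[rd] = regs[rs] + c
        elif op == "bne":
            rs, rt, target = args
            if regs[rs] != regs[rt]:
                pc = target          # branch around the add
    return regs


# if (c == 10) b = b + 2;  with c in r2 and b in r3.
program = {
    24: ("ldi", 1, 10),       # first load the test number 10
    28: ("bne", 2, 1, 36),    # branch if r2 != r1 to address 36
    32: ("addi", 3, 3, 2),    # b = b + 2
}
print(run(program, {1: 0, 2: 10, 3: 5})[3])  # 7: c == 10, so 2 is added
print(run(program, {1: 0, 2: 4, 3: 5})[3])   # 5: c != 10, the add is skipped
```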

Loops

Let’s say we want to make the sequence 0, 3, 6, 9, 12 and stop.

Step (r0)   0   1   2   3   4
Sum (r2)    0   3   6   9   12

We take 4 steps and at each step add 3: x = x + 3.
So we need a register to keep track of the number of steps (r0), and a register to hold the sum at each step (r2).

0    ldi r0 , 0
4    ldi r1 , 4
8    ldi r2 , 0
12   addi r2 , r2 , 3
16   addi r0 , r0 , 1
20   bne r0,r1,12        Branch back unless r0 = r1 = 4
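A Python sketch that steps through the loop at the code addresses 0 to 20 and records the running value r2 each time control reaches address 12 (the loop body), plus the final value. The recording logic is ours, added only to show the sequence.

```python
program = {
    0: ("ldi", 0, 0),        # r0 = step counter
    4: ("ldi", 1, 4),        # r1 = number of steps to take
    8: ("ldi", 2, 0),        # r2 = running value x
    12: ("addi", 2, 2, 3),   # x = x + 3
    16: ("addi", 0, 0, 1),   # one more step done
    20: ("bne", 0, 1, 12),   # branch back unless r0 = r1 = 4
}

regs = [0] * 3
sequence = []
pc = 0
while pc in program:
    op, *args = program[pc]
    pc += 4
    if op == "ldi":
        regs[args[0]] = args[1]
    elif op == "addi":
        regs[args[0]] = regs[args[1]] + args[2]
    elif op == "bne" and regs[args[0]] != regs[args[1]]:
        pc = args[2]
    if pc == 12:               # about to run the loop body: record x
        sequence.append(regs[2])
sequence.append(regs[2])       # final value once the loop exits
print(sequence)  # [0, 3, 6, 9, 12]
```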

CBP 2005 Comp3070 Computer Architecture


Some x86 instructions

mov ax , [bx + c]
mov [ax] , bx
add ax , bx
add [bx] , ax

These look rather like Sam’s RISC ops. But add [bx] , ax is not: here the contents of ax are being added straight into memory! The x86 is a register-memory ISA and Sam is a register-register ISA.

Sam:

ld r1 , [a]
ld r2 , [b]
add r3,r1,r2
st r3 , [b]

Intel x86:

mov ax, a
add b,ax

Let’s compare the RR and RM ISAs. Clearly the RR code needs more instructions, and so more code memory, while the RM code uses stronger operations.


Intel Instruction Format: IA-32 Format


Variable Length Instructions

All Sam’s instructions had the same length, 32 bits. This is also true for other RISC ISAs such as SPARC and MIPS. Compare this with the x86, whose instructions vary from 1 to 15 bytes. Here are some statistics.

[Chart: frequency of use (0% to 30%) against instruction length in bytes (1 to 10) for the Espresso, Gcc, Spice and Nasa programs]

Clearly, long complex instructions are used infrequently, but their use does depend on the application.


Instruction Timing

T1      T2               T3       T4           T5
Fetch   Decode, Reg Op   ALU Op   Mem Access   Reg Write

Each step takes one clock cycle, so all Sam’s instructions take 5 clock cycles.

• A 1 gigahertz SPARC runs 1 giga clock cycles in 1 second
• That’s 10^9 cycles
• That’s 1,000,000,000 cycles
• At 5 cycles per instruction, that’s 200,000,000 add ops!
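The arithmetic in the last bullet, as a short Python check, assuming one instruction finishes before the next one starts (no overlap between instructions).

```python
clock_hz = 1_000_000_000      # 1 GHz: 10**9 clock cycles per second
cycles_per_instruction = 5    # every Sam instruction takes 5 clock cycles
adds_per_second = clock_hz // cycles_per_instruction
print(f"{adds_per_second:,}")  # 200,000,000
```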


Variable Time Instructions

Here’s a timing diagram for an Intel add:

add ax , [bx + c]      ax = ax + mem[bx + c]

T1      T2               T3    T4           T5
Fetch   Decode, Reg Op   ALU   Mem Access   Reg Write
Fetch   Decode, Reg Op   ALU   Mem Access   Reg Write

We need two adds: the first to get the address bx + c summed up … and the second to actually add memory to register ax.


Potent x86 Instructions

mov x,2        Immediate to memory        6
xlat x         Translate al via table     1
imul x         Multiply memory with ax    4
inc x          Increment memory by 1      4
repne scasb    Scan string for match!     various

1. Application
2. High-Level Language (‘C’): strcmp(str, "Greenspan");
3. Intel ISA code


Top 10 Intel x86 Instructions

Rank   Instruction          Usage
1      load                 22%
2      conditional branch   20%
3      arithmetic / logic   19%
4      compare              16%
5      store                12%
6      move reg - reg       4%
7      call - return        2%

We see that most instructions are simple: load, store, calculate, branch. None of Intel’s potent stuff figures here. So why did Intel design instructions no-one uses?


ISA R&D into the 80’s

1980 Berkeley, Patterson: RISC (SPARC)
1981 Stanford, Hennessy: MIPS
- Easy to decode ops
- Fast issue rate
- Only load and store reference memory
- Lots of registers

Emerging Design Guidelines

Let’s downshift and make things simpler …
• Use simple instructions: load, store, add
• Many of these will do one x86 potent op
• Need more memory, but memory is cheap
• More CPU cycles, but can still be faster


Intel Architecture

Looks great from the outside … but is a golden mishmash with a history of add-ons


RISC Architecture

Minimalist, functional


Summary … so far

RISC (SPARC, MIPS)
• Minimalist, something like Zen
• All instructions the same length in memory
• Small number of instructions
• Small number of addressing modes
• Simple instructions
• 5 clock cycles

CISC (Intel)
• Different lengths in memory
• Large number of instructions
• Huge number of addressing modes
• Complex instructions
• Variable number of clock cycles


Today the consequences of …

Intel (CISC) MIPS (RISC)


Laundry Model

Washer Drier Store Basket Wardrobe


Process Steps

A. Wash then Dry

[Chart: washer and drier, each idle or running, against time from 9.00 to 11.00]

1. Load the washer at 9.00
2. Done at 10, load the drier
3. Drier done at 11


Sequential Process

3 loads take 6 hours

[Chart: loads against time from 9.00 to 15.00]

1. Load washer at 9.00
2. Done at 10, load drier
3. Drier done at 11
4. Reload washer at 11
5. Done at 12, load drier
6. Drier done at 13
7. Reload washer at 13
8. Done at 14, load drier
9. Done at 15


Overlapping Process

3 loads take 4 hours

[Chart: loads against time from 9.00 to 13.00]

1. Load washer at 9.00
2. Done at 10, load drier, reload washer
3. Both done at 11. Reload drier, reload washer
4. Both done at 12. Reload drier
5. Drier done at 13

From 10.00 till 11.00 both washer and drier are running concurrently.


Washing Pipeline Filling

[Chart: loads against time from 9.00 to 18.00]

5 loads in 9 hours

5 cycles!
1. Get washing
2. Wash
3. Dry
4. Store
5. Put away
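The laundry timings follow a simple formula, sketched here in Python under the assumption that every stage takes the same time (one hour). Done one load at a time, n loads through k stages take n × k steps; overlapped, the first load takes k steps and each later load finishes one step after the previous one, so the total is k + n - 1.

```python
def sequential_hours(loads, stages, hours_per_stage=1):
    # Each load runs through every stage before the next load starts.
    return loads * stages * hours_per_stage


def pipelined_hours(loads, stages, hours_per_stage=1):
    # The first load needs `stages` steps; each later load adds one more.
    return (stages + loads - 1) * hours_per_stage


print(sequential_hours(3, 2), pipelined_hours(3, 2))  # 6 4   (wash, dry)
print(sequential_hours(5, 5), pipelined_hours(5, 5))  # 25 9  (five stages)
```

The same formula applies to Sam: with five one-cycle stages, n instructions overlapped need 5 + n - 1 cycles instead of 5n.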


Can we Pipeline Sam?

[Figure: Sam’s datapath (code memory, instruction register, register file r0 to r2 with ports X, Y and W, ALU, data memory with mar and mdr) divided into the stages 1. Fetch, 2. Dec/Reg, 3. ALU, 4. Mem, 5. RW]


Pipelined Sam4

[Figure: Sam’s datapath with a buffer between each pair of stages, 1. Fetch, 2. Dec/Reg, 3. ALU, 4. Mem, 5. RW, drawn against time]


5 Stages in Pipeline

Let’s take the instruction add r3,r1,r2 and show which stage is needed for each part of the instruction.

1. Fetch     Mem    fetch add
2. Dec/Reg   Reg    read r1,r2
3. ALU       ALU    add
4. Mem       Mem    (not used)
5. RW        Reg    write r3


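Two Instructions

Two instructions into the pipeline, the second starting one clock cycle after the first:

ld r3,[r0+2]    Mem (fetch ld)    Reg (r0)      ALU (r0 + 2)   Mem (read)   Reg (r3)
add r4,r1,r2    Mem (fetch add)   Reg (r1,r2)   ALU (add)      Mem          Reg (r4)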


Structural Hazard

[Figure: five instructions, including st r0,[5] and add r4,r1,r2, overlapped in the pipeline Mem | Reg | ALU | Mem | Reg; in one clock cycle the store writes to memory while a later instruction is fetched from the same memory]

Here we are being asked to read from memory (fetch) and write to it (store) simultaneously. Impossible!

Solution: use separate code and data memories.


Hazardous Washing

[Chart: loads against time from 9.00 to 18.00]

The washing basket contains both clean and dirty washing!


Code and Data Memories

[Figure: the same five overlapped instructions, now with fetches using the code memory and loads and stores using the data memory]


Data Hazard

add r3,r1,r2: r3 is only set here, in the final Reg (write) stage.
add r4,r1,r3: but we need r3 here, in its Reg (read) stage, EARLIER!


Data Hazard

[Figure: add r3,r1,r2 followed by add r4,r1,r3, overlapped in the pipeline Mem | Reg | ALU | Mem | Reg]

We need the value of r3 for the second instruction before the first is complete.
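A Python sketch of how such a hazard could be detected, assuming the three-register form op rd,rs,rt and a window of 2: a later instruction that reads a register written by one of the two instructions just before it would see the old value. The window of 2 is an assumption that matches the two nops on the compiler slide; the function names are hypothetical.

```python
def parse(instr):
    """'add r3,r1,r2' -> ('add', 3, [1, 2]): destination and source registers."""
    op, operands = instr.split(maxsplit=1)
    regs = [int(r.strip().lstrip("r")) for r in operands.split(",")]
    return op, regs[0], regs[1:]


def data_hazards(program, window=2):
    """List (earlier, later, reg) where `later` reads the register `earlier`
    writes and starts at most `window` instructions after it."""
    found = []
    for i, first in enumerate(program):
        _, dest, _ = parse(first)
        for second in program[i + 1 : i + 1 + window]:
            _, _, sources = parse(second)
            if dest in sources:
                found.append((first, second, f"r{dest}"))
    return found


print(data_hazards(["add r3,r1,r2", "add r4,r1,r3"]))
# [('add r3,r1,r2', 'add r4,r1,r3', 'r3')]
```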


Pipeline Stalls

[Figure: add r3,r1,r2 followed by add r4,r1,r3; after its fetch (Mem), the second instruction waits two stall cycles before its Reg stage]

Resolve the hazard: insert a delay into the second instruction stream (‘stall’ cycles).

But this needs extra electronics on the chip: complex and costly.


Forwarding

[Figure: add r3,r1,r2 followed by add r4,r1,r3 in the pipeline, with the ALU result passed straight on to the second instruction]

We need the value of r3 for the second instruction before the first is complete.

So build in extra circuits to get the data as soon as it is available from the ALU.


Compiler resolves Hazard

[Figure: the pipeline with two nops placed between the two instructions]

add r3,r1,r2
nop
nop
add r4,r1,r3

The compiler can detect a possible hazard and insert 2 nops (‘no ops’).
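A Python sketch of the compiler’s fix, assuming the three-register form op rd,rs,rt and a gap of 2 instructions (the two nops shown above). It looks back over the instructions already emitted and pads with just enough nops that no writer of a needed register is within the gap. The function names are hypothetical.

```python
def dest_and_sources(instr):
    """Return (dest, sources) for 'op rd,rs,rt'; a nop has neither."""
    if instr == "nop":
        return None, []
    _, operands = instr.split(maxsplit=1)
    regs = [r.strip() for r in operands.split(",")]
    return regs[0], regs[1:]


def insert_nops(program, gap=2):
    """Pad with nops so no instruction reads a register written by any of
    the `gap` instructions just before it."""
    out = []
    for instr in program:
        _, sources = dest_and_sources(instr)
        # Check the nearest earlier instruction first: it needs the most nops.
        for distance in range(1, gap + 1):
            if distance > len(out):
                break
            dest, _ = dest_and_sources(out[-distance])
            if dest in sources:
                out.extend(["nop"] * (gap - distance + 1))
                break
        out.append(instr)
    return out


print(insert_nops(["add r3,r1,r2", "add r4,r1,r3"]))
# ['add r3,r1,r2', 'nop', 'nop', 'add r4,r1,r3']
```

If an independent instruction already sits between the two, only one nop is needed; a smarter compiler would try to move useful instructions into these slots before falling back on nops.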


Example

Instruction     op code   regs     alu   mem   reg write
ld r1,[7]       ld                       [7]   r1
ld r2,[8]       ld                       [8]   r2
add r3,r1,r2    add       r1, r2               r3