Fundamentals of Computer Systems - Columbia University · 2012-09-26 · Execution Time for Our...

Fundamentals of Computer Systems A Multicycle MIPS Processor Stephen A. Edwards and Martha A. Kim Columbia University Fall 2012 Illustrations Copyright 2007 Elsevier

Fundamentals of Computer Systems
A Multicycle MIPS Processor

Stephen A. Edwards and Martha A. Kim

Columbia University

Fall 2012

Illustrations Copyright © 2007 Elsevier

State Elements

[Figure: the three state elements, all clocked by CLK: the program counter (input PC', output PC, enable EN), the combined instruction/data memory (ports A, RD, WD, WE), and the register file (ports A1, A2, A3, WD3, WE3, RD1, RD2).]

Multicycle Datapath: Fetch instruction from memory

[Figure: the PC drives memory address port A; the word read on RD is latched into the instruction register (Instr) when IRWrite is asserted.]

Multicycle Datapath: Read source operands from register file

[Figure: Instr[25:21] drives register file port A1; the value read on RD1 is latched into register A.]

Multicycle Datapath: Sign-extend the immediate

[Figure: Instr[15:0] passes through the Sign Extend unit to produce SignImm.]
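The sign-extension step can be sketched in a few lines of Python. This is only an illustration of what the Sign Extend block computes; the function name `sign_extend16` is hypothetical and does not come from the slides.

```python
def sign_extend16(imm16: int) -> int:
    """Sign-extend a 16-bit immediate (Instr[15:0]) to a 32-bit word."""
    imm16 &= 0xFFFF                 # keep only the low 16 bits
    if imm16 & 0x8000:              # bit 15 set: the immediate is negative
        return imm16 | 0xFFFF0000   # copy the sign bit into bits 31:16
    return imm16                    # non-negative: upper bits stay zero


sign_imm = sign_extend16(0xFFFC)    # 16-bit encoding of -4
```

For a non-negative immediate such as 0x0004 the function returns the value unchanged, since bits 31:16 are simply zero.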

Multicycle Datapath: Add base address to offset

[Figure: the ALU, controlled by ALUControl[2:0], adds SrcA (register A) and SrcB (SignImm); ALUResult is latched into the ALUOut register.]

Multicycle Datapath: Load data from memory

[Figure: a multiplexer controlled by IorD selects the memory address Adr from PC (input 0) or ALUOut (input 1); the word read from memory is latched into the Data register.]

Multicycle Datapath: Write data back to register file

[Figure: the Data register drives WD3, Instr[20:16] drives A3, and RegWrite drives WE3.]

Multicycle Datapath: Add 4 to PC

[Figure: ALUSrcA selects SrcA from PC (input 0) or register A (input 1); ALUSrcB[1:0] selects SrcB, where input 01 is the constant 4 and input 10 is SignImm; ALUResult feeds PC', which is loaded into the PC when PCWrite is asserted.]

Multicycle Datapath: For sw: Write register data to memory

[Figure: Instr[20:16] drives register file port A2; the value read on RD2 is latched into register B, which feeds the memory write-data port WD; MemWrite drives the memory write enable WE.]

Multicycle Datapath: For R-type instructions: Write ALU result to registers

[Figure: RegDst selects A3 from Instr[20:16] (input 0) or Instr[15:11] (input 1); MemtoReg selects WD3 from ALUOut (input 0) or Data (input 1); register B feeds input 00 of the ALUSrcB multiplexer.]

Multicycle Datapath: For beq: Add immediate to PC

[Figure: SignImm shifted left by 2 feeds input 11 of the ALUSrcB multiplexer; PCSrc selects PC' from ALUResult (input 0) or ALUOut (input 1); Branch and the ALU's Zero output combine with PCWrite to form the PC enable, PCEn.]
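The branch logic added on this slide can be sketched as follows. The gate structure (PCEn as PCWrite OR (Branch AND Zero)) is an assumption inferred from the signal names in the figure, and the function names are hypothetical.

```python
def pc_enable(pc_write: bool, branch: bool, zero: bool) -> bool:
    """Assumed PC enable: unconditional write, or a branch whose comparison gave Zero."""
    return pc_write or (branch and zero)


def branch_target(pc_plus_4: int, sign_imm: int) -> int:
    """Branch target: the already-incremented PC plus the word offset SignImm << 2."""
    return (pc_plus_4 + (sign_imm << 2)) & 0xFFFFFFFF  # wrap to 32 bits


# A taken branch with offset -4 words from an incremented PC of 0x1004.
target = branch_target(0x1004, 0xFFFFFFFC)
taken = pc_enable(pc_write=False, branch=True, zero=True)
```

The target is computed relative to the incremented PC because, in this datapath, the PC has already been advanced by 4 during Fetch.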

Multicycle Datapath: Add Controller

[Figure: the complete datapath with a Control Unit. The unit takes Op = Instr[31:26] and Funct = Instr[5:0] and drives IorD, MemWrite, IRWrite, RegDst, MemtoReg, RegWrite, ALUSrcA, ALUSrcB[1:0], ALUControl[2:0], PCSrc, Branch, and PCWrite.]
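The bit ranges labelled on the datapath (31:26, 25:21, 20:16, 15:11, 15:0, 5:0) can be made concrete with a small field extractor. This is a minimal sketch; the keys `rs`, `rt`, and `rd` are the conventional MIPS field names rather than labels used on the slides, and `decode_fields` is a hypothetical helper.

```python
def decode_fields(instr: int) -> dict:
    """Split a 32-bit MIPS instruction word into the fields the datapath uses."""
    return {
        "op":    (instr >> 26) & 0x3F,  # Instr[31:26], to the control unit
        "rs":    (instr >> 21) & 0x1F,  # Instr[25:21], register file A1
        "rt":    (instr >> 16) & 0x1F,  # Instr[20:16], register file A2 (or A3)
        "rd":    (instr >> 11) & 0x1F,  # Instr[15:11], A3 for R-type
        "imm":   instr & 0xFFFF,        # Instr[15:0], to the sign extender
        "funct": instr & 0x3F,          # Instr[5:0], to the control unit
    }


# 0x02328020 encodes add $s0, $s1, $s2 (registers 16, 17, 18).
fields = decode_fields(0x02328020)
```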

Controller Internals

[Figure: the Control Unit consists of a Main Controller (FSM) and an ALU Decoder.]

Main Controller (FSM): input Opcode[5:0]; outputs ALUOp[1:0] and
  Multiplexer Selects: MemtoReg, RegDst, IorD, PCSrc, ALUSrcB[1:0], ALUSrcA
  Register Enables: IRWrite, MemWrite, PCWrite, Branch, RegWrite

ALU Decoder: inputs ALUOp[1:0] and Funct[5:0]; output ALUControl[2:0]

Controller Behavior: Fetch

S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite

[Figure: the datapath annotated with the control-signal values driven during Fetch.]

Controller Behavior: Decode

S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode (follows S0): no enables asserted; the multiplexer selects are don't-cares (X)

[Figure: the datapath annotated with the control-signal values driven during Decode.]

Controller Behavior: Memory Address

New state (S0: Fetch and S1: Decode are unchanged):
S2: MemAdr (from S1 when Op = LW or Op = SW): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00

[Figure: the datapath annotated with the control-signal values driven during MemAdr.]

Controller Behavior

New states (S0: Fetch, S1: Decode, and S2: MemAdr are unchanged):
S3: MemRead (from S2 when Op = LW): IorD = 1
S4: MemWriteback (follows S3): RegDst = 0, MemtoReg = 1, RegWrite

Controller Behavior

New state (S0 through S4 are unchanged):
S5: MemWrite (from S2 when Op = SW): IorD = 1, MemWrite

Controller Behavior

New states (S0 through S5 are unchanged):
S6: Execute (from S1 when Op = R-type): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
S7: ALUWriteback (follows S6): RegDst = 1, MemtoReg = 0, RegWrite

Controller Behavior

Changed and new states (S0 and S2 through S7 are unchanged):
S1: Decode: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
S8: Branch (from S1 when Op = BEQ): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 1, Branch

Controller Behavior

New states (S0 through S8 are unchanged):
S9: ADDIExecute (from S1 when Op = ADDI): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S10: ADDIWriteback (follows S9): RegDst = 0, MemtoReg = 0, RegWrite

Controller Behavior: Additional circuitry for the jump instruction

[Figure: PCSrc widens to two bits, PCSrc[1:0]. Input 10 of the PC' multiplexer is PCJump, formed by shifting Instr[25:0] left by 2 to give bits 27:0 and taking bits 31:28 from the PC.]
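The jump address formation shown in the figure can be sketched as below. The function name `jump_target` is hypothetical; the bit ranges are the ones labelled on the slide.

```python
def jump_target(pc: int, instr: int) -> int:
    """Form PCJump from PC[31:28] and Instr[25:0] shifted left by 2."""
    addr26 = instr & 0x03FFFFFF        # Instr[25:0], the jump field
    low28 = addr26 << 2                # becomes bits 27:0 (word aligned)
    return (pc & 0xF0000000) | low28   # bits 31:28 come from the PC


# A jump word with address field 0x10, executed with PC = 0x40000004.
pc_jump = jump_target(0x40000004, 0x08000010)
```

Because only the low 28 bits come from the instruction, a jump can reach any word within the current 256 MB region selected by PC[31:28].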

Controller Behavior

Complete state machine, with PCSrc now two bits wide:
S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 00, IRWrite, PCWrite
S1: Decode: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
S2: MemAdr (Op = LW or Op = SW): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S3: MemRead (Op = LW): IorD = 1
S4: MemWriteback: RegDst = 0, MemtoReg = 1, RegWrite
S5: MemWrite (Op = SW): IorD = 1, MemWrite
S6: Execute (Op = R-type): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
S7: ALUWriteback: RegDst = 1, MemtoReg = 0, RegWrite
S8: Branch (Op = BEQ): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 01, Branch
S9: ADDIExecute (Op = ADDI): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S10: ADDIWriteback: RegDst = 0, MemtoReg = 0, RegWrite
S11: Jump (Op = J): PCSrc = 10, PCWrite
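The state transitions can be sketched as a next-state function, which also gives the number of cycles each instruction class takes. This is a minimal sketch: the return arcs to S0 are assumed from the diagram rather than spelled out in the text, and the names `next_state` and `cycles` are hypothetical.

```python
# First state after Decode, chosen by opcode class.
AFTER_DECODE = {"LW": "S2", "SW": "S2", "R-type": "S6",
                "BEQ": "S8", "ADDI": "S9", "J": "S11"}

# States with a single successor that is not Fetch.
FIXED_NEXT = {"S0": "S1", "S3": "S4", "S6": "S7", "S9": "S10"}


def next_state(state: str, op: str) -> str:
    """Next FSM state; final states are assumed to return to S0 (Fetch)."""
    if state == "S1":
        return AFTER_DECODE[op]
    if state == "S2":
        return "S3" if op == "LW" else "S5"
    return FIXED_NEXT.get(state, "S0")


def cycles(op: str) -> int:
    """Count the states visited from Fetch until the FSM returns to Fetch."""
    state, count = "S0", 0
    while True:
        count += 1
        state = next_state(state, op)
        if state == "S0":
            return count


cycle_counts = {op: cycles(op) for op in AFTER_DECODE}
```

Under these assumptions a load visits five states, stores, R-type instructions, and addi visit four, and branches and jumps visit three.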

Multicycle Critical Path

[Figure: the complete multicycle datapath and control unit.]

Two hypotheses for the critical path: reading memory, or going through the ALU.

Multicycle Clock Period

Element               Parameter    Delay (ps)
Register clk-to-Q     t_pcq-PC     30
Register setup        t_setup      20
Multiplexer           t_mux        25
ALU                   t_ALU        200
Memory read           t_mem        250
Register file read    t_RFread     150
Register file setup   t_RFsetup    20

[Figure: the complete multicycle datapath and control unit.]

\[
\begin{aligned}
T_C &= t_{pcq\text{-}PC} + t_{mux} + \max\{t_{ALU} + t_{mux},\ t_{mem}\} + t_{RFsetup} \\
    &= (30 + 25 + \max\{200 + 25,\ 250\} + 20)\ \text{ps} \\
    &= 325\ \text{ps}
\end{aligned}
\]

This corresponds to a clock frequency of 1/T_C ≈ 3.08 GHz,

vs. 925 ps for our single-cycle processor
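The clock-period arithmetic can be checked with a short script. The delays are the ones in the table above; the dictionary keys and the function name `cycle_time_ps` are hypothetical.

```python
# Element delays in picoseconds, taken from the table on this slide.
DELAYS_PS = {
    "t_pcq_PC": 30,
    "t_setup": 20,
    "t_mux": 25,
    "t_ALU": 200,
    "t_mem": 250,
    "t_RFread": 150,
    "t_RFsetup": 20,
}


def cycle_time_ps(d: dict) -> int:
    """Clock period: the slower of the ALU path and the memory path, plus overhead."""
    slowest = max(d["t_ALU"] + d["t_mux"], d["t_mem"])
    return d["t_pcq_PC"] + d["t_mux"] + slowest + d["t_RFsetup"]


tc_ps = cycle_time_ps(DELAYS_PS)
freq_ghz = 1000 / tc_ps            # 1 / (tc_ps * 1e-12 s), expressed in GHz
```

With these delays the memory path (250 ps) is slower than the ALU-plus-multiplexer path (225 ps), so memory access sets the clock period.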

Execution Time for Our Multi-Cycle Processor

For a 100 billion-instruction task on our multi-cycle processor, each instruction takes 4.12 cycles on average. With a 325 ps clock period,

\[
\begin{aligned}
\frac{\text{Seconds}}{\text{Program}}
  &= \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock Cycle}} \\
  &= 100 \times 10^{9} \times 4.12 \times 325\ \text{ps} \\
  &= 133.9\ \text{seconds}
\end{aligned}
\]

vs. 92.5 seconds for our single-cycle processor.
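The comparison can be reproduced numerically. The slides state only the 4.12 average, so the instruction mix below is an assumption chosen to show how such an average can arise from the per-class cycle counts; the function names are hypothetical.

```python
# Cycles per instruction class: the number of FSM states each class visits.
CYCLES = {"lw": 5, "sw": 4, "r_type": 4, "beq": 3, "j": 3}

# ASSUMPTION: illustrative instruction mix, not given in the slides.
MIX = {"lw": 0.25, "sw": 0.10, "r_type": 0.52, "beq": 0.11, "j": 0.02}


def average_cpi(cycles: dict, mix: dict) -> float:
    """Weighted average of cycles per instruction over the mix."""
    return sum(mix[k] * cycles[k] for k in mix)


def execution_time_s(instructions: float, cpi: float, clock_period_ps: float) -> float:
    """Seconds per program = instructions x CPI x seconds per cycle."""
    return instructions * cpi * clock_period_ps * 1e-12


cpi = average_cpi(CYCLES, MIX)
multi_cycle_s = execution_time_s(100e9, 4.12, 325)
single_cycle_s = execution_time_s(100e9, 1.0, 925)   # single-cycle: CPI = 1
```

The multi-cycle clock is almost three times faster, but at more than four cycles per instruction the total run time still comes out longer than on the single-cycle design.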

