Date post: 21-Jan-2016
Page 1: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

Chapter 7 <1>

MICROARCHITECTURE

Digital Design and Computer Architecture, 2nd Edition

Chapter 7

David Money Harris and Sarah L. Harris

Chapter 7 <2>

Chapter 7 :: Topics

• Introduction
• Performance Analysis
• Single-Cycle Processor
• Pipelined Processor
• Exceptions
• Advanced Microarchitecture

Chapter 7 <3>

Introduction

• Microarchitecture: the implementation of an architecture in hardware
• Processor:
  – Datapath: functional blocks
  – Control: control signals

Levels of abstraction (highest to lowest) and typical elements at each level:

  Application Software   programs
  Operating Systems      device drivers
  Architecture           instructions, registers
  Microarchitecture      datapaths, controllers
  Logic                  adders, memories
  Digital Circuits       AND gates, NOT gates
  Analog Circuits        amplifiers, filters
  Devices                transistors, diodes
  Physics                electrons

Chapter 7 <4>

Microarchitecture

• Multiple implementations for a single architecture:
  – Single-cycle: each instruction executes in a single cycle
  – Multicycle: each instruction is broken into a series of shorter steps and executes in n cycles, where n varies with the instruction
  – Pipelined: each instruction is broken into a series of steps, and multiple instructions execute at once

Note: AMD and Intel pipelines differ, even though both implement the same IA-32 architecture (a.k.a. the x86 ISA).

Chapter 7 <5>

Processor Performance

• Program execution time:
  Execution Time = (#instructions)(cycles/instruction)(seconds/cycle)
• Definitions:
  – IC: Instruction Count (= #instructions)
  – CPI: Cycles/Instruction
  – clock period: seconds/cycle
  – IPC: Instructions/Cycle (= 1/CPI)
• Challenge is to satisfy constraints of:
  – Cost
  – Power
  – Performance
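The execution-time formula can be tried out with a few lines of Python. This is only a sketch: the function names and the sample workload (1 billion instructions, CPI of 1.5, 1 ns clock period) are hypothetical and do not come from the slides.

```python
def execution_time(instruction_count, cpi, clock_period_s):
    """Execution time in seconds = IC x CPI x clock period."""
    return instruction_count * cpi * clock_period_s


def ipc(cpi):
    """Instructions per cycle is the reciprocal of CPI."""
    return 1.0 / cpi


# Hypothetical workload, chosen only for illustration.
t = execution_time(1_000_000_000, 1.5, 1e-9)
print(f"execution time: {t:.2f} s, IPC: {ipc(1.5):.3f}")
```

Lowering any one of the three factors shortens execution time, which is why microarchitectures trade CPI against clock period.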

Chapter 7 <6>

MIPS Processor

• Consider subset of MIPS instructions:
  – R-type instructions: and, or, add, sub, slt
  – Memory instructions: lw, sw
  – Branch instructions: beq

Chapter 7 <7>

Architectural State

• Determines everything about a processor:
  – PC and special registers
  – Register File
  – Memory

Chapter 7 <8>

MIPS State Elements

[Figure: the MIPS state elements: the 32-bit program counter (PC' in, PC out, clocked), the instruction memory (address A, read data RD), the register file (5-bit addresses A1, A2, A3; 32-bit write data WD3; write enable WE3; 32-bit read data RD1, RD2; clocked), and the data memory (address A, read data RD, write data WD, write enable WE; clocked).]

Plus the HI and LO registers

Chapter 7 <9>

Single-Cycle MIPS Processor

• Datapath: design it first, to make the instruction actions possible
• Control: design it second, to make them happen

Chapter 7 <10>

Single-Cycle Datapath: lw fetch

STEP 1: Fetch instruction: IM[PC]

[Figure: the state elements with PC driving the instruction memory address A; the read data RD is Instr.]

Chapter 7 <11>

Single-Cycle Datapath: lw Register Read

STEP 2: Read source operands from RF: RF[rs], i.e. RF[Instr(25:21)]

[Figure: Instr25:21 drives register file address A1.]

Chapter 7 <12>

Single-Cycle Datapath: lw Immediate

STEP 3: Sign-extend the immediate: SignExt(immed)

[Figure: Instr15:0 feeds the Sign Extend block, which produces SignImm.]

Chapter 7 <13>

Single-Cycle Datapath: lw address

STEP 4: Compute the memory address: addr = RF[rs] + SignExt(immed)

[Figure: the ALU adds SrcA and SrcB (SignImm) to produce ALUResult and Zero; ALUControl2:0 = 010.]

Chapter 7 <14>

Single-Cycle Datapath: lw Memory Read

STEP 5: Read data from memory and write it back to the register file: RF[rt] ← DM[addr]

[Figure: ALUResult addresses the data memory; ReadData returns to register file port WD3, Instr20:16 drives A3, RegWrite = 1, ALUControl2:0 = 010.]

Chapter 7 <15>

Single-Cycle Datapath: lw PC Increment

STEP 6: Determine address of next instruction: PC ← PC + 4

[Figure: an adder computes PCPlus4 = PC + 4, which feeds PC'; ReadData is labeled Result at the register file write port; RegWrite = 1, ALUControl2:0 = 010.]

Chapter 7 <16>

Full RTL Expression for lw

  IM[PC]
  RF[rt] ← DM[RF[rs] + SignExt(immed)]
  PC ← PC + 4
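The RTL for lw can be mirrored by a small behavioral model. This is a sketch under stated assumptions, not the hardware itself: the `state` dictionary, the helper names, and the use of a Python dict as a byte-addressed data memory are all hypothetical.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits of imm16 as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def execute_lw(state, rs, rt, imm16):
    """RF[rt] <- DM[RF[rs] + SignExt(immed)]; PC <- PC + 4."""
    addr = (state["rf"][rs] + sign_extend16(imm16)) & 0xFFFFFFFF
    if rt != 0:  # register $0 always reads as zero
        state["rf"][rt] = state["dm"].get(addr, 0)
    state["pc"] = (state["pc"] + 4) & 0xFFFFFFFF
    return state


# Hypothetical initial state: word 0x1234 stored at byte address 40.
state = {"pc": 0, "rf": [0] * 32, "dm": {40: 0x1234}}
execute_lw(state, rs=0, rt=18, imm16=40)  # lw $s2, 40($0); $s2 is register 18
print(hex(state["rf"][18]), state["pc"])
```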

Chapter 7 <17>

Single-Cycle Datapath: sw

Write data in rt to memory: DM[addr] ← RF[rt]

[Figure: Instr20:16 drives register file address A2; RD2 supplies WriteData to data memory port WD; MemWrite = 1, RegWrite = 0, ALUControl2:0 = 010.]

Chapter 7 <18>

Single-Cycle Datapath: R-Type

• Read from rs and rt
• Write ALUResult to register file
• Write to rd (instead of rt): RF[rd] ← RF[rs] op RF[rt]

[Figure: three multiplexers are added: ALUSrc selects SrcB from RD2 or SignImm, RegDst selects WriteReg4:0 from Instr20:16 or Instr15:11, and MemtoReg selects Result from ALUResult or ReadData. For R-type: RegWrite = 1, RegDst = 1, ALUSrc = 0, ALUControl2:0 varies, MemWrite = 0, MemtoReg = 0.]

Chapter 7 <19>

Single-Cycle Datapath: beq

• Determine whether values in rs and rt are equal
• Calculate branch target address: BTA = PC + 4 + (SignExt(immed) << 2)   # << 2 multiplies by 4

[Figure: SignImm is shifted left by 2 and added to PCPlus4 to form PCBranch; a multiplexer controlled by PCSrc (derived from Branch and Zero) selects PCPlus4 or PCBranch as PC'. For beq: RegWrite = 0, RegDst = x, ALUSrc = 0, ALUControl2:0 = 110, Branch = 1, MemWrite = 0, MemtoReg = x.]

Chapter 7 <20>

RTL Expression for beq

  IM[PC]
  if (RF[rs] - RF[rt] == 0) PC ← BTA
  else PC ← PC + 4
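A short Python sketch of the beq next-PC choice follows. The function names are hypothetical; the arithmetic follows the RTL above, with BTA = PC + 4 + (SignExt(immed) << 2) and 32-bit wraparound assumed.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits of imm16 as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def next_pc_beq(pc, rs_val, rt_val, imm16):
    """Return BTA when RF[rs] - RF[rt] == 0, otherwise PC + 4."""
    pc_plus4 = (pc + 4) & 0xFFFFFFFF
    bta = (pc_plus4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF
    zero = ((rs_val - rt_val) & 0xFFFFFFFF) == 0
    return bta if zero else pc_plus4


print(hex(next_pc_beq(0x18, 5, 5, 0x000A)))  # taken: 0x1c + 40
print(hex(next_pc_beq(0x18, 5, 6, 0x000A)))  # not taken: 0x1c
```

A negative immediate branches backward: with immed = 0xFFFF (that is, -1) the target is PC + 4 - 4 = PC.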

Chapter 7 <21>

Single-Cycle Processor

[Figure: the complete single-cycle processor: the datapath plus a control unit that takes Op (Instr31:26) and Funct (Instr5:0) and generates MemtoReg, MemWrite, Branch, ALUControl2:0, ALUSrc, RegDst, and RegWrite; PCSrc is derived from Branch and Zero.]

Chapter 7 <22>

Single-Cycle Control

[Figure: the control unit: the main decoder takes Opcode5:0 and produces MemtoReg, MemWrite, Branch, ALUSrc, RegDst, RegWrite, and ALUOp1:0; the ALU decoder takes ALUOp1:0 and Funct5:0 and produces ALUControl2:0.]

Chapter 7 <23>

Review: ALU

[Figure: ALU symbol with N-bit inputs A and B, 3-bit control F, and N-bit output Y.]

  F2:0   Function
  000    A & B
  001    A | B
  010    A + B
  011    not used
  100    A & ~B
  101    A | ~B
  110    A - B
  111    SLT
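The function table can be expressed as a small Python model of a 32-bit ALU. This is an illustrative sketch, not the book's implementation: the names are hypothetical, and SLT is modeled as the sign bit of A - B, which is an assumption about the hardware and differs from a true signed comparison when the subtraction overflows.

```python
MASK = 0xFFFFFFFF  # model a 32-bit ALU (N = 32)


def alu(a, b, f):
    """Return (Y, Zero) for control value f = F2:0."""
    a &= MASK
    b &= MASK
    not_b = ~b & MASK
    if f == 0b000:
        y = a & b
    elif f == 0b001:
        y = a | b
    elif f == 0b010:
        y = a + b
    elif f == 0b100:
        y = a & not_b
    elif f == 0b101:
        y = a | not_b
    elif f == 0b110:
        y = a - b
    elif f == 0b111:
        y = ((a - b) & MASK) >> 31  # SLT: sign bit of A - B
    else:
        raise ValueError("F = 011 is not used")
    y &= MASK
    return y, y == 0


print(alu(5, 3, 0b110))  # subtract
print(alu(3, 5, 0b111))  # set less than
```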

Chapter 7 <24>

Review: ALU

[Figure: ALU implementation: an N-bit adder with carry out Cout and sum S, a 2:1 multiplexer controlled by F2, zero extension of S[N-1], and a 4:1 output multiplexer controlled by F1:0 that drives Y; inputs A and B.]

Chapter 7 <25>

Control Unit: ALU Decoder

  ALUOp1:0   Meaning
  00         Add (for lw, sw)
  01         Subtract (for beq)
  10         Look at funct (R-type)
  11         Not Used

  ALUOp1:0   funct          ALUControl2:0
  00         X              010 (Add)
  X1         X              110 (Subtract)
  1X         100000 (add)   010 (Add)
  1X         100010 (sub)   110 (Subtract)
  1X         100100 (and)   000 (And)
  1X         100101 (or)    001 (Or)
  1X         101010 (slt)   111 (SLT)
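The ALU decoder truth table translates directly into a lookup. The sketch below is illustrative only: the names are hypothetical, rows are checked in the order they appear in the table, and an undefined funct returns None, which is a choice made here rather than something the slides specify.

```python
# funct field -> ALUControl2:0, used when ALUOp1:0 = 1X (R-type)
FUNCT_TO_ALUCONTROL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt
}


def alu_decoder(aluop, funct):
    """Return ALUControl2:0, or None for an undefined funct."""
    if aluop == 0b00:
        return 0b010  # add, for lw and sw
    if aluop & 0b01:
        return 0b110  # row X1: subtract, for beq
    return FUNCT_TO_ALUCONTROL.get(funct)  # row 1X: look at funct


print(format(alu_decoder(0b10, 0b100101), "03b"))  # or
```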

Chapter 7 <26>

Control Unit: Main Decoder

  Instruction   Op5:0    RegWrite   RegDst   ALUSrc   Branch   MemWrite   MemtoReg   ALUOp1:0
  R-type        000000
  lw            100011
  sw            101011
  beq           000100

[Figure: the complete single-cycle processor with control unit, shown for reference while filling in the table.]

Chapter 7 <27>

Control Unit: Main Decoder

  Instruction   Op5:0    RegWrite   RegDst   ALUSrc   Branch   MemWrite   MemtoReg   ALUOp1:0
  R-type        000000   1          1        0        0        0          0          10
  lw            100011   1          0        1        0        0          1          00
  sw            101011   0          X        1        0        1          X          00
  beq           000100   0          X        0        1        0          X          01

Chapter 7 <28>

Single-Cycle Datapath: or

[Figure: the complete single-cycle processor annotated with the control values for or: RegWrite = 1, RegDst = 1, ALUSrc = 0, Branch = 0, MemWrite = 0, MemtoReg = 0, ALUControl2:0 = 001.]

Chapter 7 <29>

Extended Functionality: addi

No change to datapath

[Figure: the complete single-cycle processor, unchanged.]

Chapter 7 <30>

Main Decoder table: addi

  Instruction   Op5:0    RegWrite   RegDst   ALUSrc   Branch   MemWrite   MemtoReg   ALUOp1:0
  R-type        000000   1          1        0        0        0          0          10
  lw            100011   1          0        1        0        0          1          00
  sw            101011   0          X        1        0        1          X          00
  beq           000100   0          X        0        1        0          X          01
  addi          001000

Chapter 7 <31>

Main Decoder table: addi

  Instruction   Op5:0    RegWrite   RegDst   ALUSrc   Branch   MemWrite   MemtoReg   ALUOp1:0
  R-type        000000   1          1        0        0        0          0          10
  lw            100011   1          0        1        0        0          1          00
  sw            101011   0          X        1        0        1          X          00
  beq           000100   0          X        0        1        0          X          01
  addi          001000   1          0        1        0        0          0          00

Chapter 7 <32>

Extended Functionality: j

[Figure: the single-cycle processor extended for j: Instr25:0 is shifted left by 2 to form bits 27:0, which are combined with bits 31:28 of PCPlus4 to form PCJump; a multiplexer controlled by the new Jump signal selects PCJump as PC'.]
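The jump target formation can be sketched in Python. The function name and the sample instruction words are hypothetical; the bit selection follows the figure's labels (Instr25:0 shifted left by 2, joined with bits 31:28 of PC + 4).

```python
def jump_target(pc, instr):
    """PCJump = {PCPlus4[31:28], Instr[25:0], 2'b00}."""
    pc_plus4 = (pc + 4) & 0xFFFFFFFF
    addr26 = instr & 0x03FFFFFF  # Instr25:0
    return (pc_plus4 & 0xF0000000) | (addr26 << 2)


# Hypothetical encoding: opcode 000010 (j) with a 26-bit address field of 0x0100004.
print(hex(jump_target(0x00400000, 0x08100004)))
```

Because only 28 bits come from the instruction, a jump stays within the 256 MB region selected by the upper four bits of PC + 4.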

Chapter 7 <33>

Main Decoder table: j

  Instruction   Op5:0    RegWrite   RegDst   ALUSrc   Branch   MemWrite   MemtoReg   ALUOp1:0   Jump
  R-type        000000   1          1        0        0        0          0          10         0
  lw            100011   1          0        1        0        0          1          00         0
  sw            101011   0          X        1        0        1          X          00         0
  beq           000100   0          X        0        1        0          X          01         0
  j             000010

Chapter 7 <34>

Main Decoder table: j

  Instruction   Op5:0    RegWrite   RegDst   ALUSrc   Branch   MemWrite   MemtoReg   ALUOp1:0   Jump
  R-type        000000   1          1        0        0        0          0          10         0
  lw            100011   1          0        1        0        0          1          00         0
  sw            101011   0          X        1        0        1          X          00         0
  beq           000100   0          X        0        1        0          X          01         0
  j             000010   0          X        X        X        0          X          XX         1
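As a rough illustration, the main decoder table can be held in a dictionary and packed into a 9-bit control word in column order. The names are hypothetical, and replacing each don't-care (X) with 0 is an assumption made here; any value would be legal.

```python
# Columns: RegWrite, RegDst, ALUSrc, Branch, MemWrite, MemtoReg, ALUOp1:0, Jump
MAIN_DECODER = {
    0b000000: ("R-type", "1", "1", "0", "0", "0", "0", "10", "0"),
    0b100011: ("lw",     "1", "0", "1", "0", "0", "1", "00", "0"),
    0b101011: ("sw",     "0", "X", "1", "0", "1", "X", "00", "0"),
    0b000100: ("beq",    "0", "X", "0", "1", "0", "X", "01", "0"),
    0b000010: ("j",      "0", "X", "X", "X", "0", "X", "XX", "1"),
}


def control_word(op, dont_care="0"):
    """Concatenate the control signals for opcode op into a bit string."""
    name, *signals = MAIN_DECODER[op]
    return "".join(signals).replace("X", dont_care)


for op, row in MAIN_DECODER.items():
    print(f"{row[0]:7s} {op:06b} -> {control_word(op)}")
```

With X mapped to 0, the packed words match the 9-bit constants used in the maindec Verilog module later in the chapter.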

Chapter 7 <35>

Review: Processor Performance

Program Execution Time = (#instructions)(cycles/instruction)(seconds/cycle)
                       = IC × CPI × Tc

Chapter 7 <36>

Single-Cycle Performance

Tc limited by critical path (lw)

[Figure: the complete single-cycle processor annotated with the control values for lw: RegWrite = 1, RegDst = 0, ALUSrc = 1, Branch = 0, MemWrite = 0, MemtoReg = 1, ALUControl2:0 = 010.]

Chapter 7 <37>

Single-Cycle Performance

• Single-cycle critical path:
  Tc = tpcq_PC + tmem + max(tRFread, tsext + tmux) + tALU + tmem + tmux + tRFsetup
• Typically, limiting paths are:
  – memory, ALU, register file
  – Tc = tpcq_PC + 2tmem + tRFread + tmux + tALU + tRFsetup

Chapter 7 <38>

Single-Cycle Performance Example

  Element               Parameter   Delay (ps)
  Register clock-to-Q   tpcq_PC     30
  Register setup        tsetup      20
  Multiplexer           tmux        25
  ALU                   tALU        200
  Memory read           tmem        250
  Register file read    tRFread     150
  Register file setup   tRFsetup    20

Tc = ?

Chapter 7 <39>

Single-Cycle Performance Example

  Element               Parameter   Delay (ps)
  Register clock-to-Q   tpcq_PC     30
  Register setup        tsetup      20
  Multiplexer           tmux        25
  ALU                   tALU        200
  Memory read           tmem        250
  Register file read    tRFread     150
  Register file setup   tRFsetup    20

Tc = tpcq_PC + 2tmem + tRFread + tmux + tALU + tRFsetup
   = [30 + 2(250) + 150 + 25 + 200 + 20] ps
   = 925 ps

fclk = 1 / (0.925 ns) = 1.08 GHz
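The cycle-time arithmetic can be checked with a few lines of Python. This is a sketch: the dictionary and function names are hypothetical, and because the delay table gives no value for the sign-extend delay, `t_sext` defaults to 0 here as an assumption (the register file read dominates that branch of the max for any t_sext below 125 ps).

```python
# Delays in picoseconds, copied from the table.
DELAYS_PS = {
    "tpcq_PC": 30, "tsetup": 20, "tmux": 25, "tALU": 200,
    "tmem": 250, "tRFread": 150, "tRFsetup": 20,
}


def cycle_time_ps(d, t_sext=0):
    """Critical path of lw through the single-cycle datapath."""
    return (d["tpcq_PC"] + d["tmem"]
            + max(d["tRFread"], t_sext + d["tmux"])
            + d["tALU"] + d["tmem"] + d["tmux"] + d["tRFsetup"])


tc = cycle_time_ps(DELAYS_PS)
f_ghz = 1000.0 / tc                  # period in ps -> frequency in GHz
exec_s = 100e9 * 1 * tc * 1e-12      # IC = 100 billion, CPI = 1
print(f"Tc = {tc} ps, fclk = {f_ghz:.2f} GHz, time = {exec_s:.1f} s")
```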

Chapter 7 <40>

Single-Cycle Performance Example

Program with IC = 100 billion instructions:

Execution Time = IC × CPI × Tc
               = (100 × 10^9)(1)(925 × 10^-12 s)
               = 92.5 seconds

Chapter 7 <41>

Evaluation of Single-Cycle Processor

Pros and cons of single-cycle implementation:
  + simple design
  + one cycle per instruction
  - slow cycle time, limited by longest instruction (lw)
  - HW: 2 adders + ALU; 2 memories

Chapter 7 <42>

Review: Single-Cycle Processor

[Figure: the complete single-cycle processor including the jump hardware (PCJump and the Jump multiplexer).]

Chapter 7 <43>

Verilog Model

//------------------------------------------------
// [email protected] 9 November 2005
// Top level system including MIPS and memories
//------------------------------------------------

module top (input         clk, reset,
            output [31:0] writedata, dataadr,
            output        memwrite);

  wire [31:0] pc, instr, readdata;

  // instantiate processor and memories
  mips mips (clk, reset, pc, instr, memwrite, dataadr, writedata, readdata);
  imem imem (pc[7:2], instr);
  dmem dmem (clk, memwrite, dataadr, writedata, readdata);

endmodule

Chapter 7 <44>

Verilog Model of Data Memory

//------------------------------------------------
// [email protected] 23 October 2005
// External data memory used by MIPS single-cycle processor
//------------------------------------------------

module dmem (input         clk, we,
             input  [31:0] a, wd,
             output [31:0] rd);

  reg [31:0] RAM[63:0];

  assign rd = RAM[a[31:2]];  // word-aligned read

  always @(posedge clk)
    if (we)
      RAM[a[31:2]] <= wd;    // word-aligned write

endmodule

Chapter 7 <45>

Verilog Model of Instr. Memory

module imem (input      [5:0]  addr,
             output reg [31:0] instr);

  // imem is modeled as a lookup table, a stored-program byte-addressable ROM
  always @(addr)
    case ({addr, 2'b00})
      // address   instruction
      // -------   -----------
      8'h00: instr = 32'h20020005;
      8'h04: instr = 32'h2003000c;
      8'h08: instr = 32'h2067fff7;
      8'h0c: instr = 32'h00e22025;
      8'h10: instr = 32'h00642824;
      8'h14: instr = 32'h00a42820;
      8'h18: instr = 32'h10a7000a;
      8'h1c: instr = 32'h0064202a;
      8'h20: instr = 32'h10800001;
      default: instr = {32{1'bx}};  // unknown instruction
    endcase

endmodule
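To see what the stored words mean, the standard MIPS field positions used throughout the datapath figures (Instr31:26, 25:21, 20:16, 15:11, 15:0, 5:0) can be extracted with shifts and masks. The helper below is a hypothetical Python sketch for inspection only; it is not part of the Verilog model.

```python
def decode(instr):
    """Split a 32-bit MIPS instruction word into its fields."""
    return {
        "op":    (instr >> 26) & 0x3F,  # Instr31:26
        "rs":    (instr >> 21) & 0x1F,  # Instr25:21
        "rt":    (instr >> 16) & 0x1F,  # Instr20:16
        "rd":    (instr >> 11) & 0x1F,  # Instr15:11
        "funct": instr & 0x3F,          # Instr5:0
        "imm":   instr & 0xFFFF,        # Instr15:0
    }


first = decode(0x20020005)   # op 001000 (addi): rt = 2, rs = 0, imm = 5
fourth = decode(0x00E22025)  # op 000000, funct 100101 (or): rd = 4, rs = 7, rt = 2
print(first)
print(fourth)
```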

Chapter 7 <46>

Alternate Model of Instr. Memory

module imem (input  [5:0]  addr,
             output [31:0] instr);

  reg [31:0] RAM[63:0];

  // imem is RAM, loaded from memfile.dat file with hex values at startup
  initial
    begin
      $readmemh("memfile.dat", RAM);
    end

  assign instr = RAM[addr];  // instr at RAM[addr] is read out

endmodule

// imem can be created with CoreGen for Xilinx synthesis

Chapter 7 <47>

Verilog Model of MIPS Processor

// single-cycle MIPS processor
module mips (input         clk, reset,
             output [31:0] pc,
             input  [31:0] instr,
             output        memwrite,
             output [31:0] aluout, writedata,
             input  [31:0] readdata);

  wire       memtoreg, pcsrc, zero, alusrc, regdst, regwrite, jump;
  wire [2:0] alucontrol;

  controller c (instr[31:26], instr[5:0], zero,
                memtoreg, memwrite, pcsrc, alusrc,
                regdst, regwrite, jump, alucontrol);

  datapath dp (clk, reset, memtoreg, pcsrc, alusrc,
               regdst, regwrite, jump, alucontrol,
               zero, pc, instr, aluout, writedata, readdata);

endmodule

Chapter 7 <48>

Verilog Model of Controller

module controller (input  [5:0] op, funct,
                   input        zero,
                   output       memtoreg, memwrite,
                   output       pcsrc, alusrc,
                   output       regdst, regwrite,
                   output       jump,
                   output [2:0] alucontrol);

  wire [1:0] aluop;
  wire       branch;

  maindec md (op, regwrite, regdst, alusrc, branch,
              memwrite, memtoreg, aluop, jump);

  aludec ad (funct, aluop, alucontrol);

  assign pcsrc = branch & zero;

endmodule

Chapter 7 <49>

Verilog Model of Main Decoder

module maindec (input  [5:0] op,
                output       regwrite, regdst, alusrc, branch,
                output       memwrite, memtoreg,
                output [1:0] aluop,
                output       jump);

  reg [8:0] controls;

  assign {regwrite, regdst, alusrc, branch,
          memwrite, memtoreg, aluop, jump} = controls;

  always @(*)
    case (op)
      6'b000000: controls = 9'b110000100; // R-type
      6'b100011: controls = 9'b101001000; // LW
      6'b101011: controls = 9'b001010000; // SW
      6'b000100: controls = 9'b000100010; // BEQ
      6'b001000: controls = 9'b101000000; // ADDI
      6'b000010: controls = 9'b000000001; // J
      default:   controls = 9'bxxxxxxxxx; // ???
    endcase

endmodule

Chapter 7 <50>

Verilog Model of ALU Decoder

module aludec (input      [5:0] funct,
               input      [1:0] aluop,
               output reg [2:0] alucontrol);

  always @(*)
    case (aluop)
      2'b00: alucontrol = 3'b010;  // add
      2'b01: alucontrol = 3'b110;  // sub
      default: case (funct)        // R-type
          6'b100000: alucontrol = 3'b010; // ADD
          6'b100010: alucontrol = 3'b110; // SUB
          6'b100100: alucontrol = 3'b000; // AND
          6'b100101: alucontrol = 3'b001; // OR
          6'b101010: alucontrol = 3'b111; // SLT
          default:   alucontrol = 3'bxxx; // ???
        endcase
    endcase

endmodule

Chapter 7 <51>

Verilog Model of Datapath

module datapath (input         clk, reset,
                 input         memtoreg, pcsrc,
                 input         alusrc, regdst,
                 input         regwrite, jump,
                 input  [2:0]  alucontrol,
                 output        zero,
                 output [31:0] pc,
                 input  [31:0] instr,
                 output [31:0] aluout, writedata,
                 input  [31:0] readdata);

  wire [4:0]  writereg;
  wire [31:0] pcnext, pcnextbr, pcplus4, pcbranch;
  wire [31:0] signimm, signimmsh, srca, srcb, result;

  // next PC logic
  flopr #(32) pcreg (clk, reset, pcnext, pc);
  adder       pcadd1 (pc, 32'b100, pcplus4);
  sl2         immsh (signimm, signimmsh);
  adder       pcadd2 (pcplus4, signimmsh, pcbranch);
  mux2 #(32)  pcbrmux (pcplus4, pcbranch, pcsrc, pcnextbr);
  mux2 #(32)  pcmux (pcnextbr, {pcplus4[31:28], instr[25:0], 2'b00},
                     jump, pcnext);

Chapter 7 <52>

Verilog Model of Datapath (cont'd)

  // register file logic
  regfile     rf (clk, regwrite, instr[25:21], instr[20:16],
                  writereg, result, srca, writedata);
  mux2 #(5)   wrmux (instr[20:16], instr[15:11], regdst, writereg);
  mux2 #(32)  resmux (aluout, readdata, memtoreg, result);
  signext     se (instr[15:0], signimm);

  // ALU logic
  mux2 #(32)  srcbmux (writedata, signimm, alusrc, srcb);
  alu         alu (srca, srcb, alucontrol, aluout, zero);

endmodule

Chapter 7 <53>

Verilog Model of Register File

module regfile (input         clk,
                input         we3,
                input  [4:0]  ra1, ra2, wa3,
                input  [31:0] wd3,
                output [31:0] rd1, rd2);

  reg [31:0] rf[31:0];

  // three-ported register file:
  // read two ports combinationally,
  // write third port on rising edge of clock.
  // Register 0 hardwired to 0.

  always @(posedge clk)
    if (we3) rf[wa3] <= wd3;

  assign rd1 = (ra1 != 0) ? rf[ra1] : 0;
  assign rd2 = (ra2 != 0) ? rf[ra2] : 0;

endmodule

Chapter 7 <54>

Verilog Models of Other Parts

module adder (input  [31:0] a, b,
              output [31:0] y);
  assign y = a + b;
endmodule

module sl2 (input  [31:0] a,
            output [31:0] y);
  // shift left by 2
  assign y = {a[29:0], 2'b00};
endmodule

module signext (input  [15:0] a,
                output [31:0] y);
  assign y = {{16{a[15]}}, a};
endmodule

Chapter 7 <55>

Verilog for Parameterized Parts

module flopr #(parameter WIDTH = 8)
              (input                  clk, reset,
               input      [WIDTH-1:0] d,
               output reg [WIDTH-1:0] q);
  always @(posedge clk, posedge reset)
    if (reset) q <= 0;
    else       q <= d;
endmodule

module flopenr #(parameter WIDTH = 8)
                (input                  clk, reset, en,
                 input      [WIDTH-1:0] d,
                 output reg [WIDTH-1:0] q);
  always @(posedge clk, posedge reset)
    if (reset)   q <= 0;
    else if (en) q <= d;
endmodule

module mux2 #(parameter WIDTH = 8)
             (input  [WIDTH-1:0] d0, d1,
              input              s,
              output [WIDTH-1:0] y);
  assign y = s ? d1 : d0;
endmodule

Chapter 7 <56>

Review: Exceptions

• Unscheduled function call to exception handler
• Caused by:
  – Hardware, also called an interrupt, e.g. keyboard
  – Software, also called a trap, e.g. undefined instruction
• When an exception occurs, the processor:
  – Records cause of exception (Cause register)
  – Jumps to exception handler (0x80000180)
  – Returns to program (EPC register)

Chapter 7 <57>

Example Exception

Chapter 7 <58>

Review: Exception Registers

• Not part of register file; in Coprocessor 0
  – Cause
    • Records cause of exception
    • Coprocessor 0 register 13
  – EPC (Exception PC)
    • Records PC where exception occurred
    • Coprocessor 0 register 14
• Move from Coprocessor 0
  – mfc0 $t0, Cause (= mfc0 $t0, $13)
  – Moves contents of Cause into $t0

mfc0 instruction encoding:

  Bits    31:26           25:21   20:16     15:11        10:0
  Value   010000 (mfc0)   00000   $t0 (8)   Cause (13)   00000000000

Chapter 7 <59>

Review: Exception Causes

  Exception                  Cause
  Hardware Interrupt         0x00000000
  System Call                0x00000020
  Breakpoint / Divide by 0   0x00000024
  Undefined Instruction      0x00000028
  Arithmetic Overflow        0x00000030

Extend single-cycle MIPS processor to handle last two types of exceptions

Chapter 7 <60>

Exception RTLs

Undefined Instruction
  IM[PC]
  . . .                  # problem in decoding (bad op or funct)
  Cause ← 40             # = 0x28
  EPC ← PC
  PC ← 0x80000180        # exception handler address

Arithmetic Overflow
  IM[PC]
  . . .                  # ALU operation overflows
  Cause ← 48             # = 0x30
  EPC ← PC
  PC ← 0x80000180        # exception handler address

mfc0 instruction (e.g. mfc0 $t1, $13)
  IM[PC]
  RF[rt] ← RFc0[rd]
  PC ← PC + 4
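The exception RTLs can be mimicked with a small behavioral sketch in Python. The `state` dictionary and function names are hypothetical; the Cause codes, the Coprocessor 0 register numbers (13 for Cause, 14 for EPC), and the handler address are the ones given in the slides.

```python
EXCEPTION_HANDLER = 0x80000180
CAUSE_UNDEFINED_INSTRUCTION = 0x28
CAUSE_ARITHMETIC_OVERFLOW = 0x30
C0_CAUSE, C0_EPC = 13, 14  # Coprocessor 0 register numbers


def raise_exception(state, cause):
    """Cause <- cause; EPC <- PC; PC <- handler address."""
    state["c0"][C0_CAUSE] = cause
    state["c0"][C0_EPC] = state["pc"]
    state["pc"] = EXCEPTION_HANDLER


def execute_mfc0(state, rt, rd):
    """RF[rt] <- RFc0[rd]; PC <- PC + 4."""
    if rt != 0:  # register $0 always reads as zero
        state["rf"][rt] = state["c0"][rd]
    state["pc"] = (state["pc"] + 4) & 0xFFFFFFFF


# Hypothetical scenario: an overflow at PC = 0x400, then the handler reads Cause.
state = {"pc": 0x400, "rf": [0] * 32, "c0": [0] * 32}
raise_exception(state, CAUSE_ARITHMETIC_OVERFLOW)
execute_mfc0(state, rt=9, rd=C0_CAUSE)  # mfc0 $t1, $13; $t1 is register 9
print(hex(state["rf"][9]), hex(state["c0"][C0_EPC]), hex(state["pc"]))
```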

Chapter 7 <61>

Exception Hardware: EPC & Cause

[Figure: multicycle datapath extended with exception hardware: an EPC register enabled by EPCWrite, a Cause register enabled by CauseWrite, a multiplexer controlled by IntCause that selects between 0x30 and 0x28, an Overflow output from the ALU, and the handler address 0x8000 0180 as an additional input of the PCSrc1:0 multiplexer.]

Never mind the multicycle datapath; focus on the exception hardware.

Chapter 7 <62>

Exception Hardware: mfc0

[Figure: the exception hardware from the previous slide extended for mfc0: Instr15:11 selects a Coprocessor 0 register (01101 for Cause, 01110 for EPC), and the selected value C0 enters input 10 of the register-file write-data multiplexer, now controlled by MemtoReg1:0.]

Never mind the multicycle datapath; focus on the exception hardware.

Page 63: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

• Temporal parallelism
• Divide single-cycle processor into 5 stages:
  – Fetch
  – Decode
  – Execute
  – Memory
  – Writeback
• Add pipeline registers between stages

Pipelined MIPS Processor

Page 64: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

[Figure: two timing diagrams with a shared time axis (0 to 1900 ps). Single-Cycle: each instruction passes through Fetch Instruction, Decode / Read Reg, Execute ALU, Memory Read / Write, and Write Reg before the next instruction starts. Pipelined: three instructions overlap, each starting one stage after the previous one.]

Single-Cycle vs. Pipelined

Page 65: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

[Figure: abstract pipeline diagram over cycles 1 to 10. Six instructions enter the pipeline one cycle apart, each passing through instruction memory (IM), register file read (RF), ALU, data memory (DM), and register file write (RF):
lw $s2, 40($0)
add $s3, $t1, $t2
sub $s4, $s1, $s5
and $s5, $t5, $t6
sw $s6, 20($s1)
or $s7, $t3, $t4]

Pipelined Processor Abstraction
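The stage occupancy in this abstraction can be tabulated with a short sketch. It assumes an ideal pipeline with no stalls, in which instruction k (counting from 0) is in stage s during cycle k + s + 1; the function name `stage_schedule` is made up for illustration.

```python
STAGES = ["Fetch", "Decode", "Execute", "Memory", "Writeback"]


def stage_schedule(instructions):
    """Map each instruction to {cycle: stage}, assuming no stalls."""
    schedule = {}
    for index, instr in enumerate(instructions):
        # Instruction `index` starts Fetch in cycle index + 1.
        schedule[instr] = {index + 1 + s: name for s, name in enumerate(STAGES)}
    return schedule


program = ["lw", "add", "sub", "and", "sw", "or"]
schedule = stage_schedule(program)

# The last cycle in which any instruction is still in the pipeline.
total_cycles = max(max(cycles) for cycles in schedule.values())

# Which instruction occupies which stage in cycle 5.
cycle_5 = {instr: cycles[5] for instr, cycles in schedule.items() if 5 in cycles}
```

With six instructions and five stages the last instruction finishes in cycle 10, which matches the cycle axis of the diagram.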

Page 66: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

[Figure: single-cycle datapath and pipelined datapath, both divided into Fetch, Decode, Execute, Memory, and Writeback. The pipelined version inserts pipeline registers between the stages, and signal names carry a stage suffix (PCF, InstrD, SrcAE, SrcBE, ALUOutM, ReadDataW, ResultW). WriteRegE4:0 is not yet carried through the Memory and Writeback stages.]

Single-Cycle & Pipelined Datapath

Page 67: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

[Figure: corrected pipelined datapath. WriteRegE4:0 is passed through the pipeline registers as WriteRegM4:0 and WriteRegW4:0, so the destination register number reaches the register file in the Writeback stage together with ResultW.]

WriteReg must arrive at same time as Result

Corrected Pipelined Datapath

Page 68: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

[Figure: pipelined datapath with control unit. The control unit decodes Op (Instr31:26) and Funct (Instr5:0) in the Decode stage to produce RegWriteD, MemtoRegD, MemWriteD, BranchD, ALUControlD, ALUSrcD, and RegDstD; each signal travels through the pipeline registers to the stage that uses it (for example RegWriteE, RegWriteM, RegWriteW). PCSrcM is derived from BranchM and ZeroM.]

• Same control unit as single-cycle processor
• Control delayed to proper pipeline stage

Pipelined Processor Control

Page 69: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

• Occur when an instruction depends on the result of an instruction that hasn’t completed
• Types:
  – Data hazard: register value not yet written back to register file
  – Control hazard: next instruction not decided yet (caused by branches) or target address not calculated yet (jumps and branches)

Pipeline Hazards

Page 70: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

[Figure: pipeline diagram over cycles 1 to 8 showing a data hazard on $s0. add writes $s0, and the instructions that follow read $s0 before it has been written back:
add $s0, $s2, $s3
and $t0, $s0, $s1
or $t1, $s4, $s0
sub $t2, $s0, $s5]

Data Hazard

Page 71: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

2 SW fixes
• Insert nops in code at compile time
• Rearrange code at compile time

2 HW fixes
• Stall the processor at run time
• Forward data at run time

Handling Data Hazards

Page 72: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

[Figure: pipeline diagram over cycles 1 to 10 for add $s0, $s2, $s3; and $t0, $s0, $s1; or $t1, $s4, $s0; sub $t2, $s0, $s5, with two nops inserted between add and and.]

• Insert enough nops for result to be ready
• Or move independent useful instructions forward

Compile-Time Hazard Elimination
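A compiler pass that inserts nops might look like the sketch below. It rests on an assumption not stated on the slide: without forwarding, a consumer must sit at least three instruction slots after its producer (two nops when they are adjacent), which is the spacing the diagram shows. The function `insert_nops` and the tuple encoding `(op, dest, sources)` are hypothetical, and the sketch ignores the special case of register $0.

```python
NOP = ("nop", None, ())


def insert_nops(program, min_distance=3):
    """Pad with nops so a consumer is at least min_distance slots after its producer."""
    out = []
    for op, dest, srcs in program:
        # Only the most recent (min_distance - 1) emitted slots can be too close.
        window = out[-(min_distance - 1):] if min_distance > 1 else []
        needed = 0
        for back, (_, prev_dest, _) in enumerate(reversed(window), start=1):
            if prev_dest is not None and prev_dest in srcs:
                needed = max(needed, min_distance - back)
        out.extend([NOP] * needed)
        out.append((op, dest, srcs))
    return out


program = [
    ("add", "$s0", ("$s2", "$s3")),
    ("and", "$t0", ("$s0", "$s1")),
    ("or", "$t1", ("$s4", "$s0")),
    ("sub", "$t2", ("$s0", "$s5")),
]
padded = insert_nops(program)
ops = [op for op, _, _ in padded]
```

For this sequence the pass emits two nops after add, so the four instructions occupy six slots. Rearranging independent instructions into those slots would avoid the wasted cycles.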

Page 73: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

[Figure: pipeline diagram over cycles 1 to 8 for add $s0, $s2, $s3; and $t0, $s0, $s1; or $t1, $s4, $s0; sub $t2, $s0, $s5, with the result of add forwarded to the later instructions that use $s0 instead of waiting for the register file write.]

Data Forwarding

Page 74: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

[Figure: pipelined datapath with forwarding. A hazard unit takes RsE, RtE, WriteRegM, WriteRegW, RegWriteM, and RegWriteW as inputs and drives ForwardAE and ForwardBE, which control three-input multiplexers (00, 01, 10) in front of the ALU operands SrcAE and SrcBE.]

Data Forwarding

Page 75: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

• Forward to Execute stage from either:
  – Memory stage or
  – Writeback stage
• Forwarding logic for ForwardAE:

if ((rsE != 0) AND (rsE == WriteRegM) AND RegWriteM) then ForwardAE = 10
else if ((rsE != 0) AND (rsE == WriteRegW) AND RegWriteW) then ForwardAE = 01
else ForwardAE = 00

Forwarding logic for ForwardBE same, but replace rsE with rtE

Data Forwarding
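The ForwardAE logic translates directly into a small function. This is a behavioral sketch, not hardware; the function name `forward_select` and the snake_case parameter names are made up, and the return value is the 2-bit multiplexer select. Passing rtE instead of rsE yields ForwardBE.

```python
def forward_select(src_reg_e, write_reg_m, reg_write_m, write_reg_w, reg_write_w):
    """Return the 2-bit forwarding mux select for one Execute-stage operand."""
    # The Memory stage holds the most recent result, so it takes priority.
    if src_reg_e != 0 and src_reg_e == write_reg_m and reg_write_m:
        return 0b10
    if src_reg_e != 0 and src_reg_e == write_reg_w and reg_write_w:
        return 0b01
    return 0b00   # no hazard: use the value read from the register file


S0 = 16  # register number of $s0

# Operand $s0 is being written by the instruction now in the Memory stage.
from_memory = forward_select(S0, write_reg_m=S0, reg_write_m=True,
                             write_reg_w=0, reg_write_w=False)
# Operand $s0 is being written by the instruction now in the Writeback stage.
from_writeback = forward_select(S0, write_reg_m=8, reg_write_m=True,
                                write_reg_w=S0, reg_write_w=True)
# Register $0 is never forwarded, even if a destination field matches.
zero_reg = forward_select(0, write_reg_m=0, reg_write_m=True,
                          write_reg_w=0, reg_write_w=True)
```

Checking the Memory stage first matters when both stages write the same register: the younger result is the one the Execute stage must see.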

Page 76: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

[Figure: pipeline diagram over cycles 1 to 8 for lw $s0, 40($0); and $t0, $s0, $s1; or $t1, $s4, $s0; sub $t2, $s0, $s5. The path from lw to and is marked "Trouble!".]

Forwarding cannot resolve a load-use hazard: lw does not have its data until the end of the Memory stage, which is too late for the Execute stage of the instruction immediately after it.

Stalling

Page 77: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

[Figure: the same sequence (lw $s0, 40($0); and $t0, $s0, $s1; or $t1, $s4, $s0; sub $t2, $s0, $s5) extended to cycle 9. and repeats its Decode stage and or repeats its Fetch stage, marked "Stall".]

The HW solution is to stall the pipeline

Stalling

Page 78: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

[Figure: pipelined datapath with stalling hardware. The hazard unit additionally takes RsD, RtD, and MemtoRegE as inputs and produces StallF and StallD, which control the enable (EN) inputs of the PC and Fetch/Decode pipeline registers, and FlushE, which drives the clear (CLR) input of the Decode/Execute pipeline register.]

Stalling Hardware

Page 79: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

lwstall = ((rsD == rtE) OR (rtD == rtE)) AND MemtoRegE

StallF = StallD = FlushE = lwstall

• Stalling the Fetch and Decode stages holds the dependent instruction in place, and flushing the Execute stage inserts a bubble behind the load. The held instruction is simply repeated in the next clock cycle, this time with correct (forwarded) data.

Stalling Logic
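The load-use stall condition can be written as a one-line predicate. The sketch below is illustrative only: `lw_stall` is a hypothetical name, and registers are passed as plain numbers ($s0 = 16, $s1 = 17, $t0 = 8 in the MIPS numbering).

```python
def lw_stall(rs_d, rt_d, rt_e, mem_to_reg_e):
    """True when the Decode-stage instruction reads the register a load in Execute will write."""
    return bool((rs_d == rt_e or rt_d == rt_e) and mem_to_reg_e)


# lw $s0, 40($0) is in Execute (rtE = 16) while and $t0, $s0, $s1 is in Decode.
stall_after_load = lw_stall(rs_d=16, rt_d=17, rt_e=16, mem_to_reg_e=True)

# Same register numbers, but the Execute-stage instruction is not a load.
stall_after_alu = lw_stall(rs_d=16, rt_d=17, rt_e=16, mem_to_reg_e=False)

# StallF, StallD and FlushE are all driven by the same signal.
stall_f = stall_d = flush_e = stall_after_load
```

MemtoRegE serves as the "this is a load" indicator, so an ALU instruction with a matching destination does not stall; forwarding handles that case.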

Page 80: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

• beq:
  – Branch not determined until 4th stage of pipeline
  – Instructions after the branch are fetched before the branch occurs
  – These instructions must be flushed if branch happens
• Branch misprediction penalty
  – The number of instructions flushed when branch is taken
  – May be reduced by determining branch earlier

Control Hazards

Page 81: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

[Figure: pipelined datapath with hazard unit, before early branch resolution. PCSrcM is produced in the Memory stage from BranchM and ZeroM, and PCBranchM is the branch target supplied to the PC multiplexer.]

Control Hazards: Original Pipeline

Page 82: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

[Figure: pipeline diagram over cycles 1 to 9. beq $t1, $t2, 40 at address 20 is followed by and $t0, $s0, $s1 (24), or $t1, $s4, $s0 (28), and sub $t2, $s0, $s5 (2C); these three are marked "Flush these instructions", and slt $t3, $s2, $s3 at address 64 is fetched next.]

Control Hazards

Page 83: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

[Figure: pipelined datapath with early branch resolution. An equality comparator (=) on the register file outputs in the Decode stage produces EqualD; PCSrcD and PCBranchD are now generated in Decode, and a CLR input is added to the Fetch/Decode pipeline register.]

But: introduced another data hazard in Decode stage!

Early Branch Resolution

Page 84: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

[Figure: pipeline diagram over cycles 1 to 9 with early branch resolution. After beq $t1, $t2, 40 (address 20), only and $t0, $s0, $s1 (24) is marked "Flush this instruction" before slt $t3, $s2, $s3 (address 64) is fetched.]

Early Branch Resolution

Page 85: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

[Figure: pipelined datapath with forwarding to the early-branch hardware. Two multiplexers controlled by ForwardAD and ForwardBD are added in front of the Decode-stage equality comparator; the hazard unit also takes BranchD and RegWriteE as inputs.]

Forwarding to Early-branch HW

Page 86: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

• Forwarding logic:

ForwardAD = (rsD != 0) AND (rsD == WriteRegM) AND RegWriteM
ForwardBD = (rtD != 0) AND (rtD == WriteRegM) AND RegWriteM

• Stalling logic:

branchstall = BranchD AND RegWriteE AND (WriteRegE == rsD OR WriteRegE == rtD)
              OR
              BranchD AND MemtoRegM AND (WriteRegM == rsD OR WriteRegM == rtD)

StallF = StallD = FlushE = (lwstall OR branchstall)

Control Forwarding & Stalling Logic
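The Decode-stage forwarding and branch-stall equations can be checked with a small behavioral sketch. The function names `forward_d` and `branch_stall` and the parameter names are hypothetical; the logic follows the equations on this slide term by term.

```python
def forward_d(reg_d, write_reg_m, reg_write_m):
    """ForwardAD / ForwardBD: forward the Memory-stage result to the Decode comparator."""
    return bool(reg_d != 0 and reg_d == write_reg_m and reg_write_m)


def branch_stall(branch_d, rs_d, rt_d,
                 write_reg_e, reg_write_e,
                 write_reg_m, mem_to_reg_m):
    """Stall a branch in Decode whose source is not yet available for forwarding."""
    # A register-writing instruction in Execute has not produced its result yet.
    pending_in_execute = branch_d and reg_write_e and write_reg_e in (rs_d, rt_d)
    # A load in Memory has not read its data yet.
    pending_load_in_memory = branch_d and mem_to_reg_m and write_reg_m in (rs_d, rt_d)
    return bool(pending_in_execute or pending_load_in_memory)


T1, T2 = 9, 10  # register numbers of $t1 and $t2

# beq $t1, $t2, ... in Decode while the instruction in Execute writes $t1.
stall_on_execute = branch_stall(True, T1, T2, write_reg_e=T1, reg_write_e=True,
                                write_reg_m=0, mem_to_reg_m=False)
# The producer has reached Memory and is not a load: forward, do not stall.
stall_on_memory_alu = branch_stall(True, T1, T2, write_reg_e=0, reg_write_e=False,
                                   write_reg_m=T1, mem_to_reg_m=False)
forward_a_d = forward_d(T1, write_reg_m=T1, reg_write_m=True)
```

The combined stall signal is then `lwstall OR branchstall`, as on the slide.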

Page 87: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

• Guess whether branch will be taken
  – Backward branches are usually taken (in bottom-tested loops)
  – Consider history to improve guess
• Good prediction significantly reduces fraction of branches requiring a flush
• Requires HW for history table, etc.

Branch Prediction

Page 88: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

• SPECINT2000 benchmark:
  – 25% loads
  – 10% stores
  – 11% branches
  – 2% jumps
  – 52% R-type
• Suppose:
  – 40% of loads used by next instruction
  – 25% of branches mispredicted
  – All jumps flush next instruction (JTA not ready)
• What is the average CPI?

Pipelined Performance Example

Page 89: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

• Average CPI is the weighted average of CPIlw, CPIsw, CPIbeq, CPIj and CPIR-type
• For pipelined processors, CPI = 1 + # of stall cycles

Load CPI = 1 when no stall, = 2 when load-use occurs (1 stall)
  – CPIlw = 1(0.6) + 2(0.4) = 1.4
  – CPIsw = 1
Branch CPI = 1 when no stall, = 2 when it mispredicts and stalls
  – CPIbeq = 1(0.75) + 2(0.25) = 1.25
Jump CPI = 2 since it always requires 1 stall
  – CPIj = 2
  – CPIR-type = 1

Average CPI = (0.25)(1.4) + (0.1)(1) + (0.11)(1.25) + (0.02)(2) + (0.52)(1) = 1.15

Calculation of Average CPI
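The weighted average can be reproduced in a few lines. The dictionary names are illustrative; the fractions and stall assumptions are the ones stated in the example.

```python
# Instruction mix from the SPECINT2000 example.
mix = {"lw": 0.25, "sw": 0.10, "beq": 0.11, "j": 0.02, "r_type": 0.52}

# Per-class CPI = 1 + expected stall cycles.
cpi = {
    "lw": 1 * 0.60 + 2 * 0.40,    # 40% of loads are used by the next instruction
    "sw": 1.0,
    "beq": 1 * 0.75 + 2 * 0.25,   # 25% of branches are mispredicted
    "j": 2.0,                     # every jump flushes one instruction
    "r_type": 1.0,
}

average_cpi = sum(mix[kind] * cpi[kind] for kind in mix)
```

The exact sum is 1.1475, which the slide rounds to 1.15.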

Page 90: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

• Pipelined processor critical path:

Tc = max {
  tpcq + tmem + tsetup                              (Fetch)
  2(tRFread + tmux + teq + tAND + tmux + tsetup)    (Decode)
  tpcq + tmux + tmux + tALU + tsetup                (Execute)
  tpcq + tmemwrite + tsetup                         (Memory)
  2(tpcq + tmux + tRFwrite)                         (Writeback)
}

Pipelined Performance

Page 91: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

Element                 Parameter    Delay (ps)
Register clock-to-Q     tpcq_PC      30
Register setup          tsetup       20
Multiplexer             tmux         25
ALU                     tALU         200
Memory read             tmem         250
Register file read      tRFread      150
Register file setup     tRFsetup     20
Equality comparator     teq          40
AND gate                tAND         15
Memory write            tmemwrite    220
Register file write     tRFwrite     100

Tc = 2(tRFread + tmux + teq + tAND + tmux + tsetup)
   = 2[150 + 25 + 40 + 15 + 25 + 20] ps = 550 ps

Pipelined Performance Example
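Evaluating all five terms of the critical-path expression with these delays shows why the second term sets the cycle time. The stage labels in the sketch are an assumption based on the Fetch, Decode, Execute, Memory, Writeback order of the terms; the variable names are illustrative.

```python
# Element delays in picoseconds, taken from the table.
t = {
    "pcq": 30, "setup": 20, "mux": 25, "ALU": 200, "mem": 250,
    "RFread": 150, "eq": 40, "AND": 15, "memwrite": 220, "RFwrite": 100,
}

# One entry per term of Tc = max { ... }.
stage_delay = {
    "Fetch": t["pcq"] + t["mem"] + t["setup"],
    "Decode": 2 * (t["RFread"] + t["mux"] + t["eq"] + t["AND"] + t["mux"] + t["setup"]),
    "Execute": t["pcq"] + t["mux"] + t["mux"] + t["ALU"] + t["setup"],
    "Memory": t["pcq"] + t["memwrite"] + t["setup"],
    "Writeback": 2 * (t["pcq"] + t["mux"] + t["RFwrite"]),
}

limiting_stage = max(stage_delay, key=stage_delay.get)
cycle_time_ps = stage_delay[limiting_stage]
```

The terms evaluate to 300, 550, 300, 270, and 310 ps, so the term containing the register file read and equality comparison determines Tc = 550 ps.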

Page 92: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

Program with IC = 100 billion instructions

Execution Time = IC × CPI × Tc
               = (100 × 10^9)(1.15)(550 × 10^-12)
               = 63 seconds

Pipelined Performance Example

Page 93: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

Processor       Execution Time (seconds)    Speedup (single-cycle as baseline)
Single-cycle    92.5                        1
Multicycle      133                         0.70
Pipelined       63                          1.47

Processor Performance Comparison
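The pipelined execution time and the speedup column follow from a short calculation. In this sketch the single-cycle and multicycle times are copied from the table rather than derived, and the speedups use the rounded 63-second figure, as the table does.

```python
# Execution Time = IC x CPI x Tc for the pipelined processor.
instruction_count = 100e9
average_cpi = 1.15
cycle_time_s = 550e-12
pipelined_time = instruction_count * average_cpi * cycle_time_s   # about 63 s

# Execution times in seconds, as listed in the comparison table.
execution_time = {"Single-cycle": 92.5, "Multicycle": 133.0, "Pipelined": 63.0}

# Speedup relative to the single-cycle baseline.
speedup = {
    name: execution_time["Single-cycle"] / seconds
    for name, seconds in execution_time.items()
}
```

A speedup below 1 for the multicycle processor means it is slower than the single-cycle baseline for this instruction mix.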

Page 94: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

• Deep Pipelining
• Branch Prediction
• Superscalar Processors
• Out of Order Processors
• Register Renaming
• SIMD
• Multithreading
• Multiprocessors

Advanced Microarchitecture

Page 95: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

• 10-20 stages typical
• Number of stages limited by:
  – Pipeline hazards
  – Sequencing overhead
  – Power
  – Cost

Deep Pipelining

Page 96: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

• Ideal pipelined processor: CPI = 1
• Branch misprediction increases CPI
• Static branch prediction:
  – Check direction of branch (forward or backward)
  – If backward, predict taken
  – Else, predict not taken
• Dynamic branch prediction:
  – Keep history of last (several hundred) branches in branch target buffer, record:
    • Branch destination
    • Whether branch was taken

Branch Prediction

Page 97: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

      add  $s1, $0, $0      # sum = 0
      add  $s0, $0, $0      # i = 0
      addi $t0, $0, 10      # $t0 = 10
for:  beq  $s0, $t0, done   # if i == 10, branch
      add  $s1, $s1, $s0    # sum = sum + i
      addi $s0, $s0, 1      # increment i
      j    for
done:

Branch Prediction Example

Page 98: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

• Remembers whether branch was taken the last time and does the same thing
• Mispredicts first and last branch of loop

1-Bit Branch Predictor

Page 99: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

Only mispredicts the last branch of the loop

[Figure: state transition diagram with four states: strongly taken (predict taken), weakly taken (predict taken), weakly not taken (predict not taken), and strongly not taken (predict not taken). Each taken or not-taken outcome moves the predictor one state toward the corresponding end.]

2-Bit Branch Predictor
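The two misprediction claims can be checked by simulating both predictors on the beq of the loop example, which is not taken for i = 0 through 9 and taken when i == 10. The sketch assumes the loop is executed more than once, so predictor state carries over from the previous run; the function names and the 0-to-3 state encoding are illustrative choices.

```python
def one_bit(outcomes, last=False):
    """1-bit predictor: predict whatever the branch did last time."""
    misses = 0
    for taken in outcomes:
        if last != taken:
            misses += 1
        last = taken
    return misses, last


def two_bit(outcomes, state=0):
    """2-bit saturating counter: 0 strongly not taken ... 3 strongly taken."""
    misses = 0
    for taken in outcomes:
        predict_taken = state >= 2
        if predict_taken != taken:
            misses += 1
        # Move one state toward the observed outcome, saturating at the ends.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses, state


# Outcomes of the loop's beq during one execution of the loop.
loop_run = [False] * 10 + [True]

# Warm up with one run, then measure a second run of the same loop.
_, last = one_bit(loop_run)
one_bit_misses, _ = one_bit(loop_run, last)

_, state = two_bit(loop_run)
two_bit_misses, _ = two_bit(loop_run, state)
```

On the second run the 1-bit predictor misses twice (first and last branch), while the 2-bit predictor misses only the final, taken branch, because a single taken outcome moves it only from strongly to weakly not taken.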

Page 100: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

• Multiple copies of datapath execute multiple instructions at once
• Dependencies make it tricky to issue multiple instructions at once

[Figure: two-way superscalar datapath. The instruction memory supplies two instructions per cycle; the register file has four read ports (A1, A2, A4, A5 with outputs RD1, RD2, RD4, RD5) and two write ports (A3/WD3, A6/WD6); there are two ALUs and a two-ported data memory (A1, A2, WD1, WD2, RD1, RD2).]

Superscalar

Page 101: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

lw  $t0, 40($s0)
add $t1, $s1, $s2
sub $t2, $s1, $s3
and $t3, $s3, $s4
or  $t4, $s1, $s5
sw  $s5, 80($s0)

Ideal IPC: 2
Actual IPC: 2

[Figure: superscalar pipeline diagram over cycles 1 to 8. The six instructions issue in pairs (lw and add, sub and and, or and sw), one pair per cycle.]

Superscalar Example

Page 102: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

lw  $t0, 40($s0)
add $t1, $t0, $s1
sub $t0, $s2, $s3
and $t2, $s4, $t0
or  $t3, $s5, $s6
sw  $s7, 80($t3)

Ideal IPC: 2
Actual IPC: 6/5 = 1.2

[Figure: superscalar pipeline diagram over cycles 1 to 9 for the sequence above. Dependencies on $t0 and $t3 prevent some instructions from issuing together, and one issue slot is marked "Stall"; the six instructions take five cycles to issue.]

Superscalar with Dependencies

Page 103: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

• Looks ahead across multiple instructions
• Issues as many instructions as possible at once
• Issues instructions out of order (as long as no dependencies)
• Dependencies:
  – RAW (read after write): one instruction writes, later instruction reads a register
  – WAR (write after read): one instruction reads, later instruction writes a register
  – WAW (write after write): one instruction writes, later instruction writes a register

Out of Order Processor
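The three dependency types can be detected from the destination and source registers of an instruction pair. The sketch below is a simplified illustration: the function `dependencies` and the `(dest, sources)` encoding are made up, and the operands come from the lw / add / sub / and sequence used in this chapter's superscalar examples.

```python
def dependencies(earlier, later):
    """Return the set of hazards between two instructions given as (dest, sources)."""
    e_dest, e_srcs = earlier
    l_dest, l_srcs = later
    found = set()
    if e_dest is not None and e_dest in l_srcs:
        found.add("RAW")   # later reads what earlier writes
    if l_dest is not None and l_dest in e_srcs:
        found.add("WAR")   # later writes what earlier reads
    if e_dest is not None and e_dest == l_dest:
        found.add("WAW")   # both write the same register
    return found


lw = ("$t0", ("$s0",))            # lw  $t0, 40($s0)
add = ("$t1", ("$t0", "$s1"))     # add $t1, $t0, $s1
sub = ("$t0", ("$s2", "$s3"))     # sub $t0, $s2, $s3
and_ = ("$t2", ("$s4", "$t0"))    # and $t2, $s4, $t0

lw_add = dependencies(lw, add)
add_sub = dependencies(add, sub)
sub_and = dependencies(sub, and_)
lw_sub = dependencies(lw, sub)
```

Only RAW is a true data dependency. WAR and WAW arise from reusing a register name, which is why renaming the destination of sub removes them.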

Page 104: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

• Instruction level parallelism (ILP): number of instructions that can be issued simultaneously (average < 3)
• Scoreboard: table that keeps track of:
  – Instructions waiting to issue
  – Available functional units
  – Dependencies

Out of Order Processor

Page 105: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

lw  $t0, 40($s0)
add $t1, $t0, $s1
sub $t0, $s2, $s3
and $t2, $s4, $t0
or  $t3, $s5, $s6
sw  $s7, 80($t3)

Ideal IPC: 2
Actual IPC: 6/4 = 1.5

[Figure: out-of-order pipeline diagram over cycles 1 to 8. lw and or issue first, then sw, then add and sub, then and. Annotations mark the two-cycle latency between the load and the use of $t0, the RAW dependencies, and the WAR dependency on $t0.]

Out of Order Processor Example

Page 106: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

lw  $t0, 40($s0)
add $t1, $t0, $s1
sub $t0, $s2, $s3
and $t2, $s4, $t0
or  $t3, $s5, $s6
sw  $s7, 80($t3)

Ideal IPC: 2
Actual IPC: 6/3 = 2

[Figure: out-of-order pipeline diagram over cycles 1 to 7 with register renaming. sub writes the renamed register $r0 instead of $t0 (sub $r0, $s2, $s3) and and reads it (and $t2, $s4, $r0), so sub can issue alongside lw. The remaining annotations are the RAW dependencies and the 2-cycle RAW between lw and add.]

Register Renaming

Page 107: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

• Single Instruction Multiple Data (SIMD)
  – Single instruction acts on multiple pieces of data at once
  – Common application: graphics
  – Perform short arithmetic operations (also called packed arithmetic)
• For example, add four 8-bit elements

padd8 $s2, $s0, $s1

[Figure: each 32-bit register divided into four 8-bit fields, with element 0 in bits 7:0. $s0 holds a3, a2, a1, a0; $s1 holds b3, b2, b1, b0; $s2 receives a3 + b3, a2 + b2, a1 + b1, a0 + b0.]

SIMD
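The effect of a packed add can be modeled on ordinary integers. This sketch treats each 32-bit value as four independent 8-bit lanes; it assumes each lane wraps around on overflow (the slide does not say whether padd8 wraps or saturates), and the Python function `padd8` is only a software model of the instruction.

```python
def padd8(a, b):
    """Add four packed 8-bit lanes of two 32-bit values, lane by lane."""
    result = 0
    for lane in range(4):
        weight = 256 ** lane                  # position of this lane in the word
        lane_a = (a // weight) % 256          # extract the 8-bit element
        lane_b = (b // weight) % 256
        lane_sum = (lane_a + lane_b) % 256    # assumed wraparound; no carry into the next lane
        result += lane_sum * weight
    return result


packed_sum = padd8(0x01020304, 0x10203040)
wrapped = padd8(0x000000FF, 0x00000001)
```

The key difference from an ordinary 32-bit add is that no carry propagates between lanes, so an overflow in one element leaves its neighbors untouched.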

Page 108: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

• Multithreading
  – Word processor: threads for typing, spell checking, printing
• Multiprocessors
  – Multiple processors (cores) on a single chip

Advanced Architecture Techniques

Page 109: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

• Process: program running on a computer
  – Multiple processes can run at once: e.g., surfing Web, playing music, writing a paper
• Thread: part of a program
  – Each process has multiple threads: e.g., a word processor may have threads for typing, spell checking, printing

Threading: Definitions

Page 110: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

• One thread runs at once
• When one thread stalls (for example, waiting for memory):
  – Architectural state of that thread stored
  – Architectural state of waiting thread loaded into processor and it runs
  – Called context switching
• Appears to user like all threads running simultaneously

Threads in Conventional Processor

Page 111: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

• Multiple copies of architectural state
• Multiple threads active at once:
  – When one thread stalls, another runs immediately
  – If one thread can’t keep all execution units busy, another thread can use them
• Does not increase instruction-level parallelism (ILP) of single thread, but increases throughput

Intel calls this “hyperthreading”

Multithreading

Page 112: Chapter 7 Digital Design and Computer Architecture, 2 nd Edition Chapter 7 David Money Harris and Sarah L. Harris.

• Multiple processors (cores) with a method of communication between them
• Types:
  – Homogeneous: multiple cores with shared memory
  – Heterogeneous: separate cores for different tasks (for example, DSP and CPU in cell phone)
  – Clusters: each core has own memory system

Multiprocessors

