15-447 Computer Architecture Fall 2007 © October 3rd, 2007 Majd F. Sakr msakr@qatar.cmu.edu...


15-447 Computer Architecture Fall 2007 ©

October 3rd, 2007

Majd F. Sakr

msakr@qatar.cmu.edu

www.qatar.cmu.edu/~msakr/15447-f07/

CS-447 – Computer Architecture

M,W 10-11:20am

Lecture 11: Single Cycle Datapath

Lecture Objectives

° Learn what a datapath is and how it provides the required functions.

° Appreciate why different implementation strategies affect the clock rate and CPI of a machine.

°Understand how the ISA determines many aspects of the hardware implementation.

Implementation vs. Performance

Performance of a processor is determined by

• Instruction count of a program

• CPI

• Clock cycle time (clock rate)

The compiler & the ISA determine the instruction count.

The implementation of the processor determines the CPI and the clock cycle time.
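
The three factors above multiply to give execution time. A minimal Python sketch of that product; the function name and the example numbers are illustrative assumptions, not part of the lecture:

```python
def cpu_time_ns(instruction_count, cpi, clock_cycle_ns):
    """Execution time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * clock_cycle_ns


# Hypothetical program: 100 instructions, CPI of 1, 8 ns clock cycle.
print(cpu_time_ns(100, 1, 8))  # 800
```

Lowering any one factor helps only if the other two do not grow by more.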

Possible Execution Steps of Any Instruction

° Instruction Fetch

° Instruction Decode and Register Fetch

° Execution of the Memory Reference Instruction

° Execution of Arithmetic-Logical operations

° Branch Instruction

° Jump Instruction

Instruction Processing

° Five steps:

• Instruction fetch (IF)

• Instruction decode and operand fetch (ID)

• ALU/execute (EX)

• Memory access (MEM), not required by every instruction

• Write-back (WB)

Datapath & Control

[Figure: block diagram of the datapath and its control. Labels: PC, Instruction memory (Address, Instruction), Registers (Register #), ALU, Data memory (Address), Control.]

Datapath Elements

The datapath contains 2 types of logic elements:

• Combinational (e.g., ALU): elements that operate on data values. Their outputs depend only on their current inputs.

• State (e.g., registers and memory): elements with internal storage. Their state is defined by the values they contain.
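
The difference between the two kinds of element can be sketched in Python. This is only an illustration: the names `alu_add` and `Register`, the 32-bit width, and the `clock` method standing in for a clock edge are assumptions, not something the slides define.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit datapath


def alu_add(a, b):
    """Combinational: the output depends only on the current inputs."""
    return (a + b) & MASK32


class Register:
    """State element: keeps its value until a clock edge writes a new one."""

    def __init__(self, value=0):
        self.value = value

    def clock(self, next_value, write_enable=True):
        # Models one clock edge; the stored value changes only when enabled.
        if write_enable:
            self.value = next_value & MASK32


pc = Register(0)
pc.clock(alu_add(pc.value, 4))
print(pc.value)  # 4
```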

State Elements

Pentium Processor Die

° State

• Registers

• Memory

° Control ROM

° Combinational logic (Compute)

Abstract View of the Datapath

[Figure: abstract view of the datapath. Labels: PC, Instruction memory (Address, Instruction), Registers (Register #), ALU, Data memory (Address).]

Single Cycle Implementation

° This simple processor can execute an ALU instruction, access memory, or compute the next instruction's address in a single cycle.

Program Counter

If each instruction needs 4 memory locations, then Next PC <= PC + 4

PC Datapath – Branch Offset

PC <= PC + Branch Offset
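
The two PC updates above can be written as one small function. This sketch follows the slide text literally; the function name, the `take_branch` flag, and the 32-bit wrap-around are assumptions added for illustration.

```python
def next_pc(pc, take_branch=False, branch_offset=0):
    """Return PC + 4, or PC + branch offset when a branch is taken."""
    step = branch_offset if take_branch else 4
    return (pc + step) & 0xFFFFFFFF  # assumed 32-bit PC


print(hex(next_pc(0x1000)))            # 0x1004
print(hex(next_pc(0x1000, True, 16)))  # 0x1010
```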

Abstract View After PC Basic Implementation

The Register File

° Arithmetic & Logical instructions (R-type) read the contents of 2 registers, perform an ALU operation, and write the result back to a register.

° Registers are stored in the register file. The register file has inputs to specify the registers, outputs for the data read, an input for the data to be written, and 1 control signal that decides whether the data is written. In addition, we need an ALU to perform the operations.
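
A minimal Python model of such a register file, for illustration only. The class and method names, the count of 32 registers, and the 32-bit width are assumptions; the sketch also does not model a hardwired zero register.

```python
class RegisterFile:
    """Two read ports and one write port gated by a RegWrite-style signal."""

    def __init__(self, num_registers=32):  # 32 registers is an assumption
        self.regs = [0] * num_registers

    def read(self, read_reg1, read_reg2):
        # Both read ports return their data in the same step.
        return self.regs[read_reg1], self.regs[read_reg2]

    def write(self, write_reg, write_data, reg_write):
        # Data is stored only when the control signal is asserted.
        if reg_write:
            self.regs[write_reg] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.write(1, 5, reg_write=True)
rf.write(2, 7, reg_write=True)
a, b = rf.read(1, 2)
rf.write(3, a + b, reg_write=True)   # ALU result written back
rf.write(3, 99, reg_write=False)     # ignored: control signal is low
print(rf.regs[3])  # 12
```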

The Register File

[Figure: register file and ALU. Labels: Instruction, Registers (Read register 1, Read register 2, Write register, Write data, Read data 1, Read data 2), RegWrite, ALU operation (3 bits), ALU result.]

R-Type Instructions

• Assembly (e.g., register-register signed addition)

ADD rd_reg rs_reg rt_reg

• Machine encoding

• Semantics

if MEM[PC] == ADD rd rs rt

GPR[rd] ← GPR[rs] + GPR[rt]

PC ← PC + 4
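
The semantics above can be traced in a short Python sketch. The field layout used here (opcode, rs, rt, rd, shamt, funct, with funct 0x20 for ADD) is the standard MIPS32 R-type encoding and is an assumption, since the encoding figure did not survive in this transcript. The function names are hypothetical, and the sketch wraps on overflow instead of trapping.

```python
def decode_r_type(word):
    """Split a 32-bit word into R-type fields (assumed MIPS32 layout)."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


def execute_add(gpr, pc, word):
    """GPR[rd] <- GPR[rs] + GPR[rt]; returns the next PC (PC + 4)."""
    f = decode_r_type(word)
    if f["opcode"] != 0 or f["funct"] != 0x20:
        raise ValueError("not an ADD encoding")
    gpr[f["rd"]] = (gpr[f["rs"]] + gpr[f["rt"]]) & 0xFFFFFFFF
    return pc + 4


gpr = [0] * 32
gpr[1], gpr[2] = 5, 7
pc = execute_add(gpr, 0x400, 0x00221820)  # ADD with rd=3, rs=1, rt=2
print(gpr[3], hex(pc))  # 12 0x404
```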


Datapath for Add

I-Type ALU Instructions

° Assembly (e.g., register-immediate signed addition)

ADDI rt_reg rs_reg immediate_16

° Machine encoding

° Semantics

if MEM[PC] == ADDI rt rs immediate

GPR[rt] ← GPR[rs] + sign-extend (immediate)

PC ← PC + 4
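
Sign extension is the one new step in these semantics. A minimal Python sketch, assuming a 16-bit immediate and 32-bit registers; the function names are hypothetical and overflow wraps instead of trapping:

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def execute_addi(gpr, pc, rt, rs, imm16):
    """GPR[rt] <- GPR[rs] + sign-extend(immediate); returns PC + 4."""
    gpr[rt] = (gpr[rs] + sign_extend16(imm16)) & 0xFFFFFFFF
    return pc + 4


gpr = [0] * 32
gpr[1] = 10
pc = execute_addi(gpr, 0, rt=2, rs=1, imm16=0xFFFD)  # immediate is -3
print(gpr[2], pc)  # 7 4
```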


Datapath for R and I-Type ALU Instructions

Data Memory

° The element needed to implement load and store instructions is the data memory. In addition, we use the existing ALU to compute the address to access.

° The data memory has 2 x-bit inputs (the address and the write data) and 1 x-bit output (the read data). In addition, it has 2 control lines: MemWrite and MemRead.

Data Memory

[Figure: datapath with data memory. Labels: Instruction, Registers (Read register 1, Read register 2, Write register, Write data, Read data 1, Read data 2), RegWrite, Sign extend, ALU (ALU operation, 3 bits; Zero; ALU result), Data memory (Address, Write data, Read data), MemRead, MemWrite.]

Load Instruction

° Assembly (e.g., load 4-byte word)

LW rt_reg offset_16 (base_reg)

° Machine encoding

° Semantics

if MEM[PC] == LW rt offset_16 (base)

EA = sign-extend(offset) + GPR[base]

GPR[rt] ← MEM[ translate(EA) ]

PC ← PC + 4
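
The load semantics above can be sketched in Python as follows. Assumptions for illustration: memory is a dict keyed by byte address and holding whole words, translate(EA) is treated as the identity, and all names are hypothetical.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def execute_lw(gpr, mem, pc, rt, offset16, base):
    """EA = sign-extend(offset) + GPR[base]; GPR[rt] <- MEM[EA]; PC + 4."""
    ea = (gpr[base] + sign_extend16(offset16)) & 0xFFFFFFFF
    gpr[rt] = mem[ea]  # address translation is skipped in this sketch
    return pc + 4


gpr = [0] * 32
gpr[4] = 0x1000                 # base register
mem = {0x0FFC: 11, 0x1008: 42}  # word values keyed by byte address
pc = execute_lw(gpr, mem, 0x20, rt=5, offset16=8, base=4)
print(gpr[5], hex(pc))  # 42 0x24
```

A negative offset works the same way: an offset of 0xFFFC (that is, -4) with the same base reads address 0x0FFC.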

LW Datapath

Branch Equal

° The beq (branch if equal) instruction has 3 operands: two registers that are compared for equality, and an n-bit offset used to compute the branch address relative to the PC.

Branch Equal

[Figure: datapath for beq. Labels: Instruction, Registers (Read register 1, Read register 2, Write data, Read data 1, Read data 2), RegWrite, Sign extend (16 to 32 bits), Shift left 2, PC + 4 from instruction datapath, Branch target, ALU (ALU operation, 3 bits), Zero output to branch control logic.]
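
Going by the labels of the Branch Equal figure (Sign extend 16 to 32, Shift left 2, PC + 4, Zero), the branch computation can be sketched in Python. The function names and the 32-bit width are assumptions; the comparison is modeled as an ALU subtraction whose Zero output selects the next PC.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def execute_beq(gpr, pc, rs, rt, offset16):
    """Return the branch target if GPR[rs] == GPR[rt], else PC + 4."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    zero = ((gpr[rs] - gpr[rt]) & 0xFFFFFFFF) == 0   # ALU Zero output
    target = (pc_plus_4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF
    return target if zero else pc_plus_4


gpr = [0] * 32
gpr[1] = gpr[2] = 5
print(hex(execute_beq(gpr, 0x100, 1, 2, 3)))  # 0x110
```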

Unconditional Jump

° Assembly

J immediate_26

° Machine encoding

° Semantics

if MEM[PC] == J immediate_26

target = { PC[31:28], immediate_26, 2'b00 }

PC ← target
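
The concatenation in the target expression amounts to a mask, a shift, and an OR. A small Python sketch that follows the slide's formula as written; the function name is hypothetical:

```python
def jump_target(pc, imm26):
    """target = { PC[31:28], immediate26, 2'b00 }"""
    upper = pc & 0xF0000000              # keep PC[31:28]
    lower = (imm26 & 0x03FFFFFF) << 2    # 26-bit immediate, then two zero bits
    return upper | lower


print(hex(jump_target(0x10000008, 0x100)))  # 0x10000400
```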

Unconditional Jump Datapath

Combining ALU and Memory Instructions

° The ALU datapath and the Memory datapath are similar. The differences are:

• The second input to the ALU is a register (R-type) or the offset (I-type).

• The value stored into the destination register comes from the ALU (R-type) or from memory (I-type load).

° Using 2 multiplexers (Mux) we can combine both datapaths.

Combining ALU and Memory Instructions

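[Figure: combined ALU and memory datapath. Labels: Instruction, Registers (Read register 1, Read register 2, Write data, Read data 1, Read data 2), RegWrite, Sign extend, Mux, ALUSrc, ALU (ALU operation, 3 bits; Zero; ALU result), Data memory (Address, Write data, Read data), MemRead, MemWrite, MemtoReg.]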
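
The two multiplexers can be modeled in Python as below. The select polarity (ALUSrc = 1 picks the sign-extended offset, MemtoReg = 1 picks the memory data) is an assumption, as are the function names, the add-only ALU, and the dict used for data memory.

```python
def mux(select, in0, in1):
    """2-to-1 multiplexer: in0 when select is 0, in1 when select is 1."""
    return in1 if select else in0


def write_back_value(read_data1, read_data2, sign_ext_imm, mem, alu_src, mem_to_reg):
    """Value sent to the register file's write-data port."""
    alu_in2 = mux(alu_src, read_data2, sign_ext_imm)   # first mux
    alu_result = (read_data1 + alu_in2) & 0xFFFFFFFF   # ALU fixed to add here
    mem_data = mem.get(alu_result, 0) if mem_to_reg else 0
    return mux(mem_to_reg, alu_result, mem_data)       # second mux


mem = {0x1008: 42}
print(write_back_value(3, 4, 100, mem, alu_src=0, mem_to_reg=0))     # 7  (R-type)
print(write_back_value(0x1000, 0, 8, mem, alu_src=1, mem_to_reg=1))  # 42 (load)
```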

The Complete Datapath

[Figure: the complete datapath. Labels: Instruction memory (Read address, Instruction), Add, ALU result, Registers (Read register 1, Read register 2, Write register, Write data, Read data 1, Read data 2), RegWrite, Sign extend, Shift left 2, ALUSrc, ALU (ALU operation, 3 bits; Zero; ALU result), Data memory (Address, Write data, Read data), MemRead, MemWrite, MemtoReg, Mux.]

Complete Datapath

What’s Wrong with Single Cycle?

° All instructions run at the speed of the slowest instruction.

° Adding a long instruction can hurt performance.

• What if you wanted to include multiply?

° You cannot reuse any parts of the processor.

• We have 3 different adders: one to calculate PC+4, one to calculate PC+4+offset, and the ALU.

° There is no profit in making the common case fast.

• Every instruction runs at the speed of the slowest instruction.

• This is particularly important for loads, as we will see later.

What’s Wrong with Single Cycle?

Component delays:

1 ns – Register read/write time

2 ns – ALU/adder

2 ns – memory access

0 ns – MUX, PC access, sign extend, ROM

Instruction latencies (ns):

       Get instr   Read reg   ALU operation   Mem   Write reg   Total
add:   2           1          2               -     1           6
beq:   2           1          2               -     -           5
sw:    2           1          2               2     -           7
lw:    2           1          2               2     1           8
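
The per-instruction latencies follow from summing the component delays each instruction uses. A short Python sketch of that arithmetic; the dictionary names and the step labels are illustrative choices, while the delay values are the ones listed on this slide.

```python
DELAY_NS = {"mem": 2, "reg": 1, "alu": 2}  # MUX, PC, sign extend count as 0 ns

# Components used by each instruction, in order.
STEPS = {
    "add": ["mem", "reg", "alu", "reg"],
    "beq": ["mem", "reg", "alu"],
    "sw":  ["mem", "reg", "alu", "mem"],
    "lw":  ["mem", "reg", "alu", "mem", "reg"],
}

latency = {op: sum(DELAY_NS[s] for s in steps) for op, steps in STEPS.items()}
cycle_ns = max(latency.values())  # single-cycle clock must fit the slowest one

print(latency)   # {'add': 6, 'beq': 5, 'sw': 7, 'lw': 8}
print(cycle_ns)  # 8
```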

Computing Execution Time

Assume: 100 instructions executed

25% of instructions are loads,

10% of instructions are stores,

45% of instructions are adds, and

20% of instructions are branches.

Single-cycle execution:

100 * 8ns = 800 ns

Optimal execution:

25*8ns + 10*7ns + 45*6ns + 20*5ns = 640 ns
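
The two totals above can be reproduced, and compared, with a few lines of Python. The variable names are illustrative; the latencies and the instruction mix are the ones given on these slides.

```python
LATENCY_NS = {"lw": 8, "sw": 7, "add": 6, "beq": 5}
MIX = {"lw": 25, "sw": 10, "add": 45, "beq": 20}  # counts out of 100 instructions

# Single cycle: every instruction takes as long as the slowest one.
single_cycle_ns = sum(MIX.values()) * max(LATENCY_NS.values())

# Optimal: every instruction takes only its own latency.
optimal_ns = sum(MIX[op] * LATENCY_NS[op] for op in MIX)

print(single_cycle_ns)               # 800
print(optimal_ns)                    # 640
print(single_cycle_ns / optimal_ns)  # 1.25
```

Under these assumptions, the single-cycle design is 1.25 times slower than the ideal in which each instruction takes only as long as it needs.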
