COMP541
Multicycle MIPS
Montek Singh
Apr 8, 2015
Topics
• Challenges w/ single-cycle MIPS implementation
• Multicycle MIPS
  – State elements
  – Now add registers between stages
  – How to control
• Performance
Review: Processor Performance
• Program execution time:
  Execution Time = (# instructions) × (cycles/instruction) × (seconds/cycle) = IC × CPI × Tc
• Definitions:
  – IC = instruction count
  – Cycles/instruction = CPI
  – Seconds/cycle = clock period = Tc
  – 1/CPI = instructions/cycle = IPC
• Challenge is to satisfy constraints of:
  – Cost
  – Power
  – Performance
Single-Cycle Performance (textbook version)
• Tc is limited by the critical path (lw)
  – lw is typically the longest instruction
[Figure: single-cycle MIPS datapath and control unit]
Single-Cycle Performance (textbook version)
• Single-cycle critical path:
  Tc = tpcq_PC + tmem + max(tRFread, tsext + tmux) + tALU + tmem + tmux + tRFsetup
• In most implementations, the limiting paths are memory, ALU, and register file:
  Tc = tpcq_PC + 2tmem + tRFread + tALU + tmux + tRFsetup
[Figure: single-cycle MIPS datapath and control unit]
Single-Cycle Performance Example
Tc = tpcq_PC + 2tmem + tRFread + tALU + tmux + tRFsetup
= [30 + 2(250) + 150 + 200 + 25 + 20] ps = 925 ps
What’s the max clock frequency?
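The question above can be checked numerically. The following is a minimal sketch that assumes the delay values from this example; the dictionary and function names are illustrative and do not come from the slides.

```python
# Element delays in picoseconds, taken from the example above.
DELAYS_PS = {
    "tpcq_PC": 30,
    "tmem": 250,
    "tRFread": 150,
    "tALU": 200,
    "tmux": 25,
    "tRFsetup": 20,
}


def single_cycle_tc_ps(d):
    """Tc = tpcq_PC + 2*tmem + tRFread + tALU + tmux + tRFsetup."""
    return (d["tpcq_PC"] + 2 * d["tmem"] + d["tRFread"]
            + d["tALU"] + d["tmux"] + d["tRFsetup"])


def max_frequency_ghz(tc_ps):
    """The maximum clock frequency is 1/Tc; 1/(1 ps) equals 1000 GHz."""
    return 1000.0 / tc_ps


tc = single_cycle_tc_ps(DELAYS_PS)
print(tc)                               # 925
print(round(max_frequency_ghz(tc), 2))  # 1.08
```

With these delays the single-cycle clock can run at roughly 1.08 GHz at most.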
Single-Cycle Performance Example
• For a program with 100 billion instructions executing on a single-cycle MIPS processor:
  Execution Time = # instructions × CPI × Tc
                 = (100 × 10^9)(1)(925 × 10^-12 s) = 92.5 seconds
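The same arithmetic as a small sketch; the function name `execution_time_s` is an illustrative choice, not something defined in the slides.

```python
def execution_time_s(instruction_count, cpi, tc_seconds):
    """Execution time = IC x CPI x Tc."""
    return instruction_count * cpi * tc_seconds


# 100 billion instructions, CPI = 1 (single-cycle), Tc = 925 ps.
single_cycle_time = execution_time_s(100e9, 1, 925e-12)
print(f"{single_cycle_time:.1f} s")  # 92.5 s
```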
Multicycle MIPS
Key idea: Break instruction execution into multiple clock cycles
Multicycle MIPS Processor
• Single-cycle microarchitecture:
  + simple
  - cycle time limited by longest instruction (lw)
  - two adders/ALUs and two memories
• Multicycle microarchitecture:
  + higher clock speed
  + simpler instructions run faster
  + reuse expensive hardware on multiple cycles
  - sequencing overhead
• Same design steps: datapath & control
Multicycle State Elements
• Replace Instruction and Data memories with a single unified memory
  – More realistic (buy one big RAM!)
  – Was not possible in single-cycle implementation: both instruction and data accesses needed within same clock cycle
• Now: use same memory twice if needed
  – instruction fetch and data access are in distinct clock cycles
[Figure: multicycle state elements: PC, unified instruction/data memory, register file]
Multicycle Datapath: lw instr fetch
• First consider executing lw
• STEP 1: Fetch instruction
  – introduce Instruction Register to buffer this instruction
  – a “non-architectural register”: not accessible to programmer
[Figure: datapath with the Instruction Register added at the memory output, enabled by IRWrite]
Multicycle Datapath: lw register read
• Read register $rs
  – insert another non-architectural register, A
  – buffers the value of $rs read from register file
[Figure: datapath with Instr25:21 driving register file input A1 and register A buffering RD1]
Multicycle Datapath: lw immediate
• Immediate field is sign-extended
  – for consistency, could insert another non-architectural register to buffer SignImm
  – skipped in this version because SignImm is a simple combinational function of Instr, which is already being held in Instruction Register
[Figure: datapath with the Sign Extend unit producing SignImm from Instr15:0]
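Sign extension really is a simple combinational function of the instruction bits, which is why it needs no register of its own. A minimal sketch follows; the function names and the sample instruction word are illustrative assumptions.

```python
def imm_field(instr):
    """Extract Instr[15:0], the 16-bit immediate field."""
    return instr & 0xFFFF


def sign_extend_16(imm16):
    """Sign-extend a 16-bit field to 32 bits (result kept as an unsigned int)."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:              # bit 15 set: negative value
        return imm16 | 0xFFFF0000   # copy the sign bit into bits 31:16
    return imm16


# lw $t0, -4($sp) encodes as 0x8FA8FFFC (opcode 100011, rs = 29, rt = 8).
instr = 0x8FA8FFFC
sign_imm = sign_extend_16(imm_field(instr))
print(hex(sign_imm))  # 0xfffffffc
```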
Multicycle Datapath: lw address
• ALU computes memory address
  – insert another register to buffer ALUOut
[Figure: datapath with the ALU adding SrcA (register A) and SrcB (SignImm); ALUResult is buffered in ALUOut]
Multicycle Datapath: lw memory read
• Same memory read now for data access
  – insert a multiplexer in front of memory’s address input
  – choose either PC or ALUOut as address, i.e., either instruction fetch or data access
  – controlled by new control signal IorD
[Figure: datapath with the IorD multiplexer selecting Adr from PC (0) or ALUOut (1); the memory output is buffered in the Data register]
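The address multiplexer can be modeled in a few lines. This is a sketch only: the function name, the tiny dictionary standing in for the unified memory, and the sample words are all assumptions made for illustration.

```python
def memory_address(pc, alu_out, i_or_d):
    """Address mux: IorD = 0 selects PC (instruction fetch),
    IorD = 1 selects ALUOut (data access)."""
    return alu_out if i_or_d else pc


# Hypothetical unified memory holding one instruction and one data word.
memory = {
    0x00: 0x8C080010,  # lw $t0, 16($0)
    0x10: 0xDEADBEEF,  # data word read by the lw
}

fetch_addr = memory_address(pc=0x00, alu_out=0x10, i_or_d=0)
data_addr = memory_address(pc=0x00, alu_out=0x10, i_or_d=1)
print(hex(memory[fetch_addr]))  # 0x8c080010
print(hex(memory[data_addr]))   # 0xdeadbeef
```

Because the two accesses happen in different clock cycles, one memory port serves both.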
Multicycle Datapath: lw write register
• Data from memory is written into register file
[Figure: datapath with the Data register driving WD3, Instr20:16 driving A3, and RegWrite enabling the register file write]
Multicycle Datapath: increment PC
• PC incremented by re-using the ALU to do PC + 4
  – in single-cycle, we had to introduce a dedicated +4 adder
  – in multi-cycle, same ALU used twice, in distinct cycles!
• Now using main ALU when it is not busy (instead of dedicated adder)
[Figure: datapath with ALUSrcA and ALUSrcB1:0 multiplexers in front of the ALU (constant 4 on a SrcB input) and PCWrite enabling the PC register]
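A sketch of the two ALU operand multiplexers, assuming the select encodings used by the controller states later in these slides (ALUSrcA = 0 selects PC; ALUSrcB = 01 selects the constant 4, 10 selects SignImm). The 00 and 11 inputs are only introduced by the sw/R-type and beq slides; the function name is illustrative.

```python
def alu_operands(pc, a, b, sign_imm, alu_src_a, alu_src_b):
    """Select SrcA and SrcB for the shared ALU."""
    src_a = a if alu_src_a else pc
    src_b_inputs = {
        0b00: b,                                # register B (R-type, beq)
        0b01: 4,                                # constant 4 (PC + 4)
        0b10: sign_imm,                         # immediate (lw/sw address)
        0b11: (sign_imm << 2) & 0xFFFFFFFF,     # shifted immediate (branch target)
    }
    return src_a, src_b_inputs[alu_src_b]


# Fetch cycle: ALUSrcA = 0, ALUSrcB = 01, so the ALU computes PC + 4.
src_a, src_b = alu_operands(pc=0x00400000, a=0, b=0, sign_imm=0,
                            alu_src_a=0, alu_src_b=0b01)
next_pc = (src_a + src_b) & 0xFFFFFFFF
print(hex(next_pc))  # 0x400004
```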
Multicycle Datapath: sw
• Compared to lw:
  – address computation is identical to lw
  – write data in $rt to memory
  – MemWrite will be 1 during the appropriate clock cycle
  – $rt is buffered using nonarchitectural register B
[Figure: datapath with register B buffering RD2 and driving the memory WD input; MemWrite drives the memory WE input]
Multicycle Datapath: R-type Instrs.
• Read from $rs and $rt
  – multiplexers in front of ALU choose $rs and $rt as operands
• Write ALUResult to register file
• Write to $rd (instead of $rt)
  – multiplexers in front of write address/data to register file
[Figure: datapath with the RegDst multiplexer (Instr20:16 or Instr15:11) on A3 and the MemtoReg multiplexer on WD3]
Multicycle Datapath: beq
• 2 tasks:
  – Determine whether values in rs and rt are equal
  – Calculate branch target address: BTA = (sign-extended immediate << 2) + (PC+4)
• ALU reused!
[Figure: datapath with <<2 on SignImm feeding a SrcB input, the PCSrc multiplexer selecting ALUResult or ALUOut, and PCEn derived from Branch, Zero, and PCWrite]
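The two beq tasks can be sketched as follows. The PCEn expression assumes the usual textbook gating for this datapath, PCEn = PCWrite OR (Branch AND Zero); the slides only name the signals, so treat that and the function names as assumptions.

```python
def branch_target(pc_plus_4, sign_imm):
    """BTA = (sign-extended immediate << 2) + (PC + 4), wrapped to 32 bits."""
    return (pc_plus_4 + (sign_imm << 2)) & 0xFFFFFFFF


def pc_enable(pc_write, branch, zero):
    """PC is written on an unconditional PCWrite, or on a taken branch."""
    return bool(pc_write or (branch and zero))


# Branch back two instructions: immediate = -2 (0xFFFFFFFE after sign extension).
bta = branch_target(pc_plus_4=0x00400010, sign_imm=0xFFFFFFFE)
print(hex(bta))                                      # 0x400008
print(pc_enable(pc_write=0, branch=1, zero=1))       # True  (rs == rt)
print(pc_enable(pc_write=0, branch=1, zero=0))       # False (rs != rt)
```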
Complete Multicycle Processor
• Caveat: same differences in functionality w.r.t. our lab version as single-cycle MIPS
[Figure: complete multicycle MIPS processor: datapath plus Control Unit driven by Op (Instr31:26) and Funct (Instr5:0)]
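Control Unit
[Figure: control unit block diagram. The Main Controller (FSM) takes Opcode5:0 and produces the multiplexer selects (MemtoReg, RegDst, IorD, PCSrc, ALUSrcB1:0, ALUSrcA), the register enables (IRWrite, MemWrite, PCWrite, Branch, RegWrite), and ALUOp1:0. The ALU Decoder takes ALUOp1:0 and Funct5:0 and produces ALUControl2:0.]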
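A sketch of the ALU Decoder as a lookup. The ALUOp meanings (00 = add, 01 = subtract, 10 = use Funct) match how the controller states in these slides use ALUOp; the specific 3-bit ALUControl codes and the Funct table are the common textbook encoding and are an assumption here, since the slides do not list them.

```python
# Assumed ALUControl encodings (textbook convention, not stated in the slides).
ALU_ADD, ALU_SUB, ALU_AND, ALU_OR, ALU_SLT = 0b010, 0b110, 0b000, 0b001, 0b111

# MIPS funct codes for the R-type instructions handled here.
FUNCT_TO_CONTROL = {
    0b100000: ALU_ADD,  # add
    0b100010: ALU_SUB,  # sub
    0b100100: ALU_AND,  # and
    0b100101: ALU_OR,   # or
    0b101010: ALU_SLT,  # slt
}


def alu_decoder(alu_op, funct):
    """Combine ALUOp from the main controller with Funct to get ALUControl."""
    if alu_op == 0b00:
        return ALU_ADD                  # address calculation, PC + 4
    if alu_op == 0b01:
        return ALU_SUB                  # beq comparison
    return FUNCT_TO_CONTROL[funct]      # R-type: operation chosen by Funct


print(format(alu_decoder(0b10, 0b100010), "03b"))  # 110
```

Splitting the control unit this way keeps the FSM small: it only decides between add, subtract, and "look at Funct".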
Main Controller FSM: Fetch
[Figure: multicycle datapath annotated with the control signal values for the Fetch state]
State               Outputs
S0: Fetch           IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
Transitions: Reset → S0
• Fetch instruction
• Also increment PC (because ALU not in use)
Note: signals only shown when needed and enables only when asserted.
Main Controller FSM: Decode
[Figure: multicycle datapath annotated with the control signal values for the Decode state]
State               Outputs
S0: Fetch           IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode          (none)
Transitions: Reset → S0 → S1
• No signals needed for decode
• Register values also fetched
  – Perhaps will not be used
Main Controller FSM: Address Calculation
[Figure: multicycle datapath annotated with the control signal values for the MemAdr state]
State               Outputs
S0: Fetch           IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode          (none)
S2: MemAdr          ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
Transitions: Reset → S0 → S1; S1 → S2 (Op = LW or Op = SW)
• Now change states depending on instr
• For lw or sw, need to compute addr
Main Controller FSM: lw
State               Outputs
S0: Fetch           IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode          (none)
S2: MemAdr          ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S3: MemRead         IorD = 1
S4: MemWriteback    RegDst = 0, MemtoReg = 1, RegWrite
Transitions: Reset → S0 → S1; S1 → S2 (Op = LW or Op = SW); S2 → S3 (Op = LW) → S4; S4 → S0
• For lw now need to read from memory
• Then write to register
Main Controller FSM: sw
State               Outputs
S0: Fetch           IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode          (none)
S2: MemAdr          ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S3: MemRead         IorD = 1
S4: MemWriteback    RegDst = 0, MemtoReg = 1, RegWrite
S5: MemWrite        IorD = 1, MemWrite
Transitions: Reset → S0 → S1; S1 → S2 (Op = LW or Op = SW); S2 → S3 (Op = LW) → S4; S2 → S5 (Op = SW); S4, S5 → S0
• sw just writes to memory
• One step shorter
Main Controller FSM: R-Type
State               Outputs
S0: Fetch           IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode          (none)
S2: MemAdr          ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S3: MemRead         IorD = 1
S4: MemWriteback    RegDst = 0, MemtoReg = 1, RegWrite
S5: MemWrite        IorD = 1, MemWrite
S6: Execute         ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
S7: ALUWriteback    RegDst = 1, MemtoReg = 0, RegWrite
Transitions: Reset → S0 → S1; S1 → S2 (Op = LW or Op = SW); S2 → S3 (Op = LW) → S4; S2 → S5 (Op = SW); S1 → S6 (Op = R-type) → S7; S4, S5, S7 → S0
• The R-type instructions have two steps: compute result in ALU and write to reg
Main Controller FSM: beq
State               Outputs
S0: Fetch           IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode          ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
S2: MemAdr          ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S3: MemRead         IorD = 1
S4: MemWriteback    RegDst = 0, MemtoReg = 1, RegWrite
S5: MemWrite        IorD = 1, MemWrite
S6: Execute         ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
S7: ALUWriteback    RegDst = 1, MemtoReg = 0, RegWrite
S8: Branch          ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 1, Branch
Transitions: Reset → S0 → S1; S1 → S2 (Op = LW or Op = SW); S2 → S3 (Op = LW) → S4; S2 → S5 (Op = SW); S1 → S6 (Op = R-type) → S7; S1 → S8 (Op = BEQ); S4, S5, S7, S8 → S0
• beq needs to use ALU twice, so consumes two cycles
  – One to compute addr
  – Another to decide on eq
• Can take advantage of decode when ALU not used to compute BTA (no harm if BTA not used)
Complete Multicycle Controller FSM
State               Outputs
S0: Fetch           IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode          ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
S2: MemAdr          ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S3: MemRead         IorD = 1
S4: MemWriteback    RegDst = 0, MemtoReg = 1, RegWrite
S5: MemWrite        IorD = 1, MemWrite
S6: Execute         ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
S7: ALUWriteback    RegDst = 1, MemtoReg = 0, RegWrite
S8: Branch          ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 1, Branch
Transitions: Reset → S0 → S1; S1 → S2 (Op = LW or Op = SW); S2 → S3 (Op = LW) → S4; S2 → S5 (Op = SW); S1 → S6 (Op = R-type) → S7; S1 → S8 (Op = BEQ); S4, S5, S7, S8 → S0
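The controller's next-state logic can be sketched as a plain function and used to count cycles per instruction. State names follow the slides; the opcode constants are the standard MIPS encodings, and the function names are illustrative. The return to S0 after the final state of each instruction is assumed from the usual form of this FSM.

```python
# Standard MIPS opcodes for the instructions covered so far.
LW, SW, RTYPE, BEQ = 0b100011, 0b101011, 0b000000, 0b000100


def next_state(state, op):
    """Next-state function for states S0..S8 of the main controller."""
    if state == "S0":                       # Fetch
        return "S1"
    if state == "S1":                       # Decode: dispatch on opcode
        if op in (LW, SW):
            return "S2"
        if op == RTYPE:
            return "S6"
        if op == BEQ:
            return "S8"
        raise ValueError(f"unsupported opcode {op:06b}")
    if state == "S2":                       # MemAdr
        return "S3" if op == LW else "S5"
    if state == "S3":                       # MemRead
        return "S4"
    if state == "S6":                       # Execute
        return "S7"
    return "S0"                             # S4, S5, S7, S8 return to Fetch


def cycles(op):
    """Count clock cycles from Fetch until the FSM returns to Fetch."""
    state, n = "S0", 0
    while True:
        n += 1
        state = next_state(state, op)
        if state == "S0":
            return n


for name, op in [("lw", LW), ("sw", SW), ("R-type", RTYPE), ("beq", BEQ)]:
    print(name, cycles(op))   # lw 5, sw 4, R-type 4, beq 3
```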
Main Controller FSM: addi
State               Outputs
S0: Fetch           IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
S1: Decode          ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
S2: MemAdr          ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S3: MemRead         IorD = 1
S4: MemWriteback    RegDst = 0, MemtoReg = 1, RegWrite
S5: MemWrite        IorD = 1, MemWrite
S6: Execute         ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
S7: ALUWriteback    RegDst = 1, MemtoReg = 0, RegWrite
S8: Branch          ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 1, Branch
S9: ADDIExecute     ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S10: ADDIWriteback  RegDst = 0, MemtoReg = 0, RegWrite
Transitions: Reset → S0 → S1; S1 → S2 (Op = LW or Op = SW); S2 → S3 (Op = LW) → S4; S2 → S5 (Op = SW); S1 → S6 (Op = R-type) → S7; S1 → S8 (Op = BEQ); S1 → S9 (Op = ADDI) → S10; S4, S5, S7, S8, S10 → S0
• Similar to R-type:
  – Add
  – Write back
Extended Functionality: j
[Figure: multicycle datapath extended for j. PCJump (27:0 from Instr25:0 << 2, combined with PC31:28) is the 10 input of a three-input PCSrc1:0 multiplexer (00, 01, 10)]
Control FSM: j
State               Outputs
S0: Fetch           IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 00, IRWrite, PCWrite
S1: Decode          ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
S2: MemAdr          ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S3: MemRead         IorD = 1
S4: MemWriteback    RegDst = 0, MemtoReg = 1, RegWrite
S5: MemWrite        IorD = 1, MemWrite
S6: Execute         ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
S7: ALUWriteback    RegDst = 1, MemtoReg = 0, RegWrite
S8: Branch          ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 01, Branch
S9: ADDIExecute     ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S10: ADDIWriteback  RegDst = 0, MemtoReg = 0, RegWrite
S11: Jump           PCSrc = 10, PCWrite
Transitions: Reset → S0 → S1; S1 → S2 (Op = LW or Op = SW); S2 → S3 (Op = LW) → S4; S2 → S5 (Op = SW); S1 → S6 (Op = R-type) → S7; S1 → S8 (Op = BEQ); S1 → S9 (Op = ADDI) → S10; S1 → S11 (Op = J); S4, S5, S7, S8, S10, S11 → S0
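The jump target selected by PCSrc = 10 can be sketched as below, following the bit ranges labeled in the datapath figure (Instr25:0 shifted left by 2 gives bits 27:0; bits 31:28 come from the PC). The function name and the sample instruction are illustrative assumptions.

```python
def jump_target(pc, instr):
    """PCJump = {PC[31:28], Instr[25:0] << 2}."""
    addr_field = instr & 0x03FFFFFF          # Instr[25:0]
    return (pc & 0xF0000000) | (addr_field << 2)


# j 0x00400020 encodes as opcode 000010 with address field 0x00400020 >> 2.
instr = (0b000010 << 26) | (0x00400020 >> 2)
pc = 0x00400004                              # PC was already incremented in Fetch
print(hex(instr))                   # 0x8100008
print(hex(jump_target(pc, instr)))  # 0x400020
```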
Multicycle Performance
• Instructions take different number of cycles:
  – 3 cycles: beq, j
  – 4 cycles: R-Type, sw, addi
  – 5 cycles: lw
• CPI is weighted average
• SPECINT2000 benchmark:
  – 25% loads
  – 10% stores
  – 11% branches
  – 2% jumps
  – 52% R-type
• Average CPI = (0.11 + 0.02)(3) + (0.52 + 0.10)(4) + (0.25)(5) = 4.12
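The weighted average can be checked with a short sketch using the instruction mix and cycle counts listed above; the dictionary names are illustrative.

```python
# Instruction mix (fraction of dynamic instructions) from the benchmark figures above.
MIX = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "rtype": 0.52}

# Cycles per instruction class on the multicycle processor.
CYCLES = {"load": 5, "store": 4, "branch": 3, "jump": 3, "rtype": 4}

# CPI is the mix-weighted average of the per-class cycle counts.
cpi = sum(MIX[kind] * CYCLES[kind] for kind in MIX)
print(round(cpi, 2))  # 4.12
```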
Multicycle Performance
[Figure: complete multicycle MIPS processor]
• Multicycle critical path: Tc = tpcq + tmux + max(tALU + tmux, tmem) + tsetup
Multicycle Performance Example
Tc = tpcq_PC + tmux + max(tALU + tmux, tmem) + tsetup
   = tpcq_PC + tmux + tmem + tsetup
   = [30 + 25 + 250 + 20] ps = 325 ps
Element               Parameter   Delay (ps)
Register clock-to-Q   tpcq_PC     30
Register setup        tsetup      20
Multiplexer           tmux        25
ALU                   tALU        200
Memory read           tmem        250
Register file read    tRFread     150
Register file setup   tRFsetup    20
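A sketch of the multicycle cycle-time formula using the delays from the table; the function and parameter names are illustrative.

```python
def multicycle_tc_ps(tpcq, tmux, talu, tmem, tsetup):
    """Tc = tpcq + tmux + max(tALU + tmux, tmem) + tsetup."""
    return tpcq + tmux + max(talu + tmux, tmem) + tsetup


# Delays in picoseconds from the table above.
tc_multi = multicycle_tc_ps(tpcq=30, tmux=25, talu=200, tmem=250, tsetup=20)
print(tc_multi)  # 325
```

The memory path (250 ps) is longer than the ALU-plus-multiplexer path (225 ps), so memory sets the clock period here.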
Multicycle Performance Example
• For a program with 100 billion instructions executing on a multicycle MIPS processor:
  – CPI = 4.12
  – Tc = 325 ps
  Execution Time = (# instructions) × CPI × Tc
                 = (100 × 10^9)(4.12)(325 × 10^-12 s) = 133.9 seconds
• This is slower than the single-cycle processor (92.5 seconds). Why?
  – Not all steps the same length
  – Sequencing overhead for each step (tpcq + tsetup = 50 ps)
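The comparison can be reproduced with a short sketch; the function name is illustrative, and the inputs are the CPI and Tc values from the two examples.

```python
def execution_time_s(instruction_count, cpi, tc_ps):
    """Execution time = IC x CPI x Tc, with Tc given in picoseconds."""
    return instruction_count * cpi * tc_ps * 1e-12


multi = execution_time_s(100e9, 4.12, 325)   # multicycle
single = execution_time_s(100e9, 1, 925)     # single-cycle
print(f"multicycle:   {multi:.1f} s")        # 133.9 s
print(f"single-cycle: {single:.1f} s")       # 92.5 s
print(f"ratio:        {multi / single:.2f}") # 1.45
```

The clock is about 2.8 times faster, but each instruction takes 4.12 cycles on average, so the multicycle design ends up roughly 45% slower on this mix.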
Review: Single-Cycle MIPS Processor
[Figure: single-cycle MIPS processor, including the Jump multiplexer and PCJump path]
Review: Multicycle MIPS Processor
[Figure: multicycle MIPS processor, including the PCJump input to the PCSrc multiplexer]
Next Time
• Next topic: We’ll look at pipelined MIPS
  – Improving throughput (and adding complexity!) by trying to use all of the hardware every cycle