CS252/KubiatowiczLec 5.1
9/15/00
CS252 Graduate Computer Architecture
Lecture 5
Software Scheduling around Hazards
Hardware Out-of-order Scheduling
September 15, 2000
Prof. John Kubiatowicz
ECE 313 Fall 2004
Lecture 19 - Pipelining 3
Techniques to Increase ILP
• Forwarding
• Branch Prediction
• Superpipelining
• Superscalar with Static Multiple Issue
  – VLIW
• Superscalar with Dynamic Multiple Issue
• Superscalar with Speculation
• Superscalar with Simultaneous Multithreading (SMT)
Static Multiple Issue
Key idea: issue (decode & execute) multiple instructions in each clock cycle
Example: in MIPS, issue one load/store and one ALU/branch instruction per clock cycle
Instruction type   Pipe stages
ALU or branch      IF  ID  EX  MEM WB
Load/store         IF  ID  EX  MEM WB
ALU or branch          IF  ID  EX  MEM WB
Load/store             IF  ID  EX  MEM WB
ALU or branch              IF  ID  EX  MEM WB
Load/store                 IF  ID  EX  MEM WB
ALU or branch                  IF  ID  EX  MEM WB
Load/store                     IF  ID  EX  MEM WB
(Fig. 6.44, old 6.57)
Example - A Static Multiple Issue MIPS
[Figure: two-issue MIPS datapath with PC, instruction memory, register file, sign-extend units, and data memory; one ALU executes ALU/branch instructions and a second ALU computes addresses for load/store instructions]
(Fig. 6.45, old 6.58)
VLIW / EPIC Processors
• VLIW: Very Long Instruction Word
  – Functional units exposed in the instruction word
  – Static scheduling by the compiler
  – Pipeline is exposed; the compiler must schedule around delays to get the right result
  – Examples: Philips TriMedia, Texas Instruments C6000
• Explicitly Parallel Instruction Computing (EPIC)
  – Three 41-bit instructions in each instruction packet
  – Compiler determines parallelism
  – Hardware checks dependencies and forwards/stalls
  – Examples: Intel Itanium, Itanium 2
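The packet arithmetic can be sketched in Python. The sketch assumes the IA-64 bundle layout, in which a 5-bit template field (telling the hardware which unit types the three slots target) sits in the low-order bits and the three 41-bit instruction slots follow it, giving 5 + 3 × 41 = 128 bits per bundle. `pack_bundle` and `unpack_bundle` are hypothetical helpers, and template decoding is not modeled.

```python
# Assumed IA-64 layout: bits 0-4 hold the template, then slot 0, slot 1, slot 2.
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOTS = 3
BUNDLE_BITS = TEMPLATE_BITS + SLOTS * SLOT_BITS  # 5 + 3 * 41 = 128


def pack_bundle(template, slots):
    """Pack a 5-bit template and three 41-bit instruction words into one int."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    if len(slots) != SLOTS:
        raise ValueError("a bundle holds exactly three instructions")
    bundle = template
    for i, word in enumerate(slots):
        if not 0 <= word < (1 << SLOT_BITS):
            raise ValueError(f"slot {i} must fit in 41 bits")
        bundle |= word << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a 128-bit bundle back into (template, [slot0, slot1, slot2])."""
    slot_mask = (1 << SLOT_BITS) - 1
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = [(bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & slot_mask
             for i in range(SLOTS)]
    return template, slots


b = pack_bundle(0x10, [0x123, 0x456, 0x789])
print(BUNDLE_BITS, unpack_bundle(b))
```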
Itanium Block Diagram
Source: Extreme Tech www.extremetech.com
Software Manipulation to Increase ILP
• Software transformations can increase ILP
  – Code reordering to reduce stalls
  – Loop unrolling
Example (p. 438):
Loop: lw   $t0, 0($s1)        # $t0 = array element
      addu $t0, $t0, $s2      # add scalar in $s2
      sw   $t0, 0($s1)        # store result
      addi $s1, $s1, -4       # decrement pointer
      bne  $s1, $zero, Loop   # branch if $s1 != 0
Goal: reorder the code to speed up superscalar execution
Software Manipulation - Reordering Code
      ALU or branch instruction   Data transfer instruction   Clock
Loop:                             lw   $t0, 0($s1)            1
      addi $s1, $s1, -4                                       2
      addu $t0, $t0, $s2                                      3
      bne  $s1, $zero, Loop       sw   $t0, 4($s1)            4
Note the sparse utilization of the superscalar pipeline! End result: 5 instructions in 4 clocks, CPI = 0.8, IPC = 1.25
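The timing rules behind this schedule can be checked mechanically. The Python sketch below assumes one ALU/branch slot and one data-transfer slot per clock, that an ALU result can be used in the next clock, and that a loaded value can be used two clocks after the load (a one-clock load-use delay); instructions issued in the same clock read their registers before either one writes. `check_schedule` is a hypothetical helper, not part of any tool.

```python
from itertools import groupby

ALU_DELAY = 1   # clocks until an ALU result may be used
LOAD_DELAY = 2  # clocks until a loaded value may be used

# (clock, slot, text, destination, sources, is_load); slot is "alu" or "mem"
REORDERED = [
    (1, "mem", "lw   $t0, 0($s1)", "$t0", ["$s1"], True),
    (2, "alu", "addi $s1, $s1, -4", "$s1", ["$s1"], False),
    (3, "alu", "addu $t0, $t0, $s2", "$t0", ["$t0", "$s2"], False),
    (4, "alu", "bne  $s1, $zero, Loop", None, ["$s1", "$zero"], False),
    (4, "mem", "sw   $t0, 4($s1)", None, ["$t0", "$s1"], False),
]


def check_schedule(schedule):
    """Validate slot usage and RAW timing; return (IPC, CPI)."""
    ready = {}  # register -> first clock at which its new value may be read
    entries = sorted(schedule, key=lambda e: e[0])
    for clock, group in groupby(entries, key=lambda e: e[0]):
        group = list(group)
        slots = [e[1] for e in group]
        if len(slots) != len(set(slots)):
            raise ValueError(f"clock {clock}: issue slot used twice")
        # Every instruction in the packet reads before any of them writes.
        for _, _, text, _, srcs, _ in group:
            for reg in srcs:
                if ready.get(reg, 0) > clock:
                    raise ValueError(f"clock {clock}: {text} reads {reg} too early")
        for _, _, _, dest, _, is_load in group:
            if dest is not None:
                ready[dest] = clock + (LOAD_DELAY if is_load else ALU_DELAY)
    clocks = entries[-1][0] - entries[0][0] + 1
    return len(entries) / clocks, clocks / len(entries)


print(check_schedule(REORDERED))  # (1.25, 0.8)
```

Moving the `addu` to clock 2 makes `check_schedule` raise, because under these assumptions the loaded `$t0` is not usable until clock 3.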
Software Manipulation - Loop Unrolling
Assume the loop count is a multiple of 4, and unroll:
      ALU or branch instruction   Data transfer instruction   Clock
Loop: addi $s1, $s1, -16          lw   $t0, 0($s1)            1
                                  lw   $t1, 12($s1)           2
      addu $t0, $t0, $s2          lw   $t2, 8($s1)            3
      addu $t1, $t1, $s2          lw   $t3, 4($s1)            4
      addu $t2, $t2, $s2          sw   $t0, 16($s1)           5
      addu $t3, $t3, $s2          sw   $t1, 12($s1)           6
                                  sw   $t2, 8($s1)            7
      bne  $s1, $zero, Loop       sw   $t3, 4($s1)            8
End result: 4 loop iterations (14 instructions) in 8 clocks, IPC = 1.75, 2 clocks per iteration! The first lw issues together with the addi and reads $s1 before the decrement, so $t0 is stored back at 16($s1).
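A functional model makes the unrolling easy to check. This Python sketch (the function names and the dictionary memory model are assumptions) runs the original loop and the unrolled loop on the same data. Because `lw $t0` issues in the same clock as the `addi`, it reads `$s1` before the decrement, so `$t0`'s element sits at offset 16 from the updated pointer, while the other three loads see the decremented `$s1`.

```python
def original_loop(mem, s1, s2):
    """One element per iteration: lw, addu, sw, addi -4, bne."""
    if s1 <= 0 or s1 % 4:
        raise ValueError("$s1 must be a positive multiple of 4")
    mem = dict(mem)  # word address -> value
    while True:
        mem[s1] = mem[s1] + s2   # lw / addu / sw at 0($s1)
        s1 -= 4                  # addi $s1, $s1, -4
        if s1 == 0:              # bne $s1, $zero, Loop
            return mem


def unrolled_loop(mem, s1, s2):
    """Four elements per iteration, using $t0-$t3 as in the schedule above."""
    if s1 <= 0 or s1 % 16:
        raise ValueError("loop count must be a multiple of 4")
    mem = dict(mem)
    while True:
        t0 = mem[s1]             # clock 1: lw reads the old $s1
        s1 -= 16                 # clock 1: addi $s1, $s1, -16
        t1, t2, t3 = mem[s1 + 12], mem[s1 + 8], mem[s1 + 4]
        mem[s1 + 16] = t0 + s2   # sw $t0, 16($s1)
        mem[s1 + 12] = t1 + s2
        mem[s1 + 8] = t2 + s2
        mem[s1 + 4] = t3 + s2
        if s1 == 0:              # bne $s1, $zero, Loop
            return mem


data = {addr: addr * 10 for addr in range(4, 36, 4)}  # 8 words at 4..32
assert original_loop(data, 32, 7) == unrolled_loop(data, 32, 7)
print(14 / 8)  # 14 instructions in 8 clocks: IPC = 1.75
```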
Dynamic Multiple Issue
Key ideas:
• "Look past" stalls for instructions that can execute:
      lw   $t0, 20($t2)
      addu $t1, $t0, $s2    # addu stalls until $t0 is available
      sub  $s4, $s4, $s3    # sub is ready to execute but is blocked by the stall!
      slti $t5, $s4, 20
• Execute instructions out of order
• Use multiple functional units for parallel execution
• Forward results between functional units when necessary
• Update registers in the original program order
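To see what "looking past" the stall buys, the Python sketch below issues the four instructions one per cycle, either in order or out of order, under assumed latencies (a load result is usable two cycles after issue, an ALU result one cycle after). It tracks only RAW dependences, which is enough here because this sequence has no WAR or WAW conflicts; `schedule` is a hypothetical helper.

```python
# (text, destination, sources, latency in cycles); latencies are assumptions
PROGRAM = [
    ("lw   $t0, 20($t2)", "$t0", ["$t2"], 2),
    ("addu $t1, $t0, $s2", "$t1", ["$t0", "$s2"], 1),
    ("sub  $s4, $s4, $s3", "$s4", ["$s4", "$s3"], 1),
    ("slti $t5, $s4, 20", "$t5", ["$s4"], 1),
]


def schedule(program, out_of_order):
    """Return the issue cycle of each instruction (one issue per cycle)."""
    ready_at = {}            # register -> cycle its new value is available
    issue = [None] * len(program)
    cycle = 0
    while None in issue:
        for i, (_, dest, srcs, latency) in enumerate(program):
            if issue[i] is not None:
                continue
            if all(ready_at.get(r, 0) <= cycle for r in srcs):
                issue[i] = cycle
                ready_at[dest] = cycle + latency
                break
            if not out_of_order:
                break        # in order: a stalled instruction blocks the rest
        cycle += 1
    return issue


print("in order:    ", schedule(PROGRAM, out_of_order=False))
print("out of order:", schedule(PROGRAM, out_of_order=True))
```

In order, `sub` waits behind the stalled `addu` and issues in cycle 3; out of order it issues in cycle 1, and `slti` issues one cycle earlier as a result.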
Speculation
• Guess the outcome of an instruction (e.g., a branch or load)
• Based on the guess, start executing instructions
• Cancel the started instructions if the guess is incorrect
• Complicating factors
  – Must buffer instruction results until the outcome is known
  – Exceptions in speculated instructions: how can you have an exception in an instruction that didn't execute?
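The buffering requirement can be illustrated with a toy Python model. It assumes a single outstanding guess, results buffered in program order, and exceptions recorded rather than raised until the guess is confirmed; `SpeculationBuffer` and its methods are hypothetical, and real processors use hardware structures such as a reorder buffer rather than a Python list.

```python
from typing import Callable, Dict, List, Optional, Tuple


class SpeculationBuffer:
    """Buffers results of speculatively executed instructions (simplified)."""

    def __init__(self, registers: Dict[str, int]):
        self.registers = dict(registers)
        self.pending: List[Tuple[str, Optional[int], Optional[ArithmeticError]]] = []

    def execute(self, dest: str, compute: Callable[[], int]) -> None:
        """Execute now, but hold the result (or the exception) in the buffer."""
        try:
            self.pending.append((dest, compute(), None))
        except ArithmeticError as exc:
            self.pending.append((dest, None, exc))

    def resolve(self, guess_correct: bool) -> None:
        """Commit buffered results in program order, or squash them all."""
        pending, self.pending = self.pending, []
        if not guess_correct:
            return  # squashed: results and exceptions are simply discarded
        for dest, value, exc in pending:
            if exc is not None:
                raise exc  # the exception becomes visible only at commit
            self.registers[dest] = value


buf = SpeculationBuffer({"r1": 10, "r2": 0})
# A wrong-path instruction divides by zero; nothing is raised yet.
buf.execute("r3", lambda: buf.registers["r1"] // buf.registers["r2"])
buf.resolve(guess_correct=False)  # wrong guess: no exception, no update
print(buf.registers)
```

If the guess had been correct, `resolve(True)` would raise the buffered `ZeroDivisionError` at commit time; squashing the wrong path makes the exception disappear, which is how this model answers the question above.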
Superscalar Dynamic Pipelining
(Fig. 6.49, old 6.61)
[Figure: an instruction fetch and decode unit (in-order issue) feeds four reservation stations; each reservation station feeds a functional unit (Integer, Integer, Floating point, Load/Store) that executes out of order; results go to a commit unit (in-order commit)]
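In-order commit on top of out-of-order completion can be sketched as a small calculation. The Python function below assumes an instruction may commit in the same cycle it completes, but only after every older instruction has committed, and that at most `width` instructions commit per cycle; the completion cycles are made-up inputs and `commit_cycles` is a hypothetical name.

```python
def commit_cycles(done, width):
    """Cycle in which each instruction commits, given its completion cycle.

    done[] is in program order. An instruction commits no earlier than it
    completes, never before an older instruction, and at most `width`
    instructions commit in any one cycle.
    """
    commits = []
    cycle = 0
    used = 0                     # commits already made in `cycle`
    for finished in done:
        if finished > cycle:
            cycle, used = finished, 0
        if used == width:
            cycle, used = cycle + 1, 0
        commits.append(cycle)
        used += 1
    return commits


done = [5, 2, 3, 9, 4]           # out-of-order completion (made-up numbers)
print(commit_cycles(done, width=4))
print(commit_cycles(done, width=2))
```

With `width=4` this gives [5, 5, 5, 9, 9]: instructions that finished early wait for the oldest one. With `width=2`, the third instruction slips to cycle 6.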
Can HW reduce CPI to 1- or IPC to 1+?
• Why in HW/at run time?
  – Works when the real dependence can't be known at compile time
  – Simpler compiler
  – Code for one machine runs well on another
• Key idea #1: Allow instructions behind a stall to proceed
      DIVD F0,F2,F4
      ADDD F10,F0,F8
      SUBD F12,F8,F14
  Out-of-order execution ⇒ out-of-order completion?
• Key idea #2: Register renaming
      Before:           After renaming:
      DIVD F0,F2,F4     DIVD F0,F2,F4
      ADDD F10,F0,F8    ADDD F10,F0,F8
      SUBD F0,F8,F14    SUBD F100,F8,F14
      MULD F6,F10,F0    MULD F6,F10,F100
  Renaming removes WAR and WAW hazards completely: SUBD's new destination F100 no longer collides with DIVD's write of F0 (WAW) or ADDD's read of F0 (WAR).
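The renaming in the example can be reproduced with a short Python sketch. To match the slide, it renames a destination only when that architectural name has already been read or written earlier in the sequence, drawing new names from F100 upward; real hardware typically renames every destination onto a pool of physical registers. `rename` is a hypothetical helper.

```python
# (op, dest, src1, src2) in program order
PROGRAM = [
    ("DIVD", "F0", "F2", "F4"),
    ("ADDD", "F10", "F0", "F8"),
    ("SUBD", "F0", "F8", "F14"),
    ("MULD", "F6", "F10", "F0"),
]


def rename(program, next_free=100):
    """Give a fresh register to any destination whose name is already in use."""
    current = {}    # architectural register -> name holding its latest value
    seen = set()    # architectural registers read or written so far
    out = []
    for op, dest, src1, src2 in program:
        srcs = [current.get(s, s) for s in (src1, src2)]  # read latest names
        if dest in seen:                 # reuse would create a WAR or WAW
            new_dest = f"F{next_free}"
            next_free += 1
        else:
            new_dest = dest
        seen.update((src1, src2, dest))
        current[dest] = new_dest
        out.append(f"{op} {new_dest},{srcs[0]},{srcs[1]}")
    return out


for line in rename(PROGRAM):
    print(line)
```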
Moving beyond the five-stage pipeline:
• Why limit performance for slow/less frequent ops?
  – Variable latencies -> out-of-order execution desirable
• How do we prevent WAR and WAW hazards?
• How do we deal with variable latency?
  – Forwarding for RAW hazards will be harder.
Clock cycle number
Instruction     1     2     3     4     5     6     7     8     9     10    11    12    13    14    15    16    17
LD F6,34(R2)    IF    ID    EX    MEM   WB
LD F2,45(R3)          IF    ID    EX    MEM   WB
MULTD F0,F2,F4              IF    ID    stall M1    M2    M3    M4    M5    M6    M7    M8    M9    M10   MEM   WB
SUBD F8,F6,F2                     IF    ID    A1    A2    MEM   WB
DIVD F10,F0,F6                          IF    ID    stall stall stall stall stall stall stall stall stall D1    D2
ADDD F6,F8,F2                                 IF    ID    A1    A2    MEM   WB
RAW: DIVD stalls until MULTD produces F0.
WAR: ADDD, which writes F6, finishes while DIVD, which reads F6, is still stalled.
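The RAW and WAR labels can be derived mechanically. The Python sketch below lists every RAW, WAR, and WAW dependence among the six instructions (indices 0 to 5 in program order); whether a dependence actually becomes a hazard depends on timing, as the table shows for the DIVD/ADDD pair. `dependences` is a hypothetical helper.

```python
# (destination, sources) for the six instructions above, in program order
PROGRAM = [
    ("F6", ["R2"]),         # 0: LD    F6,34(R2)
    ("F2", ["R3"]),         # 1: LD    F2,45(R3)
    ("F0", ["F2", "F4"]),   # 2: MULTD F0,F2,F4
    ("F8", ["F6", "F2"]),   # 3: SUBD  F8,F6,F2
    ("F10", ["F0", "F6"]),  # 4: DIVD  F10,F0,F6
    ("F6", ["F8", "F2"]),   # 5: ADDD  F6,F8,F2
]


def dependences(program):
    """Return (kind, earlier, later, register) for each RAW, WAR, and WAW pair."""
    deps = []
    last_writer = {}   # register -> index of its most recent writer
    readers = {}       # register -> indices that read it since that write
    for j, (dest, srcs) in enumerate(program):
        for reg in srcs:
            if reg in last_writer:
                deps.append(("RAW", last_writer[reg], j, reg))
        if dest in last_writer:
            deps.append(("WAW", last_writer[dest], j, dest))
        for i in readers.get(dest, []):
            deps.append(("WAR", i, j, dest))
        for reg in srcs:
            readers.setdefault(reg, []).append(j)
        last_writer[dest] = j
        readers[dest] = []   # later writers only conflict with newer readers
    return deps


for dep in dependences(PROGRAM):
    print(dep)
```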
Scoreboard Architecture (CDC 6600)
[Figure: the registers connect to the functional units (FP Mult, FP Mult, FP Divide, FP Add, Integer) and to memory; the SCOREBOARD controls them all]
Basic Pipelined MIPS
[Figure: basic five-stage pipelined MIPS datapath: PC, instruction memory, register file, sign extension, ALU, and data memory, separated by the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers, with their control and data signals]
Four Stages of Scoreboard Control
• Issue: decode instructions & check for structural hazards (ID1)
  – Instructions issued in program order (for hazard checking)
  – Don't issue if there is a structural hazard
  – Don't issue if the instruction is output dependent on any previously issued but uncompleted instruction (WAW hazards)
• Read operands: wait until no data hazards, then read operands (ID2)
  – All real dependencies (RAW hazards) are resolved in this stage, since we wait for instructions to write back data.
  – No forwarding of data in this model!
Four Stages of Scoreboard Control
• Execution: operate on operands (EX)
  – The functional unit begins execution upon receiving operands. When the result is ready, it notifies the scoreboard that it has completed execution.
• Write result: finish execution (WB)
  – Stall until no WAR hazards with previous instructions. Example:
      DIVD F0,F2,F4
      ADDD F10,F0,F8
      SUBD F8,F8,F14
  – The CDC 6600 scoreboard would stall SUBD (at write result) until ADDD reads its operands.
Three Parts of the Scoreboard
• Instruction status: which of the 4 steps the instruction is in
• Functional unit status: indicates the state of the functional unit (FU); 9 fields for each functional unit:
  – Busy: indicates whether the unit is busy or not
  – Op: operation to perform in the unit (e.g., + or –)
  – Fi: destination register
  – Fj, Fk: source-register numbers
  – Qj, Qk: functional units producing source registers Fj, Fk
  – Rj, Rk: flags indicating when Fj, Fk are ready
• Register result status: indicates which functional unit will write each register, if one exists; blank when no pending instruction will write that register
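The three parts map onto three small tables. This Python sketch, with hypothetical class and method names, shows the tables and the issue-stage bookkeeping; issuing LD F2 and then MULTD reproduces the Mult1 row of the cycle 6 snapshot in the example (Qj = Integer, Rj = No, Rk = Yes). Instruction status is reduced to a step name per instruction.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FUStatus:
    """One row of the functional unit status table (the 9 fields)."""
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None   # FU producing Fj (None if no pending producer)
    qk: Optional[str] = None
    rj: bool = False           # Fj ready and not yet read
    rk: bool = False


class Scoreboard:
    def __init__(self, units):
        self.instruction_status: Dict[str, str] = {}   # instruction -> step
        self.fu: Dict[str, FUStatus] = {u: FUStatus() for u in units}
        self.result: Dict[str, str] = {}               # register -> writing FU

    def issue(self, text, unit, op, dest, s1, s2):
        """Issue-stage bookkeeping; returns False on a structural or WAW hazard."""
        if self.fu[unit].busy or dest in self.result:
            return False
        qj = self.result.get(s1) if s1 else None
        qk = self.result.get(s2) if s2 else None
        self.fu[unit] = FUStatus(True, op, dest, s1, s2, qj, qk,
                                 rj=s1 is not None and qj is None,
                                 rk=s2 is not None and qk is None)
        self.result[dest] = unit
        self.instruction_status[text] = "Issue"
        return True


sb = Scoreboard(["Integer", "Mult1", "Mult2", "Add", "Divide"])
sb.issue("LD F2,45(R3)", "Integer", "Load", "F2", None, "R3")
sb.issue("MULTD F0,F2,F4", "Mult1", "Mult", "F0", "F2", "F4")
print(sb.fu["Mult1"])   # Qj = Integer: F2 is still being loaded
```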
Scoreboard Example
Instruction status:
Instruction    j     k     Issue  Read Oper  Exec Comp  Write Result
LD    F6       34+   R2
LD    F2       45+   R3
MULTD F0       F2    F4
SUBD  F8       F6    F2
DIVD  F10      F0    F6
ADDD  F6       F8    F2
Functional unit status:
Time  Name     Busy  Op    Fi    Fj    Fk    Qj      Qk      Rj    Rk
                           dest  S1    S2    FU      FU      Fj?   Fk?
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
      Divide   No
Register result status:
Clock  F0       F2       F4       F6       F8       F10      F12      ...  F30
   FU  -        -        -        -        -        -        -
Scoreboard Example: Cycle 1
Instruction status:
Instruction    j     k     Issue  Read Oper  Exec Comp  Write Result
LD    F6       34+   R2    1
LD    F2       45+   R3
MULTD F0       F2    F4
SUBD  F8       F6    F2
DIVD  F10      F0    F6
ADDD  F6       F8    F2
Functional unit status:
Time  Name     Busy  Op    Fi    Fj    Fk    Qj      Qk      Rj    Rk
                           dest  S1    S2    FU      FU      Fj?   Fk?
      Integer  Yes   Load  F6    -     R2    -       -       -     Yes
      Mult1    No
      Mult2    No
      Add      No
      Divide   No
Register result status:
Clock  F0       F2       F4       F6       F8       F10      F12      ...  F30
1  FU  -        -        -        Integer  -        -        -
Detailed Scoreboard Pipeline Control
Issue
  Wait until:  Not Busy(FU) and not Result('D')
  Bookkeeping: Busy(FU) ← Yes; Op(FU) ← op; Fi(FU) ← 'D'; Fj(FU) ← 'S1'; Fk(FU) ← 'S2';
               Qj ← Result('S1'); Qk ← Result('S2'); Rj ← not Qj; Rk ← not Qk; Result('D') ← FU
Read operands
  Wait until:  Rj and Rk
  Bookkeeping: Rj ← No; Rk ← No
Execution complete
  Wait until:  Functional unit done
Write result
  Wait until:  ∀f ((Fj(f) ≠ Fi(FU) or Rj(f) = No) & (Fk(f) ≠ Fi(FU) or Rk(f) = No))
  Bookkeeping: ∀f (if Qj(f) = FU then Rj(f) ← Yes); ∀f (if Qk(f) = FU then Rk(f) ← Yes);
               Result(Fi(FU)) ← 0; Busy(FU) ← No
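The read-operand and write-result rules can be exercised on the cycle 17 state of the example. In this Python sketch (hypothetical function names; each FU row is a plain dictionary), ADDD cannot write F6 while DIVD still has Rk = Yes for F6; once MULTD writes F0 and DIVD reads its operands, the WAR check passes. Clearing Qj/Qk on write-back follows the example tables rather than the bookkeeping column.

```python
def blank():
    return {"Busy": False, "Op": None, "Fi": None, "Fj": None, "Fk": None,
            "Qj": None, "Qk": None, "Rj": False, "Rk": False}


def busy(op, fi, fj, fk, qj=None, qk=None, rj=False, rk=False):
    return {"Busy": True, "Op": op, "Fi": fi, "Fj": fj, "Fk": fk,
            "Qj": qj, "Qk": qk, "Rj": rj, "Rk": rk}


def read_operands(fu, unit):
    """Wait until Rj and Rk; then read both operands (Rj <- No, Rk <- No)."""
    f = fu[unit]
    if not (f["Rj"] and f["Rk"]):
        return False
    f["Rj"] = f["Rk"] = False
    return True


def can_write(fu, unit):
    """WAR check: no FU still waits to read the old value of Fi(unit)."""
    dest = fu[unit]["Fi"]
    return all((f["Fj"] != dest or not f["Rj"]) and (f["Fk"] != dest or not f["Rk"])
               for f in fu.values() if f["Busy"])


def write_result(fu, result, unit):
    """Write back: wake up waiting FUs, clear register status, free the unit."""
    if not can_write(fu, unit):
        return False
    for f in fu.values():
        if f["Qj"] == unit:
            f["Qj"], f["Rj"] = None, True
        if f["Qk"] == unit:
            f["Qk"], f["Rk"] = None, True
    del result[fu[unit]["Fi"]]
    fu[unit] = blank()
    return True


# Cycle 17 of the example: MULTD, ADDD and DIVD are in flight.
fu = {"Integer": blank(), "Mult1": busy("Mult", "F0", "F2", "F4"),
      "Mult2": blank(), "Add": busy("Add", "F6", "F8", "F2"),
      "Divide": busy("Div", "F10", "F0", "F6", qj="Mult1", rk=True)}
result = {"F0": "Mult1", "F6": "Add", "F10": "Divide"}
print(can_write(fu, "Add"))   # False: DIVD has not read F6 yet
```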
Scoreboard Example: Cycle 2
Instruction status:
Instruction    j     k     Issue  Read Oper  Exec Comp  Write Result
LD    F6       34+   R2    1      2
LD    F2       45+   R3
MULTD F0       F2    F4
SUBD  F8       F6    F2
DIVD  F10      F0    F6
ADDD  F6       F8    F2
Functional unit status:
Time  Name     Busy  Op    Fi    Fj    Fk    Qj      Qk      Rj    Rk
                           dest  S1    S2    FU      FU      Fj?   Fk?
1     Integer  Yes   Load  F6    -     R2    -       -       -     No
      Mult1    No
      Mult2    No
      Add      No
      Divide   No
Register result status:
Clock  F0       F2       F4       F6       F8       F10      F12      ...  F30
2  FU  -        -        -        Integer  -        -        -
• Can we enter Issue for 2nd LD?
Scoreboard Example: Cycle 3
Instruction status:
Instruction    j     k     Issue  Read Oper  Exec Comp  Write Result
LD    F6       34+   R2    1      2          3
LD    F2       45+   R3
MULTD F0       F2    F4
SUBD  F8       F6    F2
DIVD  F10      F0    F6
ADDD  F6       F8    F2
Functional unit status:
Time  Name     Busy  Op    Fi    Fj    Fk    Qj      Qk      Rj    Rk
                           dest  S1    S2    FU      FU      Fj?   Fk?
0     Integer  Yes   Load  F6    -     R2    -       -       -     No
      Mult1    No
      Mult2    No
      Add      No
      Divide   No
Register result status:
Clock  F0       F2       F4       F6       F8       F10      F12      ...  F30
3  FU  -        -        -        Integer  -        -        -
• Issue MULT (in order)?
Scoreboard Example: Cycle 4
Instruction status:
Instruction    j     k     Issue  Read Oper  Exec Comp  Write Result
LD    F6       34+   R2    1      2          3          4
LD    F2       45+   R3
MULTD F0       F2    F4
SUBD  F8       F6    F2
DIVD  F10      F0    F6
ADDD  F6       F8    F2
Functional unit status:
Time  Name     Busy  Op    Fi    Fj    Fk    Qj      Qk      Rj    Rk
                           dest  S1    S2    FU      FU      Fj?   Fk?
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
      Divide   No
Register result status:
Clock  F0       F2       F4       F6       F8       F10      F12      ...  F30
4  FU  -        -        -        -        -        -        -
Scoreboard Example: Cycle 5
Instruction status:
Instruction    j     k     Issue  Read Oper  Exec Comp  Write Result
LD    F6       34+   R2    1      2          3          4
LD    F2       45+   R3    5
MULTD F0       F2    F4
SUBD  F8       F6    F2
DIVD  F10      F0    F6
ADDD  F6       F8    F2
Functional unit status:
Time  Name     Busy  Op    Fi    Fj    Fk    Qj      Qk      Rj    Rk
                           dest  S1    S2    FU      FU      Fj?   Fk?
      Integer  Yes   Load  F2    -     R3    -       -       -     Yes
      Mult1    No
      Mult2    No
      Add      No
      Divide   No
Register result status:
Clock  F0       F2       F4       F6       F8       F10      F12      ...  F30
5  FU  -        Integer  -        -        -        -        -
Scoreboard Example: Cycle 6
Instruction status:
Instruction    j     k     Issue  Read Oper  Exec Comp  Write Result
LD    F6       34+   R2    1      2          3          4
LD    F2       45+   R3    5      6
MULTD F0       F2    F4    6
SUBD  F8       F6    F2
DIVD  F10      F0    F6
ADDD  F6       F8    F2
Functional unit status:
Time  Name     Busy  Op    Fi    Fj    Fk    Qj      Qk      Rj    Rk
                           dest  S1    S2    FU      FU      Fj?   Fk?
      Integer  Yes   Load  F2    -     R3    -       -       -     No
      Mult1    Yes   Mult  F0    F2    F4    Integer -       No    Yes
      Mult2    No
      Add      No
      Divide   No
Register result status:
Clock  F0       F2       F4       F6       F8       F10      F12      ...  F30
6  FU  Mult1    Integer  -        -        -        -        -
Scoreboard Example: Cycle 7
Instruction status:
Instruction    j     k     Issue  Read Oper  Exec Comp  Write Result
LD    F6       34+   R2    1      2          3          4
LD    F2       45+   R3    5      6          7
MULTD F0       F2    F4    6
SUBD  F8       F6    F2    7
DIVD  F10      F0    F6
ADDD  F6       F8    F2
Functional unit status:
Time  Name     Busy  Op    Fi    Fj    Fk    Qj      Qk      Rj    Rk
                           dest  S1    S2    FU      FU      Fj?   Fk?
      Integer  Yes   Load  F2    -     R3    -       -       -     No
      Mult1    Yes   Mult  F0    F2    F4    Integer -       No    Yes
      Mult2    No
      Add      Yes   Sub   F8    F6    F2    -       Integer Yes   No
      Divide   No
Register result status:
Clock  F0       F2       F4       F6       F8       F10      F12      ...  F30
7  FU  Mult1    Integer  -        -        Add      -        -
• Read multiply operands?
Scoreboard Example: Cycle 8a
(First half of clock cycle)
Instruction status:
Instruction    j     k     Issue  Read Oper  Exec Comp  Write Result
LD    F6       34+   R2    1      2          3          4
LD    F2       45+   R3    5      6          7
MULTD F0       F2    F4    6
SUBD  F8       F6    F2    7
DIVD  F10      F0    F6    8
ADDD  F6       F8    F2
Functional unit status:
Time  Name     Busy  Op    Fi    Fj    Fk    Qj      Qk      Rj    Rk
                           dest  S1    S2    FU      FU      Fj?   Fk?
      Integer  Yes   Load  F2    -     R3    -       -       -     No
      Mult1    Yes   Mult  F0    F2    F4    Integer -       No    Yes
      Mult2    No
      Add      Yes   Sub   F8    F6    F2    -       Integer Yes   No
      Divide   Yes   Div   F10   F0    F6    Mult1   -       No    Yes
Register result status:
Clock  F0       F2       F4       F6       F8       F10      F12      ...  F30
8  FU  Mult1    Integer  -        -        Add      Divide   -
Scoreboard Example: Cycle 8b
(Second half of clock cycle)
Instruction status:
Instruction    j     k     Issue  Read Oper  Exec Comp  Write Result
LD    F6       34+   R2    1      2          3          4
LD    F2       45+   R3    5      6          7          8
MULTD F0       F2    F4    6
SUBD  F8       F6    F2    7
DIVD  F10      F0    F6    8
ADDD  F6       F8    F2
Functional unit status:
Time  Name     Busy  Op    Fi    Fj    Fk    Qj      Qk      Rj    Rk
                           dest  S1    S2    FU      FU      Fj?   Fk?
      Integer  No
      Mult1    Yes   Mult  F0    F2    F4    -       -       Yes   Yes
      Mult2    No
      Add      Yes   Sub   F8    F6    F2    -       -       Yes   Yes
      Divide   Yes   Div   F10   F0    F6    Mult1   -       No    Yes
Register result status:
Clock  F0       F2       F4       F6       F8       F10      F12      ...  F30
8  FU  Mult1    -        -        -        Add      Divide   -
Scoreboard Example: Cycle 9
Instruction status:
Instruction    j     k     Issue  Read Oper  Exec Comp  Write Result
LD    F6       34+   R2    1      2          3          4
LD    F2       45+   R3    5      6          7          8
MULTD F0       F2    F4    6      9
SUBD  F8       F6    F2    7      9
DIVD  F10      F0    F6    8
ADDD  F6       F8    F2
Functional unit status:
Time  Name     Busy  Op    Fi    Fj    Fk    Qj      Qk      Rj    Rk
                           dest  S1    S2    FU      FU      Fj?   Fk?
      Integer  No
10    Mult1    Yes   Mult  F0    F2    F4    -       -       Yes   Yes
      Mult2    No
2     Add      Yes   Sub   F8    F6    F2    -       -       Yes   Yes
      Divide   Yes   Div   F10   F0    F6    Mult1   -       No    Yes
Register result status:
Clock  F0       F2       F4       F6       F8       F10      F12      ...  F30
9  FU  Mult1    -        -        -        Add      Divide   -
• Read operands for MULT & SUB? Issue ADDD?
• Note the remaining execution cycles in the Time column
Scoreboard Example: Cycle 10
Instruction status:
Instruction    j     k     Issue  Read Oper  Exec Comp  Write Result
LD    F6       34+   R2    1      2          3          4
LD    F2       45+   R3    5      6          7          8
MULTD F0       F2    F4    6      9
SUBD  F8       F6    F2    7      9
DIVD  F10      F0    F6    8
ADDD  F6       F8    F2
Functional unit status:
Time  Name     Busy  Op    Fi    Fj    Fk    Qj      Qk      Rj    Rk
                           dest  S1    S2    FU      FU      Fj?   Fk?
      Integer  No
9     Mult1    Yes   Mult  F0    F2    F4    -       -       No    No
      Mult2    No
1     Add      Yes   Sub   F8    F6    F2    -       -       No    No
      Divide   Yes   Div   F10   F0    F6    Mult1   -       No    Yes
Register result status:
Clock  F0       F2       F4       F6       F8       F10      F12      ...  F30
10 FU  Mult1    -        -        -        Add      Divide   -
Scoreboard Example: Cycle 11
Instruction status:
Instruction    j     k     Issue  Read Oper  Exec Comp  Write Result
LD    F6       34+   R2    1      2          3          4
LD    F2       45+   R3    5      6          7          8
MULTD F0       F2    F4    6      9
SUBD  F8       F6    F2    7      9          11
DIVD  F10      F0    F6    8
ADDD  F6       F8    F2
Functional unit status:
Time  Name     Busy  Op    Fi    Fj    Fk    Qj      Qk      Rj    Rk
                           dest  S1    S2    FU      FU      Fj?   Fk?
      Integer  No
8     Mult1    Yes   Mult  F0    F2    F4    -       -       No    No
      Mult2    No
0     Add      Yes   Sub   F8    F6    F2    -       -       No    No
      Divide   Yes   Div   F10   F0    F6    Mult1   -       No    Yes
Register result status:
Clock  F0       F2       F4       F6       F8       F10      F12      ...  F30
11 FU  Mult1    -        -        -        Add      Divide   -
Scoreboard Example: Cycle 12
Instruction status:
Instruction    j     k     Issue  Read Oper  Exec Comp  Write Result
LD    F6       34+   R2    1      2          3          4
LD    F2       45+   R3    5      6          7          8
MULTD F0       F2    F4    6      9
SUBD  F8       F6    F2    7      9          11         12
DIVD  F10      F0    F6    8
ADDD  F6       F8    F2
Functional unit status:
Time  Name     Busy  Op    Fi    Fj    Fk    Qj      Qk      Rj    Rk
                           dest  S1    S2    FU      FU      Fj?   Fk?
      Integer  No
7     Mult1    Yes   Mult  F0    F2    F4    -       -       No    No
      Mult2    No
      Add      No
      Divide   Yes   Div   F10   F0    F6    Mult1   -       No    Yes
Register result status:
Clock  F0       F2       F4       F6       F8       F10      F12      ...  F30
12 FU  Mult1    -        -        -        -        Divide   -
• Read operands for DIVD?
Scoreboard Example: Cycle 13
Instruction status:
Instruction    j     k     Issue  Read Oper  Exec Comp  Write Result
LD    F6       34+   R2    1      2          3          4
LD    F2       45+   R3    5      6          7          8
MULTD F0       F2    F4    6      9
SUBD  F8       F6    F2    7      9          11         12
DIVD  F10      F0    F6    8
ADDD  F6       F8    F2    13
Functional unit status:
Time  Name     Busy  Op    Fi    Fj    Fk    Qj      Qk      Rj    Rk
                           dest  S1    S2    FU      FU      Fj?   Fk?
      Integer  No
6     Mult1    Yes   Mult  F0    F2    F4    -       -       No    No
      Mult2    No
      Add      Yes   Add   F6    F8    F2    -       -       Yes   Yes
      Divide   Yes   Div   F10   F0    F6    Mult1   -       No    Yes
Register result status:
Clock  F0       F2       F4       F6       F8       F10      F12      ...  F30
13 FU  Mult1    -        -        Add      -        Divide   -
Scoreboard Example: Cycle 14
Instruction status:
Instruction    j     k     Issue  Read Oper  Exec Comp  Write Result
LD    F6       34+   R2    1      2          3          4
LD    F2       45+   R3    5      6          7          8
MULTD F0       F2    F4    6      9
SUBD  F8       F6    F2    7      9          11         12
DIVD  F10      F0    F6    8
ADDD  F6       F8    F2    13     14
Functional unit status:
Time  Name     Busy  Op    Fi    Fj    Fk    Qj      Qk      Rj    Rk
                           dest  S1    S2    FU      FU      Fj?   Fk?
      Integer  No
5     Mult1    Yes   Mult  F0    F2    F4    -       -       No    No
      Mult2    No
2     Add      Yes   Add   F6    F8    F2    -       -       Yes   Yes
      Divide   Yes   Div   F10   F0    F6    Mult1   -       No    Yes
Register result status:
Clock  F0       F2       F4       F6       F8       F10      F12      ...  F30
14 FU  Mult1    -        -        Add      -        Divide   -
Scoreboard Example: Cycle 15
Instruction status:
Instruction    j     k     Issue  Read Oper  Exec Comp  Write Result
LD    F6       34+   R2    1      2          3          4
LD    F2       45+   R3    5      6          7          8
MULTD F0       F2    F4    6      9
SUBD  F8       F6    F2    7      9          11         12
DIVD  F10      F0    F6    8
ADDD  F6       F8    F2    13     14
Functional unit status:
Time  Name     Busy  Op    Fi    Fj    Fk    Qj      Qk      Rj    Rk
                           dest  S1    S2    FU      FU      Fj?   Fk?
      Integer  No
4     Mult1    Yes   Mult  F0    F2    F4    -       -       No    No
      Mult2    No
1     Add      Yes   Add   F6    F8    F2    -       -       No    No
      Divide   Yes   Div   F10   F0    F6    Mult1   -       No    Yes
Register result status:
Clock  F0       F2       F4       F6       F8       F10      F12      ...  F30
15 FU  Mult1    -        -        Add      -        Divide   -
Scoreboard Example: Cycle 16
Instruction status:
Instruction    j     k     Issue  Read Oper  Exec Comp  Write Result
LD    F6       34+   R2    1      2          3          4
LD    F2       45+   R3    5      6          7          8
MULTD F0       F2    F4    6      9
SUBD  F8       F6    F2    7      9          11         12
DIVD  F10      F0    F6    8
ADDD  F6       F8    F2    13     14         16
Functional unit status:
Time  Name     Busy  Op    Fi    Fj    Fk    Qj      Qk      Rj    Rk
                           dest  S1    S2    FU      FU      Fj?   Fk?
      Integer  No
3     Mult1    Yes   Mult  F0    F2    F4    -       -       No    No
      Mult2    No
0     Add      Yes   Add   F6    F8    F2    -       -       No    No
      Divide   Yes   Div   F10   F0    F6    Mult1   -       No    Yes
Register result status:
Clock  F0       F2       F4       F6       F8       F10      F12      ...  F30
16 FU  Mult1    -        -        Add      -        Divide   -
Scoreboard Example: Cycle 17
Instruction status:
Instruction    j     k     Issue  Read Oper  Exec Comp  Write Result
LD    F6       34+   R2    1      2          3          4
LD    F2       45+   R3    5      6          7          8
MULTD F0       F2    F4    6      9
SUBD  F8       F6    F2    7      9          11         12
DIVD  F10      F0    F6    8
ADDD  F6       F8    F2    13     14         16
Functional unit status:
Time  Name     Busy  Op    Fi    Fj    Fk    Qj      Qk      Rj    Rk
                           dest  S1    S2    FU      FU      Fj?   Fk?
      Integer  No
2     Mult1    Yes   Mult  F0    F2    F4    -       -       No    No
      Mult2    No
      Add      Yes   Add   F6    F8    F2    -       -       No    No
      Divide   Yes   Div   F10   F0    F6    Mult1   -       No    Yes
Register result status:
Clock  F0       F2       F4       F6       F8       F10      F12      ...  F30
17 FU  Mult1    -        -        Add      -        Divide   -
• Why not write the result of ADDD?
  WAR hazard! DIVD has not yet read its F6 operand (Rk = Yes).
Scoreboard Example: Cycle 18
Instruction status:
Instruction    j     k     Issue  Read Oper  Exec Comp  Write Result
LD    F6       34+   R2    1      2          3          4
LD    F2       45+   R3    5      6          7          8
MULTD F0       F2    F4    6      9
SUBD  F8       F6    F2    7      9          11         12
DIVD  F10      F0    F6    8
ADDD  F6       F8    F2    13     14         16
Functional unit status:
Time  Name     Busy  Op    Fi    Fj    Fk    Qj      Qk      Rj    Rk
                           dest  S1    S2    FU      FU      Fj?   Fk?
      Integer  No
1     Mult1    Yes   Mult  F0    F2    F4    -       -       No    No
      Mult2    No
      Add      Yes   Add   F6    F8    F2    -       -       No    No
      Divide   Yes   Div   F10   F0    F6    Mult1   -       No    Yes
Register result status:
Clock  F0       F2       F4       F6       F8       F10      F12      ...  F30
18 FU  Mult1    -        -        Add      -        Divide   -
Scoreboard Example: Cycle 19
Instruction status:
Instruction    j     k     Issue  Read Oper  Exec Comp  Write Result
LD    F6       34+   R2    1      2          3          4
LD    F2       45+   R3    5      6          7          8
MULTD F0       F2    F4    6      9          19
SUBD  F8       F6    F2    7      9          11         12
DIVD  F10      F0    F6    8
ADDD  F6       F8    F2    13     14         16
Functional unit status:
Time  Name     Busy  Op    Fi    Fj    Fk    Qj      Qk      Rj    Rk
                           dest  S1    S2    FU      FU      Fj?   Fk?
      Integer  No
0     Mult1    Yes   Mult  F0    F2    F4    -       -       No    No
      Mult2    No
      Add      Yes   Add   F6    F8    F2    -       -       No    No
      Divide   Yes   Div   F10   F0    F6    Mult1   -       No    Yes
Register result status:
Clock  F0       F2       F4       F6       F8       F10      F12      ...  F30
19 FU  Mult1    -        -        Add      -        Divide   -
Scoreboard Example: Cycle 20
Instruction status:
Instruction    j     k     Issue  Read Oper  Exec Comp  Write Result
LD    F6       34+   R2    1      2          3          4
LD    F2       45+   R3    5      6          7          8
MULTD F0       F2    F4    6      9          19         20
SUBD  F8       F6    F2    7      9          11         12
DIVD  F10      F0    F6    8
ADDD  F6       F8    F2    13     14         16
Functional unit status:
Time  Name     Busy  Op    Fi    Fj    Fk    Qj      Qk      Rj    Rk
                           dest  S1    S2    FU      FU      Fj?   Fk?
      Integer  No
      Mult1    No
      Mult2    No
      Add      Yes   Add   F6    F8    F2    -       -       No    No
      Divide   Yes   Div   F10   F0    F6    -       -       Yes   Yes
Register result status:
Clock  F0       F2       F4       F6       F8       F10      F12      ...  F30
20 FU  -        -        -        Add      -        Divide   -
Scoreboard Example: Cycle 21
Instruction status:
Instruction    j     k     Issue  Read Oper  Exec Comp  Write Result
LD    F6       34+   R2    1      2          3          4
LD    F2       45+   R3    5      6          7          8
MULTD F0       F2    F4    6      9          19         20
SUBD  F8       F6    F2    7      9          11         12
DIVD  F10      F0    F6    8      21
ADDD  F6       F8    F2    13     14         16
Functional unit status:
Time  Name     Busy  Op    Fi    Fj    Fk    Qj      Qk      Rj    Rk
                           dest  S1    S2    FU      FU      Fj?   Fk?
      Integer  No
      Mult1    No
      Mult2    No
      Add      Yes   Add   F6    F8    F2    -       -       No    No
40    Divide   Yes   Div   F10   F0    F6    -       -       Yes   Yes
Register result status:
Clock  F0       F2       F4       F6       F8       F10      F12      ...  F30
21 FU  -        -        -        Add      -        Divide   -
• WAR Hazard is now gone...
Scoreboard Example: Cycle 22
Instruction status:
Instruction    j     k     Issue  Read Oper  Exec Comp  Write Result
LD    F6       34+   R2    1      2          3          4
LD    F2       45+   R3    5      6          7          8
MULTD F0       F2    F4    6      9          19         20
SUBD  F8       F6    F2    7      9          11         12
DIVD  F10      F0    F6    8      21
ADDD  F6       F8    F2    13     14         16         22
Functional unit status:
Time  Name     Busy  Op    Fi    Fj    Fk    Qj      Qk      Rj    Rk
                           dest  S1    S2    FU      FU      Fj?   Fk?
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
39    Divide   Yes   Div   F10   F0    F6    -       -       No    No
Register result status:
Clock  F0       F2       F4       F6       F8       F10      F12      ...  F30
22 FU  -        -        -        -        -        Divide   -
(skip a few cycles)
Scoreboard Example: Cycle 61
Instruction status:
Instruction    j     k     Issue  Read Oper  Exec Comp  Write Result
LD    F6       34+   R2    1      2          3          4
LD    F2       45+   R3    5      6          7          8
MULTD F0       F2    F4    6      9          19         20
SUBD  F8       F6    F2    7      9          11         12
DIVD  F10      F0    F6    8      21         61
ADDD  F6       F8    F2    13     14         16         22
Functional unit status:
Time  Name     Busy  Op    Fi    Fj    Fk    Qj      Qk      Rj    Rk
                           dest  S1    S2    FU      FU      Fj?   Fk?
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
0     Divide   Yes   Div   F10   F0    F6    -       -       No    No
Register result status:
Clock  F0       F2       F4       F6       F8       F10      F12      ...  F30
61 FU  -        -        -        -        -        Divide   -
Scoreboard Example: Cycle 62
Instruction status:
Instruction    j     k     Issue  Read Oper  Exec Comp  Write Result
LD    F6       34+   R2    1      2          3          4
LD    F2       45+   R3    5      6          7          8
MULTD F0       F2    F4    6      9          19         20
SUBD  F8       F6    F2    7      9          11         12
DIVD  F10      F0    F6    8      21         61         62
ADDD  F6       F8    F2    13     14         16         22
Functional unit status:
Time  Name     Busy  Op    Fi    Fj    Fk    Qj      Qk      Rj    Rk
                           dest  S1    S2    FU      FU      Fj?   Fk?
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
      Divide   No
Register result status:
Clock  F0       F2       F4       F6       F8       F10      F12      ...  F30
62 FU  -        -        -        -        -        -        -
• In-order issue; out-of-order execute & commit
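The whole instruction status table can be reproduced from the scoreboard rules. This Python sketch assumes the example's latencies (load 1, add/subtract 2, multiply 10, divide 40 cycles), one issue per cycle, no forwarding, and no contention for result buses. It computes each instruction's four times directly in program order, which works for this model because an instruction only ever waits on earlier ones (in-order issue, RAW on earlier producers, WAR on earlier readers). All names are hypothetical.

```python
from dataclasses import dataclass

# Execution latencies and functional units assumed from the example.
LATENCY = {"Integer": 1, "Add": 2, "Mult": 10, "Divide": 40}
UNITS = {"Integer": ["Integer"], "Mult": ["Mult1", "Mult2"],
         "Add": ["Add"], "Divide": ["Divide"]}
UNIT_CLASS = {"LD": "Integer", "MULTD": "Mult", "SUBD": "Add",
              "ADDD": "Add", "DIVD": "Divide"}


@dataclass
class Instr:
    op: str
    dest: str
    srcs: tuple
    unit: str = ""
    issue: int = 0
    read: int = 0
    complete: int = 0
    write: int = 0


def last_writer(earlier, reg):
    """Most recent earlier instruction that writes reg, or None."""
    for p in reversed(earlier):
        if p.dest == reg:
            return p
    return None


def simulate(program):
    free_at = {u: 1 for names in UNITS.values() for u in names}
    done = []
    last_issue = 0
    for ins in program:
        cls = UNIT_CLASS[ins.op]
        ins.unit = min(UNITS[cls], key=lambda u: free_at[u])
        # Issue: in order, one per cycle, unit free, no pending write to dest (WAW).
        waw_clear = max((p.write + 1 for p in done if p.dest == ins.dest), default=1)
        ins.issue = max(last_issue + 1, free_at[ins.unit], waw_clear)
        last_issue = ins.issue
        # Read operands: after issue and after each producer writes (RAW, no forwarding).
        ins.read = ins.issue + 1
        for reg in ins.srcs:
            producer = last_writer(done, reg)
            if producer is not None:
                ins.read = max(ins.read, producer.write + 1)
        ins.complete = ins.read + LATENCY[cls]
        # Write result: after completion and after every earlier reader of
        # dest has read its operands (WAR).
        war_clear = [f.read + 1 for f in done if ins.dest in f.srcs]
        ins.write = max([ins.complete + 1] + war_clear)
        free_at[ins.unit] = ins.write + 1   # unit is free the cycle after write
        done.append(ins)
    return done


PROGRAM = [
    Instr("LD", "F6", ("R2",)),
    Instr("LD", "F2", ("R3",)),
    Instr("MULTD", "F0", ("F2", "F4")),
    Instr("SUBD", "F8", ("F6", "F2")),
    Instr("DIVD", "F10", ("F0", "F6")),
    Instr("ADDD", "F6", ("F8", "F2")),
]

for i in simulate(PROGRAM):
    print(f"{i.op:6}{i.dest:5}{i.issue:4}{i.read:4}{i.complete:4}{i.write:4}")
```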