RHK.F95 1
Lecture 11: Case Study—Tomasulo Algorithm
Professor Randy H. Katz
Computer Science 252
Fall 1995
Review: Scoreboard Summary
• Speedup 1.7 from compiler; 2.5 by hand BUT slow memory (no cache)
• Limitations of 6600 scoreboard
  – No forwarding
  – Limited to instructions in basic block (small window)
  – Number of functional units (structural hazards)
  – Wait for WAR hazards
  – Prevent WAW hazards
Another Dynamic Algorithm: Tomasulo Algorithm
• For IBM 360/91, about 3 years after CDC 6600
• Goal: High performance without special compilers
• Differences between IBM 360 & CDC 6600 ISA
  – IBM has only 2 register specifiers/instr vs. 3 in CDC 6600
  – IBM has 4 FP registers vs. 8 in CDC 6600
• Differences between Tomasulo Algorithm & Scoreboard
  – Control & buffers distributed with Function Units vs. centralized in scoreboard; called "reservation stations"
  – Registers in instructions replaced by pointers to reservation station buffer
  – HW renaming of registers to avoid WAR, WAW hazards
  – Common Data Bus broadcasts results to all FUs
  – Loads and Stores treated as FUs as well
Tomasulo Organization
[Figure: Tomasulo organization. The FP Op Queue receives floating-point operations from the instruction unit. Load buffers (numbered 1 to 6) are filled from memory; store buffers (numbered 1 to 3) send data to memory. The operation bus and operand buses feed the reservation stations: 3 in front of the FP adders and 2 in front of the FP multipliers. The Common Data Bus (CDB) carries results to the FP registers, the store buffers, and the reservation stations.]
Reservation Station Components
Op: Operation to perform in the unit (e.g., + or –)
Qj, Qk: Reservation stations producing the source operands
Vj, Vk: Values of the source operands
Rj, Rk: Flags indicating when Vj, Vk are ready
Busy: Indicates that the reservation station and its FU are busy
Register result status: Indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register.
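As a rough illustration of the fields listed above, a reservation station and the register result status could be modeled as follows. This is a minimal Python sketch, not the 360/91 hardware: the class name `ReservationStation`, the `ready()` helper, the operand value 4.0, and the use of `None` in a Q field in place of separate Rj/Rk flags are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ReservationStation:
    """One reservation station entry (hypothetical model, not the real hardware)."""
    name: str                    # e.g. "Add1", "Mult1"
    busy: bool = False           # Busy: station and its FU are in use
    op: Optional[str] = None     # Op: operation to perform
    vj: Optional[float] = None   # Vj: value of the first source operand
    vk: Optional[float] = None   # Vk: value of the second source operand
    qj: Optional[str] = None     # Qj: unit that will produce Vj (None = Vj is ready)
    qk: Optional[str] = None     # Qk: unit that will produce Vk (None = Vk is ready)

    def ready(self) -> bool:
        """True when the station is busy and both operands have arrived."""
        return self.busy and self.qj is None and self.qk is None


# Register result status: which unit will write each FP register (None = no pending write).
register_status: Dict[str, Optional[str]] = {f"F{i}": None for i in range(0, 32, 2)}

# A station holding MULTD F0,F2,F4 while F2 is still being loaded by Load2.
# The value 4.0 stands in for the contents of F4 and is made up.
mult1 = ReservationStation("Mult1", busy=True, op="MULTD", vk=4.0, qj="Load2")
register_status["F0"] = "Mult1"
```

Here `mult1.ready()` is false until the pending Qj entry is cleared and Vj is filled in, which is the condition the execution stage waits for.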
Three Stages of Tomasulo Algorithm
1. Issue: get instruction from FP Op Queue
   If a reservation station is free, issue the instr & send operands (renames registers).
2. Execution: operate on operands (EX)
   When both operands are ready, execute; if not ready, watch the CDB for the result.
3. Write result: finish execution (WB)
   Write on the Common Data Bus to all awaiting units; mark the reservation station available.
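The three stages above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the actual 360/91 control logic: stations are plain dicts, the function names `issue`, `execute`, `write_result` and `new_station` are hypothetical, only ADDD and SUBD are modeled, timing is ignored, and the register values are made up.

```python
def issue(stations, reg_status, regs, op, dest, src1, src2):
    """Stage 1: take a free station and rename the sources; None means stall."""
    rs = next((s for s in stations if not s["busy"]), None)
    if rs is None:
        return None                    # structural hazard: no free station
    rs.update(busy=True, op=op, vj=None, vk=None, qj=None, qk=None)
    for v, q, src in (("vj", "qj", src1), ("vk", "qk", src2)):
        producer = reg_status.get(src)
        if producer is None:
            rs[v] = regs[src]          # value is available: copy it now
        else:
            rs[q] = producer           # value is pending: remember who produces it
    reg_status[dest] = rs["name"]      # later readers of dest wait on this station
    return rs["name"]


def execute(rs):
    """Stage 2: compute only when both operands are present, else None."""
    if not rs["busy"] or rs["qj"] is not None or rs["qk"] is not None:
        return None
    ops = {"ADDD": lambda a, b: a + b, "SUBD": lambda a, b: a - b}
    return ops[rs["op"]](rs["vj"], rs["vk"])


def write_result(stations, reg_status, regs, tag, value):
    """Stage 3: broadcast (tag, value) on the CDB and free the producing station."""
    for s in stations:
        if s["busy"] and s["qj"] == tag:
            s["vj"], s["qj"] = value, None
        if s["busy"] and s["qk"] == tag:
            s["vk"], s["qk"] = value, None
        if s["name"] == tag:
            s["busy"] = False
    for reg, producer in reg_status.items():
        if producer == tag:            # only the latest writer updates the register
            regs[reg], reg_status[reg] = value, None


def new_station(name):
    return dict(name=name, busy=False, op=None, vj=None, vk=None, qj=None, qk=None)


stations = [new_station("Add1"), new_station("Add2")]
regs = {"F2": 2.0, "F6": 5.0, "F8": 0.0}   # made-up register contents
reg_status = {}

issue(stations, reg_status, regs, "SUBD", "F8", "F6", "F2")  # Add1 copies F6 and F2
issue(stations, reg_status, regs, "ADDD", "F6", "F8", "F2")  # Add2 waits on Add1 for F8
write_result(stations, reg_status, regs, "Add1", execute(stations[0]))
write_result(stations, reg_status, regs, "Add2", execute(stations[1]))
```

Because SUBD copies the value of F6 into its station at issue, the later ADDD that writes F6 cannot overwrite an operand SUBD still needs; this is how the sketch avoids the WAR hazard without stalling.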
Tomasulo Example Cycle 0

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2
LD    F2     45+  R3
MULTD F0     F2   F4
SUBD  F8     F6   F2
DIVD  F10    F0   F6
ADDD  F6     F8   F2

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
                          S1        S2        RS for j  RS for k
Time  Name   Busy  Op     Vj        Vk        Qj        Qk
0     Add1   No
0     Add2   No
      Add3   No
0     Mult1  No
0     Mult2  No

Register result status:
Clock      F0     F2        F4  F6         F8       F10      F12  ...  F30
0      FU
Tomasulo Example Cycle 1

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1
LD    F2     45+  R3
MULTD F0     F2   F4
SUBD  F8     F6   F2
DIVD  F10    F0   F6
ADDD  F6     F8   F2

Load buffers:
Name   Busy  Address
Load1  Yes   34+R2
Load2  No
Load3  No

Reservation stations:
                          S1        S2        RS for j  RS for k
Time  Name   Busy  Op     Vj        Vk        Qj        Qk
0     Add1   No
0     Add2   No
      Add3   No
0     Mult1  No
0     Mult2  No

Register result status:
Clock      F0     F2        F4  F6         F8       F10      F12  ...  F30
1      FU                       Load1
Tomasulo Example Cycle 2

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1
LD    F2     45+  R3   2
MULTD F0     F2   F4
SUBD  F8     F6   F2
DIVD  F10    F0   F6
ADDD  F6     F8   F2

Load buffers:
Name   Busy  Address
Load1  Yes   34+R2
Load2  Yes   45+R3
Load3  No

Reservation stations:
                          S1        S2        RS for j  RS for k
Time  Name   Busy  Op     Vj        Vk        Qj        Qk
0     Add1   No
0     Add2   No
      Add3   No
0     Mult1  No
0     Mult2  No

Register result status:
Clock      F0     F2        F4  F6         F8       F10      F12  ...  F30
2      FU         Load2         Load1
Tomasulo Example Cycle 3

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3
LD    F2     45+  R3   2
MULTD F0     F2   F4   3
SUBD  F8     F6   F2
DIVD  F10    F0   F6
ADDD  F6     F8   F2

Load buffers:
Name   Busy  Address
Load1  Yes   34+R2
Load2  Yes   45+R3
Load3  No

Reservation stations:
                          S1        S2        RS for j  RS for k
Time  Name   Busy  Op     Vj        Vk        Qj        Qk
0     Add1   No
0     Add2   No
      Add3   No
0     Mult1  Yes   MULTD            R(F4)     Load2
0     Mult2  No

Register result status:
Clock      F0     F2        F4  F6         F8       F10      F12  ...  F30
3      FU  Mult1  Load2         Load1
Tomasulo Example Cycle 4

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3              4
LD    F2     45+  R3   2
MULTD F0     F2   F4   3
SUBD  F8     F6   F2   4
DIVD  F10    F0   F6
ADDD  F6     F8   F2

Load buffers:
Name   Busy  Address
Load1  No
Load2  Yes   45+R3
Load3  No

Reservation stations:
                          S1        S2        RS for j  RS for k
Time  Name   Busy  Op     Vj        Vk        Qj        Qk
0     Add1   Yes   SUBD   M(34+R2)                      Load2
0     Add2   No
      Add3   No
0     Mult1  Yes   MULTD            R(F4)     Load2
0     Mult2  No

Register result status:
Clock      F0     F2        F4  F6         F8       F10      F12  ...  F30
4      FU  Mult1  Load2         M(34+R2)   Add1
Tomasulo Example Cycle 5

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3              4
LD    F2     45+  R3   2      5
MULTD F0     F2   F4   3
SUBD  F8     F6   F2   4
DIVD  F10    F0   F6   5
ADDD  F6     F8   F2

Load buffers:
Name   Busy  Address
Load1  No
Load2  Yes   45+R3
Load3  No

Reservation stations:
                          S1        S2        RS for j  RS for k
Time  Name   Busy  Op     Vj        Vk        Qj        Qk
0     Add1   Yes   SUBD   M(34+R2)                      Load2
0     Add2   No
      Add3   No
0     Mult1  Yes   MULTD            R(F4)     Load2
0     Mult2  Yes   DIVD             M(34+R2)  Mult1

Register result status:
Clock      F0     F2        F4  F6         F8       F10      F12  ...  F30
5      FU  Mult1  Load2         M(34+R2)   Add1     Mult2
Tomasulo Example Cycle 6

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3              4
LD    F2     45+  R3   2      5              6
MULTD F0     F2   F4   3
SUBD  F8     F6   F2   4
DIVD  F10    F0   F6   5
ADDD  F6     F8   F2   6

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
                          S1        S2        RS for j  RS for k
Time  Name   Busy  Op     Vj        Vk        Qj        Qk
2     Add1   Yes   SUBD   M(34+R2)  M(45+R3)
0     Add2   Yes   ADDD             M(45+R3)  Add1
      Add3   No
10    Mult1  Yes   MULTD  M(45+R3)  R(F4)
0     Mult2  Yes   DIVD             M(34+R2)  Mult1

Register result status:
Clock      F0     F2        F4  F6         F8       F10      F12  ...  F30
6      FU  Mult1  M(45+R3)      Add2       Add1     Mult2
Tomasulo Example Cycle 7

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3              4
LD    F2     45+  R3   2      5              6
MULTD F0     F2   F4   3
SUBD  F8     F6   F2   4
DIVD  F10    F0   F6   5
ADDD  F6     F8   F2   6

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
                          S1        S2        RS for j  RS for k
Time  Name   Busy  Op     Vj        Vk        Qj        Qk
1     Add1   Yes   SUBD   M(34+R2)  M(45+R3)
0     Add2   Yes   ADDD             M(45+R3)  Add1
      Add3   No
9     Mult1  Yes   MULTD  M(45+R3)  R(F4)
0     Mult2  Yes   DIVD             M(34+R2)  Mult1

Register result status:
Clock      F0     F2        F4  F6         F8       F10      F12  ...  F30
7      FU  Mult1  M(45+R3)      Add2       Add1     Mult2
Tomasulo Example Cycle 8

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3              4
LD    F2     45+  R3   2      5              6
MULTD F0     F2   F4   3
SUBD  F8     F6   F2   4      8
DIVD  F10    F0   F6   5
ADDD  F6     F8   F2   6

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
                          S1        S2        RS for j  RS for k
Time  Name   Busy  Op     Vj        Vk        Qj        Qk
0     Add1   Yes   SUBD   M(34+R2)  M(45+R3)
0     Add2   Yes   ADDD             M(45+R3)  Add1
      Add3   No
8     Mult1  Yes   MULTD  M(45+R3)  R(F4)
0     Mult2  Yes   DIVD             M(34+R2)  Mult1

Register result status:
Clock      F0     F2        F4  F6         F8       F10      F12  ...  F30
8      FU  Mult1  M(45+R3)      Add2       Add1     Mult2
Tomasulo Example Cycle 9

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3              4
LD    F2     45+  R3   2      5              6
MULTD F0     F2   F4   3
SUBD  F8     F6   F2   4      8              9
DIVD  F10    F0   F6   5
ADDD  F6     F8   F2   6

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
                          S1        S2        RS for j  RS for k
Time  Name   Busy  Op     Vj        Vk        Qj        Qk
0     Add1   No
0     Add2   Yes   ADDD   M()–M()   M(45+R3)
      Add3   No
7     Mult1  Yes   MULTD  M(45+R3)  R(F4)
0     Mult2  Yes   DIVD             M(34+R2)  Mult1

Register result status:
Clock      F0     F2        F4  F6         F8       F10      F12  ...  F30
9      FU  Mult1  M(45+R3)      Add2       M()–M()  Mult2
Tomasulo Example Cycle 10

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3              4
LD    F2     45+  R3   2      5              6
MULTD F0     F2   F4   3
SUBD  F8     F6   F2   4      8              9
DIVD  F10    F0   F6   5
ADDD  F6     F8   F2   6

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
                          S1        S2        RS for j  RS for k
Time  Name   Busy  Op     Vj        Vk        Qj        Qk
0     Add1   No
2     Add2   Yes   ADDD   M()–M()   M(45+R3)
      Add3   No
6     Mult1  Yes   MULTD  M(45+R3)  R(F4)
0     Mult2  Yes   DIVD             M(34+R2)  Mult1

Register result status:
Clock      F0     F2        F4  F6         F8       F10      F12  ...  F30
10     FU  Mult1  M(45+R3)      Add2       M()–M()  Mult2
Tomasulo Example Cycle 11

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3              4
LD    F2     45+  R3   2      5              6
MULTD F0     F2   F4   3
SUBD  F8     F6   F2   4      8              9
DIVD  F10    F0   F6   5
ADDD  F6     F8   F2   6

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
                          S1        S2        RS for j  RS for k
Time  Name   Busy  Op     Vj        Vk        Qj        Qk
0     Add1   No
1     Add2   Yes   ADDD   M()–M()   M(45+R3)
      Add3   No
5     Mult1  Yes   MULTD  M(45+R3)  R(F4)
0     Mult2  Yes   DIVD             M(34+R2)  Mult1

Register result status:
Clock      F0     F2        F4  F6         F8       F10      F12  ...  F30
11     FU  Mult1  M(45+R3)      Add2       M()–M()  Mult2
Tomasulo Example Cycle 12

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3              4
LD    F2     45+  R3   2      5              6
MULTD F0     F2   F4   3
SUBD  F8     F6   F2   4      8              9
DIVD  F10    F0   F6   5
ADDD  F6     F8   F2   6      12

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
                          S1        S2        RS for j  RS for k
Time  Name   Busy  Op     Vj        Vk        Qj        Qk
0     Add1   No
0     Add2   Yes   ADDD   M()–M()   M(45+R3)
      Add3   No
4     Mult1  Yes   MULTD  M(45+R3)  R(F4)
0     Mult2  Yes   DIVD             M(34+R2)  Mult1

Register result status:
Clock      F0     F2        F4  F6         F8       F10      F12  ...  F30
12     FU  Mult1  M(45+R3)      Add2       M()–M()  Mult2
Tomasulo Example Cycle 13

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3              4
LD    F2     45+  R3   2      5              6
MULTD F0     F2   F4   3
SUBD  F8     F6   F2   4      8              9
DIVD  F10    F0   F6   5
ADDD  F6     F8   F2   6      12             13

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
                          S1        S2        RS for j  RS for k
Time  Name   Busy  Op     Vj        Vk        Qj        Qk
0     Add1   No
0     Add2   No
      Add3   No
3     Mult1  Yes   MULTD  M(45+R3)  R(F4)
0     Mult2  Yes   DIVD             M(34+R2)  Mult1

Register result status:
Clock      F0     F2        F4  F6         F8       F10      F12  ...  F30
13     FU  Mult1  M(45+R3)      (M–M)+M()  M()–M()  Mult2
Tomasulo Example Cycle 14

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3              4
LD    F2     45+  R3   2      5              6
MULTD F0     F2   F4   3
SUBD  F8     F6   F2   4      8              9
DIVD  F10    F0   F6   5
ADDD  F6     F8   F2   6      12             13

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
                          S1        S2        RS for j  RS for k
Time  Name   Busy  Op     Vj        Vk        Qj        Qk
0     Add1   No
0     Add2   No
      Add3   No
2     Mult1  Yes   MULTD  M(45+R3)  R(F4)
0     Mult2  Yes   DIVD             M(34+R2)  Mult1

Register result status:
Clock      F0     F2        F4  F6         F8       F10      F12  ...  F30
14     FU  Mult1  M(45+R3)      (M–M)+M()  M()–M()  Mult2
Tomasulo Example Cycle 15

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3              4
LD    F2     45+  R3   2      5              6
MULTD F0     F2   F4   3
SUBD  F8     F6   F2   4      8              9
DIVD  F10    F0   F6   5
ADDD  F6     F8   F2   6      12             13

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
                          S1        S2        RS for j  RS for k
Time  Name   Busy  Op     Vj        Vk        Qj        Qk
0     Add1   No
0     Add2   No
      Add3   No
1     Mult1  Yes   MULTD  M(45+R3)  R(F4)
0     Mult2  Yes   DIVD             M(34+R2)  Mult1

Register result status:
Clock      F0     F2        F4  F6         F8       F10      F12  ...  F30
15     FU  Mult1  M(45+R3)      (M–M)+M()  M()–M()  Mult2
Tomasulo Example Cycle 16

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3              4
LD    F2     45+  R3   2      5              6
MULTD F0     F2   F4   3      16
SUBD  F8     F6   F2   4      8              9
DIVD  F10    F0   F6   5
ADDD  F6     F8   F2   6      12             13

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
                          S1        S2        RS for j  RS for k
Time  Name   Busy  Op     Vj        Vk        Qj        Qk
0     Add1   No
0     Add2   No
      Add3   No
0     Mult1  Yes   MULTD  M(45+R3)  R(F4)
0     Mult2  Yes   DIVD             M(34+R2)  Mult1

Register result status:
Clock      F0     F2        F4  F6         F8       F10      F12  ...  F30
16     FU  Mult1  M(45+R3)      (M–M)+M()  M()–M()  Mult2
Tomasulo Example Cycle 17

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3              4
LD    F2     45+  R3   2      5              6
MULTD F0     F2   F4   3      16             17
SUBD  F8     F6   F2   4      8              9
DIVD  F10    F0   F6   5
ADDD  F6     F8   F2   6      12             13

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
                          S1        S2        RS for j  RS for k
Time  Name   Busy  Op     Vj        Vk        Qj        Qk
0     Add1   No
0     Add2   No
      Add3   No
0     Mult1  No
0     Mult2  Yes   DIVD   M*F4      M(34+R2)

Register result status:
Clock      F0     F2        F4  F6         F8       F10      F12  ...  F30
17     FU  M*F4   M(45+R3)      (M–M)+M()  M()–M()  Mult2
Tomasulo Example Cycle 18

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3              4
LD    F2     45+  R3   2      5              6
MULTD F0     F2   F4   3      16             17
SUBD  F8     F6   F2   4      8              9
DIVD  F10    F0   F6   5
ADDD  F6     F8   F2   6      12             13

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
                          S1        S2        RS for j  RS for k
Time  Name   Busy  Op     Vj        Vk        Qj        Qk
0     Add1   No
0     Add2   No
      Add3   No
0     Mult1  No
40    Mult2  Yes   DIVD   M*F4      M(34+R2)

Register result status:
Clock      F0     F2        F4  F6         F8       F10      F12  ...  F30
18     FU  M*F4   M(45+R3)      (M–M)+M()  M()–M()  Mult2
Tomasulo Example Cycle 57

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3              4
LD    F2     45+  R3   2      5              6
MULTD F0     F2   F4   3      16             17
SUBD  F8     F6   F2   4      8              9
DIVD  F10    F0   F6   5
ADDD  F6     F8   F2   6      12             13

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
                          S1        S2        RS for j  RS for k
Time  Name   Busy  Op     Vj        Vk        Qj        Qk
0     Add1   No
0     Add2   No
      Add3   No
0     Mult1  No
1     Mult2  Yes   DIVD   M*F4      M(34+R2)

Register result status:
Clock      F0     F2        F4  F6         F8       F10      F12  ...  F30
57     FU  M*F4   M(45+R3)      (M–M)+M()  M()–M()  Mult2
Tomasulo Example Cycle 58

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3              4
LD    F2     45+  R3   2      5              6
MULTD F0     F2   F4   3      16             17
SUBD  F8     F6   F2   4      8              9
DIVD  F10    F0   F6   5      58
ADDD  F6     F8   F2   6      12             13

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
                          S1        S2        RS for j  RS for k
Time  Name   Busy  Op     Vj        Vk        Qj        Qk
0     Add1   No
0     Add2   No
      Add3   No
0     Mult1  No
0     Mult2  Yes   DIVD   M*F4      M(34+R2)

Register result status:
Clock      F0     F2        F4  F6         F8       F10      F12  ...  F30
58     FU  M*F4   M(45+R3)      (M–M)+M()  M()–M()  Mult2
Tomasulo Example Cycle 59

Instruction status:
Instruction  j    k    Issue  Exec complete  Write result
LD    F6     34+  R2   1      3              4
LD    F2     45+  R3   2      5              6
MULTD F0     F2   F4   3      16             17
SUBD  F8     F6   F2   4      8              9
DIVD  F10    F0   F6   5      58             59
ADDD  F6     F8   F2   6      12             13

Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Reservation stations:
                          S1        S2        RS for j  RS for k
Time  Name   Busy  Op     Vj        Vk        Qj        Qk
0     Add1   No
0     Add2   No
      Add3   No
0     Mult1  No
0     Mult2  No

Register result status:
Clock      F0     F2        F4  F6         F8       F10      F12  ...  F30
59     FU  M*F4   M(45+R3)      (M–M)+M()  M()–M()  M*F4/M
Tomasulo Loop Example
Loop:  LD     F0  0   R1
       MULTD  F4  F0  F2
       SD     F4  0   R1
       SUBI   R1  R1  #8
       BNEZ   R1  Loop
• Multiply takes 4 clocks
• Loads have cache misses
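To see why this loop can overlap iterations, it helps to rename two copies of the loop body by hand. The sketch below is a simplified illustration, not the hardware algorithm: the function name `rename` and the tuple encoding of instructions are hypothetical, units are handed out in order and never freed, and the integer instructions (SUBI, BNEZ) and the address register R1 are left out.

```python
def rename(instructions, pools):
    """Give each instruction a buffer or station and replace sources by producer tags."""
    producer = {}                                   # register -> unit that will write it
    free = {kind: list(names) for kind, names in pools.items()}
    renamed = []
    for op, dest, sources in instructions:
        unit = free[op].pop(0)                      # assumes a unit is free (no stall modeled)
        tags = [producer.get(src, src) for src in sources]
        if dest is not None:
            producer[dest] = unit                   # later readers of dest wait on this unit
        renamed.append((unit, op, tags))
    return renamed


# One loop body: LD F0; MULTD F4 = F0 * F2; SD F4.
body = [("LD", "F0", []), ("MULTD", "F4", ["F0", "F2"]), ("SD", None, ["F4"])]
pools = {
    "LD": ["Load1", "Load2", "Load3"],
    "MULTD": ["Mult1", "Mult2"],
    "SD": ["Store1", "Store2", "Store3"],
}

for unit, op, tags in rename(body * 2, pools):
    print(unit, op, tags)
```

In this sketch the second MULTD waits on Load2 rather than on the architectural register F0, and the second SD waits on Mult2, so the two iterations share no register dependences once renamed.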
Loop Example Cycle 0

Instruction status:
Instruction  j   k   Iteration  Issue  Exec complete  Write result
LD    F0     0   R1  1
MULTD F4     F0  F2  1
SD    F4     0   R1  1
LD    F0     0   R1  2
MULTD F4     F0  F2  2
SD    F4     0   R1  2

Load and store buffers:
Name    Busy  Address  Qi
Load1   No
Load2   No
Load3   No
Store1  No
Store2  No
Store3  No

Reservation stations:
                          S1     S2     RS for j  RS for k
Time  Name   Busy  Op     Vj     Vk     Qj        Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  No
0     Mult2  No

Code:
LD    F0  0   R1
MULTD F4  F0  F2
SD    F4  0   R1
SUBI  R1  R1  #8
BNEZ  R1  Loop

Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
0      80  Qi
Loop Example Cycle 1

Instruction status:
Instruction  j   k   Iteration  Issue  Exec complete  Write result
LD    F0     0   R1  1          1
MULTD F4     F0  F2  1
SD    F4     0   R1  1
LD    F0     0   R1  2
MULTD F4     F0  F2  2
SD    F4     0   R1  2

Load and store buffers:
Name    Busy  Address  Qi
Load1   Yes   80
Load2   No
Load3   No
Store1  No
Store2  No
Store3  No

Reservation stations:
                          S1     S2     RS for j  RS for k
Time  Name   Busy  Op     Vj     Vk     Qj        Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  No
0     Mult2  No

Code:
LD    F0  0   R1
MULTD F4  F0  F2
SD    F4  0   R1
SUBI  R1  R1  #8
BNEZ  R1  Loop

Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
1      80  Qi  Load1
Loop Example Cycle 2

Instruction status:
Instruction  j   k   Iteration  Issue  Exec complete  Write result
LD    F0     0   R1  1          1
MULTD F4     F0  F2  1          2
SD    F4     0   R1  1
LD    F0     0   R1  2
MULTD F4     F0  F2  2
SD    F4     0   R1  2

Load and store buffers:
Name    Busy  Address  Qi
Load1   Yes   80
Load2   No
Load3   No
Store1  No
Store2  No
Store3  No

Reservation stations:
                          S1     S2     RS for j  RS for k
Time  Name   Busy  Op     Vj     Vk     Qj        Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD         R(F2)  Load1
0     Mult2  No

Code:
LD    F0  0   R1
MULTD F4  F0  F2
SD    F4  0   R1
SUBI  R1  R1  #8
BNEZ  R1  Loop

Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
2      80  Qi  Load1      Mult1
Loop Example Cycle 3

Instruction status:
Instruction  j   k   Iteration  Issue  Exec complete  Write result
LD    F0     0   R1  1          1
MULTD F4     F0  F2  1          2
SD    F4     0   R1  1          3
LD    F0     0   R1  2
MULTD F4     F0  F2  2
SD    F4     0   R1  2

Load and store buffers:
Name    Busy  Address  Qi
Load1   Yes   80
Load2   No
Load3   No
Store1  Yes   80       Mult1
Store2  No
Store3  No

Reservation stations:
                          S1     S2     RS for j  RS for k
Time  Name   Busy  Op     Vj     Vk     Qj        Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD         R(F2)  Load1
0     Mult2  No

Code:
LD    F0  0   R1
MULTD F4  F0  F2
SD    F4  0   R1
SUBI  R1  R1  #8
BNEZ  R1  Loop

Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
3      80  Qi  Load1      Mult1
Loop Example Cycle 4

Instruction status:
Instruction  j   k   Iteration  Issue  Exec complete  Write result
LD    F0     0   R1  1          1
MULTD F4     F0  F2  1          2
SD    F4     0   R1  1          3
LD    F0     0   R1  2
MULTD F4     F0  F2  2
SD    F4     0   R1  2

Load and store buffers:
Name    Busy  Address  Qi
Load1   Yes   80
Load2   No
Load3   No
Store1  Yes   80       Mult1
Store2  No
Store3  No

Reservation stations:
                          S1     S2     RS for j  RS for k
Time  Name   Busy  Op     Vj     Vk     Qj        Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD         R(F2)  Load1
0     Mult2  No

Code:
LD    F0  0   R1
MULTD F4  F0  F2
SD    F4  0   R1
SUBI  R1  R1  #8
BNEZ  R1  Loop

Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
4      72  Qi  Load1      Mult1
Loop Example Cycle 5

Instruction status:
Instruction  j   k   Iteration  Issue  Exec complete  Write result
LD    F0     0   R1  1          1
MULTD F4     F0  F2  1          2
SD    F4     0   R1  1          3
LD    F0     0   R1  2
MULTD F4     F0  F2  2
SD    F4     0   R1  2

Load and store buffers:
Name    Busy  Address  Qi
Load1   Yes   80
Load2   No
Load3   No
Store1  Yes   80       Mult1
Store2  No
Store3  No

Reservation stations:
                          S1     S2     RS for j  RS for k
Time  Name   Busy  Op     Vj     Vk     Qj        Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD         R(F2)  Load1
0     Mult2  No

Code:
LD    F0  0   R1
MULTD F4  F0  F2
SD    F4  0   R1
SUBI  R1  R1  #8
BNEZ  R1  Loop

Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
5      72  Qi  Load1      Mult1
Loop Example Cycle 6

Instruction status:
Instruction  j   k   Iteration  Issue  Exec complete  Write result
LD    F0     0   R1  1          1
MULTD F4     F0  F2  1          2
SD    F4     0   R1  1          3
LD    F0     0   R1  2          6
MULTD F4     F0  F2  2
SD    F4     0   R1  2

Load and store buffers:
Name    Busy  Address  Qi
Load1   Yes   80
Load2   Yes   72
Load3   No
Store1  Yes   80       Mult1
Store2  No
Store3  No

Reservation stations:
                          S1     S2     RS for j  RS for k
Time  Name   Busy  Op     Vj     Vk     Qj        Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD         R(F2)  Load1
0     Mult2  No

Code:
LD    F0  0   R1
MULTD F4  F0  F2
SD    F4  0   R1
SUBI  R1  R1  #8
BNEZ  R1  Loop

Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
6      72  Qi  Load1      Mult1
Loop Example Cycle 7

Instruction status:
Instruction  j   k   Iteration  Issue  Exec complete  Write result
LD    F0     0   R1  1          1
MULTD F4     F0  F2  1          2
SD    F4     0   R1  1          3
LD    F0     0   R1  2          6
MULTD F4     F0  F2  2          7
SD    F4     0   R1  2

Load and store buffers:
Name    Busy  Address  Qi
Load1   Yes   80
Load2   Yes   72
Load3   No
Store1  Yes   80       Mult1
Store2  No
Store3  No

Reservation stations:
                          S1     S2     RS for j  RS for k
Time  Name   Busy  Op     Vj     Vk     Qj        Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD         R(F2)  Load1
0     Mult2  Yes   MULTD         R(F2)  Load2

Code:
LD    F0  0   R1
MULTD F4  F0  F2
SD    F4  0   R1
SUBI  R1  R1  #8
BNEZ  R1  Loop

Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
7      72  Qi  Load2      Mult2
Loop Example Cycle 8

Instruction status:
Instruction  j   k   Iteration  Issue  Exec complete  Write result
LD    F0     0   R1  1          1
MULTD F4     F0  F2  1          2
SD    F4     0   R1  1          3
LD    F0     0   R1  2          6
MULTD F4     F0  F2  2          7
SD    F4     0   R1  2          8

Load and store buffers:
Name    Busy  Address  Qi
Load1   Yes   80
Load2   Yes   72
Load3   No
Store1  Yes   80       Mult1
Store2  Yes   72       Mult2
Store3  No

Reservation stations:
                          S1     S2     RS for j  RS for k
Time  Name   Busy  Op     Vj     Vk     Qj        Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD         R(F2)  Load1
0     Mult2  Yes   MULTD         R(F2)  Load2

Code:
LD    F0  0   R1
MULTD F4  F0  F2
SD    F4  0   R1
SUBI  R1  R1  #8
BNEZ  R1  Loop

Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
8      72  Qi  Load2      Mult2
Loop Example Cycle 9

Instruction status:
Instruction  j   k   Iteration  Issue  Exec complete  Write result
LD    F0     0   R1  1          1      9
MULTD F4     F0  F2  1          2
SD    F4     0   R1  1          3
LD    F0     0   R1  2          6
MULTD F4     F0  F2  2          7
SD    F4     0   R1  2          8

Load and store buffers:
Name    Busy  Address  Qi
Load1   Yes   80
Load2   Yes   72
Load3   No
Store1  Yes   80       Mult1
Store2  Yes   72       Mult2
Store3  No

Reservation stations:
                          S1     S2     RS for j  RS for k
Time  Name   Busy  Op     Vj     Vk     Qj        Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD         R(F2)  Load1
0     Mult2  Yes   MULTD         R(F2)  Load2

Code:
LD    F0  0   R1
MULTD F4  F0  F2
SD    F4  0   R1
SUBI  R1  R1  #8
BNEZ  R1  Loop

Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
9      64  Qi  Load2      Mult2
Loop Example Cycle 10

Instruction status:
Instruction  j   k   Iteration  Issue  Exec complete  Write result
LD    F0     0   R1  1          1      9              10
MULTD F4     F0  F2  1          2
SD    F4     0   R1  1          3
LD    F0     0   R1  2          6      10
MULTD F4     F0  F2  2          7
SD    F4     0   R1  2          8

Load and store buffers:
Name    Busy  Address  Qi
Load1   No
Load2   Yes   72
Load3   No
Store1  Yes   80       Mult1
Store2  Yes   72       Mult2
Store3  No

Reservation stations:
                          S1     S2     RS for j  RS for k
Time  Name   Busy  Op     Vj     Vk     Qj        Qk
0     Add1   No
0     Add2   No
0     Add3   No
4     Mult1  Yes   MULTD  M(80)  R(F2)
0     Mult2  Yes   MULTD         R(F2)  Load2

Code:
LD    F0  0   R1
MULTD F4  F0  F2
SD    F4  0   R1
SUBI  R1  R1  #8
BNEZ  R1  Loop

Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
10     64  Qi  Load2      Mult2
Loop Example Cycle 11

Instruction status:
Instruction  j   k   Iteration  Issue  Exec complete  Write result
LD    F0     0   R1  1          1      9              10
MULTD F4     F0  F2  1          2
SD    F4     0   R1  1          3
LD    F0     0   R1  2          6      10             11
MULTD F4     F0  F2  2          7
SD    F4     0   R1  2          8

Load and store buffers:
Name    Busy  Address  Qi
Load1   No
Load2   No
Load3   Yes   64
Store1  Yes   80       Mult1
Store2  Yes   72       Mult2
Store3  No

Reservation stations:
                          S1     S2     RS for j  RS for k
Time  Name   Busy  Op     Vj     Vk     Qj        Qk
0     Add1   No
0     Add2   No
0     Add3   No
3     Mult1  Yes   MULTD  M(80)  R(F2)
4     Mult2  Yes   MULTD  M(72)  R(F2)

Code:
LD    F0  0   R1
MULTD F4  F0  F2
SD    F4  0   R1
SUBI  R1  R1  #8
BNEZ  R1  Loop

Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
11     64  Qi             Mult2
Loop Example Cycle 12

Instruction status:
Instruction  j   k   Iteration  Issue  Exec complete  Write result
LD    F0     0   R1  1          1      9              10
MULTD F4     F0  F2  1          2
SD    F4     0   R1  1          3
LD    F0     0   R1  2          6      10             11
MULTD F4     F0  F2  2          7
SD    F4     0   R1  2          8

Load and store buffers:
Name    Busy  Address  Qi
Load1   No
Load2   No
Load3   Yes   64
Store1  Yes   80       Mult1
Store2  Yes   72       Mult2
Store3  No

Reservation stations:
                          S1     S2     RS for j  RS for k
Time  Name   Busy  Op     Vj     Vk     Qj        Qk
0     Add1   No
0     Add2   No
0     Add3   No
2     Mult1  Yes   MULTD  M(80)  R(F2)
3     Mult2  Yes   MULTD  M(72)  R(F2)

Code:
LD    F0  0   R1
MULTD F4  F0  F2
SD    F4  0   R1
SUBI  R1  R1  #8
BNEZ  R1  Loop

Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
12     64  Qi             Mult2
Loop Example Cycle 13

Instruction status:
Instruction  j   k   Iteration  Issue  Exec complete  Write result
LD    F0     0   R1  1          1      9              10
MULTD F4     F0  F2  1          2
SD    F4     0   R1  1          3
LD    F0     0   R1  2          6      10             11
MULTD F4     F0  F2  2          7
SD    F4     0   R1  2          8

Load and store buffers:
Name    Busy  Address  Qi
Load1   No
Load2   No
Load3   Yes   64
Store1  Yes   80       Mult1
Store2  Yes   72       Mult2
Store3  No

Reservation stations:
                          S1     S2     RS for j  RS for k
Time  Name   Busy  Op     Vj     Vk     Qj        Qk
0     Add1   No
0     Add2   No
0     Add3   No
1     Mult1  Yes   MULTD  M(80)  R(F2)
2     Mult2  Yes   MULTD  M(72)  R(F2)

Code:
LD    F0  0   R1
MULTD F4  F0  F2
SD    F4  0   R1
SUBI  R1  R1  #8
BNEZ  R1  Loop

Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
13     64  Qi             Mult2
Loop Example Cycle 14

Instruction status:
Instruction  j   k   Iteration  Issue  Exec complete  Write result
LD    F0     0   R1  1          1      9              10
MULTD F4     F0  F2  1          2      14
SD    F4     0   R1  1          3
LD    F0     0   R1  2          6      10             11
MULTD F4     F0  F2  2          7
SD    F4     0   R1  2          8

Load and store buffers:
Name    Busy  Address  Qi
Load1   No
Load2   No
Load3   Yes   64
Store1  Yes   80       Mult1
Store2  Yes   72       Mult2
Store3  No

Reservation stations:
                          S1     S2     RS for j  RS for k
Time  Name   Busy  Op     Vj     Vk     Qj        Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD  M(80)  R(F2)
1     Mult2  Yes   MULTD  M(72)  R(F2)

Code:
LD    F0  0   R1
MULTD F4  F0  F2
SD    F4  0   R1
SUBI  R1  R1  #8
BNEZ  R1  Loop

Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
14     64  Qi             Mult2
Loop Example Cycle 15

Instruction status:
Instruction  j   k   Iteration  Issue  Exec complete  Write result
LD    F0     0   R1  1          1      9              10
MULTD F4     F0  F2  1          2      14             15
SD    F4     0   R1  1          3
LD    F0     0   R1  2          6      10             11
MULTD F4     F0  F2  2          7      15
SD    F4     0   R1  2          8

Load and store buffers:
Name    Busy  Address  Qi
Load1   No
Load2   No
Load3   Yes   64
Store1  Yes   80       M(80)*R(F2)
Store2  Yes   72       Mult2
Store3  No

Reservation stations:
                          S1     S2     RS for j  RS for k
Time  Name   Busy  Op     Vj     Vk     Qj        Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  No
0     Mult2  Yes   MULTD  M(72)  R(F2)

Code:
LD    F0  0   R1
MULTD F4  F0  F2
SD    F4  0   R1
SUBI  R1  R1  #8
BNEZ  R1  Loop

Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
15     64  Qi             Mult2
Loop Example Cycle 16

Instruction status:
Instruction  j   k   Iteration  Issue  Exec complete  Write result
LD    F0     0   R1  1          1      9              10
MULTD F4     F0  F2  1          2      14             15
SD    F4     0   R1  1          3
LD    F0     0   R1  2          6      10             11
MULTD F4     F0  F2  2          7      15             16
SD    F4     0   R1  2          8

Load and store buffers:
Name    Busy  Address  Qi
Load1   No
Load2   No
Load3   Yes   64
Store1  Yes   80       M(80)*R(F2)
Store2  Yes   72       M(72)*R(F2)
Store3  No

Reservation stations:
                          S1     S2     RS for j  RS for k
Time  Name   Busy  Op     Vj     Vk     Qj        Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD         R(F2)  Load3
0     Mult2  No

Code:
LD    F0  0   R1
MULTD F4  F0  F2
SD    F4  0   R1
SUBI  R1  R1  #8
BNEZ  R1  Loop

Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
16     64  Qi             Mult1
Loop Example Cycle 17

Instruction status:
Instruction  j   k   Iteration  Issue  Exec complete  Write result
LD    F0     0   R1  1          1      9              10
MULTD F4     F0  F2  1          2      14             15
SD    F4     0   R1  1          3
LD    F0     0   R1  2          6      10             11
MULTD F4     F0  F2  2          7      15             16
SD    F4     0   R1  2          8

Load and store buffers:
Name    Busy  Address  Qi
Load1   No
Load2   No
Load3   Yes   64
Store1  Yes   80       M(80)*R(F2)
Store2  Yes   72       M(72)*R(F2)
Store3  Yes   64       Mult1

Reservation stations:
                          S1     S2     RS for j  RS for k
Time  Name   Busy  Op     Vj     Vk     Qj        Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD         R(F2)  Load3
0     Mult2  No

Code:
LD    F0  0   R1
MULTD F4  F0  F2
SD    F4  0   R1
SUBI  R1  R1  #8
BNEZ  R1  Loop

Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
17     64  Qi             Mult1
Loop Example Cycle 18

Instruction status:
Instruction  j   k   Iteration  Issue  Exec complete  Write result
LD    F0     0   R1  1          1      9              10
MULTD F4     F0  F2  1          2      14             15
SD    F4     0   R1  1          3      18
LD    F0     0   R1  2          6      10             11
MULTD F4     F0  F2  2          7      15             16
SD    F4     0   R1  2          8

Load and store buffers:
Name    Busy  Address  Qi
Load1   No
Load2   No
Load3   Yes   64
Store1  Yes   80       M(80)*R(F2)
Store2  Yes   72       M(72)*R(F2)
Store3  Yes   64       Mult1

Reservation stations:
                          S1     S2     RS for j  RS for k
Time  Name   Busy  Op     Vj     Vk     Qj        Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD         R(F2)  Load3
0     Mult2  No

Code:
LD    F0  0   R1
MULTD F4  F0  F2
SD    F4  0   R1
SUBI  R1  R1  #8
BNEZ  R1  Loop

Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
18     56  Qi             Mult1
Loop Example Cycle 19

Instruction status:
Instruction  j   k   Iteration  Issue  Exec complete  Write result
LD    F0     0   R1  1          1      9              10
MULTD F4     F0  F2  1          2      14             15
SD    F4     0   R1  1          3      18             19
LD    F0     0   R1  2          6      10             11
MULTD F4     F0  F2  2          7      15             16
SD    F4     0   R1  2          8

Load and store buffers:
Name    Busy  Address  Qi
Load1   No
Load2   No
Load3   Yes   64
Store1  No
Store2  Yes   72       M(72)*R(F2)
Store3  Yes   64       Mult1

Reservation stations:
                          S1     S2     RS for j  RS for k
Time  Name   Busy  Op     Vj     Vk     Qj        Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD         R(F2)  Load3
0     Mult2  No

Code:
LD    F0  0   R1
MULTD F4  F0  F2
SD    F4  0   R1
SUBI  R1  R1  #8
BNEZ  R1  Loop

Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
19     56  Qi             Mult1
Loop Example Cycle 20

Instruction status:
Instruction  j   k   Iteration  Issue  Exec complete  Write result
LD    F0     0   R1  1          1      9              10
MULTD F4     F0  F2  1          2      14             15
SD    F4     0   R1  1          3      18             19
LD    F0     0   R1  2          6      10             11
MULTD F4     F0  F2  2          7      15             16
SD    F4     0   R1  2          8      20

Load and store buffers:
Name    Busy  Address  Qi
Load1   No
Load2   No
Load3   Yes   64
Store1  No
Store2  Yes   72       M(72)*R(F2)
Store3  Yes   64       Mult1

Reservation stations:
                          S1     S2     RS for j  RS for k
Time  Name   Busy  Op     Vj     Vk     Qj        Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD         R(F2)  Load3
0     Mult2  No

Code:
LD    F0  0   R1
MULTD F4  F0  F2
SD    F4  0   R1
SUBI  R1  R1  #8
BNEZ  R1  Loop

Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
20     56  Qi             Mult1
Loop Example Cycle 21

Instruction status:
Instruction  j   k   Iteration  Issue  Exec complete  Write result
LD    F0     0   R1  1          1      9              10
MULTD F4     F0  F2  1          2      14             15
SD    F4     0   R1  1          3      18             19
LD    F0     0   R1  2          6      10             11
MULTD F4     F0  F2  2          7      15             16
SD    F4     0   R1  2          8      20             21

Load and store buffers:
Name    Busy  Address  Qi
Load1   No
Load2   No
Load3   Yes   64
Store1  No
Store2  No
Store3  Yes   64       Mult1

Reservation stations:
                          S1     S2     RS for j  RS for k
Time  Name   Busy  Op     Vj     Vk     Qj        Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD         R(F2)  Load3
0     Mult2  No

Code:
LD    F0  0   R1
MULTD F4  F0  F2
SD    F4  0   R1
SUBI  R1  R1  #8
BNEZ  R1  Loop

Register result status:
Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
21     56  Qi             Mult1
Tomasulo Summary
• Prevents the register file from becoming a bottleneck
• Avoids the WAR, WAW hazards of the Scoreboard
• Allows loop unrolling in HW
• Not limited to basic blocks (provided branch prediction)
• Lasting Contributions
  – Dynamic scheduling
  – Register renaming
  – Load/store disambiguation
• Next: More branch prediction