
ECE 2162 Tomasulo’s Algorithm. Implementing Dynamic Scheduling Tomasulo’s Algorithm –Used in...

Date post: 14-Dec-2015
Upload: gabriella-hicken
Page 1: ECE 2162 Tomasulo’s Algorithm. Implementing Dynamic Scheduling Tomasulo’s Algorithm –Used in IBM 360/91 (in the 60s) –Tracks when operands are available.

ECE 2162 Tomasulo’s Algorithm


Implementing Dynamic Scheduling

• Tomasulo’s Algorithm
  – Used in IBM 360/91 (in the 60s)
  – Tracks when operands are available to satisfy data dependences
  – Removes name dependences through register renaming
  – Very similar to what is used today

• Almost all modern high-performance processors use a derivative of Tomasulo’s… much of the terminology survives to today.


Tomasulo’s Algorithm: The Picture


Issue (1)

– Get next instruction from instruction queue.
– Find a free reservation station for it (if none are free, stall until one is)
– Read operands that are in the registers
– If the operand is not in the register, find which reservation station will produce it
– In effect, this step renames registers (reservation station IDs are “temporary” names)
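
The issue step above can be sketched in a few lines of Python. This is only an illustrative model, not the 360/91 hardware: the dictionary layout, the helper names (`make_stations`, `read_operand`, `issue`) and the convention that RAT entry 0 means "value is in the register file" are assumptions chosen to match the adder/FP-Cmplx example used on the following slides.

```python
def make_stations():
    # Assumed layout: tags 1-3 feed the adder, tags 4-5 the FP-Cmplx unit.
    # Tag 0 is reserved to mean "the value is in the register file".
    units = {1: "add", 2: "add", 3: "add", 4: "cmplx", 5: "cmplx"}
    return {tag: {"unit": unit, "busy": False, "op": None,
                  "vj": None, "vk": None, "qj": 0, "qk": 0}
            for tag, unit in units.items()}

UNIT_FOR_OP = {"+": "add", "-": "add", "/": "cmplx"}

def read_operand(reg, rat, regfile):
    """Return (value, 0) if the register is current, else (None, producer tag)."""
    tag = rat[reg]
    return (regfile[reg], 0) if tag == 0 else (None, tag)

def issue(inst, stations, rat, regfile):
    """Issue (dest, src1, op, src2); return the station tag, or None to stall."""
    dest, src1, op, src2 = inst
    free = [t for t, rs in stations.items()
            if rs["unit"] == UNIT_FOR_OP[op] and not rs["busy"]]
    if not free:
        return None                      # no free station: stall
    tag = free[0]
    rs = stations[tag]
    rs["busy"], rs["op"] = True, op
    # Sources are looked up before the destination is renamed, so an
    # instruction that reads and writes the same register sees the old mapping.
    rs["vj"], rs["qj"] = read_operand(src1, rat, regfile)
    rs["vk"], rs["qk"] = read_operand(src2, rat, regfile)
    rat[dest] = tag                      # rename: dest now refers to this station
    return tag

regfile = {"F1": 3.141593, "F2": -1.0, "F3": 2.718282, "F4": 0.707107}
rat = {"F1": 0, "F2": 0, "F3": 0, "F4": 0}
stations = make_stations()
first = issue(("F2", "F4", "+", "F1"), stations, rat, regfile)
second = issue(("F1", "F2", "/", "F3"), stations, rat, regfile)
print(first, second, rat)
```

In this sketch the second instruction cannot read F2 from the register file, so it records tag 1 instead of a value; that recorded tag is the "temporary name" the slide refers to.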


Issue (2)

To-Do list (from last slide):
– Get next inst from IB’s
– Find free reservation station
– Read operands from RF
– Record source of other operands
– Update source mapping (RAT)

[Figure: issue step for F2=F4+F1. Reservation stations: Adder A1 (1), A2 (2), A3 (3); FP-Cmplx C1 (4), C2 (5). Instruction Buffers: 1. F1 = F2 + F3; 2. F4 = F1 – F2; 3. F1 = F2 / F3. RAT: F1 = 0, F2 = 1, F3 = 0, F4 = 0. Reg File: F1 = 3.141593, F2 = -1.00000, F3 = 2.718282, F4 = 0.707107. Operand value shown: 0.7071.]


Issue (2)

To-Do list (from last slide):
– Get next inst from IB’s
– Find free reservation station
– Read operands from RF
– Record source of other operands
– Update source mapping (RAT)

[Figure: issue step with F2=F4+F1 and F1=F2/F3 at the reservation stations. Reservation stations: Adder A1 (1), A2 (2), A3 (3); FP-Cmplx C1 (4), C2 (5). Instruction Buffers: 1. F1 = F2 + F3; 2. F4 = F1 – F2; 3. F1 = F2 / F3. RAT: F1 = 0, F2 = 1, F3 = 0, F4 = 0. Reg File: F1 = 3.141593, F2 = -1.00000, F3 = 2.718282, F4 = 0.707107. Operands shown: 0.7071; (1) 2.718.]


Execute (1)

– Monitor results as they are produced
– Put a result into all reservation stations waiting for it (missing source operand)
– When all operands are available for an instruction, it is ready (we can actually execute it)
– Several ready instrs for one functional unit?
  • Pick one.
  • Except for load/store: Load/Store must be done in the proper order to avoid hazards through memory (more on loads/stores in a later lecture)
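
The monitoring and capture behaviour described above might look as follows in Python. This is a minimal sketch under assumed conventions: each station is a dictionary, `qj`/`qk` hold the tag of the producing station, and 0 means the operand value is already present. The function names `capture` and `is_ready` are illustrative only.

```python
def capture(stations, tag, value):
    """Copy a result produced by station `tag` into every waiting operand slot."""
    for rs in stations.values():
        if not rs["busy"]:
            continue
        if rs["qj"] == tag:
            rs["vj"], rs["qj"] = value, 0
        if rs["qk"] == tag:
            rs["vk"], rs["qk"] = value, 0

def is_ready(rs):
    """An instruction may execute once neither operand is waiting on a tag."""
    return rs["busy"] and rs["qj"] == 0 and rs["qk"] == 0

# Station 4 holds F1 = F2 / F3 and waits for station 1 to produce F2.
stations = {
    1: {"busy": True, "op": "+", "vj": 0.707107, "vk": 3.141593, "qj": 0, "qk": 0},
    4: {"busy": True, "op": "/", "vj": None, "vk": 2.718282, "qj": 1, "qk": 0},
}
before = is_ready(stations[4])
capture(stations, 1, 0.707107 + 3.141593)
after = is_ready(stations[4])
print(before, after)
```

Station 4 only becomes ready after the result tagged 1 has been captured; nothing in the sketch orders loads and stores, which the slide defers to a later lecture.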


Execute (2)

To-Do list (from last slide):
– Monitor results from ALUs
– Capture matching operands
– Compete for ALUs

[Figure: execute step. Reservation stations: Adder A1 (1), A2 (2), A3 (3); FP-Cmplx C1 (4), C2 (5), holding F2=F4+F1 and F1=F2/F3 with operands (1) 2.718. Instruction Buffers: 1. F1 = F2 + F3; 2. F4 = F1 – F2; 3. F1 = F2 / F3. RAT: F1 = 4, F2 = 1, F3 = 0, F4 = 0. Reg File: F1 = 3.141593, F2 = -1.00000, F3 = 2.718282, F4 = 0.707107.]


Execute (2)

To-Do list (from last slide):
– Monitor results from ALUs
– Capture matching operands
– Compete for ALUs

[Figure: execute step. Reservation stations: Adder A1 (1), A2 (2), A3 (3); FP-Cmplx C1 (4), C2 (5), holding F2=F4+F1, F4=F1-F2, F1=F2+F3 and F1=F2/F3 with operands (4) (1), (1) 2.718 and (1) 2.718. Instruction Buffers: 1. F1 = F2 + F3; 2. F4 = F1 – F2; 3. F1 = F2 / F3. RAT: F1 = 4 → 3, F2 = 1, F3 = 0, F4 = 2. Reg File: F1 = 3.141593, F2 = -1.00000, F3 = 2.718282, F4 = 0.707107.]


Execute (2)

To-Do list (from last slide):
– Monitor results from ALUs
– Capture matching operands
– Compete for ALUs

[Figure: execute step. F2=F4+F1 (1) produces 3.8487. Reservation stations: Adder A1 (1), A2 (2), A3 (3); FP-Cmplx C1 (4), C2 (5), holding F4=F1-F2, F1=F2+F3 and F1=F2/F3 with operands (4) (1), (1) 2.718 and (1) 2.718. Instruction Buffers: 1. F1 = F2 + F3; 2. F4 = F1 – F2; 3. F1 = F2 / F3. RAT: F1 = 3, F2 = 1, F3 = 0, F4 = 2. Reg File: F1 = 3.141593, F2 = -1.00000, F3 = 2.718282, F4 = 0.707107.]


Execute (2)

To-Do list (from last slide):
– Monitor results from ALUs
– Capture matching operands
– Compete for ALUs

[Figure: execute step, continued. F2=F4+F1 (1) produces 3.8487. Reservation stations: Adder A1 (1), A2 (2), A3 (3); FP-Cmplx C1 (4), C2 (5), holding F4=F1-F2, F1=F2+F3 and F1=F2/F3 with operands (4) (1), (1) 2.718 and (1) 2.718. Instruction Buffers: 1. F1 = F2 + F3; 2. F4 = F1 – F2; 3. F1 = F2 / F3. RAT: F1 = 3, F2 = 1, F3 = 0, F4 = 2. Reg File: F1 = 3.141593, F2 = -1.00000, F3 = 2.718282, F4 = 0.707107.]


Execute (3)
More than one ready inst for the same unit

[Figure: F2=F4+F1 (1) broadcasts 3.8487. Reservation stations: Adder A1 (1), A2 (2), A3 (3); FP-Cmplx C1 (4), C2 (5), holding F4=F3-F2, F1=F2+F3 and F1=F2/F3; operand slots previously tagged (1) now hold 3.8487 alongside 2.718.]

Common heuristic: oldest first.
You can do whatever: it only affects performance, not correctness.
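
An oldest-first pick among ready stations can be sketched as below. The `seq` field (an issue-order sequence number) and the function name `select_oldest` are assumptions made for this illustration; the slides do not say how age is tracked.

```python
def select_oldest(stations, unit):
    """Among ready stations for `unit`, return the tag issued earliest, or None."""
    ready = [(rs["seq"], tag) for tag, rs in stations.items()
             if rs["busy"] and rs["unit"] == unit
             and rs["qj"] == 0 and rs["qk"] == 0]
    return min(ready)[1] if ready else None

stations = {
    2: {"unit": "add", "busy": True, "seq": 7, "qj": 0, "qk": 0},
    3: {"unit": "add", "busy": True, "seq": 5, "qj": 0, "qk": 0},
    4: {"unit": "cmplx", "busy": True, "seq": 6, "qj": 1, "qk": 0},
}
chosen = select_oldest(stations, "add")
print(chosen)
```

Replacing `min` with any other choice among the ready entries would still produce correct results, which is the point the slide makes: the policy affects performance only.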


Write Result (1)

– When result is computed, make it available on the “common data bus” (CDB), where waiting reservation stations can pick it up
– Result stored in the register file
– Stores write to memory
– This step frees the reservation station
– For our register renaming, this recycles the temporary name (future instructions can again find the value in the actual register, until it is renamed again)
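
A rough Python sketch of the write-result step follows. It assumes the same illustrative conventions as before (stations as dictionaries, RAT entry 0 meaning "value is in the register file"), and it includes the guard stated on the Write Result (2) slide: the register file and RAT are only updated if the RAT still maps the register to the finishing station. The function name `write_result` and the sample state are hypothetical.

```python
def write_result(tag, value, stations, rat, regfile):
    # 1. Broadcast on the CDB: waiting stations capture the value.
    for rs in stations.values():
        if rs["busy"] and rs["qj"] == tag:
            rs["vj"], rs["qj"] = value, 0
        if rs["busy"] and rs["qk"] == tag:
            rs["vk"], rs["qk"] = value, 0
    # 2. Write back and clear the mapping, but only where the RAT still
    #    names this station (a later instruction may have renamed the register).
    for reg in rat:
        if rat[reg] == tag:
            regfile[reg] = value
            rat[reg] = 0
    # 3. Free the reservation station so its tag can be reused.
    stations[tag]["busy"] = False

regfile = {"F1": 3.141593, "F2": -1.0, "F3": 2.718282, "F4": 0.707107}
rat = {"F1": 4, "F2": 1, "F3": 0, "F4": 2}
stations = {
    1: {"busy": True, "vj": 0.707107, "vk": 3.141593, "qj": 0, "qk": 0},
    2: {"busy": True, "vj": None, "vk": None, "qj": 4, "qk": 1},
    3: {"busy": True, "vj": None, "vk": 2.718282, "qj": 1, "qk": 0},
    4: {"busy": True, "vj": None, "vk": 2.718282, "qj": 1, "qk": 0},
}
write_result(1, 3.8487, stations, rat, regfile)   # F2 is still mapped to tag 1
write_result(3, 6.5667, stations, rat, regfile)   # F1 is mapped to tag 4, not 3
print(regfile, rat)
```

In the second call the result is still broadcast to any waiting stations, but F1 keeps its old register-file value because a younger instruction (tag 4) now owns the name.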


Write Result (2)

To-Do list (from last slide):
– Broadcast on CDB
– Writeback to RF
– Update Mapping
– Free reservation station

Only update RAT (and RF) if RAT still contains your mapping!

[Figure: write-result step. F2=F4+F1 (1), with operands 0.7071 + …, broadcasts 3.8487 (3.8486994). Reservation stations: Adder A1 (1), A2 (2), A3 (3); FP-Cmplx C1 (4), C2 (5), holding F4=F1-F2, F1=F2+F3 and F1=F2/F3 with captured values 3.8487 and 2.718. Instructions: 0. F2 = F4 + F1; 1. F1 = F2 + F3; 2. F4 = F1 – F2; 3. F1 = F2 / F3. RAT: F1 = 3, F2 = 1, F3 = 0, F4 = 2 (values 0, 0 also shown). Reg File: F1 = 3.141593, F2 = -1.00000, F3 = 2.718282, F4 = 0.707107. A further value, 6.5667, is marked with an X.]


Tomasulo’s Algorithm: Load/Store

• The reservation stations take care of dependences through registers.
• Dependences also possible through memory
  – Loads and stores not reordered in original IBM 360
  – We’ll talk about how to do load-store reordering later


Detailed Example – Cycle 0

Instruction status (Is / Ex / W):
1. L.D F6, 34(R2)
2. L.D F2, 45(R3)
3. MUL.D F0, F2, F4
4. SUB.D F8, F2, F6
5. DIV.D F10, F0, F6
6. ADD.D F6, F8, F2

Latencies: Load: 2 cycles; Add: 2 cycles; Mult: 10 cycles; Divide: 40 cycles
Assume: R2 is 100, R3 is 200, F4 is 2.5

Reservation Stations (Busy, Op, Vj, Vk, Qj, Qk, A): LD1, LD2, AD1, AD2, AD3, ML1, ML2 (no entries yet)

RAT (F0, F2, F4, F6, F8, F10, F12): no entries yet
Architecture Reg File: F4 = 2.5, …


Detailed Example – Cycle 1

Instruction status (Is / Ex / W):
1. L.D F6, 34(R2): 1
2. L.D F2, 45(R3)
3. MUL.D F0, F2, F4
4. SUB.D F8, F2, F6
5. DIV.D F10, F0, F6
6. ADD.D F6, F8, F2

Reservation Stations (Busy, Op, Vj, Vk, Qj, Qk, A):
LD1: Busy = 1, Op = L.D, A = 134

RAT: F6 = LD1
Architecture Reg File: F4 = 2.5, …


Detailed Example – Cycle 2

Instruction status (Is / Ex / W):
1. L.D F6, 34(R2): 1 / 2-
2. L.D F2, 45(R3): 2
3. MUL.D F0, F2, F4
4. SUB.D F8, F2, F6
5. DIV.D F10, F0, F6
6. ADD.D F6, F8, F2

Reservation Stations (Busy, Op, Vj, Vk, Qj, Qk, A):
LD1: Busy = 1, Op = L.D, A = 134
LD2: Busy = 1, Op = L.D, A = 245

RAT: F2 = LD2, F6 = LD1
Architecture Reg File: F4 = 2.5, …


Detailed Example – Cycle 3

Instruction status (Is / Ex / W):
1. L.D F6, 34(R2): 1 / 2-3
2. L.D F2, 45(R3): 2 / 3
3. MUL.D F0, F2, F4: 3
4. SUB.D F8, F2, F6
5. DIV.D F10, F0, F6
6. ADD.D F6, F8, F2

Reservation Stations (Busy, Op, Vj, Vk, Qj, Qk, A):
LD1: Busy = 1, Op = L.D, A = 134
LD2: Busy = 1, Op = L.D, A = 245
ML1: Busy = 1, Op = MUL.D, Vk = 2.5, Qj = LD2

RAT: F0 = ML1, F2 = LD2, F6 = LD1
Architecture Reg File: F4 = 2.5, …


Detailed Example – Cycle 4

Instruction status (Is / Ex / W):
1. L.D F6, 34(R2): 1 / 2 / 4
2. L.D F2, 45(R3): 2 / 3
3. MUL.D F0, F2, F4: 3
4. SUB.D F8, F2, F6: 4
5. DIV.D F10, F0, F6
6. ADD.D F6, F8, F2

Reservation Stations (Busy, Op, Vj, Vk, Qj, Qk, A):
LD1: Busy = 0
LD2: Busy = 1, Op = L.D, A = 245
AD1: Busy = 1, Op = SUB.D, Vk = 0.5, Qj = LD2
ML1: Busy = 1, Op = MUL.D, Vk = 2.5, Qj = LD2

RAT: F0 = ML1, F2 = LD2, F8 = AD1
Architecture Reg File: F4 = 2.5, F6 = 0.5, …


Detailed Example – Cycle 5

Instruction status (Is / Ex / W):
1. L.D F6, 34(R2): 1 / 2 / 4
2. L.D F2, 45(R3): 2 / 3 / 5
3. MUL.D F0, F2, F4: 3
4. SUB.D F8, F2, F6: 4
5. DIV.D F10, F0, F6: 5
6. ADD.D F6, F8, F2

Reservation Stations (Busy, Op, Vj, Vk, Qj, Qk, A):
LD1, LD2: Busy = 0
AD1: Busy = 1, Op = SUB.D, Vj = 1.5, Vk = 0.5
ML1: Busy = 1, Op = MUL.D, Vj = 1.5, Vk = 2.5
ML2: Busy = 1, Op = DIV.D, Vk = 0.5, Qj = ML1

RAT: F0 = ML1, F8 = AD1, F10 = ML2
Architecture Reg File: F2 = 1.5, F4 = 2.5, F6 = 0.5, …


Detailed Example – Cycle 6

Instruction status (Is / Ex / W):
1. L.D F6, 34(R2): 1 / 2 / 4
2. L.D F2, 45(R3): 2 / 3 / 5
3. MUL.D F0, F2, F4: 3 / 6
4. SUB.D F8, F2, F6: 4 / 6
5. DIV.D F10, F0, F6: 5
6. ADD.D F6, F8, F2: 6

Reservation Stations (Busy, Op, Vj, Vk, Qj, Qk, A):
LD1, LD2: Busy = 0
AD1: Busy = 1, Op = SUB.D, Vj = 1.5, Vk = 0.5
AD2: Busy = 1, Op = ADD.D, Vk = 1.5, Qj = AD1
ML1: Busy = 1, Op = MUL.D, Vj = 1.5, Vk = 2.5
ML2: Busy = 1, Op = DIV.D, Vk = 0.5, Qj = ML1

RAT: F0 = ML1, F6 = AD2, F8 = AD1, F10 = ML2
Architecture Reg File (as shown on the slide): F2 = 1.5, F4 = 2.5, F6 = 0.5, F8 = 1.0, …


Detailed Example – Cycle 8

Instruction status (Is / Ex / W):
1. L.D F6, 34(R2): 1 / 2 / 4
2. L.D F2, 45(R3): 2 / 3 / 5
3. MUL.D F0, F2, F4: 3 / 6
4. SUB.D F8, F2, F6: 4 / 6 / 8
5. DIV.D F10, F0, F6: 5
6. ADD.D F6, F8, F2: 6

Reservation Stations (Busy, Op, Vj, Vk, Qj, Qk, A):
LD1, LD2, AD1: Busy = 0
AD2: Busy = 1, Op = ADD.D, Vj = 1.0, Vk = 1.5
ML1: Busy = 1, Op = MUL.D, Vj = 1.5, Vk = 2.5
ML2: Busy = 1, Op = DIV.D, Vk = 0.5, Qj = ML1

RAT: F0 = ML1, F6 = AD2, F10 = ML2
Architecture Reg File: F2 = 1.5, F4 = 2.5, F6 = 0.5, F8 = 1.0, …


Detailed Example – Cycle 9

Instruction status (Is / Ex / W):
1. L.D F6, 34(R2): 1 / 2 / 4
2. L.D F2, 45(R3): 2 / 3 / 5
3. MUL.D F0, F2, F4: 3 / 6
4. SUB.D F8, F2, F6: 4 / 6 / 8
5. DIV.D F10, F0, F6: 5
6. ADD.D F6, F8, F2: 6 / 9

Reservation Stations (Busy, Op, Vj, Vk, Qj, Qk, A):
LD1, LD2, AD1: Busy = 0
AD2: Busy = 1, Op = ADD.D, Vj = 1.0, Vk = 1.5
ML1: Busy = 1, Op = MUL.D, Vj = 1.5, Vk = 2.5
ML2: Busy = 1, Op = DIV.D, Vk = 0.5, Qj = ML1

RAT: F0 = ML1, F6 = AD2, F10 = ML2
Architecture Reg File: F2 = 1.5, F4 = 2.5, F6 = 0.5, F8 = 1.0, …


Detailed Example – Cycle 11

Instruction status (Is / Ex / W):
1. L.D F6, 34(R2): 1 / 2 / 4
2. L.D F2, 45(R3): 2 / 3 / 5
3. MUL.D F0, F2, F4: 3 / 6
4. SUB.D F8, F2, F6: 4 / 6 / 8
5. DIV.D F10, F0, F6: 5
6. ADD.D F6, F8, F2: 6 / 9 / 11

Reservation Stations (Busy, Op, Vj, Vk, Qj, Qk, A):
LD1, LD2, AD1, AD2: Busy = 0
ML1: Busy = 1, Op = MUL.D, Vj = 1.5, Vk = 2.5
ML2: Busy = 1, Op = DIV.D, Vk = 0.5, Qj = ML1

RAT: F0 = ML1, F10 = ML2
Architecture Reg File: F2 = 1.5, F4 = 2.5, F6 = 2.5, F8 = 1.0, …


Detailed Example – Cycle 16

Instruction status (Is / Ex / W):
1. L.D F6, 34(R2): 1 / 2 / 4
2. L.D F2, 45(R3): 2 / 3 / 5
3. MUL.D F0, F2, F4: 3 / 6 / 16
4. SUB.D F8, F2, F6: 4 / 6 / 8
5. DIV.D F10, F0, F6: 5
6. ADD.D F6, F8, F2: 6 / 9 / 11

Reservation Stations (Busy, Op, Vj, Vk, Qj, Qk, A):
LD1, LD2, AD1, AD2, ML1: Busy = 0
ML2: Busy = 1, Op = DIV.D, Vj = 3.75, Vk = 0.5

RAT: F10 = ML2
Architecture Reg File: F0 = 3.75, F2 = 1.5, F4 = 2.5, F6 = 2.5, F8 = 1.0, …


Detailed Example – Cycle 17

Instruction status (Is / Ex / W):
1. L.D F6, 34(R2): 1 / 2 / 4
2. L.D F2, 45(R3): 2 / 3 / 5
3. MUL.D F0, F2, F4: 3 / 6 / 16
4. SUB.D F8, F2, F6: 4 / 6 / 8
5. DIV.D F10, F0, F6: 5 / 17
6. ADD.D F6, F8, F2: 6 / 9 / 11

Reservation Stations (Busy, Op, Vj, Vk, Qj, Qk, A):
LD1, LD2, AD1, AD2, ML1: Busy = 0
ML2: Busy = 1, Op = DIV.D, Vj = 3.75, Vk = 0.5

RAT: F10 = ML2
Architecture Reg File: F0 = 3.75, F2 = 1.5, F4 = 2.5, F6 = 2.5, F8 = 1.0, …


Detailed Example – Cycle 57

Instruction status (Is / Ex / W):
1. L.D F6, 34(R2): 1 / 2 / 4
2. L.D F2, 45(R3): 2 / 3 / 5
3. MUL.D F0, F2, F4: 3 / 6 / 16
4. SUB.D F8, F2, F6: 4 / 6 / 8
5. DIV.D F10, F0, F6: 5 / 17 / 57
6. ADD.D F6, F8, F2: 6 / 9 / 11

Reservation Stations (Busy, Op, Vj, Vk, Qj, Qk, A):
LD1, LD2, AD1, AD2, ML1, ML2: Busy = 0

RAT: no entries
Architecture Reg File: F0 = 3.75, F2 = 1.5, F4 = 2.5, F6 = 2.5, F8 = 1.0, F10 = 7.5, …


Timing Example

• Kind of hard to keep track with previous table-based approach
• Simplified version to track timing only

Inst    Operands      Is   Exec   Wr   Comments
L.D     F6,34(R2)     1    2      4
L.D     F2,45(R3)     2    3      5
MUL.D   F0,F2,F4      3    6      16
SUB.D   F8,F2,F6      4    6      8
DIV.D   F10,F0,F6     5    17     57
ADD.D   F6,F8,F2      6    9      11

Load: 2 cycles; Add: 2 cycles; Mult: 10 cycles; Divide: 40 cycles
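
The timing table can be reproduced with a small Python sketch. The rule used here is inferred from the numbers in the table rather than stated on the slides: one instruction issues per cycle, execution starts the cycle after both issue and the last source operand's write, and the write happens `latency` cycles after execution starts. Structural hazards (busy functional units, CDB conflicts, full reservation stations) are deliberately not modelled, and the integer base registers R2 and R3 are assumed to be available. The names `PROGRAM`, `LATENCY` and `timing` are illustrative.

```python
LATENCY = {"L.D": 2, "ADD.D": 2, "SUB.D": 2, "MUL.D": 10, "DIV.D": 40}

# (opcode, destination, floating-point source registers), in program order.
PROGRAM = [
    ("L.D", "F6", []),
    ("L.D", "F2", []),
    ("MUL.D", "F0", ["F2", "F4"]),
    ("SUB.D", "F8", ["F2", "F6"]),
    ("DIV.D", "F10", ["F0", "F6"]),
    ("ADD.D", "F6", ["F8", "F2"]),
]

def timing(program, latency):
    """Return (opcode, issue, exec start, write) for each instruction."""
    written_at = {}   # register -> write cycle of its latest producer so far
    rows = []
    for index, (op, dest, sources) in enumerate(program):
        issue = index + 1                       # one issue per cycle, in order
        # Registers with no producer in the program are ready from cycle 0.
        operands = max((written_at.get(s, 0) for s in sources), default=0)
        start = max(issue, operands) + 1
        write = start + latency[op]
        written_at[dest] = write                # later readers wait on this one
        rows.append((op, issue, start, write))
    return rows

for row in timing(PROGRAM, LATENCY):
    print(row)
```

Because sources are resolved in program order before the destination is recorded, DIV.D waits on the first L.D for F6 rather than on the later ADD.D that also writes F6, which mirrors how renaming removes that name dependence.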

