About the Final Exam
Due to scheduling conflicts, the final exam will be a take-home exam
• Available by Friday, April 20 (this Friday)
  ♦ May be ready in time for class on Thursday
• Three-hour exam
  ♦ Closed book, closed notes, closed devices
  ♦ Covered by the Rice Honor Code
• Due back to me by May 2
  ♦ It would help me if you return them earlier than May 2
  ♦ Return by signing it in and sliding it under my door
    → clipboard outside the door to sign
I will post a list of topics by Thursday
COMP 506, Rice University 1
Instruction Scheduling
Copyright 2018, Keith D. Cooper & Linda Torczon, all rights reserved.
Students enrolled in Comp 506 at Rice University have explicit permission to make copies of these
materials for their personal use.
Faculty from other educational institutions may use these materials for nonprofit educational purposes,
provided this copyright notice is preserved
COMP 506
Rice University
Spring 2018
Chapter 12 in EaC2e.
[Figure: compiler structure: source code → Front End → IR → Optimizer → IR → Back End → target code]
Order of Operations Matters
The order in which operations issue affects total runtime
• Many operations have non-zero latencies
• Modern machines can issue several operations in each cycle
• Execution time is order-dependent (and has been so since the 1960's)

Instruction Scheduling reorders operations
• Tries to minimize wasted cycles (cover latencies with useful work)
• Tries to maximize functional unit utilization

Opportunities
• Non-blocking operations
  ♦ Can issue new operations while waiting for one that is executing
• Operations that are independent of each other
  ♦ Can issue independent operations in parallel with each other
COMP 506, Rice University 3
ALU Characteristics
To schedule operations, the compiler needs accurate data on latencies.
This data is surprisingly hard to obtain.
COMP 506, Rice University 4
Intel E5530 operation latencies

Instruction                  Cost
64-bit integer subtract         1
64-bit integer multiply         3
64-bit integer divide          41
Double-precision add            3
Double-precision subtract       3
Double-precision multiply       5
Double-precision divide        22
Single-precision add            3
Single-precision subtract       3
Single-precision multiply       4
Single-precision divide        14
• Value-dependent behavior
• Context-dependent behavior
• Compiler-dependent behavior
♦ Have seen gcc underallocate registers & inflate operation costs with spills
♦ Have seen commercial compiler generate 3 extra ops per divide (raising cost by 3)
• Difficult to reconcile measured reality with data from manufacturer’s docs (e.g., integer divide cost)
A Simple Example
COMP 506, Rice University 6
Naïve Schedule
start  end
  1     3   loadAI  rARP, @a ⇒ r1
  4     4   add     r1, r1   ⇒ r1
  5     7   loadAI  rARP, @b ⇒ r2
  8     9   mult    r1, r2   ⇒ r1
 10    12   loadAI  rARP, @c ⇒ r2
 13    14   mult    r1, r2   ⇒ r1
 15    17   loadAI  rARP, @d ⇒ r2
 18    19   mult    r1, r2   ⇒ r1
 20    22   storeAI r1       ⇒ rARP, @a

Op              Latency
loads & stores  3 cycles
multiplies      2 cycles
all others      1 cycle
• Schedule for a single functional unit
• Naïve schedule takes ops in treewalk order
a ← a * 2 * b * c * d
A Simple Example
COMP 506, Rice University 7
Schedule Loads Early
start  end
  1     3   loadAI  rARP, @a ⇒ r1
  2     4   loadAI  rARP, @b ⇒ r2
  3     5   loadAI  rARP, @c ⇒ r3
  4     4   add     r1, r1   ⇒ r1
  5     6   mult    r1, r2   ⇒ r1
  6     8   loadAI  rARP, @d ⇒ r2
  7     8   mult    r1, r3   ⇒ r1
  9    10   mult    r1, r2   ⇒ r1
 11    13   storeAI r1       ⇒ rARP, @a
• Scheduled for a single functional unit
• Naïve schedule takes ops in treewalk order
• Scheduled code overlaps latencies in the load operations
• Shows the potential of non-blocking operations
A Simple Example
COMP 506, Rice University 8
• Original code reused r2, which constrained the scheduler
• New schedule uses an extra register
• We assume, wlog, that scheduling precedes allocation
  → scheduler can increase the use of virtual registers
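The cycle counts in the two tables can be checked mechanically. Here is a small sketch (the op labels la, lb, lc, ld, m1–m3, add, st are my own, not from the lecture) that computes the length of the blocking, naïve schedule and of an overlapped schedule on a single-issue unit:

```python
# Latencies from the legend: loads & stores 3, multiplies 2, all others 1.
lat = {"la": 3, "lb": 3, "lc": 3, "ld": 3, "st": 3,
       "m1": 2, "m2": 2, "m3": 2, "add": 1}
# deps[op] = ops whose results op uses (renamed registers, so no reuse)
deps = {"la": [], "lb": [], "lc": [], "ld": [], "add": ["la"],
        "m1": ["add", "lb"], "m2": ["m1", "lc"], "m3": ["m2", "ld"],
        "st": ["m3"]}

naive = ["la", "add", "lb", "m1", "lc", "m2", "ld", "m3", "st"]
early = ["la", "lb", "lc", "add", "m1", "ld", "m2", "m3", "st"]

def serial_length(order):
    # blocking issue: each op waits for the previous op to complete
    return sum(lat[op] for op in order)

def overlapped_length(order):
    # non-blocking, single issue per cycle: an op issues at the first free
    # slot at which all of its operands' results are available
    issue, slot = {}, 0
    for op in order:
        ready = max((issue[d] + lat[d] for d in deps[op]), default=1)
        slot = max(slot + 1, ready)
        issue[op] = slot
    return max(issue[o] + lat[o] - 1 for o in order)
```

With these latencies, the naïve order takes 22 cycles serially, while issuing the loads early and overlapping their latencies finishes in 13 cycles, matching the tables above.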
Instruction Scheduling (Operational View)
1. Rename registers into values to eliminate artificial constraints
• New name at each definition, similar to the names used in LVN and SSA
• In practice, can only rename "local" names — not LIVE on exit
COMP 506, Rice University 9
The Original Code
a: loadAI rARP, @a ⇒ r1
b: add r1, r1 ⇒ r1
c: loadAI rARP, @b ⇒ r2
d: mult r1, r2 ⇒ r1
e: loadAI rARP, @c ⇒ r2
f: mult r1, r2 ⇒ r1
g: loadAI rARP, @d ⇒ r2
h: mult r1, r2 ⇒ r1
i: storeAI r1 ⇒ r0,@a
Instruction Scheduling (Operational View)
COMP 506, Rice University 10
After Renaming
a: loadAI  rARP, @a ⇒ r1
b: add     r1, r1   ⇒ r2
c: loadAI  rARP, @b ⇒ r3
d: mult    r2, r3   ⇒ r4
e: loadAI  rARP, @c ⇒ r5
f: mult    r4, r5   ⇒ r6
g: loadAI  rARP, @d ⇒ r7
h: mult    r6, r7   ⇒ r8
i: storeAI r8       ⇒ r0, @a
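The renaming step can be sketched as a single forward pass, much like the name handling in LVN: rewrite each use through a map, then give each definition a fresh name. The tuple encoding of operations below is my own simplification; a store carries no destination register:

```python
def rename(block):
    current, out, fresh = {}, [], 0   # current: register -> newest value name
    for label, opcode, srcs, dst in block:
        new_srcs = [current.get(s, s) for s in srcs]   # rewrite the uses
        if dst is None:                                # e.g. a store
            out.append((label, opcode, new_srcs, None))
            continue
        fresh += 1
        current[dst] = f"r{fresh}"                     # fresh name per definition
        out.append((label, opcode, new_srcs, current[dst]))
    return out

original = [
    ("a", "loadAI",  ["rARP"], "r1"),
    ("b", "add",     ["r1", "r1"], "r1"),
    ("c", "loadAI",  ["rARP"], "r2"),
    ("d", "mult",    ["r1", "r2"], "r1"),
    ("e", "loadAI",  ["rARP"], "r2"),
    ("f", "mult",    ["r1", "r2"], "r1"),
    ("g", "loadAI",  ["rARP"], "r2"),
    ("h", "mult",    ["r1", "r2"], "r1"),
    ("i", "storeAI", ["r1"], None),
]
renamed = rename(original)
```

Running this on the original block reproduces the renamed code above: each of a–h defines r1 through r8 in turn, and the store reads r8.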
2. Build a dependence graph, D, to capture the flow of values in the code
• Nodes n ∈ D represent operations, with a type(n) and a delay(n)
• An edge e = (n1, n2) is in D iff n1 uses the result of n2 (edges run from use to def)
Instruction Scheduling (Operational View)
COMP 506, Rice University 11
The Dependence Graph

[Figure: dependence graph for the code; the edges are true dependences or
flow dependences: b→a, d→b, d→c, f→d, f→e, h→f, h→g, i→h]

Some authors prefer the term "precedence graph" over "dependence graph."
In the context of scheduling, they are interchangeable.
2. Build a dependence graph to capture the flow of values in the code
• The final schedule must honor each dependence in the input code
  ♦ Values computed before they are used & used before they are overwritten
• Renaming lets the scheduler ignore some anti-dependences
Instruction Scheduling (Operational View)
COMP 506, Rice University 12
The Dependence Graph

[Figure: the same graph, with the anti-dependence edges that arise from the
reuse of r1 and r2 added]
Instruction Scheduling (Operational View)
COMP 506, Rice University 13
In the renamed code, the anti-dependences are gone.
The Scheduler Needs Some Additional Edges
To ensure the correct flow of values, the scheduler needs edges that specify the relative ordering of loads, stores, and outputs
COMP 506, Rice University 14
First op         Second op        Same address   Distinct addresses   Unknown address(es)   Acronym
load             load             (no conflict after renaming)                               —
load or output   store            conflict       no conflict          conflict              WAR
store            store            conflict       no conflict          conflict              WAW
store            load or output   conflict       no conflict          conflict              RAW
output           output           (need an edge to serialize the outputs)                    —

"conflict" implies that the 1st op must complete before the 2nd op issues
⇒ an edge in the dependence graph

Computer architects also worry about these conflicts. The acronyms arise in
discussions about memory system design.
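The table translates directly into a predicate. The function name and its addr argument (one of 'same', 'distinct', 'unknown') are hypothetical names for illustration:

```python
def needs_edge(first, second, addr):
    """Does the (first, second) pair need a serialization edge?
    first and second are 'load', 'store', or 'output';
    addr classifies the address relation: 'same', 'distinct', 'unknown'."""
    if first == "load" and second == "load":
        return False                  # no conflict (after renaming)
    if first == "output" and second == "output":
        return True                   # always serialize the outputs
    if "store" in (first, second):
        return addr != "distinct"     # WAR, WAW, or RAW unless provably distinct
    return False                      # load/output pairs only read memory
```

Note how "unknown" falls on the conservative side: without proof that the addresses are distinct, the scheduler must add the edge.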
Dependence relations among memory operations are more complex
COMP 412, Fall 2017 15

Original Code
1: loadI 8       => vr3
2: loadI 12      => vr4
3: add vr3, vr4  => vr0
4: load vr0      => vr1
5: load vr3      => vr2
6: store vr1     => vr0
7: output 12

[Figure: dependence graph for the code above. Each node carries a priority
tuple: 1:<12,12,4>, 2:<12,12,4>, 3:<11,11,3>, 4:<10,10,2>, 5:<10,10,2>,
6:<5,5,1>, 7:<0,0,0>. Data edges are labeled with the register and latency
they carry; I/O edges (latencies 1 and 5) serialize the memory operations.
The annotation "What is this edge?" points at an I/O serialization edge.]

Scheduled Code (2 Functional Units)
loadI 12 => r4      loadI 8 => r3
load  r3 => r2      add r3, r4 => r0
load  r0 => r1      nop
nop                 nop
nop                 nop
nop                 nop
nop                 nop
store r1 => r0      nop
nop                 nop
nop                 nop
nop                 nop
nop                 nop
output 12           nop
Instruction Scheduling (Operational View)
2. Build a dependence graph to capture the flow of values in the code
COMP 506, Rice University 16
Building the Graph

Create an empty map, M
Walk the block, top to bottom
  at each operation o that defines VRi:
    create a node for o (1)
    set M(VRi) to the node o
    for each VRj used in o, add an edge from node o to the node in M(VRj)
    if o is a load, store, or output operation, add edges to ensure
      serialization of memory ops (2)

Complexity: O(n + m²), where m is |memory ops|
(Reminds me of LVN)

Explanatory Notes
1. 'o' refers to both the operation in the original block and the node that
   represents it in the graph. The meaning should be clear by context.
2. I/O operations need additional edges for synchronization. At a minimum:
   • load & output need an edge to the most recent store (anti-dependence)
   • output needs an edge to the most recent output (serialization)
   • store needs an edge to the most recent store, as well as to each
     previous load & output (serialization)
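The walk above can be sketched directly in code. The tuple encoding and opcode names are simplifications of mine; the serialization edges follow the I/O rules in the notes above, applied conservatively (no address information):

```python
def build_graph(block):
    # block: list of (name, opcode, srcs, dst) with renamed (unique) registers
    M, edges = {}, set()              # M: value name -> defining node
    last_store, last_output = None, None
    prior_loads, prior_outputs = [], []
    for name, opcode, srcs, dst in block:
        for s in srcs:
            if s in M:
                edges.add((name, M[s]))        # edge runs from use to def
        if opcode == "load":
            if last_store:
                edges.add((name, last_store))  # load after most recent store
            prior_loads.append(name)
        elif opcode == "output":
            if last_store:
                edges.add((name, last_store))
            if last_output:
                edges.add((name, last_output)) # serialize the outputs
            prior_outputs.append(name)
        elif opcode == "store":
            if last_store:
                edges.add((name, last_store))  # serialize the stores
            for p in prior_loads + prior_outputs:
                edges.add((name, p))           # store after prior loads/outputs
            last_store = name
        if dst:
            M[dst] = name
    return edges

renamed = [
    ("a", "load",  ["rARP"], "r1"),
    ("b", "add",   ["r1", "r1"], "r2"),
    ("c", "load",  ["rARP"], "r3"),
    ("d", "mult",  ["r2", "r3"], "r4"),
    ("e", "load",  ["rARP"], "r5"),
    ("f", "mult",  ["r4", "r5"], "r6"),
    ("g", "load",  ["rARP"], "r7"),
    ("h", "mult",  ["r6", "r7"], "r8"),
    ("i", "store", ["r8"], None),
]
g = build_graph(renamed)
```

On the renamed example this yields the data edges (b,a), (d,b), (d,c), (f,d), (f,e), (h,f), (h,g), (i,h), plus serialization edges from the store i to each earlier load.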
Instruction Scheduling (Operational View)
3. Compute a priority function on the nodes of the graph
• The priority function should reflect the importance of the operation to the
  final schedule (whatever that means)
  ♦ Latency-weighted distance to a root is a classic priority function
COMP 506, Rice University 17
The Dependence Graph

[Figure: the dependence graph for the renamed code, with a, c, e, and g
marked as leaves and i marked as the root]
Instruction Scheduling (Operational View)
COMP 506, Rice University 18

The Dependence Graph

[Figure: dependence graph annotated with latency-weighted distance to the
root: a = 13, b = 10, c = 12, d = 9, e = 10, f = 7, g = 8, h = 5, i = 3]
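Latency-weighted distance to the root can be computed by a memoized walk toward the root. The tables below restate the example; since edges run from use to def, a node's next step toward the root is any op that uses its result:

```python
# Delays from the legend: loads & stores 3, multiplies 2, all others 1.
delay = {"a": 3, "b": 1, "c": 3, "d": 2, "e": 3, "f": 2, "g": 3, "h": 2, "i": 3}
edges = [("b", "a"), ("d", "b"), ("d", "c"), ("f", "d"), ("f", "e"),
         ("h", "f"), ("h", "g"), ("i", "h")]          # (use, def) pairs

users = {}
for u, d in edges:
    users.setdefault(d, []).append(u)                 # def -> its users

def prio(n, memo):
    # delay(n) plus the heaviest path through any user of n's result;
    # the root has no users, so its priority is just its own delay
    if n not in memo:
        memo[n] = delay[n] + max((prio(u, memo) for u in users.get(n, [])),
                                 default=0)
    return memo[n]

memo = {}
p = {n: prio(n, memo) for n in delay}
```

The result matches the annotated graph: a = 13, c = 12, e = 10, g = 8, and so on down to i = 3.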
Instruction Scheduling (Operational View)
4. Schedule the operations to reflect dependences and priorities
COMP 506, Rice University 19
Cycle ← 1
Ready ← leaves of D
Active ← Ø

while (Ready ∪ Active ≠ Ø)
    for each functional unit, f, do
        if ∃ 1 or more ops in Ready for f then
            remove the highest priority op for f from the Ready queue
            S(op) ← Cycle
            Active ← Active ∪ {op}
    Cycle ← Cycle + 1
    for each op in Active
        if (S(op) + delay(op) ≤ Cycle) then      // op has completed execution
            remove op from Active
            for each successor s of op in D
                if (s is ready) then             // its operands are available
                    Ready ← Ready ∪ {s}

Notes
• Remove ops from Ready in priority order, breaking ties
• Implement Ready as a priority queue
• Need a sliding window to see the constrained operations
• This is a greedy, heuristic, list-scheduling algorithm
• If we add edges to synchronize loads and stores, the "is ready" test must
  also check successors along those edges
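A single-functional-unit sketch of this algorithm in Python, using the example's delays and latency-weighted priorities, reproduces the schedule that the next slides trace by hand:

```python
import heapq

# Delays and latency-weighted priorities for the running example a..i;
# deps[n] lists the ops whose results n uses (edges run from use to def).
delay = {"a": 3, "b": 1, "c": 3, "d": 2, "e": 3, "f": 2, "g": 3, "h": 2, "i": 3}
prio  = {"a": 13, "b": 10, "c": 12, "d": 9, "e": 10, "f": 7, "g": 8, "h": 5, "i": 3}
deps  = {"a": [], "b": ["a"], "c": [], "d": ["b", "c"], "e": [],
         "f": ["d", "e"], "g": [], "h": ["f", "g"], "i": ["h"]}

def list_schedule(delay, prio, deps):
    succs = {n: [] for n in deps}
    for n, ds in deps.items():
        for d in ds:
            succs[d].append(n)
    waiting = {n: len(ds) for n, ds in deps.items()}  # unsatisfied deps
    ready = [(-prio[n], n) for n, ds in deps.items() if not ds]  # the leaves
    heapq.heapify(ready)                              # highest priority first
    active, finish, sched, cycle = [], {}, [], 1
    while ready or active:
        if ready:                 # one functional unit: one issue per cycle
            _, op = heapq.heappop(ready)
            finish[op] = cycle + delay[op] - 1
            active.append(op)
            sched.append((cycle, op))
        else:
            sched.append((cycle, "nop"))
        cycle += 1
        for op in [o for o in active if finish[o] < cycle]:
            active.remove(op)     # op has completed execution
            for s in succs[op]:
                waiting[s] -= 1
                if waiting[s] == 0:
                    heapq.heappush(ready, (-prio[s], s))
    while sched and sched[-1][1] == "nop":
        sched.pop()               # drop nops issued after the last real op
    return sched, max(finish.values())

sched, makespan = list_schedule(delay, prio, deps)
```

It issues a, c, e, b, d, g, f, nop, h, nop, i in slots 1 through 11 and finishes in 13 cycles, when the final store completes.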
Instruction Scheduling (Operational View)
4. Schedule the operations to reflect dependences and priorities
• Use a greedy heuristic technique, such as list scheduling
COMP 506, Rice University 20

The Schedule
Schedule: (slots 1 through 11, all empty)
Ready:  <a,13>, <c,12>, <e,10>, <g,8>
Active: Ø

Start with the operations that are ready to schedule (the leaves)
— remove completed ops from Active and visit the ops that depend on them;
  if they are ready, move them onto the Ready queue
— take the highest priority op in Ready & schedule it into the next slot
— place the newly scheduled op in Active
Repeat until done.
Instruction Scheduling (Operational View)
COMP 506, Rice University 21

Schedule: 1 a
Ready:  <c,12>, <e,10>, <g,8>
Active: <a,#4>
Instruction Scheduling (Operational View)
COMP 506, Rice University 22

Schedule: 1 a, 2 c
Ready:  <e,10>, <g,8>
Active: <a,#4>, <c,#5>
Instruction Scheduling (Operational View)
COMP 506, Rice University 23

Schedule: 1 a, 2 c, 3 e
Ready:  <g,8>, <b,10>
Active: <a,#4>, <c,#5>, <e,#6>
Instruction Scheduling (Operational View)
COMP 506, Rice University 25

Schedule: 1 a, 2 c, 3 e, 4 b
Ready:  <g,8>
Active: <c,#5>, <e,#6>, <b,#5>
Instruction Scheduling (Operational View)
COMP 506, Rice University 26

Schedule: 1 a, 2 c, 3 e, 4 b
Ready:  <g,8>, <d,9>
Active: <c,#5>, <e,#6>, <b,#5>
Instruction Scheduling (Operational View)
COMP 506, Rice University 27

Schedule: 1 a, 2 c, 3 e, 4 b, 5 d
Ready:  <g,8>
Active: <e,#6>, <d,#7>
Instruction Scheduling (Operational View)
COMP 506, Rice University 29

Schedule: 1 a, 2 c, 3 e, 4 b, 5 d, 6 g
Ready:  Ø
Active: <g,#9>, <d,#7>
Instruction Scheduling (Operational View)
COMP 506, Rice University 30

Schedule: 1 a, 2 c, 3 e, 4 b, 5 d, 6 g
Ready:  <f,7>
Active: <g,#9>, <d,#7>
Instruction Scheduling (Operational View)
COMP 506, Rice University 31

Schedule: 1 a, 2 c, 3 e, 4 b, 5 d, 6 g, 7 f
Ready:  Ø
Active: <g,#9>, <f,#9>
Instruction Scheduling (Operational View)
COMP 506, Rice University 33

Schedule: 1 a, 2 c, 3 e, 4 b, 5 d, 6 g, 7 f, 8 nop
Ready:  Ø  (nothing ready)
Active: <g,#9>, <f,#9>
Instruction Scheduling (Operational View)
COMP 506, Rice University 34

Schedule: 1 a, 2 c, 3 e, 4 b, 5 d, 6 g, 7 f, 8 nop
Ready:  <h,5>
Active: <g,#9>, <f,#9>
Instruction Scheduling (Operational View)
COMP 506, Rice University 35

Schedule: 1 a, 2 c, 3 e, 4 b, 5 d, 6 g, 7 f, 8 nop, 9 h
Ready:  Ø
Active: <h,#11>
Instruction Scheduling (Operational View)
COMP 506, Rice University 37

Schedule: 1 a, 2 c, 3 e, 4 b, 5 d, 6 g, 7 f, 8 nop, 9 h, 10 nop
Ready:  Ø  (nothing ready)
Active: <h,#11>
Instruction Scheduling (Operational View)
COMP 506, Rice University 38

Schedule: 1 a, 2 c, 3 e, 4 b, 5 d, 6 g, 7 f, 8 nop, 9 h, 10 nop
Ready:  <i,3>
Active: <h,#11>
Instruction Scheduling (Operational View)
4. Schedule the operations to reflect dependences and priorities
• Use a greedy heuristic technique, such as list scheduling
COMP 506, Rice University 40

Schedule: 1 a, 2 c, 3 e, 4 b, 5 d, 6 g, 7 f, 8 nop, 9 h, 10 nop, 11 i
Ready:  Ø
Active: Ø

The Ready queue and the Active queue are empty, at last … so, it halts.
Same schedule as on slide 5.
Improvements
COMP 506, Rice University 41
Priorities
The problem is NP-Complete
• No deterministic technique wins
• Different priority functions can have different effects
  ♦ Latency-weighted depth
  ♦ Most descendants in the graph
  ♦ Breadth-first numbering
  ♦ Depth-first numbering
  ♦ Boost the priority of any op that can only run on one of the units
  ♦ Boost the priority of loads over stores

Cannot make up your mind?
• Try a linear combination of two priority functions (α × prio1 + β × prio2)
• Can tune α and β — restrict α + β = 1 and perform a parameter sweep over
  values of α and β
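The linear combination is easy to realize: blend two priority tables and sweep α. The tables lw and desc below are made-up illustrations (they are not the example's real values), and each α setting would require rerunning the scheduler to see which schedule is shortest:

```python
def blend(p1, p2, alpha):
    # combined priority: alpha * prio1 + (1 - alpha) * prio2, so beta = 1 - alpha
    return {n: alpha * p1[n] + (1 - alpha) * p2[n] for n in p1}

lw   = {"a": 13, "b": 10, "c": 12}    # e.g. latency-weighted depth (illustrative)
desc = {"a": 5,  "b": 3,  "c": 1}     # e.g. number of descendants (illustrative)

for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):  # parameter sweep with alpha + beta = 1
    p = blend(lw, desc, alpha)
    # ... rerun the list scheduler with priority p; keep the best schedule
```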
Improvements
COMP 506, Rice University 42

Tie breaking
The problem is NP-Complete
• No deterministic technique wins
• When 2 or more ops have the same priority, use another criterion to break
  the tie
  ♦ Any of the suggested priority functions might work
  ♦ Prefer multi-cycle operations
  ♦ Prefer more successors
  ♦ Balance progress on all paths
  ♦ Prefer operations on critical paths
• Can also try random tie breaking, although you need to run the scheduler
  multiple times & keep the best schedule to see good results
Classic reference is “Efficient Instruction Scheduling for a Pipelined Architecture,” P.B. Gibbons and S.S. Muchnick, ACM SIGPLAN 86 Conference on Compiler Construction.
More List Scheduling
List scheduling algorithms fall into two distinct classes

The algorithm presented in this lecture is a forward scheduling algorithm
• Schedules the first cycle in the block, then the second, then the third, …
• A backward scheduling algorithm starts with the last cycle, then schedules
  the second to last, then the third to last, ...
• In a backward scheduler, an op only comes off of the Active queue onto the
  Ready queue when all of its successors have been scheduled and its own
  latency has been covered
COMP 506, Rice University 43
Forward list scheduling
• Start with available operations
• Work forward in time
• Ready ⇒ all operands available

Backward list scheduling
• Start with no successors
• Work backward in time
• Ready ⇒ latency covers uses
Again, classic reference is “Efficient Instruction Scheduling for a Pipelined Architecture,” P.B. Gibbons and S.S. Muchnick, ACM SIGPLAN 86 Conference on Compiler Construction.
More List Scheduling
Forward and backward scheduling can produce different results
COMP 506, Rice University 44
Block from SPEC benchmark “go”
Operation   load   loadI   add   addI   store   cmp
Latency       1      1      2      1      4      1

[Figure: dependence graph for the block. cbr is the root; cmp and the five
stores store1–store5 feed the cbr; add1–add4 and addI feed the stores;
loadI1, lshift, loadI2, loadI3, and loadI4 are the leaves. Each node is
annotated with its latency to the cbr: cbr = 1, cmp = 2, stores = 5,
adds = 7, addI = 6, leaves = 8. Subscripts identify the distinct ops.]
More List Scheduling
COMP 506, Rice University 45

Using "latency to root" as the priority function
Assume 2 identical ALUs & 1 load/store unit

Forward Schedule
       Int       Int       Mem
  1  loadI1    lshift
  2  loadI2    loadI3
  3  loadI4    add1
  4  add2      add3
  5  add4      addI      store1
  6  cmp                 store2
  7                      store3
  8                      store4
  9                      store5
 10
 11
 12
 13  cbr

Backward Schedule
       Int       Int       Mem
  1  loadI4
  2  addI      lshift
  3  add4      loadI3
  4  add3      loadI2    store5
  5  add2      loadI1    store4
  6  add1                store3
  7                      store2
  8                      store1
  9
 10
 11  cmp
 12  cbr

One fewer cycle in the backward schedule