
Optimizing Compilers CISC 673 Spring 2009 Instruction Scheduling

Date post: 06-Jan-2016
Optimizing Compilers, CISC 673, Spring 2009: Instruction Scheduling. John Cavazos, University of Delaware. Instruction scheduling reorders instructions to improve performance, takes anticipated latencies into account, is machine-specific, and is performed late in the optimization pass.
Transcript
Page 1: Optimizing Compilers CISC 673 Spring 2009 Instruction Scheduling

UNIVERSITY OF DELAWARE • COMPUTER & INFORMATION SCIENCES DEPARTMENT

Optimizing Compilers
CISC 673
Spring 2009
Instruction Scheduling

John Cavazos
University of Delaware


Instruction Scheduling

Reordering instructions to improve performance
Takes into account anticipated latencies
  Machine-specific
Performed late in optimization pass
Instruction-Level Parallelism (ILP)


Modern Architecture Features

Superscalar
  Multiple logic units
Multiple issue
  2 or more instructions issued per cycle
Speculative execution
  Branch predictors
  Speculative loads
Deep pipelines


Types of Instruction Scheduling

Local Scheduling
  Basic Block Scheduling
Global Scheduling
  Trace Scheduling
  Superblock Scheduling
  Software Pipelining


Scheduling for Different Computer Architectures

Out-of-order issue
  Scheduling is useful
In-order issue
  Scheduling is very important
VLIW
  Scheduling is essential!


Challenges to ILP

Structural hazards
  Insufficient resources to exploit parallelism
Data hazards
  Instruction depends on result of previous instruction still in pipeline
Control hazards
  Branches & jumps modify PC
  Affect which instructions should be in pipeline


Recall from Architecture…

IF – Instruction Fetch
ID – Instruction Decode
EX – Execute
MA – Memory access
WB – Write back

IF    ID    EX    MA    WB
      IF    ID    EX    MA    WB
            IF    ID    EX    MA    WB


Structural Hazards

addf R3,R1,R2    IF    ID    EX    EX    MA    WB
addf R3,R3,R4          IF    ID    stall EX    EX    MA    WB

Assumes floating point ops take 2 execute cycles
Instruction latency: execute takes > 1 cycle


Data Hazards

lw  R1,0(R2)     IF    ID    EX    MA    WB
add R3,R1,R4           IF    ID    stall EX    MA    WB

Memory latency: data not ready


Control Hazards

Taken Branch         IF    ID    EX    MA    WB
Instr + 1                  IF    ---   ---   ---   ---
Branch Target                    IF    ID    EX    MA    WB
Branch Target + 1                      IF    ID    EX    MA    WB


Basic Block Scheduling

For each basic block:
  Construct directed acyclic graph (DAG) using dependences between statements
    Node = statement / instruction
    Edge (a,b) = statement a must execute before b
  Schedule instructions using the DAG


Data Dependences

If two operations access the same register and one access is a write, they are dependent

Types of data dependences:

RAW = Read after Write
  r1 = r2 + r3
  r4 = r1 * 6

WAW = Write after Write
  r1 = r2 + r3
  r1 = r4 * 6

WAR = Write after Read
  r1 = r2 + r3
  r2 = r5 * 6

Cannot reorder two dependent instructions
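
As a rough illustration of the three cases, the sketch below classifies the dependence between an earlier and a later instruction. The `Instr` tuple and the `dependence` function are hypothetical names introduced here, not part of the course material, and the function reports only the first dependence it finds (RAW is checked before WAW and WAR).

```python
from typing import NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    dest: str               # register written
    srcs: Tuple[str, ...]   # registers read

def dependence(first: Instr, second: Instr) -> Optional[str]:
    """Classify how `second` depends on the earlier instruction `first`."""
    if first.dest in second.srcs:
        return "RAW"        # second reads what first wrote
    if first.dest == second.dest:
        return "WAW"        # both write the same register
    if second.dest in first.srcs:
        return "WAR"        # second overwrites a register first reads
    return None             # independent: the pair may be reordered

i1 = Instr("r1", ("r2", "r3"))               # r1 = r2 + r3
print(dependence(i1, Instr("r4", ("r1",))))  # r4 = r1 * 6
print(dependence(i1, Instr("r1", ("r4",))))  # r1 = r4 * 6
print(dependence(i1, Instr("r2", ("r5",))))  # r2 = r5 * 6
```

The three calls correspond to the three instruction pairs on the slide.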


Basic Block Scheduling Example

Original Schedule
  a) lw R2, (R1)
  b) lw R3, (R1) 4
  c) R4 ← R2 + R3
  d) R5 ← R2 - 1

Dependence DAG
  a → c (2), b → c (2), a → d (2)

Schedule 1 (5 cycles)
  a) lw R2, (R1)
  b) lw R3, (R1) 4
  --- nop -----
  c) R4 ← R2 + R3
  d) R5 ← R2 - 1

Schedule 2 (4 cycles)
  a) lw R2, (R1)
  b) lw R3, (R1) 4
  d) R5 ← R2 - 1
  c) R4 ← R2 + R3
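
The cycle counts of the two schedules can be reproduced with a small sketch. It assumes a single-issue, in-order machine in which a consumer may issue `latency` cycles after its producer; the function name `cycles_needed` and the dictionary layout are illustrative choices, and the latency of 1 for `c` and `d` is an assumption (neither has a consumer here, so it does not affect the result).

```python
def cycles_needed(order, latency, preds):
    """Issue slots used when instructions issue in `order`, stalling on operands."""
    issued = {}                         # instruction -> cycle in which it issued
    cycle = 0
    for ins in order:
        cycle += 1                      # next free issue slot
        for p in preds.get(ins, ()):
            # Wait until every producer's result is available.
            cycle = max(cycle, issued[p] + latency[p])
        issued[ins] = cycle
    return cycle

latency = {"a": 2, "b": 2, "c": 1, "d": 1}   # loads take 2 cycles (DAG edge labels)
preds = {"c": ["a", "b"], "d": ["a"]}        # DAG edges from the slide

print(cycles_needed("abcd", latency, preds))  # original order, one stall before c
print(cycles_needed("abdc", latency, preds))  # d fills the load delay
```

Moving `d` into the slot where the original order stalls is what saves the cycle.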


Scheduling Algorithm

Construct dependence DAG on basic block
Put roots in candidate set
Use scheduling heuristics (in order) to select instruction
While candidate set not empty
  Evaluate all candidates and select best one
  Delete scheduled instruction from candidate set
  Add newly-exposed candidates
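
The loop above can be sketched as follows. This is a minimal list scheduler for a single-issue machine, not the course's reference implementation; `list_schedule`, the `priority` table, and the alphabetical tie-break are assumptions made for illustration. The example reuses the four-instruction block from the basic block scheduling example, with the number of successors as the priority.

```python
def list_schedule(succs, latency, priority):
    """Greedy list scheduling; returns {instruction: issue cycle}."""
    npreds = {n: 0 for n in succs}
    for n in succs:
        for m in succs[n]:
            npreds[m] += 1
    earliest = {n: 1 for n in succs}                      # first cycle operands are ready
    candidates = {n for n in succs if npreds[n] == 0}     # roots of the DAG
    schedule = {}
    cycle = 1
    while candidates:
        ready = [n for n in candidates if earliest[n] <= cycle]
        if not ready:
            cycle += 1                                    # nothing can issue: nop
            continue
        # Highest priority first; ties broken by name for determinism.
        best = min(ready, key=lambda n: (-priority[n], n))
        schedule[best] = cycle
        candidates.remove(best)
        for m in succs[best]:
            earliest[m] = max(earliest[m], cycle + latency[best])
            npreds[m] -= 1
            if npreds[m] == 0:                            # newly exposed candidate
                candidates.add(m)
        cycle += 1
    return schedule

succs = {"a": ["c", "d"], "b": ["c"], "c": [], "d": []}
latency = {"a": 2, "b": 2, "c": 1, "d": 1}
priority = {n: len(s) for n, s in succs.items()}          # "many successors" heuristic
print(list_schedule(succs, latency, priority))
```

With these inputs the scheduler issues a, b, d, c in consecutive cycles, which is the 4-cycle schedule from the earlier example.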


Instruction Scheduling Heuristics

NP-complete = we need heuristics
Bias scheduler to prefer instructions:
  Earliest execution time
  Have many successors
    More flexibility in scheduling
  Progress along critical path
  Free registers
    Reduce register pressure
Can be a combination of heuristics


Computing Priorities

Height(n) = exec(n)                      if n is a leaf
          = max(height(m)) + exec(n)     for m, where m is a successor of n

Critical path(s) = path through the dependence DAG with longest latency
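
The recurrence translates almost directly into code. The sketch below is one possible rendering; the function name `heights` and the small four-node DAG (two 2-cycle loads feeding two 1-cycle ALU operations) are assumptions chosen only to keep the example short.

```python
from functools import lru_cache

def heights(succs, exec_time):
    """Height of every node: own latency plus the tallest successor."""
    @lru_cache(maxsize=None)
    def height(n):
        below = [height(m) for m in succs[n]]
        return exec_time[n] + (max(below) if below else 0)   # a leaf has no successors
    return {n: height(n) for n in succs}

succs = {"a": ["c", "d"], "b": ["c"], "c": [], "d": []}
exec_time = {"a": 2, "b": 2, "c": 1, "d": 1}
print(heights(succs, exec_time))
```

A scheduler can then use the height as a priority, so that instructions on the longest latency chain are issued first.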


Example – Determine Height and CP

Code

a lw r1, w

b add r1,r1,r1

c lw r2,x

d mult r1,r1,r2

e lw r2,y

f mult r1,r1,r2

g lw r2,z

h mult r1,r1,r2

i sw r1, a

Assume (cycles to have result in register): memory instrs = 3, mult = 2, rest = 1

Critical path: _______

Dependence DAG (edge label = latency):
  a → b (3), b → d (1), c → d (3), d → f (2), e → f (3), f → h (2), g → h (3), h → i (2)


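Example

Dependence DAG annotated with heights:
  a = 13, b = 10, c = 12, d = 9, e = 10, f = 7, g = 8, h = 5, i = 3
Edges (label = latency):
  a → b (3), b → d (1), c → d (3), d → f (2), e → f (3), f → h (2), g → h (3), h → i (2)

Schedule (from start): ___ cycles

Code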

a lw r1, w

b add r1,r1,r1

c lw r2,x

d mult r1,r1,r2

e lw r2,y

f mult r1,r1,r2

g lw r2,z

h mult r1,r1,r2

i sw r1, a
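
The heights annotated on the DAG can be checked against the height recurrence, and the critical path read off by always following the tallest successor. The sketch below takes the edges, latencies, and heights as transcribed from the slides; the helper names `satisfies_recurrence` and `critical_path` are illustrative.

```python
# Dependence DAG, latencies, and annotated heights as read off the slides.
succs = {"a": ["b"], "b": ["d"], "c": ["d"], "d": ["f"], "e": ["f"],
         "f": ["h"], "g": ["h"], "h": ["i"], "i": []}
exec_time = {"a": 3, "b": 1, "c": 3, "d": 2, "e": 3, "f": 2, "g": 3, "h": 2, "i": 3}
height = {"a": 13, "b": 10, "c": 12, "d": 9, "e": 10, "f": 7, "g": 8, "h": 5, "i": 3}

def satisfies_recurrence(n):
    """True if height[n] equals exec(n) plus the tallest successor (0 for a leaf)."""
    below = [height[m] for m in succs[n]]
    return height[n] == exec_time[n] + max(below, default=0)

def critical_path(succs, height):
    """Start at the tallest node and keep following the tallest successor."""
    node = max(height, key=height.get)
    path = [node]
    while succs[node]:
        node = max(succs[node], key=height.get)
        path.append(node)
    return path

print(all(satisfies_recurrence(n) for n in succs))
print(critical_path(succs, height))
```

The path found this way starts at the root with the greatest height, so its length equals that height.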


Global Scheduling: Superblock

Definition:
  single trace of contiguous, frequently executed blocks
  a single entry and multiple exits
Formation algorithm:
  pick a trace of frequently executed basic blocks
  eliminate side entrances (tail duplication)
Scheduling and optimization:
  speculate operations in the superblock
  apply optimization to scope defined by superblock


Superblock Formation

Select a trace (CFG blocks with weights):
  A 100, B 90, E 90, C 10, D 0, F 100

Tail duplicate (CFG blocks with weights):
  A 100, B 90, E 90, C 10, D 0, F 90, F' 10
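
The two steps can be sketched on a small control-flow graph. The edge list below is an assumption: the transcript preserves only the block weights, so the edges and their counts were chosen to be consistent with those weights. `select_trace` and `tail_duplicate` are hypothetical helper names, and the copied blocks' outgoing edge counts are left at 0 because this sketch does not redistribute profile weights.

```python
def select_trace(entry, edges):
    """Follow the most frequently taken successor edge starting at `entry`."""
    trace = [entry]
    while edges.get(trace[-1]):
        succs = edges[trace[-1]]
        hottest = max(succs, key=succs.get)
        if hottest in trace:              # do not loop back into the trace
            break
        trace.append(hottest)
    return trace

def tail_duplicate(trace, edges):
    """Redirect side entrances to copies of the trace's tail."""
    edges = {b: dict(s) for b, s in edges.items()}   # copy; leave the input untouched
    on_trace = set(trace)

    def side_preds(block):
        return [p for p, s in edges.items() if block in s and p not in on_trace]

    # First trace block (other than the entry) reachable from off the trace.
    first = next((i for i in range(1, len(trace)) if side_preds(trace[i])), None)
    if first is None:
        return edges
    tail = trace[first:]
    copy = {b: b + "'" for b in tail}
    for b in tail:                        # copies stay inside the duplicated tail
        edges[copy[b]] = {copy.get(s, s): 0 for s in edges[b]}
    for b in tail:                        # side entrances now target the copies
        for p in side_preds(b):
            edges[p][copy[b]] = edges[p].pop(b)
    return edges

# Assumed edges (source -> {target: count}), consistent with the block weights.
edges = {"A": {"B": 90, "C": 10}, "B": {"E": 90, "D": 0},
         "C": {"F": 10}, "D": {"F": 0}, "E": {"F": 90}, "F": {}}
trace = select_trace("A", edges)
print(trace)
print(tail_duplicate(trace, edges))
```

Under these assumed edges, F keeps only its entry from E, while C and D enter the copy F', matching the weights F 90 and F' 10 after duplication.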


Optimizations within Superblock

By limiting the scope of optimization to superblock:
  optimize for the frequent path
  may enable optimizations that are not feasible otherwise (CSE, loop invariant code motion, ...)

For example: CSE

trace selection
  r1 = r2*3
  r2 = r2 +1
  r3 = r2*3

tail duplication
  r1 = r2*3
  r2 = r2 +1
  r3 = r2*3      r3 = r2*3

CSE within superblock (no merge since single entry)
  r1 = r2*3
  r2 = r2 +1
  r3 = r1        r3 = r2*3


Scheduling Algorithm Complexity

Time complexity: O(n^2)
  n = max number of instructions in basic block
Building dependence DAG: worst-case O(n^2)
  Each instruction must be compared to every other instruction
Scheduling then requires each instruction be inspected at each step = O(n^2)
Average-case: small constant (e.g., 3)
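
The quadratic bound for DAG construction comes from comparing every instruction with every earlier one. The sketch below makes that count explicit; `build_dag` and the `(dest, srcs)` encoding are illustrative assumptions, and the instructions are the four from the basic block scheduling example.

```python
def build_dag(instrs):
    """instrs: list of (dest, srcs). Returns (edges, number of pairwise comparisons)."""
    edges = set()
    comparisons = 0
    for j in range(len(instrs)):
        for i in range(j):                     # every earlier instruction
            comparisons += 1
            dest_i, srcs_i = instrs[i]
            dest_j, srcs_j = instrs[j]
            # RAW, WAW, or WAR between instruction i and the later instruction j.
            if dest_i in srcs_j or dest_i == dest_j or dest_j in srcs_i:
                edges.add((i, j))
    return edges, comparisons

block = [("R2", ("R1",)),          # a) lw R2, (R1)
         ("R3", ("R1",)),          # b) lw R3, (R1) 4
         ("R4", ("R2", "R3")),     # c) R4 <- R2 + R3
         ("R5", ("R2",))]          # d) R5 <- R2 - 1
edges, comparisons = build_dag(block)
print(sorted(edges), comparisons)
```

For n instructions the inner loop runs n(n-1)/2 times, which is where the worst-case O(n^2) comes from.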


Very Long Instruction Word (VLIW)

Compiler determines exactly what is issued every cycle (before the program is run)
Schedules also account for latencies
All hardware changes result in a compiler change
Usually embedded systems (hence simple HW)
Itanium is actually an EPIC-style machine (accounts for most parallelism, not latencies)


Sample VLIW code

Add/Sub      Add/Sub      Mul/Div      Ld/St        Branch
c = a + b    d = a - b    e = a * b    ld j = [x]   nop
g = c + d    h = c - d    nop          ld k = [y]   nop
nop          nop          i = j * c    ld f = [z]   br g

VLIW processor: 5 issue
  2 Add/Sub units (1 cycle)
  1 Mul/Div unit (2 cycle, unpipelined)
  1 LD/ST unit (2 cycle, pipelined)
  1 Branch unit (no delay slots)
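
Because a VLIW compiler is responsible for latencies, a schedule like the one above can be checked mechanically. The sketch below is a hypothetical checker, not part of the lecture: it verifies that each operand is ready when read and that the unpipelined Mul/Div unit is free when used. Treating `br g` as reading `g`, and `a`, `b`, `x`, `y`, `z` as live on entry, are assumptions.

```python
# Slot layout and latencies from the slide; the checker itself is an illustrative sketch.
UNITS = ["addsub", "addsub", "muldiv", "ldst", "branch"]
LATENCY = {"addsub": 1, "muldiv": 2, "ldst": 2}
UNPIPELINED = {"muldiv"}

def check(bundles, live_in):
    """Return a list of (cycle, slot, reason) for every violated constraint."""
    ready = {v: 1 for v in live_in}   # variable -> first cycle its value can be read
    busy_until = {}                   # slot -> last cycle an unpipelined unit is occupied
    problems = []
    for cycle, bundle in enumerate(bundles, start=1):
        for slot, op in enumerate(bundle):
            if op is None:            # nop
                continue
            dest, srcs = op
            unit = UNITS[slot]
            if busy_until.get(slot, 0) >= cycle:
                problems.append((cycle, slot, unit + " unit still busy"))
            for s in srcs:
                if ready.get(s, float("inf")) > cycle:
                    problems.append((cycle, slot, s + " not ready"))
            if unit in UNPIPELINED:
                busy_until[slot] = cycle + LATENCY[unit] - 1
            if dest is not None:      # branches write no register
                ready[dest] = cycle + LATENCY[unit]
    return problems

# Each op is (destination, sources); None is a nop.
bundles = [
    [("c", ("a", "b")), ("d", ("a", "b")), ("e", ("a", "b")), ("j", ("x",)), None],
    [("g", ("c", "d")), ("h", ("c", "d")), None, ("k", ("y",)), None],
    [None, None, ("i", ("j", "c")), ("f", ("z",)), (None, ("g",))],
]
live_in = {"a", "b", "x", "y", "z"}
print(check(bundles, live_in))        # an empty list means no violations were found
```

Moving `i = j * c` up into the second bundle would be reported twice: the multiplier is still occupied by `e = a * b`, and the load of `j` has not completed.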


Next Time

Phase-ordering

