
L25: Modern Compiler Design - University of Cambridge · 2013-10-17 · L25: Modern Compiler...

Date post: 20-Jun-2020

Introduction

L25: Modern Compiler Design


Course Aims

• Understand the performance characteristics of modern processors

• Be familiar with strategies for optimising dynamic dispatch for languages like JavaScript and Objective-C

• Have experience with algorithms for automatically taking advantage of SIMD, SIMT, and MIMD parallelism


Course Structure

• 8 Lectures

• 8 Supervised practical sessions

• Hands-on work with the LLVM compiler infrastructure


Assessment

• 4 short exercises
  • Simple pass / fail
  • Due: End of this term

• Longer assessed mini-project report
  • Up to 4,000 words
  • Due: Start of next term


LLVM

• Began as Chris Lattner’s Master’s project at UIUC in 2002, supervised by Vikram Adve

• Now used in many compilers
  • ARM / AMD / Intel / nVidia GPU shader compilers
  • C/C++ compilers for various platforms
  • Lots of domain-specific languages

• LLVM is written in C++. This course will not teach you C++!


Questions?


Modern Intermediate Representations (IR)

L25: Modern Compiler Design


Reusable IR

• Modern compilers are made from loosely coupled components

• Front ends produce IR

• Middle ‘ends’ transform IR (optimisations)

• Back ends generate native code


Structure of a Modern Compiler

Source Code → Tokeniser → Token Stream → Parser → Parser Actions → AST Builder → Intermediate Representation → Optimiser → Intermediate Representation → Code Generator → Executable Code

As with any other piece of software, using libraries simplifies development.


Optimisation Passes

• Modular, transform IR (Analysis passes just inspect IR)

• Can be run multiple times, in different orders

• May not always produce improvements in the wrong order!

• Some intentionally pessimise code to make later passes work better
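The effect of pass ordering can be sketched with two toy passes in Python. The tuple-based IR and the names `constant_fold` and `dead_code_eliminate` are made up for illustration and are not LLVM's API. The sketch assumes each name is assigned only once and that the only arithmetic operations are `add` and `mul`.

```python
# Toy IR: each instruction is (dest, op, lhs, rhs). Operands are ints
# (constants) or strings (names of earlier results). Hypothetical encoding.

def constant_fold(code):
    """Replace add/mul on known constants with 'const' instructions."""
    known = {}   # name -> constant value
    out = []
    for dest, op, lhs, rhs in code:
        # Substitute operands whose values are already known constants.
        lhs = known.get(lhs, lhs) if isinstance(lhs, str) else lhs
        rhs = known.get(rhs, rhs) if isinstance(rhs, str) else rhs
        if op == "const":
            known[dest] = lhs
        elif isinstance(lhs, int) and isinstance(rhs, int):
            value = lhs + rhs if op == "add" else lhs * rhs
            known[dest] = value
            op, lhs, rhs = "const", value, None
        out.append((dest, op, lhs, rhs))
    return out

def dead_code_eliminate(code, live_out):
    """Drop instructions whose results are never used (one backward sweep)."""
    live = set(live_out)
    kept = []
    for dest, op, lhs, rhs in reversed(code):
        if dest in live:
            kept.append((dest, op, lhs, rhs))
            live.update(x for x in (lhs, rhs) if isinstance(x, str))
    kept.reverse()
    return kept

program = [
    ("a", "const", 2, None),
    ("b", "const", 3, None),
    ("c", "add", "a", "b"),
    ("d", "mul", "c", "c"),
]

fold_then_dce = dead_code_eliminate(constant_fold(program), {"d"})
dce_then_fold = constant_fold(dead_code_eliminate(program, {"d"}))
```

Folding first leaves `a`, `b` and `c` unused, so the clean-up pass can delete them. Running the clean-up first finds nothing to delete, and the folded constants are left behind until it runs again.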


Register vs Stack IR

• Stack makes interpreting, naive compilation easier

• Register makes various optimisations easier

• Which ones?


Common Subexpression Elimination: Register IR

Source language:

    a = (b+c) * (b+c);

Register IR:

    r1 = load b
    r2 = load c
    r3 = r1 + r2
    r4 = load b
    r5 = load c
    r6 = r4 + r5
    r7 = r3 * r6
    store a r7


Common Subexpression Elimination: Register IR

Source language:

    a = (b+c) * (b+c);

Register IR:

    r1 = load b
    r2 = load c
    r3 = r1 + r2
    r4 = load b
    r5 = load c
    r6 = r1 + r5
    r7 = r3 * r6
    store a r7


Common Subexpression Elimination: Register IR

Source language:

    a = (b+c) * (b+c);

Register IR:

    r1 = load b
    r2 = load c
    r3 = r1 + r2
    r4 = load b
    r5 = load c
    r6 = r1 + r2
    r7 = r3 * r6
    store a r7


Common Subexpression Elimination: Register IR

Source language:

    a = (b+c) * (b+c);

Register IR:

    r1 = load b
    r2 = load c
    r3 = r1 + r2
    r4 = load b
    r5 = load c
    r6 = r1 + r2
    r7 = r3 * r3
    store a r7
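The register rewriting shown on these slides can be sketched as a single forward pass in Python. This is an illustrative local value-numbering scheme, not the algorithm of any particular compiler. The tuple encoding and the function name are assumptions, and it assumes nothing writes to memory between the duplicated loads.

```python
def eliminate_common_subexpressions(code):
    """Local CSE for straight-line register IR.

    Instructions are (dest, op, *operands). Two loads of the same variable
    are treated as the same value, which is only safe when no store
    intervenes (an assumption of this sketch).
    """
    available = {}   # (op, operands) -> register already holding that value
    alias = {}       # eliminated register -> register that replaces it
    out = []
    for dest, op, *operands in code:
        # Rewrite operands to refer to the surviving registers.
        operands = tuple(alias.get(x, x) for x in operands)
        if op == "store":
            out.append((dest, op, *operands))
            continue
        key = (op, operands)
        if key in available:
            alias[dest] = available[key]      # reuse the earlier result
        else:
            available[key] = dest
            out.append((dest, op, *operands))
    return out

code = [
    ("r1", "load", "b"),
    ("r2", "load", "c"),
    ("r3", "add", "r1", "r2"),
    ("r4", "load", "b"),
    ("r5", "load", "c"),
    ("r6", "add", "r4", "r5"),
    ("r7", "mul", "r3", "r6"),
    (None, "store", "a", "r7"),
]

optimised = eliminate_common_subexpressions(code)
```

Unlike the slides, which leave the now-unused `r4`, `r5` and `r6` in place, this sketch drops them as it goes. Because every value has a name, each instruction can be rewritten independently of its neighbours.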


Common Subexpression Elimination: Stack IR

Source language:

    a = (b+c) * (b+c);

Stack IR:

    load b
    load c
    add
    load b
    load c
    add
    mul
    store a


Common Subexpression Elimination: Stack IR

Source language:

    a = (b+c) * (b+c);

Stack IR:

    load b
    load c
    add
    dup
    mul
    store a
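To check that the `dup` version computes the same value, here is a minimal stack-machine interpreter in Python. The instruction set is just the five operations used on the slides, and the function name `run_stack_ir` is made up for this sketch.

```python
def run_stack_ir(code, memory):
    """Interpret a list of stack-IR instructions against a variable store."""
    stack = []
    for instr in code:
        op, *arg = instr.split()
        if op == "load":
            stack.append(memory[arg[0]])
        elif op == "store":
            memory[arg[0]] = stack.pop()
        elif op == "dup":
            stack.append(stack[-1])        # copy the top of the stack
        elif op == "add":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs)
        elif op == "mul":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs * rhs)
        else:
            raise ValueError(f"unknown instruction: {instr}")
    return memory

naive = ["load b", "load c", "add", "load b", "load c", "add", "mul", "store a"]
with_dup = ["load b", "load c", "add", "dup", "mul", "store a"]

naive_result = run_stack_ir(naive, {"b": 2, "c": 3})["a"]
dup_result = run_stack_ir(with_dup, {"b": 2, "c": 3})["a"]
```

The `dup` trick works here only because the second use of `b+c` comes immediately after the first. There is no name by which a later instruction could refer to the earlier result.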


Problems with CSE and Stack IR

• Entire operation must happen at once (no incremental algorithm)

• Finding identical subtrees is possible, reusing results is harder

• If the operations were not adjacent, must spill to temporary


Hierarchical vs Flat IR

• Source code is hierarchical (contains structured flow control, scoped values)

• Assembly is flat (all flow control is by jumps)

• Intermediate representations are supposed to be somewhere between the two
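The gap between the two ends can be illustrated by lowering a structured loop to labels and jumps. The following Python sketch uses a made-up miniature AST (`seq` and `while` nodes, with plain strings standing in for straight-line instructions) and invented jump mnemonics. It is not the lowering used by any real compiler.

```python
from itertools import count

def lower(node, labels=None):
    """Flatten a tiny structured AST into label/jump form.

    Nodes (all hypothetical): ("seq", [nodes]), ("while", cond, body),
    or a string, treated as an opaque straight-line instruction.
    """
    labels = labels if labels is not None else count()
    if isinstance(node, str):
        return [node]
    kind = node[0]
    if kind == "seq":
        return [line for child in node[1] for line in lower(child, labels)]
    if kind == "while":
        _, cond, body = node
        n = next(labels)                   # unique suffix for this loop
        head, done = f"loop{n}", f"done{n}"
        return [
            f"{head}:",
            f"jump_if_not {cond}, {done}",
            *lower(body, labels),
            f"jump {head}",
            f"{done}:",
        ]
    raise ValueError(f"unknown node kind: {kind}")

program = ("seq", [
    "i = 100",
    ("while", "i != 0", ("seq", ["work", "i = i - 1"])),
    "ret",
])

flat = lower(program)
```

After lowering, the loop structure exists only implicitly in the pattern of jumps. This is why analyses that need it, such as loop optimisations, have to rediscover it from a flat IR.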


Hierarchical IR

• Easy to express high-level constructs

• Preserves program semantics

• Preserves high-level semantics (variable lifetime, exceptions) clearly

• Example: WHIRL in MIPSPro/Open64/Path64 and derivatives


Flat IR

• Easy to map to the back end

• Simple for optimisations to process

• Examples: LLVM IR, CGIR, PTX


Questions?


Modern Processor Architectures

L25: Modern Compiler Design


The 1960s - 1970s

• Instructions took multiple cycles

• Only one instruction in flight at once

• Optimisation meant minimising the number of instructions executed

• Sometimes replacing expensive general-purpose instructions with specialised sequences of cheaper ones


The 1980s

• CPUs became pipelined

• Optimisation meant minimising pipeline stalls

• Dependency ordering such that results were not needed in the next instruction

• Computed branches became very expensive when not correctly predicted


Stall Example

[Figure: pipeline diagram with stages Fetch, Decode, Register Fetch, Execute and Writeback, showing the add and jne instructions moving through the stages]

Source language:

    for (int i=100 ; i!=0 ; i--)
    {
        ...
    }

Assembly:

    start:
        ...
        add r1, r1, -1
        jne r1, 0, start


Fixing the Stall

Source language:

    for (int i=100 ; i!=0 ; i--)
    {
        ...
    }

Assembly:

    start:
        add r1, r1, -1
        ...
        jne r1, 0, start

Is this a good solution?
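One way to reason about the two orderings is a toy stall counter, sketched here in Python. The model is single-issue and in-order, with a result readable a fixed number of cycles after its producer issues. The model, the default latency of 3 and the placeholder loop-body instructions are all assumptions for illustration, not a description of any real pipeline.

```python
def count_stalls(code, latency=3):
    """Count stall cycles in a toy in-order, single-issue pipeline.

    Each instruction is (dest, sources). A result becomes readable `latency`
    cycles after its producer issues (illustrative assumption).
    """
    ready = {}      # register -> first cycle at which it can be read
    cycle = 0       # earliest cycle at which the next instruction can issue
    stalls = 0
    for dest, sources in code:
        issue = max([cycle] + [ready.get(s, 0) for s in sources])
        stalls += issue - cycle
        if dest is not None:
            ready[dest] = issue + latency
        cycle = issue + 1
    return stalls

# Two placeholder body instructions that do not touch r1, then the
# decrement and the branch that reads r1.
original = [("r2", ["r3"]), ("r4", ["r5"]), ("r1", ["r1"]), (None, ["r1"])]
hoisted = [("r1", ["r1"]), ("r2", ["r3"]), ("r4", ["r5"]), (None, ["r1"])]

stalls_original = count_stalls(original)
stalls_hoisted = count_stalls(hoisted)
```

In this model the hoisted order hides the latency completely, but only because the body is long enough and does not read `r1`. If the body uses the loop counter, moving the decrement changes what it sees, which is one reason to question whether this is a good solution.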


The Early 1990s

• CPUs became much faster than memory

• Caches hid some latency

• Optimisation meant maximising locality of reference, prefetching

• Sometimes, recalculating results is faster than fetching from memory
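Locality of reference can be made concrete with a toy direct-mapped cache model in Python. The cache geometry (8 lines of 4 words) and the 16x16 matrix are arbitrary choices for this sketch and do not correspond to any real processor.

```python
def count_misses(addresses, num_lines=8, line_size=4):
    """Count misses in a toy direct-mapped cache (sizes in words, made up)."""
    cache = [None] * num_lines     # cache[i] = block number held in line i
    misses = 0
    for address in addresses:
        block = address // line_size
        index = block % num_lines
        if cache[index] != block:
            cache[index] = block   # fetch the block, evicting the old one
            misses += 1
    return misses

ROWS, COLS = 16, 16   # matrix stored row by row, one word per element
row_major = [r * COLS + c for r in range(ROWS) for c in range(COLS)]
column_major = [r * COLS + c for c in range(COLS) for r in range(ROWS)]

row_misses = count_misses(row_major)
column_misses = count_misses(column_major)
```

Both traversals touch the same 256 words. Walking along rows uses every word of a fetched line before moving on, while walking down columns evicts each line before its neighbours are needed, so in this model every access misses.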


The Mid 1990s

• CPUs became superscalar
  • Independent instructions executed in parallel

• CPUs became out-of-order
  • Reordered instructions to reduce dependencies

• Optimisation meant structuring code for highest-possible ILP

• Loop unrolling no longer such a big win


Superscalar CPU Pipeline Example: Sandy Bridge

Can dispatch up to six instructions at once, via 6 pipelines:

1. ALU, VecMul, Shuffle, FpDiv, FpMul, Blend

2. ALU, VecAdd, Shuffle, FpAdd

3. Load / Store address

4. Load / Store address

5. Load / Store data

6. ALU, Branch, Shuffle, VecLogic, Blend


Branch Predictors

• Achieve 95+% accuracy on modern CPUs

• No cost when branch is correctly predicted

• Long and wide pipelines make the remaining 5% very expensive!
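As a rough illustration of why accuracy depends on the branch pattern, here is the textbook two-bit saturating counter in Python. Real predictors are far more elaborate; the function name and the two test patterns are choices made for this sketch.

```python
def mispredictions(outcomes, state=2):
    """Count mispredictions of a 2-bit saturating counter.

    States 0-1 predict not taken, states 2-3 predict taken.
    """
    misses = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            misses += 1
        # Move one step towards the observed outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses

# A loop branch executed 100 times: taken 99 times, then falls through.
loop_branch = [True] * 99 + [False]
# A branch that flips direction on every execution.
alternating = [True, False] * 50

loop_misses = mispredictions(loop_branch)
alternating_misses = mispredictions(alternating)
```

The loop branch is mispredicted only on exit, while this simple scheme gets the alternating branch wrong half the time. Each of those mispredictions discards the work already in flight in the pipeline.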


The Late 1990s

• SIMD became mainstream

• Factor of 2-4× speedup when used correctly

• Optimisation meant ensuring data parallelism

• Loop unrolling starts winning again, as it exposes later optimisation opportunities (more on this later)


The Early 2000s

• (Homogeneous) Multicore became mainstream

• Power efficiency became important

• Parallelism provides both better throughput and lower power

• Optimisation meant exploiting fine-grained parallelism


The Late 2000s

• Programmable GPUs became mainstream

• Hardware optimised for stream processing in parallel

• Very fast for massively-parallel floating point operations

• Cost of moving data between CPU and GPU is high

• Optimisation meant offloading operations to the GPU


The 2010s

• Modern processors come with multiple CPU and GPU cores

• All cores behind the same memory interface, cost of moving data between them is low

• Increasingly contain specialised accelerators

• Often contain general-purpose (programmable) cores for specialised workload types (e.g. DSPs)

• Optimisation is hard.


Questions?

