COMP SCI 4TB3 / 6TB3 Compiler Construction

Emil Sekerinski
Department of Computing and Software
McMaster University

Copyright Emil Sekerinski, 2013

Objectives …

•  Taking this course is not just about writing compilers!

•  Compilers are the missing link in connecting architecture, programming languages, formal languages, and operating systems.

•  Parsing techniques have broader applicability than compilers, e.g. command lines, file formats (XML), protocols.

•  You learn to understand why the syntax of programming languages is defined in a particular way.

•  Understanding the memory layout of data types (e.g. arrays, objects) and the compilation of control structures (e.g. short-circuit evaluation, recursion) is essential to fully understanding the efficiency considerations of programming languages.

•  The issues of separate compilation and the (unavoidable) effects of garbage collection are central to judging many languages.

•  You learn to understand the various optimization options of compilers.

•  You learn to understand the implications of "byte code" vs. RISC and "just-in-time" compilation.

… Objectives

•  In summary,
   –  you will have a deeper understanding of programming languages, which will help you develop a better programming style,
   –  you will be able to implement analysers and interpreters for "small languages", which appear everywhere (e.g. configuration files, query languages),
   –  you will be able to write whole compilers for simple processors,
   –  you will know if and when to use compiler tools like lex and yacc.

•  Compiler construction is a highly specialised subject. Writing optimising, commercial-quality compilers requires additional study.

Reference Books

Niklaus Wirth. Compiler Construction, Addison-Wesley, 176 pages, 1996. Most of the lectures and assignments are based on this book. It is rather thin, easy to read, and covers most topics.

Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools, Second Edition, Addison-Wesley, 1009 pages, 2007. The revision of the classic book on compiler design, an excellent reference for all traditional topics.

Dick Grune, Henri E. Bal, Ceriel J. H. Jacobs, Koen G. Langendoen. Modern Compiler Design, Wiley, 754 pages, 2000. Covers imperative, object-oriented, functional, logic, and distributed languages; has been used in this course.

… Course Text

Andrew W. Appel. Modern Compiler Implementation in Java, Cambridge University Press, 548 pages, 1998. Three versions of the book exist, each covering the same material but using a different programming language for the implementation: Modern Compiler Implementation in Java, Modern Compiler Implementation in C, and Modern Compiler Implementation in ML. The book gives excellent coverage of all modern issues in compiler design for traditional, object-oriented, and functional programming languages.

Steven S. Muchnick. Advanced Compiler Design and Implementation, Morgan Kaufmann Publishers, 856 pages, 1997. Very comprehensive coverage of all issues around code generation and code optimization.

Outline

1. Concepts of Compilation
2. Language and Syntax
3. Regular Languages
4. Analysis of Context-free Languages
5. Syntax-Directed Translation
6. The Construction of a Parser
7. Context-Dependencies
8. A RISC Architecture as Target
9. Expressions and Assignments
10. Conditionals, Iterations, and Boolean Expressions
11. Procedures and Locality
12. Further Data Types
13. Object-Oriented Concepts
14. Modules and Separate Compilation
15. Code Optimization
16. Garbage Collection
17. Virtual Machines
18. Generalized Parsing

1. Concepts of Compilation

In a broader view, a compiler is a program that processes a structured source and generates (more simply structured) target code.

Task of a compiler:

   source text (source program, source code) → Compiler → target program (target code), error messages

Source:
•  programming languages: C, Pascal, Assembler
•  text formatting languages: PDF, TeX, html, RTF
•  scripting languages: bash, emacs, python, JavaScript
•  database query languages
•  hardware description languages
•  machine control languages

Target:
•  machine code: MC68000, SPARC
•  assembly language
•  interpreted code: Java Virtual Machine, P-Code
•  special processor code: DSP
•  text formatting languages: PDF, TeX, html, RTF
•  machine tool instructions

Syntax-Directed Translation

•  Starting with Algol 60, the first programming language with a formally defined syntax, the translation process of a compiler is guided by the syntactical structure of the source text: syntax-directed compilation.

•  All languages since Algol 60 follow the structure of its definition: the structure of symbols in terms of characters and the structure of the language in terms of symbols are defined by a formal grammar. The conditions for type-correct programs (the context-dependencies) and the meaning of programs (the semantics) are part of the definition but not part of that grammar.

•  This leads to the following model of compilation:
   –  Analysis: recognising the structure of the source text according to the grammar(s) and checking the context-dependencies.
   –  Synthesis: generating code for the target processor.
   Note that in a compiler these activities are intertwined.

Phases of Compilation

More precisely, compilation is typically split into a number of consecutive phases. Symbols (also called tokens) are sequences of characters like a number (a sequence of digits), an identifier (a sequence of letters and digits), a keyword (if, while), or a separator (e.g. :). Lexical analysis is usually called scanning and syntactic analysis parsing. The corresponding parts of the compiler are referred to as the scanner and the parser.

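   source text
     → lexical analysis → seq. of symbols
     → syntactic analysis → syntax tree
     → contextual analysis → syntax tree + context info.
     → intermediate code generation → intermediate code
     → code optimization → intermediate code
     → code generation → target code

   analysis: lexical analysis, syntactic analysis, contextual analysis
   synthesis: intermediate code generation, code optimization, code generation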
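
The scanning step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the symbol kinds (`number`, `ident`, `becomes`, `separator`, `operator`, `keyword`), the keyword set, and the function name `scan` are hypothetical choices, not the scanner used in the course.

```python
import re
from typing import Iterator, NamedTuple


class Symbol(NamedTuple):
    kind: str   # e.g. "number", "ident", "keyword" (names are assumptions)
    text: str   # the characters that make up the symbol


# Assumed keyword set; a real language defines its own.
KEYWORDS = {"if", "while", "var", "procedure"}

# One named alternative per symbol kind; ":=" is tried before ":".
SYMBOL_PATTERN = re.compile(r"""
    (?P<number>\d+)
  | (?P<ident>[A-Za-z][A-Za-z0-9]*)
  | (?P<becomes>:=)
  | (?P<separator>[:;,()])
  | (?P<operator>[+\-*/])
  | (?P<space>\s+)
""", re.VERBOSE)


def scan(source: str) -> Iterator[Symbol]:
    """Turn a character sequence into a sequence of symbols."""
    pos = 0
    while pos < len(source):
        match = SYMBOL_PATTERN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        pos = match.end()
        kind = match.lastgroup
        if kind == "space":
            continue                      # white space separates symbols
        text = match.group()
        if kind == "ident" and text in KEYWORDS:
            kind = "keyword"              # keywords look like identifiers
        yield Symbol(kind, text)
```

Calling `list(scan("pos := pos + r * 60"))` yields seven symbols: three identifiers, the `:=` symbol, two operators, and one number.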

Intermediate Representations…

•  Suppose the following declarations are processed:
   var pos: integer;
   procedure update (r: integer); …

•  Example:

   source text:
      pos := pos + r * 60

   lexical analysis:
      idpos becomes idpos plus idr times const60

   syntactic analysis:
      assignment
         idpos
         plus
            idpos
            times
               idr
               const60

   contextual analysis:
      assignment
         idpos, var, integer
         plus
            idpos, var, integer
            times
               idr, var, integer
               const60, integer
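
As a rough illustration of the syntactic analysis step, the following Python sketch builds a syntax tree for an assignment from a list of symbols. The grammar (assignment, expression, term, factor), the `(kind, text)` symbol pairs, and the tuple encoding of the tree are assumptions made for this sketch only.

```python
def parse_assignment(symbols):
    """Parse: assignment = ident ":=" expression.

    `symbols` is a list of (kind, text) tuples (an assumed format).
    Returns a nested tuple such as ("plus", left, right).
    """
    pos = 0

    def peek():
        return symbols[pos] if pos < len(symbols) else ("eof", "")

    def take(kind):
        nonlocal pos
        sym = peek()
        if sym[0] != kind:
            raise SyntaxError(f"expected {kind}, found {sym[0]} {sym[1]!r}")
        pos += 1
        return sym[1]

    def factor():
        # factor = number | ident
        if peek()[0] == "number":
            return ("const", int(take("number")))
        return ("id", take("ident"))

    def term():
        # term = factor {"*" factor}
        node = factor()
        while peek() == ("operator", "*"):
            take("operator")
            node = ("times", node, factor())
        return node

    def expression():
        # expression = term {"+" term}
        node = term()
        while peek() == ("operator", "+"):
            take("operator")
            node = ("plus", node, term())
        return node

    target = ("id", take("ident"))
    take("becomes")
    tree = ("assignment", target, expression())
    take("eof")                 # nothing may follow the assignment
    return tree


SYMBOLS = [("ident", "pos"), ("becomes", ":="), ("ident", "pos"),
           ("operator", "+"), ("ident", "r"), ("operator", "*"),
           ("number", "60")]
TREE = parse_assignment(SYMBOLS)
```

Because `term` is called from `expression`, multiplication binds more tightly than addition, so `r * 60` becomes a subtree of the `plus` node, as in the tree shown above.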

… Intermediate Representations

   intermediate code generation:
      R1 := r
      R2 := 60
      R3 := R1 * R2
      R4 := pos
      R5 := R4 + R3
      pos := R5

   code optimization:
      R1 := r
      R1 := R1 * 60
      R1 := R1 + pos
      pos := R1

   code generation:
      MOV R1,SP+$8
      MULI R1,60
      ADD R1,$4000
      MOV $4000,R1
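
Intermediate code generation for the example can be sketched as a walk over the syntax tree that allocates a fresh register for every value. The tuple encoding of the tree and the rule of evaluating the deeper operand first are assumptions of this sketch; the latter was chosen only so that the result coincides with the unoptimized listing above, and it is not claimed to be the course's method.

```python
from itertools import count

# Syntax tree for: pos := pos + r * 60 (tuple encoding is an assumption).
# Leaves are variable names or integer constants.
TREE = (":=", "pos", ("+", "pos", ("*", "r", 60)))


def depth(node):
    """Height of an expression subtree; leaves have height 0."""
    if isinstance(node, tuple):
        return 1 + max(depth(node[1]), depth(node[2]))
    return 0


def gen_expr(node, code, regs):
    """Append code for `node`; return the register holding its value."""
    if not isinstance(node, tuple):
        reg = f"R{next(regs)}"
        code.append(f"{reg} := {node}")        # load variable or constant
        return reg
    op, left, right = node
    if depth(right) > depth(left):             # deeper operand first
        right_reg = gen_expr(right, code, regs)
        left_reg = gen_expr(left, code, regs)
    else:
        left_reg = gen_expr(left, code, regs)
        right_reg = gen_expr(right, code, regs)
    reg = f"R{next(regs)}"
    code.append(f"{reg} := {left_reg} {op} {right_reg}")
    return reg


def gen_assignment(tree):
    """Generate three-address code for an assignment tree."""
    op, target, expr = tree
    if op != ":=":
        raise ValueError("expected an assignment")
    code = []
    result = gen_expr(expr, code, count(1))    # registers R1, R2, ...
    code.append(f"{target} := {result}")
    return code


for line in gen_assignment(TREE):
    print(line)
# prints:
# R1 := r
# R2 := 60
# R3 := R1 * R2
# R4 := pos
# R5 := R4 + R3
# pos := R5
```

A later optimization phase would then reduce the number of registers and fold the constant into the multiplication, as in the optimized listing.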

Symbol Table

•  All the context information is stored in the symbol table, e.g.

   Identifier   Class       Description        Value / Address
   pos          variable    type integer       absolute at 4000 hex
   update       procedure   1 integer param.   absolute at 4 hex
   r            variable    type integer       relative at 8 hex

•  The symbol table initially contains the information given by the declarations, for the purpose of type-checking; information for code generation is added later.

•  Although it is represented as a graph (linked data structure), it is historically called the symbol table.
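
The example table can be modelled with a small Python sketch. The class and field names (`Entry`, `SymbolTable`, `declare`, `lookup`) are hypothetical, and a flat dictionary without nested scopes is a simplification of the linked structure mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str          # identifier
    cls: str           # "variable" or "procedure"
    description: str   # type or parameter information
    addressing: str    # "absolute" or "relative"
    address: int       # added for code generation


class SymbolTable:
    """Flat table without scopes (a simplifying assumption)."""

    def __init__(self):
        self._entries = {}

    def declare(self, entry):
        if entry.name in self._entries:
            raise NameError(f"{entry.name} declared twice")
        self._entries[entry.name] = entry

    def lookup(self, name):
        try:
            return self._entries[name]
        except KeyError:
            raise NameError(f"{name} not declared") from None


# The three entries from the example above.
table = SymbolTable()
table.declare(Entry("pos", "variable", "type integer", "absolute", 0x4000))
table.declare(Entry("update", "procedure", "1 integer param.", "absolute", 0x4))
table.declare(Entry("r", "variable", "type integer", "relative", 0x8))
```

Contextual analysis would call `lookup` for every identifier it meets, for instance `table.lookup("pos").description` to check the type, while code generation would read the `addressing` and `address` fields.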

Passes

•  Phases are a conceptual decomposition of the task of a compiler, which does not necessarily reflect the structure of the compiler.

•  Typically, several phases are merged into passes such that no intermediate data structure is necessary between the phases of a pass.

•  Traditionally, files were used for passing data between the passes. Modern compilers use main memory.

   Lexical Analysis → Syntactic & Context Analysis → Synthesis

Single-Pass Compilers

•  Traditionally, compilers would have 4-6 passes in order to keep the memory requirements for each pass down.

•  Modern compilers, for which main memory is not a limitation, are often single-pass compilers, where the various tasks are interleaved.

•  In a syntax-directed translation scheme, the parser is the main program. Other tasks are implemented as modules which are called by the parser, e.g.:

   Scanner, Symbol Table, Generator: imported/used by Parser
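
A minimal Python sketch of this structure follows: the parser drives the translation, asks the scanner for one symbol at a time, and calls the generator as soon as a construct is recognised, so no syntax tree is built. The stack-machine instructions (`LOAD`, `PUSH`, `ADD`, `MUL`, `STORE`) and all class names are assumptions for illustration; the symbol table is left out to keep the sketch short.

```python
import re


class Generator:
    """Collects instructions for a hypothetical stack machine."""

    def __init__(self):
        self.code = []

    def emit(self, instruction):
        self.code.append(instruction)


class Parser:
    """Main program: pulls symbols from the scanner, calls the generator."""

    def __init__(self, source, generator):
        # Scanner: a lazy iterator that delivers one symbol per request.
        self._symbols = re.finditer(r"\d+|[A-Za-z]\w*|:=|\S", source)
        self._gen = generator
        self._advance()

    def _advance(self):
        match = next(self._symbols, None)
        self.sym = match.group() if match else None   # None marks the end

    def _is_ident(self):
        return self.sym is not None and self.sym[0].isalpha()

    def assignment(self):
        if not self._is_ident():
            raise SyntaxError(f"expected identifier, found {self.sym!r}")
        target = self.sym
        self._advance()
        if self.sym != ":=":
            raise SyntaxError(f"expected ':=', found {self.sym!r}")
        self._advance()
        self.expression()
        self._gen.emit(f"STORE {target}")
        if self.sym is not None:
            raise SyntaxError(f"unexpected {self.sym!r}")

    def expression(self):
        self.term()
        while self.sym == "+":
            self._advance()
            self.term()
            self._gen.emit("ADD")        # emitted right after both operands

    def term(self):
        self.factor()
        while self.sym == "*":
            self._advance()
            self.factor()
            self._gen.emit("MUL")

    def factor(self):
        if self.sym is not None and self.sym.isdigit():
            self._gen.emit(f"PUSH {self.sym}")
        elif self._is_ident():
            self._gen.emit(f"LOAD {self.sym}")
        else:
            raise SyntaxError(f"expected operand, found {self.sym!r}")
        self._advance()


def compile_assignment(source):
    generator = Generator()
    Parser(source, generator).assignment()
    return generator.code


print(compile_assignment("pos := pos + r * 60"))
# prints: ['LOAD pos', 'LOAD r', 'PUSH 60', 'MUL', 'ADD', 'STORE pos']
```

Scanning, parsing, and code generation are interleaved here: each instruction is emitted while the source text is still being read.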

Front End / Back End ...

•  A common and advantageous separation of the tasks divides the compiler into two parts, the front end and the back end.

   Pascal, C, Java
     ↓  front end: analysis and target-independent transformations
   syntax tree and context info
     ↓  back end: target-dependent code generation
   MIPS, SPARC, PowerPC

… Front End / Back End

•  This division helps reduce the effort of writing compilers for different targets for the same language by sharing the front end, or for different languages for the same target by sharing the back end.

•  Theoretically: with m source languages and n target machines, this reduces m x n compilers to m front ends + n back ends.

•  In practice, this only works if the languages, and respectively the targets, are sufficiently similar. It is nevertheless a good structuring principle for flexibility.

Interpreted Code Compilers

•  A variation of this scheme uses a sequential interpreted representation rather than a hierarchical syntax tree.

•  For compactness, interpreted codes usually represent each instruction by a single byte, hence are called byte-codes.

   Pascal, C, Java
     ↓  Compiler
   byte code
     ↓  Interpreter
   MIPS, SPARC, PowerPC

   e.g. JVM (Java Virtual Machine), .NET, LLVM
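
To make the idea of a byte code concrete, here is a small Python sketch of an interpreter for a stack machine whose instructions are encoded as single bytes. The instruction set, the opcode numbering, and the memory layout are invented for this illustration and do not correspond to the JVM or any other real virtual machine.

```python
# Hypothetical opcodes, one byte each. PUSH, LOAD, and STORE are
# followed by a one-byte operand (so operands are limited to 0..255).
PUSH, LOAD, STORE, ADD, MUL, HALT = range(6)


def run(code, memory):
    """Interpret `code` (a bytes object) against `memory` (a list of ints)."""
    stack = []
    pc = 0                                   # program counter
    while True:
        op = code[pc]
        pc += 1
        if op == PUSH:                       # push a constant
            stack.append(code[pc])
            pc += 1
        elif op == LOAD:                     # push a memory cell
            stack.append(memory[code[pc]])
            pc += 1
        elif op == STORE:                    # pop into a memory cell
            memory[code[pc]] = stack.pop()
            pc += 1
        elif op == ADD:
            right = stack.pop()
            stack.append(stack.pop() + right)
        elif op == MUL:
            right = stack.pop()
            stack.append(stack.pop() * right)
        elif op == HALT:
            return memory
        else:
            raise ValueError(f"unknown opcode {op}")


# pos := pos + r * 60, assuming pos is in cell 0 and r in cell 1.
PROGRAM = bytes([LOAD, 0, LOAD, 1, PUSH, 60, MUL, ADD, STORE, 0, HALT])

print(run(PROGRAM, [10, 2]))
# prints: [130, 2]
```

The same `PROGRAM` bytes can be run by any implementation of this interpreter, which is what allows one compiler per language to serve several target machines.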

Cross-Compilers

•  Compilers which produce machine code for a different computer than the one on which they run are called cross-compilers.

•  These are typically used for programming microcontrollers in embedded applications, but also for general multi-platform program development.

•  (For testing purposes a compiler with two back-ends is particularly useful.)

