Post on 29-Aug-2021

Compiler Principle and Technology

Prof. Dongming LU

Mar. 1st, 2021

Content

1. INTRODUCTION

2. SCANNING

3. CONTEXT-FREE GRAMMARS AND PARSING

4. TOP-DOWN PARSING

5. BOTTOM-UP PARSING

6. SEMANTIC ANALYSIS

7. RUNTIME ENVIRONMENT

8. CODE GENERATION

1. INTRODUCTION

What is a compiler?

A program that translates one language into another, equivalent one

Source language (high-level language) -> Compiler -> Target language (machine code)

Extension

One format -> Compiler -> Another format

What is a compiler?

• A complex program

• Used in almost all forms of computing

• Of significant practical use

The purpose of this course

• Provide basic knowledge

▫ Theory

▫ Coding techniques

• Give necessary tools and practical experiences

▫ A series of simple examples

▫ TINY, C-Minus

• Understand design patterns and module decomposition methods in software engineering

▫ The compiler as a special kind of software

One format -> Compiler -> Another format

How to translate?

Main Topics

1.1 Why Compilers? A Brief History

1.2 Programs Related to Compilers

1.3 The Translation Process

1.4 Major Data Structures in a Compiler

1.5 Other Issues in Compiler Structure

1.6 Bootstrapping and Porting

1.1 Why Compilers? A Brief History

Why Compilers?

Machine language     C7 06 0000 0002    Time consuming and tedious, not easy to understand
Assembly language    MOV x, 2           Not easy to write and understand, dependent on a particular machine
High-level language  x = 2

Brief History of Compiler

• 1954-1957: First compiler

• 1960s~1970s: Theories and algorithms (FA, CFG, et al.); the Chomsky Hierarchy was also studied

• 1970s~1990s: Some tools (Yacc, Lex)

• Last 20 years: IDE

Compiler design has not changed much


Brief History of Compiler

Noam Chomsky: structure of natural language

The Chomsky Hierarchy

Four levels of grammar (Types 0-3)

Type 3: FA (Finite Automata)

Type 2: CFG (Context-Free Grammar)

1.2 Programs Related to Compilers

Programs related to compilers

• Programs run before the compiler: Editors, Preprocessors

• Programs run after the compiler: Assemblers, Linkers, Loaders

• Programs similar to compilers: Interpreters

• Other related programs: Debuggers, Profilers, Project managers

Interpreters VS Compilers

Interpreters             Compilers
Executing immediately    Generating object code
Run slowly               Run fast

Interpreters and compilers share many of their operations

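source code (BASIC) -> interpreter -> execute on X86
source code (BASIC) -> interpreter -> execute on ARM
source code (BASIC) -> interpreter -> execute on MIPS

source code (C++) -> compiler 1 -> target language 1 -> execute on X86
source code (C++) -> compiler 2 -> target language 2 -> execute on ARM
source code (C++) -> compiler 3 -> target language 3 -> execute on MIPS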

Other Programs Related to Compilers

Editors -> Source code -> Preprocessor -> Compiler -> Assembler -> Object code -> Linker -> Executable file -> Loader

Other programs, e.g., debugger

Machine-dependent

1.3 The Translation Process

The Phases of a Compiler

Source code
  -> Scanner -> Tokens
  -> Parser -> Syntax Tree
  -> Semantic Analyzer -> Annotated Tree
  -> Source Code Optimizer -> Intermediate code
  -> Code Generator -> Target code
  -> Target Code Optimizer -> Target code

Shared by the phases: Literal Table, Symbol Table, Error Handler

Important data structures

The phases of a compiler

• Six phases

▫ Scanner

▫ Parser

▫ Semantic Analyzer

▫ Source code optimizer

▫ Code generator

▫ Target Code Optimizer

• Three auxiliary components

▫ Literal table

▫ Symbol table

▫ Error Handler

The Scanner

• Lexical analysis: sequences of characters -> tokens

e.g.: a[index] = 4 + 2

Lexeme   Token
a        identifier
[        left bracket
index    identifier
]        right bracket
=        assignment
4        number
+        plus sign
2        number

Scanner output: <id, 1> [ <id, 2> ] = 4 + 2

Symbol Table

1   a
2   index
3   …
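The scanning step for a[index] = 4 + 2 can be sketched in a few lines of Python. This is a minimal sketch, not the TINY scanner: the token names follow the slide, while the regular expressions, the `scan` function, and the tuple layout of the tokens are assumptions made for illustration.

```python
import re

# Token categories follow the slide; the patterns themselves are assumptions.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("left bracket", r"\["),
    ("right bracket", r"\]"),
    ("assignment", r"="),
    ("plus sign", r"\+"),
    ("skip", r"\s+"),
]
# Group names cannot contain spaces, so each pattern is named g0, g1, ...
MASTER = re.compile("|".join(f"(?P<g{i}>{p})" for i, (_, p) in enumerate(TOKEN_SPEC)))


def scan(source):
    """Turn a character sequence into tokens and fill the symbol table."""
    symbol_table = {}  # identifier name -> index, numbered from 1 as on the slide
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind = TOKEN_SPEC[int(match.lastgroup[1:])][0]
        lexeme = match.group()
        pos = match.end()
        if kind == "skip":
            continue
        if kind == "identifier":
            # Identifiers are replaced by a reference into the symbol table.
            index = symbol_table.setdefault(lexeme, len(symbol_table) + 1)
            tokens.append(("id", index))
        elif kind == "number":
            tokens.append(("number", int(lexeme)))
        else:
            tokens.append((kind, lexeme))
    return tokens, symbol_table


tokens, table = scan("a[index]=4+2")
print(tokens)
print(table)
```

The identifiers come out as ("id", 1) and ("id", 2), matching the id, 1 and id, 2 entries on the slide.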

The Parser

• Syntax analysis: Tokens -> Parse tree / Syntax tree (Abstract syntax tree)

e.g.: a[index] = 4 + 2

The Parse Tree

expression
  assign-expression
    expression
      subscript-expression
        expression
          identifier a
        [
        expression
          identifier index
        ]
    =
    expression
      additive-expression
        expression
          number 4
        +
        expression
          number 2

The Syntax Tree

assign-expression
  subscript-expression
    identifier a
    identifier index
  additive-expression
    number 4
    number 2
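A small recursive-descent parser can produce the syntax tree for a[index] = 4 + 2. This is a sketch under stated assumptions: the grammar is a tiny subset invented for this one example, the `Parser` class and `tokenize` helper are hypothetical names, and tree nodes are plain tuples rather than the pointer-based records a C compiler would use.

```python
import re


def tokenize(source):
    # Simplified tokenizer for this example only: names, numbers, and [ ] = +
    return re.findall(r"[A-Za-z_]\w*|\d+|[\[\]=+]", source)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, text):
        if self.peek() != text:
            raise SyntaxError(f"expected {text!r}, found {self.peek()!r}")
        self.pos += 1

    def parse(self):
        tree = self.assign()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def assign(self):
        # assign -> additive [ '=' assign ]
        left = self.additive()
        if self.peek() == "=":
            self.pos += 1
            return ("assign", left, self.assign())
        return left

    def additive(self):
        # additive -> postfix { '+' postfix }
        node = self.postfix()
        while self.peek() == "+":
            self.pos += 1
            node = ("add", node, self.postfix())
        return node

    def postfix(self):
        # postfix -> primary { '[' assign ']' }
        node = self.primary()
        while self.peek() == "[":
            self.pos += 1
            index = self.assign()
            self.expect("]")
            node = ("subscript", node, index)
        return node

    def primary(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        if token.isdigit():
            return ("number", int(token))
        if token[0].isalpha() or token[0] == "_":
            return ("identifier", token)
        raise SyntaxError(f"unexpected token {token!r}")


tree = Parser(tokenize("a[index]=4+2")).parse()
print(tree)
```

The sketch builds the abstract syntax tree directly and never materializes the full parse tree. It also does not check that the left side of an assignment is assignable.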

The Semantic Analyzer

• The semantics of a program are its “meaning”: declarations and type checking

e.g.: a[index] = 4 + 2

The Syntax Tree annotated with attributes

assign-expression
  subscript-expression : integer
    identifier a : array of integer
    identifier index : integer
  additive-expression : integer
    number 4 : integer
    number 2 : integer
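Type checking of the tree for a[index] = 4 + 2 can be sketched as a recursive walk that computes a type for every node. The tuple-shaped nodes, the `check` function, and the dictionary standing in for declarations are assumptions for illustration. The type names are the ones shown on the slide.

```python
def check(node, symbols):
    """Return the type of `node`, raising TypeError on a semantic error."""
    kind = node[0]
    if kind == "number":
        return "integer"
    if kind == "identifier":
        name = node[1]
        if name not in symbols:
            raise TypeError(f"undeclared identifier {name!r}")
        return symbols[name]
    if kind == "subscript":
        base_type = check(node[1], symbols)
        index_type = check(node[2], symbols)
        if base_type != "array of integer":
            raise TypeError("subscripted value is not an array")
        if index_type != "integer":
            raise TypeError("array index is not an integer")
        return "integer"
    if kind == "add":
        if check(node[1], symbols) != "integer" or check(node[2], symbols) != "integer":
            raise TypeError("operands of + must be integers")
        return "integer"
    if kind == "assign":
        target_type = check(node[1], symbols)
        value_type = check(node[2], symbols)
        if target_type != value_type:
            raise TypeError(f"cannot assign {value_type} to {target_type}")
        return target_type
    raise ValueError(f"unknown node kind {kind!r}")


# Declarations assumed for the example: a is an array of integer, index an integer.
symbols = {"a": "array of integer", "index": "integer"}
tree = (
    "assign",
    ("subscript", ("identifier", "a"), ("identifier", "index")),
    ("add", ("number", 4), ("number", 2)),
)
print(check(tree, symbols))
```

The sketch returns the computed type instead of storing it in the tree. Recording each result on its node would give the annotated tree shown on the slide.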

The Source Code Optimizer

• Constant folding

e.g.: a[index] = 4 + 2

Before:

assign-expression
  subscript-expression : integer
    identifier a : array of integer
    identifier index : integer
  additive-expression : integer
    number 4 : integer
    number 2 : integer

After constant folding:

assign-expression
  subscript-expression : integer
    identifier a : array of integer
    identifier index : integer
  number 6 : integer

• Three-address code is easier to optimize
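Constant folding on the syntax tree can be sketched as a bottom-up rewrite: fold the children first, then replace an addition of two numbers by their sum. The tuple-shaped nodes and the `fold` function are assumptions for illustration, and only the + operator from the example is handled.

```python
def fold(node):
    """Return a copy of the tree with constant additions evaluated."""
    kind = node[0]
    if kind in ("number", "identifier"):
        return node  # leaves are already as simple as they can be
    children = [fold(child) for child in node[1:]]
    if kind == "add" and all(child[0] == "number" for child in children):
        return ("number", children[0][1] + children[1][1])
    return (kind, *children)


tree = (
    "assign",
    ("subscript", ("identifier", "a"), ("identifier", "index")),
    ("add", ("number", 4), ("number", 2)),
)
print(fold(tree))
```

Because children are folded before their parent, nested constants such as (1 + 2) + 3 collapse to a single number as well.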

The Code Generator

• The properties of the target machine become the major factor.

eg: a[index]=4+2:

MOV R0, index ;; value of index -> R0

MUL R0, 2 ;; double value in R0

MOV R1, &a ;; address of a -> R1

ADD R1, R0 ;; add R0 to R1

MOV *R1, 6 ;; constant 6 -> address in R1

The Target Code Optimizer

• Optimizing methods:

▫ Choosing addressing modes

▫ Replacing slow instructions by faster ones

▫ Eliminating redundant operations

e.g.: a[index] = 4 + 2

MOV R0, index ;; value of index -> R0
MUL R0, 2 ;; double value in R0
MOV R1, &a ;; address of a -> R1
ADD R1, R0 ;; add R0 to R1
MOV *R1, 6 ;; constant 6 -> address in R1

transform to

MOV R0, index ;; value of index -> R0
SHL R0 ;; double value in R0
MOV &a[R0], 6 ;; constant 6 -> address a + R0
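The transformation of the target code for a[index] = 4 + 2 can be sketched as two small peephole rewrites over a list of instructions. The tuple encoding of instructions and both function names are assumptions for illustration. The second rewrite is only valid when the base register is not used afterwards, which this sketch assumes rather than checks.

```python
def reduce_strength(code):
    """Replace a multiplication by 2 with a shift (a faster instruction)."""
    result = []
    for instr in code:
        if len(instr) == 3 and instr[0] == "MUL" and instr[2] == "2":
            result.append(("SHL", instr[1]))
        else:
            result.append(instr)
    return result


def fuse_indexed_store(code):
    """Fold 'MOV Rb, &x; ADD Rb, Ri; MOV *Rb, v' into one indexed store."""
    result = []
    i = 0
    while i < len(code):
        w = code[i:i + 3]
        if (len(w) == 3 and all(len(instr) == 3 for instr in w)
                and w[0][0] == "MOV" and w[0][2].startswith("&")
                and w[1][0] == "ADD" and w[1][1] == w[0][1]
                and w[2][0] == "MOV" and w[2][1] == "*" + w[0][1]):
            # Assumes the base register w[0][1] is dead after the store.
            result.append(("MOV", f"{w[0][2]}[{w[1][2]}]", w[2][2]))
            i += 3
        else:
            result.append(code[i])
            i += 1
    return result


code = [
    ("MOV", "R0", "index"),
    ("MUL", "R0", "2"),
    ("MOV", "R1", "&a"),
    ("ADD", "R1", "R0"),
    ("MOV", "*R1", "6"),
]
for instr in fuse_indexed_store(reduce_strength(code)):
    print(instr[0], ", ".join(instr[1:]))
```

The first rewrite corresponds to replacing slow instructions by faster ones, the second to choosing a better addressing mode.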

1.4 Major Data Structures in a Compiler

Major Data Structures

• TOKEN:

a single global variable

• THE SYNTAX TREE:

pointer-based structure

• THE SYMBOL TABLE:

hash table

• THE LITERAL TABLE:

reuse of constants and strings

• INTERMEDIATE CODE:

three-address code

• TEMPORARY FILES:

products of intermediate steps
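Two of the structures listed above, the symbol table as a hash table and the literal table that lets constants and strings be reused, can be sketched with Python dictionaries, which are hash tables. The class names, method names, and the fields stored per symbol are assumptions for illustration.

```python
class SymbolTable:
    """Maps identifier names to their attributes using a hash table (dict)."""

    def __init__(self):
        self._entries = {}

    def insert(self, name, type_name):
        if name in self._entries:
            raise KeyError(f"{name!r} is already declared")
        # 'location' is a hypothetical attribute: the order of declaration.
        self._entries[name] = {"type": type_name, "location": len(self._entries)}

    def lookup(self, name):
        return self._entries.get(name)  # None when the name is undeclared


class LiteralTable:
    """Stores each constant or string once and hands out its index."""

    def __init__(self):
        self._index = {}
        self._values = []

    def intern(self, value):
        if value not in self._index:
            self._index[value] = len(self._values)
            self._values.append(value)
        return self._index[value]

    def __len__(self):
        return len(self._values)


symbols = SymbolTable()
symbols.insert("a", "array of integer")
symbols.insert("index", "integer")
print(symbols.lookup("index"))

literals = LiteralTable()
first = literals.intern("hello")
second = literals.intern("hello")  # reused, not stored a second time
print(first, second, len(literals))
```

Interning the same string twice returns the same index, so the literal is stored only once.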

1.5 Other Issues in Compiler Structure

The Structure of Compiler

• Multiple views from different angles

▫ Logical Structure

▫ Physical Structure

▫ Sequencing of the operations, and so on.

• A major impact of the structure

▫ Reliability

▫ Efficiency

▫ Usefulness

▫ Maintainability

Analysis and Synthesis (Logical Structure)

• Analysis part: analyzes the source program and computes its properties

▫ Lexical analysis, syntax analysis and semantic analysis, as well as optimization

▫ More mathematical and better understood

• Synthesis part: produces the translated codes

▫ Code generation, as well as optimization

▫ More specialized

• Each of the two parts can be changed independently of the other

Front End and Back End (Physical Structure)

Source code -> Scanner -> Parser -> Semantic Analyzer -> Intermediate code synthesis -> Code Generator -> Target Code Optimizer -> Target code

Important for compiler portability

• Front end: oriented to people; depends on the source language, independent of the target language

• Back end: oriented to the computer; depends on the target language, independent of the source language

Passes (Sequencing of the operations)

• Pass: one complete processing of the source program; a compiler may make several passes before generating code.

• Most compilers with optimization use more than one pass

▫ One pass for scanning and parsing

▫ One pass for semantic analysis and source-level optimization

▫ A third pass for code generation and target-level optimization

• A compiler can be one pass or multiple passes

▫ One pass: efficient compilation; less efficient target code -> Space consuming

▫ Multiple passes: efficient target code; complicated compilation -> Time consuming

Language Definition and Compilers

• The lexical and syntactic structure of a programming language

▫ Regular expressions

▫ Context-free grammar

• The semantics of a programming language are usually described in English

▫ Language reference manual, or language definition.

• A language definition and a compiler are often developed simultaneously

▫ The techniques have a major impact on the definition

▫ The definition has a major impact on the techniques


Compiler options and interfaces

• Mechanisms for interfacing with the operating system

▫ Input and output facilities

▫ Access to the file system of the target machine

• Options to the user for various purposes

▫ Specification of listing characteristics

▫ Code optimization options

Error Handling

• Static (or compile-time) errors must be reported by a compiler

▫ Generate meaningful error messages

▫ Resume compilation after each error

▫ Different kinds of error handling for each phase of a compiler

• Exception handling

▫ Perform suitable runtime tests

▫ Exception handling mechanisms are required

1.6 Bootstrapping and Porting

Third Language for Compiler Construction

• Machine language

▫ Execute immediately

• Another language with an existing compiler on the same target machine (First Scenario)

▫ Compile the new compiler with the existing compiler

• Another language with an existing compiler on a different machine (Second Scenario)

▫ Compilation produces a cross compiler

T-Diagram Describing Complex Situation

• A compiler written in language H that translates language S into language T.

S -> T
  H

• T-Diagram can be combined in two basic ways.

The First T-diagram Combination

[A -> B, written in H] + [B -> C, written in H] = [A -> C, written in H]

• Two compilers run on the same machine H

▫ First from A to B

▫ Second from B to C

▫ Result from A to C on H

The Second T-diagram Combination

[A -> B, written in H] compiled by [H -> K, running on M] = [A -> B, written in K]

• Translate the implementation language of a compiler from H to K

• Use another compiler from H to K
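The two T-diagram combinations can be sketched as operations on a small record with three fields. The `T` class and the function names `chain` and `translate` are hypothetical, introduced only to make the two rules concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class T:
    """A T-diagram: translates `source` into `target`, written in `host`."""
    source: str
    target: str
    host: str


def chain(first, second):
    """First combination: A->B and B->C on the same host give A->C."""
    if first.host != second.host:
        raise ValueError("both compilers must run on the same machine")
    if first.target != second.source:
        raise ValueError("the first target must be the second source")
    return T(first.source, second.target, first.host)


def translate(program, compiler):
    """Second combination: recompile `program` with `compiler`.

    Only the implementation language changes, from the compiler's source
    language to its target language.
    """
    if program.host != compiler.source:
        raise ValueError("the compiler cannot read the program's language")
    return T(program.source, program.target, compiler.target)


print(chain(T("A", "B", "H"), T("B", "C", "H")))
print(translate(T("A", "B", "H"), T("H", "K", "M")))
```

In `translate`, the machine M that the second compiler runs on does not appear in the result. It only has to be a machine that is available to run the compilation.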

The First Scenario

[A -> H, written in B] compiled by [B -> H, running on H] = [A -> H, written in H]

• Another language with an existing compiler on the same target machine (First Scenario)

• Translate a compiler from A to H written in B

▫ Use an existing compiler for language B on machine H

The Second Scenario

[A -> H, written in B] compiled by [B -> K, running on K] = [A -> H, written in K]

• Another language with an existing compiler on a different machine (Second Scenario)

• Use an existing compiler for language B on a different machine K

▫ Result in a cross compiler

Process of Bootstrapping

• Write a compiler in the same language

S T

S

• No compiler for source language yet

• Porting to a new host machine

The First step in bootstrap

[A -> H, written in A] compiled by [A -> H, written in H] = [A -> H, written in H]

• Compiler written in its own language A

• “Quick and dirty” compiler written in machine language H

• Result in a running but inefficient compiler

The Second step in bootstrap

[A -> H, written in A] compiled by [A -> H, written in H] = [A -> H, written in H]

• Compiler written in its own language A

• Running but inefficient compiler

• Result in the final version of the compiler

The step 1 in porting

[A -> K, written in A] compiled by [A -> H, written in H] = [A -> K, written in H]

• Compiler source code retargeted to K

• Original compiler

• Result in a cross compiler

The step 2 in porting

[A -> K, written in A] compiled by [A -> K, written in H] = [A -> K, written in K]

• Compiler source code retargeted to K

• Cross compiler

• Result in the retargeted compiler
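The bootstrapping steps and the porting steps all apply the same rule: compiling a compiler changes only its implementation language. The sketch below replays them; the `Compiler` tuple and the `compile_compiler` function are hypothetical names used for illustration.

```python
from collections import namedtuple

# A compiler translates `source` into `target` and is written in `written_in`.
Compiler = namedtuple("Compiler", "source target written_in")


def compile_compiler(program, compiler):
    """Compile `program` (itself a compiler) using `compiler`."""
    if program.written_in != compiler.source:
        raise ValueError("the compiler cannot read the program's language")
    return Compiler(program.source, program.target, compiler.target)


# Bootstrapping: the compiler for A is written in A itself.
compiler_source = Compiler("A", "H", "A")
quick_and_dirty = Compiler("A", "H", "H")  # hand-written in machine language H
running_but_inefficient = compile_compiler(compiler_source, quick_and_dirty)
final_version = compile_compiler(compiler_source, running_but_inefficient)
print(running_but_inefficient, final_version)

# Porting: the same source, retargeted to produce code for machine K.
retargeted_source = Compiler("A", "K", "A")
original = Compiler("A", "H", "H")
cross_compiler = compile_compiler(retargeted_source, original)
retargeted_compiler = compile_compiler(retargeted_source, cross_compiler)
print(cross_compiler, retargeted_compiler)
```

The two bootstrapping results are equal as T-diagrams. What the diagram does not capture is the quality of the generated code, which is why the second step still matters.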

1.7 The TINY Sample Language and Compiler

The Requirements

Complete source code in (ANSI) C for a small language whose compiler can be easily comprehended once the techniques are understood.

The TINY Language

Very simple structure:

• Just a sequence of statements separated by semicolons;

• All variables are integer variables;

• An if-statement and a repeat-statement;

• Read and write statements for input/output;

• Comments are within curly brackets.

The TINY Language

A simple example:

{ Sample program
  in TINY language -
  computes factorial
}
read x; { input an integer }
if x > 0 then { don’t compute if x <= 0 }
  fact := 1;
  repeat
    fact := fact * x;
    x := x - 1
  until x = 0;
  write fact { output factorial of x }
end

The TINY Compiler

The TINY compiler consists of the following C files:

▫ Globals.h main.c

▫ Util.h util.c

▫ Scan.h scan.c

▫ Parse.h parse.c

▫ Symtab.h analyze.c

▫ Code.h code.c

▫ Cgen.h cgen.c

The TM Machine

The target for the small language TINY:

▫ Enough instructions to be adequate;

▫ The properties of RISCs;

▫ …

End of Chapter One

Thanks
