
Language Processing Systems - University of Aizu official ...hamada/LP/L02-LP.pdf · Language Processing...

Date post: 10-Jun-2020
Page 1

Language Processing Systems

Prof. Mohamed Hamada

Software Engineering Lab. The University of Aizu

Japan

Page 2

Today’s Outline

•  Anatomy of a compiler

•  Compiler front-end and back-end

•  Regular expressions

Page 3

Anatomy of a Compiler

Program written in a programming language  →  Compiler (translation)  →  Assembly language

Page 4

What is a compiler?

program in some source language  →  compiler  →  executable code for target machine

A compiler is a program that reads a program written in one language and translates it into another language.

Traditionally, compilers go from high-level languages to low-level languages.

Page 5

Example

X=a+b*10

  ↓ compiler

MOV id3, R2
MUL #10.0, R2
MOV id2, R1
ADD R2, R1
MOV R1, id1

(id1, id2 and id3 are the symbol-table entries for X, a and b.)
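The target code above can be produced by a recursive walk over the expression tree. The sketch below is a hypothetical two-address code generator, not the course's own: it assumes the AST is nested tuples, that X, a and b have already been replaced by id1, id2 and id3, and that all variables are real, so the constant 10 is emitted as #10.0. Evaluating a compound right operand first, into a spare register, reproduces the instruction order shown.

```python
# Hypothetical sketch: AST nodes are (operator, left, right) tuples,
# leaves are identifier names (str) or numeric constants (int/float).
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def operand(leaf):
    """Render a leaf operand; constants become immediate real values."""
    if isinstance(leaf, (int, float)):
        return f"#{float(leaf)}"
    return leaf

def gen_expr(node, target, next_free, code):
    """Emit code that leaves the value of node in register R{target}.

    Registers numbered >= next_free are free to use as scratch space.
    """
    if not isinstance(node, tuple):
        code.append(f"MOV {operand(node)}, R{target}")
        return
    op, left, right = node
    if isinstance(right, tuple):
        # Compound right operand: evaluate it first into a spare register,
        # then evaluate the left operand without touching that register.
        scratch = next_free
        gen_expr(right, scratch, scratch + 1, code)
        gen_expr(left, target, scratch + 1, code)
        code.append(f"{OPCODES[op]} R{scratch}, R{target}")
    else:
        gen_expr(left, target, next_free, code)
        code.append(f"{OPCODES[op]} {operand(right)}, R{target}")

def gen_assign(target_name, expr):
    """Compute expr into R1, then store R1 into the assigned variable."""
    code = []
    gen_expr(expr, 1, 2, code)
    code.append(f"MOV R1, {target_name}")
    return code

# X = a + b*10 with X, a, b mapped to id1, id2, id3
program = gen_assign("id1", ("+", "id2", ("*", "id3", 10)))
print("\n".join(program))
```

Here `ADD R2, R1` follows the slide's two-address convention (the second operand is both source and destination), so `SUB` and `DIV` would also compute left-op-right correctly.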

Page 6

What is a compiler?

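program in some source language  →  front-end (analysis)  →  semantic representation  →  back-end (synthesis)  →  executable code for target machine

The front-end and back-end together make up the compiler; the semantic representation between them is the intermediate representation.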

Page 7

Compiler Architecture

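Source language
  →  Scanner (lexical analysis)  →  tokens
  →  Parser (syntax analysis)  →  parse tree
  →  Semantic analysis  →  AST
  →  IC generator  →  intermediate language
  →  Code optimizer  →  OIL (optimized intermediate language)
  →  Code generator  →  Target language

The Error Handler and the Symbol Table are shared by all phases. The phases are grouped into a Front End (analysis) and a Back End (synthesis).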

Page 8

front-end: from program text to AST

program text  →  lexical analysis  →  tokens  →  syntax analysis  →  AST  →  context handling  →  annotated AST

These three phases make up the front-end.

Page 9

front-end: from program text to AST

program text  →  lexical analysis (Scanner)  →  tokens  →  syntax analysis (Parser)  →  AST  →  context handling (Semantic analysis)  →  annotated AST (semantic representation)

token description  →  scanner generator  →  Scanner

language grammar  →  parser generator  →  Parser
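A scanner generator turns a token description into a scanner. As a rough stand-in, the sketch below uses Python's `re` module to build a scanner from a hypothetical token description (the token names and patterns are illustrative assumptions, not any particular tool's input format). It also enters identifiers into a symbol table, so that names become id1, id2, ... as in the later examples.

```python
import re
from typing import NamedTuple

# Hypothetical token description: (token kind, regular expression), in priority order.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("ASSIGN", r":=|="),
    ("OP", r"[-+*/()]"),
    ("SKIP", r"\s+"),
]
# Combine all token patterns into one alternation of named groups.
MASTER = re.compile("|".join(f"(?P<{kind}>{rx})" for kind, rx in TOKEN_SPEC))

class Token(NamedTuple):
    kind: str
    text: str

def scan(source: str) -> list[Token]:
    """Split source into tokens, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if m.lastgroup != "SKIP":
            tokens.append(Token(m.lastgroup, m.group()))
        pos = m.end()
    return tokens

def scan_with_symbols(source: str) -> tuple[list[str], dict[str, str]]:
    """Replace each identifier by its symbol-table entry id1, id2, ..."""
    symbols: dict[str, str] = {}
    out = []
    for tok in scan(source):
        if tok.kind == "ID":
            symbols.setdefault(tok.text, f"id{len(symbols) + 1}")
            out.append(symbols[tok.text])
        elif tok.kind == "ASSIGN":
            out.append(":=")
        else:
            out.append(tok.text)
    return out, symbols

stream, table = scan_with_symbols("position = initial + rate * 60")
print(" ".join(stream))   # id1 := id2 + id3 * 60
print(table)
```

A real scanner generator compiles the token description into a finite automaton instead of trying alternatives with a backtracking matcher, but the input (a list of token regular expressions) plays the same role.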

Page 10

Semantic representation

•  heart of the compiler
•  intermediate code
   –  linked lists of pseudo instructions
   –  abstract syntax tree (AST)

program in some source language  →  front-end (analysis)  →  semantic representation  →  back-end (synthesis)  →  executable code for target machine

(the whole pipeline is the compiler)

Page 11

AST example

•  expression grammar
   expression → expression ‘+’ term | expression ‘-’ term | term
   term → term ‘*’ factor | term ‘/’ factor | factor
   factor → identifier | constant | ‘(’ expression ‘)’

•  example expression b*b – 4*a*c
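The grammar above is left-recursive, so a hand-written recursive-descent parser cannot use it directly. The usual rewrite turns expression → expression ‘+’ term | term into a loop, term ((‘+’ | ‘-’) term)*, which keeps the operators left-associative. The sketch below (hypothetical helper names, not from the slides) parses the example expression and builds its AST as nested tuples.

```python
import re

def tokenize(text):
    # identifiers, integer constants and single-character operators;
    # any other character (e.g. blanks) is skipped in this simplified sketch
    return re.findall(r"[A-Za-z_]\w*|\d+|[-+*/()]", text)

class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def expression(self):
        # expression -> term (('+' | '-') term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())
        return node

    def term(self):
        # term -> factor (('*' | '/') factor)*
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):
        # factor -> identifier | constant | '(' expression ')'
        tok = self.take()
        if tok == "(":
            node = self.expression()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok[0].isalnum() or tok[0] == "_":
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")

def parse(text):
    parser = Parser(text)
    tree = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree

print(parse("b*b - 4*a*c"))
# ('-', ('*', 'b', 'b'), ('*', ('*', '4', 'a'), 'c'))
```

Because parentheses only group, `parse("b*b - (4*a*c)")` returns the same tuple: the parse trees of the two expressions differ, but their ASTs coincide.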

Page 12

parse tree: b*b – 4*a*c

expression
  expression
    term
      term
        factor
          identifier ‘b’
      ‘*’
      factor
        identifier ‘b’
  ‘-’
  term
    term
      term
        factor
          constant ‘4’
      ‘*’
      factor
        identifier ‘a’
    ‘*’
    factor
      identifier ‘c’

Page 13

AST: b*b – 4*a*c

‘-’
  ‘*’
    ‘b’
    ‘b’
  ‘*’
    ‘*’
      ‘4’
      ‘a’
    ‘c’

Page 14

annotated AST: b*b – 4*a*c

(node kinds: identifier, constant, term, expression)

‘-’  type: real, loc: reg1
  ‘*’  type: real, loc: reg1
    ‘b’  type: real, loc: sp+16
    ‘b’  type: real, loc: sp+16
  ‘*’  type: real, loc: reg2
    ‘*’  type: real, loc: reg2
      ‘4’  type: real, loc: const
      ‘a’  type: real, loc: sp+8
    ‘c’  type: real, loc: sp+24

Page 15

Example

position = initial + rate * 60

  ↓ Scanner

id1 := id2 + id3 * 60

  ↓ Parser

:=
  id1
  +
    id2
    *
      id3
      60

  ↓ Semantic Analyzer

:=
  id1
  +
    id2
    *
      id3
      int-to-real
        60
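The semantic analyzer's only visible change here is the int-to-real node above 60. A minimal sketch of such a type-checking pass follows. It assumes, as the slides suggest, that id1, id2 and id3 are declared real and that the AST is nested tuples; the `SYMBOL_TYPES` table and the function names are hypothetical.

```python
# Hypothetical symbol table: every variable in the example is assumed to be real.
SYMBOL_TYPES = {"id1": "real", "id2": "real", "id3": "real"}

def coerce(node, node_type, wanted):
    """Wrap node in an int-to-real conversion when a real is required."""
    if node_type == wanted:
        return node
    if node_type == "int" and wanted == "real":
        return ("int-to-real", node)
    raise TypeError(f"cannot convert {node_type} to {wanted}")

def check(node):
    """Return (rewritten node, type) for an AST built from tuples."""
    if isinstance(node, int):
        return node, "int"
    if isinstance(node, float):
        return node, "real"
    if isinstance(node, str):
        return node, SYMBOL_TYPES[node]
    op, left, right = node
    left, left_type = check(left)
    right, right_type = check(right)
    if op == ":=":
        # the value is converted to the type of the assigned variable
        return (op, left, coerce(right, right_type, left_type)), left_type
    # arithmetic: if either operand is real, the result is real
    result = "real" if "real" in (left_type, right_type) else "int"
    return (op, coerce(left, left_type, result), coerce(right, right_type, result)), result

tree = (":=", "id1", ("+", "id2", ("*", "id3", 60)))
annotated, _ = check(tree)
print(annotated)
# (':=', 'id1', ('+', 'id2', ('*', 'id3', ('int-to-real', 60))))
```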

Page 16

AST exercise

•  expression grammar
   expression → expression ‘+’ term | expression ‘-’ term | term
   term → term ‘*’ factor | term ‘/’ factor | factor
   factor → identifier | constant | ‘(’ expression ‘)’

•  example expression b*b – (4*a*c)

•  draw parse tree and AST

Page 17

answer parse tree: b*b – 4*a*c

expression
  expression
    term
      term
        factor
          identifier ‘b’
      ‘*’
      factor
        identifier ‘b’
  ‘-’
  term
    term
      term
        factor
          constant ‘4’
      ‘*’
      factor
        identifier ‘a’
    ‘*’
    factor
      identifier ‘c’

Page 18

answer parse tree: b*b – (4*a*c)

expression
  expression
    term
      term
        factor
          identifier ‘b’
      ‘*’
      factor
        identifier ‘b’
  ‘-’
  term
    factor
      ‘(’
      expression  (derives ‘4*a*c’)
      ‘)’

Page 19

Advantages of Using Front-end and Back-end

1. Retargeting - Build a compiler for a new machine by attaching a new code generator to an existing front-end.

2. Optimization - reuse intermediate code optimizers in compilers for different languages and different machines.

Note: the terms “intermediate code”, “intermediate language”, and “intermediate representation” are all used interchangeably.
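To make retargeting concrete: once a front-end has produced intermediate code, each back-end only has to translate that one IR. The sketch below feeds the same hypothetical three-address IR (the optimized code of the position := initial + rate * 60 example) to two toy back-ends: a two-address register machine in the style of these slides and an invented stack machine. Both instruction formats are assumptions for illustration only.

```python
# Hypothetical IR shared by all front-ends: (dest, op, arg1, arg2) tuples.
IR = [
    ("temp1", "*", "id3", 60.0),
    ("id1", "+", "id2", "temp1"),
]
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def operand(x):
    # real constants are written as immediates, names as-is
    return f"#{x}" if isinstance(x, float) else x

def register_backend(ir):
    """Two-address register machine; naive: every result is stored back to memory."""
    code = []
    for dest, op, a, b in ir:
        code += [f"MOV {operand(a)}, R1", f"{OPCODES[op]} {operand(b)}, R1", f"MOV R1, {dest}"]
    return code

def stack_backend(ir):
    """Invented stack machine: push both operands, apply the operator, pop the result."""
    code = []
    for dest, op, a, b in ir:
        code += [f"PUSH {operand(a)}", f"PUSH {operand(b)}", OPCODES[op], f"POP {dest}"]
    return code

for backend in (register_backend, stack_backend):
    print(backend.__name__)
    print("\n".join(backend(IR)))
```

Adding a third target means writing a third `*_backend` function; no front-end changes.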

Page 20

Compiler structure

•  L+M modules = L×M compilers: L front-ends (one per source language) and M back-ends (one per target machine) share one semantic representation, and every front-end/back-end pair forms a compiler.

Diagram: two front-ends (program in some source language  →  front-end analysis) and three back-ends (back-end synthesis  →  executable code for target machine), all connected through the shared semantic representation.

Page 21

Limitations of modular approach

•  performance
   –  generic vs specific
   –  loss of information

•  variations must be small
   –  same programming paradigm
   –  similar processor architecture

Diagram (as on the previous slide): two front-ends and three back-ends connected through one semantic representation.

Page 22

Front-end and Back-end

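•  Suppose you want to write compilers for 3 source languages targeting 4 computer platforms:

   Source languages: C++, Java, FORTRAN
   Target platforms: MIPS, SPARC, Pentium, PowerPC

We need to write 3 × 4 = 12 programs, one compiler for each language/platform pair.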

Page 23

Front-end and Back-end

•  But we can do it better:

   C++, Java, FORTRAN  →  FE (one per language)  →  IR  →  BE (one per platform)  →  MIPS, SPARC, Pentium, PowerPC

   –  IR: Intermediate Representation
   –  FE: Front-End
   –  BE: Back-End

We only need to write 7 programs: 3 front-ends + 4 back-ends.

Page 24

Front-end and Back-end

•  Suppose you want to write compilers from m source languages to n computer platforms. A naïve solution requires n*m programs:

   C++, Java, FORTRAN  →  one compiler per pair  →  MIPS, SPARC, Pentium, PowerPC

•  but we can do it with n+m programs:

   C++, Java, FORTRAN  →  FE (m front-ends)  →  IR  →  BE (n back-ends)  →  MIPS, SPARC, Pentium, PowerPC

   –  IR: Intermediate Representation
   –  FE: Front-End
   –  BE: Back-End

Page 25

Compiler Example

position=initial+rate*60

  ↓ compiler

MOV id3, R2
MUL #60.0, R2
MOV id2, R1
ADD R2, R1
MOV R1, id1

Page 26

Example

position := initial + rate * 60

  ↓ Scanner

id1 := id2 + id3 * 60

  ↓ Parser

:=
  id1
  +
    id2
    *
      id3
      60

  ↓ Semantic Analyzer

:=
  id1
  +
    id2
    *
      id3
      int-to-real
        60

  ↓ Intermediate Code Generator

temp1 := int-to-real (60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3

  ↓ Code Optimizer

temp1 := id3 * 60.0
id1 := id2 + temp1

  ↓ Code Generator

MOV id3, R2
MUL #60.0, R2
MOV id2, R1
ADD R2, R1
MOV R1, id1
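The intermediate-code and optimization steps of this example can be sketched as two small passes. This is a hypothetical illustration, not the lecture's implementation: the annotated AST is written as nested tuples, three-address instructions as (dest, op, arg1, arg2) tuples, and the optimizer performs only the rewrites visible in the example (folding int-to-real of a constant, and merging the final copy into the instruction that computed its value) before renumbering the temporaries.

```python
import itertools

# Annotated AST for  position := initial + rate * 60  after semantic analysis.
AST = (":=", "id1", ("+", "id2", ("*", "id3", ("int-to-real", 60))))

def gen_three_address(ast):
    """Intermediate code generator: post-order walk, one new temp per operator node."""
    code = []                            # (dest, op, arg1, arg2); arg2 is None for unary ops
    temps = itertools.count(1)

    def emit(node):
        if not isinstance(node, tuple):
            return node                  # identifiers and constants are used directly
        op, *args = node
        arg1 = emit(args[0])
        arg2 = emit(args[1]) if len(args) > 1 else None
        dest = f"temp{next(temps)}"      # allocated after the operands, as in the slide
        code.append((dest, op, arg1, arg2))
        return dest

    _, target, value = ast               # the root is an assignment
    code.append((target, ":=", emit(value), None))
    return code

def optimize(code):
    # 1. Constant folding: int-to-real of an integer constant becomes a real constant.
    consts, folded = {}, []
    for dest, op, a, b in code:
        a, b = consts.get(a, a), consts.get(b, b)
        if op == "int-to-real" and isinstance(a, int):
            consts[dest] = float(a)
            continue
        folded.append((dest, op, a, b))
    # 2. Copy elimination: "x := t" right after t's definition becomes one instruction
    #    (safe here because each temp generated from a tree is used exactly once).
    merged = []
    for dest, op, a, b in folded:
        if op == ":=" and merged and merged[-1][0] == a and str(a).startswith("temp"):
            merged.append((dest,) + merged.pop()[1:])
        else:
            merged.append((dest, op, a, b))
    # 3. Renumber the surviving temporaries temp1, temp2, ...
    names, result = {}, []
    for dest, op, a, b in merged:
        a, b = names.get(a, a), names.get(b, b)
        if dest.startswith("temp"):
            names[dest] = f"temp{len(names) + 1}"
        result.append((names.get(dest, dest), op, a, b))
    return result

def fmt(instr):
    dest, op, a, b = instr
    if op == ":=":
        return f"{dest} := {a}"
    if b is None:
        return f"{dest} := {op}({a})"
    return f"{dest} := {a} {op} {b}"

ic = gen_three_address(AST)
print("\n".join(fmt(i) for i in ic))
print("\n".join(fmt(i) for i in optimize(ic)))
```

A production optimizer would use general data-flow analysis rather than these pattern-specific rewrites; they are kept narrow here so each one maps to a visible change between the two code listings.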

Page 27

Regular Expressions

Symbol: a   A regular expression formed by the symbol a.

Alternation: M | N   A regular expression formed by M or N.

Concatenation: (M • N)   A regular expression formed by M followed by N.

Repetition: (M*)   A regular expression formed by zero or more repetitions of M.

Empty set: Φ   A regular expression denoting the empty set.

Lambda: λ   A regular expression denoting the empty string.

A regular expression is built up out of simpler regular expressions using a set of defining rules.
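The six defining rules map directly onto a small data type. The sketch below represents regular expressions as Python dataclasses (the class names are hypothetical) and decides membership in L(r) with Brzozowski derivatives. That technique is not covered in these slides; it is used here only because it follows the defining rules case by case. The derivative of r with respect to a symbol c denotes the strings w such that cw is in L(r).

```python
from dataclasses import dataclass

class Regex:
    """Base class for the six kinds of regular expression."""

@dataclass(frozen=True)
class EmptySet(Regex):      # Φ: denotes no strings at all
    pass

@dataclass(frozen=True)
class Lambda(Regex):        # λ: denotes only the empty string
    pass

@dataclass(frozen=True)
class Sym(Regex):           # a single symbol
    ch: str

@dataclass(frozen=True)
class Alt(Regex):           # M | N
    left: Regex
    right: Regex

@dataclass(frozen=True)
class Cat(Regex):           # M • N
    left: Regex
    right: Regex

@dataclass(frozen=True)
class Star(Regex):          # M*
    inner: Regex

def nullable(r: Regex) -> bool:
    """True when the empty string λ belongs to L(r)."""
    if isinstance(r, (Lambda, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    if isinstance(r, Cat):
        return nullable(r.left) and nullable(r.right)
    return False                       # EmptySet and Sym

def alt(m: Regex, n: Regex) -> Regex:
    # simplify Φ | N = N and M | Φ = M to keep derivatives small
    if isinstance(m, EmptySet):
        return n
    if isinstance(n, EmptySet):
        return m
    return Alt(m, n)

def cat(m: Regex, n: Regex) -> Regex:
    # simplify Φ • N = M • Φ = Φ and λ • N = N
    if isinstance(m, EmptySet) or isinstance(n, EmptySet):
        return EmptySet()
    if isinstance(m, Lambda):
        return n
    return Cat(m, n)

def deriv(r: Regex, c: str) -> Regex:
    """Regex for { w | c followed by w is in L(r) }."""
    if isinstance(r, Sym):
        return Lambda() if r.ch == c else EmptySet()
    if isinstance(r, Alt):
        return alt(deriv(r.left, c), deriv(r.right, c))
    if isinstance(r, Cat):
        first = cat(deriv(r.left, c), r.right)
        return alt(first, deriv(r.right, c)) if nullable(r.left) else first
    if isinstance(r, Star):
        return cat(deriv(r.inner, c), r)
    return EmptySet()                  # EmptySet and Lambda

def matches(r: Regex, s: str) -> bool:
    """s is in L(r) iff the derivative by all of s's symbols is nullable."""
    for c in s:
        r = deriv(r, c)
    return nullable(r)

# a|b*c, i.e. (a)|((b)*(c))
r = Alt(Sym("a"), Cat(Star(Sym("b")), Sym("c")))
print([w for w in ["a", "c", "bbc", "ab", ""] if matches(r, w)])   # ['a', 'c', 'bbc']
```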

Page 28

Regular Expressions

Example: (a)|((b)*(c)) can be written as: a|b*c.

Language: the language denoted by a regular expression r is written L(r).

Operator precedence: () > * > • > |

These precedence rules let us omit redundant parentheses, as in the example above.

Regular expressions allow us to define the tokens of programming languages, such as identifiers and numbers.

Page 29

Regular Expressions

Examples:

1. a* is a regular expression that denotes the set {λ, a, aa, …}

2. a|b is a regular expression that denotes the set {a} ∪ {b}

3. a*|b is a regular expression that denotes the set {λ, a, aa, …} ∪ {b}

4. a*b is a regular expression that denotes the set {b, ab, aab, …}
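The four example sets can be checked mechanically by listing every string up to a small length that a regular expression accepts. The sketch below uses Python's `re` module, where λ is simply the empty string and the usual precedence applies (* binds tighter than concatenation, which binds tighter than |); the helper name `language_up_to` is hypothetical.

```python
import itertools
import re

def language_up_to(pattern, alphabet, max_len):
    """All strings over alphabet of length <= max_len that fully match pattern."""
    compiled = re.compile(pattern)
    words = []
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            word = "".join(chars)
            if compiled.fullmatch(word):
                words.append(word)
    return words

for rx in ["a*", "a|b", "a*|b", "a*b"]:
    print(rx, language_up_to(rx, "ab", 3))
# a* ['', 'a', 'aa', 'aaa']
# a|b ['a', 'b']
# a*|b ['', 'a', 'b', 'aa', 'aaa']
# a*b ['b', 'ab', 'aab']
```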

Page 30

Match and Create the Regular Expressions

1.  0(0|1)*0

2.  ((λ|0)1*)*

3.  ((0|1)0(0|1))*

•  All strings of 0’s and 1’s that do not contain the substring 011

a.  000000
b.  01010
c.  010101
d.  101010
e.  001100

Page 34

Match and Create the Regular Expressions

1.  0(0|1)*0

2.  ((λ|0)1*)*

3.  ((0|1)0(0|1))*

•  All strings of 0’s and 1’s that do not contain the substring 011
   –  1*(0|01)*

a.  000000
b.  01010
c.  010101
d.  101010
e.  001100
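Answers to this exercise can be checked by machine. The sketch below (hypothetical helper names) translates the exercise regexes into Python syntax, writing (λ|0) as 0?, and reports which of the strings a–e each one matches completely. It then compares the candidate regex 1*(0|01)* for the "no 011" language against a brute-force test of every binary string up to length 10.

```python
import itertools
import re

# The exercise regexes in Python syntax; (λ|0) is written 0?.
PATTERNS = {
    "1": r"0(0|1)*0",
    "2": r"(0?1*)*",
    "3": r"((0|1)0(0|1))*",
    "no 011": r"1*(0|01)*",
}
STRINGS = {"a": "000000", "b": "01010", "c": "010101", "d": "101010", "e": "001100"}

def match_table():
    """For each pattern, the labels of the test strings it matches completely."""
    return {
        name: [label for label, s in STRINGS.items() if re.fullmatch(rx, s)]
        for name, rx in PATTERNS.items()
    }

def first_counterexample(pattern, max_len=10):
    """Shortest binary string on which the regex disagrees with 'does not contain 011'."""
    rx = re.compile(pattern)
    for n in range(max_len + 1):
        for bits in itertools.product("01", repeat=n):
            s = "".join(bits)
            if bool(rx.fullmatch(s)) != ("011" not in s):
                return s
    return None

for name, labels in match_table().items():
    print(name, labels)
print(first_counterexample(PATTERNS["no 011"]))   # None: no disagreement found
```

A brute-force check up to a fixed length is evidence, not a proof; the argument that 1*(0|01)* is exact is that after the leading 1s, every 1 must directly follow a 0 and cannot be followed by another 1.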

Page 35

END

