Homework 2: Parser and Lexer

Remi Meier
Compiler Design – 08.10.2015

Compiler phases

[Figure: the compiler translates Javali into x86 assembly. Pipeline: Front-end → IR → Optimizations → IR → Back-end, divided into a machine-independent and a machine-dependent part. The front-end comprises Lexical Analysis, Syntactic Analysis (producing the AST), and Semantic Analysis.]

Homework 2

Text:

    class Main {
      void main() {
        write(222);
        writeln();
      }
    }

Abstract Syntax Tree:

    Class
      Method
        Seq
          Write
            IntConst
          WriteLn

How do we…
• check if a program follows the syntax of Javali?
• extract meaning / structure?

Homework 2

Lexer → Token Stream → Parser → Parse Tree → Javali AST

• Part 2a: lexer and parser (up to the parse tree)
• Part 2b: parse tree to Javali AST

Lexical Analysis

Text → Lexer → Token Stream → Parser → Parse Tree / AST

Lexer
• Read input character by character
• Recognize character groups → tokens

Token
• Sequence of characters with a collective meaning → grammar terminals
• E.g. constants, identifiers, keywords, …

Lexical Analysis

    ID : [a-zA-Z]+ ;
    NUM : [0-9]+ ;
    MISC : [{()};] ;
    WS : ('\n'|' ') -> skip ;

Input:

    class Main {
      void main() {
        write(222);
        writeln();
      }
    }

Token stream:

    ID: class  ID: Main  MISC: {  ID: void  ID: main  MISC: (  MISC: )  …
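The four lexer rules above can be imitated by hand to see where the token stream comes from. The following is a minimal sketch in plain Java, not ANTLR output: the class name `TinyLexer` and the `"KIND: text"` string format are assumptions made for illustration. It relies on the four rules never competing for the same first character, so a simple regular-expression alternation is enough here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TinyLexer {
    // One named group per lexer rule, in the order of the grammar:
    // ID, NUM, MISC, WS.
    private static final Pattern RULES = Pattern.compile(
        "(?<ID>[a-zA-Z]+)|(?<NUM>[0-9]+)|(?<MISC>[{()};])|(?<WS>[\\n ])");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = RULES.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            // Try to match a token starting exactly at the current position.
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("unexpected character at offset " + pos);
            }
            if (m.group("ID") != null) {
                tokens.add("ID: " + m.group());
            } else if (m.group("NUM") != null) {
                tokens.add("NUM: " + m.group());
            } else if (m.group("MISC") != null) {
                tokens.add("MISC: " + m.group());
            }
            // WS is matched but produces no token ("skip").
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        String program = "class Main {\n void main() {\n write(222);\n writeln();\n }\n}";
        System.out.println(String.join("  ", tokenize(program)));
    }
}
```

Note that keywords such as `class` and `void` come out as `ID` tokens, exactly as in the token stream on the slide, because this small grammar has no separate keyword rules.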

Syntactic Analysis

Text → Lexer → Token Stream → Parser → Parse Tree / AST

Parser
• Check if token stream follows the grammar
• Group tokens hierarchically (extract structure) → Parse Tree / Abstract Syntax Tree

TOP-DOWN PARSER

Top-down parsers

Input: return a + b ;

Grammar in Extended Backus-Naur Form (EBNF):

    statement : return
              | assign
    return    : 'return' expr ';'
    assign    : ID '=' expr ';'
    expr      : ID '+' ID

Parse tree:

    statement
      return
        'return'
        expr
          'a' '+' 'b'
        ';'

Implementation

Grammar in Extended Backus-Naur Form (EBNF):

    statement : return
              | assign
    return    : 'return' expr ';'
    assign    : ID '=' expr ';'
    expr      : ID '+' ID

    void statement() {
      return();
      assign();
    }

    void return() {
      match('return');
      expr();
      match(';');
    }

    void expr() {
      match(ID);
      match('+');
      match(ID);
    }

How to deal with alternatives?

Lookahead

Grammar in Extended Backus-Naur Form (EBNF):

    statement : return
              | assign
    return    : 'return' expr ';'
    assign    : ID '=' expr ';'
    expr      : ID '+' ID

    void statement() {
      if (next() is 'return') {
        return();
      } else if (next() is ID) {
        assign();
      }
    }

LL(1)
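To make the lookahead idea concrete, here is a small runnable sketch of a recursive-descent recognizer for the four-rule grammar, deciding between `return` and `assign` with a single token of lookahead. It is an illustration under assumptions, not the course's reference code: the class `Ll1Parser`, the helper `accepts`, and the whitespace-separated input format are all made up for this example, and `returnStatement` is renamed because `return` is a Java keyword.

```java
import java.util.Arrays;
import java.util.List;

final class Ll1Parser {
    private final List<String> tokens;
    private int pos = 0;

    private Ll1Parser(String input) {
        // Assumption: tokens are separated by whitespace, so no separate lexer is needed.
        tokens = Arrays.asList(input.trim().split("\\s+"));
    }

    // The single token of lookahead; "" stands for end of input.
    private String next() {
        return pos < tokens.size() ? tokens.get(pos) : "";
    }

    private static boolean isId(String token) {
        return token.matches("[a-zA-Z]+") && !token.equals("return");
    }

    private void match(String expected) {
        if (!next().equals(expected)) {
            throw new IllegalStateException("expected '" + expected + "' but found '" + next() + "'");
        }
        pos++;
    }

    private void matchId() {
        if (!isId(next())) {
            throw new IllegalStateException("expected ID but found '" + next() + "'");
        }
        pos++;
    }

    // statement : return | assign   (alternative chosen by one token of lookahead)
    private void statement() {
        if (next().equals("return")) {
            returnStatement();
        } else if (isId(next())) {
            assign();
        } else {
            throw new IllegalStateException("no alternative for '" + next() + "'");
        }
    }

    // return : 'return' expr ';'
    private void returnStatement() { match("return"); expr(); match(";"); }

    // assign : ID '=' expr ';'
    private void assign() { matchId(); match("="); expr(); match(";"); }

    // expr : ID '+' ID
    private void expr() { matchId(); match("+"); matchId(); }

    // Returns true if the whole input is exactly one statement.
    static boolean accepts(String input) {
        Ll1Parser parser = new Ll1Parser(input);
        try {
            parser.statement();
        } catch (IllegalStateException e) {
            return false;
        }
        return parser.pos == parser.tokens.size();
    }

    public static void main(String[] args) {
        System.out.println(accepts("return a + b ;"));
        System.out.println(accepts("x = a + b ;"));
        System.out.println(accepts("return a + ;"));
    }
}
```

One token suffices here because the two alternatives of `statement` start with different tokens (`'return'` versus `ID`).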

http://www.antlr4.org/ (or HW2 fragment)

ANTLR

Top-down parser generator
● ALL(*): adaptive, arbitrary lookahead
● handles any non-left-recursive context-free grammar

Token specifications + Grammar → ANTLR → MyLexer.java, MyParser.java

ANTLR – Grammar description

    /* This is an example */
    grammar Example;

    /* Parser rules = Non-terminals */
    program : statement* EOF ;

    statement : assignment ';'
              | expression ';' ;

    /* Lexer rules = Terminals */
    Identifier : Letter (Letter | Digit)* ;
    Letter : '\u0024' | '\u0041'..'\u005a';

• Lower-case initial: Parser
• Upper-case initial: Lexer
• Literals → Tokens
• Start rule matching end-of-file

ANTLR – Operators

Extended Backus-Naur Form (EBNF)

    program : statement* EOF;

    statement : assignment ';'
              | expression ';' ;

    method : type name '(' params? ')' ;

EBNF operators

    x | y | z     (ordered) alternative
    x?            at most once (optional)
    x*            0 .. n times
    x+            1 .. n times
    [charset]     one of the chars, e.g.: [a-zA-Z] (lexer-only)
    'x'..'y'      characters in range (lexer-only)

Demo 1

ANTLR – Troubleshooting

ANTLR does not warn about ambiguous rules
● resolves ambiguity at runtime → requires lots of testing

ANTLR does not handle indirect left-recursion
● direct left-recursion supported

ANTLR – Lexer ambiguity

What if some input is matched by multiple lexer rules?

    parserRule : 'enum' parserRule ;

    fragment Letter : [a-z] ;

    Identifier : Letter+ ;

• 'enum' in parserRule creates an implicit lexer rule T123 : 'enum'
• fragment enforces that the rule never produces a token, but can be used in other lexer rules (e.g., a)
• Identifier can never match enum, but e.g., enums

Lexer decides based on:
1. rule with the longest match first
2. literal tokens before all regular Lexer rules
3. document order
4. fragment rules never match on their own
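The decision procedure can be sketched in a few lines of plain Java. This is a hand-written illustration of "longest match first, then priority order", not ANTLR's implementation: the class `LongestMatchLexer`, the rule names used as map keys, and the skipping of spaces are assumptions. The fragment rule `Letter` is inlined into `Identifier` and is not registered as a rule of its own, so it can never produce a token.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LongestMatchLexer {
    // Insertion order models priority: literal tokens first,
    // then regular lexer rules in document order.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("'enum'", Pattern.compile("enum"));        // implicit rule for the literal
        RULES.put("Identifier", Pattern.compile("[a-z]+"));  // Letter+ with Letter inlined
    }

    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (input.charAt(pos) == ' ') { // assumption: spaces separate tokens and are skipped
                pos++;
                continue;
            }
            String bestRule = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input).region(pos, input.length());
                // Strictly longer matches win; on a tie the earlier rule is kept.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestRule = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestRule == null) {
                throw new IllegalArgumentException("no rule matches at offset " + pos);
            }
            out.add(bestRule + ":" + input.substring(pos, bestEnd));
            pos = bestEnd;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("enum enums"));
    }
}
```

For the input `enum` both rules match four characters, so the literal wins by priority; for `enums` the `Identifier` rule matches five characters and wins by length.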

ANTLR – Parser ambiguity

    stmt: 'if' expr 'then' stmt 'else' stmt    (1)
        | 'if' expr 'then' stmt                (2)
        | ID '=' expr ;

Input: if a then if c then d else e

[Figure: two parse trees for the input, one built from alternatives (1), (2) and one from (2), (1).]

Ambiguous since there exists more than one parse tree for the same input.

ANTLR – Parser ambiguity

At decision points, if more than one alternative matches a given input, follow document order.

ANTLR – Parser ambiguity

At decision points, if more than one alternative matches a given input, follow document order.

Solution

Original:

    stmt: 'if' expr 'then' stmt 'else' stmt    (1)
        | 'if' expr 'then' stmt                (2)
        | ID '=' expr ;

Reordered:

    stmt: 'if' expr 'then' stmt
        | 'if' expr 'then' stmt 'else' stmt
        | ID '=' expr ;

ANTLR – Parser ambiguity

At decision points, if more than one alternative matches a given input, follow document order.

Alternative solution:

    stmt: 'if' expr 'then' stmt ('else' stmt)?
        | ID '=' expr ;

Sub-rules introduce additional decision points: (…)? → (… | )
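In a hand-written recursive-descent parser, the optional sub-rule `('else' stmt)?` becomes a single `if` on the next token. The sketch below shows how that one extra decision point attaches each `else` to the nearest open `if`. It is an illustration only and makes no claim about ANTLR's internal prediction algorithm. Assumptions: the class `DanglingElse` is hypothetical, tokens are whitespace-separated, `expr` is reduced to a single identifier, and a bare identifier stands in for a simple statement so that the slide's input can be parsed.

```java
final class DanglingElse {
    private final String[] tokens;
    private int pos = 0;

    private DanglingElse(String input) {
        tokens = input.trim().split("\\s+"); // assumption: whitespace-separated tokens
    }

    // One token of lookahead; "" stands for end of input.
    private String peek() {
        return pos < tokens.length ? tokens[pos] : "";
    }

    private void expect(String keyword) {
        if (!peek().equals(keyword)) {
            throw new IllegalStateException("expected '" + keyword + "' but found '" + peek() + "'");
        }
        pos++;
    }

    private String id() {
        String t = peek();
        if (!t.matches("[a-z]+") || t.equals("if") || t.equals("then") || t.equals("else")) {
            throw new IllegalStateException("expected ID but found '" + t + "'");
        }
        pos++;
        return t;
    }

    // stmt : 'if' expr 'then' stmt ('else' stmt)? | ID   (simplified, see assumptions)
    private String stmt() {
        if (!peek().equals("if")) {
            return id();
        }
        expect("if");
        String condition = id();
        expect("then");
        String result = "if " + condition + " then {" + stmt() + "}";
        // Decision point of the optional sub-rule: take the 'else' if it is next.
        if (peek().equals("else")) {
            expect("else");
            result += " else {" + stmt() + "}";
        }
        return result;
    }

    // Returns the input with braces added to show how the statements nest.
    static String parse(String input) {
        DanglingElse parser = new DanglingElse(input);
        String result = parser.stmt();
        if (parser.pos != parser.tokens.length) {
            throw new IllegalStateException("trailing input at token " + parser.pos);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("if a then if c then d else e"));
    }
}
```

Because the inner call to `stmt()` reaches the `else` check before the outer call does, the inner `if` claims the `else`.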

ANTLR – Left-recursion

Input: "a, b, c"

Without left-recursion:

    list : LETTER (',' LETTER)*;

Direct:

    list : list ',' LETTER
         | LETTER ;

Indirect:

    list : LETTER
         | longlist ;

    longlist : list ',' LETTER;
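The non-left-recursive form `list : LETTER (',' LETTER)*` maps directly onto a loop, which is why it poses no problem for a top-down parser. A naive recursive-descent method for `list : list ',' LETTER` would instead call itself before consuming any input and never terminate. The following sketch shows the loop version; the class `LetterList` and the choice to ignore spaces are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class LetterList {
    // list : LETTER (',' LETTER)* ;
    static List<Character> parse(String input) {
        String s = input.replace(" ", ""); // assumption: spaces carry no meaning
        List<Character> letters = new ArrayList<>();
        int pos = letter(s, 0, letters);
        // The (...)* sub-rule becomes a loop: repeat while the next char is ','.
        while (pos < s.length() && s.charAt(pos) == ',') {
            pos = letter(s, pos + 1, letters);
        }
        if (pos != s.length()) {
            throw new IllegalArgumentException("unexpected input at offset " + pos);
        }
        return letters;
    }

    // Matches one LETTER at pos, records it, and returns the position after it.
    private static int letter(String s, int pos, List<Character> out) {
        if (pos >= s.length() || !Character.isLetter(s.charAt(pos))) {
            throw new IllegalArgumentException("LETTER expected at offset " + pos);
        }
        out.add(s.charAt(pos));
        return pos + 1;
    }

    public static void main(String[] args) {
        System.out.println(parse("a, b, c"));
    }
}
```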

ANTLR – Direct left-recursion

    exp: exp '*' exp
       | exp '+' exp
       | ID ;

ANTLR rewrites the rule into a grammar that implicitly assigns priorities to alternatives in document order.

[Figure: parse trees for the inputs "a + a * a" and "a * a + a * a".]

https://theantlrguy.atlassian.net/wiki/display/ANTLR4/Left-recursive+rules
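The effect of priorities in document order can be reproduced by hand with precedence climbing, a standard technique for parsing binary operators without left recursion. The sketch below is loosely in the spirit of such a rewrite and is not the code ANTLR generates. Assumptions: the class `PrecedenceClimbing` is hypothetical, tokens are whitespace-separated, both operators are treated as left-associative, and the result is the input with parentheses added to show the grouping.

```java
final class PrecedenceClimbing {
    private final String[] tokens;
    private int pos = 0;

    private PrecedenceClimbing(String input) {
        tokens = input.trim().split("\\s+"); // assumption: whitespace-separated tokens
    }

    // Earlier alternative = higher priority: '*' is listed before '+'.
    private static int priority(String token) {
        if (token.equals("*")) return 2;
        if (token.equals("+")) return 1;
        return 0; // not an operator
    }

    private String id() {
        if (pos >= tokens.length || !tokens[pos].matches("[a-zA-Z]+")) {
            throw new IllegalStateException("ID expected at token " + pos);
        }
        return tokens[pos++];
    }

    // Parses an expression whose operators all have priority >= minPriority.
    private String exp(int minPriority) {
        String left = id();
        while (pos < tokens.length && priority(tokens[pos]) >= minPriority) {
            String op = tokens[pos++];
            // The right operand may only contain operators that bind tighter
            // than op, which makes op left-associative.
            String right = exp(priority(op) + 1);
            left = "(" + left + " " + op + " " + right + ")";
        }
        return left;
    }

    static String parse(String input) {
        PrecedenceClimbing parser = new PrecedenceClimbing(input);
        String result = parser.exp(1);
        if (parser.pos != parser.tokens.length) {
            throw new IllegalStateException("unexpected token at position " + parser.pos);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("a + a * a"));
        System.out.println(parse("a * a + a * a"));
    }
}
```

Swapping the two priorities in `priority` corresponds to swapping the first two alternatives in the grammar and would group `a + a * a` as `((a + a) * a)`.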

Demo 2

Homework

Lexer → Token Stream → Parser → Parse Tree → Javali AST

• Part 2a: Parser grammar: Javali.g4
• Part 2b: JavaliAstVisitor.java

Generated files

Files generated by ANTLR:

JavaliLexer/Parser.java
● the real thing

Javali(Base)Visitor.java
● base class for parse-tree visitor

Javali(Lexer).tokens
● token → number mapping for debugging

Generated visitor

Without labels (one method per rule):

    start : exp EOF ;

    exp : exp '*' exp
        | exp '+' exp
        | ID ;

With labels (one method per label / rule):

    start : exp EOF ;

    exp : exp '*' exp # MULT
        | exp '+' exp # ADD
        | ID # TERM ;

https://theantlrguy.atlassian.net/wiki/display/ANTLR4/Parser+Rules

Constructing the Javali AST

    start : exp EOF ;

    exp : exp '*' exp # MULT
        | exp '+' exp # ADD
        | ID # TERM ;

Input: "a * a + a * a"

Resulting AST (the original figure additionally numbers the nodes 1 to 7):

    BinaryOp(+)
      BinaryOp(*)
        Var('a')
        Var('a')
      BinaryOp(*)
        Var('a')
        Var('a')
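The step from parse tree to AST can be sketched without ANTLR by writing the visitor pattern out by hand. Everything in this example is a stand-in: `MultContext`, `AddContext`, `TermContext`, and `ExpVisitor` only imitate the kind of classes a parser generator would produce for the labels MULT, ADD, and TERM, and the fields of `Var` and `BinaryOp` are assumptions (the homework framework defines its own AST classes). The point is the shape: one visit method per labeled alternative, each returning the AST node for its subtree.

```java
final class AstSketch {
    // Parse-tree side: hypothetical stand-ins for generated context classes.
    interface ExpContext {
        <T> T accept(ExpVisitor<T> visitor);
    }

    static final class MultContext implements ExpContext {
        final ExpContext left, right;
        MultContext(ExpContext left, ExpContext right) { this.left = left; this.right = right; }
        public <T> T accept(ExpVisitor<T> visitor) { return visitor.visitMult(this); }
    }

    static final class AddContext implements ExpContext {
        final ExpContext left, right;
        AddContext(ExpContext left, ExpContext right) { this.left = left; this.right = right; }
        public <T> T accept(ExpVisitor<T> visitor) { return visitor.visitAdd(this); }
    }

    static final class TermContext implements ExpContext {
        final String id;
        TermContext(String id) { this.id = id; }
        public <T> T accept(ExpVisitor<T> visitor) { return visitor.visitTerm(this); }
    }

    // One visit method per labeled alternative (# MULT, # ADD, # TERM).
    interface ExpVisitor<T> {
        T visitMult(MultContext ctx);
        T visitAdd(AddContext ctx);
        T visitTerm(TermContext ctx);
    }

    // AST side: node names follow the slide, fields are assumptions.
    interface Ast { }

    static final class Var implements Ast {
        final String name;
        Var(String name) { this.name = name; }
        @Override public String toString() { return "Var('" + name + "')"; }
    }

    static final class BinaryOp implements Ast {
        final char op;
        final Ast left, right;
        BinaryOp(char op, Ast left, Ast right) { this.op = op; this.left = left; this.right = right; }
        @Override public String toString() { return "BinaryOp(" + op + ", " + left + ", " + right + ")"; }
    }

    // Builds the AST bottom-up: children are visited first, then combined.
    static final class AstBuilder implements ExpVisitor<Ast> {
        public Ast visitMult(MultContext ctx) {
            return new BinaryOp('*', ctx.left.accept(this), ctx.right.accept(this));
        }
        public Ast visitAdd(AddContext ctx) {
            return new BinaryOp('+', ctx.left.accept(this), ctx.right.accept(this));
        }
        public Ast visitTerm(TermContext ctx) {
            return new Var(ctx.id);
        }
    }

    // Hand-built parse tree for "a * a + a * a", converted to an AST string.
    static String example() {
        ExpContext a = new TermContext("a");
        ExpContext tree = new AddContext(new MultContext(a, a), new MultContext(a, a));
        return tree.accept(new AstBuilder()).toString();
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```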

Demo 3

Notes

• You are not allowed to use syntactic predicates.
• Look on our website for more material.
• Due date is October 22nd.