Nondeterministic Finite Acceptor (NFA) · cs453/yr2011/Slides/...

Post on 05-Jul-2020


CS453 Lecture Lexical Analysis with JLex 1

Plan for Lexical Analysis with JLex

•  Overview of the MeggyJava Assignments

•  Lexer generators, show 15min example

•  Expressing tokens with regular expressions
   –  regular expression syntax for JLex
   –  using JLex with JavaCup

•  How do lexer generators work?
   –  Convert regular expressions to NFA
   –  Converting an NFA to DFA
   –  Implementing the DFA


Structure of the MeggyJava Compiler

[Figure: compiler pipeline. Analysis: character stream → lexical analysis → tokens (“words”) → syntactic analysis (“sentences”) → AST → semantic analysis → AST and symbol table. Synthesis: code gen → Atmel assembly code.]

•  PA2: MeggyJava and Atmel warmup
•  PA3: setPixel compiler
•  PA4: add control flow
•  PA5: add functions
•  PA6: add variables and objects
•  PA7: add arrays


Specifying Tokens with JLex

JLex example input file:

    package mjparser;
    import java_cup.runtime.Symbol;

    %%
    %line
    %char
    %cup
    %public

    %eofval{
      return new Symbol(sym.EOF, new TokenValue("EOF", yyline, yychar));
    %eofval}

    LETTER=[A-Za-z]
    DIGIT=[0-9]
    UNDERSCORE="_"
    LETT_DIG_UND={LETTER}|{DIGIT}|{UNDERSCORE}
    ID={LETTER}({LETT_DIG_UND})*

    %%
    "&&" {return new Symbol(sym.AND, new TokenValue(yytext(), yyline, yychar)); }
    "+" {return new Symbol(sym.PLUS, ...); }
    "if" {return new Symbol(sym.IF,...); }
    {ID} {return new Symbol(sym.ID, new ...
    {EOL} { /* reset yychar */ … }
    {WS} { /* ignore */ }
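To get a feel for which strings the ID macro describes, the same pattern can be tried out with java.util.regex. This is only a sketch: it uses Java's regex engine rather than JLex, and the class name IdMacroCheck and the sample strings are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of JLex: mirrors the LETTER, DIGIT,
// UNDERSCORE, LETT_DIG_UND, and ID macros in java.util.regex syntax.
class IdMacroCheck {
    static final String LETTER = "[A-Za-z]";
    static final String DIGIT = "[0-9]";
    static final String UNDERSCORE = "_";
    static final String LETT_DIG_UND =
        "(?:" + LETTER + "|" + DIGIT + "|" + UNDERSCORE + ")";
    static final Pattern ID = Pattern.compile(LETTER + LETT_DIG_UND + "*");

    // True when the whole string is a single ID lexeme.
    static boolean isId(String s) {
        return ID.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"x", "my_var2", "2fast", "_tmp", "if"}) {
            System.out.println(s + " -> " + isId(s));
        }
    }
}
```

Note that "if" also matches the ID pattern. In the JLex file the "if" rule is listed before {ID}, and JLex breaks ties between equally long matches in favor of the earlier rule, so the lexer should still return sym.IF for it.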

Nondeterministic Finite Acceptor (NFA)

Alphabet = {a}

[Figure: NFA with states q0, q1, q2, q3 and three transitions, each labeled a. On input a there are two choices from q0.]

First Choice

Input: a a

[Figure sequence: following the first choice, the NFA reads both symbols, moving from q0 to q1 and then to q2.]

All input is consumed: “accept”

Second Choice

Input: a a

[Figure sequence: following the second choice, the NFA reads the first a and moves from q0 to q3.]

No transition: the automaton hangs. The input cannot be consumed.

Should we reject aa?

When To Accept a String

An NFA accepts a string when there is a computation of the NFA that accepts the string: all the input is consumed AND the automaton is in a final state.
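The acceptance rule can be sketched as a search over the choices: try each one, and accept as soon as some computation consumes all the input and stops in a final state. The code below is a minimal illustration, not part of the lecture. The transition table (q0 to q1 and q0 to q3 on a, q1 to q2 on a, with q2 final) is reconstructed from the walkthrough and should be read as an assumption, and the class name NfaAccept is invented.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of the acceptance rule. Assumed transitions, reconstructed
// from the slides: q0 -a-> q1, q0 -a-> q3, q1 -a-> q2; q2 is final.
class NfaAccept {
    // Key is state number followed by the input symbol, e.g. "0a".
    static final Map<String, List<Integer>> DELTA = Map.of(
        "0a", List.of(1, 3),
        "1a", List.of(2));
    static final Set<Integer> FINAL = Set.of(2);

    // Tries every choice; succeeds if some computation consumes all of
    // the input AND ends in a final state.
    static boolean accepts(int state, String input, int pos) {
        if (pos == input.length()) {
            return FINAL.contains(state);
        }
        String key = state + "" + input.charAt(pos);
        for (int next : DELTA.getOrDefault(key, List.of())) {
            if (accepts(next, input, pos + 1)) {
                return true;
            }
        }
        // No transition (the automaton hangs) or every choice failed.
        return false;
    }

    static boolean accepts(String input) {
        return accepts(0, input, 0);
    }

    public static void main(String[] args) {
        System.out.println("aa accepted? " + accepts("aa")); // prints "aa accepted? true"
    }
}
```

A failed choice only makes the search move on to the next one; the string is rejected only after every choice has been tried.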

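Example

aa is accepted by the NFA:

[Figure: the first choice, q0 to q1 to q2, ends in “accept”; the second choice, q0 to q3, ends in “reject??”.]

aa is accepted because the first computation accepts aa. The “reject??” on the second computation only tells us that that choice didn’t work.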

Rejection example

Input: a

[Figure: the NFA in its start state q0.]

First Choice

[Figure sequence: following the first choice, the NFA reads a and moves from q0 to q1: “reject??”]

Second Choice

[Figure sequence: following the second choice, the NFA reads a and moves from q0 to q3: “reject??”]

An NFA rejects a string when there is no computation of the NFA that accepts the string. In every computation, either:

•  All the input is consumed and the automaton is in a non-final state

OR

•  The input cannot be consumed
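To make the two ways of failing concrete, the sketch below lists every computation of the NFA on an input together with its outcome. It is an illustration under the same assumption as the walkthrough (q0 to q1 and q0 to q3 on a, q1 to q2 on a, q2 final); the class name NfaTrace and the output wording are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Lists each computation and why it accepts or rejects. The transition
// table is an assumption reconstructed from the slides.
class NfaTrace {
    static final Map<String, List<Integer>> DELTA = Map.of(
        "0a", List.of(1, 3),
        "1a", List.of(2));
    static final Set<Integer> FINAL = Set.of(2);

    // Follows every choice and records the outcome of each computation.
    static void run(int state, String input, int pos, String path, List<String> out) {
        if (pos == input.length()) {
            out.add(path + (FINAL.contains(state)
                ? ": accept"
                : ": reject (non-final state)"));
            return;
        }
        List<Integer> nexts =
            DELTA.getOrDefault(state + "" + input.charAt(pos), List.of());
        if (nexts.isEmpty()) {
            out.add(path + ": reject (input cannot be consumed)");
            return;
        }
        for (int next : nexts) {
            run(next, input, pos + 1, path + " -> q" + next, out);
        }
    }

    static List<String> computations(String input) {
        List<String> out = new ArrayList<>();
        run(0, input, 0, "q0", out);
        return out;
    }

    // The NFA rejects exactly when no computation accepts.
    static boolean rejects(String input) {
        return computations(input).stream().noneMatch(c -> c.endsWith(": accept"));
    }

    public static void main(String[] args) {
        computations("a").forEach(System.out::println);
        System.out.println("a rejected? " + rejects("a"));
    }
}
```

For input a, both computations stop in a non-final state, so the string is rejected. For input aa, one computation accepts and the other gets stuck in q3, so the string is accepted.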

Example

a is rejected by the NFA:

[Figure: both computations on input a, one ending in q1 and one ending in q3, are marked “reject??”.]

All possible computations lead to rejection.

[Figure: the NFA with states q0, q1, q2, q3.]

Language accepted: L = {aa}
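One way to check the accepted language is to follow all computations at once by tracking the set of states the NFA could be in after each symbol, then testing every short string over {a}. The sketch below assumes the same transition table as the walkthrough (q0 to q1 and q0 to q3 on a, q1 to q2 on a, q2 final); the class name NfaLanguage and the length bound are arbitrary choices for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Simulates the NFA with a set of current states. The transition table
// is an assumption reconstructed from the slides.
class NfaLanguage {
    static final Map<String, List<Integer>> DELTA = Map.of(
        "0a", List.of(1, 3),
        "1a", List.of(2));
    static final Set<Integer> FINAL = Set.of(2);

    // Tracks every state reachable after each symbol, so all
    // computations are followed together without backtracking.
    static boolean accepts(String input) {
        Set<Integer> current = new TreeSet<>(Set.of(0));
        for (char c : input.toCharArray()) {
            Set<Integer> next = new TreeSet<>();
            for (int s : current) {
                next.addAll(DELTA.getOrDefault(s + "" + c, List.of()));
            }
            current = next;
        }
        for (int s : current) {
            if (FINAL.contains(s)) {
                return true;
            }
        }
        return false;
    }

    // Accepted strings over {a} with length 0..maxLen.
    static List<String> acceptedUpTo(int maxLen) {
        List<String> result = new ArrayList<>();
        for (int n = 0; n <= maxLen; n++) {
            String s = "a".repeat(n);
            if (accepts(s)) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(acceptedUpTo(6)); // prints [aa]
    }
}
```

Treating each set of NFA states as a single state is also the idea behind converting an NFA to a DFA.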


Specifying Tokens with JLex

JLex example input file:

    package mjparser;
    import java_cup.runtime.Symbol;

    %%
    %line
    %char
    %cup
    %public

    %eofval{
      return new Symbol(sym.EOF, new TokenValue("EOF", yyline, yychar));
    %eofval}

    LETTER=[A-Za-z]
    DIGIT=[0-9]
    UNDERSCORE="_"
    EOL=(\n|\r|\r\n)
    LETT_DIG_UND={LETTER}|{DIGIT}|{UNDERSCORE}
    ID={LETTER}({LETT_DIG_UND})*

    %%
    "&&" {return new Symbol(sym.AND, new TokenValue(yytext(), yyline, yychar)); }
    "+" {return new Symbol(sym.PLUS, ...); }
    "if" {return new Symbol(sym.IF,...); }
    {ID} {return new Symbol(sym.ID, new ...
    {EOL} { /* reset yychar */ … }
    {WS} { /* ignore */ }


Example NFA for Multiple Tokens


DFA from IF and ID NFAs
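The slide's diagram is not reproduced in this transcript, so the following is only a guess at what a combined DFA for the "if" keyword and the ID pattern might look like when coded by hand. The state numbering, the class name IfIdDfa, and the choice to classify whole strings (rather than scan a stream with longest match) are all assumptions made for illustration.

```java
// Hand-coded DFA sketch. Assumed states: 0 = start, 1 = after "i",
// 2 = after "if", 3 = any other identifier, -1 = dead state.
class IfIdDfa {
    static boolean isLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // Matches LETT_DIG_UND: a letter, a digit, or an underscore.
    static boolean isIdChar(char c) {
        return isLetter(c) || (c >= '0' && c <= '9') || c == '_';
    }

    // One DFA transition.
    static int step(int state, char c) {
        switch (state) {
            case 0:
                if (c == 'i') return 1;
                return isLetter(c) ? 3 : -1;
            case 1:
                if (c == 'f') return 2;
                return isIdChar(c) ? 3 : -1;
            case 2:
            case 3:
                return isIdChar(c) ? 3 : -1;
            default:
                return -1;
        }
    }

    // Token name for the whole string, or null if it is neither IF nor ID.
    static String classify(String s) {
        int state = 0;
        for (int i = 0; i < s.length() && state != -1; i++) {
            state = step(state, s.charAt(i));
        }
        switch (state) {
            case 2:
                return "IF"; // "if" also matches ID; the IF rule takes priority
            case 1:
            case 3:
                return "ID";
            default:
                return null;
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"if", "i", "iffy", "x_1", "9x"}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

State 2 is where the IF and ID patterns overlap. Labeling it IF mirrors the rule order in the JLex file, where "if" is listed before {ID}.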