
Lexical Analysis - Portalscg.unibe.ch/download/lectures/cc/CC-02-Lexical.pdfLexical analysis 5 The...

Date post: 17-Aug-2020
Lexical Analysis Mohammad Ghafari Spring 2019

Lexical Analysis

Mohammad Ghafari, Spring 2019


What is a language?


The method of human communication, either spoken or written, consisting of the use of words in a structured and conventional way.


What is a programming language?


The means of communication with machines, often written in ASCII characters.


We need a “valid” language

Validity breaks down into syntax and semantics. The former is the arrangement of words, while the latter is the meaning of words.

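For example:
1. The dog the man walks.
2. The dog walks the man.
3. The man walks the dog.

Sentence 1 breaks the syntax, sentence 2 is syntactically correct but semantically odd, and sentence 3 is valid on both counts.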


Lexical analysis


The process of mapping sequences of characters to tokens in a particular language.

Input: x = x + y
Tokens: <ID, x> <EQ> <ID, x> <PLUS> <ID, y>

[Figure: source → Scanner → tokens → Parser, with an errors output]
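The mapping from characters to tokens can be sketched in a few lines of Python. This is only an illustration under assumptions: the token names ID, EQ, PLUS, and SKIP and the helper `tokenize` are hypothetical, and blanks are dropped as nontokens.

```python
import re

# Hypothetical token specification: (token name, regular expression).
TOKEN_SPEC = [
    ("ID", r"[a-z][a-z0-9]*"),
    ("EQ", r"="),
    ("PLUS", r"\+"),
    ("SKIP", r"[ \t\n]+"),  # blanks, tabs, and newlines are not tokens
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(source):
    """Map a character sequence to a list of (type, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            # The scanner reports characters that start no token as errors.
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("x = x + y"))
# [('ID', 'x'), ('EQ', '='), ('ID', 'x'), ('PLUS', '+'), ('ID', 'y')]
```

A parser would consume this token list instead of the raw characters.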


Typical token types


Nontokens are:
• comments,
• blanks, tabs, and newlines,
• etc.

NB. Each reserved word like if, void, return, etc. has a dedicated token.


Regular expressions

We use regular expressions to specify the lexical grammar of a language.

[Figure: Symbols → Strings → Language]

We can decide whether a string is in the language or not.


Notations

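If M and N are languages, then:
• M|N is alternation: a string in M or a string in N,
• MN is concatenation: a string in M followed by a string in N,
• M* is repetition: zero or more strings from M, concatenated.

Repetition binds tighter than concatenation, which binds tighter than alternation.

Useful extensions:
• [abc] means (a|b|c)
• [d-g] means [defg]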


Some examples

How about the following?

ab|c

(a|b)*

aa*bb*

a*(abb*)*(a|)
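One way to explore these expressions is to enumerate the short strings each one accepts. The sketch below uses Python's `re` module as a stand-in for the notation above; the helper `language` and its parameters are hypothetical names introduced for illustration.

```python
import re
from itertools import product

def language(regex, alphabet="abc", max_len=3):
    """List every string over `alphabet`, up to `max_len` symbols, that `regex` accepts."""
    pattern = re.compile(regex)
    accepted = []
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            s = "".join(symbols)
            # fullmatch asks whether the whole string is in the language.
            if pattern.fullmatch(s):
                accepted.append(s)
    return accepted

for rx in ["ab|c", "(a|b)*", "aa*bb*", "a*(abb*)*(a|)"]:
    print(rx, "->", language(rx, alphabet="ab" if "c" not in rx else "abc", max_len=2))
```

Raising `max_len` shows longer members of each language and makes the effect of operator precedence easier to see, for example that ab|c accepts "ab" and "c" but not "ac".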


Principle of longest match

Usually, the scanner should pick the longest possible string as the next token.

Input: return flag != if8;
Tokens: <RETURN> <ID, flag> <NEQ> <ID, if8> <SCOLON>

Here if8 is a single identifier, not the reserved word if followed by 8.
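A minimal sketch of the longest-match rule, assuming a hypothetical rule list in which reserved words are listed before identifiers so that ties in length go to the reserved word. The names `RULES` and `scan` are illustrative, not taken from the slides.

```python
import re

# Hypothetical token rules in priority order.
RULES = [
    ("RETURN", re.compile(r"return")),
    ("IF", re.compile(r"if")),
    ("NEQ", re.compile(r"!=")),
    ("SCOLON", re.compile(r";")),
    ("ID", re.compile(r"[a-z][a-z0-9]*")),
    ("WS", re.compile(r"[ \t\n]+")),
]

def scan(source):
    """Split `source` into (type, lexeme) pairs using the longest match at each position."""
    tokens, pos = [], 0
    while pos < len(source):
        best_name, best_len = None, 0
        for name, rx in RULES:
            m = rx.match(source, pos)
            # Strict '>' keeps the earlier (higher-priority) rule on ties.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise SyntaxError(f"no token matches at position {pos}")
        if best_name != "WS":
            tokens.append((best_name, source[pos:pos + best_len]))
        pos += best_len
    return tokens

print(scan("return flag != if8;"))
```

At the position of if8 both IF (2 characters) and ID (3 characters) match, and the longer ID wins; at return both RETURN and ID match 6 characters, and the rule order decides.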


Finite state automata

• A finite automaton has a finite set of states; edges lead from one state to another, and each edge is labeled with a symbol. One state is the start state, and certain of the states are distinguished as final states.

• Finite automata are recognizers; they simply say "yes" or "no" about each possible input string.

• They come in two flavors:
  – Nondeterministic finite automata (NFA)
  – Deterministic finite automata (DFA)

[Figure: example automaton with a start state, a final state, and edges labeled a, t, c]


Example

The regular expression [a-z][a-z0-9]* specifies an identifier.
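The identifier expression corresponds to a two-state automaton. The sketch below encodes it by hand; the state numbers and the helper names `step` and `is_identifier` are assumptions made for illustration.

```python
import string

LETTERS = set(string.ascii_lowercase)
DIGITS = set(string.digits)

START, IDENT = 0, 1        # hypothetical state numbers
FINAL_STATES = {IDENT}

def step(state, ch):
    """Return the next state, or None when no edge is labeled with `ch`."""
    if state == START and ch in LETTERS:
        return IDENT                       # first symbol must be a letter
    if state == IDENT and (ch in LETTERS or ch in DIGITS):
        return IDENT                       # then letters or digits, repeated
    return None

def is_identifier(s):
    """Run the automaton over `s` and answer yes or no."""
    state = START
    for ch in s:
        state = step(state, ch)
        if state is None:
            return False
    return state in FINAL_STATES

print(is_identifier("if8"), is_identifier("8if"))
```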


NFA

An NFA is an automaton that may have a choice of edges, labeled with the same symbol, to follow out of a state. It may also have special edges labeled with ε (epsilon) that can be followed without consuming any symbol from the input.

[Figure: NFA for (a|b)*abb]
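An NFA can be simulated by tracking the set of states it could be in after each symbol. The sketch below assumes a four-state encoding of an NFA for (a|b)*abb without ε-edges (states 0 to 3, state 3 final); the drawing on the slide may use a different but equivalent automaton.

```python
# Hypothetical encoding: (state, symbol) -> set of possible next states.
NFA = {
    (0, "a"): {0, 1},   # the choice: stay in 0 or guess that "abb" starts here
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START, FINAL = 0, {3}

def nfa_accepts(s):
    """Accept if at least one sequence of choices ends in a final state."""
    current = {START}
    for ch in s:
        # Follow every edge labeled `ch` out of every state we might be in.
        current = set().union(*(NFA.get((q, ch), set()) for q in current))
    return bool(current & FINAL)

for word in ["abb", "babb", "abba", ""]:
    print(repr(word), nfa_accepts(word))
```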


DFA

In a DFA, no two edges leaving the same state are labeled with the same symbol, and no edge is labeled with ε.


Converting an NFA to a DFA

NFA transition table:

states     a          b
s0         {s0, s1}   s0
s1         ∅          s2
s2         ∅          s3
s3         ∅          ∅

DFA transition table:

states     a          b
s0         {s0, s1}   s0
{s0, s1}   {s0, s1}   {s0, s2}
{s0, s2}   {s0, s1}   {s0, s3}
{s0, s3}   {s0, s1}   s0


Example

• Find the corresponding DFA of the following automaton.

• Draw a DFA that accepts the aa*bb* expression.

[Figure: automaton with states A, B, and C; edges labeled a, b, and ε]


Compute ε-closure

Let's define ε-closure(T) as the set of NFA states reachable from some state in T on ε-transitions alone; every state of T is in its own closure.

push all states of T onto stack;
initialize ε-closure(T) to T;
while (stack is not empty) {
    pop t from the stack;
    for (each state u with an edge from t to u labeled ε)
        if (u is not in ε-closure(T)) {
            add u to ε-closure(T);
            push u onto stack;
        }
}
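The stack-based loop above translates almost line for line into Python. In this sketch the function name `epsilon_closure`, the dictionary `EPS`, and the small example graph are assumptions chosen for illustration.

```python
def epsilon_closure(states, eps_edges):
    """Return all states reachable from `states` using only ε-edges.

    `eps_edges` maps a state to the set of states one ε-edge away.
    """
    closure = set(states)      # initialize the closure to T
    stack = list(states)       # push all states of T onto the stack
    while stack:
        t = stack.pop()
        for u in eps_edges.get(t, ()):
            if u not in closure:
                closure.add(u)
                stack.append(u)
    return closure

# Hypothetical ε-edges; states 1 and 3 form a cycle, which the
# membership check handles without looping forever.
EPS = {0: {1, 2}, 1: {3}, 3: {1}}

print(sorted(epsilon_closure({0}, EPS)))  # [0, 1, 2, 3]
print(sorted(epsilon_closure({2}, EPS)))  # [2]
```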


The subset construction

Let's define move(T, a) as the set of NFA states to which there is a transition on input symbol “a” from some state s in T.

Dstates starts with the single unmarked state ε-closure(s0), where s0 is the start state of the NFA.

while (there is an unmarked state T in Dstates) {
    mark T;
    for (each input symbol a) {
        U = ε-closure(move(T, a));
        if (U is not in Dstates)
            add U as an unmarked state to Dstates;
        Dtran[T, a] = U;
    }
}
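The loop above can be sketched in Python as follows. The function names, the dictionary encoding of the NFA, and the use of a worklist in place of explicit marks are assumptions; the example NFA is a four-state automaton for (a|b)*abb without ε-edges, so the ε-closure step leaves each set unchanged here.

```python
def epsilon_closure(states, eps):
    closure, stack = set(states), list(states)
    while stack:
        t = stack.pop()
        for u in eps.get(t, ()):
            if u not in closure:
                closure.add(u)
                stack.append(u)
    return frozenset(closure)   # frozenset so a DFA state can be a dict key

def move(states, symbol, delta):
    """NFA states reachable from some state in `states` on `symbol`."""
    result = set()
    for s in states:
        result |= delta.get((s, symbol), set())
    return result

def subset_construction(start, alphabet, delta, eps):
    start_set = epsilon_closure({start}, eps)
    dstates = [start_set]       # every DFA state discovered so far
    unmarked = [start_set]      # worklist of states still to process
    dtran = {}
    while unmarked:
        T = unmarked.pop()      # taking T off the worklist marks it
        for a in alphabet:
            U = epsilon_closure(move(T, a, delta), eps)
            if U not in dstates:
                dstates.append(U)
                unmarked.append(U)
            dtran[(T, a)] = U
    return dstates, dtran

# Hypothetical NFA for (a|b)*abb: (state, symbol) -> set of next states.
DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}

dstates, dtran = subset_construction(0, "ab", DELTA, eps={})
for (T, a), U in dtran.items():
    print(sorted(T), a, "->", sorted(U))
```

Each DFA state is a set of NFA states; a DFA state is final when it contains a final NFA state, here any set containing 3.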


Example

Apply the subset construction to the following NFA.


(a|b)*abb


Example (answer)


Lexical analyzer

Each automaton accepts a certain token, and the combination of several automata can serve as a lexical analyzer (also known as a lexer or scanner).


Lexer in practice


The lexer must keep track of the longest match seen so far, and the input position of that match.
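A minimal sketch of this bookkeeping, assuming a small hand-written DFA that recognizes the hypothetical tokens IF, ID, NUM, and REAL. The state numbers and the names `step`, `next_token`, and `FINAL` are illustrative only.

```python
# Final states of the hypothetical DFA and the token each one accepts.
FINAL = {1: "ID", 2: "IF", 3: "ID", 4: "NUM", 6: "REAL"}

def step(state, ch):
    """Transition function of the combined DFA; None means no edge."""
    letter = "a" <= ch <= "z"
    digit = "0" <= ch <= "9"
    if state == 0:
        if ch == "i":
            return 1
        if letter:
            return 3
        if digit:
            return 4
    elif state == 1:
        if ch == "f":
            return 2
        if letter or digit:
            return 3
    elif state in (2, 3):
        if letter or digit:
            return 3
    elif state == 4:
        if digit:
            return 4
        if ch == ".":
            return 5            # not final: a digit must follow the dot
    elif state in (5, 6):
        if digit:
            return 6
    return None

def next_token(text, start):
    """Return (type, lexeme, end) for the longest token starting at `start`."""
    state, pos = 0, start
    last_final, last_end = None, start   # longest match seen so far
    while pos < len(text):
        state = step(state, text[pos])
        if state is None:
            break                        # dead end: fall back to last match
        pos += 1
        if state in FINAL:
            last_final, last_end = FINAL[state], pos
    if last_final is None:
        raise SyntaxError(f"no token starts at position {start}")
    return last_final, text[start:last_end], last_end

print(next_token("if8 x", 0))   # ('ID', 'if8', 3)
print(next_token("42.x", 0))    # ('NUM', '42', 2)
```

In the second call the automaton reads past "42." before failing, so the lexer returns the NUM token recorded at the last final state and resumes scanning at position 2.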


Example

• | the input position at each call to the lexer.
• ⊥ the current position.
• T the last final state.


Acknowledgement

• Compilers: Principles, Techniques, and Tools by Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman.

• Modern Compiler Implementation in Java by Andrew W. Appel and Jens Palsberg.
