Lecture 2 Lexical Analysis

Topics
  Sample Simple Compiler
  Operations on strings
  Regular expressions
  Finite Automata

Readings:

January 11, 2006

CSCE 531 Compiler Construction

– 2 – CSCE 531 Spring 2006

Overview

Last Time
  A little History
  Compilers vs Interpreter
  Data-Flow View of Compilers
  Regular Languages
  Course Pragmatics

Today’s Lecture
  Why Study Compilers?
  xx

References: Chapter 2, Chapter 3

Assignment Due Wednesday Jan 18: 3.3a; 3.5a,b; 3.6a,b,c; 3.7a; 3.8b

A Simple Compiler for Expressions

Chapter Two Overview
  Structure of the simple compiler, really just a translator for infix expressions → postfix
  Grammars
  Parse Trees
  Syntax directed Translation
  Predictive Parsing

Translator for Simple Expressions
  Grammar
  Rewritten grammar (equivalent one better for pred. parsing)
  Parsing modules fig 2.24
  Specification of Translator fig 2.35
  Structure of translator fig 2.36

Grammars

Grammar (or a context free grammar more correctly) has
  A set of tokens, also known as terminals
  A set of nonterminals
  A set of productions of the form
    nonterminal → sequence of tokens and/or nonterminals
  A special nonterminal, the start symbol

Example
  E → E + E
  E → E * E
  E → digit

Derivations

A derivation is a sequence of rewritings of a string of grammar symbols using the productions in a grammar.

We use the symbol ⇒ to denote that one string of grammar symbols is obtained by rewriting another using a production.

X ⇒ Y if there is a production N → β where
  The nonterminal N occurs in the sequence X of grammar symbols
  And Y is the same as X except β replaces the N

Example (d abbreviates digit)

E ⇒ E+E ⇒ d+E ⇒ d+E*E ⇒ d+E+E*E ⇒ d+d+E*E ⇒ d+d+d*E ⇒ d+d+d*d

Parse Trees

A graphical presentation of a derivation, satisfying
  Root is the start symbol
  Each leaf is a token or ε (note different font from text)
  Each interior node is a nonterminal
  If A is a parent with children $X_1, X_2, \dots, X_n$ then A → $X_1 X_2 \dots X_n$ is a production

Syntax directed Translation

Applying a production in reverse, replacing its right-hand side by the nonterminal on its left, is frequently called a reduction, or reducing by that particular production.

Syntax directed translation attaches actions (code) that are done when the reductions are performed.

Example
  E → E + T   {print('+');}
  E → E - T   {print('-');}
  E → T
  T → 0   {print('0');}
  T → 1   {print('1');}
  …
  T → 9   {print('9');}

Equivalent Grammars

Specification of the translator (figure 2.38)

  S → L eof
  L → E ; L
  L → ε
  E → T E'
  E' → + T { print('+'); } E'
  E' → - T { print('-'); } E'
  E' → ε
  T → F T'
  T' → * F { print('*'); } T'
  T' → / F { print('/'); } T'
  T' → ε
  F → ( E )
  F → id { print(id.lexeme); }
  F → num { print(num.value); }

Translating to code

  E → T E'
  E' → + T { print('+'); } E'
  E' → - T { print('-'); } E'
  E' → ε

Expr()
{
    int t;
    term();
    while(1)
        switch(lookahead){
        case '+': case '-':
            t = lookahead;
            match(lookahead); term();
            emit(t, NONE);
            continue;
        …
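
The fragment above leaves out its helpers, so here is a minimal self-contained sketch of the same idea, restricted to single digits with + and -. The names Translator, emit, term, expr and infix_to_postfix are assumptions made for this illustration; they are not the code from the textbook figure. The loop in expr plays the role of the E' productions, and emitting the operator after the second term is the print action.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical translator state: input cursor plus an output buffer. */
typedef struct {
    const char *in;   /* next unread character, i.e. the lookahead */
    char *out;        /* buffer that receives the postfix form */
    size_t len, cap;  /* characters written so far, buffer capacity */
    int error;        /* set to 1 on a syntax error or buffer overflow */
} Translator;

/* Append one character, always leaving room for the final NUL. */
static void emit(Translator *t, char c) {
    if (t->len + 1 < t->cap) t->out[t->len++] = c;
    else t->error = 1;
}

/* T -> 0 | 1 | ... | 9, each with the action print(digit) */
static void term(Translator *t) {
    if (isdigit((unsigned char)*t->in)) emit(t, *t->in++);
    else t->error = 1;
}

/* E -> T E'
 * E' -> + T {print('+')} E' | - T {print('-')} E' | epsilon
 * The tail recursion on E' is written as a loop. */
static void expr(Translator *t) {
    term(t);
    while (!t->error && (*t->in == '+' || *t->in == '-')) {
        char op = *t->in++;   /* match the operator */
        term(t);              /* translate the right operand first */
        emit(t, op);          /* then perform the print action */
    }
}

/* Returns 0 and stores a NUL-terminated postfix string in out on success,
 * -1 on a syntax error, leftover input, or a buffer that is too small. */
int infix_to_postfix(const char *in, char *out, size_t cap) {
    Translator t = { in, out, 0, cap, 0 };
    if (cap == 0) return -1;
    expr(&t);
    if (t.error || *t.in != '\0') return -1;
    out[t.len] = '\0';
    return 0;
}
```

With a 16-byte buffer, infix_to_postfix("9-5+2", buf, sizeof buf) returns 0 and leaves 95-2+ in buf, while the incomplete input "9+" returns -1.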

Overview of the Code Figure 2.36

/class/csce531-001

Operations on Strings

A language over an alphabet is a set of strings of characters from the alphabet.

Operations on strings: let $x = x_1 x_2 \dots x_n$ and $t = t_1 t_2 \dots t_m$; then

Concatenation: $xt = x_1 x_2 \dots x_n t_1 t_2 \dots t_m$

Alternation: $x \mid t$ = either $x_1 x_2 \dots x_n$ or $t_1 t_2 \dots t_m$

Operations on Sets of Strings

Operations on sets of strings:

For these let $S = \{s_1, s_2, \dots, s_m\}$ and $R = \{r_1, r_2, \dots, r_n\}$

Alternation: $S \mid R = S \cup R = \{s_1, s_2, \dots, s_m, r_1, r_2, \dots, r_n\}$

Concatenation: $SR = \{sr \mid s \in S \text{ and } r \in R\} = \{s_1 r_1, s_1 r_2, \dots, s_1 r_n, s_2 r_1, \dots, s_2 r_n, \dots, s_m r_1, \dots, s_m r_n\}$

Power: $S^2 = SS$, $S^3 = S^2 S$, …, $S^n = S^{n-1} S$

What is $S^0$?

Kleene Closure: $S^* = \bigcup_{i=0}^{\infty} S^i$; note $S^0 = \{\varepsilon\}$, so $\varepsilon$ is in $S^*$

Operations cont. Kleene Closure

Powers: $S^2 = SS$, $S^3 = S^2 S$, …, $S^n = S^{n-1} S$

What is $S^0$?

Kleene Closure: $S^* = \bigcup_{i=0}^{\infty} S^i$; note $S^0 = \{\varepsilon\}$, so $\varepsilon$ is in $S^*$

Examples of Operations on Sets of Strings

Operations on sets of strings:

For these let S = {a, b, c} and R = {t, u}

Alternation: $S \mid R = S \cup R$ = {a, b, c, t, u}

Concatenation: SR = {sr | s ∈ S and r ∈ R} = {at, au, bt, bu, ct, cu}

Power: $S^2$ = {aa, ab, ac, ba, bb, bc, ca, cb, cc}

$S^3$ = {aaa, aab, aac, …, ccc} (27 elements)

Kleene closure: $S^*$ = {any string of any length of a’s, b’s and c’s, including the empty string}
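
The finite operations in these examples can be checked mechanically. What follows is a small sketch, not a general library: the type StrSet, the limits MAX_STRS and MAX_LEN, and the function names are all assumptions chosen for the illustration. Kleene closure is left out because the result is infinite for any set that contains a nonempty string.

```c
#include <stddef.h>
#include <string.h>

#define MAX_STRS 64  /* assumed capacity of one set */
#define MAX_LEN  8   /* assumed maximum string length, counting the NUL */

typedef struct {
    size_t n;                      /* number of strings in the set */
    char str[MAX_STRS][MAX_LEN];   /* the strings, without duplicates */
} StrSet;

/* Adds x unless it is already present.
 * Returns 0 if x is too long or the set is full, 1 otherwise. */
int set_add(StrSet *s, const char *x) {
    if (strlen(x) >= MAX_LEN) return 0;
    for (size_t i = 0; i < s->n; i++)
        if (strcmp(s->str[i], x) == 0) return 1;
    if (s->n == MAX_STRS) return 0;
    strcpy(s->str[s->n++], x);
    return 1;
}

/* Alternation: out = A U B. out must not alias a or b. */
int set_union(const StrSet *a, const StrSet *b, StrSet *out) {
    out->n = 0;
    for (size_t i = 0; i < a->n; i++)
        if (!set_add(out, a->str[i])) return 0;
    for (size_t i = 0; i < b->n; i++)
        if (!set_add(out, b->str[i])) return 0;
    return 1;
}

/* Concatenation: out = { st | s in A and t in B }. out must not alias a or b. */
int set_concat(const StrSet *a, const StrSet *b, StrSet *out) {
    char buf[2 * MAX_LEN];   /* large enough for any two members joined */
    out->n = 0;
    for (size_t i = 0; i < a->n; i++)
        for (size_t j = 0; j < b->n; j++) {
            strcpy(buf, a->str[i]);
            strcat(buf, b->str[j]);
            if (!set_add(out, buf)) return 0;
        }
    return 1;
}

/* Power: S^0 = { "" }, the set holding only the empty string,
 * and S^k = S^(k-1) S. */
int set_power(const StrSet *s, unsigned k, StrSet *out) {
    StrSet prev;
    out->n = 0;
    if (!set_add(out, "")) return 0;
    while (k-- > 0) {
        prev = *out;   /* copy, so that out can be overwritten */
        if (!set_concat(&prev, s, out)) return 0;
    }
    return 1;
}
```

With S = {a, b, c} and R = {t, u}, set_concat produces the six strings at, au, bt, bu, ct, cu in that order, and set_power yields 9 strings for k = 2 and 27 for k = 3, matching the counts above.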

Regular Expressions

For a given alphabet Σ the following are regular expressions:
  If a ∈ Σ then a is a regular expression and L(a) = { a }
  ε is a regular expression and L(ε) = { ε }
  Φ is a regular expression and L(Φ) = Φ

And if s and t are regular expressions denoting languages L(s) and L(t) respectively then
  st is a regular expression and L(st) = L(s) L(t)
  s | t is a regular expression and L(s | t) = L(s) ∪ L(t)
  s* is a regular expression and L(s*) = L(s)*

Why Regular Expressions?

We use regular expressions to describe the tokens.

Examples:

Reg expr for C identifiers
  C identifiers? Any string of letters, underscores and digits that starts with a letter or underscore

  ID reg expr = (letter | underscore) (letter | underscore | digit)*

  Or more explicitly

  ID reg expr = (a|b|…|z|A|B|…|Z|_)(a|b|…|z|A|B|…|Z|_|0|1|…|9)*
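
To make the identifier expression concrete, here is a hand-written matcher that follows it term by term. It is only a sketch: the names is_start, is_rest and matches_id are invented for this example, and it relies on isalpha and isalnum from <ctype.h>, which in the default "C" locale accept exactly the ASCII letters and digits. It does not exclude keywords, which also match this expression.

```c
#include <ctype.h>

/* (letter | underscore) */
static int is_start(int c) { return isalpha(c) || c == '_'; }

/* (letter | underscore | digit) */
static int is_rest(int c) { return isalnum(c) || c == '_'; }

/* Returns 1 if the whole string s matches
 * (letter | underscore)(letter | underscore | digit)*, and 0 otherwise. */
int matches_id(const char *s) {
    const unsigned char *p = (const unsigned char *)s;
    if (!is_start(*p)) return 0;          /* also rejects the empty string */
    for (p++; *p != '\0'; p++)
        if (!is_rest(*p)) return 0;
    return 1;
}
```

For instance, matches_id("_tmp1") returns 1, while matches_id("9lives") and matches_id("a-b") both return 0.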

Pop Quiz

Given r and s are regular expressions then
  What is rε ? r | ε ?
  Describe the language denoted by 0*110*
  Describe the language denoted by (0|1)*110*
  Give a regular expression for the language of strings of 0’s and 1’s that end in a 1
  Give a regular expression for the language of strings of 0’s and 1’s such that every 0 is followed by a 1

Recognizers of Regular Languages

To develop efficient lexical analyzers (scanners) we will rely on a mathematical model called finite automata, similar to the state machines that you have probably seen. In particular we will use deterministic finite automata, DFAs.

The construction of a lexical analyzer will then proceed as:
1. Identify all tokens
2. Develop regular expressions for each
3. Convert the regular expressions to finite automata
4. Use the transition table for the finite automata as the basis for the scanner

We will actually use the tools lex and/or flex for steps 3 and 4.

Transition Diagram for a DFA

Start in state $s_0$; then if the input is “f” make a transition to state $s_1$.

Then from state $s_1$, if the input is “o” make a transition to state $s_2$.

And from state $s_2$, if the input is “r” make a transition to state $s_3$.

The double circle denotes an “accepting state”, which means we recognized the token.

Actually there is a missing state and transition.

[Diagram: s0 --f--> s1 --o--> s2 --r--> s3, with s3 drawn as a double circle]

Now what about “fort”

The string “fort” is an identifier, not the keyword “for” followed by “t.”

Thus we can’t really recognize the token until we see a terminator: whitespace or a special symbol (one of , ; ( ) { } [ ]).
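
One way to picture this rule is a scanner that first consumes the longest run of identifier characters and only then decides between keyword and identifier. The sketch below makes two assumptions that are not in the slide: any character that cannot continue an identifier counts as a terminator, and "for" is the only keyword. The names scan_word, TOK_FOR, TOK_ID and TOK_NONE are invented for the example.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum token { TOK_NONE, TOK_FOR, TOK_ID };

/* Scans one word at the start of src and stores its length in *len.
 * The keyword test happens only after the whole word has been read,
 * that is, once the next character is a terminator. */
enum token scan_word(const char *src, size_t *len) {
    size_t n = 0;
    if (isalpha((unsigned char)src[0]) || src[0] == '_') {
        n = 1;
        while (isalnum((unsigned char)src[n]) || src[n] == '_')
            n++;
    }
    *len = n;
    if (n == 0) return TOK_NONE;              /* no word starts here */
    if (n == 3 && strncmp(src, "for", 3) == 0)
        return TOK_FOR;                       /* exactly "for", then a terminator */
    return TOK_ID;                            /* anything longer, such as "fort" */
}
```

On the input "for(i" this returns TOK_FOR with length 3; on "fort = 1" it returns TOK_ID with length 4.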

Deterministic Finite Automata

A Deterministic finite automaton (DFA) is a mathematical model that consists of
1. a set of states $S$
2. a set of input symbols $\Sigma$, the input alphabet
3. a transition function $\delta : S \times \Sigma \to S$ that for each state and each input maps to the next state
4. a state $s_0$ that is distinguished as the start state
5. a set of states F, written $S_F$ on the following slides, distinguished as accepting (or final) states

DFA to recognize keyword “for”

$\Sigma$ = {a, b, c, …, z, A, B, …, Z, 0, …, 9, ',', ';', …}

$S = \{s_0, s_1, s_2, s_3, s_{dead}\}$

$s_0$ is the start state

$S_F = \{s_3\}$

δ given by the table below (entries follow the transition diagram for “for”; every other input leads to the dead state)

          f        o        r        Others
s0        s1       sdead    sdead    sdead
s1        sdead    s2       sdead    sdead
s2        sdead    sdead    s3       sdead
s3        sdead    sdead    sdead    sdead
sdead     sdead    sdead    sdead    sdead
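
A transition table like this maps directly onto a two-dimensional array. The sketch below is one possible encoding, with the enum names, the classify helper and accepts_for all chosen for this illustration. It accepts exactly the string "for" and does not handle the terminator question raised earlier.

```c
enum state { S0, S1, S2, S3, S_DEAD, NUM_STATES };
enum input { IN_F, IN_O, IN_R, IN_OTHER, NUM_INPUTS };

/* delta[state][input class] gives the next state. */
static const enum state delta[NUM_STATES][NUM_INPUTS] = {
    /*             f       o       r       others */
    [S0]     = { S1,     S_DEAD, S_DEAD, S_DEAD },
    [S1]     = { S_DEAD, S2,     S_DEAD, S_DEAD },
    [S2]     = { S_DEAD, S_DEAD, S3,     S_DEAD },
    [S3]     = { S_DEAD, S_DEAD, S_DEAD, S_DEAD },
    [S_DEAD] = { S_DEAD, S_DEAD, S_DEAD, S_DEAD },
};

/* Maps a character to its column in the table. */
static enum input classify(char c) {
    switch (c) {
    case 'f': return IN_F;
    case 'o': return IN_O;
    case 'r': return IN_R;
    default:  return IN_OTHER;
    }
}

/* Returns 1 if the DFA ends in the accepting state s3 after reading x. */
int accepts_for(const char *x) {
    enum state s = S0;
    for (; *x != '\0'; x++)
        s = delta[s][classify(*x)];
    return s == S3;
}
```

Here accepts_for("for") returns 1, while accepts_for("fo") and accepts_for("fort") return 0: the first stops in s2, and the second falls into the dead state on the final t.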

Language Accepted by a DFA

A string $x_0 x_1 \dots x_n$ is accepted by a DFA $M = (\Sigma, S, s_0, \delta, S_F)$ if $s_{i+1} = \delta(s_i, x_i)$ for $i = 0, 1, \dots, n$ and $s_{n+1} \in S_F$, i.e. if $x_0 x_1 \dots x_n$ determines a path through the state diagram for the DFA that ends in an accepting state.

Then the language accepted by the DFA $M = (\Sigma, S, s_0, \delta, S_F)$, denoted $L(M)$, is the set of all strings accepted by M.

What is the Language Accepted by…

DFA1.c

/*
 * Deterministic Finite Automata Simulation
 *
 * One line of input is read and then processed character by character.
 * Thus '\n' (EOL) is treated as the end of input.
 * The major functions are:
 *   delta(s,c) - that implements the transition function, and
 *   accept(s) - that tells whether state s is an accepting state or not.
 * The particular DFA recognizes strings of digits that end in 000.
 * The DFA has:
 *   S = {0, 1, 2, 3, DEAD_STATE}
 *   Transitions on 0: S0=>S1, S1=>S2, S2=>S3, S3=>S3
 *   Transitions on non-zero digits: S0=>S0, S1=>S0, S2=>S0, S3=>S0
 *   Transitions on non-digits: Si=> DEAD_STATE
 */

#include <stdio.h>

#define DEAD_STATE -1
#define ACCEPT 1
#define DO_NOT 0
#define EOL '\n'

/* prototypes, since both functions are defined after main */
int delta(int s, int c);
int accept(int state);

int main(void){
    int c;
    int state;

    state = 0;
    /* stop at end of line, at end of file, or once the DFA is dead */
    while((c = getchar()) != EOL && c != EOF && state != DEAD_STATE){
        state = delta(state, c);
    }
    if(accept(state)){
        printf("Accept!\n");
    }else{
        printf("Do not accept!\n");
    }
    return 0;
}

/* DFA Transition function delta */
/* delta(s,c) = transition from state s on input c */
int delta(int s, int c){
    switch (s){
    case 0: if (c == '0') return 1;
            else if((c > '0') && (c <= '9')) return 0;
            else return(DEAD_STATE);
    case 1: if (c == '0') return 2;
            else if((c > '0') && (c <= '9')) return 0;
            else return(DEAD_STATE);
    case 2: if (c == '0') return 3;
            else if((c > '0') && (c <= '9')) return 0;
            else return(DEAD_STATE);
    case 3: if (c == '0') return 3;
            else if((c > '0') && (c <= '9')) return 0;
            else return(DEAD_STATE);
    case DEAD_STATE: return DEAD_STATE;
    default:
        printf("Bad State\n");
        return(DEAD_STATE);
    }
}

/* accept(state) = ACCEPT if state is the accepting state 3, DO_NOT otherwise */
int accept(int state){
    if (state == 3) return ACCEPT;
    else return DO_NOT;
}

Non-Deterministic Finite Automata

What does deterministic mean?

In a Non-Deterministic Finite Automaton (NFA) we relax the restriction that the transition function δ maps every state and every element of the alphabet to a unique state, i.e. $\delta : S \times \Sigma \to S$.

An NFA can:
  Have multiple transitions from a state for the same input
  Have ε transitions, where a transition from one state to another can be accomplished without consuming an input character
  Not have transitions defined for every state and every input

Note for NFAs $\delta : S \times (\Sigma \cup \{\varepsilon\}) \to 2^S$, where $2^S$ is the power set of S.

Language Accepted by an NFA

A string $x_0 x_1 \dots x_n$ is accepted by an NFA $M = (\Sigma, S, s_0, \delta, S_F)$ if there are states with $s_{i+1} \in \delta(s_i, x_i)$ for $i = 0, 1, \dots, n$ and $s_{n+1} \in S_F$, i.e. if $x_0 x_1 \dots x_n$ can determine a path through the state diagram for the NFA that ends in an accepting state, taking ε-transitions wherever necessary.

Then the language accepted by the NFA $M = (\Sigma, S, s_0, \delta, S_F)$, denoted $L(M)$, is the set of all strings accepted by M.
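
Because an NFA may be in several states at once, a simulation usually tracks the set of all states reachable so far. The sketch below does this with a bit mask for a small hypothetical NFA for (0|1)*110*, the expression from the pop quiz. The four-state machine, including its single ε-edge, and the names set_t, eps_closure, step and nfa_accepts are all assumptions made for the illustration.

```c
#define NSTATES 4
typedef unsigned set_t;   /* bit i set means state i is in the set */

/* Hypothetical NFA for (0|1)*110*:
 *   state 0 loops on 0 and 1, and also moves to state 1 on 1
 *   state 1 moves to state 2 on 1
 *   state 2 moves to state 3 on an epsilon-edge
 *   state 3 loops on 0 and is the only accepting state */
static const set_t on0[NSTATES] = { 1u << 0, 0, 0, 1u << 3 };
static const set_t on1[NSTATES] = { (1u << 0) | (1u << 1), 1u << 2, 0, 0 };
static const set_t eps[NSTATES] = { 0, 0, 1u << 3, 0 };
static const set_t accepting = 1u << 3;

/* Adds every state reachable from t through epsilon-edges alone. */
static set_t eps_closure(set_t t) {
    set_t prev;
    do {
        prev = t;
        for (int s = 0; s < NSTATES; s++)
            if (t & (1u << s)) t |= eps[s];
    } while (t != prev);
    return t;
}

/* One input step: the union of delta(s, c) over all s in cur,
 * followed by the epsilon-closure of that union. */
static set_t step(set_t cur, char c) {
    set_t next = 0;
    for (int s = 0; s < NSTATES; s++)
        if (cur & (1u << s)) {
            if (c == '0') next |= on0[s];
            else if (c == '1') next |= on1[s];
            /* any other character has no transition */
        }
    return eps_closure(next);
}

/* Returns 1 if some path through the NFA on input x ends in an accepting state. */
int nfa_accepts(const char *x) {
    set_t cur = eps_closure(1u << 0);   /* start in state 0 */
    for (; *x != '\0'; x++)
        cur = step(cur, *x);
    return (cur & accepting) != 0;
}
```

For this machine nfa_accepts("0110") and nfa_accepts("111") return 1, while nfa_accepts("101") and nfa_accepts("1101") return 0.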

Thompson Construction

For any regular expression R construct an NFA, M, that accepts the language denoted by R, i.e., L(M) = L(R).
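
The construction works bottom-up: each subexpression gets a small NFA fragment with one start and one accepting state, and fragments are joined with ε-edges. The sketch below shows one common variant, in which concatenation links the two fragments with an ε-edge instead of merging states. It omits regular expression parsing and bounds checks, and it assumes at most MAX_STATES states are created. All type and function names are invented for the illustration.

```c
#include <stdint.h>

#define MAX_STATES 64   /* assumed limit so a state set fits in one uint64_t */
#define EPS 0           /* edge label 0 stands for an epsilon-transition */

/* Every state built here has at most two outgoing edges. */
typedef struct { int n; int label[2]; int to[2]; } State;
typedef struct { int count; State st[MAX_STATES]; } Nfa;
typedef struct { int start, accept; } Frag;

static int new_state(Nfa *m) { m->st[m->count].n = 0; return m->count++; }

static void add_edge(Nfa *m, int from, int label, int to) {
    State *s = &m->st[from];
    s->label[s->n] = label;
    s->to[s->n] = to;
    s->n++;
}

/* start --c--> accept */
Frag frag_symbol(Nfa *m, char c) {
    Frag f;
    f.start = new_state(m);
    f.accept = new_state(m);
    add_edge(m, f.start, (unsigned char)c, f.accept);
    return f;
}

/* ab: link a's accepting state to b's start state */
Frag frag_concat(Nfa *m, Frag a, Frag b) {
    Frag f = { a.start, b.accept };
    add_edge(m, a.accept, EPS, b.start);
    return f;
}

/* a|b: a new start branches to both, and both rejoin at a new accepting state */
Frag frag_union(Nfa *m, Frag a, Frag b) {
    Frag f;
    f.start = new_state(m);
    f.accept = new_state(m);
    add_edge(m, f.start, EPS, a.start);
    add_edge(m, f.start, EPS, b.start);
    add_edge(m, a.accept, EPS, f.accept);
    add_edge(m, b.accept, EPS, f.accept);
    return f;
}

/* a*: skip a entirely, or run it and optionally loop back */
Frag frag_star(Nfa *m, Frag a) {
    Frag f;
    f.start = new_state(m);
    f.accept = new_state(m);
    add_edge(m, f.start, EPS, a.start);
    add_edge(m, f.start, EPS, f.accept);
    add_edge(m, a.accept, EPS, a.start);
    add_edge(m, a.accept, EPS, f.accept);
    return f;
}

/* Adds every state reachable through epsilon-edges alone. */
static uint64_t closure(const Nfa *m, uint64_t set) {
    uint64_t prev;
    do {
        prev = set;
        for (int s = 0; s < m->count; s++)
            if ((set >> s) & 1)
                for (int e = 0; e < m->st[s].n; e++)
                    if (m->st[s].label[e] == EPS)
                        set |= (uint64_t)1 << m->st[s].to[e];
    } while (set != prev);
    return set;
}

/* Returns 1 if the fragment f accepts the whole string x. */
int nfa_match(const Nfa *m, Frag f, const char *x) {
    uint64_t cur = closure(m, (uint64_t)1 << f.start);
    for (; *x != '\0'; x++) {
        uint64_t next = 0;
        for (int s = 0; s < m->count; s++)
            if ((cur >> s) & 1)
                for (int e = 0; e < m->st[s].n; e++)
                    if (m->st[s].label[e] == (unsigned char)*x)
                        next |= (uint64_t)1 << m->st[s].to[e];
        cur = closure(m, next);
    }
    return (int)((cur >> f.accept) & 1);
}
```

Building (a|b)*c as frag_concat(&m, frag_star(&m, frag_union(&m, a, b)), c), where a, b and c come from frag_symbol and m.count starts at 0, creates 10 states; the resulting NFA matches "abbac" and "c" but not "ab".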
