Lexical Analysis Robb T. Koether Lexical Analysis Regular Expressions State Diagrams Assignment Lexical Analysis Lecture 2 Sections 3.1 - 3.4 Robb T. Koether Hampden-Sydney College Mon, Jan 19, 2009
Page 1: Lexical Analysis - Lecture 2 Sections 3.1 - 3 - …people.hsc.edu/faculty-staff/robbk/Coms480/Lectures/Spring 2009...Lexical Analysis Robb T. Koether Lexical Analysis Regular Expressions

Lexical Analysis

Robb T. Koether

Lexical Analysis

Regular Expressions

State Diagrams

Assignment

Lexical Analysis
Lecture 2

Sections 3.1 - 3.4

Robb T. Koether

Hampden-Sydney College

Mon, Jan 19, 2009

Outline

1 Lexical Analysis

2 Regular Expressions

3 State Diagrams

4 Assignment

Tokens

A token has a type and a value.
Types include id, num, assign, lparen, etc.
Values are used primarily with identifiers and numbers.
If we read “count”, the type is id and the value is “count”.
If we read “123”, the type is num and the value is “123”.
If we read “=”, the type is assign and the value is “=”.
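
As a minimal sketch, a token can be modeled as a small record holding its type and its value. The class name `Token` and the use of strings for both fields are assumptions made for illustration; the slides do not prescribe a representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    type: str   # token type, e.g. "id", "num", "assign"
    value: str  # the characters that were read


# The three examples from the slide.
examples = [Token("id", "count"), Token("num", "123"), Token("assign", "=")]
for tok in examples:
    print(tok.type, repr(tok.value))
```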

Analyzing Tokens

Each type of token can be described by a regular expression.
Therefore, the set of all tokens can be described by a regular expression. (Why?)
The language of every regular expression is accepted by some DFA.
Therefore, the set of all tokens can be processed and accepted by a DFA.

Regular Expressions

The set of all regular expressions may be defined in two parts.
The basic part:

ε represents the language {ε}.
a represents the language {a} for every a ∈ Σ.
Call these languages L(ε) and L(a), respectively.

Regular Expressions

The recursive part: Let r and s denote regular expressions.

r | s represents the language L(r) ∪ L(s).
rs represents the language L(r)L(s).
r∗ represents the language L(r)∗.

In other words,
L(r | s) = L(r) ∪ L(s).
L(rs) = L(r)L(s).
L(r∗) = L(r)∗.
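
The three operations can be tried out on small languages represented as Python sets of strings. This is only an illustrative sketch: the function names are made up here, and because L(r)∗ is infinite whenever L(r) contains a nonempty string, `star` takes an assumed `max_len` bound and returns only the strings up to that length.

```python
def union(A, B):
    """L(r | s): strings that are in A or in B."""
    return A | B


def concat(A, B):
    """L(rs): every string of A followed by every string of B."""
    return {x + y for x in A for y in B}


def star(A, max_len):
    """Strings of the Kleene closure of A with length at most max_len."""
    result = {""}      # the empty string is always in the closure
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more string from A.
        frontier = {w for w in concat(frontier, A)
                    if len(w) <= max_len and w not in result}
        result |= frontier
    return result


a, b = {"a"}, {"b"}                   # L(a) and L(b)
print(sorted(union(a, b)))            # ['a', 'b']
print(sorted(concat(a, b)))           # ['ab']
print(sorted(star(union(a, b), 2)))   # ['', 'a', 'aa', 'ab', 'b', 'ba', 'bb']
```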

Example

Example (Identifiers)
Identifiers in C++ can be represented by a regular expression.

r = A | B | · · · | Z | a | b | · · · | z
s = 0 | 1 | · · · | 9
t = r(r | s)∗
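
The expression t can be checked with Python's `re` module, writing r as the character class `[A-Za-z]` and s as `[0-9]`. This is a sketch of the slide's simplified definition; real C++ identifiers also allow the underscore, which is left out here to match the slide. The helper name `is_identifier` is an assumption.

```python
import re

# t = r(r | s)*, with r = any letter and s = any digit, as on the slide.
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_identifier(text):
    """True if the whole string matches t."""
    return IDENT.fullmatch(text) is not None


for s in ["count", "x1", "1x", ""]:
    print(repr(s), is_identifier(s))
```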

Regular Expressions

Definition (Regular definition)

A regular definition of a regular expression is a “grammar” of the form

d_1 → r_1
d_2 → r_2
...
d_n → r_n

where each r_i is a regular expression over Σ ∪ {d_1, d_2, ..., d_{i−1}}.

Regular Expressions

Note that this definition does not allow recursively defined tokens.
In other words, d_i cannot be defined in terms of d_i, not even indirectly.

Example

Example (Identifiers)
We may now describe C++ identifiers as follows.

letter → A | B | · · · | Z | a | b | · · · | z
digit → 0 | 1 | · · · | 9
id → letter(letter | digit)∗
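
Because each name may refer only to names defined before it, a regular definition can be turned into an ordinary regular expression by substituting in order, in a single pass. The sketch below assumes a `{name}` reference notation and an `expand` helper, both invented for illustration, and uses Python `re` syntax for the bodies.

```python
import re

# Each body may use only names defined earlier in the list.
definitions = [
    ("letter", "[A-Za-z]"),
    ("digit", "[0-9]"),
    ("id", "{letter}({letter}|{digit})*"),
]


def expand(defs):
    """Substitute earlier names into later bodies, in definition order."""
    expanded = {}
    for name, body in defs:
        for earlier, pattern in expanded.items():
            body = body.replace("{" + earlier + "}", "(?:" + pattern + ")")
        expanded[name] = body
    return expanded


patterns = expand(definitions)
print(patterns["id"])
print(re.fullmatch(patterns["id"], "count1") is not None)
```

A recursive definition would leave an unresolved `{name}` in the result, which is one way to see why recursion is excluded.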

Lexical Analysis

After writing a regular expression for each kind of token, we may combine them into one big regular expression describing all tokens.

id → letter(letter | digit)∗
num → digit(digit)∗
relop → < | > | == | != | >= | <=
token → id | num | relop | . . .
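
One way to experiment with the combined expression is a single Python pattern with one named group per token type. This is a sketch, not the lecture's implementation. Python's `re` tries alternatives from left to right rather than picking the longest, so the two-character operators are listed before their one-character prefixes. The whitespace group and the `tokenize` function are assumptions added so that a whole line can be scanned.

```python
import re

TOKEN = re.compile(r"""
    (?P<id>[A-Za-z][A-Za-z0-9]*)    # id -> letter(letter | digit)*
  | (?P<num>[0-9]+)                 # num -> digit(digit)*
  | (?P<relop>==|!=|>=|<=|<|>)      # longer operators listed first
  | (?P<ws>\s+)                     # skipped; not a token on the slide
""", re.VERBOSE)


def tokenize(text):
    """Return a list of (type, value) pairs for the whole input."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "ws":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize("count >= 123"))
```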

State Diagrams

A regular expression may be represented by a state diagram.
The state diagram provides a good guide to writing a lexical analyzer program.

Example

Example (State Diagrams)

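[Figure: state diagrams for id (an edge labeled letter, then a loop labeled letter | digit), for num (an edge labeled digit, then a loop labeled digit), and for token, which combines the two.]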
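
A state diagram of this kind translates directly into a transition table. The sketch below is one possible encoding; the state names `start`, `id`, and `num` and the helper names are assumptions, since the original figure's labels for the states are not recoverable.

```python
def char_class(ch):
    """Map a character to the edge label used in the diagram."""
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"


# (state, edge label) -> next state. Missing entries lead to the dead state.
TRANSITIONS = {
    ("start", "letter"): "id",
    ("start", "digit"): "num",
    ("id", "letter"): "id",
    ("id", "digit"): "id",
    ("num", "digit"): "num",
}
ACCEPTING = {"id", "num"}   # the state name doubles as the token type


def classify(text):
    """Return "id" or "num" if the DFA accepts the whole string, else None."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return None
    return state if state in ACCEPTING else None


for s in ["count", "123", "12ab"]:
    print(repr(s), classify(s))
```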

State Diagrams

Unfortunately, it is not that simple.
At what point may we stop in an accepting state?
Do not read “count” as 5 identifiers: “c”, “o”, “u”, “n”, “t”.
When we stop in an accepting state, we must be able to determine the type of token processed.
Did we read the id token “count” or did we read the if token “if”?
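
One common way to settle the id-versus-keyword question, not necessarily the one this lecture adopts, is to scan every word as an identifier and then look it up in a keyword table. The table contents and the function name below are hypothetical.

```python
# Hypothetical keyword table; a real language would list all of its keywords.
KEYWORDS = {"if", "else", "while"}


def token_type(lexeme):
    """Classify a word already scanned as letter(letter | digit)*."""
    return lexeme if lexeme in KEYWORDS else "id"


print(token_type("count"))  # id
print(token_type("if"))     # if
```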

Example

Example (State Diagrams)

Consider state diagrams to accept relational operators ==, !=, <, >, <=, and >=.

[Figure: state diagrams for the individual relational operators, including ==, !=, and <=.]

Example

Example (State Diagrams)

Combine them into a single state diagram.

[Figure: combined state diagram for relop with states 1, 2, 3, and 4; edges labeled = | !, < | >, and =.]

State Diagrams

When we reach an accepting state, how can we tell which operator was processed?
In general, we design the diagram so that each kind of token has its own accepting state.

State Diagrams

If we reach state 3, how do we decide whether to continue to state 4?
We read characters until the current character does not match any pattern, i.e., it would lead to the dead state.
At that point, we accept the string, minus the last character.
Later, processing resumes with the last character.
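
The procedure can be sketched for the relational operators as follows. The state names are illustrative and do not correspond to the numbered states in the figure; each accepting state stands for exactly one operator, and the function returns the position of the unconsumed character so that scanning can resume there. Treating the end of input like a non-matching character is an assumption.

```python
# (state, character) -> next state. Missing entries lead to the dead state.
TRANSITIONS = {
    ("start", "="): "saw_eq",
    ("start", "!"): "saw_bang",
    ("start", "<"): "lt",
    ("start", ">"): "gt",
    ("saw_eq", "="): "eq",
    ("saw_bang", "="): "ne",
    ("lt", "="): "le",
    ("gt", "="): "ge",
}
# Each accepting state identifies one operator.
ACCEPTING = {"eq": "==", "ne": "!=", "lt": "<", "le": "<=", "gt": ">", "ge": ">="}


def scan_relop(text, pos=0):
    """Scan one operator starting at pos; return (operator, next_pos)."""
    state, i = "start", pos
    while i < len(text):
        nxt = TRANSITIONS.get((state, text[i]))
        if nxt is None:        # dead state: text[i] is left unconsumed
            break
        state, i = nxt, i + 1
    if state not in ACCEPTING:
        raise ValueError(f"no relational operator at position {pos}")
    return ACCEPTING[state], i


print(scan_relop("<=x"))   # ('<=', 2)
print(scan_relop("<x"))    # ('<', 1)
```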

State Diagrams

The Maximal Munch Principle
Process as many symbols as possible while still being able to match a regular expression.
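
A simple, if inefficient, way to see the principle in action is to try every token pattern at the current position and keep the longest match. This sketch uses Python's `re` in place of a DFA. The `assign` pattern is included only to show "=" and "==" competing for the same input, and the function name is an assumption.

```python
import re

PATTERNS = [
    ("id", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("num", re.compile(r"[0-9]+")),
    ("relop", re.compile(r"==|!=|<=|>=|<|>")),
    ("assign", re.compile(r"=")),
]


def longest_match(text, pos):
    """Return (type, value) for the longest token starting at pos, or None."""
    best = None
    for name, pattern in PATTERNS:
        m = pattern.match(text, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


print(longest_match("count1 = 5", 0))   # ('id', 'count1')
print(longest_match("x == y", 2))       # ('relop', '==')
print(longest_match("x = y", 2))        # ('assign', '=')
```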

Example

Example (State Diagrams)

[Figure: state diagram for relop in which each operator has its own accepting state; edges are labeled =, !, <, >, and other.]

Assignment

Homework
Read Sections 3.1 - 3.4.

