LexicalAnalysis
Robb T.Koether
LexicalAnalysis
RegularExpressions
StateDiagrams
Assignment
Lexical Analysis
Lecture 2
Sections 3.1 - 3.4
Robb T. Koether
Hampden-Sydney College
Mon, Jan 19, 2009
Outline
1 Lexical Analysis
2 Regular Expressions
3 State Diagrams
4 Assignment
Tokens
A token has a type and a value.
Types include id, num, assign, lparen, etc.
Values are used primarily with identifiers and numbers.
If we read “count”, the type is id and the value is “count”.
If we read “123”, the type is num and the value is “123”.
If we read “=”, the type is assign and the value is “=”.
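As a small illustration (a Python sketch; the class name Token and the use of plain strings for token types are assumptions, not part of the lecture), a token can be modeled as an immutable type/value pair:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    """A token pairs a type (e.g. "id", "num", "assign") with a value (the text read)."""
    type: str
    value: str

# The three examples from the slide.
examples = [
    Token("id", "count"),
    Token("num", "123"),
    Token("assign", "="),
]

for tok in examples:
    print(tok.type, repr(tok.value))
```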
Analyzing Tokens
Each type of token can be described by a regular expression.
Therefore, the set of all tokens can be described by a regular expression. (Why?)
The language described by any regular expression is accepted by some DFA.
Therefore, the set of all tokens can be processed and accepted by a DFA.
Regular Expressions
The set of all regular expressions may be defined in two parts.
The basic part:
ε represents the language {ε}.
a represents the language {a} for every a ∈ Σ.
Call these languages L(ε) and L(a), respectively.
Regular Expressions
The recursive part: Let r and s denote regular expressions.
r | s represents the language L(r) ∪ L(s).
rs represents the language L(r)L(s).
r∗ represents the language L(r)∗.
In other words,
L(r | s) = L(r) ∪ L(s),
L(rs) = L(r)L(s),
L(r∗) = L(r)∗.
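To make these definitions concrete, the sketch below computes the languages as Python sets of strings. The function names are hypothetical, and because L(r)∗ is usually infinite, star only collects strings up to a chosen length max_len; that bound is an assumption of the sketch, not part of the definition.

```python
from itertools import product

def lang_eps():
    """L(ε) = {ε}; the empty string "" stands for ε."""
    return {""}

def lang_sym(a):
    """L(a) = {a} for a single symbol a in Σ."""
    return {a}

def union(L, M):
    """L(r | s) = L(r) ∪ L(s)."""
    return L | M

def concat(L, M):
    """L(rs) = L(r)L(s): each string of L followed by each string of M."""
    return {x + y for x, y in product(L, M)}

def star(L, max_len):
    """L(r*) = L(r)*, truncated to strings of length <= max_len."""
    result = {""}          # ε is always in L*
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more string of L, keep only new, short ones.
        frontier = {w for w in concat(frontier, L) if len(w) <= max_len} - result
        result |= frontier
    return result

a_or_b = union(lang_sym("a"), lang_sym("b"))
print(sorted(star(a_or_b, 2)))
```

For example, star(a_or_b, 2) yields the seven strings of length at most 2 over {a, b}, including the empty string.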
Example
Example (Identifiers)
Identifiers in C++ can be represented by a regular expression (ignoring the underscore, which C++ also allows).
r = A | B | · · · | Z | a | b | · · · | z
s = 0 | 1 | · · · | 9
t = r(r | s)∗
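The expression t can be tried out with Python's re module. This is a sketch: r and s become the character classes [A-Za-z] and [0-9], so, like r above, it rejects the underscore that real C++ identifiers may contain.

```python
import re

# r: one letter A-Z or a-z; s: one digit 0-9 (no underscore, matching r and s above).
r = "[A-Za-z]"
s = "[0-9]"
t = re.compile(f"{r}(?:{r}|{s})*")   # t = r(r | s)*

def is_identifier(text):
    """True if the whole string matches t."""
    return t.fullmatch(text) is not None

for word in ["count", "x1", "1x", "a_b"]:
    print(word, is_identifier(word))
```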
Regular Expressions
Definition (Regular definition)
A regular definition is a “grammar” of the form
d1 → r1
d2 → r2
...
dn → rn
where each di is a new name, not in Σ and distinct from the other names, and each ri is a regular expression over Σ ∪ {d1, d2, . . . , di−1}.
Regular Expressions
Note that this definition does not allow recursively defined tokens.
In other words, di cannot be defined in terms of di, not even indirectly.
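A regular definition can be turned into plain regular expressions by substituting each di, in order, into the later ri. The sketch below does this for Python regular expressions; the {name} reference syntax, NAME_REF, and expand are hypothetical conventions for illustration. Because a body may only mention names already expanded, a recursive definition is rejected instead of looping forever.

```python
import re

# Hypothetical syntax: {name} inside a body refers to an earlier di.
NAME_REF = re.compile(r"\{([A-Za-z_]\w*)\}")

def expand(definitions):
    """Expand an ordered list of (name, body) pairs into plain regexes.
    Each body may refer only to names defined before it."""
    expanded = {}
    for name, body in definitions:
        def substitute(m):
            ref = m.group(1)
            if ref not in expanded:
                # Catches di -> ...di... and references to later names,
                # so no definition can be recursive, directly or indirectly.
                raise ValueError(f"{name} refers to {ref}, which is not defined before it")
            return f"(?:{expanded[ref]})"
        expanded[name] = NAME_REF.sub(substitute, body)
    return expanded

defs = [
    ("letter", "[A-Za-z]"),
    ("digit", "[0-9]"),
    ("id", "{letter}({letter}|{digit})*"),
]
patterns = expand(defs)
print(patterns["id"])   # (?:[A-Za-z])((?:[A-Za-z])|(?:[0-9]))*
```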
Example
Example (Identifiers)
We may now describe C++ identifiers as follows.
letter → A | B | · · · | Z | a | b | · · · | z
digit → 0 | 1 | · · · | 9
id → letter(letter | digit)∗
Lexical Analysis
After writing a regular expression for each kind of token, we may combine them into one big regular expression describing all tokens.
id → letter(letter | digit)∗
num → digit(digit)∗
relop → < | > | == | != | >= | <=
token → id | num | relop | . . .
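A sketch of this combination in Python, under two assumptions: the re module's named groups stand in for the token names, and a ws alternative (not on the slide) is added so blanks between tokens can be skipped. The relop alternatives are reordered with the two-character operators first, because Python's re tries alternatives left to right and stops at the first one that matches.

```python
import re

# One alternative per token type, joined with | into a single pattern.
TOKEN_SPEC = [
    ("id",    r"[A-Za-z][A-Za-z0-9]*"),   # letter(letter | digit)*
    ("num",   r"[0-9]+"),                 # digit(digit)*
    ("relop", r"==|!=|>=|<=|<|>"),        # longer operators listed first
    ("ws",    r"\s+"),                    # assumption: whitespace is matched, then dropped
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (type, value) pairs for the whole input string."""
    pos = 0
    tokens = []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "ws":           # lastgroup names the alternative that matched
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("count >= 123"))   # [('id', 'count'), ('relop', '>='), ('num', '123')]
```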
State Diagrams
A regular expression may be represented by a state diagram.
The state diagram provides a good guide to writing a lexical analyzer program.
Example
Example (State Diagrams)
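[Figure: state diagrams for id (letter, then a loop on letter | digit) and num (digit, then a loop on digit), and a combined diagram for token that branches on letter or digit from one start state.]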
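The diagrams translate directly into a transition table. In this sketch the state names start, in_id, and in_num and the functions char_class and run_dfa are hypothetical; a missing table entry plays the role of the dead state.

```python
def char_class(c):
    """Map a character to the edge labels used in the diagrams."""
    if c.isascii() and c.isalpha():
        return "letter"
    if c.isascii() and c.isdigit():
        return "digit"
    return "other"

# Transition table for the combined token diagram.
TRANSITIONS = {
    ("start", "letter"): "in_id",
    ("start", "digit"): "in_num",
    ("in_id", "letter"): "in_id",
    ("in_id", "digit"): "in_id",
    ("in_num", "digit"): "in_num",
}
# Each accepting state is tied to the token type it recognizes.
ACCEPTING = {"in_id": "id", "in_num": "num"}

def run_dfa(text):
    """Return the token type if the whole string is accepted, else None."""
    state = "start"
    for c in text:
        state = TRANSITIONS.get((state, char_class(c)))
        if state is None:      # no edge: we have fallen into the dead state
            return None
    return ACCEPTING.get(state)
```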
State Diagrams
Unfortunately, it is not that simple.
At what point may we stop in an accepting state?
Do not read “count” as 5 identifiers: “c”, “o”, “u”, “n”, “t”.
When we stop in an accepting state, we must be able to determine the type of token processed.
Did we read the id token “count” or did we read the if token “if”?
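One common way to settle the keyword question (a standard technique, not necessarily the one this course adopts) is to let the id pattern read the longest possible word and only then look the lexeme up in a table of reserved words. The KEYWORDS set, ID_RE, and scan_word below are hypothetical.

```python
import re

# Hypothetical reserved-word table; a real C++ lexer would list every keyword.
KEYWORDS = {"if", "else", "while", "return"}
ID_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*")   # greedy: reads the whole word

def scan_word(text, pos):
    """Scan the longest identifier-shaped lexeme starting at pos and
    classify it: a reserved word becomes its own token type, anything
    else is an id. Returns ((type, value), new_pos)."""
    m = ID_RE.match(text, pos)
    if m is None:
        raise ValueError(f"no identifier at position {pos}")
    lexeme = m.group()
    kind = lexeme if lexeme in KEYWORDS else "id"
    return (kind, lexeme), m.end()
```

Because the pattern is greedy, “count” comes back as one id rather than five, and “iffy” is an id even though it begins with “if”.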
Example
Example (State Diagrams)
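Consider state diagrams to accept the relational operators ==, !=, <, >, <=, and >=.
[Figure: one state diagram per operator, e.g., = followed by = for ==, ! followed by = for !=, and < followed by = for <=.]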
Example
Example (State Diagrams)
Combine them into a single state diagram.
[Figure: combined diagram for relop with states 1, 2, 3, 4 and edges labeled = | !, < | >, and =.]
State Diagrams
When we reach an accepting state, how can we tell which operator was processed?
In general, we design the diagram so that each kind of token has its own accepting state.
State Diagrams
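If we reach state 3, how do we decide whether to continue to state 4?
We read characters until the current character does not match any pattern, i.e., until it would lead to the dead state.
At that point, we accept the string read so far, minus the last character.
Processing later resumes with that last character.
(In general, the scanner backs up to the most recent accepting state, which may be more than one character back; for these operators, one character is enough.)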
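The sketch below applies this rule to the combined relop diagram. It assumes one reading of the figure (1 to 2 on = or !, 1 to 3 on < or >, 2 and 3 to 4 on =, with 3 and 4 accepting); the function names delta and scan_relop are illustrative. Rather than always dropping exactly one character, it remembers where an accepting state was last reached, which is the general form of the rule.

```python
# Assumed reading of the combined relop diagram:
# 1 --(= or !)--> 2, 1 --(< or >)--> 3, 2 --(=)--> 4, 3 --(=)--> 4.
def delta(state, c):
    if state == 1 and c in "=!":
        return 2
    if state == 1 and c in "<>":
        return 3
    if state in (2, 3) and c == "=":
        return 4
    return None  # the dead state

RELOP_ACCEPTING = {3, 4}

def scan_relop(text, pos):
    """Apply maximal munch from pos. Return (lexeme, resume_pos) or None."""
    state = 1
    last_accept = None      # end of the longest accepted prefix seen so far
    i = pos
    while i < len(text):
        nxt = delta(state, text[i])
        if nxt is None:     # this character would lead to the dead state
            break
        state = nxt
        i += 1
        if state in RELOP_ACCEPTING:
            last_accept = i
    if last_accept is None:
        return None
    # Everything after last_accept is left unread; scanning resumes there.
    return text[pos:last_accept], last_accept
```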
State Diagrams
The Maximal Munch Principle
Process as many symbols as possible and still be able to match a regular expression.
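Python's re module shows why maximal munch has to be enforced deliberately: an alternation r | s tries r first and stops at the first success. The sketch below contrasts that with a longest-match chooser; the token names if, id, lt, and le and the helper longest_match are hypothetical.

```python
import re

# Leftmost-first alternation is NOT maximal munch:
first_lt = re.match(r"<|<=", "<=5").group()                   # "<"
first_if = re.match(r"if|[A-Za-z][A-Za-z0-9]*", "iffy").group()  # "if"

# Maximal munch: try every pattern at pos and keep the longest match.
# On a tie, the pattern listed first wins (so the keyword beats id).
PATTERNS = [
    ("if", re.compile(r"if")),
    ("id", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("lt", re.compile(r"<")),
    ("le", re.compile(r"<=")),
]

def longest_match(text, pos):
    """Return (type, lexeme) for the longest match at pos, or None."""
    best = None
    for kind, pat in PATTERNS:
        m = pat.match(text, pos)
        if m and (best is None or m.end() > best[1].end()):
            best = (kind, m)
    return None if best is None else (best[0], best[1].group())
```

Breaking ties in favor of the earlier pattern is what lets the keyword if win over id on the input “if”, while maximal munch still reads “iffy” as a single id.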
Example
Example (State Diagrams)
[Figure: relop diagram for ==, !=, <, <=, >, and >=, with edges labeled = and other.]
Assignment
Homework
Read Sections 3.1 - 3.4.