Regular Expressions
Prepared by
Manuel E. Bermúdez, Ph.D.
Associate Professor
University of Florida
Programming Language Translators
Regular Expressions
• A compact, easy-to-read language description.
• Use operators to denote the language constructors described earlier, to build “complex” languages from simple “atomic” ones.
Definition: A regular expression over an alphabet Σ is recursively defined as follows:
1. ø denotes the language ø.
2. ε denotes the language {ε}.
3. a denotes the language {a}, for all a ∈ Σ.
4. (P + Q) denotes L(P) ∪ L(Q), where P, Q are r.e.’s.
5. (PQ) denotes L(P)·L(Q), where P, Q are r.e.’s.
6. P* denotes L(P)*, where P is an r.e.
To prevent excessive parentheses, we assume left associativity, with the following operator precedence hierarchy, from most to least binding: *, ·, +
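The six cases of the definition can be mirrored directly in code. The following is a minimal sketch, not part of the original slides: it assumes a tuple encoding of regular expressions (`("empty",)`, `("eps",)`, `("sym", a)`, `("alt", P, Q)`, `("cat", P, Q)`, `("star", P)`) and a hypothetical helper `lang` that lists only the strings of L(e) up to a length bound n, since L(e) may be infinite.

```python
def lang(e, n):
    """Return the strings of L(e) whose length is at most n."""
    kind = e[0]
    if kind == "empty":                      # case 1: ø
        return set()
    if kind == "eps":                        # case 2: ε
        return {""}
    if kind == "sym":                        # case 3: a single symbol
        return {e[1]} if n >= 1 else set()
    if kind == "alt":                        # case 4: union
        return lang(e[1], n) | lang(e[2], n)
    if kind == "cat":                        # case 5: catenation
        return {x + y for x in lang(e[1], n) for y in lang(e[2], n)
                if len(x + y) <= n}
    if kind == "star":                       # case 6: closure
        base, result, frontier = lang(e[1], n), {""}, {""}
        while frontier:                      # stops once no new strings fit within n
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    raise ValueError(kind)

ZERO, ONE = ("sym", "0"), ("sym", "1")
ends_in_one = ("cat", ("star", ("alt", ZERO, ONE)), ONE)   # (0 + 1)*1
print(sorted(lang(ends_in_one, 2)))                        # ['01', '1', '11']
```

The length bound is a design choice of this sketch; it makes every case computable as a finite set.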
Examples:
(0 + 1)*: any string of 0’s and 1’s.
(0 + 1)*1: any string of 0’s and 1’s, ending with a 1.
1*01*: any string of 1’s with a single 0 inserted.
Letter (Letter + Digit)*: an identifier.
Digit Digit*: an integer.
Quote Char* Quote: a string. †
# Char* Eoln: a comment. †
{Char*}: another comment. †
† Assuming that Char does not contain quotes, eoln’s, or }.
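For experimentation, some of these examples can be written in the syntax of Python's `re` module, where alternation is `|` rather than `+`. This is only a sketch: the character classes chosen for Letter, Digit, Quote, and Char are assumptions (ASCII letters and digits, a double quote), not definitions from the slides.

```python
import re

# Slide notation on the left, assumed Python `re` equivalent on the right.
ends_with_one = re.compile(r"(0|1)*1")               # (0 + 1)*1
single_zero = re.compile(r"1*01*")                   # 1*01*
identifier = re.compile(r"[A-Za-z][A-Za-z0-9]*")     # Letter (Letter + Digit)*
integer = re.compile(r"[0-9][0-9]*")                 # Digit Digit*
string_literal = re.compile(r'"[^"\n]*"')            # Quote Char* Quote

# fullmatch requires the whole string to belong to the language.
print(bool(ends_with_one.fullmatch("0101")))   # True
print(bool(single_zero.fullmatch("1001")))     # False: two 0's
print(bool(identifier.fullmatch("x25")))       # True
```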
Conversion from Right-linear grammars to regular expressions
Example:
S → aS        R → aS
S → bR
S → ε
What does S → aS mean? L(S) ⊇ {a}·L(S)
S → bR means L(S) ⊇ {b}·L(R)
S → ε means L(S) ⊇ {ε}
Together, they mean that L(S) = {a}·L(S) + {b}·L(R) + {ε}
or S = aS + bR + ε
Similarly, R → aS means R = aS.
Thus,
S = aS + bR + ε
R = aS
System of simultaneous equations, in which the variables are nonterminals.
Solving systems of simultaneous equations.
S = aS + bR + ε
R = aS
Back substitute R = aS:
S = aS + baS + ε
  = (a + ba) S + ε
Question: What to do with equations of the form X = αX + β ?
Answer: β ∈ L(X), so αβ ∈ L(X), ααβ ∈ L(X), αααβ ∈ L(X), …
Thus α*β = L(X).
In our case,
S = (a + ba) S + ε
  = (a + ba)* ε
  = (a + ba)*
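The result S = (a + ba)* can be sanity-checked by brute force. The sketch below is not from the slides: it assumes a dictionary encoding of the grammar S → aS, S → bR, S → ε, R → aS and a hypothetical helper `derives`. It enumerates every string of length at most N that the grammar derives and compares that set with the strings matched by the equivalent Python `re` pattern.

```python
import re
from itertools import product

# nonterminal -> list of (terminal prefix, next nonterminal or None)
GRAMMAR = {
    "S": [("a", "S"), ("b", "R"), ("", None)],
    "R": [("a", "S")],
}

def derives(grammar, start, n):
    """All terminal strings of length <= n derivable from `start`."""
    done, seen, todo = set(), set(), [("", start)]
    while todo:
        prefix, nonterminal = todo.pop()
        for terminals, nxt in grammar[nonterminal]:
            w = prefix + terminals
            if len(w) > n:
                continue
            if nxt is None:                  # derivation finished
                done.add(w)
            elif (w, nxt) not in seen:       # sentential form w·nxt, not yet explored
                seen.add((w, nxt))
                todo.append((w, nxt))
    return done

N = 8
solution = re.compile(r"(a|ba)*")            # S = (a + ba)* in `re` syntax
by_regex = {"".join(p) for k in range(N + 1) for p in product("ab", repeat=k)
            if solution.fullmatch("".join(p))}
assert derives(GRAMMAR, "S", N) == by_regex  # agreement up to length N only
```

Agreement up to a fixed length is evidence, not a proof; the proof is the argument given above.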
Right-linear regular grammar → regular expression
1. A = α1 + α2 + … + αn, if A → α1, A → α2, …, A → αn.
2. If an equation is of the form X = α, where X does not appear in α, then replace every occurrence of X with α in all other equations, and delete the equation X = α.
3. If an equation is of the form X = αX + β, where X does not occur in either α or β, then replace the equation with X = α*β.
Note: Some algebraic manipulations may be needed to obtain the form X = αX + β.
Important: Catenation is not commutative!!
Example:
S → a        R → abaU      U → aS
S → bU       R → U         U → b
S → bR
S = a + bU + bR
R = abaU + U = (aba + ε) U
U = aS + b
Back substitute R:
S = a + bU + b(aba + ε) U
U = aS + b
Back substitute U:
S = a + b(aS + b) + b(aba + ε)(aS + b)
  = a + baS + bb + babaaS + babab + baS + bb      (baS and bb repeat)
  = (ba + babaa)S + (a + bb + babab)
therefore
S = (ba + babaa)*(a + bb + babab)
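The elimination procedure can be sketched mechanically. The code below is an illustration under assumptions, not the author's implementation: `solve` is a hypothetical helper, each equation is a dictionary from a variable (or `None` for the constant term) to its coefficient, and coefficients are written as Python `re` patterns so the answer can be tested directly. Coefficients always stay to the left of what is substituted, because catenation is not commutative.

```python
import re
from itertools import product

def solve(equations, start):
    """Eliminate variables one at a time; return a `re` pattern for `start`.

    equations[X] = {Y: coeff, ..., None: const} means X = coeff·Y + ... + const.
    Assumes `start` ends up with a constant term (a non-empty language).
    """
    eqs = {x: dict(rhs) for x, rhs in equations.items()}
    for x in [v for v in eqs if v != start] + [start]:
        rhs = eqs.pop(x)
        if x in rhs:                          # X = αX + β  becomes  X = α*β
            alpha = rhs.pop(x)
            rhs = {y: f"(?:{alpha})*(?:{r})" for y, r in rhs.items()}
        if x == start:
            return rhs[None]
        for other in eqs.values():            # back substitute X everywhere else
            if x in other:
                coeff = other.pop(x)
                for y, r in rhs.items():
                    term = f"(?:{coeff})(?:{r})"
                    other[y] = f"{other[y]}|{term}" if y in other else term

system = {
    "S": {None: "a", "U": "b", "R": "b"},     # S = a + bU + bR
    "R": {"U": "aba|"},                       # R = (aba + ε) U
    "U": {"S": "a", None: "b"},               # U = aS + b
}
pattern = re.compile(solve(system, "S"))
expected = re.compile(r"(ba|babaa)*(a|bb|babab)")
for k in range(9):                            # compare on all strings up to length 8
    for p in product("ab", repeat=k):
        w = "".join(p)
        assert bool(pattern.fullmatch(w)) == bool(expected.fullmatch(w))
```

The generated pattern is not simplified, so it looks different from the hand-derived expression even though the bounded comparison finds no disagreement.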
Summarizing:
[Figure: map of the conversions among RGR, RGL, RE, NFA, DFA, and minimum DFA, with conversions marked “Done” or “Soon”.]
Regular Expression → NFA
Recursively build the FSA, mimicking the structure of the regular expression. Each FSA built has one start state, and one final state.
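Conversions:
• if ø: states 1 and 2, with no transition between them.
• if ε: state 1 alone, or 1 —ε→ 2.
• if a: 1 —a→ 2.
• if P + Q: state 1 has ε-transitions to the start states of P and Q; the final states of P and Q have ε-transitions to state 2.
• if P·Q: the final state of P has an ε-transition to the start state of Q.
• if P*: 1 —ε→ P —ε→ 2, with further ε-transitions that allow P to be skipped or repeated.
[Figures: one diagram per case.]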
Example: (b (aba + ε) a)*
[Figures: step-by-step construction of the NFA, one subexpression at a time: fragments for the individual symbols first, then aba, aba + ε, b (aba + ε) a, and finally the starred expression. The resulting NFA has states 1 to 15.]
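The recursive construction can be sketched in code. This is an illustrative version under assumptions, not the slides' exact diagrams: it uses a tuple encoding of regular expressions (`("sym", a)`, `("eps",)`, `("empty",)`, `("alt", P, Q)`, `("cat", P, Q)`, `("star", P)`), hypothetical helpers `thompson`, `closure`, and `accepts`, and the label `None` for ε. State numbering will differ from the figures.

```python
from functools import reduce
from itertools import count

def thompson(e, delta, fresh):
    """Add an NFA fragment for `e` to delta; return its (start, final) states."""
    def edge(src, label, dst):
        delta.setdefault((src, label), set()).add(dst)

    kind = e[0]
    if kind == "cat":                          # P·Q: final of P —ε→ start of Q
        p_start, p_final = thompson(e[1], delta, fresh)
        q_start, q_final = thompson(e[2], delta, fresh)
        edge(p_final, None, q_start)
        return p_start, q_final
    start, final = next(fresh), next(fresh)
    if kind == "eps":
        edge(start, None, final)
    elif kind == "sym":
        edge(start, e[1], final)
    elif kind == "alt":                        # P + Q: branch out and rejoin
        for sub in e[1:]:
            s, f = thompson(sub, delta, fresh)
            edge(start, None, s)
            edge(f, None, final)
    elif kind == "star":                       # P*: skip P, or loop back into it
        s, f = thompson(e[1], delta, fresh)
        edge(start, None, s)
        edge(f, None, final)
        edge(start, None, final)
        edge(f, None, s)
    elif kind != "empty":                      # ø: two states, no edge
        raise ValueError(kind)
    return start, final

def closure(states, delta):
    """States reachable from `states` using ε-transitions only."""
    stack, seen = list(states), set(states)
    while stack:
        for t in delta.get((stack.pop(), None), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def accepts(e, w):
    delta = {}
    start, final = thompson(e, delta, count(1))
    current = closure({start}, delta)
    for ch in w:
        moved = {t for s in current for t in delta.get((s, ch), ())}
        current = closure(moved, delta)
    return final in current

def cat(*parts):                               # left-associative catenation
    return reduce(lambda p, q: ("cat", p, q), parts)

A, B = ("sym", "a"), ("sym", "b")
E = ("star", cat(B, ("alt", cat(A, B, A), ("eps",)), A))   # (b (aba + ε) a)*
print(accepts(E, "babaaba"), accepts(E, "bab"))            # True False
```

Every fragment has exactly one start state and one final state, which is what lets the cases compose.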
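Regular Expression → NFA (Algorithm 2)
Start with: a single transition labeled with the entire regular expression E.
Apply rules:
[Figures: rewrite rules for a transition labeled a* (ε-transitions around a loop on a), a + b (two parallel transitions, one on a and one on b), and ab (a transition on a followed by a transition on b).]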
Algorithm 1:
• Builds FSA bottom up
• Good for machines
• Bad for humans
Algorithm 2:
• Builds FSA top down
• Bad for machines
• Good for humans
Arguable
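Example (Algorithm 2): (a + b)* (aa + bb)
[Figures: top-down expansion, first into transitions labeled (a + b)* and aa + bb, then into ε-transitions, a loop on a and b, and the paths a a and b b.]
Example (Algorithm 2): ba(a + b)* ab
[Figure: the resulting NFA: b, a, ε, a loop on a and b, ε, a, b.]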
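Deterministic Finite-State Automata (DFA’s)
Definition: A deterministic FSA is defined just like an NFA, except that δ: Q × Σ → Q, rather than δ: Q × (Σ ∪ {ε}) → 2^Q.
Thus, both ε-transitions and two transitions on the same symbol leaving one state are impossible.
[Figures: a state with an ε-transition; a state with two outgoing transitions on a.]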
Every transition of a DFA consumes a symbol. Fortunately, DFA’s are just as powerful as NFA’s.
Theorem: For every NFA there exists an equivalent (accepting the same language) DFA.
Conversion from NFA’s to DFA’s:
• “Simulate” all moves of the NFA with the DFA.
• The start state of the DFA is the start state of the NFA (say, S), together with all states that are ε-reachable from S.
• Each state in the DFA is a subset of the set of states of the NFA; this captures the notion of being in “any one of” a number of states.
• New states in the DFA are constructed by calculating the sets of states that are reachable through symbols, beginning from the start state.
• The final states in the DFA are those that contain any final state of the NFA.
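The bullets above can be turned into a short subset-construction sketch. The helper names `eps_closure` and `subset_construction` are hypothetical, and the NFA for a*b + ba* used here is an assumption: its six states and their edges were chosen for illustration and are not copied from a figure. The label `None` stands for ε.

```python
def eps_closure(states, delta):
    """The given states plus everything ε-reachable from them."""
    stack, seen = list(states), set(states)
    while stack:
        for t in delta.get((stack.pop(), None), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return frozenset(seen)

def subset_construction(start, finals, delta, alphabet):
    d_start = eps_closure({start}, delta)
    table, todo = {}, [d_start]
    while todo:
        S = todo.pop()
        if S in table:
            continue
        table[S] = {}
        for a in alphabet:
            moved = {t for s in S for t in delta.get((s, a), ())}
            if moved:                          # an empty set means "no transition"
                T = eps_closure(moved, delta)
                table[S][a] = T
                todo.append(T)
    d_finals = {S for S in table if S & finals}
    return d_start, d_finals, table

# Assumed NFA for a*b + ba*: 1 is the start state, 6 the final state.
NFA = {
    (1, None): {2}, (2, "a"): {2}, (2, None): {3}, (3, "b"): {6},   # a*b
    (1, "b"): {4}, (4, None): {5}, (5, "a"): {5}, (5, None): {6},   # ba*
}
d_start, d_finals, table = subset_construction(1, {6}, NFA, "ab")
for S in sorted(table, key=sorted):
    print(sorted(S), {a: sorted(T) for a, T in table[S].items()})
```

Using `frozenset` for DFA states lets them serve as dictionary keys; only subsets actually reachable from the start state are ever created.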
Example: a*b + ba*
[Figure: NFA with states 1 to 6, ε-transitions, and transitions on a and b.]
DFA
         Input
State    a      b
123      23     456
23       23     6
456      56     ---
6        ---    ---
56       56     ---
[Figure: the corresponding DFA, with states 123, 23, 456, 56, and 6.]
In general, if the NFA has N states, the DFA can have as many as 2^N states.
Example: ba (a + b)* ab
[Figure: NFA with states 0 to 11, ε-transitions, and transitions on a and b.]
DFA
            Input
State       a           b
0           ---         1
1           234689      ---
234689      34568910    346789
34568910    34568910    34678911
346789      34568910    346789
34678911    34568910    346789
State names list the NFA states they contain; for example, 34568910 is {3, 4, 5, 6, 8, 9, 10} and 34678911 is {3, 4, 6, 7, 8, 9, 11}.
[Figure: the corresponding DFA, with states 0, 1, 234689, 34568910, 346789, and 34678911.]
State Minimization
Theorem: Given a DFA M, there exists an equivalent DFA M’ that is minimal, i.e. no other equivalent DFA exists with fewer states than M’.
Definition: A partition of a set S is a set of subsets of S such that every element of S appears in exactly one of the subsets.
Example: S = {1, 2, 3, 4, 5}
Π1 = { {1, 2, 3, 4}, {5} }
Π2 = { {1, 2, 3}, {4}, {5} }
Π3 = { {1, 3}, {2}, {4}, {5} }
Note: Π2 is a refinement of Π1 , and Π3 is a refinement of Π2.
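Both notions are easy to check by machine. A minimal sketch, with hypothetical helper names `is_partition` and `refines`:

```python
def is_partition(blocks, S):
    """True if every element of S appears in exactly one block."""
    elements = [x for block in blocks for x in block]
    return len(elements) == len(set(elements)) and set(elements) == set(S)

def refines(fine, coarse):
    """True if every block of `fine` lies inside some block of `coarse`."""
    return all(any(set(b) <= set(c) for c in coarse) for b in fine)

S = {1, 2, 3, 4, 5}
P1 = [{1, 2, 3, 4}, {5}]
P2 = [{1, 2, 3}, {4}, {5}]
P3 = [{1, 3}, {2}, {4}, {5}]
print(is_partition(P3, S), refines(P2, P1), refines(P3, P2), refines(P1, P2))
# True True True False
```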
Minimization Algorithm:
1. Remove all undefined transitions by introducing a TRAP state, i.e. a state from which no final state is reachable.
2. Partition all states into two groups (final states and non-final states).
3. Complete the “Next State” table for each group, by specifying transitions from group to group. Form the next partition: split groups in which Next State table entries differ. Repeat step 3 until no further splitting is possible.
4. Determine start and final states.
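Steps 2 and 3 can be sketched as a loop that keeps splitting groups until the partition stops changing. This is an illustration under assumptions: `minimize` is a hypothetical helper, the transition function must already be total (step 1 done), and the five-state DFA used to exercise it is an assumed example, not taken from a figure.

```python
def minimize(states, alphabet, delta, finals):
    """Return the groups of equivalent states of a DFA with total `delta`."""
    # Step 2: final states versus non-final states (empty groups dropped).
    partition = [g for g in (set(finals), set(states) - set(finals)) if g]
    while True:
        group_of = {s: i for i, g in enumerate(partition) for s in g}
        refined = []
        for group in partition:
            rows = {}                          # "Next State" row -> states sharing it
            for s in group:
                row = tuple(group_of[delta[s, a]] for a in alphabet)
                rows.setdefault(row, set()).add(s)
            refined.extend(rows.values())      # states with different rows are split
        if len(refined) == len(partition):     # step 3 ends: nothing was split
            return refined
        partition = refined

# Assumed DFA: state 5 is final; every state goes to 2 on a.
DELTA = {
    (1, "a"): 2, (1, "b"): 3,
    (2, "a"): 2, (2, "b"): 4,
    (3, "a"): 2, (3, "b"): 3,
    (4, "a"): 2, (4, "b"): 5,
    (5, "a"): 2, (5, "b"): 3,
}
groups = minimize({1, 2, 3, 4, 5}, "ab", DELTA, {5})
print(sorted(sorted(g) for g in groups))       # [[1, 3], [2], [4], [5]]
```

For step 4, the start state of the minimal DFA is the group containing the original start state, and its final states are the groups made of original final states.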
Example:
Π0 = { {1, 2, 3, 4}, {5} }
State    a       b
1        1234    1234
2        1234    1234
3        1234    1234
4        1234    5
5        1234    1234
[Figure: the DFA being minimized, with states 1 to 5 and transitions on a and b.]
Split {4} from group {1, 2, 3, 4}.
Π1 = { {1, 2, 3}, {4}, {5} }
State    a      b
1        123    123
2        123    4
3        123    123
4        123    5
5        123    123
Split {2} from group {1, 2, 3}.
Π2 = { {1, 3}, {2}, {4}, {5} }
State    a    b
1        2    13
3        2    13
2        2    4
4        2    5
5        2    13
No more splitting: this is the minimal DFA.
[Figure: the minimal DFA, with states 13, 2, 4, and 5.]
Summary of Regular Languages
• Smallest class in the Chomsky hierarchy.
• Appropriate for lexical analysis.
• Four representations: RGR, RGL, RE and FSA.
• All four are equivalent; there are algorithms to perform transformations among them.
• Various advantages and disadvantages among these four, for language designer, implementor, and user.
• FSA’s can be made deterministic, and minimal.