Date post: 04-Jan-2016

Regular Expressions

Prepared by

Manuel E. Bermúdez, Ph.D.
Associate Professor
University of Florida

Programming Language Translators


• A compact, easy-to-read language description.

• Use operators to denote the language constructors described earlier, to build “complex” languages from simple “atomic” ones.

Definition: A regular expression over an alphabet Σ is recursively defined as follows:

1. ø denotes the language ø.
2. ε denotes the language {ε}.
3. a denotes the language {a}, for all a ∈ Σ.
4. (P + Q) denotes L(P) ∪ L(Q), where P, Q are r.e.’s.
5. (PQ) denotes L(P)·L(Q), where P, Q are r.e.’s.
6. P* denotes L(P)*, where P is an r.e.

To prevent excessive parentheses, we assume left associativity, with the following operator precedence hierarchy, from most to least binding: *, ·, +
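The recursive definition maps directly onto a small data type. The following is a minimal sketch, not part of the original slides: the class names (`Empty`, `Eps`, `Sym`, `Alt`, `Cat`, `Star`) and the helper `lang` are hypothetical, and `lang` only enumerates the strings of L(r) up to a length bound, since L(r) may be infinite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Empty:          # rule 1: ø
    pass


@dataclass(frozen=True)
class Eps:            # rule 2: ε
    pass


@dataclass(frozen=True)
class Sym:            # rule 3: a single symbol a
    a: str


@dataclass(frozen=True)
class Alt:            # rule 4: (P + Q)
    p: object
    q: object


@dataclass(frozen=True)
class Cat:            # rule 5: (PQ)
    p: object
    q: object


@dataclass(frozen=True)
class Star:           # rule 6: P*
    p: object


def lang(r, n):
    """Return the strings of L(r) whose length is at most n."""
    if isinstance(r, Empty):
        return set()
    if isinstance(r, Eps):
        return {""}
    if isinstance(r, Sym):
        return {r.a} if len(r.a) <= n else set()
    if isinstance(r, Alt):
        return lang(r.p, n) | lang(r.q, n)
    if isinstance(r, Cat):
        return {x + y for x in lang(r.p, n) for y in lang(r.q, n)
                if len(x + y) <= n}
    if isinstance(r, Star):
        base = lang(r.p, n)
        result, frontier = {""}, {""}
        while frontier:  # keep appending until no new short string appears
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {r!r}")


# (0 + 1)*1, written out with explicit constructors
ENDS_IN_ONE = Cat(Star(Alt(Sym("0"), Sym("1"))), Sym("1"))
```

For instance, `lang(ENDS_IN_ONE, 2)` yields the three strings `1`, `01`, and `11`.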

Examples:

(0 + 1)*: any string of 0’s and 1’s.
(0 + 1)*1: any string of 0’s and 1’s, ending with a 1.
1*01*: any string of 1’s with a single 0 inserted.
Letter (Letter + Digit)*: an identifier.
Digit Digit*: an integer.
Quote Char* Quote: a string. †
# Char* Eoln: a comment. †
{Char*}: another comment. †

† Assuming that Char does not contain quotes, eoln’s, or } .
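Several of these examples can be tried with Python's `re` module. This is only a sketch: the slides' `+` (union) is written `|` in Python, where `+` instead means "one or more", and the character classes chosen for Letter, Digit, and Char are assumptions, as are the constant names.

```python
import re

# Slide notation translated to Python patterns (character classes assumed).
BINARY_ENDING_IN_1 = re.compile(r"(0|1)*1")      # (0 + 1)*1
ONES_WITH_SINGLE_0 = re.compile(r"1*01*")        # 1*01*
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")  # Letter (Letter + Digit)*
INTEGER = re.compile(r"[0-9][0-9]*")             # Digit Digit*
BRACE_COMMENT = re.compile(r"\{[^}]*\}")         # {Char*}, Char excludes }


def is_match(pattern, text):
    """True when the whole of text belongs to the pattern's language."""
    return pattern.fullmatch(text) is not None
```

`fullmatch` is used rather than `search` because a regular expression here denotes a set of complete strings, not substrings.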

Conversion from right-linear grammars to regular expressions

Example:

S → aS        R → aS
  → bR
  → ε

What does S → aS mean? L(S) ⊇ {a}·L(S)

S → bR means L(S) ⊇ {b}·L(R)

S → ε means L(S) ⊇ {ε}

Together, they mean that L(S) = {a}·L(S) + {b}·L(R) + {ε}, or

S = aS + bR + ε

Similarly, R → aS means R = aS.

Thus,

S = aS + bR + ε
R = aS

This is a system of simultaneous equations, in which the variables are the nonterminals.

Solving systems of simultaneous equations.

S = aS + bR + ε
R = aS

Back substitute R = aS:

S = aS + baS + ε
  = (a + ba) S + ε

Question: What to do with equations of the form X = αX + β ?

Answer: β ∈ L(X), so αβ ∈ L(X), ααβ ∈ L(X), αααβ ∈ L(X), …

Thus L(X) = α*β.

In our case,

S = (a + ba) S + ε
  = (a + ba)* ε
  = (a + ba)*
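As a quick check, substituting X = α*β into the right-hand side of X = αX + β gives back α*β, using only the identity αα* + ε = α*. This is a sketch of why the rule is consistent, not a proof that the solution is unique; uniqueness additionally needs ε ∉ L(α), which the slides do not state.

```latex
\begin{align*}
\alpha X + \beta &= \alpha(\alpha^{*}\beta) + \beta \\
                 &= (\alpha\alpha^{*} + \varepsilon)\,\beta \\
                 &= \alpha^{*}\beta = X
\end{align*}
```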

Right-linear regular grammar
↓
regular expression

1. A = α1 + α2 + … + αn   if   A → α1
                                 → α2
                                 . . .
                                 → αn

2. If an equation is of the form X = α, where X does not appear in α, then replace every occurrence of X with α in all other equations, and delete the equation X = α.

3. If an equation is of the form X = αX + β, where X does not occur in either α or β, then replace the equation with X = α*β.

Note: Some algebraic manipulations may be needed to obtain the form X = αX + β.

Important: Catenation is not commutative!!

Example:

S → a        R → abaU        U → aS
  → bU         → U             → b
  → bR

S = a + bU + bR
R = abaU + U = (aba + ε) U
U = aS + b

Back substitute R:

S = a + bU + b(aba + ε) U
U = aS + b

Back substitute U:

S = a + b(aS + b) + b(aba + ε)(aS + b)
  = a + baS + bb + babaaS + babab + baS + bb
  = (ba + babaa)S + (a + bb + babab)

The terms baS and bb each appear twice in the second line (they repeat); since P + P = P, one copy of each is dropped.

Therefore S = (ba + babaa)*(a + bb + babab)
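The result can be sanity-checked by brute force: enumerate every string the grammar derives up to some length and compare with the strings matched by the solved expression. The sketch below assumes the grammar as read from this example; the names `GRAMMAR`, `generate`, and `matching` are hypothetical, and the slides' `+` is written `|` for Python's `re`.

```python
import re
from itertools import product

# Right-linear grammar from the example: each right-hand side is a string of
# terminals optionally followed by one nonterminal (an upper-case key).
GRAMMAR = {
    "S": ["a", "bU", "bR"],
    "R": ["abaU", "U"],
    "U": ["aS", "b"],
}

# S = (ba + babaa)*(a + bb + babab)
SOLUTION = re.compile(r"(ba|babaa)*(a|bb|babab)")


def generate(grammar, start, max_len):
    """All terminal strings of length <= max_len derivable from start."""
    done, seen, todo = set(), {start}, [start]
    while todo:
        form = todo.pop()
        prefix, nonterminal = form[:-1], form[-1]
        for rhs in grammar[nonterminal]:
            new = prefix + rhs
            if new and new[-1] in grammar:      # still ends in a nonterminal
                if len(new) - 1 <= max_len and new not in seen:
                    seen.add(new)
                    todo.append(new)
            elif len(new) <= max_len:           # fully terminal string
                done.add(new)
    return done


def matching(max_len):
    """All strings over {a, b} of length <= max_len matched by SOLUTION."""
    words = ("".join(w) for k in range(max_len + 1)
             for w in product("ab", repeat=k))
    return {w for w in words if SOLUTION.fullmatch(w)}
```

Comparing `generate(GRAMMAR, "S", 8)` with `matching(8)` checks agreement on all strings of length at most 8. It is a bounded test, not a proof of equivalence.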

Summarizing:

[Diagram: conversions among RGR, RGL, RE, NFA, DFA, and Minimum DFA; some conversions are marked “Done” and others “Soon”.]

Regular Expression
↓
NFA

Recursively build the FSA, mimicking the structure of the regular expression. Each FSA built has one start state, and one final state.

Conversions:

• if ø: [Diagram: states 1 and 2, with no transition between them.]

• if ε

• if a

• if P + Q

• if P·Q

[Diagrams: the FSA fragment for each case. For a, a transition on a from state 1 to state 2; for P + Q and P·Q, the FSAs for P and Q connected by ε-transitions.]

• if P*

[Diagram: states 1 and 2 connected to the FSA for P by ε-transitions.]

Example: (b (aba + ε) a)*

[Diagrams: the first construction steps, with states 1 to 6 and transitions on b, a, and b.]

[Diagrams: the remaining construction steps for (b (aba + ε) a)*. Fragments for the remaining symbols are added, the pieces are joined with ε-transitions, and the final NFA has 15 states.]
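The bottom-up construction can be sketched in a few lines. This is an illustration under assumptions, not the slides' exact diagrams: regular expressions are nested tuples with hypothetical tags (`"sym"`, `"eps"`, `"empty"`, `"alt"`, `"cat"`, `"star"`), the ε-edge layout follows the usual Thompson-style fragments, and state numbering will differ from the figures. What it does share with the slides is that each fragment has exactly one start state and one final state.

```python
from itertools import count


def build(r, edges, fresh):
    """Build an NFA fragment for r; return its (start, final) states.

    Transitions are appended to edges as (source, label, target) triples,
    where a label of None stands for ε.
    """
    kind = r[0]
    if kind == "cat":                       # P·Q: join P's final to Q's start
        s1, f1 = build(r[1], edges, fresh)
        s2, f2 = build(r[2], edges, fresh)
        edges.append((f1, None, s2))
        return s1, f2
    s, f = next(fresh), next(fresh)         # new start and final state
    if kind == "empty":                     # ø: no transition at all
        pass
    elif kind == "eps":
        edges.append((s, None, f))
    elif kind == "sym":
        edges.append((s, r[1], f))
    elif kind == "alt":                     # P + Q: branch into both
        for sub in r[1:]:
            si, fi = build(sub, edges, fresh)
            edges.extend([(s, None, si), (fi, None, f)])
    elif kind == "star":                    # P*: skip, enter, repeat, leave
        si, fi = build(r[1], edges, fresh)
        edges.extend([(s, None, si), (fi, None, f),
                      (s, None, f), (fi, None, si)])
    else:
        raise ValueError(f"unknown node: {kind}")
    return s, f


def eps_closure(states, edges):
    """States reachable from the given states through ε-transitions only."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(r, word):
    """Simulate the NFA built for r on word."""
    edges = []
    start, final = build(r, edges, count(1))
    current = eps_closure({start}, edges)
    for ch in word:
        moved = {dst for src, label, dst in edges
                 if src in current and label == ch}
        current = eps_closure(moved, edges)
    return final in current


def cat(*parts):
    """Left-associative catenation of several expressions."""
    result = parts[0]
    for part in parts[1:]:
        result = ("cat", result, part)
    return result


a, b = ("sym", "a"), ("sym", "b")
# (b (aba + ε) a)*
EXAMPLE = ("star", cat(b, ("alt", cat(a, b, a), ("eps",)), a))
```

With this sketch, `accepts(EXAMPLE, "babaa")` and `accepts(EXAMPLE, "baba")` both hold, while `accepts(EXAMPLE, "bab")` does not.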

Regular Expression
↓
NFA

ALGORITHM 2

Start with:

[Diagram: a single transition labeled with the whole expression E.]

Apply rules:

[Diagrams: rewrite rules for transitions labeled a*, a + b, and ab. Each rule replaces the transition with transitions labeled by the subexpressions a and b, using ε-transitions for a*.]

Algorithm 1:
• Builds FSA bottom up
• Good for machines
• Bad for humans

Algorithm 2:
• Builds FSA top down
• Bad for machines
• Good for humans

Arguable

Example (Algorithm 2):

(a + b)* (aa + bb)

[Diagrams: the transition labeled (a + b)* (aa + bb) is split step by step until every transition is labeled a, b, or ε.]

Example (Algorithm 2):

ba(a + b)* ab

[Diagram: the resulting NFA, with transitions b, a, ε, ε, a, b and transitions on a and b for (a + b)*.]

Deterministic Finite-State Automata (DFA’s)

Definition: A deterministic FSA is defined just like an NFA, except that

δ: Q × Σ → Q, rather than
δ: Q × (Σ ∪ {ε}) → 2^Q

Thus, both an ε-transition and two transitions on the same symbol a out of one state are impossible.


Every transition of a DFA consumes a symbol. Fortunately, DFA’s are just as powerful as NFA’s.

Theorem: For every NFA there exists an equivalent (accepting the same language) DFA.

Conversion from NFA’s to DFA’s:

• “Simulate” all moves of the NFA with the DFA.

• The start state of the DFA is the start state of the NFA (say, S), together with states that are ε-reachable from S.

• Each state in the DFA is a subset of the set of states of the NFA; the notion of being in “any one of” a number of states.

• New states in the DFA are constructed by calculating the sets of states that are reachable through symbols, after the start state.

• The final states in the DFA are those that contain any final state of the NFA.

Example: a*b + ba*

[Diagram: NFA with states 1 to 6, four ε-transitions, two transitions on a, and two on b.]

DFA

         Input
State    a      b
123      23     456
23       23     6
456      56     ---
6        ---    ---
56       56     ---

[Diagram: the DFA, with states 123, 23, 456, 56, and 6.]
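The NFA-to-DFA conversion for this example can be sketched as a worklist over sets of NFA states. The NFA transitions below are an assumption: the original diagram did not survive, so they were reconstructed to be consistent with the DFA table and with the expression a*b + ba*. The names `NFA`, `eps_closure`, and `subset_construction` are hypothetical.

```python
EPS = None  # label used for ε-transitions

# Assumed NFA for a*b + ba*: (state, label) -> set of next states.
NFA = {
    (1, EPS): {2}, (2, "a"): {2}, (2, EPS): {3}, (3, "b"): {6},   # a*b
    (1, "b"): {4}, (4, EPS): {5}, (5, "a"): {5}, (5, EPS): {6},   # ba*
}
NFA_START, NFA_FINAL = 1, 6


def eps_closure(states, nfa):
    """All states reachable from states using ε-transitions only."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for nxt in nfa.get((q, EPS), set()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def subset_construction(nfa, start, alphabet):
    """Return (start_set, table), where table[S][symbol] is the next DFA state."""
    start_set = frozenset(eps_closure({start}, nfa))
    table, todo = {}, [start_set]
    while todo:
        current = todo.pop()
        if current in table:
            continue
        table[current] = {}
        for symbol in alphabet:
            moved = set()
            for q in current:
                moved |= nfa.get((q, symbol), set())
            target = frozenset(eps_closure(moved, nfa))
            if target:                      # an empty set means "no move" (---)
                table[current][symbol] = target
                todo.append(target)
    return start_set, table


DFA_START, DFA_TABLE = subset_construction(NFA, NFA_START, "ab")
DFA_FINALS = {s for s in DFA_TABLE if NFA_FINAL in s}
```

Under these assumptions the construction produces the same five DFA states as the table, with missing entries corresponding to `---`.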

In general, if the NFA has N states, the DFA can have as many as 2^N states.

Example: ba (a + b)* ab

[Diagram: NFA with states 0 to 11.]

DFA

              Input
State         a           b
0             ---         1
1             234689      ---
234689        34568910    346789
34568910      34568910    34678911
346789        34568910    346789
34678911      34568910    346789

[Diagram: the DFA, with states 0, 1, 234689, 34568910, 346789, and 34678911.]
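Running a DFA is a single pass over the input, one table lookup per symbol. The sketch below copies the transition table for ba (a + b)* ab; it assumes that NFA state 11 is the NFA's final state, so that 34678911 is the only final DFA state. The names `DFA`, `START`, `FINALS`, and `accepts` are hypothetical.

```python
# Transition table of the DFA for ba (a + b)* ab.
# A missing entry corresponds to "---" (no move, so the input is rejected).
DFA = {
    "0":        {"b": "1"},
    "1":        {"a": "234689"},
    "234689":   {"a": "34568910", "b": "346789"},
    "34568910": {"a": "34568910", "b": "34678911"},
    "346789":   {"a": "34568910", "b": "346789"},
    "34678911": {"a": "34568910", "b": "346789"},
}
START = "0"
FINALS = {"34678911"}  # assumption: NFA state 11 is the NFA's final state


def accepts(word):
    """Follow one transition per input symbol; reject on a missing entry."""
    state = START
    for ch in word:
        state = DFA[state].get(ch)
        if state is None:
            return False
    return state in FINALS
```

Because every transition consumes a symbol, the loop runs exactly once per character and never needs to backtrack.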


State Minimization

Theorem: Given a DFA M, there exists an equivalent DFA M’ that is minimal, i.e. no other equivalent DFA exists with fewer states than M’.

Definition: A partition of a set S is a set of subsets of S such that every element of S appears in exactly one of the subsets.

Example: S = {1, 2, 3, 4, 5}

Π1 = { {1, 2, 3, 4}, {5} }

Π2 = { {1, 2, 3}, {4}, {5} }

Π3 = { {1, 3}, {2}, {4}, {5} }

Note: Π2 is a refinement of Π1 , and Π3 is a refinement of Π2.
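The slides use "refinement" without defining it; the usual meaning, assumed here, is that every block of the finer partition lies inside some block of the coarser one. A minimal sketch with hypothetical helper names:

```python
def is_partition(blocks, universe):
    """Every element of universe appears in exactly one non-empty block."""
    return (all(blocks)
            and sum(len(block) for block in blocks) == len(universe)
            and set().union(*blocks) == set(universe))


def is_refinement(fine, coarse):
    """Each block of fine is contained in some block of coarse."""
    return all(any(block <= other for other in coarse) for block in fine)


S = {1, 2, 3, 4, 5}
PI1 = [{1, 2, 3, 4}, {5}]
PI2 = [{1, 2, 3}, {4}, {5}]
PI3 = [{1, 3}, {2}, {4}, {5}]
```

Here `is_refinement(PI2, PI1)` and `is_refinement(PI3, PI2)` hold, while `is_refinement(PI1, PI2)` does not.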

Minimization Algorithm:

1. Remove all undefined transitions by introducing a TRAP state, i.e. a state from which no final state is reachable.

2. Partition all states into two groups (final states and non-final states).

3. Complete the “Next State” table for each group, by specifying transitions from group to group. Form the next partition: split groups in which Next State table entries differ. Repeat 3 until no further splitting is possible.

4. Determine start and final states.

Example:

[Diagram: DFA with states 1 to 5 and transitions on a and b.]

Π0 = { {1, 2, 3, 4}, {5} }

State    a       b
1        1234    1234
2        1234    1234
3        1234    1234
4        1234    5
5        1234    1234

Split {4} from partition {1,2,3,4}

Π1 = { {1, 2, 3}, {4}, {5} }

State    a      b
1        123    123
2        123    4
3        123    123
4        123    5
5        123    123

Split {2} from partition {1,2,3}

Π2 = { {1, 3}, {2}, {4}, {5} }

State    a    b
1        2    13
3        2    13
2        2    4
4        2    5
5        2    13

No more splitting. Minimal DFA:

[Diagram: the minimal DFA, with states 13, 2, 4, and 5.]
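The splitting steps above can be sketched as a partition-refinement loop. The DFA used here is an assumption: the diagram did not survive, so the transitions were chosen to agree with the three "Next State" tables (the individual b-targets of states 1, 3, and 5 inside the group {1, 3} are a guess), and state 5 is taken as the only final state. The function assumes a total transition function, i.e. step 1 (adding a TRAP state) has already been done if needed.

```python
def minimize(states, alphabet, delta, finals):
    """Return the final partition of states into equivalence groups."""
    # Step 2: start with final states versus non-final states.
    partition = [g for g in (set(finals), set(states) - set(finals)) if g]
    while True:
        def group_of(q):
            return next(i for i, g in enumerate(partition) if q in g)

        # Step 3: split each group by its row of the "Next State" table.
        refined = []
        for group in partition:
            rows = {}
            for q in sorted(group):
                row = tuple(group_of(delta[q, a]) for a in alphabet)
                rows.setdefault(row, set()).add(q)
            refined.extend(rows.values())
        if len(refined) == len(partition):   # no group was split: done
            return refined
        partition = refined


# Assumed DFA, consistent with the tables for Π0, Π1, and Π2.
DELTA = {
    (1, "a"): 2, (1, "b"): 3,
    (2, "a"): 2, (2, "b"): 4,
    (3, "a"): 2, (3, "b"): 3,
    (4, "a"): 2, (4, "b"): 5,
    (5, "a"): 2, (5, "b"): 3,
}
GROUPS = minimize({1, 2, 3, 4, 5}, "ab", DELTA, finals={5})
```

On this input the loop splits off {4} first and {2} second, ending with the groups {1, 3}, {2}, {4}, and {5}.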

Summary of Regular Languages

• Smallest class in the Chomsky hierarchy.

• Appropriate for lexical analysis.

• Four representations: RGR, RGL, RE and FSA.

• All four are equivalent; there are algorithms to perform transformations among them.

• Various advantages and disadvantages among these four, for language designer, implementor, and user.

• FSA’s can be made deterministic, and minimal.
