Post on 04-Jan-2016

Regular Expressions

Prepared by

Manuel E. Bermúdez, Ph.D.
Associate Professor
University of Florida

Programming Language Translators

Regular Expressions

• A compact, easy-to-read language description.

• Use operators to denote the language constructors described earlier, to build “complex” languages from simple “atomic” ones.

Definition: A regular expression over an alphabet Σ is recursively defined as follows:

1. ø denotes the language ø.
2. ε denotes the language {ε}.
3. a denotes the language {a}, for all a ∈ Σ.
4. (P + Q) denotes L(P) ∪ L(Q), where P, Q are r.e.’s.
5. (PQ) denotes L(P)·L(Q), where P, Q are r.e.’s.
6. P* denotes L(P)*, where P is an r.e.

To prevent excessive parentheses, we assume left associativity, with the following operator precedence hierarchy, from most to least binding: *, ·, +

Examples:

(0 + 1)*: any string of 0’s and 1’s.
(0 + 1)*1: any string of 0’s and 1’s, ending with a 1.
1*01*: any string of 1’s with a single 0 inserted.
Letter (Letter + Digit)*: an identifier.
Digit Digit*: an integer.
Quote Char* Quote: a string. †
# Char* Eoln: a comment. †
{Char*}: another comment. †

† Assuming that Char does not contain quotes, eoln’s, or } .
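As a rough check of the examples above, the sketch below transliterates a few of them into Python's re syntax. The translation is an assumption of this sketch, not part of the slides: + becomes |, and Letter and Digit are taken to be the hypothetical character classes [A-Za-z] and [0-9].

```python
import re

# Slide notation -> Python `re` syntax (assumed translation: "+" becomes "|").
# LETTER and DIGIT are hypothetical character classes chosen for this sketch.
LETTER = "[A-Za-z]"
DIGIT = "[0-9]"

PATTERNS = {
    "binary": r"(0|1)*",
    "ends_with_1": r"(0|1)*1",
    "single_0": r"1*01*",
    "identifier": f"{LETTER}({LETTER}|{DIGIT})*",
    "integer": f"{DIGIT}{DIGIT}*",
}

def in_language(name, text):
    """True if the whole of `text` belongs to the named language."""
    return re.fullmatch(PATTERNS[name], text) is not None

print(in_language("ends_with_1", "0101"))   # True
print(in_language("single_0", "1001"))      # False: two 0's
print(in_language("identifier", "x25"))     # True
print(in_language("identifier", "2x"))      # False: starts with a digit
```

fullmatch is used rather than match because a regular expression denotes a set of whole strings, not of prefixes.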

Conversion from Right-linear grammars to regular expressions

Example:

S → aS        R → aS
  → bR
  → ε

What does S → aS mean? L(S) ⊇ {a}·L(S)

S → bR means L(S) ⊇ {b}·L(R)

S → ε means L(S) ⊇ {ε}

Together, they mean that L(S) = {a}·L(S) + {b}·L(R) + {ε}, or S = aS + bR + ε

Similarly, R → aS means R = aS.

Thus,
S = aS + bR + ε
R = aS

This is a system of simultaneous equations, in which the variables are nonterminals.

Solving systems of simultaneous equations.

S = aS + bR + ε
R = aS

Back substitute R = aS:

S = aS + baS + ε
  = (a + ba) S + ε

Question: What to do with equations of the form X = αX + β ?

Answer: β ∈ L(X), so αβ ∈ L(X), ααβ ∈ L(X), αααβ ∈ L(X), …

Thus L(X) = α*β.

In our case,
S = (a + ba) S + ε
  = (a + ba)* ε
  = (a + ba)*
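The result can be sanity-checked by brute force. The sketch below is an illustration, not part of the slides: it derives strings directly from the grammar S → aS | bR | ε, R → aS and compares membership with the regular expression (a + ba)*, written (?:a|ba)* in Python's re syntax, for every string over {a, b} up to length 6. The function name generates and the grammar encoding are hypothetical choices, and agreement on short strings is evidence rather than a proof.

```python
import re
from itertools import product

# Right-linear grammar from the example; single uppercase letters are
# nonterminals and "" stands for ε (encoding chosen for this sketch).
GRAMMAR = {"S": ["aS", "bR", ""], "R": ["aS"]}

def generates(grammar, start, word):
    """True if the right-linear grammar derives `word` from `start`."""
    stack, seen = [(start, word)], set()
    while stack:
        nonterminal, rest = stack.pop()
        if (nonterminal, rest) in seen:
            continue
        seen.add((nonterminal, rest))
        for rhs in grammar[nonterminal]:
            # Split the right-hand side into terminals and an optional
            # trailing nonterminal.
            if rhs and rhs[-1] in grammar:
                prefix, nxt = rhs[:-1], rhs[-1]
            else:
                prefix, nxt = rhs, None
            if not rest.startswith(prefix):
                continue
            tail = rest[len(prefix):]
            if nxt is None:
                if tail == "":
                    return True
            else:
                stack.append((nxt, tail))
    return False

SOLUTION = re.compile(r"(?:a|ba)*")
words = ["".join(t) for n in range(7) for t in product("ab", repeat=n)]
mismatches = [w for w in words
              if generates(GRAMMAR, "S", w) != bool(SOLUTION.fullmatch(w))]
print(mismatches)  # [] means grammar and expression agree on all tested strings
```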

Right-linear regular grammar → regular expression

1. A = α1 + α2 + … + αn   if   A → α1
                                 → α2
                                 . . .
                                 → αn

2. If equation is of the form X = α, where X does not appear in α, then replace every occurrence of X with α in all other equations, and delete equation X = α.

3. If equation is of the form X = αX + β, where X does not occur in either α or β, then replace the equation with X = α*β.

Note: Some algebraic manipulations may be needed to obtain the form X = αX + β.

Important: Catenation is not commutative!!

Example:

S → a         R → abaU      U → aS
  → bU          → U           → b
  → bR

S = a + bU + bR
R = abaU + U = (aba + ε) U
U = aS + b

Back substitute R:

S = a + bU + b(aba + ε) U
U = aS + b

Back substitute U:

S = a + b(aS + b) + b(aba + ε)(aS + b)
  = a + baS + bb + babaaS + babab + baS + bb     (baS and bb each appear twice)
  = (ba + babaa)S + (a + bb + babab)

Therefore, S = (ba + babaa)*(a + bb + babab)
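The elimination steps can also be mechanized. The following is a minimal sketch, assuming each equation is stored as a map from nonterminals to left coefficients plus a constant term; all names (alt, cat, star, solve) are hypothetical, and the output uses Python's re syntax, so + is written | and groups are non-capturing. It eliminates the non-start variables one at a time, applying X = αX + β → X = α*β before each back substitution, and keeps coefficients on the left because catenation is not commutative.

```python
import re
from itertools import product

# Expressions are Python `re` strings; None stands for ø and "" for ε.
def alt(p, q):
    if p is None:
        return q
    if q is None:
        return p
    return f"(?:{p}|{q})"

def cat(p, q):
    return None if p is None or q is None else p + q

def star(p):
    return "" if p is None else f"(?:{p})*"

def solve(equations, start):
    """equations[X] = ({Y: coef}, const) encodes X = sum(coef Y) + const."""
    eqs = {x: (dict(c), k) for x, (c, k) in equations.items()}
    for x in [v for v in eqs if v != start] + [start]:
        coefs, const = eqs.pop(x)
        loop = star(coefs.pop(x, None))        # X = αX + β becomes X = α*β
        coefs = {y: cat(loop, c) for y, c in coefs.items()}
        const = cat(loop, const)
        if x == start:
            return const
        for y, (cy, ky) in list(eqs.items()):  # back substitute X everywhere
            if x in cy:
                a = cy.pop(x)
                for z, c in coefs.items():
                    cy[z] = alt(cy.get(z), cat(a, c))
                eqs[y] = (cy, alt(ky, cat(a, const)))

EQUATIONS = {
    "S": ({"U": "b", "R": "b"}, "a"),      # S = a + bU + bR
    "R": ({"U": alt("aba", "")}, None),    # R = (aba + ε) U
    "U": ({"S": "a"}, "b"),                # U = aS + b
}
computed = re.compile(solve(EQUATIONS, "S"))
by_hand = re.compile(r"(?:ba|babaa)*(?:a|bb|babab)")
words = ["".join(t) for n in range(9) for t in product("ab", repeat=n)]
agree = all(bool(computed.fullmatch(w)) == bool(by_hand.fullmatch(w))
            for w in words)
print(agree)  # True
```

The computed expression is not simplified, so it looks different from the hand-derived one; the final check only compares the two on all strings of length at most 8.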

Summarizing:

[Diagram: conversions among RGR, RGL, RE, NFA, DFA, and Minimum DFA; some conversions are marked “Done” and the others “Soon”.]

Regular Expression → NFA

Recursively build the FSA, mimicking the structure of the regular expression. Each FSA built has one start state and one final state.

Conversions:

• if ø

• if ε

• if a

• if P + Q

• if P·Q

[Diagrams: one FSA fragment per case. For ø, states 1 and 2 with no transition; for a, a transition labeled a from state 1 to state 2; for P + Q and P·Q, the fragments for P and Q connected by ε-transitions.]

• if P*

[Diagram: the fragment for P between states 1 and 2, connected by ε-transitions.]

Example: (b (aba + ε) a)*

[Figures: step-by-step construction of the NFA for (b (aba + ε) a)*, starting from fragments for the individual symbols a and b and combining them with ε-transitions; the states of the final NFA are numbered 1 to 15.]
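A bottom-up construction of this kind can be sketched in a few lines. The code below is an illustration under stated assumptions, not the exact fragments drawn on the slides: it follows the usual Thompson-style scheme, represents expressions as nested tuples (a hypothetical encoding), gives every fragment one start and one final state, and numbers states in the order they are created, so the numbering will not match the figures.

```python
from itertools import count

def build(node, fresh, edges):
    """Add an NFA fragment for `node` to `edges`; return (start, final).
    Edges are (source, label, target); label None stands for ε."""
    start, final = next(fresh), next(fresh)
    kind = node[0]
    if kind == "eps":
        edges.append((start, None, final))
    elif kind == "sym":
        edges.append((start, node[1], final))
    elif kind == "alt":                       # P + Q
        for part in node[1:]:
            s, f = build(part, fresh, edges)
            edges += [(start, None, s), (f, None, final)]
    elif kind == "cat":                       # P·Q
        s1, f1 = build(node[1], fresh, edges)
        s2, f2 = build(node[2], fresh, edges)
        edges += [(start, None, s1), (f1, None, s2), (f2, None, final)]
    elif kind == "star":                      # P*
        s, f = build(node[1], fresh, edges)
        edges += [(start, None, s), (f, None, s),
                  (f, None, final), (start, None, final)]
    # kind "empty" (ø) adds no edges, so `final` stays unreachable
    return start, final

def closure(states, edges):
    """States reachable from `states` through ε-transitions only."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for source, label, target in edges:
            if source == q and label is None and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen

def accepts(regex, word):
    edges = []
    start, final = build(regex, count(1), edges)
    current = closure({start}, edges)
    for ch in word:
        moved = {t for s, label, t in edges if s in current and label == ch}
        current = closure(moved, edges)
    return final in current

def sym(c):
    return ("sym", c)

ABA = ("cat", sym("a"), ("cat", sym("b"), sym("a")))
EXAMPLE = ("star", ("cat", sym("b"), ("cat", ("alt", ABA, ("eps",)), sym("a"))))

print(accepts(EXAMPLE, "babaa"))  # True: b · aba · a
print(accepts(EXAMPLE, "baba"))   # True: (b · ε · a)(b · ε · a)
print(accepts(EXAMPLE, "bab"))    # False
```

This version introduces two fresh states per operator, which keeps the code short at the cost of more ε-transitions than a hand-drawn automaton would need.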

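Regular Expression → NFA

ALGORITHM 2

Start with: a start state and a final state joined by a single transition labeled with the entire expression E.

Apply rules:

• A transition labeled a* is replaced by two ε-transitions through a new state that has a self-loop labeled a.

• A transition labeled a + b is replaced by two parallel transitions, labeled a and b.

• A transition labeled ab is replaced by two consecutive transitions, labeled a and b.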

Algorithm 1:
• Builds FSA bottom up
• Good for machines
• Bad for humans

Algorithm 2:
• Builds FSA top down
• Bad for machines
• Good for humans

(Arguable.)

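Example (Algorithm 2): (a + b)* (aa + bb)

[Figures: the single transition labeled (a + b)* (aa + bb) is split into (a + b)* followed by aa + bb; the star becomes two ε-transitions around a loop on a and b, and aa + bb becomes two parallel paths, a a and b b.]

Example (Algorithm 2): ba(a + b)* ab

[Figure: the resulting NFA, a path b, a, ε, ε, a, b with a loop on a and b in the middle.]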

Deterministic Finite-State Automata (DFA’s)

Definition: A deterministic FSA is defined just like an NFA, except that

δ: Q × Σ → Q, rather than δ: Q × (Σ ∪ {ε}) → 2^Q

Thus, both an ε-transition and two transitions out of the same state on the same symbol a are impossible.

Every transition of a DFA consumes a symbol. Fortunately, DFA’s are just as powerful as NFA’s.

Theorem: For every NFA there exists an equivalent (accepting the same language) DFA.

Conversion from NFA’s to DFA’s:

• “Simulate” all moves of the NFA with the DFA.

• The start state of the DFA is the start state of the NFA (say, S), together with all states that are ε-reachable from S.

• Each state in the DFA is a subset of the set of states of the NFA; it captures the notion of being in “any one of” a number of NFA states.

• New states in the DFA are constructed, starting from the start state, by calculating the set of NFA states reachable on each input symbol (followed by any ε-transitions).

• The final states in the DFA are those that contain any final state of the NFA.

Example: a*b + ba*

[Figure: NFA for a*b + ba*, with states 1 to 6, four ε-transitions, two transitions on a, and two on b.]

DFA

            Input
State       a         b
{1,2,3}     {2,3}     {4,5,6}
{2,3}       {2,3}     {6}
{4,5,6}     {5,6}     ---
{6}         ---       ---
{5,6}       {5,6}     ---

[Figure: transition diagram of this DFA.]
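The subset construction for this example can be sketched as follows. The NFA edges below are an assumption: they are one reconstruction of an NFA for a*b + ba* that is consistent with the transition table above, since the original diagram is not reproduced here. The function names and the dictionary encoding are likewise hypothetical.

```python
def eps_closure(states, eps):
    """All states reachable from `states` using ε-transitions only."""
    stack, seen = list(states), set(states)
    while stack:
        for nxt in eps.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return frozenset(seen)

def subset_construction(start, finals, delta, eps, alphabet):
    """Return (DFA start state, DFA transition table, DFA final states)."""
    start_set = eps_closure({start}, eps)
    table, todo = {}, [start_set]
    while todo:
        current = todo.pop()
        if current in table:
            continue
        table[current] = {}
        for symbol in alphabet:
            moved = set()
            for q in current:
                moved |= delta.get((q, symbol), set())
            target = eps_closure(moved, eps)
            if target:                      # an empty set means "undefined"
                table[current][symbol] = target
                todo.append(target)
    dfa_finals = {s for s in table if s & finals}
    return start_set, table, dfa_finals

# Assumed NFA: 1 -ε-> 2, 2 -a-> 2, 2 -ε-> 3, 3 -b-> 6   (the a*b branch)
#              1 -b-> 4, 4 -ε-> 5, 5 -a-> 5, 5 -ε-> 6   (the ba* branch)
EPS = {1: {2}, 2: {3}, 4: {5}, 5: {6}}
DELTA = {(2, "a"): {2}, (3, "b"): {6}, (1, "b"): {4}, (5, "a"): {5}}

start, table, finals = subset_construction(1, {6}, DELTA, EPS, "ab")
for state, row in table.items():
    print(sorted(state), {a: sorted(t) for a, t in row.items()})
```

Only subsets reachable from the start state are generated, which is why this DFA has five states rather than all 64 subsets of the six NFA states.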

In general, if the NFA has N states, the DFA can have as many as 2^N states.

Example: ba (a + b)* ab

[Figure: NFA for ba (a + b)* ab, with states 0 to 11.]

DFA

                      Input
State                 a                   b
{0}                   ---                 {1}
{1}                   {2,3,4,6,8,9}       ---
{2,3,4,6,8,9}         {3,4,5,6,8,9,10}    {3,4,6,7,8,9}
{3,4,5,6,8,9,10}      {3,4,5,6,8,9,10}    {3,4,6,7,8,9,11}
{3,4,6,7,8,9}         {3,4,5,6,8,9,10}    {3,4,6,7,8,9}
{3,4,6,7,8,9,11}      {3,4,5,6,8,9,10}    {3,4,6,7,8,9}

[Figure: transition diagram of this DFA.]

State Minimization

Theorem: Given a DFA M, there exists an equivalent DFA M’ that is minimal, i.e. no other equivalent DFA exists with fewer states than M’.

Definition: A partition of a set S is a set of subsets of S such that every element of S appears in exactly one of the subsets.

Example: S = {1, 2, 3, 4, 5}

Π1 = { {1, 2, 3, 4}, {5} }

Π2 = { {1, 2, 3}, {4}, {5} }

Π3 = { {1, 3}, {2}, {4}, {5} }

Note: Π2 is a refinement of Π1 , and Π3 is a refinement of Π2.
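The two notions can be checked mechanically. In the sketch below, refinement is taken in the usual sense (every group of the finer partition lies inside some group of the coarser one); that definition and the function names are assumptions of this example, since the slides introduce refinement only through the note above.

```python
def is_partition(blocks, universe):
    """Every element of `universe` appears in exactly one block."""
    elements = [x for block in blocks for x in block]
    return (len(elements) == len(set(elements))
            and set(elements) == set(universe))

def refines(finer, coarser):
    """Every block of `finer` lies inside some block of `coarser`."""
    return all(any(set(f) <= set(c) for c in coarser) for f in finer)

S = {1, 2, 3, 4, 5}
P1 = [{1, 2, 3, 4}, {5}]
P2 = [{1, 2, 3}, {4}, {5}]
P3 = [{1, 3}, {2}, {4}, {5}]

print(all(is_partition(p, S) for p in (P1, P2, P3)))      # True
print(refines(P2, P1), refines(P3, P2), refines(P1, P2))  # True True False
```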

Minimization Algorithm:

1. Remove all undefined transitions by introducing a TRAP state, i.e. a state from which no final state is reachable.

2. Partition all states into two groups (final states and non-final states).

3. Complete the “Next State” table for each group, by specifying transitions from group to group. Form the next partition: split groups in which Next State table entries differ. Repeat step 3 until no further splitting is possible.

4. Determine start and final states.

Example:

[Figure: the DFA to be minimized, with states 1 to 5 and transitions on a and b.]

Π0 = { {1, 2, 3, 4}, {5} }

State   a      b
1       1234   1234
2       1234   1234
3       1234   1234
4       1234   5
5       1234   1234

Split {4} from group {1, 2, 3, 4}.

Π1 = { {1, 2, 3}, {4}, {5} }

State   a     b
1       123   123
2       123   4
3       123   123
4       123   5
5       123   123

Split {2} from group {1, 2, 3}.

Π2 = { {1, 3}, {2}, {4}, {5} }

State   a   b
1       2   13
3       2   13
2       2   4
4       2   5
5       2   13

No more splitting is possible.

Minimal DFA

[Figure: the minimal DFA, with states 13, 2, 4, and 5.]
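The splitting procedure can be sketched as a short partition-refinement loop. The DFA below is a reconstruction and partly an assumption: the a-transitions, the transitions 2 → 4 and 4 → 5 on b, and the final state 5 follow from the Next State tables above, but the tables only show that states 1, 3, and 5 move on b into the group {1, 3}, so this sketch assumes they all go to state 3. The function name minimize is hypothetical, and the loop splits every group in one pass rather than one group at a time.

```python
def minimize(states, alphabet, delta, finals):
    """Return the groups of equivalent states of a complete DFA.
    `delta` must be total; add a trap state first if it is not (step 1)."""
    # Step 2: final states versus non-final states.
    partition = [g for g in (set(finals), set(states) - set(finals)) if g]
    while True:
        group_of = {q: i for i, g in enumerate(partition) for q in g}
        refined = []
        for group in partition:
            # Step 3: states stay together only if their "Next State"
            # rows, expressed in groups, are identical.
            rows = {}
            for q in sorted(group):
                row = tuple(group_of[delta[q, a]] for a in alphabet)
                rows.setdefault(row, set()).add(q)
            refined.extend(rows.values())
        if len(refined) == len(partition):   # no group was split
            return refined
        partition = refined

# Assumed DFA (see the lead-in for which transitions are reconstructed).
DELTA = {
    (1, "a"): 2, (1, "b"): 3,
    (2, "a"): 2, (2, "b"): 4,
    (3, "a"): 2, (3, "b"): 3,
    (4, "a"): 2, (4, "b"): 5,
    (5, "a"): 2, (5, "b"): 3,
}
groups = minimize({1, 2, 3, 4, 5}, "ab", DELTA, {5})
print(sorted(sorted(g) for g in groups))  # [[1, 3], [2], [4], [5]]
```

Each group becomes one state of the minimal DFA; the group containing the original start state is the new start state, and groups made of final states are final (step 4).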

Summary of Regular Languages

• Smallest class in the Chomsky hierarchy.

• Appropriate for lexical analysis.

• Four representations: RGR, RGL, RE and FSA.

• All four are equivalent; there are algorithms to perform transformations among them.

• Various advantages and disadvantages among these four, for language designer, implementor, and user.

• FSA’s can be made deterministic, and minimal.