Chapter 3 Regular Expressions and Languages



Outline
– 3.1 Regular Expressions
– 3.2 Finite Automata & Regular Expressions
– 3.3 Applications of RE’s
– 3.4 Algebraic Laws for RE’s

3.1 Regular Expressions

Use of regular expressions
– A regular expression is a kind of generator for languages.
– It offers a “declarative” way of expressing strings of symbols.
– Regular expressions define all and only the regular languages (a theorem).

Applications of regular expressions
– Used as commands for finding strings in Web browsers or text-formatting systems (such as the UNIX grep command).*
– Used in lexical-analyzer generators (such as Lex or Flex). A lexical analyzer breaks source programs into “tokens” (keywords, identifiers, signs, …).

* The grep command searches files or standard input globally for lines matching a given regular expression, and prints them to the program's standard output.

Operators of Regular Expressions
– Review of three operations on languages L and M:
   Union --- L ∪ M = {x | x ∈ L or x ∈ M}
   Concatenation --- LM = {xy | x ∈ L, y ∈ M}
   – Example --- L^0 = {ε}, L^1 = L, L^2 = LL, …
   Closure (or star, or Kleene closure) --- L* = L^0 ∪ L^1 ∪ L^2 ∪ ...

Example 3.1 --- (language)
– ∅* = {ε} because ∅^0 = {ε}.
– If L = {0, 1}, then L^0 = {ε}, L^1 = L, L^2 = {00, 01, 10, 11}, …
– If L is the set of all strings of 0’s, then it can be proved that L* is L itself (see the textbook for the proof).
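The three language operations can be tried out on small finite sets. The following is a minimal sketch, not part of the slides: the names `concat`, `power`, and `star_up_to` are illustrative, and because L* is infinite in general, `star_up_to` only computes the finite approximation L^0 ∪ … ∪ L^max_power.

```python
from itertools import product

EPS = ""  # the empty string ε

def concat(L, M):
    """Concatenation LM = {xy | x in L, y in M}."""
    return {x + y for x, y in product(L, M)}

def power(L, i):
    """L^i, with L^0 = {ε}."""
    result = {EPS}
    for _ in range(i):
        result = concat(result, L)
    return result

def star_up_to(L, max_power):
    """Finite approximation of L*: L^0 ∪ L^1 ∪ ... ∪ L^max_power."""
    result = set()
    for i in range(max_power + 1):
        result |= power(L, i)
    return result

L = {"0", "1"}
print(sorted(power(L, 2)))    # ['00', '01', '10', '11']
print(star_up_to(set(), 5))   # {''}, i.e. ∅* = {ε}
```

Union needs no helper here, since Python's built-in set union `L | M` already implements it.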

3.1.2 Building Regular Expressions
– Recursive definition of a regular expression (RE) E and the language which it defines, L(E):
   Basis:
   – Constants ε and ∅ are RE’s, defining languages {ε} and ∅, respectively: L(ε) = {ε}, L(∅) = ∅.
   – If a is a symbol, then a is an RE, defining the language {a}: L(a) = {a}. (Note: the RE a is set in boldface.)
   – A variable like L (capitalized and italic) represents any language.

– Recursive definition of an RE (cont’d):
   Induction: given two RE’s E and F,
   – E + F is an RE such that L(E + F) = L(E) ∪ L(F) (union)
   – EF is an RE such that L(EF) = L(E)L(F) (concatenation)
   – E* is an RE such that L(E*) = (L(E))* (closure)
   – (E) is an RE such that L((E)) = L(E) (parenthesization).

Examples (supplemental) (1/4)
– RE F = 1 “expresses” the language L(1) = {1}.
– RE E = 1*
   Language expressed by E ---
   L = L(E) = L(1*) = (L(1))* = ({1})* (closure of language)
     = {ε, 1, 11, 111, 1111, …}
     = {1^n | n ≥ 0}

Examples (supplemental) (2/4)
– RE G = 01*
   Language expressed by G ---
   L = L(G) = L(01*) = L(0)L(1*) (concatenation)
     = {0}{ε, 1, 11, 111, 1111, …}
     = {0, 01, 011, 0111, …}
     = {01^n | n ≥ 0}

Examples (supplemental) (3/4)
– RE H = 1 + 01*
   Language expressed by H ---
   L = L(H) = L(1 + 01*) = L(1) ∪ L(01*)
     = {1} ∪ {0, 01, 011, 0111, …}
     = {1, 0, 01, 011, 0111, …}
     = {1} ∪ {01^n | n ≥ 0}

Examples (supplemental) (4/4)
– RE K = ε + a*
   Language expressed by K ---
   L = L(K) = L(ε + a*) = L(ε) ∪ L(a*)
     = {ε} ∪ {ε, a, aa, aaa, …}
     = {ε, a, aa, aaa, …}
     = L(a*)
   That is, we have the following RE equalities:
   ε + a* = a* = a* + ε

Example 3.2
– An RE defining the language of strings of alternating 0’s and 1’s (including the empty string) is either of the two below:
   (01)* + (10)* + 0(10)* + 1(01)*
   (the four terms cover strings of the forms 0…1, 1…0, 0…0, and 1…1, respectively)
   (ε + 1)(01)*(ε + 0)
   (Why? See the textbook.)
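One way to gain confidence that the two RE’s of Example 3.2 describe the same language is to compare them on all short strings. The sketch below is illustrative, not from the slides. It assumes Python's `re` syntax, where union is written `|` and ε + 1 is written `1?`. The helpers `alternating` and `all_strings` are hypothetical names, and the check only covers strings up to length 8.

```python
import re
from itertools import product

first = re.compile(r"(01)*|(10)*|0(10)*|1(01)*")
second = re.compile(r"1?(01)*0?")   # (ε + 1)(01)*(ε + 0)

def alternating(w):
    """True if no two adjacent symbols of w are equal."""
    return all(a != b for a, b in zip(w, w[1:]))

def all_strings(alphabet, max_len):
    """Yield every string over alphabet of length 0..max_len."""
    for n in range(max_len + 1):
        for tup in product(alphabet, repeat=n):
            yield "".join(tup)

# Strings on which the two RE's and the direct definition do not all agree.
mismatches = [
    w for w in all_strings("01", 8)
    if not (bool(first.fullmatch(w)) == bool(second.fullmatch(w)) == alternating(w))
]
print(mismatches)  # []
```

An empty list is evidence, not a proof; the proof is the one referred to in the textbook.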

3.1.3 Precedence of RE operators
– Precedence
   Highest --- * (closure)
   Next --- . (concatenation) (left to right)
   Last --- + (union) (left to right)
   Use parentheses anywhere to resolve ambiguity

– Example 3.3:
   Three ways to interpret 01* + 1:
   (0(1*)) + 1   by the precedence above (= 01* + 1)
   (01)* + 1     (another meaning)
   0(1* + 1)     (a third meaning)
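The three readings of Example 3.3 accept different strings, which a few test strings make concrete. This sketch assumes Python's `re` module, whose operator precedence (star, then concatenation, then `|`) matches the precedence above; the variable names are illustrative.

```python
import re

by_precedence = re.compile(r"01*|1")     # parsed as (0(1*)) | 1
grouped_star = re.compile(r"(01)*|1")    # (01)* + 1
grouped_union = re.compile(r"0(1*|1)")   # 0(1* + 1)

# For each test string, show which of the three readings accept it.
for w in ["1", "0111", "0101", ""]:
    verdicts = [bool(p.fullmatch(w))
                for p in (by_precedence, grouped_star, grouped_union)]
    print(repr(w), verdicts)
```

For instance, 0101 is accepted only under the second reading, and 1 is rejected only under the third.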

3.2 FA’s & RE’s

Theorems to be proved:
– Every language defined by a DFA is also defined by an RE.
– Every language defined by an RE is also defined by an ε-NFA.

Relations of theorems (yellow lines are to be proved):
[Figure: conversion diagram with nodes ε-NFA, NFA, and DFA]

3.2.1 From DFA’s to RE’s
– Theorem 3.4:
   If L = L(A) for some DFA A, then there is an RE R such that L = L(R).
   Proof. Progressively construct the string sets defined by RE’s of a certain form, R_ij^(k), until the entire set of accepted strings (i.e., the language L(A)) is obtained. Assume the states are {1, 2, ..., n} (1 is the start state).

Meaning of R_ij^(k) ---
– RE R_ij^(k) denotes the set of strings w such that
   each w is the label of a path from state i to state j in DFA A;
   the path has no intermediate node whose number is larger than k.
– To construct R_ij^(k), we use induction, starting at k = 0 and stopping at k = n (the largest state number).
   When k = n, i = 1, and j is an accepting state, R_ij^(k) defines a set of strings accepted by DFA A, each string labeling a path from the start state to that accepting state.

Basis:
– When k = 0, all state numbers are ≥ 1, and so there is no intermediate state on the path from i to j, leading to 2 cases:
   (1) an arc (a transition) from i to j;
   (2) a path of length 0 from i to i itself.

Basis (cont’d):
– If i ≠ j, only (1) is possible, leading to 3 cases:
   no symbol for such a transition: R_ij^(0) = ∅
   one symbol a for the transition: R_ij^(0) = a
   multiple symbols a1, a2, ..., am for the transition: R_ij^(0) = a1 + a2 + ... + am
[Figure: arcs from q_i to q_j labeled a and a1 + … + am]

Basis (cont’d):
– If i = j, case (2) always applies: there is at least the path of length 0 from i to i itself (labeled ε), in addition to the 3 cases for a self-loop:
   no symbol for such a transition: R_ii^(0) = ε
   one symbol a for the transition: R_ii^(0) = ε + a
   multiple symbols a1, a2, ..., am for the transition: R_ii^(0) = ε + a1 + a2 + ... + am
[Figure: self-loops on q_i labeled ε, ε + a, and ε + a1 + … + am]

Induction (to compute R_ij^(k)):
– Suppose there is a path from i to j that goes through no state numbered higher than k. Then, two cases should be considered:
   (1) the path does not go through k at all: its label is in R_ij^(k-1);
   (2) the path goes through k at least once; then the path may be broken into 3 pieces:
      – from i to k without passing through k: R_ik^(k-1);
      – from k to k itself, zero or more times: (R_kk^(k-1))* (recursive);
      – from k to j without passing through k: R_kj^(k-1).

Illustration of paths represented by R_ij^(k):
[Figure: path i → k → … → k → j; the first piece is R_ik^(k-1), the loops at k are (R_kk^(k-1))* (circulating zero or more times), and the last piece is R_kj^(k-1)]

Induction (cont’d):
– The three pieces are concatenated to be
   R_ik^(k-1) (R_kk^(k-1))* R_kj^(k-1).
– Combining (1) & (2), we get the RE defining “all the paths from i to j that go through no state higher than k” as
   R_ij^(k) = R_ij^(k-1) + R_ik^(k-1) (R_kk^(k-1))* R_kj^(k-1).

Induction (cont’d):
– Construct R_ij^(k) in increasing order of k until k = n; at k = n only the terms with i = 1 are needed.
– Taking every accepting state j_k, we get the union below as the result:
   R_1j1^(n) + R_1j2^(n) + ... + R_1jm^(n)
   where {j1, j2, ..., jm} is the set of final states, F.
(End of proof of theorem)
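The recurrence in the proof can be exercised directly if each R_ij^(k) is represented by the set of strings it denotes, truncated to a maximum length, rather than by an RE. The following is a sketch under that assumption, not the textbook's construction of the expression itself. The names `kleene_bounded`, `cat`, `star`, and `accepts` are illustrative, and the sample transition table is the two-state DFA of Example 3.5 (state 1 loops on 1 and moves to state 2 on 0; state 2 loops on 0 and 1).

```python
from itertools import product

def kleene_bounded(n, delta, start, finals, max_len):
    """Evaluate the R_ij^(k) recurrence on languages truncated to length <= max_len."""
    def cat(A, B):
        return {x + y for x in A for y in B if len(x + y) <= max_len}

    def star(A):
        result, frontier = {""}, {""}
        while frontier:                      # stops once no new string appears
            frontier = cat(frontier, A) - result
            result |= frontier
        return result

    states = range(1, n + 1)
    # Basis (k = 0): labels of single arcs, plus ε when i == j.
    R = {(i, j): {a for (p, a), q in delta.items() if p == i and q == j}
                 | ({""} if i == j else set())
         for i in states for j in states}
    # Induction: R_ij^(k) = R_ij^(k-1) + R_ik^(k-1) (R_kk^(k-1))* R_kj^(k-1)
    for k in states:
        R = {(i, j): R[i, j] | cat(cat(R[i, k], star(R[k, k])), R[k, j])
             for i in states for j in states}
    # Union of R_1j^(n) over all final states j.
    return set().union(*(R[start, f] for f in finals))

def accepts(delta, start, finals, w):
    """Run the DFA on w directly."""
    state = start
    for a in w:
        state = delta[state, a]
    return state in finals

delta = {(1, "1"): 1, (1, "0"): 2, (2, "0"): 2, (2, "1"): 2}
lang = kleene_bounded(2, delta, 1, {2}, max_len=6)
expected = {w for n in range(7) for w in map("".join, product("01", repeat=n))
            if accepts(delta, 1, {2}, w)}
print(lang == expected)  # True
```

Truncation keeps every set finite while staying exact for strings up to `max_len`, because a short string in a concatenation can only be built from short pieces.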

Example 3.5
– Convert the following DFA into an RE.
[Figure: DFA with start state 1 and final state 2; state 1 loops on 1 and goes to state 2 on 0; state 2 loops on 0 and 1]
– R_ij^(0) may be constructed to be (details on the next page):
   R_11^(0)   ε + 1
   R_12^(0)   0
   R_21^(0)   ∅
   R_22^(0)   (ε + 0 + 1)

Example 3.5 (cont’d)
– R_11^(0) = ε + 1 because δ(1, 1) = 1 & going back to itself
– R_12^(0) = 0 because δ(1, 0) = 2 (going out to state 2)
– R_21^(0) = ∅ because there is no arc from state 2 to 1
– R_22^(0) = (ε + 0 + 1) because δ(2, 0) = 2 & δ(2, 1) = 2 & going back to itself
– We can then compute all R_ij^(k) for k = 1 & k = 2.
– However, we may alternatively compute only the necessary terms of R_ij^(k), working backward from the final states, to save time.

– There is only one final state, 2, so we only have to compute
   R_12^(2) = R_12^(1) + R_12^(1) (R_22^(1))* R_22^(1).
– We only have to compute R_12^(1) and R_22^(1), without computing R_21^(1) and R_11^(1).
– To compute each of these terms, we need some RE equalities to simplify intermediate results.

Some equalities (R is an RE):
1. ∅R = R∅ = ∅ (∅ = annihilator for concatenation)
2. ∅ + R = R + ∅ = R (∅ = identity for union)
3. εR = Rε = R (ε = identity for concatenation)
4. (ε + a)* = a* = (a + ε)*
5. (ε + a)a* = a* + aa* = a* + a^+ = a*
6. a*(ε + a) = a* + a*a = a* + a^+ = a*
(all provable by easy deduction)

To compute R_12^(2) = R_12^(1) + R_12^(1) (R_22^(1))* R_22^(1):
– R_12^(1) = R_12^(0) + R_11^(0) (R_11^(0))* R_12^(0)
   = 0 + (ε + 1)(ε + 1)*0 (by substitutions)
   = 0 + (ε + 1)1*0 (by 4. on the last page)
   = 0 + 1*0 (by 5.)
   = (ε + 1*)0 (by distributive law)
   = 1*0 (since ε + 1* = 1*)

– R_22^(1) = R_22^(0) + R_21^(0) (R_11^(0))* R_12^(0)
   = (ε + 0 + 1) + ∅(ε + 1)*0 (by substitutions)
   = (ε + 0 + 1) + ∅ (by 1.)
   = ε + 0 + 1 (by 2.)

– Finally, R_12^(2) = R_12^(1) + R_12^(1) (R_22^(1))* R_22^(1)
   = 1*0 + 1*0(ε + 0 + 1)*(ε + 0 + 1) (by subst.)
   = 1*0 + 1*0(0 + 1)*(ε + 0 + 1) (by 4., with a = 0 + 1)
   = 1*0 + 1*0(0 + 1)* (by a*(ε + a) = a*, with a = 0 + 1)
   = 1*0(ε + (0 + 1)*) (by distributive law)
   = 1*0(0 + 1)* (since ε + (0 + 1)* = (0 + 1)*)

Check the correctness of the final result
   R_12^(2) = 1*0(0 + 1)*
   correct (by looking at the diagram directly)!
The above method also works for NFA’s and ε-NFA’s.
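The final expression can also be checked mechanically against the DFA on all short strings. This is a sketch under stated assumptions: the transition table is the DFA of Example 3.5 as described in the derivation, the RE is written in Python `re` syntax (`|` for union), and `dfa_accepts` is an illustrative helper.

```python
import re
from itertools import product

# δ of the Example 3.5 DFA: start state 1, final state 2.
delta = {(1, "1"): 1, (1, "0"): 2, (2, "0"): 2, (2, "1"): 2}

def dfa_accepts(w, start=1, finals=frozenset({2})):
    state = start
    for symbol in w:
        state = delta[state, symbol]
    return state in finals

result_re = re.compile(r"1*0(0|1)*")   # 1*0(0 + 1)*

# Strings of length <= 8 on which the DFA and the RE disagree.
disagreements = [
    w
    for n in range(9)
    for w in map("".join, product("01", repeat=n))
    if dfa_accepts(w) != bool(result_re.fullmatch(w))
]
print(disagreements)  # []
```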

3.2.2 Converting DFA’s to RE’s by Eliminating States --- another way
– Step 1 – regard symbols on arcs as RE’s
– Step 2 – conduct the following conversion
– Step 3 – collect RE’s for all the final states
(for a complete diagram of this, see textbook)
[Fig. 3.7 (partial): state s with self-loop S, an arc Q1 from q1 to s, an arc P1 from s to q2, and a direct arc R11 from q1 to q2]
[Fig. 3.8 (partial): after eliminating s, the arc from q1 to q2 is labeled R11 + Q1S*P1]

Details of Step 3:
(1) For each final state q, eliminate all states as above except q and the start state q0.
(2) If q ≠ q0, then a 2-state automaton is left as follows:
[Fig. 3.9: start state q0 with self-loop R, arc S from q0 to q, self-loop U on q, and arc T from q back to q0]
   The corresponding RE is (R + SU*T)*SU* (provable by the first method).

(3) If q = q0, then perform one more state elimination to remove the other remaining state, leaving only the start state q0 as follows (see an example on the next page):
[Fig. 3.10: start state q0 with self-loop R]
   The corresponding RE is R*.
(4) Collect the result for each final state derived as above to get the final result.

An example of Case (3) on the last page (supplemental)
– Regard q0 as two separate states (playing the roles of q1 and q2 in Fig. 3.7) and q as s, and apply Figs. 3.7 & 3.8 to eliminate q as follows:
[Figure: two-state automaton with start state q0 (self-loop X), arc Y from q0 to q, self-loop V on q, and arc Z from q back to q0. In terms of Fig. 3.7 (partial): S = V, Q1 = Y, P1 = Z, R11 = X. In terms of Fig. 3.8 (partial): the resulting loop on q0 is R11 + Q1S*P1 = X + YV*Z]

An example of Case (3) on the last page (supplemental) (cont’d)
– Use the result R11 + Q1S*P1 = X + YV*Z as R in Fig. 3.10, like the following:
[Figure: start state q0 with self-loop R = X + YV*Z]
– And the final result is R* = (X + YV*Z)*.
– This will be used in your homework.

Example 3.5 revisited
– Apply the formula for the 2-state automaton described previously directly, with R = 1, S = 0, U = 0 + 1, and T = ∅:
   (R + SU*T)*SU* = (1 + 0(0 + 1)*∅)*0(0 + 1)*
                  = 1*0(0 + 1)*   correct!
[Figure: the DFA of Example 3.5 (states 1 and 2) matched against the two-state automaton of Fig. 3.9]

Example 3.6
– Step 1: regard all symbols on the arcs as RE’s; we get
[Figure: start state A with self-loop 0 + 1; arcs A → B labeled 1, B → C labeled 0 + 1, C → D labeled 0 + 1; C and D are final states]
– Step 2: to remove B, use the conversion R11 + Q1S*P1; we get
   s = B, q1 = A, q2 = C, S = ∅, Q1 = 1, P1 = 0 + 1, R11 = ∅,
   so R11 + Q1S*P1 = ∅ + 1∅*(0 + 1) = 1ε(0 + 1) = 1(0 + 1)
[Figure: after removing B: start state A with self-loop 0 + 1; arcs A → C labeled 1(0 + 1), C → D labeled 0 + 1]

Example 3.6 (cont’d)
– For final state D, we have to remove C further, resulting in
   s = C, q1 = A, q2 = D, S = ∅, Q1 = 1(0 + 1), P1 = 0 + 1, R11 = ∅,
   so R11 + Q1S*P1 = ∅ + 1(0 + 1)∅*(0 + 1) = 1(0 + 1)(0 + 1)
[Figure: before removing C: A → C labeled 1(0 + 1), C → D labeled 0 + 1; afterwards the arc A → D is labeled R11 + Q1S*P1]

Example 3.6 (cont’d)
– By the following conversion, we get
   R = (0 + 1), q1 = A, q2 = D, S = 1(0 + 1)(0 + 1), T = ∅, U = ∅,
   so (R + SU*T)*SU* = (0 + 1 + 1(0 + 1)(0 + 1)∅*∅)*(1(0 + 1)(0 + 1))∅*
                     = (0 + 1)*1(0 + 1)(0 + 1)
[Figure: start state A with self-loop 0 + 1 and arc A → D labeled 1(0 + 1)(0 + 1), matched against the two-state automaton with states q1 and q2]

Example 3.6 (cont’d)
– For the other final state C, starting from the following diagram
[Figure: start state A with self-loop 0 + 1; arcs A → C labeled 1(0 + 1), C → D labeled 0 + 1]
   we have to eliminate D using the conversion R11 + Q1S*P1.

Example 3.6 (cont’d)
– Since D has no successor (and C before it is a final state), deleting D has no effect on the other parts, resulting in the following diagram.
[Figure: start state A with self-loop 0 + 1 and arc A → C labeled 1(0 + 1), matched against the two-state automaton with states q1 and q2]
   And by the following conversion, we get
   (R + SU*T)*SU* = (0 + 1 + 1(0 + 1)∅*∅)*(1(0 + 1))∅*
                  = (0 + 1)*1(0 + 1)

Example 3.6 (cont’d)
– The final result is a sum of the previous two derivation results:
   (0 + 1)*1(0 + 1) + (0 + 1)*1(0 + 1)(0 + 1)
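Read term by term, the final RE of Example 3.6 accepts the strings whose second symbol from the end is 1 (first term) or whose third symbol from the end is 1 (second term). The sketch below checks that reading on all strings up to length 8. It assumes Python `re` syntax, and the predicate name is illustrative.

```python
import re
from itertools import product

result = re.compile(r"(0|1)*1(0|1)|(0|1)*1(0|1)(0|1)")

def second_or_third_from_end_is_1(w):
    return (len(w) >= 2 and w[-2] == "1") or (len(w) >= 3 and w[-3] == "1")

# Strings on which the RE and the plain-language description disagree.
bad = [
    w
    for n in range(9)
    for w in map("".join, product("01", repeat=n))
    if bool(result.fullmatch(w)) != second_or_third_from_end_is_1(w)
]
print(bad)  # []
```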

3.2.3 Converting RE’s to Automata
– Theorem 3.7 Every language defined by an RE is also defined by an FA.
   Proof.
   Basis. There are three cases, as shown below.
[Figure: automata for the three basis cases RE = ε, RE = ∅, and RE = a]

   Induction. Three cases need to be considered.
   (1) RE = R + S
   (2) RE = RS
   (3) RE = R*
[Figure: automata constructed for RE = R + S, RE = RS, and RE = R*]

– Example 3.8 (see Fig. 3.18 in the textbook).
   Convert RE (0 + 1)*1(0 + 1) into an ε-NFA.
   (a) 0 + 1
   (b) (0 + 1)*
   (c) (0 + 1)*1(0 + 1)
   Connect every two parts by an ε-transition.
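The inductive construction can be sketched as a handful of combinators that glue ε-NFA fragments together with ε-transitions. This is an illustrative sketch rather than the textbook's exact figures: the class `NFA` and the functions `symbol`, `union`, `concat`, `star`, `eps_closure`, `accepts`, and `bit` are hypothetical names, each fragment has a single start and a single accepting state, and a fragment must not be reused in two places because its states would be shared.

```python
from itertools import count, product

_ids = count()  # source of fresh state numbers

class NFA:
    """An ε-NFA fragment with one start and one accepting state."""
    def __init__(self):
        self.start, self.accept = next(_ids), next(_ids)
        self.edges = []  # (source, label, target); label None means ε

def symbol(a):
    n = NFA()
    n.edges.append((n.start, a, n.accept))
    return n

def union(r, s):  # RE = R + S
    n = NFA()
    n.edges = r.edges + s.edges + [
        (n.start, None, r.start), (n.start, None, s.start),
        (r.accept, None, n.accept), (s.accept, None, n.accept)]
    return n

def concat(r, s):  # RE = RS: link R's accepting state to S's start by ε
    n = NFA()
    n.start, n.accept = r.start, s.accept
    n.edges = r.edges + s.edges + [(r.accept, None, s.start)]
    return n

def star(r):  # RE = R*
    n = NFA()
    n.edges = r.edges + [
        (n.start, None, r.start), (r.accept, None, r.start),
        (r.accept, None, n.accept), (n.start, None, n.accept)]
    return n

def eps_closure(nfa, states):
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for p, label, t in nfa.edges:
            if p == q and label is None and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def accepts(nfa, w):
    current = eps_closure(nfa, {nfa.start})
    for a in w:
        moved = {t for p, label, t in nfa.edges if p in current and label == a}
        current = eps_closure(nfa, moved)
    return nfa.accept in current

def bit():
    return union(symbol("0"), symbol("1"))  # a fresh fragment for 0 + 1

nfa = concat(concat(star(bit()), symbol("1")), bit())  # (0 + 1)*1(0 + 1)
print(all(accepts(nfa, w) == (len(w) >= 2 and w[-2] == "1")
          for n in range(8) for w in map("".join, product("01", repeat=n))))  # True
```

The acceptance test simulates the ε-NFA with ε-closures instead of converting it to a DFA first, which keeps the sketch short.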

3.3 Applications of RE’s

Two examples of uses of RE’s
– Lexical analysis
– Text search

3.3.1 RE’s in UNIX
– RE’s used in UNIX are extended versions of RE’s, allowing non-regular languages to be recognized.

– Rules for character classes:
   The symbol . (dot) means any character.
   [a1a2…ak] means a1 + a2 + … + ak
   [a1-ak] means [a1a2…ak]
   e.g., [0-9] means [0 1 … 9], i.e., 0 + 1 + … + 9
         [A-Z] means A + B + … + Z
         [A-Za-z0-9] means the set of all letters and digits
         [-+.0-9] means the characters for forming signed numbers
   Special notations
   e.g., [:digit:] = [0-9], [:alpha:] = [A-Za-z], [:alnum:] = [A-Za-z0-9]

– Operators used in UNIX:
   | as union, in place of + in RE
   ? as “zero or one of”, like R? meaning ε + R
   + as “one or more of”, like R+ meaning RR* (= R^+)
   {n} as “n copies of”, like R{5} meaning RRRRR (= R^5)
– * is still used in UNIX.
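The UNIX-style operators behave the same way in Python's `re` module, so the equivalences above can be spot-checked there. This is a small sketch: `same_language` is an illustrative helper that only compares two patterns on a hand-picked sample list, so it can refute an equivalence but not prove one.

```python
import re

def same_language(p, q, samples):
    """True if patterns p and q agree on every sample string."""
    return all(bool(re.fullmatch(p, w)) == bool(re.fullmatch(q, w))
               for w in samples)

samples = ["", "a", "aa", "aaa", "aaaaa", "aaaaaa", "b", "ab"]

print(same_language(r"a+", r"aa*", samples))            # True: R+ is RR*
print(same_language(r"a{5}", r"aaaaa", samples))        # True: R{5} is RRRRR
print(same_language(r"[ab]", r"a|b", samples))          # True: [a1a2] is a1 + a2
print([w for w in samples if re.fullmatch(r"a?", w)])   # ['', 'a']: R? is ε + R
```

Note that Python's `re` does not support the POSIX names such as `[:digit:]`; there `[0-9]` or `\d` would be written instead.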

3.3.2 Lexical analysis
– Example recalled (in Chapter 1)
   ’[A-Z][a-z]*[ ][A-Z][A-Z]’
   means the following RE
   (A+B+…+Z)(a+b+…+z)*_(A+B+…+Z)(A+B+…+Z)
   where _ means a blank.
   The above can be used to represent addresses like Ithaca NY, Buffalo NY, …

– Each rule given to the UNIX command lex or flex has the form:
   UNIX-style RE   {code for lexical analyzer generation}
– Examples
   else                   {return(ELSE);}
   [A-Za-z][A-Za-z0-9]*   {code to enter the found identifier in the symbol table; return(ID);}
   >=                     {return(GE);}
   …

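How such rules drive a lexical analyzer can be sketched in a few lines of Python. This is not lex itself: `RULES` and `tokenize` are illustrative names, the blank-skipping rule is an added assumption, and symbol-table handling is omitted. The sketch follows the usual lex convention of taking the longest match and, on ties, the rule listed first.

```python
import re

RULES = [                                # (pattern, token name), in priority order
    (r"else", "ELSE"),
    (r"[A-Za-z][A-Za-z0-9]*", "ID"),
    (r">=", "GE"),
    (r"[ ]+", None),                     # assumed extra rule: skip blanks
]

def tokenize(text):
    pos, tokens = 0, []
    while pos < len(text):
        best = None
        for pattern, name in RULES:
            m = re.compile(pattern).match(text, pos)
            # Keep the longest match; on ties the earlier rule wins.
            if m and (best is None or m.end() > best[0].end()):
                best = (m, name)
        if best is None or best[0].end() == pos:
            raise ValueError(f"no rule matches at position {pos}")
        m, name = best
        if name is not None:
            tokens.append((name, m.group()))
        pos = m.end()
    return tokens

print(tokenize("else elsewhere >= x1"))
# [('ELSE', 'else'), ('ID', 'elsewhere'), ('GE', '>='), ('ID', 'x1')]
```

The keyword rule is listed before the identifier rule so that `else` is reported as ELSE, while the longest-match rule still lets `elsewhere` become an identifier.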
3.3.3 Finding Patterns in Text
– We can use RE’s in UNIX for pattern search in Web pages
– Example: UNIX RE for addresses (incomplete)
   ’[0-9]+[A-Z]?[ ][A-Z][a-z]*([ ][A-Z][a-z]*)*[ ](Street|St\.|Avenue|Ave\.|Road|Rd\.)’
   e.g., 123A Main Street, 20 Ta Hsueh Rd., …
– Notes:
   1. there is an inconsistency in the textbook; blanks should be replaced by [ ] (see p. 4 & p. 113 in the textbook)
   2. the backslash is used to differentiate a real dot from the dot used for “any character”
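The address pattern can be tried as written, since Python's `re` module accepts the same syntax for these operators. A short sketch follows; the two rejected sample strings are made up for illustration.

```python
import re

address = re.compile(
    r"[0-9]+[A-Z]?[ ][A-Z][a-z]*([ ][A-Z][a-z]*)*[ ]"
    r"(Street|St\.|Avenue|Ave\.|Road|Rd\.)"
)

# The first two samples are the slide's examples; the last two should fail.
for text in ["123A Main Street", "20 Ta Hsueh Rd.", "Main Street", "20 main St."]:
    print(text, bool(address.fullmatch(text)))
```

The last two samples are rejected because the pattern requires a leading house number and capitalized street words, which illustrates why the slide calls the pattern incomplete.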

3.4 Algebraic Laws for RE’s

Purpose:
– To derive “high-level” algebraic laws for equivalent RE’s
   Two RE’s are said to be equivalent if the languages they define are identical.
   The RE’s to be discussed include variables, instead of just constants like ε, 0, 1, a, 01, …

3.4.1 Associativity & Commutativity
– Assume L, M, and N are RE’s (variables)
– Commutative law for union
   L + M = M + L
– Associative law for union
   (L + M) + N = L + (M + N)
– Associative law for concatenation
   (LM)N = L(MN)
   (Note: the commutative law for concatenation is false)

3.4.2 Identities and Annihilators
– ∅: identity for union (∅ + L = L + ∅ = L)
– U: annihilator for union (U + L = L + U = U)
– ε: identity for concatenation (εL = Lε = L)
– ∅: annihilator for concatenation (∅L = L∅ = ∅)

3.4.3 Distributive Laws
– Left distributive law of concatenation over union
   L(M + N) = LM + LN
– Right distributive law of concatenation over union
   (M + N)L = ML + NL
Note: U in Section 3.4.2 denotes the universal language.

3.4.4 The Idempotent Law
– Idempotent law for union
   L + L = L
Note: “idempotent” is a mathematical term: applying the operation to a value and itself leaves the value unchanged.

3.4.5 Laws Involving Closures
– (L*)* = L*
– ∅* = ε* = ε
– L^+ = L*L = LL*
– (L + M)* = (L*M*)*
– L* = L^+ + ε (easy)
– L? = ε + L (definition of ? given before)
(for proofs, see the textbook)

3.4.6 & 3.4.7 Discovering Laws for RE’s and A Test for an RE Algebraic Law
– It can be proved that
   (L + M)* = (L*M*)* is true iff (a + b)* = (a*b*)* is true
– That is, replace variables in an RE equality with single symbols, and check if the resulting concrete RE equality can be proved to be true; if so, then the original RE equality is also true.
   Proof. By Theorems 3.13 and 3.14. For details, see the textbook. (iff = if and only if)
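The concrete equality (a + b)* = (a*b*)* can be explored by computing both languages up to a length bound. The sketch below is a bounded sanity check, not the proof the test calls for: `cat`, `star`, and the bound `N` are illustrative, and languages are represented as finite sets of strings of length at most `N`.

```python
N = 6  # only strings of length <= N are compared

def cat(A, B):
    """Concatenation of two languages, truncated to length N."""
    return {x + y for x in A for y in B if len(x + y) <= N}

def star(A):
    """Closure of a language, truncated to length N."""
    result, frontier = {""}, {""}
    while frontier:                      # stops once no new string appears
        frontier = cat(frontier, A) - result
        result |= frontier
    return result

a, b = {"a"}, {"b"}
lhs = star(a | b)                        # (a + b)*
rhs = star(cat(star(a), star(b)))        # (a*b*)*
print(lhs == rhs)                        # True

# A false "law": commutativity of concatenation, ab versus ba.
print(cat(a, b) == cat(b, a))            # False
```

A mismatch within the bound refutes a proposed law outright, whereas agreement only suggests that the concrete equality is worth proving.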

3.4.7a Some RE Equalities (supplemental)
– ∅* = ε* = ε
– rr* = r*r
– r* = r*r* = (r*)* = r* + r*
– r* = ε + rr* = ε + r*r = ε + r* = (ε + r)* = (ε + r)r*
– r* = (r + r^2 + … + r^k)* (k ≥ 1)
(for proofs, see the text and exercises of Chapter 6 in my Chinese textbook)

– r* = ε + r + r^2 + … + r^(k-1) + r^k r* (k ≥ 1)
– (p + q)* = (p* + q*)* = (p*q*)* = p*(qp*)* = (p*q)*p*
– (pq)*p = p(qp)*
– (p*q)* = ε + (p + q)*q
– (pq*)* = ε + p(p + q)*
(for proofs, see the text and exercises of Chapter 6 in my Chinese textbook)
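Several of the equalities in p and q can be spot-checked by instantiating p = a and q = b and comparing the languages up to a length bound. As before this is only a sketch: `cat`, `star`, `laws`, and the bound `N` are illustrative names, and agreement up to length `N` is supporting evidence, not a proof.

```python
N = 6  # only strings of length <= N are compared

def cat(A, B):
    """Concatenation of two languages, truncated to length N."""
    return {x + y for x in A for y in B if len(x + y) <= N}

def star(A):
    """Closure of a language, truncated to length N."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = cat(frontier, A) - result
        result |= frontier
    return result

p, q, eps = {"a"}, {"b"}, {""}

laws = {
    "(p+q)* = (p*+q*)*": star(p | q) == star(star(p) | star(q)),
    "(p+q)* = (p*q*)*":  star(p | q) == star(cat(star(p), star(q))),
    "(p+q)* = p*(qp*)*": star(p | q) == cat(star(p), star(cat(q, star(p)))),
    "(p+q)* = (p*q)*p*": star(p | q) == cat(star(cat(star(p), q)), star(p)),
    "(pq)*p = p(qp)*":   cat(star(cat(p, q)), p) == cat(p, star(cat(q, p))),
    "(p*q)* = ε+(p+q)*q": star(cat(star(p), q)) == eps | cat(star(p | q), q),
    "(pq*)* = ε+p(p+q)*": star(cat(p, star(q))) == eps | cat(p, star(p | q)),
}
for name, holds in laws.items():
    print(name, holds)
```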
