
Seungjin Choi - POSTECH, mlg.postech.ac.kr/~seungjin/courses/automata/handouts/handout03.pdf

Date post: 10-Aug-2018

Regular Expressions

Seungjin Choi

Department of Computer Science and Engineering
Pohang University of Science and Technology

77 Cheongam-ro, Nam-gu, Pohang 37673, Korea
[email protected]

1 / 27


Outline

- Regular expressions: one type of language-defining notation

- Regular expressions and finite automata

- Regular grammars (will be discussed in Lecture 4)

2 / 27


Operators

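- Union: $L \cup M = \{w \mid w \in L \text{ or } w \in M\}$.

- Concatenation: $LM = \{w \mid w = xy,\ x \in L,\ y \in M\}$.

- Powers: $L^0 = \{\varepsilon\}$, $L^1 = L$, $L^{n+1} = L L^n$.

- Kleene closure (star-closure): $L^* = L^0 \cup L^1 \cup L^2 \cup \cdots = \bigcup_{i=0}^{\infty} L^i$.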
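The operators can be tried out on finite languages. The sketch below assumes a language is represented as a Python set of strings, with ε written as the empty string `""`; the function names are hypothetical. Because L* is infinite whenever L contains a non-empty string, `star` only returns the strings of L* up to a chosen length.

```python
def union(L, M):
    """L ∪ M."""
    return set(L) | set(M)

def concat(L, M):
    """LM = {xy | x in L, y in M}."""
    return {x + y for x in L for y in M}

def power(L, n):
    """L^0 = {ε} and L^(k+1) = L L^k, applied n times."""
    result = {""}                      # ε is the empty string ""
    for _ in range(n):
        result = concat(L, result)
    return result

def star(L, max_len):
    """The strings of L* whose length is at most max_len."""
    result, frontier = {""}, {""}      # start from L^0 = {ε}
    while frontier:
        # append one more factor from L; keep only new, short-enough strings
        frontier = {w for w in concat(frontier, L) if len(w) <= max_len} - result
        result |= frontier
    return result

L = {"a", "bc"}
print(sorted(power(L, 2)))                              # ['aa', 'abc', 'bca', 'bcbc']
print(sorted(star(L, 3), key=lambda w: (len(w), w)))    # ['', 'a', 'aa', 'bc', 'aaa', 'abc', 'bca']
```

The truncation keeps `star` terminating: every new string is at most `max_len` long over a finite set of symbols, so the frontier eventually becomes empty.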

3 / 27


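$\emptyset^i$

Question: What are $\emptyset^0$, $\emptyset^i$, and $\emptyset^*$?

- $\emptyset^0 = \{\varepsilon\}$, by the definition of $L^0$.

- $\emptyset^i$ for $i \ge 1$ is empty: $\emptyset^i = \emptyset\,\emptyset^{i-1}$, and concatenation needs a string from each factor, but we cannot select any string from the empty set.

- $\emptyset^* = \emptyset^0 \cup \emptyset^1 \cup \emptyset^2 \cup \cdots = \{\varepsilon\}$.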

4 / 27


Regular Expressions

- An FA (DFA or NFA) is a "blueprint" for constructing a machine that recognizes a regular language (machine-like description).

- A regular expression is a "user-friendly", declarative way of describing a regular language (algebraic description).

- A regular expression involves a combination of:
  - strings of symbols from Σ,
  - parentheses, and
  - the operators +, ·, and ∗ (union, concatenation, and star-closure).

5 / 27


Examples

- {a, b, c} (regular language) ⟺ a + b + c (regular expression).

- (a + b · c)∗ represents the star-closure of {a} ∪ {bc}, which is {ε, a, bc, aa, abc, bca, bcbc, aaa, . . .}.

- 01∗ + 10∗: the language consisting of all strings that are either a single 0 followed by any number of 1's, or a single 1 followed by any number of 0's.

- Regular expressions are used in practice, e.g., in the UNIX grep command and in the UNIX Lex (lexical analyzer generator) and Flex (fast lex) tools.
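Practical tools use their own regular-expression syntaxes, which differ from the textbook notation. As one illustration, assuming Python's `re` module: union + is written `|`, concatenation is juxtaposition, and ∗ stays `*` (in `re`, `+` means "one or more", not union). `fullmatch` requires the whole string to match, which corresponds to membership in L(r). The pattern and helper names are hypothetical.

```python
import re

# (a + b·c)*  ->  (a|bc)*        01* + 10*  ->  01*|10*
STAR_A_OR_BC = re.compile(r"(a|bc)*")
ZERO_ONES_OR_ONE_ZEROS = re.compile(r"01*|10*")   # | binds loosest: (01*)|(10*)

def in_language(pattern, w):
    """Membership test: the whole of w must match, not just a prefix or substring."""
    return pattern.fullmatch(w) is not None

print([w for w in ["", "a", "bc", "abc", "bca", "b", "cb"] if in_language(STAR_A_OR_BC, w)])
# ['', 'a', 'bc', 'abc', 'bca']
print([w for w in ["0", "0111", "1", "1000", "0110", ""] if in_language(ZERO_ONES_OR_ONE_ZEROS, w)])
# ['0', '0111', '1', '1000']
```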

6 / 27


Inductive Definition of Regular Expressions

Definition. Let Σ be a given alphabet. Then

1. ∅, ε, and a ∈ Σ are all regular expressions. These are called primitive regular expressions.

2. If $r_1$ and $r_2$ are regular expressions, so are $r_1 + r_2$, $r_1 \cdot r_2$, $r_1^*$, and $(r_1)$.

3. A string is a regular expression if and only if it can be derived from primitive regular expressions by a finite number of applications of the rules described above.
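The inductive definition maps directly onto a recursive data type. A minimal sketch in Python, where the class names `Empty`, `Eps`, `Sym`, `Alt`, `Cat`, and `Star` and the printer `show` are hypothetical: each primitive expression is a leaf, each operator of rule 2 is a node, and parentheses are implicit in the tree shape, so `show` writes them back out.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Empty:          # primitive: ∅
    pass

@dataclass(frozen=True)
class Eps:            # primitive: ε
    pass

@dataclass(frozen=True)
class Sym:            # primitive: a symbol a ∈ Σ
    a: str

@dataclass(frozen=True)
class Alt:            # rule 2: r1 + r2
    r1: "Regex"
    r2: "Regex"

@dataclass(frozen=True)
class Cat:            # rule 2: r1 · r2
    r1: "Regex"
    r2: "Regex"

@dataclass(frozen=True)
class Star:           # rule 2: r1*
    r1: "Regex"

Regex = Union[Empty, Eps, Sym, Alt, Cat, Star]

def show(r: Regex) -> str:
    """Render r fully parenthesized, making rule 2's (r1) explicit."""
    if isinstance(r, Empty):
        return "∅"
    if isinstance(r, Eps):
        return "ε"
    if isinstance(r, Sym):
        return r.a
    if isinstance(r, Alt):
        return f"({show(r.r1)} + {show(r.r2)})"
    if isinstance(r, Cat):
        return f"({show(r.r1)} · {show(r.r2)})"
    if isinstance(r, Star):
        return f"({show(r.r1)})*"
    raise TypeError(f"not a regular expression: {r!r}")

# (a + b·c)* built from primitives by finitely many applications of rule 2
r = Star(Alt(Sym("a"), Cat(Sym("b"), Sym("c"))))
print(show(r))   # ((a + (b · c)))*
```

Rule 3 corresponds to the fact that every value of this type is a finite tree, built bottom-up from leaves.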

7 / 27


Language L(r)

Definition. The language L(r) denoted by any regular expression r is defined by the following rules:

1. ∅ is a regular expression denoting the empty set.

2. ε is a regular expression denoting {ε}.

3. For every a ∈ Σ, a is a regular expression denoting {a}.

4. $L(r_1 + r_2) = L(r_1) \cup L(r_2)$.

5. $L(r_1 r_2) = L(r_1)\,L(r_2)$.

6. $L((r_1)) = L(r_1)$.

7. $L(r_1^*) = (L(r_1))^*$.
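The seven rules translate into a recursive evaluator. The sketch below assumes regular expressions encoded as nested tuples (an illustrative encoding: `("empty",)`, `("eps",)`, `("sym", "a")`, `("alt", r1, r2)`, `("cat", r1, r2)`, `("star", r1)`) and, since L(r) may be infinite, returns only the strings of length at most `n`. Rule 6 needs no case of its own because the tuple nesting already plays the role of parentheses. The name `lang` is hypothetical.

```python
def lang(r, n):
    """The strings of L(r) with length <= n, following rules 1-7."""
    op = r[0]
    if op == "empty":                      # rule 1: L(∅) = ∅
        return set()
    if op == "eps":                        # rule 2: L(ε) = {ε}
        return {""}
    if op == "sym":                        # rule 3: L(a) = {a}
        return {r[1]} if len(r[1]) <= n else set()
    if op == "alt":                        # rule 4: union
        return lang(r[1], n) | lang(r[2], n)
    if op == "cat":                        # rule 5: concatenation (short factors suffice)
        return {x + y for x in lang(r[1], n) for y in lang(r[2], n) if len(x + y) <= n}
    if op == "star":                       # rule 7: (L(r1))*, grown one factor at a time
        base = lang(r[1], n)
        result, frontier = {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base if len(x + y) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown operator {op!r}")

# a* · (a + b)
r = ("cat", ("star", ("sym", "a")), ("alt", ("sym", "a"), ("sym", "b")))
print(sorted(lang(r, 3), key=lambda w: (len(w), w)))   # ['a', 'b', 'aa', 'ab', 'aaa', 'aab']
```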

8 / 27


Example

Exhibit the language $L(a^* \cdot (a + b))$ in set notation.

Solution.

$$
\begin{aligned}
L(a^* \cdot (a + b)) &= L(a^*)\,L(a + b) \\
&= (L(a))^*\,(L(a) \cup L(b)) \\
&= \{\varepsilon, a, aa, aaa, \ldots\}\,\{a, b\} \\
&= \{a, aa, aaa, \ldots, b, ab, aab, \ldots\}.
\end{aligned}
$$

9 / 27


Another Example

Given Σ = {a, b} and r = (a + b)∗(a + bb), find L(r).

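$$
\begin{aligned}
L(r) &= L((a + b)^*)\,L(a + bb) \\
&= (L(a + b))^*\,(L(a) \cup L(bb)) \\
&= (L(a) \cup L(b))^*\,(L(a) \cup (L(b))^2) \\
&= \{a, b\}^*\,(\{a\} \cup \{bb\}) \\
&= \{a, b\}^*\,\{a, bb\} \\
&= \{\varepsilon, a, b, aa, ab, ba, bb, aaa, aab, \ldots\}\,\{a, bb\} \\
&= \{a, bb, aa, abb, ba, bbb, \ldots\}.
\end{aligned}
$$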

(a + b)∗ represents any string of a’s and b’s.

{a}∗ = {ε, a, aa, . . .}.
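Put in words, L(r) is the set of strings over {a, b} that end in a or in bb. A quick brute-force check of that reading, assuming Python's `re` syntax (`|` for +) and all strings up to length 8; this is a test on short strings, not a proof. The helper `all_strings` is hypothetical.

```python
import re
from itertools import product

R = re.compile(r"(a|b)*(a|bb)")   # (a + b)*(a + bb)

def all_strings(alphabet, max_len):
    """All strings over `alphabet` of length <= max_len, shortest first."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)

# w ∈ L(r) exactly when w ends in "a" or in "bb"
for w in all_strings("ab", 8):
    assert (R.fullmatch(w) is not None) == (w.endswith("a") or w.endswith("bb")), w

print(sorted((w for w in all_strings("ab", 3) if R.fullmatch(w)), key=lambda w: (len(w), w)))
# ['a', 'aa', 'ba', 'bb', 'aaa', 'aba', 'abb', 'baa', 'bba', 'bbb']
```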

10 / 27


More Examples

1. Given r = (aa)∗(bb)∗b, determine L(r).

The set of strings with an even number of a’s followed by an odd number of b’s, i.e.,

$L(r) = \{a^{2n} b^{2m+1} \mid n \ge 0,\ m \ge 0\}$.

2. For Σ = {0, 1}, give a regular expression r such that

L(r) = {w ∈ Σ∗ |w has at least one pair of consecutive zeros}.

r = (0 + 1)∗00(0 + 1)∗.

3. Find a regular expression for

L = {w ∈ {0, 1}∗ |w has no pair of consecutive zeros}.

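$$
r = (1^*011^*)^*\underbrace{(0 + \varepsilon)}_{\text{ending in } 0} + \underbrace{1^*(0 + \varepsilon)}_{\text{all } 1\text{'s}}.
$$

Alternatively, we can view L as repetitions of the strings 1 and 01, optionally followed by a final 0, i.e., r = (1 + 01)∗(0 + ε).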
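The three answers can be sanity-checked by brute force. A sketch, assuming Python's `re` syntax (`|` for +, and `0?` for (0 + ε)), that compares each regular expression with a direct test of the stated property on all strings up to a fixed length; agreement on short strings is evidence, not a proof. The helper names are hypothetical.

```python
import re
from itertools import product

def strings(alphabet, max_len):
    """All strings over `alphabet` of length <= max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)

def matches(pattern, w):
    return re.fullmatch(pattern, w) is not None

def even_a_then_odd_b(w):
    """True when w = a^(2n) b^(2m+1) for some n, m >= 0."""
    rest = w.lstrip("a")
    i = len(w) - len(rest)                 # length of the leading run of a's
    return set(rest) <= {"b"} and len(rest) % 2 == 1 and i % 2 == 0

# Example 1: (aa)*(bb)*b
for w in strings("ab", 8):
    assert matches(r"(aa)*(bb)*b", w) == even_a_then_odd_b(w), w

# Example 2: (0 + 1)*00(0 + 1)*
for w in strings("01", 10):
    assert matches(r"(0|1)*00(0|1)*", w) == ("00" in w), w

# Example 3: both answers for "no pair of consecutive zeros"
for w in strings("01", 10):
    no_00 = "00" not in w
    assert matches(r"(1*011*)*0?|1*0?", w) == no_00, w
    assert matches(r"(1|01)*0?", w) == no_00, w
```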

11 / 27


Connection between Regular Expressions and Regular Languages

For every regular language, there is a regular expression and vice versa.

Theorem. Let r be a regular expression. Then there exists some NFA that accepts L(r). Consequently, L(r) is a regular language.

The proof is given in the next several slides.

12 / 27


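Proof. We begin with automata that accept the languages of the simple regular expressions ∅, ε, and a ∈ Σ.

[Figure: NFAs with initial state q0 and final state q1 for the primitive expressions. The NFA accepting ∅ has no transitions; the NFA accepting ε has an ε-transition from q0 to q1; the NFA accepting a has an a-transition from q0 to q1. A schematic M(r) stands for an NFA that accepts L(r).]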

13 / 27


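[Figure: Automaton for L(r1 + r2). ε-transitions lead from a new initial state into M(r1) and into M(r2), and from the final states of M(r1) and M(r2) into a new final state.]

[Figure: Automaton for L(r1r2). ε-transitions lead from a new initial state into M(r1), from the final state of M(r1) into M(r2), and from the final state of M(r2) into a new final state.]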

14 / 27


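[Figure: Automaton for $L(r_1^*)$. M(r1) sits between a new initial state and a new final state, with four ε-transitions that let the automaton enter M(r1), leave it for the final state, bypass it entirely (accepting ε), and repeat it.]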

Without loss of generality, we consider NFAs with a single final state.

Just as we built automata for $L(r_1 + r_2)$, $L(r_1 r_2)$, and $L(r_1^*)$, we can build automata for arbitrary regular expressions by applying these constructions recursively. ∎
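Because the construction is recursive, it can be sketched directly in code. The version below, in the spirit of what is usually called Thompson's construction, assumes regular expressions encoded as nested tuples (`("sym", "a")`, `("alt", r1, r2)`, `("cat", r1, r2)`, `("star", r1)`, and so on), gives every fragment a single initial and a single final state, and uses `None` as the ε label. The placement of the four ε-moves for star is one common choice and may differ in detail from the figure; all function names are hypothetical.

```python
from itertools import count

EPS = None  # label used for ε-transitions

def build(r, delta, fresh):
    """Append the fragment for r to delta; return its (initial, final) states."""
    s, f = next(fresh), next(fresh)
    op = r[0]
    if op == "empty":                      # no transition: accepts ∅
        pass
    elif op == "eps":                      # s --ε--> f
        delta.append((s, EPS, f))
    elif op == "sym":                      # s --a--> f
        delta.append((s, r[1], f))
    elif op == "alt":                      # ε into M(r1) and M(r2), ε out of both
        for sub in (r[1], r[2]):
            s1, f1 = build(sub, delta, fresh)
            delta += [(s, EPS, s1), (f1, EPS, f)]
    elif op == "cat":                      # s -ε-> M(r1) -ε-> M(r2) -ε-> f
        s1, f1 = build(r[1], delta, fresh)
        s2, f2 = build(r[2], delta, fresh)
        delta += [(s, EPS, s1), (f1, EPS, s2), (f2, EPS, f)]
    elif op == "star":                     # enter, leave, bypass, and repeat M(r1)
        s1, f1 = build(r[1], delta, fresh)
        delta += [(s, EPS, s1), (f1, EPS, f), (s, EPS, f), (f1, EPS, s1)]
    else:
        raise ValueError(f"unknown operator {op!r}")
    return s, f

def to_nfa(r):
    delta = []
    start, final = build(r, delta, count())
    return delta, start, final

def eps_closure(states, delta):
    """All states reachable from `states` using only ε-transitions."""
    seen, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for src, label, dst in delta:
            if src == p and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(nfa, w):
    delta, start, final = nfa
    current = eps_closure({start}, delta)
    for ch in w:
        moved = {dst for src, label, dst in delta if src in current and label == ch}
        current = eps_closure(moved, delta)
    return final in current

# (a + b·c)*
nfa = to_nfa(("star", ("alt", ("sym", "a"), ("cat", ("sym", "b"), ("sym", "c")))))
print([accepts(nfa, w) for w in ["", "a", "bc", "abca", "b", "cb"]])
# [True, True, True, True, False, False]
```

The simulation tracks the set of states the NFA could be in, taking ε-closures after every step, so no conversion to a DFA is needed.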

15 / 27


Example

Find an NFA which accepts L(r) where r = (a + bb)∗(ba∗ + ε).
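One possible answer, found by hand rather than by running the mechanical construction (which would produce many more states), is a three-state NFA with hypothetical state names q0, q1, q2. Here q0 handles (a + bb)∗, q1 is the middle of bb, and q2 handles ba∗. The sketch cross-checks it against Python's `re` on all strings over {a, b} up to length 8.

```python
import re
from itertools import product

# Hand-built NFA for (a + bb)*(ba* + ε); state names are illustrative.
DELTA = {
    ("q0", "a"): {"q0"},          # a from (a + bb)*
    ("q0", "b"): {"q1", "q2"},    # first b of bb, or the b of ba*
    ("q1", "b"): {"q0"},          # second b of bb
    ("q2", "a"): {"q2"},          # the a* of ba*
}
START, FINAL = "q0", {"q0", "q2"}  # q0 final covers the ε alternative

def nfa_accepts(w):
    current = {START}
    for ch in w:
        current = set().union(*(DELTA.get((q, ch), set()) for q in current))
    return bool(current & FINAL)

R = re.compile(r"(a|bb)*(ba*)?")   # (ba* + ε) written as (ba*)?
for n in range(9):
    for t in product("ab", repeat=n):
        w = "".join(t)
        assert nfa_accepts(w) == (R.fullmatch(w) is not None), w
```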

16 / 27


Generalized Transition Graph

A generalized transition graph is a transition graph whose edges are labeled with regular expressions.

[Figure: a generalized transition graph with edge labels a, a + b, and c∗; it accepts $L(a^* + a^*(a + b)c^*)$.]

17 / 27


Remove State q

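[Figure: states qi, q, and qj, with an edge a from qi to q, an edge b from q to qj, an edge c from qj to q, an edge d from q to qi, and a self-loop e on q.]

After removing state q:

[Figure: states qi and qj, with an edge ae∗b from qi to qj, an edge ce∗d from qj to qi, a self-loop ae∗d on qi, and a self-loop ce∗b on qj.]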
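The rule is mechanical enough to code. A minimal sketch, assuming a generalized transition graph stored as a dictionary from (source, target) pairs to label strings, with at most one edge per ordered pair (parallel edges merged with +). The names `eliminate` and `par` are hypothetical, and the string handling only adds parentheses conservatively rather than simplifying the result.

```python
def par(r):
    """Parenthesize a label of more than one symbol before concatenating or starring it."""
    return r if len(r) == 1 else f"({r})"

def eliminate(edges, q):
    """Remove state q: each path p -> q -> s is replaced by an edge p -> s."""
    loop = edges.get((q, q))
    mid = f"{par(loop)}*" if loop else ""        # e*, or nothing if q has no self-loop
    ins = [(p, r) for (p, t), r in edges.items() if t == q and p != q]
    outs = [(s, r) for (t, s), r in edges.items() if t == q and s != q]
    new = {k: v for k, v in edges.items() if q not in k}   # edges not touching q
    for p, r_in in ins:
        for s, r_out in outs:
            label = par(r_in) + mid + par(r_out)
            old = new.get((p, s))
            new[(p, s)] = f"{old} + {label}" if old else label   # merge parallel edges
    return new

# The slide's example: qi -a-> q, q -b-> qj, qj -c-> q, q -d-> qi, self-loop e on q
edges = {("qi", "q"): "a", ("q", "qj"): "b", ("qj", "q"): "c",
         ("q", "qi"): "d", ("q", "q"): "e"}
print(eliminate(edges, "q"))
# {('qi', 'qj'): 'ae*b', ('qi', 'qi'): 'ae*d', ('qj', 'qj'): 'ce*b', ('qj', 'qi'): 'ce*d'}
```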

18 / 27


Canonical Form

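[Figure: states qi (initial) and qj (final), with a self-loop r1 on qi, an edge r2 from qi to qj, an edge r3 from qj to qi, and a self-loop r4 on qj.]

The associated regular expression is $r = r_1^* r_2 (r_4 + r_3 r_1^* r_2)^*$.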
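The formula can be checked on a small instance. A sketch, assuming single-symbol labels r1 = a, r2 = b, r3 = c, r4 = d (an illustrative choice that makes the two-state graph deterministic), which compares a direct simulation of the graph with the canonical expression in Python `re` syntax on all strings over {a, b, c, d} up to length 6. The names `STEP` and `graph_accepts` are hypothetical.

```python
import re
from itertools import product

# Canonical graph with r1 = a (loop on qi), r2 = b (qi -> qj),
# r3 = c (qj -> qi), r4 = d (loop on qj); qi initial, qj final
STEP = {("qi", "a"): "qi", ("qi", "b"): "qj", ("qj", "c"): "qi", ("qj", "d"): "qj"}

def graph_accepts(w):
    state = "qi"
    for ch in w:
        state = STEP.get((state, ch))
        if state is None:                  # no edge for this symbol: reject
            return False
    return state == "qj"

# r = r1* r2 (r4 + r3 r1* r2)*  with r1..r4 = a, b, c, d
R = re.compile(r"a*b(d|ca*b)*")

for n in range(7):
    for t in product("abcd", repeat=n):
        w = "".join(t)
        assert graph_accepts(w) == (R.fullmatch(w) is not None), w
```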

19 / 27


Regular Language and Regular Expression

Theorem. Let L be a regular language. Then there exists a regular expression r such that L = L(r).

Proof. Let M be an NFA that accepts L. Without loss of generality, we can assume that M has only one final state and that q0 ∉ F. We interpret the graph of M as a generalized transition graph and apply the construction illustrated in the previous slides: we remove a state q with the rule above, and continue removing one state after another until we reach the canonical form. The regular expression then has the form given on the previous slide. ∎

20 / 27


Example. Find a regular expression for

$L = \{w \in \{a, b\}^* \mid n_a(w) \text{ is even and } n_b(w) \text{ is odd}\}$.

The associated DFA is given by

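[Figure: a DFA with states EE, OE, OO, and EO. The first letter records whether the number of a's read so far is even (E) or odd (O), and the second letter does the same for b's; each a flips the first letter and each b flips the second. EE is the initial state and EO is the final state.]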

21 / 27


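The canonical graph is of the form

[Figure: states EE (initial) and EO (final), with a self-loop a(bb)∗a on EE, an edge b + a(bb)∗ba from EE to EO, an edge b + ab(bb)∗a from EO to EE, and a self-loop aa + ab(bb)∗ba on EO.]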
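Substituting these labels for r1 (loop on EE), r2 (EE to EO), r3 (EO to EE), and r4 (loop on EO) into the canonical form r = r1∗ r2 (r4 + r3 r1∗ r2)∗ gives a regular expression for L. A brute-force check of that result, assuming Python `re` syntax (`|` for +) and all strings over {a, b} up to length 10, which is evidence rather than a proof:

```python
import re
from itertools import product

# Edge labels of the canonical graph, written with | for +
r1 = "a(bb)*a"          # loop on EE
r2 = "b|a(bb)*ba"       # EE -> EO
r3 = "b|ab(bb)*a"       # EO -> EE
r4 = "aa|ab(bb)*ba"     # loop on EO

# Canonical form r = r1* r2 (r4 + r3 r1* r2)*, with every label parenthesized
pattern = re.compile(f"({r1})*({r2})(({r4})|({r3})({r1})*({r2}))*")

for n in range(11):
    for t in product("ab", repeat=n):
        w = "".join(t)
        expected = w.count("a") % 2 == 0 and w.count("b") % 2 == 1
        assert (pattern.fullmatch(w) is not None) == expected, w

print(pattern.pattern)
```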

22 / 27


Algebraic Laws: Commutativity and Associativity

- Commutative law for union: r1 + r2 = r2 + r1.

- Concatenation is not commutative: in general, r1r2 ≠ r2r1 (e.g., ab ≠ ba).

- Associative law for union: (r1 + r2) + r3 = r1 + (r2 + r3).

- Associative law for concatenation: (r1r2)r3 = r1(r2r3).
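Laws like these can be spot-checked by comparing both sides on all short strings. A sketch, assuming Python's `re` syntax (`|` for +); the helper `disagreement` (a hypothetical name) returns the first string on which two patterns differ, or `None` if they agree up to the chosen length, which is evidence rather than a proof.

```python
import re
from itertools import product

def disagreement(p, q, alphabet="ab", max_len=6):
    """First string (shortest first) on which patterns p and q disagree, or None."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            w = "".join(t)
            if (re.fullmatch(p, w) is None) != (re.fullmatch(q, w) is None):
                return w
    return None

print(disagreement("a|b*", "b*|a"))          # None: union commutes
print(disagreement("ab", "ba"))              # prints ab: concatenation does not commute
print(disagreement("(a|b)|ab", "a|(b|ab)"))  # None: union is associative
print(disagreement("(ab)b*", "a(bb*)"))      # None: concatenation is associative
```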

23 / 27


Algebraic Laws: Identities and Annihilators

- Identities (∅ for union, ε for concatenation):

  ∅ + r = r + ∅ = r

  ε r = r ε = r

- Annihilator (∅ for concatenation):

  ∅ r = r ∅ = ∅
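In the set model these laws reduce to facts about set union and string concatenation: ε is the empty string, which leaves every string unchanged, and concatenation with ∅ has no string to pick from the empty factor. A small sketch, assuming languages are represented as Python sets of strings; the sample languages are arbitrary, so this illustrates the laws rather than proving them.

```python
EMPTY = set()   # L(∅)
EPS = {""}      # L(ε): the set containing only the empty string

def concat(L, M):
    """LM = {xy | x in L, y in M}."""
    return {x + y for x in L for y in M}

samples = [set(), {""}, {"a"}, {"a", "ab", "bba"}]
for L in samples:
    assert EMPTY | L == L | EMPTY == L                     # ∅ + r = r + ∅ = r
    assert concat(EPS, L) == concat(L, EPS) == L           # εr = rε = r
    assert concat(EMPTY, L) == concat(L, EMPTY) == EMPTY   # ∅r = r∅ = ∅
print("identity and annihilator laws hold on all samples")
```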

24 / 27


Algebraic Laws: Distributive Laws

- Left distributive law of concatenation over union:

  r1(r2 + r3) = r1r2 + r1r3

- Right distributive law of concatenation over union:

  (r2 + r3)r1 = r2r1 + r3r1

25 / 27


Algebraic Laws: Idempotent Law

Definition. An operator is said to be idempotent if applying it to two copies of the same value yields that value.

- Common arithmetic operators are not idempotent: in general, x + x ≠ x and x × x ≠ x.

- In regular expressions, union is idempotent (idempotent law for union):

  r + r = r

26 / 27


Algebraic Laws: Laws Involving Closures

- $(r^*)^* = r^*$

- $\emptyset^* = \varepsilon$

- $\varepsilon^* = \varepsilon$

- $r^+ = r r^* = r^* r$

- $r^* = r^+ + \varepsilon$
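The closure laws that involve a general r can be spot-checked the same way. A sketch, assuming Python's `re` syntax, where `+` already means one-or-more (the r⁺ of this slide) and `(...)?` marks an optional part (the + ε). It compares both sides on all strings over {a, b} up to length 6 for a few sample r, so it is a test, not a proof. The laws for ∅∗ and ε∗ are not checked, since `re` has no dedicated notation for ∅. The helper `disagreement` is hypothetical.

```python
import re
from itertools import product

def disagreement(p, q, alphabet="ab", max_len=6):
    """First string (shortest first) on which patterns p and q disagree, or None."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            w = "".join(t)
            if (re.fullmatch(p, w) is None) != (re.fullmatch(q, w) is None):
                return w
    return None

for r in ["a", "ab", "a|bb", "(ab)*b"]:
    star = f"({r})*"
    assert disagreement(f"({star})*", star) is None          # (r*)* = r*
    assert disagreement(f"({r})+", f"({r})({r})*") is None   # r+ = rr*
    assert disagreement(f"({r})+", f"({r})*({r})") is None   # r+ = r*r
    assert disagreement(star, f"(({r})+)?") is None          # r* = r+ + ε
```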

27 / 27

