Brian Mitchell ([email protected]) - Drexel University MCS680-FCS
Patterns, Automata & Regular Expressions
int MSTWeight(int size, int graph[size][size]) {
    int i, j;
    int weight = 0;                    /* O(1) */
    for (i = 0; i < size; i++)         /* n iterations */
        for (j = 0; j < size; j++)     /* n iterations */
            weight += graph[i][j];     /* O(1) per iteration */
    return weight;                     /* O(1) */
}
Running Time = 2·O(1) + O(n^2) = O(n^2)
MCS680: Foundations of Computer Science
Introduction
• A pattern is a set of objects with some recognizable property
  – Example: programming language identifiers
    • Start with a letter, which may be followed by zero or more letters, digits, or underscores
    • [a-z|A-Z][a-z|A-Z|0-9|’_’]*
• Problems with patterns
  – Definition of patterns
  – Recognition of patterns
• Uses for patterns
  – Programming language design
  – Circuit design
  – Text editors
    • Searching for words
  – Operating system command processors
    • dir *.exe
State Machines and Automata
• Programs that search for patterns in data often have a special structure
• They track progress toward an overall goal
  – Manage states
• The overall behavior of the program can be viewed as moving from state to state as it reads its input
• We can use a graph to represent the behavior of programs that search for patterns in data
  – The graph is called an automaton
  – Special nodes
    • Start node
    • Accepting nodes (may be more than one)
  – Edges are called transitions
• The input is not accepted if we are not at an accepting node after all of the input is read
Example Automaton (Finite State Machine)
• Consider an automaton to recognize a sequence of characters that contains the characters ‘aeiou’ in order
  – Let Λ (Lambda) be the entire alphabet of acceptable characters. In this example, a-z and A-Z
    • Sigma (Σ) is also used in many texts to represent the alphabet
[Figure: states 0 through 5 in a row, with transitions 0→1 on a, 1→2 on e, 2→3 on i, 3→4 on o, 4→5 on u; states 0 through 4 carry self-loops labeled Λ-a, Λ-e, Λ-i, Λ-o, Λ-u respectively]
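The automaton above can be sketched directly as a loop over the input. This is a minimal sketch, not the course's code: the function name has_vowels_in_order is hypothetical, state 5 is assumed to be the accepting state, and only lowercase vowels are matched.

```c
#include <stdbool.h>

/* Returns true if s contains a, e, i, o, u in that order
   (not necessarily adjacent). Hypothetical helper name. */
bool has_vowels_in_order(const char *s) {
    const char *targets = "aeiou";   /* label on the edge leaving states 0..4 */
    int state = 0;                   /* start state */
    for (; *s != '\0' && state < 5; s++) {
        if (*s == targets[state])
            state++;                 /* awaited vowel: move to the next state */
        /* any other character: take the self-loop and stay put */
    }
    return state == 5;               /* assumed accepting state */
}
```

Calling has_vowels_in_order("facetious") walks through all six states, while "education" stalls in state 1 because no e follows its a.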
Example Automaton (Finite State Machine)
• Consider an automaton to recognize a signed integer
  – May begin with a ‘+’, a ‘-’, or a digit in the range 0-9
  – Followed by zero or more occurrences of the digits 0-9
  – Let Λ (Lambda) be the entire alphabet of acceptable characters. In this example, 0-9
[Figure: automaton with states 0, 1, 2, 3; two of its edges are labeled ‘+’ and ‘-’]
Deterministic and Nondeterministic Automata
• Deterministic automata
  – For any state s and any input x there is at most one transition out of state s whose label includes x
  – Simulating a deterministic automaton
    • Given that we are in state s and the next input is x
      – We either transition out of state s, or
      – We “die” at state s
  – Easy to convert a deterministic automaton into a program
• Nondeterministic automata
  – Nondeterministic automata are allowed (but not required) to have two or more transitions containing the same symbol out of the same state
  – Nondeterministic automata are allowed to have ε-moves, where we transition out of a state without consuming any input (the transition is labeled with the empty string, ε)
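To illustrate how a deterministic automaton turns into a program, here is a minimal sketch for the signed-integer pattern. The state names (START, SIGN, DIGITS, DEAD) and the function names are assumptions and do not correspond to the numbered states in the slide's figure. The sketch requires at least one digit, which is a stricter reading than "may begin with a sign".

```c
#include <stdbool.h>

/* Hypothetical state names; DEAD models "we die at state s". */
enum State { START, SIGN, DIGITS, DEAD };

/* One deterministic transition: at most one move per (state, input). */
static enum State step(enum State s, char c) {
    bool digit = (c >= '0' && c <= '9');
    switch (s) {
    case START:
        if (c == '+' || c == '-') return SIGN;
        return digit ? DIGITS : DEAD;
    case SIGN:
    case DIGITS:
        return digit ? DIGITS : DEAD;
    default:
        return DEAD;                 /* no transition defined */
    }
}

bool is_signed_integer(const char *s) {
    enum State state = START;
    for (; *s != '\0' && state != DEAD; s++)
        state = step(state, *s);
    return state == DIGITS;          /* DIGITS is the only accepting state */
}
```

The switch is the transition table; every missing edge falls through to DEAD, so the caller never has to distinguish "stuck" from "rejected".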
Nondeterministic Automata Example
• Consider the language that accepts strings ending with “man”
  – Let Λ (Lambda) be the entire alphabet of acceptable characters. In this example, a-z and A-Z
    • There is an error in your book's representation
[Figure: an NFA and a DFA for this language, each with states 0, 1, 2, 3 and transitions 0→1 on m, 1→2 on a, 2→3 on n; the remaining edges are labeled Λ-m, Λ-m, Λ-a, Λ-n, m, m]
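An NFA cannot "guess" in a real program, but it can be simulated by tracking the set of states it might be in. The sketch below does this for the "man" NFA, assuming state 0 loops on every letter and state 3 has no outgoing transitions; the bitmask encoding and the names nfa_step and ends_with_man are illustrative choices.

```c
#include <stdbool.h>

/* A set of NFA states is a bitmask: bit i is 1 when state i is active. */
static unsigned nfa_step(unsigned set, char c) {
    unsigned next = 0;
    if (set & 1u) {                           /* state 0 */
        next |= 1u;                           /* stay in 0 on any letter */
        if (c == 'm') next |= 2u;             /* or also try 0 -> 1 on m */
    }
    if ((set & 2u) && c == 'a') next |= 4u;   /* 1 -> 2 on a */
    if ((set & 4u) && c == 'n') next |= 8u;   /* 2 -> 3 on n */
    return next;                              /* state 3: nowhere to go */
}

bool ends_with_man(const char *s) {
    unsigned set = 1u;                        /* start in {0} */
    for (; *s != '\0'; s++)
        set = nfa_step(set, *s);
    return (set & 8u) != 0;                   /* accept if state 3 is active */
}
```

The sets this loop passes through ({0}, {0,1}, {0,2}, {0,3}) are exactly the states that subset construction turns into a DFA.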
Nondeterministic Automata Example
• Consider the language that accepts zero or more occurrences of the substrings:
  – {ab} or {aba}
  – L = [ab|aba]* (regular expression)
[Figure: a DFA with states 0 through 4 and, as alternatives, two NFAs with states 0, 1, 2 for this language; edges are labeled a and b, and the second NFA also uses an ε-move]
Deterministic and Nondeterministic Automata
• Deterministic automata are easy to code because all possible transitions are accounted for
  – From every possible state, every possible input must be accounted for
  – This makes the state machine tough to construct
• Nondeterministic automata are simpler to construct; however, they cannot be coded directly because of the nondeterminism
  – Not every possible input needs to be accounted for at every possible state
  – The input is not accepted if a transition out of a state is not defined
  – May use ε-moves to move to new states in the absence of input
• Nondeterministic automata can be converted to deterministic automata by using the subset construction method
Subset Construction
• Eliminates the nondeterminism from an automaton
• Given a nondeterministic automaton:
  – Build a new starting state by expanding ε-moves
  – Start at the starting state
  – Build new states based on the allowable transitions out of the starting state
    • Treat new states as sets of states from the original NFA
  – Take the new states (from above) and build more new states by considering the allowable transitions out of each of those states
  – Continue this process until no new states are discovered
  – Any state (which is a set of states from the original NFA) that contains an accepting state of the NFA is also an accepting state
  – Construct the resultant DFA
Subset Construction (Example) - No ε-moves
• Consider the NFA that accepts all strings that end with the substring “man”
  – Begin at the starting state (state 0): {0}
    • From state 0 we stay at state 0 for any letter other than ‘m’: ({0}, Λ-m, {0})
      – We already have state {0}
    • We go to state 0 or state 1 with the letter ‘m’: ({0}, m, {0,1})
      – We create the new state {0,1}
  – From state {0,1}:
    • If we get an ‘a’ we go to state 2 (from state 1) or state 0 (from state 0)
      – ({0,1}, a, {0,2}) - A new state
    • If we get an ‘m’ we go to state 0 or state 1 (from state 0); there is nowhere to go from state 1
      – ({0,1}, m, {0,1}) - Already have {0,1}
[Figure: the NFA with states 0, 1, 2, 3 and transitions 0→1 on m, 1→2 on a, 2→3 on n]
Subset Construction (Example)
• Subset construction example continued
  – From state {0,1} (continued)
    • Anything besides an ‘a’ or ‘m’: go to state 0 (from state 0); there is nowhere to go from state 1
      – ({0,1}, Λ-a-m, {0}) - Already have {0}
  – From state {0,2}
    • If we get an ‘n’ we go to state 3 (from state 2) or state 0 (from state 0)
      – ({0,2}, n, {0,3}) - A new state
    • If we get an ‘m’ we go to state 0 or state 1 (from state 0); there is nowhere to go from state 2
      – ({0,2}, m, {0,1}) - Already have {0,1}
    • Anything besides an ‘n’ or ‘m’: go to state 0 (from state 0); there is nowhere to go from state 2
      – ({0,2}, Λ-n-m, {0}) - Already have {0}
Subset Construction (Example)
• Subset construction example continued
  – From state {0,3}
    • If we get an ‘m’ we go to state 0 or state 1 (from state 0); there is nowhere to go from state 3
      – ({0,3}, m, {0,1}) - Already have {0,1}
    • Anything besides an ‘m’: go to state 0 (from state 0); there is nowhere to go from state 3
      – ({0,3}, Λ-m, {0}) - Already have {0}
• There are no new states
• Recap of the set of found states
  – {0}, {0,1}, {0,2}, {0,3}
• State {0} is the starting state
• State {0,3} is the only accepting state
  – It is the only state containing an accepting state from the original NFA
Subset Construction (Example)
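• Now let's recall the transitions that we discovered
  – ({0}, Λ-m, {0}), ({0}, m, {0,1})
  – ({0,1}, a, {0,2}), ({0,1}, m, {0,1}), ({0,1}, Λ-a-m, {0})
  – ({0,2}, n, {0,3}), ({0,2}, m, {0,1}), ({0,2}, Λ-n-m, {0})
  – ({0,3}, m, {0,1}), ({0,3}, Λ-m, {0})
[Figure: the resulting DFA with states {0}, {0,1}, {0,2}, {0,3} in a row, connected left to right on m, a, n, with the remaining transitions listed above drawn as loops and back edges]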
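The worked example can be reproduced mechanically. The following is a minimal sketch of subset construction for the "man" NFA, under stated assumptions: NFA state sets are bitmasks, the alphabet is collapsed to four symbol classes with 'x' standing for any letter other than m, a, n, and the names Dfa, nfa_move, subset_construction, and dfa_is_accepting are all hypothetical.

```c
#include <stdbool.h>

#define N_SYMS 4
/* 'x' stands in for every letter other than m, a, n (an assumption
   made to keep the alphabet small). */
static const char SYMS[N_SYMS] = { 'm', 'a', 'n', 'x' };

/* NFA move on a set of states; bit i set means NFA state i is in the set. */
static unsigned nfa_move(unsigned set, char c) {
    unsigned next = 0;
    if (set & 1u) { next |= 1u; if (c == 'm') next |= 2u; }
    if ((set & 2u) && c == 'a') next |= 4u;
    if ((set & 4u) && c == 'n') next |= 8u;
    return next;
}

typedef struct {
    unsigned states[16];      /* each DFA state is a set of NFA states */
    int trans[16][N_SYMS];    /* trans[d][k] = index of the target DFA state */
    int count;
} Dfa;

void subset_construction(Dfa *dfa) {
    dfa->count = 0;
    dfa->states[dfa->count++] = 1u;            /* start state {0} */
    /* States appended during the loop are processed in later iterations,
       so the loop ends exactly when no new states are discovered. */
    for (int d = 0; d < dfa->count; d++) {
        for (int k = 0; k < N_SYMS; k++) {
            unsigned target = nfa_move(dfa->states[d], SYMS[k]);
            int t = 0;
            while (t < dfa->count && dfa->states[t] != target)
                t++;                           /* already known? */
            if (t == dfa->count)
                dfa->states[dfa->count++] = target;   /* a new state */
            dfa->trans[d][k] = t;
        }
    }
}

/* A DFA state accepts if it contains NFA state 3. */
bool dfa_is_accepting(const Dfa *dfa, int d) {
    return (dfa->states[d] & 8u) != 0;
}
```

With four NFA states there are at most 16 subsets, which is why the fixed-size arrays are safe here; a larger NFA would need a bigger bound or dynamic storage.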
Subset Construction (Example) NFA with ε-moves
• Construct the DFA from the NFA
• Notice how the alphabet consists of only two characters {a,b}
• Step 1: Build the new start state by expanding the ε-moves
  – From state 0 we can reach states 1, 2, 3 by ε-moves
  – Thus the new “logical” start state is {0,1,2,3}
  – State {0,1,2,3}:
    • Input ‘a’: get to states {0,1,2,3,4} - New state
    • Input ‘b’: get to states {2,3,4} - New state
[Figure: the NFA with states 0 through 4, three transitions labeled a, two labeled b, and four ε-moves]
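Expanding ε-moves is a small fixed-point computation. The sketch below assumes a hypothetical set of four ε-edges (0→1, 0→2, 1→3, 2→3), chosen only so that the closure of {0} comes out as {0,1,2,3}; these are not the exact edges of the figure, which cannot be read from the slide text. State sets are bitmasks, and epsilon_closure is an illustrative name.

```c
#define N_EPS 4
/* Hypothetical ε-edges (from, to); not taken from the slide's figure. */
static const int EPS_EDGES[N_EPS][2] = { {0, 1}, {0, 2}, {1, 3}, {2, 3} };

/* Adds every state reachable by ε-moves alone; bit i = NFA state i. */
unsigned epsilon_closure(unsigned set) {
    int changed = 1;
    while (changed) {                 /* repeat until nothing new is added */
        changed = 0;
        for (int i = 0; i < N_EPS; i++) {
            unsigned from = 1u << EPS_EDGES[i][0];
            unsigned to   = 1u << EPS_EDGES[i][1];
            if ((set & from) && !(set & to)) {
                set |= to;
                changed = 1;
            }
        }
    }
    return set;
}
```

In subset construction the closure is applied once to the NFA start state and again after every move on an input symbol.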
Subset Construction (Example) NFA with ε-moves
• Construct the DFA from the NFA (continued)
  – State {0,1,2,3,4}
    • Input ‘a’: get to state {0,1,2,3,4} - Already known
    • Input ‘b’: get to state {2,3,4} - Already known
  – State {2,3,4}
    • Input ‘a’: get to state {3,4} - New state
    • Input ‘b’: get to state {3,4} - Just discovered
  – State {3,4}
    • Input ‘a’: get to state {3,4} - Already known
    • Input ‘b’: get to state {} - New state
  – State {}
    • Input ‘a’ or ‘b’: get to state {} - Already known
Subset Construction (Example) NFA with ε-moves
• States
  – {0,1,2,3}, {0,1,2,3,4}, {2,3,4}, {3,4}, {}
• Start state: {0,1,2,3}, obtained by traversing the ε-moves from the start state in the NFA
• Accepting states: {0,1,2,3,4}, {2,3,4}, {3,4}, because they all contain state 4, which was an accepting state in the NFA
• Construct the DFA using the states and discovered transitions
[Figure: the resulting DFA with states {0,1,2,3}, {0,1,2,3,4}, {2,3,4}, {3,4}, and {}, connected by the a and b transitions discovered in the previous steps]
Regular Expressions
• An automaton defines a pattern graphically
• A regular expression defines a pattern algebraically
• A regular expression is built from atomic operands
  – A character
  – The empty string character, ε
  – The empty set character, ∅
  – A variable that can be defined with any regular expression
• A regular expression represents a set of strings, often called a language
  – Language of atomic operands:
    • L(x) = {x}
    • L(ε) = {ε}
    • L(∅) = {}
Regular Expression Operators
• Union
  – Denoted by ‘|’
  – If R and S are regular expressions then R|S denotes the union of the languages of R and S
    • L(R|S) = L(R) ∪ L(S)
• Concatenation
  – No special symbol for the concatenation operation
  – If R and S are regular expressions then RS denotes every string of L(R) followed by every string of L(S)
    • L(RS) = L(R)L(S)
• Closure
  – Denoted by ‘*’ (Kleene closure, or simply closure)
  – If R is a regular expression then R* denotes zero or more concatenated occurrences of strings from L(R)
    • L(R*) = {ε} ∪ L(R) ∪ L(R)L(R) ∪ L(R)L(R)L(R) ∪ ... = L(ε|R|RR|RRR|...)
Regular Expression Examples
• Order of precedence of regular expression operators (highest first)
  – Kleene star
  – Concatenation
  – Union
• Examples
  – [a|b] = {a,b}
  – [ab] = {ab}
  – [a|ab] = {a,ab}
  – [c|bc] = {c,bc}
  – [a|ab][c|bc] = {ac,abc,abbc} (abc arises twice, as a followed by bc and as ab followed by c; it is listed once)
  – [a*] = {ε,a,aa,aaa,aaaa,aaaaa,...}
  – [a|b]* = {ε,a,b,aa,ab,ba,bb,...}
  – [a|bc*d] = {a,bd,bcd,bccd,bcccd,bccccd,...}
• Precedence lets us omit parentheses
  – [a|bc*d] = [a|b(c*)d] = [a|(b(c*))d] = [a|((b(c*))d)] = [(a|((b(c*))d))]
Regular Expression Example
• Build a regular expression for programming language identifiers
  – Begins with a letter
  – Followed by any number of letters, digits, or the underscore (‘_’) character
• Solution
  – letter = [a|b|...|y|z|A|B|...|Y|Z]
  – digit = [0|1|2|3|4|5|6|7|8|9]
  – underscore = [ _ ]
  – identifier = [letter(letter|digit|underscore)*]
• Construct a regular expression for a signed integer
  – signed integer = [(+|-|ε)(0|1|2|3|4|5|6|7|8|9)+]
  – The ‘+’ is usually used to indicate one or more occurrences. The ‘+’ is only a simplification:
    • [a+] = [aa*]
    • [(0|1|...|8|9)+] = [(0|1|...|8|9)(0|1|...|8|9)*]
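The identifier pattern (a letter followed by any number of letters, digits, or underscores) maps onto a very short hand-written recognizer. This is a sketch only: the name is_identifier is hypothetical, and it relies on isalpha and isalnum from <ctype.h>, which match a-z, A-Z, and 0-9 in the default "C" locale.

```c
#include <ctype.h>
#include <stdbool.h>

/* Matches: one letter, then zero or more letters, digits, or '_'. */
bool is_identifier(const char *s) {
    if (!isalpha((unsigned char)*s))
        return false;                        /* must begin with a letter */
    for (s++; *s != '\0'; s++) {
        if (!isalnum((unsigned char)*s) && *s != '_')
            return false;                    /* outside the closure's choices */
    }
    return true;                             /* the closure allows zero repeats */
}
```

The first test corresponds to the leading letter operand and the loop to the closure, so a single letter such as "x" is accepted while "_tmp" and "2fast" are not.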
Converting A Regular Expression into an NFA
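• There are 3 simple rules for converting a regular expression into an NFA
• The NFA can then be converted into a DFA using the subset construction method
Union: R1|R2
[Figure: the automata for R1 and R2 side by side, joined by four ε-moves: two leading into R1 and R2, and two leading out of them]
Concatenation: R1R2
[Figure: the automata for R1 and R2 in sequence, joined by three ε-moves: into R1, from R1 to R2, and out of R2]
Closure: R1*
[Figure: the automaton for R1 with four ε-moves: into R1, out of R1, one looping back to repeat R1, and one bypassing R1]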
Example: Converting a Regular Expression into an NFA
• Convert (ab|aab)* to an NFA
  – This is ((ab)|(aab))* = ((ab)|((aa)b))*
    • Concatenation has higher precedence than union
Step 1: Handle the concatenation
[Figure: ab drawn as the chain a, ε, b; aab drawn as the chain a, ε, a, ε, b]
Step 2: Handle the union
[Figure: ab|aab, the two chains from Step 1 joined by four ε-moves]
Step 3: Handle the closure
[Figure: (ab|aab)*, the automaton from Step 2 with four additional ε-moves for the closure]
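The three steps can be sketched in code by treating each sub-automaton as a fragment with one start and one accepting state. This is a minimal sketch under stated assumptions: all names (Frag, symbol, concat, alt, star, build_ab_aab_star, nfa_accepts) are hypothetical, concatenation uses a single ε-move between the two fragments, and the resulting NFA is run by tracking the set of active states in a 64-bit mask.

```c
#include <stdbool.h>
#include <stdint.h>

#define EPS '\0'                              /* label used for ε-moves */
typedef struct { int from, to; char label; } Edge;
typedef struct { int start, accept; } Frag;   /* one sub-automaton */

static Edge edges[64];
static int n_edges, n_states;

static void add_edge(int from, char label, int to) {
    edges[n_edges].from = from;
    edges[n_edges].label = label;
    edges[n_edges].to = to;
    n_edges++;
}
static Frag fresh(void) {                     /* new start and accept states */
    Frag f;
    f.start = n_states++;
    f.accept = n_states++;
    return f;
}
static Frag symbol(char c) {
    Frag f = fresh();
    add_edge(f.start, c, f.accept);
    return f;
}
static Frag concat(Frag r1, Frag r2) {        /* R1R2 */
    Frag f = { r1.start, r2.accept };
    add_edge(r1.accept, EPS, r2.start);
    return f;
}
static Frag alt(Frag r1, Frag r2) {           /* R1|R2 */
    Frag f = fresh();
    add_edge(f.start, EPS, r1.start);
    add_edge(f.start, EPS, r2.start);
    add_edge(r1.accept, EPS, f.accept);
    add_edge(r2.accept, EPS, f.accept);
    return f;
}
static Frag star(Frag r) {                    /* R1* */
    Frag f = fresh();
    add_edge(f.start, EPS, r.start);
    add_edge(r.accept, EPS, f.accept);
    add_edge(r.accept, EPS, r.start);         /* repeat R1 */
    add_edge(f.start, EPS, f.accept);         /* zero occurrences */
    return f;
}

Frag build_ab_aab_star(void) {
    n_edges = n_states = 0;
    Frag ab = concat(symbol('a'), symbol('b'));                        /* Step 1 */
    Frag aab = concat(concat(symbol('a'), symbol('a')), symbol('b'));
    return star(alt(ab, aab));                                         /* Steps 2, 3 */
}

static uint64_t closure(uint64_t set) {       /* expand ε-moves */
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < n_edges; i++)
            if (edges[i].label == EPS && ((set >> edges[i].from) & 1)
                    && !((set >> edges[i].to) & 1)) {
                set |= (uint64_t)1 << edges[i].to;
                changed = true;
            }
    }
    return set;
}

bool nfa_accepts(Frag f, const char *s) {
    uint64_t cur = closure((uint64_t)1 << f.start);
    for (; *s != '\0'; s++) {
        uint64_t next = 0;
        for (int i = 0; i < n_edges; i++)
            if (edges[i].label == *s && ((cur >> edges[i].from) & 1))
                next |= (uint64_t)1 << edges[i].to;
        cur = closure(next);
    }
    return ((cur >> f.accept) & 1) != 0;
}
```

This expression needs 14 states and 16 edges, well within the fixed limits of the sketch; larger expressions would need bounds checks or dynamic storage.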