Date post: 21-Dec-2015
Upload: doris-logan
Page 1: Uses of DFAs DFAs are a way of handling certain specialized issues involving character strings For example, one might want to determine whether a given.

Uses of DFAs

DFAs are a way of handling certain specialized issues involving character strings

For example, one might want to determine whether a given input string
– is a legal identifier in some language
– is a legal literal of a particular type in some particular programming language
– is a possible word in a given natural language
– contains a particular substring

Recognizing legal identifiers/literals

Typical sets of legal identifiers (or literals) include
– strings beginning with a letter followed by some digits
– strings consisting only of letters
– strings that contain at most one digit
– strings that begin with the underscore character

Note that all of these sets are languages

What’s a DFA (roughly)?

A DFA is an abstraction of a computing machine
– or more precisely, of a machine with a program

It answers questions about membership in certain languages

That is, it answers yes/no questions of the form: is a string in a particular set of strings?
– note that the examples given above all have this form

An abstract machine for a simple switch

An abstract light switch might have two states
One would correspond to the “on” position
Transitions would be possible between states
It would need to be clear which state was the initial state

This information could be summarized as in the diagram below

An abstract machine diagram for a simple switch

Labeled transitions

This machine is not yet a DFA– it has nothing to do with strings or languages.

By labeling the transitions with symbols from an alphabet, we’d get a DFA (shown below)

If both transitions were labeled with a 0, the resulting DFA would correspond to the set of strings of odd length over the alphabet {0}.

That is, sequences of transitions that took us from the initial state to the “on” state would correspond to sequences of 0’s of odd length
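
The labeled switch can be simulated directly. The following is a minimal sketch, assuming (as the slides state elsewhere) that "off" is the initial state and "on" is the only final state; the names `TRANSITIONS`, `INITIAL`, `FINAL`, and `accepts` are illustrative and not part of the slides.

```python
# Two-state switch DFA over the alphabet {"0"}: every 0 flips the switch.
TRANSITIONS = {("off", "0"): "on", ("on", "0"): "off"}
INITIAL = "off"   # assumption: the switch starts in the "off" position
FINAL = {"on"}    # assumption: "on" is the only final state

def accepts(string):
    """Follow one transition per input symbol, then test for a final state."""
    state = INITIAL
    for symbol in string:
        state = TRANSITIONS[(state, symbol)]
    return state in FINAL

# Strings of 0's are accepted exactly when their length is odd.
for n in range(5):
    print(n, accepts("0" * n))
```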

A DFA for a simple switch

This DFA was constructed with JFLAP

Specifying a DFA

Just as in our example, specifying a DFA requires specifying
– An input alphabet ({0} in our example)
– A set of states
– An initial state (the “off” state in our example)
– A set of final states (consisting only of the “on” state in our example)
– A set of transitions from state to state, each labeled with a symbol from the input alphabet

DFA pragmatics

Realistic examples can have large alphabets and thus ugly diagrams
– so we’ll use simple alphabets in most examples

States from which no final states are reachable can be omitted from diagrams
– We call these dead states (or trap states)

Generalizing the alphabet

Suppose we generalize our example to a question about a larger alphabet – say {0,1}

For example, we might care about strings with an odd number of both 0’s and 1’s.

Unsurprisingly, we need more than two states to recognize this language.

A DFA for this language is given below
– note the conventions for initial and final states, and for state names

Recognizing strings with an odd number of both 0's and 1's

What’s a DFA – precisely?

A DFA (deterministic finite acceptor, or automaton) consists of
– A finite (input) alphabet Σ
– A finite set Q of states
– An initial (or start) state q0 (from Q)
– A set F of final states (a subset of Q)
– A transition function δ: Q × Σ → Q

Note that both the alphabet and set of states are finite -- a DFA has a finite description

Tabular representation of DFAs

Tabular representation of the DFA above:

   δ  | 0   1
   q0 | q1  q2
   q1 | q0  q3
   q2 | q3  q0
 F q3 | q2  q1

Note that the table implicitly gives the states, the start state, and the input alphabet
– The final state(s) must be explicitly labeled
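
A transition table of this kind maps naturally onto a nested dictionary. The sketch below encodes the table for the "odd number of both 0's and 1's" DFA; the names `TABLE`, `START`, `FINAL`, `run`, and `accepts` are illustrative assumptions, and taking q0 (the first row) as the start state follows the slides' convention.

```python
# Rows are states, columns are input symbols, entries are next states.
TABLE = {
    "q0": {"0": "q1", "1": "q2"},
    "q1": {"0": "q0", "1": "q3"},
    "q2": {"0": "q3", "1": "q0"},
    "q3": {"0": "q2", "1": "q1"},
}
START = "q0"      # assumption: the first row of the table is the start state
FINAL = {"q3"}    # the row marked F in the table

def run(table, start, string):
    """Return the state reached after reading the whole string."""
    state = start
    for symbol in string:
        state = table[state][symbol]
    return state

def accepts(string):
    return run(TABLE, START, string) in FINAL

print(accepts("01"), accepts("0011"), accepts("0001"))
```

The states, the alphabet, and the start state can all be read off the dictionary; only the final states need a separate set, which mirrors the remark about the table.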

The extended transition function

δ may be extended to handle strings
This gives a function δ*: Q × Σ* → Q
δ* is defined for all states q, strings w, and symbols a by
– δ*(q,λ) = q
– δ*(q,wa) = δ(δ*(q,w), a)

Note that δ* extends δ in the sense that δ*(q,a) = δ(q,a)
– to see this, let w = λ above
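
The two defining equations translate almost literally into a recursive function. This is a sketch only, reusing the four-state parity DFA from the earlier table; `DELTA` and `delta_star` are illustrative names, and the empty Python string stands in for the empty string of the definition.

```python
# Transition function of the four-state parity DFA, keyed by (state, symbol).
DELTA = {
    ("q0", "0"): "q1", ("q0", "1"): "q2",
    ("q1", "0"): "q0", ("q1", "1"): "q3",
    ("q2", "0"): "q3", ("q2", "1"): "q0",
    ("q3", "0"): "q2", ("q3", "1"): "q1",
}

def delta_star(q, w):
    """Extended transition function, following the two defining equations."""
    if w == "":                       # base case: the empty string leaves q unchanged
        return q
    prefix, a = w[:-1], w[-1]         # split the string into prefix and last symbol
    return DELTA[(delta_star(q, prefix), a)]   # one ordinary step after the prefix

print(delta_star("q0", "0110"))
```

Reading a single symbol gives the same result as the ordinary transition function, and reading xy gives the same state as reading x and then y from the state reached.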

Notational conventions

Lower-case letters near q in the English alphabet are used for states, perhaps with subscripts

If only one DFA is under discussion
– we use Σ, Q, q0, F, δ, and δ* without comment

– we typically assume that the DFA is called M and its language is called L.

Acceptance of a string

A string x is accepted by a DFA M if and only if δ*(q0,x) ∈ F
– where δ* is the extended transition function

Fact: For this function, δ*(q,xy) = δ*(δ*(q,x), y) for any q, x, and y

Acceptance of a language

A language L is accepted (or recognized) by M iff every string in L is accepted by M, and no string outside of L is accepted by M

A DFA M accepts exactly one language.
– this language is denoted L(M)

A language is regular iff it is accepted by some DFA.
– fact: all finite languages are regular

Two DFAs are equivalent iff they accept the same language.

More examples of DFAs

Strings alternating 0's and 1's
State q3 is a dead state

Still another example of a DFA

For binary representations of integers (without leading 0’s)
– all strings over {0,1} beginning with 1
– dead state omitted

Yet another example of a DFA

Strings over {0,1,.} that contain at most one period

This DFA is not minimal

A DFA for binary multiples of 3

Here we identify λ (the empty string) with 0
– a good exercise is to redo this example without this assumption
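
The diagram itself is not reproduced in this transcript, so the following is a sketch of one standard construction rather than a copy of the slide: each state records the remainder mod 3 of the number read so far, and reading a bit b takes remainder r to (2r + b) mod 3. These transitions agree with the equations given for this DFA at the end of the slides. The names `delta` and `is_multiple_of_3` are illustrative.

```python
def delta(remainder, bit):
    """Appending a bit doubles the value read so far and adds the bit."""
    return (2 * remainder + int(bit)) % 3

def is_multiple_of_3(bits):
    state = 0                 # start state: remainder 0 (the empty string counts as 0)
    for bit in bits:
        state = delta(state, bit)
    return state == 0         # the start state is also the only final state

print(is_multiple_of_3("110"), is_multiple_of_3("111"))
```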

Constructing large DFAs

DFAs, especially large ones, are hard to construct by hand.

To help build DFAs, we’ll use the following notions
– nondeterminism
– λ-moves (moves that don’t consume input symbols)
– the regular expression notation for languages

Regular expressions

Regular expressions are used to represent languages.

It turns out that every regular expression represents a regular language.

Also, every regular language may be represented by a regular expression.

The language represented by the regular expression r is denoted L(r).

Regular expressions defined

An expression is a regular expression iff it has one of the following forms:
– ∅, λ, or a, where a is a member of some Σ
– r+s, rs, r*, or (r), where r and s are regular expressions.

The expressions ∅, λ, and a respectively represent {}, {λ}, and {a}

The expressions r+s, rs, r*, and (r) respectively represent L(r) ∪ L(s), L(r)L(s), L(r)*, and L(r)

About regular expressions

Don’t confuse the empty language ∅ with the nonempty language {λ}

The precedence of the * operator is higher than for juxtaposition; for +, it’s lower.

These pairs of regular expressions are equivalent (represent the same language)
– λr and r; rλ and r
– (rs)t and r(st)
– (r+s)t and rt + st; and r(s+t) and rs + rt
– r* and (r*)*; and (r+s)* and (r*s*)*

Examples of regular expressions

(0+1+2)*
– all strings over {0,1,2}

a(a+b+c)*
– all strings over {a,b,c} starting with a

(0+1)*010(0+1)*
– all strings over {0,1} having 010 as a substring

000+001+010+011+100+101+110+111
– all bit strings of length 3

(0+1)(0+1)(0+1)(0+1)
– all bit strings of length 4
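
These examples can be tried out with Python's `re` module. This is a sketch under one translation assumption: the slides' union operator + is written | in Python, where + means "one or more" instead. `re.fullmatch` is used so that the whole string must belong to the language; the dictionary `EXAMPLES` and the helper `in_language` are illustrative names.

```python
import re

# Slide notation on the left, an equivalent Python pattern on the right.
EXAMPLES = {
    "(0+1+2)*": r"(0|1|2)*",
    "a(a+b+c)*": r"a(a|b|c)*",
    "(0+1)*010(0+1)*": r"(0|1)*010(0|1)*",
    "000+001+010+011+100+101+110+111": r"000|001|010|011|100|101|110|111",
    "(0+1)(0+1)(0+1)(0+1)": r"(0|1)(0|1)(0|1)(0|1)",
}

def in_language(slide_regex, string):
    """True iff the entire string is in the language of the given expression."""
    return re.fullmatch(EXAMPLES[slide_regex], string) is not None

print(in_language("(0+1)*010(0+1)*", "110100"))
print(in_language("a(a+b+c)*", "bca"))
```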

Regular expressions and regular languages

For every regular expression r, L(r) is a regular language.

If L is a regular language, then L = L(r) for some regular expression r.

We will soon see constructive proofs of these claims.

It’s in the construction for the first claim that we use nondeterminism
– we first build a nondeterministic FA, and then find a deterministic equivalent

Problems with constructing DFAs

For some languages, the FAs that are most natural are incorrect. For example,
– for (a+b)*ab(a+b)*, the natural FA below isn't deterministic (δ is not a function)
– for 0*1*, the natural FA recognizes (0+1)*
– for 10+(12)*, the natural FA isn't deterministic, and accepts too many strings
– for (10)*12, the natural FA isn't deterministic

An attempt at a DFA for L( (a+b)*ab(a+b)* )

Nondeterminism

One way of dealing with this issue is to allow finite automata a choice of moves for a given state and symbol.

This approach is called nondeterminism.

It can be easier to build a nondeterministic FA for a language than a DFA
– e.g., the FA of the previous slide is a legal nondeterministic FA
– and it recognizes the appropriate language

Allowing λ-moves

In some cases, it is also useful to allow λ-moves that don't consume an input symbol.
– we'll consider this to be just another form of nondeterminism

It turns out that adding nondeterminism does not allow us to accept any new languages.
– this is true whether or not λ-moves are allowed

Modeling nondeterminism

Ways of thinking of nondeterminism:

omniscience
– suppose that you could always guess correctly

branching processes
– allocate a new processor for each choice

simultaneous processes
– allow being in several states at once

backtracking
– try each choice one after another

Sets of states

Our treatment of nondeterminism is based on the third approach above.

We will allow the computation to be in several states at once (or no state at all).

This requires a new transition function:
– δ: Q × (Σ ∪ {λ}) → 2^Q
– note that λ is a legal 2nd argument, and that the value is a (possibly empty) set of states.

NFAs

If our definition of a DFA is modified to use the new transition function, we get an NFA
– Note that the “N” stands for “nondeterministic” rather than “not”.

The FA shown above for L((a+b)*ab(a+b)*) is a legal NFA for that language, with table:

   δ  | a         b     λ
   q0 | {q0, q1}  {q0}  {}
   q1 | {}        {q2}  {}
 F q2 | {q2}      {q2}  {}

Acceptance by NFAs (informal)

An NFA accepts a string x iff δ*(q0, x) contains a state of F, where δ* is defined informally in Linz (p. 51).

Informally, δ*(q, x) is the set of states you can get to from q by reading the symbols of x in order

Less informally, we should be able to follow an edge labeled with λ at any time.

Acceptance by NFAs (formally)

More precisely,

δ*(q, λ) is the set of states accessible from q by zero or more λ-moves

δ*(q, a) is the set of states accessible from q by zero or more λ-moves, then one a-move, and then zero or more λ-moves

δ*(q, wa) is the union over all p in δ*(q, w) of δ*(p, a)

δ*(S, x) is the union over all q in S of δ*(q, x)
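
The set-valued definition can be sketched in code by tracking the current set of states and closing it under moves that consume no input after every step. The example NFA is the two-state machine for 0*1* whose table appears later in the slides; using the empty Python string as the key for such moves, and the names `LAMBDA`, `NFA`, `lambda_closure`, `delta_star`, and `accepts`, are assumptions made for this illustration.

```python
LAMBDA = ""   # assumption: the empty string keys moves that consume no input

# NFA for 0*1*: q0 loops on 0, q1 loops on 1, and q0 can move to q1 for free.
NFA = {
    "q0": {"0": {"q0"}, LAMBDA: {"q1"}},
    "q1": {"1": {"q1"}},
}
START, FINAL = "q0", {"q1"}

def lambda_closure(states):
    """States reachable from `states` by zero or more input-free moves."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p in NFA.get(q, {}).get(LAMBDA, set()):
            if p not in closure:
                closure.add(p)
                stack.append(p)
    return closure

def delta_star(states, string):
    """Extended transition function on a set of states."""
    current = lambda_closure(states)
    for symbol in string:
        moved = set()
        for q in current:
            moved |= NFA.get(q, {}).get(symbol, set())
        current = lambda_closure(moved)   # close again after each symbol
    return current

def accepts(string):
    return bool(delta_star({START}, string) & FINAL)

print(accepts("0011"), accepts("10"))
```

An empty result set corresponds to the computation being "in no state at all", which is why no explicit dead state is needed.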

An NFA for L((10)*12)

A table for an NFA for (10)*12:

   δ  | 0     1        2     λ
   q0 | {}    {q1,q2}  {}    {}
   q1 | {q0}  {}       {}    {}
   q2 | {}    {}       {q3}  {}
 F q3 | {}    {}       {}    {}

The use of λ-moves in NFAs

A bad NFA for 0*1*:

  A good NFA for 0*1*:

Constructing DFAs

We’ll soon have algorithms to find equivalent DFAs for NFAs or regular expressions.

In other cases, try starting with a start state.

For every new state, determine destination states from this state on every symbol
– tricky part: must these destination states be new states, or can old states be reused?
– useful observation: if x and y go to the same state, so do xz and yz for all z

Equivalence of NFAs and DFAs

Claim: Any language accepted by an NFA is regular

To show the claim, we need to find, for any NFA MN, a DFA MD with L(MD) = L(MN)

Idea: let states of the DFA correspond to sets of states of the NFA
– we’ll use brackets to denote states of MD

Although there will be 2^m states (if MN has m states), many will be dead or inaccessible

Constructing equivalent DFAs

It's easiest to generate only those states that are reachable from the start state

The start state of MD will be [δN*(q0, λ)]

[S] is final iff S contains a final state of MN

We define δD by letting δD([S],a) correspond to the union over all q in S of δN*(q,a),
– here δN is MN’s δ function
– we use δN* rather than δN to allow λ-moves
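
The construction can be sketched as a worklist algorithm over frozensets of NFA states. This is an illustration, not Linz's algorithm verbatim: the empty string is assumed as the key for moves that consume no input, and `lambda_closure`, `nfa_to_dfa`, and the dictionary layout are hypothetical names. The example input is the NFA for (10)*12 tabulated earlier.

```python
from collections import deque

LAMBDA = ""   # assumption: the empty string keys moves that consume no input

def lambda_closure(nfa, states):
    """All NFA states reachable from `states` by zero or more input-free moves."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p in nfa.get(q, {}).get(LAMBDA, set()):
            if p not in closure:
                closure.add(p)
                stack.append(p)
    return frozenset(closure)

def nfa_to_dfa(nfa, start, finals, alphabet):
    """Subset construction that generates only the reachable DFA states."""
    dfa_start = lambda_closure(nfa, {start})
    table, queue = {}, deque([dfa_start])
    while queue:
        current = queue.popleft()
        if current in table:              # already expanded
            continue
        table[current] = {}
        for a in alphabet:
            moved = set()
            for q in current:             # union of the a-moves of every member
                moved |= nfa.get(q, {}).get(a, set())
            table[current][a] = lambda_closure(nfa, moved)
            queue.append(table[current][a])
    dfa_finals = {s for s in table if s & finals}
    return dfa_start, table, dfa_finals

# NFA for (10)*12 from the earlier table (no input-free moves needed here).
NFA = {"q0": {"1": {"q1", "q2"}}, "q1": {"0": {"q0"}}, "q2": {"2": {"q3"}}, "q3": {}}
DFA_START, DFA_TABLE, DFA_FINALS = nfa_to_dfa(NFA, "q0", {"q3"}, "012")

for state, row in DFA_TABLE.items():
    print(sorted(state), {a: sorted(t) for a, t in row.items()})
```

The empty frozenset plays the role of the dead state, and only reachable subsets are ever created.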

Proof of equivalence

It's enough to show that δD*([q0],x) = [δN*(q0,x)] for all x
– since then N accepts x iff δN*(q0,x) contains a state of F iff [δN*(q0,x)] is final in D iff δD*([q0],x) is final in D iff D accepts x
– (here N and D abbreviate MN and MD)

But the desired equality follows by a straightforward (if ugly) induction.

Example 1 of an equivalent DFA

The NFA below accepts L(0*1*)

   δ  | 0     1     λ
   q0 | {q0}  {}    {q1}
 F q1 | {}    {q1}  {}

The equivalent DFA (with trap state):

           | 0        1
 F [q0]    | [q0,q1]  [q1]
 F [q0,q1] | [q0,q1]  [q1]
 F [q1]    | []       [q1]
   []      | []       []

Example 2 of an equivalent DFA

The DFA equivalent to the NFA given above for L((10)*12) is:

   δ        | 0     1         2
   [q0]     | []    [q1, q2]  []
   [q1, q2] | [q0]  []        [q3]
 F [q3]     | []    []        []

Here the dead state and inaccessible states have been omitted

Another example with λ-moves

The NFA below accepts L(10+(12)*)  

The equivalent DFA

The equivalent DFA is:

   δ       | 0     1        2
 F [q0]    | []    [q2,q5]  []
   [q2,q5] | [q3]  []       [q4]
 F [q3]    | []    []       []
 F [q4]    | []    [q5]     []
   [q5]    | []    []       [q4]

Breadth-first search (BFS)

Several steps in the algorithms above require finding all states accessible from some state q.

These steps can be implemented by a breadth-first search (BFS) algorithm, which
– is a specialization of the algorithm of Linz, p. 9
– can be used in step 2 of Linz’s nfa-to-dfa algorithm of p. 59
– computes the transitive closure of a relation

The BFS algorithm

The algorithm maintains a queue of generated states whose successors have not been generated.

Until the queue becomes empty, the next state is dequeued and marked, and its successors found.

Those successors that are not marked are enqueued.

The queue will eventually empty, since there are only finitely many states

Using the BFS algorithm

BFS can find all λ-paths leaving a state q
– a “successor” is the destination of a λ-transition
– the queue is initialized with q only

Or all accessible states of a DFA
– a “successor” is the destination of any transition
– the queue is initialized with q0 only

Or all nondead states of a DFA
– q’s “successor” is the source of any transition to q
– the queue is initialized with all states of F
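
The queue-based search and two of its uses can be sketched as follows. The function `bfs` takes the initial queue contents and a successor function, so the same code covers accessible and nondead states; the five-state example DFA (with a dead state D and an inaccessible state E) is made up for illustration and does not appear in the slides.

```python
from collections import deque

def bfs(initial, successors):
    """Return every state reachable from `initial` via `successors`."""
    marked = set()
    queue = deque(initial)
    while queue:
        state = queue.popleft()
        if state in marked:           # a state may have been enqueued twice
            continue
        marked.add(state)
        for nxt in successors(state):
            if nxt not in marked:
                queue.append(nxt)
    return marked

# Hypothetical DFA over {0,1,2}: D is dead, E is inaccessible.
DFA = {
    "A": {"0": "D", "1": "B", "2": "D"},
    "B": {"0": "A", "1": "D", "2": "C"},
    "C": {"0": "D", "1": "D", "2": "D"},
    "D": {"0": "D", "1": "D", "2": "D"},
    "E": {"0": "A", "1": "E", "2": "E"},
}
START, FINAL = "A", {"C"}

# Accessible states: follow transitions forward from the start state.
accessible = bfs([START], lambda q: DFA[q].values())

# Nondead states: follow transitions backward from the final states.
def predecessors(q):
    return [p for p, row in DFA.items() if q in row.values()]

nondead = bfs(FINAL, predecessors)

print(sorted(accessible), sorted(nondead))
```

The search terminates because each state is marked at most once and there are only finitely many states.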

M-equivalence of strings

In both DFA minimization and in finding regular expressions for DFAs, it’s useful to identify states with the strings taken there
– that is, we identify q with {x | δ*(q0,x) = q}

Notation: [x] is the state to which x is taken

[x] may also be thought of as an equivalence class for an equivalence relation ~M on Σ*

Here x ~M y iff δ*(q0,x) = δ*(q0,y).

Distinguishability

Another useful notion for minimization is distinguishability

For a language L, two strings x and y in Σ* are distinguishable iff for some z in Σ*, exactly one of {xz, yz} is in L

It's not hard to see that indistinguishability is an equivalence relation on Σ*.

We’ll use the infix operator ~ for this relation
– note that it does not depend on any machine.

Minimal DFAs and distinguishability

A DFA M separates x and y if and only if δ*(q0,x) ≠ δ*(q0,y)
– that is, iff x ≁M y

DFAs must separate distinguishable strings

DFAs may separate indistinguishable strings

One might expect minimal DFAs to separate only distinguishable strings
– this is the key to finding minimal DFAs

The minimal DFA for L

Note: for a given M accepting L, ~M refines ~
– that is, if x ~M y, then x ~ y
– so every class of ~M is in a single class of ~
– and ~ has no more classes than any ~M

If the classes of ~ can represent the states of some DFA for L, this DFA would be minimal.

They can! Such a DFA M^ must have this set of states, and alphabet Σ, and …

The minimal DFA for L

… the class [λ] containing λ as its start state,

transition function δ^, where δ^([x],a) = [xa],
– note: this is well-defined; if y ∈ [x], then [ya] = [xa]

and {[x] | x ∈ L} as its set of final states.
– note: strings in any class are all in L or all not in L

An easy induction shows that δ^*([λ],x) = [x]
– and thus that M^ accepts L

This construction depends only on L
– and not on any DFA accepting L

Combining states

We’ve now seen that M^ can be obtained by combining states of any M accepting L

But which states of M do we combine?
– the indistinguishable states, where for all x in Σ*, both of {δ*(p,x), δ*(q,x)} are final or both are not

These are hard to recognize

So instead we split distinguishable states
– that is, we successively refine a partition of Q
– starting with a split of states into F and Q-F

Distinguishing states

States can be distinguished recursively, by an “unzipping” process analogous to BFS
– using δ rather than δ*

For example, if p and q are distinguishable because δ*(p,abc) is in F and δ*(q,abc) isn’t
– δ*(p,abc) and δ*(q,abc) are distinguished initially
– δ*(p,ab) and δ*(q,ab) will be distinguished based on c
– then δ*(p,a) and δ*(q,a) will be distinguished based on b
– then p and q will be distinguished based on a

The minimization algorithm

To test whether a set S of states may be split, find δ(p,a) for all p in S and some a in Σ

If the results are in different sets of the partition, split S based on these sets.

The minimization algorithm (cf. Linz, pp. 64-5) repeatedly applies this test (as a single pass) to all nonsingleton sets in the current partition, and a in Σ

It halts if no splits are made in a pass
– only finitely many passes are needed, as for Linz
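
A compact variant of this refinement loop is sketched below. It differs slightly from the description above: within a pass, each set is split using all symbols at once (via a "signature" of destination blocks) instead of one symbol at a time, which yields the same final partition. The function name `minimize`, the single-letter state names, and the assumption that every state is accessible are choices made for this illustration. The input is the four-state DFA for 0*1* from the earlier example, with A = [q0], B = [q0,q1], C = [q1], and D = [].

```python
def minimize(table, finals, alphabet):
    """Refine the partition {F, Q-F} until a pass makes no split."""
    states = set(table)
    partition = [block for block in (states & finals, states - finals) if block]
    while True:
        # Index of the block currently containing each state.
        block_of = {q: i for i, block in enumerate(partition) for q in block}
        refined = []
        for block in partition:
            groups = {}
            for q in block:
                # States stay together only if every symbol sends them
                # into the same blocks of the current partition.
                signature = tuple(block_of[table[q][a]] for a in alphabet)
                groups.setdefault(signature, set()).add(q)
            refined.extend(groups.values())
        if len(refined) == len(partition):   # no split in this pass: done
            return refined
        partition = refined

DFA = {
    "A": {"0": "B", "1": "C"},
    "B": {"0": "B", "1": "C"},
    "C": {"0": "D", "1": "C"},
    "D": {"0": "D", "1": "D"},
}
MINIMAL = minimize(DFA, {"A", "B", "C"}, "01")
print(sorted(sorted(block) for block in MINIMAL))
```

Each returned block is one state of the minimal DFA; here A and B end up merged, since no string distinguishes them.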

From regular expressions to DFAs

To find an NFA accepting L(r) for a regular expression r, it’s enough to
– find NFAs for the three basic regular expressions
– show how to deal with the three ways of creating new regular expressions from old

The relevant constructions are given in Linz, Figures 3.1-3.5

NFAs constructed in this way can be converted to DFAs and minimized

From DFAs to regular expressions

Linz’s algorithm for constructing a regular expression for L(M) is perhaps more naturally expressed in terms of equations.

The solution technique is very much as for simultaneous linear equations in algebra

We identify each state A with {x | δ*(q0,x) = A}

For each state we construct an equation based on the incoming transitions.

Equations for DFAs

For example, the DFA of Figure 2.13 of Linz corresponds to the pair of equations
– A = λ + Bb
– B = Aa + Ba

Note that start states have a λ term, and that we may ignore dead states

We proceed by eliminating equations
– it’s usually best not to eliminate the start state’s equation

Eliminating equations

Fact: X = r + Xs has the solution X = rs*
– if r does not contain the variable X, s is a regular expression, and λ ∉ L(s)
– e.g., B = Aa + Ba has solution B = Aaa*

After solving an equation, we may substitute into other equations.
– for us, A = λ + Bb becomes A = λ + Aaa*b
– and solving for A gives A = λ(aa*b)* = (aa*b)*
– so L(M) = A = (aa*b)*
– if B alone is final, L(M) = B = Aaa* = (aa*b)*aa*
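
The solutions can be sanity-checked by brute force. In the sketch below the transitions are read off the two equations (A goes to B on a, B loops on a, B returns to A on b); Linz's figure is not reproduced in this transcript, so that reading is an assumption, as are the names `DELTA`, `dfa_state`, and `all_strings`. Every string over {a,b} up to length 8 is compared against the derived expressions using Python's `re` module.

```python
import re
from itertools import product

# Transitions inferred from A = λ + Bb and B = Aa + Ba; a missing entry is the dead state.
DELTA = {("A", "a"): "B", ("B", "a"): "B", ("B", "b"): "A"}

def dfa_state(string):
    """State reached from the start state A, or None for the dead state."""
    state = "A"
    for symbol in string:
        state = DELTA.get((state, symbol))
        if state is None:
            return None
    return state

A_REGEX = re.compile(r"(aa*b)*")      # solution derived for A
B_REGEX = re.compile(r"(aa*b)*aa*")   # solution derived for B

def all_strings(max_len):
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            yield "".join(letters)

mismatches = [
    w for w in all_strings(8)
    if (dfa_state(w) == "A") != bool(A_REGEX.fullmatch(w))
    or (dfa_state(w) == "B") != bool(B_REGEX.fullmatch(w))
]
print(mismatches)
```

An empty list means the expressions and the machine agree on every tested string; this is a finite check, not a proof.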

The example of binary multiples of 3

For binary multiples of 3, we get equations
– A = λ + A0 + B1
– B = A1 + C0
– C = B0 + C1

We see that C = B01*, and then get
– A = λ + A0 + B1
– B = A1 + B01*0

From B = A1(01*0)* we get
– A = λ + A[0 + 1(01*0)*1] and then
– L(M) = A = λ[0 + 1(01*0)*1]* = [0 + 1(01*0)*1]*
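
The final expression can be checked empirically against ordinary arithmetic. This sketch writes the union operator + as | for Python's `re` module and, following the earlier convention, treats the empty string as the number 0; the names `MULTIPLE_OF_3`, `value`, and `counterexamples` are illustrative.

```python
import re
from itertools import product

# [0 + 1(01*0)*1]* in Python syntax.
MULTIPLE_OF_3 = re.compile(r"(0|1(01*0)*1)*")

def value(bits):
    """Integer value of a bit string, with the empty string counted as 0."""
    return int(bits, 2) if bits else 0

counterexamples = [
    bits
    for n in range(11)
    for bits in ("".join(t) for t in product("01", repeat=n))
    if bool(MULTIPLE_OF_3.fullmatch(bits)) != (value(bits) % 3 == 0)
]
print(counterexamples)
```

No counterexample among the bit strings of length at most 10 is consistent with the derivation, though of course it does not replace it.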

