
BASIC GUIDE OF COMPUTER LANGUAGE

Date post: 09-Nov-2015
Upload: jonmartello
Description:
This document explains the basic language of computers since the introduction of DFAs and the theory of Dassow. In this paper, we investigate a question of Dassow et al. as to how these sizes are related.
Theoretical Computer Science 578 (2015) 2–12

Contents lists available at ScienceDirect
Theoretical Computer Science
www.elsevier.com/locate/tcs

On the state complexity of partial word DFAs

Eric Balkanski (a), F. Blanchet-Sadri (b,*), Matthew Kilgore (c), B.J. Wyatt (b)

(a) Department of Mathematical Sciences, Carnegie Mellon University, Wean Hall 6113, Pittsburgh, PA 15213, USA
(b) Department of Computer Science, University of North Carolina, P.O. Box 26170, Greensboro, NC 27402-6170, USA
(c) Department of Mathematics, Lehigh University, 14 East Packer Avenue, Bethlehem, PA 18015, USA

Article history: Received 28 October 2013. Received in revised form 19 May 2014. Accepted 14 January 2015. Available online 19 January 2015.

Keywords: Automata and formal languages; State complexity; Regular languages; Partial languages; Partial words; Deterministic finite automata; Non-deterministic finite automata.

Abstract

Recently, Dassow et al. connected partial words and regular languages. Partial words are sequences in which some positions may be undefined, represented with a "hole" symbol $\diamond$. If we restrict what the symbol $\diamond$ can represent, we can use partial words to compress the representation of regular languages. Doing so allows the creation of so-called $\diamond$-DFAs, smaller than the DFAs recognizing the original language $L$, which recognize the compressed language. However, the $\diamond$-DFAs may be larger than the NFAs recognizing $L$. In this paper, we investigate a question of Dassow et al. as to how these sizes are related.

© 2015 Elsevier B.V. All rights reserved.

This material is based upon work supported by the National Science Foundation under Grant No. DMS-1060775. Part of this paper was presented at CIAA'13 [1]. We thank the referees of a preliminary version of this paper for their very valuable comments and suggestions.
* Corresponding author. E-mail address: [email protected] (F. Blanchet-Sadri).
http://dx.doi.org/10.1016/j.tcs.2015.01.021
0304-3975/© 2015 Elsevier B.V. All rights reserved.

1. Introduction

The study of regular languages dates back to McCulloch and Pitts' investigation of neuron nets (1943) and has been extensively developing since (for a survey see, e.g., [2]). Regular languages can be represented by deterministic finite automata, DFAs, by non-deterministic finite automata, NFAs, and by regular expressions. They have found a number of important applications such as compiler design. There are well-known algorithms to convert a given NFA to an equivalent DFA and to minimize a given DFA, i.e., find an equivalent DFA with as few states as possible (see, e.g., [3]). It turns out that there are languages accepted by DFAs that have $2^n$ states while their equivalent NFAs only have $n$ states, these DFAs with $2^n$ states being optimal, i.e., minimal.

Let $\Sigma$ be a finite alphabet of letters. A (full) word over $\Sigma$ is a sequence of letters from $\Sigma$. We denote by $\Sigma^*$ the set of all words over $\Sigma$, the free monoid generated by $\Sigma$ under the concatenation of words where the empty word $\varepsilon$ serves as the identity. A language $L$ over $\Sigma$ is a subset of $\Sigma^*$. It is regular if it is recognized by a DFA or an NFA. A DFA is a 5-tuple $M = (Q, \Sigma, \delta, q_0, F)$, where $Q$ is a set of states, $\delta : Q \times \Sigma \to Q$ is the transition function, $q_0 \in Q$ is the start state, and $F \subseteq Q$ is the set of final or accept states. In an NFA, $\delta$ maps $Q \times \Sigma$ to $2^Q$. We call $|Q|$ the state complexity of the automaton.

Recently, Dassow et al. [4] connected regular languages and partial words. Partial words first appeared in 1974 and are also known under the name of strings with don't cares [5]. In 1999, Berstel and Boasson [6] initiated their combinatorics under the name of partial words. Since then, many combinatorial properties and algorithms have been developed (see,
e.g., [7]). One of Dassow et al.'s motivations was to compress DFAs into smaller machines, called partial word DFAs or $\diamond$-DFAs, which may have transitions labelled by $\diamond$, a "don't care" symbol, that can replace letters of the alphabet. This feature provides a type of restricted nondeterminism that allows $\diamond$-DFAs to fall between NFAs and DFAs in regards to state complexity. In this paper, we solve several problems raised by Dassow et al., providing a better understanding of the structure of the $\diamond$-DFAs, their relation to DFAs and NFAs, and the efficiency of using them to accept regular languages.

More precisely, setting $\Sigma_\diamond = \Sigma \cup \{\diamond\}$, where $\diamond \notin \Sigma$ represents an undefined position, or a hole, and matches every letter of $\Sigma$, a partial word over $\Sigma$ is a sequence of symbols from $\Sigma_\diamond$ (a full word is a partial word without holes). For example, a word of the form bbcba with two $\diamond$ symbols inserted is a partial word with two holes over $\{a, b, c\}$. Denoting the set of all partial words over $\Sigma$ by $\Sigma_\diamond^*$, a partial language $L'$ over $\Sigma$ is a subset of $\Sigma_\diamond^*$. It is regular if it is regular when being considered over $\Sigma_\diamond$. In other words, we define languages of partial words, or partial languages, by treating $\diamond$ as a letter. A regular partial language over $\Sigma$, subset of $\Sigma_\diamond^*$, can be recognized by a $\diamond$-DFA, i.e., a DFA of the form $(Q, \Sigma_\diamond, \delta, q_0, F)$. Partial languages over $\Sigma$, subsets of $\Sigma_\diamond^*$, can be transformed to languages over $\Sigma$, subsets of $\Sigma^*$, by using $\diamond$-substitutions over $\Sigma$. A $\diamond$-substitution $\sigma : \Sigma_\diamond^* \to 2^{\Sigma^*}$ satisfies $\sigma(a) = \{a\}$ for all $a \in \Sigma$, $\sigma(\diamond) \subseteq \Sigma$, and $\sigma(uv) = \sigma(u)\sigma(v)$ for $u, v \in \Sigma_\diamond^*$. As a result, $\sigma$ is fully defined by $\sigma(\diamond)$, e.g., if $\sigma(\diamond) = \{a, b\}$ and $L' = \{\diamond b, \diamond c\}$ then $\sigma(L') = \{ab, bb, ac, bc\}$. If we consider this process in reverse, we can "compress" languages into partial languages.

Given a regular language $L$, $L \subseteq \Sigma^*$, the minimal state complexity among all $\diamond$-DFAs that accept partial languages $L'$, $L' \subseteq \Sigma_\diamond^*$, such that $\sigma(L') = L$ for some $\diamond$-substitution $\sigma$, is referred to as $\min_{\diamond\text{-DFA}}(L)$. We consider the following question from Dassow et al. [4]: Is there a regular language $L$ such that $\min_{\diamond\text{-DFA}}(L)$ is (strictly) less than $\min_{\text{DFA}}(L)$, the minimal state complexity of a DFA accepting $L$? Reference [4, Theorem 4] states that for every regular language $L$, we have $\min_{\text{DFA}}(L) \ge \min_{\diamond\text{-DFA}}(L) \ge \min_{\text{NFA}}(L)$, where $\min_{\text{NFA}}(L)$ denotes the minimal state complexity of an NFA accepting $L$, and there exist regular languages $L$ such that $\min_{\text{DFA}}(L) > \min_{\diamond\text{-DFA}}(L) > \min_{\text{NFA}}(L)$. On the other hand, [4, Theorem 5] states that if $n \ge 3$ is an integer, regular languages $L$ and $L'$ exist such that $\min_{\diamond\text{-DFA}}(L) \le n + 1$, $\min_{\text{DFA}}(L) = 2^n - 2^{n-2}$, $\min_{\text{NFA}}(L') \le 2n + 1$, and $\min_{\diamond\text{-DFA}}(L') \ge 2^n - 2^{n-2}$. This has been the first step towards analyzing the sets:

$$D_n = \{m \mid \text{there exists } L \text{ such that } \min_{\diamond\text{-DFA}}(L) = n \text{ and } \min_{\text{DFA}}(L) = m\},$$

$$N_n = \{m \mid \text{there exists } L \text{ such that } \min_{\diamond\text{-DFA}}(L) = n \text{ and } \min_{\text{NFA}}(L) = m\}.$$

Hence, $D_n$ describes the increase in state complexity when using a DFA instead of a $\diamond$-DFA, while $N_n$ describes the decrease in state complexity when using an NFA instead of a $\diamond$-DFA.

Our paper, whose focus is the analysis of $D_n$ and $N_n$, is organized as follows. We obtain in Section 2 values belonging to $D_n$ by looking at specific types of regular languages, followed by values belonging to $N_n$ in Section 3. We show, in particular, that $2^n - 1$ is the least upper bound for values in $D_n$ when we consider languages with some arbitrarily long words. Due to the nature of NFAs, generating a sequence of minimal NFAs from a $\diamond$-DFA is difficult. However, in the case $\min_{\text{DFA}}(L) > \min_{\diamond\text{-DFA}}(L) = \min_{\text{NFA}}(L)$, we show how to use concatenation of languages to create an $L'$ with systematic differences between $\min_{\diamond\text{-DFA}}(L')$ and $\min_{\text{NFA}}(L')$. We also develop a way of applying integer partitions to obtain such values. We conclude with some remarks in Section 4.
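The $\diamond$-substitution example from the introduction ($\sigma(\diamond) = \{a, b\}$ applied to $\{\diamond b, \diamond c\}$) can be checked mechanically. The following is a minimal Python sketch, not part of the paper; the names `substitute` and `substitute_language`, and the use of `?` as a stand-in for the hole symbol, are assumptions made for illustration.

```python
from itertools import product

HOLE = "?"  # hypothetical stand-in for the hole symbol


def substitute(partial_word, sigma_hole):
    """Return sigma(w): all full words obtained by filling every hole
    of partial_word with a letter from sigma_hole."""
    choices = [sorted(sigma_hole) if c == HOLE else [c] for c in partial_word]
    return {"".join(letters) for letters in product(*choices)}


def substitute_language(partial_language, sigma_hole):
    """Return sigma(L') as the union of sigma(w) over all w in L'."""
    result = set()
    for word in partial_language:
        result |= substitute(word, sigma_hole)
    return result


# The example from the text: sigma(hole) = {a, b}, L' = {hole b, hole c}.
print(sorted(substitute_language({HOLE + "b", HOLE + "c"}, {"a", "b"})))
# prints ['ab', 'ac', 'bb', 'bc']
```

A word with $h$ holes expands into $|\sigma(\diamond)|^h$ full words, which is why a partial language can be a compressed description of a larger language.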

2. Constructs for $D_n$

This section provides some values for $D_n$ by analyzing several classes of regular languages. In the description of the transition function of our DFAs and $\diamond$-DFAs, all the transitions lead to the error state (a sink non-final state) unless otherwise stated. Also, in our figures, the error state and transitions leading to it have been removed for clarity. We will often refer to the following algorithm.

Given a $\diamond$-DFA $M' = (Q', \Sigma_\diamond, \delta', q'_0, F')$ and a $\diamond$-substitution $\sigma$, Algorithm 1 gives a minimal DFA that accepts $\sigma(L(M'))$:

  • Build an NFA $N = (Q', \Sigma, \delta, q'_0, F')$ that accepts $\sigma(L(M'))$, where $\delta(q, a) = \{\delta'(q, a)\}$ if $a \in \Sigma \setminus \sigma(\diamond)$ and $\delta(q, a) = \{\delta'(q, a), \delta'(q, \diamond)\}$ if $a \in \sigma(\diamond)$.
  • Convert $N$ to an equivalent minimal DFA.

2.1. Languages of words of equal length

First, we look at languages of words of equal length. We give three constructs. Our first construct is illustrated in Fig. 1.
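Algorithm 1, which the constructions in this section rely on, can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: the function name `algorithm1_size`, the dictionary encoding of $\delta'$, and the `?` stand-in for the hole symbol are assumptions. It uses the standard subset construction followed by Moore-style partition refinement, treats a missing transition as a move to the error state, and returns only the number of states of the minimal complete DFA (a reachable dead state is counted, matching the convention above that the error state is a state).

```python
HOLE = "?"  # hypothetical stand-in for the hole symbol


def algorithm1_size(delta, start, finals, alphabet, sigma_hole):
    """Number of states of the minimal complete DFA for sigma(L(M')).

    delta maps (state, symbol) to a state of the hole-DFA M'; a missing
    entry plays the role of a transition to the error state.
    """
    alphabet = sorted(alphabet)
    finals = frozenset(finals)

    def step(subset, a):
        # NFA move: follow the a-transition and, when a is in
        # sigma(hole), the hole-transition as well.
        targets = set()
        for q in subset:
            if (q, a) in delta:
                targets.add(delta[(q, a)])
            if a in sigma_hole and (q, HOLE) in delta:
                targets.add(delta[(q, HOLE)])
        return frozenset(targets)

    # Subset construction over reachable subsets; the empty set is the
    # dead state of the resulting DFA.
    first = frozenset([start])
    trans, todo, seen = {}, [first], {first}
    while todo:
        s = todo.pop()
        for a in alphabet:
            t = step(s, a)
            trans[(s, a)] = t
            if t not in seen:
                seen.add(t)
                todo.append(t)

    # Partition refinement: split blocks until no block changes.
    block = {s: int(bool(s & finals)) for s in seen}
    while True:
        ids, refined = {}, {}
        for s in seen:
            key = (block[s],) + tuple(block[trans[(s, a)]] for a in alphabet)
            refined[s] = ids.setdefault(key, len(ids))
        if len(ids) == len(set(block.values())):
            return len(ids)
        block = refined


# Hole-DFA for L' = {hole b, hole c} with sigma(hole) = {a, b}.
delta = {(0, HOLE): 1, (1, "b"): 2, (1, "c"): 2}
print(algorithm1_size(delta, 0, {2}, {"a", "b", "c"}, {"a", "b"}))  # prints 4
```

In the usage example the minimal DFA for $\{ab, bb, ac, bc\}$ has a start state, one state after the first letter, one accept state, and the error state.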

Theorem 1. For $n \ge 1$, $\lfloor \frac{n-1}{3} \rfloor^2 + \lfloor \frac{n-1}{3} \rfloor + 2 + (n-1) \bmod 3 \in D_n$.

Proof. Let $d = \lfloor \frac{n-1}{3} \rfloor$ and $r = (n-1) \bmod 3$. We define the DFA $M = (Q, \Sigma, \delta, q_0, F)$ as follows:

  • $Q = \{(0,0)\} \cup Q_1 \cup Q_2 \cup Q_3$, where $q_0 = (0,0)$, $Q_1 = \{(1,i) \mid 0 \le i < d\}$, $Q_2 = \{(2,i) \mid 1 \le i < d^2\}$, and $Q_3 = \{(i,0) \mid 3 \le i \le r+4\}$, $F = \{(r+3,0)\}$, and $(r+4,0)$ is the error state;
  • $\Sigma = \{a_0, \ldots, a_{d-1}\} \cup \{b_1, \ldots, b_{d-1}\} \cup \{b_{d+1}, \ldots, b_{2d-1}\} \cup \{c\}$;
  • $\delta$ is defined as follows:
    - $\delta((0,0), a_i) = (1,i)$ for all $a_i \in \Sigma$,
    - $\delta((1,i), a_j) = (2, id+j)$ if $(1,i), (2, id+j) \in Q$ and $a_j \in \Sigma$,
    - $\delta((2,i), b_j) = (3,0)$ if $(2,i) \in Q$, $b_j \in \Sigma$, $j = \lfloor i/d \rfloor$ or $j = (i \bmod d) + d$,
    - $\delta((i,0), c) = (i+1,0)$ for $3 \le i \le 2+r$.

Observe that $L = L(M) = \{a_i a_j b_k c^r \mid a_i, a_j, b_k \in \Sigma;\ k = i \text{ or } k = d+j\}$ and that $M$ contains $|Q_1| + |Q_2| + |Q_3| + 1 = d^2 + d + r + 2$ states. Furthermore, each state from $Q_2$ is reached after $M$ reads a unique two-letter sequence, and $(3,0)$ is reachable from each state of $Q_2$ for a unique subset of letters, so $M$ is minimal.

We can build a $\diamond$-DFA $M'$ such that $\sigma(L(M')) = L$ for $\sigma(\diamond) = \{a_i \mid a_i \in \Sigma\}$. Let $M' = (Q', \Sigma_\diamond, \delta', q_0, F)$, where:

  • $Q' = \{(0,0)\} \cup Q_1 \cup Q'_2 \cup Q_3$, where $Q'_2 = \{(2,i) \mid 1 \le i < 2d,\ i \ne d\}$, $q_0 = (0,0)$, $F = \{(r+3,0)\}$, and the error state is $(r+4,0)$;
  • $\delta'$ is defined as follows:
    - $\delta'((0,0), \diamond) = (1,0)$ and $\delta'((0,0), a_i) = (1,i)$ for all $a_i \in \Sigma$, $i \ne 0$,
    - $\delta'((1,0), a_i) = (2, d+i)$ and $\delta'((1,i), \diamond) = (2,i)$ for $1 \le i < d$,
    - $\delta'((2,i), b_i) = (3,0)$ for all $(2,i) \in Q'$,
    - $\delta'((i,0), c) = (i+1,0)$ for $3 \le i \le 2+r$.

Then $L(M') = \{\diamond a_i b_j c^r \mid a_i, b_j \in \Sigma;\ j = d+i\} \cup \{a_i \diamond b_j c^r \mid a_i, b_j \in \Sigma;\ j = i\}$. Furthermore, $M'$ is minimal with $3d + r + 1 = n$ states over all $\diamond$-substitutions since each pair of states $(1,i)$, $(2,j)$ is necessary. If $i = 0$, the pair is needed for $\diamond a_{j-d} b_j$; otherwise, it is needed for $a_i \diamond b_j$.

[Fig. 1. DFA $M$ (left) and $\diamond$-DFA $M'$ (right) from Theorem 1, $n = 11$. The error states are omitted.]

[Fig. 2. DFA $M$ (left) and $\diamond$-DFA $M'$ (right) from Theorem 2, $n = 11$, $x = 4$. The error states are omitted.]

Our next construct is illustrated in Fig. 2.
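The state counts in the proof of Theorem 1 can be reproduced by enumerating the state sets directly. The sketch below is illustrative only: the function names are hypothetical, and it is restricted to $d \ge 1$ (that is, $n \ge 4$), since the sets $Q_2$ and $Q'_2$ as written degenerate when $d = 0$.

```python
def theorem1_state_counts(n):
    """Return (|Q|, |Q'|) for the DFA M and the hole-DFA M' of Theorem 1."""
    d, r = divmod(n - 1, 3)
    if d < 1:
        raise ValueError("this sketch only covers d >= 1, i.e. n >= 4")
    start = [(0, 0)]
    q1 = [(1, i) for i in range(d)]
    q2 = [(2, i) for i in range(1, d * d)]                  # Q_2 of M
    q2_hole = [(2, i) for i in range(1, 2 * d) if i != d]   # Q'_2 of M'
    q3 = [(i, 0) for i in range(3, r + 5)]                  # includes the error state
    return len(start + q1 + q2 + q3), len(start + q1 + q2_hole + q3)


def theorem1_formula(n):
    """Value claimed to lie in D_n by Theorem 1."""
    d, r = divmod(n - 1, 3)
    return d * d + d + 2 + r


# The case of Fig. 1: n = 11 gives d = 3 and r = 1.
print(theorem1_state_counts(11), theorem1_formula(11))  # prints (15, 11) 15
```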

Theorem 2. For $n \ge 1$, if $x = \left\lfloor \frac{\sqrt{1+8(n-1)} - 1}{2} \right\rfloor$ then $2^x + n - 1 - \frac{x(x+1)}{2} \in D_n$.

Proof. We start by writing $n$ as $n = r + \sum_{i=1}^{x} i$ such that $1 \le r \le x+1$ (from [8], $x$ is as stated). Let $M = (Q, \Sigma, \delta, q_0, F)$ be the DFA defined as follows:

  • $\{(i,j) \mid 0 \le i < x,\ 0 \le j < 2^i,\ (i,j) \ne (x-1,0)\} \cup \{(i,0) \mid x \le i \le x+r\} = Q$, $q_0 = (0,0)$, $F = \{(x+r-1,0)\}$, and $(x+r,0)$ is the error state;
  • $\Sigma = \{a_0, a_1, c\} \cup \{b_i \mid 1 \le i < x\}$;
  • $\delta$ is defined as follows:
    - $\delta((i,j), a_k) = (i+1, 2j+k)$ for all $(i,j), (i+1, 2j+k) \in Q$, $a_k \in \Sigma$, $i \ne x-1$, with the exception of $\delta((x-2,0), a_0) = (x+r,0)$,
    - $\delta((x-1,i), b_j) = (x,0)$ for all $(x-1,i) \in Q$, $b_j \in \Sigma$ where the $j$th digit from the right in the binary representation of $i$ is a 1,
    - $\delta((i,0), c) = (i+1,0)$ for $x \le i < x+r$.

Each word accepted by $M$ can be written in the form $w = u b_i c^{r-1}$, where $u$ is a word of length $x-1$ over $\{a_0, a_1\}$ except for $a_0^{x-1}$, and $b_i$ belongs to some subset of $\Sigma$ unique for each $u$. This implies that $M$ is minimal with $2^x + n - 1 - \frac{x(x+1)}{2}$ states. We can build the minimal equivalent $\diamond$-DFA for $\sigma(\diamond) = \{a_0, a_1\}$, giving $M' = (Q', \Sigma_\diamond, \delta', q_0, F)$ with $n$ states as follows:

  • $\{(i,j) \mid 0 \le i < x,\ 0 \le j \le i,\ (i,j) \ne (x-1,0)\} \cup \{(i,0) \mid x \le i \le x+r\} = Q'$, $q_0 = (0,0)$, $F = \{(x+r-1,0)\}$, and $(x+r,0)$ is the error state;
  • $\delta'$ is defined as follows:
    - $\delta'((i,0), a_1) = (i+1, i+1)$ for $0 \le i < x-1$,
    - $\delta'((i,j), \diamond) = (i+1, j)$ for all $(i,j) \in Q' \setminus \{(x-2,0)\}$ where $i < x-1$,
    - $\delta'((x-1,i), b_{x-i}) = (x,0)$ for $1 \le i < x$,
    - $\delta'((x+i,0), c) = (x+i+1,0)$ for $0 \le i < r-1$.

Observe that $L(M') = \{\diamond^{x-i-1} a_1 \diamond^{i-1} b_i c^{r-1} \mid 1 \le i < x\}$, so $\sigma(L(M')) = L(M)$. Each accepted word consists of a unique prefix of length $x-1$ paired with a unique $b_i$, and $r$ states are needed for the suffix $c^{r-1}$, which implies that $M'$ is minimal over all $\diamond$-substitutions. Note that $|Q'| = (\sum_{i=1}^{x} i) + r = n$.

[Fig. 3. DFA $M$ (top) and $\diamond$-DFA $M'$ (bottom) from Theorem 3, $k = 3$, $l = 1$, $r = 0$. The error states are omitted.]

The two previous constructs both used an alphabet of variable size. Our next construct restricts this to a constant $k$. It is illustrated in Fig. 3.
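For Theorem 2, the decomposition $n = r + \sum_{i=1}^{x} i$ and the sizes of $Q$ and $Q'$ can be checked by enumeration. This is a hedged sketch with a hypothetical function name; it is restricted to $x \ge 2$ (that is, $n \ge 4$), because for smaller $x$ the excluded state $(x-1, 0)$ coincides with or precedes the start state and the sets as written degenerate.

```python
from math import isqrt


def theorem2_state_counts(n):
    """Return (x, r, |Q|, |Q'|) for the machines in the proof of Theorem 2."""
    # floor((sqrt(1 + 8(n - 1)) - 1) / 2), computed with integer arithmetic
    x = (isqrt(1 + 8 * (n - 1)) - 1) // 2
    if x < 2:
        raise ValueError("this sketch only covers x >= 2, i.e. n >= 4")
    r = n - x * (x + 1) // 2
    tail = [(i, 0) for i in range(x, x + r + 1)]  # includes the error state
    q = [(i, j) for i in range(x) for j in range(2 ** i)
         if (i, j) != (x - 1, 0)] + tail
    q_hole = [(i, j) for i in range(x) for j in range(i + 1)
              if (i, j) != (x - 1, 0)] + tail
    return x, r, len(q), len(q_hole)


# The case of Fig. 2: n = 11 gives x = 4.
print(theorem2_state_counts(11))  # prints (4, 1, 16, 11)
```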

Theorem 3. For $k > 1$ and $l, r \ge 0$, let $n = \frac{k(k+2l+3)}{2} + r + 2$. Then $2^{k+1} + l(2^k - 1) + r \in D_n$.

Proof. We start by defining $M = (Q, \Sigma, \delta, q_0, F)$ as follows:

  • $Q = Q_1 \cup Q_2 \cup Q_3$, where $Q_1 = \{(i,j) \mid 0 \le i \le k \text{ and } 0 \le j < 2^i\} \setminus \{(k,0)\}$, $Q_2 = \{(i,j) \mid k < i \le k+l \text{ and } 1 \le j < 2^k\}$, $Q_3 = \{(i,0) \mid k+l < i \le k+l+r+2\}$, $q_0 = (0,0)$, $F = \{(k+l+r+1,0)\}$, and $(k+l+r+2,0)$ is the error state;
  • $\Sigma = \{a_0, \ldots, a_{k-1}\}$;
  • $\delta$ is defined as follows:
    - $\delta((i,j), a_0) = (i+1, 2j)$ for all $(i,j) \in Q$, $0 \le i < k$, except $(k-1,0)$,
    - $\delta((i,j), a_1) = (i+1, 2j+1)$ for all $(i,j) \in Q$, $0 \le i < k$,
    - $\delta((i,j), a_0) = (i+1, j)$ for all $(i,j) \in Q$, $k \le i < k+l$,
    - $\delta((k+l,i), a_{j-1}) = (k+l+1,0)$ for all $(k+l,i) \in Q$ and $a_{j-1} \in \Sigma$ where the $j$th digit from the right in the binary representation of $i$ is 1,
    - $\delta((i,0), a_0) = (i+1,0)$ for $k+l+1 \le i \le k+l+r$.

Each word accepted by $M$ can be written as $x y a_i z$, where $x$ is a word of length $k$ over $\{a_0, a_1\}$ with at least one $a_1$, $y = a_0^l$, $z = a_0^r$, and $\mathrm{rev}(x)[i] = a_1$, where $\mathrm{rev}(x)$ denotes the reversal of $x$. Each $x$ corresponds to a different non-empty subset of $\Sigma$ representing which letters $a_i$ are allowed in position $k+l$ (numbering of positions starts at 0), meaning that each prefix $xy$ must be represented by a unique state in $M$, which accounts for the states in $Q_1$ and $Q_2$. The set $Q_3$ contains $(k+l+1,0)$, which $M$ reaches after reading any prefix $x y a_i$ of an accepted word, along with the $r$ states needed to spell $z$ and the error state. Thus, $M$ is minimal with $2^{k+1} + l(2^k - 1) + r$ states.

We now define a $\diamond$-DFA $M' = (Q', \Sigma_\diamond, \delta', q_0, F)$ as follows:

  • $Q' = Q'_1 \cup Q'_2 \cup Q_3$ where $Q'_1 = \{(i,j) \mid 0 \le i \le k \text{ and } 0 \le j \le i\} \setminus \{(k,0)\}$, $Q'_2 = \{(i,j) \mid k < i \le k+l \text{ and } 0 < j \le k\}$, $Q_3 = \{(i,0) \mid k+l < i \le k+l+r+2\}$, $q_0 = (0,0)$, $F = \{(k+l+r+1,0)\}$, and $(k+l+r+2,0)$ is the error state;
  • $\delta'$ is defined as follows:
    - $\delta'((i,j), \diamond) = (i+1, j)$ for all $(i,j) \in Q'$, $0 \le i < k$, except for $(i,j) = (k-1,0)$,
    - $\delta'((i,0), a_1) = (i+1, i+1)$ for all $(i,0) \in Q'$, $0 \le i < k$,
    - $\delta'((i,j), a_0) = (i+1, j)$ for all $(i,j) \in Q'$, $k \le i < k+l$,
    - $\delta'((k+l,i), a_{k-i}) = (k+l+1,0)$ for all $(k+l,i) \in Q'$ and $a_{k-i} \in \Sigma$,
    - $\delta'((i,0), a_0) = (i+1,0)$ for $k+l+1 \le i \le k+l+r$.

We have $\sigma(L(M')) = L(M)$ for $\sigma(\diamond) = \{a_0, a_1\}$. State $(k+l+1,0)$ is reachable from each state $(k+l,i)$ through a single, uniquely labelled transition. Thus, $M'$ is minimal with $n$ states.

2.2. Languages of words of bounded length

Next, we look at languages of words of bounded length. The following theorem is illustrated in Fig. 4.

[Fig. 4. DFA $M$ (top) and $\diamond$-DFA $M'$ (bottom) from Theorem 4, $n = 7$ and $m = 15$ ($l = 3$, $r = 1$). The error states are omitted.]

Theorem 4. For $n \ge 3$, $[n,\ n + \frac{(n-2)(n-3)}{2}] \subseteq D_n$.

Proof. Write $m = n + r + \sum_{i=l}^{n-3} i$ for the lowest value of $l \ge 1$ such that $r \ge 0$. Let $M = (Q, \Sigma, \delta, q_0, F)$ be defined as follows:

  • $\Sigma = \{a_0, a_r\} \cup \{a_i \mid l \le i \le n-3\}$;
  • $Q = \{(i,0) \mid 0 \le i < n\} \cup \{(i,j) \mid a_j \in \Sigma \text{ and } 1 \le i \le j\}$, $q_0 = (0,0)$, $F = \{(n-2,0)\} \cup \{(i,i) \mid i \ne 0,\ (i,i) \in Q\}$, and $(n-1,0)$ is the error state;
  • $\delta$ is defined by $\delta((0,0), a_i) = (1,i)$ for all $a_i \in \Sigma$ where $i > 0$, $\delta((i,j), a_0) = (i+1, j)$ for all $(i,j) \in Q$, $i \ne j$, and $\delta((i,i), a_0) = (i+1, 0)$ for all $(i,i) \in Q$.

Then $L(M) = \{a_i a_0^{n-3} \mid a_i \in \Sigma\} \cup \{a_i a_0^{i-1} \mid a_i \in \Sigma,\ i \ne 0\}$. For each $a_i$, $i \ne 0$, $M$ requires $i$ states. These are added to the error state and the $n-1$ states needed for $a_0^{n-2}$. Thus, $M$ is minimal with $m$ states.

Let $M' = (Q', \Sigma_\diamond, \delta', q'_0, F')$, where $Q' = \{i \mid 0 \le i < n\}$, $q'_0 = 0$, $F' = \{n-2\}$, and $n-1$ is the error state; $\delta'$ is defined by $\delta'(0, \diamond) = 1$, $\delta'(0, a_i) = n-1-i$ for all $a_i \in \Sigma$, $i > 0$, and $\delta'(i, a_0) = i+1$ for $1 \le i < n-1$. For $\sigma(\diamond) = \Sigma$, we have $\sigma(L(M')) = L(M)$. Furthermore, $M'$ needs $n-1$ states to accept $\diamond a_0^{n-3} \in L(M')$, so $M'$ is minimal with $n$ states.

Theorem 4 gives elements of $D_n$ close to its lower bound. To find an upper bound, we look at a specific class of machines. Let $n \ge 2$ and let

$$R_n = (\{0, \ldots, n-1\},\ \{a_0\} \cup \{(\alpha_i)_j \mid 2 \le i+2 \le j \le n-2\},\ \delta',\ 0,\ \{n-2\}) \qquad (1)$$

be the $\diamond$-DFA where $n-1$ is the error state, and $\delta'$ is defined by $\delta'(i, \diamond) = i+1$ for $0 \le i < n-2$ and $\delta'(i, (\alpha_i)_j) = j$ for all $(\alpha_i)_j$. Fig. 5 gives an example when $n = 7$. Set $L_n = \sigma(L(R_n))$, where $\sigma$ is the $\diamond$-substitution that maps $\diamond$ to the alphabet. Note that $R_n$ is minimal for $L(R_n)$, since we need at least $n-1$ states to accept words of length $n-2$ without accepting longer strings. Furthermore, $R_n$ is minimal for $\sigma$, as each letter $(\alpha_i)_j$ encodes a transition between a unique pair of states $(i,j)$. This also implies that $R_n$ is minimal for any $\diamond$-substitution. The next two theorems look at the minimal DFA that accepts $L_n$. We refer the reader to Fig. 5 to visualize the ideas behind the proofs.

[Fig. 5. $\diamond$-DFA $R_7$ (top if the dashed edges are seen as solid) and minimal DFA for $\sigma(L_7)$ (bottom if the dotted element is ignored and the dashed edges are seen as solid) where $\alpha_0 = a$, $\alpha_1 = b$, $\alpha_2 = c$, $\alpha_3 = d$ and $\sigma(\diamond) = \{a_0, a_2, a_3, a_4, a_5, b_3, b_4, b_5, c_4, c_5, d_5\}$. The error states are omitted.]

Referring to Fig. 5, in the DFA, each explicitly labelled transition is for the indicated letters. From each state, there is one transition that is not labelled; this represents the transition for each letter not explicitly labelled in a different transition from that state. (For example, from state 0, $a_3$ transitions to $\{1,3\}$, $a_2$ transitions to $\{1,2\}$, $a_4$ transitions to $\{1,4\}$, $a_5$ transitions to $\{1,5\}$, and all other letters $a_0, b_3, b_4, b_5, c_4, c_5, d_5$ transition to $\{1\}$.) We introduce a new letter, $e$, into the alphabet and add a new state, $\{2,3,4,5\}$, along with a transition from $\{1,3\}$ to $\{2,3,4,5\}$ for $e$. We want to alter the $\diamond$-DFA to accommodate this. So we add a transition for $e$ from 1 to 3 and from 3 to 5 (represented by dashed edges). All other states transition to the error state for $e$. Now consider the string $a_3 e$. We get four strings that correspond to some partial word that produces $a_3 e$ after substitution: $a_3 e$, $a_3 \diamond$, $\diamond e$, and $\diamond\diamond$. When the $\diamond$-DFA reads the first, it halts in state 5; on the second, it halts in 4; on the third, it halts in 3; and for the fourth, it halts in 2, which matches the added state $\{2,3,4,5\}$. Finally, we need to consider the effect of adding $e$ and the described transitions to the $\diamond$-DFA: does it change the corresponding minimal DFA in other ways? To show that it does not, all transitions with dashed edges in the DFA represent the transitions for $e$, e.g., from state $\{2,3\}$, an $e$ transitions to $\{3,4,5\}$.

Theorem 5. Let Fib be the Fibonacci sequence defined by $\mathrm{Fib}(1) = \mathrm{Fib}(2) = 1$ and for $n \ge 2$, $\mathrm{Fib}(n+1) = \mathrm{Fib}(n) + \mathrm{Fib}(n-1)$. Then for $n \ge 1$, $\mathrm{Fib}(n+1) \in D_n$.

Proof. For $n \ge 2$, applying Algorithm 1, convert $M' = R_n$ to a minimal DFA $M = (Q, \Sigma, \delta, q_0, F)$ that accepts $L_n$, where $Q \subseteq 2^{\{0, \ldots, n-1\}}$. For each state $\{i\} \in Q$ for $0 \le i \le n-2$, $M$ requires additional states to represent each possible subset of one or more states of $\{i+1, \ldots, n-2\}$ that $M'$ could reach in $i$ transitions. Thus $M$ is minimal with number of states

$$1 + \sum_{i=0}^{n-2} \sum_{j=0}^{\min\{i,\, n-2-i\}} \binom{n-2-i}{j} = \mathrm{Fib}(n+1),$$

where the summand 1 refers to the error state and where the inside sum refers to the number of states with minimal element $i$.

We can use the machine $M$ from the proof of Theorem 5 to construct a machine that gives the least upper bound for $D_n$ for languages of words of bounded length.
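The double sum in the proof of Theorem 5 can be compared with the Fibonacci numbers for small $n$. The following Python sketch is only a numerical sanity check of that identity; the function names `fib` and `theorem5_count` are hypothetical.

```python
from math import comb


def fib(k):
    """Fibonacci numbers with Fib(1) = Fib(2) = 1, as in Theorem 5."""
    a, b = 1, 1
    for _ in range(k - 1):
        a, b = b, a + b
    return a


def theorem5_count(n):
    """1 + sum over i of sum over j of C(n-2-i, j), as in the proof."""
    total = 1  # the error state
    for i in range(n - 1):                      # i = 0, ..., n-2
        for j in range(min(i, n - 2 - i) + 1):  # j = 0, ..., min{i, n-2-i}
            total += comb(n - 2 - i, j)
    return total


for n in range(2, 9):
    print(n, theorem5_count(n), fib(n + 1))
```

For $n = 7$, the case of Fig. 5, both columns give 21.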

Theorem 6. For $n \ge 3$, $\sum_{i=0}^{n-1} \binom{n-1-\lceil \log_2 i \rceil}{i} \in D_n$.

Proof. Let $M'$ and $M$ be as in Theorem 5 and let $\sigma(\diamond) = \Sigma = \{a_0\} \cup \{(\alpha_i)_j \mid 2 \le i+2 \le j \le n-2\}$. For each state $P \in Q \setminus \{\{n-2\}, \{n-1\}\}$, consider the set $P' \subseteq Q$ of possible states to which $M$ transitions when reading one symbol. For each state $i$ in $P$, if $i+1 \le n-1$ then $i+1$ is in each state $p \in P'$, as $M$ must track, for each symbol, the state $M'$ would reach from a current state on reading $\diamond$. At most, for each $i$ in $P$, $i \le n-4$, $p$ may contain one additional state $j$, $i+2 \le j \le n-2$, to track non-$\diamond$ transitions in $M'$ that do not end in the error state. Thus for each $i$, $1 \le i \le n-2$, $M$ needs at maximum one state corresponding to each non-empty subset of $\{i, \ldots, n-2\}$. Counting the number of resulting states gives our result as the upper bound for $m$ in this case.

For $n \ge 7$, $M$ contains fewer states than our bound. However, we can modify $M$ to produce a machine $M_1$ that reaches this bound. Let $N = (Q, \Sigma, \delta'', q_0, F)$ be the NFA accepting $L(M)$ such that $\delta''(q, a) = \{\delta(q, a)\}$ for each $(q, a)$. Set $Q'' = S_2 \cup \cdots \cup S_{n-4}$, where $S_i$ is the set of all subsets of $\{0, \ldots, n-2\}$ of size no greater than $2^i$ containing $i$ as its lowest member. Then for each state $P \in Q''$ that is unreachable, we add a letter $\alpha_P \notin \Sigma$ to our alphabet and define $\delta''$ for $\alpha_P$ as follows. Let $i$ be the lowest state in $P$. We select a state $P_1 \in Q''$ such that $P_1$ is a subset of $P$ of size $|P| - 1$ that contains $i$. (If no such $P_1$ yet exists, we move to a different $P$; eventually, such a $P_1$ is generated with this method.) We then select a state $P_2 \in Q''$ such that there is a transition $\delta''(P_2, a) = P_1$ for some $a \in \Sigma$, $a \ne a_0$. We then define $\delta''(P_2, \alpha_P) = P$.

We convert $N$ into a minimal equivalent DFA $M_1 = (Q_1, \Sigma_1, \delta_1, \{q_0\}, F_1)$, which has the same number of states as our bound. Then for $\sigma(\diamond) = \Sigma_1$, we derive a minimal $\diamond$-DFA $M'_1$ so that $L(M_1) = \sigma(L(M'_1))$. Note that $M'_1$ and $M'$ have the same number of states, and the transition functions are identical for all states and symbols common to both machines. For each added letter $\alpha_P$, all but two transitions lead to the error state.

Our next result restricts the alphabet size to 2.

Theorem 7. For $n \ge 1$, $\dfrac{\lfloor \frac{n}{2} \rfloor (\lfloor \frac{n}{2} \rfloor + 1) + \lfloor \frac{n-1}{2} \rfloor (\lfloor \frac{n-1}{2} \rfloor + 1)}{2} + 1 \in D_n$.

Proof. Let $M' = (Q', \Sigma_\diamond, \delta', q'_0, F')$ be the $\diamond$-DFA, where $Q' = \{0, \ldots, n-1\}$, $\Sigma = \{a, b\}$, $q'_0 = 0$, $F' = \{n-2\}$, $n-1$ is the error state, and $\delta'$ is defined by $\delta'(i, \diamond) = i+1$ for $i < n-1$, and $\delta'(i, b) = i+2$ for $i < n-2$. For a word $w$ over $\Sigma$, let $|w|_a$ and $|w|_b$ be the number of occurrences in $w$ of $a$ and $b$, respectively. Observe that $\sigma(L(M')) = \{w \mid n-2-|w|_b \le |w|_a + |w|_b \le n-2\}$ for $\sigma(\diamond) = \Sigma$, and that $M'$ is minimal, as $a^{n-2} \in \sigma(L(M'))$, but $a^i \notin \sigma(L(M'))$ for $i > n-2$. Next, let $M$ be the minimal DFA constructed for $\sigma(L(M'))$ using Algorithm 1. Then $M$ has a state corresponding to $\{i\} \in 2^{Q'}$ for each $0 \le i \le n-2$ to accept $a^{n-2}$. For each $b$ read by $M$ from a state corresponding to a subset $\{i\}$ of $Q'$ where $0 \le i \le n-4$, $M$ has a state corresponding to $\{i+1, i+2\}$ to represent the states in $Q'$ that $M'$ reaches from $i$ after reading either $\diamond$ or a $b$. Continuing this way, $M$ has a state corresponding to each subset $\{i+j, i+j+1, \ldots, i+2j\}$ of $Q'$, where $0 \le i \le n-2$, $0 \le j \le \lfloor \frac{n-2-i}{2} \rfloor$. Thus we can calculate the number of non-error states in $M$. If we add 1 to this number for the error state, we get our result.

2.3. Languages with some arbitrarily long words

Finally, we look at languages with some arbitrarily long words.
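The count in the proof of Theorem 7 can be reproduced by listing the subsets $\{i+j, \ldots, i+2j\}$ named there and comparing the total with the closed form. This Python sketch is a numerical check of the arithmetic only, under the assumption that the brackets in the theorem statement are floors; the function names are hypothetical.

```python
def theorem7_count(n):
    """Enumerate the subsets {i+j, ..., i+2j} from the proof of
    Theorem 7 and add 1 for the error state."""
    subsets = set()
    for i in range(n - 1):                      # 0 <= i <= n-2
        for j in range((n - 2 - i) // 2 + 1):   # 0 <= j <= floor((n-2-i)/2)
            subsets.add(frozenset(range(i + j, i + 2 * j + 1)))
    return len(subsets) + 1


def theorem7_formula(n):
    """Closed form from the statement of Theorem 7, read with floors."""
    h, g = n // 2, (n - 1) // 2
    return (h * (h + 1) + g * (g + 1)) // 2 + 1


for n in range(2, 9):
    print(n, theorem7_count(n), theorem7_formula(n))
```

Each subset is an interval determined by its minimum $i+j$ and its size $j+1$, so distinct pairs $(i, j)$ give distinct subsets.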

Theorem 8. For $n \ge 1$, $2^n - 1$ is the least upper bound for $m \in D_n$.

Proof. Let $M'$ be a minimal $\diamond$-DFA with $\diamond$-substitution $\sigma$. If we convert this to a minimal DFA accepting $\sigma(L(M'))$ using Algorithm 1, the resulting DFA has at most $2^n - 1$ states, one for each non-empty subset of the set of states in $M'$. Thus an upper bound for $m \in D_n$ is $2^n - 1$.

We show that there exists a regular language $L$ such that $\min_{\diamond\text{-DFA}}(L) = n$ and $\min_{\text{DFA}}(L) = 2^n - 1$. Let $M' = (Q', \Sigma_\diamond, \delta', q'_0, F')$ with $Q' = \{0, \ldots, n-1\}$, $\Sigma = \{a, b\}$, $q'_0 = 0$, $F' = \{n-1\}$, and $\delta'$ defined by $\delta'(i, \alpha) = i+1$ for $0 \le i < n-1$, $\alpha \in \{\diamond, a\}$; $\delta'(n-1, \alpha) = 0$ for $\alpha \in \{\diamond, a\}$; and $\delta'(i, b) = 0$ for $0 \le i < n$. Then $M'$ is minimal, since $\diamond^{n-1} \in L(M')$ but $\diamond^i \notin L(M')$ for $0 \le i < n-1$. After constructing the minimal DFA $M = (Q, \Sigma, \delta, q_0, F)$ using Algorithm 1 for $\sigma(\diamond) = \{a, b\}$, we claim that all non-empty subsets of $Q'$ are states in $Q$. To show this, we construct a word that ends in any non-empty subset $P$ of $Q'$. Let $P = \{p_0, \ldots, p_x\}$ with $p_0 < \cdots < p_x$. We start with $a^{p_x}$. Then create the word $w$ by replacing the $a$ in each position $p_x - p_i - 1$, $0 \le i < x$, with $b$.

We show that $w$ ends in state $P$ by first showing that for each $p_i \in P$, some partial word $w'$ exists such that $w \in \sigma(w')$ and $M'$ halts in $p_i$ when reading $w'$. First, suppose $p_i = p_x$. Since $|w| = p_x$, let $w' = \diamond^{p_x}$. For $w'$, $M'$ halts in $p_x$. Now, suppose $p_i \ne p_x$. Let $w' = \diamond^{p_x - p_i - 1} b \diamond^{p_i}$. After reading $\diamond^{p_x - p_i - 1}$, $M'$ is in state $p_x - p_i - 1$, then in state 0 for $b$, and then in state $p_i$ after reading $\diamond^{p_i}$.

Now suppose a partial word $w'$ exists such that $w \in \sigma(w')$ where $M'$ halts in $p$ for $p \notin P$. Suppose $p > p_x$. Each state $i \in Q'$ is only reachable after $i$ transitions and $|w'| = p_x$, so $M'$ cannot reach $p$ after reading $w'$. Now suppose $p < p_x$. Then $M'$ needs to be in state 0 after reading $p_x - p$ symbols to end in $p$, so we must have $w'[p_x - p - 1] = b$. However, $w[p_x - p - 1] = a$, a contradiction.

Furthermore, no states of $Q$ are equivalent, as each word $w$ ends in a unique state of $Q$. Therefore, $M$ has $2^n - 1$ states, and $2^n - 1 \in D_n$.

To further study intervals in $D_n$, we look at the following class of $\diamond$-DFAs. For $n \ge 2$ and $0 \le r < n$, let

$$R_{n,r}\{s_1, \ldots, s_k\} = (\{0, \ldots, n-1\},\ \{a_0, a_1, \ldots, a_k\},\ \delta',\ 0,\ \{n-1\}) \qquad (2)$$

be the $\diamond$-DFA where $\{s_1, \ldots, s_k\}$ is a set of tuples whose first member is a letter $a_i$, distinct from $a_0$, followed by one or more states in ascending order, and where $\delta'(q, a_i) = 0$ for all $(q, a_i)$ that occur in the same tuple, $\delta'(i, \diamond) = i+1$ for $0 \le i \le n-2$, $\delta'(n-1, \diamond) = r$, and $\delta'(q, a_i) = \delta'(q, \diamond)$ for all other $(q, a_i)$. Since $R_{n,r}\{\}$ is minimal for any $\diamond$-substitution, and since $\diamond$ and non-$\diamond$ transitions from any state end in the same state, Algorithm 1 converts $R_{n,r}\{\}$ to a minimal DFA with exactly $n$ states. The next result looks at $\diamond$-DFAs of the form $R_{n,r}\{(a_1, 0)\}$.

    eorem 9. For n 2 and 0 i < n, n + (n 1)i Dn.

    oof. Let a0 = a and a1 = b, let r = n i 1, let () = = {a, b}, and let M be the -DFA Rn,r{(b, 0)} = ({0, . . . , n 1},, b}, , 0, {n 1}), where

    a b 0 1 0 11 2 2 2...

    .

    .

    ....

    .

    .

    .

    n 2 n 1 n 1 n 1n 1 r r r

    Using Algorithm 1, let $M = (Q, \Sigma, \delta, \{0\}, F)$ be the minimal DFA accepting $\sigma(L(M_\diamond))$. For all words over $\Sigma$ of length less than $n$, $M$ must halt in some state $P \in Q$, a subset of consecutive states of $\{0, \ldots, n - 1\}$. Moreover, any state $P \in Q$ of consecutive states of $\{0, \ldots, n - 1\}$, with minimal element $p$, is reached by $M$ when reading $b^q a^p$ for some $q \ge 0$. Also, any accept states in $Q$ that are subsets of $\{0, \ldots, n - 1\}$ of size $n - r$ or greater are equivalent, as are any non-accept states that are subsets of size $n - r$ or greater such that the $n - r$ greatest values in each set are identical. This implies that $M$ requires $\sum_{j=n-i}^{n} j$ states for words of length less than $n$.

    For words of length $n$ or greater, $M$ may halt in a state $P \in Q$ that is not a subset of consecutive states of $\{0, \ldots, n - 1\}$: for some $r < p < n - 1$, it is possible to have $r, n - 1 \in P$ but $p \notin P$. This only occurs when a transition from a state $P'$ with $n - 1 \in P'$ occurs, in which case $M$ moves to a state $P$ containing $r$, corresponding to $\delta_\diamond(n - 1, c)$ for all $c \in \Sigma$. Thus, all states can be considered subsets of consecutive values if we consider $r$ consecutive to $n - 1$ or, in other words, if we allow values from $n - 1$ to $r$ to wrap around to each other. This means that $M$ requires $\sum_{j=1}^{i-1} j$ states for words of length $n$ or greater. Therefore,

    $$\sum_{j=n-i}^{n} j + \sum_{j=1}^{i-1} j = n + (n - 1)i \in D_n.$$
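
    As a sanity check on Theorem 9, the following Python sketch builds the NFA obtained from $R_{n,r}\{(b, 0)\}$ by reading the hole as either $a$ or $b$, determinizes it with the subset construction, and merges equivalent states by partition refinement. It is not taken from the paper: the function name `min_dfa_size` is an illustrative assumption, and a generic determinize-and-minimize routine stands in for the paper's Algorithm 1, which is not reproduced in this section.

```python
def min_dfa_size(n: int, i: int) -> int:
    """Number of states of the minimal DFA for the language obtained from
    R_{n,r}{(b,0)} with r = n - i - 1 when the hole stands for a or b."""
    if n < 2 or not 0 <= i < n:
        raise ValueError("Theorem 9 assumes n >= 2 and 0 <= i < n")
    r = n - i - 1

    def step(q, letter):
        # Transitions after replacing each hole transition by a and b.
        if q == n - 1:
            return {r}
        if q == 0 and letter == "b":
            return {0, 1}  # explicit b-loop on 0, plus the hole read as b
        return {q + 1}

    # Subset construction, restricted to reachable subsets.
    start = frozenset({0})
    trans = {}
    todo = [start]
    while todo:
        subset = todo.pop()
        if subset in trans:
            continue
        trans[subset] = {}
        for letter in "ab":
            target = frozenset(s for q in subset for s in step(q, letter))
            trans[subset][letter] = target
            todo.append(target)

    # Moore-style refinement: start from accept / non-accept, split until stable.
    block = {subset: int(n - 1 in subset) for subset in trans}
    count = len(set(block.values()))
    while True:
        ids = {}
        refined = {}
        for subset in trans:
            signature = (block[subset],
                         block[trans[subset]["a"]],
                         block[trans[subset]["b"]])
            refined[subset] = ids.setdefault(signature, len(ids))
        if len(ids) == count:
            return count
        block, count = refined, len(ids)


if __name__ == "__main__":
    for n in range(2, 6):
        for i in range(n):
            print(n, i, min_dfa_size(n, i), n + (n - 1) * i)
```

    The last two columns printed by the loop are the computed size and the value $n + (n - 1)i$ from the theorem, so the two can be compared directly for small $n$.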

    3. Constructs for $N_n$

    Let $\Sigma$ be an alphabet, and let $\Sigma_i = \{a_i \mid a \in \Sigma\}$ for all integers $i$, $i > 0$. Let $\sigma_i : \Sigma \to \Sigma_i$ such that $a \mapsto a_i$, and let $\#_j$ be a symbol in no $\Sigma_i$, for all $i$ and $j$. Given a language $L$ over $\Sigma$, the $i$th product of $L$ and the $i$th #-product of $L$ are, respectively, the languages


    $$\pi_i(L) = \prod_{j=1}^{i} \sigma_j(L), \qquad \pi_i^{\#}(L) = \sigma_1(L) \prod_{j=2}^{i} \{\#_{j-1}\}\, \sigma_j(L).$$

    In general, we call any construct of this form, languages over different alphabets concatenated with # symbols, a #-concatenation. With these definitions in hand, we obtain our first interval for $N_n$.

    Theorem 10. For $n > 0$, $[n - \lfloor \frac{n-1}{3} \rfloor, n] \subseteq N_n$.

    Proof. Let $L = \{aa, ba, b\}$ be a language over $\Sigma = \{a, b\}$. A minimal NFA recognizing the $i$th product $\pi_i(L)$ is defined as having $2i + 1$ states, $q_0, \ldots, q_{2i}$, with accept state $q_{2i}$, start state $q_0$, and transition function $\delta$ defined by $\delta(q_{2j}, b_{j+1}) = \{q_{2j+1}, q_{2(j+1)}\}$, $\delta(q_{2j}, a_{j+1}) = \{q_{2j+1}\}$, and $\delta(q_{2j+1}, a_{j+1}) = \{q_{2(j+1)}\}$ for $j < i$. It is easy to see this is minimal: the number of states is equal to the maximal length of the words plus one. A minimal $\diamond$-DFA recognizing $\pi_i(L)$ is defined as having $3i + 1$ states, $q_0, \ldots, q_{3i-1}$ and $q_{err}$, with accept states $q_{3i-1}$ and $q_{3i-2}$, start state $q_0$, and transition function $\delta$ defined as follows:

    $\delta(q_0, b_1) = q_2$, $\delta(q_0, \diamond) = q_1$, and $\delta(q_1, a_1) = q_2$;
    $\delta(q_{3j-1}, a_{j+1}) = q_{3j}$, $\delta(q_{3j-1}, b_{j+1}) = q_{3j+1}$, $\delta(q_{3j}, a_{j+1}) = q_{3(j+1)-1}$, and $\delta(q_{3j+1}, a_{j+1}) = q_{3(j+1)-1}$ for $0 < j < i$;
    $\delta(q_{3j+1}, a_{j+2}) = \delta(q_{3(j+1)-1}, a_{j+2})$ and $\delta(q_{3j+1}, b_{j+2}) = \delta(q_{3(j+1)-1}, b_{j+2})$ for $0 < j < i - 1$.

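    The $\diamond$-substitution corresponds to $\Sigma_1 = \{a_1, b_1\}$ here. It is easy to see that this is minimal.

    Now, fix $n$; take any $i \le \lfloor \frac{n-1}{3} \rfloor$. We can write $n = 3i + r + 1$, for some $r \ge 0$. Let $\{\alpha_j\}_{0 \le j \le r}$ be a set of symbols not in the alphabet of $\pi_i(L)$. A minimal NFA and a minimal $\diamond$-DFA recognizing $\pi_i(L) \cup \{\alpha_0 \cdots \alpha_r\}$ can clearly be obtained by adding to each a series of states $q'_0 = q_0, q'_1, \ldots, q'_r$, and $q'_{r+1} = q_{2i}$ and $q'_{r+1} = q_{3i-1}$ respectively, with $\delta(q'_j, \alpha_j) = q'_{j+1}$ for $0 \le j \le r$. Hence, for $i \le \lfloor \frac{n-1}{3} \rfloor$, we can produce a $\diamond$-DFA of size $n = 3i + r + 1$ which reduces to an NFA of size $2i + r + 1 = n - i$.

    Theorem 10 gives an interval for $N_n$ based on the products $\pi_i(L)$, and Theorem 12 will give an interval for $N_n$ based on the #-products $\pi_i^{\#}(L)$, where no $\diamond$-substitutions exist over multiple $\Sigma_j$'s. To do this, we first need the following lemma, which establishes some relationships between $\min_{\mathrm{NFA}}(\pi_i^{\#}(L))$, $\min_{\diamond\text{-DFA}}(\pi_i^{\#}(L))$, $\min_{\mathrm{NFA}}(L)$, $\min_{\diamond\text{-DFA}}(L)$, and $\min_{\mathrm{DFA}}(L)$.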
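
    The NFA with $2i + 1$ states from the proof of Theorem 10 can be checked mechanically for small $i$. The sketch below is an illustration, not code from the paper: letters $a_j$ and $b_j$ are encoded as pairs such as `("a", j)`, and the function names are assumptions made for this example. It simulates the NFA on every word up to length $2i$ and compares the accepted set with the product of $i$ indexed copies of $L = \{aa, ba, b\}$.

```python
from itertools import product

BASE_LANGUAGE = ["aa", "ba", "b"]  # L = {aa, ba, b}


def product_language(i):
    """All words w_1 ... w_i with w_j in L, where letters of w_j carry index j."""
    words = set()
    for choice in product(BASE_LANGUAGE, repeat=i):
        word = tuple((x, j) for j, w in enumerate(choice, start=1) for x in w)
        words.add(word)
    return words


def nfa_accepts(i, word):
    """Simulate the NFA with states 0..2i (state k stands for q_k)."""
    current = {0}
    for x, j in word:
        following = set()
        for q in current:
            if q % 2 == 0 and q < 2 * i and j == q // 2 + 1:
                # q = 2j': on b go to q+1 or q+2, on a go to q+1
                following |= {q + 1, q + 2} if x == "b" else {q + 1}
            elif q % 2 == 1 and x == "a" and j == (q - 1) // 2 + 1:
                # q = 2j'+1: only a leads on, to q+1
                following.add(q + 1)
        current = following
    return 2 * i in current


def accepted_language(i):
    """Every accepted word of length at most 2i (the NFA has no cycles)."""
    alphabet = [(x, j) for j in range(1, i + 1) for x in "ab"]
    return {
        word
        for length in range(2 * i + 1)
        for word in product(alphabet, repeat=length)
        if nfa_accepts(i, word)
    }


if __name__ == "__main__":
    for i in (1, 2, 3):
        print(i, accepted_language(i) == product_language(i))
```

    Since every transition moves to a strictly larger state index, no accepted word is longer than $2i$, which is why enumerating words up to that length is exhaustive.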

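    Lemma 1. Let $L'$, $L''$ be languages recognized by minimal NFAs $N' = (Q', \Sigma', \delta', q'_0, F')$ and $N'' = (Q'', \Sigma'', \delta'', q''_0, F'')$, where $\Sigma' \cap \Sigma'' = \emptyset$. Moreover, let $\# \notin \Sigma', \Sigma''$. Then $L = L'\{\#\}L''$ is recognized by the minimal NFA $N = (Q' \cup Q'', \Sigma, \delta, q'_0, F'')$ over $\Sigma = \Sigma' \cup \Sigma'' \cup \{\#\}$, where $\delta(q, a) = \delta'(q, a)$ if $q \in Q'$ and $a \in \Sigma'$; $\delta(q, a) = \delta''(q, a)$ if $q \in Q''$ and $a \in \Sigma''$; $\delta(q, \#) = \{q''_0\}$ if $q \in F'$; and $\delta(q, a) = \emptyset$ otherwise. Consequently, the following hold:

    1. For any $L$, $\min_{\mathrm{NFA}}(\pi_i^{\#}(L)) = i \min_{\mathrm{NFA}}(L)$;
    2. Let $L_1, \ldots, L_n$ be languages whose minimal DFAs have no error states and whose alphabets are pairwise disjoint, and without loss of generality, let $\min_{\mathrm{DFA}}(L_1) - \min_{\diamond\text{-DFA}}(L_1) \ge \cdots \ge \min_{\mathrm{DFA}}(L_n) - \min_{\diamond\text{-DFA}}(L_n)$. Then

    $$\min_{\diamond\text{-DFA}}(L_1\{\#_1\}L_2\{\#_2\} \cdots L_n) = 1 + \min_{\diamond\text{-DFA}}(L_1) + \sum_{i=2}^{n} \min_{\mathrm{DFA}}(L_i).$$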
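
    The NFA construction in Lemma 1 is short enough to write down directly. The following sketch is only an illustration under assumptions of its own: NFAs are represented as plain dictionaries with the keys `states`, `delta`, `start`, and `accept`, and the helper names `hash_concat` and `accepts` are hypothetical. It builds the automaton for $L'\{\#\}L''$ by keeping both transition functions and adding a # transition from each accept state of the first NFA to the start state of the second.

```python
def hash_concat(first, second, hash_symbol="#"):
    """NFA for L'{#}L'' built from NFAs for L' and L'' (disjoint state sets)."""
    if first["states"] & second["states"]:
        raise ValueError("state sets must be disjoint")
    delta = {key: set(targets) for key, targets in first["delta"].items()}
    delta.update({key: set(targets) for key, targets in second["delta"].items()})
    for q in first["accept"]:
        # delta(q, #) = {start state of the second NFA} for accepting q
        delta[(q, hash_symbol)] = {second["start"]}
    return {
        "states": first["states"] | second["states"],
        "delta": delta,
        "start": first["start"],
        "accept": set(second["accept"]),
    }


def accepts(nfa, word):
    """Standard NFA simulation; missing transitions mean the empty set."""
    current = {nfa["start"]}
    for letter in word:
        current = {t for q in current for t in nfa["delta"].get((q, letter), set())}
    return bool(current & nfa["accept"])


# Example: L' = {aa, ba, b} over {a, b} and L'' = {c} over {c}.
N1 = {
    "states": {0, 1, 2},
    "delta": {(0, "a"): {1}, (0, "b"): {1, 2}, (1, "a"): {2}},
    "start": 0,
    "accept": {2},
}
N2 = {
    "states": {"s", "t"},
    "delta": {("s", "c"): {"t"}},
    "start": "s",
    "accept": {"t"},
}
N = hash_concat(N1, N2)

if __name__ == "__main__":
    for w in ["b#c", "aa#c", "ba#c", "a#c", "bc", "b#"]:
        print(w, accepts(N, w))
```

    The combined automaton has exactly as many states as the two inputs together, which is the additive behaviour that item 1 of the lemma relies on.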

    This lemma allows us to obtain the following linear bound.

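    Theorem 11. Let $L$ be a language whose minimal DFA has no error state. Moreover, assume $\min_{\diamond\text{-DFA}}(L) = \min_{\mathrm{NFA}}(L)$. Fix some $n$ and $j$, $0 < j \le \lfloor \frac{n - \min_{\diamond\text{-DFA}}(L) - 1}{\min_{\mathrm{DFA}}(L)} \rfloor$. Then $n - j(\min_{\mathrm{DFA}}(L) - \min_{\diamond\text{-DFA}}(L)) - 1 \in N_n$.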

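    Proof. Since $0 < j \le \lfloor \frac{n - \min_{\diamond\text{-DFA}}(L) - 1}{\min_{\mathrm{DFA}}(L)} \rfloor$, we can write $n = 1 + \min_{\diamond\text{-DFA}}(L) + j \min_{\mathrm{DFA}}(L) + r$ for some $r$. Then, by Lemma 1(2), this corresponds to $n = \min_{\diamond\text{-DFA}}(\pi_{j+1}^{\#}(L) \cup \{w\})$, where $\pi_{j+1}^{\#}(L)$ is the $(j+1)$th #-product of $L$ and $w$ is a word corresponding to an $r$-length chain of states, as we used in the proof of Theorem 10. We also have $\min_{\mathrm{NFA}}(\pi_{j+1}^{\#}(L) \cup \{w\}) = (j + 1) \min_{\diamond\text{-DFA}}(L) + r$ using Lemma 1(1) and our assumption that $\min_{\diamond\text{-DFA}}(L) = \min_{\mathrm{NFA}}(L)$. Alternatively,

    $$\min_{\mathrm{NFA}}(\pi_{j+1}^{\#}(L) \cup \{w\}) = n - j(\min_{\mathrm{DFA}}(L) - \min_{\diamond\text{-DFA}}(L)) - 1.$$

    Our result follows.

    The above linear bounds can be improved, albeit with a loss of clarity in the overall construct. Consider the interval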
    values obtained in Theorem 4. Fix an integer $x$. The minimal integer $y$ such that $x \le y + \frac{(y-2)(y-3)}{2}$ is clearly $n_x = \lceil \frac{3 + \sqrt{8x - 15}}{2} \rceil$, for $x \ge 4$. Associate with $x$ and $n_x$ the corresponding DFAs and $\diamond$-DFAs used in the proof of Theorem 4, i.e., let $L_{n,m}$ be the language in the proof with minimal $\diamond$-DFA size $n$ and minimal DFA size $m$. If we replace each $\diamond$-transition in the minimal $\diamond$-DFA and remove the error state, we get a minimal NFA of size $n - 1$ accepting $L_{n,m}$ (this NFA must be minimal since the maximal length of a word in $L_{n,m}$ is $n - 2$). Noting that all deterministic automata in question have error states, we get, using Lemma 1, that

    $$\min_{\diamond\text{-DFA}}(\pi_i^{\#}(L_{n_x,x})) = 1 + \min_{\diamond\text{-DFA}}(L_{n_x,x}) + (i - 1)\min_{\mathrm{DFA}}(L_{n_x,x}) = n_x + (i - 1)(x - 1),$$

    $$\min_{\mathrm{NFA}}(\pi_i^{\#}(L_{n_x,x})) = i \min_{\mathrm{NFA}}(L_{n_x,x}) = i(n_x - 1),$$

    where $\pi_i^{\#}$ denotes the $i$th #-product. In the first line a single shared error state is counted once, so each copy of $L_{n_x,x}$ contributes $n_x - 1$ or $x - 1$ non-error states, respectively.

    Fig. 6. Integer partitions $\lambda = (6,4,1,1)$ (left) and $\lambda^T = (4,2,2,2,1,1)$ (right).
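
    The closed form for $n_x$ follows from solving $y^2 - 3y + 6 - 2x \ge 0$ for $y$. As a quick numerical cross-check (an illustrative sketch, with function names chosen here rather than taken from the paper), the code below compares the closed form against a direct search for the smallest $y$ satisfying $x \le y + (y-2)(y-3)/2$.

```python
import math


def n_x_closed(x: int) -> int:
    """Closed form ceil((3 + sqrt(8x - 15)) / 2), stated for x >= 4."""
    if x < 4:
        raise ValueError("the closed form is stated for x >= 4")
    return math.ceil((3 + math.sqrt(8 * x - 15)) / 2)


def n_x_search(x: int) -> int:
    """Smallest positive integer y with x <= y + (y - 2)(y - 3) / 2."""
    y = 1
    # (y - 2)(y - 3) is a product of consecutive integers, hence even.
    while y + (y - 2) * (y - 3) // 2 < x:
        y += 1
    return y


if __name__ == "__main__":
    for x in range(4, 12):
        print(x, n_x_closed(x), n_x_search(x))
```

    The floating-point square root is adequate for the small values used here; for very large $x$ an integer square root would be the safer choice.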

    We next prove our interval based on the #-products $\pi_i^{\#}(L_{n_x,x})$.

    Theorem 12. For $n > n_x \ge 4$, $[n - (x - n_x)\lfloor \frac{n - n_x}{x - 1} \rfloor - 1, n] \subseteq N_n$.

    Proof. For any $n$ and fixed $x$, write $n = n_x + (i - 1)(x - 1) + r$, for some $0 \le r < x - 1$, which is realizable as a minimal $\diamond$-DFA by appending to the minimal $\diamond$-DFA accepting the #-product $\pi_i^{\#}(L_{n_x,x})$ an arbitrary chain of states of length $r$, using letters not in the alphabet of $\pi_i^{\#}(L_{n_x,x})$, similar to what we did in the proof of Theorem 10. This leads to a minimal NFA of size $i(n_x - 1) + r$, giving the lower bound $n - (x - n_x)\lfloor \frac{n - n_x}{x - 1} \rfloor - 1$ if we solve for $i$. Anything in the upper bound can be obtained by decreasing $i$ or replacing occurrences of $L_{n_x,x}$ with $L_{n_x,x-j}$ (for some $j$) and in turn adding additional chains of states of length $r$, to maintain the size of the $\diamond$-DFA.

    We can obtain even lower bounds by considering the sequence of DFAs defined in Theorem 8. Recall that for any $n \ge 1$, we have a minimal DFA, which we call $M_n$, of size $2^n - 1$; the equivalent minimal $\diamond$-DFA, $M'_n$, has size $n$. Applying Algorithm 1 to $M'_n$, the resulting NFA of size $n$ is also minimal. Let $n_0 \ge n_1 \ge \cdots \ge n_k$ be a sequence of integers and consider

    $$\min_{\diamond\text{-DFA}}(L(M_{n_0})\{\#_1\}L(M_{n_1}) \cdots \{\#_k\}L(M_{n_k})) = 1 + n_0 + \sum_{i=1}^{k} (2^{n_i} - 1), \qquad (3)$$

    where the equality comes from Lemma 1(2). Iteratively applying Lemma 1 gives

    $$\min_{\mathrm{NFA}}(L(M_{n_0})\{\#_1\}L(M_{n_1}) \cdots \{\#_k\}L(M_{n_k})) = \sum_{i=0}^{k} n_i. \qquad (4)$$

    To understand the difference between (3) and (4) in greater depth, let us view $(n_1, \ldots, n_k)$ as an integer partition, $\lambda$, or as a Young Diagram and assign each cell a value (see, e.g., [9]). In this case, the $i$th column of $\lambda$ has each cell valued at $2^{i-1}$. Transposing about $y = x$ gives the diagram corresponding to the transpose of $\lambda$, $\lambda^T = (m_1, \ldots, m_{n_1})$, in which the $i$th row has each cell valued at $2^{i-1}$. Note that $m_1 = k$ and there are, for each $i$, $m_i$ terms of $2^{i-1}$. Fig. 6 gives an example of an integer partition and its transpose. Define $\Pi(\lambda^T) = \sum_{i=1}^{n_1} 2^{i-1} m_i = \sum_{i=1}^{k} (2^{n_i} - 1)$ and $\Sigma(\lambda) = \sum_{i=1}^{k} n_i$.

    Given this, we can view the language $L$ described in (3) and (4), i.e., $L = L(M_{n_0})\{\#_1\}L(M_{n_1}) \cdots \{\#_k\}L(M_{n_k})$, as being defined by the integer $n_0$ and the partition of integers $\lambda = (n_1, \ldots, n_k)$ with $n_0 \ge n_1$. This gives

    $$\min_{\diamond\text{-DFA}}(L) = 1 + n_0 + \Pi(\lambda^T) \quad \text{and} \quad \min_{\mathrm{NFA}}(L) = n_0 + \Sigma(\lambda).$$
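
    To make the two quantities concrete, the sketch below evaluates them on the partition $(6,4,1,1)$ from Fig. 6. The function names `transpose`, `weighted_sum`, and `plain_sum` are assumptions made for this illustration; `weighted_sum` plays the role of the $2^{i-1}$-weighted sum over the transposed partition, and `plain_sum` the role of the sum of the parts.

```python
def transpose(partition):
    """Conjugate partition: entry i counts the parts of size at least i."""
    if not partition:
        return []
    return [sum(1 for part in partition if part >= i)
            for i in range(1, max(partition) + 1)]


def weighted_sum(mu):
    """Sum of 2^(i-1) * m_i over the entries m_1, m_2, ... of mu."""
    return sum(2 ** (i - 1) * m for i, m in enumerate(mu, start=1))


def plain_sum(partition):
    """Sum of the parts."""
    return sum(partition)


lam = [6, 4, 1, 1]
lam_t = transpose(lam)  # conjugate of (6, 4, 1, 1)

if __name__ == "__main__":
    print(lam_t)
    print(weighted_sum(lam_t), sum(2 ** part - 1 for part in lam))
    print(plain_sum(lam))
```

    Both ways of computing the weighted sum agree, since a row of length $n_i$ holds the values $1, 2, \ldots, 2^{n_i - 1}$, which add up to $2^{n_i} - 1$.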

    To further understand this, we must consider the following sub-problem: Let $\Pi(\lambda) = n$. What are the possible values of $\Sigma(\lambda)$? To proceed here, we define the sequence $p_n$ recursively as follows: if $n = 2^k - 1$ for some $k$, $p_n = k$; otherwise, letting $n = m + (2^k - 1)$ for $k$ maximal, $p_n = k + p_m$. This serves as the minimal bound for the possible values of $\Sigma(\lambda)$.
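
    The recursion for $p_n$ translates directly into code. The sketch below is an illustration with a function name chosen here (`p`); it uses the fact that the maximal $k$ with $2^k - 1 \le n$ is $\lfloor \log_2(n + 1) \rfloor$, computed with integer arithmetic via `bit_length`.

```python
def p(n: int) -> int:
    """p_n = k if n = 2^k - 1; otherwise k + p_m, where n = m + (2^k - 1)
    and k is the largest integer with 2^k - 1 <= n."""
    if n < 1:
        raise ValueError("p_n is defined here for n >= 1")
    k = (n + 1).bit_length() - 1  # floor(log2(n + 1))
    m = n - (2 ** k - 1)
    return k if m == 0 else k + p(m)


if __name__ == "__main__":
    print([p(n) for n in range(1, 9)])
```

    For example, $n = 5$ splits as $2 + (2^2 - 1)$, so $p_5 = 2 + p_2 = 2 + (1 + p_1) = 4$.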

    Theorem 13. If $\Pi(\lambda) = n$, then $\Sigma(\lambda) \ge p_n$. Consequently, for all $n$ and $k = \lfloor \log_2(n + 1) \rfloor$, $k + p_n \in N_{1+k+n}$.


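    Proof. To show that $p_n$ is obtainable, we prove that the following partition, $\lambda_n$, satisfies $\Pi(\lambda_n) = n$ and $\Sigma(\lambda_n) = p_n$: if $n = 2^k - 1$ for some $k$, $\lambda_n = (1^k)$; otherwise, letting $n = m + (2^k - 1)$ for $k$ maximal, $\lambda_n = \lambda_{2^k - 1} + \lambda_m$. Here, the sum of two partitions is the partition obtained by adding the summands term by term; $(1^k)$ is the $k$-tuple of ones. Clearly, for partitions $\lambda$ and $\mu$, $\Pi(\lambda + \mu) = \Pi(\lambda) + \Pi(\mu)$ and $\Sigma(\lambda + \mu) = \Sigma(\lambda) + \Sigma(\mu)$. By construction, $\Pi(\lambda_n) = n$ and $\Sigma(\lambda_n) = p_n$. To see this, if $n = 2^k - 1$ for some $k$, $\Pi(\lambda_n) = \Pi((1^k)) = \Pi((k)^T) = 2^k - 1 = n$ and $\Sigma(\lambda_n) = \Sigma((1^k)) = \sum_{i=1}^{k} 1 = k = p_n$. Otherwise,

    $$\Pi(\lambda_n) = \Pi(\lambda_{2^k - 1}) + \Pi(\lambda_m) = \Pi((1^k)) + \Pi(\lambda_m) = 2^k - 1 + m = n,$$

    $$\Sigma(\lambda_n) = \Sigma(\lambda_{2^k - 1}) + \Sigma(\lambda_m) = \Sigma((1^k)) + \Sigma(\lambda_m) = k + p_m = p_n.$$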
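
    The recursive construction in this part of the proof can be replayed numerically. The following sketch is illustrative only: the names `lam`, `weighted_sum`, and `p` are assumptions of this example, partitions are plain Python lists, and the term-by-term sum pads the shorter partition with zeros. It checks the two identities just derived, not the minimality claim.

```python
def p(n: int) -> int:
    """The sequence p_n from the text, computed recursively."""
    k = (n + 1).bit_length() - 1  # largest k with 2^k - 1 <= n
    m = n - (2 ** k - 1)
    return k if m == 0 else k + p(m)


def weighted_sum(mu):
    """Sum of 2^(i-1) * m_i over the entries of mu (1-based index i)."""
    return sum(2 ** (i - 1) * m for i, m in enumerate(mu, start=1))


def lam(n: int):
    """Partition built as in the proof: (1^k) if n = 2^k - 1, otherwise
    (1^k) plus lam(m) term by term, with n = m + (2^k - 1) and k maximal."""
    if n < 1:
        raise ValueError("n must be at least 1")
    k = (n + 1).bit_length() - 1
    m = n - (2 ** k - 1)
    ones = [1] * k
    if m == 0:
        return ones
    rest = lam(m)
    rest = rest + [0] * (k - len(rest))  # lam(m) never has more than k entries
    return [x + y for x, y in zip(ones, rest)]


if __name__ == "__main__":
    for n in (1, 5, 7, 10):
        partition = lam(n)
        print(n, partition, weighted_sum(partition), sum(partition), p(n))
```

    In each printed row the weighted sum reproduces $n$ and the plain sum reproduces $p_n$, mirroring the two displayed equalities.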

    To show that $p_n$, or $\lambda_n$, is minimal, we can proceed inductively.

    From the above, each $p_n$ is obtainable by a partition of size $k$, where $k$ is the maximal integer with $n \ge 2^k - 1$. Alternatively, $k = \lfloor \log_2(n + 1) \rfloor$. Fixing $n$, we get $k + p_n \in N_{1+k+n}$.

    4. Conclusion

    For languages of words of equal length, Theorem 2 gives the maximum element in $D_n$ found so far and Theorem 3 gives that maximum element when we restrict to a constant alphabet size. For languages of words of bounded length, Theorem 6 gives the least upper bound for elements in $D_n$ based on minimal $\diamond$-DFAs of the form (1) and Theorem 7 gives the maximum element found so far when we restrict to a binary alphabet. For languages with words of arbitrary length, Theorem 8 gives the least upper bound of $2^n - 1$ for elements in $D_n$, a bound that can be achieved over a binary alphabet. We conjecture that for $n \ge 1$, $[n, 2^n - 1] \subseteq D_n$. This conjecture has been verified for all $1 \le n \le 7$ based on all our constructs from Section 2.

    In Section 3, via products, Theorem 10 gives an interval for $N_n$. If we replace products with #-concatenations, Theorem 12 increases the interval further. Theorem 13 does not give an interval, but an isolated point not previously achieved. With the exception of this latter result, all of our bounds are linear. Some of our constructs satisfy $\min_{\diamond\text{-DFA}}(L) = \min_{\mathrm{NFA}}(L)$, ignoring error states. As noted earlier, this is a requirement for #-concatenations to produce meaningful bounds. Constructs without this restriction are often too large to be useful.

    References

    [1] E. Balkanski, F. Blanchet-Sadri, M. Kilgore, B.J. Wyatt, Partial word DFAs, in: S. Konstantinidis (Ed.), 18th International Conference on Implementation and Application of Automata, CIAA 2013, Halifax, Nova Scotia, Canada, in: Lecture Notes in Computer Science, vol. 7982, Springer-Verlag, Berlin, Heidelberg, 2013, pp. 36–47.
    [2] S. Yu, Regular languages, in: G. Rozenberg, A. Salomaa (Eds.), Handbook of Formal Languages, vol. 1, Springer-Verlag, Berlin, 1997, pp. 41–110, Ch. 2.
    [3] J.E. Hopcroft, R. Motwani, J.D. Ullman, Introduction to Automata Theory, Languages, and Computation, International Edition, 2nd ed., Addison-Wesley, 2003.
    [4] J. Dassow, F. Manea, R. Mercas, Regular languages of partial words, Inform. Sci. 268 (2014) 290–304.
    [5] M. Fischer, M. Paterson, String matching and other products, in: R. Karp (Ed.), 7th SIAM-AMS Complexity of Computation, 1974, pp. 113–125.
    [6] J. Berstel, L. Boasson, Partial words and a theorem of Fine and Wilf, Theoret. Comput. Sci. 218 (1999) 135–141.
    [7] F. Blanchet-Sadri, Algorithmic Combinatorics on Partial Words, Chapman & Hall/CRC Press, Boca Raton, FL, 2008.
    [8] N.J.A. Sloane, The On-Line Encyclopedia of Integer Sequences, http://oeis.org.
    [9] G.E. Andrews, K. Eriksson, Integer Partitions, Cambridge University Press, 2004.
